<<

Canopy Position Influences on Gene Expression and Quality

Bing Cheng M.Sc., B.Sc.

A thesis submitted for the degree of Doctor of Philosophy at The University of Queensland in 2018 Queensland Alliance for Agriculture and Food Innovation

Abstract

Coffee is one of the most valuable commodities exported around the world. Only two out of 125 species are highly valued in the genus , Arabica and Robusta coffee, accounting for 62% and 35% of the world . This project aimed to improve the understanding of the molecular and genetic basis of coffee quality through the detection of the upper and lower canopy position influences on quality and gene expression. Selection from different parts of the plant might help to meet the increased threat from climate change and the demands of highly discerning consumers.

Coffee quality is a very complicated trait depending on different perspectives. Phenotypic analysis was used to systematically access coffee quality through measurement of physical traits (including bean weight and size (length, width and thickness)), chemical components analysis (including determination of sucrose, trigonelline and content) and sensory evaluation to assess the aroma. To understand the influence of canopy position on coffee quality and gene expression, transcriptome analysis was conducted in this study. A reference transcriptome was generated for Arabica coffee using long read sequencing. To understand the dynamic of canopy position influences, three different ripening stages were selected. Upper canopy samples were used to understand the processes associated with coffee bean ripening and how the key components were accumulated in coffee beans through ripening. Thereafter, comparative transcriptome analysis was conducted to characterise the influence of canopy position.

Higher sucrose, trigonelline and caffeine contents and greater aroma intensity were found in coffee beans from the lower canopy. According to transcriptome analysis, this increase resulted from increased expression of genes likely to be associated with these bean components and slower but longer maturation in coffee beans from the lower canopy. The slower and longer maturation was likely to be related with differential expression of genes associated with hormone inhibition and slower loss of photosynthesis. This study greatly improved understanding of molecular and genetic basis of coffee quality and canopy influences. Candidate genes from this study are potential for future breeding for high quality coffee.

II

Declaration by author

This thesis is composed of my original work, and contains no material previously published or written by another person except where due reference has been made in the text. I have clearly stated the contribution by others to jointly-authored works that I have included in my thesis.

I have clearly stated the contribution of others to my thesis as a whole, including statistical assistance, survey design, data analysis, significant technical procedures, professional editorial advice, financial support and any other original research work used or reported in my thesis. The content of my thesis is the result of work I have carried out since the commencement of my higher degree by research candidature and does not include a substantial part of work that has been submitted to qualify for the award of any other degree or diploma in any university or other tertiary institution. I have clearly stated which parts of my thesis, if any, have been submitted to qualify for another award.

I acknowledge that an electronic copy of my thesis must be lodged with the University Library and, subject to the policy and procedures of The University of Queensland, the thesis be made available for research and study in accordance with the Copyright Act 1968 unless a period of embargo has been approved by the Dean of the Graduate School.

I acknowledge that copyright of all material contained in my thesis resides with the copyright holder(s) of that material. Where appropriate I have obtained copyright permission from the copyright holder to reproduce material in this thesis and have sought permission from co- authors for any jointly authored works included in the thesis.

Publications included in this thesis

1. Cheng, B., Furtado, A., Smyth, H.E. and Henry, R.J., 2016. Influence of genotype and environment on coffee quality. Trends in Food Science & Technology, 57, pp.20-30. (This publication is extracted from chapter 2) Contributor Statement of contribution

III

Author Bing Cheng (Candidate) Conception and design (70%) Analysis and interpretation (80%) Drafting and production (80%) Author Robert J. Henry Conception and design (20%) Analysis and interpretation (20%) Drafting and production (10%) Author Agnelo Furtado Drafting and production (5%) Author Heather E. Smyth Drafting and production (5%)

2. Cheng, B., Furtado, A. and Henry, R.J., 2017. Long-read sequencing of the coffee bean transcriptome reveals the diversity of full-length transcripts. GigaScience, 6 (11), pp1-13. (This publication is incorporated in chapter 4) Contributor Statement of contribution Author Bing Cheng (Candidate) Conception and design (50%) Analysis and interpretation (90%) Drafting and production (80%) Author Robert J. Henry Conception and design (40%) Analysis and interpretation (10%) Drafting and production (15%) Author Agnelo Furtado Conception and design (10%) Drafting and production (5%)

Cheng, B., Furtado, A. and Henry, R.J., 2018. The coffee bean transcriptome explains the accumulation of the major bean components through ripening. Scientific Reports, 8 (1) (This manuscript is incorporated in chapter 5) Contributor Statement of contribution Author Bing Cheng (Candidate) Conception and design (50%) Analysis and interpretation (80%) Drafting and production (80%) Author Robert J. Henry Conception and design (40%) Analysis and interpretation (10%) Drafting and production (10%) Author Agnelo Furtado Conception and design (10%)

IV

Analysis and interpretation (10%) Drafting and production (10%)

Submitted manuscripts included in this thesis

Cheng, B., Furtado, A., Smyth, H.E. and Henry, R.J. Analysis of Bean Phenotype and the Transcriptome from Different Canopy Positions Reveals Candidate Genes Influencing Coffee Quality. Submitted. (This manuscript is incorporated in chapter 3 and 6). Contributor Statement of contribution Author Bing Cheng (Candidate) Conception and design (50%) Analysis and interpretation (70%) Drafting and production (70%) Author Robert J. Henry Conception and design (30%) Analysis and interpretation (10%) Drafting and production (20%) Author Agnelo Furtado Conception and design (10%) Analysis and interpretation (10%) Drafting and production (5%) Author Heather E. Smyth Conception and design (10%) Analysis and interpretation (10%) Drafting and production (5%)

Other publications during candidature

No other publications.

Contributions by others to the thesis

No contributions by others.

Statement of parts of the thesis submitted to qualify for the award of another degree

No works submitted towards another degree have been included in this thesis.

Research Involving Human or Animal Subjects

No animal or human subjects were involved in this research.

V

Acknowledgements

At this time I am about to finish, I really want to express my thanks to people who helped and made my PhD journey easier and enjoyable.

My great thanks first goes to my supervisor, Professor Robert J. Henry, who was open- minded and accepted my PhD application when I was a beginner in bioinformatics, biotechnology and plant biology. Robert’s focus on the main question always keeps me on track. Without his patience and encouragement, I could not have finished this project so smoothly. The next thanks is to Dr. Agnelo Furtado, my co-supervisor. Agnelo taught me a lot in experiment design, record and data analysis. This makes my work more efficient and reproductive, and this training also benefits me in the non-academic field. I also appreciated my other co-supervisor, Dr Heather E. Smyth’s. She was very helpful in the phenotypic analysis, contributed critical ideas to the experiment, results and manuscripts. In addition, I am grateful to the useful feedbacks from my thesis panel, Dr. Tim O’ Hare, A/Prof. Bruce Topp and Dr. Glen Fox, as well as my thesis examiners. These feedbacks provide different perspectives of the results and are very useful to improve my thesis chapters.

I thank Green Cauldron Coffee, Australia for providing the coffee fruit materials, pulping, husking and drying facilities; Poss Reading, Marta Brozynska, Adam Healey, Tiparat Tikapunya, Zhi Xian Lim, Ravi Nirmal, Nam Hoang and Hayba Badro for assistance with sample harvest and Poss Reading and Erli Wang for coffee sample processing. I thank the Wenny Bekti Sunarharum, Hue Tran and Tom Kukhang, who shared their knowledge in coffee. I thank the good feedbacks for Chapter six from Tom Kukhang, Douglas Soltis and Pam Soltis. I thank the critical feedbacks to my manuscripts from the anonymous peer- reviewers, Sandeep Chakraborty and Stephanie Bocs.

Instrumental thanks go to Health Food Science Precinct (HFSP) of Department of Agriculture and Fisheries and the Research Computing Centre (RCC) of The University of Queensland for the phenotypic analysis. Special acknowledgement goes to Kent Fanning (Department of Agriculture and Fisheries, Queensland) and Igor Makunin (RCC), Kevin Smyth (RCC) and Erli Wang (School of Mathematics and Physics) from the University of Queensland. I thank Kent Fanning’s help with the HPLC system. Igor helped me with issues on Galaxy very efficiently and helped me to understand the basics of my bioinformatics. Igor then introduced Kevin to me, who helped me so much in software installation on the high performance computing system and shared me useful scripts to run jobs more efficiently. I acknowledge

VI

Erli with his great help in bioinformatics, especially python scripts. Without his help, my analysis cannot proceed so quickly. Additionally, I really appreciated those staff and students from HFSP attended the sensory test.

In the end I would also like to thank the travel award from the local committee of 26th International Conference on Coffee Sciences, which provides me a chance to display my expanded PhD findings, understand coffee science both in my homeland and the world as well as meet peers and collaborators from all research aspects on coffee. I thank Queensland Alliance of Agriculture and Food Innovation for the award to support my travel to attend the Plant and Animal Genome Conference in 2018 and present my major work of PhD, find potential explorers, communicate with peers and update with new findings in the field. Following attendance at this conference, I really appreciate my principle supervisor, Robert Henry, for his support of a lab visit to the University of Florida for an extension analysis of this PhD project (in the collaboration with Pamela and Douglas Soltis). A manuscript will be drafted for this extension work. This great experience produced a different perspective for me to understand my project and provided a good chance to strengthen my programming skills.

Financial support

This research was supported by Chinese Scholarship Council, Australian Research Council and Top-up Assistant Award from The University of Queensland.

VII

Keywords

Coffee quality, canopy position, gene expression, phenotypic traits, transcriptome, reference, polyploid

Australian and New Zealand Standard Research Classifications (ANZSRC)

ANZSRC code: 070601 Horticultural Crop Growth and Development, 30% ANZSRC code: 100199 Agricultural Biotechnology not elsewhere classified, 30% ANZSRC code: 060102 Bioinformatics, 50%

Fields of Research (FoR) Classification

FoR code: 0706 Horticultural Production, 30% FoR code: 1001 Agricultural Biotechnology 30% FoR code: 0601 Biochemistry and Cell Biology, 50%

VIII

Table of contents Abstract ...... II

Declaration by author ...... III

Publications included in this thesis ...... III

Submitted manuscripts included in this thesis ...... V

Other publications during candidature ...... V

Contributions by others to the thesis ...... V

No contributions by others...... V

Statement of parts of the thesis submitted to qualify for the award of another degree ...... V

Research Involving Human or Animal Subjects ...... V

Financial support ...... VII

Keywords ...... VIII

Australian and New Zealand Standard Research Classifications (ANZSRC) ...... VIII

Fields of Research (FoR) Classification ...... VIII

List of Figures ...... XV

List of Tables ...... XVI

List of Abbreviations used in the thesis ...... XVII

Chapter 1 Introduction ...... 1

1.1 Goals and Challenges ...... 1

1.2 Contributions ...... 1

1.3 Thesis Outline ...... 2

Chapter 2 Literature review ...... 3

2.1 Introduction to Coffee Quality ...... 4

2.1.1 Green Coffee Grading ...... 5

2.1.1.1 Bean Size ...... 5

2.1.1.2 Defective Beans ...... 5

2.1.1.3 Bean Colour ...... 6

IX

2.1.2 Sensory Evaluation ...... 6

2.1.3 Chemical basis of coffee quality ...... 7

2.1.3.1 Caffeine ...... 7

2.1.3.2 Trigonelline ...... 8

2.1.3.3 Chlorogenic Acids ...... 9

2.1.3.4 Sucrose ...... 9

2.1.3.5 Lipids ...... 10

2.1.3.6 Volatiles ...... 11

2.2 Introduction to Coffee Genetics ...... 13

2.2.1 Classical genetics ...... 13

2.2.1.1 ...... 13

2.2.1.2 ...... 15

2.2.2 Coffee Genomics ...... 16

2.2.2.1 Coffee Genome ...... 16

2.2.2.2 Transcriptome ...... 17

2.3 Genotype and Environment influences on Coffee Quality ...... 26

2.3.1 Shade ...... 26

2.3.2 Pruning...... 28

2.3.3 Water ...... 29

2.3.4 Temperature ...... 30

2.3.5 Soil ...... 33

2.3.6 Wind ...... 33

2.3.7 Topography ...... 33

2.4 Methods for analysis of genotype and environment influences on coffee quality ...... 33

2.4.1 The development of transcriptome analysis technologies ...... 33

2.4.2 RNA-Seq technologies ...... 34

2.4.3 Applications in coffee ...... 39

X

2.5 Conclusion ...... 39

Chapter 3 Flavour and quality analysis of coffee beans ...... 41

3.1 Introduction ...... 41

3.2 Plant material and methods ...... 41

3.2.1 Coffee bean processing ...... 41

3.2.2 Bean size and weight measurement ...... 42

3.2.3 Moisture content ...... 42

3.2.4 Analysis of caffeine and trigonelline ...... 42

3.2.5 Analysis of sucrose ...... 43

3.2.6 Sensory evaluation ...... 43

3.2.6.1 Sample preparation and presentation ...... 43

3.2.6.3 Difference testing ...... 45

3.2.7 Statistical analysis...... 46

3.3 Results ...... 46

3.3.1 Influence of canopy position on the size and chemical compositions of coffee beans ...... 46

3.3.2 Influence of canopy position on the aroma of coffee beans ...... 47

3.4 Summary ...... 48

Chapter 4 Generate a coffee bean transcriptome reference ...... 50

4.1 Introduction ...... 50

4.2 Plant material and methods ...... 51

4.2.1 RNA sample preparation ...... 51

4.2.2 cDNA preparation ...... 52

4.2.3 Raw read processing and error correction ...... 52

4.2.4 Transcriptome annotation ...... 54

4.2.5 Case studies with the caffeine and sucrose genes...... 54

4.2.6 Comparison to other available coffee databases ...... 57

XI

4.2.7 Novel genes ...... 57

4.2.8 Analysis of long sequences ...... 57

4.3 Results ...... 57

4.3.1 Overview of full-Length RNA molecules from long-read sequencing ...... 57

4.3.2 Functional Annotation ...... 59

4.3.2.1 Case study I: Isoform diversity in the caffeine biosynthesis pathway ...... 63

4.3.2.2 Case study II: Long sucrose isoforms provide insight into the complexity of the polyploid system ...... 65

4.3.3 Comparison to other available coffee databases ...... 69

4.3.4 Novel genes ...... 70

4.3.5 Long transcripts ...... 71

4.4 Discussion ...... 72

4.4.1 Polyploid expression...... 73

4.4.2 5’UTR extension...... 74

4.4.3 Long transcripts ...... 75

4.4.4 Transcriptome analysis of polyploids using long-read sequencing ...... 75

Chapter 5 Understanding of the ripening coffee bean through transcriptome analysis ...... 77

5.1 Introduction ...... 77

5.2 Plant material and methods ...... 78

5.2.1 RNA sample and cDNA library preparation ...... 78

5.2.2 Reads mining and RNA-Seq analysis ...... 78

5.2.3 Statistical analysis...... 78

5.3 Results ...... 79

5.3.1 Raw read processing ...... 80

5.3.2 Highly expressed transcripts through coffee bean ripening ...... 83

5.3.2.1 Overview ...... 83

5.3.2.2 Intensive lipids formation at the green stage ...... 83

XII

5.3.3 More changes in the comparison of the red vs green stage ...... 84

5.3.4 Storage compounds accumulated through bean maturity ...... 88

5.3.4.1 Major lipids accumulation at the green stage ...... 88

5.3.4.4 The flow of cell wall components ...... 89

5.3.4.3 Major storage protein accumulation at the green stage ...... 90

5.3.4.4 Phenylpropanoids ...... 90

5.3.4.5 Co-expression network of the bean storage genes ...... 90

5.4 Discussion ...... 92

Chapter 6 Environment influences on coffee quality ...... 97

6.1 Introduction ...... 97

6.2 Plant material and methods ...... 98

6.2.1 RNA sample and cDNA library preparation ...... 98

6.2.2 Read mining and RNA-Seq analysis ...... 98

6.2.3 Statistical analysis...... 99

6.2.4 Data availability ...... 99

6.3 Results ...... 99

6.3.1 Influence of canopy position on the expression of genes in pathways associated with bean composition ...... 99

6.3.2 Transcriptome analysis and the rates of bean maturation in different canopy positions ...... 101

6.3.3 Genes associated with coffee quality ...... 103

6.3.4 Co-expression network associated with the control of bean quality ...... 106

6.4 Summary ...... 108

Chapter 7 Conclusion ...... 111

Bibliography ...... 112

Appendices ...... 136

Appendix 1 Differentially expressed genes related to photosynthesis...... 136

Appendix 2 Differentially expressed genes related to hormones...... 141 XIII

Appendix 3 Differentially expressed genes related to stress response...... 149

XIV

List of Figures

Figure 1 Project outline of this thesis draft...... 2 Figure 2 Coffee price over the last 45 years [22] ...... 4 Figure 3 Trigonelline ...... 8 Figure 4 Major caffeine biosynthesis pathway in coffee...... 18 Figure 5 Possible trigonelline biosynthesis pathway in coffee...... 20 Figure 6 Chlorogenic acids biosynthesis pathway in coffee ...... 21 Figure 7 Sucrose metabolism pathway in coffee ...... 23 Figure 8 Lipids biosynthesis in coffee ...... 25 Figure 9 Coffee bean samples from the upper (>170cm) and lower (<170cm) canopy...... 42 Figure 10 Sample preparation and presentation for consumer sensory test...... 44 Figure 11 Chemical components of coffee beans (caffeine, trigonelline and sucrose) as influenced by canopy position...... 47 Figure 12 Coffee fruits of immature (Green), intermediate (Yellow) and mature (Red) stages ...... 51 Figure 13 Species distribution of coffee long read sequencing isoforms according to the result of BLASTx against NCBI non-redundant plant proteins (NR-plant)...... 59 Figure 14 Top Species distribution of coffee long read sequencing isoforms according to the result of BLASTx against NCBI non-redundant plant proteins (NR-plant)...... 60 Figure 15 Distribution of InterProScan families from coffee long read sequencing dataset ... 61 Figure 16 Pie chart and word cloud of coffee long read sequencing isoforms distribution to biological process, cellular component and molecular function ...... 62 Figure 17 The KEGG pathway distribution of coffee long read sequencing isoforms...... 63 Figure 18 Putative transcript variants from long-read sequencing aligned to reference caffeine genes...... 64 Figure 19 Putative transcript variants from long-read sequencing aligned to reported genes encoding 7-methylxanthine N-methyltransferase...... 65 Figure 20 Motif search results of putative sucrose synthase gene 1 from long read sequencing...... 67 Figure 21 Putative variants from long-read sequencing aligned to the reference sucrose genes...... 68

XV

Figure 22 The distribution of the number of the coffee long read sequencing sequences (coffee LRS-sequences), C. canephora coding sequences with UTR, C. arabica contigs with length...... 70 Figure 23 Pie chart and word cloud of long sequences (coffee long read sequencing isoforms >10kb) distribution to biological process, cellular component and molecular function...... 72 Figure 24 Overview of ripening coffee bean transcriptome...... 79 Figure 25 The number of transcripts expressed at coffee bean ripening...... 81 Figure 26 GO distribution in developing coffee bean, including biological process, molecular function and cellular components...... 81 Figure 27 Candidate genes involved in molecular network differentially expressed through ripening...... 88 Figure 28 Co-expression network of key storage genes according to weight...... 91 Figure 29 Summary of expression pattern in storage-related genes through coffee bean ripening...... 93 Figure 30 Difference in candidate gene expression...... 100 Figure 31 Transcriptome analysis of developing coffee beans from the upper and lower canopy...... 102 Figure 32 PCA analysis of gene expression in three ripening stages of coffee beans from the upper (U) and lower canopy (L)...... 103 Figure 33 The expression of differentially expressed genes in canopy position comparison of ripening coffee beans with functional annotation...... 105 Figure 34 Coexpression network of candidate genes with photosynthesis, hormone and stress related differentially expressed genes (DEGs)...... 108

List of Tables

Table 1 Comparison of the high throughput technologies ...... 37 Table 2 Speciality coffee definitions and sensory reference standards...... 44 Table 3 Bean weight and size in coffee beans from the upper and lower canopy (n=4, mean±SD)...... 46 Table 4 Physical, chemical and sensorial profiling of coffee beans from the upper and lower canopy (n=12)...... 48

XVI

Table 5 Details of caffeine candidate genes, putative transcript variants annotated and 5‘UTR extension information ...... 55 Table 6 Details of sucrose candidate genes, putative transcript variants annotated and 5‘UTR extension information...... 55 Table 7 Arabica long-read sequencing isoforms compared to Coffea canephora coding sequences and Coffea arabica EST sequences ...... 57 Table 8 BLAST output filtering with query coverage and cumulative identity...... 58 Table 9 Arabica long-read sequencing transcriptome annotation with different databases .... 58 Table 10 Results of 5’ UTRs from long-read sequencing scanned with UTRdb...... 65 Table 11 Coffee long read sequencing isoforms annotated by Rfam (rfam.xfam.org)...... 71 Table 12 Overview of RNA-Seq results...... 80 Table 13 Top 30 KEGG pathways and their corresponding parent groups in ripening coffee bean...... 82 Table 14 Top ten highly expressing transcripts in green, yellow and red stages of coffee beans (TPM)...... 83 Table 15 Expression (TPM) of top ten down/upregulated differentially expressed genes in the comparison of yellow vs green, red vs green and red vs yellow...... 85

List of Abbreviations used in the thesis

Abbreviation Description 11S 11s globulin 4CL 4-coumarate CoA ligase 7S 7S-like globulin ABA abscisic acid AC acetyl-CoA carboxylase ACO_1 1-aminocyclopropane-1-carboxylate oxidase isoform 1 ACS3 1-aminocyclopropane-1-carboxylate synthase 3-like AFLP amplified fragment length polymorphism AGPs arabinogalactan protein AspAT aspartate aminotransferase, mitochondrial GTP binding protein BRASSINAZOLE INSENSITIVE PALE GREEN BPG2 chloroplastic 2 bZIP leucine zipper transcription factor C3’H p-coumaroylester 3’-hydroxylases C4H cytochrome cinnamate 4-hydroxylase CAGE cap analysis of gene expression CBD Coffee berry disease

XVII

CCoAOMT caffeoyl-CoA 3-O-methyl transferase CCR2 cinnamoyl- reductase 2 CEL cellulase CESA cellulose synthase chi2 class III chitinase CIN cytosolic invertase CKX6 Cytokinin dehydrogenase 6-like COBRA-like COBRA-like protein coffee-LRS Coffee long read sequencing CP24_2 chlorophyll a-b binding CP24 10A isoform 2 CPS1_1 ent-copalyl diphosphate synthase isoform 1 CPT diacylglycerol cholinephosphotransferase CQA caffeoylquinic acids CWIN cell wall invertase CWINI Cell wall invertase inhibitor CYT Mannose-1-phosphate guanylyltransferase DAF days after flowering DAGAT diacylglycerol acyltransferase DEGs Differentially expressed genes DH dehydration diCQA di-caffeoylquinic acids EAR enoyl-ACP reductase EMBL-EBI European Nucleotide Archive ESTs Expressed sequence tags EXPA alpha-expansin transcripts EXT2 extensin-2-like FAT acyl-ACP thioesterase FDR False discovery rate FL Full-length FOUR databases NR plant, NT, C. canephora CDS with UTR and C. arabica EST database FQA feruloylquinic acids FQT feruloyl-CoA quinate feruloyl transferase G Green gal α-galactosidase GI Geninfo identifier GLUA2 glutelin type A2 GO gene ontology GPAT glycerol-phosphate acyltransferase GRP glycine-rich RNA-binding 1 HAD hydroxyacyl-acyltransferase HCT quinate hydroycinnamoyl transferase HCTa shikimate O-hydroxycinnamoyltransferase transcript a HDT of Timor HEGs Top ten highly expressed genes

XVIII

HQ high quality HQT hydroxycinnamoyl-CoA quinate hydroxycinnamoyl transferase Hsps High-scoring Segment Pairs HT Hexose transporter ICE isoform-level clustering ICO International Coffee Organization ID cumulative identity INV invertase IPS InterProScan IREH1 probable serine threonine kinase IREH1 ISTR inverse sequence-tagged repeat KAR ketoacyl-ACP reductase KAS ketoacyl-ACP synthase KCS beta ketoacyl-CoA synthase KCS11-like 3-ketoacyl-CoA synthase 11-like KEGG Kyoto encyclopaedia of genes and genomes KFA kynurenine formamidase L lower LACS long-chain acyl-CoA synthetase LG coffee beans from the lower canopy at the green stage LOX5_4 probable linoleate 9S-lipoxygenase 5 isoform 4 LPAAT acylglycerol-phosphate acyltranferase LPC lysophosphatidylcholine acyltransferase LPCAT lysophosphatidylcholine LQ low quality LR coffee beans from the lower canopy at the red stage LRR Leucine-rich protein LY coffee beans from the lower canopy at the yellow stage MAT malonyl-CoA ACP transacylase MPSS massively parallel signature sequencing MS mannan synthase MYB90 MYB transcription factors ncRNA non-coding RNAs NFL non-full-length NGS Next-Generation Sequencing NMT N-methylnucleosidase OEE2_1 oxygen-evolving enhancer protein 2 isoform 1 OLE oleosin family PAL L-phenylalanine ammonia-lyase PAP phosphatidate phosphatase Pat patatin PC phosphatidylcholine PDAT phospholipid diacylglycerol acyltransferase PDH pyruvate dehydrogenase

XIX

PG polygalacturonase-like PL8 probable pectin lyase 8 PsbW photosystem II reaction centre W protein Qcovs query coverage QTL quantitative trait loci R Red R vs Y red vs yellow stage RAPD randomly amplified polymorphic DNA RD22 BURP domain RD22 RFLP restriction fragment length polymorphisms ROIs reads of insert SA salicylic acid SAC3 11S correlated with phosphoinositide phosphatase SAC3 SAD stearoyl-ACP desaturase SAGE serial analysis of gene expression SAM S-adenosylmethionine SDIR1 E3 ubiquitin- ligase SDIR1 sks5 SKU5 similar 5 SNP single nucleotide polymorphisms SPP sucrose phosphate phosphatase SPS sucrose phosphate synthase SSR simple sequence repeats or microsatellites ST Sucrose transporter SUS sucrose synthases TCS1 tea caffeine synthesis 1 TPM Transcripts Per Kilobase Million U upper UBQ10 polyubiquitin 10 UG coffee beans from the upper canopy at the green stage uORFs upstream open reading frames USX UDP-XYL synthase UY coffee beans from the upper canopy at the yellow stage VIN vacuolar invertase WGS whole genome sequencing XTH30 probable xyloglucan endotransglucosylase hydrolase 30 XTHB probable xyloglucan endotransglucosylase hydrolase B Xyl2-like beta-xylosidase/alpha-L-arabinofuranosidase 2-like Y Yellow Y vs G yellow vs green stage

XX

Chapter 1 Introduction

1.1 Goals and Challenges

The primary goal of this thesis is to distinguish the quality difference of coffee beans collected from the upper and the lower canopy. This would bring profits to the farmers without any special skills needed and provide high quality coffee to the market. In detail, three aims were included to achieve this goal. Aim one is to investigate the quality variation in coffee beans from the upper and lower canopy. Aim two is to identify the gene expression regulation and growing micro-environment influences on coffee bean ripening. Aim three is to associate coffee quality with gene expression. This study would improve the understanding of the molecular and genetic basis of coffee quality and provide reference for future breeding and studies.

Coffee quality is a complicated trait involved with numerous parameters, including physical, chemical and sensorial parameters (reviewed by Chapter two). Therefore, the first challenge is to select the suitable parameters to screen and detect critical traits to evaluate coffee quality (conducted in Chapter three). Transcriptome analysis captures all the genes expressed in the bean at a certain time point and environment. However, without a reference, it is a challenge to know what genes are expressed in the cell and their expression levels. Chapter four was designed to build a reference transcriptome for transcriptome wide gene expression studies. Moreover, environment influences on the quality (traits) are likely to be a long-term process to change the chemical components accumulated. Hence three developmental stages of coffee bean ripening were selected to study on this dynamic change (Chapter five). In the end, the quality traits and the transcriptome datasets were associated to emphasis the quality varies from the upper to the lower canopy (Chapter six). Challenges and future studies were described in chapter seven.

1.2 Contributions

The research work documented in this thesis generated a number of contributions, such as the following.

• Explains the sensorial difference of the beans from the upper and lower canopy. • Explains the physical difference of the beans from the upper and lower canopy. • Explains the chemical component difference in caffeine, trigonelline and sucrose of the beans from the upper and lower canopy.

1

• Generated the first long-read coffee bean transcriptome. • Improved the understanding of key components accumulation in ripening coffee bean from different parts of the canopy. • Explains the microenvironment influences to coffee quality with comparative transcriptome and associated studies.

1.3 Thesis Outline

This thesis was drafted according to the structure in Figure 1. First the problem was introduced in chapter one. Previous methods conducted, research gaps remained and methods proposed to solve as well as research questions involved were discussed in the following paragraphs of chapter two. Then in chapter three to six, the detailed proposed methods and results were documented. In the end, discussion was drafted comparing to the previous studies and future improvement was suggested to understand the research questions. Future perspectives were proposed in chapter seven for further analysis.

Figure 1 Project outline of this thesis draft.

2

Chapter 2 Literature review

The aim of this chapter is to provide a comprehensive review of coffee quality, genomics, genotype and environment influences and the previous methods to analyse coffee quality and environmental influences influences. This chapter will be the fundamental knowledge of the following chapters.

Coffee plants are evergreen shrubs from the genus Coffea, belong to the family Rubiaceae [1]. Coffee is a brewed beverage comes from roasted coffee beans [2]. Coffee "berries" are picked when ripe and then processed to obtain green beans. Seeds inside the berries are dried to obtain green beans, which are roasted according to the desirable flavour, then processed with grinding and brewing to make the coffee beverage. Coffee plants are cultivated primarily in equatorial places, such as Latin America, Africa and South Asia. Brazil leads the world coffee production by 40%, followed by Vietnam (~20%), Colombia, Indonesia, Honduras, Ethiopia, India, Uganda, Peru Mexico and others [3] Although there are more than a hundred species within the genus Coffea, only two species are highly regarded as a source of coffee beverage worldwide, Coffea canephora (also known commonly as Robusta coffea) and Coffea arabica (also known as Arabica coffee).

The C. canephora plant is generally better adapted to lowlands of less than 800 m above sea level, like the production areas in West and Central Africa [4, 5]. The C. canephora plant is an open-pollinated diploid species with strong adaptation capacity, however, it is strictly allogamous, that is only outbreeding and homozygous genotypes can only be obtained by haplodiploidization [6]. Therefore, Robusta coffee has highly diverse ecotypes and each of them is unique from one another in genotype aspect [2, 7]. With high resistance to coffee leaf rust, C. canephora is more productive than C. arabica [8]. Robusta coffee accounted for 35.2% world coffee production in 2018. [9]. Robusta coffee has a poor cup quality but very high resistance to disease.

Originating by hybridization of two related diploid species, Coffea eugenioides (E genome) and Coffea canephora (C genome), C. arabica (2n=4x=44) has limited diversity since it is the only self-fertile or autogamous (CaEa genome) species of the genus Coffea [10]. Nevertheless, with environmental influences, considerable diversity is also observed in this species [2, 11]. Arabica coffee prefers tropical highlands around 600 m to 1500 m above sea level, such as southwest Ethiopia [2, 10, 12-15]. In 2018, Arabica coffee contributes to 62.2%

3 of coffee production worldwide [9]. The coffee produced from C. arabica is associated with high quality compared to C. canephora, for example it has a stronger aroma [5, 16].

The number of coffee farming families worldwide is approximately 25 million with more than 70% being small farmers with land of fewer than ten hectares but producing 75 % of world coffee [17, 18]. The world coffee market is dynamic with the coffee price changing dramatically year by year (Figure 2). This “roller coaster” is more than a change of price but brings immense international social, economic and political impacts and results in large numbers of farmers and rural workers confronting bankruptcy, migration, and hunger [19].

Price change has resulted from crop failure and overproduction. Crop failure is resulted from disease, insects, land stress, climate change and other environmental factors. Scientists from different parts of the world have been dedicated to these fields of research for years. To deal with overproduction, an international coffee agreement was introduced to the coffee producing countries in 1962 using export quota to stabilise the coffee price at a high level. However, this agreement was abandoned in 1990. It is believed new impetus is required to boost the coffee supply chain. Recently, the awareness of quality, taste, health, and environmental issues has increased among coffee consumers bringing growing demand for high quality and specialty [20, 21]. High-quality coffees offer price premiums to producers.

Coffee prices over the last 45 years 4 3.5 3 2.5 2 1.5 1 0.5

US dollar/lb US 0

20/08/1985 20/08/2017 20/08/1973 20/08/1975 20/08/1977 20/08/1979 20/08/1981 20/08/1983 20/08/1987 20/08/1989 20/08/1991 20/08/1993 20/08/1995 20/08/1997 20/08/1999 20/08/2001 20/08/2003 20/08/2005 20/08/2007 20/08/2009 20/08/2011 20/08/2013 20/08/2015 Year

Figure 2 Coffee price over the last 45 years [22] 2.1 Introduction to Coffee Quality

Coffee quality is a complicated trait, which has multifactorial determinants influenced by factors such as genotype, environment and postharvest treatments. [23-26]. Among these,

4 genotype is essential, for it determines various key characteristics, such as bean size, caffeine content and flavour [27, 28]. Manufacturers play an important role in the assessment of coffee quality since they select the raw beans for processing good quality coffee and would be interested to select coffee beans with the bean quality they are interested in [2]. A better understanding of the quality traits, such as physical and chemical traits, has obtained significant improvements to coffee quality [29, 30].

Generally, coffee quality should focus on the final utilization and consumer preferences that can be assessed in three primary ways, namely, green coffee bean grading, sensory evaluation and chemical analysis [19]. Green coffee bean grading aims to analyse the bean size distribution, the amount of defective beans as well as bean colour. Sensory evaluation is a process where coffee flavour is determined. Finally, chemical analysis allows determination of moisture content and some chemical key components related to quality, such as caffeine and sucrose.

2.1.1 Green Coffee Grading

2.1.1.1 Bean Size

Bean size is an important quality trait since coffee grade is correlated with price; the smaller the bean size the lower the price will be. However, larger beans do not always taste better. Ideally, uniform bean size is favourable in as less roasting time is needed for smaller beans while more time needed for larger beans. Improperly roasted beans will certainly result in poor cup quality [31, 32]. Bean size varies with coffee genotype and environment [32-34]. For example, Arabica beans are larger than Robusta beans with 18-22 g and 12-15 g per 100 beans respectively [2]. It has been found that there is a negative correlation between shade intensity and bean size [32, 35].

2.1.1.2 Defective Beans

Defective beans are normally rejected in coffee processing. For each coffee bean roasting and shipment, characterisation of whether green beans match the required quality is important. Generally, roasting of defective beans impairs taste and flavours and affect cup quality [36, 37]. It is preferable not to roast beans from different species, such as Arabica with Robusta due to their different flavour. Basically, there are three types of defective beans: 1) field damaged beans from genetic and environmental effects (soil, climate (for example drought has a negative effect that may produce empty beans), pests, diseases and crop

5 management and so on); 2) process damaged beans which are spoiled primarily by processing like pulping and washing; 3) storage damage due to imperfect storage or pests.

2.1.1.3 Bean Colour

The colour of green beans is a sign of freshness, moisture and homogeneity. Genotype influences bean colour. The green-bluish colour of washed Arabica is considered premium while it’s the browner beans in the case of Robusta [2]. Like bean size and defective character, bean colour changes with different environments and crop management, for example, coffee grown at high altitude is often greenly-blue and if grown in soil lacking zinc, coffee beans become light-grey in colour [2].

2.1.2 Sensory Evaluation

Flavour is the primary standard in worldwide coffee trade [38]. Despite having a fine bean size and good appearance without defective beans, the coffee flavour could be poor. For that reason, coffee flavour assessment is essential to quality measurement according to the particular utilization proposed for the coffee, like roast and ground and liquid . Cup quality analysis aims to evaluate the coffee in an objective and reproducible way to create a flavour profile in terms of established terminology. A group of trained people judge a coffee sample for its flavour, mouth-feel and aftertaste. The basic terminologies used are aroma, flavour, body and acidity. A coffee flavour profile vocabulary of aroma, taste and mouth-feel have been established by the International Coffee Organization (ICO) as well [39].

Compared to the above physical parameters, flavour (cup quality) is more sensitive not only to genotype but to environment and other relevant factors. It is known that genotype affects coffee flavour. For example, the acidity ranges dramatically in different washed Arabica coffees, while Robusta coffees are described as low or no acidity at all with coarse liquor, harsh and cereal notes and thick body [33]. Ultimately, Arabica coffee is sold as blends with varying proportions of Robusta coffee, but Robusta coffees are seldom used alone [2]. During coffee development, environment is a very important factor that influences coffee flavour [26, 32, 35, 40, 41]. Varieties such as Blue Mountain, SL-28, Pluma Hidalgo are famous worldwide due to premium flavour. However, if grown in another zone instead of the preferred area, they do not always have the good flavour. [42]. It is believed that the interaction of genotype and a particular environment results in premium coffee, such as Blue Mountain, which is only a premium coffee when grown in Jamaica. But little is known of how these combinations of genotype and environment generate high quality coffee [2].

6

Other than factors such as pest, climate (hail or altitude), growth condition and lack of zinc that influence the physical quality, pesticides may bring unpleasant coffee flavour as well [26, 32, 35, 40, 43, 44]. Harvest and post harvesting processes affect coffee quality. Mixed green, ripe, red and black beans produce inferior flavour coffee, therefore, only ripe beans are used and great manual effort is needed in harvesting. Unlike other plants, coffee lacks the regulatory mechanism to drop over-ripe fruits and this over-bearing leads to nutrient deficiency and finally leads to immature beans that influence the flavour. Post-harvest treatments that influence flavour include three processing methods (wet, semi-dry and dry), pulping, washing, drying and husking [2]

2.1.3 Chemical basis of coffee quality

The chemistry of coffee quality is highly complex including a vast spectrum of compounds influenced by numerous factors. These substances contribute to the coffee flavour and aroma as well as our health.

2.1.3.1 Caffeine

Caffeine, a key chemical component of the coffee plant, was isolated initially by Runge (a German chemist) in 1819 [45]. Nowadays, caffeine is the world-famous, behaviourally active drug consumed primarily from coffee. Caffeine consumption each day varies between 70 to 76 mg/person worldwide. But in the US and Canada it reaches 210 to 238 mg/person and in Sweden and Finland it is even more than 400 mg/person. Around 80 to 100% of the caffeine comes from the intake of coffee alone [46, 47]. Recent sequencing of the Robusta coffee genome revealed that caffeine had evolved separately in coffee and other plants such as tea suggesting a biologically important role for caffeine [48].

Caffeine is a stimulant to central nervous system and a relaxant to smooth muscle. Caffeine is utilized widely in analgesic remedies additive [49]. Caffeine has various health effects when consumed moderately, including increases in energy availability, alertness or wakefulness and the ability to focus, decreased fatigue and boosted physical performance. Nevertheless, consumption of too much caffeine will result in undesired effects like cardiovascular disease, depression, oversensitivity, anxiety and irritability and even addiction [50]. Once consumed, caffeine is absorbed completely in one hour and distributed evenly throughout the body. The plasma level then peaks after 30-60 min. Research shows that 100 ml of coffee containing 55 g/L of roasted ground coffee, considering 70% Arabica coffee around the world, the caffeine content intake in average is around 100 mg caffeine. Caffeine also plays an ecological role as

7 a natural chemical defence for the coffee plant against animals, such as herbivory and insects, but clear experimental support for these activities is difficult to obtain [51, 52].

Caffeine (1,3,7-trimethylxanthine), a classical purine alkaloid, naturally produced in seeds, flowers and leaves of over 80 plant species [53], but high concentrations accumulate only in several species, like Coffea arabica (Arabica coffee), Camellia sinensis (tea) and Theobroma cacao (cacao). Caffeine content of green coffee beans varies significantly in different genotypes; there is considerable caffeine in green coffee beans of current commercially cultivated coffee plants; caffeine in Arabica coffee varies from 1.2 to 1.4%, while in Robusta coffee, it ranges between 1.2 and 3.3% [54]; in some wild coffee species, however, the caffeine content is either undetectable or at rather low levels [54-56]. In the two cultivated species, the caffeine content is influenced by the fruit development, the maturity period, and the daily filling of caffeine in the coffee fruits [57]. Postharvest processing does not affect the caffeine concentration in the coffee and only a small amount of caffeine is affected by roasting process [58, 59].

2.1.3.2 Trigonelline

Trigonelline is distributed widely in plants such as plants [60]. Trigonelline in raw coffee beans varies from 0.3 to 1.3% [61]. Next to caffeine content, trigonelline is another alkaloid highly present in coffee beans. As a pyridine derivative (see Figure 3), trigonelline is an aroma precursor that contributes to the desirable flavour products formed during coffee roasting, such as pyrazine and furans [62-64]. Trigonelline decomposes rapidly depending on the roasting temperature and time. A de-methylation process occurs in the degradation process generating a water-soluble vitamin B, nicotinic acid, which is fairly bioavailable in coffee beverages compared with other natural sources [65]. Three and a half standard cups intake of coffee per day allow for one-third of the adult daily minimum dietary requirement of nicotinic acid [66]. Therefore, coffee is a significant dietary source of nicotinic acid [67].

Figure 3 Trigonelline

8

In the coffee plant, trigonelline is synthesized in the leaves and fruit pericarp, after which it accumulates in coffee seeds. Trigonelline content shows associations with different genotypes, for instance, Arabica coffee was shown to have higher trigonelline content compared to Robusta coffee. This contributes partly to the higher quality of Arabica coffee and increasing the trigonelline content could probably be a way to improve the cup quality of Robusta coffee [5, 68]. Environment is another factor that influences the trigonelline content [69]; previous work showed a higher content in sun-grown coffee beans [35]. The trigonelline content is higher during all stages of development of coffee beans grown at higher elevations [70].

2.1.3.3 Chlorogenic Acids

Chlorogenic acids (CGA) are a group of phenolic compounds with numerous health benefits, such as prevention of cancer, cardiovascular disease, diabetes, Alzheimer’s disease and increase antibacterial and anti-inflammatory activity [71-75]. Coffee, especially the green bean, is a major natural source of CGA that reaches up to 12% of dry matter in the bean [5]. Other than health effects, CGA in coffee contributes to the pigment of the beans, formation of aroma and astringency [76, 77]. Furthermore, CGAs are significant plant metabolites that are associated with the protection of plant cells against insults, for example, oxidative stress, UV irradiation and infection [78-80].

Thirty different species of CGA have been detected in green coffee beans. These CGAs are classified in three subgroups: caffeoylquinic acids (CQA; 5CQA included), di-caffeoylquinic acids (diCQA) and feruloylquinic acids (FQA) [81]. The major group of CGAs is CQA, with respective compounds of 5CQA that accounts for 56- 62% of total CGA content [82]. However, CGA content is influenced by a series of factors, such as processing, phenotype and environment. Similar to trigonelline, CGAs are thermally unstable and degraded during roasting to generate phenolic substances which account for the bitterness of coffee [83]. The CGA content of different varieties varies widely. Arabica contains less CGA than robusta coffee, with 4.0-8.4% and 7.0-14.4% respectively [84]. CGAs accumulate at higher levels with increasing altitude [30].

2.1.3.4 Sucrose

As a major free sugar present in green coffee beans, sucrose plays an important role in coffee aroma [38, 68]. Sucrose is a flavour precursor that is broke down to anhydro-sugars (like 1,6 anhydro-glucose) and other compounds (such as glyoxal) immediately once roasted [64].

9

These substances then react with amino acids in Maillard reactions [85]. Desirable volatile and non-volatiles components including furans, pyrazine, aliphatic acids and hydroxymethyl furfural are formed and are widely regarded as contributing positively to coffee flavour [86]. However, Maillard reactions not only bring pleasant substances but also generates undesirable compounds such as acrylamide or substrates that can be activated metabolically to form carcinogens with sulfotransferases [87, 88]. Sucrose is detected at 7-8% of the dry matter content, which contributes to over 90% of total low molecular carbohydrates in coffee [89]. The other low molecular sugars, for instance, glucose and fructose may also degrade or react with amino acids, but in green beans they are recorded in trace amounts, representing only about 0.5% of the total [90].

Unlike other compounds, sucrose was found not significantly influenced by processing [89]. Nevertheless, altitude and slope exposure significantly influence sucrose accumulation in coffee beans [26]. Moreover, the sucrose content depends on the genotype; sucrose accounts for 5.1- 9 % dry matter in Arabica coffee while only 4-7 % in Robusta [5, 68, 91].

2.1.3.5 Lipids

Coffee lipids, range between 10-13% of the bean and are composed of different classes of lipids [92]. In green beans, most of the lipids in the endosperm are triglycerides (fats), diterpenes, sterols, phospholipids, esters with fatty acids and tocopherols. Lipids in the outer layers of the bean include coffee wax, which is mainly 5-hydroxytryptamide esters with fatty acids [59]. Lipids in coffee contribute to the texture and mouthfeel of the beverage [59]. Triacylglycerols are 75% of total lipids and diterpene fatty acids are 20% of total lipids [93]. and in the diterpene fatty acids class attract much attention owing to physiological effects [94]. It is paradoxical that cafestol (and Kahweol) may increase serum cholesterol; but also have potential to prevent carcinogenesis [95, 96]. Tocopherols are another important group of lipids in green beans but are only present in small amounts. Tocopherols have antioxidant activities. Together with tocotrienols, they belong to eight vitamins and contribute to the vitamin E [97, 98]. Accounting for only 0.1-0.3% of total bean weight, coffee wax is sometimes removed by technological treatment such as polishing, dewaxing or decaffeinating which make the coffee beverage more attractive to consumers [59].

Roasting does not change most of the coffee lipids, however, with strong roast, lipids can accumulate at the bean surface. During roasting, cafestol and kahweol produce

10 dehydrocafestol and dehydrokahweol respectively; these new compounds increase with higher temperatures [99-101]. Research shows that the ratio of cafestol to dehydrocafestol is a measure degree of roasting; for example, a ratio of 25-40 is characteristic of a well-roasted coffee; for strongly roasted coffees, the ratio decreases to 10-15 [101]. A higher content of lipids was found in Arabica coffee compared to Robusta coffee. Kahweol is specifically found in Arabica coffee while 16-O- methylcafestol (16-OMC) is unique in Robusta coffee. Cafestol and kahweol are sensitive to light, heat and acids, and kahweol is unstable in a purified form [94]. However, despite the important role of lipids in coffee beans, they are difficult to retain in the final beverage. The content of lipids in a coffee brew depends on the preparation; for the normal filtered preparations, the lipids are less than 0.2%; for the strongly roasted espresso, lipids accounts for 1-2%; in the Scandinavian style coffee, lipids reach up to 22% [75, 94]. These differences result from the different temperatures and times used during ; higher temperatures and longer time during coffee brewing brings more lipids into the beverage [102, 103].

2.1.3.6 Volatiles

Volatiles in coffee are generated in a wide range of reactions and interactions: Maillard reactions (especially non-enzymatic browning), Strecker degradation, degradation of sulphur amino acids, proline, trigonelline, quinic acid moiety and minor lipids (mostly diterpenes). [104, 105] The volatiles in coffee are also quite complex and are influenced by genotype, growing location and processing. [106] Around 200 volatiles have already been detected in green coffee beans, while the number of volatile compounds formed on roasting are more than 1,000 [107]. However, only a small number of components significantly affect coffee flavour [59, 108, 109]. Many of these compounds are not stable and decrease greatly over time in the brew [110].

The volatiles in Arabica coffee have been recently reviewed, which are be primarily classified as furans, sulphur-containing substances, pyrazines, furanones and phenolic compounds [107]. The furans and pyrazines are more abundant in coffee but the most significant contributors to coffee flavour are sulphur-containing compounds and pyrazines [108].

Important sulphur-containing compounds include thiols, such as 2-furfurylthiol, which has a low concentration but present relatively low sensory threshold (0.01 ppb) and a strong roasted aroma [111-113]. Other thiols with low sensory thresholds and high contributions to flavour

11 include 3-methylthiophene and 2, 4-dimethyl-5-ethylthiazole, which is characterized in coffee that have meaty flavours [114-117].

Pyrazines were also found in low sensory threshold concentrations but very important to coffee flavour. For instance, 3-isobutyl-2-methoxypyrazine has an extremely low sensory threshold (0.002 ppb) but a significant effect on Arabica coffee flavour [118]. The taste of nutty, earthy, roasty and green aromas were characterized typically in pyrazines [119]. However, ethylpyrazines and ethenylalkylpyrazines in this class were contribute to the earthy aroma in Robusta coffee, while 2-ethyl-3,5-dimethylpyrazine and 2,3-diethyl-5- methylpyrazine contribute to the roasty and earthy flavours [59].

Furans present pleasant flavour in coffee but also associate with harmful effects to our health [120, 121]. These compounds come from thermal degradation of carbohydrates (Millard Reaction), ascorbic acid (highest level), or unsaturated fatty acids during roasting [122]. The key odour contributing furans in coffee are thought to be furfuryl alcohol, which show the lowest vapour pressure in the headspace over brewed coffee and typically have bitter and burnt aromas [123, 124]. Important flavour compounds were also found like furfuryl acetate, 2-furfural, 2-methylfuran and 5-methylfurfural in ground roasting coffee [125]. However, it is reported that alkyl- and alkenyl-substituted furans, like 3-methylfuran, are not responsible for sensory differences [126]. The furans are present in coffee brew at 3–115 ppb [127].

Furanones, which have a sweetish and caramel-like aroma, are formed in coffee through Maillard reactions and aldol condensation subsequently [86]. Within this volatile class, 4- hydroxy-2,5-dimethyl-3(2H)-furanone and 2(5)-ethyl-4-hydroxy-5(2)-methyl-3(2H)-furanone account for caramel-like aroma while 3-hydroxy-4,5-dimethyl-2(5H)-furanone and 4-ethyl-3- hydroxy-5-methyl-2(5H)-furanone are responsible for spicy aroma [59].

Formed from the breakdown of chlorogenic acids in roasted coffee beans, Guaiacol, 4- ethylguaiacol and 4-vinylguaiacol and vanillin (phenolic compounds) contribute to spicy phenolic aroma [86, 128]. The content of this volatile class varies from 3 to 56 ppm in coffee beverage based on the variety and geography selected and importantly the chlorogenic acids amount and species in green beans [129].

Each volatile group has remarkably different aroma components. Volatile flavour compounds in coffee are known to be influenced greatly during processing and post-harvest handling. They are also influenced by genetics and environment. The major differences between arabica and robusta volatile composition includes 2-methylisoborneol (the key aroma

12 components in robusta coffee), 2-ethyl-3,5-dimethylpyrazine and 2,3-diethyl-5- methylpyrazine [130]. As these volatile phenolic compounds are derived from chlorogenic acids, this can also be a way to distinguish these two species since there is a significant difference between the chlorogenic acids in arabica and robusta coffee [86]. Environment is thought to influence the volatile composition of coffee. For example, high air temperatures were found to induce accumulation of volatiles in green coffee, meanwhile, the presence of ethanol and acetone were characteristic of cool temperatures [131].

2.2 Introduction to Coffee Genetics

2.2.1 Classical genetics

2.2.1.1 Coffea arabica

Arabica coffee are primarily homozygous, obtained through self-pollination, such as the famous variety Mundo Novo [132]. Originally from Ethiopia, arabica coffee was introduced to Yemen by the Arabs in the 13 to 14th century. Later the Dutch brought this species to Indonesia around 1700, and then to Central and South America [2]. This group of arabica coffee genotypes is known as Typica and includes the famous Blue Mountain [133]. Typica is relatively low yielding (less than average) with large and elongated beans and small elongated leaves but presents good cup quality. However, the problem is that Typica is involved with all primary coffee diseases except for the varieties Guatemala and Blue Mountain that show resistance to Coffee berry disease (CBD). Typica is now found in Colombia and Central America [134, 135]. Another group of genotypes is based upon plants introduced to Bourbon Island and to Africa, South and Central America; this group was named Bourbon and is more productive and diverse than Typica [134, 135]. Researchers planted Typica and Bourbon side by side and found some hybrid genoptypes, Mundo Novo, for example, derived from Bourbon and Sumatra (Typica). Bourbon is now found in Colombia, Central America and West Africa [2]. Bourbon has broader leaves, round cherries, shows high cup quality and yield but is also susceptible to all main coffee diseases and pests [135].

Since there is lower resistance to leaf rust but higher flavour quality of arabica coffee, spontaneous interspecific crosses between C. arabica and C. canephora [136] have been identified to combine the desirable characteristics. The most famous is Hybrid of Timor (HDT) with leaf rust resistance and higher productivity, discovered around 1920 [137]. Another important example is Catimor, which originated from a cross among Caturra and

13 two plants of HDT that were resistant to all known leaf rusts [138]. Naturally occurring accessions derived from seed progenies of coffee trees from Ethiopia, Sudan or Kenya have high yield and resistance to coffee leaf rust. Famous varieties from Ethiopia, like , were taken to Kenya, India and other places since 1930. In the 1940’s, collections from Sudan in Boma Plateau were found to be relatively resistant to CBD and leaf rust. Another significant collection, with 400 accessions, was made after an FAO expedition (1964) in Brazil, Costa Rica, India and Tanzania; Field-testing did not yield high performing lines, but three varieties were found with almost no caffeine within the 3000 Ethiopian arabica coffee genotypes identified. Later on, selection was carried out to search for varieties resistant to CBD. Early accessions, such as Typica and Bourbon, from Yemen became commercial varieties worldwide.

Genetic techniques have been used to evaluate intra- and interspecific variation using enzymatic or biochemical markers [5, 55]. Classical methods at the DNA level to analyse the genotype diversity include randomly amplified polymorphic DNA (RAPD), restriction fragment length polymorphisms (RFLP), inverse sequence-tagged repeat (ISTR), amplified fragment length polymorphism (AFLP) and simple sequence repeats or microsatellites (SSR) [136, 139-142]. With the development of the single nucleotide polymorphisms (SNP) technique, genes associated with biosynthetic pathways and metabolism in coffee bean are able to be analysed [143, 144]. Significant differences were also found in Typica and Bourbon coffee and there was a large divergence between these two groups [145]. It was found that C. arabica is the hybrid of two species C. eugenioides (maternal) and C. canephora (paternal) respectively [10].

Despite there being 40 mutants reported in C. arabica plants, only a few have traits or favours of commercial interest, including bean color, bean size, special flavour, male sterility or low caffeine content [2]. For example, the , “Laurina” or “Bourbon Pointu” originating from Reunion Island contains relatively low caffeine (about 0.6%). A recessive gene from a Bourbon genetic background was found to regulate this trait [146]. Natural crosses were found between C. arabica and diploid species containing significant commercial traits. Crosses with C. liberica were also discovered; like the popular S795 family which contains the SH3 resistance gene from C. liberica [136]. Hybrids between C. arabica and C. canephora, like hybrid of Timor have resistance to all known H.vastatrix [147]. Some other crosses were also shown to transfer traits to C. arabica [148].

14

2.2.1.2 Coffea canephora

Due to low resistance of C. arabica to the destructive leaf rust, Dutch brought C. liberica to Indonesia around 1875. This species is better adapted to lower altitude sites and is resistant to leaf rust [2]. The beverage quality of this species is poorer than either Arabica or Robusta coffee [149]. Two decades later, C. liberica was almost destroyed by leaf fungus and was thus replaced by another coffee species, C. canephora from 1910, which had been first introduced to Java in 1900 [150]. The famous robusta coffee was selected among C. canephora coffee types because of the higher productivity and improved resistance to leaf rust [68]. As a result, a large amount was cultivated in the lower coffee land in Indonesia instead of C. arabica and C. liberica [151]. In the 20th century, robusta coffee was re- introduced into Africa. Whilst, a similar substitution by C. canephora occurred in India [2].

Wild plants are found in humid tropical lowland Africa. To protect against the loss of this coffee germplasm due to deforestation, a large collection was organized by a French institute in the 1970s. A total of 1000 genotypes were selected in two collections, one is the Guinean collection from Ivory Coast and Guinea, the other one is from Central African for Congolese collection [152]. The first collection, the Guinean collection, has rather small and elongated leaves, as well as small beans, which are lower than 11 g/100 dry matter. Many varieties show tolerance to drought, but most of them are susceptible to local leaf rust. The other collection (Congolese) contains relatively large and broad leaves, larger beans (13 g/100 dry beans); most plants are susceptible to drought but have a high resistance to leaf rust. This collection has been planted since the cultivation of C. canephora and shows better agronomic performance than the Guinean collection [152]. Both the collections are well represented in the collection of the Ivory Coast. Unfortunately, other countries have accessed almost exclusively the Congolese collection. With intense research into the morphological variation of coffee, another group, the Ugandan Group was found [153].

C. canephora was introduced to Indonesia in 1900 from different countries. The main selected cultivar, Laurentii, originated from the Democratic Republic of Congo [154]. This cultivar is productive and resistant to leaf rust. In Africa, the unselected C. canephora commercial can be divided into two groups, robusta and kouillou groups. C. canephora was introduced from Africa to Brazil at the same time it was taken to Indonesia (1900) [155]. Only a few mutants in C. canephora have been found, like Mocha, with a dwarf trait that would be of interest in more intensive cultivation systems [154]. Spontaneous

15 diploid hybrids between C. canphora var. Ugandae and C. congensis, like Congusta or Conuga have been significant in coffee production with a high level of yield and good quality [154].

2.2.2 Coffee Genomics

2.2.2.1 Coffee Genome

Despite there being considerable diversity between the different coffee species, they share a common base genome. The only amphidiploid (2n=4x=44 chromosomes) in the coffee plant group, C. arabica, has a genome size estimated at 2.62 pg of DNA per nucleus by flow cytometry while others are diploid, such as C. canephora with a genome size of 1.41 pg [156, 157].

In order to preserve genetic resources and explore interspecific hybrids for breeding programs, wild Coffea species were collected from Madagascan and African tropical forests during the 1970s and the 1980s. Since the 1980s, genetic linkage map construction and quantitative trait loci (QTLs) identification have been explored especially in C. arabica. However, C. arabica cultivars have a narrow genetic base of cultivated form based on foundation effect and Ethiopian genetic resources are unavailable for geneticists. Despite it having a large natural diversity, diploid status and being the main source of , C. canephora has not been considered as an important resource because of the lower quality of the coffee. However, because the allotetraploid trait of C. arabica creates problems in assembly of the genome sequence, the scientific community agreed to focus on C. canephora in 1989 [158]. With the stimulation of Arabidopsis genome research in the 1990s, synteny was used to extrapolate results from one species to other related species, for instance, rice to sorghum and sugarcane [159] However, for coffee, researchers began to develop markers in the 1980s. The whole genome sequencing (WGS) methods developed recently offer access to all genes, provide genome information on regulatory regions, structure and noncoding sequences. The whole sequence of the C. arabica chloroplast was published in 2007 with a 155 189 bp length genome. [160] In this work, 112 distinct and 18 duplicate inverted repeat contribute to the 130 genes presented; 79 protein genes, 29 transfer RNA genes, 4 ribosomal RNA genes and 18 genes with introns [160]. The whole nuclear genome of C. canephora, published in 2014, was sequenced using WGS with 30 × coverage. A high density genetic map has been provided and the complete genome is estimated to 710 Mb. Assembly data shows 25,216 contigs and 13,345 scaffolds, which account for 80% of the whole genome (568.6 Mb)

16 resulting in 17% of the genome being inter- contig gaps (97 Mb) [48]. Recently, a draft genome of C. arabica (K7) is released with 76,409 scaffolds (N50 of 54,544 bp) and 99,829 gene models [161]. Another Arabica genome based on Red Bourbon is also available recently with 36,864 genomic fragments and 78,311 genes [162].

2.2.2.2 Transcriptome

The transcriptome defines all the transcripts within a cell or tissue and help to investigate the mechanisms of control of gene expression during crucial biological processes and physiological condition. This means transcriptome analysis can increase understanding of coffee quality with candidate genes as well as potential genes not discovered in coffee yet. Transcriptome analysis has been used effectively to study physiological responses to abiotic and biotic stresses, disease, pest and berry development (such as biosynthesis pathways). [163-166].

Expressed sequence tags resources

Expressed sequence tags (ESTs) are cDNA sequences produced by high-throughput single- pass sequencing that have been a valuable resource in earlier work. With a complete genome sequence, ESTs can be used to deduct genome composition and organization, characterize the gene repertoire or explore plant gene evolution, validate gene predictions, identify transcribed coding and noncoding regions or alternative gene splicing and detect new allelic diversity. [167-169] There are more than 74 million ESTs publicly available since 2013 in the dbEST database with 174,275 ESTs for C. arabica and 69,066 for C. canephora. However, compared to Zea mays (maize), which has more than 2 million ESTs, this data is rather limited [170]. Despite numerous efforts to collect Coffea EST sequences from different research groups, most of the sequencing data belongs to private parties [171, 172]. Considerable work on ESTs was published around 2006. A French working group studied 5,534 unigenes from ESTs from C. canephora leaves and fruits [172]. A Brazilian Coffee Genome Project found 130,792, 12,381 and 10,566 ESTs from leaves and fruits of C. arabica, C. canephora and C. racemosa, respectively and 33,000 unigenes were identified [171]. 32,961 sequences were detected from C. arabica var Caturra leaf, flower and fruit separately by the CENICAFE group and assembled into 10,799 unigenes [173]. An Italian group found 1,587 non redundant ESTs from leaves and embryonic roots [174]. Another French group identified 5,534 unigenes from C. canephora leaves and fruits [175].

Coffee metabolism and Candidate Genes

17

Genome projects have started to focus on sequencing the genes related to important traits, for instance, flowering time [176, 177]. Coffee research has primarily focused on agronomic improvement of the plant. are most interested in improving or selecting for disease resistant Arabica coffee or higher cup quality Robusta coffee [158]. Recently, an increasing interest has been shown in research on genetic control of cup quality. To improve coffee beverage quality, it is important to understand coffee metabolism and the genes governing accumulation of vital compounds in beans and identify the molecular determinants of coffee aroma and taste. Caffeine, trigonelline, CGA, sucrose, lipids and volatiles are the focus of commercial importance [158]. There is still some research focus on fruiting cycle length, which shows an impact on chemical accumulation [176, 177].

1. Caffeine

The key biosynthesis pathway of caffeine is from xanthosine, 7-methylxanthosine, 7- methylxanthine, and theobromine to caffeine by three methylation steps (S- adenosylmethionine- (SAM) - dependent methylation steps) and a nucleosidase step (Ribose removal step) (Figure 4).

Figure 4 Major caffeine biosynthesis pathway in coffee. Note: Gene accession numbers are marked below the enzyme encoded in italics. The initially step is caffeine biosynthesis, catalyzed by xanthosine 7 N-methyltransferase, (EC 2.1.1.158) (7-methylxanthosine synthase), is to convert xanthosine to 7- methylxanthosine. Encoding genes of 7-methylxanthosine synthase are CmXRS1 (AB034699) and CaXMT1 (AB048793), which have been found in C. arabica [178, 179].

Secondly, 7-methylxanthosine is catalyzed by a nucleosidase. The N-methylnucleosidase (NMT) (EC 3.2.2.25) has already been characterized in tea leaves, however, the nucleosidase native to coffee still remains unknown at the protein level[180]. Methyl transfer and nucleoside cleavage are believed to occur together and be catalyzed by a single enzyme [181]. Subsequently are the methylation steps with native N-methyltransferase (EC 2.1.1.160) detected in coffee plants [182]. The gene encoding the protein with 7-methylxanthine

18 methyltransferase activity was designated based on the sequence of the tea caffeine synthase and named as CaMXMT and is involved in the third step to produce theobromine [183]. Another two homologous genes encoding theobromine synthases CTS1 (AB034700) and CTS2 (AB054841) had been cloned from young coffee leaves in 2003, although CTS1 was later found to be identical to CaMXMT [178, 179]. CaXMT1 was found to catalyze the third caffeine biosynthesis step and CaXMT2 was found to be involved in the last step to produce caffeine [179]. Mizuno et al. (2003) found a coffee caffeine synthase 1 (CCS1(AB086414)), which is homologues to CTS1 and CTS2 and with a similar function to the tea caffeine synthesis 1 (TCS1) that can catalyze the last two methylation steps [178]. Leroy et al. (2011) mapped two genes associated with caffeine metabolism to LGs A and I in C. canephora [184]. More recently in 2014, research showed a single gene family, ORTHOMCL170, clustering 23 genes including genes that encode caffeine biosynthesis. This work shows that the coffee caffeine biosynthetic NMTs are located in a gene clade different from its sister lineages, cacao and tea. Three tandem arrays found in microsynteny analyses of ORTHOMCL170, indicates that coffee caffeine synthase genes, CcXMT (encoding xanthosine NMTs), CcMTL, and CcNMT3, contribute to a metabolic gene cluster (a firm assemblage of co-expressed tandem duplicates) [48].

2. Trigonelline

Trigonelline formed from nicotinic acid. The enzymes of trigonelline biosynthesis are nicotine adenine dinucleotide (NAD) nucleosidase (EC3.2.2.5), nicotinamidase (EC 3.5.1.19) and nicotinate N-methyltransferase (trigonelline synthase) (EC 2.1.1.7) (Figure 5). Trigonelline synthase was first identified in peas and then partially purified from Glycine max [185-187]. However, the specific biosynthesis pathway has not been elucidated. Recently, Mizuno et al. (2014) isolated two homologous trigonelline synthases from coffee, encoded by CTgS1 (AB054842) and CTgS2 (AB054843) [188]. These two sequence were found to be similar to that of coffee caffeine synthases with an identity of 82% between CCS1 and CTgS1 [188].

19

Figure 5 Possible trigonelline biosynthesis pathway in coffee. Note: Gene accession numbers are marked below the enzyme encoded in italic.

3. CGAs

The synthesis of CGA is primarily based on a phenylpropanoid pathway that has been intensively studied, however, limited work has been conducted at the transcriptome level. The biosynthesis of CGA, involves the enzymes and encoding genes shown in Figure 6. The initial steps of this pathway converts phenylalanine to cinnamic acid. This step is catalyzed by the tetrameric enzyme, L-phenylalanine ammonia-lyase (PAL, EC 4.3.1.24), P450 cytochrome cinnamate 4-hydroxylase (C4H, EC 1.14.13.11) and 4-coumarate CoA ligase (4CL, EC 6.2. 1.12). The PAL catalyzed step is the important regulatory point in phenylpropanoid metabolism [189, 190]. The PAL genes are a multigene family that has been studied in many species [191-194]. The PAL genes are responsive to changing environment, including biotic or abiotic stress [195]. Compared to the sequence of CcPAL1 (first identified in C. canephora), two homologous of PAL genes (CcPAL2 and CcPAL3) were discovered with similar genome sequence and encoded proteins; CcPAL2 locates in coffee genome linkage group F was syntonic to PAL gene included in tomato chromosome 9, thus it was proposed to be the ancestral gene to C. canephora; CcPAL2 was highly expressed in many tissues of C. canephora, such as flowers and fruit pericarp; However, CcPAL1 and CcPAL3 were expressed highly in immature fruit [189]. C4H is responsible for the first oxygenation step catalyzation in the phenylpropanoid pathway in higher plants and it is subject to environmental stress [196].

Another two key transferases are quinate hydroycinnamoyl transferase (HCT) and hydroxycinnamoyl-CoA quinate hydroxycinnamoyl transferase (HQT). Research has shown increasing HQT expression in tomato corresponds to an increasing content of CGA while a HQT expression decrease leads to a CGA reduction. Moreover, an increase in HQT expression results in a CGA rise and this is linked with an improving tolerance to the oxidative stress and an increased resistance to bacterial pathogens [80]. The HQT and HCT of C. canephora have also been cloned and characterized at different stages of development of bean, pericarp and other tissues [197]. For the p-coumaroylester 3’-hydroxylases (C3’H), two full-length cDNA clones, C3H1 and C3H2 (CYP98A35 and CYP98A36), were extracted from C. canephora plants; C3H1 was found to be involved in CQA biosynthesis (Figure 6), while C3Hc2 was only involved in the alternative reaction to catalyze shikimate esters [198].

20

Currently, the biosynthesis of 5-feruloyl quinic acid is not clear. It is unconfirmed whether it is metabolized through a caffeoyl-CoA 3-O-methyl transferase (CCoAOMT) or by a CCoAOMT -like enzyme [199]. A RT-PCR study in 2010 identified transcripts in the CGA biosynthesis pathway in coffee seeds [200]. This research found that temperature has a direct influence on CGA accumulation. A negative correlation was found between CCR4 and CHR transcript accumulation in response to temperature in developing coffee bean at 120-150 days after flowering (DAF). In contrast, a positive correlation of C3Hc1, CCoAOMT1, F5H2, CCR1 and CCR2 expression with temperature at mature stages was identified. Interestingly, in the early development stage of the fruit endosperm, temperature regulates PAL2 and C4H. High temperatures resulted in over-accumulation of PAL2 and C4H mRNA at in coffee beans at 90-120 DAF. The reverse situation was then observed at 120-150 DAF under cool climates, which slows down the activation of phenylpropanoid genes [200].

Figure 6 Chlorogenic acids biosynthesis pathway in coffee

21

Note: PAL, L-phenylalanine ammonia-lyase; C4H, cinnamate 4-hydroxylase; 4CL, 4- coumarate:CoA ligase; HCT, quinate hydroycinnamoyl transferase; C3’H, p-coumaroylester 3’-hydroxylases; HQT, hydroxycinnamoyl-CoA quinate hydroxycinnamoyl transferase; CCoAOMT, caffeoyl-CoA 3-O-methyl transferase; FQT, feruloyl-CoA quinate feruloyl transferase.

4. Sucrose

Sucrose metabolism in coffee is associated with sucrose synthases (CaSUS1 and CaSUS2 in C. arabica), sucrose phosphate synthase (CcSPS1, CcSPS2 in C. canephora), sucrose phosphate phosphatase (CcSPP1 in C. canephora), invertase (cell wall invertase, cytoplasmic CaInv3 in C. arabica and cell wall CcInv4 in C. canephora) that catalyse the irreversible hydrolysis of sucrose) (Figure 7). Invertase inhibitors have been found in C. canephora: CcInv1, 2, 3, 4 [85, 201]. A study of sucrose metabolism during C. arabica fruit ripening revealed that the structure of CaSUS1 (subgenome copy of C. canephora) is significantly different from CaSUS2 in C. arabica (subgenome copy of C. eugenioides). Compared to CcSUS1, a split between exon 5 and 11 was found and 18 differences in the protein coding sequence of CaSUS1 was identified [85, 202]. Additional bands were discovered in CaSUS1 at the nucleic acid level compared to that of CcSUS1, CaSUS1 and CaSUS2 were probably different from one another as well [202]. The expression of CaSUS1 accumulated primarily in immature perisperm, endosperm and pericarp but was reduced with fruit development. However, CaSUS2 transcripts were rarely detected in pericarp and perisperm and were undetectable at the early endosperm stage, but increased in coffee bean development. Peak expression of CaSUS2 was found coffee beans at the last weeks of pericarp and endosperm development. This indicates that these two genes may have different functions in sucrose metabolism [85]. Another finding on sucrose metabolism of Arabica and Robusta coffee revealed that the CaSUS transcripts of Arabica coffee increased in the first two stages (small and large green cherry stages), but decreased in the yellow cherry and then increased massively at the last red cheery stage [201]. CcSUS2 was found to have the same development pattern, however, CcSUS1 transcripts were at low levels all the time. This is different from another study in C. racemosa, where SUS1 and SUS2 shown a reverse pattern of through ripening [202]. Higher activity of soluble acid invertase was also found in Robusta coffee than in Arabica genotypes, especially in the early stages of fruit development. This suggests higher sucrose degradation in Robusta coffee at early developmental stages compared to Arabica coffee. Lower activities of CaSUS2 and invertase were found in Arabica

22 coffee resulting in more sucrose accumulation in later stages. Meanwhile, higher SPS activities in Arabica were revealed in the last fruit development stages leading to a higher accumulation of sucrose [201]. Since higher sucrose was found in Arabica coffee, SUS2 and SPS were probably very important for higher sucrose accumulation.

Figure 7 Sucrose metabolism pathway in coffee Note: CWIN, Cell wall invertase; HT, Hexose transporter; ST, Sucrose transporter; INV, invertase; SUS, Sucrose synthase; SPP, Sucrose phosphate; SPS, Sucrose phosphate synthase.

5. Lipids

Triacylglycerol biosynthesis pathways, including the Kennedy pathway, have a predominate role in coffee lipid metabolism and have been studied thoroughly [203, 204]. The central route of the fatty acid and triacylglycerol pathways are present in Figure 8. Only two genes encodes MAT and KASIII, catalysing the conversion of malonyl-CoA and malonyl-ACP to 3-ketoacyl-ACP, were not known in the Coffea EST databases [203]. Gene expression in the lipids biosynthesis of C. arabica varies in different tissues (endosperm and perisperm) and growing stages [203]. Two different tissues, endosperm and perisperm, present different transcriptional programs of lipid biosynthesis. In most of the endoplasmic reticulum, for example, one gene encoding diacylglycerol acyltransferase in perisperm showed maximal expression in the first fruit development stage but decreased with fruit growth, while the other paralog in the endosperm increased in the early stages and then declined [203]. The gene encoding the β-Ketoacyl-CoA synthase catalyses the first step of wax biosynthesis was found in coffee beans and flowers but mainly accumulated in leaves [205]. The gene encoding acyl-CoA diacylglycerol acyltransferase was identified to catalyse the final step of triglyceride synthesis [204]. Additionally, five genes from the coffee oleosin family,

23

CcOLE1-5 and a gene from steroleosin family, CcSTO-1, were identified, isolated, characterized and quantitated in C. canephora. All the five genes (CcOLE 1-5) were found to be expressed mainly in mature Arabica and Robusta seeds, especially CcOLE1 and CcOLE2. However, CcSTO-1 expression remained evenly during germination [206]. Oleate desaturase (FAD2) produces linoleic acid. More recently, six genes were identified encoding FAD2 and most of them from genes on chromosome 1 tandem duplications. CcFAD2 in C. canephora was found to be actively transcribed in the developing endosperm based on RNA sequencing data [48]. Meanwhile, transcripts expression increased during seed germination together with a rise in linoleic acid content [203]. Other research revealed the gene, FAD8, encoded the plastidial isozymes of ω-3 fatty acid desaturase contributing to trienoic fatty acids.

24

Figure 8 Lipids biosynthesis in coffee Note: PDH, pyruvate dehydrogenase; AC, acetyl-CoA carboxylase; MAT, malonyl-CoA ACP transacylase; KAS, ketoacyl-ACP synthase; KAR, ketoacyl-ACP reductase; HAD, hydroxyacyl-acyltransferase; EAR, enoyl-ACP reductase; PC, phosphatidylcholine; DAGAT, diacylglycerol acyltransferase; PDAT, phospholipid diacylglycerol acyltransferase; LPC, lysophosphatidylcholine acyltransferase; LPCAT, lysophosphatidylcholine; SAD, stearoyl- ACP desaturase; FAT, acyl-ACP thioesterase; LACS, long-chain acyl-CoA synthetase; GPAT, glycerol-phosphate acyltransferase; LPAAT, acylglycerol-phosphate acyltranferase; PAP, phosphatidate phosphatase; CPT, diacylglycerol cholinephosphotransferase; KCS, beta ketoacyl-CoA synthase.

25

2.3 Genotype and Environment influences on Coffee Quality

With a wide range of environmental differences, there is a diversity of coffee grown all over the world. Environmental factors include climatic conditions and soil, systems differences, for example, variation in genotypes and agronomic management (such as shade, fertilizers, pest, diseases and weed conditions) all influence coffee production and quality [32].

2.3.1 Shade

The influence of sun and shade on coffee has been discussed for more than a century [207, 208]. Coffee originated from the understorey of African forests. Both C. arabica and C. canephora are native to shade. Substitution of “natural shade” with shade species has been recommended since the 19th century with selected species, especially bananas and fruit trees being used large scale substitution for shade occur in the 20th century [32, 209, 210]. Early development of coffee plantations was on small farms. However, after the World War II, there was an increasing demand for coffee due to more coffee consumers plus an improving availability of chemical inputs, which resulted in a demand for higher yielding coffee plantations [32]. Around the world, efforts to meet this demand focused on higher plant densities, more external inputs like fertilizers, pruning and pollarding, or reduce the number of shady trees on coffee farms. Coffee was found to be more vigorous and productive with lower shade levels particularly in optimal or near-optimal production areas [211, 212]. The use of shade was abandoned. However, the increased yield that resulted was coupled with a requirement for more external inputs [32, 213]. The outcome was increased long-term use of these inputs; fertilizers to meet the nutritional demands, pesticides to eliminate pest problems and herbicides to reduce weed competition. All these management changes resulted in a degradation of natural resources, less bio-diverse systems and was, in some cases, related to environmental pollution and health issues [214]. In contrast, the systems with shade and trees provided coffee an ecosystem with (1) sustainable biodiversity (habitat niches for local wildlife and migrating birds) with potentially valuable plants and animals, like microorganisms with different functions, insects including predators, birds, and mammals; (2) moderated microclimate; (3) controlled weeds and (4) retained water and maintained soil fertility [210, 215-220]. Much work has been done on the influence of shade on coffee produced from optimal plantations for coffee [221]. However, research on shade trees in suboptimal conditions in the typical Central America coffee plantations indicated significant advantages relative to that found in optimal condition areas [32]. Although many publications show shade effects on coffee production, there are only a few long-term field experiments.

26

Shade reduces the amount of fallen fruits by around 20% as well as the number of rejected fruits (diseased, mummified, sunburnt or died berries) by 10% [32, 35].

Despite many publications on factors influencing coffee quality, there is limited research focused on shade influences [32, 35]. Research has shown that the level of shade is positively correlated with fruit weight, bean size, chemical components as well as cup quality, especially under open and dense shade plots [35, 222, 223]. Higher sucrose, chlorogenic acid and trigonelline contents were found in full sun immature coffee beans with more bitter and astringent beverage, while higher caffeine and fat content were found in shade grown beans or leaves. This presumably results from the lower temperatures experienced in a delayed berry ripening process [35, 224, 225]. For suboptimal coffee plantations, shade plays a dominant role to reduce high temperatures and promote the uniform development and ripening of beans [218, 220]. The effect of shade on sugar metabolism was also analysed during the development of coffee berries. Higher reducing sugars were found in the pericarp and endosperm of coffee beans from 50% shaded beans compared to full sun management while sucrose content was slightly lower in shaded beans. Sucrose synthase activities were higher at maturation of perisperm development for shaded plants [226]. More recent research has shown that the total sugar content of coffee beans was the highest in 60% shade condition compared to 50% and 70% [223].

There have been few reports of the influence of shade on expression of coffee candidate genes (involved with quality traits). Shade studies with caffeine have mostly involved work on coffee leaves or seedlings. Light was found to be a positive influence on leaf caffeine content. Transcript analysis showed that this positive effect was due to the regulation of caffeine biosynthetic genes under shade conditions. Three N-methyltransferases expression was reduced in complete darkness [227]. Marraccini et al. (2007) also found that shade improves the expression level of CaSUS2, which was associated with sucrose content [226]. Similar results were reported in earlier work with increased reducing sugars content and decreased sucrose in shade compared with full sun management [228]. Higher enzymatic levels of sucrose synthase and sucrose-phosphate synthase were identified at the last stage of pericarp and perisperm tissue development in shaded samples while higher levels in endosperm were found in developing stages. Expression of SUS-encoding genes was found to be modified by shade in coffee beans with higher transcripts levels of CaSUS2 gene demonstrated in shade [228].

27

2.3.2 Pruning

It is well-known that the best fruits come from the primary branches and those from the secondary or tertiary branches tend to be smaller [229]. Several reports on pruning of coffee trees have been reported [230-233]. Together with capping, pruning improves the health of coffee plants and make it easier for the farmers to reach the entire tree [229]. Meanwhile, pruning affects the microclimate and plant physiology and influences the life cycle of fungi, like coffee rust epidemics. Immediate pruning will also reduce the levels of infectious lesions like spores [234]. Without pruning may limit coffee tree growing. For example in C. alliodora, nutrients, especially K, will be stored in tree stems [235]. Nnutrients will be released from the pruned branches to the soil and become available to the crop after pruning [236-239]. Pruned trees produce more fruits but results in higher labour load and increased leaf susceptibility to disease [234]. Additionally, coffee farmers ensures regular production by pruning but avoid the possibility of weakening the coffee tree [229].

Generally, the time of pruning is dictated by coffee phenology. For instance, in Costa Rica, the first pruning is around January to February to promote flowering by June to July in order to promote active berry ripening. In theory, the time for pruning could be coordinated with maximum nutrient demand period to avoid a temporary nutrition shortage or surplus [235, 239]. In addition to pruning the coffee trees, pruning the shad trees surrounding the coffee trees also make a difference. Research shows pruning of selected shad trees for coffee plantation released only 40 percent of the N content totally from Erythrina poeppigiana mulch after 30 days [240]. However, P and K release were faster than that of N [241]. Moreover, it was shown that the biannually pruned shade tree E. poeppigiana returned 90- 100% of the nutrients to biomass above ground, however, unpruned Cordia alliodora only returned N, P and K at 36%, 18% and 12%, respectively [242].

Very limited information is available on pruning influences on coffee quality. However, research on shade influences on coffee quality have employed pruning intensity to change shade patterns. Pruning in favour of sunlight increased the coffee bean size, weight, visual appearance ratings of beans, acidity and body rating plus off-flavours [32]. Therefore, it would be of interest to study the principle of how genes expression changes with pruning treatment, especially on transcriptome level to explain how the phenotype differences result from.

28

2.3.3 Water

Climate effects can either benefit coffee production or even ruin it. Among the climate effects, drought and temperature are the major influences on coffee bean development [243, 244]. Water is available from rainfall as well as atmospheric humidity. Rainfall is the essential influence to the coffee plant [245]. For Arabica coffee, appropriate water availability should be 1400 to 2000 mm total annual rainfall, less than 4-6 months of dry season (36 mm average monthly rainfall) provided deep soil offers good water retention and humidity around 60 %. However, Robusta coffee is more sensitive to drought, total annual rainfall should be 2000 to 2500 mm with no more than 3-4 months of dry season (50 mm average monthly rainfall) and humidity around 70 to 75% owing to higher evapotranspiration resulting from higher temperatures (lower planting altitude). A few months are required each year with little or no rain for the coffee plant to flower [245]. Excessive rainfall will make it difficult to dry the plant and lead to erosion while deficient rainfall will cause drought problems [245]. Moderate drought will affect plant flowering and bean development and influence final coffee production, however, severe drought will result in plant death. Moreover, a highly variable climate will bring more defective beans, affect the chemical compositions and influence the final cup quality [246, 247]. Therefore, this issue may become more pronounced with global climate change making it more difficult to maintain sustainable coffee production and quality [248, 249].

The principle of water deficiency in plants involves physiological and biochemical processes that have been discussed for decades [250-255]. The physiological processes under water deficiency include diffusion of CO2 resistances to stomata and mesophyll, regulation of light reception and electron transport while biochemical processes involved with Calvin cycle that regulate the activity of ribulose-1, 5-bisphosphate carboxylase and assimilation. In the early stages of water deficiency, a decrease of photosynthesis, one of the first responses, is caused by stomatal closure to temporary soil drought in order to limit water loss and carbon assimilation. Excess reducing power generated with CO2 assimilation preceding electron transfer reactions can form the oxygen species reactivation that results in damage by photo- oxidative [256]. Rubisco (ribulose-1,5-bisphosphate carboxylase/oxygenase, EC 4.1.1.39), the key photosynthetic enzyme in C3 plants, is used for CO2 fixation and photorespiration [257]. The rubisco holoenzyme in higher plants is composed of a large (RBCL) and a small (RBCS) subunit, encoded by unique chloroplastic genes and multi-gene family, respectively [258]. Much work shows that RBCS transcripts vary with light intensity and plant

29 development [259]. Numerous studies show that drought stress genes expressed higher than many photosynthetic genes [260-264]. Research also found the leaf size of coffee plants irrigated twice per day was twice that of plants with twice irrigation per week despite the unit-leaf-area assimilation rates were the same to each other [265, 266]. Recently, homologous RBSC1 genes were found in C. arabica, including CaCe from the subgenome of C. eugenioides and CaCc from the subgenome of C. eugenioides (higher expressed). However, the total RBSC1 gene expression was highly depressed in both C. canephora and C. arabica under drought stress. Gene expression of different times per day was also explored in this research that the RBSC1 protein was accumulated in the C. canephora leaves under water stress which is probably from nocturnal RBCS1 expression; this increasing accumulation may have an antioxidative effect on photorespiration in water stressed coffee plants [267]. Based on the evidence that shade could reduce negative drought influence on coffee, Cavatte et al. (2012) detected shade influence on the morphology of water stressed coffee [268]. Unfortunately, only a small variation of biomass allocation was observed in shade and drought treatment; proline content was found to increase in plants under drought stress, while shade did not attenuate the side effects of drought on coffee trees.

Most of this work was done with coffee plant leaves and seedlings but only a few studies have been done with beans. Researchers found that water management (irrigation) is not a major factor as the changes (compared to non-irrigated coffee plants) failes to significantly influence the chemical attributes and cup quality compared to coffee plants [247].

2.3.4 Temperature

As tropical and sub-tropical plants, temperature fluctuations influence significantly on the growth of coffee. A rapid temperature drop will result in the “Hot and Cold” disease of coffee, which is the reason to apply shading coffee in order to decrease the variation of temperature [269-271]. Low temperature influences the geographical distribution of coffee [272]. For C. arabica, the temperature fluctuation range should be below 19 oC; The optimal growing temperature is around 18 oC (night) to 22 oC (day); The extreme temperature for Arabica coffee is 15 oC at night while during the day it is around 25 to 30 oC [2]. Higher temperatures will induce more leaf rust and accelerate fruit maturation [2]. The photosynthesis of Arabica coffee will reduce when the temperature is above 25 oC. However, heavy damage will occur to the coffee plant if it is more than 30 oC, for instance, leaf chlorosis, “star flowers” and wilting blossom. Low temperature have side effects on the coffee plant and will favour CBD

30 and lead to white or yellow leaf discoloration [245]. Extreme temperatures below 4 oC will result in lesions on coffee leaves and fruit. Even worse, if the temperature lasts six hours or more at -2 oC, serious damage will occur to the coffee plant even leading to plant death [245]. For Robusta coffee, the best growing temperature is 22 to 28 oC. The variation of temperature needs to be managed within a range of 12 to 14 oC. High temperatures of more than 30 oC will reduce photosynthesis, especially for atmospheres with dry conditions, while low temperatures involve problems starting from 10 oC and the coffee plant dies at around 4-5 oC [245]. Temperature variation between the canopy and the soil may be significant, such as the reported 4 to 5 oC. Wind and air (humidity) can also affect the effective air temperature [245].

Chilling (low but above-freezing temperature, <20 oC) injury in plants started to be surveyed 80 years ago and has been systematically analysed since the 1960s [273]. Cold temperatures influence all kinds of cellular function, including metabolic processes of the plant, reduces enzymatic activities, decreases photosynthetic capacity and alters membrane fluidity, which will result in increasing membrane permeability, membrane integrity being impaired and cell compartmentation breaking down [274, 275]. Lipid and fatty acid composition were also changed in plant cell membranes to keep chloroplast function at the low temperature [276, 277]. All these side-effects are often associated with decreased crop production [271]. Extreme temperatures also increase the phenolic compounds both in coffee plants and seeds [82].

Research on temperature effects has been primarily based on coffee plants. One study found that irreversible membrane damage, increased lipid synthesis and alteration of membrane unsaturation occurred to var. dewevreiplants in chilling temperatures in order to maintain cellular integrity [278]. Similar research showed that cold will induce photo- inhibition of photosynthesis, which results from impairment of photosynthetic components; Biochemical (such as carbohydrate synthesis) and biophysical (antennae functioning) inactivation was detected instead of stomatal constraints; C. dewevrei was found to be more highly cold-sensitive than C. canephora and C. arabica as the photoinhibitory impairments were heavier [271]. Both biochemical and biophysical impairments lead to limited gene expression [275]. In alfalfa, plasma membrane rigidification was found inducing cold responsive genes (COR) and leading to cold acclimation [279]. During the vegetative stages of the model plant, Arabidopsis, the pollen maturation is the most sensitive stage in response to cold. Many COR genes in leaf are found yet to induce at this stage [280]. Different gene expression was exhibited in roots and leaves in cold temperatures: 86% different genes were

31 found [281]. Therefore, cold-regulated transcription probably varies in different tissues. Generally, cold stress will induce the APETALA2/ ETHYLENE RESPONSE FACTOR family, CBFs (C-repeat binding factors, that is, dehydration-responsive element-binding protein 1s) transcription factors expression. CBFs can bind to cis-elements of promoters of COR genes and the expression is increased [275]. Despite the presence of homologs of CBFs, chilling sensitive plants, like rice and tomato, were still found to be easily impaired at cold temperatures [275]. However, overexpression of Arabidopsis CBFs was found to improve the tolerance of cold stress in tomato and rice in transgenic analysis [282, 283]. Different gene expression was detected in chilling conditions. For example, chitinase, which was responsible for the antifreeze activity and the encoding genes (cachi3-1, cachi3-2 and cachi4-1) might contribute to the cold acclimation and tolerance with other genes [284-286]. A study on C. arabica and C. canephora plant transcripts showed positive changes of chitinase gene expression in adverse conditions, sub-zero temperatures included. However, this positive result was not found in C. dewevrei [243].

In contrast, coffee plants under heat pressure are also a threat to crop production and quality. Heat stress may lead to direct injuries to coffee plants, protein denaturation and aggregation. For example, heat may increase the membrane lipid fluidity and produce indirect injuries, such as inactivation of the chloroplasts and mitochondrial enzymes and biosynthesis inhibition. Both direct and indirect injuries result in cellular homeostasis and then starvation, ion flux reduction and toxic compounds production, and finally inhibit plant growth [287, 288]. Frequently disturbances can even lead to plant death [289]. Heat has become more important recently owing to the global warming since it is reported that tropical temperatures will increase 1-5.8 oC by the end of the 21st century [290]. Research on C. arabica showed that there is an increase in raffinose and stachyose and decrease in plant growth and coffee yield under heat stress [291, 292]. Moreover, damaged palisade cells and ultrastructure were observed in another study; The chemical profile also showed changes in coffee cell wall polymers and structural cell anatomy [293].

Temperature was shown to influence the development of CGAs during coffee fruit development in the transcription level. Recently, characterization of genome-wide homologous gene expression on transcripome level in Coffea arabica was also studied to see the plant growth with an increasing temperature. As a conclusion, study temperature influencing coffee quality through coffee fruit ripening would be necessarily to explain how the key chemical compounds accumulates and determines coffee quality.

32

2.3.5 Soil

Soil is the living base of the coffee plant which is a very complicated factor involved with water (moisture) and temperature.[294, 295] Coffee plants favour of alluvial and colluvium soils with a volcanic texture. Meanwhile, the soil depth above hard-pan or water table needs to be taken into consideration. A soil depth of 2m is the minimum requirement for the water supply of the taproot allowing the plants to withstand the dry season. Higher organic materials in the soil are highly considered since less prone is to be eroded [245].

2.3.6 Wind

Places like Madagascar and the Philippines that often experience frequent cyclones are not recommend for growing coffee, as this will damage the coffee plants. A typical example is the Island of Reunion, where coffee plantations were discarded since 19th century owing to a serious leaf rust outbreak together with frequently violent cyclones [245]. However, even less powerful but long lasting wind will also be harmful to the coffee plant. This damaging wind will cause branches to die-back, blemish flowers and even mature fruit. Shade trees and windbreaks are highly recommend in these areas [2].

2.3.7 Topography

The most favourable land for coffee is flat or slightly rolling hills, which offers deep soil and a good water supply; Coffee prefers steep slopes, however, more manual labour cost is needed with steep slopes due to slopes of more than 20% are difficult to mechanize [245].

2.4 Methods for analysis of genotype and environment influences on coffee quality

Generally, phenotype depends on the genotype and environment [296]. As discussed above, environmental influences, like abiotic stresses and drought, negatively influence plant growth and reduce crop productivity. However, plants have a remarkable ability to evolve and adapt to this stress which results in phenotypic plasticity associated with physiology and metabolism changes [297]. To complete the whole set of transcripts in a cell, transcriptome analysis aims to classify the transcripts of all species (mRNAs, non-coding RNAs and small RNAs included), identify the genes transcriptional structure (such as their 5’start sites and 3’ ends) and to reveal the changing expression levels across different environments [298].

2.4.1 The development of transcriptome analysis technologies

A diversity of the Next-Generation Sequencing (NGS) technologies has been established for transcriptome analysis [298, 299]. The first group of technologies were hybridization-based 33 methods, including custom-made microarray or oligo microarrays (commercial high density). Specialized microarrays have also been designed to quantify distinct spliced isoforms. Another high-resolution technology in this group is the genomic tiling microarrays that allows mapping from several base pairs to 100 bp [300-304]. However the limitations of these methods include reliance on knowledge of the genome sequence, high levels of background and a limited dynamic detection range due to cross-hybridization and background signals [305, 306]. To solve these problems, tag-based sequencing, serial analysis of gene expression (SAGE), cap analysis of gene expression (CAGE) as well as massively parallel signature sequencing (MPSS) were then developed. Although these methods provide high throughput and precise gene expression, most of the technologies are expensive; some short significant tags cannot be mapped to the reference genome and short tags isoforms are unable to be distinguished from one another; therefore, there is a limitation to the use of the traditional sequencing technology in annotating the transcriptome structure [307-313].

RNA sequencing (RNA-Seq) was developed to overcome these existing problems of tag- based sequencing based on novel high-throughput DNA sequencing methods for mapping and quantifying transcriptomes. It costs less and allows annotation to single-base resolution as well as genome scale gene expression analysis. RNA-Seq is attractive especially for non- model organisms without reference genome sequences [314]. Sequence diversity can also be discovered by RNA-Seq, for instance SNPs [315, 316]. In contrast to microarrays, RNA-Seq has low background signal with high resolution of DNA sequences mapped to the genome. A wide range of gene expression can be obtained since there is no upper limit to sequence numbers (quantification) for RNA-Seq [298]. The gene expression quantification and reproducibility of RNA-Seq has been found to be high and requires less RNA than the other methods [316, 317].

2.4.2 RNA-Seq technologies

Basically, total or fractionated RNA is converted to a cDNA fragment library. Then RNA- Seq enables de novo transcriptome sequencing based on the currently wide-used technologies, such as the Illumina, Pacific Bioscience and Oxford Nanopore platforms. The first NGS technology, based on pyrosequencing was developed by Roche in 2004 [318]. This sequencing technology could analyse 500 million cDNA sequences in ten hours with read length of 250 nt [299].

34

RNA sequencing was developed by Solexa with the Illumina GAIIx in 2006 providing a high-throughput sequencing platform. This method delivered a low sequencing-error rate even in repetitive regions [319]. In 2010, the HiSeq 1000 and HiSeq 2000 systems were developed by Illumina, which enabled higher sequence output (100-150 bp) at a lower cost. Meanwhile, MiSeq system (Table 1) was produced as a personal benchtop sequencing platform to analyse small genomes in a few hours [320]. HiSeq systems were introduced later by Illumina as shown in Table 1, including Hiseq 2500, HiSeq 3000, HiSeq 4000, HiSeq X Ten. With the highest output and lowest cost per-base, Illumina is now the leading NGS platform [321].

Ion Torrent PGM was commercialized by Life Technologies in 2010. Ion Torrent PGM is based on hydrogen ion sensitive transistor strategy. Developed by the founder of 454, Jonathan Rothberg, this machine uses similar template preparation and sequencing steps to Roche 454. With the fluorescence and camera scanning instead of optical detection of the incorporated nucleotides. This technology runs faster and has a lower cost and smaller instrument size compared to many Illumina platforms (such as the Hiseq systems). Two year later, in 2012, the Ion Proton updated to a higher output (10 Gb, Table 1), shorter read length and faster speed.

Based on single molecule real time sequencing, the RS platform was made commercially available in 2010 by Pacific Biosciences. Long read length and no clonal amplification required show significant advantages to short reads technologies. However, higher error rate, lower output and higher cost limit the use of this technology [321]. The RS II platform released in 2013 using circular consensus reads aims to reduce its random errors and was found very effective compared to the previous platform. Since 2016, Pacbio sciences released its new Sequel system to the market. This system has improved throughput and sequencing length compared to RSII.

Another single molecule real time approach appeared in 2014, the MinION, released by Oxford Nanopore Technologies. This system does not need library preparation or sequencing reagents and attractively, it is an inexpensive hand-held device. More recently, Nanopore established two new platform, GridION X5 and PromethION in 2017. These two platforms have very high throughput compared to the MinION. Nevertheless, the data quality of this approach still remains to improve owing to the relatively high systematic errors [322].

35

Obviously, these novel new approaches need to improve and make more reasonable combination of size, speed, read length and machine cost. Although Illumina presently dominants the NGS market, significant future advances in many of these platforms are expected. Moreover, the potential of these technologies have already shown that large amounts of sequence data can now be obtained efficiently.

36

Table 1 Comparison of the high throughput technologies

Platform Device Sequencing Output Read Reads per Run Time Error rate Pros Cons Applications and publications: strategy Range Length Run (bp) MiSeq Four- channel; 0.3-15 2 x 300 25 million 5-55 h 0.8% Speed, personal and Short reads Transcriptome study of different organs Sequencing by Gb bp simplicity for targeted and [323]; resequencing of and identifying new synthesis small genome sequencing. genes [324] dynamics of transcriptome Illumina during fruit development [325] HiSeq2500 Four- channel; 10 - 1000 2 x 150 300 million 7 hr - 6 days < 1% Rapid mode; less cost Short reads; Human whole genome, exome sequencing, Sequencing by Gb - 4 billion effective; highly scalable RNA-Seq, methylation: sequencing small synthesis genomic targets found high efficiency and extreme accuracy[326]; transcriptome profiling; full-length tissues transcriptome sequences [327] NexSeq500 Two-channel; 20 - 120 2 x 150 130 - 400 11-29 h < 1% Fast, high throughput; Short reads A fast benchtop sequencer for individual Sequencing by Gb million reduced complexity labs and can be used for exome, synthesis transcriptome; whole-genome sequencing [321, 328]. HiSeq3000 Flow cell; 125 - 750 2 x 150 2.5 billion <1 - 3.5 days < 1% 50 samples per run; higher Short reads Exome, transcriptome; whole-genome Sequencing by Gb speed and output sequencing; structural variations [329, 330]; synthesis the census study with this tool has been HiSeq4000 Flow cell; 125 - 2 x 150 2.5 - 5 <1 - 3.5 days < 1% 50 - 100 samples per run; Short reads introduced to marine meiofaunal biodiversity Sequencing by 1500 Gb billion higher speed and output [330, 331] synthesis HiSeq X Ten Flow cell; 900-1800 2 x 150 3 billion <3 days < 1% Maximum throughput and Short reads Population genomics; human whole-genome Sequencing by Gb lowest cost population-scale sequencing[322, 328] synthesis human whole-genome sequencing; enhanced optics and computing capacity; improved cluster generation chemistry

37

Life Science Ion Torrent Hydrogen ion 30 Mb-2 400 400-550 k; 2.3-7.3 h 1.71% Long reads; low cost per High error More useful to targeted sequencing and (PGM) sensitive transistor Gb 2-3 million; sample; short run time rates in small genomes; transcriptome assembly and depend 4-5.5 homopolym quantification [332] reduce extreme genome on the million ers region; complexity and identify SNPs [333] chip and high reagent illustrate potential markers [334] base cost Ion Proton Hydrogen ion 10 Gb 200 60–80 2-4 h --- Faster speed and reduced High error Exome, transcriptome, whole-genome, target sensitive transistor million cost than PGM but less read rates in sequencing; microbial discovery: sequence length; clonal amplification homopolym with nonmodel organisms [335] is avoided and allowed direct ers region; sequencing of native and high reagent potentially modified DNA cost; limited literature support Pacific Biosciences RSII Single-molecule 1-12 Gb 10-16 kb 1G 0.5-6 h 13-15% Long read length; short run Relatively Genome sequencing of virus, bacteria, lower Real-time (mainly times; no PCR amplification high initial eukaryotes; target sequencing: reconstruct indels) error rates complex regions [336, 337] Sequel Single-molecule 10-120 10-14 kb 10G 0.5-10 h 13-15% Long read length; short run Relatively Whole genome sequencing, targeted Real-time Gb times; no PCR amplification high initial sequencing, complex populations, RNA error rates sequencing, epigenetics, etc. [338-341] Metabarcoding of fungi and other eukaryotes [342]; structure variant detection [343] Oxford Nanopore MinION Single-molecule 10-20Gb <2Mb 10-20Gb 48 h Insertion: Speed, single-base Relatively Whole genome sequencing, targeted real time 4.9%; sensitivity; ultra-long read high sequencing, complex populations, RNA deletion: lengths; portable; direct systematic sequencing, epigenetics, etc. [344] 7.8%; DNA/RNA sequencing; high errors; Determine the position and structure of substitution: yields for large genomes higher bacterial resistance island together with GridION X5 50– 50–100Gb 48 h 5.1% On-device base calling – no machine Illumina-derived reads[345]; resolve 100Gb local infrastructure cost assembly gap on human chromosome [346] requirement. PromethION Tb Very large data volumes

38

2.4.3 Applications in coffee

With a reduction in the time and cost, NGS is attractive to genome sequencing, especially for large genome sequencing, such as de novo sequencing [347]. NGS technologies provide genomic information in detail and enable the analysis of specific organisms and the diversity of individuals [48, 299]. Other applications of NGS are now widely used, such as whole- genome re-sequencing and targeted sequencing [348, 349]. Comparison of sequences with reference genomes in these applications reveals the influencing phenotypic characteristics [350]. NGS approaches have also been applied to the identification of molecular mutations and novel allele discovery [351, 352]. Detection of epigenetic modification and genome regulation is also an established use of NGS [348, 350, 353].

Gene expression profiling, enables identification and quantification of transcripts and has had significant success in detecting molecular differences under different biological conditions, or in individual tissues [347, 350]. As mentioned above, altitude influences on coffee gene expression have been studied with RNA-Seq. The reads obtained were aligned to de novo assembled contigs from the 206 million RNA-Seq reads [164]. Temperature affects the sub- genomes and their regulation of coffee gene expression was explored with RNA-Seq. Another interesting study compared the RNA-Seq (on an Illumina HiSeq 2000 with100 bp single-end sequences) with de novo transcriptome assembly in C. arabica and C. eugenioides. Sequence comparison and functional annotation were analysed using public databases (NCBI-nr, Swiss-port and PlantCyc) [354]. Research was also conducted on different tissues and individuals, for example, the research on leaves, flowers and initial fruit (perisperm) development of arabica coffee using Illumina HiSeq2000 technology for transcriptome analysis. Unigenes were generated with de novo assembly, including candidate genes. Similar pattern of gene transcription and diterpenes concentration levels were analysed for five candidate genes to understand coffee diterpenes synthesis [165].

2.5 Conclusion

According to the literature, coffee quality is a very complex trait influenced by numerous factors, such as environment and genotype. However, the research gaps stand in 1) numerous studies conducted with coffee plants, such as coffee seedlings, leaves and stems, instead of coffee fruit. However, coffee fruit is critical to coffee quality, which produce the final product. 2) Among the limited work in coffee fruit development and quality, most work explored the ripening of coffee fruit to see how the important phenotypic traits involved with

39 seed development. But no systematic work has been done with phenotypic traits of coffee quality influenced by the environment or genotype. 3) The influence of canopy position to coffee quality has not been conducted before. In conclusion, the full potential of NGS technology to the understanding of coffee quality has not been explored. Detailed analysis of the transcriptome is expected to provide new insights into the genetic and environmental control of coffee quality.

40

Chapter 3 Flavour and quality analysis of coffee beans

The aim of this chapter is to identify the quality (difference) of coffee beans from the upper and lower canopy.

3.1 Introduction

Chapter 2 presented a review on the important parameters of coffee quality traits. In this chapter, the quality of beans harvested from different levels of the canopy were compared in terms of physical bean size and weight, chemical compositions of beans (caffeine, trigonelline and sucrose) as well as determination of the sensory differences.

Chapter 2 has documented that sensory evaluation is a primary way to determine coffee quality. Bean weight and size were key traits and are often determinants of bean commercial price. Coffee quality is also often determined by composition such as the caffeine, trigonelline and sucrose content. In addition to a typical bitterness, caffeine provides perceived strength and body [355]. Caffeine contributes to the coffee flavour to some extent [38]. Caffeine is also an essential component to protect the tree from predators and competing with other plants. Between 60% and 90% of the trigonelline is degraded during roasting to form fine flavor components like pyridine and pyrroles, contributing to the overall aroma perception and bitterness [63, 64, 355-357]. This degradation during roasting also produces bioavailable compounds such as vitamin B and nicotinic acid [59]. Trigonelline level was revealed to be strongly correlated with high quality [38]. Sugars, especially the most abundant sucrose is a key flavor precursor, almost fully degraded during roasting, to produce aromatic compounds, such as aldehydes, carboxylic acids and hydroxymethylfurfural. [115, 355]

3.2 Plant material and methods

3.2.1 Coffee bean processing

Mature fruits of Coffea arabica var. K7 were harvested from the upper (>170 cm) and lower (<170 cm) tree canopy between 10 am to 2 pm (Figure 9). Dry-processing was applied in this study. Coffee cherries were de-pulped with a coffee huller (CAPE, Australia) and sun-dried. When samples reached a moisture content around 12%, a de-husking process was performed to obtain raw coffee beans. Raw beans were then split into two groups: one set for bean size, weight, caffeine, trigonelline and sucrose analysis. The other one was roasted using a Probatino coffee roaster (Probat, Germany) and accessed by sensory evaluation.

41

Figure 9 Coffee bean samples from the upper (>170cm) and lower (<170cm) canopy. 3.2.2 Bean size and weight measurement

The length, width and thickness of coffee beans (n=10) were measured using Vernier calipers in quadruplicate. Bean weight was determined by measuring 100 raw coffee beans (four batches) using an analytical balance.

3.2.3 Moisture content

The moisture test was conducted according to the AOAC official method 979.12. About 2g of raw coffee beans were dried at 105oC for 16 hours. The ratio of dried weight to the original bean weight determines the moisture content percentage.

3.2.4 Analysis of caffeine and trigonelline

Caffeine and trigonelline extractions were performed as previously described with some modifications [91]. Green coffee powder (2g) was extracted after boiling in MiliQ water (20 ml) while mixing for 2.5 min. The extraction process was then repeated twice, and the total volume adjusted to 100 ml. The solution was cooled in an ice bath and centrifuged (1500 rpm, 4°C, 5 min) before filtering with a 0.45 μm syringe filter (Thermo Fisher Scientific, Australia) and transferring to 2 mL HPLC vials (Thermo Fisher Scientific, Australia). Extractions were performed in triplicate. Samples were stored at -80°C in HPLC vials until analysis could take place.

The analysis of caffeine and trigonelline was performed using HPLC as described previously [91]. Calibration was performed with caffeine and trigonelline standards between 0.5-500

42 ug/ml. Reagents caffeine and trigonelline hydrochloride (analytical standards), methanol and

KH2PO4 (analytical grade) were purchased from Sigma-Aldrich (USA). Extracts were analysed with a Waters Associates (USA) HPLC, filtered with a Spherisorb S5 ODS2 (0.46 x 25.0 cm) column and a guard column Bondapak C18 (10µm). Gradient A was a phosphate buffer (KH2PO4) 0.1M (pH 4.0), while B was methanol at 0 min (7%), 4 min (9%), 6 min (25%), 13 min (29%), 21 min (50%) and 35-40 min (7%). The injection volume was 20 µl. UV detection was set at 265 nm. Each extraction was analysed in HPLC in triplicate.

3.2.5 Analysis of sucrose

The method used for the extraction of sucrose was modified from a previous study [358]. Green coffee (ground) samples (0.1g) were mixed with 4ml of MiliQ water. After sonication (20 min), extracts were centrifuged and filtered prior to HPLC analysis. Extraction was performed in duplicate.

The method used for HPLC analysis of sucrose followed a procedure validated previously [359]. Calibration was performed with a sucrose standard (Sigma Aldrich) at concentrations between 100 and 500mg/L. An Agilent HPLC 1110 series (ELSD detection) HPLC was used fitted with an Alltech Prevail Carbohydrate ES (5um, 250mm x 4.6mm) column. The mobile phase used was 25% water and 75% acetonitrile isocratic. The injection volume was 10ul. Each extraction was analyzed by HPLC in triplicate.

3.2.6 Sensory evaluation

3.2.6.1 Sample preparation and presentation

Samples of coffee beans from the upper and lower canopy treatments were prepared for sensory assessment such that panellists could compare treatments both as whole beans and freshly ground bean samples. Whole bean (unground) sample was prepared with one bean from each biological replicate and four beans in total for the upper or lower canopy whole bean sample. For ground coffee sample assessment, coffee beans were ground and a portion (0.5g) was placed in a clear 60 ml serving cup sealed with a lid (Figure 10).

43

Compusense with questionaries

Coffee ground

Serving hatch Current status

Coffee beans

Samples Water-refresh

Figure 10 Sample preparation and presentation for consumer sensory test. 3.2.6.2 Preliminary screening

Prior to formal evaluation, a preliminary sensory screening was performed whereby samples of ground coffee and coffee beans were presented to a panel of 12 trained tasters, followed by the Australian Standard AS 2542 1.1. The panelists had been previously recruited externally after pre-screening for sensory acuity and training in sensory methods and were experienced in participating in descriptive analysis studies of food and beverage products. The trained sensory panelists were asked to describe the aroma and appearance of samples and indicate any qualitative or quantitative sample differences followed by a discussion session to reach a consensus on the key sample differences. A sensory lexicon for coffee was provided to the panel (Table 2).

Table 2 Speciality coffee definitions and sensory reference standards.

Sensory attributes Definition Reference standards

Overall aroma intensity The overall aroma intensity of the sample. NA Citrus zesty A sharp lifted citrus, zesty aroma detected when first lifting Citrus zest the lid.

Flowery A heady floral, fragrant note. Jasmine tea

Blackcurrant / berry A fruity, blackcurrant or red currant aroma, berry-like. Blackcurrant jam

Aroma Herby / grassy A slightly dried herbaceous note, herby and grassy almost Green snow pea like pipe tobacco. Chocolate A chocolate note, cocoa-like. Dark chocolate grated Burnt An acrid, burnt and charry note. Ash from the smoker

44

Smoky A smoky note with a little ashy. Ash from the smoker Toasted A lighter toast aroma, pleasant like freshly toasted bread. Toasted white bread Cereal / caramel A sweet caramel aroma like cereal and molasses, dark Original caramel, oat bran, pinch of brown sugar. dark brown sugar, drops of water Roasted nutty A roasted nutty aroma like roast peanuts or pecans Crushed hazelnuts (with skins) and roasted peanuts. Woody Aroma of wood chips, cedar and fragrant almost medicinal Essential oil essence in wood chips and clove-like Pepper spice A gragrant spicy pepper aroma, like black pepper, Sichuan Black pepper and other spices crushed pepper, star anise. Aroma attributes Definition Reference standards ‘other’ aroma NA Tart / sourness The sourness of the sample Nil

Saltiness The saltiness of the sample Nil

Bitterness The bitterness of the sample Nil Astringency The astringency, drying sensation in the mouth and cheek Nil pouches Oiliness A mouth coating oiliness sensation Nil

Flavour/mouth feel Overall flavour The overall intensity of flavour sensations Nil intensity ‘other’ flavour NA

3.2.6.3 Difference testing

Formal evaluation of samples was conducted using a paired comparison difference test [360, 361] involving a total of 86 participants, made up of were staff and students from the Health and Food Sciences Precinct, Coopers Plains, Brisbane, Australia. A number of sessions were conducted throughout two subsequent days with up to 12 participants attending each session. As per the paired comparison test, samples were presented to panelists according to a balanced design in pairs and participants were asked to evaluate the aroma and identify the sample with a stronger (more intense) aroma. The paired comparison is a forced choice technique. Data were obtained and processed with Compusense® five software (5.0.49, Compusense Inc., Guelph, Ontario, Canada) based on a two-sided directional paired comparison [362].

Sensory evaluation was conducted in the purpose-built sensory laboratory and focus group room of the Health and Food Sciences Precinct, Coopers Plains, Brisbane, Australia. The 12- individual booths were temperature and light controlled and equipped with computers for data collection. The focus group room, used for preliminary screening, was a board-room style set up with a round table, chairs and a smartboard for collecting information from discussions.

45

3.2.7 Statistical analysis

All the physicochemical analyses were conducted in triplicate (at least). Significant analysis was carried out with ANOVA single factor analysis, except sensory evaluation analysis. Two-sided paired end analysis was conducted for sensory evaluation and the comparison significance was determined using p-value less than 5%, 1% and 0.1%.

3.3 Results

3.3.1 Influence of canopy position on the size and chemical compositions of coffee beans

Table 3 summarises the results for the physical traits of the coffee beans from the upper and lower canopy, including the bean size (width, length and thickness) and weight. Statistically, there was no significant difference in bean size and weight between samples collected from the upper and lower canopy. The results from the non-volatile compositional analysis of the coffee beans collected from the upper and lower canopy are shown in Figure 11. Coffee beans from the lower canopy were significantly higher in caffeine (1.11 ± 0.04 vs 1.07 ± 0.07%), trigonelline (0.93 ± 0.02 vs 0.90 ± 0.01%) and sucrose (8.6±0.1 vs 7.7±0.2%) content (p < 1%) compared to the upper canopy according to chemical analysis.

Table 3 Bean weight and size in coffee beans from the upper and lower canopy (n=4, mean±SD).

Bean size Coffee bean Samples Weight g/100 beans Length (mm) Width (mm) Thickness (mm)

UPPER 18.6 ± 1.0 9.6 ± 0.6 7.5 ± 0.4 4.0 ± 0.2

LOWER 19.2 ± 0.3 9.7 ± 0.6 7.2 ± 0.4 4.1 ± 0.3

P-value 0.1 0.4 0.2 0.2

46

Figure 11 Chemical components of coffee beans (caffeine, trigonelline and sucrose) as influenced by canopy position. Note: A. Coffee beans (Coffea arabica cv. K7) were collected from the lower and upper canopy (above and below 170cm). B. Histogram of the caffeine, trigonelline and sucrose content in coffee beans from upper and lower canopy (percentage of dry matter) (n=4, mean±SD). Note: ** indicates p < 1%.

3.3.2 Influence of canopy position on the aroma of coffee beans

Preliminary sensory screening suggested that the overall aroma intensity of the samples was different with the lower canopy being identified as having a more intense aroma (Table 4). Whole beans from the upper canopy had low aroma intensity, while those of the lower canopy were described by the panel as having a medium intensity odor. The aroma of the whole unground beans from the upper canopy was uniquely described as smelling of milk, sweet, caramel, fresh grassy, woody and old perfume. The beans from the lower canopy beans were described as strong, roasted, fruity (citrus, raspberry, lime), beef jerky and chrysanthemum. Not surprisingly, a much stronger odor was produced by ground coffee compared to the whole unground bean samples. The outstanding aroma of the ground beans from the upper canopy coffee was described as pipe tobacco, herby, woody, earthy, mushroom and dry dusty. The aroma of the ground beans from the lower canopy was distinguished as having a higher intensity, which was a deeper aroma, described as dark chocolate, orange and citrus zest. Following the preliminary screening, a difference test (paired comparison) was conducted which indeed demonstrated that both ground and whole selected coffee beans samples collected from the lower canopy were significantly more

47 intense in terms of aroma compared to samples from the upper canopy (p values of <1% and <5% for ground and whole beans, respectively). An increase in the intensity of aroma explains an improvement in organoleptic coffee quality [32, 363].

Table 4 Physical, chemical and sensorial profiling of coffee beans from the upper and lower canopy (n=12).

Whole bean Ground Bean

Upper canopy Lower Canopy Upper canopy Lower Canopy

Medium intensity, Low aroma intensity, milk, Medium odour intensity, hazelnut, roasted, nut, vanilla, sweet candy, brown herby, caramel, roasted nuts, smoky, tobacco, strong, Higher intensity, deeper, sugar, caramel, treacle, intense, onion/pepper, chico babies, cinnamon, dark chocolate, smoky, golden syrup, malt, burnt, chocolate, burnt, toast, spice, chocolate, lime, charred, toast, nutmeg, Brazil nut, berry, smoky, woody, pipe tobacco, pepper, malt, vanilla, citrus, nutty, cedar, burnt, citrus fresh, grassy, pepper, spice, blackcurrant, molasses, raspberry, beef jerky, zest, spicy, BBQ spice, milk-chocolate, old jammy, mushroom, earthy, Chinese medicine, mixed spice, orange, treacle perfume/cologne, woody, dry dusty, Szechuan pepper, mulberry/chrysanthemum, flowery tree sap, floral, smoky flowery

3.4 Summary

Shade has been found to enhance coffee flavor and increase caffeine concentration [227]. However, decreased trigonelline and sucrose content (increased expression of SUS2) [32, 35, 228, 364] has been reported when the whole plants are shaded. Shade provides lower growing temperature, which delays coffee bean maturity about one month [35]. Sucrose content has been reported to increase in response to cold [166]. The temperature difference between the upper and lower canopy has been reported to be 1-4 oC [365]. The increase in caffeine and decrease in sucrose were also observed in coffee beans grown in high altitude, which provides good quality beans. However, there was an increase in trigonelline levels in coffee beans from high altitudes. Gene expression analysis yet to be conducted in these two systems limit our understanding of the basic reasons and applications for the development of coffee quality.

This study reveals high-quality coffee beans from the lower canopy was associated with more intense aroma (improvement in organoleptic coffee quality) as well as improved sucrose, trigonelline and caffeine content. As discussed in chapter 2 and the introduction part of this chapter, sucrose and trigonelline are flavour precursors for coffee beverage. Significant increase in caffeine, trigonelline and sucrose content in coffee beans from the lower canopy is

48 likely to contribute to the deeper sensory. However, few research questions need to be considered for the downstream analysis. 1) Are these chemical differences regulated by gene expression? 2) Which genes were involved if any? 3) At what stage this genetic control was more important/dominant? 4) What are the gaps to answer these questions? 5) Are there any other important genes that are important in these changes? Therefore, the downstream chapters were designed to answer these questions.

49

Chapter 4 Generate a coffee bean transcriptome reference

The aim of this chapter is to generate a transcriptome reference for the downstream analysis on a comprehensive scale.

4.1 Introduction We have discussed in chapter 2 that transcriptome analysis is more direct and comprehensive to obtain the changing gene expression under different environmental conditions. However, one of the research gaps is that a transcriptome reference is needed before the global gene expression analysis can be conducted. A reference transcriptome will not only show the function of the gene, but the reference sequence for the reads generated to be aligned to and to identify the differences in gene expression. Moreover, Arabica coffee studied in this research is a polyploid system. Polyploidy creates a complicated transcriptome with diverse transcript isoforms. As an important evolutionary process in plants, polyploidization generates new species and increases biodiversity [366]. A balance of genetic and biochemical features is required for the polyploid to survive while carrying multiple genomes in the same nucleus [367]. Genetic changes associated with the formation of polyploids include gene function, which may remain unchanged, or diversification among the multiple homeologs, leading to neofunctionalization, subfunctionalization, or pseudogenization [368]. Alternative splicing and polyadenylation also contribute further to the diversity of transcripts [369, 370]. Additionally, different 5’UTRs account for transcript variation, however, limited information is available on this for especially for plant genes. The diversity of 5’UTR includes different functional motifs, like upstream open reading frames or introns harboured in this area, influencing post-transcription expression [371, 372]. TRNA sequencing (RNA-Seq) reviewed in chapter 2 makes it possible to capture the identity of these transcripts. Generating a reference transcriptome is essential for studying variation in expression of genes and the influence of genotype or environment on their expression [373, 374]. Most studies generate a reference transcriptome by short-read sequencing and reconstruct the transcriptome by the assembly and/or mapping of reads to other available reference genomes [375-377]. However, this is difficult for long transcripts, repetitive sequences and transposable elements. It is particularly challenging for complex polyploid genomes [378]. The LRS technology (e.g. PacBio) has recently become available and this technology overcomes these difficulties by generating sequence information for the full length as a single sequence read, including very long transcripts (e.g. those exceeding 10kb)

50 without the need for further assembly. This technique has been applied in a few plant studies and provides further information on transcript diversity, including alternative splicing and alternative polyadenylation [369, 370]. A high-quality reference genome and annotation are not yet available for Arabica coffee. However, a draft genome is available for one of the parents, C. canephora [48]. Arabica coffee is produced in limited high altitude tropical environments and is threatened by climate change. Understanding the genetic and environmental control of coffee quality will be facilitated by the availability of detailed knowledge of the transcriptome of the coffee bean. This study used LRS by Pacbio Iso-seq to characterise the Arabica coffee bean transcriptome including beans from immature, intermediate and mature stages to explore the complex polyploid system and establish a reference transcriptome for future studies of gene expression. 4.2 Plant material and methods 4.2.1 RNA sample preparation

Fruits at different development stages (immature, intermediate and mature fruits) of Coffea arabica var. K7 (Figure 12) were harvested from Green Cauldron Coffee, Federal, Australia. Five coffee trees were selected randomly and ten coffee fruits (five fruits from each of the upper and lower canopy of each tree) were collected separately for each tree and each stage of development. Samples were collected in triplicate. In total, 450 coffee fruits (900 beans) from 15 trees were collected. Once each fruit was harvested, the pericarp was removed immediately with a scalpel in 20 s or less. The coffee beans were immediately frozen in liquid nitrogen, transported on dry ice and stored at -80 oC until further use. Total RNA was extracted from coffee fruits as described previously [379]. Equal amounts of RNA from each of the 90 (3 replicates of 5 trees at 2 levels in the canopy and 3 stages of development) extractions were combined to provide a representative sample for sequencing to generate a reference transcriptome. Afterwards, combined RNA was assessed for integrity using an Agilent RNA 6000 nano kit and chips on a Bioanalyzer 2100 (Agilent Technologies, California, USA) and processed further for cDNA preparation.

Figure 12 Coffee fruits of immature (Green), intermediate (Yellow) and mature (Red) stages

51

4.2.2 cDNA preparation

The Pacbio Iso-seq protocol was used for cDNA preparation. cDNA was synthesised using a Clontech SMARTer PCR cDNA Synthesis kit (ClonTech, Takara Bio Inc., Shiga, Japan) and amplified using a KAPA HIFI PCR kit (Kapa Biosystems, Boston, USA). The double- stranded cDNA was split into two sub-samples. One was used directly for sequencing. The other set was normalised to equalise transcript abundance and obtain rare sequences. The cDNA was purified for normalisation using a QIAquick PCR Purification Kit (Qiagen). The purified cDNA was precipitated and normalised with a Trimmer-2 cDNA normalisation kit (Evrogen, Moscow, Russia). The resulting cDNA was evaluated and quantified using an Agilent DNA 12000 Kit and Chips on a Bioanalyzer 2100 (Agilent Technologies, California, USA). The same amount of non-normalized (8µg) and normalised (8µg) cDNA was submitted for Pacbio Isoform Sequencing platform (Iso-Seq). Samples were subjected to a Pacbio Iso-Seq protocol through purification, size selection (Blue Pippin system), re-amplification, SMRTbell template preparation and Iso-seq on a Pacbio RS II platform. A size selection protocol was applied as smaller cDNAs are more abundant and would otherwise be preferentially sequenced. Four Bluepippin bins were selected for non-normalized cDNA sequencing, with size ranges of 0.5-2.5kb, 2-3.5kb, 3- 6.5kb and 5-10kb, respectively since Pacbio sequencing preferentially sequences short DNA fragments. Two bins were selected for normalised cDNA sequencing, 2-3.5kb and 3-6.5kb, as the normalisation biases against longer sequences. 4.2.3 Raw read processing and error correction

Sequence data was processed through the RS IsoSeq (version 2.3) pipeline [380]. The first step was to remove adapters and artefacts to generate reads of insert (ROIs) consensus sequences. Short sequences less than 300 bp were removed as the Bluepippin cDNA size selection starts from 500 bp, where some sequences less than 500 bp have a chance to be sequenced. Non-Chimeric ROIs sequences were filtered into two groups of sequences comprised of full-length ROIs sequences and non-full length ROIs sequences. Full-length (FL) ROIs sequences were identified based on the presence of the 5’-adaptor sequence, the 3’ adapter sequences (both used in the library preparation) and poly (A) tail. Further, FL ROIs sequences were passed through the isoform-level clustering (ICE). ROI sequences were used to correct errors (polish) the isoform sequences using the Quiver software module. The polishing process of Quiver generated two isoform sequence files, one with high quality (HQ) isoform sequences and the other with low quality (LQ) isoform sequences corresponding to

52 an expected accuracy of ≥ 99% or below respectively. LQ output (or non-FL coverage sequences) is useful in some cases, as it may result from rare transcripts or lower coverage sequences. And these low coverage sequences can be further used to correct errors in HQ output. The Primer IIA sequence motifs (used in the library preparation) which escaped removal at the ROIs stage corresponded to 11 sequences were trimmed using CLC genomic workbench 9.0 (CLC, QIAGEN, CLC Bio, Denmark). After combining the HQ and LQ transcripts, further clustering was processed with CD-HIT-EST (c=0.99) [381]. In the following step, the contaminant sequences were removed by CLC stepwise as follows: 1) Chloroplast transcript sequences were identified by BLASTn to the C. arabica complete chloroplast genome (GenBank: EF044213.1); 2) Mitochondrial transcripts were characterised by BLASTn to Nicotiana tabacum and Vitis vinifera complete mitochondrial genomes (BA000042.1 and FM179380.1); 3) Ribosomal sequences were detected by BLASTn to the reported C. arabica, C. canephora and C. eugenioides ribosomal genes (AJ224846, EU650386, DQ153609, AF416459, EU650384, EU650385, AF542981, AF542990, JX459583, JX459584, JX459585, JX459586, JX459587, DQ153593, AF542982, DQ423064, DQ153588, DQ153621, AF542986); 4) Virus, viroid and prokaryote contaminants were identified with BLASTn to their reference genomes from the NCBI database (April 4th, 2017). Prokaryotic contaminants were screened with available reference genomes from NCBI (Feb 9th, 2017); 5) Fungal sequences were investigated by BLASTn to fungal proteins (April 4th, 2017). All the above analyses were processed one after another with a maximum E-value threshold of 1e-10. From the BLASTn results, significant matches were filtered with a bit score (A) ≥ 300 as well as identity ≥ 80%. In each step, the filtered significant sequences were processed further with cloud BLASTn to the NCBI non-redundant database (bit score: B) to further confirm the matches. This validation step was confirmed by comparison of the bit score (comparison of value A and B). If the higher bit score was associated with a contaminant sequence in the BLASTn (A>B), then the sequence was discarded. In total 526 sequences corresponding to chloroplastic (200), mitochondrial (264), ribosomal (37), viral (0), viroid (0), prokaryotic (0) and fungal (25) contaminant sequences, respectively, were removed in this process. Sequence quality was then accessed with the Fasta Statistics through Galaxy/GVL 4.0 [382]. This set of Iso-seq processed isoforms was used for further analysis and hereafter named the “Coffee long read sequencing (coffee-LRS) isoforms” [341]. The term ‘isoforms’, or ‘isoform sequence’ or ‘transcript’ used in this study represent individual sequences from the coffee-

53

LRS isoforms, while “transcript variants” indicate different transcript of a gene, including alternative spliced variants and homeologs. 4.2.4 Transcriptome annotation

A number of databases were used for annotation of the coffee-LRS isoforms described as follows. 1) The plant Geninfo identifier (GI) list was downloaded from NCBI Protein Entrez (May 2nd, 2017, 8,431,379 items). The plant proteins were retrieved from the NR database using this GI list, yielding 5,099,147 sequences (NR-plant). Then, the full set of the coffee- LRS isoforms was submitted to stand-alone BLASTx against the NR database below 1e-10; 2) Sequences without hits from step 1 were submitted further to NCBI non-redundant nucleotide sequences (NT, May 5th, 2017) BLASTn at 1e-10; 3) Sequences without a hit from step 2 were processed further with BLASTn (1e-20) to C. canephora coding sequences (CDS) with UTR and C. arabica EST database [383, 384]; 4) The output of BLASTx was filtered with query coverage (Qcovs), cumulative identity (ID) and sequence length into three categories, high, medium and low quality annotation. Query coverage indicates the input coffee-LRS isoforms covered by the matched sequences. Cumulative identity represents the identity length to the aligned length (AL). ID can be expressed as the ratio of the sum of identity length to the sum of the aligned length of all the High-scoring Segment Pairs (HspS) of a subject. The four databases above, NR plant, NT, C. canephora CDS with UTR and C. arabica EST database, are named as FOUR databases in this manuscript. Finally, all the BLASTx and BLASTn results were processed by function annotation with BLAST2GO. The Blast2GO Pro 4.0 (North America, US: USA2 Version: b2g_Sep 16) pipeline was based on default settings [385]. InterProScan (IPS) was used to search sequence protein domains from EBI databases to improve annotation (North America, US: USA2, Version: b2g_Sep 16). In the follow-up phase, Blast2GO Mapping, Annotation and Annex functions were applied to retrieve GO (gene ontology) terms, select reliable annotations and increase the number of annotated isoforms respectively. The GO-slim tool was used against the plant database to provide plant generic GOs. Finally, GO enzyme mapping and KEGG (Kyoto encyclopaedia of genes and genomes) pathway maps were loaded. 4.2.5 Case studies with the caffeine and sucrose genes

Two case studies were performed with genes encoding caffeine and sucrose biosynthesis pathway (caffeine and sucrose genes) to investigate specifically the quality, advantage and additional potential of the coffee-LRS isoforms. Reported coffee caffeine and sucrose

54 candidate genes were downloaded from the European Nucleotide Archive (EMBL-EBI) (Table 5 and Table 6). Table 5 Details of caffeine candidate genes, putative transcript variants annotated and 5‘UTR extension information

Putative transcript Caffeine candidate Accession length 5'UTR Species Source Abbreviation completeness variants from LRS genes number (bp) extension isoform sequences

AB048793 C. arabica mRNA CaXMT1 1,316 YES Genomic JX978514 C. arabica G-CaXMT1 1,987 YES c69597/f1p2/1421 xanthosine DNA c154338/f1p2/1360 YES methyltransferase 1 DQ422954 C. canephora mRNA CcXMT1 1,316 YES c71416/f3p3/1376 Genomic JX978509 C. canephora G-CcXMT1 1,994 YES DNA xanthosine Genomic JX978515 C. arabica G-CaXMT2 2,038 YES Not identified - methyltransferase2 DNA

AB048794 C. arabica mRNA CaMXMT1 1,298 YES

7-methylxanthine N- Genomic c20397/f5p1/1361 YES methyltransferase 1 JX978511 C. arabica G-CaMXMT1 1,838 YES DNA HQ616707 C. canephora mRNA CcMXMT1 1,222 YES Genomic JX978507 C. canephora G-CcMXMT1 1,829 YES DNA AB084126 C. arabica mRNA CaMXMT2 1,155 YES 7-methylxanthine N- c10402/f2p3/1277 YES methyltransferase 2 Genomic JX978512 C. arabica G-CaMXMT2 2,010 YES DNA AB084125 C. arabica mRNA CaDXMT1 1,155 YES 3,7- Genomic c25904/f2p0/977 dimethylxanthine N- JX978510 C. arabica G-CaDXMT1 2,063 YES YES DNA c71881/f6p2/1386 methyltransferase 1 DQ422955 C. canephora mRNA CcDXMT1 1,364 YES

KJ577793 C. arabica mRNA CaDXMT2 1,155 YES 3,7- c63815/f1p2/1273 dimethylxanthine N- c48759/f1p1/1517 YES methyltransferase 2 Genomic c26870/f6p5/1402 KJ577792 C. arabica G-CaDXMT2 2,006 YES DNA

Table 6 Details of sucrose candidate genes, putative transcript variants annotated and 5‘UTR extension information.

55

Candidate genes Accession Species Source Abbreviation length completeness Putative transcript 5'UTR number (bp) variants from LRS extension isoform sequences

Sucrose synthase 1 AM087674.1 C. arabica mRNA CaSUS1 2,979 YES c86432/f7p9/4842 YES c91298/f1p1/3137 c84406/f3p18/2975 c62911/f29p21/2965 DQ834312.1 C. canephora mRNA CcSUS2 2,989 YES c92344/f1p26/4662 c92296/f1p5/4676 c89510/f1p6/4592 c106591/f2p0/4381 AJ880768.2 C. canephora Genomic G-CcSUS1 3,957 exon 1-13 c72639/f25p28/2961 DNA

Sucrose synthase 2 AM087675.1 C. arabica mRNA CaSUS2 2,889 YES c73322/f3p2/3080 YES c75363/f3p2/2906 AM087676.1 C. canephora Genomic G-CcSUS2 5,672 exon 1-15 DNA

Sucrose phosphate DQ834321.1 C. canephora mRNA CcSPS1 3,150 YES c51110/f2p0/3136 YES synthase 1 DQ842233.1 C. canephora Genomic G-CcSPS1 8,215 YES DNA

Sucrose phosphate DQ842234.1 C. canephora Genomic G-CcSPS2 1,550 NO c103631/f1p2/4695 YES synthase 2 DNA c88660/f2p0/4282 c106342/f1p4/4274 c104672/f1p1/4440 (reverse)

For potential caffeine candidate genes, coffee-LRS isoforms were processed with BLASTn (1e-20) against the reported caffeine genes. Sequences with hits to the reported caffeine genes were submitted to BLASTx (1e-20) with the NR database to confirm whether they were caffeine genes (higher bit score). Confirmed transcripts (potential caffeine isoforms) and sucrose isoforms annotated by Blast2GO (potential sucrose transcripts) were further evaluated with Geneious 10.0.4 by aligning back to the reported candidate genes in allele level [386]. Motif analysis was conducted with default parameters except for “ten motifs” selected with MEME 4.11.2 [387]. UTRscan was used for UTR functional motifs annotation [388].

56

4.2.6 Comparison to other available coffee databases

To compare with available coffee sequences, the full coffee-LRS isoforms were processed with BLASTn (1e-20) to C. canephora CDS with UTR and C. arabica EST database, respectively and the other way around [383, 384]. The C. eugenioides transcriptome (young leaves and mature fruits) from Illumina was also used in the comparison [389]. 4.2.7 Novel genes

Coffee-LRS isoforms without hits to the FOUR databases were submitted to the Rfam database by Blast2GO Pro package for non-coding RNA analysis [390]. Sequences without hits to Rfam were probably novel genes in coffee. 4.2.8 Analysis of long sequences

In order to explore the advantage of using the LRS PacBio platform to obtain long sequences, the BLASTx and BLAST2GO functional annotation result for the coffee-LRS isoforms longer than 10kb were extracted from the total dataset. 4.3 Results 4.3.1 Overview of full-Length RNA molecules from long-read sequencing

A total of 2,618,905 raw reads were generated from LRS platform, which yielded 443,877 reads of insert. After 8,842 short sequences (less than 300 bp) were removed, 233,464 full- length (FL) and 201,571 non-full-length (NFL) reads were generated. The individual isoforms were sequenced in average five times. In total, 95,995 coffee-LRS isoforms were recovered after sequences representing chloroplast, mitochondrial and ribosomal transcripts were removed (Table 7). The length of the sequences in this dataset ranged from 301 bp to 23,335 bp, with an average length of 3,236 bp. The GC content was 41.4% and the N50 was 4,865 bp. Table 7 Arabica long-read sequencing isoforms compared to Coffea canephora coding sequences and Coffea arabica EST sequences

average min GC N50 max_length Number of Different datasets length length content % (bp) (bp) sequences (bp) (bp) Coffea arabica EST 44.7 734 662 32 3,584 35,153 database 1 [383] Coffea canphora coding 42.6 2,046 1,616 45 17,206 25,570 sequences with UTR 2 Coffea arabica long-read 41.4 4,865 3,236 301 23,335 95,995 sequencing isoforms

57

Table 8 BLAST output filtering with query coverage and cumulative identity. Note: Qcovs (query coverage); ID (cumulative identity); NR plant, NCBI non-redundant plant protein database; NT, NCBI non-redundant nucleotide database; #sequences, number of sequences.

BLASTx output (NR plant) High 300-1,000 bp 1,000-3,000 bp 3,000-5,000 bp >5,000 bp Qcovs (%, ≥) 60 50 40 30 ID (%, ≥) 70 70 60 60 #sequences 34,719 Medium 300-1,000 bp 1,000-3,000 bp 3,000-5,000 bp >5,000 bp Qcovs (%, ≥) 50 40 20 10 ID (%, ≥) 60 60 50 50 #sequences 13,655 Low 300-1,000 bp 1,000-3,000 bp 3,000-5,000 bp >5,000 bp Qcovs (%, ≥) 9 4 2 1 ID (%, ≥) 22.69 22.71 25.50 24.01

HSP length range (bp) 88-961 92-2,844 88-4,938 97-14,424 #sequences with hit 40,341 #sequences without hit to NR plant 7,280 BLASTn output (NT) with 7,280 sequences 300-1,000 bp 1,000-3,000 bp 3,000-5,000 bp >5,000 bp Qcovs (%, ≥) 5 2 1 1 ID (%, ≥) 73.32 71.83 75.51 76.09 HSP length range (bp) 41-962 37-1559 32-4219 50-5578 #sequences with hit 1,981 #sequences without hit to NR plant and NT 5,299 BLASTn output (C. canephora CDS with UTR and C. arabica contigs) with 5,299 sequences Putative novel genes in coffee 1,217

Table 9 Arabica long-read sequencing transcriptome annotation with different databases

Number of sequences % of sequences Databases annotated annotated Long-read sequencing 95,995 - transcriptome BLAST 94,709 98.66 Mapped 78,571 81.85 InterProScan 70,774 73.73 InterProScan GOs 33,605 35.01 GO slim 58,050 60.47 KEGG 11,489 11.97

The BLASTx output (against NR plant) was divided into three groups, high, medium and low quality based on Qcovs, ID and sequence length (Table 8). A number of 34,719 (high), 13,655 (medium) and 40,341 (low) sequences were grouped into each quality groups, respectively. Thereafter, 7,280 sequences without hits were processed with BLASTn to NT database and resulting in 1,981 sequences with hits. A total of 5,299 sequences without a hit were further accessed with C. canephora CDS and UTR and C. arabica contigs. Finally, there were 1,217 sequences with no hits to any of the above databases (FOUR databases).

58

4.3.2 Functional Annotation

Functional annotation of the coffee-LRS isoforms was investigated using different databases. The data in Table 9 shows that 94,709 sequences (98.66%) had hits to NR plant proteins, NR sequences, Coffea canephora CDS and UTR and Coffea arabica contigs as well as Rfam. A total of 70,774 sequences (73.73%) matched to IPS protein domains with 33,605 IPS GOs (35.01%). A number of 78,571 sequences (81.85%) had identified GOs. After the GOs were merged, GOs of 58,050 sequences (60.47%) matched with GO-slim (plant).

Species distribution of BLASTx (NR-plant) output

1,500,000

1,250,000

1,000,000

750,000

# BLASThits # 500,000

250,000

0

others

Ipomoea nil Ipomoea

Glycine max Glycine

Juglans regia Juglans

Vitisvinifera

Prunus persica Prunus

Citrus sinensis Citrus

Jatropha curcas Jatropha

Malus domestica Malus

Coffea canephora Coffea

Theobroma cacao Theobroma

Nelumbo nucifera Nelumbo

Solanum pennellii Solanum

Manihot esculenta Manihot

Sesamum indicum Sesamum

Capsicum Capsicum annuum communis Ricinus

Nicotiana tabacum Nicotiana

Populus euphratica Populus

Erythranthe guttata Erythranthe

Nicotiana attenuata Nicotiana

Nicotiana sylvestris Nicotiana

Solanum tuberosum Solanum

Gossypium hirsutum Gossypium

Gossypium arboreum Gossypium

Gossypium raimondii Gossypium

Pyrus x bretschneideri Pyrusx

Solanum lycopersicum Solanum

Nicotiana tomentosiformis Nicotiana Daucus carota subsp. sativus Daucussubsp. carota

Figure 13 Species distribution of coffee long read sequencing isoforms according to the result of BLASTx against NCBI non-redundant plant proteins (NR-plant). Note: #, number. Of all the hits to the NR plant proteins from BLASTx, the coffee-LRS isoforms (maximum 50 hits to each sequences) had the highest number of hits to the N. tabacum (tobacco, 174,308 hits), followed by C. canephora (142,656 hits), V. Vinifera (grape, 134,025 hits) and Theobroma cacao (cacao, 132,336 hits) proteins (Figure 13). Most hits found in tobacco were probably because the tobacco database is more extensive and well annotated than those of other related species, like C. canephora. For top-hit species, there is no doubt the majority of the sequences has top-hit with the progenitor, C. canephora (73,587 sequences), followed by Sesamum idicum (1,321 sequences) and Nicotiana tabacum (767 sequences). (Figure 14). The

59

NR-plant database consists of few proteins sequences from C. arabica as reflected by just 485 protein sequence hits and is ranked seventh in the top-species hit list. This indicates the limit information on C. arabica. Of the 33,512 sequences (34.91%) with IPS GOs, cytochrome P450 (IPR001128, 353 matches) had the most sequence matches among the IPS families (Figure 15). Top species distribution of BLASTx (NR-plant)

100000

10000

485 1000

100 # BLASThits #

10

1

others

Ipomoea nil Ipomoea

Juglans regia Juglans

Vitisvinifera

Cajanus cajan Cajanus

Coffea arabica Coffea

Citrus sinensis Citrus

Jatropha curcas Jatropha

Ziziphus jujuba Ziziphus

Coffea canephora Coffea

Theobroma cacao Theobroma

Nelumbo nucifera Nelumbo

Solanum pennellii Solanum

Manihot esculenta Manihot

Sesamum indicum Sesamum

Capsicum Capsicum annuum communis Ricinus

Nicotiana tabacum Nicotiana

Eucalyptus grandis Eucalyptus euphratica Populus

Erythranthe guttata Erythranthe

Nicotiana attenuata Nicotiana

Nicotiana sylvestris Nicotiana

Solanum tuberosum Solanum

Gossypium raimondii Gossypium

Solanum lycopersicum Solanum

Nicotiana tomentosiformis Nicotiana

Dorcoceras hygrometricum Dorcoceras Daucus carota subsp. sativus Daucussubsp. carota

Cynara cardunculus var. cardunculusscolymus Cynara

Figure 14 Top Species distribution of coffee long read sequencing isoforms according to the result of BLASTx against NCBI non-redundant plant proteins (NR-plant). Note: #, number.

60

InterProScan families distribution

100000

10000

1000

100

10 Number of Number sequences

1

Figure 15 Distribution of InterProScan families from coffee long read sequencing dataset

61

Biological process (BP, 56,230 sequences) was more abundant than cellular component (CC, 44,528 sequences) and molecular function (MF, 45,604 sequences) (Figure 16). Within these functional groups, the highest number of sequences were annotated with the biosynthetic process (11,627 sequences, 20.68%), membrane component (21,175 sequences, 47.55%) and transferase activity (11,921 sequences, 26.14%). A total of 156 pathways with 921 enzymes were annotated by KEGG, associated with 11.97% of the whole dataset (11,489 sequences). Among these, starch and sucrose metabolism ranked as the fifth most abundant pathways, with 36 encoding enzymes and 766 isoforms annotated (Figure 17). The average number of coffee LRS isoforms encoding the 921 enzymes was 18 while the highest number was found in phosphatase (EC: 3.6.1.15, 2,969 sequences), encoding the purine metabolism and thiamine metabolism pathway. In comparison, only 802 sequences were associated with 142 pathways and 374 enzymes in C. eugenioides transcriptome and starch and sucrose pathway relating to 450 contigs was the most encoded pathway [29].

Figure 16 Pie chart and word cloud of coffee long read sequencing isoforms distribution to biological process, cellular component and molecular function

62

KEGG pathway distribution (TOP 30) 200 3,500 180 3,000 160 140 2,500 120 2,000 100 1,500 80 60 1,000 40 500

20 Number of Number enzymes Number of Number sequences 0 0

Figure 17 The KEGG pathway distribution of coffee long read sequencing isoforms. The candidate genes for the major caffeine candidate genes were not identified by KEGG pathway. To evaluate the annotated isoforms and their diversity, further analysis was performed with caffeine pathway. The sucrose pathway was also analysed as a case study as sucrose candidate genes were relatively long and highly diverse. Both of these pathways are important for the understanding of coffee quality [31].

4.3.2.1 Case study I: Isoform diversity in the caffeine biosynthesis pathway

The caffeine pathway has previously been widely studied (Figure 18a). Candidate genes and complete coding sequences of both transcripts and genomic DNA are available in public databases and can be used as well-established references for caffeine candidate gene analysis (Table 5). From the BLASTn output, 25 long-read transcripts were annotated and related to candidate caffeine genes [21]. Further alignment suggests ten high quality isoforms were likely to be putative caffeine genes, including three transcript variants of XMT1, one of MXMT1, one of MXMT2 together with two of DXMT1 and three of DXMT2. All genes encoding the caffeine primary pathway, except the XMT2 gene, can be found in the bean transcriptome (Figure 18 and Table 5). The length distribution of these isoforms ranged between 977 and 1,517bp.

63

Figure 18 Putative transcript variants from long-read sequencing aligned to reference caffeine genes. Note: a. Main caffeine biosynthesis pathway in coffee, adaptive from [391]. b. Alignment of three Arabica putative XMT1 variants from long-read sequencing (c69597/f1p2/1412, c154338/f1p2/1360 and c71416/f3p3/1376), Coffea arabica and Coffea canephora XMT1 (CaXMT1 and CcXMT1) to Arabica XMT1 genomic DNA sequence (G-CaXMT1). c. Possible alternative polyadenylation of putative XMT1 Iso-seq variant (c25904/f2p0977) from long-read sequencing; G-CaDXMT1, Arabica DXMT1 genomic DNA sequence; CaDXMT1, DXMT1 coding sequence; d. Two polyadenylation signals were identified in 3’ends of c25904/f2p0/977; e. Possible alternative splicing (intron retention) in one of the putative DXMT2 variants (c48759/f1p1/1517) from long-read sequencing transcripts; G- CaDXMT2, Arabica DXMT2 genomic DNA sequence; CaDXMT2, Arabica DXMT2 coding sequence. (Note: black colour in the alignment means different nucleotides to reference sequence, Arabica genomic XMT1, while grey colour means the same nucleotides as the reference.). Importantly, all ten isoforms were extended at the 5’ UTR region compared to the corresponding sequences reported in Arabica and Robusta coffee, while eight isoforms were longer at the 3’ end (Figure 18b, 18c, 18e and Figure 19). The most extended isoform (c695597/f1p2/1421) was 136 bp longer than the previously reported candidate genes (CaXMT1, Figure 18b). Nine isoforms were found to be longer than the reported genomic DNA sequences. The other isoform was likely to have resulted from an alternative polyadenylation event (c25904/f2p0/977, Figure 18c) as two potential polyadenylation signals (AAUAAA) were identified in the 3’ UTR (Figure 18d). Alternative splicing was also presented in caffeine isoforms, for example, intron retention was detected in one of the putative DXMT2 isoforms (Figure 18e).

64

Figure 19 Putative transcript variants from long-read sequencing aligned to reported genes encoding 7-methylxanthine N-methyltransferase. Note: G-CaMXMT1, Arabica MXMT1 genomic DNA sequence; G-CaMXMT2, Arabica MXMT2 genomic DNA sequence; MXMT1, MXMT2, MXMT1, MXMT2 coding sequence; Ca, Arabica coffee; Cc, Robusta coffee; c20397/f5p1/1361, c10402/f2p3/1277, coffee long read sequencing isoforms. Black colour in the aligned area (correspond to G-CaMXMT reference genes) indicates sequences different to the reference. Coffee LRS isoforms encoding XMT1 (Figure 18b), MXMT1 (Figure 19) and DXMT2 (Figure 18c, 18e) were better aligned to the corresponding C. canephora isoforms, individually (higher identity, Figure 18c and Figure 19). This indicates these transcript variants were potentially C. canephora sub-genome copies. In contrast, isoforms encoding XMT2, MXMT2 and DXMT1 were poorly aligned with C. canephora isoforms (more variants) and were probably C. eugenioides sub-genome copies.

4.3.2.2 Case study II: Long sucrose isoforms provide insight into the complexity of the polyploid system

Sucrose genes were used to investigate the transcriptome sequence diversity of the polyploidy system. For the sucrose synthase 1 gene (SUS1), one of the important genes in the sucrose metabolism, nine transcript variants were identified (Figure 20, Table 6 and Figure 21a). Compared to c86432/f7p9/4842, the other eight transcript variants varied in motif replacement (motif 7 replaced motif 9 in c106591/f2p0/4381), deletion (for example c92344/f1p26/4662) and relocation (intron retention, c92296/f1p5/4676 and c91298/f1p1/3137). (Figure 21b). The length of these nine putative SUS1 transcript variants ranged from 2,961 to 4,842 bp. Table 10 Results of 5’ UTRs from long-read sequencing scanned with UTRdb. Note: uORF, Upstream Open Reading Frame.

No. Sequence name 5' UTR length (bp) uORF 1 c86432/f2p7/4842 2,131 12 2 c91298/f1p1/3137 347 2 3 c84406/f3p18/2975 242 2 4 c62911/f29p21/2965 218 0 5 c92344/f1p26/4662 1,981 10

65

6 c92296/f1p5/4676 1,884 12 7 c89510/f1p6/4592 1,871 11 8 c106591/f2p0/4381 1,683 11 9 c72639/f25p28/2961 224 0

Importantly, all the sucrose transcript variants studied in this research were extended in the 5’UTR region relative to previous reports, except for SPS1, c51110/f2p0/3136 (Table 6). Some transcript variants, such as the longest putative SUS1 sequence identified, c86432/f2p7/4842 (4,842 bp), extended 2,131 bp upstream of the C. canephora coding sequence (G-CcSUS1) and 1,994 bp upstream of the Arabica sucrose synthase 1 mRNA coding sequence (CaSUS1). The length of the 5’leading region of the SUS1 transcript variants ranged between 218 and 2,131 bp (Table 10). To understand the diversity in this region, the 5’ leading sequences of the nine putative SUS1 transcript variants were scanned using the UTRdb online server. A maximum of 12 upstream open reading frames (uORFs) were identified and the number was positively correlated with the length of the sequences. No uORFs were identified in the two transcript variants with short 5’UTR, c62911/f29p21/2965 (218 bp leader sequence) and c72639/f25p28/2961 (232 bp leader sequence).

66

Figure 20 Motif search results of putative sucrose synthase gene 1 from long read sequencing. Note: a. Ten motifs were annotated in 9 putative sucrose synthase 1 variants from long-read sequencing, analysed by MEME 4.11.2. b. Motif location of 9 putative sucrose synthase 1 variants. Different motifs were highlighted with red arrows and intron retention was shown with dashed boxes.

67

Figure 21 Putative variants from long-read sequencing aligned to the reference sucrose genes. Note: a. Possible sucrose metabolism in coffee adapted from [391]; SS, sucrose synthase; SPS, sucrose phosphate synthase; SP, sucrose phosphatase; INV, invertase; CINV, cell wall invertase; b. Alignment of 9 Putative Sucrose synthase variants from long-read sequencing and C. arabica sucrose synthase gene 1 (CaSUS1) to C. canephora genomic sucrose synthase 1 (exons 1-13) (G-CcSUS1 (1-13)); Green box highlights variants result from different sub- genome copies, while intron retention events were marked with the blue box highlight; c. polyploid expression when zooming green area in 100%; d. possible alternative splicing (intron retention) from a C. canephora sub-genome copy when zooming blue box in 100%; e. possible intron retention from a C. eugenioides sub-genome copy when zooming blue area in 100%.red line classifies two groups of variants as different sub-genome copies. Different

68 nucleotides compared to the consensus were highlighted in black in the alignment; f. Putative variants from long read sequencing aligned with C. canephora genomic sucrose phosphate synthase 2 sequence (G-CcSPS2); FWD, forward sequence; REV, reverse sequence. Different nucleotides compared to the consensus were highlighted in black in the alignment. The nine SUS1 transcript variants revealed transcript diversity that resulted largely from different copies from the progenitors. When aligned to G-CcSUS1 (C. canephora SUS1 genomic sequence), the top four putative SUS1 transcript variants showed high identity and consistent nucleotide variants (like the guanine highlighted at 3,726 bp in the consensus sequence, Fig. 21c), suggesting that these were copies from the C. canephora sub-genome. For example, compared to the consensus sequence, the same indels were present in 3,707bp and 3,733bp, a cytosine at 3,713bp and guanine at 3,715bp. Consistently, the sequence of intron retention in one of the top four sequences, c91298/f1/p1/3137 (Fig. 21d) shows high homology to the intron sequence of C. canephora. However, the bottom five transcript variants had a higher number of variations compared to G-CcSUS1 that are likely to be C. eugenioides sub-genome derived copies. The lower five transcripts had lots of variations compared to C. canephora intron 10, further indicating this group was from a different copy, probably C. eugenioides (Fig. 21e). Additionally, some alleles of G-CcSUS1 were common in nine putative Arabica SUS1 transcript variants and Arabica sucrose synthase 1(CaSUS1), such as the variant at 3,666 bp (Fig. 21e). This type of allele probably results from different genotypes. Polyploid expression patterns were also observed in SPP1 transcript variants, the top two alignments were similar to C. canephora and the other two were slightly different but related. All of the four transcript variants were longer in the upstream sequences while three extended further downstream than had previously been reported. Another essential potential of LRS is to explore sequences not yet complete or published. For instance, four transcript variants were identified from this research while SPS2 has only been identified in C. canephora rather than C. arabica (Fig. 21f). 4.3.3 Comparison to other available coffee databases

To understand the advantage and the diversity of this polyploid coffee transcriptome, a comparison was made with the available coffee database. More than twice the number of isoforms were identified in the tetraploid Arabica LRS transcriptome (immature, intermediate and mature fruits) compared with the C. eugenioides contigs (36,935 de novo assembled contigs, average length: 701 bp, from immature leaves and mature fruits), C. canephora CDS with UTR (25,570 sequences, from a variety of tissues, including fruits) and C. arabica EST database (35,153 contigs, including fruits) (Table 7) [14, 23, 29]. The coffee-LRS isoforms

69 show greater transcript length, diversity and a lower GC content. The N50 of the Pacbio dataset (4,865 bp) was more than three times longer and the average length was more than twice that of the other databases. The sequence distribution of C. arabica contigs peaks at 655 bp while C. canephora CDS with UTR reaches the largest number of sequences at 1,490 bp (Figure 22). Most of the sequences from the C. canephora CDS with UTR and the C. arabica EST database were less than 3,770 bp. By comparison, 39,917 coffee LRS isoforms (41.6%) were longer than 3,770 bp.

Figure 22 The distribution of the number of the coffee long read sequencing sequences (coffee LRS-sequences), C. canephora coding sequences with UTR, C. arabica contigs with length. Note: The horizontal axis is formatted in logarithmic scale. Results of the BLASTn analysis indicated that of the 95,995 coffee-LRS isoforms, 9,308 (9.7%) had no matches to the C. canephora CDS with UTR while 3,682 (3.8%) isoforms had no hits to the C. arabica contigs. This indicates coffee-LRS isoforms are very diverse compared to these two databases. Conversely, 9,167 (26.1%) of C. canephora CDS with UTR and 4,830 (18.9%) of C. arabica contigs had no hits to the coffee-LRS isoforms. These two sets of sequences without hits are probably sequences from leaf or other tissues not expressed in the tissues investigated in this study. 4.3.4 Novel genes

The 1,217 sequences without hits to the FOUR databases (NR plant proteins, NT database, C. canephora CDS with UTR and C. arabica EST database) were submitted to the Rfam server to predict non-coding RNAs (ncRNA). The four isoforms that matched were in three biotypes,

70 two transcripts were identified as CD-box snoRNA, one as HACA-box snoRNA and the other one as a miRNA (Table 11). Other than these, the other 1,213 sequences had no hit to the FOUR databases and Rfam are likely to be novel genes that have not been discovered in coffee or contaminants from other organisms with no sequence information to date [21]. Length distribution of this new dataset ranged from 325 to 19,189 bp. Table 11 Coffee long read sequencing isoforms annotated by Rfam (rfam.xfam.org).

Family Sequence Name Family ID Score E-Value GC% Start End Strand Biotype GOs Acc GO:0035068 c63645/f1p0/1090 RF00075 mir-166 92.7 8.50E-24 0.42 298 452 + Gene; miRNA GO:0035195 Gene; snRNA; GO:0005730 c36778/f1p0/867 RF00267 snoR64 69.7 1.50E-14 0.36 445 539 + snoRNA; CD-box GO:0006396 Gene; snRNA; GO:0005730 c42517/f1p0/1215 RF00360 snoZ107_R87 87.9 9.50E-20 0.49 276 384 + snoRNA; CD-box GO:0006396 Gene; snRNA; GO:0005730 c25730/f1p0/842 RF01227 snoR83 91.6 1.90E-18 0.45 183 321 + snoRNA; HACA-box GO:0006396

4.3.5 Long transcripts

In order to assess the value of LRS in discovering long sequences, 577 transcripts longer than 10 kb were further analysed. Functional annotation of this extremely long dataset shows the majority of the sequences (564 sequences, 97.8%) matched to the FOUR databases [21]. The HSP/Hit coverage distribution was relatively evenly distributed from 0 to 100% compared to the HSP/Seq coverage. In parallel, the majority of sequences distributed less than 50% HSP/Seq coverage and peaked at 6%, representing limited information of long sequences in the NR database. IPS matches were found for 352 sequences (61.0%) while 61 of them had IPS GOs. A total of 446 sequences (77.3%) were retrieved with GO terms, while 201 isoforms (34.8%) from these were also annotated with GO-Slim. In total, 144 sequences were classified into the biological process, with 92 sequences into cellular component and 79 into molecular function (Figure 23). Among them, biosynthetic process (31 sequences), member (43 sequences) and hydrolase activity (25 sequences) were the top groups, separately, from the three functional processes. Among the annotated isoforms, 18 sequences encoding 12 enzymes from 13 pathways were annotated with a KEGG pathway. The starch and sucrose metabolism ranking the third most encoded pathway with two isoforms encoding two enzymes.

71

Figure 23 Pie chart and word cloud of long sequences (coffee long read sequencing isoforms >10kb) distribution to biological process, cellular component and molecular function. 4.4 Discussion Full-length transcripts generated by LRS in this study provided an isoform level polyploid coffee bean reference transcriptome. Compared to its sub-genome progenitors, the Arabica coffee bean transcriptome was more diverse and complicated with more isoforms, enzymes and pathways. Case studies in caffeine and sucrose identified that this diversity and complexity were a result of alternative splicing, polyadenylation, 5’UTR extension and sub- genome copies. Discovery of novel genes and long transcripts was also an advantage of using the LRS technology.

72

4.4.1 Polyploid expression

Different transcript variants may vary in function within the cell and be differentially expressed in tissues or environmental conditions. The abundance of variants in the Arabica transcriptome and case studies of caffeine and sucrose genes compared to the sub-genome progenitors clearly shows the complexity of the polyploid expression. Generally, polyploidy results in three main expression patterns of non-additive expression, dominant expression in which total gene expression in the hybrid is similar to one of the parents, transgressive expression compared to the progenitors or unequal homeolog expression [1]. Previously, it was proposed in coffee that the lower caffeine in Arabica coffee was due to the C. eugenioides sub-genome attributes. Based on phylogenetic analysis, CaXMT1, CaMXMT1 and CaDXMT2 were believed to be from the C. canephora sub- genome while CaXMT2, CaMXMT2 and CaDXMT1 were from the C. eugenioides sub- genome [32]. C. eugenioides has a very low caffeine biosynthesis together with a rapid catabolism [33]. The expression of sub-genome copies from C. eugenioides suggested lower caffeine in Arabica coffee compared to Robusta coffee. This study supports this hypothesis of transcript variants from sub-genome copies controlling the trait. Using the LRS isoforms, further studies are now possible at the isoform level (this study was at the transcript variant level) to understand sub-genome gene expression in the polyploid C. arabica. First, it will be possible to determine directly whether the expression of Arabica caffeine genes follows a non-additive expression pattern. Secondly, it would be interesting to determine the reason for more identified transcript variants and their changes in differential expression in tissues and at development stages. Thirdly, it will be possible to determine whether this expression pattern is influenced by environment, influencing coffee quality. Fourth, whether these different gene expression patterns result in different phenotypes. Similar analysis could be applied to many other genes or pathways of interest. Isoforms and transcript variants found in LRS tetraploid Arabica coffee bean transcriptome in this study were assigned to a number of functional groups, pathways and to specific enzyme functions. Arabica is believed to be more adaptive to temperature change than its diploid parents [34]. This study may also help elucidate the genetic basis of the higher sucrose in Arabica coffee. More generally, the complete polyploid transcriptome from this study will improve our understanding of the evolutionary adaptation and plasticity of polyploid species. However, further improvement is still needed in LRS technologies to improve the sequencing depth. Candidate genes in the caffeine pathway are reported to be expressed at low levels in fruits

73 compared to leaves, especially XMT2 (detected by quantitive RT-PCR) [32]. Transcripts were not detected in this study probably due to low expression of XMT2 and the PacBio Iso- seq technology not being sensitive enough to capture these transcripts. This is likely to happen in the case of other isoforms expressed at low levels that may not be captured by the Iso-Seq technology even after application of the cDNA library normalisation step, as was applied in this study. 4.4.2 5’UTR extension

Full-length transcripts captured in this study show the advantage of LRS. All the caffeine and sucrose isoforms annotated in this study, except for SPS1, were extended in the 5’UTR compared to those available from public databases. Previously, it was difficult to sequence the 5’ end as cDNA library preparation starts from the 3’ end and normally fails to reach the 5’end. Further, it was not easy to assemble the non-coding parts of transcripts as limited cDNA sequence was available to guide the assembly and confirm the contigs obtained. Therefore, less information is available on the 5’UTRs, especially for plants. Generally, the length of the 5’UTR ranges from 100 up to a few thousand bp [35]. This length difference is proposed because of the complex gene regulation maintained in eukaryotes [36]. Few post- transcriptional mechanisms have been studied in 5’UTRs, including the regulation by the pre- initiation complex and uORF re-initiation. uORFs are common in 5’UTRs that have critical regulation. They contain their own set of start and stop codons that can be scanned by ribosomes and translated. This regulation can inhibit translation of the main ORF transcript and reduce the amount of protein translated. Regulation of re-initiation of uORF translation was found to be associated with the length of sequence between the uORF and the main ORF, suggesting interactions with translation factors are required before initiation of translation [37]. This was also shown to be influenced by stress conditions [37]. However, not all uORF may have a role in translation control. In the leucine zipper transcription factor (bZIP) 11 gene, for example, harbouring four uORFs, only uORF2 was required for this regulation and this uORF is relatively conserved [38]. Other types of 5’UTR regulation may also be found such as that due to introns in the 5’UTRs. This happens to approximately 35% of human genes [6]. Understanding the mechanism of 5’UTR regulation will be greatly facilitated by the use of the full-length transcripts. In this study, multiple uORFs were characterised in the SUS1 5’ UTR and these may contribute to diverse functions and regulation that may be influenced by stress conditions. Climate change is a threat to Arabica coffee, which grows at high altitude.

74

It may be possible to influence 5’UTR regulation in Arabica coffee and have the potential to influence coffee quality. To confirm this, further phenotype, proteome and metabolome studies are required. 4.4.3 Long transcripts

LRS also has potential in discovering long transcripts, such as the sucrose synthase genes annotated here. Even though numerous studies have defined the sucrose pathways, not all the candidate genes have been identified. Many sucrose metabolism genes are too long to be captured by short read sequencing without significant de novo assembly. For example, the C. arabica SUS2 coding sequence is 2,889 bp and the genomic DNA sequence (exon 1 to 15) is 5,672 bp (Table 10). Sucrose synthase genes (6-7 different isoforms) were previously identified in cotton, rice, and Arabidopsis, However, in coffee, only two had been reported [39-41]. For genes that were only previously available for C. canephora, (e.g. SPP1), this study also identified isoforms in Arabica. For genes that previously only had partial sequences available, (e.g. SPS2), the transcripts identified in this study will guide further studies and improve current databases. Furthermore, the low coverage annotation of long sequences (>10kb) by BLASTx and BLASTn against the FOUR databases indicated the limited information on long sequences requiring further study. 4.4.4 Transcriptome analysis of polyploids using long-read sequencing

LRS technologies show advantages in understanding complex transcriptomes, especially from polyploid species [4, 42, 43]. First, this eliminates transcriptome reconstruction and that reduces the computation time. This is an essential goal for bioinformatics data analysis and software development [44]. To avoid obsolescence, transcriptome analysis calls for rapid genomics and bioinformatics to reduce the time from experiment to publication. Secondly, as there is no assembly of reads with LRS, there are no erroneous results due to misassembles caused by complex polyploid transcriptomes with a large number of repeats or homeolog genes. For example, almost 80% of the wheat genome is repetitive [43]. Last but not least, it shows the potential to capture rare or long sequences to provide an overview of the transcriptome and fully characterise RNA diversity, like 5’UTR extension in this study, alternative splicing and polyadenylation. [4, 45]. However, LRS technologies have been normally biased with high error rates, for example, previously released PacBio single molecule real-time sequencing (SMRT) reads had a very high error rate, 11-14%, therefore, numerous methods have been proposed to correct the sequences [46]. One common approach was to map back to a reference genome and (or) use

75 hybrid sequencing, for example, using short reads with high throughput to correct LRS isoform sequences [5, 47]. However, caution is necessary when using this strategy. The reference genome is often far from 100% accurate: 1) most draft genomes have numerous fragmented contigs or scaffolds with huge imbedded gaps. Even genomes previously considered well assembled have had many gaps [48]. 2) Problems also exist in poorly assembled gene loci. Few recently released genomes have been re-visited to generate improved assemblies [13]. 3) LRS isoform sequences normally come from different sources (e.g. genotype) to the reference genomes that they can be compared with. Hybrid sequencing correction may have system bias and result in loss of isoforms/transcript variants or generate a “compromised” consensus. Previously, it has been estimated that there was no approach that has achieved more than 60 % accuracy for transcript reconstruction, even for the most studied human genome [49]. For instance, short read platforms deliver data that is less representative of rare or long isoforms and there is a high chance of losing these reads from the long-read dataset when correcting. Improved accuracy may be generated from the platform itself, for example, Pacbio Iso-seq generates improved accuracy from CCS reads. This allows multiple passes of each transcript. Each pass can be used to correct the others with their random errors (mainly indels). The isoform clustering and polishing in this protocol is expected to deliver 99% accuracy. Prior to size selection, normalisation was further applied in parallel to the dataset in this study to decrease the frequency of abundant reads and produce a more even representation of the transcriptome and to capture rare sequences. A highly diverse transcriptome has resulted. The abundance of genes that had not been previously sequenced (1,213), transcript variants and longer isoforms indicate the limits of previous studies and potential of LRS technologies. However, the limitation shows in detecting short sequences less than 300bp (raw data cut-off). The chances of large errors due to indels from Pacbio sequencing may produce reads shorter than the actual reads. Additionally, the Blue pippin size selection system starts from 500bp in the cDNA library preparation, with few sequences from the boundary (400-500bp). Therefore, improvement is needed to capture a broader transcriptome. In conclusion, this study will improve the understanding of the biology and genetic improvement of polyploid species such as coffee. It provides a useful technique to generate a full-length reference transcriptome and improve understanding of UTR regions.

76

Chapter 5 Understanding of the ripening coffee bean through transcriptome analysis

Based on the reference generated in chapter 4, this chapter aims to explain the key components accumulated in coffee bean ripening, facilitating the understanding of the environment influences to the ripening coffee bean in the subsequent chapter 5. Transcriptome dataset of coffee bean from the upper canopy was used in this study as a start.

5.1 Introduction

Coffee bean is a tropical dicotyledonous albuminous seed, with a copious endosperm and a tiny embryo. Generally, seeds have evolved intricate strategies to reserve nutrients and take advantage of the pericarp and pulp to encourage dispersal by animals [392]. The composition of the coffee bean determines the quality of the coffee. An improved understanding of the molecular basis of determination of bean composition is required to support enhanced coffee production and genetic improvement especially in response to climate change.

Numerous biochemical studies have been conducted on the genetic and environment factors influencing the accumulation of key components of the bean [391]. Genetic control of these processes can be investigated by the study of changes in gene expression through bean ripening, including transcripts regulating bean filling as well as in response to stress [203, 392, 393]. Early studies applied RT-PCR (coffee beans) or microarrays to mainly coffee leaves or seedlings [166, 203, 389]. More recently, different tissues of Arabica coffee including flowers, leaves and fruit pericarp have been subjected to transcriptome analysis. However, the essential components contributing to coffee quality and regeneration, the endosperm and embryo (bean), were excluded in these studies [394]. Importantly, the absence of a reference genome or transcriptome further limited previous studies.

In this chapter, coffee beans from different development stages, green, yellow and red coffee beans (other than exocarp or mesocarp) were collected and the RNA was sequenced for transcriptome analysis. A recent long read sequencing full-length (LRS) coffee bean transcriptome was used as a reference to facilitate transcriptome analysis in the ripening coffee bean [341].

The aim was to understand the progress of accumulation of the key component in the coffee bean through ripening and the molecular basis of genetic regulation of coffee quality and deliver a platform for the study of genotypic and environmental influences on the coffee transcriptome and coffee quality. 77

5.2 Plant material and methods

5.2.1 RNA sample and cDNA library preparation

Coffea arabica cv. K7 cherries of different ripening stages (green, yellow and red) were collected from the upper canopy only as described previously [341]. Total RNA isolated from nine samples (three development stages in triplicates) was processed individually according to Furtado et al. [379]. The integrity of total RNA was accessed with an Agilent RNA 6000 nano kit and chips through a Bioanalyzer 2100 (Agilent Technologies, California, USA). Thereafter, a standard 18 x Truseq total RNA library preparation was conducted with the use of an additional Ribo-Zero kit. Samples were subsequently sequenced on an Illumina HiSeq4000 platform (2×150bp paired-end reads).

5.2.2 Reads mining and RNA-Seq analysis

Raw reads were mainly processed with CLC Genomic Workbench 10.0.1 (CLC Bio, Denmark) as following. 1) Adapters and indexes were trimmed. 2) Reads failed matching the PHRED score (<0.01) and length (≥40bp) were removed. 3) RNA-Seq analysis (read similarity 0.9, length similarity 0.8) was conducted with the processed reads. A recent published long-read sequencing coffee bean transcriptome was used as a reference with Transcripts Per Kilobase Million (TPM) as the expression parameter [391]. Outlier expression values, classified with coefficient variation and standard deviation, were not considered in this study.

5.2.3 Statistical analysis

Transcripts expressed at each development stage were filtered with TPM (>1). Functional annotations were analysed with BLAST2GO for GO terms and KEGG pathway distribution [395, 396]. Venn diagrams were built through an online tool [397]. Highly expressed transcripts were filtered with TPM (> 500). Differential gene expression tool for RNA-Seq (CLC) was used for significant through ripening stages. DEGs were filtered with FDR p- value correction (< 0.01) and maximum group means (TPM≥10).

Bean storage molecular network was constructed with DEGs through Mercator and Mapman 3.6.0RC1 [398, 399]. Co-expression network was constructed with Mapman annotated storage DEGs and candidate genes from the targeted analysis. Gene expression of these transcripts were log2(x) transformed before analysis through WGCNA build-in Web MEV

78 package (cutoff_0.9) and Cytoscape 3.5.1. [7, 400] (Note: 0.0001 was assigned to transcripts with an expression value of 0 before log2 transformation).

5.3 Results

Nine coffee bean transcriptome datasets were generated in this study, including green, yellow and red stages, all in triplicate (Figure 24a). As pericarp (including exocarp and mesocarp) was removed before RNA extraction, the coffee bean in this study refers to endocarp, perisperm (seed coat), endosperm and embryo (encapsulated by endosperm) (Figure 24b).

Figure 24 Overview of ripening coffee bean transcriptome. Note: a. Coffee cherries of different development stages. b. Tissue representatives of coffee beans used in this study (pericarp was discarded). c. Venn graph of highly expressed genes (TPM > 500) in green, yellow and red coffee beans. d. The distribution of differentially expressed genes (up/down regulations) of ripening coffee beans according to fold change (FDR p-value ≤ 0.01, TPM (max group means) >10).

79

5.3.1 Raw read processing

A total of 120,784,786 reads were generated from the nine coffee bean transcriptome datasets (Table 12). This number was slightly reduced (117,539,360) after trimming. For individual samples, the number of trimmed reads were between 4,602,943 and 29,612,113. A range of 48.37% and 64.05% of reads were mapped to the long-read sequencing coffee bean transcriptome (coffee LRS transcriptome).

Table 12 Overview of RNA-Seq results. Note: Green1, 2, 3, three replicates of Green stage; Yellow1, 2, 3, three replicates of Yellow stage; Red1, 2, 3, three replicates of red stage.

Sample ID Raw reads Trimmed reads Mapped genes Green1 17,355,392 16,980,672 59.04 Green2 30,315,140 29,612,113 52.25 Green3 9,793,436 9,568,758 57.08 Yellow1 8,170,498 7,942,370 56.54 Yellow2 24,656,422 23,905,608 59.9 Yellow3 12,768,562 12,350,938 64.05 Red1 7,642,464 7,405,192 60.19 Red2 4,741,574 4,602,943 58.36 Red3 5,341,298 5,170,766 48.37 Sum 120,784,786 117,539,360

The yellow stage has the highest number of genes expressed (43,552 transcripts, TPM > 1), compared to red and green stages (43,257 and 36,388 transcripts) (Figure 25). Functional annotation of the expressed transcripts revealed that GO terms associated with the metabolic process, catalytic activity and cell part, ranked as the most abundant in “biological process”, “molecular function” and “cellular components” respectively (Figure 26). The lower number of transcript expression was observed at green stage.

80

Figure 25 The number of transcripts expressed at coffee bean ripening.

Figure 26 GO distribution in developing coffee bean, including biological process, molecular function and cellular components. The top three pathways with the most transcripts expressed were purine pathway and thiamine metabolism. (Table 13). Starch and sucrose metabolism and phenylpropanoid biosynthesis were the sixth and ninth most enriched pathway. The top 30 pathways with the highest number of transcripts expressed were from 11 parent-pathway groups. The dominant parent-pathway group was carbohydrate metabolism, enriched with the highest number of pathways (seven pathways), including starch and sucrose metabolism and galactose

81 metabolism. This is followed by the amino acid and lipid metabolism parent-group, which relating to five and three metabolism pathways, individually.

Table 13 Top 30 KEGG pathways and their corresponding parent groups in ripening coffee bean.

KEGG Pathways (TOP 30) expressed Ranking Parent-groups of pathways ripening coffee seeds 1 Purine metabolism Nucleotide metabolism 2 Thiamine metabolism Metabolism of cofactors and vitamins 3 Biosynthesis of antibiotics Global and overview maps Xenobiotics biodegradation and 4 Aminobenzoate degradation metabolism 5 T cell receptor signalling pathway Immune system 6 Starch and sucrose metabolism Carbohydrate metabolism 7 Glycolysis/ Gluconeogenesis Carbohydrate metabolism 8 Th1 and Th2 cell differentiation Immune system Biosynthesis of other secondary 9 Phenylpropanoid biosynthesis metabolites 10 Pyrimidine metabolism Nucleotide metabolism Xenobiotics biodegradation and 11 Drug metabolism-other enzymes metabolism 12 Pyruvate metabolism Carbohydrate metabolism Amino sugar and nucleotide sugar 13 Carbohydrate metabolism metabolism 14 Glutathione metabolism Metabolism of other amino acids 15 Cysteine and methionine metabolism Amino acid metabolism 16 Glycerolipid metabolism Lipid metabolism 17 Galactose metabolism Carbohydrate metabolism Carbon fixation in photosynthetic 18 Energy metabolism organisms Xenobiotics biodegradation and 19 Drug metabolism-cytochrome P450 metabolism Glyoxylate and dicarboxylate 20 Carbohydrate metabolism metabolism 21 Methane metabolism Energy metabolism 22 Pentose phosphate pathway Carbohydrate metabolism Metabolism of xenobiotics by Xenobiotics biodegradation and 23 cytochrome P450 metabolism Glycine, serine and threonine 24 Amino acid metabolism metabolism 25 Fatty acid degradation Lipid metabolism 26 Glycerophospholipid metabolism Lipid metabolism Valine, leucine and isoleucine 27 Amino acid metabolism degradation 28 Lysine degradation Amino acid metabolism 29 Alanine, aspartate and glutamate Amino acid metabolism

82

metabolism 30 Oxidative phosphorylation Energy metabolism

5.3.2 Highly expressed transcripts through coffee bean ripening

5.3.2.1 Overview

Of all the top ten highly expressed genes (HEGs), more unique transcripts were expressed over TPM500 at the green stage (46 transcripts) other than yellow and red stages (27 and 8) (Figure 24c). A total of 68 common transcripts were expressed at all the three development stages. A significantly higher number of common transcripts (73) were expressed at both the yellow and red stages, while the number of common transcripts (seven) expressed at the green and the red stages were the same as that of the first two stages.

5.3.2.2 Intensive lipids formation at the green stage

The top ten HEGs in green, yellow and red coffee beans were extracted individually for further analysis (Table 14). Other than three unnamed protein products, a ncRNA was one the most abundant transcript at all stages, peaking at the green stage and decreasing at the last two stages. Two non-specific lipid transfer (LTP) A-like transcripts, showed extremely high expression in green coffee beans (33,227 and 18,563) but decreased dramatically in yellow stage (5,488 and 1,446) until maturity (1,260,378 and 1,165). This suggested a high level of lipids may accumulate and transported from this stage. A similar drop was also identified in alpha-galactosidase 2, from 10,120 in the green stage to more than ten times lower in yellow stage (830) and red stage (351). Fewer changes were characterized in transcripts encoding 11S storage globulin (bean storage protein) and metallothionein type 3 (important for bean development), presenting maximum expression in green coffee beans (9,994 and 7,709 respectively), while gradually dropped until red coffee beans (4,059 and 6,829 respectively). Transcripts encoding for kirola-like protein and dehydrin DH1a (response to desiccation), peaked at the yellow stage and decreased slightly in red stage.

Table 14 Top ten highly expressing transcripts in green, yellow and red stages of coffee beans (TPM). Note: Transcripts in the top-ten list but common between the green and yellow stages and between yellow and red stages were indicated in underline and italic text respectively.

Top10 in green Green Yellow Red Sequence Description

Cucumis melo uncharacterized LOC107990804 (LOC107990804) transcript c80141/f1p1/450 94,493 15,075 22,440 variant ncRNA

83

c63112/f22p4/653 33,227 5,488 1,260 non-specific lipid-transfer A-like

c31970/f1p3/511 30,270 26,070 29,246 hypothetical protein COLO4_02306

Sesamum indicum NEDD8-specific protease 1 (LOC105170338) transcript c24248/f1p0/1181 29,834 3,567 4,370 variant mRNA

c34759/f1p0/1129 25,585 17,528 16,531 unnamed protein product

c8287/f57p5/686 18,563 1,446 378 non-specific lipid-transfer A-like

c152969/f1p43/1468 10,120 830 351 alpha galactosidase 2

c153711/f216p83/1761 9,994 7,687 4,059 11S storage globulin

c13322/f1p0/738 10,004 7,731 8,460 Nicotiana sylvestris uncharacterized LOC104249947 (LOC104249947) transcript variant ncRNA

c73790/f1p0/2722 15,260 4,001 5,532 pyrophosphate--fructose 6-phosphate 1-phosphotransferase subunit beta

Top10 in Yellow Green Yellow Red Sequence Description

c31970/f1p3/511 30,270 26,070 29,246 hypothetical protein COLO4_02306

c34759/f1p0/1129 25,585 17,528 16,531 unnamed protein product

Cucumis melo uncharacterized LOC107990804 (LOC107990804) transcript c80141/f1p1/450 94,493 15,075 22,440 variant ncRNA

c35373/f1p2/1001 4,961 10,731 11,805 PSII 32 kDa (mitochondrion)

c153711/f216p83/1761 9,994 7,687 4,059 11S storage globulin

c56830/f3p1/506 7,709 7,576 6,829 metallothionein type 3

c63165/f2p6/845 1,759 6,977 7,010 unnamed protein product

c31995/f48p5/939 1,783 6,744 5,988 dehydrin DH1a

c42276/f1p3/825 2,200 6,278 7,029 unnamed protein product

c153289/f6p3/751 1,930 5,841 5,180 kirola-like

Top10 in Red Green Yellow Red Sequence Description

c31970/f1p3/511 30,270 26,070 29,246 hypothetical protein COLO4_02306

Cucumis melo uncharacterized LOC107990804 (LOC107990804) transcript c80141/f1p1/450 94,493 15,075 22,440 variant ncRNA

c34759/f1p0/1129 25,585 17,528 16,531 unnamed protein product

c35373/f1p2/1001 4,961 10,731 11,805 PSII 32 kDa (mitochondrion)

c42276/f1p3/825 2,200 6,278 7,029 unnamed protein product

c63165/f2p6/845 1,759 6,977 7,010 unnamed protein product

c56830/f3p1/506 7,709 7,576 6,829 metallothionein type 3

c31995/f48p5/939 1,783 6,744 5,988 dehydrin DH1a

c153289/f6p3/751 1,930 5,841 5,180 kirola-like

c24248/f1p0/1181 56,405 14,965 18,132 Sesamum indicum NEDD8-specific protease 1 (LOC105170338) transcript variant mRNA

5.3.3 More changes in the comparison of the red vs green stage

Three comparisons between developmental stages were conducted in this study of the ripening Arabica coffee bean transcriptome, yellow vs green (Y vs G), red vs green (R vs G) 84 and red vs yellow (R vs Y) included. The highest number of differentially expressed transcripts (DEGs) were shown in the R vs G comparison (2,262), including the most downregulated DEGs (1,257) (Figure 24d). Only 130 DEGs were seen in R vs Y, including 108 downregulated genes and 22 upregulated genes. A total of 2,058 DEGs were characterized in Y vs G, with the most up-regulated DEGs (1,070). The majority of DEGs were distributed within the range of two to ten times fold change, while only a few varied less than two times fold change.

The top ten up and down-regulated DEGs from individual comparisons were extracted in Table 15 to understand the most significant changes in coffee bean ripening. These transcripts included pathogen relevant, cell function and key chemical biosynthesis transcripts.

In the comparison of Y vs G, two class III chitinase (chi2) transcripts were highly expressed in green coffee beans and were absent (TPM<1) in yellow stage. Similarly, a dramatic decrease was shown in a lipid degradation transcript (controls GDSL esterase lipase APG- like protein), an O-fucosyltransferase family transcript (relating to mannan biosynthesis and galactomannan accumulation), an amino acid transport transcript (WAT1-related protein-like), and a cellular function transcript (regulating ubiquitin 40S protein S27a). The ubiquitin 40S protein S27a related transcript also demonstrated a significant change in the comparison of R vs G. Additionally, a significant decrease from green to red coffee beans was characterized in a cell wall vascular inhibitor of fructosidase 1-like (vacuole invertase inhibitor). In the comparison of R vs Y, E3 ubiquitin- ligase SHPRH isoform X1, Luminal-binding 5-like and cinnamoyl- reductase 2-like. degraded in red stage.

Table 15 Expression (TPM) of top ten down/upregulated differentially expressed genes in the comparison of yellow vs green, red vs green and red vs yellow. Note: Words in italic means common in all comparisons, while bold words indicate common in either two stages.

log₂ Expression values DOWN regulations FDR p- Comparisons fold Sequence description value (Sequence ID) change Green Yellow Red

c59581/f2p4/1062 -11.44 0.00E+00 199.16 0.07 0.36 12658Cc06_g14410

c32395/f2p4/1179 -10.38 0.00E+00 90.46 0.19 0.00 Class III chitinase (chi2)

c12812/f5p0/577 -9.85 0.00E+00 2550.72 5.81 4.67 unnamed protein product Yellow vs Green c51638/f2p4/1139 -9.52 0.00E+00 50.59 0.11 0.10 Class III chitinase (chi2)

c76145/f1p1/642 -9.02 0.00E+00 2878.44 10.98 10.79 ubiquitin 40S protein S27a

c1950/f3p0/1246 -8.57 0.00E+00 81.57 0.40 0.68 GDSL esterase lipase APG-like

85

c29520/f6p2/1511 -8.40 0.00E+00 78.92 0.30 0.77 O-fucosyltransferase family

c6862/f2p2/1505 -8.34 0.00E+00 29.29 0.30 0.25 GDP-fucose O-fucosyltransferase

c55541/f6p4/1596 -8.24 0.00E+00 49.02 0.35 0.65 WAT1-related At2g37460-like

c67536/f1p2/961 -8.22 0.00E+00 5278.44 34.48 25.48 13999Contig1271

c227/f23p5/1246 -10.39 0.00E+00 231.93 8.32 0.37 hyoscyamine 6-dioxygenase-like

c59581/f2p4/1062 -10.15 0.00E+00 199.16 0.07 0.36 12658Cc06_g14410

c12812/f5p0/577 -10.11 0.00E+00 2,550.72 5.81 4.67 unnamed protein product

c47253/f1p1/1271 -9.21 0.00E+00 209.43 6.9 0.55 hyoscyamine 6 beta-hydroxylase

c76145/f1p1/642 -9 0.00E+00 2,878.44 10.98 10.79 ubiquitin 40S protein S27a Red vs Green c30689/f6p3/1067 -8.85 0.00E+00 153.84 2.49 0.79 12658Cc06_g14410

c154384/f5p2/662 -8.82 0.00E+00 221.23 2.77 0.7 vacuolar invertase inhibitor

c11158/f3p5/1395 -8.81 0.00E+00 85.17 0.86 0.33 dehydration-responsive element-binding 2D-like

c67536/f1p2/961 -8.69 0.00E+00 5,278.44 34.48 25.48 13999Contig1271

c44011/f13p4/584 -8.38 0.00E+00 4,673.02 138.76 26.39 unnamed protein product

c9963/f1p1/900 -4.4 0.00E+00 790.69 309.83 15.03 E3 ubiquitin- ligase SHPRH isoform X1

c2057/f9p6/929 -4.04 0.00E+00 1412.25 537.78 33.22 luminal-binding 5-like

c72328/f2p11/3062 -4.02 0.00E+00 137.62 37.07 2.29 electron transfer flavo subunit beta, mitochondrial

c8307/f2p1/1218 -2.88 2.13E-12 511.49 107.26 14.63 ADP-ribosylation factor 1-like

c424/f42p13/1416 -2.47 2.13E-12 208.87 132.87 24.38 endonuclease V Red vs Yellow c46628/f1p11/1830 -3.01 7.11E-12 115.19 24.31 2.46 MYB transcription factor MYB90

c32622/f2p2/3386 -2.35 1.07E-11 6.48 48.55 9.85 mannose-1-phosphate guanyltransferase alpha

c23452/f5p4/795 -2.34 1.66E-11 228.91 122.02 25.05 cinnamoyl- reductase 2-like

c3523/f1p2/916 -1.98 3.73E-11 145.87 217.53 56.09 unnamed protein product

c2393/f4p5/950 -3.89 1.02E-10 539.03 201.10 14.12 endonuclease V isoform X1

Log₂ Expression values UP regulations FDR p- Comparisons fold Sequence description value (Sequence ID) change Green Yellow Red

c70460/f8p4/1626 8.41 0.00E+00 0.78 406.11 996.73 trans-resveratrol di-O-methyltransferase-like

c14624/f5p2/1077 6.27 0.00E+00 2.60 354.84 221.06 germin subfamily T member 2

c5092/f1p2/1158 6.20 0.00E+00 0.97 108.21 295.30 MYB transcription factor

c87219/f6p8/5061 6.14 0.00E+00 0.19 28.59 61.20 retrotransposon Ty1-copia subclass

c21121/f1p0/941 6.11 0.00E+00 0.57 66.88 44.72 germin 2-1 Yellow vs Green c101202/f2p0/662 5.97 0.00E+00 0.58 76.30 69.50 pathogenesis-related 1

c32281/f2p1/2368 5.92 0.00E+00 0.60 55.18 41.06 probable rhamnogalacturonate lyase B

c128223/f1p1/2236 5.63 0.00E+00 0.44 47.61 129.42 MYB transcription factor MYB90

probable xyloglucan endotransglucosylase c41502/f4p9/1331 5.62 0.00E+00 4.65 453.03 411.65 hydrolase B

c66442/f1p1/1028 5.49 0.00E+00 1.96 137.11 115.80 pectinesterase inhibitor 11

Red vs Green c70460/f8p4/1626 9.68 0.00E+00 0.78 406.11 996.7 trans-resveratrol di-O-methyltransferase-like

86

c5092/f1p2/1158 7.61 0.00E+00 0.97 108.21 295.3 MYB transcription factor MYB90

c87219/f6p8/5061 7.23 0.00E+00 0.19 28.59 61.2 retrotransposon Ty1-copia subclass

c128223/f1p1/2236 7.02 0.00E+00 0.44 47.61 129.4 MYB transcription factor MYB90

c122295/f1p5/5026 6.56 0.00E+00 0.47 41.21 105.4 retrotransposon Ty1-copia subclass

c88329/f1p1/2084 6.31 0.00E+00 0.2 13.33 27.42 MYB transcription factor MYB90

c63916/f1p0/1089 6.18 0.00E+00 0.24 20.96 49.39 beta-glucosidase 44-like

c10950/f7p1/863 5.92 0.00E+00 0.53 43.84 97.24 lipid transfer

c52265/f1p6/1138 5.82 0.00E+00 6.09 405.31 686 MYB transcription factor MYB90

c101202/f2p0/662 5.8 0.00E+00 0.58 76.3 69.5 pathogenesis-related 1

c21728/f1p0/1557 4.19 2.84E-10 1.30 3.02 59.72 luminal-binding 5-like

c13553/f10p5/1234 5.09 5.26E-09 17.61 0.68 25.13 electron transfer flavo subunit mitochondrial

c28681/f1p0/995 4.75 5.26E-09 2.54 0.94 30.53 ADP-ribosylation factor 1-like

c128223/f1p1/2236 1.39 3.27E-06 0.44 47.61 129.42 MYB transcription factor MYB90

c78857/f1p1/1552 1.36 3.56E-06 7.22 39.18 103.33 cinnamoyl- reductase 2-like Red vs Yellow c89078/f1p2/5144 4.5 2.47E-05 18.17 1.74 37.43 endonuclease V isoform X1

c113302/f1p3/1573 1.40 0.00015 3.32 46.00 123.83 cinnamoyl- reductase 2-like

c67038/f1p5/2024 1.74 0.00022 0.81 12.82 43.78 ECERIFERUM 1-like

c22721/f1p0/827 3.23 0.00023 1.59 6.99 70.51 luminal-binding 5-like

c53207/f3p2/1601 1.21 0.00086 3.81 34.66 81.76 cinnamoyl- reductase 2-like

Numerous MYB transcription factors (MYB90) were identified as upregulated DEGs in top ten DEGs of all comparisons, increasing from green to maturity (maximum expression was TPM: 686). A trans-resveratrol di-O-methyltransferase-like transcript, catalyzing pterostilbene (antifungal and pharmacological function) biosynthesis and a transposon elements variant gene (retrotransposon Ty1-copia subclass) were upregulated in both comparison of Y vs G and R vs G as they increased sharply from green to red stage. From the green stage, cell wall degradation DEGs, such as probable rhamnogalacturonate lyase B (probably pectin degradation) and probable xyloglucan endotransglucosylase hydrolase B (XTHB, cleaving primary cell wall xyloglucan polymers and contributing to the construction of the growing tissues) were upregulated at the yellow stage. Meanwhile, transcript expression of the pectinesterase inhibitor 11 transcript, maintaining the integrity of cell walls, was increased and peaked at the yellow stage. Steady growth was detected in beta- glucosidase 44-like, lipid transfer and pathogenesis-related 1 transcripts. Moreover, the comparison of R vs Y included cell function transcripts, like Luminal-binding 5-like,

87 endonuclease V and three transcripts probably related to phenolic compounds, cinnamoyl- reductase 2-like.

5.3.4 Storage compounds accumulated through bean maturity

An association study of the bean storage molecular network was conducted with lipid, cell wall component, storage protein and phenylpropanoid related DEGs in ripening coffee bean (Figure 27). Major changes were observed in the comparison of Y vs G and R vs G. More DEGs were assigned in the comparison of R vs G, while only a few shown in a comparison of the last two stages (R vs Y).

Figure 27 Candidate genes involved in coffee bean storage molecular network differentially expressed through ripening. Note: Up and down regulations were shown in red and blue boxes. Darker color indicates larger fold change (log2 transformed fold change). FA, fatty acids; TAG, triacylglycerol; LTP, lipid transfer protein. 5.3.4.1 Major lipids accumulation at the green stage

The main lipid-related DEGs went through a decrease in expression compared to levels in green coffee beans, especially fatty acid desaturation (omega-6-desaturase, critical for biosynthesis of linoleic acid), TAG synthase (oil body oleosin family proteins) and lipase (for TAG degradation) [401]. One omega-6-desaturase declined dramatically (log2 ratio: -6.10) at the yellow stage in comparison with the green stage. Similarly, non-specific LTP protein decreased sharply (maximum log2 ratio: -8.16 in R vs G) compared to the green stage,

88 supporting the idea of lipids predominantly accumulating in green beans. Only a limited number of DEGs remained in the comparison of the last two stages, and they were all down- regulated. This includes three non-specific lipid transfer proteins (phospholipid transfer protein) and four GDSL-like Lipases. Therefore, TAG is likely to be synthesized in green beans, which accompanies its degradation by lipase. Linoleic acid was also apparently formed in green bean.

5.3.4.4 The flow of cell wall components

Most down-regulated DEGs in the comparison of Y vs G were found to relate to cell wall precursor synthase, hemicellulose synthase (only downregulation), cellulose synthase and arabinogalactan protein (AGPs) DEGs [401]. Only one UDP-XYL synthase (USX, UDP-D- xylose biosynthesis) transcript 2-like was upregulated in Y vs G and R vs G in cell wall precursor DEGs. The others were downregulated either in Y vs G or R vs G. Mannose-1- phosphate guanylyltransferase (CYT, involved in GDP-mannose biosynthesis) was downregulated in the last two stages. Downregulations include nucleotide-rhamnose synthase/epimerase-reductase, USX6 and USX6-like, UDP-glucose 6-dehydrogenase 5 (provides nucleotide sugars for cell-wall polymers).

Decreased cellulose synthase DEGs are transcript 1 and 2 (CESA1, 2), COBRA-like protein 1, COBRA-like and CESA-like transcript (mannan synthase 1-MS1, which peaks in the green stage), while upregulation was seen in two CESA-like transcripts with upregulation observed in MS2 (peaking at the yellow stage). AGP related DEGs were downregulated in FLA17 and upregulated as was seen in FLA1 (top expression in yellow stage). Hemicellulose related DEG, regulating hydroxyproline O-galactosyltransferase 6 (translocates galactose from UDP- galactose to the residues of AGPs), declined in the yellow and red stages compared to the green stage. However, the only increases in transcript expression were shown in the comparison of Y vs G and R vs G in cellulase (CEL3 and 5) and beta 1,4-glucanase (such as transcript 6), with a peak expression in the yellow stage. Leucine-rich protein (LRR) associated DEGs were also seen to rise in Y vs G and R vs G. A large number of DEGs were found in pectin degradation (pectin esterases, lyses and pectinases), which were mainly upregulated in Y vs G and R vs G. A major rise of transcript expression was also observed in other cell wall modification DEGs including numerous expansin (and expansin-like) transcripts (isoform 6, 8, 10, 11 and 15) and XTH transcripts (isoform B, 2, 6, 12, 23, 30). Expression of BGAL-like, probable beta-D-xylosidase 7 and mannan endo-1,4-beta-

89 mannosidase 5-like (one transcript variant was upregulated) DEGs declined at the yellow stage. Similar patterns continued in the second comparison, R vs G. A higher number of DEGs were associated with cell wall precursor synthases, cellulose synthases, cell wall modifications and mannan degradation. However, in the last comparison, R vs Y, all DEGs were downregulated. Altogether, cellular precursors, hemicellulose, FLA17, CESA1 and CESA2 and MS1 increased relative to the highest expression at the green stage. Peak expression shifted to cellular degradation (CEL3 and 5) and MS2 at the yellow stage and pectin degradation and LRR in the last two stages.

5.3.4.3 Major storage protein accumulation at the green stage

In terms of storage protein related DEGs, both up and down regulations were identified in both Y vs G and R vs G comparisons, while no DEGs were assigned in the comparison of R vs Y [401]. Altogether, significant decreases were identified in eight storage protein related DEGs in the yellow and red stage compared to green coffee beans. They were four 11s globulin (11S), three 7S-like globulin transcripts (7S-like) and glutelin type A2 (GLUA2) DEGs. In addition to these eight DEGs, one more 11S and a patatin (Pat) 2 also decreased in the red stage in contrast to green beans. In contrast, upregulation was observed in three Pat6, GLUA3, TAG lipase SDP1, SDP1-like (storage lipids degradation in bean germination) and SDP1-like DEGs in the comparisons of Y vs G and R vs G (larger fold change at the red stage). Transcripts regulating these proteins are likely to peak at maturity. One more GLU, type B5 (GLUB5), was also more highly expressed at the red stage rather than in green coffee beans. Hence, 11S, 7S-Like and GLUA2 storage probably occurred since the green stage, while SDP1, SDP1-like, GLUA3, Pat6, accumulated at the end.

5.3.4.4 Phenylpropanoids

Most CGAs related transcripts peaked at the yellow stage, with a significant increase since the green stage and decreased dramatically in the red stage (FDR corrected p-value < 0.001). Exceptions were phenylalanine ammonia-lyase (PAL) 4 and 4-coumarate CoA ligase (4CL) 7, which increased from green coffee beans until maturity. This suggested that CGAs were likely to be mainly formed from the yellow stage and 4CL7 probably contributes to further accumulation of CGAs or lignin in later stages.

5.3.4.5 Co-expression network of the bean storage genes

Four groups of transcripts were targeted for co-expression network analysis, including lipid, cell wall, storage protein and phenylpropanoids DEGs (Figure 28). The aim was to 90 investigate the connections among key bean storage/quality attributed transcripts. A great number of the cell wall and phenylpropanoids DEGs were filtered out in the co-expression module (weight ≥ 0.9), followed by lipid, other metabolites, and storage protein related transcripts.

Figure 28 Co-expression network of key storage genes according to weight. Each triangular indicates a candidate gene. Black, green and red lines indicating weight ranged in 0.90-0.95, 0.95-0.97 and 0.97-1.00. Different transcripts of alpha-expansin transcripts (EXPA) played a key role in this network module, especially EXPA6_3 (EXPA6_3 transcript variant 3). EXPA6_3 was co-expressed with 22 transcripts from all five categories, mainly cell wall-related. The top five connected transcripts to EXPA6_3 were EXPA 8_2 (0.9709), probable xyloglucan endotransglucosylase hydrolase 30 (XTH30_2, 0.9681), probable pectin lyase 8 (PL8_1, 0.9402),

91 polygalacturonase-like (PG_2) and EXPA11-like (0.9399). Another core transcript connected with all five transcript groups was PG_2. PG_2, identified as being connected to cinnamoyl- reductase 2 (CCR2), beta-xylosidase/alpha-L-arabinofuranosidase 2-like (Xyl2-like_2) and shikimate O-hydroxycinnamoyltransferase transcript a (HCTa). This suggested PG_2 is essential in cell wall modification and probably interacts with phenylpropanoid and phenolic biosynthesis.

A lipid DEG, CP5_3, functions as a membrane related protein and was one of the pivotal transcripts in the co-expression network. The top five co-expression were from different groups, comprise of CCR2, EXPA8_1, PG_2 and 11S_2. In addition, 3-ketoacyl-CoA synthase 11-like (KCS11-like, biosynthesis of very long chain fatty acids), unanimously expressed with transcripts from all categories, was another central transcript from the module. The co-expressed transcripts were CP5_3, non-specific phospholipase C2, cytochrome P450 71D8, EXPA8_1 and CCR2. The relationship of these diverse transcripts suggested that CP5_3 and KCS11-like are essential in bean ripening and nutrient reservation.

CCR2 (biosynthesis of phenolics), was among the most important transcripts from the network. Other than the above connections, CCR2 was also associated with diverse categories with the highest impact on EXPA6_4, Xyl2-like_2, and EXPA8_1. Hence, it is likely CCR2 is pivotal to cell wall expansion and modification. The next important phenylpropanoids related DEGs was HCTa, correlating to EXPA8_1, Pat6_3, MS2_1, and 7S-like_1, indicating a key role in cell wall expansion and storage protein biosynthesis.

Very few storage protein related transcripts were distributed in this co-expression network module and they were co-expressed with a lower number of transcripts. Among them, SDP1 and 11S were relatively important. SDP1 expression was also found to be concurrent with SKU5 similar 5 (sks5_1), XTH30_2, CCR2 and HCTa, while 11S correlated with phosphoinositide phosphatase SAC3 (SAC3_1), CCR2, SDP1-like, and KCS11-like.

5.4 Discussion

This chapter provides insights into the sequence of accumulation of the key components in the ripening of the bean (summarised in Figure 29). At the initiation of the storage phase, peak expression of candidate transcripts for major storage compounds was observed in green beans, such as galactomanan, FLA17, TAG, linoleic acid, 11S, 7S-like and glutelin A2. Main accumulation of FLA1 started at the yellow stage. Transcripts encoding SDP1, SDP1-like, Glueteline A3, Pat2, storage proteins reached a maximum expression at red stage. In addition,

92 pectin was mainly degraded at yellow and red stages. Moreover, the co-expression network illustrated the importance of expansin transcripts in bean storage and ripening.

Figure 29 Summary of expression pattern in storage-related genes through coffee bean ripening. Note: FLA, faccilin-like arabinogalactan; TAG, Triacylglycerol, SDP1, TAG storage lipase SDP1. As a storage tissue, nutrients are formed and stored through complicated pathways in beans to support the embryo development and reproduction of the plant. The peak development sequence in coffee bean is perisperm, endosperm and then embryo [402]. The first development phase is a gradual growth period with cell division of the endosperm (0-7 week after flowering, WAF). The second phase is a rapid bean filling stage with endosperm cell elongation (8-17 WAF) [403]. Last is the main storage phase that can be observed with pericarp maturation (from green, yellow to red pericarp) and a moisture loss in the endosperm. This moisture loss was associated with one of the dehydration transcripts (DH1a), with the highest expression at the yellow stage. The three development stages of coffee beans collected in this research were representative of the storage phase, including green, yellow and red stage.

93

Arabica coffee beans are composed of cell wall polysaccharides (CWP, 50% of the dry mass), lipids (13-17%), proteins (11-15%), sucrose (7-11%), and CGAs (5-8%) [391, 404, 405]. The major CWP in the young coffee bean, comprise arabinogalactan (~50%), cellulose, pectic polymers (~20%) and galactomannans (10%) [203, 404, 406]. However, at maturity, this has changed to arabinogalactan-proteins (~30%, AGPs), cellulose (15%), pectic polymers (~5%) and galactomannan (50%) [406, 407]. Lipids were reported to be mainly composed of triacylglycerol (TAG, 70-80% of lipids), while the major storage protein was identified as 11S globulin (45%) [203, 404, 407].

The lower number but wider range of HEGs from green coffee beans was probably associated with the initiation of the storage phase. This is supported by a large number of key components formed predominantly at this stage (Fig 29). Galactomannan is a typical storage compound for legumes (>30% of the bean dry weight), coffee, and palms [408]. The low solubility and high viscosity galactomannans are a stable storage reserve that provides strong cells to prevent osmotic stress, microbial attack or mechanical damage [408, 409]. In seed germination, galactomannans were degraded by α-galactosidase (encoded by gal), endo-β- mannanase and β-mannosidase to provide carbon and energy for embryo development [408, 410]. Golublins such as 11S and 7S are major storage proteins in legumes [411, 412]. Glutelins and prolamins are the predominant proteins in cereals, such as rice (> 60%) [413]. 11S globulin was found to be the major storage protein in coffee [404]. In addition, this study suggested the presence of other potential storage proteins of 7S-like, SDP1, SDP1-like, glutelinA2, A3, and Patatin 6. In contrast, the major storage proteins of the exalbuminous seed of Arabidopsis, mainly 2S and 12S protein (one-third of the total protein) were mainly formed late in bean development [414]. The reverse pattern was also shown in the TAG and linoleic acid (fatty acids desaturation) storage in coffee and Arabidopsis seeds (stored in late stage) [415]. Different storage pattern may result from a different structure (copious endosperm in the coffee bean) and function of bean tissues. In Arabidopsis, TAG storage lipase SDP1 and SDP1-like were found to catalyze more than 90% of the TAG in seed germination [416].

The higher number of transcripts expressed at the yellow stage together with more DEGs compared to the green stage indicated a great shift at the yellow stage. A slight decrease in the number of transcripts expressed at the red stage as well as the small number of DEGs (mostly down-regulated) compared to the yellow stage, demonstrates bean maturity and fewer changes in coffee beans when the pericarp changed from yellow to red. This also

94 indicates the end of bean maturity. Fewer components accumulated in the last two stages suggesting a possible shift to modification, such as cell wall degradation (pectin and cellulose degradation). The evidence that SDP1 accumulated after TAG is likely to avoid unnecessary degradation of storage reserves. Some or all of the coffee bean arabinogalactans were combined with proteins and present as AGPs [417]. This structure is typical in plant cell walls and has an important role in intercellular signaling and wound sealing (as a glue) [418]. Different transcripts of AGPs accumulated (FLA17 at the early stage with FLA1 in later stages) indicating various function through ripening in various tissues (endosperm, perisperm and embryo). It is considered that pectinesterase, lyase, and polygalacturonase catalyze pectin degradation, modifying cell walls through demethylesterification and depolymerization of pectin [419-422]. The decrease in pectin in the ripening coffee bean reaches a peak after the yellow coffee bean stage. This parallels the transcripts expressed in fleshy fruit, such as transcripts encoding polygalacturonase in Arabidopsis, which are highly expressed in ripe fruits (pericarp and mesocarp) as they soften [423]. In addition, dilution by the accumulation of other key compounds during bean ripening is probably another reason for the decline of pectin content.

Importantly, this study suggested expansin transcripts (especially EXPA6_3) were essential in bean ripening and storage. The higher number of the cell wall and phenylpropanoids related transcripts filtered with the network module reveals the close relationship of transcripts from these two groups. Multiple alpha-expansin transcripts from the core co-expression network are connected to transcripts from different transcript categories indicating their diverse functions in bean storage. Expansins loosen cell walls during plant growth and allow responses to the plant growth hormone, auxin [424]. There were four expansin subfamily members, EXPA (acid-induced), EXPB (beta-expansin), EXPA-like and EXPB-like [425]. It was proposed that wall loosening by expansins may be involved with a breakdown of non- covalent binding of cellulose microfibrils. This results in turgor driven polymer movement [425, 426] that may be inhibited by some polysaccharide-binding proteins [33-37]. However, the exact mechanism of expansin action remains unknown. In this study, expansin related DEGs were EXPA or EXPA-like. Consistently, core transcripts from the co-expression network, EXPA6_3, were highly stimulated with probable xyloglucan endotransglucosylase 30 (degradation of xyloglucan polymer). Co-expression was also shown with pectin and xylan degradation transcripts (PL8, PG_2 and Xyl2-like), suggesting pectin and hemicellulose are potentially targeted by expansins or involved in the cell wall expansion phase. Other than

95 cell wall-related transcripts, CP5, SDP1 and HCTa, were concurrent with EXPA6_3 suggesting a diverse interaction of EXPA6_3. When cell walls expand, they became vulnerable. This may result in the co-expression of HCTa, SDP1 and CP5 (lipid transfer).

Plants cannot move; therefore, they have evolved numerous strategies to survive, mature and regeneration. As the bean grows to become a nutrition factory, other species like predators (virus or insects) are also interested in consuming the storage compounds generated. However, plants have evolved multiple strategies to protect beans from damage and recover from damage. For example, galactomannan and the lignified endocarp provide a physical barrier to prevent bean from being digested by predators but dispersed after consumption of the pulp [427].

Different HEGs and DEGs characterize the complicated strategy evolved in the dicotyledonous albuminous coffee beans. Stress-related transcripts are more highly expressed in the green coffee beans, which have the protection of a thick and firm lignified pericarp. However, to protect against pathogens such as fungi, specific transcripts are expressed in green coffee beans. This is supported by the peak expression of chi2, galactomannan at this stage. The gradually hardening of endosperm then takes over the protection of the embryo and endosperm. Assorted transcripts were expressed at the yellow stage in coffee beans. As the coffee pericarp became soft and its color changed to be distinguished at the yellow stage. The brighter color (red) of the coffee pericarp attracts animals to consume the pulp but to disperse intact beans it requires more stress-responsive transcripts, which were evidenced in this research. This coincidence with digressive metabolite biosynthesis upon maturity and fewer growth transcripts. The beta-glucosidase 44-like transcript is a typical example of a plant stress resistance gene, expressed at maximum level in the ripe coffee beans. It was highly expressed in the Arabica coffee bean to activate chemical defense compounds once the cell is being attacked [428, 429]. Another representative case is actin-7, highly expressed in red bean, related to growth and required for callus formation and response to wounding [430].

In conclusion, this ripening bean transcriptome will provide a platform to determine the genotype and environment influences on coffee quality. Targeted analysis is now possible to characterize gene families of interest (e.g. transcriptional factors). Phylotranscriptomic analysis is another approach now enabled for the study of evolutionary diversity features.

96

Chapter 6 Environment influences on coffee quality

Based on the transcriptome reference from chapter 4, this chapter is the association study of the quality traits from chapter 3, upper canopy transcriptome datasets from chapter 5 and the lower canopy transcriptome datasets. The aim is to explain the genetic and micro- environmental control of coffee quality.

6.1 Introduction

Chapter 2 highlighted coffee quality as a very complex trait defined by different perspectives. Consumers may consider good coffee flavor comes from high-quality coffee beans. Farmers or manufactures may focus on physical traits, like bean size and weight as they may relate to price. Both coffee flavor and the physical traits are associated with the levels of chemical components such as the caffeine, trigonelline and sucrose in the bean. In addition to a typical bitterness, caffeine provides perceived strength and body [355] and is associated with good quality [38]. Between 60% and 90% of the trigonelline in the bean is degraded during roasting to form subtle flavor components like pyridine and pyrroles, contributing to the overall aroma perception and bitterness [63, 64, 355-357]. Trigonelline levels have been strongly correlated with high quality [38]. Sugars, especially the most abundant, sucrose, is a key flavor precursor, being almost fully degraded during roasting (98% degrades after dark roast), to produce aromatic compounds, such as aldehydes, carboxylic acids and hydroxymethylfurfural through Millard reaction [115, 355]. Sucrose is also a source carbohydrate used to support the biosynthesis of key components in coffee beans, such as cell wall polysaccharides, proteins and lipids [431].

These chemical components are regulated by gene expression throughout the ripening of the coffee bean and are influenced by the growing environment. Early studies investigated environmental factors required to produce high-quality coffee, such as shade and high altitude. Quality, especially flavor, has been shown to be determined by chemical composition [32, 364, 391]. However, the mechanisms determining environmental influences on coffee quality and chemical composition remain unknown. An understanding of the phenotypic and genetic basis and environmental control of coffee quality will be required to speed breeding in response to the threat of climate change. Approximately 50% of the current high-quality coffee plantations (central Americal, Brazil and South East Asia) will disappear by 2050; In Ethiopia, the country of origin of Arabica coffee, the decrease is predicted to be

97 as high as 70% by 2070 [432]. However, coffee consumption has doubled (9.5bn kilos) since 1980 (4.9bn kilos) according to the International Coffee Organisation [433].

Transcriptome analysis has been widely applied as a tool in gene expression analysis, as it can capture all the genes expressed in a cell at a certain time point and environment [434- 438]. In this study, coffee beans from two levels of the coffee tree canopy were compared to explore the impact of the environment on coffee quality. To capture the changing pattern of canopy position, different ripening stages, immature (green pericarp), intermediate (yellow) and mature (red) stages, of the coffee bean were analyzed. A reference coffee bean transcriptome produced by long read sequencing from the same samples was used as a reference [341]. Phenotypic and transcriptomic analysis was used to explore how the canopy position influences gene expression and coffee quality.

6.2 Plant material and methods

6.2.1 RNA sample and cDNA library preparation

Coffee cherries, Coffea arabica var. K7, of different development stages (green, yellow and red) were from subsets (cherries only from upper or lower canopy) of the collections reported previously [341]. Same total RNA isolation was conducted from nine samples (three development stages in triplicates) from both the upper and lower canopy was processed individually. The integrity of total RNA was accessed with an Agilent RNA 6000 nano kit and chips through a Bioanalyzer 2100 (Agilent Technologies, California, USA). Thereafter, a standard 18 x Truseq total RNA library preparation was conducted with additional Ribo-Zero kit. Samples were subsequently sequenced on an Illumina HiSeq4000 platform (2×150bp paired-end reads). Unfortunately, one of the replicate from the lower canopy at the yellow stage was contaminated, so it was not further analyzed thereafter. Therefore, only two replicates were included in coffee beans from the lower canopy at the yellow stage.

6.2.2 Read mining and RNA-Seq analysis

Raw reads were mainly processed with CLC Genomic Workbench 10.0.1 (CLC, QIAGEN, CLC Bio, Denmark) as following. 1) Raw reads quality was accessed and then adapters and indexes were trimmed. 2) A further trimming with quality (0.01) and length (≥40bp) was conducted and read quality was accessed. 3) Trimmed reads (both singletons and paired reads) were processed through RNA-Seq analysis. Coffee bean long read sequencing transcriptome was used, based upon the same sample collection and total RNA extractions [341]. Transcripts Per Kilobase Million (TPM) was selected as expression values. 98

6.2.3 Statistical analysis

Only genes with TPM more than one were considered as a valid expression. Differential gene expression was analyzed with the building tool for RNA-Seq using CLC. Differentially expressed genes (DEGs) were filtered with an FDR-p-value less than 0.01 and max group mean more than 10. Two comparisons were conducted in this study. One was to compare genes expressed at different canopy positions. DEGs from this dataset were annotated with the LRS coffee bean transcriptome reference and the only annotation with a bit score of more than 300 was considered for further analysis. The other one was to access the stage variations in the samples from the upper and lower canopy positions separately, including yellow vs green stage (Y vs G) and red vs yellow stage (R vs Y) in the upper canopy dataset as well as Y vs G and R vs Y in the lower canopy. Both comparisons were processed with Mercator for the classification of photosynthesis, hormone, and stress associated DEGs. Classified genes were prepared for the construction of the co-expression network with caffeine, trigonelline and sucrose genes. Co-expression network analysis was processed through WGCNA built in MEV (weight higher than 0.85) and visualized by Cytoscape 3.5.1.

6.2.4 Data availability

Transcriptome dataset can be accessed from EMBL with accession number: PRJEB24137 and PRJEB24850.

6.3 Results

6.3.1 Influence of canopy position on the expression of genes in pathways associated with bean composition

To explore the reason for the increased accumulation of these key bean components in the lower canopy, the expression of genes associated with caffeine, trigonelline and sucrose metabolism pathways was assessed. Expression of genes of the caffeine biosynthetic pathway was generally highest at the green bean stage (Figure 30A), with the exception of MXMT1, which peaked at the yellow stage. Gene expression was not significantly different in the upper and lower canopy. This indicates that most caffeine was accumulated early in bean maturation and the upper and lower canopy have slightly different caffeine accumulation patterns. Trigonelline synthase was expressed more at the green and red stages in coffee beans in the lower canopy compared to those from the upper canopy (Figure 30B, p<5%). This suggests a longer time period of trigonelline accumulation in coffee beans from the lower canopy.

99

Figure 30 Difference in candidate gene expression. Note: A. Caffeine biosynthesis pathway. B. Trigonelline biosynthesis pathway. C. Sucrose metabolism pathway. XMT, 7-methylxanthosine synthase; MXMT, 7-methylxanthine methyltransferase; DXMT, 3,7-dimethylxanthine N-methyltransferase; CTgS2, trigonelline synthase 2; SUS, sucrose synthase; VIN, vacuolar invertase; CIN, neutral invertase; CWIN, cell wall invertase 4; VINI, vacuolar invertase inhibitor; CWINVI1, cell wall invertase inhibitor. SPS, sucrose phosphate synthase; SPP, sucrose phosphatase. Note: TPM ± SD was used in each time point; *, **, significant difference (p< 5%, p<1%) between the upper and lower canopy at the ripening stage indicated; _1 and _2 indicate different isoforms.

Sucrose is transferred from the photosynthetic source tissues to the sink tissues through the apoplast (through the cell) and symplast pathways (through cytoplasm). In sink tissues, sucrose is degraded by various tissue-specific invertase to hexose (Figure 30C). These invertase include cell wall invertase (apoplast, CWIN), cytosolic invertase (CIN) and vacuolar invertase (VIN). Cell wall invertase inhibitor (CWINI) has been reported to suppress CWIN activity [431]. In cytoplasm specifically, sucrose can also be degraded by sucrose synthase (SUS) to UDP-Glucose, which can be used for the resynthesis of sucrose through sucrose phosphate synthase (SPS) and sucrose phosphate phosphatase (SPP) or for cell wall polysaccharide synthesis.

Significantly elevated sucrose content probably resulted from increased expression of SPS1, SPS2_2, SUS2_1, CIN, VIN_1 and VINI_2 (Figure 30D). A rise in the expression of SPS1 (10±1 vs 6±2) and SPS2_2 (25±4 vs 16±6) was found in coffee beans from the lower canopy

100 at the green stage (p <5%). Improved SUS2_1 expression was found in lower canopy coffee at the yellow stage suggesting more sucrose was being transferred from source tissues and degraded for more downstream components to be formed. CIN mediates sugar homeostasis in low SUS activity [431]. It alternatively degrades sucrose in the cytoplasm (to glucose and fructose). There was a higher expression of CIN in the lower canopy coffee beans at the green stage (32±4 and 21±5 for lower and upper coffee beans, p <5%). Large differences, increases or decreases, were identified in VIN_1 expression in the lower canopy in all stage comparisons. This includes 2±1 vs 5±2 at green stages in comparison of the lower and upper canopy (decreased in the lower canopy, p <5%), 65±1 vs 50±5 in the yellow stage comparison (increased, p<5%) and 81±9 vs 50±7 in the red stage comparison (increased, p <1%). In contrast, VINI expression declined following the green stage and significantly higher expression was seen in VINI_2 for lower canopy coffee at the yellow stage (p <1%).

CWINI (max: 748±102 in lower canopy beans at the red stage) and SUS1 (max: 235±53 in upper canopy beans at the green stage) were both expressed at a very high level, suggesting their important role in sucrose metabolism and cell development. Sucrose degradation by CIN and SUS2 and resynthesis by SPS and SPP were greatest at the yellow and red stages. Sucrose degradation by VIN was highest in the vacuole at the red stage.

6.3.2 Transcriptome analysis and the rates of bean maturation in different canopy positions

Transcriptome analysis was conducted to determine the influence of canopy position on global gene expression and the regulation of phenotypic variation. The number of genes expressed in coffee beans from the lower canopy increased from green to red stages (Figure 31A), while this number finally decreased at maturity in coffee beans from the upper canopy at the red stage. A higher number of genes were expressed in coffee beans from the lower canopy than in the upper canopy at both the early (green) and late (red) stages. More differentially expressed genes (DEGs) were observed in the comparison of lower and upper canopy at the green stage (Figure 31B). The majority of the DEGs were downregulated in coffee beans from the lower canopy relative to the upper canopy at the green stage, even though a greater number of genes were expressed in coffee beans from the lower canopy at this time. Rapid changes were observed in coffee beans from the upper canopy, especially from the green to the yellow stage, while more gradual ripening was seen in the lower canopy. This is consistent with the number of DEGs from the different stages. More DEGs were found for the yellow and green stage comparison from the upper canopy (UY vs UG: 2,058

101

DEGs) in contrast with the lower canopy (LY vs LG: 1,256 DEGs) (Figure 31C and 31D). In addition, genes expressed at the green stage in coffee beans at both canopy position were grouped together according to principal component analysis (PCA) in the overall gene expression, while genes expressed at the yellow stage were different and very close to those expressed at red stages (Figure 32). Compared to gene expression in the upper canopy, lower canopy samples varied less.

Figure 31 Transcriptome analysis of developing coffee beans from the upper and lower canopy. Note: A) genes expressed in coffee bean ripening from the lower and upper canopy. B) Differentially expressed genes (DEGs) in the upper and lower canopy comparison through different ripening stages (up/down-regulated in coffee beans from the lower canopy). C) The number of DEGs in stage comparisons of the upper canopy, yellow vs green stage (UY vs UG) and red vs yellow stage (UR vs UY). D) The number of DEGs in stage comparisons of the lower canopy, yellow vs green stage (LY vs LG) and red vs yellow stage (LR vs LY). Note: upper canopy data was extracted from ENA with accession number: PRJEB24137; U_, D_, means up/down regulations.

102

Figure 32 PCA analysis of gene expression in three ripening stages of coffee beans from the upper (U) and lower canopy (L). Note: G, Y, R, green, yellow and red stages depend on the pericarp colour. 6.3.3 Genes associated with coffee quality

The up and down-regulated genes defined the different rates of change (development) in coffee beans from different canopy positions (Figure 33). Of the comparisons at the green stage, two transcripts encoding GTP binding protein BRASSINAZOLE INSENSITIVE PALE GREEN chloroplastic 2 (BPG2) were expressed approximately four-fold higher in coffee beans from the upper canopy at the green stage (UG). BPG2 has been reported to control light regulation of chloroplast photosynthetic protein translation and chlorophyll formation [439-441]. Increased expression of BPG2 in the present study is likely to result from a higher level of sunlight exposure of UG. The same was true for the greater expression of transcript encoding aspartate aminotransferase, mitochondrial (AspAT) in the UG indicating more carbon and energy consumption. This is because AspAT is light dependent and feeds the amino acid pool for nitrogen assimilation in terms of carbon and energy metabolism [442-444]. Importantly, elevated expression was illustrated in transcripts associated with kynurenine formamidase (KFA) in UG, while this gene was not detected in the lower canopy (LG). KFA was involved with the tryptophan catabolic process to kynurenine. A tryptophan-dependent pathway is necessary for the auxin biosynthesis in plant shade avoidance, however, kynurenine can inhibit this process [445-447]. The expression of KFA suggests that this dependent pathway is not required for UG with more sunlight. The augmentation of abundant polyubiquitin 10 (UBQ10) expression suggested higher salicylic acid (SA) was formed in UG. Upregulation of UBQ10 was determined in Arabidopsis in response to SA [448]. Moreover, glycine-rich RNA-binding 1 (GRP) isoforms being highly expressed in UG provided further evidence of higher SA. GRP is more abundant in young tissues; It suppresses cell death, enhanced by SA and response to stress, such as dehydration, cold and salt [449-452]. Here the upper canopy was probably a more stressed environment (dehydration upon ripening) at the early bean ripening stage. Higher GRPs expression suggested a younger stage of UG compared to LG and probable response to SA provided at

103 early bean ripening to encourage growth. Interestingly, greater expression of transcription factor bHLH62 suggested heat stress or rapid growth in coffee beans from the upper canopy at the green stage. The bHLH62 plays a role in a broad range of plant growth, stress (heat) and hormone (signalling of jasmonic acid, JA) response and metabolite synthesis [453-455]. In contrast, incremented expression in LG was identified in genes linked to E3 ubiquitin- ligase SDIR1 (SDIR1) suggesting extra abscisic acid (ABA) was produced to suppress bean growth compared to UG. SDIR1 is a positive ABA regulator of signaling and overexpression of SDIR1 leads to enhanced ABA (growth inhibition) [456-458].

104

Figure 33 The expression of differentially expressed genes in canopy position comparison of ripening coffee beans with functional annotation. Note: Expression values are transformed with log2 (TPM+1); Blue and red color indicates the extremes of high and lower expression; G, Y, R indicate coffee beans at green, yellow and red stages. Among the genes augmented in expression at the yellow stage in coffee beans from the upper canopy (UY), probable serine threonine kinase IREH1 (IREH1) was one of the key genes regulating growth [459], suggesting rapid growth in UY compared to coffee beans from the lower canopy (LY). In comparison, upregulations in LY included genes regulating quality traits, such as 11s storage globulin (the main protein in the coffee bean), sucrose synthase 2_2, oil body-associated 2A-like (regulate oil body size, which stores lipids). Importantly, BURP domain RD22 (RD22) was also more highly expressed in LY. RD22 was expressed at the early to mid-stage of seed development, and can be upregulated in response to dehydration, ABA and salt stress rather than heat or cold and suppresses chlorophyll degradation [460-462]. Again, higher expression of RD22 suggests an earlier development stage of LY compared to UY. The almost five times elevated RD22 expression in LY compared to UY indicated stronger dehydration resistance and less photosynthesis loss facilitating continued growth until the red stage.

At the last stage, only two genes with a known function were identified as upregulated in coffee beans from the lower canopy (LR), extensin-2-like (EXT2) and succinate-CoA ligase [ADP-forming] subunit alpha-2 mitochondrial. Extensin plays a key role in plant growth and

105 development by strengthening the cell wall and supporting the normal development of the embryo by modifying the cell plate in cytokinesis [463]. The development of coffee seed starts from perisperm to endosperm and then embryo [404]. A rise in EXT2 at mature stage predicts further growth of the LR with cell wall strengthening or embryo maturity.

6.3.4 Co-expression network associated with the control of bean quality

Five networks constructed in this study show the close relationships of candidate quality associated genes (caffeine and sucrose) with photosynthesis (Appendix 1), hormone (Appendix 2) and stress (Appendix 3) related DEGs (Figure 34). Caffeine genes were co- expressed with photosynthesis, hormone as well as sucrose genes, while sucrose gene expression also related to hormone and stress-related genes.

XMT1 was affiliated to both DEGs linked to photosystem II reaction center W protein (PsbW) and oxygen-evolving enhancer protein 2 isoform 1 (OEE2_1) together with gibberellin biosynthesis gene (ent-copalyl diphosphate synthase isoform 1, CPS1_1). The expression of PsbW is light-mediated [464] and OEE2_1 encodes PsbP, controls photosystem II and response to biotic stress [464-466]. PsbW was downregulated at the yellow stage compared to the green stage in both upper (UY vs UG: 3.77-fold) and lower canopy (LY vs LG: 3.12-fold) (see Appendix 1). The same pattern was classified in OEE2_1 of the upper canopy (3.36-fold) and lower canopy (2.53-fold). The only significant reduction was observed in the expression of CPS1_1 of the upper canopy from green to the yellow stage (2.29-fold). There was no significant difference in PsbW, OEE2_1 and CPS1_1 from the canopy positions comparison. This suggested that the photosynthesis II and gibberellin formation rate declined dramatically in the upper canopy from the green stage indicating a higher growth rate while the slower reduction of photosynthesis II and gibberellin biosynthesis in coffee beans from the lower canopy supports their slower growth.

DXMT1 was connected to chlorophyll a-b binding CP24 10A isoform 2 (CP24_2, confidence: 0.977), while DXMT2 was related to probable linoleate 9S-lipoxygenase 5 isoform 4 (LOX5_4). CP24 functions in light harvesting of photosystem I; Plants with no CP24 have reduced electron transport, limited growth[467] LOX has diverse function in plant development, fruit ripening, aroma formation and as seed storage protein; it can process hydroperoxide fatty acids to form the jasmonic acid (JA, inhibit growth) [468-470]. Similar degradation of CP24 expression was detected in the upper (4.3-fold) and lower canopy (4.5- fold) at the yellow stage compared to the green stage. The decrease of LOX5_4 expression

106 was slower in the lower canopy from green to the yellow stage (1.6-fold) but faster from yellow to red stage (2.2-fold). The reverse pattern was identified in the upper canopy, with 2.1-fold and 1.9-fold changes. This indicates a similar photosynthesis I change in the upper and lower canopy but different LOX5 accumulations and jasmonic acid formation to regulate seed maturity.

In addition, cytokinin dehydrogenase 6-like (CKX6) was directed to both MXMT2 and sucrose gene, VINI_2. CKX inactives the plant hormone, Cytokinin and delayed or reduced CKX leads to higher growth [471]. A 3.7-fold reduction in CKX6 expression was observed in the lower canopy from the green to the yellow stage and a reduction was also found in the upper canopy (2.5-fold). The sharp decrease of CKX6 in the lower canopy suggested its higher maturity level at the green stage.

Compared to the other candidate genes, CWINI cooperated with more DEGs, including genes involved in ethylene formation [472, 473], 1-aminocyclopropane-1-carboxylate oxidase isoform 1 (ACO_1) and 2 (ACO_2, with confidence: 0.95) and 1-aminocyclopropane-1- carboxylate synthase 3-like (ACS3) and dehydration response [474] (probable methyltransferase PMT21 isoform 1 and 2, PMT21_1 and PMT21_2). All these co-expressed genes showed increased expression in the upper canopy from the green to the yellow stage, compared to the lower canopy (ACO_1: 1.9 vs 1.5-fold, ACO_2: 1.9 vs 1.6-fold, ACS_3 3.1vs 2.7-fold, PMT21_1: 2.4 vs 2.1-fold and PMT21_2: 3.vs 2.6-fold). This would result in more ethylene and accelerated maturity and higher dehydration response further indicating less maturity and more rapid growth in the upper canopy from the green to the yellow stage.

107

Figure 34 Coexpression network of candidate genes with photosynthesis, hormone and stress related differentially expressed genes (DEGs). Note: PSI_, PSII_, H_, GA_, JA_, Ck_, El, S_, D_, DEGs related to photosynthesis I, II, hormone, gibberellin, jasmonic acid, Cytokinin, ethylene, stress and cold response. DXMT1: 3,7-dimethylxanthine N-methyltransferase 1; CP24_10A_2: chlorophyll a-b binding protein CP24 10A isoform 2; PSII_CAB1_3: chlorophyll a-b binding protein CAB1 isoform 3; PSI_XI_3: photosystem I reaction center subunit XI chloroplastic; XMT1: xanthosine methyltransferase 1; PSII_OEE2_1: oxygen-evolving enhancer protein 2 isoform 1; PSII_W: photosystem II reaction center W protein; PSI_O: photosystem I subunit O; GOD_2: gibberellin 20 oxidase 1-D-like isoform 2; CDS_1: copalyl diphosphate synthase isoform 1, CWINI: cell wall invertase inhibitor; PMT21_1: probable methyltransferase PMT21 isoform 1; PMT21_2: probable methyltransferase PMT21 isoform 2; PMT21_3: probable methyltransferase PMT21 isoform 3; ACO_1: 1-aminocyclopropane-1-carboxylate oxidase isoform 1; ACO_2: 1-aminocyclopropane-1-carboxylate oxidase isoform 2; ACS_3: 1- aminocyclopropane-1-carboxylate synthase 3-like; DXMT2: 3,7-dimethylxanthine N- methyltransferase 2; LOX5_4: probable linoleate 9S-lipoxygenase 5 isoform 4; MXMT2: 7- methylxanthine methyltransferase 2; VINI_2: valcuolar invertase inhibitor isoform 2; CKX6: cytokinin dehydrogenase 6-like. 6.4 Summary

This chapter illustrated that caffeine was accumulated early in maturation; however, it was not associated with higher gene expression. Longer accumulation with a gain in the expression of CTgS2 at green and red stages provides more trigonelline in the lower canopy. The rise in sucrose content resulted from higher expression of multiple genes at different stages and the important genes to enhance sucrose content are significantly expressed CIN

108

(green stage), SPS2_2 (green stage), SPS1 (green stage), SUS2_2 (yellow stage), VINI_2 (yellow stage) and VIN_2 (all stages).

Importantly, variation in expression of these genes in the two canopy positions was regulated by differentially expressed genes. Starting from a higher developed stage, coffee beans from the lower canopy had a longer maturity possibly controlled by ABA and the dehydration response gene (SDIR1 and RD22), inhibiting growth at the green stage and RD22 to suppresses chlorophyll degradation and probably reduce photosynthesis loss at the yellow stage. Coffee beans from the upper canopy grew rapidly from a lower mature level at the green stage. This rapid growth may be primarily driven by BPG2 and AspAT (light manner) to elevate chlorophyll formation and carbon and energy consumption, GRPs to response to dehydration and stress as well as UBQ10 and transcription factor bHLH62 (heat and growth/hormone) response to SA, JA and stress at the green stage.

Interestingly, key genes governing caffeine biosynthesis also showed co-expressed correlation by photosynthesis (CP24 to DXMT1 and OEE2 to XMT1) and hormone genes (gibberellin biosynthesis CDS to XMT1, cytokinin associated CKX6 to MXMT2 and JA related LOX5 to DXMT2). This correlation, as well as the significant changes in the ripening of coffee beans of the lower canopy, highlights caffeine biosynthesis with the photosystem II and hormone (JA, cytokinin, ethylene and ABA) responses associated. The direction of CKX6 to MXMT2 and VINI_2 suggested CKX6 as a key coordinator between caffeine and sucrose metabolism.

Sucrose is the carbon backbone to form major components in the plant; Together with fructose, UDP-Glucose (produced from sucrose degradation) can be used for the formation of cell wall polysaccharides, proteins and lipids [431]. From previous studies, we know candidate genes associated with many essential components accumulate from the green stage, for example, galactomannan (~50% of cell wall polysaccharides), 11S (~45% of protein) and triacylglycerol (78% lipids) [401, 404]. This is consistent with higher upstream degradation of sucrose by SUS1 and CWIN (suggested by CWINI) to degrade sucrose to the building blocks, UDP-Glucose and fructose. Inhibition of CWINI enhanced tomato CWIN activity (2- fold) and seed weight [475]. Other than inhibition of CWIN, CWINI also negatively mediate cytokinin in tobacco leaf [476]. Notably, in this study, CWINI was found to have a positive correlation with ethylene biosynthesis genes (ACO_1, ACO_2 and ACS_3) to force bean ripening as well as dehydration response, PMT21 as the early dehydration response. SUS1

109 and CWINI genes are potentially key candidates to regulate bean growth and the downstream key components accumulation.

In conclusion, analysis of the candidate genes identified in this study can be used for breeding to produce high-quality coffee beans or exploring ways to improve coffee quality by manipulating the environment of the developing bean. Adaptation of coffee production to new environments may be facilitated by selection of genotypes with higher expression of the genes contributing to coffee quality under the target environmental conditions.

110

Chapter 7 Conclusion

This project demonstrated the production of higher quality coffee beans from the lower canopy. Improved understanding of the molecular and genetic basis of coffee quality and canopy influences was facilitated by phenotypic and transcriptomic analysis. Significantly higher caffeine, sucrose and trigonelline contents as well as intense aroma were found in coffee beans from the lower canopy.

To understand this improvement, transcriptome analysis was proposed. However, without an existing reference genome or transcriptome, long-read sequencing technology was used to build a transcriptome reference, the LRS coffee bean transcriptome reference. Other than the usage for downstream analysis, this transcriptome reference also provides insights to the diversity of the tetraploid genome. Long 5’UTR sequences were also found in this reference, which will inspire future related studies in plant.

RNA-seq analysis was based on short-read sequencing and the LRS coffee bean transcriptome reference built in this project. Differentially expressed genes through coffee bean ripening explain the major components accumulated pattern. Many of the key components were formed at the early ripening stage, the green stage, for example, TAG and galactomannan. Importantly, a co-expression network was generated for the analysis of further regulation and control of the accumulation of key components. Alpha expansin was found to be one of the most important genes in this network.

A study of canopy position influences on coffee quality was conducted by comparative transcriptome analysis. Key candidate genes contributing to the differences in coffee quality were found. Improved chemical composition resulted from higher expression of candidate genes as well as longer and slower maturity of the coffee beans from the lower canopy. Longer and slower maturity was likely to have resulted from candidate hormone genes inhibiting growth at the green stage as well as less photosynthesis loss in the later stages.

111

Bibliography

1. Stevens, P.F. Angiosperm Phylogeny Website, Version 7. 2006 [cited 2006 27 December]; Available from: http://www.mobot.org/MOBOT/research/APweb/ 2. Wintgens, J.N., Coffee: growing, processing, sustainable production : a guidebook for growers, processors, traders, and researchers. Second, updated edition. ed. 2012: Wiley-VCH Verlag GmbH & Co. KGaA,Weinheim, Germany. 3. Statistics. World's largest coffee producing countries in 2017 (in 1,000 60 kilogram bags). 2018 [cited 2018; This statistic shows coffee production worldwide in 2017, by country. In that year, Brazil produced some 55 million 60 kilogram bags of coffee. Vietnam came in second, with about 28.5 million 60 kilogram bags of coffee. ]. Available from: https://www.statista.com/statistics/277137/world-coffee-production-by-leading-countries/. 4. Klein, A.M., Steffan-Dewenter, I., and Tscharntke, T., Pollination of Coffea canephora in relation to local and regional agroforestry management. Journal of Applied Ecology, 2003. 40(5): p. 837-845. 5. Ky, C.L., Louarn, J., Dussert, S., et al., Caffeine, trigonelline, chlorogenic acids and sucrose diversity in wild Coffea arabica L. and C. canephora P. accessions. Food Chemistry, 2001. 75(2): p. 223-230. 6. Christophe, M., Thierry, L., Christian, C., et al., Heterozygous genotypes are efficient testers for assessing between-population combining ability in the reciprocal recurrent selection of Coffea canephora. Euphytica, 2008. 160(1): p. 101-110. 7. Pilar, M. and Susan, M., Simple sequence repeat diversity in diploid and tetraploid Coffea species. Genome, 2004. 47(3): p. 501-509. 8. Juan, C., Herrera, P.; , Gabriel, A., A.., Hernando, A., Cortina, G.;, et al., Genetic analysis of partial resistance to coffee leaf rust (Hemileia vastatrix Berk & Br.) introgressed into the cultivated Coffea arabica L. from the diploid C. canephora species. Euphytica, 2009. 167(1): p. 57-67. 9. Organization, I.C. Coffee Market Report. 2017 [cited 2018; Available from: http://www.ico.org/documents/cy2017-18/cmr-0118-e.pdf. 10. Lashermes, P., Combes, M.C., Robert, J., et al., Molecular characterisation and origin of the Coffea arabica L. genome. Molecular and General Genetics MGG, 1999. 261(2): p. 259-266. 11. Lashermes, P., Combes, M.C., Prakash, N.S., et al., Genetic linkage map of Coffea canephora: effect of segregation distortion and analysis of recombination rate in male and female meioses. Genome, 2001. 44(4): p. 589-595. 12. Berecha, G., Aerts, R., Vandepitte, K., et al., Effects of forest management on mating patterns, pollen flow and intergenerational transfer of genetic diversity in wild Arabica coffee (Coffea arabica L.) from Afromontane rainforests. Biological Journal of the Linnean Society, 2014. 112(1): p. 76-88. 13. Free, J.B., Insect polination of crops. 1993: London: Academic Press. 14. Klein, A.-M., Steffan-Dewenter, I., and Tscharntke, T., Fruit set of highland coffee increases with the diversity of pollinating bees. Proceedings of the Royal Society B: Biological Sciences, 2003. 270(1518): p. 955-961. 15. Rehm, S.E., G., The cultivated plants of the tropics and subtropics. Cultivation, economic value, utilization. 1991: Weikersheim: Markgraf. 16. Silvarolla, M., B., Mazzafera, P., and Fazuoli, L., C., Plant biochemistry: A naturally decaffeinated arabica coffee. Nature, 2004. 429(6994): p. 826-826. 17. Oxfam, B., O., Charavat, C., & Eagleton, D.), The coffee market: A background study. 2001: London: Oxfam. 18. Nina, L. and Gregory, D., The coffee book: Anatomy of an industry from crop to the last drop. The Rep Rev Up edition ed. 2006: The New Press, New York. 19. Fridell, G., coffee. 2014: Polity Press, Malden, MA, Cambridge, UK.

112

20. Goodman, D., Agro-food studies in the "Age of Ecology": Nature, corporeality, bio-Politics. Sociologia Ruralis, 1999. 39(1): p. 17-38. 21. Rice, A.R., Noble goals and challenging terrain: Organic and movements in the global marketplace. Journal of Agricultural and Environmental Ethics, 2001. 14(14): p. 39–66. 22. ICO. Coffee Prices - 45 Year Historical Chart. 2018 [cited 2018 2018 April]; Available from: http://www.macrotrends.net/2535/coffee-prices-historical-chart-data. 23. Bosselmann, A.S., Dons, K., Oberthur, T., et al., The influence of shade trees on coffee quality in small holder coffee agroforestry systems in Southern Colombia Agriculture. Ecosystems and Environment, 2009. 129: p. 253–260. 24. Läderacha, P., Oberthürb, T., Cook, S., et al., Systematic agronomic farm management for improved coffee quality. Field Crops Research, 2011. 120: p. 321–329. 25. Yu, Q.Y., Guyot, R., Kochko, A., et al., Micro-collinearity and genome evolution in the vicinity of an ethylene receptor gene of cultivated diploid and allotetraploid coffee species (Coffea). The Plant Journal, 2011. 67: p. 305–317. 26. Avelino, J., Barboza, B., Araya, J.C., et al., Effects of slope exposure, altitude and yield on coffee quality in two altitude terroirs of Costa Rica, Orosi and Santa Maria de Dota. Journal of the Science of Food and Agriculture, 2005. 85(11): p. 1869-1876. 27. Leroy, T., Ribeyre, F., Bertrand, B., et al., Genetics of coffee quality. Brazilian Journal of Plant Physiology, 2006. 18: p. 229-242. 28. Lashermes, P., Combes, M.C., Topart, P., et al., Molecular Breeding in Coffee (Coffea Arabica L.), in Coffee Biotechnology and Quality, T. Sera, et al., Editors. 2000, Springer Netherlands. p. 101-112. 29. Leroy, T., Bellis, F., Legnate, H., et al., Improving the quality of African robustas: QTLs for yield- and quality-related traits in Coffea canephora. Tree Genetics & Genomes, 2011. 7: p. 781–798. 30. Joët, T., Laffargue, A., Descroix, F., et al., Influence of environmental factors, wet processing and their interactions on the biochemical composition of green Arabica coffee beans. Food Chemistry, 2010. 118(3): p. 693-701. 31. Barel, M. and Jacquet, M., Coffee quality: its causes, appreciation and improvement. Plant Rech. Dévelop, , 1994. 1: p. 5-13. 32. Muschler, R.G., Shade management and its effect on coffee growth and quality. Coffee: Growing, processing, sustainable production: A guidebook for growers, processors, traders, and researchers, 2008: p. 391-418. 33. Van der Vossen, H.A.M. and Walyaro, D.J., The coffee in Kenya-review of the progress made since 1977 and plan of action for the coming years. Kenya coffee, 1981. 46: p. 113-130. 34. Dessalegn, Y., Labuschagne, M.T., Osthoff, G., et al., Genetic diversity and correlation of bean caffeine content with cup quality and green bean physical characteristics in coffee (Coffea arabica L.). Journal of the Science of Food and Agriculture, 2008. 88(10): p. 1726-1730. 35. Vaast, P., Bertrand, B., Perriot, J.-J., et al., Fruit thinning and shade improve bean characteristics and beverage quality of coffee (Coffea arabica L.) under optimal conditions. Journal of the Science of Food and Agriculture, 2006. 86(2): p. 197-204. 36. Franca, A.S., Oliveira, L.S., Mendonça, J.C.F., et al., Physical and chemical attributes of defective crude and roasted coffee beans. Food Chemistry, 2005. 90(1–2): p. 89-94. 37. Salva, T.J.G., Ribeiro, J.S., and Ferreira, M.M.C., Chemometric models for the quantitative descriptive sensory analysis of Arabica coffee beverages using near infrared spectroscopy. Talanta, 2011. 83: p. 1352–1358. 38. Farah, A., Monteiro, M.C., Calado, V., et al., Correlation between cup quality and chemical attributes of Brazilian coffee. Food Chemistry, 2006. 98(2): p. 373-380.

113

39. Sunarharum, W.B., Williams, D.J., and Smyth, H.E., Complexity of coffee flavor: A compositional and sensory perspective. Food Research International, 2014. 62: p. 315-325. 40. Decazy, F., Avelino, J., Guyot, B., et al., Quality of Different Honduran Coffees in Relation to Several Environments. Journal of Food Science, 2003. 68(7): p. 2356-2361. 41. Yeretzian, C., Jordan, A., Badoud, R., et al., From the green bean to the cup of coffee: investigating coffee roasting by on-line monitoring of volatiles. European Food Research and Technology, 2002. 214(2): p. 92-104. 42. Jean, C., Galland,;, Jacques, A., ;, Alejandra, L., ; , et al., Origin coffees: are Appellations of Origin on the horizon? Coffee, terroirs and qualities, 2006: p. 50-56. 43. Bucheli, P., Meyer, I., Pittet, A., et al., Industrial Storage of Green Robusta Coffee under Tropical Conditions and Its Impact on Raw Material Quality and Ochratoxin A Content. Journal of Agricultural and Food Chemistry, 1998. 46(11): p. 4507-4511. 44. Van Der Vossen, H.A.M., The Cup Quality of Disease-Resistant Cultivars of Arabica Coffee ( Coffea arabica). Experimental Agriculture, 2009. 45(03): p. 323-332. 45. Weinberger, B.A. and Bealer, B.K., The world of caffeine. 2002, NewYork: Routledge. 46. Debry, G., Coffee and Health. 1994, Paris: John Libbey. 47. Barone, J.J. and Roberts, H.R., Caffeine consumption. Food and Chemical Toxicology, 1996. 34(1): p. 119-129. 48. Denoeud, F., Carretero-Paulet, L., Dereeper, A., et al., The coffee genome provides insight into the convergent evolution of caffeine biosynthesis. science, 2014. 345(6201): p. 1181- 1184. 49. Trask, A.V., Motherwell, W.D.S., and Jones, W., Pharmaceutical Cocrystallization: Engineering a Remedy for Caffeine Hydration. Crystal Growth & Design, 2005. 5(3): p. 1013- 1021. 50. Jiang, L., Ding, Y., Jiang, F., et al., Electrodeposited nitrogen-doped graphene/carbon nanotubes nanocomposite as enhancer for simultaneous and sensitive voltammetric determination of caffeine and vanillin. Analytica Chimica Acta, 2014. 833(0): p. 22-28. 51. Andrea, I. and Rinantonio, V., Espresso coffee : the science of quality 2nd edition ed. 2005, Amsterdam: Elsevier Academic. 52. Filho, O. and Mazzafera, P., Caffeine Does Not Protect Coffee Against the Leaf Miner Perileucoptera coffeella. Journal of Chemical Ecology, 2000. 26(6): p. 1447-1464. 53. Hiroshi, S., Yun-Soo, K., and Yong-Eui, C., Like Cures Like: Caffeine Immunizes Plants Against Biotic Stresses. 2013. 54. Mazzafera, P. and Carvalho, A., Breeding for low seed caffeine content of coffee (Coffea L.) by interspecific hybridization. Euphytica, 1991. 59(1): p. 55-60. 55. Rakotomalala, J.-J.R., Cros, E., Clifford, M.N., et al., Caffeine and theobromine in green beans from Mascarocoffea. Phytochemistry, 1992. 31(4): p. 1271-1272. 56. Campa, C., Doulbeau, S., Dussert, S., et al., Diversity in bean Caffeine content among wild Coffea species: Evidence of a discontinuous distribution. Food Chemistry, 2005. 91(4): p. 633- 637. 57. Akaffou, D.S., Ky, C.L., Barre, P., et al., Identification and mapping of a major gene (Ft1) involved in fructification time in the interspecific cross Coffea pseudozanguebariae × C. liberica var. Dewevrei: impact on caffeine content and seed weight. Theoretical and Applied Genetics, 2003. 106(8): p. 1486-1490. 58. Davis, A.P., Govaerts, R., Bridson, D.M., et al., An annotated taxonomic conspectus of the genus Coffea (Rubiaceae). Botanical Journal of the Linnean Society, 2006. 152(4): p. 465-512. 59. Oestreich-Janzen, S., Chemistry of coffee. Chem. Biol, 2010. 3: p. 1085-1117. 60. Bonham, C.C., Wood, K.V., Yang, W.-J., et al., Identification of quaternary ammonium and tertiary sulfonium compounds by plasma desorption mass spectrometry. Journal of Mass Spectrometry, 1995. 30(8): p. 1187-1194.

114

61. Stennert, A. and Maier, H., Trigonelline in coffee. Zeitschrift für Lebensmittel-Untersuchung und Forschung, 1994. 199(3): p. 198-200. 62. Dart, S.K. and Nursten, H.E., Volatile Components, in Coffee, R.J. Clarke and R. Macrae, Editors. 1985, Springer Netherlands. p. 223-265. 63. Clifford, M.N., Chemical and Physical Aspects of Green Coffee and Coffee Products, in Coffee, M.N. Clifford and K.C. Willson, Editors. 1985, Springer, US. p. 305-374. 64. De Maria, C.A.B., Trugo, L.C., Neto, F.R.A., et al., Composition of green coffee water-soluble fractions and identification of volatiles formed during roasting. Food Chemistry, 1996. 55(3): p. 203-207. 65. Trugo, L., Analysis of coffee products. Encyclopedia of Food Sciences and Nutrition, 2nd Ed.; Trugo, LC, Finglas, PM, Eds, 2003: p. 1498-1506. 66. Teply, L.J. and Prier, R.F., Nutrients in Coffee, Nutritional Evaluation of Coffee Including Niacin Bioassay. Journal of Agricultural and Food Chemistry, 1957. 5(5): p. 375-377. 67. Perrone, D., Donangelo, C.M., and Farah, A., Fast simultaneous analysis of caffeine, trigonelline, nicotinic acid and sucrose in coffee by liquid chromatography–mass spectrometry. Food Chemistry, 2008. 110(4): p. 1030-1035. 68. Campa, C., Ballester, J.F., Doulbeau, S., et al., Trigonelline and sucrose diversity in wild Coffea species. Food Chemistry, 2004. 88(1): p. 39-43. 69. Figueiredo, L.P., Borém, F.M., Cirillo, M.Â., et al., The Potential for High Quality Bourbon Coffees From Different Environments. 2013. Vol. 5. 2013. 70. Sridevi, V. and Giridhar, P., Influence of Altitude Variation on Trigonelline Content during Ontogeny of Coffea Canephora Fruit. Journal of Food Studies, 2013. 2(1): p. 62-74. 71. van Dam, R.M. and Feskens, E.J.M., Coffee consumption and risk of type 2 diabetes mellitus. The Lancet, 2002. 360(9344): p. 1477-1478. 72. Lindsay, J., Laurin, D., Verreault, R., et al., Risk Factors for Alzheimer’s Disease: A Prospective Analysis from the Canadian Study of Health and Aging. American Journal of Epidemiology, 2002. 156(5): p. 445-453. 73. Almeida, A.A.P., Farah, A., Silva, D.A.M., et al., Antibacterial Activity of Coffee Extracts and Selected Coffee Chemical Compounds against Enterobacteria. Journal of Agricultural and Food Chemistry, 2006. 54(23): p. 8738-8743. 74. Michel David dos Santos, Almeida, M.C., Lopes, N.P., et al., Evaluation of the Anti- inflammatory, Analgesic and Antipyretic Activities of the Natural Polyphenol Chlorogenic Acid. Biological and Pharmaceutical Bulletin, 2006. 29(11): p. 2236-2240. 75. Ranheim, T. and Halvorsen, B., Coffee consumption and human health–beneficial or detrimental?–Mechanisms for effects of coffee consumption on different risk factors for cardiovascular disease and type 2 diabetes mellitus. Molecular nutrition & food research, 2005. 49(3): p. 274-284. 76. De Maria, C.A.B., Trugo, L.C., Moreira, R.F.A., et al., Simultaneous determination of total chlorogenic acid, trigonelline and caffeine in green coffee samples by high performance gel filtration chromatography. Food Chemistry, 1995. 52(4): p. 447-449. 77. Rice-Evans, C.A., Miller, N.J., and Paganga, G., Structure-antioxidant activity relationships of flavonoids and phenolic acids. Free Radical Biology and Medicine, 1996. 20(7): p. 933-956. 78. Matsuda, F., Morino, K., Miyashita, M., et al., Metabolic Flux Analysis of the Phenylpropanoid Pathway in Wound-Healing Potato Tuber Tissue using Stable Isotope-Labeled Tracer and LC- MS Spectroscopy. Plant and Cell Physiology, 2003. 44(5): p. 510-517. 79. Peterson, J., Harrison, H., Snook, M., et al., Chlorogenic acid content in sweetpotato germplasm: possible role in disease and pest resistance. Allelopathy Journal, 2005. 16(2): p. 239-249. 80. Niggeweg, R., Michael, A.J., and Martin, C., Engineering plants with increased levels of the antioxidant chlorogenic acid. Nature biotechnology, 2004. 22(6): p. 746.

115

81. Farah, A., Monteiro, M., Donangelo, C.M., et al., Chlorogenic Acids from Green Coffee Extract are Highly Bioavailable in Humans. The Journal of Nutrition, 2008. 138(12): p. 2309-2315. 82. Farah, A. and Donangelo, C.M., Phenolic compounds in coffee. Brazilian Journal of Plant Physiology, 2006. 18(1): p. 23-36. 83. Campa, C., Doulbeau, S., Dussert, S., et al., Qualitative relationship between caffeine and chlorogenic acid contents among wild Coffea species. Food chemistry, 2005. 93(1): p. 135- 139. 84. Preedy, V.R., Processing and Impact on Antioxidants in Beverages. 2014: Elsevier. 85. Geromel, C., Ferreira, L.P., Guerreiro, S.M.C., et al., Biochemical and genomic analysis of sucrose metabolism during coffee (Coffea arabica) fruit development. Journal of Experimental Botany, 2006. 57(12): p. 3243-3258. 86. Grosch, W., Chemistry III: Volatile Compounds, in Coffee. 2008, Blackwell Science Ltd. p. 68- 89. 87. Zyzak, D.V., Sanders, R.A., Stojanovic, M., et al., Acrylamide formation mechanism in heated foods. Journal of agricultural and food chemistry, 2003. 51(16): p. 4782-4787. 88. Glatt, H., Schneider, H., and Liu, Y., V79-hCYP2E1-hSULT1A1, a cell line for the sensitive detection of genotoxic effects induced by carbohydrate pyrolysis products and other food- borne chemicals. Mutation Research/Genetic Toxicology and Environmental Mutagenesis, 2005. 580(1): p. 41-52. 89. Knopp, S., Bytof, G., and Selmar, D., Influence of processing on the content of sugars in green Arabica coffee beans. European Food Research and Technology, 2006. 223(2): p. 195-201. 90. Ky, C.L., Doulbeau, B., Guyot, B., et al., Inheritance of coffee bean sucrose content in the interspecific cross Coffea pseudozanguebariae× Coffea liberica ‘dewevrei’. , 2000. 119(2): p. 165-168. 91. Casal, S., Oliveira, M.B., and Ferreira, M.A., HPLC/diode-array applied to the thermal degradation of trigonelline, nicotinic acid and caffeine in coffee. Food Chemistry, 2000. 68(4): p. 481-485. 92. De Azevedo, A., Kieckbush, T., Tashima, A., et al., Extraction of green coffee oil using supercritical carbon dioxide. The Journal of Supercritical Fluids, 2008. 44(2): p. 186-192. 93. Kurzrock, T. and Speer, K., Diterpenes and diterpene esters in coffee. Food Reviews International, 2001. 17(4): p. 433-450. 94. Speer, K. and Kölling-Speer, I., The lipid fraction of the coffee bean. Brazilian Journal of Plant Physiology, 2006. 18(1): p. 201-216. 95. Lam, L.K., Sparnins, V.L., and Wattenberg, L.W., Effects of derivatives of kahweol and cafestol on the activity of glutathione S-transferase in mice. Journal of medicinal chemistry, 1987. 30(8): p. 1399-1403. 96. Cavin, C., Holzhäuser, D., Constable, A., et al., The coffee-specific diterpenes cafestol and kahweol protect against aflatoxin B1-induced genotoxicity through a dual mechanism. Carcinogenesis, 1998. 19(8): p. 1369-1375. 97. Abidi, S.L. and Mounts, T.L., Reversed-phase high-performance liquid chromatographic separations of tocopherols. Journal of Chromatography A, 1997. 782(1): p. 25-32. 98. Konings, E., Roomans, H., and Beljaars, P.R., Liquid chromatographic determination of tocopherols and tocotrienols in margarine, infant foods, and vegetables. Journal of AOAC International, 1995. 79(4): p. 902-906. 99. Speer, K., Tewis, R., and Montag, A. A new roasting component in coffee. in International Colloquium on the Chemistry of Coffee. 1991. ASIC Paris. 100. Tewis, R., Montag, A., and Speer, K. Dehydrocafestol and dehydrokahweol: two new roasting components in coffee. in Colloq. Sci. Int. Cafe. 1993. 101. Kölling-Speer, I., Kurt, A., and Speer, K. Cafestol an dehydrocafestol in roasted coffee. in COLLOQUE SCIENTIFIQUE INTERNATIONAL SUR LE CAFE. 1997. ASIC ASSOCIATION SCIENTIFIQUE INTERNATIONALE.

116

102. Aro, A., Tuomilehto, J., Kostiainen, E., et al., Bioled coffee increases serum low density lipoprotein concentration. Metabolism, 1987. 36(11): p. 1027-1030. 103. Bønaa, K., Arnesen, E., Thelle, D.S., et al., Coffee and cholesterol: is it all in the brewing? The Tromsø Study. BMJ, 1988. 297(6656): p. 1103-1104. 104. , A. and Viani, R., Espresso coffee: the chemistry of quality. 1995: Academic Press London, UK. 105. Semmelroch, P., Laskawy, G., Blank, I., et al., Determination of potent odourants in roasted coffee by stable isotope dilution assays. Flavour and fragrance journal, 1995. 10(1): p. 1-7. 106. Freitas, A.C. and Mosca, A., Coffee geographic origin—an aid to coffee differentiation. Food research international, 1999. 32(8): p. 565-573. 107. Sunarharum, W., B., Williams, D., J., and Smyth, H., E., Complexity of coffee flavor: A compositional and sensory perspective. Food Research International, 2014. 62: p. 315-325. 108. Nijssen, L., Volatile compounds in food: qualitative and quantitative data. 1996: TNO Nutrition and Food Research Institute. 109. Leloup, V. and Liardon, R. Analytical characterization of coffee carbohydrates. in COLLOQUE SCIENTIFIQUE INTERNATIONAL SUR LE CAFE. 1993. ASIC ASSOCIATION SCIENTIFIQUE INTERNATIONALE. 110. Hofmann, T., Czerny, M., and Schieberle, P. Instrumental analysis and sensory studies on the role of melanoidins in the aroma staling of coffee brew. in 19ème Colloque Scientifique International sur le Café, Trieste, Italy, 14-18 mai 2001. 2001. Association Scientifique Internationale du Café (ASIC). 111. Semmelroch, P. and Grosch, W., Analysis of roasted coffee powders and brews by gas chromatography-olfactometry of headspace samples. LWT-Food Science and Technology, 1995. 28(3): p. 310-313. 112. Blank, I., Sen, A., and Grosch, W., Potent odorants of the roasted powder and brew of Arabica coffee. Zeitschrift für Lebensmittel-Untersuchung und Forschung, 1992. 195(3): p. 239-245. 113. Michishita, T., Akiyama, M., Hirano, Y., et al., Gas chromatography/olfactometry and electronic nose analyses of retronasal aroma of espresso and correlation with sensory evaluation by an artificial neural network. Journal of food science, 2010. 75(9): p. S477-S489. 114. Akiyama, M., Murakami, K., Ohtani, N., et al., Analysis of volatile compounds released during the grinding of roasted coffee beans using solid-phase microextraction. Journal of agricultural and food chemistry, 2003. 51(7): p. 1961-1969. 115. Grosch, W., Chemistry III: volatile compounds. Coffee: recent developments, 2001: p. 68-89. 116. Blank, I., Sen, A., and Grosch, W. Aroma impact compounds of arabica and robusta coffee. Qualitative and quantitative investigations. in Quatorzieme colloque scientifique international sur le cafe, San Francisco, 14-19 juillet 1991. 1991. 117. Ribeiro, J., Augusto, F., Salva, T., et al., Prediction of sensory properties of Brazilian Arabica roasted coffees by headspace solid phase microextraction-gas chromatography and partial least squares. Analytica Chimica Acta, 2009. 634(2): p. 172-179. 118. Belitz, H.D., Food chemistry. Aroma Compounds, 2009: p. 340-402. 119. Czerny, M., Christlbauer, M., Christlbauer, M., et al., Re-investigation on odour thresholds of key food aroma compounds and development of an aroma language based on odour qualities of defined aqueous odorant solutions. European Food Research and Technology, 2008. 228(2): p. 265-273. 120. Monien, B.H., Herrmann, K., Florian, S., et al., Metabolic activation of furfuryl alcohol: formation of 2-methylfuranyl DNA adducts in Salmonella typhimurium strains expressing human sulfotransferase 1A1 and in FVB/N mice. Carcinogenesis, 2011: p. bgr126. 121. Peterson, L.A., Electrophilic Intermediates Produced by Bioactivation of Furan*. Drug metabolism reviews, 2006. 38(4): p. 615-626.

117

122. Crews, C. and Castle, L., A review of the occurrence, formation and analysis of furan in heat- processed foods. Trends in Food Science & Technology, 2007. 18(7): p. 365-372. 123. Shimoda, M. and Shibamoto, T., Isolation and identification of headspace volatiles from brewed coffee with an on-column GC/MS method. Journal of agricultural and food chemistry, 1990. 38(3): p. 802-804. 124. Leino, M., Lapveteläinen, A., Menchero, P., et al., Characterisation of stored Arabica and Robusta coffees by headspace-GC and sensory analyses. Food quality and preference, 1992. 3(2): p. 115-125. 125. Petisca, C., Pérez-Palacios, T., Farah, A., et al., Furans and other volatile compounds in ground roasted and espresso coffee using headspace solid-phase microextraction: Effect of roasting speed. Food and Bioproducts Processing, 2013. 91(3): p. 233-241. 126. Ho, C., Hwang, H., Yu, T., et al. An overview of the Maillard reactions related to aroma generation in coffee. in COLLOQUE Scientifique International sur le Café, 15. Montpellier (Francia), Juin 6-11, 1993.. 1993. 127. Kuballa, T., Stier, S., and Strichow, N., Furan concentrations in coffee and coffee beverages. Deutsche Lebensmittel-Rundschau, 2005. 101(6): p. 229-235. 128. Akiyama, M., Murakami, K., Ikeda, M., et al., Analysis of the headspace volatiles of freshly brewed Arabica coffee using solid‐phase microextraction. Journal of food science, 2007. 72(7): p. C388-C396. 129. Cheong, M.W., Tong, K.H., Ong, J.J.M., et al., Volatile composition and antioxidant capacity of Arabica coffee. Food Research International, 2013. 51(1): p. 388-396. 130. Grosch, W., Semmelroch, P., and Masanetz, C. Quantification of potent odorants in coffee. in COLLOQUE Scientifique International sur le Café, 15. Montpellier (Francia), Juin 6-11, 1993.. 1993. 131. Bertrand, B., Boulanger, R., Dussert, S., et al., Climatic factors directly impact the volatile organic compound fingerprint in green Arabica coffee bean as well as coffee beverage quality. Food chemistry, 2012. 135(4): p. 2575-2583. 132. Anthony, F., Bertrand, B., Quiros, O., et al., Genetic diversity of wild coffee (Coffea arabica L.) using molecular markers. Euphytica, 2001. 118(1): p. 53-65. 133. Carvalho, A., Principles and practice of coffee plant breeding for productivity and quality factors: Coffea arabica. Coffee agronomy, 1988: p. 129-165. 134. van der Vossen, H.M., Coffee Selection and Breeding, in Coffee, M.N. Clifford and K.C. Willson, Editors. 1985, Springer US. p. 48-96. 135. Carvalho, A., Ferwerda, F.P., Frahm-Leliveld, J.A., et al., Coffee (Coffea arabica L. and Coffea canephora Pierre ex Froehner). in Miscellaneous Papers. 1969, Landbouwhogeschool Wageningen p. 189-241. 136. Surya Prakash, N., Combes, M.-C., Somanna, N., et al., AFLP analysis of introgression in coffee cultivars (Coffea arabica L.) derived from a natural interspecific hybrid. Euphytica, 2002. 124(3): p. 265-271. 137. Bettencourt, A., Considerações gerais sobre o ‘Híbrido de Timor’. Instituto Agronomico de Campinas, Circular, 1973(31): p. 285. 138. Bisco, J. and Logan, W., Types and varieties of coffee. Coffee handbook. Cannon Press, Harare, Zimbabwe, 1987. 139. Orozco-Castillo, C., Chalmers, K.J., Waugh, R., et al., Detection of genetic diversity and selective gene introgression in coffee using RAPD markers. Theoretical and Applied Genetics, 1994. 87(8): p. 934-940. 140. Spaniolas, S., May, S.T., Bennett, M.J., et al., Authentication of Coffee by Means of PCR-RFLP Analysis and Lab-on-a-Chip Capillary Electrophoresis. Journal of Agricultural and Food Chemistry, 2006. 54(20): p. 7466-7470.

118

141. Aga, E. and Bryngelsson, T., Inverse Sequence-tagged Repeat (ISTR) Analysis of Genetic Variability in Forest Coffee (Coffea arabica L.) from Ethiopia. Genetic Resources and Crop Evolution, 2006. 53(4): p. 721-728. 142. Poncet, V., Hamon, P., Minier, J., et al., SSR cross-amplification and variation within coffee trees (Coffea spp.). Genome, 2004. 47(6): p. 1071-1081. 143. Mishra, M., Tornincasa, P., De Nardi, B., et al., Genome organization in coffee as revealed by EST PCRRFLP, SNPs and SSR analysis. Journal of Crop Science and Biotechnology, 2011. 14(1): p. 25-37. 144. Pot, D., Bouchet, S., Marraccini, P., et al. Nucleotide diversity of genes involved in sucrose metabolism. Towards the identification of candidates genes controlling sucrose variability in Coffea sp. in 21st international conference on coffee science, montpellier. Association Scientifique Internationale du Cafe (ASIC), Paris. 2006. 145. Anthony, F., Combes, M., Astorga, C., et al., The origin of cultivated Coffea arabica L. varieties revealed by AFLP and SSR markers. Theoretical and Applied Genetics, 2002. 104(5): p. 894-900. 146. Baumann, T.W., Sondahl, M.R., Mösli Waldhauser, S.S., et al., Non-destructive analysis of natural variability in bean caffeine content of Laurina coffee. Phytochemistry, 1998. 49(6): p. 1569-1573. 147. de Brito, G., Caixeta, E., Gallina, A., et al., Inheritance of coffee leaf rust resistance and identification of AFLP markers linked to the resistance gene. Euphytica, 2010. 173(2): p. 255- 264. 148. Etienne, H., Anthony, F., Dussert, S., et al., Biotechnological applications for the improvement of coffee (Coffea arabica L.). In Vitro Cellular & Developmental Biology - Plant, 2002. 38(2): p. 129-138. 149. Lim, T., Coffea liberica, in Edible Medicinal And Non-Medicinal Plants. 2013, Springer. p. 710- 714. 150. Cramer, P.J.S., A review of literature of coffee research in Indonesia. 1957, Turrialba,Costa Rica: Intet-American Institute of Agriculture Sciences. 151. Appel, O., Disease Resistance in Plants. Science, 1915. 41(1065): p. 773-782. 152. Leroy, T., Montagnon, C., Charrier, A., et al., Reciprocal recurrent selection applied to Coffea canephora Pierre. I. Characterization and evaluation of breeding populations and value of intergroup hybrids. Euphytica, 1993. 67(1-2): p. 113-125. 153. Musoli, P., Cubry, P., Aluka, P., et al., Genetic differentiation of wild and cultivated populations: diversity of Coffea canephora Pierre in Uganda. Genome, 2009. 52(7): p. 634- 646. 154. Carvalho, A., Histórico do desenvolvimento do cultivo do café no Brasil. Documentos IAC, 1993. 34: p. 1-7. 155. Bouharmont, P., Lotodé, R., Awemo, J., et al., La sélection générative du caféier Robusta au Cameroun. Analyse des résultats d’un essai d’hybrides diallele partiel implanté en 1973. Café Cacao Thé, 1986. 30(2): p. 93-112. 156. Lashermes, P., Cros, J., Combes, M.-C., et al., Inheritance and restriction fragment length polymorphism of chloroplast DNA in the genus Coffea L. Theoretical and Applied Genetics, 1996. 93(4): p. 626-632. 157. Noirot, M., Poncet, V., Barre, P., et al., Genome size variations in diploid African Coffea species. Annals of botany, 2003. 92(5): p. 709-714. 158. De Kochko, A., Akaffou, S., Andrade, A.C., et al., Advances in Coffea genomics. Advances in botanical research, 2010. 53: p. 23-63. 159. Devos, K.M., Moore, G., and Gale, M.D., Conservation of marker synteny during evolution. Euphytica, 1995. 85(1-3): p. 367-372. 160. Samson, N., Bausher, M.G., Lee, S.B., et al., The complete nucleotide sequence of the coffee (Coffea arabica L.) chloroplast genome: organization and implications for biotechnology and

119

phylogenetic relationships amongst angiosperms. Plant biotechnology journal, 2007. 5(2): p. 339-353. 161. Tran, H.T., Ramaraj, T., Furtado, A., et al., Use of a draft genome of coffee (Coffea arabica) to identify SNP s associated with caffeine content. Plant biotechnology journal, 2018. 162. Research, W.C. Coffea arabica genome. 2018; Available from: https://worldcoffeeresearch.org/work/coffea-arabica-genome/. 163. Mantilla, J.G., Galeano, N.F., Gaitan, A.L., et al., Transcriptome analysis of the entomopathogenic fungus Beauveria bassiana grown on the coffee berry borer (Hypothenemus hampei). Microbiology, 2012: p. mic. 0.051664-0. 164. Medrano, J.F., Koyner, R., Islas-Trejo, A., et al. Coffee Transcriptome Analysis using RNA Sequencing: Comparison of Coffees Grown at Different Altitudes. in Plant and Animal Genome XXI Conference. 2013. Plant and Animal Genome. 165. Ivamoto, S.T. Transcriptome Analysis in Leaves, Flowers and Initial Fruit Development of Coffea arabica L. in Plant and Animal Genome XXIII Conference. 2015. Plant and Animal Genome. 166. Combes, M.C., Dereeper, A., Severac, D., et al., Contribution of subgenomes to the transcriptome and their intertwined regulation in the allopolyploid Coffea arabica grown at contrasted temperatures. New Phytologist, 2013. 200(1): p. 251-260. 167. Van der Hoeven, R., Ronning, C., Giovannoni, J., et al., Deductions about the number, organization, and evolution of genes in the tomato genome based on analysis of a large expressed sequence tag collection and selective genomic sequencing. The Plant Cell Online, 2002. 14(7): p. 1441-1456. 168. Tang, J., Vosman, B., Voorrips, R.E., et al., QualitySNP: a pipeline for detecting single nucleotide polymorphisms and insertions/deletions in EST data from diploid and polyploid species. BMC bioinformatics, 2006. 7(1): p. 438. 169. Rudd, S., Expressed sequence tags: alternative or complement to whole genome sequences? Trends in plant science, 2003. 8(7): p. 321-329. 170. NCBI. Number of public entries. 2013; Available from: http://www.ncbi.nlm.nih.gov/genbank/dbest/dbest_summary. 171. Vieira, L.G.E., Andrade, A.C., Colombo, C.A., et al., Brazilian coffee genome project: an EST- based genomic resource. Brazilian Journal of Plant Physiology, 2006. 18(1): p. 95-108. 172. Lin, C., Mueller, L.A., Mc Carthy, J., et al., Coffee and tomato share common gene repertoires as revealed by deep sequencing of seed and cherry transcripts. Theoretical and applied genetics, 2005. 112(1): p. 114-130. 173. Montoya, G., Vuong, H., Buell, C., et al. Sequence analysis from leaves, flowers and fruits of Coffea arabica var. Caturra. in 21st International Conference on Coffee Science, Montpellier, France, 11-15 September, 2006. 2007. Association Scientifique Internationale du Café (ASIC). 174. Bell, J., De Nardi, B., Dreos, R., et al., Differential responses of Coffea arabica L. leaves and roots to chemically induced systemic acquired resistance. Genome, 2006. 49(12): p. 1594- 1605. 175. Poncet, V., Rondeau, M., Tranchant, C., et al., SSR mining in coffee tree EST databases: potential use of EST–SSRs as markers for the Coffea genus. Molecular Genetics and Genomics, 2006. 276(5): p. 436-449. 176. Pereira, L.F.P., Galvão, R.M., Kobayashi, A.K., et al., Ethylene production and acc oxidase gene expression during fruit ripening of Coffea arabica L. Brazilian Journal of Plant Physiology, 2005. 17(3): p. 283-289. 177. Ságio, S.A., Barreto, H.G., Lima, A.A., et al., Identification and expression analysis of ethylene biosynthesis and signaling genes provides insights into the early and late coffee cultivars ripening pathway. Planta, 2014. 239(5): p. 951-963.

120

178. Mizuno, K., Kato, M., Irino, F., et al., The first committed step reaction of caffeine biosynthesis: 7-methylxanthosine synthase is closely homologous to caffeine synthases in coffee (Coffea arabica L.). FEBS letters, 2003. 547(1): p. 56-60. 179. Uefuji, H., Ogita, S., Yamaguchi, Y., et al., Molecular cloning and functional characterization of three distinct N-methyltransferases involved in the caffeine biosynthetic pathway in coffee plants. Plant physiology, 2003. 132(1): p. 372-380. 180. Negishi, O., Ozawa, T., and Imagawa, H., N-methyl nucleosidase from tea leaves. Agricultural and biological chemistry, 1988. 52(1): p. 169-175. 181. McCarthy, A.A. and McCarthy, J.G., The structure of two N-methyltransferases from the caffeine biosynthetic pathway. Plant physiology, 2007. 144(2): p. 879-889. 182. Ashihara, H. and Suzuki, T., Distribution and biosynthesis of caffeine in plants. Front Biosci, 2004. 9(2): p. 1864-76. 183. Ogawa, M., Herai, Y., Koizumi, N., et al., 7-Methylxanthine Methyltransferase of Coffee Plants GENE ISOLATION AND ENZYMATIC PROPERTIES. Journal of Biological Chemistry, 2001. 276(11): p. 8213-8218. 184. Leroy, T., De Bellis, F., Legnate, H., et al., Improving the quality of African robustas: QTLs for yield-and quality-related traits in Coffea canephora. Tree Genetics & Genomes, 2011. 7(4): p. 781-798. 185. Joshi, J.G. and Handler, P., Biosynthesis of trigonelline. The Journal of biological chemistry, 1960. 235: p. 2981-2983. 186. Upmeier, B., Gross, W., Ko¨ster, S., et al., Purification and properties of S-adenosyl-l- methionine:Nicotinic acid-N-methyltransferase from cell suspension cultures of Glycine max L. Archives of Biochemistry and Biophysics, 1988. 262(2): p. 445-454. 187. Chen, X. and Wood, A.J., Purification and Characterization of S-Adenosyl-L-Methionine Nicotinic Acid-N-Methyltransferase from Leaves of Glycine max. Biologia Plantarum, 2004. 48(4): p. 531-535. 188. Mizuno, K., Matsuzaki, M., Kanazawa, S., et al., Conversion of nicotinic acid to trigonelline is catalyzed by N-methyltransferase belonged to motif B′ methyltransferase family in Coffea arabica. Biochemical and biophysical research communications, 2014. 452(4): p. 1060-1066. 189. Lepelley, M., Mahesh, V., McCarthy, J., et al., Characterization, high-resolution mapping and differential expression of three homologous PAL genes in Coffea canephora Pierre (Rubiaceae). Planta, 2012. 236(1): p. 313-326. 190. Huang, J., Gu, M., Lai, Z., et al., Functional analysis of the Arabidopsis PAL gene family in plant growth, development, and response to environmental stress. Plant Physiology, 2010. 153(4): p. 1526-1538. 191. Shi, R., Shuford, C.M., Wang, J.P., et al., Regulation of phenylalanine ammonia-lyase (PAL) gene family in wood forming tissue of Populus trichocarpa. Planta, 2013. 238(3): p. 487-497. 192. Tonnessen, B.W., Manosalva, P., Lang, J.M., et al., Rice phenylalanine ammonia-lyase gene OsPAL4 is associated with broad spectrum disease resistance. Plant molecular biology, 2014. 87(3): p. 273-286. 193. Zhang, X., Gou, M., and Liu, C.-J., Arabidopsis Kelch repeat F-box proteins regulate phenylpropanoid biosynthesis via controlling the turnover of phenylalanine ammonia-lyase. The Plant Cell Online, 2013. 25(12): p. 4994-5010. 194. Hyun, M.W., Yun, Y.H., Kim, J.Y., et al., Fungal and plant phenylalanine ammonia-lyase. Mycobiology, 2011. 39(4): p. 257-265. 195. Tovar, M.J., Romero, M.P., Girona, J., et al., L‐Phenylalanine ammonia‐lyase activity and concentration of phenolics in developing olive (Olea europaea L cv Arbequina) fruit grown under different irrigation regimes. Journal of the Science of Food and Agriculture, 2002. 82(8): p. 892-898.

121

196. Benveniste, I., Salaün, J.-P., and Durst, F., Phytochrome-mediated regulation of a monooxygenase hydroxylating cinnamic acid in etiolated pea seedlings. Phytochemistry, 1978. 17(3): p. 359-363. 197. Lepelley, M., Cheminade, G., Tremillon, N., et al., Chlorogenic acid synthesis in coffee: An analysis of CGA content and real-time RT-PCR expression of HCT, HQT, C3H1, and CCoAOMT1 genes during grain development in C. canephora. Plant Science, 2007. 172(5): p. 978-996. 198. Mahesh, V., Million-Rousseau, R., Ullmann, P., et al., Functional characterization of two p- coumaroyl ester 3′-hydroxylase genes from coffee tree: evidence of a candidate for chlorogenic acid biosynthesis. Plant molecular biology, 2007. 64(1-2): p. 145-159. 199. Fellenberg, C., Milkowski, C., Hause, B., et al., Tapetum‐specific location of a cation‐ dependent O‐methyltransferase in Arabidopsis thaliana. The Plant Journal, 2008. 56(1): p. 132-145. 200. Joët, T., Salmona, J., Laffargue, A., et al., Use of the growing environment as a source of variation to identify the quantitative trait transcripts and modules of co-expressed genes that determine chlorogenic acid accumulation. Plant Cell Environ, 2010. 33(7): p. 1220-33. 201. Privat, I., Foucrier, S., Prins, A., et al., Differential regulation of grain sucrose accumulation and metabolism in Coffea arabica (Arabica) and Coffea canephora (Robusta) revealed through gene expression and enzyme activity analysis. New Phytologist, 2008. 178(4): p. 781- 797. 202. Leroy, T., Marraccini, P., Dufour, M., et al., Construction and characterization of a Coffea canephora BAC library to study the organization of sucrose biosynthesis genes. Theor Appl Genet, 2005. 111(6): p. 1032-41. 203. Joët, T., Laffargue, A., Salmona, J., et al., Metabolic pathways in tropical dicotyledonous albuminous seeds: Coffea arabica as a case study. New Phytologist, 2009. 182(1): p. 146-162. 204. Beaudoin, F. and Napier, J.A., 8 Biosynthesis and compartmentation of triacylglycerol in higher plants, in Lipid Metabolism and Membrane Biogenesis. 2004, Springer. p. 267-287. 205. Privat, I., Bardil, A., Gomez, A.B., et al., The'PUCE CAFE'project: the first 15K coffee microarray, a new tool for discovering candidate genes correlated to agronomic and quality traits. BMC genomics, 2011. 12(1): p. 5. 206. Simkin, A.J., Qian, T., Caillet, V., et al., Oleosin gene family of Coffea canephora: quantitative expression analysis of five oleosin genes in developing and germinating coffee grain. Journal of plant physiology, 2006. 163(7): p. 691-708. 207. Cook, O.F., Shade in . 1901: US Dept. of Agriculture, Division of Botany. 208. Haarer, A.E., Modern coffee production. Modern coffee production, 1956. 209. Lock, C., Coffee: its culture and commerce in all countrie. 1888. 210. Borkhataria, R.R., Collazo, J.A., and Groom, M.J., Species abundance and potential biological control services in shade vs. sun coffee in Puerto Rico. Agriculture, Ecosystems & Environment, 2012. 151: p. 1-5. 211. Chaves, A.R., Ten-Caten, A., Pinheiro, H.A., et al., Seasonal changes in photoprotective mechanisms of leaves from shaded and unshaded field-grown coffee (Coffea arabica L.) trees. Trees, 2008. 22(3): p. 351-361. 212. DaMatta, F.M., Ecophysiological constraints on the production of shaded and unshaded coffee: a review. Field Crops Research, 2004. 86(2): p. 99-114. 213. Guiscafre-Arrillaga, J., Sombra o sol para el cafeto. El café de El Salvador, 1957. 308(309): p. 320-364. 214. Pendergrast, M., Uncommon grounds: The and how it transformed our world. 1999: Basic Books. 215. Gallina, S., Mandujano, S., and Gonzalez-Romero, A., Conservation of mammalian biodiversity in coffee plantations of Central Veracruz, Mexico. Agroforestry Systems, 1996. 33(1): p. 13-27.

122

216. Greenberg, R., Bichier, P., and Sterling, J., Bird populations in rustic and planted shade coffee plantations of eastern Chiapas, Mexico. Biotropica, 1997. 29(4): p. 501-514. 217. Perfecto, I., Rice, R.A., Greenberg, R., et al., Shade coffee: a disappearing refuge for biodiversity. BioScience, 1996: p. 598-608. 218. Barradas, V. and Fanjul, L., Microclimatic characterization of shade and open–grown coffee. Coffee arabica, 1986. 219. Goldberg, A. and Kigel, J., Dynamics of the weed community in coffee plantations grown under shade trees: effect of clearing. Israel journal of botany, 1986. 35(2): p. 121-131. 220. Muschler, R.G., Tree-crop compatibility in agroforestry: production and quality of coffee grown under managed tree shade in Costa Rica. 1998, University of Florida. 221. Schaller, M., Schroth, G., Beer, J., et al., Species and site characteristics that permit the association of fast-growing trees with crops: the case of Eucalyptus deglupta as coffee shade in Costa Rica. Forest Ecology and Management, 2003. 175(1): p. 205-215. 222. Willey, R. The use of shade in coffee, cocoa and tea. in Horticultural Abstracts. 1975. 223. Somporn, C., Kamtuo, A., Theerakulpisut, P., et al., Effect of shading on yield, sugar content, phenolic acids and antioxidant property of coffee beans (Coffea Arabica L. cv. Catimor) harvested from north‐eastern Thailand. Journal of the Science of Food and Agriculture, 2012. 92(9): p. 1956-1963. 224. Pompelli, M.F., Pompelli, G.M., de Oliveira, A.F., et al., The effect of light and nitrogen availability on the caffeine, theophylline and allantoin contents in leaves of Coffea arabica L. Aims Environmental Science, 2014. 1(1): p. 1-11. 225. Delaroza, F., Rakocevic, M., Malta, G.B., et al., Spectroscopic and chromatographic fingerprint analysis of composition variations in Coffea arabica leaves subject to different light conditions and plant phenophases. Journal of the Brazilian Chemical Society, 2014. 25(11): p. 1929-1938. 226. Marraccini, P., Geromel, C., Ferreira, L., et al. Soluble Sugars, Enzymatic Activities and Gene Expression during Development of Coffee Fruits Submitted to Shade Condition. in 21st International Conference on Coffee Science, Montpellier, France, 11-15 September, 2006. 2007. Association Scientifique Internationale du Café (ASIC). 227. Kumar, A., Simmi, P., Naik, G.K., et al., RP-HPLC and transcript profile indicate increased leaf caffeine in Coffea canephora plants by light. Journal of Biology and Earth Sciences, 2015. 5(1): p. 1-9. 228. Geromel, C., Ferreira, L.P., Davrieux, F., et al., Effects of shade on the development and sugar metabolism of coffee (Coffea arabica L.) fruits. Plant Physiology and Biochemistry, 2008. 46(5): p. 569-579. 229. Snoeck, J., Lambot, C., and Wintgens, J., Crop maintenance. Coffee: growing, processing, sustainable production. A guidebook for growers, processors, traders and researchers, 2009: p. 250-273. 230. Beer, J., Advantages, disadvantages and desirable characteristics of shade trees for coffee, cacao and tea. Agroforestry systems, 1987. 5(1): p. 3-13. 231. Mendonça, E. and Stott, D., Characteristics and decomposition rates of pruning residues from a shaded coffee system in Southeastern Brazil. Agroforestry systems, 2003. 57(2): p. 117-125. 232. Youkhana, A. and Idol, T., Tree pruning mulch increases soil C and N in a shaded coffee agroecosystem in Hawaii. Soil biology and Biochemistry, 2009. 41(12): p. 2527-2534. 233. Glover, N. and Beer, J., Nutrient cycling in two traditional Central American agroforestry systems. Agroforestry Systems, 1986. 4(2): p. 77-87. 234. Avelino, J., Willocquet, L., and Savary, S., Effects of crop management patterns on coffee rust epidemics. Plant pathology, 2004. 53(5): p. 541-547. 235. Beer, J., Litter production and nutrient cycling in coffee (Coffea arabica) or cacao (Theobroma cacao) plantations with shade trees. Agroforestry systems, 1988. 7(2): p. 103-114. 236. Budowski, G., Applicability of agroforestry systems. 1981: CATIE.

123

237. Fassbender, H., Nutrient cycling in agroforestry systems of coffee (Coffea arabica) with shade trees in the Central Experiment of CATIE. Advances in agroforestry research, 1987: p. 155- 165. 238. Jordan, C.F., Nutrient cycling in tropical forest ecosystems. 1985: John Wiley & Sons. 239. Zingore, S., Mafongoya, P., Nyamugafata, P., et al., Nitrogen mineralization and maize yields following application of tree prunings to a sandy soil in Zimbabwe. Agroforestry Systems, 2003. 57(3): p. 199-211. 240. Quinlan, M.M. and Universidad de Costa Rica, S.J.C., Turrialba, Mulches from two tropical tree species: Erythrina poeppigiana(Walpers) O. F. Cook and Gmelina arborea Rox. as nitrogen sources in the production of maize(Zea mays L.). 1984, University of Costa Rica. 241. Ewel, J.J., Litter fall and leaf decomposition in a tropical forest succession in eastern Guatemala. The Journal of Ecology, 1976: p. 293-308. 242. Fassbender, H.W., Alpizar, L., Heuveldop, J., et al., Modelling agroforestry systems of cacao (Theobroma cacao) with laurel (Cordia alliodora) and poro (Erythrina poeppigiana) in Costa Rica III. Cycles of organic matter and nutrients. Agroforestry Systems, 1988. 6(1): p. 49-62. 243. DaMatta, F.M. and Ramalho, J.D.C., Impacts of drought and temperature stress on coffee physiology and production: a review. Brazilian Journal of Plant Physiology, 2006. 18(1): p. 55- 81. 244. DaMatta, F.M., Exploring drought tolerance in coffee: a physiological approach with some insights for plant breeding. Brazilian Journal of Plant Physiology, 2004. 16(1): p. 1-6. 245. Descroix, F. and Snoeck, J., Environmental factors suitable for coffee cultivation. Coffee: Growing, Processing, Sustainable Production: A Guidebook for Growers, Processors, Traders, and Researchers, 2008: p. 164-177. 246. Carr, M., The water relations and irrigation requirements of coffee. Experimental Agriculture, 2001. 37(01): p. 1-36. 247. Silva, E.A.d., Mazzafera, P., Brunini, O., et al., The influence of water management and environmental conditions on the chemical composition and beverage quality of coffee beans. Brazilian Journal of Plant Physiology, 2005. 17(2): p. 229-238. 248. Schroth, G., Laderach, P., Dempewolf, J., et al., Towards a climate change adaptation strategy for coffee communities and ecosystems in the Sierra Madre de Chiapas, Mexico. Mitigation and Adaptation Strategies for Global Change, 2009. 14(7): p. 605-625. 249. Gay, C., Estrada, F., Conde, C., et al., Potential impacts of climate change on agriculture: A case of study of coffee production in Veracruz, Mexico. Climatic Change, 2006. 79(3-4): p. 259-288. 250. Chaves, M., Effects of water deficits on carbon assimilation. Journal of experimental Botany, 1991. 42(1): p. 1-16. 251. Lawlor, D., Integration of biochemical processes in the physiology of water stressed plants, in Effects of stress on photosynthesis. 1983, Springer. p. 35-44. 252. Baerenfaller, K., Massonnet, C., Walsh, S., et al., Systems‐based analysis of Arabidopsis leaf growth reveals adaptation to water deficit. Molecular systems biology, 2012. 8(1). 253. Petridis, A., Therios, I., Samouris, G., et al., Effect of water deficit on leaf phenolic composition, gas exchange, oxidative damage and antioxidant activity of four Greek olive (Olea europaea L.) cultivars. Plant Physiology and Biochemistry, 2012. 60: p. 1-11. 254. Lawlor, D., The effects of water deficit on photosynthesis. Environment and Plant Metabolism. Flexibility and Acclimation, 1995: p. 129-160. 255. Cornic, G., Drought stress and high light effects on leaf photosynthesis. Photoinhibition of photosynthesis, 1994. 17: p. 297-313. 256. Asada, K., The water-water cycle in chloroplasts: scavenging of active oxygens and dissipation of excess photons. Annual review of plant biology, 1999. 50(1): p. 601-639. 257. Jensen, R.G. and Bahr, J.T., Ribulose 1, 5-bisphosphate carboxylase-oxygenase. Annual Review of Plant Physiology, 1977. 28(1): p. 379-400.

124

258. Gutteridge, S. and Gatenby, A.A., Rubisco synthesis, assembly, mechanism, and regulation. The Plant Cell, 1995. 7(7): p. 809. 259. Dean, C., Pichersky, E., and Dunsmuir, P., Structure, evolution, and regulation of RbcS genes in higher plants. Annual review of plant biology, 1989. 40(1): p. 415-439. 260. Rizhsky, L., Liang, H., and Mittler, R., The combined effect of drought stress and heat shock on gene expression in tobacco. Plant Physiology, 2002. 130(3): p. 1143-1151. 261. Bartholomew, D.M., Bartley, G.E., and Scolnik, P.A., Abscisic acid control of rbcS and cab transcription in tomato leaves. Plant physiology, 1991. 96(1): p. 291-296. 262. Williams, J., Bulman, M., and Neill, S., Wilt‐induced ABA biosynthesis, gene expression and down‐regulation of rbcS mRNA levels in Arabidopsis thaliana. Physiologia Plantarum, 1994. 91(2): p. 177-182. 263. Vu, J.C., Gesch, R.W., Allen, L.H., et al., C0 2 Enrichment Delays a Rapid, Drought-Induced Decrease in Rubisco Small Subunit Transcript Abundance. Journal of plant physiology, 1999. 155(1): p. 139-142. 264. Hayano-Kanashiro, C., Calderón-Vázquez, C., Ibarra-Laclette, E., et al., Analysis of gene expression and physiological responses in three Mexican maize under drought stress and recovery irrigation. PLoS One, 2009. 4(10): p. e7531. 265. Meinzer, F., Saliendra, N., and Crisosto, C., Carbon isotope discrimination and gas exchange in Coffea arabica during adjustment to different soil moisture regimes. Functional Plant Biology, 1992. 19(2): p. 171-184. 266. DaMatta, F.M., Chaves, A.R., Pinheiro, H.A., et al., Drought tolerance of two field-grown clones of Coffea canephora. Plant Science, 2003. 164(1): p. 111-117. 267. Marraccini, P., Freire, L.P., Alves, G.S., et al., RBCS1 expression in coffee: Coffea orthologs, Coffea arabica homeologs, and expression variability between genotypes and under drought stress. BMC plant biology, 2011. 11(1): p. 85. 268. Cavatte, P.C., Oliveira, Á.A., Morais, L.E., et al., Could shading reduce the negative impacts of drought on coffee? A morphophysiological analysis. Physiologia Plantarum, 2012. 144(2): p. 111-122. 269. Bauer, H., Wierer, R., Hatheway, W., et al., Photosynthesis of Coffea arabica after chilling. Physiologia Plantarum, 1985. 64(4): p. 449-454. 270. Da Matta, F.M., Maestri, M., Mosquim, P.R., et al., Photosynthesis in coffee (Coffea arabica and C. canephora) as affected by winter and summer conditions. Plant Science, 1997. 128(1): p. 43-50. 271. Ramalho, J., Quartin, V., Leitão, E., et al., Cold acclimation ability and photosynthesis among species of the tropical Coffea genus. Plant Biology, 2003. 5(6): p. 631-641. 272. Fortunato, A.S., Lidon, F.C., Batista-Santos, P., et al., Biochemical and molecular characterization of the antioxidative system of Coffea sp. under cold conditions in genotypes with contrasting tolerance. Journal of plant physiology, 2010. 167(5): p. 333-342. 273. Kratsch, H. and Wise, R., The ultrastructure of chilling stress. Plant, Cell & Environment, 2000. 23(4): p. 337-350. 274. Dubey, R.S., Photosynthesis in plants under stressful conditions. Handbook of photosynthesis, 1997. 2: p. 717-737. 275. Chinnusamy, V., Zhu, J., and Zhu, J.-K., Cold stress regulation of gene expression in plants. Trends in plant science, 2007. 12(10): p. 444-451. 276. Harwood, J.L., 6 Plant Lipid Metabolism. Plant biochemistry, 1997: p. 237. 277. Routaboul, J.-M. and Fischer, S.F., Trienoic fatty acids are required to maintain chloroplast function at low temperatures. Plant Physiology, 2000. 124(4): p. 1697-1705. 278. Campos, P.S., Quartin, V., Cochicho Ramalho, J., et al., Electrolyte leakage and lipid degradation account for cold sensitivity in leaves ofCoffea sp. plants. Journal of plant physiology, 2003. 160(3): p. 283-292.

125

279. Örvar, B.L., Sangwan, V., Omann, F., et al., Early steps in cold sensing by plant cells: the role of actin cytoskeleton and membrane fluidity. The Plant Journal, 2000. 23(6): p. 785-794. 280. Lee, J.-Y. and Lee, D.-H., Use of serial analysis of gene expression technology to reveal changes in gene expression in Arabidopsis pollen undergoing cold stress. Plant Physiology, 2003. 132(2): p. 517-529. 281. Kreps, J.A., Wu, Y., Chang, H.-S., et al., Transcriptome changes for Arabidopsis in response to salt, osmotic, and cold stress. Plant Physiology, 2002. 130(4): p. 2129-2141. 282. Hsieh, T.-H., Lee, J.-T., Yang, P.-T., et al., Heterology Expression of the ArabidopsisC- Repeat/Dehydration Response Element Binding Factor 1 Gene Confers Elevated Tolerance to Chilling and Oxidative Stresses in Transgenic Tomato. Plant Physiology, 2002. 129(3): p. 1086-1094. 283. Oh, S.-J., Song, S.I., Kim, Y.S., et al., Arabidopsis CBF3/DREB1A and ABF3 in transgenic rice increased tolerance to abiotic stress without stunting growth. Plant Physiology, 2005. 138(1): p. 341-351. 284. Yeh, S., Moffatt, B., Griffith, M., et al., Chitinase genes responsible to cold encode antifreeze proteins in winter cereals Plant Physiol, 2000. 124: p. 1251-1263. 285. Kasprzewska, A., Plant chitinases-regulation and function. Cellular and Molecular Biology Letters, 2003. 8(3): p. 809-824. 286. Santos, P., Machado, E., Gouveia, M., et al. Gene expression analysis of Coffeaspp. exposed to biotic and abiotic stress. in Book of Abstracts of the XIV Congresso Nacional de Bioquímica, PB. 2004. 287. Wahid, A., Gelani, S., Ashraf, M., et al., Heat tolerance in plants: an overview. Environmental and Experimental Botany, 2007. 61(3): p. 199-223. 288. Yang, K.A., Lim, C.J., Hong, J.K., et al., Identification of cell wall genes modified by a permissive high temperature in Chinese cabbage. Plant science, 2006. 171(1): p. 175-182. 289. Kotak, S., Larkindale, J., Lee, U., et al., Complexity of the heat stress response in plants. Current opinion in plant biology, 2007. 10(3): p. 310-316. 290. Jones, P.D., New, M., Parker, D.E., et al., Surface air temperature and its changes over the past 150 years. Reviews of Geophysics, 1999. 37(2): p. 173-199. 291. Drinnan, J. and Menzel, C., Temperature affects vegetative growth and flowering of coffee (Coffea arabica L.). Journal of Horticultural Science, 1995. 70(1): p. 25-34. 292. dos Santos, T.B., Budzinski, I.G., Marur, C.J., et al., Expression of three galactinol synthase isoforms in Coffea arabica L. and accumulation of raffinose and stachyose in response to abiotic stresses. Plant Physiology and Biochemistry, 2011. 49(4): p. 441-448. 293. Lima, R.B., dos Santos, T.B., Vieira, L.G.E., et al., Heat stress causes alterations in the cell-wall polymers and anatomy of coffee leaves (Coffea arabica L.). Carbohydrate Polymers, 2013. 93(1): p. 135-143. 294. Praxedes, S.C., DaMatta, F.M., Loureiro, M.E., et al., Effects of long-term soil drought on photosynthesis and carbohydrate metabolism in mature robusta coffee (Coffea canephora Pierre var. kouillou) leaves. Environmental and Experimental Botany, 2006. 56(3): p. 263-273. 295. DaMatta, F.M., Loos, R.A., Silva, E.A., et al., Effects of soil water deficit and nitrogen nutrition on water relations and photosynthesis of pot-grown Coffea canephora Pierre. Trees, 2002. 16(8): p. 555-558. 296. Grishkevich, V., Ben‐Elazar, S., Hashimshony, T., et al., A genomic bias for genotype– environment interactions in C. elegans. Molecular systems biology, 2012. 8(1). 297. Bohnert, H.J., Nelson, D.E., and Jensen, R.G., Adaptations to environmental stresses. The plant cell, 1995. 7(7): p. 1099. 298. Wang, Z., Gerstein, M., and Snyder, M., RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics, 2009. 10(1): p. 57-63. 299. Mutz, K.O., Heilkenbrinker, A., Lonne, M., et al., Transcriptome analysis using next- generation sequencing. Curr Opin Biotechnol, 2013. 24(1): p. 22-30.

126

300. Clark, T.A., Sugnet, C.W., and Ares, M., Genomewide analysis of mRNA processing in yeast using splicing-specific microarrays. Science, 2002. 296(5569): p. 907-910. 301. David, L., Huber, W., Granovskaia, M., et al., A high-resolution map of transcription in the yeast genome. Proceedings of the National Academy of Sciences, 2006. 103(14): p. 5320- 5325. 302. Yamada, K., Lim, J., Dale, J.M., et al., Empirical analysis of transcriptional activity in the Arabidopsis genome. Science, 2003. 302(5646): p. 842-846. 303. Bertone, P., Stolc, V., Royce, T.E., et al., Global identification of human transcribed sequences with genome tiling arrays. Science, 2004. 306(5705): p. 2242-2246. 304. Cheng, J., Kapranov, P., Drenkow, J., et al., Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science, 2005. 308(5725): p. 1149-1154. 305. Okoniewski, M.J. and Miller, C.J., Hybridization interactions between probesets in short oligo microarrays lead to spurious correlations. BMC bioinformatics, 2006. 7(1): p. 276. 306. Royce, T.E., Rozowsky, J.S., and Gerstein, M.B., Toward a universal microarray: prediction of gene expression through nearest-neighbor probe sequence identification. Nucleic acids research, 2007. 35(15): p. e99. 307. Velculescu, V.E., Zhang, L., Vogelstein, B., et al., Serial analysis of gene expression. Science, 1995. 270(5235): p. 484-487. 308. Harbers, M. and Carninci, P., Tag-based approaches for transcriptome research and genome annotation. Nature methods, 2005. 2(7): p. 495-502. 309. Kodzius, R., Kojima, M., Nishiyori, H., et al., CAGE: cap analysis of gene expression. Nature methods, 2006. 3(3): p. 211-222. 310. Shiraki, T., Kondo, S., Katayama, S., et al., Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage. Proceedings of the National Academy of Sciences, 2003. 100(26): p. 15776-15781. 311. Brenner, S., Johnson, M., Bridgham, J., et al., Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nature biotechnology, 2000. 18(6): p. 630-634. 312. Peiffer, J.A., Kaushik, S., Sakai, H., et al., A spatial dissection of the Arabidopsis floral transcriptome by MPSS. BMC plant biology, 2008. 8(1): p. 43. 313. Reinartz, J., Bruyns, E., Lin, J.-Z., et al., Massively parallel signature sequencing (MPSS) as a tool for in-depth quantitative gene expression profiling in all organisms. Briefings in functional genomics & proteomics, 2002. 1(1): p. 95-104. 314. Vera, J.C., Wheat, C.W., Fescemyer, H.W., et al., Rapid transcriptome characterization for a nonmodel organism using 454 pyrosequencing. Molecular ecology, 2008. 17(7): p. 1636-1647. 315. Morin, R.D., Bainbridge, M., Fejes, A., et al., Profiling the HeLa S3 transcriptome using randomly primed cDNA and massively parallel short-read sequencing. Biotechniques, 2008. 45(1): p. 81. 316. Cloonan, N., Forrest, A.R.R., Kolle, G., et al., Stem cell transcriptome profiling via massive- scale mRNA sequencing. Nature methods, 2008. 5(7): p. 613-619. 317. Nagalakshmi, U., Wang, Z., Waern, K., et al., The transcriptional landscape of the yeast genome defined by RNA sequencing. Science, 2008. 320(5881): p. 1344-1349. 318. Mardis, E.R., Next-generation DNA sequencing methods. Annu Rev Genomics Hum Genet, 2008. 9: p. 387-402. 319. Quail, M.A., Smith, M., Coupland, P., et al., A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC genomics, 2012. 13(1): p. 341. 320. Minoche, A.E., Dohm, J.C., and Himmelbauer, H., Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and genome analyzer systems. Genome Biol, 2011. 12(11): p. R112.

127

321. Reuter, J.A., Spacek, D.V., and Snyder, M.P., High-Throughput Sequencing Technologies. Molecular cell, 2015. 58(4): p. 586-597. 322. van Dijk, E.L., Auger, H., Jaszczyszyn, Y., et al., Ten years of next-generation sequencing technology. Trends in genetics, 2014. 30(9): p. 418-426. 323. Kamenetsky, R., Faigenboim, A., Mayer, E.S., et al., Integrated transcriptome catalogue and organ-specific profiling of gene expression in fertile garlic (Allium sativum L.). BMC Genomics, 2015. 16(1): p. 12. 324. Wibberg, D., Rupp, O., Jelonek, L., et al., Improved genome sequence of the phytopathogenic fungus Rhizoctonia solani AG1-IB 7/3/14 as established by deep mate-pair sequencing on the MiSeq (Illumina) system. Journal of biotechnology, 2015. 203: p. 19-21. 325. Martínez-López, L.A., Ochoa-Alejo, N., and Martínez, O., Dynamics of the chili pepper transcriptome during fruit development. BMC genomics, 2014. 15(1): p. 143. 326. Schmitt, M.W., Fox, E.J., Prindle, M.J., et al., Sequencing small genomic targets with high efficiency and extreme accuracy. Nature methods, 2015. 12(5): p. 423-425. 327. Xu, Z., Peters, R.J., Weirather, J., et al., Full‐length transcriptome sequences and splice variants obtained by a combination of sequencing platforms applied to different root tissues of Salvia miltiorrhiza and tanshinone biosynthesis. The Plant Journal, 2015. 328. Balasubramanian, S., Solexa Sequencing: Decoding Genomes on a Population Scale. Clinical chemistry, 2015. 61(1): p. 21-24. 329. Holmfeldt, L., Diaz-Flores, E., Zhang, J., et al., Integrated genomic analysis of hypodiploid acute lymphoblastic leukemia. Cancer Research, 2012. 72(8 Supplement): p. 4870-4870. 330. Carugati, L., Corinaldesi, C., Dell'Anno, A., et al., Metagenetic tools for the census of marine meiofaunal biodiversity: an overview. Marine genomics, 2015. 24: p. 11-20. 331. McCutcheon, J.N. and Giaccone, G., Next-Generation Sequencing: Targeting Targeted Therapies. Clinical Cancer Research, 2015: p. clincanres. 0407.2015. 332. Mangul, S., Caciula, A., Al Seesi, S., et al., Transcriptome assembly and quantification from Ion Torrent RNA-Seq data. BMC genomics, 2014. 15(Suppl 5): p. S7. 333. Pukk, L., Ahmad, F., Hasan, S., et al., Less is more: extreme genome complexity reduction with ddRAD using Ion Torrent semiconductor technology. Molecular ecology resources, 2015. 334. Wielstra, B., Duijm, E., Lagler, P., et al., Parallel tagged amplicon sequencing of transcriptome‐based genetic markers for Triturus newts with the Ion Torrent next‐ generation sequencing platform. Molecular ecology resources, 2014. 14(5): p. 1080-1089. 335. Recknagel, H., Jacobs, A., Herzyk, P., et al., Double‐digest RAD sequencing using Ion Proton semiconductor platform (ddRADseq‐ion) with nonmodel organisms. Molecular ecology resources, 2015. 336. Chaisson, M.J.P., Huddleston, J., Dennis, M.Y., et al., Resolving the complexity of the human genome using single-molecule sequencing. Nature, 2015. 517(7536): p. 608-611. 337. Huddleston, J., Ranade, S., Malig, M., et al., Reconstructing complex regions of genomes using long-read sequencing technology. Genome research, 2014. 24(4): p. 688-696. 338. Kronenberg, Z.N., Fiddes, I.T., Gordon, D., et al., High-resolution comparative analysis of great ape genomes. Science, 2018. 360(6393): p. eaar6343. 339. Kosicki, M., Tomberg, K., and Bradley, A., Repair of double-strand breaks induced by CRISPR– Cas9 leads to large deletions and complex rearrangements. Nature biotechnology, 2018. 340. Breaux, B., Hunter, M.E., Cruz-Schneider, M.P., et al., The Florida manatee (Trichechus manatus latirostris) T cell receptor loci exhibit V subgroup synteny and chain-specific evolution. Developmental & Comparative Immunology, 2018. 85: p. 71-85. 341. Cheng, B., Furtado, A., and Henry, R.J., Long-read sequencing of the coffee bean transcriptome reveals the diversity of full-length transcripts. Gigascience, 2017. 342. Tedersoo, L., Tooming‐Klunderud, A., and Anslan, S., PacBio metabarcoding of Fungi and other eukaryotes: errors, biases and perspectives. New Phytologist, 2018. 217(3): p. 1370- 1385.

128

343. Wenger, A., Hickey, L., Chin, J., et al., Structural variant detection with low-coverage PacBio sequencing. Nature. 517(7536): p. 608-611. 344. Michael, T.P., Jupe, F., Bemm, F., et al., High contiguity Arabidopsis thaliana genome assembly with a single nanopore flow cell. Nature communications, 2018. 9(1): p. 541. 345. Ashton, P.M., Nair, S., Dallman, T., et al., MinION nanopore sequencing identifies the position and structure of a bacterial antibiotic resistance island. Nature biotechnology, 2014. 346. Jain, M., Fiddes, I.T., Miga, K.H., et al., Improved data analysis for the MinION nanopore sequencer. Nature methods, 2015. 347. Zhou, X., Ren, L., Meng, Q., et al., The next-generation sequencing technology and application. Protein & cell, 2010. 1(6): p. 520-536. 348. Nowrousian, M., Next-generation sequencing techniques for eukaryotic microorganisms: sequencing-based solutions to biological problems. Eukaryotic cell, 2010. 9(9): p. 1300-1310. 349. Kohlmann, A., Grossmann, V., and Haferlach, T. Integration of next-generation sequencing into clinical practice: are we there yet? in Seminars in oncology. 2012. Elsevier. 350. Cullum, R., Alder, O., and Hoodless, P.A., The next generation: using new sequencing technologies to analyse gene regulation. Respirology, 2011. 16(2): p. 210-222. 351. Gasser, T., Hardy, J., and Mizuno, Y., Milestones in PD genetics. Movement Disorders, 2011. 26(6): p. 1042-1048. 352. Vilariño-Güell, C., Wider, C., Ross, O.A., et al., VPS35 mutations in Parkinson disease. The American Journal of Human Genetics, 2011. 89(1): p. 162-167. 353. Morozova, O. and Marra, M.A., Applications of next-generation sequencing technologies in functional genomics. Genomics, 2008. 92(5): p. 255-264. 354. Yuyama, P.M. RNA-Seq Analysis and De Novo Transcriptome Assembly Of Coffea arabica and Coffea eugenioides. in Plant and Animal Genome XXI Conference. 2013. Plant and Animal Genome. 355. Janzen, S.O., Comprehensive Natural Products II Chemistry and Biology Chemistry of Coffee. 2010, Elsevier Applied Science. 356. Clarke, R. and Macrae, R., Coffee chemistry (Vol. 1). Essex, England: Elsevier Science, 1985. 306. 357. Dart, S. and Nursten, H., Volatile components, in Coffee. 1985, Springer. p. 223-265. 358. Abid, M., Jabbar, S., Wu, T., et al., Sonication enhances polyphenolic compounds, sugars, carotenoids and mineral elements of apple juice. Ultrasonics sonochemistry, 2014. 21(1): p. 93-97. 359. Ma, C., Sun, Z., Chen, C., et al., Simultaneous separation and determination of fructose, sorbitol, glucose and sucrose in fruits by HPLC–ELSD. Food chemistry, 2014. 145: p. 784-788. 360. ISO, Sensory analysis––Methodology––Paired comparison test ISO. 2002. 361. Prescott, J., Norris, L., Kunst, M., et al., Estimating a “consumer rejection threshold” for cork taint in white wine. Food Quality and Preference, 2005. 16(4): p. 345-349. 362. Pride’Tangerines, M.U.E. Handling & Processing Section. in Proc. Fla. State Hort. Soc. 2012. 363. Bhumiratana, N., Adhikari, K., and Chambers IV, E., Evolution of sensory aroma attributes from coffee beans to brewed coffee. LWT-Food Science and Technology, 2011. 44(10): p. 2185-2192. 364. Odeny, D., Chemining’wa, G., and Shibairo, S. Beverage quality and biochemical components of shaded coffee. in 25th International Conference of Coffee Science. 2014. 365. Wintgens, J.N., Coffee: growing, processing, sustainable production. A guidebook for growers, processors, traders and researchers. 2009: Wiley-Vch. 366. Yoo, M.-J., Liu, X., Pires, J.C., et al., Nonadditive gene expression in polyploids. Annual review of genetics, 2014. 48: p. 485-517. 367. Adams, K.L., Cronn, R., Percifield, R., et al., Genes duplicated by polyploidy show unequal contributions to the transcriptome and organ-specific reciprocal silencing. Proceedings of the National Academy of sciences, 2003. 100(8): p. 4649-4654.

129

368. Levasseur, A. and Pontarotti, P., The role of duplications in the evolution of genomes highlights the need for evolutionary-based approaches in comparative genomics. Biology direct, 2011. 6(1): p. 11. 369. Wang, B., Tseng, E., Regulski, M., et al., Unveiling the complexity of the maize transcriptome by single-molecule long-read sequencing. Nature Communications, 2016. 7. 370. Abdel-Ghany, S.E., Hamilton, M., Jacobi, J.L., et al., A survey of the sorghum transcriptome using single-molecule long reads. Nature Communications, 2016. 7. 371. Bicknell, A.A., Cenik, C., Chua, H.N., et al., Introns in UTRs: why we should stop ignoring them. Bioessays, 2012. 34(12): p. 1025-1034. 372. Mignone, F., Gissi, C., Liuni, S., et al., Untranslated regions of mRNAs. Genome biology, 2002. 3(3): p. reviews0004. 1. 373. Van Veen, H., Vashisht, D., Akman, M., et al., Transcriptomes of eight Arabidopsis thaliana accessions reveal core conserved, genotype-and organ-specific responses to flooding stress. Plant physiology, 2016: p. pp. 00472.2016. 374. Garg, R., Shankar, R., Thakkar, B., et al., Transcriptome analyses reveal genotype-and developmental stage-specific molecular responses to drought and salinity stresses in chickpea. Scientific reports, 2016. 6. 375. Grabherr, M.G., Haas, B.J., Yassour, M., et al., Full-length transcriptome assembly from RNA- Seq data without a reference genome. Nature biotechnology, 2011. 29(7): p. 644-652. 376. Wang, X.-W., Luan, J.-B., Li, J.-M., et al., De novo characterization of a whitefly transcriptome and analysis of its gene expression during development. BMC genomics, 2010. 11(1): p. 400. 377. Li, P., Ponnala, L., Gandotra, N., et al., The developmental dynamics of the maize leaf transcriptome. Nature genetics, 2010. 42(12): p. 1060-1067. 378. Michael, T.P. and VanBuren, R., Progress, challenges and the future of crop genomes. Current opinion in plant biology, 2015. 24: p. 71-81. 379. Furtado, A., RNA extraction from developing or mature wheat seeds, in Cereal Genomics. 2014, Springer. p. 23-28. 380. PacificBiosciences. RS_IsoSeq (v2.3) Tutorial 2. 2. Isoform level clustering (ICE and Quiver) 2015; Available from: https://github.com/PacificBiosciences/cDNA_primer/wiki/RS_IsoSeq- %28v2.3%29-Tutorial-%232.-Isoform-level-clustering-%28ICE-and-Quiver%29. 381. Fu, L., Niu, B., Zhu, Z., et al., CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics, 2012. 28(23): p. 3150-3152. 382. Afgan, E., Sloggett, C., Goonasekera, N., et al., Genomics Virtual Laboratory: a practical bioinformatics workbench for the cloud. PloS one, 2015. 10(10): p. e0140829. 383. Mondego, J.M., Vidal, R.O., Carazzolle, M.F., et al., An EST-based analysis identifies new genes and reveals distinctive gene expression features of Coffea arabica and Coffea canephora. BMC plant biology, 2011. 11(1): p. 1. 384. Dereeper, A., Bocs, S., Rouard, M., et al., The coffee genome hub: a resource for coffee genomes. Nucleic acids research, 2015. 43(D1): p. D1028-D1035. 385. Götz, S., García-Gómez, J.M., Terol, J., et al., High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic acids research, 2008. 36(10): p. 3420-3435. 386. Kearse, M., Moir, R., Wilson, A., et al., Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics, 2012. 28(12): p. 1647-1649. 387. Bailey, T.L., Boden, M., Buske, F.A., et al., MEME SUITE: tools for motif discovery and searching. Nucleic acids research, 2009: p. gkp335. 388. Grillo, G., Turi, A., Licciulli, F., et al., UTRdb and UTRsite (RELEASE 2010): a collection of sequences and regulatory motifs of the untranslated regions of eukaryotic mRNAs. Nucleic acids research, 2010. 38(suppl 1): p. D75-D80.

130

389. Yuyama, P.M., Júnior, O.R., Ivamoto, S.T., et al., Transcriptome analysis in Coffea eugenioides, an Arabica coffee ancestor, reveals differentially expressed genes in leaves and fruits. Molecular Genetics and Genomics, 2016. 291(1): p. 323-336. 390. Nawrocki, E.P., Burge, S.W., Bateman, A., et al., Rfam 12.0: updates to the RNA families database. Nucleic acids research, 2014: p. gku1063. 391. Cheng, B., Furtado, A., Smyth, H.E., et al., Influence of genotype and environment on coffee quality. Trends in Food Science & Technology, 2016. 57: p. 20-30. 392. Giovannoni, J., Nguyen, C., Ampofo, B., et al., The Epigenome and Transcriptional Dynamics of Fruit Ripening. Annual Review of Plant Biology, 2017. 68: p. 61-84. 393. Li, L., Cheng, Z., Ma, Y., et al., The Association of Hormone Signaling Genes, Transcription, and Changes in Shoot Anatomy during Moso Bamboo Growth. Plant Biotechnology Journal, 2017. 394. Ivamoto, S.T., Júnior, O.R., Domingues, D.S., et al., Transcriptome Analysis of Leaves, Flowers and Fruits Perisperm of Coffea arabica L. Reveals the Differential Expression of Genes Involved in Raffinose Biosynthesis. PloS one, 2017. 12(1): p. e0169595. 395. Conesa, A. and Götz, S., Blast2GO: A comprehensive suite for functional analysis in plant genomics. International journal of plant genomics, 2008. 2008. 396. Kanehisa, M., Goto, S., Hattori, M., et al., From genomics to chemical genomics: new developments in KEGG. Nucleic acids research, 2006. 34(suppl_1): p. D354-D357. 397. Bioinformatics & Evolutionary Genomics, G.U., Belgium. Calculate and draw custom Venn diagrams. Available from: http://bioinformatics.psb.ugent.be/webtools/Venn/. 398. Thimm, O., Bläsing, O., Gibon, Y., et al., mapman: a user‐driven tool to display genomics data sets onto diagrams of metabolic pathways and other biological processes. The Plant Journal, 2004. 37(6): p. 914-939. 399. Lohse, M., Nagel, A., Herter, T., et al., Mercator: a fast and simple web server for genome scale functional annotation of plant sequence data. Plant, cell & environment, 2014. 37(5): p. 1250-1258. 400. Franz, M., Lopes, C.T., Huck, G., et al., Cytoscape. js: a graph theory library for visualisation and analysis. Bioinformatics, 2015. 32(2): p. 309-311. 401. Cheng, B., Furtado, A., and Henry, R.J., The coffee bean transcriptome explains the accumulation of the major bean components through ripening. Scientific reports, 2018. 8. 402. De Castro, R., Estanislau, W., De Mesquita, P., et al., A functional model of coffee (Coffea arabica) seed development. Salamanca, 2002. 403. Wormer, T. and Njuguna, S., Bean size and shape as quality factors in Kenya Coffee. Kenya Coffee, 1966. 31: p. 397-405. 404. De Castro, R.D. and Marraccini, P., Cytology, biochemistry and molecular changes during coffee fruit development. Brazilian Journal of Plant Physiology, 2006. 18(1): p. 175-199. 405. Sant’Ana, G.C., Pereira, L.F., Pot, D., et al., Genome-wide association study reveals candidate genes influencing lipids and diterpenes contents in Coffea arabica L. Scientific reports, 2018. 8(1): p. 465. 406. Redgwell, R. and Fischer, M., Coffee carbohydrates. Brazilian Journal of Plant Physiology, 2006. 18(1): p. 165-174. 407. Marraccini, P., Rogers, W.J., Caillet, V., et al., Biochemical and molecular characterization of α-D-galactosidase from coffee beans. Plant Physiology and Biochemistry, 2005. 43(10): p. 909-920. 408. Buckeridge, M.S., Seed cell wall storage polysaccharides: models to understand cell wall biosynthesis and degradation. Plant Physiology, 2010. 154(3): p. 1017-1023. 409. Morant, A.V., Jørgensen, K., Jørgensen, C., et al., β-Glucosidases as detonators of plant chemical defense. Phytochemistry, 2008. 69(9): p. 1795-1813.

131

410. Buckeridge, M.S. and Dietrich, S.M., Mobilisation of the raffinose family oligosaccharides and galactomannan in germinating seeds of Sesbania marginata Benth.(Leguminosae- Faboideae). Plant Science, 1996. 117(1-2): p. 33-43. 411. Orruno, E. and Morgan, M., Purification and characterisation of the 7S globulin storage protein from sesame (Sesamum indicum L.). Food Chemistry, 2007. 100(3): p. 926-934. 412. Rogers, W.J., Bézard, G., Deshayes, A., et al., Biochemical and molecular characterization and expression of the 11S-type storage protein from Coffea arabica endosperm. Plant Physiology and Biochemistry, 1999. 37(4): p. 261-272. 413. Kusaba, M., Miyahara, K., Iida, S., et al., Low glutelin content1: a dominant mutation that suppresses the glutelin multigene family via RNA silencing in rice. The Plant Cell, 2003. 15(6): p. 1455-1467. 414. Kroj, T., Savino, G., Valon, C., et al., Regulation of storage protein gene expression in Arabidopsis. Development, 2003. 130(24): p. 6065-6073. 415. Santos‐Mendoza, M., Dubreucq, B., Baud, S., et al., Deciphering gene regulatory networks that control seed development and maturation in Arabidopsis. The Plant Journal, 2008. 54(4): p. 608-620. 416. Eastmond, P.J., SUGAR-DEPENDENT1 encodes a patatin domain triacylglycerol lipase that initiates storage oil breakdown in germinating Arabidopsis seeds. The Plant Cell Online, 2006. 18(3): p. 665-675. 417. Redgwell, R.J., Curti, D., Fischer, M., et al., Coffee bean arabinogalactans: acidic polymers covalently linked to protein. Carbohydrate Research, 2002. 337(3): p. 239-253. 418. Nothnagel, E.A., Bacic, A., and Clarke, A.E., Cell and developmental biology of arabinogalactan-proteins. 2012: Springer Science & Business Media. 419. Jiang, L., Yang, S.-L., Xie, L.-F., et al., VANGUARD1 encodes a pectin methylesterase that enhances pollen tube growth in the Arabidopsis style and transmitting tract. The Plant Cell, 2005. 17(2): p. 584-596. 420. Micheli, F., Pectin methylesterases: cell wall enzymes with important roles in plant physiology. Trends in plant science, 2001. 6(9): p. 414-419. 421. Marín‐Rodríguez, M.C., Orchard, J., and Seymour, G.B., Pectate lyases, cell wall degradation and fruit softening. Journal of experimental botany, 2002. 53(377): p. 2115- 2119. 422. González-Carranza, Z.H., Elliott, K.A., and Roberts, J.A., Expression of polygalacturonases and evidence to support their role during cell separation processes in Arabidopsis thaliana. Journal of experimental botany, 2007. 58(13): p. 3719-3730. 423. Rose, J.K., Catalá, C., Gonzalez-Carranza, Z.H., et al., Cell wall disassembly. Annual Plant Reviews, 2003. 8: p. 264-324. 424. Ding, X., Cao, Y., Huang, L., et al., Activation of the indole-3-acetic acid–amido synthetase GH3-8 suppresses expansin expression and promotes salicylate-and jasmonate-independent basal immunity in rice. The Plant Cell, 2008. 20(1): p. 228-240. 425. Cosgrove, D.J., Plant expansins: diversity and interactions with plant cell walls. Current opinion in plant biology, 2015. 25: p. 162-172. 426. Cosgrove, D.J., Loosening of plant cell walls by expansins. Nature, 2000. 407(6802): p. 321- 326. 427. Urbaneja, A., Jacas, J., Verdú, M., et al., Dinámica e impacto de los parasitoides autóctonos de Phyllocnistis citrella Stainton, en la Comunidad Valenciana. Invest. Agric. Prod. Prot. Veg, 1998. 13: p. 787-796. 428. Nisius, A., The stromacentre inAvena plastids: An aggregation ofβ-glucosidase responsible for the activation of oat-leaf saponins. Planta, 1988. 173(4): p. 474-481. 429. Halkier, B.A. and Gershenzon, J., Biology and biochemistry of glucosinolates. Annu. Rev. Plant Biol., 2006. 57: p. 303-333.

132

430. McDowell, J.M., An, Y., Huang, S., et al., The Arabidopsis ACT7 actin gene is expressed in rapidly developing tissues and responds to several external stimuli. Plant physiology, 1996. 111(3): p. 699-711. 431. Ruan, Y.-L., Sucrose metabolism: gateway to diverse carbon use and sugar signaling. Annual review of plant biology, 2014. 65: p. 33-67. 432. Moat, J., Williams, J., Baena, S., et al., Resilience potential of the Ethiopian coffee sector under climate change. Nature plants, 2017. 3(7): p. 17081. 433. ICO. Coffee total production - crop year. 2017 [cited 2017; Available from: http://www.ico.org/new_historical.asp?section=Statistics. 434. Wuyts, N., Dhondt, S., and Inzé, D., Measurement of plant growth in view of an integrative analysis of regulatory networks. Current opinion in plant biology, 2015. 25: p. 90-97. 435. Digel, B., Pankin, A., and von Korff, M., Global transcriptome profiling of developing leaf and shoot apices reveals distinct genetic and environmental control of floral transition and inflorescence development in barley. The Plant Cell, 2015: p. tpc. 15.00203. 436. Dal Santo, S., Zenoni, S., Sandri, M., et al., Grapevine field experiments reveal the contribution of genotype, the influence of environment and the effect of their interaction (G× E) on the berry transcriptome. The Plant Journal, 2018. 93(6): p. 1143-1159. 437. Henry, R.J., Furtado, A., and Rangan, P., Wheat seed transcriptome reveals genes controlling key traits for human preference and crop adaptation. Current opinion in plant biology, 2018. 438. Karlova, R., Rosin, F.M., Busscher-Lange, J., et al., Transcriptome and metabolite profiling show that APETALA2a is a major regulator of tomato fruit ripening. The Plant Cell, 2011: p. tpc. 110.081273. 439. Komatsu, T., Kawaide, H., Saito, C., et al., The chloroplast protein BPG2 functions in brassinosteroid‐mediated post‐transcriptional accumulation of chloroplast rRNA. The Plant Journal, 2010. 61(3): p. 409-422. 440. Kim, B.-H., Malec, P., Waloszek, A., et al., Arabidopsis BPG2: a phytochrome-regulated gene whose protein product binds to plastid ribosomal RNAs. Planta, 2012. 236(2): p. 677-690. 441. Nakano, T., Asami, T., Yamagami, A., et al., Transformed plants or algae with highly expressed chloroplast protein BPG2. 2013, Google Patents. 442. Schultz, C.J. and Coruzzi, G.M., The aspartate aminotransferase gene family of Arabidopsis encodes isoenzymes localized to three distinct subcellular compartments. The Plant Journal, 1995. 7(1): p. 61-75. 443. Lee, C.P., Eubel, H., O'Toole, N., et al., Heterogeneity of the mitochondrial proteome for photosynthetic and non-photosynthetic Arabidopsis metabolism. Molecular & Cellular Proteomics, 2008. 7(7): p. 1297-1316. 444. Nomura, M., Higuchi, T., Katayama, K., et al., The promoter for C4-type mitochondrial aspartate aminotransferase does not direct bundle sheath-specific expression in transgenic rice plants. Plant and cell physiology, 2005. 46(5): p. 743-753. 445. He, W., Brumos, J., Li, H., et al., A small-molecule screen identifies L-kynurenine as a competitive inhibitor of TAA1/TAR activity in ethylene-directed auxin biosynthesis and root growth in Arabidopsis. The Plant Cell, 2011: p. tpc. 111.089029. 446. Tao, Y., Ferrer, J.-L., Ljung, K., et al., Rapid synthesis of auxin via a new tryptophan- dependent pathway is required for shade avoidance in plants. Cell, 2008. 133(1): p. 164-176. 447. Ljung, K., Auxin metabolism and homeostasis during plant development. Development, 2013. 140(5): p. 943-950. 448. Blanco, F., Garretón, V., Frey, N., et al., Identification of NPR1-dependent and independent genes early induced by salicylic acid treatment in Arabidopsis. Plant molecular biology, 2005. 59(6): p. 927-944. 449. Condit, C.M. and Meagher, R.B., Expression of a gene encoding a glycine-rich protein in petunia. Molecular and cellular biology, 1987. 7(12): p. 4273-4279.

133

450. Lu, Y.-Y. and Chen, C.-Y., Molecular analysis of lily leaves in response to salicylic acid effective towards protection against Botrytis elliptica. Plant Science, 2005. 169(1): p. 1-9. 451. Sachetto-Martins, G., Franco, L.O., and de Oliveira, D.E., Plant glycine-rich proteins: a family or just proteins with a common motif? Biochimica et Biophysica Acta (BBA)-Gene Structure and Expression, 2000. 1492(1): p. 1-14. 452. Zchut, S., Weiss, M., and Pick, U., Temperature-regulated expression of a glycine-rich RNA- binding protein in the halotolerant alga Dunaliella salina. Journal of plant physiology, 2003. 160(11): p. 1375. 453. Zhang, L., Chen, J., Li, Q., et al., Transcriptome-wide analysis of basic helix-loop-helix transcription factors in Isatis indigotica and their methyl jasmonate responsive expression profiling. Gene, 2016. 576(1): p. 150-159. 454. Kazan, K. and Manners, J.M., MYC2: the master in action. Molecular plant, 2013. 6(3): p. 686- 703. 455. Spadoni, A., Guidarelli, M., Phillips, J., et al., Transcriptional profiling of apple fruits in response to heat treatment: involvement of a defense response during P. expansum infection. Scienze e Tecnologie Agrarie, Ambientali e Alimentari: p. 93. 456. Zhang, Y., Yang, C., Li, Y., et al., SDIR1 is a RING finger E3 ligase that positively regulates stress-responsive abscisic acid signaling in Arabidopsis. The Plant Cell, 2007. 19(6): p. 1912- 1929. 457. Xia, Z., Liu, Q., Wu, J., et al., ZmRFP1, the putative ortholog of SDIR1, encodes a RING-H2 E3 ubiquitin ligase and responds to drought stress in an ABA-dependent manner in maize. Gene, 2012. 495(2): p. 146-153. 458. Zhang, H., Cui, F., Wu, Y., et al., The RING finger ubiquitin E3 ligase SDIR1 targets SDIR1- INTERACTING PROTEIN1 for degradation to modulate the salt stress response and ABA signaling in Arabidopsis. The Plant Cell, 2015: p. tpc. 114.134163. 459. Oyama, T., Shimura, Y., and Okada, K., The IRE gene encodes a protein kinase homologue and modulates root hair growth in Arabidopsis. The Plant Journal, 2002. 30(3): p. 289-299. 460. Ding, X., Hou, X., Xie, K., et al., Genome-wide identification of BURP domain-containing genes in rice reveals a gene family with diverse structures and responses to abiotic stresses. Planta, 2009. 230(1): p. 149-163. 461. Yamaguchi-Shinozaki, K. and Shinozaki, K., The plant hormone abscisic acid mediates the drought-induced expression but not the seed-specific expression of rd22, a gene responsive to dehydration stress in Arabidopsis thaliana. Molecular and General Genetics MGG, 1993. 238(1-2): p. 17-25. 462. Harshavardhan, V.T., Seiler, C., Junker, A., et al., AtRD22 and AtUSPL1, members of the plant- specific BURP domain family involved in Arabidopsis thaliana drought tolerance. PloS one, 2014. 9(10): p. e110065. 463. Guo, L., Chen, Y., Ye, N., et al., Differential retention and expansion of the ancestral genes associated with the paleopolyploidies in modern rosid plants, as revealed by analysis of the extensins super-gene family. BMC genomics, 2014. 15(1): p. 612. 464. Lorković, Z., Schröder, W., Pakrasi, H., et al., Molecular characterization of PsbW, a nuclear- encoded component of the photosystem II reaction center complex in spinach. Proceedings of the National Academy of Sciences, 1995. 92(19): p. 8930-8934. 465. Seidler, A., The extrinsic polypeptides of photosystem II. Biochimica et Biophysica Acta (BBA)- Bioenergetics, 1996. 1277(1-2): p. 35-60. 466. Förster, B., Mathesius, U., and Pogson, B.J., Comparative proteomics of high light stress in the model alga Chlamydomonas reinhardtii. Proteomics, 2006. 6(15): p. 4309-4320. 467. de Bianchi, S., Dall'Osto, L., Tognon, G., et al., Minor antenna proteins CP24 and CP26 affect the interactions between photosystem II subunits and the electron transport rate in grana membranes of Arabidopsis. The Plant Cell, 2008. 20(4): p. 1012-1028.

134

468. Kolomiets, M.V., Chen, H., Gladon, R.J., et al., A leaf lipoxygenase of potato induced specifically by pathogen infection. Plant physiology, 2000. 124(3): p. 1121-1130. 469. Guo, Y., Abernathy, B., Zeng, Y., et al., TILLING by sequencing to identify induced mutations in stress resistance genes of peanut (Arachis hypogaea). BMC genomics, 2015. 16(1): p. 157. 470. Rameshwari, R., Madhu, S., Prasad, V., et al., Computational Analysis of tuberization protein linoleate 9S-lipoxygenase 3 from Solanum tuberosum. International Journal of Chemtech Research, 2015. 8(10): p. 294-310. 471. Werner, T., Motyka, V., Laucou, V., et al., Cytokinin-deficient transgenic Arabidopsis plants show multiple developmental alterations indicating opposite functions of cytokinins in the regulation of shoot and root meristem activity. The Plant Cell, 2003. 15(11): p. 2532-2550. 472. White, M.F., Vasquez, J., Yang, S.F., et al., Expression of apple 1-aminocyclopropane-1- carboxylate synthase in Escherichia coli: kinetic characterization of wild-type and active-site mutant forms. Proceedings of the National Academy of Sciences, 1994. 91(26): p. 12428- 12432. 473. McCarthy, D.L., Capitani, G., Feng, L., et al., Glutamate 47 in 1-aminocyclopropane-1- carboxylate synthase is a major specificity determinant. Biochemistry, 2001. 40(41): p. 12276-12284. 474. Kiyosue, T., Yamaguchi-Shinozaki, K., and Shinozaki, K., Cloning of cDNAs for genes that are early-responsive to dehydration stress (ERDs) inArabidopsis thaliana L.: identification of three ERDs as HSP cognate genes. Plant molecular biology, 1994. 25(5): p. 791-798. 475. Jin, Y., Ni, D.-A., and Ruan, Y.-L., Posttranslational elevation of cell wall invertase activity by silencing its inhibitor in tomato delays leaf senescence and increases seed weight and fruit hexose level. The Plant Cell, 2009. 21(7): p. 2072-2089. 476. Lara, M.E.B., Garcia, M.-C.G., Fatima, T., et al., Extracellular invertase is an essential component of cytokinin-mediated delay of senescence. The Plant Cell, 2004. 16(5): p. 1276- 1287.

135

Appendices

Appendix 1 Differentially expressed genes related to photosynthesis.

Fold change UY LY vs LR vs UR vs PSII Seq ID vss Description LG LY UY UG (at4g05180 : 221.0) Encodes the PsbQ subunit of the oxygen evolving complex of photosystem II.; photosystem II subunit Q-2 (PSBQ-2); FUNCTIONS IN: calcium ion binding; INVOLVED IN: peptidyl-cysteine S-nitrosylation; LOCATED IN: in 9 components; EXPRESSED IN: 23 plant structures; EXPRESSED DURING: 14 growth PS.lightreaction.phot stages; CONTAINS InterPro DOMAIN/s: Photosystem II oxygen evolving complex protein PsbQ (InterPro:IPR008797); BEST Arabidopsis thaliana protein match is: 1.1.1.2 osystem II.PSII c154023/f2p3/937 -1.58 -2.29 photosystem II subunit QA (TAIR:AT4G21280.1); Has 206 Blast hits to 206 proteins in 40 species: Archae - 0; Bacteria - 0; Metazoa - 0; Fungi - 0; Plants - 200; Viruses - polypeptide subunits 0; Other Eukaryotes - 6 (source: NCBI BLink). & (p12301|psbq_spiol : 220.0) Oxygen-evolving enhancer protein 3, chloroplast precursor (OEE3) (16 kDa subunit of oxygen evolving system of photosystem II) (OEC 16 kDa subunit) - Spinacia oleracea (Spinach) & (loc_os07g36080.1 : 197.0) no description available & (ipr008797 : 188.28136) Photosystem II PsbQ, oxygen evolving complex & (reliability: 771.4924) & (original description: no original description) (q40459|psbo_tobac : 563.0) Oxygen-evolving enhancer protein 1, chloroplast precursor (OEE1) (33 kDa subunit of oxygen evolving system of photosystem II) (OEC 33 kDa subunit) (33 kDa thylakoid membrane protein) - Nicotiana tabacum (Common tobacco) & (at3g50820 : 520.0) Encodes a protein which is an extrinsic subunit of photosystem II and which has been proposed to play a central role in stabilization of the catalytic manganese cluster. In Arabidopsis thaliana the PsbO proteins are encoded by two genes: psbO1 and psbO2. PsbO2 is the minor isoform in the wild-type. Mutants defective in this gene have been shown to be affected in the PS.lightreaction.phot dephosphorylation of the D1 protein of PSII.; photosystem II subunit O-2 (PSBO2); FUNCTIONS IN: oxygen evolving activity, poly(U) RNA binding; INVOLVED IN: 1.1.1.2 osystem II.PSII c154915/f4p6/1272 -1.37 -1.99 photosynthesis, light reaction, photoinhibition, photosystem II assembly, photosystem II stabilization, regulation of protein amino acid dephosphorylation; LOCATED IN: in polypeptide subunits 9 components; EXPRESSED IN: 24 plant structures; EXPRESSED DURING: 14 growth stages; CONTAINS InterPro DOMAIN/s: Photosystem II manganese-stabilising protein PsbO (InterPro:IPR002628); BEST Arabidopsis thaliana protein match is: PS II oxygen-evolving complex 1 (TAIR:AT5G66570.1); Has 534 Blast hits to 532 proteins in 148 species: Archae - 0; Bacteria - 144; Metazoa - 1; Fungi - 0; Plants - 207; Viruses - 0; Other Eukaryotes - 182 (source: NCBI BLink). & (loc_os01g31690.1 : 518.0) no description available & (chl4|519678 : 308.0) no description available & (ipr002628 : 250.51176) Photosystem II PsbO, manganese-stabilising & (reliability: 1478.3956) & (original description: no original description) (q40459|psbo_tobac : 587.0) Oxygen-evolving enhancer protein 1, chloroplast precursor (OEE1) (33 kDa subunit of oxygen evolving system of photosystem II) (OEC 33 kDa subunit) (33 kDa thylakoid membrane protein) - Nicotiana tabacum (Common tobacco) & (loc_os01g31690.1 : 543.0) no description available & (at3g50820 : 538.0) Encodes a protein which is an extrinsic subunit of photosystem II and which has been proposed to play a central role in stabilization of the catalytic manganese cluster. In Arabidopsis thaliana the PsbO proteins are encoded by two genes: psbO1 and psbO2. PsbO2 is the minor isoform in the wild-type. Mutants defective PS.lightreaction.phot in this gene have been shown to be affected in the dephosphorylation of the D1 protein of PSII.; photosystem II subunit O-2 (PSBO2); FUNCTIONS IN: oxygen evolving 1.1.1.2 osystem II.PSII c52740/f6p8/1370 -1.19 activity, poly(U) RNA binding; INVOLVED IN: photosynthesis, light reaction, photoinhibition, photosystem II assembly, photosystem II stabilization, regulation of protein polypeptide subunits amino acid dephosphorylation; LOCATED IN: in 9 components; EXPRESSED IN: 24 plant structures; EXPRESSED DURING: 14 growth stages; CONTAINS InterPro DOMAIN/s: Photosystem II manganese-stabilising protein PsbO (InterPro:IPR002628); BEST Arabidopsis thaliana protein match is: PS II oxygen-evolving complex 1 (TAIR:AT5G66570.1); Has 534 Blast hits to 532 proteins in 148 species: Archae - 0; Bacteria - 144; Metazoa - 1; Fungi - 0; Plants - 207; Viruses - 0; Other Eukaryotes - 182 (source: NCBI BLink). & (chl4|519678 : 308.0) no description available & (ipr002628 : 251.09831) Photosystem II PsbO, manganese-stabilising & (reliability: 1515.4221) & (original description: no original description) (q41387|psbw_spiol : 103.0) Photosystem II reaction center W protein, chloroplast precursor (PSII 6.1 kDa protein) - Spinacia oleracea (Spinach) & (ipr009806 : 99.538795) Photosystem II PsbW, class 2 & (at2g30570 : 96.7) Encodes a protein similar to photosystem II reaction center subunit W.; photosystem II reaction center W PS.lightreaction.phot (PSBW); FUNCTIONS IN: molecular_function unknown; INVOLVED IN: biological_process unknown; LOCATED IN: chloroplast thylakoid membrane; EXPRESSED 1.1.1.2 osystem II.PSII c22480/f2p0/561 -3.12 -3.77 IN: 22 plant structures; EXPRESSED DURING: 13 growth stages; CONTAINS InterPro DOMAIN/s: Photosystem II protein PsbW, class 2 (InterPro:IPR009806); Has 77 polypeptide subunits Blast hits to 77 proteins in 26 species: Archae - 0; Bacteria - 0; Metazoa - 0; Fungi - 0; Plants - 74; Viruses - 0; Other Eukaryotes - 3 (source: NCBI BLink). & (reliability: 193.4) & (original description: no original description) (q9slq8|psbp_cucsa : 404.0) Oxygen-evolving enhancer protein 2, chloroplast precursor (OEE2) (23 kDa subunit of oxygen evolving system of photosystem II) (OEC 23 kDa subunit) (23 kDa thylakoid membrane protein) (OEC23) - Cucumis sativus (Cucumber) & (at1g06680 : 374.0) Encodes a 23 kD extrinsic protein that is part of PS.lightreaction.phot photosystem II and participates in the regulation of oxygen evolution.; photosystem II subunit P-1 (PSBP-1); FUNCTIONS IN: poly(U) RNA binding; INVOLVED IN: 1.1.1.2 osystem II.PSII c3592/f1p2/1131 -2.57 -3.12 photosynthesis, light reaction, defense response to bacterium, response to light intensity; LOCATED IN: in 9 components; EXPRESSED IN: 24 plant structures; polypeptide subunits EXPRESSED DURING: 15 growth stages; CONTAINS InterPro DOMAIN/s: Photosystem II oxygen evolving complex protein PsbP (InterPro:IPR002683), Mog1/PsbP/DUF1795, alpha/beta/alpha sandwich (InterPro:IPR016124), Mog1/PsbP, alpha/beta/alpha sandwich (InterPro:IPR016123); BEST Arabidopsis thaliana protein match is: photosystem II subunit P-2 (TAIR:AT2G30790.1); Has 30201 Blast hits to 17322 proteins in 780 species: Archae - 12; Bacteria - 1396; Metazoa - 17338; Fungi -

136

3422; Plants - 5037; Viruses - 0; Other Eukaryotes - 2996 (source: NCBI BLink). & (loc_os07g04840.1 : 357.0) no description available & (chl4|513916 : 229.0) no description available & (ipr016123 : 140.98532) Mog1/PsbP, alpha/beta/alpha sandwich & (reliability: 937.9625) & (original description: no original description) (q9slq8|psbp_cucsa : 408.0) Oxygen-evolving enhancer protein 2, chloroplast precursor (OEE2) (23 kDa subunit of oxygen evolving system of photosystem II) (OEC 23 kDa subunit) (23 kDa thylakoid membrane protein) (OEC23) - Cucumis sativus (Cucumber) & (at1g06680 : 375.0) Encodes a 23 kD extrinsic protein that is part of photosystem II and participates in the regulation of oxygen evolution.; photosystem II subunit P-1 (PSBP-1); FUNCTIONS IN: poly(U) RNA binding; INVOLVED IN: PS.lightreaction.phot photosynthesis, light reaction, defense response to bacterium, response to light intensity; LOCATED IN: in 9 components; EXPRESSED IN: 24 plant structures; 1.1.1.2 osystem II.PSII c30337/f1p2/1161 -2.53 -3.36 EXPRESSED DURING: 15 growth stages; CONTAINS InterPro DOMAIN/s: Photosystem II oxygen evolving complex protein PsbP (InterPro:IPR002683), polypeptide subunits Mog1/PsbP/DUF1795, alpha/beta/alpha sandwich (InterPro:IPR016124), Mog1/PsbP, alpha/beta/alpha sandwich (InterPro:IPR016123); BEST Arabidopsis thaliana protein match is: photosystem II subunit P-2 (TAIR:AT2G30790.1); Has 30201 Blast hits to 17322 proteins in 780 species: Archae - 12; Bacteria - 1396; Metazoa - 17338; Fungi - 3422; Plants - 5037; Viruses - 0; Other Eukaryotes - 2996 (source: NCBI BLink). & (loc_os07g04840.1 : 358.0) no description available & (chl4|513916 : 232.0) no description available & (ipr016123 : 144.96754) Mog1/PsbP, alpha/beta/alpha sandwich & (reliability: 945.58344) & (original description: no original description)

(at4g10340 : 305.0) photosystem II encoding the light-harvesting chlorophyll a/b binding protein CP26 of the antenna system of the photosynthetic apparatus; light harvesting complex of photosystem II 5 (LHCB5); FUNCTIONS IN: chlorophyll binding; INVOLVED IN: response to blue light, response to red light, response to far red light, photosynthesis, nonphotochemical quenching; LOCATED IN: in 9 components; EXPRESSED IN: 29 plant structures; EXPRESSED DURING: 14 growth stages; PS.lightreaction.phot CONTAINS InterPro DOMAIN/s: Chlorophyll A-B binding protein (InterPro:IPR001344); BEST Arabidopsis thaliana protein match is: chlorophyll A/B binding protein 1 1.1.1.1 c46280/f2p2/1097 -1.73 -2.75 osystem II.LHC-II (TAIR:AT1G29930.1); Has 2360 Blast hits to 2295 proteins in 226 species: Archae - 0; Bacteria - 0; Metazoa - 4; Fungi - 0; Plants - 2013; Viruses - 0; Other Eukaryotes - 343 (source: NCBI BLink). & (loc_os11g13890.1 : 294.0) no description available & (chl4|516517 : 159.0) no description available & (p15194|cb2b_pinsy : 113.0) Chlorophyll a-b binding protein type 2 member 1B, chloroplast precursor (Chlorophyll a-b binding protein type II 1B) (CAB) (LHCP) - Pinus sylvestris (Scots pine) & (reliability: 610.0) & (original description: no original description) (at4g10340 : 461.0) photosystem II encoding the light-harvesting chlorophyll a/b binding protein CP26 of the antenna system of the photosynthetic apparatus; light harvesting complex of photosystem II 5 (LHCB5); FUNCTIONS IN: chlorophyll binding; INVOLVED IN: response to blue light, response to red light, response to far red light, photosynthesis, nonphotochemical quenching; LOCATED IN: in 9 components; EXPRESSED IN: 29 plant structures; EXPRESSED DURING: 14 growth stages; PS.lightreaction.phot CONTAINS InterPro DOMAIN/s: Chlorophyll A-B binding protein (InterPro:IPR001344); BEST Arabidopsis thaliana protein match is: chlorophyll A/B binding protein 1 1.1.1.1 c3852/f4p9/1092 -2.62 -3.9 osystem II.LHC-II (TAIR:AT1G29930.1); Has 2360 Blast hits to 2295 proteins in 226 species: Archae - 0; Bacteria - 0; Metazoa - 4; Fungi - 0; Plants - 2013; Viruses - 0; Other Eukaryotes - 343 (source: NCBI BLink). & (loc_os11g13890.1 : 438.0) no description available & (chl4|516517 : 258.0) no description available & (p15194|cb2b_pinsy : 215.0) Chlorophyll a-b binding protein type 2 member 1B, chloroplast precursor (Chlorophyll a-b binding protein type II 1B) (CAB) (LHCP) - Pinus sylvestris (Scots pine) & (ipr022796 : 109.16311) Chlorophyll A-B binding protein & (reliability: 922.0) & (original description: no original description) (at5g54270 : 470.0) Lhcb3 protein is a component of the main light harvesting chlorophyll a/b-protein complex of Photosystem II (LHC II).; light-harvesting chlorophyll B- binding protein 3 (LHCB3); FUNCTIONS IN: structural molecule activity; INVOLVED IN: photosynthesis; LOCATED IN: light-harvesting complex, thylakoid, chloroplast thylakoid membrane, chloroplast; EXPRESSED IN: 24 plant structures; EXPRESSED DURING: 14 growth stages; CONTAINS InterPro DOMAIN/s: PS.lightreaction.phot Chlorophyll A-B binding protein (InterPro:IPR001344); BEST Arabidopsis thaliana protein match is: photosystem II light harvesting complex gene 2.1 1.1.1.1 c1017/f3p0/1005 -1.99 -2.52 osystem II.LHC-II (TAIR:AT2G05100.1); Has 1807 Blast hits to 1807 proteins in 277 species: Archae - 0; Bacteria - 0; Metazoa - 736; Fungi - 347; Plants - 385; Viruses - 0; Other Eukaryotes - 339 (source: NCBI BLink). & (loc_os07g37550.1 : 462.0) no description available & (p27523|cb23_horvu : 457.0) Chlorophyll a-b binding protein of LHCII type III, chloroplast precursor (CAB) - Hordeum vulgare (Barley) & (chl4|520113 : 338.0) no description available & (ipr022796 : 115.60729) Chlorophyll A-B binding protein & (reliability: 940.0) & (original description: no original description) (at5g54270 : 470.0) Lhcb3 protein is a component of the main light harvesting chlorophyll a/b-protein complex of Photosystem II (LHC II).; light-harvesting chlorophyll B- binding protein 3 (LHCB3); FUNCTIONS IN: structural molecule activity; INVOLVED IN: photosynthesis; LOCATED IN: light-harvesting complex, thylakoid, chloroplast thylakoid membrane, chloroplast; EXPRESSED IN: 24 plant structures; EXPRESSED DURING: 14 growth stages; CONTAINS InterPro DOMAIN/s: PS.lightreaction.phot Chlorophyll A-B binding protein (InterPro:IPR001344); BEST Arabidopsis thaliana protein match is: photosystem II light harvesting complex gene 2.1 1.1.1.1 c40273/f5p0/941 -2.52 -3.21 osystem II.LHC-II (TAIR:AT2G05100.1); Has 1807 Blast hits to 1807 proteins in 277 species: Archae - 0; Bacteria - 0; Metazoa - 736; Fungi - 347; Plants - 385; Viruses - 0; Other Eukaryotes - 339 (source: NCBI BLink). & (loc_os07g37550.1 : 462.0) no description available & (p27523|cb23_horvu : 457.0) Chlorophyll a-b binding protein of LHCII type III, chloroplast precursor (CAB) - Hordeum vulgare (Barley) & (chl4|520113 : 338.0) no description available & (ipr022796 : 115.76413) Chlorophyll A-B binding protein & (reliability: 940.0) & (original description: no original description) (loc_os07g37240.1 : 409.0) no description available & (at5g01530 : 405.0) light harvesting complex photosystem II (LHCB4.1); FUNCTIONS IN: chlorophyll binding; INVOLVED IN: response to blue light, response to red light, response to far red light, photosynthesis; LOCATED IN: in 6 components; EXPRESSED IN: 27 plant structures; EXPRESSED DURING: 14 growth stages; CONTAINS InterPro DOMAIN/s: Chlorophyll A-B binding protein (InterPro:IPR001344); BEST Arabidopsis PS.lightreaction.phot 1.1.1.1 c153232/f1p5/1220 -2.44 -3.09 thaliana protein match is: light harvesting complex photosystem II (TAIR:AT3G08940.2); Has 1807 Blast hits to 1807 proteins in 277 species: Archae - 0; Bacteria - 0; osystem II.LHC-II Metazoa - 736; Fungi - 347; Plants - 385; Viruses - 0; Other Eukaryotes - 339 (source: NCBI BLink). & (q93wd2|cb29_chlre : 216.0) Chlorophyll a-b binding protein CP29 - Chlamydomonas reinhardtii & (chl4|517514 : 216.0) no description available & (ipr022796 : 110.34177) Chlorophyll A-B binding protein & (reliability: 810.0) & (original description: no original description) (p08222|cb22_cucsa : 282.0) Chlorophyll a-b binding protein of LHCII type 1 (Chlorophyll a-b binding protein of LHCII type I) (CAB) (LHCP) (Fragment) - Cucumis PS.lightreaction.phot 1.1.1.1 c22081/f2p8/1121 -2.85 -3.79 -2.85 -3.79 sativus (Cucumber) & (loc_os01g52240.1 : 279.0) no description available & (at2g34420 : 260.0) Photosystem II type I chlorophyll a/b-binding protein; photosystem osystem II.LHC-II II light harvesting complex gene B1B2 (LHB1B2); FUNCTIONS IN: chlorophyll binding; INVOLVED IN: photosynthesis, light harvesting in photosystem II,

137

photosynthesis; LOCATED IN: in 7 components; EXPRESSED IN: cotyledon, guard cell, juvenile leaf, cultured cell, leaf; EXPRESSED DURING: seed development stages; CONTAINS InterPro DOMAIN/s: Chlorophyll A-B binding protein (InterPro:IPR001344); BEST Arabidopsis thaliana protein match is: chlorophyll A/B binding protein 1 (TAIR:AT1G29930.1); Has 2425 Blast hits to 2343 proteins in 222 species: Archae - 0; Bacteria - 0; Metazoa - 4; Fungi - 0; Plants - 2093; Viruses - 0; Other Eukaryotes - 328 (source: NCBI BLink). & (chl4|520113 : 235.0) no description available & (reliability: 520.0) & (original description: no original description) (p27493|cb22_tobac : 436.0) Chlorophyll a-b binding protein 21, chloroplast precursor (LHCII type I CAB-21) (LHCP) - Nicotiana tabacum (Common tobacco) & (at2g34430 : 409.0) Photosystem II type I chlorophyll a/b-binding protein; light-harvesting chlorophyll-protein complex II subunit B1 (LHB1B1); FUNCTIONS IN: chlorophyll binding; INVOLVED IN: photosynthesis, light harvesting in photosystem II, photosynthesis; LOCATED IN: in 8 components; EXPRESSED IN: shoot, PS.lightreaction.phot 1.1.1.1 c66376/f1p4/1194 -2.59 -3.44 -2.59 -3.44 cotyledon, guard cell, cultured cell, leaf; CONTAINS InterPro DOMAIN/s: Chlorophyll A-B binding protein (InterPro:IPR001344); BEST Arabidopsis thaliana protein osystem II.LHC-II match is: chlorophyll A/B binding protein 1 (TAIR:AT1G29930.1); Has 2425 Blast hits to 2344 proteins in 223 species: Archae - 0; Bacteria - 0; Metazoa - 4; Fungi - 0; Plants - 2094; Viruses - 0; Other Eukaryotes - 327 (source: NCBI BLink). & (loc_os01g41710.1 : 402.0) no description available & (chl4|520113 : 326.0) no description available & (ipr022796 : 113.625175) Chlorophyll A-B binding protein & (reliability: 818.0) & (original description: no original description) (p27493|cb22_tobac : 504.0) Chlorophyll a-b binding protein 21, chloroplast precursor (LHCII type I CAB-21) (LHCP) - Nicotiana tabacum (Common tobacco) & (at2g34430 : 478.0) Photosystem II type I chlorophyll a/b-binding protein; light-harvesting chlorophyll-protein complex II subunit B1 (LHB1B1); FUNCTIONS IN: chlorophyll binding; INVOLVED IN: photosynthesis, light harvesting in photosystem II, photosynthesis; LOCATED IN: in 8 components; EXPRESSED IN: shoot, PS.lightreaction.phot 1.1.1.1 c49588/f1p2/1026 -3.53 -3.95 cotyledon, guard cell, cultured cell, leaf; CONTAINS InterPro DOMAIN/s: Chlorophyll A-B binding protein (InterPro:IPR001344); BEST Arabidopsis thaliana protein osystem II.LHC-II match is: chlorophyll A/B binding protein 1 (TAIR:AT1G29930.1); Has 2425 Blast hits to 2344 proteins in 223 species: Archae - 0; Bacteria - 0; Metazoa - 4; Fungi - 0; Plants - 2094; Viruses - 0; Other Eukaryotes - 327 (source: NCBI BLink). & (loc_os01g41710.1 : 472.0) no description available & (chl4|520113 : 377.0) no description available & (ipr022796 : 114.487404) Chlorophyll A-B binding protein & (reliability: 956.0) & (original description: no original description) (p27493|cb22_tobac : 506.0) Chlorophyll a-b binding protein 21, chloroplast precursor (LHCII type I CAB-21) (LHCP) - Nicotiana tabacum (Common tobacco) & (at2g34430 : 481.0) Photosystem II type I chlorophyll a/b-binding protein; light-harvesting chlorophyll-protein complex II subunit B1 (LHB1B1); FUNCTIONS IN: chlorophyll binding; INVOLVED IN: photosynthesis, light harvesting in photosystem II, photosynthesis; LOCATED IN: in 8 components; EXPRESSED IN: shoot, PS.lightreaction.phot 1.1.1.1 c10698/f1p2/984 -5.68 -6.57 cotyledon, guard cell, cultured cell, leaf; CONTAINS InterPro DOMAIN/s: Chlorophyll A-B binding protein (InterPro:IPR001344); BEST Arabidopsis thaliana protein osystem II.LHC-II match is: chlorophyll A/B binding protein 1 (TAIR:AT1G29930.1); Has 2425 Blast hits to 2344 proteins in 223 species: Archae - 0; Bacteria - 0; Metazoa - 4; Fungi - 0; Plants - 2094; Viruses - 0; Other Eukaryotes - 327 (source: NCBI BLink). & (loc_os01g41710.1 : 475.0) no description available & (chl4|520113 : 375.0) no description available & (ipr022796 : 114.21297) Chlorophyll A-B binding protein & (reliability: 962.0) & (original description: no original description) (p27494|cb23_tobac : 496.0) Chlorophyll a-b binding protein 36, chloroplast precursor (LHCII type I CAB-36) (LHCP) - Nicotiana tabacum (Common tobacco) & (at2g05100 : 478.0) Lhcb2.1 protein encoding a subunit of the light harvesting complex II. Member of a gene family with high degree of sequence similarity. Initially LHCB2.3 was considered as a separate gene but appears to be an allele of LHCB2.1.; photosystem II light harvesting complex gene 2.1 (LHCB2.1); FUNCTIONS IN: PS.lightreaction.phot chlorophyll binding; INVOLVED IN: response to salt stress, response to blue light, response to red light, response to far red light, photosynthesis; LOCATED IN: in 7 1.1.1.1 c51025/f1p1/947 -4.14 -4.85 osystem II.LHC-II components; EXPRESSED IN: 12 plant structures; CONTAINS InterPro DOMAIN/s: Chlorophyll A-B binding protein (InterPro:IPR001344); BEST Arabidopsis thaliana protein match is: photosystem II light harvesting complex gene 2.2 (TAIR:AT2G05070.1); Has 2375 Blast hits to 2314 proteins in 223 species: Archae - 0; Bacteria - 0; Metazoa - 4; Fungi - 0; Plants - 2059; Viruses - 0; Other Eukaryotes - 312 (source: NCBI BLink). & (loc_os03g39610.1 : 454.0) no description available & (chl4|520113 : 384.0) no description available & (ipr022796 : 113.101105) Chlorophyll A-B binding protein & (reliability: 956.0) & (original description: no original description) (p36494|cb4_spiol : 367.0) Chlorophyll a-b binding protein CP24, chloroplast precursor - Spinacia oleracea (Spinach) & (loc_os04g38410.1 : 367.0) no description available & (at1g15820 : 361.0) Lhcb6 protein (Lhcb6), light harvesting complex of photosystem II.; light harvesting complex photosystem II subunit 6 (LHCB6); FUNCTIONS IN: chlorophyll binding; INVOLVED IN: photosynthesis, nonphotochemical quenching; LOCATED IN: in 7 components; EXPRESSED IN: 24 plant structures; EXPRESSED PS.lightreaction.phot 1.1.1.1 c20916/f1p3/1066 -3.54 -3.58 DURING: 14 growth stages; CONTAINS InterPro DOMAIN/s: Chlorophyll A-B binding protein (InterPro:IPR001344); BEST Arabidopsis thaliana protein match is: osystem II.LHC-II photosystem I light harvesting complex gene 6 (TAIR:AT1G19150.1); Has 2316 Blast hits to 2191 proteins in 220 species: Archae - 0; Bacteria - 0; Metazoa - 2; Fungi - 0; Plants - 1959; Viruses - 0; Other Eukaryotes - 355 (source: NCBI BLink). & (chl4|516832 : 141.0) no description available & (ipr022796 : 103.434006) Chlorophyll A-B binding protein & (reliability: 722.0) & (original description: no original description) (p36494|cb4_spiol : 418.0) Chlorophyll a-b binding protein CP24, chloroplast precursor - Spinacia oleracea (Spinach) & (at1g15820 : 410.0) Lhcb6 protein (Lhcb6), light harvesting complex of photosystem II.; light harvesting complex photosystem II subunit 6 (LHCB6); FUNCTIONS IN: chlorophyll binding; INVOLVED IN: photosynthesis, nonphotochemical quenching; LOCATED IN: in 7 components; EXPRESSED IN: 24 plant structures; EXPRESSED DURING: 14 growth stages; PS.lightreaction.phot 1.1.1.1 c575/f4p2/1000 -4.45 -4.34 CONTAINS InterPro DOMAIN/s: Chlorophyll A-B binding protein (InterPro:IPR001344); BEST Arabidopsis thaliana protein match is: photosystem I light harvesting osystem II.LHC-II complex gene 6 (TAIR:AT1G19150.1); Has 2316 Blast hits to 2191 proteins in 220 species: Archae - 0; Bacteria - 0; Metazoa - 2; Fungi - 0; Plants - 1959; Viruses - 0; Other Eukaryotes - 355 (source: NCBI BLink). & (loc_os04g38410.1 : 386.0) no description available & (chl4|516832 : 141.0) no description available & (ipr022796 : 103.667625) Chlorophyll A-B binding protein & (reliability: 820.0) & (original description: no original description)

PSI (at3g47470 : 417.0) Encodes a chlorophyll a/b-binding protein that is more similar to the PSI Cab proteins than the PSII cab proteins. The predicted protein is about 20 PS.lightreaction.phot amino acids shorter than most known Cab proteins.; light-harvesting chlorophyll-protein complex I subunit A4 (LHCA4); FUNCTIONS IN: chlorophyll binding; 1.1.2.1 c23706/f2p3/1049 -3.53 -4.48 osystem I.LHC-I INVOLVED IN: response to karrikin, photosynthesis; LOCATED IN: in 7 components; EXPRESSED IN: 24 plant structures; EXPRESSED DURING: 14 growth stages; CONTAINS InterPro DOMAIN/s: Chlorophyll A-B binding protein (InterPro:IPR001344); BEST Arabidopsis thaliana protein match is: photosystem I light harvesting

138

complex gene 2 (TAIR:AT3G61470.1); Has 2336 Blast hits to 2249 proteins in 223 species: Archae - 0; Bacteria - 0; Metazoa - 3; Fungi - 0; Plants - 1967; Viruses - 0; Other Eukaryotes - 366 (source: NCBI BLink). & (loc_os08g33820.1 : 392.0) no description available & (p13869|cb12_pethy : 233.0) Chlorophyll a-b binding protein, chloroplast precursor (LHCI type II CAB) - Petunia hybrida (Petunia) & (chl4|510281 : 205.0) no description available & (ipr022796 : 103.74416) Chlorophyll A-B binding protein & (reliability: 834.0) & (original description: no original description) (loc_os06g21590.1 : 198.0) no description available & (at3g54890 : 194.0) Encodes a component of the light harvesting complex associated with photosystem I.; photosystem I light harvesting complex gene 1 (LHCA1); FUNCTIONS IN: chlorophyll binding; INVOLVED IN: response to blue light, response to red light, photosynthesis, light harvesting in photosystem I, response to far red light, photosynthesis; LOCATED IN: light-harvesting complex, chloroplast thylakoid membrane, PS.lightreaction.phot chloroplast, plastoglobule, membrane; EXPRESSED IN: 28 plant structures; EXPRESSED DURING: 14 growth stages; CONTAINS InterPro DOMAIN/s: Chlorophyll A- 1.1.2.1 c27017/f2p1/1001 -3.21 -4.25 osystem I.LHC-I B binding protein (InterPro:IPR001344); BEST Arabidopsis thaliana protein match is: light harvesting complex photosystem II (TAIR:AT2G40100.1); Has 2134 Blast hits to 2063 proteins in 214 species: Archae - 0; Bacteria - 0; Metazoa - 3; Fungi - 0; Plants - 1854; Viruses - 0; Other Eukaryotes - 277 (source: NCBI BLink). & (chl4|523536 : 99.8) no description available & (p12333|cb2a_spiol : 90.1) Chlorophyll a-b binding protein, chloroplast precursor (LHCII type I CAB) (LHCP) - Spinacia oleracea (Spinach) & (reliability: 388.0) & (original description: no original description) (at1g61520 : 385.0) PSI type III chlorophyll a/b-binding protein (Lhca3*1); photosystem I light harvesting complex gene 3 (LHCA3); FUNCTIONS IN: chlorophyll binding; INVOLVED IN: photosynthesis, light harvesting, photosynthesis; LOCATED IN: light-harvesting complex, chloroplast thylakoid membrane, plastoglobule; PS.lightreaction.phot EXPRESSED IN: 23 plant structures; EXPRESSED DURING: 14 growth stages; CONTAINS InterPro DOMAIN/s: Chlorophyll A-B binding protein 1.1.2.1 c45789/f1p2/1161 -3.81 -4.11 osystem I.LHC-I (InterPro:IPR001344); BEST Arabidopsis thaliana protein match is: photosystem I light harvesting complex gene 5 (TAIR:AT1G45474.2). & (loc_os02g10390.1 : 366.0) no description available & (chl4|518157 : 263.0) no description available & (p13869|cb12_pethy : 152.0) Chlorophyll a-b binding protein, chloroplast precursor (LHCI type II CAB) - Petunia hybrida (Petunia) & (ipr022796 : 105.58244) Chlorophyll A-B binding protein & (reliability: 770.0) & (original description: no original description) (p13869|cb12_pethy : 464.0) Chlorophyll a-b binding protein, chloroplast precursor (LHCI type II CAB) - Petunia hybrida (Petunia) & (at3g61470 : 434.0) Encodes a component of the light harvesting antenna complex of photosystem I.; photosystem I light harvesting complex gene 2 (LHCA2); FUNCTIONS IN: chlorophyll binding; INVOLVED IN: photosynthesis, light harvesting in photosystem I, photosynthesis; LOCATED IN: in 6 components; EXPRESSED IN: leaf; EXPRESSED DURING: PS.lightreaction.phot 1.1.2.1 c55530/f1p0/1264 -3.36 -4.51 seedling growth; CONTAINS InterPro DOMAIN/s: Chlorophyll A-B binding protein (InterPro:IPR001344); BEST Arabidopsis thaliana protein match is: photosystem I osystem I.LHC-I light harvesting complex gene 6 (TAIR:AT1G19150.1); Has 30201 Blast hits to 17322 proteins in 780 species: Archae - 12; Bacteria - 1396; Metazoa - 17338; Fungi - 3422; Plants - 5037; Viruses - 0; Other Eukaryotes - 2996 (source: NCBI BLink). & (loc_os07g38960.1 : 427.0) no description available & (chl4|510281 : 210.0) no description available & (ipr022796 : 115.842606) Chlorophyll A-B binding protein & (reliability: 868.0) & (original description: no original description) (loc_os02g10390.1 : 258.0) no description available & (at1g61520 : 251.0) PSI type III chlorophyll a/b-binding protein (Lhca3*1); photosystem I light harvesting complex gene 3 (LHCA3); FUNCTIONS IN: chlorophyll binding; INVOLVED IN: photosynthesis, light harvesting, photosynthesis; LOCATED IN: light-harvesting complex, PS.lightreaction.phot chloroplast thylakoid membrane, plastoglobule; EXPRESSED IN: 23 plant structures; EXPRESSED DURING: 14 growth stages; CONTAINS InterPro DOMAIN/s: 1.1.2.1 c6144/f1p3/1340 -2.84 -3.34 osystem I.LHC-I Chlorophyll A-B binding protein (InterPro:IPR001344); BEST Arabidopsis thaliana protein match is: photosystem I light harvesting complex gene 5 (TAIR:AT1G45474.2). & (chl4|518157 : 158.0) no description available & (p13869|cb12_pethy : 91.3) Chlorophyll a-b binding protein, chloroplast precursor (LHCI type II CAB) - Petunia hybrida (Petunia) & (reliability: 502.0) & (original description: no original description) (at3g47470 : 416.0) Encodes a chlorophyll a/b-binding protein that is more similar to the PSI Cab proteins than the PSII cab proteins. The predicted protein is about 20 amino acids shorter than most known Cab proteins.; light-harvesting chlorophyll-protein complex I subunit A4 (LHCA4); FUNCTIONS IN: chlorophyll binding; INVOLVED IN: response to karrikin, photosynthesis; LOCATED IN: in 7 components; EXPRESSED IN: 24 plant structures; EXPRESSED DURING: 14 growth stages; PS.lightreaction.phot CONTAINS InterPro DOMAIN/s: Chlorophyll A-B binding protein (InterPro:IPR001344); BEST Arabidopsis thaliana protein match is: photosystem I light harvesting 1.1.2.1 c70535/f8p3/995 -4.44 -5.05 osystem I.LHC-I complex gene 2 (TAIR:AT3G61470.1); Has 2336 Blast hits to 2249 proteins in 223 species: Archae - 0; Bacteria - 0; Metazoa - 3; Fungi - 0; Plants - 1967; Viruses - 0; Other Eukaryotes - 366 (source: NCBI BLink). & (loc_os08g33820.1 : 390.0) no description available & (p13869|cb12_pethy : 231.0) Chlorophyll a-b binding protein, chloroplast precursor (LHCI type II CAB) - Petunia hybrida (Petunia) & (chl4|510281 : 204.0) no description available & (ipr022796 : 102.58671) Chlorophyll A-B binding protein & (reliability: 832.0) & (original description: no original description) (p13869|cb12_pethy : 503.0) Chlorophyll a-b binding protein, chloroplast precursor (LHCI type II CAB) - Petunia hybrida (Petunia) & (at3g61470 : 447.0) Encodes a component of the light harvesting antenna complex of photosystem I.; photosystem I light harvesting complex gene 2 (LHCA2); FUNCTIONS IN: chlorophyll binding; INVOLVED IN: photosynthesis, light harvesting in photosystem I, photosynthesis; LOCATED IN: in 6 components; EXPRESSED IN: leaf; EXPRESSED DURING: PS.lightreaction.phot 1.1.2.1 c8390/f2p6/1366 -3 -3.37 seedling growth; CONTAINS InterPro DOMAIN/s: Chlorophyll A-B binding protein (InterPro:IPR001344); BEST Arabidopsis thaliana protein match is: photosystem I osystem I.LHC-I light harvesting complex gene 6 (TAIR:AT1G19150.1); Has 30201 Blast hits to 17322 proteins in 780 species: Archae - 12; Bacteria - 1396; Metazoa - 17338; Fungi - 3422; Plants - 5037; Viruses - 0; Other Eukaryotes - 2996 (source: NCBI BLink). & (loc_os07g38960.1 : 441.0) no description available & (chl4|510281 : 212.0) no description available & (ipr022796 : 115.67398) Chlorophyll A-B binding protein & (reliability: 894.0) & (original description: no original description)

(p22181|psah_orysa : 168.0) Photosystem I reaction center subunit VI, chloroplast precursor (PSI-H) (Light-harvesting complex I 11 kDa protein) (Protein GOS5) - Oryza sativa (Rice) & (loc_os05g48630.2 : 168.0) no description available & (at1g52230 : 164.0) photosystem I subunit H2 (PSAH2); FUNCTIONS IN: molecular_function PS.lightreaction.phot unknown; INVOLVED IN: photosynthesis; LOCATED IN: in 6 components; EXPRESSED IN: 21 plant structures; EXPRESSED DURING: 13 growth stages; CONTAINS 1.1.2.2 osystem I.PSI c13597/f5p2/729 -2.29 -2.88 InterPro DOMAIN/s: Photosystem I reaction centre subunit VI (InterPro:IPR004928); BEST Arabidopsis thaliana protein match is: photosystem I subunit H-1 polypeptide subunits (TAIR:AT3G16140.1); Has 102 Blast hits to 102 proteins in 31 species: Archae - 0; Bacteria - 0; Metazoa - 0; Fungi - 0; Plants - 100; Viruses - 0; Other Eukaryotes - 2 (source: NCBI BLink). & (ipr004928 : 140.51956) Photosystem I PsaH, reaction centre subunit VI & (reliability: 573.90924) & (original description: no original description)

139

(p12355|psaf_spiol : 284.0) Photosystem I reaction center subunit III, chloroplast precursor (Light-harvesting complex I 17 kDa protein) (PSI-F) - Spinacia oleracea (Spinach) & (at1g31330 : 282.0) Encodes subunit F of photosystem I.; photosystem I subunit F (PSAF); FUNCTIONS IN: molecular_function unknown; INVOLVED IN: PS.lightreaction.phot photosynthesis; LOCATED IN: in 7 components; EXPRESSED IN: 25 plant structures; EXPRESSED DURING: 14 growth stages; CONTAINS InterPro DOMAIN/s: 1.1.2.2 osystem I.PSI c17083/f2p0/1064 -3.18 -3.38 Photosystem I reaction centre protein PsaF, subunit III (InterPro:IPR003666); Has 387 Blast hits to 387 proteins in 119 species: Archae - 0; Bacteria - 139; Metazoa - 0; polypeptide subunits Fungi - 0; Plants - 79; Viruses - 6; Other Eukaryotes - 163 (source: NCBI BLink). & (loc_os03g56670.2 : 267.0) no description available & (ipr003666 : 191.80771) Photosystem I PsaF, reaction centre subunit III & (chl4|526339 : 187.0) no description available & (reliability: 899.66345) & (original description: no original description) (at1g30380 : 169.0) Encodes subunit K of photosystem I reaction center.; photosystem I subunit K (PSAK); FUNCTIONS IN: molecular_function unknown; INVOLVED IN: photosynthesis; LOCATED IN: thylakoid, chloroplast thylakoid membrane, photosystem I, chloroplast, membrane; EXPRESSED IN: 23 plant structures; EXPRESSED PS.lightreaction.phot DURING: 13 growth stages; CONTAINS InterPro DOMAIN/s: Photosystem I PsaG/PsaK protein (InterPro:IPR000549), Photosystem I reaction centre, PsaK, plant 1.1.2.2 osystem I.PSI c25205/f3p0/555 -3.95 -4 (InterPro:IPR017493), Photosystem I reaction centre, PsaG/PsaK, plant (InterPro:IPR016370); Has 85 Blast hits to 85 proteins in 34 species: Archae - 0; Bacteria - 0; polypeptide subunits Metazoa - 0; Fungi - 0; Plants - 85; Viruses - 0; Other Eukaryotes - 0 (source: NCBI BLink). & (q9zt05|psak_medsa : 163.0) Photosystem I reaction center subunit psaK, chloroplast precursor (Photosystem I subunit X) (PSI-K) - Medicago sativa (Alfalfa) & (loc_os07g05480.1 : 140.0) no description available & (reliability: 338.0) & (original description: no original description) (at5g64040 : 108.0) Encodes the only subunit of photosystem I located entirely in the thylakoid lumen. May be involved in the interaction between plastocyanin and the photosystem I complex.; PSAN; FUNCTIONS IN: calmodulin binding; INVOLVED IN: photosynthetic electron transport in photosystem I; LOCATED IN: chloroplast PS.lightreaction.phot thylakoid membrane, photosystem I, chloroplast thylakoid lumen, chloroplast, chloroplast photosystem I; EXPRESSED IN: 22 plant structures; EXPRESSED DURING: 13 1.1.2.2 osystem I.PSI c25509/f2p0/720 -5.23 -6.27 growth stages; CONTAINS InterPro DOMAIN/s: Photosystem I reaction centre subunit N (InterPro:IPR008796); Has 30201 Blast hits to 17322 proteins in 780 species: polypeptide subunits Archae - 12; Bacteria - 1396; Metazoa - 17338; Fungi - 3422; Plants - 5037; Viruses - 0; Other Eukaryotes - 2996 (source: NCBI BLink). & (p31093|psan_horvu : 107.0) Photosystem I reaction center subunit N, chloroplast precursor (PSI-N) - Hordeum vulgare (Barley) & (loc_os12g08770.1 : 103.0) no description available & (reliability: 216.0) & (original description: no original description) (at1g55670 : 194.0) Encodes subunit G of photosystem I, an 11-kDa membrane protein that plays an important role in electron transport between plastocyanin and PSI and is involved in the stability of the PSI complex. PSI-G subunit is bound to PSI-B and is in contact with Lhca1. The protein inserts into thylakoids by a direct or "spontaneous" pathway that does not involve the activities of any known chloroplast protein-targeting machinery. PSI-G appears to be directly or indirectly involved in the interaction between Photosystem I and plastocyanin.; photosystem I subunit G (PSAG); FUNCTIONS IN: molecular_function unknown; INVOLVED IN: photosynthetic PS.lightreaction.phot NADP+ reduction, photosystem I stabilization, protein stabilization, photosynthetic electron transport in photosystem I, photosynthesis; LOCATED IN: in 7 components; 1.1.2.2 osystem I.PSI c34744/f1p0/684 -3.62 -3.56 EXPRESSED IN: 24 plant structures; EXPRESSED DURING: 14 growth stages; CONTAINS InterPro DOMAIN/s: Photosystem I reaction centre, PsaG, plant polypeptide subunits (InterPro:IPR017494), Photosystem I PsaG/PsaK protein (InterPro:IPR000549), Photosystem I reaction centre, PsaG/PsaK, plant (InterPro:IPR016370); Has 117 Blast hits to 117 proteins in 27 species: Archae - 0; Bacteria - 0; Metazoa - 0; Fungi - 0; Plants - 117; Viruses - 0; Other Eukaryotes - 0 (source: NCBI BLink). & (p12357|psag_spiol : 188.0) Photosystem I reaction center subunit V, chloroplast precursor (PSI-G) (Photosystem I 9 kDa protein) - Spinacia oleracea (Spinach) & (loc_os09g30340.1 : 184.0) no description available & (ipr017494 : 90.45474) Photosystem I PsaG, plant & (reliability: 388.0) & (original description: no original description) (at1g08380 : 160.0) Encodes subunit O of photosystem I.; photosystem I subunit O (PSAO); FUNCTIONS IN: molecular_function unknown; INVOLVED IN: response to PS.lightreaction.phot salt stress, photosynthesis, light harvesting in photosystem I; LOCATED IN: thylakoid, photosystem I, chloroplast; EXPRESSED IN: 21 plant structures; EXPRESSED 1.1.2.2 osystem I.PSI c65847/f1p0/621 -4.86 -5.56 DURING: 13 growth stages; CONTAINS InterPro DOMAIN/s: Photosystem I PsaO (InterPro:IPR017498); Has 161 Blast hits to 161 proteins in 28 species: Archae - 0; polypeptide subunits Bacteria - 0; Metazoa - 0; Fungi - 0; Plants - 159; Viruses - 0; Other Eukaryotes - 2 (source: NCBI BLink). & (loc_os04g33830.1 : 134.0) no description available & (chl4|524654 : 80.9) no description available & (reliability: 320.0) & (original description: no original description) (p12355|psaf_spiol : 285.0) Photosystem I reaction center subunit III, chloroplast precursor (Light-harvesting complex I 17 kDa protein) (PSI-F) - Spinacia oleracea (Spinach) & (at1g31330 : 282.0) Encodes subunit F of photosystem I.; photosystem I subunit F (PSAF); FUNCTIONS IN: molecular_function unknown; INVOLVED IN: PS.lightreaction.phot photosynthesis; LOCATED IN: in 7 components; EXPRESSED IN: 25 plant structures; EXPRESSED DURING: 14 growth stages; CONTAINS InterPro DOMAIN/s: 1.1.2.2 osystem I.PSI c70202/f2p0/938 -2.8 -4.13 Photosystem I reaction centre protein PsaF, subunit III (InterPro:IPR003666); Has 387 Blast hits to 387 proteins in 119 species: Archae - 0; Bacteria - 139; Metazoa - 0; polypeptide subunits Fungi - 0; Plants - 79; Viruses - 6; Other Eukaryotes - 163 (source: NCBI BLink). & (loc_os03g56670.2 : 267.0) no description available & (ipr003666 : 192.13622) Photosystem I PsaF, reaction centre subunit III & (chl4|526339 : 187.0) no description available & (reliability: 900.2384) & (original description: no original description) (q39654|psal_cucsa : 181.0) Photosystem I reaction center subunit XI, chloroplast precursor (PSI-L) (PSI subunit V) - Cucumis sativus (Cucumber) & (loc_os12g23200.1 : 180.0) no description available & (at4g12800 : 175.0) Encodes subunit L of photosystem I reaction center.; photosystem I subunit l (PSAL); FUNCTIONS IN: PS.lightreaction.phot molecular_function unknown; INVOLVED IN: photosynthesis, light reaction, photosynthesis; LOCATED IN: in 7 components; EXPRESSED IN: 24 plant structures; 1.1.2.2 osystem I.PSI c153694/f2p0/804 -3.72 EXPRESSED DURING: 14 growth stages; CONTAINS InterPro DOMAIN/s: Photosystem I reaction centre, subunit XI PsaL (InterPro:IPR003757); Has 443 Blast hits to polypeptide subunits 443 proteins in 121 species: Archae - 0; Bacteria - 178; Metazoa - 0; Fungi - 0; Plants - 87; Viruses - 0; Other Eukaryotes - 178 (source: NCBI BLink). & (ipr003757 : 96.3721) Photosystem I PsaL, reaction centre subunit XI & (chl4|512552 : 95.9) no description available & (reliability: 518.6512) & (original description: no original description) (q41229|psaeb_nicsy : 144.0) Photosystem I reaction center subunit IV B, chloroplast precursor (PSI-E B) [Contains: Photosystem I reaction center subunit IV B isoform 2] - Nicotiana sylvestris (Wood tobacco) & (at2g20260 : 122.0) Encodes subunit E of photosystem I.; photosystem I subunit E-2 (PSAE-2); FUNCTIONS IN: catalytic PS.lightreaction.phot activity; INVOLVED IN: photosynthesis; LOCATED IN: thylakoid, chloroplast thylakoid membrane, chloroplast, plastoglobule, photosystem I reaction center; 1.1.2.2 osystem I.PSI c19171/f1p2/821 -1.69 EXPRESSED IN: 25 plant structures; EXPRESSED DURING: 13 growth stages; CONTAINS InterPro DOMAIN/s: Photosystem I reaction centre subunit IV/PsaE polypeptide subunits (InterPro:IPR003375), Electron transport accessory protein (InterPro:IPR008990); BEST Arabidopsis thaliana protein match is: Photosystem I reaction centre subunit IV / PsaE protein (TAIR:AT4G28750.1); Has 441 Blast hits to 441 proteins in 130 species: Archae - 0; Bacteria - 177; Metazoa - 0; Fungi - 0; Plants - 90; Viruses - 3; Other Eukaryotes - 171 (source: NCBI BLink). & (loc_os07g25430.3 : 115.0) no description available & (chl4|509622 : 83.6) no description available & (reliability: 244.0) &

140

(original description: no original description)

(p32869|psad_cucsa : 335.0) Photosystem I reaction center subunit II, chloroplast precursor (Photosystem I 20 kDa subunit) (PSI-D) (PS I subunit 5) - Cucumis sativus (Cucumber) & (at1g03130 : 298.0) Encodes a protein predicted by sequence similarity with spinach PsaD to be photosystem I reaction center subunit II (PsaD2); PS.lightreaction.phot photosystem I subunit D-2 (PSAD-2); FUNCTIONS IN: molecular_function unknown; INVOLVED IN: photosynthesis; LOCATED IN: in 6 components; EXPRESSED 1.1.2.2 osystem I.PSI c19491/f3p2/942 -1.14 IN: 23 plant structures; EXPRESSED DURING: 14 growth stages; CONTAINS InterPro DOMAIN/s: Photosystem I protein PsaD (InterPro:IPR003685); BEST polypeptide subunits Arabidopsis thaliana protein match is: photosystem I subunit D-1 (TAIR:AT4G02770.1); Has 509 Blast hits to 509 proteins in 136 species: Archae - 0; Bacteria - 143; Metazoa - 0; Fungi - 0; Plants - 165; Viruses - 3; Other Eukaryotes - 198 (source: NCBI BLink). & (loc_os08g44680.1 : 290.0) no description available & (chl4|522766 : 222.0) no description available & (ipr003685 : 169.32658) Photosystem I PsaD & (reliability: 892.32153) & (original description: no original description) (at4g12800 : 144.0) Encodes subunit L of photosystem I reaction center.; photosystem I subunit l (PSAL); FUNCTIONS IN: molecular_function unknown; INVOLVED IN: photosynthesis, light reaction, photosynthesis; LOCATED IN: in 7 components; EXPRESSED IN: 24 plant structures; EXPRESSED DURING: 14 growth stages; PS.lightreaction.phot CONTAINS InterPro DOMAIN/s: Photosystem I reaction centre, subunit XI PsaL (InterPro:IPR003757); Has 443 Blast hits to 443 proteins in 121 species: Archae - 0; 1.1.2.2 osystem I.PSI c24120/f1p0/991 -3.46 Bacteria - 178; Metazoa - 0; Fungi - 0; Plants - 87; Viruses - 0; Other Eukaryotes - 178 (source: NCBI BLink). & (q39654|psal_cucsa : 142.0) Photosystem I reaction center polypeptide subunits subunit XI, chloroplast precursor (PSI-L) (PSI subunit V) - Cucumis sativus (Cucumber) & (loc_os12g23200.1 : 134.0) no description available & (chl4|512552 : 93.6) no description available & (reliability: 288.0) & (original description: no original description) (q41229|psaeb_nicsy : 144.0) Photosystem I reaction center subunit IV B, chloroplast precursor (PSI-E B) [Contains: Photosystem I reaction center subunit IV B isoform 2] - Nicotiana sylvestris (Wood tobacco) & (at2g20260 : 122.0) Encodes subunit E of photosystem I.; photosystem I subunit E-2 (PSAE-2); FUNCTIONS IN: catalytic activity; INVOLVED IN: photosynthesis; LOCATED IN: thylakoid, chloroplast thylakoid membrane, chloroplast, plastoglobule, photosystem I reaction center; PS.lightreaction.phot EXPRESSED IN: 25 plant structures; EXPRESSED DURING: 13 growth stages; CONTAINS InterPro DOMAIN/s: Photosystem I reaction centre subunit IV/PsaE 1.1.2.2 osystem I.PSI c55146/f7p2/690 -1.47 (InterPro:IPR003375), Electron transport accessory protein (InterPro:IPR008990); BEST Arabidopsis thaliana protein match is: Photosystem I reaction centre subunit IV / polypeptide subunits PsaE protein (TAIR:AT4G28750.1); Has 441 Blast hits to 441 proteins in 130 species: Archae - 0; Bacteria - 177; Metazoa - 0; Fungi - 0; Plants - 90; Viruses - 3; Other Eukaryotes - 171 (source: NCBI BLink). & (loc_os07g25430.3 : 115.0) no description available & (chl4|509622 : 83.6) no description available & (reliability: 244.0) & (original description: no original description) (at4g12800 : 163.0) Encodes subunit L of photosystem I reaction center.; photosystem I subunit l (PSAL); FUNCTIONS IN: molecular_function unknown; INVOLVED IN: photosynthesis, light reaction, photosynthesis; LOCATED IN: in 7 components; EXPRESSED IN: 24 plant structures; EXPRESSED DURING: 14 growth stages; PS.lightreaction.phot CONTAINS InterPro DOMAIN/s: Photosystem I reaction centre, subunit XI PsaL (InterPro:IPR003757); Has 443 Blast hits to 443 proteins in 121 species: Archae - 0; 1.1.2.2 osystem I.PSI c668/f2p0/815 -3.44 Bacteria - 178; Metazoa - 0; Fungi - 0; Plants - 87; Viruses - 0; Other Eukaryotes - 178 (source: NCBI BLink). & (q39654|psal_cucsa : 163.0) Photosystem I reaction center polypeptide subunits subunit XI, chloroplast precursor (PSI-L) (PSI subunit V) - Cucumis sativus (Cucumber) & (loc_os12g23200.2 : 160.0) no description available & (chl4|512552 : 113.0) no description available & (reliability: 326.0) & (original description: no original description)

Appendix 2 Differentially expressed genes related to hormones.

Fold change UY UR LY vs LR vs Auxin Seq ID vs vs Description LG LY UG UY (p49249|in22_maize : 368.0) IN2-2 protein - Zea mays (Maize) & (loc_os04g26910.1 : 361.0) no description available & (at1g60710 : 355.0) Encodes ATB2.; ATB2; hormone FUNCTIONS IN: oxidoreductase activity; INVOLVED IN: response to cadmium ion; EXPRESSED IN: 23 plant structures; EXPRESSED DURING: 14 growth stages; metabolism.auxin.ind CONTAINS InterPro DOMAIN/s: Aldo/keto reductase (InterPro:IPR001395), Aldo/keto reductase subgroup (InterPro:IPR020471); BEST Arabidopsis thaliana protein 17.2.3 c16801/f3p4/1470 -1.77 uced-regulated- match is: NAD(P)-linked oxidoreductase superfamily protein (TAIR:AT1G60730.1); Has 30719 Blast hits to 30695 proteins in 2595 species: Archae - 650; Bacteria - responsive-activated 20319; Metazoa - 1822; Fungi - 2308; Plants - 1286; Viruses - 0; Other Eukaryotes - 4334 (source: NCBI BLink). & (ipr023210 : 176.89359) NADP-dependent oxidoreductase domain & (chl4|518106 : 161.0) no description available & (reliability: 710.0) & (original description: no original description) (at4g27450 : 159.0) Aluminium induced protein with YGL and LRDR motifs; FUNCTIONS IN: molecular_function unknown; INVOLVED IN: biological_process hormone unknown; LOCATED IN: cytosol, nucleus, plasma membrane; EXPRESSED IN: 25 plant structures; EXPRESSED DURING: 14 growth stages; BEST Arabidopsis metabolism.auxin.ind 17.2.3 c17090/f2p0/604 1.25 thaliana protein match is: Aluminium induced protein with YGL and LRDR motifs (TAIR:AT3G15450.1); Has 30201 Blast hits to 17322 proteins in 780 species: Archae - uced-regulated- 12; Bacteria - 1396; Metazoa - 17338; Fungi - 3422; Plants - 5037; Viruses - 0; Other Eukaryotes - 2996 (source: NCBI BLink). & (loc_os03g53270.2 : 105.0) no responsive-activated description available & (reliability: 318.0) & (original description: no original description) hormone (q05349|12kd_fraan : 139.0) Auxin-repressed 12.5 kDa protein - Fragaria ananassa (Strawberry) & (at1g28330 : 137.0) dormancy-associated protein (DRM1); dormancy- 17.2.3 metabolism.auxin.ind c18729/f3p0/807 1.12 associated protein-like 1 (DYL1); FUNCTIONS IN: molecular_function unknown; INVOLVED IN: response to fructose stimulus, response to sucrose stimulus, response to uced-regulated- glucose stimulus; LOCATED IN: cellular_component unknown; EXPRESSED IN: 22 plant structures; EXPRESSED DURING: 13 growth stages; CONTAINS InterPro

141

responsive-activated DOMAIN/s: Dormancyauxin associated (InterPro:IPR008406); BEST Arabidopsis thaliana protein match is: Dormancy/auxin associated family protein (TAIR:AT2G33830.2). & (loc_os11g44810.1 : 127.0) no description available & (ipr008406 : 104.72499) Dormancyauxin associated & (reliability: 254.0) & (original description: no original description) hormone (at2g46690 : 127.0) SAUR-like auxin-responsive protein family ; CONTAINS InterPro DOMAIN/s: Auxin responsive SAUR protein (InterPro:IPR003676); BEST metabolism.auxin.ind Arabidopsis thaliana protein match is: SAUR-like auxin-responsive protein family (TAIR:AT3G61900.1); Has 1291 Blast hits to 1280 proteins in 26 species: Archae - 0; 17.2.3 c22853/f1p1/809 2.2 uced-regulated- Bacteria - 0; Metazoa - 0; Fungi - 0; Plants - 1290; Viruses - 0; Other Eukaryotes - 1 (source: NCBI BLink). & (loc_os06g50040.1 : 86.3) no description available & responsive-activated (reliability: 254.0) & (original description: no original description) hormone (at5g65470 : 733.0) O-fucosyltransferase family protein; CONTAINS InterPro DOMAIN/s: GDP-fucose protein O-fucosyltransferase (InterPro:IPR019378); BEST metabolism.auxin.ind Arabidopsis thaliana protein match is: O-fucosyltransferase family protein (TAIR:AT4G24530.1); Has 30201 Blast hits to 17322 proteins in 780 species: Archae - 12; 17.2.3 c23295/f2p3/2167 -1.09 uced-regulated- Bacteria - 1396; Metazoa - 17338; Fungi - 3422; Plants - 5037; Viruses - 0; Other Eukaryotes - 2996 (source: NCBI BLink). & (loc_os09g24570.1 : 698.0) no description responsive-activated available & (ipr019378 : 216.03754) GDP-fucose protein O-fucosyltransferase & (reliability: 1466.0) & (original description: no original description) hormone (at1g75580 : 172.0) SAUR-like auxin-responsive protein family ; CONTAINS InterPro DOMAIN/s: Auxin responsive SAUR protein (InterPro:IPR003676); BEST metabolism.auxin.ind Arabidopsis thaliana protein match is: SAUR-like auxin-responsive protein family (TAIR:AT1G19830.1); Has 1403 Blast hits to 1386 proteins in 27 species: Archae - 0; 17.2.3 c2396/f8p0/753 1.58 1.75 uced-regulated- Bacteria - 0; Metazoa - 0; Fungi - 0; Plants - 1402; Viruses - 0; Other Eukaryotes - 1 (source: NCBI BLink). & (loc_os02g24740.1 : 114.0) no description available & responsive-activated (p33081|ax15a_soybn : 85.9) Auxin-induced protein 15A - Glycine max (Soybean) & (reliability: 344.0) & (original description: no original description) hormone (at5g65470 : 731.0) O-fucosyltransferase family protein; CONTAINS InterPro DOMAIN/s: GDP-fucose protein O-fucosyltransferase (InterPro:IPR019378); BEST metabolism.auxin.ind Arabidopsis thaliana protein match is: O-fucosyltransferase family protein (TAIR:AT4G24530.1); Has 30201 Blast hits to 17322 proteins in 780 species: Archae - 12; 17.2.3 c24764/f3p4/2298 -1.27 uced-regulated- Bacteria - 1396; Metazoa - 17338; Fungi - 3422; Plants - 5037; Viruses - 0; Other Eukaryotes - 2996 (source: NCBI BLink). & (loc_os09g24570.1 : 700.0) no description responsive-activated available & (ipr019378 : 215.3116) GDP-fucose protein O-fucosyltransferase & (reliability: 1462.0) & (original description: no original description) hormone (at2g46690 : 126.0) SAUR-like auxin-responsive protein family ; CONTAINS InterPro DOMAIN/s: Auxin responsive SAUR protein (InterPro:IPR003676); BEST metabolism.auxin.ind Arabidopsis thaliana protein match is: SAUR-like auxin-responsive protein family (TAIR:AT3G61900.1); Has 1291 Blast hits to 1280 proteins in 26 species: Archae - 0; 17.2.3 c31316/f1p0/1137 2.1 uced-regulated- Bacteria - 0; Metazoa - 0; Fungi - 0; Plants - 1290; Viruses - 0; Other Eukaryotes - 1 (source: NCBI BLink). & (loc_os06g50040.1 : 86.7) no description available & responsive-activated (reliability: 252.0) & (original description: no original description) hormone (loc_os08g35110.1 : 124.0) no description available & (at5g20820 : 119.0) SAUR-like auxin-responsive protein family ; CONTAINS InterPro DOMAIN/s: Auxin metabolism.auxin.ind responsive SAUR protein (InterPro:IPR003676); BEST Arabidopsis thaliana protein match is: SAUR-like auxin-responsive protein family (TAIR:AT1G72430.1); Has 120 17.2.3 c44100/f1p4/1072 1.79 1.97 uced-regulated- Blast hits to 120 proteins in 13 species: Archae - 0; Bacteria - 0; Metazoa - 0; Fungi - 0; Plants - 120; Viruses - 0; Other Eukaryotes - 0 (source: NCBI BLink). & (reliability: responsive-activated 238.0) & (original description: no original description) (at4g15550 : 425.0) UDP-glucose:indole-3-acetate beta-D-glucosyltransferase; indole-3-acetate beta-D-glucosyltransferase (IAGLU); FUNCTIONS IN: UDP- glycosyltransferase activity, transferase activity, transferring glycosyl groups; INVOLVED IN: metabolic process; LOCATED IN: chloroplast; EXPRESSED IN: 17 plant hormone structures; EXPRESSED DURING: 7 growth stages; CONTAINS InterPro DOMAIN/s: UDP-glucuronosyl/UDP-glucosyltransferase (InterPro:IPR002213); BEST 17.2.1 metabolism.auxin.sy c45825/f1p2/1630 1.66 1.82 Arabidopsis thaliana protein match is: UDP-glucosyltransferase 75B1 (TAIR:AT1G05560.1); Has 30201 Blast hits to 17322 proteins in 780 species: Archae - 12; Bacteria - nthesis-degradation 1396; Metazoa - 17338; Fungi - 3422; Plants - 5037; Viruses - 0; Other Eukaryotes - 2996 (source: NCBI BLink). & (loc_os01g08440.1 : 355.0) no description available & (q41819|iaag_maize : 267.0) Indole-3-acetate beta-glucosyltransferase (EC 2.4.1.121) (IAA-Glu synthetase) ((Uridine 5'-diphosphate-glucose:indol-3-ylacetyl)-beta-D- glucosyl transferase) - Zea mays (Maize) & (reliability: 828.0) & (original description: no original description) hormone (loc_os08g35110.1 : 124.0) no description available & (at5g20820 : 119.0) SAUR-like auxin-responsive protein family ; CONTAINS InterPro DOMAIN/s: Auxin metabolism.auxin.ind responsive SAUR protein (InterPro:IPR003676); BEST Arabidopsis thaliana protein match is: SAUR-like auxin-responsive protein family (TAIR:AT1G72430.1); Has 120 17.2.3 c51757/f5p2/869 1.5 uced-regulated- Blast hits to 120 proteins in 13 species: Archae - 0; Bacteria - 0; Metazoa - 0; Fungi - 0; Plants - 120; Viruses - 0; Other Eukaryotes - 0 (source: NCBI BLink). & (reliability: responsive-activated 238.0) & (original description: no original description) (at4g27450 : 225.0) Aluminium induced protein with YGL and LRDR motifs; FUNCTIONS IN: molecular_function unknown; INVOLVED IN: biological_process hormone unknown; LOCATED IN: cytosol, nucleus, plasma membrane; EXPRESSED IN: 25 plant structures; EXPRESSED DURING: 14 growth stages; BEST Arabidopsis metabolism.auxin.ind thaliana protein match is: Aluminium induced protein with YGL and LRDR motifs (TAIR:AT3G15450.1); Has 30201 Blast hits to 17322 proteins in 780 species: Archae - 17.2.3 c53561/f2p1/1359 1.24 uced-regulated- 12; Bacteria - 1396; Metazoa - 17338; Fungi - 3422; Plants - 5037; Viruses - 0; Other Eukaryotes - 2996 (source: NCBI BLink). & (loc_os04g58280.2 : 143.0) no responsive-activated description available & (p24805|tsjt1_tobac : 110.0) Stem-specific protein TSJT1 - Nicotiana tabacum (Common tobacco) & (reliability: 450.0) & (original description: no original description) (at4g27450 : 373.0) Aluminium induced protein with YGL and LRDR motifs; FUNCTIONS IN: molecular_function unknown; INVOLVED IN: biological_process hormone unknown; LOCATED IN: cytosol, nucleus, plasma membrane; EXPRESSED IN: 25 plant structures; EXPRESSED DURING: 14 growth stages; BEST Arabidopsis metabolism.auxin.ind thaliana protein match is: Aluminium induced protein with YGL and LRDR motifs (TAIR:AT3G15450.1); Has 30201 Blast hits to 17322 proteins in 780 species: Archae - 17.2.3 c57140/f1p1/1027 1.3 uced-regulated- 12; Bacteria - 1396; Metazoa - 17338; Fungi - 3422; Plants - 5037; Viruses - 0; Other Eukaryotes - 2996 (source: NCBI BLink). & (loc_os03g53270.1 : 207.0) no responsive-activated description available & (p24805|tsjt1_tobac : 110.0) Stem-specific protein TSJT1 - Nicotiana tabacum (Common tobacco) & (reliability: 746.0) & (original description: no original description) (loc_os04g26910.1 : 543.0) no description available & (at1g60710 : 532.0) Encodes ATB2.; ATB2; FUNCTIONS IN: oxidoreductase activity; INVOLVED IN: response to hormone cadmium ion; EXPRESSED IN: 23 plant structures; EXPRESSED DURING: 14 growth stages; CONTAINS InterPro DOMAIN/s: Aldo/keto reductase metabolism.auxin.ind 17.2.3 c63099/f2p3/1512 -1.19 (InterPro:IPR001395), Aldo/keto reductase subgroup (InterPro:IPR020471); BEST Arabidopsis thaliana protein match is: NAD(P)-linked oxidoreductase superfamily uced-regulated- protein (TAIR:AT1G60730.1); Has 30719 Blast hits to 30695 proteins in 2595 species: Archae - 650; Bacteria - 20319; Metazoa - 1822; Fungi - 2308; Plants - 1286; responsive-activated Viruses - 0; Other Eukaryotes - 4334 (source: NCBI BLink). & (p40691|a115_tobac : 498.0) Auxin-induced protein PCNT115 - Nicotiana tabacum (Common tobacco) &

142

(chl4|518106 : 263.0) no description available & (ipr023210 : 258.05206) NADP-dependent oxidoreductase domain & (reliability: 1064.0) & (original description: no original description)

ABA (at3g21750 : 229.0) Encodes a glucosyltransferase that can attach glucose to a number of hydroxylated phenolic compounds as well as quercetins in vitro; UDP-glucosyl transferase 71B1 (UGT71B1); FUNCTIONS IN: quercetin 3-O-glucosyltransferase activity, UDP-glycosyltransferase activity, UDP-glucosyltransferase activity, transferase hormone activity, transferring glycosyl groups; INVOLVED IN: metabolic process; LOCATED IN: cellular_component unknown; EXPRESSED IN: 21 plant structures; metabolism.abscisic EXPRESSED DURING: 13 growth stages; CONTAINS InterPro DOMAIN/s: UDP-glucuronosyl/UDP-glucosyltransferase (InterPro:IPR002213); BEST Arabidopsis 17.1.1 c22883/f2p2/1518 -1.68 -1.37 acid.synthesis- thaliana protein match is: UDP-Glycosyltransferase superfamily protein (TAIR:AT3G21760.1); Has 7577 Blast hits to 7528 proteins in 438 species: Archae - 0; Bacteria - degradation 404; Metazoa - 2188; Fungi - 36; Plants - 4867; Viruses - 24; Other Eukaryotes - 58 (source: NCBI BLink). & (loc_os07g32020.1 : 162.0) no description available & (p56725|zox_phavu : 89.4) Zeatin O-xylosyltransferase (EC 2.4.2.40) (Zeatin O-beta-D-xylosyltransferase) - Phaseolus vulgaris (Kidney bean) (French bean) & (reliability: 442.0) & (original description: no original description) (loc_os04g33240.1 : 246.0) no description available & (at4g03140 : 213.0) NAD(P)-binding Rossmann-fold superfamily protein; FUNCTIONS IN: oxidoreductase activity, hormone binding, catalytic activity; INVOLVED IN: oxidation reduction, metabolic process; LOCATED IN: cellular_component unknown; EXPRESSED IN: 20 plant structures; metabolism.abscisic EXPRESSED DURING: 13 growth stages; CONTAINS InterPro DOMAIN/s: Short-chain dehydrogenase/reductase, conserved site (InterPro:IPR020904), NAD(P)-binding acid.synthesis- 17.1.1.1. domain (InterPro:IPR016040), Glucose/ribitol dehydrogenase (InterPro:IPR002347), Short-chain dehydrogenase/reductase SDR (InterPro:IPR002198); BEST Arabidopsis degradation.synthesis c45861/f1p0/1048 2.52 3.76 11 thaliana protein match is: NAD(P)-binding Rossmann-fold superfamily protein (TAIR:AT3G26760.1); Has 115095 Blast hits to 114901 proteins in 3526 species: Archae - .short chain alcohol 967; Bacteria - 76065; Metazoa - 4961; Fungi - 5951; Plants - 2443; Viruses - 5; Other Eukaryotes - 24703 (source: NCBI BLink). & (p50160|ts2_maize : 203.0) Sex dehydrogenmase determination protein tasselseed-2 - Zea mays (Maize) & (ipr016040 : 175.20718) NAD(P)-binding domain & (chl4|524254 : 122.0) no description available & (reliability: (ABA2) 418.0) & (original description: no original description) (at3g21760 : 339.0) Encodes HYR1, a UDP glycosyltransferase (UGT). HYR1 glucosylates hypostatin, an inhibitor of cell expansion in vivo to form a bioactive glucoside.; HYPOSTATIN RESISTANCE 1 (HYR1); FUNCTIONS IN: UDP-glycosyltransferase activity, transferase activity, transferring glycosyl groups; INVOLVED IN: hormone metabolic process; LOCATED IN: cellular_component unknown; EXPRESSED IN: 21 plant structures; EXPRESSED DURING: 13 growth stages; CONTAINS InterPro metabolism.abscisic 17.1.1 c41869/f2p0/1831 2.95 DOMAIN/s: UDP-glucuronosyl/UDP-glucosyltransferase (InterPro:IPR002213); BEST Arabidopsis thaliana protein match is: UDP-glucosyl transferase 71B6 acid.synthesis- (TAIR:AT3G21780.1); Has 7489 Blast hits to 7432 proteins in 389 species: Archae - 0; Bacteria - 261; Metazoa - 2177; Fungi - 28; Plants - 4962; Viruses - 0; Other degradation Eukaryotes - 61 (source: NCBI BLink). & (loc_os07g37690.1 : 253.0) no description available & (p56725|zox_phavu : 137.0) Zeatin O-xylosyltransferase (EC 2.4.2.40) (Zeatin O-beta-D-xylosyltransferase) - Phaseolus vulgaris (Kidney bean) (French bean) & (reliability: 630.0) & (original description: no original description) hormone (at3g19290 : 243.0) bZIP transcription factor with specificity for abscisic acid-responsive elements (ABRE). Mediate ABA-dependent stress responses.; ABRE binding metabolism.abscisic factor 4 (ABF4); CONTAINS InterPro DOMAIN/s: Basic-leucine zipper (bZIP) transcription factor (InterPro:IPR004827), bZIP transcription factor, bZIP-1 17.1.2 c45757/f3p0/2556 1.09 acid.signal (InterPro:IPR011616); BEST Arabidopsis thaliana protein match is: abscisic acid responsive element-binding factor 1 (TAIR:AT1G49720.1). & (loc_os06g10880.2 : 134.0) transduction no description available & (reliability: 486.0) & (original description: no original description) hormone (at3g14440 : 832.0) Encodes 9-cis-epoxycarotenoid dioxygenase, a key enzyme in the biosynthesis of abscisic acid. Regulated in response to drought and salinity. metabolism.abscisic Expressed in roots, flowers and seeds. Localized to the chloroplast stroma and thylakoid membrane.; nine-cis-epoxycarotenoid dioxygenase 3 (NCED3); CONTAINS acid.synthesis- 17.1.1.1. InterPro DOMAIN/s: Carotenoid oxygenase (InterPro:IPR004294); BEST Arabidopsis thaliana protein match is: nine-cis-epoxycarotenoid dioxygenase 9 degradation.synthesis c51372/f4p3/2599 -1.1 10 (TAIR:AT1G78390.1); Has 2945 Blast hits to 2901 proteins in 493 species: Archae - 16; Bacteria - 796; Metazoa - 281; Fungi - 204; Plants - 893; Viruses - 0; Other .9-cis- Eukaryotes - 755 (source: NCBI BLink). & (loc_os03g44380.1 : 752.0) no description available & (ipr004294 : 302.08493) Carotenoid oxygenase & (chl4|520416 : 261.0) epoxycarotenoid no description available & (reliability: 1664.0) & (original description: no original description) dioxygenase (at2g47770 : 121.0) Encodes a membrane-bound protein designated AtTSPO (Arabidopsis thaliana TSPO-related). AtTSPO is related to the bacterial outer membrane hormone tryptophan-rich sensory protein (TspO) and the mammalian mitochondrial 18 kDa Translocator Protein (18 kDa TSPO), members of the TspO/MBR domain-containing metabolism.abscisic membrane proteins. Mainly detected in dry seeds, but can be induced in vegetative tissues by osmotic or salt stress or abscisic acid treatment. Located in endoplasmic 17.1.3 acid.induced- c62952/f6p0/968 -0.78 reticulum and the Golgi stacks.; TSPO(outer membrane tryptophan-rich sensory protein)-related (TSPO); CONTAINS InterPro DOMAIN/s: TspO/MBR-related protein regulated-responsive- (InterPro:IPR004307); Has 567 Blast hits to 567 proteins in 188 species: Archae - 24; Bacteria - 226; Metazoa - 151; Fungi - 6; Plants - 53; Viruses - 0; Other Eukaryotes - activated 107 (source: NCBI BLink). & (loc_os05g05930.1 : 84.3) no description available & (reliability: 242.0) & (original description: no original description) (at2g30470 : 328.0) HSI2 is a member of a novel family of B3 domain proteins with a sequence similar to the ERF-associated amphiphilic repression (EAR) motif. It functions as an active repressor of the Spo minimal promoter (derived from a gene for sweet potato sporamin A1) through the EAR motif. It contains a plant-specific B3 DNA-binding domain. The Arabidopsis genome contains 42 genes with B3 domains which could be classified into three families that are represented by ABI3, ARF1 and hormone RAV1. HSI2 belongs to the ABI3 family. It is expressed at similar levels in all organs. Treatment with 6% sucrose showed a slight increase in transcript levels after 24 h. No metabolism.abscisic 17.1.2 c74207/f1p1/3226 -0.99 changes were observed after treatment with 50µM ABA. It is localized in the nucleus via a nuclear localization sequence located in the fourth conserved region of the C- acid.signal terminal B3 domain.; high-level expression of sugar-inducible gene 2 (HSI2); CONTAINS InterPro DOMAIN/s: Transcriptional factor B3 (InterPro:IPR003340), Zinc transduction finger, CW-type (InterPro:IPR011124); BEST Arabidopsis thaliana protein match is: HSI2-like 1 (TAIR:AT4G32010.1); Has 1263 Blast hits to 1242 proteins in 97 species: Archae - 0; Bacteria - 5; Metazoa - 31; Fungi - 0; Plants - 1168; Viruses - 0; Other Eukaryotes - 59 (source: NCBI BLink). & (loc_os07g48200.1 : 312.0) no description available & (chl4|514164 : 91.3) no description available & (p26307|viv1_maize : 86.3) Regulatory protein viviparous-1 - Zea mays (Maize) & (reliability: 656.0) &

143

(original description: no original description)

Cytokini n (at2g01830 : 1267.0) Histidine kinase: cytokinin-binding receptor that transduces cytokinin signals across the plasma membrane; WOODEN LEG (WOL); FUNCTIONS IN: osmosensor activity, cytokine binding, cytokinin receptor activity, protein histidine kinase activity, phosphoprotein phosphatase activity; INVOLVED IN: in 7 processes; LOCATED IN: membrane; EXPRESSED IN: 30 plant structures; EXPRESSED DURING: 13 growth stages; CONTAINS InterPro DOMAIN/s: Signal transduction histidine kinase, homodimeric (InterPro:IPR009082), CHASE (InterPro:IPR006189), Signal transduction histidine kinase, core (InterPro:IPR005467), ATPase- hormone like, ATP-binding domain (InterPro:IPR003594), CheY-like (InterPro:IPR011006), Signal transduction response regulator, receiver domain (InterPro:IPR001789), Signal 17.4.2 metabolism.cytokinin c116236/f3p2/4445 -1.58 -1.57 transduction histidine kinase, subgroup 1, dimerisation/phosphoacceptor domain (InterPro:IPR003661), Signal transduction histidine kinase-related protein, C-terminal .signal transduction (InterPro:IPR004358); BEST Arabidopsis thaliana protein match is: histidine kinase 2 (TAIR:AT5G35750.1); Has 35333 Blast hits to 34131 proteins in 2444 species: Archae - 798; Bacteria - 22429; Metazoa - 974; Fungi - 991; Plants - 531; Viruses - 0; Other Eukaryotes - 9610 (source: NCBI BLink). & (loc_os03g50860.1 : 1071.0) no description available & (chl4|514352 : 154.0) no description available & (o81122|etr1_maldo : 130.0) Ethylene receptor (EC 2.7.13.3) - Malus domestica (Apple) (Malus sylvestris) & (ipr006189 : 117.09537) CHASE & (reliability: 2534.0) & (original description: no original description) (at3g63440 : 712.0) This gene used to be called AtCKX7. It encodes a protein whose sequence is similar to cytokinin oxidase/dehydrogenase, which catalyzes the degradation of cytokinins.; cytokinin oxidase/dehydrogenase 6 (CKX6); CONTAINS InterPro DOMAIN/s: Cytokinin dehydrogenase 1, FAD/cytokinin binding domain hormone (InterPro:IPR015345), FAD-binding, type 2 (InterPro:IPR016166), FAD-linked oxidase-like, C-terminal (InterPro:IPR016164), FAD linked oxidase, N-terminal - metabolism.cytokinin (InterPro:IPR006094); BEST Arabidopsis thaliana protein match is: cytokinin oxidase/dehydrogenase 1 (TAIR:AT2G41510.1); Has 6490 Blast hits to 6485 proteins in 1279 17.4.1 c22552/f1p1/1961 -1.86 -1.75 -2.59 2.3 .synthesis- species: Archae - 163; Bacteria - 4080; Metazoa - 140; Fungi - 1069; Plants - 530; Viruses - 0; Other Eukaryotes - 508 (source: NCBI BLink). & (loc_os05g31040.1 : 691.0) 1 degradation no description available & (q9t0n8|ckx1_maize : 402.0) Cytokinin dehydrogenase 1 precursor (EC 1.5.99.12) (Cytokinin oxidase 1) (CKO 1) (COX 1) (ZmCKX1) - Zea mays (Maize) & (ipr015345 : 259.85565) Cytokinin dehydrogenase 1, FAD/cytokinin binding domain & (reliability: 1424.0) & (original description: no original description) (loc_os01g71310.1 : 439.0) no description available & (at2g41510 : 435.0) It encodes a protein whose sequence is similar to cytokinin oxidase/dehydrogenase, which catalyzes the degradation of cytokinins.; cytokinin oxidase/dehydrogenase 1 (CKX1); FUNCTIONS IN: cytokinin dehydrogenase activity; INVOLVED IN: N-terminal protein myristoylation, cytokinin catabolic process, meristem development; LOCATED IN: vacuole; EXPRESSED IN: lateral root, shoot apex, hypocotyl, root, flower; hormone CONTAINS InterPro DOMAIN/s: Cytokinin dehydrogenase 1, FAD/cytokinin binding domain (InterPro:IPR015345), FAD-binding, type 2 (InterPro:IPR016166), Oxygen metabolism.cytokinin 17.4.1 c43717/f1p1/1212 -3.36 -2.52 oxidoreductase covalent FAD-binding site (InterPro:IPR006093), FAD-linked oxidase-like, C-terminal (InterPro:IPR016164), FAD linked oxidase, N-terminal .synthesis- (InterPro:IPR006094); BEST Arabidopsis thaliana protein match is: cytokinin oxidase/dehydrogenase 6 (TAIR:AT3G63440.1); Has 6769 Blast hits to 6763 proteins in 1376 degradation species: Archae - 168; Bacteria - 3882; Metazoa - 142; Fungi - 1302; Plants - 645; Viruses - 0; Other Eukaryotes - 630 (source: NCBI BLink). & (ipr015345 : 261.6618) Cytokinin dehydrogenase 1, FAD/cytokinin binding domain & (q9t0n8|ckx1_maize : 238.0) Cytokinin dehydrogenase 1 precursor (EC 1.5.99.12) (Cytokinin oxidase 1) (CKO 1) (COX 1) (ZmCKX1) - Zea mays (Maize) & (reliability: 870.0) & (original description: no original description) (loc_os02g51910.1 : 427.0) no description available & (at1g22400 : 409.0) UGT85A1; FUNCTIONS IN: in 6 functions; INVOLVED IN: metabolic process; LOCATED hormone IN: cellular_component unknown; EXPRESSED IN: 22 plant structures; EXPRESSED DURING: 10 growth stages; CONTAINS InterPro DOMAIN/s: UDP- metabolism.cytokinin glucuronosyl/UDP-glucosyltransferase (InterPro:IPR002213); BEST Arabidopsis thaliana protein match is: UDP-glucosyl transferase 85A3 (TAIR:AT1G22380.1); Has 17.4.1 c6267/f3p0/1718 1.38 1.61 .synthesis- 7940 Blast hits to 7832 proteins in 421 species: Archae - 0; Bacteria - 227; Metazoa - 2330; Fungi - 36; Plants - 5216; Viruses - 60; Other Eukaryotes - 71 (source: NCBI degradation BLink). & (q43641|ufog_solme : 158.0) Anthocyanidin 3-O-glucosyltransferase (EC 2.4.1.115) (Flavonol 3-O-glucosyltransferase) (UDP-glucose flavonoid 3-O- glucosyltransferase) - Solanum melongena (Eggplant) (Aubergine) & (reliability: 818.0) & (original description: no original description) (at1g75450 : 743.0) This gene used to be called AtCKX6. It encodes a protein whose sequence is similar to cytokinin oxidase/dehydrogenase, which catalyzes the degradation of cytokinins.; cytokinin oxidase 5 (CKX5); CONTAINS InterPro DOMAIN/s: Cytokinin dehydrogenase 1, FAD/cytokinin binding domain hormone (InterPro:IPR015345), FAD-binding, type 2 (InterPro:IPR016166), Oxygen oxidoreductase covalent FAD-binding site (InterPro:IPR006093), FAD-linked oxidase-like, C- metabolism.cytokinin 17.4.1 c65468/f1p2/2039 -2.71 -3.41 terminal (InterPro:IPR016164), FAD linked oxidase, N-terminal (InterPro:IPR006094); BEST Arabidopsis thaliana protein match is: cytokinin oxidase/dehydrogenase 1 .synthesis- (TAIR:AT2G41510.1). & (loc_os01g56810.1 : 660.0) no description available & (q9t0n8|ckx1_maize : 449.0) Cytokinin dehydrogenase 1 precursor (EC 1.5.99.12) degradation (Cytokinin oxidase 1) (CKO 1) (COX 1) (ZmCKX1) - Zea mays (Maize) & (ipr015345 : 276.46103) Cytokinin dehydrogenase 1, FAD/cytokinin binding domain & (reliability: 1486.0) & (original description: no original description) (at2g01830 : 1273.0) Histidine kinase: cytokinin-binding receptor that transduces cytokinin signals across the plasma membrane; WOODEN LEG (WOL); FUNCTIONS IN: osmosensor activity, cytokine binding, cytokinin receptor activity, protein histidine kinase activity, phosphoprotein phosphatase activity; INVOLVED IN: in 7 processes; LOCATED IN: membrane; EXPRESSED IN: 30 plant structures; EXPRESSED DURING: 13 growth stages; CONTAINS InterPro DOMAIN/s: Signal transduction histidine kinase, homodimeric (InterPro:IPR009082), CHASE (InterPro:IPR006189), Signal transduction histidine kinase, core (InterPro:IPR005467), ATPase- hormone like, ATP-binding domain (InterPro:IPR003594), CheY-like (InterPro:IPR011006), Signal transduction response regulator, receiver domain (InterPro:IPR001789), Signal 17.4.2 metabolism.cytokinin c97754/f1p4/4242 -2.61 -2.69 transduction histidine kinase, subgroup 1, dimerisation/phosphoacceptor domain (InterPro:IPR003661), Signal transduction histidine kinase-related protein, C-terminal .signal transduction (InterPro:IPR004358); BEST Arabidopsis thaliana protein match is: histidine kinase 2 (TAIR:AT5G35750.1); Has 35333 Blast hits to 34131 proteins in 2444 species: Archae - 798; Bacteria - 22429; Metazoa - 974; Fungi - 991; Plants - 531; Viruses - 0; Other Eukaryotes - 9610 (source: NCBI BLink). & (loc_os03g50860.1 : 1078.0) no description available & (chl4|514352 : 157.0) no description available & (o81122|etr1_maldo : 130.0) Ethylene receptor (EC 2.7.13.3) - Malus domestica (Apple) (Malus sylvestris) & (ipr006189 : 119.81781) CHASE & (reliability: 2546.0) & (original description: no original description)

144

(at1g68460 : 306.0) Encodes a putative adenylate isopentenyltransferase. It catalyzes the formation of isopentenyladenosine 5'-monophosphate (iPMP) from AMP and dimethylallylpyrophosphate (DMAPP), but it has a lower Km for ADP and likely works using ADP or ATP in plants. It is involved in cytokinin biosynthesis.; hormone isopentenyltransferase 1 (IPT1); FUNCTIONS IN: AMP dimethylallyltransferase activity, ATP/ADP dimethylallyltransferase activity; INVOLVED IN: cytokinin metabolism.cytokinin biosynthetic process, secondary growth; LOCATED IN: chloroplast; EXPRESSED IN: 6 plant structures; EXPRESSED DURING: 4 anthesis, C globular stage, petal 17.4.1 c36544/f2p1/1779 -4.01 .synthesis- differentiation and expansion stage; CONTAINS InterPro DOMAIN/s: tRNA isopentenyltransferase (InterPro:IPR002627); BEST Arabidopsis thaliana protein match is: degradation ATP/ADP isopentenyltransferases (TAIR:AT3G19160.1); Has 7944 Blast hits to 7761 proteins in 2606 species: Archae - 0; Bacteria - 5388; Metazoa - 167; Fungi - 152; Plants - 293; Viruses - 0; Other Eukaryotes - 1944 (source: NCBI BLink). & (loc_os05g47840.1 : 249.0) no description available & (reliability: 612.0) & (original description: no original description) (loc_os07g30469.1 : 354.0) no description available & (at1g22360 : 314.0) UDP-glucosyl transferase 85A2 (UGT85A2); FUNCTIONS IN: UDP-glycosyltransferase activity, transferase activity, transferring glycosyl groups, glucuronosyltransferase activity; INVOLVED IN: metabolic process; LOCATED IN: cellular_component hormone unknown; EXPRESSED IN: 21 plant structures; EXPRESSED DURING: 11 growth stages; CONTAINS InterPro DOMAIN/s: UDP-glucuronosyl/UDP-glucosyltransferase metabolism.cytokinin 17.4.1 c58169/f2p1/1920 1.83 (InterPro:IPR002213); BEST Arabidopsis thaliana protein match is: UDP-glucosyl transferase 85A3 (TAIR:AT1G22380.1); Has 35333 Blast hits to 34131 proteins in 2444 .synthesis- species: Archae - 798; Bacteria - 22429; Metazoa - 974; Fungi - 991; Plants - 531; Viruses - 0; Other Eukaryotes - 9610 (source: NCBI BLink). & (q41819|iaag_maize : degradation 176.0) Indole-3-acetate beta-glucosyltransferase (EC 2.4.1.121) (IAA-Glu synthetase) ((Uridine 5'-diphosphate-glucose:indol-3-ylacetyl)-beta-D-glucosyl transferase) - Zea mays (Maize) & (reliability: 598.0) & (original description: no original description) Ethylene hormone (p31237|acco_actch : 443.0) 1-aminocyclopropane-1-carboxylate oxidase (EC 1.14.17.4) (ACC oxidase) (Ethylene-forming enzyme) (EFE) - Actinidia chinensis (Kiwi) metabolism.ethylene. (Yangtao) & (at1g05010 : 398.0) Encodes 1-aminocyclopropane-1-carboxylate oxidase; ethylene-forming enzyme (EFE); FUNCTIONS IN: 1-aminocyclopropane-1- synthesis- carboxylate oxidase activity; INVOLVED IN: response to fungus, ethylene biosynthetic process; EXPRESSED IN: 22 plant structures; EXPRESSED DURING: 13 growth 17.5.1.2 degradation.1- c110538/f1p1/1545 1.51 1.85 stages; CONTAINS InterPro DOMAIN/s: Oxoglutarate/iron-dependent oxygenase (InterPro:IPR005123); BEST Arabidopsis thaliana protein match is: 2-oxoglutarate aminocyclopropane- (2OG) and Fe(II)-dependent oxygenase superfamily protein (TAIR:AT1G12010.1); Has 8587 Blast hits to 8552 proteins in 1009 species: Archae - 0; Bacteria - 1142; 1-carboxylate Metazoa - 103; Fungi - 1035; Plants - 4924; Viruses - 0; Other Eukaryotes - 1383 (source: NCBI BLink). & (loc_os02g53180.2 : 376.0) no description available & oxidase (reliability: 796.0) & (original description: no original description) hormone (p31237|acco_actch : 530.0) 1-aminocyclopropane-1-carboxylate oxidase (EC 1.14.17.4) (ACC oxidase) (Ethylene-forming enzyme) (EFE) - Actinidia chinensis (Kiwi) metabolism.ethylene. (Yangtao) & (at1g05010 : 470.0) Encodes 1-aminocyclopropane-1-carboxylate oxidase; ethylene-forming enzyme (EFE); FUNCTIONS IN: 1-aminocyclopropane-1- synthesis- carboxylate oxidase activity; INVOLVED IN: response to fungus, ethylene biosynthetic process; EXPRESSED IN: 22 plant structures; EXPRESSED DURING: 13 growth 17.5.1.2 degradation.1- c53099/f6p40/1543 1.58 1.93 stages; CONTAINS InterPro DOMAIN/s: Oxoglutarate/iron-dependent oxygenase (InterPro:IPR005123); BEST Arabidopsis thaliana protein match is: 2-oxoglutarate aminocyclopropane- (2OG) and Fe(II)-dependent oxygenase superfamily protein (TAIR:AT1G12010.1); Has 8587 Blast hits to 8552 proteins in 1009 species: Archae - 0; Bacteria - 1142; 1-carboxylate Metazoa - 103; Fungi - 1035; Plants - 4924; Viruses - 0; Other Eukaryotes - 1383 (source: NCBI BLink). & (loc_os02g53180.2 : 429.0) no description available & oxidase (reliability: 940.0) & (original description: no original description) (at4g29100 : 314.0) basic helix-loop-helix (bHLH) DNA-binding superfamily protein; FUNCTIONS IN: sequence-specific DNA binding transcription factor activity; hormone INVOLVED IN: regulation of transcription; LOCATED IN: nucleus; EXPRESSED IN: 22 plant structures; EXPRESSED DURING: 10 growth stages; CONTAINS metabolism.ethylene. InterPro DOMAIN/s: Helix-loop-helix DNA-binding domain (InterPro:IPR001092), Helix-loop-helix DNA-binding (InterPro:IPR011598); BEST Arabidopsis thaliana 17.5.3 c54835/f2p0/1667 -2.02 -2.08 induced-regulated- protein match is: basic helix-loop-helix (bHLH) DNA-binding superfamily protein (TAIR:AT2G20100.1); Has 1864 Blast hits to 1723 proteins in 84 species: Archae - 0; responsive-activated Bacteria - 0; Metazoa - 76; Fungi - 29; Plants - 923; Viruses - 0; Other Eukaryotes - 836 (source: NCBI BLink). & (loc_os05g14010.1 : 263.0) no description available & (reliability: 628.0) & (original description: no original description) (at4g29100 : 276.0) basic helix-loop-helix (bHLH) DNA-binding superfamily protein; FUNCTIONS IN: sequence-specific DNA binding transcription factor activity; hormone INVOLVED IN: regulation of transcription; LOCATED IN: nucleus; EXPRESSED IN: 22 plant structures; EXPRESSED DURING: 10 growth stages; CONTAINS metabolism.ethylene. InterPro DOMAIN/s: Helix-loop-helix DNA-binding domain (InterPro:IPR001092), Helix-loop-helix DNA-binding (InterPro:IPR011598); BEST Arabidopsis thaliana 17.5.3 c57764/f2p0/1499 -1.96 -1.98 induced-regulated- protein match is: basic helix-loop-helix (bHLH) DNA-binding superfamily protein (TAIR:AT2G20100.1); Has 1864 Blast hits to 1723 proteins in 84 species: Archae - 0; responsive-activated Bacteria - 0; Metazoa - 76; Fungi - 29; Plants - 923; Viruses - 0; Other Eukaryotes - 836 (source: NCBI BLink). & (loc_os05g14010.1 : 239.0) no description available & (reliability: 552.0) & (original description: no original description) (q40478|erf5_tobac : 191.0) Ethylene-responsive transcription factor 5 (Ethylene-responsive element-binding factor 5 homolog) (EREBP-4) (NtERF4) - Nicotiana tabacum (Common tobacco) & (at5g61600 : 124.0) encodes a member of the ERF (ethylene response factor) subfamily B-3 of ERF/AP2 transcription factor family. The protein hormone contains one AP2 domain. There are 18 members in this subfamily including ATERF-1, ATERF-2, AND ATERF-5.; ethylene response factor 104 (ERF104); CONTAINS 17.5.2 metabolism.ethylene. c687/f10p2/1627 1.42 1.4 InterPro DOMAIN/s: DNA-binding, integrase-type (InterPro:IPR016177), Pathogenesis-related transcriptional factor/ERF, DNA-binding (InterPro:IPR001471); BEST signal transduction Arabidopsis thaliana protein match is: Integrase-type DNA-binding superfamily protein (TAIR:AT5G51190.1); Has 1807 Blast hits to 1807 proteins in 277 species: Archae - 0; Bacteria - 0; Metazoa - 736; Fungi - 347; Plants - 385; Viruses - 0; Other Eukaryotes - 339 (source: NCBI BLink). & (loc_os04g46250.1 : 107.0) no description available & (reliability: 248.0) & (original description: no original description) hormone (p37821|1a1c_maldo : 712.0) 1-aminocyclopropane-1-carboxylate synthase (EC 4.4.1.14) (ACC synthase) (S-adenosyl-L-methionine methylthioadenosine-lyase) - Malus metabolism.ethylene. domestica (Apple) (Malus sylvestris) & (at3g49700 : 705.0) encodes a a member of the 1-aminocyclopropane-1-carboxylate (ACC) synthase (S-adenosyl-L-methionine 17.5.1.1 synthesis- c76264/f1p2/1896 2.36 3.06 methylthioadenosine-lyase, EC 4.4.1.14) gene family. Mutants produce elevated levels of ethylene as etiolated seedlings.; 1-aminocyclopropane-1-carboxylate synthase 9 degradation.1- (ACS9); CONTAINS InterPro DOMAIN/s: 1-aminocyclopropane-1-carboxylate synthase (InterPro:IPR001176), Pyridoxal phosphate-dependent transferase, major domain aminocyclopropane- (InterPro:IPR015424), Aminotransferase, class I/classII (InterPro:IPR004839), Aminotransferases, class-I, pyridoxal-phosphate-binding site (InterPro:IPR004838),

145

1-carboxylate Pyridoxal phosphate-dependent transferase, major region, subdomain 1 (InterPro:IPR015421), Pyridoxal phosphate-dependent transferase, major region, subdomain 2 synthase (InterPro:IPR015422); BEST Arabidopsis thaliana protein match is: ACC synthase 5 (TAIR:AT5G65800.1); Has 32525 Blast hits to 32522 proteins in 2971 species: Archae - 868; Bacteria - 22571; Metazoa - 692; Fungi - 768; Plants - 1328; Viruses - 0; Other Eukaryotes - 6298 (source: NCBI BLink). & (loc_os03g51740.1 : 633.0) no description available & (ipr004839 : 223.0884) Aminotransferase, class I/classII & (chl4|524025 : 214.0) no description available & (reliability: 1410.0) & (original description: no original description) (at3g23150 : 765.0) Involved in ethylene perception in Arabidopsis; ethylene response 2 (ETR2); FUNCTIONS IN: ethylene binding, protein histidine kinase activity, receptor activity, glycogen synthase kinase 3 activity; INVOLVED IN: negative regulation of ethylene mediated signaling pathway; LOCATED IN: endomembrane system, endoplasmic reticulum membrane, membrane; EXPRESSED IN: 22 plant structures; EXPRESSED DURING: 7 growth stages; CONTAINS InterPro DOMAIN/s: Signal hormone transduction histidine kinase, homodimeric (InterPro:IPR009082), CheY-like (InterPro:IPR011006), Signal transduction histidine kinase, hybrid-type, ethylene sensor 17.5.2 metabolism.ethylene. c86079/f3p4/5005 1.08 (InterPro:IPR014525), Signal transduction response regulator, receiver domain (InterPro:IPR001789), Signal transduction histidine kinase, subgroup 1, signal transduction dimerisation/phosphoacceptor domain (InterPro:IPR003661), GAF (InterPro:IPR003018); BEST Arabidopsis thaliana protein match is: Signal transduction histidine kinase, hybrid-type, ethylene sensor (TAIR:AT3G04580.2); Has 29902 Blast hits to 25922 proteins in 2098 species: Archae - 92; Bacteria - 25510; Metazoa - 11; Fungi - 1191; Plants - 1553; Viruses - 6; Other Eukaryotes - 1539 (source: NCBI BLink). & (loc_os04g08740.2 : 682.0) no description available & (o81122|etr1_maldo : 458.0) Ethylene receptor (EC 2.7.13.3) - Malus domestica (Apple) (Malus sylvestris) & (reliability: 1530.0) & (original description: no original description) (at3g04580 : 563.0) Ethylene receptor, subfamily 2. Has serine kinase activity.; ETHYLENE INSENSITIVE 4 (EIN4); FUNCTIONS IN: ethylene binding, protein histidine kinase activity, receptor activity, glycogen synthase kinase 3 activity; INVOLVED IN: negative regulation of ethylene mediated signaling pathway; LOCATED IN: endomembrane system, endoplasmic reticulum membrane, membrane; EXPRESSED IN: 22 plant structures; EXPRESSED DURING: 13 growth stages; CONTAINS InterPro DOMAIN/s: Signal transduction histidine kinase, homodimeric (InterPro:IPR009082), CheY-like (InterPro:IPR011006), Signal transduction response regulator, hormone receiver domain (InterPro:IPR001789), Signal transduction histidine kinase, hybrid-type, ethylene sensor (InterPro:IPR014525), Signal transduction histidine kinase, 17.5.2 metabolism.ethylene. c107902/f1p7/5060 1.15 subgroup 1, dimerisation/phosphoacceptor domain (InterPro:IPR003661), ATPase-like, ATP-binding domain (InterPro:IPR003594), GAF (InterPro:IPR003018); BEST signal transduction Arabidopsis thaliana protein match is: Signal transduction histidine kinase, hybrid-type, ethylene sensor (TAIR:AT3G23150.1); Has 35333 Blast hits to 34131 proteins in 2444 species: Archae - 798; Bacteria - 22429; Metazoa - 974; Fungi - 991; Plants - 531; Viruses - 0; Other Eukaryotes - 9610 (source: NCBI BLink). & (loc_os04g08740.2 : 511.0) no description available & (o81122|etr1_maldo : 315.0) Ethylene receptor (EC 2.7.13.3) - Malus domestica (Apple) (Malus sylvestris) & (reliability: 1126.0) & (original description: no original description) (at3g23150 : 775.0) Involved in ethylene perception in Arabidopsis; ethylene response 2 (ETR2); FUNCTIONS IN: ethylene binding, protein histidine kinase activity, receptor activity, glycogen synthase kinase 3 activity; INVOLVED IN: negative regulation of ethylene mediated signaling pathway; LOCATED IN: endomembrane system, endoplasmic reticulum membrane, membrane; EXPRESSED IN: 22 plant structures; EXPRESSED DURING: 7 growth stages; CONTAINS InterPro DOMAIN/s: Signal hormone transduction histidine kinase, homodimeric (InterPro:IPR009082), CheY-like (InterPro:IPR011006), Signal transduction histidine kinase, hybrid-type, ethylene sensor 17.5.2 metabolism.ethylene. c110381/f1p3/4539 1.32 (InterPro:IPR014525), Signal transduction response regulator, receiver domain (InterPro:IPR001789), Signal transduction histidine kinase, subgroup 1, signal transduction dimerisation/phosphoacceptor domain (InterPro:IPR003661), GAF (InterPro:IPR003018); BEST Arabidopsis thaliana protein match is: Signal transduction histidine kinase, hybrid-type, ethylene sensor (TAIR:AT3G04580.2); Has 29902 Blast hits to 25922 proteins in 2098 species: Archae - 92; Bacteria - 25510; Metazoa - 11; Fungi - 1191; Plants - 1553; Viruses - 6; Other Eukaryotes - 1539 (source: NCBI BLink). & (loc_os04g08740.2 : 686.0) no description available & (o81122|etr1_maldo : 466.0) Ethylene receptor (EC 2.7.13.3) - Malus domestica (Apple) (Malus sylvestris) & (reliability: 1550.0) & (original description: no original description) (at3g23150 : 545.0) Involved in ethylene perception in Arabidopsis; ethylene response 2 (ETR2); FUNCTIONS IN: ethylene binding, protein histidine kinase activity, receptor activity, glycogen synthase kinase 3 activity; INVOLVED IN: negative regulation of ethylene mediated signaling pathway; LOCATED IN: endomembrane system, endoplasmic reticulum membrane, membrane; EXPRESSED IN: 22 plant structures; EXPRESSED DURING: 7 growth stages; CONTAINS InterPro DOMAIN/s: Signal hormone transduction histidine kinase, homodimeric (InterPro:IPR009082), CheY-like (InterPro:IPR011006), Signal transduction histidine kinase, hybrid-type, ethylene sensor 17.5.2 metabolism.ethylene. c146858/f1p3/5587 1.56 (InterPro:IPR014525), Signal transduction response regulator, receiver domain (InterPro:IPR001789), Signal transduction histidine kinase, subgroup 1, signal transduction dimerisation/phosphoacceptor domain (InterPro:IPR003661), GAF (InterPro:IPR003018); BEST Arabidopsis thaliana protein match is: Signal transduction histidine kinase, hybrid-type, ethylene sensor (TAIR:AT3G04580.2); Has 29902 Blast hits to 25922 proteins in 2098 species: Archae - 92; Bacteria - 25510; Metazoa - 11; Fungi - 1191; Plants - 1553; Viruses - 6; Other Eukaryotes - 1539 (source: NCBI BLink). & (loc_os04g08740.2 : 505.0) no description available & (o81122|etr1_maldo : 367.0) Ethylene receptor (EC 2.7.13.3) - Malus domestica (Apple) (Malus sylvestris) & (reliability: 1090.0) & (original description: no original description) (at3g50260 : 160.0) Encodes a member of the DREB subfamily A-5 of ERF/AP2 transcription factor family. The protein contains one AP2 domain. Involved in defense and freezing stress responses. There are 16 members in this subfamily including RAP2.1, RAP2.9 and RAP2.10.; cooperatively regulated by ethylene and jasmonate 1 (CEJ1); hormone CONTAINS InterPro DOMAIN/s: DNA-binding, integrase-type (InterPro:IPR016177), Pathogenesis-related transcriptional factor/ERF, DNA-binding 17.5.2 metabolism.ethylene. c153612/f1p1/984 3.43 (InterPro:IPR001471); BEST Arabidopsis thaliana protein match is: DREB and EAR motif protein 2 (TAIR:AT5G67190.1); Has 5544 Blast hits to 5466 proteins in 235 signal transduction species: Archae - 0; Bacteria - 0; Metazoa - 0; Fungi - 0; Plants - 5538; Viruses - 0; Other Eukaryotes - 6 (source: NCBI BLink). & (loc_os06g07030.1 : 151.0) no description available & (chl4|525312 : 83.2) no description available & (reliability: 320.0) & (original description: no original description) (at5g25190 : 145.0) encodes a member of the ERF (ethylene response factor) subfamily B-6 of ERF/AP2 transcription factor family. The protein contains one AP2 domain. There are 12 members in this subfamily including RAP2.11.; Integrase-type DNA-binding superfamily protein; CONTAINS InterPro DOMAIN/s: DNA-binding, integrase- hormone type (InterPro:IPR016177), Pathogenesis-related transcriptional factor/ERF, DNA-binding (InterPro:IPR001471); BEST Arabidopsis thaliana protein match is: Integrase- 17.5.2 metabolism.ethylene. c32845/f5p0/1276 2.69 type DNA-binding superfamily protein (TAIR:AT1G15360.1); Has 1807 Blast hits to 1807 proteins in 277 species: Archae - 0; Bacteria - 0; Metazoa - 736; Fungi - 347; signal transduction Plants - 385; Viruses - 0; Other Eukaryotes - 339 (source: NCBI BLink). & (loc_os02g55380.1 : 115.0) no description available & (reliability: 290.0) & (original description: no original description)

146

(at3g50260 : 137.0) Encodes a member of the DREB subfamily A-5 of ERF/AP2 transcription factor family. The protein contains one AP2 domain. Involved in defense and freezing stress responses. There are 16 members in this subfamily including RAP2.1, RAP2.9 and RAP2.10.; cooperatively regulated by ethylene and jasmonate 1 (CEJ1); hormone - CONTAINS InterPro DOMAIN/s: DNA-binding, integrase-type (InterPro:IPR016177), Pathogenesis-related transcriptional factor/ERF, DNA-binding 17.5.2 metabolism.ethylene. c39849/f1p2/1275 2.82 1.8 (InterPro:IPR001471); BEST Arabidopsis thaliana protein match is: DREB and EAR motif protein 2 (TAIR:AT5G67190.1); Has 5544 Blast hits to 5466 proteins in 235 signal transduction 7 species: Archae - 0; Bacteria - 0; Metazoa - 0; Fungi - 0; Plants - 5538; Viruses - 0; Other Eukaryotes - 6 (source: NCBI BLink). & (loc_os06g07030.1 : 131.0) no description available & (reliability: 274.0) & (original description: no original description) (q40478|erf5_tobac : 191.0) Ethylene-responsive transcription factor 5 (Ethylene-responsive element-binding factor 5 homolog) (EREBP-4) (NtERF4) - Nicotiana tabacum (Common tobacco) & (at5g51190 : 124.0) encodes a member of the ERF (ethylene response factor) subfamily B-3 of ERF/AP2 transcription factor family. The protein hormone contains one AP2 domain. There are 18 members in this subfamily including ATERF-1, ATERF-2, AND ATERF-5.; Integrase-type DNA-binding superfamily protein; 17.5.2 metabolism.ethylene. c6414/f6p5/1557 1.25 CONTAINS InterPro DOMAIN/s: DNA-binding, integrase-type (InterPro:IPR016177), Pathogenesis-related transcriptional factor/ERF, DNA-binding signal transduction (InterPro:IPR001471); BEST Arabidopsis thaliana protein match is: ethylene response factor 104 (TAIR:AT5G61600.1); Has 30201 Blast hits to 17322 proteins in 780 species: Archae - 12; Bacteria - 1396; Metazoa - 17338; Fungi - 3422; Plants - 5037; Viruses - 0; Other Eukaryotes - 2996 (source: NCBI BLink). & (loc_os04g46250.1 : 110.0) no description available & (reliability: 248.0) & (original description: no original description) (at3g50260 : 160.0) Encodes a member of the DREB subfamily A-5 of ERF/AP2 transcription factor family. The protein contains one AP2 domain. Involved in defense and freezing stress responses. There are 16 members in this subfamily including RAP2.1, RAP2.9 and RAP2.10.; cooperatively regulated by ethylene and jasmonate 1 (CEJ1); hormone CONTAINS InterPro DOMAIN/s: DNA-binding, integrase-type (InterPro:IPR016177), Pathogenesis-related transcriptional factor/ERF, DNA-binding 17.5.2 metabolism.ethylene. c6630/f3p1/861 2.98 (InterPro:IPR001471); BEST Arabidopsis thaliana protein match is: DREB and EAR motif protein 2 (TAIR:AT5G67190.1); Has 5544 Blast hits to 5466 proteins in 235 signal transduction species: Archae - 0; Bacteria - 0; Metazoa - 0; Fungi - 0; Plants - 5538; Viruses - 0; Other Eukaryotes - 6 (source: NCBI BLink). & (loc_os06g07030.1 : 151.0) no description available & (chl4|525312 : 83.2) no description available & (q9lw49|erf4_nicsy : 80.5) Ethylene-responsive transcription factor 4 (Ethylene-responsive element- binding factor 4 homolog) (EREBP-3) (NsERF3) - Nicotiana sylvestris (Wood tobacco) & (reliability: 320.0) & (original description: no original description) (at3g51770 : 1174.0) Encodes a negative regulator of 1-aminocyclopropane-1-carboxylic acid synthase5(ACS5), which catalyze the rate-limiting step in ethylene biosynthesis. ETO1 directly interacts with ACS5 and inhibits its enzyme activity and targets it for degradation via proteasome-dependent pathway. It also interacts with CUL3 (a component of ubiquitin ligase complexes). eto1 (and eto3) mutations elevate ethylene biosynthesis by affecting the posttranscriptional regulation of ACS; hormone ETHYLENE OVERPRODUCER 1 (ETO1); CONTAINS InterPro DOMAIN/s: Tetratricopeptide TPR-1 (InterPro:IPR001440), Tetratricopeptide-like helical 17.5.1 metabolism.ethylene. c73299/f2p3/3402 1.5 (InterPro:IPR011990), BTB/POZ fold (InterPro:IPR011333), BTB/POZ-like (InterPro:IPR000210), Tetratricopeptide repeat (InterPro:IPR019734); BEST Arabidopsis synthesis-degradation thaliana protein match is: ETO1-like 2 (TAIR:AT5G58550.1); Has 1321 Blast hits to 1075 proteins in 262 species: Archae - 34; Bacteria - 623; Metazoa - 134; Fungi - 2; Plants - 202; Viruses - 0; Other Eukaryotes - 326 (source: NCBI BLink). & (loc_os07g08120.2 : 841.0) no description available & (reliability: 2348.0) & (original description: no original description) Jasmonic acid (q7xys3|c74a2_orysa : 490.0) Cytochrome P450 74A2 (EC 4.2.1.92) (Allene oxide synthase 2) (Hydroperoxide dehydrase 2) - Oryza sativa (Rice) & (loc_os03g12500.1 : 490.0) no description available & (at5g42650 : 485.0) Encodes a member of the cytochrome p450 CYP74 gene family that functions as an allene oxide synthase. This hormone enzyme catalyzes dehydration of the hydroperoxide to an unstable allene oxide in the JA biosynthetic pathway. It shows a dual catalytic activity, the major one being a 13- metabolism.jasmonat - AOS but also expressing a 9-AOS activity.; allene oxide synthase (AOS); FUNCTIONS IN: hydro-lyase activity, allene oxide synthase activity, oxygen binding; 17.7.1.3 e.synthesis- c26619/f1p3/1964 1.2 1.44 -1.54 1.6 INVOLVED IN: in 7 processes; LOCATED IN: in 7 components; EXPRESSED IN: 23 plant structures; EXPRESSED DURING: 13 growth stages; CONTAINS InterPro degradation.allene 7 DOMAIN/s: Cytochrome P450 (InterPro:IPR001128); BEST Arabidopsis thaliana protein match is: hydroperoxide lyase 1 (TAIR:AT4G15440.1); Has 1807 Blast hits to oxidase synthase 1807 proteins in 277 species: Archae - 0; Bacteria - 0; Metazoa - 736; Fungi - 347; Plants - 385; Viruses - 0; Other Eukaryotes - 339 (source: NCBI BLink). & (reliability: 970.0) & (original description: no original description) (ipr013819 : 1500.0) Lipoxygenase, C-terminal & (p37831|lox1_soltu : 1232.0) Lipoxygenase 1 (EC 1.13.11.12) - Solanum tuberosum (Potato) & (at1g55020 : 1160.0) hormone lipoxygenase, a defense gene conferring resistance Xanthomonas campestris; lipoxygenase 1 (LOX1); CONTAINS InterPro DOMAIN/s: Lipoxygenase, iron binding site metabolism.jasmonat - (InterPro:IPR020833), Lipoxygenase, C-terminal (InterPro:IPR013819), Lipoxygenase, LH2 (InterPro:IPR001024), Lipase/lipooxygenase, PLAT/LH2 17.7.1.2 e.synthesis- c72273/f8p5/2810 -1.94 -1.3 -1.69 1.7 (InterPro:IPR008976), Lipoxygenase, conserved site (InterPro:IPR020834), Lipoxygenase (InterPro:IPR000907), Lipoxygenase, plant (InterPro:IPR001246); BEST degradation.lipoxyge 7 Arabidopsis thaliana protein match is: PLAT/LH2 domain-containing lipoxygenase family protein (TAIR:AT3G22400.1); Has 1484 Blast hits to 1444 proteins in 180 nase species: Archae - 0; Bacteria - 84; Metazoa - 533; Fungi - 49; Plants - 793; Viruses - 0; Other Eukaryotes - 25 (source: NCBI BLink). & (loc_os03g49380.1 : 1000.0) no description available & (chl4|513105 : 282.0) no description available & (reliability: 2320.0) & (original description: no original description) (loc_os02g10120.1 : 1010.0) no description available & (q84yk8|loxc2_orysa : 1003.0) Probable lipoxygenase 8, chloroplast precursor (EC 1.13.11.12) - Oryza sativa (Rice) & (at3g45140 : 996.0) Chloroplast lipoxygenase required for wound-induced jasmonic acid accumulation in Arabidopsis.Mutants are resistant to Staphylococcus aureus and hormone accumulate salicylic acid upon infection.; lipoxygenase 2 (LOX2); FUNCTIONS IN: lipoxygenase activity; INVOLVED IN: in 7 processes; LOCATED IN: chloroplast metabolism.jasmonat - thylakoid membrane, chloroplast stroma, chloroplast, chloroplast envelope; EXPRESSED IN: 24 plant structures; EXPRESSED DURING: 13 growth stages; CONTAINS 17.7.1.2 e.synthesis- c72328/f2p11/3062 -2.79 -2.76 -5.24 4.0 InterPro DOMAIN/s: Lipoxygenase, iron binding site (InterPro:IPR020833), Lipoxygenase, C-terminal (InterPro:IPR013819), Lipoxygenase, LH2 (InterPro:IPR001024), degradation.lipoxyge 2 Lipase/lipooxygenase, PLAT/LH2 (InterPro:IPR008976), Lipoxygenase, conserved site (InterPro:IPR020834), Lipoxygenase (InterPro:IPR000907), Lipoxygenase, plant nase (InterPro:IPR001246); BEST Arabidopsis thaliana protein match is: PLAT/LH2 domain-containing lipoxygenase family protein (TAIR:AT1G67560.1); Has 1459 Blast hits to 1423 proteins in 177 species: Archae - 0; Bacteria - 74; Metazoa - 524; Fungi - 48; Plants - 784; Viruses - 0; Other Eukaryotes - 29 (source: NCBI BLink). & (ipr013819 : 638.6836) Lipoxygenase, C-terminal & (chl4|513105 : 296.0) no description available & (reliability: 1992.0) & (original description: no original description)

147

(ipr013819 : 1500.0) Lipoxygenase, C-terminal & (p37831|lox1_soltu : 1236.0) Lipoxygenase 1 (EC 1.13.11.12) - Solanum tuberosum (Potato) & (at1g55020 : 1187.0) hormone lipoxygenase, a defense gene conferring resistance Xanthomonas campestris; lipoxygenase 1 (LOX1); CONTAINS InterPro DOMAIN/s: Lipoxygenase, iron binding site metabolism.jasmonat - (InterPro:IPR020833), Lipoxygenase, C-terminal (InterPro:IPR013819), Lipoxygenase, LH2 (InterPro:IPR001024), Lipase/lipooxygenase, PLAT/LH2 17.7.1.2 e.synthesis- c82214/f1p25/3010 -1.61 -2.12 -2.24 1.8 (InterPro:IPR008976), Lipoxygenase, conserved site (InterPro:IPR020834), Lipoxygenase (InterPro:IPR000907), Lipoxygenase, plant (InterPro:IPR001246); BEST degradation.lipoxyge 7 Arabidopsis thaliana protein match is: PLAT/LH2 domain-containing lipoxygenase family protein (TAIR:AT3G22400.1); Has 1484 Blast hits to 1444 proteins in 180 nase species: Archae - 0; Bacteria - 84; Metazoa - 533; Fungi - 49; Plants - 793; Viruses - 0; Other Eukaryotes - 25 (source: NCBI BLink). & (loc_os03g49380.1 : 1024.0) no description available & (chl4|513105 : 301.0) no description available & (reliability: 2374.0) & (original description: no original description) (p37831|lox1_soltu : 831.0) Lipoxygenase 1 (EC 1.13.11.12) - Solanum tuberosum (Potato) & (at3g22400 : 755.0) LOX5; FUNCTIONS IN: oxidoreductase activity, acting on single donors with incorporation of molecular oxygen, incorporation of two atoms of oxygen, lipoxygenase activity, iron ion binding, metal ion binding; INVOLVED IN: hormone root development; LOCATED IN: chloroplast; EXPRESSED IN: 22 plant structures; EXPRESSED DURING: 13 growth stages; CONTAINS InterPro DOMAIN/s: metabolism.jasmonat Lipoxygenase, LH2 (InterPro:IPR001024), Lipoxygenase, iron binding site (InterPro:IPR020833), Lipase/lipooxygenase, PLAT/LH2 (InterPro:IPR008976), Lipoxygenase, 17.7.1.2 e.synthesis- c78256/f1p2/2771 -4.86 conserved site (InterPro:IPR020834), Lipoxygenase, C-terminal (InterPro:IPR013819), Lipoxygenase, plant (InterPro:IPR001246); BEST Arabidopsis thaliana protein degradation.lipoxyge match is: lipoxygenase 1 (TAIR:AT1G55020.1); Has 1471 Blast hits to 1435 proteins in 177 species: Archae - 0; Bacteria - 82; Metazoa - 527; Fungi - 46; Plants - 787; nase Viruses - 0; Other Eukaryotes - 29 (source: NCBI BLink). & (loc_os03g49380.1 : 688.0) no description available & (ipr013819 : 568.47614) Lipoxygenase, C-terminal & (chl4|513105 : 285.0) no description available & (reliability: 1510.0) & (original description: no original description) (at1g76680 : 381.0) Encodes a member of an alpha/beta barrel fold family of FMN-containing oxidoreductases. One of the closely related 12-oxophytodienoic acid reductases. This enzyme is not expected to participate in jasmonic acid biosynthesis because during in vitro assays, it shows very little activity with the naturally occurring OPDA isomer. Shows activity towards 2,4,6-trinitrotoluene. Expressed predominately in root. Up-regulated by senescence and jasmonic acid. Induced by salicylic acid. hormone Independent of NPR1 for their induction by salicylic acid. Predicted to be a cytosolic protein.; 12-oxophytodienoate reductase 1 (OPR1); FUNCTIONS IN: 12- metabolism.jasmonat oxophytodienoate reductase activity; INVOLVED IN: in 7 processes; LOCATED IN: cellular_component unknown; EXPRESSED IN: 11 plant structures; EXPRESSED 17.7.1.5 e.synthesis- c24042/f1p1/1335 1.38 DURING: seedling growth, developing seed stage; CONTAINS InterPro DOMAIN/s: Aldolase-type TIM barrel (InterPro:IPR013785), NADH:flavin degradation.12-Oxo- oxidoreductase/NADH oxidase, N-terminal (InterPro:IPR001155); BEST Arabidopsis thaliana protein match is: 12-oxophytodienoate reductase 2 (TAIR:AT1G76690.1); PDA-reductase Has 35333 Blast hits to 34131 proteins in 2444 species: Archae - 798; Bacteria - 22429; Metazoa - 974; Fungi - 991; Plants - 531; Viruses - 0; Other Eukaryotes - 9610 (source: NCBI BLink). & (loc_os06g11210.1 : 348.0) no description available & (chl4|518284 : 269.0) no description available & (ipr013785 : 212.80542) Aldolase-type TIM barrel & (reliability: 762.0) & (original description: no original description) (ipr013819 : 1500.0) Lipoxygenase, C-terminal & (p37831|lox1_soltu : 1230.0) Lipoxygenase 1 (EC 1.13.11.12) - Solanum tuberosum (Potato) & (at1g55020 : 1159.0) hormone lipoxygenase, a defense gene conferring resistance Xanthomonas campestris; lipoxygenase 1 (LOX1); CONTAINS InterPro DOMAIN/s: Lipoxygenase, iron binding site metabolism.jasmonat (InterPro:IPR020833), Lipoxygenase, C-terminal (InterPro:IPR013819), Lipoxygenase, LH2 (InterPro:IPR001024), Lipase/lipooxygenase, PLAT/LH2 17.7.1.2 e.synthesis- c76229/f1p5/2814 -2.92 (InterPro:IPR008976), Lipoxygenase, conserved site (InterPro:IPR020834), Lipoxygenase (InterPro:IPR000907), Lipoxygenase, plant (InterPro:IPR001246); BEST degradation.lipoxyge Arabidopsis thaliana protein match is: PLAT/LH2 domain-containing lipoxygenase family protein (TAIR:AT3G22400.1); Has 1484 Blast hits to 1444 proteins in 180 nase species: Archae - 0; Bacteria - 84; Metazoa - 533; Fungi - 49; Plants - 793; Viruses - 0; Other Eukaryotes - 25 (source: NCBI BLink). & (loc_os03g49380.1 : 998.0) no description available & (chl4|513105 : 281.0) no description available & (reliability: 2318.0) & (original description: no original description)

GA (at4g25420 : 311.0) Encodes gibberellin 20-oxidase that is involved in the later steps of the gibberellin biosynthetic pathway. Regulated by a circadian clock. Weak hormone expression response to far red light.; GA20OX1; CONTAINS InterPro DOMAIN/s: Isopenicillin N synthase (InterPro:IPR002283), Oxoglutarate/iron-dependent oxygenase metabolism.gibbereli 17.6.1.1 (InterPro:IPR005123); BEST Arabidopsis thaliana protein match is: gibberellin 20 oxidase 2 (TAIR:AT5G51810.1); Has 8544 Blast hits to 8501 proteins in 990 species: n.synthesis- c11143/f1p0/1201 2.36 2.04 1 Archae - 0; Bacteria - 1084; Metazoa - 124; Fungi - 981; Plants - 4942; Viruses - 0; Other Eukaryotes - 1413 (source: NCBI BLink). & (o04706|gao1b_wheat : 285.0) degradation.GA20 Gibberellin 20 oxidase 1-B (EC 1.14.11.-) (Gibberellin C-20 oxidase 1-B) (GA 20-oxidase 1-B) (Ta20ox1B) (TaGA20ox1-B) - Triticum aestivum (Wheat) & oxidase (loc_os03g63970.1 : 265.0) no description available & (chl4|510590 : 97.4) no description available & (reliability: 622.0) & (original description: no original description) (at5g25900 : 189.0) Encodes a member of the CYP701A cytochrome p450 family that is involved in later steps of the gibberellin biosynthetic pathway.; GA requiring 3 hormone (GA3); FUNCTIONS IN: oxygen binding; INVOLVED IN: gibberellin biosynthetic process, gibberellic acid mediated signaling pathway, ent-kaurene oxidation to metabolism.gibbereli kaurenoic acid; LOCATED IN: chloroplast outer membrane, endoplasmic reticulum, microsome; EXPRESSED IN: 24 plant structures; EXPRESSED DURING: 13 growth 17.6.1.3 n.synthesis- c30767/f1p1/2139 -1.06 -1.48 stages; CONTAINS InterPro DOMAIN/s: Cytochrome P450 (InterPro:IPR001128), Cytochrome P450, E-class, group I (InterPro:IPR002401), Cytochrome P450, conserved degradation.ent- site (InterPro:IPR017972); BEST Arabidopsis thaliana protein match is: Cytochrome P450 superfamily protein (TAIR:AT5G07990.1); Has 1807 Blast hits to 1807 proteins kaurene oxidase in 277 species: Archae - 0; Bacteria - 0; Metazoa - 736; Fungi - 347; Plants - 385; Viruses - 0; Other Eukaryotes - 339 (source: NCBI BLink). & (loc_os06g37300.1 : 167.0) no description available & (reliability: 378.0) & (original description: no original description) (at4g02780 : 748.0) Catalyzes the conversion of geranylgeranyl pyrophosphate (GGPP) to copalyl pyrophosphate (CPP) of gibberellin biosynthesis; GA REQUIRING 1 hormone (GA1); CONTAINS InterPro DOMAIN/s: Terpene synthase, metal-binding domain (InterPro:IPR005630), Terpenoid synthase (InterPro:IPR008949), Terpenoid metabolism.gibbereli cylases/protein prenyltransferase alpha-alpha toroid (InterPro:IPR008930), Terpene synthase-like (InterPro:IPR001906); BEST Arabidopsis thaliana protein match is: 17.6.1.1 n.synthesis- c69223/f3p0/2820 -2.29 Terpenoid cyclases/Protein prenyltransferases superfamily protein (TAIR:AT1G79460.1); Has 1979 Blast hits to 1971 proteins in 256 species: Archae - 0; Bacteria - 97; degradation.copalyl Metazoa - 0; Fungi - 61; Plants - 1817; Viruses - 0; Other Eukaryotes - 4 (source: NCBI BLink). & (o04408|ksa_pea : 739.0) Ent-kaurene synthase A, chloroplast precursor diphosphate synthase (EC 5.5.1.13) (Ent-copalyl diphosphate synthase) (KSA) - Pisum sativum (Garden pea) & (loc_os02g17780.1 : 667.0) no description available & (ipr001906 : 116.6899)

148

Terpene synthase-like & (reliability: 1496.0) & (original description: no original description)

(o04408|ksa_pea : 266.0) Ent-kaurene synthase A, chloroplast precursor (EC 5.5.1.13) (Ent-copalyl diphosphate synthase) (KSA) - Pisum sativum (Garden pea) & hormone (at4g02780 : 224.0) Catalyzes the conversion of geranylgeranyl pyrophosphate (GGPP) to copalyl pyrophosphate (CPP) of gibberellin biosynthesis; GA REQUIRING 1 metabolism.gibbereli (GA1); CONTAINS InterPro DOMAIN/s: Terpene synthase, metal-binding domain (InterPro:IPR005630), Terpenoid synthase (InterPro:IPR008949), Terpenoid 17.6.1.1 n.synthesis- c96938/f1p2/2739 -1.65 cylases/protein prenyltransferase alpha-alpha toroid (InterPro:IPR008930), Terpene synthase-like (InterPro:IPR001906); BEST Arabidopsis thaliana protein match is: degradation.copalyl Terpenoid cyclases/Protein prenyltransferases superfamily protein (TAIR:AT1G79460.1); Has 1979 Blast hits to 1971 proteins in 256 species: Archae - 0; Bacteria - 97; diphosphate synthase Metazoa - 0; Fungi - 61; Plants - 1817; Viruses - 0; Other Eukaryotes - 4 (source: NCBI BLink). & (loc_os02g17780.1 : 212.0) no description available & (ipr001906 : 83.05558) Terpene synthase-like & (reliability: 448.0) & (original description: no original description) (o04705|gao1d_wheat : 462.0) Gibberellin 20 oxidase 1-D (EC 1.14.11.-) (Gibberellin C-20 oxidase 1-D) (GA 20-oxidase 1-D) (Ta20ox1D) (TaGA20ox1-D) (Protein hormone Wga20) - Triticum aestivum (Wheat) & (at4g25420 : 454.0) Encodes gibberellin 20-oxidase that is involved in the later steps of the gibberellin biosynthetic pathway. metabolism.gibbereli 17.6.1.1 Regulated by a circadian clock. Weak expression response to far red light.; GA20OX1; CONTAINS InterPro DOMAIN/s: Isopenicillin N synthase (InterPro:IPR002283), n.synthesis- c19835/f2p1/1479 -3.05 1 Oxoglutarate/iron-dependent oxygenase (InterPro:IPR005123); BEST Arabidopsis thaliana protein match is: gibberellin 20 oxidase 2 (TAIR:AT5G51810.1); Has 8544 Blast degradation.GA20 hits to 8501 proteins in 990 species: Archae - 0; Bacteria - 1084; Metazoa - 124; Fungi - 981; Plants - 4942; Viruses - 0; Other Eukaryotes - 1413 (source: NCBI BLink). & oxidase (loc_os03g63970.1 : 421.0) no description available & (chl4|510590 : 103.0) no description available & (reliability: 908.0) & (original description: no original description) Brassinosteroid (loc_os03g12910.1 : 419.0) no description available & (at1g58440 : 410.0) Encodes a putative protein that has been speculated, based on sequence similarities, to have hormone squalene monooxygenase activity.; XF1; FUNCTIONS IN: squalene monooxygenase activity; INVOLVED IN: response to water deprivation, sterol biosynthetic process; metabolism.brassinos LOCATED IN: endomembrane system, integral to membrane; EXPRESSED IN: 29 plant structures; EXPRESSED DURING: 13 growth stages; CONTAINS InterPro 17.3.1.2. teroid.synthesis- c125718/f1p0/2057 1.34 1.17 DOMAIN/s: Squalene epoxidase (InterPro:IPR013698); BEST Arabidopsis thaliana protein match is: squalene epoxidase 2 (TAIR:AT2G22830.1); Has 1994 Blast hits to 99 degradation.sterols.ot 1990 proteins in 731 species: Archae - 43; Bacteria - 1249; Metazoa - 112; Fungi - 225; Plants - 178; Viruses - 0; Other Eukaryotes - 187 (source: NCBI BLink). & her (o48651|erg1_pangi : 402.0) Squalene monooxygenase (EC 1.14.99.7) (Squalene epoxidase) (SE) - Panax ginseng (Korean ginseng) & (chl4|515912 : 280.0) no description available & (ipr013698 : 163.30122) Squalene epoxidase & (reliability: 820.0) & (original description: no original description) hormone (at1g78950 : 888.0) Terpenoid cyclases family protein; CONTAINS InterPro DOMAIN/s: Terpene synthase, conserved site (InterPro:IPR002365), Squalene cyclase metabolism.brassinos (InterPro:IPR018333), Terpenoid cylases/protein prenyltransferase alpha-alpha toroid (InterPro:IPR008930), Prenyltransferase/squalene oxidase (InterPro:IPR001330); 17.3.1.2. teroid.synthesis- c77174/f1p4/2601 -7.2 -6.22 BEST Arabidopsis thaliana protein match is: camelliol C synthase 1 (TAIR:AT1G78955.1); Has 2107 Blast hits to 2005 proteins in 574 species: Archae - 6; Bacteria - 886; 99 degradation.sterols.ot Metazoa - 160; Fungi - 240; Plants - 609; Viruses - 0; Other Eukaryotes - 206 (source: NCBI BLink). & (loc_os02g04710.1 : 887.0) no description available & her (chl4|510832 : 697.0) no description available & (reliability: 1738.0) & (original description: no original description) Phytochrome (p33530|phya1_tobac : 1800.0) Phytochrome A1 - Nicotiana tabacum (Common tobacco) & (at1g09570 : 1682.0) Light-labile cytoplasmic red/far-red light photoreceptor involved in the regulation of photomorphogenesis. It exists in two inter-convertible forms: Pr and Pfr (active) and functions as a dimer.The N terminus carries a single tetrapyrrole chromophore, and the C terminus is involved in dimerization. It is the sole photoreceptor mediating the FR high irradiance response (HIR). Major regulator in red-light induction of phototropic enhancement. Involved in the regulation of de-etiolation. Involved in gravitropism and phototropism. Requires FHY1 for nuclear accumulation.; phytochrome A (PHYA); CONTAINS InterPro DOMAIN/s: PAC motif (InterPro:IPR001610), Phytochrome, central region (InterPro:IPR013515), Signal 30.11 signalling.light c153854/f2p4/3931 1.39 transduction histidine kinase, core (InterPro:IPR005467), PAS fold (InterPro:IPR013767), PAS (InterPro:IPR000014), Phytochrome chromophore attachment domain (InterPro:IPR016132), ATPase-like, ATP-binding domain (InterPro:IPR003594), PAS fold-2 (InterPro:IPR013654), Phytochrome (InterPro:IPR001294), Signal transduction histidine kinase, subgroup 1, dimerisation/phosphoacceptor domain (InterPro:IPR003661), Phytochrome chromophore binding site (InterPro:IPR013516), GAF (InterPro:IPR003018); BEST Arabidopsis thaliana protein match is: phytochrome B (TAIR:AT2G18790.1); Has 30201 Blast hits to 17322 proteins in 780 species: Archae - 12; Bacteria - 1396; Metazoa - 17338; Fungi - 3422; Plants - 5037; Viruses - 0; Other Eukaryotes - 2996 (source: NCBI BLink). & (loc_os03g51030.3 : 1417.0) no description available & (ipr013515 : 160.71095) Phytochrome, central region & (reliability: 3364.0) & (original description: no original description)

Appendix 3 Differentially expressed genes related to stress response.

Fold change LY UY LR UR Heat Seq ID vs vs vs vs Description LG UG LY UY 20.2.1 stress.abiotic. c16355/f5p3/922 -1.18 -1.17 (at1g80920 : 129.0) A nuclear encoded soluble protein found in the chloroplast stroma.; J8; FUNCTIONS IN: unfolded protein binding, heat shock

149

heat protein binding; INVOLVED IN: protein folding, response to stress; LOCATED IN: nucleus; EXPRESSED IN: 24 plant structures; EXPRESSED DURING: 15 growth stages; CONTAINS InterPro DOMAIN/s: Molecular chaperone, heat shock protein, Hsp40, DnaJ (InterPro:IPR015609), Heat shock protein DnaJ, N-terminal (InterPro:IPR001623), Heat shock protein DnaJ (InterPro:IPR003095); BEST Arabidopsis thaliana protein match is: Chaperone DnaJ-domain superfamily protein (TAIR:AT3G05345.1); Has 10939 Blast hits to 10938 proteins in 2525 species: Archae - 126; Bacteria - 6057; Metazoa - 1338; Fungi - 591; Plants - 814; Viruses - 3; Other Eukaryotes - 2010 (source: NCBI BLink). & (reliability: 258.0) & (original description: no original description) (at4g13830 : 181.0) DnaJ-like protein (J20); nuclear gene; DNAJ-like 20 (J20); FUNCTIONS IN: heat shock protein binding; INVOLVED IN: protein folding, response to stress; LOCATED IN: nucleus; EXPRESSED IN: 23 plant structures; EXPRESSED DURING: 13 growth stages; CONTAINS InterPro DOMAIN/s: Molecular chaperone, heat shock protein, Hsp40, DnaJ (InterPro:IPR015609), Heat shock protein DnaJ, N-terminal stress.abiotic. 20.2.1 c142986/f2p1/8934 1.09 (InterPro:IPR001623), Heat shock protein DnaJ, conserved site (InterPro:IPR018253); BEST Arabidopsis thaliana protein match is: Molecular heat chaperone Hsp40/DnaJ family protein (TAIR:AT4G39960.1); Has 22791 Blast hits to 22789 proteins in 3231 species: Archae - 176; Bacteria - 9466; Metazoa - 3791; Fungi - 2164; Plants - 2102; Viruses - 8; Other Eukaryotes - 5084 (source: NCBI BLink). & (loc_os01g01160.1 : 148.0) no description available & (reliability: 362.0) & (original description: no original description) (at4g36040 : 135.0) Chaperone DnaJ-domain superfamily protein; FUNCTIONS IN: heat shock protein binding; INVOLVED IN: protein folding, response to stress; LOCATED IN: nucleus; EXPRESSED IN: 24 plant structures; EXPRESSED DURING: 14 growth stages; CONTAINS InterPro DOMAIN/s: Molecular chaperone, heat shock protein, Hsp40, DnaJ (InterPro:IPR015609), Heat shock protein DnaJ, N-terminal stress.abiotic. 20.2.1 c63616/f13p15/2442 1.44 (InterPro:IPR001623); BEST Arabidopsis thaliana protein match is: Chaperone DnaJ-domain superfamily protein (TAIR:AT2G17880.1); Has 18528 heat Blast hits to 18528 proteins in 3080 species: Archae - 133; Bacteria - 8129; Metazoa - 2968; Fungi - 1612; Plants - 1794; Viruses - 5; Other Eukaryotes - 3887 (source: NCBI BLink). & (loc_os08g43490.1 : 94.7) no description available & (reliability: 270.0) & (original description: no original description) (at5g52640 : 168.0) Encodes a cytosolic heat shock protein AtHSP90.1. AtHSP90.1 interacts with disease resistance signaling components SGT1b and RAR1 and is required for RPS2-mediated resistance.; heat shock protein 90.1 (HSP90.1); FUNCTIONS IN: unfolded protein binding, ATP binding; INVOLVED IN: defense response to bacterium, response to heat, response to arsenic; LOCATED IN: cytosol, cell wall, plasma membrane; EXPRESSED IN: 25 plant structures; EXPRESSED DURING: 14 growth stages; CONTAINS InterPro DOMAIN/s: Chaperone protein htpG (InterPro:IPR001404), Heat shock protein Hsp90, conserved site (InterPro:IPR019805), Heat shock protein Hsp90, C-terminal (InterPro:IPR020576), stress.abiotic. 20.2.1 c73027/f2p3/2637 0.93 Heat shock protein Hsp90, N-terminal (InterPro:IPR020575), Ribosomal protein S5 domain 2-type fold (InterPro:IPR020568), ATPase-like, ATP- heat binding domain (InterPro:IPR003594); BEST Arabidopsis thaliana protein match is: HEAT SHOCK PROTEIN 81.4 (TAIR:AT5G56000.1); Has 1807 Blast hits to 1807 proteins in 277 species: Archae - 0; Bacteria - 0; Metazoa - 736; Fungi - 347; Plants - 385; Viruses - 0; Other Eukaryotes - 339 (source: NCBI BLink). & (p36182|hsp82_tobac : 168.0) Heat shock protein 82 (Fragment) - Nicotiana tabacum (Common tobacco) & (loc_os04g01740.1 : 164.0) no description available & (chl4|525808 : 152.0) no description available & (ipr001404 : 112.64435) Heat shock protein Hsp90 & (reliability: 336.0) & (original description: no original description) (at5g57710 : 880.0) Double Clp-N motif-containing P-loop nucleoside triphosphate hydrolases superfamily protein; FUNCTIONS IN: ATP binding; INVOLVED IN: response to karrikin; EXPRESSED IN: 22 plant structures; EXPRESSED DURING: 13 growth stages; CONTAINS InterPro DOMAIN/s: ATPase, AAA-2 (InterPro:IPR013093); BEST Arabidopsis thaliana protein match is: Double Clp-N motif-containing P-loop nucleoside stress.abiotic. 20.2.1 c76198/f1p3/2828 1.28 triphosphate hydrolases superfamily protein (TAIR:AT4G30350.1); Has 16657 Blast hits to 15152 proteins in 2785 species: Archae - 23; Bacteria - heat 13282; Metazoa - 28; Fungi - 336; Plants - 548; Viruses - 0; Other Eukaryotes - 2440 (source: NCBI BLink). & (loc_os08g15230.1 : 623.0) no description available & (q6f2y7|hs101_orysa : 80.1) Heat shock protein 101 - Oryza sativa (Rice) & (reliability: 1760.0) & (original description: no original description) (loc_os05g38530.1 : 775.0) no description available & (p09189|hsp7c_pethy : 773.0) Heat shock cognate 70 kDa protein - Petunia hybrida (Petunia) & (at3g12580 : 770.0) heat shock protein 70 (HSP70); FUNCTIONS IN: ATP binding; INVOLVED IN: in 9 processes; LOCATED IN: cytosol, mitochondrion, cell wall, plasma membrane; EXPRESSED IN: 23 plant structures; EXPRESSED DURING: 11 growth stages; CONTAINS InterPro stress.abiotic. DOMAIN/s: Heat shock protein 70, conserved site (InterPro:IPR018181), Heat shock protein Hsp70 (InterPro:IPR001023), Heat shock protein 70 20.2.1 c81174/f1p5/2324 1.01 -1.39 -1.31 heat (InterPro:IPR013126); BEST Arabidopsis thaliana protein match is: heat shock cognate protein 70-1 (TAIR:AT5G02500.1); Has 34126 Blast hits to 33731 proteins in 4830 species: Archae - 159; Bacteria - 16481; Metazoa - 3906; Fungi - 1752; Plants - 1258; Viruses - 310; Other Eukaryotes - 10260 (source: NCBI BLink). & (chl4|525480 : 689.0) no description available & (ipr013126 : 507.14853) Heat shock protein 70 family & (reliability: 1540.0) & (original description: no original description) stress.abiotic. (p09887|hs22c_soybn : 125.0) Chloroplast small heat shock protein (Fragment) - Glycine max (Soybean) & (at4g27670 : 123.0) chloroplast located 20.2.1 c87209/f4p3/4193 1.12 heat small heat shock protein.; heat shock protein 21 (HSP21); CONTAINS InterPro DOMAIN/s: Heat shock protein Hsp20 (InterPro:IPR002068), HSP20-

150

like chaperone (InterPro:IPR008978); BEST Arabidopsis thaliana protein match is: HSP20-like chaperones superfamily protein (TAIR:AT5G51440.1); Has 6158 Blast hits to 6158 proteins in 1414 species: Archae - 225; Bacteria - 3528; Metazoa - 23; Fungi - 170; Plants - 1466; Viruses - 0; Other Eukaryotes - 746 (source: NCBI BLink). & (loc_os03g14180.1 : 108.0) no description available & (reliability: 246.0) & (original description: no original description) (p51819|hsp83_iponi : 730.0) Heat shock protein 83 - Ipomoea nil (Japanese morning glory) (Pharbitis nil) & (loc_os04g01740.1 : 710.0) no description available & (at5g52640 : 709.0) Encodes a cytosolic heat shock protein AtHSP90.1. AtHSP90.1 interacts with disease resistance signaling components SGT1b and RAR1 and is required for RPS2-mediated resistance.; heat shock protein 90.1 (HSP90.1); FUNCTIONS IN: unfolded protein binding, ATP binding; INVOLVED IN: defense response to bacterium, response to heat, response to arsenic; LOCATED IN: cytosol, cell wall, plasma membrane; EXPRESSED IN: 25 plant structures; EXPRESSED DURING: 14 growth stages; CONTAINS InterPro DOMAIN/s: Chaperone protein stress.abiotic. 20.2.1 c154161/f1p5/1063 1.14 htpG (InterPro:IPR001404), Heat shock protein Hsp90, conserved site (InterPro:IPR019805), Heat shock protein Hsp90, C-terminal heat (InterPro:IPR020576), Heat shock protein Hsp90, N-terminal (InterPro:IPR020575), Ribosomal protein S5 domain 2-type fold (InterPro:IPR020568), ATPase-like, ATP-binding domain (InterPro:IPR003594); BEST Arabidopsis thaliana protein match is: HEAT SHOCK PROTEIN 81.4 (TAIR:AT5G56000.1); Has 1807 Blast hits to 1807 proteins in 277 species: Archae - 0; Bacteria - 0; Metazoa - 736; Fungi - 347; Plants - 385; Viruses - 0; Other Eukaryotes - 339 (source: NCBI BLink). & (chl4|525808 : 643.0) no description available & (ipr001404 : 556.0942) Heat shock protein Hsp90 & (reliability: 1418.0) & (original description: no original description) (p51819|hsp83_iponi : 768.0) Heat shock protein 83 - Ipomoea nil (Japanese morning glory) (Pharbitis nil) & (at5g52640 : 758.0) Encodes a cytosolic heat shock protein AtHSP90.1. AtHSP90.1 interacts with disease resistance signaling components SGT1b and RAR1 and is required for RPS2- mediated resistance.; heat shock protein 90.1 (HSP90.1); FUNCTIONS IN: unfolded protein binding, ATP binding; INVOLVED IN: defense response to bacterium, response to heat, response to arsenic; LOCATED IN: cytosol, cell wall, plasma membrane; EXPRESSED IN: 25 plant structures; EXPRESSED DURING: 14 growth stages; CONTAINS InterPro DOMAIN/s: Chaperone protein htpG (InterPro:IPR001404), Heat shock protein stress.abiotic. 20.2.1 c225/f13p4/1049 1.39 Hsp90, conserved site (InterPro:IPR019805), Heat shock protein Hsp90, C-terminal (InterPro:IPR020576), Heat shock protein Hsp90, N-terminal heat (InterPro:IPR020575), Ribosomal protein S5 domain 2-type fold (InterPro:IPR020568), ATPase-like, ATP-binding domain (InterPro:IPR003594); BEST Arabidopsis thaliana protein match is: HEAT SHOCK PROTEIN 81.4 (TAIR:AT5G56000.1); Has 1807 Blast hits to 1807 proteins in 277 species: Archae - 0; Bacteria - 0; Metazoa - 736; Fungi - 347; Plants - 385; Viruses - 0; Other Eukaryotes - 339 (source: NCBI BLink). & (loc_os04g01740.1 : 745.0) no description available & (chl4|525808 : 677.0) no description available & (ipr001404 : 340.60028) Heat shock protein Hsp90 & (reliability: 1516.0) & (original description: no original description) (q01545|hsp22_iponi : 117.0) 18.8 kDa class II heat shock protein - Ipomoea nil (Japanese morning glory) (Pharbitis nil) & (at5g12020 : 111.0) 17.6 kDa class II heat shock protein (HSP17.6II); CONTAINS InterPro DOMAIN/s: Heat shock protein Hsp20 (InterPro:IPR002068), HSP20-like stress.abiotic. 20.2.1 c31659/f1p1/2584 1.04 chaperone (InterPro:IPR008978); BEST Arabidopsis thaliana protein match is: heat shock protein 17.6A (TAIR:AT5G12030.1); Has 1807 Blast hits to heat 1807 proteins in 277 species: Archae - 0; Bacteria - 0; Metazoa - 736; Fungi - 347; Plants - 385; Viruses - 0; Other Eukaryotes - 339 (source: NCBI BLink). & (loc_os01g08860.1 : 101.0) no description available & (reliability: 222.0) & (original description: no original description) (q01545|hsp22_iponi : 144.0) 18.8 kDa class II heat shock protein - Ipomoea nil (Japanese morning glory) (Pharbitis nil) & (at5g12020 : 131.0) 17.6 kDa class II heat shock protein (HSP17.6II); CONTAINS InterPro DOMAIN/s: Heat shock protein Hsp20 (InterPro:IPR002068), HSP20-like stress.abiotic. 20.2.1 c35154/f1p0/1085 1.52 chaperone (InterPro:IPR008978); BEST Arabidopsis thaliana protein match is: heat shock protein 17.6A (TAIR:AT5G12030.1); Has 1807 Blast hits to heat 1807 proteins in 277 species: Archae - 0; Bacteria - 0; Metazoa - 736; Fungi - 347; Plants - 385; Viruses - 0; Other Eukaryotes - 339 (source: NCBI BLink). & (loc_os01g08860.1 : 119.0) no description available & (reliability: 262.0) & (original description: no original description) (q03684|bip4_tobac : 137.0) Luminal-binding protein 4 precursor (BiP 4) (78 kDa glucose-regulated protein homolog 4) (GRP 78-4) - Nicotiana tabacum (Common tobacco) & (at5g28540 : 136.0) Encodes the luminal binding protein BiP, an ER-localized member of the HSP70 family. BiP is composed of an N-terminal ATP binding domain and a C-terminal domain that binds to hydrophobic patches on improperly/incompletely folded proteins in an ATP-dependent manner. Involved in polar nuclei fusion during proliferation of endosperm nuclei.; BIP1; FUNCTIONS IN: ATP binding; INVOLVED IN: protein folding, ER-associated protein catabolic process, response to heat, polar nucleus fusion; LOCATED IN: cell wall, stress.abiotic. 20.2.1 c21728/f1p0/1557 4.58 4.19 plasma membrane, chloroplast, vacuole, endoplasmic reticulum lumen; EXPRESSED IN: 8 plant structures; EXPRESSED DURING: L mature pollen heat stage, M germinated pollen stage, seedling growth; CONTAINS InterPro DOMAIN/s: Heat shock protein 70, conserved site (InterPro:IPR018181), Heat shock protein Hsp70 (InterPro:IPR001023), Heat shock protein 70 (InterPro:IPR013126); BEST Arabidopsis thaliana protein match is: Heat shock protein 70 (Hsp 70) family protein (TAIR:AT5G42020.1); Has 36391 Blast hits to 35786 proteins in 4820 species: Archae - 162; Bacteria - 17493; Metazoa - 3988; Fungi - 1814; Plants - 1283; Viruses - 341; Other Eukaryotes - 11310 (source: NCBI BLink). & (loc_os02g02410.1 : 131.0) no description available & (chl4|518564 : 118.0) no description available & (reliability: 272.0) & (original description: no original description)

151

Cold (at5g67320 : 468.0) Encodes a WD-40 protein involved in histone deacetylation in response to abiotic stress.Identified in a screen for mutations with altered expression of stress induced genes. Functions as a repressor of cold tolerance induced genes. Loss of function mutants are hypersensitive to freezing.; high expression of osmotically responsive genes 15 (HOS15); CONTAINS InterPro DOMAIN/s: WD40 repeat 2 (InterPro:IPR019782), LisH dimerisation motif, subgroup (InterPro:IPR013720), WD40 repeat, conserved site (InterPro:IPR019775), WD40 repeat (InterPro:IPR001680), G- protein beta WD-40 repeat, region (InterPro:IPR020472), WD40 repeat-like-containing domain (InterPro:IPR011046), WD40-repeat-containing stress.abiotic. 20.2.2 c136754/f1p2/3157 1.28 domain (InterPro:IPR017986), WD40/YVTN repeat-like-containing domain (InterPro:IPR015943), LisH dimerisation motif (InterPro:IPR006594), cold WD40 repeat, subgroup (InterPro:IPR019781); BEST Arabidopsis thaliana protein match is: TBP-associated factor 5 (TAIR:AT5G25150.1); Has 1807 Blast hits to 1807 proteins in 277 species: Archae - 0; Bacteria - 0; Metazoa - 736; Fungi - 347; Plants - 385; Viruses - 0; Other Eukaryotes - 339 (source: NCBI BLink). & (loc_os07g22220.1 : 459.0) no description available & (chl4|513422 : 377.0) no description available & (ipr015943 : 164.23857) WD40/YVTN repeat-like-containing domain & (p93107|pf20_chlre : 97.8) Flagellar WD repeat protein PF20 - Chlamydomonas reinhardtii & (reliability: 936.0) & (original description: no original description)

Drought (at1g31850 : 1006.0) S-adenosyl-L-methionine-dependent methyltransferases superfamily protein; CONTAINS InterPro DOMAIN/s: Protein of unknown function DUF248, methyltransferase putative (InterPro:IPR004159); BEST Arabidopsis thaliana protein match is: S-adenosyl-L-methionine- stress.abiotic. dependent methyltransferases superfamily protein (TAIR:AT4G19120.2); Has 35333 Blast hits to 34131 proteins in 2444 species: Archae - 798; 20.2.3 c132978/f1p4/2339 2.08 2.41 drought/salt Bacteria - 22429; Metazoa - 974; Fungi - 991; Plants - 531; Viruses - 0; Other Eukaryotes - 9610 (source: NCBI BLink). & (loc_os06g01450.1 : 537.0) no description available & (ipr004159 : 482.06128) Putative S-adenosyl-L-methionine-dependent methyltransferase & (reliability: 2012.0) & (original description: no original description) (at5g25610 : 303.0) responsive to dehydration 22 (RD22) mediated by ABA; RESPONSIVE TO DESSICATION 22 (RD22); FUNCTIONS IN: nutrient reservoir activity; INVOLVED IN: response to desiccation, response to salt stress, response to abscisic acid stimulus; LOCATED IN: endomembrane system; EXPRESSED IN: 21 plant structures; EXPRESSED DURING: 13 growth stages; CONTAINS InterPro DOMAIN/s: BURP stress.abiotic. 20.2.3 c60944/f61p71/1402 1.34 2.03 (InterPro:IPR004873); BEST Arabidopsis thaliana protein match is: unknown seed protein like 1 (TAIR:AT1G49320.1); Has 1807 Blast hits to 1807 drought/salt proteins in 277 species: Archae - 0; Bacteria - 0; Metazoa - 736; Fungi - 347; Plants - 385; Viruses - 0; Other Eukaryotes - 339 (source: NCBI BLink). & (loc_os01g53240.1 : 290.0) no description available & (ipr004873 : 210.88232) BURP domain & (p21747|ea92_vicfa : 120.0) Embryonic abundant protein USP92 precursor - Vicia faba (Broad bean) & (reliability: 606.0) & (original description: no original description) (at5g25610 : 307.0) responsive to dehydration 22 (RD22) mediated by ABA; RESPONSIVE TO DESSICATION 22 (RD22); FUNCTIONS IN: nutrient reservoir activity; INVOLVED IN: response to desiccation, response to salt stress, response to abscisic acid stimulus; LOCATED IN: endomembrane system; EXPRESSED IN: 21 plant structures; EXPRESSED DURING: 13 growth stages; CONTAINS InterPro DOMAIN/s: BURP stress.abiotic. 20.2.3 c69083/f1p25/1426 2.55 2.29 (InterPro:IPR004873); BEST Arabidopsis thaliana protein match is: unknown seed protein like 1 (TAIR:AT1G49320.1); Has 1807 Blast hits to 1807 drought/salt proteins in 277 species: Archae - 0; Bacteria - 0; Metazoa - 736; Fungi - 347; Plants - 385; Viruses - 0; Other Eukaryotes - 339 (source: NCBI BLink). & (loc_os01g53240.1 : 289.0) no description available & (ipr004873 : 210.96236) BURP domain & (p21747|ea92_vicfa : 118.0) Embryonic abundant protein USP92 precursor - Vicia faba (Broad bean) & (reliability: 614.0) & (original description: no original description) (at1g31850 : 361.0) S-adenosyl-L-methionine-dependent methyltransferases superfamily protein; CONTAINS InterPro DOMAIN/s: Protein of unknown function DUF248, methyltransferase putative (InterPro:IPR004159); BEST Arabidopsis thaliana protein match is: S-adenosyl-L-methionine- stress.abiotic. dependent methyltransferases superfamily protein (TAIR:AT4G19120.2); Has 35333 Blast hits to 34131 proteins in 2444 species: Archae - 798; 20.2.3 c75150/f1p1/2520 2.56 3.19 drought/salt Bacteria - 22429; Metazoa - 974; Fungi - 991; Plants - 531; Viruses - 0; Other Eukaryotes - 9610 (source: NCBI BLink). & (loc_os04g14150.1 : 342.0) no description available & (ipr004159 : 176.96259) Putative S-adenosyl-L-methionine-dependent methyltransferase & (reliability: 722.0) & (original description: no original description) (at4g19120 : 956.0) early-responsive to dehydration 3 (ERD3); CONTAINS InterPro DOMAIN/s: Protein of unknown function DUF248, methyltransferase putative (InterPro:IPR004159); BEST Arabidopsis thaliana protein match is: S-adenosyl-L-methionine-dependent stress.abiotic. methyltransferases superfamily protein (TAIR:AT1G31850.2); Has 1146 Blast hits to 1130 proteins in 105 species: Archae - 6; Bacteria - 157; 20.2.3 c80690/f2p4/2523 2.66 2.84 drought/salt Metazoa - 0; Fungi - 3; Plants - 960; Viruses - 0; Other Eukaryotes - 20 (source: NCBI BLink). & (loc_os06g01450.1 : 538.0) no description available & (ipr004159 : 483.2064) Putative S-adenosyl-L-methionine-dependent methyltransferase & (reliability: 1912.0) & (original description: no original description)

152

(at1g31850 : 1004.0) S-adenosyl-L-methionine-dependent methyltransferases superfamily protein; CONTAINS InterPro DOMAIN/s: Protein of unknown function DUF248, methyltransferase putative (InterPro:IPR004159); BEST Arabidopsis thaliana protein match is: S-adenosyl-L-methionine- stress.abiotic. dependent methyltransferases superfamily protein (TAIR:AT4G19120.2); Has 35333 Blast hits to 34131 proteins in 2444 species: Archae - 798; 20.2.3 c80989/f2p4/2668 1.66 1.69 drought/salt Bacteria - 22429; Metazoa - 974; Fungi - 991; Plants - 531; Viruses - 0; Other Eukaryotes - 9610 (source: NCBI BLink). & (loc_os06g01450.1 : 538.0) no description available & (ipr004159 : 483.07288) Putative S-adenosyl-L-methionine-dependent methyltransferase & (reliability: 2008.0) & (original description: no original description) (at5g25610 : 303.0) responsive to dehydration 22 (RD22) mediated by ABA; RESPONSIVE TO DESSICATION 22 (RD22); FUNCTIONS IN: nutrient reservoir activity; INVOLVED IN: response to desiccation, response to salt stress, response to abscisic acid stimulus; LOCATED IN: endomembrane system; EXPRESSED IN: 21 plant structures; EXPRESSED DURING: 13 growth stages; CONTAINS InterPro DOMAIN/s: BURP stress.abiotic. 20.2.3 c82558/f6p78/1513 1.5 2.2 (InterPro:IPR004873); BEST Arabidopsis thaliana protein match is: unknown seed protein like 1 (TAIR:AT1G49320.1); Has 1807 Blast hits to 1807 drought/salt proteins in 277 species: Archae - 0; Bacteria - 0; Metazoa - 736; Fungi - 347; Plants - 385; Viruses - 0; Other Eukaryotes - 339 (source: NCBI BLink). & (loc_os01g53240.1 : 290.0) no description available & (ipr004873 : 210.70642) BURP domain & (p21747|ea92_vicfa : 120.0) Embryonic abundant protein USP92 precursor - Vicia faba (Broad bean) & (reliability: 606.0) & (original description: no original description) (at5g25610 : 270.0) responsive to dehydration 22 (RD22) mediated by ABA; RESPONSIVE TO DESSICATION 22 (RD22); FUNCTIONS IN: nutrient reservoir activity; INVOLVED IN: response to desiccation, response to salt stress, response to abscisic acid stimulus; LOCATED IN: endomembrane system; EXPRESSED IN: 21 plant structures; EXPRESSED DURING: 13 growth stages; CONTAINS InterPro DOMAIN/s: BURP stress.abiotic. 20.2.3 c83549/f12p7/1331 1.48 2.3 (InterPro:IPR004873); BEST Arabidopsis thaliana protein match is: unknown seed protein like 1 (TAIR:AT1G49320.1); Has 1807 Blast hits to 1807 drought/salt proteins in 277 species: Archae - 0; Bacteria - 0; Metazoa - 736; Fungi - 347; Plants - 385; Viruses - 0; Other Eukaryotes - 339 (source: NCBI BLink). & (loc_os01g53240.1 : 248.0) no description available & (ipr004873 : 157.96208) BURP domain & (p21745|ea30_vicfa : 121.0) Embryonic abundant protein VF30.1 precursor - Vicia faba (Broad bean) & (reliability: 540.0) & (original description: no original description) (at5g25610 : 307.0) responsive to dehydration 22 (RD22) mediated by ABA; RESPONSIVE TO DESSICATION 22 (RD22); FUNCTIONS IN: nutrient reservoir activity; INVOLVED IN: response to desiccation, response to salt stress, response to abscisic acid stimulus; LOCATED IN: endomembrane system; EXPRESSED IN: 21 plant structures; EXPRESSED DURING: 13 growth stages; CONTAINS InterPro DOMAIN/s: BURP stress.abiotic. 20.2.3 c84128/f235p47/1433 2.69 (InterPro:IPR004873); BEST Arabidopsis thaliana protein match is: unknown seed protein like 1 (TAIR:AT1G49320.1); Has 1807 Blast hits to 1807 drought/salt proteins in 277 species: Archae - 0; Bacteria - 0; Metazoa - 736; Fungi - 347; Plants - 385; Viruses - 0; Other Eukaryotes - 339 (source: NCBI BLink). & (loc_os01g53240.1 : 289.0) no description available & (ipr004873 : 210.96236) BURP domain & (p21747|ea92_vicfa : 118.0) Embryonic abundant protein USP92 precursor - Vicia faba (Broad bean) & (reliability: 614.0) & (original description: no original description) (at2g39750 : 922.0) S-adenosyl-L-methionine-dependent methyltransferases superfamily protein; CONTAINS InterPro DOMAIN/s: Protein of unknown function DUF248, methyltransferase putative (InterPro:IPR004159); BEST Arabidopsis thaliana protein match is: Putative methyltransferase stress.abiotic. 20.2.3 c91877/f1p3/2723 -1.41 family protein (TAIR:AT5G06050.1); Has 1633 Blast hits to 1528 proteins in 174 species: Archae - 2; Bacteria - 140; Metazoa - 223; Fungi - 78; drought/salt Plants - 1007; Viruses - 29; Other Eukaryotes - 154 (source: NCBI BLink). & (loc_os01g62800.1 : 822.0) no description available & (ipr004159 : 519.97876) Putative S-adenosyl-L-methionine-dependent methyltransferase & (reliability: 1844.0) & (original description: no original description) (at3g12360 : 479.0) Encodes a protein with an ankyrin motif and transmembrane domains that is involved in salt tolerance. Expressed throughout the plant and localized to the plasma membrane. Loss of function mutations show an increased tolerance to salt based on assaying seedling growth in the presence of salt. In the mutants, induction of genes required for production of reactive oxygen species is reduced suggesting that itn1 promotes ROS production.; INCREASED TOLERANCE TO NACL (ITN1); INVOLVED IN: response to salt stress; LOCATED IN: plasma membrane; stress.abiotic. 20.2.3 c25531/f3p2/2200 0.99 EXPRESSED IN: 24 plant structures; EXPRESSED DURING: 13 growth stages; CONTAINS InterPro DOMAIN/s: Ankyrin repeat-containing drought/salt domain (InterPro:IPR020683), Ankyrin repeat (InterPro:IPR002110); BEST Arabidopsis thaliana protein match is: Ankyrin repeat family protein (TAIR:AT3G09550.1); Has 61607 Blast hits to 26705 proteins in 1191 species: Archae - 49; Bacteria - 5778; Metazoa - 28187; Fungi - 6605; Plants - 4929; Viruses - 594; Other Eukaryotes - 15465 (source: NCBI BLink). & (loc_os03g17240.1 : 463.0) no description available & (reliability: 958.0) & (original description: no original description) (at5g25610 : 303.0) responsive to dehydration 22 (RD22) mediated by ABA; RESPONSIVE TO DESSICATION 22 (RD22); FUNCTIONS IN: nutrient reservoir activity; INVOLVED IN: response to desiccation, response to salt stress, response to abscisic acid stimulus; LOCATED IN: endomembrane system; EXPRESSED IN: 21 plant structures; EXPRESSED DURING: 13 growth stages; CONTAINS InterPro DOMAIN/s: BURP stress.abiotic. 20.2.3 c55297/f1p3/1869 1.98 (InterPro:IPR004873); BEST Arabidopsis thaliana protein match is: unknown seed protein like 1 (TAIR:AT1G49320.1); Has 1807 Blast hits to 1807 drought/salt proteins in 277 species: Archae - 0; Bacteria - 0; Metazoa - 736; Fungi - 347; Plants - 385; Viruses - 0; Other Eukaryotes - 339 (source: NCBI BLink). & (loc_os01g53240.1 : 290.0) no description available & (ipr004873 : 210.2486) BURP domain & (p21747|ea92_vicfa : 120.0) Embryonic abundant protein USP92 precursor - Vicia faba (Broad bean) & (reliability: 606.0) & (original description: no original description)

153

(at5g25610 : 303.0) responsive to dehydration 22 (RD22) mediated by ABA; RESPONSIVE TO DESSICATION 22 (RD22); FUNCTIONS IN: nutrient reservoir activity; INVOLVED IN: response to desiccation, response to salt stress, response to abscisic acid stimulus; LOCATED IN: endomembrane system; EXPRESSED IN: 21 plant structures; EXPRESSED DURING: 13 growth stages; CONTAINS InterPro DOMAIN/s: BURP stress.abiotic. 20.2.3 c59336/f1p3/1265 3.28 (InterPro:IPR004873); BEST Arabidopsis thaliana protein match is: unknown seed protein like 1 (TAIR:AT1G49320.1); Has 1807 Blast hits to 1807 drought/salt proteins in 277 species: Archae - 0; Bacteria - 0; Metazoa - 736; Fungi - 347; Plants - 385; Viruses - 0; Other Eukaryotes - 339 (source: NCBI BLink). & (loc_os01g53240.1 : 290.0) no description available & (ipr004873 : 211.14468) BURP domain & (p21747|ea92_vicfa : 120.0) Embryonic abundant protein USP92 precursor - Vicia faba (Broad bean) & (reliability: 606.0) & (original description: no original description) (at5g25610 : 307.0) responsive to dehydration 22 (RD22) mediated by ABA; RESPONSIVE TO DESSICATION 22 (RD22); FUNCTIONS IN: nutrient reservoir activity; INVOLVED IN: response to desiccation, response to salt stress, response to abscisic acid stimulus; LOCATED IN: endomembrane system; EXPRESSED IN: 21 plant structures; EXPRESSED DURING: 13 growth stages; CONTAINS InterPro DOMAIN/s: BURP stress.abiotic. 20.2.3 c67862/f1p8/1321 9.49 (InterPro:IPR004873); BEST Arabidopsis thaliana protein match is: unknown seed protein like 1 (TAIR:AT1G49320.1); Has 1807 Blast hits to 1807 drought/salt proteins in 277 species: Archae - 0; Bacteria - 0; Metazoa - 736; Fungi - 347; Plants - 385; Viruses - 0; Other Eukaryotes - 339 (source: NCBI BLink). & (loc_os01g53240.1 : 289.0) no description available & (ipr004873 : 211.14468) BURP domain & (p21747|ea92_vicfa : 118.0) Embryonic abundant protein USP92 precursor - Vicia faba (Broad bean) & (reliability: 614.0) & (original description: no original description)

Metal binding (p43389|mt3_actch : 95.5) Metallothionein-like protein type 3 PKIWI503 - Actinidia chinensis (Kiwi) (Yangtao) & (at3g15353 : 94.4) metal metallothionein, binds to and detoxifies excess copper and other metals, limiting oxidative damage; metallothionein 3 (MT3); FUNCTIONS IN: handling.bindi 15.2 c18604/f2p0/530 -1.32 -2.07 copper ion binding; INVOLVED IN: cellular copper ion homeostasis; LOCATED IN: cellular_component unknown; EXPRESSED IN: 22 plant ng, chelation structures; EXPRESSED DURING: 13 growth stages. & (loc_os05g11320.1 : 82.0) no description available & (reliability: 188.8) & (original and storage description: no original description) (at5g24580 : 276.0) Heavy metal transport/detoxification superfamily protein ; FUNCTIONS IN: copper ion binding, metal ion binding; INVOLVED metal IN: copper ion transport, metal ion transport; LOCATED IN: cellular_component unknown; EXPRESSED IN: 20 plant structures; EXPRESSED handling.bindi DURING: 13 growth stages; CONTAINS InterPro DOMAIN/s: Heavy metal transport/detoxification protein (InterPro:IPR006121); BEST 15.2 c21738/f1p1/1330 -3.05 -4.49 ng, chelation Arabidopsis thaliana protein match is: Heavy metal transport/detoxification superfamily protein (TAIR:AT5G50740.1); Has 553 Blast hits to 416 and storage proteins in 80 species: Archae - 0; Bacteria - 22; Metazoa - 98; Fungi - 42; Plants - 238; Viruses - 27; Other Eukaryotes - 126 (source: NCBI BLink). & (loc_os03g64340.1 : 171.0) no description available & (reliability: 552.0) & (original description: no original description) (at5g63530 : 213.0) Farnesylated protein that binds metals.; farnesylated protein 3 (FP3); FUNCTIONS IN: transition metal ion binding, metal ion metal binding; INVOLVED IN: cellular transition metal ion homeostasis; LOCATED IN: plasma membrane; EXPRESSED IN: 22 plant structures; handling.bindi EXPRESSED DURING: 13 growth stages; CONTAINS InterPro DOMAIN/s: Heavy metal transport/detoxification protein (InterPro:IPR006121); 15.2 c84187/f10p6/1481 -1.86 -1.82 ng, chelation BEST Arabidopsis thaliana protein match is: Heavy metal transport/detoxification superfamily protein (TAIR:AT5G50740.1); Has 5078 Blast hits to and storage 3127 proteins in 372 species: Archae - 4; Bacteria - 284; Metazoa - 1364; Fungi - 240; Plants - 2332; Viruses - 113; Other Eukaryotes - 741 (source: NCBI BLink). & (loc_os04g32030.1 : 188.0) no description available & (reliability: 426.0) & (original description: no original description) (at1g09240 : 421.0) Encodes a nicotianamine synthase.; nicotianamine synthase 3 (NAS3); CONTAINS InterPro DOMAIN/s: Nicotianamine synthase metal (InterPro:IPR004298); BEST Arabidopsis thaliana protein match is: nicotianamine synthase 4 (TAIR:AT1G56430.1); Has 198 Blast hits to 195 handling.bindi proteins in 46 species: Archae - 20; Bacteria - 10; Metazoa - 0; Fungi - 20; Plants - 147; Viruses - 0; Other Eukaryotes - 1 (source: NCBI BLink). & 15.2 c28825/f15p14/1398 -1.78 ng, chelation (q9fxw5|nas3_orysa : 330.0) Nicotianamine synthase 3 (EC 2.5.1.43) (S-adenosyl-L-methionine:S-adenosyl-L-methionine:S-adenosyl-methionine 3- and storage amino-3-carboxypropyltransferase 3) (OsNAS3) - Oryza sativa (Rice) & (loc_os07g48980.1 : 330.0) no description available & (reliability: 842.0) & (original description: no original description)

154