Critical Players and Gene Expression Regulation in Eucalyptus Xylogenesis

Victor João Martins Taveira Carocha

Dissertation presented to obtain the PhD degree in Biology

Instituto de Tecnologia Química e Biológica | Universidade Nova de Lisboa

Oeiras, Portugal

November, 2016

The work presented in this thesis was performed at:

Laboratório de Biotecnologia de Células Vegetais BioResources 4 Sustainability Unit (GREEN-IT) Instituto de Tecnologia Química e Biológica Universidade Nova de Lisboa. Av. República, Quinta de Marquês, 2780-157 Oeiras, Portugal.

Laboratoire de Recherche en Sciences Végétales, Centre National de la Recherche Scientifique (CNRS), BP42617, Auzeville, 31326 Castanet Tolosan, France.

Instituto de Biologia Experimental e Tecnológica Avenida da República, Estação Agronómica Nacional, 2780-157 Oeiras, Portugal

Ph.D. Supervisors:

Jorge Almiro Barceló Caldeira Pinto Paiva, PhD, Senior Researcher, Institute of Genetics of the Polish Academy of Sciences (IPG PAN), Department of Integrative Plant Biology, Strzeszynka 34, 60-479 Poznan, Poland.

Jacqueline Grima-Pettenati, PhD, Directeur de recherche, Centre National de la Recherche Scientifique (CNRS), Laboratoire de Recherche en Sciences Végétales, Université Toulouse III, UPS, CNRS. BP 42617, Auzeville, 31326 Castanet Tolosan, France.

Manuel Pedro Salema Fevereiro, PhD, Assistant Professor, Departamento de Biologia Vegetal, Faculdade de Ciências da Universidade de Lisboa (FCUL), Campo Grande, 1749-016 Lisboa, Portugal. Laboratório de Biotecnologia de Células Vegetais, Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa. Av. República, Quinta de Marquês, 2780-157 Oeiras, Portugal.

The student, Victor João Martins Taveira Carocha, received financial support from Fundação para a Ciência e Tecnologia and Fundo Social Europeu within the scope of the Quadro Comunitário de Apoio through the PhD fellowship SFRH/BD/72982/2010. The activities in this PhD project were financed by FCT – Fundação para a Ciência e Tecnologia and ANR – L'Agence Nationale de la Recherche under the research projects microEGo (FCT PTDC/AGR-GLP/098179/2008), DataStorm (EXCL/EEI-ESS/0257/2012) and P-KBBE Treeforjoules (ANR-2010-KBBE-007-01 and FCT P-KBBE/AGR_GPL/0001/2010).

ACKNOWLEDGMENTS/AGRADECIMENTOS

In 2011, this PhD project came after a time of abrupt changes in my professional life. Good fortune came with a challenge made by three researchers for whom I had great professional regard. After five years of productive and dynamic professional experiences under the supervision of Doctor Jorge Paiva, Professor Pedro Fevereiro and Professor Jacqueline Grima-Pettenati, I feel fully rewarded for having accepted your challenge. Thank you for receiving me in your labs! Beyond being supervisors, you became friends and motivators. I was greatly privileged by your time and honored by your supervision and your trust. I thank you, Jorge, for the first invitation and for being always active, helpful, friendly and positive, especially in hard times. I thank you, Pedro, for giving me your trust and your enthusiastic, strong and productive perspectives on Science. I thank you, Jacqueline, for your kind, firm and sincere friendship, and for the great privilege of sharing with me your house, your loving family and good friends. I thank you all for sharing with me some of your immense scientific knowledge and for providing me with the resources to carry out these studies and enrich my professional life.

I acknowledge many other people for direct and indirect contributions to this thesis. From BCV, I thank you, Clara Graça, for all the moments of productive work, joy, trust and friendship that we shared. I was privileged to have the time and space to benefit from your enormous human dimension. I thank you, Francisco Fernandes, for your professional competence, your generosity and your human quality; you were decisive in overcoming the most difficult period of this thesis. I thank you, Susana Pêra, for your lucid serenity and for your support at several stages of this thesis. I thank you, Susana Araújo, for your brilliant personality and for your professional and personal support. I thank you, Rita Caré and José Parreira, for the long, excellent and friendly conversations. I thank Inês Trindade for her support in the work steps involving radioactive materials. I thank you all, Dulce Santos, André Almeida, Carlota Vaz Patto, Sofia Duque, Susana Neves, Priscila Pereira, Olívia Costa, Rita Morgado, Diana Branco, Maria Assunção, Claúdio Capitão, Nuno Almeida, Mara Alves, Susana Leitão, Catarina Bicho, Ana Ferreira, Tomás Carvalho, Joana Lérias, Joana Amado, Carolina Gomes, Sara Costa and Maria Medalhas, for your personal and professional support during these years. From the LRSV in Toulouse, I deeply thank you, my dear friends Marçal Soler and Anna Plasencia, for your contributions to this thesis and your truly competent professional help. Moreover, you warmly received me in your house and shared great moments of friendship, which I deeply prize. I thank you all, Hélène San-Clement, Isabelle Truchet, Hua Cassan-Wang, Bruno Savelli, Fabien Mounet, Yves Barrière, Nathalie Ladouce, Audrey Courtial, Eduardo Camargo, Hong Yu, Mohammed Najib, Raphaël Poyet and Annabelle Dupas, for scientific help and for making me feel that Toulouse was my second home. I also thank all the people from the KBBE project “Tree for joules”.
From the University of Pretoria, I thank Alexander Myburg and Charles Heffer for the data and interactions which made possible the exciting work around the release of the E. grandis genome. From the University of East Anglia, I thank Tamas Dalmay, Simon Moxon and Mathew Stocks for suggestions during the bioinformatics predictions. From INESC-ID, I thank Ana Teresa Freitas, Nuno Mendes and Jorge Oliveira, and also Andreia Fonseca from IMM, for the first bioinformatics predictions. From IICT, I thank José Carlos Rodrigues, Ana Alves and Rita Simões for the excellent chemical analyses of the wood. From ISA, I thank Teresa Quilhó for the valuable anatomical characterizations of the wood. From Altri Florestal, our industrial partner, I thank Luís Leal, Clara Araújo and Lucinda Neves for their many suggestions, the excellent logistical support and the biological material used in this thesis.

From the old days at Torre Bela and Oeiras, I thank you, Cristina Marques, Fátima Cunha, Marta Melo, Carla Ribeiro, José Araújo, António Ramos, José Cardoso, José Rafael, Nuno Borralho and José G. Ferreira, for the friendship that endures and for having helped me, back then, to become a practitioner of the virtues of team spirit. I also remember the late and dearly missed Luís Lemos, whose memory will live on in us.

I deeply thank you, Luís Alves and Vítor Gonçalves, my long-time friends; E Pluribus Unum. I thank you, Teresa Baptista da Silva, Susana Alves, José Matos, Isabelle Milla, Cesaltina da Fonte and Alzira Ribeiro, for your strong support and constant friendship.

I thank all my dear family, especially those who have already departed and whom I keep in my heart. Thank you for your love, support and understanding. In particular, thank you, Silvestre, my dear Father, for so many years now an eternal Saudade. Thank you for your love and incorruptible values. And thank you, Albina, my dear Mother, for your values, your constant example and, above all, for your pure, unconditional and unlimited love. May this thesis make you proud.

LIST OF MOST USED ABBREVIATIONS

% percentage
ºC degrees Celsius
1w one week of bending stress
34 three and four weeks of bending stress
4CL 4-coumarate:CoA ligase
ARF auxin response factors
BAC bacterial artificial chromosome
bp base pairs
C3’H p-coumarate 3-hydroxylase
C4H cinnamate 4-hydroxylase
CAD cinnamyl alcohol dehydrogenase
CCoAOMT caffeoyl CoA 3-O-methyltransferase
CCR cinnamoyl CoA reductase
cDNA complementary DNA
CDS coding sequence
COMT caffeate/5-hydroxyferulate O-methyltransferase
CSE caffeoyl shikimate esterase
ct cycle threshold
DNA deoxyribonucleic acid
dsRNA double stranded RNA
DX developing xylem
Egl Eucalyptus globulus Labill.
Egr Eucalyptus grandis W. Hill
EST expressed sequence tag
F5H ferulate 5-hydroxylase
FDX fully developed xylem
Fig. figure
FL fibre length
FPKM fragments per kilobase of exon per million reads mapped
FW fibre width
FWT fibre cell wall thickness
G guaiacyl
kb kilobases
M molar
min minutes
miRNA micro RNA
miRNA* micro RNA star
mM millimolar
mRNA messenger RNA
MYB MYB: plant orthologs of myeloblastosis oncogenes transcription factors
NAC no apical meristem, ATAF1, ATAF2, and Cup-shaped cotyledon 2
ng nanogram
nt nucleotide
OW opposite wood
PAL phenylalanine ammonia-lyase
PCA principal component analysis
PCR polymerase chain reaction
RNA ribonucleic acid
RNA-Seq RNA sequencing
RT-qPCR reverse transcription quantitative polymerase chain reaction
S syringyl
SCW secondary cell wall
sRNA small RNA
TF transcription factor
TM trademark
TW tension wood
U units
v/v volume / volume
VN number of vessels per mm²
VTD vessel tangential diameter
w/v weight / volume
Yr year

Table of contents

ACKNOWLEDGMENTS/AGRADECIMENTOS
LIST OF MOST USED ABBREVIATIONS
Table of contents
List of Figures
List of Tables
GENERAL ABSTRACT
RESUMO GERAL (IN PORTUGUESE)
Objectives of this thesis
Organization of the manuscript

Chapter 1 - Introduction

Wood and its importance
The genus Eucalyptus
Genomic and genetic resources to study wood formation in Eucalyptus
Wood formation in trees
Cell wall organization
Chemical composition of plant cell walls
Wood complex properties
Wood variability
Tension wood formation and properties
Hormonal influence on wood formation
Transcriptional regulation of wood formation
Post-transcriptional regulation of wood formation

Chapter 2 - Genome-wide analysis of the lignin toolbox of Eucalyptus grandis

Abstract
Chapter introduction
Objectives
Materials and methods
Results
Discussion

Chapter 3 - Validation of a local assembly of a tandem duplicated array of genes involved in lignin biosynthesis in Eucalyptus grandis

Abstract
Chapter introduction
Objectives
Materials and methods
Results
Discussion

Chapter 4 - The Eucalyptus grandis R2R3-MYB transcription factor family: evidence for woody growth-related evolution and function

Abstract
Chapter introduction
Objectives
Materials and methods
Results
Discussion

Chapter 5 - Anatomical, chemical and transcriptomic dynamics during tension wood formation in Eucalyptus globulus

Abstract
Chapter introduction
Objectives
Materials and methods
Results
Discussion

Chapter 6 - Post-transcriptional regulation dynamics during Eucalyptus globulus tension wood formation

Abstract
Chapter introduction
Objectives
Materials and methods
Results
Discussion

Chapter 7 - General discussion and perspectives

Main achievements
Future perspectives

Supplementary data
References

List of Figures

Chapter 1

Figure 1.1 - Eucalyptus globulus flower-buds opening-up.
Figure 1.2 - Eucalyptus globulus Labill. (Tasmanian blue gum)
Figure 1.3 - Eucalyptus grandis W. Hill ex Maiden (Flooded gum or rose gum).
Figure 1.4 - Overview of procambial/cambial cell specification and xylem/phloem cell differentiation.
Figure 1.5 - Xylogenesis or wood formation.
Figure 1.6 - Cell wall structure in wood.
Figure 1.7 - Formation of tension and opposite wood in E. globulus in response to an artificially induced gravitropism stress.
Figure 1.8 - Multilayered hierarchical gene regulatory networks that govern root xylem cell specification and differentiation in Arabidopsis thaliana.
Figure 1.9 - Biogenesis and operation of plant micro RNA.

Chapter 2

Figure 2.1 - The PAL bona fide clade.
Figure 2.2 - The C4H, C3’H and F5H bona fide clades.
Figure 2.3 - The 4CL, HCT and CSE bona fide clades.
Figure 2.4 - The CCoAOMT and COMT bona fide clades.
Figure 2.5 - The CCR and CAD bona fide clades.
Figure 2.6 - The Eucalyptus lignification toolbox.
Figure 2.7 - Heatmaps of transcript accumulation patterns of 38 Eucalyptus putative monolignol biosynthesis genes.

Chapter 3

Figure 3.1 - Genomic disposition of the E. grandis tandem gene cluster nº 547.
Figure 3.2 - Electrophoretic separation of amplicons generated from genic and intergenic regions of a tandem array of eight E. grandis OMT genes.
Figure 3.3 - A targeted BAC sequencing approach to verify the genomic architecture of a tandem array of eight E. grandis OMT genes.

Chapter 4

Figure 4.1 - Neighbour-joining phylogenetic tree of R2R3-MYB proteins from five plant species.
Figure 4.2 - Presence/absence of R2R3-MYB genes belonging to the woody-preferential subgroups in different plant genomes.
Figure 4.3 - Physical position of the 141 R2R3-MYB genes in the 11 chromosome scaffolds of E. grandis.
Figure 4.4 - Heat map of the RNA-Seq transcript abundance pattern of the 141 R2R3-MYB genes from Eucalyptus grandis.
Figure 4.5 - Heat map of the transcript abundance patterns assessed using microfluidic qPCR of 63 Eucalyptus genes.

Chapter 5

Figure 5.1 - Design of a time course experiment of gravitropism stress induction and collection of developing xylem tissues from five-years-old E. globulus trees.
Figure 5.2 - Histochemical characterization of E. globulus tension and opposite wood tissues.
Figure 5.3 - Principal components analysis of chemical (synthetic pyrolysis products) and anatomical characteristics quantified in E. globulus TW and OW tissues.
Figure 5.4 – Workflow to identify differentially expressed loci in developing xylem tissues from E. globulus.
Figure 5.5 - Hierarchical clustering of E. globulus tension wood and opposite wood differentiating xylem samples.
Figure 5.6 - Expression profile of genes up-regulated in TW and OW of E. globulus, along a bending stress kinetics.

Chapter 6

Figure 6.1 – Workflow for miRNA prediction.
Figure 6.2 – Predicted secondary structures for 15 E. globulus miRNA precursors.
Figure 6.3 – Physical position of miRNA identified in DX tissues.
Figure 6.4 - Distribution of distinct E. globulus miRNA by length classes.
Figure 6.5 – The 15 most abundant mature miRNA in the transcriptome of E. globulus developing xylem tissues.
Figure 6.6 - Physical mapping of 132 loci encoding 192 E. globulus miRNA-encoding loci and 135 loci encoding 135 target genes in the E. grandis genome.
Figure 6.7 - Venn diagrams for comparative analyses of five E. globulus degradome libraries.
Figure 6.8 – Hierarchical clustering of E. globulus TW vs OW tissues samples based on transcript abundance of differentially expressed miRNA.
Figure 6.9 - Clusters of miRNA with homogeneous expression profiles in the E. globulus TW and OW kinetics.

Chapter 7

Figure 7.1 – A summary of distinctions between TW and OW tissues from E. globulus.

List of Tables

Chapter 3

Table 3.1 – Amplicons obtained from genic and intergenic regions of a tandem array of eight E. grandis OMT genes.

Chapter 5

Table 5.1 – Comparative anatomical characterization of E. globulus tension wood and opposite wood tissues sampled from developing xylem.
Table 5.2 – Comparative chemical characterization of E. globulus wood tissues from developing xylem and from fully developed xylem.
Table 5.3 - Chemical characterization (main groups of pyrolysis products) of E. globulus tension wood and opposite wood tissues along a tension wood induction kinetic.
Table 5.4 - Transcriptome sequencing (FPKM values) data for 93 differentially expressed loci in E. globulus tension wood and opposite wood.
Table 5.5 - Genes up-regulated in E. globulus opposite wood or tension wood associated to carbohydrate activity.

Note: A digital version of this thesis can be found in the following link:

http://www.itqb.unl.pt/~carocha/Phd_Thesis_VJCarocha.pdf

GENERAL ABSTRACT

Secondary xylem, commonly known as wood, is essentially formed by highly lignified secondary cell walls of both fibres and vessels. The process of formation of secondary xylem is also termed xylogenesis. Secondary xylem derives from the vascular cambium, whose dividing cells undergo irreversible differentiation, under a strict temporal and spatial control.

The genus Eucalyptus holds important ecological and economic value, since it includes some of the most widely planted hardwood trees worldwide. Among its nearly seven hundred species, Eucalyptus globulus and E. grandis stand out for their superlative socioeconomic value in temperate and tropical/sub-tropical regions, respectively.

The main objective of this doctoral thesis was to produce original research to improve our knowledge of the critical players and the gene regulation mechanisms prevalent during Eucalyptus xylogenesis.

In the first part of this thesis, two genome-wide surveys combined with comparative phylogenetics and gene expression profiling led to the identification of important structural and regulatory genes involved in the biosynthesis of Eucalyptus secondary cell walls. In the first survey, 175 genes from eleven multigene families involved in lignin biosynthesis were identified. In the second survey, 141 R2R3-MYB transcription factors were identified. The developmental and environmental expression patterns of these genes were profiled in wide-range tissue panels, including both vascular and non-vascular tissues. These results, combined with comparative phylogenetic analyses, led to the identification of a restricted set of 17 putative bona fide genes most likely involved in Eucalyptus vascular lignification. Moreover, subgroups containing R2R3-MYB genes which are key regulators of lignin biosynthesis and secondary cell wall formation were identified and found to be conserved in comparison with some model plant species.

In the second part of this thesis, the phenotypic and molecular dynamics of the formation of developing xylem (DX) tissues were studied in Eucalyptus globulus trees subjected to a time course of gravitropic stress. Comparison of the anatomical and chemical changes induced in tension and opposite wood tissues formed in the same trees confirmed the extensive impact of strong gravitropic stress stimuli on secondary cell wall formation.

Whole coding transcriptome data analysis after the generation and deep sequencing (mRNA-Seq) of twelve mRNA libraries led to the identification of 88 genes differentially expressed between tension and opposite wood tissues. Distinct carbon partitioning priorities and dynamic carbon fluxes for the biosynthesis of different cell wall components and for energy metabolism were evidenced between those contrasting tissues. Critical transcriptional modulation and hormonal regulatory mechanisms of E. globulus xylogenesis were also highlighted.

The dynamics of post-transcriptional regulation mediated by miRNAs during the differentiation of tension and opposite wood tissues were studied through the generation and deep sequencing (sRNA-Seq) of nine Eucalyptus globulus small RNA libraries. In total, 132 pre-miRNAs, corresponding to 162 distinct miRNAs, were identified (49% not previously reported). Among these, 71 mature miRNAs were found to be differentially expressed between tension and opposite wood tissues. Five degradome libraries were generated from the same tissues, and their deep sequencing analysis allowed the identification of 135 target genes regulated by 103 miRNAs (37% not previously reported). Two sets of miRNAs revealed differential and contrasting expression between tension and opposite wood tissues. The miRNAs over-expressed in tension wood included some most likely involved in abiotic stress responses, in particular in maintaining mineral homeostasis. The miRNAs preferentially expressed in opposite wood included some most likely involved in cambial activity and hormonal signaling.

Substantial interplay between transcriptional and post-transcriptional regulation was revealed, corroborating the importance of miRNA-mediated regulation during xylogenesis. The results also suggested the existence of species-specific and tissue-specific components of this interplay.

The scientific contributions of this thesis led to the identification and characterization of critical actors of Eucalyptus xylogenesis. The thesis also provided novel, advanced insights into the highly regulated mechanisms of wood formation. These advances provide foundational knowledge for future functional studies with the potential to define molecular breeding strategies for modulating cell wall properties, and therefore for engineering improved wood traits in eucalypts according to the required end-uses.

RESUMO GERAL (TRANSLATED FROM PORTUGUESE)

Secondary xylem, commonly known as wood, is essentially formed by fibre and vessel cells with highly lignified secondary walls. The process of xylem formation is also termed xylogenesis. Secondary xylem results from the meristematic activity of the vascular cambium, whose dividing cells differentiate irreversibly under strict temporal and spatial control.

The genus Eucalyptus has high ecological and economic value. Among the nearly seven hundred species of this genus, Eucalyptus globulus and Eucalyptus grandis stand out for their superlative socioeconomic value in temperate and tropical/sub-tropical regions, respectively.

The main objective of this doctoral thesis was to produce original research that contributes to advancing knowledge of the fundamental players and the prevalent gene expression regulation mechanisms associated with xylogenesis in Eucalyptus.

In the first part of this thesis, two genome-scale studies allowed the identification of important structural and regulatory genes involved in cell wall biosynthesis in Eucalyptus. In the first of these studies, 175 genes belonging to eleven multigene families involved in lignin biosynthesis were identified. In the second study, 141 R2R3-MYB transcription factors were identified. Analysis of the expression profiles of these genes in diverse panels of vascular and non-vascular tissues allowed their expression to be characterized across several developmental stages and environmental responses. These results, combined with comparative phylogenetic analyses, allowed the identification of a restricted set of 17 genes most likely involved in vascular lignification in Eucalyptus. In addition, conserved subgroups of R2R3-MYB genes that are known key regulators of lignin biosynthesis and secondary cell wall formation were identified.

In the second part of the thesis, the phenotypic and molecular dynamics induced in developing xylem formed in response to different periods of gravitropic stimulus were studied in Eucalyptus globulus. Contrasting profiles of anatomical and chemical changes revealed in differentiating xylem samples of tension and opposite wood formed in the same trees confirmed the strong impact of intense gravitropic stimuli on secondary cell wall formation.

Analysis of whole coding transcriptome data generated by high-throughput sequencing of twelve mRNA libraries allowed the identification of 88 genes differentially expressed between tension and opposite wood. In these contrasting tissues, distinct priorities were revealed in the partitioning and dynamic fluxes of carbon towards the various cell wall components and towards energy metabolism. Mechanisms of transcriptional modulation and hormonal regulation of xylogenesis in E. globulus were also revealed.

The dynamics of miRNA-mediated post-transcriptional regulation during the differentiation of tension and opposite wood were studied in E. globulus through the production and sequencing of nine sRNA libraries. A total of 162 distinct miRNAs were identified (49% of them never reported before). Among these, 71 miRNAs showed differential expression between tension and opposite wood tissues. Five degradome libraries were generated from the same tissues. Analysis of their high-throughput sequencing data allowed the identification of 135 target genes whose expression is regulated by 103 miRNAs (37% of them never reported before). Two distinct sets of miRNAs showed differential and contrasting expression in tension and opposite wood tissues. The miRNAs preferentially expressed in tension wood include some most likely involved in regulating responses to different abiotic stresses, in particular mineral homeostasis. The miRNAs preferentially expressed in opposite wood included some most likely involved in regulating cambial activity and hormonal signalling.

Substantial interactions between transcriptional and post-transcriptional regulation mechanisms were revealed, confirming the importance of miRNA-mediated regulation during xylogenesis. The data also suggest the existence of species-specific and tissue-specific components associated with these interactions.

This thesis contributed to the identification and characterization of important players in Eucalyptus xylogenesis. It also provided a new, more advanced understanding of the highly regulated mechanisms responsible for wood formation, which is fundamental knowledge for future functional studies. Such studies hold high potential for defining molecular breeding strategies aimed at modulating cell wall properties and thereby optimizing eucalypt wood traits according to the intended end-uses.


Objectives of this thesis

The overall objective of my PhD project was to contribute to a better knowledge of the critical players and gene expression regulation mechanisms associated with the formation of vascular tissues and the determination of wood properties in Eucalyptus.

The specific objectives were:

● To conduct a bibliographic review of current knowledge on basic aspects related to wood formation and properties (Chapter 1).

● To carry out genome-wide surveys to identify the genes belonging to the eleven families involved in Eucalyptus grandis monolignol biosynthesis and to identify a core lignification gene set involved in developmental xylem lignification in Eucalyptus (Chapter 2).

● To contribute to the experimental validation of gene tandem duplication loci, a major feature of the Eucalyptus grandis genome (Chapter 3).

● To perform a genome-wide survey to identify members of the R2R3-MYB family in Eucalyptus grandis and to distinguish which of those genes could be involved in the regulation of secondary cell wall formation and woody biomass production (Chapter 4).

● To characterize the phenotypic and transcriptomic dynamics during the differentiation of tension and opposite wood tissues from Eucalyptus globulus (Chapter 5).

● To assess and compare the temporal and spatial dynamics of miRNA-mediated post-transcriptional regulations which occur during the formation of tension and opposite wood tissues from Eucalyptus globulus (Chapter 6).

● To gain insights into the biological impacts of transcriptional and post-transcriptional (miRNA-mediated) regulations and their cross-talk interactions, to expand our knowledge about the formation of tension and opposite wood from Eucalyptus globulus under a gravitropic stimulus (Chapters 5-7).

Organization of the manuscript

This thesis starts with a “General Abstract” and a description of the “General Objectives”, followed by seven main chapters. Chapter 1 is a “General Introduction”; the following five chapters (Chapters 2 to 6) deal with original research. Each of them is presented in the form of a research article and includes thematic discussions of the results. The final chapter, “General discussion and perspectives”, provides a global discussion of the main achievements of my PhD and of perspectives and strategies for future work.

Chapter 1, “Introduction”, was built on an extensive bibliographic review. This chapter provides a synthesis of current knowledge covering basic aspects of wood formation and wood properties. The ecological and economic relevance of some species of the genus Eucalyptus is mentioned. Available genomic resources for the study of wood formation in this genus are described. Basic concepts of wood biogenesis, cell wall structural organization, composition and variability, and the inherent properties of wood are summarized. Next, current knowledge concerning the formation of tension and opposite wood, their basic, diverse and contrasting properties, and the influence of hormonal control mechanisms is briefly summarized. Finally, the nature and influence of transcriptional and post-transcriptional regulation mechanisms in wood formation are briefly introduced.

In the first years of my PhD, I had the privilege of participating in the international effort for the release of the fully annotated sequence of the Eucalyptus grandis genome. This effort was collectively built as a follow-up to a massive sequencing and data analysis project, which was conducted at the DOE JGI (U.S. Department of Energy Joint Genome Institute, Walnut Creek, California, USA). It involved the participation of 80 scientists from nine countries and 34 laboratories. This participation allowed me to develop collaborative research work with scientists from the laboratories of my supervisors at LRSV-CNRS (Professor Jacqueline Grima-Pettenati, Toulouse, France), IBET (Doctor Jorge Paiva, Lisboa, Portugal) and ITQB (Professor Pedro Fevereiro, Oeiras, Portugal). I also collaborated with scientists from the team of Doctor Alexander Myburg (DG-FABI, University of Pretoria, South Africa). Doctor A. Myburg was one of the leading scientists of this ambitious project. The research contributions to this international effort in which I participated are condensed in Chapters 2, 3 and 4. These three chapters are the first part of the original research developed in my doctoral studies.

Chapter 2 presents a genome-wide survey to identify members of the eleven multigene families involved in monolignol and lignin biosynthesis in E. grandis. In this study, I explored the extensive sequence databases produced by the E. grandis genome sequencing project to identify those genes. Comparative phylogenetic studies of those genes with orthologs previously studied in other plant species were performed to provide insights into the evolutionary history of each multigene family and to identify putative bona fide clades of enzymes involved in monolignol biosynthesis. Expression profiling using both RNA-Seq and RT-qPCR technologies was performed to examine some developmental and environmentally responsive expression patterns of those genes. In the final stage of this study, I focused on identifying which members of those families are involved in vascular lignification in Eucalyptus. I performed the experimental work and the data analysis, and wrote the first draft. A part of this study was published in the genome paper (Myburg et al., 2014, Nature 510: 356-362), and the main core was published as a companion paper of the genome paper in a dedicated issue of New Phytologist (Carocha et al., 2015, New Phytol. 206: 1297–1313).

Chapter 3 presents the experimental validation of a local sequence assembly of a short region of the Eucalyptus grandis genome containing a tandemly duplicated array of eight similar genes. It was a contribution to a collective effort aimed at validating the assembly of specific regions of the E. grandis sequenced genome (v1.0). I performed the experimental work and the data analysis, which were partially included in the genome paper published in Nature (Myburg et al., 2014, Nature 510: 356-362).

Chapter 4 presents a genome-wide identification of members of the R2R3-MYB transcription factor family in Eucalyptus grandis. The objectives of this study were similar to those described for the study presented in Chapter 2. The R2R3-MYB transcription factor family is one of the largest and most important transcription factor families in higher plants, since its members control a wide variety of plant-specific processes, including secondary cell wall formation. I participated in some parts of the experimental work, including gene expression profiling. I also contributed to data treatment and analysis. This study was published in New Phytologist (Soler et al., 2015, New Phytol. 206: 1364–1377).

The second part of the original research of my PhD studies is presented in Chapters 5 and 6, which focus on the formation of tension and opposite woods in Eucalyptus globulus. In angiosperm trees, tension and opposite wood tissues are formed on the two opposite sides of bent trunks in response to gravitropism. Tension and opposite wood are highly contrasting wood-forming tissues and are therefore considered an excellent model to study the formation of xylem cell walls.

Chapter 5 presents the dynamics of the anatomical, chemical and coding transcriptomic changes that occur during the differentiation of tension and opposite wood tissues in E. globulus. This study aimed to obtain deeper insights into the phenotypic and molecular plasticity of xylogenesis in E. globulus. To achieve this objective, an unprecedented experimental field trial was implemented, allowing a time course of tension and opposite wood differentiation to be followed. This kinetics experiment comprised four different periods of artificial gravitropism induced in bent trunks of mature trees. I participated in most of the experimental work and performed most of the data analysis. Parts of this chapter were included in a research article that was recently submitted for publication in an international journal.

Chapter 6 presents the dynamics of miRNA-mediated post-transcriptional regulation during the formation of tension and opposite wood tissues in Eucalyptus globulus. Genome-wide identification of microRNAs (miRNAs) was performed using sequence data from the non-coding transcriptomes of E. globulus tension and opposite woods formed at different times of the tension wood kinetics. Large-scale identification of miRNA target genes was achieved by degradome sequencing analyses of the same tissues. Functional categorization of those genes provided insights into the biological processes, molecular functions and cellular components under miRNA-mediated post-transcriptional regulation. I participated in most of the experimental work and performed most of the data analysis, including the bioinformatics predictions. Parts of this chapter were included in a research article that was recently submitted for publication in an international journal.

Finally, the main results and perspectives for future work are summarized in Chapter 7, General discussion and perspectives.

Chapter 1

Introduction

Wood and its importance

Wood is a heterogeneous, hard fibrous, structural, biological tissue that makes up the greater part of stems, branches and roots of trees and other woody plants. Shaped by sequential stacked growth rings, wood confers a stable structure to the trees, provides mechanical support and channels for the conduction of water, hormones (produced in root tips), nutrients and minerals.

For thousands of years and until the XXI century, wood has been the prime raw material for both firewood and construction. Its availability, together with improved techniques to process and manufacture it into countless forms convenient for human life, has benefited and comforted the lives of our ancestors. Wood is the main source of terrestrial biomass and, unlike coal and oil, it is a natural and renewable source of energy (Du and Yamamoto, 2007; Plomion et al., 2001).

Forests are estimated to cover about 4 billion hectares, which represents 31% of the land surface of our planet (FAO, 2013).

Forests are major sinks for excess atmospheric CO2, thereby reducing one of the major contributors to global warming (Boudet et al., 2003; Plomion et al., 2001). Given their ecological impact, forests are recognized as having multiple functions. They are increasingly valued for soil protection, watershed management, protection against avalanches and provision of biodiversity.

In the last decades, increased economic activity and world demand for wood as a raw material have considerably increased the pressure on forest land. In this scenario, large-scale commercial tree plantations have reduced the net loss of forest area and satisfied some of the industry demand. Currently, the area of planted forest has increased and accounts for 7% of the total forest area, i.e. 264 million hectares (FAO, 2013). These statistics highlight the major role that forests play in human activities and Earth sustainability.

In 2010, forests occupied around 35% of the national territory of Portugal, representing about 3.2 million hectares (ICFN, 2013). Of these, 72% (around 2.3 million hectares) are occupied by the three species with the highest commercial value: maritime pine (Pinus pinaster) with 714 thousand ha, cork oak (Quercus suber) with 737 thousand ha and Eucalyptus (mostly Eucalyptus globulus), which leads with 812 thousand ha (CELPA, 2015).
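As a rough cross-check of the figures quoted above, the three areas can be summed and expressed as a share of the national forest area. The following is only an illustrative sketch: the variable names are hypothetical and the total of 3,200 thousand ha is the rounded value quoted in the text. The result (about 2.26 million ha, roughly 71%) differs slightly from the quoted 72%, possibly because of rounding or because the figures come from different sources.

```python
# Areas in thousand hectares, as quoted in the text (CELPA, 2015).
species_area_kha = {
    "Pinus pinaster": 714,
    "Quercus suber": 737,
    "Eucalyptus (mostly E. globulus)": 812,
}

# Rounded national forest area quoted in the text (about 3.2 million ha).
total_forest_kha = 3200

# Combined area of the three species and its share of the national forest area.
combined_kha = sum(species_area_kha.values())
combined_share = combined_kha / total_forest_kha

for name, area in species_area_kha.items():
    print(f"{name}: {area} thousand ha ({area / total_forest_kha:.1%} of forest area)")
print(f"Combined: {combined_kha} thousand ha ({combined_share:.1%})")
```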

The genus Eucalyptus

Figure 1.1 - Eucalyptus globulus flower buds opening. Source: (Cerasoli et al., 2016).

The term “Eucalyptus” derives from a combination of the Greek words “eu”, meaning “well”, and “calypto”, meaning “covered”, alluding to the cap protecting the flower buds.

The first species belonging to the genus Eucalyptus, E. obliqua, was described by L’Héritier de Brutelle in 1789 (Brooker, 2000; Nelson, 1983) from specimens collected by David Nelson during the third Pacific expedition commanded by Captain James Cook (EUCLID: eucalypts of southern Australia, Centre for Plant Biodiversity Research, 2002). The origin and evolution of this genus have been reviewed in Myburg et al. (2007). It belongs to a basal Rosid lineage (Myrtales order, Myrtaceae family), which evolved mostly in the isolation of the Australian continent and therefore represents an independent evolutionary experiment for studies of the woody perennial lifestyle (Myburg et al., 2014).

The Eucalyptus genus includes around 700 species (Brooker, 2000) native to Australia and nearby islands. The genus is highly diverse and displays significant adaptability and phenotypic plasticity. Eucalypts come in a great range of shapes and sizes, from tall trees to small multi-stemmed shrubs. Among its thirteen subgenera, the most important is Symphyomyrtus, comprising about 474 species including those of commercial importance (Grattapaglia et al., 2012, 2015). Only 20 or so of those species have been extensively used in commercial forest plantations, given that they are fast and easy to grow, provide high forest productivity and deliver wood with high density, durability and interesting fibre properties (Grattapaglia et al., 2015; Villar et al., 2011). A relevant anatomical feature of eucalypts is their relatively short and uniform fibre length, with low coarseness compared to other hardwoods frequently used as pulpwood sources. These valuable wood properties make Eucalyptus highly suitable as a raw material for important sectors such as pulp and paper production and solid wood products (Potts et al., 2004; CELPA, 2012). Additionally, in the context of a global carbon economy, Eucalyptus species are increasingly used as fast-growing woody crops for renewable biomass and bioenergy production, since they present short rotation times, high productivity and adaptability to a broad range of environments (Eldridge et al., 1993; Myburg et al., 2007; Rockwood et al., 2008). However, eucalypts are still regarded as a largely unexplored source of woody biomass for biofuel production (Shepherd et al., 2011).

Among those hundreds of species, Eucalyptus globulus (Tasmanian blue gum, Southern blue gum or blue gum) (Fig. 1.2) and Eucalyptus grandis (flooded gum or rose gum) (Fig. 1.3) stand out for their substantial social and commercial value in temperate and in tropical and sub-tropical countries, respectively. These include over 90 countries from tropical regions of Asia, Australia, Africa and South America, as well as from temperate regions in South West Europe, South and North America, Australia and Africa (Çetinkol et al., 2012; Shepherd et al., 2011). The first specimen of Eucalyptus globulus (section Maidenaria) was described by the French botanist J. Labillardière in 1800 (http://id.biodiversity.org.au/instance/apni/454621). Eucalyptus grandis (section Latoangulatae) was first described by W. Hill in 1862 (http://id.biodiversity.org.au/instance/apni/455071).

The two species have been extensively used in commercial plantations and are among the leading sources of wood biomass used worldwide for short-fibre pulpwood and timber (Grattapaglia et al., 2012; Kullan et al., 2012).

Figure 1.2 - Eucalyptus globulus Labill. (Tasmanian blue gum). Source of the image of the native Australian distribution of E. globulus: www.florabank.org.au.

Figure 1.3 - Eucalyptus grandis W. Hill ex Maiden (flooded gum or rose gum). Source of the photo of a single E. grandis (BRASUZ1) tree: DOE Joint Genome Institute on Flickr. Source of the image of the native Australian distribution of E. grandis: www.florabank.org.au.

Commercial plantations of eucalypts have expanded in the last 60 years and have been estimated to occupy at least 20 million ha worldwide, in tropical and subtropical areas on four continents (Iglesias and Wiltermann, 2009). Eucalyptus grandis W. Hill ex Maiden is the most widely planted forest species in those commercial plantations because of its fast growth. Eucalyptus globulus Labill. is recognized as the paradigm species for short-fibre excellence, which makes it the ideal source of lignocellulosic raw material for the pulp and paper industry. In particular, commercial plantations of E. globulus have been estimated to occupy around 2.3 million ha (Iglesias and Wiltermann, 2009). In fact, only a few countries in the world, including Portugal, are able to sustain viable, profitable commercial plantations of E. globulus. Currently, Portugal is the third biggest European producer of pulp (7.2% of the total), the eleventh biggest producer of pulp and paper (2.4%) and the second biggest producer of non-coated cardboard (21.5%). The importance of this sector is reflected in the numbers, as it represented 4% of the GDP (gross domestic product) of Portugal in 2012 and 5% of all exports in 2013 (CELPA, 2015). Although Portugal has some highly favorable edapho-climatic conditions, important restrictions are still imposed on forest productivity, such as the cost of labor, soil availability and heterogeneity. In this context, and considering the increasing competitiveness of Southern hemisphere commercial plantations and industrial operations, novel and informed strategies for increasing productivity and wood quality are required (Bedon et al., 2011).

Genomic and genetic resources to study wood formation in Eucalyptus

To date, around 27 genetic maps integrating thousands of molecular markers are available for six different species of Eucalyptus (including E. globulus and E. grandis) (reviewed in Bartholomé et al., 2015; Grattapaglia et al., 2012). Some of those are highly saturated genetic maps, with large numbers of microsatellites, biallelic “Single Nucleotide Polymorphisms” (SNPs) and “Diversity Array Technology” (DArT) markers and occasionally candidate genes for wood formation and wood quality traits. Those maps are key resources for advanced studies on comparative genome analysis, whole genome assembly, QTL (Quantitative Trait Loci) mapping and QTL validation across pedigrees (Bartholomé et al., 2015; Grattapaglia et al., 2015).

Before the availability of the Eucalyptus genome sequence, i.e. from the early 1990s to 2011, some targeted studies enabled the identification and characterization of key structural genes involved in the biosynthesis of secondary cell wall (SCW) biopolymers such as lignin (Gion et al., 2000; Grima-Pettenati et al., 1993; Lacombe et al., 1997), cellulose (Creux et al., 2011, 2008; Lu et al., 2008; Ranik and Myburg, 2006) and heteroxylans (Goulao et al., 2011; Lopes et al., 2010). Notably, the gene encoding Cinnamoyl CoA Reductase (CCR), the first enzyme of the monolignol branch pathway, was first cloned in Eucalyptus (Lacombe et al., 1997).

A few Eucalyptus regulatory genes impacting the biosynthesis of SCW were also characterized (De Micco et al., 2012; Foucart et al., 2009; Goicoechea et al., 2005; Hussey et al., 2011; Legay et al., 2007, 2010).

Concurrently, large-scale projects involving either xylem transcriptome or proteome analyses provided catalogs of thousands of genes expressed during Eucalyptus xylogenesis. These studies have significantly extended our knowledge of the number and diversity of both structural and regulatory genes involved in xylem differentiation in Eucalyptus spp. (Barros et al., 2009; Camargo et al., 2014; Celedon et al., 2007; Elissetche et al., 2011; Foucart et al., 2006; Gallo de Carvalho et al., 2008; Hefer et al., 2015; Kirst et al., 2004; Külheim, 2010; Mizrachi et al., 2010, 2015; Novaes et al., 2008; Paux et al., 2004, 2005; Qiu et al., 2008; Rengel et al., 2009; Salazar et al., 2013; Shinya et al., 2014; Solomon et al., 2010).

Part of these studies used massively parallel deep sequencing technologies and became the largest contributors to the current repositories of genes and ESTs in public databases. In September 2016, GenBank listed a total of 167,436 ESTs from seven commercially important Eucalyptus species [Eucalyptus camaldulensis (58,584); Eucalyptus grandis (42,576); Eucalyptus globulus (28,949); Eucalyptus gunnii (19,841); Eucalyptus pellita (8,871); Eucalyptus urophylla (7,440); Eucalyptus tereticornis (1,131)] and one hybrid [Eucalyptus grandis x Eucalyptus nitens (44)]. The existence of other large private EST database resources, produced from tissues and organs of other Eucalyptus species and generated in a wide range of developmental and environmental conditions, has also been reported (reviewed in Grattapaglia et al., 2012). Sequencing of ESTs has been carried out either randomly, to generate an index of expressed genes, or has been directed to discover genes that are selectively expressed, typically during wood formation or in response to abiotic or biotic stresses (Grattapaglia et al., 2015).
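The per-species counts quoted above can be tallied as follows. This is only an illustrative sketch using the numbers given in the text; the variable names are hypothetical and nothing is queried from GenBank. The tally suggests that the quoted total of 167,436 includes the 44 ESTs of the hybrid.

```python
# EST counts per taxon as quoted in the text (GenBank, September 2016).
est_counts_species = {
    "Eucalyptus camaldulensis": 58_584,
    "Eucalyptus grandis": 42_576,
    "Eucalyptus globulus": 28_949,
    "Eucalyptus gunnii": 19_841,
    "Eucalyptus pellita": 8_871,
    "Eucalyptus urophylla": 7_440,
    "Eucalyptus tereticornis": 1_131,
}
est_counts_hybrid = {"Eucalyptus grandis x Eucalyptus nitens": 44}

# Totals with and without the hybrid entry.
species_total = sum(est_counts_species.values())
grand_total = species_total + sum(est_counts_hybrid.values())

# Relative contribution of each species to the grand total, largest first.
for name, count in sorted(est_counts_species.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {count} ESTs ({count / grand_total:.1%})")
print(f"Species only: {species_total}; including hybrid: {grand_total}")
```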

Gene expression profiling in various organs and/or under different conditions has been a powerful tool to gain insights into the extent and relative importance of alternative splicing forms from identical loci. It is also important for identifying which genes are most relevant for certain critical biological processes, and which could be the most interesting for advanced, highly focused functional characterization studies. Such studies will likely improve our understanding of the actual cellular roles of the selected genes, their effective biological impact and, in certain cases, their participation in regulatory networks.

A major breakthrough was the release of the fully annotated sequence of the genome of E. grandis (Myburg et al., 2014), publicly accessible since 2011 in the Phytozome database (http://www.phytozome.net/) (Goodstein et al., 2012). Eucalyptus grandis [genome size, 640 Mbp (Myburg et al., 2014)] was the second hardwood forest tree species whose genome was entirely sequenced (Myburg et al., 2014). The genome of Eucalyptus globulus [530 Mbp (Myburg et al., 2014)] is still not publicly available.

In this exciting whole-genome cartography context, interest in the genomics of Eucalyptus has grown rapidly, with increasing numbers of functional genomics studies and whole-transcriptome sequencing studies linked to digital transcript counting (RNA-Seq) of various Eucalyptus species, organs, tissues, developmental stages and environmental conditions (Camargo et al., 2011; Villar et al., 2011). Some of these studies accomplished the genome-wide identification and characterization of important structural multigene families, such as those involved in the biosynthesis of phenylpropanoids and lignin (Carocha et al., 2015) and of cellulose and heteroxylans (Myburg et al., 2014).

Similar studies published in the same time period equally contributed to boost our ability to identify genes likely involved in the transcriptional regulation of wood formation. Examples are the genome-wide analyses of families of key transcriptional regulators such as MYB (Soler et al., 2015), NAC (Hussey et al., 2015), Aux/IAA (Yu et al., 2014), ARF - Auxin Response Factors (Yu et al., 2015) and AP2/ERF (Cao et al., 2015).

Wood formation in trees

Wood, also known as secondary xylem, is essentially formed by the highly lignified secondary cell walls of both fibres and vessels. Wood formation is also called xylogenesis. The developmental process that produces woody stems is known as secondary growth; it results in the diametral growth of stems and is mainly determined by the activity of the vascular cambium.

Vascular cambium is an internal meristem embedded between the phloem and the xylem. Unlike apical meristems, the cambium is a complex tissue containing two morphologically (size and shape) distinct meristematic cell types: axially elongated fusiform initials and isodiametrical ray initials (Déjardin et al., 2010; Mellerowicz et al., 2001). These cells originate from procambium cells, an innermost primary lateral meristem previously derived from a de novo differentiation of parenchyma cells (Fig. 1.4). Both of these lateral meristems result from hormone-driven cellular recruitment and re-differentiation processes (Schuetz et al., 2013).

This active population of meristematic cells (stem cells) is able to divide in three planes (tangential, transversal and radial). Depending on the type of initial cells and on the kind of division, the meristematic cells are able to differentiate into a variety of precursor cell types of secondary phloem on the outside of the ring and xylem on the inside (Fig. 1.4).

The outer-layer phloem precursor cells differentiate into secondary phloem and the inner-layer xylem precursor cells into secondary xylem (Schuetz et al., 2013). In addition, some cambium “stem cells” are also produced and remain undifferentiated to feed the next rounds of cell division and differentiation.

Figure 1.4 - Overview of procambial/cambial cell specification and xylem/phloem cell differentiation. Note: the different cell types are not drawn to scale. Extracted from Schuetz et al. (2013).

The transition from meristematic cambium initials to the diverse fully-differentiated cell types of wood occurs through successive cycles of three dynamic, partially overlapping developmental stages: cell division, initial and late cell expansion (which involve elongation and radial enlargement) and SCW deposition (Chaffey et al., 2002; Déjardin et al., 2010; Hertzberg et al., 2001; Paux et al., 2004; Pilate et al., 2004a; Ye, 2002) (Fig. 1.5).

Figure 1.5 - Xylogenesis or wood formation. (A) Cross-section of a 6-month-old aspen stem (adapted from Baucher et al., 2007; Schrader, 2003), scale bar = 100 µm. (B) Schematic representation of the cambial zone and the differentiating xylem. The secondary cell wall sub-layers (S1, S2 and S3) and the stages for the deposition of the main polymers are indicated. The main steps of xylogenesis are represented (adapted from Hertzberg et al., 2001), except for programmed cell death because it takes place at different stages for vessels and fibres.

Finally, and closely associated with SCW deposition, programmed cell death occurs in both vessel elements (faster) and fibre cells, although at different stages (Bollhöner et al., 2012; Courtois-Moreau et al., 2009; Moreau et al., 2005).

This irreversible differentiation process leads to a tridimensional composite tissue composed of three main cell types with distinct cellular roles: i) xylem tracheary or vessel elements (responsible for transport of water, solute and hormones); ii) xylary fibres (structural support); and iii) xylem parenchyma cells (fulfilling diverse roles including translocation of metabolites between phloem and xylem and during the extensive lignification of SCWs occurring in neighboring cells). Only vessel elements and fibres undergo the extensive SCW formation which occurs between the plasma membrane and the primary cell wall and typifies xylogenesis (Didi et al., 2015; Schuetz et al., 2013).

Cell wall organization

The lignified secondary cell wall is a multi-layered composite with three major cell wall layers: the middle lamella, the thin primary wall and the large secondary cell wall per se (Arend, 2008; Fang et al., 2007) (Fig. 1.6). These layers are formed at distinct cell differentiation times (Mellerowicz et al., 2001; Plomion et al., 2001).

The middle lamella is formed first, at the time of cytokinesis, and ensures adherence between neighboring cells (Chaffey et al., 2002). The primary cell wall is formed next, during the growth phase of the cell, and consists of several layers of randomly oriented cellulose microfibrils interspaced by pectic substances and hemicelluloses, mostly xyloglucans (Plomion et al., 2001).

Figure 1.6 - (A) Cell wall structure in wood. Taxus canadensis, magnification: x22,000. Photo credits: BIOPHOTO ASSOCIATES. (B) Detail of the cell wall structure: middle lamella (ML), primary wall (PW), external secondary wall (S1), middle secondary wall (S2), internal secondary wall (S3) and cell lumen (CL). (C) Spatial arrangement of the main cell wall components (cellulose, hemicellulose and lignin) of lignocellulosic biomass. Image credits: U.S. Department of Energy Genome Programs.

The xyloglucan-enriched fraction maintains the strength of the primary cell wall while allowing its extensibility (Takahashi Schmidt, 2008). The absence of lignin allows the many cell wall-hydrolyzing enzymes to remodel the primary cell wall, thereby permitting cells to expand in both length and diameter (Mauriat et al., 2014).

Secondary cell walls are deposited after the cessation of cell growth, but only in specific cell types (Chaffey et al., 2002). Secondary cell wall formation occurs in the fibre cells and vascular elements to provide them with high rigidity, mainly due to the elevated levels of cellulose and hemicelluloses (glucuronoxylan) (Mauriat et al., 2014). Secondary cell walls (SCW) are formed by three sub-layers (S1, S2 and S3) composed of cellulose microfibrils with distinct orientations, in which hemicelluloses and lignin are also found (Plomion et al., 2001) (Fig. 1.5 and Fig. 1.6).

Chemical composition of plant cell walls

Secondary cell walls are composed of approximately 80% cellulose and hemicelluloses, with the remainder composed primarily of lignin (Boerjan et al., 2003; Mizrachi et al., 2012; Scheller and Ulvskov, 2010). In E. globulus, for instance, wood comprises cellulose (around 50%), lignin (around 20%) and hemicelluloses (24-27%) (Evtuguin and Neto, 2004).

Cellulose is the most abundant biopolymer synthesized on Earth, estimated to represent about 1.5 × 10¹² tons of the total annual biomass production (Klemm et al., 2005; Leisola et al., 2012). Given its properties and abundance, cellulose has a high commercial value and many industrial end-use applications. The bulk of this cellulose is deposited in the thick secondary cell walls of woody plants (Delmer and Haigler, 2002). Cellulose is deposited in the cell wall as long linear series of glucose residues bound through β-1,4 glycosidic bonds into straight glucan chains that are assembled into microfibrils stabilized by hydrogen bonding (Lindeboom et al., 2008). Cellulose confers strength to cell walls (Plomion et al., 2001; Turner and Somerville, 1997), which is derived from layers of cellulose microfibrils (Lindeboom et al., 2008). The biosynthesis of cellulose microfibrils is carried out by cellulose synthases (CesA), which form rosette-shaped multiprotein terminal complexes located outside but associated with the cell membrane (Mueller and Brown, 1980; Watanabe et al., 2015). These rosettes are composed of up to 36 individual catalytic subunits, each with cellulose synthase activity, and several models for rosette assembly and catalytic dynamics have been proposed (Brown Jr and Saxena, 2000). Live cell imaging of cellulose synthesis in secondary cell walls has recently been reported (Watanabe et al., 2015). Six CesAs were first described in E. grandis by Ranik and Myburg (2006). These authors reported the existence of two contrasting groups of apparently co-regulated CesAs, involved in either primary or secondary cell wall biosynthesis. More recently, a genome-wide search extended the number to a total of sixteen E. grandis CesA genes, but only six of these were found to be preferentially expressed in DX (immature) tissues (Myburg et al., 2014).

Hemicelluloses include several different polysaccharides which in plant cell walls present β-1,4-linked backbones with an equatorial configuration (Scheller and Ulvskov, 2010). In contrast with cellulose, these multiple, structurally complex hemicelluloses are highly substituted, frequently with O-linked acetate, in various degrees and patterns of acetyl esterification (Gille and Pauly, 2012). In E. globulus, xylans are the most abundant hemicelluloses (16-20%), followed by glucans (4-6%) and glucomannan. Pectic compounds account for 1-2% (Evtuguin and Neto, 2004).

Lignin is the second most abundant biopolymer on Earth, estimated to represent as much as 30% of the total biomass produced in the biosphere (Boerjan et al., 2003; Leisola et al., 2012). Both its abundance and its recalcitrance to biodegradation make lignin another major form of fixed carbon storage (Boudet et al., 2003). This structurally complex polyphenolic biopolymer is predominantly found in the secondary cell walls of xylem elements but also in the middle lamella (Boerjan et al., 2003). Lignin confers rigidity to support the cellulose fibres and provides hydrophobic surfaces in vessels, essential for water conduction (Bhuiyan et al., 2009). Lignin is mainly formed by the random oxidative polymerization of three monolignol monomers, p-coumaryl, coniferyl and sinapyl alcohols, which generate p-hydroxyphenyl (H), guaiacyl (G) and syringyl (S) units, respectively (Grima-Pettenati et al., 2012). Lignin composition and structure vary between and within species and also among tissues and organs of the same plant (Anterola and Lewis, 2002; Plomion et al., 2001). Lignin has also been reported to vary in different developmental processes, and in response to external conditions experienced by plants (Bhuiyan et al., 2009).

Lignin exhibits a complex chemical structure involving various ether (α-O-4, β-O-4, among others) and C-C (phenyl-phenyl) linkages (Guda et al., 2015). In hardwood lignin, the β-O-4 ether linkages are the most abundant and their cleavage is considered a critical reaction during the harsh alkaline or acidic industrial pretreatments used for lignocellulosic biomass delignification (Santos et al., 2013). Also in hardwoods, the relative amounts of syringyl (S) and guaiacyl (G) units have been correlated with the proportion of β-O-4 linkages present in the wood matrix. In particular, a higher S/G ratio has been correlated with more easily pulped lignin (Alves et al., 2011; Santos et al., 2013). Eucalyptus globulus presents lignin with an unusually high S/G ratio among hardwoods (G:S ratio of about 14:84) (Evtuguin and Neto, 2004) and a high frequency of β-O-4 linkages (Elissetche et al., 2011). The wood of eucalypts is thus more easily delignified and provides higher pulp yield (Aguayo et al., 2015).
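For illustration, the S/G ratio implied by the composition quoted above can be computed directly. This is a minimal sketch; the function name is hypothetical and the input values are simply the G:S proportions of about 14:84 cited in the text (Evtuguin and Neto, 2004).

```python
def s_g_ratio(s_units: float, g_units: float) -> float:
    """Return the syringyl-to-guaiacyl (S/G) ratio from relative unit amounts."""
    if g_units <= 0:
        raise ValueError("g_units must be positive")
    return s_units / g_units


# G:S proportions of about 14:84 quoted in the text for E. globulus lignin.
g_units, s_units = 14, 84
ratio = s_g_ratio(s_units, g_units)
print(f"S/G ratio: {ratio:.1f}")
```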

The composition of lignin affects the interaction between lignin and the carbohydrate components of the cell walls (Mellerowicz et al., 2001). The H- and G-lignin interact with pectins, while the S-lignin forms links with cross-linking glycans (Fukuda and Terashima, 1988). In fact, almost all lignin is covalently linked with polysaccharides (mainly hemicelluloses) to form lignin-carbohydrate complexes (Mikhail Balakshin, 2014).

Wood complex properties

The complex and highly variable properties of wood are determined by its anatomy (size, shape, and arrangement of different cell types) and by the physicochemical composition of the secondary cell walls.

The efficiency of industrial wood processing, and therefore the final industrial value and end-uses of wood, are determined by its ultrastructure. This ultrastructure depends on the relative amounts of, and the interactions among, the three major biopolymers deposited in the secondary cell walls of xylem cells: cellulose, lignin and hemicelluloses (Plomion et al., 2001). A common element in the organization of secondary cell walls is the presence of rigid cellulose microfibrils embedded in a gel-like matrix of heterogeneous, cross-linked, non-cellulosic polysaccharides (mainly hemicelluloses), aromatic heteropolymers (lignin) and proteins. These cell wall components are organized into a functional matrix, which has been the subject of several models focused on its functional organization (Keegstra, 2010).

Wood variability

Wood properties vary hugely among tree species, among genotypes within the same species and even within individual trees, depending on the position in the tree and the time when the wood is formed (Du and Yamamoto, 2007). Also, as long-lived organisms with a sessile lifestyle, trees are exposed to numerous external stimuli (e.g. temperature changes, photoperiod, soil water content, wind, gravity), which strongly influence the activity of the vascular cambium, the biosynthesis of secondary cell walls and, ultimately, the properties of wood (Zobel and Buijtenen, 1989).

The heterogeneity of wood is explained by the highly complex and variable anatomical and physicochemical properties of its different cell types. The number of cell types is optimized in accordance with the strategies developed by trees to sustain growth and ensure survival (Du and Yamamoto, 2007). This is quite evident in particular types of wood formed in contrasting conditions under the influence of environmental stimuli such as: i) seasonal stimulus and response variation, namely during the growing season (within the annual growth rings), and ii) gravitropic stimulus and response variation, which results in the formation of reaction wood (Plomion et al., 2001).

Remarkably, across the range of provenances and progenies, Eucalyptus globulus presents significant variation in the chemical composition of wood (Stackpole et al., 2011). Such variation in chemical traits has been evaluated at different levels, such as the lignin content, the lignin monomer composition (S/G ratio) and the contents of cellulose and extractives (Stackpole et al., 2011). This wide range of natural variation offers interesting opportunities for the study and characterization of Eucalyptus xylogenesis in projects aiming at the modulation of wood properties.

Tension wood formation and properties

The formation of tension wood following a gravitropism stress stimulus is a remarkable example of wood plasticity in angiosperms. In fact, in response to a gravitropic stimulus caused either by natural (e.g. soil subsidence, slopes, asymmetric crown shape, heavy loads, wind, snow) or artificial mechanical forces, angiosperm trees trigger a set of biological responses to react to a non-optimal orientation of the stem in an attempt to re-gain verticality (Du and Yamamoto, 2007). Under such stimuli, tension wood (TW) is formed on the upper, adaxial, convex side of the leaning stem while opposite wood (OW) is formed on the reverse, abaxial, concave side (Fig. 1.7). The formation of these highly contrasting tissues creates unequal tensile strengths in the opposite sides of the bent segment of the trunk.

Figure 1.7 - Formation of tension and opposite wood in Eucalyptus globulus in response to an artificially induced gravitropism stress. Tissues were stained with Astra Blue (stains cellulose in blue) and Safranin (stains lignin in red). (A) Detail of a well visible G-layer in the lumen of a cell formed in a tension wood tissue from poplar. The image from the field was kindly provided by J. Paiva (IPG PAN). The images from histochemical characterization of tension and opposite wood tissues were kindly provided by T. Quilhó (ISA). The lower image was modified from (Clair and Thibaut, 2001).

Interestingly, TW formation has also been frequently found in apparently straight, vertical dominant E. globulus trees and was suggested to confer on the trees an increased capacity to compete for light (Washusen and Ilic, 2001). This capacity is advantageous in dense forests in which trees experience significant canopy competition. In such ecosystems, trees reduce stem diameter and increase stem length in a process which involves the generation of tensile strengths that allow them to correct the orientation of the stem axis (Richter, 2014).

The two contrasting types of reaction wood exhibit distinctive mechanical and chemical properties (Déjardin et al., 2010). For instance, in comparison to normal straight wood, TW exhibits altered physical properties such as higher tensile stress, with fewer and smaller vessels of lower porosity (Paux et al., 2005; Pilate et al., 2004a). The most critical differences are observed in fibre cells, which are significantly longer and possess thicker cell walls and smaller lumens as compared to fibres from normal wood and OW (Jourez et al., 2001).

Chemical properties of TW are also considerably altered compared to OW, which shows characteristics far more similar to normal wood formed in a non-bent tree (Pilate et al., 2004b). In TW, an extra, translucent, gelatinous G-layer is often observed in the lumen of xylary cells, replacing the S3 and part or the whole of the S2 sub-layers (reviewed in Mellerowicz and Gorshkova, 2011). This G-layer, which is loosely attached to the other cell-wall layers (Du and Yamamoto, 2007), consists of highly pure crystalline cellulose microfibrils oriented almost parallel to the longitudinal axis of the fibre cell (Haygren and Bowyer, 1996; Jourez et al., 2001; Qiu et al., 2008).

Lignin content is generally strongly decreased in cell walls from TW (Aguayo et al., 2012; Timell, 1986; Yoshida et al., 2002). However, lignin structures have been visualized within the G-layer of poplar TW fibres (Joseleau et al., 2004). The same authors also suggested that S-lignin synthesis in TW fibres might be under specific spatiotemporal regulation targeted differentially throughout cell wall layers. Wood phenotyping and functional genomics studies pointed out that TW-specific modifications might reflect a modulation of lignin metabolism rather than an up-regulation of the cellulose biosynthetic pathway (Fagerstedt et al., 2007; Mizrachi et al., 2015).

Hemicellulose composition has also been found to differ markedly between TW and OW, for instance in the E. grandis x E. urophylla hybrid (Mizrachi et al., 2015). These authors showed that rhamnose and arabinose levels were not significantly different, while galactose, xylose and mannose concentrations were significantly lower in TW.

Altogether, the increased cellulose content, decreased lignin content, and smaller microfibril angle in the secondary cell walls of TW are likely to confer an increased tensile growth stress (Yoshida et al., 2002), causing shrinkage events when harvesting mature wood. In industrial processes, the presence of extensive TW causes abnormally high transverse and longitudinal shrinkage in solid wood and very high growth stresses that present difficulties during primary industrial wood processing (Vermaas, 1995; Washusen et al., 2002). This seriously affects the dimensional stability of Eucalyptus wood and is likely to restrict its use in solid wood industrial end-uses. On the other hand, both TW and OW of E. globulus have been demonstrated to be suitable raw materials for organosolv pretreatment and bioethanol production with high conversion yields (Muñoz et al., 2011).

Considering its biological significance and its impact on wood quality and industrial processes, TW formation has been considered an excellent model to study the formation of xylem cell walls (Arend, 2008; Pilate et al., 2004a).

Hormonal influence on wood formation

Phytohormones are synthesized in most plant organs and function either where they are synthesized or in other parts of the plant to which they are transported (Hellgren, 2003). In the context of xylogenesis, phytohormones impact vascular cambium activity in the stages of cell division, cell expansion and control of differentiation into different types of cambial derivatives (Mellerowicz et al., 2001).

Auxins and ethylene are known to stimulate cambial cell division and diameter growth which deeply impact wood development (Hellgren, 2003).

Auxins are also known to be involved in the control of tension wood formation (Little and Savidge, 1987; Timell, 1986). Although experimental support is still lacking, the presence of radial auxin concentration gradients across wood-forming tissues was suggested to regulate cambial activity and the differentiation of cambial derivatives (Bhalerao and Fischer, 2014). In this hypothesis, changes in the cellular auxin concentration, in tissue sensitivity to auxin and in polar auxin transport might provide positional information to cells, modulating cambial activity, dormancy, secondary cell wall deposition and tension wood formation (Bhalerao and Fischer, 2014; Ko et al., 2006; Moyle et al., 2002).

Interestingly, patterns of auxin distribution during gravitational induction of reaction wood have been studied in poplar and pine trees and the results suggested a role for signals other than auxins (in this case, IAA) in the reaction wood response (Hellgren et al., 2004). These authors also suggested that the gravitational stimulus interacts with the IAA signal transduction pathway. In the case of ethylene, changes in internal concentrations also seem to stimulate cambial growth, apparently by promoting cell radial growth and TW formation in poplar (Andersson-Gunnerås et al., 2003). In Eucalyptus gomphocephala, auxin-independent elevated ethylene levels (either higher production or preferential accumulation of ethylene) were associated with TW formation (Nelson and Hillis, 1978).

Other phytohormones also have an important role in the internal hormonal balances which influence xylogenesis. For instance, gibberellins are known to influence polar transport of auxin, and therefore are able to establish synergistic actions with auxins to impact cambial differentiation and fibre elongation (Little and Savidge, 1987). Cytokinins have been reported to be central regulators of procambium initiation (Yang and Wang, 2016). Cytokinins were shown to be critical for the distribution of efflux-carrier PIN-formed proteins which promote channeling of auxin towards the axis of xylem precursor cells (Yang and Wang, 2016). They have been shown to affect cell division during xylem formation, and cambial activity responds very sensitively to changes in cytokinin levels (Matsumoto-Kitano et al., 2008; Nieminen et al., 2008). Strigolactones have been shown to be involved in secondary growth in Eucalyptus globulus. They also seem to induce cambium division and are required in the IAA (indole-3-acetic acid) signaling pathway which controls wood formation (Agusti et al., 2011). Finally, jasmonic acid signaling has been linked to cells under tension stress in the interfascicular fibres of Arabidopsis thaliana (Sehr et al., 2010).

Excellent reviews are available on the biology behind the influence of phytohormones during TW formation (Bhalerao and Fischer, 2014; Du and Yamamoto, 2007), and on the role of these molecules in gravity sensing (Lopez et al., 2014) and mechanical perception in plants (Telewski, 2006).

Transcriptional regulation of wood formation

With the formation of diverse cell types, xylogenesis illustrates one of the most remarkable biological examples of highly ordered, adaptive and precisely spatiotemporally controlled plant cell differentiation mechanisms (Chaffey et al., 2002; Déjardin et al., 2010; Paux et al., 2005; Pilate et al., 2004a). Xylogenesis requires the precisely coordinated expression in time and space of a very large number of structural and regulatory genes involved in its sequential steps, from cell expansion to secondary cell wall deposition and in fine programmed cell death. Pioneering work has shown that transcriptional regulation is the chief mechanism regulating this differentiation process (Andersson-Gunnerås et al., 2003; Demura and Fukuda, 2007; Hertzberg et al., 2001; Ranik et al., 2006).

Recent research, mainly performed in the model plant Arabidopsis, showed that the transcriptional regulation controlling secondary cell wall deposition at the spatial and temporal levels is achieved through multileveled, complex and highly hierarchical networks of transcription factors, mostly including MYB and NAC (NO APICAL MERISTEM, ATAF1, ATAF2, AND CUP-SHAPED COTYLEDON 2) proteins (Fig. 1.8). This research has resulted in excellent reviews which proposed increasingly complex and comprehensive transcriptional regulatory networks that govern secondary cell wall biosynthesis (see Druart et al., 2007; Hussey et al., 2013; Ruzicka et al., 2015; Taylor-Teeples et al., 2015; Yang and Wang, 2016; Zhong et al., 2010).

Figure 1.8 - Multilayered hierarchical transcriptional regulatory networks that govern secondary cell wall (SCW) biosynthesis in Arabidopsis thaliana. Extracted from (Yang and Wang, 2016). SCW deposition is regulated by a large number of TFs through both hierarchical and non-hierarchical regulatory networks. At least three layers of regulators are involved in the regulation of the expression of SCW biosynthetic genes (including some of those involved in the biosynthesis of cellulose, hemicelluloses and lignin). These layers include NAC domain master regulators in tier 3, two MYB domain regulators in tier 2 and many other regulators in tier 1 which function downstream of the NAC and MYB domain master regulators and upstream of the structural genes. Post-transcriptional regulation of the expression of HDZIP III TF genes mediated by micro-RNA 165/166 is also represented. Colored rectangles represent transcription factors in different tiers as specified in the column on the right. Blue arrows denote positive regulation, while a red line with blunt ends denotes negative regulation.

Although wood formation is an evident characteristic of trees, many herbaceous plants, including Arabidopsis, are also able to develop a vascular cambium and form a limited amount of secondary xylem (Chaffey et al., 2002; Lev-Yadun, 1994; Nieminen et al., 2004). Also, although most of these discoveries derived from research in Arabidopsis, many genes have shown conserved functions in biofuel feedstock species (Yang and Wang, 2016).

However, in addition to transcriptional regulation, other layers of regulation including epigenetics and post-translational modifications exist and seem to play an important role in the regulation of gene expression during wood formation (Hirayama and Shinozaki, 2010).

Post-transcriptional regulation of wood formation

The importance of small RNAs (sRNA) and their participation in mechanisms of gene silencing by post-transcriptional regulation is being increasingly highlighted. Plant genomes are now known to encode various classes of sRNA that function in distinct, yet overlapping, genetic and epigenetic silencing pathways (Borges and Martienssen, 2015). Small RNAs are non-coding RNAs involved in plant development during growth, reproduction, genome reprogramming, responses to the environment, as well as defense against pathogens, and silencing of endogenous transposable elements (Borges and Martienssen, 2015; Källman et al., 2013; Trindade et al., 2011).

Several major classes of sRNA have been described: microRNAs (miRNA), hairpin‑derived small interfering RNAs (hp‑siRNAs), natural antisense siRNAs (natsiRNAs), secondary siRNAs and heterochromatic siRNAs (hetsiRNAs). The abundance of these classes and its variation among plant species suggests coevolution between environmental adaptations and gene-silencing mechanisms (Borges and Martienssen, 2015). The large variety of sRNA pathways in plants has been hypothesized to contribute to their phenotypic plasticity (Borges and Martienssen, 2015).

Despite their common origin in double-stranded RNA (dsRNA) precursors, sRNA are derived from a diverse set of sources, including transcription of inverted repeat structures, convergent transcription or virus replication, as well as transcription of genes containing a hairpin structure by RNA polymerase II (Voinnet, 2009). Some dsRNA precursors can also result from the action of specific RNA-dependent RNA polymerases. The resulting dsRNA is cleaved into sRNA duplexes, typically 18-24 nt long, by one of the four RNase III Dicer-like enzymes (DCL) found in plants (Parent et al., 2012).

MicroRNAs (20-24 nt), which represent the second most abundant class of sRNA (Poulsen et al., 2013), are produced as single duplexes excised from short fold-back stem-loops by DCL1 (Voinnet, 2009) (Fig. 1.9). A second miRNA biogenesis pathway involves the sequential processing of long fold-back stem-loops into a series of RNA duplexes by DCL4 cleavage (Rajagopalan et al., 2006).

These sRNA duplexes are then protected from uridylation and degradation by methylation at their 3’ end in a sequence-independent manner promoted by an S-adenosyl methionine-dependent methyltransferase Hua Enhancer 1 (HEN1) (Li et al., 2005; Yu et al., 2005).

After its biogenesis, the miRNA duplex is incorporated into an ARGONAUTE (AGO) complex and one strand is retained to form an RNA-induced silencing complex (RISC) (reviewed in Poulsen et al., 2013). The sRNA strand guides the complex to specific, complementary segments of RNA molecules. Depending on the nature of the RNA target and the Argonaute protein involved, the RISC triggers either DNA methylation, chromatin (histone) modification (leading to transcriptional gene silencing) or RNA cleavage or translational inhibition (leading to post-transcriptional gene silencing) (Parent et al., 2012).
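The guiding step described above, in which the retained sRNA strand directs the complex to complementary RNA segments, can be illustrated with a toy sketch. It only scans a transcript for windows complementary to a small RNA, counting plain Watson-Crick mismatches; the function names, the sequences and the mismatch threshold are illustrative assumptions, and the sketch ignores G:U wobble pairs and the position-dependent scoring that dedicated plant target-prediction tools apply.

```python
# Toy illustration only: names, sequences and thresholds are hypothetical.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A/C/G/U only)."""
    return rna.translate(COMPLEMENT)[::-1]


def find_target_sites(srna: str, transcript: str, max_mismatches: int = 0):
    """Return (start, mismatches) for every transcript window that pairs
    with the small RNA with at most `max_mismatches` mismatches."""
    site = reverse_complement(srna)
    hits = []
    for start in range(len(transcript) - len(site) + 1):
        window = transcript[start:start + len(site)]
        mismatches = sum(1 for a, b in zip(window, site) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


if __name__ == "__main__":
    # Arbitrary short sequences, far shorter than a real 20-24 nt miRNA.
    print(find_target_sites("AAGCU", "GGAGCUUCC"))
```

In practice, plant miRNA target prediction weights mismatches by position and is followed by experimental validation; the sketch is only meant to make the notion of sequence complementarity concrete.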

The first report of a small inhibitory RNA involved in translational control in a plant system came with the isolation of a small molecular weight RNA from barley embryos that specifically inhibited translation initiation (Gunnery and Datta, 1987). MicroRNAs were first formally reported in plants in 2002 (Llave et al., 2002; Reinhart et al., 2002), about 10 years after the first report in animal species. MicroRNAs are negative post-transcriptional regulatory RNA molecules that mediate the down-regulation of specific target genes either by site-specific transcript cleavage or by inhibiting their translation into proteins (reviewed in Axtell, 2013; Borges and Martienssen, 2015; Fei et al., 2013; Iwakawa and Tomari, 2013; Ma et al., 2013; Rogers and Chen, 2013).

Comprehensive views of miRNA-mediated gene regulatory networks in plants have evolved in recent years (Meng et al., 2011). Members of evolutionarily conserved miRNA families are involved in conserved miRNA-mediated mechanisms associated with relevant plant physiological processes (reviewed in Cuperus et al., 2011; Dalmay, 2012; Taylor et al., 2014). However, functional characterization of miRNA targets is still scarce and will be essential to provide deeper biological insights into certain miRNA-mediated pathways (Meng et al., 2011).

[Figure 1.9 diagram. Recoverable labels: Pol II, MIR gene, pri-miRNA, pre-miRNA, DCL1, HYL1, SE, D-bodies, HEN1, mature (3’ methyl) miRNA, HST, nucleus, cytoplasm, AGO1 (RISC), miR162, miR168, mRNA target, mRNA cleavage, translation repression.]

Figure 1.9 - Biogenesis and operation of plant micro RNA. Scheme from Claúdio Capitão (2011).

MicroRNAs have also been reported to play an important role in wood formation (Carvalho et al., 2013; Klevebring et al., 2009; Ko et al., 2006; Lu et al., 2013; Neutelings et al., 2012; Ong and Wickneswari, 2011, 2012; Puzey et al., 2012; Schuetz et al., 2013; Shi et al., 2010a; Smith et al., 2013; Xia et al., 2012). Some studies which specifically focused on vascular cambium differentiation (Kim et al., 2005; Klevebring et al., 2009; Ko et al., 2006; Lu et al., 2005; Robischon et al., 2011; Sunkar and Zhu, 2004) have revealed conserved actors and mechanisms among woody and non-woody species. In poplar, miRNA target genes have also been related to the formation of tension wood (Chen et al., 2015; Lu et al., 2013, 2005; Puzey et al., 2012; Sun et al., 2012). Despite such advances, we still need a deeper understanding of the miRNA post-transcriptional regulation mechanisms involved in wood formation and SCW biosynthesis in trees, and particularly in Eucalyptus.

A few studies identified miRNA in Eucalyptus spp. (Levy et al., 2014; Myburg et al., 2014; Victor, 2007) but no eucalypt miRNA has yet been included among the 28,645 miRNA entries in miRBase (v.21) (Kozomara and Griffiths-Jones, 2011).

In this context, identifying the players involved in post-transcriptional regulation mechanisms, and clarifying the role and implications of additional layers of gene expression regulation, including their interactions with transcriptional regulatory layers, will provide significant advances towards a better understanding of wood formation processes.

Chapter 2

Genome-wide analysis of the lignin toolbox of Eucalyptus grandis

The present chapter was mostly published in the following research article:

Carocha V, Soler M, Hefer C, Cassan-Wang H, Fevereiro P, Myburg AA, Paiva JAP and Grima-Pettenati J. 2015. Genome-wide analysis of the lignin toolbox of Eucalyptus grandis. New Phytologist 206: 1364–1377.

Parts of the present chapter were also published in the following research articles:

Myburg AA, Grattapaglia D, Tuskan GA, Hellsten U, Hayes RD, Grimwood J, , Carocha V, Paiva J, ... 2014. The genome of Eucalyptus grandis. Nature 510 (7505): 356-362.

Carvalho A, Graça C, Carocha V, Pêra S, Lousada JL, Lima-Brito J and Paiva JAP. 2015. An improved total RNA isolation from secondary tissues of woody species for coding and non-coding gene expression analyses. Wood Sci Technol 49: 647-658.

Cassan-Wang H, Soler M, Yu H, Camargo ELO, Carocha V, Ladouce N, Savelli B, Paiva JAP, Leple J-C and Grima-Pettenati J. 2012. Reference Genes for High-Throughput Quantitative Reverse Transcription–PCR Analysis of Gene Expression in Organs and Tissues of Eucalyptus Grown in Various Environmental Conditions. Plant Cell Physiol. 53(12): 2101-16.

Table of Contents

Abstract
Chapter introduction
Objectives
Materials and methods
Results
Discussion

Abstract

• We performed genome-wide identification of putative members of the eleven multigene families involved in the lignin branch of the phenylpropanoid pathway. Comparative phylogenetic studies focusing on bona fide clades inferred from genes functionally characterized in other species allowed us to identify those genes mainly involved in xylem developmental lignification. RNA-Seq and microfluidic RT-qPCR expression data were used to investigate developmental and environmentally responsive expression patterns of the genes.

• The phylogenetic analysis revealed that 38 E. grandis genes are located in bona fide lignification clades. Four multigene families (HCT, C3H, COMT and PAL) are expanded by tandem gene duplication compared to other plant species. Seventeen of the 38 genes exhibit strong, preferential expression in highly lignified tissues, likely representing the E. grandis core lignification toolbox.

• The identification of major genes involved in lignin biosynthesis in Eucalyptus, the most widely planted hardwood crop worldwide, provides the foundation for the development of biotechnology approaches to develop tree varieties with enhanced processing qualities.

Chapter introduction

The huge economic importance of the pulp industry has been a driving force to decipher the lignin biosynthetic pathway, which has proven more complex and reticulate than initially thought (reviewed in Grima-Pettenati and Goffner, 1999; Boudet et al., 2003). The topology of the pathway has been revised several times in the last decades (reviewed in Humphreys and Chapple, 2002; Boerjan et al., 2003; Ralph et al., 2004; Vanholme et al., 2010) and new alternative routes are still being discovered, such as that involving the recently described caffeoyl shikimate esterase (CSE; Vanholme et al., 2013). Altogether, eleven enzymatic reactions are implicated in the synthesis of monolignols, which involves the general phenylpropanoid pathway starting with the deamination of phenylalanine and leading to the production of hydroxycinnamoyl-CoA esters. The enzymes involved in this short sequence of reactions are phenylalanine ammonia-lyase (PAL), cinnamate 4-hydroxylase (C4H) and 4-coumarate:CoA ligase (4CL).

Although lignin is the most abundant phenylpropanoid derived from the hydroxycinnamoyl-CoA esters, the latter are also the precursors of a wide range of end products including flavonoids, anthocyanins and condensed tannins, which vary according to species, cell-type and environmental signals (Dixon and Paiva, 1995). In order to produce monolignols, hydroxycinnamoyl-CoA esters undergo successive hydroxylation and O-methylation of their aromatic rings (Boerjan et al., 2003) involving the following enzymatic activities: shikimate O-hydroxycinnamoyltransferase (HCT); caffeoyl shikimate esterase (CSE); p-coumarate 3-hydroxylase (C3’H); caffeoyl CoA 3-O-methyltransferase (CCoAOMT); ferulate 5-hydroxylase (F5H) and caffeate/5-hydroxyferulate O-methyltransferase (COMT). The conversion of the side-chain carboxyl to an alcohol group is catalyzed successively by cinnamoyl CoA reductase (CCR) and cinnamyl alcohol dehydrogenase (CAD), two enzymes considered to be the most specific of the monolignol biosynthesis pathway.

Considerable effort has been made in the last decades to delineate the lignin pathway in the Eucalyptus genus. Remarkably, the CCR gene was cloned for the first time in Eucalyptus gunnii (EguCCR) and its identity was proven unambiguously by the enzymatic activity of the corresponding recombinant protein (Lacombe et al., 1997). EguCCR cDNA was then used as a probe to clone its orthologs in tobacco (Piquemal et al., 1998), Arabidopsis (Lauvergeat et al., 2001) and Populus (Leplé et al., 2007). In line with its key role in controlling lignin content and composition, EgCCR was later shown to co-localize with a QTL for S/G lignin ratio (Gion et al., 2011). Similar pioneering work, including proof of enzymatic activity, has been done for EguCAD2 (Grima-Pettenati et al., 1993), which was the second CAD gene cloned in plants. Other lignin biosynthetic genes have been cloned in eucalypts (Gion et al., 2000; Poeydomenge et al., 1994) and located on genetic maps (Gion et al., 2011). However, it is the recent availability of the E. grandis genome (Myburg et al., 2014) that provided the opportunity to perform a comprehensive genome-wide analysis of lignin biosynthetic genes, as reported in the present study.

Objectives

To accomplish a genome-wide survey to identify genes which are members of the eleven multigene families involved in Eucalyptus grandis monolignol biosynthesis.

To conduct comparative protein phylogenetic analyses for each of the eleven multigene families to identify putative bona fide clades of proteins involved in monolignol biosynthesis in Eucalyptus.

To perform high-throughput expression profiling using both RNA-Seq and RT-qPCR technology to identify a core lignification gene set involved in developmental xylem lignification in Eucalyptus.

Materials and methods

In silico identification of E. grandis phenylpropanoid/monolignol genes

A first survey of the phenylpropanoid genes involved in monolignol biosynthesis was conducted using E. grandis annotation v1.0 in Phytozome v7.0, and then refined on annotation v1.1 in Phytozome v8.0. A combination of keyword and BLASTp searches (using proteins from Arabidopsis and Populus as queries) allowed retrieval of 175 predicted E. grandis protein sequences that were used to generate large comparative phylogenetic trees including protein sequences from Populus trichocarpa, Vitis vinifera, Arabidopsis thaliana and Oryza sativa (Figs. S2.1-S2.6). To define the bona fide clades in E. grandis (i.e. clades with homologs of genes that have been experimentally verified to be involved in xylem cell lignification), we included sequences of bona fide enzymes proven to have true enzymatic activity/biological function. To do this, we performed an extensive literature survey (Table S2.1). For E. grandis short name gene nomenclature, we adopted a species-related prefix (Egr) followed by the multigene family abbreviation (Table S2.2). The numbering prioritized E. grandis orthologs of bona fide lignification genes from other species described in the literature. The remaining family members were numbered sequentially according to their position on the eleven main chromosomes. Manual curation of the retrieved E. grandis sequences was performed whenever necessary, resulting either in the correction of some gene models or in the elimination of truncated predictions.

Phylogenetic analyses

Phylogenetic relationships among selected sets of predicted primary transcripts (Table S2.3) were analyzed separately for each multigene family. The protein sequences were first aligned using the MAFFT online program (Katoh et al., 2002) with the default settings. The trees were computed and assembled in MEGA5 (Tamura et al., 2013) using the Maximum Likelihood method based on the JTT matrix-based model (Jones et al., 1992). Bootstrap-supported consensus trees were inferred from 1000 replicates (Felsenstein, 1985). Branches with less than 50% bootstrap support were collapsed. Initial tree(s) for the heuristic search were obtained automatically as follows: when the number of common sites was <100, or less than one fourth of the total number of sites, the maximum parsimony method was used; otherwise the BIONJ method with MCL distance matrix was used. The trees were drawn to scale with branch lengths measured in the number of substitutions per site and were graphically displayed in the form of a radiating phylogenetic tree. Protein sequence identity, similarity and global similarity matrices were generated using the SIAS online tool (http://imed.med.ucm.es/Tools/sias.html). Matrices were generated involving only the E. grandis family members or involving all the proteins selected for the construction of the bona fide protein phylogeny trees (Table S2.4).
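Two of the rules stated above lend themselves to a short illustration: the choice of the initial tree method for the heuristic search, and the computation of pairwise sequence identity from aligned proteins. The following is a minimal sketch under stated assumptions, not a reimplementation of MEGA5 or SIAS. In particular, the identity denominator used here (aligned positions where neither sequence has a gap) is an assumption, since the text does not say which of the available denominators was selected; all function names and the toy alignment are hypothetical.

```python
def initial_tree_method(common_sites: int, total_sites: int) -> str:
    """Selection rule as described in the text: maximum parsimony when common
    sites are <100 or below one fourth of all sites, otherwise BIONJ."""
    if common_sites < 100 or common_sites < total_sites / 4:
        return "maximum parsimony"
    return "BIONJ"


def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity between two aligned sequences of equal length.
    Assumption: positions with a gap in either sequence are ignored."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        identical += a == b
    return 100.0 * identical / compared if compared else 0.0


def identity_matrix(alignment: dict) -> dict:
    """Pairwise identity for every pair of names in an alignment dict."""
    names = list(alignment)
    return {(m, n): percent_identity(alignment[m], alignment[n])
            for m in names for n in names}


if __name__ == "__main__":
    toy = {"protA": "MKT-A", "protB": "MKSQA"}  # hypothetical aligned fragments
    print(identity_matrix(toy)[("protA", "protB")])
    print(initial_tree_method(common_sites=80, total_sites=500))
```

Choosing a different denominator (for example the length of the shorter sequence) would change the reported percentages, so values from such a sketch are only comparable with those in Table S2.4 if the same convention is used.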

Microfluidic real-time quantitative PCR expression analysis

The RT-qPCR profiling was performed using a “developmental tissue” panel comprising twelve samples collected from fruit capsules, floral buds, shoot tips, roots, young leaves, mature leaves, cambium-enriched fractions, developing secondary xylem, secondary phloem, primary and secondary stems. In addition, we used a panel of “contrasting xylem samples” consisting of juvenile versus mature xylem, tension versus opposite xylem and high versus limiting nitrogen fertilization. We also included stems, leaves and roots of young eucalypt trees subjected to cold treatments. Plant material description, RNA extraction and cDNA synthesis were previously reported in Camargo et al. (2014), Cassan-Wang et al. (2012) and Soler et al. (2015). Transcript abundance was assessed by microfluidic qPCR using the BioMark® 96.96 Dynamic Array platform (Fluidigm) as explained in Cassan-Wang et al. (2012). Gene-specific primer pairs (Table S2.5) were designed using the QUANTPRIME program (Arvidsson et al., 2008). A dissociation step was performed after amplification to confirm the presence of a single amplicon. Three reference genes identified in the same tissue panels by Cassan-Wang et al. (2012) were used for data normalization: PP2A1 (Eucgr.B03386), PP2A3 (Eucgr.B03031) and SAND (Eucgr.B02502). Data normalization was performed using the formula proposed by Pfaffl (2001), adopting as calibrator a mix of all of the samples used in the assay. The normalized data were further processed in EXPANDER6 (Ulitsky et al., 2010) for use in hierarchical clustering analysis. The data treatment procedure consisted of data flooring (1E-5), log transformation and standardization (normalization of each expression pattern to have a mean of 0 and a variance of 1). Hierarchical clustering was performed using Pearson correlation and average linkage. The consistency of the results provided by the biological replicates was evaluated and the results for each sample were averaged by geometric mean.
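The normalization and data treatment steps described above can be sketched as follows. This is an illustrative outline rather than the pipeline actually used: the Pfaffl (2001) ratio is written in its efficiency-corrected form, the three reference genes are combined through the geometric mean of their relative quantities (an assumption, since the text does not state how they were combined), and the log transformation is taken as base 2 with a population variance (also assumptions). All function names and numerical values are hypothetical.

```python
import math
from statistics import mean, pstdev


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(mean(math.log(v) for v in values))


def pfaffl_ratio(e_target, ct_target_cal, ct_target_sample, references):
    """Efficiency-corrected relative expression (Pfaffl, 2001).
    `references` is a list of (efficiency, ct_calibrator, ct_sample) tuples;
    an efficiency of 2.0 means perfect doubling per cycle.
    Assumption: several reference genes are combined by geometric mean."""
    target_rq = e_target ** (ct_target_cal - ct_target_sample)
    ref_rqs = [e ** (cal - smp) for e, cal, smp in references]
    return target_rq / geometric_mean(ref_rqs)


def standardize_profile(values, floor=1e-5):
    """Floor, log2-transform and scale one expression pattern to
    mean 0 and variance 1 (population variance assumed)."""
    logged = [math.log2(max(v, floor)) for v in values]
    mu, sd = mean(logged), pstdev(logged)
    if sd == 0:
        return [0.0] * len(logged)
    return [(x - mu) / sd for x in logged]


if __name__ == "__main__":
    # Hypothetical Ct values: target gene plus three reference genes.
    refs = [(2.0, 20.0, 19.5), (1.95, 22.0, 21.4), (2.0, 24.0, 23.6)]
    print(pfaffl_ratio(1.9, 26.0, 23.0, refs))
    print(standardize_profile([0.0, 0.5, 2.0, 8.0]))
```

Standardizing each pattern before clustering means that Pearson-based clustering groups genes by the shape of their expression profile across samples rather than by absolute abundance.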

RNA-Seq expression analysis

RNA-Seq data for six tissues from three field-grown E. grandis individuals, and for root samples prepared from young rooted cuttings, were obtained from EUCGENIE (http://eucgenie.bi.up.ac.za; Hefer et al., 2015; Myburg et al., 2014). The absolute transcript abundance values obtained for the 38 lignin biosynthetic genes were computed from FPKM values obtained with TOPHAT (Trapnell et al., 2009) and CUFFLINKS (Trapnell et al., 2010). Values were standardized using EXPANDER 6 (Ulitsky et al., 2010), as described in Soler et al. (2015).
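For readers unfamiliar with the FPKM unit mentioned above (fragments per kilobase of transcript per million mapped fragments), the textbook definition can be written as a small function. This is a simplified sketch: CUFFLINKS estimates abundances with a statistical model and an effective transcript length rather than by direct counting, so the function below only conveys the meaning of the unit. The function name and the numbers are hypothetical.

```python
def fpkm(fragments: float, transcript_length_bp: float,
         total_mapped_fragments: float) -> float:
    """Textbook FPKM: fragments mapped to a transcript, scaled per kilobase
    of transcript length and per million mapped fragments in the library."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


if __name__ == "__main__":
    # Hypothetical gene: 500 fragments on a 2 kb transcript in a library
    # of 10 million mapped fragments.
    print(fpkm(500, 2000, 10_000_000))
```

Because the value is scaled by both transcript length and library size, FPKM allows rough comparison of expression levels between genes of different lengths and between libraries of different depths.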

Results

In silico identification of phenylpropanoid genes encoding bona fide enzymes

Some of the eleven gene families involved in the phenylpropanoid pathway leading to monolignols, such as the 4CL, COMT, CCR, CCoAOMT and CAD families, belong to very large superfamilies, which results in erroneous annotations in genome-wide studies, as thoroughly addressed by Kim et al. (2004, 2007). To avoid such problems, it is necessary to identify bona fide members, i.e. those encoding enzymes that catalyze the true enzymatic reactions. To this end, we performed an extensive literature survey to identify genes from the eleven gene families proven to encode bona fide enzymes through biochemical characterization of their enzymatic activities and/or through forward or reverse genetics approaches (Table S2.1). A total of 75 genes encoding bona fide enzymes from several plant species were included in the phylogenetic analyses together with phenylpropanoid/lignin annotated genes from Populus, Vitis, Arabidopsis and Oryza and the 175 putative phenylpropanoid genes retrieved from the E. grandis genome (Table S2.2). The phylogenetic reconstructions (Figs. S2.1-S2.6) enabled us to delimit bona fide clades for each family and propose a selected subset of 38 E. grandis bona fide genes (Table S2.2).

PAL (Phenylalanine ammonia-lyase)

PAL (EC 4.3.1.5) is the first enzyme of the general phenylpropanoid pathway, catalyzing the deamination of phenylalanine to produce cinnamic acid, a universal intermediate in the formation of a large variety of plant-specific phenylpropanoid derivatives. We constructed a phylogenetic tree (Fig. 2.1a) in which we highlighted the genes encoding bona fide PAL enzymes. In comparison to Arabidopsis, parsley, tobacco and Medicago, where PAL is encoded by three to four members, the E. grandis PAL family comprises nine genes, i.e. three more than in Populus. Eight of the nine EgrPAL proteins were positioned in the PAL bona fide clade (Fig. 2.1a).

Only EgrPAL2 was placed outside, being the most phylogenetically divergent member sharing only 57% to 61% amino acid sequence identity with the other EgrPAL members and exhibiting closer relationship with gymnosperm PAL proteins (Fig. 2.1a). The EgrPAL2 gene was highly expressed in flowers and shoot tips exhibiting weak expression in xylem (Fig. 2.1b-c).

EgrPAL1 was found to be phylogenetically close to AtPAL1 and 2, reported as mainly involved in anthocyanin production (Huang et al., 2010; Rohde et al., 2004). In agreement with this putative role, this gene showed weak overall expression, mostly restricted to flower tissues. EgrPAL8 also showed strong expression in phloem, shoot tips and flowers. EgrPAL9 was placed in the same sub-clade as AtPAL4 and was the most abundantly expressed EgrPAL gene, showing preferential, although not highly specific, expression in developing xylem (Fig. 2.1b). The EgrPAL3 gene experienced at least two rounds of tandem duplication, producing four additional lineage-specific genes, EgrPAL4, 5, 6 and 7 (sharing between 97 and 99% protein sequence identity). Although all five of these PAL genes showed preferential expression in xylem, EgrPAL3 was the most highly expressed member of the cluster (6- to 95-fold) and also displayed xylem tissue specificity (Fig. 2.1b-c). Based on their expression patterns, EgrPAL3 and 9 are the most likely PAL genes involved in developmental lignification.


Figure 2.1 - The PAL bona fide clade: comparative phylogeny and expression profiles. (a) Unrooted protein phylogenetic tree constructed with PAL bona fide enzymes from several species. A total of 797 non-ambiguous amino acid positions were considered in the final dataset. Heatmaps of transcript accumulation patterns of EgrPAL genes generated by (b) microfluidic RT-qPCR and (c) RNA-Seq. Gene accession number and short name are indicated on the left side.

The hydroxylation steps

The hydroxylation steps of the monolignol pathway are catalyzed by C4H (EC: 1.14.13.11), C3’H (EC: 1.14.14.1) and F5H (EC: 1.14.13), three members of the cytochrome P450 monooxygenase superfamily, belonging to the CYP73, CYP98 and CYP84 families, respectively.

C4H, the second enzyme of the general phenylpropanoid pathway, catalyzes the 4-hydroxylation of trans-cinnamic acid into 4-hydroxycinnamate. With the exception of Arabidopsis, which has a single C4H gene, C4H is in general encoded by small gene families not exceeding four members. The E. grandis genome has two C4H genes encoding EgrC4H1 and EgrC4H2, which share 61% identity and belong to Class I and II, respectively (Fig. 2.2a (i)). In agreement with its membership of Class I, in which several members have a major role in lignin biosynthesis, EgrC4H1 was highly and preferentially expressed in developing xylem (Fig. 2.2a (ii, iii)). EgrC4H2 was also preferentially expressed in xylem, albeit to a lesser extent (7-fold less), and its overall transcript level was lower than that of EgrC4H1 (12-fold). C4H class II genes have been associated with stress responses (Costa et al., 2003; Lu et al., 2006; Raes et al., 2003), although involvement in vascular lignification has also been shown in French bean and tobacco (Blee et al., 2001; Nedelkina et al., 1999). Together, these results suggest that EgrC4H1 is the main C4H gene involved in lignin biosynthesis, but a role for EgrC4H2, although less prominent, is also likely.


Figure 2.2 - The C4H, C3’H and F5H bona fide clades: comparative phylogeny and expression profiles. (i) Unrooted protein phylogenetic trees constructed with bona fide C4H (a), C3’H (b) and F5H (c) enzymes from several species. A total of 542 (C4H), 511 (C3’H) and 565 (F5H) non-ambiguous amino acid positions were considered in the final datasets. Heatmaps of transcript accumulation patterns of two EgrC4H, four EgrC3’H and two EgrF5H genes were generated by (ii) microfluidic RT-qPCR and (iii) RNA-Seq.


C3’H predominantly catalyzes the 3-hydroxylation of 4-coumaroyl shikimate. C3’H-defective Arabidopsis ref8 mutants exhibited lignin depleted in meta-hydroxylated G and S units (Abdulrazzak et al., 2006; Franke et al., 2002). The C3’H-catalyzed reaction is the first irreversible step towards S and G lignins (Anterola and Lewis, 2002). Interestingly, C3’H from Populus (PtrC3H3) expressed in yeast converts 4-coumaroyl shikimate, but when it was co-expressed with PtrC4H1 and/or PtrC4H2, a dramatic increase in catalytic activity and efficiency was observed. PtrC4H1, PtrC4H2, and PtrC3’H3 form all three possible heterodimers and a heterotrimer (PtrC4H1/C4H2/C3’H3) likely to be involved in monolignol biosynthesis (Chen et al., 2011). Whereas Arabidopsis has a single C3’H gene (Raes et al., 2003), the E. grandis C3’H family encompasses four members. Three (EgrC3’H1-3) are located in a 73 kb genomic region of chromosome 1 (Fig. S2.7) and share 93-95% sequence identity (Table S2.5). As in Populus, this family has been expanded by lineage-specific tandem duplications (Hamberger et al., 2007). EgrC3’H4 is located on chromosome 7 and shares 76-80% similarity with the other three E. grandis genes. RNA-Seq profiling highlighted marked distinctions between the four members of this family (Fig. 2.2b (iii)). For instance, the expression profile of EgrC3’H3 was distinct from that of EgrC3’H1 and 2: it exhibited strong and preferential expression in xylem, while the two others were preferentially expressed in leaves, shoot tips and flowers. The xylem-preferential expression was also shared by EgrC3’H4. However, EgrC3’H3 was by far the most strongly expressed member in xylem, with a 65-fold higher expression than EgrC3’H4 (Fig. 2.2b). EgrC3’H3 and, to a lesser extent, EgrC3’H4 are likely to be involved in developmental lignification in Eucalyptus.

F5H (EC:1.14.13), also called coniferaldehyde/coniferyl alcohol 5-hydroxylase to better reflect its preferred substrates (Humphreys et al., 1999; Osakabe et al., 1999), is involved in the pathway leading to sinapyl alcohol and, ultimately, to S lignin. The Arabidopsis genome harbours two F5H paralogs encoding AtF5H1 (CYP84A1) and AtF5H2 (CYP84A4). AtF5H1 has been shown to be involved in lignification through the analysis of the fah1 mutant, which has little to no S lignin (Marita et al., 1999; Meyer et al., 1998). By contrast, the overexpression of AtF5H1 in the mutant background produces plants displaying substantially higher proportions of S units than normal, up to about 92% in AtF5H1-up-regulated Arabidopsis (Meyer et al., 1998), up to 84% in tobacco (Franke et al., 2000), and as high as 93.5% in hybrid Populus (Stewart et al., 2009). The gene previously called AtF5H2 was recently shown to be an Arabidopsis-specific paralog of AtF5H1, originating from a recent duplication event that led to neofunctionalization. Indeed, the encoded enzyme (CYP84A4) is involved in the biosynthesis of alpha-pyrones and generates the catechol-substituted substrate for an extradiol ring-cleavage dioxygenase (Weng et al., 2012).

In E. grandis, we also identified two EgrF5H genes, both of which were phylogenetically closer to AtF5H1. They are located on different chromosomes (EgrF5H2 on chromosome 9 and EgrF5H1 on chromosome 10; Fig. S2.7) and encode proteins sharing 84% amino acid sequence similarity (Table S2.4). EgrF5H2 showed modest overall expression, almost exclusively restricted to root tissues, whereas EgrF5H1 was very highly and preferentially expressed (93%) in developing xylem (Fig. 2.2c (ii, iii)). The EgrF5H1 protein shares 99.6% similarity with its E. globulus ortholog, recently proven to be involved in vascular lignification (García et al., 2014).


4CL (4-coumarate:CoA ligase)

4CL (EC 6.2.1.12), the third and last enzyme of the general phenylpropanoid pathway, catalyzes the formation of CoA thiol esters of 4-coumarate and other 4-hydroxycinnamates in a two-step reaction involving the formation of an adenylate intermediate (Ehlting et al., 2001). The 4CL family belongs to the acyl:CoA synthetase (ACS) superfamily (Hamberger et al., 2007; Souza et al., 2008), as shown in Figure S2.1, which allowed discrimination of the 4CL bona fide clade (Fig. 2.3a (i)), grouped in two classes according to Ehlting et al. (1999). Like Populus and Arabidopsis, E. grandis has a single representative (Egr4CL2) in class II, associated with flavonoid and soluble phenolic biosynthesis (Ehlting et al., 1999). The Egr4CL2 gene was preferentially expressed in shoot tips, flowers and leaves, consistent with its possible function in Eucalyptus. In contrast to other dicots having two to four members in class I (Hamberger et al., 2007), E. grandis has a single member (Egr4CL1). In agreement with the predicted role of members of class I in monolignol biosynthesis, Egr4CL1 was strongly and preferentially expressed in developing xylem (Fig. 2.3a (ii, iii)).

HCT (Hydroxycinnamoyl CoA: shikimate hydroxycinnamoyl transferase)

HCT (EC 2.3.1.133) catalyzes the reactions both immediately preceding and following the insertion of the 3-hydroxyl group by C3’H into monolignol precursors (Hoffmann et al., 2003, 2004).

HCT uses p-coumaroyl-CoA and caffeoyl-CoA as preferential substrates to transfer an acyl group to the acceptor compound shikimic acid, yielding p-coumaroyl shikimate.


Figure 2.3 - The 4CL, HCT and CSE bona fide clades: comparative phylogeny and expression profiles. (i) Unrooted protein phylogenetic tree constructed with bona fide 4CL (a), HCT (b) and CSE (c) enzymes from several species. A total of 614 (4CL), 514 (HCT) and 443 (CSE) non-ambiguous amino acid positions were considered in the final datasets. Heatmaps of transcript accumulation patterns of Egr4CL, EgrHCT and EgrCSE genes were generated by (ii) microfluidic RT-qPCR and (iii) RNA-Seq.


A closely related acyl transferase, HCQ (hydroxycinnamoyl CoA:quinate hydroxycinnamoyl transferase), which yields p-coumaroyl quinate esters, is involved in chlorogenic acid biosynthesis and not lignin biosynthesis (Niggeweg et al., 2004; Umezawa, 2010). Because of the high similarities between HCT and HCQ, we first constructed a phylogenetic tree including both HCT and HCQ proteins (Fig. S2.2), allowing us to distinguish the two major clades corresponding to the bona fide HCQ and HCT, respectively. The five EgrHCTs belong to the latter, showing a lineage-specific expansion as compared to the other dicots, where only one (Arabidopsis, Medicago) or two members (Populus) are present (Fig. 2.3b (i)). EgrHCT1 to 4 are in tandem arrangement in an 87 kb genomic region of chromosome 6 (Fig. S2.7), and have high amino acid sequence identities (89% to 96%) resulting from recent tandem gene duplication events. EgrHCT1, 2 and 3 exhibited very similar expression profiles (highly expressed in leaves). The fourth member of the tandem array, EgrHCT4, exhibited a very distinct profile, being highly and preferentially expressed in xylem, thus suggesting that functional divergence occurred after tandem duplication. Its expression profile clustered with that of EgrHCT5, which was however more strongly and preferentially expressed in xylem. EgrHCT5 is located on a different chromosome (chromosome 10; Fig. S2.7). The corresponding protein was phylogenetically distinct from the four others, presenting 66% to 69% amino acid sequence identity (Table S2.4), and was closer to the HCT from other species. EgrHCT5 and EgrHCT4 are therefore likely involved in lignin biosynthesis.

CSE (Caffeoyl shikimate esterase)

The involvement of CSE (EC: 3.1.1.-) in lignin biosynthesis has been described very recently thanks to the analysis of an Arabidopsis cse-2 knockout mutant that presented a reduced lignin content enriched in H units and depleted in S units (Vanholme et al., 2013). CSE hydrolyzes caffeoyl shikimate into caffeate (Vanholme et al., 2013). Together with 4CL, CSE was proposed to be involved in an alternative pathway leading to the formation of caffeoyl-CoA, bypassing the second reaction performed by HCT. We mainly used the previously reported eudicotyledon orthologs of AtCSE (Vanholme et al., 2013) to generate the CSE phylogenetic tree (Fig. 2.3c (i)). The EgrCSE gene has a closer phylogenetic relationship to the grapevine and the Populus CSE genes. The EgrCSE gene was found to be strongly and preferentially expressed in E. grandis developing xylem tissues, supporting its inferred role in lignin biosynthesis.

The methylation steps

Plants present a wide variety of S-adenosyl-L-Met-dependent O-methyltransferases (OMTs) that act on Phe-derived substrates during the production of numerous plant secondary compounds in addition to lignin (Eckardt, 2002). COMT (EC: 2.1.1.68) and CCoAOMT (EC 2.1.1.104) are both involved in methylation steps of the monolignol pathway (Ye and Varner, 1995).

CCoAOMT catalyzes the methylation of caffeoyl CoA to produce feruloyl CoA. Functional characterization of CCoAOMT in several plant species (see Table S2.1) revealed that it was involved in the synthesis of G units. Focusing on the bona fide CCoAOMT discrete clade (Fig. 2.4a (i)) delimited from a superfamily tree (Fig. S2.3) allowed us to identify two EgrCCoAOMT genes, the same number as in Populus (Chen et al., 2000). EgrCCoAOMT1 was located on chromosome 9 and EgrCCoAOMT2 on chromosome 7 (Fig. S2.7).

They were reported to originate from a lineage-specific whole genome duplication event (Myburg et al., 2014). Both EgrCCoAOMT genes revealed a strong, preferential expression in developing xylem tissues, although EgrCCoAOMT2 presented a 2-fold higher expression relative to EgrCCoAOMT1 (Fig. 2.4a (iii)).

Figure 2.4 - The CCoAOMT and COMT bona fide clades: comparative phylogeny and expression profiles. (i) Unrooted protein phylogenetic trees constructed with bona fide CCoAOMT (a) and COMT (b) enzymes from several species. A total of 262 (CCoAOMT) and 365 (COMT) non-ambiguous amino acid positions were considered in the final datasets. Heatmaps of transcript accumulation patterns of two EgrCCoAOMT and seven EgrCOMT genes were generated by (ii) microfluidic RT-qPCR and (iii) RNA-Seq.

In angiosperms, COMT (EC: 2.1.1.68) was initially thought to be a bifunctional enzyme similarly involved in the synthesis of G and S precursors. However, in several plant species, COMT down-regulation led to a dramatic decrease in S units and to the incorporation of 5-hydroxyguaiacyl units (5-OH-G) into lignins. COMT is now considered to be preeminently involved in the synthesis of S units through the preferential methylation of 5-hydroxyconiferyl aldehyde into sinapaldehyde (Davin et al., 2008). We assembled a phylogenetic tree for the OMT superfamily (Fig. S2.4) in which we delimited the bona fide COMT clade (Fig. 2.4b (i)). Compared with other species harboring only one or two members within the bona fide clade, the E. grandis COMT family was expanded, with seven EgrCOMT genes resulting from distinct duplication mechanisms. Expression profiling highlighted EgrCOMT1 as being by far the most expressed EgrCOMT member, with a massive and highly specific expression in developing xylem. The closest ortholog of EgrCOMT1 (99% similarity) was found to be the E. gunnii COMT (Poeydomenge et al., 1994). EgrCOMT1, 2, 3 and 4 were all located in a 135 kb genomic region of chromosome 1 and resulted from tandem gene duplication events (Fig. S2.7) followed by functional divergence. Indeed, EgrCOMT2, 3 and 4 presented similar expression profiles that were very different from that of EgrCOMT1, exhibiting low expression in highly lignified tissues and broader expression over the other sampled tissues. EgrCOMT2 showed a very high expression in shoot tips (Fig. 2.4b (ii)). Together with EgrCOMT5, EgrCOMT1-4 formed a distinct sub-clade (Fig. 2.4b (i)). EgrCOMT5 was located in another region of chromosome 1 and exhibited a similar tendency, although with lower expression levels. The amino acid residues described to be involved in the catalytic activity, substrate binding and chemical interactions of COMT enzymes (Zubieta et al., 2002) are fully conserved only in EgrCOMT1.
For the remaining proteins EgrCOMT2-5, the catalytic center amino acids (His-269, Glu-329 and Glu-297) are conserved, but residue substitutions are found in the residues interacting with the lignin monomer and in the COMT methoxy and SAM/SAH substrate binding pockets (data not shown). EgrCOMT6 and EgrCOMT7 were reported to have originated from a recent segmental duplication (Myburg et al., 2014). Their predicted proteins, sharing 99% amino acid sequence identity (Table S2.4), were positioned in the same sub-clade as AtCOMT and NaCOMT. These two genes were found to be preferentially expressed in leaves and flowers according to E. grandis RNA-Seq data (Fig. 2.4b (iii)), but were also detected in cambium, xylem and phloem tissues in RT-qPCR experiments performed with a larger and different organ and tissue panel. EgrCOMT1 is the most likely candidate involved in developmental lignification.

The two last reductive steps

CCR (EC 1.2.1.44) catalyses the first committed step of the lignin-specific branch of monolignol biosynthesis by converting cinnamoyl CoA esters to their corresponding cinnamaldehydes. The first cDNA (EguCCR) and genomic clone encoding CCR were isolated from Eucalyptus gunnii and their identity proven through enzymatic characterization of the corresponding recombinant protein (Lacombe et al., 1997; Lauvergeat et al., 2001). The cDNA was used to clone the tobacco CCR cDNA, leading to the first transgenic plants with down-regulated CCR activity, which exhibited a severe reduction (50%) of their lignin content (Piquemal et al., 1998). A wide study of the CCR and CCR-like gene superfamily in land plants revealed a distribution into three major subfamilies, and highlighted the bona fide CCR family (Barakat et al., 2011; this study, Fig. S2.5). In addition to EgrCCR1, closely related (91% identity) to the E. gunnii EguCCR1 and likely involved in lignin biosynthesis, the bona fide clade also includes a second E. grandis CCR gene (EgrCCR2) reported in Eucalyptus (Fig. 2.5a (i)). EgrCCR1 and EgrCCR2 are located on distinct chromosomes (chromosome 10 and 6, respectively) and encode proteins sharing only 56% amino acid sequence identity. Both proteins harbour the CCR signature, NWYCY (Lacombe et al., 1997), essential for enzymatic activity (Escamilla-Trevino et al., 2010).

Other species have been shown to harbour two functional CCR genes, such as Arabidopsis (Lauvergeat et al., 2001), maize (Pichon et al., 1998; Tamasloukht et al., 2011), and switchgrass (Escamilla-Trevino et al., 2010). In all cases, the AtCCR1 gene was shown to be involved in developmental lignin biosynthesis, while AtCCR2 was poorly expressed during development but was inducible by biotic or abiotic stresses and hypothesized to play a role in defense mechanisms. Indeed, a similar situation was found in Eucalyptus, where the two EgrCCR genes showed distinct overall expression levels and patterns. EgrCCR1 was found to be strongly and preferentially expressed in developing xylem, in agreement with a role in developmental lignin biosynthesis. In contrast, EgrCCR2, which presented a very low overall expression, was expressed in E. grandis shoot tips (RNA-Seq data), whereas it could be detected in developing xylem by RT-qPCR (Fig. 2.5a).


Figure 2.5 - The CCR and CAD bona fide clades: comparative phylogeny and expression profiles. (i) Unrooted protein phylogenetic trees constructed with bona fide CCR (a) and CAD (b) enzymes from several species. A total of 380 (CCR) and 390 (CAD) non-ambiguous amino acid positions were considered in the final dataset. Heatmaps of transcript accumulation patterns of two EgrCCR and two EgrCAD genes were generated by (ii) microfluidic RT-qPCR and (iii) RNA-Seq.


CAD (EC:1.1.1.195) catalyzes the reduction of hydroxycinnamyl aldehydes to their corresponding alcohols. CAD and CAD-like genes have been reported in numerous species as forming large multigene families containing members exhibiting a low degree of homology and different affinities to various substrates, and probably related to various physiological roles (Sibout et al., 2003). Several authors have suggested the existence of three evolutionary CAD classes, differing in their patterns of evolution and expression (Barakat et al., 2009; Costa et al., 2003; Guo et al., 2010; Raes et al., 2003; Sibout et al., 2005). Class I comprises all bona fide CAD genes, as shown in the broad CAD/CAD-like phylogenetic tree (Fig. S2.6). Interestingly, in this bona fide clade, Arabidopsis, Vitis, parsley, tobacco and E. grandis are represented by two CAD proteins, Populus being the exception with only a single CAD gene (PtrCAD1) associated with monolignol biosynthesis (Fig. 2.5b (i)). The two E. grandis genes, EgrCAD2 and EgrCAD3, are located on chromosomes 7 and 8, respectively, in regions reported to result from a recent segmental duplication event (Myburg et al., 2014). The two genes revealed strong phylogenetic proximity (87% sequence identity at the protein level). They are also closely related to EguCAD2, shown to have a true CAD enzymatic activity and to be strongly expressed in xylem tissue (Grima-Pettenati et al., 1993). EgrCAD3 (Eucgr.H03208), initially present in the v1.0 annotation, was later removed in v1.1, but can still be found among the low confidence transcripts in the v2.0 annotation. Our expression data support that this gene is actively transcribed. The two EgrCAD genes presented similar expression patterns, with preferential expression in highly lignified tissues such as secondary stem and developing xylem. EgrCAD3 exhibited a 5-fold higher expression than EgrCAD2, both in developing xylem tissues and in overall expression.


Comparison of hierarchical clustering of RNA-Seq and RT-qPCR expression profiling

The step-by-step combined analysis of the phylogenetic neighborhood and of the expression profiles for the 38 bona fide clade members of the eleven families highlighted a subset of 17 genes as the most likely major genes involved in xylem lignification (Fig. 2.6). The core vascular lignification toolbox comprises EgrPAL3 & 9, C4H1 & 2, 4CL1, HCT4 & 5, CSE, C3’H3 & 4, CCoAOMT1 & 2, F5H1, COMT1, CCR1, and CAD2 & 3. In a few cases, we noticed distinctive expression patterns between the tissue panels used for microfluidic RT-qPCR and RNA-Seq. To get a more global picture of the genes exhibiting strong and/or preferential expression in highly lignified tissues, we therefore carried out independent hierarchical clustering of the overall expression profiling data obtained in the two panels (Fig. 2.7, Table S2.6). Both heatmaps revealed large clusters of genes with preferential expression in developing xylem, cambium and/or secondary stem, allowing us to consider eight more genes (EgrPAL5, 6, 7 & 8; EgrC3’H1 & 2; EgrCOMT6 & 7) as potential candidates for vascular lignification. However, these eight genes are not as strongly expressed in developing xylem as the seventeen genes of the core toolbox. Considering their expression patterns, the remaining 13 genes are probably involved in the biosynthesis of phenylpropanoid compounds other than lignin.
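The partition of the 38 bona fide genes described above can be checked with a short bookkeeping sketch. Family sizes and the core and additional gene lists are taken from the text. The list of the 13 remaining genes is not given explicitly in this section and is obtained here by subtraction, so it should be read as an inference. The helper `numbered` is a hypothetical convenience function.

```python
def numbered(prefix, numbers):
    """Build short gene names such as 'PAL3' from a family prefix."""
    return {f"{prefix}{n}" for n in numbers}


# The 38 bona fide genes, family by family (sizes as reported in the Results).
BONA_FIDE = (
    numbered("PAL", range(1, 10)) | numbered("C4H", [1, 2])
    | numbered("4CL", [1, 2]) | numbered("HCT", range(1, 6)) | {"CSE"}
    | numbered("C3'H", range(1, 5)) | numbered("CCoAOMT", [1, 2])
    | numbered("F5H", [1, 2]) | numbered("COMT", range(1, 8))
    | numbered("CCR", [1, 2]) | numbered("CAD", [2, 3])
)

# The 17 genes of the core vascular lignification toolbox.
CORE = (
    numbered("PAL", [3, 9]) | numbered("C4H", [1, 2]) | {"4CL1"}
    | numbered("HCT", [4, 5]) | {"CSE"} | numbered("C3'H", [3, 4])
    | numbered("CCoAOMT", [1, 2]) | {"F5H1", "COMT1", "CCR1"}
    | numbered("CAD", [2, 3])
)

# The eight additional candidates pointed out by hierarchical clustering.
ADDITIONAL = (
    numbered("PAL", [5, 6, 7, 8]) | numbered("C3'H", [1, 2])
    | numbered("COMT", [6, 7])
)

# Inferred by subtraction: genes likely involved in other phenylpropanoids.
remaining = BONA_FIDE - CORE - ADDITIONAL

print(len(BONA_FIDE), len(CORE), len(ADDITIONAL), len(remaining))  # 38 17 8 13
print(sorted(remaining))
```

The three groups are disjoint and together account for all 38 genes, which is consistent with the counts of 17, 8 and 13 given in the text.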


[Figure 2.6, panels (a) and (b): pathway diagram. Intermediates shown: phenylalanine; cinnamate, p-coumarate, caffeate, ferulate, 5-OH-ferulate and sinapate; p-coumaroyl, caffeoyl, feruloyl, 5-OH-feruloyl and sinapoyl CoA; p-coumaroyl and caffeoyl shikimate; p-coumaroyl and caffeoyl quinate; p-coumaraldehyde, caffeyl aldehyde, coniferaldehyde, 5-OH-coniferaldehyde and sinapaldehyde; p-coumaryl, caffeyl, coniferyl, 5-OH-coniferyl and sinapyl alcohol; p-OH-phenyl (H), guaiacyl (G) and syringyl (S) lignin. Genes assigned to each step, with the number of genes per step:]

PAL (2): EgrPAL3 (Eucgr.G02848), EgrPAL9 (Eucgr.J01079)
C4H (2): EgrC4H1 (Eucgr.J01844), EgrC4H2 (Eucgr.C00065)
4CL (1): Egr4CL1 (Eucgr.C02284)
HCT (2): EgrHCT4 (Eucgr.F03978), EgrHCT5 (Eucgr.J03126)
C3'H (2): EgrC3'H3 (Eucgr.A02190), EgrC3'H4 (Eucgr.G03199)
CSE (1): EgrCSE (Eucgr.F02557)
CCoAOMT (2): EgrCCoAOMT1 (Eucgr.I01134), EgrCCoAOMT2 (Eucgr.G01417)
F5H (1): EgrF5H1 (Eucgr.J02393)
COMT (1): EgrCOMT1 (Eucgr.A01397)
CCR (1): EgrCCR1 (Eucgr.J03114)
CAD (2): EgrCAD2 (Eucgr.G01350), EgrCAD3 (Eucgr.H03208)

Figure 2.6 - The Eucalyptus lignification toolbox. The biosynthetic pathway was adapted from Humphreys and Chapple (2002) and modified according to the recent findings of Vanholme et al. (2013). The 17 Eucalyptus grandis genes encoding enzymes located in the bona fide clades constitute the core set of a Eucalyptus lignification toolbox. Reactions thought to be key steps in lignin biosynthesis are indicated with black arrows.

Figure 2.7 - Heatmaps of transcript accumulation patterns of 38 Eucalyptus putative monolignol biosynthesis genes using (a) microfluidic RT-qPCR and (b) RNA-Seq. Gene accession number and short name are indicated at the left side.

Transcript abundance was expressed as log2 ratio, with blue colors indicating higher accumulation of transcript and yellow colors indicating lower accumulation. Absolute expression was shown using an Excel 2010 two-color scale (minimum meaning lower than the 1st quartile and maximum meaning higher than the 3rd quartile), with red colors indicating higher accumulation of transcript and white colors indicating lower accumulation.


Environmental and developmental expression characteristics of the 38 bona fide genes

Finally, we examined the response of the bona fide genes to several environmental cues such as gravitropic stress (tension vs opposite wood), nitrogen fertilization (high vs low) and cold treatment. We also included xylem samples at two different physiological stages (juvenile vs mature). The results are presented in Table S2.7. Interestingly, most of the thirteen genes clearly not included in the lignification toolbox were more responsive to the environmental stimuli as compared to those of the lignin toolbox, supporting a role in the synthesis of phenylpropanoids involved in plant defence.

For instance, EgrCCR2 was strongly responsive to gravitropic stimulus, nitrogen supplementation, and cold stress, whereas EgrCCR1 transcript levels were stable under these treatments. The EgrCOMT2-5 genes revealed strong and similar responses to cold stress in young leaf tissues, EgrCOMT4 being the most induced gene. Among the tandem duplicated COMT genes (EgrCOMT1-4), EgrCOMT2 was responsive to most stress conditions tested, whereas EgrCOMT1 was only moderately responsive to nitrogen fertilization.

Although lignin biosynthetic genes were in general not strongly responsive to environmental stimuli, all were down-regulated in response to high nitrogen fertilization confirming and extending recent results obtained on the same samples (Camargo et al., 2014). Among those lignin genes, EgrC3H4 was the most responsive, being differentially expressed in response to all tested abiotic stimuli. It was also more expressed in mature than in juvenile wood. The nine members of the EgrPAL family had distinct patterns of responses to the environmental stimuli, supporting functional diversification following duplication.


Discussion

Building on the recent availability of the E. grandis genome (Myburg et al., 2014), we report here a comprehensive genome-wide analysis of the phylogenetic relationships and expression profiles of the phenylpropanoid and lignin biosynthesis gene families in Eucalyptus highlighting the evolutionary histories of these families and those members that are likely to be involved in lignification during xylem development.

The E. grandis phenylpropanoid/lignin families underwent different evolutionary histories.

Seven of the eleven families presented no traces of tandem duplications, harboring low or even single-copy genes. According to De Smet et al. (2013), single copy genes are often involved in essential housekeeping functions highly conserved across species, and result from a selection pressure to preserve them as singletons. For instance, C4H and F5H have critical functions regulating lignification and determining lignin monomer composition (Franke et al., 2000). In addition, when proteins function as dimers or part of larger multi-protein complexes, the presence of multiple isoforms could disrupt protein stability and functionality (Cannon et al., 2004). C4H is indeed known to be involved in membrane multiprotein complexes with PAL (Achnine et al., 2004) and also with C3’H (Chen et al., 2011, and references therein) allowing metabolic channeling.

Although tandem duplication has been described as the most prominent mechanism contributing to multigene family expansion in E. grandis, shaping the size and the biological functions of many families (Hussey et al., 2015; Myburg et al., 2014; Soler et al., 2015), only four bona fide lignification families (PAL, HCT, C3’H and COMT) were shaped by tandem gene duplication events. In sharp contrast, the tandem duplication mechanism had a major impact on the non bona fide clades of the 4CL, COMT, CCoAOMT, and CAD families.

The frequencies of tandem duplicated genes in the bona fide PAL (56%), HCT (80%), C3’H (75%) and COMT (57%) clades exceeded the 34.5% observed at the genome level (Myburg et al., 2014). In the COMT, HCT and C3’H families, only one of the tandem duplicated genes showed preferential expression in highly lignified tissues. Gene duplication has been recognized as a primary mechanism for increasing functional diversification, and the increased expression divergence in duplicated genes can substantially contribute to morphological diversification (Wang et al., 2012). Indeed, the phenylpropanoid pathway leads to a wide variety of compounds such as flavonoids, lignans and hydroxycinnamate derivatives produced in specific metabolic branches and participating in diverse cellular processes underlying plant growth, development, adaptation, defense and reproduction (Tsai et al., 2006). In metabolic terms, this pattern might derive from cellular strategies to adjust protein synthesis according to functional needs, keeping only one or two highly expressed genes in each family, in a particular tissue, and/or in a particular environmental condition. Many of the E. grandis COMT and HCT genes expanded via tandem duplication tended to be involved in responses to environmental stimuli, consistent with earlier observations on tandem duplicated genes (Hanada et al., 2008). For instance, among the EgrCOMT1-4 tandem duplicated genes, only EgrCOMT1 is likely to be involved in developmental lignification. The three others have only residual expression in xylem and are inducible by abiotic stresses to different levels, suggesting neo- or subfunctionalization following duplication from a common ancestor. Supporting this assumption, some residue substitutions were found in the residues interacting with the lignin monomer and in the COMT methoxy and SAM/SAH binding pockets of EgrCOMT2-5 proteins as compared to bona fide COMTs. EgrCOMT2-5 could be involved in the synthesis of phenylpropanoid compounds in response to environmental cues or in yet unknown pathways specific to Eucalyptus, similar to the recently demonstrated role of CYP84A4, which originates from AtF5H1 (CYP84A1) but evolved to perform a different enzymatic reaction to produce Arabidopsis-specific alpha-pyrones (Weng et al., 2012).
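The tandem duplication frequencies quoted above can be recovered from the gene counts given in the Results. The sketch below assumes that the tandem duplicated members are EgrPAL3-7 (5 of 9), EgrHCT1-4 (4 of 5), EgrC3’H1-3 (3 of 4) and EgrCOMT1-4 (4 of 7), as described in the corresponding sections. The variable and function names are illustrative only.

```python
GENOME_WIDE_PERCENT = 34.5  # genome-level value from Myburg et al. (2014)

# (tandem duplicated genes, genes in the bona fide clade), read from the Results
CLADES = {"PAL": (5, 9), "HCT": (4, 5), "C3'H": (3, 4), "COMT": (4, 7)}


def tandem_percent(tandem, total):
    """Percentage of clade members that arose by tandem duplication."""
    return 100.0 * tandem / total


percentages = {
    family: round(tandem_percent(tandem, total))
    for family, (tandem, total) in CLADES.items()
}
print(percentages)  # {'PAL': 56, 'HCT': 80, "C3'H": 75, 'COMT': 57}

# Every clade exceeds the genome-wide frequency.
above_genome = all(p > GENOME_WIDE_PERCENT for p in percentages.values())
```

With these counts, the rounded percentages match the values reported in the text, and all four clades lie above the genome-wide frequency.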

The situation was different for the PAL family where EgrPAL3 experienced a high rate of tandem duplication producing four additional lineage-specific genes, all showing preferential expression in xylem. Such a functional redundancy has been associated with increasing robustness of biological systems by functional buffering, suggesting that the eventual loss of function in one copy can be compensated for by other copies (Gu, 2003). In some cases, gene duplication has also been proposed to confer an immediate selective advantage by facilitating elevated expression (Hurles, 2004) resulting in protein dosage benefits. Thus, lineage-specific duplications of some phenylpropanoid families can lead to both functional redundancy and to divergence likely contributing to the wide adaptability of the Eucalyptus species to the challenging Australian environment.

For the bona fide CAD and CCoAOMT families, duplication mechanisms other than tandem duplication seemed to be more important in family evolution. Both families present gene pairs that retained similar expression profiles, suggesting functional redundancy and strong involvement in vascular lignification. The two CCoAOMT genes originated from a lineage-specific whole-genome duplication event (Myburg et al., 2014).

The PAL and COMT families were also affected by other duplication mechanisms. For instance, EgrPAL1 and EgrPAL8 were traced back to an ancient hexaploidization event shared by the core eudicots, of which only a small proportion of duplicated genes have survived subsequent gene loss. EgrPAL9 and EgrPAL3 resulted from a lineage-specific whole-genome duplication event detected in E. grandis (Myburg et al., 2014).

Expression profiling identifies a core Eucalyptus vascular lignification toolbox within the bona fide genes

For each of the eleven bona fide gene families, we combined comparative phylogeny within the bona fide clades with individual gene developmental expression profiling in two independent data sets (RNA-Seq and RT-qPCR). This approach identified 17 genes as likely involved in developmental lignin biosynthesis and constituting the so-called “core lignin toolbox”, while eight more were pointed out by hierarchical clustering considering all of the 38 gene expression patterns. Collectively, we can consider the E. grandis lignin toolbox as being composed of 25 members (17 strong candidates and eight additional genes possibly involved). Many of these genes had not been reported before, enriching our knowledge of the lignin biosynthetic pathway in eucalypts, although a role in the biosynthesis of heartwood extractives, lignans, polyphenols and condensed tannins that are present in xylem cannot be excluded. The thirteen remaining genes are likely involved in the biosynthesis of phenylpropanoid compounds other than lignin. For instance, EgrPAL1 may be involved in anthocyanin production, like AtPAL1 and AtPAL2, to which it is phylogenetically close (Huang et al., 2010; Rohde et al., 2004). Several of these genes were hypothesized to be involved in plant defense, as inferred from their orthologs in other species. Consistent with this, we found that many of these genes (like EgrCCR2) were highly responsive to environmental stimuli.

Chapter 3

Validation of a local assembly of a tandem duplicated array of genes involved in lignin biosynthesis in Eucalyptus grandis

Parts of the present chapter were published in the following research article:

Myburg AA, Grattapaglia D, Tuskan GA, Hellsten U, Hayes RD, Grimwood J, ..., Carocha V, Paiva J, ... 2014. The genome of Eucalyptus grandis. Nature 510 (7505): 356-362.

Table of Contents

Chapter 3
Abstract
Chapter introduction
Objectives
Materials and methods
Results
Discussion

Abstract

• The availability of a non-redundant chromosome-scale reference (v1.0) sequence for Eucalyptus grandis revealed extensive tandem gene duplications. It was necessary to verify whether these regions were indeed well assembled and not the result of artificial identification of duplicated gene loci due to partial heterozygosity of the sequenced E. grandis clone.

• We generated sequence evidence to validate a local sequence assembly across a 144 kb region of the E. grandis chromosome 11. This region comprises eight highly similar O-methyltransferase (OMT) genes disposed in tandem. A BAC clone spanning these eight OMT genes was selected and specific primers were designed to amplify genic and intergenic portions of the targeted region.

• The orientation and alignment of the amplified BAC sequences to the reference genome sequence matched exclusively the targeted regions and therefore validated the local assembly over this highly duplicated genomic region.

Chapter introduction

A full, non-redundant chromosome-scale reference (v1.0) sequence has been assembled for one Eucalyptus grandis (BRASUZ1) genotype (Myburg et al., 2014). This assembly was based on 6.73× whole-genome Sanger shotgun coverage, paired bacterial artificial chromosome (BAC)-end sequencing and a high-density genetic linkage map (Kullan et al., 2012).

The BRASUZ1 genome selected for the whole-genome sequencing project has been recognized to be far less homozygous than would be expected for an individual produced by self-pollination (Myburg et al., 2014). According to these authors, this may be due to genome-wide selection in the S1 (selfing) progeny against regions carrying a high genetic load. As a result of this selection, BRASUZ1 and its live siblings could be on the low end of the homozygosity distribution in its self-pollination family. Despite this drawback, the same authors demonstrated that large segments of the BRASUZ1 genome were completely homozygous, an observation which corroborates that BRASUZ1 indeed provided a single haplotype for assembly.

The complexity of eukaryote genomes makes assembly errors inevitable in the process of constructing fully sequenced reference genomes (Ekblom and Wolf, 2014; Zhang and Backström, 2014). Despite recent progress, the percentage of incorrect or missing annotations remains high in large-scale genome sequencing programs. In particular, recent duplications, which are typical sources of gene clusters, are often misannotated (Li et al., 2015).

A notable output from the E. grandis full genome-sequencing project was the verification that this genome has the largest number of genes in tandem arrays (12,570 genes, around 34% of the total) reported among sequenced plant genomes. These genes were associated with unusually high rates of tandem gene duplication, which have been stable (following the expected exponential increase) in Eucalyptus since the divergence of the Myrtales lineage over 100 million years ago (Myburg et al., 2014).

Tandem gene duplications are among the major gene duplication mechanisms in eukaryotes. They are associated with unequal crossing-over, which results in duplicated genes with high nucleotide identity linked within a short region of a chromosome (Zhang, 2003). Such regions can challenge genome assembly, a drawback particularly relevant in highly heterozygous genomes where allele diversity and tandem gene differences are within the same range (Myburg et al., 2014). This situation might lead to the separate assembly of allelic forms and to the generation of artificially duplicated gene loci, where both allelic forms are included in the local assembly, and therefore to inflated estimates of tandem clusters and tandem gene arrays.

The frequencies of tandem duplication (clusters and genes) in heterozygous regions of E. grandis have been estimated and found to be no higher than in homozygous regions (Myburg et al., 2014). However, doubts about possible inaccuracies in local genome assemblies remain with respect to localized regions with a high incidence of tandem repeats.

In this genomic context and in the framework of the public release of the Eucalyptus grandis fully annotated genome, a collaborative international effort was developed by several research groups to validate the Sanger sequence v1.0 genome assembly for this species.

The current study was one such contribution. We focused on a tandem duplicated array of eight highly similar predicted O-methyltransferase (OMT) genes on chromosome 11. We generated experimental (sequence) evidence that these genes do exist and were correctly assembled in the v1.0 assembly of the E. grandis genome.

Objectives

To experimentally validate the accuracy of a local sequence assembly across a region of the Eucalyptus grandis genome which comprises eight highly similar O-methyltransferase (OMT) tandem duplicated genes.

Materials and methods

For this study, we analyzed a tandem duplicated array of eight predicted O-methyltransferase (OMT) genes: Eucgr.K00949, Eucgr.K00950, Eucgr.K00951, Eucgr.K00953, Eucgr.K00954, Eucgr.K00955, Eucgr.K00956, and Eucgr.K00957. These eight OMT genes constitute the tandem gene cluster nº 547 (Supplementary data 19 in Myburg et al., 2014). The genes are located in a genomic region of 144 kb on Eucalyptus grandis chromosome 11 (scaffold_11:11468901..11612951) (Fig. 3.1). Their predicted coding sequences showed pairwise sequence identities between 64 and 96% (Fig. 3.3).

Figure 3.1 - Genomic disposition of the Eucalyptus grandis tandem gene cluster nº 547 (Myburg et al., 2014). This cluster spans a genomic region of 144 kb in the Eucalyptus grandis chromosome 11 (scaffold_11:11468901..11612951). The image was generated from a screen capture using JBrowse in Phytozome v10.3.

We selected one BAC clone (EG_Ba70107) from an EG_Ba BAC library constructed from HindIII-restricted nuclear DNA fragments of the BRASUZ1 genome (Paiva et al., 2011). This BAC clone spans all eight OMT genes. The BRASUZ1 genotype is the confirmed self (S1) individual selected for the Eucalyptus grandis reference genome sequencing project. Using the QUANTPRIME program (Arvidsson et al., 2008), based on the Eucalyptus grandis genome (v1.0 assembly, Phytozome v6.0) and accepting splice variant hits, primers were designed to amplify specific coding regions from each of these eight genes in the selected BAC clone. Additionally, primer pairs to specifically amplify part of each of the nine intergenic regions flanking these genes (Table 3.1) were designed using the PRIMER3 PLUS v2.3.3 software (Untergasser et al., 2012) (http://primer3plus.com/cgi-bin/dev/primer3plus.cgi). Primer design considered the default settings proposed by the PRIMER3 PLUS program and the following options: i) annealing temperature of 60 ºC and G/C content of 40-60%; ii) primer size between 18 and 27 bp, and expected product size range of 501-3000 bp.
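The length and G/C constraints listed above can be checked for any candidate primer with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, it is not part of QUANTPRIME or PRIMER3 PLUS, and it does not evaluate the annealing temperature, which those programs estimate with their own thermodynamic models.

```python
def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in a primer sequence."""
    seq = primer.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def passes_design_filters(primer: str,
                          min_len: int = 18, max_len: int = 27,
                          min_gc: float = 40.0, max_gc: float = 60.0) -> bool:
    """True when primer length and G/C content fall inside the stated ranges."""
    return (min_len <= len(primer) <= max_len
            and min_gc <= gc_percent(primer) <= max_gc)


# Forward primer reported for Eucgr.K00949 in Table 3.1.
k49_forward = "TCTCCTGAAACCGCCTACCTTG"
print(len(k49_forward), round(gc_percent(k49_forward), 1),
      passes_design_filters(k49_forward))
```

The default thresholds mirror the options given in the text and can be overridden per call.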

Genomic DNA from BRASUZ1 was used as the positive control in the PCR amplifications. Genomic DNA extraction was performed according to a protocol adapted from Gemas (2004). DNA integrity was checked by running the DNA samples in a 0.8% (w/v) agarose gel in 0.5X Tris-Borate-EDTA buffer (TBE), stained with SYBR® Safe (Invitrogen®, Carlsbad, CA, USA), alongside the molecular marker λ phage DNA (Invitrogen®, Carlsbad, CA, USA). Bands were visualized in a Gel Doc-1000 UV (Bio-Rad® Laboratories, Inc.) image acquisition system. The quantity and quality (A260/280 and A260/230 ratios) of nucleic acids were determined using the NanoDrop™ ND-1000 Spectrophotometer (Thermo Scientific™, Wilmington, Delaware, USA).

All PCR amplifications were done in a Thermal Cycler C-1000 (Bio-Rad® Laboratories, Inc.), in a total volume of 20 µL using 10 ng of DNA, 5X GoTaq® buffer, 1.5 mM MgCl2, 0.2 mM of each dNTP, 0.4 µL of forward primer (10 µM), 0.4 µL of reverse primer (10 µM), and 1 U of GoTaq® DNA polymerase (Promega, Madison, WI, USA). The PCR conditions were as follows: initial denaturation at 94 ºC for 3 min; 40 cycles of denaturation at 94 ºC for 15 seconds, annealing at 60 ºC or 65 ºC for 30 seconds and extension at 72 ºC for 2 min; and a final extension at 72 ºC for 10 min. PCR products were evaluated on a 2% (w/v) agarose gel in 0.5X TBE, stained with SYBR® Safe (Invitrogen®, Carlsbad, CA, USA), alongside the molecular marker 1 Kb DNA Ladder (Invitrogen®, Carlsbad, CA, USA). Bands were visualized in a Gel Doc-1000 UV (Bio-Rad® Laboratories, Inc.) image acquisition system.

The amplicons obtained from the PCR reactions using the BAC clone DNA were purified using the DNeasy Plant Mini Kit (Qiagen®, Hilden, Germany). The purified amplicons were sent to STAB vida (Caparica, Portugal) for Sanger sequencing with the corresponding forward and reverse primers (Fig. 3.1). The sequences obtained were aligned against the E. grandis (BRASUZ1) genome assembly using CLUSTALW (Larkin et al., 2007) and, in the case of the CDS (coding sequence), keeping the bases in codons when opening gaps. Sequence identity values were calculated using the SIAS webpage (http://imed.med.ucm.es/Tools/sias.html), considering only the region comprised in the shortest sequence in the alignment and without considering gaps. The positions of the primers, amplicons and gene models in the chromosome and in the BAC clones were visualized using FANCYGENE7 (Rambaldi and Ciccarelli, 2009).
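The identity criterion described above (restricting the comparison to the region covered by the shortest sequence and ignoring gapped columns) can be illustrated with a short Python function. This is one possible reading of that criterion, written for illustration only; it is not the SIAS implementation, and the function name and the toy alignment are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two rows of a pairwise alignment ('-' = gap).

    Only columns spanned by the shorter (ungapped) sequence are compared,
    and columns with a gap in either row are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    def ungapped_length(row: str) -> int:
        return len(row) - row.count("-")

    shorter = (aligned_a if ungapped_length(aligned_a) <= ungapped_length(aligned_b)
               else aligned_b)
    residue_columns = [i for i, char in enumerate(shorter) if char != "-"]
    if not residue_columns:
        raise ValueError("the shorter sequence contains no residues")
    start, end = residue_columns[0], residue_columns[-1]

    compared = matches = 0
    for x, y in zip(aligned_a[start:end + 1], aligned_b[start:end + 1]):
        if x == "-" or y == "-":
            continue  # gapped columns are not counted
        compared += 1
        matches += x.upper() == y.upper()
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment (hypothetical sequences): 6 matches over 7 ungapped columns.
print(round(percent_identity("ATGCCGTA-A", "--GCTGTACA"), 1))
```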

Results

A single amplicon was obtained and analyzed for each of the eight OMT genes and for seven of the nine targeted intergenic regions (Fig. 3.2; Table 3.1). The exception was the intergenic region between Eucgr.K00955 and Eucgr.K00956. All amplicons were sequenced except for the one obtained for the intergenic region between Eucgr.K00956 and Eucgr.K00957.

In all cases, the orientation and alignment of the amplified BAC sequences to the genome sequence matched exclusively the targeted regions and therefore validated the accuracy of the assembly over the highly duplicated genomic region spanning 185 kb. The source sequences obtained for the genic and intergenic regions of this OMT gene cluster are available as supplementary data 3 in Myburg et al. (2014).

Table 3.1 – Amplicons obtained from genic and intergenic regions of a tandem array of eight E. grandis OMT genes.

All positions refer to chromosome 11. Code = code ID; Frw = forward.

Code   Accession         Target position     BAC position        Forward primer          Forward position    Reverse primer           Reverse position    Size (bp)  Frw sequenced region
K49    Eucgr.K00949      11468901..11469947  11459853..11641527  TCTCCTGAAACCGCCTACCTTG  11469848..11469827  ACGGCCTTCATCACGAACTTGTC  11469499..11469521  350        11469799..11469499
K50    Eucgr.K00950      11474584..11476110  11459853..11641527  AGAACATCTTCTCCGCCCACTG  11475884..11475863  ACCACAATCAAGCTGGGAAGGC   11474666..11474687  1219       11475829..11474893
K51    Eucgr.K00951      11511900..11515098  11459853..11641527  TTCGATCTGCCTCACGTCGTTG  11514393..11514372  ATCACGGATAGAGCTCGAGTGG   11512092..11512113  2302       11514342..11513343
K53    Eucgr.K00953      11539604..11540425  11459853..11641527  ATGATGTCCCCGTGGCAATT    11540035..11540016  TGGCAGTAGTACTCACCATCAA   11539627..11539648  409        11539994..11539627
K54    Eucgr.K00954      11546695..11549613  11459853..11641527  CGGGCAAGGCAAAATGTACA    11548681..11548700  TCTTCTCTCTCTCGATTGCTATG  11549271..11549249  591        11548720..11549260
K55    Eucgr.K00955      11562697..11566022  11459853..11641527  TGTCAAAAGGGTGGGAATCCT   11565133..11565113  GGAGAGGAGATGTGGGCCAA     11563637..11563656  1497       11565092..11564064
K56    Eucgr.K00956      11571359..11574375  11459853..11641527  TGTGAGCGGGATCACGATTAGG  11573845..11573824  TCCTTACCACCCGAGCAAAGTG   11571468..11571489  2378       11573778..11572813
K57    Eucgr.K00957      11609967..11612951  11459853..11641527  GTCAAGAGGGTGGGAATCGA    11612060..11612041  CCATTGAGGCACAAGCGATA     11610633..11610652  1428       11612020..11610969
0|49   out-Eucgr.K00949  11458900..11468900  11459853..11641527  CCGAGAAAATCACCGGGACA    11463028..11463047  CCTCCGCGTACCTTATGGAC     11463590..11463571  563        11463074..11463590
49|50  Eucgr.K00949-50   11469948..11474583  11459853..11641527  GGAGTCCATCGTCGGTGATC    11472960..11472979  GATGCCTCCCCCTTCCTTTC     11473559..11473540  600        11473007..11473559
50|51  Eucgr.K00950-51   11476111..11511899  11459853..11641527  TCGCCGGCTAGTTATTCGTC    11493888..11493907  GAATAGTGCGCGCGAACAAT     11494447..11494428  560        11493936..11494446
51|53  Eucgr.K00951-53   11515099..11539603  11459853..11641527  ACCCTCCCTCCTCCAAGATC    11537454..11537473  GTGGTCATATGGGTGGTGCA     11537954..11537935  501        11537495..11537954
53|54  Eucgr.K00953-54   11540426..11546694  11459853..11641527  TTTGACAAAAGGCGCTCGTG    11543236..11543255  CTCCATGTGGACGTGTTGGA     11543765..11543746  453        11543282..11543765
54|55  Eucgr.K00954-55   11549614..11562696  11459853..11641527  AGTACCTCCACCTCCTCGTC    11555697..11555716  TCGTACTCCGGGTCATCCTT     11556214..11556195  515        11555730..11556214
57|0   Eucgr.K00957-out  11613250..11618250  11459853..11641527  CACACGTCGGGCACAAAAAT    11613554..11613573  GCGAAACACATGATCCGACG     11614068..11614049  495        11613601..11614044
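As a reading aid for Table 3.1, the expected amplicon size can be derived from the primer coordinates as the distance between the two outermost primer positions. The sketch below assumes 1-based, inclusive chromosome coordinates and uses only three rows copied from the table; the function name is hypothetical.

```python
def amplicon_size(forward_pos: tuple[int, int], reverse_pos: tuple[int, int]) -> int:
    """Length in bp of the fragment delimited by the outermost primer coordinates."""
    coordinates = (*forward_pos, *reverse_pos)
    return max(coordinates) - min(coordinates) + 1


# (forward primer position, reverse primer position, amplicon size reported in Table 3.1)
rows = {
    "K49": ((11469848, 11469827), (11469499, 11469521), 350),
    "K50": ((11475884, 11475863), (11474666, 11474687), 1219),
    "K54": ((11548681, 11548700), (11549271, 11549249), 591),
}

for code, (forward, reverse, reported) in rows.items():
    print(code, amplicon_size(forward, reverse), reported)
```

Because the function takes the minimum and maximum of all four coordinates, it works regardless of the strand on which the forward primer lies.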

Figure 3.2 - Electrophoretic separation of amplicons generated from genic and intergenic regions of a tandem array of eight Eucalyptus grandis OMT genes (tandem gene cluster nº 547, as reported in Myburg et al., 2014). The gel consisted of a 2% (w/v) agarose matrix. See Table 3.1 for sample legend.

Figure 3.3 - A targeted BAC sequencing approach to verify the genomic architecture of a tandem array of eight E. grandis OMT genes. A. Nucleotide alignment of the coding regions (ORF) of the 5 most similar OMT gene models out of 8 (Eucgr.K00953 to Eucgr.K00957). B. Scheme of the E. grandis chromosome 11 showing 8 gene models arranged in tandem. C. Percentage of sequence identity between genomic clones from start to stop codon, including introns. D. Percentage of sequence identity of the CDS. E. Schematic representation of the 11,643kb region of the E. grandis chromosome 11 (grey horizontal bar) spanning the 8 OMT gene models (Eucgr.K00949, Eucgr.K00950, Eucgr.K00951, Eucgr.K00953, Eucgr.K00954, Eucgr.K00955, Eucgr.K00956 and Eucgr.K00957), represented by black vertical rectangles. Their positions are also represented on the corresponding BAC clone (EG_Ba070107), which is schematized by a bold black horizontal line. A close view of each gene model is presented showing its exon-intron structure (white boxes: exons; black lines: introns). The positions of the primers used to amplify specific regions on the BAC clones are shown using orange and blue arrow tips. Sequenced regions are shown using the same color code.

Discussion

The selective amplification, amplicon sequencing and alignment of the resulting sequences to the E. grandis reference genome carried out in this study validated the existence of the tandem gene cluster nº 547 (Myburg et al., 2014), comprising eight OMT genes, as proposed by the E. grandis (v1.0) genome assembly.

Moreover, this study reached conclusions similar to those of a parallel study carried out by Doctor Marçal Soler (LRSV; Toulouse, France), which targeted two predicted genomic clusters comprising nine R2R3-MYB genes (tandem gene clusters nº 344 and 346, as described in Supplementary data 19 from Myburg et al., 2014). The main results and conclusions of this study were published in Supplementary data 3 from Myburg et al. (2014).

Although both studies focused on single, handpicked genomic regions, they targeted different gene families, and the results obtained from independent BAC clones led to similar conclusions. Both studies produced experimental evidence supporting the accuracy of those local assemblies across highly similar tandem copies.

Together with the reports of a low frequency of contig breaks separating tandem gene pairs (Myburg et al., 2014; see extended data Fig. 5), these results can be regarded as positive indications of the general correctness of the E. grandis (v1.0) genome assembly.

Finally, these were also interesting and positive results in the context of this thesis, since many of the multigene families studied in Chapters 2 and 4 involve similar clusters of highly similar genes arranged in tandem arrays.

Chapter 4

The Eucalyptus grandis R2R3-MYB transcription factor family: evidence for woody growth-related evolution and function

Most of the present chapter was published in the following research article:

Soler M, Camargo ELO, Carocha V, Cassan-Wang H, Savelli B, Hefer CA, Paiva JAP, Alexander AM, Grima-Pettenati J. 2014. The Eucalyptus grandis R2R3-MYB transcription factor family: evidence for woody growth-related evolution and function. New Phytologist 206: 1297-1313.

Parts of the present chapter were also published in the following research articles:

Myburg AA, Grattapaglia D, Tuskan GA, Hellsten U, Hayes RD, Grimwood J, ..., Carocha V, Paiva J, ... 2014. The genome of Eucalyptus grandis. Nature 510 (7505): 356-362.

Cassan-Wang H, Soler M, Yu H, Camargo ELO, Carocha V, Ladouce N, Savelli B, Paiva JAP, Leple J-C and Grima-Pettenati J. 2012. Reference Genes for High-Throughput Quantitative Reverse Transcription–PCR Analysis of Gene Expression in Organs and Tissues of Eucalyptus Grown in Various Environmental Conditions. Plant Cell Physiol. 53(12): 2101-16.

Table of Contents

Chapter 4
Abstract
Introduction
Objectives
Materials and methods
Results
Discussion

Abstract

• The R2R3-MYB family, one of the largest transcription factor families in higher plants, controls a wide variety of plant-specific processes including, notably, phenylpropanoid metabolism and secondary cell wall formation. We performed a genome-wide analysis of this superfamily in Eucalyptus, one of the most planted hardwood trees worldwide.

• A total of 141 predicted R2R3-MYB sequences identified in the Eucalyptus grandis genome sequence were subjected to comparative phylogenetic analyses with Arabidopsis thaliana, Oryza sativa, Populus trichocarpa and Vitis vinifera. We analysed features such as gene structure, conserved motifs and genome location. Transcript abundance patterns were assessed by RNA-Seq and validated by high-throughput quantitative PCR.

• We found some R2R3-MYB subgroups with expanded membership in E. grandis, V. vinifera and P. trichocarpa, and others preferentially found in woody species, suggesting diversification of specific functions in woody plants. By contrast, subgroups containing key genes regulating lignin biosynthesis and secondary cell wall formation are more conserved across all of the species analysed.

• In Eucalyptus, R2R3-MYB tandem gene duplications seem to disproportionately affect woody-preferential and woody-expanded subgroups. Interestingly, some of the genes belonging to woody-preferential subgroups show higher expression in the cambial region, suggesting a putative role in the regulation of secondary growth.

Chapter introduction

R2R3-MYB genes constitute one of the largest families of transcription factors in plants and regulate many aspects of plant biology, such as primary and secondary metabolism, cell fate, developmental processes and responses to biotic and abiotic stresses (Dubos et al., 2010; Jin and Martin, 1999). The first demonstration that R2R3-MYB genes regulate lignin deposition came from transgenic tobacco (Nicotiana tabacum) plants overexpressing AmMYB308 and AmMYB330 from Antirrhinum majus (Tamagnone et al., 1998). After this discovery, several R2R3-MYB genes were shown to regulate aspects of phenylpropanoid metabolism, including the biosynthesis and deposition of lignin during secondary wall formation (reviewed by Dubos et al., 2010; Grima-Pettenati et al., 2012; Rogers and Campbell, 2004). Some R2R3-MYB genes have also been characterized in woody plants such as Populus and Pinus (Bomal et al., 2008; Gómez-Maldonado et al., 2004; Karpinska et al., 2004; McCarthy et al., 2010; Patzlaff et al., 2003; Winzell et al., 2010). Only two R2R3-MYB genes have been characterized to date in Eucalyptus (Eucalyptus gunnii), EgMYB1 and EgMYB2, two master regulators acting, respectively, as a repressor and an activator of secondary cell wall formation (Goicoechea et al., 2005; Legay et al., 2007, 2010).

R2R3-MYB proteins have a highly conserved N-terminal DNA-binding domain composed of two adjacent repeats of the MYB domain (the R2R3 MYB domain) and a highly variable C-terminal activation or repression domain (reviewed in Jin and Martin, 1999; Rogers and Campbell, 2004). In Arabidopsis thaliana, phylogenetic reconstruction of the R2R3-MYB family was performed by motif detection in the C-terminal variable region to define 22 subgroups of genes, although for many genes no motif was found to group them (Kranz et al., 1998; Stracke et al., 2001). Phylogenetic studies including studies on rice (Oryza sativa; Jiang et al., 2004; Yanhui et al., 2006), grapevine (Vitis vinifera; Matus et al., 2008), poplar (Populus trichocarpa; Wilkins et al., 2009) and, more recently, maize (Zea mays; Du et al., 2012a), soybean (Glycine max; Du et al., 2012b), apple (Malus domestica; Cao et al., 2013) and switchgrass (Panicum virgatum; Zhao and Bartley, 2014) were also performed, showing that most of the subgroups described in A. thaliana were conserved. However, in some cases new subgroups absent in A. thaliana were found. Sequence-based homology classification is important to construct hypotheses about the function of R2R3-MYB genes not yet studied in model species, as genes in the same subgroup are thought to have relatively similar roles (Jiang et al., 2004), and also to determine the function of genes in non-model species by defining putative orthology relationships with known genes in model species.

In this study, we took advantage of the newly sequenced Eucalyptus grandis genome to characterize the entire R2R3-MYB family, including its phylogenetic relationships with other plant lineages, genomic distribution, expansion by tandem duplication events, exon–intron structure and gene expression. Because of the industrial importance of wood in Eucalyptus, we focus on R2R3-MYB genes putatively involved in the regulation of secondary cell wall formation and woody biomass production.

Objectives

To accomplish a genome-wide survey to identify genes which are members of the R2R3-MYB family in Eucalyptus grandis.

To characterize the entire R2R3-MYB family, including its phylogenetic relationships with other plant lineages, genomic distribution, expansion by tandem duplication events, exon–intron structure and gene expression.

To focus on the identification and characterization of R2R3-MYB genes putatively involved in the regulation of secondary cell wall formation and woody biomass production.

Materials and methods

Identification of R2R3-MYB gene models and phylogenetic analysis

Eucalyptus grandis R2R3-MYB genes were retrieved from the E. grandis genome version 1.1 available in Phytozome (http://www.phytozome.net/) using INTERPROSCAN (Zdobnov and Apweiler, 2001) to search for the MYB domain signature. In total, 167 predicted gene models were found with two consecutive repeats of the MYB domain (excluding alternative transcripts). All but one (166) were also retrieved when performing a BLASTP analysis using the previously identified E. grandis genes against the whole A. thaliana R2R3-MYB gene data set (Dubos et al., 2010) with a cut-off e-value of 1e-40. The number of gene models was reduced to 141 after manual curation to eliminate partial (19) and uncertain (six) sequences.

We applied the same procedure to identify and retrieve the R2R3-MYB sequences from Populus trichocarpa (v2.2), V. vinifera (v2010) and rice (O. sativa; v7.0). The number of R2R3-MYB gene models identified by our methodology in the genome of A. thaliana (126) was the same as described in the literature (Dubos et al., 2010). However, the numbers of genes in P. trichocarpa (180), V. vinifera (123) and O. sativa (106) were different from those previously reported (192, 108 and 109, respectively; Matus et al., 2008; Wilkins et al., 2009; Yanhui et al., 2006). Taking into account that in P. trichocarpa 21 predicted genes could be alleles, whereas in V. vinifera a later study modified the number of R2R3-MYB proteins to 118 (Wilkins et al., 2009), the numbers of proteins found with our approach were very close to those reported, with only small differences that could be explained by the use of an updated version of the genome or by a different search methodology.

R2R3-MYB sequences were aligned using MAFFT with the FFT-NS-i algorithm (Katoh et al., 2002) and evolutionary history was inferred by constructing a neighbor-joining phylogenetic tree with 1000 bootstrap replicates using MEGA5 (Tamura et al., 2011). Sequences were compared using the complete deletion method in MEGA5, which does not consider the positions containing gaps or missing data (mostly located outside the conserved R2R3-MYB domain), as performed previously by Matus et al. (2008). The evolutionary distances were computed using the Jones–Taylor–Thornton substitution model, and the rate variation among sites was modelled with a gamma distribution (shape parameter of 1).

All of the E. grandis R2R3-MYBs belonging to subgroups with members in E. grandis, P. trichocarpa and V. vinifera but without members in A. thaliana and O. sativa (putative woody-preferential subgroups) were used in a reverse BLAST search against several different available plant genomes. The best hit obtained for a BLAST search in each plant genome was then used in another BLAST search against the E. grandis genome, and only if this retrieved the original E. grandis gene, or another member of the same subgroup, did we consider the subgroup to exist in the analyzed genome. The following plant genomes, available in Phytozome, were used: Citrus sinensis (v1.1), Gossypium raimondii (v2.1), A. thaliana (TAIR annotation release 10), Capsella rubella (v1.0), Brassica rapa (v1.2), Carica papaya (39 draft), Medicago truncatula (v3.5), Prunus persica (v1.0), Malus domestica (v1.0), Cucumis sativus (v1), Manihot esculenta (v4.1), Ricinus communis (v0.1), Linum usitatissimum (v1.0), P. trichocarpa (v2.2), V. vinifera (v2010), Solanum lycopersicum (v2.3), Mimulus guttatus (v1.1), Aquilegia coerulea (v1.1), O. sativa (v7.0), Zea mays (v5b.60), Selaginella moellendorffii (v1.0) and Physcomitrella patens (v1.6). The same strategy was applied using the EST (Expressed Sequence Tags) database (http://www.ncbi.nlm.nih.gov/dbEST/) for Pinus spp. and Picea spp.
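The reciprocal criterion described above can be expressed as a small decision function. The sketch below assumes the best BLAST hits have already been parsed into two lookup tables; the function name, the table layout and all gene identifiers are hypothetical placeholders, and no BLAST search is run here.

```python
def subgroup_present(subgroup_members, forward_best_hit, reverse_best_hit):
    """Decide whether a subgroup is considered to exist in a target genome.

    subgroup_members: E. grandis genes of one putative woody-preferential subgroup.
    forward_best_hit: E. grandis gene -> best hit in the target genome.
    reverse_best_hit: target-genome gene -> best hit back in E. grandis.
    """
    members = set(subgroup_members)
    for gene in members:
        hit = forward_best_hit.get(gene)
        if hit is None:
            continue  # no hit in the target genome for this query
        # Accept when the reverse search retrieves the original gene
        # or another member of the same subgroup.
        if reverse_best_hit.get(hit) in members:
            return True
    return False


# Hypothetical identifiers, for illustration only.
members = ["Egr_myb_A", "Egr_myb_B"]
forward = {"Egr_myb_A": "target_1", "Egr_myb_B": "target_2"}
reverse_same_subgroup = {"target_1": "Egr_myb_B", "target_2": "Egr_other"}
reverse_other_subgroup = {"target_1": "Egr_other", "target_2": "Egr_other"}

print(subgroup_present(members, forward, reverse_same_subgroup))
print(subgroup_present(members, forward, reverse_other_subgroup))
```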

In silico characterization of the Eucalyptus grandis R2R3-MYB family

Information on intron–exon position and splicing sites for each gene model in the corresponding chromosome scaffold was extracted from the Phytozome database (E. grandis annotation v1.1). Gene sequences were normalized to start always at position 1 and to have the same orientation. FANCYGENE (Rambaldi and Ciccarelli, 2009) was used to obtain a graphic representation of the structure of all the genes.

Gene models were plotted according to their physical position in the 11 chromosome scaffolds representing the 11 linkage groups of E. grandis using MAPCHART 2.2 (Voorrips, 2002). Genes belonging to the same subgroup, physically close (within 100 kb of each other) on a chromosome scaffold, and separated by 10 or fewer spacer gene models were considered to be in tandem duplicated arrays, similar to the procedure of Hanada et al. (2008). The same methodology was used to plot the gene models from P. trichocarpa, V. vinifera, A. thaliana and O. sativa.
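The tandem array criteria stated above (same subgroup, within 100 kb, and at most 10 spacer gene models) can be sketched as follows. This is an illustrative reading, not the script used in the study: distances are measured between gene start positions, spacers are counted from a per-scaffold gene rank, neighbouring genes are chained into one array, and the gene records are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModel:
    name: str
    scaffold: str
    start: int      # start coordinate on the scaffold (bp)
    order: int      # rank among all gene models on the scaffold
    subgroup: str


def tandem_arrays(genes, max_distance=100_000, max_spacers=10):
    """Group same-subgroup genes lying close together on one scaffold."""
    grouped = {}
    for gene in genes:
        grouped.setdefault((gene.scaffold, gene.subgroup), []).append(gene)

    arrays = []
    for members in grouped.values():
        members.sort(key=lambda g: g.start)
        current = [members[0]]
        for previous, following in zip(members, members[1:]):
            close = following.start - previous.start <= max_distance
            few_spacers = following.order - previous.order - 1 <= max_spacers
            if close and few_spacers:
                current.append(following)
            else:
                if len(current) > 1:
                    arrays.append([g.name for g in current])
                current = [following]
        if len(current) > 1:
            arrays.append([g.name for g in current])
    return arrays


# Hypothetical gene models, for illustration only.
example = [
    GeneModel("A", "scaffold_1", 1_000, 1, "S1"),
    GeneModel("B", "scaffold_1", 50_000, 4, "S1"),
    GeneModel("C", "scaffold_1", 400_000, 30, "S1"),
    GeneModel("D", "scaffold_1", 60_000, 5, "S2"),
]
print(tandem_arrays(example))
```

Chaining neighbours means that a long array is accepted even if its first and last members are more than 100 kb apart, provided each consecutive pair satisfies both thresholds; a stricter all-pairs rule would be an equally defensible reading.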

The rate of synonymous versus nonsynonymous substitutions per site in sequence pairs was calculated for each cluster of tandem duplicated genes using the codon-based Z-test for selection in MEGA5 with the Nei–Gojobori method using purifying selection as an alternative hypothesis (Tamura et al., 2011). Positions containing gaps and missing data were not considered for the analysis.

Amino acid motif identification was performed using all the sequences in each woody-preferential subgroup using the MEME Suite (Bailey et al., 2009) with parameters described by Jiang et al. (2004).

RNA-Seq expression analysis

RNA-Seq data from six different tissues of three field-grown E. grandis trees, as described in Mizrachi et al. (2010), were obtained from EUCGENIE (http://www.eucgenie.org/). Mean fragments per kilobase of exon per million fragments mapped (FPKM) values per gene were calculated for each tissue, and FPKM values of 0 were set to 1 before performing log2 transformation. Values were standardized using EXPANDER 6 (Ulitsky et al., 2010), resulting in a mean of 0 and a variance of 1 across tissues. Standardization was required because the absolute FPKM values were too divergent among genes for confident clustering of transcript abundance values. K-means clustering with EXPANDER 6 (Ulitsky et al., 2010) grouped the genes into 12 main clusters with unique expression patterns.
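The transformation applied to each gene before clustering can be illustrated in a few lines of Python. This sketch is not the EXPANDER implementation: it assumes the population variance is the one scaled to 1, and the FPKM values are invented for the example.

```python
import math
import statistics


def standardize_log2_fpkm(fpkm_by_tissue):
    """Log2-transform mean FPKM values (0 replaced by 1), then scale the
    profile of one gene to mean 0 and variance 1 across tissues."""
    logged = [math.log2(1.0 if value == 0 else value) for value in fpkm_by_tissue]
    mean = statistics.fmean(logged)
    spread = statistics.pstdev(logged)  # assumption: population standard deviation
    if spread == 0:
        raise ValueError("constant expression profile cannot be standardized")
    return [(value - mean) / spread for value in logged]


# Invented mean FPKM values for one gene in six tissues.
profile = standardize_log2_fpkm([0, 2, 4, 8, 16, 32])
print([round(value, 2) for value in profile])
```

Genes with very different absolute abundance end up on the same scale, so a clustering step compares the shape of the profiles and not their magnitude.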

Microfluidic quantitative PCR (qPCR) expression analysis

Juvenile xylem, mature xylem, tension xylem, opposite xylem, primary stem, secondary stem, young leaves (expanding leaves), mature leaves (fully expanded leaves), lateral roots, floral buds and fruit capsules were collected from Eucalyptus globulus. Shoot tips, secondary xylem, secondary phloem and a cambium-enriched fraction were collected from 7-yr-old and 25-yr-old trees of Eucalyptus ‘Gundal’ hybrids (Eucalyptus gunnii x Eucalyptus dalrympleana). Vascular tissues were sampled as described in Foucart et al. (2009). Briefly, bark was removed from the trunk and a cambium-enriched fraction was obtained by gently scraping both of the exposed surfaces (i.e. the inner side of the bark and the external side of the xylogenic tissue), thereby including cambial initials as well as xylem and phloem mother cells. After sampling of the cambium-enriched region, secondary phloem and xylem were obtained by deeper scraping of the exposed surfaces. Plant material provenance, RNA extraction and cDNA synthesis are described in Cassan-Wang et al. (2012). Gene-specific primers for microfluidic qPCR were designed using QUANTPRIME (Arvidsson et al., 2008) with default parameters (Supporting Information Table S3.1).

Transcript abundance was assessed by microfluidic RT-qPCR using the BioMark® 96.96 Dynamic Array platform (Fluidigm, San Francisco, CA, USA) as explained in Cassan-Wang et al. (2012). A dissociation step was performed after amplification to confirm the presence of a single amplicon. The contribution of primer dimers to the fluorometric reads and the absence of environmental contamination were assessed using non-template controls. When no amplification was detected, the cycle threshold (Ct) was assigned an arbitrary value of 30 (this was not done for the housekeeping genes, as it was not necessary and might introduce data bias). Amplification efficiency for each gene was calculated based on a dilution series (Table S4.1) and relative transcript abundance was calculated according to Pfaffl (2001). A mix with equal amounts of each sample was used to standardize data, and the geometric mean of five validated housekeeping genes, SAND (Eucgr.B02502), PP2A1 (Protein Phosphatase 2A1, Eucgr.B03386), EF1α (Elongation Factor 1 alpha, Eucgr.B02473), IDH (NADP Isocitrate dehydrogenase, Eucgr.F02901) and PP2A3 (Protein Phosphatase 2A3, Eucgr.B03031), was used as reference to normalize data (Cassan-Wang et al., 2012).
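
The efficiency-corrected calculation can be sketched as follows, assuming that the sample mix plays the role of the calibrator and that efficiency is expressed as the amplification factor per cycle (2.0 for perfect doubling). The function names, the tuple layout and the Ct values are hypothetical; the sketch extends the single-reference ratio of Pfaffl (2001) to several reference genes by taking the geometric mean of their relative quantities, which is one common way of combining them and is not necessarily the exact procedure used here.

```python
import math

NO_AMPLIFICATION_CT = 30.0  # arbitrary Ct assigned to target genes with no amplification


def target_ct(ct):
    """Replace a missing Ct (None) by the arbitrary value used for target genes."""
    return NO_AMPLIFICATION_CT if ct is None else ct


def relative_quantity(efficiency, ct_calibrator, ct_sample):
    """Efficiency-corrected quantity of a gene in a sample relative to the calibrator."""
    return efficiency ** (ct_calibrator - ct_sample)


def normalized_abundance(target, references):
    """target and each reference are (efficiency, ct_calibrator, ct_sample) tuples."""
    target_rq = relative_quantity(*target)
    ref_rqs = [relative_quantity(*ref) for ref in references]
    # Geometric mean of the reference-gene relative quantities.
    ref_geomean = math.exp(sum(math.log(q) for q in ref_rqs) / len(ref_rqs))
    return target_rq / ref_geomean


# Hypothetical Ct values for one target gene and two reference genes.
ratio = normalized_abundance(
    target=(2.0, 25.0, target_ct(23.0)),
    references=[(2.0, 20.0, 19.0), (2.0, 18.0, 17.0)],
)
print(ratio)
```

Taking the logarithm in base 2 of the resulting ratio gives values on the scale used for the hierarchical clustering.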

Finally, data were log2-transformed and hierarchical clustering was performed using EXPANDER 6 (Ulitsky et al., 2010).

Results

Comparative phylogenetic analysis of the R2R3-MYB family in five angiosperm species

A total of 141 R2R3-MYB gene models were identified in the E. grandis genome after manual curation and exclusion of alternative transcripts. In addition to the accession numbers attributed by Phytozome, we assigned short names to the 141 E. grandis R2R3-MYB genes. We kept the names of EgrMYB1 and EgrMYB2, the two first and only Eucalyptus R2R3-MYB genes functionally characterized to date (Goicoechea et al., 2005; Legay et al., 2007, 2010). The 139 remaining genes were numbered consecutively, from the first gene on the first chromosome scaffold (EgrMYB3) to the last gene on the last chromosome scaffold (EgrMYB141). Correspondence of gene names with accession numbers is shown in Table S4.2. All identified R2R3-MYB genes from E. grandis (141) were aligned with those of A. thaliana (126) and their evolutionary history was inferred by constructing a neighbour-joining phylogenetic tree. Taking into account the topology of the tree and the bootstrap values, 42 subgroups were defined (Fig. S4.1) with high bootstrap values defining most of the subgroups (above 75 for > 90% of the subgroups). Because A. thaliana is, by far, the species in which the R2R3-MYB genes have been most extensively studied and because nearly all of the subgroups contain at least one A. thaliana gene, we used the nomenclature of Kranz et al. (1998) revised by Stracke et al. (2001) and Dubos et al. (2010) to name the subgroups. When no subgroup name had already been proposed in A. thaliana, we named the subgroup after the best-known functionally characterized A. thaliana member.

We then included all of the R2R3-MYB genes identified in V. vinifera (123), P. trichocarpa (180) and O. sativa (106) together with those from A. thaliana and E. grandis to construct a neighbour-joining phylogenetic tree. The topology of the resulting tree was mostly the same as obtained when using only sequences from E. grandis and A. thaliana, although bootstrap values were somewhat lower because 676 sequences from five different plant species were used (Fig. 4.1; Fig. S4.2 and Table S4.3). Generally, a good agreement exists between the phylogeny presented herein and the phylogeny described previously for A. thaliana (Dubos et al., 2010; Stracke et al., 2001), with nearly all of the subgroups maintained in this five-species phylogeny. The only exceptions are the A. thaliana genes of the subgroups S9 and S18, which were split in two (S9a, S9b, S18a and S18b), and the genes from subgroups 2 and 3, and 10 and 24, which were merged (S2&3 and S10&24).

Although, in general, bootstrap values used to define subgroups were relatively high, branches at the base of the tree were very short and had low bootstrap support, probably because many gene duplications in the R2R3-MYB proteins took place early in the history of land plants (Rabinowicz et al., 1999) and they were followed by rapid diversification of the family (Dias et al., 2003). Inside the subgroups, bootstrap values were also low, impairing in many cases the identification of direct co-orthology relationships between genes from different species, probably because many genes evolved differently after divergence from a common ancestor, including several events of lineage-specific gene duplication and gene loss.

The vast majority of the subgroups (70%) contained members from all five species, whereas the remaining subgroups were found only in some species. For instance, three subgroups (S6, S15 and SAtM82) were present in all species but rice. These subgroups were also absent in maize (Du et al., 2012a), suggesting that they are not present in monocots. Some subgroups were specific to A. thaliana, V. vinifera or rice. The A. thaliana-specific subgroup S12, for instance, is known to contain members involved in the regulation of glucosinolate biosynthesis, which is specific to the Brassicaceae family (Matus et al., 2008). Three subgroups (S5, S6 and SAtM5, highlighted in yellow in Fig. 4.1) contained a significantly higher number of sequences from the woody perennial species (E. grandis, V. vinifera and P. trichocarpa), as compared with the herbaceous species (A. thaliana and O. sativa). We designated these subgroups as putative “woody-expanded subgroups” (Fig. 4.1). Remarkably, A. thaliana members of these three woody-expanded subgroups are involved in the phenylpropanoid pathway, more specifically in proanthocyanidin and anthocyanin biosynthesis [AtMYB123/TT2 (Nesi et al., 2001); AtMYB75/PAP1, AtMYB90/PAP2, AtMYB113, and AtMYB114 (Borevitz et al., 2000; Gonzalez et al., 2008); AtMYB5 (Gonzalez et al., 2009)], suggesting diversified regulation of these pathways in woody plants.

Five subgroups were completely absent from A. thaliana and rice, containing only members from the three woody species (E. grandis, V. vinifera and P. trichocarpa) (highlighted in red in Fig. 4.1). In order to determine whether these five subgroups were woody-specific, we assessed their occurrence in other plant genomes.

Figure 4.1 - Neighbour-joining phylogenetic tree of R2R3-MYB proteins from five plant species. The phylogenetic tree was constructed using 676 amino acid sequences including all of the R2R3-MYB proteins from Eucalyptus grandis (Egr), Vitis vinifera (Vv), Populus trichocarpa (Ptr), Arabidopsis thaliana (At) and Oryza sativa (Os) (Supporting Information Table S2 in Soler et al., 2015). Each triangle represents a R2R3-MYB subgroup (information about which genes are in each subgroup is included in Table S3 in Soler et al., 2015), defined based on the topology of the tree and the bootstrap values. Bootstrap values are shown next to the branches (values < 25 are not shown). Two subgroups (S10&24 and S18a) had low bootstrap values, suggesting high divergence between these proteins from these five different plant species. The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree (amino acid substitutions per site). Subgroup names are included next to each clade together with a short name to simplify nomenclature. The number of genes of each species for each subgroup is also included. Subgroups that are generally expanded in woody species (E. grandis, V. vinifera and P. trichocarpa) are highlighted in yellow, whereas subgroups preferentially found in woody species are highlighted in red. Other groups are in grey.

We performed a reverse BLAST search in which all of the E. grandis genes belonging to these subgroups were used in similarity searches against the genomes of other plant species from diverse phylogenetic divisions (Fig. 4.2). The results showed that all five subgroups were completely absent in all the Bryophyte, Lycophyte and Monocot genomes analyzed, but were present to different extents in Eudicots (except in the Brassicaceae family) and Gymnosperms, the two phylogenetic divisions containing cambium-derived woody plants. Within the Eudicots, we found members of at least four of these five subgroups in the perennial trees and shrubs (E. grandis, C. sinensis, G. raimondii, C. papaya, P. persica, M. domestica, M. esculenta, P. trichocarpa and V. vinifera), the only exception being R. communis. In contrast, we found fewer subgroups (from one to three) in non-woody Eudicot plants (M. truncatula, C. sativus, L. usitatissimum, S. lycopersicum, M. guttatus and A. coerulea) and none in non-woody Brassicaceae (A. thaliana, C. rubella and B. rapa). Accordingly, we named the five subgroups ‘woody-preferential subgroups’ (WPS-I, II, III, IV and V), as they are potentially involved in regulating aspects of cambium-derived woody growth that may be absent or less developed in some plant lineages.
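
The presence/absence summary derived from the reverse BLAST results can be tabulated with a short sketch like the one below. It assumes that the BLAST hits have already been assigned to subgroups; the function names, the input layout and the species labels are placeholders and do not correspond to the results of this study.

```python
SUBGROUPS = ("WPS-I", "WPS-II", "WPS-III", "WPS-IV", "WPS-V")


def presence_matrix(hits_by_species):
    """hits_by_species: {species: iterable of subgroup names recovered for that genome}."""
    matrix = {}
    for species, found in hits_by_species.items():
        found = set(found)
        matrix[species] = {sg: (sg in found) for sg in SUBGROUPS}
    return matrix


def count_present(matrix):
    """Number of woody-preferential subgroups detected in each genome."""
    return {species: sum(row.values()) for species, row in matrix.items()}


# Placeholder input, for illustration only.
toy_hits = {
    "species_A": ["WPS-I", "WPS-II", "WPS-III", "WPS-IV", "WPS-V"],
    "species_B": ["WPS-II"],
    "species_C": [],
}
print(count_present(presence_matrix(toy_hits)))
```

Each row of the matrix corresponds to one row of black and white boxes in a presence/absence figure, and the per-genome counts are what allow woody and non-woody species to be compared.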

Figure 4.2 - Presence/absence of genes belonging to the woody-preferential subgroups in different plant genomes from diverse phylogenetic divisions obtained by a reverse BLAST strategy. Black boxes indicate the presence of woody-preferential subgroups, whereas white boxes indicate their absence.

Figure 4.3 - Physical position of the 141 R2R3-MYB genes in the 11 chromosome scaffolds of Eucalyptus grandis. Genes belonging to the same subgroup, physically close (within 100 kb of each other) and separated by 10 or fewer spacer gene models were considered to be the result of tandem duplication events. Each duplication array is highlighted in yellow when corresponding to woody-expanded subgroups (S5, S6 and SAtM5), in red when corresponding to woody-preferential subgroups (WPS-I, II, III, IV and V), and in grey for the other subgroups.

In silico characterization of the E. grandis R2R3-MYB gene models

Analysis of the exon–intron structure revealed that most of the E. grandis R2R3-MYB genes contain three exons (Fig. S4.3). In most of these three-exon R2R3-MYB genes, exons 1 and 2 encoded nearly the whole MYB domain, whereas the third exon encoded the rest of the MYB domain and the C-terminal region of the protein. This structure is similar to that described for A. thaliana and V. vinifera (Matus et al., 2008). The size of the third exon is much more variable than those of the first two exons, and changes in the sequence of the third exon are thought to be associated with the functional divergence among R2R3-MYB genes (Dias et al., 2003; Matus et al., 2008). Within each subgroup, all the genes had similar patterns of intron–exon structure, as was also observed by Jiang et al. (2004). In general, the sizes of the introns were variable, contrasting with the positions and phases of the intron sites, which were quite conserved. Indeed, the great majority of the genes contained two conserved intron sites, one in the R2 domain and the other in the R3 domain, in phases 1 and 2, respectively (Fig. S4.3). The remaining genes exhibited different splicing patterns, which were generally conserved within the same subgroups in different species (Jiang et al., 2004). The E. grandis R2R3-MYB genes are distributed throughout the 11 chromosome scaffolds, with chromosome scaffolds B, C and J containing more genes than the others (Fig. 4.3). Interestingly, whereas chromosome scaffold G contains the smallest number of R2R3-MYB genes, it does contain the E. grandis orthologues of the two well-described EgMYB1 and EgMYB2 genes involved in secondary cell wall formation (Goicoechea et al., 2005; Legay et al., 2007, 2010). Ten tandem duplication arrays are highlighted in Fig. 4.3. Remarkably, these tandem arrays include mostly genes from the woody-expanded and woody-preferential subgroups. These subgroups also contain many gene models in tandem duplication arrays in P. trichocarpa and V. vinifera, but not in A. thaliana and O. sativa (Fig. S4.4).

Analysis of the ratio of synonymous to non-synonymous substitutions per site revealed that nearly all tandem duplicated genes were under purifying selection (Table S4.4). This type of selection, in which the rate of synonymous mutations is greater than that of non-synonymous mutations, is the most common form of evolution for genes under functional constraints. However, for some gene pairs from subgroup S6 on scaffold J, the rates of synonymous and non-synonymous substitutions per site were similar, suggesting that these genes evolved without selective constraints. This type of neutral evolution is typical of, although not exclusive to, pseudogenes (Koonin and Wolf, 2010).

Arabidopsis thaliana subgroups were defined by the presence in the variable C-terminal region of the R2R3-MYB genes of conserved motifs thought to play important roles in specifying functional divergence (Kranz et al., 1998; Stracke et al., 2001). For example, it has been demonstrated that AtMYB4, the putative orthologue of EgMYB1, contains a C-terminal motif also present in other R2R3-MYB proteins from subgroup S4 that is essential to bind and repress the C4H promoter, thus modifying phenylpropanoid metabolism (Jin et al., 2000). We searched for conserved motifs in the C-terminal region of the R2R3-MYB genes belonging to the woody-preferential subgroups including E. grandis, P. trichocarpa and V. vinifera genes. Using the MEME SUITE (Bailey et al., 2009), we identified one to two putative motifs in each subgroup (Fig. S4.5). These previously unidentified motifs could be important for the function or regulation of these woody-preferential genes.

Deep transcript abundance profiling of the R2R3-MYB family by RNA-Seq

Eucalyptus grandis R2R3-MYB transcript abundance was analysed in six E. grandis organs and tissues by RNA-Seq (Table S4.5). Clustering based on transcript abundance patterns revealed 12 RNA-Seq-based expression clusters (Fig. 4.4). In most cases, genes present in the same phylogenetic subgroup exhibited distinct transcript profiles. As pointed out by Dubos et al. (2010), genes belonging to the same subgroup can perform similar functions, but in different cell types or in response to different developmental or environmental cues. In some cases, however, transcript abundance patterns of all genes belonging to the same phylogenetic subgroup were quite similar. This is the case for all members of subgroup S1, which were grouped in RNA-Seq-based cluster 1. In A. thaliana, members of this subgroup are mostly involved in stress responses and defense mechanisms (Cominelli et al., 2005; Raffaele et al., 2008; Seo et al., 2011). Similarly, all genes of subgroup S22 were grouped in RNA-Seq-based cluster 3. Arabidopsis thaliana members of subgroup S22 are generally related to abiotic stress responses (Jung et al., 2008).

Most of the genes involved in secondary cell wall formation and lignin biosynthesis exhibit preferential expression in secondary xylem, where these processes mainly occur. Therefore, the best candidates to play roles in wood formation in E. grandis are probably grouped in RNA-Seq-based clusters 2 (preferentially expressed in immature xylem and phloem) and 10 (preferentially expressed in immature xylem). In line with this hypothesis, about 70% of genes found in clusters 2 and 10 belong to subgroups S2&3, S4, S13, S21, SAtM46, SAtM85 and SAtM103. All these subgroups contain members that have been shown to be involved in secondary cell wall formation in functional studies, mostly in A. thaliana and poplar (reviewed in Dubos et al., 2010; Grima-Pettenati et al., 2012).

Moreover, the E. grandis orthologues of EgMYB1 and EgMYB2 (Goicoechea et al., 2005; Legay et al., 2007, 2010) were also included in RNA-Seq-based clusters 2 and 10, respectively.

No transcripts were detected for seven genes in RNA-Seq-based cluster 12. A lack of expression data may indicate that these are pseudogenes. In support of this hypothesis, this cluster contained EgrMYB113, a gene that apparently evolved without selective constraints (Table S4.4) and has a large deletion in its C-terminal domain (data not shown). However, some genes might only be expressed under very specific treatments. This may be the case for AtMYB19, which, in a large survey carried out by Kranz et al. (1998) in A. thaliana, was found to be expressed only after infection with Pseudomonas syringae. Thus, within RNA-Seq-based cluster 12, it is not possible to distinguish pseudogenes (with the likely exception of EgrMYB113) from genes that are only expressed under specific conditions not included in our RNA-Seq survey.

Figure 4.4 - Heat map of the RNA-Seq transcript abundance pattern of the 141 R2R3-MYB genes from Eucalyptus grandis in six different tissues clustered in 12 expression groups using K-means. For each gene, its name is shown to the left of the heat map, whereas the short name of the phylogenetic subgroup is included to the right. Transcript abundance is expressed in standardized log2 fragments per kilobase of exon per million fragments mapped (FPKM) values. Next to each RNA-Seq-based cluster there is a graph with the mean transcript abundance ± SD for the entire cluster. ST, shoot tips; YL, young leaves; MT, mature leaves; FL, flowers; PH, phloem; IX, immature xylem.

Detailed transcript abundance profiling of selected R2R3-MYBs by microfluidic RT-qPCR

Based on a combination of phylogenetic analysis and RNA-Seq transcript abundance data, we selected 63 R2R3-MYB genes for expression profiling in a larger survey of developmental and environmental conditions. These genes were selected among those belonging to the woody-expanded subgroups (S5, S6 and SAtM5), woody-preferential subgroups (WPS-I, II, III, IV and V) or subgroups related to secondary cell wall formation and lignin biosynthesis (S2&3, S4, S13, S21, SAtM46, SAtM85 and SAtM103). Relative transcript abundance was analysed by microfluidic RT-qPCR in 16 different tissues of E. globulus and E. gunnii x dalrympleana hybrids (Table S4.6). Hierarchical clustering was performed and genes were grouped in three main RT-qPCR-based expression clusters (Fig. 4.5): genes preferentially expressed in xylem samples; genes highly expressed in lateral roots, leaves, shoot tips, flowers and fruits; and genes with preferential expression in the cambium-enriched region.

Figure 4.5 - Heat map of the transcript abundance patterns assessed using microfluidic qPCR of 63 Eucalyptus genes in 16 different tissues from Eucalyptus globulus and from Eucalyptus Gundal hybrids (Eucalyptus gunnii x Eucalyptus dalrympleana). Genes and samples were hierarchically clustered according to their transcript abundance (expressed in relation to the mean of all samples and log2-transformed) showing a topology that allows a clear differentiation of three different qPCR-based expression clusters. Gene names are shown to the left of the heat map, whereas the short name of the phylogenetic subgroup is included to the right. Next to each RT-qPCR-based cluster there is a graph with the transcript abundance of all the genes within the cluster (in grey) and the mean of all the genes within the cluster (in black). R, lateral roots; SS, secondary stem; PS, primary stem; OX, opposite xylem; TX, tension xylem; X, xylem; JX, juvenile xylem; MX, mature xylem; Ph, phloem; C25y, cambium from 25-yr-old trees; C7y, cambium from 7-yr-old trees; FB, floral buds; FC, fruit capsules; ST, shoot tips; ML, mature leaves; YL, young leaves; Egl, E. globulus; Egn, E. gunnii x E. dalrympleana.

The RT-qPCR-based cluster of genes highly and preferentially expressed in xylem (Fig. 4.5) contained most of the E. grandis R2R3-MYB genes putatively involved in secondary cell wall formation, which were previously found in RNA-Seq-based clusters 2 and 10 (in Fig. 4.4). The RT-qPCR-based cluster of genes highly expressed in lateral roots, leaves, flowers and fruits was composed of a large set of genes with diverse transcript abundance patterns that share a generally low transcript abundance in the vascular tissues (xylem, cambium and phloem) and a high transcript abundance in shoot tips (Fig. 4.5). This RT-qPCR-based cluster contains all genes analyzed belonging to WPS-IV and V, which were not preferentially expressed in vascular tissues.

The RT-qPCR-based cluster of genes preferentially expressed in the cambium-enriched fraction is highly enriched in genes from subgroups S5 and WPS-I, II and III. In fact, all of the genes profiled with RT-qPCR included in these woody-preferential subgroups were grouped in this cluster, with the exception of EgrMYB68, which was included in the xylem RT-qPCR-based cluster because of its high transcript abundance in xylem tissues, although it was also quite abundant in the cambial-enriched fraction.

Discussion

The analysis of the genome sequence of E. grandis, the second hardwood forest tree to be sequenced, identified 141 members of the R2R3-MYB family, which were further characterized by co-phylogenetic analysis with the corresponding gene families of A. thaliana, P. trichocarpa, V. vinifera and O. sativa. This comparative phylogenetic approach allowed us to identify three subgroups expanded in woody species (S5, S6 and SAtM5) and, strikingly, five new subgroups only present in woody species (WPS-I, II, III, IV and V).

A reverse BLAST search on 24 genomes from relevant taxonomic lineages revealed that these five woody-preferential subgroups were totally absent in the basal lineages of the Bryophytes and Lycophytes, as well as in the more modern Monocot and Brassicaceae lineages. Within the Eudicots, all five subgroups were found only in woody perennial species, whereas in non-woody plants only zero to three subgroups were identified. Moreover, the presence of some subgroups in Gymnosperms and Eudicots suggested that they are derived from a common ancestor of the Gymnosperms and Angiosperms, and were lost in some lineages during evolution.

Eucalyptus grandis genes from WPS-I, II and III are preferentially expressed in the cambium-enriched region, the meristem responsible for the extensive secondary growth that leads to wood formation. It is therefore tempting to speculate that these genes could regulate specific aspects related to cambium development and metabolism. However, it should be taken into account that genes regulating secondary growth could be present in nearly all Eudicot plants, and that the distinction between woody and herbaceous lifestyles could be mainly a measure of the degree of gene regulation (Groover, 2005; Spicer and Groover, 2010). It is well known, for example, that A. thaliana plants grown under specific conditions are able to develop secondary growth in parts of the inflorescence stem and in the hypocotyl, producing secondary xylem with vessels and fibres similar to those of angiosperm woody species, although some structural differences are still notable such as the lack of rays (Chaffey et al., 2002; Lens et al., 2012; Lev-Yadun, 1994; Melzer et al., 2008; Nieminen et al., 2004). Even these differences were questioned in a recent paper showing the presence of rays during secondary growth of the A. thaliana inflorescence stems (Mazur and Kurczynska, 2012). Nevertheless, some aspects of tree secondary growth, such as the seasonal variations of cambial activity and the developmental separation between juvenile and mature wood, do not occur in short-lived annuals such as A. thaliana (Chaffey et al., 2002; Nieminen et al., 2004). It is conceivable, therefore, that perennial woody plants require more sophisticated control to adapt secondary growth to many different environmental conditions over their long lifespans.

Nearly all the transcripts of the genes from the woody-preferential subgroups preferentially expressed in the cambium were also relatively highly expressed in shoot tips. This finding supports a possible dual role in meristematic regulation, as found for many genes regulating both the vascular cambium and shoot apical meristems (Schrader et al., 2004; reviewed in Groover, 2005). However, taking into account the close phylogenetic relationship between these genes and those from subgroups involved in the biosynthesis of anthocyanins and flavonoids (S4, S5, S6, S7 and SAtM5), another plausible possibility is that genes belonging to WPS-I, II and III regulate some phenylpropanoid-derived compounds mostly synthesized in cambium-derived cells from woody species. In fact, it is well known that flavonoids can interfere with auxin transport capacity by competing with the auxin efflux inhibitor 1-N-naphthylphthalamic acid (NPA), thus increasing local auxin concentrations (reviewed by Taylor and Grotewold, 2005). As cambium stimulation depends on auxin and local NPA treatments induce cambium activity (Suer et al., 2011), it is tempting to speculate that these woody-preferential genes could regulate the biosynthesis of flavonoids or some phenylpropanoid-derived compounds that modify auxin transport in cambium. Indirect evidence to support this hypothesis is that, in A. thaliana, R2R3-MYBs from the S7 subgroup negatively regulate auxin response via interacting with other transcription factors (reviewed by Li, 2014). Considering the importance of cambial activity for woody plants, the existence of such a fine regulatory system, which could be absent or less developed in herbaceous plants, is plausible. This sophisticated mechanism could, for example, modulate cambium activity in response to environmental factors. The functions of woody-preferential subgroups should, however, be further studied using reverse genetic approaches in woody model plants with the aim of characterizing their roles. The C-terminal motifs conserved within the different woody-preferential subgroups could assist in the characterization of these genes, as they are thought to specify the function of these R2R3-MYB genes.

Genes from subgroups expanded in woody species (S5, S6 and SAtM5) include members involved in the regulation of the phenylpropanoid pathway leading to proanthocyanidin and anthocyanin biosynthesis, which is important for pigment production and proper seed development in A. thaliana (Borevitz et al., 2000; Gonzalez et al., 2008, 2009; Nesi et al., 2001). Matus et al. (2008) hypothesized that the expansion of these subgroups in V. vinifera compared with A. thaliana is attributable to the need for more sophisticated control of phenylpropanoid-related processes regulating the formation of grapes. Some genes from these subgroups have been found to control fruit colour in some Rosid species (Allan et al., 2008 and references therein; Lin-Wang et al., 2010). It is conceivable that such a mechanism could also be involved in the fruiting structures of E. grandis and P. trichocarpa, but other roles for these genes are also possible. For example, as flavonoid biosynthesis and lignin biosynthesis share many reactions from the phenylpropanoid pathway, some genes regulating anthocyanin biosynthesis could also be related to lignin biosynthesis and secondary cell wall formation. This is the case with the gene AtMYB75/PAP1 from the S6 subgroup, which, apart from its role in anthocyanin biosynthesis, interacts with the transcriptional repressor KNAT7 (KNOTTED-LIKE HOMEOBOX OF ARABIDOPSIS THALIANA 7) and, in A. thaliana loss-of-function mutants, increases cell wall thickness in the xylary and interfascicular fibres of the inflorescence stems (Bhargava et al., 2010). In addition, genes from woody-expanded subgroups are also involved in stress responses, as was found for some A. thaliana, poplar and apple members of these subgroups (Cao et al., 2013; Chen et al., 2012; Lea et al., 2007; Mellway et al., 2009; Olsen et al., 2009; Page et al., 2012; Rowan et al., 2009).

S5 is the most expanded subgroup in E. grandis and only two of the 16 genes belonging to this subgroup found in the E. grandis genome were not included in tandem arrays. Of the 12 genes from the S5 subgroup analysed by microfluidic RT-qPCR, six grouped in the RT-qPCR-based cluster of genes preferentially expressed in the cambium-enriched fraction. The fact that S5 is the subgroup most expanded in E. grandis, even compared with other woody species analyzed, points to putative specializations relating to unique aspects of Eucalyptus biology (Myburg et al., 2014). To date, the only functional data available for genes belonging to subgroup S5 come from maize C1, A. thaliana AtMYB123 (TT2), and their putative orthologues in P. trichocarpa and V. vinifera, all of which have been related to the biosynthesis of proanthocyanidins and have no described roles in cambium (Bogs et al., 2007; Mellway et al., 2009; Nesi et al., 2001; Paz-Ares et al., 1987). Subgroup S5 genes in E. grandis could also be involved in the regulation of the biosynthesis of flavonoids, for example related to production of kino, a polyphenolic exudate that seems to contain high amounts of proanthocyanidins (Hillis and Yazaki, 1974). This substance forms veins particularly in the wood of Eucalyptus species, apparently in response to stresses such as damage to the vascular cambium, decreasing the commercial value of the wood produced under these circumstances (Eyles and Mohammed, 2003; Hillis and Yazaki, 1974).

Twenty-four per cent of R2R3-MYB genes in E. grandis were arranged in tandem duplication arrays, somewhat lower than the percentage found at the whole-genome level (34%; Myburg et al., 2014), but this percentage was not equally distributed across subgroups. The E. grandis genome has a larger proportion of tandem duplicated genes than any other plant genome studied and this seems to be attributable to an unusually high rate of tandem duplicate gain (Myburg et al., 2014). However, it is remarkable that 94% of the R2R3-MYB genes of E. grandis that are spatially arranged in tandem arrays belong to subgroups either expanded or preferentially found in woody species. Specifically, the subgroups containing the highest proportion of tandem duplicated genes were S5 and S6, with 88% and 82%, respectively, of their genes arranged in tandem in E. grandis.
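
As a quick consistency check of these percentages, the S5 figure can be recomputed from the counts given earlier in this Discussion (16 S5 genes, two of them outside tandem arrays). The absolute number of tandem-arranged genes is not stated in the text, so the value derived below from 24% of 141 is only a back-calculated estimate and should be read as an assumption.

```python
total_r2r3_myb = 141      # R2R3-MYB genes identified in E. grandis
s5_total = 16             # genes in subgroup S5
s5_not_in_tandem = 2      # S5 genes outside tandem arrays

# Share of S5 genes arranged in tandem: 14 of 16.
s5_tandem_pct = 100 * (s5_total - s5_not_in_tandem) / s5_total
print(s5_tandem_pct)      # 87.5, reported as 88% after rounding

# Back-calculated estimate (assumption): 24% of 141 genes.
tandem_estimate = round(0.24 * total_r2r3_myb)
print(tandem_estimate)    # 34
```

The recomputed S5 value agrees with the 88% reported above once rounded to the nearest integer.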

Populus trichocarpa and V. vinifera also contained a high percentage of R2R3-MYB gene models from woody-expanded and woody-preferential subgroups arranged in tandem repeats (c. 80% of the tandem duplicated genes). The percentage of tandem genes in woody-expanded groups was much lower in A. thaliana (33%), whereas no R2R3-MYB genes arranged in tandem were found in the genome of O. sativa. It has been previously shown that tandem duplicated genes tend to be more frequently retained during evolution when they are involved in responses to environmental stimuli (Hanada et al., 2008 and references therein). Taking into account that perennial plants such as trees and shrubs have to overcome many biotic and abiotic challenges over their lifetimes in order to be evolutionarily successful, whereas annual herbs can simply die after producing seeds, it could be hypothesized that the adaptive mechanisms for stress resistance in woody perennial plants should be more elaborate than those in annual herbaceous plants. Therefore, the higher number of tandem duplicated genes from the woody-expanded and woody-preferential subgroups in E. grandis, P. trichocarpa and V. vinifera could be related to a more sophisticated response to stress conditions in these woody perennial plants.


Chapter 5

Anatomical, chemical and transcriptomic dynamics during tension wood formation in Eucalyptus globulus

Parts of the present chapter were published in the following research articles:

Carocha V, Quilhó T, Alves A, Graça C, Pêra S, Oliveira J, San-Clemente H, Leal L, Rodrigues JC, Fevereiro P, Grima-Pettenati J, and Paiva JAP. 2015. Anatomical, chemical and transcriptomic dynamics during tension wood formation in Eucalyptus globulus. In preparation.

Carvalho A, Graça C, Carocha V, Pêra S, Lousada JL, Lima-Brito J and Paiva JAP. 2015. An improved total RNA isolation from secondary tissues of woody species for coding and non-coding gene expression analyses. Wood Sci Technol 49: 647-658.

Cassan-Wang H, Soler M, Yu H, Camargo ELO, Carocha V, Ladouce N, Savelli B, Paiva JAP, Leple J-C and Grima-Pettenati J. 2012. Reference Genes for High-Throughput Quantitative Reverse Transcription–PCR Analysis of Gene Expression in Organs and Tissues of Eucalyptus Grown in Various Environmental Conditions. Plant Cell Physiol. 53 (12): 2101-16.


Table of Contents

Chapter 5
Abstract
Chapter introduction
Objectives
Materials and methods
Results
Discussion


Abstract

• The formation of tension wood (TW) and opposite wood (OW) under a gravitropic stimulus is a remarkable example of wood plasticity in angiosperm trees. Wood properties are determined by the complex anatomical features and the physicochemical composition of the secondary cell walls.

• Contrasting anatomical and chemical changes during the differentiation of TW and OW were profiled and confirmed the significant impact of gravitropic stress on cell wall properties.

• Whole-coding transcriptome dynamics, revealed by deep sequencing of 12 E. globulus mRNA libraries produced from TW and OW tissues formed under different periods of gravitropic stimulus, allowed the identification of 93 differentially expressed genes. Tissue fate, rather than the duration of bending stress, emerged as the main determinant of differentiation during the induction of TW and OW.

• Distinct carbon partitioning priorities and dynamic carbon fluxes to the different cell wall components and to energy metabolism were revealed. Critical transcriptional modulation and hormonal regulatory mechanisms of E. globulus xylogenesis were also highlighted.


Chapter introduction

Large-scale sequencing projects and xylem transcriptome or proteome analyses have significantly advanced our knowledge of the number and diversity of structural and regulatory genes involved in xylem differentiation in Eucalyptus spp. (Camargo et al., 2014; Celedon et al., 2007; Elissetche et al., 2011; Foucart et al., 2006; Hefer et al., 2015; Mizrachi et al., 2010, 2015; Novaes et al., 2008; Paux et al., 2004, 2005; Qiu et al., 2008; Rengel et al., 2009; Salazar et al., 2013; Shinya et al., 2014). These advances highlighted the pivotal role of transcriptional regulation during the radial, secondary growth of woody stems (Andersson-Gunnerås et al., 2003; Demura and Fukuda, 2007). However, none of the previous studies focused on the dynamics of the anatomical, chemical and transcriptomic changes that occur during the differentiation of highly contrasted TW and OW tissues from E. globulus.

In this chapter, the dynamics of TW and OW formation were assessed and compared using Eucalyptus globulus developing xylem (DX) samples formed after one to four weeks of bending stress. Biological variation was accounted for by using three different E. globulus genotypes. The anatomical and chemical changes were compared with those measured in fully developed xylem (FDX) tissues from mature E. globulus trees. Additionally, transcriptional changes in those contrasting DX tissues were examined by constructing and deep sequencing 12 Eucalyptus globulus mRNA libraries. This analysis identified genes differentially expressed in those contrasting DX tissues and unravelled those implicated in the transcriptional modulation of E. globulus xylogenesis. The gathering and analysis of these anatomical, chemical and transcriptomic data provide new insights into the biological processes, functions and cellular components that contribute most during such active periods of wood formation.


Objectives

To profile and compare the dynamics of anatomical and chemical changes which occur during the differentiation of Eucalyptus globulus tension and opposite wood tissues formed under different periods of gravitropism stress stimulus.

To profile and compare the dynamics of whole-transcriptome changes which occur during the differentiation of Eucalyptus globulus tension and opposite wood tissues formed under different periods of a gravitropism stress stimulus.

To identify genes differentially expressed between those contrasting developing xylem tissues and unravel those implicated in transcriptional modulation during Eucalyptus globulus xylogenesis.

To gather and analyse these anatomical, chemical and transcriptomic data sets for a better understanding of the most relevant contributive biological processes, functions and cellular components during such active periods of wood formation.


Materials and methods

Reaction wood induction and tissue sampling

Tension wood (TW) and opposite wood (OW) formation was induced by bending five-year-old E. globulus trees from three genotypes (GB3, GM2-58 and MB43) to an angle of roughly 45°, at Altri Florestal SA’s Experimental Research Station (Óbidos, Portugal) (Fig. 5.1). All developing xylem (DX) samples were collected on July 7th, 2010 from three trees (ramets) of certified clonal identity for each of the three genotypes. TW tissues were collected from the upper side and OW tissues from the lower side of trunks bent for one (TW1w, OW1w), two (TW2w, OW2w), three (TW3w, OW3w) and four (TW4w, OW4w) weeks. Control samples (Ctrl) were harvested from the trunks of straight trees at breast height. Sampling and conservation of DX tissues were performed as described before (Paux et al., 2004) (Fig. 5.1).

For comparison with developing xylem (DX), fully developed mature xylem (FDX) was sampled from nine of the trees used to collect the DX samples. FDX samples were collected by means of a powder-driven increment borer with a 12 mm inside diameter. Samples were collected at breast height (137 cm from the base of the stem).


Figure 5.1 - Design of a time course experiment of gravitropic stress induction and collection of developing xylem tissues from five-year-old Eucalyptus globulus trees. The upper scheme shows the timeline established for a time course bending stress experiment to induce the formation of tension wood and opposite wood in five-year-old Eucalyptus globulus trees. Trunks were bent at one-week intervals to allow the different developing xylem samples to be harvested on a single sampling day. The field work was carried out in 2010 at Altri Florestal SA’s Experimental Research Station (Óbidos, Portugal). The lower images show details of the delicate procedures used to collect developing xylem samples into liquid nitrogen in order to limit oxidation and degradation of the tissues.


Anatomical and chemical characterization of wood forming tissues

The dynamics of TW and OW induction and formation were studied by the histochemical characterization of wood disks (3-5 cm thickness) harvested from five-year-old GM2-58 trees submitted to different bending stress periods (one, two, three and four weeks). The zone corresponding to wood formation near the vascular cambium was sampled from wood disks on both the TW and OW sides and fixed in FAA (5% v/v formaldehyde; 50% ethanol, 5% acetic acid). For anatomical measurements, transverse microscopic sections approximately 20 µm thick were cut using a sliding microtome, stained with 1% (v/v) Astra blue and 1% (v/v) Safranin and mounted in Eukitt mounting medium. Additionally, specimens were sampled from both the TW and OW sides of the wood disks, macerated for two days in a 1:1 solution of 30% hydrogen peroxide and glacial acetic acid at 60 °C and then stained with Astra blue. Microscopic images were collected using a Leica DMLA microscope connected to a video camera (Leica DFC320). Anatomical measurements were made in the centre of the current annual ring in two samples of the same tree, using a Leica DMLA projection microscope and a Leitz ASM 68K semi-automatic image analyser. The number of vessels per mm² (VN) and the vessel tangential diameter (VTD) in transverse sections, and fibre dimensions in individualized cells, were measured according to Sultana et al. (2010). VN was determined by dividing the measured number of vessels by the total area. VTD was estimated in 60 vessels. Fibre length (FL), width (FW) and cell wall thickness (FWT) were measured in 40 fibre cells. Each measurement was performed twice. The number of required measurements was previously calculated to give a statistical significance level of 0.05. Two-factor ANOVA was performed to test for significant effects of the duration of bending stress (1, 2, 3 and 4 weeks of bending), the type of tissue (OW vs TW) and the interaction between these two factors.
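The two-factor design described above (tissue type and duration of bending, with their interaction) can be sketched as follows. This is a minimal illustration in Python for a balanced design, not the software actually used for the analyses, which is not named in this section; the function name, the data layout and the toy measurements are hypothetical. Only the F statistics are returned, since the Python standard library provides no F distribution from which to derive P-values.

```python
from statistics import mean


def two_way_anova_f(data):
    """F statistics for a balanced two-factor ANOVA with interaction.

    `data` maps (tissue, weeks) to a list of replicate measurements;
    every cell must hold the same number of replicates (assumption).
    """
    tissues = sorted({t for t, _ in data})
    weeks = sorted({w for _, w in data})
    n = len(next(iter(data.values())))  # replicates per cell

    grand = mean(v for cell in data.values() for v in cell)
    tissue_means = {t: mean(v for w in weeks for v in data[(t, w)]) for t in tissues}
    week_means = {w: mean(v for t in tissues for v in data[(t, w)]) for w in weeks}
    cell_means = {key: mean(values) for key, values in data.items()}

    # Sums of squares for each source of variation
    ss_tissue = n * len(weeks) * sum((m - grand) ** 2 for m in tissue_means.values())
    ss_time = n * len(tissues) * sum((m - grand) ** 2 for m in week_means.values())
    ss_inter = n * sum(
        (cell_means[(t, w)] - tissue_means[t] - week_means[w] + grand) ** 2
        for t in tissues for w in weeks
    )
    ss_error = sum((v - cell_means[k]) ** 2 for k, vs in data.items() for v in vs)

    # Degrees of freedom
    df_tissue, df_time = len(tissues) - 1, len(weeks) - 1
    df_inter = df_tissue * df_time
    df_error = len(tissues) * len(weeks) * (n - 1)

    ms_error = ss_error / df_error
    return {
        "tissue": ss_tissue / df_tissue / ms_error,
        "time": ss_time / df_time / ms_error,
        "interaction": ss_inter / df_inter / ms_error,
    }


# Hypothetical toy data: two tissues, two bending periods, two replicates each
toy = {
    ("TW", 1): [1.0, 3.0], ("TW", 2): [1.0, 3.0],
    ("OW", 1): [5.0, 7.0], ("OW", 2): [5.0, 7.0],
}
f_stats = two_way_anova_f(toy)
```

In the toy data only the tissue factor differs between cells, so the time and interaction F statistics are zero while the tissue F statistic is large.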

DX and FDX samples were collected and the extracted fraction was chemically characterized by analytical pyrolysis following the methodology described in Paiva et al. (2008a). Synthetic chemical variables representing the main chemical components of the cell wall were calculated from the pyrolysis products as previously described in Paiva et al. (2008b). Briefly:

• Toluene (aa,1), a pyrolysis product of phenylalanine (Moldoveanu, 1998a), which is a marker for protein content variation;

• Syringyl (S) and guaiacyl (G) units, the prevalent monomers of lignin in Eucalyptus, since p-hydroxyphenyl (H) units are only minor components (Rencoret et al., 2011). Hydroxyphenyl (H) pyrolysis products (phenol and cresol) mainly derive from tyrosine pyrolysis (Faix et al., 1991; Moldoveanu, 1998b). The S/G ratio was also estimated since it is an increasingly used selection criterion for Eucalyptus species suitable for pulping processes (Ohra-aho et al., 2013);

• Pentosans (cP), typical hemicellulose markers. Xylans are the most abundant macromolecular non-cellulosic components of Eucalyptus globulus wood (Evtuguin and Neto, 2004), typically covalently linked to lignin;

• Hexosans (cH), mainly of cellulose origin;

• cP/cH ratio, a rough estimate of the relative proportions of hemicelluloses and cellulose;

• Other carbohydrates (c), other than pentosans and hexosans, but mainly of hemicellulose origin.
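As an illustration of how the composite variables in the list above relate to the individual groups of pyrolysis products, the following minimal Python sketch derives G+S, S/G and cP/cH from group totals. The function name and the dictionary layout are hypothetical, and the input is assumed to hold relative abundances already summed per group. The example values are the DX means reported in Table 5.2 and serve only as an illustration; note that a ratio of means need not equal the mean of per-sample ratios.

```python
def synthetic_variables(groups):
    """Derive composite cell wall variables from summed pyrolysis groups.

    `groups` maps group labels ("G", "S", "cP", "cH") to summed relative
    abundances; this layout is an assumption made for the example.
    """
    g, s = groups["G"], groups["S"]
    cp, ch = groups["cP"], groups["cH"]
    return {
        "G+S": g + s,      # proxy for lignin content
        "S/G": s / g,      # syringyl-to-guaiacyl ratio
        "cP/cH": cp / ch,  # rough hemicellulose-to-cellulose ratio
    }


# DX mean values taken from Table 5.2, used here for illustration only
example = synthetic_variables({"G": 6.76, "S": 7.66, "cP": 7.60, "cH": 22.00})
```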

Two-sample t-tests assuming unequal variances were applied to test the significance of the differences in chemical composition between DX and FDX. Two-factor ANOVA (type of tissue and duration of bending) was used to analyse the effects of bending time and tissue type on cell wall chemical characteristics. Principal component analysis (PCA) was performed on the anatomical and chemical characterization data using the R package ADE4TKGUI (Thioulouse and Dray, 2007).
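The unequal-variance (Welch) two-sample t-test mentioned above can be sketched from summary statistics alone. The snippet below is a minimal Python illustration, not the software used for the analyses; the function name is hypothetical. It computes the t statistic and the Welch-Satterthwaite degrees of freedom, and is applied to the hexosan (cH) means, standard deviations and sample sizes reported for DX and FDX in Table 5.2. No P-value is computed because the Python standard library lacks a Student t distribution.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and degrees of freedom from summary statistics."""
    v1 = sd1 ** 2 / n1  # squared standard error of the first mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hexosans (cH): DX 22.00 ± 2.78 (n=60) versus FDX 27.82 ± 1.79 (n=11), Table 5.2
t_stat, df = welch_t(22.00, 2.78, 60, 27.82, 1.79, 11)
```

A large negative t statistic here reflects the lower hexosan content of DX relative to FDX.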

RNA extraction and mRNA libraries preparation and sequencing

Total RNA was isolated from approximately 100 mg of biological material per sample following the method developed by Carvalho et al. (2015).

For the construction of messenger RNA (mRNA) libraries, four equimolar pools of total RNA were produced for each of the three genotypes. Pools were produced by considering two tissue types (TW and OW) and two bending periods (“late”, pooling the “3 weeks” and “4 weeks” samples, and “early”, the “1 week” samples). Each tissue-genotype pool consisted of three RNA samples extracted from three ramets of each genotype. Ten micrograms of each RNA pool were sent to the Genotoul platform at the Génopole Toulouse Midi-Pyrénées (Toulouse, France; http://www.genotoul.fr/) for mRNA library preparation and sequencing following internal procedures. The TRUSEQ™ SBSv5 sequencing kit (Illumina) was used for 100 bp paired-end library sequencing on the Illumina Hi-Seq 2000 instrument.
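The pooling scheme described above can be enumerated to confirm the number of libraries it yields. The following minimal Python sketch uses the genotype, tissue and period labels from the text; the dictionary layout and variable names are hypothetical.

```python
from itertools import product

genotypes = ["GB3", "GM2-58", "MB43"]
tissues = ["TW", "OW"]
# Bending periods merged into each pool, as described in the text
periods = {"early": ["1w"], "late": ["3w", "4w"]}

# One library per genotype x tissue x response period
libraries = [
    {"genotype": g, "tissue": t, "period": p, "weeks": periods[p]}
    for g, t, p in product(genotypes, tissues, periods)
]

pools_per_genotype = sum(1 for lib in libraries if lib["genotype"] == "GB3")
```

With three genotypes, two tissue types and two response periods, the design gives four pools per genotype and 12 libraries in total.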

Sequence read quality control and alignment to E. grandis reference genome

CASAVA v1.8, the quality control pipeline from Illumina, was used to produce fastq format sequences (Fig. 5.4). The effectiveness of sequencing adapter trimming was checked by cross-matching (Smith-Waterman-based local alignment) reads against the sequences of the adapters used. The surviving high-quality reads were aligned to the E. grandis reference genome (v1.0 assembly) using BOWTIE2 (Langmead et al., 2012) for genome index construction, followed by splice-aware read alignment performed with TOPHAT 2.0.6 (Trapnell et al., 2012) with no annotation reference, considering the three genotypes (GB3, MB43 and GM2-58) as biological replicates.

Identification of differentially expressed transcripts was conducted with CUFFLINKS (Trapnell et al., 2012). CUFFCOMPARE was then used to merge the E. grandis genome annotation with the four transcriptome annotations generated by TOPHAT into the final transcriptome annotation. Significant changes in transcript expression (fragments per kilobase of coding sequence per million mapped fragments, or FPKM) were identified using CUFFDIFF. Post-analysis of the CUFFLINKS output was done with the CUMMERBUND R package (Goff et al., 2013).
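The FPKM unit used above can be illustrated with its definitional formula. The sketch below is a simplified Python example with hypothetical counts; CUFFLINKS itself estimates abundances with a statistical model that handles isoforms and fragment length distributions, so this function shows only what the unit means, not how the software derives it.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    fragments: fragments assigned to the transcript
    length_bp: transcript length in base pairs
    total_mapped_fragments: all mapped fragments in the library
    """
    kilobases = length_bp / 1e3
    millions = total_mapped_fragments / 1e6
    return fragments / (kilobases * millions)


# Hypothetical transcript: 500 fragments, 2,000 bp, 20 million mapped fragments
value = fpkm(500, 2000, 20_000_000)
```

Normalizing by both transcript length and library size is what makes FPKM values comparable across transcripts and across libraries of different sequencing depth.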

Transcript abundance was assessed by RT-qPCR for nine genes selected to confirm the expression trends revealed by the RNAseq data analysis. Expression of these genes was profiled using developing xylem samples from TW and OW tissues collected from two genotypes (GB3 and MB43) and generated after one week or after three and four weeks of bending stress. Three biological replicates were used for each genotype. This work was performed by microfluidic qPCR using a BioMark® 96.96 Dynamic Array platform (Fluidigm, South San Francisco, California, USA) as detailed in Cassan-Wang et al. (2012). Gene-specific primer pairs (Table S5.1a) were designed using the QUANTPRIME program (Arvidsson et al., 2008). A dissociation step was performed after amplification to confirm the presence of a single amplicon. Primer efficiency was estimated using LINREG (Ruijter et al., 2009). Three reference genes identified in the same tissue panels by Cassan-Wang et al. (2012) were used for data normalization: PP2A1 (Eucgr.B03386), PP2A3 (Eucgr.B03031) and SAND (Eucgr.B02502). Efficiency correction and data normalization were performed using the GENEX 6.0.1.612 program (MultiD Analyses AB, Göteborg, Sweden). The consistency of the results provided by the biological replicates was evaluated. Finally, the expression profiles of these nine genes (expression relative to the mean) were compared to the profiles revealed by RNAseq.
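The normalization steps described above can be sketched as follows. The internal calculations of GENEX are not detailed in this section, so the Python snippet below is only a minimal illustration of a common approach: efficiency-corrected quantities are normalized by the geometric mean of the reference genes and then expressed relative to the mean across samples. Function names and Cq values are hypothetical.

```python
import math


def relative_quantity(cq, efficiency):
    """Efficiency-corrected quantity; efficiency is the per-cycle
    amplification factor (2.0 corresponds to 100% efficiency)."""
    return efficiency ** (-cq)


def normalized_expression(target, references):
    """Normalize a target gene by the geometric mean of reference genes.

    `target` and each item of `references` are (Cq, efficiency) tuples.
    """
    ref_quantities = [relative_quantity(cq, eff) for cq, eff in references]
    log_mean = sum(math.log(q) for q in ref_quantities) / len(ref_quantities)
    return relative_quantity(*target) / math.exp(log_mean)


def relative_to_mean(values):
    """Express each value relative to the mean of all values."""
    average = sum(values) / len(values)
    return [v / average for v in values]


# Hypothetical example: a target at Cq 20 and three reference genes at Cq 22
expression = normalized_expression((20.0, 2.0), [(22.0, 2.0)] * 3)
profile = relative_to_mean([1.0, 2.0, 3.0])
```

Using the geometric rather than the arithmetic mean of the reference genes limits the influence of a single highly abundant reference transcript on the normalization factor.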

Functional classification of differentially expressed genes and miRNA target genes

The BLAST2GO program (Conesa et al., 2005) was used for the functional classification of genes differentially expressed in TW or in OW. The two groups of genes were defined based on their over-expression in TW or OW tissues. In order to determine whether the two gene lists were enriched for specific GO terms or Pfam domains, enrichment tests were used to compare each set against a reference set, the GO-SLIM functional classification of the Eucalyptus grandis transcriptome. Fisher’s exact test as implemented in BLAST2GO was used to compute the enrichment P-value for each GO term. MapMan (Thimm et al., 2004) was used to categorize the differentially expressed genes into distinct functional classes (bins).
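The enrichment test described above can be illustrated with a one-sided Fisher's exact test, which is equivalent to the upper tail of a hypergeometric distribution. The following Python sketch is a minimal illustration and not the BLAST2GO implementation, whose options (sidedness, multiple-testing correction) are not detailed here; the function name and the counts are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided Fisher's exact test P-value for over-representation.

    N: genes in the reference set
    K: reference genes annotated with the GO term
    n: genes in the test list
    k: test-list genes annotated with the GO term
    Returns P(X >= k) under the hypergeometric distribution.
    """
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts: 4 of 20 listed genes carry a term found in 50 of 1,000 genes
p_value = enrichment_p_value(4, 20, 50, 1000)
```

A small P-value indicates that the term occurs in the gene list more often than expected from its frequency in the reference set.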


Results

Phenotypic analysis of wood-forming tissues during tension wood induction kinetics

Anatomical and morphological characterization

Image analysis of histochemically characterized cross sections of Eucalyptus globulus wood tissues showed a lower occurrence of lignified cell walls in TW compared with OW (Fig. 5.2).

Figure 5.2 – Histochemical characterization of Eucalyptus globulus tension wood (B, C & D) and opposite wood (A) tissues. Tissue samples were obtained from wood disks (3-5mm thickness) harvested from 5-yr old E. globulus trees under different bending periods (1, 2, 3 and 4 weeks). Samples were fixed in FAA (5% v/v formaldehyde; 50% ethanol, 5% acetic acid) and stained with 1% (v/v) Astra blue (stains cellulose-enriched walls in blue) and 1% (v/v) Safranin (stains lignified cell walls in red). The vascular cambium (not displayed) was positioned below in the photographs.


This reduction was more pronounced after three and four weeks of bending. Although a few fibres showed gelatinous cell walls after one week of bending, the differentiation of TW was only evident after the second week (Fig. 5.2). At that point, modified cell walls occurred mostly farther from the cambium region than after three and four weeks of bending.

Larger vessels were found in TW relative to OW during the entire kinetics of TW/OW formation (Table 5.1; Table S5.2). Larger vessel tangential diameter (VTD) values were scored in both TW and OW tissues after two and three weeks of bending, but a decrease was noticed after four weeks of bending. A lower vessel number (VN) was observed in TW tissues than in OW tissues formed after the first and the second week of bending, but this tendency was inverted in later responses. VN decreased consistently along the kinetics of OW formation. In contrast, VN tended to increase in TW tissues formed after longer bending periods (Table 5.1). Fibre cells were shorter, narrower and had thicker cell walls in tension wood relative to opposite wood (Table 5.1).

Statistically significant differences in these anatomical characteristics were mostly observed between TW and OW tissues formed after one week of bending (Table 5.1; Table S5.2). Among TW time period samples, statistically significant differences were mostly observed between tissues formed after the first and the second weeks of bending (Table 5.1; Table S5.2). Among OW time period samples, statistically significant differences were observed between the first week and the following weeks of bending (Table 5.1; Table S5.2).


Table 5.1 - Comparative anatomical characterization of Eucalyptus globulus tension wood (TW) and opposite wood (OW) tissues sampled from developing xylem (DX). x̄ = average value, sd = standard deviation. Statistically significant differences between tissues were estimated using a two-factor ANOVA. Statistical significance is indicated as follows: ns P>0.05; * P<0.05; ** P<0.01; *** P<0.001; **** P<0.0001.

| Anatomical characteristic | Tissue | Tissue type mean (x̄ ± sd) | 1 week of bending (x̄ ± sd) | 2 weeks | 3 weeks | 4 weeks | P-value (tissue type) | P-value (time) | P-value (tissue × time) |
|---|---|---|---|---|---|---|---|---|---|
| VTD - Vessel tangential diameter (µm) | TW | 127.5±33.9 | 123.1±26.5 | 137.7±26.2 | 132.2±31.3 | 116.3±34.5 | **** | **** | **** |
| | OW | 116.8±35.1 | 87.8±25.8 | 132.7±30.9 | 132.0±32.8 | 114.8±30.0 | | | |
| VN - nº vessel/mm² | TW | 5.2±1.6 | 5.1±0.6 | 4.2±0.8 | 5.3±0.5 | 6.2±2.6 | ns | ns | ** |
| | OW | 5.6±1.2 | 7.2±0.2 | 6.1±0.3 | 5.0±0.7 | 4.3±0.5 | | | |
| FL - Fibre length (µm) | TW | 1039±165 | 1015±189 | 1069±162 | 1055±152 | 1020±147 | * | * | ns |
| | OW | 1104±523 | 1134±155 | 1217±1000 | 1035±165 | 1027±1343 | | | |
| FW - Fibre width (µm) | TW | 16.3±2.1 | 17.0±2.3 | 16.2±1.8 | 15.7±2.2 | 16.2±2.0 | * | **** | ** |
| | OW | 17.2±2.3 | 16.8±2.0 | 17.4±2.5 | 16.9±2.0 | 17.5±2.6 | | | |
| FWT - Fibre wall thickness (µm) | TW | 4.3±0.7 | 4.4±0.8 | 4.1±0.7 | 4.3±0.8 | 4.2±0.7 | * | **** | * |
| | OW | 4.1±0.8 | 3.9±0.9 | 4.3±0.7 | 4.2±0.7 | 3.9±0.9 | | | |

Chemical characterization

To our knowledge, this is the first time that analytical pyrolysis has been used to characterize the cell wall composition of extractive-free DX samples in Eucalyptus. To assess the particularities of these DX tissues, pyrograms of DX tissues were initially compared with those generated for FDX tissues (Table 5.2). The pyrograms provided 72 pyrolysis products for DX tissues and 71 pyrolysis products for FDX tissues (Table S5.3a). The main difference between pyrograms obtained from FDX and DX tissues was the absence in FDX tissues of a peak of protein origin (“aa,1”, i.e. toluene) (Fig. S5.1; Table S5.3a), a peak that was also found in DX tissues from maritime pine (Paiva et al., 2008b). In DX samples, the “H” products exhibited the same profile as the phenylalanine pyrolysis product (aa,1), confirming their protein origin. Thus, the “H” products were not used to quantify the variation of lignin content, as this could be biased by the presence of protein pyrolysis products. Instead, the sum of S- and G-units (G+S) was used as a reliable proxy for lignin content. Highly significant differences between DX and FDX samples were observed for all main cell wall components. DX contained 40% less lignin (G+S), had a 44% lower S/G ratio and contained 21% less hexosans (cH) than FDX, but 14% more pentosans (cP) and 32% more other carbohydrates (c).

Table 5.2 - Comparative chemical characterization of Eucalyptus globulus wood tissues from developing xylem (DX) and from fully developed xylem (FDX). Synthetic variables represent the main components of cell walls. x̄ = average value, sd = standard deviation. Statistically significant differences between averages were analysed using a two-tailed Student's t-test assuming unequal variances: **** means P<0.0001.

| Synthetic variable | DX (n=60) (x̄ ± sd) | FDX (n=11) (x̄ ± sd) | DX/FDX | t-test (p-value) |
|---|---|---|---|---|
| Hexosans (cH) | 22.00±2.78 | 27.82±1.79 | 0.79 | **** |
| Pentosans (cP) | 7.60±1.85 | 6.64±0.24 | 1.14 | *** |
| Other carbohydrates (c) | 49.73±3.68 | 37.65±1.714 | 1.32 | **** |
| Amino acids (aa,1) | 2.02±0.54 | 0.00±0.00 | - | - |
| Guaiacyl units (G) | 6.76±1.42 | 7.94±0.35 | 0.85 | **** |
| Syringyl units (S) | 7.66±1.74 | 16.27±0.52 | 0.47 | **** |
| G+S | 14.42±2.95 | 24.21±0.80 | 0.60 | **** |
| S/G | 1.14±0.21 | 2.05±0.07 | 0.56* | **** |
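The percentage differences between DX and FDX quoted in the text follow directly from the DX/FDX ratios of the means in Table 5.2. The short Python sketch below reproduces that arithmetic; the variable and function names are hypothetical, and only the means from the table are used.

```python
# (DX mean, FDX mean) for each synthetic variable, taken from Table 5.2
means = {
    "cH": (22.00, 27.82),
    "cP": (7.60, 6.64),
    "c": (49.73, 37.65),
    "G+S": (14.42, 24.21),
    "S/G": (1.14, 2.05),
}


def percent_change(dx, fdx):
    """Percentage difference of DX relative to FDX (negative means less in DX)."""
    return (dx / fdx - 1.0) * 100.0


changes = {name: round(percent_change(dx, fdx)) for name, (dx, fdx) in means.items()}
```

The rounded values correspond to 21% less hexosans, 14% more pentosans, 32% more other carbohydrates, 40% less lignin (G+S) and a 44% lower S/G ratio in DX than in FDX.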

Chemical variation of TW and OW forming xylem samples during bending kinetics

To study the effects of bending time and tissue type on cell wall chemical composition, pyrolysis was carried out on wood-forming tissue collected from the upper and lower sides of bent trees over a bending period of four weeks (Table 5.3).


Table 5.3 - Chemical characterization (main groups of pyrolysis products) of Eucalyptus globulus tension wood (TW) and opposite wood (OW) tissues along the tension wood induction kinetics. x̄ = average value, sd = standard deviation. Statistically significant differences between type of tissue, duration of bending and the interaction between these two factors were tested by two-factor ANOVA. Statistical significance is indicated as follows: ns P>0.05; * P<0.05; ** P<0.01; *** P<0.001; **** P<0.0001. Tukey's multiple comparison tests between bending times were done for each type of tissue: the same letter means no significant difference between bending times for one tissue type (P<0.05).

| Chemical characteristic | Tissue | Tissue type mean (x̄ ± sd) | 1 week of bending (x̄ ± sd) | 2 weeks | 3 weeks | 4 weeks | P-value (tissue type) | P-value (time) | P-value (tissue × time) |
|---|---|---|---|---|---|---|---|---|---|
| Guaiacyl units (G) | TW | 6.3±0.3 | 6.0±0.8a | 6.1±0.6a | 6.4±2.0a | 6.6±0.7a | **** | **** | *** |
| | OW | 7.6±1.6 | 5.9±0.7a | 7.2±0.7bb | 7.9±0.2b | 9.6±0.9c | | | |
| Syringyl units (S) | TW | 7.0±0.8 | 8.0±0.1.2a | 6.0±1.4b | 6.9±3.0b | 6.9±1.1b | ns | **** | ns |
| | OW | 8.5±1.2 | 7.6±1.4a | 8.0±0.8a,b | 8.0±0.8b | 10.2±1.3b | | | |
| Lignin (G + S) | TW | 13.2±0.8 | 14.0±1.8a | 12.2±1.8a | 13.3±4.8a | 13.5±1.7a | **** | ** | ** |
| | OW | 16.1±2.7 | 13.5±1.6a | 15.2±1.3a,b | 15.9±0.8a,b | 19.9±2.2c | | | |
| S/G | TW | 1.2±0.2 | 1.3±0.14a | 1.0±0.2b | 1.1±0.2b | 1.0±0.1b | ns | **** | ns |
| | OW | 1.1±0.1 | 1.3±0.3a | 1.1±0.1a,b | 1.0±0.1b | 1.1±0.1b | | | |
| Pentosans (cP) | TW | 7.2±1.2 | 7.9±1.9a | 7.9±1.7a | 5.4±2.3a | 7.4±1.4a | ns | ** | ns |
| | OW | 7.9±0.9 | 9.1±1.6a | 8.2±1.7a,b | 7.0±1.5b | 7.2±1.2a,b | | | |
| Hexosans (cH) | TW | 23.2±0.3 | 23.2±3.2a | 23.4±2.8a | 23.2±1.2a | 22.7±1.9a | **** | * | * |
| | OW | 20.3±2.1 | 23.4±2.3b | 19.7±1.5a | 19.0±0.4a | 19.1±1.9a | | | |
| Other carbohydrates (c) | TW | 50.1±3.9 | 49.2±4.9a | 50.6±4.2a | 50.8±4.6a | 49.8±3.1a | ns | ns | ns |
| | OW | 49.4±2.0 | 48.4±3.3a | 50.7±2.5a | 51.4±1.2a | 47.0±3.8a | | | |
| Protein content (aa1) | TW | 2.0±0.4 | 1.6±0.5a | 1.8±0.3a,b | 2.5±0.6b | 2.1±0.6a,b | ns | *** | ns |
| | OW | 2.1±0.2 | 1.8±0.5a | 2.1±0.7a,b | 2.3±0.2b | 2.3±0.3a,b | | | |

Significant differences in total lignin content (G+S) and G-units were observed between OW and TW (22.0% more lignin and 20% more G-units in OW than in TW). These differences resulted from the significant increase of G- and S-units in OW developing tissues over the course of the bending stress and a significant reduction of S-units in TW. Indeed, starting from similar G- and S-unit contents after one week of stress, the OW samples showed 62.7% more G-units and 47.5% more S-units by the fourth week, whereas in TW developing xylem a significant reduction of S-units was observed between the first and the second week of bending, with no significant variation in G-unit content along the bending kinetics.

The content of hexosans (cH) was significantly higher (by 14.2%) in TW than in OW developing xylem. In TW, the hexosan content remained constant along the bending stress, whereas in OW a significant reduction was observed between the first and the fourth week of bending. For pentosans, other carbohydrates and protein content (aa1), no significant differences were found between TW and OW, although significant differences were observed in the pentosan content of OW after the first week of bending, and aa1 content increased in both OW and TW developing tissues along the bending stress kinetics.

Integration of anatomical and chemical dynamics along the kinetics of TW and OW formation

Principal component analysis (PCA) (Fig. 5.3; Table S5.3b) was used to summarize the relationship between tissues and the different chemical and anatomical characteristics. The first two components of the PCA (PC1 x PC2) explained 66.7% of the variation in the DX samples, with PC1 accounting for 35.4%. The correlation circle (Fig. 5.3a) shows that the PC1 axis correlated positively with lignin (G, S and G+S) content and negatively with cellulose content (cH) and fibre wall thickness (FWT). The PC2 axis was negatively correlated with hemicellulose (cP) content, S/G ratio (S/G), fibre length (FL), fibre width (FW) and vessel number (VN), and positively correlated with proteins (aa,1 and H pyrolysis products), carbohydrate content (c) and vessel tangential diameter (VTD) (see also Table S5.2).

To better visualize the progression of the modifications of cell wall composition during TW and OW formation, samples were connected by successive bending periods (Fig. 5.3b). Distinct paths for the two types of tissue were clearly observed. With increasing bending periods, the OW samples showed an enrichment in lignin content and important reductions of “cH” and “cP”. In contrast, little variation in lignin content was observed in TW as compared to OW with increasing bending times. Strong variations in the contents of proteins and polysaccharides (cP and c) were also observed along the TW induction. Some anatomical characteristics also showed interesting variation: for instance, in OW samples fibres became increasingly wider (FW) and presented decreased wall thickness (FWT). Also, along TW formation we observed a reduction in the number of vessels (VN) but an increase in vessel tangential diameter (VTD).

Coding transcriptome dynamics in tension wood formation

Identification of genes differentially expressed in tension wood formation dynamics

Global transcriptome modifications during the induction of tension wood were assessed by generating and sequencing 12 RNA-Seq libraries of DX tissues collected from the upper and lower sides of trunks bent for different periods of time, for three genotypes. To analyse the effects of bending time and tissue type, samples collected after one week of stress (early response) were pooled by tissue type, while samples collected 3 and 4 weeks after bending (late response) were pooled together, also by tissue type. A total of 7.6 × 10⁸ raw read pairs were obtained for the 12 libraries. After trimming and checking for quality and for contaminants from other genomes, a total of 3.68 × 10⁸ read pairs were properly mapped against the E. grandis genome v1.1 (Fig. 5.4; Table S5.4). A total of 51,857 potential transcript loci were identified, of which 37,025 matched the genomic coordinates of one of the reference gene models (Fig. 5.4; Table S5.5). The identification of the new loci could be in part attributed to the fact that the E. grandis reference genome annotation was not used to guide the identification of loci by TOPHAT.
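The read and locus counts reported above can be summarized as simple proportions. The following Python lines are a minimal arithmetic sketch based only on the figures given in the text; the variable names are hypothetical.

```python
raw_pairs = 7.6e8        # raw read pairs across the 12 libraries
mapped_pairs = 3.68e8    # read pairs properly mapped to the E. grandis genome
total_loci = 51_857      # potential transcript loci identified
matched_loci = 37_025    # loci matching a reference gene model

mapped_fraction = mapped_pairs / raw_pairs     # share of raw pairs retained and mapped
matched_fraction = matched_loci / total_loci   # share of loci overlapping known genes
novel_loci = total_loci - matched_loci         # loci without a matching gene model
```

Roughly half of the raw read pairs were retained and mapped, and about 71% of the identified loci corresponded to annotated gene models, leaving 14,832 loci without a matching reference gene model.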


Figure 5.3 - Principal component analysis (PCA) of chemical (synthetic pyrolysis products) and anatomical characteristics quantified in Eucalyptus globulus tension wood (TW) and opposite wood (OW) tissues. (a) Correlation circle showing the position of nine selected composite pyrolysis products and five anatomical characteristics on the PC1–PC2 plane. Pyrolysis products sourced from proteins are represented in green, from lignins in red and from sugars in blue. Protein origin: aa1, toluene (amino acid content) and H, p-hydroxyphenyl (mainly from proteins). Lignin origin: G, guaiacyl- and S, syringyl- units; G+S (sum of G and S). Polysaccharide origin with cellulose as main contributor: cH (hexosans). Hemicellulose origin: cP (pentosans); c (other carbohydrates). The ratios S/G and cP/cH were also considered and displayed. Anatomical characteristics are represented in rose: VN, number of vessels per mm²; VTD, vessel tangential diameter; FL, fibre length; FW, fibre width and FWT, cell wall thickness. (b) Main plane (PC1–PC2) with the positional display of the differentiating xylem samples from the tension and opposite wood kinetics (one to four weeks of bending: 1w-4w). PCA was performed using the ade4 package for R (Chessel et al., 2004).

Figure 5.4 – Workflow to identify differentially expressed loci in developing xylem tissues from Eucalyptus globulus.

Transcription analysis revealed 93 differentially expressed loci (q<0.05) (Table S5.5). Based on their genomic coordinates, 84 of the 93 loci were successfully assigned to a single predicted gene from the E. grandis genome (v1.0 assembly, v1.1 annotation). For four loci it was not possible to discriminate which of the genes located in a tandem array was being targeted, which led us to consider each such array a single entity. The genomic location of the five remaining loci did not match the coordinates of any of the 36,376 predicted genes (Table 5.4). However, BLASTn searches comparing the sequences corresponding to these five loci against the E. grandis genome (v1.0 assembly, v1.1 annotation) allowed us to associate XLOC_048375 with Eucgr.H04603, a predicted gene encoding a member of the SAUR-like auxin-responsive protein family, overexpressed in TW during both the early and late responses to bending. This gene is no longer included in the currently available version of the E. grandis reference genome, since it was one of the 8,620 low-confidence gene models predicted in the original v1.0 annotation and later removed from v1.1. Additionally, BLAST2GO (Conesa et al., 2005) and PlantCAZYmes (Ekstrom et al., 2014) were used to annotate these new loci (Table S5.6). Among these it was possible to identify one pectinesterase-like locus (XLOC_050971), preferentially overexpressed in TW during the late response; a heavy metal transport/detoxification superfamily protein (XLOC_008700), overexpressed in early and late TW; and a glucan endo-1,3-beta-glucosidase (XLOC_002488), also overexpressed in early TW. For XLOC_018314, significantly highly expressed in TW during the early and late responses to bending, no significant homologies were found. Among the differentially expressed genes, 11 were also reported by Mizrachi et al. (2015), including heat shock proteins (Eucgr.C00678; Eucgr.E00653), the hormone-related 1-aminocyclopropane-1-carboxylate oxidase (Eucgr.D01368), a KNOTTED-like homeobox transcription factor (Eucgr.D01935), a glycogenin-like starch initiation protein (Eucgr.F00232), an IQ-domain calmodulin-binding protein (Eucgr.F01203), a laccase (Eucgr.G03028), an S-adenosylmethionine synthase (SAMS; Eucgr.B00946), a leucine-rich repeat kinase (LRR; Eucgr.F02727), one gene of unknown function (Eucgr.B02741) and the tension wood marker gene fasciclin-like arabinogalactan protein (FLA; Eucgr.J00938).
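The q<0.05 threshold applied to the differentially expressed loci refers to P-values adjusted for multiple testing. As a minimal sketch, and assuming the Benjamini-Hochberg false discovery rate procedure (the correction generally associated with CUFFDIFF q-values; the exact settings used are not stated in this section), the adjustment can be written in Python as follows. The function name and the P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values (q-values), in input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonic q-values
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values


# Hypothetical P-values for four loci
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [value < 0.05 for value in q]
```

Each q-value estimates the proportion of false positives expected among all loci called significant at that threshold, which is why it is preferred to raw P-values when thousands of loci are tested at once.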

The 93 differentially expressed loci were mapped onto the “MapMan pathway analysis overview” for functional classification (Thimm et al., 2004). The loci were classified into 17 of the 36 MapMan main functional categories (bins) (Table 5.4). The most represented functional bins were “protein” (10 data points), “cell wall” (9), “stress” and “miscellaneous” (8 each), “RNA” (7), and “cell organization” and “signaling” (5 each).

Tissue type (TW or OW) was the foremost factor contributing to differential transcript accumulation (Fig. 5.5; Fig. S5.2). In total, 31 and 53 genes were differentially expressed between tension and opposite wood during the early and late response to bending stress, respectively. In contrast, only 10 and 6 genes were identified as up-regulated in OW during the early and late response to bending, respectively. The genes up-regulated in OW during the early response to bending included three proteins of unknown function (among them Eucgr.H01072), two EXORDIUM like 2 genes in tandem (Eucgr.G00850, Eucgr.G00851), heat shock protein 90.1 (Eucgr.E00653), glycolipid transfer protein 2 (Eucgr.F02730), one basic chitinase (Eucgr.I01495) and a CCCH-type zinc finger family protein (Eucgr.C00948). During the late response, senescence-associated gene 12 (Eucgr.D00496), a glycosyl hydrolases family 32 protein (Eucgr.D02029), alpha-vacuolar processing enzyme (Eucgr.J03189), two bifunctional inhibitor/lipid-transfer protein/seed storage 2S albumin superfamily proteins (Eucgr.B02633; Eucgr.K03041) and laccase 14 (Eucgr.E04221) were identified as overexpressed in OW. Fifteen genes were among the most overexpressed genes in TW during both the early and late response, with fold changes on average 2.57 times greater in the late response than in the early response. Five genes preferentially expressed in TW were identified: an ethylene-forming enzyme (Eucgr.D01368), the newly annotated locus XLOC_048375, a FASCICLIN-like arabinogalactan-protein (Eucgr.J00938), a member of the Exostosin family protein (Eucgr.H00343) and a gene of unknown function (Eucgr.J00937).


The number of transcripts differentially expressed in the same tissue type at different bending times was far lower: only one and ten transcripts were differentially expressed between the early and late response for OW and TW, respectively. For OW, only a senescence-associated gene (Eucgr.D00506) showed a statistically significant fold change between the early and late response (a 137.9-fold difference). For TW, eight genes showed statistically significantly higher expression during the early response to bending. These comprise defense- or transport-related genes, such as an alpha-vacuolar processing enzyme (Eucgr.J03189); stress response genes, such as a plasma membrane intrinsic protein (Eucgr.K02993), pathogenesis-related 4 (Eucgr.L03258), a NAD(P)-linked oxidoreductase superfamily protein (Eucgr.B02637) and members of the abscisic acid signal transduction pathway (MLP-like protein 423; Eucgr.H04007, Eucgr.H04010); one expansin (Eucgr.G03134); and the new locus XLOC_002488, a homologue of a beta-1,3-glucanase. Only two genes showed a significant increase in the late response for TW: a protein of unknown function (Eucgr.K02384) and a 17.6 kDa class II heat shock protein (Eucgr.C00678).

Based on the transcriptomic profile distance of the differentially expressed genes (Fig. 5.5), samples were grouped into two clusters according to wood type: the cluster “tension wood” (cluster-TW) grouped the samples collected on the tension wood-forming side (TW1w and TW34w), while the cluster “opposite wood” (cluster-OW) grouped the samples collected on the opposite side (OW1w and OW34w) of the bent trunk segments. This result suggests that effective transcriptomic reprogramming occurs during bending stress on both sides of the bent tree, independently of the duration of the stress. In accordance with this result, the differentially expressed genes could be grouped into two main gene clusters based on their preferential expression in TW or OW developing xylem (Fig. 5.5-5.6; details in Table 5.4).

Figure 5.5 – Hierarchical clustering of Eucalyptus globulus tension wood (TW) and opposite wood (OW) differentiating xylem samples. Samples were collected after 1 week (1w) or 3 and 4 weeks (34w) of bending stress and clustered according to their transcriptional Euclidean distance using UPGMA. The hierarchical clustering pattern was generated with EXPANDER 6.5 (Ulitsky et al., 2010) on standardized data (mean=0, standard deviation=1).
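As an illustration of the clustering approach named in the caption (standardization of each gene to mean 0 and standard deviation 1, Euclidean distance between samples, average-linkage agglomeration), the following minimal sketch uses only four of the 93 loci, with FPKM values copied from Table 5.4. It is not the EXPANDER implementation; the function names are hypothetical, and the use of the sample standard deviation is an assumption.

```python
from itertools import combinations
from math import dist
from statistics import mean, stdev

samples = ["OW1w", "OW34w", "TW1w", "TW34w"]

# FPKM values copied from Table 5.4 (four of the 93 loci, same sample order).
fpkm = {
    "Eucgr.J00938": [8.7, 3.3, 1282.5, 3002.8],
    "Eucgr.D01368": [8.3, 13.5, 647.1, 1399.4],
    "Eucgr.G00850": [97.5, 79.0, 4.9, 23.3],
    "Eucgr.I01495": [803.5, 373.9, 78.5, 121.8],
}


def standardize(values):
    """Scale one gene to mean 0 and standard deviation 1 (sample SD assumed)."""
    centre, spread = mean(values), stdev(values)
    return [(value - centre) / spread for value in values]


z_scores = [standardize(values) for values in fpkm.values()]

# One profile per sample: the standardized expression of every gene.
profiles = {
    sample: [row[idx] for row in z_scores] for idx, sample in enumerate(samples)
}


def average_linkage(cluster_a, cluster_b):
    """Mean Euclidean distance over all pairs of samples from two clusters."""
    return mean(dist(profiles[a], profiles[b]) for a in cluster_a for b in cluster_b)


def agglomerate(names, n_clusters=2):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [frozenset([name]) for name in names]
    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2), key=lambda pair: average_linkage(*pair))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters


sample_clusters = agglomerate(samples)
print([sorted(cluster) for cluster in sample_clusters])
```

With these four loci the two OW samples are merged first and the two TW samples second, which mirrors the grouping by wood type described in the text; the full analysis used all differentially expressed loci.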

[Figure 5.6, panels: a) TW gene cluster; b) OW gene cluster. Vertical axis: relative expression (-2 to 2); horizontal axis: time of bending (samples OW1w, OW34w, TW1w and TW34w).]

Figure 5.6 – Expression profile of genes up-regulated in tension wood (a) and opposite wood (b) of Eucalyptus globulus along the bending stress time course. Clusters were generated after a hierarchical clustering analysis using EXPANDER 6.5 on standardized data (mean=0, standard deviation=1). Box boundaries represent the 25th and 75th percentiles; the line inside the box represents the mean value; the lines connected to the boxes by vertical bars represent the range of extreme values of the standardized data within each level.


Table 5.4 - Transcriptome sequencing data (FPKM values) for the 93 differentially expressed loci in Eucalyptus globulus tension wood (TW) and opposite wood (OW). Comparisons: tension wood (TW) versus opposite wood (OW) formed after different trunk bending stress periods: 1 week (1w) and 3 and 4 weeks (34w); ● indicates a statistically significant differential expression (q-value<0.05) detected for this gene in the indicated comparison between developing xylem tissues.

Columns: E. grandis locus | Gene annotation | Expression (FPKM): OW1w | OW34w | TW1w | TW34w | Cluster | Differential expression (q-value < 0.05)
Differential expression comparisons, in column order: OW1w vs TW1w; OW1w vs OW34w; TW1w vs OW34w; OW1w vs TW34w; TW1w vs TW34w; OW34w vs TW34w.
Rows are grouped by functional classification (MapMan bin class).

1 - Photosynthesis
Eucgr.E02381 | light-harvesting chlorophyll-protein complex II subunit B1 | 9.1 | 11.0 | 66.2 | 143.4 | TW | ● ●

2 - Major CHO metabolism
Eucgr.D02029 | Glycosyl hydrolases family 32 protein | 19.0 | 24.8 | 1.5 | 0.2 | OW | ● ●
Eucgr.C03201 | sucrose synthase 4 | 16.4 | 16.3 | 78.4 | 123.4 | TW | ●
Eucgr.F03138 | Glycosyl hydrolases family 32 protein | 0.1 | 0.0 | 16.4 | 0.2 | TW | ●

3 - Minor CHO metabolism
Eucgr.B02637 | NAD(P)-linked oxidoreductase superfamily protein | 0.9 | 9.4 | 28.1 | 0.3 | TW | ●

10 - Cell wall
Eucgr.B03214 | expansin A1 | 5.7 | 3.3 | 32.1 | 75.6 | TW | ● ●
Eucgr.F00232 | plant glycogenin-like starch initiation protein 3 | 38.4 | 35.8 | 212.8 | 466.3 | TW | ● ●
Eucgr.F02941 | cellulase 3 | 4.7 | 3.8 | 35.5 | 75.0 | TW | ●
Eucgr.G00649 | beta-D-xylosidase 4 | 24.4 | 32.5 | 113.4 | 133.6 | TW | ●
Eucgr.G03134 | expansin A8 | 0.9 | 2.5 | 24.1 | 0.2 | TW | ●
Eucgr.H01055 | beta-xylosidase 2 | 30.4 | 52.0 | 114.4 | 167.7 | TW | ●
Eucgr.J00938 | FASCICLIN-like arabinogalactan-protein 12 | 8.7 | 3.3 | 1282.5 | 3002.8 | TW | ● ●● ●
Eucgr.J02266 | Pectin lyase-like superfamily protein | 29.3 | 21.4 | 88.0 | 127.4 | TW | ●
XLOC_050971 | Pectin methylesterase 3 | 25.1 | 17.2 | 64.7 | 122.9 | TW | ●

11 - Lipid metabolism
Eucgr.F02730 | glycolipid transfer protein 2 | 110.4 | 141.6 | 7.6 | 17.0 | OW | ● ●
Eucgr.B00946 | S-adenosyl-L-methionine-dependent methyltransferases superfamily protein | 7.6 | 6.5 | 49.7 | 15.8 | TW | ●

15 - Metal handling
XLOC_008700 | heavy metal transport/detoxification superfamily protein | 2.1 | 1.4 | 53.1 | 23.2 | TW | ● ●

16 - Secondary metabolism
Eucgr.E04221 | laccase 14 | 10.9 | 36.7 | 12.5 | 1.0 | OW | ●
Eucgr.G03028 | Laccase/Diphenol oxidase family protein | 15.5 | 24.7 | 88.9 | 123.2 | TW | ●
Eucgr.K03111 | laccase 17 | 32.0 | 27.3 | 119.7 | 172.0 | TW | ● ●

17 - Hormone metabolism
Eucgr.D01368 | ethylene-forming enzyme; 1-Aminocyclopropane-1-Carboxylate Oxidase | 8.3 | 13.5 | 647.1 | 1399.4 | TW | ● ●● ●
Eucgr.E00054 | Cytochrome b561/ferric reductase transmembrane with DOMON related domain | 86.8 | 102.9 | 342.1 | 528.9 | TW | ●
XLOC_048375 | SAUR-like auxin responsive | 0.2 | 0.1 | 24.1 | 40.4 | TW | ● ●● ●

20 - Stress
Eucgr.C00678 | 17.6 kDa class II heat shock protein | 651.1 | 341.9 | 34.5 | 277.5 | OW | ● ●
Eucgr.E00653 | heat shock protein 90.1 | 120.6 | 42.4 | 3.7 | 26.8 | OW | ● ●
Eucgr.I01495 | basic chitinase | 803.5 | 373.9 | 78.5 | 121.8 | OW | ●
Eucgr.J01957, Eucgr.J01959 | HSP20-like chaperones superfamily protein | 6343.0 | 7833.2 | 1225.8 | 2949.9 | OW | ●
Eucgr.D01892 | osmotin 34 | 0.3 | 0.1 | 29.8 | 0.0 | TW | ●
Eucgr.E00809 | Disease resistance-responsive (dirigent-like protein) family protein | 67.0 | 50.6 | 246.1 | 468.4 | TW | ●●
Eucgr.H00321 | homolog of carrot EP3-3 chitinase | 0.1 | 0.1 | 15.5 | 0.0 | TW | ●
Eucgr.L03258 | pathogenesis-related 4 | 1.4 | 1.4 | 72.5 | 0.2 | TW | ●

26 - Miscellaneous
Eucgr.B02633 | Bifunctional inhibitor/lipid-transfer protein/seed storage 2S albumin superfamily protein | 177.0 | 228.1 | 122.9 | 25.1 | OW | ●
Eucgr.H04642 | cytochrome P450, family 78, subfamily A, polypeptide 10 | 53.4 | 29.7 | 3.1 | 7.5 | OW | ●
Eucgr.K03041 | Bifunctional inhibitor/lipid-transfer protein/seed storage 2S albumin superfamily protein | 621.7 | 2227.2 | 262.5 | 57.9 | OW | ●●
Eucgr.C03896, Eucgr.C03897 | beta-galactosidase 12 | 19.3 | 19.6 | 66.3 | 148.1 | TW | ●●
Eucgr.G02702 | cytochrome P450, family 94, subfamily C, polypeptide 1 | 0.7 | 2.3 | 18.5 | 61.4 | TW | ● ●
Eucgr.H00343 | Exostosin family protein | 0.1 | 0.1 | 39.9 | 84.0 | TW | ● ●● ●
Eucgr.I00988 | SKU5 similar 17 | 89.8 | 63.9 | 368.8 | 622.6 | TW | ●
XLOC_002488 | beta-1,3-glucanase | 0.3 | 0.3 | 55.5 | 0.0 | TW | ● ● ●

27 - RNA
Eucgr.C00948 | CCCH-type zinc finger family protein | 217.3 | 163.0 | 25.6 | 76.5 | OW | ● ●
Eucgr.D02645 | homeobox 7 | 65.4 | 23.8 | 18.6 | 1.3 | OW | ●
Eucgr.D01819 | myb domain protein 103 - EgrMYB60 | 21.7 | 23.4 | 115.0 | 212.4 | TW | ● ●
Eucgr.D01935 | KNOTTED-like homeobox of Arabidopsis thaliana 7 | 25.9 | 28.1 | 101.5 | 174.0 | TW | ● ●
Eucgr.G02379 | Zim17-type zinc finger protein | 67.1 | 60.3 | 447.8 | 641.5 | TW | ● ●● ●
Eucgr.H04007, Eucgr.H04010 | MLP-like protein 423 | 7.1 | 17.9 | 67.5 | 0.6 | TW | ●
Eucgr.H05072 | Eukaryotic aspartyl protease family protein | 72.5 | 80.5 | 328.4 | 471.3 | TW | ● ●

28 - DNA
Eucgr.B03112 | DNA glycosylase superfamily protein | 28.7 | 43.6 | 152.4 | 221.3 | TW | ●

29 - Protein
Eucgr.D00496 | senescence-associated gene 12 | 1.9 | 10.8 | 2.4 | 0.1 | OW | ●
Eucgr.D00506 | senescence-associated gene 12 | 0.3 | 38.4 | 1.1 | 0.1 | OW | ●
Eucgr.J03189 | alpha-vacuolar processing enzyme | 18.9 | 40.3 | 40.1 | 0.5 | OW | ●●
Eucgr.A01172 | Nucleotide-diphospho-sugar transferases superfamily protein | 57.3 | 68.9 | 253.8 | 395.9 | TW | ●
Eucgr.B00047 | RING/U-box superfamily protein | 39.0 | 33.5 | 107.5 | 184.6 | TW | ●
Eucgr.E02755 | Leucine-rich repeat protein kinase family protein | 8.6 | 11.7 | 49.7 | 65.9 | TW | ●
Eucgr.F02028 | RING/U-box superfamily protein | 88.2 | 84.5 | 425.2 | 780.9 | TW | ● ●
Eucgr.H01314 | Protein kinase superfamily protein | 17.1 | 20.4 | 112.2 | 168.8 | TW | ● ● ●
Eucgr.H03424 | Kinase interacting (KIP1-like) family protein | 8.9 | 8.2 | 42.9 | 70.8 | TW | ● ●
Eucgr.I01147 | fucosyltransferase 12 | 16.9 | 15.8 | 71.2 | 192.7 | TW | ● ●

30 - Signaling
Eucgr.G00850 | EXORDIUM like 2 | 97.5 | 79.0 | 4.9 | 23.3 | OW | ●
Eucgr.G00851 | EXORDIUM like 2 | 140.2 | 91.7 | 18.7 | 42.3 | OW | ●
Eucgr.F01203 | IQ-domain 10 | 7.2 | 7.8 | 277.5 | 619.2 | TW | ● ●● ●
Eucgr.F02727 | Leucine-rich repeat transmembrane protein kinase family protein | 2.8 | 3.9 | 30.5 | 42.8 | TW | ● ●● ●
Eucgr.H00370 | Calcium-binding EF-hand family protein | 26.5 | 14.9 | 84.4 | 131.4 | TW | ●

31 - Cell organization
Eucgr.D01847 | tubulin beta chain 2 | 23.9 | 29.5 | 797.5 | 1493.1 | TW | ● ●● ●
Eucgr.I02702 | actin 7 | 28.8 | 32.0 | 190.8 | 177.0 | TW | ●
Eucgr.J02401 | ATP binding microtubule motor family protein | 8.9 | 14.1 | 64.0 | 149.9 | TW | ● ●
Eucgr.K01666 | beta-6 tubulin | 130.4 | 125.1 | 481.9 | 929.7 | TW | ● ●
Eucgr.K01841 | actin 1 | 45.2 | 35.2 | 186.3 | 210.7 | TW | ●

33 - Development
Eucgr.J02614 | Unknown function | 4.2 | 1.5 | 31.9 | 37.2 | TW | ●● ●

34 - Transport
Eucgr.F02993 | Sec14p-like phosphatidylinositol transfer family protein | 21.3 | 14.1 | 72.7 | 102.1 | TW | ●
Eucgr.K02993 | plasma membrane intrinsic protein 1;4 | 201.6 | 82.4 | 230.3 | 42.1 | TW | ●

35 - Not assigned
Eucgr.B02741 | Unknown function | 272.9 | 24.9 | 6.4 | 96.5 | OW | ●
Eucgr.H01072 | Unknown function | 168.7 | 74.9 | 8.6 | 38.5 | OW | ●
Eucgr.L03529 | Unknown function | 278.4 | 63.9 | 1.1 | 21.5 | OW | ●
Eucgr.A00530 | Plant protein of unknown function (DUF828) | 13.2 | 18.8 | 178.5 | 291.2 | TW | ● ●● ●
Eucgr.A01856 | Unknown function | 194.7 | 138.1 | 620.9 | 1115.8 | TW | ● ●
Eucgr.B03382 | ARM repeat superfamily protein | 7.9 | 9.3 | 44.3 | 76.5 | TW | ● ●
Eucgr.B03885 | Family of unknown function (DUF662) | 99.7 | 136.8 | 246.4 | 504.3 | TW | ●
Eucgr.C00339 | Unknown function | 2.8 | 1.5 | 45.8 | 0.7 | TW | ●
Eucgr.C00653 | Unknown function | 88.8 | 95.1 | 365.6 | 532.3 | TW | ● ●
Eucgr.E00266 | TPX2 (targeting protein for Xklp2) protein family | 10.7 | 16.3 | 151.6 | 297.3 | TW | ● ●● ●
Eucgr.F03133 | Late embryogenesis abundant (LEA) hydroxyproline-rich glycoprotein family | 10.8 | 12.2 | 102.6 | 346.0 | TW | ● ●●
Eucgr.F03638 | Nucleotide-sugar transporter family protein | 29.6 | 25.6 | 139.5 | 211.3 | TW | ● ●
Eucgr.F04121 | Unknown function | 12.0 | 24.0 | 81.9 | 141.8 | TW | ●
Eucgr.G03122 | Plant protein of unknown function (DUF827) | 120.8 | 99.5 | 305.7 | 560.7 | TW | ●
Eucgr.H00727 | Bifunctional inhibitor/lipid-transfer protein/seed storage 2S albumin superfamily protein | 261.9 | 284.6 | 607.7 | 1763.0 | TW | ●
Eucgr.H01008 | Unknown function | 8.5 | 7.5 | 40.8 | 89.2 | TW | ●●
Eucgr.I01644 | Protein of unknown function (DUF579) | 4.5 | 4.8 | 97.2 | 222.2 | TW | ● ●● ●
Eucgr.I01697 | zinc finger (C3HC4-type RING finger) family protein | 6.3 | 4.6 | 73.0 | 236.4 | TW | ● ●● ●
Eucgr.J00170, Eucgr.J00171 | TRICHOME BIREFRINGENCE-LIKE 33 | 68.6 | 59.6 | 659.1 | 807.0 | TW | ● ●
Eucgr.J00937 | Unknown function | 0.7 | 0.6 | 760.6 | 1899.4 | TW | ● ●● ●
Eucgr.K02384 | Unknown function | 103.6 | 113.4 | 12.3 | 168.4 | TW | ●
Eucgr.K02492 | Unknown function | 28.7 | 31.8 | 108.8 | 232.8 | TW | ● ●
XLOC_018314 | Unknown function | 68.9 | 50.6 | 300.8 | 392.8 | TW | ● ●


Cluster-TW grouped the 73 genes overexpressed in TW, including the most preferentially and highly expressed genes in developing TW samples, such as EgrFLA2b (Eucgr.J00938), a Fasciclin-Like Arabinogalactan-encoding gene (FLA) and a marker of tension wood, and the 1-Aminocyclopropane-1-Carboxylate Oxidase-related gene (ACC oxidase), involved in the synthesis of ethylene (Andersson-Gunnerås et al., 2003). Most of the carbohydrate metabolism genes (Table 5.5; S5.6) belonged to Cluster-TW, such as sucrose synthase (Eucgr.C03201), beta-galactosidase (Eucgr.C03896, Eucgr.C03897), cellulase (Eucgr.F02941), β-D-xylosidases (Eucgr.G00649; Eucgr.H01055), fucosyltransferase (Eucgr.I01147), pectin lyase-like (Eucgr.J02266), chitinase (Eucgr.H00321), exostosin family protein (Eucgr.H00343) and nucleotide-sugar transferases (Eucgr.A01172; Eucgr.F03638). Cluster-OW consisted of 20 genes up-regulated in OW, including the exordium-like genes (Eucgr.G00850, Eucgr.G00851), chitinase (Eucgr.I01495), heat shock proteins (Eucgr.C00678; Eucgr.E00653) and a gene of unknown function (Eucgr.B02741).

RT–qPCR validated the RNA-seq expression profile of nine differentially expressed genes, despite some genotype effect (Fig. S5.3 and Table S5.1b-c): a) eight genes (Eucgr.B03112; Eucgr.C03896; Eucgr.D01368; Eucgr.D01819; Eucgr.F03133; Eucgr.G00649; Eucgr.J02401 and Eucgr.K03111) were confirmed as up-regulated in TW developing xylem tissues; b) one gene (Eucgr.I01495) was confirmed as up-regulated in OW developing tissues.


Table 5.5 - Genes up-regulated in Eucalyptus globulus opposite wood or tension wood associated with carbohydrate activity.

Columns: Gene model | Gene annotation | CAZY annotation. Rows are grouped by cluster.

Opposite wood
Eucgr.D02029 | Glycosyl hydrolases family 32 protein | GH32
Eucgr.E04221 | Laccase 14 | AA1
Eucgr.I01495 | basic chitinase | GH19

Tension wood
Eucgr.A01172 | Nucleotide-diphospho-sugar transferases superfamily protein | GT43
Eucgr.J02266 | Pectin lyase-like superfamily protein | PL1, GH28
Eucgr.K03111 | Laccase 17 | AA1
Eucgr.C03201 | Sucrose synthase 4 | GT4, GT5
Eucgr.C03896, Eucgr.C03897 | Beta-galactosidase 12 | GH35, GH2, GH42
Eucgr.F03138 | Glycosyl hydrolases family 32 protein | GH32
Eucgr.F00232 | Plant glycogenin-like starch initiation protein 3 | GT8, GT75
Eucgr.F02941 | Cellulase 3 | GH9
Eucgr.G00649 | Beta-D-xylosidase 4 | GH3
Eucgr.G03028 | Laccase/Diphenol oxidase family protein | AA1
Eucgr.H00321 | Homolog of carrot EP3-3 chitinase | GH19, CBM18
Eucgr.H00343 | Exostosin family protein | GT47
Eucgr.H01055 | Beta-xylosidase 2 | GH3
Eucgr.I00988 | SKU5 similar 17 | AA1
Eucgr.I01147 | Fucosyltransferase 12 | GT10, GT47
Eucgr.I01697 | Zinc finger (C3HC4-type RING finger) family protein | GT92
XLOC_050971 | Pectinesterase-like | CE8
XLOC_002488 | Glucan endo-1,3-beta-glucosidase | GT17

Functional classification of 88 differentially expressed genes

For this analysis, only the 88 loci corresponding to gene models of the E. grandis reference annotation (Myburg et al., 2014) were analysed. A comparative GO functional classification of the genes of Cluster-TW and Cluster-OW was performed to gain insight into the functional differences between these two groups of genes. GO terms were attributed to 47 (36 from Cluster-TW and 11 from Cluster-OW) of the 88 loci (Table S5.7a-b). A total of 35 GO terms were listed for both datasets: 16 Biological Processes (BP), 5 Cellular Components (CC) and 14 Molecular Functions (MF). Eight BP-GO terms were exclusively associated with Cluster-TW, namely “carbohydrate metabolism”, “generation of precursor metabolites and energy”, “DNA metabolic process”, “cellular protein modification process”, “response to external stimulus”, “response to biotic stimulus”, “cellular component organization” and “photosynthesis”. The remaining BP-GO terms were represented in both clusters, including metabolic processes related to “nucleobase-containing compounds”, “transport”, “response to stress” and “protein metabolism” (Table S5.7a,b). The MF-GO terms exclusively associated with Cluster-TW were “motor”, “kinase” and “transferase” activities (Table S5.7b). Two MF-GO terms were exclusive to Cluster-OW, namely “nucleic acid binding” and “lipid binding”. Three CC-GO terms were exclusive to Cluster-TW, namely “cellular component”, “cytoskeleton” and “membrane”, and one was exclusive to Cluster-OW, namely “cytoplasm” (Table S5.7b). Nine enriched GO terms were obtained when comparing Cluster-TW with Cluster-OW. Two BP-GO terms were over-represented in Cluster-TW: “carbohydrate metabolic process” and “single-organism carbohydrate metabolic process” (Table S5.7c). Seven BP-GO terms were over-represented in Cluster-OW: “response to arsenic-containing substance”, “response to oxidative stress”, “aromatic compound biosynthetic process”, “heterocycle biosynthetic process”, “response to biotic stimulus”, “response to other organism” and “multi-organism process” (Table S5.7c).
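The statistical test behind the enrichment comparison is not specified in this section. As a sketch only, and assuming a one-sided Fisher's exact test (a common choice for comparing GO term frequencies between two gene sets), over-representation of a term in Cluster-TW relative to Cluster-OW could be evaluated as below. The set sizes (36 and 11 annotated loci) come from the text; the number of genes carrying the term (12, all in Cluster-TW) is a hypothetical value chosen for illustration, and `fisher_one_sided` is a hypothetical helper.

```python
from math import comb


def fisher_one_sided(k, n_test, n_term, n_total):
    """P(X >= k) for a hypergeometric draw.

    k        genes with the term in the test set
    n_test   size of the test set
    n_term   genes with the term in test + reference sets
    n_total  size of test + reference sets
    """
    upper = min(n_term, n_test)
    favourable = sum(
        comb(n_term, i) * comb(n_total - n_term, n_test - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(n_total, n_test)


n_tw, n_ow = 36, 11          # annotated loci per cluster (from the text)
term_in_tw, term_in_ow = 12, 0  # hypothetical counts for one GO term

p_value = fisher_one_sided(
    k=term_in_tw,
    n_test=n_tw,
    n_term=term_in_tw + term_in_ow,
    n_total=n_tw + n_ow,
)
print(f"one-sided p-value: {p_value:.4f}")
```

In practice the p-values obtained for all tested terms would also need a multiple-testing correction before terms are declared enriched.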


Discussion

In this study, the kinetics of Eucalyptus globulus tension wood induction was used as a model to study the phenotypic and molecular plasticity of xylogenesis in response to gravitropic stress. In contrast to previous studies (e.g. Aguayo et al., 2010; Mizrachi et al., 2015; Paux et al., 2005), TW and OW developing xylem were sampled from unrelated genotypes over a tension induction time course and characterized in terms of anatomical, chemical and transcriptome variation. This approach provides a more robust characterization of the plasticity of xylogenesis in response to the gravitropic stimulus over time, and takes into account the variation due to different genetic backgrounds.

Phenotypic plasticity of xylogenesis

The presence of a G-layer is a distinctive characteristic of TW formation. In this study, G-layers were observed in some cells sampled from TW tissues formed after only one week of bending, even in locations far from the cambial zone. In contrast, no G-layers were present in cells sampled from OW tissues formed in the same period. The presence of G-layers in TW1w samples could result from previous gravitropic responses, as previously reported in trunks of well-grown straight trees (Aguayo et al., 2010; Washusen and Ilic, 2001). However, in this study G-layers were largely well established in TW after two weeks of trunk bending and were visually similar to the differentiated TW tissues reported before (Aguayo et al., 2010; Paux et al., 2005). This alteration contributed to a significantly higher cell wall thickness of the fibre cells and an increased vessel tangential diameter, in accordance with previous results (Aguayo et al., 2010; Mizrachi et al., 2015). Greater cell wall thickness has been considered another main distinctive anatomical characteristic of TW relative to OW (Aguayo et al., 2010). Nevertheless, in our study, fibre length and fibre width were significantly smaller in TW-forming tissues than on the opposite side, in contrast with previous results obtained by Aguayo et al. (2010) and Mizrachi et al. (2015).

Part of the novelty of this study lies in the analysis of the variation in the chemical composition of developing xylem samples from Eucalyptus. A toluene-derived peak was identified in pyrograms from DX samples but not from FDX samples. This observation attests to the presence of proteins in DX, in contrast to FDX, and is in accordance with what was previously reported in maritime pine (Paiva et al., 2008a). As the peaks derived from H-units might be biased by amino acid (aa)-derived pyrolysis products, the quantification of H-units was excluded from the comparative analysis of total lignin (Paiva et al., 2008a) between DX and FDX. As in maritime pine, DX from Eucalyptus globulus presented a lower content of pyrolysis products from lignin and hexosans than FDX (Paiva et al., 2008b). Along the bending stress kinetics, an increase in the relative lignin content was observed in DX from OW, while in TW tissues the analogous variation was smaller. These tendencies are also in accordance with the relative increase in lignin reported to occur in Eucalyptus fully developed wood from TW, when compared to straight wood (Mizrachi et al., 2015) or to OW (Aguayo et al., 2010). In contrast, pyrolysis products of hexosans, which are related to the cellulose content, increased in developing xylem from TW relative to those of OW tissues, in accordance with previous results in poplar and Eucalyptus. TW has been associated with increased contents of alpha-cellulose relative to straight wood or OW (Aguayo et al., 2010, 2012; Côté Jr et al., 1969; Mizrachi et al., 2015; Schwerin, 1958; Timell, 1986). The content of hexosans remained relatively stable during the TW formation time course, similarly to what was observed by Aguayo et al. (2010), while it decreased in OW samples along the bending stress kinetics. The variation in the content of pentosans during the TW and OW formation time course suggests the participation of hemicelluloses in cell wall rearrangements and in carbon reallocation. Hemicelluloses have been increasingly regarded as carbon sources during periods of carbon demand (Hoch, 2007), in addition to their important structural role in strengthening cell walls (Paiva et al., 2008a; Scheller and Ulvskov, 2010).

Coding transcriptome dynamics during bending stress responses

The analysis of the coding transcriptome dynamics revealed 88 genes and 5 new loci differentially expressed between TW and OW in at least one of the sampling periods. The identification of these 5 new loci agrees with the view that RNA-seq can be used together with other functional genomics methods to enhance the analysis of gene expression and the identification of new genes (Conesa et al., 2016). These 93 loci contrast with the 196 cDNAs reported by Paux et al. (2005) and the 366 genes reported by Mizrachi et al. (2015) as significantly differentially expressed between TW and upright normal wood. Several factors might contribute to a higher variability of the transcriptomes, such as the age of the trees growing in field experiments or the use of unrelated genotypes as biological replicates. Despite the differences between studies, the comparison of this study with that of Mizrachi et al. (2015) in E. urograndis revealed eleven genes in common (see Table S5.6), including a fasciclin-like arabinogalactan (FLA; Eucgr.J00938) gene. In poplar, FLA genes are considered molecular markers of TW development (Andersson-Gunnerås et al., 2006; Gerttula et al., 2015; Lafarguette et al., 2004). It is worth noting that this gene was already up-regulated 148-fold in early TW after one week of bending, and that its over-expression reached 903-fold after 3-4 weeks. This large increase suggests that FLA proteins already participate in early responses to the bending stimulus. FLA proteins seem to affect stem biomechanics through changes in cellulose deposition and/or cell wall structural properties (MacMillan et al., 2010). In Eucalyptus, EgrFLA2 has previously been attributed a functional specialization linked to microfibril angle (MFA) (MacMillan et al., 2015). The biological relevance of this differential expression is considerable, since the mechanical properties of wood and fibres are determined to a significant extent by MFA (Lamara et al., 2015). Together with the FLA gene, two other interesting TW-related candidate genes likely involved in carbohydrate metabolism were identified, namely one exostosin family protein (GT47; Eucgr.H00343) and one beta-galactosidase 8 (GH35; Eucgr.J00936.1).
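The fold changes discussed for the FLA gene can be approximated directly from the FPKM values listed in Table 5.4, as in the following minimal sketch. The helper `fold_change` and its optional pseudocount (useful when a tabulated FPKM is 0.0) are illustrative assumptions and not part of the analysis pipeline used here. Because the tabulated FPKM values are rounded, the ratios obtained this way only approximate the fold changes quoted in the text.

```python
from math import log2

# FPKM values for Eucgr.J00938 (FLA) as listed in Table 5.4.
fla = {"OW1w": 8.7, "OW34w": 3.3, "TW1w": 1282.5, "TW34w": 3002.8}


def fold_change(numerator, denominator, pseudocount=0.0):
    """Ratio of two expression values; a pseudocount avoids division by zero."""
    return (numerator + pseudocount) / (denominator + pseudocount)


early = fold_change(fla["TW1w"], fla["OW1w"])    # TW vs OW after 1 week
late = fold_change(fla["TW34w"], fla["OW34w"])   # TW vs OW after 3-4 weeks

print(f"early TW/OW: {early:.1f}-fold (log2 = {log2(early):.2f})")
print(f"late  TW/OW: {late:.1f}-fold (log2 = {log2(late):.2f})")
```

Reporting the log2 ratio alongside the raw ratio makes up- and down-regulation symmetric around zero, which is convenient when comparing genes from both clusters.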

The data from this study suggest that, for the same type of DX tissue, the transcriptome of E. globulus does not vary significantly along the applied bending stress. However, bending stress induced a stronger transcriptomic remodeling between the upper and lower sides of the bent trunk segment. Indeed, while more than 50% of the genes listed in this work were differentially expressed between TW and OW tissues, only a small number of genes were differentially expressed between the early and late response within the same type of tissue. In particular, we observed an up-regulation of genes linked to adjustments of osmotic potential and defense, and to cell wall extensibility, as suggested by the up-regulation of one expansin during the early response of TW tissues.

Gravitropic response induces the partition of carbon and energy to tension wood tissues

The transcriptomic response between TW and OW can be understood in the context of a dynamic reallocation of carbon resources from the lower to the upper side of the developing xylem tissues of the bent stem. In support of this hypothesis, two exordium-like genes (Eucgr.G00850, Eucgr.G00851) were over-expressed in early OW tissues. Exordium-like genes take part in a regulatory pathway that controls growth and development during carbon- and energy-limiting growth conditions (Schröder et al., 2011), together with genes related to programmed cell death (PCD). In contrast, the TW transcriptomes were enriched in genes linked to carbohydrate processes, such as pectin lyase (Eucgr.J02266), sucrose synthase (SUSY; Eucgr.C03201) and a cellulase gene (Eucgr.F02941), among other glycoside hydrolases (e.g. beta-galactosidase 8; GH35; Eucgr.J00936.1) and glycosyltransferases (e.g. one exostosin family protein; GT47; Eucgr.H00343). Pectin lyase likely assumes an important role in pectin remodeling in primary and even in early stages of secondary cell wall biosynthesis. Genes encoding pectin/pectate lyases and hydrolases have been found highly expressed in poplar xylem (Geisler-Lee et al., 2006). SUSY enzymes are pivotal in cell wall biosynthesis, in particular in that of cellulosic microfibrils (Gerber et al., 2014). SUSY activity in developing xylem tissues does not specifically affect cellulose biosynthesis, but instead is involved in defining the total carbon incorporation into cell walls and in ensuring an intact structure of the fibre wall (Gerber et al., 2014; Zieher, 2010). Some plant membrane-bound cellulases have been attributed an editing function during cellulose biosynthesis, needed for the formation of crystalline cellulose, a typical characteristic of the G-layers; a role in the removal of putative membrane-bound precursors from the end of the growing cellulose chain (Geisler-Lee et al., 2006; Peng et al., 2002); or even an action on amorphous cellulose in the cell wall, thereby contributing to increased cell wall plasticity (Ohmiya et al., 2000; Park et al., 2003).

Several cytoskeleton-related proteins, such as tubulins, actins and other associated proteins, were up-regulated in TW, providing a scaffold for intense intracellular transport of metabolites and a template for cell wall assembly, and ensuring cell shape and mechanical resistance to deformation (Alberts et al., 2007). Two genes encoding β-xylosidases (Eucgr.G00649; Eucgr.H01055) were over-expressed in TW but also showed a small increase in expression in late OW relative to early OW. Known to act on xylans, β-xylosidases have been implicated in the organization and loosening of glucuronoxylans in the Arabidopsis cell wall, thereby facilitating lignin polymerization in the polysaccharide matrix (Goujon et al., 2003).

Despite the increasing lignin content in OW, and contrary to what has been reported by Mizrachi et al. (2015), no monolignol biosynthetic genes (Carocha et al., 2015) were differentially expressed. Nevertheless, three laccase genes were differentially expressed during TW and OW differentiation. Curiously, in the late response one of these laccases (Eucgr.E04221) was highly expressed in OW and only weakly expressed in TW. The other two laccases were both highly expressed in TW in the early and late response. Given that laccases are also involved in lignin monomer polymerization (Boerjan et al., 2003; Vanholme et al., 2010), and since an increased lignin content in OW was quantified along the applied bending stress, it is plausible that Eucgr.E04221 is involved in the polymerization of monolignols.

Cambial activity and xylem differentiation are spatially and temporally regulated by complex hierarchical networks of transcription factors (Druart et al., 2007; Hussey et al., 2013; Taylor-Teeples et al., 2015; Zhong et al., 2010). The genes EgrMYB60 (Eucgr.D01819) and KNOTTED-like homeobox (Eucgr.D01935), both up-regulated in TW tissues, likely play important roles in the differential regulation of the tissue transcriptome. EgrMYB60, a gene recently described in Eucalyptus (Soler et al., 2015), likely assumes a critical role in cell wall biosynthesis, since it is a close homologue of ATMYB103. ATMYB103 is a known downstream target of SND1, a NAC-domain master regulator of xylem fibre cell development (Schuetz et al., 2013; Zhong et al., 2006). The KNOTTED-like homeobox gene (Eucgr.D01935), likely a major regulator of cambial differentiation, was also over-expressed in TW-forming tissues, in accordance with what was found in Eucalyptus by Mizrachi et al. (2015). In poplar, the overexpression of its homologue Arborknox2 (ARK, potri.002G113300) was associated with a faster maturation of G-layers and an enhanced gravibending response (Gerttula et al., 2015).

In addition, the importance of ethylene in the response to bending stress was highlighted in our transcriptomic analysis, in accordance with previous results in Eucalyptus (Mizrachi et al., 2015) and poplar (Andersson-Gunnerås et al., 2006). Ethylene is recognized to have a key role as an endogenous stimulator of cambial growth during the TW response (Andersson-Gunnerås et al., 2003, 2006; Vahala et al., 2013). It also affects other aspects of wood formation, such as the morphology of xylem cells and the ontogenesis of vessels, fibres and ray parenchyma cells (Little and Savidge, 1987). In addition, one CYP94C1 gene (Eucgr.G02702) was highly expressed and up-regulated in TW. Together with other cytochrome P450 genes, CYP94C1 genes have been described to exert a negative feedback control on the levels of the bioactive form of jasmonic acid, jasmonoyl-L-isoleucine (JA-Ile) (Kitaoka et al., 2011; Koo et al., 2011). JA-Ile is known to be involved in the jasmonate-mediated signaling pathway in plants. Crosstalk between the ethylene and jasmonate signaling pathways has been suggested to enable a rapid and integrated response to gravitropism (Zhu et al., 2011).


Chapter 6

Post-transcriptional regulation dynamics during Eucalyptus globulus tension wood formation

The present chapter integrates parts of the following research article:

Carocha V, Fernandes F, San-Clemente H, Graça C, Pêra S, Alves A, Leal L, Szittya G, Dalmay T, Quilhó T, Rodrigues JC, Fevereiro P, Grima-Pettenati J, and Paiva JAP. 2015. Post-transcriptional regulation dynamics during Eucalyptus globulus tension wood formation. In preparation.



Table of Contents

Chapter 6

Abstract
Chapter introduction
Objectives
Materials and methods
Results
Discussion


Abstract

• Despite recent progress that has led to an increased and more integrated understanding of the role of transcriptional regulation during wood formation, our knowledge of post-transcriptional regulation mediated by miRNA is still scarce. MiRNA are small non-coding RNAs that control a vast array of developmental and stress-related biological processes, including wood formation.
• To extend our knowledge of the miRNA regulation of wood formation, we deep-sequenced nine Eucalyptus globulus small RNA libraries from developing xylem tissues of tension and opposite wood, collected from trees subjected to different periods of mechanical/gravitropic stress.
• A total of 162 distinct miRNA were identified, among which 49% have never been reported before. The analysis of five degradome libraries of developing xylem identified 135 genes targeted by 103 miRNA, including 38 never reported before.
• Two contrasting sets of miRNA were found to be differentially expressed between tension and opposite wood. The functional classification of genes targeted by the newly identified miRNA highlighted genes involved in secondary cell wall formation, such as those of the biosynthesis of phenylpropanoids and lignin.
• This knowledge of the post-transcriptional regulation of xylogenesis and cell wall biosynthesis will enable the development of new and more efficient strategies to improve wood biomass in different tree species.


Chapter introduction

With the recent publication of the annotated sequence of the E. grandis genome (Myburg et al., 2014), genome-wide analyses of key families of transcriptional regulators boosted our ability to identify novel promising candidates involved in the regulation of the biosynthesis of secondary wall polymers (Hussey et al., 2015; Soler et al., 2015; Yu et al., 2014, 2015). Although further studies are still required to advance our understanding of the molecular mechanisms governing wood formation, an impressive amount of information is already available, showing that these processes are far more complex than previously expected (Grima-Pettenati et al., 2012; Schuetz et al., 2013) and involve multi-layered networks of transcriptional regulation (reviewed in Hussey et al., 2013; Ruzicka et al., 2015; Taylor-Teeples et al., 2015).

In addition to transcriptional networks, post-transcriptional regulations, which include those mediated by microRNAs (miRNA), play an important role in wood development. In plants, miRNAs are small (20–24 nt) endogenous, non-coding RNAs (Bartel, 2004) which act by base pairing with complementary mRNA target sequences, mediating their down-regulation either by site-specific mRNA cleavage or by translational repression (Llave et al., 2002; Rhoades et al., 2002). MiRNAs play crucial roles in diverse processes of plant development and cellular differentiation, including vascular cambium differentiation (Kim et al., 2005; Ko et al., 2006; Lu et al., 2005; Sunkar and Zhu, 2004) and secondary cell wall biosynthesis (Schuetz et al., 2013; Wong et al., 2011). In woody species, miRNA target genes were shown to be related to vascular differentiation (Klevebring et al., 2009; Robischon et al., 2011) and to tension wood formation in response to gravity (Lu et al., 2005, 2013; Puzey et al., 2012). However, knowledge of the miRNA-mediated post-transcriptional regulation of genes involved in wood formation and secondary cell wall biosynthesis in the Eucalyptus genus is still scarce.


Indeed, although a few studies have identified some miRNAs in Eucalyptus spp. (Hudson et al., 2014; Levy et al., 2014; Myburg et al., 2014; Victor, 2007), none of these miRNAs is present among the 24,521 miRNA entries in miRBase (v.20) (Kozomara and Griffiths-Jones, 2011).

In this study, the full non-coding transcriptome of developing xylem tissues was used to identify miRNAs differentially expressed during the formation of tension and opposite wood in Eucalyptus globulus.

For that purpose, deep sequencing (sRNA-Seq) of nine E. globulus sRNA libraries from contrasting developing xylem tissues was performed, and the resulting data were used to predict candidate miRNAs. Large-scale miRNA target gene identification was also performed using degradome sequencing analyses. The prediction, large-scale validation by degradome analysis and functional categorization of miRNA target genes provided new insights into the biological processes, functions and cellular components under post-transcriptional regulation by miRNA during xylogenesis.


Objectives

To accomplish genome-wide identification of miRNA expressed during the xylogenesis of Eucalyptus globulus.

To identify those miRNA differentially expressed during the formation of tension and opposite wood from Eucalyptus globulus.

To accomplish large scale miRNA target gene identifications by degradome sequencing analyses of tension and opposite wood from Eucalyptus globulus.

To gain insights into the biological processes, the molecular functions and the cellular components impacted by miRNA-mediated post-transcriptional regulation during the formation of tension and opposite wood from Eucalyptus globulus.


Materials and methods

Reaction wood induction and tissue sampling

Tension wood (TW) and opposite wood (OW) formation was induced by bending five-year-old E. globulus trees from three genotypes (GB3, GM2-58 and MB43) to an angle of roughly 45º, at Altri Florestal SA’s Experimental Research Station in Óbidos, Portugal (Fig. 5.1). All developing xylem samples were collected on July 7th, 2010 from three trees (ramets) of certified clonal identity for each of the three genotypes. TW tissues were collected from the upper side and OW tissues from the lower side of trunks bent for one (TW1w, OW1w), two (TW2w, OW2w), three (TW3w, OW3w), and four (TW4w, OW4w) weeks. Control samples (Ctrl) were harvested from the trunks of straight trees at breast height. Sampling and conservation of samples were performed as described in Paux et al. (2004). Tension wood induction was confirmed by histochemical characterization of wood disks harvested from trees subjected to a gravitropic stimulus for two and three weeks (see Chapter 4 for methodology; Fig. S6.1).

RNA extraction and libraries preparation and sequencing

Total RNA was isolated using approximately 100 mg of biological material per sample following the method proposed by Carvalho et al. (2015). Total RNA was checked by BioAnalyzer (Agilent), and all samples gave a RIN >7, assuring their high quality for further analysis.

For the construction of the eight small RNA (sRNA) libraries of the bending kinetics, sRNA samples from the three genotypes were pooled in equimolar amounts by tissue type (TW, OW) and bending period: one (TW1w, OW1w), two (TW2w, OW2w), three (TW3w, OW3w), and four (TW4w, OW4w) weeks. A ninth sRNA pool was obtained by pooling sRNA from straight trees (Ctrl). Fifteen micrograms of each RNA pool were sent to FASTERIS SA (Geneva, Switzerland; www.fasteris.com), where the nine sRNA libraries were prepared and sequenced using the TRUSEQ™ SBSv5 sequencing kit (Illumina) and the Illumina Hi-Seq 2000 instrument, on a multiplex run with 1x50+7(index) cycles.

miRNA analysis workflow

In silico prediction and annotation of E. globulus miRNA

The dataset consisted of nine sequenced sRNA libraries. The MIRCAT module from the UEA Small RNA Workbench (Stocks et al., 2012) was used for miRNA prediction (Fig. 6.1). Each sRNA library was treated individually. Initially, the 3’ adaptor sequence was removed from the reads, and only reads of 16 to 35 nt were considered. Further filters were applied to remove low-complexity and invalid sequences, as well as sequences with fewer than six associated reads (Filter-1). Next, sequences coding for tRNA and rRNA (Filter-2) as well as reads that did not match the genome (Filter-3) were discarded. For this, a de novo draft assembly of the Eucalyptus globulus genome (Paiva et al., unpublished), with 544,840 contigs totalling around 438 Mbp, was used. The surviving sequences mapped onto this input genome revealed genomic regions covered with potential sRNA loci. The default plant parameters proposed in MIRCAT were adopted, except that only loci with a minimum abundance of six associated reads and a read length of 20-24 nt were considered. The output files were then processed to define a unified list of miRNA candidates (Filter-4). All conserved miRNA candidates presenting complete (100%) sequence homology to at least one miRNA from miRBase v.21 (Kozomara and Griffiths-Jones, 2014), as identified by MIRCAT, were included in the final list (Filter-5). The remaining miRNA candidates were filtered based on the following criteria: i) the candidate has an associated expressed miRNA* in at least one of the libraries; ii) the folded hairpin secondary structure satisfies all guidelines from Meyers et al. (2008) (Filter-6). For this, hairpin secondary structures were generated with the RNA FOLDING/ANNOTATION tool and each pre-miRNA structure was visually verified by two operators.
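The read-level criteria of Filter-1 can be illustrated with a minimal Python sketch. It is not the MIRCAT implementation: the function names and the low-complexity rule (a read dominated by a single nucleotide) are assumptions made for illustration, and only the 16 to 35 nt length window and the minimum abundance of six reads come from the text.

```python
from collections import Counter

MIN_LEN, MAX_LEN = 16, 35   # read-length window stated in the text
MIN_ABUNDANCE = 6           # minimum number of associated reads stated in the text


def is_low_complexity(seq, max_fraction=0.8):
    """Hypothetical rule: flag reads dominated by a single nucleotide."""
    most_common_count = Counter(seq).most_common(1)[0][1]
    return most_common_count / len(seq) > max_fraction


def filter_reads(reads):
    """Return {sequence: read count} for sequences passing all criteria."""
    counts = Counter(reads)
    kept = {}
    for seq, n in counts.items():
        if not MIN_LEN <= len(seq) <= MAX_LEN:
            continue                      # outside the length window
        if set(seq) - set("ACGT"):
            continue                      # invalid characters such as N
        if is_low_complexity(seq):
            continue                      # low-complexity sequence
        if n < MIN_ABUNDANCE:
            continue                      # too few associated reads
        kept[seq] = n
    return kept
```

The length check runs first so that empty or very short strings never reach the complexity test.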

The nomenclature used for miRNA followed the criteria and conventions described in Ambros et al. (2003). The prefix egl- was used as the species indicator for each E. globulus precursor and mature miRNA. The closest E. grandis sequence homologs of the pre-miRNA were identified by BLASTn (Altschul et al., 1990) and their positions mapped onto the eleven E. grandis chromosomes. Lettered suffixes were assigned to E. globulus precursors and mature miRNA based on the position of the different precursors along the E. grandis genome (from the first chromosome to the last). The conserved miRNA received the conserved family name. The mature miRNA received a final suffix (-5p or -3p) indicating their location on the precursor arm (5' or 3', respectively).

BLASTn sequence homology searches were performed between our predicted mature miRNA and those identified in previous Eucalyptus studies (Levy et al., 2014; Myburg et al., 2014; Pappas et al., 2015; Victor, 2007).

Transcript variation of mature miRNA

Transcript variation of predicted mature miRNA in the nine sRNA libraries was estimated using the Pairwise Fisher Exact test with Bonferroni correction available at IDEG6 (Romualdi et al., 2003). In order to provide information about the differential expression between TW and OW, only the comparisons between these two contrasting tissues from the same kinetic time point were considered in this thesis.
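As an illustration of the pairwise test, the following Python sketch computes a two-sided Fisher exact p-value for one miRNA compared between two libraries. It is not the IDEG6 code: the function name and the 2x2 table layout (reads of the miRNA versus all other reads, in each of the two libraries) are assumptions made for illustration.

```python
from math import exp, lgamma


def _log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout: a = reads of the miRNA in library 1, b = other reads in
    library 1, c = reads of the miRNA in library 2, d = other reads in library 2.
    """
    row1, row2, col1 = a + b, c + d, a + c
    log_denom = _log_comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return exp(_log_comb(row1, x) + _log_comb(row2, col1 - x) - log_denom)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(total, 1.0)
```

The loop runs over every possible value of the top-left cell, so this sketch suits moderate read counts; a production tool would use a faster algorithm for very abundant miRNA.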

Hierarchical clustering analysis was accomplished in EXPANDER 6.5 (Ulitsky et al., 2010) to group samples and miRNAs based on the similarity of their expression profiles. First, the miRNA expression data for each condition were treated as follows: a) normalization (relative to the total number of reads); b) flooring (1E-5); c) log transformation and standardization (normalization of each expression pattern to have a mean of 0 and a variance of 1). Hierarchical clustering was then performed using Pearson correlation and average linkage.
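The preprocessing steps (a) to (c) and the Pearson-based dissimilarity can be sketched in Python as follows. This is not the EXPANDER implementation: the function names, the base-2 logarithm and the use of the population standard deviation are assumptions, and the average-linkage clustering step itself is not reproduced.

```python
import math
from statistics import mean, pstdev

FLOOR = 1e-5  # flooring value given in the text


def preprocess(counts, library_totals):
    """Normalise, floor, log-transform and standardise one miRNA profile."""
    relative = [c / t for c, t in zip(counts, library_totals)]  # a) normalisation
    floored = [max(x, FLOOR) for x in relative]                 # b) flooring
    logged = [math.log2(x) for x in floored]                    # c) log (base assumed)
    mu, sigma = mean(logged), pstdev(logged)
    if sigma == 0:
        return [0.0] * len(logged)        # flat profile: nothing to standardise
    return [(x - mu) / sigma for x in logged]                   # mean 0, variance 1


def pearson_distance(x, y):
    """Dissimilarity defined as 1 minus the Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0                        # correlation undefined for flat profiles
    return 1.0 - cov / (sx * sy)
```

Because each profile is standardised before clustering, two miRNA with the same shape of expression but very different abundance end up close to each other.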

Validation of miRNA by RT-qPCR

Real-time SYBR® Green RT-qPCR was performed in a total volume of 10 µL using 200 nM of each miRNA-specific primer, the PerfeCTa Universal PCR Primer (Quanta BioSciences Inc., Gaithersburg, MD, USA) and the PerfeCTa® SYBR® Green SuperMix (Quanta BioSciences Inc., Gaithersburg, MD, USA), on a PikoReal™ thermocycler (Thermo®, Waltham, MA, USA), according to the manufacturer's instructions. The PCR conditions were as follows: a denaturing step at 95ºC for 15 min followed by 40 cycles of three sequential steps (denaturing for 15 sec at 95ºC, annealing for 30 sec at 60ºC and elongation for 30 sec at 70ºC), with detection by fluorescence. The final melting curve was produced by raising the temperature from 60 to 95ºC in increments of 0.2ºC.

Validation of miRNA by Northern blot

Expression of selected mature miRNA candidates was validated by northern blot using Locked Nucleic Acid (LNA™; Exiqon, Vedbaek, Denmark)-labelled probes and the protocol described in Trindade et al. (2010). RNA pools were obtained from samples of a preliminary tension wood induction experiment, conducted in 2009 in the same experimental field. In this preliminary experiment, trees were bent so that samples could be harvested three and four weeks later, together with samples from six straight control trees, on 30th July.


Degradome sequencing analysis

Five degradome libraries were generated following the procedures described by Pantaleo et al. (2010): one library from developing xylem (DX) of straight trees (Ctrl), two libraries corresponding to DX of the TW and OW sides after one week of bending (early response) and, finally, two libraries corresponding to DX of the TW and OW sides, obtained by pooling the total RNA of samples collected after three and four weeks of tree bending stress (late response). Each library was produced from pooled samples (three genotypes) of 500 µg total RNA. All libraries were sequenced at BaseClear (Leiden, The Netherlands), using HiSeq from Illumina®. The miRNA:target interactions were identified using PARESNIP (Folkes et al., 2012), with the following inputs: the Illumina sequence reads after cleaning for quality and adaptors, the sequences of the mature miRNA candidates from the MIRCAT predictions and the Eucalyptus gene model catalogue from PHYTOZOME v7.0, retrieved using the BIOMART tool (Smedley et al., 2015). Four independent PARESNIP analyses were conducted for each degradome library. The default plant parameters proposed in PARESNIP were adopted, including 100 shuffles for p-value calculation and a cut-off p-value of 0.05. Cleaved target transcripts were validated only if involved in cleavage events of signal category 0-2 revealed in at least two PARESNIP analyses and associated with a p-value < 0.05 in at least one degradome library.
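The validation rule can be expressed as a small Python filter. This sketch rests on one possible reading of the criteria: an interaction is kept when category 0-2 hits are reported by at least two distinct analyses and at least one of those hits has a p-value below 0.05. The record layout and field names are hypothetical and do not correspond to the PARESNIP output format.

```python
from collections import defaultdict

MAX_CATEGORY = 2         # signal categories 0-2 are accepted
MIN_SUPPORTING_RUNS = 2  # at least two PARESNIP analyses
P_CUTOFF = 0.05          # p-value cut-off stated in the text


def validated_interactions(hits):
    """Return the set of (mirna, transcript) pairs passing the rule.

    `hits` is an iterable of dicts with the hypothetical keys
    'mirna', 'transcript', 'library', 'run', 'category' and 'p_value'.
    """
    supporting_runs = defaultdict(set)
    significant = set()
    for hit in hits:
        if hit["category"] > MAX_CATEGORY:
            continue                                   # weak cleavage signal
        key = (hit["mirna"], hit["transcript"])
        supporting_runs[key].add((hit["library"], hit["run"]))
        if hit["p_value"] < P_CUTOFF:
            significant.add(key)
    return {
        key
        for key, runs in supporting_runs.items()
        if len(runs) >= MIN_SUPPORTING_RUNS and key in significant
    }
```

Requiring support from more than one analysis guards against interactions that appear only because of the random shuffles used for p-value estimation.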

The genomic distribution of predicted miRNA and degradome-validated target genes, as well as the miRNA-mediated cleavage events listed after degradome analysis, were graphically displayed using CIRCOS, a circular genome data visualization software (Krzywinski et al., 2009).


Functional characterization of differentially expressed genes and miRNA target genes

Gene ontology-driven analyses were performed in parallel for functional classification of miRNA target genes revealed by degradome analysis in each sRNA library. BLAST2GO (Conesa et al., 2005) categorization was performed and differences between conditions were highlighted by running an enrichment test analysis (Fisher's Exact Test with Multiple Testing Correction of FDR) using the reference genome as test-set. Redundant and low level GO terms were replaced by higher level semantically similar GO-slim terms using the REVIGO web service (Supek et al., 2011).
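The multiple testing correction can be illustrated with a short Python sketch of the Benjamini-Hochberg procedure. That BLAST2GO applies this particular FDR method is an assumption here, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A GO term is then retained when its adjusted p-value falls below the chosen cut-off.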


Results

In silico prediction and annotation of E. globulus miRNA

Eight small-RNA libraries were constructed from tension (TW) and opposite (OW) differentiating wood tissues sampled at one, two, three and four weeks after bending E. globulus trees. A ninth library was generated from wood sampled on straight trees. Each of the nine small-RNA libraries, sequenced using the Solexa technology (Illumina®), generated a similar number of reads (3 to 4 million unfiltered sRNA sequences), totalling 66.9 million reads (Fig. 6.1, Table S6.1a).

Sequential application of filters for adaptor removal, sequence quality, length (Fig. 6.1; Filter-1) and coding potential (Filter-2) resulted in a total of 32.1 million filtered reads, which were mapped against an E. globulus draft genome sequence (Paiva et al., unpublished) (Filter-3), generating a total set of 1,617 non-redundant potential precursors (Table S6.1b). Conserved miRNA were identified by MIRCAT based on full (100%) sequence homology with at least one mature miRNA from miRBase v.21 (Kozomara and Griffiths-Jones, 2014). Based on this annotation, 63 pre-miRNA generating 39 guide and 34 distinct star sequences, grouped in 22 conserved families, were identified (Fig. S6.3; Table S6.1b). These conserved miRNA were highly expressed throughout the tension wood induction kinetics (Table S6.2a-b).


Fig. 6.1 – Workflow for miRNA prediction.

The remaining 1,554 potential pre-miRNA were carefully evaluated based on the presence of miRNA* and on the visual inspection of their secondary structure. This procedure resulted in the identification of 69 additional pre-miRNA, totalling 89 distinct mature miRNAs absent from miRBase v.21. Altogether, 132 pre-miRNA generating 162 mature miRNA were identified (Fig. 6.2 and Fig. S6.2). Of these (Table S6.1b), only 55 were previously predicted in Eucalyptus (Levy et al., 2014; Myburg et al., 2014; Pappas et al., 2015; Victor, 2007) (Table S6.3). A total of 77 miRNA identified in this study were described neither in miRBase v.21 nor in previous studies in Eucalyptus (Levy et al., 2014; Myburg et al., 2014; Pappas et al., 2015; Victor, 2007). All 132 E. globulus pre-miRNA mapped against the eleven scaffolds of the E. grandis genome, allowing the identification of their homologues in this genome (Fig. 6.3).

Figure 6.2 – Predicted secondary structures for 15 Eucalyptus globulus miRNA precursors identified from the non-coding transcriptome of E. globulus developing xylem tissues. These miRNA precursors provided a total of eleven distinct mature guide miRNA sequences (blue) and eleven distinct star sequences (red). The hairpin secondary structures were generated with the RNA FOLDING/ANNOTATION tool from the UEA Small RNA Workbench (Stocks et al., 2012).


Figure 6.3 – Physical position of miRNA identified in developing xylem tissues. Dark green loci indicates conserved miRNA described in miRBase v.21; light green indicates conserved miRNA previously reported in Eucalyptus; black loci indicates new miRNA previously reported in Eucalyptus and blue loci indicates new miRNA, never reported. A total of 132 loci (pre-miRNA) were mapped. The physical map was generated in MAPCHART 2.2 (Voorrips, 2002).

The stem-loop lengths of the E. globulus pre-miRNA (119 nt on average) were within the size range reported for other plant species (100-200 nt) (Meyers et al., 2008). The 162 mature miRNA were distributed in five size classes, with 53% and 23% displaying a size of 21 nt and 24 nt, respectively (Fig. 6.4a). The size distribution did not change significantly throughout the bending kinetics (Fig. S6.4); however, transcript abundance varied along the bending kinetics (Fig. 6.4b; Fig. S6.5a-i).

For instance, after one week of bending there was a considerable increase in transcripts of the 20 nt size class (up to 37-40% of all transcripts; Fig. 6.4b). This increase was mostly associated with a single candidate, egl-miRu-F36-5p, ranked as the third most abundant miRNA in our dataset (Fig. 6.5; Table S6.2a-b). The 20 nt class became residual in the longer bending periods, in which transcripts of the 21 nt class prevailed instead (82-94%).

The vast majority (94.2%) of the reads were assigned to only 15 mature sequences, including 8 members of six conserved families (miR159, 166, 167, 168, 169, 319, 396), one new miRNA (egl-miRu-F36-5p) and two other miRNA which were also not found in miRBase v.21 but have been previously described in Eucalyptus (egl-miRu-K65-3p and egl-miRu-H46-3p) (Fig. 6.5; Table S6.2a). These fifteen miRNA presented highly variable expression profiles along the tension wood kinetics (Fig. 6.5; Table S6.2b). The most abundant mature miRNA are members of two families, miR159 (egl-miR159a-3p; egl-miR159b-3p, 23.6% of the reads) and miR169 (egl-miR169b-5p; egl-miR169c-5p; egl-miR169d-5p; egl-miR169f-5p; egl-miR169h-5p, 17% of the reads). The third most abundant was the new miRNA egl-miRu-F36-5p (11.1% of the reads). This miRNA (20 nt) was only detected in the first week after bending, in both tension and opposite wood differentiating xylem (Table S6.2b; Fig. 6.5).


[Figure 6.4 panels: (a) number of distinct miRNA by length of mature miRNA (20, 21, 22, 23 and 24 nt); (b) percentage of transcripts in each length class (20 to 24 nt) per tissue (Ctrl, TW1w, OW1w, TW2w, OW2w, TW3w, OW3w, TW4w, OW4w).]

Figure 6.4 - Distribution of distinct Eucalyptus globulus miRNA by length classes (nt) according to (a) number and (b) relative transcript abundance. Tension wood (TW) and opposite wood (OW) tissues were sampled in trees following one (1w), two (2w), three (3w) and four (4w) weeks of trunk bending. Ctrl tissues were sampled from straight trees.

Figure 6.5 – The 15 most abundant mature miRNA in the transcriptome of Eucalyptus globulus developing xylem tissues. Profiles of transcript abundance throughout nine developing xylem tissues (conditions) are presented. The fifteen most abundant miRNA are displayed in descending order, with the most abundant presented on the left side. Tension wood (TW) and opposite wood (OW) tissues were sampled in trees following one (1w), two (2w), three (3w) and four (4w) weeks of trunk bending. Ctrl tissues were sampled from straight trees.


As a first step to validate the miRNA identification strategy, the expression and size of six miRNA, including four members of three conserved miR families (miR167_1, miR172 and miR396) as well as egl-miRu-F38-3p and egl-miRu-F38-5p*, already reported by Levy et al. (2014), were positively verified by northern blotting (Fig. S6.6). RT-qPCR analyses were used to validate eleven miRNA that together represented 42% of miRNA reads (Table S6.4). This set includes five of the fifteen most highly expressed candidates, but the expression of low-abundance candidates such as egl-miRu-K66-5p, associated with 18 filtered sRNA reads (0.002%), was also validated.

Identification of E. globulus miRNA targets

The analysis of the sequencing data from the five developing xylem degradome libraries using PARESNIP (Folkes et al., 2012) (Table S6) led to the identification of 246 distinct high-confidence miRNA:target gene interactions (Table S6.5a). These interactions involved 78 miRNA and 213 target transcripts encoded by 135 target genes (Fig. 6.7). The majority (88%) of these interactions are “trans” miRNA-mRNA interactions, that is, they involve pairs of miRNA and target transcripts encoded by genes located on distinct chromosomes (Table S6.5a; Fig. 6.6). Cooperative miRNA-mediated gene regulation (a single gene regulated by more than one miRNA) was detected for 15% of the target genes, whereas multiplicity of miRNA-mediated gene regulation (a single miRNA mediating the cleavage of transcripts from more than one gene) was detected for 44.3% of the miRNA. Among the 78 mature miRNA having at least one validated target gene, 42 belong to the 22 known miR families predicted in our dataset, and their targets are conserved, whereas the remaining 36 are not present in miRBase v.21 (Table S6.5a). Several genes related to cell wall biosynthesis were identified as targets of both newly identified and previously known miRNA. For example, four lignin toolbox phenylalanine ammonia-lyase (EgrPAL) genes were targeted by egl-miRu-H49-5p (Table S6.5a). Other examples included a basic chitinase (Eucgr.J02518) and a galactose oxidase gene (Eucgr.J02333), both targeted by miR2111 members, another galactose oxidase targeted by a miR394 member, a xyloglucan endotransglucosylase/hydrolase (Eucgr.K00883) targeted by egl-miRu-I51-5p and three laccases targeted by egl-miRu-F38-3p (Table S6.5a).
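The three summary proportions used in this paragraph (trans interactions, cooperative regulation and multiplicity) can be derived from an interaction list as in the Python sketch below. The tuple layout, the function name and the gene-level simplification are assumptions made for illustration; the figures reported in the text were obtained from the degradome data, not from this code.

```python
from collections import defaultdict


def summarise(interactions):
    """Summarise miRNA:target interactions.

    `interactions` is an iterable of hypothetical tuples
    (mirna, mirna_chromosome, gene, gene_chromosome).
    """
    pairs = set(interactions)
    if not pairs:
        raise ValueError("no interactions to summarise")

    # "trans": the miRNA locus and its target gene lie on different chromosomes.
    trans = sum(1 for _, m_chr, _, g_chr in pairs if m_chr != g_chr)

    mirnas_per_gene = defaultdict(set)
    genes_per_mirna = defaultdict(set)
    for mirna, _, gene, _ in pairs:
        mirnas_per_gene[gene].add(mirna)
        genes_per_mirna[mirna].add(gene)

    # Cooperative regulation: one gene targeted by more than one miRNA.
    cooperative = sum(1 for s in mirnas_per_gene.values() if len(s) > 1)
    # Multiplicity: one miRNA targeting more than one gene.
    multiple = sum(1 for s in genes_per_mirna.values() if len(s) > 1)

    return {
        "trans_fraction": trans / len(pairs),
        "cooperative_gene_fraction": cooperative / len(mirnas_per_gene),
        "multiplicity_mirna_fraction": multiple / len(genes_per_mirna),
    }
```

In this simplified form each interaction is counted once per miRNA and gene, whereas the chapter counts interactions at the transcript level.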

Figure 6.6 – Physical mapping of 132 E. globulus miRNA-encoding loci and 135 loci encoding 135 miRNA target genes in the Eucalyptus grandis genome. The physical positions of MIR genes were established by BLASTn sequence homologies. The 246 miRNA:target transcript interactions revealed by degradome analysis are represented by connecting the loci encoding the mature miRNA with the loci encoding the target transcript. Red highlight discriminates differentially expressed miRNA (tension vs. opposite wood tissues) and their associated target transcript interactions. The distribution graphic was generated using CIRCOS software (Krzywinski et al., 2009).


Figure 6.7 - Venn diagrams for comparative analyses of five Eucalyptus globulus degradome libraries. Only high-confidence interactions (class 0, 1 and 2) were considered in this analysis. (a) 246 miRNA:target interactions; (b) 78 miRNA; (c) 213 miRNA target transcripts and (d) 135 miRNA target genes. Degradome libraries were generated from developing xylem from tension (TW) and opposite wood (OW) tissues. Tissues were sampled in trees following one (1w), two (2w), three (3w) and four (4w) weeks of trunk bending. Ctrl tissues were sampled from straight trees. Diagrams were produced with the SUMO package (http://www.oncoexpress.de/software/sumo).

The majority of the miRNA-mediated transcript cleavage events were library-specific (Fig. 6.7a). Indeed, a total of 125 library-specific cleavage events were detected, involving 61 distinct mature miRNA and 122 target transcripts encoded by 77 target genes. Only a small fraction of the interactions detected (12.6% of the interaction pairs) was common to all five libraries (Table S6.5a). These 31 interactions involved 13 miRNA and 25 target genes, 22 of them related to transcription factors such as ARFs (EgrARF10, Egr16B, EgrARF17, ARF6A, and ARF6B) and R2R3-MYBs (EgrMyb22, EgrMyb23, EgrMyb62, EgrMyb88, EgrMyb92) (Table S6.5b-f). Only a few miRNA target genes coding for structural proteins were identified in this common set of interactions, such as a Dicer-1 like gene (Eucgr.D02653) and a microtubule-associated protein (MAP65/ASE1) family protein gene (Eucgr.A02875), targeted by egl-miR162-3p and egl-miR828a-5p, respectively, and one copper/zinc superoxide dismutase gene (Eucgr.B01760), targeted by egl-miRu-C03-1-3p and egl-miRu-G12-2-3p.

Identification of differentially expressed miRNA and functional analysis of their targets

The Fisher Exact Test (with Bonferroni correction, p-value < 8.57E-06) was applied to identify miRNA differentially expressed between TW and OW xylem samples after one, two, three and four weeks of bending. Seventy-one miRNA were differentially expressed in at least one of the four TW vs OW comparisons (Table S6.6). Among those, 16 miRNA were differentially expressed between TW and OW for all bending times, suggesting that they play an important role in the differentiation of the different xylem types, independently of the duration of the gravitropic stress. All 16 of these miRNA were highly expressed (among the 25 most expressed ones), two of them being new miRNA (egl-miRu-D01-3p and egl-miRu-E01-3p).
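The reported threshold appears consistent with a Bonferroni correction over all pairwise library comparisons for all mature miRNA. The short Python check below makes that arithmetic explicit. The family-wise error rate of 0.05 and the assumption that every pair among the nine libraries was counted for each of the 162 miRNA are inferences, not statements from the text.

```python
from math import comb

n_libraries = 9    # sRNA libraries described in the chapter
n_mirna = 162      # mature miRNA identified
alpha = 0.05       # assumed family-wise error rate (not stated in the text)

n_pairwise = comb(n_libraries, 2)   # 36 possible library pairs
n_tests = n_pairwise * n_mirna      # 5832 individual tests
threshold = alpha / n_tests         # Bonferroni-adjusted cut-off

print(f"{threshold:.2e}")           # prints 8.57e-06
```

Under these assumptions the cut-off matches the value quoted above, even though only the four TW vs OW comparisons at matched time points are discussed.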

GO category enrichment tests of the degradome genes targeted by differentially expressed miRNA identified 42 non-redundant GO terms (log10 FDR-adjusted p-value<-1.35) representing 19 Biological Processes (BP), 9 Cellular Components (CC) and 14 Molecular Functions (MF) (Table S6.7a). Among the enriched GO categories, 'Transcription factor activity' was represented by 40 genes (Table S6.7b) belonging to 12 transcription factor (TF) families: SBP, MYB, ARF, NAC, HD-ZIP, C2H2, NF-YA, GRAS, AP2, GRF, bHLH and C3H, known to have important regulatory functions during wood formation and/or in the variability of the final properties of wood (Grima-Pettenati et al., 2012; Wang et al., 2011) (Table S6.8a-b). These 40 TF were found to be post-transcriptionally regulated by 24 mature miRNA. Eighteen of these miRNA are conserved and represent twelve known miR families (miR156, miR159, miR319, miR160, miR164, miR166, miR167_1, miR169_2, miR171_1, miR172, miR393, miR396 and miR828), which target 34 TF genes. Transcription factors targeted by these known families, such as those of the miR160, miR166 and miR167_1 families, were also conserved in other plant species (Table S6.8b). One TF target gene, Eucgr.A02065 (ARF), is cleaved under the mediation of egl-miRu-H47-5p, which is not present in miRBase v.21 but has been reported in Levy et al. (2013). The remaining six TF genes, which represent four TF families, are exclusively targeted by five new miRNA: Eucgr.A02065, Eucgr.E00888 (ARF), Eucgr.H02628 (bHLH), Eucgr.K00844 (C3H), Eucgr.A01418, Eucgr.C00823, Eucgr.D01674, Eucgr.F00097 (GRF), Eucgr.G01060, Eucgr.G01551 (NAC) and Eucgr.K01921 (NF-YA).
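The enrichment cutoff above is expressed on a log10 scale: log10(p) < -1.35 is equivalent to an adjusted p-value below 10^-1.35, roughly 0.045. The sketch below shows how such a filter could be applied, assuming a Benjamini-Hochberg adjustment; the text does not name the FDR procedure used, so that choice and the toy p-values are assumptions.

```python
from math import log10


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR-adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enriched(pvalues, log10_cutoff=-1.35):
    """Flag terms whose log10 adjusted p-value falls below the cutoff."""
    return [q == 0 or log10(q) < log10_cutoff
            for q in benjamini_hochberg(pvalues)]


# Hypothetical raw p-values for four GO terms.
raw = [0.001, 0.01, 0.03, 0.5]
adjusted = benjamini_hochberg(raw)
flags = enriched(raw)
```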

Transcript variation of miRNA

The nine differentiating xylem samples were clustered based on the Euclidean distance calculated from the transcript abundance of the 71 differentially expressed miRNAs (Fig. 6.8). TW and OW samples collected after two, three and four weeks of bending clustered in two distinct groups, while both TW and OW sampled after one week (1w) clustered together with the Ctrl sample, reflecting the transcriptomic proximity of these three types of tissues.


Figure 6.8 – Hierarchical clustering of E. globulus tension wood (TW) and opposite wood (OW) tissue samples based on the transcript abundance of miRNA differentially expressed between TW and OW forming xylem after one (1w), two (2w), three (3w) and four (4w) weeks of bending stress. The hierarchical clustering heat-map was generated with Expander 6.5 (Ulitsky et al., 2010).
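To make the clustering principle concrete, the following sketch groups samples by agglomerative clustering on pairwise Euclidean distances. It is not the Expander implementation: the average-linkage rule is an assumption (the linkage method is not stated here), and the sample profiles are hypothetical toy values chosen so that the one-week samples sit close to the control.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+

# Hypothetical standardized expression profiles: one vector per sample,
# with values for the same ordered set of miRNAs.
profiles = {
    "Ctrl":  [0.1, -0.2, 0.0],
    "TW_1w": [0.2, -0.1, 0.1],
    "OW_1w": [0.0, -0.3, 0.1],
    "TW_3w": [1.5, 1.2, -1.0],
    "TW_4w": [1.6, 1.4, -1.2],
    "OW_3w": [-1.4, 0.9, 1.3],
    "OW_4w": [-1.5, 1.1, 1.2],
}


def average_linkage(c1, c2, data):
    """Mean Euclidean distance over all cross-cluster sample pairs."""
    return sum(dist(data[a], data[b]) for a in c1 for b in c2) / (len(c1) * len(c2))


def agglomerate(data, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [frozenset([name]) for name in data]
    while len(clusters) > n_clusters:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_linkage(pair[0], pair[1], data))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters


groups = agglomerate(profiles, n_clusters=3)
```

With these toy profiles the three resulting groups mirror the pattern described in the text: the control with both one-week samples, the later TW samples, and the later OW samples.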

Considering the tension and opposite wood samples, 40 of the 71 miRNA clustered into two homogeneous groups (Fig. 6.9), each containing 20 miRNAs, whereas 31 miRNAs remained as singletons (Table S6.6). The two clusters displayed very contrasting profiles (Fig. 6.9). The expression levels of the miRNAs of the TW-cluster (Fig. 6.9a) strongly increased along the bending kinetics in TW, while they remained more or less constant in OW. The TW-cluster is composed of miRNAs from the families miR535, miR395, miR319, miR172 and miR169_2, targeting transcripts of annexin 1, Cysteine proteinases superfamily protein, ATP sulfurylase 1, different members of the AP2 family and nuclear factor Y subunit A3, respectively. In addition, the TW-cluster contained one new miRNA targeting two NAC TFs [EgrNAC89 (Eucgr.G01060) and EgrNAC108 (Eucgr.G01551)], both belonging to the phylogenetic E. grandis sub-group IVa (Hussey et al., 2013). In general, the correlation coefficients between the expression levels of the miRNAs of the TW-cluster and those of their targets were close to 0 or positive, suggesting that these miRNAs are implicated in the homeostasis of their targets rather than in their down-regulation in TW (Table S6.9).


The OW-cluster contained miRNAs over-expressed in OW and down-regulated in TW. Among the miRNAs with a validated target, members of the miR160, miR166 and miR828 families were found to target transcripts of auxin responsive factors [EgrARF10, EgrARF16A, EgrARF16B, EgrARF17], homeobox 8 (Eucgr.C00605), and different MYB transcription factors (e.g. EgrMyb22, EgrMyb23, EgrMyb62, EgrMyb67 and EgrMyb88). In this cluster, transcripts of the Microtubule associated protein (MAP65/ASE1) family (Eucgr.A02875) targeted by miR828 members were also found, highlighting the role of miRNA in the control of cytokinesis (Lucas et al., 2011) during OW differentiation. In contrast to the TW-cluster, negative correlations were found for almost all the miRNA-target interactions that involved miRNAs from the OW-cluster (Table S6.9).
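The miRNA-target correlations discussed for the two clusters can be illustrated with a short sketch. It assumes a Pearson correlation computed across the sampled time points; the type of coefficient is not specified here, and the expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (norm_x * norm_y)


# Hypothetical abundances over four time points: a miRNA that rises
# while its target transcript falls (canonical cleavage pattern).
mirna_levels = [1.0, 2.0, 3.0, 4.0]
target_levels = [8.0, 6.0, 4.0, 2.0]
r_canonical = pearson(mirna_levels, target_levels)

# A target that rises together with the miRNA instead.
r_positive = pearson(mirna_levels, [1.0, 1.5, 2.5, 3.0])
```

A coefficient near -1 is consistent with target down-regulation by the miRNA, whereas values near 0 or above are the pattern the text interprets as target homeostasis.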

The singletons dataset comprised the differentially expressed miRNAs that were assigned neither to the TW-cluster nor to the OW-cluster. These miRNAs showed a roughly constant overexpression in OW forming tissues along the kinetics, but were found down-regulated in OW and up-regulated in TW forming tissues at three weeks of bending. This dataset of singletons comprised miRNAs targeting transcripts from genes related to cell wall biosynthesis, such as a glycine-rich protein and the Trichome birefringence-like (TBL) 25 protein, proposed to encode wall polysaccharide specific O-acetyltransferases (Gille et al., 2011), and signaling genes such as a leucine-rich repeat receptor kinase HAESA-like 1.


[Figure 6.9 plots. Y-axis: expression level (standardized). X-axis: samples S1WT, S2WT, S3WT, S4WT, S1WO, S2WO, S3WO, S4WO.]

a) TW-cluster: egl-miR166e-1-3p; egl-miR167c-5p; egl-miR169b-5p,+4; egl-miR172-3p; egl-miR172-5p*; egl-miR319e-5p-2*; egl-miR395a-3p,+2; egl-miR535a-5p,+1; egl-miRu-C05-3p; egl-miR396b-3p; egl-miRu-D06-5p; egl-miRu-E01-3p; egl-miRu-E31-1-3p; egl-miRu-E32-5p,+2; egl-miRu-F35-5p; egl-miRu-I57-5p; egl-miRu-I58-3p-1*; egl-miRu-I58-5p; egl-miRu-J60-5p; egl-miR393b-3p; average TW.

b) OW-cluster: egl-miR156a-5p,+3; egl-miR160a-5p,+1; egl-miR160b-3p*; egl-miR160b-5p,+1; egl-miR164a-5p,+1; egl-miR166a-3p,+3; egl-miR166b-5p-1*; egl-miR166c-3p,+1; egl-miR171a-3p; egl-miR319b-3p,+1; egl-miR319d-3p,+1; egl-miR828a-5p,+1; egl-miRu-C10-5p; egl-miRu-H44-3p; egl-miRu-H50-3p; egl-miRu-I53-3p; egl-miRu-I55-5p; egl-miRu-K65-3p; egl-miRu-K65-5p-2*; egl-miRu-K65-5p-3*; average OW.

Figure 6.9 - Clusters of miRNA with homogeneous expression profiles in the E. globulus (a) tension wood (TW) and (b) opposite wood (OW) kinetics. Tissues were sampled in trees following one (1w), two (2w), three (3w) and four (4w) weeks of trunk bending. The clusters, with an average homogeneity of 0.783 and an average separation score of -0.157, were generated with the CLICK algorithm in Expander 6.5 (Ulitsky et al., 2010).
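The standardized expression levels plotted in this figure can be illustrated with a z-score transformation of each miRNA profile across the eight samples. This is a sketch under assumptions: it uses a mean of 0 and a population standard deviation of 1 per profile, which may differ in detail from the standardization applied by the clustering software, and the input values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(profile):
    """Rescale one expression profile to mean 0 and unit (population) SD."""
    mu = mean(profile)
    sigma = pstdev(profile)
    if sigma == 0:
        # A flat profile carries no shape information.
        return [0.0] * len(profile)
    return [(value - mu) / sigma for value in profile]


# Hypothetical normalized read counts for one miRNA in the eight samples
# (S1WT, S2WT, S3WT, S4WT, S1WO, S2WO, S3WO, S4WO).
raw_profile = [2, 4, 4, 4, 5, 5, 7, 9]
z_profile = standardize(raw_profile)
```

Standardizing each profile puts highly and weakly expressed miRNAs on the same scale, so that clustering reflects the shape of the profile rather than its absolute abundance.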


Discussion

In this study, we used Eucalyptus globulus tension wood induction kinetics as a model to provide new clues on the miRNA-mediated post-transcriptional regulation of the differentiation of tension and opposite wood. A total of 132 pre-miRNA, corresponding to 162 distinct mature miRNA, were identified following the strict guidelines for miRNA classification (Meyers et al., 2008). This dataset considerably enriched the miRNA catalog for Eucalyptus species, as 66% of the miRNA identified in this study have never been described before. Despite the worldwide importance of Eucalyptus, there is no miRNA deposited in miRBase (v.21) for this genus, and only a few studies have been published to identify miRNA in a restricted number of its species (Hudson et al., 2014; Levy et al., 2014; Myburg et al., 2014; Pappas et al., 2015; Victor, 2007). However, none of these studies addressed xylogenesis under gravitropic stress and the consequent formation of tension wood. The analysis of degradome libraries revealed that the majority of the cleavage interactions detected were library-specific, and fewer than 10% of the miRNA targets were identical to those identified in the E. grandis leaf degradome libraries reported by Pappas et al. (2015). These results highlight the spatio-temporal complexity of the miRNA post-transcriptional regulation of gravitropic responses.

Time-specific miRNAs precisely modulate post-transcriptional regulation of tension wood induction

The temporal differences in expression observed along the bending experiment are in accordance with previous observations of the presence of tension wood after 14 days of bending (Carocha, unpublished; Paux et al., 2005; Mizrachi et al., 2015), the samples collected after one week of bending being closer to those collected from non-stressed trees. The identification of time-specific miRNA such as egl-miRu-F36-5p supports the hypothesis of a tight temporal post-transcriptional modulation of gene expression during tension wood induction. Egl-miRu-F36-5p was the third most expressed miRNA along the kinetics, but was only detected in the first week after bending in both tension and opposite wood differentiating xylem, suggesting an important role in the early response to bending stress. Egl-miRu-F36-5p targeted the transcripts of a hypothetical membrane magnesium transporter (Eucgr.A01688) that was found highly expressed in E. grandis immature xylem. The expression of this gene is highly correlated (0.992) with the expression of an ethylene response sensor related protein transcript (Eucgr.H03145) (www.phytozome.net). Recently, Niu et al. (2015) showed that the elongation and directional growth of primary roots in A. thaliana could be interactively modulated by phosphorus and magnesium. Ethylene has long been known to be produced during tension wood formation (Andersson-Gunnerås et al., 2003; Du and Yamamoto, 2003). Changes in internal ethylene concentrations are known to stimulate cambial growth, apparently by promoting radial cell growth and tension wood formation in poplar (Andersson-Gunnerås et al., 2003).

The spatial plasticity of post-transcriptional regulation by miRNA was also evidenced in this study, as 43% of the miRNA were found differentially expressed between tension and opposite wood forming tissues in at least one time point of the tension wood induction kinetics. Among these miRNA, it was possible to assign 20 miRNA to each of the two homogeneous expression clusters associated with the tension and opposite types of differentiating xylem.

miRNAs up-regulated in tension wood are involved in mineral homeostasis of this tissue

The miRNAs assigned to the TW-cluster were up-regulated in this tissue type after two weeks of bending, despite their generally low expression. Their targets suggest that they are likely implicated in different abiotic responses, in particular in maintaining mineral homeostasis in such a highly active tissue. For instance, the miR169_2 family is known to be involved in gravitropic stress in poplar (Puzey et al., 2012) and in cotton fiber elongation (Xue et al., 2013). This family was also found to be involved in environmental abiotic stress responses and in phosphorus and nitrogen homeostasis (Lu and Huang, 2008; Luan et al., 2015; Pant et al., 2009). Additionally, this cluster included members of the miR172 and miR395 families, whose expression correlated negatively with the expression of their targets, the unknown function gene Eucgr.J00901 and a homolog of the E. grandis ATP sulfurylase 1 (Eucgr.H03408.1), respectively. In poplar, miR172 was found up-regulated in tension wood forming tissues (Lu et al., 2005). miR172 members are also deeply involved in plant responses to abiotic stresses (Khraiwesh et al., 2012; Zhou et al., 2010), including iron (Fe) deficiency stress (Kong and Yang, 2010). In A. thaliana, miR395 cleaves mRNAs encoding the ATPS1 and ATPS4 isoforms in both leaves and roots (Kawashima et al., 2011), and the negative regulation of ATPS1 could be particularly significant for the maintenance of the status of S-compounds and S-N homeostasis (Anjum et al., 2015) in the highly active tension wood differentiating tissues.

The expression of Class III HD-ZIP, ARF and R2R3-MYB transcription factors is negatively modulated by miRNAs in opposite wood

In contrast, the opposite wood cluster consisted of miRNAs whose expression was up-regulated after two weeks of bending stress. In general, these miRNA correlate negatively with their target genes, which suggests an active role in the canonical down-regulation of transcription factors involved in cell division and meristem differentiation, auxin signalling and cell fate:


• Four mature egl-miR166 family members overexpressed in opposite wood were found to mediate the cleavage of transcripts homologous to four A. thaliana Class III HD-ZIP genes [Eucgr.B02504 (REVOLUTA/IFL1), Eucgr.C00605 (ATHB8), Eucgr.D00184 (PHABULOSA/ATHB14) and Eucgr.F03066 (CORONA/ATHB15)], but not of PHAVOLUTA/ATHB9. In Arabidopsis, the Class III HD-ZIP transcription factors are known to promote xylem differentiation and to coordinate cell expansion and cell maturation (Ilegems et al., 2010; Miyashima et al., 2011; Robischon et al., 2011), and are also post-transcriptionally regulated by members of the miR165/166 family (Emery et al., 2003; Zhong and Ye, 2014). Down-regulation of these target genes in OW could be related to the decrease in cambial activity and to intravascular patterning, such as the reduced vessel tangential diameter observed in opposite wood in the samples used (Carocha et al., unpublished; Chapter 5). In Arabidopsis, miR166 is a regulator of ATHB15, a gene that controls vascular development in xylem tissues and inter-fascicular regions (Kim et al., 2005). PtrHB1, a close homolog of the Arabidopsis REVOLUTA gene, has been closely associated with poplar wood formation; it is regulated both developmentally and seasonally, and its expression is inversely correlated with the level of Pta-miR166 (Ko et al., 2006). Also in poplar, the PtrHB7/PtrHB8 genes, which are closely related to the A. thaliana AtHB-8 (Baima et al., 2001), play a positive role in the promotion of vascular cambium differentiation during secondary growth (Zhu et al., 2013). In addition, the PtrHB3/PtrHB4 genes, homologous to the Arabidopsis ATHB14 that is involved in the determination of vascular patterning (Carlsbecker et al., 2010; Furuta et al., 2012; Miyashima et al., 2011), were only found in cambium tissue (Ko et al., 2006). In poplar, PtrHB3/PtrHB4 are known to interact with ARBORKNOX2 (ARK2) (Liu et al., 2015), which modulates the gravitropic response in this species (Gerttula et al., 2015).


• Auxin-Responsive Factor (ARF) genes are important actors of the auxin-signalling pathway, as they regulate the transcription of auxin-responsive genes (Yu et al., 2014). As previously predicted (Yu et al., 2014), four ARF genes (EgrARF10, EgrARF16A, EgrARF16B and EgrARF17) were found in this study to be targeted by, and negatively correlated with, the mature miR160 members (egl-miR160b-5p; egl-miR160c-5p), which were significantly overexpressed in opposite wood forming tissues. Auxins have been implicated in the differentiation of tension wood in trees and in the induction of the gelatinous G-fibres characteristic of this type of wood (Gerttula et al., 2015; Hellgren et al., 2004; Timell, 1986). Recently, Gerttula et al. (2015) proposed that, in poplar, the signals generated by PIN3-mediated lateral auxin transport towards the ground are responsible for eliciting the differential growth responses of the cambium to gravistimulation on the upper (tension wood) versus lower (opposite wood) sides of the stem. Moreover, HD-ZIP gene expression is known to be positively regulated by auxin, while HD-ZIP activity promotes polar auxin transport (PAT) (Baima et al., 2001; Ilegems et al., 2010; Schuetz et al., 2013; Turchi et al., 2015).

• R2R3-MYB transcription factors regulate many aspects of plant biology, such as primary and secondary metabolism, cell fate and developmental processes (Dubos et al., 2010; Jin and Martin, 1999; Stracke et al., 2001). Five Eucalyptus R2R3-MYB genes (EgrMYB22, EgrMYB23, EgrMYB62, EgrMYB67 and EgrMYB88) were targeted by the differentially expressed miR828 members that belong to the OW-cluster. All five of these R2R3-MYB genes have been associated with higher expression in Eucalyptus cambial tissues (Soler et al., 2015). In particular, EgrMYB88, specifically and highly expressed in vascular tissues, was recently described as a regulator of the biosynthesis of some phenylpropanoid-derived secondary metabolites, including lignin, in cambium and in the first layers of differentiating xylem (Soler et al., 2016). Conserved and complex MYB-mediated regulatory networks involving post-transcriptional regulation of the expression of a subset of R2R3-MYB genes by miR159 and miR828 have already been described (Li and Lu, 2014). In Arabidopsis, miR828 regulation of MYB genes influences trichome and fiber development (Guan et al., 2014). In cotton, GhMYB2D mRNA accumulates during fiber initiation and is targeted by miR828, generating trans-acting siRNAs (ta-siRNAs) of the TAS4 family (Guan et al., 2014). In addition, members of egl-miR159 were also found up-regulated in opposite wood forming tissues after two weeks of bending stress. These miRNAs targeted a long-chain acyl-CoA synthetase (ACS) gene (Eucgr.H04493) and, in the case of egl-miR828, a MAP65/ASE1 gene (Eucgr.A02875). In poplar, miR159 members were described as mechanical stress responsive (Puzey et al., 2012) and up-regulated in OW tissues (Lu et al., 2005). ACS are critical enzymes in lipid degradation and synthesis (Yu et al., 2014), while MAP65/ASE1 proteins are major cross-linkers that promote microtubule flexibility, a critical feature in the formation and plasticity of the complex microtubule networks that impact fundamental cellular processes (Portran et al., 2013), such as cell division and the microfibril orientation of cell walls.

Altogether, the overexpression of these miRNAs in opposite wood highlights the role of post-transcriptional regulation in the meristematic activity observed in opposite wood, through the targeting of important genes involved in cytokinesis and in membrane and cell-wall remodelling.

New miRNAs target important genes related to cell wall biosynthesis

Some of the new miRNAs seem to be involved in the regulation of important structural genes of cell wall biosynthesis. For example, egl-miRu-I51-5p targets one xyloglucan endotransglucosylase/hydrolase (XET) gene and one DUF538 domain protein. XET activity is essential for G-fiber shrinking since it repairs xyloglucan cross-links between the G and the S2 layers, thus maintaining their contact (Nishikubo et al., 2007). DUF538 domain proteins have been suggested to be implicated in general stress responses (Gholizadeh and Baghban Kohnehrouz, 2010) such as the heat response (Li et al., 2013). These proteins have also been implicated in the elevation of redox enzyme activities including catalase, peroxidase, polyphenol oxidase and phenylalanine ammonia lyase (Gholizadeh, 2011). The miRNA egl-miRu-H49-5p targeted several homologues of the E. grandis PAL family, recently identified by Carocha et al. (2015). Although egl-miRu-H49-5p was not differentially expressed in TW versus OW, its role in the post-transcriptional regulation of different EgrPAL homologues might lead to compensatory responses of other EgrPAL genes, like those induced in poplar by using artificial miRNAs (Shi et al., 2010b), contributing to the regulation of lignification in these tissues.


Chapter 7

General discussion and perspectives

Main achievements

Genome sequencing projects have significantly improved our understanding of the blueprints of many plant species. These large-scale projects have provided full catalogs of protein-coding and non-coding genes in a genome-wide context. Mining this information has greatly impacted many areas of plant science.

Building on the recent availability of the fully annotated Eucalyptus grandis genome sequence (Myburg et al., 2014), this thesis aimed to obtain deeper insights into the critical actors and gene expression regulation mechanisms during Eucalyptus xylogenesis.

Whole-genome surveys led to the identification of important structural and transcriptional regulatory genes associated with cell wall biosynthesis in Eucalyptus, such as those involved in monolignol biosynthesis (Chapter 2) and the R2R3-MYB family (Chapter 4). In these studies, most of the 175 members of eleven multigene families involved in the biosynthesis of monolignols (Chapter 2; Carocha et al., 2015) and most of the 141 members of the R2R3-MYB family (Chapter 4; Soler et al., 2015) were described for the first time in Eucalyptus grandis. In addition, these studies revealed for the first time the genomic architecture of twelve important multigene families in the context of xylogenesis. This ground information will be fundamental for future functional studies aimed at a better characterization of the effective role of genes implicated in the structural and regulatory events leading to wood formation.


Developmental and stress-induced responses of individual genes were profiled in wide-ranging panels of tissues and conditions using technologies such as RT-qPCR and RNA-Seq. Given the ecological and economic relevance of lignin in Eucalyptus, both studies were then oriented towards the identification of a restricted set of members most likely involved in vascular lignification. A core Eucalyptus vascular lignification toolbox was proposed, integrating 17 genes most likely involved in vascular lignification (Fig. 2.6; Carocha et al., 2015), given their strong and preferential expression in highly lignified vascular forming tissues. For instance, of the nine newly identified PAL genes, gene expression profiling revealed only two bona fide PAL genes (EgrPAL3 and EgrPAL9) likely involved in vascular lignification. All eleven multigene families were represented in the core set (EgrPAL3, 9; EgrC4H1, 2; Egr4CL; EgrHCT4, 5; EgrC3’H3, 4; EgrCSE; EgrCCoAOMT1, 2; EgrCOMT1; EgrF5H1; EgrCCR1; EgrCAD2, 3). Additionally, a restricted set of R2R3-MYB genes was putatively involved in secondary cell wall formation (Figs. 4.4 and 4.5; Soler et al., 2015). These include both newly described genes such as EgMYB60 (sub-group SAtMYB103), EgMYB31 (SAtMYB46) and EgMYB82 (SAtMYB85), among others, and functionally well-characterized genes such as EgMYB1 (S4) (Legay et al., 2007, 2010) and EgMYB2 (SAtM46) (Goicoechea et al., 2005).

Other important outcomes were the novel insights into the evolutionary histories of those twelve multigene families, inferred from comparative phylogenetic analyses (Chapters 2 and 4). In particular, tandem gene duplication, considered a primary mechanism of genome evolution for functional diversification, was found to be the most prominent mechanism contributing to multigene family expansion (Myburg et al., 2014). Remarkably, E. grandis presents the highest rate of tandem gene duplications (34% of all genes) among all plant species with publicly available, fully sequenced and annotated genomes (Myburg et al., 2014). At the time, such abundant duplications challenged the task of genome assembly and could lead to the erroneous identification of duplicated gene loci. Therefore, an international collaborative effort was launched to validate the Sanger sequence v1.0 genome assembly for E. grandis. The study presented in Chapter 3 was one of several contributions to validate the genome sequence assembly and the tandem duplications. Using a BAC clone strategy, we were able to experimentally validate the tandem organization of eight highly similar O-methyltransferase (OMT) genes (Fig. 3.3). This study was later published as part of the reference genome article for this species (Myburg et al., 2014).

Tandem duplicated genes tend to be more frequently retained during evolution when they are involved in responses to environmental stimuli (Hanada et al., 2008). Therefore, the patterns of expression divergence revealed between members of tandem duplicated groups suggested that this gene proliferation mechanism might contribute to the morphological diversification in Eucalyptus, a well-recognized characteristic of this large genus (Holman et al., 2003), and to the wide adaptation ability of many Eucalyptus species to their challenging and competitive environments. In these multiple stress scenarios, the existence of a multilayered, sophisticated and flexible control of gene expression would confer additional competitive advantages. In addition, tandem duplication may also contribute to putative specializations related to unique aspects of Eucalyptus biology, such as floral development (Cao et al., 2015; Vining et al., 2015), secondary metabolites (Külheim et al., 2015) and defense (Christie et al., 2015). Since the extensive impact of these duplications has been considered one of the most remarkable features of the E. grandis genome, further research on this theme should be expected from the international Eucalyptus community, such as questions related to tandem and segmental duplications of multigene families in E. grandis (Li et al., 2015) and evolutionary aspects of lignin multigene families (Labeeuw and Martone, 2015).

Tandem gene duplications were shown to extensively impact the shape, size and, likely, the biological functions of some multigene families involved in E. grandis monolignol biosynthesis (Chapter 2; Carocha et al., 2015). While the EgrPAL, EgrHCT, EgrC3’H and EgrCOMT families present high rates of tandem duplication, other multigene families showed no traces of such influence and instead harbor low- or even single-copy genes, such as the EgrC4H, Egr4CL, EgrCSE, EgrCCoAOMT, EgrF5H, EgrCCR and EgrCAD families. This may result from a selection pressure to preserve these families as singletons or at low copy number. For instance, C4H and F5H are known to have critical functions in regulating lignification and determining lignin monomer composition (Franke et al., 2000), and in a large spectrum of plant species these families consistently present a low number of members (Xu et al., 2009). Our results also showed that none of the 17 genes of the core lignification toolbox was shaped by tandem gene duplication (Fig. 2.6; Carocha et al., 2015).

Similarly, tandem gene duplication unequally affected several subgroups of the large family of R2R3-MYB transcription factors. Woody-expanded (S5, S6 and SAtM5) and woody-preferential subgroups (WPS-I, II, III, IV and V) of R2R3-MYB genes were found to be particularly impacted by tandem duplication (Chapter 4; Soler et al., 2015). In fact, 94% of the R2R3-MYB genes of E. grandis that are spatially arranged in tandem arrays belong to subgroups either expanded or preferentially found in woody species (Fig. 4.3). Since some of these genes might be involved in the regulatory control of the phenylpropanoid pathway, those duplications may have contributed to a diversified regulation of this pathway, which may be specific to woody perennials or even to Eucalyptus (Chapter 4). In addition, some genes from these subgroups showed higher and preferential expression in cambial tissues, suggesting a putative role in the regulation of specific aspects of cambium development and metabolism, and therefore in secondary growth [Fig. 4.5; e.g. EgMYB36, 37, 38, 41, 47, 85 (S5); EgMYB19, 20, 26, 89 (WPS-III)].

In the second part of this thesis (Chapters 5 and 6), the phenotypic and molecular plasticity of xylogenesis in response to gravitropic stress was studied in E. globulus. Part of the novelty of these studies consisted in the implementation of a field experiment to induce the formation of tension wood (TW) and opposite wood (OW) (Fig. 5.1) in five-year-old E. globulus trees subjected to 1, 2, 3 and 4 weeks of gravitropic stress. Also in contrast to other studies (e.g. Aguayo et al., 2010; Mizrachi et al., 2015; Paux et al., 2005), biological variation was accounted for by using three unrelated genotypes as biological replicates, allowing a more robust identification of genes differentially accumulating in TW and OW.


Figure 7.1 – A summary of distinctions between tension wood (TW) and opposite wood (OW) tissues from Eucalyptus globulus. The distinctions summarize the results of the anatomical characterizations (histochemistry), chemical characterizations (analytical pyrolysis), full coding-transcriptome sequencing (mRNA-Seq) and non-coding transcriptome sequencing (sRNA-Seq and miRNA analyses).


As expected, gravitropic stress had a considerable impact on the reprogramming of xylogenesis in Eucalyptus globulus, leading to highly contrasting phenotypes between tension and opposite wood differentiating xylem tissues (Chapter 5; Fig. 7.1). The presence of G-layers in TW tissues was well established after two weeks of bending and was spatially extended in the following weeks, in clear contrast with their absence in OW tissues (Fig. 5.2). TW revealed larger and fewer vessels, as well as shorter, narrower fibres with thicker cell walls relative to OW tissues (Fig. 7.1; Chapter 5). Higher contents of cellulose and other carbohydrate metabolites, as well as lower lignin and hemicellulose contents, were observed for TW relative to OW tissues (Fig. 7.1; Chapter 5). Cellulose contents remained relatively constant along the kinetics of TW formation, while in OW a significant reduction was observed between the first and the following weeks. Lignin content significantly increased along the kinetics of OW formation (Table 5.3). In accordance with previous studies in poplar (Andersson-Gunnerås et al., 2006; Gerttula et al., 2015; Lafarguette et al., 2004) and Eucalyptus (Paux et al., 2005; Mizrachi et al., 2015), these results attested to a remarkable phenotypic plasticity of xylogenesis in E. globulus, and therefore to the advantages of using this model to study the dynamic transcriptional and post-transcriptional regulatory mechanisms of xylogenesis.

The response to gravitropic stress induced an important reprogramming of the transcriptome from the beginning of the stress (Chapter 5). RNA-Seq analysis of the TW and OW differentiating tissues collected after 1 week or 3 and 4 weeks of bending stress led to the identification of 88 mRNA-coding genes and five new loci (Table 5.4) that presented differential expression between TW and OW tissues in at least one sampling period.


The transcriptome of E. globulus control (straight) trees does not seem to vary much within the same type of DX tissue over the period used for the reaction wood kinetics. In contrast, strong transcriptomic differences were observed between TW and OW tissues. Indeed, tissue type was found to be the foremost factor contributing to differential transcript accumulation (Fig. 5.5). Curiously, one fasciclin-like arabinogalactan (FLA; Eucgr.J00938) gene was highly overexpressed in TW tissues after 1 week of bending and increased its expression during the late response. Since FLA genes are considered molecular markers of TW in poplar (Andersson-Gunnerås et al., 2006; Gerttula et al., 2015; Lafarguette et al., 2004), it will be interesting to study the role of FLA in xylogenesis in the early response to gravitropic stress. Transcriptomic analysis suggested distinct priorities for carbon partitioning and dynamic fluxes for the reallocation of carbon metabolites between TW and OW tissues. A large number of carbohydrate metabolism genes were found over-expressed in TW (Table 5.5), in accordance with the increased fiber wall thickness and cellulose contents in these tissues, as suggested by anatomical and chemical data (Fig. 7.1). The over-expression of two Exordium-like genes in OW tissues further emphasized the importance of carbon reallocation priorities, as these genes often take part in a regulatory pathway that controls growth and development during carbon- and energy-limiting growth conditions (Schröder et al., 2011). In addition, genes associated with cell wall expansibility and remodeling, hydrolase activities, cambial differentiation, adjustments of osmotic potential and defense were found highly up-regulated in TW, while genes up-regulated in OW were associated with programmed cell death and hydrolysis activity (Fig. 7.1), in accordance with the different cambial activities of these two contrasting tissues.

Well-known important players in transcriptional modulation were also identified in this study (Chapter 5). These included, for example, a KNOTTED-like homeobox gene (Eucgr.D01935) and EgrMYB60 (Eucgr.D01819), a MYB transcription factor, both overexpressed in TW tissues (Table 5.4). In Arabidopsis, KNOTTED-like homeobox is likely a major regulator of cambial differentiation (Liebsch et al., 2014), while EgrMYB60 was putatively involved in transcriptional regulation during secondary cell wall formation (Chapter 4; Soler et al., 2015).

As in previous studies in Eucalyptus, a gene (Eucgr.D01368) encoding a 1-aminocyclopropane-1-carboxylate oxidase-related protein (ACC oxidase) was highly up-regulated in TW tissues throughout the bending kinetics, highlighting the role of ethylene in the gravitropic response and in TW differentiation. ACC oxidases are involved in the biosynthesis of ethylene, which is known to act as an endogenous stimulator of cambial growth during the TW response (Andersson-Gunnerås et al., 2003, 2006; Vahala et al., 2013). The up-regulation of this gene in TW tissues may also affect other aspects of wood formation influenced by ethylene, such as the morphology of xylem cells and the ontogenesis of vessels, fibres and ray parenchyma cells (Little and Savidge, 1987).

The non-coding transcriptome was also assessed to unravel the dynamics of the differential post-transcriptional regulation mediated by miRNAs in TW and OW tissues of E. globulus (Chapter 6). miRNAs were identified by generating and sequencing small RNA libraries from the TW and OW developing xylem samples collected after 1, 2, 3 and 4 weeks of bending stress, using the same experimental design described in Chapter 5 (Fig. 5.1). To experimentally identify the targets and validate the miRNAs, five degradome libraries were generated from DX of TW and OW samples collected after 1 week (early response) or 3 and 4 weeks (late response) of bending, and also from DX of non-bent trees.

The identification of a large number of mature miRNAs (162) and miRNA target genes (135) has considerably enriched our knowledge of the post-transcriptional regulation mediated by miRNAs during xylem differentiation in Eucalyptus. Indeed, this dataset extended the miRNA catalog for Eucalyptus species, as 66% of the miRNAs identified in this study have not been described in the other studies available for this genus (Levy et al., 2014; Myburg et al., 2014; Pappas et al., 2015; Victor, 2007). The majority of the miRNA-mediated transcript cleavage events were library-specific (Fig. 6.7a). An interesting novelty was the post-transcriptional regulation mediated by egl-miRu-H49-5p, which targets four E. globulus homologues of the E. grandis PAL family (EgrPAL3, 4, 6 and 7; Carocha et al., 2015, Chapter 2). It is possible that post-transcriptional regulation of these four E. globulus PAL genes by egl-miRu-H49-5p leads to compensatory responses of other members of the PAL multigene family, contributing to the fine regulation of lignification during xylogenesis. So far, the impact of post-transcriptional regulation of different PAL genes has only been reported in poplar, using an artificial miRNA (amiRNA) technology (Shi et al., 2010b). Those authors reported that the down-regulation of a subset of PAL genes by an amiRNA led to an increase in the transcript abundance of a distinct subset of PAL genes.

As for transcriptome remodeling, tissue type had the predominant effect on the differential accumulation of miRNAs, although there was some evidence that the duration of bending stress also played a role. Among these miRNAs, egl-miRu-F36-5p, the third most expressed miRNA, showed a very specific expression pattern in response to gravitropic stress. Indeed, this miRNA was only detected in the first week after bending, in both tension and opposite wood differentiating xylem. This miRNA targets a hypothetical membrane magnesium transporter gene (Eucgr.A01688), a gene found to be highly expressed in E. grandis immature xylem (EucGenIE, Hefer et al., 2011) and whose expression is highly correlated (r² = 0.992) with that of an ethylene response sensor-related protein transcript (Eucgr.H03145). Taking into account that we found one ACC oxidase overexpressed in TW-forming tissues after 1 week of bending, these results suggest a possible involvement of ethylene signaling and magnesium transport in the early response to bending. Future investigations of this putative network will be required to validate this hypothesis.

Another important outcome was the spatial plasticity of miRNA-mediated post-transcriptional regulation, since a substantial proportion (43%) of the miRNAs were differentially expressed between TW- and OW-forming tissues. Two clusters, each composed of 20 differentially expressed miRNAs, overexpressed in tension (Fig. 6.9a) and/or in opposite xylem (Fig. 6.9b), highlighted the biological relevance of miRNAs. The functional categorization of miRNA target genes (Table S6.7a-b) provided indications of the biological relevance of the post-transcriptional regulation mediated by the miRNAs differentially expressed in tension versus opposite wood tissues (Fig. 7.1).

In TW-forming tissues, miRNA-mediated post-transcriptional regulation seemed to mainly affect mineral homeostasis and the abiotic and mechanical stress responses that occur in these actively proliferating tissues. An example was the downregulation of an ATP sulfurylase and a cysteine proteinase mediated by a member of the miR395 family, which could be particularly significant for maintaining the status of S-compounds and S-N homeostasis (Anjum et al., 2015). In general, despite their low expression, these TW-overexpressed miRNAs presented correlation coefficients between their expression and that of their targets that were close to 0 or positive. Such correlations could be associated with mechanisms that limit or maintain homeostasis of the expression of their targets, as proposed in some studies (Kuo and Chiou, 2011; Xue et al., 2009).

In OW-forming tissues, miRNA-mediated post-transcriptional regulation seemed more directly involved in modulating the expression of target genes. In general, the expression levels of the target genes of the miRNAs over-expressed in OW were reduced and thus negatively correlated with those of the miRNAs. These targets are mainly involved in the regulation of the proliferation and differentiation of cambial meristematic cells. An example was the downregulation of transcripts homologous to four A. thaliana class III HD-ZIP genes. In Arabidopsis, these genes, known to promote xylem differentiation and to coordinate cell expansion and cell maturation (Ilegems et al., 2010; Miyashima et al., 2011; Robischon et al., 2011), are also targeted by members of the miR165/166 family (Emery et al., 2003; Zhong and Ye, 2014). Other examples are different auxin response factors (EgrARF10, EgrARF16A, EgrARF16A, EgrARF17), a NAC (EgrNAC27), homeobox 8 (Eucgr.C00605), and different MYB transcription factors (e.g. EgrMYB22, EgrMYB23, EgrMYB62, EgrMYB67, and EgrMYB88). These genes are targeted by members of the miR160, miR164 and miR828 families, respectively.
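
The sign of the correlation between a miRNA and its target, as discussed above for TW- and OW-overexpressed miRNAs, can be illustrated with a minimal sketch. The expression values below are hypothetical and chosen only for illustration; they are not data from this thesis, and the function shown is a plain Pearson coefficient rather than the specific procedure used in Chapter 6.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / sqrt(sxx * syy)

# Hypothetical normalized expression across four libraries (illustrative only)
mirna_expression = [10.0, 20.0, 30.0, 40.0]
target_expression = [8.0, 6.0, 4.0, 2.0]

r = pearson_r(mirna_expression, target_expression)
# A negative r is the pattern expected when a miRNA directly reduces
# the abundance of its target transcript.
print(f"r = {r:.3f}")
```

A coefficient near 0 or above 0, as reported for the TW-overexpressed miRNAs, would instead point away from simple target clearance.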

Notably, the extensive negative control of cambial differentiation observed in OW tissues occurs in a cellular context in which metabolic resources are mostly being channeled into active xylogenesis in TW. These results are consistent with the variation in anatomical traits and the transcriptomic reprogramming observed in TW- and OW-forming tissues (Fig. 7.1; see also Chapter 5).

Overall, in the context of the dynamic formation of contrasting wood tissues, these results highlighted the existence of flexible transcriptional regulation (Chapter 5) and miRNA-mediated post-transcriptional regulation (Chapter 6). The interplay between these two layers of regulation, which likely modulates cambial activity and affects intravascular radial pattern formation and vascular cell specification, was also evidenced. As expected, most of these regulatory interplays are conserved across plant species, since they involve conserved miRNAs and homologous transcript cleavage events.

Interestingly, a small part of the regulatory interplays involved miRNAs that have never been reported before. Although miRNA-mediated post-transcriptional regulation has so far been reported for only a small number of woody species, these new miRNAs may represent a Eucalyptus-specific component of the regulatory interplays. Notably, some other new miRNAs target structural genes likely involved in the biosynthesis and deposition of lignin, hemicelluloses and other main cell wall structural components. Examples were egl-miRu-H49-5p, which targets four PAL genes, and egl-miRu-I51-5p, which targets a xyloglucan endotransglucosylase/hydrolase 9 gene (Table S6.5a). Therefore, post-transcriptional regulatory events could contribute to the high phenotypic variation of the main cell wall biopolymers occurring between TW and OW in E. globulus.

However, it should be mentioned that these results provide only a partial view of post-transcriptional regulation, as the study focused specifically on miRNA-mediated transcript cleavage events. There is still no indication of which events of miRNA-mediated translational repression occur, or of how prevalent they are. This might explain in part the lack of targets for some predicted miRNAs after degradome sequencing analysis. It is reasonable to assume that those miRNAs may act on their targets primarily at the translational level. Indeed, in recent years, increasing experimental evidence has suggested that the impact of miRNA-mediated translational inhibition has been largely underestimated in plants (Ma et al., 2013). Among the different experimental approaches to evaluate miRNA-mediated translational repression (reviewed in Gu and Kay, 2010; Ma et al., 2013), genome-wide ribosome footprinting coupled to RNA-Seq could be used to experimentally demonstrate events of repression of translation initiation of target mRNAs (Janas and Novina, 2012). The availability of such information will contribute to a better understanding of the regulatory and metabolic impacts of miRNA-mediated regulation during Eucalyptus xylogenesis.

Future perspectives

The genome-wide identification of members of multigene families involved in monolignol and lignin biosynthesis in E. grandis establishes valuable knowledge and a framework for the future characterization of gene families in Eucalyptus and other tree species. Together with the studies on the plasticity of the coding and non-coding transcriptomes of developing xylem formed under a gravitropic stimulus, a large repository of genes and associated information has been generated, which provides important knowledge for future reverse and forward genetic studies. Despite the substantial gene expression profiling work produced in Chapters 2, 4, 5 and 6, which used wide-ranging panels of Eucalyptus tissues, other relevant conditions and alternative genetic backgrounds may be used to better characterize selected genes in particular situations such as drought and cold, as exemplified by other studies (Thumma et al., 2012; Ployet et al., in preparation, respectively). For instance, due to the seasonality of temperature, water availability and light in Portugal and in other temperate regions, it will be very interesting to study the impact of other edapho-climatic conditions on the molecular regulation of xylogenesis in Eucalyptus globulus. It is expected that the plasticity of the coding and non-coding transcriptomes will be substantial, in accordance with that reported in E. grandis (Budzinski et al., 2016; Fernández et al., 2015; Keller et al., 2013) and other tree species (Druart et al., 2007; Ruttink et al., 2007; Schrader et al., 2004; Shim et al., 2014). New relevant protein-coding and non-coding genes are expected outputs of those studies. More importantly, these studies are likely to provide new, important and complementary insights into the nature and the interconnection of the different regulatory mechanisms responding to different abiotic stress stimuli. Among non-coding RNAs, it is well established that plants employ miRNAs as critical post-transcriptional regulators of gene expression, acting in a sequence-specific manner, to respond to the numerous abiotic stresses they face during their growth cycle (Shriram et al., 2016). In fact, the response of miRNAs to environmental stresses has been described as multifaceted, occurring in miRNA-, stress-, tissue- and genotype-dependent manners (Zhang, 2015). Stress conditions may induce or inhibit the expression of miRNAs. Interestingly, some authors suggest that stress-induced miRNAs lead to the down-regulation of negative regulators of the stress response, whereas stress-inhibited miRNAs allow the accumulation and function of positive regulators (Zhang, 2015). Therefore, studies adapted to Eucalyptus biology and its responses to abiotic stresses, which could provide new information on the responsive nature and the changes in expression profile of miRNAs across seasons, will provide valuable elements on the nature and extent of this particular type of post-transcriptional regulation.

Additionally, there is increasing evidence that long non-coding RNAs (lncRNAs) also play important roles in a wide range of biological processes, especially in plant development and the response to stresses (Zhang and Chen, 2013). As in a recently published study in Populus tomentosa (Chen et al., 2015), the genome-wide identification of lncRNAs (greater than 200 nt) and the analysis of their expression could be used to explore the role of these non-coding RNAs in the complex regulatory networks underlying wood formation in Eucalyptus and in the response to abiotic and biotic stresses.

Functional studies will be a valuable follow-up investment if they are designed to discern the precise roles, and to provide new clues on the biological function, of the genes and miRNAs highlighted in this thesis. Functional studies also hold the potential to define future molecular breeding applications for engineering improved wood properties in eucalypts.

Transgenic technology in trees has enabled reverse genetics approaches, which use the activation or suppression of gene expression to investigate function (Sederoff et al., 2009). Gene transfer experiments have long shown that specific genes can have major effects on wood properties (Chiang, 2006). Functional studies might be accomplished by adopting different strategies. The choice of the most adequate strategy will depend on the chosen miRNA and/or its target genes, as well as on the plant transformation system to be used. In Eucalyptus, the lack of a highly efficient, fast, reproducible and stable homologous transformation system has been a long-standing limitation (reviewed in Chauhan et al., 2014; Girijashankar, 2011). However, a few studies demonstrated the potential of this area, as they reported field trials in which the regeneration of healthy trees with stable trait expression was observed (Chauhan et al., 2014). Other authors circumvented those limitations by using orthologous transformation systems, taking advantage of well-established protocols and biological material available from plant model species. Examples included secondary cell wall-related genes cloned in Eucalyptus but whose functional characterization was accomplished in i) Arabidopsis [e.g. to modulate lignin (Legay et al., 2010) and cellulose (Creux et al., 2008)]; ii) tobacco [e.g. to modulate lignin (Rahantamalala et al., 2010) and cellulose (Lu et al., 2008)]; and iii) poplar [e.g. to modulate lignin (Legay et al., 2010)]. Fortunately, a significant advance has recently been reported and might be decisive for the successful implementation of functional studies in Eucalyptus. This progress came in the form of an alternative protocol for a fast, stable, efficient and versatile Eucalyptus homologous transformation system (Plasencia, 2015). With this elegant system, it is now possible to obtain and easily detect co-transformed E. grandis hairy roots using fluorescent markers. Chimeric plants are produced with wild-type aerial parts and transgenic hairy roots. Additionally, the authors reported high average efficiencies and proposed this system as a valid option for medium-throughput functional studies to elucidate the effective role of genes of interest. Another important advantage is that the transformed hairy roots provide biological material that enables complementary studies on protein subcellular localization and gene expression patterns (e.g. RT-qPCR and promoter expression), as well as the modulation of endogenous gene expression.

An interesting idea to explore using transgenic approaches would be to induce the overexpression of egl-miRu-H49-5p. Although it showed no differential expression between TW and OW, this miRNA targets transcripts encoded by four tandem duplicated PAL genes, including EgrPAL3, which belongs to the core set of the lignification toolbox. In such a study, it would be interesting to verify whether the compensatory responses described in poplar for a subset of PAL genes, after artificial miRNA technology was used to down-regulate another subset of PAL genes (Shi et al., 2010b), also occur in Eucalyptus. In theory, the down-regulation of some PAL genes may cause a reduction in total lignin content; however, compensatory responses or pleiotropic effects cannot be ruled out. It is possible that this study could provide further elements to better understand functional redundancy and divergence in the PAL family. Taking into account the results obtained during this thesis, other genes seem to be of particular interest for functional studies, as they are targeted by miRNAs or were highly up-regulated in tension wood. Among them, miR828, which targets several MYB genes, or the new gene loci identified during this thesis could be interesting choices.

Forward genetics approaches based on quantitative and qualitative analysis of natural variation could be used to better understand the genetic architecture and molecular basis of specific traits (Sederoff et al., 2009). Some outputs of this thesis, such as the lignin toolbox or the catalog of genes up-regulated in tension wood, might benefit the understanding and the development of genomic tools to be incorporated into tree breeding programs through allele-specific evaluation and selection. Forward genetics approaches such as population-wide association genetics are feasible for species with high levels of genetic diversity in natural and breeding populations (Neale and Savolainen, 2004). The genus Eucalyptus, like many other forest trees, is essentially undomesticated and holds a high natural genetic diversity that can be explored through forward genetics approaches. Associations of natural allelic variation with improved growth and wood properties are highly sought-after outputs of forward genetic approaches in tree breeding programs. A few studies have already established associations between allelic variants of wood-related genes and quantitative traits in Eucalyptus: CCR with variation in MFA (Thumma et al., 2005), some CesA genes with variation in S/G ratio (Denis et al., 2013), and two pectin methylesterases with solid wood properties (Sexton et al., 2011). Selection of specific alleles of these genes may enable the identification of trees with superior wood quality for breeding and deployment (Sexton et al., 2011). Genome-wide association studies, which detect genomic variation by scanning the whole genome (Silva-Junior and Grattapaglia, 2015); genomic selection, which involves selection decisions based on genomic breeding values estimated as the sum of the effects of genome-wide markers capturing most quantitative trait loci (QTL) for the target trait(s) (Grattapaglia and Resende, 2010); and candidate gene approaches such as the ones followed in this thesis (Chapter 4 and Chapter 5) together represent a great potential benefit for Eucalyptus breeding programs. A recent illustration of the potential of this area was the release of a Eucalyptus multi-species genome-wide 60K SNP chip, which includes 59,222 SNPs (Silva-Junior et al., 2015). According to the authors, this chip is an outstanding tool to address population genomics questions in Eucalyptus and to empower genomic selection, genome-wide association studies and the broader study of complex trait variation in eucalypts.
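
The definition of a genomic breeding value given above (the sum of the effects of genome-wide markers) can be sketched in a few lines. This is only an illustration under stated assumptions: the marker effects, genotype codes and tree names below are hypothetical, and in practice the effects would first have to be estimated from a genotyped and phenotyped training population.

```python
def genomic_breeding_value(genotype, marker_effects):
    """Sum of allele dosages (0, 1 or 2 copies) times estimated marker effects."""
    if len(genotype) != len(marker_effects):
        raise ValueError("one effect is needed per marker")
    return sum(dosage * effect for dosage, effect in zip(genotype, marker_effects))

# Hypothetical effects for three markers (assumed already estimated elsewhere)
marker_effects = [0.5, -0.25, 1.0]

# Hypothetical candidate trees, each coded as allele dosage per marker
candidates = {
    "tree_A": [2, 0, 1],
    "tree_B": [0, 2, 0],
    "tree_C": [1, 1, 2],
}

scores = {
    name: genomic_breeding_value(genotype, marker_effects)
    for name, genotype in candidates.items()
}

# Selection decision: rank candidates from highest to lowest breeding value
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

With a SNP chip of the scale described above, the same sum would simply run over tens of thousands of markers per tree.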

Supplementary data

The following links provide access to the Supplementary files mentioned in Chapters 2-4:

Chapter 2

Fig. S2.1 - S2.7 (corresponds to Supplementary Figures 1-7 in the original article) URL: http://onlinelibrary.wiley.com/store/10.1111/nph.13313/asset/supinfo/nph13313-sup-0001-FigS1-S7.pdf?v=1&s=c125d595de03d1ff7f6976ad0ad69b7c9682e8e0

Table S2.1 - S2.2 (corresponds to Supplementary Tables 1-2 in the original article) URL: http://onlinelibrary.wiley.com/store/10.1111/nph.13313/asset/supinfo/nph13313-sup-0002-TableS1-S2.xlsx?v=1&s=a0654b3fd4bd8d2914fb71a144fa1319db75b22a

Table S2.3 (corresponds to Supplementary Table 3 in the original article) URL: http://onlinelibrary.wiley.com/store/10.1111/nph.13313/asset/supinfo/nph13313-sup-0003-TableS3.xlsx?v=1&s=c1781571975af3ff4d4adfa862fa47d8b72c7cdd

Table S2.4 - Table S2.5 (corresponds to Supplementary Tables 4-5 in the original article) URL: http://onlinelibrary.wiley.com/store/10.1111/nph.13313/asset/supinfo/nph13313-sup-0004-TableS4.xlsx?v=1&s=11f24de2be93c43630494b0686062ef3341e3e0f

Table S2.6 - Table S2.7 (corresponds to Supplementary Tables 6-7 in the original article) URL: http://onlinelibrary.wiley.com/store/10.1111/nph.13313/asset/supinfo/nph13313-sup-0005-TableS6-S7.xlsx?v=1&s=c33603bf6b4328cd53ade83c470153e6552019f1

Chapter 3

Table Supp Data 3 (corresponds to supplementary data 3 in the original article) URL: http://www.nature.com/nature/journal/v510/n7505/extref/nature13308-s3.zip

Chapter 4

Fig. S4.1 - Fig. S4.5 (corresponds to Supplementary Figures 1-5 in the original article) URL: http://onlinelibrary.wiley.com/store/10.1111/nph.13039/asset/supinfo/nph13039-sup-0001-FigS1-S5.pdf?v=1&s=d58f2001d7d5b74f26c7f1a20032fb66a3e28a84

Table S4.1 - Table S4.6 (corresponds to Supplementary Tables 1-6 in the original article) URL: http://onlinelibrary.wiley.com/store/10.1111/nph.13039/asset/supinfo/nph13039-sup-0002-TableS1-S6.pdf?v=1&s=f2d0a44e4507f9b2119e1248548317e4ecd83277

The Supplementary files mentioned in Chapters 5 and 6 can be found at the following links:

http://www.itqb.unl.pt/~carocha/Chapter_5.rar

http://www.itqb.unl.pt/~carocha/Chapter_6.rar

The mRNA-Seq data used in this study was submitted to the European Nucleotide Archive (ENA) and received the following study accession:

http://www.ebi.ac.uk/ena/data/view/PRJEB12857.

The sRNA-Seq data used in this study was submitted to ArrayExpress and received the following study accession:

https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-2997/

References

Abdulrazzak, N., Pollet, B., Ehlting, J., Larsen, K., Asnaghi, C., Ronseau, S., Proux, C., Erhardt, M., Seltzer, V., Renou, J., et al. (2006). A coumaroyl-ester-3-hydroxylase insertion mutant reveals the existence of nonredundant meta-hydroxylation pathways and essential roles for phenolic precursors in cell expansion and plant growth. (vol 140, pg 30, 2006). Plant Physiol. 141, 1708–1708.
Achnine, L., Blancaflor, E., Rasmussen, S., and Dixon, R. (2004). Colocalization of L-phenylalanine ammonia-lyase and cinnamate 4-hydroxylase for metabolic channeling in phenylpropanoid biosynthesis. Plant Cell 16, 3098–3109.
Aguayo, M.G., Quintupill, L., Castillo, R., Baeza, J., Freer, J., and Mendonça, R.T. (2010). Determination of differences in anatomical and chemical characteristics of tension and opposite wood of 8-year-old Eucalyptus globulus. Maderas Cienc. Tecnol. 12, 241–251.
Aguayo, M.G., Mendonca, R.T., Martinez, P., Rodriguez, J., and Pereira, M. (2012). Chemical characteristics and Kraft pulping of tension wood from Eucalyptus globulus Labill. Rev. Arvore 36, 1163–1171.
Aguayo, M.G., Ruiz, J., Norambuena, M., and Teixeira Mendonça, R. (2015). Structural features of dioxane lignin from Eucalyptus globulus and their relationship with the pulp yield of contrasting genotypes. Maderas Cienc. Tecnol. 17, 625–636.
Agusti, J., Herold, S., Schwarz, M., Sanchez, P., Ljung, K., Dun, E.A., Brewer, P.B., Beveridge, C.A., Sieberer, T., Sehr, E.M., et al. (2011). Strigolactone signaling is required for auxin-dependent stimulation of secondary growth in plants. Proc. Natl. Acad. Sci. U. S. A. 108, 20242–20247.
Alberts, B., Johnson, A., Lewis, J., Raff, M., Roberts, K., and Walter, P. (2007). Molecular Biology of the Cell, 5th Edition (New York: Garland Science).
Allan, A.C., Hellens, R.P., and Laing, W.A. (2008). MYB transcription factors that colour our fruit. Trends Plant Sci. 13, 99–102.
Altschul, S., Gish, W., Miller, W., Myers, E., and Lipman, D. (1990). Basic local alignment search tool. J. Mol. Biol. 215, 403–410.
Alves, A., Simões, R., Stackpole, D., Vaillancourt, R., Potts, B., Schwanninger, M., and Rodrigues, J. (2011). Determination of the syringyl/guaiacyl (S/G) ratio of Eucalyptus globulus wood lignin by NIR-based PLS-R models using analytical pyrolysis as the reference method. J. Infrared Spectrosc. 19, 343.
Ambros, V., Bartel, B., Bartel, D., Burge, C., Carrington, J., Chen, X., Dreyfuss, G., Eddy, S., Griffiths-Jones, S., Marshall, M., et al. (2003). A uniform system for microRNA annotation. RNA 9, 277–279.
Andersson-Gunnerås, S., Hellgren, J.M., Björklund, S., Regan, S., Moritz, T., and Sundberg, B. (2003). Asymmetric expression of a poplar ACC oxidase controls ethylene production during gravitational induction of tension wood. Plant J. Cell Mol. Biol. 34, 339–349.
Andersson-Gunnerås, S., Mellerowicz, E.J., Love, J., Segerman, B., Ohmiya, Y., Coutinho, P.M., Nilsson, P., Henrissat, B., Moritz, T., and Sundberg, B. (2006). Biosynthesis of cellulose-enriched tension wood in Populus: global analysis of transcripts and metabolites identifies biochemical and developmental regulators in secondary wall biosynthesis. Plant J. Cell Mol. Biol. 45, 144–165.
Anjum, N.A., Gill, R., Kaushik, M., Hasanuzzaman, M., Pereira, E., Ahmad, I., Tuteja, N., and Gill, S.S. (2015). ATP-sulfurylase, sulfur-compounds, and plant stress tolerance. Front. Plant Sci. 6.
Anterola, A., and Lewis, N. (2002). Trends in lignin modification: a comprehensive analysis of the effects of genetic manipulations/mutations on lignification and vascular integrity. Phytochemistry 61, 221–294.
Arend, M. (2008). Immunolocalization of (1,4)-beta-galactan in tension wood fibers of poplar. Tree Physiol. 28, 1263–1267.
Arvidsson, S., Kwasniewski, M., Riano-Pachon, D., and Mueller-Roeber, B. (2008). QuantPrime - a flexible tool for reliable high-throughput primer design for quantitative PCR. BMC Bioinformatics 9.
Axtell, M.J. (2013). Classification and comparison of small RNAs from plants. Annu. Rev. Plant Biol. 64, 137–159.
Bailey, T.L., Boden, M., Buske, F.A., Frith, M., Grant, C.E., Clementi, L., Ren, J., Li, W.W., and Noble, W.S. (2009). MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 37, W202-208.
Baima, S., Possenti, M., Matteucci, A., Wisman, E., Altamura, M.M., Ruberti, I., and Morelli, G. (2001). The arabidopsis ATHB-8 HD-zip protein acts as a differentiation-promoting transcription factor of the vascular meristems. Plant Physiol. 126, 643–655.

Barakat, A., Bagniewska-Zadworna, A., Choi, A., Plakkat, U., DiLoreto, D., Yellanki, P., and Carlson, J. (2009). The cinnamyl alcohol dehydrogenase gene family in Populus: phylogeny, organization, and expression. BMC Plant Biol. 9.
Barakat, A., Yassin, N., Park, J., Choi, A., Herr, J., and Carlson, J. (2011). Comparative and phylogenomic analyses of cinnamoyl-CoA reductase and cinnamoyl-CoA-reductase-like gene family in land plants. Plant Sci. 181, 249–257.
Barros, E., van Staden, C.-A., and Lezar, S. (2009). A microarray-based method for the parallel analysis of genotypes and expression profiles of wood-forming tissues in Eucalyptus grandis. BMC Biotechnol. 9.
Bartel, D.P. (2004). MicroRNAs: Genomics, biogenesis, mechanism, and function. Cell 116, 281–297.
Bartholomé, J., Mandrou, E., Mabiala, A., Jenkins, J., Nabihoudine, I., Klopp, C., Schmutz, J., Plomion, C., and Gion, J.-M. (2015). High-resolution genetic maps of Eucalyptus improve Eucalyptus grandis genome assembly. New Phytol. 206, 1283–1296.
Baucher, M., Jaziri, M.E., and Vandeputte, O. (2007). From primary to secondary growth: origin and development of the vascular system. J. Exp. Bot. 58, 3485–3501.
Bedon, F., Majada, J., Feito, I., Chaumeil, P., Dupuy, J.-W., Lomenech, A.-M., Barre, A., Gion, J.-M., and Plomion, C. (2011). Interaction between environmental factors affects the accumulation of root proteins in hydroponically grown Eucalyptus globulus (Labill.). Plant Physiol. Biochem. PPB Société Fr. Physiol. Végétale 49, 69–76.
Bhalerao, R.P., and Fischer, U. (2014). Auxin gradients across wood-instructive or incidental? Physiol. Plant. 151, 43–51.
Bhargava, A., Mansfield, S.D., Hall, H.C., Douglas, C.J., and Ellis, B.E. (2010). MYB75 functions in regulation of secondary cell wall formation in the Arabidopsis inflorescence stem. Plant Physiol. 154, 1428–1438.
Bhuiyan, N.H., Selvaraj, G., Wei, Y., and King, J. (2009). Role of lignification in plant defense. Plant Signal. Behav. 4, 158–159.
Blee, K., Choi, J., O’Connell, A., Jupe, S., Schuch, W., Lewis, N., and Bolwell, G. (2001). Antisense and sense expression of cDNA coding for CYP73A15, a class II cinnamate 4-hydroxylase, leads to a delayed and reduced production of lignin in tobacco. Phytochemistry 57, 1159–1166.
Boerjan, W., Ralph, J., and Baucher, M. (2003). Lignin biosynthesis. Annu. Rev. Plant Biol. 54, 519–546.
Bogs, J., Jaffé, F.W., Takos, A.M., Walker, A.R., and Robinson, S.P. (2007). The grapevine transcription factor VvMYBPA1 regulates proanthocyanidin synthesis during fruit development. Plant Physiol. 143, 1347–1361.
Bollhöner, B., Prestele, J., and Tuominen, H. (2012). Xylem cell death: emerging understanding of regulation and function. J. Exp. Bot. err438.
Bomal, C., Bedon, F., Caron, S., Mansfield, S.D., Levasseur, C., Cooke, J.E.K., Blais, S., Tremblay, L., Morency, M.-J., Pavy, N., et al. (2008). Involvement of Pinus taeda MYB1 and MYB8 in phenylpropanoid metabolism and secondary cell wall biogenesis: a comparative in planta analysis. J. Exp. Bot. 59, 3925–3939.
Borevitz, J.O., Xia, Y., Blount, J., Dixon, R.A., and Lamb, C. (2000). Activation tagging identifies a conserved MYB regulator of phenylpropanoid biosynthesis. Plant Cell 12, 2383–2394.
Borges, F., and Martienssen, R.A. (2015). The expanding world of small RNAs in plants. Nat. Rev. Mol. Cell Biol. 16, 727–741.
Boudet, A.M., Kajita, S., Grima-Pettenati, J., and Goffner, D. (2003). Lignins and lignocellulosics: a better control of synthesis for new and improved uses. Trends Plant Sci. 8, 576–581.
Brooker, M.I.H. (2000). A new classification of the genus Eucalyptus L’Hér. (Myrtaceae). Aust. Syst. Bot. 13, 79–148.
Brown Jr, R.M., and Saxena, I.M. (2000). Cellulose biosynthesis: A model for understanding the assembly of biopolymers. Plant Physiol. Biochem. 38, 57–67.
Budzinski, I.G.F., Moon, D.H., Lindén, P., Moritz, T., and Labate, C.A. (2016). Seasonal Variation of Carbon Metabolism in the Cambial Zone of Eucalyptus grandis. Plant Physiol. 932.
Camargo, E., Costa, L., Soler, M., Salazar, M., Lepikson, J., Gonçalves, D., Marques, W., Carazzolle, M., Martinez, Y., Grima-Pettenati, J., et al. (2011). Effects of nitrogen fertilization on global xylem transcript profiling of Eucalyptus urophylla x grandis evaluated by RNA-seq technology. BMC Proc. 5, P106.
Camargo, E.L.O., Nascimento, L.C., Soler, M., Salazar, M.M., Lepikson-Neto, J., Marques, W.L., Alves, A., Teixeira, P.J.P.L., Mieczkowski, P., Carazzolle, M.F., et al. (2014). Contrasting nitrogen fertilization treatments impact xylem gene expression and secondary cell wall lignification in Eucalyptus. BMC Plant Biol. 14, 256.
Cannon, S.B., Mitra, A., Baumgarten, A., Young, N.D., and May, G. (2004). The roles of segmental and tandem gene duplication in the evolution of large gene families in Arabidopsis thaliana. BMC Plant Biol. 4, 10.

214

Cao, P.B., Azar, S., SanClemente, H., Mounet, F., Dunand, C., Marque, G., Marque, C., and Teulières, C. (2015). Genome-wide analysis of the AP2/ERF family in Eucalyptus grandis: an intriguing over-representation of stress-responsive DREB1/CBF genes. PloS One 10, e0121041. Cao, Z.-H., Zhang, S.-Z., Wang, R.-K., Zhang, R.-F., and Hao, Y.-J. (2013). Genome wide analysis of the apple MYB transcription factor family allows the identification of MdoMYB121 gene confering abiotic stress tolerance in plants. PloS One 8, e69955. Carlsbecker, A., Lee, J.-Y., Roberts, C.J., Dettmer, J., Lehesranta, S., Zhou, J., Lindgren, O., Moreno-Risueno, M.A., Vatén, A., Thitamadee, S., et al. (2010). Cell signalling by microRNA165/6 directs gene dose-dependent root cell fate. Nature 465, 316–321. Carocha, V., Soler, M., Hefer, C., Cassan-Wang, H., Fevereiro, P., Myburg, A.A., Paiva, J.A.P., and Grima-Pettenati, J. (2015). Genome-wide analysis of the lignin toolbox of Eucalyptus grandis. New Phytol. 206, 1297–1313. Carvalho, A., Paiva, J., Louzada, J., and Lima-Brito, J. (2013). The transcriptomics of secondary growth and wood formation in conifers. Mol. Biol. Int. 2013, 974324–974324. Carvalho, A., Graça, C., Carocha, V., Pêra, S., Lousada, J.L., Lima-Brito, J., and Paiva, J.A.P. (2015). An improved total RNA isolation from secondary tissues of woody species for coding and non-coding gene expression analyses. Wood Sci. Technol. 49, 647–658. Cassan-Wang, H., Soler, M., Yu, H., Camargo, E., Carocha, V., Ladouce, N., Savelli, B., Paiva, J., Leple, J., and Grima-Pettenati, J. (2012). Reference Genes for High-Throughput Quantitative Reverse Transcription-PCR Analysis of Gene Expression in Organs and Tissues of Eucalyptus Grown in Various Environmental Conditions. Plant Cell Physiol. 53, 2101–2116. Celedon, P.A.F., de Andrade, A., Meireles, K.G.X., da Cruz Gallo de Carvalho, M.C., Caldas, D.G.G., Moon, D.H., Carneiro, R.T., Franceschini, L.M., Oda, S., and Labate, C.A. (2007). 
Proteomic analysis of the cambial region in juvenile Eucalyptus grandis at three ages. PROTEOMICS 7, 2258–2274. CELPA (2012). Boletim Estatístico da Celpa de 2011. CELPA (2015). Boletim Estatístico da Celpa de 2014. Cerasoli, S., Caldeira, M., Pereira, M., Caudullo, G., and de Rigo, D. (2016). Eucalyptus globulus and other eucalypts in Europe: distribution, habitat, usage and threats. In European Atlas of Forest Tree Species., (Luxembourg), pp. 90–91. Çetinkol, Ö.P., Smith-Moritz, A.M., Cheng, G., Lao, J., George, A., Hong, K., Henry, R., Simmons, B.A., Heazlewood, J.L., and Holmes, B.M. (2012). Structural and Chemical Characterization of Hardwood from Tree Species with Applications as Bioenergy Feedstocks. PLoS ONE 7, e52820. Chaffey, N., Cholewa, E., Regan, S., and Sundberg, B. (2002). Secondary xylem development in Arabidopsis: a model for wood formation. Physiol. Plant. 114, 594–600. Chauhan, R.D., Veale, A., Ma, C., Strauss, S.H., and Myburg, A.A. (2014). Genetic transformation of Eucalyptus - Challenges and Future prospects. In Tree Biotechnology., (Boca Raton, FL: CRC Press), p. Chen, C.Y., Meyermans, H., Burggraeve, B., De Rycke, R.M., Inoue, K., De Vleesschauwer, V., Steenackers, M., Van Montagu, M.C., Engler, G.J., and Boerjan, W.A. (2000). Cell-specific and conditional expression of caffeoyl-coenzyme A-3-O-methyltransferase in poplar. Plant Physiol. 123, 853–867. Chen, H., Li, Q., Shuford, C., Liu, J., Muddiman, D., Sederoff, R., and Chiang, V. (2011). Membrane protein complexes catalyze both 4-and 3-hydroxylation of cinnamic acid derivatives in monolignol biosynthesis. Proc. Natl. Acad. Sci. U. S. A. 108, 21253–21258. Chen, J., Quan, M., and Zhang, D. (2015). Genome-wide identification of novel long non-coding RNAs in Populus tomentosa tension wood, opposite wood and normal wood xylem by RNA-seq. Planta 241, 125–143. Chen, M., Wang, Z., Zhu, Y., Li, Z., Hussain, N., Xuan, L., Guo, W., Zhang, G., and Jiang, L. (2012). 
The effect of transparent TESTA2 on seed fatty acid biosynthesis and tolerance to environmental stresses during young seedling establishment in Arabidopsis. Plant Physiol. 160, 1023–1036. Chessel, D., Dufour, A.B., and Thioulouse, J. (2004). The ade4 package-I- One-table methods. R News 4, 5–10. Chiang, V.L. (2006). Monolignol biosynthesis and genetic engineering of lignin in trees, a review. Environ. Chem. Lett. 4, 143–146. Christie, N., Tobias, P., Naidoo, S., and Kulheim, C. (2015). The Eucalyptus grandis NBS-LRR Gene Family: Physical Clustering and Expression Hotspots. Front. Plant Sci. 6. Clair, B., and Thibaut, B. (2001). SHRINKAGE OF THE GELATINOUS LAYER OF POPLAR AND BEECH TENSION WOOD. IAWA J. 22, 121–131. Cominelli, E., Galbiati, M., Vavasseur, A., Conti, L., Sala, T., Vuylsteke, M., Leonhardt, N., Dellaporta, S.L., and Tonelli, C. (2005). A guard-cell-specific MYB transcription factor regulates stomatal movements and plant drought tolerance. Curr. Biol. CB 15, 1196–1200. 215

Conesa, A., Gotz, S., Garcia-Gomez, J.M., Terol, J., Talon, M., and Robles, M. (2005). Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21, 3674–3676. Conesa, A., Madrigal, P., Tarazona, S., Gomez-Cabrero, D., Cervera, A., McPherson, A., Szcześniak, M.W., Gaffney, D.J., Elo, L.L., Zhang, X., et al. (2016). A survey of best practices for RNA-seq data analysis. Genome Biol. 17, 13. Costa, M.A., Collins, R.E., Anterola, A.M., Cochrane, F.C., Davin, L.B., and Lewis, N.G. (2003). An in silico assessment of gene function and organization of the phenylpropanoid pathway metabolic networks in Arabidopsis thaliana and limitations thereof. Phytochemistry 64, 1097–1112. Côté Jr, W.A., Day, A.C., and Timell, T.E. (1969). A contribution to the ultrastructure of tension wood fibers. Wood Sci. Technol. 3, 257–271. Courtois-Moreau, C.L., Pesquet, E., Sjödin, A., Muñiz, L., Bollhöner, B., Kaneda, M., Samuels, L., Jansson, S., and Tuominen, H. (2009). A unique program for cell death in xylem fibers of Populus stem. Plant J. Cell Mol. Biol. 58, 260–274. Creux, N., Castro, M.D., Ranik, M., Spokevicius, A., Bossinger, G., Maritz-Olivier, C., and Myburg, Z. (2011). In silico and functional characterization of the promoter of a Eucalyptus secondary cell wall associated cellulose synthase gene (EgCesA1). BMC Proc. 5, P107. Creux, N.M., Ranik, M., Berger, D.K., and Myburg, A.A. (2008). Comparative analysis of orthologous cellulose synthase promoters from Arabidopsis, Populus and Eucalyptus: evidence of conserved regulatory elements in angiosperms. New Phytol. 179, 722–737. Cuperus, J.T., Fahlgren, N., and Carrington, J.C. (2011). Evolution and Functional Diversification of MIRNA Genes. Plant Cell 23, 431–442. Dalmay, T. (2012). Evolution of Plant MicroRNAs. In eLS, John Wiley & Sons, Ltd, ed. (Chichester, UK: John Wiley & Sons, Ltd), p. Davin, L., Jourdes, M., Patten, A., Kim, K., Vassao, D., and Lewis, N. (2008). 
Dissection of lignin macromolecular configuration and assembly: Comparison to related biochemical processes in allyl/propenyl phenol and lignan biosynthesis. Nat. Prod. Rep. 25, 1015–1090. De Micco, V., Ruel, K., Joseleau, J.-P., Grima-Pettenati, J., and Aronne, G. (2012). Xylem anatomy and cell wall ultrastructure of Nicotiana tabacum after lignin genetic modification through transcriptional activator EgMYB2. Iawa J. 33, 269–286. De Smet, R., Adams, K., Vandepoele, K., van Montagu, M., Maere, S., and Van de Peer, Y. (2013). Convergent gene loss following gene and genome duplications creates single-copy families in flowering plants. Proc. Natl. Acad. Sci. U. S. A. 110, 2898–2903. Déjardin, A., Laurans, F., Arnaud, D., Breton, C., Pilate, G., and Leplé, J.-C. (2010). Wood formation in Angiosperms. Dév. Végétatif PlantesGeorges Pelletier Jean-Fr. Morot-Gaudy 333, 325–334. Delmer, D.P., and Haigler, C.H. (2002). The regulation of metabolic flux to cellulose, a major sink for carbon in plants. Metab. Eng. 4, 22–28. Demura, T., and Fukuda, H. (2007). Transcriptional regulation in wood formation. Trends Plant Sci. 12, 64–70. Denis, M., Favreau, B., Ueno, S., Camus-Kulandaivelu, L., Chaix, G., Gion, J.-M., Nourrisier- Mountou, S., Polidori, J., and Bouvet, J.-M. (2013). Genetic variation of wood chemical traits and association with underlying genes in Eucalyptus urophylla. Tree Genet. Genomes 9, 927–942. Dias, A.P., Braun, E.L., McMullen, M.D., and Grotewold, E. (2003). Recently duplicated maize R2R3 Myb genes provide evidence for distinct mechanisms of evolutionary divergence after duplication. Plant Physiol. 131, 610–620. Didi, V., Jackson, P., and Hejátko, J. (2015). Hormonal regulation of secondary cell wall formation. J. Exp. Bot. erv222. Dixon, R.A., and Paiva, N.L. (1995). Stress-Induced Phenylpropanoid Metabolism. Plant Cell 7, 1085–1097. 
Druart, N., Johansson, A., Baba, K., Schrader, J., Sjodin, A., Bhalerao, R.R., Resman, L., Trygg, J., Moritz, T., and Bhalerao, R.P. (2007). Environmental and hormonal regulation of the activity-dormancy cycle in the cambial meristem involves stage-specific modulation of transcriptional and metabolic networks. Plant J. 50, 557–573. Du, S., and Yamamoto, F. (2003). Ethylene evolution changes in the stems of Metasequoia glyptostroboides and Aesculus turbinata seedlings in relation to gravity-induced reaction wood formation. Trees 17, 522–528. Du, S., and Yamamoto, F. (2007). An Overview of the Biology of Reaction Wood Formation. J. Integr. Plant Biol. 49, 131–143. Du, H., Feng, B.-R., Yang, S.-S., Huang, Y.-B., and Tang, Y.-X. (2012a). The R2R3-MYB transcription factor gene family in maize. PloS One 7, e37463. Du, H., Yang, S.-S., Liang, Z., Feng, B.-R., Liu, L., Huang, Y.-B., and Tang, Y.-X. (2012b). Genome-wide analysis of the MYB transcription factor superfamily in soybean. BMC Plant Biol. 12, 106. 216

Dubos, C., Stracke, R., Grotewold, E., Weisshaar, B., Martin, C., and Lepiniec, L. (2010). MYB transcription factors in Arabidopsis. Trends Plant Sci. 15, 573–581. Eckardt, N. (2002). Probing the mysteries of lignin biosynthesis: The crystal structure of caffeic acid/5-hydroxyferulic acid 3/5-O-methyltransferase provides new insights. Plant Cell 14, 1185–1189. Ehlting, J., Büttner, D., Wang, Q., Douglas, C.J., Somssich, I.E., and Kombrink, E. (1999). Three 4-coumarate:coenzyme A ligases in Arabidopsis thaliana represent two evolutionarily divergent classes in angiosperms. Plant J. Cell Mol. Biol. 19, 9–20. Ehlting, J., Shin, J.J.K., and Douglas, C.J. (2001). Identification of 4-coumarate : coenzyme A ligase (4CL) substrate recognition domains. Plant J. 27, 455–465. Ekblom, R., and Wolf, J.B.W. (2014). A field guide to whole-genome sequencing, assembly and annotation. Evol. Appl. 7, 1026–1042. Ekstrom, A., Taujale, R., McGinn, N., and Yin, Y. (2014). PlantCAZyme: a database for plant carbohydrate-active enzymes. Database J. Biol. Databases Curation 2014. Eldridge, K.G., Davidson, J., Harwood, C., Wyk, G. van, and others (1993). Eucalypt domestication and breeding. (Clarendon Press). Elissetche, J.P., Salas-Burgos, A., Garcia, R., Iturra, C., Teixeira, R., Rodriguez, J., and Valenzuela, S. (2011). Generation and analysis of expressed sequence tags (ESTs) from cambium tissue cDNA libraries of contrasting genotypes of Eucalyptus globulus Labill. BMC Proc. 5, P108. Emery, J.F., Floyd, S.K., Alvarez, J., Eshed, Y., Hawker, N.P., Izhaki, A., Baum, S.F., and Bowman, J.L. (2003). Radial Patterning of Arabidopsis Shoots by Class III HD-ZIP and KANADI Genes. Curr. Biol. 13, 1768–1774. Escamilla-Trevino, L.L., Shen, H., Uppalapati, S.R., Ray, T., Tang, Y., Hernandez, T., Yin, Y., Xu, Y., and Dixon, R.A. (2010). Switchgrass (Panicum virgatum) possesses a divergent family of cinnamoyl CoA reductases with distinct biochemical properties. New Phytol. 185, 143– 155. 
Evtuguin, D., and Neto, C. (2004). Recent advances in Eucalyptus wood chemistry: structural features through the prism of thechnological response. In Proceedings of the IUFRO Conference, (Aveiro, Portugal: RAIZ, Instituto Investigação de Floresta e Papel), p. Eyles, A., and Mohammed, C. (2003). Kino vein formation in Eucalyptus globulus and E. nitens. Aust. For. 66, 206–212. Fagerstedt, K., Mellerowicz, E., Gorshkova, T., Rual, K., and Joseleau, J. (2007). Cell wall polymers in reaction wood. In The Biology of Reaction Wood., (Berlin Heidelberg: Springer- Verlag), p. Faix, O., Bremer, J., Schmidt, O., and Tatjana, S.J. (1991). Monitoring of chemical changes in white-rot degraded beech wood by pyrolysis—gas chromatography and Fourier-transform infrared spectroscopy. J. Anal. Appl. Pyrolysis 21, 147–162. Fang, C.-H., Clair, B., Gril, J., and Almeras, T. (2007). G-layer transverse shrinkage and its consequences on shrinkage of poplar tension wood. Wood Sci. Technol. 41, 659–671. FAO (2013). State of the World’s forests 2012. Fei, Q., Xia, R., and Meyers, B.C. (2013). Phased, secondary, small interfering RNAs in posttranscriptional regulatory networks. Plant Cell 25, 2400–2415. Felsenstein, J. (1985). Confidence-limits on phylogenies - an approach using the bootstrap. Evolution 39, 783–791. Fernández, M., Troncoso, V., and Valenzuela, S. (2015). Transcriptome Profile in Response to Frost Tolerance in Eucalyptus globulus. Plant Mol. Biol. Report. 33, 1472–1485. Folkes, L., Moxon, S., Woolfenden, H.C., Stocks, M.B., Szittya, G., Dalmay, T., and Moulton, V. (2012). PAREsnip: a tool for rapid genome-wide discovery of small RNA/target interactions evidenced through degradome sequencing. Nucleic Acids Res. 40, e103. Foucart, C., Paux, E., Ladouce, N., San-Clemente, H., Grima-Pettenati, J., and Sivadon, P. (2006). Transcript profiling of a xylem vs phloem cDNA subtractive library identifies new genes expressed during xylogenesis in Eucalyptus. New Phytol. 170, 739–752. 
Foucart, C., Jauneau, A., Gion, J.-M., Amelot, N., Martinez, Y., Panegos, P., Grima-Pettenati, J., and Sivadon, P. (2009). Overexpression of EgROP1, a Eucalyptus vascular-expressed Rac-like small GTPase, affects secondary xylem formation in Arabidopsis thaliana. New Phytol. 183, 1014–1029. Franke, R., McMichael, C.M., Meyer, K., Shirley, A.M., Cusumano, J.C., and Chapple, C. (2000). Modified lignin in tobacco and poplar plants over-expressing the Arabidopsis gene encoding ferulate 5-hydroxylase. Plant J. 22, 223–234. Franke, R., Humphreys, J.M., Hemm, M.R., Denault, J.W., Ruegger, M.O., Cusumano, J.C., and Chapple, C. (2002). The Arabidopsis REF8 gene encodes the 3-hydroxylase of phenylpropanoid metabolism. Plant J. 30, 33–45. Fukuda, T., and Terashima, N. (1988). Heterogeneity in formation of lignin XII. Deposition of chemical components during the formation of cell walls of black pine and poplar. Mokuzai Gakkaishi 604–608. 217

Furuta, K., Lichtenberger, R., and Helariutta, Y. (2012). The role of mobile small RNA species during root growth and development. Curr. Opin. Cell Biol. 24, 211–216. Gallo de Carvalho, M.C. da C., Caldas, D.G.G., Carneiro, R.T., Moon, D.H., Salvatierra, G.R., Franceschini, L.M., de Andrade, A., Celedon, P.A.F., Oda, S., and Labate, C.A. (2008). SAGE transcript profiling of the juvenile cambial region of Eucalyptus grandis. Tree Physiol. 28, 905–919. García, J.R., Anderson, N., Le-Feuvre, R., Iturra, C., Elissetche, J., Chapple, C., and Valenzuela, S. (2014). Rescue of syringyl lignin and sinapate ester biosynthesis in Arabidopsis thaliana by a coniferaldehyde 5-hydroxylase from Eucalyptus globulus. Plant Cell Rep. 33, 1263– 1274. Geisler-Lee, J., Geisler, M., Coutinho, P.M., Segerman, B., Nishikubo, N., Takahashi, J., Aspeborg, H., Djerbi, S., Master, E., Andersson-Gunnerås, S., et al. (2006). Poplar Carbohydrate-Active Enzymes. Gene Identification and Expression Analyses. Plant Physiol. 140, 946–962. Gemas, V. (2004). ‐ Genetic variability of two woody perennials : Olea europaea L. and Eucalyptus globulus Labill. : assessed by RAPD and ISSR markers. PhD thesis. Universidade Nova de Lisboa. Instituto de Tecnologia Química e Biológica. Gerber, L., Zhang, B., Roach, M., Rende, U., Gorzsás, A., Kumar, M., Burgert, I., Niittylä, T., and Sundberg, B. (2014). Deficient sucrose synthase activity in developing wood does not specifically affect cellulose biosynthesis, but causes an overall decrease in cell wall polymers. New Phytol. 203, 1220–1230. Gerttula, S., Zinkgraf, M., Muday, G., Lewis, D., Ibatullin, F.M., Brumer, H., Hart, F., Mansfield, S.D., Filkov, V., and Groover, A. (2015). Transcriptional and Hormonal Regulation of Gravitropism of Woody Stems in Populus. Plant Cell tpc.15.00531. Gholizadeh, A. (2011). Heterologous expression of stress-responsive DUF538 domain containing protein and its morpho-biochemical consequences. Protein J. 30, 351–358. 
Gholizadeh, A., and Baghban Kohnehrouz, B. (2010). Identification of DUF538 cDNA clone from Celosia cristata expressed sequences of nonstressed and stressed leaves. Russ. J. Plant Physiol. 57, 247–252. Gille, S., and Pauly, M. (2012). O-acetylation of plant cell wall polysaccharides. Front. Plant Sci. 3, 12. Gille, S., de Souza, A., Xiong, G., Benz, M., Cheng, K., Schultink, A., Reca, I.-B., and Pauly, M. (2011). O-Acetylation of Arabidopsis Hemicellulose Xyloglucan Requires AXY4 or AXY4L, Proteins with a TBL and DUF231 Domain[W][OA]. Plant Cell 23, 4041–4053. Gion, J.-M., Rech, P., Grima-Pettenati, J., Verhaegen, D., and Plomion, C. (2000). Mapping candidate genes in Eucalyptus with emphasis on lignification genes. Mol. Breed. 6, 441– 449. Gion, J.-M., Carouche, A., Deweer, S., Bedon, F., Pichavant, F., Charpentier, J.-P., Bailleres, H., Rozenberg, P., Carocha, V., Ognouabi, N., et al. (2011). Comprehensive genetic dissection of wood properties in a widely-grown tropical tree: Eucalyptus. Bmc Genomics 12. Girijashankar, V. (2011). Genetic transformation of eucalyptus. Physiol. Mol. Biol. Plants Int. J. Funct. Plant Biol. 17, 9–23. Goff, L., Trapnell, C., and Kelley, D. (2013). cummeRbund: Analysis, exploration, manipulation, and visualization of Cufflinks high-throughput sequencing data. Goicoechea, M., Lacombe, E., Legay, S., Mihaljevic, S., Rech, P., Jauneau, A., Lapierre, C., Pollet, B., Verhaegen, D., Chaubet-Gigot, N., et al. (2005). EgMYB2, a new transcriptional activator from Eucalyptus xylem, regulates secondary cell wall formation and lignin biosynthesis. Plant J. 43, 553–567. Gómez-Maldonado, J., Avila, C., Torre, F., Cañas, R., Cánovas, F.M., and Campbell, M.M. (2004). Functional interactions between a glutamine synthetase promoter and MYB proteins. Plant J. Cell Mol. Biol. 39, 513–526. Gonzalez, A., Zhao, M., Leavitt, J.M., and Lloyd, A.M. (2008). 
Regulation of the anthocyanin biosynthetic pathway by the TTG1/bHLH/Myb transcriptional complex in Arabidopsis seedlings. Plant J. Cell Mol. Biol. 53, 814–827. Gonzalez, A., Mendenhall, J., Huo, Y., and Lloyd, A. (2009). TTG1 complex MYBs, MYB5 and TT2, control outer seed coat differentiation. Dev. Biol. 325, 412–421. Goodstein, D.M., Shu, S., Howson, R., Neupane, R., Hayes, R.D., Fazo, J., Mitros, T., Dirks, W., Hellsten, U., Putnam, N., et al. (2012). Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res. 40, D1178–D1186. Goujon, T., Minic, Z., El Amrani, A., Lerouxel, O., Aletti, E., Lapierre, C., Joseleau, J.-P., and Jouanin, L. (2003). AtBXL1, a novel higher plant (Arabidopsis thaliana) putative beta- xylosidase gene, is involved in secondary cell wall metabolism and plant development. Plant J. 33, 677–690. Goulao, L.F., Vieira-Silva, S., and Jackson, P.A. (2011). Association of hemicellulose- and pectin- modifying gene expression with Eucalyptus globulus secondary growth. Plant Physiol. Biochem. PPB Société Fr. Physiol. Végétale 49, 873–881. 218

Grattapaglia, D., and Resende, M.D.V. (2010). Genomic selection in forest tree breeding. Tree Genet. Genomes 7, 241–255. Grattapaglia, D., Vaillancourt, R., Shepherd, M., Thumma, B., Foley, W., Kulheim, C., Potts, B., and Myburg, A. (2012). Progress in Myrtaceae genetics and genomics: Eucalyptus as the pivotal genus. Tree Genet. Genomes 8, 463–508. Grattapaglia, D., Mamani, E.M.C., Silva-Junior, O.B., and Faria, D.A. (2015). A novel genome- wide microsatellite resource for species of Eucalyptus with linkage-to-physical correspondence on the reference genome sequence. Mol. Ecol. Resour. 15, 437–448. Grima-Pettenati, J., and Goffner, D. (1999). Lignin genetic engineering revisited. Plant Sci. 145, 51–65. Grima-Pettenati, J., Feuillet, C., Goffner, D., Borderies, G., and Boudet, A.M. (1993). Molecular- Cloning and Expression of a Eucalyptus-Gunnii Cdna Clone Encoding Cinnamyl Alcohol- Dehydrogenase. Plant Mol. Biol. 21, 1085–1095. Grima-Pettenati, J., Soler, M., Camargo, E.L.O., and Wang, H. (2012). Transcriptional Regulation of the Lignin Biosynthetic Pathway Revisited: New Players and Insights. In Lignins: Biosynthesis, Biodegradation and Bioengineering, L. Jouann, and C. Lapierre, eds. pp. 173–218. Groover, A.T. (2005). What genes make a tree a tree? Trends Plant Sci. 10, 210–214. Gu, X. (2003). Evolution of duplicate genes versus genetic robustness against null mutations. Trends Genet. TIG 19, 354–356. Gu, S., and Kay, M.A. (2010). How do miRNAs mediate translational repression? Silence 1, 11. Guan, X., Pang, M., Nah, G., Shi, X., Ye, W., Stelly, D.M., and Chen, Z.J. (2014). miR828 and miR858 regulate homoeologous MYB2 gene functions in Arabidopsis trichome and cotton fibre development. Nat. Commun. 5, 3050. Guda, V., Steele, P., Penmetsa, V., and Li, Q. (2015). Chapter 7. Advances in fast pyrolysis tehnology. In Recent Advances in Thermochemical Conversion of Biomass, (Elsevier), p. Gunnery, S., and Datta, A. (1987). 
An inhibitor RNA of translation from barley embryo. Biochem. Biophys. Res. Commun. 142, 383–388. Guo, D., Ran, J., and Wang, X. (2010). Evolution of the Cinnamyl/Sinapyl Alcohol Dehydrogenase (CAD/SAD) Gene Family: The Emergence of Real Lignin is Associated with the Origin of Bona Fide CAD. J. Mol. Evol. 71, 202–218. Hamberger, B., Ellis, M., Friedmann, M., Souza, C., Barbazuk, B., and Douglas, C. (2007). Genome-wide analyses of phenylpropanoid-related genes in Populus trichocarpa, Arabidopsis thaliana, and Oryza sativa: the Populus lignin toolbox and conservation and diversification of angiosperm gene families. Can. J. Bot.-Rev. Can. Bot. 85, 1182–1201. Hanada, K., Zou, C., Lehti-Shiu, M.D., Shinozaki, K., and Shiu, S.-H. (2008). Importance of lineage-specific expansion of plant tandem duplicates in the adaptive response to environmental stimuli. Plant Physiol. 148, 993–1003. Haygren, J., and Bowyer, J. (1996). Forest Products and Wood Science: An Introduction. (Ames, Iowa: Iowa State University Press.). Hefer, C., Mizrachi, E., Joubert, F., and Myburg, A. (2011). The Eucalyptus genome integrative explorer (EucGenIE): a resource for Eucalyptus genomics and transcriptomics. BMC Proc. 5, O49. Hefer, C.A., Mizrachi, E., Myburg, A.A., Douglas, C.J., and Mansfield, S.D. (2015). Comparative interrogation of the developing xylem transcriptomes of two wood-forming species: Populus trichocarpa and Eucalyptus grandis. New Phytol. 206, 1391–1405. Hellgren, J. (2003). Ethylene and auxin in the control of wood formation. Aspen Bibliogr. Ph.D. Dissertation, 1–105. Hellgren, J.M., Olofsson, K., and Sundberg, B. (2004). Patterns of Auxin Distribution during Gravitational Induction of Reaction Wood in Poplar and Pine. Plant Physiol. 135, 212–220. Hertzberg, M., Aspeborg, H., Schrader, J., Andersson, A., Erlandsson, R., Blomqvist, K., Bhalerao, R., Uhlén, M., Teeri, T.T., Lundeberg, J., et al. (2001). A transcriptional roadmap to wood formation. Proc. Natl. Acad. Sci. U. S. A. 
98, 14732–14737. Hillis, W.E., and Yazaki, Y.-I. (1974). Kinos of Eucalyptus species and their acid degradation products. Phytochemistry 13, 495–498. Hirayama, T., and Shinozaki, K. (2010). Research on plant abiotic stress responses in the post- genome era: past, present and future. Plant J. Cell Mol. Biol. 61, 1041–1052. Hoch, G. (2007). Cell wall hemicelluloses as mobile carbon stores in non-reproductive plant tissues. Funct. Ecol. 21, 823–834. Hoffmann, L., Maury, S., Martz, F., Geoffroy, P., and Legrand, M. (2003). Purification, cloning, and properties of an acyltransferase controlling shikimate and quinate ester intermediates in phenylpropanoid metabolism. J. Biol. Chem. 278, 95–103. Hoffmann, L., Besseau, S., Geoffroy, P., Ritzenthaler, C., Meyer, D., Lapierre, C., Pollet, B., and Legrand, M. (2004). Silencing of hydroxycinnamoy-coenzyme A shikimate/quinate hydroxycinnamoyltransferase affects phenylpropanoid biosynthesis. Plant Cell 16, 1446– 1465. 219

Holman, J.E., Hughes, J.M., and Fensham, R.J. (2003). A morphological cline in Eucalyptus: a genetic perspective. Mol. Ecol. 12, 3013–3025. Huang, J., Gu, M., Lai, Z., Fan, B., Shi, K., Zhou, Y.-H., Yu, J.-Q., and Chen, Z. (2010). Functional Analysis of the Arabidopsis PAL Gene Family in Plant Growth, Development, and Response to Environmental Stress. Plant Physiol. 153, 1526–1538. Hudson, C.J., Freeman, J.S., Jones, R.C., Potts, B.M., Wong, M.M.L., Weller, J.L., Hecht, V.F.G., Poethig, R.S., and Vaillancourt, R.E. (2014). Genetic control of heterochrony in Eucalyptus globulus. G3 Bethesda Md 4, 1235–1245. Humphreys, J., and Chapple, C. (2002). Rewriting the lignin roadmap. Curr. Opin. Plant Biol. 5, 224–229. Humphreys, J.M., Hemm, M.R., and Chapple, C. (1999). New routes for lignin biosynthesis defined by biochemical characterization of recombinant ferulate 5-hydroxylase, a multifunctional cytochrome P450-dependent monooxygenase. Proc. Natl. Acad. Sci. U. S. A. 96, 10045–10050. Hurles, M. (2004). Gene duplication: the genomic trade in spare parts. PLoS Biol. 2, E206. Hussey, S.G., Mizrachi, E., Spokevicius, A.V., Bossinger, G., Berger, D.K., and Myburg, A.A. (2011). SND2, a NAC transcription factor gene, regulates genes involved in secondary cell wall development in Arabidopsis fibres and increases fibre cell area in Eucalyptus. Bmc Plant Biol. 11. Hussey, S.G., Mizrachi, E., Creux, N.M., and Myburg, A.A. (2013). Navigating the transcriptional roadmap regulating plant secondary cell wall deposition. Front. Plant Sci. 4, 325. Hussey, S.G., Saïdi, M.N., Hefer, C.A., Myburg, A.A., and Grima-Pettenati, J. (2015). Structural, evolutionary and functional analysis of the NAC domain protein family in Eucalyptus. New Phytol. 206, 1337–1350. ICFN (2013). IFN6 – Áreas dos usos do solo e das espécies florestais de Portugal continental. Resultados preliminares. Iglesias, I., and Wiltermann, D. (2009). Eucalyptologics Information Resources on Eucalypt Cultivation Worldwide. 
Ilegems, M., Douet, V., Meylan-Bettex, M., Uyttewaal, M., Brand, L., Bowman, J.L., and Stieger, P.A. (2010). Interplay of auxin, KANADI and Class III HD-ZIP transcription factors in vascular tissue formation. Development 137, 975–984. Iwakawa, H., and Tomari, Y. (2013). Molecular Insights into microRNA-Mediated Translational Repression in Plants. Mol. Cell 52, 591–601. Janas, M.M., and Novina, C.D. (2012). Not lost in translation: stepwise regulation of microRNA targets. EMBO J. 31, 2446–2447. Jiang, C., Gu, X., and Peterson, T. (2004). Identification of conserved gene structures and carboxy-terminal motifs in the Myb gene family of Arabidopsis and Oryza sativa L. ssp. indica. Genome Biol. 5, R46. Jin, H., and Martin, C. (1999). Multifunctionality and diversity within the plant MYB-gene family. Plant Mol. Biol. 41, 577–585. Jin, H., Cominelli, E., Bailey, P., Parr, A., Mehrtens, F., Jones, J., Tonelli, C., Weisshaar, B., and Martin, C. (2000). Transcriptional repression by AtMYB4 controls production of UV- protecting sunscreens in Arabidopsis. EMBO J. 19, 6150–6161. Jones, D., Taylor, W., and Thorton, J. (1992). The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8, 275–282. Joseleau, J.-P., Imai, T., Kuroda, K., and Ruel, K. (2004). Detection in situ and characterization of lignin in the G-layer of tension wood fibres of Populus deltoides. Planta 219, 338–345. Jourez, B., Riboux, A., and Leclercq, A. (2001). Comparison of basic density and longitudinal shrinkage in tension wood and opposite wood in young stems of Populus euramericana cv. Ghoy when subjected to a gravitational stimulus. Can. J. For. Res.-Rev. Can. Rech. For. 31, 1676–1683. Jung, C., Seo, J.S., Han, S.W., Koo, Y.J., Kim, C.H., Song, S.I., Nahm, B.H., Choi, Y.D., and Cheong, J.-J. (2008). Overexpression of AtMYB44 enhances stomatal closure to confer abiotic stress tolerance in transgenic Arabidopsis. Plant Physiol. 146, 623–635. 
Källman, T., Chen, J., Gyllenstrand, N., and Lagercrantz, U. (2013). A Significant Fraction of 21- Nucleotide Small RNA Originates from Phased Degradation of Resistance Genes in Several Perennial Species. Plant Physiol. 162, 741–754. Karpinska, B., Karlsson, M., Srivastava, M., Stenberg, A., Schrader, J., Sterky, F., Bhalerao, R., and Wingsle, G. (2004). MYB transcription factors are differentially expressed and regulated during secondary vascular tissue development in hybrid aspen. Plant Mol. Biol. 56, 255–270. Katoh, K., Misawa, K., Kuma, K., and Miyata, T. (2002). MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30, 3059– 3066.

220

Kawashima, C.G., Matthewman, C.A., Huang, S., Lee, B.-R., Yoshimoto, N., Koprivova, A., Rubio-Somoza, I., Todesco, M., Rathjen, T., Saito, K., et al. (2011). Interplay of SLIM1 and miR395 in the regulation of sulfate assimilation in Arabidopsis. Plant J. 66, 863–876. Keegstra, K. (2010). Plant Cell Walls. Plant Physiol. 154, 483–486. Keller, G., Cao, P.B., Clemente, H.S., Kayal, W.E., Marque, C., and Teulières, C. (2013). Transcript profiling combined with functional annotation of 2,662 ESTs provides a molecular picture of Eucalyptus gunnii cold acclimation. Trees 27, 1713–1735. Khraiwesh, B., Zhu, J.-K., and Zhu, J. (2012). Role of miRNAs and siRNAs in biotic and abiotic stress responses of plants. Biochim. Biophys. Acta-Gene Regul. Mech. 1819, 137–148. Kim, J., Jung, J.-H., Reyes, J.L., Kim, Y.-S., Kim, S.-Y., Chung, K.-S., Kim, J.A., Lee, M., Lee, Y., Narry Kim, V., et al. (2005). microRNA-directed cleavage of ATHB15 mRNA regulates vascular development in Arabidopsis inflorescence stems. Plant J. Cell Mol. Biol. 42, 84– 94. Kim, S., Kim, M., Bedgar, D., Moinuddin, S., Cardenas, C., Davin, L., Kang, C., and Lewis, N. (2004). Functional reclassification of the putative cinnamyl alcohol dehydrogenase multigene family in Arabidopsis. Proc. Natl. Acad. Sci. U. S. A. 101, 1455–1460. Kim, S.-J., Kim, K.-W., Cho, M.-H., Franceschi, V.R., Davin, L.B., and Lewis, N.G. (2007). Expression of cinnamyl alcohol dehydrogenases and their putative homologues during Arabidopsis thaliana growth and development: lessons for database annotations? Phytochemistry 68, 1957–1974. Kirst, M., Myburg, A.A., De León, J.P.G., Kirst, M.E., Scott, J., and Sederoff, R. (2004). Coordinated Genetic Regulation of Growth and Lignin Revealed by Quantitative Trait Locus Analysis of cDNA Microarray Data in an Interspecific Backcross of Eucalyptus. Plant Physiol. 135, 2368–2378. Kitaoka, N., Matsubara, T., Sato, M., Takahashi, K., Wakuta, S., Kawaide, H., Matsui, H., Nabeta, K., and Matsuura, H. (2011). 
Arabidopsis CYP94B3 Encodes Jasmonyl-l-Isoleucine 12- Hydroxylase, a Key Enzyme in the Oxidative Catabolism of Jasmonate. Plant Cell Physiol. 52, 1757–1765. Klemm, D., Heublein, B., Fink, H.-P., and Bohn, A. (2005). Cellulose: Fascinating Biopolymer and Sustainable Raw Material. Angew. Chem. Int. Ed. 44, 3358–3393. Klevebring, D., Street, N.R., Fahlgren, N., Kasschau, K.D., Carrington, J.C., Lundeberg, J., and Jansson, S. (2009). Genome-wide profiling of Populus small RNAs. Bmc Genomics 10. Ko, J.H., Prassinos, C., and Han, K.H. (2006). Developmental and seasonal expression of PtaHB1, a Populus gene encoding a class IIIHD-Zip protein, is closely associated with secondary growth and inversely correlated with the level of microRNA (miR166). New Phytol. 169, 469–478. Kong, W.W., and Yang, Z.M. (2010). Identification of iron-deficiency responsive microRNA genes and cis-elements in Arabidopsis. Plant Physiol. Biochem. PPB Société Fr. Physiol. Végétale 48, 153–159. Koo, A.J.K., Cooke, T.F., and Howe, G.A. (2011). Cytochrome P450 CYP94B3 mediates catabolism and inactivation of the plant hormone jasmonoyl-L-isoleucine. Proc. Natl. Acad. Sci. U. S. A. 108, 9298–9303. Koonin, E.V., and Wolf, Y.I. (2010). Constraints and plasticity in genome and molecular-phenome evolution. Nat. Rev. Genet. 11, 487–498. Kozomara, A., and Griffiths-Jones, S. (2011). miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res. 39, D152–D157. Kozomara, A., and Griffiths-Jones, S. (2014). miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res. 42, D68–D73. Kranz, H.D., Denekamp, M., Greco, R., Jin, H., Leyva, A., Meissner, R.C., Petroni, K., Urzainqui, A., Bevan, M., Martin, C., et al. (1998). Towards functional characterisation of the members of the R2R3-MYB gene family from Arabidopsis thaliana. Plant J. Cell Mol. Biol. 16, 263– 276. 
Krzywinski, M., Schein, J., Birol, I., Connors, J., Gascoyne, R., Horsman, D., Jones, S.J., and Marra, M.A. (2009). Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639–1645. Külheim, C. (2010). Applying second-generation sequencing to non-model species. Aust. Biochem. 41, 10–13. Külheim, C., Padovan, A., Hefer, C., Krause, S.T., Köllner, T.G., Myburg, A.A., Degenhardt, J., and Foley, W.J. (2015). The Eucalyptus terpene synthase gene family. BMC Genomics 16, 450. Kullan, A., van Dyk, M., Hefer, C., Jones, N., Kanzler, A., and Myburg, A. (2012). Genetic dissection of growth, wood basic density and gene expression in interspecific backcrosses of Eucalyptus grandis and E-urophylla. Bmc Genet. 13. Kuo, H.-F., and Chiou, T.-J. (2011). The Role of MicroRNAs in Phosphorus Deficiency Signaling. Plant Physiol. 156, 1016–1024.

Labeeuw, L., and Martone, P. (2015). Ancient origin of the biosynthesis of lignin precursors. Biol. Direct 10, 23. Lacombe, E., Hawkins, S., VanDoorsselaere, J., Piquemal, J., Goffner, D., Poeydomenge, O., Boudet, A.M., and GrimaPettenati, J. (1997). Cinnamoyl CoA reductase, the first committed enzyme of the lignin branch biosynthetic pathway: Cloning, expression and phylogenetic relationships. Plant J. 11, 429–441. Lafarguette, F., Leplé, J.-C., Déjardin, A., Laurans, F., Costa, G., Lesage-Descauses, M.-C., and Pilate, G. (2004). Poplar genes encoding fasciclin-like arabinogalactan proteins are highly expressed in tension wood. New Phytol. 164, 107–121. Lamara, M., Raherison, E., Lenz, P., Beaulieu, J., Bousquet, J., and MacKay, J. (2015). Genetic architecture of wood properties based on association analysis and co-expression networks in white spruce. New Phytol. n/a-n/a. Larkin, M.A., Blackshields, G., Brown, N.P., Chenna, R., McGettigan, P.A., McWilliam, H., Valentin, F., Wallace, I.M., Wilm, A., Lopez, R., et al. (2007). Clustal W and Clustal X version 2.0. Bioinforma. Oxf. Engl. 23, 2947–2948. Lauvergeat, V., Lacomme, C., Lacombe, E., Lasserre, E., Roby, D., and Grima-Pettenati, J. (2001). Two cinnamoyl-CoA reductase (CCR) genes from Arabidopsis thaliana are differentially expressed during development and in response to infection with pathogenic bacteria. Phytochemistry 57, 1187–1195. Lea, U.S., Slimestad, R., Smedvig, P., and Lillo, C. (2007). Nitrogen deficiency enhances expression of specific MYB and bHLH transcription factors and accumulation of end products in the flavonoid pathway. Planta 225, 1245–1253. Legay, S., Lacombe, E., Goicoechea, M., Brière, C., Séguin, A., Mackay, J., and Grima-Pettenati, J. (2007). Molecular characterization of EgMYB1, a putative transcriptional repressor of the lignin biosynthetic pathway. Plant Sci. 173, 542–549. 
Legay, S., Sivadon, P., Blervacq, A.-S., Pavy, N., Baghdady, A., Tremblay, L., Levasseur, C., Ladouce, N., Lapierre, C., Seguin, A., et al. (2010). EgMYB1, an R2R3 MYB transcription factor from eucalyptus negatively regulates secondary cell wall formation in Arabidopsis and poplar. New Phytol. 188, 774–786. Leisola, M., Pastinen, O., and Axe, D.D. (2012). Lignin--Designed Randomness. BIO-Complex. 2012. Lens, F., Smets, E., and Melzer, S. (2012). Stem anatomy supports Arabidopsis thaliana as a model for insular woodiness. New Phytol. 193, 12–17. Leplé, J.-C., Dauwe, R., Morreel, K., Storme, V., Lapierre, C., Pollet, B., Naumann, A., Kang, K.- Y., Kim, H., Ruel, K., et al. (2007). Downregulation of cinnamoyl-coenzyme A reductase in poplar: multiple-level phenotyping reveals effects on cell wall polymer metabolism and structure. Plant Cell 19, 3669–3691. Levy, A., Szwerdszarf, D., Abu-Abied, M., Mordehaev, I., Yaniv, Y., Riov, J., Arazi, T., and Sadot, E. (2014). Profiling microRNAs in Eucalyptus grandis reveals no mutual relationship between alterations in miR156 and miR172 expression and adventitious root induction during development. Bmc Genomics 15. Lev-Yadun, S. (1994). Induction of sclereid differentation in the pith of Arabidopsis thaliana(L.) Heynh. J. Exp. Bot. 45, 1845–1849. Li, S. (2014). Transcriptional control of flavonoid biosynthesis. Plant Signal. Behav. 9. Li, C., and Lu, S. (2014). Genome-wide characterization and comparative analysis of R2R3-MYB transcription factors shows the complexity of MYB-associated regulatory networks in Salvia miltiorrhiza. BMC Genomics 15, 277. Li, J., Yang, Z., Yu, B., Liu, J., and Chen, X. (2005). Methylation protects miRNAs and siRNAs from a 3’-end uridylation activity in Arabidopsis. Curr. Biol. CB 15, 1501–1507. Li, Q., Yu, H., Cao, P.B., Fawal, N., Mathé, C., Azar, S., Cassan-Wang, H., Myburg, A.A., Grima- Pettenati, J., Marque, C., et al. (2015). 
Explosive tandem and segmental duplications of multigenic families in Eucalyptus grandis. Genome Biol. Evol. 7, 1068–1081. Li, Y.-F., Wang, Y., Tang, Y., Kakani, V.G., and Mahalingam, R. (2013). Transcriptome analysis of heat stress response in switchgrass (Panicum virgatumL.). BMC Plant Biol. 13, 153. Liebsch, D., Sunaryo, W., Holmlund, M., Norberg, M., Zhang, J., Hall, H.C., Helizon, H., Jin, X., Helariutta, Y., Nilsson, O., et al. (2014). Class I KNOX transcription factors promote differentiation of cambial derivatives into xylem fibers in the Arabidopsis hypocotyl. Development 141, 4311–4319. Lindeboom, J., Mulder, B.M., Vos, J.W., Ketelaar, T., and Emons, A.M.C. (2008). Cellulose microfibril deposition: coordinated activity at the plant plasma membrane. J. Microsc. 231, 192–200. Lin-Wang, K., Bolitho, K., Grafton, K., Kortstee, A., Karunairetnam, S., McGhie, T.K., Espley, R.V., Hellens, R.P., and Allan, A.C. (2010). An R2R3 MYB transcription factor associated with regulation of the anthocyanin biosynthetic pathway in Rosaceae. BMC Plant Biol. 10, 50.

Little, C.H.A., and Savidge, R.A. (1987). 7. The role of plant growth regulators in forest tree cambial growth. Plant Growth Regul. 6, 137–169. Liu, L., Ramsay, T., Zinkgraf, M., Sundell, D., Street, N.R., Filkov, V., and Groover, A. (2015). A resource for characterizing genome-wide binding and putative target genes of transcription factors expressed during secondary growth and wood formation in Populus. Plant J. Cell Mol. Biol. 82, 887–898. Llave, C., Kasschau, K.D., Rector, M.A., and Carrington, J.C. (2002). Endogenous and silencing- associated small RNAs in plants. Plant Cell 14, 1605–1619. Lopes, F.J.F., Pauly, M., Brommonshenkel, S.H., Lau, E.Y., Diola, V., Passos, J.L., and Loureiro, M.E. (2010). The EgMUR3 xyloglucan galactosyltransferase from Eucalyptus grandis complements the mur3 cell wall phenotype in Arabidopsis thaliana. Tree Genet. Genomes 6, 745–756. Lopez, D., Tocquard, K., Venisse, J.-S., Legué, V., and Roeckel-Drevet, P. (2014). Gravity sensing, a largely misunderstood trigger of plant orientated growth. Front. Plant Sci. 5. Lu, X.-Y., and Huang, X.-L. (2008). Plant miRNAs and abiotic stress responses. Biochem. Biophys. Res. Commun. 368, 458–462. Lu, S., Zhou, Y., Li, L., and Chiang, V.L. (2006). Distinct roles of cinnamate 4-hydroxylase genes in Populus. Plant Cell Physiol. 47, 905–914. Lu, S., Li, L., Yi, X., Joshi, C.P., and Chiang, V.L. (2008). Differential expression of three eucalyptus secondary cell wall-related cellulose synthase genes in response to tension stress. J. Exp. Bot. 59, 681–695. Lu, S., Li, Q., Wei, H., Chang, M.-J., Tunlaya-Anukit, S., Kim, H., Liu, J., Song, J., Sun, Y.-H., Yuan, L., et al. (2013). Ptr-miR397a is a negative regulator of laccase genes affecting lignin content in Populus trichocarpa. Proc. Natl. Acad. Sci. U. S. A. 110, 10848–10853. Lu, S.F., Sun, Y.H., Shi, R., Clark, C., Li, L.G., and Chiang, V.L. (2005). 
Novel and mechanical stress-responsive microRNAs in Populus trichocarpa that are absent from Arabidopsis. Plant Cell 17, 2186–2203. Luan, M., Xu, M., Lu, Y., Zhang, L., Fan, Y., and Wang, L. (2015). Expression of zma-miR169 miRNAs and their target ZmNF-YA genes in response to abiotic stress in maize leaves. Gene 555, 178–185. Lucas, J.R., Courtney, S., Hassfurder, M., Dhingra, S., Bryant, A., and Shaw, S.L. (2011). Microtubule-Associated Proteins MAP65-1 and MAP65-2 Positively Regulate Axial Cell Growth in Etiolated Arabidopsis Hypocotyls[W]. Plant Cell 23, 1889–1903. Ma, X., Cao, X., Mo, B., and Chen, X. (2013). Trip to ER: MicroRNA-mediated translational repression in plants. RNA Biol. 10, 1586–1592. MacMillan, C.P., Mansfield, S.D., Stachurski, Z.H., Evans, R., and Southerton, S.G. (2010). Fasciclin-like arabinogalactan proteins: specialization for stem biomechanics and cell wall architecture in Arabidopsis and Eucalyptus. Plant J. Cell Mol. Biol. 62, 689–703. MacMillan, C.P., Taylor, L., Bi, Y., Southerton, S.G., Evans, R., and Spokevicius, A. (2015). The fasciclin-like arabinogalactan protein family of Eucalyptus grandis contains members that impact wood biology and biomechanics. New Phytol. 206, 1314–1327. Marita, J.M., Ralph, J., Hatfield, R.D., and Chapple, C. (1999). NMR characterization of lignins in Arabidopsis altered in the activity of ferulate 5-hydroxylase. Proc. Natl. Acad. Sci. U. S. A. 96, 12328–12332. Matsumoto-Kitano, M., Kusumoto, T., Tarkowski, P., Kinoshita-Tsujimura, K., Václavíková, K., Miyawaki, K., and Kakimoto, T. (2008). Cytokinins are central regulators of cambial activity. Proc. Natl. Acad. Sci. 105, 20027–20031. Matus, J.T., Aquea, F., and Arce-Johnson, P. (2008). Analysis of the grape MYB R2R3 subfamily reveals expanded wine quality-related clades and conserved gene structure organization across Vitis and Arabidopsis genomes. BMC Plant Biol. 8, 83. 
Mauriat, M., Le Provost, G., Rozenberg, P., Delzon, S., Breda, N., Clair, B., Coutand, C., Domec, J.-C., Fourcaud, T., Grima-Pettenati, J., et al. (2014). Wood Formation in Trees. In Tree Biotechnology, (CRC Press), p. 656 p. Mazur, E., and Kurczynska, E.U. (2012). Rays, intrusive growth, and storied cambium in the inflorescence stems of Arabidopsis thaliana (L.) Heynh. Protoplasma 249, 217–220. McCarthy, R.L., Zhong, R., Fowler, S., Lyskowski, D., Piyasena, H., Carleton, K., Spicer, C., and Ye, Z.-H. (2010). The poplar MYB transcription factors, PtrMYB3 and PtrMYB20, are involved in the regulation of secondary wall biosynthesis. Plant Cell Physiol. 51, 1084–1090. Mellerowicz, E.J., and Gorshkova, T.A. (2011). Tensional stress generation in gelatinous fibres: a review and possible mechanism based on cell-wall structure and composition. J. Exp. Bot. err339. Mellerowicz, E.J., Baucher, M., Sundberg, B., and Boerjan, W. (2001). Unravelling cell wall formation in the woody dicot stem. Plant Mol. Biol. 47, 239–274. Mellway, R.D., Tran, L.T., Prouse, M.B., Campbell, M.M., and Constabel, C.P. (2009). The wound-, pathogen-, and ultraviolet B-responsive MYB134 gene encodes an R2R3 MYB
transcription factor that regulates proanthocyanidin synthesis in poplar. Plant Physiol. 150, 924–941. Melzer, S., Lens, F., Gennen, J., Vanneste, S., Rohde, A., and Beeckman, T. (2008). Flowering- time genes modulate meristem determinacy and growth form in Arabidopsis thaliana. Nat. Genet. 40, 1489–1492. Meng, Y., Shao, C., and Chen, M. (2011). Toward microRNA-mediated gene regulatory networks in plants. Brief. Bioinform. 12, 645–659. Meyer, K., Shirley, A.M., Cusumano, J.C., Bell-Lelong, D.A., and Chapple, C. (1998). Lignin monomer composition is determined by the expression of a cytochrome P450-dependent monooxygenase in Arabidopsis. Proc. Natl. Acad. Sci. U. S. A. 95, 6619–6623. Meyers, B.C., Axtell, M.J., Bartel, B., Bartel, D.P., Baulcombe, D., Bowman, J.L., Cao, X., Carrington, J.C., Chen, X., Green, P.J., et al. (2008). Criteria for Annotation of Plant MicroRNAs. Plant Cell 20, 3186–3190. Mikhail Balakshin, E.A.C. (2014). Isolation and Analysis of Lignin-Carbohydrate Complexes (LCC) Preparations with Traditional and Advanced Methods. Stud. Nat. Prod. Chem. 42, 83–115. Miyashima, S., Koi, S., Hashimoto, T., and Nakajima, K. (2011). Non-cell-autonomous microRNA165 acts in a dose-dependent manner to regulate multiple differentiation status in the Arabidopsis root. Dev. Camb. Engl. 138, 2303–2313. Mizrachi, E., Hefer, C.A., Ranik, M., Joubert, F., and Myburg, A.A. (2010). De novo assembled expressed gene catalog of a fast-growing Eucalyptus tree produced by Illumina mRNA- Seq. BMC Genomics 11, 681. Mizrachi, E., Mansfield, S.D., and Myburg, A.A. (2012). Cellulose factories: advancing bioenergy production from forest trees. New Phytol. 194, 54–62. Mizrachi, E., Maloney, V.J., Silberbauer, J., Hefer, C.A., Berger, D.K., Mansfield, S.D., and Myburg, A.A. (2015). Investigating the molecular underpinnings underlying morphology and changes in carbon partitioning during tension wood formation in Eucalyptus. New Phytol. 206, 1351–1363. Moldoveanu, S.C. (1998a). 
Chapter 12. Analytical pyrolysis of proteins. In Techniques and Instrumentation in Analytical Chemistry, (Elsevier), pp. 373–397. Moldoveanu, S.C. (1998b). Chapter 9. Analytical pyrolysis of lignins. In Techniques and Instrumentation in Analytical Chemistry, (Elsevier), pp. 327–351. Moreau, C., Aksenov, N., Lorenzo, M.G., Segerman, B., Funk, C., Nilsson, P., Jansson, S., and Tuominen, H. (2005). A genomic approach to investigate developmental cell death in woody tissues of Populus trees. Genome Biol. 6, R34. Moyle, R., Schrader, J., Stenberg, A., Olsson, O., Saxena, S., Sandberg, G., and Bhalerao, R.P. (2002). Environmental and auxin regulation of wood formation involves members of the Aux/IAA gene family in hybrid aspen. Plant J. Cell Mol. Biol. 31, 675–685. Mueller, S., and Brown, R., Jr. (1980). Evidence for an intramembrane component associated with a cellulose microfibril-synthesizing complex in higher plants. J. Cell Biol. 84, 315–326. Muñoz, C., Baeza, J., Freer, J., and Mendonça, R.T. (2011). Bioethanol production from tension and opposite wood of Eucalyptus globulus using organosolv pretreatment and simultaneous saccharification and fermentation. J. Ind. Microbiol. Biotechnol. 38, 1861– 1866. Myburg, A.A., Potts, B.M., Marques, C.M., Kirst, M., Gion, J.-M., Grattapaglia, D., and Grima- Pettenatti, J. (2007). Eucalypts. In Forest Trees, C. Kole, ed. (Springer Berlin Heidelberg), pp. 115–160. Myburg, A.A., Grattapaglia, D., Tuskan, G.A., Hellsten, U., Hayes, R.D., Grimwood, J., Jenkins, J., Lindquist, E., Tice, H., Bauer, D., et al. (2014). The genome of Eucalyptus grandis. Nature 510, 356–+. Neale, D.B., and Savolainen, O. (2004). Association genetics of complex traits in conifers. Trends Plant Sci. 9, 325–330. Nedelkina, S., Jupe, S., Blee, K., Schalk, M., Werck-Reichhart, D., and Bolwell, G. (1999). Novel characteristics and regulation of a divergent cinnamate 4-hydroxylase (CYP73A15) from French bean: engineering expression in yeast. Plant Mol. Biol. 
39, 1079–1090. Nelson, E.C. (1983). Australian plants cultivated in England before 1788. Telopea. Nelson, N.D., and Hillis, W.E. (1978). Ethylene and tension wood formation in Eucalyptus gomphocephala. Wood Sci. Technol. 12, 309–315. Nesi, N., Jond, C., Debeaujon, I., Caboche, M., and Lepiniec, L. (2001). The Arabidopsis TT2 gene encodes an R2R3 MYB domain protein that acts as a key determinant for proanthocyanidin accumulation in developing seed. Plant Cell 13, 2099–2114. Neutelings, G., Fénart, S., Lucau-Danila, A., and Hawkins, S. (2012). Identification and characterization of miRNAs and their potential targets in flax. J. Plant Physiol. 169, 1754– 1766.

Nieminen, K., Immanen, J., Laxell, M., Kauppinen, L., Tarkowski, P., Dolezal, K., Tähtiharju, S., Elo, A., Decourteix, M., Ljung, K., et al. (2008). Cytokinin signaling regulates cambial development in poplar. Proc. Natl. Acad. Sci. U. S. A. 105, 20032–20037. Nieminen, K.M., Kauppinen, L., and Helariutta, Y. (2004). A weed for wood? Arabidopsis as a genetic model for xylem development. Plant Physiol. 135, 653–659. Niggeweg, R., Michael, A., and Martin, C. (2004). Engineering plants with increased levels of the antioxidant chlorogenic acid. Nat. Biotechnol. 22, 746–754. Nishikubo, N., Awano, T., Banasiak, A., Bourquin, V., Ibatullin, F., Funada, R., Brumer, H., Teeri, T.T., Hayashi, T., Sundberg, B., et al. (2007). Xyloglucan endo-transglycosylase (XET) functions in gelatinous layers of tension wood fibers in poplar--a glimpse into the mechanism of the balancing act of trees. Plant Cell Physiol. 48, 843–855. Niu, Y., Jin, G., Li, X., Tang, C., Zhang, Y., Liang, Y., and Yu, J. (2015). Phosphorus and magnesium interactively modulate the elongation and directional growth of primary roots in Arabidopsis thaliana (L.) Heynh. J. Exp. Bot. 66, 3841–3854. Novaes, E., Drost, D.R., Farmerie, W.G., Pappas, G.J., Jr., Grattapaglia, D., Sederoff, R.R., and Kirst, M. (2008). High-throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized genome. Bmc Genomics 9. Ohmiya, Y., Samejima, M., Shiroishi, M., Amano, Y., Kanda, T., Sakai, F., and Hayashi, T. (2000). Evidence that endo-1,4-beta-glucanases act on cellulose in suspension-cultured poplar cells. Plant J. Cell Mol. Biol. 24, 147–158. Ohra-aho, T., Gomes, F.B., Colodette, J., and Tamminen, T. (2013). S/G ratio and lignin structure among Eucalyptus hybrids determined by Py-GC/MS and nitrobenzene oxidation. J. Anal. Appl. Pyrolysis 101, 166–171. Olsen, K.M., Slimestad, R., Lea, U.S., Brede, C., Løvdal, T., Ruoff, P., Verheul, M., and Lillo, C. (2009). 
Temperature and nitrogen effects on regulators and products of the flavonoid pathway: experimental and kinetic model studies. Plant Cell Environ. 32, 286–299. Ong, S.S., and Wickneswari, R. (2011). Expression profile of small RNAs in Acacia mangium secondary xylem tissue with contrasting lignin content - potential regulatory sequences in monolignol biosynthetic pathway. BMC Genomics 12, S13. Ong, S.S., and Wickneswari, R. (2012). Characterization of microRNAs Expressed during Secondary Wall Biosynthesis in Acacia mangium. PLoS ONE 7. Osakabe, K., Tsao, C., Li, L., Popko, J., Umezawa, T., Carraway, D., Smeltzer, R., Joshi, C., and Chiang, V. (1999). Coniferyl aldehyde 5-hydroxylation and methylation direct syringyl lignin biosynthesis in angiosperms. Proc. Natl. Acad. Sci. U. S. A. 96, 8955–8960. Page, M., Sultana, N., Paszkiewicz, K., Florance, H., and Smirnoff, N. (2012). The influence of ascorbate on anthocyanin accumulation during high light acclimation in Arabidopsis thaliana: further evidence for redox control of anthocyanin synthesis. Plant Cell Environ. 35, 388–404. Paiva, J. a. P., Garnier-Géré, P.H., Rodrigues, J.C., Alves, A., Santos, S., Graça, J., Le Provost, G., Chaumeil, G., Da Silva-Perez, D., Bosc, A., et al. (2008a). Plasticity of maritime pine (Pinus pinaster) wood-forming tissues during a growing season. New Phytol. 179, 1080– 1094. Paiva, J.A.P., Garcés, M., Alves, A., Garnier-Géré, P., Rodrigues, J.C., Lalanne, C., Porcon, S., Le Provost, G., Perez, D. da S., Brach, J., et al. (2008b). Molecular and phenotypic profiling from the base to the crown in maritime pine wood-forming tissue. New Phytol. 178, 283– 301. Paiva, J.A.P., Prat, E., Vautrin, S., Santos, M.D., San-Clemente, H., Brommonschenkel, S., Fonseca, P.G.S., Grattapaglia, D., Song, X., Ammiraju, J.S.S., et al. (2011). Advancing Eucalyptus genomics: identification and sequencing of lignin biosynthesis genes from deep- coverage BAC libraries. Bmc Genomics 12. 
Pant, B.D., Musialak-Lange, M., Nuc, P., May, P., Buhtz, A., Kehr, J., Walther, D., and Scheible, W.-R. (2009). Identification of Nutrient-Responsive Arabidopsis and Rapeseed MicroRNAs by Comprehensive Real-Time Polymerase Chain Reaction Profiling and Small RNA Sequencing. Plant Physiol. 150, 1541–1555. Pantaleo, V., Szittya, G., Moxon, S., Miozzi, L., Moulton, V., Dalmay, T., and Burgyan, J. (2010). Identification of grapevine microRNAs and their targets using high-throughput sequencing and degradome analysis. Plant J. Cell Mol. Biol. 62, 960–976. Pappas, M. de C.R., Pappas, G.J., and Grattapaglia, D. (2015). Genome-wide discovery and validation of Eucalyptus small RNAs reveals variable patterns of conservation and diversity across species of Myrtaceae. BMC Genomics 16. Parent, J.-S., Martínez de Alba, A.E., and Vaucheret, H. (2012). The origin and effect of small RNA signaling in plants. Front. Plant Sci. 3. Park, Y.W., Tominaga, R., Sugiyama, J., Furuta, Y., Tanimoto, E., Samejima, M., Sakai, F., and Hayashi, T. (2003). Enhancement of growth by expression of poplar cellulase in Arabidopsis thaliana. Plant J. Cell Mol. Biol. 33, 1099–1106.

Patzlaff, A., McInnis, S., Courtenay, A., Surman, C., Newman, L.J., Smith, C., Bevan, M.W., Mansfield, S., Whetten, R.W., Sederoff, R.R., et al. (2003). Characterisation of a pine MYB that regulates lignification. Plant J. Cell Mol. Biol. 36, 743–754. Paux, E., Tamasloukht, M., Ladouce, N., Sivadon, P., and Grima-Pettenati, J. (2004). Identification of genes preferentially expressed during wood formation in Eucalyptus. Plant Mol. Biol. 55, 263–280. Paux, E., Carocha, V., Marques, C., de Sousa, A.M., Borralho, N., Sivadon, P., and Grima- Pettenati, J. (2005). Transcript profiling of Eucalyptus xylem genes during tension wood formation. New Phytol. 167, 89–100. Paz-Ares, J., Ghosal, D., Wienand, U., Peterson, P.A., and Saedler, H. (1987). The regulatory c1 locus of Zea mays encodes a protein with homology to myb proto-oncogene products and with structural similarities to transcriptional activators. EMBO J. 6, 3553–3558. Peng, L., Kawagoe, Y., Hogan, P., and Delmer, D. (2002). Sitosterol-beta-glucoside as primer for cellulose synthesis in plants. Science 295, 147–150. Pfaffl, M.W. (2001). A new mathematical model for relative quantification in real-time RT-PCR. Nucleic Acids Res. 29, e45. Pichon, M., Courbou, I., Beckert, M., Boudet, A.M., and Grima-Pettenati, J. (1998). Cloning and characterization of two maize cDNAs encoding Cinnamoyl-CoA Reductase (CCR) and differential expression of the corresponding genes. Plant Mol. Biol. 38, 671–676. Pilate, G., Chabbert, B., Cathala, B., Yoshinaga, A., Leple, J.C., Laurans, F., Lapierre, C., and Ruel, K. (2004a). Lignification and tension wood. C. R. Biol. 327, 889–901. Pilate, G., Déjardin, A., Laurans, F., and Leplé, J.-C. (2004b). Tension wood as a model for functional genomics of wood formation. New Phytol. 164, 63–72. Piquemal, J., Lapierre, C., Myton, K., O’Connell, A., Schuch, W., Grima-Pettenati, J., and Boudet, A.M. (1998). 
Down-regulation of cinnamoyl-CoA reductase induces significant changes of lignin profiles in transgenic tobacco plants. Plant J. 13, 71–83. Plasencia, A. (2015). Transcriptional regulation of wood formation in Eucalyptus. Role of MYB transcription factors and protein-protein interactions. Université Toulouse 3 Paul Sabatier. Plomion, C., Leprovost, G., and Stokes, A. (2001). Wood Formation in Trees. Plant Physiol. 127, 1513–1523. Poeydomenge, O., Boudet, A., and Grima-Pettenati, J. (1994). A cDNA encoding S-Adeonsyl- Methione-Caffeic-Acid 3-O-Methyltransferase from Eucalyptus. Plant Physiol. 105, 749– 750. Portran, D., Zoccoler, M., Gaillard, J., Stoppin-Mellet, V., Neumann, E., Arnal, I., Martiel, J.L., and Vantard, M. (2013). MAP65/Ase1 promote microtubule flexibility. Mol. Biol. Cell 24, 1964– 1973. Potts, B., Vaillancourt, R., Jordan, G., Dutkowski, G., and Costa eSilva, J. (2004). Exploration of the Eucalyptus globulus gene pool. In Proceedings of the IUFRO Conference, (Aveiro, Portugal: RAIZ, Instituto Investigação de Floresta e Papel), pp. 44–61. Poulsen, C., Vaucheret, H., and Brodersen, P. (2013). Lessons on RNA Silencing Mechanisms in Plants from Eukaryotic Argonaute Structures[W]. Plant Cell 25, 22–37. Puzey, J.R., Karger, A., Axtell, M., and Kramer, E.M. (2012). Deep Annotation of Populus trichocarpa microRNAs from Diverse Tissue Sets. Plos One 7. Qiu, D., Wilson, I.W., Gan, S., Washusen, R., Moran, G.F., and Southerton, S.G. (2008). Gene expression in Eucalyptus branch wood with marked variation in cellulose microfibril orientation and lacking G-layers. New Phytol. 179, 94–103. Rabinowicz, P.D., Braun, E.L., Wolfe, A.D., Bowen, B., and Grotewold, E. (1999). Maize R2R3 Myb genes: Sequence analysis reveals amplification in the higher plants. Genetics 153, 427–444. Raes, J., Rohde, A., Christensen, J., Van de Peer, Y., and Boerjan, W. (2003). Genome-wide characterization of the lignification toolbox in Arabidopsis. Plant Physiol. 133, 1051–1071. 
Raffaele, S., Vailleau, F., Léger, A., Joubès, J., Miersch, O., Huard, C., Blée, E., Mongrand, S., Domergue, F., and Roby, D. (2008). A MYB transcription factor regulates very-long-chain fatty acid biosynthesis for activation of the hypersensitive cell death response in Arabidopsis. Plant Cell 20, 752–767. Rahantamalala, A., Rech, P., Martinez, Y., Chaubet-Gigot, N., Grima-Pettenati, J., and Pacquit, V. (2010). Coordinated transcriptional regulation of two key genes in the lignin branch pathway - CAD and CCR - is mediated through MYB-binding sites. BMC Plant Biol. 10, 130. Rajagopalan, R., Vaucheret, H., Trejo, J., and Bartel, D.P. (2006). A diverse and evolutionarily fluid set of microRNAs in Arabidopsis thaliana. Genes Dev. 20, 3407–3425. Ralph, J., Lundquist, K., Brunow, G., Lu, F., Kim, H., Schatz, P.F., Marita, J.M., Hatfield, R.D., Ralph, S.A., Christensen, J.H., et al. (2004). Lignins: Natural polymers from oxidative coupling of 4-hydroxyphenyl-propanoids. Phytochem. Rev. 3, 29–60. Rambaldi, D., and Ciccarelli, F.D. (2009). FancyGene: dynamic visualization of gene structures and protein domain architectures on genomic loci. Bioinforma. Oxf. Engl. 25, 2281–2282.

Ranik, M., and Myburg, A.A. (2006). Six new cellulose synthase genes from Eucalyptus are associated with primary and secondary cell wall biosynthesis. Tree Physiol. 26, 545–556. Ranik, M., Creux, N.M., and Myburg, A.A. (2006). Within-tree transcriptome profiling in wood- forming tissues of a fast-growing Eucalyptus tree. Tree Physiol. 26, 365–375. Reinhart, B.J., Weinstein, E.G., Rhoades, M.W., Bartel, B., and Bartel, D.P. (2002). MicroRNAs in plants. Genes Dev. 16, 1616–1626. Rencoret, J., Gutiérrez, A., Nieto, L., Jiménez-Barbero, J., Faulds, C.B., Kim, H., Ralph, J., Martínez, A.T., and Del Río, J.C. (2011). Lignin composition and structure in young versus adult Eucalyptus globulus plants. Plant Physiol. 155, 667–682. Rengel, D., Clemente, H.S., Servant, F., Ladouce, N., Paux, E., Wincker, P., Couloux, A., Sivadon, P., and Grima-Pettenati, J. (2009). A new genomic resource dedicated to wood formation in Eucalyptus. Bmc Plant Biol. 9. Rhoades, M.W., Reinhart, B.J., Lim, L.P., Burge, C.B., Bartel, B., and Bartel, D.P. (2002). Prediction of plant microRNA targets. Cell 110, 513–520. Richter, C. (2014). Wood Characteristics: Description, Causes, Prevention, Impact on Use and Technological Adaptation (Springer). Robischon, M., Du, J., Miura, E., and Groover, A. (2011). The Populus Class III HD ZIP, popREVOLUTA, Influences Cambium Initiation and Patterning of Woody Stems. Plant Physiol. 155, 1214–1225. Rockwood, D.L., Rudie, A.W., Ralph, S.A., Zhu, J.Y., and Winandy, J.E. (2008). Energy Product Options for Eucalyptus Species Grown as Short Rotation Woody Crops. Int. J. Mol. Sci. 9, 1361–1378. Rogers, K., and Chen, X. (2013). Biogenesis, Turnover, and Mode of Action of Plant MicroRNAs[OPEN]. Plant Cell 25, 2383–2399. Rogers, L.A., and Campbell, M.M. (2004). The genetic control of lignin deposition during plant growth and development. New Phytol. 164, 17–30. 
Rohde, A., Morreel, K., Ralph, J., Goeminne, G., Hostyn, V., De Rycke, R., Kushnir, S., Van Doorsselaere, J., Joseleau, J.P., Vuylsteke, M., et al. (2004). Molecular phenotyping of the pal1 and pal2 mutants of Arabidopsis thaliana reveals far-reaching consequences on phenylpropanoid, amino Acid, and carbohydrate metabolism. Plant Cell 16, 2749–2771. Romualdi, C., Bortoluzzi, S., D’Alessi, F., and Danieli, G.A. (2003). IDEG6: a web tool for detection of differentially expressed genes in multiple tag sampling experiments. Physiol. Genomics 12, 159–162. Rowan, D.D., Cao, M., Lin-Wang, K., Cooney, J.M., Jensen, D.J., Austin, P.T., Hunt, M.B., Norling, C., Hellens, R.P., Schaffer, R.J., et al. (2009). Environmental regulation of leaf colour in red 35S:PAP1 Arabidopsis thaliana. New Phytol. 182, 102–115. Ruijter, J.M., Ramakers, C., Hoogaars, W.M.H., Karlen, Y., Bakker, O., van den Hoff, M.J.B., and Moorman, A.F.M. (2009). Amplification efficiency: linking baseline and bias in the analysis of quantitative PCR data. Nucleic Acids Res. 37, e45. Ruttink, T., Arend, M., Morreel, K., Storme, V., Rombauts, S., Fromm, J., Bhalerao, R.P., Boerjan, W., and Rohde, A. (2007). A Molecular Timetable for Apical Bud Formation and Dormancy Induction in Poplar. Plant Cell 19, 2370–2390. Ruzicka, K., Ursache, R., Hejatko, J., and Helariutta, Y. (2015). Xylem development - from the cradle to the grave. New Phytol. 207, 519–535. Salazar, M.M., Nascimento, L.C., Oliveira Camargo, E.L., Goncalves, D.C., Neto, J.L., Marques, W.L., Pereira Lima Teixeira, P.J., Mieczkowski, P., Costa Mondego, J.M., Carazzolle, M.F., et al. (2013). Xylem transcription profiles indicate potential metabolic responses for economically relevant characteristics of Eucalyptus species. Bmc Genomics 14, 201. Santos, R.B., Hart, P., Jameel, H., and Chang, H. (2013). Wood Based Lignin Reactions Important to the Biorefinery and Pulp and Paper Industries. BioResources 8, 1456–1477. Scheller, H.V., and Ulvskov, P. (2010). 
Hemicelluloses. Annu. Rev. Plant Biol. 61, 263–289. Schrader, J. (2003). Developmental biology of wood formation - finding regulatory factors through functional genomics. Schrader, J., Nilsson, J., Mellerowicz, E., Berglund, A., Nilsson, P., Hertzberg, M., and Sandberg, G. (2004). A high-resolution transcript profile across the wood-forming meristem of poplar identifies potential regulators of cambial stem cell identity. Plant Cell 16, 2278–2292. Schröder, F., Lisso, J., and Müssig, C. (2011). EXORDIUM-LIKE1 Promotes Growth during Low Carbon Availability in Arabidopsis. Plant Physiol. 156, 1620–1630. Schuetz, M., Smith, R., and Ellis, B. (2013). Xylem tissue specification, patterning, and differentiation mechanisms. J. Exp. Bot. 64, 11–31. Schwerin, G. (1958). The Chemistry of Reaction Wood. Part II. The Polysaccharides of Eucalyptus goniocalyx and Pinus radiata. Holzforsch. - Int. J. Biol. Chem. Phys. Technol. Wood 12, 43–48. Sederoff, R., Myburg, A., and Kirst, M. (2009). Genomics, domestication, and evolution of forest trees. Cold Spring Harb. Symp. Quant. Biol. 74, 303–317.

Sehr, E.M., Agusti, J., Lehner, R., Farmer, E.E., Schwarz, M., and Greb, T. (2010). Analysis of secondary growth in the Arabidopsis shoot reveals a positive role of jasmonate signalling in cambium formation. Plant J. 63, 811–822. Seo, P.J., Lee, S.B., Suh, M.C., Park, M.-J., Go, Y.S., and Park, C.-M. (2011). The MYB96 transcription factor regulates cuticular wax biosynthesis under drought conditions in Arabidopsis. Plant Cell 23, 1138–1152. Sexton, T., Henry, R., Harwood, C., Thomas, D., McManus, L., Raymond, C., Henson, M., and Shepherd, M. (2011). SNP discovery and association mapping in Eucalyptus pilularis (blackbutt). BMC Proc. 5, 1–2. Shepherd, M., Bartle, J., Lee, D., Brawner, J., Bush, J., Turnbull, P., MacDonnell, P., Brown, T., Simmons, B., and Henry, R. (2011). Eucalypts as a biofuel feedstock. Biofuels 639–657. Shi, R., Sun, Y., Li, Q., Heber, S., Sederoff, R., and Chiang, V. (2010a). Towards a Systems Approach for Lignin Biosynthesis in Populus trichocarpa: Transcript Abundance and Specificity of the Monolignol Biosynthetic Genes. Plant Cell Physiol. 51, 144–163. Shi, R., Yang, C., Lu, S., Sederoff, R., and Chiang, V.L. (2010b). Specific down-regulation of PAL genes by artificial microRNAs in Populus trichocarpa. Planta 232, 1281–1288. Shim, D., Ko, J.-H., Kim, W.-C., Wang, Q., Keathley, D.E., and Han, K.-H. (2014). A molecular framework for seasonal growth-dormancy regulation in perennial plants. Hortic. Res. 1, 14059. Shinya, T., Hayashi, K., Onogi, S., and Kawaoka, A. (2014). Transcript Level Analysis of Lignin and Flavonoid Biosynthesis Related Genes in Eucalyptus globulus. Am. J. Plant Sci. 5, 2764–2772. Shriram, V., Kumar, V., Devarumath, R.M., Khare, T.S., and Wani, S.H. (2016). MicroRNAs As Potential Targets for Abiotic Stress Tolerance in Plants. Plant Biotechnol. 817. Sibout, R., Eudes, A., Pollet, B., Goujon, T., Mila, I., Granier, F., Seguin, A., Lapierre, C., and Jouanin, L. (2003). 
Expression pattern of two paralogs encoding cinnamyl alcohol dehydrogenases in Arabidopsis. Isolation and characterization of the corresponding mutants. Plant Physiol. 132, 848–860. Sibout, R., Eudes, A., Mouille, G., Pollet, B., Lapierre, C., Jouanin, L., and Seguin, A. (2005). CINNAMYL ALCOHOL DEHYDROGENASE-C and -D are the primary genes involved in lignin biosynthesis in the floral stem of Arabidopsis. Plant Cell 17, 2059–2076. Silva-Junior, O.B., and Grattapaglia, D. (2015). Genome-wide patterns of recombination, linkage disequilibrium and nucleotide diversity from pooled resequencing and single nucleotide polymorphism genotyping unlock the evolutionary history of Eucalyptus grandis. New Phytol. 208, 830–845. Silva-Junior, O.B., Faria, D.A., and Grattapaglia, D. (2015). A flexible multi-species genome-wide 60K SNP chip developed from pooled resequencing of 240 Eucalyptus tree genomes across 12 species. New Phytol. 206, 1527–1540. Smedley, D., Haider, S., Durinck, S., Pandini, L., Provero, P., Allen, J., Arnaiz, O., Awedh, M.H., Baldock, R., Barbiera, G., et al. (2015). The BioMart community portal: an innovative alternative to large, centralized data repositories. Nucleic Acids Res. 43, W589-598. Smith, R.A., Schuetz, M., Roach, M., Mansfield, S.D., Ellis, B., and Samuels, L. (2013). Neighboring parenchyma cells contribute to Arabidopsis xylem lignification, while lignification of interfascicular fibers is cell autonomous. Plant Cell 25, 3988–3999. Soler, M., Camargo, E.L.O., Carocha, V., Cassan-Wang, H., San Clemente, H., Savelli, B., Hefer, C.A., Paiva, J.A.P., Myburg, A.A., and Grima-Pettenati, J. (2015). The Eucalyptus grandis R2R3-MYB transcription factor family: evidence for woody growth-related evolution and function. New Phytol. 206, 1364–1377. Solomon, O.L., Berger, D.K., and Myburg, A.A. (2010). Diurnal and circadian patterns of gene expression in the developing xylem of Eucalyptus trees. South Afr. J. Bot. 76, 425–439. 
Souza, C., Barbazuk, B., Ralph, S., Bohlmann, J., Hamberger, B., and Douglas, C. (2008). Genome-wide analysis of a land plant-specific acyl:coenzymeA synthetase (ACS) gene family in Arabidopsis, poplar, rice and Physcomitrella. New Phytol. 179, 987–1003. Spicer, R., and Groover, A. (2010). Evolution of development of vascular cambia and secondary growth. New Phytol. 186, 577–592. Stackpole, D.J., Vaillancourt, R.E., Alves, A., Rodrigues, J., and Potts, B.M. (2011). Genetic Variation in the Chemical Components of Eucalyptus globulus Wood. G3 Genes Genomes Genetics 1, 151–159. Stewart, J., Akiyama, T., Chapple, C., Ralph, J., and Mansfield, S. (2009). The Effects on Lignin Structure of Overexpression of Ferulate 5-Hydroxylase in Hybrid Poplar. Plant Physiol. 150, 621–635. Stocks, M.B., Moxon, S., Mapleson, D., Woolfenden, H.C., Mohorianu, I., Folkes, L., Schwach, F., Dalmay, T., and Moulton, V. (2012). The UEA sRNA workbench: a suite of tools for analysing and visualizing next generation sequencing microRNA and small RNA datasets. Bioinforma. Oxf. Engl. 28, 2059–2061.

Stracke, R., Werber, M., and Weisshaar, B. (2001). The R2R3-MYB gene family in Arabidopsis thaliana. Curr. Opin. Plant Biol. 4, 447–456. Suer, S., Agusti, J., Sanchez, P., Schwarz, M., and Greb, T. (2011). WOX4 imparts auxin responsiveness to cambium cells in Arabidopsis. Plant Cell 23, 3247–3259. Sun, Y.-H., Shi, R., Zhang, X.-H., Chiang, V.L., and Sederoff, R.R. (2012). MicroRNAs in trees. Plant Mol. Biol. 80, 37–53. Sunkar, R., and Zhu, J.K. (2004). Novel and stress-regulated microRNAs and other small RNAs from Arabidopsis. Plant Cell 16, 2001–2019. Supek, F., Bošnjak, M., Škunca, N., and Šmuc, T. (2011). REVIGO summarizes and visualizes long lists of gene ontology terms. PloS One 6, e21800. Takahashi Schmidt, J. (2008). Functional studies of selected extracellular carbohydrate-active hydrolases in wood formation. Tamagnone, Merida, Parr, Mackay, Culianez-Macia, Roberts, and Martin (1998). The AmMYB308 and AmMYB330 transcription factors from Antirrhinum regulate phenylpropanoid and lignin biosynthesis in transgenic tobacco. Plant Cell 10, 135–154. Tamasloukht, B., Lam, M.S.-J.W.Q., Martinez, Y., Tozo, K., Barbier, O., Jourda, C., Jauneau, A., Borderies, G., Balzergue, S., Renou, J.-P., et al. (2011). Characterization of a cinnamoyl-CoA reductase 1 (CCR1) mutant in maize: effects on lignification, fibre development, and global gene expression. J. Exp. Bot. 62, 3837–3848. Tamura, K., Peterson, D., Peterson, N., Stecher, G., Nei, M., and Kumar, S. (2011). MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol. Biol. Evol. 28, 2731–2739. Tamura, K., Stecher, G., Peterson, D., Filipski, A., and Kumar, S. (2013). MEGA6: Molecular Evolutionary Genetics Analysis Version 6.0. Mol. Biol. Evol. 30, 2725–2729. Taylor, L.P., and Grotewold, E. (2005). Flavonoids as developmental regulators. Curr. Opin. Plant Biol. 8, 317–323.
Taylor, R.S., Tarver, J.E., Hiscock, S.J., and Donoghue, P.C.J. (2014). Evolutionary history of plant microRNAs. Trends Plant Sci. 19, 175–182. Taylor-Teeples, M., Lin, L., de Lucas, M., Turco, G., Toal, T.W., Gaudinier, A., Young, N.F., Trabucco, G.M., Veling, M.T., Lamothe, R., et al. (2015). An Arabidopsis gene regulatory network for secondary cell wall synthesis. Nature 517, 571–575. Telewski, F.W. (2006). A unified hypothesis of mechanoperception in plants. Am. J. Bot. 93, 1466–1476. Thimm, O., Bläsing, O., Gibon, Y., Nagel, A., Meyer, S., Krüger, P., Selbig, J., Müller, L.A., Rhee, S.Y., and Stitt, M. (2004). MAPMAN: a user-driven tool to display genomics data sets onto diagrams of metabolic pathways and other biological processes. Plant J. Cell Mol. Biol. 37, 914–939. Thioulouse, J., and Dray, S. (2007). Interactive Multivariate Data Analysis in R with the ade4 and ade4TkGUI Packages. J. Stat. Softw. 22, 1–14. Thumma, B.R., Nolan, M.F., Evans, R., and Moran, G.F. (2005). Polymorphisms in Cinnamoyl CoA Reductase (CCR) Are Associated With Variation in Microfibril Angle in Eucalyptus spp. Genetics 171, 1257–1265. Thumma, B.R., Sharma, N., and Southerton, S.G. (2012). Transcriptome sequencing of Eucalyptus camaldulensis seedlings subjected to water stress reveals functional single nucleotide polymorphisms and genes under selection. BMC Genomics 13, 364. Timell, T.E. (1986). Compression wood in gymnosperms (Springer-Verlag: Berlin, etc). Trapnell, C., Pachter, L., and Salzberg, S.L. (2009). TopHat: discovering splice junctions with RNA-Seq. Bioinforma. Oxf. Engl. 25, 1105–1111. Trapnell, C., Williams, B.A., Pertea, G., Mortazavi, A., Kwan, G., van Baren, M.J., Salzberg, S.L., Wold, B.J., and Pachter, L. (2010). Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511–515.
Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D.R., Pimentel, H., Salzberg, S.L., Rinn, J.L., and Pachter, L. (2012). Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protoc. 7, 562–578. Trindade, I., Capitao, C., Dalmay, T., Fevereiro, M.P., and dos Santos, D.M. (2010). miR398 and miR408 are up-regulated in response to water deficit in Medicago truncatula. Planta 231, 705–716. Trindade, I., Santos, D., Dalmay, T., and Fevereiro, P. (2011). Facing the Environment: Small RNAs and the Regulation of Gene Expression Under Abiotic Stress in Plants. In Abiotic Stress Response in Plants - Physiological, Biochemical and Genetic Perspectives, A. Shanker, ed. (InTech), p. Tsai, C.-J., Kayal, E., and Harding, S. (2006). Populus, the new model system for investigating phenylpropanoid complexity. Int. J. Appl. Sci. Eng. 4, 221–233.

Turchi, L., Baima, S., Morelli, G., and Ruberti, I. (2015). Interplay of HD-Zip II and III transcription factors in auxin-regulated plant development. J. Exp. Bot. 66, 5043–5053. Turner, S.R., and Somerville, C.R. (1997). Collapsed xylem phenotype of Arabidopsis identifies mutants deficient in cellulose deposition in the secondary cell wall. Plant Cell 9, 689–701. Ulitsky, I., Maron-Katz, A., Shavit, S., Sagir, D., Linhart, C., Elkon, R., Tanay, A., Sharan, R., Shiloh, Y., and Shamir, R. (2010). Expander: from expression microarrays to networks and functions. Nat. Protoc. 5, 303–322. Umezawa, T. (2010). The cinnamate/monolignol pathway. Phytochem. Rev. 9, 1–17. Untergasser, A., Cutcutache, I., Koressaar, T., Ye, J., Faircloth, B.C., Remm, M., and Rozen, S.G. (2012). Primer3-new capabilities and interfaces. Nucleic Acids Res. 40. Vahala, J., Felten, J., Love, J., Gorzsás, A., Gerber, L., Lamminmäki, A., Kangasjärvi, J., and Sundberg, B. (2013). A genome-wide screen for ethylene-induced Ethylene Response Factors (ERFs) in hybrid aspen stem identifies ERF genes that modify stem growth and wood properties. New Phytol. 200, 511–522. Vanholme, R., Ralph, J., Akiyama, T., Lu, F., Pazo, J.R., Kim, H., Christensen, J.H., Van Reusel, B., Storme, V., De Rycke, R., et al. (2010). Engineering traditional monolignols out of lignin by concomitant up-regulation of F5H1 and down-regulation of COMT in Arabidopsis. Plant J. 64, 885–897. Vanholme, R., Cesarino, I., Rataj, K., Xiao, Y., Sundin, L., Goeminne, G., Kim, H., Cross, J., Morreel, K., Araujo, P., et al. (2013). Caffeoyl Shikimate Esterase (CSE) Is an Enzyme in the Lignin Biosynthetic Pathway in Arabidopsis. Science 341, 1103–1106. Vermaas, H. (1995). Drying eucalypts for quality: Material characteristics, pre-drying treatments, drying methods, schedules and optimization of drying quality. (São Paulo: IPEF/IPT), pp. 119–132. Victor, M. (2007). MicroRNAs in the differentiating tissues of Populus and Eucalyptus trees. Thesis.
Villar, E., Klopp, C., Noirot, C., Novaes, E., Kirst, M., Plomion, C., and Gion, J.-M. (2011). RNA-Seq reveals genotype-specific molecular responses to water deficit in eucalyptus. BMC Genomics 12, 538. Vining, K.J., Romanel, E., Jones, R.C., Klocko, A., Alves-Ferreira, M., Hefer, C.A., Amarasinghe, V., Dharmawardhana, P., Naithani, S., Ranik, M., et al. (2015). The floral transcriptome of Eucalyptus grandis. New Phytol. 206, 1406–1422. Voinnet, O. (2009). Origin, Biogenesis, and Activity of Plant MicroRNAs. Cell 136, 669–687. Voorrips, R.E. (2002). MapChart: Software for the graphical presentation of linkage maps and QTLs. J. Hered. 93, 77–78. Wang, J.-W., Park, M.Y., Wang, L.-J., Koo, Y., Chen, X.-Y., Weigel, D., and Poethig, R.S. (2011). MiRNA Control of Vegetative Phase Change in Trees. Plos Genet. 7. Wang, Y., Wang, X., and Paterson, A.H. (2012). Genome and gene duplications and gene expression divergence: a view from plants. Ann. N. Y. Acad. Sci. 1256, 1–14. Washusen, R., and Ilic, J. (2001). Relationship between transverse shrinkage and tension wood from three provenances of Eucalyptus globulus Labill. Holz Als Roh- Werkst. 59, 85–93. Washusen, R., Ades, P., and Vinden, P. (2002). Tension wood occurrence in Eucalyptus globulus Labill. I. The spatial distribution of tension wood in one 11-year-old tree. Aust. For. 65, 120–126. Watanabe, Y., Meents, M.J., McDonnell, L.M., Barkwill, S., Sampathkumar, A., Cartwright, H.N., Demura, T., Ehrhardt, D.W., Samuels, A.L., and Mansfield, S.D. (2015). Visualization of cellulose synthases in Arabidopsis secondary cell walls. Science 350, 198–203. Weng, J.-K., Li, Y., Mo, H., and Chapple, C. (2012). Assembly of an evolutionarily new pathway for α-pyrone biosynthesis in Arabidopsis. Science 337, 960–964. Wilkins, O., Nahal, H., Foong, J., Provart, N.J., and Campbell, M.M. (2009). Expansion and diversification of the Populus R2R3-MYB family of transcription factors. Plant Physiol. 149, 981–993.
Winzell, A., Aspeborg, H., Wang, Y., and Ezcurra, I. (2010). Conserved CA-rich motifs in gene promoters of PtxtMYB021-responsive secondary cell wall carbohydrate-active enzymes in Populus. Biochem. Biophys. Res. Commun. 394, 848–853. Wong, M.M.L., Cannon, C.H., and Wickneswari, R. (2011). Identification of lignin genes and regulatory sequences involved in secondary cell wall formation in Acacia auriculiformis and Acacia mangium via de novo transcriptome sequencing. BMC Genomics 12. Xia, R., Zhu, H., An, Y.-Q., Beers, E.P., and Liu, Z. (2012). Apple miRNAs and tasiRNAs with novel regulatory networks. Genome Biol. 13, R47. Xu, Z., Zhang, D., Hu, J., Zhou, X., Ye, X., Reichel, K.L., Stewart, N.R., Syrenne, R.D., Yang, X., Gao, P., et al. (2009). Comparative genome analysis of lignin biosynthesis gene families across the plant kingdom. BMC Bioinformatics 10 Suppl 11, S3. Xue, L.-J., Zhang, J.-J., and Xue, H.-W. (2009). Characterization and expression profiles of miRNAs in rice seeds. Nucleic Acids Res. 37, 916–930.

Xue, W., Wang, Z., Du, M., Liu, Y., and Liu, J.-Y. (2013). Genome-wide analysis of small RNAs reveals eight fiber elongation-related and 257 novel microRNAs in elongating cotton fiber cells. BMC Genomics 14, 629. Yang, J.H., and Wang, H. (2016). Molecular Mechanisms for Vascular Development and Secondary Cell Wall Formation. Front. Plant Sci. 7. Yanhui, C., Xiaoyuan, Y., Kun, H., Meihua, L., Jigang, L., Zhaofeng, G., Zhiqiang, L., Yunfei, Z., Xiaoxiao, W., Xiaoming, Q., et al. (2006). The MYB transcription factor superfamily of Arabidopsis: expression analysis and phylogenetic comparison with the rice MYB family. Plant Mol. Biol. 60, 107–124. Ye, Z.-H. (2002). Vascular tissue differentiation and pattern formation in plants. Annu. Rev. Plant Biol. 53, 183–202. Ye, Z., and Varner, J. (1995). Differential expression of two O-methyltransferases in lignin biosynthesis in Zinnia elegans. Plant Physiol. 108, 459–467. Yoshida, M., Ohta, H., Yamamoto, H., and Okuyama, T. (2002). Tensile growth stress and lignin distribution in the cell walls of yellow poplar, Liriodendron tulipifera Linn. Trees 16, 457–464. Yu, B., Yang, Z., Li, J., Minakhina, S., Yang, M., Padgett, R.W., Steward, R., and Chen, X. (2005). Methylation as a crucial step in plant microRNA biogenesis. Science 307, 932–935. Yu, H., Soler, M., Mila, I., Clemente, H.S., Savelli, B., Dunand, C., Paiva, J.A.P., Myburg, A.A., Bouzayen, M., Grima-Pettenati, J., et al. (2014). Genome-Wide Characterization and Expression Profiling of the AUXIN RESPONSE FACTOR (ARF) Gene Family in Eucalyptus grandis. Plos One 9, e108906. Yu, H., Soler, M., Clemente, H.S., Mila, I., Paiva, J.A.P., Myburg, A.A., Bouzayen, M., Grima-Pettenati, J., and Cassan-Wang, H. (2015). Comprehensive Genome-Wide Analysis of the Aux/IAA Gene Family in Eucalyptus: Evidence for the Role of EgrIAA4 in Wood Formation. Plant Cell Physiol. 56, 700–714. Zdobnov, E.M., and Apweiler, R. (2001).
InterProScan--an integration platform for the signature-recognition methods in InterPro. Bioinforma. Oxf. Engl. 17, 847–848. Zhang, B. (2015). MicroRNA: a new target for improving plant tolerance to abiotic stress. J. Exp. Bot. 66, 1749–1761. Zhang, J. (2003). Evolution by gene duplication: an update. Trends Ecol. Evol. 18, 292–298. Zhang, Q., and Backström, N. (2014). Assembly errors cause false tandem duplicate regions in the chicken (Gallus gallus) genome sequence. Chromosoma 123, 165–168. Zhang, Y.-C., and Chen, Y.-Q. (2013). Long noncoding RNAs: new regulators in plant development. Biochem. Biophys. Res. Commun. 436, 111–114. Zhao, K., and Bartley, L.E. (2014). Comparative genomic analysis of the R2R3 MYB secondary cell wall regulators of Arabidopsis, poplar, rice, maize, and switchgrass. BMC Plant Biol. 14, 135. Zhong, R., and Ye, Z.-H. (2014). Complexity of the transcriptional network controlling secondary wall biosynthesis. Plant Sci. Int. J. Exp. Plant Biol. 229, 193–207. Zhong, R., Demura, T., and Ye, Z.-H. (2006). SND1, a NAC domain transcription factor, is a key regulator of secondary wall synthesis in fibers of Arabidopsis. Plant Cell 18, 3158–3170. Zhong, R., Lee, C., and Ye, Z.-H. (2010). Evolutionary conservation of the transcriptional network regulating secondary cell wall biosynthesis. Trends Plant Sci. 15, 625–632. Zhou, L., Liu, Y., Liu, Z., Kong, D., Duan, M., and Luo, L. (2010). Genome-wide identification and analysis of drought-responsive microRNAs in Oryza sativa. J. Exp. Bot. 61, 4157–4168. Zhu, Y., Song, D., Sun, J., Wang, X., and Li, L. (2013). PtrHB7, a class III HD-Zip Gene, Plays a Critical Role in Regulation of Vascular Cambium Differentiation in Populus. Mol. Plant 6, 1331–1343. Zhu, Z., An, F., Feng, Y., Li, P., Xue, L., A, M., Jiang, Z., Kim, J.-M., To, T.K., Li, W., et al. (2011). Derepression of ethylene-stabilized transcription factors (EIN3/EIL1) mediates jasmonate and ethylene signaling synergy in Arabidopsis. Proc. Natl.
Acad. Sci. U. S. A. 108, 12539–12544. Zieher, C. (2010). Biochemistry of the Fiber. In Physiology of Cotton, P.J.M. Stewart, P.D.M. Oosterhuis, P.J.J. Heitholt, and D.J.R. Mauney, eds. (Springer Netherlands), pp. 361–378. Zobel, B., and Buijtenen, J. van (1989). Wood Variation. Its causes and control. Zubieta, C., Kota, P., Ferrer, J.-L., Dixon, R.A., and Noel, J.P. (2002). Structural basis for the modulation of lignin monomer methylation by caffeic acid/5-hydroxyferulic acid 3/5-O-methyltransferase. Plant Cell 14, 1265–1277.
