The Pennsylvania State University

The Graduate School

The Huck Institutes of Life Sciences

IDENTIFICATION AND FUNCTIONAL CHARACTERIZATION OF GLYCOSYL

HYDROLASE FAMILY 1 (GH1) GENES ACTING ON MONOLIGNOL SUBSTRATES

IN POPLAR AND LOBLOLLY PINE

A Thesis in

Integrative Biosciences

by

Anushree Sengupta

© Anushree Sengupta

Submitted in Partial Fulfillment of the Requirements for the Degree of

Master of Science

August 2012

The thesis of Anushree Sengupta was reviewed and approved* by:

John E. Carlson Professor of Molecular Genetics Thesis Adviser

Claude dePamphilis Professor of Biology

Dawn Luthe Professor of Stress Biology

Teh-hui Kao Professor of Biochemistry & Molecular Biology Chair, Intercollege Graduate Degree Program in Plant Biology

*Signatures on file in the Graduate School.


Abstract

β-glucosidases (BGLUs) are members of the glycosyl hydrolase family 1 (GH1) group of enzymes, which play important roles in several physiological processes in plants. Coniferin β-glucosidase (CBG), identified by Dharmawardhana et al. (1995) in Pinus contorta, was the first BGLU shown to act specifically on monolignol substrates. It removes the glucose moiety from coniferin, the glucosylated form of coniferyl alcohol, prior to its polymerization into lignin.

Our goal is to identify the β-glucosidase genes in poplar and pine that are specific to lignin monomers. We used the pine CBG cDNA sequence as a BLAST query against all plant nodes in Phytozome (JGI) to identify other CBG genes. After manual curation of the aligned sequences, we constructed an NJ tree with 1000 bootstrap replicates.

We retrieved 40 GH1 family genes from the poplar genome and, by phylogenetic analysis, identified 6 poplar genes that cluster with CBG and other lignin monomer specific genes, including two forms of the CBG gene in Pinus contorta and four forms of the gene in Pinus taeda identified previously in our lab (Song Liu, unpublished data). We analyzed the expression of the 6 putative lignification genes in leaf, xylem and phloem samples collected at different time points from Populus balsamifera, a poplar closely related to P. trichocarpa.

Expression analysis of xylem, phloem and leaf samples collected in April, June and September shows a decrease in expression of the genes from April to September in xylem and phloem. Expression is also higher in the vascular tissues, where lignin is present, than in the leaves.


Contents

LIST OF FIGURES

LIST OF TABLES

LIST OF ABBREVIATIONS

LIST OF OVERSIZED MATERIALS (IN ELECTRONIC VERSION)

ACKNOWLEDGEMENTS

CHAPTER 1: INTRODUCTION AND LITERATURE REVIEW

Synthesis of monolignols

Extracellular steps in lignin biosynthesis

Model organisms

CHAPTER 2: EVOLUTIONARY ANALYSIS OF THE GLYCOSYL HYDROLASE FAMILY 1 (GH1) GENES IN PLANTS AND IDENTIFICATION OF PUTATIVE GENES INVOLVED IN LIGNIFICATION

Introduction

Materials and Methods

Results

Discussion and Conclusions

CHAPTER 3: DETERMINING THE EXPRESSION OF LIGNIN SUB-CLADE GH1 GENES

Introduction

Materials and Methods

Collection of plant tissue samples

qRT-PCR primer design and validation

Extraction of RNA and qRT-PCR

Normalization of the results

Statistical analysis of data

Results and Analysis of data

Expression of CBG-like genes in Populus balsamifera

Expression of CBG-like genes in Pinus taeda

Conclusions and Discussion

CHAPTER 4: LOCALIZATION OF LIGNIN IN POPLAR STEMS

Introduction

Materials and Methods

Collection of specimens

Preparation for microscopy

Results

Conclusions and Discussion

CHAPTER 5: GENERAL DISCUSSION AND THE BROADER PERSPECTIVE

LITERATURE CITED

APPENDIX A ALL SEQUENCES IN PHYLOGENETIC TREE

APPENDIX B TABLES, FIGURES AND RAW DATA FOR CHAPTER 3

PART A – Poplar samples

PART B – Loblolly Pine samples


LIST OF FIGURES

Figure 1.1 Mode of Action of β-glucosidase

Figure 2.1 Available as Supplemental figure 3 in computerized version. NJ analysis of amino acid alignments of the putative lignin clade (PLC) of GH1 family genes that contains CBG-like genes which putatively act on lignin monomer substrates.

Figure 2.2 A Maximum-Likelihood (ML) tree of the PLC obtained by amino acid alignments and 500 bootstrap reiterations. Available as Supplemental figure 3 in computerized version.

Figure 2.3 Synteny of putative lignification β-glucosidase in Populus showing tandem duplicates in Poplar PLC genes

Figure 2.4A Muscle v3.8 alignment of the CDS sequence of PoptrGH1-26 and PoptrGH1-27

Figure 2.4B Muscle v3.8 alignment of 100bp upstream region of PoptrGH1-26 and PoptrGH1-27

Figure 2.5A and B Comparison of the 6 Populus CBG-like genes with the predicted ancestral GH1 gene structure, consisting of 13 exons

Figure 2.6A Sequence Logos for amino acids in the catalytic acid/base domain in all GH1 sequences

Figure 2.6B Sequence Logos for amino acids in the catalytic acid/base domain in Lignin clade only (PLC)

Figure 2.7A Sequence Logos for amino acids in the catalytic nucleophile domain in all GH1 sequences

Figure 2.7B Sequence Logos for amino acids in the catalytic nucleophile domain in the Lignin clade only (PLC)

Figure 3.1 A to D Graphs of dCt using UBQ Vs time in Populus balsamifera

Figure 3.2 A to D Graphs of dCt using PoptrGH1-34 Vs time in Populus balsamifera

Figure 3.3 A and B Graphs of dCt using Act2 Vs time in Pinus taeda

Figure 3.4 A and B Percentage graph of dCt using Act2 Vs time in Pinus taeda

Figure 4.1 Autofluorescence CLSM images of cross section of Populus balsamifera stem, with 405nm excitation

Figure 4.2 Fluorescent CLSM images of cross sections of Populus balsamifera stems stained in basic fuchsin


LIST OF TABLES

Table 2.1 Poplar accession numbers and the corresponding GH1 names

Table 3.1a Primer Sequences used for RT-PCR assay of gene expression in Poplar

Table 3.1b Primer Sequences used for RT-PCR assay of gene expression in Pine

Table 3.2a Summary of ANOVA results for comparisons of biological replicates and tissues; data normalized with UBQ11

Table 3.2b Summary of ANOVA results for comparisons of biological replicates and months; data normalized with UBQ11

Table 3.3a Summary of ANOVA results for comparisons of biological replicates and tissues; data normalized with GH1-34

Table 3.3b Summary of ANOVA results for comparisons of biological replicates and months; data normalized with GH1-34


LIST OF ABBREVIATIONS

GH1 Glycosyl Hydrolase Family 1

CBG Coniferin Beta Glucosidase

PLC Putative Lignin Clade

Poptr Populus trichocarpa

For complete names of all the abbreviations used in phylogenetic tree, please refer to Appendix A.


LIST OF OVERSIZED MATERIALS (in electronic version)

Supplemental Figure 1 Complete NJ tree

Supplemental Figure 2 NJ tree of PLC

Supplemental Figure 3 ML tree of PLC

All Sequences NJ tree Newick

PLC NJ Tree Newick

PLC ML Tree 100BS newick

All Sequences Protein Alignment NEXUS

PLC Protein alignment NEXUS


ACKNOWLEDGEMENTS

First of all, I’d like to thank my adviser, Dr. John Carlson, and my committee members, Dr. Dawn Luthe, Dr. Claude dePamphilis and former committee member Dr. David Braun, for their help with this project.

I would like to thank the various centers from which I received funding, which include the Huck Institutes of Life Sciences, the Intercollege Graduate Degree Program in Plant Biology, the Schatz Center for Tree Molecular Genetics in the School of Forest Resources at Pennsylvania State University, and the Center for Lignocellulose Structure and Function, an Energy Frontier Research Center funded by the US Department of Energy.

I would also like to thank Denis S. DiLoreto, Nicole Zembower, Lena Landherr Sheaffer, Paula E. Ralph and Yuannian Jiao, as well as other Carlson Lab members: Joshua Verbano, Joshua R. Herr, Tyler K. Wagner, Chien-Chih Chen, Charles Addo-Quaye, Teodora Best, and various other members of the Plant Biology community.

My special acknowledgement goes to my parents and my husband, all of whom patiently put up with me and supported me in every step of this project. They stood strong to give me the strength to finish my defense within a couple of months after my mother passed away.

Finally, I would like to thank Dr. Howard Salis for allowing me to finish the thesis while I was working in his lab, and all Salis lab members, who have provided wonderful moral support.


Chapter 1:

Introduction and Literature Review

Lignin is an important component of the plant cell wall. Along with cellulose, lignin forms the backbone that allows plants to stand erect. The deposition of lignin in the cell wall of photosynthetic land plants is considered to have evolved around 430 million years ago (Boudet, 2000). Apart from providing mechanical support, lignin provides hydrophobicity to water-conducting cells (Dixon et al., 2001; Taiz and Zeiger, 2006). Lignin also plays a role in plant defense by providing a hydrophobic wall that counters the action of hydrolytic enzymes released by pathogens (Boudet, 2000; Taiz and Zeiger, 2006), and defends against herbivory by making the tissue difficult for herbivores to digest (Rogers and Campbell, 2004). It has also been postulated that lignins play a role in cell wall extension, but that has not yet been convincingly demonstrated (Boudet, 2000). Zhong et al. (1997) demonstrated that in the Arabidopsis interfascicular fiber mutant (ifl), which lacks lignin in interfascicular fibers, the stems grow long but cannot stand as erect as those of wild type plants, and instead lie on the ground. However, the same properties that make lignin useful to plants make it undesirable for the food and fodder industry, as its presence makes fodder difficult for animals to digest. In addition, the paper industry requires cost-intensive, polluting chemicals to remove lignin (Boudet, 2000). Lignin also interferes with cellulase enzymes during the industrial conversion of cellulose to alcohols for use as biofuel (Sticklen, 2006). Engineering plants to remove or reduce lignin content in a way that is not deleterious to the plant would benefit the fodder, energy and pulp industries.

Lignin is a polymer composed of the methoxylated and hydroxylated phenylpropanoid monomers known as monolignols. The monomers are coniferyl alcohol which is converted into the guaiacyl (G) subunit in lignin, sinapyl alcohol which is converted into the syringyl

(S) unit, and 4-coumaryl alcohol which gives rise to the p-hydroxyphenyl (H) subunit which is the only non-methylated lignin subunit (Boudet, 2000; Dixon et al., 2001). Gymnosperm lignins are mainly composed of coniferyl alcohol, dicot lignins are a mixture of coniferyl and sinapyl alcohol, and the monocot lignins are a mixture of all three monolignols (Amthor,

2003). Biosynthesis of lignin is an energy intensive process, and forms a substantial chunk of the carbon sink in plants (Rogers and Campbell, 2004).

Synthesis of monolignols

Phenylalanine is the precursor for the synthesis of monolignols (Boudet, 2000; Dixon et al., 2001; Boerjan et al., 2003; Rogers and Campbell, 2004; Weng and Chapple, 2010). Phenylalanine ammonia lyase (PAL) is the first enzyme in the monolignol biosynthetic pathway; it deaminates phenylalanine to yield trans-cinnamic acid. This is then converted to p-coumaroyl CoA by 4-hydroxycinnamoyl-CoA ligase (4CL), followed by the formation of para-coumarate esters with the help of hydroxycinnamoyl-CoA:shikimate hydroxycinnamoyl transferase (HCT). This is then converted to caffeoyl shikimic acid with the help of p-coumaroyl shikimate 3’-hydroxylase. The three types of lignin differ in the number of methyl ether side chains, which varies from zero (H-lignin) to two (S-lignin). Caffeoyl-CoA O-methyl transferase (CCoAOMT) catalyzes the transmethylation reaction, adding a methyl group to the phenolic hydroxyl group with S-adenosyl methionine (SAM) as the methyl donor. Ferulate 5-hydroxylase (F5H) shows a preference for coniferyl aldehyde as a substrate and converts it into 5-OH coniferaldehyde. Caffeic acid O-methyl transferase (COMT) aids in the methylation of the aldehydes to form the corresponding methylated aldehydes. Hydroxycinnamoyl-CoA reductase (CCR) converts the esters to their corresponding aldehyde forms, after which cinnamyl alcohol dehydrogenase (CAD) reduces the aldehydes to the monolignol alcohols. CAD is a multifunctional enzyme, and usually converts all the different forms of monolignol aldehydes into their corresponding alcohols. However, a homolog of CAD called SAD (sinapyl alcohol dehydrogenase) has been found in Populus tremuloides that specifically converts sinapyl aldehyde (S-lignin monomer) subunits into their alcohol form (Li et al., 2001; Bomati and Noel, 2005).

Potential orthologs of all the genes involved in the monolignol biosynthetic pathway have been found in the genome of the non-lignifying bryophyte Physcomitrella patens, although the fully sequenced green alga Chlamydomonas reinhardtii has none (Weng and Chapple, 2010). This suggests that the monolignol biosynthetic genes and products existed well before the evolution of lignified tracheophytes. It is also possible that the evolution of genes coding for cell wall enzymes involved in the transport and assembly of lignin played an important role in the evolution of lignified vascular plants.


Extracellular steps in lignin biosynthesis

Once the monolignols are synthesized inside the cell, they are transported to the cell wall for polymerization. The mechanisms by which monolignols are transported to the cell wall in preparation for polymerization are unclear and controversial. The 4-O-β-D-glucosides of monolignols (para-hydroxycinnamyl alcohol glucoside, coniferin, and syringin) accumulate in the cambial sap of all gymnosperms and some angiosperm trees, including five species of dicots in the Magnoliaceae and Oleaceae families (Terazawa et al., 1984). These glucosides have been suggested to be the transport and/or storage form of monolignols. The monolignol glucosides have been shown to accumulate in cambium during wood formation (Savidge, 1989) and to be efficient precursors for specific labeling of lignin in both gymnosperms and angiosperms. Samuels et al. (2002) found evidence for a role for the Golgi apparatus in the transport of monolignols in developing xylem cells. However, Kaneda et al. (2008) did not find any evidence for Golgi-derived transport of monolignols in an experiment where they fed radiolabeled phenylalanine ([3H]Phe) to dissected cambium tissues from Pinus contorta (lodgepole pine) (reviewed by Vanholme, 2010).

Before polymerization, the monolignols have to be cleaved from the glucose moiety. This is done by the enzyme β-glucosidase. Dharmawardhana et al. (1995) were the first to discover a β-glucosidase that acts specifically on a monolignol substrate, namely coniferin, converting it to coniferyl alcohol. The suggested mechanism of conversion is shown in Figure 1.1.


Once they are deglycosylated, the monolignols are dehydrogenated, followed by radical quenching and cross-coupling among the stable free radicals, resulting in the complex three-dimensional structures of lignin. The full length cDNA for the lodgepole pine coniferin β-glucosidase (CBG), consisting of 1909 base pairs, was cloned (Dharmawardhana et al., 1999). Recombinant CBG protein expressed in E. coli showed the same physical and kinetic characteristics as the native enzyme from xylem. Antibodies were generated to the recombinant CBG protein for use in sub-cellular localization of monolignols. The 23 amino acid N-terminal ER secretion signal peptide in the deduced 513 amino acid peptide sequence further supports a role for CBG in extracellular (cell wall) lignin polymerization events.

Figure 1.1: Mode of action of β-glucosidase in the cell wall

The pine CBG gene is grouped under the glycosyl hydrolase family 1 (GH1) according to the classification published by Henrissat in 1991 (subsequent reports by Henrissat and Davies, 1997; Bourne and Henrissat, 2001). Glycosyl hydrolases hydrolyze oxygen- or sulfur-linked β-glycosidic bonds (Czjzek et al., 2000). Czjzek et al. (2000) elucidated the identity of the amino acids involved and the mechanism of enzyme-substrate complex formation. They found that β-glucosidases have an aglycone (the non-sugar moiety of the substrate) binding subsite, which explains their narrow substrate specificity. A β-glucosidase capable of hydrolyzing coniferin was found in seedlings and in suspension culture systems (Hosel et al., 1982).

Following the publication of the complete Arabidopsis genomic sequence, Xu et al. (2004) took up the task of completely reannotating the glycoside hydrolase (GH) family 1 β-glucosidases in Arabidopsis. They found 48 GH1 family genes in Arabidopsis. A phylogenetic tree showed that 47 of the 48 members share the same ancestry and only one has a different ancestry. A phylogenetic analysis of the protein sequences of the 48 Arabidopsis β-glucosidase genes together with those of other plant taxa (Xu et al., 2004) produced a cluster of three of the sequences, BGLU45, BGLU46 and BGLU47, with the β-glucosidase of Pinus contorta that functions in lignification, as published by Dharmawardhana et al. (1995). This led Xu et al. (2004) to hypothesize that the gene products of BGLU45, BGLU46 and BGLU47 function in lignification. Escamilla-Trevino et al. (2006) followed up on the study by Xu et al. and established that the genes BGLU45 and BGLU46 are active in the lignin monomer deglucosylation step in lignification. Opassiri et al. (2006) published a similar analysis of the β-glucosidase genes in rice. They found that an organization of 13 exons and 12 introns is the most common for β-glucosidase genes in rice. In total, they reported genes with 13, 12, 11 or 9 exons, as well as intronless genes. Only three of the five gene structure patterns observed in rice were also observed in Arabidopsis by Xu et al. Two intronless genes were found in rice, but none in Arabidopsis.


Given the conserved nature of lignification genes, using primers designed from the Pinus contorta gene should yield analogous genes in other species. Our objective is to find β-glucosidases in Pinus taeda (loblolly pine) and Populus trichocarpa (poplar). The hypothesis for this study is that it is possible to deduce analogous genes in Pinus taeda and Populus trichocarpa using a comparative functional genomics approach based on the CBG gene in Pinus contorta. To expand the set of CBG sequences, and to learn whether the CBG gene is conserved in other species, we undertook a comparative analysis between lodgepole pine and loblolly pine. We searched for CBG genes in these two species with gradient PCR using degenerate primers designed to target regions of sequence homology. The amplified regions were sequenced and aligned with the existing pine CBG cDNA sequence. Exons and introns were identified by alignment of genomic sequences with the pine CBG cDNA sequence. Alignments of the sequences of these clones reveal families of 4 gene sequences in loblolly pine and 2 gene sequences in lodgepole pine. The goal of this thesis is to find genes orthologous to CBG in Populus trichocarpa, the first fully sequenced tree species, based on their sequence similarity, and to determine whether they have a role in lignin biosynthesis, such as acting on monolignol-glucoside substrates.

Model Organisms

The trees used in this study are Populus balsamifera and Pinus taeda.

Populus trichocarpa is the first completely sequenced woody tree (Tuskan et al., 2006). The poplar genome provides a foundation for the study of several aspects of plant biology that were previously not easily approachable (Bradshaw et al., 2000; Jansson and Douglas,


2007). Being a woody tree species that grows quickly, can be clonally propagated and is efficiently genetically transformable, poplar serves as an excellent model system to study putative genes that act on monolignol substrates (Jansson and Douglas, 2007).

Populus balsamifera is a northern species in the cottonwood section of poplar, which is closely related to the model species Populus trichocarpa. These species have more than

99% sequence similarity (Bradshaw et al., 2000; Barakat et al., 2007) and P. trichocarpa has at times been classified as a subspecies of P. balsamifera. Populus trichocarpa trees are not found in the eastern United States. However large Populus balsamifera trees grow on the Penn State University main campus at University Park, which were used for this study for the ease of availability.

Pines are good model organisms for gymnosperms for several reasons. They are the most studied and characterized group of gymnosperms (Lev-Yadun and Sederoff, 2000). The megagametophyte in pines can yield enough DNA for haploid analysis and haploid mapping. A large number of genetic maps and markers are currently available for pines, especially the economically important southern US loblolly pine (Pinus taeda), which is also genetically transformable. Thus, loblolly pine is a good model system for studying gymnosperms. Gymnosperms link angiosperms evolutionarily with lower plants and provide important information about the evolution of vascular plants. The lowest group of vascular plants is the pteridophytes. They do not have well established secondary cell walls, though lignin has been detected in some of the higher orders of pteridophytes such as the leptosporangiates (Harris et al., 2005; Sarkar et al., 2009). Gymnosperms represent the evolution of tall, woody plants, which was made possible by the lignification and subsequent strengthening of the main stem/trunk, enabling the plants to stand erect.

However, pines as a model system have their share of drawbacks. They have very large genomes, making complete genome sequencing a distant target. Pines also have a long generation time and take many years to reach maturity. There is huge genetic diversity in pines, and no species of pine can be found growing in all environmental conditions.


Chapter 2:

Evolutionary analysis of the Glycosyl Hydrolase family 1 (GH1) genes in plants and

identification of putative genes involved in lignification

Introduction

Xu et al. (2004) performed a phylogenetic analysis of the protein sequences for all known Arabidopsis glycosyl hydrolase family 1 (GH1) genes and found that three of the 47 related genes, BGLU45, BGLU46 and BGLU47, clustered with the Pinus contorta β-glucosidase (CBG) sequence, which has a known function in lignification, as published by Dharmawardhana et al. (1995) (see Supplemental figures 2 and 3 for the phylogenetic tree). Xu et al. (2004) and Escamilla-Trevino et al. (2006) established experimentally that the three Arabidopsis genes BGLU45, BGLU46 and BGLU47 also act on monolignol-glucoside substrates involved in lignification.

We performed a comparative genomic analysis of the β-glucosidase family to identify genes putatively involved in lignification based on their sequence similarity and relatedness to known monolignol-glucosidase genes. Our hypothesis is that genes in poplar and other species that cluster phylogenetically with genes previously known to code for proteins acting on monolignol-glucoside substrates, and that share conservation at known active site residues, can be considered putative CBG-like genes. The expression of the CBG-like genes should also coincide with the timing of lignification. Their functions can be elucidated with in vitro assays. The main focus of this study is to identify putative β-glucosidase genes involved in lignification in poplar and pine.


Materials and Methods:

The protein sequence of Pinus contorta β-glucosidase, PcCBG (AAC69619), was used as the query to perform tBLASTn and BLASTp searches against Viridiplantae in Phytozome, all plants in NCBI, and all conifers in PlantGDB. All sequences reported by BLAST with alignments to the PcCBG coding sequence at an E-value of e-20 or better were extracted. Very small sequences (0-200 aa in length) were removed; the typical size of the protein is around 400 to 550 aa. About 810 sequences were aligned using the Muscle (v3.8) multiple alignment software (Edgar, 2004). MEGA v5.05 (Tamura et al., 2011) was used to compute pairwise distances using the Poisson model. All sequences that were too dissimilar for pairwise distances to be computed were noted and removed from further analysis. There were also several sequences that could not be aligned with the other sequences. These sequences, which were usually very small, were removed. Note that this does not mean that these sequences are not GH1 members; they may represent partial assemblies of GH1 coding sequences. Given the large number of sequences, it was difficult to draw a legible tree with all of the sequences. Although two of the sequences removed from the final GH1 family tree were poplar sequences (POPTR0003s22020 and POPTR0217s00210), they were checked by cluster analysis with a smaller representative set of sequences to make sure that they did not cluster within the lignin subclade.
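The hit-filtering step described above can be illustrated with a minimal Python sketch. It is not the procedure actually used in this work; the `Hit` record, the function name `filter_hits` and the example identifiers are hypothetical, and the thresholds simply restate the E-value (e-20) and length (more than 200 aa) criteria given in the text.

```python
from typing import NamedTuple, List


class Hit(NamedTuple):
    """One BLAST hit (hypothetical record layout, for illustration only)."""
    seq_id: str
    evalue: float
    protein: str


def filter_hits(hits: List[Hit], max_evalue: float = 1e-20,
                min_length: int = 201) -> List[Hit]:
    """Keep hits at or below the E-value cutoff that are longer than 200 aa."""
    return [h for h in hits
            if h.evalue <= max_evalue and len(h.protein) >= min_length]


# Toy data: identifiers and sequences are made up.
example_hits = [
    Hit("seqA", 1e-30, "M" * 450),   # kept
    Hit("seqB", 1e-5, "M" * 450),    # weak E-value, dropped
    Hit("seqC", 1e-50, "M" * 150),   # too short, dropped
]
kept_ids = [h.seq_id for h in filter_hits(example_hits)]
```

In this toy example only `seqA` passes both criteria.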

Once the less similar sequences were removed, any misalignments were rectified manually. A preliminary Neighbor-Joining (NJ) tree was constructed with 1000 bootstrap replications. The NJ trees were based on amino acid alignments; the reliability of branching was estimated with 1000 bootstrap replications, and Poisson correction was applied using the Mega 4.0 (Tamura et al., 2007) software. An ML tree with 100 bootstraps was also constructed, using Mega 5.05, for the PLC proteins that were aligned in the NJ tree mentioned above. The Jones-Taylor-Thornton (JTT) model (Jones et al., 1992) was used, along with a Gamma distribution with 2 discrete rate categories, in the ML tree construction. Pairwise deletion of positions with gaps was used in Mega for the NJ tree construction, while all sites were used for the ML tree.
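For reference, the Poisson-corrected distance between two aligned protein sequences is d = -ln(1 - p), where p is the proportion of differing sites. The sketch below is a minimal Python illustration of that formula with pairwise deletion of gapped positions; it is not MEGA's implementation, and the function name and the choice of gap characters are assumptions made for the example.

```python
import math
from typing import Optional


def poisson_distance(seq_a: str, seq_b: str,
                     gap_chars: str = "-?") -> Optional[float]:
    """Poisson-corrected distance d = -ln(1 - p) with pairwise deletion.

    Sites where either sequence has a gap/missing character are skipped.
    Returns None when no sites can be compared or when p == 1, i.e. when
    the distance cannot be computed.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    different = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue  # pairwise deletion: ignore this site for this pair only
        compared += 1
        if a != b:
            different += 1
    if compared == 0:
        return None
    p = different / compared
    if p >= 1.0:
        return None  # too dissimilar: the logarithm is undefined
    return -math.log(1.0 - p)
```

Returning `None` for pairs with no comparable sites, or with p = 1, mirrors the situation described above in which distances could not be computed for very dissimilar sequences.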

Any two sequences in the GH1 NJ tree that seemed to be exactly the same (100 BS value and same branch length) were checked for redundancy, and any redundant sequences removed. The sequences were aligned once again using Muscle, and checked manually. An

NJ tree was then constructed with the remaining 786 sequences with 1000 bootstraps

(represented on the tree as percentages) using Mega.
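As an illustration of the redundancy check, the following minimal Python sketch keeps one representative per set of identical protein sequences. It assumes redundancy means exact identity after gap removal, which is a simplification of the tree-based check (100 BS value and same branch length) described above; the function name and identifiers are hypothetical.

```python
from typing import Dict


def remove_redundant(records: Dict[str, str]) -> Dict[str, str]:
    """Keep the first identifier seen for each distinct sequence.

    `records` maps sequence identifiers to (possibly gapped) protein
    sequences. Gaps and letter case are ignored when comparing.
    """
    seen = set()
    kept = {}
    for seq_id, seq in records.items():
        key = seq.replace("-", "").upper()
        if key in seen:
            continue  # identical to a sequence already kept
        seen.add(key)
        kept[seq_id] = seq
    return kept
```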

In the NJ tree, six poplar sequences clustered within the pine monolignol-glucosidase clade.

These six sequences were used to query the Populus genome sequences in Phytozome by

BLAST to check for any additional homologous sequences not identified with the first

BLAST search using the PcCBG query sequence. The new sequences obtained were checked to determine whether they were already included in the previous analysis, and we found that all of them were. A new NJ tree was then constructed and rooted using, as the outgroup, a clade containing five sequences from Physcomitrella patens, viz. Pp1s1_726v6, pp1s127_79v6, Pp114_133v6, pp1s978v6, pp1s22_312v6, and one sequence from Selaginella moellendorffii, Smo:228612. All the sequences are listed in Addendum 1.

The sequences that clustered with Pinus lignification genes are hypothesized as being putative GH1 genes acting on monolignol substrates. This cluster of sequences was named the “putative lignin clade (PLC)”. All the 84 PLC sequences identified were extracted from the rest of the GH1 family sequences and realigned using Muscle and the same protocol mentioned above. An NJ tree with 1000 bootstraps was constructed for the PLC sequences.

This tree was rooted with Pinus genes as the out group.

The gene structures were drawn using “Gene Structure Display Server” (Guo et al., 2007).

Lines joining the exons in different genes were drawn by hand and scanned.

Sequence logos (Schneider et al., 1990) were constructed for the conserved active sites

(catalytic acid/base domain and the nucleophile domains) of the protein sequences (Czjzek et al., 2000). Separate alignments and logos were made for all 786 of the sequences and for only the sequences in the lignin clade. The software “Weblogo” (Crooks et al., 2004) was used to prepare the sequence logos.
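The quantity displayed in a sequence logo can be sketched in a few lines of Python. For each alignment column, the information content is log2(20) minus the Shannon entropy of the observed residue frequencies, and each residue's letter height is its frequency multiplied by that information content. This is a simplified illustration, not the WebLogo implementation: it omits the small-sample correction, ignores gaps, and the function names are chosen for the example.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def column_information(column: Sequence[str], alphabet_size: int = 20) -> float:
    """Information content (bits) of one alignment column, ignoring gaps."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((c / n) * math.log2(c / n)
                   for c in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


def logo_heights(aligned_seqs: List[str]) -> List[Dict[str, float]]:
    """Per-column letter heights: residue frequency times column information."""
    heights = []
    for column in zip(*aligned_seqs):
        residues = [r for r in column if r != "-"]
        info = column_information(column)
        n = len(residues)
        heights.append({res: info * c / n
                        for res, c in Counter(residues).items()})
    return heights
```

For example, a fully conserved column has the maximum information of log2(20), about 4.32 bits, while a column split evenly between two residues (such as I and V) has one bit less.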

Results

After eliminating very short and redundant sequences, there were 786 sequences that were used to estimate the phylogenetic tree for GH1 family proteins (Appendix A contains a list of all the genes used in the tree). Forty-four Populus CBG-like protein sequences were identified through BLAST searches using the PcCBG protein sequence to query all plant


genes in Phytozome (Table 2.1). Two of the 44 poplar sequences were too small, and hence removed from analysis. The 44 Poplar genes were named according to their positions in the genome scaffolds. The accession numbers and their corresponding genes are shown in

Table 2.1.

An NJ tree was constructed for all 786 of the sequences based on amino acid alignments (Supplemental figure 1). The tree yielded a distinct clade of PcCBG and other CBG-like genes from gymnosperms and angiosperms (Figure 2.1; Supplemental figure 2). We called this the putative lignin clade (PLC). Two cassava sequences could not be assigned to any of the subclades, and are associated with a weak bootstrap value. Adding sequences from more species, especially other non-grass monocots and basal could help in grouping the sequences and providing a better bootstrap value.

In total, 84 “CBG-like” sequences form the PLC sub-clade within the GH1 family, including seven gymnosperm genes and 77 angiosperm genes. The sequences from the PLC were extracted and aligned once again on Muscle. A maximum likelihood (ML) tree was generated using only the PLC sequences (Figure 2.2; supplemental figure 3). Another NJ tree was generated using only the PLC genes (Supplemental figure 2).

This PLC clade is characterized by distinct angiosperm and gymnosperm clades. The PLC angiosperm clade includes two monocot sub-clades and three eudicot sub-clades. These were named Monocots 1, Monocots 2, Eudicots 1, Eudicots 2 and Eudicots 3, respectively. This naming is not derived from and is not related to the APG III classification system (Angiosperm Phylogeny Group, 1998), and members of both Eurosids I and II are found in the Eudicots 1 and Eudicots 2 sub-clades. The Eudicots 3 sub-clade consists of three sequences from the Eucalyptus genome and two from Cassava (Euphorbiaceae). Cassava is also represented in both the Eudicots 1 and Eudicots 2 sub-clades, but Eucalyptus is only found in Eudicots 1. Both the ML (Figure 2.2; Supplemental figure 3) and NJ (Figure 2.1; Supplemental figure 2) trees for the PLC clade place one dicot sub-clade (Eudicots 1) as distinct from and sister to all of the other sub-clades of angiosperm PLC proteins. The second major angiosperm clade of PLC proteins contains the sub-clades Monocots 1, Monocots 2, Eudicots 2, and Eudicots 3. The NJ and ML trees show strong support for the sub-clades named Monocots 1 and Eudicots 2 as being distinct sister groups within this second major angiosperm clade of PLC proteins. However, the Monocots 2 and Eudicots 3 sub-clades are separated on the NJ and ML trees with bootstrap values of only 48% and 47%, respectively, suggesting that the Monocots 2 and Eudicots 3 groups could be two sub-sub-clades within one orthologous sub-clade. Interestingly, two Cassava CBG-like protein sequences clustered as a separate sub-sub-clade within this Monocots 2 - Eudicots 3 sub-clade on the NJ tree, albeit with weak bootstrap support (48%). The two Cassava CBG-like proteins clustered with the Eudicots 3 sub-sub-clade in the ML tree, which is a more likely grouping.

Six poplar genes were identified in the PLC. Five of these are tandem duplicates on scaffold 4, whereas one is located on scaffold 1 (Figure 2.3). Interestingly, four of the five tandem duplicates, PoptrGH1-24 to 27, clustered in the same sub-sub-clade of the “Eudicot 1” sub-clade, whereas PoptrGH1-28 clustered in a different sub-sub-clade of “Eudicot 1”. PoptrGH1-19 clustered separately, with the “Eudicot 2” sub-clade.

The PLC genes in Poplar (Figure 2.5), as well as those in Pinus taeda and Pinus contorta, had exon:intron structures of 12:11, which was also seen in CBG-like genes in Arabidopsis (Xu et al., 2004) and rice (Opassiri et al., 2006). However, the ancestral gene organization in the GH1 family seems to be 13:12 exons:introns (Xu et al., 2004).

The sequence logos generated for the two catalytic domains, a nucleophile domain (I/VTENG) and an acid-base domain (WXTFNEP), show similar amino acid profiles whether all of the genes or only the PLC genes were used (Figures 2.6 A and B, and 2.7 A and B). However, although all the genes in the PLC have the acid-base domain, some of them lack the nucleophile domain (I/VTENG). The proteins lacking the nucleophile domain are PtcbgB1, PtcbgB2, Zma:GRMZM5G845736, Aco:AcoGoldSmith_v1.007763m and Mtr:Medtr8g146370. The Mes:cassava4.1_031147m and Os4bglu17 proteins have only the final “G” of the nucleophile domain. The Glyma13g35410 protein has a methionine (M) inserted between threonine (T) and glutamic acid (E), but it lacks the asparagine (N) in the fourth position. The Csi:orange1.1g045534m protein has a DVTVVGREG inserted between residues “N” and “G” of the conserved nucleophile domain sequence, but this is a false positive result, because the insert appears after the I/VTENG sequence in this protein, and it also has a “D” residue in place of a “G”.
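The presence or absence of the two catalytic motifs can be checked programmatically. The following is a minimal Python sketch, not the procedure used in this study (the logos were drawn with Weblogo). It assumes that each motif can be expressed as a simple regular expression, and the function name and the two toy peptide fragments are hypothetical illustrations rather than real PLC sequences.

```python
import re

# Conserved GH1 catalytic motifs as written in the text:
# nucleophile I/VTENG and acid/base WXTFNEP (X = any residue).
NUCLEOPHILE = re.compile(r"[IV]TENG")
ACID_BASE = re.compile(r"W.TFNEP")


def motif_report(protein):
    """Report which of the two catalytic motifs occur in a protein sequence."""
    seq = protein.upper()
    return {
        "nucleophile": NUCLEOPHILE.search(seq) is not None,
        "acid_base": ACID_BASE.search(seq) is not None,
    }


# Hypothetical toy fragments (assumption: for illustration only).
examples = {
    "both_motifs": "KYWATFNEPNFQAAAMIITENGYSQ",
    "no_nucleophile": "KYWATFNEPNFQAAAMIITEMGYSQ",
}

for name, seq in examples.items():
    print(name, motif_report(seq))
```

A strict pattern match of this kind flags every deviation from the consensus, so cases such as the insertion described for Csi:orange1.1g045534m would still need to be inspected by hand in the alignment.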


Discussion and Conclusions

The Putative Lignin Clade

According to the literature (Jiao et al., 2011), there has been a seed-plant-wide genome duplication, which could have had an effect on the structure of the GH1 family. Our phylogenetic tree does not provide evidence for the seed-plant-wide duplication, since all seven of the gymnosperm genes included in the analysis form a single clade. However, we did find two genes in Pinus contorta and four genes in Pinus taeda that differ in their intron lengths (Song Liu, unpublished data), all of which were included in the phylogenetic tree after translating the coding sequence. The differences among the gymnosperm CBG-like genes reside only in the intronic regions, while their protein coding sequences are almost identical.

It seems that the gymnosperm and angiosperm genes in the PLC evolved from a common ancestor of seed plants. The lack of any lower plants in the PLC could indicate that this gene evolved in the seed plants that form lignified secondary cell walls. The presence of diverse eudicotyledonous species in two distinct clades supports the hypothesis of either an angiosperm-wide or a seed-plant-wide duplication (Jiao et al., 2011), which separated the PLC Eudicot 1 sub-clade from the PLC Eudicot 2 and PLC Eudicot 3 sub-clades. The duplication was perhaps followed by loss or divergence of the monocot genes orthologous to the Eudicot 1 sub-clade. Alternatively, because all of the monocot genes included in the analysis are from a single family, additional sampling of non-grass monocots may allow the resolution of one monocot clade with each of the major eudicot clades. There is weaker evidence for another ancient duplication event separating Eudicot 2 from Eudicot 3 and the monocot sub-clade. Sub-clade Eudicot 3 consists of Cassava and Castor, both of which belong to Euphorbiaceae, and Eucalyptus, which belongs to the family Myrtaceae. Because Eucalyptus is absent from the Eudicot 1 clade, its position as a distinct clade with some of the Castor sequences could be artifactual. Additional sequences from taxa related to Eucalyptus and Cassava, and sequences from species belonging to the Asterids, are necessary to provide conclusive evidence regarding the Eudicots 3 clade.

There are six poplar genes in the PLC compared to three in Arabidopsis thaliana (Xu et al., 2004), which is consistent with the other phenylpropanoid pathway genes (except CAD), which are present in higher numbers (almost double) in Populus than in Arabidopsis (Tuskan et al., 2006). This is also consistent with the retention of a larger fraction of paralogs in the slowly evolving Salicaceae genome (Tuskan et al., 2006). The total number of GH1 family genes in poplar is 44, compared to 46 in Arabidopsis, which leads us to hypothesize that, although there was generally a rapid loss of genes in poplar after its most recent genome duplication, there is a basic set of necessary PLC genes below which further gene loss may not occur. The presence of five tandemly duplicated poplar genes in Eudicot 1 suggests additional non-genome-wide duplications of the gene following the genome-wide duplication event. Given that poplar is a woody plant, selection may have favored the growth and retention of PLC genes in Salicaceae, or the loss of PLC genes in Arabidopsis.

As already mentioned, five of the six poplar genes in the PLC, PoptrGH1-24 to 28, are tandem duplicates. It is interesting to note that all five tandem duplicates cluster in the Eudicot 1 sub-clade, whereas the lone gene outside that cluster, PoptrGH1-19, is in the Eudicot 2 sub-clade. This could lead to speculation that PoptrGH1-19 and the five tandem duplicates originated from a common ancestor that duplicated during the angiosperm-wide duplication event. Interestingly, Arabidopsis thaliana and A. lyrata do not have any PLC gene representatives outside the Eudicot 1 sub-clade. Castor and Eucalyptus do not have any genes in the Eudicot 2 sub-clade, but they have genes in the Eudicot 3 sub-clade, which is related to Eudicot 2. This could lead to the hypothesis that Arabidopsis lost PLC genes after the duplication event, and that the three homologs, AtBGLU45, AtBGLU46 and AtBGLU47, may be the result of the Brassicaceae-wide α and β duplication events (Jiao et al., 2011).

Gene organization and protein composition of the PLC

There are six poplar CBG-like genes in the PLC, five of which are tandem duplicates on scaffold 4 (Figure 2.3). The sixth one, PoptrGH1-19, is on scaffold 1 (Figure 2.1C). PoptrGH1-26 and PoptrGH1-27 have the same sequence except for 42 bases that are missing at the beginning of the genomic sequence of PoptrGH1-26. We compared the upstream non-translated regions of the two genes. The UTR of PoptrGH1-27 had 94 bases. The 1000 bases upstream of the PoptrGH1-26 genomic sequence were extracted from Phytozome v7.0. The coding sequences and UTRs were aligned separately using MUSCLE in MEGA (v5.05). The CDS and upstream regions of the two genes are highly similar (Figures 2.4A and B, respectively), while the intron regions differ. The 42 bases missing in PoptrGH1-26 could be due to an error in assembly or annotation, because the 42 bases upstream of the gene match the first 42 bases in the coding region of PoptrGH1-27. It is therefore likely that the two genes are products of a very recent duplication. At present, the expression of the two genes cannot be differentiated, because it is not possible to design qRT-PCR primers that distinguish their transcripts.
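The match between the 42 bases upstream of PoptrGH1-26 and the start of the PoptrGH1-27 coding sequence can be verified with a simple position-by-position comparison. The Python sketch below is only an illustration: the two sequence strings are copied from the alignments in Figures 2.4A and 2.4B, while the variable and function names are hypothetical.

```python
# First 60 bases of the PoptrGH1-27 CDS, copied from Figure 2.4A.
cds_27_start = "ATGGGAATTTCTTCACTTTGTAAAGCTTTAATTCTCTTAGAGCTGATCTTGCTACCTCTC"

# Last 42 bases of the region upstream of the annotated PoptrGH1-26 start,
# copied from the end of the alignment in Figure 2.4B.
upstream_26_end = "ATGGGAATTTCTTCACTTTGTAAAGCTTTAATTCTCTTAGAG"


def identity(a, b):
    """Return (number of identical positions, length) for two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches, len(a)


matches, length = identity(upstream_26_end, cds_27_start[:42])
print(f"{matches}/{length} identical positions")
```

An ungapped comparison is sufficient here because the two 42-base stretches are the same length; longer regions with insertions or deletions would need a proper alignment, as was done with MUSCLE.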

As mentioned earlier, according to Xu et al. (2004), the GH1 members have different exon:intron configurations, but it appears that the 13:12 exon:intron configuration is the ancestral type. All the genes in the PLC have the 12:11 configuration (Figure 2.5 A and B). Comparison with the ancestral structure of the GH1 gene family reveals the absence of the ninth intron in all genes in the PLC. This suggests that loss of the ninth exon/intron occurred at the time of origin of the PLC, with its divergence from the rest of the GH1 family. We look forward to learning, as CBG-like genes from additional taxa are discovered, whether the timing of this exon/intron event coincides with the timing of the origin of lignification in seed plants.

The sequence logos created for the conserved active sites, WXTFNEP and I/VTENG, show high levels of conservation among the PLC genes. Although some proteins in the PLC do not have the nucleophile domain (I/VTENG), the acid/base domain is found in all of the PLC sequences (Figs. 2.6A and B, and 2.7 A and B). Thus the acid/base domain may be the most important for conservation of GH1 protein functions.


Table 2.1: Poplar accession numbers and the corresponding GH1 names. PLC genes are highlighted in green.

Sequence Id         Poplar GH1     Comments
POPTR_0001s05280    PoptrGH1-1
POPTR_0001s23050    PoptrGH1-2
POPTR_0001s23060    PoptrGH1-3
POPTR_0001s23070    PoptrGH1-4
POPTR_0001s23080    PoptrGH1-5
POPTR_0001s23090    PoptrGH1-6
POPTR_0001s23100    PoptrGH1-7
POPTR_0001s23110    PoptrGH1-8
POPTR_0001s23120    PoptrGH1-9
POPTR_0001s23130    PoptrGH1-10
POPTR_0001s23310    PoptrGH1-11
POPTR_0001s23320    PoptrGH1-12
POPTR_0001s23330    PoptrGH1-13
POPTR_0001s23350    PoptrGH1-14
POPTR_0001s23430    PoptrGH1-15
POPTR_0001s23440    PoptrGH1-16
POPTR_0001s23450    PoptrGH1-17
POPTR_0001s23460    PoptrGH1-18
POPTR_0001s41440    PoptrGH1-19
POPTR_0001s44040    PoptrGH1-20
POPTR_0002s21830    PoptrGH1-21
POPTR_0003s22020    PoptrGH1-22    Too small, not used
POPTR_0003s22030    PoptrGH1-23
POPTR_0004s01880    PoptrGH1-24
POPTR_0004s01890    PoptrGH1-25
POPTR_0004s01900    PoptrGH1-26
POPTR_0004s01910    PoptrGH1-27
POPTR_0004s01920    PoptrGH1-28
POPTR_0004s04080    PoptrGH1-29
POPTR_0004s10900    PoptrGH1-30
POPTR_0004s18990    PoptrGH1-31
POPTR_0004s19050    PoptrGH1-32    New sequences from new V7
POPTR_0005s06090    PoptrGH1-33
POPTR_0008s09380    PoptrGH1-34
POPTR_0010s16740    PoptrGH1-35
POPTR_0010s18600    PoptrGH1-36
POPTR_0011s13375    PoptrGH1-37    New sequences from new V7
POPTR_0012s04670    PoptrGH1-38
POPTR_0015s04280    PoptrGH1-39
POPTR_0019s02590    PoptrGH1-40
POPTR_0217s00210    PoptrGH1-41    Too small, not used
POPTR_0231s00210    PoptrGH1-42    New sequences from new V7
POPTR_0613s00200    PoptrGH1-43
POPTR_2696s00200    PoptrGH1-44    New sequences from new V7

Figure 2.1: Available as Supplemental figure 2 in computerized version. NJ analysis of amino acid alignments of the putative lignin clade (PLC) of GH1 family genes, which contains CBG-like genes that putatively act on lignin monomer substrates. Note that the PLC contains 6 genes from Populus. The PLC has 2 monocot sub-clades and 3 eudicot sub-clades, named Monocot 1, Monocot 2, Eudicot 1, Eudicot 2, and Eudicot 3.

Figure 2.2: A Maximum-Likelihood (ML) tree of the PLC obtained from amino acid alignments with 500 bootstrap replicates. Available as Supplemental figure 3 in computerized version.

Figure 2.3: Synteny of putative lignification β-glucosidase genes in Populus, showing the tandem duplicates among the poplar PLC genes.


Figure 2.4A: Muscle v3.8 alignment of the CDS sequence of PoptrGH1-26 and PoptrGH1-27

NOTE: The missing bases in PoptrGH1-26 might be an error in the Phytozome database because the complete sequence is present upstream of where it is shown to start.

PoptrGH1-26 ------CTGATCCTGCTACCTCTC PoptrGH1-27 ATGGGAATTTCTTCACTTTGTAAAGCTTTAATTCTCTTAGAGCTGATCTTGCTACCTCTC ****** ***********

PoptrGH1-26 TTTGCATCATCTGATACAAAAACTCTGCATGAAAGTTCAGATTCTTCTTCATTTCCTGCC PoptrGH1-27 TTTGCATCATCTGATACAAAAACTCTGCATGAAAGTTCAGATTCTTCTTCATTTCCTGCC ************************************************************

PoptrGH1-26 AACTTTCTCTTTGGGACTGCCTCCTCTTCTTATCAGTTTGAAGGAGCTTACCTGAGTGAT PoptrGH1-27 AACTTTCTCTTTGGGACTGCCTCCTCTTCTTATCAGTTTGAAGGAGCTTACCTGAGTGAT ************************************************************

PoptrGH1-26 GGAAAAGGTTTGAGCAACTGGGATGTCCATACACATAAACCAGGAAACATAATTGATGGA PoptrGH1-27 GGAAAAGGTTTGAGCAACTGGGATGTCCATACACATAAACCAGGCAACATAATTGATGGA ******************************************** ***************

PoptrGH1-26 AGCAATGGAGATATCGCCGTCGACCAATATCATCGGTATCTGGAAGACATTGATCTAATG PoptrGH1-27 AGCAACGGAGATATCGCCGTCGACCAATATCATCGGTATCTGGAAGACATTGAGCTAATG ***** *********************************************** ******

PoptrGH1-26 GCCTCCCTCGGAGTCAACAGCTATAGGTTTTCGATGTCATGGGCACGAATTCTACCCAAA PoptrGH1-27 GCCTCCCTCGGAGTCAACAGCTATAGGTTTTCGATGTCATGGGCACGAATTCTACCCAAA ************************************************************

PoptrGH1-26 GGGAGATTTGGAGGTGTCAATATGGCTGGTATTAGCTACTATAACAAGCTGATCAATGCT PoptrGH1-27 GGGAGATTTGGAGGTGTCAATATGGCTGGTATTAGCTACTATAACAAGCTGATCAATGCT ************************************************************

PoptrGH1-26 CTCCTACTTAAAGGGATTCAACCATTTGTGTCATTGACTCATTTTGATGTGCCTCAAGAG PoptrGH1-27 CTCCTACTTAAAGGGATTCAACCATTTGTGTCATTGACTCATTTTGATGTGCCTCAAGAG ************************************************************

PoptrGH1-26 CTTGAGGATAGATACGGGGGTTTTCTAAGTCCTAAATCCCAAGAGGATTTCGGATATTAT PoptrGH1-27 CTTGAGGATAGATACGGGGGTTTTCTAAGTCCTAAATCCCAAGAGGATTTCGGATATTAT ************************************************************

PoptrGH1-26 GTAGACATCTGTTTCAAGTACTTCGGAGACCGAGTGAAGTACTGGGCCACCTTCAATGAG PoptrGH1-27 GTAGACATCTGTTTCAAGTACTTCGGAGACCGAGTGAAGTACTGGGCCACCTTCAATGAG ************************************************************

PoptrGH1-26 CCAAATTTTCAAGCCATTTATGGTTATCGTGTAGGTGAATGCCCACCAAAACGCTGCTCA PoptrGH1-27 CCAAATTTTCAAGCCATTTATGGTTATCGTGTAGGTGAATGCCCACCAAAACGCTGCTCA ************************************************************

PoptrGH1-26 AAGCCTTTCGGAAATTGCAGTCATGGGGACTCAGAGGCGGAGCCCTTTATTGCGGCGCAT PoptrGH1-27 AAGCCTTTCGGAAATTGCAGTCATGGGGACTCAGAGGCGGAGCCCTTTATTGCGGCGCAT ************************************************************


PoptrGH1-26 AACATAATCTTAGCTCATGCAACTGCAGTTGATATTTACAGAACCAAATACCAGAGAGAG PoptrGH1-27 AACATAATCTTAGCTCATGCAACTGCAGTTGATATTTACAGAACCAAATACCAGAGAGAG ************************************************************

PoptrGH1-26 CAAAGAGGCAGCATTGGTATTGTCATGAATTGCATGTGGTATGAACCTATTAGCAATTCA PoptrGH1-27 CAAAGAGGCAGCATTGGTATTGTCATGAATTGCATGTGGTATGAACCTATTAGCAATTCA ************************************************************

PoptrGH1-26 ACAGCAAACAAGTTAGCAGTTGAAAGAGCTCTTGCCTTCTTCTTGCGCTGGTTCTTGGAC PoptrGH1-27 ACAGCAAACAAGTTAGCAGTTGAAAGAGCTCATGCCTTCTTCTTGCGCTGGTTCTTGGAC ******************************* ****************************

PoptrGH1-26 CCAATCATATTTGGAAGATATCCTGAAGAAATGAAAGAAGTTCTGGGATCTACTCTACCT PoptrGH1-27 CCAATCATATTTGGAAGATATCCTGAAGAAATGAAAAAAGTTCTGGGATCTACTCTACCT ************************************ ***********************

PoptrGH1-26 GAATTTTCAAGAAATGACATGAATAAATTGAGGAAGGGACTGGATTTTATCGGCATGAAT PoptrGH1-27 GAATTTTCAAGAAATGACATGAATAAATTGAGGAAGGGACTGGATTTTATCGGCATGAAT ************************************************************

PoptrGH1-26 CATTACACCAGTTACTACGTTCAAGATTGCATCTTGTCTGTGTGTGAACCTGGAAAAGGA PoptrGH1-27 CATTACACCAGTTACTACGTTCAAGATTGCATCTTGTCTGTGTGTGAACCTGGAAAAGGA ************************************************************

PoptrGH1-26 AGCACGAGGACAGAAGGTTCCTCTCTATTAACTCAAGAAAAAGATGGAGTTCCCATCGGC PoptrGH1-27 AGCACGAGGACAGAAGGTTCCTCTCTATTAACTCAAGAAAAAGATGGAGTTCCCATCGGC ************************************************************

PoptrGH1-26 AAACCTAGTGAAGTGGATTGGCTACATGTTTATCCACAAGGAATGGAAAAGATGGTTACC PoptrGH1-27 AAACCTAGTGAAGTGGATTGGCTACATGTTTATCCACAAGGAATGGAAAAGATGGTTACC ************************************************************

PoptrGH1-26 TATGTAAAGGAGAGATACAATAACACACCCATGATCATCACAGAAAATGGGTATTCCCAA PoptrGH1-27 TATGTAAAGGAGAGATACAATAACACACCCATGATCATCACAGAAAATGGGTATGCCCAA ****************************************************** *****

PoptrGH1-26 GTGAGCAATTCAAACGGAAACATTGAAGAATTCCTTCATGATACAGGAAGGGTGGAATAC PoptrGH1-27 GTGAGCAATTCAAACGGAAACATTGAAGAATTCCTTCATGATACAGGAAGGGTGGAATAC ************************************************************

PoptrGH1-26 ATGTCTGGCTATTTGGATGCCTTGCTGACAGCAATGAAGAAAGGAGCAGATGTGAGGGGC PoptrGH1-27 ATGTCTGGCTATTTGGATGCCTTGCTGACAGCAATGAAGAAAGGAGCAGATGTGAGGGGC ************************************************************

PoptrGH1-26 TATTTTGCCTGGTCCTTCCTTGATAATTTTGAGTGGACATTCGGTTATACAAGAAGATTT PoptrGH1-27 TATTTTGCCTGGTCCTTCCTTGATAATTTTGAGTGGACATTCGGTTATACAAGAAGATTT ************************************************************

PoptrGH1-26 GGACTTTACCATGTTGATTACACCACAATGAAGCGAACTCCAAGATTATCAGCAACTTGG PoptrGH1-27 GGACTTTACCATGTTGATTACACCACAATGAAGCGAACTCCAAGATTATCAGCAACTTGG ************************************************************

PoptrGH1-26 TACAAAGAATTTATTGCAAGGTATAAGGTAGACAAATCCCAGATGTGA PoptrGH1-27 TACAAAGAATTTATTGCAAGGTATAAGGTAGACAAATCCCAGATGTGA ************************************************

Figure 2.4B: Muscle v3.8 alignment of the 1000 bp upstream region of PoptrGH1-26 and PoptrGH1-27

PoptrGH1-26 ------PoptrGH1-27 AGCATACTTGGCAATCTGATCTAACCTTCTTGCTCTTTTCTCATGCCTTGAAATATTCTC

PoptrGH1-26 ------PoptrGH1-27 CTAATATTTATCTTCATCTCTCACCAGTCGTAACTTGTTGGAAGGCTAATCACGGGTTTA

PoptrGH1-26 ------GGTGACTCGCCAGTTGCTTGCCGTCCACAACAACAATCAAAC PoptrGH1-27 CCTGGTCCAAACCTGGAAAGTGACTCGCCAGTTGCTTGCCGTCAACAACAACAATCAAAC ************************ ****************

PoptrGH1-26 ATTAATGGTGGTTTATCCTCATGGTTTATCCAGACTGATTATACAATGCCTTGAAGTGAA PoptrGH1-27 ATTAATGGTGGTTTATCCTCATGGTTTATCCAGACTGATTATACAATGCCTTGAAGTGAA ************************************************************

PoptrGH1-26 TCCATTTTTATATTTTATTTTAAAAAATATATATTGGTGTGGTTGGTCCATAATATTTGG PoptrGH1-27 TCCATTTTTATATTTTATTTTAAAAAATATATATTGGTGTGGTTGGTCCATAATATTTGG ************************************************************

PoptrGH1-26 TGCGTCAATTTCTTATATTTTATTTCATGAGAATATATTGGTGTGGTTGGTCCACCAACT PoptrGH1-27 TGCGTCAATTTCTTATATTTTATTTCATGAGAATATATTGGTGTGGTTGGTCCACCAACT ************************************************************

PoptrGH1-26 CTTTTCAGCCATATGACAAATAAAATAATTTAAATAATTAGGCCCTTTTGAAGTGGATCC PoptrGH1-27 CTTTTCAGCCATATGACAAATAAAATAATTTAAATAATTAGGCCCTTTTGAAGTGGATCC ************************************************************

PoptrGH1-26 ATTTTTATAAATATAAAATAGCTTGATTTGTGATACACAAGGGCCACCTTTTTTGAGTAA PoptrGH1-27 ATTTTTATAAATATAAAATAGCTTGATTTGTGATACACAAGGGCCACCTTTTTTGAGTAA ************************************************************

PoptrGH1-26 ATTATAAATTAGTCCCTATATGTTTGATGATCTTGTAAGTTAGTTCCTCTGTTTCTGAAA PoptrGH1-27 ATTATAAATTAGTCCCCATATGTTTGATGATCTTGTAAGTTAGTTCCTCTGTTTCTGAAA **************** *******************************************

PoptrGH1-26 AGTATATGCTAGTCCCTTGACCAAAATTGATTGCTATTTCTTATTTAATTTCTATATTAA PoptrGH1-27 AGTACATCCTAGTCCCTTGACCAAAATTGATTGCTATTTCTTATTTAATTTCTATATTAA **** ** ****************************************************

PoptrGH1-26 TTTTTTAAATGCTTTTTTTAATTTTATCTGGTAAGATTGAGAAATTTGATTGCAAACTAA PoptrGH1-27 TTTTTTAAATGCTTTTTTTAATTTTATCTGGTAAGATTGAGAAATTTGATTGCAAACTAA ************************************************************

PoptrGH1-26 GAAAGACAAAAGAATAAGTTAATTGTGATGAAAAACCAACAGAAGGACAAAGTTACAATC PoptrGH1-27 GAAAGACAAAAGAATAAGTTAATTGTGATGAAAAACCAACAGAAGGACAAAGTTACAATC ************************************************************

PoptrGH1-26 AAATATGAAGAAATAGGGACTTAACTGCAAGTGTTTAAACCAAAGGGATTAACTAATCAT PoptrGH1-27 AAATATGAAGAAATAGGGACTTAACTGCAAGTTTTTAAACCAAAGGGATTAACTAATCAT ******************************** ***************************


PoptrGH1-26 TGCATTACAAATACAGGGATCAAATAAAGTAAATTACTT-TAAAAAATTCAATTATAAAC PoptrGH1-27 TGCATTACAAATACAGGGATCAAATAAAGTAAATTACTTAAAAAAAAAACAATTTTAAAC *************************************** ****** ***** *****

PoptrGH1-26 TGTTCACTACCTGTTGCTCTAGGATTTTTGAGCGATTCTGGAAAAAGATTCTTTC-TTCA PoptrGH1-27 TGTTCACTACCTGTTCCTCTAGGATTTTTGAGCCATTCTGGAAAAAGATTCTTTCTTTCA *************** ***************** ********************* ****

PoptrGH1-26 TCTTGTCTAA--TTTCTGACTTTTTCCCATACCAAGAACAGAAGTTTCAACAACTTCTCT PoptrGH1-27 TCTTCTCTGACGTTTCTGACGTTTTCCCATAGCAAGAACAGAAGTTTCAACAACTTCTCT **** *** * ******** ********** ****************************

PoptrGH1-26 GATTCCTTCTATATTAGAGTTTTTTCTACATGAAACTAGACAACCCTGCACATATATATA PoptrGH1-27 GATTCCTTCTATATTAGAGTTTTTTCAACCTGAAACTAGACAACCCTGCACATATACATA ************************** ** ************************** ***

PoptrGH1-26 G-AAGCACAAACTTGTATACAAAGACATCTAGCTTTTTAACTTTGGATACAGCATATATA PoptrGH1-27 GAAAGCACAAACTTGTATACAAAGGCATCTAGCTTCTTAACTTTGGATACAGC----ATA * ********************** ********** ***************** ***

PoptrGH1-26 TGAATTGCATTATATCTAGCATGGGAATTTCTTCACTTTGTAAAGCTTTAATTCTCTTAG PoptrGH1-27 TGAATTGCATTAT CTAGC------************* *****

PoptrGH1-26 AG PoptrGH1-27 --

NOTE: The last part of the PoptrGH1-26 should be a part of the coding sequence. It seems like an error in annotation.


Figure 2.5 A and B: Exon:intron sequence organization of the 6 CBG-like genes identified in Populus, which are composed of 12 exons (coding regions) interrupted by 11 introns (non-coding regions) in their genomic sequence. (A) Comparison of the 6 Populus CBG-like genes with the predicted ancestral GH1 gene structure, consisting of 13 exons. The numbers represent the phase of each exon. (B) Comparison of the exon positions in the 6 Populus CBG-like genes. Structures were drawn using “Gene Structure Display Server” (http://gsds.cbi.pku.edu.cn/).

A:

B:


Figure 2.6 A and B: Sequence logos for amino acids in the catalytic acid/base domain in (A) all GH1 sequences and (B) the lignin clade only (PLC). The logos were drawn using “Weblogo” (Schneider et al., 1990; Crooks et al., 2004; weblogo.berkeley.edu/).

A. All sequences

B. PLC only


Figure 2.7 A and B: Sequence logos for amino acids in the catalytic nucleophile domain in (A) all GH1 sequences and (B) the lignin clade only (PLC). The logos were drawn using “Weblogo” (Schneider et al., 1990; Crooks et al., 2004; weblogo.berkeley.edu/).

A. All sequences

B. PLC Only


Chapter 3:

Determining the expression of lignin sub-clade GH1 genes

Introduction:

Lignification in vascular tissue, particularly in the xylem, corresponds with the growing season in trees. Thus the amount of monolignol β-glucosidase mRNA and enzyme activity per unit fresh mass in Pinus and Populus xylem should also vary by season and coincide with lignification. There should therefore be more monolignol β-glucosidase activity in xylem tissues just before and during lignification than at other times of the year.

Populus trichocarpa is a good model system for studying gene expression and development in trees, since it has a small genome that has been completely sequenced. Its generation time is shorter than that of most trees (maturity in about 4 years), several Populus hybrids are readily transformable, and it is fairly closely related to the well-studied model plant Arabidopsis (Bradshaw Jr. et al., 2000; Jansson and Douglas, 2007).

Pines are good model organisms for gymnosperms for several reasons. They are the most studied and best characterized group of gymnosperms (Lev-Yadun and Sederoff, 2000). Gymnosperms form the connecting link between angiosperms and lower plants. The megagametophyte in pines can yield enough DNA for haploid analysis and haploid mapping. A large number of genetic maps and markers are currently available for pines, especially for the economically very important southern US loblolly pine (Pinus taeda), which is also genetically transformable. However, pines as a model system come with their share of drawbacks. They have very large genomes, making complete genome sequencing a distant target. The trees have a long generation time and take many years to reach maturity. There is huge genetic diversity in pines, and no species of pine can be found growing in all environmental conditions.

In Chapter 2, we identified six poplar genes that clustered phylogenetically with the Pinus contorta β-glucosidase that is known to act on monolignol substrates. We studied the expression of five of the six poplar CBG-like genes by quantitative RT-PCR. Two of these five genes, PoptrGH1-26 and PoptrGH1-27, have the same coding (exon) sequences, and hence their expression could not be distinguished from each other. For purposes of this study, we refer to the expression of these genes together as PoptrGH1-26/GH1-27. Primer design for the PoptrGH1-19 gene failed, and this gene was thus not included in expression assays. We also studied the expression of the CBG genes in Pinus taeda. The four Pinus taeda CBG genes also differ very little in their coding sequences, and thus we could not distinguish between their mRNAs. Hence, for purposes of expression, they are all noted together as PtCBG. Studying the expression of CBG-like genes in poplars and pines may provide interesting insights into the evolution of this gene family.


Materials and Methods

Collection of plant tissue samples:

The plants used for this study were Populus balsamifera (Balsam Poplar) and Pinus taeda (Loblolly Pine). The importance of these two species as model organisms has been discussed in Chapter 1.

Loblolly Pine samples were collected in 2009 from a research plantation on Littlegreen Briar road, near McVeytown, PA 17051, in Mifflin County. Balsam Poplar samples were collected in 2008 from trees growing either on the White Course golf course or next to the Forest Resources Laboratory, both sites on the campus of Penn State University, University Park, PA 16802.

The tissue collections focused on developing secondary phloem and xylem tissues and on leaves or needles. These were collected from mature trees, in 2008 for Poplar and in 2009 for Pines, at time points in the growing season representing early spring (April for Poplar or May for Pines), summer (June) and the end of the growing season (September). Strips of xylem and phloem were separated from the tree and from the bark using a hammer and chisel. All tissues were immediately flash frozen on site in liquid nitrogen. The samples were transported to the lab in liquid nitrogen (Poplars) or on dry ice (Pines) and stored in a -80˚C freezer until the RNA was extracted.

Specimens of poplar and pine were also grown in the greenhouse, and tissues collected from them were used to test PCR primers and to serve as inter-plate gene expression calibrators.


qRT-PCR Primer design and validation:

Primers were designed for qRT-PCR reactions using Primer3 software v4.0 (Rozen and Skaletsky, 2000) to study the expression of the putative poplar CBG-like genes identified by phylogenetic analysis of the GH1 family. The primers selected were between 18 and 22 base pairs in length, with a melting temperature (Tm) between 50˚ and 60˚C and a GC content between 30 and 60%, with an optimum of 40%. Polyubiquitin (UBQ11; POPTR_0001s44440) and tubulin alpha-3/alpha-5 chain from poplar (TUA2; POPTR_0003s21080) were tested for possible use as internal controls (Brunner et al., 2004; Gutierrez et al., 2008). Although both TUA2 and UBQ11 showed consistent expression in greenhouse samples, UBQ11 showed the more consistent expression among the various experimental samples from the field, and hence was chosen as an internal control. PoptrGH1-34 is a GH1 gene in Populus that did not cluster with the putative lignin clade genes (see Chapter 2). PoptrGH1-34 was thus selected to serve as a control in the gene expression studies, representative of the non-lignin clade GH1 genes, whose expression is not expected to be correlated with lignification. The PoptrGH1-34 gene showed uniform expression across samples and was hence used as another internal control. All of the primers used are shown in Table 3.1a.
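The length and GC-content criteria described above can be expressed as a simple filter. The Python sketch below is an illustration only and does not reproduce Primer3, which was the tool actually used and which also evaluates Tm and other properties not computed here. The function names and the example primer sequences are hypothetical.

```python
def gc_percent(primer):
    """Percentage of G and C bases in a primer sequence."""
    seq = primer.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def passes_basic_filters(primer, min_len=18, max_len=22, min_gc=30.0, max_gc=60.0):
    """Check only the length and GC-content ranges stated in the text."""
    return (min_len <= len(primer) <= max_len
            and min_gc <= gc_percent(primer) <= max_gc)


# Hypothetical primer sequences (assumption: not the primers of Table 3.1a).
candidates = ["ATGCATGCATGCATGCATGC", "ATATATATATATATATATAT", "GCGCGC"]

for primer in candidates:
    print(primer, len(primer), round(gc_percent(primer), 1), passes_basic_filters(primer))
```

Candidates passing such a filter would still need to be validated experimentally for a single amplification product and a single melt-curve peak.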

Primers were also designed to test for the expression of Pinus taeda CBG-like genes based on their CDS sequence, as explained above. The primer sequences are shown in Table 3.1b.

All the primers were tested using cDNA from leaves and stems collected from wild-type plants of the poplar hybrid OGY (Populus x euramericana), and from needles and stem tissues collected from Pinus taeda plants grown in the greenhouse. The samples were amplified by both standard PCR and real-time PCR. Any primer pair that produced more than one amplification product, or more than one peak in the qRT-PCR melt curve, was not used, and a new primer pair was then tested and selected for the gene. We could not obtain a good primer pair for PoptrGH1-19 because all of the primer pairs that were tested showed two peaks in the qPCR melt curve. Hence, we did not include this gene in our gene expression analysis. However, this is an interesting gene, especially because it is the only one of the six PLC genes that is not tandemly duplicated.

Extraction of RNA and qRT-PCR:

RNA was extracted from all plant tissues that were collected. Ambion’s MagMAX-96 total RNA isolation kit (Cat # AM1830) was used to isolate RNA from Populus samples. Leaf and bud tissues were ground frozen using a Geno/Grinder 2000 high-throughput sample homogenizer. The harder xylem and phloem tissues were ground in liquid nitrogen using a mortar and pestle that had been baked at 200˚C for six hours to destroy any RNase.

RNA from Pinus taeda samples was extracted using the CTAB protocol described by Chang et al. (1993). DNase treatment of the RNA was conducted using TURBO DNase (Ambion, Cat # AM2238) to remove any contaminating DNA.

The resulting RNAs were evaluated by micro-capillary electrophoresis using a Bioanalyzer (Agilent). RNA samples with a RIN value of less than 7.0 were discarded. RNA samples were also quantified using the Qubit 2.0 RNA testing kit.

For poplar, 500 ng of total RNA was used to prepare cDNA; for pine samples, 100 ng of total RNA was used. cDNA was prepared using Bio-Rad’s iScript Select cDNA Synthesis Kit (Cat # 170-8897).

The quantitative PCR reactions were conducted on the Bio-Rad model “MyIQ iCycler” machine, using SsoFast EvaGreen Supermix (Bio-Rad Cat # 172-5202) as the fluorescence dye.

Normalization of the results

The data obtained by qRT-PCR of poplar was normalized in two ways:

1) dCt performed using reference gene UBQ11 as the control.

2) dCt performed using a non-lignin clade gene PoptrGH1-34 as the control.

Ubiquitin is the reference gene (Brunner et al., 2004) that was used because it was found to be expressed uniformly in the data set.

PoptrGH1-34 was selected to serve as a non lignin clade GH1 family gene. After all the studies, it was found to be quite uniformly expressed across all samples. Hence, it was used as a second gene for normalization.

The gene expression data could not be double normalized (ddCt) because the samples were collected from trees grown in the field, and hence there was no starting point or time of induction to serve as a reference point.


Statistical analysis of data

The normalized data were analyzed using analysis of variance (ANOVA) in Minitab. The samples taken were randomized within a particular time period. At the next sampling time, random samples were collected with replacement for Poplar, and without replacement for pines.
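For readers unfamiliar with ANOVA, the F statistic compares the variation between group means to the variation within groups. The Python sketch below computes a one-way F statistic by hand. It is an illustration only: the analysis in this study was run in Minitab, the design there involved more than one factor, and the function name and the numbers in the example are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on lists of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the individual values around their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical normalized expression values for three sampling months.
example = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
print(one_way_anova_f(example))
```

The F value would then be compared against the F distribution with the returned degrees of freedom to obtain a p-value, a step that statistical packages such as Minitab perform automatically.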

The qRT-PCR analysis of PoptrGH1-24 and PoptrGH1-25 in leaf RNA samples, and in certain replicates of xylem or phloem RNA samples, did not show expression levels above background. These were considered missing or “no data” values. The qRT-PCR assays were repeated on the samples with no data to confirm that there were no technical errors in preparation of the assays. It is not possible to determine whether such “no data” results reflect very low expression levels that are not above background or the complete absence of the transcript. This means that the mean of the three biological replicates could not be used in further analyses. Missing values can be handled by removing them, or by replacing them with (a) the mean, (b) an estimate from nearest neighbors or (c) an arbitrary value. We cannot use the mean or nearest neighbors because these are not true “missing values”, but samples in which the genes are not expressed above background. We therefore performed the analysis in two different ways: (1) eliminating the data and (2) replacing them with a value of zero (to provide a uniform value).
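The two treatments of “no data” values lead to different group means, which is why both analyses were carried out. The Python sketch below illustrates the difference on a single set of biological replicates. The function names and the expression values are hypothetical, and `None` is used here as an assumed marker for a replicate with no signal above background.

```python
def mean_dropping_missing(values):
    """Mean of the observed values only; None if every value is missing."""
    observed = [v for v in values if v is not None]
    return sum(observed) / len(observed) if observed else None


def mean_with_zero_fill(values):
    """Mean after replacing each missing value with zero."""
    filled = [0.0 if v is None else v for v in values]
    return sum(filled) / len(filled)


# Hypothetical expression values for three biological replicates,
# one of which gave no signal above background.
replicates = [2.0, None, 4.0]

print(mean_dropping_missing(replicates))
print(mean_with_zero_fill(replicates))
```

Dropping the value leaves the mean of the detected replicates unchanged but reduces the sample size, whereas filling with zero keeps the sample size and pulls the mean downward.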

Bartlett’s test was attempted on the samples to check for normal distribution, but a minimum of six replicates is necessary to conduct the test. However, normality plots of the ANOVA residuals show that the data are close to normal, and hence the data were assumed to be normal. The plots are shown in Appendix B.

PoptrGH1-24 and PoptrGH1-25 showed the widest range of expression, and hence the tissue types and months calculations were compared against biological replicates.

The fold changes for all genes were calculated using a variation of the 2^-ΔΔCt equation (Livak and Schmittgen, 2001). According to the equation, the Ct value is normalized twice: once with the reference gene, and once with the expression at the beginning of the treatment. This study did not have any artificial or control treatments, and hence we normalized only with a reference gene and used the formula 2^-ΔCt to calculate the fold change. Based on a paper by Brunner et al. (2004), we tested two genes as references for normalization, Polyubiquitin (UBQ11) and Tubulin (TUA2), on stems and leaves of samples grown in the greenhouse. Both genes showed similar expression in both stems and leaves of the greenhouse samples, but tubulin showed greater variation among the samples collected from the field at different time points. Hence, UBQ11 was used as one of the reference genes. The second reference gene was PoptrGH1-34, a GH1 gene that does not cluster with the PLC. It was chosen as a reference gene because it showed uniform expression across all samples.
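The single-normalization calculation can be sketched as follows. This is a minimal illustration under stated assumptions: the Ct values are hypothetical, the function names are invented for the example, and averaging the technical replicates before taking the difference is an assumption rather than a documented step of this study.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def fold_change_delta_ct(ct_target, ct_reference):
    """Relative expression 2^-(Ct_target - Ct_reference).

    Each argument is a list of technical-replicate Ct values; the
    replicates are averaged first (an assumption of this sketch).
    """
    delta_ct = mean(ct_target) - mean(ct_reference)
    return 2.0 ** (-delta_ct)


# Hypothetical Ct values, three technical replicates each.
ct_gene = [27.0, 27.2, 26.8]    # gene of interest
ct_ubq11 = [24.0, 24.1, 23.9]   # reference gene

# A gene crossing threshold about 3 cycles after the reference is
# expressed at roughly 1/8 of the reference level.
print(fold_change_delta_ct(ct_gene, ct_ubq11))
```

Because no calibrator sample is subtracted, the result is an expression level relative to the reference gene rather than a fold change relative to a starting condition.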


Results and Analysis of data:

Expression of CBG-like genes in Populus balsamifera:

Expression of five of the six poplar CBG-like genes that clustered in the putative lignin clade (PoptrGH1-24, 25, 26/27, 28), plus the non-lignin clade gene PoptrGH1-34, was analyzed using qRT-PCR. Two of the genes, PoptrGH1-26 and PoptrGH1-27, have the same coding sequence, and hence their expression was tested together using a single primer pair. Expression assays could not be conducted for the sixth CBG-like gene in poplar, PoptrGH1-19, because no primers could be obtained that passed the qRT-PCR QC tests.

As mentioned in the Materials and Methods section, tissue samples were collected from three trees to give three biological replicates. RNA was extracted from each of the tissue types, and a master mix of all the ingredients except primers and cDNA was prepared before running the samples together on a plate. Each biological replicate sample was run three times (three technical replicates) to evaluate technical errors such as faulty pipetting. qRT-PCR was conducted with the five poplar genes (PoptrGH1-24, 25, 26/27, 28) and the non-lignin clade gene PoptrGH1-34 on 27 samples (leaves, phloem and xylem for three biological replicates at three time points), and a total of 162 data points were obtained, including the three technical replicates for each data point. ANOVA showed that the error among technical replicates was very small; hence the data collected were of good quality. Tables with the final qRT-PCR datasets and fold change calculations, together with ANOVA output tables and figures for all of the poplar samples, are presented in Appendix B, Part A.

PoptrGH1-24 and PoptrGH1-25 showed no expression above background in any of the three replicates of RNA isolated from leaves collected in April. PoptrGH1-24 showed no expression above background in RNA from leaves collected in June from plant (biological replicate) 3 and in September from plant 2. PoptrGH1-25 likewise showed no expression in RNA from leaves collected in June from plant 3 or in September from plant 2, and also showed no expression in RNA collected in September from xylem of plants 1 and 3. qRT-PCR was repeated on the RNA samples that showed no expression to check for manual errors, but the results were the same. Hence, these data points are noted as “no data” points, although they cannot accurately be considered missing data points, nor can the absence of detectable expression be treated as a true zero value. The analyses were also run with “0” substituted for the fold change values where there was “no data” available.

To test the results for statistically significant differences in expression between genes, between tissues, and between collection time points, ANOVA was used. Three operational null hypotheses were tested, assuming that the only source of error is the technical replicates.

The three hypotheses were:

H01: The expression of the genes in the putative lignin clade is not related to the seasonal time period when tissues were collected.

H02: The expression of the genes in the putative lignin clade does not depend on the biological replicate.

H03: The expression of the genes in the putative lignin clade does not depend on the tissue type.

A two-way ANOVA with multiple entries (3 tech. replicates) in a cell was performed on the data obtained. The results are shown in tables 3.2 to 3.4. In almost all the cases all the three effects (biological reps, sample collection dates, and interactions) were highly significant. This means that the error due to differences in technical replicates is small enough to capture all possible biological effects. The rest of the analyses were done with interplate calibrated data, which takes the average of the three technical replicates.

The hypotheses were then tested assuming that the interaction effect is the source of error. For this, the data used were the averages of the three technical replicates for each biological replicate.

For the xylem tissue by GH1-28 gene combination, there was a 17.9% chance of obtaining the observed F value of 2.73 when H02 is true (H02: there is no significant difference between biological replicates). This is quite a high value, and we thus cannot reject H02. For the other tissue-gene combinations in poplar, the chances of obtaining the observed F values when H02 is true were even higher. On the whole, for none of the tissue-gene combinations was it possible to reject the null hypothesis of no difference between the biological replicates.

Significant differences between biological replicates were not obtained even at p=0.01 (Table 3.5). Therefore, this result showed that it is acceptable to use the average of technical replicates and biological replicates for each tissue type in further analyses. In contrast, effects of tissue collection time point were found to be significant in almost all cases.
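The second stage of the analysis described above (technical-replicate means, with the interaction term serving as the error) corresponds to a two-way ANOVA without replication. The sketch below is an illustration only: the 3 x 3 table is invented, the function names are hypothetical, and the analysis in this study was run in Minitab rather than with this code. With three biological replicates and three months, each factor has 2 degrees of freedom and the error term has 4, and for a numerator of exactly 2 degrees of freedom the upper-tail F probability has a simple closed form.

```python
def two_way_anova_no_replication(table):
    """F statistics for a rows x columns table with one value per cell.

    table[i][j] is the mean of the technical replicates for row level i
    (e.g. biological replicate) and column level j (e.g. month). The
    residual (interaction) mean square is used as the error term.
    """
    r, c = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(table[i][j] for i in range(r)) / r for j in range(c)]

    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    df_rows, df_cols = r - 1, c - 1
    df_error = df_rows * df_cols
    ms_error = ss_error / df_error
    return {
        "F_rows": (ss_rows / df_rows) / ms_error,
        "F_cols": (ss_cols / df_cols) / ms_error,
        "df_error": df_error,
    }


def p_value_f_numerator_df2(f_value, df_error):
    """Upper-tail probability of F(2, df_error).

    Exact closed form, valid only when the numerator has 2 degrees of
    freedom: P(F > f) = (1 + 2 f / df_error) ** (-df_error / 2).
    """
    return (1.0 + 2.0 * f_value / df_error) ** (-df_error / 2.0)


# Hypothetical table: rows = biological replicates, columns = months.
example = [[1.0, 2.0, 3.0],
           [2.0, 3.0, 5.0],
           [3.0, 5.0, 6.0]]
result = two_way_anova_no_replication(example)
print(result)
print(p_value_f_numerator_df2(result["F_rows"], result["df_error"]))
```

Applied to the F value of 2.73 with 2 and 4 degrees of freedom, the closed form gives a probability of about 0.179, which agrees with the 17.9% reported for the xylem GH1-28 comparison of biological replicates.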

Expression of CBG-like genes in Pinus taeda:

As with poplar, needle, phloem and xylem samples were collected from three loblolly pine trees that were randomly selected from the plantation, without replacement, as biological replicates. Xylem and phloem samples were collected from the tree trunks at eye level, and no tree was sampled more than once. RNA was extracted from each of the tissue types using a CTAB extraction protocol (Chang et al., 1993), and a master mix was prepared before preparing and running the qRT-PCR plates. Gene expression was assayed by qRT-PCR using primers from the Pinus contorta PtCBG gene and from the Pinus taeda Lac2 gene as a control, on all 27 samples (three biological replicates of needles, phloem and xylem). The pine Act2 gene was used as an internal control to normalize the data. Each sample was run three times (three technical replicates) to allow for the detection of technical errors. Tables with the final qRT-PCR datasets and fold change calculations, together with ANOVA output tables and figures for the pine samples, are presented in Appendix B, Part B.

Figures 3.3A and B show the trends in gene expression levels in the pine samples collected during May, June and September. Laccase (Lac2), a positive control for lignification, shows an increase in the xylem between May and June which corresponds to known times for lignification in trees. Lac2 expression then starts to decrease in September. The expression of Lac2 in needles and phloem was quite low, and considered at zero for further analyses.


However, the PtCBG qRT-PCR primers did not show the same increase in gene expression levels as Lac2 in xylem cells over the growing season. Expression of the CBG-like genes in xylem was higher in June and September than in May, though the difference between June and September was much less pronounced than for Lac2.

For the ANOVA, the operational null hypotheses for pine are the same as those for poplar:

H01: The expression of the genes in the putative lignin clade does not depend on the time period of sample collection.

H02: The expression of the genes in the putative lignin clade does not depend on the biological replicate.

The ANOVA analysis showed that the effects are not significant for either Lac2 or CBG.

Hence we also calculated a percentage change taking into consideration the time lapses (May was month 1, June month 2 and September month 5), shown in Figure 3.4. Because intermediate time points are missing, the percentage increases in RNA levels for the PtCBG and Pinus taeda Lac2 genes from May to June and from June to September are a better indicator of the trends. Lac2 expression increased in all three tissue types from May to June but decreased from June to September. Xylem showed the largest increase in Lac2 RNA from May to June. The PtCBG gene did not show exactly the same trend as the Lac2 gene. PtCBG gene expression increased 204% from May to June in xylem and 248% in phloem, even though phloem does not have as many lignified cells as xylem in Pinus taeda. The high levels of PtCBG gene expression in phloem cells may indicate contamination of phloem tissues with xylem cells, which can be difficult to separate cleanly when these tissues are collected from mature trees growing under field conditions. If this was the case, it may be most relevant to consider the changes in phloem and xylem expression levels together, as one tissue type.
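One way to express a percentage change that accounts for unequal gaps between sampling dates is sketched below. The exact formula used for Figure 3.4 is not given in the text, so dividing by the number of elapsed months is an assumption of this sketch; the expression levels are hypothetical and the function names are invented for the example. Only the month indices (May = 1, June = 2, September = 5) come from the text.

```python
# Month indices as stated in the text.
MONTH_INDEX = {"May": 1, "June": 2, "September": 5}


def percent_change(start, end):
    """Percentage change from start to end, relative to start."""
    return (end - start) / start * 100.0


def percent_change_per_month(levels, first, second):
    """Percentage change between two sampling dates, divided by the
    number of months elapsed (an assumed way of handling the gaps)."""
    elapsed = MONTH_INDEX[second] - MONTH_INDEX[first]
    return percent_change(levels[first], levels[second]) / elapsed


# Hypothetical normalized expression levels (not data from this study).
levels = {"May": 1.0, "June": 3.0, "September": 2.4}

print(percent_change_per_month(levels, "May", "June"))        # one month elapsed
print(percent_change_per_month(levels, "June", "September"))  # three months elapsed
```

Scaling by elapsed time keeps the three-month June to September interval from appearing as a larger change than the one-month May to June interval simply because it is longer.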

Conclusions and Discussion

Tables 3.2.(1 to 4).1 and figures 3.2.(1 to 4).2 a to f show the expression of the six genes at the different time periods after normalization of the data. From the results, it can be concluded that the expression of the lignin genes is related to the time point in the growing season for all the genes and tissue types tested. However, the expression patterns differ among genes. The non-lignin clade gene, GH1-34, shows a slight increase between April, June and September in poplar leaves and phloem, but shows a trend similar to GH1-28, with highest expression in xylem in May.

The residual plots in figures 3.(1 to 4) a to f show that the data do not deviate significantly from the regression line.

Leaves show higher expression of GH1-24, 25 and 26/27 in September than June, and higher expression in June and April for GH1-26/27. There was no detectable signal in leaves in any of the three biological replicates of GH1-24 and GH1-25 in April, though there was signal in other genes and UBQ11 tested with the same cDNA. This could mean that there was no expression of those genes in leaves in April.


The expression of the genes GH1-24, 25 and 26/27 is quite similar in phloem and xylem in poplar. The expression of GH1-28 in June, however, is higher in xylem than in phloem, and higher than in September.

The peak in xylem cell development and lignification usually takes place between April and June in most trees, then decreases over the summer and ceases in the fall months in both phloem and xylem. Hence, an enzyme that is involved in lignification should show greater expression in xylem than in phloem and leaf or needle tissues, and higher gene expression in April and June than in September. We see this for the poplar genes in figures 3.1 a-d and 3.2 a-d. Leaves have lower expression of all three genes, PoptrGH1-24, 25 and 26/27, in June and during some of the other months.

PoptrGH1-25 showed no detectable expression in some leaf samples (all plants in April, and one plant each in June and September) or in some xylem samples (plants 1 and 3 in September). This could indicate that its transcripts are not present in detectable quantities in leaves in April, similar to PoptrGH1-24.

The absence of the transcript in xylem samples from September could mean that the gene has been shut down in some specimens by September. Given that the same trees were not sampled at the three different time points, it is not possible to determine how the expression changes between months for the same tree. However, sampling the same tree at different time points would violate the assumption of random sampling for ANOVA.

The presence of multiple genes with similar expression patterns in poplar seems biologically redundant. It is possible that each of the genes acts on a different substrate (i.e. the H, G or S monolignols). It could also be that, although they have redundant functions, poplar has not yet lost the extra genes. Further studies of substrate affinity could shed some light on this.

The results in pine are not yet conclusive. ANOVA does not show significant differences in expression between tissue types or between months. However, the graphs show an increase in absolute levels of CBG-like gene expression from May to June in the xylem cells that corresponds with the increase in Lac2 expression. More data and analysis are required for more conclusive results.

Further experiments are necessary to test the above hypotheses. Histochemical studies were done to check the tissue specificity and timing of lignification. The histological experiments and results are presented in Chapter 4.


Table 3.1a: Primer sequences used for RT-PCR assay of gene expression in poplar.

Gene            Forward Primer        Reverse Primer
PoptrBGLU19     CGGAATGACAGGACAAGGAT  CAGCAAAGACCACGCAAAA
PoptrBGLU24     CATACCCACACACCAGGAAA  ATCTCCGAATCTCCCTCTGG
PoptrBGLU25     TTCATTGCCTCCCAACTTTC  GTTGACTCCAAGGGTTTCCA
PoptrBGLU26/27  GGTTTGAGCAACTGGGATGT  TAGAATTCGTGCCCATGACA
PoptrBGLU28     TCGATATCCGGAAGATCTGG  TACGCAGAAGCTCGTTGATG
PoptrUBQ11      GATGTTGCTGTGCTTTTGGA  ATGCGAGGATATGGAACGAG

Table 3.1b: Primer sequences used for RT-PCR assay of gene expression in pine.

Gene            Forward Primer        Reverse Primer
PtCBG           CAGCAATCAAAAATGGCTCA  TCGTCGTGCTGAAGAAATTG
PtAct2          TTGCTGACCGTATGAGCAAG  ACTCAGCCTTTGCAATCCAC
PtLac2          GGTTTGGGGTTGATCAAATG  AATCCCACGGAAGGGTTATC


Table 3.2a: Summary of ANOVA results for comparisons of biological replicates and tissues; data normalized with UBQ11.

                               Between Bio. Rep      Between Tissues
Sl. No.  Gene       Month      F value  Probability  F value  Probability  R-Sq(adj) (%)
1        GH1-24     April      2.66     0.185        4.31     0.100        55.39
2        GH1-24     June       1.53     0.321        3.01     0.159        38.91
3        GH1-24     September  0.13     0.880        0.03     0.971        0.00
4        GH1-25     April      1.21     0.387        3.76     0.121        42.60
5        GH1-25     June       1.97     0.254        4.69     0.089        53.81
6        GH1-25     September  0.23     0.802        0.31     0.750        0.00
7        GH1-26/27  April      1.89     0.265        3.00     0.160        41.89
8        GH1-26/27  June       2.15     0.233        4.46     0.096        53.51
9        GH1-26/27  September  1.64     0.303        17.26    0.011        80.85
10       GH1-28     April      0.77     0.523        5.40     0.073        51.04
11       GH1-28     June       0.56     0.608        4.13     0.106        40.28
12       GH1-28     September  8.99     0.033        1.02     0.440        66.68

Data points that were not available were substituted with 0.0.


Table 3.2b: Summary of ANOVA results for comparisons of biological replicates and months; data normalized with UBQ11.

                                Between Bio. Rep      Between Month
Sl. No.  Gene       Plant part  F value  Probability  F value  Probability  R-Sq(adj) (%)
1        GH1-24     Leaf
2        GH1-24     Phloem      0.77     0.521        8.26     0.038        63.73
3        GH1-24     Xylem       0.58     0.603        2.84     0.171        26.10
4        GH1-25     Leaf
5        GH1-25     Phloem      0.71     0.543        29.52    0.004        87.59
6        GH1-25     Xylem       0.84     0.497        3.94     0.113        40.97
7        GH1-26/27  Leaf        1.80     0.278        16.81    0.011        80.59
8        GH1-26/27  Phloem      0.84     0.498        14.89    0.014        77.44
9        GH1-26/27  Xylem       0.97     0.453        2.77     0.176        30.34
10       GH1-28     Leaf        0.87     0.487        2.92     0.165        30.90
11       GH1-28     Phloem      0.05     0.953        16.59    0.012        78.54
12       GH1-28     Xylem       2.73     0.179        3.72     0.122        52.65


Table 3.3a: Summary of ANOVA results for comparisons of biological replicates and tissues; data normalized with GH1-34.

                               Between Bio. Rep      Between Tissues
Sl. No.  Gene       Month      F value  Probability  F value  Probability  R-Sq(adj) (%)
1        GH1-24     April      0.90     0.474        15.38    0.013        78.12
2        GH1-24     June       0.97     0.454        6.07     0.061        55.73
3        GH1-24     September  0.53     0.624        0.51     0.634        0.00
4        GH1-25     April      1.01     0.440        51.18    0.001        92.62
5        GH1-25     June       1.04     0.433        11.40    0.022        72.30
6        GH1-25     September  0.05     0.951        0.06     0.941        0.00
7        GH1-26/27  April      0.56     0.610        6.05     0.062        53.56
8        GH1-26/27  June       1.66     0.298        1.70     0.292        25.47
9        GH1-26/27  September  2.31     0.215        9.42     0.031        70.87
10       GH1-28     April      1.02     0.439        14.10    0.015        76.63
11       GH1-28     June       0.40     0.693        1.57     0.314        0.00
12       GH1-28     September  1.38     0.349        1.94     0.257        24.94

Data points that were not available were substituted with 0.0.


Table 3.3b: Summary of ANOVA results for comparisons of biological replicates and months; data normalized with GH1-34.

                                Between Bio. Rep      Between Month
Sl. No.  Gene       Plant part  F value  Probability  F value  Probability  R-Sq(adj) (%)
1        GH1-24     Leaf
2        GH1-24     Phloem      0.77     0.523        10.17    0.027        69.08
3        GH1-24     Xylem       1.82     0.274        6.57     0.054        61.53
4        GH1-25     Leaf
5        GH1-25     Phloem      0.18     0.845        327.31   0.000        98.79
6        GH1-25     Xylem       3.55     0.130        51.93    0.001        93.04
7        GH1-26/27  Leaf        1.17     0.397        2.08     0.240        23.93
8        GH1-26/27  Phloem      0.79     0.513        14.51    0.015        76.88
9        GH1-26/27  Xylem       2.31     0.215        9.42     0.031        70.87
10       GH1-28     Leaf        0.81     0.505        23.42    0.006        84.75
11       GH1-28     Phloem      1.49     0.328        14.02    0.016        77.15
12       GH1-28     Xylem       0.05     0.949        4.74     0.088        41.15


Figure 3.1 A to D: Graphs of dCt using UBQ11 vs. time in Populus balsamifera. Each panel plots Fold Increase [2^(-(t - r))] on the y-axis against Months (April, June, September) on the x-axis for xylem, phloem and leaves; t = gene of interest, r = reference gene.

NOTE: The April xylem 3 sample data point for the control gene UBQ11 was bad. Hence April xylem 3 data were not used in making this graph.

A: PoptrGH1-24 normalized with UBQ11
B: PoptrGH1-25 normalized with UBQ11
C: PoptrGH1-26/27 normalized with UBQ11
D: PoptrGH1-28 normalized with UBQ11


Figure 3.2 A to D: Graphs of dCt using PoptrGH1-34 vs. time in Populus balsamifera. Each panel plots Fold Increase [2^(-(t - r))] on the y-axis against Months (April, June, September) on the x-axis for xylem, phloem and leaves; t = gene of interest, r = reference gene.

A: PoptrGH1-24 normalized with GH1-34
B: PoptrGH1-25 normalized with GH1-34
C: PoptrGH1-26/27 normalized with GH1-34
D: PoptrGH1-28 normalized with GH1-34

NOTE (panel C): April 1 and June 1 were rejected in xylem as outliers.


Figure 3.3 A and B: Graphs of dCt using Act2 vs. time in Pinus taeda. Each panel plots normalized expression against month (May, June, September) for needles, phloem and xylem.

A: Pine PtCBG normalized with Act2
B: Pine PtLac2 normalized with Act2


Figure 3.4 A and B: Percentage graph of dCt using Act2 vs. time in Pinus taeda. Each panel plots Percentage on the y-axis for needles, phloem and xylem, with separate series for May-June and June-Sept.

A: CBG Fold Increase
B: Lac2 Fold Increase


Chapter 4:

Localization of Lignin in Poplar stems

Introduction

The expression of PLC genes in poplar was found to be higher in phloem and xylem tissues during the growing season than in leaves. We next asked if a correlation between expression of CBG-like genes and lignin formation in phloem and xylem could be established. Autofluorescence and histochemical studies with basic fuchsin (Fuchs, 1963) were performed, with observation by light microscopy and Confocal Laser Scanning Microscopy (CLSM), to test for the presence of lignin in stem sections during the same time course in which gene expression was followed.

Materials and Methods:

Collection of specimens

Samples of poplar stems (2-4 years old) were collected on the same dates in 2008 and from the exact same trees that were used to extract RNA for qRT-PCR analysis in Chapter 3. The stem samples were collected and preserved in 70% ethanol, and stored at +4 °C until October 2011, when they were used for sectioning and imaging.


Preparation for microscopy

Horizontal cross sections (CS) of each of the stems were cut by hand, as thinly as possible, with fresh single-edge razor blades, and the sections were stored in water. The sections were transferred to slides, mounted in water and observed under the light microscope at 4x, 10x, 20x and 40x magnifications. They were also observed for UV autofluorescence from lignin under CLSM, using a 405 nm excitation laser and a band-pass emission filter of 425-475 nm.

Sections were stained with basic fuchsin using the method described by Dharmawardhana et al. (1992). The sections were stained in 0.01% basic fuchsin dissolved in 50% ethanol for 5 minutes, and destained in 70% ethanol for 5 minutes. The sections were mounted in Millipore deionized water and observed under the microscope within a few hours. Stained sections were observed under CLSM with an excitation wavelength of 543 nm and an emission filter of 555 to 655 nm. Unstained sections were also observed under 543 nm excitation to rule out autofluorescence at that wavelength.

All the stained and unstained sections were observed under 4x, 10x, 20x and 40x magnification lenses, and digital photographs taken of the most informative sections.

In some cases, the confocal microscope images were z-stacked to obtain images with sufficient depth and quality for assessing lignin content and cell types across the section. Z-stacking is a technique used in confocal microscopy in which images are taken in different optical planes and the brightest fields for a particular region across those planes are assembled to yield a sharper image.
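Assembling the brightest value for each region across optical planes is commonly implemented as a maximum intensity projection. The sketch below illustrates that idea on a tiny invented image stack; it is an assumption that the microscope software used here worked this way, since the text does not name the algorithm, and the function name is hypothetical.

```python
def max_intensity_projection(stack):
    """Collapse a z-stack into one image.

    stack[z][y][x] holds the pixel intensity at plane z, row y, column x.
    The result image[y][x] is the brightest value found at that pixel
    position in any plane.
    """
    height, width = len(stack[0]), len(stack[0][0])
    return [
        [max(plane[y][x] for plane in stack) for x in range(width)]
        for y in range(height)
    ]


# Hypothetical stack: 3 optical planes, each a 2 x 3 image (values 0-255).
stack = [
    [[10, 200, 30], [0, 5, 90]],
    [[50, 20, 30], [80, 5, 10]],
    [[15, 25, 255], [0, 60, 70]],
]

projection = max_intensity_projection(stack)
print(projection)  # [[50, 200, 255], [80, 60, 90]]
```

Because each output pixel may come from a different plane, a projection of this kind sharpens the view of structures at varying depths but no longer represents a single physical section, which is one reason intensities from such images are hard to compare quantitatively.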

There was no foolproof way of quantifying the cell wall growth on the images obtained because (1) the hand sections were not uniform enough in size to allow the fluorescence levels to be compared between them, and (2) the cell walls were too small to measure accurately at the magnifications obtained. To obtain thinner and more uniform sections for quantification, embedding of the tissues in paraffin and microtome sectioning were attempted by the Electron Microscopy facility staff. Unfortunately, the preserved tissues did not yield good thin sections; they appeared to be too soft after long storage in ethanol. Hence only qualitative comparisons of lignin content were conducted.

Results

Lignin deposition was observed in the cells of stem tissue cross sections of all samples. Figure 4.1 shows CLSM images of lignin autofluorescence under UV, and Figure 4.2 shows CLSM images of basic fuchsin induced lignin fluorescence. To make sure that the CLSM was not capturing autofluorescence with the 543 nm excitation laser, unstained sections were randomly selected from the stem sections and imaged (Figure 4.3A; Stem 3 from September 2008). Figure 4.3B is the same field of view as Figure 4.1 under the bright field microscope. As can be seen from the images, no fluorescence is visible in the unstained section at 543 nm excitation. Hence, we can conclude that the fluorescence observed in the stained sections was specific to basic fuchsin stained lignin and was not due to autofluorescence from other, non-stained fluorescing materials.

Though it was not possible to quantify the fluorescence or the width of the cell walls (corresponding to secondary cell wall deposition), cross sections of tissue samples collected in April showed early wood starting to form, indicating that the growing season, with new secondary growth and lignification, had started at the time of sample collection. Samples collected in September showed late wood deposition, which is consistent with the trees being in the late stages of the growing season and approaching the end of the period when lignin deposition takes place.

Conclusions and Discussion

The sections showed lignification occurring in xylem cells and, to a lesser extent, in phloem cells at the time points sampled. This corresponds to the qRT-PCR results presented in Chapter 3, where the PLC genes PoptrGH1-24, 25, and 26/27 showed increased expression levels in the April and June samples relative to September. The sections also showed that stem secondary growth had begun at the time that our first samples were collected on April 25, 2008, confirming that our studies of PLC gene expression coincided with active lignin synthesis in the poplar stems.


Figure 4.1: Autofluorescence CLSM images of cross section of Populus balsamifera stem, with 405nm excitation.


Figure 4.2: Fluorescent CLSM images of cross sections of Populus balsamifera stems stained in basic fuchsin


Chapter 5:

General Discussion and the Broader Perspective

The steps leading to monolignol biosynthesis are well studied (Boerjan et al., 2003; Weng and Chapple, 2010), but the transport of lignin monomers and the assembly of the lignin polymer in cell walls are less well explored. Our study involved a gene family, the β-glucosidases, some of whose members have been shown earlier to take part in lignin transport and/or assembly (Dharmawardhana et al., 1995, 1999).

We hypothesized that the genes that are closely related to the Pinus contorta coniferin-specific β-glucosidase should also act on glycosylated monolignol substrates. We identified six glycosyl hydrolase family 1 genes in Populus trichocarpa, and analyzed the expression of five of them by performing qRT-PCR on different tissues of Populus balsamifera, a species very closely related to P. trichocarpa, collected at different time points during the growing season.

Phylogenetic analysis placed six poplar GH1 family proteins into a Putative Lignin Clade (PLC) along with known monolignol-specific β-glucosidases from other species. Genes within the Putative Lignin Clade separated into three groups (sub-clades) of dicots and two groups of monocot species. Five of the poplar PLC genes (PoptrGH1-24, PoptrGH1-25, PoptrGH1-26/27 and PoptrGH1-28) are tandemly repeated in the poplar genome and all fell into the PLC Eudicot sub-clade 1, which appears to be the basal sister group to the other four PLC sub-clades. The expression of these five genes was tested by qRT-PCR. The genes PoptrGH1-24, PoptrGH1-25 and PoptrGH1-26/27 showed higher levels of expression in April and June compared to September, and higher expression in xylem and phloem tissues than in leaves. These results are consistent with the expected tissue specificity and seasonal time course of secondary growth and lignin biosynthesis in trees. In contrast, the fourth gene in the PLC sub-sub-clade, PoptrGH1-28, showed a uniform level of expression across the different time points, though its expression was higher in xylem and phloem tissues than in leaves at all time points. The observed uniform expression of PoptrGH1-28 is interesting in that this gene was in a different sub-sub-clade, within the Eudicot sub-clade 1 in the GH1 PLC family tree, from the other four tandemly duplicated genes. It would be interesting to study further the functional relationships between the PoptrGH1-28 gene and the other PLC CBG-like genes in poplar.

The expression of the PoptrGH1-19 gene, which clusters phylogenetically with species other than poplar in the Eudicot 2 subclade, could not be determined in this study. It would be interesting to study the expression of this gene, which may have arisen from one of the angiosperm genome duplications, and to compare its substrate specificities relative to the other poplar PLC genes.

The CBG-like genes in Pinus taeda also showed higher expression in the May to June time period than in the June to September period, though the differences were not statistically significant. Statistical significance might be improved with more samples and with gene-specific PCR primers. Assays at more time points might provide better insights into the detailed seasonal time course of expression of CBG-like genes in pines.


Overall, these results indicate that monolignol-glucoside specific β-glucosidases are present in poplar and in Loblolly pine that could act as good candidates for improvement of lignin in trees by biotechnology. But the results should be confirmed by more experiments to determine the exact role of these genes in monolignol transport and assembly.

Experiments to determine the enzyme activities of the tree CBG-like proteins on different glucoside substrates will provide insight into the specificity of the enzymes. Stably transformed mutants would be useful for observing the actual phenotypic effect of these genes on lignin biosynthesis in the trees.

This study showed that a number of glycosyl hydrolases in family GH1 cluster phylogenetically with known CBG-like proteins in pine, Arabidopsis and rice. Using the poplar model system for trees, it was shown in this study, as in Arabidopsis and rice in previous studies, that expression of genes in the Putative Lignin Clade coincides with the times and tissues where lignin formation takes place. Further studies are necessary to ascertain if the poplar PLC genes identified in this study act specifically on monolignol substrates, as has previously been shown for the pine CBG and Arabidopsis CBG-like gene products.

Further support from the phylogenetic analysis for the potential role of genes in the Putative Lignin Clade is seen in the absence, from the PLC cluster of the GH1 family, of genes from any lower, non-lignified plants such as bryophytes and Lycopodiopsida. Also, the presence of distinct gymnosperm and angiosperm subclades in the PLC suggests that the original GH1 gene acting on monolignol substrates evolved from a common ancestral gene that either played a role in earlier stages of the evolution of lignin biosynthesis, or played a role in the storage and activation of phenylpropanoids that served roles in lower plants other than lignin formation, such as antibiotics or protectants from abiotic stresses.

Further phylogenetic analysis may yet provide a conclusive answer as more GH1 gene sequences become available from such species, including Pteridophytes and more gymnosperms.

Lignin is a necessary and very important biological material for the growth and health of higher plant species. However, in many cellulose-based industries lignin is a detriment, as it interferes with processing of raw biomass materials. Extraction of lignin from fiber is a costly and polluting industrial process. Suppressing lignin biosynthesis in plants that serve as feedstocks for the forage, pulp, energy and chemical industries would be very useful, but studies have shown undesirable effects of altering earlier stages of the phenylpropanoid pathway. As the deglycosylation of monolignols is one of the last steps in lignin biosynthesis, lignin-specific glucosidases from the GH1 family may be good targets for biotechnological improvement of feedstock plants.


LITERATURE CITED

Amthor JS (2003) Efficiency of lignin biosynthesis: a quantitative analysis, Annals of Botany, 91:673-691

Barakat A, Wall PK, DiLoreto S, dePamphilis CW, Carlson JE (2007) Conservation and divergence of microRNAs in Populus, BMC Genomics, 8: 481 - 496.

Boerjan W, Ralph J, Baucher M (2003) Lignin biosynthesis, Annu. Rev. Plant Biol., 54: 519 – 546

Boudet AM (2000) Lignins and lignification: Selected issues, Plant Physiol. Biochem., 38 (1/2): 81−96

Bourne Y, Henrissat B (2001) Glycoside hydrolases and glycosyltransferases: families and functional modules Current Opinion in Structural Biology, 11(5): 593-600

Bradshaw Jr. HD, Ceulemans R, Davis J, Stettler R (2000) Emerging model systems in Plant biology: Poplar (Populus) as a model forest tree, J Plant Growth Regul, 19: 306 – 313

Brinkmann K, Blaschke L, Polle A (2002) Comparison of different methods for lignin determination as a basis for calibration of near-infrared reflectance spectroscopy and implications of lignoproteins, J. Chem. Ecol., 28(12): 2483 - 2501

Brunner AM, Yakovlev IA, Strauss SH (2004) Validating internal controls for quantitative plant gene expression studies, BMC Plant Biology, 4: 14

Chapple CCS, Vogt T, Ellis BE, Somerville CR (1992) An Arabidopsis mutant defective in general phenylpropanoid pathway, The Plant Cell, 4: 1413–1424

Crooks GE, Hon G, Chandonia JM, Brenner SE (2004) WebLogo: A sequence logo generator, Genome Research, 14:1188-1190

Czjzek M, Cicek M, Zamboni V, Bevan DR, Henrissat B, Esen A (2000) The mechanism of substrate (aglycone) specificity in β-glucosidases is revealed by crystal structures of mutant maize β-glucosidase-DIMBOA, -DIMBOAGlc, and -dhurrin complexes, PNAS, 97(25): 13555-13567

Dharmawardhana DP, Ellis BE, Carlson JE (1992) Characterization of vascular lignification in Arabidopsis thaliana, Can. J. Bot., 70: 2238-2244

Dharmawardhana DP, Ellis BE, Carlson JE (1995) A β-glucosidase from lodgepole pine xylem specific for the lignin precursor coniferin, Plant Physiol., 107: 331-339

Dharmawardhana DP, Ellis BE, Carlson JE (1999) cDNA cloning and heterologous expression of coniferin β-glucosidase, Plant Mol. Biol., 40: 365 – 372

Dixon RA, Chen F, Guo D, Parvathi K (2001) The biosynthesis of monolignols: a ‘‘metabolic grid’’, or independent pathways to guaiacyl and syringyl units?, Phytochemistry, 57:1069–1084

Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput, Nucleic Acids Res., 32(5): 1792-1797

Elkind Y, Edwards R, Mavandad M, Hedrick SA, Ribak O, Dixon RA, Lamb CJ (1990) Abnormal plant development and down-regulation of phenylpropanoid biosynthesis in transgenic tobacco containing a heterologous phenylalanine ammonia-lyase, PNAS, 87: 9057 – 9061

Escamilla-Treviño LL, Chen W, Card ML, Shih M-C, Cheng C-L, Poulton JE (2006) Arabidopsis thaliana β-glucosidases BGLU45 and BGLU46 hydrolyse monolignol glucosides, Phytochemistry, 67(15): 1651-1660

Fukushima RS, Hatfield RD (2001) Extraction and isolation of lignin for utilization as a standard to determine lignin concentration using the acetyl bromide spectrophotometric method, J. Agric. Food Chem., 49: 7, 3133 – 3139

Garcia S, Latgé JP (1987) A new colorimetric method for dosage of lignin, Biotech. Techniq., 1(1): 63 – 68

Gray-Mitsumune M, Molitor EK, Cukovic D, Carlson JE, Douglas CJ (1999) Developmentally regulated patterns of expression directed by poplar PAL promoter in transgenic tobacco and poplar, Plant Mol. Biol., 39: 657–669

Guo AY, Zhu QH, Chen X, Luo JC (2007) GSDS: a gene structure display server, Yi Chuan 29(8):1023-1026

Gutierrez L, Mauriat M, Guénin S, Pelloux J, Lefebvre J-F, Louvet R, Rusterucci C, Moritz T, Guerineau F, Bellini C, Van Wuytswinkel O (2008) The lack of a systematic validation of reference genes: a serious pitfall undervalued in reverse transcription-polymerase chain reaction (RT-PCR) analysis in plants, Plant Biotech. J., 6(6): 609-618

Hall BG (2001) Phylogenetic trees made easy, Ed.2, Sinauer Associates Inc., Sunderland MA, USA

Hall BG (2007) Phylogenetic trees made easy, Ed.3, Sinauer Associates Inc., Sunderland MA, USA

Harding SA, Leshkevich J, Chiang VL, Tsai C-J (2002) Differential substrate inhibition couples kinetically distinct 4-Coumarate :Coenzyme A ligases with spatially distinct metabolic roles in quaking aspen, Plant Physiology, 128: 428 – 438

Harris PJ (2005) Diversity in plant cell walls. In: Henry RJ, editor. Plant diversity and evolution: genotypic and phenotypic variation in higher plants. Wallingford, UK: CAB International Publishing; 2005. p. 201-227

Henrissat B (1991) A classification of glycosyl hydrolases based on amino acid sequence similarities, Biochem. J., 280: 309 - 316

Henrissat B, Bairoch A (1993) New families in the classification of glycosyl hydrolases based on amino acid sequence similarities, Biochem. J. 293: 781 - 788

Henrissat B, Bairoch A (1996) Updating the sequence-based classification of glycosyl hydrolases, Biochem. J., 316: 695 - 696

Henrissat B, Davies G (1997) Structural and sequence based classification of glycoside hydrolases, Current Opinion in Structural Biology, 7:637-644

Hosel W, Fiedler–Preiss A, Borgmann E (1982) Relationship of coniferin β-glucosidase to lignification in various plant cell suspension cultures, Plant cell Org Cult, 1: 137–148

Iiyama K, Wallis FA (1988) An improved acetyl bromide procedure for determining lignin in woods and wood pulps, Wood Sci. Technol., 22:271 – 280

Jansson S, Douglas CJ (2007) Populus: A model system for Plant biology, Annu. Rev. Plant Biol., 58:435–58

Jiao Y, Wickett NJ, Ayyampalayam S, Chanderbali AS, Landherr L, Ralph PE, Tomsho LP, Hu Y, Liang H, Soltis PS, Soltis DE, Clifton SW, Schlarbaum SE, Schuster SC, Ma H, Leebens-Mack J, dePamphilis CW (2011) Ancestral polyploidy in seed plants and angiosperms, Nature, 473: 97-102

Jin L, Lloyd RV (1997) In situ hybridization: methods and applications, J. Clinical Lab. Analysis, 11: 2 – 9

Johnson DB, Moore WE, Zank L (1961) Spectrophotometric determination of lignin in small wood samples, TAPPI, 44(11): 793 – 780

Jones DT, Taylor WR, Thornton JM (1992) The rapid generation of mutation data matrices from protein sequences, Computer Applications in the Biosciences, 8: 275-282

Kaneda M, Rensing KH, Wong JCT, Banno B, Mansfield SD, Samuels L (2008) Tracking monolignols during wood development in Lodgepole Pine, Plant Physiology, 147(4): 1750-1760

Leple JC, Brasileiro ACM, Michel MF, Delmotte F, Jouanin L (1992) Transgenic poplars: expression of chimeric genes using four different constructs, Plant Cell Reports, 11: 137–141

Lev-Yadun S, Sederoff R (2000) Pines as model gymnosperms to study evolution, wood formation and perennial growth, J Plant Growth Regul, 19: 290 - 305.

Liang H, Maynard CA, Allen RD, Powell WA (2001) Increased Septoria musiva resistance in transgenic hybrid poplar leaves expressing a wheat oxalate oxidase gene, Plant Mol. Bio., 45: 619–629

Livak KJ, Schmittgen TD (2001) Analysis of relative gene expression data using real-time quantitative PCR and the 2^(−ΔΔCT) method, Methods, 25: 402–408

Opassiri R, Pomthong B, Onkoksoong T, Akiyama T, Esen A, Cairns JRK (2006) Analysis of rice glycosyl hydrolase family 1 and expression of Os4bglu12 β-glucosidase, BMC Plant Biology, 6: 33

Patten AM, Cardenas CL, Cochrane FC, Laskar DD, Bedgar DL, Davin LB, Lewis NG (2005) Reassessment of effects of lignification and vascular development in the irx4 Arabidopsis mutant, Phytochemistry, 66: 2092 - 2107.

Patten AM, Jourdes M, Brown EE, Laborie M-P, Davin LB, Lewis NG (2007) Reaction tissue formation and stem tensile modulus properties in wild-type and p-coumarate-3-hydroxylase downregulated lines of alfalfa, Medicago sativa (Fabaceae), Am. J. of Botany, 94(6): 912 – 925

Rice P, Longden I, Bleasby A (2000) EMBOSS: The European Molecular Biology Open Software Suite, Trends in Genetics, 16(6): 276-277

Rogers LA, Campbell MM (2004) The genetic control of lignin deposition during plant growth and development, New Phytologist, 164: 17–30

Rozen S, Skaletsky HJ (2000) Primer3 on the WWW for general users and for biologist programmers. In: Krawetz S, Misener S (eds) Bioinformatics Methods and Protocols: Methods in Molecular Biology. Humana Press, Totowa, NJ, pp 365-386

Samuels AL, Rensing KH, Douglas CJ, Mansfield SD, Dharmawardhana DP, Ellis BE (2002) Cellular machinery of wood production: differentiation of secondary xylem in Pinus contorta var. latifolia, Planta, 216: 72-82

Sarkar P, Bosneaga E, Auer M (2009) Plant cell walls throughout evolution: towards a molecular understanding of their design principles, J. Exp. Bot. 60 (13): 3615-3635.

Schneider TD, Stephens RM (1990) Sequence logos: a new way to display consensus sequences, Nucleic Acids Res., 18: 6097-6100

Sticklen M (2006) Plant genetic engineering to improve biomass characteristics of biofuel, Current Opinions in Biotech., 17: 315 – 319

Taiz L, Zeiger E (2006) Plant Physiology, Ed 4, Sinauer Associates Inc., Sunderland MA

Tamura K, Peterson D, Peterson N, Stecher G, Nei M, and Kumar S (2011) MEGA5: Molecular Evolutionary Genetics Analysis using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods. Molecular Biology and Evolution, 28: 2731-2739

Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A, et al (2006) The genome of black cottonwood, Populus trichocarpa (Torr. & Gray), Science, 313: 1596–1604

van Uden W, Pras N, Batterman S, Visser JF, Malingré TM (1990) The accumulation and isolation of coniferin from a high-producing cell suspension of Linum flavum L., Planta, 183: 25 – 30

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure, Plant Physiology, 153: 895-905

Xu Z, Escamilla-Trevino LL, Zeng L, Lalgondar M, Bevan DR, Winkel BSJ, Mohamed A, Cheng C-L, Shih M-C, Poulton JE, Esen A (2004) Functional genomic analysis of Arabidopsis thaliana glycosyl hydrolase family 1, Plant Mol. Bio., 55: 343-367

Zhong R, Taylor JJ, Ye Z-H (1997) Disruption of interfascicular fiber differentiation in an Arabidopsis mutant, The Plant Cell, 9: 2159-2170

WEBSITES:

http://www.phytozome.net/
http://www.plantgdb.org/
http://blast.ncbi.nlm.nih.gov/
http://gsds.cbi.pku.edu.cn/
http://weblogo.berkeley.edu/

Appendix A:

All sequences in Phylogenetic tree

Bryophytes Physcomitrella patens: Funariaceae Ppa:Pp1s1_726V6 Ppa:Pp1s110_112V6 Ppa:Pp1s114_133V6 Ppa:Pp1s137_79V6 Ppa:Pp1s138_14V6 Ppa:Pp1s170_62V6 Ppa:Pp1s22_312V6 Ppa:Pp1s39_162V6 Ppa:Pp1s76_8V6 Ppa:Pp1s9_78V6 Ppe:ppa003718m Ppe:ppa003831m Ppe:ppa003856m Ppe:ppa003891m Ppe:ppa004108m Ppe:ppa004226m Ppe:ppa004288m Ppe:ppa004358m Ppe:ppa004368m Ppe:ppa004380m Ppe:ppa004484m Ppe:ppa004523m Ppe:ppa005036m Ppe:ppa005194m Ppe:ppa006110m Ppe:ppa006142m Ppe:ppa006167m Ppe:ppa007195m Ppe:ppa015161m Ppe:ppa015330m Ppe:ppa015619m Ppe:ppa015721m Ppe:ppa015887m Ppe:ppa015970m Ppe:ppa016583m Ppe:ppa016757m Ppe:ppa017484m Ppe:ppa017816m Ppe:ppa017981m

Ppe:ppa018404m Ppe:ppa018777m Ppe:ppa018933m Ppe:ppa019137m Ppe:ppa019262m Ppe:ppa019518m Ppe:ppa019573m Ppe:ppa019582m Ppe:ppa020067m Ppe:ppa020368m Ppe:ppa020817m Ppe:ppa020839m Ppe:ppa021137m Ppe:ppa021158m Ppe:ppa021225m Ppe:ppa022045m Ppe:ppa022156m Ppe:ppa022513m Ppe:ppa022831m Ppe:ppa023034m Ppe:ppa023264m Ppe:ppa023565m Ppe:ppa024193m Ppe:ppa024207m Ppe:ppa024434m Ppe:ppa024630m Ppe:ppa024653m Ppe:ppa025067m Ppe:ppa025259m Ppe:ppa025619m Ppe:ppa025660m Ppe:ppa025840m Ppe:ppa026358m Ppe:ppa026448m Ppe:ppa027074m Ppe:ppa027189m

Pteridophytes Selaginella moellendorffii Smo:127964 Smo:144295 Smo:149851 Smo:151109 Smo:153534 Smo:163822 Smo:163827 Smo:228612 Smo:268319

Smo:268527 Smo:408050 Smo:419452 Smo:73365 Smo:75234 Smo:76384 Smo:76748 Smo:88863 Smo:98083

Gymnosperms Picea sitchensis (Spruce): Pinaceae A9NVG8_PICSI A9NZQ3_PICSI B8LQ09_PICSI B8LQ52_PICSI C0PT85_PICSI D5ACM5_PICSI D5AD30_PICSI

Pinus contorta: Pinaceae PcCBG AAC69619_ Q9ZT64 PccbgA PccbgB

Pinus taeda: Pinaceae PtcbgA1 PtcbgA2 PtcbgB1 PtcbgB2

Angiosperms : Monocots Brachypodium distachyon: Poaceae Bdi:Bradi1g10890 Bdi:Bradi1g10920 Bdi:Bradi1g10930 Bdi:Bradi1g10940 Bdi:Bradi1g19270 Bdi:Bradi1g33040 Bdi:Bradi1g42690 Bdi:Bradi1g70170 Bdi:Bradi2g09190 Bdi:Bradi2g09200 Bdi:Bradi2g27770 Bdi:Bradi2g57640 Bdi:Bradi2g59650 Bdi:Bradi2g59660

Bdi:Bradi3g00650 Bdi:Bradi3g40000 Bdi:Bradi3g40010 Bdi:Bradi3g45610 Bdi:Bradi3g45630 Bdi:Bradi3g45640 Bdi:Bradi3g45650 Bdi:Bradi4g08040 Bdi:Bradi4g34930 Bdi:Bradi4g34940 Bdi:Bradi4g34950 Bdi:Bradi5g13260 Bdi:Bradi5g13270 Bdi:Bradi5g15530 Bdi:Bradi5g15540

Oryza sativa (Rice): Poaceae Os01g32364_Os1bglu1 Os01g59819_Os1bglu2 Os01g59840_Os1bglu3 Os01g67220_Os1bglu4 Os01g70520_Os1bglu5 Os03g11420_Os3bglu6 Os03g49600_Os3bglu7 Os03g49610_Os3bglu8 Os04g39814_Os4bglu9 Os04g39840_Os4bglu10 Os04g39864_Os4bglu11 Os04g39880_Os4bglu12 Os04g39900_Os4bglu13 Os04g43360_Os4bglu14 Os04g43390_Os4bglu16 Os04g43400_Os4bglu17 Os04g43410_Os4bglu18 Os05g30250_Os5bglu19 Os05g30280_Os5bglu20 Os05g30300_Os5bglu21 Os05g30350_Os5bglu22 Os05g30390_Os5bglu23 Os06g21570_Os6bglu24 Os06g46940_Os6bglu25 Os07g46280_Os7bglu26 Os08g39860_Os8bglu27 Os08g39870_Os8bglu28 Os09g31410_Os9bglu29 Os09g31430_Os9bglu30 Os09g33680_Os9bglu31 Os09g33690_Os9bglu32

Os09g33710_Os9bglu33 Os10g17650_Os10bglu34 Os11g08120_Os11bglu37 Os12g23170_Os12bglu38

Sorghum bicolor: Poaceae Sbi:Sb01g010825 Sbi:Sb01g010830 Sbi:Sb01g010840 Sbi:Sb01g013360 Sbi:Sb01g043030 Sbi:Sb02g028400 Sbi:Sb02g029620 Sbi:Sb02g029640 Sbi:Sb02g041550 Sbi:Sb03g008350 Sbi:Sb03g029560 Sbi:Sb03g037780 Sbi:Sb03g042690 Sbi:Sb06g019830 Sbi:Sb06g019840 Sbi:Sb06g019850 Sbi:Sb06g019860 Sbi:Sb06g019880 Sbi:Sb06g022385 Sbi:Sb06g022410 Sbi:Sb06g022420 Sbi:Sb06g022450 Sbi:Sb06g022460 Sbi:Sb06g022490 Sbi:Sb06g022500 Sbi:Sb06g022510 Sbi:Sb08g007570 Sbi:Sb08g007586 Sbi:Sb08g007610 Sbi:Sb08g007650 Sbi:Sb09g018145 Sbi:Sb09g018160 Sbi:Sb09g018180 Sbi:Sb10g012220 Sbi:Sb10g022300 Sbi:Sb10g027600 Sbi:Sb10g028060

Setaria italica: Poaceae Sit:Si001092m Sit:Si001247m Sit:Si002332m

Sit:Si006283m Sit:Si006286m Sit:Si008435m Sit:Si009837m Sit:Si009850m Sit:Si009871m Sit:Si009882m Sit:Si009896m Sit:Si009910m Sit:Si010127m Sit:Si011994m Sit:Si012753m Sit:Si013564m Sit:Si021753m Sit:Si021785m Sit:Si024527m Sit:Si029509m Sit:Si029536m Sit:Si029542m Sit:Si029565m Sit:Si032721m Sit:Si034935m Sit:Si034989m Sit:Si035308m Sit:Si035908m

Zea mays (maize): Poaceae Zma:AC155376.2_FGT005 Zma:AC203966_FGT006 Zma:AC217401.3_FGT002 Zma:AC234160.1_FGT003 Zma:GRMZM2G008247 Zma:GRMZM2G012236 Zma:GRMZM2G014844 Zma:GRMZM2G015804 Zma:GRMZM2G016890 Zma:GRMZM2G031660 Zma:GRMZM2G031693 Zma:GRMZM2G044092 Zma:GRMZM2G055699 Zma:GRMZM2G069024 Zma:GRMZM2G076946 Zma:GRMZM2G077015 Zma:GRMZM2G108133 Zma:GRMZM2G112704 Zma:GRMZM2G118003 Zma:GRMZM2G120962 Zma:GRMZM2G148176

Zma:GRMZM2G163544 Zma:GRMZM2G174699 Zma:GRMZM2G362362 Zma:GRMZM2G376416 Zma:GRMZM2G426467 Zma:GRMZM5G810727 Zma:GRMZM5G828987 Zma:GRMZM5G845736 Zma:GRMZM5G882852

Basal Eudicot Aquilegia coerulea (Columbine): Ranunculaceae Aco:AcoGoldSmith_v1.003009m Aco:AcoGoldSmith_v1.003037m Aco:AcoGoldSmith_v1.003160m Aco:AcoGoldSmith_v1.003204m Aco:AcoGoldSmith_v1.003268m Aco:AcoGoldSmith_v1.003283m Aco:AcoGoldSmith_v1.003285m Aco:AcoGoldSmith_v1.003299m Aco:AcoGoldSmith_v1.003324m Aco:AcoGoldSmith_v1.003325m Aco:AcoGoldSmith_v1.003341m Aco:AcoGoldSmith_v1.003363m Aco:AcoGoldSmith_v1.003389m Aco:AcoGoldSmith_v1.003397m Aco:AcoGoldSmith_v1.003454m Aco:AcoGoldSmith_v1.003499m Aco:AcoGoldSmith_v1.003571m Aco:AcoGoldSmith_v1.003589m Aco:AcoGoldSmith_v1.003600m Aco:AcoGoldSmith_v1.003647m Aco:AcoGoldSmith_v1.003722m Aco:AcoGoldSmith_v1.004386m Aco:AcoGoldSmith_v1.004679m Aco:AcoGoldSmith_v1.005032m Aco:AcoGoldSmith_v1.005665m Aco:AcoGoldSmith_v1.005926m Aco:AcoGoldSmith_v1.007023m Aco:AcoGoldSmith_v1.007354m Aco:AcoGoldSmith_v1.007763m Aco:AcoGoldSmith_v1.007844m Aco:AcoGoldSmith_v1.012802m Aco:AcoGoldSmith_v1.013289m Aco:AcoGoldSmith_v1.014487m Aco:AcoGoldSmith_v1.015405m Aco:AcoGoldSmith_v1.015560m

Aco:AcoGoldSmith_v1.015570m Aco:AcoGoldSmith_v1.018099m Aco:AcoGoldSmith_v1.018788m Aco:AcoGoldSmith_v1.020585m Aco:AcoGoldSmith_v1.021327m Aco:AcoGoldSmith_v1.021630m Aco:AcoGoldSmith_v1.022154m Aco:AcoGoldSmith_v1.022298m Aco:AcoGoldSmith_v1.022499m Aco:AcoGoldSmith_v1.022713m Aco:AcoGoldSmith_v1.023363m Aco:AcoGoldSmith_v1.024931m Aco:AcoGoldSmith_v1.025034m Aco:AcoGoldSmith_v1.025778m Aco:AcoGoldSmith_v1.026253m

Angiosperms : Eudicots Arabidopsis lyrata :Brassicaceae (Rosid) Aly:314159 Aly:315311 Aly:315321 Aly:316517 Aly:326101 Aly:359476 Aly:472753 Aly:474221 Aly:474259 Aly:474364 Aly:475147 Aly:475148 Aly:475732 Aly:477608 Aly:478241 Aly:479295 Aly:479297 Aly:481414 Aly:483572 Aly:483573 Aly:486477 Aly:486479 Aly:486481 Aly:486774 Aly:486775 Aly:489446 Aly:491999 Aly:492664 Aly:492699 Aly:493685

Aly:494520 Aly:495536 Aly:862845 Aly:870958 Aly:873493 Aly:875932 Aly:878433 Aly:882771 Aly:891388 Aly:892250 Aly:898581 Aly:902351 Aly:903867 Aly:903868 Aly:907850 Aly:909775 Aly:917734 Aly:918586 Aly:920911

Arabidopsis thaliana: Brassicaceae (Rosid) AT1G02850_BGLU11 AT1G26560_BGLU40 AT1G45191_BGLU1 AT1G47600_BGLU34_TGG4 AT1G51470_BGLU35_TGG5 AT1G51490_BGLU36 AT1G52400_BGLU18_ATBG1_BGL1 AT1G60090_BGLU4 AT1G60260_BGLU5 AT1G60270_BGLU6 AT1G61810_BGLU45 AT1G61820_BGLU46 AT1G66270_BGLU21 AT1G66280_BGLU22 AT1G75940_BGLU20_ATA27 AT2G25630_BGLU14 AT2G32860_BGLU33 AT2G44450_BGLU15 AT2G44460_BGLU28 AT2G44470_BGLU29 AT2G44480_BGLU17 AT2G44490_BGLU26_PEN2 AT3G03640_BGLU25_GLUC AT3G09260_BGLU23 AT3G18070_BGLU43 AT3G18080_BGLU44 AT3G21370_BGLU19

AT3G60120_BGLU27 AT3G60130_BGLU16 AT3G60140_BGLU30_DIN2_SRG2 AT3G62740_BGLU7 AT3G62750_BGLU8 AT4G21760_BGLU47 AT4G22100_BGLU3 AT4G27820_BGLU9 AT4G27830_BGLU10 AT5G16580_BGLU2 AT5G24540_BGLU31 AT5G24550_BGLU32 AT5G25980_BGLU37_TGG2 AT5G26000_BGLU38_TGG1 AT5G28510_BGLU24 AT5G36890_BGLU42 AT5G42260_BGLU12 AT5G44640_BGLU13 AT5G48375_BGLU39_TGG3 AT5G54570_BGLU41

Citrus clementina: Rutaceae (Rosid) Ccl:Clementine0.9_006414m Ccl:Clementine0.9_006916m Ccl:Clementine0.9_007212m Ccl:Clementine0.9_007409m Ccl:Clementine0.9_007416m Ccl:Clementine0.9_007556m Ccl:Clementine0.9_007753m Ccl:Clementine0.9_007955m Ccl:Clementine0.9_007994m Ccl:Clementine0.9_008585m Ccl:Clementine0.9_009489m Ccl:Clementine0.9_011904m Ccl:Clementine0.9_013036m Ccl:Clementine0.9_027496m Ccl:Clementine0.9_027839m Ccl:Clementine0.9_028095m Ccl:Clementine0.9_028622m Ccl:Clementine0.9_029031m Ccl:Clementine0.9_029561m Ccl:Clementine0.9_030023m Ccl:Clementine0.9_030448m Ccl:Clementine0.9_030603m Ccl:Clementine0.9_031033m Ccl:Clementine0.9_034689m

Carica papaya: Caricaceae (Rosid)

Cpa:evm.model.supercontig_1.109 Cpa:evm.model.supercontig_1.110 Cpa:evm.model.supercontig_116.16 Cpa:evm.model.supercontig_116.17 Cpa:evm.model.supercontig_17.152 Cpa:evm.model.supercontig_189.36 Cpa:evm.model.supercontig_198.21 Cpa:evm.model.supercontig_3.438 Cpa:evm.model.supercontig_444 Cpa:evm.model.supercontig_444(2) Cpa:evm.model.supercontig_444(3) Cpa:evm.model.supercontig_561.1 Cpa:evm.model.supercontig_561.2 Cpa:evm.model.supercontig_80.122 Cpa:evm.model.supercontig_80.123 Cpa:evm.model.supercontig_84.113 Cpa:evm.model.supercontig_88.6 Cpa:evm.model.supercontig_9.230 Cpa:evm.TU.contig_25278 Cpa:evm.TU.contig_28829 Cpa:evm.TU.contig_29746

Cucumis sativus: Cucurbitaceae (Rosid) Csa:Cucsa.029410 Csa:Cucsa.050040 Csa:Cucsa.053330 Csa:Cucsa.086000 Csa:Cucsa.097180 Csa:Cucsa.108330 Csa:Cucsa.121830 Csa:Cucsa.123340 Csa:Cucsa.143110 Csa:Cucsa.143120 Csa:Cucsa.143140 Csa:Cucsa.143150 Csa:Cucsa.143170 Csa:Cucsa.143180 Csa:Cucsa.143190 Csa:Cucsa.143200 Csa:Cucsa.189380 Csa:Cucsa.327780 Csa:Cucsa.336940 Csa:Cucsa.336950 Csa:Cucsa.338070 Csa:Cucsa.341820

Citrus sinensis: Rutaceae (Rosid) Csi:orange1.1g009535m

Csi:orange1.1g009558m Csi:orange1.1g009642m Csi:orange1.1g009780m Csi:orange1.1g010049m Csi:orange1.1g010588m Csi:orange1.1g011145m Csi:orange1.1g012181m Csi:orange1.1g012716m Csi:orange1.1g012937m Csi:orange1.1g013298m Csi:orange1.1g015181m Csi:orange1.1g036046m Csi:orange1.1g036937m Csi:orange1.1g036948m Csi:orange1.1g040158m Csi:orange1.1g040688m Csi:orange1.1g045534m Csi:orange1.1g046009m Csi:orange1.1g046612m Csi:orange1.1g046891m

Eucalyptus grandis: Myrtaceae (Rosid) Egr:Egrandis_v1_0.007033m Egr:Egrandis_v1_0.008064m Egr:Egrandis_v1_0.008129m Egr:Egrandis_v1_0.008570m Egr:Egrandis_v1_0.008724m Egr:Egrandis_v1_0.008928m Egr:Egrandis_v1_0.008949m Egr:Egrandis_v1_0.008959m Egr:Egrandis_v1_0.009178m Egr:Egrandis_v1_0.009188m Egr:Egrandis_v1_0.009251m Egr:Egrandis_v1_0.009325m Egr:Egrandis_v1_0.009363m Egr:Egrandis_v1_0.009436m Egr:Egrandis_v1_0.009542m Egr:Egrandis_v1_0.009873m Egr:Egrandis_v1_0.010407m Egr:Egrandis_v1_0.010473m Egr:Egrandis_v1_0.011486m Egr:Egrandis_v1_0.012431m Egr:Egrandis_v1_0.026971m Egr:Egrandis_v1_0.038165m Egr:Egrandis_v1_0.041084m Egr:Egrandis_v1_0.044074m Egr:Egrandis_v1_0.045102m Egr:Egrandis_v1_0.045275m

Egr:Egrandis_v1_0.047640m Egr:Egrandis_v1_0.048466m Egr:Egrandis_v1_0.048749m Egr:Egrandis_v1_0.049044m Egr:Egrandis_v1_0.050233m Egr:Egrandis_v1_0.052898m Egr:Egrandis_v1_0.054030m Egr:Egrandis_v1_0.009802m

Glycine max (Soybean): Fabaceae (Rosid) Gma:Glyma01g06980 Gma:Glyma02g02230 Gma:Glyma02g17480 Gma:Glyma02g17490 Gma:Glyma02g40910 Gma:Glyma06g41200 Gma:Glyma07g11310 Gma:Glyma07g18400 Gma:Glyma07g18410 Gma:Glyma07g38840 Gma:Glyma07g38850 Gma:Glyma08g15930 Gma:Glyma08g15950 Gma:Glyma08g15960 Gma:Glyma08g15980 Gma:Glyma08g46180 Gma:Glyma09g00550 Gma:Glyma09g30910 Gma:Glyma11g13770 Gma:Glyma11g13780 Gma:Glyma11g13800 Gma:Glyma11g13810 Gma:Glyma11g13820 Gma:Glyma11g13830 Gma:Glyma11g13850 Gma:Glyma11g13860 Gma:Glyma11g16220 Gma:Glyma12g05770 Gma:Glyma12g05780 Gma:Glyma12g05790 Gma:Glyma12g05800 Gma:Glyma12g05810 Gma:Glyma12g05820 Gma:Glyma12g05830 Gma:Glyma12g11280 Gma:Glyma12g15620 Gma:Glyma12g35120 Gma:Glyma12g35140

Gma:Glyma12g36870 Gma:Glyma13g35410 Gma:Glyma13g35430 Gma:Glyma13g41800 Gma:Glyma14g39230 Gma:Glyma15g03610 Gma:Glyma15g03620 Gma:Glyma15g11290 Gma:Glyma15g42570 Gma:Glyma15g42590 Gma:Glyma16g19480 Gma:Glyma20g03210

Vitis vinifera: Vitaceae (Basal Rosid) GSVIVT01003998001 GSVIVT01003999001 GSVIVT01008398001 GSVIVT01012191001 GSVIVT01012192001 GSVIVT01012650001 GSVIVT01014399001 GSVIVT01014400001 GSVIVT01014551001 GSVIVT01025343001 GSVIVT01028001001 GSVIVT01028004001 GSVIVT01028006001 GSVIVT01030881001 GSVIVT01032004001 GSVIVT01032005001 GSVIVT01032006001 GSVIVT01032007001 GSVIVT01032014001 GSVIVT01032015001 GSVIVT01032017001 GSVIVT01032018001 GSVIVT01032019001 GSVIVT01032023001 GSVIVT01032025001 GSVIVT01032142001 GSVIVT01032149001

Manihot esculenta (Cassava): Euphorbiaceae (Rosid) Mes:cassava4.1_005099m Mes:cassava4.1_005127m Mes:cassava4.1_005140m Mes:cassava4.1_005555m Mes:cassava4.1_005677m

Mes:cassava4.1_005766m Mes:cassava4.1_005780m Mes:cassava4.1_005890m Mes:cassava4.1_006078m Mes:cassava4.1_006309m Mes:cassava4.1_006332m Mes:cassava4.1_006336m Mes:cassava4.1_008514m Mes:cassava4.1_008910m Mes:cassava4.1_012507m Mes:cassava4.1_021008m Mes:cassava4.1_021061m Mes:cassava4.1_021468m Mes:cassava4.1_022562m Mes:cassava4.1_022817m Mes:cassava4.1_022953m Mes:cassava4.1_023547m Mes:cassava4.1_023739m Mes:cassava4.1_023753m Mes:cassava4.1_024320m Mes:cassava4.1_024585m Mes:cassava4.1_024736m Mes:cassava4.1_025199m Mes:cassava4.1_026312m Mes:cassava4.1_026351m Mes:cassava4.1_026504m Mes:cassava4.1_027570m Mes:cassava4.1_027713m Mes:cassava4.1_028433m Mes:cassava4.1_028771m Mes:cassava4.1_028884m Mes:cassava4.1_029035m Mes:cassava4.1_029195m Mes:cassava4.1_029389m Mes:cassava4.1_029420m Mes:cassava4.1_029644m Mes:cassava4.1_031027m Mes:cassava4.1_031067m Mes:cassava4.1_031101m Mes:cassava4.1_031147m Mes:cassava4.1_031399m Mes:cassava4.1_031777m Mes:cassava4.1_032229m Mes:cassava4.1_032290m Mes:cassava4.1_032518m Mes:cassava4.1_032622m Mes:cassava4.1_032669m Mes:cassava4.1_032853m

Mes:cassava4.1_033294m Mes:cassava4.1_033426m Mes:cassava4.1_033687m

Mimulus guttatus: Lamiales (Asterid) Mgu:mgv1a003998m Mgu:mgv1a004080m Mgu:mgv1a004735m Mgu:mgv1a004736m Mgu:mgv1a004935m Mgu:mgv1a004939m Mgu:mgv1a005186m Mgu:mgv1a006902m Mgu:mgv1a007433m Mgu:mgv1a007486m Mgu:mgv1a018115m Mgu:mgv1a019285m Mgu:mgv1a023593m Mgu:mgv1a023610m Mgu:mgv1a023615m Mgu:mgv1a025215m Mgu:mgv1a025437m Mgu:mgv1a025884m

Medicago truncatula: Fabaceae (Rosid) Mtr:AC233100_73.1 Mtr:Medtr1g108460 Mtr:Medtr1g108470 Mtr:Medtr3g040460 Mtr:Medtr3g040490 Mtr:Medtr3g166720 Mtr:Medtr4g024200 Mtr:Medtr4g115490 Mtr:Medtr4g115640 Mtr:Medtr5g077420 Mtr:Medtr7g046330 Mtr:Medtr7g046340 Mtr:Medtr7g046370 Mtr:Medtr7g046430 Mtr:Medtr8g146370 Mtr:Medtr8g146380 Mtr:Medtr8g146400

Populus trichocarpa: Salicaceae (Rosid) (see Table 2.1 for detailed gene names) PoptrGH1-1 PoptrGH1-2 PoptrGH1-3

PoptrGH1-4 PoptrGH1-5 PoptrGH1-6 PoptrGH1-7 PoptrGH1-8 PoptrGH1-9 PoptrGH1-10 PoptrGH1-11 PoptrGH1-12 PoptrGH1-13 PoptrGH1-14 PoptrGH1-15 PoptrGH1-16 PoptrGH1-17 PoptrGH1-18 PoptrGH1-19 PoptrGH1-20 PoptrGH1-21 PoptrGH1-23 PoptrGH1-24 PoptrGH1-25 PoptrGH1-26 PoptrGH1-27 PoptrGH1-28 PoptrGH1-29 PoptrGH1-30 PoptrGH1-31 PoptrGH1-32 PoptrGH1-33 PoptrGH1-34 PoptrGH1-35 PoptrGH1-36 PoptrGH1-37 PoptrGH1-38 PoptrGH1-39 PoptrGH1-40 PoptrGH1-42 PoptrGH1-43 PoptrGH1-44

Ricinus communis: Euphorbiaceae (Rosid) Rco:28330.m000020 Rco:29808.m000891 Rco:29808.m000892 Rco:29842.m003629 Rco:29878.m000230 Rco:29904.m002964 Rco:29924.m000095

Rco:29929.m004509 Rco:29986.m001601 Rco:29986.m001602 Rco:29986.m001603 Rco:29986.m001604 Rco:29986.m001605 Rco:29986.m001606 Rco:30147.m014538 Rco:30147.m014539 Rco:30169.m006385 Rco:30169.m006386 Rco:30174.m009125 Rco:30226.m001977 Rco:30226.m001978 Rco:30226.m001982 Rco:30226.m001984 Rco:30226.m001987

Appendix B:

Tables, figures and raw data for Chapter 3

APPENDIX B

PART A

TABLES OF RAW DATA AND EXPRESSION FOLD CHANGES FROM qRT-PCR ASSAYS, WITH TABLES AND FIGURES FOR ANOVA OUTPUT FOR ANALYSIS OF qRT-PCR DATA, FOR POPLAR SAMPLES

Table A2.1.1.1: Raw data and expression fold change calculations for poplar samples with PoptrGH1-24 normalized with UBQ11.

                    PoptrGH1-24                 UBQ 11                      Target - Reference (UBQ11)  Fold increase 2^[-(t-r)]
Month      Bio Rep  Leaves  Phloem  Xylem       Leaves  Phloem  Xylem       Leaves  Phloem  Xylem       Leaves    Phloem    Xylem
April      1        -       25.6    25.49       28.1    28.68   29.32       -       -3.08   -3.83       -         8.456144  14.22148
April      2        -       25.58   25.97       27.23   28.94   30.17       -       -3.36   -4.2        -         10.26741  18.37917
April      3        -       26.71   25.78       28.94   28.2    24.97       -       -1.49   0.81        -         2.80889   0.570382
June       1        32.94   28.17   27.26       26.91   27.19   25.75       6.03    0.98    1.51        0.015303  0.50698   0.351111
June       2        29.44   28.19   25.6        27.67   28.02   26.73       1.77    0.17    -1.13       0.293209  0.888843  2.188587
June       3        -       28.19   23.99       24.09   28.48   26.2        -       -0.29   -2.21       -         1.22264   4.626753
September  1        33.19   34.51   35.27       29.72   27.76   28.59       3.47    6.75    6.68        0.090246  0.009291  0.009753
September  2        -       33.84   30.51       29.22   28.15   28.09       -       5.69    2.42        -         0.01937   0.186856
September  3        32.56   29.62   34.98       28.4    26.99   27.27       4.16    2.63    7.71        0.055939  0.161544  0.004776
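
The fold-increase columns of Table A2.1.1.1 follow from the raw values as 2^[-(t-r)], where t is the target-gene value and r is the UBQ11 reference value. A minimal sketch of that arithmetic, assuming Python and a hypothetical helper named `fold_change` (neither is taken from the thesis workflow), using the April, biological replicate 1, phloem entry:

```python
def fold_change(target_ct, reference_ct):
    """Return (delta, fold) with fold = 2 ** -(target_ct - reference_ct).

    Hypothetical helper for illustration only. A missing measurement,
    shown as "-" or "_" in the table, is passed as None and propagates.
    """
    if target_ct is None or reference_ct is None:
        return None, None
    delta = target_ct - reference_ct
    return delta, 2.0 ** (-delta)


# April, biological replicate 1, phloem: PoptrGH1-24 = 25.6, UBQ11 = 28.68
delta, fold = fold_change(25.6, 28.68)
print(f"target - reference = {delta:.2f}, fold increase = {fold:.6f}")
```

A negative difference means the target signal appeared earlier than the reference, so the fold increase is greater than 1; a positive difference gives a value below 1.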

Table A2.1.1.2a: Two-way ANOVA for Gene PoptrGH1-24: dCt (Normalized with UBQ11) versus Bio Rep, and different tissue types in April.

OUTPUT

Source   DF       SS       MS     F      P
Bio Rep   2  116.277  58.1384  2.66  0.185
Tissue    2  188.825  94.4123  4.31  0.100
Error     4   87.588  21.8969
Total     8  392.689

S = 4.679 R-Sq = 77.70% R-Sq(adj) = 55.39%
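
The summary line under each ANOVA table can be re-derived from the table itself. A minimal sketch, assuming Python and a hypothetical helper `anova_summary` (not part of the original analysis), which computes MS = SS/DF, F = MS/MS(Error), S = sqrt(MS(Error)), R-Sq = 1 - SS(Error)/SS(Total) and R-Sq(adj) = 1 - MS(Error)/(SS(Total)/DF(Total)) from the values in Table A2.1.1.2a. Clamping a negative R-Sq(adj) to zero is an assumption, chosen to be consistent with the 0.00% values reported in some of the later tables:

```python
import math


def anova_summary(ss, df):
    """Derive MS, F, S, R-Sq and R-Sq(adj) from sums of squares and DF.

    `ss` and `df` are dicts keyed by source name; the keys "Error" and
    "Total" are required. Hypothetical helper for illustration only.
    """
    ms = {k: ss[k] / df[k] for k in ss if k != "Total"}
    f = {k: ms[k] / ms["Error"] for k in ms if k != "Error"}
    s = math.sqrt(ms["Error"])
    r_sq = 1.0 - ss["Error"] / ss["Total"]
    r_sq_adj = 1.0 - ms["Error"] / (ss["Total"] / df["Total"])
    # Assumption: a negative adjusted value is reported as zero.
    return ms, f, s, r_sq, max(r_sq_adj, 0.0)


# Values copied from Table A2.1.1.2a (PoptrGH1-24, April)
ss = {"Bio Rep": 116.277, "Tissue": 188.825, "Error": 87.588, "Total": 392.689}
df = {"Bio Rep": 2, "Tissue": 2, "Error": 4, "Total": 8}

ms, f, s, r_sq, r_sq_adj = anova_summary(ss, df)
print(f"F(Bio Rep) = {f['Bio Rep']:.2f}, F(Tissue) = {f['Tissue']:.2f}")
print(f"S = {s:.3f}, R-Sq = {100 * r_sq:.2f}%, R-Sq(adj) = {100 * r_sq_adj:.2f}%")
```

The P column is not reproduced here because it requires the F distribution, which is outside the Python standard library.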

Figure A2.1a Residual Plots for BioRep Vs Tissue type for GH1-24 [four-panel plot: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order)]

Table A2.1.1.2b: Two-way ANOVA for Gene PoptrGH1-24: dCt (Normalized with UBQ11) versus Bio Rep, and different tissue types in June.

OUTPUT

Source   DF       SS       MS     F      P
Bio Rep   2   4.1268  2.06339  1.53  0.321
Tissue    2   8.1168  4.05841  3.01  0.159
Error     4   5.3849  1.34622
Total     8  17.6285

S = 1.160 R-Sq = 69.45% R-Sq(adj) = 38.91%

Figure A2.1b Residual Plots for BioRep Vs Tissue type for GH1-24 [four-panel plot: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order)]

Table A2.1.1.2c: Two-way ANOVA for Gene PoptrGH1-24: dCt (Normalized with UBQ11) versus Bio Rep, and different tissue types in September.

OUTPUT

Source   DF         SS         MS     F      P
Bio Rep   2  0.0024907  0.0012453  0.13  0.880
Tissue    2  0.0005678  0.0002839  0.03  0.971
Error     4  0.0376727  0.0094182
Total     8  0.0407311

S = 0.09705 R-Sq = 7.51% R-Sq(adj) = 0.00%

Figure A2.1c Residual Plots for BioRep Vs Tissue type for GH1-24 [four-panel plot: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order)]

Table A2.1.1.2d: Two-way ANOVA for Gene PoptrGH1-24: dCt (Normalized with UBQ11) versus Bio Rep, and different times for leaves (Note: missing data points)

OUTPUT

Source   DF         SS         MS     F      P
Bio Rep   2  0.0104416  0.0052208  0.43  0.676
Month     2  0.0158778  0.0079389  0.66  0.566
Error     4  0.0481875  0.0120469
Total     8  0.0745069

S = 0.1098 R-Sq = 35.32% R-Sq(adj) = 0.00%

Figure A2.1d Residual Plots for BioRep Vs Time GH1-24 Leaves [four-panel plot: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order)]

Table A2.1.1.2e: Two-way ANOVA for Gene PoptrGH1-24: dCt (Normalized with UBQ11) versus Bio Rep, and different times for Phloem

OUTPUT

Source   DF       SS       MS     F      P
Bio Rep   2    8.495   4.2473  0.77  0.521
Tissue    2   91.014  45.5070  8.26  0.038
Error     4   22.043   5.5109
Total     8  121.552

S = 2.348 R-Sq = 81.86% R-Sq(adj) = 63.73%

Figure A2.1e Residual Plots for BioRep Vs Time GH1-24 Phloem [four-panel plot: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order)]

Table A2.1.1.2f: Two-way ANOVA for Gene PoptrGH1-24: dCt (Normalized with UBQ11) versus Bio Rep, and different times for Xylem

Source   DF       SS       MS     F      P
Bio Rep   2   40.886   20.443  0.58  0.603
Tissue    2  201.305  100.653  2.84  0.171
Error     4  141.933   35.483
Total     8  384.125

S = 5.957 R-Sq = 63.05% R-Sq(adj) = 26.10%

Figure A2.1f Residual Plots for BioRep Vs Time GH1-24 Xylem [four-panel plot: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order)]

Table A2.1.2.1: The raw data and expression fold change calculations for poplar samples with PoptrGH1-25 normalized with UBQ11

                    PoptrGH1-25                 UBQ 11                      Target - Reference (UBQ11)  Fold increase 2^[-(t-r)]
Month      Bio Rep  Leaves  Phloem  Xylem       Leaves  Phloem  Xylem       Leaves  Phloem  Xylem       Leaves    Phloem    Xylem
April      1        -       27.65   25.86       28.1    28.68   29.32       -       -1.03   -3.46       -         2.042024  11.00433
April      2        -       27.33   26.78       27.23   28.94   30.17       -       -1.61   -3.39       -         3.052518  10.48315
April      3        -       27.34   26.14       28.94   28.2    24.97       -       -0.86   1.17        -         1.815038  0.444421
June       1        34.41   30.91   28.72       26.91   27.19   25.75       7.5     3.72    2.97        0.005524  0.075887  0.127627
June       2        31.4    31.13   27.46       27.67   28.02   26.73       3.73    3.11    0.73        0.075363  0.115824  0.602904
June       3        -       30.11   26.24       24.09   28.48   26.2        -       1.63    0.04        -         0.323088  0.972655
September  1        33.12   35.23   -           29.72   27.76   28.59       3.4     7.47    -           0.094732  0.00564   -
September  2        -       35.81   32.2        29.22   28.15   28.09       -       7.66    4.11        -         0.004944  0.057912
September  3        34.67   32.75   -           28.4    26.99   27.27       6.27    5.76    -           0.012958  0.018453  -

Table A2.1.2.2a: Two-way ANOVA for Gene PoptrGH1-25: dCt (Normalized with UBQ11) versus Bio Rep, and different tissue types in April.

OUTPUT

Source   DF       SS       MS     F      P
Bio Rep   2   27.083  13.5416  1.21  0.387
Tissue    2   83.825  41.9123  3.76  0.121
Error     4   44.638  11.1595
Total     8  155.546

S = 3.341 R-Sq = 71.30% R-Sq(adj) = 42.60%

Figure A2.2a Residual Plots for BioRep Vs Tissue GH1-25 April (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; axes: Percent, Residual, Fitted Value, Frequency, Observation Order)


Table A2.1.2.2b: Two-way ANOVA for Gene PoptrGH1-25: dCt (Normalized with UBQ11) versus Bio Rep, and different tissue types in June.

OUTPUT
Source   DF  SS         MS         F       P
Bio Rep  2   0.197208   0.098604   1.97    0.254
Tissue   2   0.470266   0.235133   4.69    0.089
Error    4   0.200435   0.050109
Total    8   0.867908

S = 0.2238 R-Sq = 76.91% R-Sq(adj) = 53.81%

Figure A2.2b Residual Plots for BioRep Vs Tissue GH1-25 June (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; axes: Percent, Residual, Fitted Value, Frequency, Observation Order)


Table A2.1.2.2c: Two-way ANOVA for Gene PoptrGH1-25: dCt (Normalized with UBQ11) versus Bio Rep, and different tissue types in September.

OUTPUT
Source   DF  SS         MS         F       P
Bio Rep  2   0.0007947  0.0003973  0.23    0.802
Tissue   2   0.0010553  0.0005277  0.31    0.750
Error    4   0.0068333  0.0017083
Total    8   0.0086833

S = 0.04133 R-Sq = 21.31% R-Sq(adj) = 0.00%
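An R-Sq(adj) of 0.00% alongside a positive R-Sq is what results when the adjusted statistic, 1 - [SS(Error)/DF(Error)] / [SS(Total)/DF(Total)], is negative and is displayed as zero. The sketch below evaluates the formula for the Error and Total rows of Table A2.1.2.2c; the flooring step is an assumption about how the reported figure was displayed, and `r_sq_adjusted` is an illustrative helper name.

```python
def r_sq_adjusted(ss_error: float, df_error: int, ss_total: float, df_total: int) -> float:
    """Adjusted R-squared: 1 - (SS_error / df_error) / (SS_total / df_total)."""
    return 1.0 - (ss_error / df_error) / (ss_total / df_total)

# Error and Total rows of Table A2.1.2.2c
raw = r_sq_adjusted(0.0068333, 4, 0.0086833, 8)
reported = max(0.0, raw)  # assumption: negative values are displayed as 0.00%
print(round(100 * raw, 1), round(100 * reported, 2))
```

A negative raw value means the error mean square exceeds the total mean square, that is, the two factors explain less variation than would be expected by chance.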

Figure A2.2c Residual Plots for BioRep Vs Tissue GH1-25 September (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; axes: Percent, Residual, Fitted Value, Frequency, Observation Order)


Table A2.1.2.2d: Two-way ANOVA for Gene PoptrGH1-25: dCt (Normalized with UBQ11) versus Bio Rep, and different times for leaves (Note: many missing data points).

OUTPUT
Source   DF  SS         MS         F       P
Bio Rep  2   0.0013483  0.0006742  0.36    0.717
Month    2   0.0020954  0.0010477  0.56    0.609
Error    4   0.0074572  0.0018643
Total    8   0.0109009

S = 0.04318 R-Sq = 31.59% R-Sq(adj) = 0.00%

Figure A2.2d Residual Plots for BioRep Vs Time GH1-25 Leaves (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; axes: Percent, Residual, Fitted Value, Frequency, Observation Order)


Table A2.1.2.2e: Two-way ANOVA for Gene PoptrGH1-25: dCt (Normalized with UBQ11) versus Bio Rep, and different times for Phloem

OUTPUT
Source   DF  SS         MS         F       P
Bio Rep  2   0.2374     0.11871    0.71    0.543
Tissue   2   9.8301     4.91506    29.52   0.004
Error    4   0.6659     0.16648
Total    8   10.7334

S = 0.4080 R-Sq = 93.80% R-Sq(adj) = 87.59%
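When a factor has 2 degrees of freedom, the upper tail of the F distribution has a closed form, P(F > f) = (1 + 2f/d)^(-d/2), where d is the error degrees of freedom. This allows the P column to be checked without statistical software. The sketch below applies that identity to the mean squares of Table A2.1.2.2e; the helper name `p_value_f2` is illustrative, and the identity holds only for 2 numerator degrees of freedom.

```python
def p_value_f2(f_stat: float, df_error: int) -> float:
    """Upper-tail probability of an F(2, df_error) variate (2 numerator df only)."""
    return (1.0 + 2.0 * f_stat / df_error) ** (-df_error / 2.0)

# Mean squares from Table A2.1.2.2e (error df = 4)
ms_error = 0.16648
p_bio_rep = p_value_f2(0.11871 / ms_error, 4)
p_factor = p_value_f2(4.91506 / ms_error, 4)
print(round(p_bio_rep, 3), round(p_factor, 3))
```

The F ratios are recomputed from the mean squares rather than taken from the rounded F column, because rounding F to two decimals shifts the third decimal of P.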

Figure A2.2e Residual Plots for BioRep Vs Time GH1-25 Phloem (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; axes: Percent, Residual, Fitted Value, Frequency, Observation Order)


Table A2.1.2.2f: Two-way ANOVA for Gene PoptrGH1-25: dCt (Normalized with UBQ11) versus Bio Rep, and different times for Xylem

OUTPUT
Source   DF  SS         MS         F       P
Bio Rep  2   20.999     10.4995    0.84    0.497
Tissue   2   98.931     49.4655    3.94    0.113
Error    4   50.215     12.5538
Total    8   170.145

S = 3.543 R-Sq = 70.49% R-Sq(adj) = 40.97%

Figure A2.2f Residual Plots for BioRep Vs Time GH1-25 Xylem (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; axes: Percent, Residual, Fitted Value, Frequency, Observation Order)


Table A2.1.3.1: The raw data and expression fold change calculations for poplar samples with PoptrGH1-26/27 normalized with UBQ11.

                PoptrGH1-26/27 (Ct)           UBQ11 (Ct)                    Target - Reference (UBQ11)    Fold increase 2^[-(t-r)]
Month      Rep. Leaves    Phloem    Xylem     Leaves    Phloem    Xylem     Leaves    Phloem    Xylem     Leaves     Phloem     Xylem
April      1    36.15     27.05     27.99     28.1      28.68     29.32     8.05      -1.63     -1.33     0.003773   3.09513    2.514027
           2    34.11     27.41     27.6      27.23     28.94     30.17     6.88      -1.53     -2.57     0.00849    2.887858   5.938094
           3    33.7      27.91     26.96     28.94     28.2      24.97     4.76      -0.29     1.99      0.036906   1.22264    0.251739
June       1    34.61     32.39     30.27     26.91     27.19     25.75     7.7       5.2       4.52      0.004809   0.027205   0.043586
           2    32.13     31.53     28.46     27.67     28.02     26.73     4.46      3.51      1.73      0.045437   0.087778   0.301452
           3    30.81     31.38     27.38     24.09     28.48     26.2      6.72      2.9       1.18      0.009486   0.133972   0.441351
September  1    32.67     36.87     37.43     29.72     27.76     28.59     2.95      9.11      8.84      0.129408   0.00181    0.002182
           2    30.84     36.17     32.77     29.22     28.15     28.09     1.62      8.02      4.68      0.325335   0.003852   0.03901
           3    30.41     32.26     36.05     28.4      26.99     27.27     2.01      5.27      8.78      0.248273   0.025916   0.002275


Table A2.1.3.2a: Two-way ANOVA for Gene PoptrGH1-26/27: dCt (Normalized with UBQ11) versus Bio Rep, and different tissue types in April.

OUTPUT
Source   DF  SS         MS         F       P
Bio Rep  2   8.9811     4.49057    1.89    0.265
Tissue   2   14.2626    7.13129    3.00    0.160
Error    4   9.5192     2.37979
Total    8   32.7629

S = 1.543 R-Sq = 70.95% R-Sq(adj) = 41.89%

Figure A2.3a Residual Plots for BioRep Vs Tissue type for GH1-26 (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; axes: Percent, Residual, Fitted Value, Frequency, Observation Order)


Table A2.1.3.2b: Two-way ANOVA for Gene PoptrGH1-26/27: dCt (Normalized with UBQ11) versus Bio Rep, and different tissue types in June.

OUTPUT
Source   DF  SS         MS         F       P
Bio Rep  2   0.045641   0.0228203  2.15    0.233
Tissue   2   0.094741   0.0473707  4.46    0.096
Error    4   0.042510   0.0106275
Total    8   0.182892

S = 0.1031 R-Sq = 76.76% R-Sq(adj) = 53.51%

Figure A2.3b Residual Plots for BioRep Vs Tissue type for GH1-26 (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; axes: Percent, Residual, Fitted Value, Frequency, Observation Order)


Table A2.1.3.2c: Two-way ANOVA for Gene PoptrGH1-26/27: dCt (Normalized with UBQ11) versus Bio Rep, and different tissue types in September.

OUTPUT
Source   DF  SS         MS         F       P
Bio Rep  2   0.009335   0.0046674  1.64    0.303
Tissue   2   0.098442   0.0492209  17.26   0.011
Error    4   0.011410   0.0028524
Total    8   0.119186

S = 0.05341 R-Sq = 90.43% R-Sq(adj) = 80.85%

Figure A2.3c Residual Plots for BioRep Vs Tissue type for GH1-26 (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; axes: Percent, Residual, Fitted Value, Frequency, Observation Order)


Table A2.1.3.2d: Two-way ANOVA for Gene PoptrGH1-26/27: dCt (Normalized with UBQ11) versus Bio Rep, and different times for Leaves

OUTPUT
Source   DF  SS         MS         F       P
Bio Rep  2   0.009991   0.0049953  1.80    0.278
Month    2   0.093494   0.0467469  16.81   0.011
Error    4   0.011125   0.0027813
Total    8   0.114610

S = 0.05274 R-Sq = 90.29% R-Sq(adj) = 80.59%

Figure A2.3d Residual Plots for BioRep Vs Time GH1-26/27 Leaves (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; axes: Percent, Residual, Fitted Value, Frequency, Observation Order)


Table A2.1.3.2e: Two-way ANOVA for Gene PoptrGH1-26/27: dCt (Normalized with UBQ11) versus Bio Rep, and different times for Phloem

OUTPUT
Source   DF  SS         MS         F       P
Bio Rep  2   0.6227     0.31136    0.84    0.498
Tissue   2   11.1011    5.55053    14.89   0.014
Error    4   1.4908     0.37269
Total    8   13.2145

S = 0.6105 R-Sq = 88.72% R-Sq(adj) = 77.44%

Figure A2.3e Residual Plots for BioRep Vs Time GH1-26/27 Phloem (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; axes: Percent, Residual, Fitted Value, Frequency, Observation Order)


Table A2.1.3.2f: Two-way ANOVA for Gene PoptrGH1-26/27: dCt (Normalized with UBQ11) versus Bio Rep, and different times for Xylem

Two-way ANOVA: BioRep Vs Time GH1-26/27 Xylem versus Bio Rep, Tissue

Source   DF  SS         MS         F       P
Bio Rep  2   5.3864     2.69318    0.97    0.453
Tissue   2   15.3601    7.68004    2.77    0.176
Error    4   11.0882    2.77206
Total    8   31.8347

S = 1.665 R-Sq = 65.17% R-Sq(adj) = 30.34%

Figure A2.3f Residual Plots for BioRep Vs Time GH1-26/27 Xylem (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; axes: Percent, Residual, Fitted Value, Frequency, Observation Order)


Table A2.1.4.1: The raw data and expression fold change calculations for poplar samples with PoptrGH1-28 normalized with UBQ11

                PoptrGH1-28 (Ct)              UBQ11 (Ct)                    Target - Reference (UBQ11)    Fold increase 2^[-(t-r)]
Month      Rep. Leaves    Phloem    Xylem     Leaves    Phloem    Xylem     Leaves    Phloem    Xylem     Leaves     Phloem     Xylem
April      1    30.4      29.52     32.97     28.1      28.68     29.32     2.3       0.84      3.65      0.203063   0.558644   0.07966
           2    30.92     30        33.84     27.23     28.94     30.17     3.69      1.06      3.67      0.077482   0.479632   0.078563
           3    29.62     29.1      32.8      28.94     28.2      24.97     0.683333  0.9       7.83      0.622725   0.535887   0.004395
June       1    36.32     29.95     34.13     26.91     27.19     25.75     9.41      2.76      8.38      0.00147    0.147624   0.003002
           2    32.53     32.93     35.28     27.67     28.02     26.73     4.86      4.91      8.55      0.034435   0.033262   0.002668
           3    38.64     30.55     35.15     24.09     28.48     26.2      14.55     2.07      8.95      4.17E-05   0.238159   0.002022
September  1    35.18     32.92     33.82     29.72     27.76     28.59     5.46      5.16      5.23      0.022718   0.02797    0.026645
           2    37.93     30.57     32.21     29.22     28.15     28.09     8.71      2.42      4.12      0.002388   0.186856   0.057512
           3    35.06     34.68     34.78     28.4      26.99     27.27     6.66      7.69      7.51      0.009889   0.004843   0.005486


Table A2.1.4.2a: Two-way ANOVA for Gene PoptrGH1-28: dCt (Normalized with UBQ11) versus Bio Rep, and different tissue types in April.

OUTPUT
Source   DF  SS         MS         F       P
Bio Rep  2   0.047093   0.023546   0.77    0.523
Tissue   2   0.332347   0.166173   5.40    0.073
Error    4   0.122997   0.030749
Total    8   0.502436

S = 0.1754 R-Sq = 75.52% R-Sq(adj) = 51.04%
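The sums of squares in Table A2.1.4.2a appear to be reproducible from the April fold-increase columns of Table A2.1.4.1, treating biological replicate and tissue as the two factors with one observation per cell. The sketch below is a plain-Python version of that decomposition; the function name `two_way_anova` is illustrative, and the code is not the statistics package used for the thesis.

```python
# April fold-increase values from Table A2.1.4.1
# rows: biological replicates 1-3; columns: Leaves, Phloem, Xylem
fold = [
    [0.203063, 0.558644, 0.079660],
    [0.077482, 0.479632, 0.078563],
    [0.622725, 0.535887, 0.004395],
]

def two_way_anova(table):
    """Sums of squares for a two-way layout with one observation per cell."""
    n_rows, n_cols = len(table), len(table[0])
    grand = sum(map(sum, table))
    correction = grand ** 2 / (n_rows * n_cols)
    ss_total = sum(x * x for row in table for x in row) - correction
    ss_rows = sum(sum(row) ** 2 for row in table) / n_cols - correction
    ss_cols = sum(sum(col) ** 2 for col in zip(*table)) / n_rows - correction
    ss_error = ss_total - ss_rows - ss_cols
    return {"Bio Rep": ss_rows, "Tissue": ss_cols, "Error": ss_error, "Total": ss_total}

ss = two_way_anova(fold)
df_tissue = len(fold[0]) - 1
df_error = (len(fold) - 1) * (len(fold[0]) - 1)
f_tissue = (ss["Tissue"] / df_tissue) / (ss["Error"] / df_error)
print({k: round(v, 6) for k, v in ss.items()}, round(f_tissue, 2))
```

With only one observation per cell, the replicate-by-tissue interaction cannot be separated from error, which is why the Error row carries (3 - 1)(3 - 1) = 4 degrees of freedom.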

Figure A2.4a Residual Plots for BioRep Vs Tissue GH1-28 April (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; axes: Percent, Residual, Fitted Value, Frequency, Observation Order)


Table A2.1.4.2b: Two-way ANOVA for Gene PoptrGH1-28: dCt (Normalized with UBQ11) versus Bio Rep, and different tissue types in June.

OUTPUT
Source   DF  SS         MS         F       P
Bio Rep  2   0.0048110  0.0024055  0.56    0.608
Tissue   2   0.0351972  0.0175986  4.13    0.106
Error    4   0.0170329  0.0042582
Total    8   0.0570411

S = 0.06526 R-Sq = 70.14% R-Sq(adj) = 40.28%

Figure A2.4b Residual Plots for BioRep Vs Tissue GH1-28 June (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; axes: Percent, Residual, Fitted Value, Frequency, Observation Order)


Table A2.1.4.2c: Two-way ANOVA for Gene PoptrGH1-28: dCt (Normalized with UBQ11) versus Bio Rep, and different tissue types in September.

OUTPUT
Source   DF  SS         MS         F       P
Bio Rep  2   0.0332438  0.0166219  8.99    0.033
Tissue   2   0.0037570  0.0018785  1.02    0.440
Error    4   0.0073977  0.0018494
Total    8   0.0443986

S = 0.04301 R-Sq = 83.34% R-Sq(adj) = 66.68%

Figure A2.4c Residual Plots for BioRep Vs Tissue GH1-28 September (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; axes: Percent, Residual, Fitted Value, Frequency, Observation Order)


Table A2.1.4.2d: Two-way ANOVA for Gene PoptrGH1-28: dCt (Normalized with UBQ11) versus Bio Rep, and different times for Leaves

OUTPUT
Source   DF  SS         MS         F       P
Bio Rep  2   0.049533   0.0247665  0.87    0.487
Month    2   0.167350   0.0836751  2.92    0.165
Error    4   0.114494   0.0286236
Total    8   0.331378

S = 0.1692 R-Sq = 65.45% R-Sq(adj) = 30.90%

Figure A2.4d Residual Plots for BioRep Vs Time GH1-28 Leaves (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; axes: Percent, Residual, Fitted Value, Frequency, Observation Order)


Table A2.1.4.2e: Two-way ANOVA for Gene PoptrGH1-28: dCt (Normalized with UBQ11) versus Bio Rep, and different times for Phloem

OUTPUT
Source   DF  SS         MS         F       P
Bio Rep  2   0.001050   0.000525   0.05    0.953
Tissue   2   0.356522   0.178261   16.59   0.012
Error    4   0.042981   0.010745
Total    8   0.400553

S = 0.1037 R-Sq = 89.27% R-Sq(adj) = 78.54%

Figure A2.4e Residual Plots for BioRep Vs Time GH1-28 Phloem (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; axes: Percent, Residual, Fitted Value, Frequency, Observation Order)


Table A2.1.4.2f: Two-way ANOVA for Gene PoptrGH1-28: dCt (Normalized with UBQ11) versus Bio Rep, and different times for Xylem

OUTPUT
Source   DF  SS         MS         F       P
Bio Rep  2   0.0029380  0.0014690  2.73    0.179
Tissue   2   0.0040048  0.0020024  3.72    0.122
Error    4   0.0021538  0.0005385
Total    8   0.0090967

S = 0.02320 R-Sq = 76.32% R-Sq(adj) = 52.65%

Figure A2.4f Residual Plots for BioRep Vs Time GH1-28 Xylem (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; axes: Percent, Residual, Fitted Value, Frequency, Observation Order)


Table A2.2.1.1: The raw data and expression fold change calculations for poplar samples with PoptrGH1-24 normalized with PoptrGH1-34.

                PoptrGH1-24 (Ct)              GH1-34 (Ct)                   Target - Reference (GH1-34)   Fold increase 2^[-(t-r)]
Month      Rep. Leaves    Phloem    Xylem     Leaves    Phloem    Xylem     Leaves    Phloem    Xylem     Leaves     Phloem     Xylem
April      1    _         25.6      25.49     27.3      25.1      25.82     _         0.5       -0.33     _          0.707107   1.257013
           2    _         25.58     25.97     27.14     24.88     27.39     _         0.7       -1.42     _          0.615572   2.675855
           3    _         26.71     25.78     25.69     24.68     26.68     _         2.03      -0.9      _          0.244855   1.866066
June       1    32.94     28.17     27.26     26.62     24.69     26.39     6.32      3.48      0.87      0.0125167  0.089622   0.547147
           2    29.44     28.19     25.6      26.12     24.71     26.56     3.32      3.48      -0.96     0.1001337  0.089622   1.94531
           3    -         28.19     23.99     24.08     24.29     25.59     -         3.9       -1.6      -          0.066986   3.031433
September  1    33.19     34.51     35.27     25.9      25.22     26.48     7.29      9.29      8.79      0.0063899  0.001597   0.002259
           2    -         33.84     30.51     24.89     23.86     24.77     -         9.98      5.74      -          0.00099    0.018711
           3    32.56     29.62     34.98     26.09     25.52     25.02     6.47      4.1       9.96      0.0112807  0.058315   0.001004


Table A2.2.1.2a: Two-way ANOVA for Gene PoptrGH1-24: dCt (Normalized with GH1-34) versus Bio Rep, and different tissue types in April.

OUTPUT
Source   DF  SS         MS         F       P
Bio Rep  2   0.35299    0.17649    0.90    0.474
Tissue   2   5.99884    2.99942    15.38   0.013
Error    4   0.78011    0.19503
Total    8   7.13194

S = 0.4416 R-Sq = 89.06% R-Sq(adj) = 78.12%

Figure A2.5a Residual Plots for BioRep Vs Tissue type for GH1-24 (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; axes: Percent, Residual, Fitted Value, Frequency, Observation Order)

NOTE: All three replicates in leaves were missing data points (expression was below detection limit), which were substituted with value of 0 for the ANOVA test
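The substitution described in the note can be sketched as follows: each missing-data marker is replaced by 0 before the sums of squares are formed. The example uses the April fold-increase cells of Table A2.2.1.1; with zeros in place of the three leaf values, the total sum of squares agrees with the Total row of Table A2.2.1.2a. The marker set and the helper name `to_value` are assumptions made for illustration.

```python
MISSING = {"_", "-", ""}  # markers used for "no data" in the raw-data tables

def to_value(cell: str) -> float:
    """Parse a table cell, substituting 0 for a missing-data marker."""
    cell = cell.strip()
    return 0.0 if cell in MISSING else float(cell)

# April fold-increase cells from Table A2.2.1.1 (Leaves, Phloem, Xylem per replicate)
rows = [
    ["_", "0.707107", "1.257013"],
    ["_", "0.615572", "2.675855"],
    ["_", "0.244855", "1.866066"],
]
values = [[to_value(c) for c in row] for row in rows]

flat = [v for row in values for v in row]
mean = sum(flat) / len(flat)
ss_total = sum((v - mean) ** 2 for v in flat)
print(round(ss_total, 5))
```

Because the substituted zeros enter the tissue means, part of the Tissue sum of squares in such a table reflects the contrast between "not detected" and detected tissues; Table A2.2.2.2b shows the alternative of omitting the tissue altogether.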


Table A2.2.1.2b: Two-way ANOVA for Gene PoptrGH1-24: dCt (Normalized with GH1-34) versus Bio Rep, and different tissue types in June.

*NOTE: Missing data point was substituted with a value of 0 for the ANOVA analysis.

OUTPUT
Source   DF  SS         MS         F       P
Bio Rep  2   1.01487    0.50744    0.97    0.454
Tissue   2   6.35034    3.17517    6.07    0.061
Error    4   2.09349    0.52337
Total    8   9.45870

S = 0.7234 R-Sq = 77.87% R-Sq(adj) = 55.73%

Figure A2.5b Residual Plots for BioRep Vs Tissue type for GH1-24 (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; axes: Percent, Residual, Fitted Value, Frequency, Observation Order)


Table A2.2.1.2c: Two-way ANOVA for Gene PoptrGH1-24: dCt (Normalized with GH1-34) versus Bio Rep, and different tissue types in September.

OUTPUT
Source   DF  SS         MS         F       P
Bio Rep  2   0.0005006  0.0002503  0.53    0.624
Tissue   2   0.0004818  0.0002409  0.51    0.634
Error    4   0.0018853  0.0004713
Total    8   0.0028677

S = 0.02171 R-Sq = 34.26% R-Sq(adj) = 0.00%

NOTE: Expression in the Leaves Bio Rep 2 sample is a missing data point (was below detection limit ) and was substituted with a value of 0.

Figure A2.5c Residual Plots for BioRep Vs Tissue type for GH1-24 (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; axes: Percent, Residual, Fitted Value, Frequency, Observation Order)


Table A2.2.1.2d: Two-way ANOVA for Gene PoptrGH1-24: dCt (Normalized with GH1-34) versus Bio Rep, and different times for Phloem

OUTPUT
Source   DF  SS         MS         F       P
Bio Rep  2   0.033860   0.016930   0.77    0.523
Tissue   2   0.450014   0.225007   10.17   0.027
Error    4   0.088479   0.022120
Total    8   0.572353

S = 0.1487 R-Sq = 84.54% R-Sq(adj) = 69.08%

Figure A2.5d Residual Plots for BioRep Vs Time GH1-24 Phloem (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; axes: Percent, Residual, Fitted Value, Frequency, Observation Order)


Table A2.2.1.2e: Two-way ANOVA for Gene PoptrGH1-24: dCt (Normalized with GH1-34) versus Bio Rep, and different times for Xylem

OUTPUT
Source   DF  SS         MS         F       P
Bio Rep  2   1.9618     0.98091    1.82    0.274
Tissue   2   7.0800     3.54000    6.57    0.054
Error    4   2.1537     0.53843
Total    8   11.1955

S = 0.7338 R-Sq = 80.76% R-Sq(adj) = 61.53%

Figure A2.5e Residual Plots for BioRep Vs Time GH1-24 Xylem (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; axes: Percent, Residual, Fitted Value, Frequency, Observation Order)


Table A2.2.2.1: The raw data and fold change calculations for poplar samples with PoptrGH1-25 normalized with PoptrGH1-34.

                PoptrGH1-25 (Ct)              GH1-34 (Ct)                   Target - Reference (GH1-34)   Fold increase 2^[-(t-r)]
Month      Rep. Leaves    Phloem    Xylem     Leaves    Phloem    Xylem     Leaves    Phloem    Xylem     Leaves     Phloem     Xylem
April      1    _         27.65     25.86     27.3      25.1      25.82     _         2.55      0.04      _          0.170755   0.972655
           2    _         27.33     26.78     27.14     24.88     27.39     _         2.45      -0.61     _          0.183011   1.526259
           3    _         27.34     26.14     25.69     24.68     26.68     _         2.66      -0.54     _          0.15822    1.453973
June       1    34.41     30.91     28.72     26.62     24.69     26.39     7.79      6.22      2.33      0.0045183  0.013415   0.198884
           2    31.4      31.13     27.46     26.12     24.71     26.56     5.28      6.42      0.9       0.0257372  0.011679   0.535887
           3    -         30.11     26.24     24.08     24.29     25.59     -         5.82      0.65      -          0.017701   0.63728
September  1    33.12     35.23     -         25.9      25.22     26.48     7.22      10.01     -         0.0067075  0.00097    -
           2    -         35.81     32.2      24.89     23.86     24.77     -         11.95     7.43      -          0.000253   0.005799
           3    34.67     32.75     -         26.09     25.52     25.02     8.58      7.23      -         0.0026131  0.006661   -


Table A2.2.2.2a: Two-way ANOVA for Gene PoptrGH1-25: dCt (Normalized with GH1-34) versus Bio Rep, and different tissue types in April.

“No data” in leaves were substituted with 0.

OUTPUT
Source   DF  SS         MS         F       P
Bio Rep  2   0.06104    0.03052    1.01    0.440
Tissue   2   3.08080    1.54040    51.18   0.001
Error    4   0.12039    0.03010
Total    8   3.26224

S = 0.1735 R-Sq = 96.31% R-Sq(adj) = 92.62%

Figure A2.6a Residual Plots for BioRep Vs Tissue GH1-25 April (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; axes: Percent, Residual, Fitted Value, Frequency, Observation Order)


Table A2.2.2.2b: Two-way ANOVA for Gene PoptrGH1-25: dCt (Normalized with GH1-34) versus Bio Rep, and different tissue types in April.

ANOVA done without data points from the leaves.

OUTPUT
Source   DF  SS         MS         F       P
Bio Rep  2   0.09156    0.04578    1.02    0.495
Tissue   1   1.97330    1.97330    43.92   0.022
Error    2   0.08987    0.04493
Total    5   2.15473

S = 0.2120 R-Sq = 95.83% R-Sq(adj) = 89.57%
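Omitting the leaves changes the degrees of freedom of the analysis: with three biological replicates and two tissues there are six observations, so the Tissue, Error and Total rows carry 1, 2 and 5 degrees of freedom instead of 2, 4 and 8. A short sketch of the standard bookkeeping for a two-way layout with one observation per cell; the function name `anova_df` is illustrative only:

```python
def anova_df(n_reps: int, n_levels: int) -> dict:
    """Degrees of freedom for a two-way layout with one observation per cell."""
    return {
        "Bio Rep": n_reps - 1,
        "Tissue": n_levels - 1,
        "Error": (n_reps - 1) * (n_levels - 1),
        "Total": n_reps * n_levels - 1,
    }

print(anova_df(3, 3))  # leaves, phloem and xylem
print(anova_df(3, 2))  # leaves omitted
```

With only 2 error degrees of freedom, a given F ratio corresponds to a larger P value than it would with 4, so the two versions of the April analysis are not directly comparable on F alone.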

Figure A2.6b Residual Plots for BioRep Vs Tissue GH1-25 April (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; axes: Percent, Residual, Fitted Value, Frequency, Observation Order)


Table A2.2.2.2c: Two-way ANOVA for Gene PoptrGH1-25: dCt (Normalized with GH1-34) versus Bio Rep, and different tissue types in June.

OUTPUT
Source   DF  SS         MS         F       P
Bio Rep  2   0.036193   0.018097   1.04    0.433
Tissue   2   0.396388   0.198194   11.40   0.022
Error    4   0.069551   0.017388
Total    8   0.502133

S = 0.1319 R-Sq = 86.15% R-Sq(adj) = 72.30%

NOTE: Leaves 3 taken as 0.

Figure A2.6c Residual Plots for BioRep Vs Tissue GH1-25 June (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; axes: Percent, Residual, Fitted Value, Frequency, Observation Order)


Table A2.2.2.2d: Two-way ANOVA for Gene PoptrGH1-25: dCt (Normalized with GH1-34) versus Bio Rep, and different tissue types in September.

NOTE: Leaves 1 and Xylem 1 and 3 substituted with 0.

OUTPUT
Source   DF  SS         MS         F       P
Bio Rep  2   0.0000017  0.0000009  0.05    0.951
Tissue   2   0.0000021  0.0000010  0.06    0.941
Error    4   0.0000682  0.0000171
Total    8   0.0000720

S = 0.004129 R-Sq = 5.31% R-Sq(adj) = 0.00%

Figure A2.6d Residual Plots for BioRep Vs Tissue GH1-25 September (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; axes: Percent, Residual, Fitted Value, Frequency, Observation Order)

Two-way ANOVA for Gene PoptrGH1-25: dCt (Normalized with GH1-34) versus Bio Rep, and different times for Leaves


Table A2.2.2.2e: Two-way ANOVA for Gene PoptrGH1-25: dCt (Normalized with GH1-34) versus Bio Rep, and different times for Phloem

OUTPUT
Source   DF  SS         MS         F       P
Bio Rep  2   0.0000284  0.0000142  0.18    0.845
Tissue   2   0.0528308  0.0264154  327.31  0.000
Error    4   0.0003228  0.0000807
Total    8   0.0531819

S = 0.008984 R-Sq = 99.39% R-Sq(adj) = 98.79%

Figure A2.6e Residual Plots for BioRep Vs Time GH1-25 Phloem (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; axes: Percent, Residual, Fitted Value, Frequency, Observation Order)


Table A2.2.2.2f: Two-way ANOVA for Gene PoptrGH1-25: dCt (Normalized with GH1-34) versus Bio Rep, and different times for Xylem

OUTPUT
Source   DF  SS         MS         F       P
Bio Rep  2   0.18333    0.09166    3.55    0.130
Tissue   2   2.67854    1.33927    51.93   0.001
Error    4   0.10316    0.02579
Total    8   2.96503

S = 0.1606 R-Sq = 96.52% R-Sq(adj) = 93.04%

Figure A2.6f Residual Plots for BioRep Vs Time GH1-25 Xylem (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; axes: Percent, Residual, Fitted Value, Frequency, Observation Order)


Table A2.2.3.1: The raw data and expression fold change calculations for poplar samples for PoptrGH1-26/27 normalized with PoptrGH1-34.

                PoptrGH1-26/27 (Ct)           GH1-34 (Ct)                   Target - Reference (GH1-34)   Fold increase 2^[-(t-r)]
Month      Rep. Leaves    Phloem    Xylem     Leaves    Phloem    Xylem     Leaves    Phloem    Xylem     Leaves     Phloem     Xylem
April      1    36.15     27.05     27.99     27.3      25.1      25.82     8.85      1.95      2.17      0.002167   0.258816   0.222211
           2    34.11     27.41     27.6      27.14     24.88     27.39     6.97      2.53      0.21      0.007977   0.173139   0.864537
           3    33.7      27.91     26.96     25.69     24.68     26.68     8.013333  3.23      0.28      0.00387    0.106579   0.823591
June       1    34.61     32.39     30.27     26.62     24.69     26.39     7.99      7.7       3.88      0.003933   0.004809   0.067921
           2    32.13     31.53     28.46     26.12     24.71     26.56     6.01      6.82      1.9       0.015517   0.008851   0.267943
           3    30.81     31.38     27.38     24.08     24.29     25.59     6.73      7.09      1.79      0.00942    0.00734    0.289172
September  1    32.67     36.87     37.43     25.9      25.22     26.48     6.77      11.65     10.95     0.009163   0.000311   0.000506
           2    30.84     36.17     32.77     24.89     23.86     24.77     5.95      12.31     8         0.016176   0.000197   0.003906
           3    30.41     32.26     36.05     26.09     25.52     25.02     4.32      6.74      11.03     0.050067   0.009355   0.000478


Table A2.2.3.2a: Two-way ANOVA for Gene PoptrGH1-26/27: dCt (Normalized with GH1-34) versus Bio Rep, and different tissue types in April.

OUTPUT
Source   DF  SS         MS         F       P
Bio Rep  2   0.059120   0.029560   0.56    0.610
Tissue   2   0.639224   0.319612   6.05    0.062
Error    4   0.211186   0.052797
Total    8   0.909530

S = 0.2298 R-Sq = 76.78% R-Sq(adj) = 53.56%

Figure A2.7a Residual Plots for BioRep Vs Tissue GH1-26/27 Apr (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; axes: Percent, Residual, Fitted Value, Frequency, Observation Order)


Table A2.2.3.2b: Two-way ANOVA for Gene PoptrGH1-26/27: dCt (Normalized with GH1-34) versus Bio Rep, and different tissue types in June.

OUTPUT
Source   DF  SS         MS         F       P
Bio Rep  2   0.0000593  0.0000297  1.66    0.298
Tissue   2   0.0000608  0.0000304  1.70    0.292
Error    4   0.0000714  0.0000178
Total    8   0.0001915

S = 0.004224 R-Sq = 62.73% R-Sq(adj) = 25.47%

Figure A2.7b Residual Plots for BioRep Vs Tissue GH1-26/27 June (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; axes: Percent, Residual, Fitted Value, Frequency, Observation Order)


Table A2.2.3.2c: Two-way ANOVA for Gene PoptrGH1-26/27: dCt (Normalized with GH1-34) versus Bio Rep, and different tissue types in September.

OUTPUT
Source   DF  SS         MS         F       P
Bio Rep  2   0.154723   0.077361   2.31    0.215
Tissue   2   0.629702   0.314851   9.42    0.031
Error    4   0.133728   0.033432
Total    8   0.918153

S = 0.1828 R-Sq = 85.44% R-Sq(adj) = 70.87%

Figure A2.7c Residual Plots for Tissue Vs BioRep GH1-26/27 Sept (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; axes: Percent, Residual, Fitted Value, Frequency, Observation Order)


Table A2.2.3.2d: Two-way ANOVA for Gene PoptrGH1-26/27: dCt (Normalized with GH1-34) versus Bio Rep, and different times for Leaves

OUTPUT
Source   DF  SS         MS         F       P
Bio Rep  2   0.0003855  0.0001928  1.17    0.397
Month    2   0.0006839  0.0003420  2.08    0.240
Error    4   0.0006564  0.0001641
Total    8   0.0017259

S = 0.01281 R-Sq = 61.97% R-Sq(adj) = 23.93%

Figure A2.7d Residual Plots for BioRep Vs Time GH1-26/27 Leaves (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; axes: Percent, Residual, Fitted Value, Frequency, Observation Order)


Table A2.2.3.2e: Two-way ANOVA for Gene PoptrGH1-26/27: dCt (Normalized with GH1-34) versus Bio Rep, and different times for Phloem

OUTPUT
Source   DF  SS         MS         F       P
Bio Rep  2   0.0033266  0.0016633  0.79    0.513
Tissue   2   0.0608288  0.0304144  14.51   0.015
Error    4   0.0083859  0.0020965
Total    8   0.0725413

S = 0.04579 R-Sq = 88.44% R-Sq(adj) = 76.88%

Figure A2.7e Residual Plots for BioRep Vs Time GH1-26/27 Phloem (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; axes: Percent, Residual, Fitted Value, Frequency, Observation Order)


Table A2.2.3.2f: Two-way ANOVA for Gene PoptrGH1-26/27: dCt (Normalized with GH1-34) versus Bio Rep, and different times for Xylem

OUTPUT
Source   DF  SS         MS         F       P
Bio Rep  2   0.154723   0.077361   2.31    0.215
Tissue   2   0.629702   0.314851   9.42    0.031
Error    4   0.133728   0.033432
Total    8   0.918153

S = 0.1828 R-Sq = 85.44% R-Sq(adj) = 70.87%

Figure A2.7f Residual Plots for BioRep Vs Time GH1-26/27 Xylem (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; axes: Percent, Residual, Fitted Value, Frequency, Observation Order)


Table A2.2.4.1: The raw data and expression fold change calculations for poplar samples with PoptrGH1-28 normalized with PoptrGH1-34.

                    PoptrGH1-28                GH1-34                     Target - Reference (GH1-34)   Fold increase 2^[-(t-r)]
Month      Bio Rep  Leaves   Phloem   Xylem    Leaves   Phloem   Xylem    Leaves    Phloem    Xylem     Leaves    Phloem    Xylem
April      1        30.4     29.52    32.97    27.3     25.1     25.82    3.1       4.42      7.15      0.116629  0.046714  0.007041
April      2        30.92    30       33.84    27.14    24.88    27.39    3.78      5.12      6.45      0.072796  0.028756  0.011438
April      3        29.62    29.1     32.8     25.69    24.68    26.68    3.936667  4.42      6.12      0.065305  0.046714  0.014378
June       1        36.32    29.95    34.13    26.62    24.69    26.39    9.7       5.26      7.74      0.001202  0.026096  0.004678
June       2        32.53    32.93    35.28    26.12    24.71    26.56    6.41      8.22      8.72      0.01176   0.003354  0.002371
June       3        38.64    30.55    35.15    24.08    24.29    25.59    14.56     6.26      9.56      4.14E-05  0.013048  0.001325
September  1        35.18    32.92    33.82    25.9     25.22    26.48    9.28      7.7       7.34      0.001609  0.004809  0.006172
September  2        37.93    30.57    32.21    24.89    23.86    24.77    13.04     6.71      7.44      0.000119  0.009552  0.005759
September  3        35.06    34.68    34.78    26.09    25.52    25.02    8.97      9.16      9.76      0.001994  0.001748  0.001153
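The Target - Reference and fold-increase columns of Table A2.2.4.1 follow from the raw Ct values through the formula given in the column header, 2^[-(t-r)]. The short sketch below works through that arithmetic for April, biological replicate 1. The function and variable names are illustrative assumptions and do not come from the thesis.

```python
# Relative expression from qRT-PCR Ct values, following the column header
# "Fold increase 2^[-(t-r)]": t = target Ct, r = reference Ct.
# Function and variable names are hypothetical, chosen for illustration.

def delta_ct(target_ct, reference_ct):
    """Target minus reference Ct (the 'Target - Reference' column)."""
    return target_ct - reference_ct


def fold_increase(target_ct, reference_ct):
    """Expression of the target relative to the reference gene: 2^-(t - r)."""
    return 2.0 ** (-delta_ct(target_ct, reference_ct))


# April, biological replicate 1: (PoptrGH1-28 Ct, GH1-34 Ct) for each tissue.
april_rep1 = {
    "Leaves": (30.4, 27.3),
    "Phloem": (29.52, 25.1),
    "Xylem": (32.97, 25.82),
}

results = {
    tissue: fold_increase(target, reference)
    for tissue, (target, reference) in april_rep1.items()
}

for tissue, value in results.items():
    target, reference = april_rep1[tissue]
    print(f"{tissue}: dCt = {delta_ct(target, reference):.2f}, fold = {value:.6f}")
```

Rounded to six decimals, the three results should agree with the first row of the fold-increase columns (0.116629, 0.046714, 0.007041).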


Table A2.2.4.2a: Two-way ANOVA for Gene PoptrGH1-28: dCt (Normalized with GH1-34) versus Bio Rep, and different tissue types in April.

OUTPUT

Source   DF  SS         MS         F      P
Bio Rep  2   0.0006010  0.0003005  1.02   0.439
Tissue   2   0.0083084  0.0041542  14.10  0.015
Error    4   0.0011785  0.0002946
Total    8   0.0100878

S = 0.01716 R-Sq = 88.32% R-Sq(adj) = 76.63%
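The ANOVA output for Table A2.2.4.2a can be retraced by hand as a two-way ANOVA without replication (one observation per Bio Rep by Tissue cell). The sketch below assumes that the response is the April fold-increase column of Table A2.2.4.1 rather than the dCt values; under that assumption the sums of squares appear to match the output to the printed precision. The P-value helper uses a closed form that holds only when the numerator has 2 degrees of freedom, and all names are illustrative.

```python
# Two-way ANOVA without replication, written with the standard library only.
# Assumption: the response is the April fold-increase value for PoptrGH1-28.
# Rows = biological replicates 1-3; columns = Leaves, Phloem, Xylem.
data = [
    [0.116629, 0.046714, 0.007041],
    [0.072796, 0.028756, 0.011438],
    [0.065305, 0.046714, 0.014378],
]


def two_way_anova(table):
    """Return sums of squares, degrees of freedom and F ratios."""
    n_rows, n_cols = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in table]
    col_means = [sum(row[j] for row in table) / n_rows for j in range(n_cols)]

    ss_rows = n_cols * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n_rows * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    df_rows, df_cols = n_rows - 1, n_cols - 1
    df_error = df_rows * df_cols
    ms_error = ss_error / df_error
    return {
        "ss_rows": ss_rows, "ss_cols": ss_cols,
        "ss_error": ss_error, "ss_total": ss_total,
        "df_error": df_error,
        "f_rows": (ss_rows / df_rows) / ms_error,
        "f_cols": (ss_cols / df_cols) / ms_error,
    }


def p_value_f2(f_stat, df_error):
    """Upper-tail P for an F statistic with exactly 2 numerator df."""
    return (1.0 + 2.0 * f_stat / df_error) ** (-df_error / 2.0)


res = two_way_anova(data)
p_rep = p_value_f2(res["f_rows"], res["df_error"])
p_tissue = p_value_f2(res["f_cols"], res["df_error"])

print(f"Bio Rep: SS = {res['ss_rows']:.7f}, F = {res['f_rows']:.2f}, P = {p_rep:.3f}")
print(f"Tissue:  SS = {res['ss_cols']:.7f}, F = {res['f_cols']:.2f}, P = {p_tissue:.3f}")
print(f"Error:   SS = {res['ss_error']:.7f}")
print(f"Total:   SS = {res['ss_total']:.7f}")
```

The closed-form P value avoids a dependency on a statistics library; for designs with a different number of factor levels, a general F-distribution routine would be needed instead.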

Figure A2.8a Residual Plots for BioRep Vs Tissue GH1-28 April. Panels: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order).


Table A2.2.4.2b: Two-way ANOVA for Gene PoptrGH1-28: dCt (Normalized with GH1-34) versus Bio Rep, and different tissue types in June.

OUTPUT

Source   DF  SS         MS         F      P
Bio Rep  2   0.0000587  0.0000293  0.40   0.693
Tissue   2   0.0002284  0.0001142  1.57   0.314
Error    4   0.0002911  0.0000728
Total    8   0.0005782

S = 0.008531 R-Sq = 49.65% R-Sq(adj) = 0.00%
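The summary line under each ANOVA table (S, R-Sq, R-Sq(adj)) can be recomputed from the Error and Total rows. The sketch below does this for Table A2.2.4.2b using the standard definitions. The unclamped adjusted value comes out slightly negative here, which is consistent with the reported 0.00% if the software prints negative adjusted values as zero; that reporting behaviour is an assumption, as are the function and variable names.

```python
import math

# Model summary statistics from the Error and Total rows of an ANOVA table.
# Standard definitions: S = sqrt(MS_error), R-Sq = 1 - SS_error / SS_total,
# R-Sq(adj) = 1 - MS_error / (SS_total / DF_total).


def fit_summary(ss_error, ss_total, df_error, df_total):
    """Return S, R-Sq, raw R-Sq(adj), and R-Sq(adj) floored at zero."""
    ms_error = ss_error / df_error
    s = math.sqrt(ms_error)
    r_sq = 1.0 - ss_error / ss_total
    r_sq_adj_raw = 1.0 - ms_error / (ss_total / df_total)
    # Assumption: a negative adjusted value is reported as 0.00%.
    r_sq_adj = max(0.0, r_sq_adj_raw)
    return s, r_sq, r_sq_adj_raw, r_sq_adj


# Error and Total rows of Table A2.2.4.2b (PoptrGH1-28, tissue types in June).
s, r_sq, adj_raw, adj = fit_summary(0.0002911, 0.0005782, 4, 8)

print(f"S = {s:.6f}")
print(f"R-Sq = {100 * r_sq:.2f}%")
print(f"R-Sq(adj), unclamped = {100 * adj_raw:.2f}%")
print(f"R-Sq(adj), reported  = {100 * adj:.2f}%")
```

A negative unclamped value arises whenever the error mean square exceeds the total mean square, that is, when the two factors together explain less variation than would be expected by chance for their degrees of freedom.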

Figure A2.8b Residual Plots for BioRep Vs Tissue GH1-28 June. Panels: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order).


Table A2.2.4.2c: Two-way ANOVA for Gene PoptrGH1-28: dCt (Normalized with GH1-34) versus Bio Rep, and different tissue types in September.

OUTPUT

Source   DF  SS         MS         F      P
Bio Rep  2   0.0000198  0.0000099  1.38   0.349
Tissue   2   0.0000278  0.0000139  1.94   0.257
Error    4   0.0000286  0.0000072
Total    8   0.0000762

S = 0.002674 R-Sq = 62.47% R-Sq(adj) = 24.94%

Figure A2.8c Residual Plots for BioRep Vs Tissue GH1-28 September. Panels: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order).


Table A2.2.4.2d: Two-way ANOVA for Gene PoptrGH1-28: dCt (Normalized with GH1-34) versus Bio Rep, and different times for Leaves

OUTPUT

Source   DF  SS         MS         F      P
Bio Rep  2   0.0004693  0.0002346  0.81   0.505
Month    2   0.0135026  0.0067513  23.42  0.006
Error    4   0.0011533  0.0002883
Total    8   0.0151251

S = 0.01698 R-Sq = 92.38% R-Sq(adj) = 84.75%

Figure A2.8d Residual Plots for BioRep Vs Time GH1-28 Leaves. Panels: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order).


Table A2.2.4.2e: Two-way ANOVA for Gene PoptrGH1-28: dCt (Normalized with GH1-34) versus Bio Rep, and different times for Phloem

OUTPUT

Source   DF  SS         MS         F      P
Bio Rep  2   0.0002163  0.0001081  1.49   0.328
Month    2   0.0020331  0.0010166  14.02  0.016
Error    4   0.0002901  0.0000725
Total    8   0.0025395

S = 0.008517 R-Sq = 88.58% R-Sq(adj) = 77.15%

Figure A2.8e Residual Plots for BioRep Vs Time GH1-28 Phloem. Panels: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order).


Table A2.2.4.2f: Two-way ANOVA for Gene PoptrGH1-28: dCt (Normalized with GH1-34) versus Bio Rep, and different times for Xylem

OUTPUT

Source   DF  SS         MS         F      P
Bio Rep  2   0.0000012  0.0000006  0.05   0.949
Month    2   0.0001125  0.0000563  4.74   0.088
Error    4   0.0000474  0.0000119
Total    8   0.0001612

S = 0.003443 R-Sq = 70.58% R-Sq(adj) = 41.15%

Figure A2.8f Residual Plots for BioRep Vs Time GH1-28 Xylem. Panels: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order).


PART B

TABLES OF RAW DATA AND EXPRESSION FOLD CHANGES FROM qRT-PCR ASSAYS, WITH TABLES AND FIGURES FOR ANOVA OUTPUT FOR ANALYSIS OF qRT-PCR DATA, FOR PINE SAMPLES

Table A2.3.1.1: The raw data and expression fold change calculations for pine samples for CBG normalized with Act2.

                    CBG                              Act2                          Target - Reference (Act2)     Fold increase 2^[-(t-r)]
Month      Bio Rep  Needles    Phloem     Xylem      Needles   Phloem    Xylem     Needles   Phloem    Xylem     Needles  Phloem   Xylem
May        1        33.73      29.67      29.74      20.78     19.99     20.67     12.9500   9.68      9.07      0.00013  0.00122  0.00186
May        2(3)     33.51      31.12      30.42      23.35     21.01     24.23     10.1600   10.11     6.19      0.00087  0.00090  0.01370
May        3        27.25      32.57      29.85      21.65     23.12     22.76     5.60      9.45      7.09      0.02062  0.00143  0.00734
June       1        30.87      30.15      29.24      22.16     21.96     22.34     8.71      8.19      6.9       0.00239  0.00342  0.00837
June       2(4)     33.34      30.12      26.85      22.4      21.57     22.23     10.9400   8.55      4.62      0.00051  0.00267  0.04067
June       3        33.79      32.31      30.113333  27.78333  25.87333  25.62667  6.00667   6.43667   4.4867    0.01555  0.01154  0.04460
September  1        32.6       30.83      29.43      22.68     21.04     21.67     9.92      9.79      7.76      0.00103  0.00113  0.00461
September  2        33.84      32.27      32.36      25.92     27.04     28.36     7.92      5.23      4.        0.00413  0.02664  0.06250
September  3        35.6       29.42      30.55      26.58     24.46     26.02     9.02      4.96      4.53      0.00193  0.03213  0.04328


Table A2.3.1.2a: Two-way ANOVA for Gene CBG: dCt (Normalized with Act2) versus Bio Rep, and different tissue types in May.

Two-way ANOVA: BioRep Vs Tissue for CBG in May versus Bio Rep, Tissue

OUTPUT

Source   DF  SS         MS         F      P
Bio Rep  2   0.0001144  0.0000572  1.01   0.441
Tissue   2   0.0000780  0.0000390  0.69   0.553
Error    4   0.0002260  0.0000565
Total    8   0.0004184

S = 0.007517 R-Sq = 45.98% R-Sq(adj) = 0.00%

Figure A2.9a Residual Plots for BioRep Vs Tissue for CBG in May. Panels: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order).


Table A2.3.1.2b: Two-way ANOVA for Gene CBG: dCt (Normalized with Act2) versus Bio Rep, and different tissue types in June.

OUTPUT

Source   DF  SS         MS         F      P
Bio Rep  2   0.0005515  0.0002758  2.62   0.188
Tissue   2   0.0012702  0.0006351  6.03   0.062
Error    4   0.0004216  0.0001054
Total    8   0.0022434

S = 0.01027 R-Sq = 81.21% R-Sq(adj) = 62.41%

Figure A2.9b Residual Plots for BioRep Vs Tissue for CBG in June. Panels: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order).


Table A2.3.1.2c: Two-way ANOVA for Gene CBG: dCt (Normalized with Act2) versus Bio Rep, and different tissue types in September.

OUTPUT

Source   DF  SS         MS         F      P
Bio Rep  2   0.0014128  0.0007064  3.22   0.147
Tissue   2   0.0017791  0.0008896  4.05   0.109
Error    4   0.0008781  0.0002195
Total    8   0.0040701

S = 0.01482 R-Sq = 78.42% R-Sq(adj) = 56.85%

Figure A2.9c Residual Plots for BioRep Vs Tissue for CBG in September. Panels: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order).


Table A2.3.1.2d: Two-way ANOVA for Gene CBG: dCt (Normalized with Act2) versus Bio Rep, and different times for Needles

OUTPUT

Source   DF  SS         MS         F      P
Bio Rep  2   0.0002510  0.0001255  3.17   0.150
Month    2   0.0000389  0.0000195  0.49   0.645
Error    4   0.0001585  0.0000396
Total    8   0.0004485

S = 0.006295 R-Sq = 64.65% R-Sq(adj) = 29.31%

Figure A2.9d Residual Plots for BioRep Vs Time CBG in Needles. Panels: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order).


Table A2.3.1.2e: Two-way ANOVA for Gene CBG: dCt (Normalized with Act2) versus Bio Rep, and different times for Phloem

OUTPUT

Source   DF  SS         MS         F      P
Bio Rep  2   0.0002629  0.0001314  1.58   0.312
Month    2   0.0005733  0.0002867  3.44   0.135
Error    4   0.0003330  0.0000833
Total    8   0.0011693

S = 0.009125 R-Sq = 71.52% R-Sq(adj) = 43.03%

Figure A2.9e Residual Plots for BioRep Vs Time CBG in Phloem. Panels: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order).


Table A2.3.1.2f: Two-way ANOVA for Gene CBG: dCt (Normalized with Act2) versus Bio Rep, and different times for Xylem

OUTPUT

Source   DF  SS         MS         F      P
Bio Rep  2   0.0019263  0.0009632  5.73   0.067
Month    2   0.0014380  0.0007190  4.27   0.102
Error    4   0.0006728  0.0001682
Total    8   0.0040371

S = 0.01297 R-Sq = 83.34% R-Sq(adj) = 66.67%

Figure A2.9f Residual Plots for BioRep Vs Time CBG in Xylem. Panels: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order).


Table A2.3.2.1: The raw data and expression fold change calculations for pine samples for Gene Lac2: dCt normalized with Act2.

                    Lac2                             Act2                          Target - Reference (Act2)     Fold increase 2^[-(t-r)]
Month      Bio Rep  Needles    Phloem     Xylem      Needles   Phloem    Xylem     Needles   Phloem    Xylem     Needles  Phloem   Xylem
May        1        35.7       30.18      26.6       20.78000  19.99000  20.67000  14.92000  10.19000  5.93000   0.00003  0.00086  0.01640
May        2(3)     37.12      31.18      31.22      23.35000  21.01000  24.23000  13.77000  10.17000  6.99000   0.00007  0.00087  0.00787
May        3        34.5       32.11      30.28      21.65000  23.12000  22.76000  12.85000  8.99000   7.52000   0.00014  0.00197  0.00545
June       1        36.75      31.2       27.51      22.16000  21.96000  22.34000  14.59000  9.24000   5.17000   0.00004  0.00165  0.02778
June       2(4)     -          26.71      23.83      22.40000  21.57000  22.23000  -         5.14000   1.60000   -        0.02836  0.32988
June       3        37.945     34.73      28.153333  27.78333  25.87333  25.62667  10.16167  8.85667   2.52667   0.00087  0.00216  0.17354
September  1        34.83      32.2       25.71      22.68000  21.04000  21.67000  12.15000  11.16000  4.04000   0.00022  0.00044  0.06079
September  2        36.74      36.19      30.63      25.92000  27.04000  28.36000  10.82000  9.15000   2.27000   0.00055  0.00176  0.20733
September  3        36.3       30.87      28.89      26.58000  24.46000  26.02000  9.72000   6.41000   2.87000   0.00119  0.01176  0.13679


Table A2.3.2.2a: Two-way ANOVA for Gene Lac2: dCt (Normalized with Act2) versus Bio Rep, and different tissue types in May.

OUTPUT

Source   DF  SS         MS         F      P
Bio Rep  2   0.0000187  0.0000094  0.77   0.520
Tissue   2   0.0001731  0.0000866  7.16   0.048
Error    4   0.0000483  0.0000121
Total    8   0.0002402

S = 0.003476 R-Sq = 79.88% R-Sq(adj) = 59.75%

Figure A2.10a Residual Plots for BioRep Vs Tissue for Lac2 in May. Panels: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order).


Table A2.3.2.2b: Two-way ANOVA for Gene Lac2: dCt (Normalized with Act2) versus Bio Rep, and different tissue types in June.

Note: Expression in June needles sample 2 was below the detection limit, so the missing value was substituted with 0.

OUTPUT

Source   DF  SS         MS         F      P
Bio Rep  2   0.018081   0.0090405  1.29   0.370
Tissue   2   0.059022   0.0295108  4.21   0.104
Error    4   0.028037   0.0070093
Total    8   0.105140

S = 0.08372 R-Sq = 73.33% R-Sq(adj) = 46.67%
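The note on this table describes one way to handle a Ct value that falls below the detection limit: the missing observation is replaced by 0 before the ANOVA is run. The sketch below applies that rule to June, biological replicate 2(4), from Table A2.3.2.1. It assumes that the substituted 0 is the fold-increase value (not the Ct or dCt value); the `None` marker and the function name are illustrative assumptions.

```python
# Fold increase with a fallback for undetected targets.
# Assumption: an undetected target (no Ct value) is entered as a fold
# increase of 0, following the note on the June Lac2 ANOVA table.


def fold_increase_or_zero(target_ct, reference_ct):
    """Return 2^-(t - r), or 0.0 when the target Ct is missing (None)."""
    if target_ct is None:
        return 0.0
    return 2.0 ** (-(target_ct - reference_ct))


# June, biological replicate 2(4): (Lac2 Ct, Act2 Ct) for each tissue.
# None marks the needle sample that was below the detection limit.
june_rep2 = {
    "Needles": (None, 22.40),
    "Phloem": (26.71, 21.57),
    "Xylem": (23.83, 22.23),
}

folds = {
    tissue: fold_increase_or_zero(target, reference)
    for tissue, (target, reference) in june_rep2.items()
}

for tissue, value in folds.items():
    print(f"{tissue}: fold = {value:.5f}")
```

Substituting 0 keeps the design balanced at nine observations, at the cost of treating an undetected transcript as exactly absent.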

Figure A2.10b Residual Plots for BioRep Vs Tissue for Lac2 in June. Panels: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order).


Table A2.3.2.2c: Two-way ANOVA for Gene Lac2: dCt (Normalized with Act2) versus Bio Rep, and different tissue types in September.

OUTPUT

Source   DF  SS         MS         F      P
Bio Rep  2   0.0037050  0.0018525  1.04   0.432
Tissue   2   0.0350393  0.0175197  9.85   0.028
Error    4   0.0071139  0.0017785
Total    8   0.0458582

S = 0.04217 R-Sq = 84.49% R-Sq(adj) = 68.97%

Figure A2.10c Residual Plots for BioRep Vs Tissue for Lac2 in September. Panels: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order).


Table A2.3.2.2d: Two-way ANOVA for Gene Lac2: dCt (Normalized with Act2) versus Bio Rep, and different times for Needles

Note: Expression in June needles sample 2 was below the detection limit, so the missing value was substituted with 0.

OUTPUT

Source   DF  SS         MS         F      P
Bio Rep  2   0.0000007  0.0000003  4.83   0.086
Month    2   0.0000005  0.0000003  3.52   0.131
Error    4   0.0000003  0.0000001
Total    8   0.0000015

S = 0.0002668 R-Sq = 80.67% R-Sq(adj) = 61.34%

Figure A2.10d Residual Plots for BioRep Vs Time Lac2 in Needles. Panels: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order).


Table A2.3.2.2e: Two-way ANOVA for Gene Lac2: dCt (Normalized with Act2) versus Bio Rep, and different times for Phloem

OUTPUT

Source   DF  SS         MS         F      P
Bio Rep  2   0.032301   0.0161505  2.67   0.183
Month    2   0.045355   0.0226774  3.75   0.121
Error    4   0.024158   0.0060395
Total    8   0.101814

S = 0.07771 R-Sq = 76.27% R-Sq(adj) = 52.54%

Figure A2.10e Residual Plots for BioRep Vs Time Lac2 in Phloem. Panels: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order).


Table A2.3.2.2f: Two-way ANOVA for Gene Lac2: dCt (Normalized with Act2) versus Bio Rep, and different times for Xylem

OUTPUT

Source   DF  SS         MS         F      P
Bio Rep  2   0.032301   0.0161505  2.67   0.183
Month    2   0.045355   0.0226774  3.75   0.121
Error    4   0.024158   0.0060395
Total    8   0.101814

S = 0.07771 R-Sq = 76.27% R-Sq(adj) = 52.54%

Figure A2.10f Residual Plots for BioRep Vs Time Lac2 in Xylem. Panels: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order).
