Identification of genes influencing wood fibre properties in Eucalyptus nitens

By

Nahida Bhuiyan B. Sc Ag (Hons 1); Master of Horticulture

Thesis submitted in total fulfillment of the requirements for the degree of Doctor of Philosophy

School of Forest and Ecosystem Science The University of Melbourne

May 2008

Abstract

Eucalypts are a major forest resource globally and the area of eucalypt plantations for pulp and paper production is expanding rapidly in Australia. Consequently, there is an increasing need to breed eucalypts with improved wood properties. Since many high value wood traits are under strong genetic control, identification of DNA markers linked to these traits will have application in breeding programs. In recent years there has been a shift in marker strategy away from QTL mapping in pedigrees to association studies in unrelated populations. In the latter approach, single nucleotide polymorphisms (SNPs) in candidate genes are screened to identify SNPs that significantly associate with wood traits. Significant SNPs could be used for marker-assisted selection (MAS) in breeding programs. The objectives of this study were to identify candidate genes that may influence pulp yield in eucalypts and to identify SNP variants in those genes that associate with superior wood and pulp traits.

Approximately 300 trees from a full-sib Eucalyptus nitens progeny derived from a wide intraspecific cross were used for gene discovery. DNA microarrays containing ~5800 Eucalyptus grandis young xylem cDNAs were screened with probes synthesised from RNA isolated from trees with either high or low pulp yield. Forty-six transcripts were differentially regulated, of which 27 were more abundant in high pulp trees and 19 were more abundant in low pulp trees. All differentially expressed cDNAs were partially sequenced and searched against existing gene databases. Six genes were selected as putative pulp yield candidate genes based on their significant similarity to genes with known function and were named EgrCesA3 (cellulose synthase), EgrNAM1 (NAM family protein), EgrXET (xyloglucan endotransglycosylase), EgrGalk (galactokinase), EgrHB1 (class III homeodomain leucine zipper protein) and EgrZnf1 (C3HC4 type zinc finger protein). Real-time PCR was carried out on selected genes to confirm the accuracy of the microarray results. Full length cDNAs were obtained for EgrCesA3, EgrHB1 and EgrZnf1 and the candidate genes were partially characterised. An additional candidate gene, the novel gene EgrPAAPA, was selected based on previous research due to its high expression in the cambium and its expression in eucalypt branches. EgrPAAPA was cloned by screening an E. grandis cDNA library and fully sequenced. The full length EgrPAAPA encodes a short 172 amino acid protein rich in alanine, glutamic acid and proline residues. The EgrPAAPA protein appears to be a hydroxyproline-rich glycoprotein (HRGP) and the repetitive ‘PAAPA’ motif suggests that it might play a structural role in cell wall development. Southern blot analysis revealed that E. grandis has a single copy of the EgrPAAPA gene and northern blot analysis revealed that EgrPAAPA is most strongly expressed in xylem tissues.

Allelic variation in EnCesA3, EnNAM1, EnPAAPA and EnHB1 was examined by sequencing each gene in 16 to 24 unrelated E. nitens individuals. SNPs were identified by sequence analysis, patterns of nucleotide diversity and linkage disequilibrium were estimated, and suitable polymorphisms were selected. A moderate level of nucleotide diversity (θw = 0.0056 and π = 0.0039) was observed and linkage disequilibrium was generally low, extending only a few hundred base pairs in each gene. Negative selection has been operating in EnHB1. Selected tag SNPs from EnNAM1, EnHB1 and EnPAAPA were genotyped across 300 unrelated E. nitens trees which had been phenotyped for six wood quality traits: pulp yield, cellulose, lignin, Klason lignin, microfibril angle (MFA) and density. Five highly significant genetic associations (p<0.01) were detected between several SNPs in EnHB1 and all wood quality traits except density. A significant association was also found between EnPAAPA and MFA (p<0.05). No significant associations were found with any of the EnNAM1 SNPs. The strong genetic associations between SNPs in EnHB1 and a range of wood traits are consistent with this gene’s known role as a transcription factor controlling vascular development. Validation of these associations in different populations will be necessary in order to confirm these results. Alternatively, QTL mapping can be performed in order to confirm whether QTL for wood property traits can be detected at the EnHB1 and EnPAAPA loci.


Declaration

The contents presented in this thesis are my original work by study and research except where reference is made. It has not been submitted previously for a degree at any other University. The thesis is less than 100,000 words in length, exclusive of tables, illustrations and bibliography.

Nahida Bhuiyan May 2008

Acknowledgements

I would like to thank my supervisors Dr Simon Southerton, Dr Gerd Bossinger and Dr Gavin Moran for their constant help and encouragement throughout this work. I would especially like to thank Dr Southerton for his endless support and guidance throughout the duration of my study. It has been a great honour and pleasure to work under his supervision. I thank Dr Bossinger for his willingness to share valuable experiences with me, especially in the literature review. I thank Dr Bala Thumma and Dr Shannon Dillon for their assistance with association genetics analysis. I thank Dr Colleen MacMillan and Dr Karen Fullard for helpful discussion and advice during the early stages of my research work. I thank my lab colleagues Charlie Bell and Maureen Nolan for their help in the lab. I also thank Dr Gapare Washington for helping me with SAS genetics analysis.

Special thanks to Judith Wright, Forest Biosciences, Clayton, for helping me to obtain the predicted pulp yield data using NIRA analysis. I would like to thank Dr Bingyu Zhang for her contribution to the amplification and sequencing of the EnCesA3 gene.

I am grateful to the University of Melbourne and CSIRO for providing me with an APA (I) scholarship. The industry partner for the Australian Research Council project was Sappi, and their financial support is appreciated. In particular I would like to thank Dr Arlene Bailey and Dr Terry Stanger for their involvement and support throughout this research. I also thank CSIRO for providing the lab and computer facilities required to conduct the research. I would like to thank IP Australia for their patience while I completed my thesis.

I thank my husband, Anamul Bhuiyan, and my son Navid for their patience and cooperation during my research work. I would also like to thank my parents, my brother Habib Bhuiyan and my sisters Silva and Elena for their encouragement and good wishes throughout the work.

Table of Contents

Abstract
Declaration
Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Abbreviations

Chapter 1 Literature review
  1.1 Introduction
  1.2 The genus Eucalyptus
  1.3 Wood fibre traits
  1.4 Identifying genes involved in wood formation
  1.5 Wood formation
  1.6 Cell wall development
    1.6.1 Cellulose biosynthesis
    1.6.2 Hemicellulose biosynthesis
    1.6.3 Lignin biosynthesis
    1.6.4 Cell wall proteins
  1.7 Family-based DNA linkage mapping
  1.8 Population-based association studies
  1.9 Single Nucleotide Polymorphism (SNP) discovery and genotyping
  1.10 Summary

Chapter 2 Discovery of candidate genes for pulp yield in Eucalyptus nitens
  2.1 Introduction
  2.2 Materials and methods
    2.2.1 Plant material
    2.2.2 Wood sample preparation and pulp trait analysis
    2.2.3 Microarray printing
    2.2.4 RNA isolation
    2.2.5 Probe preparation
    2.2.6 Array hybridization and washing
    2.2.7 Scanning and data analysis
    2.2.8 Sequencing and phylogenetic analysis
    2.2.9 Real-time RT-PCR analysis
  2.3 Results
    2.3.1 Selected genes more active in high pulp yield progeny
    2.3.2 Selected genes less active in high pulp yield progeny
    2.3.3 Functional classification of selected pulp yield genes
    2.3.4 Functional groupings of selected pulp yield genes
    2.3.5 Candidate genes putatively affecting pulp yield
    2.3.6 Real-time PCR assays
    2.3.7 Expression of control gene in real-time RT-PCR
    2.3.8 Standard transcript levels for quantitative PCR assays
    2.3.9 Quantitative analysis of EgrCesA3, EgrHB1 and EgrNAM1 by real-time PCR
    2.3.10 Phylogenetic analysis
  2.4 Discussion
    2.4.1 Gene discovery strategy
    2.4.2 Pulp yield candidate genes
      2.4.2.1 EgrCesA3
      2.4.2.2 EgrNAM1
      2.4.2.3 EgrHB1
      2.4.2.4 EgrZnf1
    2.4.3 Cell wall loosening genes

Chapter 3 Cloning and molecular characterization of the novel eucalypt cell wall gene EgrPAAPA
  3.1 Introduction
  3.2 Materials and methods
    3.2.1 Plant material
    3.2.2 Probe preparation
    3.2.3 Library screening
    3.2.4 DNA and RNA extraction
    3.2.5 Southern blot
    3.2.6 Northern blot analysis
  3.3 Results
    3.3.1 Isolation and sequence analysis of EgrPAAPA
    3.3.2 Southern blot analysis
    3.3.3 Expression analysis of EgrPAAPA
  3.4 Discussion

Chapter 4 Genomic diversity and association mapping of candidate genes in E. nitens
  4.1 Introduction
  4.2 Materials and methods
    4.2.1 Plant material
    4.2.2 Wood trait measurements
    4.2.3 DNA preparation
    4.2.4 Primer design
    4.2.5 PCR amplification
    4.2.6 Cloning of PCR products
    4.2.7 Sequencing
    4.2.8 Statistical analysis
      4.2.8.1 Nucleotide and haplotype diversity analysis
      4.2.8.2 Neutrality tests
      4.2.8.3 Linkage disequilibrium (LD) analysis
      4.2.8.4 Tagging SNPs
    4.2.9 SNP genotyping
    4.2.10 Marker analysis for association
  4.3 Results
    4.3.1 Genomic structure of EnCesA3, EnHB1, EnNAM1 and EnPAAPA
    4.3.2 Nucleotide diversity
    4.3.3 Haplotype diversity
    4.3.4 Neutrality tests
    4.3.5 Linkage disequilibrium
    4.3.6 Haplotype block partitioning
    4.3.7 Genetic association
  4.4 Discussion
    4.4.1 Diversity
    4.4.2 Neutrality tests
    4.4.3 Linkage disequilibrium
    4.4.4 Association studies

Chapter 5 General Discussion

References

List of Tables

Table 2.1: Real-time RT-PCR primers
Table 2.2: Selected genes more active in high pulp yield progeny
Table 2.3: Selected genes less active in high pulp yield progeny
Table 2.4: Functional classification of selected pulp yield genes
Table 2.5: Candidate pulp yield genes identified in Eucalyptus nitens
Table 3.1: Amino acid composition of EgrPAAPA and other glutamic acid rich proteins
Table 4.1: PCR amplification primers used for SNP discovery
Table 4.2: Oligonucleotides used for genotyping SNPs in EnNAM1
Table 4.3: Nucleotide diversity of selected candidate genes
Table 4.4: Haplotype number, diversity and the minimum number of recombination events (RM) of the candidate genes
Table 4.5: Selection tests for the candidate genes
Table 4.6: Haplotype blocks and TAG SNPs for selected candidate genes
Table 4.7: Genetic associations between 21 SNPs from EnHB1, EnNAM1 and EnPAAPA and pulp yield, cellulose, total lignin, Klason lignin, MFA and density
Table 4.8: Summary of genetic associations between significant SNPs and wood traits

List of Figures

Figure 1.1: Year wise eucalypt plantations in Australia
Figure 2.1: Discovery Approach
Figure 2.2: Distribution of selected pulp yield genes into functional groups
Figure 2.3: Real-time PCR primer pair products of EgrCesA3 (2.3.1), EgrHB1 (2.3.2), EgrNAM1 (2.3.3), and EgrSEC13 (control gene primer)
Figure 2.4: Amplification plot for EgrCesA3 and EgrSEC13 using RT-PCR
Figure 2.5: Standard curve for EgrHB1 quantitative RT-PCR assays
Figure 2.6: Standard curve for EgrNAM1 quantitative RT-PCR assays
Figure 2.7.1: Quantitative expression of EgrCesA3 in high and low pulp yield trees
Figure 2.7.2: Quantitative expression of EgrHB1 in high and low pulp yield trees
Figure 2.7.3: Quantitative expression of EgrNAM1 in high and low pulp yield trees
Figure 2.8: Unrooted phylogenetic analysis of CesA proteins from Arabidopsis thaliana, Populus tremuloides, Pinus spp, Oryza sativa and Eucalyptus grandis
Figure 2.9: Nucleotide and deduced amino acid sequence of the partial EgrNAM1 cDNA
Figure 2.10: Alignment of deduced amino acid sequences of partial EgrNAM1 and the closely related Arabidopsis NAC protein ANAC014
Figure 2.11: Unrooted phylogenetic analysis of 3’ region of plant NAC proteins
Figure 2.12: Nucleotide sequence of EgrHB1
Figure 2.13: Comparison of the deduced amino acid sequences of ATHB-15 and EgrHB1
Figure 2.14: Unrooted phylogenetic tree of plant HD-Zip III proteins (contain MEKHLA domain)
Figure 2.15: Nucleotide and deduced amino acid sequence of EgrZnf1
Figure 2.16: Comparison of the deduced amino acid sequence of EgrZnf1 with Arabidopsis and Oryza sativa C3HC4-type ring finger proteins
Figure 2.17: A phylogenetic tree of zinc finger (C3HC4 type ring finger) family proteins from Arabidopsis thaliana, Oryza sativa and Eucalyptus grandis
Figure 3.1: Nucleotide and deduced amino acid sequences of the EgrPAAPA
Figure 3.2: Comparison of the deduced amino acid sequences of EgrPAAPA with other glutamic acid rich proteins
Figure 3.3: Southern blot analysis of the EgrPAAPA gene
Figure 3.4: Northern blot analysis of the EgrPAAPA gene
Figure 4.1: Schematic diagram of MLPA analysis of SNPs resulting in amplification of different sized products from the A and B alleles
Figure 4.2.1: Genomic structure of EnCesA3
Figure 4.2.2: Genomic structure of EnHB1
Figure 4.2.3: Genomic structure of EnNAM1
Figure 4.2.4: Genomic structure of EnPAAPA
Figure 4.3.1: Distribution of the squared correlations of allele frequencies (r²) against distance in base pairs for the EnCesA3 locus
Figure 4.3.2: Distribution of the squared correlations of allele frequencies (r²) against distance in base pairs for the EnHB1 locus
Figure 4.3.3: Distribution of the squared correlations of allele frequencies (r²) against distance in base pairs for the EnNAM1 locus
Figure 4.3.4: Distribution of the squared correlations of allele frequencies (r²) against distance in base pairs for the EnPAAPA locus

List of Abbreviations

AFLPs     Amplified fragment length polymorphisms
AGPs      Arabinogalactan proteins
BSA       Bulked segregant analysis
CAZymes   Carbohydrate-active enzymes
CCR       Cinnamoyl CoA reductase
DEPC      Diethyl pyrocarbonate
DNA       Deoxyribonucleic acid
EDTA      Ethylene diamine tetra acetate
Egr       Eucalyptus grandis
En        Eucalyptus nitens
EST       Expressed sequence tag
GAS       Gene-assisted selection
GLM       General linear model
HRGP      Hydroxyproline-rich glycoprotein
ISSA      Induced somatic sector analysis
LD        Linkage disequilibrium
MAS       Marker-assisted selection
MFA       Microfibril angle
MLPA      Multiplex ligation-dependent probe amplification
NAM       No apical meristem
NIRA      Near-infrared reflectance analysis
PCR       Polymerase chain reaction
PRPs      Proline-rich proteins
QTL       Quantitative trait locus
QTNs      Quantitative trait nucleotides
RAPDs     Random amplified polymorphic DNAs
RFLPs     Restriction fragment length polymorphisms
RNA       Ribonucleic acid
SDS       Sodium dodecyl sulfate
SNPs      Single nucleotide polymorphisms
SSC       Sodium chloride sodium citrate
SSPE      Sodium chloride sodium phosphate EDTA
SSRs      Simple sequence repeats
UTR       Untranslated region
XET       Xyloglucan endotransglycosylases

Chapter 1

Literature review

1.1 Introduction

Eucalypt forestry operations increasingly focus on elite lines with desirable wood traits for clonal plantations. As a consequence, attention has been directed to the development of breeds and clones with superior wood and pulp traits. Yet, the genetic and molecular control of high value wood quality traits is poorly understood.

Wood is a complex and highly variable material, and this variability exists not only between species but also within species and even within a single tree. Understanding this variation in wood formation is important in order to improve end-use products, making genetic improvement of wood quality traits at the molecular level a high priority for forest tree breeding programs. Due to their size, variability, heterozygosity, dormancy and high outcrossing rates, trees are not as amenable to breeding as annual field crops. Candidate gene-based association studies are ideally suited for the dissection of complex quantitative traits such as wood properties, and are now beginning to reveal the genes and allelic variations that underlie variation in these traits in forest trees.

This review will focus firstly on the genus Eucalyptus because of its suitability for high quality pulp and paper production, a trait of particular interest in this current study. Secondly, wood fibre development will be reviewed as well as molecular approaches to studying wood formation, with an emphasis on cell wall development in trees. Finally, the application of association (LD) mapping for the characterisation of wood fibre traits using single nucleotide polymorphisms (SNPs) in selected candidate genes will be discussed.


1.2 The genus Eucalyptus

Eucalyptus, an evergreen woody plant, is the dominant genus in many Australian ecosystems and includes some of the world’s most important plantation hardwood species. Eucalyptus is a large genus with over 700 species, all of which are endemic to Australia and the islands to its north, where they dominate the forests and woodlands mainly of coastal regions. More than 300 species occur in south-eastern Australia, occupying many different habitats (Brooker and Kleinig, 1999). Drier areas of the country, particularly in the south, are covered by eucalypt mallee shrublands (Williams and Brooker, 1997). Many eucalypts are site-sensitive and have preferences for particular soil types, while other species are distributed over wide geographic and environmental gradients and show a relatively wide tolerance to soil type (Boland et al., 1984). Consequently, eucalypts can be successfully grown in most of the tropical and temperate climatic regions of the world (Eldridge et al., 1993). Of the more than 700 species of eucalypt only a few have been grown in plantations, including E. globulus, E. grandis, E. nitens, Corymbia citriodora, E. camaldulensis, E. urophylla, E. tereticornis, and E. viminalis (Maxwell, 2004).

Eucalypt forests and woodlands provide an enormous resource of wood which varies widely in colour, density, hardness, toughness, strength, elasticity and durability (Boland et al., 1984). Due to their diversity of properties, some eucalypt timbers find applications for structural purposes such as bridge building and harbor construction while others are excellent for scantling and general building timber, posts and poles, fencing, railway sleepers, cabinet-making, cardboard, plywood, wood-chips, paper pulp and fuel. Most industrial eucalypt wood is used in the production of bleached kraft pulp for the paper industry due to the excellent properties that it imparts to printing, writing and tissue papers (Raymond, 2002; Eldridge et al., 1993).

Worldwide, eucalypt plantations are rapidly expanding and presently occupy about 18 million ha (Gerd Bossinger, pers. comm.). Most of the plantations occur in Brazil and South Africa (Williamson, 2004). In Australia, the last decade has seen a massive expansion of eucalypt plantations and the total plantation area today is almost five times that of 1994 (Figure 1.1).

Figure 1.1 Year wise eucalypt plantations in Australia, plotted as area (hectares) against plantation year for 1994 to 2006 (all data adapted from Australia’s Plantations, 2007)


The Australian eucalypt plantation area has recently reached a total of 710,000 ha, of which Eucalyptus nitens occupies 143,000 ha (Australia’s Plantations, 2007). Worldwide, the main temperate plantation species is E. globulus; however, the plantation area of E. nitens is also significant, especially in temperate regions. Eucalyptus nitens, native to eastern Victoria and NSW, is one of the species preferred for production of wood fibre for pulp and paper. Approximately 90% of new hardwood plantations established in 2006 were aimed at pulpwood production (Australia’s Plantations, 2007). Due to their economic importance, members of the genus Eucalyptus have also received growing attention as experimental systems to help our understanding of wood formation aimed at the genetic improvement of planting stock. Molecular identification and characterization of genes controlling pulping characteristics and other high value wood traits in eucalypts are expected not only to increase our overall understanding of fibre development but also potentially to add value to the germplasm available for the next generation of plantations.

1.3 Wood fibre traits

Wood quality is defined in terms of a particular end use, and may involve several traits. Over the last ten years a number of researchers have assessed wood property traits in forest tree breeding programs (Moran et al., 2002; Raymond, 2002; Pot et al., 2002) and a number of physical and chemical wood property traits were defined as targets for genetic improvement. These were categorized as: cell wall biochemical traits (lignin, cellulose and non-cellulosic polysaccharide, i.e. hemicellulose), individual fibre or tracheid properties (cell length and microfibril angle) and whole wood characteristics (density, stiffness, shrinkage, reaction wood formation and pulp yield) (Bossinger et al., 2007; Neale et al., 2002). Generally wood traits are complex in nature and do not exhibit classic Mendelian inheritance patterns where substitution of a single allele is accountable for a qualitative change to phenotype (Lander and Schork, 1994). Complex or quantitative traits are controlled by multiple genetic and non-genetic factors and are influenced by the accumulated effects of many genes and the environment. More specifically, interactions between many genes and the environment (G×E), and the interactions between genes and their genetic background (G×G), are dependent on the degrees of variation of different traits (Gupta et al., 2005). Although studies are limited, heritabilities of wood traits reported in the literature are generally high (Neale et al., 2002). In eucalypts, pulp yield traits showed moderate heritability (h² of 0.33 to 0.58) whereas density showed high heritability (h² of 0.67 to 1) (Raymond, 2002). Moderate to high heritabilities of economically important wood property traits suggest that their genetic control is quantitative and is likely to be influenced by a number of genes. An earlier approach was to characterise or localize the genomic region controlling these important traits of interest by quantitative trait locus (QTL) mapping. The main intention of these studies was to better understand the genetic components of these traits for marker-aided selection (MAS) in breeding programs. Numerous linkage maps have been developed to locate QTLs and to provide a basis for MAS. These are described in a later section of this chapter under the heading ‘Family-based DNA linkage mapping’.
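For reference, the heritability values quoted in this section can be read against the standard quantitative-genetic definition. The expression below is a sketch that assumes the reported values are narrow-sense heritabilities and that genotype and environment are uncorrelated; the cited studies may define or estimate the variance components differently, and the symbols are introduced here for illustration only.

```latex
h^2 = \frac{\sigma_A^2}{\sigma_P^2},
\qquad
\sigma_P^2 = \sigma_G^2 + \sigma_E^2 + \sigma_{G \times E}^2
```

Here \sigma_A^2 is the additive genetic variance, \sigma_G^2 the total genetic variance, \sigma_E^2 the environmental variance, \sigma_{G \times E}^2 the genotype-by-environment interaction variance and \sigma_P^2 the total phenotypic variance. Under these assumptions, a value of h² = 0.58 would mean that 58% of the phenotypic variance in the trait is attributable to additive genetic effects.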

1.4 Identifying genes involved in wood formation

Selection of candidate genes involved in wood formation is not a clear-cut procedure. A number of approaches and a variety of information sources can be used to identify a particular gene as a potential candidate gene. While all genes involved in expression of a trait may be candidate genes, only genes with polymorphisms influencing the trait are accessible to the geneticist. Knowledge of biochemical and/or physiological pathways important in wood formation can inform the choice of candidate genes, such as genes involved in cellulose and lignin biosynthesis pathways (Bossinger et al., 2007; Pflieger et al., 2001; Moran et al., 2002; Tabor et al., 2002). Another approach in forest tree research is to select candidate genes on the basis of their position on genetic linkage maps and in relation to QTLs. Most candidates chosen from research are selected based on information obtained from mutant studies and on the identification and cloning of genes in model plant species, particularly Arabidopsis thaliana (Ehlting et al., 2005; Yokoyama and Nishitani, 2006). New approaches are emerging where genes are chosen as candidates based on sequence information that might indicate that they have been the target of selection (reviewed by Bossinger et al., 2007). Such genes can be classified as evolutionary candidates.

At the start of this project little information was available regarding candidate genes for wood formation in eucalypts and candidate genes were selected based on research presented in Chapter two. Microarray technology proved to be a powerful tool for this undertaking. Using this technology, it was possible to identify a number of genes with potential roles in secondary cell wall biogenesis (Yokoyama and Nishitani, 2006; Ko et al., 2004; Oh et al., 2003; Hertzberg et al., 2001; Demura et al., 2002). In arrays, global gene expression patterns can be inferred by comparing treatment against control, by comparing xylem samples with contrasting properties, or by comparing xylem with bark or other tissues. Genes have been identified that are differentially regulated during wood formation, and differentially expressed genes were clustered into groups or identified as candidate genes based on their expression pattern (Oh et al., 2003; Moran et al., 2002). This approach has been successfully used and a number of candidate genes including transcriptional regulators were identified in developing xylem tissues (Hertzberg et al., 2001; Paux et al., 2004). Microarrays have also been used to determine genomic regions (gene expression QTLs, or eQTLs) in segregating populations explaining transcript variation in co-regulated genes (Schadt et al., 2003). The approach was used in an E. grandis backcross segregating population and successfully identified eQTLs in the lignin biosynthetic pathway that colocalized with growth QTLs (Kirst et al., 2004).

Recent advances in molecular genetics research of forest trees have been greatly assisted by advances in our understanding of tree genomes, which are revealing numerous candidate genes for fibre traits. The complete sequencing of plant genomes such as Arabidopsis (The Arabidopsis Genome Initiative, 2000), Oryza (Yu et al., 2002) and Populus (Tuskan et al., 2006) is improving our understanding of the number of genes involved in development of different organs and the function of these genes. It has been estimated that of the 25,498 protein-encoding genes in the Arabidopsis genome over 400 are involved in cell wall formation and modification (The Arabidopsis Genome Initiative, 2000).

Large-scale EST projects in important commercial forest species have also been undertaken to identify important genes involved in fibre development. In poplar, Mellerowicz et al. (2001) reported the sequencing of 30,000 ESTs and found that 4% of genes from cambial tissues and 7% from xylem tissues are involved in cell wall formation. This included cytoskeletal genes and genes involved in the synthesis and modification of cellulose, hemicelluloses, lignin and pectin. Sterky et al. (1998) identified 3719 unique transcripts in 5692 ESTs from two poplar xylem cDNA libraries. Of these transcripts, 2245 corresponded to 820 different proteins. The same authors also reported that 4% (228) of ESTs were involved in cell wall synthesis or modification, including lignin and cellulose biosynthesis. A further 12% (683) of ESTs showed no significant similarities to genes in existing databases. One conclusion was that many of these unknown sequences may also have specific roles in wood formation. Allona et al. (1998) observed that 10% of 1097 pine xylem ESTs are involved in cell wall formation. Most encoded enzymes in the lignin biosynthetic pathway or enzymes involved in carbohydrate metabolism.
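The proportions quoted above from Sterky et al. (1998) can be checked against the counts given in the text. The following is a minimal Python sketch; the helper name `percent_of` is hypothetical, and only the counts stated above (228 and 683 ESTs out of 5692) are used.

```python
def percent_of(count: int, total: int) -> float:
    """Return count expressed as a percentage of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


# Counts as quoted in the text for the two poplar xylem cDNA libraries.
TOTAL_ESTS = 5692
cell_wall = percent_of(228, TOTAL_ESTS)  # ESTs linked to cell wall synthesis or modification
no_match = percent_of(683, TOTAL_ESTS)   # ESTs with no significant database similarity

print(f"cell wall ESTs: {cell_wall:.1f}%")
print(f"ESTs without database match: {no_match:.1f}%")
```

Both values round to the whole-number percentages quoted in the text, so the counts and percentages are mutually consistent.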

In eucalypts, ESTs were isolated that were more abundant in cambial tissue (Bossinger and Leitch, 2000) or that were involved in xylogenesis and tension wood formation (Paux et al., 2004; 2005). Nehra et al. (2005) reported a large EST project that had accumulated more than 170,000 sequences from 20 libraries from Eucalyptus grandis. Many of these ESTs were derived from xylem cDNA libraries. More recently, Qiu et al. (2008) identified several candidate genes differentially expressed in eucalypt branches in which large changes in wood properties such as cellulose content and microfibril angle were observed.

1.5 Wood formation

Wood is the natural product of the vascular cambium (Larson, 1994), a lateral meristem that develops in conifers and most dicotyledonous land plants. Although different trees produce wood of very dissimilar quality with a wide variation in wood properties, the basic production of wood is uniform. The production of wood is also known as xylogenesis and involves five major sequential steps: cell division, cell expansion (elongation and radial enlargement), cell wall thickening (involving synthesis and deposition of cellulose, hemicellulose, cell wall proteins, and lignin), programmed cell death, and heartwood formation (Plomion et al., 2001). Wood formation proceeds from the vascular cambium, which is derived from the procambium and ultimately from the apical meristem. The first anatomically visible stage of vascular differentiation in shoots of growing plants is the formation of provascular strands at the shoot apex connecting emerging leaf primordia with existing vasculature. In the sub-apical elongation zone the innermost cells of the procambium differentiate into protoxylem, and further down the metaxylem is formed in more centrifugally located cells. In parallel, proto- and metaphloem develop from the centrifugal parts of the procambium (Esau, 1960; Schrader, 2003).

In trees undergoing secondary growth a layer of undifferentiated procambium remains between the metaxylem and the metaphloem. These cells constitute the fascicular cambium, which produces secondary xylem and phloem through periclinal division.

Cells located between the vascular strands are differentiated and form the interfascicular cambium linking the fascicular cambium between individual vascular bundles. This leads to the formation of a complete cylinder of cambial cells extending along much of the length of the elongated root and stem: the vascular cambium (Larson, 1994; Steeves and Sussex, 1989). The vascular cambium plays a major role in the diametral growth of gymnosperm and angiosperm shoots and roots, which is of great significance in the perennial life of trees (Lachaud et al., 1999). It ensures the regular renewal of functional xylem and phloem. Vascular tissues are


made up of xylem and phloem, xylem being involved in the transport of water, soil

derived nutrients and plant hormones from the roots up to the shoots, and phloem in

the transport of organic material such as photosynthetic products, proteins, hormones

and mRNA from the leaves to the roots.

There is still debate about the nomenclature of cells within the cambium. In this

chapter the term cambium refers to meristematic cells that contribute to an increase in

diameter of the shoot by periclinal and anticlinal divisions giving rise to xylem and

phloem cells characteristic of secondary growth. These tissues comprise two distinct zones: the cambial zone, where all cell divisions occur, and the maturation or differentiation zone, where cells undergo a variety of processes on their way to their final position in the plant stem. The division zone generally comprises one or more layers of initiating cells, the cambial initials, which are enclosed on both radial walls by xylem and phloem mother cells. Cambium mother cells differ from cambium initials in that they are in a more advanced state of differentiation, being destined to become either xylem or phloem elements, whereas initials retain the ability to produce both (Lachaud et al.,

1999; Larson, 1994). Two distinct types of initials, fusiform and ray initials, are present in the cambium and are responsible for longitudinally and transversely aligned cambial derivatives, respectively. Short radial initials give rise to the ray cells that are essential for nutrient movement between phloem and xylem. Elongated fusiform initials divide lengthwise in the periclinal (tangential) plane to give rise to xylem and phloem mother cells. In gymnosperms the wood elements are mostly tracheids, while in dicots they include vessel elements, vessel-associated cells, axial parenchyma and fibres. On the outer side of the cambium in gymnosperms, phloem sieve tubes are formed, while in dicotyledons, companion cells, axial parenchyma and fibres are


formed in the phloem. Anticlinal (radial) divisions of the fusiform initials give rise to

more mother cells and allow for increases in the circumference of the cambium

(Lachaud et al., 1999; Plomion et al., 2001). In eucalypt wood, most of the wood

elements are fibre cells, which primarily carry out a structural role. Consequently

eucalypt wood properties are largely determined by the properties of these xylem

fibres.

Differentiation of the cells produced by the cambial initials involves four major steps: cell expansion, followed by the ordered deposition of a thick multilayered secondary wall, lignification, and programmed cell death (PCD). During the formation of the primary cell wall, cells expand longitudinally and radially until they reach their final size. Once expansion is completed, secondary wall formation begins, driven by the coordinated expression of numerous genes involved in the synthesis of the four major types of compound in the secondary cell wall: polysaccharides, lignin, cell wall proteins, and minor compounds that are either soluble or insoluble (pectins and cell wall proteins) in a neutral solvent (Plomion et al., 2001). Mature wood is primarily composed of cellulose (40-50%), hemicellulose (25%) and lignin (25-35%) (Suzuki et al., 2006; Plomion et al., 2001). The cellulose and lignin contained in woody trees represent the two most abundant organic compounds on earth (Grima-Pettenati and Goffner, 1999).

1.6 Cell wall development

The plant cell wall is a highly organized, chemically complex and dynamic structure.

It exerts control over cellular mechanisms and acts as a structural support for the cell.

In bulk much of the plant body is comprised of cell wall material and wood is mostly


made of the cell walls of dead cells. The primary cell wall forms a tough yet extensible extracellular matrix of polysaccharides, while the secondary cell wall is more rigid, thicker, and sometimes impregnated with lignin (Roberts, 2001). These matrices define the features of individual cells within the plant body. While the primary cell wall ultimately functions as the determinant of plant morphology, the secondary cell wall provides overall strength and flexibility, and its traits determine many of the economically important wood quality features. Consequently, many of the genes that are involved in secondary cell wall biosynthesis are primary targets for candidate gene-based association studies (see below).

1.6.1 Cellulose biosynthesis

Polysaccharides are the primary components of the cell wall, accounting for about

70% of cellular dry weight. They exist as either cellulose, or as matrix components

(Bacic et al., 1988). Cellulose is the most abundant biopolymer on earth (Fagard et al., 2000) and also acts as a carbon sink in plants (Haigler et al., 2001). Cellulose is a linear polymer of β(1-4)-linked glucose residues arranged in microfibrils, and it is the major structural component of the cell wall (see review by Baskin, 2001).

These microfibrils are subsequently interlaced with hemicellulose and pectic polysaccharides (Reiter, 1998).

The primary cell wall of trees contains 20-30% cellulose whereas the secondary cell wall comprises ~40-90% cellulose depending on the cell type (Taylor et al., 1999).

Cellulose synthesis in higher plants is carried out at the plasma membrane by a membrane-bound complex (Joshi et al., 2004; Doblin et al., 2002). This six-lobed complex, which resembles a rosette in freeze-fracture, contains a transmembrane pore. Rosettes


are composed of up to 36 individual catalytic subunits and several models for rosette

assembly have been proposed (Ranik and Myburg, 2006; Brown and Saxena, 2000).

It is thought that multiple individual chains of cellulose are extruded through the pore

complex into the cell wall as separate β(1-4)-linked glucan chains. They immediately

hydrogen bond with one another to form crystallites, which associate to form

semicrystalline microfibrils. In wood fibre cells, the cellulose microfibrils are laid

down in characteristic layers fabricated at different periods during cell differentiation

(Plomion et al., 2001). In the primary cell wall the cellulose microfibrils are arranged randomly within the wall while in secondary cell walls they are aligned in an ordered, parallel arrangement in three (S1, S2 and S3) layers. The orientation of microfibrils

varies significantly between the different S layers in the secondary cell wall and their

orientation in the S2 layer is referred to as the microfibril angle (MFA), an important

wood quality trait that determines the cell wall architecture (Washusen et al., 2005;

Spokevicius et al., 2007) and may determine the tensile and elastic properties of wood and pulp (Tibbits, 2006).

The catalytic subunit of the cellulose synthase complex is the CesA (cellulose synthase) protein. The CesA1 protein has been shown to bind UDP-D-glucose in vitro, consistent with a role in the polymerisation of nucleotide sugar into cellulose.

In fibre plants, the substrate for CesA (UDP-D-Glc) is provided by the enzyme sucrose

synthase (SuSy) (Plomion et al., 2001; Haigler et al., 2001). Joshi et al. (2004) proposed a three-step model in which SuSy provides UDP-glucose to the cellulose synthesis machinery organised in hexagonal rosettes, where KORRIGAN, a membrane-associated cellulase, edits the conversion of glucan chains to cellulose microfibrils. The

first plant cellulose synthase gene was identified by comparing cotton amino acid


motifs to bacterial cellulose synthase proteins (Pear et al., 1996). Since then the

availability of abundant genomic resources in Arabidopsis, Oryza and more recently

Populus has allowed the identification of a number of CesA genes. In Arabidopsis

three CesA proteins occur in primary walls (AtCesA1, AtCesA3 and AtCesA6) (Joshi

et al., 2004; Doblin et al., 2002) and at least three other CesA genes (AtCesA4,

AtCesA7 and AtCesA8) are required for cellulose synthesis in secondary cell walls

(Taylor et al., 2003). Eighteen CesA genes were identified in the Populus trichocarpa

genome and named PtCesA1 to 18, with PtCesA1 to 10 being the most closely related

homologs of AtCesA1 to 10 (Suzuki et al., 2006). Homologues of primary and

secondary CesA genes have recently been identified also in Eucalyptus grandis

(Ranik and Myburg, 2006). Quantitative reverse-transcription PCR in this species

demonstrated that EgCesA1, EgCesA2 and EgCesA3 are abundant in xylem cells

whereas EgCesA4, EgCesA5 and EgCesA6 are more strongly expressed in unfolding leaves. These studies also determined that EgCesA3 is the most strongly expressed of these genes in xylem undergoing secondary cell wall formation. It is likely that significant functional specialization exists within the CesA gene family.

There are at least 40 predicted CesA and CesA-like genes in the Arabidopsis genome sequence. Based on predicted protein sequences, Richmond and Somerville (2001) grouped these genes into seven clearly distinguished families: the CesA family, which includes the catalytic genes described above, and six families

(CslA, CslB, CslC, CslD, CslE and CslG) of structurally related genes of unknown function designated as the ‘cellulose synthase-like’ genes. Richmond and Somerville

(2000) reported that all of the members of the cellulose synthase super-family appear to be integral membrane proteins, with three to six transmembrane domains in the


carboxyl terminal region of the protein and one or two transmembrane domains in the

amino terminal region. Geisler-Lee et al. (2006) surveyed expression of 1,600 poplar

enzymes involved in carbohydrate metabolism (CAZymes) in a range of tissues.

Compared to Arabidopsis, poplar has about 1.6 times more CAZymes, which might indicate the importance of this class of enzyme in poplar xylogenesis. In this survey, the most abundant transcripts in developing xylem were sucrose synthase (SuSy), followed by cellulose synthase (CesA), KORRIGAN, which encodes a membrane-bound cellulase (Master et al., 2004), and ELP1, which associates with chitinase (Zhong et al., 2002).

1.6.2 Hemicellulose biosynthesis

The remainder of the noncellulosic polysaccharides in the cell wall are known as hemicellulose. This includes heteropolymers such as glucomannan, galactoglucomannan, arabinogalactan, and glucuronoxylan, or homopolymers such as galactan, arabinan, or β-(1-3)-glucan (Suzuki et al., 2006; Plomion et al., 2001). The biosynthesis of these polysaccharides occurs in the Golgi and is divided into two main stages: the synthesis of a backbone by polysaccharide synthases and the addition of side chain residues in reactions catalysed by a variety of glycosyl transferases

(Keegstra and Raikhel, 2001). It is believed that xyloglucan, the predominant hemicellulose in the primary wall of most plants is cross-linked to the cellulose microfibrils, thereby establishing a strong three-dimensional network (Reiter, 1998).

Xyloglucan endotransglycosylases (XETs) and hydrolases (XEHs), jointly named XTHs, help xyloglucan to be incorporated and modified in the cell wall network (Rose et al.,

2002). A number of gene models related to XTHs were identified in the poplar genome, and among them were some highly expressed cell wall-related CAZymes with specific expression patterns (Geisler-Lee et al., 2006).


1.6.3 Lignin biosynthesis

Lignin is a heterogeneous phenolic polymer that is most abundant in thickened

secondary cell walls. After cellulose, lignin is the second most abundant biopolymer on earth.

Due to its hydrophobic nature, it confers impermeability to tracheary elements and

therefore allows the transport of water and solutes through the vascular system. The

lignin content of woody species is an important trait for the pulp and paper industry as

it must be removed from the fibres in order to produce pulp suitable for high quality

paper (Chaffey, 2000).

Lignin is mainly derived from the polymerization of three different hydroxycinnamyl

alcohols (monolignols): p-coumaryl alcohol, coniferyl alcohol, and sinapyl alcohol

(Mellerowicz et al., 2001; Sibout et al., 2005), which are derivatives of the phenylpropanoid pathway. These give rise to the para-hydroxyphenyl (H), guaiacyl (G) and syringyl (S) units of lignin, which differ from each other in their degree of

methoxylation. The relative proportions of these units differ between species, cell

types and in response to the environment (Sibout et al., 2005; Boudet, 1998). The

pathway of lignin biosynthesis is relatively well understood; the monolignols are

synthesised from the phenylpropanoid pathway through successive deamination,

reduction, hydroxylation and methylation steps (Boerjan et al., 2003). A number of

genes for most of the known enzymes of the monolignol biosynthesis pathway have

been cloned and identified (for details see Mellerowicz et al., 2001). Some research has been conducted to alter lignin content and composition by genetic modification of genes from the lignin pathway. Transgenic tobacco plants with altered expression of phenylalanine ammonia-lyase (PAL) or cinnamate 4-hydroxylase (C4H) have shown reduced lignin

content compared to wild type (Sewalt et al., 1997). Jouanin et al. (2000) have shown


a reduction in lignin content (17%) in transgenic poplar by decreasing caffeic acid O-methyltransferase (COMT) activity. Down-regulation of cinnamoyl-CoA reductase (CCR), which catalyses the reduction of the hydroxycinnamoyl-CoA esters to their corresponding aldehydes, led to an overall decrease in lignin content and an increase in the S:G ratio (Piquemal et al., 1998; Mellerowicz et al.,

2001). Cinnamyl alcohol dehydrogenase (CAD) is involved in the reduction of cinnamaldehydes to cinnamyl alcohols, the last step of the monolignol biosynthesis pathway. During lignin biosynthesis in angiosperms, coniferyl and sinapyl aldehydes are believed to be converted into their corresponding alcohols by CAD and by sinapyl alcohol dehydrogenase (SAD), respectively. An aspen gene was discovered that encodes a SAD enzyme which is specifically involved in the reduction of syringyl monolignol in vitro (Li et al., 2001). However, in Arabidopsis, Sibout et al. (2005) have shown that the last step of monolignol biosynthesis was altered in mutant tissue. Their tissue-specific studies of double mutants indicate that CAD-D and CAD-C reduce coniferaldehyde and sinapaldehyde to the corresponding alcohols in inflorescences and stems.

Peroxidases are believed to be responsible for the final oxidation of cinnamyl alcohols to form lignin. In poplar, two anionic peroxidases that oxidise syringaldazine are preferentially expressed in developing xylem (Goldberg et al., 1983).

Laccases are also believed to polymerize monolignols but their precise role is unclear

(Mellerowicz et al., 2001). An increased level of phenolic compounds in poplar suggests a possible role for laccases in the oxidation of phenolics leading to cross-linking in the cell wall (Ranocha et al., 2000). In addition, MYB (Tamagnone et al., 1998) and LIM (Kawaoka et al., 2000) related transcription factors affect the expression of


genes involved in the phenylpropanoid biosynthesis pathway, thereby affecting lignin content and the accumulation of phenolic compounds. Although a number of studies have been devoted to the transgenic expression, structure and composition of the polymer to obtain insight into lignin polymerisation, to date no interaction with cellulose synthases during cell wall formation has been found.

1.6.4 Cell wall proteins

The cell walls of higher plants contain up to 10% protein; however, protein content varies widely among plant species, cell types, and even neighboring cells. The compositional and structural properties of cell wall proteins also vary in response to environmental and developmental factors. Much is known about the structure and metabolic regulation of the various cell wall proteins but relatively little information

is available on their exact function (Showalter, 1993). Most cell wall proteins belong to the hydroxyproline-rich glycoprotein (HRGP) superfamily and are usually heavily glycosylated. This superfamily includes the extensins and chimeric proteins that contain extensin-like domains, proline-rich proteins (PRPs), arabinogalactan proteins (AGPs) and

solanaceous lectins (Sommer-Knudsen et al., 1998).

Cell wall proteins are typically rich in one or two amino acids, contain highly repetitive sequence domains and are either highly or poorly glycosylated (Cassab,

1998). The majority of cell wall protein is cross-linked into the wall and probably has

a structural function, although the AGPs may be an important exception. AGPs are

generally not thought to be structural proteins, are typically soluble and may play a

role in morphogenesis. Recently, however, it was suggested that some AGPs may

play a mechanical role in cell wall rigidification through oxidative cross-linking


(Seifert and Roberts, 2007). Since the abundance of cell wall proteins varies

according to cell type, it is likely that they carry out specific functions. However,

there is little evidence about what these functions might be (Cassab, 1998). Cloning

and characterisation of genes encoding structural cell wall proteins of eucalypts may

help to formulate hypotheses concerning their function within the wall.

1.7 Family-based DNA linkage mapping

There is considerable interest in applying molecular technology to the breeding of

forest trees with improved fibre traits. Initial molecular approaches were primarily focused on the use of within-family genetic linkage mapping and were aimed at finding DNA markers linked to a trait. Considerable research has been directed at

identifying quantitative trait loci (QTLs) in forestry species (Devey, 2004;

Grattapaglia, 2001; Moran et al., 2002). This work has relied on the construction of

high density genetic linkage maps using DNA markers such as restriction fragment

length polymorphisms (RFLPs), microsatellites or simple sequence repeats (SSRs),

random amplified polymorphic DNAs (RAPDs) and amplified fragment length

polymorphisms (AFLPs). DNA markers linked to QTLs for high value traits could be

used in marker-assisted selection (MAS) for efficient and cost effective early

selection for desirable wood fibre traits in eucalypt breeding programs (Moran et al.,

2002).

As with most forest tree species, wood traits of commercial interest in eucalypts are

quantitative (Thamarus et al., 2002) and many of these traits have been shown to be

under moderate to high genetic control. This includes traits such as basic density


(Dean et al., 1990; Kube et al., 2001; Muneri and Raymond, 2000), density (Borralho et al., 1992, 1993; Cotterill and Brolin, 1997), pulp yield (Dean et al., 1990; Borralho et al., 1993), cellulose content (Kube et al., 2001) and fibre length (Kube et al., 2001).

In eucalypts, genetic linkage maps have been constructed for E. nitens using RFLP and RAPD markers in an outbred F2 pedigree (Byrne et al., 1995) and for E. globulus and E. tereticornis based on AFLP markers (Marques et al., 1998). A number of QTLs have been identified in eucalypts for wood density, cellulose, fibre length, microfibril angle and pulp yield (Thamarus et al., 2002), for volume growth and wood specific gravity (Grattapaglia et al., 1996) and for pulp yield (Moran et al., 2002).

MAS based on QTL is most useful for selection within families in narrow breeding

programs. The approach is limited, however, because QTLs frequently cannot be

validated in different families, a major limitation in most forest tree breeding

programs, which typically have a broad genetic base (Butcher and Southerton, 2007).

For this reason, in recent years interest has turned towards the use of association

genetics to identify variation in the DNA sequence of genes directly controlling

phenotypic variation (gene-assisted selection, GAS).

1.8 Population-based association studies

In recent years there has been a shift in research strategy away from family-based

mapping towards population-based association or linkage disequilibrium (LD)

mapping. With the completion of the human, Arabidopsis, rice and poplar genome

sequences, attention is increasingly being focused on the discovery and analysis of

genetic variation in natural populations. This research has the potential not only to


identify and genetically map disease loci (Hoskins et al., 2001; Cogan et al., 2006),

but also to identify causal polymorphisms within the gene affecting the phenotypes

(Palaisa et al., 2004). Over 10 million single nucleotide polymorphisms (SNPs) are

estimated to occur in the human genome (Syvänen, 2005; Weiner and Hudson, 2002;

Remm and Metspalu, 2001) and SNP-based association studies, or LD mapping, are now

being used to identify genes or genomic regions associated with complex traits. As

natural populations are used in association studies, recombinations accumulated over

many generations break any long range associations between markers leaving short

stretches of the genome in LD. Factors that increase the extent of LD include

inbreeding, small population size, genetic isolation between lineages, population

subdivision, low recombination rate, population admixture, natural and artificial

selection as well as balancing selection (Gupta et al., 2005). Factors which lead to a

decrease or disruption in LD include out-crossing, high recombination and mutation

rates. A variety of statistical approaches have been used to estimate the level of LD

and these have different strengths, depending on the context (Gupta, et al., 2005;

Jorde, 2000). The two preferred measures of LD are D’ and r². D’ measures only

recombination differences and r² summarises recombination and mutation differences.

For association studies, r² is preferred because it indicates how markers are correlated

with QTL of interest (Abdallah et al., 2003). A few studies have been carried out in

woody plants to assess the diversity of candidate genes (Dillon et al., submitted; Pot

et al., 2005; Brown et al., 2004; Ingvarsson, 2005). Results from these nuclear

diversity studies indicate that LD in forest tree species is low and typically decays

within the gene. An implication of this low LD in forest trees is that genome wide LD

mapping is difficult as large numbers of markers are required to cover the whole genome. On the other hand LD mapping based on candidate genes should be possible


and should result in high resolution mapping of candidate genes, as such associations persist over many rounds of recombination and should be found across populations of the species. So far, two association studies have been reported in forest tree species (Thumma et al., 2005; González-Martínez et al., 2007).
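The two LD measures discussed above can be illustrated with a minimal Python sketch that computes D, D’ and r² for a pair of biallelic loci from haplotype and allele frequencies. The function name `ld_measures` and the example frequencies are hypothetical and are not taken from any of the studies cited; the formulas are the standard textbook definitions.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D_prime, r_squared) for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a  : frequency of allele A at locus 1
    p_b  : frequency of allele B at locus 2
    """
    # Raw disequilibrium coefficient: observed minus expected haplotype frequency.
    d = p_ab - p_a * p_b

    # D' scales D by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between alleles at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r_squared


if __name__ == "__main__":
    # Illustrative frequencies only: a rare allele B found solely on A haplotypes.
    print(ld_measures(p_ab=0.1, p_a=0.5, p_b=0.1))
```

With these illustrative frequencies D’ equals 1 while r² is only about 0.11, which shows why r² is the more informative measure of how well one marker predicts another when allele frequencies differ.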

A major advantage of association mapping in forest trees is the possibility of high resolution mapping of marker-trait associations. Since LD typically breaks down within or close to the gene (Brown et al., 2004; Pot et al., 2005; Thumma et al., 2005),

DNA markers that associate with a trait in natural populations are likely to lie within or near the gene that directly influences the trait and in some cases may be the functional variant. Association studies are particularly useful in forest tree species as tree breeding is only a few decades old and most of the natural variation is present in the breeding populations. The first published association genetics study in forest trees reported an association between the eucalypt cinnamoyl-CoA reductase (CCR) gene and microfibril angle (Thumma et al., 2005). This discovery also indicates the potential for association studies to explore the molecular basis of fibre traits in woody plants. In another study González-Martínez et al. (2007) identified several SNP markers from different candidate genes associated with several wood quality traits in

Pinus taeda. Both of these studies were carried out in populations with low genetic structure. The occurrence of significant population structure can lead to the discovery of false associations (Thumma et al., 2005; Gupta et al., 2005). However, statistical methods have been developed to identify and correct for population structure

(Pritchard et al., 2000). The Transmission/Disequilibrium Test (TDT) was developed to account for structure in families (Spielman et al., 1993).
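The TDT compares how often heterozygous parents transmit a given allele to the offspring of interest with how often they transmit the alternative allele. The following minimal sketch assumes that these two counts have already been tallied and uses the usual McNemar-type statistic, (b - c)² / (b + c), referred to a chi-square distribution with one degree of freedom. The function name `tdt` and the counts in the example are hypothetical.

```python
import math


def tdt(transmitted, untransmitted):
    """Transmission/Disequilibrium Test for one biallelic marker.

    transmitted   : number of heterozygous parents that transmitted allele A (b)
    untransmitted : number of heterozygous parents that transmitted the other allele (c)
    Returns (chi_square, p_value) with one degree of freedom.
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("at least one informative (heterozygous) parent is required")

    chi_square = (b - c) ** 2 / (b + c)

    # Upper-tail probability of a chi-square variable with 1 d.f.:
    # P(Z^2 > x) = erfc(sqrt(x / 2)) for a standard normal Z.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


if __name__ == "__main__":
    # Illustrative counts only: 60 transmissions of allele A versus 40 of the alternative.
    chi_square, p_value = tdt(60, 40)
    print(f"chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```

Because only transmissions from heterozygous parents within families are counted, allele frequency differences between subpopulations do not inflate the statistic, which is the property referred to above.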


1.9 Single Nucleotide Polymorphism (SNP) discovery and genotyping

The raw material for association studies is single nucleotide polymorphisms (SNPs).

They are defined as single base variations in a DNA sequence, usually represented as two or sometimes three different bases at a single position, and occurring at > 1% allele frequency (Engle et al., 2006; Weaver, 2000). SNPs are the most frequent DNA variations found in most organisms (Solemani et al., 2003; Paris et al., 2003; Choy et al., 1999) and can occur in both coding (gene) and non coding regions of the genome.

In coding regions, SNPs are characterised as synonymous (substitutions that do not change the amino acid) or nonsynonymous (substitutions that result in amino acid replacements). The effect of SNPs on phenotypic traits is variable but is

clearly dependent upon the location of SNPs in the genome. SNPs that occur in regions upstream of the protein-coding region of a gene might influence the binding of transcriptional activators or repressors, resulting in differential regulation of transcription (Carlson et al., 2005). Polymorphisms at intron/exon boundaries may affect exonic or intronic splicing (Fairbrother et al., 2002; Maniatis and Tasic, 2002).
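The distinction between synonymous and nonsynonymous SNPs in coding regions can be sketched in a few lines of Python. This is an illustrative example only: it assumes the standard genetic code, a known reading frame and a single substitution within one codon, and the function name `classify_snp` is hypothetical. Substitutions that create or remove a stop codon are reported as nonsynonymous in this simplified sketch.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering of codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(codon, position, alt_base):
    """Classify a single-base substitution within a codon.

    codon    : reference codon on the coding strand, e.g. "GCT"
    position : 0-based position of the SNP within the codon (0, 1 or 2)
    alt_base : alternative base observed at that position
    """
    codon = codon.upper()
    alt_codon = codon[:position] + alt_base.upper() + codon[position + 1:]
    if CODON_TABLE[codon] == CODON_TABLE[alt_codon]:
        return "synonymous"
    return "nonsynonymous"


if __name__ == "__main__":
    print(classify_snp("GCT", 2, "C"))  # GCT (Ala) to GCC (Ala)
    print(classify_snp("GAA", 1, "T"))  # GAA (Glu) to GTA (Val)
```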

Initial marker analyses of high value wood fibre traits relied on RFLPs, microsatellites, RAPDs and AFLPs in QTL studies. Today, SNPs are valuable markers for genetic analysis due to their biallelic nature, abundance and stability compared to microsatellite markers (Dantec et al., 2004; Paris et al., 2003;

Kota et al., 2003). They are useful for the detection of associations between allelic forms of a gene and phenotypes (Jorde, 2000; Emahazion et al., 2001; Ohnishi et al.,

2000; Yamada et al., 2000), construction of high-resolution genetic maps (Hoskin et al., 2001) and for inferring population history (Brumfield et al., 2003). In recent years, SNP-based genetic studies have been initiated for several major forestry genera


including pines (Neale and Savolainen, 2004; Brown et al., 2004), eucalypts

(Thumma et al., 2005) and poplar (Zhang et al., 2005; Ingvarsson, 2005).

Several methods have been reported for identifying and characterising SNPs,

including single-strand conformation polymorphism analysis (Orita et al., 1989),

heteroduplex analysis by denaturing high-performance liquid chromatography

(DHPLC) (Lichten and Fox, 1983), direct DNA sequencing, and computational

methods (Nowotny et al., 2001). Of these methods, direct DNA sequencing is the

most widely used for high-throughput discovery of SNPs in humans, other ,

plants, and forest trees (Chan, 2005; Kononoff et al., 2005; Brown et al., 2004;

Freudenberg-Hua et al., 2003; Schmid et al., 2003). This approach is based on direct

sequencing of DNA segments (amplified by PCR) from a panel of individuals. PCR primers are designed to amplify segments of DNA (1000-2000 bp) derived from a subset of individuals that represent the diversity in the population of interest

(Rafalski, 2002). This approach has been used extensively for SNP discovery within candidate genes in several economically important forest trees including Populus tremula (Ingvarsson, 2005), Pinus taeda (Brown et al., 2004), Pinus spp (Pot et al.,

2005), Pseudotsuga menziesii (Krutovsky and Neale, 2005) and Eucalyptus nitens

(Thumma et al., 2005). This approach was followed for SNP discovery in the research described in Chapter 4.
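The principle of SNP discovery by direct sequencing of a panel of individuals can be sketched as follows. The example is a simplified illustration rather than the procedure used in this thesis: it assumes that the sequences are already aligned, of equal length and represent single haplotypes, and the function name `find_snps`, the threshold parameter `min_maf` and the toy sequences are hypothetical.

```python
from collections import Counter


def find_snps(aligned_seqs, min_maf=0.01):
    """Report polymorphic sites in a set of aligned sequences.

    aligned_seqs : list of equal-length DNA strings, one per sampled haplotype
    min_maf      : minor allele frequency that a site must exceed to be reported
    Returns a list of (position, major_allele, minor_allele, minor_allele_frequency),
    with 0-based positions.
    """
    snps = []
    for position, column in enumerate(zip(*aligned_seqs)):
        # Count only unambiguous bases; gaps and N calls are ignored.
        counts = Counter(base for base in column.__iter__() if base in "ACGT")
        if len(counts) < 2:
            continue  # monomorphic (or uninformative) site
        total = sum(counts.values())
        # The second most common base is treated as the minor allele.
        (major, _), (minor, n_minor) = counts.most_common(2)
        maf = n_minor / total
        if maf > min_maf:
            snps.append((position, major, minor, maf))
    return snps


if __name__ == "__main__":
    panel = ["ACGTA", "ACGTA", "ATGTA", "ACGTC"]  # toy alignment of four haplotypes
    for snp in find_snps(panel):
        print(snp)
```

In practice the size of the discovery panel limits which allele frequencies can be detected, so the 1% threshold used as the default here only becomes meaningful with large panels.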

Association studies rely on the capacity to determine genotypes for numerous SNPs in a large number of individuals, and the development of accurate, low cost and high-throughput technologies for SNP genotyping has been crucial for their widespread use. A number of methodologies have been developed for SNP genotyping and these


are reviewed elsewhere (Eldering et al., 2003; Gut, 2001; Shi, 2001). Compared with SSR analysis, SNP assays usually do not require separation of DNA fragments by size and therefore can be performed in 96- or 384-well plates. The most common methodologies used for SNP genotyping include hybridisation, primer extension and cleavage, exonuclease detection (TaqMan), invasive cleavage of oligonucleotide probes (invader assay), multiplex ligation-dependent probe amplification (MLPA) and

GoldenGate assay (bead arrays).

To date, SNP genotyping for LD mapping has not been frequently used in forest trees; however, some forest geneticists have used SNP genotyping in natural populations. For example, Thumma et al. (2005) used primer extension to determine the genotypes of 290 individuals and successfully carried out association mapping with accurate SNP data. Association mapping using the MLPA and GoldenGate assays has not been reported so far. In humans, Janssen et al. (2005) used the MLPA method to detect copy number, deletion and duplication of exons in the dystrophin gene.

Schouten et al. (2002) used MLPA and detected a single base mutation by using 40 nucleic acid sequences in one single reaction. Most recently, Dillon et al. (CSIRO

Forest Biosciences, Canberra) used Illumina GoldenGate assays for candidate gene-based association mapping in Pinus spp. (personal communication). In MLPA, three oligonucleotides are used to bind the DNA template immediately flanking the SNP.

The short oligonucleotides are specific to the SNP sequence and can anneal to the target sequences of both alleles and compete with each other for binding. The differential SNP-dependent ligation is detected by PCR using the amplification sequences of the oligonucleotides. The GoldenGate assay is an amalgamation of an oligonucleotide probe ligation and allele-specific extension reaction which is analysed


using Illumina core technology. Compared with MLPA, this method allows a large number of SNPs to be assayed in one reaction with highly specific extension and amplification (Shen et al., 2005). This thesis presents genotyping data obtained using MLPA and GoldenGate assays from the same natural population, which is described in Chapter 4.

1.10 Summary

The primary focus of this thesis is to identify genes controlling wood property traits in

Eucalyptus nitens, one of the most important hardwood species planted for pulp and paper production. The research begins with the discovery of xylem-expressed candidate genes in a full-sib mapping population; these genes were subsequently used in association mapping in a natural population in order to dissect complex wood quality traits of interest. Chapter 2 documents the experiments undertaken to identify candidate genes that control wood fibre traits, in particular pulp yield. Chapter 3 describes the partial characterization of the novel EgrPAAPA gene from eucalypts, which is differentially expressed in wood tissues with large variation in wood traits such as cellulose content. Finally, in Chapter 4, selected candidate genes were studied further in order to identify common sequence polymorphisms (SNPs), which were then used in association studies in a natural population aimed at uncovering SNPs influencing pulp yield in E. nitens.


Chapter 2

Discovery of candidate genes for pulp yield in Eucalyptus nitens

2.1 INTRODUCTION

Eucalypt plantation forestry for pulp and paper production increasingly relies on breeding programs and elite lines selected for desirable traits. Variation in wood properties is influenced by variation in the size, shape and arrangement of cells in wood, as well as variation in the chemistry of the cell walls. As a consequence, considerable attention has been given to extensive analysis of wood molecular structure and chemistry. Wood formation in trees is the result of five major sequential processes: cell division, cell expansion, cell wall thickening, programmed cell death and heartwood formation (Plomion et al., 2001). Wood is formed from the terminal differentiation of cells in xylem, a process that typically continues throughout the annual growing season of a woody plant. The specialized nature of the cell walls in wood tissues provides favorable material for the investigation of many aspects of cell wall structure and biogenesis. It is likely that tracheid length, diameter and wall thickness will affect the strength and density of wood, whereas the proportion of cellulose and lignin will affect the pulping properties of wood.

Several high value wood traits are under moderate to strong genetic control (Raymond, 2002) and gene loci (QTLs) influencing these traits have been identified in eucalypts (Moran et al., 2002; Thamarus et al., 2002; Myburg et al., 2003; Byrne et al., 1995). With the completion of the full sequence of the Arabidopsis genome (The Arabidopsis Genome Initiative, 2000), and most recently that of Populus trichocarpa (Tuskan et al., 2006), it is now possible to predict the number of genes that are involved in the development of different organs within woody plants. It has been estimated that 10% of Arabidopsis genes encode proteins which may play a role in the synthesis and modification of cell wall polymers (The Arabidopsis Genome Initiative, 2000). It is likely that eucalypts have a similar number of genes involved in cell wall synthesis. Many of these genes will be expressed in a tissue specific manner; thus a subset of these genes is likely to be involved in fibre cell wall development. The large majority of these genes have not yet been fully characterized in forest trees and their influence on wood fibre traits is not known.

However, the properties of wood, and the process of wood formation, are principally due to the action of genes expressed in differentiating secondary xylem. Greater understanding of these genes and their protein products will help us understand their role in wood formation, and provide potential targets for the directed modification of wood properties.

Microarray technology is a powerful approach to identify genes involved in various aspects of fibre cell and xylem differentiation and can provide strong pointers to biological function. Ehlting et al. (2005) used this approach to identify candidate genes for lignin biosynthesis and transcriptional regulators of fibre differentiation in Arabidopsis. Demura et al. (2002) identified genes closely associated with morphogenic events during secondary cell wall biosynthesis in Zinnia cell cultures undergoing tracheary element trans-differentiation. Hertzberg et al. (2001) used poplar cDNA arrays to profile changes in gene expression at various stages of secondary xylem differentiation. The information derived from these studies provides a starting point for unraveling the molecular mechanisms of fibre differentiation. To date there have been limited reports on global examination of gene expression profiles in wood with contrasting traits. Comparison of gene expression profiles in wood fibre tissue between trees with extremes of a particular wood or fibre trait may provide clues to the genes controlling that trait.

The objective of this study was to identify candidate genes that may influence pulp yield in eucalypts and which could be used in the development of eucalypt breeds and clones with superior wood and pulp traits. cDNA microarrays were used to identify genes differentially expressed during wood formation in low and high pulping trees in a Eucalyptus nitens full-sib progeny trial. The accuracy and sensitivity of the microarray data were evaluated for selected candidate genes using real-time RT-PCR. A number of interesting genes that may influence variation in wood formation, in particular wood pulping properties, were identified. Some of these genes will be good candidates for SNP identification and association studies aimed at identifying allelic variation linked to variation in pulp yield.

2.2 MATERIALS AND METHODS

2.2.1 Plant materials

Approximately three hundred nine-year-old full sibling Eucalyptus nitens progenies from a wide intra-specific cross growing at Ridgely, Tasmania were used as plant material for gene discovery. The four grandparents of the cross came from widely separated populations throughout the natural distribution of the species (Byrne and Moran, 1994).

Xylem tissue was isolated from each tree by cutting and removing a window in the bark, approximately 150 x 300 mm, and scraping the young xylem tissue from the exposed wood with a clean chisel. Xylem was harvested on three successive days between the hours of 8.00am and 11.00am. The harvested xylem tissue was immediately frozen in liquid nitrogen and stored on dry ice. The frozen xylem tissue was subsequently stored at -80°C prior to RNA isolation.

2.2.2 Wood sample preparation and pulp trait analysis

Two bark-to-bark wood cores collected at 1.1 m from 300 trees were used for determining predicted pulp yield and cellulose content. Each core was broken into pieces and then ground in a Wiley Mill using a 1.0 mm screen for 1 minute. Cellulose was measured by digesting 1.0 g of oven dried wood meal with diglyme and HCl at 90°C for 1 hour, followed by washing with methanol and water. After drying overnight at 105°C, the remaining crude cellulose was determined as a percentage of the weight of the wood meal. Details of sampling and laboratory processing are given in Raymond et al. (2001).

Predicted pulp yield was obtained by near-infrared reflectance analysis (NIRA). NIR spectra were measured on two sub samples of the wood meal by diffuse reflectance in a scanning spectrophotometer (NIR Systems Inc Model 5000). The predicted pulp yield and cellulose content data were normally distributed (standard deviation 1.15) and the highest and lowest pulp yields were 55% and 47.5% respectively. Trees selected for comparison had pulp yields of >52.63% (high pulp) and <49.69% (low pulp).
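The selection rule described above can be sketched in a few lines of Python. This is only an illustration: the two cut-offs come from the text, whereas the tree identifiers and yield values are hypothetical and are not measured data.

```python
HIGH_CUTOFF = 52.63  # % predicted pulp yield, cut-off stated in the text
LOW_CUTOFF = 49.69   # % predicted pulp yield, cut-off stated in the text


def classify_tree(pulp_yield):
    """Return 'high', 'low' or None for a predicted pulp yield in percent."""
    if pulp_yield > HIGH_CUTOFF:
        return "high"
    if pulp_yield < LOW_CUTOFF:
        return "low"
    return None  # intermediate trees are not used for the comparison


# Hypothetical example values, for illustration only.
example_yields = {"tree_A": 54.1, "tree_B": 51.0, "tree_C": 48.2}
groups = {tree: classify_tree(y) for tree, y in example_yields.items()}
```

Trees falling between the two cut-offs are left unassigned, which mirrors the use of only the two tails of the distribution for probe synthesis.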


2.2.3 Microarray printing

cDNA microarrays containing approximately 5,800 clones were printed using cDNA clones from a number of sources. About 4,900 clones were randomly selected cDNAs amplified by Deyou Qiu (Qiu et al., 2008) from a cDNA library from mature xylem of Eucalyptus grandis. Approximately 100 cambial-specific E. globulus cDNAs (Bossinger and Leitch, 2000) and a further 800 cDNA clones from an E. nitens leaf cDNA library obtained from Karen Fullard (CSIRO Forest Biosciences, Canberra) were included on the arrays. Inserts were amplified by PCR and purified using the method of Qiu et al. (2008). cDNAs were printed onto Superaldehyde glass slides (TeleChem, Sunnyvale, CA) using a Bio-Rad (Hercules, CA) VersArray Chipwriter Pro with SMP 3 stealth microspotting quill pins (TeleChem, Sunnyvale, CA). Each cDNA was spotted twice, on different areas of the same slide. Glass slides were processed according to the TeleChem protocols.

2.2.4 RNA isolation

Total RNA was isolated from xylem tissues of high and low pulp yield trees using the methods of Southerton et al. (1998) with slight modifications. Three grams of frozen xylem tissue was ground in liquid nitrogen with a mortar and pestle and dissolved in pre-warmed (65°C) extraction buffer (2% CTAB, 2% PVP40, 100 mM Tris-HCl pH 8.0, 25 mM EDTA, 2 M NaCl, 0.5 g/l spermidine and 2% β-mercaptoethanol). The tube was shaken vigorously and incubated at 65°C for 5 min. The solution was then extracted twice with an equal volume of chloroform, with centrifugation at 7500 rpm for 15 min. The supernatant from the second extraction was mixed with 0.25 volume of 10 M LiCl and the RNA was precipitated overnight in an ice bucket at 4°C. RNA was pelleted by centrifugation at 7500 rpm for 20 min at 4°C and then dissolved in 500 µl of SSTE (0.5% SDS, 1 M NaCl, 10 mM Tris-HCl pH 8.0 and 1 mM EDTA pH 8.0). The redissolved RNA was transferred into an Eppendorf tube and extracted with an equal volume of chloroform. The purified RNA was precipitated by adding 2 volumes of ethanol, placed at -70°C for 1 hour and pelleted by centrifugation at 13000 rpm at 4°C. The pellet was then washed with 75% ethanol, centrifuged at 13000 rpm for 20 min at 4°C and dried before being dissolved in DEPC (diethyl pyrocarbonate) treated water. All RNA samples were verified by formaldehyde gel electrophoresis following the RNeasy mini kit protocol (QIAGEN, Hilden, Germany) and stored in 70% ethanol in a -80°C freezer. RNA was further quantified by measuring A260 using a dual-beam spectrophotometer.

2.2.5 Probe preparation

Total RNA pooled into eight-tree bulks was used to synthesize probes for array screening. One hundred micrograms of total RNA (12.5 µg from each tree) was reverse transcribed using SuperScript II reverse transcriptase (Invitrogen, Carlsbad, California) and an oligo(dT) 23-mer (Sigma-Aldrich Corp, St Louis, Missouri). The 40 µl reactions contained 24.5 µl of total RNA (100 µg), 0.5 µl of 0.5 µg/µl oligo(dT), 8 µl of 5x SuperScript II first-strand synthesis buffer, 4 µl of 0.1 M DTT, 2 µl of dNTP mix (10 mM each of dATP, dCTP, dGTP, dTTP) and 1 µl of 0.2 U/µl SuperScript II reverse transcriptase. The reaction mixture was incubated at 42°C for 1 hour, then treated with 0.20 µl RNase H (5 U/µl) at 37°C for 30 minutes. After incubation the mixture was adjusted to a volume of 200 µl with TE buffer (pH 8.2) and the first-strand cDNA was purified using Amicon Microcon YM30 filters (Millipore, Bedford, MA) following the manufacturer's instructions. The cDNA was resuspended in a volume of 8 µl using TE buffer.

Probes for hybridisation were obtained by labeling the cDNA with either CyDye™ 3-dUTP or CyDye™ 5-dUTP (Amersham Biosciences, Buckinghamshire, England). The labeling reaction, comprising 9.5 µl of ddH2O, 2 µl of cDNA, 10x Klenow buffer (USB Corporation, Cleveland, Ohio) and 1 µl of random primers (3 µg/µl, Invitrogen, Carlsbad, California), was incubated for 2 minutes at 95°C, then cooled on ice for 5 minutes. After cooling, 0.5 µl of CyDye™ 3-dUTP or CyDye™ 5-dUTP, 2 µl of dNTP mix (0.25 mM each, except 0.09 mM dTTP) and 1 µl of 5 U/µl Klenow polymerase (USB Corporation) were added and the reaction was incubated for 3 hours at 37°C. The Cy3 and Cy5 probes were combined, purified using the same Amicon Microcon YM30 filters as described above and dried in a Speed Vacuum.

2.2.6 Array hybridisation and washing

Microarrays were co-hybridized with a Cy5 high pulp probe synthesised from RNA isolated from 8 high pulp yield trees and a Cy3 low pulp probe synthesised from RNA isolated from 8 low pulp yield trees (Figure 2.1). This hybridisation was repeated three times with RNA isolated from different trees, and all hybridisations were also performed with the dyes swapped.

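[Figure 2.1: schematic. Trees are taken from the low and high tails of the pulp trait frequency distribution; RNA from 8 low pulp yield trees and RNA from 8 high pulp yield trees is used to make the two probes that are hybridised to the cDNA arrays.]

Figure 2.1 Discovery Approach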

High and low pulp combined probes were resuspended in 20 µl hybridisation buffer (5x SSC, 0.1% SDS, 50% formamide, 0.1 mg/ml salmon sperm DNA), denatured at 95°C for 5 minutes and chilled on ice before being applied to the arrays. The slide was covered with a 25 x 25 mm cover slip (Frale Scientific, USA). About 30 µl of water was loaded onto each corner of the slide and the slide was incubated overnight at 42°C in a waterproof humidified hybridisation chamber in the dark. After hybridisation, slides were washed for 10 min in 2x SSC/0.1% SDS at 42°C, for 10 min in 0.1x SSC/0.1% SDS at room temperature, twice for 5 min in 0.1x SSC and finally for 15 min in 0.1x SSC in a shaker at room temperature. The slides were then dried by centrifugation at 1000 rpm for 1 minute.

2.2.7 Scanning and data analysis

The microarray slides were scanned using a GenePix 4000A microarray scanner (Axon Instruments Inc., Foster City, California) and analysed with the GenePix 3.0 program. Grids were predefined and manually positioned to ensure optimal spot recognition. Unreliable spots (due to dust contamination, false intensity, background noise etc.) were flagged in the data set so that they could be excluded from later analysis. The scanned data were spatially normalized using the methodology of Wilson et al. (2003) and used in statistical analysis. Median values were calculated for each replicate of each spot in order to identify clones that were differentially expressed between high and low pulp xylem tissues. Interesting clones were selected using the following criteria: (1) the clone consistently appeared in the extreme 5% of differentially expressed clones in both dye swaps of each biological replicate; (2) the clone had an average median expression value at least 4 times greater than the background for both channels; (3) visual inspection confirmed differential expression.
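The first two selection criteria can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not the analysis pipeline used in the study: the data layout (one dictionary per hybridisation, with median signal and background already assigned to the high and low pulp samples after accounting for the dye swap), the function names and the requirement that a clone falls in the same tail in every hybridisation are all assumptions. The third criterion, visual inspection, is not modelled.

```python
import math


def tails(hyb, fraction=0.05):
    """Return (low_tail, high_tail): clone IDs in the bottom and top
    `fraction` of one hybridisation, ranked by log2(high pulp / low pulp)."""
    ratio = {c: math.log2(v["high"] / v["low"]) for c, v in hyb.items()}
    ranked = sorted(ratio, key=ratio.get)
    k = max(1, int(len(ranked) * fraction))
    return set(ranked[:k]), set(ranked[-k:])


def select_clones(hybs, fraction=0.05, min_fold=4.0):
    """Apply criteria 1 and 2 to a list of hybridisations.

    Each hybridisation maps clone ID -> dict with keys 'high', 'low'
    (median signal per sample) and 'bg_high', 'bg_low' (background).
    """
    low_sets, high_sets = zip(*(tails(h, fraction) for h in hybs))
    # Criterion 1: in the same extreme tail in every hybridisation.
    consistent = set.intersection(*low_sets) | set.intersection(*high_sets)
    n = len(hybs)
    selected = []
    for clone in sorted(consistent):
        # Criterion 2: mean signal at least min_fold x mean background
        # in both channels.
        strong = all(
            sum(h[clone][ch] for h in hybs) / n
            >= min_fold * sum(h[clone]["bg_" + ch] for h in hybs) / n
            for ch in ("high", "low")
        )
        if strong:
            selected.append(clone)
    return selected
```

In use, `hybs` would hold one entry for each dye orientation of each biological replicate, so that a clone is retained only when every hybridisation agrees.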


2.2.8 Sequencing and phylogenetic analysis

The 5’ regions of all differentially expressed cDNAs were sequenced from plasmid DNA using the pBluescript reverse primer (5’ GGAAACAGCTATGACCATG 3’). Plasmid DNA was isolated from single cell cultures of each differentially expressed clone using a plasmid mini prep kit (QIAGEN GmbH, Hilden). Sequencing was performed using BigDye Terminator Version 3 reagents and an ABI PRISM sequence analyzer. Further sequences for selected clones were obtained using the pBluescript forward primer (5’ GTAAAACGACGGCCAGT 3’) and internal primers. The individual ESTs were searched against the GenBank database using the TBLASTX program (National Center for Biotechnology Information http://www.ncbi.nlm.nih.gov./Entrez) in order to identify significant similarity with published gene data. Predicted amino acid sequences were obtained using the ExPASy translate tool (http://kr.expasy.org/tools/dna.html). Multiple sequence alignments of candidate pulp genes with published sequences were obtained using the CLUSTAL program and phylogenetic analysis was performed using MEGA version 3.0 (Kumar et al., 2004). Candidate pulp yield genes were selected on the basis of their significant similarity to genes with known functions and their expression ratio.

2.2.9 Real-Time RT-PCR analysis

Real-time PCR was performed on selected candidate genes (EgrCesA3, EgrNAM1 and EgrHB1) in an attempt to confirm the microarray results. The quality and quantity of individual total RNA samples were evaluated prior to RT-PCR by photometric measurements after gel electrophoresis with a standard RNA marker (Promega, Madison, USA). The housekeeping gene EgrSEC13 was used as a control gene to normalize transcript levels between different samples. This gene showed no difference in expression between top and bottom xylem in eucalypt branches and was used as a control gene for comparing splice variant expression in CCR in RT-PCR experiments (Thumma et al., 2005). The EgrSEC13 forward and reverse primers (Table 2.1) were a gift from Colleen MacMillan (CSIRO Forest Biosciences, Canberra). cDNA derived from each of five high and five low pulp trees was used as template for RT-PCR amplification with each gene. This was repeated three times with different RNA samples from the high and low pulp yield groups.

Five micrograms of total RNA from each sample was used for cDNA synthesis using SuperScript II reverse transcriptase (Invitrogen). Approximately 1-2 µl RNA (5 µg) was mixed with 1 µl of oligo(dT) 21-mer (500 µg/ml; Proligo, Australia), 1 µl of dNTPs (10 mM each) and 8-9 µl nuclease-free water to a total volume of 12 µl. The mixture was incubated at 65°C for 5 min followed by brief cooling on ice and centrifugation to collect the contents. The following were then added: 4 µl of 5x First-Strand Buffer, 2 µl of 0.1 M DTT and 1 µl of RNase Out (40 units/µl; Invitrogen), and the contents were mixed gently by pipetting. The samples were incubated at 42°C for 2 min, then 1 µl SuperScript II reverse transcriptase (200 units) was added and mixed by pipetting up and down. The synthesis of cDNA was performed at 42°C for 50 minutes and the reverse transcriptase was then inactivated by heating at 70°C for 15 minutes. Each reverse transcriptase reaction was diluted to 100 ng/µl, and 1 µl was added as template for the real-time PCR reaction.

Forward and reverse gene-specific primers were designed manually and are listed in Table 2.1. All primers had a Tm of > 55°C and were designed to produce a PCR product of 180 to 203 bp. To avoid amplification of genomic DNA, one primer of each pair also spanned an intron splice junction. The primers were purchased from Proligo, Australia.

Table 2.1 Real-Time RT-PCR primers

Target gene  Primer      Sequence (5' - 3')      Amplicon length (bp)
EgrCesA3     EgrCesA3 F  ATGCCTTGGTTCGTGTCTC     203
             EgrCesA3 R  CCTGTTGGCATACCGATCA
EgrNAM1      EgrNAM1 F   AATATCAAGATAAGGACCCG    189
             EgrNAM1 R   TTCTTCTTTGACCTTGGTAG
EgrHB1       EgrHB1 F    CAGTAAAGGAAGCTTCTAGC    180
             EgrHB1 R    ATAGAGGCCACATGTTCCT
EgrSEC13     EgrSEC13a   GATGCACTCTGATTGGGTCA    250
             EgrSEC13b   CACTGCCTCATTCCACAAAG
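The text does not state how primer Tm was estimated. As a rough illustration only, the sketch below assumes the Wallace rule (2°C for each A or T and 4°C for each G or C), which is a coarse approximation for short oligonucleotides and may differ from the method actually used. The two sequences are taken from Table 2.1; the function name is hypothetical.

```python
def wallace_tm(primer):
    """Approximate melting temperature (°C) using the Wallace rule.

    Assumption: Tm = 2 * (A + T) + 4 * (G + C). This is a rough estimate
    and is not necessarily the calculation used for the original design.
    """
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


primers = {
    "EgrCesA3 F": "ATGCCTTGGTTCGTGTCTC",
    "EgrNAM1 F": "AATATCAAGATAAGGACCCG",
}
tm = {name: wallace_tm(seq) for name, seq in primers.items()}
```

Under this rule the two forward primers give estimates of 58°C and 56°C respectively, both above the 55°C design limit.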

Standard curves were generated using the EgrHB1 and EgrNAM1 genes to estimate the transcript levels for RT-PCR. The EgrSEC13 gene was used as a positive control, and water was used as a no-template control in each standard curve. Seven serially diluted concentrations (0, 25, 50, 100, 150, 200 and 250 ng/µl) of cDNA derived from a single tree were used in each experiment and measurements at each concentration of cDNA were repeated three times. Relative concentrations were determined by analyzing the raw data using Rotor-Gene 2000 software version 5.0 (Corbett Research, Sydney, Australia) and standard curves were then computed using Microsoft Excel software.

Quantitative RT-PCR was carried out in a fluorometric thermal cycler (Rotor-Gene 2000, Corbett Research, Sydney, Australia) with a final volume of 20 µl. The reaction mixture contained 1 µl of a 1:10000 dilution of SYBR Green I (Invitrogen), 1 µl template (~100 ng/µl), 2 µl of 10x buffer, 1.6 µl dNTPs (2.5 mM each), 2.8 µl of 25 mM MgCl2, 1 µl of 20 µM each primer, 0.16 µl of 5 U/µl Taq Platinum Polymerase (Invitrogen) and 9.44 µl of distilled water. Real-time PCR reactions were performed with the following cycling conditions: 5 min initial denaturation at 94°C, 45 cycles of 15 s at 94°C, 15 s at 53°C and 25 s at 72°C, followed by 5 min at 40°C and 1 min at 55°C. This was followed by a melting-curve program (53°C to 99°C, with a 5 s hold at each temperature). For each experiment, the EgrSEC13 housekeeping gene was used as a positive control to correct for uneven amounts of cDNA within samples. A no-template reaction was also included as a negative control in each experiment. Each cDNA sample was tested in triplicate and the mean values calculated. Raw data were analysed with the Rotor-Gene 2000 software version 5.0 (Corbett Research). Concentrations of each cDNA sample were calculated and the results corrected for uneven cDNA template and background using the controls.
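As a check on the reaction mixture described above, the component volumes can be summed and the final concentrations derived from the stock concentrations. The sketch below uses only the volumes and stocks given in the text; the variable names are illustrative.

```python
FINAL_VOLUME_UL = 20.0

# name: (volume in µl, stock concentration, unit), as stated in the text
components = {
    "dNTPs (each)": (1.6, 2.5, "mM"),
    "MgCl2": (2.8, 25.0, "mM"),
    "forward primer": (1.0, 20.0, "µM"),
    "reverse primer": (1.0, 20.0, "µM"),
}
# SYBR Green I, template, 10x buffer, Taq polymerase, distilled water (µl)
other_volumes_ul = [1.0, 1.0, 2.0, 0.16, 9.44]

total_volume = sum(v for v, _, _ in components.values()) + sum(other_volumes_ul)

# Final concentration = volume x stock / final volume
final_conc = {
    name: vol * stock / FINAL_VOLUME_UL
    for name, (vol, stock, _) in components.items()
}
taq_units = 0.16 * 5.0  # µl x U/µl
```

The volumes add up to the stated 20 µl, giving final concentrations of 0.2 mM for each dNTP, 3.5 mM MgCl2 and 1 µM for each primer, with 0.8 U of polymerase per reaction.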

All data were expressed as mean values and differences between the high and low pulp groups were determined using a t-test.
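The comparison between groups can be illustrated with a short sketch. Two points are assumptions rather than details given in the text: that normalisation amounts to dividing the target gene concentration by the EgrSEC13 concentration of the same sample, and that the t-test is the two-sample Student's test with pooled variance. Only the test statistic and degrees of freedom are computed here; a p-value would require a t-distribution table or a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def normalise(target, reference):
    """Divide each target concentration by the control gene concentration
    measured in the same sample (assumed normalisation scheme)."""
    return [t / r for t, r in zip(target, reference)]


def pooled_t(a, b):
    """Two-sample Student t statistic with pooled variance.

    Returns (t, degrees_of_freedom) for samples a and b.
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df
```

With five high and five low pulp trees per gene, as described above, `pooled_t` would be called with two lists of five normalised values and would have 8 degrees of freedom.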

2.3 RESULTS

In order to identify genes potentially involved in wood fibre development, in particular genes affecting pulp yield, cDNA microarrays containing ~5800 cDNA clones were screened with probes synthesised from RNA bulks isolated from trees with either high or low pulp yield. In total, 46 transcripts were found to be differentially expressed; 27 transcripts were more abundant and 19 were less abundant in high pulp trees compared to low pulp trees. Partial DNA sequences were obtained from the 5’ end of clones and trimmed manually to remove vector and poor quality sequences. The average sequence length was 700 bp after trimming. All ESTs from the pulp yield screens were assigned a cellular role on the basis of sequence similarity with genes of known function as revealed using TBLASTX with a cut-off E value < e-10. In total 43 (93.48%) ESTs showed significant similarity to gene sequences deposited in public databases (Table 2.2 and Table 2.3).

2.3.1 Selected genes more active in high pulp yield progeny

Table 2.2 lists the results of homology searches of differentially expressed genes which were more active in high pulp yield trees. Most high pulp yield ESTs were highly similar (score >300) to Arabidopsis proteins of known function, including cellulose synthase, beta-expansin, xyloglucan endotransglycosylase, beta-amylase, pectate lyase, heat shock protein, ATPase, homeodomain-leucine zipper protein and methionine adenosyltransferase. Some of these genes are well characterized and are known to play a significant role in cell wall formation. One differentially expressed clone (p59h2) had homology to two different genes, namely XTH9 and a galactokinase family protein. The 5’ sequence of clone p59h2 showed strong similarity with XTH9 whereas the 3’ sequence shared homology with a kinase family protein. Two clones, p26h2 and p34g6, showed strong homology with Arabidopsis cellulose synthase protein (AtCesA7) but the clone p34g6 was truncated. Further sequencing revealed that the two clones are very likely to be derived from transcripts of the same gene as they shared 98% DNA sequence identity. Six transcripts were moderately similar (scores between 100 and 300) to known Arabidopsis proteins whereas the rest of the up-regulated clones showed low similarity, with a BLASTX score <100.
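The similarity classes used above can be expressed as a small classification function and applied to the BLAST scores listed in Table 2.2. The scores are those given in the table (taking the XTH9 hit for clone p59h2); treating a score of exactly 300 as moderate is an assumption, chosen because it reproduces the count of six moderately similar transcripts.

```python
from collections import Counter


def similarity_class(score):
    """Classify a BLAST score; None means no database hit."""
    if score is None:
        return "no hit"
    if score > 300:
        return "high"
    if score >= 100:
        return "moderate"  # assumption: 100 to 300 inclusive
    return "low"


# BLAST scores from Table 2.2, in table order (None = no hit).
table_2_2_scores = [
    311, 327, 311, None, 154, 75.5, 351, 427, 474, 198, 362, 83.6, 411,
    None, 428, 300, 85.5, 231, 479, 394, 554, 231, 473, 218, 481, 403,
]
class_counts = Counter(similarity_class(s) for s in table_2_2_scores)
```

Applied to these scores, the function gives 15 clones in the high class, 6 moderate, 3 low and 2 with no hit.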


Table 2.2 Selected genes more active in high pulp yield progeny.

Clone ID      BLAST homology                                  BLAST score  Arabidopsis locus tag  Average expression ratio
p59h2         Xyloglucan endotransglycosylase (XTH9)          311          At4g03210              2.69
              Galactokinase                                   412          At3g10700
p8g1          Unknown protein                                 327          At5g57410              2.18
p13d10        Clathrin binding protein                        311          At4g31480              2.14
p36h1         No hit                                                                              1.98
p26h1         Unknown protein                                 154          GI:9294015             1.82
p27a5         No apical meristem (NAM) family protein         75.5         At1g33060              1.81
p34g6         Cellulose synthase (AtCesA7)                    351          At5g17420              1.79
p41a7         Unknown protein                                 427          At5g55120              1.71
p28c2         Pectate lyase                                   474          At4g24780              1.69
p33g10        Protein phosphatase type 2C                     198          At3g02750              1.62
p46d2         Beta-expansin (EXPB3)                           362          At4g28250              1.58
p10b10        Calmodulin-binding family protein               83.6         At3g22190              1.58
p65d3         Class III Homeodomain leucine zipper protein    411          At1g52150              1.57
p17c10        No hit                                                                              1.56
p8c1          Unknown protein                                 428          At5g57410              1.54
p9d5          Protein arginine N-methyltransferase            300          At2g19670              1.54
p65f10        Peptidoglycan-binding LysM                      85.5         At5g62150              1.53
K1ENcold3C1   Translationally controlled tumor protein        231          At3g16640              1.51
p59d12        Methionine adenosyltransferase (SAM 2)          479          At4g01850              1.50
K1ENcold2H9   Beta-amylase like protein                       394          At5g18670              1.50
p65h1         ATPase (AVP1)                                   554          At1g15690              1.45
p25f6         Map 4 kinase alpha2                             231          At3g15220              1.45
p44e2         Alcohol dehydrogenase                           473          At1g77120              1.44
K3ENcold10D2  ATPase                                          218          At1g17840              1.42
p26h2         Cellulose synthase (AtCesA7)                    481          At5g17420              1.41
p23b6         Heat shock protein                              403          At5g52640              1.27


2.3.2 Selected genes less active in high pulp yield progeny

Table 2.3 lists those clones whose expression was found to be down-regulated in high pulp yield trees compared to low pulp yield trees. Of 19 down-regulated clones, 15 showed significant similarity with plant sequences in public databases, with BLASTX scores > 100. Three clones had BLAST scores <100 and only one did not show any homology with known protein sequences. All down-regulated ESTs are homologous to Arabidopsis or other plant proteins with known function except one EST (p54c4), which showed similarity to an integral membrane transporter protein from Homo sapiens. The most strongly up-regulated transcripts in low pulp yield progeny were a Ras-related/GTP binding protein, glutathione peroxidase and a zinc binding protein kinase C inhibitor. Interestingly, two different zinc binding protein transcripts were more abundant in the low pulp yield progeny.


Table 2.3 Selected genes less active in high pulp yield progeny.

Clone ID      BLAST homology                                  BLAST score  Arabidopsis locus tag  Average expression ratio
p1b8          Unknown protein                                 293          GI:9294179             0.33
p49a3         GTP-binding protein                             392          At4g35860              0.36
p54b12        Zinc binding protein kinase C inhibitor         147          At3g56490              0.36
p14c11        Glutathione peroxidase                          294          At4g11600              0.39
p46f5         Dicarboxylate/tricarboxylate carrier protein    463          GI:21554157            0.47
p71b2         ABC transporter family protein                  296          At5g06530              0.47
p27g6         UDP-xylose synthase 4 (UXS4)                    561          At2g47650              0.50
p17a8         Cytochrome b                                    55.8         GI:402962              0.51
Mud9          No hit                                                                              0.56
p17a4         Vacuolar proton pyrophosphatase (H_PPase)       441          At1g15690              0.57
p29f4         Zinc finger (C3HC4-type ring finger) protein    177          At1g73760              0.57
p41f4         Unknown protein                                 279          At4g27720              0.58
p26d2         Stromal cell-derived factor 2-like protein      239          At2g25110              0.59
p54c4         Integral membrane transporter protein           54.7         *HS                    0.60
Mud5          Auxin induced gene                              51.5         *ZE                    0.60
p26g2         Glyceraldehyde-3-phosphate dehydrogenase        478          At1g13440              0.62
p47d12        Unknown protein                                 225          At1g15760              0.64
p2h1          14-3-3 protein GF14 phi (GRF4)                  392          At1g35160              0.68
p44d4         Phosphatidic acid phosphatase 2                 295          At1g15080              0.70

* HS = Homo sapiens, ZE = Zinnia elegans

There are no established criteria for similarity searches (BLAST scores or E values) to determine orthologs within gene families (Sterky et al., 2004). The above results demonstrated that most of the differentially expressed clones are homologous to sequences found in the Arabidopsis genome.


2.3.3 Functional classification of selected pulp yield genes

ESTs with protein identities were classified into 12 functional groups using BLAST annotation data. Table 2.4 describes the functional categorization of the 46 transcripts detected in high pulp vs low pulp xylem expression analysis. Genes involved in cell wall biogenesis, general metabolism and transport were among the most represented in the comparisons.


Table 2.4 Functional classification of selected pulp yield genes

Functional category Putative gene identification

Cell wall biogenesis    Cellulose synthase (AtCesA7)
                        Pectate lyase
                        Beta-expansin
                        Peptidoglycan-binding LysM
                        Xyloglucan endotransglycosylase
                        UDP-xylose synthase 4
Heat shock/stress       Heat shock protein
                        Glutathione peroxidase
Protein folding         Methionine adenosyltransferase (SAM2)
                        S-adenosylmethionine-dependent methyltransferase
Cytoskeleton            Map 4 kinase alpha2
Protein synthesis       Beta-amylase like protein
                        Translationally controlled tumor protein
                        Calmodulin-binding protein
                        Clathrin binding protein
Transcription           Class III Homeodomain-leucine zipper protein
                        No apical meristem (NAM) protein
                        Zinc finger (C3HC4-type Ring finger)
                        Zinc binding protein kinase C inhibitor
Metabolism              Alcohol dehydrogenase
                        Galactokinase
                        Cytochrome b
                        Stromal cell-derived factor 2-like protein
                        Glyceraldehyde-3-phosphate dehydrogenase
Transport               ATPase
                        ATPase (AVP)
                        Vacuolar proton pyrophosphatase (HPPase)
                        Integral membrane transporter protein
                        Dicarboxylate/tricarboxylate carrier protein
                        ABC transporter family protein
Signal transduction     Protein phosphatase 2C (PP2C)
                        GTP binding protein
                        14-3-3 protein GF14 phi (GRF4)
                        Phosphatidic acid phosphatase 2 (ATPAP2)
Hormone                 Auxin induced gene
Unknown protein
No hit


2.3.4 Functional groupings of selected pulp yield genes

Figure 2.2 shows the distribution into functional groups of candidate pulp yield genes detected in high pulp vs low pulp xylem expression analysis. There were some clear differences in the abundance of transcripts falling into certain functional categories.

There appeared to be greater activation of genes involved in cell wall (23%) and cytoskeleton (4%) biogenesis, protein folding (7%) and protein synthesis (15%) in trees with higher pulp yield. Only one gene involved in cell wall biogenesis was found to be active in low pulp trees. Genes involved in metabolism (16%), transport (21%), and signal transduction (16%) were more active in lower pulp yield trees. The percentage of genes in other categories was broadly similar between the two pulp yield classes.

[Figure 2.2: two pie charts, one for selected genes more active in high pulp tree xylem and one for selected genes less active in high pulp tree xylem. Legend categories: cell wall biogenesis, heat shock/stress, protein folding, signal transduction, hormone, unknown protein, transporter protein, cytoskeleton, no hits, metabolism, protein synthesis, transcription.]

Figure 2.2 Distribution of selected pulp yield genes into functional groups.


2.3.5 Candidate genes putatively affecting pulp yield

A number of the candidate genes identified in the microarray analysis have similarity to genes that are involved in cell wall biogenesis, and were considered candidates for influencing pulp yield (Table 2.5). These included a cellulose synthase gene highly homologous to EgrCesA3 (Ranik and Myburg 2006), a homeodomain-leucine zipper protein (EgrHB1), a no apical meristem family protein (EgrNAM1), a zinc finger family protein (EgrZnf1), a xyloglucan endotransglycosylase and a galactokinase. EgrCesA3, EgrNAM1, EgrHB1 and EgrZnf1 were selected for further analysis.

Table 2.5 Candidate pulp yield genes identified in Eucalyptus nitens.

Pulp yield  Clone ID  Abbreviation  Gene                                  Function
Up          P26h2     EgrCesA3      Cellulose Synthase                    Secondary cell wall biogenesis
Up          P27a5     EgrNAM1       No apical meristem family protein     Plant development
Up          P59h1F    EgrXET        Xyloglucan endotransglycosylase       Cell Expansion
Up          P59h1R    EgrGalK       Galactokinase                         Carbohydrate transport and metabolism
Up          P65d3     EgrHB1        Class III HD-Zip                      Vascular development
Down        P29f4     EgrZnf1       Zinc finger (C3HC4-type Ring finger)  Protein binding, zinc ion binding


2.3.6 Real-Time PCR assays

The reliability of microarray data can be compromised due to the large number of genes involved in the hybridisations. In addition, plants often possess multigene families that can give rise to cross-hybridisation between cDNA representatives of members of the same family. A common approach to confirm microarray results for candidate genes is to use real-time PCR (RT-PCR) analysis (Ehlting et al., 2005; Kinser et al., 2004; Lancaster et al., 2004; Maguire et al., 2002; Klok et al., 2002; Jones et al., 2002). Expression of three candidate pulp yield genes was validated by quantitative RT-PCR using the same RNA samples that were used for the microarray hybridisation (detailed in the Materials and Methods section). For successful RT-PCR analysis an optimal primer fit is the most important determinant (Pfaffl and Hageleit, 2001). Primers were designed to maximize annealing specificity and annealing temperature, to minimize primer dimer formation, and to make optimal use of buffer conditions. In addition, amplification of genomic DNA was avoided by designing primer pairs that span exon:exon splice junctions, so that DNase treatment of RNA samples was unnecessary. All primer pairs produced the expected PCR products (Figure 2.3 and Table 2.1).

[Figure 2.3: three gel images, each with a 500 bp size marker. Panel 1: EgrCesA3 and EgrSEC13 products. Panel 2: EgrSEC13 and EgrHB1 products. Panel 3: EgrSEC13 and EgrNAM1 products.]

Figure 2.3 Real-Time PCR primer pair products of EgrCesA3 (2.3.1), EgrHB1 (2.3.2), EgrNAM1 (2.3.3), and EgrSEC13 (control gene primer). All products are shown in triplicate reactions. The product in the first lane of Figure 2.3.2 is the EgrHB1 product.


2.3.7 Expression of control gene in real-time RT-PCR

The level of expression of EgrSEC13 was relatively constant in all real-time PCR amplifications. Figure 2.4 demonstrates the reliability of EgrSEC13 amplification as a control gene in this study. The PCR signal is initially below the level of detection and increases with cycle number until it crosses a threshold. In total 12-13 cycles were needed to reach this threshold level.

[Figure 2.4: amplification plot. Traces are labelled EgrCesA3 amplification and EgrSEC13 amplification; y-axis, fluorescence; x-axis, cycle number; the Ct threshold is marked.]

Figure 2.4 Amplification plot for EgrCesA3 and EgrSEC13 using RT-PCR. Logarithmic fluorescence versus cycle number is shown. Each coloured trace represents amplification from the cDNA of an individual tree. The three violet coloured traces represent EgrCesA3 amplified three times from the same cDNA. The remaining coloured traces represent EgrSEC13 amplified three times from each of 5 trees. The Ct-value indicates the number of cycles for an individual sample to reach the threshold level. The threshold is set at a level where the rate of amplification is the greatest during the exponential phase. Data is derived from a single experiment.
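The threshold-cycle idea described in this caption can be illustrated with a minimal Python sketch. The fluorescence readings, the threshold value and the function name `threshold_cycle` are hypothetical and are not taken from the Rotor-Gene output; the sketch only finds the first cycle at which a signal reaches the threshold and interpolates linearly between the two neighbouring cycles.

```python
def threshold_cycle(fluorescence, threshold):
    """Return the fractional cycle at which fluorescence first reaches threshold.

    fluorescence[i] is the reading taken after cycle i + 1.
    Returns None if the threshold is never reached.
    """
    for i, value in enumerate(fluorescence):
        if value >= threshold:
            if i == 0:
                return 1.0
            previous = fluorescence[i - 1]
            # Linear interpolation between cycle i and cycle i + 1.
            fraction = (threshold - previous) / (value - previous)
            return i + fraction
    return None


# Hypothetical readings that double every cycle (illustrative only).
readings = [0.001 * 2 ** cycle for cycle in range(1, 21)]
ct = threshold_cycle(readings, threshold=5.0)
```

A sample with more starting template crosses the threshold earlier, so a lower Ct corresponds to higher transcript abundance.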


2.3.8 Standard transcript levels for quantitative PCR assays

For accurate quantification of transcript abundance, standard curves were constructed for EgrHB1 and EgrNAM1 using the competitive quantitative RT-PCR system (Bustin, 2000). An approximately linear relationship was obtained (EgrHB1: R² = 0.8737, EgrNAM1: R² = 0.9485) between the diluted DNA samples and their corresponding concentrations (Figures 2.5 and 2.6). This demonstrated that the PCR reaction was efficient and that there was a high correlation between the original amount of cDNA in the template and the corresponding Ct values obtained after amplification. This indicated that the RT-PCR assays could be used for quantitative measurement of expression levels.

[Figure 2.5 plot: average comparative concentration (y-axis, 0.00 to 6.00) against cDNA concentration (x-axis: 200, 100, 50, 25, 12.5, 6.25 and 0 ng/ul). Data labels: 5.48, 3.55, 2.61, 2.31, 2.00, 0.86, 0.82. Linear trend line: y = -0.7029x + 5.3279, R² = 0.8737.]

Figure 2.5 Standard curve for EgrHB1 quantitative RT-PCR assays. Seven serial dilutions from 200 ng to 0 ng of EgrHB1 cDNA were used to test the capacity of RT-PCR to detect quantitative changes in gene expression.

[Figure 2.6 plot: average comparative concentration (y-axis, -0.005 to 0.035) against cDNA concentration (x-axis: 250, 200, 100, 50, 25 and 0 ng/ul). Data labels: 0.030, 0.023, 0.012, 0.007, 0.003, 0.000. Linear trend line: y = -0.0061x + 0.0338, R² = 0.9485.]

Figure 2.6 Standard curve for EgrNAM1 quantitative RT-PCR assays. Six serial dilutions from 250 ng to 0 ng of EgrNAM1 cDNA were used to test the capacity of RT-PCR to detect quantitative changes in gene expression.
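As an illustration of how such a trend line and R² value are obtained, the following Python sketch fits an ordinary least-squares line to the six data labels printed on the Figure 2.6 plot. Two assumptions are made here and are not confirmed by the text: the labels are paired with the dilution steps in the order shown on the x-axis, and the x-axis is treated as equally spaced steps (1 to 6) rather than as the concentrations themselves. The function name `linear_fit` is hypothetical. Under these assumptions the fitted slope and intercept come out close to the printed trend line; small differences are expected because the labels are rounded.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, with R squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Data labels read from the Figure 2.6 plot, paired (by assumption) with the
# dilution steps 250, 200, 100, 50, 25 and 0 ng/ul in that order.
dilution_step = [1, 2, 3, 4, 5, 6]
comparative_conc = [0.030, 0.023, 0.012, 0.007, 0.003, 0.000]

slope, intercept, r_squared = linear_fit(dilution_step, comparative_conc)
```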

2.3.9 Quantitative analysis of EgrCesA3, EgrHB1 and EgrNAM1 by real-time PCR

Expression of each gene was quantified in three separate experiments using cDNA templates from high and low pulp yield trees. One hundred ng of cDNA from each sample was used as template in a single reaction, and each sample was run in triplicate. Quantitative expression of each sample was estimated using the Rotor-Gene 2000 software and average expression was compared between high and low pulp yield trees. The level of expression of the three candidate pulp yield genes and their quantitative analysis is shown in Figures 2.7.1 – 2.7.3.
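The comparison between pulp yield groups relies on analysis of variance. A minimal Python sketch of a one-way ANOVA F statistic is given here for orientation only: the expression values are invented, the function name `one_way_anova_f` is hypothetical, and the sketch does not reproduce the software or the exact model used in this study.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of individual values around their own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical comparative concentrations (illustrative only).
high_pulp = [4.0, 5.0, 6.0]
low_pulp = [1.0, 2.0, 3.0]
f_stat, df_between, df_within = one_way_anova_f([high_pulp, low_pulp])
```

The p-value is then read from the F distribution with the two degrees of freedom returned by the function.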


Expression of EgrCesA3 (Figure 2.7.1) was slightly higher in high pulp yield trees compared to low pulp yield trees; however, the difference was not significant based on analysis of variance (p=0.05).

[Figure 2.7.1: bar chart of comparative concentration (y-axis, 2.50 to 3.40) for the H-Pulp and L-Pulp groups (x-axis, pulp yield group).]

Figure 2.7.1 Quantitative expression of EgrCesA3 in high and low pulp yield trees. In each group 15 trees from three different experiments were used for comparison analysis. Error bars denote the standard deviation.


The average expression of EgrHB1 was significantly (p = 0.01) higher in low pulp yield trees than in high pulp yield trees (Figure 2.7.2). The quantitative expression was 0.13 in high pulp trees and 0.18 in the low pulp group, with average standard deviations of 0.06 and 0.08, respectively.

[Figure 2.7.2: bar chart (y-axis, 0.00 to 0.25) for the H-Pulp and L-Pulp groups (x-axis, pulp yield group); an asterisk marks the significant difference.]

Figure 2.7.2 Quantitative expression of EgrHB1 in high and low pulp yield trees. In each group 15 trees from three different experiments were used for analysis. Error bars denote the standard deviations. The asterisk indicates a significant difference between the two pulp yield groups by ANOVA.


The average expression of EgrNAM1 was significantly higher (p = 0.05) in high pulp yield trees than in low pulp yield trees (Figure 2.7.3). The quantitative expression was 4.96 in high pulp trees and 3.47 in low pulp trees, with average standard deviations of 1.18 and 0.25, respectively.

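[Figure 2.7.3: bar chart of average comparative concentration (y-axis, 0.00 to 6.00) for the H-Pulp and L-Pulp groups (x-axis, pulp yield group); an asterisk marks the significant difference.]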

Figure 2.7.3 Quantitative expression of EgrNAM1 in high and low pulp yield trees. In each group 10 trees from two different experiments were used for analysis. Error bars denote the standard deviations. The asterisk indicates a significant difference between the two pulp yield groups by ANOVA.


2.3.10 Phylogenetic analysis

Cellulose synthase

The cellulose synthase gene (EgrCesA3), which was more strongly expressed in high pulp trees, shared the strongest DNA homology (99%) with EgCesA3 (AAY60845.1), a secondary cell wall cellulose synthase identified in eucalypts (Ranik and Myburg, 2006). It also shared strong amino acid sequence similarity with AtCesA7 (88%) and PtrCesA2 (87%) from Arabidopsis and poplar respectively. The full length EgrCesA3 cDNA is 3305 bp in length with an uninterrupted open reading frame of 3120 bp encoding a predicted protein sequence of 1040 amino acids. Thirty-six full length CesA amino acid sequences from Eucalyptus grandis, Arabidopsis thaliana, Oryza sativa, Populus tremuloides, Pinus taeda and Pinus radiata were compared with the EgrCesA3 protein and a phylogenetic tree illustrating the position of EgrCesA3 is shown in Figure 2.8.

[Figure 2.8: unrooted neighbor-joining tree of CesA proteins; the tip labels, including EgrCesA3 (Pulp), correspond to the proteins listed in the caption.]

Figure 2.8 Unrooted phylogenetic analysis of CesA proteins from Arabidopsis thaliana, Populus tremuloides, Pinus spp., Oryza sativa and Eucalyptus grandis. CLUSTALW was used to align deduced amino acid sequences and the unrooted neighbor-joining tree was obtained using MEGA 3.1. Bold lines indicate the group of secondary cell wall CesAs related to EgrCesA3. GenBank accession numbers for the proteins used were as follows: Arabidopsis thaliana, AtCesA1 (AF027172), AtCesA2 (AF027173.1), AtCesA3 (AF027174), AtCesA4 (AF458083.1), AtCesA5 (NM_121024.1), AtCesA6 (NM_125870), AtCesA7 (AF088917), AtCesA8 (AF267742), AtCesA9 (NM_127746.1), AtCesA10 (NM_128111.2); Populus tremuloides, PtrCesA1 (AF072131.1), PtrCesA2 (AAM26299.1), PtrCesA3 (AF527387.1), PtrCesA5 (AY055724.2), PtrCesA6 (AAP40636.1), PtrCesA7 (AY162180); Oryza sativa L., OsCesA1 (AAU44296.1), OsCesA2 (AAP21426.1), OsCesA3 (BAD30574), OsCesA4 (BAC57282.1), OsCesA5 (NP_001051648.1), OsCesA7 (NP_001051830.1), OsCesA8 (NP_001059303.1), OsCesA9 (AF200533); Pinus taeda, PtCesA2 (AAX18648.1), PtCesA3 (AAX18649.1); Pinus radiata, PrCesA1 (AAT57672.1), PrCesA2 (AAQ63936.1), PrCesA10 (AAQ63935.1); Eucalyptus grandis, EgCesA1 (AAY60843.1), EgCesA2 (AAY60844.1), EgCesA3 (AAY60845.1), EgCesA4 (AAY60846.1), EgCesA5 (AAY60847.1), EgCesA6 (AAY60848.1).


No Apical Meristem (NAM)

NAM is a transcription factor and belongs to a family of proteins containing a NAC domain. The EgrNAM1 protein shared the highest amino acid sequence similarity (28%) with Arabidopsis ANAC014 (AT1G33060). While several NAC domain proteins occur in poplar, no poplar sequence with close similarity to NAM/CUC2 or EgrNAM1 was identified in the poplar genome sequence. Further sequence analysis revealed that the EgrNAM1 cDNA is truncated and lacks a start codon. The partial EgrNAM1 cDNA is 1490 bp in length and contains an incomplete open reading frame of 1221 bp encoding a predicted protein of 407 amino acids (Figure 2.9). An alignment of the predicted partial amino acid sequence of EgrNAM1 and the closely related Arabidopsis ANAC014 protein sequence shows the regions of strongest similarity between the two sequences (Figure 2.10). The 3’ sequences of a total of 28 NAC proteins with homology to EgrNAM1 were selected for phylogenetic analysis, including VND (a vascular related NAC domain protein), SND1 (a secondary wall associated NAC domain protein), and other NAC domain proteins from Arabidopsis thaliana and Oryza sativa. Figure 2.11 illustrates the relationship of the EgrNAM1 protein with Arabidopsis and other plant NAC proteins. According to the dendrogram the eucalypt protein falls into a distinct sub group that includes the Arabidopsis protein ANAC014.


ggtaacactgatggaacaccattggaggtaaatccattgctcgaaagtggccccaaattc
G  N  T  D  G  T  P  L  E  V  N  P  L  L  E  S  G  P  K  F
tatgatctgccttgtgatccaactgatcacaaaggtttattccctgagcagacacagcta
Y  D  L  P  C  D  P  T  D  H  K  G  L  F  P  E  Q  T  Q  L
aaaacagggcaagctccttacatggattctccttttgcctgtgattttggcaatgatctt
K  T  G  Q  A  P  Y  M  D  S  P  F  A  C  D  F  G  N  D  L
gagggactactattccacgatgctggtgagcacgagatcacactctcggatttgttggat
E  G  L  L  F  H  D  A  G  E  H  E  I  T  L  S  D  L  L  D
gaggtcttcaataatcaagatgactcctgtgaagagtccaccagtcagaagaattcagtt
E  V  F  N  N  Q  D  D  S  C  E  E  S  T  S  Q  K  N  S  V
ggaagtgagattccatttgatcctccgttcaacttatcccagccggtcctaggggtcaaa
G  S  E  I  P  F  D  P  P  F  N  L  S  Q  P  V  L  G  V  K
gataatgtatggtatgatgatcttgttgacaataataattggatggatgcgttacctttg
D  N  V  W  Y  D  D  L  V  D  N  N  N  W  M  D  A  L  P  L
tggggaccatctggagctgttgcttcaaataatatggcagaatcaactggtgaatctttc
W  G  P  S  G  A  V  A  S  N  N  M  A  E  S  T  G  E  S  F
aatagcaatggtggaactccgatcaagatcaggacacgtgactctcaagtcctaccacat
N  S  N  G  G  T  P  I  K  I  R  T  R  D  S  Q  V  L  P  H
tcaagtgattatggagcgcagggtactgctcctagaaggttgcgtctggcggtggatcga
S  S  D  Y  G  A  Q  G  T  A  P  R  R  L  R  L  A  V  D  R
gccaatcaactaccaagaagcgtaaagctagaaggagctgatgagggaatgagaggtact
A  N  Q  L  P  R  S  V  K  L  E  G  A  D  E  G  M  R  G  T
ggctctaatgcagaagaagaagttcagtcaactgctgaggagagttctgttaaccacaat
G  S  N  A  E  E  E  V  Q  S  T  A  E  E  S  S  V  N  H  N
gacaacacgattgtcggaaccaatatcaagataaggacccgtcaacctcgacatcaacca
D  N  T  I  V  G  T  N  I  K  I  R  T  R  Q  P  R  H  Q  P
ggctcagagaattcaatagttcaaggtattgctcccagaagaatccggctgcagatgaac
G  S  E  N  S  I  V  Q  G  I  A  P  R  R  I  R  L  Q  M  N
agtcaatctgggtcaactagggatgatgaagttaaaacttcaagcttcgatgaagaagta
S  Q  S  G  S  T  R  D  D  E  V  K  T  S  S  F  D  E  E  V
cagtctgctcctaccaaggtcaaagaagaaacggacaatatttctgatcacaacgaacca
Q  S  A  P  T  K  V  K  E  E  T  D  N  I  S  D  H  N  E  P
gagagagacggtcagcttcctgctcatgataagatgacagaaattgtcgaagagccgtgt
E  R  D  G  Q  L  P  A  H  D  K  M  T  E  I  V  E  E  P  C
acaaacttaacgttggggtcaaaaccaggtggggaatcacgtagtcatgcgatggctcca
T  N  L  T  L  G  S  K  P  G  G  E  S  R  S  H  A  M  A  P
gcatctttgggaataccttcagtacgccctgcttctcgtttccgcttatctttacttttc
A  S  L  G  I  P  S  V  R  P  A  S  R  F  R  L  S  L  L  F
acagtcggcatgtctcttgtcatagtcctggccgtagttttctctggtggggtacagaga
T  V  G  M  S  L  V  I  V  L  A  V  V  F  S  G  G  V  Q  R
tacatcaaactggacattgcgtagagaaagagccgttttccatcccatctttacatatgg
Y  I  K  L  D  I  A  -
catagggtacgcgtggagacgtctaatagaggatgatgcagatattttatgtattgcatc
ttttgctgtaaaatatatttccctgtatctagcagaggagcctgctctgttgtgggcttc
tggattcagtgagacaacagatttctgtatckttagtgagtaagasctttatatggatc

Figure 2.9 Nucleotide and deduced amino acid sequence of the partial EgrNAM1 cDNA. The deduced amino acid sequence is shown below the nucleotide sequence in single-letter code. Bold letters indicate the stop codon of the gene.


EgrNAM1 1 GNTDGTPLEVNPLLESGPKFYDLPCDP-TDHKGLFPEQTQLKTGQAPYMD 49 |...|.:|.||||...|..: .| ...|..:|.|:.:....: :|| ANAC014 1 NLGKTLVEENPLLRDVPTLH----GPILSEKSYYPGQSSIGFATS-HMD 44

EgrNAM1 50 SPFACDFGNDLEGLLFHD-AGEHEITLSDLLDEVFNNQDDSCEESTSQKN 98 |.::.||||...||.|.| |.|.:.:|:|:|||||:|.::| |..:|: ANAC014 45 SMYSSDFGNCDYGLHFQDGASEQDASLTDVLDEVFHNHNES---SNDRKD 91

EgrNAM 99 SV------GSEIPFDPPFNLSQPVLGVKDNVWYDDLVDNNNW 134 .| .:|.|| :||:|.: ||.:.. ANAC014 92 FVLPNMMHWPGNTRLLSTEYPF------LKDSVAF---VDGSAE 126

EgrNAM1 135 MDALPLWGPSGAVASNNMAESTGESFNSNGGTPIKIRTRDSQVLP--HSS 182 :.....:.|. .:||..::| ::.:|.....|...|..|:.|. |:: ANAC014 127 VSGSQQFVPD-ILASRWVSE---QNVDSKEAVEILSSTGSSRTLTPLHNN 172

EgrNAM1 183 DYGAQGTAPRRLRLAVDRANQLPRSVKLEGADEGMRGTGSNAEEEVQSTA 232 .:|...::. ..|:|..|. |..:..||:. ANAC014 173 VFGQYASSS---YAAIDPFNY------NVNQPEQSSF 200

EgrNAM1 233 EESSVNHNDNTIVGTNI-KIRTRQPRHQPGSENSIVQGIAPRRIRLQMNS 281 |:| |.|..|..:|| :.:.|...:|...::.:.||.||||||||:.. ANAC014 201 EQS---HVDRNISPSNIFEFKARSRENQRDLDSVVDQGTAPRRIRLQIEQ 247

EgrNAM1 282 Q-SGSTRDDEVKTSSFDE--EVQSAPTK-VKEETDNISDHNEPERDGQLP 327 . :..|...|....:::| |||||.:| |:||..|:|.....:|..:|. ANAC014 248 PLTPVTNKKERDADNYEEEDEVQSAMSKVVEEEPANLSAQGTAQRRIRLQ 297

EgrNAM1 328 AHDKMTEIVEEPC-----TNLTLGSKPGGESRSHAMAPASLGIPSVRPAS 372 .. :.:|. |...... :.|..|...... |.|..... ANAC014 298 TR------LRKPLITLNNTKRNSNGREGEASHRKCEMQEKEDISSSSSWQ 341

EgrNAM1 373 RFRLSLL-FTVGMSLVIVLAVVFSGGVQRYIKLDIA 407 :.:.||: |: |:||::||: .:.::|. ANAC014 342 KQKKSLVQFS---SVVIIVAVI------VVLVEIWKESRDAKCSFLFHQ 381

EgrNAM1 408 407

ANAC014 382 LDSFKGMFT 390

Figure 2.10 Alignment of deduced amino acid sequences of partial EgrNAM1 and the closely related Arabidopsis NAC protein ANAC014. Numbers indicate the position of the amino acid residues in each protein. Yellow shading indicates amino acid residues that are identical. Double dots indicate amino acids with similar properties. Hyphens indicate gaps in the amino acid alignment. CLUSTALW was used to align deduced amino acid sequences.


[Figure 2.11: unrooted neighbor-joining tree of plant NAC proteins; the tip labels, including EgrNAM1, ANAC014 and NAM/CUC2-protein, correspond to the proteins listed in the caption.]

Figure 2.11 Unrooted phylogenetic analysis of 3’ region of plant NAC proteins. CLUSTALW was used to align deduced amino acid sequences and the unrooted neighbor-joining tree was obtained using MEGA 3.1. NCBI gene ID and GenBank accession numbers for the proteins used were as follows: Arabidopsis thaliana, ANAC (GI:50401153), NAP (GI:50400809), AtNAM (GI:50401213), ATAF1 (NM_100054), ATAF2 (CAC35884.1), CUC1 (NP_188135.1), CUC2 (NP_200206.1), CUC3 (NP_177768.1), NAC1 (GI:51316422), TIP (NP_197847.3), ANAC014 (NP_973954.1), NAM/CUC2-Protein (CAB80274.1), VND1 (NM_127362.1), VND2 (NM_119783.2), VND3 (NM_126028.1), VND4 (NM_101098.2), VND5 (NM_104947.1), VND6 (BT026510.1), VND7 (NM_105851.1), SND1 (NM_103011.1); Oryza sativa L., OsNAC3 (NM_186659), OsNAC4 (XM_483529), OsNAC5 (NP_908352.1), OsNAC6 (XM_464630), OsNAC7 (NM_202450), OsNAC8 (AB028187); Eucalyptus grandis, EgrNAM1.


EgrHB1

The full-length EgrHB1 cDNA is 3709 bp in length with a coding sequence of 2532 bp encoding a predicted protein of 844 amino acids. The cDNA has a long 872 bp 5’ untranslated region (UTR) and 302 bp of 3’ UTR (Figure 2.12). EgrHB1 showed strongest amino acid homology to PopHB5 (89%) and ATHB-15 (85%). An alignment of the predicted amino acid sequences of EgrHB1 and ATHB-15 is shown in Figure 2.13. The predicted amino acid sequence contained a DNA binding homeodomain, a leucine zipper dimerization domain, a sterol-binding domain, a MEKHLA domain and a conserved microRNA miR165/166 target sequence. Figure 2.14 illustrates the phylogenetic relationship of EgrHB1 with other class III HD-Zip proteins from Arabidopsis thaliana, Populus trichocarpa and Zinnia elegans.
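The lengths reported for EgrHB1 can be cross-checked with simple arithmetic. The Python sketch below assumes that the stated coding sequence length excludes the 3 bp stop codon (an interpretation, not stated in the text); the function name `cdna_length` is hypothetical. Under that assumption the parts sum to the reported full-length cDNA.

```python
STOP_CODON_BP = 3  # assumed not to be counted in the stated coding length


def cdna_length(five_prime_utr, amino_acids, three_prime_utr):
    """Return (total cDNA length, coding length) in bp."""
    coding_bp = 3 * amino_acids
    total_bp = five_prime_utr + coding_bp + STOP_CODON_BP + three_prime_utr
    return total_bp, coding_bp


# Values reported for EgrHB1: 872 bp 5' UTR, 844 amino acids, 302 bp 3' UTR.
total_bp, coding_bp = cdna_length(
    five_prime_utr=872, amino_acids=844, three_prime_utr=302
)
# coding_bp is 2532 and total_bp is 3709, matching the reported lengths.
```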


CTCAAATTCAATGAATTGTATTGCTTTTTCTCCATTTTGCAGGGGCAGGGAATAATTATCACCATATCTGA GATCTCCTGTTTTTGACTCCAAACTTCTCCTCCCTGCCCAAACCATCATCTCTCCCCCTCCCTCGCCGCCT CTCTCTCTCCTCATTTAATGGGTCTTCCCCGATAGTTTTTGGATTTTATATCCTTTTCGTGAAAACCCCCC ATTAAAGATGTGTGGACGAGCAGATGAACAGTGGATTCAGAAGAATCTTTATTGGCGCGATGAAGATCGGT GAGGAGGAGAAGAGAAGAGCTTCTGTTGCCGCTTCTTCTTCGGCCATTTGTTGCTCCTGGAGCTGAAATTC GGATCTGGGTGGGAGCAAAGTTCGCGACTTTATATGCATGCCGAAGAACTCGCCGCTGAGTTGACTCGTTG ACTCGCTCGGATCATGTGACCCTGAAGAGTTGAGCTCCGATTCTTGAATAATAGGCAAAAGGGTGTTGGTG TCATGTATGAAGCTTGCGATTGTGAAAGGTTTTAGGTTGAAAAAGCGCTGATCTCTAGAGAGAGATTTTTC ACTGTTCGTCTTGTGTTCGGTGATGCTTCTCAACGGGACAACAAGGGATGAGACTGCGTAATTAGACCGGG GATAATATTTGCTTTGTTGTTTGTATCAAGCAGTTCTCAAGATTGCACTGAAATCTTGGGTATTTGGCATT TTGCATGTGGGTGAGAGTTCTAAAGAGGCGAGAAAAGAAAAGGAGGGGTGCTTTGGGGCCAGGCACTGTGG ACTATTTCCTGAGAAGTTCTGCCGTTTCTTTTCCTGGAAAGTGAGAAATTTTCCCGGAAAGTGAGGGTGAT TGAATTGAATATTTCTGGAGATGGCAACCTCCTGCAAAGAAGGTAAACTCGGGCACAGCAACAGTAGCAAT AGCTTGGACAATGGGAAATATGTGAGGTACACGCCTGAGCAGGTTGAGGCCCTCGAGAGGCTCTACCACGA GTGTCCGAAGCCCAGTTCACTCCGTCGCCAACAGCTGATCAGGGAGTGTCCCATTCTCTCCAATATTGAGC CCAAGCAAATCAAGGTCTGGTTCCAGAACCGAAGATGCAGGGAGAAGCAGAGGAAAGAAGCTTCCCGTTTG CAAGCTGTGAACAGGAAGCTCACTGCGATGAACAAGTTATTGATGGAGGAGAATGATAGGTTGCAGAAGCA AGTTTCTCAGCTGGTGTATGAGAATGGCTATTTCCGCCAACACACCCAGAACACGACGCTTGCAACCAAAG ACACAAGCTGTGAATCGGTGGTGACGAGCGGTCAACACCAGTTGACATCTCAGCATCCTCCCAGGGATGCT AGTCCTGCAGGGCTTTTGTCCATTGCAGAAGAGACTTTAGCAGAGTTTCTTTCAAAGGCCACTGGAACCGC TGTGGAGTGGGTCCAAATGCCTGGAATGAAGCCTGGTCCGGATTCCATTGGAATCGTTGCTATTTCTCATG GTTGCGCTGGCGTGGCAGCACGAGCATGCGGACTTGTGGGTCTTGAACCTACAAGAGTTGCAGAAATCCTA AAGGATCGACCGTCATGGTTCCGTGACTGTCGAGCCGTGGATGTTTTGAACGTGTTGCCAACAGCAAATGG TGGAACCATTGAGCTGCTCTACATGCAGCTCTATGCGCCAACAACCTTGGCGCCAGCCCGTGACTTCTGGT TGCTGCGTTATACTTCTGTTCTGGAAGATGGGAGTCTCGTGGTGTGTGAGAGGTCACTTAAAAATACACAA AATGGTCCAAGCATGCCTCCAGTACAGCCTTTTGTCCGAGCAGAGATGCTCCCTAGTGGCTACTTGGTACG TCCATGTGAAGGTGGTGGTTCAATCATACGCATTGTTGATCACTTGGATCTAGAGCCATGGAGTGTGCCTG 
AAGTACTGCGACCATTGTATGAGTCCTCCACAATGCTTGCTCAGAAGACGACAATGGCAGCTCTGCGACAG CTGAGGCAGATAGCTCAGGAAGTTTCACAGCCTAATGTTTCTGGCTGGGGAAGGCGACCTGCAGCACTTAG AGCTCTTAGCCAGAGGTTAAGCAGGGGATTTAATGAGGCTCTTAATGGATTTACTGATGAAGGATGGTCGA TCATGGGGAATGATGGCATTGATGATGTCACTATTCTCGTGAATTCATCCCCTGACAAGCTAATGGGATTG AATCTTTCGTTTTCAAATGGATTCCCAGCTGTGAGCAACGCTGTTCTATGCGCGAGGGCCTCTATGCTCTT GCAGAATGTGCCTCCTGCAGTCCTCCTTCGCTTCCTCCGTGAGCACAGGTCAGAATGGGCTGACAACAGTA TTGATGCATACTCAGCCGCAGCAGTTAAAGTTGGTTCCTGTGCTTTACCTGGATCACGTATTGGGAGTTTC GGGGGTCAGGTTATACTTCCACTTGCTCATACTATTGAGCATGAAGAGTTCTTGGAGGTCATCAAATTAGA AGGTATGGGCCACTCTCCAGAAGATGCCTTAATGCCTAGAGATATATTTTTCCTGCAAATGTGCAGTGGAG TGGATGAAAATGCTGTGGGAACATTTGCCGAATTGATATTTGCTCCAATTGATGCTTCCTTTGCTGATGAT GCACCTCTTCTTCCTTCTGGGTTTCGTATCATTCCTCTTGATTCAGTAAAGGAAGCTTCTAGCCCTAATCG CACATTGGACCTTGCCTCTTCTCTTGAGATCGGGCCAGCTGGAAATAGGAGTTTTAATGATATTAATGCTA ATTCTGGTTGTACGAGATCAGTGATGACTATCGCATTTGAGTTTGCATTCGAAAGCCACATGCAGGAACAT GTGGCCTCTATGGCCCGCCAATATGTGCTTAGTATAATATCCTCGGTGCAGAGAGTGGCATTGGCACTCTC TCCTTCCAATCTCGGTTCACATGCTGGTCTGCGTACACCTCTTGGCACTCCTGAAGCCCAAACACTTGCTC GCTGGATTTGCCACAGTTATAGGTGCTACTTGGGGGTGGATCTTCTCAAGTCCAGCAATGAAGGAAGTGAG TTGATTCTCAAGAACCTGTGGCATCACTCAGATGCTATTATGTGCTGCTCTCTTAAGGCCTTACCCGTATT CACGTTTGCAAATCAGGCAGGTCTGGACATGCTCGAAACCACCTTGGTGGCGCTGCAAGACATAACCCTGG AAAAGATTTTTGATGATCATGGCCGAAAGACTCTGTGTTCAGAGTTCCCACAAATCATGCAACAGGGTTTT GCTTGTCTTCAAGGTGGGATCTGCCTCTCGAGCATGGGACGACCAGTGTCGTACGAAAGGGCAGTGGCCTG GAAAGTTATGAATGAGGAAGAGAATGCCCACTGCATCTGCTTTATGTTCATCAACTGGTCTTTTGTGTGAT TTCTGTTGCAGAAACTAAGGTATTAAGCTATGTAAGTTGTGAAGAATGACTCTTCATCTAGTGACTGCTAC TTCAAACTCCTATGGTCTGTGAACCTTAGAACTGATGTGTCCTCTCTTGTTTAGACGATCGTCATGTGGAC GCCTGGTCGATGTCGACTCTTTTGCCATGTCTGTGTAGTGGTTATGAATGGACGTGGATGTTATGCTTGGA AGTGGTTGATTATCATTTTCGTGCTGTAGATGTGGACATTACTTTGTTCTAGTTGGAGAGGAATTGAGTTT TTGGTAGAAAAAAAAAAAAAAAAAAAAAAAAA

Figure 2.12 Nucleotide sequence of EgrHB1.


**************************************
ATHB-15   1 MAMSCKDGKLG------CLDNGKYVRYTPEQVEALERLYHDCPKPSSIRRQQLIRECPIL
EgrHB1    1 MATSCKEGKLGHSNSSNSLDNGKYVRYTPEQVEALERLYHECPKPSSLRRQQLIRECPIL

********************************************** ***********************
ATHB-15  55 SNIEPKQIKVWFQNRRCREKQRKEASRLQAVNRKLTAMNKLLMEENDRLQKQVSQLVHEN
EgrHB1   61 SNIEPKQIKVWFQNRRCREKQRKEASRLQAVNRKLTAMNKLLMEENDRLQKQVSQLVYEN

* ***********************
ATHB-15 115 SYFRQHTPNPSLPAKDTSCESVVTSGQHQLASQNPQRDASPAGLLSIAEETLAEFLSKAT
EgrHB1  121 GYFRQHTQNTTLATKDTSCESVVTSGQHQLTSQHPPRDASPAGLLSIAEETLAEFLSKAT

************************************************************
ATHB-15 175 GTAVEWVQMPGMKPGPDSIGIIAISHGCTGVAARACGLVGLEPTRVAEIVKDRPSWFREC
EgrHB1  181 GTAVEWVQMPGMKPGPDSIGIVAISHGCAGVAARACGLVGLEPTRVAEILKDRPSWFRDC

************************************************************
ATHB-15 235 RAVEVMNVLPTANGGTVELLYMQLYAPTTLAPPRDFWLLRYTSVLEDGSLVVCERSLKST
EgrHB1  241 RAVDVLNVLPTANGGTIELLYMQLYAPTTLAPARDFWLLRYTSVLEDGSLVVCERSLKNT

************************************************************
ATHB-15 295 QNGPSMPLVQNFVRAEMLSSGYLIRPCDGGGSIIHIVDHMDLEACSVPEVLRPLYESPKV
EgrHB1  301 QNGPSMPPVQPFVRAEMLPSGYLVRPCEGGGSIIRIVDHLDLEPWSVPEVLRPLYESSTM

************
ATHB-15 355 LAQKTTMAALRQLKQIAQEVTQTNSSVNGWGRRPAALRALSQRLSRGFNEAVNGFTDEGW
EgrHB1  361 LAQKTTMAALRQLRQIAQEVSQPN--VSGWGRRPAALRALSQRLSRGFNEALNGFTDEGW

ATHB-15 415 SVIG-DSMDDVTITVNSSPDKLMGLNLTFANGFAPVSNVVLCAKASMLLQNVPPAILLRF
EgrHB1  419 SIMGNDGIDDVTILVNSSPDKLMGLNLSFSNGFPAVSNAVLCARASMLLQNVPPAVLLRF

ATHB-15 474 LREHRSEWADNNIDAYLAAAVKVGPCS---ARVGGFGGQVILPLAHTIEHEEFMEVIKLE
EgrHB1  479 LREHRSEWADNSIDAYSAAAVKVGSCALPGSRIGSFGGQVILPLAHTIEHEEFLEVIKLE

ATHB-15 531 GLGHSPEDAIVPRDIFLLQLCSGMDENAVGTCAELIFAPIDASFADDAPLLPSGFRIIPL
EgrHB1  539 GMGHSPEDALMPRDIFFLQMCSGVDENAVGTFAELIFAPIDASFADDAPLLPSGFRIIPL

ATHB-15 591 DSAKEVSSPNRTLDLASALEIGSAGTKASTDQSGNSTCARSVMTIAFEFGIESHMQEHVA
EgrHB1  599 DSVKEASSPNRTLDLASSLEIGPAGNRSFNDINANSGCTRSVMTIAFEFAFESHMQEHVA

*********************
ATHB-15 651 SMARQYVRGIISSVQRVALALSPSHISSQVGLRTPLGTPEAQTLARWICQSYRGYMGVEL
EgrHB1  659 SMARQYVLSIISSVQRVALALSPSNLGSHAGLRTPLGTPEAQTLARWICHSYRCYLGVDL

************************************************************
ATHB-15 711 LKSNSDGNESILKNLWHHTDAIICCSMKALPVFTFANQAGLDMLETTLVALQDISLE-KI
EgrHB1  719 LKSSNEGSELILKNLWHHSDAIMCCSLKALPVFTFANQAGLDMLETTLVALQDITLEVKI

************************************************************
ATHB-15 770 FDDNGRKTLCSEFPQIMQQGFACLQGGICLSSMGRPVSYERAVAWKVLNEEENAHCICFV
EgrHB1  779 FDDHGRKTLCSEFPQIMQQGFACLQGGICLSSMGRPVSYERAVAWKVMNEEENAHCICFM

*******
ATHB-15 830 FINWSFV
EgrHB1  839 FINWSF-

Figure 2.13 Comparison of the deduced amino acid sequences of ATHB-15 and EgrHB1. Numbers indicate the position of the amino acid residues of each protein. Black boxes indicate amino acid residues that are identical in both sequences. Grey boxes indicate similar amino acids. Green, blue, red and pink asterisks indicate the position of the DNA binding homeodomain (HB), leucine zipper dimerization domain, putative sterol-binding domain, and PAS-related MEKHLA domain respectively. The yellow shaded box indicates the microRNA miR165/166 target site.
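Percentage identity between two aligned sequences can be computed by comparing them column by column. The Python sketch below does this for the first 60 alignment columns of ATHB-15 and EgrHB1 only, so its result applies to that N-terminal segment and is not comparable with the whole-protein figure reported in the text. The function name `percent_identity` is hypothetical, and excluding gapped columns from the denominator is one convention among several; alignment programs may define identity differently.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over columns without a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# First alignment block of Figure 2.13 (columns 1 to 60).
athb15 = "MAMSCKDGKLG------CLDNGKYVRYTPEQVEALERLYHDCPKPSSIRRQQLIRECPIL"
egrhb1 = "MATSCKEGKLGHSNSSNSLDNGKYVRYTPEQVEALERLYHECPKPSSLRRQQLIRECPIL"
identity = percent_identity(athb15, egrhb1)
```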

[Figure 2.14: unrooted tree of HD-Zip III proteins; the tip labels, including EgrHB1, correspond to the proteins listed in the caption.]

Figure 2.14 Unrooted phylogenetic tree of plant HD-Zip III proteins (containing the MEKHLA domain). CLUSTALW was used to align deduced amino acid sequences and the unrooted dendrogram was obtained using MEGA 3.1. GenBank accession numbers for the proteins used were as follows: Arabidopsis thaliana, ATHB-8 (NP_195014.1), ATHB-15 (NP_175627.1), CORONA (AAW88440.1), PHV (NP_181018.1), REV (NM_125462.2); Zinnia elegans, Zehb1 (CAC84906.1), Zehb2 (CAC84276.1), ZeHB-10 (BAC22512.1), ZeHB-12 (BAC22514.1), ZeHB-13 (BAD01502.1); Populus trichocarpa, PopHB1 (AAX19050.1), PopHB3 (AAX19052.1), PopHB4 (AAX19053.1), PopHB5 (AAX19054.1), PopHB6 (AAX19055.1), PopHB7 (AAX19056.1), PopHB8 (AAX19057.1); Pinus taeda, PtaHDZ31 (ABG73245.1), PtaHDZ32 (ABG73246.1), PtaHDZ33 (ABG73247.1).


Zinc finger (C3HC4-type Ring finger)

The eucalypt zinc finger protein EgrZnf1 cDNA is 1440 bp in length and contains an open reading frame encoding a protein of 370 amino acids, preceded by a 73 bp 5’ UTR and followed by a 254 bp 3’ UTR (Figure 2.15). Sequence analysis revealed that this gene is most similar to C3HC4-type ring finger proteins from Arabidopsis thaliana (At1g73760) and Oryza sativa (BAD82497.1), with 41% and 44% sequence identity respectively. It shared strongest DNA homology (97%) with a Eucalyptus globulus cDNA (ES595343.1) from leaves grown under low temperature conditions. An alignment of the predicted amino acid sequence of EgrZnf1 with four other plant C3HC4-type ring finger proteins sharing highest amino acid similarity is shown in Figure 2.16. The alignment clearly showed that EgrZnf1 is a Ring-H2 type C3HC4 ring finger protein, in which the ring motifs are located in the C-terminal region of the protein. Twenty C3HC4 type ring finger proteins from Arabidopsis thaliana and Oryza sativa were compared with the predicted eucalypt zinc finger protein and a phylogenetic tree illustrating the position of EgrZnf1 is shown in Figure 2.17.


ctctctctctctctctctcttctcactctctcttgctctctcgtgctctttctctctctct
ctcacagaaatcatgcctgttttgggagagagctctgagcacaccaaacccagaagaccc
L  T  E  I  M  P  V  L  G  E  S  S  E  H  T  K  P  R  R  P
agaaaccagctctgcaatcccattcaagaaactgcagatcggagcccatcgcccttcgtc
R  N  Q  L  C  N  P  I  Q  E  T  A  D  R  S  P  S  P  F  V
cttccaaaccgcgcgaaacccaccatttcttcgtttctccactctgaatcccccagtgag
L  P  N  R  A  K  P  T  I  S  S  F  L  H  S  E  S  P  S  E
cacatgcccacgaccatgaactccgccaagaagaagcatttcgcgtcctcaagattccgg
H  M  P  T  T  M  N  S  A  K  K  K  H  F  A  S  S  R  F  R
ggactcgggtgtgcggcctccgcgtcgcagcaggtctccgtgccggcggtgattagggca
G  L  G  C  A  A  S  A  S  Q  Q  V  S  V  P  A  V  I  R  A
tctgcggattgggagaagaggaaggtgaagaagaagaagcagaagagaggcggcggcggc
S  A  D  W  E  K  R  K  V  K  K  K  K  Q  K  R  G  G  G  G
ggcggcggcggtggcggcggcaatccaacagtggtggtagacggtggtgggtcgagcttc
G  G  G  G  G  G  G  N  P  T  V  V  V  D  G  G  G  S  S  F
gggggttgcaattctgggtcctcttgcgtcgtggctgaggatgcttggtgtggtcctgga
G  G  C  N  S  G  S  S  C  V  V  A  E  D  A  W  C  G  P  G
attggcttctcagctgctgatgctgattgtgtggtggtcggcaggaggaacatgtctgca
I  G  F  S  A  A  D  A  D  C  V  V  V  G  R  R  N  M  S  A
agagggacgattgatggcgataagtttggtccaagagagcgactttgcgtctcgaggcgg
R  G  T  I  D  G  D  K  F  G  P  R  E  R  L  C  V  S  R  R
actgtgaatccagaasagtttattctggattctgaccccgccttcggaactactcactca
T  V  N  P  E  X  F  I  L  D  S  D  P  A  F  G  T  T  H  S
gggctggaaacatatgtgcctggacctcggcgctaccgccatattcgacatccttcgcct
G  L  E  T  Y  V  P  G  P  R  R  Y  R  H  I  R  H  P  S  P
gaaggcctagctgagattatgatgcttcacacaagtcttctaatgggaggaaggttggat
E  G  L  A  E  I  M  M  L  H  T  S  L  L  M  G  G  R  L  D
gtacttgatcgttaccgaggctggagactcgacattgataatatgacttatgagcaattg
V  L  D  R  Y  R  G  W  R  L  D  I  D  N  M  T  Y  E  Q  L
cttgagcttggcgataggattggctatgtgaatactggcctcaaaggtgatcagatagct
L  E  L  G  D  R  I  G  Y  V  N  T  G  L  K  G  D  Q  I  A
cactgcatcagaaagattaagctcacaaatctgaatgatttggcacgtcatttttctgca
H  C  I  R  K  I  K  L  T  N  L  N  D  L  A  R  H  F  S  A
aaagcagataggaagtgcagcatttgccaagaagagttcgaagtaggtgacgagctcggg
K  A  D  R  K  C  S  I  C  Q  E  E  F  E  V  G  D  E  L  G
aagttgaactgtgggcacggctaccacatggaatgcataaagcactggcttgggcaaaag
K  L  N  C  G  H  G  Y  H  M  E  C  I  K  H  W  L  G  Q  K
aactcgtgcccggtttgcaagtctgaggcagtggctcggtcctagaatcctcgcaaatct
N  S  C  P  V  C  K  S  E  A  V  A  R  S  -  N  P  R  K  S
taaagccttggcagttgctccctcagcccccccaaaatgtataggttttgtttcctatat
taggagaatccattcttttcattcgctatgcaattctctccacccatggtctcattgttg
aagtttgatctcttgttctcctttgtcattttttttcgtttcaattctaaatatatgatt
cggaattgctagatgtttcaatttgarkattcttatccgaaaaaaaaaaaaaaaaaaaaa

Figure 2.15 Nucleotide and deduced amino acid sequence of EgrZnf1. The deduced amino acid sequence is shown below the nucleotide sequence in single-letter code. Bold letters indicate the start and stop codons of the gene.


OsP0034C09.30. 1 -----MAG-DRRGGGGVVSADGERRRGIRRLLLPRGEGSSSSS---PPQPPPLQAEEGR- Os05g0550000. 1 -----MARRDGVGGDGGASAAEQQRRVALRVLLSRAEASSP-----PPATVEEEAQRGRS EgrZnf1. 1 ------MPVLGESSEHTKPRRPRNQLCNPIQETADRSPS-----PFVLPNR-AKPTIS AtZn-fingerI. 1 MPVSAEPSSSSSTTIGQHMRLQRPRNHRNLPPISTADEP------LIPKPSRVSKSAMS AtZn-fingerII 1 ------MSSTTIGEHIRLRRARNQTIRHLHAADDDPPLSHVVLPISQPNRFCNSAMS

OsP0034C09.30. 51 ------RKGFASAALRGLGCTSAAASQAYAPGAGAAAAAAVRSSAD Os05g0550000. 51 ------GGGNKGLASAALRGLGCTSTAALRAHAP---ASAVEVASSSER EgrZnf1. 47 SFLHSESPSEHMPTTMNSAKKKHFASSRFRGLGC---AASASQQVSVPAVIRASADWEKR AtZn-fingerI. 54 SFFLLP------ETTKKKPNGTASFRGLGC---TTSASQQVSVPAVIRSSADWDAS AtZn-fingerII 52 SFFPLPTSS-----SNESTRKKPYQTSSFRGMGCYAAAAAAAQEVSVPSVIRYSADLDAR

OsP0034C09.30. 91 WHGRRRRR-GKEKRKERGGGGGGGGGGHLVG--GGIGA------DVWC--APGI Os05g0550000. 91 WHGRRRRRKVQERRSARGGGGGGGGGVAPPGPAPAAAG------DVWCTCAPGI EgrZnf1. 104 KVKKKKQK----RGGGGGGGGGGGNPTVVVDGGGSSFGGCNSGSSCVVAEDAWCG--PGI AtZn-fingerI. 101 NFKIKKTK----KKNKNKGSSSYNGGSIKILSEASTS----SSVACAAIPDVWCG--PGV AtZn-fingerII 107 IRKDKKKKKHKHKKKKKKNKGSYEDGSIRILSEEAR------DVIDVWCR--PGL

OsP0034C09.30. 134 PFAAEASSVDCVVAR------HQMVGRGRGGDAERPHRERPCLS--RRVTVQE Os05g0550000. 139 PFAAEASSVDCVVVARH-----HHAHHTAAAMGSGRRGEAERRHRERPAAPRARRVTMRE EgrZnf1. 158 GFSAADADCVVVG------R-RNMS-ARGTIDGD-----KFGPRER--LCVSRRTVNPE AtZn-fingerI. 151 GFSTDAVVGGSIDTVVSDPPR-RNIP-VRRKIDGDKTNSNSNNHREGSSSLLPRRSLNQE AtZn-fingerII 154 GFSTDAVIGRSVD-----PPRGRNIPSSRRKIDVD---NNNYNHTLG-SSVLPIRFLNQE

OsP0034C09.30. 179 QISSSFMDSPPPPHLDVAPFFGADLLPSGRLRRMR-GYRHSPVG-LEEEIMMFQTRVLLG Os05g0550000. 194 HISSSLMDSPPFPDM---PLLNADLLPPPPSGRHRHGYRHPHVGAAEEEIMMLRTRLLWG EgrZnf1. 202 X---FILDSDP-----AFGTTHSGLETYVPGPRRYRHIRHPSPEGLAEIMMLHTSLLMGG AtZn-fingerI. 209 --SNPYFDSDS-----SFLTS-RAEQT----DRYHRHLRLPYPDGLAEMMMMQNGFVMGG AtZn-fingerII 205 THSHDIFNSDS-----TFVTSSRAEPTMLS-SRCRGHLPRSYPDDLTEMRMLQNGFVMGR

OsP0034C09.30. 237 GMSMYDRYQDWRLDVDNMTYEELLELGDKIGYVNTGLREDEIVRNLRKVKHPAFDSSFRY Os05g0550000. 251 RFGMHDQHQDWRLDVDNMTYEELLDLEDRIGYVSTGLHDDEIARSLRMVKYSAFN-PKHF EgrZnf1. 254 RLDVLDRYRGWRLDIDNMTYEQLLELGDRIGYVNTGLKGDQIAHCIRKIKLTNLNDLARH AtZn-fingerI. 257 VLSSFDQFRDMRLNVDNMTYEQLLELGERIGHVNTGLTEKQIKSCLRKVKPCRQDTTV-- AtZn-fingerII 259 ITDSRDNYHELRLDVDSMSYEQLLELGDRIGYVNTGLKESEIHRCLGKIKPS-VSHTL--

******************************************** OsP0034C09.30. 297 ST-EMEKKCSICQEEFEANEEMGRLDCGHSYHVYCIKQWLSQKNVCPVCKTAVTKT- Os05g0550000. 310 AT-EVERNCSICQEEFEANEETGRLICGHSYHVQCIKQWLSRKNTCPVCKTVVSKT- EgrZnf1. 314 FSAKADRKCSICQEEFEVGDELGKLNCGHGYHMECIKHWLGQKNSCPVCKSEAVARS AtZn-fingerI. 315 ----ADRKCIICQDEYEAKDEVGELRCGHRFHIDCVNQWLVRKNSCPVCKTMAYNKS AtZn-fingerII 316 ----VDRKCSICQDEYEREDEVGELNCGHSFHVHCVKQWLSRKNACPVCKKAAYGKP

Figure 2.16 Comparison of the deduced amino acid sequence of EgrZnf1 with Arabidopsis and Oryza sativa C3HC4-type ring finger proteins. Numbers indicate the position of the amino acid residues of each protein. Black boxes indicate identical amino acid residues and grey boxes indicate similar amino acids. Asterisks indicate the position of the ring finger domain. Yellow shaded boxes with pink letters indicate the conserved cysteine/histidine residues of the RING-H2 motif.

[Figure 2.17: tree of C3HC4-type ring finger proteins; the tip labels, including EgrZnf1, correspond to the proteins listed in the caption.]

Figure 2.17 A phylogenetic tree of Zinc finger (C3HC4 type ring finger) family proteins from Arabidopsis thaliana, Oryza sativa and Eucalyptus grandis. CLUSTALW was used to align deduced amino acid sequences and the unrooted neighbor-joining tree was obtained using MEGA 3.1. GenBank accession numbers for the proteins used were as follows: Arabidopsis thaliana, AtZnf1 (F25p22.18) (NP_177517.1), AtZnf2 (F2H15.19) (NP_173239.1), AtZnf (C-terminal) (AAM63662.1), AtZnf3 (F2H15.16) (AAM51597.1), AtXERICO (NM_201687), AtZnf5 (T12C24.29) (NM_101146), AtZnf6 (T15N24.30) (NM_118792), AtRIE1 (AY168924), AtZnf (T23K23.8) (NM_105477), ATL2 (NM_112545), ATL3 (AF132013), ATL4 (AF13201), ATL5 (AF132015), ATL6 (AF132016), AtRingH2-1 (AAM60957), AtRingH2-2 (NP_174766); Oryza sativa, OsP0034C09.30 (BAD82497.1), Os05g0550000 (NP_001056239.1), OsRing-1 (AY579411), OsRing-2 (XP_450948).


2.4 DISCUSSION

2.4.1 Gene discovery strategy

This study was aimed at identifying genes differentially expressed in eucalypt xylem giving rise to wood with differing pulp yield. Genetic variation in these genes may influence variation in pulp yield and is the subject of further study in Chapter 4.

Microarrays containing ~5800 cDNAs were differentially screened using xylem pooled from high and low pulp yield trees. Similar approaches have been used in trees to identify genes important in tension wood (Lafarguette et al., 2004; Paux et al., 2005; Qiu et al., 2008), compression wood (Peter, 2003), and earlywood versus latewood (Li, Xinguo, CSIRO Forest Biosciences, personal communication).

A number of strategies were used to improve the reliability of the microarray experiments. Attempts were made to minimize the biological variation between xylem samples, as this affects the abundance of gene transcripts expressed at the time of sample collection (Jaypal and Melendez, 2006; Brazma et al., 2001). Xylem samples were collected from full-sib trees of the same age, growing on the same site. Xylem was harvested from the same height on the stem, and samples were collected in a three hour window between 8.00 am and 11.00 am and frozen immediately in liquid nitrogen. This was done to minimize the likelihood of identifying genes differentially regulated in xylem in response to diurnal signals (Solomon and Myburg, 2007). The uniform quality of the RNA isolated from the xylem was verified by gel electrophoresis (Carter et al., 2005). Bulking of RNAs from several trees was performed in order to increase the likelihood that the candidate genes identified would be relevant to many genotypes. In addition, three biological replications and dye swaps were performed. Microarray data were normalized to adjust for inequalities in the amounts of RNA used for cDNA preparation and to remove possible nonlinear bias in fluorescence as a result of differences in cDNA labeling.

The criteria for selecting differentially regulated candidate genes were deliberately less stringent than in other studies, as the intention of the experiments was to identify a reasonable number of candidate genes that may influence pulp yield. These candidates could then be narrowed down to a small number of genes to be examined for genetic variation in natural populations and in association studies. Clones were selected if their expression ratios were consistently in the top (or bottom) 5% of clones in each biological replicate. The probability that this would occur by chance was estimated to be approximately 1:8000. Selected clones were further verified by visual inspection of spots so that false or unusual spots could be rejected.
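The 1:8000 estimate is consistent with a simple independence calculation: a clone with no true difference falls in the top 5% of one replicate with probability 0.05, and in the top 5% of all three biological replicates with probability 0.05 cubed. The following minimal sketch reproduces that arithmetic. It assumes (this is not stated in the text) that the replicates are independent and that the estimate refers to one tail only; the function name and the expected-count calculation are illustrative.

```python
from fractions import Fraction


def chance_of_consistent_rank(tail_fraction, n_replicates):
    """Probability that a clone with no true difference lands in the same
    tail (e.g. the top 5%) in every replicate, assuming independence."""
    return Fraction(tail_fraction) ** n_replicates


# Values taken from the text: a 5% tail and three biological replicates.
p_one_tail = chance_of_consistent_rank(Fraction(5, 100), 3)
print(p_one_tail)  # 1/8000

# Illustrative only: expected number of chance selections per tail among
# the ~5800 cDNAs on the array, under the same independence assumption.
expected_per_tail = 5800 * p_one_tail
print(float(expected_per_tail))
```

Under these assumptions fewer than one clone per tail would be expected to pass the rank filter by chance alone, which is in keeping with the deliberately relaxed selection criteria described above.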

Despite attempts to improve the reliability of microarray experiments, it is desirable to confirm the results obtained from arrays by independent methods such as northern blot, real-time RT-PCR or in situ hybridisation (Reghunathan et al., 2005). To verify the reliability of the microarray data, EgrCesA3, EgrHB1 and EgrNAM1 were selected for validation using real-time RT-PCR, as this method is frequently used for evaluating array data (Brazeau, 2004; Rajeevan et al., 2001). Some researchers have reported good agreement for specific genes identified by microarrays and subsequently tested by northern blot analysis and quantitative real-time RT-PCR (Brazeau, 2004). The same RNA samples and biological replicates were used for verification studies of gene expression in high and low pulp yield trees. Real-time PCR confirmed the stronger expression of EgrNAM1 in high pulp trees. The expression of EgrCesA3 was also higher in high pulp trees but not significantly different from that in low pulp trees. The expression of EgrHB1, however, was higher in low pulp yield trees, in contrast to the microarray results, where it was more strongly expressed in high pulp trees. This sort of discrepancy has been observed in other microarray verification studies (Paux et al., 2004; Klok et al., 2002; Jones et al., 2002) and may be explained by cross-hybridisation of closely related genes from the same multigene family (Miller et al., 2002), as reported previously (Cho et al., 2002). Another possible explanation is a bias in dye labeling of specific sequences, as the combination of fluorescent dyes used for probe labeling affects expression data for certain genes. The effect of dye combination was particularly notable for the gene NAT1, whose signals on DNA microarrays were significant and reproducible using different sources of total RNA and poly(A) RNA (Taniguchi et al., 2001). Taniguchi et al. (2001) also reported that repeating the analysis with the reverse combination of dyes gave a different result. This may explain the contrasting results obtained for EgrHB1. These results highlight an important limitation of microarrays and show that validation is crucial in order to be confident that a gene is differentially expressed.

2.4.2 Pulp yield candidate genes

A relatively small number of the cDNAs spotted on the microarrays (46) were differentially regulated in trees differing in pulp yield. This result is not surprising given that the comparisons in this study were between samples both derived from xylem. Much larger numbers of differentially regulated genes have been observed in comparisons between eucalypt xylem and leaf tissues (Paux et al., 2004). Approximately 79% of the Eucalyptus nitens candidate genes identified had putative homologues of known function in other organisms. About 15% of the differentially regulated ESTs were homologous to proteins in the database that have no known function and 6.5% had no homologue in the database. These results are consistent with other EST studies in pine (Whetten et al., 2001) and poplar (Sterky et al., 1998). Most interesting among the differentially expressed genes were a number of cell wall genes that were largely up-regulated in high pulp trees and a small number of transcription factors. Four ESTs were selected from among these for further analysis, including ESTs with homology to (1) a secondary cell wall cellulose synthase (EgrCesA3), (2) a NAC-domain protein likely to be involved in regulating plant development (EgrNAM1), (3) a homeodomain transcription factor potentially regulating vascular development (EgrHB1) and (4) a C3HC4-type zinc finger transcription factor protein (EgrZnf1).

2.4.2.1 EgrCesA3

Two ESTs identified in the arrays were identical to EgCesA3, a Eucalyptus grandis secondary cell wall CesA identified by Ranik and Myburg (2006). EgCesA3 was found to be the most abundant of five eucalypt CesAs expressed in xylem tissues. Phylogenetic analysis confirmed the close relationship of EgCesA3 with other secondary cell wall CesAs identified in arabidopsis (AtCesA7 or IRX3; Taylor et al., 1999) and in poplar (PtrCesA2; Samuga and Joshi, 2002), which are also abundant in cells contributing to secondary xylem deposition. The cDNA arrays also contained copies of at least one other secondary cell wall CesA which was differentially regulated in eucalypt branches (Qiu et al., 2008); however, this CesA was not differentially regulated in this study. This suggests that not all secondary CesAs are coordinately regulated. There is a strong correlation between cellulose and pulp yield in Eucalyptus wood (Moran et al., 2002). It seems likely that increased expression of EgrCesA3 could lead to higher cellulose synthesis in secondary cell walls and that this in turn could contribute to higher pulp yield.

2.4.2.2 EgrNAM1

Phylogenetic and BLAST analysis revealed that the EgrNAM1 EST is homologous to the C-terminal region of several NAC domain proteins, a class of plant transcriptional regulators with diverse roles in plant development. NAC proteins are characterized by the presence of a highly conserved NAC domain structure in the N-terminal region and a divergent C-terminal activation domain (Olsen et al., 2005; Ernst et al., 2004; Duval et al., 2002). No sequences homologous to the predicted EgrNAM1 protein were identified in poplar or pine. CUC2 and AtNAM (GI: 4325285) are the most closely related arabidopsis proteins to EgrNAM1 that have been functionally studied. AtNAM and CUC2 have been shown to play a role in shoot apical meristem development (Duval et al., 2002) and CUC2 is involved in the separation of embryonic organs (cotyledons) and floral organs during morphogenesis (Aida et al., 1997). The petunia homologue of NAM has been shown to be involved in the development and maintenance of the shoot apical meristem (Souer et al., 1996).

Several other NAC proteins have been implicated in secondary cell wall development in plants. It was recently reported that SND1 (ANAC012) is a key transcriptional switch regulating secondary wall thickening of fibres. Zhong et al. (2006) demonstrated that expression of SND1 is associated with interfascicular fibres and xylary fibres in stems, and that dominant repression of SND1 leads to a severe decrease in the thickness of fibre secondary cell walls. Similarly, VND6 and VND7 (vascular related NAC domain) have been identified as transcriptional switches for plant metaxylem and protoxylem vessel formation (Kubo et al., 2005). Repression of VND6 or VND7 resulted in inhibition of the development of metaxylem or protoxylem in roots, and overexpression of these genes led to ectopic differentiation of metaxylem and protoxylem. Using double knockout plants, Mitsuda et al. (2007) showed that the NAC proteins NST1 and NST3 (SND1) redundantly regulate the secondary cell wall thickenings in interfascicular fibres of inflorescence stems and secondary xylem of hypocotyls in arabidopsis. The expression of genes involved in the biosynthesis of secondary walls was down-regulated in nst1-1 nst3-1 plants.

2.4.2.3 EgrHB1

Sequence homology and phylogenetic analysis revealed that EgrHB1 is closely related to ATHB-15, PtrHB5 and PtrHB6 and ZeHB-13, forming a single clade among class III HD-Zip proteins (Prigge and Clark, 2006; Emery et al., 2003; Ohashi-Ito and Fukuda, 2003). In arabidopsis there are five class III HD-Zip genes including REV (Revoluta), PHB (Phabulosa) and PHV (Phavoluta) and the more distantly related ATHB-15 and ATHB-8 (Emery et al., 2003). PHB, PHV and REV are expressed in the adaxial domains of lateral organs in vascular tissues and in the apical meristem while ATHB-8 and ATHB-15 appear to be expressed exclusively in vascular tissues (Baima et al., 1995). Previous studies on dominant gain-of-function mutations suggest PHB, PHV and REV are important for regulating organ polarity by specifying adaxial and abaxial cell fates and vascular patterning (Emery et al., 2003; Kim et al., 2005). ATHB-8 over-expressing transgenic plants exhibit a promotion of procambial/cambial cell differentiation in xylem formation and earlier and higher production of lignified tissue in inflorescence stems compared to wild type (Baima et al., 2001). Detailed analysis of ATHB-15 expression has not been reported; however, expression of a closely related Zinnia HD-Zip gene, ZeHB-13, was restricted to vascular tissue (Ohashi-Ito and Fukuda, 2003). Populus has eight class III HD-Zip proteins (Ko et al., 2006), among which PtrHB5 and PtrHB6 are the most closely related to EgrHB1. These two poplar HD-Zip proteins have been found to be strongly expressed in cambium and tension wood. Recent studies on ATHB-15 (Kim et al., 2005) and PtaHB1 (Ko et al., 2006) have revealed that the regulation of these genes is influenced by microRNA through mRNA cleavage. It seems that the class III HD-Zip genes are important regulators of vascular development in land plants and are evolving to perform unique functions in vascular tissue.

The up-regulation of PtrHB5 and PtrHB6 transcription in tension wood provides an interesting clue to the role of these genes in wood development. Tension wood in poplar is enriched in cellulose and deficient in lignin and hemicellulose (Andersson-Gunnerås et al., 2006). It is likely that both of these genes activate downstream structural genes involved in cellulose biosynthesis, which may lead to changes in cellulose content.

Further genetic and biochemical studies are required to confirm the exact role of EgrHB1 in secondary growth in eucalypts.

2.4.2.4 EgrZnf1

Phylogenetic analysis of EgrZnf1 revealed that it codes for a C3HC4-type zinc finger protein and belongs to a large family of transcription factors that have a specialized type of ring finger domain of 40 to 60 residues. The ring finger motif, also called a ‘cross-brace’ motif, is defined by Cys-X2-Cys-X(9-39)-Cys-X(1-3)-His-X(2-3)-(Asn/Cys/His)-X2-Cys-X(4-48)-Cys-X2-Cys (Borden and Freemont, 1996; Meng et al., 2006). There are two variants of ring finger proteins in higher eukaryotes, the C3HC4 type and the C3H2C3 type (Ring H2 finger), which are clearly differentiated by the pattern of occurrence of cysteine and histidine residues (Borden and Freemont, 1996; Borden, 2000). Zinc finger proteins are involved in a diverse range of biological processes including RNA degradation (Sims and Ordanic, 2001), protein degradation (Zhang et al., 2005), development (Disch et al., 2006; Karlowski et al., 2003), defense (Serrano and Guzmán, 2004) and drought response (Kam et al., 2007; Sahin-Cevik and Moore, 2006).
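The ring finger consensus given above can be written as a regular expression, which makes the difference between the two variants concrete. The following is a minimal sketch rather than the method used in this study. It assumes that X denotes any residue and that the fifth metal ligand (Cys versus His) is what separates the C3HC4 type from the C3H2C3 type; the function names are hypothetical and the test sequences are artificial, not real proteins.

```python
import re

# Cys-X2-Cys-X(9-39)-Cys-X(1-3)-His-X(2-3)-(Asn/Cys/His)-X2-Cys-X(4-48)-Cys-X2-Cys
# The fifth ligand is captured so that the variant can be reported.
RING_PATTERN = re.compile(
    r"C.{2}C.{9,39}C.{1,3}H.{2,3}([NCH]).{2}C.{4,48}C.{2}C"
)


def classify_ring_finger(protein_seq):
    """Return 'C3HC4', 'C3H2C3', 'other' or None (no ring finger consensus)."""
    match = RING_PATTERN.search(protein_seq.upper())
    if match is None:
        return None
    fifth_ligand = match.group(1)
    if fifth_ligand == "C":
        return "C3HC4"
    if fifth_ligand == "H":
        return "C3H2C3"
    return "other"


def synthetic_ring(fifth_ligand):
    """Build an artificial sequence (not a real protein) that fits the consensus."""
    return ("MA" + "CAAC" + "A" * 10 + "CAAH" + "AA" + fifth_ligand
            + "AAC" + "A" * 10 + "CAAC" + "K")


print(classify_ring_finger(synthetic_ring("C")))
print(classify_ring_finger(synthetic_ring("H")))
print(classify_ring_finger("MKKLAAGG"))
```

A pattern search of this kind only flags candidate domains; confirming a true ring finger would still require alignment against characterized family members, as was done in the phylogenetic analysis.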

Though a large number (165) of ring finger genes have been identified in arabidopsis, few of these genes have been functionally characterized. The arabidopsis ring H2-type zinc finger protein XERICO influences drought tolerance during seed germination and seedling growth (Ko et al., 2006). The XERICO transcript was up-regulated in salt and osmotic stress responses, and further transgenic analysis revealed increased levels of cellular abscisic acid (ABA). The findings of Ko et al. (2006) also suggest that XERICO probably functions in ABA homeostasis at a post-translational level through the ubiquitin/proteasome pathway by interacting with AtTLP9, an ASK1-interacting F-box protein involved in ABA signaling. Another Ring-H2 zinc finger protein, RIE1, has been found to be essential in arabidopsis seed development. Mutant rie1 plants produced abnormal seeds although the plants appeared to grow normally compared to wild type (Xu and Li, 2003). This sort of pleiotropic phenotype suggests an influence on hormone levels.

2.4.3 Cell wall loosening genes

Other cell wall genes up-regulated in high pulp trees included three genes likely to be involved in cell wall loosening: a xyloglucan transglycosylase (XTH), a pectate lyase and a beta-expansin. A cellulose/xyloglucan network constitutes the basic framework of the cell wall, in which xyloglucans form tight non-covalent hydrogen bonds with cellulose microfibrils. The xyloglucan-cellulose network is embedded in a pectin-rich matrix, which is further cross-linked with a network of structural proteins. XTHs are implicated in both the splitting and reconnection of xyloglucan cross-links, and are considered to play a key role in both the construction and disassembly of cell wall architecture (Vissenberg et al., 2005). XTHs have been detected in secondary cell wall formation in poplar (Bourquin et al., 2002) and in eucalypts (Paux et al., 2004). These genes are likely to participate in restructuring primary cell walls while the secondary layers are deposited, by creating and reinforcing the connections between the primary cell wall and secondary cell wall layers. Expansins are important proteins involved in cell wall loosening and expansion (Darley et al., 2001; Carpita and McCann, 2000). They are required for disruption of the non-covalent bonding between cellulose and hemicellulose, thus allowing cell wall polymers to yield to the turgor-generated growth force (Carpita and McCann, 2000). The pectin-rich network may control access of cell wall related enzymes to their substrates (Carpita and McCann, 2000). It is possible that pectate lyase mediates removal and modification of the existing pectin matrix, allowing the deposition of newly synthesised wall polymers.

Using cDNA microarrays from eucalypt xylem tissues a number of interesting genes were identified which potentially influence the properties of wood formed in Eucalyptus nitens. Most of the identified genes share homology with genes that are at least partially characterized in other plants and which play a role in cellulose biosynthesis or regulate aspects of secondary cell wall formation in vascular tissues. It is possible that allelic variation in some of these genes could underlie the genetic variation in wood properties that has been observed in Eucalyptus (Thumma et al., 2007; Thumma et al., 2005) and Pinus taeda (González-Martínez et al., 2007). Genes identified in this study are the subject of genetic diversity and association genetics research described in Chapter 4.

Chapter 3

Cloning and molecular characterization of the novel eucalypt cell wall gene EgrPAAPA

3.1 INTRODUCTION

Recently, a novel eucalypt EST known as EgrPAAPA was identified at CSIRO Forest Biosciences, Canberra. The EgrPAAPA transcript is downregulated in branches of E. nitens (Qiu et al., 2008). The EgrPAAPA EST was identical to Suc75 (AW191334), a transcript found to be abundant in the vascular cambium of E. globulus (Bossinger and Leitch, 2000). The cDNA fragment had no apparent homologs in the Arabidopsis thaliana genome and it has not been found in other plants. The predicted amino acid sequence of EgrPAAPA is rich in proline and alanine, and in particular the amino acid motif PAAPA, suggesting that it may be involved in cell wall development. In fact, the high proline/hydroxyproline content of the protein backbone is characteristic of hydroxyproline-rich glycoproteins (HRGPs), a class of protein molecules believed to play a fundamental role in plant development (Sommer-Knudsen et al., 1998).

In E. nitens, EgrPAAPA is downregulated in both upper and lower branch xylem when compared to xylem from vertical stems. There are considerable changes in cell wall structure in eucalypt branches including changes in cellulose and lignin content and microfibril angle (Qiu et al., 2008; Washusen et al., 2005). It has also been found that genes that are preferentially expressed in xylem fibres are likely to be involved in the synthesis of cell wall constituents. The EgrPAAPA gene may therefore be involved in cell wall development, which in turn may affect wood properties.

In this study, a full length cDNA of EgrPAAPA was isolated and the expression of this gene was examined in a range of eucalypt tissues. Sequence analysis confirmed that EgrPAAPA is a novel protein. The structure and function of EgrPAAPA were explored by comparing its predicted amino acid sequence to the sequences of published HRGPs.

3.2 MATERIALS AND METHODS

3.2.1 Plant material

A nine-year-old E. grandis tree growing at the CSIRO, Forest Sciences site in Canberra, Australia, was used as a source of DNA for Southern blot analysis. The same tree was also used as source of RNA for northern blot analysis including xylem from the upper and lower sides of branches, leaves, and floral buds. Leaves, shoots and young stems were collected from juvenile branches in the crown of the trees and flowers were collected from mature branches. Roots were harvested from E. grandis seedlings growing in a glasshouse.

3.2.2 Probe preparation

The short EgrPAAPA cDNA was amplified using the pBluescript M13 Reverse (5’ GGAAACAGCTATGACCATG 3’) and M13 Forward (5’ GTAAAACGACGGCCAGT 3’) primers and used as a probe for library screening. The amplification reaction contained 5 µl of 10x buffer, 1 µl of 10 mM dNTPs, 1.5 µl of 50 mM MgCl2, 2.5 µl of each primer (10 µM), 0.5 µl Taq DNA polymerase (Life Technologies, California), 1 µl template (10x dilution of original cDNA) and water, added to 50 µl. The PCR reaction was performed on an ABI GeneAmp® PCR system 2700 (manufacturer details) using the following program: 1 cycle of 3 min at 94°C; 30 cycles of 94°C for 45 s, 55°C for 30 s and 72°C for 90 s; followed by a final extension of 10 min at 72°C. Amplifications were confirmed by gel electrophoresis and the PCR products were then purified using a QIAquick PCR Gel purification kit (QIAGEN GmbH, Hilden, Germany). Purified PCR products (25 µl) were mixed with 2 µl of EcoRI and 2 µl of XhoI (Invitrogen, Carlsbad, California), 8 µl 10x RE buffer and 3 µl distilled water and incubated at 37°C for 2 hours to remove vector sequences. The digestion products were separated in 1% TAE agarose and extracted using a QIAquick gel extraction kit (QIAGEN). The purified PCR product was confirmed and quantified by gel electrophoresis against a standard (GeneRuler™ 100 bp DNA Ladder; Maryland, USA). Approximately 50 ng of purified cDNA was labelled using 5 µl of α-32P-dCTP (50 µCi) and Ready-To-Go DNA labelling beads (Amersham Biosciences, Buckinghamshire, England) according to the manufacturer’s instructions. Labelled DNA was separated from unincorporated label using a ProbeQuant G-50 micro column (Amersham) and used for library screening. The probes used in Southern and northern blot analysis were also synthesized using Ready-To-Go DNA Labelling Beads (-dCTP) (Amersham).
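The final concentrations in the 50 µl amplification reaction follow from the dilution rule C1 x V1 = C2 x V2. The sketch below works through that arithmetic using only the volumes and stock concentrations listed above; the variable and function names are illustrative, and no stock concentration is assumed for the polymerase or template because none is given in the text.

```python
# Volumes (µl) and stock concentrations as listed in the text for a 50 µl reaction.
TOTAL_UL = 50.0
components = {
    # name: (volume in µl, stock concentration, unit of the stock)
    "buffer": (5.0, 10.0, "x"),
    "dNTPs": (1.0, 10.0, "mM"),
    "MgCl2": (1.5, 50.0, "mM"),
    "forward primer": (2.5, 10.0, "µM"),
    "reverse primer": (2.5, 10.0, "µM"),
}
# Added by volume only; no stock concentration is given in the text.
other_volumes_ul = {"Taq polymerase": 0.5, "template": 1.0}


def final_concentration(volume_ul, stock, total_ul=TOTAL_UL):
    """Dilution rule C1*V1 = C2*V2, solved for the final concentration C2."""
    return stock * volume_ul / total_ul


final = {
    name: (final_concentration(volume, stock), unit)
    for name, (volume, stock, unit) in components.items()
}

used_ul = sum(v for v, _, _ in components.values()) + sum(other_volumes_ul.values())
water_ul = TOTAL_UL - used_ul

for name, (conc, unit) in final.items():
    print(f"{name}: {conc:g} {unit}")
print(f"water to add: {water_ul:g} µl")
```

The listed reagents account for 14 µl, so the remainder of the 50 µl reaction is water.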

3.2.3 Library screening

An E. grandis xylem cDNA library, cloned uni-directionally into pre-digested EcoRI and XhoI Lambda ZAPII vector (CSIRO Forest Biosciences, Canberra), was screened to identify a full length transcript of EgrPAAPA. The library was plated at a density of 2 x 10^4 pfu per 150 mm LB-Bottom-agar/NZ-Top-agarose plate and clones from each plate were adsorbed onto duplicate Hybond-N+ filters (Amersham) following the manufacturer’s instructions. The plaques on each filter were denatured with 1.5 M NaOH containing 1.5 M NaCl, and neutralized in 0.5 M Tris-base (pH 7.2) containing 1.5 M NaCl. Residues were removed by washing in 2x SSPE/0.5% SDS (sodium dodecyl sulphate) at room temperature. The DNA was UV-cross-linked to the moist filters.

Filters were prehybridized in Modified Southern Buffer that contained 0.2% (w/v) Ficoll 400, 0.2% (w/v) bovine serum albumin (BSA), 0.2% (w/v) polyvinylpyrrolidone-360, 0.5% (w/v) SDS, and 500 µg denatured salmon sperm DNA in 5x SSPE, at 65°C overnight with agitation. The random-primed α-32P-dCTP labelled short EgrPAAPA cDNA fragment was denatured by boiling for 5 min, chilled on ice, then incubated in the hybridisation solution at 65°C overnight with agitation. Non-specifically bound probe was removed by washing the filters in 2x SSPE/0.1% (w/v) SDS at room temperature for 10 minutes, 1x SSPE/0.1% (w/v) SDS at 65°C for 15 minutes and finally in 0.1x SSPE/0.1% (w/v) SDS at 65°C for 10 minutes.

Positive primary plaques identified from duplicated lifts from each plate were cored from the agar, and the phagemids eluted in 1 ml SM buffer containing a drop of chloroform for 3 h at room temperature with regular vortexing. Dilutions of the phage solution (typically 1:100 and 1:1000) were incubated with bacterial host cells (600 OD) in LB agar medium and purified by repeated plating and screening following the protocol outlined above.

Plasmids were isolated from nineteen positive Lambda ZAPII tertiary plaques using ExAssist Helper phage and XL1-Blue MRF’ cells that had been prepared by growing in 1 M MgSO4 and 0.2 g/ml maltose overnight, and resuspended in 10 mM MgSO4. The circularized pBluescript SK plasmids were replicated in the SOLR (amps) E. coli strain in the presence of ampicillin, according to the manufacturer’s instructions (Stratagene, La Jolla, California). The size of each insert was determined by gel electrophoresis after restriction digestion with NotI (Invitrogen) enzyme.

The nineteen positive clones were sequenced independently in both directions using the pBluescript SK Reverse (5’ GGAAACAGCTATGACCATG 3’) and Forward (5’ GTAAAACGACGGCCAGT 3’) primers. Sequencing was performed using Big Dye terminator version 3 reagents and a sequence analyzer from ABI PRISM. Sequences were analyzed using the tBLASTX program (National Center for Biotechnology Information, http://www.ncbi.nlm.nih.gov./Entrez, and Populus nuclear genome, available at http://www.jgi.doe.gov/poplar) in order to identify significant homology with published gene sequences. Multiple sequence alignments were performed with ClustalW2 software at the EMBL-EBI site (http://www.ebi.ac.uk/clustalw). The predicted amino acid sequence of EgrPAAPA was deduced using the ExPASy translate tool (http://kr.expasy.org/tools/dna.html). Prediction of N-terminal cleavage sites and GPI modification sites was performed with the GPI Plant Prediction Server (http://mendel.imp.ac.at/gpi/plant_server.html; Eisenhaber et al., 2003).
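The translation step can be illustrated with a minimal sketch. It is not the ExPASy tool itself: it assumes the standard genetic code and a single forward reading frame, and the function name is hypothetical. The input is the first 24 nucleotides of the EgrPAAPA open reading frame as shown in Figure 3.1, and stop codons are written as a dash to match that figure.

```python
BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cdna, stop_symbol="-"):
    """Translate a cDNA string in frame 1; trailing partial codons are ignored."""
    seq = cdna.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[pos:pos + 3]]
        protein.append(stop_symbol if residue == "*" else residue)
    return "".join(protein)


# Start of the open reading frame, copied from Figure 3.1.
orf_start = "atggccagcgtagaggttgcacaa"
print(translate(orf_start))  # MASVEVAQ
```

A full translation tool would also report the other five reading frames; only the frame containing the start codon is needed here.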

3.2.4 DNA and RNA extraction

Genomic DNA was extracted from good quality leaf tissues using a standard method (Glaubitz et al., 2001) with modifications. Approximately 5 µg of leaf tissue was ground in liquid nitrogen using a mortar and pestle and dissolved in 25 ml of extraction buffer (0.35 M Sorbitol, 100 mM Tris pH 8, 100 mM Boric acid, 25 mM EDTA, 1 M NaCl, 10% PEG 8000, 2% PVP 40000, 0.5% BSA, 0.1% Spermine and 0.1% Spermidine) in a 50 ml beaker. The solution was homogenized by mixing and filtered through very fine muslin into a 50 ml tube on ice. The tube was shaken vigorously and centrifuged at 2000 rpm for 10 min. The pellet was resuspended in 2.5 µl of wash buffer (0.35 M Sorbitol, 50 mM Tris pH 8.0 and 25 mM EDTA pH 8.0) by mixing with a paint brush. Then 0.5 ml of 20% SDS, 0.5 ml 20% Triton, 1 ml of 5 M NaCl, and 1 ml of 8.6% CTAB/0.7 M NaCl were added sequentially with gentle mixing and the mixture was incubated at 65°C for 30 min. The solution was then cooled to room temperature and extracted with 20 µl of chloroform:IAA (24:1), and the phases were separated by centrifugation at 5000 rpm for 10 min. The upper aqueous layer was transferred into a new tube and DNA precipitated with 2/3 volume of 100% isopropanol. The DNA pellet was then collected after an overnight wash with 50% isopropanol/0.3 M NH4O acetate (1.5 ml) and resuspended in 200 µl of TE (pH 8.0). Total RNA was isolated as described in the Materials and Methods section of Chapter 2. The quality of RNA was evaluated by photometric measurements after gel electrophoresis using an RNA standard (Promega, Australia). DNA and RNA were quantified by measuring OD260 on a spectrophotometer.

3.2.5 Southern blot

Five micrograms of genomic DNA was digested at 37°C overnight in three separate reactions with HindIII, XhoI, and SacI. All restriction enzymes were purchased from Invitrogen. Digestions were performed in a total volume of 50 µl and included 5 µl of 10x buffer, 2 µl of 50 mM spermidine, 0.05 µl RNase A (4.25 U/ml), 32 µl of genomic DNA (160 ng/µl), 7.95 µl distilled water and 3 µl of restriction enzyme (10 U/µl). Products were separated by gel electrophoresis in a 0.7% 1x TBE gel. DNA was stained with 10 mg/ml ethidium bromide for 15 min, rinsed in distilled water for 10 minutes, viewed under UV to determine the extent of digestion, and then transferred to Hybond-N+ membrane (Amersham) following the manufacturer’s instructions. DNA was cross-linked to the membrane with UV and the blot was prehybridized (5x SSC, 5x Denhardt’s Solution, 0.5% SDS, 0.3 mg/ml denatured salmon sperm DNA) at 55°C for 2 hours. The template DNA used for probe synthesis was derived from gel purified PCR product, synthesized using TaqF1 (Fisher Biotech, Australia) DNA polymerase to produce a ~1100 bp product including the entire EgrPAAPA coding region. This was achieved using the M13 forward and reverse primers with the pBluescript SK EgrPAAPA plasmid. Randomly primed α-32P-dCTP DNA probes (Amersham) were hybridized to filters at 55°C overnight with agitation. Filters were washed with 2x SSC/0.5% SDS at 55°C twice for 15 minutes, and 0.5x SSC/0.5% SDS at 65°C twice for 15 minutes. After washing, filters were exposed to Biomax-MS film (Kodak) and the signal amplified with a Biomax Screen at -80°C for 24 hours.
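The composition of the restriction digest described at the start of this section can be checked with a few lines of arithmetic. The sketch below uses only the volumes and concentrations stated there; the variable names are illustrative, and the units-per-microgram figure is a derived quantity rather than a value reported in the text.

```python
# Digest components (µl) as listed in the text.
digest_ul = {
    "10x buffer": 5.0,
    "50 mM spermidine": 2.0,
    "RNase A": 0.05,
    "genomic DNA": 32.0,
    "distilled water": 7.95,
    "restriction enzyme": 3.0,
}
DNA_NG_PER_UL = 160.0    # genomic DNA stock concentration
ENZYME_U_PER_UL = 10.0   # restriction enzyme stock concentration

total_ul = sum(digest_ul.values())
dna_ug = digest_ul["genomic DNA"] * DNA_NG_PER_UL / 1000.0
enzyme_units = digest_ul["restriction enzyme"] * ENZYME_U_PER_UL
spermidine_mM = 50.0 * digest_ul["50 mM spermidine"] / total_ul

print(f"total volume: {total_ul:.2f} µl")
print(f"DNA per digest: {dna_ug:.2f} µg")
print(f"enzyme: {enzyme_units:.0f} U ({enzyme_units / dna_ug:.1f} U per µg DNA)")
print(f"spermidine: {spermidine_mM:.1f} mM final")
```

The volumes sum to the stated 50 µl, and 32 µl of a 160 ng/µl stock corresponds to the approximately five micrograms of DNA quoted for each digest.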

3.2.6 Northern blot analysis

Approximately 20 µg of each RNA sample was used for northern blot analysis and was separated in 1.2% formaldehyde-agarose (4.8 g agarose, 40 ml of 10x MOPS, 55.2 ml DEPC treated water, 7.2 ml of 37% formaldehyde, 4 µl of 10 mg/ml ethidium bromide) by gel electrophoresis using the methods described in the RNeasy Mini handbook (QIAGEN). RNA (28 µl) was denatured at 65°C for 5 minutes and then 7 µl of RNA loading dye was added to the tube. After chilling on ice the samples were loaded onto the gel and electrophoretically separated at 5-7 V/cm overnight. Equal loading and separation of the RNA was confirmed by photometric observation after gel electrophoresis. After washing with DEPC treated water and 20x SSC for 15 min each, the separated RNA was transferred to Hybond N membrane (Amersham) using 20x SSC followed by UV cross-linking. The template used for the Southern blot probe (described above) was also used for probe synthesis with Ready-To-Go DNA Labelling Beads (-dCTP) (Amersham). Filters were hybridised with the labelled probe at 55°C overnight in hybridisation buffer (50% deionised formamide, 0.25 NaPO4, 0.25 NaCl, 1 M EDTA and 7% SDS) with agitation. Filters were washed in 2x SSC/0.5% SDS at 55°C twice for 5 minutes, and 0.1x SSC/0.5% SDS at 65°C for 15 minutes. After washing, filters were exposed to Biomax-MS film (Kodak) and the signal amplified with a Biomax Screen at -80°C for 24 hours.

3.3 RESULTS

3.3.1 Isolation and sequence analysis of EgrPAAPA

The previously identified EgrPAAPA cDNA lacked a start codon and therefore did not encode the entire open reading frame. A full-length EgrPAAPA cDNA was obtained by screening an E. grandis xylem cDNA library using the short EgrPAAPA cDNA as a probe. Nineteen positive clones from the library were excised in vivo and maintained in pBluescript. The 5’ and 3’ ends of each clone were sequenced and analysed. The sequences of twelve clones were identical or nearly identical to the DNA sequence of the short EgrPAAPA clone and eight of the twelve clones (1, 3, 9, 10, 13, 14, 16 & 19) contained a predicted start and stop codon.

The full length EgrPAAPA cDNA is 1105 bp in length with an uninterrupted open reading frame of 516 bp encoding a predicted protein sequence of 172 aa (Figure 3.1). The cDNA has a 98 bp 5’ untranslated region (UTR) and a long 488 bp 3’ UTR. One clone (16) differed from the other clones at three base positions, indicating that there are two EgrPAAPA alleles in the cDNA library. Interestingly, these differences result in two amino acid substitutions at positions 45 and 144, which are also indicated in Figure 3.1. One allele has a leucine instead of a proline residue at position 45 and a threonine instead of a serine residue at position 144. The predicted amino acid sequence of EgrPAAPA is rich in the amino acids alanine (28.5%), glutamic acid (24.4%), proline (15.1%), valine (10.5%) and threonine (9.3%). There are 13 pairs of glutamic acid residues in the EgrPAAPA protein sequence. Both alleles have seven “PAAPA” motifs towards the C-terminal end of the predicted protein.
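The composition and motif counts reported above can be recomputed directly from the deduced protein sequence. The following sketch does so for the sequence transcribed from Figure 3.1 (the allele with proline at position 45 and serine at position 144). The function name is illustrative, and the counts treat motifs and glutamic acid pairs as non-overlapping substrings, which is an assumption about how they were originally tallied.

```python
from collections import Counter

# Deduced EgrPAAPA protein, transcribed from Figure 3.1 (stop codon omitted).
EGR_PAAPA = (
    "MASVEVAQATTLPEEKAPEVTETEEALAEEVATPAVTEEPVVDAPAAE"
    "ESVPAVESETPAEVETKEVVEETKAEEPAEPAVAEETTRE"
    "LDITETPAAPAEAEAPAAPAKEEAPAAPAEETPAAPAEVD"
    "VPAAPAKEEAPVAADSTPAAPAEEVAPAAPAEETPAKVEAPAAE"
)


def percent_composition(protein):
    """Percentage of each amino acid in the protein."""
    counts = Counter(protein)
    return {aa: 100.0 * n / len(protein) for aa, n in counts.items()}


composition = percent_composition(EGR_PAAPA)
for aa in "AEPVT":
    print(f"{aa}: {composition[aa]:.1f}%")

print("length:", len(EGR_PAAPA), "aa; coding bases without stop:", 3 * len(EGR_PAAPA))
print("PAAPA motifs:", EGR_PAAPA.count("PAAPA"))
print("EE pairs:", EGR_PAAPA.count("EE"))
print("residue 45:", EGR_PAAPA[44], "| residue 144:", EGR_PAAPA[143])
```

Run on this sequence, the sketch should reproduce the figures given in the text, including the 172 residue length that corresponds to the 516 bp open reading frame.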

aaaccagccctcatttgccccccatattttcatttctctgattcccatcaactttccctc
ctgcaaattcccctagctttgctttttcatttccagatggccagcgtagaggttgcacaa
L Q I P L A L L F H F Q M A S V E V A Q
gctacaacattgcctgaggagaaggcacccgaggtgacagagacagaggaagcgctcgcg
A T T L P E E K A P E V T E T E E A L A
gaagaagtggccaccccagccgtgaccgaggaacccgtagtcgatgcaccggctgccgag
                                L (allele2)
E E V A T P A V T E E P V V D A P A A E
gagtcagttcccgcagtagaatctgagacgccggcagaagtcgaaaccaaggaagtcgtg
E S V P A V E S E T P A E V E T K E V V
gaggaaaccaaggccgaagagccagcagagccagccgttgccgaggagaccactagggaa
E E T K A E E P A E P A V A E E T T R E
ctggatatcaccgagaccccggccgctcctgctgaggcggaagcccccgccgctccagcc
L D I T E T P A A P A E A E A P A A P A
aaggaggaggcgccggccgctcctgccgaggagacgccagcagctcctgctgaagtggac
K E E A P A A P A E E T P A A P A E V D
gtgcctgctgcccctgccaaggaggaagctcccgtcgccgccgacagcacacctgctgcc
                              T (allele2)
V P A A P A K E E A P V A A D S T P A A
cctgccgaggaggttgcaccggctgctcctgctgaggagactccagccaaggtggaagca
P A E E V A P A A P A E E T P A K V E A
ccggcagcggagtaaggacgatccggcactcgagctgcattcgaggaagctttgagtaga
P A A E - G R S G T R A A F E E A L S R
gatcatcccactggtattcaggaagagagtgtgcaagtaaatgcggtggtggtgtgtcta
ctcttttttagttatgatggggttcttgggtttctaggtttgcatgttgaagttggtcag
gttttgctcattgtgtctttttttcttccttggtttagtttcttactatattaaatatgg
actagaccatgttgtggtactgctactatggtttgctcgagtctagaagatgcccgctcg
gagccacgtcaaggtccgccgccgccgcaagaatggctgataattgttttatttgaggcg
aggctgtgtttgcttttttggggggcgatggtcggaggattgcggcagaggaaggctgcg
atgtttggcgccgagaggggggcttcatatgattactagcaatgtgtgatgagatatgaa
taatgaaatttgctatttctcctaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

Figure 3.1 Nucleotide and deduced amino acid sequences of the EgrPAAPA cDNA. The stop codon is indicated by a short dash. The PAAPA motif and pairs of glutamic acid are underlined. The position of the amino acid substitutions in the two alleles is indicated above the amino acid sequence.

The deduced amino acid sequence of EgrPAAPA was compared against predicted amino acid sequences in published databases using tBLASTX. No significant similarity was detected with proteins in the arabidopsis and poplar genome sequences using the default E value cut-off. However, a nucleotide BLAST search detected significant similarity with eucalypt ESTs. EgrPAAPA was identical to an E. grandis mRNA (CB968035) and shared 92% identity with an E. gunnii sequence (CT985875), both ESTs being derived from differentiating xylem.

Further sequence analysis using Protein BLAST (Swiss-Prot) revealed that EgrPAAPA had 35% amino acid similarity with an allergen-related glutamic acid rich protein (Hev b 5, Q39967) from Hevea brasiliensis. Several other glutamic acid rich protein sequences retrieved from public databases shared amino acid similarity with EgrPAAPA, including cDNAs isolated from grape (43%, Grip31, CAB85626), cassava (42%, C1, AAM55492), chickpea (39%, Cap28, CAA12357) and buckwheat (41%, T10699). Figure 3.2 shows the alignment of the EgrPAAPA protein with other glutamic acid rich proteins. A comparison of the amino acid composition of EgrPAAPA with other glutamic acid rich proteins is shown in Table 3.1. Apart from glutamic acid, most of the proteins also contain relatively high proportions of proline, alanine, valine and threonine.


EgrPAAPA    1 MASVEVAQATTLPEEKAPEVTETEEALAEEVAT--PAVTEEPVVDAPAAEESVPAVESET
Cap28       1 MASVEVAHQAPVTAVQENEPTTEVINTPETITEQQPATEVPEATEQPEAEVPATEETIEE
Grip31      1 MATAEVVSATPALSEEKTEESVKAEETPVEEVAAAPPTPEPVAEEPKEAETAAVPEESAA
C1          1 MATAEVVTAQTALPEEKPAEEVKVSEIVTEEAAPA---VEPVAEEPKEAEPVAVSEEPKE
T10699      1 MAAVEVEPNTTTLQENE----KSEVAQVEEVAA-VEQVETPAVEEAPAAVEEAPAAVEEA

EgrPAAPA   59 PAEVETKEVVEETKAEEPAEPAVA--EETTRELDITETPAAPAEAEAPAAPAKEEAPAAP
Cap28      61 PTETAKEEAPAAPETQDPLEVETK--EVVTEEAKE-ENTEAPKETEESPEEVKEEA----
Grip31     61 PEAEAPADQDETKEVVEQVEVET---KEVVEKTDQAVVEEPAVEKTEE--VPEETPDQE-
C1         58 ADDAPAEVAVETKEVVEVEEAKTVTEEPTVEKTEEE--EETPKEETPEPVVVKETPKEEP
T10699     56 VEEEAPAAIEEAPAAVEEEAPAAV--EEAVAETKEVVEEEVKAAEEAPTPEAEEEK----

EgrPAAPA  117 AEETPAAPAEVDVPAA------PAKEEAPVAADSTPAAPAEEVAPAAPAEETPAKVEAPA
Cap28     114 AVEEPIATIENESAAP------PAPPATAEVEEEKENKPVEPVE-APPAAVAEVPIETPE
Grip31    115 -TNSPVVEETKEATEP---AEEPAPKPEPAPADEAPKEEGPAAEEE--EKPAEAKVKVET
C1        116 AAETVVVEAPKETTEAATEAEAPAPESAPASASETPAEEEVPKEEEGDEKKSEAEVEAEK
T10699    110 ----KVEEVEEEKQQV------VEAEEAPAAVEAPAEEAPAVVEEKAEEATTEAAVEKAE

EgrPAAPA  171 AE---
Cap28     167 A----
Grip31    169 DEKAE
C1        176 TE---
T10699    160 -----

Figure 3.2 Comparison of the deduced amino acid sequences of EgrPAAPA with other glutamic acid rich proteins. Numbers indicate the position of the amino acid residues in each protein. Black boxes indicate identical amino acid residues and grey boxes indicate similar amino acids in three or more sequences.


Table 3.1 Amino acid composition of EgrPAAPA and other glutamic acid rich proteins. Composition is shown as a percentage.

Amino acid   EgrPAAPA   Cap28   Grip31   C1     Hev b 5   T10699
Pro          15.1       14.3    12       11.3   15.2      7.5
Ala          28.5       15.5    16.8     17     18.5      24.5
Glu          24.5       28.5    28.9     31.6   30.5      34
Val          10.5       10.7    12.1     13.6   4.6       15.7
Thr          9.3        11.3    8.7      10.2   13.2      5
Ser          2.3        1.8     2.9      3.4    3         0.6
Lys          2.1        4.2     7.5      8.5    8.6       5
Others       7.5        13.1    11.1     4.4    6.4       7.7
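
The composition percentages in Table 3.1 and the number of PAAPA motifs can be recomputed directly from a deduced protein sequence. The following is a minimal sketch in Python, assuming the EgrPAAPA sequence is taken from the Figure 3.2 alignment with gap characters removed; the function names are illustrative only and do not correspond to any software used in this study.

```python
from collections import Counter

# EgrPAAPA sequence as it appears in the Figure 3.2 alignment, gaps removed
# (one string per alignment block).
EGRPAAPA = (
    "MASVEVAQATTLPEEKAPEVTETEEALAEEVATPAVTEEPVVDAPAAEESVPAVESET"
    "PAEVETKEVVEETKAEEPAEPAVAEETTRELDITETPAAPAEAEAPAAPAKEEAPAAP"
    "AEETPAAPAEVDVPAAPAKEEAPVAADSTPAAPAEEVAPAAPAEETPAKVEAPA"
    "AE"
)


def composition(seq):
    """Return {residue: percentage of the sequence} for a protein sequence."""
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def count_motif(seq, motif):
    """Count occurrences of `motif` in `seq`, allowing overlapping matches."""
    width = len(motif)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == motif)


if __name__ == "__main__":
    comp = composition(EGRPAAPA)
    print("length:", len(EGRPAAPA))
    for aa in "PAEVT":  # Pro, Ala, Glu, Val, Thr
        print(aa, round(comp[aa], 1))
    print("PAAPA motifs:", count_motif(EGRPAAPA, "PAAPA"))
```

Run as a script, this should report a 172 residue protein with seven PAAPA motifs, in line with the text; the printed percentages can be compared with the EgrPAAPA column of Table 3.1.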


3.3.2 Southern blot analysis

Southern blot analysis was carried out to determine the copy number of the EgrPAAPA gene in the eucalypt genome. Genomic DNA was digested with the restriction enzymes SacI, which does not cut within the EgrPAAPA cDNA, HindIII, which cuts once, and XhoI, which cuts twice within the cDNA. Figure 3.3 shows a Southern blot hybridized with the full length 32P-labelled EgrPAAPA cDNA clone under highly stringent conditions. As expected, restriction digestion with SacI and HindIII resulted in one and two bands respectively, indicating that there is a single copy of the EgrPAAPA gene in the Eucalyptus grandis genome. A single hybridizing band was observed in DNA digested with XhoI, which cuts the EgrPAAPA cDNA at nucleotide positions 631 and 878. It is likely that XhoI sites close by in adjacent genomic regions gave rise to smaller XhoI DNA fragments which may have run off the gel.
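
The reasoning behind the expected band numbers can be sketched as follows. This is an illustrative Python sketch only: it assumes that the probe spans the whole cDNA, that cut positions are given in cDNA coordinates, and that no additional sites occur within introns. The function names are hypothetical.

```python
def expected_band_count(cuts_within_probe, gene_copies=1):
    """Hybridizing fragments expected for a probe containing the given
    number of restriction sites.

    Each internal cut splits the probed region into one more fragment,
    so a single-copy gene gives cuts + 1 hybridizing fragments.
    """
    return gene_copies * (cuts_within_probe + 1)


def internal_fragment_sizes(cut_positions):
    """Sizes (bp) of the fragments lying between consecutive internal cuts."""
    ordered = sorted(cut_positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


if __name__ == "__main__":
    for enzyme, cuts in [("SacI", 0), ("HindIII", 1), ("XhoI", 2)]:
        print(enzyme, expected_band_count(cuts))
    # XhoI cuts the cDNA at positions 631 and 878.
    print(internal_fragment_sizes([631, 878]))  # [247]
```

Under these assumptions a single-copy gene would give one, two and three bands for SacI, HindIII and XhoI respectively. The internal XhoI fragment would be only 247 bp in cDNA coordinates (longer if an intron lies between the two sites), which is consistent with small fragments having run off the gel.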

[Figure 3.3 image: Southern blot; a 900 bp size marker is indicated.]

Figure 3.3 Southern blot analysis of the EgrPAAPA gene. Five µg of E. grandis genomic DNA was digested with HindIII, SacI and XhoI, respectively and electrophoretically separated in each lane. Hybridisation was performed under high stringency conditions using the full length EgrPAAPA cDNA as probe.


3.3.3 Expression analysis of EgrPAAPA

To investigate the expression pattern of EgrPAAPA, northern blot analysis was performed on total RNA isolated from a range of eucalypt tissues. A hybridizing band of approximately 1.35 kb in size was observed, which is approximately the expected size of the EgrPAAPA cDNA (Figure 3.4). Strongest expression of EgrPAAPA was observed in roots and in xylem from the upper sides of branches. Weak expression was observed in xylem from the lower side of branches and weaker expression was detected in floral buds. Expression was not detected in leaves.

[Figure 3.4 image: northern blot lanes UX, LX, L, FB and R; EgrPAAPA band at 1.35 kb; rRNA loading control.]

Figure 3.4 Northern blot analysis of the EgrPAAPA gene. Labelled EgrPAAPA cDNA was hybridized to total RNA extracted from upper branch xylem (UX), lower branch xylem (LX), leaves (L), floral buds (FB) and roots (R). Exposure time was 24 hours. Ethidium bromide stained rRNAs are shown to demonstrate equal loading of the respective lanes.


3.4 DISCUSSION

The full length transcript of EgrPAAPA encodes a short 172 amino acid protein rich in alanine, glutamic acid and proline residues. Southern blot analysis revealed that E. grandis has a single copy of the EgrPAAPA gene. Northern analysis revealed that EgrPAAPA is most strongly expressed in vascular tissues. There are currently no sequences in public databases, outside of the genus Eucalyptus, which have close similarity with the EgrPAAPA gene. The deduced amino acid sequence does, however, have a number of interesting features which give clues to the possible function of the gene.

Northern blot hybridisation revealed that the EgrPAAPA gene is expressed strongly in xylem tissue from stems and in root tissues. Weak expression was also observed in floral buds and no expression was detected in leaves. The difference in expression between upper and lower xylem in eucalypt branches suggests EgrPAAPA may be involved in developmental processes that impact on cell wall properties. Upper branch xylem is likely to be synthesizing proportionally more fibre cells and fewer vessels than the lower branch xylem (Scurfield and Wardrop, 1962). In addition, upper branch xylem has been observed to contain a higher proportion of cellulose and cellulose microfibrils more closely aligned to the fibre axis (Qiu et al., 2008).

The predicted EgrPAAPA protein contains a high proportion of alanine, glutamic acid, proline, valine and threonine. The repeating PAAPA motif and the high percentage of proline (more than 15%) suggested that EgrPAAPA may be a hydroxyproline-rich glycoprotein (HRGP) that plays a structural role (José-Estanyol and Puigdomenech, 2000; Sommer-Knudsen et al., 1998). In addition, Ala-Pro and Pro-Ala sequences are frequently heavily glycosylated following post-translational hydroxylation of prolines by prolyl hydroxylases (Sommer-Knudsen et al., 1998; Showalter and Varner, 1989). However, the amino acid sequence of EgrPAAPA does not clearly fall into any of the five classes of higher plant HRGPs, namely extensins, AGPs, proline-rich proteins (PRPs), hybrid PRPs (HyPRPs) and solanaceous lectins (Showalter, 1993; Sommer-Knudsen et al., 1998; José-Estanyol and Puigdomenech, 2000). For example, extensins are Hyp-rich basic glycoproteins containing a number of copies of the pentapeptide Ser-(Hyp)4 motif (Sommer-Knudsen et al., 1998), which is absent from the EgrPAAPA protein. The protein moieties of arabinogalactan proteins (AGPs), another class of cell wall glycoprotein, contain Hyp/Pro-Ala or Ala-Hyp/Pro motifs that are likely to be present in the EgrPAAPA protein, but other characteristics of AGPs are absent (Gao et al., 1999; Showalter, 2001; Schultz et al., 2002). Most of the well studied AGPs appear to have a signal peptide, a central Pro-Ala repeat motif and a C-terminal domain with or without cysteines (José-Estanyol and Puigdomenech, 2000). The most compelling evidence that EgrPAAPA is not a basic glycoprotein is that it lacks a signal peptide, which is required for targeting the protein to the ER. In every other respect the EgrPAAPA protein resembles an arabinogalactan protein. The most common protein moiety of proline-rich glycoproteins (PRPs) is Pro-Pro-Xaa-Yaa-Lys, where Xaa is usually Val, His, Thr or Ala and Yaa is normally Tyr, Thr, Glu or Pro (Sommer-Knudsen et al., 1998). EgrPAAPA is rich in Glu, Ala, Pro, Val and Thr, but lacks the protein motifs commonly observed in other PRPs. It is possible that EgrPAAPA is a novel PRP with a repeat motif that has not been observed in other plant proteins. Recent proteomic approaches in Arabidopsis have revealed that approximately 5000 genes encode proteins with a predicted signal peptide that targets them to the secretory pathway, whereas about 500 encode extracellular proteins which are weakly bound to the matrix (Jamet et al., 2006). Proteins in the latter group are not directly involved in cell wall assembly and modification; they are acidic and are located between the plasma membrane and the cell wall. The EgrPAAPA protein may belong to this class of cell wall protein.

Other features of the EgrPAAPA protein sequence which may provide clues to the possible function of the gene include the high proportion of glutamic acid pairs scattered throughout the protein and the occurrence of seven “PAAPA” motifs in the C-terminal half of the protein. EgrPAAPA shares some amino acid similarity with a number of glutamic acid rich proteins. These include Cap28, which is expressed during the early development of young chickpea epicotyls (Romo et al., 2002). Recent transcription studies of Cap28 indicated a role in the development of the embryonic axes during germination (Hernández-Nistal et al., 2006). A cassava gene (c54) encoding a glutamic acid rich protein was found to be expressed more actively in phloem and the vascular cambium, suggesting it may be involved in secondary cell wall development of cassava (Zhang and Bohl-Zenger, 2003). Currently, no hypothesis has been postulated for the role of the glutamic acid pairs (Hernández-Nistal et al., 2006). It appears likely that glutamic acid rich proteins are involved in a diverse range of cellular functions such as cell aggregation (Galán et al., 2004), vascular development (Zhang and Bohl-Zenger, 2003), fruit ripening (Akasawa et al., 1996; Fowler et al., 2000), and development (Romo et al., 2002; Taylor, 1992).

A striking feature of the predicted EgrPAAPA protein is the presence of seven ‘PAAPA’ motifs in its C-terminal region, which are typically separated by short regions rich in glutamic acid. The repeating ‘PAAPA’ motif generally does not occur in the other glutamic acid rich proteins with closest similarity to EgrPAAPA; however, a single PAAPA motif appears in the allergen-related glutamic acid rich protein (Hev b 5) from Hevea brasiliensis. All of these proteins are rich in ‘PA’ and ‘AP’ amino acid pairs. Further sequence analysis using the SignalP 3.0 server (Nielson et al., 1999) did not reveal evidence of a signal peptide. Similarly, no signal peptide was found in the Cap28 sequence, which has a similar sized N-terminal hydrophobic sequence (Romo et al., 2002). No potential glycosylphosphatidylinositol (GPI) modification site was identified in EgrPAAPA, indicating that the C-terminal region of the protein is unlikely to be GPI anchored (Eisenhaber et al., 2003).

EgrPAAPA appears to be a novel eucalypt gene. It is most similar to glutamic acid rich proteins which have been found in a number of other plant species; however, EgrPAAPA stands apart from other plant proteins by the occurrence of seven ‘PAAPA’ motif repeats in the C-terminal region of the protein. This repeated motif suggests that the EgrPAAPA protein is likely to play a structural role in the cell wall. The stronger expression of EgrPAAPA in upper compared to lower branch xylem suggests the protein may play a role in cell wall modifications taking place in these tissues, including changes in cellulose content and microfibril orientation. Further insight into the functional role of EgrPAAPA may be obtained by disrupting the gene in transgenic eucalypt tissues. This could be done using induced somatic sector analysis (ISSA), which was recently used to confirm the effect of disrupting β-tubulin expression on cellulose microfibril orientation in eucalypts (Spokevicius et al., 2007).


Chapter 4

Genetic diversity and association mapping of candidate genes in E. nitens

4.1 INTRODUCTION

Candidate gene-based association studies are well suited to the dissection of complex traits in tree species (Neale and Savolainen, 2004). The life histories of many forest tree species have frequently given rise to large unstructured populations, adequate levels of genetic diversity and rapid decay of linkage disequilibrium. This makes them ideally suited for association studies aimed at revealing the genetic basis of complex traits such as wood properties. First developed for the examination of the genetic basis of human disease, this approach has recently been successfully transferred to plants (Olsen et al., 2004), and the discovery of an association between alleles of the cinnamoyl CoA reductase (CCR) gene and microfibril angle (MFA) in eucalypts demonstrates the potential for its application to woody plants (Thumma et al., 2005). Evidence of the low extent of linkage disequilibrium (LD) in many woody angiosperms further supports the use of candidate gene-based approaches to LD mapping in trees (Brown et al., 2004; Ingvarsson, 2005).

Compared with quantitative trait loci (QTL) mapping, association mapping can identify individual genes and alleles that are responsible for phenotypic differences in quantitative traits (Neale and Savolainen, 2004). Association studies can be applied to identify the single-nucleotide polymorphisms that are responsible for variation in phenotypes (quantitative trait nucleotides, QTNs). Thumma et al. (2007) have recently detected a SNP in a cobra-like gene in E. nitens that is associated with cellulose content and pulp yield, and SNP-based functional studies indicate that the significant SNP could be the cis-regulatory polymorphism controlling allelic expression imbalance (Thumma, CSIRO Forest Biosciences, pers. comm.).

In the previous chapters several candidate genes were identified that may influence pulp yield in E. nitens. This chapter describes nucleotide diversity and association analysis of three selected candidate pulp yield genes, EgrCesA3, EgrNAM1 and EgrHB1, and the novel EgrPAAPA gene. Single nucleotide polymorphisms (SNPs) were identified in these genes as these are abundant markers best suited for association mapping of genes controlling complex traits (Rafalski, 2002; Carlson et al., 2005). Investigations were directed at (i) the genomic structure of the four candidate genes, (ii) identification of common SNPs, (iii) analysis of nucleotide diversity and LD and (iv) selection of haplotype-tagging SNPs and their use in association mapping. Evidence of associations between SNPs in the EnHB1 transcription factor and cellulose content, pulp yield and MFA is presented. Associations between a SNP in the novel EnPAAPA gene and MFA are also described.

4.2 MATERIALS AND METHODS

4.2.1 Plant material

Three hundred unrelated E. nitens trees (association population) growing at Meunna, Tasmania were used for association analysis. This is the same population that was used for CCR gene association studies (Thumma et al., 2005). Seeds for this trial were originally collected from more than 400 unrelated trees in natural populations in the Victorian central highlands. Sixteen randomly selected trees from this population were used for sequencing and SNP discovery in EnCesA3 and EnNAM1. For EnHB1 and EnPAAPA, bulk DNA from 298 trees was used for SNP discovery.

4.2.2 Wood trait measurements

Six phenotypic traits, namely pulp yield, cellulose content, MFA, total lignin content, Klason lignin and density, were used for association mapping. Trait data were collected at CSIRO Forest Biosciences, Clayton. Pulp yield, cellulose content, total and Klason lignin were estimated by NIRA analysis (details in Chapter 2 Materials and Methods section). MFA and density were measured using SilviScan analysis. Details of sampling and laboratory processing for SilviScan are given in Qiu et al. (2008).

4.2.3 DNA preparation

Genomic DNA was extracted from mature leaves from 298 individuals by Charlie Bell and Bala Thumma (CSIRO Forest Biosciences, Canberra) using a standard method (Glaubitz et al., 2001). These DNA samples were diluted to 10 ng/µl and used for SNP identification in EnCesA3 and EnNAM1. Pooled genomic DNAs were used for SNP identification in EnHB1 and EnPAAPA. Genomic DNA from all 298 trees was pooled and purified by Colleen MacMillan (CSIRO Forest Biosciences, Canberra) using a Qiagen DNeasy maxi kit (Hilden, Germany). The final concentration of pooled genomic DNA was ~45 ng/µl.


4.2.4 Primer design

Primers were designed from each cDNA sequence using the Primer3 software (Rozen and Skaletsky, 2000; available at http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi) in order to amplify overlapping PCR fragments covering most of each gene. Forward and reverse gene-specific primers are listed in Table 4.1.

Table 4.1 PCR amplification primers used for SNP discovery.

Genes      Fragment  Primers      F/R  Sequences (5'-3')          Amplified product (bp)
EgrCesA3   Frag 1    EgrCesA3L1   F    AGCCGGAGCTGGACTTGT         5234
                     EgrCesA3R3   R    TGATCCGTTCGTGCTAAAGA
                     EgrCesA3L2   F    AGAAAGAGGGAGGGTGGAAA
                     EgrCesA3R4   R    TATGTGGGCACTGGATGTGT
                     EgrCesA3L3   F    TTGATGCAGACGGAAATGAG
                     EgrCesA3R5   R    AGGTGGAGTGGAGTGAGCAT
EgrHB1-15  Frag 1    EgrHB1-1     F    GTATTGCTTTTTCTCCATTTTGC    1532
                     EgrHB1-21    R    TCAATAACTTGTTCATCGCAGTG
           Frag 2    EgrHB1-22    F    TGAATATTTCTGGAGATGGCAAC    2503
                     EgrHB1-24    R    ATTGTGGAGGACTCATACAATGG
           Frag 3    EgrHB1-18    F    AATACACAAAATGGTCCAAGCAT    2450
                     EgrHB1-32    R    ATACGAAACCCAGAAGGAAGAAG
           Frag 4    EgrHB1-30    F    GTTAAGCAGGGGATTTAATGAGG    2320
                     EgrHB1-31    F    CCACAGGTTCTTGAGAATCAACT
           Frag 5    EgrHB1-16    F    ACTATCGCATTTGAGTTTGCATT    1770
                     EgrHB1-17    R    CACTACACAGACATGGCAAAAGA
EgrNAM1    Frag 1    EgrNAM1-P3   F    GGTAACACTGATGGAACACCATT    1952
                     EgrNAM1-P4   R    TATCTCTGTACCCCACCAGAGAA
           Frag 2    EgrNAM1-P5   F    GAAGAGTCCACCAGTCAGAAGAA    1465
                     EgrNAM1-P6   R    GCGTACCCTATGCCATATGTAAA
EgrPAAPA   Frag 1    EgrPAAPA1    F    CTAAACCAGCCCTCATTTGC       2282
                     EgrPAAPA2    R    AACATCGCAGCCTTCCTCT
           Frag 1    EgrPAAPA1    F    CTAAACCAGCCCTCATTTGC       1432
                     EgrPAAPA5    R    GCGCTTCCTCTGTCTCTGTC

EgrCesA3 primers were designed by Bingyu Zhang, CSIRO Forest Biosciences, Yarralumla. The first EgrCesA3 primer pair produced a 5234 bp product. The remaining primer pairs for this gene were used for internal sequencing.


4.2.5 PCR amplification

Most of the coding region of EnCesA3 was amplified from sixteen individuals using PfuTurbo DNA polymerase (Stratagene, USA) with methods similar to those described below. The amplification and sequencing were carried out by Bingyu Zhang, a post-doctoral fellow visiting CSIRO Forest Biosciences, Canberra from the Chinese Academy of Forestry. EnHB1 and EnPAAPA genomic DNA was amplified in PCR reactions containing 2 µl of 10 x buffer, 1 µl of 10 mM dNTP, 1.6 µl of 25 mM MgCl2, 0.5 µl of each primer (20 µM), 1 µl pooled genomic DNA (~45 ng/µl), 12.9 µl distilled water and 0.5 µl Taq F2 DNA polymerase (5 units/µl; Fisher Biotech, Australia). The amplifications were performed on an ABI thermal cycler (GeneAmp® PCR System 2700) with initial denaturing at 94°C for 2 min, followed by 35 cycles of 94°C for 30 s, a primer-specific annealing temperature for 30 s and 72°C for 2-3 min, and a final 10 min extension at 72°C. The EnNAM1 genomic DNA was amplified from DNA from sixteen randomly selected trees using PfuTurbo DNA polymerase (Stratagene, USA). The 100 µl PCR reactions contained 10 µl of 10 x PCR buffer, 1 µl of 25 mM each dNTPs, 2.5 µl of 20 µM each primer mix, 2 µl of 100 ng/µl DNA template, 2 µl of 2.5 U/µl PfuTurbo DNA polymerase (Stratagene, USA) and 80 µl of distilled water. DNA was amplified using the cycling conditions described above. All PCR products were confirmed by gel electrophoresis using a 1% agarose gel and purified using a QIAGEN gel extraction kit (Hilden, Germany). Purified PCR products were quantified by gel electrophoresis by comparing with a 100 bp gene ladder (Fermentas, Australia).
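
The final concentrations implied by the Taq reaction volumes above can be checked with a short calculation. The following Python sketch is illustrative only: the dictionary keys and function name are hypothetical, and it assumes that "0.5 µl of each primer" means one forward and one reverse primer.

```python
def final_concentration(stock_conc, volume_ul, total_ul):
    """Concentration after diluting `volume_ul` of stock into `total_ul`
    (returned in the same units as `stock_conc`)."""
    return stock_conc * volume_ul / total_ul


# Component volumes (µl) of the Taq reaction as listed in the text.
taq_mix_ul = {
    "10 x buffer": 2.0,
    "10 mM dNTP": 1.0,
    "25 mM MgCl2": 1.6,
    "forward primer (20 µM)": 0.5,   # assumption: one forward primer
    "reverse primer (20 µM)": 0.5,   # assumption: one reverse primer
    "pooled genomic DNA (~45 ng/µl)": 1.0,
    "distilled water": 12.9,
    "Taq F2 polymerase (5 U/µl)": 0.5,
}

total_ul = sum(taq_mix_ul.values())

if __name__ == "__main__":
    print("total volume (µl):", round(total_ul, 1))
    print("MgCl2 (mM):", round(final_concentration(25, 1.6, total_ul), 2))
    print("dNTP (mM):", round(final_concentration(10, 1.0, total_ul), 2))
    print("each primer (µM):", round(final_concentration(20, 0.5, total_ul), 2))
    print("Taq (U per reaction):", 5 * 0.5)
```

Under these assumptions the listed volumes add up to a 20 µl reaction containing 2 mM MgCl2, 0.5 mM dNTP and 0.5 µM of each primer.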


4.2.6 Cloning of PCR products

All purified DNA amplicons were ligated into the pGEM-T Easy vector using the pGEM-T Easy vector kit according to the manufacturer’s methods (Promega, USA). Two microliters of each ligation reaction was transformed into bacterial cells (JM109 or DH5α) by heat-shocking for 50 s at 42°C, plated out on LB/ampicillin/IPTG/X-Gal medium and incubated at 37°C overnight. Twenty-four white colonies were picked and cultured in 5 ml LB/ampicillin medium overnight at 37°C with vigorous shaking, and plasmid DNA was isolated using the QIAprep Spin Miniprep kit (Hilden, Germany). In the case of EnNAM1, DNA from 16 trees was ligated, transformed and plated out separately following the procedure described above. A single positive colony was picked from each individual and cultured for plasmid DNA preparation. The sizes of all inserts were verified by digestion with NotI followed by gel electrophoresis. The digestion reactions were carried out in a 37°C water bath and consisted of 1 µl buffer D, 0.1 µl BSA, 1 µl plasmid DNA, 0.20 µl NotI (Invitrogen, USA) and 7.70 µl distilled water.

4.2.7 Sequencing

Twenty-four different amplicons of each fragment of EnHB1 and EnPAAPA, and 18 amplicons of each fragment of EnNAM1, were sequenced in both directions using pGEM-T Easy vector Forward (5’ GTAAAACGACGGCCAGT 3’) and Reverse (5’ CAGGAAACAGCTATGAC 3’) primers. Further sequences of large gene fragments were obtained using internal primers. Sequencing was carried out using BigDye Terminator version 3.1 reagents following the manufacturer’s protocols. The products of the sequencing reaction were precipitated using ethanol, dried down under a vacuum and sent to the John Curtin School of Medical Research (JCSMR) for gel separation. Sequences were verified manually and contigs were assembled using the computer software program MEGA version 3.1 (Kumar et al., 2004). Multiple sequence alignments were made using the same program and adjusted manually. All chromatograms and SNPs were visually checked using Sequencher 4.6 (Gene Codes Corporation, Ann Arbor, Michigan, USA) to exclude any sequencing errors.

4.2.8 Statistical analysis

4.2.8.1 Nucleotide and haplotype diversity analysis: Numbers of polymorphic sites (S) and of synonymous and non-synonymous sites were calculated using the DNASP 4.0 software (Rozas et al., 2003), excluding indels. Synonymous and non-synonymous sites were verified manually by comparing the predicted amino acid sequences of the genes using the MEGA version 3.1 translation program. Nucleotide diversity was estimated as θw, from the number of segregating sites (S) (Watterson, 1975, Equation 1.4a), and as π, based on the average number of nucleotide differences per site between sequences (Nei, 1987). These parameters were computed at three levels: (i) the entire sequence; (ii) non-coding sequences (including introns and the 3’ and 5’ untranslated regions (UTRs)); and (iii) coding sequences. The number of haplotypes, haplotype diversity and the minimum number of historical recombination events (RM, based on the 4-gamete test from coalescent simulations) were also calculated using the DNASP 4.0 software (Rozas et al., 2003). Haplotype diversity (Hd) was estimated using Equation 8.4 in Nei (1987), except that n was used instead of 2n.
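
The diversity statistics described above can be illustrated with a small worked example. The following Python sketch is not the DnaSP implementation; it assumes an alignment of equal-length sequences with indels already excluded, uses the standard per-site definitions of θw (S divided by the harmonic number a_n and by sequence length) and π (mean pairwise differences per site), and computes Hd as n/(n-1) × (1 − Σp²). The function names and the toy alignment are hypothetical.

```python
from collections import Counter
from itertools import combinations


def segregating_sites(seqs):
    """Number of alignment columns with more than one allele (S)."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def watterson_theta(seqs):
    """Watterson's estimator per site: S / (a_n * L), a_n = sum_{i=1}^{n-1} 1/i."""
    n, length = len(seqs), len(seqs[0])
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / a_n / length


def nucleotide_diversity(seqs):
    """pi per site: mean number of pairwise differences divided by length."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / len(pairs) / length


def haplotype_diversity(seqs):
    """Hd = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


if __name__ == "__main__":
    toy = ["AAAA", "AAAT", "AATT", "ATTT"]  # hypothetical 4-sequence alignment
    print("S =", segregating_sites(toy))
    print("theta_w =", round(watterson_theta(toy), 4))
    print("pi =", round(nucleotide_diversity(toy), 4))
    print("Hd =", round(haplotype_diversity(toy), 4))
```

For the toy alignment, S = 3, θw = 3/(11/6)/4 ≈ 0.409, π = (10/6)/4 ≈ 0.417 and Hd = 1, since all four haplotypes are distinct.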


4.2.8.2 Neutrality tests: The neutrality test statistics D (Tajima, 1989), D* and F* (Fu and Li, 1993) were calculated to ascertain whether or not genes followed the model of neutral evolution (Kimura, 1983). The D test is based on the difference between the number of segregating sites and the average number of nucleotide differences, whereas the D* test is based on the difference between the number of singletons and the total number of mutations. The F* test statistic is based on the difference between the number of singletons and the average number of nucleotide differences between pairs of sequences. The significance of each statistic was assessed at P < 0.05, 0.01 and 0.001 using two-tailed tests.
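
As an illustration of the D statistic, the sketch below computes Tajima's D from a small alignment using the standard constants of Tajima (1989). It is a minimal Python example under stated assumptions (equal-length sequences, indels excluded, at least one segregating site); it is not the DnaSP implementation, it does not compute D* or F*, and the function name and toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for a list of aligned, equal-length sequences."""
    n = len(seqs)
    # S: number of segregating sites.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    # k: average number of pairwise nucleotide differences.
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # D compares k with S / a1, scaled by the estimated standard deviation.
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    toy = ["AAAA", "AAAT", "AATT", "ATTT"]  # hypothetical alignment
    print(round(tajimas_d(toy), 3))
```

For the toy alignment, k = 10/6 and S/a1 = 18/11, so D is slightly positive (about 0.17); negative values would indicate an excess of rare variants relative to neutral expectation.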

4.2.8.3 Linkage disequilibrium (LD) analysis: LD between pairs of sites in candidate genes was calculated using the TASSEL software (E. S. Buckler IV, http://www.maizegenetics.net/bioinformatics/tasselindex.htm). The squared correlations of allele frequencies (r²) and their significance were measured for each alignment with a two-sided Fisher’s Exact test. Decay of LD with distance in base pairs (bp) between sites within the same candidate locus was evaluated by nonlinear regression (SAS genetics software version 9.1). Pooled r² values from each gene were plotted against distance, together with the expected value of r², E(r²), to describe the average within-gene decay of LD. According to Hill and Weir (1988), the expected value of r² is E(r²) = 1/(1 + 4Ner), where Ne is the effective population size and r is the recombination rate.
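
The two quantities used in the LD analysis can be sketched as follows. This Python example is illustrative only and is not the TASSEL or SAS procedure: it assumes phased haplotypes at two biallelic sites, and it assumes that the recombination rate between two sites is a per-base-pair rate multiplied by their distance. Function and parameter names are hypothetical.

```python
def r_squared(site_a, site_b):
    """Squared allele-frequency correlation between two biallelic sites.

    `site_a` and `site_b` hold the allele carried by each haplotype,
    in the same haplotype order.
    """
    n = len(site_a)
    ref_a, ref_b = site_a[0], site_b[0]
    p_a = sum(x == ref_a for x in site_a) / n
    p_b = sum(y == ref_b for y in site_b) / n
    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        raise ValueError("both sites must be polymorphic")
    p_ab = sum(x == ref_a and y == ref_b for x, y in zip(site_a, site_b)) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def expected_r2(ne, r_per_bp, distance_bp):
    """E(r^2) = 1 / (1 + 4*Ne*r), with r taken as r_per_bp * distance_bp."""
    return 1.0 / (1.0 + 4.0 * ne * r_per_bp * distance_bp)


if __name__ == "__main__":
    print(r_squared("AAGG", "CCTT"))  # complete LD
    print(r_squared("AAGG", "CTCT"))  # no LD
    for distance in (0, 100, 1000):
        # Ne and r_per_bp are arbitrary illustrative values.
        print(distance, round(expected_r2(1000, 1e-6, distance), 3))
```

In a decay analysis, the product 4Ne × (rate per bp) would be the single parameter fitted by nonlinear regression of observed r² on distance.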


4.2.8.4 Tagging SNPs: Haplotype tagging SNP (htSNP) analysis was performed to select SNPs for association mapping using a heuristic algorithm in htSNPer (Ding et al., 2005). Haplotype data were partitioned into blocks based on pairwise LD (|D’|) using the block definition of Reich et al. (2001), with the cut-off (α) for mean pairwise LD within blocks set at 0.8 and a minor allele frequency threshold of 0.05. Haplotype tag SNP performance criteria were estimated using the α-percent coverage method (Patil et al., 2001). This approach was adopted for choosing the SNPs for association mapping as it significantly reduces genotyping effort without much loss of power (Zhang et al., 2003).

4.2.9 SNP genotyping

Multiplex Ligation-Dependent Probe Amplification (MLPA):

SNPs in the EnNAM1 genomic sequence were genotyped using the MLPA method of Schouten et al. (2002). The MLPA method is outlined in Figure 4.1. In this approach three oligonucleotides are designed that bind to the DNA template immediately flanking the SNP. The two S oligonucleotides bind immediately upstream of the target SNP and contain an allele-specific nucleotide at the 3’ end and a common 19 nt PCR amplification sequence at the 5’ end. The M oligonucleotide binds immediately downstream of the SNP and contains target-specific sequence at the 5’ phosphorylated end and a PCR amplification sequence at the 3’ end. The S and M oligonucleotides are hybridised to the target DNA and then ligated together in a ligation reaction. When annealed to the target sequence, only the S oligonucleotide that matches the allele on that template is ligated to the common M oligonucleotide. This differential SNP-dependent ligation is detected by PCR using the amplification sequences in the S and M oligonucleotides. The sensitivity of the ligase to a mismatch next to the ligation site is used to distinguish two sequences differing in only one or a few nucleotides, such as with SNPs. Amplification products from the two alleles were distinguished by using S oligonucleotides of different lengths, giving different sized amplification products. Table 4.2 describes the oligonucleotides that were used for EnNAM1 SNP genotyping.

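[Figure 4.1 schematic]
Allele A template (…AGTCGAT…), S1 probe ((19)…AGTC) with M probe: ligation, x bp amplification product
Allele A template (…AGTCGAT…), S2 probe ((22)…AGTT) with M probe: no ligation, no amplification product
Allele B template (…AGTTGAT…), S1 probe with M probe: no ligation, no amplification product
Allele B template (…AGTTGAT…), S2 probe with M probe: ligation, x + 3 bp amplification product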

Figure 4.1 Schematic diagram of MLPA analysis of SNPs resulting in amplification of different sized products from the A and B alleles. See Materials and Methods for a description of the MLPA method.
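
The size-based allele discrimination can be illustrated with the SNP 1 probes from Table 4.2: the amplification product is the ligated S plus M oligonucleotide, so its length is the sum of the two probe lengths. The Python sketch below is illustrative only; the genotype-calling function, its 1 bp tolerance and the example peak sizes are assumptions and are not taken from the CEQ2000 fragment analysis software.

```python
# Probe sequences for SNP 1, copied from Table 4.2.
M_PROBE = "TCACTCTCGGATTTGTTGGATGAGGTCTTCAATAATCTCTAGATTGGATCTTGCTGGCAC"
S_PROBES = {
    "C": "GGGTTCCCTAAGGGTTGGATGAGCATGAGATC",
    "A": "GGGTTCCCTAAGGGTTGGACTGGTGAGCATGAGATA",
}


def product_sizes(s_probes, m_probe):
    """Expected product length (bases) per allele: len(S) + len(M)."""
    return {allele: len(seq) + len(m_probe) for allele, seq in s_probes.items()}


def call_genotype(peak_sizes, expected, tolerance=1.0):
    """Call a genotype from observed fragment sizes (hypothetical helper).

    An allele is scored as present if any observed peak lies within
    `tolerance` bases of its expected product size.
    """
    present = sorted(
        allele for allele, size in expected.items()
        if any(abs(peak - size) <= tolerance for peak in peak_sizes)
    )
    if not present:
        return None
    if len(present) == 1:
        return present[0] * 2          # homozygote
    return "".join(present)            # heterozygote


if __name__ == "__main__":
    expected = product_sizes(S_PROBES, M_PROBE)
    print(expected)                                # {'C': 92, 'A': 96}
    print(call_genotype([92.3], expected))         # CC
    print(call_genotype([91.8, 96.2], expected))   # AC
```

The computed sizes of 92 and 96 bases agree with the amplification product sizes listed for SNP 1 in Table 4.2.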


Table 4.2 Oligonucleotides used for genotyping SNPs in EnNAM1

SNPs   Probes (bs)                  Sequences (5'-3')  Amplification products (bs)

SNP 1  NAM-SNP-1                    TCACTCTCGGATTTGTTGGATGAGGTCTTCAATAATCTCTAGATTGGATCTTGCTGGCAC  92/96
       NAMLIG - SNP 1C              GGGTTCCCTAAGGGTTGGATGAGCATGAGATC
       NAMLIG - SNP 1A              GGGTTCCCTAAGGGTTGGACTGGTGAGCATGAGATA

SNP 2  NAM-SNP-2                    CCATCTTCCTTAATCTGCTAACTATGTCGTCAAATTTTGATTTCTAGATTGGATCTTGCTGGCAC  102/107
       NAMLIG - SNP 2A              GGGTTCCCTAAGGGTTGGAATCAATAGAATATAAGTA
       NAMLIG - SNP 2G              GGGTTCCCTAAGGGTTGGACTTGCATCAATAGAATATAAGTG

SNP 3  NAM-SNP-3                    TGAACAAATGTTTGATAGGCAGTTGAGATTATGTCAGATCAAATTTCTAGATTGGATCTTGCTGGCAC  109/112
       NAMLIG - SNP 3T              GGGTTCCCTAAGGGTTGGACTCCATTTTCACTTTTTGATAT
       NAMLIG - SNP 3A              GGGTTCCCTAAGGGTTGGAGTTCTCCATTTTCACTTTTTGATAA

SNP 4  NAM-SNP-4                    AGGGGTCAAAGATAATGTATGGTATGATGATCTTGTTGACAATAATAATTCTAGATTGGATCTTGCTGGCAC  117/120
       NAMLIG - SNP 4C              GGGTTCCCTAAGGGTTGGATCCCTGTTGTCTTTGCAGCCGGTCCC
       NAMLIG - SNP 4T              GGGTTCCCTAAGGGTTGGATCTTCCCTGTTGTCTTTGCAGCCGGTCCT

SNP 5  NAM-SNP-5                    CTCAAGTCCTACCACATTCAAGTGATTATGGAGCGCAGGGTACTGCTCCTAGTCTAGATTGGATCTTGCTGGCAC  124/127
       NAMLIG - SNP 5T              GGGTTCCCTAAGGGTTGGAGAACTCCGATCAAGATCAGGACACGTGACT
       NAMLIG - SNP 5C              GGGTTCCCTAAGGGTTGGAGTGGAACTCCGATCAAGATCAGGACACGTGACC

SNP 6  NAM-SNP-6                    TTCAGTCAACTGCTGAGGAGAGTTCTGTTAACCACAATGATCTAGATTGGATCTTGCTGGCAC  120/123
       NAMLIG - SNP 6-AAG           GGGTTCCCTAAGGGTTGGAGGCTCTAATGCAGAAG
       NAMLIG - SNP 6-AAGAAGAAG     GGGTTCCCTAAGGGTTGGAGGCTCTAATGCAGAAGAAGAAG


MLPA analysis:

Thirty nanograms of genomic DNA were used for EnNAM1 MLPA SNP assays. After preliminary assays of the probes, SNPs 3, 4 and 5 were genotyped in multiplex reactions for each of 298 trees from the Meunna association population. DNA samples were diluted with H2O to 7 µl and heated at 98°C for 5 minutes in 200 µl tubes in a thermocycler (GeneAmp PCR System 9700) with a heated lid. After cooling to 25°C, 1.5 µl of probe mix (~5 pmol of each oligonucleotide and each M13-derived oligonucleotide in H2O) and 1.5 µl of MLPA buffer (MRC-Holland, Amsterdam) were added to the denatured DNA samples. The samples were then heated at 95°C for 1 minute and incubated for 6 hours at 60°C. After cooling to 54°C, 11 µl of ligase-65 mix (MRC-Holland, Amsterdam) was added to each sample and the samples were incubated for 5 minutes at 54°C, then heated for 5 minutes at 98°C. The ligase-65 mix contained 1 µl of ligase-65 buffer, 1 µl of ligase-65 buffer B, 8.6 µl of H2O and 0.4 µl of ligase-65. Ten microliters of SALSA PCR buffer mixture (1.3 µl of buffer + 8.7 µl of H2O) was pipetted into a new tube and 4 µl of ligation reaction was added. The reaction mixture was heated to 60°C prior to adding 3.6 µl of PCR mixture to the reaction tube. The PCR mixture contained 0.8 µl of PCR primer, 0.8 µl of SALSA dilution buffer, 1.8 µl of H2O and 0.2 µl SALSA polymerase (MRC-Holland, Amsterdam). The PCR reaction was incubated using the following program: 35 cycles of 30 s at 95°C, 30 s at 60°C and 1 min at 72°C, followed by a 10 min extension at 72°C. Amplification products were mixed with D1-labelled CEQ DNA 400 size standards and separated in formamide on a Beckman CEQ2000 eight capillary electrophoresis system (Beckman Coulter, Mijdrecht, Netherlands). Data were analyzed with the CEQ2000 fragment analysis software and directly exported to Microsoft Excel for further analysis. To prevent cross contamination, all pipetting was performed using aerosol resistant filter tip pipettes (San Diego, California).

Illumina GoldenGate genotyping:

Selected SNPs from EnHB1 and EnPAAPA were genotyped by Path West Laboratory, Perth, Western Australia, using the Illumina GoldenGate assay. The GoldenGate assay is a combination of an oligonucleotide ligation and allele-specific extension reaction which is analysed using Illumina core technology (Engle et al., 2006). In common with the MLPA method, three specific oligonucleotide probes are designed to bind the DNA template flanking the SNP. The advantage of this method is that a large number of SNPs can be multiplexed in one reaction with highly specific extension and amplification (Shen et al., 2005).

4.2.10 Marker analysis for associations:

The General Linear Model (GLM) function in TASSEL (version 2.1, released November 2007) was used to carry out single-marker analyses to test for associations. The GLM function tests associations with a least squares fixed effects linear model. The software uses a Σ-restricted model, which yields a marker sum of squares that is a good test of marker effects even when the data are unbalanced (Searle, 1987). To correct for multiple testing, permutation tests were run in TASSEL.
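To illustrate the logic of a single-marker test with a permutation-based P value, a minimal sketch is given below. It treats each genotype class as a level of one fixed factor and computes the F statistic and R2 by least squares; it is not TASSEL's implementation, and the function names, the seed and the toy data in the usage example are assumptions made for illustration only.

```python
import random


def anova_f(groups):
    """One-way fixed-effects F statistic and R^2.

    `groups` is a list of phenotype lists, one per genotype class
    (at least two classes are required).
    """
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    ss_marker = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_error = ss_total - ss_marker
    if ss_error <= 0:  # no within-class variation: F is undefined, report infinity
        return float("inf"), 1.0
    f_stat = (ss_marker / (k - 1)) / (ss_error / (n - k))
    return f_stat, ss_marker / ss_total


def permutation_p(genotypes, phenotypes, n_perm=1000, seed=1):
    """Observed F and a permutation P value obtained by shuffling phenotypes."""

    def f_for(phen):
        classes = {}
        for g, y in zip(genotypes, phen):
            classes.setdefault(g, []).append(y)
        return anova_f(list(classes.values()))[0]

    observed = f_for(phenotypes)
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if f_for(shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated P value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    geno = ["AA"] * 3 + ["AG"] * 3 + ["GG"] * 3  # toy genotypes
    pheno = [1, 2, 3, 2, 3, 4, 5, 6, 7]          # toy phenotypes
    print(permutation_p(geno, pheno))
```

Shuffling phenotypes relative to genotypes breaks any true association while preserving both marginal distributions, which is why the permuted F values serve as an empirical null distribution.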


4.3 RESULTS

4.3.1 Genomic structure of EnCesA3, EnHB1, EnNAM1 and EnPAAPA

Genomic sequences for EnCesA3, EnHB1, EnNAM1 and EnPAAPA were obtained by

sequencing PCR amplification products from 16-18 alleles using primers matching the cDNA sequence of each gene. In total, ~ 16 540 bp of genomic DNA was amplified and sequenced.

EnCesA3

The structural organization of EnCesA3 was determined by comparing EnCesA3 cDNA and genomic DNA using MEGA version 3.1. The complete EnCesA3 genomic sequence yielded 5236 nucleotides comprising 13 exons and 12 introns. The sequenced genomic region ended 39 bp upstream of the stop codon. Figure 4.2.1 shows the exon and intron positions in EnCesA3.

Exon sizes (bp): E1 71, E2 194, E3 208, E4 92, E5 123, E6 267, E7 350, E8 264, E9 213, E10 214, E11 200, E12 354, E13 540.

Intron sizes (bp): I1 141, I2 342, I3 122, I4 167, I5 103, I6 468, I7 85, I8 93, I9 94, I10 117, I11 342, I12 95.

Figure 4.2.1 Genomic structure of EnCesA3. Exons and introns are indicated by green and grey boxes respectively. Numbers indicate the number of nucleotides contained within each intron and exon. ATG denotes the location of the start codon.


EnHB1

The EnHB1 genomic structure was derived from 5 overlapping fragments amplified from

genomic DNA. Compilation of overlapping fragments yielded a sequence of 7217 bp.

Comparison of the EgrHB1 cDNA and EnHB1 genomic DNA sequence revealed 18 exons and 17 introns in the transcribed unit of the gene. The exon/intron structure is illustrated in Figure 4.2.2. Exon size varied between 81 and 327 bp while intron size varied from 84 bp to 538 bp. The 5’ UTR was 855 bp and the 3’ UTR was 183 bp in length. The genomic structure of EnHB1 is similar to that of its Arabidopsis counterpart ATHB-15, as both contain 17 introns in their genomic sequences (Kim et al., 2005).

5’ UTR: 855 bp; 3’ UTR: 183 bp.

Exon sizes (bp): E1 227, E2 160, E3 101, E4 91, E5 98, E6 112, E7 86, E8 154, E9 76, E10 107, E11 192, E12 186, E13 81, E14 136, E15 327, E16 104, E17 150, E18 146.

Intron sizes (bp): I1 371, I2 235, I3 151, I4 223, I5 85, I6 84, I7 181, I8 81, I9 149, I10 89, I11 538, I12 119, I13 241, I14 122, I15 303, I16 355, I17 318.

Figure 4.2.2 Genomic structure of EnHB1. Exon positions are shown by green boxes and introns by adjoining lines. The 5’ and 3’ UTR are represented by white boxes. Numbers indicate the nucleotides contained within each intron and exon. ATG and TAG denote the location of the start and stop codon.


EnNAM1

The EgrNAM1 cDNA was not full length and thus the genomic structure of this gene is incomplete. The partial genomic structure of EnNAM1 comprised three exons and two introns spanning 1806 bp. Figure 4.2.3 shows the positions of the three exons and their sizes.

Exon sizes (bp): E1 319, E2 578, E3 312; 3’ UTR: 155 bp.

Intron sizes (bp): I1 366, I2 80.

Figure 4.2.3 Genomic structure of EnNAM1. Exon and intron positions are shown by green and grey boxes respectively. The 3’ UTR sequence is represented by a white box. Numbers indicate the nucleotides contained within each intron and exon. TAG denotes the location of the stop codon in the genomic sequence.


EnPAAPA

The EnPAAPA genomic sequence was obtained from 2 overlapping fragments amplified from genomic DNA. Assembly of the overlapping fragments yielded a 2281 bp nucleotide sequence. The genomic sequence of the EnPAAPA gene consists of two exons and one large intron and is illustrated in Figure 4.2.4. The intron starts 15 bases downstream of the start codon. The EnPAAPA gene has a 98 bp 5’ UTR and a 409 bp long 3’ UTR.

Region sizes (bp): 5’ UTR 98, E1 15, I1 1253, E2 506, 3’ UTR 409.

Figure 4.2.4 Genomic structure of EnPAAPA. Exon and intron positions are shown by green and grey boxes respectively. The 5’ and 3’ UTR are shown by white boxes. Numbers indicate the nucleotides contained within the intron and exons. ATG and TAA denote the location of the start and stop codon respectively.


4.3.2 Nucleotide diversity

Each gene fragment was sequenced in 7-24 individuals. Genetic diversity was examined

for the entire sequence of each gene, including non-coding regions (introns, 3’ and 5’

UTRs) and coding regions. Both indel and single nucleotide polymorphisms were

present in the sequenced regions. There were 19 insertion/deletion (indel)

polymorphisms and a total of 280 SNPs in the ~15 kb sequenced from the four candidate

genes. Nine indels and 73 SNPs were identified in the 5237 bp sequenced in EnCesA3. Indels were 1-5 bp in length and seven of them occurred in non-coding sequences. There were 150 SNPs and 4 indels in EnHB1 from 6870 bp of sequence. Indels in EnHB1 were 5-19 bp in length and all were located in non-coding regions. EnNAM1 and EnPAAPA indels were 2-6 bp in length and located in coding and non-coding regions. Most of the indels were singletons, with the exception of the indel from EnCesA3. All indels were

excluded from further analysis. Nucleotide diversity was estimated as θw (the scaled mutation rate θ = 4Nµ, where N and µ are the effective population size and mutation rate per site) and π (nucleotide differences per site) as described in Materials and Methods.

Estimates of nucleotide diversity for the four genes are summarised in Table 4.3.

Species-wide the level of total nucleotide diversity was moderate with θw = 0.0056 and π

= 0.0039 on average. In all four genes the average total diversity, θw and π varied from

0.00372 to 0.00799 and from 0.00167 to 0.00652 respectively. Diversity was slightly higher in EnHB1 and EnPAAPA than in the other genes (Table 4.3); however, these differences were not statistically significant by single-factor ANOVA. The value of π relative to θw was lower in EnHB1 than in the other three gene loci. Diversity values estimated for non-synonymous sites were


generally lower than silent sites (non-coding and synonymous) and average non-

synonymous to synonymous diversity ratios ranged from 0.14 to 0.27 (π) for EnCesA3 and EnHB1. EnNAM1 and EnPAAPA had higher non-synonymous to synonymous ratios

compared to EnCesA3 and EnHB1. EnPAAPA had about two to five fold higher levels of

non-synonymous diversity than the other three gene loci.
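The two diversity estimators reported above can be computed directly from a set of aligned sequences. The sketch below assumes gap-free, equal-length alignments of sampled alleles and uses the standard definitions (θw = S / (a_n L) with a_n the sum of 1/i for i = 1 to n-1, and π as the mean number of pairwise differences per site). The function names and toy alignment are illustrative assumptions; this is not the software used in this study.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Count alignment columns with more than one allele."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def watterson_theta(seqs):
    """Watterson's estimator per site: S / (a_n * L)."""
    n, length = len(seqs), len(seqs[0])
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / (a_n * length)


def nucleotide_diversity(seqs):
    """pi per site: mean pairwise differences divided by alignment length."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


if __name__ == "__main__":
    alignment = ["AAAA", "AAAT", "AATT", "AATT"]  # toy data, four alleles
    print(segregating_sites(alignment))
    print(round(watterson_theta(alignment), 5))
    print(round(nucleotide_diversity(alignment), 5))
```

Because θw depends only on the number of segregating sites while π weights each site by its allele frequencies, an excess of rare variants lowers π relative to θw.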


Table 4.3 Nucleotide diversity of selected candidate genes

Columns: Locus | N | Indel | Total (L S θw π) | Non-coding (LS S θw π) | Synonymous (LS S θw π) | Non-synonymous (LS S θw π)

EnCesA3
EnCesA3-Frag 1 | 14 | 1 | 748 18 0.00770 0.00819 | 470 15 0.00889 0.01031 | 261 0 0 0 | 261 3 0.00471 0.00274
EnCesA3-Frag 2 | 7 | 2 | 809 12 0.00609 0.00616 | 388 7 0.00769 0.00777 | 411 2 0.00907 0.00847 | 411 3 0.00381 0.00386
EnCesA3-Frag 3 | 16 | 1 | 734 13 0.00538 0.00402 | 462 11 0.00748 0.00558 | 264 2 0.00975 0.00755 | 264 0 0 0
EnCesA3-Frag 4 | 13 | 1 | 781 7 0.00138 0.00289 | 177 1 0.00107 0.00051 | 588 0 0 0 | 588 5 0.00348 0.00166
EnCesA3-Frag 5 | 12 | 2 | 639 10 0.00523 0.00452 | 207 1 0.00768 0.00663 | 423 6 0.02096 0.01599 | 423 3 0.00303 0.00263
EnCesA3-Frag 6 | 15 | 2 | 891 5 0.00176 0.00146 | 407 3 0.00302 0.00251 | 447 2 0.00604 0.00374 | 447 0 0 0
EnCesA3-Frag 7 | 15 | 0 | 635 8 0.00388 0.00264 | 95 1 0.00685 0.00568 | 537 4 0.00950 0.00662 | 537 3 0.00226 0.00098
Total (L or LS, S) | | 0 | 5237 73 | 2206 39 | 2931 16 | 2931 17
Average (θw, π) | | | 0.00449 0.00427 | 0.00610 0.00557 | 0.00790 0.00605 | 0.00247 0.00170

EnHB1
EnHB1-Frag 1 | 24 | 0 | 820 20 0.00657 0.00349 | 820 20 0.00657 0.00349 | 0 0 0 0 | 0 0 0 0
EnHB1-Frag 2 | 22 | 1 | 712 9 0.00353 0.00238 | 405 9 0.00531 0.00358 | 291 0 0 0 | 291 0 0 0
EnHB1-Frag 3 | 23 | 0 | 712 16 0.00610 0.00295 | 439 14 0.00402 0.00813 | 270 1 0.00444 0.00143 | 270 1 0.00042 0.00130
EnHB1-Frag 4 | 24 | 0 | 549 14 0.00683 0.00412 | 339 10 0.00752 0.00458 | 207 1 0.00506 0.00157 | 207 2 0.00348 0.00202
EnHB1-Frag 5 | 24 | 0 | 552 9 0.00437 0.00163 | 262 4 0.00477 0.00193 | 288 2 0.00714 0.00324 | 288 3 0.00377 0.00117
EnHB1-Frag 6 | 19 | 0 | 730 14 0.00552 0.00425 | 534 10 0.00544 0.00447 | 192 1 0.00647 0.00794 | 192 3 0.00581 0.00341
EnHB1-Frag 7 | 12 | 0 | 763 16 0.00694 0.00421 | 360 5 0.00581 0.00382 | 399 3 0.01039 0.00665 | 399 8 0.00873 0.00484
EnHB1-Frag 8 | 20 | 1 | 757 19 0.00752 0.00561 | 423 13 0.00788 0.00602 | 327 1 0.00362 0.00129 | 327 6 0.00679 0.00479
EnHB1-Frag 9 | 24 | 0 | 387 11 0.00761 0.00335 | 283 8 0.00867 0.00393 | 102 2 0.02060 0.00641 | 102 1 0.00352 0.00110
EnHB1-Frag 10 | 23 | 1 | 888 22 0.00688 0.00248 | 571 19 0.00848 0.00296 | 294 1 0.00397 0.00128 | 294 3 0.00360 0.00116
Total (L or LS, S) | | 3 | 6870 150 | 4436 112 | 2370 12 | 2370 27
Average (θw, π) | | | 0.00619 0.00345 | 0.00645 0.004291 | 0.00617 0.00298 | 0.00361 0.00198

EnNAM1
EnNAM1-Frag 1 | 18 | 2 | 1016 13 0.00377 0.00287 | 366 10 0.00566 0.00354 | 630 0 0 | 630 3 0.00181 0.00220
EnNAM1-Frag 2 | 18 | 0 | 391 5 0.00372 0.00167 | 87 0 0 0 | 303 0 0 0 | 303 5 0.00615 0.00276
Total (L or LS, S) | | 2 | 1407 18 | 453 10 | 933 | 933 8
Average (θw, π) | | | 0.00375 0.00227 | 0.00283 0.00177 | | 0.00398 0.00248

EnPAAPA
EnPAAPA-Frag 1 | 18 | 3 | 807 27 0.00985 0.00718 | 781 26 0.00965 0.00693 | 15 0 0 0 | 15 1 0.02357 0.02385
EnPAAPA-Frag 2 | 14 | 2 | 626 12 0.00612 0.00586 | 554 12 0.00666 0.00638 | 63 0 0 0 | 63 0 0 0
Total (L or LS, S) | | 5 | 1433 39 | 1335 38 | 78 | 78 1
Average (θw, π) | | | 0.00799 0.00652 | 0.00816 0.006655 | | 0.01179 0.01193

Total Average (Sp-wide) (θw, π) | | | 0.00556 0.00390 | 0.00615 0.00470 | 0.00557 0.00344 | 0.00404 0.00288

N = number of individuals sequenced, S = number of segregating sites, θw and π = nucleotide diversity estimates, L = gene fragment length in bp.


4.3.3 Haplotype diversity

The number of haplotypes and corresponding haplotype diversity, along with the minimum number of recombination events (RM) for each gene locus, are presented in Table 4.4. The average number of haplotypes and the average haplotype diversity per gene varied from 6.86 to 13 and from 0.654 to 0.968 respectively. Large variations in haplotype number and haplotype diversity were observed in EnPAAPA. In all four genes little evidence of strong haplotype structure was observed. One or more historical recombination events were detected in 11 of the 21 fragments. In EnCesA3 two of the seven fragments without evidence for historical recombination had six haplotypes on average.
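Haplotype diversity of the kind reported in Table 4.4 is commonly calculated with Nei's unbiased estimator, H = n/(n-1) × (1 - Σ p_i²), where p_i is the frequency of haplotype i among n sampled sequences. The sketch below assumes that estimator and phased haplotype strings as input; the text does not state which formula the analysis software applied, so this is an illustration and not a reproduction of the analysis.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype (gene) diversity for a sample of haplotypes."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two haplotypes are required")
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


if __name__ == "__main__":
    # Toy sample: four haplotypes, two of them identical.
    print(round(haplotype_diversity(["ACGT", "ACGT", "ACGA", "TCGA"]), 3))
```

Under this estimator a sample in which every sequence carries a different haplotype has a diversity of 1, and a sample with a single haplotype has a diversity of 0.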


Table 4.4 Haplotype number, diversity and the minimum number of recombination events (RM) of the candidate genes

Gene ID | Locus | Length (bp) | No. of haplotypes | Haplotype diversity | RM

EgrCesA3 | EnCesA3-Frag 1 | 748 | 10 | 0.945 | 3
EgrCesA3 | EnCesA3-Frag 2 | 809 | 7 | 1.000 | 0
EgrCesA3 | EnCesA3-Frag 3 | 734 | 5 | 0.675 | 0
EgrCesA3 | EnCesA3-Frag 4 | 781 | 5 | 0.538 | 0
EgrCesA3 | EnCesA3-Frag 5 | 639 | 9 | 0.909 | 0
EgrCesA3 | EnCesA3-Frag 6 | 891 | 7 | 0.800 | 1
EgrCesA3 | EnCesA3-Frag 7 | 635 | 5 | 0.771 | 0
EgrCesA3 | Average | | 6.86 | 0.805 | 0.57

EgrHB1 | EnHB1-Frag 1 | 820 | 16 | 0.924 | 1
EgrHB1 | EnHB1-Frag 2 | 712 | 9 | 0.706 | 1
EgrHB1 | EnHB1-Frag 3 | 712 | 13 | 0.874 | 1
EgrHB1 | EnHB1-Frag 4 | 549 | 13 | 0.924 | 0
EgrHB1 | EnHB1-Frag 5 | 552 | 7 | 0.504 | 0
EgrHB1 | EnHB1-Frag 6 | 730 | 11 | 0.930 | 0
EgrHB1 | EnHB1-Frag 7 | 763 | 8 | 0.848 | 1
EgrHB1 | EnHB1-Frag 8 | 757 | 13 | 0.942 | 2
EgrHB1 | EnHB1-Frag 9 | 387 | 9 | 0.696 | 0
EgrHB1 | EnHB1-Frag 10 | 888 | 13 | 0.783 | 1
EgrHB1 | Average | | 11.2 | 0.813 | 0.7

EgrNAM1 | EnNAM1-Frag 1 | 1016 | 9 | 0.824 | 3
EgrNAM1 | EnNAM1-Frag 2 | 391 | 5 | 0.484 | 0
EgrNAM1 | Average | | 7 | 0.654 | 1.5

EgrPAAPA | EnPAAPA-Frag 1 | 807 | 16 | 0.980 | 2
EgrPAAPA | EnPAAPA-Frag 2 | 626 | 10 | 0.956 | 1
EgrPAAPA | Average | | 13 | 0.968 | 1.5


4.3.4 Neutrality tests

Neutrality tests were performed to identify signatures of selection or demographic processes acting on the candidate genes. Tajima’s D and Fu and Li’s D* and F* statistics are shown in Table 4.5. The test statistics were significantly different from zero for five of the 21 fragments, and all significant values were negative. The EnHB1 gene has possibly been under negative selection, as four of its ten regions were significantly different from zero with all three tests. Although negative statistics were observed in EnNAM1 and EnPAAPA, they were not significant. There is some evidence of negative selection in EnCesA3, as indicated by the significant negative value in fragment 4 using all three tests.
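Tajima's D contrasts the mean number of pairwise differences with the number of segregating sites, so an excess of rare variants yields a negative value. The sketch below follows the standard published formula (Tajima, 1989) and assumes gap-free aligned sequences of equal length; it is illustrative only, does not compute significance thresholds, and is not the software used in this study.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for a list of aligned, equal-length sequences."""
    n = len(seqs)
    s = sum(1 for col in zip(*seqs) if len(set(col)) > 1)  # segregating sites
    if s == 0:
        return 0.0  # D is undefined without polymorphism; 0.0 is returned by convention here
    pairs = list(combinations(seqs, 2))
    # k: mean number of pairwise differences (not divided by length)
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    # Toy sample in which every polymorphism is a singleton.
    singletons = ["TAAA", "ATAA", "AATA", "AAAT", "AAAA", "AAAA"]
    print(round(tajimas_d(singletons), 3))
```

In the toy sample every variant occurs once, so the pairwise diversity falls below its expectation from the number of segregating sites and D is negative, the pattern described above for several fragments.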


Table 4.5 Selection tests for the candidate genes.

Gene ID | Locus | D Φ | D* ♣ | F* ♣

EnCesA3 | EnCesA3-Frag 1 | 0.26499 | 0.52880 | 0.52427
EnCesA3 | EnCesA3-Frag 2 | 0.06050 | 0.19660 | 0.18240
EnCesA3 | EnCesA3-Frag 3 | -0.97725 | 1.12446 | 0.62411
EnCesA3 | EnCesA3-Frag 4 | -1.98196 * | -2.49878 ** | -2.69137 **
EnCesA3 | EnCesA3-Frag 5 | -0.55629 | -0.55046 | -0.62694
EnCesA3 | EnCesA3-Frag 6 | -0.57372 | -0.26173 | -0.39426
EnCesA3 | EnCesA3-Frag 7 | -1.17370 | -1.22049 | -1.38429

EnHB1 | EnHB1-Frag 1 | -1.70990 | -2.69847 * | -2.80165 *
EnHB1 | EnHB1-Frag 2 | -1.08910 | -0.75184 | -0.98666
EnHB1 | EnHB1-Frag 3 | -1.85790 * | -2.20220 | -2.44766
EnHB1 | EnHB1-Frag 4 | -1.38722 | -1.57743 | -1.77350
EnHB1 | EnHB1-Frag 5 | -2.05071 * | -2.44907 | -2.71212 *
EnHB1 | EnHB1-Frag 6 | -0.84997 | -0.27368 | -0.50931
EnHB1 | EnHB1-Frag 7 | -1.70001 | -1.63445 | -1.87871
EnHB1 | EnHB1-Frag 8 | -0.96175 | -0.81764 | -0.99974
EnHB1 | EnHB1-Frag 9 | -1.89549 * | -3.23987 ** | -3.30735 **
EnHB1 | EnHB1-Frag 10 | -2.43863 *** | -3.62044 ** | -3.80998 **

EnNAM1 | EnNAM1-Frag 1 | -0.89190 | -1.10857 | -1.21057
EnNAM1 | EnNAM1-Frag 2 | -1.74210 | -1.91692 | -2.14990

EnPAAPA | EnPAAPA-Frag 1 | -1.07972 | -1.69734 | -1.76069
EnPAAPA | EnPAAPA-Frag 2 | -0.16680 | 0.02292 | -0.03209

Φ Tajima’s D

♣ Fu and Li’s neutrality test

* Significant at P < 0.05

** Significant at P < 0.02

*** Significant at P < 0.01


4.3.5 Linkage disequilibrium

Pairwise LD (r2) was plotted against the distances between parsimonious SNPs within all four genes. Figures 4.3.1 to 4.3.4 summarise the LD between pairs of sites within the four candidate genes. A nonlinear regression model describing the decay of linkage disequilibrium with distance was fitted for EnCesA3, EnHB1 and EnPAAPA. In all three genes LD decays rapidly with distance. The predicted value of r2 declined to

0.1 or less within 500 bp in EnHB1 and in EnPAAPA. In EnCesA3 the predicted value of

r2 declined to 0.2 at about 400 bp. Despite the rapid decline of LD, several sites in

EnCesA3, EnHB1 and EnPAAPA show extensive linkage disequilibrium. Due to the

lower number of SNPs in EnNAM1, the nonlinear regression model was not used.

Linkage disequilibrium with physical distance in EnNAM1 was estimated using TASSEL

software. LD (r2), estimated from pooled allele frequencies, was plotted against distance in

base pairs between polymorphic sites. LD analysis of the eucalypt NAM gene revealed

that LD does not extend over the entire region and most of the SNPs are in linkage

equilibrium (r2 <0.3).
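The pairwise statistic plotted in the figures that follow is the squared allele-frequency correlation, r2 = D² / (pA(1-pA)pB(1-pB)), where D = pAB - pA·pB. A minimal sketch of this calculation for phased, biallelic sites is given below. The input format (one haplotype string per sampled allele, one position per SNP) and the function names are assumptions made for illustration; the sketch does not fit the nonlinear decay model mentioned in the text.

```python
from itertools import combinations


def r_squared(site_a, site_b):
    """Squared allele-frequency correlation between two biallelic sites."""
    n = len(site_a)
    a_ref, b_ref = site_a[0], site_b[0]  # arbitrary reference alleles
    p_a = sum(x == a_ref for x in site_a) / n
    p_b = sum(x == b_ref for x in site_b) / n
    p_ab = sum(x == a_ref and y == b_ref for x, y in zip(site_a, site_b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r2 is undefined for a monomorphic site")
    d = p_ab - p_a * p_b
    return d * d / denom


def ld_by_distance(positions, haplotypes):
    """Return (distance in bp, r2) for every pair of SNPs."""
    columns = list(zip(*haplotypes))
    return [
        (abs(positions[j] - positions[i]), r_squared(columns[i], columns[j]))
        for i, j in combinations(range(len(positions)), 2)
    ]


if __name__ == "__main__":
    snp_positions = [10, 110, 400]            # toy positions in bp
    sample = ["ACC", "ACG", "TGC", "TGG"]     # toy phased haplotypes
    for distance, r2 in ld_by_distance(snp_positions, sample):
        print(distance, r2)
```

The resulting (distance, r2) pairs are the points of an LD decay plot; a decay curve would then be fitted to them by nonlinear regression.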


[Figure 4.3.1 plot: r2 (y-axis, 0 to 1.2) against nucleotide distance in bp (x-axis, 0 to 800)]

Figure 4.3.1 Distribution of the squared correlations of allele frequencies (r2) against distance in base pairs for the EnCesA3 locus. The fitted curve describes the least squares fit of r2 to its expectation at each distance.

[Figure 4.3.2 plot: r2 (y-axis, 0 to 1.2) against nucleotide distance in bp (x-axis, 0 to 900)]

Figure 4.3.2 Distribution of the squared correlations of allele frequencies (r2) against distance in base pairs for the EnHB1 locus. The fitted curve describes the least squares fit of r2 to its expectation at each distance.


[Figure 4.3.3 plot: r2 (y-axis, 0 to 1.2) against nucleotide distance in bp (x-axis, 0 to 1400)]

Figure 4.3.3 Distribution of the squared correlations of allele frequencies (r2) against distance in base pairs for the EnNAM1 locus.

[Figure 4.3.4 plot: r2 (y-axis, 0 to 1.2) against nucleotide distance in bp (x-axis, 0 to 800)]

Figure 4.3.4 Distribution of the squared correlations of allele frequencies (r2) against distance in base pairs for the EnPAAPA locus. The fitted curve describes the least squares fit of r2 to its expectation at each distance.


4.3.6 Haplotype block partitioning

To select informative SNPs for association mapping, haplotype block partitioning was performed. Tag SNPs were identified for each gene locus using the heuristic algorithm implemented in htSNPer. Predicted tag SNPs for association mapping are presented in Table 4.6. Each gene fragment was analysed separately. Haplotype block analysis revealed one block per region in EnCesA3, EnNAM1 and EnPAAPA. In EnHB1, four of the 10 fragments had more than one block. The highest average number of tag SNPs per region (8.5) was obtained for EnPAAPA; this is more than double the average for the other genes and reflects the low LD observed in EnPAAPA.
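The idea behind tag SNP selection can be illustrated with a simple greedy heuristic: repeatedly choose the SNP that captures the largest number of still-untagged SNPs at or above an r2 threshold. The sketch below is a generic illustration and is not the htSNPer algorithm; the 0.8 threshold, the function names and the toy data are assumptions.

```python
def _r2(site_a, site_b):
    """r2 between two biallelic sites; returns 0.0 if either is monomorphic."""
    n = len(site_a)
    a_ref, b_ref = site_a[0], site_b[0]
    p_a = sum(x == a_ref for x in site_a) / n
    p_b = sum(x == b_ref for x in site_b) / n
    p_ab = sum(x == a_ref and y == b_ref for x, y in zip(site_a, site_b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0
    d = p_ab - p_a * p_b
    return d * d / denom


def greedy_tag_snps(sites, threshold=0.8):
    """Pick tag SNPs greedily; `sites` holds one allele string per SNP."""
    n = len(sites)
    # For each SNP, the set of SNPs it would tag (itself included).
    tagged_by = {
        i: {j for j in range(n) if j == i or _r2(sites[i], sites[j]) >= threshold}
        for i in range(n)
    }
    untagged = set(range(n))
    tags = []
    while untagged:
        # Ties are broken in favour of the lowest SNP index.
        best = max(sorted(untagged), key=lambda i: len(tagged_by[i] & untagged))
        tags.append(best)
        untagged -= tagged_by[best]
    return tags


if __name__ == "__main__":
    snps = ["AATT", "AATT", "ATAT", "TTAA"]  # toy data: SNPs 0, 1 and 3 are in complete LD
    print(greedy_tag_snps(snps))
```

When LD is low, few SNPs capture their neighbours and most have to be retained as tags, which is consistent with the larger number of tag SNPs per region reported for EnPAAPA.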


Table 4.6 Haplotype blocks and TAG SNPs for selected candidate genes

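Gene ID | Locus | SNPs | Haplotype blocks | Tag SNPs

EnCesA3 | EnCesA3-Frag 1 | 18 | 1 | 5
EnCesA3 | EnCesA3-Frag 2 | 12 | 1 | 5
EnCesA3 | EnCesA3-Frag 3 | 13 | 1 | 2
EnCesA3 | EnCesA3-Frag 4 | 7 | 1 | 5
EnCesA3 | EnCesA3-Frag 5 | 12 | 1 | 6
EnCesA3 | EnCesA3-Frag 6 | 5 | 1 | 4
EnCesA3 | EnCesA3-Frag 7 | 8 | 1 | 2
EnCesA3 | Mean | 10.71 | 1 | 4.14

EnHB1 | EnHB1-Frag 1 | 20 | 2 | 2
EnHB1 | EnHB1-Frag 2 | 9 | 2 | 3
EnHB1 | EnHB1-Frag 3 | 16 | 1 | 3
EnHB1 | EnHB1-Frag 4 | 14 | 1 | 3
EnHB1 | EnHB1-Frag 5 | 9 | 1 | 0
EnHB1 | EnHB1-Frag 6 | 14 | 1 | 8
EnHB1 | EnHB1-Frag 7 | 16 | 1 | 4
EnHB1 | EnHB1-Frag 8 | 19 | 4 | 6
EnHB1 | EnHB1-Frag 9 | 11 | 1 | 1
EnHB1 | EnHB1-Frag 10 | 22 | 2 | 0
EnHB1 | Mean | 15 | 1.6 | 3

EnNAM1 | EnNAM1-Frag 1 | 13 | 1 | 5
EnNAM1 | EnNAM1-Frag 2 | 5 | 1 | 1
EnNAM1 | Mean | 9 | 1 | 3

EnPAAPA | EnPAAPA-Frag 1 | 27 | 1 | 10
EnPAAPA | EnPAAPA-Frag 2 | 12 | 1 | 7
EnPAAPA | Mean | 19.5 | 1 | 8.5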


4.3.7 Genetic association

In total, 62 SNPs from EnNAM1, EnHB1 and EnPAAPA were selected for genotyping in 300 unrelated trees in the association population. SNPs from EnCesA3 were not genotyped in this study due to cost. Three of the six SNPs identified in EnNAM1 were successfully genotyped using the MLPA method described earlier (section 4.2.9). In total, 56 SNPs from EnHB1 (47) and EnPAAPA (9) were genotyped using the Illumina GoldenGate assay, of which 18 proved to be polymorphic. About two-thirds of the SNPs were found to be monomorphic. The high failure rate may be due to sequence artifacts introduced during PCR, sequencing errors or technical limitations of the Illumina assay. Association tests were performed between genotypes and various wood quality traits using TASSEL software. All significant associations remained significant after correction for multiple testing using 1000 permutations. Ten significant associations were observed between seven SNPs and wood quality traits. Tables 4.7 and 4.8 present the results of the association analyses of the 21 SNPs for selected wood property traits. Strong associations were identified between EnHB1 and several traits. Six of the 15 SNPs genotyped for EnHB1 were significantly (p<0.05 or p<0.01) associated with pulp yield, cellulose, total/Klason lignin or microfibril angle (MFA). Three SNPs (SNP-16, SNP-90 and SNP-93) were associated with at least two traits. No association was observed between any SNP and density. A single SNP from EnPAAPA was significantly associated (p<0.05) with MFA. No significant associations were found for any of the EnNAM1 SNPs. The R2 values (proportion of total variation explained by each SNP) of the significant SNPs ranged from 0.0230 to 0.0460.


Table 4.7 Genetic associations between 21 SNPs from EnHB1, EnNAM1 and EnPAAPA and pulp yield, cellulose, total lignin, Klason lignin, MFA and density.

Columns: SNP | Position of SNP | SNP frequency (%) | N | F | P | R2 (F, P and R2 are the marker effect from the GLM model)

Trait: Pulp yield; Gene: EnHB1
SNP-16 | 5' UTR | 45.83 | 272 | 3.2085 | 0.0420 * | 0.0230
SNP-18 | 5' UTR | 20.83 | 272 | 1.5238 | 0.2197 | 0.0111
SNP-27 | Intron 1 | 20.83 | 272 | 0.2625 | 0.7693 | 0.0019
SNP-34 | Intron 2 | 26.09 | 272 | 1.5995 | 0.2039 | 0.0116
SNP-50 | Intron 3 | 29.17 | 271 | 2.3949 | 0.0931 | 0.0174
SNP-59 | Intron 6 | 12.50 | 272 | 0.3588 | 0.6988 | 0.0026
SNP-68 | Intron 8 | 8.33 | 272 | 0.9545 | 0.3863 | 0.0070
SNP-71 | Exon 11 (syn) | 21.05 | 272 | 0.9545 | 0.3863 | 0.0070
SNP-78 | Intron 11 | 21.05 | 272 | 0.9502 | 0.3880 | 0.0069
SNP-80 | Intron 11 | 21.05 | 272 | 1.1563 | 0.3162 | 0.0084
SNP-82 | Intron 11 | 15.79 | 268 | 0.7314 | 0.4822 | 0.0054
SNP-90 | Exon 14 (ns) | 8.33 | 273 | 3.2174 | 0.0740 | 0.0116
SNP-93 | Intron 14 | 30.00 | 267 | 3.7957 | 0.0237 * | 0.0276
SNP-94 | Exon 15 (syn) | 20.00 | 271 | 0.8777 | 0.4169 | 0.0064
SNP-104 | Intron 15 | 25.00 | 270 | 5.0069 | 0.0073 ** | 0.0358

Trait: Pulp yield; Gene: EnNAM1
SNP-3 | Exon 1 | 44.45 | 274 | 0.1578 | 0.8541 | 0.0012
SNP-4 | Exon 1 | 44.45 | 269 | 0.9988 | 0.3697 | 0.0074
SNP-5 | Exon 1 | 27.78 | 264 | 0.9528 | 0.3870 | 0.0072

Trait: Pulp yield; Gene: EnPAAPA
SNP-11 | Intron 1 | 50.00 | 272 | 0.2058 | 0.8141 | 0.0015
SNP-13 | Intron 1 | 44.00 | 271 | 4.2359 | 0.0405 | 0.0154
SNP-33 | Intron 1 | 36.00 | 271 | 0.5092 | 0.6016 | 0.0037

Trait: Cellulose; Gene: EnHB1
SNP-16 | 5' UTR | 45.83 | 205 | 1.2063 | 0.3014 | 0.0116
SNP-18 | 5' UTR | 20.83 | 206 | 0.5800 | 0.5608 | 0.0056
SNP-27 | Intron 1 | 20.83 | 206 | 0.4383 | 0.6457 | 0.0042
SNP-34 | Intron 2 | 26.09 | 206 | 2.9625 | 0.0539 | 0.0280
SNP-50 | Intron 3 | 29.17 | 205 | 2.2551 | 0.1075 | 0.0215
SNP-59 | Intron 6 | 12.50 | 206 | 0.6420 | 0.5273 | 0.0062
SNP-68 | Intron 8 | 8.33 | 206 | 0.5956 | 0.5522 | 0.0057
SNP-71 | Exon 11 (syn) | 21.05 | 206 | 0.5956 | 0.5522 | 0.0057
SNP-78 | Intron 11 | 21.05 | 206 | 0.6252 | 0.5361 | 0.0060
SNP-80 | Intron 11 | 21.05 | 206 | 0.5956 | 0.5522 | 0.0057
SNP-82 | Intron 11 | 15.79 | 201 | 0.1290 | 0.8790 | 0.0013
SNP-90 | Exon 14 (ns) | 8.33 | 207 | 2.0267 | 0.1561 | 0.0097
SNP-93 | Intron 14 | 30.00 | 201 | 3.5580 | 0.0303 * | 0.0342
SNP-94 | Exon 15 (syn) | 20.00 | 204 | 0.5856 | 0.5577 | 0.0057
SNP-104 | Intron 15 | 25.00 | 204 | 2.6778 | 0.0711 | 0.0256

Trait: Cellulose; Gene: EnNAM1
SNP-3 | Exon 1 | 44.45 | 203 | 0.1093 | 0.8965 | 0.0011
SNP-4 | Exon 1 | 44.45 | 202 | 0.8035 | 0.4492 | 0.0079
SNP-5 | Exon 1 | 27.78 | 199 | 0.7435 | 0.4768 | 0.0074

Trait: Cellulose; Gene: EnPAAPA
SNP-11 | Intron 1 | 50.00 | 206 | 0.5115 | 0.6004 | 0.0049
SNP-13 | Intron 1 | 44.00 | 207 | 1.0536 | 0.3059 | 0.0051
SNP-33 | Intron 1 | 36.00 | 205 | 1.9313 | 0.1476 | 0.0185


* = Significant at P<0.05, ** = Significant at P<0.01, GLM = General Linear Model, R2 = Total variation explained by the marker, syn = Synonymous SNP, ns = Non-synonymous SNP

Table 4.7 continued:

Columns: SNP | Position of SNP | SNP frequency (%) | N | F | P | R2 (F, P and R2 are the marker effect from the GLM model)

Trait: Total lignin; Gene: EnHB1
SNP-16 | 5' UTR | 45.83 | 272 | 0.4812 | 0.6186 | 0.0035
SNP-18 | 5' UTR | 20.83 | 272 | 0.5935 | 0.5531 | 0.0043
SNP-27 | Intron 1 | 20.83 | 272 | 0.7320 | 0.4819 | 0.0054
SNP-34 | Intron 2 | 26.09 | 272 | 0.0840 | 0.9194 | 0.0006
SNP-50 | Intron 3 | 29.17 | 271 | 0.6986 | 0.4982 | 0.0051
SNP-59 | Intron 6 | 12.50 | 272 | 0.3744 | 0.6880 | 0.0027
SNP-68 | Intron 8 | 8.33 | 272 | 0.1427 | 0.8671 | 0.0010
SNP-71 | Exon 11 (syn) | 21.05 | 272 | 0.1427 | 0.8671 | 0.0010
SNP-78 | Intron 11 | 21.05 | 272 | 1.0084 | 0.3662 | 0.0074
SNP-80 | Intron 11 | 21.05 | 272 | 0.1916 | 0.8258 | 0.0014
SNP-82 | Intron 11 | 15.79 | 268 | 0.3013 | 0.7401 | 0.0022
SNP-90 | Exon 14 (ns) | 8.33 | 273 | 8.2792 | 0.0043 ** | 0.0294
SNP-93 | Intron 14 | 30.00 | 267 | 1.3435 | 0.2627 | 0.0100
SNP-94 | Exon 15 (syn) | 20.00 | 271 | 0.1786 | 0.8365 | 0.0013
SNP-104 | Intron 15 | 25.00 | 270 | 0.9054 | 0.4056 | 0.0067

Trait: Total lignin; Gene: EnNAM1
SNP-3 | Exon 1 | 44.45 | 274 | 0.2226 | 0.8006 | 0.0016
SNP-4 | Exon 1 | 44.45 | 269 | 0.5732 | 0.5644 | 0.0042
SNP-5 | Exon 1 | 27.78 | 264 | 3.4451 | 0.0333 | 0.0254

Trait: Total lignin; Gene: EnPAAPA
SNP-11 | Intron 1 | 50.00 | 272 | 0.3689 | 0.6918 | 0.0027
SNP-13 | Intron 1 | 44.00 | 271 | 1.1148 | 0.2920 | 0.0041
SNP-33 | Intron 1 | 36.00 | 271 | 0.3335 | 0.7167 | 0.0025

Trait: Klason lignin; Gene: EnHB1
SNP-16 | 5' UTR | 45.83 | 272 | 0.9441 | 0.3903 | 0.0069
SNP-18 | 5' UTR | 20.83 | 272 | 0.9152 | 0.4017 | 0.0067
SNP-27 | Intron 1 | 20.83 | 272 | 0.5676 | 0.5675 | 0.0042
SNP-34 | Intron 2 | 26.09 | 272 | 0.4465 | 0.6403 | 0.0033
SNP-50 | Intron 3 | 29.17 | 271 | 1.8206 | 0.1639 | 0.0133
SNP-59 | Intron 6 | 12.50 | 272 | 0.5503 | 0.5774 | 0.0040
SNP-68 | Intron 8 | 8.33 | 272 | 0.3687 | 0.6920 | 0.0027
SNP-71 | Exon 11 (syn) | 21.05 | 272 | 0.3687 | 0.6920 | 0.0027
SNP-78 | Intron 11 | 21.05 | 272 | 0.6021 | 0.5484 | 0.0044
SNP-80 | Intron 11 | 21.05 | 272 | 0.5111 | 0.6004 | 0.0037
SNP-82 | Intron 11 | 15.79 | 268 | 0.6702 | 0.5124 | 0.0050
SNP-90 | Exon 14 (ns) | 8.33 | 273 | 13.0070 | 0.0004 ** | 0.0455
SNP-93 | Intron 14 | 30.00 | 267 | 1.2667 | 0.2835 | 0.0094
SNP-94 | Exon 15 (syn) | 20.00 | 271 | 0.4313 | 0.6501 | 0.0032
SNP-104 | Intron 15 | 25.00 | 270 | 0.6957 | 0.4996 | 0.0051

Trait: Klason lignin; Gene: EnNAM1
SNP-3 | Exon 1 | 44.45 | 274 | 0.0266 | 0.9738 | 0.0002
SNP-4 | Exon 1 | 44.45 | 269 | 0.1972 | 0.8211 | 0.0015
SNP-5 | Exon 1 | 27.78 | 264 | 2.6359 | 0.0735 | 0.0196

Trait: Klason lignin; Gene: EnPAAPA
SNP-11 | Intron 1 | 50.00 | 272 | 0.5068 | 0.6030 | 0.0037
SNP-13 | Intron 1 | 44.00 | 271 | 0.3274 | 0.5677 | 0.0012
SNP-33 | Intron 1 | 36.00 | 271 | 0.1981 | 0.8204 | 0.0015


* = Significant at P<0.05, ** = Significant at P<0.01, GLM = General Linear Model, R2 = Total variation explained by the marker, syn – Synonymous SNP, ns – Non-synonymous SNP

Table 4.7 continued:

Columns: SNP | Position of SNP | SNP frequency (%) | N | F | P | R2 (F, P and R2 are the marker effect from the GLM model)

Trait: MFA; Gene: EnHB1
SNP-16 | 5' UTR | 45.83 | 272 | 6.5646 | 0.0016 ** | 0.0460
SNP-18 | 5' UTR | 20.83 | 272 | 0.4898 | 0.6133 | 0.0036
SNP-27 | Intron 1 | 20.83 | 272 | 0.1720 | 0.8421 | 0.0013
SNP-34 | Intron 2 | 26.09 | 272 | 3.7716 | 0.0242 * | 0.0270
SNP-50 | Intron 3 | 29.17 | 271 | 5.4413 | 0.0048 ** | 0.0386
SNP-59 | Intron 6 | 12.50 | 272 | 0.2310 | 0.7939 | 0.0017
SNP-68 | Intron 8 | 8.33 | 272 | 0.2356 | 0.7903 | 0.0017
SNP-71 | Exon 11 (syn) | 21.05 | 272 | 0.2356 | 0.7903 | 0.0017
SNP-78 | Intron 11 | 21.05 | 272 | 0.3751 | 0.6876 | 0.0028
SNP-80 | Intron 11 | 21.05 | 272 | 0.0520 | 0.9493 | 0.0004
SNP-82 | Intron 11 | 15.79 | 268 | 0.6259 | 0.5356 | 0.0046
SNP-90 | Exon 14 (ns) | 8.33 | 273 | 0.0376 | 0.8465 | 0.0001
SNP-93 | Intron 14 | 30.00 | 267 | 0.1187 | 0.8881 | 0.0009
SNP-94 | Exon 15 (syn) | 20.00 | 271 | 0.2342 | 0.7914 | 0.0017
SNP-104 | Intron 15 | 25.00 | 270 | 1.1278 | 0.3253 | 0.0083

Trait: MFA; Gene: EnNAM1
SNP-3 | Exon 1 | 44.45 | 274 | 0.1660 | 0.8471 | 0.0012
SNP-4 | Exon 1 | 44.45 | 269 | 0.2155 | 0.8063 | 0.0016
SNP-5 | Exon 1 | 27.78 | 264 | 0.5663 | 0.5683 | 0.0043

Trait: MFA; Gene: EnPAAPA
SNP-11 | Intron 1 | 50.00 | 272 | 1.2635 | 0.2843 | 0.0092
SNP-13 | Intron 1 | 44.00 | 271 | 0.0456 | 0.8311 | 0.0002
SNP-33 | Intron 1 | 36.00 | 271 | 3.8079 | 0.0234 * | 0.0273

Trait: Density; Gene: EnHB1
SNP-16 | 5' UTR | 45.83 | 272 | 0.3010 | 0.7403 | 0.0022
SNP-18 | 5' UTR | 20.83 | 272 | 2.5655 | 0.0787 | 0.0185
SNP-27 | Intron 1 | 20.83 | 272 | 1.4562 | 0.2349 | 0.0106
SNP-34 | Intron 2 | 26.09 | 272 | 0.4328 | 0.6492 | 0.0032
SNP-50 | Intron 3 | 29.17 | 271 | 0.0328 | 0.9677 | 0.0002
SNP-59 | Intron 6 | 12.50 | 272 | 0.1306 | 0.8776 | 0.0010
SNP-68 | Intron 8 | 8.33 | 272 | 0.4412 | 0.6437 | 0.0032
SNP-71 | Exon 11 (syn) | 21.05 | 272 | 0.4412 | 0.6437 | 0.0032
SNP-78 | Intron 11 | 21.05 | 272 | 1.2382 | 0.2915 | 0.0090
SNP-80 | Intron 11 | 21.05 | 272 | 0.4341 | 0.6483 | 0.0032
SNP-82 | Intron 11 | 15.79 | 268 | 0.3398 | 0.7122 | 0.0025
SNP-90 | Exon 14 (ns) | 8.33 | 273 | 1.8726 | 0.1723 | 0.0068
SNP-93 | Intron 14 | 30.00 | 267 | 0.5372 | 0.5850 | 0.0040
SNP-94 | Exon 15 (syn) | 20.00 | 271 | 0.4413 | 0.6437 | 0.0032
SNP-104 | Intron 15 | 25.00 | 270 | 1.8541 | 0.1586 | 0.0135

Trait: Density; Gene: EnNAM1
SNP-3 | Exon 1 | 44.45 | 274 | 0.7580 | 0.4696 | 0.0055
SNP-4 | Exon 1 | 44.45 | 269 | 1.4413 | 0.2384 | 0.0106
SNP-5 | Exon 1 | 27.78 | 264 | 1.3285 | 0.2666 | 0.0100

Trait: Density; Gene: EnPAAPA
SNP-11 | Intron 1 | 50.00 | 272 | 1.5491 | 0.2143 | 0.0113
SNP-13 | Intron 1 | 44.00 | 271 | 0.6763 | 0.4116 | 0.0025
SNP-33 | Intron 1 | 36.00 | 271 | 1.1270 | 0.3255 | 0.0082


* = Significant at P<0.05, ** = Significant at P<0.01, GLM = General Linear Model, R2 = Total variation explained by the marker, syn – Synonymous SNP, ns – Non-synonymous SNP

Table 4.8 Summary of genetic associations between significant SNPs and wood traits

Columns: Trait | Gene | SNP | Position of SNP | SNP frequency (%) | N | F | P | R2 (F, P and R2 are the marker effect from the GLM model)

Pulp yield | EnHB1 | SNP-16 | 5' UTR | 45.83 | 272 | 3.2085 | 0.0420 * | 0.0230
Pulp yield | EnHB1 | SNP-93 | Intron 14 | 30.00 | 267 | 3.7957 | 0.0237 * | 0.0276
Pulp yield | EnHB1 | SNP-104 | Intron 15 | 25.00 | 270 | 5.0069 | 0.0073 ** | 0.0358
Cellulose | EnHB1 | SNP-93 | Intron 14 | 30.00 | 201 | 3.5580 | 0.0303 * | 0.0342
Total lignin | EnHB1 | SNP-90 | Exon 14 (ns) | 8.33 | 273 | 8.2792 | 0.0043 ** | 0.0294
Klason lignin | EnHB1 | SNP-90 | Exon 14 (ns) | 8.33 | 273 | 13.0070 | 0.0004 ** | 0.0455
MFA | EnHB1 | SNP-16 | 5' UTR | 45.83 | 272 | 6.5646 | 0.0016 ** | 0.0460
MFA | EnHB1 | SNP-34 | Intron 2 | 26.09 | 272 | 3.7716 | 0.0242 * | 0.0270
MFA | EnHB1 | SNP-50 | Intron 3 | 29.17 | 271 | 5.4413 | 0.0048 ** | 0.0386
MFA | EnPAAPA | SNP-33 | Intron 1 | 36.00 | 271 | 3.8079 | 0.0234 * | 0.0273

* = Significant at P<0.05, ** = Significant at P<0.01, GLM = General Linear Model, R2 = Total variation explained by the marker, syn = Synonymous SNP, ns = Non-synonymous SNP

Blue, red and brown color indicates the same SNP that is associated with at least two traits.


4.4 DISCUSSION

4.4.1 Diversity

This study provides information about nucleotide diversity and linkage disequilibrium

and evidence of selection in four cell wall genes in Eucalyptus nitens using SNP markers.

For most genes full length sequences were obtained. SNPs were substantially more abundant than indel polymorphisms. A total of 280 unique SNPs were found in the ~15 kb sequenced from the four genes (Table 4.3), or ~1 SNP for every 53.4 bp. This level of polymorphism is comparable with previous studies in Eucalyptus nitens (1/59 bp; Zhang and Zhang, 2005), Populus tremula (1/60 bp; Ingvarsson, 2005) and Pinus taeda (1/63 bp; Brown et al., 2004). Moderate levels of nucleotide diversity

were detected in all four genes in this study. The average level of variation within

population (θw = 0.0056) in E. nitens was similar to that reported for Pinus taeda drought

stress related candidate genes (θw = 0.0053, González-Martínez et al., 2006), disease resistance, wood formation and drought genes (θw = 0.0049, 0.0041 and 0.0046

respectively, Neale and Savolainen, 2004) and the BpMADS2 gene in Betula pendula (θw

= 0.0044, Järvinen et al., 2003). The similarity among these species is not surprising given that all have large population sizes and high outcrossing rates. In contrast, the species-wide level of polymorphism is approximately three-fold lower than in Populus tremula (θw = 0.0167, Ingvarsson, 2005), which was sampled from five geographically disjunct locations in Europe. Polymorphism levels in Douglas fir (θw = 0.0070, Krutovsky and Neale, 2005) were slightly higher than in this study. Although many factors influence diversity (see review by Edward and Thornsbery, 2002) the


differences observed in Douglas fir are likely to be the result of the larger number of

genes studied compared with this study.
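The SNP density quoted at the start of this section can be reproduced from the per-gene totals in Table 4.3. The short calculation below uses the sequenced lengths and SNP counts given in that table; the variable and function names are illustrative.

```python
# (sequenced length in bp, number of SNPs) per gene, from the Total rows of Table 4.3.
GENE_TOTALS = {
    "EnCesA3": (5237, 73),
    "EnHB1": (6870, 150),
    "EnNAM1": (1407, 18),
    "EnPAAPA": (1433, 39),
}


def snp_density(totals):
    """Return (total bp, total SNPs, bp per SNP) across all genes."""
    total_bp = sum(bp for bp, _ in totals.values())
    total_snps = sum(snps for _, snps in totals.values())
    return total_bp, total_snps, total_bp / total_snps


if __name__ == "__main__":
    bp, snps, bp_per_snp = snp_density(GENE_TOTALS)
    print(f"{snps} SNPs in {bp} bp: one SNP every {bp_per_snp:.1f} bp")
```

The totals give 280 SNPs in 14 947 bp, which corresponds to the reported value of about one SNP every 53.4 bp.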

In other studies of woody plants (González-Martínez et al., 2006; Krutovsky and Neale,

2005) significant differences in estimates of θw and π have been observed between different genes. Although the gene-wise θw and π values of EnPAAPA are slightly higher than those of the other three genes (Table 4.3), the differences are not statistically significant by ANOVA. This highlights the problem of comparing variation among species when estimates are based on one or a few loci. The lack of significant differences in average nucleotide diversity among the four genes may reflect the fact that the genes under investigation are functionally related: EnCesA3, EnHB1 and EnNAM1 were all identified as cell wall formation genes in the pulp yield expression analysis in this project.

The average θw and π values were similar in this study, except in EnHB1 where the average θw value was 1.8-fold higher than the average π value. In general, θw tended to be slightly higher than π. The larger difference in EnHB1 was likely the result of an excess of low frequency SNPs. Consequently, the neutrality test

statistics tend to be negative (Table 4.5). Low levels of non-synonymous to synonymous

diversity ratios (<1) were found in EnCesA3 and EnHB1. Non-synonymous/synonymous

diversity (<1) with a negative neutrality test suggested that the genes are under purifying

selection. This can also arise because of population structure (Krutovsky and Neale,

2005); however, the population used in this study has no significant sub-structure

(Thumma et al., 2005). The low level of dN/dS indicates that selection has been

operating in EnCesA3 and EnHB1. The higher levels of non-synonymous to synonymous


nucleotide diversity ratios (>2) in EnPAAPA compared to the other three genes suggest

that this gene might be less sensitive to changes in its DNA and protein sequence, and

therefore has been under less selection pressure.

The average haplotype numbers for selected gene fragments varied between 6.9 and 13.0.

This finding is consistent with wood formation genes in Pinus taeda, where 13 haplotypes were observed on average per gene (Brown et al., 2004). Pot et al. (2005) observed 6 haplotypes in CesA3 from P. pinaster which is comparable with the results for

EnCesA3. Large variation in haplotype number and diversity were observed in EnCesA3

gene fragments compared to the other genes and average haplotype diversity is slightly

higher than in other woody plant studies. High haplotype diversity may reflect

diversifying selection rather than meaningful recombination as the high diversity resulted

from low frequency polymorphisms.

4.4.2 Neutrality test

A number of neutrality tests were performed to identify genes or sites departing from

standard neutral patterns. There was a tendency across loci for an excess of singleton mutations as shown by negative values of Tajima’s D for all four genes. Comparison of

the Tajima’s D statistics among the genes revealed six significant departures from

neutrality in selected candidate gene regions. Only two regions from 21 loci gave

positive selection values but these were not significant. Nucleotide diversities departing

from the neutral model are most likely to be found in genes affecting important adaptive

traits, such as wood formation, disease resistance and cold or drought tolerance

(González-Martínez et al., 2006). The candidate genes presented in this study were

selected based on their involvement in wood fibre development in trees. A general trend

for negative selection was evident, with 19 regions out of 21 having a negative test score,

some of them being significant. Negative scores in neutrality tests are not unusual; they

indicate an excess of low-frequency polymorphisms, which can result from

population growth, directional or purifying selection or selective sweeps (Dillon et

al., submitted). Several genes that have shown a negative departure from a neutral model

in DNA sequence studies were related to biotic and abiotic stress tolerance or key

metabolic pathways such as lignin formation in plants (González-Martínez et al., 2006;

Pot et al., 2005). Thus, selecting SNPs from loci that are potentially under selection is useful for association mapping.
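To make the link between singleton excess and negative test scores concrete, the following is a minimal sketch of Tajima's D using the standard formulation (Tajima, 1989). It assumes gap-free aligned sequences of equal length; the function name and the toy alignments are hypothetical and are not data from this study.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for equal-length aligned sequences without gaps."""
    n = len(seqs)
    if n < 4:
        raise ValueError("at least four sequences are needed")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")

    # S: number of segregating (polymorphic) sites
    s = sum(1 for col in zip(*seqs) if len(set(col)) > 1)
    if s == 0:
        raise ValueError("no segregating sites; D is undefined")

    # pi: mean number of pairwise differences
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants from Tajima (1989)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy alignment in which every polymorphism is a singleton: D is negative
singleton_rich = ["AAAAA", "TAAAA", "ATAAA", "AATAA", "AAATA", "AAAAT"]
# Toy alignment with two intermediate-frequency sites: D is positive
balanced = ["AA", "AA", "AA", "TT", "TT", "TT"]
d_singletons = tajimas_d(singleton_rich)
d_balanced = tajimas_d(balanced)
```

The sign of D depends only on whether the mean pairwise difference is smaller or larger than S/a1, so an excess of rare variants pulls the statistic below zero regardless of its cause.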

EnCesA3, a crucial gene in secondary cell wall biogenesis, showed positive and negative

values in all three neutrality tests but only one region was significantly different from

zero. Sequence alignments of this region revealed a high number of singleton

polymorphic sites in this fragment. This is probably the reason for the significant

test statistic. Pot et al. (2005) reported non-significant positive Tajima’s D values for

CesA3 in Pinus pinaster in a 1048 bp fragment of the gene while ~5237 bp of EnCesA3

was sequenced in this study and two non-significant positive values were detected for

two fragments. Sequence analysis revealed that two types of individuals (haplotypes) are

maintained out of 16 individual trees.

EnHB1, a transcription factor involved in xylem differentiation, has probably experienced negative selection. All regions (loci) of this gene showed negative values in all three selection tests. Four regions out of ten were significantly different from zero. The negative values for these statistics are likely to be due to an excess of low frequency polymorphisms that occur across the loci. Some of the low frequency polymorphisms in

EnHB1 may be due to Taq polymerase errors as these sequences were obtained from cloned PCR products, which were originally amplified from bulked DNA. The error rate of Taq polymerase ranges from 10⁻⁴ to 2 × 10⁻⁵ (Cline et al., 1996). This can account for about 13% of the observed singletons at most, and so does not appear to explain the significant excess of singleton mutations observed. The gene diversity data and evidence of departure from neutrality in the EnHB1 gene are not surprising given the pivotal role this gene plays in vascular differentiation in Arabidopsis and other plants (Prigge and

Clark, 2006; Ko et al., 2006; Ohashi-Ito and Fukuda, 2003). However, tests of neutrality are often impeded by a lack of knowledge of demographic history and the effects of selection; in particular, population subdivision may bias selection tests

(Ingvarsson, 2005). Since the population used in this study had no significant sub-structure (Thumma et al., 2005), it is most likely that the departures of the frequency spectra from the standard neutral model are the result of purifying/diversifying selection in this gene. Intensive purifying selection regulating diversity across different loci of the

EnHB1 candidate gene in E. nitens indicates the importance of this gene in wood fibre development.

4.4.3 Linkage Disequilibrium

Linkage Disequilibrium (LD) declined rapidly with distance in all four genes. In EnHB1 and EnPAAPA LD declined to negligible levels (r² ≤ 0.1) within 500 bp. In

EnCesA3 LD also declined to negligible levels after about 400 bp but r² is slightly higher

(0.2). Levels of LD observed in E. nitens are similar to those observed in Populus, where

LD has been observed to decline rapidly in a five-gene study (Ingvarsson, 2005). These data agree with recent studies in E. nitens with COBL4 (Thumma, pers. comm.). Thumma et al.

(2005) also observed low levels of LD in CCR, which is associated with MFA in

Eucalyptus spp. In maize, Remington et al. (2001) found that LD declined to negligible levels in ~1 kb in 5 out of 6 genes studied in a set of 102 inbred lines. However, more wide-ranging LD (up to 100 kb) is seen in elite maize lines (Ching et al., 2002). In Pinus radiata, LD extends over short distances only, ranging from less than 100 bp to ≤2000 bp

(Dillon et al., submitted). Previously, Kumar et al. (2004) demonstrated low levels of LD in Pinus radiata breeding selections using microsatellites. González-Martínez et al. (2006) also reported that LD declined to ~0.20 at 800 bp in 18 drought response candidate genes in Pinus. The low level of LD in this study is not unexpected, as eucalypts are outcrossing species with generally large effective population sizes, high recombination rates and high mutation rates. Selection and demography, among other factors, also influence LD. Directional or purifying selection can reduce the levels of LD by rapid fixation of new adaptive mutations. The very low levels of LD in EnHB1 might reflect purifying selection at this locus.
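The r² values quoted above measure the squared correlation between alleles at two sites. A minimal sketch of the calculation for two biallelic SNPs is given below. It assumes phased haplotypes (for example, cloned sequences) with no missing data; the function name and the toy alleles are hypothetical, and unphased diploid genotypes would need haplotype frequencies to be estimated first.

```python
def r_squared(site_a, site_b):
    """Squared allelic correlation (r^2) between two biallelic sites.

    `site_a` and `site_b` hold the allele carried by each haplotype at the
    two sites, in the same haplotype order.
    """
    n = len(site_a)
    if n == 0 or n != len(site_b):
        raise ValueError("sites must be non-empty and of equal length")
    ref_a, ref_b = site_a[0], site_b[0]  # arbitrary reference alleles
    p_a = sum(x == ref_a for x in site_a) / n
    p_b = sum(y == ref_b for y in site_b) / n
    p_ab = sum(x == ref_a and y == ref_b for x, y in zip(site_a, site_b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium, D
    return d * d / denom


# Alleles at the two sites always co-occur: complete LD
complete = r_squared("AATT", "CCGG")
# Alleles at the two sites are independent: no LD
independent = r_squared("AATT", "CGCG")
```

Plotting r² for every pair of SNPs against the distance in base pairs between them gives the decay pattern discussed in this section.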

Among plants (including forest trees), the extent of LD differs dramatically between selfing and outcrossing species. LD extends up to 250 kb in selfing species such as

Arabidopsis and rice, as long-term selfing dramatically reduces the effective recombination rate (Nordborg, 2000). In wild barley, a highly selfing species, some loci show high levels of LD, as expected, whereas other loci show a surprisingly rapid decline of LD with distance (Lin et al., 2002). The reasons behind the different patterns of LD among loci in barley and the differences in LD pattern in Arabidopsis are not fully understood. LD is a result of the interaction of many factors such as mutation and recombination rates, mating system, selection, population size, structure and history. The decay of LD with distance is a critical factor (Thumma, personal communication) that affects the success of candidate gene-based association mapping. If LD decays quickly, the associations found between a particular SNP and a phenotypic trait are more likely to be causative rather than due to linkage (Krutovsky et al., 2005).

Though rapid decline of LD was observed in this analysis, several sites showed linkage disequilibrium extending throughout the full length of a particular fragment in EnHB1,

EnCesA3 and EnPAAPA (Figure 4.3.1, 4.3.2 and 4.3.4). There were some limitations in

LD estimation. The LD plots including all SNPs in EnCesA3 and EnHB1 could not be viewed using TASSEL software, which would have given a more realistic picture of LD for the entire gene. In addition, these LD estimates are based on a limited number of sequences. Considering the low level of LD found within the genes, selection of htSNPs based on construction of LD blocks was relatively successful. There were 1.6 haplotype

blocks per fragment in EnHB1 while the other genes had about 1 block per fragment.

The identification of 82 htSNPs from a total of 280 SNPs enabled a 70% reduction in the genotyping required.

4.4.4 Association studies

Candidate gene-based association mapping revealed seven SNPs that are associated with wood properties including pulp yield, lignin, Klason lignin and MFA. Interestingly, four out of the 15 SNPs in EnHB1 occurred in highly significant associations (P<0.01), suggesting that these associations are likely to be real. This included SNP-16 in EnHB1, which was significantly associated with both pulp yield and MFA; this is the first report of a multiple trait association for a single SNP (Thumma et al., 2005; González-Martínez et al., 2007).

EnHB1 SNP-19 was associated with pulp yield and cellulose; however, these two traits are closely related to each other (see Chapter 2 Materials and Methods). This result provides unique confirmation of the important functional role of the EnHB1 gene in cell wall development. Collectively, the SNPs from this gene are associated with wood traits including pulp yield, cellulose, lignin (total and Klason) and MFA. Wood is composed of cellulose, lignin and hemicellulose in which cellulose microfibrils are embedded in a lignin-hemi-cellulose matrix (Plomion et al., 2001). EnHB1 appears to regulate at least two distinct pathways (lignin and cellulose biosynthesis) leading to cell wall development. This is entirely consistent with its known role as a transcription factor controlling vascular development (Ohashi-Ito and Fukuda 2003). It appears wood/pulp formation is strongly affected by allelic variation in the EnHB1 gene.

In the EnHB1 gene, SNP-16, SNP-93 and SNP-104 were significantly associated with pulp yield, together explaining 8.64% of genetic variation in this important commercial trait. SNP-104 showed the strongest association, explaining about 3.58% of the total variation in pulp yield; SNP-16 and SNP-93 explained about 2.30% and 2.76% respectively. Recently, González-Martínez et al. (2007) reported SNP associations explaining 2.21-3.57% of total genetic variation in early wood/late wood specific gravity and early wood MFA in loblolly pine. Thumma et al. (2005) reported a SNP in the CCR gene accounting for 4.6% of total genetic variation in MFA. SNP-16 and SNP-93 were also associated with MFA and cellulose respectively. Although pulp yield and MFA are

different traits, both are either related to or indirectly influenced by cellulose

content. About 40-50% of wood is cellulose, and cellulose microfibrils are fundamental

structural molecules in the fibre cell wall. All three of these SNPs are common SNPs (Table

4.7) and were selected as tagSNPs with htSNP analysis. TagSNP analysis revealed that

SNP-16 and SNP-104 (fragments 1 and 8; Table 4.6) belong to two different blocks.
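As an illustration of what "percentage of variation explained" means for a single SNP, the sketch below regresses a trait on genotype coded as the number of copies of one allele (0, 1 or 2) and returns the R² of that regression. The additive coding, the function name and the toy values are assumptions for illustration only; this section does not specify the statistical model actually fitted in the thesis.

```python
def variance_explained(genotypes, phenotypes):
    """R^2 from a simple linear regression of phenotype on allele count."""
    n = len(genotypes)
    if n < 3 or n != len(phenotypes):
        raise ValueError("need matched genotype and phenotype lists (n >= 3)")
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    s_gy = sum((g - mean_g) * (y - mean_y)
               for g, y in zip(genotypes, phenotypes))
    s_gg = sum((g - mean_g) ** 2 for g in genotypes)
    s_yy = sum((y - mean_y) ** 2 for y in phenotypes)
    if s_gg == 0 or s_yy == 0:
        raise ValueError("genotype and phenotype must both vary")
    # For one predictor, R^2 equals the squared Pearson correlation
    return s_gy ** 2 / (s_gg * s_yy)


# Hypothetical toy data: allele counts and pulp yield (%) for six trees
allele_counts = [0, 0, 1, 1, 2, 2]
pulp_yield = [51.2, 52.0, 52.1, 53.4, 52.9, 54.1]
r2 = variance_explained(allele_counts, pulp_yield)
```

Summing single-SNP values, as in the 8.64% figure above, gives the joint contribution only to the extent that the SNPs are independent of one another, which is more plausible when they fall in different LD blocks.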

EnHB1 SNP-90 is strongly associated with total and Klason lignin and explains about

2.94% and 4.55% of variation respectively. As Klason lignin and total lignin are clearly

related traits this is not surprising. The EnHB1 SNP-90 is the only non-synonymous SNP

associated with trait variation. This SNP is located in exon 14 and results in the

substitution of an aspartic acid by glutamic acid in the predicted protein. EnHB1 is a

transcription factor; however, there are no reports of it playing a role in controlling genes

in the lignin pathway. Transgenic analysis with ATHB8 (closely related to EnHB1)

revealed earlier lignification in inflorescence stems after bolting (Baima et al., 2001).

A significant genetic association was found between a SNP in EnPAAPA and MFA.

EnPAAPA is a novel gene which was partially characterised during this study (Chapter 3).

The significant SNP, SNP-33, occurs in the single intron and explains about 2.73% of

total variation in MFA. This is comparable with EnHB1 SNP-34 (2.7%, this study) and

the Pinus taeda α-tubulin M10 SNP (2.21%, González-Martínez et al., 2007). EnPAAPA

was earlier found to be more abundant in the vascular cambium of E. globulus (Bossinger

and Leitch, 2000). This gene was down regulated in both upper and lower branch xylem

where large changes in MFA were observed (Qiu et al., 2008). This result provides a

clue to the function of the EnPAAPA gene.

Linking phenotype with genotype is the central aim of genetics (Botstein and Risch,

2003). Candidate gene-based association studies as presented in this thesis are a

powerful approach towards connecting complex phenotypes with genotypic variation.

This study revealed a number of significant associations between SNPs from different

genes with several wood quality traits. The discovery of strong associations between several SNPs in the HD-zip transcription factor EnHB1 and a range of wood traits is a very promising preliminary result. Validation of these associations in different populations is necessary in order to confirm these results. Alternatively, QTL mapping can be performed in order to confirm the co-location of significant SNPs with QTLs for wood property traits.

Chapter 5

General discussion

Pulp and paper production accounts for one percent of the world’s total economic output

(Pot et al., 2002). Consequently, there is considerable interest in developing high pulp

yield eucalypt varieties for plantation forestry. Genetic improvement of wood property

traits by applying genomics techniques is a high priority area of research. Currently,

candidate gene-based association studies are an attractive approach for dissecting

complex traits. To facilitate such investigations this thesis presents research aimed at

identifying candidate genes in E. nitens that may influence pulp yield. Association

studies were then carried out with selected candidate genes in order to identify allelic

variants associating with wood quality traits in E. nitens.

The experimental population used in this thesis comprised approximately 300 full-sib progeny derived from a

controlled cross between genetically diverse E. nitens trees, located at Ridgely,

Tasmania. Pulp yield was estimated in the progeny by use of NIR spectroscopy, and

trees with low and high pulp yield were selected on the basis of this data. The hypothesis

that genes playing a role in controlling variation in pulp yield may be differentially expressed in low and high pulp yield trees was tested using microarrays containing approximately 5800 cDNAs. Using probes synthesised from RNA derived from replicated bulks of low and high pulp yield trees, a number of candidate genes were identified that may influence pulp yield. This suggests that natural variation was present in the full-sib progeny trial. A relatively small number of genes were differentially expressed between low and high pulp yield trees compared with other studies of much more diverse tissues. From among the differentially expressed genes,

several candidate genes were selected on the basis of their likely functional role in cell

wall development, including a secondary cell wall cellulose synthase (EgrCesA3), a NAC

domain protein implicated in controlling plant development (EgrNAM1), a homeodomain

transcription factor known to be a regulator of vascular development (EgrHB1) and a

C3HC4-type zinc finger transcription protein (EgrZnf1). In order to validate the

differential expression of these genes in low and high pulp yield trees, the differential

expression of selected candidates was examined using real-time PCR. Differential

expression was confirmed in two of the three genes studied, although results in one of the

candidates were opposite to those obtained from microarray analysis.

An additional candidate gene was selected from previous CSIRO and University of

Melbourne research. EgrPAAPA is specifically expressed in cambial tissue (Bossinger

and Leitch, 2000) and is down regulated in eucalypt branches (Qiu et al., 2008). A full

length cDNA of EgrPAAPA was cloned from an E. grandis xylem cDNA library and

fully sequenced. Studies confirmed that there are no sequences with strong homology to

EgrPAAPA in public databases outside of the genus Eucalyptus. The predicted

EgrPAAPA protein is rich in alanine, glutamic acid and proline residues. The most

important features of the predicted EgrPAAPA protein are the seven ‘PAAPA’ motifs in

its C-terminal region and a high proportion of glutamic acid pairs scattered throughout

the protein. The occurrence of the ‘PAAPA’ motifs distinguished EgrPAAPA from

similar glutamic acid rich proteins which have been found in other plant species. It appears that EgrPAAPA is a hydroxyproline-rich glycoprotein (HRGP), a class of

proteins with fundamental impact on plant development. The repetitive ‘PAAPA’ motif may play a structural role in cell wall development.

SNP-based candidate gene association studies provide valuable information for connecting phenotypic and genotypic variation within complex traits. Genetic diversity and association studies were carried out on EgrPAAPA and three genes selected on the basis of their differential expression in high and low pulp yield trees, including

EgrCesA3, EgrNAM1, and EgrHB1. Common SNPs (freq > 0.1) were identified in the four genes by sequencing the genes in approximately 300 unrelated trees. Overall, a moderate level of nucleotide diversity was observed in all four genes. The frequency of

SNPs was one SNP per ~54 bp across coding and non-coding regions on average.

Diversity was slightly higher in EnHB1 and EnPAAPA compared to others. The average level of within population diversity in E. nitens was similar to other tree species. Linkage disequilibrium was generally low and declined rapidly over only a few hundred base pairs in each gene. The low level of LD and diversity found in this study is consistent with the level of LD observed in other tree species and is likely to derive from their outcrossing nature and large effective population size. Selection tests indicate that EnHB1 has been under negative selection and that this selection is most likely purifying/diversifying. Significant signatures of selection such as this are likely to be found in genes affecting important traits (González-Martínez et al., 2006).

Genotypic data for twenty-one SNPs were obtained from an association population of ~300 unrelated trees which had been phenotyped for wood traits including pulp yield, cellulose

content, total lignin, Klason lignin, MFA and density. A number of significant

associations between SNPs in EgrHB1 and EgrPAAPA were identified for a range of

traits. SNPs from EnHB1 are strongly associated (P<0.01) with various wood traits

including pulp yield, cellulose, lignin (total and Klason) and MFA. Interestingly, several

SNPs influenced more than one trait (e.g. SNP-16 influenced pulp yield and MFA). These

findings suggest that EnHB1 regulates or influences two distinct biosynthetic pathways

(cellulose and lignin) involved in cell wall development. This is consistent with the

function of the Arabidopsis homologue AT-HB15, which is known to be a transcription

factor controlling vascular development (Prigge and Clark, 2006; Kim et al., 2005;

Ohashi-Ito and Fukuda, 2003; Baima et al., 2001).

In this study, it is demonstrated that EgrPAAPA is a novel eucalypt gene that encodes a

glutamic acid-rich protein. The observation of a significant association between a SNP in

EnPAAPA with microfibril angle (MFA) may provide an important clue to the function of

the EgrPAAPA gene. The occurrence of the repetitive PAAPA motif suggests the protein has a structural role. It is possible that the PAAPA protein is deposited in the cell wall where it influences the deposition of cellulose microfibrils. Further studies using ISSA and gene knockouts may confirm the role of EgrPAAPA in cell wall development as recently demonstrated for EgrTUB1 (Spokevicius et al., 2007). Although these results indicate that EnPAAPA is an important gene, it was not identified as a candidate gene in microarray expression comparisons between high and low pulp yield trees. This suggests that polymorphisms impacting expression of the gene were not present in the E. nitens full-sib family chosen for study.

While some NAC/NAM proteins have been linked to secondary cell wall development in

plants (Zhong et al., 2006; Mitsuda et al., 2007), no significant associations were detected

between SNPs in EgrNAM1 and any of the wood traits measured. However, a full-length

sequence of EnNAM1 was not available and consequently no conclusive results could be obtained.

Concluding remarks

The thesis addresses an important challenge for forest molecular genetics: identifying

genes that control wood quality traits. Candidate gene-based association studies, as

presented in this thesis, are shown to be a powerful approach for connecting complex

phenotypic variation with genotypic variation. Discovery of suitable high quality

candidate pulp yield genes using the BSA (bulked segregant analysis) approach in E.

nitens is a novel approach. Gene expression comparisons between RNA bulks of low

versus high pulp yield trees from a full-sib population revealed a number of candidate

genes which may influence the trait. The discovery of SNPs in one of those candidate

genes (EnHB1) that are associated with variation in pulp yield indicates that the approach

was successful. This suggests that the expression BSA approach may be useful for

identifying candidate genes for other phenotypic traits. A limitation of the approach is

that only genes with expression polymorphisms in the full-sib family could be identified.

Additionally, polymorphisms which affect the trait but not expression levels (e.g.

polymorphisms causing amino acid changes in the protein) would not be detected.

In this thesis several SNPs in the transcription factor EnHB1 were found to be associated

with multiple traits. DNA polymorphisms in this gene were associated with pulp yield,

cellulose, lignin and MFA and several of these SNPs do not occur in linkage

disequilibrium. This is a novel discovery, as all other published associations in trees have

been single gene: trait associations. These results are consistent with the fact that EnHB1

is a master regulator of vascular development, regulating genes in two distinct

biosynthetic pathways (cellulose and lignin) during cell wall development. This finding illustrates the potential to select SNPs influencing two or more traits independently from within the same gene. It is recommended that the associations identified in EnHB1 and

EnPAAPA be validated in a different population in order to confirm these results.

Alternatively, QTL mapping could be performed to investigate whether a QTL for the particular wood property trait can be detected at the EnHB1 and EnPAAPA loci. If

confirmed, these SNPs may be valuable markers for gene-assisted selection in E. nitens breeding programs aimed at improving high value wood traits such as pulp yield.

References

Abdallah, J. M., B. Goffinet, C. Cierco-Ayrolles and M. Pérez-Enciso (2003). Linkage disequilibrium fine mapping of quantitative trait loci: a simulation study. Genet. Sel. Evol. 35: 513-532.

Australia's Plantations (2007). Department of Agriculture, Fisheries and Forestry. Bureau of Rural Sciences, Canberra.

Aida, M., T. Ishida, K. Fukaki, H. Fujisawa and M. Tasaka (1997). Genes involved in organ separation in Arabidopsis: an analysis of the cup-shaped cotyledon mutant. Plant Cell, 9: 841-857.

Akasawa, A., L.-S. Hsieh, B. M. Martin, T. Liu and Y. Lin (1996). A novel acidic allergen, Hev b 5, in latex. Purification cloning and characterization. The Journal of Biological Chemistry, 271(41): 25389-25393.

Allona, I., M. Quinn, E. Shoop, K. Swope, S.-St. Cyr, J. Carlis, J. Riedl, E. Retzel, M. M. Campbell, R. Sederoff and R. W. Whetten (1998). Analysis of xylem formation in pine by cDNA sequencing. Proc. Natl. Acad. Sci., 95: 9693-9698.

Andersson-Gunnerås, S., E. J. Mellerowicz, J. Love, B. Segerman, Y. Ohmiya, P. M. Coutinho, P. Nilsson, B. Henrissat, T. Moritz and B. Sundberg (2006). Biosynthesis of cellulose-enriched tension wood in Populus: global analysis of transcripts and metabolites identifies biochemical and developmental regulators in secondary wall biosynthesis. The Plant Journal, 45: 145-165.

Bacic, A., P. J. Harris, and B. A. Stone (1988). Structure and function of plant cell walls. In: The biochemistry of plant: a comprehensive treatise (Vol. 14) (Stumpf, P. K. and Conn. E. E. eds), Academic Press: 297-371.

Baima, S., M. Possenti, A. Matteucci, E. Wisman, M. M. Altamura, I. Ruberti and G. Morelli (2001). The Arabidopsis ATHB-8 HD-Zip protein acts as a differentiation-promoting transcription factor of the vascular meristems. Plant Physiology, 126: 643-655.

Baima, S., F. Nobili, G. Sessa, S. Lucchetti, I. Ruberti and G. Morelli (1995). The expression of the ATHB-8 homeobox gene is restricted to provascular cells in Arabidopsis thaliana. Development, 121: 4171-4182.

Baskin, T. I. (2001). On the alignment of cellulose microfibrils by cortical microtubules: a review and a model. Protoplasma, 215: 150-171.

Boerjan, W., J. Ralph and M. Baucher (2003). Lignin biosynthesis. Annu. Rev. Plant Biol., 54: 519-546.

Boland, D. J., M. I. H. Brooker, G. M. Chippendale, N. Hall, B. P. M. Hyland, R. D. Johnston, D. A. Kleinig and J. D. Turner (1984). Forest trees of Australia. Thomas Nelson, Melbourne.

Borden, K. L. B. (2000). Ring domains: master builders of molecular scaffolds? Journal of Molecular Biology, 295: 1103-1112.

Borden, K. L. B. and P. S. Freemont (1996). The ring finger domain: a recent example of a sequence-structure family. Current Opinion in Structural Biology, 6:395-401.

Borralho, N. M. G., P. P. Cotterill, and P. J. Kanowski (1993). Breeding objectives for pulp production of Eucalyptus globulus under different industrial cost structures. Canadian Journal of Forest Research, 23: 648-656.

Borralho, N. M. G., P. P. Cotterill, and P. J. Kanowski (1992). Genetic parameters and gains expected from selection for dry weight in Eucalyptus globulus ssp in Portugal. Forest Science, 38: 80-94.

Bossinger, G., J. F. G. Tibbets, L. J. McManus and A. V. Spokevicius (2007). Molecular tree domestication and the xylogenesis candidate gene cascade. In: The compromised wood workshop 2007 (Entwistle, K., Harris, P. and Walker, J. eds), The Wood Technology Research Centre, University of Canterbury, New Zealand, 69-84.

Bossinger, G. and M. A. Leitch (2000). Isolation of cambium-specific genes from Eucalyptus globulus Labill. In: Cell & molecular biology of wood formation (R. A. Savidge, J. R. Barnett, and R. Napier eds). Oxford, BIOS Scientific Publisher Ltd: 203-207.

Botstein, D. and N. Risch (2003). Discovering genotypes underlying human phenotypes: past successes for Mendelian disease, future approaches for complex disease. Nat. Genet., 33: 228-237.

Boudet, A. M. (1998). A new view of lignification. Trends in Plant Science, 3 (2): 67-71.

Bourquin, V., N. Nishikubo, H. Abe, H. Brumer, S. Denman, M. Eklund, M. Christiernin, T. T. Teeri, B. Sundberg and E. J. Mellerowicz (2002). Xyloglucan endotransglycosylases have a function during the formation of secondary cell walls of vascular tissues. Plant Cell, 14: 3073-3088.

Brazma, A., P. Hingamp, J. Quackenbush, G. Sherlock, P. Spellman, C. Stoeckert, J. Aach, W. Ansorge, C. A. Ball, H. C. Causton, T. Gaasterland, P. Glenisson, F. C. P. Holstege, I. F. Kim, V. Markowitz, J. C. Matese, H. Parkinson, A. Robinson, U. Sarkans, S. Schulze-Kremer, J. Stewart, R. Taylor, J. Vilo and M. Vingron (2001). Minimum information about a microarray experiment (MIAME)- toward standards for microarray data. Nature Genetics, 29: 365-371.

Brazeau, D. A. (2004). Combining genome-wide and targeted gene expression profiling in drug discovery: microarrays and real-time PCR. DDT, 9(19): 838-845.

Brooker, M.I.H. and D.A. Kleinig (1999). Field guide to Eucalypts. Second Edition, Vol. 1. Blooming Books, Australia.

Brown, G. R, G. P. Gill, R. J. Kuntz, C. H. Langley and D. B Neale (2004). Nucleotide diversity and linkage disequilibrium in loblolly pine. Proc. Natl. Acad. Sci., 101(42): 15255-15260.

Brown, R. M. and I. M. Saxena (2000). Cellulose biosynthesis: a model for understanding the assembly of biopolymers. Plant Physiol Biochem, 38: 57-67.

Brumfield, R.T., P. Beerli, D. A. Nickerson and S. V. Edwards (2003). The utility of single nucleotide polymorphisms in inferences of population history. Trends in Ecology and Evolution, 18(5): 249-256.

Bustin, S. A. (2000). Absolute quantification of mRNA using real-time reverse transcription polymerase chain reaction assays. Journal of Molecular Endocrinology, 25: 169-193.

Butcher, P. A. and S. G. Southerton (2007). MAS in forestry species. Marker-assisted selection (MAS) in crops, livestock, forestry and fish: current status and the way forward. Rome: FAO, 283-305.

Byrne, M., J. C. Murrel, B. Allen and G. F. Moran (1995). An integrated genetic linkage map for eucalypts using RFLP, RAPD and isozyme markers. Theoretical and Applied Genetics, 91: 869-875.

Byrne, M. and G. F. Moran (1994). Population divergence in the chloroplast genome of Eucalyptus nitens. Heredity, 73: 18-28.

Carlson, C. S., S. F. Aldred, P. K. Lee, R. P. Tracy, S. M. Schwartz, M. Rieder, K. Liu, O. D. Williams, C. Iribarren, E. C. Lewis, M. Fornage, E. Boerwinkle, M. Gross, C. Jaquish, D. A. Nickerson, R. M. Myers, D. S. Siscovick and A. P. Reiner (2005). Polymorphisms within the C-reactive protein (CRP) promoter region are associated with plasma CRP levels. The American Journal of Human Genetics, 77: 64-77.

Carpita, N. and M. McCann (2000). The cell wall, In: Biochemistry and molecular biology of plants (B. Buchanan, W. Gruissem, and R. Jones, eds). American Society of Plant Biologists, Rockville, Maryland: 52-108.

Carter, D. E., J. F. Robinson, E. M. Allister, M. W. Huff and R. A. Hegele (2005). Quality assessment of microarray experiments. Clinical Biochemistry, 38(7): 639-642.

Cassab, G. I. (1998). Plant cell wall proteins. Plant Molecular Biology, 49: 281- 309.

Chaffey, N. (2000). Microfibril orientation in wood cells: new angles on an old topic. Trends in Plant Science, 5: 360-362.

Chan, E. Y. (2005). Advances in sequencing technology. Mutation Research, 573: 13-40.

Ching, A., K. S. Caldwell, M. Jung, M. Dolan, O. S. Smith, S. Tingey, M. Morgante and A. J. Rafalski (2002). SNP frequency, haplotype structure and linkage disequilibrium in elite maize inbred lines. BMC Genet., 3: 19-32.

Cho, Y., J. Fernandes, S. H. Kim and V. Walbot (2002). Gene-expression profile comparisons distinguish seven organs of maize. Genome Biology, 3 (9): research0045.1-0045.16.

Choy, Y. S., S. L. Dabora, F. Hall, V. Ramesh, Y. Niida, D. Franz, J. Kasprzyk-Obara, M. P. Reeve and D. J. Kwiatkowski (1999). Superiority of denaturing high performance liquid chromatography over single-stranded conformation and conformation-sensitive gel electrophoresis for mutation detection in TSC2. Ann Hum Genet, 63: 383-391.

Cline, J., J. C. Braman and H. H. Hogrefe (1996). PCR fidelity of Pfu DNA polymerase and other thermostable DNA polymerases. Nucleic Acids Research, 24: 3546-3551.

Cogan, N. O. I., R. C. Ponting, A. C. Vecchies, M. C. Drayton, J. George, P. M. Dracatos, M. P. Dobrowolski, T. I. Sawbridge, K. F. Smith, G. C. Spangenberg and J. W. Forster (2006). Gene-associated single nucleotide polymorphism discovery in perennial ryegrass (Lolium perenne L.). Mol Gen Genomics, 276: 101-112.

Cotterill, P. P. and A. Brolin (1997). Improving Eucalyptus wood, pulp and paper quality by genetic selection. Proc. IUFRO conference on silviculture and improvement of Eucalypts. Salvador, Brazil.

Dantec, L.L., D. Chagné, D. Pot, O. Cantin, P. Garnier-Géré, F. Bedon, J.-M. Frigerio, P. Chaumeil, P. Léger, V. Garcia, F. Laigret, A. Daruvar and C. Plomion (2004). Automated SNP detection in expressed sequence tags: statistical considerations and application to maritime pine sequences. Plant Molecular Biology, 54: 461-470.

Darley, C. P., A. M. Forrester and S. J. McQueen-Mason (2001). The molecular basis of plant cell wall extension. Plant Molecular Biology, 47:179-195.

Dean, G. H., J. French, and W. N. Tibbits (1990). Variation in pulp and paper making characteristics in a field trial of Eucalyptus globulus. Appita 44th Annual General Conference, Rotorua.

Devey, M. (2004). Genomics and gene discovery in forest trees. In: Plantation forest biotechnology for the 21st century (C. Walter and M. Carson eds.), Research Signpost, India.

Demura, T., G. Tashiro, G. Horiguchi, N. Kishimoto, M. Kubo, N. Matsuoka, A. Minami, M. Nagata-Hiwatashi, K. Nakamura, Y. Okamura, N. Sassa, S. Suzuki, J. Yazaki, S. Kikuchi and H. Fukuda (2002). Visualization by comprehensive microarray analysis of gene expression programs during transdifferentiation of mesophyll cells into xylem cells. Proc. Natl. Acad. Sci., 99(24): 15794-15799.

Dillon, S. K., M. F. Nolan, W. J. Gapare, W. Li, P. Matter, H. X. Wu and S. G. Southerton (submitted). Nucleotide polymorphism, linkage disequilibrium and Tag SNP selection in candidate genes for wood fibre development in Pinus radiata.

Ding, K., J. Zhang, K. Zhou, Y. Shen and X. Zhang (2005). HtSNPer1.0: software for haplotype block partition and htSNPs selection. BMC Bioinformatics, 6: 38.

Disch, S., E. Anastasiou, V. K. Sharma, T. Laux and J. C. Fletcher (2006). The E3 ubiquitin ligase BIG BROTHER controls Arabidopsis organ size in a dosage-dependent manner. Current Biology, 16: 272-279.

Doblin, M. S., I. Kurek, D. Jacob-Wilk and D. P. Delmer (2002). Cellulose biosynthesis in plants: from genes to rosettes. Plant Cell Physiol., 43 (12): 1407-1420.

Duval, M., T.-F. Hsieh, S. Y. Kim and T. L. Thomas (2002). Molecular characterization of AtNAM: a member of the Arabidopsis NAC domain superfamily. Plant Molecular Biology, 50: 237-248.

Edward, S. B. and J. M. Thornsberry (2002). Plant molecular diversity and applications to genomics. Current Opinion in Plant Biology, 5: 107-111.

Ehlting, J., N. Mattheus, D. S. Aeschliman, E. Li, B. Hamberger, I. F. Cullis, J. Zhuang, M. Kaneda, S. D. Mansfield, L. Samuels, K. Ritland, B. E. Ellis, J. Bohlmann and C. J. Douglas (2005). Global transcript profiling of primary stems from Arabidopsis thaliana identifies candidate genes for missing links in lignin biosynthesis and transcriptional regulators of fiber differentiation. The Plant Journal, 42: 618-640.

Eisenhaber, B., M. Wildpaner, C. J. Schultz, G. H. H. Borner, P. Dupree and F. Eisenhaber (2003). Glycosylphosphatidylinositol lipid anchoring of plant proteins. Sensitive prediction from sequence- and genome-wide studies for Arabidopsis and rice. Plant Physiology, 133(4): 1691-1701.

Eldering, E., C. A. Spek, H. L. Aberson, A. Grummels, I. A. Derks, A. F. Vos, C. J. McElgunn and J. P. Schouten (2003). Expression profiling via novel multiplex assay allows rapid assessment of gene regulation in defined signalling pathways. Nucleic Acids Research, 31(23): e153.1-9.

Eldridge, K., J. Davidson, C. Harwood and G. van Wyk (1993). Eucalypt domestication and breeding. Clarendon Press, Oxford.

Emahazion, T., L. Feuk, M. Jobs, S. L. Sawyer, D. Fredman, D. Clair, J. A. Prince and A. J. Brookes (2001). SNP association studies in Alzheimer's disease highlight problems for complex disease analysis. Trends in Genetics, 17 (7): 407-413.

Emery, J. F., S. K. Floyd, J. Alvarez, Y. Eshed, N. P. Hawker, A. Izhaki, S. F. Baum and J. L. Bowman (2003). Radial patterning of Arabidopsis shoots by class III HD-Zip and KANADI genes. Current Biology, 13: 1768-1774.

Engle, L. J., C. L. Simpson and M. A. Landers (2006). Using high-throughput SNP technologies to study cancer. Oncogene, 25 (11): 1594-1601.

Ernst, H. A., A. N. Olsen, K. Skriver, S. Larsen and L. L. Leggio (2004). Structure of the conserved domain of ANAC, a member of the NAC family of transcription factors. EMBO reports, 5(3): 1-7.

Esau, K. (1960). Anatomy of seed plants. John Wiley & Sons, New York.

Fagard, M., T. Desnos, T. Desprez, F. Goubet, G. Refregier, G. Mouille, M. C. McCann, C. Rayon, S. Vernhettes and H. Höfte (2000). PROCUSTE1 encodes a cellulose synthase required for normal cell elongation specifically in roots and dark-grown hypocotyls of Arabidopsis. Plant Cell, 12: 2409-2423.

Fairbrother, W. G., R.-F. Yeh, P. A. Sharp and C. B. Burge (2002). Predictive identification of exonic splicing enhancers in human genes. Science, 297: 1007-1013.

Fowler, M. R., J. Gartland, W. Norton, A. Slater, M. C. Elliott and N. W. Scott (2000). RS2: a sugar beet gene related to the latex allergen Hev b 5 family. Journal of Experimental Botany, 51(353): 2125-2126.

Freudenberg-Hua, Y., J. Freudenberg, N. Kluck, S. Cichon, P. Propping and M. M. Nöthen (2003). Single nucleotide variation analysis in 65 candidate genes for CNS disorders in a representative sample of the European population. Genome Research, 13(10): 2271-2276.

Fu, Y.-X. and W.-H. Li (1993). Statistical tests of neutrality of mutations. Genetics, 133: 693-709.

Galán, A., M. Casanova, A. Murgui, D. M. MacCallum, F. C. Odds, N. A. R. Gow and J. P. Martínez (2004). The Candida albicans pH-regulated KER1 gene encodes a lysine/glutamic-acid-rich plasma-membrane protein that is involved in cell aggregation. Microbiology, 150:2641-2651.

Gao, M., M. J. Kieliszewski, D. T. A. Lamport and A. M. Showalter (1999). Isolation, characterization and immunolocalization of a novel, modular tomato arabinogalactan-protein corresponding to the LeAGP-1 gene. The Plant Journal, 18(1): 43-55.

Geisler-Lee, J., M. Geisler, P. M. Coutinho, B. Segerman, N. Nishikubo, J. Takahashi, H. Aspeborg, S. Djerbi, E. Master, S. Andersson-Gunnerås, B. Sundberg, S. Karpinski, T. T. Teeri, L. A. Kleczkowski, B. Henrissat and E. J. Mellerowicz (2006). Poplar carbohydrate-active enzymes. Gene identification and expression analyses. Plant Physiology, 140: 946-962.

Glaubitz, J. C., L. C. Emebiri and G. F. Moran (2001). Dinucleotide microsatellites from Eucalyptus sieberi: inheritance, diversity, and improved scoring of single-base differences. Genome, 44: 1041-1045.

Goldberg, R., A.-M. Catesson and Y. Czaninski (1983). Some properties of syringaldazine oxidase, a peroxidase specifically involved in the lignification processes. Z. Pflanzenphysiol, 110: 267-279.

González-Martínez, S. C., N. C. Wheeler, E. Ersoz, C. D. Nelson and D. B. Neale (2007). Association genetics in Pinus taeda L. I. wood property traits. Genetics, 175: 399-409.

González-Martínez, S. C., E. Ersoz, G. R. Brown, N. C. Wheeler and D. B. Neale (2006). DNA sequence variation and selection of tag single-nucleotide polymorphisms at candidate genes for drought-stress response in Pinus taeda L. Genetics, 172: 1915-1926.

González-Martínez, S. C., K. V. Krutovsky and D. B. Neale (2006). Forest-tree population genomics and adaptive evolution. New Phytologist, 170: 227-238.

Grattapaglia, D. (2001). Genomic technologies for the development of the eucalypt of the future. In: Proceedings of the IUFRO international symposium “Developing the Eucalypt of the future”, Valdivia, Chile.

Grattapaglia, D., F. L. G. Bertolucci, R. Penchel and R. R. Sederoff (1996). Genetic mapping of quantitative trait loci controlling growth and wood quality traits in Eucalyptus grandis using a maternal half-sib family and RAPD markers. Genetics, 144: 1205-1214.

Grima-Pettenati, J. and D. Goffner (1999). Lignin genetic engineering revisited. Plant Science, 145: 51-65.

Gupta, P.K., S. Rustgi, and P. L. Kulwal (2005). Linkage disequilibrium and association studies in higher plants: present status and future prospects. Plant Molecular Biology, 57: 461-485.

Gut, I. G. (2001). Automation in genotyping of single nucleotide polymorphisms. Human Mutation, 17 (6): 475-492.

Haigler, C. H., M. Ivanova-Datcheva, P. S. Hogan, V. V. Salnikov, S. Hwang, K. Martin and D. P. Delmer (2001). Carbon partitioning to cellulose synthesis. Plant Molecular Biology, 47: 29-51.

Hernández-Nistal, J., E. Labrador, I. Martín, T. Jiménez and B. Dopico (2006). Transcriptional profiling of cell wall protein genes in chickpea embryonic axes during germination and growth. Plant Physiology and Biochemistry, 44: 684-692.

Hertzberg, M., H. Aspeborg, J. Schrader, A. Andersson, R. Erlandsson, K. Blomqvist, R. Bhalerao, M. Uhlén, T. T. Teeri, J. Lundeberg, B. Sundberg, P. Nilsson and G. Sandberg (2001). A transcriptional roadmap to wood formation. Proc. Natl. Acad. Sci., 98 (25): 14732-14737.

Hill, W. G. and B. S. Weir (1988). Variances and covariances of squared linkage disequilibrium in finite populations. Theor. Popul. Biol., 33: 54-78.

Hoskins, R. A., A. C. Phan, M. Naeemuddin, F. A. Mapa, D. A. Ruddy, J. J. Ryan, L. M. Young, T. Wells, C. Kopczynski and M. C. Ellis (2001). Single nucleotide polymorphism markers for genetic mapping in Drosophila melanogaster. Genome Research, 11: 1100-1113.

Ingvarsson, P. K. (2005). Nucleotide polymorphism and linkage disequilibrium within and among natural populations of European Aspen (Populus tremula L., Salicaceae). Genetics, 169: 945-953.

Jamet, E., H. Canut, G. Boudart and R. F. Pont-Lezica (2006). Cell wall proteins: a new insight through proteomics. Trends in Plant Science, 11(1): 33-39.

Janssen, B., C. Hartmann, V. Scholz, A. Jauch and J. Zschocke (2005). MLPA analysis for the detection of deletions, duplications and complex rearrangements in the dystrophin gene: potential and pitfalls. Neurogenetics, 6: 29-35.

Järvinen, P., J. Lemmetyinen, O. Savolainen and T. Sopanen (2003). DNA sequence variation in BpMADS2 gene in two populations of Betula pendula. Molecular Ecology, 12: 369-384.

Jayapal, M. and A. J. Melendez (2006). DNA microarray technology for target identification and validation. Clinical and Experimental Pharmacology and Physiology, 33: 496-503.

Jouanin, L., T. Goujon, V. Nadaï, M.-T. Martin, I. Milla, C. Vallet, B. Pollet, A. Yoshinaga, B. Chabbert, M. Petit-Conil and C. Lapierre (2000). Lignification in transgenic poplars with extremely reduced caffeic acid O-methyltransferase activity. Plant Physiol, 123: 1363-1373.

Jones, P. G., D. Allaway, D. M. Gilmour, C. Harris, D. Rankin, E. R. Retzel and C. A. Jones (2002). Gene discovery and microarray analysis of cacao (Theobroma cacao L.) varieties. Planta, 216: 255-264.

Jorde, L. B. (2000). Linkage disequilibrium and the search for complex disease genes. Genome Research, 10: 1435-1444.

Josè-Estanyol, M. and P. Puigdomenech (2000). Plant cell wall glycoproteins and their genes. Plant Physiol. Biochem., 38 (1/2): 97-108.

Joshi, C. P., S. Bhandari, P. Ranjan, U. C. Kalluri, X. Liang, T. Fujino and A. Samuga (2004). Genomics of cellulose biosynthesis in poplars. New Phytologist, 164: 53-61.

Kam, J., P. Gresshoff, R. Shorter and G.-P. Xue (2007). Expression analysis of ring zinc finger genes from Triticum aestivum and identification of TaRZF70 that contains four RING-H2 domains and differentially responds to water deficit between leaf and root. Plant Science, 173: 650-659.

Karlowski, W. M. and A. M. Hirsch (2003). The over-expression of an alfalfa RING-H2 gene induces pleiotropic effects on plant growth and development. Plant Molecular Biology, 52: 121-133.

Kawaoka, A., P. Kaothien, K. Yoshida, S. Endo, K. Yamada and H. Ebinuma (2000). Functional analysis of tobacco LIM protein Ntlim1 involved in lignin biosynthesis. Plant Journal, 22: 289-301.

Keegstra, K. and N. Raikhel (2001). Plant glycosyltransferases. Current Opinion in Plant Biology, 4: 219-224.

Kim, J., J.-H. Jung, J. L. Reyes, Y.-S. Kim, S.-Y. Kim, K.-S. Chung, J. A. Kim, M. Lee, Y. Lee, V. N. Kim, N.-H. Chua and C.-M. Park (2005). microRNA-directed cleavage of ATHB15 mRNA regulates vascular development in Arabidopsis inflorescence stems. The Plant Journal, 42: 84-94.

Kimura, M. (1983). The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge, UK.

Kinser, S., Q. Jia, M. Li, A. Laughter, P. D. Cornwell, J. C. Corton and J.C. Pestka (2004). Gene expression profiling in spleens of deoxynivalenol-exposed mice: immediate early genes as primary targets. Journal of Toxicology and Environmental Health, Part A, 67(18): 1423-1441.

Kirst, M., A. A. Myburg, J. P. G. De León, M. E. Kirst, J. Scott and R. Sederoff (2004). Coordinated genetic regulation of growth and lignin revealed by quantitative trait locus analysis of cDNA microarray data in an interspecific backcross of Eucalyptus. Plant Physiology, 135: 2368-2378.

Klok, E. J., I. W. Wilson, D. Wilson, S. C. Chapman, R. M. Ewing, S. C. Somerville, W. J. Peacock, R. Dolferus and E. S. Dennis (2002). Expression profile analysis of the low-oxygen response in Arabidopsis root cultures. Plant Cell, 14: 2481-2494.

Ko, J.-H., S. H. Yang and K. H. Han (2006). Upregulation of an Arabidopsis RING-H2 gene, XERICO, confers drought tolerance through increased abscisic acid biosynthesis. Plant Journal, 47: 343-355.

Ko, J.-H., C. Prassinos and K-H. Han (2006). Developmental and seasonal expression of PtaHB1, a Populus gene encoding a class III HD-Zip protein, is closely associated with secondary growth and inversely correlated with the level of micro RNA (miR166). New Phytologist, 169: 469-478.

Ko, J.-H., K.-H. Han, S. Park and J. Yang (2004). Plant body weight-induced secondary growth in Arabidopsis and its transcription phenotype revealed by whole-transcriptome profiling. Plant Physiol., 135: 1069-1083.

Kononoff, P. J., H. M. Deobald, E. L. Stewart, A. D. Laycock and L. S. Marquess (2005). The effect of a leptin single nucleotide polymorphism on quality grade, yield grade and carcass weight of beef cattle. Journal of Animal Science, 83: 927-932.

Kota, R., S. Rudd, A. Facius, G. Kolesov, T. Thiel, H. Zhang, N. Stein, K. Mayer and A. Graner (2003). Snipping polymorphisms from large EST collections in barley (Hordeum vulgare L.). Mol. Gen. Genomics, 270: 24-33.

Krutovsky, K. V. and D. B. Neale (2005). Nucleotide diversity and linkage disequilibrium in cold-hardiness- and wood quality-related candidate genes in Douglas Fir. Genetics, 171: 2029-2041.

Kube, P. D., C.A. Raymond, and P.W. Banham (2001). Genetic parameters for diameter, basic density, fibre properties and cellulose content in Eucalyptus nitens. Forest Genetics, 8: 285-294.

Kubo, M., M. Udagawa, N. Nishikubo, G. Horiguchi, M. Yamaguchi, J. Ito, T. Mimura, H. Fukuda and T. Demura (2005). Transcription switches for protoxylem and metaxylem vessel formation. Genes & Development, 19: 1855-1860.

Kumar, S., C. Echt, P. L. Wilcox and T. E. Richardson (2004). Testing for linkage disequilibrium in the New Zealand radiata pine breeding population. Theoretical and Applied Genetics, 108: 292-298.

Lachaud, S., A.-M. Catesson, and J.-L. Bonnemain (1999). Structure and functions of the vascular cambium. Life Sciences, 322: 633-650.

Lafarguette, F., J. C. Leplé, A. Déjardin, F. Laurans, G. Costa, M.-C. Lesage-Descauses and G. Pilate (2004). Poplar genes encoding fasciclin-like arabinogalactan proteins are highly expressed in tension wood. New Phytologist, 164: 107-121.

Lancaster, J. M., H. K. Dressman, R. S. Whitaker, L. Havrilesky, J. Gray, J. R. Marks, J. R. Nevins and A. Berchuck (2004). Gene expression patterns that characterize advanced stage serous ovarian cancers. J Soc Gynecol Investig, 11(1): 51-59.

Lander, E. S. and N. S. Schork (1994). Genetic dissection of complex traits. Science, 265: 2037-2048.

Larson, P. R. (1994). The vascular cambium. Development and structure. Springer-Verlag, Heidelberg, Germany.

Li, L., X. F. Cheng, J. Leshkevich, T. Umezawa, S. A. Harding and V. L. Chiang (2001). The last step of syringyl monolignol biosynthesis in angiosperms is regulated by a novel gene encoding sinapyl alcohol dehydrogenase. Plant Cell, 13: 1567-1586.

Lichten, M. J. and M. S. Fox (1983). Detection of non-homology-containing heteroduplex molecules. Nucleic Acids Research, 11: 3959-3971.

Lin, J.-Z., P. L. Morrell and M. T. Clegg (2002). The influence of linkage and inbreeding on patterns of nucleotide sequence diversity at duplicate alcohol dehydrogenase loci in wild barley Hordeum vulgare ssp. spontaneum. Genetics, 162: 2007-2015.

Maguire, T. L., S. Grimmond, A. Forrest, I. Iturbe-Ormaetxe, K. Meksem and P. Gresshoff (2002). Tissue-specific gene expression in soybean (Glycine max) detected by cDNA microarray analysis. J. Plant Physiol., 159: 1361-1374.

Maniatis, T. and B. Tasic (2002). Alternative pre-mRNA splicing and proteome expansion in metazoans. Nature, 418: 236-243.

Marques, C. M., J. A. Araújo, J. G. Ferreira, R. Whetten, D. M. O’Malley, B.-H. Liu and R. Sederoff (1998). AFLP genetic maps of Eucalyptus globulus and E. tereticornis. Theor. Appl. Genet., 96: 727-737.

Master, E. R., U. J. Rudsander, W. Zhou, H. Henriksson, C. Divne, S. Denman, D. B. Wilson and T. T. Teeri (2004). Recombinant expression and enzymatic characterization of PttCel9A, a KOR homologue from Populus tremula x tremuloides. Biochemistry, 43: 10080-10089.

Maxwell, A. (2004). The taxonomy, phylogeny and impact of Mycosphaerella species on eucalypts in South-Western Australia. PhD Thesis, Murdoch University.

Mellerowicz, E. J., M. Baucher, B. Sundberg and W. Boerjan (2001). Unravelling cell wall formation in the woody dicot stem. Plant Molecular Biology, 47: 239-274.

Meng, X.-B., W.-S. Zhao, R.-M. Lin, M. Wang, and Y.-L. Peng (2006). Molecular cloning and characterization of a rice blast-inducible RING-H2 type zinc finger gene. DNA Sequence, 17: 41-48.

Miller, N. A., Q. Gong, R. Bryan, M. Ruvolo, L. A. Turner and S. T. LaBrie (2002). Cross hybridization of closely related genes on high-density microarrays. BioTechniques, 32: 620-625.

Mitsuda, N., A. Iwase, H. Yamamoto, M. Yoshida and M. Seki (2007). NAC transcription factors, NST1 and NST3, are key regulators of the formation of secondary walls in woody tissues of Arabidopsis. The Plant Cell, Online publication: 1-11.

Moran, G. F., K. A. Thamarus, C. A. Raymond, D. Qiu, T. Uren and S. G. Southerton (2002). Genomics of Eucalyptus wood traits. Ann. For. Sci., 59: 645-650.

Muneri, A. and C. A. Raymond (2000). Genetic parameters and genotype-by-environment interactions for basic density, pilodyn penetration and diameter in Eucalyptus globulus. Forest Genetics, 7: 317-328.

Myburg, A., A. R. Griffin, R. R. Sederoff and R. W. Whetten (2003). Theoretical and Applied Genetics, 107 (6):1028-1042

Neale, D. B. and O. Savolainen (2004). Association genetics of complex traits in conifers. Trends in Plant Science, 9(7): 325-330.

Neale, D. B., M. M. Sewell and G. R. Brown (2002). Molecular dissection of the quantitative inheritance of wood property traits in loblolly pine. Ann. For. Sci., 59: 595-605.

Nehra, N. S., M. R. Becwar, W. H. Rottmann, L. Pearson, K. Chowdhury, S. Chang, H. D. Wilde, R. J. Kodrzycky, C. Zhang, K. C. Gause, D. W. Parks and M. A. Hinchee (2005). Forest biotechnology: innovative methods, emerging opportunities. In Vitro Cell. Dev. Biol., 41: 701-717.

Nei, M. (1987). Molecular Evolutionary Genetics. Columbia University Press, New York.

Nielsen, H., S. Brunak and G. von Heijne (1999). Machine learning approaches for the prediction of signal peptides and other protein sorting signals. Protein Eng., 12: 3-9.

Nordborg, M. (2000). Linkage disequilibrium, gene trees and selfing: an ancestral recombination graph with partial self-fertilization. Genetics, 154: 923-929.

Nowotny, P., J. M. Kwon, and A. M. Goate (2001). SNP analysis to dissect human traits. Current Opinion in Neurobiology, 11: 637-641.

Oh, S., S. Park and K.-H. Han (2003). Transcriptional regulation of secondary growth in Arabidopsis thaliana. Journal of Experimental Botany, 54: 2709-2722.

Ohashi-Ito, K. and H. Fukuda (2003). HD-Zip III Homeobox genes that include a novel member, ZeHB-13 (Zinnia)/ATHB-15 (Arabidopsis), are involved in procambium and xylem cell differentiation. Plant Cell Physiol, 44(12): 1350-1358.

Ohnishi, Y., T. Tanaka, R. Yamada, K. Suematsu, M. Minami, K. Fujii, N. Hoki, K. Kodama, S. Nagata, T. Hayashi, N. Kinoshita, H. Sato, H. Sato, T. Kuzuya, H. Takeda, M. Hori and Y. Nakamura (2000). Identification of 187 single nucleotide polymorphisms (SNPs) among 41 candidate genes for ischemic heart disease in the Japanese population. Hum Genet, 106: 288-292.

Olsen, A. N., H. A. Ernst, L. L. Leggio and K. Skriver (2005). NAC transcription factors: structurally distinct, functionally diverse. Trends in Plant Science, 10(2): 79-87.

Olsen, K. M., S. S. Halldorsdottir, J. R. Stinchcombe, C. Weinig, J. Schmitt and M. D. Purugganan (2004). Linkage disequilibrium mapping of Arabidopsis CRY2 flowering time alleles. Genetics, 167: 1361-1369.

Orita, M., Y. Suzuki, T. Sekiya and K. Hayashi (1989). Rapid and sensitive detection of point mutations and DNA polymorphisms using the polymerase chain reaction. Genomics, 5 (4): 874-879.

Palaisa, K., M. Morgante, S. Tingey and A. Rafalski (2004). Long-range patterns of diversity and linkage disequilibrium surrounding the maize Y1 gene are indicative of an asymmetric selective sweep. Proc. Natl. Acad. Sci., 101(26): 9885-9890.

177 Paris, M., R. H. Potter, R. C. M. Lance, C. D. Li and M. G. K. Jones (2003). Typing Mlo alleles for powdery mildew resistance in barley by single nucleotide polymorphism analysis using MALDI-ToF mass spectrometry. Australian Journal of Agricultural Research, 54: 1343-1349.

Patil, N., A. J. Berno, D. A. Hinds, W. A. Barrett, J. M. Doshi, C. R. Hacker, C. R. Kautzer, D. H. Lee, C. Marjoribanks, D. P. McDonough, B. T. N. Nguyen, M. C. Norris, J. B. Sheehan, N. Shen, D. Stern, R. P. Stokowski, D. J. Thomas, M. O. Trulson, K. R. Vyas, K. A. Frazer, S. P. A. Fodor and D. R. Cox (2001). Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science, 294: 1719-1723.

Paux, E., V. Carocha, C. Marques, A. M. Sousa, N. Borralho, P. Sivadon and J. Grima-Pettenati (2005). Transcript profiling of Eucalyptus xylem genes during tension wood formation. New Phytologist, 167: 89-100.

Paux, E., M.-B. Tamasloukht, N. Ladouce, P. Sivadon and J. Grima-Pettenati (2004). Identification of genes preferentially expressed during wood formation in Eucalyptus. Plant Molecular Biology, 55: 263-280.

Pear, J. R., Y. Kawagoe, W. E. Schreckengost, D. P. Delmer and D. M. Stalker (1996). Higher plants contain homologs of the bacterial celA genes encoding the catalytic subunit of cellulose synthase. Proc. Natl. Acad. Sci., 93:12637-12642.

Peter, G., L. VanZyl, U. Egerstdotter, J. Mackay, W. Li and R. Sederoff (2003). Gene expression profiles during normal and compression wood formation in loblolly pine. IUFRO Tree Biotechnology Conference, Umeå, Sweden.

Pfaffl, M. W. and M. Hageleit (2001). Validities of mRNA quantification using recombinant RNA and recombinant DNA external calibration curves in real-time RT-PCR. Biotechnology Letters, 23 (4): 275-282.

Pflieger, S., V. Lefebvre and M. Causse (2001). The candidate gene approach in plant genetics: a review. Molecular Breeding, 7: 275-291.

Piquemal, J., C. Lapierre, K. Myton, A. O’Connell, W. Schuch, J. Grima-Pettenati and A.-M. Boudet (1998). Down-regulation in cinnamoyl-CoA reductase induces significant changes of lignin profiles in transgenic tobacco plants. Plant Journal, 13: 71-83.

Plomion, C., G. Leprovost, and A. Stokes (2001). Wood formation in trees. Plant Physiology, 127: 1513-1523.

Pot, D., G. Chantre, P. Rozenberg, J. C. Rodrigues, G. L. Jones, H. Pereira, B. Hannrup, C. Cahalan and C. Plomion (2002). Genetic control of pulp and timber properties in maritime pine (Pinus pinaster Ait.). Ann. For. Sci., 59: 563-575.

Pot, D., L. McMillan, C. Echt, G. Provost, P. Garnier-Géré, S. Cato and C. Plomion (2005). Nucleotide variation in genes involved in wood formation in two pine species. New Phytologist, 167: 101-112.

Prigge, M. J. and S. E. Clark (2006). Evolution of the class III HD-Zip gene family in land plants. Evolution & Development, 8(4): 350-361.

Pritchard, J. K., M. Stephens, N. A. Rosenberg and P. Donnelly (2000). Association mapping in structured populations. Am. J. Hum. Genet., 67: 170-181.

Qiu, D., I. W. Wilson, S. Gan, R. Washusen, G. F. Moran and S. G. Southerton (2008). Gene expression in Eucalyptus branch wood with marked variation in cellulose microfibril orientation and lacking G-layers. New Phytologist, 179: 111-120.

Rafalski, A. (2002). Applications of single nucleotide polymorphisms in crop genetics. Current Opinion in Plant Biology, 5: 94-100.

Rafalski, J.A. (2002). Novel genetic mapping tools in plants: SNPs and LD-based approaches. Plant Science, 162: 329-333.

Rajeevan, M. S., D. G. Ranamukhaarachchi, S. D. Vernon and E. R. Unger (2001). Use of real-time quantitative PCR to validate the results of cDNA array and differential display PCR technologies. Methods, 25(4): 443-451.

Ranik, M. and A. A. Myburg (2006). Six new cellulose synthase genes from Eucalyptus are associated with primary and secondary cell wall biosynthesis. Tree Physiology, 26: 545-556.

Ranocha, P., D. Goffner and A. M. Boudet (2000). Plant laccases: are they involved in lignification? In: Cell and Molecular Biology of Wood Formation (R. Savidge, J. Barnett and R. Napier eds.), BIOS Scientific Publications, Oxford: 397-410.

Raymond, C. A. (2002). Genetics of Eucalyptus wood properties. Ann. For. Sci., 59: 525-531.

Raymond, C. A., L. R. Schimleck, A. Muneri and A. J. Michell (2001). Nondestructive sampling of Eucalyptus globulus and E. nitens for wood properties. III. Predicted pulp yield using near infrared reflectance analysis. Wood Science and Technology, 35: 203-215.

Reghunathan, R., M. Jayapal, L.-Y. Hsu, H.-H. Chng, D. Tai, B. P. Leung and A. J. Melendez (2005). Expression profile of immune response genes in patients with severe acute respiratory syndrome. BMC Immunol. 6(2):1-11.

Reich, D. E., M. Cargill, S. Bolk, J. Ireland, P. C. Sabeti, D. J. Richter, T. Lavery, R. Kouyoumjian, S. F. Farhadian, R. Ward and E. S. Lander (2001). Linkage disequilibrium in the human genome. Nature, 411: 199-204.

Reiter, W.-D. (1998). The molecular analysis of cell wall components. Trends in Plant Science, 3 (1): 27-32.

Remington, D. L., J. M. Thornsberry, Y. Matsuoka, L. M. Wilson, S. R. Whitt, J. Doebley, S. Kresovich, M. M. Goodman and E. S. Buckler IV (2001). Structure of linkage disequilibrium and phenotypic associations in the maize genome. Proc. Natl. Acad. Sci., 98(20): 11479-11484.

Remm, M. and A. Metspalu (2001). High-density genotyping and linkage disequilibrium in the human genome using chromosome 22 as a model. Current Opinion in Chemical Biology, 6: 24-30.

Richmond, T. A. and C. R. Somerville (2001). Integrative approaches to determining Csl function. Plant Molecular Biology, 47: 131-143.

Richmond, T. A. and C. R. Somerville (2000). The cellulose synthase superfamily. Plant Physiology, 124: 495-498.

Roberts, K. (2001). How the cell wall acquired a cellular context. Plant Physiology, 125: 127-130.

Rose, J. K. C., J. Braam, S. C. Fry and K. Nishitani (2002). The XTH family of enzymes involved in xyloglucan endotransglucosylation and endohydrolysis: current perspectives and a new unifying nomenclature. Plant Cell Physiol., 43: 1421-1435.

Romo, S., B. Dopico and E. Labrador (2002). The expression of new Cicer arietinum cDNA, encoding a glutamic acid-rich protein, is related to development. J. Plant Physiol., 159: 1375-1381.

Rozas, J., J. C. Sánchez-DelBarrio, X. Messeguer and R. Rozas (2003). DnaSP, DNA polymorphisms analysis by the coalescent and other methods. Bioinformatics, 19: 2496-2497.

Rozen, S. and H. Skaletsky (2000). Methods in molecular biology, Vol. 132: bioinformatics methods and protocols (eds, S. Misener and S. A. Krawetz), Humana Press Inc, Totowa, NJ.

Sahin-Cevik, M. and G. A. Moore (2006). Isolation and characterization of a novel RING-H2 finger gene induced in response to cold and drought in the interfertile Citrus relative Poncirus trifoliata. Physiol. Plant., 126: 153-161.

Samuga, A. and C. P. Joshi (2002). A new cellulose synthase gene (PtrCesA2) from aspen xylem is orthologous to Arabidopsis AtCesA7 (irx3) gene associated with secondary cell wall synthesis. Gene, 296: 37-44.

Schadt, E. E., S. A. Monks, T. A. Drake, A. J. Lusis, N. Che, V. Colinayo, T. G. Ruff, S. B. Milligan, J. R. Lamb, G. Cavet, P. S. Linsley, M. Mao, R. B. Stoughton and S. H. Friend (2003). Genetics of gene expression surveyed in maize, mouse and man. Nature, 422: 297-302.

Schmid, K. J., T. R. Sörensen, R. Stracke, O. Törjék, T. Altmann, T. Mitchell-Olds and B. Weisshaar (2003). Large-scale identification and analysis of genome-wide single-nucleotide polymorphisms for mapping in Arabidopsis thaliana. Genome Research, 13: 1250-1257.

Schouten, J. P., C. J. McElgunn, R. Waaijer, D. Zwijnenburg, F. Diepvens and G. Pals (2002). Relative quantification of 40 nucleic acid sequences by multiplex ligation-dependent probe amplification. Nucleic Acids Research, 30(12): e57.

Schrader, J. (2003). Developmental biology of wood formation-finding regulatory factors through functional genomics. PhD Thesis. Swedish University of Agricultural Sciences.

Schultz, C. J., M. P. Rumsewicz, K. L. Johnson, B. J. Jones, Y. M. Gaspar and A. Bacic (2002). Using genomic resources to guide research directions. The arabinogalactan protein gene family as a test case. Plant Physiology, 129: 1448-1463.

Scurfield, G. and A. B. Wardrop (1962). The nature of reaction wood; the reaction anatomy of seedlings of woody perennials. Australian Journal of Botany, 10: 93-105.

Serrano, M. and P. Guzmán (2004). Isolation and gene expression analysis of Arabidopsis thaliana mutants with constitutive expression of ATL2, an early elicitor-response RING-H2 zinc-finger gene. Genetics, 167: 919-929.

Searle, S. R. (1987). Linear models for unbalanced data, John Wiley & Sons.

Sewalt, V. J. H., W. Ni, J. W. Blount, H. G. Jung, S. A. Masoud, P. A. Howles, C. Lamb and R. A. Dixon (1997). Reduced lignin content and altered lignin composition in transgenic tobacco down-regulated in expression of L-phenylalanine ammonia-lyase or cinnamate 4-hydroxylase. Plant Physiol, 115: 41-50.

Shen, R., J.-B. Fan, D. Campbell, W. Chang, J. Chen, D. Doucet, J. Yeakley, M. Bibikova, E. W. Garcia, C. McBride, F. Steemers, F. Garcia, B. G. Kermani, K. Gunderson and A. Oliphant (2005). High-throughput SNP genotyping on universal bead arrays. Mutation Research, 573: 70-82.

Shi, M. M. (2001). Enabling large-scale pharmacogenetic studies by high-throughput mutation detection and genotyping technologies. Clinical Chemistry, 47 (2): 164-172.

Showalter, A. M. (2001). Arabinogalactan-proteins: structure, expression and function. Cell. Mol. Life. Sci., 58: 1399-1417.

Showalter, A. M. (1993). Structure and function of plant cell wall proteins. Plant Cell, 5: 9-23.

Showalter, A. M. and J. E. Varner (1989). Plant hydroxyproline-rich glycoproteins. The Biochemistry of plants. P. K. Stumpf and E. E. Conn. New York, Academic Press: 485-520.

Sibout, R., A. Eudes, G. Mouille, B. Pollet, C. Lapierre, L. Jouanin and A. Séguin (2005). Cinnamyl alcohol dehydrogenase-C and -D are the primary genes involved in lignin biosynthesis in the floral stem of Arabidopsis. Plant Cell, 17: 2059-2076.

Siefert, G. J. and K. Roberts (2007). The biology of arabinogalactan proteins. Annual Review of Plant Biology, 58:137-161.

Sims, T. L. and M. Ordanic (2001). Identification of a S-ribonuclease-binding protein in Petunia hybrida. Plant Molecular Biology, 47: 771-783.

Soleimani, V. D., B. R. Baum and D. A. Johnson (2003). Efficient validation of single nucleotide polymorphisms in plants by allele-specific PCR, with an example from barley. Plant Molecular Biology Reporter, 21: 281-288.

Solomon, L. and A. Myburg (2007). Diurnal profiling of cambial gene expression allows dissection of carbon allocation pathways in Eucalyptus. IUFRO Tree Biotechnology Conference, Portugal.

Sommer-Knudsen, J., A. Bacic, and A. E. Clarke (1998). Hydroxyproline-rich plant glycoproteins. Phytochemistry, 47(4): 483-497.

Souer, E., A. Van Houwelingen, D. Kloos, J. Mol and R. Koes (1996). The no apical meristem gene of petunia is required for pattern formation in embryos and flowers and is expressed at meristem and primordial boundaries. Cell, 85 (2): 159-170.

Southerton, S. G., H. Marshall, A. Mouradov and R. D. Teasdale (1998). Eucalypt MADS-box gene expressed in developing flowers. Plant Physiology, 118(2): 365-372.

Spielman, R. S., R. E. McGinnis and W. J. Ewens (1993). Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am. J. Hum. Genet., 52: 506-516.

Spokevicius, A. V., S. G. Southerton, C. P. MacMillan, D. Qiu, S. Gan, J. F. G. Tibbits, G. F. Moran and G. Bossinger (2007). β-tubulin affects cellulose microfibril orientation in plant secondary fibre cell walls. The Plant Journal, 51: 717-726.

Sterky, F., R. R. Bhalerao, P. Unneberg, B. Segerman, P. Nilsson, A. M. Brunner, L. Charbonnel-Campaa, J. J. Lindvall, K. Tandre, S. H. Strauss, B. Sundberg, P. Gustafsson, M. Uhlén, R. P. Bhalerao, O. Nilsson, G. Sandberg, J. Karlsson, J. Lundeberg and S. Jansson (2004). A Populus EST resource for plant functional genomics. Proc. Natl. Acad. Sci., 101(38): 13951-13956.

Sterky, F., S. Regan, J. Karlsson, M. Hertzberg, A. Rohde, A. Holmberg, B. Amini, R. Bhalerao, M. Larsson, R. Villarroel, M. V. Montagu, G. Sandberg, O. Olsson, T. T. Teeri, W. Boerjan, P. Gustafsson, M. Uhlén, B. Sundberg and J. Lundeberg (1998). Gene discovery in the wood-forming tissues of poplar: Analysis of 5692 expressed sequence tags. Proc. Natl. Acad. Sci., 95: 13330-13335.

Steves, T. A. and I. M. Sussex (1989). Patterns in plant development. Cambridge University Press, Cambridge, UK.

Suzuki, S., L. Li, Y.-H. Sun and V. L. Chiang (2006). The cellulose synthase gene superfamily and biochemical functions of xylem-specific cellulose synthase-like genes in Populus trichocarpa. Plant Physiology, 142: 1233-1245.

Syvänen, A.C. (2005). Toward genome-wide SNP genotyping. Nature Genetics, 37: S5-S10.

Tabor, H. K., N. J. Risch and R. M. Myers (2002). Candidate-gene approaches for studying complex genetic traits: practical considerations. Nature Reviews Genetics, 3: 1-7.

Tajima, F. (1989). Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics, 123:585-595.

Tamagnone, L., A. Merida, A. Parr, S. Mackay, F. A. Culianez-Macia, K. Roberts and C. Martin (1998). The AmMYB308 and AmMYB330 transcription factors from Antirrhinum regulate phenylpropanoid and lignin biosynthesis in transgenic tobacco. Plant Cell, 10: 135-154.

Taniguchi, M., K. Miura, H. Iwao and S. Yamanaka (2001). Quantitative assessment of DNA microarrays: comparison with northern blot analyses. Genomics, 71: 34-39.

Taylor, N. G., R. M. Howells, A. K. Huttly, K. Vickers and S. R. Turner (2003). Interactions among three distinct CesA proteins essential for cellulose synthesis. Proc. Natl. Acad. Sci., 100(3): 1450-1455.

Taylor, N. G., W.-R. Scheible, S. Cutler, C. R. Somerville and S. R. Turner (1999). The irregular xylem3 locus of Arabidopsis encodes a cellulose synthase required for secondary cell wall synthesis. Plant Cell, 11: 769-779.

Taylor, M. A., S. A. M. Arif, A. Kumar, H. V. Davies, L. A. Scobie, S. R. Pearce and A. J. Flavell (1992). Expression and sequence analysis of cDNAs induced during the early stages of tuberisation in different organs of the potato plant (Solanum tuberosum L.). Plant Molecular Biology, 20: 641-651.

Thamarus, K. A., K. Groom, J. Murrell and M. Byrne (2002). A genetic linkage map for Eucalyptus globulus with candidate loci for wood, fibre and floral traits. Theor Appl Genet, 104: 379-387.

The Arabidopsis Genome Initiative (2000). Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature, 408: 796-815.

Thumma, B. R., S. Dillon, N. Bhuiyan, J. George, M. F. Nolan, W. Li, J. Gorantla, X. Li, C. J. Bell, C. P. MacMillan and S. G. Southerton (2007). Improving wood fibre properties in Pinus radiata and eucalypts through association studies. IUFRO Tree Biotechnology Conference, Portugal.

Thumma, B. R., M. F. Nolan, R. Evans and G. F. Moran (2005). Polymorphisms in Cinnamoyl CoA Reductase (CCR) are associated with variation in microfibril angle in Eucalyptus spp. Genetics, 171: 1257-1265.

Tibbits, J. F. G. (2006). Towards association studies in Pinus radiata D. Don: populations and wood property candidate-genes. PhD Thesis. The University of Melbourne.

Tuscan, G. A., S. Difazio, S. Jansson, J. Bohlmann, I. Grigoriev, U. Hellsten, N. Putnam, S. Ralph, S. Rombauts, A. Salamov, J. Schein, L. Sterck, A. Aerts, R. R. Bhalerao, R. P. Bhalerao, D. Blaudez, W. Boerjan, A. Brun, A. Brunner, V. Busov, M. Campbell, J. Carlson, M. Chalot, J. Chapman, G.-L. Chen, D. Cooper, P. M. Coutinho, J. Couturier, S. Covert, Q. Cronk, R. Cunningham, J. Davis, S. Degroeve, A. Déjardin, C. dePamphilis, J. Detter, B. Dirks, I. Dubchak, S. Duplessis, J. Ehlting, B. Ellis, K. Gendler, D. Goodstein, M. Gribskov, J. Grimwood, A. Groover, L. Gunter, B. Hamberger, B. Heinze, Y. Helariutta, B. Henrissat, D. Holligan, R. Holt, W. Huang, N. Islam-Faridi, S. Jones, M. Jones-Rhoades, R. Jorgensen, C. Joshi, J. Kangasjärvi, J. Karlsson, C. Kelleher, R. Kirkpatrick, M. Kirst, A. Kohler, U. Kalluri, F. Larimer, J. Leebens-Mack, J.-C. Leplé, P. Locascio, Y. Lou, S. Lucas, F. Martin, B. Montanini, C. Napoli, D. R. Nelson, C. Nelson, K. Nieminen, O. Nilsson, V. Pereda, G. Peter, R. Philippe, G. Pilate, A. Poliakov, J. Razumovskaya, P. Richardson, C. Rinaldi, K. Ritland, P. Rouzé, D. Ryaboy, J. Schmutz, J. Schrader, B. Segerman, H. Shin, A. Siddiqui, F. Sterky, A. Terry, C. J. Tsai, E. Uberbacher, P. Unneberg, J. Vahala, K. Wall, S. Wessler, G. Yang, T. Yin, C. Douglas, M. Marra, G. Sandberg, Y. Van de Peer and E. Rokhsar (2006). The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science, 313:158-160.

Vissenberg, K., S. C. Fry, M. Pauly, H. Höfte and J.-P. Verbelen (2005). XTH acts at the microfibril-matrix interface during cell elongation. Journal of Experimental Botany, 56: 673-683.

Washusen, R., R. Evans and S. Southerton (2005). A study of Eucalyptus grandis and Eucalyptus globulus branch wood microstructure. IAWA Journal, 26: 203-210.

Watterson, G. A. (1975). On the number of segregating sites in the genetical models without recombination. Theor. Popul. Biol. 7: 256-276.

Weaver, T. A. (2000). High-throughput SNP discovery and typing for genome-wide genetic analysis. New technologies for life sciences, A Trends Guide: 36-42.

Weiner, M. P. and T. J. Hudson (2002). Introduction to SNPs: Discovery of Markers for Disease. Biotechniques, 32: S4-S13.

Whetten, R., Y.-H. Sun, Y. Zhang and R. Sederoff (2001). Functional genomics and cell wall biosynthesis in loblolly pine. Plant Molecular Biology, 47: 275-291.

Williams, J. E. and M. I. H. Brooker (1997). Eucalypts: an introduction. In Eucalypt ecology: individuals to ecosystems (J. Williams and J. Woinarski, eds). Cambridge University Press: 1-15.

Williamson, S. (2004). Eucalyptus genome project branches out. Australian Life Scientist, Research News: 8.

Wilson, D. L., M. J. Buckley, C. A. Helliwell and I. W. Wilson (2003). New normalization methods for cDNA microarray data. Bioinformatics, 19(11): 1325-1332.

Xu, R. and Q. Q. Li (2003). A RING-H2 zinc-finger protein gene RIE1 is essential for seed development in Arabidopsis. Plant Molecular Biology, 53: 37-50.

Yamada, R., T. Tanaka, Y. Ohnishi, K. Suematsu, M. Minami, T. Seki, M. Yukioka, A. Maeda, N. Murata, O. Saiki, R. Teshima, O. Kudo, K. Ishikawa, A. Ueyosi, H. Tateishi, M. Inaba, H. Goto, Y. Nishizawa, S. Tohma, T. Ochi, K. Yamamoto and Y. Nakamura (2000). Identification of 142 single nucleotide polymorphisms in 41 candidate genes for rheumatoid arthritis in the Japanese population. Hum Genet, 106: 293-297.

Yokoyama, R. and K. Nishitani (2006). Identification and characterization of Arabidopsis thaliana genes involved in xylem secondary cell walls. J. Plant. Res. 119:189-194.

Yu, J., S. Hu, J. Wang, G. K.-S. Wong, S. Li, B. Liu, Y. Deng, L. Dai, Y. Zhou, X. Zhang, M. Cao, J. Liu, J. Sun, J. Tang, Y. Chen, X. Huang, W. Lin, C. Ye, W. Tong, L. Cong, J. Geng, Y. Han, L. Li, W. Li, G. Hu, X. Huang, W. Li, J. Li, Z. Liu, L. Li, J. Liu, Q. Qi, J. Liu, L. Li, T. Li, X. Wang, H. Lu, T. Wu, M. Zhu, P. Ni, H. Han, W. Dong, X. Ren, X. Feng, P. Cui, X. Li, H. Wang, X. Xu, W. Zhai, Z. Xu, J. Zhang, S. He, J. Zhang, J. Xu, K. Zhang, X. Zheng, J. Dong, W. Zeng, L. Tao, J. Ye, J. Tan, X. Ren, X. Chen, J. He, D. Liu, W. Tian, C. Tian, H. Xia, Q. Bao, G. Li, H. Gao, T. Cao, J. Wang, W. Zhao, P. Li, W. Chen, X. Wang, Y. Zhang, J. Hu, J. Wang, S. Liu, J. Yang, G. Zhang, Y. Xiong, Z. Li, L. Mao, C. Zhou, Z. Zhu, R. Chen, B. Hao, W. Zheng, S. Chen, W. Guo, G. Li, S. Liu, M. Tao, J. Wang, L. Zhu, L. Yuan, and H. Yang (2002). A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science, 296: 79-92

Zhang, D. and Z. Zhang (2005). Single nucleotide polymorphisms (SNPs) discovery and linkage disequilibrium (LD) in forest trees. Forestry Studies in China, 7(3): 1-14.

Zhang, P. and S. Bohl-Zenger (2003). Two cassava promoters related to vascular expression and storage root formation. Planta, 218: 192-203.

Zhang, K., F. Sun, M. S. Waterman and T. Chen (2003). Haplotype block partition with limited resources and applications to human chromosome 21 haplotype data. Am. J. Hum. Genet., 73: 63-73.

Zhang, B., Y. Zhou, L. Zhang, Q. Zhuge, M.-X. Wang and M.-R. Huang (2005). Identification and validation of single nucleotide polymorphisms in poplar using publicly expressed sequence tags. Journal of Integrative Plant Biology, 47(12): 1493-1499.

Zhang, X., V. Garreton and N.-H. Chua (2005). The AIP2 E3 ligase acts as a novel negative regulator of ABA signaling by promoting ABI3 degradation. Genes Dev., 19: 1532-1543.

Zhong, R., T. Demura, and Z.-H. Ye (2006). SND1, a NAC domain transcription factor, is a key regulator of secondary wall synthesis in fibers of Arabidopsis. The Plant Cell Preview: 1-13.

Zhong, R., S. J. Kays, B. P. Schroeder and Z.-H. Ye (2002). Mutation of a chitinase-like gene causes ectopic deposition of lignin, aberrant cell shapes, and overproduction of ethylene. Plant Cell, 14: 165-179.


Minerva Access is the Institutional Repository of The University of Melbourne

Author/s: Bhuiyan, N.

Title: Identification of genes influencing wood fibre properties in Eucalyptus nitens

Date: 2008

Citation: Bhuiyan, N. (2008). Identification of genes influencing wood fibre properties in Eucalyptus nitens. PhD thesis, School of Forest and Ecosystem Science, The University of Melbourne.

Publication Status: Unpublished

Persistent Link: http://hdl.handle.net/11343/35124

File Description: Identification of genes influencing wood fibre properties in Eucalyptus nitens

Terms and Conditions: Copyright in works deposited in Minerva Access is retained by the copyright owner. The work may not be altered without permission from the copyright owner. Readers may only download, print and save electronic copies of whole works for their own personal non-commercial use. Any use that exceeds these limits requires permission from the copyright owner. Attribution is essential when quoting or paraphrasing from these works.