Louisiana State University LSU Digital Commons

LSU Doctoral Dissertations Graduate School

3-14-2018 The etD erminants of Patterns and the Impact of Phosphate Starvation on Nucleosome Patterns and Gene Expression in Rice Qi Zhang Louisiana State University and Agricultural and Mechanical College, [email protected]

Follow this and additional works at: https://digitalcommons.lsu.edu/gradschool_dissertations Part of the Agronomy and Crop Sciences Commons, Bioinformatics Commons, Genetics Commons, Genomics Commons, and the Plant Breeding and Genetics Commons

Recommended Citation Zhang, Qi, "The eD terminants of Nucleosome Patterns and the Impact of Phosphate Starvation on Nucleosome Patterns and Gene Expression in Rice" (2018). LSU Doctoral Dissertations. 4502. https://digitalcommons.lsu.edu/gradschool_dissertations/4502

This Dissertation is brought to you for free and open access by the Graduate School at LSU Digital Commons. It has been accepted for inclusion in LSU Doctoral Dissertations by an authorized graduate school editor of LSU Digital Commons. For more information, please [email protected]. THE DETERMINANTS OF NUCLEOSOME PATTERNS AND THE IMPACT OF PHOSPHATE STARVATION ON NUCLEOSOME PATTERNS AND GENE EXPRESSION IN RICE

A Dissertation

Submitted to the Graduate Faculty of the Louisiana State University and Agricultural and Mechanical College in partial fulfillment of the requirements for the degree of Doctor of Philosophy

in

The Department of Biological Sciences

by Qi Zhang B.S., Sichuan Agricultural University, 2012 May 2018 ACKNOWLEDGEMENTS

The journey towards the completion of the degree hasn’t been easy, and I would like to

express my heartfelt gratitude to everyone who has been around me, listening to me, and

supporting me.

I would like to thank my major professor, Dr. Aaron Smith for accepting me as his

graduate student and providing financial support. I would like to thank him for his time and

effort to train me to become an independent scientist, as well as his patience with me especially

when facing difficulties. His guidance and encouragement has been invaluable throughout my

graduate studies.

I would like to thank my graduate committee members Dr. John Larkin, Dr. Maheshi

Dassanayake, and Dr. David Donze for their continuous support, insightful comments, and encouragement. I thank Dr. Maud Walsh for her service as a Dean’s representative on my

graduate committee. My special thanks to Dr. Larkin and Dr. Dassanayake for their useful discussions during joint lab meetings and their critical reading of my first manuscript. I would like to express my sincere gratitude to Dr. Dassanayake and Dr. Dong-Ha Oh for helping me

start genomic data analysis and providing very useful suggestions and assistance on my research.

I would like to acknowledge Sandra DiTusa for teaching me laboratory as well as lab

management skills. I thank Dr. Scott Herke and Dr. Subbaiah Chalivendra for useful tips on

experimental trouble-shooting. I would like to thank Dr. James Moroney, Dr. Marc Cohn, and

Dr. Joomyeong Kim for their suggestions on my research and presentations.

I thank current and former members of the Smith lab, Maryam Foorozani, Jifeng Li,

Matthew Vandal, Dr. Elena Fontenot and Dr. Sarah Zahraeifard for their discussions, help, and

encouragement.

ii

Lastly I would like to thank my parents Zhijian Zhang and Yaping Shen, and my uncle

Prof. Pei-Yi Shen for their support and love. My family has been inspiring and encouraging since my first day of school.

iii

TABLE OF CONTENTS

ACKNOWLEDGEMENTS ...... ii

ABBREVIATIONS ...... v

ABSTRACT ...... vi

CHAPTER 1. BACKGROUND ...... 1

CHAPTER 2. NUCLEOSOME POSITIONING AND OCCUPANCY IN RICE ...... 20

CHAPTER 3. THE IMPACT OF PHOSPHATE STARVATION ON NUCLEOSOME PATTERNS AND GENE EXPRESSION IN RICE ...... 45

CHAPTER 4. CONCLUSIONS ...... 60

REFERENCES ...... 65

APPENDIX: COPYRIGHT INFORMATION ...... 75

VITA ...... 76

iv

ABBREVIATIONS

ChIP

COR coronatine

DEG differentially expressed gene

DHS DNase I hypersensitivity site

FDR false discovery rate

FPKM fragments per kilobase of transcript per million mapped reads

GB gene body

GO gene ontology

MNase-seq sequencing of micrococcal nuclease digested chromatin

NDR nucleosome depleted region

NFR nucleosome free region

PCC Pearson's correlation coefficient

PCG protein-coding gene

PSG pseudogenes

PSR phosphate starvation response

RNAP II RNA polymerase II

RNA-seq RNA sequencing

SCC Spearman's correlation coefficient

TE transposable element

TEG transposable element related gene

TF factor

TSS transcription start site

TTS transcription termination site

v

ABSTRACT

In eukaryotic cells, DNA is a large molecule that must be greatly condensed to fit within

the nucleus. DNA is wrapped around proteins to form , which facilitate

DNA condensation, but on the other hand, may limit DNA processes. Organisms must respond

to environmental stress in order to survive, and one strategy is by remodeling nucleosomes to

promote changes in DNA accessibility to alter gene expression. Studies have demonstrated a

clear correlation between nucleosome dynamics and transcriptional change in some eukaryotes,

however factors that affect nucleosome positioning in plants are largely unknown, and the

correlation between nucleosome dynamics and transcriptional changes in response to

environmental perturbation remain unclear.

We report a high-resolution map of nucleosome patterns in the rice (Oryza sativa)

genome by deep sequencing of micrococcal nuclease digested chromatin. The results reveal that

nucleosome patterns at rice genes were affected by both cis- and trans- determinants, including

GC content and transcription. A negative correlation between nucleosome occupancy across the transcription start site (TSS) and transcription was observed, and the nucleosome patterns across the TSS were correlated with distinct functional categories of genes. A parallel experiment was done monitoring nucleosome dynamics and transcription changes in response to phosphate starvation for 24 hours. Phosphate starvation resulted in numerous instances of nucleosome dynamics across the genome which were enhanced at differentially expressed genes.

This work demonstrates that rice nucleosome patterns are suggestive of gene functions, and reveal a link between chromatin remodeling and transcriptional changes in response to deficiency of a major macronutrient. The findings help to enhance the understanding towards eukaryotic gene regulation at the chromatin level.

vi

CHAPTER 1. BACKGROUND

Primary structure of eukaryotic chromatin

Eukaryotic DNA must be condensed to fit into a small space within the nucleus. Studies

on primary chromatin structure reveal that 146 bp of DNA wrap around a histone octamer

consisting of each of two copies of H2A, H2B, H3 and H4 to comprise a nucleosome (Luger et

al., 1997). Linker DNA refers to the region in between nucleosomes, and can be bound by

histone H1. The length of linker DNA between nucleosomes varies among species, cell types

and developmental stages, ranging from ~20 bp to ~50 bp (Teif et al., 2012; Beh et al., 2015;

Zhang et al., 2015). Longer linkers facilitate the packaging of DNA and are related to silent genes and heterochromatin (Valouev et al., 2011), whereas shorter linkers are related to higher

expression levels (Zhang et al., 2015). The formation of nucleosomes along DNA facilitates

DNA compaction, however it makes the DNA inaccessible for important cellular processes

including DNA replication, recombination, repair and transcription (Annunziato, 2005; Radman-

Livaja and Rando, 2010; Price and D'Andrea, 2013; Pulivarthy et al., 2016). Hence, nucleosome

distribution is not static but rather is modified in accordance with these processes.

The development of genome sequencing technology enables the rapid growth in

determining genome-wide nucleosome positions at a high resolution. Genome-wide nucleosome

maps have been widely reported in a variety of species including in yeast (Lee et al., 2007), C.

elegans (Valouev et al., 2008), Drosophila (Mavrich et al., 2008b), human (Valouev et al., 2011),

Arabidopsis (Li et al., 2014) and rice (Wu et al., 2014). In observing the common patterns from the reported nucleosome maps, a stereotypical nucleosome pattern can be summarized across the transcription start site (TSS) of actively transcribed genes: a nucleosome depleted region (NDR, sometimes termed ‘nucleosome free region (NFR)’ depending on the context. I use NDR

1

throughout this dissertation) localized just upstream of the TSS. Regions upstream of the TSS

(promoters) are depleted of nucleosomes relative to regions downstream of the TSS (transcribed

regions). Evenly-spaced nucleosome arrays are found downstream of the TSS, with the strongly-

positioned first nucleosome of the array (‘+1’ nucleosome). Nucleosome arrays upstream of the

TSS with relatively positioned ‘-1’ nucleosome are observed in yeast and humans, but not in

Tetrahymena thermophila and plants, and this phenomenon is discussed in Chapter 2.

Nucleosomes toward the 5’ end of genes are relatively better positioned than the ones further downstream of the TSS, and there is an NDR immediately upstream of the transcription termination site (TTS) at the 3’ end of genes. The nucleosome arrays downstream of the TSS can be partially explained by the barrier model (Mavrich et al., 2008a), in which a positioned nucleosome or other proteins that compete for binding to the DNA (for example, transcription factors, chromatin remodelers, and RNA polymerases) act as barriers to limit the possibilities of nucleosome positioning close to the barrier, close to the first nucleosome, the second and so on, forming the ‘statistical positioning’ with a decay of nucleosome arrays away from the closest barrier (Figure 1.1).

Nucleosome patterns (i.e. positioning and occupancy) are influenced by various cis- and trans- factors, including DNA sequence preferences, DNA , histone post- translational modifications, histone variants, transcription factors, chromatin remodelers, RNA polymerases and other DNA binding proteins that compete with . Contrasting results from studies reporting the impact of a single factor on transcription suggest that the impact of a single factor is usually context-dependent and involves the interaction with other factors. I reviewed the literature on how each factor impacts nucleosome stability, and hence nucleosome positioning and occupancy, and then reviewed the interactions among those factors to discuss the

2

impact on transcription in general. I use ‘nucleosome positioning’, ‘nucleosome occupancy’ and

‘nucleosome patterns’ throughout this dissertation: nucleosome positioning refers to where the nucleosome is located with respect to the genomic DNA, and nucleosome occupancy refers to the likelihood of having a nucleosome at a specific genomic location (Struhl and Segal, 2013).

Nucleosome patterns refer to the combination of both nucleosome positioning and occupancy.

Figure 1.1. Schematic map of nucleosome patterns at eukaryotic genes. Modified from (Jiang and Pugh, 2009). Consensus distribution of yeast nucleosome across the gene is shown. The peaks and valleys represent the relative nucleosome occupancy and the green (strong) to blue (weak) color transition suggest the correlation between nucleosome composition (histone variant levels, active histone post-translational marks) and phasing. The ovals represent the nucleosome positioning and the darker color suggests stronger positioning. Nucleosome sequence preferences

Studies have revealed the sequence differences between nucleosomal DNA and naked

DNA, suggesting that nucleosomal DNA has a sequence preference. Unlike classic protein-

DNA binding, nucleosomal DNA requires the sharp bending of the DNA molecule nearly twice across the histone octamer and plays roles in establishing compact higher-order chromatin structure. The 10-bp periodic WW (A/T) dinucleotides oscillating in phase with each other and out of phase with 10-bp periodic SS (G/C) dinucleotides (Figure 1.2) provides local bendability for the nucleosomal DNA and are enriched within the nucleosome in humans, yeast, Drosophila,

3

C. elegans and rice (Lee et al., 2007; Mavrich et al., 2008a; Mavrich et al., 2008b; Valouev et al.,

2011; Zhang et al., 2015). On the other hand, the rigid poly (dA:dT) tracks are found to resist

nucleosome formation, causing nucleosome depleted regions (NDR) across the genome (Yuan et

al., 2005; Lee et al., 2007; Mavrich et al., 2008a; Mavrich et al., 2008b; Struhl and Segal, 2013).

The particular dinucleotide oscillations at the well-positioned +1 nucleosome (the first nucleosome downstream of the transcription start site) varies among species: yeast and C. elegans use WW (A/T) dinucleotides and Drosophila and humans use SS (G/C) dinucleotides, as the oscillating pattern of WW or SS dinucleotides can be used to identify nucleosomes respectively (Mavrich et al., 2008b; Valouev et al., 2011). Importantly, trans-acting factors are found to constantly override sequence preferences. In humans, the insulator CTCF and repressor

NRSF binding sites are surrounded by nucleosome arrays and the binding sites are depleted of nucleosomes, however those sites intrinsically favor nucleosome positioning and exhibit high nucleosome occupancy in vitro suggesting the action of trans-factors in vivo to remove nucleosomes at those sites (Valouev et al., 2011). In yeast, the chromatin remodeler Isw2 inhibits transcription by positioning nucleosomes over the TSS and NDR where unfavorable sequences are usually found (Whitehouse et al., 2007).

Another way to characterize the impact of intrinsic DNA sequence on nucleosome positioning is the GC content. However, the effect of GC content on nucleosome positioning is controversial, possibly due to its partially overlapping role with CpG methylation (see below), and the exact sequence arrangement within the length of genomic location of interest. Generally, high GC content is found to favor nucleosome positioning (Tillo and Hughes, 2009; Gaffney et al., 2012). It is shown that nucleosomes are stabilized by GC-rich sequences, yet the positioning is not strong, and nucleosome positioning is further strengthened by flanking AT-rich repelling

4

sequences to restrict the mobility of the nucleosome in humans (Valouev et al., 2011) and rice

(Zhang et al., 2015). As a result, nucleosomal DNA is generally GC-rich and linker DNA is AT-

rich. GC content plays a more important role in determining in vivo nucleosome positions in

organisms with low-GC genomes, for example, in Tetrahymena thermophila (22% GC) (Beh et

al., 2015). Nucleosome patterns across the transcription start site (TSS) in yeast (38% GC) show

Figure 1.2. Crystal structure of the nucleosome. Modified from (Bonisch and Hake, 2012). Color codes: Yellow-H2A, Red-H2B, Blue-H3, Green-H4, Light Grey-DNA. L1 loop, acidic patch and C-terminal docking domain are shown in magenta, cyan and orange respectively. WW and SS: 10-bp periodic WW (A/T) dinucleotides that oscillate in phase with each other and out of phase with 10-bp periodic SS (G/C) dinucleotides within the nucleosomal DNA. Sequence preference is obtained from (Struhl and Segal, 2013). significant differences between in vivo and in vitro data as the in vivo nucleosome arrays across

the TSS are lost in vitro however in T. thermophila, in vivo and in vitro nucleosome patterns

across the TSS are largely the same. With a low-GC genome that generally disfavors

nucleosomes, T. thermophila uses oscillating GC content downstream of the TSS to guide

nucleosome positions, especially at the +1 nucleosome. In vitro nucleosomes are weakly

5

positioned across the TSS in yeast, as no GC content oscillation is found downstream of the TSS,

suggesting a more dominant role of trans- determinants in nucleosome positioning in vivo (Beh

et al., 2015).

DNA methylation and nucleosome positioning

Covalent modification of DNA changes its flexibility and affinity to histones and thus has

an impact on nucleosome formation. The addition of a methyl group to the fifth carbon of a

cytosine produces 5-methyl-cytosine, which is usually adjacent to a guanine (CpG dinucleotides).

This process is catalyzed by DNA methyl-transferase (DNMT) and the changes in methylation

are widely shown to be correlated with eukaryotic growth, development and disease (Jin et al.,

2011; Smith and Meissner, 2013; Dantas Machado et al., 2015). The hydrophobic methyl group not only influences the direct contacts between DNA and proteins, but also changes the conformation of the DNA binding site (Dantas Machado et al., 2015). It is unclear whether there is a causal correlation between DNA methylation and nucleosome formation, and how DNA methylation could alter nucleosome stability. In Arabidopsis and humans, it was shown that

DNA methylation is enriched at nucleosomal DNA, and the 10-bp periodicity of DNA methylation at nucleosomal DNA further indicates that DNA is methylated while bound to histones (Chodavarapu et al., 2010). Studies suggest that DNA methylation impacts nucleosome formation indirectly via other factors: DNMTs are shown to interact with both DNA and histones for proper anchoring to function (Jin et al., 2011), and to overcome nucleosome barriers, nucleosome remodeler DDM1 is involved in nucleosomal DNA methylation (Lyons and

Zilberman, 2017). In addition, there is a positive correlation (r=0.853) between nucleosome occupancy and the number of methylated CpG dinucleotides (Collings and Anderson, 2017). It has been shown that the loss of DNA methylation results in nucleosome reorganization and

6

gaining of active or permissive histone modification marks including H3K27ac, H3K4me3 and

histone variant H2A.Z deposition at CpG island (CGI) promoters but not at insulators or non-

CGI promoters, suggesting the methylation of nucleosomal DNA is also context-dependent (Lay et al., 2015). Studies have been carried out to elucidate the impact of DNA methylation on nucleosome stability with contrasting results (i.e. results show that DNA methylation either stabilizes (Collings et al., 2013; Lee et al., 2015) or destabilizes (Portella et al., 2013) nucleosomes). It is also indicated that when the given sequence is methylated, the dependence of the nucleosome formation energy on the sequence becomes moderate, making strongly or weakly formed nucleosomes weakened or strengthened, respectively (Minary and Levitt, 2014).

DNA methylation can also affect histone post-translational modifications. Methylated

CpG can be recognized by proteins that contain a methyl-CpG-binding domain (MBD), which can further recruit (HDAC) and other chromatin remodeling complexes, resulting in the repression of transcription (Baylin et al., 2001; Robertson, 2002). Similarly, it is shown that by recognizing MBD, DNMT may directly recruit histone methyl-transferase (HMT) to the region, causing the methylation of the associated histones (Jin et al., 2011). A link between DNA methylation and histone variant deposition has also been revealed. Replacement of canonical histone H2A with its variant H2A.Z is excluded from regions with methylation within the gene bodies. This correlation is conserved from plants to animals, indicating the role of H2A.Z deposition in protecting DNA from methylation and maintaining transcription activation (Zilberman et al., 2008; Zemach et al., 2010). The close interplay among DNA methylation, histone modifications, and histone variant deposition suggests their complex yet balanced roles in maintaining chromatin stability while facilitating biological functions.

7

Histone variants and nucleosome stability

Histone proteins within nucleosomes can be replaced by paralogous gene products, so nucleosome stability can therefore be altered by the interaction with these variants. Among core histones, the H2A family has the largest number of variants, including H2A.Z and H2A.X which are found universally in eukaryotes (Talbert and Henikoff, 2010). Additionally, H2A.Z is about

60% identical to the canonical H2A within the same species but is about 80% identical among species, suggesting its unique functions over other histone variants (Bonisch and Hake, 2012).

Here I focus on reviewing the current understanding of how H2A.Z incorporation affects nucleosome stability, and how H2A.Z interacts with other factors. H2A.Z is deposited by the

SWR1 chromatin remodeling complex and removed by SWI/SNF and INO80 complexes (Luk et al., 2010; Papamichos-Chronakis et al., 2011). The sequence preferences for H2A.Z-containing nucleosomes remain unchanged as the canonical H2A (Segal and Widom, 2009). As the H2A-

H2B dimer is at the edges of the nucleosome (Figure 1.2), and disassembly of a nucleosome requires the initial removal of an H2A-H2B dimer, the stability of the H2A-H2B dimer in a nucleosome is crucial for the stability of the entire nucleosome (Henikoff, 2008). The structural differences between H2A.Z and the canonical H2A account for local nucleosome stability as well as higher-order chromatin stability. The most significant difference is at the L1-L1 interface between the two H2As (Figure 1.2), which is important for the interaction between the two histone proteins by holding the two H2A-H2B dimers together (Suto et al., 2000). H2A.Z has a more extensive L1-L1 interface that results in a more stable association between two H2A-

H2B dimers as well as a more dynamic interface with (H3-H4)2 tetramer due to loss of

interaction (Luger et al., 1997; Suto et al., 2000). It is proposed that such structural differences

make nucleosomes more resistant towards dimer loss and open nucleosome structure (Bonisch

8 and Hake, 2012). Secondly, the C-terminus of H2A, which is located at the DNA entry/exit (i.e. where DNA begins and finishes the ~1.65 rounds wrapping of histones) (Figure 1.2), is important for histone-DNA interaction, interaction with linker histone H1 as well as other factors.

H2A.Z differs from H2A at C-terminal length and amino acid sequences, which was found to stabilize nucleosomes (Bonisch and Hake, 2012). Thirdly, there is an increased acidic patch on the surface of H2A.Z-containing nucleosome (Figure 1.2). The acidic patch is a negatively charged binding surface on the nucleosome that interacts with the neighboring H4 tail and the linker histone H1 to facilitate the formation of secondary chromatin structure (Boulikas et al.,

1980; Kalashnikova et al., 2013). Thus, the increased acidic patch on H2A.Z-containing nucleosome is likely to cause compact chromatin structure. Moreover, the extended acidic patch of H2A.Z is required but not sufficient for ISWI ATP-dependent chromatin remodeler activity

(Goldman et al., 2010) and contributes to local chromatin flexibility.

To better understand the impact of H2A.Z incorporation on nucleosome stability and hence transcription, its interaction with other factors must be considered. H2A.Z plays different roles depending on its partners within the nucleosome, for example, nucleosomes with both

H2A.Z and H3 variant H3.3 are shown to be the least stable, whereas nucleosomes with both

H2A.Z and canonical H3 are more stable than the canonical form with H2A and H3 (Jin and

Felsenfeld, 2007). It is found that nucleosomes with both H2A.Z and H3.3 (the least stable combination) are enriched surrounding NDRs of active promoters, where higher nucleosome dynamics are expected (Jin et al., 2009). The different or contrasting roles of H2A.Z on nucleosome stability offers an explanation to the ‘bivalent promoters’, where both active histone post-translational modification marks (e.g. H3K4me3) and inactive (e.g. H3K27me3) marks are located, that H2A.Z makes the nucleosome more or less stable depending on which form of H3 it

9

is interacting with. H2A.Z nucleosomes are found necessary for pioneer transcription factor (TF)

binding, which can bind nucleosomal DNA directly. Binding of pioneer TFs promotes the

recruitment of SWI/SNF and INO80 complexes to remove H2A.Z nucleosomes, causing

nucleosome deletion that allows the binding of other classic TFs that bind naked DNA

(Subramanian et al., 2015). Like canonical histones, the C-terminal of H2A.Z can be post- translationally modified, for example, acetylated H2A.Z is found surrounding the TSS of active genes (Valdés-Mora et al., 2012).

Active histone post-translational marks, for example H3K4 methylation, are mainly deposited on H3.3 nucleosomes rather than H3 nucleosomes (Ooi et al., 2006). It is hypothesized that H3K4 methylation serves as a marker for histone replacement rather than an epigenetic mark itself that allows for active chromatin inheritance (Henikoff, 2008). This hypothesis is supported by the contrasting effect of H2A.Z incorporation on histone H3 or H3.3 to amplify their differences in nucleosome stability as discussed above, and in addition, the destabilization with H2A.Z and H3.3 incorporation is independent of histone (Jin and

Felsenfeld, 2007). More evidence on the co-localization of histone variants together with modification marks and their net effect on nucleosome stability and transcription can better help to understand the roles of histone variant in chromatin organization and transcriptional regulation.

Linker histone H1 acts to neutralize the charges on the linker DNA and directly interact with core histone H2A to facilitate chromatin compaction at higher orders. Recent studies have revealed the roles of histone H1 in more than structural assistance in DNA compaction but also functional impact on DNA replication, DNA repair and genome stability (Fyodorov et al., 2017).

Deletion of H1 results in shortened nucleosome spacing (Woodcock et al., 2006), which is one of the signatures to distinguish euchromatin from heterochromatin, indicating the role of H1 in

10

transcriptional regulation. The mechanism of H1-nucleosomal DNA interaction has been

revealed, and arginine 54 on H1 is necessary for H1-nucleosomal DNA interaction due to its

positive charges and hydrogen bonding ability. That chromatin of embryonic stem cells is more

accessible than differentiated cells can be partially explained by the citrullination of arginine 54

on H1, which loses the positive charges and has reduced ability for hydrogen bonding

(Christophorou et al., 2014).

Histone modifications and nucleosome stability

Histone post-translational modifications are shown to affect nucleosome stability and accessibility as they change how histones interact with each other, and with the DNA, as well as how histones interact with other nucleosome-binding proteins. As variants are found mostly within the (H2A-H2B) dimers, histone modifications are more frequently found within the (H3-

H4)2 tetramer. Modifications can be found at the N-terminal tails as well as at the core histones,

and tail modifications are found to affect transcriptional activities indirectly, the modifications at

the core histone itself are found to directly impact nucleosome stability and thus dynamics.

The N-terminal tails from H3 make contact with the linker DNA and H1, and H4 tails make contact with adjacent nucleosome surfaces (Strahl and Allis, 2000). Acetylation of at those tails neutralizes the positively charged lysines and thus decreases the interaction between the (H3-H4)2 tetramer and the DNA, resulting in increased nucleosome mobility. Methylation of

lysines and arginines are found to stabilize non-histone proteins binding to the H3 tails (Henikoff,

2008). Depending on the nature and function of such non-histone proteins, the methylation at

H3 tails can either stabilize or destabilize the nucleosomes. For example, binding of polycomb

repressive complex 1 (PRC1) to H3K27me and binding of heterochromatin protein 1 (HP1) to

H3K9me helps to anchor the H3 tail and stabilize the nucleosome while binding of chromatin

11

remodelers to acetylated H4 destabilizes nucleosomes (Henikoff, 2008). Since H3K4

methylation may serve as the marker for histone variant deposition as discussed above, histone

tail modifications may rely on the interaction with other factors to alter nucleosome stability.

Studies have shown that modifications at the histone core can directly impact nucleosome

stability. Depending on the location and nature of the modification, core histone modifications

can alter histone-DNA interactions and histone-histone interactions to change nucleosome

stability. Acetylation of H3K56 and methylation of H3R42, both located near the DNA

entry/exit site, have minor impacts on nucleosome stability but increase nucleosome ‘breathing’

(transient exposure of nucleosomal DNA at the entry/exit site) (Tessarz and Kouzarides, 2014).

For H3K56Ac, the interaction between histone and DNA becomes weaker at the DNA entry/exit

site, increasing the unwrapping of nucleosomal DNA near the site. H3K56Ac is found to be one

of the modifications that keeps chromatin accessible at a higher order level and is involved in

transcriptional regulation and DNA replication (Masumoto et al., 2005; Xu et al., 2005;

Watanabe et al., 2010). Adding a methyl group at arginine (H3R42me) not only enhances steric

hindrance but also removes a hydrogen bond donor, which slightly decreases nucleosome

stability to allow RNA polymerase II (RNA Pol II) to migrate more easily through the template

(Casadio et al., 2013). Acetylation of H3K122 and H3K64 are found to destabilize nucleosomes,

as these two lysines are located at the dyad axis (i.e. the middle of the nucleosomal DNA) where

histone-DNA interaction is the strongest. H3K122Ac and H3K64Ac are enriched at active

promoters, and promote eviction of nucleosomes at promoters, suggesting the role of these two

modifications in transcription activation (Di Cerbo et al., 2014). H3K79me and H4K91Ac both

decreases nucleosome stability by altering histone-histone interactions: methylation of H3K79 causes the loss of a hydrogen bond of the L2 loop to (Lu et al., 2008), and acetylation

12

of H4K91 neutralizes charges of the at the (H2A-H2B) dimer-(H3-H4)2 tetramer interface

(Ye et al., 2005) to decrease the association. Histone core modifications may have similar

effects as histone variant replacement. For example, glutamine 105 of H2A is located at the C-

terminal docking domain of H2A (Figure 1.2), and the modification of glutamine 105 may lead

to a similar effect as H2A.Z deposition, which has a glycine residue at position 105 in humans,

and serine in yeast with the effect of stabilizing nucleosomes (Suto et al., 2000).

Phosphate sensing, uptake and signaling

In this dissertation, I discussed the impact of phosphate starvation on nucleosome

patterns and transcription. In order to survive, organisms must respond rapidly and vigorously to

environmental stress, such as low availability of nutrients. Phosphorus (P) is an essential

nutrient as it is a structural component of nucleic acids and phospholipids, and is involved in the

regulation of biological processes. Consequently, prolonged P starvation can lead to arrest of

both growth and cell division (Secco et al., 2012). To ensure coordination between growth and

external P availability, organisms have evolved sophisticated sensing and signaling pathways.

When plant root tips make contact with low P conditions, primary root growth is arrested

suggesting the root tips are the local sensing site for inorganic phosphate (Pi) (Svistoonoff et al.,

2007). The mechanism behind the sensing is unclear in plants since no particular Pi sensor or

transceptor (a transporter and a receptor) has been identified. In yeast, PHO84 is identified as a

Pi sensing transceptor (Popova et al., 2010) and the SPX-domain (SG1/PHO81/XPR1)

containing PHO1 can be the candidate of the Pi sensing transceptor in plants. Pi itself can serve

as a signal in Pi starvation signaling, as Pi starvation response is suppressed by the application of

2- phosphite (HPO3 ) (Ticconi et al., 2001). The major form of Pi taken up by plants is determined by pH dependent Pi uptake experiments in plants: the results reveal that the uptake rates are

13

- highest between pH 5 and 6 where H2PO4 predominates. The movement of Pi anion to the sites

of uptake by plants is by diffusion and it’s a very slow process (10-12 to 10-15 m2/s). The main

site for Pi uptake is at the root hairs. Root hairs contribute to most of the total root surface area

(70%), and they are the location for Pi transporters (e.g. Pht1;1, Pht1;4) in plants (Schachtman et

al., 1998). The overall negative charge on the cell wall caused by the presence of carboxyl

groups associated with the pectic polysaccharides and the concentration differences of Pi in and

out of the cell (soil/apoplasm < 2 μM and symplasm/cytoplasm at mM level) require the

involvement of Pi transporters (Smith et al., 2003).

Pi is one of the most limiting nutrients for plants due to its low solubility in soil and poor

uptake efficiency (Raghothama, 1999). Pi fertilizers are applied to maintain crop growth, but are

mined from non-renewable sources, and over-fertilization of Pi can cause environmental

problems, including eutrophication of waterways and hypoxia (Vance et al., 2003; Conley et al.,

2009; Elser and Bennett, 2011; Li et al., 2016). Understanding how plants respond to Pi limitation, and increasing Pi-use efficiency will aid in enhancing agriculture sustainability.

Under Pi deficiency conditions, plants carry out changes to deal with the low Pi situation,

and this response is termed the Pi starvation response (PSR). Several morphological changes are

observed, including increased root to shoot biomass ratio, stimulated lateral root and root hair

growth, and inhibited lateral root growth. Physiological changes include the de-repression or activation of high affinity Pi transporters, accumulation of sucrose and anthocyanin at the deprived tissues, and the reduction of photosynthesis (Hammond et al., 2004).

Rice is an important crop species feeding a large population in the world. With a relatively small genome size, rice is used as a model organism that is relatively easy to study. In rice, the sensing and signaling of Pi starvation can be summarized as follows (Hu and Chu, 2011):

14

OsSPX1 is an SPX domain containing protein (since it contains a domain similar to yeast

PHO81, it might play a role in –Pi sensing), and it negatively regulates OsPHR2, which is a

transcription factor that plays an important role in Pi starvation signaling in rice with the

consensus binding sequence P1BS (5’-GNATATNC-3’) (Zhou et al., 2008). OsmiR399 is a direct target of OsPHR2, and OsmiR399 negatively regulates OsPHO2 (LTN1). OsPHO2 encodes a putative -conjugating (E2) that plays a role in protein degradation.

That suggests OsPHO2 regulates Pi starvation signaling by causing target protein degradation.

However, the direct target of OsPHO2 is not known, but may be Pi transporters and other

components involved in Pi starvation signaling. In addition to OsPHR2, there are other members

in the PHR1 subfamily reported to regulate Pi-starvation and Pi-homeostasis redundantly:

OsPHR1, OsPHR3, and OsPHR4 (Ruan et al., 2017).

Regulation of gene expression at the chromatin level

In animals and plants, it has been demonstrated that nucleosomes are enriched at exons of genes, and strongly positioned at intron-exon and exon-intron junctions, with RNA polymerase II enrichment at the exons over introns (Chodavarapu et al., 2010). These observations are consistent with the role of nucleosome positioning in transcription.

One of the most extensively studied models regarding the regulation of gene expression in the context of chromatin structure is the yeast PHO5 gene (Horz and Altenburger, 1981;

Almer et al., 1986; Vogel et al., 1989; Fascher et al., 1993; Svaren and Horz, 1997; Korber and

Barbaric, 2014). Also, the chromatin remodeling at the PHO5 gene provides an example of the

activation of a repressed gene in response to Pi starvation. Pi starvation in yeast results in an

increased production of secreted acid , and more than 90% of the acid phosphatase

activity is provided by Pho5. The expression of the PHO5 gene is strongly regulated by Pi

15

concentration in the medium, where no phosphate activates the expression, whereas high and

even low phosphate represses the expression. Under repressed conditions (+Pi), transcription

factor Pho4 that activates Pho5 is phosphorylated by the cyclin-CDK complex Pho80 and Pho85.

At the PHO5 promoter, there are strongly positioned nucleosomes occupying the region except

for a ~80 bp of UASp1 (Upstream Activation Sequence phosphate) is exposed (sensitive to

DNaseI digestion). UASp1 is the high-affinity binding site for Pho4, and phosphorylated Pho4

can not bind UASp1. Under Pi starvation conditions (-Pi), Pho81 serves as a starvation signal to

inhibit the activity of the Pho80-Pho85 complex, thus un-phosphorylated Pho4 binds

nucleosome-free UASp1 and further recruits Pho2, another transcription factor, as well as the

SWI/SNF complex to remove nucleosome arrays at the promoter using ATP and finally activates

PHO5 expression (Figure 1.3).

Figure 1.3. Nucleosome patterns at the PHO5 promoter under non-induced (+Pi) and induced (- Pi) conditions. Nucleosomes -1, -2, -3 and -4 are removed upon activation. Filled circles mark the two Pho4 binding sites UASp1 and UASp2. The positions are listed relative to the coding sequence (PHO5). Figure is modified from (Fascher et al., 1993). The knowledge learned from the activation of the yeast PHO5 promoter reveals the role

of transcription factors and chromatin remodelers as trans-determinants of nucleosome positioning that correlate with transcription. The interaction between nucleosomes and transcription factors can also fine-tune transcription. At the PHO5 promoter, there are two Pho4

16

binding sites, one located at -360 (UASp1, high-affinity) and the other at -250 (UASp2, low-

affinity) (Figure 1.3). The nucleosome-free UASp1 is mainly responsible for the activation of the promoter by setting the threshold of responses via binding affinity, while the nucleosome-

bound UASp2 is responsible for the overall expression level once the promoter is activated (Lam

et al., 2008). The low-affinity UASp2 is an example of nucleosome-bound transcription factor binding sites and the accessibility of such sites are likely dependent on nucleosome stability differences. For example if the binding sites are close to the entry-exit position of nucleosomal

DNA, the binding sites can be exposed transiently, as discussed above.

Nucleosome arrays at the PHO5 promoter under repressed conditions differ from the

stereotypical nucleosome patterns across the TSS, in which an NDR is surrounded by the

strongly positioned -1 and +1 nucleosome upstream and downstream of the NDR, respectively

(Figure 1.1 and 1.3). In addition, the -1 nucleosome of the PHO5 promoter covers the TATA

box, which is usually found at inducible or “stress-responsive” promoters. Positioned

nucleosomes covering the TATA box and other transcription factor binding sites at the promoter

region seems to be a typical pattern in inducible promoters (Basehoar et al., 2004). This differs

from constitutive promoters or promoters of “housekeeping” genes, where a wider NDR

provides transcription factor accessibility and allows the constant assembly of transcription

machinery without recruitment of chromatin remodeling complexes nor hydrolyzing ATP to

fulfill the “housekeeping” role.

What maintains the difference between inducible and constitutive promoters in term of the nucleosome patterns? In addition to transcription factors and chromatin remodelers, the action of RNA polymerase II plays a central role in maintaining promoter accessibility during protein-coding gene transcription. The positioned +1 nucleosome immediately downstream of

17

the TSS may serve as a strong barrier for RNA polymerase II. Deposition of histone variant

H2A.Z at the +1 nucleosome facilitates (H2A-H2B) dimer loss (Talbert and Henikoff, 2014) to

allow RNA polymerase II passage while leaving the hexasome (nucleosome without an H2A-

H2B dimer) (Lai and Pugh, 2017). On the contrary, the positioned +1 nucleosome may

contribute to the pausing of RNA polymerase II (Mavrich et al., 2008b; Bai and Morozov, 2010).

Paused Pol II is associated with promoters with pro-nucleosome sequences, and promoters of genes with less paused Pol II disfavor nucleosome assembly suggesting the role of Pol II in maintaining promoter accessibility by eviction of nucleosomes. Such behavior of Pol II is also found to poise genes for higher levels of transcription due to promoter accessibility (Adelman and Lis, 2012).

Correlation does not grant casualty and attempts have been made to elucidate the causal relationship between nucleosome dynamics and transcriptional regulation. Under induced conditions, with a deleted or mutated TATA box, the PHO5 promoter undergoes nucleosome eviction without the proper loading and activation of RNA Pol II (Fascher et al., 1993; Barbaric et al., 2007). In Drosophila, the Hsp70 gene body undergoes nucleosome loss upon heat shock that is independent of Hsp70 transcription (Petesch and Lis, 2008). Those results suggest that the loss of nucleosomes at genes (promoter or gene body) is the prerequisite of transcription, or at least independent of transcription. Recently, it was found that nucleosome dynamics is more likely an increase in nucleosome accessibility rather than nucleosome eviction (a decrease in occupancy) during transcription induction (Mueller et al., 2017).

In this dissertation, I illustrate the cis- and trans- determinants of nucleosome patterns in rice including GC content and transcription. I discuss how GC content and transcription affect nucleosome positioning and occupancy and draw the correlation between nucleosome patterns

18 and gene functional annotations (Chapter 2.). Further, I show how environmental perturbation

(via Pi starvation) changes nucleosome patterns and transcription, and the correlation between nucleosome dynamics and transcriptional changes (Chapter 3.).

19

CHAPTER 2. NUCLEOSOME POSITIONING AND OCCUPANCY IN RICE1

Introduction

There are two major types of determinants of nucleosome occupancy identified from a

variety of eukaryotic organisms: DNA sequence features, considered cis-determinants, such as

GC content, and trans-determinants, including transcription factors and chromatin remodelers that modulate nucleosome placement, histone variant deposition, and histone post-translational modifications (Lee et al., 2007; Mavrich et al., 2008a; Tillo and Hughes, 2009; Valouev et al.,

2011; Struhl and Segal, 2013; Li et al., 2014; Wu et al., 2014; Liu et al., 2015). Previous studies have also demonstrated the presence of well-positioned nucleosomes adjacent to transcriptional start sites (TSS) of eukaryotic genes (Lee et al., 2007; Mavrich et al., 2008b; Valouev et al., 2011;

Li et al., 2014; Wu et al., 2014; Liu et al., 2015; Zhang et al., 2015). In contrast, much less is understood regarding the dynamics of nucleosome patterns (i.e. positioning and occupancy) and their connections to transcriptional regulation.

Studies of nucleosome occupancy and positioning in plants are limited compared to those in model animal species. Recent genome-wide studies in Arabidopsis (Chodavarapu et al., 2010;

Li et al., 2014) and rice (Wu et al., 2014; Zhang et al., 2015) have shown nucleosome patterns in genic regions that are generally similar to other eukaryotes. Also similar to other species is that transcription is an important trans-determinant of nucleosome occupancy in plants (Fincher et al.,

2013; Li et al., 2014; Liu et al., 2015; Zhang et al., 2015). However, the particular determinants of nucleosome occupancy in plants remain unclear, and how they compare to other eukaryotes has not been investigated. Here, we report the genome-wide nucleosome positioning and

Portions of this chapter previously appeared as Zhang, Q., Oh, D.H., DiTusa, S.F., RamanaRao, M.V., Baisakh, N., Dassanayake, M., and Smith, A.P. (2018). Rice nucleosome patterns undergo remodeling coincident with stress-induced gene expression. BMC Genomics 19, 97. 20

occupancy in rice via deep sequencing of micrococcal nuclease digested chromatin (MNase-seq).

We reveal the relationships among nucleosome patterns, gene structure, gene functions and transcriptional activities in rice.

Materials and Methods

Plant material and growth conditions

Rice (Oryza sativa ssp. japonica cv. Nipponbare) seeds were surface-sterilized with 5% bleach and rinsed, and soaked in sterile distilled water at 37 °C in the dark for 3 days for pre- germination. Seeds were allowed to germinate at 22 °C under a 16 hr/8 hr day/night cycle for 14 days. Rice seedlings were transferred to half-strength Hoagland’s nutrient media (2.5 mM KNO3,

1 mM KH2PO4, 3.5 mM Ca(NO3)2·4H2O, 1 mM MgSO4·7H2O, 5 μM MnCl2·4H2O, 0.07 mM

NaMoO4, 0.02 mM H3BO3, 0.3 μM ZnSO4·7H2O, 0.2 μM CuSO4·5H2O, 0.014 mM Fe(EDTA))

for 21 days in a growth room under non-sterile conditions. For nutrient treatment, seedlings

were transferred to fresh half-strength Hoagland’s or the same media lacking phosphorus

(KH2PO4) for 24 hours. The hydroponic experiments were performed under 16 hr/8 hr day/night

cycles and temperature was kept at 22 °C in a growth room. The pH of the solution was adjusted

to 5.5 and the solution was renewed every 7 days. Shoots (green tissues) and roots from the

seedlings were flash frozen in liquid nitrogen, and stored at −80 °C.

MNase-seq

The EZ Nucleosomal DNA Prep Kit (Zymo Research, Irvine, CA) was used for nuclei

isolation and mono-nucleosomal DNA preparation from plant tissues. The nuclei isolation and

MNase treatment were performed according to the manual with modifications. Briefly, tissues

were ground in liquid nitrogen and re-suspended in HBM buffer (25mM Tris pH 7.6, 0.44M

sucrose, 10mM MgCl2, 0.1% Triton X, 2M spermidine, and 10mM β-mercaptoethanol). The

21

mixture was filtered with miracloth and centrifuged for 60 min at 2,000× g. Isolated nuclei were

washed with Nuclei Prep Buffer (Zymo Research). The prepared nuclei were treated with MN

Digestion buffer (Zymo research). The nuclei were digested with 1 U (final concentration of

0.004 U/μl) micrococcal nuclease (MNase, Zymo Research) for 30 min at room temperature.

The digestion was stopped by addition of 5× MN Stop Buffer (Zymo Research). Nucleosomal

DNA was purified by addition of DNA Binding Buffer (Zymo Research), and centrifuged with a

Zymo-spin IIC Column in a Collection Tube (Zymo Research). DNA was washed with DNA

Wash Buffer and eluted with warm DNA Elution Buffer (Zymo Resarch). Purified DNA was run on a 2% agarose gel containing ethidium bromide and visualized under UV light. The mono- nucleosomal DNA (~150 bp band) was excised from the gel and purified with a gel purification kit (Qiagen, Hilden, Germany).

Approximately 500 ng of MNase-digested mono-nucleosomal DNA from each sample was used for Illumina library generation. Library construction and deep sequencing were performed by the Roy J. Carver Biotechnology Center at the University of Illinois at Urbana-

Champaign using an Illumina HiSeq 2000 platform (Illumina, San Diego, CA). Raw data comprised 100 bp of single-ended reads. Illumina sequencing reads were mapped to the rice genome (MSU7) (Kawahara et al., 2013) using Bowtie (version 1.1.2) (Langmead et al., 2009) and only uniquely mapped reads were considered for further analysis. Approximately 63 million reads per sample (~15× coverage) were obtained. Correlations among mapped sequencing samples were analyzed using DeepTools (Ramírez et al., 2016). Mapped reads were subject to

GC content bias correction as previously described (Benjamini and Speed, 2012; Ramírez et al.,

2016). Nucleosome positions were identified and analyzed using the dpos function of DANPOS software with default settings for two sets of biological replicates (control and –Pi) (Chen et al.,

22

2013). An FDR of 0.05 was used for dynamic nucleosome calling, except for position shift

where a range setting of 50-95 bp was used. Genome-wide nucleosome patterns were plotted

using DANPOS (Chen et al., 2013) or ngs.plot (Shen et al., 2014). Nucleosome occupancy were

determined as averaged reads per million mapped reads. IGV was used to visualize mapped

reads to the reference genome (Robinson et al., 2011). The genome annotation was obtained

from MSU Rice Genome Annotation Project website (http://rice.plantbiology.msu.edu/) and

parsing of dynamic nucleosome locations were performed using BedTools (version 2.26.0)

(Quinlan and Hall, 2010).

RNA-seq

The total RNA from plant tissues was extracted using an RNeasy Plant Mini Kit (Qiagen)

according to the manufacturer’s instructions. On-column DNase digestion was performed on the

extracted total RNA with an RNase-free DNase set (Qiagen) according to manufacturer’s

instructions to reduce DNA contamination. Approximately 2 μg of DNase-treated total RNA from each sample was used for sequencing library construction. For sequencing library preparation, ribosomal RNA was removed with Ribo-ZeroTM rRNA Removal Kit (Plant) and the

remaining RNA was processed with the TruSeq Stranded mRNA library construction kit

(Illumina) starting at the fragment/elute step (no mRNA selection). Library construction and

deep sequencing were performed by the Roy J. Carver Biotechnology Center at the University of

Illinois at Urbana-Champaign using an Illumina HiSeq 2500 platform (Illumina). Raw data

comprised 100 bp of single-ended reads.

Ribosomal RNA reads were further removed by mapping sequencing reads to all known

rice ribosomal DNA sequences obtained from the Oryza repeat database

(http://plantrepeats.plantbiology.msu.edu/index.html) using Bowtie (Langmead et al., 2009) with

23

default settings, and reads that failed to align were kept. The remaining reads were then mapped

to the rice genome (MSU7) (Kawahara et al., 2013) using TopHat2 (Kim et al., 2013) with the following settings: --b2-sensitive -g 1, allowing only one hit for each read. Approximately 97 million reads per sample were obtained. Correlations among mapped sequencing samples were analyzed using DeepTools (Ramírez et al., 2016). Cufflinks (Trapnell et al., 2012) was used to determine transcript abundance (FPKM) of each gene from two control replicates.

Results

Nucleosome patterns of different types of rice genes

To investigate the genome-wide nucleosome patterns in rice, I isolated rice nuclei from shoots (green tissue of the plant) and performed micrococcal nuclease (MNase) digestion.

MNase-digested chromatin was run on a 2% agarose gel and the size of around 150 bp was excised and subject to purification. The size-selected, gel-purified nucleosomal DNA was sent for sequencing (MNase-seq) (Figure. 2.1).

Figure 2.1. Analysis of nucleosome patterns via MNase-seq. (A) Nuclei from 5-week-old rice shoots were treated with MNase and the boxed region was purified for sequencing. (B) An illustration of MNase-seq. Linker DNA were digested by MNase, but nucleosomal DNA were protected by histones and thus represents the location of nucleosomes in the genome.

24

We generated a total of four MNase-seq libraries including two biological replicates from

shoots of plants grown in full nutrient hydroponic solution for 5 weeks with an additional 24- hour of growth in either full nutrient hydroponic solution (control replicate 1 and 2, C1 and C2) or nutrient solution lacking phosphate (−Pi replicate 1 and 2, P1 and P2). Single-ended sequencing reads were mapped to Michigan State University rice genome annotation release 7

(MSU7) using Bowtie to allow unique mapping of each read (Langmead et al., 2009; Kawahara et al., 2013) (Table 2.1).

Table 2.1. Summary of MNase-seq data. Coverage = (number of mapped reads × read length) / total length of annotated rice genome (373,245,519) (Kawahara et al., 2013). Sample name Read length (bp) Number of mapped reads Coverage C1 100 50,609,088 13.6× C2 100 69,001,067 18.5× P1 100 55,539,463 14.9× P2 100 79,571,427 21.3×

To examine the reproducibility of mapped reads, Spearman’s correlation coefficient

(SCC) of mapped read abundance at each genomic position was calculated between C1 and C2

(SCC=0.95) and P1 and P2 (SCC=0.96) (Figure 2.2A). Considering the high reproducibility of

mapped reads, we generated averaged profiles from two replicates for the ease of data

presentation but kept replicates separated for data analysis. To summarize nucleosome patterns

in rice genes, nuclear chromosomal genes were extracted from annotation records, and the

representative model (the most abundant splice variant) of the gene was kept, and this left a total

number of 55,801 records. Each annotated record has the information for whether it is

‘expressed’ and if it is ‘TE (transposable element)’ according to the annotation (Kawahara et al.,

2013). With this information, I further divided all records into four categories: protein-coding genes (PCG, n=36,378), transposable elements (TE, n=4,101), transposable element-related genes (TEG, n=12,836), and pseudogenes (PSG, n=2,486) (Table 2.2).

25

Figure 2.2. Reproducibility of MNase-seq and RNA-seq. C1, control replicate 1; C2, control replicate 2; P1, −Pi replicate 1, P2, −Pi replicate 2. (A) Clustered heatmap of mapped MNase- seq samples with Spearman correlation coefficient (ρ). The distances among sample pairs are determined as 1-ρ. (B) Clustered heatmap of mapped rmRNA-seq samples with Pearson correlation coefficient (r). The distances among sample pairs are determined as 1-r. Table 2.2. Classification of annotated rice genes. 55,801 representative models of rice genes were further divided into four categories: PCG, TE, TEG and PSG. TE (+) TE (−) Expressed (+) TEG (n=12,836) PCG (n=36,378) Expressed (−) TE (n=4,101) PSG (n=2,486)

Averaged profiles for each category of gene were generated across the transcription start

site or the 5’ boundary of the element (TSS), transcription termination site or the 3’ boundary of

the element (TTS), and the gene body (GB, from TSS to TTS) using averaged control samples

(Figure 2.3). For PCG, we observed several distinct nucleosome pattern features: First, evenly-

spaced nucleosome arrays were observed downstream of the transcription start site (TSS; Figure

2.3A); Second, a nucleosome-depleted region (NDR) was found immediately upstream of the

TSS (Figure 2.3A and C); Third, higher nucleosome occupancy was found immediately

26

downstream of the TSS and upstream of the transcription termination site (TTS) as compared to

the remainder of the gene body (GB; Figure 2.3).

Figure 2.3. Nucleosome patterns in rice genes. Regions 1,000 bp upstream and downstream of the transcription start site (TSS, or 5’ boundary of the element), transcription termination site (TTS or 3’ boundary of the element), and gene body (GB, from TSS (5’ boundary) to TTS (3’ boundary) of a gene) were used to plot MNase-seq density under control conditions. Nucleosome occupancy (y-axes) were determined as averaged reads per million mapped reads. (A) MNase-seq density for all rice genes across the transcription start site (TSS) under control conditions. PCG, protein-coding genes; TEG, transposable element-related genes; TE, transposable elements; PSG, pseudogenes, annotated genes that are neither expressed nor transposable elements. (B) Same analysis as (A) at the transcription termination site (TTS). (C) Same analysis as (A) at gene body (GB). In contrast to PCG, neither evenly-spaced nucleosome arrays downstream of the TSS nor

NDRs upstream of the TSS were found in the remaining three gene categories, indicating these features are signatures of rice genes poised for transcription (Figure 2.3). In addition, nucleosome occupancy across genes were higher in PCG and PSG than in TE or TEG, indicating

27 repetitive elements are relatively depleted of nucleosomes, or could be due to the discarding of reads mapped to multiple genomic locations during the mapping procedure.

We focused on PCG throughout this study to understand the functional impact of nucleosome patterns in the genome. While investigating individual genes, we noticed some genomic loci had exceptionally high nucleosome occupancy which could cause difficulties in data interpretation, and we found that those loci were annotated with nuclear insertions of organellar DNA (Figure 2.4). We speculated that even with careful handling, there were significant amounts of chloroplast DNA included in our nucleosomal DNA samples. Hence, I searched for genes with annotated organellar insertion sites and found 278 such genes, and those genes were excluded, leaving a total number of 36,100 PCG for downstream analysis.

Figure 2.4. An example of organellar insertions in genes. IGV visualization of mapped read distribution on chromosome 12. The box in the top panel marks a window with high read abundance. Zooming in on the window reveals five protein-coding genes with organellar insertion sites.

28

Dissection of nucleosome patterns upstream of the transcription start sites

The nucleosome pattern for rice PCG that we observed is similar to findings from other studies, but highlights an apparent distinction among species—for example, evenly-spaced nucleosome arrays were found upstream of the TSS in genome-wide studies of human (Valouev et al., 2011) and yeast (Lee et al., 2007), but not in rice (Wu et al., 2014; Zhang et al., 2015 and this study), Arabidopsis (Li et al., 2014; Liu et al., 2015), or Tetrahymena thermophila (Xiong et al., 2016). This may result from differences in either MNase-digestion strength (Vera et al.,

2014), lengths of NDR regions and associated DNase I hypersensitivity sites (Wu et al., 2014), or sample complexity due to combined cell types. We hypothesized that diverse nucleosome patterns from subsets of genes were masking defined nucleosome arrays upstream of the TSS when combined in the single PCG nucleosome profile (Figure 2.3A). Therefore, we analyzed nucleosome occupancy and positioning of all PCG separately to search for evenly-spaced nucleosome arrays. Interestingly, we identified two major subsets of PCG that each had nucleosome arrays not only downstream of the TSS but also upstream (Figure 2.5A).

These gene subsets had a well-defined −1 nucleosome (the first nucleosome upstream of the TSS) at either −140 bp (n=7,270) or −250 bp (n=7,170) relative to the TSS, with phased nucleosome arrays further upstream. The nucleosome arrays of these two groups of genes are out of phase with each other and thus mimic “destructive wave interference” when combined into a profile of all genes, masking the nucleosome peaks in the region upstream of the TSS

(Figure 2.3). Gene ontology (GO) term enrichment analysis (Du et al., 2010) of the two subsets of genes did not yield any significantly enriched GO terms (false discovery rate (FDR) <0.05).

This indicated no obvious functional link among the genes in each group, but rather the contribution of other determinants to the distinct nucleosome patterns.

29

Figure 2.5. Genes with nucleosome arrays upstream of the transcription start sites. (A) MNase- seq density across the TSS of genes found to have -1 nucleosome at either −140 bp (7,270 genes) or −250 bp (7,170 genes) relative to the TSS. (B) Same analysis as (A) with subsets of genes filtered for ‘ideal’ genes at the TSS. (C) Same analysis as (A) with subsets of genes filtered for ‘non-ideal’ genes at the TSS. To test whether these phased nucleosome arrays upstream of the TSS were caused by

nearby genes, I further divided the gene subsets into two categories: ‘non-ideal’ at the TSS (i.e.

genes with one or more TSS or TTS within the ± 1,000 bp TSS window other than itself), and

‘ideal’ at the TSS (the complement to ‘non-ideal’). There was no major difference in nucleosome arrays among subsets filtered for ‘ideal’ genes, subsets filtered for ‘non-ideal’ genes and the original subsets, indicating that the phased nucleosome arrays upstream of the TSS were free of the influence from nearby genes (Figure 2.5B and C).

30

Nucleosome patterns are associated with GC content of genes

To search for factors that contribute to nucleosome profiles of rice genes, we investigated

GC content and transcription, which are known cis- and trans-determinants of nucleosome occupancy. It is widely reported that GC- and AT-rich sequences favor and disfavor nucleosome formation, respectively (Mavrich et al., 2008a; Mavrich et al., 2008b; Tillo and Hughes, 2009;

Valouev et al., 2011; Zhang et al., 2015; Xiong et al., 2016). However, a negative correlation between GC content and nucleosome occupancy was shown in Arabidopsis and rice when examining random fragments of genomic DNA (Liu et al., 2015), and few studies have examined the correlation between GC content and nucleosome occupancy across the TSS. To investigate

GC content as a potential determinant of nucleosome occupancy in rice, we first plotted the GC content distribution of all PCG across the TSS, which broadly showed a pattern similar to nucleosome occupancy with a narrow trough immediately upstream of the TSS and a wide peak downstream of the TSS (Figure 2.6A and C). Secondly, we plotted the GC content distribution of the −140 and −250 gene subsets separately and observed opposing oscillations of GC content upstream of the TSS, such that the GC content correlated with the position of the -1 nucleosome

(the first nucleosome peak upstream of the TSS) of each subset (−140 and −250 relative to the

TSS; Figure 2.6A and C). Both of the above analyses showed a positive correlation between GC content and nucleosome occupancy across the TSS.

Next we sorted all PCG based on their GC content across the TSS, divided them into five quintiles (1st with the lowest GC content and 5th with the highest), and plotted the corresponding

MNase-seq densities (Figure 2.7A). Interestingly, this analysis revealed a negative correlation

between GC content and nucleosome occupancy across the TSS. To uncouple the contribution

31 of GC content from the TSS upstream and downstream regions, we generated another two sets of quintiles sorted based on their GC content either 1,000 bp upstream or downstream of the TSS.

Figure 2.6. GC content distributions across genes. GC content of PCG and two subsets of PCG (n=7,270 and n=7,170) at the TSS (A), TTS (B) and the gene body (C). Both analyses yielded a similar negative correlation: genes with high GC content either upstream or downstream of the TSS had lower nucleosome occupancy in the corresponding region (Figure 2.7B and C). In addition, gene quintiles with an average of 48% (2nd quintile) to

53% (3rd quintile) GC content downstream of the TSS have the best nucleosome phasing (Figure

2D). This may reflect stronger periodicity of SS (G/C) and WW (A/T) dinucleotides within the region, which would favor well-positioned nucleosomes (Zhang et al., 2015) and would yield roughly equal GC and AT content overall. Together these results reveal that the correlation between GC content and nucleosome occupancy in rice is complex, possibly due to GC content acting as a cis-determinant, and also reflecting the involvement of other determinants, such as gene expression.

32

Figure 2.7. Correlations between GC content and nucleosome patterns. (A) MNase-seq density of PCG grouped by their GC content across the TSS (± 1,000 bp TSS), 1st lowest, 5th highest. (B) MNase-seq density of PCG grouped by their GC content at a window of 1,000 bp upstream of the TSS, 1st lowest, 5th highest. (C) MNase-seq density of PCG grouped by their GC content at a window of 1,000 bp downstream of the TSS, 1st lowest, 5th highest. Nucleosome patterns are associated with gene expression levels

To elucidate the relationship between nucleosome patterns and gene expression in rice we

carried out RNA sequencing (RNA-seq) analysis of the same tissues used for the MNase-seq

experiments. Four RNA-seq libraries were generated consisting of two biological replicates for

control (C1 and C2) and −Pi (P1 and P2). We obtained on average 97 million uniquely mapped

reads for each library (Table 2.3). To assess the reproducibility of mapped RNA-seq reads, we calculated Pearson’s correlation coefficient (PCC) of sequencing read abundance at each genomic position between C1 and C2 (PCC=0.91) and P1 and P2 (PCC=0.97) (Figure 2.2B).

33

Biological replicates from RNA sequencing were kept separated for data analysis. Using the control samples, PCG were separated into five groups according to their expression levels (1st quintile highest and 5th quintile lowest) as determined by their FPKM values (Trapnell et al.,

2012). The MNase-seq densities of genes grouped by their expression were plotted at the window of ± 1,000 bp TSS (Figure 2.8A and B), ± 1,000 bp TTS (Figure 2.8C) and at the gene body (Figure 2.8D). We found that highly expressed genes had wider NDRs upstream of the

TSS and had relatively lower nucleosome occupancy across the TSS than lower expressed genes.

Moreover, evenly-spaced nucleosome arrays were more evident in highly expressed genes.

These observations are consistent with previous studies on rice and Arabidopsis (Li et al., 2014;

Wu et al., 2014; Liu et al., 2015), and demonstrate that transcription is a strong determinant of nucleosome patterning in rice.

Table 2.3. Summary of RNA-seq data. Sample name Input reads Mapped reads Mapping rate C1 109,975,468 100,928,054 91.8% C2 101,746,188 95,469,826 93.8% P1 104,338,118 96,639,613 92.6% P2 101,964,123 94,335,655 92.5%

We observed a general negative correlation between nucleosome occupancy and gene expression, especially across the TSS of the genes, and at the NDR (Figure 2.8A and B).

However, it is not clear to what extent nucleosome occupancy correlates with gene expression quantitatively. To address this, I tried to build a regression model by correlating nucleosome occupancy across the TSS and TTS of genes and gene expression levels. Previous studies have reported a non-linear relationship between nucleosome occupancy and gene expression, and different strategies have been used to normalize the RNA-seq FPKM values to fit in the linear

34

regression model: log-transformation in Arabidopsis (Liu et al., 2015) and percentile-

transformation in human (Ulz et al., 2016).

Figure 2.8. Correlations between nucleosome patterns and gene expression. (A) Heatmap of MNase-seq density of PCG sorted by their expression level under control conditions from RNA- seq analysis of the same tissue (1st highest, 5th lowest). The vertical line in the middle of the heatmap indicates the TSS. (B) Average plot of the same data with genes grouped by their expression levels. (C) Same analysis as (B) at the TTS. (D) Same analysis at the gene body. To determine which strategy works best for my datasets, I plotted the regression

coefficient between nucleosome occupancy and gene expression surrounding the NDR using

both strategies and the un-transformed data set as control (‘linear’), which is the position where I

expected to observe the most significant correlation (Figure 2.8B). Again, to reduce the impact

of nearby genes on the modeling, ‘ideal’ genes were used across the TSS and the TTS. I

observed that the percentile-transformation worked better than log-transformation with a more

significant correlation coefficient at the NDR (Figure 2.9A). Hence, I chose to percentile-

35

transform FPKM values and fed those into the linear regression model together with the

nucleosome occupancy read counts in 100-bp binned windows across the TSS and TTS (Figure

2.9B and C).

Figure 2.9. Quantitative correlations between nucleosome occupancy and gene expression. (A) Correlation coefficient between nucleosome occupancy and gene expression surrounding the NDR. RNA-seq FPKM values were either kept as is (linear), log-transformed, or percentile- transformed. (B) Correlation coefficient between gene expression and nucleosome occupancy in 100-bp bins flanking the TSS. (C) Same analysis as (B) flanking the TTS. The correlation coefficient between nucleosome occupancy and gene expression across the TSS and TTS is negative (r<0), indicating that there was a negative correlation between the two quantitatively (Figure 2.9B and C). In addition, the correlation was stronger towards the

TSS and the TTS, and the correlation at the NDR was the strongest, and the correlation at 500 bp upstream of the TTS was the weakest (Figure 2.9B and C). Surprisingly, the correlation at the strongly positioned +1 nucleosome was relatively weak, but followed by relatively strong

36 correlations downstream of the +1 nucleosome further into the gene body, indicating the stability of the +1 nucleosome during transcription (Figure 2.9B). These results suggest that nucleosome occupancy alone can explain gene expression level to some extent, yet there are other factors involved.

Nucleosome patterns are associated with alternative splicing and gene length

We next separated all PCGs into four groups based on their number of splice variants and plotted their MNase-seq density across the TSS and TTS. Clearly, genes with only one splice variant had higher nucleosome occupancy across the TSS except at the +1 nucleosome (the first nucleosome downstream of the TSS) and had weaker nucleosome phasing downstream of the

TSS compared with genes that have more splice variants (Figure 2.10A), but no particular correlation was observed at the TTS (Figure 2.10B). Studies have shown that there is a correlation between gene length and gene expression (Yang, 2009; Grishkevich and Yanai, 2014), and since we observed a correlation between nucleosome patterns and gene expression, we next investigated the potential correlation between nucleosome patterns and gene length.

All PCGs were separated based on their length (from TSS to TTS, 1st longest, 5th shortest) and MNase-seq density was plotted ± 1,000 bp TSS. Longer genes had lower nucleosome occupancy across the TSS than shorter genes (Figure 2.10C), however we observed the opposite pattern at the TTS where longer genes had higher nucleosome occupancy (Figure 2.10D).

Increasing gene length is associated with higher nucleosome occupancy across the TSS and less evident 5’ NDR and the +1 nucleosome, which are the signatures of highly expressed genes.

Indeed, we found a positive correlation between gene length and gene expression in rice

(SCC=0.23).

37

Nucleosome patterns are linked to gene functions

Experiments described above indicate a contribution of both cis- and trans-determinants to nucleosome patterning in rice. To further explore distinct nucleosome patterns across the TSS and their possible correlation with gene function, we performed k-means clustering of MNase- seq profiles ± 500 bp TSS of PCG.

Figure 2.10. Correlations between nucleosome patterns and alternative splicing and gene length. (A) MNase-seq density at the TSS of PCG grouped by their number of splice variants, Group_1: 1 variant, Gourp_2: 2 variants, Group_3: 3 variants, Group_4: ≥4 variants. (B) Same analysis as (A) at the TTS. (C) MNase-seq density of PCG grouped by their length (from the TSS to the TTS, 1st longest, 5th shortest). (D) Same analysis as (C) at the TTS. Six clusters (A through F) of genes with distinct nucleosome patterns across the TSS were evident (Figure 2.11A). GO term enrichment analysis (Du et al., 2010) revealed that each of the six clusters had enriched GO terms. Interestingly, the clusters fell into two contrasting groups based on shared similar GO terms. Genes of clusters A, B, and C (type I, n=15,400) were

38

enriched in GO terms related to key biological processes, whereas clusters D, E, and F (type II,

n=20,700) were enriched in stress-related GO terms (Figure 2.12). Hence, we termed type I

genes “housekeeping” and type II genes “stress-responsive”. Consistent with a housekeeping

role, type I genes had on average significantly higher expression levels than type II genes under

control conditions (type I average FPKM=22.47, type II average FPKM=12.71, p<2.2×10-16,

Mann-Whitney U test). In addition, comparison of our type I housekeeping gene group with a housekeeping gene list identified by a previous study (Chandran et al., 2016) based on consistent expression in multiple cell or tissue types, found that approximately 80% (3,279/4,243) of these previously reported housekeeping genes were included in our type I gene group.

Because studies have shown that the TATA box in the promoter region is associated with stress-responsive genes in yeast (Basehoar et al., 2004), we tested whether our type II stress- responsive gene group was enriched with genes containing a TATA box. We searched promoters of rice genes for the TATA consensus sequence (CTATAWAWA) previously reported (Civan and Svec, 2009). Indeed, we found that type II genes were more likely to contain a TATA box within 50 bp upstream of the TSS than type I genes (2.6 fold, p=6.90×10-13,

Fisher’s exact test). Moreover, an average profile on type II genes across the TSS showed an evident −1 nucleosome where an NDR was found in type I genes (Figure 2.11B), and genes with a TATA box within 50 bp upstream of the TSS (n=1,093) had a −1 nucleosome at the same region (Figure 2.11C). Together these results indicate that nucleosome patterns across the TSS are suggestive of gene function.

Discussion

We found higher nucleosome occupancy surrounding the TSS and the TTS compared to the gene body suggesting the role of chromatin organization in defining the initiation and

39

termination of transcription (Figure 2.3). The presence of evenly-spaced nucleosome arrays

downstream of the TSS and NDRs immediately upstream of the TSS at PCG further indicates the

impact of chromatin organization on transcriptional activities (Figure 2.3A).

Figure 2.11. Clustering of genes based on nucleosome patterns across the TSS. (A) k-means clustering of nucleosome patterns ± 500 bp TSS of rice genes under control conditions. Right, individual heatmap profiles for gene clusters A−F at the TSS. (B) Average profiles for gene clusters ABC (type I gene) and DEF (type II gene) at the TSS. (C) MNase-seq density across the TSS of genes that contain TATA box under control (Ctrl) and −Pi conditions (n=1,093). The nucleosome patterns we found in rice genes were largely consistent with patterns in yeast and human (Lee et al., 2007; Valouev et al., 2011), but as with Arabidopsis, rice genes

lacked evenly-spaced nucleosome arrays upstream of the TSS (Figure 2.3A). Studies on yeast

and human used single cell types but studies on Arabidopsis and rice, including this study, used

homogenized plant tissues consisting of multiple cell types that could contribute to the

40

heterogeneity of nucleosome patterns in the promoter region due to tissue-specific expression

differences.

Figure 2.12. Significantly enriched GO terms for clusters ABC (type I gene) and DEF (type II gene). The color of the node represents the corrected p-value with a color scale ranging from yellow (corrected p-value=0.05) to dark orange (corrected p-value= 5×10-7). However, a genome-wide study on the single-celled protozoan Tetrahymena thermophila

also showed nucleosome phasing downstream but not upstream of the TSS (Xiong et al., 2016).

This result is possibly due to cell-to-cell differences within the same T. thermophilia culture,

since studies on single-cell nucleosome mapping in yeast showed that different cells in the same

41 yeast culture possessed different nucleosome patterns in the promoter region (Small et al., 2014).

Another explanation is that that since the plant and T. thermophila genomes contain larger numbers of genes compared to yeast and human, they may contain greater variability in nucleosome patterns at the promoter region which interferes with the detection of nucleosome arrays. The variable length of DHS was shown to mask the detection of nucleosome arrays upstream of the TSS in rice (Wu et al., 2014). We identified two subsets of genes that had evenly-spaced nucleosome arrays upstream of the TSS and the nucleosome arrays were canceled out while an average profile was plotted suggesting nucleosome patterns at individual genes may differ from stereotypical averaged profiles (Figure 2.5). Nonetheless, the arrays downstream of the TSS were “in phase” in both subsets of genes indicating the role of coding sequence characteristics and transcription activities in establishing nucleosome arrays.

Positions of nucleosomes in the genome are not random but rather are controlled by the combination of both cis- and trans-determinants. We observed lower nucleosome occupancy at relatively highly expressed genes, and more highly expressed genes had larger distance between the TSS and the +1 nucleosome (the first nucleosome peak downstream of the TSS) with a wider

5’ NDR (Figure 2.8A and B). These observations agree with previous findings in Arabidopsis and rice, reflecting the correlation between open chromatin architecture and transcription (Li et al., 2014; Wu et al., 2014; Liu et al., 2015). The positive correlation between gene length and gene expression we observed in rice agrees with the findings in plants but contradicts the findings in animals that highly expressed genes have shorter primary transcripts (Ren et al., 2006) reflecting different evolutionary divergence paths in plants and animals. It has been widely reported that GC- and AT-rich sequences favor and disfavor nucleosome formation, respectively

(Tillo and Hughes, 2009). Indeed, enriched SS (G/C) dinucleotides in the cores of well-

42

positioned nucleosomes and enriched WW (A/T) dinucleotides in nucleosome flanking

sequences has been observed for yeast (Mavrich et al., 2008a), Drosophila (Mavrich et al.,

2008b), human (Valouev et al., 2011), rice (Zhang et al., 2015), and T. thermophila (Xiong et al.,

2016). However, seemingly contradictory correlations among GC content and nucleosome

occupancy have been observed. For example, the GC content of randomly selected genomic

fragments of yeast and human are positively correlated with nucleosome occupancy, whereas a

negative correlation is observed for rice and Arabidopsis (Wang et al., 2012; Liu et al., 2015).

Also, GC content has a negative correlation with nucleosome occupancy at predicted

transcription factor binding sites in human and rice (Wang et al., 2012; Liu et al., 2015). Herein,

we compared the relationship between nucleosome occupancy and GC content in several ways.

First we mapped GC content distribution across the TSS, which correlated with nucleosome

occupancy on a “macro” level: relatively low levels upstream of the TSS and higher levels

downstream (Figure 2.6). By comparing the -140 and -250 gene subsets, we also showed a

positive correlation on a “micro” level: peaks of GC% overlapped with the positions of the

corresponding -1 nucleosome peaks (Figure 2.6A). These observations are consistent with GC-

rich sequences favoring nucleosome occupancy. However, grouping PCG into GC quintiles showed an obvious negative correlation across the TSS, particularly in the downstream region

(Figure 2.7C). Together these results support the hypothesis that GC content intrinsically influences nucleosome occupancy, but that other determinants including transcription contribute to nucleosome occupancy. We also observed that genes with on average 48% to 53% of GC content downstream of the TSS have better nucleosome phasing, suggesting the influence of GC content on nucleosome phasing downstream of the TSS (Figure 2.7C). Future studies on DNA sequence arrangement at positioned nucleosomes within different regions of rice genes may

43

reveal how DNA sequence (content and arrangement) affects genome-wide nucleosome

positioning.

Distinct nucleosome patterns across the TSS of rice genes make it possible to categorize

genes based on their nucleosome patterns across the TSS. We show two groups of rice genes

(type I and type II) clustered by their distinct nucleosome patterns across the TSS with distinct

gene functions (Figure 2.11). Type I genes have wide 5’ NDRs correlating with relatively high

transcription rates as housekeeping genes, whereas type II stress-responsive genes have

nucleosomes positioned on either side of the TSS which may create obstacles for transcription machinery as well as serving as the landmarks for TF and chromatin remodelers to recognize under induced conditions. Nucleosome patterns show strong correlations with the above- discussed gene characteristics, suggesting the possibility of inferring such characteristics (e.g. expression and function) from the associated nucleosome patterns. Indeed, we demonstrated that nucleosome occupancy alone can explain transcription activities to some extent in rice (Figure

2.9), and recent studies have shown promising results on predicting tumor gene expression and tissue of origin from nucleosome patterns of cell-free DNA in human plasma (Snyder et al., 2016;

Ulz et al., 2016). Together, our findings support a conserved correlation between nucleosome patterns and gene characteristics in eukaryotes, which could benefit the agricultural and medical communities in the near future.

44

CHAPTER 3. THE IMPACT OF PHOSPHATE STARVATION ON NUCLEOSOME PATTERNS AND GENE EXPRESSION IN RICE2

Introduction

In budding yeast (Saccharomyces cerevisiae), a combination of transcriptional regulators

and downstream targets comprising the PHO regulon modulate adaptive responses to deficiency

of inorganic phosphate (Pi), a primary source of P (Secco et al., 2012). Early studies examining

the role of chromatin structure in transcriptional regulation showed that nucleosome remodeling

also plays a role in modulating PHO regulon genes. Specifically, nucleosomes are evicted to

expose the promoter region of the yeast PHO5 secreted acid phosphatase gene in response to

low-Pi conditions (Almer et al., 1986; Barbaric et al., 2007). More recent genome-wide studies

on the remodeling of primary chromatin structure in response to environmental perturbation in

yeast have shown a connection between global nucleosome dynamics and transcription activities

(Shivaswamy et al., 2008; Huebert et al., 2012).

As sessile organisms, plants constantly encounter environmental challenges and must shape themselves for adaptation. Many studies have been carried out to investigate Pi starvation responses (PSRs) in plants. These studies have identified morphological and physiological

responses aimed at enhancing Pi acquisition and recycling, as well as key regulators of these

responses (Rouached et al., 2010; Hu and Chu, 2011; Wu et al., 2013; Wang et al., 2014).

Transcript profiling studies have shown that transcriptional regulation plays an important role in

modulating PSRs (Wu et al., 2003; Zheng et al., 2009; Secco et al., 2013; Secco et al., 2015), and emerging data from our laboratory and others are indicating that chromatin-level mechanisms are

also involved in regulating PSRs (Smith et al., 2010; Iglesias et al., 2013; Kuo et al., 2014).

Portions of this chapter previously appeared as Zhang, Q., Oh, D.H., DiTusa, S.F., RamanaRao, M.V., Baisakh, N., Dassanayake, M., and Smith, A.P. (2018). Rice nucleosome patterns undergo remodeling coincident with stress-induced gene expression. BMC Genomics 19, 97. 45

Questions remain on how environmental perturbation affects nucleosome dynamics and the

extent to which nucleosome remodeling is linked to changes in gene expression in plants or other

systems. Here, we demonstrate the impact of Pi starvation on nucleosome patterns in rice and

examined transcription activities by deep sequencing of ribosomal RNA-depleted RNA followed by quantification of intronic transcript abundance. By integrating information obtained from these assays, we reveal a significant correlation between nucleosome dynamics and changes in gene expression in response to limitation of a major essential nutrient.

Materials and Methods

Plant material and growth conditions

Plant material and growth conditions were the same as described in Chapter 2. For nutrient treatment, seedlings were transferred to fresh half-strength Hoagland’s or the same media lacking phosphorus (KH2PO4) for 24 hours.

MNase-seq

MNase-seq experiments were carried out the as described in Chapter 2.

RNA-seq

The experimental procedures of RNA-seq were the same as described in Chapter 2.

Differential expression of biological replicates between control and –Pi were determined by the

iRNAseq pipeline (Madsen et al., 2015) based on the sequencing reads abundance at the introns

with default settings. Records from the output file ‘introns.txt’ of the iRNA-seq pipeline was

filtered for adjusted p-value (Padj)<0.05, and then genes with positive log2_FC were determined

as up-regulated, and genes with negative values were determined as down-regulated. GO term

enrichment analysis were performed using AgriGO with default settings (Du et al., 2010).

46

Quantification of total phosphorus (P) and inorganic phosphate (Pi)

Plant tissues were rinsed thoroughly in distilled water before analyses. Quantification of

total phosphorus was conducted by an acid digestion method as described previously (Jones,

2001). Briefly, 0.5 g of the dried leaf and root tissues were digested with 5.0 mL of concentrated

HNO3 in a heat block at 125 °C for 2.5 h followed by repeated addition of 3 mL 30% H2O2 until the digest was clear. The temperature of the heat block was reduced to 80 °C for the residue to dry. Colorless dry residue was dissolved in 20 mL deionized water and analyzed by inductively coupled plasma emission spectroscopy (ICP) using (NH4)2HPO4 as the standard in the LSU Soil

Testing & Plant Analysis Laboratory.

Quantification of inorganic phosphate (Pi) was conducted by grinding plant tissue in

liquid nitrogen and dissolving in distilled water. Pi was quantified by the molybdate assay

(Ames, 1966), and a standard curve was generated using KH2PO4.

Results

Pi starvation induces large-scale nucleosome dynamics

To assess the impact of environmental perturbation on nucleosome patterns, we

compared the nucleosome profiles identified in control shoot tissues with those in shoots

harvested from plants subjected to a 24-hour Pi starvation treatment. We measured total

phosphorus and inorganic phosphate concentrations in rice seedlings from control and –Pi

treatments prior to nucleosome and transcription profiling. Shoot P concentrations significantly

decreased (p<0.01, t-test) while the root P concentrations remained similar to that of the control

(p>0.05, t-test) after 24 hours of Pi starvation (Figure 3.1A and B). In contrast, Pi concentration

in shoots was unchanged (p>0.05, t-test) while the root Pi concentration decreased (p<0.05, t-test)

after 24 hours of Pi starvation (Figure 3.1C and D). These changes in Pi concentrations after 24

47

hours of Pi starvation agree with a previous study which reflects the initiation of Pi starvation in

rice seedlings (Secco et al., 2013).

Figure 3.1. Changes of total phosphorus and inorganic phosphate concentrations in response to phosphate starvation. All values are the mean ± standard error of the mean; n=3 biological replicates with 3 technical repeats each. DW, dry weight; FW, fresh weight. (A) Total phosphorus (P) concentrations for shoots of 5-week-old seedlings grown under full nutrient (Ctrl) and Pi-starvation (−Pi) conditions. (B) Total P concentrations for roots of 5-week-old seedlings grown under full nutrient (Ctrl) and Pi-starvation (−Pi) conditions. (C) Inorganic phosphate (Pi) concentrations for shoots of 5-week-old seedlings grown under full nutrient (Ctrl) and Pi- starvation (−Pi) conditions. (D) Pi concentrations for roots of 5-week-old seedlings grown under full nutrient (Ctrl) and Pi-starvation (−Pi) conditions. To investigate the impact of Pi starvation on genome-wide nucleosome patterns, we compared MNase-seq results from control and –Pi samples. We first examined nucleosome patterns across the TSS of genes. We found that nucleosome phasing remained largely the same between the control and –Pi samples, whereas –Pi samples had higher nucleosome occupancy

1,000 bp upstream of the TSS (p<2.2×10-16, Wilcoxon signed-rank test) but lower nucleosome

48 occupancy 1,000 bp downstream of the TSS (p<2.2×10-16, Wilcoxon signed-rank test) as compared to control samples (Figure 3.2A and C). This raised the question of whether Pi starvation decreased nucleosome occupancy in coding regions but increased nucleosome occupancy in non-coding regions. To address this, we examined nucleosome occupancy changes in exons of genes under Pi starvation. Since longer exons allow the occupancy of more nucleosomes, we separated exons according to length: 170−240 bp, 315−350 bp, 480−550 bp, and 645−715 bp, which allows for the occupancy of one, two, three or four nucleosomes, respectively (Chodavarapu et al., 2010). Both the control and –Pi samples showed strong nucleosome peaks and NDRs that marked intron-exon and exon-intron junctions (Figure 3.3).

However, nucleosome occupancy was lower in exons and greater in introns of –Pi samples relative to the control samples (Figure 3.3), further supporting a major “redistribution” of nucleosomes from coding regions to non-coding regions in response to Pi starvation. A notable exception was at the TTS of PCG, at which the –Pi samples contained greater nucleosome occupancy relative to the controls (Figure 3.2B).

Figure 3.2. Changes in nucleosome patterns in response to phosphate starvation. (A) MNase-seq density across the TSS from 24-hour control and −Pi rice shoot tissues. (B) Same analysis as (A) across the TTS. (C) Same analysis as (A) at the GB.

49

Figure 3.3. Changes in nucleosome occupancy at the exons of rice genes in response to phosphate starvation. MNase-seq density of exons grouped according to their length: (A) 170−240 bp; (B) 315−350 bp; (C) 480−550 bp (D) 645−715 bp under control and −Pi conditions. Plots are centered at the 5’ boundaries of the exons. To better illustrate nucleosome dynamics in response to Pi starvation, we employed

DANPOS, which defines accurate nucleosome maps and detects dynamic nucleosomes between samples (Chen et al., 2013). This analysis revealed a substantial impact of Pi starvation on nucleosome occupancy and positioning. Using the nucleosome profile from control samples as a baseline, DANPOS identified 313,769 dynamic nucleosomes with either a position shift (range:

50-95 bp), occupancy change (FDR<0.05), or fuzziness change (FDR<0.05) associated with Pi starvation from two biological replicates (Figure 3.4A). We analyzed the locations of the dynamic nucleosomes in the rice genome and found they were widely distributed in gene-related

50

regions (1,000 bp upstream of the TSS to 200 bp downstream of the TTS), and a large number of

dynamic nucleosomes were mapped within 200 bp upstream of the TSS, 5’-UTR and exons

(Figure 3.4B).

Figure 3.4. Nucleosome dynamics detected by DANPOS in response to Pi starvation. (A) Three types of nucleosome dynamics identified by DANPOS: position shift (range: 50-95 bp), occupancy change (FDR<0.05) and fuzziness change (FDR<0.05). Each averaged MNase-seq density graph is centered at the dyad of nucleosome that is found to house the change. (B) Number of different types of dynamic nucleosomes per kilo base (kb) per 100 nucleosomes called by DANPOS at the promoter (500−1,000 bp upstream of the TSS, 200−500 bp upstream of the TSS and 200 bp upstream of the TSS), intragenic (5’UTR, exons, introns and 3’UTR), 200 bp downstream of the TTS, and intergenic regions. Nucleosome dynamics are enriched at differentially expressed genes

The significant negative correlation between NDR nucleosome occupancy and gene expression (Figure 2.9) renders a question of whether the changes in NDR nucleosome occupancy correlate with changes in gene expression. To address this, I plotted the log2-fold

51

changes of NDR occupancy against the FPKM log2-fold changes of the corresponding gene in

response to Pi starvation (Figure 3.5).

Figure 3.5. Correlation between NDR occupancy changes and FPKM changes. Log2-fold changes in NDR occupancy (log2(NFR_RPM_FC), y-axis) were plotted against the log2-fold changes in FPKM values (log2(FPKM_FC), x-axis) of the corresponding gene in response to Pi starvation. Quadrant I: false positives; II: true positives; III: true negatives; IV: false negatives. The plots showed that there is no direct correlation between changes in NDR occupancy

and differential gene expression (r2=0.002). To evaluate to what extent changes in NDR

occupancy could explain differential gene expression, I classified genes based on their NDR

occupancy and FPKM behaviors (Figure 3.5 and Table 3.1). I found that the model had a

sensitivity value of 0.55 (II/(II+IV)) and a specificity value of 0.42 (III/(III+I)). Those results

suggest that NDR occupancy alone cannot explain differential gene expression in rice. It is

possible, however, adding one or more biological observations (e.g. nucleosome occupancy

52

across the TSS) to the model can increase the sensitivity and specificity in explaining differential

gene expression as described in humans (Ulz et al., 2016).

Table 3.1. Summary of genes with NDR occupancy changes and FPKM changes. Number of genes NDR_Occupancy_Increase NDR_Occupancy_Decrease FPKM_Increase 7,409 (false positive), I 4,969 (true positive), II FPKM_Decrease 5,280 (true negative), III 3,989 (false negative), IV

Our RNA sequencing libraries were prepared from total RNA depleted of ribosomal

RNA, and quantification and comparison of intronic reads enabled us to more accurately capture changes in transcriptional activities than steady-state transcript levels inferred from conventional

RNA-seq. We employed the iRNA-seq pipeline, which was demonstrated to perform at comparable qualities as global run-on (GRO)-seq and RNA polymerase II (RNAP II) ChIP-seq in determining genome-wide changes in transcriptional activities (Madsen et al., 2015). Using

RNA-seq libraries from two biological replicates of control and –Pi samples, the iRNA-seq pipeline identified 134 up-regulated and 691 down-regulated genes (padj<0.05) in response to

24-hour Pi starvation in rice.

To investigate the correlation between transcriptional changes and nucleosome dynamics, we searched for differentially expressed (either up- or down-regulated) genes (DEGs) in which dynamic nucleosomes were present. We found that approximately 60%, 20%, and 50% of the

DEGs were associated with nucleosome position shift, occupancy change, and fuzziness change respectively, within gene-related regions. These observed proportions were significantly higher than the same number of randomly selected genes as DEGs (p<2.2×10-16, binomial test, with

10,000 iterations, Figure 3.6A and B). Altogether, 130 out of 134 up-regulated genes and 678

out of 691 down-regulated genes were found to contain dynamic nucleosomes, suggesting a strong correlation between transcriptional changes and nucleosome dynamics in response to Pi starvation in rice.

53

Figure 3.6. Changes in nucleosome patterns correlates with changes in transcription activities in response to phosphate starvation. (A) Percentage of 134 up-regulated genes (DEG_Up) called by iRNA-seq associated with dynamic nucleosomes called by DANPOS. Randomly selected (10,000 times) genes with the same size of DEGs were used as a control, and values are mean ± standard deviation. Pos_sft_5_3: 5’-3’ position shift; Pos_sft_3_5: 3’-5’ position shift; Occu_increase: increase in nucleosome occupancy; Occu_decrease: decrease in nucleosome occupancy; Fuz_increase: increase in nucleosome fuzziness; Fuz_decrease: decrease in nucleosome fuzziness. (B) Same analysis as (A) with 691 down-regulated genes (DEG_Dn). (C) AgriGO representation of the overrepresented Gene Ontology (GO) terms in the 130 genes up- regulated by 24 hours of Pi starvation in the shoots via iRNA-seq associated with dynamic nucleosomes. Number in parenthesis represents the FDR value. (D) Same analysis as (C) with 678 down-regulated genes associated with dynamic nucleosomes. To better understand the roles of nucleosome dynamics and transcriptional changes in rice PSR, we sought functional information for the genes that exhibited changes in expression

54

and nucleosome dynamics in response to Pi starvation. We first examined whether these genes

were biased for type I (housekeeping) or type II (stress-responsive) genes. We found that both

DEGs and genes with dynamic nucleosomes had higher overlaps with type II genes than type I genes (1.9 fold, p=1.73×10-3 and 1.8 fold, p=2.19×10-3 respectively, Fisher’s exact test). Next

we carried out GO term enrichment analysis on DEGs associated with dynamic nucleosomes.

Up-regulated genes with nucleosome dynamics were involved in photosynthesis (GO:0015979,

FDR=2.09×10-4) whereas down-regulated genes containing dynamic nucleosomes were enriched

in GO terms including cell cycle (GO:0004079, FDR=8.01×10-5) and cell wall (GO:0005618,

FDR=4.51×10-8) (Figure 3.6C and D). These results support a modulation of photosynthesis and

growth in rice shoots in response to Pi starvation through changes in gene expression that are

linked to corresponding changes in nucleosome positioning and occupancy.

Nucleosome dynamics at cis-regulatory elements

In yeast, nucleosome remodeling is necessary for full induction of several yeast PHO

regulon genes, which are activated in response to Pi deficiency (Barbaric et al., 2007; Wippo et

al., 2009). For example, in the case of the yeast PHO5 acid phosphatase gene, a nucleosome

blocks a promoter binding site of the Pho4 transcriptional activator under Pi-replete conditions, repressing PHO5 transcription. Upon Pi deficiency, nucleosome remodeling exposes the cis-

element making it accessible to Pho4, which then initiates PHO5 transcription (Barbaric et al.,

2007). In rice, the OsPHR2 transcription factor induces expression of numerous Pi-related genes

in response to Pi starvation by binding to the P1BS cis-element (consensus sequence

GNATATNC) (Zhou et al., 2008).

To investigate the possible nucleosome dynamics at the P1BS element in response to 24- hours of Pi starvation, we compared nucleosome occupancy between control and –Pi samples at

55

occurrences of the P1BS motif found within 500 bp upstream of the TSS. Nucleosome

occupancy was higher in the –Pi samples near the center of the P1BS motif (± 250 bp) as compared to the control samples (p=1.58×10-9, Wilcoxon signed-rank test, Figure 3.7A). To

examine the specificity of nucleosome dynamics at the P1BS motif in response to Pi starvation,

we also compared nucleosome occupancy centered at the G-box (CACGTG) found within 500

bp upstream of the TSS, which is a known cis-element that regulates jasmonic acid (JA)- responsive gene expression (Zhang et al., 2015), and we also observed increased nucleosome occupancy at the G-box motif in the –Pi samples (p=5.80×10-10, Wilcoxon signed-rank test,

Figure 3.7B). Moreover, nucleosome occupancy centered at both elements found within 500 bp downstream of the TSS were lower in the –Pi samples than the control samples (p≤6.19×10-9,

Wilcoxon signed-rank test, Figure 3.7C and D).

These results indicate that the nucleosome dynamics we observed at the P1BS motif may not be unique to Pi starvation but rather were the results of a broad effect of higher nucleosome occupancy upstream of the TSS and lower nucleosome occupancy downstream of the TSS caused by Pi starvation (Figure 3.2A and C). We further plotted MNase-seq density of regions centered at TATA box and Y-patch (consensus sequence CYTCYYCCYC), a core promoter element in rice genes (Civan and Svec, 2009), and found all TF binding sites including P1BS were depleted of nucleosomes compared with surrounding regions while strong nucleosome peaks were found on the boundaries of binding sites regardless of Pi starvation (Figure 3.7A, E, and F).

Discussion

Our work revealed changes in nucleosome patterns and transcription activities in response to 24 hours of Pi starvation and their correlations in rice. Genome-wide studies in yeast

56

showed dynamic relationships among nucleosome patterns, gene expression and TF binding

responding to heat shock and oxidative stress (Shivaswamy et al., 2008; Huebert et al., 2012).

Figure 3.7. Changes in nucleosome occupancy at cis-elements in response to phosphate starvation. MNase-seq density of (A) P1BS within 500 bp upstream of the TSS; (B) G-box within 500 bp upstream of the TSS; (C) P1BS within 500 up downstream of the TSS; (D) G-box within 500 bp downstream of the TSS; (E) TATA box; (F) Y-patch. Plots are centered at the sequence of interest under control (Ctrl) and −Pi conditions. In Arabidopsis, 1-hour of coronatine (COR, a phytotoxin that mimics a precursor of jasmonic acid (JA) which is involved in defense-related stress responses) treatment was shown to trigger changes in transcript abundance, but differential gene expression changes were not

57

correlated with nucleosome occupancy changes in genes (Liu et al., 2015). Our data show that

24 hours of Pi starvation induces genome-wide nucleosome reorganization, especially at the regions surrounding the TSS and the coding sequence (Figures 3.2-3.4), and we found a strong correlation between nucleosome dynamics at genes and transcriptional changes (Figure 3.6A and

B).

Among the three types of nucleosome dynamics (position shift, occupancy change and fuzziness change), we found that position shift and fuzziness are more relevant to DEGs by Pi starvation than occupancy change (Figure 3.6A and B), indicating specificity of nucleosome dynamics and its association with biological functions, as demonstrated in yeast previously

(Chen et al., 2013). Histone variants H2A.Z and H3.3, and acetylated and methylated histones are usually enriched in the +1 nucleosome and contribute to the flexibility of nucleosome occupancy which assists nucleosome eviction and assembly of the pre-initiation complex (Jiang and Pugh, 2009). H2A.Z deposition at PSR genes was shown to be correlated with Pi starvation responses in Arabidopsis (Smith et al., 2010). Our observation of enriched nucleosome dynamics across the TSS during Pi starvation may reflect the replacement of canonical histones

with variants such as H2A.Z as well as post-translational modifications of histones at the TSS

(Figures 3.2A and 3.4B).

Unlike previous studies in yeast and Arabidopsis, we employed iRNA-seq on ribosome-

depleted total RNA to capture changes in nascent RNA transcript instead of quantification of

steady-state RNA transcript in response to an environmental perturbation, improving the

identification of DEGs that are specific to the treatment. With 24 hours of Pi starvation,

photosynthesis was up-regulated while cell cycle and cell wall synthesis were down-regulated

(Figure 3.6C and D). These observations are consistent with a recent study on Pi starvation in

58

the dinoflagellate Amphidinium carterae, which showed decreased cell division in response to Pi

starvation, but continued photosynthetic capability (Li et al., 2016). We propose a similar

adaptation strategy carried out by rice in response to short-to-medium term Pi starvation, which

includes enhancing photosynthesis to accumulate energy, and reducing DNA replication, cell

division, and cell wall expansion to minimize Pi usage. In Arabidopsis, the DNA element G-box

plays a role in regulating JA-responsive gene expression, and it was shown that nucleosomes

were depleted at this element regardless of COR treatment, which mimics JA responses (Liu et

al., 2015). We show similar patterns at the rice core promoter elements TATA box and Y-patch,

as well as P1BS; that nucleosomes were depleted at those elements regardless of Pi starvation

(Figure 3.7A, E, and F). This may result from constant binding of trans-acting factors to assist a

rapid transcriptional change in response to stress. We observed a significant drop in Pi

concentration after 24 hours of Pi starvation in the roots (Figure 3.1D), and future nucleosome and transcription profiling studies could include roots where Pi is sensed and acquired. We anticipate that monitoring nucleosome dynamics with transcriptional changes at multiple time points of Pi starvation with Pi re-supply in a single cell of multiple plant cell types, and identification of PSR-related chromatin remodelers could broaden the understanding on PSR in rice and eventually provide useful resources for the agricultural community.

59

CHAPTER 4. CONCLUSIONS

The packaging of eukaryotic genomes into nucleosomes has an impact on every

biological process involving DNA. Recent advances in genome sequencing technology make it

possible to map nucleosomes at a high resolution, as well as to study the correlation among DNA

sequence, nucleosome formation, chromatin remodeling and transcription. Just like beads on a

string, a strongly positioned nucleosome and other DNA-binding proteins limits the possibilities

of nucleosome positioning around them, acting as a barrier and creating decaying nucleosome

arrays from the barrier. The most obvious barrier of nucleosome positioning in the eukaryotic

genome is the nucleosome depleted region (NDR) near the transcription start site (TSS), where

nucleosome arrays are formed on either side. We found that the loss of evenly-spaced

nucleosome arrays upstream of the TSS was due to gene-to-gene differences in nucleosome

positioning, and further studies can be carried out to study tissue-specific and even cell-to-cell variances.

It is unclear that how tissue-specific expression and gene-to-gene differences could contribute to the variance of nucleosome arrays upstream of the TSS. Through tissue-specific transcript profiling, tissue-specific expressed genes can be identified and the nucleosome patterns upstream of the TSS of those genes could be determined. Gene-to-gene differences can be analyzed similarly as the analysis of cis- and trans- determinants of nucleosome patterns: First, I could analyze the exact DNA sequence composition, specifically the oscillations of WW/SS dinucleotides at the nucleosome peaks of the two groups of genes that had out-of-phase nucleosome arrays upstream of the TSS. If the out-of-phase nucleosome arrays were caused by

DNA sequence differences, I should observe the corresponding oscillations of WW/SS dinucleotides within the arrays. In addition, I could look into the DNA sequence arrangement at

60

the NDR of the two groups of genes, and I could further identify putative transcription factor

binding sites at those NDR that are responsible for the tissue-specific expression. Second, a

wider NDR at the TSS may contribute to a higher expression as discussed in Chapter 2, and since

the two groups of genes had out-of-phase nucleosome arrays upstream of the TSS, their NDR

width may suggest their expression levels. I examined the expression level of the ‘-140’ and ‘-

250’ gene groups and found that the ‘-140’ gene groups had significantly higher averaged expression level yet lower median expression level (‘-140’ group averaged FPKM=18.88 median

FPKM=2.88, ‘-250’ group averaged FPKM=15.51 median FPKM=3.26, p=0.0012, Mann-

Whitney U test). This suggests that the ‘-140’ gene group was enriched with highly expressed genes, which can also be inferred from its lower nucleosome occupancy right at the TSS (Figure

2.5). One caveat of the MNase-seq assay used in this dissertation is that I used a median to low level of MNase, which on the one hand preserved as many nucleosomes as possible but on the other hand could include DNA that were protected by other protein factors including transcription factors and RNA polymerases. This could be improved by introducing an additional ChIP step for histone H4.

Intrinsic DNA sequence has a dominant impact on nucleosome formation as not all DNA sequence is capable of sharp bending. The differences between in vivo and in vitro nucleosome

patterns suggest other factors including transcription factors, chromatin remodelers, and RNA

polymerases constantly override sequence preferences to carry out biological functions. In

addition, nucleosomal DNA methylation, histone variant deposition and histone post- translational modifications have distinct roles in affecting nucleosome stability, and those alterations are closely tied to growth, development and responses to environmental perturbations.

We demonstrate that nucleosome patterns at rice genes were affected by GC content and

61 transcription, and the nucleosome patterns at genes can be used to categorize genes. Functional gene categories are closely tied to their expression levels, and we show that the correlation between nucleosome occupancy and transcription was strongest at the NDR, providing the possibilities to predict transcript abundance from nucleosome occupancy.

Through 24 hours of Pi starvation in rice, we show numerous nucleosome dynamics across the genome, which were enhanced at differentially expressed genes. I found that there was an increase of transcription upon Pi starvation but the increase was not statistically significant (Ctrl total FPKM=487,353 -Pi total FPKM=562,311 p=0.51, Mann-Whitney U test).

Another way to investigate whether there was general genome activation or inactivation upon Pi starvation is to measure the linker DNA length. An increase in linker length would suggest genome inactivation and vice versa, as discussed in Chapter 1. I was unable to address the linker

DNA length from my MNase-seq data due to the uncertain length of each read from the sequencing library. Future research should use pair-end sequencing on MNase-seq so that the exact length of each nucleosomal DNA can be measured and a better anchoring of nucleosomal

DNA reads allows the measurements of linker DNA length as well as nucleosome phasing at individual genes. I found up-regulated genes with dynamic nucleosomes were enriched with the

GO term photosynthesis while cell cycle and cell wall were among the enriched GO terms for down-regulated genes. I examined the annotation of the up-regulated genes, and found that there were several genes involved in iron (Fe) homeostasis including LOC_Os12g01530 (ferritin-1, chloroplast precursor, putative, expressed), which is an ortholog of Arabidopsis Fer1 that possesses a P1BS binding site at its promoter region suggesting its role in Fe homeostasis under

Pi starvation conditions in both rice and Arabidopsis (Secco et al., 2013). The cross-talk between Fe and Pi signaling in rice has been reported by previous studies, and Fe was found

62 overloaded in the shoots under Pi starvation (Zheng et al., 2009; Secco et al., 2013). I observed the induction of Fe storage genes in the shoots and correspondingly I expect to observe the repression of Fe transporter genes in the roots and nucleosome dynamics with a net result of closed chromatin at the promoters of those repressed genes. Future works can also include the measurements of Fe concentration in the shoots and roots to better understand the cross-talk between Fe and Pi homeostasis in rice. For down-regulated genes, I found genes closely related to DNA replication and cell cycle, including origin recognition complex (LOC_Os06g08790,

LOC_Os10g26280), MCM DNA replication licensing factor (LOC_Os05g39850,

LOC_Os05g14590, and LOC_Os01g36390), helicase (LOC_Os04g49860), single-strand binding protein (LOC_Os03g11540), and cyclin (LOC_Os01g59120, LOC_Os04g47580, and

LOC_Os10g41430) that contributed to the GO term enrichment of cell cycle and expansin

(LOC_Os02g42650 and LOC_Os10g40700) that contributed to the GO term ‘cell wall’. Future studies can target those genes and explore their behavior under Pi starvation to identify putative players in Pi starvation responses, and experiments on measuring DNA replication, cell cycle and cell wall synthesis can be carried out to confirm the findings from chromatin and transcription profiling.

Roots are the places where Pi is sensed and acquired, and I expect to observe more dramatic transcriptional changes and nucleosome dynamics at genes including high-affinity phosphate transporters, SPX-domain containing genes and acid aiming at recycling phosphate from the environment. Nucleosome organization is highly dynamic, and what we have captured is a snapshot of a continuously changing process. More work can be done tracking the time-course nucleosome dynamics and transcriptional changes in response to Pi starvation along with assessing the role of other trans-determinants. I hypothesize that for

63

certain Pi starvation responsive genes that are repressed under Pi sufficient conditions, the

removal of nucleosomes at the promoter is the prerequisite of transcriptional activation. If

possible, a time-lapse footage can be taken at the roots capturing the morphology changes in

response to Pi starvation: The elongation of primary roots is halted with the extension of lateral

root and the growth of root hairs. At that time, suppose the corresponding transcriptional

changes had already begun, and then several hours before would be a good time point to

investigate chromatin profiles in the roots. If the plants were exposed with longer Pi starvation

conditions, I expect to observe the recycling of P in the shoots, and the up-regulation of

phosphatases, ribonucleases, and high-affinity phosphate transporters in the shoots which are similar to earlier responses in the roots. Re-supplying to Pi-starved plants would help with the identification of certain ‘stress memory’ genes, which respond quickly to the stress and return to basal-level expression when the stress is removed. For those genes, I expect the chromatin architectures remain the same with or without stress as chromatin profiles may serve as the foundation of the ‘stress memory’. To develop rice with better Pi-use efficiency and starvation tolerance, it would be interesting to investigate whether those chromatin profiles related to ‘stress memory’ are mitotically heritable and whether they can pass on to the next generation.

Our work on rice has confirmed some of the principles of nucleosome patterns and their determinants observed in model eukaryotes, yet organisms have evolved different mechanisms to cope with their environment. Future work is needed to uncover the complexity of Pi starvation responses by identifying specific genes or even the master players in nucleosome dynamics.

64

REFERENCES

Adelman, K., and Lis, J.T. (2012). Promoter-proximal pausing of RNA polymerase II: emerging roles in metazoans. Nature Reviews Genetics 13, 720-731.

Almer, A., Rudolph, H., Hinnen, A., and Horz, W. (1986). Removal of positioned nucleosomes from the yeast PHO5 promoter upon PHO5 induction releases additional upstream activating DNA elements. The EMBO Journal 5, 2689-2696.

Ames, B.N. (1966). Assay of inorganic phosphate, total phosphate and phosphatases. Methods in Enzymology 8, 115-118.

Annunziato, A.T. (2005). Split decision: what happens to nucleosomes during DNA replication? The Journal of Biological Chemistry 280, 12065-12068.

Bai, L., and Morozov, A.V. (2010). Gene regulation by nucleosome positioning. Trends in Genetics : TIG 26, 476-483.

Barbaric, S., Luckenbach, T., Schmid, A., Blaschke, D., Horz, W., and Korber, P. (2007). Redundancy of chromatin remodeling pathways for the induction of the yeast PHO5 promoter in vivo. The Journal of Biological Chemistry 282, 27610-27621.

Basehoar, A.D., Zanton, S.J., and Pugh, B.F. (2004). Identification and distinct regulation of yeast TATA box-containing genes. Cell 116, 699-709.

Baylin, S.B., Esteller, M., Rountree, M.R., Bachman, K.E., Schuebel, K., and Herman, J.G. (2001). Aberrant patterns of DNA methylation, chromatin formation and gene expression in cancer. Human Molecular Genetics 10, 687-692.

Beh, L.Y., Müller, M.M., Muir, T.W., Kaplan, N., and Landweber, L.F. (2015). DNA-guided establishment of nucleosome patterns within coding regions of a eukaryotic genome. Genome Research 25, 1727-1738.

Benjamini, Y., and Speed, T.P. (2012). Summarizing and correcting the GC content bias in high- throughput sequencing. Nucleic Acids Research 40, e72-e72.

Bonisch, C., and Hake, S.B. (2012). Histone H2A variants in nucleosomes and chromatin: more or less stable? Nucleic Acids Research 40, 10719-10741.

Boulikas, T., Wiseman, J.M., and Garrard, W.T. (1980). Points of contact between histone H1 and the histone octamer. Proceedings of the National Academy of Sciences of the United States of America 77, 127-131.

Casadio, F., Lu, X., Pollock, S.B., LeRoy, G., Garcia, B.A., Muir, T.W., Roeder, R.G., and Allis, C.D. (2013). H3R42me2a is a histone modification with positive transcriptional effects. Proceedings of the National Academy of Sciences of the United States of America 110, 14894-14899.

Chandran, A.K.N., Bhatnagar, N., Kim, B., and Jung, K.-H. (2016). Genome-wide identification and analysis of rice genes to elucidate morphological agronomic traits. Journal of Plant Biology 59, 639-647.

65

Chen, K., Xi, Y., Pan, X., Li, Z., Kaestner, K., Tyler, J., Dent, S., He, X., and Li, W. (2013). DANPOS: dynamic analysis of nucleosome position and occupancy by sequencing. Genome Research 23, 341-351.

Chodavarapu, R.K., Feng, S.H., Bernatavichute, Y.V., Chen, P.Y., Stroud, H., Yu, Y.C., Hetzel, J.A., Kuo, F., Kim, J., Cokus, S.J., Casero, D., Bernal, M., Huijser, P., Clark, A.T., Kramer, U., Merchant, S.S., Zhang, X.Y., Jacobsen, S.E., and Pellegrini, M. (2010). Relationship between nucleosome positioning and DNA methylation. Nature 466, 388- 392.

Christophorou, M.A., Castelo-Branco, G., Halley-Stott, R.P., Oliveira, C.S., Loos, R., Radzisheuskaya, A., Mowen, K.A., Bertone, P., Silva, J.C., Zernicka-Goetz, M., Nielsen, M.L., Gurdon, J.B., and Kouzarides, T. (2014). Citrullination regulates pluripotency and histone H1 binding to chromatin. Nature 507, 104-108.

Civan, P., and Svec, M. (2009). Genome-wide analysis of rice (Oryza sativa L. subsp. japonica) TATA box and Y Patch promoter elements. Genome 52, 294-297.

Collings, C.K., and Anderson, J.N. (2017). Links between DNA methylation and nucleosome occupancy in the human genome. & Chromatin 10, 18.

Collings, C.K., Waddell, P.J., and Anderson, J.N. (2013). Effects of DNA methylation on nucleosome stability. Nucleic Acids Research 41, 2918-2931.

Conley, D.J., Paerl, H.W., Howarth, R.W., Boesch, D.F., Seitzinger, S.P., Havens, K.E., Lancelot, C., and Likens, G.E. (2009). Ecology. Controlling eutrophication: nitrogen and phosphorus. Science 323, 1014-1015.

Dantas Machado, A.C., Zhou, T., Rao, S., Goel, P., Rastogi, C., Lazarovici, A., Bussemaker, H.J., and Rohs, R. (2015). Evolving insights on how cytosine methylation affects protein-DNA binding. Briefings in Functional Genomics 14, 61-73.

Di Cerbo, V., Mohn, F., Ryan, D.P., Montellier, E., Kacem, S., Tropberger, P., Kallis, E., Holzner, M., Hoerner, L., Feldmann, A., Richter, F.M., Bannister, A.J., Mittler, G., Michaelis, J., Khochbin, S., Feil, R., Schuebeler, D., Owen-Hughes, T., Daujat, S., and Schneider, R. (2014). Acetylation of histone H3 at lysine 64 regulates nucleosome dynamics and facilitates transcription. eLife 3, e01632.

Du, Z., Zhou, X., Ling, Y., Zhang, Z., and Su, Z. (2010). agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Research 38, W64-70.

Elser, J., and Bennett, E. (2011). Phosphorus cycle: A broken biogeochemical cycle. Nature 478, 29-31.

Fascher, K.D., Schmitz, J., and Horz, W. (1993). Structural and functional requirements for the chromatin transition at the PHO5 promoter in Saccharomyces cerevisiae upon PHO5 activation. Journal of Molecular Biology 231, 658-667.

Fincher, J.A., Vera, D.L., Hughes, D.D., McGinnis, K.M., Dennis, J.H., and Bass, H.W. (2013). Genome-wide prediction of nucleosome occupancy in maize reveals plant chromatin structural features at genes and other elements at multiple scales. Plant Physiology 162, 1127-1141.

66

Fyodorov, D.V., Zhou, B.R., Skoultchi, A.I., and Bai, Y. (2017). Emerging roles of linker histones in regulating chromatin structure and function. Nature Reviews Molecular Cell Biology.

Gaffney, D.J., McVicker, G., Pai, A.A., Fondufe-Mittendorf, Y.N., Lewellen, N., Michelini, K., Widom, J., Gilad, Y., and Pritchard, J.K. (2012). Controls of nucleosome positioning in the human genome. PLoS Genetics 8, e1003036.

Goldman, J.A., Garlick, J.D., and Kingston, R.E. (2010). Chromatin remodeling by imitation switch (ISWI) class ATP-dependent remodelers is stimulated by histone variant H2A.Z. The Journal of Biological Chemistry 285, 4645-4651.

Grishkevich, V., and Yanai, I. (2014). Gene length and expression level shape genomic novelties. Genome Research 24, 1497-1503.

Hammond, J.P., Broadley, M.R., and White, P.J. (2004). Genetic responses to phosphorus deficiency. Annals of Botany 94, 323-332.

Henikoff, S. (2008). Nucleosome destabilization in the epigenetic regulation of gene expression. Nature Reviews Genetics 9, 15-26.

Horz, W., and Altenburger, W. (1981). Sequence specific cleavage of DNA by micrococcal nuclease. Nucleic Acids Research 9, 2643-2658.

Hu, B., and Chu, C. (2011). Phosphate starvation signaling in rice. Plant Signaling & Behavior 6, 927-929.

Huebert, D.J., Kuan, P.-F., Keleş, S., and Gasch, A.P. (2012). Dynamic changes in nucleosome occupancy are not predictive of gene expression dynamics but are linked to transcription and chromatin regulators. Molecular and Cellular Biology 32, 1645-1653.

Iglesias, J., Trigueros, M., Rojas-Triana, M., Fernández, M., Albar, J.P., Bustos, R., Paz-Ares, J., and Rubio, V. (2013). Proteomics identifies ubiquitin–proteasome targets and new roles for chromatin-remodeling in the Arabidopsis response to phosphate starvation. Journal of Proteomics 94, 1-22.

Jiang, C., and Pugh, B.F. (2009). Nucleosome positioning and gene regulation: advances through genomics. Nature Reviews Genetics 10, 161-172.

Jin, B., Li, Y., and Robertson, K.D. (2011). DNA methylation: superior or subordinate in the epigenetic hierarchy? Genes & Cancer 2, 607-617.

Jin, C., and Felsenfeld, G. (2007). Nucleosome stability mediated by histone variants H3.3 and H2A.Z. Genes & Development 21, 1519-1529.

Jin, C., Zang, C., Wei, G., Cui, K., Peng, W., Zhao, K., and Felsenfeld, G. (2009). H3.3/H2A.Z double variant-containing nucleosomes mark 'nucleosome-free regions' of active promoters and other regulatory regions. Nature Genetics 41, 941-945.

Jones, J.B. (2001). Laboratory guide for conducting soil tests and plant analysis. (CRC Press).

Kalashnikova, A.A., Porter-Goff, M.E., Muthurajan, U.M., Luger, K., and Hansen, J.C. (2013). The role of the nucleosome acidic patch in modulating higher order chromatin structure. Journal of The Royal Society Interface 10. 67

Kawahara, Y., de la Bastide, M., Hamilton, J.P., Kanamori, H., McCombie, W.R., Ouyang, S., Schwartz, D.C., Tanaka, T., Wu, J., Zhou, S., Childs, K.L., Davidson, R.M., Lin, H., Quesada-Ocampo, L., Vaillancourt, B., Sakai, H., Lee, S.S., Kim, J., Numa, H., Itoh, T., Buell, C.R., and Matsumoto, T. (2013). Improvement of the Oryza sativa Nipponbare reference genome using next generation sequence and optical map data. Rice 6, 4.

Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R., and Salzberg, S.L. (2013). TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biology 14, R36.

Korber, P., and Barbaric, S. (2014). The yeast PHO5 promoter: from single locus to systems biology of a paradigm for gene regulation through chromatin. Nucleic Acids Research 42, 10888-10902.

Kuo, H.-F., Chang, T.-Y., Chiang, S.-F., Wang, W.-D., Charng, Y.-y., and Chiou, T.-J. (2014). Arabidopsis inositol pentakisphosphate 2-kinase, AtIPK1, is required for growth and modulates phosphate homeostasis at the transcriptional level. The Plant Journal 80, 503- 515.

Lai, W.K.M., and Pugh, B.F. (2017). Understanding nucleosome dynamics and their links to gene expression and DNA replication. Nature reviews. Molecular Cell Biology 18, 548- 562.

Lam, F.H., Steger, D.J., and O'Shea, E.K. (2008). Chromatin decouples promoter threshold from dynamic range. Nature 453, 246-250.

Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 10, R25.

Lay, F.D., Liu, Y., Kelly, T.K., Witt, H., Farnham, P.J., Jones, P.A., and Berman, B.P. (2015). The role of DNA methylation in directing the functional organization of the cancer epigenome. Genome Research 25, 467-477.

Lee, J.Y., Lee, J., Yue, H., and Lee, T.-H. (2015). Dynamics of nucleosome assembly and effects of DNA methylation. Journal of Biological Chemistry 290, 4291-4303.

Lee, W., Tillo, D., Bray, N., Morse, R.H., Davis, R.W., Hughes, T.R., and Nislow, C. (2007). A high-resolution atlas of nucleosome occupancy in yeast. Nature Genetics 39, 1235-1244.

Li, G., Liu, S., Wang, J., He, J., Huang, H., Zhang, Y., and Xu, L. (2014). ISWI proteins participate in the genome-wide nucleosome distribution in Arabidopsis. The Plant Journal 78, 706-714.

Li, M., Shi, X., Guo, C., and Lin, S. (2016). Phosphorus deficiency inhibits cell division but not growth in the dinoflagellate Amphidinium carterae. Frontiers in Microbiology 7.

Liu, M.J., Seddon, A.E., Tsai, Z.T., Major, I.T., Floer, M., Howe, G.A., and Shiu, S.H. (2015). Determinants of nucleosome positioning and their influence on plant gene expression. Genome Research 25, 1182-1195.

Lu, X., Simon, M.D., Chodaparambil, J.V., Hansen, J.C., Shokat, K.M., and Luger, K. (2008). The effect of H3K79 dimethylation and H4K20 trimethylation on nucleosome and chromatin structure. Nature Structural & Molecular Biology 15, 1122-1124.

68

Luger, K., Mader, A.W., Richmond, R.K., Sargent, D.F., and Richmond, T.J. (1997). Crystal structure of the nucleosome core particle at 2.8 A resolution. Nature 389, 251-260.

Luk, E., Ranjan, A., Fitzgerald, P.C., Mizuguchi, G., Huang, Y., Wei, D., and Wu, C. (2010). Stepwise histone replacement by SWR1 requires dual activation with histone H2A.Z and canonical nucleosome. Cell 143, 725-736.

Lyons, D.B., and Zilberman, D. (2017). DDM1 and Lsh remodelers allow methylation of DNA wrapped in nucleosomes. eLife 6, e30674.

Madsen, J.G., Schmidt, S.F., Larsen, B.D., Loft, A., Nielsen, R., and Mandrup, S. (2015). iRNA- seq: computational method for genome-wide assessment of acute transcriptional regulation from total RNA-seq data. Nucleic Acids Research 43, e40.

Masumoto, H., Hawke, D., Kobayashi, R., and Verreault, A. (2005). A role for cell-cycle- regulated histone H3 lysine 56 acetylation in the DNA damage response. Nature 436, 294-298.

Mavrich, T.N., Ioshikhes, I.P., Venters, B.J., Jiang, C., Tomsho, L.P., Qi, J., Schuster, S.C., Albert, I., and Pugh, B.F. (2008a). A barrier nucleosome model for statistical positioning of nucleosomes throughout the yeast genome. Genome Research 18, 1073-1083.

Mavrich, T.N., Jiang, C., Ioshikhes, I.P., Li, X., Venters, B.J., Zanton, S.J., Tomsho, L.P., Qi, J., Glaser, R.L., Schuster, S.C., Gilmour, D.S., Albert, I., and Pugh, B.F. (2008b). Nucleosome organization in the Drosophila genome. Nature 453, 358-362.

Minary, P., and Levitt, M. (2014). Training-free atomistic prediction of nucleosome occupancy. Proceedings of the National Academy of Sciences of the United States of America 111, 6293-6298.

Mueller, B., Mieczkowski, J., Kundu, S., Wang, P., Sadreyev, R., Tolstorukov, M.Y., and Kingston, R.E. (2017). Widespread changes in nucleosome accessibility without changes in nucleosome occupancy during a rapid transcriptional induction. Genes & Development 31, 451-462.

Ooi, S.L., Priess, J.R., and Henikoff, S. (2006). Histone H3.3 variant dynamics in the germline of Caenorhabditis elegans. PLoS Genetics 2, e97.

Papamichos-Chronakis, M., Watanabe, S., Rando, O.J., and Peterson, C.L. (2011). Global regulation of H2A.Z localization by the INO80 chromatin-remodeling enzyme is essential for genome integrity. Cell 144, 200-213.

Petesch, S.J., and Lis, J.T. (2008). Rapid, transcription-independent loss of nucleosomes over a large chromatin domain at Hsp70 loci. Cell 134, 74-84.

Popova, Y., Thayumanavan, P., Lonati, E., Agrochao, M., and Thevelein, J.M. (2010). Transport and signaling through the phosphate-binding site of the yeast Pho84 phosphate transceptor. Proceedings of the National Academy of Sciences of the United States of America 107, 2890-2895.

Portella, G., Battistini, F., and Orozco, M. (2013). Understanding the connection between epigenetic DNA methylation and nucleosome positioning from computer simulations. PLoS Computational Biology 9, e1003354.

69

Price, B.D., and D'Andrea, A.D. (2013). Chromatin remodeling at DNA double-strand breaks. Cell 152, 1344-1354.

Pulivarthy, S.R., Lion, M., Kuzu, G., Matthews, A.G., Borowsky, M.L., Morris, J., Kingston, R.E., Dennis, J.H., Tolstorukov, M.Y., and Oettinger, M.A. (2016). Regulated large-scale nucleosome density patterns and precise nucleosome positioning correlate with V(D)J recombination. Proceedings of the National Academy of Sciences of the United States of America 113, E6427-E6436.

Quinlan, A.R., and Hall, I.M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841-842.

Radman-Livaja, M., and Rando, O.J. (2010). Nucleosome positioning: how is it established, and why does it matter? Developmental Biology 339, 258-266.

Raghothama, K.G. (1999). Phosphate Acquisition. Annual Review of Plant Physiology and Plant Molecular Biology 50, 665-693.

Ramírez, F., Ryan, D.P., Grüning, B., Bhardwaj, V., Kilpert, F., Richter, A.S., Heyne, S., Dündar, F., and Manke, T. (2016). deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Research 44, W160-W165.

Ren, X.Y., Vorst, O., Fiers, M.W., Stiekema, W.J., and Nap, J.P. (2006). In plants, highly expressed genes are the least compact. Trends in Genetics 22, 528-532.

Robertson, K.D. (2002). DNA methylation and chromatin - unraveling the tangled web. Oncogene 21, 5361-5379.

Robinson, J.T., Thorvaldsdóttir, H., Winckler, W., Guttman, M., Lander, E.S., Getz, G., and Mesirov, J.P. (2011). Integrative Genomics Viewer. Nature Biotechnology 29, 24-26.

Rouached, H., Arpat, A.B., and Poirier, Y. (2010). Regulation of phosphate starvation responses in plants: signaling players and cross-talks. Molecular Plant 3, 288-299.

Ruan, W., Guo, M., Wu, P., and Yi, K. (2017). Phosphate starvation induced OsPHR4 mediates Pi-signaling and homeostasis in rice. Plant Molecular Biology 93, 327-340.

Schachtman, D.P., Reid, R.J., and Ayling, S. (1998). Phosphorus uptake by plants: from soil to cell. Plant Physiology 116, 447-453.

Secco, D., Wang, C., Shou, H., and Whelan, J. (2012). Phosphate homeostasis in the yeast Saccharomyces cerevisiae, the key role of the SPX domain-containing proteins. FEBS Letters 586, 289-295.

Secco, D., Jabnoune, M., Walker, H., Shou, H., Wu, P., Poirier, Y., and Whelan, J. (2013). Spatio-temporal transcript profiling of rice roots and shoots in response to phosphate starvation and recovery. The Plant Cell 25, 4285-4304.

Secco, D., Wang, C., Shou, H., Schultz, M.D., Chiarenza, S., Nussaume, L., Ecker, J.R., Whelan, J., and Lister, R. (2015). Stress induced gene expression drives transient DNA methylation changes at adjacent repetitive elements. eLife 4.

Segal, E., and Widom, J. (2009). What controls nucleosome positions? Trends in Genetics 25, 335-343. 70

Shen, L., Shao, N., Liu, X., and Nestler, E. (2014). ngs.plot: Quick mining and visualization of next-generation sequencing data by integrating genomic databases. BMC Genomics 15, 284.

Shivaswamy, S., Bhinge, A., Zhao, Y., Jones, S., Hirst, M., and Iyer, V.R. (2008). Dynamic remodeling of individual nucleosomes across a eukaryotic genome in response to transcriptional perturbation. PLoS Biology 6, e65.

Small, E.C., Xi, L., Wang, J.P., Widom, J., and Licht, J.D. (2014). Single-cell nucleosome mapping reveals the molecular basis of gene expression heterogeneity. Proceedings of the National Academy of Sciences of the United States of America 111, E2462-2471.

Smith, A.P., Jain, A., Deal, R.B., Nagarajan, V.K., Poling, M.D., Raghothama, K.G., and Meagher, R.B. (2010). Histone H2A.Z regulates the expression of several classes of phosphate starvation response genes but not as a transcriptional activator. Plant Physiology 152, 217-225.

Smith, F.W., Mudge, S.R., Rae, A.L., and Glassop, D. (2003). Phosphate transport in plants. Plant and Soil 248, 71-83.

Smith, Z.D., and Meissner, A. (2013). DNA methylation: roles in mammalian development. Nature Reviews Genetics 14, 204.

Snyder, Matthew W., Kircher, M., Hill, Andrew J., Daza, Riza M., and Shendure, J. (2016). Cell- free DNA comprises an in vivo nucleosome footprint that informs its tissues-of-origin. Cell 164, 57-68.

Strahl, B.D., and Allis, C.D. (2000). The language of covalent histone modifications. Nature 403, 41-45.

Struhl, K., and Segal, E. (2013). Determinants of nucleosome positioning. Nature structural & Molecular Biology 20, 267-273.

Subramanian, V., Fields, P.A., and Boyer, L.A. (2015). H2A.Z: a molecular rheostat for transcriptional control. F1000Prime Reports 7, 01.

Suto, R.K., Clarkson, M.J., Tremethick, D.J., and Luger, K. (2000). Crystal structure of a nucleosome core particle containing the variant histone H2A.Z. Nature Structural Biology 7, 1121-1124.

Svaren, J., and Horz, W. (1997). Transcription factors vs nucleosomes: regulation of the PHO5 promoter in yeast. Trends in Biochemical Sciences 22, 93-97.

Svistoonoff, S., Creff, A., Reymond, M., Sigoillot-Claude, C., Ricaud, L., Blanchet, A., Nussaume, L., and Desnos, T. (2007). Root tip contact with low-phosphate media reprograms plant root architecture. Nature Genetics 39, 792-796.

Talbert, P.B., and Henikoff, S. (2010). Histone variants-ancient wrap artists of the epigenome. Nature reviews. Molecular Cell Biology 11, 264-275.

Talbert, P.B., and Henikoff, S. (2014). Environmental responses mediated by histone variants. Trends in Cell Biology 24, 642-650.

71

Teif, V.B., Vainshtein, Y., Caudron-Herger, M., Mallm, J.P., Marth, C., Hofer, T., and Rippe, K. (2012). Genome-wide nucleosome positioning during embryonic stem cell development. Nature Structural & Molecular Biology 19, 1185-1192.

Tessarz, P., and Kouzarides, T. (2014). Histone core modifications regulating nucleosome structure and dynamics. Nature Reviews Molecular cell biology 15, 703-708.

Ticconi, C.A., Delatorre, C.A., and Abel, S. (2001). Attenuation of phosphate starvation responses by phosphite in Arabidopsis. Plant Physiology 127, 963-972.

Tillo, D., and Hughes, T.R. (2009). G+C content dominates intrinsic nucleosome occupancy. BMC Bioinformatics 10, 442.

Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D.R., Pimentel, H., Salzberg, S.L., Rinn, J.L., and Pachter, L. (2012). Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature Protocols 7, 562- 578.

Ulz, P., Thallinger, G.G., Auer, M., Graf, R., Kashofer, K., Jahn, S.W., Abete, L., Pristauz, G., Petru, E., Geigl, J.B., Heitzer, E., and Speicher, M.R. (2016). Inferring expressed genes by whole-genome sequencing of plasma DNA. Nature Genetics 48, 1273-1278.

Valdés-Mora, F., Song, J.Z., Statham, A.L., Strbenac, D., Robinson, M.D., Nair, S.S., Patterson, K.I., Tremethick, D.J., Stirzaker, C., and Clark, S.J. (2012). Acetylation of H2A.Z is a key epigenetic modification associated with gene deregulation and epigenetic remodeling in cancer. Genome Research 22, 307-321.

Valouev, A., Johnson, S.M., Boyd, S.D., Smith, C.L., Fire, A.Z., and Sidow, A. (2011). Determinants of nucleosome organization in primary human cells. Nature 474, 516-520.

Valouev, A., Ichikawa, J., Tonthat, T., Stuart, J., Ranade, S., Peckham, H., Zeng, K., Malek, J.A., Costa, G., McKernan, K., Sidow, A., Fire, A., and Johnson, S.M. (2008). A high- resolution, nucleosome position map of C. elegans reveals a lack of universal sequence- dictated positioning. Genome Research 18, 1051-1063.

Vance, C.P., Uhde-Stone, C., and Allan, D.L. (2003). Phosphorus acquisition and use: critical adaptations by plants for securing a nonrenewable resource. New Phytologist 157, 423- 447.

Vera, D.L., Madzima, T.F., Labonne, J.D., Alam, M.P., Hoffman, G.G., Girimurugan, S.B., Zhang, J., McGinnis, K.M., Dennis, J.H., and Bass, H.W. (2014). Differential nuclease sensitivity profiling of chromatin reveals biochemical footprints coupled to gene expression and functional DNA elements in Maize. The Plant Cell 26, 3883-3893.

Vogel, K., Horz, W., and Hinnen, A. (1989). The two positively acting regulatory proteins PHO2 and PHO4 physically interact with PHO5 upstream activation regions. Molecular and Cellular Biology 9, 2050-2057.

Wang, J., Zhuang, J., Iyer, S., Lin, X., Whitfield, T.W., Greven, M.C., Pierce, B.G., Dong, X., Kundaje, A., Cheng, Y., Rando, O.J., Birney, E., Myers, R.M., Noble, W.S., Snyder, M., and Weng, Z. (2012). Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors. Genome Research 22, 1798-1812.

72

Wang, Z., Ruan, W., Shi, J., Zhang, L., Xiang, D., Yang, C., Li, C., Wu, Z., Liu, Y., Yu, Y., Shou, H., Mo, X., Mao, C., and Wu, P. (2014). Rice SPX1 and SPX2 inhibit phosphate starvation responses through interacting with PHR2 in a phosphate-dependent manner. Proceedings of the National Academy of Sciences of the United States of America 111, 14953-14958.

Watanabe, S., Resch, M., Lilyestrom, W., Clark, N., Hansen, J.C., Peterson, C., and Luger, K. (2010). Structural characterization of H3K56Q nucleosomes and nucleosomal arrays. Biochimica et Biophysica Acta 1799, 480-486.

Whitehouse, I., Rando, O.J., Delrow, J., and Tsukiyama, T. (2007). Chromatin remodelling at promoters suppresses antisense transcription. Nature 450, 1031-1035.

Wippo, C.J., Krstulovic, B.S., Ertel, F., Musladin, S., Blaschke, D., Sturzl, S., Yuan, G.C., Horz, W., Korber, P., and Barbaric, S. (2009). Differential cofactor requirements for histone eviction from two nucleosomes at the yeast PHO84 promoter are determined by intrinsic nucleosome stability. Molecular and Cellular Biology 29, 2960-2981.

Woodcock, C.L., Skoultchi, A.I., and Fan, Y. (2006). Role of linker histone in chromatin structure and function: H1 stoichiometry and nucleosome repeat length. Chromosome Research 14, 17-25.

Wu, P., Shou, H., Xu, G., and Lian, X. (2013). Improvement of phosphorus efficiency in rice on the basis of understanding phosphate signaling and homeostasis. Current Opinion in Plant Biology 16, 205-212.

Wu, P., Ma, L., Hou, X., Wang, M., Wu, Y., Liu, F., and Deng, X.W. (2003). Phosphate starvation triggers distinct alterations of genome expression in Arabidopsis roots and leaves. Plant Physiology 132, 1260-1271.

Wu, Y.F., Zhang, W.L., and Jiang, J.M. (2014). Genome-wide nucleosome positioning is orchestrated by genomic regions associated with DNase I hypersensitivity in rice. PLoS Genetics 10.

Xiong, J., Gao, S., Dui, W., Yang, W., Chen, X., Taverna, S.D., Pearlman, R.E., Ashlock, W., Miao, W., and Liu, Y. (2016). Dissecting relative contributions of cis- and trans- determinants to nucleosome distribution by comparing Tetrahymena macronuclear and micronuclear chromatin. Nucleic Acids Research 44, 10091-10105.

Xu, F., Zhang, K., and Grunstein, M. (2005). Acetylation in histone H3 globular domain regulates gene expression in yeast. Cell 121, 375-385.

Yang, H. (2009). In plants, expression breadth and expression level distinctly and non-linearly correlate with gene structure. Biology Direct 4, 45.

Ye, J., Ai, X., Eugeni, E.E., Zhang, L., Carpenter, L.R., Jelinek, M.A., Freitas, M.A., and Parthun, M.R. (2005). Histone H4 lysine 91 acetylation a core domain modification associated with chromatin assembly. Molecular Cell 18, 123-130.

Yuan, G.C., Liu, Y.J., Dion, M.F., Slack, M.D., Wu, L.F., Altschuler, S.J., and Rando, O.J. (2005). Genome-scale identification of nucleosome positions in S. cerevisiae. Science 309, 626-630.

73

Zemach, A., McDaniel, I.E., Silva, P., and Zilberman, D. (2010). Genome-wide evolutionary analysis of eukaryotic DNA methylation. Science 328, 916-919.

Zhang, T., Zhang, W., and Jiang, J. (2015). Genome-wide nucleosome occupancy and positioning and their impact on gene expression and evolution in plants. Plant Physiology 168, 1406-1416.

Zheng, L., Huang, F., Narsai, R., Wu, J., Giraud, E., He, F., Cheng, L., Wang, F., Wu, P., Whelan, J., and Shou, H. (2009). Physiological and transcriptome analysis of iron and phosphorus interaction in rice seedlings. Plant Physiology 151, 262-274.

Zhou, J., Jiao, F., Wu, Z., Li, Y., Wang, X., He, X., Zhong, W., and Wu, P. (2008). OsPHR2 is involved in phosphate-starvation signaling and excessive phosphate accumulation in shoots of plants. Plant Physiology 146, 1673-1686.

Zilberman, D., Coleman-Derr, D., Ballinger, T., and Henikoff, S. (2008). Histone H2A.Z and DNA methylation are mutually antagonistic chromatin marks. Nature 456, 125-129.

74

APPENDIX: COPYRIGHT INFORMATION

From: Maria Salazar

To: Qi Zhang Date: Tue, 1/30/2018, 3:21 PM Subject: 00790423 RE: Request the permission to reuse a published article in a PhD dissertation Dear Dr. Zhang,

Thank you for contacting BioMed Central.

The open access articles published in BioMed Central's journals are made available under the Creative Commons Attribution (CC-BY) license, which means they are accessible online without any restrictions and can be re-used in any way, subject only to proper attribution (which, in an academic context, usually means citation).

The re-use rights enshrined in our license agreement (http://www.biomedcentral.com/about/policies/license-agreement) include the right for anyone to produce printed copies themselves, without formal permission or payment of permission fees.

You will be able to find details about these articles at http://www.biomedcentral.com/about/policies/reprints-and-permissions

If you have any questions, please do not hesitate to contact me.

With kind regards, --- Maria Salazar Global Open Research Support Executive Global Open Research Support

Springer Nature T +44 (0)203 192 2009 www.springernature.com

75

VITA

Qi Zhang is the son of Zhijian Zhang and Yaping Shen. He was born in 1990 in

Hangzhou, China and graduated from Hangzhou Xuejun High School in 2008. Thereafter, he obtained his Bachelor’s degree in Biotechnology in 2012 at Sichuan Agricultural University in

Chengdu, China. During undergraduate studies, he worked on rice genotyping under the supervision of Dr. Xianjun Wu and Dr. Hongyu Zhang. In August 2012, he went to Louisiana

State University to pursue his Ph.D. with Dr. Aaron Smith. His graduate project was mapping nucleosome positions in the rice genome. As a graduate teaching assistant, he taught recitation for Genetics and Microbiology laboratory. During his spare time, he volunteered at Baton Rouge

General Hospital. Mr. Zhang expects to graduate with Doctor of Philosophy in Biological

Sciences in May 2018.

76