Protein-DNA Interactions of pUL34, an Essential Human Cytomegalovirus DNA-Binding Protein

A dissertation presented to

the faculty of

the College of Arts and Sciences of Ohio University

In partial fulfillment

of the requirements for the degree

Doctor of Philosophy

Mark D. Slayton

August 2018

© 2018 Mark D. Slayton. All Rights Reserved.

This dissertation titled

Protein-DNA Interactions of pUL34, an Essential Human Cytomegalovirus DNA-Binding Protein

by

MARK D. SLAYTON

has been approved for

the Department of Molecular and Cellular Biology

and the College of Arts and Sciences by

Bonita J. Biegalke

Associate Professor of Biomedical Sciences

Joseph Shields

Interim Dean, College of Arts and Sciences

ABSTRACT

SLAYTON, MARK D., Ph.D., August 2018, Molecular and Cellular Biology

Protein-DNA Interactions of pUL34, an Essential Human Cytomegalovirus DNA-Binding Protein

Director of Dissertation: Bonita J. Biegalke

Human cytomegalovirus (HCMV) is primarily an opportunistic pathogen in humans, causing significant disease in immunocompromised individuals. A large, double-stranded DNA genome (~230 kilobases) provides the coding capacity for over 200 open reading frames, of which only 25% are required for viral replication in cell culture. The viral UL34 gene encodes sequence-specific DNA-binding proteins (pUL34) that are essential for replication; viruses lacking proper expression of pUL34 cannot replicate in cell culture. Interactions of pUL34 with binding sites near the US3 and US9 transcription start sites repress transcription of these two viral immune evasion genes, which are dispensable for replication in cell culture. There are 12 additional predicted pUL34-binding sites in the HCMV genome (strain AD169), three of which are concentrated near the HCMV origin of lytic replication (oriLyt). Analysis of 47 clinical isolates of HCMV confirmed that the predicted UL34-binding sites are highly conserved. Protein-DNA interactions during infection were analyzed by ChIP-seq, which confirmed that pUL34 binds to both the human and viral genomes during infection, including at the three predicted UL34-binding sites in the oriLyt region. Mutagenesis of the UL34-binding sites in an oriLyt-containing plasmid significantly reduced virus-mediated, oriLyt-dependent DNA replication.

Subsequently, mutagenesis of these same sites in the HCMV genome reduced the replication efficiency of the resulting viruses. Protein-protein interaction analyses demonstrated that pUL34 interacts with three viral proteins that are essential for viral DNA replication (IE2, UL44, and UL84), suggesting that pUL34-DNA interactions in the oriLyt region are involved in the DNA replication cascade. Lastly, mutagenesis of the predicted UL34-binding site in the third exon of another essential viral gene, UL37, demonstrated that some UL34-binding sites are not important for viral replication.

DEDICATION

This work is dedicated to my father, Daniel Earl Slayton, who wanted nothing more than to see his children succeed at their goals. Even though he was unable to see this accomplishment, I was motivated and encouraged by his memory.

ACKNOWLEDGMENTS

I am thankful to a great number of individuals who assisted me during my thesis research in a variety of ways. First and foremost, I would like to thank my dissertation advisor, Dr. Bonita Biegalke, who made this entire project possible. She accepted me into her lab and set me off on a well-designed project, giving me invaluable advice and assistance along the way. Her bold and direct personality helped to shape me into the scientist that I am today. I would also like to thank the members of my dissertation committee, Drs. Calvin James, Mark Berryman, and Justin Holub, who examined my work and asked me challenging questions which were critical to refining both my research and my knowledge.

From the laboratory, I would like to thank my colleagues Ms. Janet Hammer, who kept the lab running efficiently and helped me perform several experiments, and Dr. Tanvir Hossain, who created an important virus and assisted in several experiments. Both Janet and Tanvir encouraged me when I was frustrated with the results of a long but failed experiment, telling me to “keep moving forward”. I am thankful to several undergraduate students who rotated through the laboratory and provided time-saving assistance through laboratory maintenance, in particular Ms. Allison Marsh, Ms. Gianna Montague and Mr. Jim Wilson.

I thank my mother and my three sisters for believing in me and offering continual words of encouragement. Finally, I thank my wife, Liza Zimmerman-Slayton, for her devotion, love, encouragement, and support by making my life outside the laboratory as wonderful as possible.

TABLE OF CONTENTS

Abstract
Dedication
Acknowledgments
List of Tables
List of Figures
Chapter 1: Introduction
    Medical Significance
    Human Cytomegalovirus
        Classification
        Structural organization
        Genome organization
        Viral infection and replication
    Lytic-Phase Genome Replication
        The origin of lytic replication
        Viral genes essential to DNA replication
    DNA-Binding Protein UL34
    Summary
Chapter 2: Materials and Methods
    Human cell culture
    Oligonucleotides
    Plasmid mutagenesis and cloning
    Transient transfection and total DNA isolation
    Southern blotting
    Quantitative polymerase chain reactions
    RT-qPCR
    Generation of mutant viruses
    One-step growth curve
    Chromatin immunoprecipitation
    Next-generation sequencing
    Genomics data analysis
    Protein-protein interactions
    Statistical analysis
    Data availability
Chapter 3: Conservation of UL34 Binding Sites and ChIP-Seq Analysis
    Background
    Aims
    Results
    Discussion
Chapter 4: UL34 Binding Sites in the Origin of Lytic Replication
    Background
    Aims
    Results
    Discussion
Chapter 5: BAC Mutagenesis With gBlocks and the UL34 Binding Site in Exon 3 of the UL37 Open Reading Frame
    Background
    Aims
    Results
    Discussion
Summary
Future Directions
References


LIST OF TABLES

Table 1. List of oligonucleotides used in this work
Table 2. Conservation of UL34 binding sites
Table 3. Distribution of the aligned reads in the ChIP-seq samples
Table 4. Interactions between pUL34 and the viral genome during infection
Table 5. Interactions between pUL34 and the human genome during infection
Table 6. Reactome Pathway of the human genes that UL34 binds near or within
Table 7. Sequence of the long primers which failed to generate a transfer construct
Table 8. Sequence of the gBlock gene fragment

LIST OF FIGURES

Figure 1. Organization of the HCMV genome
Figure 2. UL34-binding sites throughout the HCMV genome
Figure 3. Characterization of the Myc-UL34-AD169 virus
Figure 4. Motif prediction in the viral genome and pUL34-DNA binding interactions during infection
Figure 5. Enrichment of pUL34 in the oriLyt region
Figure 6. Motif prediction and overrepresented human transcription factor binding sites found in ChIP-seq data
Figure 7. Transient replication assays
Figure 8. Number of genomes required to generate a single plaque
Figure 9. Mutant virus analysis
Figure 10. Protein-protein interactions
Figure 11. Secondary structure of long primers and design of gBlock gene fragment
Figure 12. Work-flow chart of “en passant” mutagenesis with the standard protocol (left) and modified protocol (right)
Figure 13. One-step growth curve comparison of wild-type (pHB5) and mutant (UL37A and B) virus
Figure 14. Measurement of UL37 transcripts from wild-type and UL37-exon3-mutant infected cells


CHAPTER 1: INTRODUCTION

Medical Significance

Human cytomegalovirus (HCMV) is primarily an opportunistic pathogen, causing significant disease following infection in persons with immature or compromised immune systems. It is the most common congenital infection worldwide and the leading non-hereditary cause of deafness in children (Manicklal et al., 2013). In adults, HCMV is problematic for transplant patients, increasing the incidence of organ rejection and pneumonia (Azevedo et al., 2015; Wang et al., 2014), and for AIDS patients, causing organ-specific diseases such as retinitis (Heiden et al., 2007). The significant financial burden of treating HCMV infection has led to multiple vaccine development strategies, but a licensed vaccine is not yet available (see Schleiss, 2016, for a review). Antiviral compounds used to treat HCMV infection target the viral DNA polymerase, UL54 (Mercorelli et al., 2008). Recently, drugs targeting other essential viral proteins have been developed, including the viral terminase complex (Melendez and Razonable, 2015); the viral protein kinase, UL97 (Shannon-Lowe and Emery, 2010); and the major transcriptional transactivator, UL122 (IE2) (Mercorelli et al., 2016). Additional viral genes that are essential for replication, and that may serve as drug or vaccine targets, still lack functional characterization.


Human Cytomegalovirus

Classification

Human cytomegalovirus, or human herpesvirus 5, is a member of the Herpesviridae family of DNA viruses. Herpesviruses are enveloped viruses with large dsDNA genomes and lengthy replication cycles. They are well known for their ability to establish both an active, lytic infection and a suppressed, or latent, infection.

The Herpesviridae are divided into three subfamilies based on genome structure and growth kinetics: Alphaherpesvirinae, which includes herpes simplex viruses 1 and 2 as well as varicella-zoster virus (the cause of chickenpox); Betaherpesvirinae, which includes cytomegalovirus, human herpesviruses 6A and 6B, and human herpesvirus 7; and Gammaherpesvirinae, which includes Epstein-Barr virus and Kaposi’s sarcoma-associated herpesvirus.

Structural organization

The HCMV virion is composed of an icosahedral capsid surrounded by a layer of viral proteins called the tegument, which is in turn surrounded by a lipid bilayer envelope derived from host cell membranes. The capsid directly protects the linear dsDNA viral genome and is formed by the self-assembly of virally encoded proteins. The tegument layer lies between the capsid and envelope and contains viral proteins that both associate with the viral capsid and remain generally unstructured (Chen et al., 1999; Varnum et al., 2004). These tegument proteins are involved in processes that occur before viral replication begins, such as disabling host immune responses, trafficking the capsid to the nucleus and activating viral gene expression (Kalejta, 2008). The outermost layer of the mature virion is the envelope.

The viral envelope consists of a host-derived lipid bilayer embedded with cellular and viral glycoproteins which are important for attachment to target host cells (Vanarsdall and Johnson, 2012).

Genome organization

Human cytomegalovirus has one of the largest genomes of any human viral pathogen, with ~230 kilobases of linear dsDNA. The genome is organized into regions based on the properties of the local sequences. Most of the genome consists of unique sequences, separated into long (unique long; UL) and short (unique short; US) portions. The ends of the genome contain repeated sequences (terminal repeat long, TRL, and terminal repeat short, TRS), and the unique long and short regions are separated by internal repeats (internal repeat long, IRL, and internal repeat short, IRS) that are inverted copies of the terminal repeats. Additionally, the ends of the genome contain a single unpaired base (Tamashiro and Spector, 1986) that is important for circularization, a key step in rolling-circle DNA replication, which the virus is postulated to use to replicate its genome.

The large HCMV genome contains more potential open reading frames (ORFs) than is typical for a virus. However, the number of ORFs encoded by HCMV varies with the isolation source and strain (laboratory versus clinical), as well as with the method used to determine which ORFs are likely to be functional; estimates of the number of functional ORFs range from 165 to 252 (Murphy and Shenk, 2008; Murphy et al., 2003; Stern-Ginossar et al., 2012). The widely used laboratory strain AD169 contains 208 predicted ORFs when the repeat regions are included. However, AD169 was passaged several times in fibroblasts before being sequenced and was selected for rapid growth in these cells (Chee et al., 1990). Compared with clinical isolates, AD169 lacks 19 predicted ORFs (Figure 1). The missing genes are clustered at the end of the unique long region and encode a glycoprotein complex involved in entry into cell types that are infectible by clinical isolates but not by strains extensively passaged in the laboratory. The IRL region (an inversion of the TRL region) replaces these missing genes in laboratory strains, but is not present in clinical isolates (Cha et al., 1996).

Viral infection and replication

Human cytomegalovirus, like all known viruses, must infect host cells to replicate. Although it infects only humans, the virus can infect a wide range of human cell types. Studies of HCMV cell tropism have shown that lytic replication is possible in nearly every cell type and organ tissue (Einhorn and Ost, 1984; Ho et al., 1984; Kahl et al., 2000; Riegler et al., 2000; Sinzger et al., 1995). Epithelial and endothelial cells, smooth muscle cells, and connective tissue cells such as fibroblasts are highly permissive for viral replication (Sinzger et al., 1995). When clinical isolates are passaged several times in fibroblasts, the virus becomes specialized for replication in that cell type (Woodroffe et al., 1997); however, if isolates are passaged in endothelial cells, the ability to infect a broad range of cell types is maintained (Sinzger et al., 2008; Waldman et al., 1991). This effect has been attributed to the UL128-UL131 locus, which is missing in fibroblast-specialized strains (Adler et al., 2006); the fibroblast-specialized strains can replicate to higher titers than isolates that maintain broad tropism. Latent infection occurs primarily in monocytes and macrophages, which are severely limited for lytic replication until differentiation, at which point reactivation can occur (Carter and Ehrlich, 2008; Ibanez et al., 1991).

Figure 1. Organization of the HCMV genome.

Diagram of the HCMV genome for the laboratory strain AD169. The unique long (UL) and unique short (US) regions are indicated by the gray rectangles and labeled above. Genes that are essential to viral DNA replication, as well as UL34, are marked with vertical black bars and labeled below the UL region. Repeat regions are displayed in blue (long) and red (short) with arrows indicating the directionality of transcripts which originate there. The portion of the UL region that is missing from AD169 but is present in clinical isolates is indicated below the IRL region (UL133-151).

A full lytic-phase replication cycle for HCMV is lengthy, taking roughly 5 days in cell culture (Chambers et al., 1971; Furukawa et al., 1973); in comparison, the HSV-1 lytic-phase replication cycle is complete in ~24 hours (Russell et al., 1964). HCMV gene expression is extensively regulated in a temporal manner. The immediate-early genes are expressed immediately after infection (Stinski et al., 1983) and encode proteins that function primarily in transcriptional regulation and immune evasion (Hermiston et al., 1987; Jones et al., 1996). The onset of early gene expression occurs around 24 hours after infection and results in the expression of proteins that are involved in replication of the viral DNA genome (Kerry et al., 1994; White and Spector, 2007). Late gene expression, the final phase of the gene expression cascade (Geballe et al., 1986), results in the production of structural proteins that make up the capsid, tegument and envelope glycoproteins of new virions (Anders et al., 2007; Stinski, 1977).

Lytic-Phase Genome Replication

The origin of lytic replication

The human cytomegalovirus genome contains a single, highly complex lytic-phase origin of DNA replication (oriLyt). oriLyt spans approximately 3.5 kilobases, from nucleotide 91165 to 94860 in strain AD169 (Anders et al., 1992). Within oriLyt, a core region (~1.5 kb) can be further defined; deletion of this core results in minimal or no viral replication. This region is structurally unique compared with the remainder of the HCMV genome and has a high degree of base asymmetry: the leftward region is A+T rich, while the rightward region is G+C rich. Several motifs and repeated elements have been identified within the region flanking the oriLyt core, including binding sites for important trans-acting factors such as IE2, UL34, UL84 and human transcription factors, as well as RNA/DNA hybrid structures, including an important RNA stem-loop (Colletti et al., 2007; Liu and Biegalke, 2013; Masse et al., 1992; Xu et al., 2004; Zhu et al., 1998).
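Base asymmetry of this kind is commonly quantified as G+C content in sliding windows along a sequence. The Python sketch below is illustrative only: the window size, step, toy sequence, and function names are assumptions, and it is not the analysis used in this work. Applied to a real AD169 sequence string, the oriLyt interval given above would correspond to the slice `genome[91164:94860]` (0-based, end-exclusive).

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in a DNA sequence (case-insensitive)."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """G+C fraction for each full-length window.

    Returns (1-based window start, G+C fraction) pairs; a trailing partial
    window shorter than `window` is ignored.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        (start + 1, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Hypothetical toy sequence: an A+T-rich left half and a G+C-rich right half.
toy = "ATTA" * 5 + "GCCG" * 5
for start, gc in windowed_gc(toy, window=10, step=10):
    print(start, round(gc, 2))
```

On real data, a window of a few hundred nucleotides would smooth local noise while still resolving the left-to-right shift in composition; the choice is a trade-off between resolution and variance.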

Although the core region is regarded as the most vital portion of oriLyt, deletion studies showed that the leftward flanking region (which contains the oriLyt promoter) is absolutely required for proper oriLyt function, while deletion of the rightward flanking region yielded a severely growth-defective virus (Borst and Messerle, 2005). Additionally, when oriLyt is inverted relative to the flanking regions, the genome cannot be replicated, highlighting the positional importance of the flanking regions.

Viral genes essential to DNA replication

Replication of the viral DNA genome during the lytic phase requires the expression of several viral genes (Pari, 2008). To date, nine proteins have been identified as critical for genome replication. Most of these proteins are conserved among all herpesviruses and are referred to as core replication proteins. The core replication protein set consists of UL44 (DNA polymerase processivity factor), UL54 (viral DNA polymerase), UL57 (single-stranded DNA-binding protein), and UL70-UL102-UL105 (helicase-primase complex). The remaining three essential proteins are UL122 (IE2; major transcriptional regulator), which is conserved within the Betaherpesvirinae subfamily, as well as UL84 (multifunctional activator) and the UL112/UL113 locus (phosphoproteins), which are conserved among cytomegaloviruses.

The viral DNA polymerase is a two-protein complex composed of UL44 (processivity factor) and UL54 (catalytic subunit). Both subunits are produced early in viral replication and have been well characterized because of their importance in the viral replication cycle. UL44 is highly expressed during infection and localizes to compartments in the cell nucleus where viral DNA synthesis occurs; however, UL44 cannot synthesize DNA by itself (Appleton et al., 2006; Strang et al., 2012). UL44 binds the C terminus of UL54 (Loregian et al., 2003) and acts as a clamp that greatly increases the specificity of the complex for viral DNA (Appleton et al., 2004). Expression of UL54 is activated at early times during infection by the immediate-early proteins IE1 and IE2 (Kerry et al., 1996). UL54 shares similarities with human DNA polymerase α and is the main target of many antivirals used to treat HCMV infection (Ertl et al., 1991; Mercorelli et al., 2008). UL54 can synthesize DNA without UL44, but its efficiency increases 2- to 5-fold when UL44 is present (Ertl and Powell, 1992).

UL57 encodes a single-stranded DNA-binding protein homologous to the HSV-1 major DNA-binding protein, ICP8 (UL29). Because ICP8 has been well characterized and is highly homologous to UL57, many of the functions of ICP8 are presumed to be shared by UL57. The UL57 ORF and promoter region are located directly adjacent to the oriLyt promoter (Kiehl et al., 2003). UL57 localizes to the nucleus of infected cells and is distributed throughout replication compartments (Penfold and Mocarski, 1997; Strang et al., 2012). During genome replication, UL57 is assumed to bind unwound ssDNA near the lytic replication origin to facilitate efficient DNA replication (Pari, 2008).

The helicase-primase complex is formed by UL70 (primase), UL102 (primase-associated factor) and UL105 (helicase). These proteins are predicted to act as the helicase-primase complex based on their homology to HSV-1 proteins that perform the same functions; however, their exact roles have yet to be proven experimentally. The primase, UL70 (homologue of UL52 in HSV-1), is a DNA-dependent RNA polymerase responsible for synthesizing RNA primers that are extended by the viral DNA polymerase (Urban et al., 2009). Few studies have analyzed the primase-associated factor, UL102 (homologue of UL8 in HSV-1) (Smith and Pari, 1995). Likewise, the putative helicase, UL105 (homologue of UL5 in HSV-1), remains largely unstudied (Smith et al., 1996). However, these three proteins interact and likely form a complex (McMahon and Anders, 2002), and they are required for lytic-phase DNA replication (Pari and Anders, 1993).

The UL84 gene encodes a multifunctional protein that is required for lytic-phase DNA replication. UL84 interacts closely with UL122 (IE2) during infection and regulates IE2-mediated transcriptional activation (Gao et al., 2008). UL84 is a DNA-binding protein that interacts with the oriLyt promoter as well as with viral (Strang et al., 2009) and cellular DNA replication factors (Kagele et al., 2009); as such, UL84 is proposed to be the HCMV origin initiator protein. UL84 also binds to and shuttles viral mRNA (Gao et al., 2010) and interacts with an RNA stem-loop that forms within the essential region of oriLyt (Colletti et al., 2007). Intriguingly, despite this array of functions, a single amino acid change in UL122 (IE2) allows cytomegalovirus to remain viable in the absence of UL84 (Spector, 2015).

The UL112/UL113 locus is transcribed to yield spliced transcripts that encode four phosphoproteins of varying sizes. The phosphoproteins are expressed early in infection and localize to viral DNA replication compartments within the cell nucleus (Ahn et al., 1999; Yamamoto et al., 1998). These proteins are postulated to recruit UL44 to pre-replication compartments for efficient DNA replication (Park et al., 2006). Although all four phosphoproteins have been implicated in UL44 recruitment, only two of the isoforms are required for efficient viral replication (Schommartz et al., 2017).

Perhaps the best-characterized gene in the human cytomegalovirus genome, UL122 (IE2; immediate-early 2) encodes the primary transcriptional regulator of early and late viral genes (Hermiston et al., 1987). The major immediate-early (MIE) enhancer-promoter controls expression of the MIE locus transcript, which is spliced to generate the gene products of UL123 (IE1) and UL122 (IE2): exons 1, 2, 3 and 4 comprise IE1, while exons 1, 2, 3 and 5 produce IE2 (Stenberg et al., 1984). The major isoform of IE2 is roughly 86 kDa and is therefore referred to as IE86; deletion of this isoform results in a nonviable virus that cannot express other viral genes (Marchini et al., 2001). IE86 represses its own promoter (the MIEP), disables promoters involved in the cellular immune response, activates viral and cellular promoters that benefit viral replication, pushes the cell cycle into S phase, inhibits cellular DNA replication and activates lytic-phase viral DNA replication (Ahn et al., 1999; Petrik et al., 2006; Pizzorno and Hayward, 1990; Schwartz et al., 1994; Taylor and Bresnahan, 2006; Wiebusch et al., 2003). Although the other IE2 isoforms (besides IE86) are dispensable for viral replication, they contribute to its overall efficiency (White et al., 2007). Consequently, recent efforts to identify druggable targets for anti-HCMV therapy have focused on disrupting IE2 function, with encouraging results (Beelontally et al., 2017; Mercorelli et al., 2016).

DNA-Binding Protein UL34

Global mutagenesis studies that targeted each HCMV gene individually identified the genes that are essential for viral replication in human fibroblasts (Dunn et al., 2003; Yu et al., 2003). Of the 162 ORFs tested, 45 were essential for replication; of these, only 4 ORFs (UL34, UL60, UL84 and UL90) were specific to cytomegalovirus and not conserved among other herpesviruses. Identifying the functions of these cytomegalovirus-specific essential genes is of interest to herpesvirology and to microbial pathology more broadly.

UL34 encodes two sequence-specific DNA-binding proteins that are expressed at early and late times post infection (Biegalke et al., 2004; Rana and Biegalke, 2014). The early and late UL34 proteins are highly similar, differing only by 21 amino acids at the amino terminus. Both UL34 proteins act as transcriptional repressors of the non-essential immunoevasion genes US3 and US9 through sequence-specific binding near the transcription start sites (LaPierre and Biegalke, 2001). Both proteins also shuttle into and out of the cell nucleus, where they co-localize with IE2 and UL44 in viral DNA replication compartments (Biegalke, 2013; Biegalke et al., 2004; Rana and Biegalke, 2014). The UL34 DNA-binding site has been identified as a 10-nucleotide response element (AAACACCGT[G/T]) that occurs at 14 locations throughout the viral genome of strain AD169 (LaPierre and Biegalke, 2001; Liu and Biegalke, 2013). The UL34-binding sites are located within transcriptional regulatory regions, within open reading frames, and within the region flanking oriLyt. Specifically, the three binding sites near oriLyt lie near the UL57 and oriLyt promoters, next to the leftward boundary of the oriLyt core region, and within the essential 4.9-kilobase non-coding RNA (RNA4.9). The locations of these binding sites suggest a regulatory role for UL34 in viral DNA replication through direct or indirect interactions with other essential replication factors.
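Because the response element is a short, fixed pattern, candidate sites can be located computationally by scanning both strands of a genome sequence. The Python sketch below illustrates that idea under stated assumptions: it reports exact matches only (the predictions described in this work may have used a different tool or tolerated mismatches), it expects the genome as a plain A/C/G/T string such as one read from a FASTA file, and the function names and toy sequence are hypothetical.

```python
import re

# UL34 response element from the text: AAACACCGT followed by G or T (10 nt).
# The motif cannot overlap itself, so non-overlapping finditer misses nothing.
UL34_MOTIF = re.compile(r"AAACACCGT[GT]")
MOTIF_LEN = 10

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A/C/G/T/N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_ul34_sites(genome: str) -> list[tuple[int, int, str]]:
    """Exact matches to the UL34 motif on both strands.

    Returns (start, end, strand) tuples in 1-based, inclusive coordinates of
    the input (plus-strand) sequence, sorted by start position.
    """
    genome = genome.upper()
    n = len(genome)
    hits = []
    # Plus strand: match positions map directly onto the input sequence.
    for m in UL34_MOTIF.finditer(genome):
        hits.append((m.start() + 1, m.start() + MOTIF_LEN, "+"))
    # Minus strand: search the reverse complement, then convert coordinates
    # back to the plus strand (reverse-complement index i <-> plus index n-1-i).
    for m in UL34_MOTIF.finditer(reverse_complement(genome)):
        end = n - m.start()
        hits.append((end - MOTIF_LEN + 1, end, "-"))
    return sorted(hits)


# Toy sequence with one site on each strand.
toy = "GG" + "AAACACCGTG" + "CCTTT" + reverse_complement("AAACACCGTT") + "GG"
print(find_ul34_sites(toy))  # [(3, 12, '+'), (18, 27, '-')]
```

Scanning the reverse complement, rather than writing a second reverse-complemented pattern, keeps a single definition of the motif; the only extra step is the coordinate conversion back to the plus strand.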

Summary

Human cytomegalovirus is a significant human pathogen that undergoes a complex and lengthy replication cycle, contains a large dsDNA genome (relative to other viral pathogens), modulates the host immune response, and acts opportunistically by replicating lytically or remaining latent until conditions favor reactivation (Murphy and Shenk, 2008; Powers et al., 2008; Sinclair and Sissons, 2006). With few exceptions, the cytomegalovirus genes that are essential for replication in cell culture have known functions (Dunn et al., 2003). Despite extensive research into the biology of this virus, no licensed vaccine is available, drug toxicity remains an issue, and the virus develops resistance to these compounds relatively rapidly (Luisi et al., 2017; Piret and Boivin, 2017). Targeting an essential but CMV-specific protein, such as UL34, might yield a drug with lower toxicity than those that target viral components with homology to human proteins. However, the properties that make the UL34 proteins essential for viral replication are currently unknown (Liu and Biegalke, 2013; Rana and Biegalke, 2014).

The purpose of this work was to broaden our understanding of the UL34 proteins, focusing primarily on their DNA-binding activity. The genomes of 47 HCMV isolates were analyzed in silico for predicted UL34-binding sites; these binding sites were conserved among HCMV strains independent of tissue site and geographic location. The DNA-binding profile of the early UL34 protein was expanded by examining its binding to both the viral and human genomes within infected cells; previously, these analyses were performed in vitro, using experiments that targeted the known binding sequence (Liu and Biegalke, 2013). These studies demonstrated that the UL34 proteins interact with the viral oriLyt region. Subsequent mutagenesis of the UL34-binding sites within the viral oriLyt region demonstrated that the binding sites contribute to efficient viral replication.

Within the oriLyt region, UL34-DNA binding interactions occur in close proximity to the known DNA-binding sites of essential viral DNA replication proteins, and protein-protein interaction studies demonstrated that the UL34 proteins interact with viral proteins essential for DNA replication. Together, these results demonstrate that the UL34 proteins bind DNA at more sites than previously thought, and that their interactions within the oriLyt region are involved in both oriLyt-dependent DNA replication and overall viral growth.

CHAPTER 2: MATERIALS AND METHODS

Human cell culture.

Primary human diploid fibroblasts (HDFs) were maintained in Dulbecco’s modified Eagle’s medium supplemented with 100 µg/mL streptomycin, 100 units/mL penicillin, 2 mM glutamine and 10% NuSerum (Becton Dickinson, Franklin Lakes, NJ).

Oligonucleotides.

Table 1. List of oligonucleotides used in this work.

Identifier | Sequence, 5’ – 3’ | Description
537 | AAAGGCCTGACCGCACCCCAACCGGC | For site directed mutagenesis (SDM) of pSP50 at UL34-oriLyt-3.
538 | AAAGGCCTTACAGTTTCGGGGCCATTTGAACAGAGAAAGGTGGG | For SDM of pSP50 at UL34-oriLyt-1.
539 | AACATATGGTTTTTAAATCACAAAGAACCGCCTGACGG | For SDM of pSP50 at UL34-oriLyt-1.
544 | TCGCATGCTCCTCTAGACTCGAG | Within pCR-Blunt vector multiple cloning site.
545 | TGGGAGCTCTCCGGATCCAA | Within pCR-Blunt vector multiple cloning site.
557 | AAAGGCCTTACAGTTTCGGGGCCATTTGAACAG | For SDM of pSP50 at UL34-oriLyt-3.
568 | TCAGTGCGCATGCGTCGGTAAAATTCC | For SDM of pSP50 near unique NotI site.
569 | CTAGGTGGGCGGAGCGGTAATTTTCC | For SDM of pSP50 near unique NotI site.
570 | CATGAAACAGATCTATGGGGAACGGTGTTGT | For SDM of pSP50 at UL34-oriLyt-2.
571 | ACAGATCTTCATGGCAATGGGCGAAAGTTAC | For SDM of pSP50 at UL34-oriLyt-2.
577 | AAGTCGACGTCACGGGGAATCACTATGTAC | To create GST-tagged IE2 exon 5.
581 | TTGGATCCATGCTGCCTATCTACGAGAC | To create GST-tagged IE2 exon 5.
586 | GGACCTATACTATTACCGCCCCACCGCCGTCGTCGTCATGATGGAGCAGAAGCTGATCTCAGAGGAGGACCTGAACTTCAAGTAGGGATAACAGGGTAATCGATTT | To generate transfer construct for Red recombination adding myc-tag to UL34.
585 | CGCATCTCGGCGGCTCGCAGGACTGAATCGTCGTTGGAGAAGTCTCGGTGGTGATGATGAAGTTCAGGTCCTCCTCTGAGATCAGCTTCTGCTCCATCATGACGACGGCCAGTGTTACAACCAATTAACC | To generate transfer construct for Red recombination adding myc-tag to UL34.
596 | GATCGGTCTAGAAACACCGTG | To revert mutation at UL34-oriLyt-2. Phosphorylated.
597 | GATCCACGGTGTTTCTAGACC | To revert mutation at UL34-oriLyt-2. Phosphorylated.
598 | AGCTAACAAACCGTGAAAAGTCACGTTTCACGCATATGGTTTTTAAATTAGGGATAACAGGGTAATCGATTT | To generate transfer construct for Red recombination at UL34-oriLyt-1.
599 | CAGGCGGTTCTTTGTGATTTAAAAACCATATGCGTGAAACGTGACTTTGCCAGTGTTACAACCAATTAACC | To generate transfer construct for Red recombination at UL34-oriLyt-1.
612 | GTCAGACCCCGTAGAAAAGATC | For qPCR to quantify plasmid DNA. Targets plasmid vector sequence.
613 | GAGTTGGTAGCTCTTGATCCG | For qPCR to quantify plasmid DNA. Targets plasmid vector sequence.
614 | TGTTCCACTCGAAATAGGCTC | For qPCR to quantify viral DNA. Targets US3 genomic sequence.
615 | CGTCTCTCAATCTTACATGGACAG | For qPCR to quantify viral DNA. Targets US3 genomic sequence.
616 | Cy5/CTGCGCGTA/TAO/ATCTGCTGCTTGC/3IAbRQSp | Cy5-labeled probe targeting pCR-Blunt vector.
617 | 6FAM/CGCTCATCA/ZEN/TGGTGGTACTGCTCA/3IABkFQ | 6FAM-labeled probe targeting US3.
624 | CCACCTTTCTCTGTTCAAATGGCCCCGAAACTGTAAGGCCTGACCGCACCCTAGGGATAACAGGGTAATCGATTT | To generate transfer construct for Red recombination at UL34-oriLyt-3.
625 | AAATGGCGCCGGTTGGGGTGCGGTCAGGCCTTACAGTTTCGGGGCCAGCCAGTGTTACAACCAATTAACC | To generate transfer construct for Red recombination at UL34-oriLyt-3.
626 | CTTGTAAAAAGTAACTTTCGCCCATTGCCATGAAGATCTATGGGGAACTAGGGATAACAGGGTAATCGATTT | To generate transfer construct for Red recombination at UL34-oriLyt-2.
627 | GTCGACACACAACACCGTTCCCCATAGATCTTCATGGCAATGGGCGAAGCCAGTGTTACAACCAATTAACC | To generate transfer construct for Red recombination at UL34-oriLyt-2.
664 | CTTGGTGTTAGGGAGAACTC | For qPCR to quantify viral genomes. Targets UL44 genomic sequence.
665 | ATCGCAACTCCGGCAATTA | For qPCR to quantify viral genomes. Targets UL44 genomic sequence.
666 | 6FAM/ACGTTACAG/ZEN/AATCCTCGCTGTCGC/3IABkFQ | 6FAM-labeled probe targeting UL44.
669 | TACAATCTAGATCTATCTGGCGGCGAGCGTA | For SDM of pSP50 at a non-UL34 binding site.
670 | TACGCTCGCCGCCAGATAGATCTAGATT | For SDM of pSP50 at a non-UL34 binding site.
685 | AGAGCTACGACGTGCCTGAC | For the detection of cDNA derived from beta-actin transcripts.
686 | AGCACTGTGTTGGCGTACAG | For the detection of cDNA derived from beta-actin transcripts.
696 | GGCAAGACGTCCCGGAGAAGA | For the detection of cDNA derived from RNA4.9 transcripts.
697 | TGTGGTTCGTCGTACTCACAGTCT | For the detection of cDNA derived from RNA4.9 transcripts.
698 | AGGCTGCGTTCCACACCGTT | For the detection of cDNA derived from UL57 transcripts.
699 | CTTGGTGACCGTGCCCGTGA | For the detection of cDNA derived from UL57 transcripts.

Plasmid mutagenesis and cloning.

A plasmid containing the wild-type oriLyt region from HCMV strain AD169 (pSP50) was kindly provided by Greg Pari (University of Nevada, Reno). Site-directed mutagenesis was performed to convert the UL34-binding sites within the oriLyt plasmid into unique restriction enzyme recognition sites. The first binding site (UL34-ori1), corresponding to nucleotides 91,787–91,796 in HCMV strain AD169 (Accession # FJ527563), was mutated into an NdeI site. The second binding site (UL34-ori2), corresponding to nucleotides 92,213–92,222, was mutated into a BglII site. The third binding site (UL34-ori3), corresponding to nucleotides 95,509–95,518, was mutated into a StuI site. Double mutant plasmids (UL34-ori1+3 and UL34-ori2+3) were created by further modifying the single mutant plasmids. Oligonucleotides 538 and 539 were used to convert UL34-binding site 1 to an NdeI site. Oligonucleotides 570 and 571 were used to convert binding site 2 to a BglII site, and oligonucleotides 537 and 557 were used to convert binding site 3 to a StuI site. Oligonucleotides 669 and 670 were used to convert a non-UL34-binding site to a BglII site. For manipulation of plasmid sequences, amplimers were also generated that used either the unique restriction enzyme sites in the vector (oligonucleotides 544 and 545) or the unique NotI site present in oriLyt (oligonucleotides 568 and 569). pBJ895 has UL34-binding site 2 mutated; pBJ896, binding site 1; pBJ905, binding site 3; pBJ902, binding sites 1 and 3; and pBJ903, binding sites 2 and 3. To restore the mutated UL34-binding site 2, pBJ912 was constructed from pBJ895. Annealed, phosphorylated oligonucleotides 596 and 597, which contain the wild-type UL34-binding site, were inserted into the BglII site. Altered regions of all plasmids were verified by sequencing.
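The restriction-site design described above can be checked computationally. The sketch below is a hypothetical helper and was not part of the original workflow. It scans a subset of the mutagenic oligonucleotides listed in Table 1 for the NdeI (CATATG), BglII (AGATCT), and StuI (AGGCCT) recognition sequences. All three sites are palindromic, so scanning one strand is sufficient.

```python
# Recognition sequences (5'->3') of the enzymes named in the text. All three
# are palindromic, so checking one strand of each oligonucleotide is enough.
SITES = {"NdeI": "CATATG", "BglII": "AGATCT", "StuI": "AGGCCT"}

# Mutagenic oligonucleotides copied from Table 1, with the intended change.
OLIGOS = {
    "539": "AACATATGGTTTTTAAATCACAAAGAACCGCCTGACGG",  # site 1 -> NdeI
    "570": "CATGAAACAGATCTATGGGGAACGGTGTTGT",         # site 2 -> BglII
    "571": "ACAGATCTTCATGGCAATGGGCGAAAGTTAC",         # site 2 -> BglII
    "537": "AAAGGCCTGACCGCACCCCAACCGGC",              # site 3 -> StuI
    "557": "AAAGGCCTTACAGTTTCGGGGCCATTTGAACAG",       # site 3 -> StuI
    "669": "TACAATCTAGATCTATCTGGCGGCGAGCGTA",         # non-UL34 site -> BglII
}


def sites_in(seq):
    """Return the names of enzymes whose recognition sequence occurs in seq."""
    seq = seq.upper()
    return sorted(name for name, site in SITES.items() if site in seq)


for oligo_id, seq in OLIGOS.items():
    print(oligo_id, sites_in(seq))
```

Each oligonucleotide should report exactly the enzyme named in its comment. An empty or unexpected list would flag a transcription error in the primer sequence.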

Transient transfection and total DNA isolation.

Prior to use in transfection, plasmids were purified with the EndoFree Maxiprep Kit (QIAGEN) according to the manufacturer's instructions. All transfections were performed in biological duplicate. Equivalent amounts (5 µg) of each plasmid were prepared in serum-free growth medium in the presence of DEAE-dextran and layered on top of 80% confluent fibroblasts in 60-mm dishes. After 4 hours, transfection was halted by washing the cells with serum-free growth medium and replacing it with standard growth medium. Cells were allowed to recover for ~36 hours before being infected with HCMV strain Towne at an MOI of 10 plaque-forming units (pfu) per cell. At 5 days post infection, total DNA (human genomic, viral genomic, and plasmid) was extracted with the Quick-DNA Universal Kit (Zymo Research, Irvine, CA) according to the manufacturer's instructions.

Southern blotting.

Equivalent amounts (1 µg) of each total DNA sample were digested with two enzymes. KpnI generated oriLyt-containing fragments of different sizes from the viral genome and the plasmid. DpnI further fragmented input plasmid DNA: it cleaves only Dam-methylated GATC sequences, so input plasmid isolated from E. coli is digested, whereas plasmid DNA replicated in human cells is unmethylated and resistant. OriLyt sequences were detected by hybridization to an oriLyt-specific probe that was labeled and detected using the DIG High Prime DNA Labeling and Detection Kit (Roche Applied Sciences, Penzberg, Germany) according to the manufacturer's instructions.

Quantitative polymerase chain reactions.

Multiplex quantitative polymerase chain reaction (qPCR) was performed using primers and labeled probes that recognize viral, plasmid, or positive control targets. DNA samples extracted from transfected cells were digested with DpnI to eliminate detection of input plasmid DNA. For detection of replicated plasmid, primers targeting the plasmid backbone (612 and 613) were used in combination with the Cy5-labeled probe 616. For detection of the viral genome, primers targeting genomic sequences in the viral gene US3 (614 and 615) were used in combination with the 6FAM-labeled probe 617.
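The text does not state how the multiplex Cq values were converted into replication levels. One common approach is sketched below as an assumption, not as the method used here. It expresses DpnI-resistant (replicated) plasmid relative to viral genomes using the Cq difference between the Cy5 (plasmid) and 6FAM (US3) channels in the same well. The sketch assumes both amplicons amplify with equal efficiency. The function names and example Cq values are hypothetical.

```python
def replicated_plasmid_per_genome(cq_plasmid, cq_viral, efficiency=1.0):
    """Relative DpnI-resistant plasmid per viral genome from one multiplex well.

    Assumes both amplicons share the same efficiency (1.0 = perfect doubling),
    so each cycle of Cq difference is a (1 + efficiency)-fold difference.
    """
    return (1.0 + efficiency) ** (cq_viral - cq_plasmid)


def relative_to_wild_type(mutant_cqs, wild_type_cqs):
    """Mutant plasmid replication as a fraction of wild-type pSP50.

    Each argument is a (cq_plasmid, cq_viral) tuple.
    """
    return (replicated_plasmid_per_genome(*mutant_cqs)
            / replicated_plasmid_per_genome(*wild_type_cqs))


# Hypothetical Cq values: (Cy5 plasmid channel, 6FAM US3 channel)
wt_pSP50 = (24.0, 20.0)
mutant = (26.0, 20.0)
print(relative_to_wild_type(mutant, wt_pSP50))  # 0.25
```

Normalizing to the viral US3 signal corrects for differences in infection and DNA recovery between dishes. This correction is reliable only if the two amplicons behave similarly.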

Viral genome copy numbers were determined by qPCR using primers 664 and 665 and the 6FAM-labeled probe 666, which target the HCMV genome within the UL44 open reading frame. qPCR reactions were run on a Bio-Rad CFX96 system coupled to a Bio-Rad C1000 thermocycler. Takyon No ROX Mastermix was used for amplification. All reactions included a qPCR internal positive control labeled with Yakima Yellow (Eurogentec, Liège, Belgium).

RT-qPCR.

Human fibroblasts were infected with the wild-type and UL34-oriLyt mutant viruses using an equivalent number of genomes, corresponding to 0.1 pfu/cell for the wild-type virus. At 48 hours post infection, total RNA was isolated with the QIAGEN RNeasy Mini Kit and converted to cDNA with the New England Biolabs OneTaq RT-PCR Kit according to the manufacturer's instructions. The following primer pairs were used to detect cDNA:

- UL57 transcripts: primers 698 and 699
- RNA4.9: primers 696 and 697
- the housekeeping gene beta-actin: primers 685 and 686
- UL44: primers 664 and 665

Targets were detected with Bio-Rad SsoAdvanced Universal SYBR Green Supermix. Melt-curve analysis verified that each primer pair generated a single product.
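The normalization method for the RT-qPCR data is not specified here. The sketch below shows the widely used 2^-ΔΔCq calculation. It assumes roughly 100% amplification efficiency for each primer pair, uses beta-actin as the reference gene, and treats wild-type-infected cells as the calibrator. The Cq values are hypothetical.

```python
def fold_change_ddcq(cq_target, cq_reference, cq_target_cal, cq_reference_cal):
    """Relative expression by the 2^-ddCq method.

    cq_target / cq_reference: target (e.g. UL57 or RNA4.9) and reference
    (beta-actin) Cq values in the sample of interest (e.g. mutant-infected).
    *_cal: the same two Cq values in the calibrator (e.g. wild-type-infected).
    """
    dcq_sample = cq_target - cq_reference
    dcq_calibrator = cq_target_cal - cq_reference_cal
    return 2.0 ** -(dcq_sample - dcq_calibrator)


# Hypothetical: UL57 crosses threshold one cycle later in mutant-infected cells
print(fold_change_ddcq(23.0, 18.0, 22.0, 18.0))  # 0.5
```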

Generation of mutant viruses.

The “en passant” two-step Red-mediated recombination system was used to generate mutant HCMV BACs (Tischer et al., 2010). Transfer constructs were generated by polymerase chain reaction (PCR) amplification of the kanamycin resistance gene and I-SceI site in pEP-KanS (a gift from Dr. Nikolaus Osterrieder, Free University of Berlin, Germany). The primers carried 5' extensions containing the desired mutation. The following oligonucleotide pairs were used:

- UL34-oriLyt-1: 598 and 599
- UL34-oriLyt-2: 626 and 627
- UL34-oriLyt-3: 624 and 625
- myc tag at the amino terminus of UL34: 586 and 585

The transfer constructs were electroporated into E. coli GS1783 cells (a gift from Dr. Gregory Smith, Northwestern University Medical School, Chicago, USA) harboring the parental HCMV BAC, pHB5 (a gift from Dr. Ulrich Koszinowski, Max von Pettenkofer-Institute, Germany).
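Each transfer-construct primer in Table 1 consists of a 5' homology extension followed by a shared 3' sequence. That shared sequence presumably primes on pEP-KanS. Forward primers end in TAGGGATAACAGGGTAATCGATTT, which begins with the 18-bp I-SceI site. Reverse primers end in GCCAGTGTTACAACCAATTAACC. In the en passant design, the two homology extensions overlap, so the duplicated sequence can drive the second, I-SceI-induced recombination step. The Python sketch below uses hypothetical helper names. It splits primers 598 and 599 into their two parts and measures the overlap between their extensions.

```python
# Shared 3' ends of the Table 1 transfer-construct primers (assumed to prime
# on pEP-KanS); the forward end begins with the I-SceI site TAGGGATAACAGGGTAAT.
FWD_PRIMING = "TAGGGATAACAGGGTAATCGATTT"
REV_PRIMING = "GCCAGTGTTACAACCAATTAACC"

# Oligonucleotides 598 and 599 (UL34-oriLyt-1), copied from Table 1.
P598 = ("AGCTAACAAACCGTGAAAAGTCACGTTTCACGCATATGGTTTTTAAAT"
        "TAGGGATAACAGGGTAATCGATTT")
P599 = ("CAGGCGGTTCTTTGTGATTTAAAAACCATATGCGTGAAACGTGACTTT"
        "GCCAGTGTTACAACCAATTAACC")


def revcomp(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def split_transfer_primer(primer):
    """Return (5' homology extension, 3' template-priming sequence)."""
    for priming in (FWD_PRIMING, REV_PRIMING):
        if primer.endswith(priming):
            return primer[: -len(priming)], priming
    raise ValueError("no known pEP-KanS priming sequence at the 3' end")


def overlap_length(fwd_ext, rev_ext):
    """Longest suffix of the forward extension that equals a prefix of the
    reverse-complemented reverse extension (the duplicated region)."""
    rc = revcomp(rev_ext)
    for n in range(min(len(fwd_ext), len(rc)), 0, -1):
        if fwd_ext.endswith(rc[:n]):
            return n
    return 0


ext598, _ = split_transfer_primer(P598)
ext599, _ = split_transfer_primer(P599)
print(len(ext598), len(ext599), overlap_length(ext598, ext599))
```

For this pair, both extensions are 48 nt long and share a 32-nt duplicated region that contains the new NdeI site (CATATG).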

Two independently isolated mutant HCMV BACs were analyzed to reduce the likelihood that observed effects resulted from a random mutation. Insertion of the desired mutation was confirmed by purification and sequencing. The resulting mutant BACs, along with a plasmid encoding pp71, were then transfected into HDFs using Lipofectamine 3000. Immunofluorescence was performed as described previously (Rana and Biegalke, 2014), using the mouse monoclonal anti-myc-tag antibody used for ChIP (Cell Signaling). For the viruses containing mutations in the UL34-oriLyt binding sites, the number of genome copies per pfu was determined relative to the parental pHB5 virus. Virus stocks were centrifuged through a 20% sucrose cushion at 80,000 × g for 1 hour. Viral DNA was extracted from the pellet using the DNeasy Blood & Tissue Kit (QIAGEN) and quantified as described in “Quantitative polymerase chain reactions”. An alternative version of this protocol was developed and is described in Chapter 5.
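The RT-qPCR and one-step growth curve experiments infect cells with equivalent numbers of viral genomes rather than equivalent pfu. The arithmetic is sketched below. It assumes that genome copies per mL come from the UL44 qPCR and pfu per mL from plaque assays. The cell number and titers in the example are hypothetical.

```python
def genomes_per_pfu(genomes_per_ml, pfu_per_ml):
    """Genome copies per plaque-forming unit for one virus stock."""
    return genomes_per_ml / pfu_per_ml


def relative_genomes_per_pfu(mutant, parental):
    """Genome/pfu ratio of a mutant stock relative to parental pHB5 virus.

    Each argument is a (genomes_per_ml, pfu_per_ml) tuple.
    """
    return genomes_per_pfu(*mutant) / genomes_per_pfu(*parental)


def equal_genome_inoculum_ml(cells, wt_moi, wt_stock, stock_genomes_per_ml):
    """Volume (mL) of a stock that delivers as many genomes as wild-type
    virus used at wt_moi pfu/cell; wt_stock is (genomes_per_ml, pfu_per_ml)."""
    genomes_needed = cells * wt_moi * genomes_per_pfu(*wt_stock)
    return genomes_needed / stock_genomes_per_ml


wt = (1e8, 1e6)        # hypothetical wild-type stock: genomes/mL, pfu/mL
mutant = (2e8, 5e5)    # hypothetical mutant stock
print(relative_genomes_per_pfu(mutant, wt))  # 4.0
print(equal_genome_inoculum_ml(5e5, 0.1, wt, mutant[0]))
```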

One-step growth curve.

Human diploid fibroblasts were grown in 60-mm cell culture dishes as described above. Cells were infected with the wild-type and oriLyt mutant viruses at an equivalent number of viral genomes (see “Quantitative polymerase chain reactions”), corresponding to 0.01 pfu/cell for the wild-type virus. Supernatants were collected at multiple time points post infection (4, 24, 48, 72, 96, 132, 168, 240, and 312 hours), and fresh medium was applied after each collection. The titer (pfu) of each time-point supernatant was then determined for each isolate.

Chromatin immunoprecipitation.

For chromatin immunoprecipitation (ChIP) experiments, 7 × 10^6 cells were infected with Myc-UL34-AD169 at an MOI of 5 pfu/cell. The SimpleChIP Enzymatic Chromatin IP Kit w/ Magnetic Beads (Cell Signaling) was used according to the manufacturer's instructions. At 48 hours post infection, protein-DNA complexes were crosslinked for 10 minutes by adding 37% formaldehyde to a final concentration of 1%.
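The volume of 37% formaldehyde needed to reach 1% follows from a simple mass balance. Let V_m be the volume of medium on the cells, c_s = 37% the stock concentration, and c_f = 1% the final concentration. The actual medium volume is not stated, so 10 mL is used below purely as an illustration.

```latex
c_s V_a = c_f \,(V_m + V_a)
\quad\Longrightarrow\quad
V_a = \frac{c_f}{c_s - c_f}\, V_m = \frac{1}{36}\, V_m,
\qquad
V_m = 10\ \text{mL} \;\Rightarrow\; V_a \approx 0.28\ \text{mL}.
```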

Crosslinking was then quenched with glycine, and cells were washed with PBS and harvested in the presence of protease inhibitors. After washing and permeabilization with the provided buffers, chromatin was digested with micrococcal nuclease. The enzyme concentration was determined empirically to yield the optimal proportion of 150–900 bp fragments. Further digestion was halted with 0.5 M EDTA, and nuclei were pelleted and resuspended in the provided ChIP buffer. Nuclei were gently sonicated (Branson Sonifier 450; three sets of 20-second pulses at output setting 3 and 20% duty cycle) to disrupt the nuclear membrane and further fragment the protein-DNA complexes. Of this sample, 2% (Whole Cell Extract) was aliquoted and stored at -80 °C until further use.

Each chromatin preparation (10 µg) was first pre-cleared with protein G magnetic beads (Cell Signaling) to remove nonspecific protein binding interactions. It was then immunoprecipitated with a mouse mAb specific for the myc-tag (Cell Signaling) and protein G magnetic beads. Lysates from the pre-clearing step were used as ‘No Antibody’ controls. Protein-DNA complexes were eluted from the beads, and protein-DNA crosslinks were reversed by incubating the chromatin overnight at 65 °C in the presence of 0.2 M NaCl and 40 µg proteinase K. DNA fragments were purified with the provided spin columns according to the manufacturer's instructions.

Next-generation sequencing.

Sequencing of ChIP-enriched DNA was performed on an Illumina MiSeq using the TruSeq ChIP Sample Preparation kit protocol (Illumina, California, USA) according to the manufacturer's instructions, with minor variations. Briefly, ChIP-enriched DNA was fragmented using a Covaris Ultrasonicator (Covaris, Massachusetts, USA). It was then run on an Agilent High Sensitivity DNA chip on an Agilent 2100 Bioanalyzer (Agilent, California, USA) to confirm the fragment sizes and quantify the DNA. Next, DNA ends were repaired to generate fully double-stranded DNA molecules, and the resulting DNA was purified using Agencourt AMPure XP beads (Beckman Coulter, Indiana, USA). The 3' ends of the purified dsDNA were adenylated, and adapters containing the necessary barcoding and sequencing sites were ligated to the A-overhang. The samples were again purified using Agencourt AMPure XP beads. Bead-purified samples were then size-selected using a Pippin Prep automated size selection apparatus (Sage Science, Massachusetts, USA) with a pre-cast, dye-free 2% agarose gel cassette, and the 150–350 bp size range was collected. The ligated library was enriched using 18 cycles of PCR and purified with Agencourt AMPure XP beads. Libraries were run on an Agilent High Sensitivity DNA chip on an Agilent 2100 Bioanalyzer to determine the size distribution and approximate quantity. Precise quantification was performed on a Stratagene Mx3000P Real-Time PCR machine (Agilent, California, USA) using the Kapa Biosystems Complete Kit (ROX Low) (Kapa Biosystems, Cape Town, South Africa) according to the manufacturer's instructions. Enriched and quantified libraries were normalized to 10 nM with Tris-HCl (pH 8.5) containing 0.1% Tween 20, pooled in equimolar ratios, and diluted in 10 volumes of Tris-Tween buffer. Pooled libraries were denatured using 0.1 N NaOH for 5 minutes at room temperature, diluted with HT1 buffer, and combined with denatured PhiX control (10%). The combined denatured libraries were then sequenced on an Illumina MiSeq using MiSeq 2 × 75 paired-end V3 chemistry.
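Normalizing libraries to 10 nM requires converting mass concentration to molarity. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA and a mean fragment length, for example from the Bioanalyzer trace. The Kapa qPCR assay reports concentration directly, so this conversion is illustrative only. The input values are hypothetical.

```python
BP_MASS_G_PER_MOL = 660.0  # approximate mass of one base pair of dsDNA


def library_nM(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA library concentration (ng/uL) to nM.

    1 ng/uL = 1e-3 g/L; dividing by the molar mass (660 g/mol per bp x length)
    gives mol/L, and multiplying by 1e9 gives nM, hence the net factor of 1e6.
    """
    return conc_ng_per_ul * 1e6 / (BP_MASS_G_PER_MOL * mean_fragment_bp)


def dilute_to(stock_nM, target_nM=10.0, final_ul=20.0):
    """Return (library uL, diluent uL) that give target_nM in final_ul."""
    if stock_nM < target_nM:
        raise ValueError("stock is already below the target concentration")
    library_ul = final_ul * target_nM / stock_nM
    return library_ul, final_ul - library_ul


stock = library_nM(5.0, 300)  # hypothetical reading: 5 ng/uL, 300-bp mean size
print(round(stock, 2), dilute_to(stock))
```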

Genomics data analysis.

Reads were trimmed with Trimmomatic (Bolger et al., 2014) and aligned with the Burrows-Wheeler Aligner (BWA; Li and Durbin, 2009) to a custom reference genome. This reference consisted of the human reference genome (GRCh38.p10), with the HCMV AD169 reference genome (Accession: FJ527563.1) added as an additional chromosome to prevent read assignment bias. The reads were then imported into CLC Genomics Workbench (QIAGEN) for visualization. Peak calling was performed on the aligned reads with two methods:

(1) The Transcription Factor ChIP-seq module within CLC Genomics Workbench compared the reads from each experimental sample against the reads from the Whole Cell Extract sample, using a p-value threshold of 0.01.

(2) Fold enrichment of pUL34 binding to the oriLyt region was calculated within a 125 bp (25 bp overhang) sliding window across the oriLyt region. In each window, the number of reads in the immunoprecipitated samples was divided by the number of reads in the Whole Cell Extract samples. A target enrichment of 4-fold was chosen for peak calling, representing an increase over background of at least 2-fold.

Both methods gave highly similar, though not identical, results for the HCMV oriLyt region peaks.
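The fold-enrichment method (2) can be sketched in Python as shown below. This is an illustration under stated assumptions, not the CLC Genomics Workbench implementation. The sketch makes four assumptions:

- each read is represented by a single coordinate;
- the 125-bp windows advance by 100 bp, reading "25 bp overhang" as a 25-bp overlap between adjacent windows;
- raw counts are compared without library-size normalization;
- a pseudocount of 1 avoids division by zero.

```python
from bisect import bisect_left


def window_counts(positions, start, end, width=125, step=100):
    """Count read positions in windows [s, s + width) tiled from start to end;
    step < width makes adjacent windows overlap."""
    pos = sorted(positions)
    counts = []
    s = start
    while s + width <= end:
        n = bisect_left(pos, s + width) - bisect_left(pos, s)
        counts.append((s, n))
        s += step
    return counts


def enriched_windows(ip_positions, wce_positions, start, end,
                     width=125, step=100, threshold=4.0, pseudocount=1.0):
    """Windows where (IP + pseudocount) / (WCE + pseudocount) >= threshold."""
    ip = window_counts(ip_positions, start, end, width, step)
    wce = window_counts(wce_positions, start, end, width, step)
    hits = []
    for (s, n_ip), (_, n_wce) in zip(ip, wce):
        fold = (n_ip + pseudocount) / (n_wce + pseudocount)
        if fold >= threshold:
            hits.append((s, s + width, fold))
    return hits


# Hypothetical toy data: 20 IP reads and 2 WCE reads at position 150,
# plus one WCE read at position 10.
ip_reads = [150] * 20
wce_reads = [150, 150, 10]
print(enriched_windows(ip_reads, wce_reads, 0, 325))  # [(100, 225, 7.0)]
```

Sorting the positions once and using binary search keeps each window count logarithmic, which matters little for the ~230-kb viral genome but would for genome-wide scans.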

Protein-protein interactions.

For immunoprecipitations, cells were either mock-infected or infected with HCMV (strain Towne or Myc-UL34-AD169). Cell lysates were harvested at 48 hpi in co-immunoprecipitation buffer (20 mM Tris-HCl, 150 mM NaCl, 1 mM EDTA, 0.5% NP40, protease inhibitor cocktail) and precleared with normal rabbit serum. They were then immunoprecipitated with antisera to pUL34 (Biegalke et al., 2004) or with a commercially prepared monoclonal anti-myc-tag antibody (Cell Signaling). Immunoprecipitates were analyzed by immunoblot with the indicated antibody. Antibody sources were as follows:

- mouse monoclonal antibodies to UL44 (ICP36) and to UL84: Virusys Corporation, Taneytown, MD, USA
- mouse monoclonal antibody to IE2: Millipore, Temecula, CA, USA
- mouse monoclonal antibody to IE1 (pp72): Research Diagnostics, Inc., Flanders, NJ, USA

For GST (glutathione S-transferase) pull-downs, a portion of major immediate early exon 5 (amino acids 534 to 580) was amplified using oligonucleotides 577 and 581. The resulting amplimer was inserted in frame with the GST tag in pGEX6P-1. GST and GST-IE2 proteins were purified using glutathione beads. pUL34 and luciferase were synthesized by in vitro transcription/translation in the presence of 35S-methionine with the TNT Quick Coupled In Vitro Transcription/Translation kit (Promega). pUL34 was incubated with GST or GST-IE2, the beads were washed, and bound proteins were eluted and analyzed by PAGE and autoradiography.

Statistical analysis.

Statistical comparisons (Figures 8 and 9) were performed with a two-tailed Student's t-test using the data analysis module in Microsoft Excel. Significance levels were set as follows: *, p < 0.05; **, p < 0.01.

Data availability.

Data from the Myc-UL34 ChIP-seq experiment have been deposited in NCBI GEO under accession number GSE106211.


CHAPTER 3: CONSERVATION OF UL34 BINDING SITES AND CHIP-SEQ ANALYSIS

A portion of this chapter was published in Virology in May 2018. Citation: Slayton, M., Hossain, T., and Biegalke, B.J. (2018). pUL34 binding near the human cytomegalovirus origin of lytic replication enhances DNA replication and viral growth. Virology 518, 414–422.

Background

During lytic-phase replication, human cytomegalovirus (HCMV) must evade the host immune response (both intracellular and extracellular) and replicate its DNA genome. Approximately 25% of the known viral genes are required for lytic-phase replication, and expression of the viral gene UL34 is required for viral replication to occur (Dunn et al., 2003; Rana and Biegalke, 2014). Although UL34 is an essential viral gene, its functions remain largely unknown. Previous work identified the viral UL34 proteins (pUL34) as sequence-specific DNA-binding proteins that localize to viral replication centers in the nuclei of HCMV-infected cells. The UL34 proteins bind within the promoter regions of US3 and US9, two viral genes involved in evasion of the host immune system. Binding of pUL34 to these regions results in transcriptional downregulation (LaPierre and Biegalke, 2001). These observations were used to define the predicted pUL34-binding site, and in vitro experiments confirmed that pUL34 binds to the predicted binding sites (Liu and Biegalke, 2013). Based on this predicted site (AAACACCGT[G/T]), the HCMV genome contains ~14 predicted pUL34-binding sites (Fig. 2). A random distribution of nucleotides in the viral genome would be expected to yield approximately one UL34-binding site. This excess of predicted sites suggests that UL34-DNA interactions are involved in aspects of viral replication beyond immune evasion. However, the conservation of the predicted UL34-binding sites among clinical HCMV isolates has not been determined. The DNA-binding nature of the UL34 proteins also suggests that they may interact with the cellular genome. Such interactions could contribute to the documented alterations in cellular gene expression that occur during viral infection. However, interactions of UL34 proteins with the cellular genome have not yet been examined.
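The expectation of roughly one site by chance can be made explicit. The motif AAACACCGT[G/T] has nine fixed positions and one position that accepts two bases. In a random sequence with equal base frequencies, a given position on one strand therefore matches with probability (1/4)^9 × (1/2) = 2/4^10. Counting both strands of a ~230-kb genome gives about 0.88 expected sites. The Python sketch below computes this expectation and scans a sequence on both strands with regular expressions. Equal base frequencies are an assumption made for illustration.

```python
import re

# Predicted pUL34-binding site (5'->3') and its reverse complement; the
# lookahead lets finditer report every start position.
MOTIF_FWD = re.compile(r"(?=AAACACCGT[GT])")
MOTIF_REV = re.compile(r"(?=[AC]ACGGTGTTT)")
MOTIF_LEN = 10


def expected_random_sites(genome_length):
    """Expected chance matches on both strands, assuming equal base frequencies."""
    p_match = (0.25 ** 9) * 0.5            # 9 fixed bases, 1 two-fold degenerate
    positions = genome_length - MOTIF_LEN + 1
    return 2 * positions * p_match         # two strands


def find_sites(seq):
    """Return sorted (0-based start, strand) tuples for motif matches."""
    seq = seq.upper()
    hits = [(m.start(), "+") for m in MOTIF_FWD.finditer(seq)]
    hits += [(m.start(), "-") for m in MOTIF_REV.finditer(seq)]
    return sorted(hits)


print(round(expected_random_sites(230_000), 2))  # 0.88
print(find_sites("GGAAACACCGTGCCAACGGTGTTTAA"))
```

The motif is not its own reverse complement, so a single site is never counted on both strands.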

Chromatin immunoprecipitation with sequencing (ChIP-seq) combines antibody-based immunoprecipitation of protein-bound DNA fragments with the high-throughput power of next-generation sequencing. The result is the genome-wide binding profile of a DNA-binding protein. A main limitation of ChIP-seq is the need for a commercially available, ChIP-qualified antibody for the immunoprecipitation step. Antisera to the UL34 proteins were previously raised in rabbits (Biegalke et al., 2004), but these antisera lack the specificity needed for ChIP. Therefore, a modified virus was created that expresses the early UL34 protein with a Myc-tag at the amino terminus. This virus was used for ChIP-seq with a commercially available, ChIP-qualified antibody that recognizes the Myc-tag.

The experiments described in this chapter had two aims. The first was to determine whether the predicted UL34-binding sites are present in clinical HCMV isolates from a variety of tissues and from patients in various geographic locations. The second was to identify pUL34-binding sites in the viral and human genomes during infection.

Aims

Are the predicted UL34 binding sites in strain AD169 conserved amongst other HCMV strains?

Initial analyses examined only a few isolates or strains of HCMV to identify 14 potential UL34-binding sites. With advances in DNA sequencing, the publicly available database of sequenced HCMV strains has grown to contain the genomes of over 200 clinical isolates. A comprehensive analysis of HCMV genomes for the pUL34-binding site will determine whether the sequences are conserved among viral strains, which could implicate them in viral replication.

Does pUL34 bind to the viral and the human genomes during infection?

pUL34 binds to the predicted pUL34-binding site in vitro, as assayed by gel mobility shift experiments, and in vivo, in yeast one-hybrid studies. However, the interaction of pUL34 with the viral and cellular genomes during the viral replication cycle has not been examined. To assess these possible interactions, chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) was performed. A modified HCMV strain expressing myc-tagged pUL34 was used to examine the pUL34-DNA interactions that occur at 48 hours post infection.

Results

Predicted pUL34-binding sites are conserved among HCMV strains.

The pUL34-binding sites in the laboratory HCMV strain AD169 were compared with those in a low-passage strain, Toledo, and a clinical isolate, VR1814 (Dargan et al., 2010). Most of the potential pUL34-binding sites in the HCMV genome are conserved, but there are some strain differences in the number and locations of the binding sites.

Highly conserved pUL34-binding sites are likely to have greater functional significance, so pUL34-binding site conservation was determined for a larger number of HCMV strains. For this analysis, the NCBI nucleotide database (http://www.ncbi.nlm.nih.gov/nuccore) was used; it contains the genomic sequences of over 200 HCMV isolates. Genomes from the laboratory strains AD169 and Towne and from 45 randomly selected isolates were scanned for occurrences of the pUL34-binding site sequence. A random distribution of nucleotides would result in approximately one UL34-binding site within the HCMV genome. Compared with strain AD169, conservation of the pUL34-binding sites ranged from 83% to 100% (Table 2). The pUL34-binding sites in RL9A, UL11, and UL13 were present in all 47 genomes analyzed, as were the first two binding sites in the oriLyt region (oriLyt 1 and oriLyt 2). All of the binding sites exhibited a high degree of conservation (>90%), except for the pUL34-binding site within UL37, which was present in only 83% of the genomes analyzed.
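Per-site conservation in Table 2 is the percentage of the 47 genomes in which a site is present. A minimal sketch of that calculation follows; the presence sets in the example are hypothetical. The final line confirms that 39 of 47 genomes rounds to the 83% reported for the UL37 site. No other count out of 47 rounds to 83%.

```python
def conservation(presence, sites):
    """Percent of genomes containing each site.

    presence: dict mapping strain name -> set of site labels found in it.
    """
    n = len(presence)
    return {site: 100.0 * sum(site in found for found in presence.values()) / n
            for site in sites}


example = {                      # hypothetical presence data
    "strain_A": {"UL37", "ORI1"},
    "strain_B": {"ORI1"},
    "strain_C": {"ORI1"},
    "strain_D": {"UL37", "ORI1"},
}
print(conservation(example, ["UL37", "ORI1"]))  # {'UL37': 50.0, 'ORI1': 100.0}
print(round(100 * 39 / 47))                     # 83
```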

The analysis also identified additional potential pUL34-binding sites in many of the HCMV isolates. Notably, a fourth potential pUL34-binding site was detected in the oriLyt region in 95% of the genomes. At this site, the laboratory strain AD169 has the sequence AAACGCCGTT, which matches 9 of 10 nucleotides of the known pUL34-binding sites (AAACACCGTT). The concentration and conservation of predicted pUL34-binding sites in the oriLyt region further suggest that pUL34-DNA interactions may have a role in viral DNA replication.

Figure 2. UL34-binding sites throughout the HCMV genome.

Schematic diagram of the genome of human cytomegalovirus strain AD169, with the location of each UL34-binding site indicated. The unique long and unique short regions are indicated; repeat regions are represented by larger white boxes between and flanking the unique regions. Each UL34-binding site is marked with a thin vertical line and labeled with the gene in which it is located. The origin of lytic replication (oriLyt) is displayed in greater detail, with the positions of the UL57 and RNA4.9 transcripts indicated. The UL34-binding sites in the oriLyt are indicated by black ovals, and the core region, which is critical for viral DNA replication, is indicated by the gray box.

Table 2. Conservation of UL34-binding sites. The genomes of 47 HCMV isolates were analyzed for the presence of UL34-binding sites. The strain and the original source of the isolate are listed. Gray boxes indicate the presence of a UL34-binding site within or near the indicated gene; open boxes indicate the absence of a UL34-binding site when compared to the reference strain, AD169. The presence and relative locations of any additional binding sites are also indicated. Abbreviations: y/o, year old; m/o, month old; pt, patient.

Sites scored: RL9A, RL10, UL11, UL13, UL31, UL32, UL37, UL54, ORI1, ORI2, ORI3, US3, US9, US11

Strain | Source | Additional sites
AD169 | 7 y/o female adenoids | none
Towne | 2 m/o infant urine | oriLyt4; UL89
Merlin | Congenital; child urine | oriLyt4
Toledo | Congenital; infant urine | oriLyt4; UL89
TB40/E | transp. patient | oriLyt4, TRS1
VR1814 | cervical secretions | oriLyt4, UL140
AF1 | amniotic fluid | oriLyt4, US22
U11 | Congenital; infant urine | oriLyt4
U8 | Congenital; infant urine | oriLyt4, UL140
3301 | Congenital; infant urine | oriLyt4
HAN20 | bronchoalveolar lavage | oriLyt4, UL48, UL150, RL13
HAN38 | bronchoalveolar lavage | oriLyt4
HAN13 | bronchoalveolar lavage | oriLyt4, lncRNA2.7
JP | AIDS patient, prostate | oriLyt4, TRS1
3157 | Congenital; infant urine | oriLyt4, US22, UL4
TR | vitreous humor, AIDS pt | oriLyt4, lncRNA5.0
HAN | unknown | oriLyt4, after UL32
6397 | Congenital; infant urine | oriLyt4, TRS1, US10, US34, US22
Davis | Congenital; infant liver | oriLyt4, UL150, lncRNA2.7
HAN1 | bronchoalveolar lavage | oriLyt4, 2nd oriLyt3, UL2
HAN2 | bronchoalveolar lavage | oriLyt4
HAN3 | bronchoalveolar lavage | oriLyt4, US22
HAN8 | bronchoalveolar lavage | UL89
HAN12 | bronchoalveolar lavage | oriLyt4
PAV1 | Amniotic fluid | oriLyt4, lncRNA2.7
PAV21 | Amniotic fluid | oriLyt4, lncRNA2.7
PAV20 | Amniotic fluid | oriLyt4
PAV18 | Amniotic fluid | oriLyt4, UL89
UKNEQAS1 | Congenital; infant urine | oriLyt4
UKNE1AS2 | Amniotic fluid | oriLyt4, TRS1
DB | Cervical swab | oriLyt4, UL89
PAV25 | Amniotic fluid | oriLyt4, TRS1, US22
PAV24 | Amniotic fluid | none
PAV23 | Amniotic fluid | oriLyt4, UL140, UL150
PAV12 | Amniotic fluid | oriLyt4
PAV11 | Amniotic fluid | oriLyt4
PAV8 | Amniotic fluid | oriLyt4
PAV7 | Amniotic fluid | oriLyt4
PAV6 | Amniotic fluid | oriLyt4, UL150
PAV5 | Amniotic fluid | oriLyt4
PAV4 | Amniotic fluid | oriLyt4
2CEN30 | bronchoalveolar lavage | oriLyt4, TRS1
2CEN15 | bronchoalveolar lavage | oriLyt4
2CEN5 | bronchoalveolar lavage | oriLyt4, TRS1, UL140, RL13
2CEN2 | bronchoalveolar lavage | oriLyt4, IRS1, RL1
HAN16 | infant urine | oriLyt4, TRS1, UL140
HAN19 | bronchoalveolar lavage | oriLyt4; TRS1; IRS1; RL1

% conservation: RL9A 100%; RL10 91%; UL11 100%; UL13 100%; UL31 98%; UL32 91%; UL37 83%; UL54 98%; ORI1 100%; ORI2 100%; ORI3 94%; US3 94%; US9 98%; US11 91%; oriLyt4 93%
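The comparison of the AD169 oriLyt4 sequence with the consensus can be expressed as a mismatch count against the degenerate site, writing [G/T] as the IUPAC code K. The short Python sketch below uses a hypothetical helper. It reports one mismatch for AAACGCCGTT, consistent with the 9-of-10 match noted above.

```python
# IUPAC codes needed for the predicted pUL34-binding site AAACACCGTK (K = G/T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "K": "GT"}
CONSENSUS = "AAACACCGTK"


def mismatches(candidate, consensus=CONSENSUS):
    """Number of positions where candidate is not allowed by the consensus."""
    if len(candidate) != len(consensus):
        raise ValueError("candidate and consensus must be the same length")
    return sum(base not in IUPAC[code]
               for base, code in zip(candidate.upper(), consensus))


print(mismatches("AAACGCCGTT"))  # 1 (oriLyt4 in AD169)
print(mismatches("AAACACCGTG"))  # 0
```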

Additional pUL34-binding sites were identified in some of the HCMV genomes, positioned near TRS1 (in 19% of the genomes), UL89 (13%), and UL140 (11%). Fewer than 10% of the genomes analyzed had additional UL34-binding sites near US22, UL48, UL150, RL13, lnc2.4, UL4, lnc5.0, US10, US34, UL1, and RL1 (Table 2). There was no apparent association between the presence or absence of specific pUL34-binding sites and the source of the viral isolate.

Generation of myc-UL34-AD169 and ChIP-seq.

The in vivo interactions of pUL34 with DNA were investigated using chromatin immunoprecipitation followed by deep sequencing (ChIP-seq). To allow a ChIP-verified antibody to be used for immunoprecipitation, a virus (myc-UL34-AD169) was generated that expresses the early pUL34 protein with a myc-tag fused to its N terminus. The myc-UL34-AD169 virus replicated with kinetics similar to those of the parental virus. The myc-UL34 protein localized to nuclei and viral replication centers of infected cells in a pattern similar to pUL34 (Rana and Biegalke, 2014). It was also expressed with wild-type kinetics in infected cells, as assayed by immunoblotting (Fig. 3A-C).

For ChIP analysis, human diploid fibroblasts were infected with myc-UL34-AD169 at an MOI of 5. At 48 hours post infection, protein-DNA complexes were crosslinked with formaldehyde, sheared by enzymatic and sonication treatments, and immunoprecipitated using an anti-myc tag antibody. The precipitated DNA fragments were purified and sequenced. Reads were mapped to a composite genome containing the human and the HCMV strain AD169 reference genomes. After duplicate reads were removed, interactions of pUL34 with human or viral genomic DNA were identified by comparing two sets of reads from myc-UL34-AD169-infected cells: the immunoprecipitated samples and the whole cell extract. Peaks were called using a multi-faceted approach that combined the results of three peak calling methods to create the most likely binding profile of pUL34:

- the Transcription Factor ChIP-Seq peak calling module in CLC Genomics Workbench v9.0 (QIAGEN);
- the R-based "phantompeakqualtools" script, run on Linux (Landt et al., 2012);
- a 'by-hand' method in which identified peaks were confirmed by direct comparison of the reads in the experimental samples versus the control samples.

Figure 3. Characterization of the Myc-UL34-AD169 virus.

(A) One-step growth curve. Viruses reconstituted from the unmodified parental BAC (pHB5) or from a BAC containing an insertion of the myc-tag epitope at the amino terminus of early pUL34 were assessed for their ability to replicate in a one-step growth curve. Human fibroblasts were infected at a multiplicity of infection of 1 pfu/cell; supernatants were quantified for pfu at the indicated time points after infection.

(B) Immunofluorescent analyses of cells infected with Myc-UL34-AD169 at 48 hpi. Nuclei are stained with DAPI. Myc-UL34 is stained with the same anti-myc antibody used for ChIP-seq analysis and a FITC-labeled anti-mouse secondary antibody.

(C) Expression of myc-tagged UL34. Human fibroblasts were mock-infected or HCMV-infected (pHB5, Myc-UL34-AD169) and harvested in RIPA buffer at the indicated time points after infection. Proteins from cell lysates were analyzed by immunoblotting with a monoclonal antibody specific for the myc-tag epitope.

Experiments in Figure 3 were performed by Dr. Tanvir Hossain, a post-doctoral fellow who was also working in the Biegalke lab.

pUL34 binds to viral genomic DNA during infection.

Previous experiments determined that pUL34 binds predicted UL34-binding sites. However, these protein-DNA interactions were not examined within cells during viral infection. The ChIP-seq data were therefore analyzed to determine whether pUL34 binds to the viral genome during infection. First, the total number of reads aligned to the viral or the human genome was determined for each sample type. The experimental samples were infected and immunoprecipitated with the anti-myc antibody. The two controls were infected and immunoprecipitated without antibody, or infected with no immunoprecipitation. The human genome contains over 13,000-fold more bases than the viral genome. Nonetheless, viral DNA was represented at a higher frequency in the experimental samples (17% of total reads) than in the no-antibody control (2%) or the non-immunoprecipitated control (1%) (Table 3). These data indicate that pUL34 associates preferentially with the HCMV genome relative to the cellular genome.
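The read-fraction comparison can be made concrete with a few lines of arithmetic. The human genome size used here (~3.1 Gb) is an approximate value assumed for illustration. The HCMV genome size (~230 kb) and the read fractions are those given in the text. The output shows a size ratio of roughly 13,500, a 17-fold relative enrichment of viral reads over the non-immunoprecipitated control, and an 8.5-fold enrichment over the no-antibody control.

```python
HUMAN_BP = 3.1e9     # approximate human genome size (assumption for illustration)
HCMV_BP = 230e3      # ~230-kb HCMV genome

size_ratio = HUMAN_BP / HCMV_BP   # how many more bases the human genome has
ip_viral = 0.17                   # viral read fraction, anti-myc ChIP
no_ab_viral = 0.02                # viral read fraction, no-antibody control
input_viral = 0.01                # viral read fraction, non-immunoprecipitated

print(f"human/HCMV size ratio: {size_ratio:,.0f}")
print(f"viral-read enrichment vs input: {ip_viral / input_viral:.1f}-fold")
print(f"viral-read enrichment vs no antibody: {ip_viral / no_ab_viral:.1f}-fold")
```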

There were 40 viral peaks identified, representing unique pUL34-DNA interactions with the viral genome at 48 hours post-infection (Table 4). At 48 hours post-infection, pUL34 was bound to 8 of the 14 predicted pUL34-binding sites: the binding sites near RL9, UL11, UL32, UL37, UL54, oriLyt-1, oriLyt-2, and oriLyt-3. The remaining 32 interactions had not been predicted. The sequence at each viral pUL34-DNA interaction was extracted in the same manner as for the human genome and used as input for the MEME-ChIP motif discovery tool. Several candidate motifs were discovered, but only one was statistically significant (Figure 4). This recurring motif, discovered independently by MEME-ChIP analysis of the ChIP-seq data, is highly similar to the predicted UL34-binding site (AAACACCGT[G/T]).
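To make the comparison with the predicted site concrete, the sketch below scans a sequence on both strands for windows within a chosen number of mismatches of AAACACCGT[G/T], written with the IUPAC code K for [G/T]. This is a simple mismatch-counting scan, not the statistical model MEME-ChIP uses; the function names and the default threshold of two mismatches are illustrative assumptions. The example input is the oriLyt-2 occurrence and its flanks from Figure 4A.

```python
# IUPAC codes needed for the predicted UL34-binding site AAACACCGT[G/T].
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "K": "GT"}
UL34_SITE = "AAACACCGTK"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(window: str, consensus: str = UL34_SITE) -> int:
    """Number of positions where the base is not allowed by the consensus code."""
    return sum(base not in IUPAC[code] for base, code in zip(window, consensus))


def scan(seq: str, max_mismatches: int = 2, consensus: str = UL34_SITE):
    """Return sorted (start, strand, window, mismatches) near-matches on both strands.

    Start coordinates are 0-based positions on the forward strand of `seq`.
    """
    seq = seq.upper()
    width = len(consensus)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - width + 1):
            window = s[i:i + width]
            mm = mismatches(window, consensus)
            if mm <= max_mismatches:
                # Map reverse-strand windows back to forward-strand coordinates.
                start = i if strand == "+" else len(seq) - i - width
                hits.append((start, strand, window, mm))
    return sorted(hits)


if __name__ == "__main__":
    # oriLyt-2 occurrence with flanking sequence, as listed in Figure 4A.
    print(scan("CATTGCCATGAAACACCGTGATGGGGAACGG", max_mismatches=0))
```

Counting mismatches this way also shows why some Figure 4A occurrences are only "similar" to the predicted site: for example, the UL114 occurrence differs from AAACACCGT[G/T] at one position and the UL11 occurrence at three.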

Although 14 peaks contain this motif or a close variant of it, only 8 of those 14 correspond to predicted UL34-binding sites; the remaining 6 are novel. Thus, 8 of the 14 predicted UL34-binding sites were bound, and 6 additional regions had sequences very similar to the


Table 3. Distribution of the aligned reads in the ChIP-seq samples. For each sample type, the total number of reads aligning to either the viral or the human genome was determined. A composite genome containing the complete human genome, with the viral genome added as an additional chromosome, was used to prevent bias in the alignment.

Table 3.


Table 4. Interactions between pUL34 and the viral genome during infection. Peaks called from the ChIP-seq analysis in the viral genome are listed in the table. “Center of peak” refers to the viral genomic coordinate; “Binding score” is the result of three combined peak-identification methods (the CLC peak-call module, the phantompeakqualtools script, and by-hand comparison of aligned reads), with a higher score corresponding to a higher likelihood of a pUL34-binding interaction at the indicated genomic location. Nearby genes are indicated; because of the small size of the viral genome, the nearest genes in both the 5' and 3' directions are shown. Peaks with blue highlighting represent predicted pUL34-binding sites (see Figure 2); peaks with an asterisk (*) represent binding sites within the oriLyt region that were examined further in Chapter 4.

Center of peak  Binding score  5' gene  5' distance  3' gene  3' distance
2516            5.79           RL1      486          RNA2.7   0
6361            5.76           RL6      296          RNA1.2   0
8084            4.75           RL8A     164          RL9A     0
10522           4.75           RL12     0            UL1      1280
15342           5.51           UL6      0            UL7      147
18940           4.67           UL11     0            UL13     299
25366           4.97           UL20     0            UL21A    856
28494           4.83           UL22A    852          UL23     0
32428           4.91           UL25     322          UL26     0
41004           5.26           UL31     1101         UL32     0
46428           4.87           UL35     0            UL36     1750
50932           5.72           UL36     993          UL37     0
56355           3.39           UL42     1198         UL43     0
61535           5.31           UL47     0            UL48     2610
71756           5.36           UL48A    299          UL49     0
76144           4.82           UL52     0            UL53     524
79525           4.91           UL53     1506         UL54     0
91784*          10.90          UL57     259          RNA4.9   2092
92198*          9.52           UL57     889          RNA4.9   1592
93580*          5.98           UL57     2193         RNA4.9   308
95524*          3.93           RNA4.9   0            UL69     3470
97841           3.68           RNA4.9   0            UL69     1146
102454          4.87           UL69     979          UL70     0
115541          6.07           UL78     286          UL79     0
117420          6.00           UL80.5   0            UL82     865
123412          5.78           UL83     1329         UL84     0
130416          5.28           UL87     0            UL88     1524
139100          4.45           UL88     5489         UL89     0
146454          4.09           UL99     428          UL100    0
150432          9.76           UL102    231          UL103    0
155288          4.90           UL105    0            RNA5.0   160
158126          4.02           UL105    5455         RNA5.0   0
161282          4.72           UL112    598          UL114    0
164558          5.42           UL116    166          UL117    0
166722          5.44           UL128    613          UL130    0
177370          4.09           US12     430          US13     0
204459          4.77           US16     949          US17     0
208512          5.35           US22     388          US23     0
214480          5.02           US27     0            US28     158
221376          5.05           TRS1     0

Figure 4. Motif prediction in the viral genome and pUL34-DNA binding interactions during infection.

(A) Logo for the consensus motif and occurrences of the motif in the viral genome. The motif was found at the 14 sites listed below. Location identifies the genomic site (nearby gene or oriLyt element), strand refers to the positive or negative DNA strand, p-value indicates statistical significance, and sites shows the sequence used to produce the motif (colored) together with flanking sequences (gray). Locations that match predicted UL34-binding sites are marked with an asterisk (*); the remaining locations contain sequences similar to the predicted UL34-binding site and were also bound during infection.

(B) Comparison of predicted versus actual pUL34-DNA binding interactions. Schematic diagram of the HCMV genome with predicted pUL34-binding sites indicated as black vertical bars (top). Below, diagram of actual pUL34-binding sites at 48 hours post-infection. Interactions that match the predicted sites are indicated as black vertical bars; non-matching sites are indicated with light blue bars. The hatched box represents US2-US6, which were replaced with BAC maintenance sequences.

A

Location     Strand  p-value  Sites
oriLyt-2*    +       3.87e-7  CATTGCCATG AAACACCGTGA TGGGGAACGG
UL31*        -       3.87e-7  CGGCTTCTCA AAACACCGTGT CCACCACCCC
RL9*         -       3.87e-7  AATTATAACA AAACACCGTGA CACTGTCCAT
oriLyt-3*    +       6.88e-7  ACTGTA AAACACCGTTT GACCGCACCC
UL54*        +       1.93e-6  CCTCGGGCTC AAACACCGTGG CGCCCTGGTA
UL114        -       2.23e-6  CGAGACGGCT AAACAACGTGT CCAACTCGCT
UL70         +       3.00e-6  GTTTTTTGGA AAACGCCGTTA TCCGGG
oriLyt-1*    -       3.00e-6  GTAGGACAGA AAACGCCGTTT TCTAGGGACG
UL6          +       1.19e-5  TTATCAACGG CAACAGCGTGT CGGAAAC
UL87         -       1.82e-5  AAAGGTTTCT CCACACCGTGT CTGTGTCGGC
UL105        +       2.56e-5  TGCACGTGTG CACCACCGTGG ACTACGGCCT
RL12         +       2.56e-5  GCCCTAGATA CAACACCGATA TCGAAAATGA
oriLyt-core  -       3.13e-5  ATCCGCGTGG CAACGCCCTGA CAACCAAAAA
UL11*        -       3.38e-5  A AACCGACGTGA AGTTCTACGT


predicted UL34-binding site and were also bound during infection. The remaining 26 binding interactions did not contain a statistically significant recurring motif.

pUL34 binding sites in the oriLyt region are occupied during infection.

The finding that pUL34 binds to the origin of lytic replication during viral infection raised the question of what function these binding interactions serve. To characterize them in more detail, the reads that aligned to the oriLyt region were scanned with a 150-bp sliding window, providing a high-resolution landscape of pUL34-DNA interactions within this region.

pUL34 was highly enriched at the UL34-oriLyt-1 and oriLyt-2 binding sites, with a peak of protein-DNA interactions located between nucleotides 91,750 – 91,825 (oriLyt-1) and a peak between nucleotides 91,275 – 92,350 (oriLyt-2) (Fig. 5A). The maximum enrichment determined for these peaks was 11.3-fold and 10.0-fold, respectively.

Surprisingly, an additional peak was identified between nucleotides 93,525 – 93,625, with a maximum enrichment of 7.7-fold; this region contains a sequence matching 8 of the 10 nucleotides of the UL34-binding site (peak c; Fig. 5B). Lastly, pUL34-DNA interactions occurred at the UL34-oriLyt-3 binding site (nucleotides 95,350 – 95,650), with a maximum enrichment of 4.4-fold. These data indicate that pUL34 interacts with the oriLyt UL34-binding sites during viral replication, consistent with the in vitro protein-DNA interactions previously identified by electrophoretic mobility shift assays (Liu and Biegalke, 2013).
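The window-based enrichment profile described above can be sketched as follows: reads are counted in fixed windows across the oriLyt interval, each count is scaled by its library size, and the immunoprecipitated value is divided by the whole cell extract value. The 150-bp window follows the text; the 125-bp step is an assumption based on the "125bp increments" in the Figure 5 legend, and the pseudocount and function names are likewise illustrative rather than the settings actually used.

```python
from bisect import bisect_left


def window_counts(positions, start, end, window, step):
    """Count reads whose position lies in [w, w + window) for windows stepping through [start, end)."""
    pos = sorted(positions)
    counts = []
    for w in range(start, end - window + 1, step):
        n = bisect_left(pos, w + window) - bisect_left(pos, w)
        counts.append((w, n))
    return counts


def fold_enrichment(ip_positions, wce_positions, ip_total, wce_total,
                    start, end, window=150, step=125, pseudocount=1.0):
    """Library-size-normalized IP / whole cell extract ratio in each window.

    ip_total and wce_total are the total mapped reads of each library, so that
    samples sequenced to different depths can be compared. The pseudocount keeps
    windows without whole cell extract reads from dividing by zero.
    """
    ip = window_counts(ip_positions, start, end, window, step)
    wce = window_counts(wce_positions, start, end, window, step)
    profile = []
    for (w, n_ip), (_, n_wce) in zip(ip, wce):
        ip_norm = (n_ip + pseudocount) / ip_total
        wce_norm = (n_wce + pseudocount) / wce_total
        profile.append((w, ip_norm / wce_norm))
    return profile
```

For the oriLyt fragment in pSP50 (nucleotides 90,752 – 95,815 in AD169, per the Figure 5 legend), a call would look like `fold_enrichment(ip_midpoints, wce_midpoints, ip_total, wce_total, 90_752, 95_815)`, where the read-position lists and totals are hypothetical inputs taken from the aligned reads.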


Figure 5. Enrichment of pUL34 in the oriLyt region.

(A) The binding activity of pUL34 in the oriLyt region was determined by analysis of sequencing reads from the Myc-UL34 ChIP-seq at 48 hpi (see Materials and Methods). The oriLyt fragment inserted in pSP50 is indicated by the large white rectangle (nucleotides 90,752 – 95,815 in strain AD169); the core region, which contains the essential elements of oriLyt, is indicated by the gray rectangle. The locations of important transcription factor binding sites, the UL34-oriLyt binding sites, the RNA stem-loop and vRNAs (RNA-DNA hybrid), and the partial transcripts of UL57 and RNA4.9 are indicated. The fold enrichment of pUL34 across the oriLyt region is graphed in 125-bp increments with a 25-bp overlap on both ends. Fold enrichment was calculated by comparing the infected (Myc-UL34-AD169), immunoprecipitated (anti-Myc) reads against the whole cell extract reads; error bars indicate differences between biological duplicates. Identified peaks are labeled with lower-case letters for reference.

(B) Consensus sequences from identified ChIP-seq peaks. The labeled peaks from Fig. 5A were analyzed for a consensus sequence. Each peak occurred around the published UL34-binding sequence (highlighted in gray), with the exception of peak ‘c’, which has two non-matching nucleotides (labeled with an asterisk). #, peaks corresponding to UL34-binding sites on the reverse strand.


Interaction of pUL34 with the cellular genome.

Cellular gene expression is highly regulated during HCMV infection. pUL34 acts as a transcription factor via direct interactions with viral DNA (LaPierre and Biegalke, 2001); however, it was not known whether pUL34 interacts with the human genome. Analysis of the ChIP-seq dataset identified several interactions between pUL34 and human DNA (Table 5). In total, 122 protein-DNA interactions were identified at various chromosomal locations. The closest known transcripts to the UL34-binding sites were predominantly non-coding regulatory RNAs, pseudogenes and rRNA/tRNA genes. Distances to the nearest gene ranged from more than 50,000 bp to as little as 200 bp, with some binding sites occurring within mitochondrial transcription units; on average, a peak was 6,000 bp from the nearest neighboring gene. The peak shape score, a statistic derived from the peak shape model used to estimate the likelihood of a protein-DNA interaction from ChIP-seq data, ranged from 3.75 to 29.3. The score measures how well a peak fits the peak shape model; a higher score indicates a better fit. The score is best thought of as the statistical likelihood of a particular protein-DNA interaction; its biological relevance cannot be determined without further experimentation. In an ideal peak, the forward-strand reads are distributed around a point 100 bp upstream of the peak center and the reverse-strand reads around a point 100 bp downstream, each following a Gaussian distribution. Building the peak shape model for a given data set involves complex mathematical and statistical methods that are beyond the scope of this study, and peak-calling software continues to improve. Even so, “by-hand” visual inspection of the data remains the best method for calling peaks (Heydarian et al., 2014).

The regulation of cellular gene expression is complex, and the pUL34-DNA binding interactions identified within the human genome do not by themselves reveal the downstream effects that may result from modulation of gene expression. To obtain a broader view of the implications of the UL34-cellular genome interactions, the genes located near a UL34-binding interaction were analyzed with the Reactome Pathway Database (reactome.org; Fabregat et al., 2018). Reactome maps the input genes (Table 5) onto curated pathways of interacting cellular processes and tests whether each pathway contains more of the input genes than expected by chance. The most likely pathways (p-value < 0.05) to be affected by pUL34-DNA interactions in the human genome are shown in Table 6.
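Pathway over-representation of the kind reported in Table 6 is commonly assessed with a one-sided hypergeometric test. The sketch below implements that test with hypothetical argument names; because the background size Reactome used is not stated here, it is not expected to reproduce the p-values in Table 6, and it omits the multiple-testing correction (false discovery rate) that Reactome also reports.

```python
from math import comb


def overrepresentation_p(found: int, pathway_size: int, query_size: int,
                         background_size: int) -> float:
    """One-sided hypergeometric p-value, P(X >= found).

    X counts how many of `query_size` genes drawn at random from a background of
    `background_size` genes fall in a pathway containing `pathway_size` genes.
    """
    upper = min(pathway_size, query_size)
    tail = sum(
        comb(pathway_size, i) * comb(background_size - pathway_size, query_size - i)
        for i in range(found, upper + 1)
    )
    return tail / comb(background_size, query_size)


if __name__ == "__main__":
    # Hypothetical sizes: 25 of 100 query genes fall in a 45-gene pathway,
    # drawn from an assumed background of 10,000 genes.
    print(overrepresentation_p(25, 45, 100, 10_000))
```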

Consensus UL34-binding site in the human genome.

From the peaks identified in the ChIP-seq analysis, a consensus pUL34-binding sequence can be determined. The sequence at each pUL34-DNA interaction was extracted with Galaxy (usegalaxy.org; Afgan et al., 2016). These sequences were then used as the input for MEME-ChIP (meme-suite.org; Machanick and Bailey, 2011) to discover potential motifs within the human sequences bound by pUL34. Three statistically significant motifs were discovered (Fig. 6A). To determine whether there was an abundance of known human transcription factor binding sites

Table 5. Interactions between pUL34 and the human genome during infection. Peaks corresponding to pUL34-DNA interactions in the human genome were called from the Myc-UL34-AD169 ChIP-seq reads as described above. Listed in the table are the chromosomal location of each peak (“Chr”), the name of the nearest gene (“Name”), the biotype of the gene (“Biotype”), and the peak shape score generated during peak calling (“Score”). Peaks are organized by gene biotype, and peak shape scores are color-coded by value: darker red represents values at the lower end, white represents values near the average, and darker blue represents values near the higher end of the distribution.

Chr  Name            Biotype         Score
1    RP11-312B8.1    antisense       4.39
14   IGHD4-11        IG_D_gene       15.05
1    RNU11           lincRNA         4.12
2    AC093326.2      lincRNA         4.01
16   RP11-297D21.2   lincRNA         9.99
22   RNU12           lincRNA         4.67
1    MIR6077         miRNA           4.71
2    MIR3133         miRNA           4.39
3    MIR4789         miRNA           4.01
4    MIR548AH        miRNA           4.89
6    MIR6720         miRNA           4.14
7    MIR1302-6       miRNA           3.75
7    MIR5707         miRNA           4.02
8    MIR5680         miRNA           10.39
9    MIR181B2        miRNA           4.18
11   MIR548K         miRNA           5.40
11   MIR3166         miRNA           4.08
14   MIR4307         miRNA           4.29
15   MIR4713         miRNA           5.22
17   MIR301A         miRNA           4.98
17   MIR1273E        miRNA           6.62
18   MIR8078         miRNA           8.83
19   MIR6885         miRNA           5.56
19   MIR7975         miRNA           5.35
22   MIR5739         miRNA           4.43
X    MIR6894         miRNA           9.15
1    DLEU2_4         misc_RNA        9.00
2    Y_RNA           misc_RNA        5.30
3    RNY3P13         misc_RNA        6.82
3    CLRN1-AS1       misc_RNA        4.13
5    Y_RNA           misc_RNA        5.28
8    Y_RNA           misc_RNA        5.57
9    Y_RNA           misc_RNA        5.97
10   Y_RNA           misc_RNA        5.40
10   Y_RNA           misc_RNA        5.96
10   Y_RNA           misc_RNA        5.45
11   Y_RNA           misc_RNA        4.88
13   RNY3P2          misc_RNA        5.81
19   Y_RNA           misc_RNA        4.47
MT   MT-RNR1         Mt_rRNA         7.19
MT   MT-RNR2         Mt_rRNA         12.51
MT   MT-TF           Mt_tRNA         9.69
MT   MT-TV           Mt_tRNA         10.73
MT   MT-TL1          Mt_tRNA         29.34
MT   MT-TY           Mt_tRNA         8.63
MT   MT-TS1          Mt_tRNA         29.73
MT   MT-TH           Mt_tRNA         8.33
MT   MT-TS2          Mt_tRNA         10.70
MT   MT-TL2          Mt_tRNA         8.87
MT   MT-TE           Mt_tRNA         8.04
MT   MT-TT           Mt_tRNA         18.13
3    TMEM158         protein_coding  5.04
5    FOXD1           protein_coding  5.03
7    FERD3L          protein_coding  6.41
13   MRPL57          protein_coding  4.32
16   C16orf91        protein_coding  5.65
20   ID1             protein_coding  5.76
MT   MT-ND1          protein_coding  9.80
MT   MT-ND2          protein_coding  5.39
MT   MT-CO1          protein_coding  6.75
MT   MT-CO2          protein_coding  11.70
MT   MT-ATP8         protein_coding  6.23
MT   MT-ATP6         protein_coding  5.77
MT   MT-CO3          protein_coding  14.31
MT   MT-ND3          protein_coding  4.87
MT   MT-ND4L         protein_coding  4.30
MT   MT-ND4          protein_coding  11.14
MT   MT-ND5          protein_coding  13.59
MT   MT-ND6          protein_coding  7.00
MT   MT-CYB          protein_coding  12.55
2    PCED1CP         pseudogene      8.40
3    DUX4L26         pseudogene      4.67
4    RP11-241F15.10  pseudogene      6.26
5    RP11-287J9.1    pseudogene      13.36
7    AC004987.10     pseudogene      12.26
7    RP11-678B3.1    pseudogene      4.31
7    GS1-124K5.9     pseudogene      8.22
7    CYP3A137P       pseudogene      5.62
9    FKBP4P6         pseudogene      7.77
9    RP11-350E12.5   pseudogene      4.91
10   RP11-288D15.1   pseudogene      5.34
17   RP11-815I9.5    pseudogene      5.27
18   ROCK1P1         pseudogene      7.10
20   CST2P1          pseudogene      4.19
22   ANP32BP2        pseudogene      6.70
X    GRPEL2P2        pseudogene      9.11
5    RNA5SP180       rRNA            4.96
5    RNA5SP188       rRNA            4.09
10   RNA5SP327       rRNA            6.80
18   RNA5SP458       rRNA            4.74
2    SNORA48         snoRNA          4.52
5    SNORA63         snoRNA          4.34
5    SNORA74         snoRNA          4.81
11   snoMBII-202     snoRNA          4.93
13   SNORD38         snoRNA          5.37
14   SNORD113        snoRNA          4.71
14   SNORD114-20     snoRNA          8.68
14   SNORD113        snoRNA          7.75
15   SNORD115-23     snoRNA          4.30
15   SNORD18         snoRNA          4.82
19   SNORD111        snoRNA          5.59
1    RNU6-584P       snRNA           6.22
1    RNVU1-15        snRNA           4.11
2    RNU6-1168P      snRNA           4.48
2    RNU7-148P       snRNA           6.73
3    RNU6-232P       snRNA           5.54
4    RNU6-1074P      snRNA           4.93
6    RNU6-130P       snRNA           4.50
6    RNU6-1106P      snRNA           4.13
7    RNU6-863P       snRNA           4.48
8    RNU6ATAC32P     snRNA           4.64
8    RNU6-1011P      snRNA           7.27
9    RNU6-820P       snRNA           4.18
12   RNU6-101P       snRNA           4.84
13   RNU6-75P        snRNA           7.17
14   RNU6-1234P      snRNA           5.04
16   RNU7-24P        snRNA           8.06
17   RNU6-258P       snRNA           4.70
19   RNU6-9          snRNA           4.19
22   RNU6-1219P      snRNA           4.72
14   TRAJ56          TR_J_gene       9.20

Table 6. Reactome pathways associated with the human genes that pUL34 binds near or within. The listed pathways contain genes that could potentially be modulated by pUL34 binding. Results were generated with the Reactome Pathway Database (reactome.org) using the genes of interest listed in Table 5 as input.

Pathway name                                                    #Entities found  #Entities total  Entities pValue
tRNA processing in the mitochondrion                            25               45               1.11E-16
rRNA processing in the mitochondrion                            21               40               1.11E-16
Respiratory electron transport                                  10               115              2.02E-07
The citric acid (TCA) cycle and respiratory electron transport  12               228              4.61E-06
Complex I biogenesis                                            6                57               3.37E-05
Metabolism of RNA                                               27               782              4.70E-04
Mitochondrial translation initiation                            4                96               0.00703967
rRNA modification in the mitochondrion                          2                10               0.0088108
Mitochondrial translation elongation                            4                94               0.014344783
Mitochondrial translation termination                           4                94               0.015109095
Mitochondrial translation                                       4                102              0.021663021
Formation of ATP by chemiosmotic coupling                       2                23               0.049028723

Figure 6. Motif prediction and overrepresented human transcription factor binding sites found in ChIP-seq data.

(A) Discovered motifs in the human genome sequences bound by pUL34 during infection. The sequence at each pUL34-DNA binding interaction within the human genome was extracted and the most likely binding motifs were determined. The ‘logo’ represents recurring nucleotides at the given positions (larger letters occur more frequently), the e-value represents the significance of the motif, the number of times the motif was found is listed under sites, and the total number of nucleotides in each motif is indicated under width.

(B) Overrepresented human transcription factor binding sites found in sequences bound by pUL34. Known motifs of human transcription factors were identified in pUL34-bound human genome sequences with the pScanChIP web server. Each logo represents the motif that the transcription factor(s) named above it recognizes and binds.

Panel B transcription factors: FOSL family, JUN family, NFE2 family and BACH2; RUNX1; CTCF.

within the regions flanking the pUL34-DNA interactions, the consensus sequences were scanned for such occurrences with the pScanChIP web server (www.beaconlab.it/pscan_chip_dev; Zambelli et al., 2013) (Fig. 6B); three known motifs (one shared by a family of TFs and two belonging to individual TFs) were overrepresented within the dataset.
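A motif logo such as those in Figures 4A and 6A is a picture of a position frequency matrix, with letter height scaled by information content. The sketch below builds that matrix from the 14 viral motif occurrences listed in Figure 4A. It is a simplified illustration (no pseudocounts, background correction, or small-sample correction), not the MEME-ChIP algorithm, and the function names are hypothetical.

```python
from math import log2

BASES = "ACGT"

# Motif occurrences (11-mers) listed in Figure 4A.
VIRAL_SITES = [
    "AAACACCGTGA", "AAACACCGTGT", "AAACACCGTGA", "AAACACCGTTT",
    "AAACACCGTGG", "AAACAACGTGT", "AAACGCCGTTA", "AAACGCCGTTT",
    "CAACAGCGTGT", "CCACACCGTGT", "CACCACCGTGG", "CAACACCGATA",
    "CAACGCCCTGA", "AACCGACGTGA",
]


def position_frequency_matrix(sites):
    """Per-position base frequencies for equal-width aligned sites."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same width")
    return [{b: [s[i] for s in sites].count(b) / len(sites) for b in BASES}
            for i in range(width)]


def information_content(pfm):
    """Bits per position (maximum 2 for DNA), without small-sample correction."""
    return [2 + sum(p * log2(p) for p in col.values() if p > 0) for col in pfm]


def consensus(pfm):
    """Most frequent base at each position (ties resolved in A, C, G, T order)."""
    return "".join(max(col, key=col.get) for col in pfm)


if __name__ == "__main__":
    pfm = position_frequency_matrix(VIRAL_SITES)
    print(consensus(pfm))
    print([round(bits, 2) for bits in information_content(pfm)])
```

Built this way, the first ten positions of the consensus read AAACACCGTG, matching the predicted UL34-binding site, and the invariant C at the fourth position carries the full 2 bits.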

Discussion

There were 40 peaks identified from the analysis of pUL34-DNA interactions in the viral genome (Table 4). Many of these interactions were unexpected, as prior work identified pUL34 as a DNA-binding protein that represses viral transcription by interacting with the sequence AAACACCGT[G/T] located in the promoters of US3 and US9 (LaPierre and Biegalke, 2001). However, the environment within the infected cell differs considerably from that of an in vitro binding reaction, which may account for the expanded DNA-binding activity of pUL34.

Intriguingly, the pUL34-binding sites in the oriLyt region are occupied by pUL34 during viral DNA replication. The in-depth pUL34-DNA ChIP-seq analysis indicated that the oriLyt-1 and oriLyt-2 pUL34-binding sites were highly enriched for pUL34. These binding sites flank the oriLyt promoter and occur near the IE2 and C/EBPα oriLyt binding sites, which are important for viral DNA replication (Kagele et al., 2009; Xu et al., 2004). Additionally, pUL34 bound the oriLyt core region at a near-consensus site located close to the RNA stem-loop that serves as a binding site for another viral protein, pUL84 (Colletti et al., 2007). Protein-protein interactions were identified between pUL34 and pUL84 (Chapter 4); pUL34 enrichment at this site may therefore result from direct DNA interactions as well as from pUL34-pUL84 interactions.

In the human genome, 122 peaks of pUL34 binding were identified, representing unique pUL34-DNA interactions with human DNA (Table 5). Most of these interactions occurred within or near transcription units that do not encode a protein (84%; 103/122). This subset of transcripts consists primarily of regulatory RNAs, such as short and long non-coding RNAs and microRNAs. Most of the pUL34-DNA interactions near protein-coding genes occurred within the mitochondrial genome (68%; 13/19). Several interactions were also found near mitochondrial tRNA transcripts, which are directly involved in protein synthesis within the mitochondrion.
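The two proportions quoted above can be recomputed directly. In the sketch below, the protein-coding entries are transcribed from Table 5, while the 103/122 count is taken from the text as given; the helper name is hypothetical.

```python
# Protein-coding entries from Table 5 as (chromosome, gene).
PROTEIN_CODING = [
    ("3", "TMEM158"), ("5", "FOXD1"), ("7", "FERD3L"), ("13", "MRPL57"),
    ("16", "C16orf91"), ("20", "ID1"),
    ("MT", "MT-ND1"), ("MT", "MT-ND2"), ("MT", "MT-CO1"), ("MT", "MT-CO2"),
    ("MT", "MT-ATP8"), ("MT", "MT-ATP6"), ("MT", "MT-CO3"), ("MT", "MT-ND3"),
    ("MT", "MT-ND4L"), ("MT", "MT-ND4"), ("MT", "MT-ND5"), ("MT", "MT-ND6"),
    ("MT", "MT-CYB"),
]


def share(part: int, whole: int) -> tuple:
    """Return (part, whole, percentage rounded to the nearest whole number)."""
    return part, whole, round(100 * part / whole)


if __name__ == "__main__":
    mito = sum(1 for chrom, _ in PROTEIN_CODING if chrom == "MT")
    print(share(mito, len(PROTEIN_CODING)))  # -> (13, 19, 68)
    print(share(103, 122))                   # -> (103, 122, 84)
```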

Given the small size of the mitochondrial genome compared to the rest of the human genome, the high frequency of pUL34-DNA interactions with mitochondrial DNA was surprising. It is not known whether pUL34 enters the mitochondrion; previous studies showed that the protein localizes to the nucleus after it is synthesized (Biegalke, 2013). If confirmed, pUL34 interactions with mitochondrial DNA could interrupt normal mitochondrial function, which in turn could contribute to the pathology of HCMV infection. The interaction of pUL34 with the mitochondrial genome is also intriguing in light of a UL34 mutant virus in which loss of the carboxyl-terminal half of pUL34 resulted in a significant decrease in the expression of some mitochondrial genes (Hossain et al., pending review).

The sequences bound by the UL34 proteins in the human genome differed greatly from those bound in the viral genome (Figure 6A). This suggests that pUL34 acquires modifications or interacts with other human or viral proteins (such as the human proteins indicated in Figure 6B) that in turn alter its binding specificity, allowing it to recognize alternative sequences. These binding interactions are particularly interesting in that pUL34 appeared to bind preferentially to mitochondrial DNA. Furthermore, binding sites for specific human transcription factors were overrepresented in the fragments that generated pUL34-binding-site peaks, including those for the FOSL, JUN, and NFE2 transcription factor families, as well as RUNX1 and CTCF. FOSL and JUN are proto-oncogene transcription factors that act at AP-1 sites and require tight regulation for their proper function (Halazonetis et al., 1988). Both NFE2 and RUNX1 are involved in the differentiation of hematopoietic stem cells and may therefore contribute to cancer when improperly regulated (Andrews et al., 1993; Niebuhr et al., 2008). CTCF, perhaps one of the best-studied transcription factors, is a global regulator of gene expression and epigenetic modifications, and its aberrant activity is also associated with cancer (Ohlsson et al., 2001; Ong and Corces, 2014). Because the human pUL34-binding sites lie close to these human TF binding sites, further experiments (such as co-immunoprecipitation) would be useful to determine whether pUL34 interacts with the listed human proteins during infection.

Taken together, these data demonstrate that pUL34 interacts with both the viral and human genomes during infection, recognizing related but distinct sequences within each genome. pUL34 was bound to the viral oriLyt binding sites at 48 hours post-infection, when viral DNA is being actively replicated. The oriLyt binding sites are the focus of the next chapter.


CHAPTER 4: UL34 BINDING SITES IN THE ORIGIN OF LYTIC REPLICATION

The entirety of this chapter was published in Virology, May 2018. Citation: Slayton, M., Hossain, T., and Biegalke, B.J. (2018). pUL34 binding near the human cytomegalovirus origin of lytic replication enhances DNA replication and viral growth. Virology 518, 414–422.

Background

The UL34 proteins interact with three UL34-binding sites located in the origin of lytic replication (oriLyt) in both in vitro (gel-shift assay; Liu and Biegalke, 2013) and in vivo (ChIP-seq; Chapter 3) assays. OriLyt is the only lytic-replication origin within the HCMV genome and is located approximately in the middle of the unique long segment of the HCMV genome (Anders et al., 1992; Masse et al., 1992). The oriLyt boundaries are defined as a ~3.3 kb fragment consisting of a 1.5 kb core element flanked by 1,100 bp to the left and 700 bp to the right of the core. OriLyt displays base composition biases and contains numerous repeat elements and transcription factor binding sites; several transcripts initiate in or are transcribed through oriLyt, some of which are involved in oriLyt activation (Kagele et al., 2009; Kiehl et al., 2003; Prichard et al., 1998; Xu et al., 2004). While the oriLyt core region is sufficient for replication of an oriLyt-containing plasmid, the entire origin, including the flanking regions, is required for replication of the viral genome (Borst and Messerle, 2005). As in other herpesviruses, a core set of proteins is required for lytic replication of the viral genome: UL44 (DNA processivity factor), UL54 (DNA polymerase), UL70 (primase), UL105 (helicase), UL102 (primase-associated factor), UL57 (single-stranded DNA-binding protein) and the UL112-UL113 locus (phosphoproteins). Two additional viral proteins, UL84 and IE2 (UL122), are also required for the replication of HCMV DNA (Xu et al., 2004).

The requirement for the oriLyt flanking regions in viral DNA replication, the presence of predicted UL34-binding sites, and the in vivo binding of pUL34 to the predicted UL34-oriLyt binding sites (Chapter 3) suggested that pUL34-DNA interactions may contribute to viral DNA replication or overall viral growth.

Aims

Do UL34-DNA binding interactions at the oriLyt region contribute to the viral-mediated replication of a plasmid containing oriLyt?

Human cytomegalovirus lytic-phase DNA replication is controlled by several viral and human proteins; many of these proteins have DNA-binding activity and interact with the viral genome at or near the origin of lytic replication. The UL34 proteins bind DNA and are expressed during viral DNA replication (Biegalke et al., 2004; Lester et al., 2006); furthermore, the UL34 proteins bind to UL34-binding sites in the oriLyt region (Chapter 3). This portion of the study aims to determine the functional role of the UL34-oriLyt binding sites by mutating these sites in an oriLyt-containing plasmid and measuring plasmid replication efficiency during viral infection.

Do UL34-DNA binding interactions at the oriLyt region contribute to overall viral growth?

The plasmid replication study is limited in that only viral-mediated replication of an oriLyt-containing plasmid can be examined. It is unclear whether changes in DNA replication efficiency caused by mutating the UL34-oriLyt binding sites would influence viral growth. To determine whether UL34-DNA binding interactions at the oriLyt region contribute to the overall growth of HCMV, identical mutations need to be introduced into the viral genome and the properties of the resulting viruses characterized.

Results

UL34-binding sites enhance the efficiency of oriLyt-dependent DNA replication.

The wild-type (wt) oriLyt from HCMV AD169, including the oriLyt core and flanking regions, mediates efficient replication of a plasmid (pSP50; wt) in transient replication assays (Anders et al., 1992; Fig. 7A). The contributions of the UL34-binding sites to oriLyt-dependent DNA replication were examined by mutating the binding sites in the wt plasmid, either individually or in combination (Fig. 7B). The resulting plasmids were analyzed for their ability to replicate in a transient replication assay, and their replication was compared to that of the wild-type parental plasmid. The binding site identified as peak c in the ChIP-seq analysis (Chapter 3) was not mutated, because others have demonstrated that mutations in this region of oriLyt result in replication-defective viruses (Zhu et al., 1998). The plasmids were introduced into human fibroblasts by transfection; transfected cells were subsequently infected with HCMV to provide the proteins needed for replication of the oriLyt plasmid. Southern blot analysis and qPCR were used to examine oriLyt-dependent DNA replication of the wt and mutated plasmids. Representative Southern blot results are shown in Fig. 7C; the efficiency of plasmid replication was also determined using qPCR (Fig. 7D).


Figure 7. Transient replication assays.

(A) Diagram of the oriLyt and flanking regions from HCMV strain AD169. The UL34-binding sites are indicated by vertical black bars; the oriLyt core region is indicated by a dark gray rectangle. The locations of the C/EBPα and IE2 binding sites, the oriLytPM and the RNA stem-loop are also indicated. (B) Diagrams of the oriLyt region with one or more mutated UL34-binding sites. The presence of a UL34-binding site is indicated by a vertical black bar; replacement of a UL34-binding site at the location of ori2 in the ori2 mutant rescue is indicated by a white bar. The ori1 binding site was mutated from AAACACCGTG to AACATATG; the ori2 binding site was mutated from AAACACCGTG to AAGATCT; the ori3 binding site was mutated from AAACACCGTT to AAGGCCT. (C) Southern blot analysis of replicated DNA. Following transfection and infection, extracted DNAs were digested with KpnI and DpnI prior to blotting and probing with an oriLyt-specific probe. The HCMV genomic and plasmid oriLyt sequences are indicated on the right. (D) Multiplex qPCR of replicated DNA. DNAs were digested with DpnI to eliminate input plasmid DNA before being used as template for qPCR with viral, plasmid, and positive control targets. Replication efficiency of the oriLyt plasmids was calculated using the 2^(−ΔΔCT) method (Livak and Schmittgen, 2001), adjusted for reaction efficiency (positive control) and loading level (viral), with the CT value of the wild-type oriLyt plasmid serving as the baseline replication efficiency, which was set at 100%. Experiments were performed in biological triplicate.


Three primer/probe combinations were used in the qPCR reactions: an internal control to monitor variation in amplification efficiency; a primer/probe set specific for viral genomic sequences at the US3 gene, used to normalize for total DNA in each sample; and a primer/probe set that detected the replicated plasmids by targeting sequences within the vector. The relative amount of replicated plasmid DNA was normalized to the internal control and corrected for variation in the relative total amount of DNA in the sample. Transfection efficiency was monitored by detecting DNA fragments arising from input plasmid (following digestion with DpnI), and DNA replication rates were standardized accordingly.
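The normalization described above follows the comparative-CT logic of the 2^(−ΔΔCT) method (Livak and Schmittgen, 2001). The minimal sketch below assumes approximately 100% amplification efficiency and uses only the viral US3 target as the reference; the efficiency and transfection-efficiency adjustments described in the Figure 7 legend and in the text are not reproduced, and the function names and Ct values are hypothetical.

```python
def delta_ct(ct_target: float, ct_reference: float) -> float:
    """ΔCT: target Ct minus reference (here the viral US3 target) Ct."""
    return ct_target - ct_reference


def relative_replication(ct_plasmid: float, ct_viral: float,
                         ct_plasmid_wt: float, ct_viral_wt: float) -> float:
    """Replication as % of the wild-type oriLyt plasmid via 2^(-ΔΔCT).

    Assumes ~100% amplification efficiency for both targets.
    """
    ddct = delta_ct(ct_plasmid, ct_viral) - delta_ct(ct_plasmid_wt, ct_viral_wt)
    return 100.0 * 2.0 ** (-ddct)


if __name__ == "__main__":
    # Hypothetical Ct values: at the same viral load, the mutant plasmid crosses
    # threshold one cycle later than wild type, i.e. half as much replicated plasmid.
    print(relative_replication(21.0, 18.0, 20.0, 18.0))  # -> 50.0
```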

The results are expressed as replication percentages, with the replication efficiency of the wild-type plasmid set at 100%. The individual mutations in UL34-binding sites 1 and 3 had modest effects on oriLyt-dependent plasmid replication, with replication efficiencies averaging 78% and 66% of wild type, respectively (Fig. 7D). The individual mutation of UL34-binding site 2 had the most dramatic effect, reducing plasmid replication to 38% of wild type. The plasmid with mutations in UL34-binding sites 1 and 3 replicated at an efficiency of 67%, similar to either individual mutant, suggesting that these two binding sites do not act synergistically. Likewise, mutating both UL34-binding sites 2 and 3 reduced the replication efficiency to 29%, similar to the result of mutating UL34-binding site 2 alone but with a slight additive effect. The second UL34-binding site, ori2, had the largest role in oriLyt-dependent DNA replication, perhaps due in part to its location closest to the oriLyt core.
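One way to make the "synergistic" and "additive" comparisons quantitative is to compare each double mutant with the value expected if the two mutations acted independently, that is, multiplicatively on replication efficiency. This independence model is an assumption introduced here for illustration, not the analysis used in the study. With the percentages above, it predicts about 51% for sites 1 and 3 and about 25% for sites 2 and 3, compared with the observed 67% and 29%.

```python
def expected_if_independent(*percent_of_wt: float) -> float:
    """Expected % of wild type if each mutation scales replication independently."""
    expected = 100.0
    for p in percent_of_wt:
        expected *= p / 100.0
    return expected


# Single- and double-mutant values (% of wild type) reported in the text.
SINGLE = {"ori1": 78, "ori2": 38, "ori3": 66}
OBSERVED_DOUBLE = {("ori1", "ori3"): 67, ("ori2", "ori3"): 29}

if __name__ == "__main__":
    for pair, observed in OBSERVED_DOUBLE.items():
        expected = expected_if_independent(*(SINGLE[s] for s in pair))
        print(f"{'+'.join(pair)}: observed {observed}%, expected {expected:.1f}%")
```

Under this assumed model, both double mutants replicate somewhat better than a purely multiplicative combination of the single mutants would predict.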

The insertion of a wild-type UL34-binding site back into the mutated UL34-binding site 2 completely restored the replication efficiency of the oriLyt-containing plasmid, even in the presence of additional sequences corresponding to restriction enzyme sites (Fig. 7C and D). The ability of the UL34-oriLyt-2 mutant rescue plasmid to replicate at wild-type levels suggests that altered secondary structure was not responsible for the decreased replication efficiency of the UL34-oriLyt-2 mutant.
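The replacement sequences listed in the Figure 7 legend appear to carry recognition sites for common restriction enzymes, which fits the mention of restriction-site sequences above. The sketch below checks this. The enzyme assignments (NdeI CATATG, BglII AGATCT, StuI AGGCCT) are an inference from the sequences, not enzymes named in the text, and the helper names are hypothetical.

```python
# Recognition sequences (5'->3'); all three are palindromic, so scanning one strand suffices.
RECOGNITION_SITES = {"NdeI": "CATATG", "BglII": "AGATCT", "StuI": "AGGCCT"}


def sites_present(seq: str) -> list:
    """Names of the enzymes above whose recognition sequence occurs in `seq`."""
    seq = seq.upper()
    return [name for name, site in RECOGNITION_SITES.items() if site in seq]


# Wild-type and replacement sequences from the Figure 7 legend.
MUTATIONS = {
    "ori1": ("AAACACCGTG", "AACATATG"),
    "ori2": ("AAACACCGTG", "AAGATCT"),
    "ori3": ("AAACACCGTT", "AAGGCCT"),
}

if __name__ == "__main__":
    for name, (wt, mut) in MUTATIONS.items():
        print(name, sites_present(wt), "->", sites_present(mut))
```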

Furthermore, mutation of sequences between the UL34-oriLyt-2 binding site and the core region (but not at a known transcription factor binding site) had no discernible effect on plasmid DNA replication mediated by HCMV infection (Fig. 7D). No plasmid DNA replication was detected following transfection of the wild-type oriLyt plasmid in the absence of viral infection.

Elements of dyad symmetry are common features of replication origins in DNA viruses, allowing local distortion or melting during the initiation of replication (Fanning and Zhao, 2009). To examine the impact of the UL34-binding site mutations on potential secondary structures in oriLyt, folding predictions were performed with mFOLD (Zuker, 2003) on the wild-type and mutant oriLyt sequences. The predicted folding patterns were not altered by the UL34-binding site mutations (data not shown).

Viruses with mutant oriLyt-UL34 binding sites replicate with slower kinetics.

The decreased replication efficiency of a plasmid containing oriLyt with mutated UL34-binding sites led us to postulate that disrupting pUL34-DNA interactions in the oriLyt region would cause a viral growth defect. Through targeted recombination, we introduced mutations into each of the UL34-oriLyt binding sites in the parental BAC, pHB5. Stocks of mutant and wild-type viruses were prepared and analyzed from duplicate isolates. The concentration of infectious virus was determined by plaque assay, and the relative number of genome copies/pfu was determined by qPCR. These data were used to determine how many genome copies were required to form a plaque for each virus. The number of genomes required to generate a plaque varied between the mutant viruses. Compared to wild-type virus, approximately twenty-fold more genomes were required to form a plaque following infection with either the UL34-oriLyt-1 or the UL34-oriLyt-2 mutant virus, while the UL34-oriLyt-3 mutant virus required approximately 200-fold more genomes to generate a plaque (Fig. 8). One-step growth curves were performed by infecting fibroblasts with equivalent numbers of viral genomes, with the wild-type virus infection corresponding to 0.1 pfu/cell. Growth curves were calculated from data obtained from duplicate mutant viruses produced from separately isolated BACs. The duplicate mutants replicated with nearly identical kinetics; thus, the observed effects are unlikely to result from random mutations.
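The genomes-per-plaque comparison and the equivalent-genome inoculum both follow from simple ratios. The sketch below is illustrative only. The helper names are hypothetical, and the numbers in the usage lines are arbitrary toy values, not titers from this study.

```python
def genomes_per_pfu(genome_copies_per_ml, pfu_per_ml):
    """Genome copies (qPCR) per plaque-forming unit (plaque assay) for one stock."""
    if pfu_per_ml <= 0:
        raise ValueError("titer must be positive")
    return genome_copies_per_ml / pfu_per_ml


def fold_more_genomes(mutant_gpp, wild_type_gpp):
    """How many times more genomes a mutant needs per plaque than wild type."""
    return mutant_gpp / wild_type_gpp


def genomes_for_infection(cells, wt_pfu_per_cell, wt_gpp):
    """Genome copies matching a wild-type infection at `wt_pfu_per_cell` (0.1 in the text)."""
    return cells * wt_pfu_per_cell * wt_gpp


def volume_for_genomes_ml(target_genomes, genome_copies_per_ml):
    """Volume of a stock (ml) that delivers `target_genomes` genome copies."""
    return target_genomes / genome_copies_per_ml


if __name__ == "__main__":
    # Toy stocks (arbitrary values, not data from this study).
    wt_gpp = genomes_per_pfu(genome_copies_per_ml=1e9, pfu_per_ml=1e6)
    mut_gpp = genomes_per_pfu(genome_copies_per_ml=1e9, pfu_per_ml=1e5)
    print("fold more genomes per plaque:", fold_more_genomes(mut_gpp, wt_gpp))
    target = genomes_for_infection(cells=1e5, wt_pfu_per_cell=0.1, wt_gpp=wt_gpp)
    print("ml of mutant stock:", volume_for_genomes_ml(target, genome_copies_per_ml=1e9))
```

Dosing by genomes rather than by pfu means that a mutant with a high genome-to-pfu ratio receives fewer infectious units than wild type in the same experiment.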

Wild-type virus (reconstituted from the parental BAC, pHB5) grew with standard kinetics, reaching > 1 × 10⁶ infectious units per ml by 7 days post infection (Fig. 9A).

Viruses with mutations in the UL34-oriLyt-1 or UL34-oriLyt-2 binding site produced 2 and 3 logs less virus, respectively, than wild-type virus at 5 days post infection. Maximal titers of these two mutant viruses were reached at 10 days post infection and were reduced relative to wild-type virus. The UL34-oriLyt-3 mutant virus exhibited a major replication defect: production of infectious virus was delayed by 7 days, and yields differed from wild type by ~5 logs.
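Titer differences of this kind are expressed on a log scale, and duplicate isolates are summarized as a mean with one standard deviation (Figure 9 legend). The following is a minimal sketch. It assumes titers in pfu/ml and simple arithmetic averaging of the duplicates. The function names are hypothetical and the inputs are toy values.

```python
import math
import statistics


def summarize_isolates(titers_pfu_per_ml):
    """Mean and one sample standard deviation across isolates at one time point."""
    return statistics.mean(titers_pfu_per_ml), statistics.stdev(titers_pfu_per_ml)


def log10_difference(wild_type_titer, mutant_titer):
    """Yield difference in logs (log10 units); 3.0 means 1,000-fold less virus."""
    return math.log10(wild_type_titer) - math.log10(mutant_titer)


if __name__ == "__main__":
    # Toy duplicate titers (pfu/ml) at one time point; not data from this study.
    mean, sd = summarize_isolates([2.0e3, 4.0e3])
    print(f"mutant: {mean:.3g} +/- {sd:.3g} pfu/ml")
    print(f"log10 difference vs a 3e6 pfu/ml wild type: {log10_difference(3.0e6, mean):.2f}")
```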

Figure 8. Number of genomes required to generate a single plaque

An equivalent number of plaque-forming units of each virus (pHB5, ori1, ori2, ori3) was subjected to ultracentrifugation through a 20% sucrose cushion prior to DNA extraction. Viral genome copy number was determined from the extracted DNA by qPCR. The number of genomes required to generate a single plaque is displayed relative to the unmodified parental virus. Student’s t-test was used for statistical analysis. *, p < 0.05; unlabeled data points were not significantly different.
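The caption specifies Student's t-test. Below is a minimal sketch of the equal-variance (pooled) two-sample t statistic using only the Python standard library. It returns the statistic and degrees of freedom but not a p-value, which requires the t distribution. For example, `scipy.stats.ttest_ind` with its default `equal_var=True` corresponds to this form. The legend does not state whether a two-sample, paired, or one-sample test was used, so the two-sample form here is an assumption.

```python
import math
import statistics


def students_t(sample_a, sample_b):
    """Two-sample Student's t statistic with pooled (equal) variance.

    Returns (t, degrees_of_freedom). Each group needs at least two values
    and the pooled variance must be non-zero.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two replicates")
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(sample_a)
                  + (n_b - 1) * statistics.variance(sample_b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("pooled variance is zero; t is undefined")
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / std_err
    return t, df


if __name__ == "__main__":
    # Toy replicate values, not data from this study.
    t, df = students_t([1.0, 1.2], [20.0, 24.0])
    print(f"t = {t:.2f}, df = {df}")
```

With two replicates per group (df = 2), |t| must exceed about 4.303 to reach significance in a two-sided test at α = 0.05.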


Figure 9. Mutant virus analysis

(A) One-step growth curve. Viruses reconstituted from an unmodified BAC (pHB5) or from BACs with mutations in the UL34-oriLyt binding sites (ori1, ori2, ori3) were assessed for their ability to replicate in a one-step growth curve. Equivalent numbers of genomes for each virus (corresponding to 0.1 pfu/cell for the wild-type virus) were used to infect human fibroblasts. Supernatants were titered to determine the number of pfu at the indicated time points after infection. The numbers of pfu produced from duplicate isolates of the mutant viruses were averaged, and the error bars reflect one standard deviation between the duplicate mutant isolates. Collection of supernatants was halted at 7 days post infection for the wild-type virus due to complete destruction of the cell monolayer.

(B) Transcript levels at 48 hours post infection. Human fibroblasts were infected as above. Total RNA was isolated at 48 hpi, converted to cDNA, and used in qPCR with primers specific to UL57 or RNA4.9. Fold change compared to wild type was calculated using the 2^(−ΔΔCT) method (Livak and Schmittgen, 2001), adjusted for loading (human beta-actin). The dotted line indicates wild-type expression levels, which were set to 1.

For statistics, Student’s t-test was performed. *, p <0.05; **, p<0.01; unlabeled data points were not significantly different.
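The 2^(−ΔΔCT) calculation cited in the legend can be written out explicitly. This sketch follows the Livak and Schmittgen (2001) formula with a single reference gene (beta-actin, as in the legend). It assumes that the target and reference primers amplify with similar efficiencies close to 100%. The argument names are hypothetical.

```python
def fold_change_ddct(ct_target_sample, ct_ref_sample,
                     ct_target_calibrator, ct_ref_calibrator):
    """Relative expression by the 2^-ddCt method.

    sample     = mutant-virus-infected cells
    calibrator = wild-type-infected cells (fold change fixed at 1)
    ref        = loading-control gene (human beta-actin in the legend)
    Assumes target and reference amplify with similar, near-100% efficiency.
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Toy Ct values, not data from this study.
    print(fold_change_ddct(23.0, 18.0, 22.0, 18.0))
```

For example, if the mutant sample's target Ct is one cycle later than wild type and the beta-actin Ct values are identical, the fold change is 0.5.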


A

B


The pUL34-binding sites are located near the UL57 promoter (Kiehl et al., 2003) (ori-1 and ori-2) and within RNA4.9 (ori-3). UL57 encodes the single-stranded DNA-binding protein, while RNA4.9 is a long non-coding transcript. RNA4.9 is abundantly transcribed during infection, although its role in lytic replication has been unclear (Gatherer et al., 2011). Recently, RNA4.9 was identified in pre-replication compartments and shown to be necessary for viral DNA replication and growth (Ezra et al., 2017). We examined transcript levels of UL57 and RNA4.9 in the mutant and wild-type viruses at 48 hours post infection using quantitative RT-PCR (Fig. 9B). Transcript levels for the viral genes were normalized to the relative amounts of beta-actin (loading control) and UL44 (reference gene) transcripts. The UL34-oriLyt-1 and UL34-oriLyt-2 viruses expressed UL57 and RNA4.9 at levels comparable to or greater than those seen for wild-type virus. In contrast, transcript levels of UL57 and RNA4.9 were moderately reduced for the UL34-oriLyt-3 virus.

pUL34 interacts with viral proteins involved in DNA replication.

pUL34 localizes to viral replication compartments with a pattern similar to that seen for IE2 and UL44 (Rana and Biegalke, 2014), suggesting that pUL34 may interact with proteins present in these compartments. To examine potential interactions of pUL34 with proteins that have a known role in viral replication, co-precipitation experiments were performed using a candidate approach. pUL34 and associated proteins were co-precipitated using anti-UL34 antiserum, followed by immunoblot analyses to detect known viral DNA replication proteins. Initially, cell lysates were harvested from infected cells at 48 hpi. pUL34 was immunoprecipitated, and associated proteins were analyzed for the presence of IE2 proteins by immunoblotting. The 40 kDa IE2 protein co-precipitated with pUL34 (Fig. 10A). With prolonged exposure, the 86 kDa IE2 protein was also detected co-precipitating with pUL34 (data not shown). To confirm the interaction of pUL34 with IE2, GST pull-down assays were performed by incubating in vitro-synthesized pUL34 with immobilized GST-IE2, or with GST alone as a control (Fig. 10B). After washing, bound proteins were analyzed by protein gel electrophoresis and autoradiography. As shown in Fig. 10C, full-length pUL34 and the amino-terminal domain of pUL34 bound to GST-IE2 but not to GST. As a control, luciferase was synthesized in vitro. Luciferase did not bind to either GST or GST-IE2.

These experiments demonstrate that pUL34 interacts with IE2 in vitro and exists in a complex with IE2 within infected cells undergoing active viral replication.

Experiments were then performed to determine whether pUL34 interacts with two other proteins involved in viral DNA replication, UL84 and UL44. For these experiments, cells were mock-infected or infected with either wild-type virus or myc-UL34 AD169. Cell lysates were immunoprecipitated with antisera to pUL34 or with a monoclonal antibody to the myc-tag epitope and then analyzed by immunoblotting using monoclonal antibodies to the viral proteins. As shown in Fig. 10D-I, no viral proteins were detected in uninfected cells. UL44 was expressed in wild-type-infected cells and co-precipitated with pUL34 (Fig. 10D and 10H). Similarly, UL84 co-precipitated with pUL34 (Fig. 10E and 10I). In Fig. 10I, the UL84 bands in the immunoprecipitated lane for Myc-UL34-AD169 are slightly shifted because the gel tore during transfer. As a potential control for detection of non-specific interactions, we also


Figure 10. Protein-protein interactions

(A) Co-immunoprecipitation of IE2 with pUL34. The arrow indicates the co-precipitated IE2 protein. (B) Expression of GST and GST-IE2 proteins. (C) GST pull-down assays. The input lanes reflect the in vitro-translated proteins added to the GST or GST-IE2 proteins. (D-I) Mock- or HCMV-infected (Wild-type; Myc-UL34-AD169) cells were harvested using co-immunoprecipitation buffer. After the lysate was precleared with normal serum, antisera to pUL34 or a monoclonal antibody to the myc-tag epitope were used to immunoprecipitate pUL34 and associated proteins. Co-precipitated proteins were analyzed by immunoblotting using antibodies specific for (D, H) UL44, (E, I) UL84, (F) IE1, or (G) UL34. UL34 is expressed as 3 related proteins (Biegalke et al., 2004) that are similar in size to the heavy chain (*) of the anti-UL34 antiserum.

Experiments in panels A to G were performed by Janet Hammer, a laboratory technician in the Biegalke lab.


A B

C


Figure 10: Continued

D

E

F

G

H

I

analyzed the co-precipitates for the presence of IE1. IE1 localizes to the nuclei of infected cells early in infection but is not required for viral DNA replication, and IE1 did not co-precipitate with pUL34 (Fig. 10F). Additionally, the protein-protein interactions were detected following immunoprecipitation with either the anti-UL34 antisera or the anti-myc antibody, confirming the specificity of the interactions.

Discussion

Lytic replication of the HCMV genome requires the complete oriLyt region (Borst and Messerle, 2005), six core replication proteins, and additional factors. These include UL84, the p43 and p84 isoforms of UL112-113 (Schommartz et al., 2017; Yamamoto et al., 1998), and IE2 (UL122) (Pari and Anders, 1993). The localization of pUL34 to viral replication compartments (Biegalke, 2013) and the interaction of pUL34 with binding sites present in oriLyt (Chapter 3) led to the hypothesis that pUL34 plays a role in viral DNA replication. These experiments confirm that the pUL34-binding sites in the oriLyt region are important for overall viral replication and for efficient viral-mediated replication of an oriLyt-containing plasmid.

Mutation of the UL34-oriLyt-2 binding site alone, or in combination with UL34-oriLyt-3, significantly reduced the viral-mediated replication efficiency of the oriLyt-containing plasmid. Although pUL34 is not one of the identified viral replication proteins (Pari and Anders, 1993), these results suggest that pUL34 increases the efficiency of oriLyt-dependent DNA replication.

Viruses with mutations in the UL34-oriLyt-2 or UL34-oriLyt-3 binding site replicated with slower kinetics. Intriguingly, deletion of the flanking regions that contain these binding sites results in defective viral growth (Borst and Messerle, 2005). Furthermore, Borst and Messerle (2005) reported that neither inversion of the oriLyt core nor insertion of the oriLyt core into a non-essential region was sufficient to rescue oriLyt-dependent DNA replication in a virus lacking oriLyt. Their results highlight the positional importance of the flanking regions, which contain pUL34-binding sites. The pUL34-oriLyt-1 and 2 binding sites are located near the UL57 promoter, while the pUL34-oriLyt-3 binding site lies within the long non-coding RNA4.9. Mutation of the pUL34-oriLyt-3 binding site had minimal effects on replication of the oriLyt plasmid but a dramatic effect on viral replication. This suggests that the mutation alters aspects of viral replication other than DNA replication. The magnitude of the viral growth defect of the UL34-oriLyt-3 mutant was surprising, given the modest decrease in oriLyt-dependent DNA replication efficiency (Figure 7C, 7D). The growth defect of the UL34-oriLyt-3 mutant virus may result from decreased expression of RNA4.9 or from structural changes to the non-coding RNA itself. Further analysis of this virus may provide insights into the role of RNA4.9 in viral replication.

In addition to binding to the oriLyt region during viral replication, pUL34 associates with the IE2, UL44 and UL84 proteins, all of which are required for viral DNA replication. Given the DNA-binding activity of pUL34, protein-DNA interactions may contribute to the formation of these protein complexes. Additionally, targeted mutations to the UL34-oriLyt binding sequences in an oriLyt-containing plasmid and within the viral genome reduced viral-mediated DNA replication and viral growth, respectively. These data indicate that pUL34 functions in the viral DNA replication cascade, beyond its previously defined role in transcriptional repression (LaPierre and Biegalke, 2001). This newly proposed role of pUL34 in viral DNA replication is further supported by the identification of pUL34 on nascent DNA at the oriLyt region (C. Rossetto, personal communication).


CHAPTER 5: BAC MUTAGENESIS WITH GBLOCKS AND THE UL34 BINDING SITE IN EXON 3 OF THE UL37 OPEN READING FRAME

Background

The HCMV UL37 gene is transcribed during the immediate-early phase of lytic infection. UL37 contains three exons that are alternatively spliced to yield three transcripts of importance: UL37x1 (exon 1 only), UL37m (all three exons, with a shortened exon 3) and UL37 (the full-length transcript) (Adair et al., 2003). Additional transcripts are generated to a much lesser degree. These transcripts are translated into the proteins pUL37x1, glycoprotein (gp) UL37m, and gpUL37, respectively. pUL37x1, also referred to as the viral mitochondrial-localized inhibitor of apoptosis (vMIA), is an essential protein that is expressed immediately after infection (Goldmacher, 2002). vMIA localizes to mitochondria and prevents permeabilization of the outer membrane, thereby inhibiting cytochrome c release and apoptosis (Bozidis et al., 2010; Goldmacher et al., 1999).

The glycoproteins gpUL37m and gpUL37 have functions similar to those of pUL37x1 but are less abundantly expressed. Interestingly, full-length gpUL37 is proteolytically cleaved into two halves that are trafficked to different organelles. The N-terminal half, which consists primarily of pUL37x1 sequences, localizes to mitochondria, while the C-terminal half, which contains a transmembrane domain, localizes to the endoplasmic reticulum (Mavinakere and Colberg-Poley, 2004).

A UL34-binding site is located within exon 3 of the full-length transcript but is not present in the shortened UL37m transcript. We were interested in investigating the role of this UL34-binding site in the UL37 gene. In doing so, we developed an approach that bypasses a limitation of PCR-based transfer construct generation for “en passant” bacterial artificial chromosome (BAC) mutagenesis. Currently, BACs are used primarily in transgenic expression studies encompassing entire regulatory regions (Cubells et al., 2016; Lee et al., 2015). They are also used to maintain full viral genomes as infectious clones, as with human cytomegalovirus (Akhmedzhanov et al., 2017; Almazán et al., 2006; Borst et al., 1999). The primary method of targeted BAC mutagenesis is homologous recombination with introduced dsDNA carrying the desired mutation. The lambda Red recombination-based method of “en passant” mutagenesis is elegant and useful because it leaves no recombination scars in the mutated sequence (Tischer et al., 2010). However, this technique relies on a linear dsDNA transfer molecule generated by PCR. The PCR uses long primers that contain sequences homologous to both a selection marker and the specific target. These long primers can form complex secondary structures, making generation of the construct difficult.

Aims

Does a commercially synthesized transfer construct provide time and cost-saving benefits to the “en-passant” BAC mutagenesis technique?

Through the use of “en passant” mutagenesis, scarless insertions, deletions and substitutions can be made to a BAC within the bacterium (Tischer et al., 2006, 2010). The technique is powerful and well-designed. However, if the researcher is unable to create the dsDNA “transfer construct” by PCR, the experiment reaches a standstill. Improvements to commercially available solid-phase synthesized DNA fragments have lowered the cost per nucleotide to a level that makes them feasible for use in recombination techniques. Using the gBlocks service from Integrated DNA Technologies, a transfer construct will be synthesized and tested for recombination in place of one generated by PCR.

Does a virus with a mutant UL34-binding site in exon 3 of the UL37 transcript have an altered replication rate?

The UL34-binding site within the third exon of the UL37 transcript is unique in that it occurs within the coding region of a spliced viral gene; spliced transcripts within the HCMV genome are uncommon. However, this binding site has the lowest frequency of conservation (83%) amongst clinical isolates (Chapter 3). Regardless, the virus created with the modified BAC mutagenesis technique described in this chapter should be tested for improvements or deficiencies in viral replication. This aim will be answered with a one-step growth curve as well as measurement of UL37 transcripts.

Results

Use of a solid-phase synthesized transfer construct saves both time and cost during “en passant” BAC mutagenesis.

The goal of this experiment was to insert mutations into the open reading frame of the human cytomegalovirus gene UL37 using the BAC pHB5. The parental BAC, pHB5, contains the genome of human cytomegalovirus (HCMV) strain AD169. In pHB5, the BAC maintenance and chloramphenicol resistance sequences are inserted in place of the viral genes US2 through US6 (Borst et al., 1999). Initially, insertion of the desired mutations was attempted using the described “en passant” mutagenesis strategy. Long primers (75 and 81 base pairs) were generated that targeted exon 3 of the HCMV UL37 open reading frame and contained the desired mutations (Table 7). The primers were used to amplify the kanamycin resistance gene and the I-SceI recognition site from the template plasmid pEP-KanS. Despite previous success using this strategy to generate transfer constructs targeting other viral loci (data not shown), the UL37 transfer construct could not be generated by PCR. The secondary structure of the long primers was analyzed with DINAMelt (Markham and Zuker, 2005, 2008), which predicted that the UL37 primers would form extensive secondary structures (Figure 11A). Attempts to amplify the PCR product using different polymerases, annealing temperatures and primer concentrations were all unsuccessful, suggesting that the structure of the primers was preventing successful amplification.
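DINAMelt performs thermodynamic folding. A much cruder check is to search for stretches within or between the primers whose reverse complements are also present. The sketch below is a naive string heuristic that ignores mismatches, loop constraints and free energy. It is not a substitute for the DINAMelt analysis. The primer sequences are copied from Table 7, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_complementary_stretch(a, b):
    """Longest substring of `a` whose reverse complement also occurs in `b`.

    Pass the same primer twice to look for self-complementarity (hairpins or
    self-dimers), or two different primers to look for primer-primer pairing.
    Exact matches only; returns (length, substring of a).
    """
    a, b = a.upper(), b.upper()
    for length in range(len(a), 0, -1):
        for start in range(len(a) - length + 1):
            piece = a[start:start + length]
            if reverse_complement(piece) in b:
                return length, piece
    return 0, ""


# Primer sequences copied from Table 7 (5' to 3').
UL37_KANR_F = ("GTTGTTGACGCTGCTTTGGAGCACGGGCCATGGGGTGAGCGTCCGATGTACTAGGGAT"
               "AACAGGGTAATCGATTT")
UL37_KANR_R = ("CGTCCGTACCGTGGTACGTACATCGGACGCTCACCCCATGGCCCGTGCTCCAAAGCAG"
               "GCCAGTGTTACAACCAATTAACC")

if __name__ == "__main__":
    print("F self:", longest_complementary_stretch(UL37_KANR_F, UL37_KANR_F))
    print("R self:", longest_complementary_stretch(UL37_KANR_R, UL37_KANR_R))
    print("F vs R:", longest_complementary_stretch(UL37_KANR_F, UL37_KANR_R))
```

Both primers contain short palindromes, such as the CCATGG motif. The forward and reverse primers also share a complementary stretch of at least 40 nt in their UL37 portions. This is consistent with the overlapping homology that produces the duplicated target sequence in an en passant transfer construct.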

To obtain the mutation of interest, a dsDNA transfer construct was designed and synthesized by solid-phase chemistry through the IDT gBlocks Gene Fragments service (Integrated DNA Technologies, Iowa, USA). The gBlock was designed to contain the kanR sequences flanked on both ends by sequences homologous to UL37 exon 3 carrying the desired mutations, together with the I-SceI sites (Table 8; Figure 11B). Initially, the proposed gBlock was rejected during the complexity analysis because of repeated sequences at both ends. To bypass this issue, an additional 70 base pairs were added to each end of the proposed gBlock (total length 1250 base pairs). Because of the IDT pricing tier system, this addition did not increase the price. The final


Table 7. Sequence of the long primers which failed to generate a transfer construct.

Sequences highlighted in yellow indicate the portion of the primer that matches the pEP-KanS template. Sequences highlighted in green match the UL37 coding sequence. Yellow region, I-SceI sites. Nucleotide substitutions are indicated with white lettering on a black background. The long primers were PAGE purified. Primers were obtained from Integrated DNA Technologies, Coralville, IA.

UL37-kanR-F (top) and UL37-kanR-R (bottom)

5’-GTTGTTGACGCTGCTTTGGAGCACGGGCCATGGGGTGAGCGTCCGATGTACTAGGGATAACAGGGTAATCGATTT-3’

5’-CGTCCGTACCGTGGTACGTACATCGGACGCTCACCCCATGGCCCGTGCTCCAAAGCAGGCCAGTGTTACAACCAATTAACC-3’


Figure 11. Secondary structure of long primers and design of gBlock gene fragment.

(A) Secondary structure of the 75-base pair (left) and 81-base pair (right) long primers used for the attempts to generate a transfer construct by PCR. Sequences were folded on the DINAMelt web server with settings mimicking PCR conditions. The red dotted boxes represent the portion of the long primers which anneal to the plasmid template, pEP-KanS.

(B) To-scale schematic representation of the long primers (top left, bottom right) and gBlock Gene Fragment (middle) utilized in this study. Dark gray region, kanamycin resistance (broken for scale purposes); yellow region, I-SceI sites; green region, UL37 coding sequences with intended mutation; blue region, additional UL37 coding sequences required for synthesis. The long primers are 75 and 81 base pairs respectively; the gBlock Gene Fragment is 1250 base pairs.


A

B

product contained the repeated sequences further inside the DNA molecule, which allowed the gBlock to be synthesized successfully. The addition of 70 more nucleotides homologous to the UL37 gene also increased the specificity of the transfer construct for the target of interest, by increasing the total length of each homology arm to 130 base pairs.
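The repeat that triggered the complexity rejection can be located with a standard longest-common-substring search. The sketch below compares two excerpts copied from Table 8, one near the 5′ end and one near the 3′ end of the final gBlock. The excerpt boundaries are chosen for illustration only, and IDT's actual complexity criteria are not reproduced here.

```python
def longest_shared_substring(a, b):
    """Longest exact substring common to `a` and `b` (dynamic programming, O(len(a) * len(b)))."""
    a, b = a.upper(), b.upper()
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run ending at a[i-1] and b[j-1].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


# Excerpts copied from Table 8 (5' to 3'), one near each end of the final gBlock.
FIVE_PRIME_EXCERPT = "GGCTGTTGTTGACGCTGCTTTGGAGCACGGGCCATGGGGTGAGCGTCCGATGTACTAGGGATAACAGGG"
THREE_PRIME_EXCERPT = "TGTAACACTGGCCTGCTTTGGAGCACGGGCCATGGGGTGAGCGTCCGATGTACGTACCACG"

if __name__ == "__main__":
    repeat = longest_shared_substring(FIVE_PRIME_EXCERPT, THREE_PRIME_EXCERPT)
    print(len(repeat), repeat)
```

The search finds a shared segment of at least 41 nt of UL37 sequence present near both ends of the construct. This is consistent with the duplicated homology that en passant transfer constructs carry for the second, marker-excising recombination.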

Generation of the recombinant BAC requires electrocompetent GS1783 cells harboring the parental BAC, pHB5. GS1783 is an E. coli strain that expresses Red recombination proteins (heat-inducible at 42 °C) as well as the I-SceI homing endonuclease (L-arabinose-inducible) (Tischer et al., 2010). Bacteria containing pHB5 were mixed briefly with the UL37-kanR gBlock and immediately electroporated (Bio-Rad MicroPulser Electroporator). Bacteria were removed from the cuvette and incubated in antibiotic-free LB broth to allow the culture to recover. The resulting culture was plated on LB agar containing 50 µg/ml kanamycin and 34 µg/ml chloramphenicol and incubated until colonies became visible (~2 days). Several colonies were selected for further analysis, and BAC DNA was extracted with a standard miniprep protocol.

To determine if recombinant BACs resulted from electroporation of the gBlock, a region of the UL37 gene that was targeted by the mutagenesis was amplified; the resulting PCR product was then sequenced. All colonies analyzed contained a BAC with a successful recombination; the intended mutations and kanamycin resistance gene were inserted into the desired target (n=10). Of the initial 10 colonies, two colonies were selected for an additional round of recombination for removal of the kanR gene by


Table 8. Sequence of the gBlock gene fragment.

Blue region, additional UL37 coding sequence required for synthesis. Green region, UL37 coding sequence. Yellow region, I-SceI sites. Un-highlighted region, kanamycin resistance sequence. Nucleotide substitutions are indicated with white lettering on a black background. The gBlock was obtained from Integrated DNA Technologies, Coralville, IA.

UL37-kanR gBlock

5’-GGAAAGGGGAGGGTAGAAACGTGAGTCTCCGTCAATAAAAAGTCCCGTGTCGTCTCCA

CGTAGGTTTCTGTGTGGCTGTTGTTGACGCTGCTTTGGAGCACGGGCCATGGGGTGAGCGTC

CGATGTACTAGGGATAACAGGGTAATCGATTTATTCAACAAAGCCACGTTGTGTCTCAAAA

TCTCTGATGTTACATTGCACAAGATAAAAATATATCATCATGAACAATAAAACTGTCTGCTT

ACATAAACAGTAATACAAGGGGTGTTATGAGCCATATTCAACGGGAAACGTCTTGCTCGAG

GCCGCGATTAAATTCCAACATGGATGCTGATTTATATGGGTATAAATGGGCTCGCGATAAT

GTCGGGCAATCAGGTGCGACAATCTATCGATTGTATGGGAAGCCCGATGCGCCAGAGTTGT

TTCTGAAACATGGCAAAGGTAGCGTTGCCAATGATGTTACAGATGAGATGGTCAGACTAAA

CTGGCTGACGGAATTTATGCCTCTTCCGACCATCAAGCATTTTATCCGTACTCCTGATGATG

CATGGTTACTCACCACTGCGATCCCCGGGAAAACAGCATTCCAGGTATTAGAAGAATATCC

TGATTCAGGTGAAAATATTGTTGATGCGCTGGCAGTGTTCCTGCGCCGGTTGCATTCGATTC

CTGTTTGTAATTGTCCTTTTAACAGCGATCGCGTATTTCGTCTCGCTCAGGCGCAATCACGA

ATGAATAACGGTTTGGTTGATGCGAGTGATTTTGATGACGAGCGTAATGGCTGGCCTGTTG

AACAAGTCTGGAAAGAAATGCATAAGCTTTTGCCATTCTCACCGGATTCAGTCGTCACTCAT

GGTGATTTCTCACTTGATAACCTTATTTTTGACGAGGGGAAATTAATAGGTTGTATTGATGT

TGGACGAGTCGGAATCGCAGACCGATACCAGGATCTTGCCATCCTATGGAACTGCCTCGGT

GAGTTTTCTCCTTCATTACAGAAACGGCTTTTTCAAAAATATGGTATTGATAATCCTGATAT

GAATAAATTGCAGTTTCATTTGATGCTCGATGAGTTTTTCTAATCAGAATTGGTTAATTGGT

TGTAACACTGGCCTGCTTTGGAGCACGGGCCATGGGGTGAGCGTCCGATGTACGTACCACG

GTACGGACGTTAATCGCACTTCCAACACCACGAGCATGAACTGCCACCTAAACTGCACACG

CAATCACACGCAAATCTACAACGG-3’

expression of the I-SceI homing endonuclease (introducing a double-stranded break), followed by expression of the Red recombination proteins. The resulting recombinants were plated on LB agar containing 34 µg/ml chloramphenicol and 1% L-arabinose and incubated until colonies were visible. Several colonies were tested for kanamycin sensitivity by replica plating, and all selected colonies were kanamycin sensitive (n=10).

DNA extraction, PCR amplification and sequencing were performed as described above. All analyzed colonies contained the desired mutations with no trace of the kanamycin resistance insertion. In summary, using the gBlocks gene fragment as a transfer construct allowed us to rapidly generate the desired mutation in the HCMV genome. Our modified protocol was both time- and cost-effective (Figure 12).
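Once a read is aligned to the target window, screening sequenced clones for the intended substitutions reduces to a position-by-position comparison. The following is a minimal sketch. It assumes gap-free alignment of equal-length sequences. The function name and the toy sequences are hypothetical and are not the UL37 sequences used here.

```python
def find_substitutions(reference, read):
    """List (1-based position, reference base, read base) for every mismatch.

    Assumes `read` has already been trimmed and aligned gap-free to the same
    window as `reference`, so both strings have equal length.
    """
    if len(reference) != len(read):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, read_base)
        for pos, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper()), start=1)
        if ref_base != read_base
    ]


if __name__ == "__main__":
    # Hypothetical toy sequences, not the UL37 target.
    designed = [(3, "G", "C"), (8, "T", "A")]
    observed = find_substitutions("ACGTACGT", "ACCTACGA")
    print("clone carries exactly the designed substitutions:", observed == designed)
```

A correctly recombined clone would report exactly the designed substitutions (seven in this study) and no others. Detecting insertions or deletions would require a proper aligner.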

The UL37-exon3-mutant virus replicates with nearly wild-type kinetics.

The resulting UL37 mutant BAC, with an abrogated UL34-binding site in exon 3 of the UL37 gene, was then utilized to generate mutant virus stocks. After isolation, purification and quantitation of the mutant virus, the UL37 mutant was tested for replication in a one-step growth curve as described previously (Chapter 4) at both high and low MOI (0.1 and 0.01 infectious particles per cell) in duplicate with two independent isolates (UL37A and B) and compared with the virus derived from the parental BAC (pHB5). Following infection at a high MOI, the wild-type and mutant virus replicated with similar kinetics for one replication cycle (4 days) (Figure 13A).
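Infecting at a defined MOI requires converting pfu/cell into an inoculum volume. The following is a minimal sketch with a hypothetical helper. The cell count and titer are toy values, not values from this study.

```python
def inoculum_volume_ml(cells, moi, titer_pfu_per_ml):
    """Volume of virus stock (ml) that delivers `moi` plaque-forming units per cell."""
    if titer_pfu_per_ml <= 0:
        raise ValueError("titer must be positive")
    return cells * moi / titer_pfu_per_ml


if __name__ == "__main__":
    # Hypothetical well of 2e5 fibroblasts and a toy stock of 1e6 pfu/ml.
    for moi in (0.1, 0.01):
        print(f"MOI {moi}: {inoculum_volume_ml(2e5, moi, 1e6) * 1000:.1f} ul")
```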

After one replication cycle, production of infectious virus from the UL37-exon3-mutant-infected cells did not keep pace with that of the wild-type virus. The infected cells were inspected visually every ~24 hours during the growth curve infection. At 4 days post infection, cells infected with the UL37-exon3-mutant virus displayed a different morphology from wild-type-infected cells: they appeared much larger and exhibited less cytopathology (data not shown). The experiment was repeated at a lower multiplicity of infection (0.01 pfu/cell). At this MOI, the viruses grew with similar kinetics over the assayed time (Figure 13B).

Figure 12. Work-flow chart of “en passant” mutagenesis with the standard protocol (left) and the modified protocol (right). Approximate time requirements for the steps leading up to the first recombination event are listed in italics below each box.

Both the wild-type and mutated UL34-binding sites in the UL37 gene are located in exon 3. To examine the potential impact of the mutated UL34-binding site on transcript levels, full-length UL37 transcripts were measured at early and late times post infection. We hypothesized that the mutation could influence transcript elongation and/or splicing.

For the analysis of UL37 transcripts, a pair of qRT-PCR primers was designed to target a region downstream of the UL34-binding site in exon 3 of the UL37 transcriptional unit. Human beta-actin was measured as a loading control. SYBR Green-based qRT-PCR assays were conducted as described previously (Chapter 4) on total RNA isolated from cells infected with wild-type or UL37-exon3-mutant virus at 16 and 72 hours post infection. These time points were chosen because UL37 mRNA levels peak at 16 hours post infection, while 72 hours post infection represents a late time point. At both time points, the level of full-length UL37 transcripts for the UL37 mutant virus was not significantly different from that seen for the wild-type virus, pHB5 (Figure 14).


Figure 13. One-step growth curve comparison of wild-type (pHB5) and mutant (UL37A and B) virus.

(A) Replication following infection of human foreskin fibroblasts at an MOI of 0.1. Viruses were assessed for their ability to replicate in a one-step growth curve as described previously (Chapter 4). Growth of the wild-type (blue line) and UL37-exon3-mutant (gray and orange lines) viruses is displayed.

(B) Replication at an MOI of 0.01. The procedure was the same as in A, except with a lower MOI.


A

0.1 MOI

B

0.01 MOI


Figure 14. Measurement of UL37 transcripts from wild-type and UL37-exon3-mutant infected cells.

Full-length UL37 transcripts were detected with qRT-PCR in total RNA preparations from wild-type and UL37-exon3-mutant infected cells at 16 and 72 hours post infection with an MOI of 0.1 infectious particles per cell. Levels are shown as fold change, with the wild-type UL37 transcript being set to 1. Transcripts derived from the UL37 exon2-3 splice junction were also examined but were undetectable at either time-point. Mock infected cells lacked expression of either transcript.

[Figure 14 chart: fold change in full-length UL37 transcripts (y-axis, 0 to 2) for WT and UL37mut at 16 hpi and 72 hpi]

Discussion

Alternative splicing of the UL37 gene results in several transcripts that are produced at immediate-early times post infection. The UL34-binding site is located near the 5’ end of exon 3. Deletion studies demonstrated that UL37x1 is required for viral replication, but UL37x3 is dispensable (Dunn et al., 2003). Although the UL37x3 binding site has the lowest conservation rate among the predicted UL34-binding sites (83%), pUL34-DNA interactions occur at this location at 48 hours post infection (Chapter 3).

The UL37-exon3-mutant virus was not as replication-deficient as the oriLyt mutant viruses studied in Chapter 4. Neither the rate of replication nor the expression of the full-length UL37 transcript was altered by the introduced mutation. However, at a higher MOI, the mutant-infected cells did not produce as much virus after 4 days, and the infection began to decline. The UL37-exon3-mutant-infected cells also displayed an altered morphology compared to wild-type-infected cells. Together, these observations may reflect a failure of the mutant virus to prevent apoptosis, which would halt the production of infectious virus. Future work to determine the level of endogenous cellular caspase expression during infection with the UL37-exon3-mutant virus will help to clarify this observation.

Two-step Red recombination methods, such as “en passant” mutagenesis, allow accurate and traceless modifications to BAC sequences. Traditionally, using PCR to generate a transfer construct was the most time- and cost-effective method. However, the primers used in this process are long, limited to a specific target location, and may contain secondary structures that cannot be avoided by redesign. To bypass these limitations, we developed a cost-effective, time-saving method that uses a solid-phase synthesized dsDNA transfer construct. This method reduced the time requirement by ~11 hours and the cost per mutant by ~$50. The technique was validated by the successful recombineering of a BAC to insert a mutation at the UL34-binding site located within exon 3 of the UL37 open reading frame. This simplified approach allowed generation of a mutant BAC (AD169-UL37x3mut) in a case where PCR amplification of the transfer construct for the target mutation site was unsuccessful. These data indicate that using a chemically synthesized transfer construct for lambda Red recombination results in similar or greater recombination efficiency compared to the PCR method, with the benefit of reduced time and cost requirements. This technique was used successfully to substitute seven interspersed nucleotides into the target. It is likely that this strategy could also be used for larger insertions or deletions.


SUMMARY

The UL34 proteins of human cytomegalovirus are complex, in the sense that they have a multitude of functions during infection; for every new functionality that is discovered, several additional questions about the UL34 proteins are generated.

Previously, it was known that UL34 binds to viral genomic DNA at specific sequences in vitro and down-regulates the production of US3 and US9 transcripts. With the completion of this work, it is now known that UL34 binds to both viral and human genomic DNA during infection in vivo, contributes to the efficiency of viral DNA replication through binding interactions at the viral oriLyt, and interacts with other viral proteins essential to DNA replication. However, the functions of some UL34-binding sites remain unresolved: mutation of the UL34-binding site in exon 3 of the UL37 transcript resulted in no detectable change in viral replication or in full-length transcription of the gene. Several additional predicted UL34-binding sites remain uninvestigated, and the results of the in vivo viral genome binding study (Chapter 3) suggest that even more will require investigation. Lastly, although disrupting UL34-DNA interactions at the viral oriLyt reduced the viral replication rate, the virus was still able to replicate. Therefore, the specific function that makes the UL34 proteins essential for replication remains unknown.


FUTURE DIRECTIONS

The work presented in this dissertation answered several questions about the functions of the UL34 proteins. However, several questions remain unanswered, and future experiments can be proposed from the results of this work. For instance, what does the DNA-binding profile of pUL34 on the viral and human genomes look like at other timepoints post infection? The myc-UL34-AD169 virus can be used to answer this question by repeating the ChIP-seq experiment at 16, 24, 72, and 96 hours post infection. This experiment would generate a landscape of pUL34-DNA interactions over time and may reveal further functions; for example, although pUL34 was not bound to US9 at 48 hours post infection, is it bound at an earlier timepoint? Regarding pUL34 interactions with the human genome, the results presented in this work highlight novel interactions between the UL34 proteins and human genomic DNA. The binding profile at 48 hours post infection could be extended to the other timepoints listed above to determine whether these interactions are temporally regulated. Additionally, protein-protein interaction experiments should be performed to determine whether pUL34 interacts with the human TFs identified in the over-representation analysis (Figure 6B).
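One way to summarize such a time course is to ask, for each timepoint, which predicted UL34-binding sites fall under a called peak. The sketch below assumes peaks have already been called and are supplied as half-open genome intervals on a single genome; the function names, site names, coordinates, and peak lists are hypothetical placeholders, not data from this work.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates on one genome


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def bound_sites_by_timepoint(
    sites: Dict[str, Interval], peaks: Dict[str, List[Interval]]
) -> Dict[str, List[str]]:
    """For each timepoint, list (sorted) the sites covered by at least one peak."""
    return {
        tp: sorted(
            name for name, site in sites.items()
            if any(overlaps(site, p) for p in tp_peaks)
        )
        for tp, tp_peaks in peaks.items()
    }


# Hypothetical placeholders: three predicted sites and peaks at three timepoints.
sites = {"site_A": (1000, 1020), "site_B": (5000, 5020), "site_C": (9000, 9020)}
peaks = {
    "16 hpi": [(950, 1100)],
    "48 hpi": [(950, 1100), (8900, 9100)],
    "96 hpi": [],
}
for tp, bound in bound_sites_by_timepoint(sites, peaks).items():
    print(tp, bound)
```

A linear scan is adequate for a handful of sites on the ~230-kb viral genome; comparing peaks across the human genome would call for keying intervals by chromosome and using a sorted or interval-tree search.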

Regarding pUL34-DNA interactions at the viral oriLyt, future work should determine whether the mutations at the UL34-oriLyt binding sites within the viral genome reduce the efficiency of viral DNA replication during infection, and whether any such effect resembles the reduction observed in the plasmid assay (Figure 7). This can be addressed with a BrdU incorporation assay; BrdU is a pyrimidine analog that, when added to cells, is incorporated into newly synthesized DNA. Comparing levels of BrdU incorporation in cells infected with the wild-type or mutant viruses described here would then reveal differences in overall viral DNA replication rates, with uninfected cells used to determine the basal rate of cellular proliferation. Commercially available kits for BrdU incorporation and antibody-based detection can reduce variability and improve the success rate of this type of experiment.
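A minimal sketch of how the uninfected control could be used: subtract the mean uninfected (basal) signal from each infected condition, then express the mutant as a fraction of wild type. The function name and readings are hypothetical placeholders. The calculation also assumes that the uninfected baseline approximates host DNA synthesis in infected cultures, which may not hold, because IE86 has been reported to inhibit cellular DNA synthesis (Petrik et al., 2006).

```python
from statistics import mean


def relative_incorporation(wt, mutant, mock):
    """Mutant BrdU signal above the uninfected baseline, as a fraction of wild type."""
    basal = mean(mock)            # basal cellular proliferation signal
    wt_net = mean(wt) - basal     # wild-type signal above baseline
    mut_net = mean(mutant) - basal
    if wt_net <= 0:
        raise ValueError("wild-type signal does not exceed the uninfected baseline")
    return mut_net / wt_net


# Hypothetical placeholder readings (e.g., absorbance from an antibody-based kit).
mock_reads = [0.10, 0.12, 0.11]
wt_reads = [0.91, 0.89, 0.90]
mut_reads = [0.52, 0.50, 0.51]
ratio = relative_incorporation(wt_reads, mut_reads, mock_reads)
print(f"mutant/wild type: {ratio:.2f}")
```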

Lastly, regarding the UL37 mutant virus in Chapter 5, an interesting observation was made while performing the growth curve (Figure 13) at high MOI (0.01). Cells infected with wild-type CMV displayed normal cellular morphology following infection; however, cells infected with the UL37 mutant virus became very large and detached from the dish, and the localized spread of infectious virions appeared to be reduced. Future work with this mutant virus should determine whether the infected cells undergo apoptosis by examining the expression of cellular caspases by western blot. Normally, as infection progresses, apoptosis is prevented through various mechanisms; the main UL37 protein contributes to this process by preventing the release of cytochrome c. Perhaps this mutant virus, which contains a modified UL34-binding site in exon 3 of the UL37 open reading frame, is less able than the wild-type virus to prevent apoptosis. Any such disruption would likely present as a decrease in the spread of infectious virus, as observed in cells infected with the UL37 mutant virus at high MOI.

These experiments represent some of the potential future work that could be performed based on the results of this dissertation. Many additional possibilities remain, and the results of such work would likely open the door to further experimentation. Thus, the cycle of science continues.

REFERENCES

Adair, R., Liebisch, G.W., and Colberg-Poley, A.M. (2003). Complex alternative processing of human cytomegalovirus UL37 pre-mRNA. J Gen Virol 84, 3353–3358.

Adler, B., Scrivano, L., Ruzsics, Z., Rupp, B., Sinzger, C., and Koszinowski, U. (2006). Role of human cytomegalovirus UL131A in cell type-specific virus entry and release. J. Gen. Virol. 87, 2451–2460.

Afgan, E., Baker, D., van den Beek, M., Blankenberg, D., Bouvier, D., Čech, M., Chilton, J., Clements, D., Coraor, N., Eberhard, C., et al. (2016). The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update. Nucleic Acids Res 44, W3–W10.

Ahn, J.-H., Jang, W.-J., and Hayward, G.S. (1999). The Human Cytomegalovirus IE2 and UL112-113 Proteins Accumulate in Viral DNA Replication Compartments That Initiate from the Periphery of Promyelocytic Leukemia Protein-Associated Nuclear Bodies (PODs or ND10). J. Virol. 73, 10458–10471.

Akhmedzhanov, M., Scrochi, M., Barrandeguy, M., Vissani, A., Osterrieder, N., and Damiani, A.M. (2017). Construction and manipulation of a full-length infectious bacterial artificial chromosome clone of equine herpesvirus type 3 (EHV-3). Virus Research 228, 30–38.

Almazán, F., DeDiego, M.L., Galán, C., Escors, D., Álvarez, E., Ortego, J., Sola, I., Zuñiga, S., Alonso, S., Moreno, J.L., et al. (2006). Construction of a Severe Acute Respiratory Syndrome Coronavirus Infectious cDNA Clone and a Replicon To Study Coronavirus RNA Synthesis. J. Virol. 80, 10900–10906.

Anders, D.G., Kacica, M.A., Pari, G., and Punturieri, S.M. (1992). Boundaries and structure of human cytomegalovirus oriLyt, a complex origin for lytic-phase DNA replication. J. Virol. 66, 3373–3384.

Anders, D.G., Kerry, J.A., and Pari, G.S. (2007). DNA synthesis and late viral gene expression. In Human Herpesviruses: Biology, Therapy, and Immunoprophylaxis, A. Arvin, G. Campadelli-Fiume, E. Mocarski, P.S. Moore, B. Roizman, R. Whitley, and K. Yamanishi, eds. (Cambridge: Cambridge University Press).

Andrews, N.C., Erdjument-Bromage, H., Davidson, M.B., Tempst, P., and Orkin, S.H. (1993). Erythroid transcription factor NF-E2 is a haematopoietic-specific basic–leucine zipper protein. Nature 362, 722–728.

Appleton, B.A., Loregian, A., Filman, D.J., Coen, D.M., and Hogle, J.M. (2004). The cytomegalovirus DNA polymerase subunit UL44 forms a C clamp-shaped dimer. Mol. Cell 15, 233–244.

Appleton, B.A., Brooks, J., Loregian, A., Filman, D.J., Coen, D.M., and Hogle, J.M. (2006). Crystal structure of the cytomegalovirus DNA polymerase subunit UL44 in complex with the C terminus from the catalytic subunit. Differences in structure and function relative to unliganded UL44. J. Biol. Chem. 281, 5224–5232.

Azevedo, L.S., Pierrotti, L.C., Abdala, E., Costa, S.F., Strabelli, T.M.V., Campos, S.V., Ramos, J.F., Latif, A.Z.A., Litvinov, N., Maluf, N.Z., et al. (2015). Cytomegalovirus infection in transplant recipients. Clinics (Sao Paulo) 70, 515–523.

Beelontally, R., Wilkie, G.S., Lau, B., Goodmaker, C.J., Ho, C.M.K., Swanson, C.M., Deng, X., Wang, J., Gray, N.S., Davison, A.J., et al. (2017). Identification of compounds with anti-human cytomegalovirus activity that inhibit production of IE2 proteins. Antiviral Research 138, 61–67.

Biegalke, B.J. (2013). Nontraditional Localization and Retention Signals Localize Human Cytomegalovirus pUL34 to the Nucleus. J. Virol. 87, 11939–11944.

Biegalke, B.J., Lester, E., Branda, A., and Rana, R. (2004). Characterization of the Human Cytomegalovirus UL34 Gene. J. Virol. 78, 9579–9583.

Bolger, A.M., Lohse, M., and Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120.

Borst, E.-M., and Messerle, M. (2005). Analysis of Human Cytomegalovirus oriLyt Sequence Requirements in the Context of the Viral Genome. J Virol 79, 3615–3626.

Borst, E.-M., Hahn, G., Koszinowski, U.H., and Messerle, M. (1999). Cloning of the Human Cytomegalovirus (HCMV) Genome as an Infectious Bacterial Artificial Chromosome in Escherichia coli: a New Approach for Construction of HCMV Mutants. J. Virol. 73, 8320–8329.

Bozidis, P., Williamson, C.D., Wong, D.S., and Colberg-Poley, A.M. (2010). Trafficking of UL37 Proteins into Mitochondrion-Associated Membranes during Permissive Human Cytomegalovirus Infection. J Virol 84, 7898–7903.

Carter, C.A., and Ehrlich, L.S. (2008). Cell biology of HIV-1 infection of macrophages. Annu. Rev. Microbiol. 62, 425–443.

Cha, T.A., Tom, E., Kemble, G.W., Duke, G.M., Mocarski, E.S., and Spaete, R.R. (1996). Human cytomegalovirus clinical isolates carry at least 19 genes not found in laboratory strains. J. Virol. 70, 78–83.

Chambers, R.W., Rose, J.A., Rabson, A.S., Bond, H.E., and Hall, W.T. (1971). Propagation and Purification of High-Titer Human Cytomegalovirus. Appl. Microbiol. 22, 914–918.

Chee, M.S., Bankier, A.T., Beck, S., Bohni, R., Brown, C.M., Cerny, R., Horsnell, T., Hutchison, C.A., III, Kouzarides, T., Martignetti, J.A., et al. (1990). Analysis of the Protein-Coding Content of the Sequence of Human Cytomegalovirus Strain AD169. In Cytomegaloviruses, J.K. McDougall, ed. (Springer Berlin Heidelberg), pp. 125–169.

Chen, D.H., Jiang, H., Lee, M., Liu, F., and Zhou, Z.H. (1999). Three-dimensional visualization of tegument/capsid interactions in the intact human cytomegalovirus. Virology 260, 10–16.

Colletti, K.S., Smallenburg, K.E., Xu, Y., and Pari, G.S. (2007). Human cytomegalovirus UL84 interacts with an RNA stem-loop sequence found within the RNA/DNA hybrid region of oriLyt. J. Virol. 81, 7077–7085.

Cubells, J.F., Schroeder, J.P., Barrie, E.S., Manvich, D.F., Sadee, W., Berg, T., Mercer, K., Stowe, T.A., Liles, L.C., Squires, K.E., et al. (2016). Human Bacterial Artificial Chromosome (BAC) Transgenesis Fully Rescues Noradrenergic Function in Dopamine β-Hydroxylase Knockout Mice. PLOS ONE 11, e0154864.

Dargan, D.J., Douglas, E., Cunningham, C., Jamieson, F., Stanton, R.J., Baluchova, K., McSharry, B.P., Tomasec, P., Emery, V.C., Percivalle, E., et al. (2010). Sequential mutations associated with adaptation of human cytomegalovirus to growth in cell culture. J. Gen. Virol. 91, 1535–1546.

Dunn, W., Chou, C., Li, H., Hai, R., Patterson, D., Stolc, V., Zhu, H., and Liu, F. (2003). Functional profiling of a human cytomegalovirus genome. PNAS 100, 14223–14228.

Einhorn, L., and Ost, A. (1984). Cytomegalovirus infection of human blood cells. J. Infect. Dis. 149, 207–214.

Ertl, P.F., and Powell, K.L. (1992). Physical and functional interaction of human cytomegalovirus DNA polymerase and its accessory protein (ICP36) expressed in insect cells. J Virol 66, 4126–4133.

Ertl, P.F., Thomas, M.S., and Powell, K.L. (1991). High level expression of DNA polymerases from herpesviruses. Journal of General Virology 72, 1729–1734.

Ezra, A., Tai-Schmiedel, J., Karniely, S., Eliyahu, E., and Stern-Ginossar, N. (2017). The HCMV long non coding RNA 4.9 is important for viral DNA replication. (Jerusalem, Israel).

Fabregat, A., Jupe, S., Matthews, L., Sidiropoulos, K., Gillespie, M., Garapati, P., Haw, R., Jassal, B., Korninger, F., May, B., et al. (2018). The Reactome Pathway Knowledgebase. Nucleic Acids Res 46, D649–D655.

Fanning, E., and Zhao, K. (2009). SV40 DNA replication: From the A gene to a nanomachine. Virology 384, 352–359.

Furukawa, T., Fioretti, A., and Plotkin, S. (1973). Growth Characteristics of Cytomegalovirus in Human Fibroblasts with Demonstration of Protein Synthesis Early in Viral Replication. J. Virol. 11, 991–997.

Gao, Y., Colletti, K., and Pari, G.S. (2008). Identification of Human Cytomegalovirus UL84 Virus- and Cell-Encoded Binding Partners by Using Proteomics Analysis. J. Virol. 82, 96–104.

Gao, Y., Kagele, D., Smallenberg, K., and Pari, G.S. (2010). Nucleocytoplasmic shuttling of human cytomegalovirus UL84 is essential for virus growth. J. Virol. 84, 8484–8494.

Gatherer, D., Seirafian, S., Cunningham, C., Holton, M., Dargan, D.J., Baluchova, K., Hector, R.D., Galbraith, J., Herzyk, P., Wilkinson, G.W.G., et al. (2011). High-resolution human cytomegalovirus transcriptome. PNAS 108, 19755–19760.

Geballe, A.P., Leach, F.S., and Mocarski, E.S. (1986). Regulation of cytomegalovirus late gene expression: gamma genes are controlled by posttranscriptional events. J. Virol. 57, 864–874.

Goldmacher, V.S. (2002). vMIA, a viral inhibitor of apoptosis targeting mitochondria. Biochimie 84, 177–185.

Goldmacher, V.S., Bartle, L.M., Skaletskaya, A., Dionne, C.A., Kedersha, N.L., Vater, C.A., Han, J., Lutz, R.J., Watanabe, S., McFarland, E.D.C., et al. (1999). A cytomegalovirus-encoded mitochondria-localized inhibitor of apoptosis structurally unrelated to Bcl-2. PNAS 96, 12536–12541.

Halazonetis, T.D., Georgopoulos, K., Greenberg, M.E., and Leder, P. (1988). c-Jun dimerizes with itself and with c-Fos, forming complexes of different DNA binding affinities. Cell 55, 917–924.

Heiden, D., Ford, N., Wilson, D., Rodriguez, W.R., Margolis, T., Janssens, B., Bedelu, M., Tun, N., Goemaere, E., Saranchuk, P., et al. (2007). Cytomegalovirus Retinitis: The Neglected Disease of the AIDS Pandemic. PLoS Medicine 4, e334.

Hermiston, T.W., Malone, C.L., Witte, P.R., and Stinski, M.F. (1987). Identification and characterization of the human cytomegalovirus immediate-early region 2 gene that stimulates gene expression from an inducible promoter. J. Virol. 61, 3214–3221.

Heydarian, M., Luperchio, T.R., Cutler, J., Mitchell, C.J., Kim, M.-S., Pandey, A., Sollner-Webb, B., and Reddy, K. (2014). Prediction of Gene Activity in Early B Cell Development Based on an Integrative Multi-Omics Analysis. J Proteomics Bioinform 7.

Ho, D.D., Rara, T.R., Andrews, C.A., and Hirsch, M.S. (1984). Replication of Human Cytomegalovirus in Endothelial Cells. J Infect Dis 150, 956–957.

Ibanez, C.E., Schrier, R., Ghazal, P., Wiley, C., and Nelson, J.A. (1991). Human cytomegalovirus productively infects primary differentiated macrophages. J Virol 65, 6581–6588.

Jones, T.R., Wiertz, E.J., Sun, L., Fish, K.N., Nelson, J.A., and Ploegh, H.L. (1996). Human cytomegalovirus US3 impairs transport and maturation of major histocompatibility complex class I heavy chains. Proc Natl Acad Sci U S A 93, 11327–11333.

Kagele, D., Gao, Y., Smallenburg, K., and Pari, G.S. (2009). Interaction of HCMV UL84 with C/EBPα transcription factor-binding sites within oriLyt is essential for lytic DNA replication. Virology 392, 16–23.

Kahl, M., Siegel-Axel, D., Stenglein, S., Jahn, G., and Sinzger, C. (2000). Efficient Lytic Infection of Human Arterial Endothelial Cells by Human Cytomegalovirus Strains. J. Virol. 74, 7628–7635.

Kalejta, R.F. (2008). Tegument proteins of human cytomegalovirus. Microbiol. Mol. Biol. Rev. 72, 249–265.

Kerry, J.A., Priddy, M.A., and Stenberg, R.M. (1994). Identification of sequence elements in the human cytomegalovirus DNA polymerase gene promoter required for activation by viral gene products. J. Virol. 68, 4167–4176.

Kerry, J.A., Priddy, M.A., Jervey, T.Y., Kohler, C.P., Staley, T.L., Vanson, C.D., Jones, T.R., Iskenderian, A.C., Anders, D.G., and Stenberg, R.M. (1996). Multiple regulatory events influence human cytomegalovirus DNA polymerase (UL54) expression during viral infection. J Virol 70, 373–382.

Kiehl, A., Huang, L., Franchi, D., and Anders, D.G. (2003). Multiple 5′ ends of human cytomegalovirus UL57 transcripts identify a complex, cycloheximide-resistant promoter region that activates oriLyt. Virology 314, 410–422.

Landt, S.G., Marinov, G.K., Kundaje, A., Kheradpour, P., Pauli, F., Batzoglou, S., Bernstein, B.E., Bickel, P., Brown, J.B., Cayting, P., et al. (2012). ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res. 22, 1813–1831.

LaPierre, L.A., and Biegalke, B.J. (2001). Identification of a Novel Transcriptional Repressor Encoded by Human Cytomegalovirus. J. Virol. 75, 6062–6069.

Lee, J.-W., Tapias, V., Di Maio, R., Greenamyre, J.T., and Cannon, J.R. (2015). Behavioral, neurochemical, and pathologic alterations in bacterial artificial chromosome transgenic G2019S leucine-rich repeated kinase 2 rats. Neurobiology of Aging 36, 505–518.

Lester, E., Rana, R., Liu, Z., and Biegalke, B.J. (2006). Identification of the functional domains of the essential human cytomegalovirus UL34 proteins. Virology 353, 27–34.

Li, H., and Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760.

Liu, Z., and Biegalke, B.J. (2013). Human Cytomegalovirus UL34 Binds to Multiple Sites within the Viral Genome. J. Virol. 87, 3587–3591.

Livak, K.J., and Schmittgen, T.D. (2001). Analysis of Relative Gene Expression Data Using Real-Time Quantitative PCR and the 2−ΔΔCT Method. Methods 25, 402–408.

Loregian, A., Rigatti, R., Murphy, M., Schievano, E., Palu, G., and Marsden, H.S. (2003). Inhibition of Human Cytomegalovirus DNA Polymerase by C-Terminal Peptides from the UL54 Subunit. J Virol 77, 8336–8344.

Luisi, K., Sharma, M., and Yu, D. (2017). Development of a vaccine against cytomegalovirus infection and disease. Current Opinion in Virology 23, 23–29.

Machanick, P., and Bailey, T.L. (2011). MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics 27, 1696–1697.

Manicklal, S., Emery, V.C., Lazzarotto, T., Boppana, S.B., and Gupta, R.K. (2013). The “Silent” Global Burden of Congenital Cytomegalovirus. Clin Microbiol Rev 26, 86–102.

Marchini, A., Liu, H., and Zhu, H. (2001). Human cytomegalovirus with IE-2 (UL122) deleted fails to express early lytic genes. J. Virol. 75, 1870–1878.

Markham, N.R., and Zuker, M. (2005). DINAMelt web server for nucleic acid melting prediction. Nucleic Acids Res 33, W577–W581.

Markham, N.R., and Zuker, M. (2008). UNAFold: Software for Nucleic Acid Folding and Hybridization. In Bioinformatics, J.M. Keith, ed. (Totowa, NJ: Humana Press), pp. 3–31.

Masse, M.J., Karlin, S., Schachtel, G.A., and Mocarski, E.S. (1992). Human cytomegalovirus origin of DNA replication (oriLyt) resides within a highly complex repetitive region. PNAS 89, 5246–5250.

Mavinakere, M.S., and Colberg-Poley, A.M. (2004). Internal cleavage of the human cytomegalovirus UL37 immediate-early glycoprotein and divergent trafficking of its proteolytic fragments. J. Gen. Virol. 85, 1989–1994.

McMahon, T.P., and Anders, D.G. (2002). Interactions between human cytomegalovirus helicase–primase proteins. Virus Research 86, 39–52.

Melendez, D.P., and Razonable, R.R. (2015). Letermovir and inhibitors of the terminase complex: a promising new class of investigational antiviral drugs against human cytomegalovirus. Infect Drug Resist 8, 269–277.

Mercorelli, B., Sinigalia, E., Loregian, A., and Palù, G. (2008). Human cytomegalovirus DNA replication: antiviral targets and drugs. Rev. Med. Virol. 18, 177–210.

Mercorelli, B., Luganini, A., Nannetti, G., Tabarrini, O., Palù, G., Gribaudo, G., and Loregian, A. (2016). Drug Repurposing Approach Identifies Inhibitors of the Prototypic Viral Transcription Factor IE2 that Block Human Cytomegalovirus Replication. Cell Chemical Biology 23, 340–351.

Murphy, E., and Shenk, T.E. (2008). Human Cytomegalovirus Genome. In Human Cytomegalovirus, (Springer, Berlin, Heidelberg), pp. 1–19.

Murphy, E., Yu, D., Grimwood, J., Schmutz, J., Dickson, M., Jarvis, M.A., Hahn, G., Nelson, J.A., Myers, R.M., and Shenk, T.E. (2003). Coding potential of laboratory and clinical strains of human cytomegalovirus. Proc. Natl. Acad. Sci. U.S.A. 100, 14976–14981.

Niebuhr, B., Fischer, M., Täger, M., Cammenga, J., and Stocking, C. (2008). Gatekeeper function of the RUNX1 transcription factor in acute leukemia. Blood Cells, Molecules, and Diseases 40, 211–218.

Ohlsson, R., Renkawitz, R., and Lobanenkov, V. (2001). CTCF is a uniquely versatile transcription regulator linked to epigenetics and disease. Trends in Genetics 17, 520–527.

Ong, C.-T., and Corces, V.G. (2014). CTCF: an architectural protein bridging genome topology and function. Nat Rev Genet 15, 234–246.

Pari, G.S. (2008). Nuts and Bolts of Human Cytomegalovirus Lytic DNA Replication. In Human Cytomegalovirus, (Springer, Berlin, Heidelberg), pp. 153–166.

Pari, G.S., and Anders, D.G. (1993). Eleven loci encoding trans-acting factors are required for transient complementation of human cytomegalovirus oriLyt-dependent DNA replication. J. Virol. 67, 6979–6988.

Park, M.-Y., Kim, Y.-E., Seo, M.-R., Lee, J.-R., Lee, C.H., and Ahn, J.-H. (2006). Interactions among Four Proteins Encoded by the Human Cytomegalovirus UL112-113 Region Regulate Their Intranuclear Targeting and the Recruitment of UL44 to Prereplication Foci. J. Virol. 80, 2718–2727.

Penfold, M.E.T., and Mocarski, E.S. (1997). Formation of Cytomegalovirus DNA Replication Compartments Defined by Localization of Viral Proteins and DNA Synthesis. Virology 239, 46–61.

Petrik, D.T., Schmitt, K.P., and Stinski, M.F. (2006). Inhibition of Cellular DNA Synthesis by the Human Cytomegalovirus IE86 Protein Is Necessary for Efficient Virus Replication. J Virol 80, 3872–3883.

Piret, J., and Boivin, G. (2017). Herpesvirus Resistance to Antiviral Drugs. In Antimicrobial Drug Resistance, (Springer, Cham), pp. 1185–1211.

Pizzorno, M.C., and Hayward, G.S. (1990). The IE2 gene products of human cytomegalovirus specifically down-regulate expression from the major immediate-early promoter through a target sequence located near the cap site. J. Virol. 64, 6154–6165.

Powers, C., DeFilippis, V., Malouli, D., and Früh, K. (2008). Cytomegalovirus Immune Evasion. In Human Cytomegalovirus, (Springer, Berlin, Heidelberg), pp. 333–359.

Prichard, M.N., Jairath, S., Penfold, M.E.T., St. Jeor, S., Bohlman, M.C., and Pari, G.S. (1998). Identification of Persistent RNA-DNA Hybrid Structures within the Origin of Replication of Human Cytomegalovirus. J. Virol. 72, 6997–7004.

Rana, R., and Biegalke, B.J. (2014). Human cytomegalovirus UL34 early and late proteins are essential for viral replication. Viruses 6, 476–488.

Riegler, S., Hebart, H., Einsele, H., Brossart, P., Jahn, G., and Sinzger, C. (2000). Monocyte-derived dendritic cells are permissive to the complete replicative cycle of human cytomegalovirus. J. Gen. Virol. 81, 393–399.

Russell, W.C., Gold, E., Keir, H.M., Omura, H., Watson, D.H., and Wildy, P. (1964). The growth of herpes simplex virus and its nucleic acid. Virology 22, 103–110.

Schleiss, M.R. (2016). Cytomegalovirus vaccines under clinical development. J Virus Erad 2, 198–207.

Schommartz, T., Tang, J., Brost, R., and Brune, W. (2017). Differential requirement of human cytomegalovirus UL112-113 protein isoforms for viral replication. J. Virol. JVI.00254-17.

Schwartz, R., Sommer, M.H., Scully, A., and Spector, D.H. (1994). Site-specific binding of the human cytomegalovirus IE2 86-kilodalton protein to an early gene promoter. J. Virol. 68, 5613–5622.

Shannon-Lowe, C.D., and Emery, V.C. (2010). The effects of maribavir on the autophosphorylation of ganciclovir resistant mutants of the cytomegalovirus UL97 protein. Herpesviridae 1, 4.

Sinclair, J., and Sissons, P. (2006). Latency and reactivation of human cytomegalovirus. J Gen Virol 87, 1763–1779.

Sinzger, C., Grefte, A., Plachter, B., Gouw, A.S., The, T.H., and Jahn, G. (1995). Fibroblasts, epithelial cells, endothelial cells and smooth muscle cells are major targets of human cytomegalovirus infection in lung and gastrointestinal tissues. J. Gen. Virol. 76 (Pt 4), 741–750.

Sinzger, C., Hahn, G., Digel, M., Katona, R., Sampaio, K.L., Messerle, M., Hengel, H., Koszinowski, U., Brune, W., and Adler, B. (2008). Cloning and sequencing of a highly productive, endotheliotropic virus strain derived from human cytomegalovirus TB40/E. J. Gen. Virol. 89, 359–368.

Smith, J.A., and Pari, G.S. (1995). Human cytomegalovirus UL102 gene. J. Virol. 69, 1734–1740.

Smith, J.A., Jairath, S., Crute, J.J., and Pari, G.S. (1996). Characterization of the Human Cytomegalovirus UL105 Gene and Identification of the Putative Helicase Protein. Virology 220, 251–255.

Spector, D.J. (2015). UL84-independent replication of human cytomegalovirus strains conferred by a single codon change in UL122. Virology 476, 345–354.

Stenberg, R.M., Thomsen, D.R., and Stinski, M.F. (1984). Structural analysis of the major immediate early gene of human cytomegalovirus. J. Virol. 49, 190–199.

Stern-Ginossar, N., Weisburd, B., Michalski, A., Khanh Le, V.T., Hein, M.Y., Huang, S.-X., Ma, M., Shen, B., Qian, S.-B., Hengel, H., et al. (2012). Decoding human cytomegalovirus. Science 338.

Stinski, M.F. (1977). Synthesis of Proteins and Glycoproteins in Cells Infected with Human Cytomegalovirus. J. Virol. 23, 751–767.

Stinski, M.F., Thomsen, D.R., Stenberg, R.M., and Goldstein, L.C. (1983). Organization and expression of the immediate early genes of human cytomegalovirus. J. Virol. 46, 1–14.

Strang, B.L., Sinigalia, E., Silva, L.A., Coen, D.M., and Loregian, A. (2009). Analysis of the Association of the Human Cytomegalovirus DNA Polymerase Subunit UL44 with the Viral DNA Replication Factor UL84. J. Virol. 83, 7581–7589.

Strang, B.L., Boulant, S., Chang, L., Knipe, D.M., Kirchhausen, T., and Coen, D.M. (2012). Human Cytomegalovirus UL44 Concentrates at the Periphery of Replication Compartments, the Site of Viral DNA Synthesis. J. Virol. 86, 2089–2095.

Tamashiro, J.C., and Spector, D.H. (1986). Terminal structure and heterogeneity in human cytomegalovirus strain AD169. J. Virol. 59, 591–604.

Taylor, R.T., and Bresnahan, W.A. (2006). Human cytomegalovirus immediate-early 2 protein IE86 blocks virus-induced chemokine expression. J. Virol. 80, 920–928.

Tischer, B.K., von Einem, J., Kaufer, B., and Osterrieder, N. (2006). Two-step red-mediated recombination for versatile high-efficiency markerless DNA manipulation in Escherichia coli. BioTechniques 40, 191–197.

Tischer, B.K., Smith, G.A., and Osterrieder, N. (2010). En passant mutagenesis: a two step markerless red recombination system. Methods Mol. Biol. 634, 421–430.

Urban, M., Joubert, N., Hocek, M., Alexander, R.E., and Kuchta, R.D. (2009). Herpes Simplex Virus-1 DNA Primase: A Remarkably Inaccurate yet Selective Polymerase. Biochemistry 48, 10866–10881.

Vanarsdall, A.L., and Johnson, D.C. (2012). Human cytomegalovirus entry into cells. Curr Opin Virol 2.

Varnum, S.M., Streblow, D.N., Monroe, M.E., Smith, P., Auberry, K.J., Paša-Tolić, L., Wang, D., Camp, D.G., Rodland, K., Wiley, S., et al. (2004). Identification of Proteins in Human Cytomegalovirus (HCMV) Particles: the HCMV Proteome. J. Virol. 78, 10960–10966.

Waldman, W.J., Roberts, W.H., Davis, D.H., Williams, M.V., Sedmak, D.D., and Stephens, R.E. (1991). Preservation of natural endothelial cytopathogenicity of cytomegalovirus by propagation in endothelial cells. Arch. Virol. 117, 143–164.

Wang, X., Wang, Y., Zhu, Q., Guo, G., and Yuan, H. (2014). Pulmonary infection in the renal transplant recipients: Analysis of the radiologic manifestations. Radiology of Infectious Diseases 1, 3–6.

White, E.A., and Spector, D.H. (2007). Early viral gene expression and function. In Human Herpesviruses: Biology, Therapy, and Immunoprophylaxis, A. Arvin, G. Campadelli-Fiume, E. Mocarski, P.S. Moore, B. Roizman, R. Whitley, and K. Yamanishi, eds. (Cambridge: Cambridge University Press).

White, E.A., Del Rosario, C.J., Sanders, R.L., and Spector, D.H. (2007). The IE2 60-kilodalton and 40-kilodalton proteins are dispensable for human cytomegalovirus replication but are required for efficient delayed early and late gene expression and production of infectious virus. J. Virol. 81, 2573–2583.

Wiebusch, L., Asmar, J., Uecker, R., and Hagemeier, C. (2003). Human cytomegalovirus immediate-early protein 2 (IE2)-mediated activation of cyclin E is cell-cycle-independent and forces S-phase entry in IE2-arrested cells. J. Gen. Virol. 84, 51–60.

Woodroffe, S.B., Hamilton, J., and Garnett, H.M. (1997). Comparison of the infectivity of the laboratory strain AD169 and a clinical isolate of human cytomegalovirus to human smooth muscle cells. Journal of Virological Methods 63, 181–191.

Xu, Y., Cei, S.A., Huete, A.R., Colletti, K.S., and Pari, G.S. (2004). Human Cytomegalovirus DNA Replication Requires Transcriptional Activation via an IE2- and UL84-Responsive Bidirectional Promoter Element within oriLyt. J Virol 78, 11664–11677.

Yamamoto, T., Suzuki, S., Radsak, K., and Hirai, K. (1998). The UL112/113 gene products of human cytomegalovirus which colocalize with viral DNA in infected cell nuclei are related to efficient viral DNA replication. Virus Research 56, 107–114.

Yu, D., Silva, M.C., and Shenk, T. (2003). Functional map of human cytomegalovirus AD169 defined by global mutational analysis. PNAS 100, 12396–12401.

Zambelli, F., Pesole, G., and Pavesi, G. (2013). PscanChIP: finding over-represented transcription factor-binding site motifs and their correlations in sequences from ChIP-Seq experiments. Nucleic Acids Res 41, W535–W543.

Zhu, Y., Huang, L., and Anders, D.G. (1998). Human Cytomegalovirus oriLyt Sequence Requirements. J. Virol. 72, 4989–4996.

Zuker, M. (2003). Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Research 31, 3406–3415.
