CHARACTERIZING INTERACTIONS OF HIV-1 WITH VIRAL DNA

AND THE CELLULAR COFACTOR LEDGF

Dissertation

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of

Philosophy in the Graduate School of The Ohio State University

By

Christopher J. McKee, B.A.

Graduate Program in Molecular, Cellular & Developmental Biology

2010

Dissertation Committee:

Mamuka Kvaratskhelia, Advisor

Mary Jo Burkhard

Karin Musier-Forsyth

Thomas Schmittgen

ABSTRACT

A hallmark of retroviral replication is the permanent integration of the viral into the host cell genome. This integration is mediated by the viral enzyme integrase (IN) which is bound to the ends of the viral DNA in the context of a large complex known as the pre-integration complex (PIC).

Using the two catalytic activities of IN, the viral DNA ends are first processed then used for a strand transfer reaction that simultaneously breaks the host DNA backbone and ligates the viral genome into it. For HIV-1, the site of integration into the host genome is non-random and occurs most often in active transcription units. This bias is likely explained by the recruitment of pre-integration complexes to chromatin via interactions between IN and the cellular Lens

Epithelium-Derived Growth Factor (LEDGF). Failure of either enzymatic reaction or failure of the PIC to engage chromatin is a replicative dead end for the .

Integrase has long been considered an attractive therapeutic target due to its essential role in replication and its lack of a cellular counterpart. The first-in- class HIV IN inhibitors block integration by binding specifically to the IN-viral DNA complex, displacing the viral DNA ends from the and effectively preventing strand transfer. The specificity of this inhibition led to the rapid emergence of HIV-1 phenotypes resistant to all available strand transfer

ii

inhibitors and highlighted the need for new classes of IN inhibitor. The development of these new inhibitors will be driven by an improved understanding of IN structure and of the sequence of pre-integration events. In spite of tremendous efforts, no high-resolution structural data is available for the full- length HIV-1 IN. The following text describes my efforts to characterize three critical molecular interactions that determine the fate of HIV-1 integration: IN with viral DNA, IN with the cellular cofactor LEDGF, and LEDGF with chromatin.

Using innovative mass spectrometric footprinting techniques I have mapped and validated biologically essential contacts at IN-viral DNA, IN-IN, and IN-LEDGF interfaces. These experiments provided structural details that revealed previously undescribed changes in integrase conformation and subunit dynamics upon binding viral DNA and LEDGF, respectively. Finally, I have explored epigenetic modifications in chromatin that are recognized by LEDGF. Specific binding of LEDGF to regions featuring these modifications could explain, at least in part, the targeting of active transcription units for HIV-1 integration in infected cells.

iii

ACKNOWLEDGMENTS

I would like to thank my advisor Mamuka Kvaratskhelia for his mentorship and the many opportunities I have had while working in his lab. Furthermore, my co-workers in the Kvaratskhelia lab have contributed enormously to the work presented herein. I would in particular like to thank Zhuojun Zhao, Jacques

Kessl, Jocelyn Eidahl, and Nick Shkriabai for their technical assistance and friendship. I am indebted to our many collaborators at OSU and beyond, especially Alan Engelman of the Dana-Farber Cancer Institute and the members of the Ohio State Center for Research for their experimental contributions and constructive criticism of my work. My research has been funded by grants from the National Institutes of Health and the Jeffrey J.

Seilhamer Cancer Foundation, to whom I am extremely grateful for their confidence and financial support. Previously copyrighted materials are reprinted here with permission of the American Society for Biochemistry and Molecular

Biology.

Special thanks go to the OSU department of Molecular, Cellular, &

Developmental Biology for admitting me to their outstanding program and guiding me to success as a graduate student. I also extend many thanks to those who have assisted me on my circuitous path to graduate school including John

iv

Haddock, Yan Zhang, and Sreekumari Rajeev. Most of all I thank Jessica Dyszel for her many years of support, both technical and moral.

v

VITA

July, 1978...... Born- Springfield, Illinois, USA

2001...... B.A. Microbiology Southern Illinois University Carbondale, IL, USA

2003-2005...... Microbiologist Ohio Department of Agriculture Animal Disease Diagnostic Laboratory Reynoldsburg, OH, USA

2005-present...... Graduate Research and Teaching associate The Ohio State University Columbus, OH, USA

vi

PUBLICATIONS

Research Publication

1. J.J. Kessl, C.J. McKee , J.O. Eidahl, N. Shkriabai, A. Katz, M. Kvaratskhelia. HIV-1 Integrase-DNA Recognition Mechanisms. . 2009; 1(3):713-736

2. J.J. Kessl, J.O. Eidahl, N. Shkriabai, Z. Zhao, C.J. McKee , S. Hess, T. Burke, M. Kvaratskhelia. A novel mechanism for inhibiting HIV-1 integrase with a small molecule. Mol Pharmacol. 2009. 76(4):824-32.

3. C.J. McKee , M. Kvaratskhelia. Mass spectrometry-based footprinting of protein-protein interactions. 2008. Methods 47(4):304-7.

4. C.J. McKee , J.J. Kessl, N. Shkriabai, M.J. Dar, A. Engelman, M. Kvaratskhelia. Dynamic Modulation of HIV-1 Integrase Structure and Function by Cellular LEDGF Protein. 2008. J Biol Chem 283:31802-12.

5. Z. Zhou, C.J. McKee* , J.J. Kessl, W.L. Santos, J.E. Daigle, A. Engelman, G. Verdine, M. Kvaratskhelia. Subunit Specific Protein Footprinting Reveals Significant Structural Rearrangements and a Role for N-Terminal Lys-14 of HIV-1 Integrase During Viral DNA Binding. 2007. J Biol Chem 283 :5632-41.

*Shared first-authorship

FIELDS OF STUDY

Major Field: Molecular, Cellular, & Developmental Biology

vii

TABLE OF CONTENTS

Page

Abstract ...... ii Dedication ...... iv Acknowledgments ...... v Vita ...... vi List of Tables ...... x List of Figures...... xi

Chapters:

1. Introduction ...... 1

1.1 The HIV/AIDS challenge and integrase inhibitors to treat HIV/AIDS ...... 1 1.2 The HIV-1 replication cycle ...... 2 1.3 Integration of HIV-1 viral DNA into host chromatin...... 3 1.4 Structural aspects of HIV-1 integration...... 5 1.5 LEDGF as a cellular cofactor of HIV-1 integration...... 10 1.6 Figures for chapter 1 ...... 13

2. Identification of Subunit Specific Viral-DNA Contacts and a DNA-Induced Conformation Change within Functional HIV-1 Integrase Nucleoprotein Complexes ...... 17

2.1 Introduction ...... 17 2.2 Materials and methods...... 19 2.2.1 Preparation of HIV-1 IN, DNA oligonucleotides, and cross-linked products ...... 19 2.2.2 Mass spectrometric footprinting ...... 20 2.2.3 CD Spectroscopy ...... 20 2.2.4 Virus infectivity ...... 21 2.3 Results ...... 21 2.4 Discussion...... 30 2.5 Tables for Chapter 2 ...... 37 2.6 Figures for Chapter 2 ...... 39

viii

3. Dynamic Modulation of HIV-1 Integrase Structure and Function by Human LEDGF Protein ...... 49

3.1 Introduction ...... 49 3.2 Materials and Methods...... 52 3.2.1 Expression Plasmids and Recombinant ...... 52 3.2.2 IN 3’-Processing and DNA Strand Transfer Activities...... 53 3.2.3 Concerted Integration Assay ...... 53 3.2.4 Subunit Exchange Assay...... 54 3.2.5 Mass Spectrometric Footprinting...... 54 3.2.6 Size Exclusion Chromatography ...... 55 3.2.7 Molecular Modeling ...... 55 3.2.8 LEDGF Binding Affinities to Wild-type and Mutant Ins ...... 55 3.3 Results ...... 56 3.4 Discussion...... 65 3.5 Tables for Chapter 3 ...... 71 3.6 Figures for Chapter 3 ...... 73

4. Recognition of Histone Post Translational Modifications by the LEDGF PWWP Domain ...... 88

4.1 Introduction ...... 88 4.2 Materials and Methods ...... 90 4.2.1 Expression Plasmids and Recombinant Proteins...... 90 4.2.2 Peptide Synthesis ...... 91 4.2.3 Histone Peptide Microarrays ...... 91 4.2.4 Pull Down Assays ...... 92 4.3 Results ...... 93 4.4 Discussion...... 94 4.5 Tables for Chapter 4...... 96 4.6 Figures for Chapter 4 ...... 97

Bibliography...... 100

ix

LIST OF TABLES

Table Page

2.1 Susceptibility of IN basic residues to modification ...... 37

2.2 Infectivities of IN mutant viruses...... 38

3.1 Summary of mass spectrometric footprinting results...... 71

3.2 Summary of size exclusion chromatography results for wild type IN interactions with the LEDGF IBD ...... 72

3.3 Summary of size exclusion chromatography results for wild type and mutant IN proteins...... 72

4.1 Histone peptides selected for further study ...... 96

x

LIST OF FIGURES

Figure Page

1.1 HIV replication cycle ...... 13

1.2 Enzymatic activities of HIV integrase...... 14

1.3 Assays of integrase enzymatic activity ...... 15

1.4 Two-domain structures of HIV integrase...... 16

2.1 Experimental strategy...... 39

2.2 IN mutant enzymatic activity...... 40

2.3 IN mutant cross-linking efficiency with modified DNA...... 41

2.4 The effects of the modifying reagent on IN function ...... 42

2.5 Representative segments of MALDI-ToF data...... 43

2.6 Representative segments of R199 IN MALDI-ToF data ...... 44

2.7 Representative MALDI-TOF spectra showing specific protection of K14...... 45

2.8 Effects of substitutions on recombinant IN ...... 46

2.9 Model of viral DNA bound to HIV IN ...... 47

2.10 The crystal structure of the CCD-CTD fragment...... 48

3.1 The LEDGF proteins used in our studies...... 73

3.2 Effects of LEDGF and the isolated IBD on IN 3’ processing and DNA strand transfer activities...... 74

xi

3.3 Effects of LEDGF and the isolated IBD on IN concerted integration activity ...... 75 3.4 Subunit exchange assay using His-tag IN and SDS-PAGE...... 77

3.5 Size exclusion chromatography of free IN and IN-IBD complexes ...... 78

3.6 Schematic presentation of the protein footprinting strategy...... 79

3.7 Representative segments of the MALDI-ToF mass spectra from the protein footprinting ...... 80

3.8 Size exclusion chromatography of wild type and indicated Mutants ...... 81

3.9 LEDGF binding to wild type and mutant IN proteins...... 82

3.10 Biochemical characterization of IN mutant proteins...... 83

3.11 A model for the IBD bound NTD-CCD tetramer...... 84

3.12 IN 3’ processing activities in the presence of the diketo acid inhibitor (118-D-24) 4-[3-(azidomethyl)phenyl]-2-hydroxy-4-oxo- 2-butenoic acid ...... 85

3.13 The NTD-CCD dimer-dimer interface ...... 85

3.14 A potential interaction hotspot at the IN subunit-subunit interface...... 87

4.1 Representative peptide screening results...... 98

4.2 MS data from in vitro pull-down assays ...... 99

xii

CHAPTER 1

INTRODUCTION

1.1. The HIV/AIDS challenge and integrase inhibitors to treat HIV/AIDS

Human immunodeficiency virus (HIV), the causative agent of acquired

immunodeficiency syndrome (AIDS), continues to be a serious public health

concern worldwide. HIV infection leads to rapid deterioration of cell-mediated

immunity via replication in and consequent destruction of CD4+ immune cells.

This deterioration marks the progression towards AIDS and diminished capacity

to defend against pathogens. Since its discovery in 1981, nearly 60 million

people have been infected with HIV and 25 million have died from AIDS related

causes (1). Fortunately the challenge of AIDS has been met with unprecedented

drug discovery efforts, yielding 25 FDA approved drugs in 25 years. Access to

antiretroviral therapy dramatically prolongs life expectancy and improves quality

of life. Antiretroviral drug development is an ongoing process as the rapid

evolutionary rate of HIV leads to the emergence of viruses resistant to available

drugs. In 2007 the first in a new class of HIV drugs, the integrase inhibitors, was

approved by the FDA and has been shown to be remarkably effective (39, 121).

Predictably, the drug-resistant viral phenotypes observed in vitro eventually

emerged in the clinic and highlighted the need for a new class of integrase

1

inhibitors with an alternate inhibitory mechanism. The development of these drugs will likely be driven by an improved understanding of integrase structural biology and critical pre-integration events.

1.2 The HIV-1 Replication Cycle

The HIV-1 replication cycle (Figure 1.1) begins with target cell recognition through interactions between viral envelope proteins and host cell surface receptors. Fusion of the viral envelope to the cellular membrane introduces the viral RNA and core proteins into the cytoplasm in the form of a large nucleoprotein complex (37). It is in the context of this nucleoprotein complex that reverse transcription, a process unique to retroviral replication, converts the viral

RNA genome into cDNA using the virally-encoded . Following reverse transcription this complex collects several host proteins to form a massive new complex termed the pre-integration complex

(PIC). By unknown mechanisms, the HIV-1 PIC is actively transported across the nuclear membrane where it has access to host chromatin. Active transport of the PIC across the nuclear membrane is a peculiarity of HIV-1 and other lentiviruses that enables the infection of target cells without waiting for nuclear disassembly during mitosis (153). This allows infection at all stages of the cell cycle and expands the host range to non-dividing or terminally differentiated cells. Upon entry to the nucleus the PIC, or some remnant of it, is targeted to host chromatin where the viral genome is permanently integrated by the viral integrase (IN) protein. Target site preferences vary among the retroviridae. For example, lentivirus integration occurs most often within active transcription units 2

while gammaretroviruses prefer promoter regions and the associated CpG islands (125). After integration the is treated essentially as a host cell gene, transcribed and translated by the host cell machinery (37). Manipulation of host RNA splicing allows for the nuclear export of genome-length for packaging into new virions as well as spliced RNAs for protein translation. Viral proteins and are targeted to the cell membrane for assembly and budding. Finally, during or shortly after budding, the viral core is processed by the viral protease to yield an infectious particle capable of starting the cycle anew.

1.3. Integration of HIV-1 viral DNA into host chromatin

Permanent integration into the host genome has numerous benefits from

the viral perspective. Integration into the host chromosome endows the provirus

with long-term stability and provides access to the host transcription machinery.

Furthermore, the integrated provirus takes advantage of co-mitotic replication

with the host cell. Stable integration has the added benefit of establishing a

latent reservoir capable of avoiding immune surveillance and archiving drug-

resistant phenotypes (131).

Successful integration requires two spatially and temporally separate IN catalytic activities: 3’-processing and strand transfer (Figure 1.2). The HIV-1 IN protein contains a single active site that carries out both activities (38, 56). Upon completion of reverse transcription, while the pre-integration complex is still in the cytoplasm, integrase binds specifically to the long terminal repeats of the viral genome and cleaves two bases from each 3’ end following a conserved –CA 3

dinucleotide (38). The strand transfer reaction simultaneously breaks the host

DNA phosphodiester backbone and ligates the processed viral DNA ends into opposite strands, across the major groove of the host DNA double helix with an invariable spacing of 5 base pairs between attachment sites. 3’ processing and strand transfer are essentially the same reaction from a catalytic perspective, with H2O serving as a nucleophile in the first reaction and the exposed 3’-OH in the second. Host DNA repair mechanisms complete the process by removing the unpaired overhangs, filling the gaps and repairing the nicks to yield a permanently integrated genome flanked by a duplication of the five sequence between attachment sites (139, 151). Along with the biologically relevant 3’-processing and strand transfer activities, purified IN exhibits additional activities in vitro . The enzyme can reverse the strand transfer reaction by site

selectively cleaving the integrated DNA in a reaction termed disintegration (34). A

recent report has indicated that the recombinant protein can also catalyze

internal cleavage at a palindromic sequence mimicking LTR-LTR junction (43,

44). However, there is no evidence as of yet that these additional catalytic

activities observed in vitro occur in infected cells.

In infected cells IN functions in the context of a large nucleoprotein complex termed the pre-integration complex (PIC), where a number of viral and cellular proteins contribute to retroviral integration (15, 16, 23, 24, 26, 32, 33, 52,

61, 92, 93, 96, 105, 106, 147). PICs can be extracted from infected cells and used for biochemical assays in vitro (12, 14, 27, 50, 53, 63, 95, 169). However, the abundance of these nucleoprotein complexes is insufficient to perform atomic 4

structural or even lower resolution biophysical analyses. Therefore, recombinant

IN and model DNA substrates have been employed instead to study protein- nucleic acid interactions. In their simplest form, these assays utilize purified recombinant protein and short DNA substrates (~21-mer dsDNA mimicking the

U5 end of viral DNA) to monitor 3’-processing and strand transfer activities in vitro (Figure 1.3A). These reactions, however, are only capable of monitoring single-end integrations and not the biologically-relevant concerted integration of two DNA ends into the target. More intricate assays use longer donor DNA substrates of several hundred base pairs and a circular target DNA to monitor both 3’ processing and concerted integration (Figure 1.3B) (98-100, 127, 148,

149). Furthermore, the nucleoprotein complexes formed in this improved experimental design have properties reminiscent of those in pre-integration complexes and can be assembled in quantities useful for structural analysis (98,

100).

1.4. Structural aspects of HIV-1 integration

HIV-1 IN is notoriously resistant to crystallization efforts, likely owing to its poor solubility and inherent flexibility. HIV-1 IN is comprised of three stable domains: an N-terminal domain (NTD), catalytic core domain (CCD) and C- terminal domain (CTD)(38). Mutant HIV-1 IN proteins with reduced surface hydrophobicity have been used to crystallize the individual domains as well as two-domain structures consisting of the NTD+CTD and CCD+CTD (Figure 1.4)

(21, 48, 109). Of note, the soluble mutants used for the structures exhibit aberrant inter-subunit dynamics in vitro and yield replication-defective viruses in 5

vivo (82, 123). No structure to date has included viral or target DNA. The N- terminal domain (NTD, residues 1-50) is a compact α-helical bundle stabilized by

the coordination of a zinc atom with a distinctive CCHH zinc finger motifs. It

plays an important architectural role, stabilizing both protein-protein and protein-

DNA interactions (38, 70). The core domain (residues 51-212) contains a single

highly conserved active site in which the three acidic residues D64, D116, and

E152 coordinate a divalent metal ion (56). Mutations of these residues severely

compromise IN activities both in vitro and in infected cells (56, 74, 91).

Biochemical assays with purified IN revealed that it requires either Mg 2+ or Mn 2+

to carry out the reactions with model DNA substrates. Of these, Mg 2+ is considered to be the physiological cofactor due to its relative abundance in the cells. Several structural studies have shown a single divalent metal bound to the active site of the HIV-1 CCD (67). However, based on the two-metal mechanism for structurally and functionally similar polynucleotidyl transferases (8, 170), it has been proposed that DNA binding stabilizes the second metal in the active site

(102). IN uses the same catalytic site for 3’-processing and strand transfer reactions. Therefore, the CCD is likely to harbor both viral and target DNA binding sites. Furthermore, the CCD is also an essential building block for formation of the functional multimeric IN. The CCD-CCD interface is fairly large

(~ 1650 Å) and mutations destabilizing these interactions adversely affect IN catalytic activities (47, 166). The C-terminal domain (residues 213-288) is made up of highly basic β-sheets that bind DNA strongly but non-specifically. The C-

terminal domain (CTD) is rich in basic amino acids and adopts an SH3-like fold 6

(49). Other proteins with the same fold bind the minor groove of DNA in a nonspecific manner (66, 88, 138). Similarly, the CTD is thought to provide a stabilizing platform for DNA substrates. In addition, the CTD has been implicated in functional oligomerization of IN. L241A and L242A mutations along the C- terminal dimer disrupted IN dimerization and compromised catalytic activities

(114).

IN selectively binds U5 and U3 termini of viral DNA. Footprinting of PICs isolated from infected cells revealed the terminal 200-250 base pairs of each viral

DNA end as primary protein binding sites (27). While random DNA sequences can effectively compete with IN-viral DNA interactions and impair the 3’- processing reactions in vitro (90, 150, 160), in the context of infected cells this is unlikely to significantly deter association with viral DNA ends as PIC assembly takes place in the cytoplasm where competition from non-specific DNA would be minimal. Functional assays have shown that IN can distinguish between the viral

DNA ends and nonspecific substrates. Mutational studies in vitro have indicated the importance of CA/TG dinucleotide pair for effective 3’-processing of the viral

DNA ends (90). Additional proximal regions of viral DNA have also been implicated in specific recognition of the viral DNA (59). One important feature contributing to selective recognition of the LTR termini by IN could be the DNA end distortion. NMR analysis of a 17 base pair oligonucleotide containing the U5 terminal sequence revealed that base stacking and minor groove were significantly disordered at the cleavage site and a correlation between DNA end

7

distortion and cleavage activities was observed (137). Introducing mismatch bases at the terminal three positions enhanced base unstacking and unpairing, and substantially stimulated the site specific processing activities.

The alternative experimental strategies to identify the LTR regions

important for selective recognition involved application of DNA analogs. Probing

effects of various DNA backbone, base, and groove modifications on IN catalytic

activities suggested that IN requires flexibility of the phosphodiester backbone at

the scissile bond (85). The other study examined 2’-modified nucleosides and

1,3-propanediol insertions in various positions of the U5 sequence (2). Akin to

the mutagenesis experiments (59) divalent metal dependent affects were

observed upon altering certain regions of the DNA (2). Nucleoside modifications

at positions 3, 5 and 6 significantly diminished Mg2+ dependent activities, while

Mn 2+ dependent reactions were less affected. In contrast, Mg 2+ and Mn 2+

dependent activities were equally impaired when the modifications were

introduced at positions 7-9 (2). Taken together, the biochemical approaches

enabled the delineation of several important features of viral DNA essential for

formation of the functional nucleoprotein complexes. Nevertheless, the detailed

mechanism for selective recognition remains elusive. Ideally, atomic structures of

IN complexes with specific and non-specific would be necessary to fully

address this question.

Recombinant HIV-1 IN exists predominantly as a dimer in solution, in

equilibrium with monomers, tetramers and higher order oligomers. The IN dimer

interface is the most stable, with a predicted Kd of 67.8pm (155). The less stable 8

integrase tetramer requires a separate interface that probably involves all three

IN subdomains. One particular two-domain crystal structure shows two symmetry-related CCD+NTD particles in what appears to be a tetramer formation formed by crystal lattice contacts (167). In this structure the dimer interface is unchanged and the tetramer interface is stabilized by extensive electrostatic interactions between the NTD and CCD of monomers in separate dimers. The multimeric state of IN in the context of preintegration complexes is unknown, though strong evidence suggests that monomers assemble on the viral LTRs to form a dimer of dimers that carries out both enzymatic activities (65, 69, 123,

136). In such an assembly, only two of active sites would be catalytically active while the remaining monomers would play solely architectural roles.

The available two-domain IN crystal structures can easily be superimposed to generate a full-length model. These models fail to recreate a plausible active site arrangement as the spacing between active sites is >30 angstroms while the five base pair distance between attachment sites in target

DNA is only ~15 angstroms (for B-DNA). The inconsistency is most likely explained by the absence of DNA in any of the structures used to build the models.

The structure of prototype foamy virus IN in complex with viral DNA inspired a unique model in which a tetramer of IN in which the active sites are spaced appropriately to carry out the concerted integration of vDNA ends seen in

vivo . This tetrameric structure features two IN dimers face to face, with the inner

subunits making extensive contacts with viral DNA and the CTD and NTD of the 9

same subunits participating in protein-protein interactions (70). Only the CCDs of the outer subunits were resolved in the crystal structure, and they seem to be placed far from the viral DNA substrate. The role of these outer subunits could be to interact with target DNA or cofactors, or they could be playing exclusively an architectural role.

1.5. LEDGF as a cellular cofactor of HIV-1 integration

While purified recombinant IN is capable of carrying out the enzymatic activities in vitro using small DNA substrates, these assays do not reflect the

complexities of integration in vivo. The PIC contains numerous viral and cellular

proteins and integrates a >9kb viral genome into chromatin, which is itself

complex in structure and composition. A number of cellular cofactors are

suspected to interact with HIV-1 IN (52, 157). Among these, lens epithelium-

derived growth factor (LEDGF) has been identified as critical for effective

integration (32, 105). LEDGF is a ubiquitous cellular transcription factor of

uncertain purpose found associated with chromatin throughout the cell cycle. It

is a 530 amino acid 60kD protein (the “p75” was coined based on its apparent

molecular weight on an SDS-PAGE gel). The LEDGF N-terminal domain

ensemble (NDE) consists of a PWWP domain, a nuclear localization signal

(NLS), and dual copies of the AT-hook DNA binding motif (158). The AT-hook

domain and PWWP domain are required for association with chromatin while the

NLS is possibly contributes to nuclear import of the PIC (107). The LEDGF C-

terminus contains a highly conserved integrase binding domain (IBD) that

strongly binds HIV-1 IN as well as various cellular proteins (31, 32, 171). The 10

solution structure of the LEDGF IBD and a co-crystal structure of the IBD in complex with the IN catalytic core domain have been reported (30, 33). The IBD consists of an 82 residue α-helical bundle that docks into a relatively small cavity at the IN catalytic core domain dimer interface, contacting both IN subunits (30).

Cherepanov et al observed that ectopically expressed FLAG-tagged HIV-1

IN consistently pulled down LEDGF and co-localized with LEDGF in the nucleus

(32). Suppression or knockout of LEDGF abolished chromatin association and nuclear localization and reduced integration levels ~10-fold in infected cells

(105). In addition, integration into active transcription units was reduced in these cells (36). These findings immediately suggested a role for LEDGF in nuclear import of the PIC and integration target site selection. These findings were especially compelling in light of the fact that the IN-LEDGF interaction is lentivirus specific and suppression of LEDGF abolishes the two hallmarks of lentiviral infection: nuclear import of PICs and an integration preference for active transcription units (29). The role of LEDGF as a nuclear import factor remains controversial. NLS-mutant LEDGF proteins have been shown to sequester IN in the cytoplasm when both proteins are expressed at high levels (163). However, the relevance of these experiments to naturally occurring viral infection is questionable, as the majority of LEDGF in the cell is chromatin associated and

LEDGF has not been reported to be packaged in viral particles. In viral replication assays using LEDGF knockout cells, the accumulation of unintegrated

HIV genomes in the nucleus indicated the replication block was at the level of integration and not at nuclear translocation (147). The residual infection of 11

LEDGF knockdown cells suggests a LEDGF-independent mechanism for nuclear translocation (147, 163). Furthermore, PICs isolated from the cytoplasmic fractions of LEDGF knockout cells are fully competent for integration in in vitro assays (147). Taken together these findings suggest that LEDGF is not a critical component for assembly or function of the PIC, and that the critical role of

LEDGF lies in its ability to direct PICs to active transcription units and tether them to chromatin.

While the ability of LEDGF to tether proteins to chromatin has been demonstrated, the details of LEDGF interactions with chromatin are mostly unexplained. The PWWP module found in the LEDGF N-terminus is present in

~60 eukaryotic proteins and is related in sequence and 3-D structure to a number of other chromatin binding domains collectively known as the Tudor clan domains

(119). The PWWP domain of the closely related hepatoma-derived growth factor

(HDGF) has been crystallized and the 3-D structure is presumably similar to

LEDGF PWWP (113). The defining feature of the HDGF PWWP is a 5 stranded antiparallel β barrel structure that features a surface exposed hydrophobic pocket

(113). The function of LEDGF in the cell is unclear, though it seems to act as a general co-activator of transcription and chromatin tether. LEDGF knockout mice survive, although with increased mortality (152). It was originally described as a transcription factor that controlled heat shock and stress response elements, although later experiments were not able to replicate a DNA sequence-specific binding of LEDGF (140, 142, 158).

12

1.6 Figures for chapter 1

Figure 1.1 HIV replication cycle: HIV viral envelope proteins interact specifically with surface receptors on target cells, leading to membrane fusion and entry of the viral core into the cytoplasm. The viral RNA genome is reverse transcribed into cDNA, which is transported into the nucleus and permanently integrated into the host DNA by the viral enzyme integrase. The integrated provirus is used as a template for the synthesis of RNAs encoding viral proteins as well as full-length RNAs for packaging into new viral particles. Translation of viral proteins and viral particle assembly lead to the budding of new infectious particles capable of infecting other cells. Figure adapted from Ciuffi et al, 2006 (35).

13

A. B.

3’ Processing

C. D.

Strand Transfer

E.

Figure 1.2 Enzymatic activities of HIV integrase: (A) In the cytoplasm HIV IN recognizes the 3’- viral DNA ends and (B) cleaves a GT dinucleotide, leaving recessed ends. (C) In the nucleus the viral pre-integration complex engages chromatin where (D) the exposed 3’-OH groups are used in a nucleophilic attack that breaks the target DNA phosphodiester backbone and attaches the viral DNA ends five base pairs apart on opposite strands. (E) Host DNA repair enzymes fill in the recessed ends and seal the nicks in the DNA backbone. Figure adapted from Pommier et al, 2005 (132).

14

A B IN viral DNA viral DNA CAGT GTCA

3'-processing GT Stable Synaptic Complex

CA GTCA processed DNA

Integration Target DNA strand transfer

Strand Transfer Complex

GT CA C A Deproteinization Strand transfer or half-site integration product

Concerted Integration Product

Figure 1.3 Assays of integrase enzymatic activity: (A) 32 P labeled 21-mer dsDNA mimicking viral DNA ends is used as a substrate for both the 3’- processing and strand transfer reactions. (B) Long 32 P labeled DNAs (>200 base pairs) with ends mimicking viral DNA are pre-incubated with IN to assemble a stable synaptic complex in which the 3’-processing reaction occurs. The target is super-coiled plasmid DNA, which is uncoiled by a single-end integration event or linearized by the successful concerted integration of two viral DNA ends.

15

A. B.

Figure 1.4 Two-domain structures of HIV integrase The crystal structures of HIV IN (A) catalytic core domain + C-terminal domain (28) and (B) N-terminal domain + catalytic core domain (167). Individual IN subunits comprising the dimeric structure are colored green and yellow.

16

CHAPTER 2

SUBUNIT SPECIFIC VIRAL-DNA CONTACTS AND A DNA-INDUCED

CONFORMATION CHANGE WITHIN FUNCTIONAL HIV-1 INTEGRASE

NUCLEOPROTEIN COMPLEXES

2.1 Introduction

HIV-1 integrase (IN) is commonly viewed as an important therapeutic target for the following reasons: its catalytic activities are required for viral replication, there is no closely related cellular equivalent of IN, and specific IN inhibitors are likely to be effective against viral strains resistant to currently available therapies targeting reverse transcriptase (RT), protease, and virus-cell fusion. Detailed structural information on functional IN-DNA complexes could aid drug design efforts. For example, the promising diketo acid class of inhibitors preferentially bind to the assembled IN-viral DNA complex rather than the free protein (58, 68, 72, 73).

The two chemical reactions catalyzed by HIV-1 IN, 3’ processing and DNA

strand transfer, have been characterized in detail (reviewed in (13)). First, IN

removes two from each 3' end of the viral DNA synthesized by RT. In

the following step, concerted transesterification reactions covalently join the viral

DNA ends into the host genome (57). In vivo, the enzyme acts in the context of a 17

large nucleoprotein complex with a number of viral and host proteins contributing to the integration process (15, 16, 23, 24, 26, 32, 33, 52, 61, 92, 93, 96, 105,

106, 147).

HIV-1 IN is composed of three distinct structural and functional domains: the N-terminal domain (NTD) (residues 1-50) that contains an HHCC zinc binding motif, the catalytic core domain (CCD) (residues 51-212) containing the DDE motif essential for coordinating catalytic divalent metals, and the C-terminal domain (CTD) (residues 213-288) that is thought to provide a platform for DNA binding. Crystallographic or NMR structural data are available for each of the individual domains (20, 47, 49, 67, 108). In addition, two-domain CCD/CTD (28) and NTD/CCD (166) crystal structures have been determined. However, efforts to obtain detailed structures for the full-length protein or IN-DNA complexes have been impeded by relatively poor protein solubility and/or full-length enzyme flexibility.

To better understand IN-DNA interactions, a number of biochemical approaches have been employed. Site directed mutagenesis experiments indicated that different monomers within the IN multimer provide complementary rather than symmetrical contacts to DNA (54, 159, 161). Photo and chemical cross-linking studies revealed several potential DNA binding residues (59, 75, 76,

83, 84, 120). These biochemical experiments together with crystallographic determination of the two-domain structures prompted molecular modeling research. However, the IN-DNA models obtained by different groups vary

18

significantly, indicating that the available experimental data comprises an insufficient number of constraints for formulating a common outcome (25, 41, 64,

76, 86, 130, 166, 168) .

Here we employed a new experimental strategy combining two established methodologies of site specific protein-DNA cross-linking through bond formation and mass spectrometric (MS) protein footprinting. This approach enabled us to identify monomer selective contacts with substrate DNA.

Importantly, our studies found that upon DNA binding IN undergoes a conformational change involving the alpha helical connection between the core and C-terminal domains. We also uncovered a potential role for the NTD in binding to the viral DNA substrate. The functional importance of the identified residues was confirmed by site directed mutagenesis. Our data provide new and important details on how HIV-1 IN interacts with its viral DNA substrate.

2.2 Materials and Methods

2.2.1 Preparation of recombinant HIV-1 IN, DNA oligonucleotides, and cross-linked products: Full-length IN and mutant proteins were expressed in E. coli and purified as described previously (84). The catalytic activities of

recombinant proteins were monitored according to the reported procedure (84).

Phosphoramidites (O6-phenyl-dI, O4-triazolyl-dU and 2-F-dI) were purchased

from Glen Research (Sterling, VA) and synthetically incorporated in the viral DNA

sequences using an Applied Biosystems 392 DNA synthesizer. Post-synthetic

attachment of the carbon tether (cystamine) to the modified bases, and

subsequent activation and purification of oligonucleotides were performed as 19

published previously (84). For cross-linking reactions, 10 µM IN was incubated with equimolar DNA duplex in 50 mM HEPES, pH 7.0, 100 mM NaCl, 5 mM

o MgCl 2, and 10% glycerol at 37 C for 20 min. The reactions were quenched by 20 mM methyl methanethiosulfonate and subjected to surface topology analysis described below.

2.2.2 Mass spectrometric footprinting: Free IN and cross-linked IN-DNA complexes were treated at 37 oC with NHS-biotin or p-Hydroxyphenylglyoxal

(HPG) for 30 and 60 min, respectively. The reagent concentrations are indicated

in the text and figure legends. The reactions were quenched by excess Lys and

Arg using free amino acid forms. Cross-linked products were separated by

denaturing SDS-PAGE using nonreducing gel loading and separation buffers.

Bands were visualized by Microwave Blue stain (Protiga, Gaithersburg, MD),

excised, and destained according to the described procedure (40, 89, 103, 117,

141, 143). In-gel proteolysis was performed using 0.5 µg trypsin to generate

small peptide peaks amenable to MS and MS/MS analysis. Tryptic peptides were

monitored by a Kratos MALDI-ToF instrument equipped with a curved field

reflectron feature (Kratos Analytical Instruments, Manchester, U.K.), enabling

generation of post source decay (PSD) amino acid sequence data. MALDI-ToF

experiments were performed using α-cyano-4-hydroxy-cinnamic acid as a matrix.

For accurate quantitation of the modified peptide peaks, the intensities of

unmodified IN tryptic peptides were used as internal controls.

2.2.3 CD Spectroscopy: CD spectra were recorded on an AVIV 202 circular

dichroism spectrometer (USA). A 0.01 cm pathlength cell and 1 nm bandwidth 20

were used to make measurements in the near (215-320 nm) ultraviolet region.

The protein concentration was 10 µM in 20 mM PIPES, pH 6.8, 750 mM NaCl, 2 mM β-mercaptoethanol, 0.1 mM EDTA and 1 mM CHAPS.

2.2.4 Virus infectivity: The SpeI-EcoRI fragment from pNL4-3 containing the IN coding sequence sub-cloned into pBlueScript (Invitrogen) was subjected to

QuikChange mutagenesis (Stratagene, La Jolla, California), and the mutated fragments were returned to pNL4-3. AgeI-PflMI fragments were subsequently exchanged for the IN-coding fragment in the NLX.Luc.R- strain that carries the luciferase reporter gene (110). Normalized levels of viral infectivities using SupT1 target cells were calculated as previously described (110).

2.3 Results

We chose to employ MS protein footprinting (40, 89, 103, 117, 141, 143) to study IN-viral DNA interactions for the following reasons. The methodology proved to be effective for identifying contact amino acids to cognate nucleic acids for a number of nucleoprotein complexes not amenable to conventional structural biology analyses such as X-ray crystallography or NMR. Of note, the MS footprinting consistently revealed functionally essential interactions (40, 89, 103,

117, 141, 143). Briefly, the approach is based on comparing surface topologies of free protein versus pre-assembled nucleoprotein complexes using amino acid specific reagents such as NHS-biotin and HPG that modify exposed lysine and arginine residues, respectively. The basic residues are targeted due to their likely role in electrostatic contacts with nucleic acids. Reagent concentrations are carefully optimized to ensure mild modification conditions where the integrity of 21

the functional complex is preserved. Subsequent MS analyses reveal the surface residues readily labeled in free protein yet shielded from modification in the complex due to bound nucleic acid.

Unlike previously examined nucleoprotein complexes, IN-DNA interactions presented us with a new challenge as homologous protein monomers are predicted to provide complementary rather than symmetrical contacts to cognate

DNA. Indeed, earlier complementation studies (54, 161) suggested that the viral

DNA is coordinated by the CCD of one monomer and the CTD of another monomer. Because of this, it was essential to separate individual monomers based on their differential interactions with DNA to accurately assign contacts within full length multimeric IN. For this, we employed a new experimental strategy combining two technologies established in our laboratories: site-specific protein-DNA cross-linking through disulfide bridging (reviewed in (164)) and MS foot-printing (40, 89, 103, 117, 141, 143) (Figure 2.1). Our goal was to obtain two different cross-linked complexes: one with the CCD tethered to the DNA end, and the other with the CTD linked to a subterminal position in the DNA substrate. The two complexes were then exposed to selective amino acid modifying reagents.

Subsequent SDS-PAGE enabled us to separate the cross-linked complexes from free protein. As depicted in figure 2.1, monomer 1 cross-linked to DNA through

CCD residue E152C was separated from monomer 2 (Figure 2.1, middle). In a parallel experiment, the monomer 2-DNA complex cross-linked through a selective CTD residue was separated from monomer 1 (Figure 2.1, bottom).

Cross-linked products were then processed and analyzed separately using MS. 22

We predicted that protection patterns within monomer 1 and monomer 2 would differ, and the identified amino acids in the respective monomers would reveal a

DNA binding channel in the context of full length multimeric IN.

To form disulfide bridged IN-DNA complexes, the alkanethiol tether was placed at selected sites in the viral DNA sequence. DNA oligonucleotides were prepared through synthetic incorporation of commercially available analogs of A,

C and G. Crystallographic and NMR studies have shown that placements of cross-linkable base analogs in double stranded DNA do not alter Watson-Crick base pairing or global nucleic acid structures (reviewed in (164)). Furthermore, viral sequences containing the modified bases were effective IN substrates

(Figure 2.2).

IN was prepared for cross-linking using a previously reported two-step procedure (64, 84). First, 3 surface Cys residues (C56, C65 and C280) were replaced with Ser to rid non-specific cross-linking between the wild type protein and modified DNA. Of the remaining three cysteines, C40 and C43 coordinate the structural Zn ion, and C130 is hindered from the protein surface (28, 166).

Indeed, the triple mutant (C56S, C65S and C280S) exhibited minimal reactivity with cross-linkable DNA (84). In the second step, the reactive Cys residue was introduced at a surface position predictive of specific nucleoprotein complex formation. Based on available mechanistic and structural studies, we chose

E152C to tether IN to DNA modified at base G2 (numbering as in Figure 2.2, upper panel). E152 is the catalytic Glu of the DDE motif, and G2 is located at the scissile phosphodiester bond. Furthermore, the reactive SH group in the G 23

analog points to the minor groove (164), stabilizing the IN catalytic site immediately adjacent to the scissile bond. To confirm specificity, binding of

E152C protein to DNA modified at other positions was examined. The results reveal a striking preference for formation of the IN(E152C)xDNA(G2) complex

(Figure 2.3A). The reduced reactivity for position 2 in the lower strand could be explained by the fact that the SH group of the C-analog points to the major groove, away from the cleavage site. The observation that the other sites yield only residual cross-linking (Figure 2.3A) indicates the specific nature of the

IN(E152C)xDNA(G2) interaction.

Previous studies by Bushman and colleagues (64) identified IN(E246C) as a plausible CTD contact to DNA. Using TNB-thio tethered substrates the authors observed robust but non-specific cross-linking of IN(E246C) to various positions. Further experiments with less reactive nucleotide analogs uncovered the relative preference of IN(E246C) for position A7 albeit significantly reduced formation of the nucleoprotein complex. To generate sufficient amounts of cross- linked products for our MS footprinting we used TNB-activated DNAs. In addition to IN(E246C) crosslinking to A7 (Figure 2.3B, lane 5) we probed interactions with additional nucleotides including G16 and G19 located at the distal region of the

DNA substrate (Figure 2.3B, lanes 6 and 7). The data indicated comparable reactivities of IN(246C) with various cross-linkable nucleotides (Figure 2.3B).

While these non-specific interactions were expected, the available crystal structure of the CCD-CTD fragment (28) allowed us to delineate functional from non-productive IN-DNA interactions. For example, based on the spatial 24

separation between the CCD and CTD, we could predict that crosslinking

IN(E246C) to nucleotides adjacent to the scissile bond (Figure 2.3B, lanes 2 and

3) would position the CCD toward the opposite terminus of the DNA substrate. In contrast, in the functional complex the CCD would have to directly interact with the 3’-processing site, apposing the CTD to the distal regions of the DNA substrate. Therefore, for the footprinting studies we chose to examine IN(E246C) crosslinks to positions A7, G16 and G19.

Of note, the triple (C56S, C65S, and C280S) background mutant and

E246C proteins were catalytically active, indicating these mutations did not significantly alter IN folding or its interactions with viral DNA (Figure 2.2B). As expected, replacement of active site residue E152 compromised IN activity, due to the inability of the substituent amino acid to coordinate the essential magnesium ion for catalysis. Therefore, the integrity of E152C IN was assessed as follows. First, E152C efficiently crosslinked with DNA(G2) (Figure 2.3A, lane

3), indicating that its global structure is intact. Chemical modification and subsequent MS analyses importantly revealed very similar surface topologies for the wild type and E152C INs. Furthermore, the two proteins yielded overlapping

CD spectra. This collection of data supports the notion that substituting Cys for

Glu-152 does not overtly alter IN structure.

The modification reactions were carried out under mild conditions such that the integrity of the nucleoprotein complex was preserved. For this, effects of increasing concentration of NHS-biotin and HPG modifiers on wild type IN activities were examined. Representative data in Figure 2.4 show that treatment 25

of free IN with NHS-biotin using the 1/50 ratio of protein/reagent inactivated IN function, probably due to modification of essential lysine residues in the vicinity of the enzyme active site (83). To ensure that this was the case as compared to gross disruption of IN structure, we subjected the pre-assembled IN-DNA complex to the same modification. The result was a remarkable retention of original IN activity, indicating that under these conditions, the integrity of the functional complex is preserved (Figure 2.4, lane 3). The optimal IN/HPG ratio was analogously defined as 1/1000. Importantly, centrifugation assays (15,800 g for 1h) indicated that the crosslinked complexes were fully soluble (data not shown).

The two nucleoprotein complexes IN(E152C)xDNA(G2) and

IN(E246C)xDNA(A7) were subjected to MS footprinting analysis according to the

scheme depicted in Figure 2.1. Representative segments of the mass spectra

are depicted in Figure 2.5. C1, C2 and C3 peaks are unmodified IN peptides that

provide internal controls for accurate quantitation of modified peptides. The

peptide peak 265-269 containing biotinylated K266 is another control, as this

lysine is equally exposed in the free protein and IN-DNA complexes (Figure

2.5A). Figure 2.5B reveals monomer selective protection of K160. This residue is

readily modified in free IN. However, upon formation of the IN(E152C)xDNA(G2)

complex, K160 is shielded from modification. In contrast, the residue remains

surface-exposed in the IN(E246C)xDNA(A7) complex. A converse picture is

observed for CTD residue K264 (Figure 2.5C). The intensity of the modified peak

persists in the IN(E152C)xDNA(G2) complex, but was significantly diminished in 26

the IN(E246C)xDNA(A7) complex. Taken together, these results alongside the summary in Table 2.1 demonstrate that separate monomers differentially contact substrate DNA. DNA cross-linked via E152C specifically protected K14, K159,

K160, and R199, whereas residues K219, K244, R263, K264, and R269 were footprinted only via the E246C link. The protection patterns of E246C with positions A7, G16 and G19 were very similar.

Interestingly, a unique modification pattern was observed for R199 (Figure

2.6). This residue was only marginally modified in the context of free IN. The

reactivity was further reduced in the IN(E152C)xDNA(G2) complex. Of note, we

observed a remarkable increase in the extent of R199 modification within the

IN(E246C)xDNA(A7) complex (Figure 2.6C), indicating that subunit-specific DNA

binding significantly exposes this amino acid to the protein surface. R199 was

also more reactive in the IN(E246C)xDNA(G16) and IN(E246C)xDNA(G19)

complexes as compared with the free protein. However, a pattern could be

noticed wherein the extent of R199 modification decreased as the crosslinking

points were further distanced from the cleavable DNA end (compare 4C, D and

E).

To address the specificity of the IN-DNA interactions, E152C was cross-

linked with non-specific double stranded and single-stranded DNA. The non-

specific double strand did not yield sufficient cross-linked species for quantitative

MS analysis. These results together with the data depicted in Figure 2.3A

indicate that in non-specific IN-DNA complexes, E152C is not positioned close

enough to establish disulfide cross-links with the DNA. The single stranded 27

oligonucleotide resulted in higher yields of cross-linked product than its double stranded counterpart, but constituted only ~25% of the specific

IN(E152C)xDNA(G2) complex (data not shown). MS comparison of

IN(E152C)-ssDNA and IN(E152C)xDNA(G2) products indicated sharp differences in protein-DNA contacts within the two complexes. The residues protected in the latter complex remained surface exposed in the former species.

Representative data showing differential accessibilities of K14 in the two complexes are depicted in Figure 2.7. These results suggest that detectable cross-linking between E152C and ssDNA was due probably to the highly flexible nature of the oligonucleotide, rather than its specific binding to IN.

We next used site-directed mutagenesis to confirm the functional importance of the identified amino acids. In fact, several interacting residues revealed from our footprinting results have previously been examined by mutagenesis due to their high degree of conservation among various HIV strains, and were shown to be essential for recombinant IN function and HIV-1 replication

(83, 110, 111). Therefore, our efforts focused on analyzing two novel contacts:

K14, the first NTD residue to be implicated in viral DNA binding, and R199, involved in the protein conformational change in one subunit while interacting with DNA in the other monomer (Table 2.1). The substitution of Ala or Glu was engineered at each position within the wild type IN background. MS surface mapping and CD spectroscopy revealed proper folding of the recombinant mutant proteins. The results in Figure 2.8 show that the K14A/E and R199A/E mutants exhibited residual levels of specific activity (3’-processing and DNA 28

strand transfer) and generated non-specific (20-mer) cleavage products.

Interestingly, the same phenomenon was observed with mutations of certain IN residues (for example Q148) implicated in specific recognition of the viral DNA substrate (84). An alternative scenario that our preparations could be contaminated with E. coli is less tenable as the D116N active site

mutant purified according to the identical procedure did not generate the 20-mer

product (Figure 2.8, lane 7). Also noteworthy is that the Ala mutants displayed

significantly reduced specific activity, while the Glu mutations yielded more

pronounced affects (Figure 2.8, compare lane 4 to lane 3 and lane 6 to lane 5) .

Neutralizing the positive charge that is needed to stabilize DNA at the specific

site by Ala substitutions could reduce binding affinity, while the placement of Glu

may result in more efficient displacement of the nucleic acid from its specific site

due to electrostatic repulsions.

Each of the four mutations also significantly impacted the infectivity of the

virus. The R199A mutant retained ~0.24% of wild-type HIV-1NL4-3 function in a

single-round infection, whereas the other viruses failed to yield detectable

infectivities (Table 2.2).

Our experimental results were used to dissect the viral DNA binding

channel in the functional IN nucleoprotein complex. For this, a full length IN

model was created by superimposing the two available crystal structures:

NTD/CCD and CCD/CTD (28, 166). Then the residues identified by our

footprinting analysis were located on the IN monomers (Figure 2.9). The

assignment of the amino acids to the respective IN subunits was critical to 29

visualize a plausible DNA binding channel. The DNA substrate was readily positioned along the path of the basic residues (Figure 2.9) with all three protein domains providing direct contacts to nucleic acid.

2.4 Discussion

We have investigated interactions between full-length HIV-1 IN and its viral DNA substrate using a novel experimental strategy combining two established methodologies: site-specific protein-DNA cross-linking through disulfide bridging, and MS protein footprinting. The cross-linking technology has been successfully used in combination with x-ray crystallography to obtain atomic structures of a number of nucleoprotein complexes including the covalently trapped complex of RT with a DNA primer-template and a deoxynucleoside triphosphate (78, 164). Analogous efforts for the HIV-1 IN-DNA complex have so far been hampered by poor protein solubility and/or full length enzyme flexibility.

Combining site-specific cross-linking and MS footprinting offered a powerful alternative and led us to the following important findings. Asymmetric contacts between IN and viral DNA were identified, which provided major constraints to dissect the viral DNA binding channel in the functional nucleoprotein complex. In addition, our experiments uncovered a subunit-specific conformational change induced upon DNA binding.

Our work was based on and expanded previous optimization of IN-DNA

cross-linking through disulfide bridging (64). We complemented the cross-linking

approach with MS surface topology analysis of basic amino acid residues. Our

results reveal the importance of these technologies for identification of 30

asymmetric contacts between IN and viral DNA. While the observed protections near the cross-linking points were expected, the method also revealed subunit selective contacts distant from the disulfide linkages. For example, K14 within the

NTD was protected when CCD residue E152C was cross-linked to DNA(G2). In addition, increased reactivity rather than protection of R199 was observed in the

IN(E246C) complexes with DNA substrates. Site-specific cross-linking was also essential to delineate specific from non-specific IN-DNA interactions. IN-DNA binding could yield a mixture of active and non-productive complexes. However, functional site-specific modification of IN and DNA enabled us to covalently link and separate specific complexes from non-productive IN-DNA interactions, the latter of which dissociated under denaturing SDS-PAGE (Figure 2.3A). Although our method does not distinguish whether the observed amino acid protections are due to direct protein-nucleic acid or protein-protein contacts, the protections appear to strictly depend on the presence of the specific HIV-1 DNA substrate in the reaction mixture. The surface basic residues identified in this study are moreover likely to engage in electrostatic interactions with the DNA substrate.

Indeed, all the interacting amino acids align in a plausible DNA binding channel

(Figure 2.9).

Our results for the first time implicate the NTD in direct contact with viral

DNA. Previous studies revealed the importance of the NTD for IN catalytic activities via testing mixtures of different protein domains in functional trans- complementation assays (54, 159, 161). This approach, however, fell short of demonstrating direct NTD-viral substrate interactions. We in contrast observed 31

specific protection of NTD residue K14 in the IN(E152C)xDNA(G2) complex, but importantly not with the non-specific IN(E152C)-ssDNA complex (Figure 2.7) nor

IN(E246C) complexes (Table 2.1). Other studies reported that the N-terminal 11 residues of HIV-1 IN cross-linked to the target DNA component rather than viral sequences in a dumbbell disintegration substrate (75, 76). Given that the authors examined a limited number of cross-linkable sites in the viral DNA, the contribution by K14 could easily have been missed. In contrast, the Lys and Arg modifying reagents employed here provide a significantly more detailed picture of individual amino acid residues. The alternative scenario that K14 became protected at a point in the reaction pathway that lay significantly downstream of

IN-viral DNA complex formation for example during assembly of the DNA strand transfer complex could be ruled out for the following reasons. While our in vitro assays could yield strand transfer reaction products after 3’-processing of the blunt-ended 21-mer substrate, our crosslinking and modification reactions were performed for relatively short periods of time under which no 3’-processing or strand transfer products were detected (data not shown). In addition, the

IN(E152C)xDNA(G2) complex, in which K14 was protected from modification, was catalytically inactive. Therefore, our findings suggest that K14 plays a role in interacting with viral DNA. This conclusion is further supported by the site- directed mutagenesis studies, which showed that substituting K14 with Ala or Glu compromised both 3’-processing and strand transfer activities in vitro and abrogated HIV-1 infectivity in cell culture. Interestingly, the recombinant mutant proteins generated -1 cleavage products (Figure 2.8), a phenotype reminiscent of 32

the pattern observed with mutations of certain IN residues involved in selective interactions with viral DNA (84). Figure 2.9 indicates that K14 is positioned in the viral DNA binding channel, where it could indeed contribute to cognate DNA recognition.

Another key finding of this study is a DNA-induced protein conformational change involving the extended alpha helix that connects the CCD to the CTD. In a previous study, Asante-Appiah and Skalka identified a metal induced reorganization of HIV-1 IN structure, which affected the recognition of the CCD and CTD, but not the NTD, by domain selective (7). Bushman and co- workers observed differential cross-linking of CTD residues with blunt ended and processed DNA substrates, suggesting protein structural changes upon cleavage of the viral DNA terminus (64). Roth and co-workers found that a 19 amino acid sequence insertion at E212, located on the helix connecting the CCD and CTD, was tolerated by IN as this large structural change did not significantly impact enzyme function (133). Our observation of the increased reactivity of R199 in the

IN-viral DNA complex (Figure 2.6) provides important clues regarding the nature of protein structural changes. R199 is located on the ridged helix (aa 196-221) bridging the CCD to the CTD (see Figure 2.9). The helix is flanked by a flexible loop (aa 187-195) and the highly basic CTD on its N- and C-termini, respectively.

Such structural organization enables the flexibility of the loop to be translated to and exploited by the CTD upon formation of the functional nucleoprotein complex.

33

Our interpretation of the step-wise decrease in R199 reactivity within the

IN(E246C)xDNA(G16) and IN(E246C)xDNA(G19) complexes as compared with

IN(E246C)xDNA(A7) is the following. Linking the CTD on A7 with the CCD concomitantly committed to the scissile bond at the 3’ end of the cleaved strand would require a significant conformational change in the protein. Accordingly, the

CTD accommodated distal G16 and G19 sites with less overall structural rearrangements (Figure 2.6). This observation reflects an extraordinary degree of flexibility within this region of HIV-1 IN. The residual reactivity of R199 in the

IN(E246C)xDNA(G16) and IN(E246C)xDNA(G19) complexes could reflect relatively minor adjustments of the CTD to bound DNA. We would argue that the

CTD interacts with the energetically more favorable distal regions of the DNA substrate. In Figure 2.9, G19 is located nearby protected K244 and crosslinkable

E246 residues. However, the binding site of the CTD can apparently vary. Unlike the CCD that needs to precisely position over the cleavable scissile bond, the

CTD is highly flexible and thus engages in non-specific contacts with DNA in vitro

(Figure 2.3B). In this respect it is noteworthy that insertion of 19 amino acids at the CCD-CTD connection, which could have very well altered the positioning of the CTD on viral DNA, did not significantly affect IN activity (133). We propose that this reflects an inherent flexibility of the CCD-CTD connector loop, which could be essential for the productive assembly and stability of the nucleoprotein complex.

34

Why does viral DNA bind to only one surface of the IN multimer while the opposite side of the protein remains unaligned? There are at least the following two factors to consider. The crystal structure of the CCD-CTD fragment indicates that the extended α-helix (aa 196-221) is a canonical cylinder in one monomer

and is bent by ~45 degrees at Gln209 in the second monomer, thus rendering

the IN dimer asymmetric ((28), also see Figure 2.10). Our findings indicate that

upon DNA binding only one of the two monomers undergoes a conformational

change. Thus, the DNA bound and surface exposed sites within the IN multimer

differ significantly. We suggest that the conformational change together with the

kink observed in the crystal structure have biological relevance and are important

for the assembly of the functional nucleoprotein complex.

IN dimers have been shown to bind each viral DNA end and catalyze the

3’-processing reactions (69, 71). Following GT dinucleotide removal IN remains

stably associated with cognate DNA (69) and the two viral DNA-bound dimers

presumably form a tetramer to carry out the integration of both viral DNA ends,

as occurs during infection (100). Figure 2.9 depicts the IN dimer bound to one

viral DNA end. Further experimental data elucidating protein-protein interfaces in

the synaptic complex are needed to validate higher order interactions at play

during the concerted integration of both viral DNA ends. Conditions for effective

full-site integration by recombinant IN and 2-D-gel electrophoretic isolation of

synaptic complexes have been reported (100, 149). These techniques can be

combined with MS surface topology analyses to better understand the protein-

protein interactions essential for the two-ended integration reaction. It will also be 35

intriguing to apply our footprinting approach to study effects of physiologically relevant cell factors like LEDGF/p75 on IN-DNA interactions. Our identification of a restricted, single viral end-specific DNA binding channel by the combined use of subunit-specific protein-DNA cross-linking and MS footprinting is an important step toward these long-term goals. In addition, the new experimental strategy reported here can be employed by other research groups to analyze a large variety of nucleoprotein complexes.

The new findings reported here also have implications for exploiting the

IN-viral DNA structure as a therapeutic target. For example, the most promising

diketo class of inhibitors are known to specifically interact with the pre-assembled

IN-viral DNA complex and impair the DNA strand transfer activity (58, 68, 132).

More recently, IN-DNA interfacial quinolonyl diketo acid derivatives, which

potently inhibit both 3’-processing and strand transfer, were reported (46). The

novel IN-viral DNA contacts uncovered herein could help to elucidate the

mechanisms of these known compounds and facilitate rational evolution of new

potent inhibitors.

36

2.5 Tables for Chapter 2

Residue Free IN IN(E152C)xDNA(G2) IN(E246C)xDNA(A7)* K14 + - + K111 + + + K159/K160 + - + K160 + - + R166 + + + K188 + + + R199 + - +++ K211 + + + K219 + + - R224 + + + R228 + + + K244 + + - R262 + + + R262/ R263 + + - K264 /K266 + + - K266 + + + R269 + + - K273 + + + *, IN(E246C) complexes with DNA(A7), DNA(G16) and DNA(G19) yielded very similar protection patterns; +, surface exposed residues readily modified; -, amino acids inaccessible to modification; +++ , increased reactivity (see Figure 2.6); The amino acids that become shielded from modifiers in the nucleoprotein complex are highlighted.

Table 2.1 Susceptibility of IN basic residues to modification

37

Mutant Infectivity* V165A 0.01 ± 0.01 K14A < 0.01 K14E < 0.01 R199A 0.24 ± 0.13 R199E < 0.01 D64N/D116N 0.05 ± 0.02 * Percent wild-type infectivity (2.7x10 5 relative light units/µg total cell protein 7 6 using 4 x 10 RT-cpm of HIV-1NLX.Luc.R- and 5 x 10 SupT1 cells) of three independent infections with standard error. V165A and D64N/D116N were included as prototypical class II and class I IN mutant viruses, respectively (110).

Table 2.2 Infectivities of IN mutant viruses

38

2.6 Figures for Chapter 2

Figure 2.1 Experimental strategy: Free IN (space-fill model of the two domain CCD/CTD structure is used for illustration) and the two cross-linked IN-DNA complexes are examined in parallel experiments. The two IN mutants each containing a reactive Cys residue in either the CCD (E152C) or CTD (E246C) are cross-linked to DNA substrates that contain modified nucleotides placed at respective positions. Free IN and the two complexes are subjected to treatment by small chemical modifiers (M). Surface residues in free IN and complexes are modified, but amino acids interacting with DNA are shielded from modification. Complexes and free IN monomers are separated by SDS-PAGE. Monomer-DNA complex bands are excised and subjected to in-gel proteolysis. Subsequent comparative MS analyses reveal modification patterns in free protein and individual monomers.

39

B

Figure 2.2 IN mutant enzymatic activity: The upper panel shows the 21-mer U5 viral DNA end substrate used in our studies. The modified nucleotides are in bold and the IN cleavage site is indicated by a vertical arrow. (A) Wild type IN 3’ processing and strand transfer activities using native and thiol cross-linkable oligonucleotides. Lane 1: no IN control; lane 2: unmodified DNA sequence+IN; lane 3: DNA(G2)+IN; lane 4: DNA(A7)+IN; lane 5: DNA(G16)+IN; lane 6: DNA(G19)+IN. (B) 3’-processing and strand transfer activities of the following proteins: lane 1: no IN control; lane 2: wild type IN; lane 3: the background triple (C56S, C65S and C280S) mutant IN, lane 4: the IN(E152C) mutant; lane 5: the IN(E246C) mutant. Of note, while E152C (lane 4) and E246C (lane 5) mutations where introduced in the background SSS sequence (lane 3), these proteins for simplicity are referred in the figure and text as IN(E152C) and IN(E246C). 21-mer substrate (21-S),19-mer cleavage (19-P) and strand transfer (STP) products are indicated.

40

A

B

Figure 2.3 IN mutant cross-linking efficiency with modified DNA: The upper panel shows the 21-mer U5 viral DNA end substrate used in our studies. The modified nucleotides are in bold and the IN cleavage site is indicated by a vertical arrow. (A) Cross-linking efficacy of IN(E152C) to the following modified DNAs: 1) no DNA control, 2) DNA(C2), 3), DNA(G2), 4) DNA(G5), 5) DNA(A7), 6) DNA(G16) and 7) DNA(G19). (B) Cross-linking efficacy of IN(E246C) to the following modified DNAs: 1) no DNA control, 2) DNA(C2), 3), DNA(G2), 4) DNA(G5), 5) DNA(A7), 6) DNA(G16) and 7) DNA(G19). Of note, while E152C and E246C mutations where introduced in the background SSS sequence, these proteins for simplicity are referred in the figure and text as IN(E152C) and IN(E246C).

41

Figure 2.4 The effects of the modifying reagent on IN function: Lane 1: reactions without any modifications, lane 2: IN was pre-treated with NHS-biotin (IN/NHS-biotin ratio of 1/50) prior to DNA substrate addition. Lane 3, NHS- biotin added after IN-DNA complex formation. 21-mer substrate (21-S),19-mer cleavage (19-P) and strand transfer (STP) products are indicated.

42

Figure 2.5 Representative segments of MALDI-ToF data: Free IN, IN(E152C)xDNA(G2) and IN(E246C)xDNA(A7) complexes were treated with NHS-biotin (A, B and C). IN peptides as well as the affected residues are indicated. Peaks C1, C2 and C3 are unmodified tryptic peptide peaks of IN, which serve as internal controls.

43

Figure 2.6. Representative segments of R199 IN MALDI-ToF data: Free IN (A), IN(E152C)xDNA(G2) (B), IN(E246C)xDNA(A7) (C), IN(E246C)xDNA(G16) (D) and IN(E246C)xDNA(G19) (E) complexes were treated with HPG in parallel reactions. The aa189-211 peptide containing HPG modified R199 is indicated. Peak C4 is an unmodified tryptic peptide of IN, which served as an internal control. 44

Figure 2.7 Representative MALDI-TOF spectra showing specific protection of K14: In parallel experiments IN(E152C)xDNA(G2) and IN(E152C)-ssDNA complexes were subjected to NHS-biotin treatment. The presented spectra indicate that K273 remained surface assessable in both complexes, while K14 was specifically protected in IN(E152C)xDNA(G2) but not IN(E152C)-ssDNA. C5 is an unmodified tryptic peptide of IN, which served as an internal control.

45

Figure 2.8 Effects of amino acid substitutions on recombinant IN activities: Upper image depicts strand transfer activities. Positions of 21-mer substrate and reaction products (STP) are indicated. Lower image displays 3’- processing activities. The positions of 21-mer substrate, and specific (19-mer) and non-specific (20-mer) products are shown.

46

Figure 2.9 Model of viral DNA bound to HIV IN: Separate IN subunits comprising the dimeric structure are colored yellow and green. Amino acid contacts identified by mass spectrometric footprinting are labeled, with the substrate DNA (magenta) overlaid. The IN active site residues are highlighted in red.

47

Figure 2.10 The crystal structure of the CCD-CTD fragment (28): Monomers 1 and 2 are colored in yellow and green, respectively. In monomer 1, the 196- 221 α-helix is shown in cyan and the asymmetric kink at Gln209 is indicated. In monomer 2, the same helix is depicted in red, the 187-195 loop is in magenta, and side chain of R199 is in blue.

48

CHAPTER 3

DYNAMIC MODULATION OF HIV-1 INTEGRASE STRUCTURE AND

FUNCTION BY CELLULAR LEDGF PROTEIN

3.1 Introduction

Integration of the reverse transcribed RNA genome into a host

chromosome is an obligatory step for HIV-1 replication (reviewed in (13)). This

process is catalyzed by the retroviral enzyme integrase (IN) in two reaction steps.

In the first step, which is called 3’ processing and takes place shortly after the

cDNA is made, IN hydrolyzes a GT dinucleotide from each end of the viral DNA.

In the second step, IN catalyzes concerted integration of the processed viral DNA

ends into chromosomal DNA. The sites of attack on the two target DNA strands

are separated by 5 bp, which leads to dissociation of the small double-stranded

DNA fragment between the attachment sites. The subsequent repair of the

intermediate species by cellular enzymes completes the integration reaction.

HIV-1 IN consists of three distinct structural and functional domains. The

N-terminal domain (NTD) (residues 1-50) contains conserved pairs of and cysteine residues that bind zinc (18, 20), which contributes to IN multimerization and its catalytic function (94, 176). The catalytic core domain

49

(CCD) (residues 51-212) contains three acidic residues, D64, D116 and E152, which play a key role in coordinating active site divalent metal ions (47, 67). The

C-terminal domain (CTD) (residues 213-288) also contributes to functional IN multimerization (5, 82). Results of structural biology studies revealed each individual domain as a dimer (20, 47, 49, 67, 108) and more recent two-domain crystal structures comprised of the CCD and CTD (28) or NTD and CCD (166) likewise unveiled dimeric organizations. Functional studies suggested that a dimer of full-length IN could suffice to process each 3’ end, whereas a tetramer is required to integrate both viral DNA ends into chromosomal DNA (62, 69, 100).

Efforts to determine the complete IN structure have been impeded by limited protein solubility and/or the inherent flexibility of the three-domain enzyme. Full- length IN interestingly exists as a mixture of monomers, dimers, tetramers, and higher-order species in the absence of DNA (45, 82, 160, 165).

While in vitro analysis with model DNA substrates demonstrated that IN

alone could catalyze 3’ processing and DNA strand transfer reactions, the in vivo

function of the enzyme is likely to be regulated by a number of viral and cellular

proteins. Following the completion of reverse transcription, the newly synthesized

cDNA remains associated with several viral proteins and recruits host factors to

form the preintegration complex (PIC) (16, 32, 60, 79, 97, 101, 104, 115, 124,

157). Of these, transcriptional co-activator p75, also known as lens epithelium-

derived growth factor (LEDGF), is the principal cellular interactor of HIV-1 IN (32,

115, 157). A number of recent studies have indicated that LEDGF is critically

important for effective HIV-1 integration and viral replication (42, 77, 105, 162). 50

RNA interference (RNAi)-mediated knock-down of endogenous LEDGF to below detectable levels resulted in reduction of infection to 3.5% of that observed in the presence of normal cells (105). Similarly significantly reduced levels of HIV-1 infection were detected in LEDGF knockout mouse embryo fibroblasts (118,

147). Expression of recombinant HIV-1 IN in human cells revealed that LEDGF protects the viral protein from proteasomal degradation and tethers it to chromosomal DNA (51, 104, 106, 107, 115). Accordingly, LEDGF primarily functions during HIV-1 infection to tether PICs to active genes during integration

(147). In vitro assays with purified recombinant proteins furthermore demonstrated that LEDGF binds directly to IN, which significantly stimulates its enzymatic activities (29, 31, 32, 127, 134, 158, 172).

The N-terminal part of LEDGF contains a PWWP domain, nuclear

localization signal, and dual copy of the AT-hook DNA binding motif (reviewed in

(55); Figure 3.1). These conserved elements primarily mediate LEDGF

association with chromatin (107, 158). An evolutionarily conserved domain in the

C-terminal region (residues 347-429) mediates the interaction with IN and was

thus termed the IN-binding domain (IBD) (31, 163). The solution structure of the

LEDGF IBD and its co-crystallization with the IN CCD has been recently reported

(30, 33). Interestingly, the IBD docks into a relatively small cavity at the CCD

dimer interface, contacting both IN subunits (30). The importance of the

interacting amino acids revealed from the crystal structure has been validated by

site directed mutagenesis in the context of full-length recombinant proteins (19,

33, 51, 135, 157) and by the out-growth of resistant viral strains in the presence 51

of a dominant-interfering LEDGF fragment (77). However, mutagenesis studies have also indicated that full-length IN-LEDGF interactions extend beyond the contacts observed in the co-crystal structure of the isolated domains (115).

We have undertaken a number of innovative biochemical approaches to

characterize the structural and mechanistic foundations between the full-length

interacting partners, which has revealed a highly dynamic nature for the

interactions between free IN subunits. LEDGF moreover strongly stabilized the

IN subunit-subunit contacts. Mass spectrometric surface topology studies

furthermore uncovered novel protein-protein contacts, which lie outside of the

central IBD-CCD co-crystal structure. Mutational analysis confirmed the

importance of the identified residues and indicated a strong correlation between

IN tetramer formation and high affinity LEDGF binding. These findings provide

new insight into how LEDGF modulates HIV-1 IN structure/function, and highlight

the potential to exploit the highly dynamic nature of IN subunit interactions as a

novel therapeutic target.

3.2 Materials and Methods

3.2.1 Expression Plasmids and Recombinant Proteins: HIV-1 IN proteins

were expressed from pKBIN6Hthr, which was derived from pKB-IN6H (28) by

replacing amino acids VDKLAAALE upstream from the C-terminal His 6 affinity tag with LVPRGS ALE (thrombin cleavage site underlined) by PCR-directed

mutagenesis. Mutations were also introduced into pKBIN6Hthr using PCR, and

the coding regions of plasmids created via PCR were verified by DNA

sequencing. Wild-type and mutant IN proteins were purified according to the 52

previously described procedure (29). Purified recombinant LEDGF, mutant (mt)

LEDGF, and IBD (Figure 3.1) were obtained as described previously (31, 33,

115).

3.2.2 IN 3’ Processing and DNA Strand Transfer Activities: The 32 P labeled

21-mer synthetic double stranded DNA (50 nM) mimicking the U5 viral end

sequence was used as substrate. The concentrations of wild type and mutant IN

proteins as well as LEDGF and LEDGF IBD included in the reactions are

indicated in the figure legends. The reactions were carried out at 37 oC for 1 hr in

buffer containing: 50 mM MOPS (pH 7.2), 2 mM β-Mercaptoethanol, 10 mM

MnCl 2 , 1 mM CHAPS, 50 mM NaCl, and stopped with 50 mM EDTA. Reaction products were subjected to denaturing polyacrylamaide gel electrophoresis and visualized using a Storm 860 phosphorimager (Amersham Biosciences).

3.2.3 Concerted Integration Assay: These assays were performed as described previously (134). Briefly, the 972 bp ScaI-DraIII restriction fragment from pU3U5 (129) served as donor DNA and was 5’ end labeled with 32 P ATP and T4 polynucleotide kinase. HIV-1 IN (400 nM) was assembled with the labeled donor substrate (18 nM) in the presence of 20 mM HEPES (pH 7.0), 5 mM DTT, 10 mM MgCl 2, 25 µM ZnCl 2, 100 mM NaCl, 5% DMSO, 10% PEG

6000. Ligands (IBD or LEDGF) were added before pre-incubation for 20 min at

room temperature. Reactions (25 µl final volume) were initiated by adding 500 ng

of circular target DNA (pGEM, Promega) and the mixtures were incubated for 1

hr at 37 oC. Reactions were stopped by adding 10 mM EDTA, 0.2% SDS, and 1 mg/ml proteinase K. After ethanol precipitation, samples were subjected to 0.6% 53

agarose gel electrophoresis for 6 hours at 50 V. The gels were dried and the labeled DNA products were detected using the Storm 860 phosphorimager.

3.2.4 Subunit Exchange Assay: His-tagged IN (1 µM) was pre-incubated with or without ligand (2 µM LEDGF or 2 µM mtLEDGF) in exchange buffer (25 mM

HEPES, pH 7.1, 200 mM NaCl, 4% glycerol, 2 mM β-mercaptoethanol) for 30 min at room temperature. Tag-free IN (1 µM) was then added and incubated for the indicated times. Aliquots were then briefly centrifuged 2 min at 1,000 g and supernatants were pulled down by Ni-nitrilotriacetic acid (NTA) resin (GE

Healthcare) for 10 min in the presence of BSA (0.1 mg/ml). The IN-bound resin was then washed three times with buffer containing 50 mM HEPES (pH 7.1), 200 mM NaCl, 2 mM MgCl 2, 100 mM imidazole, and 0.1% (v/v) Nonidet P40. The bound proteins were subjected to SDS-PAGE separation and visualized by

Coomassie stain.

3.2.5 Mass Spectrometric Footprinting: In parallel reactions, free IN and

IN+LEDGF were first incubated at room temperature for 30 min and then subjected to treatments at 37 °C with 1 mM N-hydroxysuccinimide (NHS)-biotin for 30 min or 20 mM p-Hydroxyphenylglyoxal (HPG) for 60 min. These concentrations of modifying reagents were chosen because comparative pulldown experiments with untreated and modified IN-LEDGF complexes indicated that under these conditions the integrity of the preassembled protein- protein complex was fully preserved (data not shown). NHS-biotin treatment was carried out in buffer containing 50 mM HEPES (pH 8.0), 150 mM NaCl, 10 mM

MgCl 2. The HPG modifications were performed in 50 mM HEPES (pH 8.0), 50 54

mM boric acid, 150 mM NaCl. The reactions were quenched by excess Lys and

Arg using free amino acid forms. IN-LEDGF complexes were selectively pulled down using Ni-NTA resin. The bound proteins were separated by denaturing

SDS-PAGE and visualized by Microwave Blue stain (Protiga, Gaithersburg, MD).

IN bands were excised, destained and subjected to in-gel proteolysis with 0.5 µg trypsin. The tryptic peptides were analyzed with the Axima-CFR MALDI-ToF instrument (Shimadzu) using α-cyano-4-hydroxy-cinnamic acid as a matrix.

3.2.6 Size Exclusion Chromatography: Experiments were performed with a

Superdex 200 10/300 GL column (GE Healthcare) at 0.5 ml/min in buffer

containing 50 mM HEPES (pH 7.4), 750 mM NaCl, and 10% glycerol. The

column was calibrated with the following proteins: conalbumin (75,000 Da),

carbonic anhydrase (29,000 Da), ribonuclease A (13,700 Da), and aprotinin

(6,500 Da). Proteins were detected by absorbance at 280 nm.

3.2.7 Molecular Modeling: The model of the NTD-CCD tetramer bound to the

LEDGF IBD was generated by overlaying the CCDs within PDB structures 2B4J

(30) and 1K6Y (166) using the Insight II software package (Accelrys Inc., San

Diego) on a Silicon Graphics O2 work station. The constructed model was then

energy-minimized by the same software package using the CFF91 force field and

steepest descent method.

3.2.8 LEDGF Binding Affinities to Wild-type and Mutant Ins: LEDGF (50-650 nM) was incubated with 100 nM His-tagged IN (WT or mutant) in binding buffer

(50 mM HEPES (pH 7.1), 200 mM NaCl, 2 mM MgCl 2, 100 mM imidazole, 0.1%

(v/v) Nonidet P40) for 60 min at room temperature. Samples were then briefly 55

centrifuged for 2 min at 1,000 g and supernatants were pulled down by Ni-NTA resin for 30 min in the presence of BSA (0.1 mg/ml). The resin was then washed three times with the same buffer and the bound proteins were separated by SDS-

PAGE. LEDGF was detected by western blot analysis using a mouse monoclonal

LEDGF (BD Biosciences) and quantified using Image software (NIH).

Plotting and curve fitting was performed with Origin 8 software (OriginLab).

Nonspecific signal was not detected when LEDGF was incubated with Ni-NTA beads in the absence of IN (data not shown).

3.3 Results

We previously reported that LEDGF significantly stimulated the in vitro

activities of HIV-1 IN whereas the isolated IBD failed to do so (31, 158). As these

assays utilized relatively long blunt-ended viral DNA substrates and DNA strand

transfer product formation as read-out, we reanalyzed the effects of these two

proteins on IN function using an oligonucleotide-based assay that monitors the

formation of 3’ processing and DNA strand transfer reaction products on

denaturing sequencing gels (Figure 3.2A) (17). The results in panel B (lanes 1-6)

revealed stimulation of IN DNA strand transfer activity by the LEDGF IBD under

these assay conditions. It should be noted that in this setting the 19-mer 3’

processing reaction product is the substrate for the second catalytic step (Figure

3.2A). To dissect if the IBD directly enhanced IN 3’ processing activity, a

selective IN strand transfer inhibitor (154, 174) was included in the experiment

(Figure 3.12). The 19-mer reaction product accumulated under these conditions,

revealing significant stimulation of IN 3’ processing activity by the LEDGF IBD 56

(Figure 3.12). For control experiments we analyzed the D366N point mtLEDGF, which is defective for IN binding in vitro (33) and in yeast cells (135), but retains

LEDGF DNA binding activity (Figure 3.1). In contrast with the LEDGF IBD,

formation of 3’ processing and DNA strand transfer reaction products decreased

with increasing mtLEDGF concentrations (Figure 3.2B, lanes 7-12), probably due

to competition between mtLEDGF and IN for binding to DNA. A bell-shape

enhancement of IN activities was observed in the presence of wild type LEDGF

(Figure 3.2B, lanes 13-18), likely due to the competition with IN for DNA binding

at high LEDGF:IN ratios. The IBD, which is the weaker stimulant, could display

this effect at relatively high stoichiometry (compare lanes 6 and 18 in Figure

3.2B) because it lacks the N-terminal LEDGF regions that mediate DNA binding

(158) (Figure 3.1).

We next analyzed the activity of the IBD using a longer donor DNA

substrate and a second circular target DNA, a design that can distinguish the

formation of single-end half-site (HS) integration products from those that form by

the pairwise concerted integration of two viral DNA ends (labeled full-site (FS) in

Figure 3.3A) (127, 134). As expected (127, 134), wild type LEDGF modestly

stimulated FS and HS product formation at LEDGF:IN ratios of <1, while higher

LEDGF concentrations selectively inhibited FS product formation (Figure 3.3B,

lanes 7-9). The selective reduction of FS product formation was also observed

with increasing concentrations of IBD (Figure 3.3B, lanes 4-6).

57

These experiments (Figures 3.2, 3.3, and 3.12) demonstrated that direct binding of LEDGF or its IBD could differentially influence IN activities. The co- crystal structure did not show any significant changes in the tertiary structure of the CCD upon IBD binding, but did reveal that the host factor engaged both monomers of the IN dimer at the CCD interface (30). Therefore, one possible mechanism could be that LEDGF binding influences the dynamics of IN subunit- subunit interactions. Indeed, prior analyses using IBD-based peptides hinted at this possibility (3, 71). We therefore devised the following experiment to analyze the dynamics of IN subunit exchange (Figure 3.4A). His-tagged IN (the IN2-IN2 multimer) was mixed with tag-free IN (IN1-IN1). Three IN populations could then form upon subunit exchange: IN1-IN1, IN1-IN2, and IN2-IN2. Of these, IN2-IN2 and IN1-IN2 could be pulled down by NTA resin through affinity binding with the

His-tag, while the tag-free IN1-IN1 would be washed out. Indeed, when incubated separately IN2 was affectively bound by NTA beads, while the interaction was not detected using IN1 (data not shown). However, the IN1 protein was quantitatively recovered following its pre-incubation with IN2 due to the effective exchange of IN protein subunits. Kinetic analyses determined exchange within 10 min of mixing, reflecting the highly dynamic nature of protein-protein interactions between free IN subunits under these conditions (Figure 3.4B, lanes 1-4).

We next asked how the IBD and full length LEDGF would affect IN subunit exchange. For this, tag-free IN1 pre-incubated with LEDGF was then exposed to His-tagged IN2. LEDGF effectively prevented IN subunit exchange

58

(Figure 3.4B, lanes 5-8). Very similar results were obtained with the IBD (data not shown). In contrast, due to the inability to effectively bind IN, mtLEDGF failed to affect the dynamics of IN subunit exchange (Figure 3.4B, lanes 9-12).

These results indicated that LEDGF or its IBD markedly affected IN subunit-subunit interactions but did not distinguish whether the cofactor stabilizes

(“locks”) IN into a specific multimeric state or prevents the multimer from forming by interfering with subunit-subunit interactions. To delineate how the proteins modulated IN subunit exchange, we employed size exclusion chromatography

(Figure 3.5). The lowest detectable concentration of IN (2.5 µM) was used to approach those employed above in activity-based assays. Under these conditions, wild type IN exhibited two main peaks with the predominant and minor species corresponding to tetramer and dimer, respectively (Figures 3.5A).

The “shoulder” observed to the right of the dimer peak suggests that some monomeric protein was also present under these conditions. The addition of the

IBD resulted in a single shifted peak with a retention time consistent with two

LEDGF IBD molecules bound to the IN tetramer (Figures 3.5A). Since the CCD used in the crystallographic studies contained the F185K solubilizing mutation, we also assayed the F185K/C280S double mutant IN (dmIN), which is a soluble version of the full-length protein (82). By contrast to the wild type, dmIN contained greater quantities of the dimer than tetramer (Figure 3.5B). Similar with the unmutated IN, the IBD shifted the equilibrium

59

sharply in favor of the dmIN tetramer-IBD complex compared with the dimer dmIN-IBD complex. These experiments demonstrate that the IBD preferentially binds and stabilizes the IN tetramer.

To obtain more detailed information on the interaction between full length

IN and LEDGF, we turned to our mass spectrometric protein footprinting approach (Figure 3.6). The method enables the comparison of surface topologies of free protein versus a protein-ligand complex using small chemical amino acid selective modifiers. The concentrations of the modifying reagents are optimized to ensure mild reaction conditions such that the integrity of the protein-ligand complex is preserved. Subsequent SDS-PAGE separation, proteolysis, and mass spectrometric analyses are carried out to reveal surface amino acids readily modified in free protein but shielded from modification by the interacting partner

(Figure 3.6). This approach has proved instrumental for studying a number of nucleic acid-protein interactions including the IN-viral DNA complex, and consistently revealed biologically relevant contact amino acids (40, 89, 103, 117,

141, 143, 144, 175). Here, we used the method for the first time to examine protein-protein interactions. For this, in parallel reactions with free IN and pre- formed IN-LEDGF complexes were subjected to treatments with Lys and Arg modifying reagents. Despite tight binding, the solution mixture of IN and LEDGF was likely to contain unliganded proteins in addition to the protein complex. To ensure for selective analysis, we utilized His-tagged LEDGF and tag-free IN proteins. Following chemical modification, unliganded LEDGF and IN-LEDGF complexes recovered using Ni-NTA resin, and in parallel free IN, were separated 60

by SDS-PAGE. IN amino acids protected from modification by LEDGF binding were then deciphered by comparing the modification patterns obtained with the different species of gel-isolated IN (Figure 3.6). Control experiments utilized mtLEDGF that lacks the ability to bind IN.

Representative mass spectrometric fragments are shown in Figure 3.7,

and two distinct sets of modification patterns were readily observed. Three peaks

containing modified R107, K186, and K14 were detected with free IN and

IN+mtLEDGF but were markedly reduced from the IN-LEDGF complex (Figure

3.7A-C). In contrast, two peaks containing modified R228 and K273, as well as

an unmodified residue 265-273 fragment, persisted in all three samples. A

detailed summary of the modification patterns is presented in Table 3.1. Of the

12 Lys and 9 Arg residues readily modified in free IN, K14, R107, R166, K186,

R187, and K188 were selectively protected by LEDGF binding. These results are

fully consistent with previous deletion analyses that revealed the CCD as the

primary viral recognition determinant with the NTD donating secondary affinity-

enhancing contacts. The CTD, by contrast, was dispensable for high affinity

LEDGF-IN binding (115).

The footprinting results were next analyzed in the context of available

protein structures to gain further insight into the details of the LEDGF-IN

interaction. Protections of R107 and R166 are consistent with available data from

unliganded (28, 47, 67, 166) as well as IBD-bound (30) CCD dimers. For

example, R166 is part of the so-called α4/5 connector that forms the primary IBD

recognition determinant from one of the IN monomers (30). The guanidino group 61

of R107 furthermore directly participates in IN dimerization through interactions with E85 of the other CCD subunit (30, 47, 67). Intriguingly, our results indicated a relatively rapid exchange between IN subunits, consistent with R107 accessibility in the free protein. In contrast, LEDGF binding stabilized the interacting IN subunits, rendering R107 inaccessible to chemical modification

(Figures 3.4 and 3.5).

Our footprinting studies revealed new protein-protein contact residues

K14, K186, R187, and K188 in the full-length IN-LEDGF complex. Of these, the latter three were unliganded and freely surface exposed in the IBD-CCD co- crystal structure (30). The co-crystal was comprised of a CCD dimer, while the results of size exclusion chromatography indicated affinity of the IBD for the full- length IN tetramer (Figure 3.5). Interestingly, the two domain NTD/CCD crystal structure revealed interactions between two dimers, with K14 from the NTD of one dimer and K186/R187/K188 from the CCD of another dimer contributing to a tetrameric interface (166). In particular, Figure 3.13 depicts the hydrogen bonding network between the two dimers involving the side chains of K14, K186, and

R187. The primary amine of K188 is not directly involved in dimer-dimer interactions. However, two acidic residues (E198 and D25) located at ~4.5 Å from K188 could effectively restrict the access of NHS-biotin to this site (Figure

3.13). Therefore, shielding of K14, K186, R187, and K188 in the context of the complete IN-LEDGF complex could be due to interactions between two IN dimers. Also consistent with a role for the basic triad (K186, R187, K188) in IN tetramerization is the observation that the mutation of the adjacent F185 to Lys 62

shifted the oligomeric state of IN in favor of a dimer while wtIN under the same conditions was predominantly tetrameric (compare dmIN and wtIN chromatograms in Figure 3.5).

To test our hypothesis that protections of K14, K186, R187, and K188 resulted from LEDGF mediated stabilization of tetrameric IN, we conducted site directed mutagenesis experiments (Figures 3.8-3.10). Single point mutations of the target residues significantly compromised tetramer formation (Figure 3.8). For example, K14A, K186A, and K187A mutants were predominantly dimeric, even at relatively high protein concentration (10 µM) (Figure 3.8 and Table 3.3). The

K188A substitution had relatively modest affect on tetramer formation (Figure

3.8) consistent with the lesser role of this residue in direct dimer-dimer interactions (Figure 3.13).

We next examined the effects of these mutations on the LEDGF-IN interaction. For this, increasing concentrations of wild type tag-free LEDGF were incubated with His-tagged IN proteins and the fractions of LEDGF recovered by

Ni-NTA pull-down were quantitated (Figures 3.9A and 3.9B). These experiments yielded an apparent K d of ~200 nM for the interaction between wild type IN and

LEDGF. The K14A, K186A, and R187A mutants exhibited significantly reduced affinity for LEDGF, while the K188A protein was relatively more effective at binding (Figures 3.9A and 3.9B). In fact, the comparison of size exclusion chromatography (Figure 3.8) and binding (Figure 3.9A and 3.9B) results suggested a strong correlation between IN tetramer formation and high affinity

LEDGF binding. 63

The HIV-1 IN dimer can suffice to process viral DNA 3’ ends whereas the tetramer has been implicated in DNA strand transfer activity (62, 69, 100); results of a separate study using chimera IN however indicated that the tetramer was required for efficient 3’ processing activity (10). Because the relevant protein- protein interfaces within the catalytic complexes are for the most part unknown, we investigated whether the tetrameric interface important for high affinity

LEDGF binding played a significant role in IN catalysis. Impressively, the K14A,

K186A, and R187A substitutions adversely affected 3’ processing and DNA strand transfer activities (Figure 3.10). In contrast, the wild type level of activities was observed with the K188A protein. Taken together, our results suggest that the tetramer interface involving basic residues K14, K186, and R187 is important for HIV-1 IN 3’ processing and DNA strand transfer activities.

While the NTD-CCD tetramer was observed in a crystal lattice at high concentrations of the two-domain protein, it is less clear how LEDGF could promote tetramer formation in solution at significantly lower protein concentrations. To address this, we superimposed the structures of the CCD-IBD complex (two IBD molecules bound to the CCD dimer, see (30)) and the two domain (NTD+CCD) tetramer (166) using molecular modeling (Figure 3.11). The results indicated asymmetric interactions of the IBD molecules with the

NTD+CCD tetramer. For example, two (colored magenta) of the four IBD molecules could effectively bridge between the two IN dimers by coordinating the

CCD of one dimer and establishing additional electrostatic interactions with the

NTD of another dimer (Figure 3.11). These additional contacts could contribute to 64

the high affinity IBD-IN interactions and stabilize the IN tetramer. The other two

IBD molecules (colored grey) interacted with the CCD dimer interfaces but could not establish additional charge-charge contacts with the NTDs due to the spatial separation between the IN domains (Figure 3.11). The lack of such interactions may reduce the binding affinity for these IBD molecules. In other words, we propose that the IN tetramer has two high affinity and two lower affinity binding sites for the LEDGF IBD.

3.4 Discussion

The present studies revealed a highly dynamic nature of interactions between IN subunits, which could be essential for its biological function as well as exploited as a novel therapeutic target (see below). The cellular cofactor

LEDGF strongly modulated the dynamic structure of HIV-1 IN by stabilizing subunit-subunit interactions. Unlike the published co-crystal structure of the isolated domains that indicated binding of two IBD molecules to the CCD dimer, our experiments with the full length proteins demonstrated the importance of the

IN tetramer for high affinity LEDGF binding. These results are consistent with previous findings that endogenous LEDGF protein associated with tetrameric recombinant HIV-1 IN in human cells (32).

Our results also indicate that the stabilized IN tetramer is very effective at catalyzing IN 3’ processing and DNA strand transfer activities (Figures 3.2, 3.3, and 3.10). While free IN exists in solution as a mixture of monomer, dimer,

65

tetramer, and high-order species, the IBD strongly promoted formation of the IN tetramer (Figure 3.5) and stimulated IN catalytic function (Figures 3.2, 3.3, and

3.12).

Our footprinting and molecular modeling studies uncovered novel intra- and inter-protein-protein contacts in the full length IN complex with LEDGF

(Figure 3.7). Site directed mutagenesis experiments confirmed the importance of

K14, K186 and K187 residues for tetramer formation and high affinity LEDGF binding (Figures 3.8-3.10). Furthermore, the K14A, K186A, and R187A mutations also compromised IN 3’ processing and DNA strand transfer activities. These results suggest the importance of the common tetramer interface for IN interactions with DNA substrates and LEDGF. Consistently, our previous footprinting analysis of the IN-DNA complex revealed a role for K14 in DNA binding (175). The protections in the nucleoprotein complex could arise from direct protein-DNA or DNA induced protein-protein interactions (175). The present studies clarify that K14 could be a critical dimer-dimer contact essential for effective binding of IN with both DNA and LEDGF. Of note, our findings corroborate results of a number of virus-based assays that indicated essential roles for K14, K186, and R187 (9, 22, 110, 128, 156, 175) in HIV-1 infection whereas the K188A mutation resulted in reduced but reproducible levels of virus spread (110). Therefore, we propose that the IN tetramer is the biologically relevant form responsible for catalytic activities and high affinity binding to

LEDGF.

66

The fact that LEDGF selectively impaired concerted integration of two

HIV-1 DNA ends (Figure 3.3, see also (127, 134)) has been rather puzzling. How can this observation be reconciled in vivo ? The following two scenarios can be considered. Free IN could first assemble onto the viral DNA ends before encountering LEDGF (134). The high degree of flexibility within individual IN subunits (54, 161, 175) as well as their dynamic interplay (Figure 3.4) could be critical for effective assembly of the synaptic complex, where the two catalytic sites position themselves for concerted integration. The preassembled IN tetramer-viral DNA complex would engage LEDGF, which in turn would tether the nucleoprotein complex to active genes without significantly affecting structural arrangements of IN with its DNA substrates. Accordingly, in vitro experiments

with model DNA substrates and purified proteins indicated the importance of the

order of IN binding to DNA and LEDGF for effective concerted integration (127,

134). An alternative possibility is that the viral DNA-IN-LEDGF complex engages

an as yet unidentified cellular chromatin factor(s). Binding this hypothetical

cellular partner could trigger structural changes within the nucleoprotein complex

to enable concerted integration. It has been reported that the IN tetramer

catalyzes the sequential joining of the two viral DNA ends to target DNA in vitro

(100), and that LEDGF strongly facilitates single-end HIV-1 integration (29, 127,

134). Protein-protein interactions at chromatin could prompt LEDGF to partly or fully disengage the PIC to allow IN to regain its flexibility and complete the integration of the second viral DNA end.

67

Our structure-function studies have significant implications for exploiting the highly dynamic nature of HIV-1 IN as a novel therapeutic target. For example, inhibition of functional IN could be accomplished by the following mechanisms: 1) interfering with subunit-subunit assembly and 2) restricting IN flexibility by

“locking” the protein into a functionally compromised multimeric state. Of note, such compounds would be complementary to the clinically approved IN active site inhibitor Raltegravir, as the resistant mutations developed to this drug are significantly distanced from the protein-protein interfaces (116). In general, protein-protein interactions are thought to be rather challenging drug intervention targets for the following reasons. The shape of such interfaces is typically flat and comprises large surface areas (~750-1,500 Ǻ) (reviewed in (6)). However, the subset of interfaces that contribute to high-affinity binding could be significantly smaller and affected by small molecule inhibitors. Indeed, significant progress has been made in discovering a number of protein-protein inhibitors in recent years (reviewed in (6)).

Our results indicate that IN monomers could also be a plausible target for interfacial inhibitors. For example, the fact that IN subunits exchange rapidly

(Figure 3.4) provides biological targets for effective binding of small molecules.

Consistently, our mass spectrometric footprinting revealed that R107, which is located at the dimer interface, is readily modified by HPG (Figure 3.7). While

HPG cannot be viewed as a lead compound, detailed analysis of the CCD structure revealed the following intriguing role for this residue. R107 together with

G106 form a bulge that effectively docks into a cavity within the interacting 68

subunit (Figure 3.10) to stabilize the CCD dimer. The structural dimensions depicted in Figure 3.10 suggest that this site could be selectively targeted by small molecules. Of note, discovery of such structural pockets provide a strong rationale for exploiting IN-IN (Figure 3.10) and IN-LEDGF (30) interactions as therapeutic targets.

An alternative intriguing mechanism could be to restrict the catalytically important flexibility of IN by stabilizing a multimeric state. For example, we showed that the LEDGF IBD forms a stable complex with the IN tetramer, which is active for 3’ processing and single-end DNA strand transfer but not concerted integration in vitro (Figure 3.3). Previous reports showed that overexpression of

LEDGF IBD proteins in target cells effectively impaired HIV-1 replication (42,

105). Furthermore, the IBD was significantly more effective at suppressing HIV-1 replication in LEDGF-deficient cells (555-fold) as compared to cells containing normal LEDGF levels (~30 fold) (105). These observations cannot be fully explained by a competition between the IBD and endogenous LEDGF for IN binding. Instead, our findings provide mechanistic clues that direct binding of the

IBD to the IN tetramer restricts protein flexibility required to form the fully functional DNA strand transfer complex.

While the studies with the LEDGF IBD provide proof-of-concept for a new mechanism of IN inhibition, the main interest is to discover small molecule inhibitors. Such compounds do not have to compete with IN subunit-subunit or

IN-LEDGF interactions and overcome large energy barriers created by protein- protein interfaces. Instead, they could specifically exploit the structural “pockets” 69

present in tetrameric or dimeric proteins to stabilize the interacting subunits and compromise the catalytically essential dynamic structure. Recent reports indicate that IBD-derived short peptides impaired IN catalytic activities in vitro (3, 71) and inhibited HIV-1 infection (71). In fact, earlier observations that certain small molecule inhibitors selectively bind at the IN dimer interface gain particular interest in the context of recent elucidation of structural and mechanistic details of LEDGF-IN interactions. For example, 3,4-dihydroxyphenyltriphenylarsonium bromide has been reported to bind the IN dimer at a site that overlaps the IBD binding pocket (126). Our previous studies revealed highly selective binding of another small molecule, methyl N,O-bis(3,4-diacetoxycinnamoyl)serinate, to the

IN dimer at the site that is immediately adjacent to the IBD binding pocket (144).

A coumarin-based inhibitor was likewise determined to bind IN nearby the IBD interaction site (4). Further research in this direction may well lead to the discovery of novel types of clinically useful HIV-1 IN inhibitors.

70

3.5 Tables for Chapter 3

Modified amino Free IN IN+LEDGF IN+mtLEDGF acids K7 + + + K14 + - + R107 + - + K111 + + + K160 + + + R166 + - + K186 + - + R187 + - + K188 + - + K215 + + + K219 + + + R224 + + + R228 + + + R231 + + + K236 + + + R262 + + + R263 + + + K264 + + + K266 + + + R269 + + + K273 + + + +, surface residues susceptible to modification. -, residues protected from modification due to protein-protein contacts.

Table 3.1 Summary of mass spectrometric footprinting results

71

Elution Estimated Suggested Protein volume, ml MW, Da oligomeric state Free IN (peak Tet) 13.573 102,687 Tetramer Free IN (peak Dim) 15.100 50,060 Dimer Free LEDGF IBD 17.48 15,116 Monomer IN-IBD complex 13.076 129,740 IN tetramer+two IBD

Table 3.2 Summary of size exclusion chromatography results for wild type IN interactions with the LEDGF IBD: Theoretical molecular weights of individual IN and IBD subunits are 32,199 Da and 14,459 Da, respectively.

Elution Estimated Suggested Protein volume, ml MW, Da oligomeric state Wild type IN (peak Tet) 13.320 104,113 Tetramer K14A (peak Tet) 13.123 107,420 Tetramer K14A (peak Dim) 14.855 51,841 Dimer K186A (peak Tet) 13.129 107,174 Tetramer K186A (peak Dim) 14.848 51,988 Dimer R187A (peak Tet) 13.109 108,085 Tetramer R187A (peak Dim) 14.897 50,932 Dimer K188A (peak Tet) 12.857 120,167 Tetramer K188A (peak Dim) 14.872 51,453 Dimer

Table 3.3 Summary of size exclusion chromatography results for wild type and mutant IN proteins

72

3.6 Figures for Chapter 3

Figure 3.1 The LEDGF proteins used in our studies: The upper wild type protein exhibits two distinct chromatin and IN binding activities (green and blue, respectively). The middle D366N mutant (mt) protein is severally impaired for IN binding (33, 135). In contrast, the IBD interacts with IN but lacks the ability to bind DNA and chromatin (107, 158).

73

Figure 3.2 Effects of LEDGF and the isolated IBD on IN 3’ processing and DNA strand transfer activities: (A) Schematic of oligonucleotide-based integration assay. (B) Experimental results. The positions of 21-mer substrate (21-mer S) and the products of 3'-processing (19-mer P) and strand transfer (STP) reactions are indicated. Lanes 1, 7 and 13: 21-mer DNA substrate; Lanes 2, 8 and 14: 50 nM DNA + 0.5 µM IN. The following concentrations of the indicated LEDGF proteins were included in the reactions: lane 3, 9, and 15: 0.25 µM, lane 4, 10, and 16: 0.5 µM, lane 5, 11, and 17: 1 µM, and lane 6, 12, and 18: 2 µM.

74

A B

Figure 3.3 Effects of LEDGF and the isolated IBD on IN concerted integration activity: (A) Schematic diagram of concerted integration assay. Integration of one donor (B) DNA into the target plasmid yields half-site (HS) or single-end integration products. Integration of a pair of D ends into both strands of target DNA results in full-site (FS) or concerted integration. The D can otherwise integrate into separate D molecules to form D-D products. (D) Agarose gel image of reaction products. Lane 1: 32 P labeled lambda/HindIII DNA mass markers; lane 2: control reaction without IN; lane 3, reaction with 400 nM IN; lane 4, 5 and 6, contained 0.25 µM, 0.5 µM and 1.0 µM IBD, respectively; lanes 7, 8 and 9 contained 0.25 µM, 0.5 µM and 1.0 µM LEDGF, respectively.

75

Figure 3.4 Subunit exchange assay using His-tag IN and SDS-PAGE: Design (A) and results (B). (A) Subunit exchange between IN multimers was tested by mixing two different IN proteins: IN1, a tag-free form and IN2, containing the 6xHis tag at its C-terminus. The full length proteins are depicted as dimers (IN1- IN1) and (IN2-IN2). Both proteins contained wild type IN sequences and displayed identical catalytic activities. After mixing, unbound proteins were washed from the resin and bound complexes were analyzed by SDS-PAGE. IN1 and IN2 were clearly separated based on their molecular weight differences. (B) The data in lanes 1-4 indicate that IN2 was able to quantitatively pull-down tag- free IN1. The recovered multimers contained a mixture of IN2-IN2 and IN1-IN2, while tag-free IN1-IN1 was washed out from the Ni-NTA resin. To assess the impact of LEDGF on IN subunit exchange, the IN2 multimer was pre-incubated with LEDGF and then exposed to IN1 (lanes 5-8). The results show that LEDGF interacted with IN2 and effectively prevented IN subunit exchange (lanes 5-8). In contrast, mtLEDGF did not bind IN2 or affect subunit exchange (lanes 9-12). Total amounts of input IN1, IN2, LEDGF, and mtLEDGF proteins are shown in lanes 13, 14, 15 and 16, respectively.

76

Figure 3.4 (Figure legend on previous page)

77

Figure 3.5 Size exclusion chromatography of free IN and IN-IBD complexes: 2.5 µM IN and 5 µM IBD were used in these experiments. (A) Chromatograms of wild type IN protein in its free form (top) and complexed with IBD (bottom). (B) Elution profiles of the soluble double mutant (F185K/C280S) IN protein in its free form (top) and complexed with IBD (bottom). Peaks corresponding to tetramer (Tet) IN, dimer (Dim) IN, as well as IBD-bound IN tetramer (Tet+IBD) and dimer (Dim+IBD), are indicated.

78

Figure 3.6 Schematic presentation of the protein footprinting strategy: The structures of the CCD and the IBD-CCD complex are used for illustration while the experiments were performed with full length IN and LEDGF. In parallel experiments free IN and the IN-LEDGF complex were subjected to treatment by small chemical modifiers (M). Surface residues in free IN and the complex were modified, but the interacting amino acids in the complex were shielded from modification. His-tag LEDGF and tag-free IN proteins were used. Following the modification reactions the complex was pulled down by NTA beads, which enabled recovery of only the LEDGF bound form of IN from the reaction mixture. The interacting proteins were then separated by SDS-PAGE. The IN band was excised and subjected to in-gel proteolysis. Subsequent comparative mass spectrometry (MS) analyses revealed modification patterns in free protein and the complex.

79

Figure 3.7 Representative segments of the MALDI-ToF mass spectra from the protein footprinting: (A) Free IN was treated with HPG or NHS-biotin. (B) IN was preincubated with mtLEDGF and then exposed to treatments with HPG or NHS-biotin. (C) IN-LEDGF complexes were pre-formed and then exposed to HPG or NHS-biotin treatments. Start and end amino acid numbers of the detected peptide peaks are indicated. IN residues affected by modification are depicted in brackets.

80

Figure 3.8 Size exclusion chromatography of wild type and indicated mutants: All protein concentrations were 10 µM. The elution time points corresponding to tetrameric and dimeric IN are indicated by arrows.

81

A

B

Figure 3.9 LEDGF binding to wild type and mutant IN proteins: (A) Western blot results for LEDGF binding to wild type and mutant IN proteins. Increasing concentrations of LEDGF were incubated with 100 nM of the indicated IN. The concentrations of LEDGF were as follows: lane 1: 50 nM, lane 2: 100 nM, lane 3: 150 nM, lane 4: 200 nM, lane 5: 250 nM, lane 6: 300 nM, lane 7: 350 nM, lane 8: 400 nM ,lane 9: 500 nM, lane 10: 650 nM. (B) Quantitative analysis of the western blot results.

82

Figure 3.10 Biochemical characterization of IN mutant proteins: 3’ Processing and DNA strand transfer activities of the indicated IN proteins; other labeling is the same as in Figure 3.2B.

83

Figure 3.11 A model for the IBD bound NTD-CCD tetramer: Two available crystal structures of the NTD-CCD tetramer and the IBD-CCD complex were superimposed. Individual subunits of IN are colored cyan, green, yellow and orange. The acidic residues, which coordinate catalytic metals, in the green and yellow subunits are in spheres and colored red. For clarity only K14 in the green subunit and the basic triad (K186, R187, K188) in the yellow subunit are depicted as spheres. These residues contribute to interactions between the two cyan- green and yellow-orange dimers. The two IBD molecules colored magenta establish symmetrical high affinity interactions, bridging the two dimers together. The lower part of the picture shows that the IBD bound to the CCDs of the yellow-orange dimer is positioned to interact with the α-helix within the green NTD that contains K14. The side chains of D6, E10, and E13 in the green NTD are depicted as red sticks. These amino acids potentially establish charge- charge interactions with the side chains of K401, K402, R404, and R405 (blue sticks) of the IBD helix. The additional two lower affinity binding IBD molecules (grey), which coordinate the CCDs of either cyan-green or yellow-orange dimers, are significantly distanced from the tetramer interfaces.

84

Figure 3.12 IN 3’ Processing activities in the presence of the diketo acid inhibitor (118-D-24) 4-[3-(azidomethyl)phenyl]-2-hydroxy-4-oxo-2-butenoic acid: (154, 174). All reactions contained 5 µM inhibitor, which selectively impaired IN strand transfer activity and allowed accumulation of the 19-mer 3’ processing reaction product (data not shown). Lane 1: free DNA; lane 2: 50 nM DNA + 0.5 µM IN. The following concentrations of IBD were added to the reactions in lanes 3-6: lane 3: 0.25 µM; lane 4: 0.5 µM; lane 5: 1 µM; and lane 6: 2 µM. The positions of 21-mer substrate (21-mer S) and the product of 3’ processing (19-mer P) are indicated.

Figure 3.13 (next page) The NTD-CCD dimer-dimer interface: (167). The side chains of K14 (lower panel), K186 (upper panel), and R187 (upper panel) are engaged in hydrogen bonding networks (blue dashed lines) to stabilize IN dimer- dimer interactions. The corresponding contact amino acids and distances are indicated. The primary amine of K188 remains unliganded (upper panel) but is located in close vicinity (~4.5 Å) to E198 of the same subunit and D25 of another dimer. The colors and relative orientations of individual subunits are similar to those used in Figure 3.11.

85

Figure 3.13 (Figure legend on previous page)

86

Figure 3.10 A potential interaction hotspot at the IN subunit-subunit interface: A and B depict two separate monomers colored green and yellow. R107 and G106 form a bulge structure (A) that docks into the cavity comprised of Y83, I84, E85, A86, V180, and N184 ( colored blue) of another subunit (B) to stabilize the IN dimer. The height (H), depth (D), and width (W) for this potential small molecule binding pocket are indicated.

87

CHAPTER 4

SPECIFIC RECOGNITION OF HISTONE POST TRANSLATIONAL MODIFICATIONS BY THE LEDGF PWWP DOMAIN

4.1 Introduction

The human protein LEDGF has been identified as a key cellular binding partner of HIV-1 integrase and is essential for effective integration of HIV-1 viral

DNA into the host cell genome. The N-terminal end of LEDGF contains a PWWP domain, dual copies of the DNA binding AT hook motif, and a nuclear localization signal (158). This N-terminal domain ensemble has been shown to be responsible for LEDGF interactions with chromatin (145). The C-terminal portion of the protein contains the integrase binding domain (IBD) that binds strongly to

HIV-1 IN and numerous cellular proteins (158, 171). The protein uses the polarity of these two functions to act as a bifunctional tether maintaining cargo proteins in close proximity to chromatin. LEDGF is proposed to tether the HIV pre-integration complex to chromatin, thereby steering integration into the active transcription units targeted by LEDGF/p75. While the AT-hook motifs of LEDGF are sufficient to bind strongly to naked DNA, the contribution of the PWWP domain is only evident in the context of chromatinized DNA templates (11, 146).

88

Although the binding partner of the PWWP domain is unknown, it has been established as critical chromatin interaction motif (107, 146). The PWWP module is present in ~60 eukaryotic proteins and is related in sequence and 3-D structure to a number of other chromatin binding domains collectively known as the Tudor clan domains (119). The PWWP domain of the closely related hepatoma-derived growth factor (HDGF) has been crystallized and the 3-D structure is presumably similar to LEDGF PWWP (113). The defining feature of the HDGF PWWP is a five stranded anti-parallel β barrel structure that features a

surface exposed hydrophobic pocket (113). This structure was used by Shun et

al to design point mutations in LEDGF PWWP domain that abolished interactions

with chromatin (145). These LEDGF PWWP mutants were assumed to deform a

corresponding hydrophobic pocket or a nearby positively charged surface. While

single or double amino acid substitutions are proven to disrupt interactions with

chromatin, it is unclear what specific interaction is being disrupted.

We have hypothesized that the PWWP domain recognizes specific post-

translational modifications of histone proteins in chromatin. In chromatin the

genomic DNA is organized around an octamer of histones to form a series of

nucleosomes (112). Generally, the nucleosome is made up a heterodimer of

histones 2A and 2B combined with a histone H3/H4 tetramer. Histone proteins

can be modified post translationally in order to rearrange chromatin structure or

to serve as landmarks for recruitment of proteins to particular sites (87). Histone

amino-terminal tails that protrude from the nucleosome seem to be the most

89

heavily modified regions of the histone, likely due to their accessibility to modifying enzymes (112). However, numerous modifications have been identified in the structured central regions of the histones and at the protein-DNA interface (173). To test this hypothesis we used a histone peptide array to screen for specific interactions of the LEDGF PWWP domain with modified histone peptides. The library consists of 188 short (21-mer) peptides from histones H3, H2A, H2B, and H4 that are either unmodified or contain common post-translational modifications found in chromatin. This preliminary screen produced several possible interactions which were subjected to further examination to determine the specificity of the interactions.

4.2 Materials and Methods

4.2.1 Expression Plasmids and Recombinant Protein Expression Plasmids and Recombinant Proteins: To express LEDGF PWWP domain the pFT-1 plasmid was altered by site directed mutagenesis to insert sequential stop codons in the coding sequence at the sites corresponding to LEDGF residues

111 and 112. All mutations were introduced using QuikChange XL site-directed mutagenesis kit (Stratagene, La Jolla, CA) according to kit protocol. The resulting plasmid was transformed into BL21(DE3) cells (Stratagene, La Jolla, CA). Cells were grown at 37°C to an OD600 of 0.6 before induction with 500 µM IPTG.

Cells were harvested 3 hours post-induction and stored at -80°C. Thawed cell paste was suspended in 50ml HEPES, pH 7.5, 150mM NaCl, 20mM imidazole supplemented with EDTA-free Complete protease inhibitor (Roche) and lysed by 90

sonication. The resulting lysate was clarified by centrifugation and incubated with Ni-NTA agarose resin (GE Healthcare) at 4°C for 1 hour with gentle agitation. The resin was loaded onto a gravity column and washed thoroughly with 50ml HEPES, pH 7.5, 150mM NaCl, 20mM imidazole prior to elution with

HEPES (pH 7.5), 150mM NaCl, 500mM imidazole. For tag-free preparations of

LEDGF PWWP proteins the polyhistidine tag was removed at this stage by overnight incubation with PreScission protease (GE Healthcare) at 4°C. Peak fractions from the Ni-NTA column were pooled, diluted 1:2 in 50mM HEPES (pH

7.5), loaded on a 5ml heparin HiTrap column (GE Healthcare), with a linear gradient of NaCl from 150-1M. Finally, the proteins were purified by size exclusion chromatography using a 120ml and HiLoad 16/60 Superdex 200 gel filtration column (GE Healthcare) and 50mM HEPES (pH 7.5), 150mM NaCl as the elution buffer. The HP1 chromodomain protein was expressed and purified exactly as above, using an expression vector generously provided by Sepideh

Khorasanizadeh.

4.2.2 Histone Peptide Microarrays: The biotinylated histone peptides comprising the library are affixed to streptavidin-coated microtiter plates via an N- terminal biotin (Alta Biosciences, UK). These plates are thoroughly blocked with

PBS (pH 7.4), 135mM NaCl, 10ug/ml BSA, 0.1% Tween-20 (blocking buffer) prior to a sixteen hour incubation at 4 °C with 100 µL of 5 µM recombinant

polyhistidine-tagged LEDGF PWWP (LEDGF residues 1-110) suspended in

blocking buffer. After extensive washing with PBS (pH 7.4), 135mM NaCl, 0.1%

91

Tween-20 (wash buffer) the plates are incubated with 1:3000 dilution mouse α-

6xHis 1 ° antibody (Sigma) in blocking buffer for 1 hour at 20C. After extensive washing, a 1:1000 dilution of horseradish peroxidase (HRP)-conjugated α-mouse

IgG 2° antibody (GE Healthcare) was added to the wells. Wells containing a

peptide:PWWP complex are identified spectrophotometrically by the blue color

formed during the oxidation of tetramethylbenzidine (TMB) by hydrogen peroxide

in the presence of the HRP.

4.2.3 Pull-down Assays: Pull-down of tag-free histone peptides (Gene Med

Biosynthesis) with 6xHis-tagged proteins was accomplished by assembly of

protein-peptide complexes in 50mM Tris, pH 7.4, 150 mM NaCl, during one hour

incubation at room 4°C. The Ni-NTA agarose resin (GE Healthcare) used was

washed with 50 mM Tris, pH 7.4, 250 mM NaCl, 2 mM MgCl2. Protein-peptide

complexes were spun for 2 minutes at 2000xg to remove precipitate and the

supernatant was incubated with the washed resin for 30 minutes at 4°C with

gentle agitation. Following the incubation the resin was washed with 50 mM Tris,

pH 7.4, 150 mM NaCl, 20 mM imidazole followed by H2O. Protein-peptide

complexes were eluted from the resin by the addition of 10 µL of 750 mM imidazole solution followed by vigorous mixing. The eluted proteins and peptides complexes were desalted using a ZipTip C18 tip (Millipore) and visualized by

MALDI-ToF mass spectrometry using α-cyano-4-hydroxycinnamic acid as a matrix.

92

4.3 Results

In order to rapidly screen a large number of histone post-translational modifications we employed a library of synthetic peptides affixed to the wells of microtiter plates. In 96-well microtiter plate format, these peptides were exposed to recombinant polyhistidine tagged LEDGF PWWP domain and unbound protein was washed away. The peptide-bound protein was visualized colorimetrically with an enzyme-linked antibody specific to the protein’s polyhistidine tag (Figure

4.1). These preliminary screens revealed a subset of nine potentially interacting peptides (Table 4.1) that were subjected to further examination to eliminate artifactual results. To do this, the peptides were pulled down in vitro with affinity- tagged LEDGF PWWP. The histone peptide H3meK9 was used as a control for these experiments as it is known to interact with the HP1 chromodomain and not with LEDGF PWWP ((80, 81) and unpublished data). This established that our method could identify a specific interaction and discriminate non-specific interactions. In this assay the assembled peptide-protein complexes were immobilized on Ni-NTA agarose resin so that unbound proteins and peptides could be washed away. The complexes were eluted with imidazole solution, desalted, and analyzed by MALDI-ToF mass spectrometry to visualize the bound peptides. This assay was able to reproducibly show that in the control reactions

H3meK9 was bound to selectively to HP1-CR and not LEDGF PWWP.

Furthermore these assays showed a highly specific interaction of the histone

H2B peptide KKRKRSRKESYSVYVY(acetyl-K)VLKQ with LEDGF PWWP (Figure

4.2). 93

4.4 Discussion and future directions

The peptides identified were found in the central regions of the histone proteins and not at the heavily modified tail regions. While these experiments suggest that LEDGF PWWP indeed binds these modified peptides, the Kd of the

interaction remains to be determined. Furthermore, incremental truncation of the

peptide is needed to determine the minimal sequence required to maintain the

interaction.

The nature of the protein-peptide interface is unknown at this time. We

are interested to find out if the point mutations in LEDGF-PWWP that abolished

chromatin association (146) would have a similar effect on the binding of this

peptide. Of note, the chromatinized templates used by Shun et al for the in vitro chromatin binding assays were assembled with a preparation of native HeLa cell histones in which the full complement of post-translational modifications is conceivably represented. In addition to the mutant analysis, the surface topology of the protein-peptide interface can be probed by the mass spectrometric footprinting analysis developed in our lab (122). This method has been used successfully in the past to provide detailed information about protein-protein and protein-nucleic acid interfaces (123, 175). The most convincing structural information would come from a protein-peptide co-crystal structure. While the full-length LEDGF protein has not so far been amenable to crystallization efforts, crystallization of the isolated PWWP domain may be more feasible. Indeed, the highly homologous HDGF PWWP domain has been successfully crystallized

(113). 94

It is of paramount importance to test the significance of the interaction to

LEDGF function in vivo. As LEDGF operates as a chromatin tethering protein, chromatin localization using fluorescent-tagged ectopically-expressed protein is likely the best measure of LEDGF function. Cells expressing the fluorescent protein can be permeabilized and treated with test peptide to determine if competition between peptide and chromatin reduces chromatin localization. This strategy has been used successfully in the past to validate the HP1- chromodomain interaction with acetyl-lysine 9 of histone H3 (81). Due to the critical role of LEDGF-chromatin interactions for HIV integration, peptide-treated cells could be treated for sensitivity to HIV infection as another test of biological relevance.

95

Tables for chapter 4

Well Sequence Histone C5 VKRISGLIYEETRGVLKVFLE H4 C6 VKRI(S-phosph)GLIYEETRGVLKVFLE H4 C7 VKRISGLIYEETRGVL(K-Ac)VFLE H4 E11 LAIRNDEELN(K-Ac)LLGKVTIAQG H2A F1 LAIRNDEELN(K-Ac)LLG(K-Ac)VTIAQG H2A H6 KKRKRSRKE(S-phosph)YSVYVY(K-Ac)VLKQ H2B H7 KKRKRSRKESYSVYVY(K-Ac)VLKQ H2B G11 VREIAQDF(K-Me2)TDLRFQSSAVMA H3

Table 4.1 Histone peptides selected for further study

96

4.5 Figures for chapter 4

Figure 4.1 (next page) Representative peptide screening results: (A) H2/H4 library (B) H3 library. A library of biotinylated modified and unmodified histone peptides were immobilized on a streptavidin plates and assayed for interaction with recombinant, polyhistidine-tagged LEDGF PWWP domain. Peptides binding to LEDGF PWWP in this assay are identified by formation of a blue color (see materials and methods). Peptides selected for further study are listed in table 4.1.

97

A 12 11 10 9 8 7 6 5 4 3 2 1 A A B B C C D D E E F F G G H H

B 12 11 10 9 8 7 6 5 4 3 2 1

Figure 4.1 (Figure legend on previous page)

98

A.

B.

C.

D.

Figure 4.2 MS data from in vitro pull-down assays: Test peptides were incubated with polyhistidine-tagged LEDGF PWWP domain. Ni-NTA agarose resin was used to immobilize tagged protein and unbound peptide was washed away. Protein-peptide complexes were eluted from the resin with imidazole solution, desalted, and subjected to analysis by MALDI-ToF MS. (A) The histone peptide KKRKRSRKESYSVYVY(K-Ac)VLKQ. (B) Background binding of the peptide to Ni-NTA resin. (C) Binding of the peptide to LEDGF PWWP. (D) Binding of peptide to the HP1 chromodomain.

99

BIBLIOGRAPHY

1. 2009. AIDS Epidemic Update. World Health Organization and Joint United Programme on HIV/AIDS. 2. Agapkina, J., M. Smolov, S. Barbe, E. Zubin, T. Zatsepin, E. Deprez, M. Le Bret, J. F. Mouscadet, and M. Gottikh. 2006. Probing of HIV-1 integrase/DNA interactions using novel analogs of viral DNA. J Biol Chem 281: 11530-40. 3. Al-Mawsawi, L. Q., F. Christ, R. Dayam, Z. Debyser, and N. Neamati. 2008. Inhibitory profile of a LEDGF/p75 peptide against HIV-1 integrase: insight into integrase-DNA complex formation and catalysis. FEBS Lett 582: 1425-30. 4. Al-Mawsawi, L. Q., V. Fikkert, R. Dayam, M. Witvrouw, T. R. Burke, Jr., C. H. Borchers, and N. Neamati. 2006. Discovery of a small-molecule HIV-1 integrase inhibitor-binding site. Proc Natl Acad Sci U S A 103: 10080-5. 5. Andrake, M. D., and A. M. Skalka. 1995. Multimerization determinants reside in both the catalytic core and C terminus of avian sarcoma virus integrase. J Biol Chem 270: 29299-306. 6. Arkin, M. R., and J. A. Wells. 2004. Small-molecule inhibitors of protein- protein interactions: progressing towards the dream. Nat Rev Drug Discov 3: 301-17. 7. Asante-Appiah, E., and A. M. Skalka. 1997. A metal-induced conformational change and activation of HIV-1 integrase. J Biol Chem 272: 16196-205. 8. Beese, L. S., and T. A. Steitz. 1991. Structural basis for the 3'-5' exonuclease activity of Escherichia coli DNA polymerase I: a two metal ion mechanism. Embo J 10: 25-33. 9. Berthoux, L., S. Sebastian, M. A. Muesing, and J. Luban. 2007. The role of lysine 186 in HIV-1 integrase multimerization. Virology 364: 227-36. 10. Bosserman, M. A., D. F. O'Quinn, and I. Wong. 2007. Loop202-208 in avian sarcoma virus integrase mediates tetramer assembly and processing activity. Biochemistry 46: 11231-9. 11. Botbol, Y., N. K. Raghavendra, S. Rahman, A. Engelman, and M. Lavigne. 2008. Chromatinized templates reveal the requirement for the LEDGF/p75 PWWP domain during HIV-1 integration in vitro. Nucleic Acids Res 36: 1237-46. 12. Bowerman, B., P. O. Brown, J. M. Bishop, and H. E. Varmus. 1989. A nucleoprotein complex mediates the integration of retroviral DNA. Genes Dev 3: 469-78. 100

13. Brown, P. O. 1997. Integration, p. 161-204. In J. M. Coffin, S. H. Hughes, and H. E. Varmus (ed.), . Cold Spring Harbor Laboratory, Plainview, NY. 14. Brown, P. O., B. Bowerman, H. E. Varmus, and J. M. Bishop. 1987. Correct integration of retroviral DNA in vitro. Cell 49: 347-56. 15. Buckman, J. S., W. J. Bosche, and R. J. Gorelick. 2003. Human immunodeficiency virus type 1 nucleocapsid zn(2+) fingers are required for efficient reverse transcription, initial integration processes, and protection of newly synthesized viral DNA. J Virol 77: 1469-80. 16. Bukrinsky, M. I., N. Sharova, T. L. McDonald, T. Pushkarskaya, W. G. Tarpley, and M. Stevenson. 1993. Association of integrase, matrix, and reverse transcriptase antigens of human immunodeficiency virus type 1 with viral nucleic acids following acute infection. Proc Natl Acad Sci U S A 90: 6125-9. 17. Bushman, F. D., and R. Craigie. 1991. Activities of human immunodeficiency virus (HIV) integration protein in vitro: specific cleavage and integration of HIV DNA. Proc Natl Acad Sci U S A 88: 1339-43. 18. Bushman, F. D., A. Engelman, I. Palmer, P. Wingfield, and R. Craigie. 1993. Domains of the integrase protein of human immunodeficiency virus type 1 responsible for polynucleotidyl transfer and zinc binding. Proc Natl Acad Sci U S A 90: 3428-32. 19. Busschots, K., A. Voet, M. De Maeyer, J. C. Rain, S. Emiliani, R. Benarous, L. Desender, Z. Debyser, and F. Christ. 2007. Identification of the LEDGF/p75 binding site in HIV-1 integrase. J Mol Biol 365: 1480-92. 20. Cai, M., R. Zheng, M. Caffrey, R. Craigie, G. M. Clore, and A. M. Gronenborn. 1997. Solution structure of the N-terminal zinc binding domain of HIV-1 integrase. Nat Struct Biol 4: 567-77. 21. Cai, M., R. Zheng, M. Caffrey, R. Craigie, G. M. Clore, and A. M. Gronenborn. 1997. Solution structure of the N-terminal zinc binding domain of HIV-1 integrase. Nat Struct Biol 4: 567-77. 22. Cannon, P. M., W. Wilson, E. Byles, S. M. Kingsman, and A. J. Kingsman. 1994. Human immunodeficiency virus type 1 integrase: effect on viral replication of mutations at highly conserved residues. J Virol 68: 4768-75. 23. Carteau, S., S. C. Batson, L. Poljak, J. F. Mouscadet, H. de Rocquigny, J. L. Darlix, B. P. Roques, E. Kas, and C. Auclair. 1997. Human immunodeficiency virus type 1 nucleocapsid protein specifically stimulates Mg2+-dependent DNA integration in vitro. J Virol 71: 6225-9. 24. Carteau, S., R. J. Gorelick, and F. D. Bushman. 1999. Coupled integration of human immunodeficiency virus type 1 cDNA ends by 101

purified integrase in vitro: stimulation by the viral nucleocapsid protein. J Virol 73: 6670-9. 25. Chen, A., I. T. Weber, R. W. Harrison, and J. Leis. 2006. Identification of amino acids in HIV-1 and avian sarcoma virus integrase subsites required for specific recognition of the Ends. J Biol Chem 281: 4173-82. 26. Chen, H., and A. Engelman. 1998. The barrier-to-autointegration protein is a host factor for HIV type 1 integration. Proc Natl Acad Sci U S A 95: 15270-4. 27. Chen, H., S. Q. Wei, and A. Engelman. 1999. Multiple integrase functions are required to form the native structure of the human immunodeficiency virus type I intasome. J Biol Chem 274: 17358-64. 28. Chen, J. C., J. Krucinski, L. J. Miercke, J. S. Finer-Moore, A. H. Tang, A. D. Leavitt, and R. M. Stroud. 2000. Crystal structure of the HIV-1 integrase catalytic core and C-terminal domains: a model for viral DNA binding. Proc Natl Acad Sci U S A 97: 8233-8. 29. Cherepanov, P. 2007. LEDGF/p75 interacts with divergent lentiviral and modulates their enzymatic activity in vitro. Nucleic Acids Res 35: 113-24. 30. Cherepanov, P., A. L. Ambrosio, S. Rahman, T. Ellenberger, and A. Engelman. 2005. Structural basis for the recognition between HIV-1 integrase and transcriptional coactivator p75. Proc Natl Acad Sci U S A 102: 17308-13. 31. Cherepanov, P., E. Devroe, P. A. Silver, and A. Engelman. 2004. Identification of an evolutionarily conserved domain in human lens epithelium-derived growth factor/transcriptional co-activator p75 (LEDGF/p75) that binds HIV-1 integrase. J Biol Chem 279: 48883-92. 32. Cherepanov, P., G. Maertens, P. Proost, B. Devreese, J. Van Beeumen, Y. Engelborghs, E. De Clercq, and Z. Debyser. 2003. HIV-1 integrase forms stable tetramers and associates with LEDGF/p75 protein in human cells. J Biol Chem 278:372-81. 33. Cherepanov, P., Z. Y. Sun, S. Rahman, G. Maertens, G. Wagner, and A. Engelman. 2005. Solution structure of the HIV-1 integrase-binding domain in LEDGF/p75. Nat Struct Mol Biol 12: 526-32. 34. Chow, S. A., K. A. Vincent, V. Ellison, and P. O. Brown. 1992. Reversal of integration and DNA splicing mediated by integrase of human immunodeficiency virus. Science 255: 723-6. 35. Ciuffi, A., and F. D. Bushman. 2006. Retroviral DNA integration: HIV and the role of LEDGF/p75. Trends Genet 22: 388-95.

102

36. Ciuffi, A., M. Llano, E. Poeschla, C. Hoffmann, J. Leipzig, P. Shinn, J. R. Ecker, and F. Bushman. 2005. A role for LEDGF/p75 in targeting HIV DNA integration. Nat Med 11: 1287-9. 37. Coffin, J. M., S. H. Hughes, and H. Varmus. 1997. Retroviruses. Cold Spring Harbor Laboratory Press, Plainview, N.Y. 38. Craigie, R. 2001. HIV integrase, a brief overview from chemistry to therapeutics. J Biol Chem 276: 23213-6. 39. Croxtall, J. D., and S. J. Keam. 2009. Raltegravir: a review of its use in the management of HIV infection in treatment-experienced patients. Drugs 69: 1059-75. 40. Datta, S. A., Z. Zhao, P. K. Clark, S. Tarasov, J. N. Alexandratos, S. J. Campbell, M. Kvaratskhelia, J. Lebowitz, and A. Rein. 2007. Interactions between HIV-1 Gag molecules in solution: an inositol phosphate-mediated switch. J Mol Biol 365: 799-811. 41. De Luca, L., G. Vistoli, A. Pedretti, M. L. Barreca, and A. Chimirri. 2005. Molecular dynamics studies of the full-length integrase-DNA complex. Biochem Biophys Res Commun 336: 1010-6. 42. De Rijck, J., L. Vandekerckhove, R. Gijsbers, A. Hombrouck, J. Hendrix, J. Vercammen, Y. Engelborghs, F. Christ, and Z. Debyser. 2006. Overexpression of the lens epithelium-derived growth factor/p75 integrase binding domain inhibits human immunodeficiency virus replication. J Virol 80: 11498-509. 43. Delelis, O., K. Carayon, A. Saib, E. Deprez, and J. F. Mouscadet. 2008. Integrase and integration: biochemical activities of HIV-1 integrase. Retrovirology 5: 114. 44. Delelis, O., V. Parissi, H. Leh, G. Mbemba, C. Petit, P. Sonigo, E. Deprez, and J. F. Mouscadet. 2007. Efficient and specific internal cleavage of a retroviral palindromic DNA sequence by tetrameric HIV-1 integrase. PLoS One 2: e608. 45. Deprez, E., P. Tauc, H. Leh, J. F. Mouscadet, C. Auclair, and J. C. Brochon. 2000. Oligomeric states of the HIV-1 integrase as measured by time-resolved fluorescence anisotropy. Biochemistry 39: 9275-84. 46. Di Santo, R., R. Costi, A. Roux, M. Artico, A. Lavecchia, L. Marinelli, E. Novellino, L. Palmisano, M. Andreotti, R. Amici, C. M. Galluzzo, L. Nencioni, A. T. Palamara, Y. Pommier, and C. Marchand. 2006. Novel bifunctional quinolonyl diketo acid derivatives as HIV-1 integrase inhibitors: design, synthesis, biological activities, and mechanism of action. J Med Chem 49: 1939-45. 47. Dyda, F., A. B. Hickman, T. M. Jenkins, A. Engelman, R. Craigie, and D. R. Davies. 1994. Crystal structure of the catalytic domain of HIV-1 103

integrase: similarity to other polynucleotidyl transferases. Science 266: 1981-6. 48. Dyda, F., A. B. Hickman, T. M. Jenkins, A. Engelman, R. Craigie, and D. R. Davies. 1994. Crystal structure of the catalytic domain of HIV-1 integrase: similarity to other polynucleotidyl transferases. Science 266: 1981-6. 49. Eijkelenboom, A. P., R. A. Lutzke, R. Boelens, R. H. Plasterk, R. Kaptein, and K. Hard. 1995. The DNA-binding domain of HIV-1 integrase has an SH3-like fold. Nat Struct Biol 2: 807-10. 50. Ellison, V., H. Abrams, T. Roe, J. Lifson, and P. Brown. 1990. Human immunodeficiency virus integration in a cell-free system. J Virol 64: 2711-5. 51. Emiliani, S., A. Mousnier, K. Busschots, M. Maroun, B. Van Maele, D. Tempe, L. Vandekerckhove, F. Moisant, L. Ben-Slama, M. Witvrouw, F. Christ, J. C. Rain, C. Dargemont, Z. Debyser, and R. Benarous. 2005. Integrase Mutants Defective for Interaction with LEDGF/p75 Are Impaired in Chromosome Tethering and HIV-1 Replication. J Biol Chem 280: 25517-25523. 52. Engelman, A. 2007. Host cell factors and HIV-1 integration. Future HIV Ther. 1: 415-426. 53. Engelman, A. 2009. Isolation and analysis of HIV-1 preintegration complexes. Methods Mol Biol 485: 135-49. 54. Engelman, A., F. D. Bushman, and R. Craigie. 1993. Identification of discrete functional domains of HIV-1 integrase and their organization within an active multimeric complex. Embo J 12: 3269-75. 55. Engelman, A., and P. Cherepanov. 2008. The lentiviral integrase binding protein LEDGF/p75 and HIV-1 replication. PLoS Pathog 4: e1000046. 56. Engelman, A., and R. Craigie. 1992. Identification of conserved amino acid residues critical for human immunodeficiency virus type 1 integrase function in vitro. J Virol 66: 6361-9. 57. Engelman, A., K. Mizuuchi, and R. Craigie. 1991. HIV-1 DNA integration: mechanism of viral DNA cleavage and DNA strand transfer. Cell 67: 1211-21. 58. Espeseth, A. S., P. Felock, A. Wolfe, M. Witmer, J. Grobler, N. Anthony, M. Egbertson, J. Y. Melamed, S. Young, T. Hamill, J. L. Cole, and D. J. Hazuda. 2000. HIV-1 integrase inhibitors that compete with the target DNA substrate define a unique strand transfer conformation for integrase. Proc Natl Acad Sci U S A 97: 11244-9. 59. Esposito, D., and R. Craigie. 1998. Sequence specificity of viral end DNA binding by HIV-1 integrase reveals critical regions for protein-DNA interaction. Embo J 17: 5832-43. 104

60. Farnet, C. M., and F. D. Bushman. 1997. HIV-1 cDNA integration: requirement of HMG I(Y) protein for function of preintegration complexes in vitro. Cell 88: 483-92. 61. Farnet, C. M., and W. A. Haseltine. 1990. Integration of human immunodeficiency virus type 1 DNA in vitro. Proc Natl Acad Sci U S A 87: 4164-8. 62. Faure, A., C. Calmels, C. Desjobert, M. Castroviejo, A. Caumont- Sarcos, L. Tarrago-Litvak, S. Litvak, and V. Parissi. 2005. HIV-1 integrase crosslinked oligomers are active in vitro. Nucleic Acids Res 33: 977-86. 63. Fujiwara, T., and K. Mizuuchi. 1988. Retroviral DNA integration: structure of an integration intermediate. Cell 54: 497-504. 64. Gao, K., S. L. Butler, and F. Bushman. 2001. Human immunodeficiency virus type 1 integrase: arrangement of protein domains in active cDNA complexes. Embo J 20: 3565-76. 65. Gao, K., S. L. Butler, and F. Bushman. 2001. Human immunodeficiency virus type 1 integrase: arrangement of protein domains in active cDNA complexes. Embo J 20: 3565-76. 66. Gao, Y. G., S. Y. Su, H. Robinson, S. Padmanabhan, L. Lim, B. S. McCrary, S. P. Edmondson, J. W. Shriver, and A. H. Wang. 1998. The crystal structure of the hyperthermophile chromosomal protein Sso7d bound to DNA. Nat Struct Biol 5: 782-6. 67. Goldgur, Y., F. Dyda, A. B. Hickman, T. M. Jenkins, R. Craigie, and D. R. Davies. 1998. Three new structures of the core domain of HIV-1 integrase: an active site that binds magnesium. Proc Natl Acad Sci U S A 95: 9150-4. 68. Grobler, J. A., K. Stillmock, B. Hu, M. Witmer, P. Felock, A. S. Espeseth, A. Wolfe, M. Egbertson, M. Bourgeois, J. Melamed, J. S. Wai, S. Young, J. Vacca, and D. J. Hazuda. 2002. Diketo acid inhibitor mechanism and HIV-1 integrase: implications for metal binding in the active site of phosphotransferase enzymes. Proc Natl Acad Sci U S A 99: 6661-6. 69. Guiot, E., K. Carayon, O. Delelis, F. Simon, P. Tauc, E. Zubin, M. Gottikh, J. F. Mouscadet, J. C. Brochon, and E. Deprez. 2006. Relationship between the oligomeric status of HIV-1 integrase on DNA and enzymatic activity. J Biol Chem 281: 22707-19. 70. Hare, S., S. S. Gupta, E. Valkov, A. Engelman, and P. Cherepanov. Retroviral intasome assembly and inhibition of DNA strand transfer. Nature 464: 232-6.

105

71. Hayouka, Z., J. Rosenbluh, A. Levin, S. Loya, M. Lebendiker, D. Veprintsev, M. Kotler, A. Hizi, A. Loyter, and A. Friedler. 2007. Inhibiting HIV-1 integrase by shifting its oligomerization equilibrium. Proc Natl Acad Sci U S A 104: 8316-21. 72. Hazuda, D. J., N. J. Anthony, R. P. Gomez, S. M. Jolly, J. S. Wai, L. Zhuang, T. E. Fisher, M. Embrey, J. P. Guare, Jr., M. S. Egbertson, J. P. Vacca, J. R. Huff, P. J. Felock, M. V. Witmer, K. A. Stillmock, R. Danovich, J. Grobler, M. D. Miller, A. S. Espeseth, L. Jin, I. W. Chen, J. H. Lin, K. Kassahun, J. D. Ellis, B. K. Wong, W. Xu, P. G. Pearson, W. A. Schleif, R. Cortese, E. Emini, V. Summa, M. K. Holloway, and S. D. Young. 2004. A naphthyridine carboxamide provides evidence for discordant resistance between mechanistically identical inhibitors of HIV-1 integrase. Proc Natl Acad Sci U S A 101: 11233-8. 73. Hazuda, D. J., P. Felock, M. Witmer, A. Wolfe, K. Stillmock, J. A. Grobler, A. Espeseth, L. Gabryelski, W. Schleif, C. Blau, and M. D. Miller. 2000. Inhibitors of strand transfer that prevent integration and inhibit HIV- 1 replication in cells. Science 287: 646-50. 74. Hazuda, D. J., A. L. Wolfe, J. C. Hastings, H. L. Robbins, P. L. Graham, R. L. LaFemina, and E. A. Emini. 1994. Viral long terminal repeat substrate binding characteristics of the human immunodeficiency virus type 1 integrase. J Biol Chem 269: 3999-4004. 75. Heuer, T. S., and P. O. Brown. 1997. Mapping features of HIV-1 integrase near selected sites on viral and target DNA molecules in an active enzyme-DNA complex by photo-cross- linking. Biochemistry 36: 10655-65. 76. Heuer, T. S., and P. O. Brown. 1998. Photo-cross-linking studies suggest a model for the architecture of an active human immunodeficiency virus type 1 integrase-DNA complex. Biochemistry 37: 6667-78. 77. Hombrouck, A., J. De Rijck, J. Hendrix, L. Vandekerckhove, A. Voet, M. De Maeyer, M. Witvrouw, Y. Engelborghs, F. Christ, R. Gijsbers, and Z. Debyser. 2007. Virus evolution reveals an exclusive role for LEDGF/p75 in chromosomal tethering of HIV. PLoS Pathog 3: e47. 78. Huang, H., R. Chopra, G. L. Verdine, and S. C. Harrison. 1998. Structure of a covalently trapped catalytic complex of HIV-1 reverse transcriptase: implications for drug resistance. Science 282: 1669-75. 79. Iordanskiy, S., R. Berro, M. Altieri, F. Kashanchi, and M. Bukrinsky. 2006. Intracytoplasmic maturation of the human immunodeficiency virus type 1 reverse transcription complexes determines their capacity to integrate into chromatin. Retrovirology 3: 4.

106

80. Jacobs, S. A., and S. Khorasanizadeh. 2002. Structure of HP1 chromodomain bound to a lysine 9-methylated histone H3 tail. Science 295: 2080-3. 81. Jacobs, S. A., S. D. Taverna, Y. Zhang, S. D. Briggs, J. Li, J. C. Eissenberg, C. D. Allis, and S. Khorasanizadeh. 2001. Specificity of the HP1 chromo domain for the methylated N-terminus of histone H3. Embo J 20: 5232-41. 82. Jenkins, T. M., A. Engelman, R. Ghirlando, and R. Craigie. 1996. A soluble active mutant of HIV-1 integrase: involvement of both the core and carboxyl-terminal domains in multimerization. J Biol Chem 271: 7712-8. 83. Jenkins, T. M., D. Esposito, A. Engelman, and R. Craigie. 1997. Critical contacts between HIV-1 integrase and viral DNA identified by structure- based analysis and photo-crosslinking. Embo J 16: 6849-59. 84. Johnson, A. A., W. Santos, G. C. Pais, C. Marchand, R. Amin, T. R. Burke, Jr., G. Verdine, and Y. Pommier. 2006. Integration requires a specific interaction of the donor DNA terminal 5'-cytosine with glutamine 148 of the HIV-1 integrase flexible loop. J Biol Chem 281: 461-7. 85. Johnson, A. A., J. M. Sayer, H. Yagi, S. S. Patil, F. Debart, M. A. Maier, D. R. Corey, J. J. Vasseur, T. R. Burke, Jr., V. E. Marquez, D. M. Jerina, and Y. Pommier. 2006. Effect of DNA modifications on DNA processing by HIV-1 integrase and inhibitor binding: role of DNA backbone flexibility and an open catalytic site. J Biol Chem 281: 32428-38. 86. Karki, R. G., Y. Tang, T. R. Burke, Jr., and M. C. Nicklaus. 2004. Model of full-length HIV-1 integrase complexed with viral DNA as template for anti-HIV drug design. J Comput Aided Mol Des 18: 739-60. 87. Kouzarides, T. 2007. Chromatin modifications and their function. Cell 128: 693-705. 88. Krueger, J. K., B. S. McCrary, A. H. Wang, J. W. Shriver, J. Trewhella, and S. P. Edmondson. 1999. The solution structure of the Sac7d/DNA complex: a small-angle X-ray scattering study. Biochemistry 38: 10247-55. 89. Kvaratskhelia, M., J. T. Miller, S. R. Budihas, L. K. Pannell, and S. F. Le Grice. 2002. Identification of specific HIV-1 reverse transcriptase contacts to the viral RNA:tRNA complex by mass spectrometry and a primary amine selective reagent. Proc Natl Acad Sci U S A 99: 15988-93. 90. LaFemina, R. L., P. L. Callahan, and M. G. Cordingley. 1991. Substrate specificity of recombinant human immunodeficiency virus integrase protein. J Virol 65: 5624-30. 91. Leavitt, A. D., G. Robles, N. Alesandro, and H. E. Varmus. 1996. Human immunodeficiency virus type 1 integrase mutants retain in vitro

107

integrase activity yet fail to integrate viral DNA efficiently during infection. J Virol 70: 721-8. 92. Lee, M. S., and R. Craigie. 1998. A previously unidentified host protein protects retroviral DNA from autointegration. Proc Natl Acad Sci U S A 95: 1528-33. 93. Lee, M. S., and R. Craigie. 1994. Protection of retroviral DNA from autointegration: involvement of a cellular factor. Proc Natl Acad Sci U S A 91: 9823-7. 94. Lee, S. P., J. Xiao, J. R. Knutson, M. S. Lewis, and M. K. Han. 1997. Zn2+ promotes the self-association of human immunodeficiency virus type-1 integrase in vitro. Biochemistry 36: 173-80. 95. Lee, Y. M., and J. M. Coffin. 1990. Efficient autointegration of avian retrovirus DNA in vitro. J Virol 64: 5958-65. 96. Lewinski, M. K., M. Yamashita, M. Emerman, A. Ciuffi, H. Marshall, G. Crawford, F. Collins, P. Shinn, J. Leipzig, S. Hannenhalli, C. C. Berry, J. R. Ecker, and F. D. Bushman. 2006. Retroviral DNA integration: viral and cellular determinants of target-site selection. PLoS Pathog 2: e60. 97. Li, L., J. M. Olvera, K. E. Yoder, R. S. Mitchell, S. L. Butler, M. Lieber, S. L. Martin, and F. D. Bushman. 2001. Role of the non-homologous DNA end joining pathway in the early steps of retroviral infection. Embo J 20: 3272-81. 98. Li, M., and R. Craigie. 2009. Nucleoprotein complex intermediates in HIV- 1 integration. Methods 47: 237-42. 99. Li, M., and R. Craigie. 2005. Processing of viral DNA ends channels the HIV-1 integration reaction to concerted integration. J Biol Chem 280: 29334-9. 100. Li, M., M. Mizuuchi, T. R. Burke, Jr., and R. Craigie. 2006. Retroviral DNA integration: reaction pathway and critical intermediates. Embo J 25: 1295-304. 101. Lin, C. W., and A. Engelman. 2003. The barrier-to-autointegration factor is a component of functional human immunodeficiency virus type 1 preintegration complexes. J Virol 77: 5030-6. 102. Lins, R. D., T. P. Straatsma, and J. M. Briggs. 2000. Similarities in the HIV-1 and ASV integrase active sites upon metal cofactor binding. Biopolymers 53: 308-15. 103. Liu, Y., M. Kvaratskhelia, S. Hess, Y. Qu, and Y. Zou. 2005. Modulation of replication protein A function by its hyperphosphorylation-induced conformational change involving DNA binding domain B. J Biol Chem 280: 32775-83.

108

104. Llano, M., S. Delgado, M. Vanegas, and E. M. Poeschla. 2004. Lens epithelium-derived growth factor/p75 prevents proteasomal degradation of HIV-1 integrase. J Biol Chem 279: 55570-7. 105. Llano, M., D. T. Saenz, A. Meehan, P. Wongthida, M. Peretz, W. H. Walker, W. Teo, and E. M. Poeschla. 2006. An essential role for LEDGF/p75 in HIV integration. Science 314: 461-4. 106. Llano, M., M. Vanegas, O. Fregoso, D. Saenz, S. Chung, M. Peretz, and E. M. Poeschla. 2004. LEDGF/p75 determines cellular trafficking of diverse lentiviral but not murine oncoretroviral integrase proteins and is a component of functional lentiviral preintegration complexes. J Virol 78: 9524-37. 107. Llano, M., M. Vanegas, N. Hutchins, D. Thompson, S. Delgado, and E. M. Poeschla. 2006. Identification and characterization of the chromatin- binding domains of the HIV-1 integrase interactor LEDGF/p75. J Mol Biol 360: 760-73. 108. Lodi, P. J., J. A. Ernst, J. Kuszewski, A. B. Hickman, A. Engelman, R. Craigie, G. M. Clore, and A. M. Gronenborn. 1995. Solution structure of the DNA binding domain of HIV-1 integrase. Biochemistry 34: 9826-33. 109. Lodi, P. J., J. A. Ernst, J. Kuszewski, A. B. Hickman, A. Engelman, R. Craigie, G. M. Clore, and A. M. Gronenborn. 1995. Solution structure of the DNA binding domain of HIV-1 integrase. Biochemistry 34: 9826-33. 110. Lu, R., A. Limon, E. Devroe, P. A. Silver, P. Cherepanov, and A. Engelman. 2004. Class II integrase mutants with changes in putative nuclear localization signals are primarily blocked at a postnuclear entry step of human immunodeficiency virus type 1 replication. J Virol 78: 12735-46. 111. Lu, R., A. Limon, H. Z. Ghory, and A. Engelman. 2005. Genetic analyses of DNA-binding mutants in the catalytic core domain of human immunodeficiency virus type 1 integrase. J Virol 79: 2493-505. 112. Luger, K., A. W. Mader, R. K. Richmond, D. F. Sargent, and T. J. Richmond. 1997. Crystal structure of the nucleosome core particle at 2.8 A resolution. Nature 389: 251-60. 113. Lukasik, S. M., T. Cierpicki, M. Borloz, J. Grembecka, A. Everett, and J. H. Bushweller. 2006. High resolution structure of the HDGF PWWP domain: a potential DNA binding domain. Protein Sci 15: 314-23. 114. Lutzke, R. A., and R. H. Plasterk. 1998. Structure-based mutational analysis of the C-terminal DNA-binding domain of human immunodeficiency virus type 1 integrase: critical residues for protein oligomerization and DNA binding. J Virol 72: 4841-8.

109

115. Maertens, G., P. Cherepanov, W. Pluymers, K. Busschots, E. De Clercq, Z. Debyser, and Y. Engelborghs. 2003. LEDGF/p75 is essential for nuclear and chromosomal targeting of HIV-1 integrase in human cells. J Biol Chem 278: 33528-39. 116. Malet, I., O. Delelis, M. A. Valantin, B. Montes, C. Soulie, M. Wirden, L. Tchertanov, G. Peytavin, J. Reynes, J. F. Mouscadet, C. Katlama, V. Calvez, and A. G. Marcelin. 2008. Mutations associated with failure of raltegravir treatment affect integrase sensitivity to the inhibitor in vitro. Antimicrob Agents Chemother 52: 1351-8. 117. Marchand, C., K. Krajewski, H. F. Lee, S. Antony, A. A. Johnson, R. Amin, P. Roller, M. Kvaratskhelia, and Y. Pommier. 2006. Covalent binding of the natural antimicrobial peptide indolicidin to DNA abasic sites. Nucleic Acids Res 34: 5157-65. 118. Marshall, H. M., K. Ronen, C. Berry, M. Llano, H. Sutherland, D. Saenz, W. Bickmore, E. Poeschla, and F. D. Bushman. 2007. Role of PSIP1/LEDGF/p75 in lentiviral infectivity and integration targeting. PLoS ONE 2: e1340. 119. Maurer-Stroh, S., N. J. Dickens, L. Hughes-Davies, T. Kouzarides, F. Eisenhaber, and C. P. Ponting. 2003. The Tudor domain 'Royal Family': Tudor, plant Agenet, Chromo, PWWP and MBT domains. Trends Biochem Sci 28: 69-74. 120. Mazumder, A., N. Neamati, A. A. Pilon, S. Sunder, and Y. Pommier. 1996. Chemical trapping of ternary complexes of human immunodeficiency virus type 1 integrase, divalent metal, and DNA substrates containing an abasic site. Implications for the role of lysine 136 in DNA binding. J Biol Chem 271: 27330-8. 121. McColl, D. J., and X. Chen. Strand transfer inhibitors of HIV-1 integrase: bringing IN a new era of antiretroviral therapy. Antiviral Res 85: 101-18. 122. McKee, C. J., J. J. Kessl, J. O. Norris, N. Shkriabai, and M. Kvaratskhelia. 2009. Mass spectrometry-based footprinting of protein- protein interactions. Methods 47: 304-7. 123. McKee, C. J., J. J. Kessl, N. Shkriabai, M. J. Dar, A. Engelman, and M. Kvaratskhelia. 2008. Dynamic modulation of HIV-1 integrase structure and function by cellular lens epithelium-derived growth factor (LEDGF) protein. J Biol Chem 283: 31802-12. 124. Miller, M. D., C. M. Farnet, and F. D. Bushman. 1997. Human immunodeficiency virus type 1 preintegration complexes: studies of organization and composition. J Virol 71: 5382-90. 125. Mitchell, R. S., B. F. Beitzel, A. R. Schroder, P. Shinn, H. Chen, C. C. Berry, J. R. Ecker, and F. D. Bushman. 2004. Retroviral DNA 110

integration: ASLV, HIV, and MLV show distinct target site preferences. PLoS Biol 2: E234. 126. Molteni, V., J. Greenwald, D. Rhodes, Y. Hwang, W. Kwiatkowski, F. D. Bushman, J. S. Siegel, and S. Choe. 2001. Identification of a small- molecule binding site at the dimer interface of the HIV integrase catalytic domain. Acta Crystallogr D Biol Crystallogr 57: 536-44. 127. Pandey, K. K., S. Sinha, and D. P. Grandgenett. 2007. Transcriptional coactivator LEDGF/p75 modulates human immunodeficiency virus type 1 integrase-mediated concerted integration. J Virol 81: 3969-79. 128. Petit, C., O. Schwartz, and F. Mammano. 2000. The karyophilic properties of human immunodeficiency virus type 1 integrase are not required for nuclear import of proviral DNA. J Virol 74: 7119-26. 129. Pluymers, W., P. Cherepanov, D. Schols, E. De Clercq, and Z. Debyser. 1999. Nuclear localization of human immunodeficiency virus type 1 integrase expressed as a fusion protein with green fluorescent protein. Virology 258: 327-32. 130. Podtelezhnikov, A. A., K. Gao, F. D. Bushman, and J. A. McCammon. 2003. Modeling HIV-1 integrase complexes based on their hydrodynamic properties. Biopolymers 68: 110-20. 131. Poeschla, E. M. 2008. Integrase, LEDGF/p75 and HIV replication. Cell Mol Life Sci 65: 1403-24. 132. Pommier, Y., A. A. Johnson, and C. Marchand. 2005. Integrase inhibitors to treat HIV/AIDS. Nat Rev Drug Discov 4: 236-48. 133. Puglia, J., T. Wang, C. Smith-Snyder, M. Cote, M. Scher, J. N. Pelletier, S. John, C. B. Jonsson, and M. J. Roth. 2006. Revealing domain structure through linker-scanning analysis of the murine leukemia virus (MuLV) RNase H and MuLV and human immunodeficiency virus type 1 integrase proteins. J Virol 80: 9497-510. 134. Raghavendra, N. K., and A. Engelman. 2007. LEDGF/p75 interferes with the formation of synaptic nucleoprotein complexes that catalyze full-site HIV-1 DNA integration in vitro: implications for the mechanism of viral cDNA integration. Virology 360: 1-5. 135. Rahman, S., R. Lu, N. Vandegraaff, P. Cherepanov, and A. Engelman. 2007. Structure-based mutagenesis of the integrase-LEDGF/p75 interface uncouples a strict correlation between in vitro protein binding and HIV-1 fitness. Virology 357: 79-90. 136. Ren, G., K. Gao, F. D. Bushman, and M. Yeager. 2007. Single-particle image reconstruction of a tetramer of HIV integrase bound to DNA. J Mol Biol 366: 286-94.

111

137. Renisio, J. G., S. Cosquer, I. Cherrak, S. El Antri, O. Mauffret, and S. Fermandjian. 2005. Pre-organized structure of viral DNA at the binding- processing site of HIV-1 integrase. Nucleic Acids Res 33: 1970-81. 138. Robinson, H., Y. G. Gao, B. S. McCrary, S. P. Edmondson, J. W. Shriver, and A. H. Wang. 1998. The hyperthermophile chromosomal protein Sac7d sharply kinks DNA. Nature 392: 202-5. 139. Sakurai, Y., K. Komatsu, K. Agematsu, and M. Matsuoka. 2009. DNA double strand break repair enzymes function at multiple steps in retroviral infection. Retrovirology 6: 114. 140. Sharma, P., D. P. Singh, N. Fatma, L. T. Chylack, Jr., and T. Shinohara. 2000. Activation of LEDGF gene by thermal-and oxidative- stresses. Biochem Biophys Res Commun 276: 1320-4. 141. Shell, S. M., S. Hess, M. Kvaratskhelia, and Y. Zou. 2005. Mass spectrometric identification of lysines involved in the interaction of human replication protein a with single-stranded DNA. Biochemistry 44: 971-8. 142. Shinohara, T., D. P. Singh, and N. Fatma. 2002. LEDGF, a survival factor, activates stress-related genes. Prog Retin Eye Res 21: 341-58. 143. Shkriabai, N., S. A. Datta, Z. Zhao, S. Hess, A. Rein, and M. Kvaratskhelia. 2006. Interactions of HIV-1 Gag with assembly cofactors. Biochemistry 45: 4077-83. 144. Shkriabai, N., S. S. Patil, S. Hess, S. R. Budihas, R. Craigie, T. R. Burke, Jr., S. F. Le Grice, and M. Kvaratskhelia. 2004. Identification of an inhibitor-binding site to HIV-1 integrase with affinity acetylation and mass spectrometry. Proc Natl Acad Sci U S A 101: 6894-9. 145. Shun, M. C., Y. Botbol, X. Li, F. Di Nunzio, J. E. Daigle, N. Yan, J. Lieberman, M. Lavigne, and A. Engelman. 2008. Identification and characterization of PWWP domain residues critical for LEDGF/p75 chromatin binding and human immunodeficiency virus type 1 infectivity. J Virol 82: 11555-67. 146. Shun, M. C., Y. Botbol, X. Li, F. Di Nunzio, J. E. Daigle, N. Yan, J. Lieberman, M. Lavigne, and A. Engelman. 2008. Identification and Characterization of PWWP Domain Residues Critical for LEDGF/p75 Chromatin-Binding and Human Immunodeficiency Virus Type 1 Infectivity. J Virol. 147. Shun, M. C., N. K. Raghavendra, N. Vandegraaff, J. E. Daigle, S. Hughes, P. Kellam, P. Cherepanov, and A. Engelman. 2007. LEDGF/p75 functions downstream from preintegration complex formation to effect gene-specific HIV-1 integration. Genes Dev 21: 1767-78. 148. Sinha, S., and D. P. Grandgenett. 2005. Recombinant human immunodeficiency virus type 1 integrase exhibits a capacity for full-site 112

integration in vitro that is comparable to that of purified preintegration complexes from virus-infected cells. J Virol 79: 8208-16. 149. Sinha, S., M. H. Pursley, and D. P. Grandgenett. 2002. Efficient concerted integration by recombinant human immunodeficiency virus type 1 integrase without cellular or viral cofactors. J Virol 76: 3105-13. 150. Smerdon, S. J., J. Jager, J. Wang, L. A. Kohlstaedt, A. J. Chirino, J. M. Friedman, P. A. Rice, and T. A. Steitz. 1994. Structure of the binding site for nonnucleoside inhibitors of the reverse transcriptase of human immunodeficiency virus type 1. Proc Natl Acad Sci U S A 91: 3911-5. 151. Smith, J. A., F. X. Wang, H. Zhang, K. J. Wu, K. J. Williams, and R. Daniel. 2008. Evidence that the Nijmegen breakage syndrome protein, an early sensor of double-strand DNA breaks (DSB), is involved in HIV-1 post-integration repair by recruiting the ataxia telangiectasia-mutated kinase in a process similar to, but distinct from, cellular DSB repair. Virol J 5: 11. 152. Sutherland, H. G., K. Newton, D. G. Brownstein, M. C. Holmes, C. Kress, C. A. Semple, and W. A. Bickmore. 2006. Disruption of Ledgf/Psip1 results in perinatal mortality and homeotic skeletal transformations. Mol Cell Biol 26: 7201-10. 153. Suzuki, Y., and R. Craigie. 2007. The road to chromatin - nuclear entry of retroviruses. Nat Rev Microbiol 5: 187-96. 154. Svarovskaia, E. S., R. Barr, X. Zhang, G. C. Pais, C. Marchand, Y. Pommier, T. R. Burke, Jr., and V. K. Pathak. 2004. Azido-containing diketo acid derivatives inhibit human immunodeficiency virus type 1 integrase in vivo and influence the frequency of deletions at two-long- terminal-repeat-circle junctions. J Virol 78: 3210-22. 155. Tsiang, M., G. S. Jones, M. Hung, S. Mukund, B. Han, X. Liu, K. Babaoglu, E. Lansdon, X. Chen, J. Todd, T. Cai, N. Pagratis, R. Sakowicz, and R. Geleziunas. 2009. Affinities between the binding partners of the HIV-1 integrase dimer-lens epithelium-derived growth factor (IN dimer-LEDGF) complex. J Biol Chem 284: 33580-99. 156. Tsurutani, N., M. Kubo, Y. Maeda, T. Ohashi, N. Yamamoto, M. Kannagi, and T. Masuda. 2000. Identification of critical amino acid residues in human immunodeficiency virus type 1 IN required for efficient proviral DNA formation at steps prior to integration in dividing and nondividing cells. J Virol 74: 4795-806. 157. Turlure, F., E. Devroe, P. A. Silver, and A. Engelman. 2004. Human cell proteins and human immunodeficiency virus DNA integration. Front Biosci 9: 3187-208.

113

158. Turlure, F., G. Maertens, S. Rahman, P. Cherepanov, and A. Engelman. 2006. A tripartite DNA-binding element, comprised of the nuclear localization signal and two AT-hook motifs, mediates the association of LEDGF/p75 with chromatin in vivo. Nucleic Acids Res 34: 1653-75. 159. van den Ent, F. M., A. Vos, and R. H. Plasterk. 1999. Dissecting the role of the N-terminal domain of human immunodeficiency virus integrase by trans-complementation analysis. J Virol 73: 3176-83. 160. van Gent, D. C., Y. Elgersma, M. W. Bolk, C. Vink, and R. H. Plasterk. 1991. DNA binding properties of the integrase proteins of human immunodeficiency viruses types 1 and 2. Nucleic Acids Res 19: 3821-7. 161. van Gent, D. C., C. Vink, A. A. Groeneger, and R. H. Plasterk. 1993. Complementation between HIV integrase proteins mutated in different domains. Embo J 12: 3261-7. 162. Vandekerckhove, L., F. Christ, B. Van Maele, J. De Rijck, R. Gijsbers, C. Van den Haute, M. Witvrouw, and Z. Debyser. 2006. Transient and stable knockdown of the integrase cofactor LEDGF/p75 reveals its role in the replication cycle of human immunodeficiency virus. J Virol 80: 1886-96. 163. Vanegas, M., M. Llano, S. Delgado, D. Thompson, M. Peretz, and E. Poeschla. 2005. Identification of the LEDGF/p75 HIV-1 integrase- interaction domain and NLS reveals NLS-independent chromatin tethering. J Cell Sci 118: 1733-43. 164. Verdine, G. L., and D. P. Norman. 2003. Covalent trapping of protein- DNA complexes. Annu Rev Biochem 72: 337-66. 165. Vincent, K. A., V. Ellison, S. A. Chow, and P. O. Brown. 1993. Characterization of human immunodeficiency virus type 1 integrase expressed in Escherichia coli and analysis of variants with amino-terminal mutations. J Virol 67: 425-37. 166. Wang, J. Y., H. Ling, W. Yang, and R. Craigie. 2001. Structure of a two- domain fragment of HIV-1 integrase: implications for domain organization in the intact protein. Embo J 20: 7333-43. 167. Wang, J. Y., H. Ling, W. Yang, and R. Craigie. 2001. Structure of a two- domain fragment of HIV-1 integrase: implications for domain organization in the intact protein. Embo J 20: 7333-43. 168. Wielens, J., I. T. Crosby, and D. K. Chalmers. 2005. A three- dimensional model of the human immunodeficiency virus type 1 integration complex. J Comput Aided Mol Des 19: 301-17. 169. Wu, W., L. E. Henderson, T. D. Copeland, R. J. Gorelick, W. J. Bosche, A. Rein, and J. G. Levin. 1996. Human immunodeficiency virus type 1 nucleocapsid protein reduces reverse transcriptase pausing at a 114

secondary structure near the murine leukemia virus polypurine tract. J Virol 70: 7132-42. 170. Yang, W., and T. A. Steitz. 1995. Recombining the structures of HIV integrase, RuvC and RNase H. Structure 3: 131-4. 171. Yokoyama, A., and M. L. Cleary. 2008. Menin critically links MLL proteins with LEDGF on cancer-associated target genes. Cancer Cell 14: 36-46. 172. Yu, F., G. S. Jones, M. Hung, A. H. Wagner, H. L. MacArthur, X. Liu, S. Leavitt, M. J. McDermott, and M. Tsiang. 2007. HIV-1 integrase preassembled on donor DNA is refractory to activity stimulation by LEDGF/p75. Biochemistry 46: 2899-908. 173. Zhang, L., E. E. Eugeni, M. R. Parthun, and M. A. Freitas. 2003. Identification of novel histone post-translational modifications by peptide mass fingerprinting. Chromosoma 112: 77-86. 174. Zhang, X., G. C. Pais, E. S. Svarovskaia, C. Marchand, A. A. Johnson, R. G. Karki, M. C. Nicklaus, V. K. Pathak, Y. Pommier, and T. R. Burke. 2003. Azido-containing aryl beta-diketo acid HIV-1 integrase inhibitors. Bioorg Med Chem Lett 13: 1215-9. 175. Zhao, Z., C. J. McKee, J. J. Kessl, W. L. Santos, J. E. Daigle, A. Engelman, G. Verdine, and M. Kvaratskhelia. 2008. Subunit-specific protein footprinting reveals significant structural rearrangements and a role for N-terminal Lys-14 of HIV-1 Integrase during viral DNA binding. J Biol Chem 283: 5632-41. 176. Zheng, R., T. M. Jenkins, and R. Craigie. 1996. Zinc folds the N-terminal domain of HIV-1 integrase, promotes multimerization, and enhances catalytic activity. Proc Natl Acad Sci U S A 93: 13659-64.

115