Article

Capsid-CPSF6 Interaction Licenses Nuclear HIV-1 Trafficking to Sites of Viral DNA Integration

Authors Vasudevan Achuthan, Jill M. Perreira, Gregory A. Sowd, ..., Stefan G. Sarafianos, Abraham L. Brass, Alan N. Engelman

Correspondence [email protected] (A.L.B.), alan_engelman@dfci.harvard.edu (A.N.E.)

In Brief Prior work indicated that the nuclear periphery dictated HIV-1 integration site selection. Using multiple orthologous approaches, Achuthan et al. fail to garner evidence for preferential targeting of the periphery. The interaction between viral capsid and CPSF6 enables HIV-1 to bypass integration into peripheral heterochromatin and penetrate the nuclear structure for integration.

Highlights
• CA-CPSF6 interaction as opposed to nuclear periphery dictates HIV-1 integration
• CPSF6 enables HIV-1 to penetrate the nuclear interior beyond the nuclear periphery
• Loss of CPSF6 interaction results in integration at lamina-associated domains
• LEDGF/p75 does not play a significant role in intranuclear HIV-1 localization

Achuthan et al., 2018, Cell Host & Microbe 24, 392–404, September 12, 2018 © 2018 Elsevier Inc. https://doi.org/10.1016/j.chom.2018.08.002

Capsid-CPSF6 Interaction Licenses Nuclear HIV-1 Trafficking to Sites of Viral DNA Integration

Vasudevan Achuthan,1,2 Jill M. Perreira,3 Gregory A. Sowd,1,2 Maritza Puray-Chavez,4 William M. McDougall,3 Adriana Paulucci-Holthauzen,5 Xiaolin Wu,6 Hind J. Fadel,7 Eric M. Poeschla,8 Asha S. Multani,5 Stephen H. Hughes,9 Stefan G. Sarafianos,4,11 Abraham L. Brass,3,10,* and Alan N. Engelman1,2,12,*
1Department of Cancer Immunology and Virology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
2Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
3Department of Microbiology and Physiological Systems, University of Massachusetts Medical School, Worcester, MA 01655, USA
4Department of Molecular Microbiology & Immunology, University of Missouri School of Medicine, Columbia, MO 65212, USA
5Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
6Leidos Biomedical Research, Inc., Frederick, MD 21702, USA
7Division of Molecular Medicine, Mayo Clinic, Rochester, MN 55905, USA
8Division of Infectious Diseases, University of Colorado Denver School of Medicine, Aurora, CO 80045, USA
9HIV Dynamics and Replication Program, National Cancer Institute, Frederick, MD 21702, USA
10Gastroenterology Division, Department of Medicine, University of Massachusetts Medical School, Worcester, MA 01655, USA
11Present address: Department of Pediatrics, Emory University School of Medicine, Atlanta, GA 30332, USA
12Lead Contact
*Correspondence: [email protected] (A.L.B.), [email protected] (A.N.E.)
https://doi.org/10.1016/j.chom.2018.08.002

SUMMARY

HIV-1 integration into the host genome favors actively transcribed genes. Prior work indicated that the nuclear periphery provides the architectural basis for integration site selection, with viral capsid-binding host cofactor CPSF6 and viral integrase-binding cofactor LEDGF/p75 contributing to selection of individual sites. Here, by investigating the early phase of infection, we determine that HIV-1 traffics throughout the nucleus for integration. CPSF6-capsid interactions allow the virus to bypass peripheral heterochromatin and penetrate the nuclear structure for integration. Loss of interaction with CPSF6 dramatically alters virus localization toward the nuclear periphery and integration into transcriptionally repressed lamina-associated heterochromatin, while loss of LEDGF/p75 does not significantly affect intranuclear HIV-1 localization. Thus, CPSF6 serves as a master regulator of HIV-1 intranuclear localization by trafficking viral preintegration complexes away from heterochromatin at the periphery toward gene-dense chromosomal regions within the nuclear interior.

INTRODUCTION

HIV-1 replication initiates with viral attachment and cell entry, followed by reverse transcription, nuclear import of the preintegration complex (PIC), and viral DNA (vDNA) integration. Integration into cell genomes is non-random, with different types of retroviruses displaying distinct preferences for genes, transcriptional activity, promoters/enhancers, histone modifications, and gene-rich regions of chromatin (reviewed in Craigie and Bushman, 2014; Serrao and Engelman, 2016). HIV-1 integration favors actively transcribed genes that reside within relatively gene-dense regions (Schroder et al., 2002), a phenotype that depends in large part on the interactions of two viral proteins, integrase (IN) and capsid (CA), with respective cellular binding partners lens epithelium-derived growth factor (LEDGF)/p75 (Cherepanov et al., 2003; Ciuffi et al., 2005; Marshall et al., 2007; Shun et al., 2007b; Singh et al., 2015) and cleavage and polyadenylation specificity factor 6 (CPSF6) (Lee et al., 2010; Sowd et al., 2016). LEDGF/p75 and CPSF6, however, influence HIV-1 integration in different ways. LEDGF/p75 depletion preferentially reduces integration into genes as compared with gene-dense regions (Ciuffi et al., 2005; Koh et al., 2013; Marshall et al., 2007; Ocwieja et al., 2011; Shun et al., 2007b) and shifts intragenic sites toward 5′ end regions (Shun et al., 2007b; Singh et al., 2015; Sowd et al., 2016), indicating that LEDGF/p75 primarily functions to position integration along gene bodies. Although CPSF6 knockout reduces intragenic integration more dramatically than LEDGF/p75 depletion, positional targeting within transcription units is relatively random. By contrast, HIV-1 dramatically loses preference for integration near activating epigenetic marks and favors gene-sparse regions, indicating that CPSF6 predominantly shields HIV-1 from heterochromatin (Sowd et al., 2016). CPSF6 facilitates HIV-1 PIC nuclear import (Chin et al., 2015; Dharan et al., 2016; Peng et al., 2014), and other import cofactors, including transportin 3 (TNPO3), nucleoporin 358 (NUP358) (Ocwieja et al., 2011), cyclophilin A (Schaller et al., 2011), and NUP153 (Di Nunzio et al., 2013; Koh et al., 2013), can also influence sites of integration, indicating a potential link between PIC nuclear import and integration targeting (Di Nunzio, 2013; König et al., 2008).

Image-based studies have been used to map positions of PICs, integrated proviruses, and favored gene targets (recurrent integration genes [RIGs]) within cell nuclei. Such analyses are facilitated by determining relative distance of the imaged focus from the nuclear edge, and binning results into three concentric zones of equal area: peripheral nuclear (PN), mid-nuclear (MN), and central nuclear (CN) (Chin et al., 2015; Marini et al., 2015). For studies that did not perform such analyses, we have for the sake of comparison correlated reported distance measures to zonal region (STAR Methods). HIV-1 PICs (Albanese et al., 2008; Burdick et al., 2013, 2017; Francis and Melikyan, 2018) and proviruses (Di Primio et al., 2013; Marini et al., 2015) predominantly mapped to the PN and MN. RIGs were mapped by fluorescence in situ hybridization (FISH) also to PN and MN areas, with some PN loci in proximity to nuclear pore complexes (NPCs) (Marini et al., 2015). Such observations have led to the suggestion that active chromatin at the nuclear periphery determines the architectural basis for HIV-1 integration site selection (Marini et al., 2015).

However, the roles of known integration targeting cofactors in this process, such as LEDGF/p75 and CPSF6, require clarification. For example, prior studies disagree as to a potential role for LEDGF/p75: whereas two reports indicated that LEDGF/p75 played a role in PN targeting (Marini et al., 2015; Vranckx et al., 2016), two other papers concluded LEDGF/p75 does not contribute to the macrolocalization of HIV-1 within the nucleus (Burdick et al., 2017; Quercioli et al., 2016). CPSF6, by contrast, was reportedly important for nuclear penetration (Chin et al., 2015). To the best of our knowledge, the roles of LEDGF/p75 versus CPSF6 in intranuclear vDNA localization during acute HIV-1 infection have not been systematically analyzed.

Here we analyze the functions of CPSF6 and LEDGF/p75 in HIV-1 intranuclear localization by (1) imaging PICs using independently developed ViewHIV and multiplex immunofluorescent cell-based detection of DNA, RNA, and protein (MICDDRP) branched DNA (bDNA) hybridization techniques (Chin et al., 2015; Puray-Chavez et al., 2017); (2) imaging proviruses using Provirus ViewHIV (Chin et al., 2015) and single-cell imaging of HIV-1 provirus (SCIP) (Di Primio et al., 2013) assays; (3) mapping viral integration sites (Serrao et al., 2016; Sowd et al., 2016); and (4) visualizing the intranuclear positions of some of the genes that are preferred integration targets in wild-type (WT) cells or when LEDGF/p75- or CPSF6-dependent targeting pathways are disrupted. Our results fail to support the notion that HIV-1 integration favors the nuclear periphery under baseline infection conditions. In agreement with two prior reports (Burdick et al., 2017; Quercioli et al., 2016), we do not observe a significant role for LEDGF/p75 in determining the distribution of HIV-1 within the nucleus. We moreover demonstrate that peripheral HIV-1 DNA localization observed in the absence of the CA-CPSF6 interaction strongly correlates with integration into heterochromatic lamina-associated domains (LADs) that preferentially reside at the nuclear periphery. Therefore, CPSF6 engagement licenses HIV-1 PICs to bypass peripheral heterochromatin and penetrate the nuclear interior to locate gene-dense euchromatin for integration.

RESULTS

Distribution of vDNA and PICs during Acute HIV-1 Infection

Most studies assessing roles of LEDGF/p75 (Marini et al., 2015; Quercioli et al., 2016; Vranckx et al., 2016) or CPSF6 (Chin et al., 2015) in intranuclear HIV-1 localization depleted the proteins by RNAi, although Burdick et al. (2017) analyzed HeLa cells with genomic deletions of the LEDGF/p75 coding gene PSIP1. To systematically address the roles of LEDGF/p75 versus CPSF6, we initially used WT HEK293T cells and isogenic derivatives knocked out for PSIP1 (LEDGF/p75 knockout [LKO]) (Fadel et al., 2014), CPSF6 (CKO), or both PSIP1 and CPSF6 (double KO [DKO]) (Sowd et al., 2016). Cells infected with VSV-G-pseudotyped HIV-1NL4-3 at the approximate MOI of 350 were imaged at 24 hr post-infection (hpi) using the Provirus ViewHIV assay that predominantly detects integrated virus (Chin et al., 2015) (Figure S1). DNA signals from confocal microscopy z stacks were binned into PN, MN, and CN areas. In both WT and LKO cells, vDNA dispersed fairly equally across the nuclear sections (Figure 1A). In line with our knockdown studies (Chin et al., 2015), HIV-1 DNA significantly relocalized to the PN in CKO cells (76.5% ± 1.4%) with concomitant reductions in MN and CN localization. Consistent with a dominant role for CPSF6 in determining intranuclear HIV-1 localization, 65.1% ± 0.9% of vDNA signals mapped to the PN area of DKO cell nuclei (Figure 1A).

To assess the generalizability of these observations, we utilized the MICDDRP bDNA hybridization technique, which, at 24 hpi, does not distinguish between unintegrated vDNA and proviral DNA (Puray-Chavez et al., 2017). Cells infected at MOI 0.2 showed the same basic dispersion of vDNA foci as obtained using the Provirus ViewHIV technique (compare Figures 1B and 1A). The intranuclear location of HIV-1 DNA documented by these bDNA hybridization techniques is therefore independent of imaging methodology.

CPSF6 Shields HIV-1 from Integrating into LADs

HIV-1 disfavors integration into LAD regions of chromatin (Marini et al., 2015), which mainly consist of transcriptionally silent genes and gene-sparse regions (Guelen et al., 2008; Peric-Hupkes et al., 2010) that interact physically with the proteinaceous lamina at the nuclear periphery (Meuleman et al., 2013). As CPSF6 engagement by CA helps HIV-1 to avoid integrating into heterochromatin (Sowd et al., 2016) and the virus markedly accumulates in the PN area upon CPSF6 depletion (Chin et al., 2015) (Figures 1A and 1B), viral integration sites from our HEK293T cell datasets (Sowd et al., 2016) (Table S1) were analyzed for proximity to LAD coordinates (Meuleman et al., 2013). LAD association was calculated using 5-kb windows because the majority of HIV-1 integration sites in WT cells mapped within ±2.5 kb of LADs (Figure 1C). In the computationally derived matched random control (MRC), 45.9% of DNA fragments aligned within 5 kb of LADs (Figure 1D, dashed line). Consistent with the prior report (Marini et al., 2015), LAD-proximal integration was significantly disfavored in WT cells (p < 2.2 × 10⁻²⁹⁹; Table S2). In LKO cells, 29% of integration sites mapped to LADs, which, although significantly different from the 20% value observed in WT cells, was still highly disfavored relative to the MRC (Figure 1D; Table S2). Strikingly, LADs became preferred integration targets in CKO cells (Figure 1C), with 65% of all integration sites falling within this window (Figure 1D). Increased LAD targeting persisted in DKO cells in the absence of both CPSF6 and LEDGF/p75 (Figures 1C and 1D).

Analysis of LADs across different cell types has defined constitutive LADs (cLADs) as DNA regions interacting with the nuclear

Figure 1. CPSF6 Dependence of HIV-1 DNA Localization
(A and B) Representative central z-section images of infected cells obtained using Provirus ViewHIV (A) or MICDDRP (B). Nuclei (blue) are stained with DAPI throughout the figures, with vDNA appearing in red. Scale bars, 20 μm (A) and 10 μm (B). Distributions of 240–973 total HIV-1 foci from n = 3 independent experiments, each conducted in duplicate (A), and 300–383 total foci from n = 2 independent experiments conducted in duplicate (B), were binned into nuclear zones. Results are mean ± SEM.
(C) Histogram analysis of HIV-1 integration with respect to LADs; MRC is shown as gray shade.
(D and E) Proportion of integration sites within 2.5 kb of LADs (D) and at cLADs versus ciLADs (E). p values (as in A) compare results with WT (black asterisks) or MRC (gray asterisks). ***p < 0.0001; *p < 0.05; NS, p > 0.05; comparisons to random (dashed lines) are indicated by orange or gray color. NS, not significant.
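The zone binning in (A and B) and the LAD-proximity proportions in (D) can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which is described in the STAR Methods. It assumes an idealized circular nuclear cross-section, so that three concentric zones of equal area have boundaries at radii R·√(1/3) and R·√(2/3), and it assumes LADs are supplied as sorted, non-overlapping intervals per chromosome. The function names `nuclear_zone` and `fraction_near_lads` and the toy coordinates are hypothetical.

```python
import bisect


def nuclear_zone(r, nucleus_radius):
    """Assign a focus to CN, MN, or PN from its distance r to the nuclear center.

    Assumption: circular cross-section, so the area enclosed within radius r
    is proportional to r**2 and equal-area zones split at 1/3 and 2/3.
    """
    if nucleus_radius <= 0 or not 0 <= r <= nucleus_radius:
        raise ValueError("r must lie between 0 and a positive nucleus_radius")
    area_fraction = (r / nucleus_radius) ** 2
    if area_fraction < 1 / 3:
        return "CN"
    if area_fraction < 2 / 3:
        return "MN"
    return "PN"


def fraction_near_lads(sites, lads, window=2500):
    """Fraction of (chrom, pos) sites lying within `window` bp of any LAD.

    Assumption: lads maps chrom -> sorted, non-overlapping (start, end) pairs.
    window=2500 mirrors the +/-2.5 kb (5-kb window) criterion in the text.
    """
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in lads.items()}
    hits = 0
    for chrom, pos in sites:
        ivs = lads.get(chrom, [])
        # Intervals starting at or before pos + window are candidates; because
        # intervals do not overlap, the last candidate has the largest end.
        i = bisect.bisect_right(starts.get(chrom, []), pos + window)
        if i > 0 and ivs[i - 1][1] >= pos - window:
            hits += 1
    return hits / len(sites) if sites else 0.0


# Hypothetical toy data, not taken from the study.
toy_lads = {"chr1": [(10_000, 20_000), (50_000, 60_000)]}
toy_sites = [("chr1", 8_000), ("chr1", 30_000), ("chr1", 55_000), ("chr2", 100)]
print(nuclear_zone(9.0, 10.0))                  # PN
print(fraction_near_lads(toy_sites, toy_lads))  # 0.5
```

Under the equal-area assumption, foci placed at random would fall into each zone about one-third of the time, which gives a simple reference point for judging PN enrichment.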

lamina independent of cell type and reciprocally, constitutive inter-LADs (ciLADs) as chromatin regions that are never found to associate with the lamina (Kind et al., 2015). Correlating viral integration sites with cLAD and ciLAD genomic coordinates revealed that HIV-1 significantly favored ciLADs and avoided cLADs in WT and LKO cells (Figure 1E; Table S2). These phenotypes dramatically reversed in CKO and DKO cells wherein HIV-1 favored cLADs, as observed for LADs, and avoided integration into ciLAD regions (Figure 1E; Table S2).

CA-CPSF6 Interaction: A Master Regulator of Intranuclear HIV-1 Localization

We next analyzed an isogenic set of back-complemented cells to assess the specificity of the virus-host interaction in determining the CPSF6-dependent phenotypes discussed thus far. CKO cells were transduced with vectors expressing either of two WT CPSF6 isoforms (CPSF6[588] or CPSF6[551]) as well as the CPSF6[551] F284A mutant that is defective for CA binding (Sowd et al., 2016). As controls, WT and CKO cells were transduced with an empty expression vector. Immunoblotting revealed that the exogenously expressed CPSF6 proteins were present in CKO cells at levels similar to that observed for endogenous CPSF6 in parental WT cells (Figure S2) (Sowd et al., 2016). Expected HIV-1 intranuclear distributions were observed in WT-vector and CKO-vector cells using the Provirus ViewHIV assay (Figure 2A). Expressing either CPSF6[551] or CPSF6[588] in CKO cells redistributed vDNA similar to the phenotype observed in WT-vector control cells. In sharp contrast, HIV-1 DNA remained localized to the PN area in CKO cells expressing the F284A mutant protein (Figure 2A).

To confirm these observations, we applied the SCIP technique, which detects proviruses via induced genomic DNA breaks (Di Primio et al., 2013). As observed for most samples using the Provirus ViewHIV assay, integration in the PN and MN areas of WT and WT-vector cells was indistinguishable from random, although we do note avoidance of the CN area under these experimental conditions (Figure 2B). More importantly, similar to our prior results using CPSF6 knockdown (Chin et al., 2015), HIV-1 integration in CKO cells aberrantly targeted the PN region. Expression of CPSF6[551], but not the CA-binding defective F284A mutant, restored the normal pattern of integration targeting to CKO cells (Figure 2B).

To assess the role of the CA-CPSF6 interaction in PIC trafficking, we generated CA and vDNA images at 12 hpi using the PIC ViewHIV assay. Signals in the PN were statistically indistinguishable from random in WT, LKO, and CKO cells expressing either CPSF6[551] or CPSF6[588] (Figure S2). By contrast, these signals were enriched significantly in the PN areas of CKO cells, CKO-vector cells, and CKO cells expressing CPSF6[551]F284A (Figure S2).

CPSF6[551] or CPSF6[588] expression in large part corrected the LAD and cLAD integration targeting defects observed in CKO cells. However, the integration site patterns in CKO cells expressing the F284A mutant remained largely similar to the phenotypes observed in CKO-vector cells (Figures 2C and 2D; Table S2).

Nuclear Distribution of HIV-1 DNA in CD4+ T Cells

As the results discussed above were obtained using transformed cells, we next analyzed primary CD4+ T cells 24 hr after infection with HIV-1NL4-3 carrying its native envelope glycoprotein at the MOI of 0.35. In lieu of PSIP1 knockout, cells were treated at the time of infection with the allosteric IN inhibitor BI-D, which, under these conditions, competes with IN-LEDGF/p75 binding and inhibits HIV-1 integration (Feng et al., 2016; Jurado et al., 2013). Preliminary experiments revealed that BI-D treatment of HEK293T cells yielded the same vDNA localization and LAD-associated integration targeting phenotypes as observed using LKO cells (Figures S3A–S3C; Table S1). In lieu of CPSF6 knockout, we utilized the A77V CA mutant, which impairs CPSF6 binding efficiency without severely decreasing HIV-1 infectivity (Saito et al., 2016). As expected, BI-D treatment significantly reduced the infection of T cells derived from different blood donors, while the A77V CA mutant infected cells similarly to the WT (Figure S3D). Both BI-D treatment and the A77V substitution significantly reduced the extent of intragenic HIV-1 integration in T cells (Table S1).

Assessment of HIV-1 DNA localization by Provirus ViewHIV (Figure 3A) and MICDDRP (Figure S4A) assays revealed fairly equal dispersions across the nuclear zones of primary T cells. Similar nuclear distributions were maintained with BI-D treatment, while the A77V change significantly increased HIV-1 DNA localization to the PN section (72.7% ± 4.0% to 79.1% ± 1.1% of vDNA foci for donor A cells depending on assay; 62.6% ± 8.1% for donor B; Figures 3A and S4A). Other CA mutant viruses defective for CPSF6 binding, including N74D (Lee et al., 2010), N74A, and A105T (Saito et al., 2016), exhibited similar enrichments in the PN area of primary and transformed cell nuclei (Chin et al., 2015; Francis and Melikyan, 2018). Integration site analyses revealed that BI-D treatment in large part recapitulated the LAD-proximal upticks in integration targeting observed in LKO cells and in drug-treated HEK293T cells (compare Figures 3B and S3B; Table S2). A77V mutant viral integration likewise retargeted to LAD and cLAD regions in CD4+ T cells, with concomitant reductions in ciLAD regions (Figures 3B and 3C; Table S2). However, in contrast to results obtained in CKO cells, the mutant virus did not favor LAD or cLAD regions over random, which we suspect may be due to residual binding affinity of the CPSF6 protein for A77V CA (Saito et al., 2016). Meta-analysis of integration sites from monocyte-derived macrophages (MDM) infected with WT or A77V (Saito et al., 2016) revealed similar targeting phenotypes, with a marginal preference of the mutant virus for integration into LADs (Figure S3E; p = 1.8 × 10⁻³).

Genes Frequently Targeted for Integration under CPSF6-Deficient Conditions Are Enriched at LADs

We next focused our attention on the genes that were repeatedly targeted for integration under different conditions of HIV-1 infection. Among 1,136 unique integration sites, Marini et al. (2015) identified 156 RIGs that were targeted in at least two of six independent studies, and FISH revealed that a fraction of these genes preferentially localized to the PN area of CD4+ T cell nuclei. We initially curated repeatedly targeted genes from our WT, LKO, CKO, and DKO HEK293T cell datasets (Sowd et al., 2016) (74,752 unique genic sites in total; Table S1). As expected, the majority of previously defined RIGs, 123 of 156, were identified among the top targeted genes in our WT cell dataset, while only nine of the RIGs identified by Marini et al. (2015) were repeatedly targeted for integration in our CKO cell dataset. Heatmaps were assembled to compare gene usage across different

infection conditions and the MRC. WT cell RIGs, as expected, were strongly favored compared with the MRC, yet these preferences were mostly ablated by PSIP1 or CPSF6 knockout (Figure 4A; see Table S3 for p values versus MRC). By contrast, more than half of the RIGs identified in LKO cells were indistinguishable from genes selected at random (Figure 4B; Table S3). However, the top targeted genes in CKO and DKO cells were highly specific to these cell types (Figures 4C and 4D; Table S3), underlining that genes typically avoided by HIV-1 become enriched for integration upon CPSF6 depletion.

Figure 2. CA-CPSF6 Interaction Underlies Nuclear Dispersion and LAD Evasion during HIV-1 Integration
(A) Representative images of infected cells developed by Provirus ViewHIV. Scale bar, 20 μm. Results are mean ± SEM for n = 2 experiments, each conducted in duplicate, encompassing between 172 and 415 total foci across the indicated conditions.
(B) Representative SCIP assay images and nuclear dispersion values (mean ± SEM for n = 2 experiments, each conducted in duplicate). Scale bar, 10 μm. Number of counted foci ranged from 36 to 168.
(C and D) Proportion of integration sites nearby LADs (C) and at cLADs versus ciLADs (D). ***p < 0.0001; **p < 0.001; *p < 0.05; NS, p > 0.05; comparisons to random (dashed lines) are indicated in orange and gray color. NS, not significant.

Annotating the top 50 RIGs for gene length, transcriptional activity, local gene density, and LAD association revealed that WT cell RIGs trend toward modest size (average 0.2 Mb), are expectedly relatively well expressed (Schroder et al., 2002), reside in gene-rich areas, and typically do not associate with the nuclear lamina (Figures 5A–5C and S5; Table S3). Similar to the analysis

of bulk integration sites, LKO cell RIGs associated with LADs to a greater extent than WT RIGs, yet this frequency was still less than expected from the MRC. By contrast, the vast majority of CKO cell and DKO cell RIGs were LAD associated (Figure 5C). CKO and DKO cell RIGs typically reside in gene-sparse regions (Figure S5; Table S3) and are comparatively large and transcriptionally less active than are the genes targeted by HIV-1 in WT cells (Figures 5A and 5B). LAD regions importantly remained transcriptionally repressed in CKO and DKO cells (Figure 5D), revealing little if any influence of global transcriptional reprogramming on the CPSF6-dependent relocalization and integration retargeting phenotypes documented here. Analysis of primary MDM (Saito et al., 2016) and CD4+ T cell (Table S1) integration datasets corroborated these observations. The genes that are preferentially targeted by the A77V CA mutant virus differ significantly from those targeted by WT HIV-1 (Figure S4B) and are enriched for LAD association (Figure S3F).

Figure 3. CPSF6-Dependent Pan-Nuclear HIV-1 Localization in Primary T Cells
(A) Representative Provirus ViewHIV images of infected CD4+ T cells from two blood donors at 24 hpi. Scale bar, 20 μm. Results (mean ± SEM from n = 2 independent experiments, each conducted in duplicate) compile 72–233 total HIV-1 foci across the indicated conditions.
(B and C) HIV-1 integration sites observed within 2.5 kb of LADs (B) and at cLADs versus ciLADs (C). Dashed lines denote MRC values. ***p < 0.0001; **p < 0.001; *p < 0.05; NS, p > 0.05; comparisons to random (dashed lines) are indicated by orange and gray coloring. NS, not significant.

To identify cell-type-independent RIGs, we expanded our analyses to include primary and transformed cells infected with WT virus ± CPSF6 knockdown (Sowd et al., 2016) or BI-D treatment, or infected with the A77V or N74D CA mutants that are defective for CPSF6 binding (Saito et al., 2016; Sowd et al., 2016) (224,429 genic sites in total; Tables S1 and S4). We identified three distinct RIG groups comprising genes that were repeatedly targeted by HIV-1 in at least three different cell types specifically under WT conditions (group 1), under LEDGF/p75- or CPSF6-deficient conditions (group 2), or specifically when CPSF6 targeting was disrupted (group 3) (Tables S3 and S4). Thirteen representative genes, six from group 1, three from group 2, and four from group 3, were highlighted for image analysis (Table S4). Group 1 genes, with the exception of MKL1, were classified as RIGs by Marini et al. (2015), while none of the shortlisted group 2 and group 3 genes were identified as RIGs in that study. Whereas group 1 genes computationally mapped to ciLADs, group 2 and group 3 genes associated with LADs (Table S5).

Bacterial artificial chromosomes (Table S5) were labeled for FISH in uninfected HEK293T and CD4+ T cells. As before, radial distance from the nuclear edge was calculated, and foci were binned into PN, MN, and CN zones. The majority of group 1 genes were dispersed throughout HEK293T and CD4+ T cell nuclei (Figures 6A and 6B). As a group, the average location of these genes in both cell types marginally favored the MN without any preference for PN localization (Figure 6C). Strikingly, all seven genes selected under CPSF6-deficient targeting conditions (groups 2 and 3) were highly enriched in the PN, with 84.0% of group 2 and 81.4% of group 3 genes localized to the PN area of T cell nuclei. NPLOC4 was an exception among group 1 genes, exhibiting a notably different localization pattern (p < 0.0001) in both cell types (Figure 6B). Analogous preferential distributions of RIGs NPLOC4 and KDM2A to PN and MN areas, respectively, were recorded by Marini et al. (2015).

DISCUSSION

Nuclear Distribution of PICs and Proviral DNA during Acute HIV-1 Infection

Studies that have mapped intranuclear locations of chromosomal regions by FISH (Boyle et al., 2001; Cremer et al., 2001,

Figure 4. RIG Selection as a Function of Integration Targeting Pathway
(A–D) Heatmaps compare integration frequency of the top 50 targeted genes in the indicated cell type across all cell types and the MRC. Maps are colored based on Z score values; darker shades of blue denote values enriched compared with the MRC and lighter shades of blue are depleted compared with the MRC. See Table S3 for integration frequencies and p values.
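As a rough illustration of the Z score coloring described in the legend, the sketch below standardizes one gene's integration frequencies across conditions. How the Z scores were actually computed is not stated in this section. Row-wise standardization with the population standard deviation is an assumption, and the function name `row_z_scores` and the input values are hypothetical.

```python
import statistics


def row_z_scores(frequencies):
    """Standardize one gene's integration frequencies across conditions.

    Assumption: Z = (value - row mean) / row population SD. A row with no
    variation is returned as all zeros rather than dividing by zero.
    """
    mean = statistics.fmean(frequencies)
    sd = statistics.pstdev(frequencies)
    if sd == 0:
        return [0.0 for _ in frequencies]
    return [(f - mean) / sd for f in frequencies]


# Hypothetical integration frequencies (%) for a single gene; not study data.
toy_row = {"WT": 0.9, "LKO": 0.5, "CKO": 0.1, "DKO": 0.1, "MRC": 0.4}
z = dict(zip(toy_row, row_z_scores(list(toy_row.values()))))
for condition, score in z.items():
    print(f"{condition}: {score:+.2f}")
```

With this convention, a condition whose frequency exceeds the row mean receives a positive score and would map to a darker shade, while a condition below the mean receives a negative score.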

Figure 5. Characterization of Pathway-Specific RIGs
(A) Bee-swarm plots show lengths of top 50 genes targeted by HIV-1 in the noted cell type or the MRC.
(B) Box-and-whiskers plots show average gene expression values for the RIGs noted in (A). Whiskers include 10th–90th percentiles.
(C) Proportion of top 50 genes showing LAD association. The computational pipeline (in the absence of infection) enriches for long genes (A), accounting for the high proportion of LAD association among MRC RIGs. Combined proportions of LAD and non-LAD RIGs were statistically compared with combined WT or MRC values.
(D) Expression values of chromatin outside versus inside LAD regions in indicated HEK293T cell type for n = 3 independent experiments (error bars, SD). TPM, transcripts per million reads. ***p < 0.001; **p < 0.01; *p < 0.05; NS, p > 0.05 (black, compared with WT or indicated sample; gray, compared with MRC).

2003), chromosome conformation capture (Di Stefano et al., 2016; Kalhor et al., 2012), or block-face scanning electron microscopy (Chen et al., 2017) have concluded that areas relatively enriched in genes preferentially partition to the interior region of the nucleus, while gene-sparse regions tend to associate with the periphery. As it is well established that HIV-1 integration overwhelmingly targets gene-rich chromosomal regions (Koh et al., 2013; Ocwieja et al., 2011; Schaller et al., 2011; Schroder et al., 2002; Sowd et al., 2016) (Figure S5), such observations are inconsistent with the nuclear periphery playing a major role in determining integration site selection. Indeed, our data lead us to conclude that HIV-1 does not preferentially target the nuclear periphery under baseline infection conditions.

Although the LEDGF/p75-IN interaction is crucial in targeting HIV-1 integration along gene bodies (Ciuffi et al., 2005; Marshall et al., 2007; Shun et al., 2007b; Singh et al., 2015; Sowd et al., 2016), our data, in agreement with other recent findings (Burdick et al., 2017; Quercioli et al., 2016), suggest that this does not play a significant role in dictating viral intranuclear localization. Given the contrastingly dramatic role for CPSF6, we propose a model (Figure 7) whereby the CA-CPSF6 interaction licenses the HIV-1 PIC to move beyond the nuclear periphery and penetrate into the nucleus to locate gene-rich regions and LEDGF/p75 for integration into gene bodies. Although both factors are known to play significant roles to determine precise sites of HIV-1 integration, the model accounts for the vastly different roles observed for these proteins at the macroscale of gross nuclear structure. Although we suspect that most HIV-1 integration events follow this scheme, not all have to, and indeed some may occur without the virus interacting with either cofactor. It remains interesting to test whether CPSF6 accompanies PICs as they move into the nucleus, or whether IN-LEDGF/p75 engagement a priori follows CA binding to CPSF6.

While most of the results generated here using bDNA hybridization indicate fairly random HIV-1 DNA dispersion across nuclear zones, the CN area assessed by SCIP was comparatively unpopulated at 24 hpi, which is similar to prior mappings of HIV-1 PIC and proviral localization. At present, it is unclear why bDNA hybridization indicated a greater extent of CN area localization. Hybridization was conducted on fixed samples, such that bDNA probes can freely access target sequences independent of native higher-order structure. I-SceI endonuclease for the SCIP assay was introduced into cells by transient transfection, and it is unclear whether the enzyme accesses all nuclear zones equally under this condition. At 4 days post-infection (dpi), unintegrated vDNA principally localized to MN and CN regions (Marini et al., 2015), leaving open the possibility that unintegrated vDNA could contribute to CN populations observed here at 24 hpi, although we note that we were unable to confirm preferential MN/CN localization of unintegrated vDNA at 4 dpi by bDNA hybridization (Figure S1). Regardless, we failed to detect preferential PN localization using either the Provirus ViewHIV assay, which predominantly detects integrated HIV-1 (Figure S1), or the SCIP assay, which exclusively scores for integration (Figure 2B). We also note that visual inspection of images of MDM-infected cells is consistent with the notion that HIV-1 disperses fairly randomly throughout cell nuclei (Stultz et al., 2017).

It is unclear why some prior studies preferentially mapped integration to the PN area, although the duration of infection before imaging can surely influence results (Di Primio et al., 2013). We focused on 24 hpi, when the bulk of integration by some measures has occurred (Brussel and Sonigo, 2003; Mohammadi et al., 2013) and the least possible number of cells had undergone mitosis. As FISH was performed at 4 dpi (Marini et al., 2015), it seems possible that proviruses could significantly reorganize within the nucleus between 1 and 4 dpi. We noted marginal reorganization, from random to the MN area of HeLa cell nuclei, between 1 and 4 dpi (Figure S1), as well as preferential MN localization at 24 hpi in some bDNA hybridization experiments (Figures 1A and S4A). Correlating positions of HIV-1 integration, which were determined using genomic DNA harvested at 5 dpi, with fractional PN localization at 24 hpi across samples underscored the relatively static nature of PN proviral populations over this time frame. Fractional PN area localization


correlated strongly with LAD and cLAD proximal integration targeting, and negatively correlated with integration at ciLADs (Figure S6A). The spatial organization of HIV-1 in nuclei at 24 hpi under WT versus CPSF6-deficient conditions furthermore agreed with the inherent localizations of genes that are repeatedly targeted under these respective conditions.

Figure 7. Intranuclear PIC Distribution and HIV-1 Integration Site Selection
Peripheral, mid, and central topological nuclear sections are illustrated in WT cells (A) versus cells depleted for CPSF6 (B). A representative chromatin fiber is shown in blue, with nuclear lamina in gray. Background blue gradient represents relative concentration of gene-dense chromatin toward the nuclear interior. PICs contain orange spheres connected by a black DNA loop. CPSF6 and LEDGF/p75 are shown as black triangles and green circles, respectively.

There are indications that integration in proximity to the NPC provides a favorable environment for HIV-1 gene expression (Lelek et al., 2015). Although beyond the scope of this study, it will be of interest to determine whether transcriptionally active proviruses disperse equally throughout the nucleus or perhaps prefer the PN area, and how such phenotypes might change in response to external stimuli and/or time post-infection.

PIC Nuclear Import and HIV-1 Integration Targeting

Observations that HIV-1 avoids integration into LADs have fueled models whereby PICs instead target peripheral chromatin in association with NPCs (Lelek et al., 2015; Lusic and Siliciano, 2017), which includes transcriptionally active regions devoid of heterochromatin (Ibarra et al., 2016; Schermelleh et al., 2008). However, correlation of integration sites with chromatin associated with two NPC components, NUP153 and NUP93 (Ibarra et al., 2016), indicates that HIV-1 only modestly targets these regions. Enrichment of integration sites, compared with the MRC, was not evident within 10 kb of NUP153-associated regions in different cell types, whereas HIV-1 marginally favored integration near NUP93 associated regions in two of three analyzed cell types (Figure S7). Extending the analysis to 1 Mb from NUP153- and NUP93-associated regions corroborated insignificant and minor enrichment near these respective marks (Figure S7). Disruption of CPSF6-dependent targeting, moreover, typically shifted integration away from the NUP-associated regions (Figure S7), indicating that the observed nuclear redistributions are primarily driven through the associations of HIV-1 with LADs. Consistent with this analysis, none of the shortlisted group 1 RIGs mapped to NUP93 or NUP153-associated regions, while group 2 EYS and group 3 FBXL17 genes were associated with NUP153 (Table S5).

NPC proximal integration (Lelek et al., 2015; Marini et al., 2015) seemingly agreed with models linking HIV-1 PIC nuclear import and integration targeting (Di Nunzio, 2013). Among the known nuclear import cofactors, CPSF6 would seem to play the dominant role in integration site selection. Correlating integration datasets from cells knocked down for TNPO3, NUP358 (Ocwieja et al., 2011), NUP153 (Koh et al., 2013), or CPSF6 (Sowd et al., 2016) (Table S1) revealed that NUP358 and TNPO3 play comparatively minor roles in preventing HIV-1 integration at LADs and cLADs, with no discernible role for NUP153 (Figures S6B and S6C). TNPO3 binds CPSF6 and is its likely β-karyopherin import cofactor (Maertens et al., 2014), indicating that the TNPO3 phenotype is CPSF6 dependent (De Iaco et al., 2013; Fricke et al., 2013). We would accordingly speculate that CPSF6, uniquely among known nuclear import cofactors, is poised to guide the PIC after nuclear entry.

CPSF6 as part of the cleavage factor I complex regulates mRNA polyadenylation and is typically found in association with subnuclear components such as speckles and paraspeckles (Cardinale et al., 2007; Dettwiler et al., 2004), which locate throughout the nucleoplasm and are enriched for mRNA processing and transport factors (Fox and Lamond, 2010; Galganski et al., 2017). Importantly, transcriptionally active genes, favored targets for HIV-1 integration, preferentially localize near speckles (Galganski et al., 2017). Although our recent work has indicated that the role of CPSF6 in integration targeting is independent of its role in polyadenylation regulation (Rasheedi et al., 2016; Sowd et al., 2016), KewalRamani and colleagues have determined an important role for CPSF6 in PIC trafficking to speckles for integration (personal communication). Thus, HIV-1 has apparently evolved to hijack CPSF6 to avoid peripheral heterochromatin and penetrate into nuclei to locate transcriptionally active gene-dense regions of chromatin for integration.

STAR+METHODS

Detailed methods are provided in the online version of this paper and include the following:
• KEY RESOURCES TABLE
• CONTACT FOR REAGENT AND RESOURCE SHARING

Figure 6. Inherent Gene Localization in Uninfected Cells
(A) Radial locations of visualized RIGs in uninfected HEK293T and CD4+ T cells from donor A. Representative images of nuclei stained by DAPI (gray pseudocolor) along with indicated gene-specific foci. Scale bars, 3 μm.
(B) Stacked column graphs show distributions of 101–238 total gene foci binned into three nuclear zones. Orange lines indicate random PN and MN distributions.
(C) Zonal analyses of group 1–3 genes as compiled gene sets. ***p < 0.0001; **p < 0.001; *p < 0.05; NS, p > 0.05 (orange, gene set compared with random; black, cross-gene-set comparisons).
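The zonal binning described in this legend can be illustrated with a minimal sketch. It rests on assumptions that are not stated in this section: that the PN, MN, and CN zones are concentric and of equal area in an idealized circular nuclear section (so that a random distribution places one-third of foci in each zone), and that each focus is summarized by a normalized radial position `r` (0 at the nuclear center, 1 at the edge). The function names are hypothetical, and the chi-square statistic is used here only as a generic goodness-of-fit measure, not as the statistical test applied in the paper.

```python
import math
from collections import Counter

# Assumption: three concentric zones of equal area in a circular section,
# so the zone boundaries fall at sqrt(1/3) and sqrt(2/3) of the radius.
CN_MN_BOUNDARY = math.sqrt(1 / 3)
MN_PN_BOUNDARY = math.sqrt(2 / 3)
ZONES = ("PN", "MN", "CN")


def zone_of(r):
    """Return 'CN', 'MN', or 'PN' for a normalized radial position r in [0, 1]."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    if r < CN_MN_BOUNDARY:
        return "CN"
    if r < MN_PN_BOUNDARY:
        return "MN"
    return "PN"


def zone_fractions(radii):
    """Fraction of foci falling in each zone (hypothetical helper)."""
    if not radii:
        raise ValueError("at least one focus is required")
    counts = Counter(zone_of(r) for r in radii)
    return {zone: counts.get(zone, 0) / len(radii) for zone in ZONES}


def chi_square_vs_random(radii):
    """Pearson chi-square statistic against equal expected counts per zone."""
    if not radii:
        raise ValueError("at least one focus is required")
    counts = Counter(zone_of(r) for r in radii)
    expected = len(radii) / 3  # one-third per zone under the equal-area assumption
    return sum((counts.get(zone, 0) - expected) ** 2 / expected for zone in ZONES)
```

Under these assumptions, a gene set whose foci cluster near the nuclear edge would show a PN fraction well above one-third and a correspondingly large chi-square statistic, whereas randomly dispersed foci would yield fractions near one-third in every zone.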

• EXPERIMENTAL MODEL AND SUBJECT DETAILS
  ◦ Human Cells
  ◦ Viruses
• METHOD DETAILS
  ◦ Plasmid Constructs
  ◦ Cell Propagation and Virus Production
  ◦ HIV-1 Infection
  ◦ Imaging Assays
  ◦ Determination and Analyses of HIV-1 Integration Sites
• QUANTIFICATION AND STATISTICAL ANALYSIS
  ◦ Viral Signal Quantification and Determination of Intranuclear Distance Travelled by Foci
  ◦ Statistical Considerations and Analyses
• DATA AND SOFTWARE AVAILABILITY

SUPPLEMENTAL INFORMATION

Supplemental Information includes seven figures and six tables and can be found with this article online at https://doi.org/10.1016/j.chom.2018.08.002.

ACKNOWLEDGMENTS

We thank Bas van Steensel for sharing genomic coordinates of cLAD and ciLAD regions, and David Levy for the generous gift of pNLENG1-ES-IRES DNA. This work was supported by NIH grants R01 AI052014 (to A.N.E.), R01 AI077344 (to E.M.P.), P30 AI060354 (Harvard CFAR), and T32 AI007245 (to G.A.S.). X.W. was supported by the NCI and NIH under contract no. HHSN261200800001E. S.H.H. was supported by the Intramural Research Program of the NIH and NCI. A.L.B. was supported by the Bill and Melinda Gates Foundation (OPP1097381) and Gilead Sciences.

AUTHOR CONTRIBUTIONS

S.H.H., S.G.S., A.L.B., and A.N.E. conceived and designed the experiments; V.A., J.M.P., G.A.S., W.M.D., M.P.-C., A.P.-H., and A.S.M. performed the experiments; H.J.F. and E.M.P. provided reagents; V.A., J.M.P., G.A.S., W.M.D., X.W., S.H.H., S.G.S., A.L.B., and A.N.E. analyzed the data; and V.A. and A.N.E. wrote the manuscript with input from all authors.

DECLARATION OF INTERESTS

The authors declare no competing interests.

Received: February 9, 2018
Revised: June 22, 2018
Accepted: August 1, 2018
Published: August 30, 2018

SUPPORTING CITATIONS

The following reference appears in the Supplemental Information: Croft et al. (1999).

REFERENCES

Achuthan, V., Keith, B.J., Connolly, B.A., and DeStefano, J.J. (2014). Human immunodeficiency virus reverse transcriptase displays dramatically higher fidelity under physiological magnesium conditions in vitro. J. Virol. 88, 8514–8527.

Albanese, A., Arosio, D., Terreni, M., and Cereseto, A. (2008). HIV-1 pre-integration complexes selectively target decondensed chromatin in the nuclear periphery. PLoS One 3, e2413.

Boyle, S., Gilchrist, S., Bridger, J.M., Mahy, N.L., Ellis, J.A., and Bickmore, W.A. (2001). The spatial organization of human chromosomes within the nuclei of normal and emerin-mutant cells. Hum. Mol. Genet. 10, 211–219.

Brussel, A., and Sonigo, P. (2003). Analysis of early human immunodeficiency virus type 1 DNA synthesis by use of a new sensitive assay for quantifying integrated provirus. J. Virol. 77, 10119–10124.

Burdick, R.C., Delviks-Frankenberry, K.A., Chen, J., Janaka, S.K., Sastri, J., Hu, W.S., and Pathak, V.K. (2017). Dynamics and regulation of nuclear import and nuclear movements of HIV-1 complexes. PLoS Pathog. 13, e1006570.

Burdick, R.C., Hu, W.S., and Pathak, V.K. (2013). Nuclear import of APOBEC3F-labeled HIV-1 preintegration complexes. Proc. Natl. Acad. Sci. USA 110, E4780–E4789.

Cardinale, S., Cisterna, B., Bonetti, P., Aringhieri, C., Biggiogera, M., and Barabino, S.M. (2007). Subnuclear localization and dynamics of the pre-mRNA 3′ end processing factor mammalian cleavage factor I 68-kDa subunit. Mol. Biol. Cell 18, 1282–1292.

Chen, B., Yusuf, M., Hashimoto, T., Estandarte, A.K., Thompson, G., and Robinson, I. (2017). Three-dimensional positioning and structure of chromosomes in a human prophase nucleus. Sci. Adv. 3, e1602231.

Cherepanov, P., Maertens, G., Proost, P., Devreese, B., Van Beeumen, J., Engelborghs, Y., De Clercq, E., and Debyser, Z. (2003). HIV-1 integrase forms stable tetramers and associates with LEDGF/p75 protein in human cells. J. Biol. Chem. 278, 372–381.

Chin, C.R., Perreira, J.M., Savidis, G., Portmann, J.M., Aker, A.M., Feeley, E.M., Smith, M.C., and Brass, A.L. (2015). Direct visualization of HIV-1 replication intermediates shows that capsid and CPSF6 modulate HIV-1 intra-nuclear invasion and integration. Cell Rep 13, 1717–1731.

Ciuffi, A., Llano, M., Poeschla, E., Hoffmann, C., Leipzig, J., Shinn, P., Ecker, J.R., and Bushman, F. (2005). A role for LEDGF/p75 in targeting HIV DNA integration. Nat. Med. 11, 1287–1289.

Craigie, R., and Bushman, F.D. (2014). Host factors in retroviral integration and the selection of integration target sites. Microbiol. Spectr. 2, 6.

Cremer, M., Küpper, K., Wagler, B., Wizelman, L., Hase, J.v., Weiland, Y., Kreja, L., Diebold, J., Speicher, M.R., and Cremer, T. (2003). Inheritance of gene density-related higher order chromatin arrangements in normal and tumor cell nuclei. J. Cell Biol. 162, 809–820.

Cremer, M., von Hase, J., Volm, T., Brero, A., Kreth, G., Walter, J., Fischer, C., Solovei, I., Cremer, C., and Cremer, T. (2001). Non-random radial higher-order chromatin arrangements in nuclei of diploid human cells. Chromosome Res. 9, 541–567.

Croft, J.A., Bridger, J.M., Boyle, S., Perry, P., Teague, P., and Bickmore, W.A. (1999). Differences in the localization and morphology of chromosomes in the human nucleus. J. Cell Biol. 145, 1119–1131.

Daugaard, M., Baude, A., Fugger, K., Povlsen, L.K., Beck, H., Sørensen, C.S., Petersen, N.H.T., Sorensen, P.H.B., Lukas, C., Bartek, J., et al. (2012). LEDGF (p75) promotes DNA-end resection and homologous recombination. Nat. Struct. Mol. Biol. 19, 803–810.

De Iaco, A., Santoni, F., Vannier, A., Guipponi, M., Antonarakis, S., and Luban, J. (2013). TNPO3 protects HIV-1 replication from CPSF6-mediated capsid stabilization in the host cell cytoplasm. Retrovirology 10, 20.

Dettwiler, S., Aringhieri, C., Cardinale, S., Keller, W., and Barabino, S.M. (2004). Distinct sequence motifs within the 68-kDa subunit of cleavage factor Im mediate RNA binding, protein-protein interactions, and subcellular localization. J. Biol. Chem. 279, 35788–35797.

Dharan, A., Talley, S., Tripathi, A., Mamede, J.I., Majetschak, M., Hope, T.J., and Campbell, E.M. (2016). KIF5B and Nup358 cooperatively mediate the nuclear import of HIV-1 during infection. PLoS Pathog. 12, e1005700.

Di Nunzio, F. (2013). New insights in the role of nucleoporins: a bridge leading to concerted steps from HIV-1 nuclear entry until integration. Virus Res. 178, 187–196.

Di Nunzio, F., Fricke, T., Miccio, A., Valle-Casuso, J.C., Perez, P., Souque, P., Rizzi, E., Severgnini, M., Mavilio, F., Charneau, P., et al. (2013). Nup153 and Nup98 bind the HIV-1 core and contribute to the early steps of HIV-1 replication. Virology 440, 8–18.

Di Primio, C., Quercioli, V., Allouch, A., Gijsbers, R., Christ, F., Debyser, Z., Arosio, D., and Cereseto, A. (2013). Single-cell imaging of HIV-1 provirus (SCIP). Proc. Natl. Acad. Sci. USA 110, 5636–5641.

Di Stefano, M., Paulsen, J., Lien, T.G., Hovig, E., and Micheletti, C. (2016). Hi-C-constrained physical models of human chromosomes recover functionally-related properties of genome organization. Sci. Rep. 6, 35985.

Fadel, H.J., Morrison, J.H., Saenz, D.T., Fuchs, J.R., Kvaratskhelia, M., Ekker, S.C., and Poeschla, E.M. (2014). TALEN knockout of the PSIP1 gene in human cells: analyses of HIV-1 replication and allosteric integrase inhibitor mechanism. J. Virol. 88, 9704–9717.

Feng, L., Dharmarajan, V., Serrao, E., Hoyte, A., Larue, R.C., Slaughter, A., Sharma, A., Plumb, M.R., Kessl, J.J., Fuchs, J.R., et al. (2016). The competitive interplay between allosteric HIV-1 integrase inhibitor BI/D and LEDGF/p75 during the early stage of HIV-1 replication adversely affects inhibitor potency. ACS Chem. Biol. 11, 1313–1321.

Fox, A.H., and Lamond, A.I. (2010). Paraspeckles. Cold Spring Harb. Perspect. Biol. 2, a000687.

Francis, A.C., and Melikyan, G.B. (2018). Single HIV-1 imaging reveals progression of infection through CA-dependent steps of docking at the nuclear pore, uncoating, and nuclear transport. Cell Host Microbe 23, 536–548.e6.

Frazee, A.C., Pertea, G., Jaffe, A.E., Langmead, B., Salzberg, S.L., and Leek, J.T. (2015). Ballgown bridges the gap between transcriptome assembly and expression analysis. Nat. Biotechnol. 33, 243–246.

Fricke, T., Valle-Casuso, J.C., White, T.E., Brandariz-Nuñez, A., Bosche, W.J., Reszka, N., Gorelick, R., and Diaz-Griffero, F. (2013). The ability of TNPO3-depleted cells to inhibit HIV-1 infection requires CPSF6. Retrovirology 10, 46.

Galganski, L., Urbanek, M.O., and Krzyzosiak, W.J. (2017). Nuclear speckles: molecular organization, biological function and role in disease. Nucleic Acids Res. 45, 10350–10368.

Guelen, L., Pagie, L., Brasset, E., Meuleman, W., Faza, M.B., Talhout, W., Eussen, B.H., de Klein, A., Wessels, L., de Laat, W., et al. (2008). Domain organization of human chromosomes revealed by mapping of nuclear lamina interactions. Nature 453, 948–951.

Ibarra, A., Benner, C., Tyagi, S., Cool, J., and Hetzer, M.W. (2016). Nucleoporin-mediated regulation of cell identity genes. Genes Dev. 30, 2253–2258.

Jurado, K.A., Wang, H., Slaughter, A., Feng, L., Kessl, J.J., Koh, Y., Wang, W., Ballandras-Colas, A., Patel, P.A., Fuchs, J.R., et al. (2013). Allosteric integrase inhibitor potency is determined through the inhibition of HIV-1 particle maturation. Proc. Natl. Acad. Sci. USA 110, 8690–8695.

Kalhor, R., Tjong, H., Jayathilaka, N., Alber, F., and Chen, L. (2012). Genome architectures revealed by tethered chromosome conformation capture and population-based modeling. Nat. Biotechnol. 30, 90–98.

Kim, D., Langmead, B., and Salzberg, S.L. (2015). HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360.

Kind, J., Pagie, L., de Vries, S.S., Nahidiazar, L., Dey, S.S., Bienko, M., Zhan, Y., Lajoie, B., de Graaf, C.A., Amendola, M., et al. (2015). Genome-wide maps of nuclear lamina interactions in single human cells. Cell 163, 134–147.

Koh, Y., Wu, X., Ferris, A.L., Matreyek, K.A., Smith, S.J., Lee, K., KewalRamani, V.N., Hughes, S.H., and Engelman, A. (2013). Differential effects of human immunodeficiency virus type 1 capsid and cellular factors nucleoporin 153 and LEDGF/p75 on the efficiency and specificity of viral DNA integration. J. Virol. 87, 648–658.

König, R., Zhou, Y., Elleder, D., Diamond, T.L., Bonamy, G.M., Irelan, J.T., Chiang, C.Y., Tu, B.P., De Jesus, P.D., Lilley, C.E., et al. (2008). Global analysis of host-pathogen interactions that regulate early-stage HIV-1 replication. Cell 135, 49–60.

Lee, K., Ambrose, Z., Martin, T.D., Oztop, I., Mulky, A., Julias, J.G., Vandegraaff, N., Baumann, J.G., Wang, R., Yuen, W., et al. (2010). Flexible use of nuclear import pathways by HIV-1. Cell Host Microbe 7, 221–233.

Lelek, M., Casartelli, N., Pellin, D., Rizzi, E., Souque, P., Severgnini, M., Di Serio, C., Fricke, T., Diaz-Griffero, F., Zimmer, C., et al. (2015). Chromatin organization at the nuclear pore favours HIV replication. Nat. Commun. 6, 6483.

Levy, D.N., Aldrovandi, G.M., Kutsch, O., and Shaw, G.M. (2004). Dynamics of HIV-1 recombination in its natural target cells. Proc. Natl. Acad. Sci. USA 101, 4204–4209.

Li, H., and Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760.

Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., and Durbin, R.; 1000 Genome Project Data Processing Subgroup (2009). The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079.

Limón, A., Nakajima, N., Lu, R., Ghory, H.Z., and Engelman, A. (2002). Wild-type levels of nuclear localization and human immunodeficiency virus type 1 replication in the absence of the central DNA flap. J. Virol. 76, 12078–12086.

Lusic, M., and Siliciano, R.F. (2017). Nuclear landscape of HIV-1 infection and integration. Nat. Rev. Microbiol. 15, 69–82.

Maertens, G.N., Cook, N.J., Wang, W., Hare, S., Gupta, S.S., Öztop, I., Lee, K., Pye, V.E., Cosnefroy, O., Snijders, A.P., et al. (2014). Structural basis for nuclear import of splicing factors by human Transportin 3. Proc. Natl. Acad. Sci. USA 111, 2728–2733.

Marini, B., Kertesz-Farkas, A., Ali, H., Lucic, B., Lisek, K., Manganaro, L., Pongor, S., Luzzati, R., Recchia, A., Mavilio, F., et al. (2015). Nuclear architecture dictates HIV-1 integration site selection. Nature 521, 227–231.

Marshall, H.M., Ronen, K., Berry, C., Llano, M., Sutherland, H., Saenz, D., Bickmore, W., Poeschla, E., and Bushman, F.D. (2007). Role of PSIP1/LEDGF/p75 in lentiviral infectivity and integration targeting. PLoS One 2, e1340.

Meuleman, W., Peric-Hupkes, D., Kind, J., Beaudry, J.B., Pagie, L., Kellis, M., Reinders, M., Wessels, L., and van Steensel, B. (2013). Constitutive nuclear lamina-genome interactions are highly conserved and associated with A/T-rich sequence. Genome Res. 23, 270–280.

Mohammadi, P., Desfarges, S., Bartha, I., Joos, B., Zangger, N., Muñoz, M., Günthard, H.F., Beerenwinkel, N., Telenti, A., and Ciuffi, A. (2013). 24 hours in the life of HIV-1 in a T cell line. PLoS Pathog. 9, e1003161.

Nagai, S., Dubrana, K., Tsai-Pflugfelder, M., Davidson, M.B., Roberts, T.M., Brown, G.W., Varela, E., Hediger, F., Gasser, S.M., and Krogan, N.J. (2008). Functional targeting of DNA damage to a nuclear pore-associated SUMO-dependent ubiquitin ligase. Science 322, 597–602.

Ocwieja, K.E., Brady, T.L., Ronen, K., Huegel, A., Roth, S.L., Schaller, T., James, L.C., Towers, G.J., Young, J.A., Chanda, S.K., et al. (2011). HIV integration targeting: a pathway involving Transportin-3 and the nuclear pore protein RanBP2. PLoS Pathog. 7, e1001313.

Peng, K., Muranyi, W., Glass, B., Laketa, V., Yant, S.R., Tsai, L., Cihlar, T., Muller, B., and Krausslich, H.G. (2014). Quantitative microscopy of functional HIV post-entry complexes reveals association of replication with the viral capsid. Elife 3, e04114.

Peric-Hupkes, D., Meuleman, W., Pagie, L., Bruggeman, S.W., Solovei, I., Brugman, W., Graf, S., Flicek, P., Kerkhoven, R.M., van Lohuizen, M., et al. (2010). Molecular maps of the reorganization of genome-nuclear lamina interactions during differentiation. Mol. Cell 38, 603–613.

Pertea, M., Kim, D., Pertea, G.M., Leek, J.T., and Salzberg, S.L. (2016). Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat. Protoc. 11, 1650–1667.

Pertea, M., Pertea, G.M., Antonescu, C.M., Chang, T.C., Mendell, J.T., and Salzberg, S.L. (2015). StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295.

Puray-Chavez, M., Tedbury, P.R., Huber, A.D., Ukah, O.B., Yapo, V., Liu, D., Ji, J., Wolf, J.J., Engelman, A.N., and Sarafianos, S.G. (2017). Multiplex single-cell visualization of nucleic acids and protein during HIV infection. Nat. Commun. 8, 1882.

Quercioli, V., Di Primio, C., Casini, A., Mulder, L.C., Vranckx, L.S., Borrenberghs, D., Gijsbers, R., Debyser, Z., and Cereseto, A. (2016). Comparative analysis of HIV-1 and murine leukemia virus three-dimensional nuclear distributions. J. Virol. 90, 5205–5209.

Quinlan, A.R., and Hall, I.M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842.

Rasheedi, S., Shun, M.-C., Serrao, E., Sowd, G.A., Qian, J., Hao, C., Dasgupta, T., Engelman, A.N., and Skowronski, J. (2016). The cleavage and polyadenylation specificity factor 6 (CPSF6) subunit of the capsid-recruited pre-messenger RNA cleavage factor I (CFIm) complex mediates HIV-1 integration into genes. J. Biol. Chem. 291, 11809–11819.

Saito, A., Henning, M.S., Serrao, E., Dubose, B.N., Teng, S., Huang, J., Li, X., Saito, N., Roy, S.P., Siddiqui, M.A., et al. (2016). Capsid-CPSF6 interaction is dispensable for HIV-1 replication in primary cells but is selected during virus passage in vivo. J. Virol. 90, 6918–6935.

Schaller, T., Ocwieja, K.E., Rasaiyaah, J., Price, A.J., Brady, T.L., Roth, S.L., Hue, S., Fletcher, A.J., Lee, K., KewalRamani, V.N., et al. (2011). HIV-1 capsid-cyclophilin interactions determine nuclear import pathway, integration targeting and replication efficiency. PLoS Pathog. 7, e1002439.

Schermelleh, L., Carlton, P.M., Haase, S., Shao, L., Winoto, L., Kner, P., Burke, B., Cardoso, M.C., Agard, D.A., Gustafsson, M.G., et al. (2008). Subdiffraction multicolor imaging of the nuclear periphery with 3D structured illumination microscopy. Science 320, 1332–1336.

Schroder, A.R.W., Shinn, P., Chen, H., Berry, C., Ecker, J.R., and Bushman, F. (2002). HIV-1 integration in the human genome favors active genes and local hotspots. Cell 110, 521–529.

Serrao, E., Cherepanov, P., and Engelman, A.N. (2016). Amplification, next-generation sequencing, and genomic DNA mapping of retroviral integration sites. J. Vis. Exp. https://doi.org/10.3791/53840.

Serrao, E., and Engelman, A.N. (2016). Sites of retroviral DNA integration: from basic research to clinical applications. Crit. Rev. Biochem. Mol. Biol. 51, 26–42.

Serrao, E., Krishnan, L., Shun, M.C., Li, X., Cherepanov, P., Engelman, A., and Maertens, G.N. (2014). Integrase residues that determine nucleotide preferences at sites of HIV-1 integration: implications for the mechanism of target DNA binding. Nucleic Acids Res. 42, 5164–5176.

Shun, M.C., Daigle, J.E., Vandegraaff, N., and Engelman, A. (2007a). Wild-type levels of human immunodeficiency virus type 1 infectivity in the absence of cellular emerin protein. J. Virol. 81, 166–172.

Shun, M.C., Raghavendra, N.K., Vandegraaff, N., Daigle, J.E., Hughes, S., Kellam, P., Cherepanov, P., and Engelman, A. (2007b). LEDGF/p75 functions downstream from preintegration complex formation to effect gene-specific HIV-1 integration. Genes Dev. 21, 1767–1778.

Singh, P.K., Plumb, M.R., Ferris, A.L., Iben, J.R., Wu, X., Fadel, H.J., Luke, B.T., Esnault, C., Poeschla, E.M., Hughes, S.H., et al. (2015). LEDGF/p75 interacts with mRNA splicing factors and targets HIV-1 integration to highly spliced genes. Genes Dev. 29, 2287–2297.

Sowd, G.A., Serrao, E., Wang, H., Wang, W., Fadel, H.J., Poeschla, E.M., and Engelman, A.N. (2016). A critical role for alternative polyadenylation factor CPSF6 in targeting HIV-1 integration to transcriptionally active chromatin. Proc. Natl. Acad. Sci. USA 113, E1054–E1063.

Stultz, R.D., Cenker, J.J., and McDonald, D. (2017). Imaging HIV-1 genomic DNA from entry through productive infection. J. Virol. 91, e00034.

Towbin, B.D., Gonzalez-Aguilera, C., Sack, R., Gaidatzis, D., Kalck, V., Meister, P., Askjaer, P., and Gasser, S.M. (2012). Step-wise methylation of histone H3K9 positions heterochromatin at the nuclear periphery. Cell 150, 934–947.

Vranckx, L.S., Demeulemeester, J., Saleh, S., Boll, A., Vansant, G., Schrijvers, R., Weydert, C., Battivelli, E., Verdin, E., Cereseto, A., et al. (2016). LEDGIN-mediated inhibition of integrase-LEDGF/p75 interaction reduces reactivation of residual latent HIV. EBioMedicine 8, 248–264.

Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis (Springer-Verlag).

Zhang, H., Meltzer, P., and Davis, S. (2013). RCircos: an R package for Circos 2D track plots. BMC Bioinformatics 14, 244.

STAR+METHODS

KEY RESOURCES TABLE

REAGENT or RESOURCE | SOURCE | IDENTIFIER

Antibodies
Anti-CA monoclonal antibody AG3.0 | NIH AIDS Reagent Program | Cat #4121; RRID: AB_2734137
Goat anti-mouse secondary antibody Alexa Fluor 488 (Invitrogen) | Thermo Fisher Scientific | Cat #A-11029; RRID: AB_2534088
Anti-Phospho-histone H2A.X (Ser139) clone JBW301 | EMD Millipore | Cat #05-636; RRID: AB_309864
Goat anti-mouse secondary antibody Alexa Fluor 594 (Invitrogen) | Thermo Fisher Scientific | Cat #A-11032; RRID: AB_2534091
Anti-CPSF6 rabbit polyclonal antibody | Bethyl Laboratories | Cat #A301-358A; RRID: AB_937785
Monoclonal anti-β-actin-peroxidase antibody | Sigma-Aldrich | Cat #A3854-200UL; RRID: AB_262011

Bacterial and Virus Strains
NEB 5-alpha Competent E. coli (High Efficiency) | New England Biolabs | Cat #C2987I
HIV-1NL4-3 | Generated using the listed recombinant plasmids | N/A

Chemicals, Peptides, and Recombinant Proteins
VectaShield mounting media with DAPI | Vector Laboratories | Cat #H-1200
Dulbecco's phosphate buffered saline (DPBS) | Sigma-Aldrich | Cat #D8537
DPBS (for ViewHIV assays) | Boston BioProducts | Cat #BB-220DM
70% formamide | Ambion | Cat #AM9342, AM9344
Trisodium citrate | Sigma-Aldrich | Cat #S4641
NaCl | Thermo Fisher Scientific | Cat #BP358
HCl | Sigma-Aldrich | Cat #H1758
4% paraformaldehyde | Sigma-Aldrich | Cat #F8775
100% ethanol | Decon Laboratories | Cat #2801
Bovine serum albumin | BioPharm Laboratories | Cat #71-010
Glycine | Sigma | Cat #G7126
Glacial acetic acid | Thermo Fisher Scientific | Cat #A38-500
Methanol | Sigma-Aldrich | Cat #179337
EDTA | Sigma-Aldrich | Cat #E5134
Tris | Thermo Fisher Scientific | Cat #BP152-5
Digest All-3 pepsin (Invitrogen) | Thermo Fisher Scientific | Cat #00-3009
Cot-1 DNA (Invitrogen) | Thermo Fisher Scientific | Cat #15279-011
FuGENE HD transfection reagent | Promega | Cat #E2311
Tween 20 | Sigma-Aldrich | Cat #P1379
1% TritonX-100 | Sigma-Aldrich | Cat #T9284
FuGENE 6 transfection reagent | Promega | Cat #E2691
Opti-MEM (Gibco) | Thermo Fisher Scientific | Cat #31985070
DEAE-Dextran | Sigma-Aldrich | Cat #D-9885
16% paraformaldehyde | Alfa Aesar | Cat #43368
Polybrene | Sigma-Aldrich | Cat #TR-1003-G
Poly-L lysine | Sigma-Aldrich | Cat #P8920
RNase A | Qiagen | Cat #19101
Lentivirus precipitation solution | Alstem | Cat #VC100
Protease | RNAscope Fluorescent Multiplex Reagent Kit | Cat #320850
DAPI (for MICDDRP assay) | RNAscope Fluorescent Multiplex Reagent Kit | Cat #320850
DAPI (for FISH assays) | Vector Lab | Cat #H-1200
Colcemid | Invitrogen | Cat #15212-012

0.075 M KCl | Thermo Fisher Scientific | Cat #P217-500
In situ hybridization buffer | Empire Genomics | Supplied with probes
ThermoBrite system S500-12 | Abbott Laboratories | SN# 538S50000389
2X SSC | Invitrogen | Cat #15557-036
0.05% Tween 20 | Thermo Fisher Scientific | Cat #BP337-500
Methanol | Thermo Fisher Scientific | Cat #BP1150-4
Acetic Acid | Thermo Fisher Scientific | Cat #BP1185-500
Ethanol | Pharmco-AAPR | Cat #111000200
PBS | Thermo Fisher Scientific | Cat #SH30256.01
Dulbecco's modified Eagle's medium (DMEM; Gibco) | Thermo Fisher Scientific | Cat #11965-084
Roswell Park Memorial Institute (RPMI) 1640 (Gibco) | Thermo Fisher Scientific | Cat #11875-085
Fetal Bovine Serum | HyClone | Cat #SH30088.03
100X penicillin streptomycin solution | Corning | Cat #30-002-CI
100 mM sodium pyruvate (Gibco) | Thermo Fisher Scientific | Cat #11360-070
100X MEM non-essential amino acids (Gibco) | Thermo Fisher Scientific | Cat #11140-050
0.025% Trypsin 2.21 mM EDTA | Corning | Cat #25-053-CI
Dimethyl Sulfoxide (DMSO) | Sigma-Aldrich | Cat #D2650
PolyJet transfection reagent | Signagen laboratories | Cat #SL100688
Integrase strand transfer inhibitor Raltegravir | Selleck Chem | Cat #S2005
Allosteric integrase inhibitor BI-D | Custom synthesis from MedChemExpress | N/A
Interleukin-2, human (hIL-2) recombinant (E. coli) | Sigma-Aldrich | Cat #HIL2-RO
Phytohemagglutinin | Thermo Fisher Scientific | Cat #ICN15188405
HEPES (Life Technologies) | Thermo Fisher Scientific | Cat #15630080
T4 DNA Ligase | New England Biolabs | Cat #M0202L
BssHII | New England Biolabs | Cat #R0199L
SpeI | New England Biolabs | Cat #R0133L
MseI | New England Biolabs | Cat #R0525L
BglII | New England Biolabs | Cat #R0144L
Gibson assembly master mix | New England Biolabs | Cat #E2611L

Critical Commercial Assays
EnzChek Reverse Transcriptase Assay Kit (Life Technologies) | Thermo Fisher Scientific | Cat #E22064
ViewRNA ISH Cell Assay Kit (Affymetrix) | Thermo Fisher Scientific | Cat #18813
HIV-1 p24 antigen capture assay | Advanced Bioscience Laboratories | Cat #5447

Deposited Data
Integration sites reported in this paper | NCBI Sequence Read Archive (NCBI SRA) | SRA: SRP132583

Experimental Models: Cell Lines
HeLa MAGI cells | NIH AIDS Reagent Program | Cat #3522
HeLa TZM-bl cells | NIH AIDS Reagent Program | Cat #8129
HEK 293-FT cells | Thermo Fisher Scientific | Cat #R70007
WT HEK293T, LKO HEK293T cells | (Fadel et al., 2014) | N/A
B8 CKO, F6 DKO, WT-vector, CKO-vector, CKO-CPSF6[551], CKO-CPSF6[588], CKO-CPSF6[551]F284A cells | (Sowd et al., 2016) | N/A
Donor A CD4+ T cells | Lonza | Cat #2W-200; Lot #0000475324
Donor B CD4+ T cells | Lonza | Cat #2W-200; Lot #0000469901

Oligonucleotides
PIC ViewHIV probe set targeting HIV-1 gag nt 922-1933 (Invitrogen) | Thermo Fisher Scientific | Cat #VF6-12978
Provirus ViewHIV probe set covering NL4-3 genome (Invitrogen) | Thermo Fisher Scientific | Cat #VF1-14734
MICDDRP Probe set 3 (PS-3) | Advanced Cell Diagnostics RNAscope | Cat #317701-C1
Pre-amplifier probes | Advanced Cell Diagnostics RNAscope Fluorescent Multiplex Reagent Kit | Cat #320850
Amplifier probes | Advanced Cell Diagnostics RNAscope Fluorescent Multiplex Reagent Kit | Cat #320850
Fluorescently labelled bacterial artificial chromosome (BAC) probes for FISH (see Table S5) | Empire Genomics | (see Table S5)

Recombinant DNA
pNLENG1-ES-IRES-GFP | (Levy et al., 2004) | N/A
pNLX.Luc.R-.ΔAvrII | (Koh et al., 2013) | N/A
pNLENG1-ES-IRES-GFP A77V CA | This paper | N/A
pCG-VSV-G | (Shun et al., 2007a) | N/A
pNLXE7 | (Limón et al., 2002) | N/A
pCBASceI | (Di Primio et al., 2013) | N/A
HIV-CMV-deltaGFP-I-SceI | (Di Primio et al., 2013) | N/A

Software and Algorithms
ImageJ | NIH (https://imagej.net/ImageJ) | N/A
NIS Elements AR 4.4 | Nikon | N/A
Imaris | Bitplane | N/A
Huygens Professional software (version 16.10) | SVI | N/A
BWA MEM | (Li and Durbin, 2009) | N/A
HISAT2 | (Kim et al., 2015) | N/A
SAMtools | (Li et al., 2009) | N/A
BEDtools software suite | (Quinlan and Hall, 2010) | N/A
GGplot2 | (Wickham, 2016) | N/A
Rcircos | (Zhang et al., 2013) | N/A
Stringtie | (Pertea et al., 2015) | N/A
Ballgown | (Frazee et al., 2015) | N/A

Other
Hybridization oven | Illumina | Model #230402ILL
Denaturation oven | Agilent | Model #G2545A
Humidified BioAssay Dish | Corning | Cat #431301
100 mm petridish | Thermo Fisher Scientific | Cat #08-757-13
24-well dish | Olympus Plastics | Cat #25-107
8 mm diameter #1.5 thickness coverslips | Electron Microscopy Sciences | Cat #72296-08
Pressure cooker | Cuisinart | Cat #CPC-600AMZ
Humidified HybEZ oven | Advanced Cell Diagnostics RNAscope | Cat #PN 321710/321720
Inverted confocal microscope | Leica | Cat #TCS SP8
ImmEdge hydrophobic barrier PAP pen | Vector Laboratories | Cat #H-4000
Nikon A-1 Confocal Imaging System | The Sanderson Center for Optical Experimentation (SCOPE) Core Facility at UMass Medical School | Nikon A-1 Confocal Microscope
0.22 μm polyvinylidene fluoride filter | Millipore | Cat #SLGV033RS
12 mm collagen-coated coverslips | NeuVitro | Cat #GG-12-Collagen
0.45 μm polyvinylidene fluoride filter | Thermo Fisher Scientific | Cat #09740106

CONTACT FOR REAGENT AND RESOURCE SHARING

Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Alan Engelman (alan_engelman@dfci.harvard.edu).

EXPERIMENTAL MODEL AND SUBJECT DETAILS

Human Cells

WT HEK293T cells (female) and knockout cell line derivatives LKO (Fadel et al., 2014), CKO, and DKO (Sowd et al., 2016), as well as CKO cells engineered to express CPSF6[551], CPSF6[588], or CPSF6[551]F284A protein (Sowd et al., 2016), were previously described. B8 CKO and F6 DKO cell clones were used in this study. The sex of knockout cell lines was not determined directly, but can be assumed to be female based on the sex of parental HEK293T cells. HeLa MAGI and TZM-bl indicator cell lines, which were derived from female HeLa cells, were obtained from the NIH AIDS Reagent Program.

Primary CD4+ T Cells

CD4+ T cells isolated via CD4 immunomagnetic selection from two different deidentified male blood donors were purchased from Lonza. NIH Grant R01 AI052014, which funded this research and described the use of deidentified human samples for HIV-1 infection experiments ex vivo, was reviewed by a member of the Dana-Farber Cancer Institute (DFCI) Institutional Review Board (IRB), who determined that the project activities do not meet the definition of human subjects research set forth at 45 CFR 46.102.

Viruses

Single-round derivatives of HIV-1NL4-3 that carry the green fluorescent protein (HIV-GFP) or luciferase (HIV-Luc) reporter gene were constructed from plasmid DNA sources.

METHOD DETAILS

Plasmid Constructs

HIV-GFP and HIV-Luc were constructed using pNLENG1-ES-IRES (Levy et al., 2004) and pNLX.Luc.R-.DAvrII (Koh et al., 2013), respectively. The mutation for the A77V change in CA was introduced into pNLENG1-ES-IRES-GFP using the Gibson assembly method. Briefly, the plasmid was digested with restriction enzymes BssHII and SpeI from New England Biolabs (NEB) as per the manufacturer’s recommendations, the 14 kb fragment was isolated from a 1% agarose gel, and DNA was recovered using the crush and soak method (Achuthan et al., 2014). The digested plasmid was assembled with two PCR fragments amplified using the parent plasmid as template with the following combination of primers (PCR fragment 1: 5′-TCTCGACGCAGGACTCGGCTTGCTG-3′ and 5′-ATTCTGCAACTTCCTCATTGATGGTCTC-3′; PCR fragment 2: 5′-GAGACCATCAATGAGGAAGTTGCAGAAT-3′ and 5′-GTCATCCATCCTATTTGTTCCTGAAGGG-3′) using the Gibson assembly master mix (NEB) according to the manufacturer’s instructions, and the resulting plasmid was sequence verified. The vesicular stomatitis virus G (VSV-G) glycoprotein was expressed using pCG-VSV-G (Shun et al., 2007a) while the HIV-1NL4-3 glycoprotein was expressed from pNLXE7 (Limón et al., 2002). Plasmids encoding the I-SceI endonuclease (pCBASceI) and HIV vector with the I-SceI recognition sequence (HIV-CMV-deltaGFP-I-SceI) were described previously (Di Primio et al., 2013).

Cell Propagation and Virus Production

HEK293T and HeLa cell lines were maintained in Dulbecco’s modified Eagle’s medium (DMEM) containing 10% (vol/vol) fetal bovine serum (FBS), 100 IU penicillin, and 100 µg/ml streptomycin. Primary CD4+ T cells maintained in Roswell Park Memorial Institute (RPMI) 1640 medium containing 20% FBS, 100 IU penicillin, 100 µg/ml streptomycin, 1X non-essential amino acids, 1X sodium pyruvate, and 5 mM HEPES (CD4+ T cell media) were stimulated for 48 h with 5 µg/ml phytohemagglutinin (PHA) prior to infection.

Viral genome and envelope expression (VSV-G for HEK293T cell infections; HIV-1NL4-3 for CD4+ T cells) plasmids were co-transfected into HEK293T cells using PolyJet, and cell supernatants containing virus particles were harvested and concentrated as described (Sowd et al., 2016). Briefly, 3 x 10^6 HEK293T cells plated on a 100 mm culture dish were transfected with 13.3 µg of pNLX.Luc.R-.DAvrII (HIV-Luc) or pNLENG1-ES-IRES-GFP (HIV-GFP) with 1.7 µg of either pCG-VSV-G or pNLXE7 using 45 µl PolyJet in 500 µl DMEM. Media was changed 8 h post-transfection. Supernatant was harvested 48 h after transfection, cleared by centrifugation for 5 min at 700 x g, filtered through a 0.45 µm polyvinylidene fluoride filter, concentrated by spinning at 26,000 rpm for 2 h at 4°C using a SW32Ti rotor, and stored in aliquots at −80°C. HIV-1 yield was quantified using the bulk p24 antigen capture assay for integration site analysis. For ViewHIV experiments, reverse transcriptase activity in the viral supernatant was titered as arbitrary fluorescence units (AFUs) using the EnzChek Reverse Transcriptase Assay Kit after inactivating viral preparations at room temperature for 1 h in DMEM supplemented with 5% FBS and 1% TritonX-100 (Chin et al., 2015). The approximate MOI of 350 was determined via p24 antigen capture staining of HeLa MAGI cells infected with the same AFU per cell ratio. For the experiment reported in Figure S1, HeLa MAGI cells were infected with 10,000 AFUs per cell, yielding an approximate MOI of 70. For MICDDRP experiments, viral titer was determined as the number of blue forming units (BFU) using TZM-bl cells in the presence of 20 µg/mL DEAE-Dextran as described (Puray-Chavez et al., 2017).

Immunoblotting

Detection of CPSF6 (RRID: AB_937785) and actin (RRID: AB_262011) proteins by western immunoblotting was as described (Sowd et al., 2016).

HIV-1 Infection

Image-Based Assays

Semi-synchronized infections for ViewHIV (50,000 AFU per cell) and MICDDRP (0.2 BFU per cell) were performed essentially as described (Chin et al., 2015; Puray-Chavez et al., 2017). HEK293T cells were plated onto coverslips precoated with collagen in 24-well plates. Cells and virus were pre-chilled on ice for 45 min after which the media was aspirated and viral dilutions added to the cells. The pre-chilled cell-virus mixtures were placed at 4°C for 45 min before placing at 37°C (time 0). At 12 or 24 hpi, cells were washed twice with phosphate-buffered saline (PBS) and treated with 0.025% trypsin for 30 sec. Trypsin was neutralized by washing twice with cold DMEM containing 10% FBS, and the cells were fixed for 10 min in 4% paraformaldehyde (PFA) in PBS. For SCIP assays, cells were transfected with pCBASceI 24 h after plating using FuGENE HD as per the manufacturer’s instructions. Two days post-transfection, cells were synchronously infected with VSV-G pseudotyped HIV-CMV-deltaGFP-I-SceI virus.

CD4+ T cells were infected with 10 pg p24 of HIV-GFP per cell for ViewHIV and MICDDRP experiments. The cells were plated into 96-well plates in CD4+ T cell media containing 8 µg/ml polybrene and 40 U/ml of interleukin (IL) 2. After incubating cell and virus dilutions on ice for 45 min, pre-chilled cells and virus were mixed and placed at 4°C for 45 min. Cells were then infected by spinoculation at 1200 x g for 2 h at 37°C, after which the plates were incubated at 37°C (time 0). Media were replaced with media lacking polybrene at 5 hpi. At 24 hpi, cells were gently washed once with cold CD4+ T cell media, and cells were fixed for 10 min in 4% PFA for image analysis. At 3 dpi, infectivity was measured as the proportion of cells expressing GFP by flow cytometry.
IN Inhibitor Treatments

Where indicated, cells were pre-treated with 500 nM RAL (dissolved in DMSO) or equivalent DMSO volume for 1 h prior to infection, the concentrations of which were maintained throughout the infections. For infection in the presence of BI-D, cells were treated with an EC90 concentration (7 µM) as determined using HEK293T cells (Jurado et al., 2013) or matched DMSO concentration for 1 h prior to infection, and such concentrations were maintained throughout infections.

Integration Site Sequencing

Infection of HEK293T cells for integration site sequencing was performed as described (Sowd et al., 2016). In brief, HEK293T cells were infected with 1.67 pg p24 of VSV-G pseudotyped HIV-Luc per cell within 2-3 h of plating to maintain similar multiplicities of infection across cell types, which helps to compensate for cell growth defects inherent to knockout cells (Sowd et al., 2016). CD4+ T cells were infected with 10 pg p24 of HIV-GFP per cell for integration site sequencing. Genomic DNA was isolated for integration site sequencing at 5 dpi.

Imaging Assays

ViewHIV Analyses

A detailed description of the ViewHIV method for detection of proviral DNA and PICs (unintegrated vDNA and CA) is provided in Chin et al. (2015). Probe sets were designed against regions of clade B strains HIV-1HXB2 and HIV-1NL4-3 that were conserved across multiple viral strains. Probes for the PIC assay spanned nucleotide positions 922-1933 of gag (VF6-12978), while probes for the provirus assay spanned the entire genomic sequence (VF1-14734). At least two separate experiments with duplicate samples within each experiment were imaged.

Adherent cells were fixed onto #1.5 thickness coverslips with 4% PFA and stored in Dulbecco’s PBS (DPBS) in a 24-well dish. CD4+ T cells were fixed in 4% PFA and stored in PBS. CD4+ T cell suspensions (10 µl at 10,000 cells/ml) were spread across the coverslip and allowed to air dry inside a 100 mm petridish. Cells were rehydrated by sequentially adding 50 µl of 100% ethanol, 70% ethanol, and 50% ethanol for 2 min to the coverslip, followed by DPBS for 30 min at room temperature.

For the PIC ViewHIV assay, coverslips were transferred to a 100 mm petridish prior to hybridization with probes, and 50 µl of DPBS was added to each coverslip immediately. DPBS was aspirated and coverslips were incubated with Affymetrix 1X detergent solution (ViewRNA ISH Cell Assay Kit) prepared with DPBS for 5 min at room temperature, after which the coverslips were washed twice with DPBS. Coverslips were then incubated with Affymetrix proteinase K solution (ViewRNA ISH Cell Assay Kit) diluted 1:1000 in DPBS for 10 min at room temperature and then washed with DPBS twice. DPBS (100 µl) was then added to the coverslips and the petridish containing the coverslips was sealed in parafilm and placed at 60°C for 35 min. The probe set, diluted 1:100 in probe set diluent (ViewRNA ISH Cell Assay Kit), was added to the coverslips, which were then placed in a 40°C humidified hybridization chamber for 3 h.
For co-staining with antibodies, the coverslips were permeabilized with DPBS-TT (1% Tween 20 and 1% Triton X-100 in DPBS) for 20 min, treated with blocking buffer [1% bovine serum albumin (BSA), 2.25% glycine in PBS] for 30 min, and incubated with primary and secondary antibodies diluted in 1% BSA for 1 h each. Matching pre-amp, amplifier, and label probes were then added to the coverslips as described in Chin et al. (2015).

For the Provirus ViewHIV assay, coverslips with fixed cells were submerged in TE buffer (100 mM Tris-HCl, 50 mM EDTA, pH 7.5) and heated for 3 min inside a pressure cooker using the low-pressure setting. After releasing the pressure, the coverslips were immediately chilled in a water bath for 30 min and soaked in 75 ml 2X SSC (0.3 M NaCl, 30 mM trisodium citrate, pH 8.0) for 5 min at room temperature. Coverslips were treated with 100 µl of Digest All-3 pepsin (1:20 dilution in 0.01 M HCl) at 40°C for 10 min. Coverslips were washed twice in PBS, after which samples were dehydrated by sequentially adding 70%, 85%, and 100% ethanol for 2 min each. Excess ethanol was aspirated gently and coverslips were allowed to air-dry completely. Once dry, 100 µl of pre-warmed denaturation solution (ViewRNA ISH Cell Assay Kit) was added to each coverslip and samples were heated at 72°C for 10 min. Coverslips were immediately dehydrated by sequentially adding 50 µl of cold 70%, 85%, and 100% ethanol for 2 min each. Coverslips were air-dried, transferred to a new parafilm-lined petridish, and 50 µl of the provirus probe set (1:100 dilution in the probe set diluent) was added to each sample. Petridishes were moved to a Humidified BioAssay Dish, which was sealed with parafilm and incubated overnight at 40°C. Pre-amp, amplifier, and label probes were then added to the coverslips as described in Chin et al. (2015).

For the experiment in Figure S1, coverslips were additionally treated with 5 mg/mL RNase A for 30 min at 37°C as described (Puray-Chavez et al., 2017). RNase A treatment was done after protease (PIC assay) or pepsin (provirus assay) treatment. Coverslips were finally mounted on slides using VectaShield mounting media with 4′,6-diamidino-2-phenylindole (DAPI).

MICDDRP Assay

Samples were processed 24 hpi and treated with RNase A for DNA detection using sense probe-2 targeting the 801-1393 bp of gag-pol region of negative-strand vDNA as previously described (Puray-Chavez et al., 2017). HIV-1 DNA in cells was probed using RNAscope reagents (Advanced Cell Diagnostics), following the manufacturer’s protocol with some modifications. Following fixation onto coverslips as described above, cells were dehydrated by removal of DPBS and sequentially adding 50%, 70%, and 100% ethanol for 5 min each at room temperature. The ethanol was replaced with fresh 100% ethanol and incubated at room temperature for a final 10 min. At this point coverslips can be stored in 100% ethanol at −20°C. The coverslips were sequentially incubated for 2 min each with 100%, 70%, and 50% ethanol to rehydrate cells. Finally, 50% ethanol was replaced with PBS at room temperature for 10 min.
Coverslips were then washed with 0.1% Tween in PBS for 10 min, and twice more in PBS for 1 min. Coverslips were then immobilized on glass slides, a circle was drawn around the coverslips using an ImmEdge hydrophobic barrier PAP pen, and PBS was added to prevent sample dehydration. The manufacturer’s protease solution (Pretreat 3), diluted 1:2 in PBS, was added to the coverslips, and incubated in a humidified HybEZ oven for 15 min at 40°C. The slides were additionally treated with RNase A solution after discarding the protease solution. After washing three times with ultrapure water for 2 min, the slides were treated with 5 mg/mL RNase A (Qiagen) in PBS for 30 min at 37°C, washed further three times with ultrapure water for 2 min, and heated to 50°C for 30 min with hybridization buffer (1.7 M ethylene carbonate, 100 mg/mL dextran sulfate, 600 mM NaCl, 0.1% Tween-20, 10 mM sodium citrate, pH 6.2). Hybridization of pre-amplifiers and amplifiers to the probes was performed as described (Puray-Chavez et al., 2017). Two separate experiments were processed to obtain final DNA distributions.

Images were obtained with a Leica TCS SP8 inverted confocal microscope equipped with a 63X/1.4 and a 100X/1.4 oil immersion objective with a tunable supercontinuum white light laser. Images taken using the 100X/1.4 objective were used to produce representative pictures, whereas the images obtained using the 63X/1.4 were used for analyzing the radial position of HIV-1 vDNA. Confocal data sets were deconvolved using the Huygens Professional software (Version 16.10, SVI).

SCIP Assay

Cells were fixed 24 h after infection and immunostained using anti-gamma H2AX antibody [anti-phospho-histone H2A.X (Ser139) clone JBW301; RRID: AB_309864]. Consistent with prior reports (Di Primio et al., 2013; Vranckx et al., 2016), 2 or 3 foci were observed per mock infected cell, though such background foci were generally less intense than those observed in infected cells.
LKO cell foci, by contrast, were largely similar between mock and infected conditions, which we suspect might be due to the role of LEDGF/p75 in homologous recombination (Daugaard et al., 2012). We accordingly were unable to use the SCIP assay to monitor proviral content of infected LKO cells.

ViewHIV and SCIP assays were performed using a Nikon A-1 confocal imaging system with a 60X objective and pinhole of 0.9 AU. This resulted in optical sections (or slices) with z plane depths as follows: 0.360 µm with the 488 nm laser line (CA imaging), 0.400 µm with the 561 nm laser line (provirus assay DNA), and 0.410 µm with the 633 nm laser line (PIC assay DNA). All acquisition settings were kept constant across each experiment.

FISH

Sample sizes required for statistically significant comparisons between different groups of genes were calculated as described (Serrao et al., 2014). Using a Cohen’s d value of 0.5 (‘‘medium’’ effect size), desired statistical power level of 0.9, and probability level or p value of 0.05 yielded 85 as the minimal number of foci required. At least 100 foci for each gene were imaged in both HEK293T and CD4+ T cells.

HEK293T cells cultured in 10 cm plates were treated with colcemid (0.02 µg/mL) for 2 h at 37°C before treatment with 0.025% trypsin for 30 sec. The cells after trypsinization were transferred to 15 ml conical tubes and centrifuged at 1000 x g for 10 min. Cells were re-suspended in hypotonic solution (0.075 M KCl) for 20 min at room temperature, fixed in methanol and acetic acid (3:1 vol/vol), and washed three times in fixative. Slides were prepared by dropping a few drops of the cell suspension on a small area, briefly air dried on a slide warmer at 60°C, and incubated at 37°C overnight inside a petridish. SSC solution (2X; 100 µl) was added onto the slides the next day and incubated at 37°C for 30 min. Slides were then washed with pure water 3 times, dehydrated by serially adding 70%, 85%, and 100% ethanol for 2 min each, and air-dried.
FISH assay was performed on the above cytological preparations using BAC probes labelled with either green 5-fluorescein dUTP (λmax = 491 nm, λem max = 515 nm) or orange 5-TAMRA dUTP (λmax = 548 nm, λem max = 573 nm) (Table S5) obtained from Empire Genomics (Buffalo, N.Y.). Slides were hybridized with probes according to the manufacturer’s instructions with slight modifications. In short, 2 µl of each probe was mixed with 8 µl of in situ hybridization buffer (Empire Genomics). The probe was applied on the slide and covered with a glass coverslip (22 mm x 22 mm) and sealed with rubber cement. The slides were denatured at 72°C for 3 min using the Thermobrite system (Abbott Laboratories, Illinois, USA) and incubated at 37°C in a humidified chamber overnight. The slides were then washed using 2X SSC buffer at 45°C for 1-2 min, rinsed using 0.05% Tween 20–PBS, and counterstained with DAPI. CD4+ T cells were cultured in CD4+ T cell media and were stimulated for 48 h with 5 µg/ml PHA, after which the cells were processed as described for HEK293T cells.

Images were acquired using a NIKON confocal A1 microscope with a 60X oil objective lens with 1.4 NA. Pixel sizes were set at around 200 nm and z-stacks were acquired every 600 nm. Laser lines used for acquisition were 405 nm (DAPI), 488 nm (green probes), and 561 nm (orange probes). Radial positions of genes labeled using the BAC probes were determined as described below under the ‘‘Viral Signal Quantification and Determination of Intranuclear Distance Travelled by Foci’’ subsection of QUANTIFICATION AND STATISTICAL ANALYSIS.
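The minimal foci number quoted in the FISH subsection can be reproduced with a standard normal-approximation formula for a two-sided, two-group comparison. The sketch below is illustrative only: the source cites Serrao et al. (2014) for the calculation and does not state the formula, so the choice of approximation and the function name `min_sample_size` are assumptions.

```python
from math import ceil
from statistics import NormalDist


def min_sample_size(effect_size: float, power: float, alpha: float) -> int:
    """Per-group sample size for a two-sided, two-sample comparison.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2.
    This formula is an assumption; the paper only reports the inputs
    (Cohen's d, power, p value) and the resulting minimal foci number.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Inputs stated in the text: d = 0.5, power = 0.9, p = 0.05
n_foci = min_sample_size(effect_size=0.5, power=0.9, alpha=0.05)
print(n_foci)
```

Under these assumptions the function returns 85, which matches the stated minimum; imaging at least 100 foci per gene therefore exceeds the requirement.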

Determination and Analyses of HIV-1 Integration Sites

Linker-Mediated PCR (LM-PCR)

LM-PCR was used to amplify viral-host chromosomal junctions as described (Serrao et al., 2016). Briefly, 10 µg of genomic DNA was digested overnight at 37°C with 100 U each of restriction endonucleases MseI and BglII (NEB) in 100 µl with buffer supplied by the manufacturer. Specific linker oligonucleotides, which were compatible for ligation with the MseI-generated DNA ends containing 5′-TA overhangs, were ligated with genomic DNA overnight at 12°C in reactions containing 1.5 µM ligated linker, 1 µg fragmented DNA, and 800 U T4 DNA ligase. Viral long terminal repeat (LTR)-host DNA junctions were amplified using semi-nested PCR with a unique linker-specific primer and nested LTR primers. The second round LTR-specific primer and linker-specific primer carried adapter sequences for DNA clustering as well as primer-binding sites for next-generation sequencing (Serrao et al., 2016). Eight PCR reactions were performed in parallel for each PCR round to enhance library diversity. See Table S6 for details of oligonucleotides used for LM-PCR analyses. Integration site libraries were analyzed by 150 bp paired-end sequencing on a MiSeq Illumina instrument at the DFCI Molecular Biology Core Facilities.

Bioinformatics

Paired-end sequencing reads were cropped to remove LTR and linker sequences using custom Python scripts. Cropped reads were aligned to human genome build hg19 using BWA MEM (Li and Durbin, 2009) or HISAT2 (Kim et al., 2015; Pertea et al., 2016). The results were then filtered to retain high-quality alignments using SAMtools (Li et al., 2009), and unique integration sites were extracted and converted to the browser extensible data (BED) format using custom Python scripts. Each interval in the BED file reports the middle dinucleotide step of the integration site. Previously published integration datasets (Table S1) were processed as described above and aligned to human genome hg19 using HISAT2.
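The custom Python scripts themselves are not reproduced in the source. As a rough sketch of the two steps they are described as performing (cropping LTR and linker sequence from a read, then collapsing aligned sites to unique BED intervals), the following may help. All names (`crop_read`, `unique_sites_to_bed`), the toy sequences, and the choice of a two-base BED interval are assumptions made for illustration, not the authors' implementation.

```python
from typing import Iterable, List, Optional, Tuple


def crop_read(read: str, ltr_end: str, linker: str) -> Optional[str]:
    """Return the host-derived portion of a read, or None if unusable.

    Assumes the read begins with the LTR terminus (ltr_end) and may run
    into the linker at its far end; both are placeholder sequences here.
    """
    if not read.startswith(ltr_end):
        return None  # no LTR-host junction at the start of the read
    host = read[len(ltr_end):]
    cut = host.find(linker)
    if cut != -1:
        host = host[:cut]  # drop the linker and anything after it
    return host or None


def unique_sites_to_bed(sites: Iterable[Tuple[str, int, str]]) -> List[str]:
    """Collapse (chrom, position, strand) tuples to unique BED lines.

    BED is 0-based and half-open; a two-base interval is used here as a
    hypothetical stand-in for the 'middle dinucleotide step'.
    """
    lines = []
    for chrom, pos, strand in sorted(set(sites)):
        lines.append(f"{chrom}\t{pos}\t{pos + 2}\t.\t0\t{strand}")
    return lines


# Toy example with made-up sequences and coordinates
host_seq = crop_read("ACGTTTGGCCAAGGTT", ltr_end="ACGT", linker="AAGG")
bed = unique_sites_to_bed([("chr1", 100, "+"), ("chr1", 100, "+"), ("chr2", 5, "-")])
```

In a real pipeline the alignment and quality filtering (BWA MEM or HISAT2, then SAMtools) would sit between these two functions.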
Integration sites were correlated with respect to HT1080 cell LADs (Meuleman et al., 2013), cLADs and ciLADs (100-kb contiguous genomic segment bins) (Kind et al., 2015), NUP153 and NUP93-associated regions (Ibarra et al., 2016), and various other genomic annotations obtained from the University of California Santa Cruz database (http://genome.ucsc.edu/cgi-bin/hgTables) using the BEDtools software suite (Quinlan and Hall, 2010). Heatmaps comparing integration frequency for individual genes across samples and histograms were generated using ggplot2 (Wickham, 2016). RIGs were initially classified by rank-ordering genes based on the fraction of total integration events harbored by the individual gene. Only genes with more than one integration event were considered for the RIGs analyses. To identify genes for FISH analyses, the top 100 RIGs from HEK293T, HOS, U2OS, MDM, and CD4+ T cells (Table S1) were compared to identify genes enriched relative to the MRC (p < 0.05; Fisher’s exact test) across minimally three independent cell types. Such analyses were extended to LEDGF/p75 or CPSF6 deficient conditions (Table S1) to identify pathway-specific FISH probes. Group 1 genes were specifically enriched under WT conditions while Group 3 genes were specifically enriched in CPSF6 depleted conditions (statistically significant compared to both the MRC and WT in at least three cell types). Group 2 genes were enriched in CPSF6 or LEDGF/p75 depleted conditions. Consistent with the heatmap analyses, the majority of Group 2 genes were not statistically significant compared to MRC, and thus a p value cut-off versus the MRC was not enforced during Group 2 gene identification. Genome-wide maps (Figure S5) generated using Rcircos (Zhang et al., 2013) display statistically enriched RIGs (p < 0.05 versus the MRC for frequency of integration events). RNA-seq data from WT, CKO, LKO, and DKO HEK293T cells were as described (Sowd et al., 2016) (SRA: SRP065607).
Single and paired end reads were quality filtered and mapped to hg19 using HISAT2 (Kim et al., 2015; Pertea et al., 2016). Transcript abundance was quantified using Stringtie (Pertea et al., 2015), and relative transcript levels were compared using Ballgown (Frazee et al., 2015). To calculate average expression, bedfiles containing the top 50 RIGs were intersected with a bedfile containing the relative transcript abundance for every gene using bedtools. To identify genes that lie partially or completely in a LAD, a bedfile containing relative transcript abundance for the HEK293T knockout cell panel was correlated with HT1080 cell LADs (Meuleman et al., 2013). Relative gene expression levels of genes within or outside of LADs were averaged and graphed.
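The initial RIG classification described above (rank-ordering genes by their fraction of total integration events and keeping only genes with more than one event) can be sketched in a few lines. The function name `rank_rigs`, the gene labels, and the counts are hypothetical; the statistical comparison to the MRC is a separate step not shown here.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_rigs(gene_hits: Iterable[str], total_events: int,
              top_n: int = 100) -> List[Tuple[str, float]]:
    """Rank genes by the fraction of all integration events they harbor.

    gene_hits holds one gene name per integration event that fell in a
    gene; total_events is the total number of integration sites in the
    sample. Genes with a single event are excluded, as in the text.
    """
    counts = Counter(gene_hits)
    ranked = [(gene, n / total_events) for gene, n in counts.items() if n > 1]
    # Highest fraction first; ties broken alphabetically for determinism
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Toy data: gene names and counts are made up for illustration
hits = ["GENE_A", "GENE_A", "GENE_A", "GENE_B", "GENE_B", "GENE_C"]
top = rank_rigs(hits, total_events=10)
```

Here `GENE_C` is dropped because it carries only one event, and the remaining genes are ordered by their share of the ten total events.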

QUANTIFICATION AND STATISTICAL ANALYSIS

Viral Signal Quantification and Determination of Intranuclear Distance Travelled by Foci

Confocal images, taken as 8-bit, were processed according to Chin et al. (2015). Images were first processed using the ‘‘subtract background’’ tool using a rolling ball radius of 5, and were smoothed to minimize single bright pixels using the FIJI image analysis software (ImageJ). A threshold was set with a minimum between 18-50 and a maximum of 255, and a minimum size of 0.1 µm² in FIJI, and the counts of individual CA and vDNA signals were determined using the ‘‘analyze particles’’ tool in FIJI. Nuclear distribution pattern was determined in a consistent manner across different cell types as described (Chin et al., 2015). First, the central z section image within each nucleus was determined. The area of the nucleus was determined by DAPI staining, from which the nuclear radius was calculated assuming the nucleus to be a circle. The distance of each viral focus from the closest nuclear edge was calculated using ImageJ. Then the radial distance, computed as the fraction travelled by the foci into the nucleus along the radius, was calculated as the ratio of the distance from the nuclear edge to the nuclear radius. Nuclei were divided into three concentric zones of equal surface as per Nagai et al. (2008) and Towbin et al. (2012), and radial distances from the nuclear edge of the foci were binned into the three zones. The most peripheral zone, PN, had a width of 0.184 x r; the MN ranged from 0.184 x r to 0.422 x r; and the CN comprised distances greater than 0.422 x r from the nuclear edge.

Correlating Reported Distance Measures to Concentric Nuclear Zones

Albanese et al. (2008) mapped the majority of HIV-1 PICs to 0.4-2.0 µm from the edge of HeLa cell nuclei. Considering the average radius of 7.4 µm (Burdick et al., 2013), this equates to 0.05-0.27 x r or the PN and MN. Burdick et al. (2013) scored the average distance traveled by PICs into HeLa cell nuclei as 1.2-1.6 µm, equating to 0.16-0.22 x r, which we interpret as the interface between PN and MN areas. Their follow up study confirmed 1.4 µm (or 0.19 x r) as the average distance travelled into HeLa cell nuclei (Burdick et al., 2017). Francis and Melikyan (2018) measured 1.8 µm as the average distance traversed by HIV-1 PICs in HeLa cell nuclei, which equates to 0.24 x r (MN area). Using the SCIP assay, 55% of HIV-1 proviruses were mapped to within 1.5 µm from the edge of U2OS cell nuclei (average radius of 8 µm) at 48 hpi (Di Primio et al., 2013), which equates to 0.19 x r and is thus consistent with the PN-MN interface. In the CEMss T cell line, 62% of proviruses mapped to within 0.5 µm from the nuclear edge at 48 hpi. Considering 5.5 µm as the average nuclear radius, this places the viruses at 0.09 x r (within the PN). Eleven days later, HIV-1 proviruses distributed randomly throughout the T cell nuclei (Di Primio et al., 2013). Marini et al. (2015) scored the majority of HIV-1NL4-3 foci in the PN area of primary CD4+ T cell nuclei at 4 dpi, although the related HIV-1BRU strain at this same time partitioned similarly to PN and MN areas. Preferential PN localization was recorded for infected macrophages by Marini et al. (2015).
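The zone boundaries and the conversions of published distances to fractions of the nuclear radius follow from simple geometry. The sketch below assumes, as the text does, a circular nucleus split into three concentric zones of equal area; the function names are illustrative and are not taken from the authors' analysis code.

```python
from math import pi, sqrt

# Equal-area zones of a unit circle: the inner boundaries sit at radii
# sqrt(2/3) and sqrt(1/3), so distances from the edge are 1 minus those.
PN_EDGE = 1 - sqrt(2 / 3)  # about 0.184 x r
MN_EDGE = 1 - sqrt(1 / 3)  # about 0.423 x r (reported as 0.422 in the text)


def nuclear_radius(area: float) -> float:
    """Radius of a circle with the measured DAPI-stained area."""
    return sqrt(area / pi)


def radial_fraction(distance_from_edge: float, radius: float) -> float:
    """Fraction of the radius travelled inward from the nuclear edge."""
    return distance_from_edge / radius


def zone(distance_from_edge: float, radius: float) -> str:
    """Bin a focus into the peripheral (PN), middle (MN), or central (CN) zone."""
    frac = radial_fraction(distance_from_edge, radius)
    if frac < PN_EDGE:
        return "PN"
    if frac < MN_EDGE:
        return "MN"
    return "CN"


# Conversions quoted in the text (distances and radii in micrometers)
hela_fraction = radial_fraction(1.4, 7.4)   # Burdick et al. (2017)
cemss_zone = zone(0.5, 5.5)                 # Di Primio et al. (2013), CEMss cells
```

With these boundaries, 1.4 µm in a nucleus of radius 7.4 µm falls just past the PN boundary, which is consistent with the text's description of that value as lying at the PN-MN interface.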

Statistical Considerations and Analyses

Each virus infection experiment and associated imaging analysis was conducted using technical duplicate samples. Results (Figures 1A, 1B, 2A, 2B, 3A, S1A, S1B, S2A, S3A, S3D, and S4A) compile data obtained from minimally two independent experiments, with precise number indicated by ‘‘n’’ in respective figure legends. A single factor ANOVA test was used to confirm significant changes within the experiment (critical p value <0.05) and was followed by comparisons between the indicated distributions using the χ² analysis for Figures 1A, 1B, 2A, 2B, 3A, 6B, 6C, S1, S2, S3A, and S4A. Single factor ANOVA test to confirm significant changes followed by t-tests assuming unequal variances were used for Figures 5D and S3D. Fisher’s exact test was used for all comparisons of integration site distributions except for the average number of genes per Mb, which used the Wilcoxon rank-sum test. Wilcoxon rank-sum test was also used for comparisons between the average gene length and gene expression of the top 50 RIGs (Figures 5A and 5B). RIGs were identified by comparing the integration frequency observed in each gene with the observed frequency in the MRC using Fisher’s exact test, which eliminated genes that showed high integration frequency by random choice. Frequency of integration in each gene was individually compared between WT and the respective experimental condition using Fisher’s exact test to determine condition-specific RIGs (e.g., CKO-specific RIGs). Spearman’s rank correlation method was used to correlate the fraction of vDNA foci in PN with the integration sites observed at cLADs, ciLADs, and within 2.5 kb of LADs (Figure S6A).
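As an illustration of the per-gene comparison to the MRC, a two-sided Fisher's exact test on a 2 x 2 table (events inside versus outside a gene, in the sample versus the MRC) can be written with the standard library alone. This is a generic textbook implementation offered as a sketch; the source does not state which software performed the test, and the function name is an assumption.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p value for the table [[a, b], [c, d]].

    Example layout (hypothetical): a = sample events in the gene,
    b = sample events elsewhere, c = MRC events in the gene,
    d = MRC events elsewhere.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k: int) -> float:
        # Hypergeometric probability of k events in the top-left cell
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum all tables at least as extreme as the observed one; the small
    # tolerance guards against floating-point ties.
    total = sum(p for p in (prob(k) for k in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

A gene would be retained as a RIG under the criterion in the text when the resulting p value is below 0.05.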

DATA AND SOFTWARE AVAILABILITY

The sequences reported in this paper have been deposited in the National Center for Biotechnology Information Sequence Read Archive (NCBI SRA) (SRA: SRP132583).

Cell Host & Microbe, Volume 24

Supplemental Information

Capsid-CPSF6 Interaction Licenses Nuclear HIV-1 Trafficking to Sites of Viral DNA Integration

Vasudevan Achuthan, Jill M. Perreira, Gregory A. Sowd, Maritza Puray-Chavez, William M. McDougall, Adriana Paulucci-Holthauzen, Xiaolin Wu, Hind J. Fadel, Eric M. Poeschla, Asha S. Multani, Stephen H. Hughes, Stefan G. Sarafianos, Abraham L. Brass, and Alan N. Engelman

Figures S1-S7 and Tables S4-S6

Figure S1. Influence of Raltegravir (RAL) on ViewHIV Assay Results, Related to Figure 1A, Figure 2A, and Figure 3A

(A) Left, representative images of HEK293T cells infected in the presence of RAL or matched vehicle control concentration, developed using the PIC ViewHIV assay at the indicated times post-infection. Right, numbers of foci per nucleus (average ± SEM for n=2 independent experiments, each conducted in duplicate). Consistent with our prior report (Chin et al., 2015), RAL did not detectably inhibit foci formation.

(B) Same as in panel A, except that images were developed using the Provirus ViewHIV assay. Unlike with the PIC ViewHIV assay, the average number of foci per cell remained relatively constant throughout the experimental time course, with a transient blip at 24 h likely due to unintegrated vDNA forms that subsided at the later time points. Note that the vast majority of vDNA signals at later time points is inhibited by RAL. Considered alongside other reports that indicate that the bulk of HIV-1 integration has occurred by 24 hpi (Brussel and Sonigo, 2003; Mohammadi et al., 2013), we infer that 2 of every 3 foci detected at 24 hpi with the Provirus ViewHIV assay are integrated viruses.

(C) Foci distributions across nuclear sections for panel B infections in the presence of DMSO (average ± SEM for n=2 independent experiments). The following foci numbers were counted at the indicated time: 12 h, 161; 24 h, 286; 48 h, 190; 96 h, 176.

***p < 0.0001; **p < 0.01; *p < 0.05; NS, p > 0.05; orange indicators in panel C are in comparison to random (dashed line). NS, not significant. Scale bars in panels A and B, 20 µm.

Figure S2. CPSF6-Dependent Localization of HIV-1 PICs at 12 hpi, Related to Figure 1A and Figure 2A

(A) HIV-1 DNA (red) and CA (green) from representative infected cell images; merge overlays the two signals. Scale bar, 20 μm.

(B) Distributions of 85-603 DNA (left) and 87-355 CA (right) signals from n=2 separate experiments, each conducted in duplicate, were binned into nuclear zones and are shown as stacked column graphs. Error bars are SEM. ***p < 0.0001; NS, p > 0.05 compared to indicated random distributions (orange lines indicate random PN and MN values).

(C) Levels of CPSF6 and actin proteins in indicated cell lysates were assessed by immunoblotting. Numbers to the left sides of the blots indicate mass marker positions in kilodaltons.

Figure S3. Outcomes of BI-D Treatment and A77V Substitution on Integration Site Distributions and HIV-1 Infection, Related to Figure 3

(A) Distribution of radial distance of 585-745 DNA signals from HEK293T cells treated with BI-D or DMSO vehicle control (mean ± SEM) for n=2 experiments (each conducted in duplicate) compared to the distributions in WT and LKO cells (data from Figure 1A).

(B and C) Proportion of HIV-1 integration sites near LADs (B) and at cLADs vs. ciLADs (C) in HEK293T cells; WT and LKO data are from Figure 1D and 1E. Dashed lines are MRC values.

(D) HIV-1 infectivity in primary CD4+ T cells derived from two blood donors as measured by the proportion of GFP+ cells. Error bars represent standard deviation obtained from minimally n=2 separate experiments, with each experiment conducted in duplicate.

(E) Proportion of HIV-1 integration sites in MDM nearby LADs and at cLADs vs. ciLADs. Dashed lines are MRC values.

(F) Proportion of top 50 targeted genes in the indicated cell type showing LAD association. LAD associations of MRC genes are shown as dashed lines. Statistical comparisons are made to the MRC or WT by taking together the observed proportion of LAD and non-LAD RIGs.

***p < 0.0001; **p < 0.01; *p < 0.05; NS, p > 0.05 (orange, compared to random; black, compared to WT or vehicle control; gray, compared to indicated MRC).

Figure S4. MICDDRP Assay Results with Primary CD4+ T Cells and Primary Cell RIGs, Related to Figure 3A and Figure 4

(A) Representative MICDDRP images (scale bar, 5 µm), with quantitative assessment of 196 to 284 foci shown to the right for n=2 independent experiments, each conducted in duplicate. Error bars, SEM. ***p < 0.0001; *p < 0.05; NS, p > 0.05 (comparisons to random dashed line noted in orange).

(B) Heatmaps compare integration frequency of RIGs identified for WT and A77V CA mutant viruses in the indicated cell types. The maps are colored based on Z-score values; darker shades of blue denote values enriched compared to the MRC and lighter shades are depleted versus the MRC.
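
The row Z-score coloring described for panel B can be illustrated with a minimal Python sketch. It assumes that each heatmap row holds one gene's integration frequencies in the WT, A77V, and MRC columns and that values are standardized within the row using the sample standard deviation. The legend does not state which software or scaling produced the heatmaps, so this is an assumption, and the gene names and numbers below are hypothetical placeholders rather than data from this study.

```python
from statistics import mean, stdev

def row_z_scores(row):
    """Standardize one heatmap row: (value - row mean) / row sample SD."""
    m = mean(row)
    s = stdev(row)
    if s == 0:
        # A flat row carries no enrichment or depletion signal.
        return [0.0 for _ in row]
    return [(v - m) / s for v in row]

# Hypothetical integration frequencies (%) in the column order WT, A77V, MRC.
example_rows = {
    "GENE_A": [0.30, 0.05, 0.10],
    "GENE_B": [0.02, 0.20, 0.08],
}

for gene, freqs in example_rows.items():
    print(gene, [round(z, 2) for z in row_z_scores(freqs)])
```

Positive scores in a column would map to the darker (enriched) shades and negative scores to the lighter (depleted) shades.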

Figure S5. Genome-Wide Association of HEK293T Cell RIGs with LADs and Gene-Dense Regions of Chromosomes, Related to Figure 4 and Figure 5

RIGs with p values < 0.05, which amounted to 698 genes from the WT cell dataset (blue), 307 for LKO cells (pink), and 555 for CKO cells (green), were mapped onto 23 chromosomes (22 autosomal and X) alongside gene dense (black histograms; genes/Mb) and cLAD (brown) regions. The upper left quadrant highlights chromosomes 18 and 19, which have been the focus of localization studies due to their similar sizes (Chr 18, 78.1 Mb; Chr 19, 59.1 Mb) but vastly different average gene densities (Chr 18, 3.3 genes/Mb; Chr 19, 22.5 genes/Mb). Chr 19, which is enriched for WT cell RIGs and relatively devoid of CKO cell RIGs, locates toward the interior region of cell nuclei, while Chr 18, which is devoid of WT cell RIGs and enriched for cLADs and CKO cell RIGs, partitions toward the nuclear periphery (Cremer et al., 2003; Cremer et al., 2001; Croft et al., 1999).

Figure S6. LAD-Tropic Integration Targeting: Correlation with Fractional PN Localization and Influence from Nuclear Import Cofactor Knockdowns, Related to Figure 1, Figure 2, and Figure 3

(A) The proportions of study-wide imaging foci observed in PN at 24 hpi were correlated with the proportions of integration sites within 2.5 kb of LADs (left graph), at cLADs (center graph), and at ciLADs (right graph). Resulting Spearman’s rank correlation (rs) and p values are indicated.

(B and C) HIV-1 integration sites observed within 2.5 kb of LADs (B) and at cLADs vs. ciLADs (C) for the indicated knockdown conditions. See Table S1 for sources of integration sites.

***p < 0.0001; NS, p > 0.05 (black, compared to short interfering RNA siNON non-targeting controls; gray, compared to MRC).
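
The Spearman's rank correlation reported in panel A can be sketched in plain Python as the Pearson correlation of rank-transformed values, with tied values given their average rank. This is a generic illustration of the statistic, not the authors' analysis code. The paired values below are hypothetical placeholders, and the sketch does not compute the accompanying p value.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rs(x, y):
    """Spearman's rs computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy)

# Hypothetical placeholders: % of foci in PN vs. % of integration near LADs.
pn_fraction = [12.0, 18.0, 25.0, 40.0, 55.0]
lad_percent = [20.0, 22.0, 35.0, 50.0, 48.0]
print(round(spearman_rs(pn_fraction, lad_percent), 3))
```

Because only ranks enter the calculation, the statistic captures monotonic association between PN localization and LAD-proximal integration without assuming a linear relationship.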

Figure S7. Integration Site Distributions near NUP153 and NUP93 Associated Regions, Related to Figure 1 and Figure 3

(A and B) WT and A77V CA mutant viral integration sites within 10 kb windows of NUP153 (A) and NUP93 (B) associated regions in the indicated cells. ***p < 0.0001; NS, p > 0.05 compared to indicated MRC (dashed lines).

(C-H) Histograms showing proportions of integration sites within 1 Mb of NUP153 and NUP93 associated regions in indicated cell types.
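
The window-based measurements used in Figures S6 and S7 (sites within 2.5 kb of LADs, sites within 10 kb of NUP-associated regions, and histograms of distance to those regions) can be sketched as follows. This is a minimal illustration under stated assumptions: all coordinates lie on a single chromosome, regions are sorted and non-overlapping, the sign convention for distance is chosen here for illustration, and the 100 kb bin size is hypothetical. The coordinates are placeholders, not data from this study.

```python
from bisect import bisect_left

def signed_distance(pos, regions):
    """Distance in bp from pos to the nearest (start, end) region.

    regions must be sorted, non-overlapping, and non-empty. The result is 0
    inside a region, negative if the site lies before the nearest region,
    and positive if it lies after it (an assumed sign convention).
    """
    starts = [s for s, _ in regions]
    i = bisect_left(starts, pos)
    best = None
    # Only the regions flanking pos can be the nearest one.
    for start, end in regions[max(i - 1, 0):i + 1]:
        if start <= pos <= end:
            return 0
        d = pos - end if pos > end else pos - start
        if best is None or abs(d) < abs(best):
            best = d
    return best

def percent_within(sites, regions, window):
    """Percentage of sites lying within +/- window bp of any region."""
    hits = sum(1 for p in sites if abs(signed_distance(p, regions)) <= window)
    return 100.0 * hits / len(sites)

def distance_histogram(sites, regions, span=500_000, bin_size=100_000):
    """Count sites per signed-distance bin across [-span, +span)."""
    counts = [0] * (2 * span // bin_size)
    for p in sites:
        d = signed_distance(p, regions)
        if -span <= d < span:
            counts[(d + span) // bin_size] += 1
    return counts

# Hypothetical placeholder coordinates on one chromosome.
regions = [(10_000, 20_000), (100_000, 120_000)]
sites = [5_000, 15_000, 22_000, 60_000, 130_000]

print(percent_within(sites, regions, 2_500))    # 2.5 kb window
print(percent_within(sites, regions, 10_000))   # 10 kb window
print(distance_histogram(sites, regions))
```

Running the same functions on matched random control sites would give the MRC baseline against which the observed percentages are compared.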

[Figure S1 image. (A and B) Representative images and bar graphs of foci count per nucleus for DMSO- and RAL-treated cells (mock and 12, 24, 48, and 96 hr). (C) Fraction of foci in PN, MN, and CN at 12, 24, 48, and 96 hr (DMSO).]

[Figure S2 image. (A) DNA, CA, and merge micrographs for WT, LKO, CKO, WT-vector, CKO-vector, CKO-CPSF6[551], CKO-CPSF6[588], and CKO-CPSF6[551] F284A cells. (B) Stacked columns showing fraction of viral DNA foci and fraction of CA foci in PN, MN, and CN. (C) CPSF6 and ACTIN immunoblots with mass markers.]

[Figure S3 image. (A) Fraction of foci in PN, MN, and CN. (B) % integration +/- 2.5 kb of LADs. (C) % of integrations at cLADs and ciLADs. (D) % GFP+ cells for WT, A77V, and BI-D in Donor A and Donor B. (E) MDM: % integration +/- 2.5 kb of LADs and % of integration at cLADs and ciLADs for WT and A77V. (F) % of top 50 RIGs classified as LAD or non-LAD in MDM and CD4+ T cells.]

[Figure S4 image. (A) Fraction of foci in PN, MN, and CN for WT, BI-D, and A77V (Donor A). (B) Heatmaps of WT CA and A77V CA RIGs (rows labeled by gene name) across WT, A77V, and MRC columns for CD4+ T cells and MDM; row Z-score scale from -1.5 (depleted) to 1.5 (enriched).]

[Figure S5 image. Chromosome map (chromosomes 1-22 and X; inset, chromosomes 18 and 19) with tracks for genes/Mb, cLAD, WT, CKO, and LKO.]

[Figure S6 image. (A) Three scatter plots of % integration (+/- 2.5 kb of LAD, in cLAD, and in ciLAD) against fraction of foci in PN. Values printed on the graphs (graph assignment unclear): rs = -0.714, P = 0.004; rs = 0.782, P = 0.001; rs = 0.768, P = 0.001. (B and C) % integration +/- 2.5 kb of LADs and % of integration at cLADs and ciLADs for siNON, siTNPO3, siNUP358, siNUP153, siCPSF6 9, and siCPSF6 11.]

[Figure S7 image. (A and B) % integration +/- 10 kb of NUP153- and NUP93-associated regions in HEK293T (WT, CKO), MDM (WT, A77V), and CD4+ T cells (WT, A77V). (C-H) % of integrations versus distance (kb, -500 to 500) from NUP153- and NUP93-associated regions in HEK293T (CKO, WT, MRC), MDM, and CD4+ T cells (A77V CA, WT CA, MRC).]
Table S4. Expanded integration datasets for RIG analyses (related to Figure 6)

| Gene | Normal targeting conditions | LEDGF/p75-deficient targeting | CPSF6-deficient targeting |
|---|---|---|---|
| PACS1 | CD4+ T, MDM, HEK293T, HOS, U2OS | - | - |
| GNB1 | CD4+ T, MDM, HEK293T, HOS, U2OS | - | - |
| FANCA | CD4+ T, MDM, HEK293T, HOS, U2OS | - | - |
| NPLOC4 | CD4+ T, MDM, HEK293T, HOS, U2OS | - | - |
| KDM2A | CD4+ T, MDM, HEK293T, HOS, U2OS | - | - |
| MKL1 | CD4+ T, MDM, HEK293T, HOS, U2OS | - | - |
| EYS | - | HEK293T LKO, HEK293T + BI-D, MRC | HEK293T CKO, HEK293T DKO, U2OS+siCPSF6, MDM (CA A77V), HOS (CA N74D and CA A77V) |
| PDE4D | - | HEK293T LKO, MRC | HEK293T CKO, HEK293T DKO, U2OS+siCPSF6, MDM (CA A77V), HOS (CA N74D and CA A77V) |
| CTNNA3 | - | HEK293T LKO, HEK293T + BI-D, MRC | HEK293T CKO, HEK293T DKO, U2OS+siCPSF6, HOS (CA N74D and CA A77V) |
| FBXL17 | - | - | HEK293T CKO, U2OS+siCPSF6, MDM (CA A77V), HEK293T (CA N74D) |
| PIK3C3 | - | - | HEK293T CKO, MDM (CA A77V), CD4+ T (CA A77V) |
| CADM2 | - | - | HEK293T CKO, HEK293T DKO, U2OS+siCPSF6, MDM (CA A77V), HEK293T (CA N74D) |
| GBE1 | - | - | HEK293T CKO, HEK293T DKO, MDM (CA A77V), HEK293T (CA N74D), HOS (CA A77V and CA N74D) |

Table S5. Genomic association and BAC clones for RIG visualization (related to Figure 6)

| Gene (RefSeq id) | Chr | Start | End | Genes/10 Mb | Genome association (a) | BAC clone (b) | BAC Chr | BAC Start | BAC End | Dye |
|---|---|---|---|---|---|---|---|---|---|---|
| KDM2A (NM_001256405) | 11 | 67240035 | 67258079 | 314 | ciLAD | RP11-1060G24 | 11 | 66778531 | 66948851 | Orange |
| MKL1 (NM_020831) | 22 | 40410281 | 40636719 | 204 | ciLAD | RP11-412P10 | 22 | 40301674 | 40460018 | Orange |
| | | | | | | RP11-597P1 | 22 | 40438884 | 40613641 | |
| | | | | | | RP11-598E24 | 22 | 40593429 | 40777443 | |
| FANCA (NM_000135) | 16 | 89737551 | 89816657 | 98 | ciLAD | RP11-7D23 | 16 | 89703393 | 89703948 | Green |
| PACS1 (NM_018026) | 11 | 66070353 | 66244747 | 318 | ciLAD | RP11-675B4 | 11 | 66047591 | 66185800 | Green |
| | | | | | | RP11-506O3 | 11 | 66116981 | 66291271 | |
| GNB1 (NM_002074) | 1 | 1785285 | 1891117 | 141 | ciLAD | RP11-1113I16 | 1 | 1643924 | 1859239 | Green |
| | | | | | | RP11-798H13 | 1 | 1813473 | 2001967 | |
| NPLOC4 (NM_017921) | 17 | 81556885 | 81637153 | 152 | ciLAD | RP11-765O14 | 17 | 79379432 | 79579283 | Orange |
| CTNNA3 (NM_001127384) | 10 | 65912518 | 67665685 | 101 | LAD | RP11-367H22 | 10 | 66597266 | 66784942 | Orange |
| | | | | | | RP11-153G2 | 10 | 68662730 | 68852128 | |
| EYS (NM_001142800) | 6 | 63719980 | 65707225 | 15 | LAD, NUP153 | RP11-1133O23 | 6 | 64412022 | 64562160 | Green |
| PDE4D (NM_001165899) | 5 | 58969041 | 60488098 | 27 | LAD | RP11-125B16 | 5 | 59438347 | 59606166 | Green |
| | | | | | | RP11-124G1 | 5 | 59666460 | 59856133 | |
| FBXL17 (NM_001163315) | 5 | 107859033 | 108382098 | 36 | LAD, NUP153 | RP11-344A6 | 5 | 107954610 | 108149892 | Green |
| | | | | | | RP11-50A11 | 5 | 107468062 | 107617851 | |
| PIK3C3 (NM_002647) | 18 | 41955198 | 42081482 | 34 | LAD | RP11-380O21 | 18 | 41933258 | 42143401 | Green |
| GBE1 (NM_000158) | 3 | 81489699 | 81761799 | 9 | LAD | RP11-144J2 | 3 | 81447172 | 81617144 | Orange |
| | | | | | | RP11-142L1 | 3 | 81606228 | 81769730 | |
| CADM2 (NM_001167674) | 3 | 84958982 | 86074429 | 16 | LAD | RP11-177M8 | 3 | 85275297 | 85432781 | Orange |
| | | | | | | RP11-693D2 | 3 | 85442163 | 85644693 | |

(a) Genes were computationally tested for association with LAD, cLAD, ciLAD, NUP153, and NUP93 genomic coordinates.
(b) FISH was performed using a mixture of 1 to 3 labelled BAC clones.

Table S6. Oligonucleotides for LM-PCR/integration site sequencing (related to STAR Methods)

| Sample | Primer use | Primer name | Primer sequence (5' to 3') (a) |
|---|---|---|---|
| All samples | First round LTR primer (common to all samples) | AE5316 | TGTGACTCTGGTAACTAGAGATCCCTC |
| HEK293 DMSO | Second round LTR primer | AE6404 | AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT CGATGTGAGATCCCTCAGACCCTTTTAGTCAG |
| | Linker short strand | AE6380 | TAGTCCCTTAAGCGGAG-NH2 |
| | Linker long strand | AE6381 | GTAATACGACTCACTATAGGGCCTCCGCTTAAGGGAC |
| | Linker primer | AE6382 | CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCTGTAATACGACTCACTATAGGGC |
| HEK293 BI-D | Second round LTR primer | AE6406 | AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT ACAGTGGAGATCCCTCAGACCCTTTTAGTCAG |
| | Linker short strand | AE6386 | TACTATGACGGTGACGC-NH2 |
| | Linker long strand | AE6387 | GAGAATCCATGAGTATGCTCACGCGTCACCGTCATAG |
| | Linker primer | AE6388 | CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCTGAGAATCCATGAGTATGCTCAC |
| CD4+ T cell Donor A WT | Second round LTR primer | AE6405 | AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT TGACCAGAGATCCCTCAGACCCTTTTAGTCAG |
| | Linker short strand | AE6456 | TAGACTGACGCAGTCTG-NH2 |
| | Linker long strand | AE6457 | GACGTACATACTGATCGCATAGCAGACTGCGTCAGTC |
| | Linker primer | AE6458 | CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCTGACGTACATACTGATCGCATAG |
| CD4+ T cell Donor A BI-D | Second round LTR primer | AE6410 | AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT AGTCAAGAGATCCCTCAGACCCTTTTAGTCAG |
| | Linker short strand | AE6447 | TACCGGTCAGCATAGTG-NH2 |
| | Linker long strand | AE6448 | GACTTGAACCGTAGCATCTAAGCACTATGCTGACCGG |
| | Linker primer | AE6449 | CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCTGACTTGAACCGTAGCATCTAAG |
| CD4+ T cell Donor A A77V | Second round LTR primer | AE6493 | AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT GGCTACGAGATCCCTCAGACCCTTTTAGTCAG |
| | Linker short strand | AE6392 | TACTGAGACGTCGATGC-NH2 |
| | Linker long strand | AE6393 | GATCATGCGAGATACATCTCAGGCATCGACGTCTCAG |
| | Linker primer | AE6394 | CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCTGATCATGCGAGATACATCTCAG |
| CD4+ T cell Donor B WT | Second round LTR primer | AE6436 | AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT ATGTCAGAGATCCCTCAGACCCTTTTAGTCAG |
| | Linker short strand | AE6459 | TACTCAGTAGGCGTAGC-NH2 |
| | Linker long strand | AE6460 | GATTGCAATAATCGCGCTACAGGCTACGCCTACTGAG |
| | Linker primer | AE6461 | CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCTGATTGCAATAATCGCGCTACAG |
| CD4+ T cell Donor B BI-D | Second round LTR primer | AE6492 | AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT TAGCTTGAGATCCCTCAGACCCTTTTAGTCAG |
| | Linker short strand | AE6462 | TAGTAGTCACGAGCGTC-NH2 |
| | Linker long strand | AE6463 | CAGTTAGACTACACGTTAGACGGACGCTCGTGACTAC |
| | Linker primer | AE6464 | CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCTCAGTTAGACTACACGTTAGACG |
| CD4+ T cell Donor B A77V | Second round LTR primer | AE6411 | AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT AGTTCCGAGATCCCTCAGACCCTTTTAGTCAG |
| | Linker short strand | AE6456 | TAGACTGACGCAGTCTG-NH2 |
| | Linker long strand | AE6457 | GACGTACATACTGATCGCATAGCAGACTGCGTCAGTC |
| | Linker primer | AE6458 | CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCTGACGTACATACTGATCGCATAG |

(a) Linker-specific primer and second round LTR primers contain DNA-clustering adapter sequences which are color coded as follows: black, complementary to the linker or to the HIV-1 LTR; red, unique index or barcode; green, Illumina sequencing primer binding sites; blue, adapter sequences for DNA clustering.
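
The layout of the Table S6 oligonucleotides can be examined with a short Python sketch. It takes the second segment of each second round LTR primer (the part printed after the space), finds the tail shared by all of them, and reports the sample-specific bases that precede it. It also checks how much of one linker short strand can base-pair with the 3' end of its long strand. The sequences are copied from the table; the interpretation of the variable bases as the "unique index or barcode" is an assumption, since the color coding described in the footnote is not preserved in this text, and the function names are introduced here for illustration.

```python
from os.path import commonprefix

# Second segment (after the space) of each second round LTR primer in Table S6.
SECOND_SEGMENTS = {
    "AE6404": "CGATGTGAGATCCCTCAGACCCTTTTAGTCAG",
    "AE6406": "ACAGTGGAGATCCCTCAGACCCTTTTAGTCAG",
    "AE6405": "TGACCAGAGATCCCTCAGACCCTTTTAGTCAG",
    "AE6410": "AGTCAAGAGATCCCTCAGACCCTTTTAGTCAG",
    "AE6493": "GGCTACGAGATCCCTCAGACCCTTTTAGTCAG",
    "AE6436": "ATGTCAGAGATCCCTCAGACCCTTTTAGTCAG",
    "AE6492": "TAGCTTGAGATCCCTCAGACCCTTTTAGTCAG",
    "AE6411": "AGTTCCGAGATCCCTCAGACCCTTTTAGTCAG",
}

def common_suffix(seqs):
    """Longest suffix shared by every sequence in seqs."""
    return commonprefix([s[::-1] for s in seqs])[::-1]

def revcomp(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

# Shared tail (presumably the LTR-complementary part) and variable bases.
shared_tail = common_suffix(list(SECOND_SEGMENTS.values()))
variable = {name: seq[:len(seq) - len(shared_tail)]
            for name, seq in SECOND_SEGMENTS.items()}

# Linker pair for the HEK293 DMSO sample; the 3' -NH2 group is omitted.
short_strand = "TAGTCCCTTAAGCGGAG"                      # AE6380
long_strand = "GTAATACGACTCACTATAGGGCCTCCGCTTAAGGGAC"   # AE6381

# Number of short-strand bases that pair with the 3' end of the long strand.
rc_short = revcomp(short_strand)
overlap = next(n for n in range(len(rc_short), 0, -1)
               if long_strand.endswith(rc_short[:n]))

print(shared_tail)
print(variable)
print(overlap, "paired bases;", len(short_strand) - overlap, "unpaired 5' bases")
```

Computing the shared tail from the sequences, rather than hard-coding it, keeps the check tied to what the table actually lists.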