Supporting Information Appendix

A critical role for alternative polyadenylation factor CPSF6 in targeting HIV-1 integration to transcriptionally active chromatin

Gregory A. Sowd, Erik Serrao, Hao Wang, Weifeng Wang, Hind J. Fadel, Eric M. Poeschla, and Alan N. Engelman

SI Materials and Methods

SI Figure Legends

SI References

SI Figures (Fig. S1-Fig. S10)

SI Tables (Table S1-Table S16)

SI Materials and Methods

Luciferase Assay and qPCR. At 2 days post infection (dpi), cells were lysed in passive lysis buffer (Promega) by freezing overnight at -80 °C and thawing at 37 °C for 30 min. Insoluble material was pelleted at 17,500 × g for 8 min, and the supernatant was retained for luciferase assays. Protein concentration was quantified using the Pierce BCA Assay Kit (Thermo Fisher Scientific). Luminescence, reported as relative light units (RLU), was measured as previously described (1) and normalized to total protein amount.
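The protein normalization described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes a linear BCA standard curve fitted by ordinary least squares (BCA curves are often fitted with nonlinear models instead), and the function names, standard concentrations, and assayed lysate volume are hypothetical.

```python
"""Sketch: normalize luciferase RLU to total protein estimated by a BCA assay.

Assumption (hypothetical): absorbance is linear in protein concentration over the
range of the standards, so an ordinary least-squares line serves as the standard curve.
"""


def fit_standard_curve(standards_ug_per_ml, absorbances):
    """Return (slope, intercept) of absorbance = slope * concentration + intercept."""
    n = len(standards_ug_per_ml)
    if n != len(absorbances) or n < 2:
        raise ValueError("need at least two paired standards")
    mean_x = sum(standards_ug_per_ml) / n
    mean_y = sum(absorbances) / n
    sxx = sum((x - mean_x) ** 2 for x in standards_ug_per_ml)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(standards_ug_per_ml, absorbances))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rlu_per_ug(rlu, sample_absorbance, slope, intercept, assayed_volume_ul):
    """Convert a luminescence reading to RLU per microgram of total protein.

    assayed_volume_ul is the (hypothetical) volume of lysate read in the luminometer;
    protein in that volume = concentration (ug/mL) * volume (uL) / 1000.
    """
    conc_ug_per_ml = (sample_absorbance - intercept) / slope
    protein_ug = conc_ug_per_ml * assayed_volume_ul / 1000.0
    if protein_ug <= 0:
        raise ValueError("protein estimate must be positive")
    return rlu / protein_ug


if __name__ == "__main__":
    # Hypothetical BSA standards (ug/mL) and A562 readings.
    slope, intercept = fit_standard_curve([0, 250, 500, 1000, 2000],
                                          [0.10, 0.35, 0.60, 1.10, 2.10])
    print(rlu_per_ug(1.0e5, 0.60, slope, intercept, assayed_volume_ul=20))
```

Dividing by the protein contained in the assayed volume, rather than by concentration alone, keeps RLU/µg comparable when different lysate volumes are read.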

For infections analyzed by quantitative PCR (qPCR), residual DNA was removed by treating virus with 0.08 U/µL TURBO DNase (Ambion) for 1 h at 37 °C. Cells were infected for 2 h in the presence of DMSO or 10 µM efavirenz (NIH AIDS Research and Reference Reagent Program) and then washed to remove virus. At 8 h post-infection and at 1, 2, 5, 10, and 15 dpi, DNA was extracted using the DNeasy Blood and Tissue Kit (Qiagen) and subjected to qPCR as described (1), except that 100 ng of DNA was used in each reaction. In parallel with DNA extraction, protein lysates were prepared for luciferase assays at 2, 5, 10, and 15 dpi. qPCR primer sequences used in this study are listed in SI Appendix, Table S16.

Bioinformatics. RNA-seq reads from three independent biological replicates were processed as described (2). Percent distal polyadenylation usage index (PDUI) values were calculated as detailed in (3, 4). RNAs containing 3’ untranslated regions (UTRs) (SI Appendix, Fig. S6D,E) were identified using the DaPars algorithm (3). 3’ UTRs with significant changes in length were determined as before (3), except that the critical P value was lowered to 10⁻⁵. To adjust for differences in integration into 3’ UTR-containing genes (SI Appendix, Fig. S6D), each data set in SI Appendix, Fig. S6E was normalized to a total of 100% integration into 3’ UTR-containing genes before the percent integration into genes with CPSF6-dependent 3’ UTRs was plotted.
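The PDUI quantities and the normalization step in the preceding paragraph can be illustrated with a short sketch. It assumes the DaPars convention (3, 4) that PDUI is the estimated expression of the long (distal) 3’ UTR isoform divided by the summed expression of the long and short (proximal) isoforms; it does not reimplement DaPars isoform estimation or the full significance criteria of reference (3). The helper names are hypothetical, and only the P value cutoff of 10⁻⁵ comes from the text.

```python
"""Sketch of PDUI-style bookkeeping (hypothetical helpers, not DaPars itself)."""

P_VALUE_CUTOFF = 1e-5  # critical P value stated in the text


def pdui(long_isoform_expr, short_isoform_expr):
    """Fraction of 3' UTR expression attributed to the distal (long) isoform."""
    total = long_isoform_expr + short_isoform_expr
    if total <= 0:
        raise ValueError("isoform expression must sum to a positive value")
    return long_isoform_expr / total


def delta_pdui(pdui_knockout, pdui_wt):
    """Delta PDUI = PDUI(knockout) - PDUI(WT); negative values indicate shortening."""
    return pdui_knockout - pdui_wt


def passes_p_cutoff(p_value, cutoff=P_VALUE_CUTOFF):
    """Only the P value criterion; the other criteria of ref. 3 are not modelled."""
    return p_value < cutoff


def normalized_percent(sites_in_cpsf6_dependent, sites_in_all_utr_genes):
    """Percent integration into genes with CPSF6-dependent 3' UTRs after scaling so
    that integration into all 3' UTR-containing genes sums to 100%."""
    if sites_in_all_utr_genes <= 0:
        raise ValueError("no integration sites in 3' UTR-containing genes")
    return 100.0 * sites_in_cpsf6_dependent / sites_in_all_utr_genes
```

For example, a gene whose long isoform falls from 75% to 25% of its 3’ UTR expression in a knockout has a ΔPDUI of -0.5, the shortening direction plotted in SI Appendix, Fig. S4.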

Regions around transcriptional start sites (TSSs) were parsed into 5-kb intervals for counting integration sites (Fig. 3C,D). Gene lengths were split into 20 bins before counting integration sites (Fig. 3F,G). Levels of gene expression were parsed into 30 bins, with each bin containing 530 genes (Fig. 4A; SI Appendix, Fig. S9C). Prior to graphing, integration and gene expression data for each cell line were normalized to 100% integration into genes and to an average expression level of 40 log(CPM), respectively. Intron density data were grouped into 30 bins (29 equal bins for 0-1 introns kb⁻¹ and 1 bin for genes with ≥1 intron kb⁻¹), and integration frequencies within each bin were summed to yield integration as a function of intron density (Fig. 4B-D; SI Appendix, Fig. S9A,B). Percent integration along chromosomes was tabulated using 100 equal bins of chromosome length (SI Appendix, Fig. S6A,B). CytoBand coordinates were obtained from the UCSC Genome Annotation Database (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/).

Percent integration into sequences associated with the activating H4K16ac, H3K36me3, and H3K4me1 epigenetic marks and the repressive H3K9me3 and H3K27me3 marks (SI Appendix, Fig. S6C) was calculated using sequences from National Center for Biotechnology Information Sequence Read Archive (NCBI SRA) accession numbers SRA000206 (5) and SRA000287 (6). Percent integration into sequences contained within CPSF6 ChIP-seq data (SI Appendix, Fig. S6F) was calculated using sequences obtained through SRA694190 (7).
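The binning steps above can be sketched as follows. These are hypothetical helpers for illustration only: equal-count bins for gene expression (30 bins of roughly 530 genes each), 30 intron-density bins, and 100 equal-length bins along a chromosome. The text does not say whether the 29 intron-density bins spanning 0-1 introns kb⁻¹ are equal in width or in gene number; equal width is assumed here.

```python
"""Sketch of the binning used to tabulate integration sites (hypothetical helpers)."""


def equal_count_bins(values, n_bins):
    """Assign each item to one of n_bins bins holding (nearly) equal numbers of
    items, ordered by value (e.g. genes ordered by expression level)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // n
    return bins


def intron_density_bin(density, n_equal_bins=29, upper=1.0):
    """Bins 0..28: equal-width bins over [0, upper) introns/kb (assumed equal width);
    bin 29: genes with >= upper introns/kb."""
    if density >= upper:
        return n_equal_bins
    return min(int(density / upper * n_equal_bins), n_equal_bins - 1)


def chromosome_bin(position, chrom_length, n_bins=100):
    """Index (0..n_bins-1) of the equal-length bin containing a 0-based integer position."""
    return min(position * n_bins // chrom_length, n_bins - 1)


def percent_per_bin(bin_ids, n_bins):
    """Percentage of integration sites falling in each bin."""
    counts = [0] * n_bins
    for b in bin_ids:
        counts[b] += 1
    total = len(bin_ids)
    return [100.0 * c / total for c in counts]
```

With 15,900 genes, `equal_count_bins(expression, 30)` would place exactly 530 genes in each bin, matching the bin size given above.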

Statistical Analyses. A single-factor ANOVA was used to confirm significant changes within each experiment (critical P value < 0.05) and was followed by individual comparisons using one-tailed t-tests assuming unequal variances in Microsoft Excel for Figs. 2 and 5B-D and SI Appendix, Figs. S1B, S3, S5, S7A, and S8A,B. Fisher’s exact test was used for all other statistics, except that P values for the average number of genes Mb⁻¹ and for integration as a function of gene length were calculated using the Wilcoxon rank-sum test.
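For readers reproducing the integration-site comparisons, the two tests named above can be sketched with the Python standard library alone. This is an illustrative sketch, not the software used by the authors: the two-sided Fisher’s exact test sums the probabilities of all 2×2 tables with the same margins that are no more probable than the observed table, and the rank-sum test uses the large-sample normal approximation without a tie correction. Site counts in the usage example are back-calculated from the rounded percentages in SI Appendix, Table S1 and are therefore approximate.

```python
"""Stdlib sketches of a two-sided Fisher's exact test and a Wilcoxon rank-sum test."""
import math


def _log_comb(n, k):
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided P value for the 2x2 table [[a, b], [c, d]].

    Sums hypergeometric probabilities of every table with the same margins whose
    probability does not exceed that of the observed table (small relative tolerance
    for ties). Log space keeps large site counts from overflowing.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    log_denom = _log_comb(n, row1)

    def log_p(k):
        return _log_comb(col1, k) + _log_comb(n - col1, row1 - k) - log_denom

    threshold = log_p(a) + math.log1p(1e-7)
    total = 0.0
    for k in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        lp = log_p(k)
        if lp <= threshold:
            total += math.exp(lp)  # underflows to 0.0 for vanishingly rare tables
    return min(1.0, total)


def _average_ranks(values):
    """1-based ranks, with tied values given the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Wilcoxon rank-sum test, normal approximation (no tie correction).
    Returns (z, two-sided P)."""
    n1, n2 = len(x), len(y)
    ranks = _average_ranks(list(x) + list(y))
    r1 = sum(ranks[:n1])
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (r1 - mean) / sd
    return z, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # siNON vs. MRC, % in genes (Table S1); counts approximated from rounded percentages.
    in_genes_sinon, total_sinon = round(0.673 * 19403), 19403
    in_genes_mrc, total_mrc = round(0.447 * 50000), 50000
    p = fisher_exact_two_sided(in_genes_sinon, total_sinon - in_genes_sinon,
                               in_genes_mrc, total_mrc - in_genes_mrc)
    print(p)
```

Very small P values underflow to 0.0 in double precision, so results of this kind are typically reported as below a floor value rather than as exact numbers.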

SI Figure Legends

Fig. S1. CPSF6 knockdown influences HIV-1 integration site preferences. (A) Western blots of U2OS cells transfected with the indicated siRNAs. Cell lysates were prepared at 0 and 48 hpi. The migration positions of mass standards (kDa) are shown to the left. (B) HIV-Luc infectivity as a function of input virus (0, 0.2, 1.1, 2.2, or 11 ng of p24) for n = 6 independent experiments (averages ± standard error). NS, not significant. (C-E) Frequencies of HIV-1 integration into genes (C), within 5 kb of a CpG island (D), or within 5 kb of a TSS (E) for the indicated samples. Black stars indicate significant differences from siNON, and grey stars indicate significant differences from the matched random control (MRC). P values: *, <0.05; **, <0.01; ***, <0.001; ****, <0.0001. See SI Appendix Tables S1 and S2 for panel C-E values and complete statistical analysis, respectively.

Fig. S2. CRISPR/Cas9-mediated knockout of CPSF6. (A) Diagram of the CPSF6 gene. Exon 6 sequences (red) are included in the larger, 588-amino-acid CPSF6 isoform. Arrows highlight the cleavage sites of the indicated gRNA/Cas9 combinations. (B and D) Upper panels: diagrams of PCR primers surrounding exons 1 and 10 (B) and exon 7 (D) with expected PCR product sizes. Lower panels: ethidium bromide-stained gels of PCR products from the indicated reactions. DNA ladder sizes are in bp. Stars, nonspecific amplification products. (C and E) Light grey boxes: summary of loss-of-function mutations or truncations in the CPSF6 gene as determined by sequencing cloned PCR products. Also shown are amino acid sequence alignments of WT CPSF6[551] protein with predicted mutant proteins for the indicated cell lines (dark grey boxes indicate identical residues; asterisks, stop codons).

Fig. S3. CKO decreases cell growth but has minimal effect on cell cycle progression. (A, B) WST-1 assay results as a function of the number of cells plated, normalized to the absorbance of wild-type (WT) HEK293T cells at 2,000 cells plated. (C) Cell doubling rates starting from 250,000 plated cells. Results in A-C are the average of three independent experiments, with error bars denoting standard deviation. (D) Percentages of cells in each stage of the cell cycle as determined by flow cytometry. Bars are the average of three independent experiments, with error bars showing standard deviation. *, P < 0.05; **, P < 0.01 compared with WT.

Fig. S4. CKO decreases the lengths of 3’ UTRs. (A-C and G, H) Scatter plots of PDUI values from the DaPars algorithm (3) for the indicated cell lines versus WT. (D-F and I, J) ΔPDUI (PDUI(knockout) − PDUI(WT)) plotted against fold change in gene expression for B8 CKO (D), C9 CKO (E), F5 CKO (F), LKO (I), and F6 DKO (J) cells. Each dot represents one gene. Genes with significantly different PDUI values versus WT are shown in red. The criteria used for significance are the same as in reference (3), except that the significance threshold was lowered to 10⁻⁵.

Fig. S5. Profiles of HIV-1 infectivity, LRT and 2-LTR circle formation, and integration. (A-D) Plots of LRT products (A), 2-LTR circles (B), integration (C), and infectivity (D) for the indicated cell lines at the indicated time points. (E) Infectivity of the integrase (IN) active site mutant D64N/D116N (NN) relative to WT virus in the indicated cell lines. (F) Fold changes in WT and NN virus infectivity (black and grey bars, respectively), normalized to 1 for the indicated virus on WT cells. Data points are the average of two (D) or three (A-C, E, F) independent experiments, with error bars showing standard error. *, P < 0.05; **, P < 0.01; ***, P < 0.001 compared with WT. Portions of these data appear in main text Fig. 2.

Fig. S6. Integration profiles as a function of chromosome position, histone posttranslational modification, 3’ UTR, and CPSF6 chromatin binding. (A, B) Cartoon depictions of representative chromosomes 7 (A) and 17 (B) divided into 100 equally sized bins (x-axis), with the percentage of integration sites normalized to the total number of events per chromosome (y-axis). Five increasingly dark shades of blue indicate the five degrees of Giemsa staining: negative, 25%, 50%, 75%, and 100%. Red denotes centromere positions. Green arrows denote points where the WT and LKO integration targeting patterns deviate significantly from those observed in CKO and DKO cells. (C) Frequencies of WT and N74D integration in the indicated cell types with respect to epigenetic marks associated with gene activation (H4K16ac, H3K36me3, and H3K4me1) or repression (H3K9me3 and H3K27me3) (5, 6). Dotted lines, MRC. (D) HIV-1 integration into 3’ UTR-containing genes for the indicated cell lines. (E) The effect of CKO, LKO, and DKO on normalized integration into genes with CPSF6-regulated 3’ UTRs. (F) Integration into regions identified by CPSF6 ChIP-seq. P values for the indicated comparisons are shown in the lower panels of C, D, E, and F.

Fig. S7. Moloney murine leukemia virus (MLV) integration is not affected by CKO. (A) Infectivity of MLV-Luc in the indicated cell lines as a function of virus input. NS, not significant. (B-E) MLV integration frequencies into genes (B), gene-dense regions (C), within 5 kb of CpG islands (D), and within 5 kb of TSSs (E) for the indicated cell lines. Black asterisks in B, D, and E denote significant differences versus WT; gray asterisks denote differences versus the MRC. *, P < 0.05; **, P < 0.01; ****, P < 0.0001. See SI Appendix Tables S10 and S11 for corresponding values and complete statistical analysis, respectively.

Fig. S8. Proliferative capacity and 3’ UTR shortening are rescued by CPSF6 complementation. Cell metabolism (A) and doubling rates (B) for the specified cell lines (n = 3 independent experiments; error bars show standard deviation). NS, not significant. (C-F) PDUI values for the indicated complemented cell lines versus WT control cells transduced with empty expression vector. Each dot represents one gene. Genes with significantly different PDUI values versus the WT control are shown in red. The criteria used for significance are the same as in reference (3), except that the P value required for significance was lowered to 10⁻⁵.

Fig. S9. Integration as a function of intron density and gene expression in CPSF6-complemented cells. (A) Integration as a function of intron density (x-axis sorted into 29 equal bins). (B) Integration into genes with intron density ≥1 intron kb⁻¹. (C) Integration as a function of gene expression in the indicated cell lines (gene expression levels on the x-axis were parsed into 30 equal bins based on gene number).

Fig. S10. Roles for CPSF6 and LEDGF/p75 in HIV-1 integration targeting in primary MDM. (A) Representative immunoblots of cells transfected with the indicated siRNAs at 0 and 48 hpi. The arrowhead highlights LEDGF/p75, which barely separated from a more slowly migrating, nonspecific cross-reactive band (*). Mass standard positions (kDa) are shown to the left. (B-D) Integration frequencies into genes (B) and near TSSs and CpG islands (C and D, respectively). (E) Integration distribution as a function of gene density. Black asterisks in panels B-D show significant differences from cells transfected with siNON control siRNA, whereas gray asterisks show significant differences versus MRC. *, P < 0.05; ****, P < 0.0001. See SI Appendix Tables S14 and S15 for corresponding values and complete statistical analysis, respectively.

SI References

1. Jurado KA, et al. (2013) Allosteric integrase inhibitor potency is determined through the inhibition of HIV-1 particle maturation. Proc Natl Acad Sci U S A 110(21):8690-8695.

2. Wang H, Shun MC, Dickson AK, & Engelman AN (2015) Embryonic lethality due to arrested cardiac development in Psip1/Hdgfrp2 double-deficient mice. PLoS One 10(9):e0137797.

3. Masamha CP, et al. (2014) CFIm25 links alternative polyadenylation to glioblastoma tumour suppression. Nature 510(7505):412-416.

4. Xia Z, et al. (2014) Dynamic analyses of alternative polyadenylation from RNA-seq reveal a 3'-UTR landscape across seven tumour types. Nat Commun 5:5274.

5. Barski A, et al. (2007) High-resolution profiling of histone methylations in the human genome. Cell 129(4):823-837.

6. Wang Z, et al. (2008) Combinatorial patterns of histone acetylations and methylations in the human genome. Nat Genet 40(7):897-903.

7. Katahira J, et al. (2013) Human TREX component Thoc5 affects alternative polyadenylation site choice by recruiting mammalian cleavage factor I. Nucleic Acids Res 41(14):7060-7072.

8. Maertens G, et al. (2003) LEDGF/p75 is essential for nuclear and chromosomal targeting of HIV-1 integrase in human cells. J Biol Chem 278(35):33528-33539.

9. Vandegraaff N, Devroe E, Turlure F, Silver PA, & Engelman A (2006) Biochemical and genetic analyses of integrase-interacting proteins lens epithelium-derived growth factor (LEDGF)/p75 and hepatoma-derived growth factor related protein 2 (HRP2) in preintegration complex function and HIV-1 replication. Virology 346(2):415-426.

10. Matreyek KA & Engelman A (2011) The requirement for nucleoporin NUP153 during human immunodeficiency virus type 1 infection is determined by the viral capsid. J Virol 85(15):7818-7827.

11. De Ravin SS, et al. (2014) Enhancers are major targets for murine leukemia virus vector integration. J Virol 88(8):4504-4513.

Sowd_Fig. S1 (image not reproduced). (A) Immunoblots for CPSF6 (N-terminal antibody), LEDGF/p75, and actin at 0 and 48 hpi. (B) Infectivity (RLU/µg) versus HIV-Luc input (ng p24). (C-E) % integration into genes (C), within ±2.5 kb of CpG islands (D), and within ±2.5 kb of TSSs (E) for siNON, siC9, siC11, siL3, siL21, siC9 + siL3, and siC11 + siL21, with MRC shown for reference.

Sowd_Fig. S2 (image not reproduced). (A) CPSF6 gene map (exons 1-10) with gRNA target sites a-d. (B, D) PCR genotyping schematics and gels for WT, B8, LKO, F6, C7, C9, F5, F9, E4, and E5 cells. (C, E) Summaries of CPSF6 alleles and alignments of predicted mutant proteins with CPSF6[551] for the knockout clones.

Sowd_Fig. S3 (image not reproduced). (A, B) WST-1 signal (% WT, 2,000 cells plated) versus number of cells plated. (C) Cell number versus time (days). (D) Percentage of cells in each cell cycle phase (<G1, G0/G1, S, G2/M, >G2). Cell lines: WT, LKO, B8 CKO, C9 CKO, F5 CKO, F9 CKO, F6 DKO, E5 DKO, and E4 DKO.

Sowd_Fig. S5 (image not reproduced). (A) LRT (% WT at 8 hpi) versus time (dpi). (B) 2-LTR circles (% WT at 24 hpi) versus time (hpi). (C) Integration (% WT) at 10 and 15 dpi. (D) Infectivity (RLU/µg) versus time (dpi). (E) % infectivity (IN NN/WT virus). (F) Infectivity (fold WT) of HIV-Luc WT and HIV-Luc IN N/N. Cell lines: WT, LKO, B8 CKO, F5 CKO, F6 DKO, and E4 DKO; EFV, efavirenz control.

Sowd_Fig. S6 (images not reproduced). (A, B) Percent integration along chromosomes 7 and 17 in 100 equal bins. (C) % integration into chromatin bearing activating (H4K16ac, H3K36me3, H3K4me1) or repressive (H3K9me3, H3K27me3) marks for CA WT virus in WT, LKO, CKO, and DKO cells and for CA N74D virus in WT cells (WT/CA N74D), with MRC. (D) % integration into genes with 3’ UTRs. (E) Normalized % integration into genes with CPSF6-dependent 3’ UTRs versus non-significantly affected 3’ UTRs. (F) % integration into CPSF6 ChIP-seq regions.

Fig. S6C p values (row vs. column; WT, LKO, CKO, and DKO refer to CA WT virus; WT/CA N74D refers to CA N74D virus in WT cells)

Integration into H4K16-acetylated chromatin
Row vs. column | WT | LKO | CKO | DKO | WT/CA N74D
LKO | 6.41E-06
CKO | 1.08E-33 | 9.69E-13
DKO | 9.67E-27 | 9.36E-09 | 0.13
WT/CA N74D | 3.04E-31 | 3.96E-13 | 0.32 | 0.020
MRC | 4.52E-25 | 8.43E-07 | 1.82E-03 | 0.14 | 1.87E-04

Integration into H3K36-trimethylated chromatin
Row vs. column | WT | LKO | CKO | DKO | WT/CA N74D
LKO | 4.20E-07
CKO | 1.88E-66 | 2.41E-31
DKO | 2.63E-56 | 1.24E-24 | 0.13
WT/CA N74D | 1.59E-41 | 7.58E-18 | 0.06 | 0.59
MRC | 7.28E-43 | 1.90E-14 | 3.10E-08 | 1.14E-04 | 7.04E-03

Integration into H3K4-monomethylated chromatin
Row vs. column | WT | LKO | CKO | DKO | WT/CA N74D
LKO | 3.31E-28
CKO | <2.2E-305 | 3.10E-159
DKO | 6.38E-232 | 9.93E-93 | 2.19E-12
WT/CA N74D | 1.05E-221 | 1.26E-100 | 0.009 | 4.47E-04
MRC | 1.05E-185 | 5.21E-56 | 4.97E-50 | 7.13E-13 | 7.99E-23

Integration into H3K9-trimethylated chromatin
Row vs. column | WT | LKO | CKO | DKO | WT/CA N74D
LKO | 0.019
CKO | 1.43E-35 | 6.32E-23
DKO | 1.85E-28 | 2.09E-17 | 0.13
WT/CA N74D | 2.59E-27 | 1.77E-17 | 0.71 | 0.34
MRC | 3.59E-24 | 1.65E-13 | 2.05E-04 | 0.043 | 5.93E-03

Integration into H3K27-trimethylated chromatin
Row vs. column | WT | LKO | CKO | DKO | WT/CA N74D
LKO | 8.34E-03
CKO | 0.67 | 1.04E-03
DKO | 0.032 | 0.49 | 4.67E-03
WT/CA N74D | 0.69 | 0.033 | 0.40 | 0.10
MRC | 2.04E-03 | 1.00 | 7.56E-05 | 0.41 | 0.014

Fig. S6D p values (integration into genes with 3’ UTRs)
Row vs. column | WT | LKO | CKO | DKO
LKO | 1.11E-252
CKO | <2.2E-305 | 3.25E-51
DKO | 1.94E-152 | 1.16E-13 | 0.81
MRC | <2.2E-305 | <2.2E-305 | <2.2E-305 | 1.83E-72

Fig. S6E p values (normalized integration into genes with CPSF6-dependent 3’ UTRs)
Row vs. column | WT | LKO | CKO | DKO
LKO | 3.91E-07
CKO | 9.94E-35 | 8.14E-69
DKO | 2.33E-04 | 8.89E-11 | 9.61E-03
MRC | 0.29 | 1.15E-06 | 2.86E-58 | 1.38E-05

Fig. S6F p values (integration into CPSF6 ChIP-seq regions)
Row vs. column | WT | LKO | CKO | DKO | WT/CA N74D
LKO | 0.32
CKO | 1.52E-08 | 6.01E-06
DKO | 3.72E-07 | 8.49E-05 | 0.47
WT/CA N74D | 5.51E-05 | 2.45E-03 | 0.29 | 0.66
MRC | 1.78E-23 | 3.37E-18 | 2.71E-05 | 3.54E-07 | 2.29E-06

Sowd_Fig. S7 (image not reproduced). (A) Infectivity (RLU/µg) versus MLV input (µL). (B-E) % integration into genes (B), % of integrations versus gene density (±500 kb) (C), within ±2.5 kb of CpG islands (D), and within ±2.5 kb of TSSs (E) for WT, LKO, CKO, and DKO cells, with MRC.

Sowd_Fig. S8 (image not reproduced). (A) % WT (2,000 cells plated) versus number of cells plated. (B) Cell number versus time (days). Cell lines: WT-vector, CKO-vector, CKO-CPSF6[551], CKO-CPSF6[588], and CKO-CPSF6[551]-F284A.

Sowd_Fig. S9 (image not reproduced). (A) % integration into genes versus intron density (intron number/kb). (B) % integration into genes with ≥1 intron/kb. (C) Normalized % integration into genes versus relative gene expression (log(CPM)). Cell lines: WT-vector, CKO-vector, CKO-CPSF6[551], CKO-CPSF6[588], and CKO-CPSF6[551]-F284A, with MRC shown for reference.

Sowd_Fig. S10 (image not reproduced). (A) Immunoblots for LEDGF/p75, CPSF6 (N-terminal antibody), and actin at 0 and 48 hpi for siNON, siC11, and siL3. (B-D) % integration into genes (B), within ±2.5 kb of TSSs (C), and within ±2.5 kb of CpG islands (D). (E) % of integrations versus gene density (±500 kb), with MRC.

Table S1: HIV-1 integration in U2OS cells
siRNA #1 | siRNA #2 | Unique sites | % in genes | % ±2.5 kb of CpG islands | % ±2.5 kb of TSSs | Gene density (±500 kb)
siNON | - | 19403 | 67.3% | 4.9% | 4.1% | 16.3
siC9 | siNON | 7356 | 45.2% | 0.9% | 1.5% | 5.4
siC11 | siNON | 14625 | 45.5% | 0.9% | 1.7% | 5.2
siL3 | siNON | 19474 | 65.4% | 5.9% | 5.1% | 15.6
siL21 | siNON | 21899 | 67.2% | 5.1% | 4.4% | 16.1
siC9 | siL3 | 8793 | 49.3% | 1.8% | 2.3% | 6.3
siC11 | siL21 | 14025 | 45.8% | 1.1% | 2.1% | 5.5
Matched random control (MRC) | - | 50000 | 44.7% | 4.2% | 4.0% | 8.7

Table S2: Statistics for HIV-1 integration in U2OS cells (p values; row vs. column)

p values: percentage of integrations in genes
Row vs. column | MRC | siNON | siC9 | siC11 | siL3 | siL21 | siC9 + siL3
siNON | <2.2E-302
siC9 | 0.37 | 2.67E-236
siC11 | 0.08 | <2.2E-302 | 0.72
siL3 | <2.2E-302 | 9.75E-05 | 6.32E-197 | 1.95E-297
siL21 | <2.2E-302 | 0.92 | 4.48E-243 | <2.2E-302 | 9.64E-05
siC9 + siL3 | 4.18E-16 | 4.44E-179 | 1.77E-07 | 9.85E-09 | 5.64E-143 | 1.08E-184
siC11 + siL21 | 0.02 | <2.2E-302 | 0.44 | 0.61 | 8.15E-282 | <2.2E-302 | 1.56E-07

p values: percentage of integrations within ±2.5 kb of a CpG island
Row vs. column | MRC | siNON | siC9 | siC11 | siL3 | siL21 | siC9 + siL3
siNON | 4.83E-05
siC9 | 1.28E-59 | 3.88E-68
siC11 | 1.61E-105 | 9.74E-111 | 0.88
siL3 | 1.51E-21 | 7.48E-06 | 5.21E-92 | 4.75E-151
siL21 | 2.66E-08 | 0.2793178 | 4.78E-75 | 4.35E-124 | 4.4E-5
siC9 + siL3 | 1.71E-32 | 8.31E-41 | 8.32E-07 | 1.02E-08 | 2.46E-62 | 8.66E-47
siC11 + siL21 | 2.60E-89 | 7.01E-96 | 0.22 | 0.19 | 2.75E-133 | 7.10E-108 | 6.68E-06

p values: percentage of integrations within ±2.5 kb of a TSS
Row vs. column | MRC | siNON | siC9 | siC11 | siL3 | siL21 | siC9 + siL3
siNON | 0.85
siC9 | 8.61E-32 | 3.27E-28
siC11 | 2.01E-46 | 3.82E-37 | 0.29
siL3 | 2.32E-09 | 1.84E-06 | 3.02E-46 | 9.43E-65
siL21 | 0.03 | 0.12 | 1.92E-34 | 4.28E-47 | 8.7E-4
siC9 + siL3 | 7.80E-17 | 1.05E-14 | 4.6E-4 | 2.8E-3 | 7.65E-30 | 1.59E-19
siC11 + siL21 | 4.10E-30 | 2.95E-24 | 3.1E-3 | 0.02 | 4.01E-47 | 4.76E-32 | 0.37

p values: average number of genes per integration (±500 kb of integration site)
Row vs. column | MRC | siNON | siC9 | siC11 | siL3 | siL21 | siC9 + siL3
siNON | <2.2E-302
siC9 | 1.43E-286 | <2.2E-302
siC11 | 1.50E-289 | <2.2E-302 | 0.32
siL3 | <2.2E-302 | 3.6E-3 | <2.2E-302 | <2.2E-302
siL21 | <2.2E-302 | 0.21 | <2.2E-302 | <2.2E-302 | 0.08
siC9 + siL3 | 1.56E-151 | <2.2E-302 | 8.48E-25 | 4.23E-29 | <2.2E-302 | <2.2E-302
siC11 + siL21 | <2.2E-302 | <2.2E-302 | 0.01 | 3.2E-4 | <2.2E-302 | <2.2E-302 | 1.27E-20

Table S3: HIV-1 integration in representative CKO cell lines
Cell line | Virus | Unique sites | % in genes | % ±2.5 kb of CpG islands | % ±2.5 kb of TSSs | Gene density (±500 kb)
WT | WT | 41696 | 78.9% | 5.6% | 4.2% | 20.3
B8 | WT | 66886 | 55.3% | 1.0% | 1.7% | 5.7
C9 | WT | 14478 | 57.1% | 1.1% | 1.8% | 6.0
F5 | WT | 57763 | 58.7% | 1.1% | 1.8% | 6.1
WT | CA N74D | 6981 | 64.7% | 1.2% | 2.2% | 6.2
B8 | CA N74D | 10796 | 57.1% | 0.9% | 1.7% | 5.9
C9 | CA N74D | 5959 | 59.1% | 0.9% | 1.6% | 6.2
F5 | CA N74D | 5307 | 62.6% | 1.7% | 2.0% | 7.2
MRC | - | 50000 | 44.7% | 4.2% | 4.0% | 8.7

Table S4: Statistics for HIV-1 integration in CKO cell lines
p values: percentage of integrations in genes Virus CA WT CA N74D Cell line WT B8 C9 F5 WT WT B8 < 2.2E-305 CA WT C9 < 2.2E-305 7.57E-05 F5 < 2.2E-305 1.04E-33 4.86E-04 CA N74D WT 4.42E-139 1.03E-51 2.76E-26 5.92E-22 MRC < 2.2E-305 2.67E-284 1.27E-153 < 2.2E-305 2.36E-217

Virus CA N74D CA WT Cell line WT B8 C9 F5 B8 C9 F5 WT B8 1.17E-23 < 2.2E-305 CA N74D C9 7.58E-11 0.014 9.29E-03 F5 0.01788513 3.77E-11 1.52E-04 3.50E-08 MRC 2.36E-217 1.35E-122 9.02E-99 5.70E-137

p values: average number of genes/ integration (+/- 500 kb of integration site) Virus CA WT CA N74D Cell line WT B8 C9 F5 WT WT B8 < 2.2E-305 CA WT C9 < 2.2E-305 1.60E-10 F5 < 2.2E-305 3.08E-37 0.14 CA N74D WT < 2.2E-305 1.92E-17 1.15E-03 8.45E-03 MRC < 2.2E-305 < 2.2E-305 3.38E-218 < 2.2E-305 1.35E-90

Virus CA N74D CA WT Cell line WT B8 C9 F5 B8 C9 F5 WT B8 1.23E-05 8.08E-05 CA N74D C9 0.86 6.77E-05 3.83E-03 F5 4.61E-08 8.06E-23 5.07E-08 6.25E-20 MRC 1.35E-90 8.72E-196 4.50E-80 4.44E-28

p values: percentage of integrations +/- 2.5 kb of a TSS Virus CA WT CA N74D Cell line WT B8 C9 F5 WT WT B8 1.04E-135 CA WT C9 4.31E-48 0.44 F5 2.92E-114 0.17 0.94 CA N74D WT 3.61E-17 1.23E-03 0.02 0.01 MRC 0.14 1.15E-130 1.14E-43 1.35E-108 7.02E-15

Virus CA N74D CA WT Cell line WT B8 C9 F5 B8 C9 F5 WT B8 0.022 0.66 CA N74D C9 8.70E-03 0.49 0.41 F5 1.78E-52 0.23 0.10 0.24 MRC 7.02E-15 1.03E-35 7.95E-25 6.38E-15

Table S4 (continued)
p values: percentage of integrations +/- 2.5 kb of a CpG island Virus CA WT CA N74D Cell line WT B8 C9 F5 WT WT B8 < 2.2E-305 CA WT C9 7.30E-150 0.33 F5 < 2.2E-305 0.17 0.93 CA N74D WT 1.49E-74 0.08 0.40 0.27 MRC 9.45E-24 1.90E-285 1.64E-91 8.19E-245 4.99E-44

Virus CA N74D CA WT Cell line WT B8 C9 F5 B8 C9 F5 WT B8 0.03 0.29 CA N74D C9 0.07 1.00 0.22 F5 0.04 1.60E-05 2.24E-04 2.05E-04 MRC 4.99E-44 3.49E-84 9.17E-50 1.29E-23

Table S5: HIV-1 integration in CKO, LKO, and DKO cells
Cell line | Virus | Unique sites | % in genes | % ±2.5 kb of CpG islands | % ±2.5 kb of TSSs | Gene density (±500 kb)
WT | WT | 23026 | 82.7% | 5.5% | 4.3% | 20.7
LKO | WT | 21981 | 62.8% | 11.6% | 10.1% | 13.7
CKO | WT | 31761 | 57.0% | 1.0% | 1.8% | 5.8
DKO | WT | 31979 | 48.3% | 5.1% | 5.3% | 6.6
WT | CA N74D | 19488 | 63.3% | 1.5% | 2.0% | 6.2
LKO | CA N74D | 13158 | 54.8% | 7.6% | 7.6% | 8.8
CKO | CA N74D | 16984 | 58.2% | 1.1% | 1.7% | 6.1
DKO | CA N74D | 13404 | 55.2% | 6.2% | 6.0% | 7.9
MRC | - | 50000 | 44.7% | 4.2% | 4.0% | 8.7

Table S6: Statistics for HIV-1 integration in CKO, LKO, and DKO cells
p values: percentage of integrations in genes Virus CA WT CA N74D Cell line WT LKO CKO DKO WT WT LKO < 2.2E-305 CA WT CKO < 2.2E-305 4.09E-26 DKO < 2.2E-305 9.47E-186 2.45E-92 CA N74D WT < 2.2E-305 2.08E-03 2.97E-41 1.41E-213 MRC < 2.2E-305 < 2.2E-305 1.16E-233 3.83E-24 < 2.2E-305

Virus CA N74D CA WT Cell line WT LKO CKO DKO LKO CKO DKO WT LKO 6.58E-62 2.41E-46 CA N74D CKO 6.11E-21 3.11E-14 0.01 DKO 1.32E-68 0.51 5.32E-17 9.95E-19 MRC < 2.2E-305 2.95E-69 4.14E-186 1.41E-64

p values: percentage of integrations +/- 2.5 kb of a CpG island Virus CA WT CA N74D Cell line WT LKO CKO DKO WT WT LKO 6.22E-112 CA WT CKO 1.14E-218 < 2.2E-305 DKO 0.07 1.66E-151 1.17E-222 CA N74D WT 1.32E-121 < 2.2E-305 5.63E-06 3.74E-118 MRC 3.20E-14 1.59E-260 7.33E-181 7.07E-10 3.32E-86

Virus CA N74D CA WT Cell line WT LKO CKO DKO LKO CKO DKO WT LKO < 2.2E-305 2.36E-34 CA N74D CKO < 2.2E-305 1.68E-135 0.24 DKO < 2.2E-305 1.23E-07 1.57E-84 5.38E-04 MRC < 2.2E-305 3.20E-53 1.77E-53 6.70E-18

p values: percentage of integrations +/- 2.5 kb of a TSS Virus CA WT CA N74D Cell line WT LKO CKO DKO WT WT LKO 1.37E-124 CA WT CKO 3.05E-66 < 2.2E-305 DKO 6.93E-09 5.66E-89 2.52E-135 CA N74D WT 2.34E-42 3.28E-276 0.12 7.35E-87 MRC 0.21 1.32E-193 3.42E-79 3.04E-17 3.33E-46

Virus CA N74D CA WT Cell line WT LKO CKO DKO LKO CKO DKO WT LKO 5.31E-129 1.04E-15 CA N74D CKO 0.11 2.40E-134 0.83 DKO 3.95E-77 1.07E-07 2.30E-83 0.04 MRC 3.33E-46 1.96E-53 1.67E-52 6.69E-18

Table S6 (continued)
p values: average number of genes/integration (+/- 500 kb of integration site)

Virus CA WT CA N74D Cell line WT LKO CKO DKO WT WT LKO < 2.2E-305 CA WT CKO < 2.2E-305 < 2.2E-305 DKO < 2.2E-305 < 2.2E-305 0.01 CA N74D WT < 2.2E-305 < 2.2E-305 7.13E-17 5.95E-08 MRC < 2.2E-305 < 2.2E-305 < 2.2E-305 < 2.2E-305 1.33E-220

Virus CA N74D CA WT Cell line WT LKO CKO DKO LKO CKO DKO WT LKO 3.07E-120 < 2.2E-305 CA N74D CKO 1.74E-07 2.48E-155 0.04 DKO 4.18E-58 1.01E-11 1.34E-86 8.66E-97 MRC 1.33E-220 0.61 8.60E-274 6.59E-20

Table S7: Statistics for HIV-1 integration along gene bodies

p values: percentage integration along the length of genes
Row vs. column | WT | LKO | CKO | DKO
LKO | 4.03E-08
CKO | 1.20E-39 | 7.51E-58
DKO | 0.02 | 0.81 | 7.14E-14
MRC | 1.95E-29 | 1.61E-47 | 0.03 | 6.00E-11

p values: percentage integration along the length of genes, CA N74D virus
Row vs. column | WT (CA WT) | WT (CA N74D) | LKO (CA N74D) | CKO (CA N74D) | DKO (CA N74D)
WT (CA N74D) | 1.42E-23
LKO (CA N74D) | 2.41E-07 | 4.49E-31
CKO (CA N74D) | 1.27E-23 | 0.53 | 1.59E-31
DKO (CA N74D) | 3.29E-03 | 5.04E-22 | 0.09 | 1.18E-22
MRC | 1.95E-29 | 0.94 | 0.01 | 0.43 | 1.06E-24

Table S8: Statistics for HIV-1 integration correlated with gene expression
p values: percentage integration into various gene expression bins
<5 CPM Virus CA WT CA N74D Cell line WT LKO CKO DKO WT WT 8.32E-117 LKO 6.72E-84 CA WT CKO 6.31E-168 4.79E-12 1.29E-07 DKO 8.60E-97 6.84E-20 2.64E-08 MRC < 2.2E-305 < 2.2E-305 < 2.2E-305 2.92E-46 2.38E-96
5≤ Gene Expression< 10 CPM Virus CA WT CA N74D Cell line WT LKO CKO DKO WT WT 2.17E-39 LKO 3.38E-108 CA WT CKO 3.00E-180 4.33E-07 2.54E-07 DKO 2.17E-58 3.07E-03 0.75 MRC 4.11E-195 4.78E-06 0.32 0.43 1.45E-14
10≤ Gene Expression< 20 CPM Virus CA WT CA N74D Cell line WT LKO CKO DKO WT WT 0.77 LKO 0.02 CA WT CKO 0.36 0.10 0.28 DKO 2.69E-08 4.10E-12 5.62E-10 MRC 3.38E-71 8.78E-93 1.09E-97 1.67E-03 2.82E-06
20≤ Gene Expression< 50 CPM Virus CA WT CA N74D Cell line WT LKO CKO DKO WT WT 0.23 LKO 0.67 CA WT CKO 0.10 0.25 0.13 DKO 4.49E-14 2.86E-13 6.87E-12 MRC 2.59E-78 4.85E-73 8.55E-79 0.19 1.20E-13
50≤ Gene Expression< 100 CPM Virus CA WT CA N74D Cell line WT LKO CKO DKO WT WT 3.80E-27 LKO 4.82E-75 CA WT CKO 1.39E-58 1.77E-04 0.91 DKO 1.37E-15 0.03 0.82 MRC < 2.2E-305 3.17E-75 1.27E-138 2.70E-31 1.07E-48
≥ 100 CPM Virus CA WT CA N74D Cell line WT LKO CKO DKO WT WT 2.05E-50 LKO 2.52E-06 CA WT CKO 3.71E-89 1.41E-49 0.67 DKO 1.23E-39 1.72E-26 9.32E-04 MRC < 2.2E-305 3.58E-304 6.23E-120 1.61E-12 3.53E-40

Table S9: Statistics for HIV-1 integration into genes with varied intron densities (p values; row vs. column; WT, LKO, CKO, and DKO refer to CA WT virus)

p values: percentage integration into genes with <0.135 introns/kb
Row vs. column | WT | LKO | CKO | DKO | WT (CA N74D)
LKO | 5.27E-59
CKO | <2.2E-305 | 1.07E-210
DKO | 1.48E-210 | 3.66E-105 | 1.85E-09
WT (CA N74D) | <2.2E-305 | 1.58E-118 | 8.99E-07 | 4.83E-17
MRC | <2.2E-305 | 1.18E-225 | 0.10 | 6.65E-12 | 1.07E-04

p values: percentage integration into genes with 0.135 ≤ intron density < 0.467 introns/kb
Row vs. column | WT | LKO | CKO | DKO | WT (CA N74D)
LKO | 1.81E-107
CKO | 8.85E-186 | 5.90E-07
DKO | 8.60E-44 | 0.07 | 0.47
WT (CA N74D) | 1.41E-88 | 0.18 | 3.05E-10 | 0.01
MRC | <2.2E-305 | 1.69E-102 | 4.78E-73 | 5.14E-18 | 2.96E-109

p values: percentage integration into genes with ≥0.467 introns/kb
Row vs. column | WT | LKO | CKO | DKO | WT (CA N74D)
LKO | 0.51
CKO | <2.2E-305 | <2.2E-305
DKO | 1.79E-37 | 2.03E-35 | 7.24E-252
WT (CA N74D) | 8.65E-252 | 3.43E-238 | <2.2E-305 | 2.77E-10
MRC | 4.70E-242 | 6.44E-224 | <2.2E-305 | 0.04 | 7.03E-18

Table S10: MoMLV integration in LKO, CKO, and DKO cell lines
Cell line | Virus | Unique sites | % in genes | % ±2.5 kb of CpG islands | % ±2.5 kb of TSSs | Gene density (±500 kb)
WT | MoMLV | 693 | 56.7% | 50.2% | 46.6% | 15.8
LKO | MoMLV | 813 | 53.3% | 55.8% | 54.1% | 15.9
CKO | MoMLV | 267 | 56.2% | 50.9% | 47.2% | 16.3
DKO | MoMLV | 411 | 51.1% | 40.1% | 40.1% | 13.4
MRC | - | 50000 | 44.7% | 4.2% | 4.0% | 8.7

Table S11: Statistics for MoMLV integration in LKO, CKO, and DKO cell lines (p values; row vs. column)

p values: percentage of integrations in genes
Row vs. column | WT | LKO | CKO | DKO
LKO | 0.19
CKO | 0.88 | 0.44
DKO | 0.08 | 0.51 | 0.21
MRC | 3.05E-10 | 1.26E-06 | 2.01E-04 | 9.55E-03

p values: percentage of integrations within ±2.5 kb of a CpG island
Row vs. column | WT | LKO | CKO | DKO
LKO | 0.030
CKO | 0.89 | 0.18
DKO | 1.19E-03 | 2.51E-07 | 7.10E-03
MRC | 8.92E-269 | <2.2E-302 | 1.09E-109 | 2.64E-111

p values: percentage of integrations within ±2.5 kb of a TSS
Row vs. column | WT | LKO | CKO | DKO
LKO | 3.80E-03
CKO | 0.89 | 0.06
DKO | 0.04 | 4.11E-06 | 0.08
MRC | 1.91E-242 | <2.2E-302 | 9.89E-99 | 3.99E-114

p values: average number of genes per integration (±500 kb)
Row vs. column | WT | LKO | CKO | DKO
LKO | 0.22
CKO | 0.87 | 0.34
DKO | 3.31E-03 | 7.48E-05 | 0.03
MRC | 5.10E-65 | 9.60E-90 | 1.51E-24 | 9.79E-20

Table S12: HIV-1 integration in CPSF6-complemented cells
Cell line | Virus | Unique sites | % in genes | % +/- 2.5 kb CpG | % +/- 2.5 kb TSS | Gene density (+/- 500 kb)
WT-vector | HIV-1 WT | 24922 | 80.6% | 5.3% | 4.0% | 20.5
B8-CPSF6[551] | HIV-1 WT | 34602 | 75.6% | 5.0% | 4.0% | 18.1
B8-CPSF6[588] | HIV-1 WT | 19171 | 74.3% | 4.7% | 3.9% | 17.3
B8-CPSF6[551]-F284A | HIV-1 WT | 29894 | 55.4% | 1.1% | 1.8% | 5.4
B8-vector | HIV-1 WT | 29563 | 56.1% | 1.0% | 1.9% | 5.8
MRC | - | 50000 | 44.7% | 4.2% | 4.0% | 8.7

Table S13: Statistics for HIV-1 integration into CPSF6-complemented cells

p values: percentage of integrations in genes
Cell line | WT-vector | B8-CPSF6[551] | B8-CPSF6[588] | B8-vector | B8-CPSF6[551]-F284A
B8-CPSF6[551] | 2.33E-48
B8-CPSF6[588] | 9.83E-56 | 1.20E-03
B8-vector | < 2.2E-302 | < 2.2E-302 | < 2.2E-302
B8-CPSF6[551]-F284A | < 2.2E-302 | < 2.2E-302 | < 2.2E-302 | 0.08
MRC | < 2.2E-302 | < 2.2E-302 | < 2.2E-302 | 1.36E-213 | 1.38E-189

p values: percentage of integrations +/- 2.5 kb of a CpG island
Cell line | WT-vector | B8-CPSF6[551] | B8-CPSF6[588] | B8-vector | B8-CPSF6[551]-F284A
B8-CPSF6[551] | 0.16
B8-CPSF6[588] | 0.01 | 0.19
B8-vector | 3.64E-194 | 3.05E-201 | 3.80E-143
B8-CPSF6[551]-F284A | 9.48E-193 | 5.41E-200 | 7.03E-142 | 0.84
MRC | 7.30E-11 | 3.77E-08 | 1.79E-03 | 8.95E-164 | 2.40E-162

p values: percentage of integrations +/- 2.5 kb of a TSS
Cell line | WT-vector | B8-CPSF6[551] | B8-CPSF6[588] | B8-vector | B8-CPSF6[551]-F284A
B8-CPSF6[551] | 0.90
B8-CPSF6[588] | 0.68 | 0.57
B8-vector | 2.89E-50 | 1.03E-58 | 2.78E-41
B8-CPSF6[551]-F284A | 9.72E-52 | 1.62E-60 | 1.31E-42 | 0.86
MRC | 0.91 | 0.99 | 0.54 | 1.24E-66 | 7.90E-69

p values: average number of genes/integration (+/- 500 kb of integration site)
Cell line | WT-vector | B8-CPSF6[551] | B8-CPSF6[588] | B8-vector | B8-CPSF6[551]-F284A
B8-CPSF6[551] | 2.96E-120
B8-CPSF6[588] | 3.89E-136 | < 2.2E-302
B8-vector | < 2.2E-302 | 8.94E-06 | < 2.2E-302
B8-CPSF6[551]-F284A | < 2.2E-302 | < 2.2E-302 | < 2.2E-302 | 1.79E-17
MRC | < 2.2E-302 | < 2.2E-302 | < 2.2E-302 | < 2.2E-302 | < 2.2E-302
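The gene density column is the average number of genes within +/- 500 kb of each integration site. The sketch below computes such an average under one simplifying assumption: a gene is counted when its start coordinate falls inside the 1 Mb window centered on the site, so genes that merely overlap the window from outside are missed. The published analysis may count genes differently; `mean_genes_per_site` and the coordinates are hypothetical.

```python
from bisect import bisect_left, bisect_right

HALF_WINDOW_BP = 500_000  # +/- 500 kb around each integration site


def mean_genes_per_site(sites, gene_starts, half_window=HALF_WINDOW_BP):
    """Average number of genes within +/- half_window of each (chrom, pos) site.

    Assumption: a gene is counted when its start coordinate lies inside the
    window (window ends inclusive).
    """
    if not sites:
        return 0.0
    starts_by_chrom = {chrom: sorted(s) for chrom, s in gene_starts.items()}
    total = 0
    for chrom, pos in sites:
        starts = starts_by_chrom.get(chrom, [])
        lo = bisect_left(starts, pos - half_window)
        hi = bisect_right(starts, pos + half_window)
        total += hi - lo  # number of gene starts inside the window
    return total / len(sites)


gene_starts = {"chr1": [100_000, 400_000, 900_000, 2_000_000]}  # made-up
print(mean_genes_per_site([("chr1", 500_000), ("chr1", 1_500_000)], gene_starts))  # 2.0
```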

Table S14: HIV-1 integration in MDM cells
Cell line | Virus | Unique sites | % in genes | % +/- 2.5 kb CpG | % +/- 2.5 kb TSS | Gene density (+/- 500 kb)
Non-transfected | WT | 27753 | 75.3% | 7.0% | 5.5% | 20.9
siNON | WT | 27954 | 76.0% | 6.6% | 5.5% | 19.1
siC11 | WT | 6200 | 73.2% | 3.9% | 3.8% | 11.0
siL3 | WT | 11280 | 72.4% | 8.5% | 7.2% | 16.7
MRC | - | 50000 | 44.7% | 4.2% | 4.0% | 8.7

Table S15: Statistics for HIV-1 integration in MDM cells

p values: percentage of integrations in genes
Cell line | siNON | siC11 | siL3
siC11 | 8.01E-08
siL3 | 8.22E-18 | 0.20
MRC | < 1.9E-320 | < 1.9E-320 | < 1.9E-320

p values: percentage of integrations +/- 2.5 kb of a CpG island
Cell line | siNON | siC11 | siL3
siC11 | 1.99E-23
siL3 | 3.66E-14 | 1.75E-45
MRC | 8.80E-55 | 0.16 | 1.28E-88

p values: percentage of integrations +/- 2.5 kb of a TSS
Cell line | siNON | siC11 | siL3
siC11 | 4.99E-11
siL3 | 4.29E-14 | 1.49E-28
MRC | 4.37E-24 | 0.34 | 1.77E-55

p values: average number of genes/integration (+/- 500 kb of integration site)
Cell line | siNON | siC11 | siL3
siC11 | < 1.9E-320
siL3 | 0.32 | < 1.9E-320
MRC | < 1.9E-320 | < 1.9E-320 | < 1.9E-320

Table S16: List of oligonucleotides, siRNAs, and primers used in this study
Oligonucleotide name | Sequence (5'-3') | Purpose | Reference
AE6812 | CACCGGACCACATAGACATTTACG | CPSF6 gRNA a | This paper
AE6813 | AAACCGTAAATGTCTATGTGGTCC | CPSF6 gRNA a | This paper
AE6814 | CACCGGGCGATCTCCTCGATTAGG | CPSF6 gRNA b | This paper
AE6815 | AAACCCTAATCGAGGAGATCGCCC | CPSF6 gRNA b | This paper
AE6816 | CACCGGACCTCGGCTATCTGATGT | CPSF6 gRNA c | This paper
AE6817 | AAACACATCAGATAGCCGAGGTCC | CPSF6 gRNA c | This paper
AE6818 | CACCGATGACGATATTCGCGCTCT | CPSF6 gRNA d | This paper
AE6819 | AAACAGAGCGCGAATATCGTCATC | CPSF6 gRNA d | This paper
AE6836 | GCACGA AAGCTTGCCACCATGGCGGACGGCGTGGA | Addition of 5' HindIII restriction site to CPSF6 cDNA | This paper
AE6837 | GTTGGA AAGCTTCTAACGATGACGATATTCGCGCTCTCG | Addition of 3' HindIII restriction site to CPSF6 cDNA | This paper
AE6829 | TGGCACGAATTCAACGTCCTCTGCCCTCAG | CPSF6-a EcoRI genomic DNA (e in Fig. S2) | This paper
AE6830 | CACTGAGGATCCCCAACAATAGGGAGCGAGGC | CPSF6-a BamHI genomic DNA (f in Fig. S2) | This paper
AE6831 | TGGCACGAATTCGTAAGGATATACTTCATTGTAGTTGGTAGTG | CPSF6-b EcoRI genomic DNA (j in Fig. S2) | This paper
AE6832 | CACTGAGGATCCGCGTTCTTGCAGTATCCATTTCC | CPSF6-c BamHI genomic DNA (k in Fig. S2) | This paper
AE6833 | TGGCACGAATTCGCAGCTCAGGATAGTAAGTTTAAACCAG | CPSF6-d EcoRI genomic DNA (h in Fig. S2) | This paper
AE6834 | CACTGAGGATCCTGAAACCTGAAAGTGATAACTCAGCA | CPSF6-d BamHI genomic DNA (i in Fig. S2) | This paper
AE6835 | CACTGAGGATCCCGACTGGGCTTCAAAGCAC | CPSF6-a 1037 bp BamHI genomic DNA (g in Fig. S2) | This paper
AE6853 | CTGGACAACCTGCTGGGCAGCCTCCATTGGGTC | CPSF6 F284A mutagenesis | This paper
AE6854 | GAGGCTGCCCAGCAGGTTGTCCAGGAAAAAGAACTG | CPSF6 F284A mutagenesis | This paper
siNON | UGGUUUACAUGUCGACUAAdTdT | Nontargeting siRNA (GE Dharmacon D-001810-01-20) | This paper
- | UUAGUCGACAUGUAAACCAdTdT | siNON (-) strand | -
siC9 | CAUAGUAGAUCACGAGAAAdTdT | CPSF6 siRNA (GE Dharmacon J-012334-09-0005) | This paper
- | UUUCUCGUGAUCUACUAUGdTdT | siC9 (-) strand | -
siC11 | CGUCAUAAAUCCCGUAGUAdTdT | CPSF6 siRNA (GE Dharmacon J-012334-11-0005) | This paper
- | UACUACGGGAUUUAUGACGdTdT | siC11 (-) strand | -
siL3 | AGACAGCAUGAGGAAGCGAdTdT | LEDGF siRNA | 8
- | UCGCUUCCUCAUGCUGUCUdTdT | siL3 (-) strand | -
siL21 | GGUCAAAGACUCUAAAUGGAGdTdT | LEDGF siRNA | 9
- | CUCCAUUUAGAGUCUUUGACCdTdT | siL21 (-) strand | -
AE2963 | TGTGTGCCCGTCTGTTGTGT | LRT qPCR forward primer | 1
AE4422 | GAGTCCTGCGTCGAGAGATC | LRT qPCR reverse primer | 1
AE2965 | [FAM]CAGTGGCGCCCGAACAGGGA[TAMRA] | LRT qPCR TaqMan probe | 1
AE4450 | GCCTGGGAGCTCTCTGGCTAA | 2-LTR circle qPCR forward primer | 1
AE4451 | GCCTTGTGTGTGGTAGATCCA | 2-LTR circle qPCR reverse primer | 1
AE4452 | [FAM]AAGTAGTGTGTGCCCGTCTGTTGTGTGACTC[TAMRA] | 2-LTR circle qPCR TaqMan probe | 1
AE3014 | ATGCCACGTAAGCGAAACTCTGGCTAACTAGGGAACCCACTG | Alu-PCR 1st round primer (HIV-1 R region primer) | 10
AE1066 | TCCCAGCTACTCGGGAGGCTGAGG | Alu-PCR 1st round primer (Alu primer) | 10
AE3013 | ATGCCACGTAAGCGAAACTC | Alu-PCR qPCR 2nd round nested primer | 10
AE990 | CTGACTAAAAGGGTCTGAGG | Alu-PCR qPCR 2nd round nested primer | 10
AE995 | [FAM]TTAAGCCTCAATAAAGCTTGCCTTGAGTGC[TAMRA] | Alu-PCR qPCR probe | 10
AE5316 | TGTGACTCTGGTAACTAGAGATCCCTC | HIV-1 LTR 1st round integration site sequencing primer | This paper
AE6404* | AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTXXXXXXGAGATCCCTCAGACCCTTTTAGTCAG | HIV-1 LTR 2nd round integration site sequencing primer | This paper
AE6624 | CCTTGGGAGGGTCTCCTCTGAGT | MLV LTR 1st round integration site sequencing primer | 11
AE6625* | AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTXXXXXXTGACTACCCGTCAGCGGAGGTC | MLV LTR 2nd round integration site sequencing primer | 11
AE6392** | TACTGAGACGTCGATGC-NH2 | Linker short strand oligo | This paper
AE6393** | GATCATGCGAGATACATCTCAGGCATCGACGTCTCAG | Linker long strand oligo | This paper
AE6394** | CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCTGATCATGCGAGATACATCTCAG | Linker integration site sequencing primer (1st and 2nd round) | This paper

* Various 6-bp barcode sequences were used at the base positions marked "XXXXXX". ** Various linkers were designed by scrambling the listed short- and long-strand oligonucleotide sequences; therefore, multiple versions of the listed primers and oligonucleotides were used.
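Several entries in Table S16 can be cross-checked against each other. The sketch below performs four such checks: that each gRNA top/bottom oligo pair anneals into a duplex with 5' CACC and AAAC overhangs (ends read off the listed sequences; the cloning vector is not stated in this section), that each siRNA sense/(-) strand pair forms a perfect RNA duplex with 3' dTdT overhangs, which of the EcoRI, BamHI, and HindIII sites a primer carries, and how a 6-bp barcode fills the "XXXXXX" positions. All function names and the example barcode are hypothetical.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")
RESTRICTION_SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def revcomp(seq: str, rna: bool = False) -> str:
    """Reverse complement of a DNA (default) or RNA sequence."""
    table = RNA_COMPLEMENT if rna else DNA_COMPLEMENT
    return seq.translate(table)[::-1]


def is_cloning_oligo_pair(top: str, bottom: str,
                          top_overhang: str = "CACC",
                          bottom_overhang: str = "AAAC") -> bool:
    """True if the oligos anneal, leaving the given 5' overhangs unpaired."""
    if not (top.startswith(top_overhang) and bottom.startswith(bottom_overhang)):
        return False
    return revcomp(top[len(top_overhang):]) == bottom[len(bottom_overhang):]


def is_sirna_duplex(sense: str, antisense: str, overhang: str = "dTdT") -> bool:
    """True if the RNA cores (without 3' dTdT) are exact reverse complements."""
    if not (sense.endswith(overhang) and antisense.endswith(overhang)):
        return False
    return revcomp(sense[:-len(overhang)], rna=True) == antisense[:-len(overhang)]


def sites_in_primer(primer: str) -> list[str]:
    """Names of the listed restriction sites present in a primer (spaces ignored)."""
    seq = primer.replace(" ", "")
    return [name for name, site in RESTRICTION_SITES.items() if site in seq]


def fill_barcode(template: str, barcode: str) -> str:
    """Substitute a 6-nt barcode for the XXXXXX placeholder in a primer."""
    if "XXXXXX" not in template:
        raise ValueError("template has no XXXXXX placeholder")
    if len(barcode) != 6 or set(barcode) - set("ACGT"):
        raise ValueError("barcode must be 6 nt of A/C/G/T")
    return template.replace("XXXXXX", barcode, 1)


print(is_cloning_oligo_pair("CACCGGACCACATAGACATTTACG", "AAACCGTAAATGTCTATGTGGTCC"))  # True
print(is_sirna_duplex("UGGUUUACAUGUCGACUAAdTdT", "UUAGUCGACAUGUAAACCAdTdT"))  # True
print(sites_in_primer("TGGCACGAATTCGCAGCTCAGGATAGTAAGTTTAAACCAG"))  # ['EcoRI']
```

The restriction-site check is one way to confirm that AE6833 carries an EcoRI site (GAATTC), matching its "EcoRI genomic DNA" purpose.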