bioRxiv preprint doi: https://doi.org/10.1101/611848; this version posted April 17, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Full-coverage sequencing of HIV-1 provirus from a reference

Alejandro R. Gener1,2,3,4

1Baylor College of Medicine, Integrative Molecular and Biomedical Sciences Program, Houston, Texas, USA 2Department of Medicine, Nephrology, Baylor College of Medicine, Houston, Texas, USA 3MD Anderson Center, Department of Genetics, Houston, Texas, USA 4Universidad Central del Caribe, School of Medicine, Bayamón, Puerto Rico, USA

Corresponding author Alejandro R. Gener: [email protected] ; [email protected]

Conflict(s) of interest: None.

Key words: nanopore, DNA sequencing, HIV-1, HIV sequence variability, , viral heterogeneity

Funding: This work was funded in part by institutional support from Baylor College of Medicine; the Human Sequencing Center, Baylor College of Medicine; private funding by East Coast Oils, Inc., Jacksonville, Florida, and my own private funding, including Student Genomics (manuscripts in prep). I have also received the PFLAG of Jacksonville scholarship for multiple years. bioRxiv preprint doi: https://doi.org/10.1101/611848; this version posted April 17, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

2 ABSTRACT

Objective(s): To evaluate nanopore DNA sequencing for sequencing full-length HIV-1

provirus.

Design: I used nanopore sequencing to sequence full-length HIV-1 from a plasmid

(pHXB2).

Methods: pHXB2 plasmid was processed with the Rapid PCR-Barcoding library kit and

sequenced on the MinION sequencer (Oxford Nanopore Technologies, Oxford., UK). Raw

fast5 reads were converted into fastq (base called) with Albacore, Guppy, and FlipFlop

base callers. Reads were first aligned to the reference with BWA-MEM to evaluate sample

coverage manually. Reads were then assembled with Canu into contigs, and contigs

manually finished in SnapGene.

Results: I sequenced full-length HXB2 HIV-1 from 5’ to 3’ LTR (100%), with median per-

base coverage of over 9000x in one 12-barcoded experiment on a single MinION flow .

The longest HIV-spanning read to-date was generated, at a length of 11,487 bases, which

included full-length HIV-1 and plasmid backbone on either side. At least 20 variants were

discovered in pHXB2 compared to reference.

Conclusions: The MinION sequencer performed as-expected, covering full-length HIV. The

discovery of variants in a dogmatic reference plasmid demonstrates the need for single-

molecule sequence verification moving forward. These results illustrate the utility of long

read sequencing to advance the study of HIV at single integration site resolution.

bioRxiv preprint doi: https://doi.org/10.1101/611848; this version posted April 17, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

3 INTRODUCTION

Researchers have yet to observe integrated HIV-1 in one take, hindering locus-

specific studies. Complicating matters is the cycling that all orterviruses [1] undergo

between two major states (as infectious virion RNA and integrated proviral DNA),

repetitive viral sequences like long terminal repeats (LTRs), non-integrated forms [2], and

alternative splicing of viral mRNAs [3]. Short read DNA sequencing (less than 150 base

pairs (bp) in most reported experiments, but up to 500 bp) provides some information,

but analyses require high coverage and extensive effort (non-exhaustive examples [4,5]).

Repetitive sequences in HIV, rarity of integration events in vivo, and non-integrated HIV

limit the ability to assign DNA variants to specific loci within each provirus, as well as the

proviral integration sites (partially reviewed in [6]). Others have studied the products of

HIV by analyzing cellular HIV mRNAs with long read sequencing, but with

reverse transcription of these viral [7], which are well known to diverge from

provirus. Despite progress, researchers have yet to observe the genome of HIV-1 as

complete provirus (integrated DNA), hindering locus-specific studies. To this end, current

long read DNA sequencing clearly surpasses the limitations of read length of leading next-

generation sequencing platforms. Here I used the MinION sequencer (Oxford Nanopore

Technologies, Oxford UK) to sequence HIV-1 as plasmid (pHXB2), with the goal of

evaluating the technology for future applications.

bioRxiv preprint doi: https://doi.org/10.1101/611848; this version posted April 17, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

4 METHODS

HIV-1 plasmid

A plasmid containing the HIV-1 reference strain HXB2 was acquired from the AIDS

Reagent Program. HXB2 is one of the earliest clinical isolates, but it was unknown

whether this plasmid (a molecular clone of the original isolate) was ever sequence-

verified after the reference sequence (below) was deposited.

HIV-1 sequences

The reference sequence of HXB2 is from NCBI, Accession number K03455.1.

DNA sequencing

Sequence of a plasmid containing HXB2 was sequence-verified with long-read

nanopore sequencing on the MinION (Oxford Nanopore Technologies, Oxford UK). While

competing long read sequencing platforms exist, they were not evaluated in this Student

Genomics pilot and will not be discussed further. Unless otherwise noted, reagents and

software were purchased from Oxford Nanopore Technologies. Briefly, stock plasmid was

diluted to 5 ng final amount in milliQ water and processed with Rapid PCR Barcoding kit

SQK-RPB004 along with 10 other barcoded samples (not discussed further in this

manuscript) following ONT protocol RPB_9059_V1_REVA_08MAR2018. Two DNA

polymerases were evaluated (barcode 10 used LA Taq (Takara); barcode 11 Taq (Sigma-

Aldrich). Libraries were loaded onto a MinION flow cell version R9.4.1 and a 48-hour

sequencing run was completed with MinKNOW (version 1.10.11) on a mid 2010 MacBook

Pro (Apple Inc. Cupertino CA USA) with 16 Gb RAM and 2.7 GHz dual-core i7 processor

(Intel Corp., Santa Clara CA USA). Residual reads from subsequent runs were pooled for

final analyses. bioRxiv preprint doi: https://doi.org/10.1101/611848; this version posted April 17, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

5 Raw data was base called (converted from fast5 to fastQ format) with Albacore

version 2.3.4 (older base caller), Guppy version 2.3.1 (current official), and FlipFlop

(development Guppy) on a compute cluster. At the time, only albacore allowed for direct

demultiplexing. Guppy (guppy_barcoder) provided similar functionality by providing a

summary file with read ID and barcode, and grep was used to pull read length and mean

read quality (Supplemental Figure 1) from guppy_basecaller summary file.

Demultiplexed basecalled reads were fed into Canu version 8 implemented on a

cluster [8]. Genome size was estimated to be 16 Kb from agarose gel of undigested, but

naturally degraded pHXB2 (data not shown). Alignments to reference were made with

BWA-MEM [9], implemented in Galaxy (usegalaxy.org) [10]. To down sample data from

barcode 10/LA Taq sample, I used seqtk_sample, seed 1, (Heng Li, Broad Institute) to take

the amount of data (file size) in barcode 11 and divided by the amount of data (file size) in

barcode 10. Seqtk was implemented in Galaxy. SAMTOOLS Flagstat [11] was used for

mapping stats from BAM files from BWA-MEM, implemented in Galaxy. SnapGene

version 4.3.4 was used to manually annotate contigs from Canu. Blastn (NCBI) was used

to identify unknown regions of pHXB2.

Statistics

Two-tailed Mann-Whitney U tests were used to compare distributions. P-values

are reported over brackets delineating relevant comparisons. Calculations and graphing

were done with GraphPad Prism for macOS version 8.0.2.

bioRxiv preprint doi: https://doi.org/10.1101/611848; this version posted April 17, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

6 RESULTS

In one of the two barcoded experiments, I achieved extremely high coverage of

pHXB2 (median per-base coverage of over 9000x) (Figure 1). In addition to kilobase-long

reads, at least one read spanned the entire HIV genome (Figure 2).

I assembled the previously undefined plasmid pHXB2 relative to the previously

published HXB2 reference from libraries made with either high fidelity or standard DNA

polymerases (Figure 3). Genome assembly was done using Canu (a command-line utility

recommended by most in the long DNA field). Canu’s final output is a set of contiguous

DNA sequences (contigs) as FASTA files, but a consequence of assembling plasmid

sequences with this tool is partial redundancy at contig ends. Manual end-trimming of

contigs was performed in SnapGene based on an estimated length of 16 kilobases. Top

blastn hits from barcode 10/LA Taq pHXB2 base called with FlipFlop were as follows: for

the main backbone (with and antibiotic selection cassette for

cloning), shuttle vector pTB101-CM DNA, complete sequence (based on pBR322), from

4352-8340; for the upstream element (relative to 5’ LTR), Homo sapiens 3

clone RP11-83E7 map 3p, complete sequence from 58,052 to 59,165; for the downstream

element, cloning vector pNHG-CapNM from 10,204 to 11,666. Other identified elements

included Enterobacteria phage SP6, complete sequence from 39,683 to 39,966. Identities

of query to HXB2 and hits were all approximately 99%.

On closer examination of the two pHXB2 libraries prepared in parallel and

sequenced simultaneously, I counted 20 single nucleotide polymorphisms (SNPs) in this

reference clone of HXB2, the reference clinical isolate for HIV-1 (Figure 4 and Table 1).

These mismatches were seen in all Canu assemblies and verified in IGV and/or SnapGene.

This represents a ~0.2% divergence (20/9717) from the reference, which was assumed to bioRxiv preprint doi: https://doi.org/10.1101/611848; this version posted April 17, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

7 have perfect identity (0% divergence). Transitions were more common (14/20), and

curiously falls in line with a previous report of increased transitions over transversions in

models, because transversions are more likely to be deleterious to viral

replication (i.e.: to cause protein-coding changes) [12]. Indeed, less than half (9/20) of the

observed SNPs occurred in protein-coding regions, even though 92% of HXB2 is coding

(per reference, 791/9719). Of those 9 SNPs in protein-coding regions, 4 caused non-

synonymous mutations. Note that one of those occurs in a region overlapping both gag

and pol regions, however only pol exhibited a non-synonymous change from valine to

isoleucine (in p6, at position 2259 relative to HXB2). Other non-synonymous variants

occurred at 4609 (in p31 , arginine to lysine), 7823 (in ASP antisense protein,

glycine to arginine), and 9253 (in nef, isoleucine to valine).

As previously reported [13], per-read variability was higher near homopolymers

(runs of the same base) (Supplemental Figures 2). At the level of consensus (made

from sequences mapped to reference HXB2), homopolymers contributed few, if any,

obvious errors. While dips occurred at certain points near homopolymers, the consensus

did not change at the sequencing depth used in this study for either barcoded pHXB2

samples, or for down sampled high-coverage data (Figure 1). Another interpretation is

that homopolymers tend to seem truncated with ONT, with more reads in support of

shorter homopolymers. Canu assemblies showed base caller-variability (Supplemental

Table 1). That said, newer base callers tend to produce fewer gaps. The ratio of per-read

deletions to per-read insertions (DEL/INS) was much higher for SNPs occurring at

homopolymers and near the same base, and this difference was maintained between all

base callers used (Supplemental Figure 3). bioRxiv preprint doi: https://doi.org/10.1101/611848; this version posted April 17, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

8 DISCUSSION

Nanopore sequencing surpassed the read length limitations of other sequencing

modalities such as Sanger sequencing and sequencing by synthesis by at least two orders

of magnitude. This work represents the first instance of complete sequencing of the HIV

provirus as plasmid and contributed to the identification of single nucleotide variants

which may not have been easily determined using other sequencing modalities,

illustrating the importance of validating molecular clone reagents in their entirety.

First complete pass over all HIV information in reference plasmid

HIV provirus is believed to occur naturally as one or a few reverse-transcribed

DNA forms integrated into the host nuclear genome. Depending on where integration

occurs, local GC or AT content might cause problems for detecting integrants. HIV also

has conserved transitions from areas of higher GC content (~50%) to content

approximating average human GC content (~40%). To avoid sequencing bias from PCR

and to accommodate for the potential heterogeneity of HIV sequences, I fractionated

whole sample directly (as opposed to PCR-barcoding select amplicons) with tagmentation

provided in the Rapid PCR-Barcoding kit (ONT). Tagmentation uses transposon-mediated

cleavage and ligation of barcode adapters for later PCR amplification. A consequence of

this fractionation was a distribution of reads (see Supplemental Figure 1A) shorter

than longer reads reported elsewhere for ONT experiments [14]. Based on this

distribution and the level of coverage, it was expected that HIV might be covered from

end to end, but this would have been exceptional. That said, an example is presented

here (see Figure 2).

Errors in a reference plasmid of HIV-1 bioRxiv preprint doi: https://doi.org/10.1101/611848; this version posted April 17, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

9 HIV is the most studied human pathogen. By leveraging long reads with the

MinION, I was able to find mutations in highly repetitive LTRs, in addition to mutations in

protein-coding regions. Two independent experiments (PCR-barcoded libraries made with

Takara Long Amplicon Taq and Sigma-Aldrich Taq master mixes) yielded at least 20 single

nucleotide variants which differed from the reference sequence. These 20 SNPs were

concordant between both experiments, and across the three base callers used. Other

differences between the PCR-barcoded samples may be due to PCR amplification errors,

which may yield asymmetric base differences (a site with one predominant allele in one

experiment). Another possibility may be plasmid admixture in the reference stock. The

latter may be more likely in the event of symmetric differential calling (example, a site

which differs from the reference, but is called as two or three different bases at equal

ratios). Such sites represent a challenge for conventional (i.e. non-long read) sequencing

modalities. The pHXB2 sequenced was a reference plasmid, with unknown divergence

between the published reference for HXB2. In this study I was also able to confirm that

the backbone of this plasmid is from pBR322, aiding in the continued use of this

important reagent, and illustrating the need of full-length reagent validation in all clonal

viral lines moving forward. I suggest that all sequences tagged for use in clinical settings

likewise be sequence-verified at the level of single-molecule sequencing, lest preventable

errors go unchecked.

Improvement in ONT base callers within the past year

Albacore, Guppy, and FlipFlop basecallers were compared. Each produced reads of

similar length distributions (relative to polymerase used), while Guppy and FlipFlop

produced improved and best performance relative to quality score distributions (see

Supplemental Figures 1B). Interestingly, while read length distributions were affected by bioRxiv preprint doi: https://doi.org/10.1101/611848; this version posted April 17, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

10 fidelity of polymerases evaluated in this work, mean quality distributions were not. This is

important because of the differences in cost between higher fidelity enzymes and classic

Taq. In consideration of library prep, choice of enzyme used should be based on the

desired read length distribution. Regarding read mapping, the increase in mean quality

score between these base callers didn’t change overall mapping much given longer reads

(data not shown), but did facilitate demultiplexing, resulting in approximately ~10%

increases in barcoded libraries (shift in reads from unclassified to a given barcode).

FlipFlop tended to handle homopolymers better than previous base callers.

Homopolymers in HXB2 tended to exhibit apparent deletions near 5’ ends of

homopolymers (upstream due to technical artifact in BWA-MEM), but because consensus

is conserved (example, at least 80% of base in called read set is identical to reference), it

is unlikely that any of the homopolymer deletions are real in these experiments.

Current limitations of the approaches used in the present study are 1.) the cost of

long-read sequencing, regardless of platform, compared to the cheaper short reads from

sequencing by synthesis, 2.) long DNA extraction methods, and 3.) the lower per-base

accuracy (low-mid 90’s vs. 98-99%). As the price of long-read sequencing decreases, as

the hardware and software used in base calling improve, and as library protocols

improve, the cost of obtaining usable data from long reads will become negligible

compared to the ability to answer historically intractable questions. Future work will

include optimizing DNA extraction protocols with the goal of capturing high-coverage

fuller glimpses of each HIV proviral integration site in in vivo HIV models and patient

samples. This work has broad implications for all cells infected by both integrating and

non-integrating , and for the characterization of targeted regions in the genome

which may be recalcitrant to previous sequencing methods. Long read sequencing is an bioRxiv preprint doi: https://doi.org/10.1101/611848; this version posted April 17, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

11 important emerging tool defining the post-scaffold genomics era, allowing for the

characterization of anatomical landmarks at the genomic scale. bioRxiv preprint doi: https://doi.org/10.1101/611848; this version posted April 17, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

12 ACKNOWLEDGEMETS

ARG conceived of this project, performed experiments, analyzed results, and

wrote the manuscript.

This work was funded in part by institutional support from Baylor College of

Medicine; the Human Genome Sequencing Center, Baylor College of Medicine; private

funding by East Coast Oils, Inc., Jacksonville, Florida, and my own private funding,

including Student Genomics (manuscripts in prep). Compute resources from the

Computational and Integrative Biomedical Research Center at BCM (“sphere” cluster

managed by Dr. Steven Ludtke) and the Department of Molecular and Human Genetics at

BCM (“taco” cluster managed by Mr. Tanner Beck and Dr. Charles Lin) greatly facilitated

the completion of this work. I have also received the PFLAG of Jacksonville scholarship for

multiple years.

I would like to thank Drs. Steven Richards, Qingchang Meng and the staff of HGSC

Research and Development team for their earlier support in nanopore adoption. I would

like to thank the team at Oxford Nanopore Technologies for their timely improvements

and continued R&D. I would also like to thank other members of the Paul E. Klotman lab,

including Dr. Deborah P. Hyink (who thoughtfully helped to edit the manuscript),

Taneasha Washington, and former members Dr. Gokul C. Das and Alexander Batista. I

would like to thank Drs. Alana Canupp and Jim Maruniak for their early interest in my

scientific development, and for the passion that they show in everything that they do. bioRxiv preprint doi: https://doi.org/10.1101/611848; this version posted April 17, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

13 REFERENCES

1 Krupovic M, Blomberg J, Coffin JM, Dasgupta I, Fan H, Geering AD, et al. Ortervirales: New Order Unifying Five Families of Reverse-Transcribing Viruses. J Virol 2018; 92:e00515-18. 2 Graf EH, Mexas AM, Yu JJ, Shaheen F, Liszewski MK, Di Mascio M, et al. Elite suppressors harbor low levels of integrated HIV DNA and high levels of 2-LTR circular HIV DNA compared to HIV+ patients on and off HAART. PLoS Pathog 2011; 7. doi:10.1371/journal.ppat.1001300 3 Cuesta I, Mari A, Ocampo A, Miralles C, Pérez-castro S, Thomson MM. Sequence Analysis of In Vivo -Expressed HIV-1 Spliced RNAs Reveals the Usage of New and Unusual Splice Sites by Viruses of Different Subtypes. 2016; :1–24. 4 Wymant C, Blanquart F, Golubchik T, Gall A, Bakker M, Bezemer D, et al. Easy and accurate reconstruction of whole HIV from short-read sequence data with shiver. Virus Evol 2018; 4:1–13. 5 Bruner KM, Wang Z, Simonetti FR, Bender AM, Kwon KJ, Sengupta S, et al. A quantitative approach for measuring the reservoir of latent HIV-1 proviruses. Nature 2019; 566:120–125. 6 Pinzone MR, O’Doherty U. Measuring integrated HIV DNA ex vivo and in vitro provides insights about how reservoirs are formed and maintained. Retrovirology 2018; 15:1–12. 7 Ocwieja KE, Sherrill-Mix S, Mukherjee R, Custers-Allen R, David P, Brown M, et al. Dynamic regulation of HIV-1 mRNA populations analyzed by single-molecule enrichment and long-read sequencing. Nucleic Acids Res 2012; 40:10345–10355. 8 Walenz BP, Koren S, Bergman NH, Phillippy AM, Miller JR, Berlin K. Canu: scalable and accurate long-read assembly via adaptive k -mer weighting and repeat separation. Genome Res 2017; 27:722–736. 9 Li H, Durbin R. Fast and accurate long-read alignment with Burrows – Wheeler transform. 2010; 26:589–595. 10 Afgan E, Baker D, van den Beek M, Blankenberg D, Bouvier D, Čech M, et al. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update. Nucleic Acids Res 2016; 44:W3–W10. 11 Wysoker A, Fennell T, Marth G, Abecasis G, Ruan J, Li H, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 2009; 25:2078–2079. 12 Lyons DM, Lauring AS. Evidence for the Selective Basis of Transition-to- Transversion Substitution Bias in Two RNA Viruses. Mol Biol Evol 2017; 34:3205– 3215. 13 Loman NJ, Quick J, Simpson JT. A complete bacterial genome assembled de novo using only nanopore sequencing data. Nat Methods 2015; 12:733–735. 14 Payne A, Holmes N, Rakyan V, Loose M. Whale watching with BulkVis: A graphical viewer for Oxford Nanopore bulk fast5 files. bioRxiv 2018; :312256. 15 Nicol JW, Helt GA, Steven G. Blanchard J, Raja A, Loraine AE. The Integrated Genome Browser: free software for distribution and exploration of genome-scale datasets. BMC Bioinformatics 2009; 25:2730–2731. bioRxiv preprint doi: https://doi.org/10.1101/611848; this version posted April 17, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

14 FIGURES

Figure 1: Nanopore reads carpet HXB2. Top 2 Coverage and Alignment panels from

barcoded library 10 (bar10 = LA Taq). Bottom from Barcode 11 (bar11 =Taq). BWA-MEM

was used to map reads called with FlipFlop to HXB2. Color-coding of alignment panes:

Purple is an insertion in a given read relative to reference. White is either a deletion in a

given read or space between two aligned reads. Note that per-read “insertion” and

“deletion” do not necessarily represent true insertions or deletions actually present in the

sample. Automated assembly followed by manual consensus building converts these

overlapping reads into approximations of the ground truth. Visualized in Integrated

Genome Browser [15].

bioRxiv preprint doi: https://doi.org/10.1101/611848; this version posted April 17, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

15

@6fbf0205-5195-460e-8e28-930db50e5d79 runid=0b284792282af9a6d7275cfca845556bb4b8ac7f sampleid=Student_Genomics_Run_1 read=87223 ch=460 start_time=2018-05-09T20:57:49Z barcode=barcode10 TGTTGTACTTCGTTCAATGGAATTTGGGTGTTTAGCGGTTCTTTACCGTGACAAGCGTTGAAAACGCTGTCCTCTCCGTTTTCGTGCGCCGCTTCAATGTGTATAATATACATATGTATATGTATAATGTATATATGTATCTGTGTATATATCTTGCATTTTTGTAGAAAAAAACAGAAAATATAGAAGTTTATAAGAA CTAACACTTTCTTACATAACAAAACGAAATGTTCGAACTACGTAACTAAAATGATGAAAAATATTCCCAGTATCACTGCCTGTTTGGATGTGGCTATCAGAGGTTTATTTTCCCCTTTCTTGTTTGCTCTTTAAGTCAATCTGGCCCCCATGGCCTCCTGACTCTGTGACTCGGCACCAGCACTGTGGCCCCTTCATTT ACATTTGATAACTGTAGAGAGATTATTATAATCCTGCTCATTAGACAGATCAATCTGAAGTTGGCAAGTTTTTAAATATAACTACCTATTTTTAAAAAGGGATGCCTTTAGTTTAGTTAACAATATATACTGCACATTTTGTTTTAAAAGGCCTGTTTACTACCACTGATTAACTATATGCCCACTGAGGCAACTCCTT CTTTTGTTTTTATTCAAATATTTACTAATTACCAGGACTCCTGTGTGCTAATACAATGGTGCTCTACTTTCTCACCTATATACTAGGGGAGACCCAAGCACTATCACCCATACCTCTGAGCTTCCTAACTGGGTTACTCTAGTTAACTAAGTAACTCAAGCTAGCCAAAATCATCCCAAACTTCCCACCCCCATACCTA TTACCACTGCCAATTACCTGTGGTTTCTTTACTCTAAACCTGTGATTCCTCTGAATTATTTTCATTTTAAAGAAATTGTATTTGTTAAATATGTACTACAAACTTAGTAGTTGGAAAGACTAATTCACTCCCCAAAGAAGACAAGATATCCTTGATCTGTGGATCTACCACACACAAGGCTACTTCCTGATTAGCAGAA CTACACACCAGGGCCAGAGATTCGGATATCCACTGACCCCTTTGGATGGTGCTACAAGCTAGTACCAGTTGAGCCAGATAAGGTAGAAGAGGCCAATAAAGGAGGGGAGCACCAGCTTGTTACACCCTGTGAGCCTGCATGGGATGGATGACCCGGAGAGAGAGATGTTAGAGTGGAGGGTTTGACAGCCGCCTAGCAT TTCATCGTGGCCCGAGCTGCATCCGGGTACTTCAAGAACTGCTGATATCGAGCTTGCTACAAGGGACTTTCCGCTGGGGACTTTCCAGGGAGGCGTGGCCTGGGCGGGACTGGGGGTGGCGAGCCCTCAGATCCTGCATATAAGCAGCTGCTTTTGCCTGTACTGGGTCTCTGAGTTGAACCAGATCTGAGCCTGGGAG CTCTCTGGCTAACTAGGGGGGGCACTTAAGCCTCAATAAAGCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCCGTCTGGTGTGTGACTCTGGTAACTAGAGATCCCTCAGACCCTTTGGTCAGTGTGGAAAATCTCTAGCAGTGGCGCCCGAACAGGGACTTGGCGAAAGGGAAACCAGAGGAGCTCTCGGCGCAGGAC TCGGCTTGCTGAAGCGCGCACGGCAAGAGCGAGGGGCGGCGACTGTTGAAGTACGCCAAAAATTTTGACTAGCGGAAGGCTAGAAAGGAGAGATAGGTGCGGAAAGCGTCAGTATTAAGCGGGGAAATTAGATCGATGGGAAAAATTCGGTTAAGGCCAGGGAAAAGAAAAATATAAATTAAAACATATGGTATGGGCA AGCAGGGAGCTAGAACGATTCGCAATTAATCCCTGGCCTGTTAGAAACATCAGAAGGCTGTAGACAAATACTGGGACAGCTACAACCATCCCTTCAGACAGGATCAGAAAGAACTTAGATCATTATATATAATACAGTAGCAACCCTCTATTGTGTGCATCAAAGGATGAGATAAAAGACACCAAGGAAGCTTTAGACA AGATAGAGGAAGAGAGCAAAACAAAAAGTAAGAAAAAAAGCACAGCAAGCAGCAGCTGACACAGGACACAACAATCAGGTCAGCCAAAATTACCTATAATTGCAGAACATCCAGGGGCAAATGGTACATCAGGCCATATCACCTAGAACTTTAAATGCATGGGTAAAAAGTAGTAGAAGAGGCTTTCAGCCCAGGAGTG ATACCCATGTTTTCGAAAACATTATCAGAAGGAGCCACCCCACAAGATTTAAACACCATGCTAAACACAGTGGGGACATCAAGCAGCCATGCAAATGTTAAAGAGACCATCAATGAGGAAGCTGCAGAATGGGATAGAGTGCATCCAGTGCATGCAGGGCCTATTACACCAGGCCAGATGAGAGAACCAAGGGGAAGTG ACATAGCAGGAACTACTAGTACCCTTCAGGAACAAATAGGATGGATGACAAAATAATCCACCTATCCCAGTAGGAGAAATTTATAAAAGATGGATAATCCTGGGATTAAATAAAATAGTAAGAATGTATAGCCCTACCAGCATTCTGGACATAAGACAAGGACCAAAAGAACCCTTTAGAGACTATGTAGACCGGTTCT ATAAAACTCTAAGAGCCGAGCAAGCTTCACAGGAGGTAAAAAATTGGATGACAGAAACCTTGTTGGTCCAAATGCGAACCCAGATTGTAAGACTATTTTTAAAAGCATGGGACCAGCGGCTACACTAGAAGAAATGATGACAGCATGTCAGGGAGTAGAGGACCCGGCCATAAGGCAAGAGTTTTGGCTGAAGCAATGA GCCAAGTAATACAAATTTACCATAATGATGCAGAGAGCAATTTTAGGAACCAAAGAAAGATTGTTAAGAGGTGTTTCAATTGTGGCAAGAAGGGGCACACAACCAGAAATTGCAGGGCCCCTAGGAAAAGGGCTGTTGGAAATGTGGAAAGGAAGGACACCAAATGAAAGATGTGTACTGAGAGACAGGCTAATTTTTA GGGAAGATCTGGCCTTCCTACAAGGAGGCCAGGGAATTTATCCAGAGCAGACCAGAGCCAACAGCCCCACCAGAAGAGAGCTTCAGGTCTGGGGTAAGAGACAACAACTCCTCAGGAAGCGGGAGCCGATAGACAAGGAACTGTATCCTTTAACTTCCCTCGGGTACTCTTTGGCAACGACCCCTCGTCACAATAAAGA TAGGGGGCAACTAAAGGAAGCTCTGTGTTAATTACAGGAGCAGATGATACAGTATTGAAATGAGTTTGCCGGGAAGATGGAAACCAAAAATGATAGGGGGAGTGGAGGTTTTATCAAAAGTAAGACAGTATGATCAGATACTCACTTAGAAATCTGTGGACATAAAGCTATAGGTACAGAGTATTAGTAGGACCTACAC TACAATCGACATAATTGGAAGAAATCTGTTGACTCAGATTGGTTGCACTTTAAATTTTCCCATTAGCCCTATTGAGACTGTACAGTAAAATTAAAGCCAGGAATGGATGGCCCAAAAGAAAGTTAAACAATGGCCATTGACAGAAGAAAAACAAAAGCATTAGTAAAAATTTGTACAGAGGTGGAAAAGGAAGGAGAAA ATTTCAAAAATTGGGCCTAAAATCACTTCTTACTCAGTATTTTACATAAAGAAAAAGACAGTACTAAATGGAGAAAATTAGTAGATTTCAGAGAACTTAATAAGAGAACTCAAGACTTCTGGGAAGTTCAGTGTGAATACCACATCCACTTTGGGGTTAAAAAGAAAAATCAGTAACAGTACTGGATGTGGGTGATATA TTTTTCCAGTTCCCTTAGATGAAGACTTCAGGAAATATACTGCATTTACCATACCTAGTATAAACAATGAGACACCAGGGATTAGATATCAGTACAATGTTAACTACACAGGGATGGAAAGGATCACCAGCAATATTCCAAAGTAGCATGACAAAAATCTTGAAACCTTTTAGAAAACAAAATCAGACATAGTTATCTA TCAATACATGGATGATGATTTGTATGTAGGATCTGACTTAGAAATAGGGCAGCATAGAACAAAAATAGAGGAGCTGAGACAACATCTGTTTGGGGGTGGGGACTTACCACACCAGACAAAAAACATCAGAAAGAACGCACTCCATTCCTTTGGATGGGTTATGAACTCCGTACGATAAATGGACGAGGTACAGCCTATG GTGCTGTGCCAGAAAAGAAAGACAGCTGGACTGTCAATGACATACAGGAAGTTAGTGGAAATTGAATTGGGCAGTCAGTTGCCCGAGAGTTAAAAATAGGCAATTATGTAAACTCCTTAGAGAGGAACCAAGAAAAGCACTAACAAGTGTACCACTAACAGAGAAGCAAGAGCTGAGGCAGCCGAAAACAGAGAGATTC TAAAGAACCAGTACATGGGTGTATTATGACCCATCAAAAGACTTAATAGCAGAAATACAGAAGCAGGGGCAAGGCCAAAATGGACACATATCAAATTTATCAAGAGCCATTTAAAAATCTGAAACAGGAAATATGCAAGAATGAGGGTGCCCACACTAATGATGTAAACAATTAACAGAGGCAGTGCAAAAGTAGCCAC AGAAGCATGGTAATATGGGAAAGACTCCTAAATTTAAACTGCCCATACAAAAGGAAACATGGGAAACATGGTGGACAGGTATTGGCAAGCCACCTGGTTCACAATGAGTTTGTTAATACCCCTCCTTTAGTGAAATTAGGTACCAGTTAGAGAAAAGAACCCATAGTAGGAGCAGAAACCTTCTATGTAGATGGGGCAG CCAACAGATAAATTAGGAAAAGCAGGATATGTTACTAATAGAGGAAGACAAAAAGTTGTCCACCTAACTGACACAACAAATCAGAAGACTGAGTTACAAGCAATTTATCTAGCTTTGCAGGATTCGGGATTAAGTAAACATAGTAAAAGCAGACTCACAATATGCGTAGGAGTCATTCAAGCACAACCAGATCAAAGTG AATCAGGGTTAGTCAATCAAATAATAGAGCAGTTAATAAAAAAGGAAAAGGTCTATCTGGCATGGGTACCAGCACACAAAGGAATTGGAGGAAATGAACAAGTAGATAAATTAGTCAGTGCTGGAATCAGGAAAGTACTATTTTTAGATGGAATAGATAGACCCAAGGTGAACATGAGAAATAGCACAGTAATTGGAGA GCAATAACAGTGAGTTACAACCTGCCACCTGTGGTAGCAAAGAAATAGTAGCCAGCTGTGATAAATGTCAGCTAAAGAAAGGAGAAATACATCATGGACAAGTAGACTGTAGTCAGGAATATGGCAACTAGATTGTACACATTTAAGGAAGGAAAAGTTATCCTGGTAGCAGTTCATGTGGCAGTGGATGCTATAGAAG CAGAAGTTATTCCAGCAGAAACAGGGCAGGAAACAGCATATTTTCTTTAAAATTAGCAGGAAGATGGCCAGTAAAAACAATACATACAGACAATGGCGGCAATTTCACCAGTGCTACGGTTAAGGCCGCCTGTTGGTGGAGGCGGGAATCAAGCAGAATTCCCTACAATCCCCAAAGTCAAGGAGTAGTAGAATCTATG AATAAAGGATTAAAAGAAAATTATAGGACAGGTAAGAGATCGAGGCTGAACATCTTAAGACAGCAGTACAAAATGGCAGTATTCATCACAATTTAAAAGAAAAGGGGGATTGGGGTACAGTGCAGGGGAAAGAATAGTAGACATAATAGCAACAGACATACAAACTAAAGAATTACAAAAACAAATTACAAAATTCAGA ATTTTGGGTTTATTACAGGGACAGAAATCCACTTTGGAAAGGACCCGTAAGCTCCTCTGGAAGGTGAAGGGCAGTAGTAATACAAGATAATAGTGACATAAAGTAGTGCCAGAAGAAAAGCAAAAGCCATTGGATTATGGAAAACAGATGGCAGGTGATGATTGTGTTGGCAAGTAGACAGGATGAGGATTAGAACATG GAAAAGTTTAGTAAAACACCATATGTATGTTCAGGGAAGAAGCTAGGGGATGGTTTTATAGACATCACTATGAAAGCCCTCATCCAGGAATAAGTTCAGAAGTACACATCCCACTAGGGGGGATGCTAGATTAATAACAACATATTGGGGTCTGCATACAGGAGAAAAGAGACTGACACATTTGGGTCCAGGGGTCTCC ATAGAATGGAGGAAAAGAGATTATAGCACACAAGTAGACCCTGAACTAGCAGACCAACTAATTCATCTGTATTGCTGACTGTTTTGAACTCTGCTATAAGAAAGGCCTTATTAGGACACATAGTTAGCCCCTAGGTGTGAATATCAAGCTGGGACATAACAAGGTAGGATCTCTACAATACTTGGCATAGCAGCATTAA TAACACCAAAAAGATAAAGCCACCTTTGCCTAGTGTTACAGAAAACTGACAGAGGATAGATGGAACAAGCCCCAGAAGACCAAGAGACCTGAGGGAGCCACACAATGAATGGACACTAGAGCTTTTAGAGGCTTAAGAATGAAGCTGTTAGACATTTTCTAGGATTTGGCTCCATGGCAAGACAACATATCTATAAACT TATGGGGATACTTGGGCACAGGAGTGGAAGCCATAATAGAATCTGCAACAACTGCTGTTTATCCATTTTCAAATTGAATTGTCGACATAGCAGAATAGGCGTTACTCGACAGAGGAGAGCAAGAAATGGAGCCAGTAGATCCTAGACTAGAGCCCTGGAAGCATCAGGAAGTCAGCCTAAACTGCTTGTACCAATTGCT ATTGTAAAAGTGTTGCTTTCATTGCCCAAGTTTCATAACAAAAGCCTTAGGCATCTCCTATGGCAGGAAGAAGCGGAGACAGCGACGAAGAGCTCATCAGAACAGTCAGACTCATCAAGCTTCTCTATCAAAGCAGTAAGTAGTACATGTAACGCAACCTATACCAATAGCGCAATAGTAGCATTAGTAAGTAGCAATA ATAATAGCAGCAATTGTGTGGTCCATGAAAAGTAATCATAGAATATAGGAAGAGAATATTAAGACAGAAAAATAGACAGGTTAAATTGATAGACTAATAGAAGGCAGAAGACAGTGGCAATGAGAGTGAAGGAGAAATATCAGCACTTGTGGAGATGGGGTGGAGATGGGGCACCATGCTCCTTGGGATGTTGATGATC TGTAGTGCTTTACTTAGAAAAATTGTGGAATACAGTCTATTATGGGGTACCTGTGTGGAAGGAAAGCAACCTACTCTATTTTGTGCATCAGATGCTAAAGCATATGATACAGAGGTACATCAATCATTTAGGAGCCACACATGCCTGTGTACCCTGGGAACCCCAACCCACAAGAAGTAGTATTGGTAAATGTGACAGA AAATTTTAACATGTGGAAAAATGACATGGTAGAACAGATGCATGAGGATATAATCAGTTTATGGGATCAAAGCCTAAAGCCCATGTAAAATTAACTTACTCTGTGTTAGTTTAAAGTGCACTGATTTGAAGAATGATACTAATACCAATAGTAGTAGCGGAGAATGATAATGGAGAAAGGAGATAAAAACTGCTCTTTC AATATCAGCACAAGCAAAAAATTGGAGTGCAGAAAGAATATGCATTTTTATAAACTTGATATAAATACCAATAGATAATGATACTACCAGCTATAAGTTGACAAGGAGTTGCCTTTCAGTCATTACACAGGCCTGTCCAAAGGTATCCTTTGAGCCAATTCCATACATTATTGTGCCCCGGCTGGTTTTGCGATTCTAA AATGTAATAATAAGGCACGTTCAATGGAACAGGACCATGTACAAATAAATCAGCACAGTACAATGTACACACATGGAATTAGGCCAGTAGTATCAACTCAACTGCTGTTAAATGGCAGTCTAGCAGAAGAAGAGGGTAGTAATTAGATCTGTCAATTTCACGGACAATGCTAAAACCATAATAACAGCTGAACACATCT GTAGAAATTAATTGTACAAGACCCAACAACAATACAAGAAAAAGAATCCGTATCCAGAGAGGGACCAGGAGGCATTTGTTACAATAGGAAAAATAGGAAGTATGAGACAAGCACATTGTAACATTAATAGTGAGCAAATGGAATAACACTTAAAACAGATAGCTAGCAAATTAAGAGAACAATTTGGAAGCCAACCAAA ACAATAATCTTTACTTAATCCTCAGGAGGGACCCGAAATTGTAACACTTACAGTTTTGGTGTGGAGGGAATTTTTTCTACTGTAATTCAACACAACTGTTTAATAGTACTGGTTTAATAGTACTTGGAGTACTGAAGGGTCAAATAACACTGAAGGAAGTGACACAATCACCCTCCCATGCAGAATAAAACAAATTATA AACATGTGGCAGAAGTAGGAAAAGCAATGTATGCCCCTCCCATCAGTGGACAAATTAGATGTTCATCAAATATTACAGGGCTGTATTAACAAGAGATGGTGGTAATAGCAACAATGAGTCCGAGATCTTCAGACCTGGAGGGAGAATGTAGAGGGACAATTGGAGAAGTGAATTATATAAATATAAAGTAGTAAAAATT GAACCATTAGGAGTAGCACCCACCAAGGCAAAGAAAGAGTGGTGCAGAGAGAAAAAGAGCAGTGGGAATAGGAGCTTTGTTCCTTGGGTTCTTGAACAACCAGAACACTATGGGCGCAACGTCAATGACGCTGACGGTGCAGAGCCAGACAATTATTCTGGTATGGTGCAACAACTCAGACAAGCTGCTGAGACAGCAT CTGTTGCAACTCACAGTCTGAAGATCAAGCAGCTCAGGCAAGAATCCTGGCTGTGGAGAATACCTAAAGGATCAATAGCTCTGAGTTTGGGGTTGCTCTGGAGAAACTCATTGCACCACTGCTTGTGCCTTGGAATGCTAGTTGGAGTAATAAATCTCTGGAACAGATTTGATCACACGACCTGGATGGAGTGGGACCA GAGAAAGTAACAATTACACAAGCTTAATACACTCCTTGGTGAAGAATCGCAAAACCAGCAAGAAAAAGAATGAACAAAAATTATTGGAATTAGATAAATGGGCAAGTTTCTTGTGGAATTAGGTTTAACATAACAAATTGGCTGTGTATAAATTATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAAGAATAGTTTT TGCTGTACTTTCTATAGTGAATAGAGTTAGGCAGGGATATTCACCATTATCGTTTCAGACCCACCTCAACCCCGAGGGGACCCGACAGGCCCGAAGAATAGAAGAAGGTGGAGAGAGAGACAGATCCATTCGATTAGTGAACGGATCCTTGGCACTTATCTGGGACGATCTGCGGAGCGCATCGTCTTCAGCTACCACC GCTTGAGACACTCTTGATTGTAACGAGGATTGTGGAACTTCTGGGACGCAGGGGGTGGAAGCCCTCAAATATTGGTGGAATCTCCTACAGTATTGGAGTCAGGAGCTAAAGAATAGTGCTGTTAGCTTGCTCATTCTACAGCCATAGCAGTAGCTGAGGGGACGGGCGATAGGGTTATAGAAGTAGTACAAGGAGCTTG GCATGAGCTATTCGCCACATACCTAGAAAAAGGAGAATAAGACAGGGCTTGGAAAGGATTTTTGCTATAAGATGGGTGGCAAGTGAATGGTCAAAAAGTAGTGTGATTGGATGGCCTACTGTAAGGAAAGAATGAGACAGGCTGAGCCAGCAGCAGATAGGGTGGGGAGCAGCATCTCGAGACCTGGAAAAACATGGAG CAATCACAAGTAGCAATACAGCAGCTACCAATGCTGCTTGTGCCTGGCTAGAAGCACAAGAGGAGGAGGAGGTGGGTTTTCAGTCACACCTCCGCCTTTAAGACCAATGACTTACAAGGCAGCTGTAGATCTTAGCCACTTTTTAAAAGAAAAGGGGACTGGGAGGGCTAATTCACTCCCAAAGAAGACAAGATATCCT TGATCTGTGGATCTACCACACACAAGAGCTACTTCCCTGATTAGCAGAACTACACACGGGGCCAGGGGTCGAATGTCCACTGACCTTTGGATGGTGCTACAAGCTAGTACCAGTTGGGAGGCAGATAAAGGTAGAAGAGGCCAATAAAGGAGAGAACACCAGCTTGTTACACCCTGTGAGCCTGCATGGGATGGATGAC CCGGAGAGAAAGTGTTAGAGTGGAGGTTTGACAACCGCCTAGCATTTCATCCACGTGGCCGGGTGCATCCGGAGTACTTCAGAAACTGCTGATATCGGAGCTTGCTAGGGACTTTCCGCTGGGGGACTTTCAGGGAGGCGTGGCCTGGGCGGGACTGGGGTGGCGAGCCCTCAGATCCTGTAAGCAGCTGCTTTGCCTG TACTGGGTCTCTCTGGTTAGACCAGATCTGACCTGGGAGCTCTCTGGCTAACTAGGGGAACCCACTGCTTAAGCCTCAATAAAGCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGATCCCTCAGACCCTTTTAGTCAGTGTGGAAAATCTCTAGCAGTAGTAATGTCTTA TTATTCAGTATTTATAACTTGCAAAGGAATGAATATCAGGGAGTGAAGAGAGCCTTGACATTATAATAGATTTAGCAGGAATTGAACTAGACACACAGGCAAAGCTGCAGAAATTTGGAAGAAGCCACCAGAGATGCTACGATTCTGCACATACCTGGCTAATCCCAGATCCTAAGGATTACAGTTTACTAACATTTAT ATAATGATTTATAGTTTAAAGTATAAACTTATCTAATTTACTATTCTGACAGATATTAAATTAATCTCAAATATCATAAGAGATGTATCCCCATTAACCTGAAACTGAGAGGGAAAGATGTTGAAGTAATTTTCCCACAATTACAGCATCCGTTAGTTACGACTCTATGATCTTCTGACACAAATTCCATTTACTCCTC ACCCTATGACTCAGTCGAATATATCAAAGTTATGGACATTATGCTAAGTAACAAATTACCCTTTTATATAGTAAATACTGAGTAGATTGAGAGAAGGAAAAATTGTTTGCAAACCTGAATAACTTCAAGAAGAAGAGGTGAGGATAAAATTAACAGTTGTCATTTAACAAGTTTTAACAAGTAACTTGGTTAGAAAAGG GATTCAAATGCATAAAGCAAGGGATAAATTTTCTGGCAACAGAACTATACAATATAACCTTAAATATGACTTCAAATAATTGTTGGAACTTGATAAAACTATTAAATATTATTGAAGATATCAATATTATAAATGTAATTTACTTTTAAAGAATGTGTATCATTAGAGTAGAAAACAATCCTTATTATCACATTTTGTC AAACAAGTTTGTTATTAACACAAGTAAGAATACTGCATTCAATTAAGTTGACTATGCAGATTTTGTGTTTTTGTTAAAATTAGAAAGAGATAACTGAAGCGGCGCACGAAAAACGGAGAGGACAAGGTTTCAACGCTTGTCTGC + )''.+62124.89860,2)'%%#$,+))**((((#%%+)*""$(*236<6221/,/2+/&&(1&/&&::;>+6%:;98BCD?82;:12>?7))41?=9*,+,:890-3255:84;/*(0/3))&%((0$&003:69'+25+/- 2+)+/,/79/+,&2246778'.%%3AF==:4,01=?BB6:;<=;B=56::3:9?7:<612354,$#*19872::<;9:;A4::>8667)'#.%196?@?<@:./;<@CF?BADA76:;:AC?B=<:;>$$.$%(.:9<<9>;<7(+-@>?H?98A5B<@@&9>>:<6:59>8:=4,/- /58:'&(+,+4/+,,,((0<;>==79=@A@72:53254400&245;=<6+1;;EB;;:=857*$&+,,,.11:9:@=:><797;9;?75:-2)=DB;:9<:- CF<9?/+*,&46FC?RB8**6B97C'''')'%82?2241:945930==>.7:?B@A=>>;8;+>@*I?>@=B;=9:<78;:84++2<;/-//- 3,)-*-015.4457=<2.-0/$-25497866814:2%(3;4,(&/02>:;:8,,,7<::7>=<+&0.12/&)%%%)++289@;(991&%,967?:8;;C>;94<>A?B><:@FEH;9@=<887'+6;<9;:=?6;=;;;46952- 4....JB=11235820-9::798;?A8??7=9>>58A:/9::=>;A?>@>5=7)-%$%...5@<018;8;H?8=>9<@<6HA<9513-89:8:,<58>34%+.211;:=4;?=@:<>@>@7/78+%)/.*,'01=?1@:4*0.1$.$$&&&)$%*- 35:><;7...,,55;;=6)/8248.4@8<=<=?>;@C>B8;-*;;AA=?>>8=>>;4681*,5@=@;7C8548=.92&+"23;=9;::;>*-$$1=9899>205852.81<7A999<>??@?ADB2<;0).-0*0%698<:(5;=@D<'/:???:/+-$1211($49=:;8@=;;+5:H>998- )79577('976--0-;=802;;7,08;<8<62283-5(00334/%:::6679D>69<=:@>=;<=FA=;76;<5B7563317456:><,*++&6600.13;B9368./?A@66+(/8;7C:6<<:@?;44:BB<@=984(--%.6;;-963$2#''&()%591:=:===>8<78:-6:<(-<3C8=3?19- :$12.%&$((##$-,#&''+,+&/.+8>DD@??;:77@>D====A/+:<=>A3>670*+14643#+$'&+4/227&+(3520/()88:?;527)01027*'+,%*7>24A@==@@AIG<458?563387<863///--,&$3<94####&))*-0::- 29&00#$*.4+688(+49%)).0A@B<=:B=566;;<=;988;,2'''15599-#-21&+@@9;=90/)),4).,+*%125,$+881-'(-('+,/;/,%%2'&$.37B6765993%36::'23<3..&$%%(57=B2>999=HB($%00- *'%3996>=;6?:4700+56'::;4445;;;?:9074+C<,*8<:79C*>9A779;C<5511*49)30::;8>?B;/--/5:,51415%*107885%66/550244106:73392<>3446765(+,,,7:=0/):;>?H9=>A=>;8;=6&%&+- /1224?6=BA>7=,,/67;=9..4:*8':;D=:6/-59=8=@-6659;<2248=24B@@::OC@@=@8***;7KD725<=>56F>:>GA;ADCD;<>>QE9:823>991*-&'43;=7@97QCE4/34<;13;<73)27;;6?8457-6889;<=>@BA<>=79:B)/ID@=).*0*-,)+:7.*'+/5.1;:4- '6'(.<=?:5;<>5356:88>EL<:;IB?EEB@;934:<7<7781C965922.1-+,- )&('$*(&*,%,.988:B717978B=7'594>;=64A@9''2800($$$**46997./19*-(0-+*.<;:;<<>AC49HDE9;99;:@6>>?19;:B$:=::>639:7/4=??/7:5.1/%*,&%,/::85774><6;<9A>:<0332>5;AA5/5FAFC;:75>?95::*%$585:68>;>>6>53?236>><85>? 7<>7>H;55-,+3278@C?;8<=89;9=712;74<>AF=@A8=79789;7CA*%-86%%&'2/'$-.%8227=6?93A<77<+9>=9=9<97;<25//24=832>9;;9:9<:7940)%,(.$88:;;5)(-),40'/- +48>>;=<=90./>=997>@??@=$585"8=9<17:<,--2177$$<<=88@9/*9.1452598745=;;:=425?=;4$/((#+.2-.+,,1>><=>C?ABAA530.(9:4D@?>9:<+*223/%.%$363/233(%"#&%*+,-7/4=>;@:7A:>>=<85*6+B=>(34:83230,-+- %%.1&,48==98934789))/3:529879:3.<8:$67'$$5:.6752->;;2,**8981+--..36,%)*%$&5664233$'%+)2=9;;@88:A-66=<;><9:;;/+/77'><>><;;:0+*17:79.)%%4+&:;3*999;;?99>3**.8@;71--7<<==@<<6688<;=<:))+4*++48:879:@C;<6;:92666*+,%1&(,)(059:<476253/*)- 01..*&%&$&%$)+58;9:<;9CAA=>AA9;=@=@>=?<@AE>@7<;;+.+.4655(235?>D?56695.5568536;99<778;610-,@==<.DA;49EA->>@C>=8<143444C<1($..395@960)+++*(%(88==;86687.3=97'$.1//$&'#$,%,$$%&278>-4+,- *$$$5119;5*/5@?:/3%45::0=?>E7)*'.1%&#$$&$%&&-''%%(##),-/.125/-,$$%%//?:859::6/048<9=>7:GBE:?;9+5691.4.877888@><=>:;:6,(-6//1..6@@:>=B?>99;=;??767668DE<331:@A::8-/13=6?>A19:7:179.28988,226><<75786?<=>=?>7::78=>::?/68==D>6'&(%%%,(,+./5>;<<<847=8<@;.,,- 1%2>=@=?<<;@60/4%16;>62:AFB8/1568$%'&.38832'',>GB7G93586-.;;;:+3796;;8;6582-0*57(4<;*$%13:9=8589<;-2;?@:8=A@LCB:5AB:79<=?>58;976053JC;6=-3)#12128=90:0-- ("'.)%%*,.)$&*)1/*;<>:;;4;88(49<01*0;;G?<22<<=..))))()-&'$$=;:;<88679D9956567/,-475,#+##$'78:A8?:;3$$&<2/-,2,'*%$$'0$$&$.025109<2(%22*0318-9>7;?==?;:986:4-)/&01-(+*7.;,0:=6.-'',3**+16- ,&&%.&$($*$%%%&&(%219A=99;2+*5;>727<9/31))48<.,6;6$-/06==57..+43**%%')+/)$%$$/%+)((,&'))&'%&%,,)+/--+,-*$#%%$$%###+- 11397',1446977HA;*)(23/1%037;44:<653459AA?D?<;66KD6*82336C@<;27744>22*&&%'(%%&'1(.1422&*(('::>=BD63%:<<:;>:4:;<:78BB9&.55<;;;D998+06''((&'))&5770991((+89D=//&0,30**/517:>447;?<23.+74640+'&+00/13..280 0=;4*$%6$12+25(33:<;8'4<=;;=><;551/01:?<.63)#$&/&*&57+%%%*1-,><5/4/55('*2+O=<:=5-9::0550<7/&,900346@?7;;;2=*9+$%'').,85;=:;;7679<799=<:550)&4*&%5(3)%+,1;;@?=;98++@A:'7985++262260*1)(/1- '+#478:8()*351;=95,*%()$$$.&%$"#-2-'(01>=?@>=?>=;5863658;?:7G?@0262597>;6758@;:;=;9=>IA;HAD8<21,/0<79:9B;=06=5'+-..>C??=4,5446;4//:>8>@8'7>3;2(&/356;2/0)'-((-- #)))146,')++025$&&3**''%257;=>G<11+2*)*+87=7663/'99==9<,0'5&51478BF>=>BH=LHAE-(437:@EDA9N@<111)*$*,,/'<=<;<9667===6+4=9==?366(5;E;8/+)1.669=<4$+:7158<86;::6&76&72<3336-,3.- 067<<;89E@@B=44:>&&"&''0.0$6306:>;54/13045(),,,110-&-70534.560$&%$%(.%#%#*"#2/311057?;::;.&-5<+<:=:9B?@J<7(&&&57=- %&&('//7$,8==B>>.8157::=>>797;3>C6/600879808==E@@?929)71===8220*%0;5326,/);>>=9<<56878:<<>:+-(&$$)-.7-/,0$"%'&&%%&456368:5.378@?;6665+<7-*<6*$#34<:$)9:@A>6762($'((5522.;&&%%7385<8?@4%%&%(;3/:)239,+955)0+*66:<98>?=:<(0-*(>8<87,750- .+<:60:2:;M@5&544.*%+:=2A9>318:<665**6=9$1270/''*468866E?@?:;$*++77:B;AA=@<78><===8)/)$%***,-<>8(7===.68624()+(**(,-24189779==6?A?;->>3<<55::9932$.+,4*)/.-/%*- 6<;9=5I@<34740+);>?620>857:;>G9;<%<9+'2036*)'29??;;0451.<=;<($&0017>?B>A@A:9966>>?:71%))))*662567::NA5(8+)+--/--17-%+,(405;<:>;/0)69=969:9B:48::?>>9841.0398982943--#%'(',$0%99?888>87;=@>@AI5.- 7@;??<<9;:93+.;99;357:4-.04:;@8/('$&'...09;@1=>(%('/6.-,17:<;;;==><9&&6313552/1-)*))-?:5:421775,$('$0/6;<;1(++,(1.5;89;:54.3655;6;<;9:>=(8((- &(222375686./.024/7<;,/./3;;@8A/;:5$)073831$&+#1458)477;<6(%$49=)45<;7501699;=8136/4())),9:;57868:?2-565?DEDC?>46-5/0:98?8=399C?@A?A;4624''.%%.427620590,*/;:11,+201--./2(',/-.0-.1- <<@<901>/>>=28?9;:70<,::1)%47;<<8.)&&&18;1.*2200,(+.'><:26@;8**-.//.- 3,'''78825;:4=;=C>?IFJ:9B89=<3:@;<:;=>A<<7*0287151)(&'),,56=62;:=C9=:<?@@<6G<(151'(&%$&4349;9<54453<8;<;<@8+.83531.,.04(&%(/*)+=?>;800$&)$$%25,83788,$))+*)((%&'(+*:7/97>HF<:95,$&&$3$%4;D:1?.(?;6=97;AA?4)'/431'''*12,%.14B8589:?D>0/(2482.(.76;=@@5=<4*)289836>6;5/*&776:7<=8;8)711:, ,1220B@??31771//34%839?9;<0:?99@?63488::=>9A@E?78/-6((02:A>7)/64330%%*=>257&'()+)2.-)(05=99L:778767?44<<=:48903((7400<;C@0**.4280,013046>:)))%'+077978B;;>98:;4***-18<=:<2-001-75- >;;B43577?<@69527==7?8;=@91)):677655%**44.&%#%*,//668@6998:;<76456*'%&&&665A?47%'/$&-)06,)/4>?>8:,8($$##,+33-=>;7212>:9,68:5$)$)+/2451'$5/.%')**,.;<7888<42/+$259:A<::6:D?<:+*+(+--- .14.0238@>;=:4(7<:<<>74:>A-5165=?>6=?1);>;>A=@87:68<;;A@0?@197**/927:=;;=667>;4=7877761.0+3509714-,$#$'%##$+,9:=0655594?=><978<9589B>7=?>8-- 5997:;2%%'*%%%%$*+)'&/12.90$1115188>:8<;94820'+*('$&*%%)$$$5365555GA+$&8<>A::5=:1>H:D9994.&$$:@:4&622<;:==<@?=A>=?>,*'$(91+213158;=@=<=>=<;::<9?<=7<:2::<=88:>6;9;@617@?;C?82=88.3*.,/7.298.%&&((;6:658;6/10-4/22??79780<>=532@92:<==;;.%)<<>1><=,2(&85>=;-37:;8553799??9;<4''%%&358.1,2409:58.)((4822/5975:<773*5;:?;:;B<>>HC>;?>?@=<>><9;>?>6.+686(6968<<+++,869562895:7878;;3798;:87A:;=A4;568=<9969.*(/0+4'(30-.)'"%)-012329?==A;96>610+35;:AAC;7;=>@?>028:798853-).2;36:30665364<5+.;<;689.2///::6()))'),,.76+,+;<;;>?>>;;?=9AD24458*+359657@777>?>:E?9=51.+/-- ,06<:>8;=<=><35..5...2:?;577?63;:50+-+*%-)*(&'1120+,-+''+&('&&&&,$$&%'0$%18;;98=1*79?312,,+24345;:A@<8:8:45:,**000//-,*-<<7<@04.- $#%(%"#43796;7*<:<8265&###:;;815<.A?71:86000,,.34;=CD8*)&&&45:=9;=6#&4.2;;5*++%&%&(&&()&$$245==:><:;;4617882764.,&)16<<6327.38?=86;>,116509:88>8;BD2.0001)./;;@657,*75749:=<>;CB;B?B=:;3.44*))0,./3229=9;=;94&24453*(9.796*.3$%<<;GC3359=889504A<<605:*+<793/;;<=),/<>;<<;664;6>=;),12:4%+7*($'',:>8=B 5.396<5$'3:5*0),3/.424331/776>9189MA88(9>96874$$'&-5)790770;:>004A<;67778+00057==74;3,,(&'%%32211794438;;4)(',/290-:>>=998:76895=<@:@6*%'&)'%(%$"#&*+/364272033*((--0087,+),;=43(-/$+,+'*)...+*//- 369434./425(030/0&*.)$#('%#%*&#&&%#%)''''-79;:1;<:81<;/8:15'$58640#$%$''500037;:=5436@A:5:<=4<:9466:9,4'*-;9=<98519=>82)&"&&*+)#$#(&+0.'88>@76>74672**--689;>:766>>>63&##%-54889;>?;4;788+,+2- 0#37?>;=>@;54<<;A=9<:B=:;6/15700764=A=>>=?=6:/><7/07%/*)-'&(,$,'$'''*08:>;>9:447<;7-+)+38,*5'088==07;=<48HCD;?;:())));=>//78::;==:I242(0349:/5.+-58=::C>A96/144)&$-,"$&')4<>;::*'9:;5<87;9B7>B<- +656<=:91&%**12;@;;9844''278=47;1;=36;??BB:;><189:>83##145B337A@C9;+38,--<699@7=75511=C=>>78:/3230/3>;99=>8543:=;:10/594+54211'&6),.115982125+++022)&(6++))(*-6-96CC3&80*%5#$%%8=<<:=><<@A<$- 7=:891<=<;97;:;::56:=9,)(&%592%7;>:2/)$%$#$#)$#'#&58.28973387035<;)%-(&&#+$$&+*367=N@7747=69:4158/=79><:7;988566<861/454--';60.&'+/>++.*6:@D;1:8:@9><22>;8458036::#- .(%32CA@<7:<55=<:=?>>639;:769('(*&)$+9$=;0>:;38<;6(249+-=2<=8&,*#$"1+7@:;;@C70*-986883666,1./''-,)#$%&(5<;>=<28:854;;8125''>>3((-(-%,/.77=6304;8BC>6.777554?/289+77=>=:8:89:=D85<+#%+0- 49KE<55.3,28:;>99=<;=3:76.6389:9435/$-;5(09<<;8)'&+33.2AA@1:89840,24%&&$11237*122-79:8B:64::<=6;>@=<>===/3:6331551)+,586=>0,,-9=BAEB@B;1/.;;2)B<<=B=:1)&$)&.04;;:=815;C?CBC=668995I@6;<@<;9>B9<===:;<9;65&&0)0,0(001.- %3;6@=AAD?:?:A==3-+13798<=6-4685'843,%$#$&'''9<:92588&'-.%#1<77>52=88;=>-9:<:88BC:8///)-3%'7AA;8;8<46212'++7-==<>266<@><=<4?=-$)&)35:<>=:0/<>88*:99=:B=78899?=::70-.6>E;./'$%%&+,6::6/+/75:<:'&($++:>>>BAC<<::,'&&$$$$$)%&.-+899::,#"'(7*-78;==A:/B><72%8;7G?:865/4/15=@>=;&-03+**'7<@5- 7=23762(&#*22555>@A5,,,2;:76907=<;1<<:8';;;===7/276>B;>=(,1*;;;-37/7AC9?91..,30.57:9G<==BB699+0<>39:0-+.3748<7:<4/383/;5899921123+%&((./256869:/:<<)1;9@?=77=82251:E@>/&ED;<<<9<9:+&)$&$/$$$%(($.00159:7:88?;A?6@AC@B<0.+,..$$$&,(+&(%%%(526:<<7<:8&#&(&%&$.-*-35<7B:4- /8CA8@?B>))0/'(,+35???DCD=;98@>@CCDCC9>8;;>=G:8;=?8>A8:85560(5.0319:1/1,*$(,/1922..2:+;>*,044+,0*1&289:555112,)#%$$%('*+73;4/8- 52*+&%%$%&&%/8?;@@B==;:111)$+8:=4@B>==@GC5::=8862/971-5=B>537A7113;;@BA955=>;<=>97::::<=;:;;723/12<=>:77A8-:/@?=;;?8679>>78:8547=:;B;359622333324&8:??47;1*+&(2510),#$- 48><?5/984+358:+79:6AA:339892%$*8,,?:><@5''&-++'&4*602/3777;98B;:9:;8>=9=9-/422646<-7+,66;;.3+*899GBC.:774>884==40..4;4/0%$$%&7:;D?C9:>7- )*.2>694;4;=9?A=CID:;<338:4:6247(+45;5:86488767'%5475808B>=>;C@?;/0414:58;ABEBJ?9*#'(5577:345135)3;9&&::<419;8<76899:::65478:;8;::7633>3><=@60/5;;>>9884225..1:901$7-(&&'326>;:-*0$%%&&')'&-/12565444.:7>3D=?:;2*+#+,,C;77<=;<=4)%"

Figure 2: Longest read containing complete full-length HIV-1 (early clone of HIV-1

reference strain HXB2). The 5th longest read in the barcode 10 set (read ID 6fbf0205-

5195-460e-8e28-930db50e5d79) contained full-length HIV-1. Query (full read) blastn

against HIV (taxid:11676) returned 92.95% identity to HIV-1, complete genome (accession

number AF033819.3). Limiting query to HXB2 (red) blastn against Nucleotide collection bioRxiv preprint doi: https://doi.org/10.1101/611848; this version posted April 17, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

16 nr/nt returned 100% coverage and 93.02% identity to HIV-1 HXB2. This read was 11,487

bases long, with mean quality score 11.984396. Base called using Guppy 2.3.1 with

FlipFlop config.

bioRxiv preprint doi: https://doi.org/10.1101/611848; this version posted April 17, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

17

Figure 3: Assembled HXB2 plasmid, varying base caller and polymerase. Each pane (n=6)

summarizes the results of contig curation. Tan boxes contain alignments of

reference/blastn hits to contig (Black line). Divergence from reference decreases with

newer base callers, and with long amplicon DNA polymerase (Sigma-Aldrich Taq vs. LA

Taq by Takara). Annotations below the black lines were automatically imported with

SnapGene and represent sequences with minimal/no divergence from common cloning

features. The regions corresponding to HXB2 are between red boxes (LTRs). Viewed in

SnapGene. bioRxiv preprint doi: https://doi.org/10.1101/611848; this version posted April 17, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

18

Figure 4: Single nucleotide variants from HXB2 plasmid. Base called with FlipFlop.

Visualized in Integrated Genome Browser [15]. Gray indicates per-base consensus

accuracy  80%.

bioRxiv preprint doi: https://doi.org/10.1101/611848; this version posted April 17, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

19

Substitution Mutation Class Homopolymer- Same as LANL Site Position Coverage Change Change Subfeature Frame Class (Syn/Non/Stop) adjacent? neighbor? Feature

1 24 5669 C>A transversion NA NA yes yes 5'LTR U3 NA

2 108 5028 A>G transition NA NA yes yes 5'LTR U3 NA

3 164 6674 G>T transversion NA NA yes no 5'LTR U3 NA

4 168 6722 T>G transversion NA NA yes yes 5'LTR U3 NA

5 176 6817 A>G transition NA NA yes yes 5'LTR U3 NA

6 182 6427 C>T transition NA NA yes no 5'LTR U3 NA

7 227 6488 A>G transition NA NA yes yes 5'LTR U3 NA

8 291 7122 A>G transition NA NA no no 5'LTR U3 NA

9 333 7521 C>T transition NA NA no no 5'LTR U3 NA

10 654 7927 C>T transition NA NA no no None None NA gag 11 1659 7378 aaG>aaA transition None Syn yes yes gag p24, p55 frame 1 gag frame gag:agG>agA gag:Arg>Arg 12 2259 9070 transition Syn/Non no no gagpol p6 1 pol pol:Gtc>Atc pol:Val>Ile frame 3 pol 13 2927 9126 aaG>aaA transition None Syn yes yes pol p51 RT frame 3 pol 14 3812 9628 ccC>ccT transition None Syn yes yes pol p51 RT frame 3 pol 15 4574 9638 acT>acA transversion None Syn no no pol p31 IN frame 3 pol 16 4596 9736 Ggt>Agt transition None Syn yes no pol p31 IN frame 3 pol 17 4609 9272 aGg>aAg transition Arg>Lys Non yes yes pol p31 IN frame 3 gp41 gcC>gcG RRE, also frame 18 7823 9304 transversion ASP:Gly>Arg Syn/Non no no gp41 Ggc>Cgc ASP 3, ASP -2 nef 19 9253 9408 Ata>Gta transition Ile>val Non no yes nef/3'LTR also U3 frame 1 20 9418 8786 C>T transition NA NA no no 3'LTR U3 NA

Table 1: Summary of pHXB2 sample divergence from reference HXB2. Coverage

numbers are with Albacore basecalled reads, BWA-MEM in Galaxy alignment, viewed in

IGV. Base-1 (first base is numbered 1, 2nd 2, etc..), relative to HXB2, Accession number

K03455.1. Changed base represented as upper-case. Annotated as codon if in protein-

coding region. No deletions or insertions were predicted from manual inspection.

Abbreviations, ASP: antisense protein, RRE: rev-response element, NA: not applicable. bioRxiv preprint doi: https://doi.org/10.1101/611848; this version posted April 17, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

20 Syn: synonymous mutation. Non: non-synonymous mutation. Stop: stop codon/non-

sense mutation. LTR: . RT: . IN: integrase. LANL:

Los Alamos National Laboratory HIV Database. Data from two separate sequencing

experiments on the same plasmid sample.

bioRxiv preprint doi: https://doi.org/10.1101/611848; this version posted April 17, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

21 SUPPLEMENTAL DATA

Supplemental Figure 1A: Read stats with different callers/aligners.

Median (big dash) and quartiles (little dash). Read lengths increase with higher fidelity Taq.

bioRxiv preprint doi: https://doi.org/10.1101/611848; this version posted April 17, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

22

Supplemental Figure 1B: Newer base callers increase read mean quality.

Median (big dash) and quartiles (little dash). Effect of enzyme version was not statistically significant.

bioRxiv preprint doi: https://doi.org/10.1101/611848; this version posted April 17, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

23

Supplemental Figure 2: A set of homopolymer tracks from HXB2 plasmid. Visualized in

Integrated Genome Browser [15]. Alignments with BWA-MEM shown from FlipFlop (top)

and Albacore (bottom) base called reads.

bioRxiv preprint doi: https://doi.org/10.1101/611848; this version posted April 17, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

24

Albacore SNPs Guppy SNPs FlipFlop SNPs

150 800 200

600 150 100

DEL/INS Ratio DEL/INS Ratio400 DEL/INS Ratio100

50 200 50

0 0 0 ======Δ = Δ = Δ = Δ = Δ = Δ & & & & & & & & - & & - & & - + + - A + + - A + + - A A A A P A A A P A A A P P P P H P P P H P P P H H H H H H H H H H

Supplemental Figure 3: The ratio of deletions to insertions is higher at mismatches both

adjacent to homopolymers and similar to neighbor bases. Box plot shows median (“x” is

mean) and quartile ranges. Y-axis is ratio. HPA: homopolymer-adjacent. ==: same as

neighbor base. Δ: different than neighbor base. Higher coverage (above ~10) usually

makes up for current error profile. Above true for Albacore, Guppy, and FlipFlop. bioRxiv preprint doi: https://doi.org/10.1101/611848; this version posted April 17, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

25 Mismatches Gaps (INDEL)

Taq LA Taq Taq LA Taq

Albacore Guppy FlipFlop Albacore Guppy FlipFlop Albacore Guppy FlipFlop Albacore Guppy FlipFlop

5' LTR NA 9 9 9 9 9 NA NA 0 2 0 0

gag 2 2 2 2 2 2 12 10 9 9 8 8

5' LTR+ψ 10 10 ΝΑ ΝΑ ΝΑ ΝΑ 5 2 ΝΑ ΝΑ ΝΑ ΝΑ

pol 7 6 6 6 6 6 26 22 9 18 11 10

vif 0 0 0 0 0 0 3 3 2 3 1 1

vpr 0 0 0 0 0 0 1 1 1 0 0 0

tat 2 1 1 1 1 1 10 6 3 7 4 5

rev 2 1 1 1 1 1 10 7 4 7 4 5

vpu 1 0 0 0 0 0 0 0 0 0 0 0

gp160 2 1 1 1 1 1 11 7 4 8 4 5

nef 1 1 1 1 1 1 3 2 2 3 2 1

3' LTR 2 2 NA NA NA NA 2 0 NA NA NA NA

nef+3' LTR NA 2 2 2 2 2 NA 2 2 5 2 1

HXB2 22 20 20 20 20 20 61 46 28 47 27 25

ownstream bridge 0 0 0 0 0 0 4 5 1 2 1 1

pBR322-related 0 0 0 0 0 0 19 19 13 18 19 15

upstream bridge 2 2 3 2 2 2 8 6 5 7 7 3

Supplemental Table 1: Variation in assemblies at the feature level. Assembled with

Canu. NA denotes features which may not have matched exactly, but which were

collapsed with adjacent features for ease of counting. Mapped with SnapGene. bioRxiv preprint doi: https://doi.org/10.1101/611848; this version posted April 17, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

26 LIST OF DATA (To be deposited in SRA in near future)

Albacore base called barcode 10 files Guppy base called barcode 10 files FlipFlop base called barcode 10 files Albacore base called barcode 11 files Guppy base called barcode 11 files FlipFlop base called barcode 11 files . files of contigs (n=6)