Bacterial comparative genomics and transcriptomics using long nanopore reads

Long reads enable complete assembly of bacterial genomes, allowing us to compare genomes between strains and species, and to monitor the transcriptome-wide response to different stimuli

Contact: [email protected]
More information at: www.nanoporetech.com and publications.nanoporetech.com

[Figure 1 panels: dot plots of the nanopore assembly against a reference. a) Position in assembly (Mb) against position in K-12 NEB 5-alpha genome (Mb). b) Position in assembly (Mb and kb) against position in K-12 MG1655 genome (Mb and kb).]

[Figure 2 panels: a) and b) alignments of the nanopore assembly to E. coli K-12 MG1655 with its gene annotation; axes give position in K-12 genome (kb). Legend: homologous sequence, no homologue, translocation. Labelled features: 82,982 bp / 85 genes; 12,852 bp / 11 genes; 29,510 bp; 6,166 bp; genes mmuP, cynT, cynS, mhpD.]

Fig. 1 Closed assemblies compared to a) the correct K-12 strain, b) the primary K-12 reference

Fig. 2 Detailed analysis of inserted and deleted sequences between E. coli strains

Single-contig assembly of bacterial genomes using long 1D² reads and Canu

We extracted genomic DNA from a culture of E. coli strain NEB 5-alpha, prepared a 1D² library using an SQK-LSK308 kit, and sequenced this library on a FLO-MIN107 flowcell. We basecalled with Albacore 2.0.2 and assembled 8,000 of the 1D² reads (160 Mb) with Canu 1.5, using an error-correction rate of 0.04. We obtained a single-contig assembly and constructed a MUMmer plot of this assembly against the reference sequence of the NEB 5-alpha strain (Fig. 1a) to confirm that our assembly was correct. We also constructed a MUMmer plot against the K-12 MG1655 strain, to reveal any large-scale structural differences (Fig. 1b). Several differences were evident, and we investigated these further.

Nucleotide-level resolution of breakpoints by comparison to the annotated reference

We performed global alignment of our assembly to the MG1655 reference genome using Mauve, to detect and render any structural variants (insertions, deletions, translocations and inversions) and to locate the resulting breakpoints. By comparing the assembly to the annotated reference, we were able to identify affected genes and mobile elements (Fig. 2). The easy availability of fully-closed assemblies using long nanopore reads will simplify the identification and characterisation of bacterial strains, enabling their correct phylogenetic placement, and will allow us to monitor the exchange of genetic material, such as antibiotic-resistance cassettes, between strains and species.
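As a rough sanity check on the assembly input quoted above (8,000 reads totalling 160 Mb), the short sketch below works out the implied mean read length and the approximate sequencing depth. The genome size of about 4.6 Mb is an assumption for an E. coli K-12 strain and is not stated in the text; the function names are illustrative only.

```python
# Figures quoted in the text: 8,000 reads totalling 160 Mb.
N_READS = 8_000
TOTAL_BASES = 160_000_000

# ASSUMPTION (not stated in the text): an E. coli K-12 genome of roughly 4.6 Mb.
ASSUMED_GENOME_SIZE = 4_600_000


def mean_read_length(total_bases: int, n_reads: int) -> float:
    """Average read length in bases."""
    return total_bases / n_reads


def fold_coverage(total_bases: int, genome_size: int) -> float:
    """Average depth if the reads were spread evenly over the genome."""
    return total_bases / genome_size


if __name__ == "__main__":
    length = mean_read_length(TOTAL_BASES, N_READS)
    depth = fold_coverage(TOTAL_BASES, ASSUMED_GENOME_SIZE)
    print(f"mean read length: {length:,.0f} bases")
    print(f"approximate depth: {depth:.1f}x")
```

Under that assumed genome size, the subset of reads used for assembly corresponds to a few tens of fold coverage, with reads averaging 20 kb.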

[Figure 3 panels: a) circular genome plot (0 kb to 4,500 kb) with tracks for logarithmic phase, stationary phase and log2-fold change, plus gene annotation (positive-strand and negative-strand CDS). b) 6,243 bp, c) 29 kb, d) 3,735 bp and e) 5,669 bp regions, each showing log phase cDNA coverage, stationary phase cDNA coverage, log / stationary phase log2-fold change and gene annotation. Annotated genes: yigM, metR, metE, ysgA, udp, rmuC; csgD, csgB, csgA, csgC, ymdA, ymdB, clsC; elaD, yfbK, yfbL, yfbM, yfbN, yfbP, nuoN, nuoM, nuoL, nuoK, nuoI, nuoG, nuoF, nuoE, nuoC, nuoB, lrhA, alaA, yfbR, yfbS, yfbT, yfbV; yeaX, rnd, fadD, yeaY, tsaB, yoaA.]

Fig. 3 Comparison of E. coli expression levels in log and stationary phases a) Circos plot, b)–e) selected genes with substantially different expression levels under the two growth conditions

Comparative transcriptomics: PCR-free sequencing of full-length cDNAs allows us to monitor changes in expression levels under different growth conditions

To investigate transcriptomic changes under different growth conditions, we grew E. coli strain NEB 5-alpha cells in LB broth and measured the optical density at regular intervals to identify the growth stage of the culture. We took samples of the culture in both log-growth phase and stationary phase, transferring the cells immediately to a solution of RNAlater, before extraction of total RNA using Qiagen’s RNeasy kit. In our experience, this kit provides a simple and efficient way to obtain full-length RNA from Gram-negative bacteria. We added a polyA tail to the total RNA using E. coli PolyA polymerase (NEB) and then depleted ribosomal RNAs using 5´-phosphate-dependent exonuclease (Lucigen). Next, we prepared cDNA sequencing libraries using Oxford Nanopore’s PCR-free Direct cDNA protocol and sequenced each library on a FLO-MIN006 flowcell, generating 3 Gb of sequence per library.

Using BWA-MEM, we aligned the cDNA reads to our NEB 5-alpha assembly and generated a Circos plot to display the differences in expression level (Fig. 3a). We converted the alignment BAM files to bigWig files using deepTools, and visualised the alignments in IGV alongside a genome annotation from strain MG1655, which we added by GFF liftover.

There were differences in the expression levels of many genes. Among the genes with the most significantly different expression levels, we found csgA/B and fadD to be more highly expressed in stationary phase. These genes are involved in biofilm formation and quorum-sensing pathways, respectively. Conversely, we found the expression of nuoC and metE to be substantially higher in log phase. These genes are involved in energy transfer and amino-acid synthesis, respectively (Figs. 3b–e).

P17019 - Version 4.0 © 2017 Oxford Nanopore Technologies. All rights reserved.
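The log / stationary phase log2-fold change shown in Fig. 3 can be illustrated with a minimal sketch. The poster does not say how the values were calculated, so the approach here (scaling each gene's read count by library size, with a pseudocount to avoid division by zero) is an assumption, and the gene names and counts are hypothetical placeholders rather than measured data.

```python
import math


def log2_fold_change(log_count: float, stat_count: float,
                     log_total: float, stat_total: float,
                     pseudocount: float = 1.0) -> float:
    """log2 of log-phase over stationary-phase expression.

    Counts are scaled by each library's total read count so that
    libraries of different depth are comparable. The pseudocount
    (an assumed choice) keeps genes with zero reads finite.
    Positive values mean higher expression in log phase.
    """
    log_frac = (log_count + pseudocount) / log_total
    stat_frac = (stat_count + pseudocount) / stat_total
    return math.log2(log_frac / stat_frac)


if __name__ == "__main__":
    # HYPOTHETICAL per-gene read counts: (log phase, stationary phase).
    counts = {"geneA": (700, 100), "geneB": (50, 400), "geneC": (200, 200)}
    log_total = sum(c[0] for c in counts.values())
    stat_total = sum(c[1] for c in counts.values())
    for gene, (log_n, stat_n) in counts.items():
        lfc = log2_fold_change(log_n, stat_n, log_total, stat_total)
        print(f"{gene}: {lfc:+.2f}")
```

With this sign convention, a gene such as nuoC or metE (higher in log phase) would have a positive value and csgA/B or fadD (higher in stationary phase) a negative one.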