What Can the MiSeq & NGS Core Do for Your Research?

Grant Hill
Library Prep Specialist
[email protected]

© 2013 Illumina, Inc. All rights reserved. Illumina, IlluminaDx, BaseSpace, BeadArray, BeadXpress, cBot, CSPro, DASL, DesignStudio, Eco, GAIIx, Genetic Energy, Genome Analyzer, GenomeStudio, GoldenGate, HiScan, HiSeq, Infinium, iSelect, MiSeq, Nextera, NuPCR, SeqMonitor, Solexa, TruSeq, TruSight, VeraCode, the pumpkin orange color, and the Genetic Energy streaming bases design are trademarks or registered trademarks of Illumina, Inc. All other brands and names contained herein are the property of their respective owners.

Agenda
• Intro/basic principles of NGS (terms & technology)
• Small genome sequencing
• Targeted resequencing (custom panels)
• 16S metagenomics & amplicon sequencing
• Small RNA & targeted RNA expression
• Experimental design & local resources

Enhanced Focus on the Sample to Answer Integration
From library prep to downstream informatics & knowledge generation
Library Prep -> Sequence -> Answer

For Research Use Only. Not for use in diagnostic procedures.

The Flow Cell
Where the magic happens
• Everything except sample preparation is completed on the flow cell
  – Template annealing (1-384 samples)
  – Template amplification
  – Sequencing primer hybridization
  – Sequencing-by-synthesis reaction
  – Generation of fluorescent signal
[Figure labels: MiSeq, NextSeq, HiSeq]

Flow Cell
[Figure labels: Surface; 8 channels; surface of flow cell coated with a lawn of oligo pairs]
Simplified workflow
• Clusters in a contained environment (no need for clean rooms)
• Sequencing performed in the flow cell on the clusters

Sequencing by synthesis
[Figure: sample preparation from DNA (0.1-1.0 µg); single molecule array; cluster growth; sequencing cycles 1-9 with example base calls T G C T A C G A T …; image acquisition; base calling]

Illumina Paired End Sequencing
Overview of automated process on the flowcell
1. Cluster amplification
2. Linearize DNA (1st cut)
3. Sequence 1st strand (Read 1, the normal sequencing process)
4. Strand re-synthesis
5. Linearize (2nd cut)
6. Sequence 2nd strand (Read 2)
Because we amplify on the flowcell surface, we can resynthesize the DNA in the cluster and regenerate both strands again.

de novo Sequencing
[Figure labels: contig, gap, contig, long read, mate pair]

Re-Sequencing
Alignment
[Figure: reads aligned to a reference, with local depth labels 3X and 2X]

Variant Calling
[Figure labels: G/A SNV; random errors]
Reads (each offset by one base against the reference):
GCTATGCATTGGCATGGCATGCTAGCTACGGGATGCTGATCGATTTCGAAACTGACT
 CTATGCATTGGCATGGCATGCTAGCTACAGGATGCTGATCGATTTCGAAACTGACTG
  TATGCATTGGCATGGCATGCTAGCTACGGGATGCTGATCGATTTCGAAACTGACTGT
   ATGCATTGGCATGTCATGCTAGCTACAGGATGCTGATCGATTTCGAAACTGACTGTT
    TGCATTGGCATGGCATGCTAGCTACGGGATGCTGATCGATTTCGAAACTGACTGTTA
     GCATTGGCATGGCATGCTAGCTACGGGATGCTGATCGATTTCGAAACTGACTGTTAG
      CATTGGCATGGCATGCTAGCTACAGGATGCTGATCGATTTCGAAACTGACTGTTAGC
       ATTGGCATGGCATGCTAGCTACGGGATGCTGATCGATATCGAAACTGACTGTTAGCC
        TTGGCATGGCATGCTAGCTACAGGATGCTGATCGATTTCGAAACTGACTGTTAGCCA
         TGGCATGGCATGCTAGCTACAGGATGCTGATCGATTTCGAAACTGACTGTTAGCCAT
Reference:
GCTATGCATTGGCATGGCATGCTAGCTACAGGATGCTGATCGATTTCGAAACTGACTGTTAGCCAT

Paired-End Sequencing Extends the Power of the Technology
Single-end:
  Known sequence
  Sequence reads (single-end)
  New sequence 80-90%
Paired-end:
  Known sequence
  Sequence reads (paired-end)
  Unique placement of one end can resolve ambiguous placement of the other
  New sequence 95 to >99%

Indexing: pooling samples
• Unique sequence "barcode"
• Increase throughput
• Maximize capacity
[Figure: DNA/RNA samples -> create library & add index -> indexed libraries -> pool -> informatic de-multiplex -> FASTQ]
Example indexed reads:
TAAGGCGAGTA…..
CGTACTAGTGG…..
AGGCAGAAAGT…..
TCCTGAGCTGC…..
GGACTCCTGTG…..
TAGGCATGTAG…..
CTCTCTACGCA…..
CAGAGAGGAGT…..
GCTACGCTGTG…..
CGAGGCTGTAG…..
AAGAGGCAGCA…..
GTAGAGGAAGT…..

Coverage
[Figure: reads tiled over a target, with local depth labels 4X, 2X, 3X]
average coverage = (read length) x (# of clusters) / genome size (target)
Examples
Human on HiSeq 2500 (Rapid):
  (2x150bp) x (600 Million clusters) / 3 Billion bp (human) = 60 X average coverage
E. coli on MiSeq – 96 samples:
  (2x300bp) x (25 Million clusters) / (96 samples x 3 Million bp (E. coli)) = 52 X average coverage

Coverage: Optimizing Sequence Capacity
[Figure labels: High Performers Consume Unnecessary Coverage; Low Performers Need More Coverage; Minimum Coverage]

Applications Guidelines
Applications: Whole Human Genome, ChIP seq, Exomes, Shotgun Metagenomics, RNA seq, Targeted Panels, Microbial & 16S Metagenomics

Platform          Positioning        Reads
MiSeq Series      Focused Power      1-25M
NextSeq           Flexible Power     130-400M
HiSeq Series      Production Power   300-2,500M
HiSeq X Series    Population Power   3,000M

MiSeq Desktop Sequencer - Speed and Simplicity
• Economical personal sequencer
• Easy to use workflow
• Rapid turnaround time
• Proven sequencing chemistry
• Multiple applications

MiSeq Offers Scalable Sequencing
MiSeq Core Consumables Version 3: 25 Million Reads
  • 600 cycles
  • 150 cycles
MiSeq Core Consumables Version 2: 15 Million Reads
  • 500 cycles
  • 300 cycles
  • 150 cycles
MiSeq Core Consumables Version 2 Micro: 4 Million Reads
  • 300 cycles (Micro)
MiSeq Core Consumables Version 2 Nano: 1 Million Reads
  • 500 cycles (Nano)
  • 300 cycles (Nano)

Multiple Kits Available to Maximize Utility

Chemistry Version   Number of Reads (mil)   Cycles   Output (Gb)   Sequencing Time (hrs)
3                   25                      600      15            56
3                   25                      150      3.8           21
2                   15                      500      8.5           39
2                   15                      300      5.1           24
2                   15                      50       0.85          5.5
2                   4 (Micro)               300      1.2           28
2                   1 (Nano)                500      0.5           19
2                   1 (Nano)                300      0.3           19

MiSeq Applications Portfolio
Integrated. Optimized. Simplified.
[Figure: application grid. Recoverable labels: Amplicon Sequencing; Custom Amplicon; Targeted Resequencing; Custom Enrichment; Small RNA sequencing; Clone checking; ChIP-Seq; Library QC; 16S Metagenomics. Remaining labels, in extraction order: Plasmid, Regulation, RNA-Seq, Resequencing, Small RNA, De novo, genome sequencing, sequencing]

MiSeq Reporter (MSR) Apps
Workflows: De Novo Assembly, Enrichment, Generate FASTQ, LibraryQC, PCR Amplicon, Metagenomics, Small RNA, Resequencing, TruSeq Amplicon
• Streamlined on-board analysis workflows
• Most workflows also available on BaseSpace
• No user intervention from sample loading to report generation
• Accessible from any computer on the same local network as the instrument
• All workflows generate FASTQ files that can be analyzed by most 3rd party apps

BaseSpace Core Apps and BaseSpace Labs Apps
16S Metagenomics, TopHat Alignment, Cufflinks Assembly & DE, RNA Express, VariantStudio, FastQC, Kraken Metagenomics, NextBio Annotates, VCAT
BWA Enrichment, Isaac Enrichment, Broad IGV, TruSeq Amplicon, Amplicon DS, NextBio Transporter, SRST2, FASTQ Toolkit, Velvet Assembly
BWA WGS, Isaac WGS, Tumor Normal, Long Read Assembly, Long Read Phasing, PicardSpace, SRA Import, Prokka Annotation

Third-Party Apps (labels in extraction order)
SPAdes, Novoalign, Advaita, DNASTAR, SCIEX, SCIEX, SCIEX, EDGC, SWATHAtlas, Pipeline, Annotator, MetaPhlAn, N-of-One, MyFLQ, LoFreq, eGB, Genomatix, Genome, OncoMD, GeneTalk, Profiler, PathGEN, Melanoma, Tute, DeepChek®, PEDANT, CosmosID, Dx, PathSEQ, Profiler, HIV, HBV, HCV

Nextera XT workflow and applications

Library Prep Approaches
• Shotgun – random fragmentation
  – mechanical shearing
  – enzymatic shearing (Nextera)
• Considerations
  – fragment length
  – uniformity of coverage
  – workflow

A Typical MiSeq DNA Workflow

Step       Tool               Effort
Prep       Nextera XT         15 minutes hands-on
Sequence   MiSeq              20 minutes hands-on
Analyze    MSR or BaseSpace   Fully automated
Share      BaseSpace          Secure and store

• Rapid library preparation
  – Complete library prep in as little as 90 minutes with only 15 minutes of hands-on time
• Optimized for small genomes, PCR amplicons and plasmids
  – One library prep kit for many applications
• Ultra low input
  – Only a single nanogram of input DNA needed
• Innovative sample normalization
  – Eliminates the need for library quantification before sample pooling and sequencing

Nextera XT, how does it work?
FLEXIBLE: Genomic DNA; environmental sample (metagenomics); long range PCR amplicon; plasmid; ds-cDNA
FAST: 5 minutes
COST EFFECTIVE: Under $35

Nextera XT workflow applications
• Assemble and compare genomes of bacterial isolates
  – Develop an infection control program
• Perform whole genome metagenomics
  – Monitor evolution of bacterial populations under specific conditions
• Single cell RNA-seq analysis
  – Sequence amplified ds-cDNA from single cells from different tissue compartments
• Resequence long range PCR products to detect variation between samples
  – Manage highly similar genes by leveraging the LR PCR specificity

Using NGS to Assess Food Pathogen Outbreak Samples
First example: FDA/CDC Listeria whole genome sequencing project
Compared with pulsed-field gel electrophoresis (PFGE), WGS provides clearer distinction between cases and foods that are likely part of a given outbreak and those that are not.
Whole-genome sequences of the Listeria strains isolated from Roos Foods cheese products were available after the recall and were found to be highly related to sequences of the Listeria strains isolated from the patients.

First time EVER that microbial whole genome sequencing was used by the US Federal Government in real time to link an outbreak to a company and affect legal action.

PFGE Protocol

CDC Protocol, PulseNet
Application: Bacterial WGS from culture
Displacing PFGE
MiSeq & Nextera
Stages: Sample Prep, Library Prep, Primary Analysis
• Grow culture
• Lyse cells from cultured
• Nextera XT
• MiSeq workflow:
• 16 samples
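
The average-coverage relation on the Coverage slide, (read length) x (# of clusters) / genome size, can be checked with a few lines of Python. This is a minimal sketch, not part of the original deck: the function name `average_coverage` and its parameter names are made up for illustration, and the inputs are the figures quoted on the slide (including the slide's 3 Million bp figure for E. coli).

```python
def average_coverage(read_length_bp, reads_per_cluster, clusters,
                     genome_size_bp, samples=1):
    """Average coverage = total sequenced bases / total target bases.

    reads_per_cluster is 2 for paired-end runs (e.g. 2x150bp) and 1 for
    single-end runs. samples is the number of pooled genomes sharing the run.
    """
    total_sequenced_bases = read_length_bp * reads_per_cluster * clusters
    total_target_bases = genome_size_bp * samples
    return total_sequenced_bases / total_target_bases


# Human on HiSeq 2500 (Rapid): 2x150bp, 600 million clusters, 3 billion bp genome
human = average_coverage(150, 2, 600_000_000, 3_000_000_000)

# E. coli on MiSeq: 2x300bp, 25 million clusters, 96 pooled samples of 3 million bp
ecoli = average_coverage(300, 2, 25_000_000, 3_000_000, samples=96)

print(f"Human on HiSeq 2500 (Rapid): {human:.0f}X average coverage")
print(f"E. coli on MiSeq, 96 samples: {ecoli:.0f}X average coverage")
```

The result is an average only. As the Optimizing Sequence Capacity slide points out, individual samples or regions can fall well above or below it, so pooling plans usually leave headroom above the minimum coverage an application needs.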

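The informatic de-multiplex step on the Indexing slide can be sketched as follows. This is an illustration under stated assumptions, not the algorithm used by MiSeq Reporter or BaseSpace: the sample names are hypothetical, the barcodes are assumed to be the first eight bases of three of the sequences shown on the slide, each read is assumed to arrive as an (index read, insert read) pair, and the one-mismatch tolerance is an arbitrary choice.

```python
from collections import defaultdict

# Assumption: barcodes are the first 8 bases of sequences shown on the
# Indexing slide; the sample names are hypothetical.
INDEX_TO_SAMPLE = {
    "TAAGGCGA": "sample_01",
    "CGTACTAG": "sample_02",
    "AGGCAGAA": "sample_03",
}


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read, index_to_sample, max_mismatches=1):
    """Return the sample whose barcode matches within max_mismatches.

    Returns None when no barcode matches or when the match is ambiguous.
    """
    hits = [
        sample
        for barcode, sample in index_to_sample.items()
        if len(barcode) == len(index_read)
        and hamming(barcode, index_read) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


def demultiplex(reads, index_to_sample, max_mismatches=1):
    """Group insert reads by sample; unassigned reads go to 'undetermined'."""
    bins = defaultdict(list)
    for index_read, insert_read in reads:
        sample = assign_sample(index_read, index_to_sample, max_mismatches)
        bins[sample or "undetermined"].append(insert_read)
    return dict(bins)


# Toy pooled run: (index read, insert read) pairs with made-up insert sequences.
reads = [
    ("TAAGGCGA", "ACGTACGT"),  # exact match to sample_01
    ("CGTACTAG", "TTGACCAA"),  # exact match to sample_02
    ("AGGCAGAT", "GGGTTTAA"),  # one mismatch from the sample_03 barcode
    ("NNNNNNNN", "CCCCAAAA"),  # unreadable index
]
binned = demultiplex(reads, INDEX_TO_SAMPLE)
for sample, inserts in sorted(binned.items()):
    print(sample, inserts)
```

Tolerating a mismatch is only safe when every pair of barcodes in the pool differs at several positions, which is why ambiguous matches are sent to the undetermined bin rather than guessed.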