What Can the MiSeq & NGS Core Do for Your Research?

Grant Hill Library Prep Specialist

[email protected]

© 2013 Illumina, Inc. All rights reserved. Illumina, IlluminaDx, BaseSpace, BeadArray, BeadXpress, cBot, CSPro, DASL, DesignStudio, Eco, GAIIx, Genetic Energy, Genome Analyzer, GenomeStudio, GoldenGate, HiScan, HiSeq, Infinium, iSelect, MiSeq, Nextera, NuPCR, SeqMonitor, Solexa, TruSeq, TruSight, VeraCode, the pumpkin orange color, and the Genetic Energy streaming bases design are trademarks or registered trademarks of Illumina, Inc. All other brands and names contained herein are the property of their respective owners.

Agenda

Intro/basic principles of NGS (terms & technology)

Small genome sequencing

Targeted resequencing (custom panels)

16S & amplicon sequencing

Small RNA & targeted RNA expression

Experimental design & local resources

2 Enhanced Focus on the Sample to Answer Integration From library prep to downstream informatics & knowledge generation

Library Prep Sequence Answer

3 For Research Use Only. Not for use in diagnostic procedures. The Flow Cell Where the magic happens

Everything except sample preparation is completed on the flow cell

• Template annealing (1 - 384 samples)

• Template amplification

• Sequencing primer hybridization

• Sequencing-by-synthesis reaction

• Generation of fluorescent signal

MiSeq NextSeq HiSeq

4 Flow Cell Surface

8 channels Simplified workflow

Surface of flow cell coated with a lawn of oligo pairs

Clusters in a contained environment (no need for clean rooms)

Sequencing performed in the flow cell on the clusters

5 Sequencing by synthesis

[Figure: DNA (0.1-1.0 ug) → sample preparation → single molecule array → cluster growth → sequencing (cycles 1 2 3 4 5 6 7 8 9 …) → image acquisition → base calling]

6 Illumina Paired End Sequencing Overview of automated process on the flowcell

Cluster amplification

Linearize DNA (1st cut), then sequence 1st strand (Read 1, normal sequencing process)

Because we amplify on the flowcell surface, we can resynthesize the DNA in the cluster, and regenerate both strands again

Strand re-synthesis

Linearize (2nd cut), then sequence 2nd strand (Read 2)

7 de novo Sequencing

gap

contig contig

long read

mate pair

8 Re-Sequencing

Alignment 3X 2X

reference

Variant Calling G/A SNV

Reads (isolated single-read mismatches are random errors):
GCTATGCATTGGCATGGCATGCTAGCTACGGGATGCTGATCGATTTCGAAACTGACT
 CTATGCATTGGCATGGCATGCTAGCTACAGGATGCTGATCGATTTCGAAACTGACTG
  TATGCATTGGCATGGCATGCTAGCTACGGGATGCTGATCGATTTCGAAACTGACTGT
   ATGCATTGGCATGTCATGCTAGCTACAGGATGCTGATCGATTTCGAAACTGACTGTT
    TGCATTGGCATGGCATGCTAGCTACGGGATGCTGATCGATTTCGAAACTGACTGTTA
     GCATTGGCATGGCATGCTAGCTACGGGATGCTGATCGATTTCGAAACTGACTGTTAG
      CATTGGCATGGCATGCTAGCTACAGGATGCTGATCGATTTCGAAACTGACTGTTAGC
       ATTGGCATGGCATGCTAGCTACGGGATGCTGATCGATATCGAAACTGACTGTTAGCC
        TTGGCATGGCATGCTAGCTACAGGATGCTGATCGATTTCGAAACTGACTGTTAGCCA
         TGGCATGGCATGCTAGCTACAGGATGCTGATCGATTTCGAAACTGACTGTTAGCCAT
Reference:
GCTATGCATTGGCATGGCATGCTAGCTACAGGATGCTGATCGATTTCGAAACTGACTGTTAGCCAT
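The re-sequencing idea on this slide (stack aligned reads over a reference, then separate a recurring G/A difference from one-off errors) can be sketched in a few lines of Python. This is a toy illustration, not the variant caller used by MiSeq Reporter or BaseSpace: the function name `call_snvs`, the thresholds `min_depth` and `min_alt_fraction`, and the toy reads are all assumptions made for the example, and it handles only gap-free alignments.

```python
from collections import Counter

def call_snvs(reference, aligned_reads, min_depth=3, min_alt_fraction=0.3):
    """Return {position: (ref_base, alt_base, alt_count, depth)} for candidate SNVs.

    aligned_reads is a list of (offset, sequence) pairs, where offset is the
    0-based reference position of the read's first base (no gaps or indels).
    The thresholds are illustrative assumptions, not recommended settings.
    """
    # Build a pileup: one base counter per reference position.
    pileup = [Counter() for _ in reference]
    for offset, seq in aligned_reads:
        for i, base in enumerate(seq):
            pos = offset + i
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    calls = {}
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too little coverage to say anything
        ref_base = reference[pos]
        alts = [(n, b) for b, n in counts.items() if b != ref_base]
        if not alts:
            continue
        alt_count, alt_base = max(alts)  # most frequent non-reference base
        if alt_count / depth >= min_alt_fraction:
            calls[pos] = (ref_base, alt_base, alt_count, depth)
    return calls

# Toy data (hypothetical, much shorter than the reads on the slide).
reference = "ACGTACGTAC"
reads = [
    (0, "ACGTGCGT"),  # G at position 4
    (1, "CGTGCGTA"),  # G at position 4
    (2, "GTACGTAC"),  # matches the reference
    (2, "GTACGAAC"),  # lone mismatch at position 7 (random error)
]
snvs = call_snvs(reference, reads)
print(snvs)
```

With these toy reads, position 4 is reported as an A/G candidate because half of the covering reads carry G, while the single mismatch at position 7 falls below the assumed allele-fraction threshold and is ignored.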

9 Paired-End Sequencing Extends the Power of the Technology

Known sequence

Sequence reads (single-end)

New sequence 80-90%

10 Paired-End Sequencing Extends the Power of the Technology

Known sequence

Sequence reads (paired-end)

Unique placement of one end can resolve ambiguous placement of other New sequence 95 to >99%

11 Indexing pooling samples

DNA/RNA samples → create library & add index (unique sequence “barcode”) → indexed libraries → pool → informatic de-multiplex → FASTQ

Benefits: increase throughput, maximize capacity

Example reads: TAAGGCGAGTA….. CGTACTAGTGG….. AGGCAGAAAGT….. TCCTGAGCTGC….. GGACTCCTGTG….. TAGGCATGTAG….. CTCTCTACGCA….. CAGAGAGGAGT….. GCTACGCTGTG….. CGAGGCTGTAG….. AAGAGGCAGCA….. GTAGAGGAAGT…..
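The informatic de-multiplexing step can be illustrated with a small Python sketch: each read arrives with an index (barcode) read, and reads are binned by the sample whose index matches. This is only a sketch of the idea, not Illumina's de-multiplexing software; the function names, the one-mismatch tolerance, and the 8-base index sequences below are hypothetical.

```python
from collections import defaultdict

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def demultiplex(reads, sample_indexes, max_mismatches=1):
    """Bin insert reads by sample according to their index read.

    reads: iterable of (index_read, insert_read) pairs.
    sample_indexes: {sample_name: index_sequence} (hypothetical sample sheet).
    Reads matching no sample, or more than one, go to "undetermined".
    """
    bins = defaultdict(list)
    for index_read, insert_read in reads:
        matches = [
            sample
            for sample, idx in sample_indexes.items()
            if len(idx) == len(index_read)
            and hamming(idx, index_read) <= max_mismatches
        ]
        sample = matches[0] if len(matches) == 1 else "undetermined"
        bins[sample].append(insert_read)
    return dict(bins)

# Hypothetical indexes and reads, for illustration only.
sample_indexes = {"sample_A": "ACGTACGT", "sample_B": "TGCATGCA"}
reads = [
    ("ACGTACGT", "GATTACA"),
    ("TGCATGCA", "CCGGTTA"),
    ("ACGTACGA", "TTTGGGC"),  # one mismatch to sample_A's index
    ("GGGGGGGG", "AAAACCC"),  # matches neither index
]
binned = demultiplex(reads, sample_indexes)
print(binned)
```

Allowing a small number of index mismatches only works when the indexes in a pool differ from each other at enough positions, which is one reason index sets are designed rather than chosen at random.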

12 Coverage

average coverage = (read length) x (# of clusters) / genome size (target)

Per-position depth in the illustration: 4X, 2X, 3X

Examples

Human on HiSeq 2500 (Rapid)

(2x150bp) x (600 Million clusters) / 3 Billion bp (human) = 60X average coverage

E. coli on MiSeq – 96 samples

(2x300bp) x (25 Million clusters) / (96 samples x 3 Million bp (E. coli)) = 52X average coverage
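As a quick way to repeat this arithmetic for other run configurations, the coverage formula can be written as a small Python helper. The function and parameter names are illustrative assumptions; the numbers in the usage line are the E. coli example from the slide (2x300 bp, 25 million clusters, 96 samples, 3 million bp per genome as stated there).

```python
def average_coverage(read_length_bp, reads_per_cluster, clusters,
                     genome_size_bp, samples=1):
    """Average coverage = total sequenced bases / total bases to cover.

    reads_per_cluster is 2 for paired-end runs and 1 for single reads.
    """
    total_bases = read_length_bp * reads_per_cluster * clusters
    return total_bases / (genome_size_bp * samples)

# E. coli on MiSeq, 96 samples pooled in one run (values from the slide).
ecoli = average_coverage(300, 2, 25e6, 3e6, samples=96)
print(f"{ecoli:.0f}X average coverage per sample")
```

This is an average only: it assumes every cluster yields usable, evenly distributed reads, so real per-sample and per-region coverage will be lower and more variable.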

13 Coverage Optimizing Sequence Capacity

High Performers Consume Unnecessary Coverage

Low Performers Need More Coverage

Minimum Coverage

14 Applications Guidelines Whole Human Genome

ChIP seq

Exomes

Shotgun Metagenomics

RNA seq

Targeted Panels

Microbial & 16S Metagenomics

Focused Power: MiSeq Series, 1-25M

Flexible Power: NextSeq, 130-400M

Production Power: HiSeq Series, 300-2,500M

Population Power: HiSeq X Series, 3,000M

15 MiSeq Desktop Sequencer- Speed and Simplicity

Economical personal sequencer

Easy to use workflow

Rapid turnaround time

Proven sequencing chemistry

Multiple applications

16 MiSeq Offers Scalable Sequencing

MiSeq Core Consumables Version 3 (25 Million Reads)
• 600 cycles
• 150 cycles

MiSeq Core Consumables Version 2 (15 Million Reads)
• 500 cycles
• 300 cycles
• 150 cycles

MiSeq Core Consumables Version 2 Micro (4 Million Reads)
• 300 cycles (Micro)

MiSeq Core Consumables Version 2 Nano (1 Million Reads)
• 500 cycles (Nano)
• 300 cycles (Nano)

17 Multiple Kits Available to Maximize Utility

Chemistry Version   Number of Reads (mil)   Sequencing Cycles   Output (Gb)   Time (hrs)
3                   25                      600                 15            56
3                   25                      150                 3.8           21
2                   15                      500                 8.5           39
2                   15                      300                 5.1           24
2                   15                      50                  0.85          5.5
2                   4 (Micro)               300                 1.2           28
2                   1 (Nano)                500                 0.5           19
2                   1 (Nano)                300                 0.3           19

18 MiSeq Applications Portfolio Integrated. Optimized. Simplified.

Amplicon Custom Targeted Custom Small RNA Clone Sequencing Amplicon Resequencing Enrichment sequencing checking

ChIP-Seq Library QC Plasmid Regulation RNA-Seq Resequencing

Small RNA De novo 16S genome sequencing sequencing Metagenomics

19 MiSeq Reporter (MSR) Apps

De Novo Assembly, Enrichment, Generate FASTQ, LibraryQC, PCR Amplicon, Metagenomics, Small RNA, Resequencing, TruSeq Amplicon

Streamlined on-board analysis workflows

Most workflows also available on BaseSpace

No user intervention from sample loading to report generation

Accessible from any computer on the same local network as instrument

All workflows generate FASTQ files that can be analyzed by most 3rd party apps

20 BaseSpace Core Apps BaseSpace Labs Apps

16S TopHat Cufflinks RNA VariantStudio FastQC Kraken NextBio VCAT Metagenomics Alignment Assembly Express Metagenomics Annotates & DE

BWA Isaac Broad IGV TruSeq Amplicon DS NextBio SRST2 FASTQ Velvet Enrichment Enrichment Amplicon Transporter Toolkit Assembly

BWA Isaac Tumor Long Read Long Read PicardSpace SRA Prokka WGS WGS Normal Assembly Phasing Import Annotation

Third-Party Apps

SPAdes Novoalign Advaita DNASTAR SCIEX SCIEX SCIEX EDGC SWATHAtlas Pipeline Annotator

MetaPhlAn N-of-One MyFLQ LoFreq eGB Genomatix Genome OncoMD GeneTalk Profiler

PathGEN Melanoma Tute DeepChek® PEDANT CosmosID Dx PathSEQ Profiler HIV, HBV, HCV

21 Nextera XT workflow and applications

22 Library Prep Approaches

Shotgun – random fragmentation

– mechanical shearing

– enzymatic shearing (Nextera)

Considerations

– fragment length

– uniformity of coverage

– workflow

23 A Typical MiSeq DNA Workflow

Nextera XT MiSeq MSR or BaseSpace BaseSpace

Prep Sequence Analyze Share 15 minutes hands-on 20 minutes hands-on Fully automated Secure and store

Rapid library preparation – Complete library prep in as little as 90 minutes with only 15 minutes of hands-on time

Optimized for small genomes, PCR amplicons and plasmids – One library prep kit for many applications

Ultra low input – Only a single nanogram of input DNA needed

Innovative sample normalization – Eliminates the need for library quantification before sample pooling and sequencing

24 Nextera XT, how does it work?

25 Nextera XT, how does it work?

FLEXIBLE:

Genomic DNA

Environmental sample (metagenomics)

Long range PCR amplicon

Plasmid

ds-cDNA

26 Nextera XT, how does it work?

FAST:

5 minutes

27 Nextera XT, how does it work?

COST EFFECTIVE:

Under $35

28 Nextera XT workflow applications

Assemble and compare genomes of bacterial isolates – Develop an infection control program

Perform whole genome metagenomics – Monitor evolution of bacterial populations under specific conditions

Single cell RNA-seq analysis – Sequence amplified ds-cDNA from single cells from different tissue compartments

Resequence long range PCR products to detect variation between samples – Manage highly similar genes by leveraging the LR PCR specificity

29 Using NGS to Assess Food Pathogen Outbreak Samples First example: FDA/CDC Listeria whole genome sequencing project

Compared with pulsed-field gel electrophoresis (PFGE), WGS provides clearer distinction between cases and foods that are likely part of a given outbreak and those that are not.

Whole-genome sequences of the Listeria strains isolated from Roos Foods cheese products were available after the recall and were found to be highly related to sequences of the Listeria strains isolated from the patients.

First time EVER that microbial whole genome sequencing was used by the US Federal Government in real time to link an outbreak to a company and affect legal action.

30 PFGE Protocol

31 CDC Protocol, PulseNet Application: Bacterial WGS from culture Displacing PFGE

Sample Prep
• Grow culture
• Lyse cells from cultured isolate
• Genomic DNA extraction

MiSeq & Nextera Library Prep
• Nextera XT
• 16 samples per run
• 2x 250
• ~100x coverage

Primary Analysis
• MiSeq workflow: Generate FASTQ
• Data sent to CDC or BaseSpace for storage and sharing with CDC and upload to NCBI SRA database.
• Analysis thru BioNumerics (Applied Maths) as WGS-MLST

32 Targeted Resequencing

33 Coverage vs. Depth: it’s a trade-off

75X WGS TruSeq PCR Free

Exome 200X Nextera Rapid Capture

Amplicon Panel TruSight Myeloid 5,000X

34 Library Prep Approaches

Targeted – re-sequencing

– Amplicon - PCR

– Enrichment – oligo hyb probes

region of interest

35 Variants Can Cause Mismatches!

Amplicon

Enrichment

36 TSCA or Enrichment? Factors to consider

Feature                TSCA/Low-Input   Enrichment (NRC)
Content                up to 500kb      500kb-15Mb
Insert size            150-450bp        250-300bp*
Targets/Probes         1536+            2,000-60,000+
Input                  50ng/10ng        50-100ng
Types of variants      Indel/SNV        Indel/SNV/CNVs*
FFPE compatible        Yes              Yes/No
Discover breakpoints   No               Yes
Tolerate Mismatches    Yes/no           Yes, 10% - low level

37 Advantages and Disadvantages

Amplicon advantages
• Simple, fast workflow
• Low sample input (10-50ng)
• “Good” with damaged DNA (FFPE)

Amplicon disadvantages
• Complex primer design
• Multiplex limit
• Novel fusions difficult
• Duplicates
• Off target PCR artifacts

Enrichment advantages
• Simple probe design
• No upper limit to total regions size
• Novel fusions possible
• Remove duplicates

Enrichment disadvantages
• Longer run time
• Higher sample input (≥50ng)
• Nextera LP not “good” with FFPE

38 Targeted Sequencing gene panels & custom

TruSeq Custom Fixed panels Custom Amplicon Nextera Rapid Capture

up to 500kb 500kb to 15 Mb

39 TruSeq Custom Amplicon Assay Overview Begin with 250ng genomic DNA

>10ng Genomic DNA

Hybridization

40 TruSeq Custom Amplicon Assay Overview Using a pair of custom PCR primers, each targeted region is amplified in each sample

Genomic DNA

Hybridization

Extension- Ligation

41 TruSeq Custom Amplicon Assay Overview Incorporation of indexed primers followed by sample pooling

Genomic DNA

Hybridization

Extension- Ligation

Amplification

42 TruSeq Custom Amplicon Assay Time Go from DNA to called variants in ~2 days

Day 1, start at 8:00AM: Receive custom oligos; hybridization (oligos, universal reagents)

Day 1, complete at 2:00PM: Assay biochemistry setup (extension & ligation, PCR with index, normalization & pooling)

Day 1-2, ~27 hrs: Cluster gen and sequencing on MiSeq (automated sequencing)

Day 2, finished at 5:00PM: Real-time analysis (simple, efficient, automatic data analysis and variant calling)

43 TruSeq Custom Amplicon Data Analysis Illumina-developed and supported analysis tools

Illumina Experiment Manager – Wizard-driven tool to set up and manage sample preparation plates – Generation of sample sheet for the sequencing of pooled samples

MiSeq Reporter – On-instrument software – Demultiplex indices/samples – Perform read alignment – Report on detected variants

Illumina Amplicon Viewer – Offline viewer for visualization of data from multiple runs – Custom report generation

44 MiSeq Reporter Performs on-instrument secondary analysis

Installed on MiSeq

For the Amplicon workflow, MiSeq Reporter will: – Create an assembly for each amplicon per sample – Identify variants – Create graphs and reports, including coverage per amplicon, quality and variant scores

45 Illumina Amplicon Viewer View of ~8kb contiguous region with 53 tiled amplicons

Coverage

Q Scores

Variant Scores

46 MiSeq Amplicon Workflow

DesignStudio TruSeq Custom Amplicon MiSeq MSR or BaseSpace

Order Prep Sequence Analyze Intuitive web- Simple plate- 20 minutes hands-on Fully automated based design tool based format

Simple Assay Customization – Include up to 1,536 amplicons across a range of amplicon sizes and reference genomes

Optimized reagents – Achieve high performance from as little as 10 ng of DNA

Rapid sample processing – Prepare up to 96 reactions simultaneously using simple plate-based format

Easy, integrated analysis – Access automated, design-specific variant calling, and data analysis with pre-configured MiSeq software

47 Compared 63 samples run on MiSeq vs PGM for 54 genes

Used TruSeq Amplicon Cancer panel with 7 additional amplicons

Ran 10 samples per MiSeq V2 2 x 150bp run for 1400x mean coverage

110 variants identified on MiSeq and 102 on PGM

48 How Many Samples Can I Run with my Panel?

49 DesignStudio™ Tool Easy custom probe design tool

Let the DesignStudio Tool guide you with the new assay selector tool ✔ ︎ Help me choose

Move rapidly from design to order in 5 easy steps

1 2

50 DesignStudio Tool 1. Select assay

51 DesignStudio Tool 2. Configure design

52 DesignStudio Tool 3. Add targets

53 DesignStudio Tool 4. Review design

54 Illumina Concierge Service Product content service

Tier 1 Tier 2 Design optimization Product optimization Dedicated design expert 2 - 4 weeks In-silico design optimization Extended product capabilities 6 - 8 weeks Dedicated project manager Functional testing with controls Iterative product enhancements Shipment coordination Customer engagement Design creation Probe pool creation

Requirements gathering Tier 1 Expert in-silico optimization Tier 2 Iterative enhancement Delivery § Content § Design optimization § Manufacture § Performance § Design review and acceptance § Functional QC § Timing § Order § Enhancement

55 Extended Capabilities Through the Illumina Concierge Service

TruSeq® Custom Amplicon v1.5 Nextera® Rapid Capture Custom

16–10,000 amplicons Any alternate genomes (2‒67K probe maximum) Unique molecular identifiers

Dual strand for FFPE

Add-on to existing panels

Smaller amplicons

Targeting of single nucleotide polymorphism (SNPs)

Any alternate genomes

56 16S metagenomics

57 Metagenomic Studies

16S rRNA gene sequencing
• relative abundance of the classification
• High resolution phylogeny
• make functional predictions

Shotgun Sequencing
• gene expression analysis and functional annotation within microbial communities

Metagenomic Project

Meta-transcriptomics
• establishment of a signature profile for a specific environment.
• screen microbiomes for enzymes of interest such as proteases or polymerases

Long-reads Scaffolding
• Confidently match sequences to databases

58 16S Metagenomics

16S ribosomal RNA: ~ 1.5kb long

Expressed from 1 or more rDNA genes

Highly conserved regions to amplify targets via degenerate primers

Species-specific variable areas to phylogenetically classify bacteria

Classification specificity and sensitivity:
- Amplification primers
- Reads per sample
- Database (related strains or species cannot be distinguished)
- Length of amplicon (# of variable regions covered by reads)

Chimeric amplicons and sequence errors falsely increase diversity estimates

59 Targeting V3-V4 of 16S gene Illumina demonstrated 16S Sequencing protocol

Degenerate primer pairs for ~460 bp amplicon covering V3 and V4 regions of the 16S gene

Amplicon

Stitch 2x250 and 2x300 PE reads for full continuous sequence (add to databases)

Ashelford et al. (2005) Appl. Environ. Microbiol. & * Klindworth et al. (2013) Nucleic Acids Res.
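To see why 2x250 and 2x300 paired-end reads can be stitched across the ~460 bp V3-V4 amplicon, it helps to compute the expected overlap and to look at a bare-bones merge. The Python below is a simplified sketch under stated assumptions: the function names are hypothetical, the overlap must match exactly (real read-merging tools tolerate mismatches and use quality scores), and the 30-base test fragment is made up.

```python
def paired_end_overlap(amplicon_bp, read_length_bp):
    """Bases covered by both reads when each read starts at one amplicon end."""
    return 2 * read_length_bp - amplicon_bp

def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]

def stitch(read1, read2, min_overlap=10):
    """Merge a read pair by the longest exact suffix/prefix overlap.

    read2 is given as sequenced (reverse strand), so it is reverse
    complemented first. Returns None if no overlap >= min_overlap is found.
    """
    r2 = reverse_complement(read2)
    for overlap in range(min(len(read1), len(r2)), min_overlap - 1, -1):
        if read1[-overlap:] == r2[:overlap]:
            return read1 + r2[overlap:]
    return None

# Expected overlap for the ~460 bp amplicon at the two read lengths.
for read_len in (250, 300):
    print(read_len, paired_end_overlap(460, read_len))

# Toy example: a hypothetical 30-base fragment read 20 bases from each end.
fragment = "ACGTTGCAAGGCTTAACCGGATCGATTACG"
read1 = fragment[:20]
read2 = reverse_complement(fragment[-20:])
merged = stitch(read1, read2)
print(merged == fragment)
```

The overlap is 40 bases at 2x250 and 140 bases at 2x300 for a 460 bp amplicon, so the longer reads leave more room for quality trimming at the read ends before merging.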

60 16S amplicons Applicable to other amplicons <550bp

Overhang adapter sequence used in Step 2 F Locus-specific sequence rDNA or cDNA of rRNA 460bp target sequence

Locus-specific sequence R Overhang adapter sequence used in Step 2

cleanup

Index adapter oligos from ILMN P5 adapter and Index 2

cleanup Index adapter oligos from ILMN P7 adapter and Index 1

250-300bp Read1 Index1 Final Library Index2 250-300bp Read2

61 Metagenomic Analysis: 16S rRNA Sequencing

BaseSpace Apps: 16S Metagenomics 1.01 – Demonstrated Protocol – Taxonomic classification of 16S rRNA targeted amplicon reads using an Illumina-curated version of the GreenGenes database.

Kraken Metagenomics – ccb.jhu.edu/software/kraken – Taxonomic classification of short reads from bacteria, archaea & viruses

Selection of publicly available alternatives: QIIME – http://qiime.org/ – Open source software package for comparison and analysis of microbial communities, primarily based on amplicon data.

Mothur – http://www.mothur.org/ – Open Source software suite to analyze and visualize 16S rRNA gene sequences.

metagenomeSeq – https://github.com/HCBravoLab/metagenomeSeq – Differential abundance analysis for microbial marker-gene surveys

STAMP – http://kiwi.cs.dal.ca/Software/STAMP – Identifying biologically relevant differences between metagenomic communities.

62 Jack Gilbert’s Paper from 2012

For anyone who wants to start to do metagenomic studies this paper is the best guide out there.

It gives guidance on: – Sampling – DNA Extraction – Sequencing – Statistical analysis – Assemblies and annotation

His Conclusion: “ Metagenomics will be employed as commonly and frequently as any other laboratory method.”

Link: http://www.microbialinformaticsj.com/content/pdf/2042-5783-2-3.pdf

63 Gut microbiome response to short- term macronutrient change

Plant vs animal based diet

16S on HiSeq then QIIME

RNA-seq

Regulatory & taxonomical shifts

64 60 families including children & dogs

Fecal, oral, & skin

Household members shared more microbiota vs different households

Dog ownership increased shared skin microbiota

65 Sampled their oral and gut microbiome daily for one year Wrote an iOS app to chronicle daily activities

66 doi:10.1128/mBio.01012-14

► Compared OTUs of oral microbiota of healthy and periodontal disease samples.

► OTUs present varied across all samples.

► Transcriptome analysis revealed that enzyme expression was well conserved among disease samples.

67 microRNA workflow

68 microRNA biology

Daniel Ramsköld - Wikipedia

69 miR-seq, how it works

70 Targeted RNA Expression workflow

71 Introducing TruSeq Targeted RNA Expression Rapid and economical RNA profiling and validation for MiSeq

Quantitative analysis of RNA targets in high throughput – Assay 12 to 1,000 targets across 10s to 100s of samples – Enables higher density validation experiments

Lower price per sample than RT-PCR and Taqman Arrays – Significantly reduces initial start up costs

Rapid workflow and fast turn around time – Sample to answer in <2 days

Low sample input requirement – 50 ng or less total RNA

Simplest analysis of any NGS

72 Rapid, High Throughput Workflow

Modified from existing TSCA Sample Prep Chemistry

Sample to answer in 1.5 days with < 4hrs hands-on time

Sample prep for 48-384 samples per run

Single MiSeq run equivalent to 15,000 qPCR reactions or 40 384-well plates

Requires only 50 bp Single Reads

73 Region of interest is small, but the assay can be used to target splice junctions, fusion transcripts, cSNPs, RNA editing sites, in/dels….

74 Case Study: SLC25A3

A

B

C

A 6795825 6795824 A 6795812 B+C A+B+C 6795822 6795816 A+B+C

Two assays that detect transcript A

One assay that detects transcript B+C

Two assays that detect all isoforms

75 TREx

TruSeq Targeted RNA Expression assays represent an entirely new way to profile Gene Expression

This is a new sequencing application for MiSeq

The assay gives data that is very accurate and reproducible across a large dynamic range

The system allows researchers to study up to 15,000 gene expression results in a short 50 bp run on MiSeq

76 How many samples can I run?

Depends on:

Coverage Output

Desired coverage #Samples Sample Index Output (number of reads & read length)

77 Estimating Coverage and Cost

[MiSeq 15M reads X (PE 2x150)] X 85% on target X 90% PF – Divide by the target length (50,000 bases or 200 amplicons) – Divide by the number of samples (48)

Est 1400x mean coverage
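The estimate above can be reproduced, and turned around to answer "how many samples can I run?", with a short Python sketch. The function names are illustrative assumptions; the inputs are the slide's own figures (15M reads, paired-end 2x150, 85% on target, 90% passing filter, 50,000 target bases, 48 samples).

```python
import math

def usable_bases(reads, read_length_bp, reads_per_cluster,
                 on_target_fraction, passing_filter_fraction):
    """Sequenced bases expected to pass filter and land on target."""
    return (reads * read_length_bp * reads_per_cluster
            * on_target_fraction * passing_filter_fraction)

def mean_coverage(usable, target_bp, samples):
    """Mean coverage per sample across the target."""
    return usable / target_bp / samples

def max_samples(usable, target_bp, desired_coverage):
    """Largest whole number of samples that still reaches the desired mean coverage."""
    return math.floor(usable / target_bp / desired_coverage)

usable = usable_bases(15e6, 150, 2, 0.85, 0.90)
estimate = mean_coverage(usable, 50_000, 48)
print(f"Estimated mean coverage: {estimate:.0f}x")
print("Samples at 1000x mean coverage:", max_samples(usable, 50_000, 1000))
```

Mean coverage hides amplicon-to-amplicon variation, so in practice the desired coverage is usually set with some margin above the minimum needed at the weakest-performing targets.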

78 How do I find what I need for my application? Sample Prep Selector

79 THANK YOU

MiSeq MiSeqDx NextSeq HiSeq HiSeq X Ten

In vitro diagnostic

80 APPENDIX

81 MSR Workflows: De Novo Assembly

Common applications: – De novo assembly of small genomes

Process: 1. Uses Velvet assembler to reconstruct small genomes through use of contigs, without the need for a reference 2. Can compare with known reference if available to generate dot-plot

Outputs: – FASTA file containing contigs – Dot plot png

82 MSR Workflows: Generate FASTQ

Common applications: – Most flexible intermediate output for any downstream analysis outside of MSR – Analogous to the “BCL to FASTQ Converter” utility

Process: 1. Reads are assembled from base call files and written to FASTQ 2. FASTQ is ready for additional processing, such as alignment

Outputs: – FASTQ
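Since FASTQ is the hand-off point to downstream tools, a minimal reader helps show what the file contains: four lines per read (header, bases, a "+" separator, and one quality character per base). The Python sketch below assumes the common Phred+33 quality encoding and uncompressed, unwrapped records; the function name and the two-read example are hypothetical.

```python
import io

def parse_fastq(handle):
    """Yield (read_id, sequence, phred_scores) from a 4-line-per-record FASTQ.

    Assumes Phred+33 quality encoding and no line wrapping.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, [ord(c) - 33 for c in quality]

# Hypothetical two-read FASTQ, held in memory for the example.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!5I\n")
records = list(parse_fastq(example))
for read_id, seq, quals in records:
    print(read_id, seq, quals)
```

For real data, an established parser (for example the one in Biopython) is a safer choice, since it also handles compressed files and malformed records.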

83 MSR Workflows: Enrichment

Common applications: – Large, targeted panels using pulldown enrichment/capture – Nextera Rapid Capture Custom, Nextera Rapid Capture Exome (8 rxn x 1 plex)

Process: 1. Reads are aligned against whole genome reference (BWA) 2. Variants are called using the standard variant caller (GATK), or the somatic variant caller if specified in the sample sheet (particularly useful on cancer samples)

Outputs: – FASTQ – BAM – VCF – gVCF

84 MSR Workflows: Metagenomics

Common application: – Bacteria population analysis based on 16S rRNA amplicons – Generate taxonomic classification data down to species level – Integrates seamlessly with Illumina’s 16S Demonstrated Protocol (V3-V4 amplicons)

Process: 1. Reads are classified by sorting against 16S database, GreenGenes (V1-V9 regions) 2. Per sample statistics written to report files and plots

Outputs: – FASTQ – BAM – GUI Plots – HTML report

85 MSR Workflows: Amplicon

Common applications: – Analysis of PCR amplicons fragmented with Nextera tagmentation

Process: 1. Reads are aligned (BWA) against a custom built manifest file from IEM 2. Variant analysis in regions of interest (GATK)

Outputs: – FASTQ, BAM, VCF, gVCF

86 MSR Workflows: Resequencing

Common applications: – Small genome analysis (~20 Mb or smaller)

Process: 1. Reads are aligned against reference genomes (BWA) 2. Variant analysis in regions of interest (GATK)

Outputs: – FASTQ – BAM – VCF – gVCF

87 MSR Workflows: Small RNA

Common applications: – Small RNA abundance measurements typically important in transcription regulation – Often important for cancer research

Process: 1. Reads are aligned against databases for mature miRNA (miRBase), small RNA, and a genomic reference using Bowtie (flexible reference storage) 2. Small RNA hits and relative species abundance is reported

Outputs: – FASTQ – BAM – TXT reports – Charts
