Sequencing Library Preparation

Sarah Boswell http://scholar.harvard.edu/saboswell Genomics, Epigenomics & Transcriptomics

Hasin et al. Multi-omics approaches to disease, Genome Biology 18:83 (2017). Genomics, Epigenomics & Transcriptomics

DNA DNA & Histone RNA Modifications

C. Petit. Evodevomics, BioSciences Master Reviews (2013). Basic Library Preparation

Small piece of DNA or cDNA INSERT

INSERT + Sequencing Adapter Library Seq Methods Sequencing Library Preparation

 Starting Material: DNA & RNA

 RNA-Seq library prep

prep

 Mate-pair sequencing (circularization)

 Low input / single cell library prep High Quality Starting Material

 DNA Extraction

 Treat with RNase

 RNA Extraction

 Treat with DNase

Practice your extraction before the real experiment Genomic DNA Extraction

 Any of the available kits or protocols

or yeast often need additional bead-beating step.

 RNase treat all DNA samples.

 Quantitation

 Absorbance: Nano-drop (50-500 ng/ul)

 Theoretically should can read to 3000 ng/ul. Empirically find it is only accurate within range above.

 Dye based

 SYBR Green, Qubit, Quant-IT

https://genomics.ed.ac.uk/resources/sample-requirements Genomic DNA QC

Agarose Gel

TapeStation Genomic DNA Assay https://genomics.ed.ac.uk/resources/sample-requirements http://www.agilent.com/cs/library/applications/5991-5258EN.pdf Chromatin Immunoprecipitation (ChIP)

 Crosslinking

 Cell Lysis

 Chromatin Preparation / Sheering

 Immunoprecipitation

 Crosslinking Reversal & DNA cleanup

https://www.thermofisher.com/us/en/home/life-science/antibodies/antibodies-learning-center/antibodies-resource-library/antibody-application-notes/step-by-step-guide-successful-chip-assays.html High Quality Starting Material

 DNA Extraction  Treat with RNase

 RNA Extraction

 Treat with DNase RNA: What is Transcriptomics?

 Wikipedia “The transcriptome is the set of all messenger RNA molecules in one cell or a population of cells.”

 National Cancer Institute Definition of Terms

“The study of all RNA molecules in a cell. Transcriptomics is used to learn more about how genes are turned on in different types of cells and how this may help cause certain diseases, such as cancer.” RNA enrichment

 PolyA tailed messenger RNA: mRNA-Seq

 Total RNA (rRNA removed): “total” RNA-Seq

Front Genet. 2015 Jan 26;6:2 mRNA (polyA) Purification

 mRNA enrichment

 mRNA binds beads coated with oligo dT primer

 Non-polyadenylated transcripts are washed away

AAAA TTTTTT AAAA

AAAA TTTTTT Transcripts Lost in polyA Purification

 Ribosomal/Transfer RNA  Histone mRNA  Long-noncoding RNA  Nascent intron containing transcripts  Micro RNA  Degraded RNA  Many viral transcripts  Prokaryote/Bacterial transcripts  polyA is the degradation signal rRNA Depletion Bead Rnase H mRNA/long noncoding RNA/nascent RNA  Illumina: TruSeq rRNA rRNA  Probes hybridize rRNA on magnetic beads

 RNA of interest remains in supernatant

 KAPA: RiboErase

 Probes hybridize rRNA in solution

 Hybrids are digested with RNase H

 Probes digested with DNAse I Purified RNA

Modified from: Scientific Reports 6, article 37876 (2016) RNA Extraction

 Cultured cells – Easy!

 Use your favorite column kit – Qiagen, Invitrogen, Zymo

 For high throughput suggest bead based – MagMax

 96 well format column plates are also available

 Tissue samples  Dissect in cold room if possible

 Use RNAlater solution to store tissue sample

 Upon extraction proceed to homogenization in cold room RNA Extraction: Column Purification

 On-column DNase digestion

 Dry to remove excess ethanol

 Elute with warm water to increase yields On-Column DNase

http://www.sabiosciences.com/pathwaymagazine/pcrhandbook/10.php RNA Extraction: Column Purification

 Standard columns/protocols have 100-200bp cut off.

 Will result in loss of small RNA species

 Specialized columns and kits are available

 microRNA

 FFPE

 Blood

HACK - Cut off size can be adjusted by changing the percent ethanol used for sample binding RNA Extraction: Bead Purification

Binding Wash DNase Re-bind Wash Elute

 Magnetic based purification good for high-throughput applications

 Can use oligo dT beads or total RNA beads

 DNase step:

 DNase I digestion step is sometimes skipped (polyA libraries only)

 Best practice is to keep this step.

http://www.beckman.com/nucleic-acid-sample-prep/rna-isolation/isolation-from-tissue RNA Extraction: Tissues / Trizol

 Keep tissues as cold as possible

 Work in cold room

 After homogenization suggest column based cleanup

 Particularly important if used Trizol for lysis

 DNase 10ug then column cleanup

https://www.thermofisher.com/order/catalog/product/AM7020 RNA Quantitation & Quality

 Quantitation

 Absorbance: Nano-drop (50-500 ng/ul)

 Theoretically should can read to 3000 ng/ul. Empirically find it is only accurate within range above.

 Dye based

 RiboGreen

 Qubit / Quant-IT

 Quality

 Visualize on gel

 Agilent Bioanalyzer (RIN) RNA quality

 High quality RNA needed for mRNA libraries

 Degraded samples should only be used to make a “total” RNA-seq library – rRNA removal

 FFPE & Archival Samples mRNA Purification of Degraded Samples

AAAA AAAA TTTTTT AAAA

AAAA TTTTTT

Transcript 1 Transcript 2

 PolyA tail no longer attached to transcript.

 Results in differential loss of transcripts between samples. Sequencing Library Preparation

 Starting Material: DNA & RNA

 RNA-Seq library prep

 Genomic library prep

 Mate-pair sequencing (circularization)

 Low input / single cell library prep RNASeq Stranded Library Prep (dUTP method)

or mRNA purification

or strand specific amplification

Index

http://www.rna-seqblog.com/wp-content/uploads/2012/12/library-preparation.jpg Library Strandedness

ACCATGAACCGTA

TGGTACTTGGCAT ACCAUGAACCGUA

 Read alignment depends on direction of transcription

 “sense” strand of transcript can be on either the sense or antisense strand of the DNA http://seqanswers.com/forums/showthread.php?t=44220 Library Strandedness

https://galaxyproject.org/tutorials/rb_rnaseq/ Key steps in library preparation

 Starting Material

 Library amplification bias

 Multiplexing

 qPCR quantitation

 Sequencing read order & terminology Library Amplification Bias

 Final step of library prep is amplification transcript count 2, 13, 4

 Introduces library bias

 Some products preferentially amplified

 Fewer cycles = less bias

transcript count 12, 20, 8

Modified from: Nature Methods 9, 72-74 (2012) Limited Cycle Library Amplification

 Number of cycles needed is proportional to amount of input RNA.

 Library prep kits will recommend a certain number of cycles.

 This is usually optimized for the lower input.

 Test how many cycles will give you enough product.

 Fewer cycles = less bias Limited Cycle Library Amplification

 Perform micro qPCR reaction on small amount of pre- amplification library (Kapa)

 Amplify only the number cycles needed to get enough product for sequencing (20ul of 4nM product)

https://www.kapabiosystems.com/document/kapa-library-quantification-illumina-tds/?dl=1 Limited Cycle Library Amplification

9 cycles | 10 cycles 11 cycles | 12 cycles

PCR bubble Amplify 9 cycles Library QC

 Quantitation Peak around 150 = primer dimer  Dye based

 SYBR Green

 Qubit / Quant-IT

 Size & Quality

 Agilent Bioanalyzer

 Size determination

 Do not use for quantitation Size selection with SPRI beads

 Solid Phase Reverse Immobilization beads

 Carboxyl groups on surface bind DNA in the presence of crowding agents (PEG & NaCl)

http://core-genomics.blogspot.com/2012/04/how-do-spri-beads-work.html Key steps in library preparation

 Starting Material  Library amplification bias

 Multiplexing

 qPCR quantitation

 Sequencing read order & terminology RNASeq Stranded Library Prep (dUTP method)

or mRNA purification

or strand specific amplification

Index

http://www.rna-seqblog.com/wp-content/uploads/2012/12/library-preparation.jpg Multiplexing (barcodes and indices)

 Multiplexing allows optimal use of reads you will get  Charges for sequencing are usually per lane of the flow cell

 HiSeq generates ~150 million reads per lane

 NextSeq generates ~ 450 million reads (one lane instrument)

 For RNA-Seq number of reads you need will depend on your experiment

 10 million standard for transcriptome

 20 million standard for total RNA (rRNA depleted) Make sure multiplexing libraries of similar size 3’ 5’

DNA

(0.1-5.0 μg)

A

T

C G

C T

G

T

A

C

G

C

G

A

A

T

C

G

A

A

C

T

C

G

C

G

C

A T

Single molecule array C

G

A A 5’ Library Preparation Cluster Growth Sequencing 1 2 3 4 5 6 7 8 9 T G CT A C G A T …

Image Acquisition Base Calling

Consider Cluster Size in Multiplexing

www.support.illumina.com Multiplexing

 Pool samples based on dye based quantitation

 Submit pool to core facility for sequencing.

 Make all sequencing libraries in one batch

qPCR quantitate before sequencing Key steps in library preparation

 Starting Material  Library amplification bias  Multiplexing  qPCR quantitation

 Sequencing read order & terminology Sequencing Read Order

Rd1 Seq Primer Index 1 primer Barcode and/or UMI

INDEX Index 2 Index 2 primer(A) primer(B) Rd2 Seq Primer

1. Read 1 HiSeq/MiSeq (4 color)

2. Index Read 1 (i7) • A&C read on one camera

3. Index Read 2 (i5) • G&T read on other 4. Read 2 NextSeq (2 color) Sequencing Library Preparation

 Starting Material: DNA & RNA

 RNA-Seq library prep

 Genomic library prep

 Mate-pair sequencing (circularization)

 Low input / single cell library prep Genomic Library Prep

Once you have sheared your DNA this is a quick process

Shear Genomic DNA or begin with cDNA

 Protocol same as for RNA-Seq once End Repair (blunt ends) you have sheared dsDNA Add 3’ A Tail

 Acoustic shearing – Covaris Ligate Adapters

 Sonication

Enrich/Linearize with PCR  Hydrodynamic shearing – nebulization

Sequencing

http://tucf-genomics.tufts.edu/home/faq Tagmentation (DNA fragmentation facilitated by transposon activity)

http://www.molecularecologist.com/2015/01/new-to-the-genome-sequencing-8-menu-nextera-library-preps/ Tagmentation Approach

 Nextera from Illumina

 Very fast and efficient for DNA library preps

 Works with small amounts of DNA

 Important to RNase treat your sample

 Needs precise DNA quantitation (Qubit)

 Often used as last stage of low input RNASeq library protocols (WGS) vs Whole Exome Sequencing (WES)

https://www.my46.org/intro/whole-genome-and-exome-sequencing Genomic Assembly with Mate-Pairs

https://www.illumina.com/content/dam/illumina-marketing/images/technology/mate-pair-sequencing-figure.gif Mate-Pair Sequencing

http://www.illumina.com/ Mate-Pair Sequencing

Read1

Read2

http://www.illumina.com/ Mate-Pair Sequencing

Read1

Read2

Read1

Read2 Going backwards

http://www.illumina.com/ Mate-Pair Sequencing

Read1

Read2

Read1

Read2 Going backwards Read2

Read1

http://www.illumina.com/ Mate-Pair Genomic Library

 Insert pieces of DNA can be long (2-5kb)

 Allows for better de novo genome assembly and finishing of genome assemblies.

 Helps in determining presence and location of genomic rearrangements and amplifications. Sequencing Library Preparation

 Starting Material: DNA & RNA  RNA-Seq library prep  Genomic library prep  Mate-pair sequencing (circularization)

 Low input / single cell library prep Low Input & Single Cell RNA-Seq

 Lower input = less chance to see mRNA of interest  Need to consider sampling error  High technical variation  Single cell methods will only capture 10-40% of expected mRNA  Will not reliably detect low-abundance transcripts  Differential expression observed is reliable for highly expressed genes Single Cell / Low Input Methods

InDrops Single Cell / Low Input Methods

InDrops Single Cell / Low Input Methods

InDrops Smart-Seq / Drop-Seq / SCRB-Seq

In Drop-Seq / SCRB-Seq libraries are enriched for 3’ UMI labeled ends

S. Picelli et al., Full-length RNA-seq from single cells using Smart-seq2., Nat Protoc 9, 171–81 (2014). InDrops Single Cell Sequencing

 Lysis and reverse transcription occurs in the beads

 Samples are frozen after RT as RNA:DNA hybrid in gel.

A. M. Klein et al., Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells., Cell 161, 1187–201 (2015). InDrops Library Prep / CEL-Seq2

 ssSynthesis to make full length dsDNA

 IVT back to RNA via T7 promoter

 Fragmentation RNA

 Random hexamer RT with adaptor

 PCR to add index and illumina adaptors R. Zilionis et al., Single-cell barcoding and sequencing using droplet microfluidics, Nature Protocols 12, 44-73 (2017). Single Cell / Low Input Methods

InDrops

 Smart-Seq gives full transcript information & detects most genes per cell.  Droplet based and Smart-Seq most accurate differential expression  Drop-Seq 5-12% capture efficiency  InDrops/10x 50-80% capture efficiency Low Input or Single Cell

 Only use if needed for experiment  All commonly used methods all rely on PolyA tail

 If you can get more starting material then you will get better results.  Gold standard is TruSeq with >500ng input RNA

 Plan a small scale starter experiment to see if protocol will give useful results Transcript Enrichment: Capture Sequencing

 Capture targeted sequence using biotinylated RNA bait

 Sequencing library applied to beads

 Retain only library covering genes of interest

 Saves money on sequencing

 IDT lockdown probes expensive but good for small number genes

http://rnaseq.uoregon.edu/ Capture Probes

 Tiling is the number of times a base is covered by a different probe.

 Difficult to design probes if looking at a single gene family or pseudogenes.

http://rnaseq.uoregon.edu/ Final Thoughts

 Practice your library prep on a control sample.

 Be sure you understand each step in library prep.

 Talk to someone who has done the protocol before starting. qPCR Precise quantitation is key to effective sequencing! Useful Websites

 support.illumina.com/

 seqanswers.com/

 core-genomics.blogspot.com/2012/04/how-do-spri- beads-work.html

 www.broadinstitute.org/files/shared/illuminavids/Sa mplePrepSlides.pdf