Sequencing Library Preparation
Sarah Boswell
http://scholar.harvard.edu/saboswell

Genomics, Epigenomics & Transcriptomics
Hasin et al. Multi-omics approaches to disease, Genome Biology 18:83 (2017).

Genomics, Epigenomics & Transcriptomics
Genomics: DNA
Epigenomics: DNA & histone modifications
Transcriptomics: RNA
C. Petit. Evodevomics, BioSciences Master Reviews (2013).

Basic Library Preparation
Insert: small piece of DNA or cDNA
Library: insert + sequencing adapters

Sequencing Library Preparation
Starting Material: DNA & RNA
RNA-Seq library prep
Genomic library prep
Mate-pair sequencing (circularization)
Low input / single cell library prep

High Quality Starting Material
DNA Extraction
Treat with RNase
RNA Extraction
Treat with DNase
Practice your extraction before the real experiment

Genomic DNA Extraction
Any of the available kits or protocols
Bacteria or yeast often need additional bead-beating step.
RNase treat all DNA samples.
Quantitation
Absorbance: Nano-drop (50-500 ng/ul)
Theoretically it can read up to 3000 ng/ul. Empirically it is only accurate within the range above.
Dye based
SYBR Green, Qubit, Quant-IT
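A minimal sketch of the absorbance arithmetic behind a Nano-drop reading, flagging values outside the 50-500 ng/ul window given above. The conversion factors (about 50 ng/ul per A260 unit for dsDNA and 40 for RNA, at a 1 cm path length) are standard values and are not from the slides; the function names are hypothetical.

```python
# Standard conversion factors: an A260 of 1.0 (1 cm path length) corresponds
# to roughly 50 ng/ul of dsDNA or 40 ng/ul of RNA.
NG_PER_UL_PER_A260 = {"dsDNA": 50.0, "RNA": 40.0}

# Range reported above as empirically reliable on the Nano-drop.
RELIABLE_RANGE_NG_PER_UL = (50.0, 500.0)


def concentration_from_a260(a260, nucleic_acid="dsDNA"):
    """Convert a 1 cm path-length A260 reading to ng/ul."""
    return a260 * NG_PER_UL_PER_A260[nucleic_acid]


def in_reliable_range(ng_per_ul):
    """True if the concentration falls inside the 50-500 ng/ul window."""
    low, high = RELIABLE_RANGE_NG_PER_UL
    return low <= ng_per_ul <= high


if __name__ == "__main__":
    for a260 in (0.4, 2.0, 30.0):
        conc = concentration_from_a260(a260, "dsDNA")
        note = "ok" if in_reliable_range(conc) else "confirm with a dye based assay"
        print(f"A260={a260}: {conc:.0f} ng/ul ({note})")
```

Readings outside the window are better confirmed with one of the dye based assays listed above.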
https://genomics.ed.ac.uk/resources/sample-requirements

Genomic DNA QC
Agarose Gel
TapeStation Genomic DNA Assay
https://genomics.ed.ac.uk/resources/sample-requirements
http://www.agilent.com/cs/library/applications/5991-5258EN.pdf

Chromatin Immunoprecipitation (ChIP)
Crosslinking
Cell Lysis
Chromatin Preparation / Shearing
Immunoprecipitation
Crosslinking Reversal & DNA cleanup
https://www.thermofisher.com/us/en/home/life-science/antibodies/antibodies-learning-center/antibodies-resource-library/antibody-application-notes/step-by-step-guide-successful-chip-assays.html

High Quality Starting Material
DNA Extraction
Treat with RNase
RNA Extraction
Treat with DNase

RNA: What is Transcriptomics?
Wikipedia: “The transcriptome is the set of all messenger RNA molecules in one cell or a population of cells.”
National Cancer Institute Definition of Terms: “The study of all RNA molecules in a cell. Transcriptomics is used to learn more about how genes are turned on in different types of cells and how this may help cause certain diseases, such as cancer.”

RNA enrichment
PolyA tailed messenger RNA: mRNA-Seq
Total RNA (rRNA removed): “total” RNA-Seq
Front Genet. 2015 Jan 26;6:2

mRNA (polyA) Purification
mRNA enrichment
mRNA binds beads coated with oligo dT primer
Non-polyadenylated transcripts are washed away
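A toy sketch of the selection rule described above, assuming capture depends only on whether an RNA still ends in a run of A's. The `min_tail` threshold, the function names, and the example sequences are hypothetical. It also shows why a degraded transcript contributes only its 3' fragment.

```python
def binds_oligo_dt(rna, min_tail=10):
    """Toy rule: captured if the 3' end is a run of at least min_tail A's."""
    return rna.endswith("A" * min_tail)


def polya_select(rnas, min_tail=10):
    """Split a {name: sequence} dict into captured and washed-away names."""
    captured = [name for name, seq in rnas.items() if binds_oligo_dt(seq, min_tail)]
    washed = [name for name in rnas if name not in captured]
    return captured, washed


if __name__ == "__main__":
    tail = "A" * 20
    sample = {
        "intact_mRNA": "GGCUACGUUAGC" + tail,
        "rRNA_fragment": "GGCUACGUUAGCCCA",        # no polyA tail
        "degraded_5prime_piece": "GGCUAC",          # tail lost by fragmentation
        "degraded_3prime_piece": "GUUAGC" + tail,   # still carries the tail
    }
    captured, washed = polya_select(sample)
    print("captured:", captured)
    print("washed away:", washed)
```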
[Figure: polyA tails (AAAA) of mRNA hybridized to oligo dT (TTTTTT) on beads]

Transcripts Lost in polyA Purification
Ribosomal/Transfer RNA
Histone mRNA
Long-noncoding RNA
Nascent intron containing transcripts
Micro RNA
Degraded RNA
Many viral transcripts
Prokaryote/Bacterial transcripts: polyA is the degradation signal

rRNA Depletion
[Figure labels: bead, RNase H, rRNA, mRNA/long noncoding RNA/nascent RNA]
Illumina: TruSeq
Probes hybridize rRNA on magnetic beads
RNA of interest remains in supernatant
KAPA: RiboErase
Probes hybridize rRNA in solution
Hybrids are digested with RNase H
Probes digested with DNase I
Purified RNA
Modified from: Scientific Reports 6, article 37876 (2016)

RNA Extraction
Cultured cells – Easy!
Use your favorite column kit – Qiagen, Invitrogen, Zymo
For high throughput suggest bead based – MagMax
96 well format column plates are also available
Tissue samples
Dissect in cold room if possible
Use RNAlater solution to store tissue sample
Upon extraction proceed to homogenization in cold room

RNA Extraction: Column Purification
On-column DNase digestion
Dry to remove excess ethanol
Elute with warm water to increase yields

On-Column DNase
http://www.sabiosciences.com/pathwaymagazine/pcrhandbook/10.php

RNA Extraction: Column Purification
Standard columns/protocols have 100-200bp cut off.
Will result in loss of small RNA species
Specialized columns and kits are available
microRNA
FFPE
Blood
HACK - Cut off size can be adjusted by changing the percent ethanol used for sample binding

RNA Extraction: Bead Purification
Steps: Binding, Wash, DNase, Re-bind, Wash, Elute
Magnetic based purification good for high-throughput applications
Can use oligo dT beads or total RNA beads
DNase step:
DNase I digestion step is sometimes skipped (polyA libraries only)
Best practice is to keep this step.
http://www.beckman.com/nucleic-acid-sample-prep/rna-isolation/isolation-from-tissue

RNA Extraction: Tissues / Trizol
Keep tissues as cold as possible
Work in cold room
After homogenization suggest column based cleanup
Particularly important if Trizol was used for lysis
DNase 10ug then column cleanup
https://www.thermofisher.com/order/catalog/product/AM7020

RNA Quantitation & Quality
Quantitation
Absorbance: Nano-drop (50-500 ng/ul)
Theoretically it can read up to 3000 ng/ul. Empirically it is only accurate within the range above.
Dye based
RiboGreen
Qubit / Quant-IT
Quality
Visualize on gel
Agilent Bioanalyzer (RIN)

RNA quality
High quality RNA needed for mRNA libraries
Degraded samples should only be used to make a “total” RNA-seq library – rRNA removal
FFPE & Archival Samples

mRNA Purification of Degraded Samples
[Figure: fragments of Transcript 1 and Transcript 2; only fragments that still carry the polyA tail (AAAA) bind oligo dT (TTTTTT)]
PolyA tail no longer attached to transcript.
Results in differential loss of transcripts between samples.

Sequencing Library Preparation
Starting Material: DNA & RNA
RNA-Seq library prep
Genomic library prep
Mate-pair sequencing (circularization)
Low input / single cell library prep

RNASeq Stranded Library Prep (dUTP method)
or mRNA purification
or strand specific amplification
Index
http://www.rna-seqblog.com/wp-content/uploads/2012/12/library-preparation.jpg

Library Strandedness
DNA: 5’ ACCATGAACCGTA 3’
     3’ TGGTACTTGGCAT 5’
RNA: 5’ ACCAUGAACCGUA 3’
Read alignment depends on direction of transcription
The “sense” strand of a transcript can come from either strand of the DNA
http://seqanswers.com/forums/showthread.php?t=44220

Library Strandedness
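A short sketch of the strand relationships using the example sequence from the slide. The helper names are hypothetical. The point is that a gene on the bottom strand produces a transcript that matches the reverse complement of the top strand, which is why a stranded library is needed to tell the two apart.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def transcribe(coding_strand):
    """RNA has the sequence of the coding (sense) DNA strand, with U for T."""
    return coding_strand.replace("T", "U")


if __name__ == "__main__":
    top = "ACCATGAACCGTA"              # top strand from the slide, 5' to 3'
    bottom = reverse_complement(top)   # bottom strand, 5' to 3'
    print("top strand (5'->3')   :", top)
    print("bottom strand (3'->5'):", bottom[::-1])
    print("transcript of a top-strand gene   :", transcribe(top))
    print("transcript of a bottom-strand gene:", transcribe(bottom))
```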
https://galaxyproject.org/tutorials/rb_rnaseq/

Key steps in library preparation
Starting Material
Library amplification bias
Multiplexing
qPCR quantitation
Sequencing read order & terminology

Library Amplification Bias
Final step of library prep is amplification
Introduces library bias
Some products preferentially amplified
Fewer cycles = less bias
[Figure: example transcript counts 2, 13, 4 and 12, 20, 8]
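A minimal sketch of why fewer cycles means less bias, assuming a simple deterministic model in which each template is multiplied by (1 + efficiency) every cycle. The per-transcript efficiencies are hypothetical, and treating the counts 2, 13, 4 from the slide as starting molecules is also an assumption.

```python
def amplify(counts, efficiencies, cycles):
    """Expected copies after PCR: each cycle multiplies a template by (1 + efficiency)."""
    return [n * (1.0 + e) ** cycles for n, e in zip(counts, efficiencies)]


def fractions(values):
    """Convert copy numbers to fractions of the library."""
    total = sum(values)
    return [v / total for v in values]


if __name__ == "__main__":
    start = [2, 13, 4]         # counts from the slide, treated as starting molecules (assumption)
    eff = [0.95, 0.80, 0.90]   # hypothetical per-cycle efficiencies
    print("input :", [round(f, 3) for f in fractions(start)])
    for cycles in (5, 10, 15):
        out = fractions(amplify(start, eff, cycles))
        print(f"{cycles:2d} cyc:", [round(f, 3) for f in out])
```

With equal efficiencies the fractions never change. Any difference in efficiency compounds with each additional cycle.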
Modified from: Nature Methods 9, 72-74 (2012)

Limited Cycle Library Amplification
Number of cycles needed depends on the amount of input RNA: less input needs more cycles.
Library prep kits will recommend a certain number of cycles.
This is usually optimized for the lower input.
Test how many cycles will give you enough product.
Fewer cycles = less bias

Limited Cycle Library Amplification
Perform micro qPCR reaction on small amount of pre-amplification library (Kapa)
Amplify only the number of cycles needed to get enough product for sequencing (20ul of 4nM product)
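A rough sketch of the arithmetic behind the 20 ul of 4 nM target, assuming ideal amplification and the usual approximation of 660 g/mol per base pair of dsDNA. The starting amount, fragment size, and function names are hypothetical. Real reactions lose material in cleanups and plateau, so the qPCR test above remains the empirical check.

```python
import math


def ng_per_ul_to_nM(ng_per_ul, mean_fragment_bp):
    """dsDNA molarity in nM, using ~660 g/mol per base pair."""
    return ng_per_ul * 1e6 / (660.0 * mean_fragment_bp)


def cycles_needed(start_fmol, target_fmol, efficiency=1.0):
    """Smallest whole number of cycles taking start_fmol to at least target_fmol."""
    if start_fmol >= target_fmol:
        return 0
    return math.ceil(math.log(target_fmol / start_fmol) / math.log(1.0 + efficiency))


if __name__ == "__main__":
    target_fmol = 20 * 4   # 20 ul of 4 nM: 1 nM is 1 fmol/ul, so 80 fmol
    start_fmol = 0.2       # hypothetical pre-amplification amount
    print("cycles at perfect doubling:", cycles_needed(start_fmol, target_fmol))
    print("cycles at 90% efficiency  :", cycles_needed(start_fmol, target_fmol, 0.9))
    print("1 ng/ul of 400 bp library in nM:", round(ng_per_ul_to_nM(1.0, 400), 2))
```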
https://www.kapabiosystems.com/document/kapa-library-quantification-illumina-tds/?dl=1

Limited Cycle Library Amplification
[Figure: libraries amplified for 9, 10, 11 and 12 cycles. Annotations: PCR bubble; amplify 9 cycles]

Library QC
Quantitation
Dye based
SYBR Green
Qubit / Quant-IT
Size & Quality
Agilent Bioanalyzer
Size determination
Do not use for quantitation
[Figure annotation: peak around 150 = primer dimer]

Size selection with SPRI beads
Solid Phase Reversible Immobilization beads
Carboxyl groups on surface bind DNA in the presence of crowding agents (PEG & NaCl)
http://core-genomics.blogspot.com/2012/04/how-do-spri-beads-work.html

Key steps in library preparation
Starting Material
Library amplification bias
Multiplexing
qPCR quantitation
Sequencing read order & terminology

RNASeq Stranded Library Prep (dUTP method)
or mRNA purification
or strand specific amplification
Index
http://www.rna-seqblog.com/wp-content/uploads/2012/12/library-preparation.jpg

Multiplexing (barcodes and indices)
Multiplexing allows optimal use of reads you will get
Charges for sequencing are usually per lane of the flow cell
HiSeq generates ~150 million reads per lane
NextSeq generates ~450 million reads (one lane instrument)
For RNA-Seq, the number of reads you need will depend on your experiment
10 million standard for transcriptome
20 million standard for total RNA (rRNA depleted)
Make sure multiplexed libraries are of similar size
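A small sketch of the multiplexing arithmetic, using the approximate read yields and per-sample depths quoted above. The function name is hypothetical, and the result is an upper bound that assumes reads are split evenly across libraries.

```python
READS_PER_LANE = {"HiSeq": 150_000_000, "NextSeq": 450_000_000}
READS_PER_SAMPLE = {"mRNA": 10_000_000, "total RNA (rRNA depleted)": 20_000_000}


def max_samples_per_lane(reads_per_lane, reads_per_sample):
    """Largest number of libraries that can share a lane at the target depth."""
    return reads_per_lane // reads_per_sample


if __name__ == "__main__":
    for instrument, lane_reads in READS_PER_LANE.items():
        for library, depth in READS_PER_SAMPLE.items():
            n = max_samples_per_lane(lane_reads, depth)
            print(f"{instrument}, {library}: up to {n} samples")
```

In practice pooling is never perfectly even, so leaving some headroom below these maxima is sensible.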
[Figure: Illumina sequencing workflow: library preparation from DNA (0.1-5.0 μg), single molecule array, cluster growth, sequencing, image acquisition, base calling]
Consider Cluster Size in Multiplexing
www.support.illumina.com

Multiplexing
Pool samples based on dye based quantitation
Submit pool to core facility for sequencing.
Make all sequencing libraries in one batch
qPCR quantitate before sequencing

Key steps in library preparation
Starting Material
Library amplification bias
Multiplexing
qPCR quantitation
Sequencing read order & terminology

Sequencing Read Order
[Figure: library molecule showing Rd1 Seq Primer, Rd2 Seq Primer, Index 1 primer, Index 2 primer (A), Index 2 primer (B), index, and barcode and/or UMI]
Read order:
1. Read 1
2. Index Read 1 (i7)
3. Index Read 2 (i5)
4. Read 2
HiSeq/MiSeq (4 color)
• A&C read on one camera
• G&T read on other
NextSeq (2 color)

Sequencing Library Preparation
Starting Material: DNA & RNA
RNA-Seq library prep
Genomic library prep
Mate-pair sequencing (circularization)
Low input / single cell library prep

Genomic Library Prep
Once you have sheared your DNA this is a quick process
Protocol same as for RNA-Seq once you have sheared dsDNA
Shearing methods:
Acoustic shearing – Covaris
Sonication
Hydrodynamic shearing – nebulization
Workflow:
1. Shear Genomic DNA or begin with cDNA
2. End Repair (blunt ends)
3. Add 3’ A Tail
4. Ligate Adapters
5. Enrich/Linearize with PCR
6. Sequencing
http://tucf-genomics.tufts.edu/home/faq

Tagmentation (DNA fragmentation facilitated by transposon activity)
http://www.molecularecologist.com/2015/01/new-to-the-genome-sequencing-8-menu-nextera-library-preps/

Tagmentation Approach
Nextera from Illumina
Very fast and efficient for DNA library preps
Works with small amounts of DNA
Important to RNase treat your sample
Needs precise DNA quantitation (Qubit)
Often used as last stage of low input RNASeq library protocols

Whole Genome Sequencing (WGS) vs Whole Exome Sequencing (WES)
https://www.my46.org/intro/whole-genome-and-exome-sequencing

Genomic Assembly with Mate-Pairs
https://www.illumina.com/content/dam/illumina-marketing/images/technology/mate-pair-sequencing-figure.gif

Mate-Pair Sequencing
[Figure, built up over several slides: Read1 and Read2 of a mate pair, with one read going backwards]
http://www.illumina.com/

Mate-Pair Genomic Library
Insert pieces of DNA can be long (2-5kb)
Allows for better de novo genome assembly and finishing of genome assemblies.
Helps in determining presence and location of genomic rearrangements and amplifications.

Sequencing Library Preparation
Starting Material: DNA & RNA
RNA-Seq library prep
Genomic library prep
Mate-pair sequencing (circularization)
Low input / single cell library prep

Low Input & Single Cell RNA-Seq
Lower input = less chance to see mRNA of interest
Need to consider sampling error
High technical variation
Single cell methods will only capture 10-40% of expected mRNA
Will not reliably detect low-abundance transcripts
Differential expression observed is reliable for highly expressed genes
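A back-of-the-envelope sketch of the sampling problem, assuming each mRNA molecule is captured independently with the same probability (a simplification, not a claim about any specific protocol). Under that assumption the chance of seeing at least one of n copies is 1 - (1 - p)^n. The function name is hypothetical; the 10% and 40% values are the capture range quoted above.

```python
def detection_probability(copies, capture_efficiency):
    """Chance that at least one of `copies` molecules is captured."""
    return 1.0 - (1.0 - capture_efficiency) ** copies


if __name__ == "__main__":
    for efficiency in (0.10, 0.40):
        for copies in (1, 5, 50):
            p = detection_probability(copies, efficiency)
            print(f"capture {efficiency:.0%}, {copies:2d} copies per cell: "
                  f"detected in about {p:.0%} of cells")
```

A transcript present at one or a few copies per cell is missed in most cells, while a highly expressed gene is detected almost every time, which matches the last two points above.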

Single Cell / Low Input Methods
[Figure, built up over three slides: overview of methods, with InDrops labeled]

Smart-Seq / Drop-Seq / SCRB-Seq
In Drop-Seq / SCRB-Seq libraries are enriched for 3’ UMI labeled ends
S. Picelli et al., Full-length RNA-seq from single cells using Smart-seq2, Nat Protoc 9, 171–81 (2014).

InDrops Single Cell Sequencing
Lysis and reverse transcription occur in the beads
Samples are frozen after RT as RNA:DNA hybrid in gel.
A. M. Klein et al., Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells, Cell 161, 1187–201 (2015).

InDrops Library Prep / CEL-Seq2
Second strand synthesis to make full length dsDNA
IVT back to RNA via T7 promoter
RNA fragmentation
Random hexamer RT with adaptor
PCR to add index and Illumina adaptors
R. Zilionis et al., Single-cell barcoding and sequencing using droplet microfluidics, Nature Protocols 12, 44-73 (2017).

Single Cell / Low Input Methods
InDrops
Smart-Seq gives full transcript information & detects most genes per cell.
Droplet based and Smart-Seq most accurate differential expression
Drop-Seq 5-12% capture efficiency
InDrops/10x 50-80% capture efficiency

Low Input or Single Cell
Only use if needed for experiment
All commonly used methods rely on the polyA tail
If you can get more starting material then you will get better results.
Gold standard is TruSeq with >500ng input RNA
Plan a small scale starter experiment to see if protocol will give useful results

Transcript Enrichment: Capture Sequencing
Capture targeted sequence using biotinylated RNA bait
Sequencing library applied to beads
Retain only library covering genes of interest
Saves money on sequencing
IDT lockdown probes are expensive but good for a small number of genes
http://rnaseq.uoregon.edu/

Capture Probes
Tiling is the number of times a base is covered by a different probe.
Difficult to design probes if looking at a single gene family or pseudogenes.
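A small sketch of the tiling definition above: lay probes along a target at a fixed step and count how many probes cover each base. The probe length, step, target length, and function names are hypothetical and chosen only for illustration.

```python
def design_probes(target_length, probe_length, step):
    """Start/end (0-based, end exclusive) of probes laid every `step` bases."""
    starts = range(0, target_length - probe_length + 1, step)
    return [(s, s + probe_length) for s in starts]


def tiling_depth(target_length, probes):
    """Number of probes covering each base of the target."""
    depth = [0] * target_length
    for start, end in probes:
        for pos in range(start, end):
            depth[pos] += 1
    return depth


if __name__ == "__main__":
    # Hypothetical design: 120 nt probes every 60 nt across a 600 nt target.
    probes = design_probes(target_length=600, probe_length=120, step=60)
    depth = tiling_depth(600, probes)
    print(len(probes), "probes; interior tiling", depth[300], "; edge tiling", depth[0])
```

Stepping by half the probe length gives 2x tiling in the interior of the target, with lower coverage at the ends.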
http://rnaseq.uoregon.edu/

Final Thoughts
Practice your library prep on a control sample.
Be sure you understand each step in library prep.
Talk to someone who has done the protocol before starting.
qPCR: precise quantitation is key to effective sequencing!

Useful Websites
support.illumina.com/
seqanswers.com/
core-genomics.blogspot.com/2012/04/how-do-spri-beads-work.html
www.broadinstitute.org/files/shared/illuminavids/SamplePrepSlides.pdf