
RNA next generation resources available at the Experimental and Computational Genomics Core (ECGC)

Kornel Schuebel, PhD

ECGC Resource Director

[email protected]

Telephone 410-614-0445

CRB2 Rm 131 (lab)

CRB2 Rm 1m44 (office)

What’s our mission?

To facilitate easy access to genomic technologies and expertise, including experimental design, sample processing, and data analysis.

To build educational and training opportunities for genomics analysis.

[Diagram: Experimental and Computational Genomics, linking Next Generation Sequencing, Microarray, Biostatistics and Bioinformatics Analysis, and Genomics Education]

The ECGC team

Faculty Directors: Leslie Cope, Sarah Wheelan, Vasan Yegnasubramanian

Resource Director: Kornel Schuebel

Core faculty: Rob Scharpf, Elana Fertig

Staff: Michael Considine, Anuj Gupta, Jennifer Meyers, Alyza Skaist, Hai Xu

Coordinators: Lauren Ciotti, Luda Danilova, Daniel Vellucci

IT support: Greg Smith, Dominic King

How do I start my project?

1) Contact us at ecgc.jhmi.edu. Let us know a little about your project (a sentence or two is fine). Contacts: Lauren, Sarah, Vasan, Leslie, Kornel

2) Schedule and attend a consultation. Meet with us to establish an experimental plan, discuss costs and a timeline. Contacts: Lauren, Sarah, Vasan, Leslie, Kornel

3) Set up an iLab project report and drop off samples. Drop off times are generally on Tuesdays and Thursdays. Contacts: Lauren, Jennifer, Kornel

4) We will contact you to verify types of data analysis you want. We confirm with you the comparisons your iLabs report showed. Contacts: Anuj, Alyza, Michael

5) Schedule a meeting to look at the data together. Contacts: Sarah, Vasan, Leslie, Kornel

6) Followup

Finding us is easy!

Just fill out the form.

ecgc.jhmi.edu

What type of RNA seq should I do?

Depends on your project…

Three popular options

1) Bulk RNA seq: 500 ng-1 µg of high quality total RNA

2) Low input: minimum input is 100 cells

3) FFPE/degraded RNA

Workflow: Enrichment, Library construction, Sequencing, Mapping, Analysis

Step 1: Total RNA quality assessment with the Agilent BioAnalyzer/TapeStation

Step 2: Library construction for bulk RNA seq

rRNA depletion with Ribo Zero beads followed by fragmentation

A) polyA selection with magnetic beads

B) cDNA synthesis

C) End repair

D) Adapter ligation and amplification

Advantages of the Illumina TruSeq approach

- Retains strand specific information on RNA transcript

- Library capture of both coding RNA, as well as multiple forms of non-coding RNA

- Degraded RNA can be used with minor adjustments to fragmentation procedures

- Can be automated and scaled up to 96 well format

- Is compatible with no indexing or a lower indexing pooling level (<24).

- The libraries do not require PCR amplification to enable cluster generation although PCR is recommended to meet the yield requirements of most standard applications.

Step 2: Library construction for problematic RNA samples

Illumina TruSeq RNA Exome

To perform RNA seq analysis using FFPE and other low-quality samples we use the Illumina TruSeq RNA Exome workflow.

TruSeq RNA Exome features a highly optimized probe set that delivers comprehensive coverage of coding RNA sequences. TruSeq RNA Exome includes 425,437 probes designed to capture 210,000 targets, spanning 21,415 genes of interest and covering 98.3% of the RefSeq exome.

This method adds unique oligonucleotides to each library, tagging them for downstream pooling into one lane (Figure 2A). This multisample pooling step allows more samples to be loaded in a single sequencing run, making high-throughput studies feasible. After libraries are pooled, they undergo a capture step that produces a targeted library, depleted of ribosomal RNA and intronic or intergenic regions. Pooled libraries are hybridized to biotin-labeled probes specific for coding RNA regions (Figure 2B). Specific targets within the pool are then captured by adding streptavidin beads that bind to the biotinylated probes (Figure 2C). Magnets pull the bound RNA fragments from the solution (Figure 2D). Captured RNA fragments are eluted from the beads and hybridized for a second enrichment reaction. After amplification, a targeted library is ready for cluster generation and subsequent sequencing.

Advantages of the RNA exome approach

- Requires very little RNA (>10 ng)

- Can be used with degraded FFPE samples

- Retains strand specificity

- Limitation: only available for human RNA analysis

Low input RNA sequencing

Step 1. cDNA is prepared from total RNA using the chimeric DNA/RNA primer mix and reverse transcriptase. The chimeric primers have a 3' DNA section that hybridizes to the mRNA and a unique 5' RNA sequence that does not hybridize to the mRNA.

Step 2. After partial fragmentation of mRNA in the cDNA/mRNA complex, DNA polymerase generates a second strand, including DNA complementary to the unique 5' RNA sequence, resulting in a double-stranded cDNA with an RNA/DNA heteroduplex at one end.

Step 3. RNase H removes the unique RNA sequence in the double-stranded cDNA, revealing a site for binding the second DNA/RNA chimeric primer. DNA polymerase synthesizes cDNA starting at the 3' end of the primer, displacing the existing forward strand. RNA at the 5' end of the newly synthesized strand is again cleaved by RNase H, exposing the priming site for initiation of the next round of cDNA synthesis. The process of SPIA DNA/RNA primer binding, cDNA synthesis, strand displacement and RNA cleavage is repeated in a highly processive manner, resulting in rapid accumulation of micrograms of amplified cDNA from picograms of total RNA.

Advantages of the SPIA approach

Highly reproducible

-The SPIA amplification process is highly reproducible over a wide range of input amounts, both in terms of the relative abundance of transcripts as well as the distribution of sequencing reads in an NGS experiment.

-As with any enzymological amplification method, there is some level of bias with regard to the degree of amplification of various sequences or molecules. In the case of SPIA, this bias is highly reproducible, with biological information being consistently maintained thereby enabling correlative comparisons between samples processed with the same SPIA-based protocols.

High Fidelity

The SPIA amplification method produces sequence copies with a high degree of accuracy, thereby preserving biological differences in the transcriptome.

Robust!

Disadvantage: it does not retain strand specificity.

Step 3: Sequencing through third party vendors Novogene and Psomagen

Sequencing with “in house” ECGC NGS sequencers

Installed in May 2019, the Ion GeneStudio S5 Prime System can sequence 150 million molecules in 3 hours.

Installed in September 2019, the Illumina NovaSeq 6000 is able to sequence up to twenty billion DNA fragments at the same time. As an example, this allows us to sequence forty-eight human genomes at 30x coverage in a forty-hour run.

Typical sequencing parameters

-50,000,000 reads per sample

-paired end 150x150 bp Illumina sequencing

Step 4: RNA seq analysis, from sequencing reads to differential expression
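As a rough sanity check on the sequencing figures quoted above (the NovaSeq capacity and the typical per-sample parameters), the arithmetic can be sketched as follows. This is a minimal sketch, not the Core's planning tool: the genome size of about 3.1 Gb and the reading of "reads per sample" as read pairs are assumptions, and the function names are hypothetical.

```python
# Rough sequencing-yield arithmetic. Values marked "assumed" are not from the slides.

RUN_FRAGMENTS = 20_000_000_000     # "up to twenty billion DNA fragments" per NovaSeq run
READ_LENGTH_BP = 150               # paired end 150x150 bp
READS_PER_FRAGMENT = 2             # paired end: two reads per fragment
HUMAN_GENOME_BP = 3_100_000_000    # assumed approximate human genome size
FRAGMENTS_PER_SAMPLE = 50_000_000  # assumed: "reads per sample" counted as read pairs


def run_yield_bp(fragments=RUN_FRAGMENTS):
    """Total bases if every fragment yields two full-length reads."""
    return fragments * READS_PER_FRAGMENT * READ_LENGTH_BP


def max_genomes_per_run(coverage=30, genome_bp=HUMAN_GENOME_BP):
    """Upper bound on whole genomes per run at the requested coverage."""
    return run_yield_bp() // (coverage * genome_bp)


def max_rna_samples_per_run(fragments_per_sample=FRAGMENTS_PER_SAMPLE):
    """Upper bound on RNA seq samples per run at the typical depth."""
    return RUN_FRAGMENTS // fragments_per_sample


def bases_per_rna_sample(fragments_per_sample=FRAGMENTS_PER_SAMPLE):
    """Bases sequenced for one RNA seq sample at the typical depth."""
    return fragments_per_sample * READS_PER_FRAGMENT * READ_LENGTH_BP


if __name__ == "__main__":
    print("run yield (bp):", run_yield_bp())
    print("max 30x genomes per run:", max_genomes_per_run())
    print("max RNA seq samples per run:", max_rna_samples_per_run())
    print("bases per RNA seq sample:", bases_per_rna_sample())
```

Under these assumptions the theoretical ceiling is 64 genomes at 30x per run, so the forty-eight quoted above sits below it with headroom for real-world losses.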

1) Raw Sequence Data (FASTQ Files)

2) QC by FastQC

3) Reads Mapping (RSEM-STAR)

4) Mapped Reads (SAM/BAM Files)

5) Expression Quantification (Expected Counts, FPKM, TPM etc.)

6) DE testing (DESeq2, EBSeq, edgeR etc.)

7) List of DE genes

8) Functional Interpretation: pathway analysis, explanatory plots, integrate with other data

SAM/BAM format

Two sections: header section, alignment section
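To illustrate the two sections of a SAM file, here is a minimal sketch in plain Python that separates header lines (which start with `@`) from alignment lines (tab-separated, with eleven mandatory fields). The example records, the function name `split_sam`, and the lowercase field names are made up for illustration; in practice a library such as samtools would be used instead.

```python
# Minimal SAM reader: header lines start with "@", alignment lines are tab-separated.

# The eleven mandatory alignment fields, in order (names lowercased for use as keys).
SAM_FIELDS = ("qname", "flag", "rname", "pos", "mapq", "cigar",
              "rnext", "pnext", "tlen", "seq", "qual")
INT_FIELDS = ("flag", "pos", "mapq", "pnext", "tlen")


def split_sam(text):
    """Return (header_lines, alignment_records) from SAM-formatted text."""
    header, alignments = [], []
    for line in text.splitlines():
        if not line:
            continue
        if line.startswith("@"):
            header.append(line)
            continue
        cols = line.split("\t")
        record = dict(zip(SAM_FIELDS, cols[:11]))
        for key in INT_FIELDS:
            record[key] = int(record[key])
        record["tags"] = cols[11:]  # optional TAG:TYPE:VALUE fields
        alignments.append(record)
    return header, alignments


# Hypothetical two-line header plus one alignment, for illustration only.
EXAMPLE_SAM = "\n".join([
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:248956422",
    "read1\t99\tchr1\t10468\t255\t4M\t=\t10650\t186\tACGT\tFFFF\tNH:i:1",
])

if __name__ == "__main__":
    header, alignments = split_sam(EXAMPLE_SAM)
    print(len(header), "header lines;", len(alignments), "alignment(s)")
    print(alignments[0]["rname"], alignments[0]["pos"], alignments[0]["cigar"])
```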

http://samtools.sourceforge.net/SAM1.pdf

How we quantitate gene expression
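The FPKM and TPM units mentioned in the analysis workflow can be illustrated with a short sketch. It uses the standard textbook definitions with raw gene lengths; tools such as RSEM use effective lengths and expected counts, so this is a simplification and not the Core's pipeline. The gene names, counts and lengths are hypothetical.

```python
# FPKM and TPM from fragment counts and gene lengths (simplified definitions).


def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts.values())
    return {g: counts[g] * 1e9 / (lengths_bp[g] * total) for g in counts}


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalised rates rescaled to sum to 1e6."""
    rates = {g: counts[g] / lengths_bp[g] for g in counts}
    scale = sum(rates.values())
    return {g: rates[g] / scale * 1e6 for g in counts}


# Hypothetical toy data, for illustration only.
EXAMPLE_COUNTS = {"geneA": 100, "geneB": 300, "geneC": 600}
EXAMPLE_LENGTHS = {"geneA": 1000, "geneB": 3000, "geneC": 2000}

if __name__ == "__main__":
    for name, values in (("FPKM", fpkm(EXAMPLE_COUNTS, EXAMPLE_LENGTHS)),
                         ("TPM", tpm(EXAMPLE_COUNTS, EXAMPLE_LENGTHS))):
        print(name, {g: round(v, 1) for g, v in values.items()})
```

In the toy data geneA and geneB have different counts but the same count per base, so they receive equal FPKM and equal TPM. TPM values sum to one million within a sample, which makes proportions easier to compare across samples than FPKM.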

Sample level

DE level

Expression quantification viewed by IGV

Gene expression analysis using arrays

Illumina microarrays Agilent microarrays

- gene and microRNA expression

- custom

Educational activities: (partnered with the Center for Computational Genomics)

Short courses, including:

- statistics and data analysis using R

- introduction to UNIX

- sed, awk and shell scripting

- inferring phylogenies in cancer

- visualizing sequencing data

Practical Genomics Workshop (annual, late August)

Bioinformatics and Genomics Symposium (annual, October)

Future directions: single cell RNA seq using the 10x Genomics Chromium platform

Clustering of approximately 5,000 CITE-seq single-cell expression profiles of PBMCs reveals distinct cell populations based on transcriptome analysis. The left panel shows a two-dimensional representation (tSNE) of global gene expression relationships among all cells. Major cell types in peripheral blood can be discerned based on marker gene expression as indicated. The right panels show mRNA (blue) and corresponding Antibody-Derived Tag (ADT, green) signal for the CITE-seq antibody panel projected on the tSNE plot. Darker shading corresponds to higher levels measured.

Future Directions

• Continually update and enhance the diversity and utility of the services offered through the Core

• Support research on technologies for early detection, treatment monitoring and risk stratification

• Adopt and generate new technologies, equipment, workflows and analytical pipelines

• Continue and update genomics educational activities

• Strive to provide exceptional services with optimal pricing

• Establish discount pricing with 3rd party sequencing vendors for certain applications.

Contact info:

Kornel Schuebel [email protected]

Lauren Ciotti [email protected]

Our website: ecgc.jhmi.edu

Thanks for your attention!