TRAINER’S MANUAL

Introduction to Next Generation Sequencing Hands-on Workshop

Bioplatforms Australia (BPA)
The Commonwealth Scientific and Industrial Research Organisation (CSIRO)


Licensing

This work is licensed under a Creative Commons Attribution 3.0 Unported License. The text below summarises the main terms of the full Legal Code (the full licence), which is available at http://creativecommons.org/licenses/by/3.0/legalcode.

You are free:

• to copy, distribute, display, and perform the work
• to make derivative works
• to make commercial use of the work

Under the following conditions:

• Attribution - You must give the original author credit.

With the understanding that:

• Waiver - Any of the above conditions can be waived if you get permission from the copyright holder.
• Public Domain - Where the work or any of its elements is in the public domain under applicable law, that status is in no way affected by the license.
• Other Rights - In no way are any of the following rights affected by the license:
  • Your fair dealing or fair use rights, or other applicable copyright exceptions and limitations;
  • The author’s moral rights;
  • Rights other persons may have either in the work itself or in how the work is used, such as publicity or privacy rights.

Notice - For any reuse or distribution, you must make clear to others the licence terms of this work.

Contents

Licensing

Contents

Workshop Information
• The Trainers
• Providing Feedback
• Document Structure
• Resources Used

Data Quality
• Key Learning Outcomes
• Resources You’ll be Using
• Useful Links
• Introduction
• Prepare the Environment
• Quality Visualisation
• Read Trimming

Read Alignment
• Key Learning Outcomes
• Resources You’ll be Using
• Useful Links
• Introduction
• Prepare the Environment
• Alignment
• Manipulate SAM output
• Visualize alignments in IGV
• Practice Makes Perfect!

ChIP-Seq
• Key Learning Outcomes
• Resources You’ll be Using
• Introduction
• Prepare the Environment
• Finding enriched areas using MACS
• Viewing results with the Ensembl genome browser
• Annotation: From peaks to biological interpretation
• Motif analysis
• Reference

RNA-Seq
• Key Learning Outcomes
• Resources You’ll be Using
• Introduction
• Prepare the Environment
• Alignment
• Isoform Expression and Transcriptome Assembly
• Differential Expression
• Visualising the CuffDiff expression analysis
• Functional Annotation of Differentially Expressed
• Differential Expression Analysis using edgeR
• References

de novo Genome Assembly
• Key Learning Outcomes
• Resources You’ll be Using
• Introduction
• Prepare the Environment
• Downloading and Compiling Velvet
• Assembling Single-end Reads
• Assembling Paired-end Reads
• Hybrid Assembly

Post-Workshop Information
• Access to Computational Resources
• Access to Workshop Documents
• Access to Workshop Data

Space for Personal Notes or Feedback


Workshop Information


The Trainers

Dr. Zhiliang Chen, Postdoctoral Research Associate, The University of New South Wales (UNSW), NSW

Dr. Susan Corley, Postdoctoral Research Associate, The University of New South Wales (UNSW), NSW

Dr. Nandan Deshpande, Postdoctoral Research Associate, The University of New South Wales (UNSW), NSW

Dr. Konsta Duesing Research Team Leader