Edger: Differential Expression Analysis of Digital Gene
Total Page:16
File Type:pdf, Size:1020Kb
edgeR: differential expression analysis of digital gene expression data User's Guide Yunshun Chen, Davis McCarthy, Matthew Ritchie, Mark Robinson, Gordon K. Smyth First edition 17 September 2008 Last revised 30 June 2016 Contents 1 Introduction 5 1.1 Scope . 5 1.2 Citation . 6 1.3 How to get help . 7 1.4 Quick start . 8 2 Overview of capabilities 9 2.1 Terminology . 9 2.2 Aligning reads to a genome . 9 2.3 Producing a table of read counts . 9 2.4 Reading the counts from a file . 10 2.5 The DGEList data class . 10 2.6 Filtering . 11 2.7 Normalization . 12 2.7.1 Normalization is only necessary for sample-specific effects . 12 2.7.2 Sequencing depth . 12 2.7.3 RNA composition . 12 2.7.4 GC content . 13 2.7.5 Gene length . 13 2.7.6 Model-based normalization, not transformation . 13 2.7.7 Pseudo-counts . 14 2.8 Negative binomial models . 14 2.8.1 Introduction . 14 2.8.2 Biological coefficient of variation (BCV) . 15 2.8.3 Estimating BCVs . 16 2.8.4 Quasi negative binomial . 17 2.9 Pairwise comparisons between two or more groups (classic) . 17 2.9.1 Estimating dispersions . 17 2.9.2 Testing for DE genes . 18 2.10 More complex experiments (glm functionality) . 19 2.10.1 Generalized linear models . 19 1 2.10.2 Estimating dispersions . 19 2.10.3 Testing for DE genes . 20 2.11 What to do if you have no replicates . 21 2.12 Differential expression above a fold-change threshold . 23 2.13 Gene ontology (GO) and pathway analysis . 24 2.14 Gene set testing . 25 2.15 Clustering, heatmaps etc . 25 2.16 Alternative splicing . 26 2.17 CRISPR-Cas9 and shRNA-seq screen analysis . 26 3 Specific experimental designs 28 3.1 Introduction . 28 3.2 Two or more groups . 28 3.2.1 Introduction . 28 3.2.2 Classic approach . 29 3.2.3 GLM approach . 30 3.2.4 Questions and contrasts . 31 3.2.5 A more traditional glm approach . 32 3.2.6 An ANOVA-like test for any differences . 33 3.3 Experiments with all combinations of multiple factors . 34 3.3.1 Defining each treatment combination as a group . 34 3.3.2 Nested interaction formulas . 36 3.3.3 Treatment effects over all times . 36 3.3.4 Interaction at any time . 37 3.4 Additive models and blocking . 37 3.4.1 Paired samples . 37 3.4.2 Blocking . 38 3.4.3 Batch effects . 40 3.5 Comparisons both between and within subjects . 40 4 Case studies 43 4.1 RNA-Seq of oral carcinomas vs matched normal tissue . 43 4.1.1 Introduction . 43 4.1.2 Reading in the data . 43 4.1.3 Annotation . 44 4.1.4 Filtering and normalization . 45 4.1.5 Data exploration . 46 4.1.6 The design matrix . 46 4.1.7 Estimating the dispersion . 47 4.1.8 Differential expression . 48 4.1.9 Gene ontology analysis . 50 2 4.1.10 Setup . 51 4.2 RNA-Seq of pathogen inoculated arabidopsis with batch effects . 52 4.2.1 Introduction . 52 4.2.2 RNA samples . 52 4.2.3 Loading the data . 52 4.2.4 Filtering and normalization . 53 4.2.5 Data exploration . 53 4.2.6 The design matrix . 54 4.2.7 Estimating the dispersion . 55 4.2.8 Differential expression . 56 4.2.9 Setup . 58 4.3 Profiles of Yoruba HapMap individuals . 59 4.3.1 Background . 59 4.3.2 Loading the data . 59 4.3.3 Filtering and normalization . 61 4.3.4 Estimating the dispersion . 62 4.3.5 Differential expression . 63 4.3.6 Gene set testing . 64 4.3.7 Setup . 66 4.4 RNA-Seq profiles of mouse mammary gland . 67 4.4.1 Introduction . 67 4.4.2 Read alignment and processing . 67 4.4.3 Count loading and annotation . 68 4.4.4 Filtering and normalization . 69 4.4.5 Data exploration . 70 4.4.6 The design matrix . 71 4.4.7 Estimating the dispersion . 72 4.4.8 Differential expression . 74 4.4.9 ANOVA-like testing . 76 4.4.10 Gene ontology analysis . 77 4.4.11 Gene set testing . 79 4.4.12 Setup . 80 4.5 Differential splicing after Pasilla knockdown . 81 4.5.1 Introduction . 81 4.5.2 RNA-Seq samples . 81 4.5.3 Read alignment and processing . 82 4.5.4 Count loading and annotation . 83 4.5.5 Filtering and normalization . 84 4.5.6 Data exploration . 85 4.5.7 The design matrix . 86 4.5.8 Estimating the dispersion . 86 3 4.5.9 Differential expression . 88 4.5.10 Alternative splicing . 88 4.5.11 Setup . 91 4.5.12 Acknowledgements . 91 4.6 CRISPR-Cas9 knockout screen analysis . 91 4.6.1 Introduction . 91 4.6.2 Sequence processing . ..