Edger: Differential Expression Analysis of Digital Gene

edgeR: differential expression analysis of digital gene expression data User's Guide Yunshun Chen, Davis McCarthy, Matthew Ritchie, Mark Robinson, Gordon K. Smyth First edition 17 September 2008 Last revised 30 June 2016 Contents 1 Introduction 5 1.1 Scope . 5 1.2 Citation . 6 1.3 How to get help . 7 1.4 Quick start . 8 2 Overview of capabilities 9 2.1 Terminology . 9 2.2 Aligning reads to a genome . 9 2.3 Producing a table of read counts . 9 2.4 Reading the counts from a file . 10 2.5 The DGEList data class . 10 2.6 Filtering . 11 2.7 Normalization . 12 2.7.1 Normalization is only necessary for sample-specific effects . 12 2.7.2 Sequencing depth . 12 2.7.3 RNA composition . 12 2.7.4 GC content . 13 2.7.5 Gene length . 13 2.7.6 Model-based normalization, not transformation . 13 2.7.7 Pseudo-counts . 14 2.8 Negative binomial models . 14 2.8.1 Introduction . 14 2.8.2 Biological coefficient of variation (BCV) . 15 2.8.3 Estimating BCVs . 16 2.8.4 Quasi negative binomial . 17 2.9 Pairwise comparisons between two or more groups (classic) . 17 2.9.1 Estimating dispersions . 17 2.9.2 Testing for DE genes . 18 2.10 More complex experiments (glm functionality) . 19 2.10.1 Generalized linear models . 19 1 2.10.2 Estimating dispersions . 19 2.10.3 Testing for DE genes . 20 2.11 What to do if you have no replicates . 21 2.12 Differential expression above a fold-change threshold . 23 2.13 Gene ontology (GO) and pathway analysis . 24 2.14 Gene set testing . 25 2.15 Clustering, heatmaps etc . 25 2.16 Alternative splicing . 26 2.17 CRISPR-Cas9 and shRNA-seq screen analysis . 26 3 Specific experimental designs 28 3.1 Introduction . 28 3.2 Two or more groups . 28 3.2.1 Introduction . 28 3.2.2 Classic approach . 29 3.2.3 GLM approach . 30 3.2.4 Questions and contrasts . 31 3.2.5 A more traditional glm approach . 32 3.2.6 An ANOVA-like test for any differences . 33 3.3 Experiments with all combinations of multiple factors . 34 3.3.1 Defining each treatment combination as a group . 34 3.3.2 Nested interaction formulas . 36 3.3.3 Treatment effects over all times . 36 3.3.4 Interaction at any time . 37 3.4 Additive models and blocking . 37 3.4.1 Paired samples . 37 3.4.2 Blocking . 38 3.4.3 Batch effects . 40 3.5 Comparisons both between and within subjects . 40 4 Case studies 43 4.1 RNA-Seq of oral carcinomas vs matched normal tissue . 43 4.1.1 Introduction . 43 4.1.2 Reading in the data . 43 4.1.3 Annotation . 44 4.1.4 Filtering and normalization . 45 4.1.5 Data exploration . 46 4.1.6 The design matrix . 46 4.1.7 Estimating the dispersion . 47 4.1.8 Differential expression . 48 4.1.9 Gene ontology analysis . 50 2 4.1.10 Setup . 51 4.2 RNA-Seq of pathogen inoculated arabidopsis with batch effects . 52 4.2.1 Introduction . 52 4.2.2 RNA samples . 52 4.2.3 Loading the data . 52 4.2.4 Filtering and normalization . 53 4.2.5 Data exploration . 53 4.2.6 The design matrix . 54 4.2.7 Estimating the dispersion . 55 4.2.8 Differential expression . 56 4.2.9 Setup . 58 4.3 Profiles of Yoruba HapMap individuals . 59 4.3.1 Background . 59 4.3.2 Loading the data . 59 4.3.3 Filtering and normalization . 61 4.3.4 Estimating the dispersion . 62 4.3.5 Differential expression . 63 4.3.6 Gene set testing . 64 4.3.7 Setup . 66 4.4 RNA-Seq profiles of mouse mammary gland . 67 4.4.1 Introduction . 67 4.4.2 Read alignment and processing . 67 4.4.3 Count loading and annotation . 68 4.4.4 Filtering and normalization . 69 4.4.5 Data exploration . 70 4.4.6 The design matrix . 71 4.4.7 Estimating the dispersion . 72 4.4.8 Differential expression . 74 4.4.9 ANOVA-like testing . 76 4.4.10 Gene ontology analysis . 77 4.4.11 Gene set testing . 79 4.4.12 Setup . 80 4.5 Differential splicing after Pasilla knockdown . 81 4.5.1 Introduction . 81 4.5.2 RNA-Seq samples . 81 4.5.3 Read alignment and processing . 82 4.5.4 Count loading and annotation . 83 4.5.5 Filtering and normalization . 84 4.5.6 Data exploration . 85 4.5.7 The design matrix . 86 4.5.8 Estimating the dispersion . 86 3 4.5.9 Differential expression . 88 4.5.10 Alternative splicing . 88 4.5.11 Setup . 91 4.5.12 Acknowledgements . 91 4.6 CRISPR-Cas9 knockout screen analysis . 91 4.6.1 Introduction . 91 4.6.2 Sequence processing . ..

Edger: Differential Expression Analysis of Digital Gene

The Transition from Primary Colorectal Cancer to Isolated Peritoneal Malignancy

Sustained Intrinsic WNT and BMP4 Activation Impairs Hesc Diferentiation to Defnitive Endoderm and Drives the Cells Towards Extra‑Embryonic Mesoderm C

Peripheral Nerve Single-Cell Analysis Identifies Mesenchymal Ligands That Promote Axonal Growth

Supplementary Material Study Sample the Vervets Used in This

Downloaded from the European Nucleotide Archive (ENA; 126

Clustering & Function Enrichment Analysis

Edger: Differential Analysis of Sequence Read Count Data User's

Sulfotransferase 1A3 (SULT1A3)

De Novo Transcriptome Assembly, Functional Annotation, And

Co-Expression of Mklf7 and Trka 1149

SUPPLEMENTARY MATERIALS Figure S1. Pre-Processing

Data-Driven and Knowledge-Driven Computational Models of Angiogenesis in Application to Peripheral Arterial Disease