Gene Expression Analysis for Functional Genomics

Jason Fiedler, PhD
Loftsgard 474H

Functional genomics
• Genomics from genes to functions
• DNA → RNA → protein → phenotype

Transcription of genes
• RNA polymerase binds the promoter to initiate transcription
• The promoter is upstream of the gene
• The 3'-5' strand (anti-sense strand) serves as the template
• A gene's sequence is presented as the sense strand (5'-3' strand), which is the coding sequence

Regulation of transcription can be very complicated
• Many "regulators" can influence transcription
  – Activators
  – Transcription factors
  – DNA-bending proteins
  – RNA polymerase

Gene regulation is even more complicated in eukaryotes

Gene expression analysis/Transcriptomics = measuring mRNA abundance
[Figure: wild-type and mutant cells, each showing the cell membrane, nucleus, DNA, RNA, and proteins]

Relative values are typically used
[Figure: intensity (0 to 500) by gene, wild type vs. mutant]

Does mRNA abundance correlate with protein abundance?
[Figure: [mRNA] vs [protein]; variance in protein abundance explained by X]
Vogel and Marcotte. Nat. Rev. Genet. 13(4), 227-232. 2012

Regulation of translation can also be very complicated

Multi-dimensional analysis is needed to fully understand biology
• Phenotypes
• Traits
• Diseases
• Development

Gene expression analysis can be used to help validate genetic association studies
• Quantitative-trait loci (QTL) mapping
• Genome-wide association study (GWAS)

Identify novel gene expression differences among "treatments"
[Figure: differentially expressed genes (DEG), scale from 0.001x to 1000x]
• Development
• Stress response
• Mutational status
• Disease
• Transgenic
• KO or overexpress
• Usually a high-throughput method is used
  – Many genes measured
• Genes expressed differently in each treatment are investigated further
[Figure axis label: disease severity]

Gene expression experimental design
• Line/subject/population selection (isolate treatment)
• Treatment application/sampling (RNA is transient)
• mRNA isolation (mRNA comprises only ~2% of all RNA)
• Choice of technique to measure abundance
  – Northern blot or rtPCR (measure a single gene or a few genes)
  – Serial Analysis of Gene Expression (SAGE)
  – Microarray (Affymetrix GeneChip)
  – RNA-Seq
• Analyze data to infer biological meaning
  – Which genes are up- and down-regulated together (or inversely)?
  – What do these genes do?
  – Build a regulatory model

Many RNA levels change throughout the day
• This can confound differences among treatments

mRNA needs to be isolated from other RNA
• PolyA selection
  – Oligo-dT, often using magnetic beads
  – Isolates mRNA with a poly-A tail
• rRNA depletion
  – RiboZero, RiboMinus
  – Non-polyA RNAs preserved (non-coding, bacterial RNA, etc.)
  – Can be less effective at removing all rRNA

Directly visualize the RNA
• RNA staining
• Northern blot

Replace coding region with reporter gene
  5' [Native promoter][OsJMJ703] 3'
  5' [Native promoter][β-glucuronidase (GUS)] 3'
• Identify locations of gene activation in transgenic hosts.
Song, T. et al. Plant Phys. and Bioch. 2018

Reverse transcription PCR (rtPCR)
• Reverse transcriptase creates DNA from RNA
• qPCR measures abundance of a single fragment

rtPCR measures transcript abundance associated with grain development
[Figure: relative expression]
Zhao, J., et al. Front. Plant Sci. 2018

Serial Analysis of Gene Expression (SAGE)
• NlaIII is an endonuclease that cleaves double-stranded DNA molecules into fragments
  – 5' C A T G| 3'
  – 3' |G T A C 5'
Malali Gowda et al. Plant Physiol. 2004;134:890-897

Serial Analysis of Gene Expression (SAGE)
• Ligate different gene fragments together
• Sequence and count abundance of each gene
Malali Gowda et al. Plant Physiol.
2004;134:890-897

Affymetrix GeneChips Expression Assay (microarray containing probes for genes)
Schematic of Gene Chip
• Immobilized probe sequences are arranged in features (cells); millions of identical sequences make up 1 cell
• RNA or cDNA fragments flow over the microarray and hybridize to the probe cells

Different probes represent different parts of the same gene
[Figure: gene A sequence covered by 25-mers; PM (+) and MM (-) probes form a probe set for gene A]
• Probes are selected to be specific to the target gene and have good hybridization characteristics.

GeneChip Overview
[Figure: chip of 1.28 cm with 5 µm x 5 µm features; gene A probe set and gene B probe set]
• Each chip contains up to 6.5 M features
• Each feature contains millions of identical probes
• Pairs of features signal specific binding
• Probe sets signal gene expression
• A Gene Chip is run for each treatment, and genes with different abundance values are examined

RNA-Seq for gene expression analysis
• Cheaper per data point than other methods
• Detect genes without prior knowledge of transcriptome
• Allelic expression
• No ascertainment bias
• No background adjustment
• Requires reference sequence
• For more about RNA-Seq, see the reference paper "RNA-Seq: a revolutionary tool for transcriptomics" (Wang et al., 2009)

RNA-Seq protocol
1) Isolate mRNA
2) Generate cDNA
3) Prepare sequencing library
   1) fragment
   2) add adapters to ends
4) High-throughput sequence
5) Align to reference
6) Count reads
7) Convert read counts to gene abundance

RNA isolation and processing
Adapted from Simon et al., 2009, Ann. Rev. Plant Biol.
60:305

Read alignment and counting
• Align (map) reads to a genome or transcriptome
• Convert alignments to read counts per gene
• Normalize read counts to size of gene AND total reads in sample
  – RPKM (Reads Per Kilobase per Million reads mapped)
[Figure: Sample 1 (18 reads) and Sample 2 (9 reads); a 2,000 bp gene and a 1,000 bp gene]

RNA-Seq analysis: picking significant genes
• Use corrected p-values
  – Multiple testing
• Account for read counts and differences among treatments
• Differentially-expressed genes (DEG)
[Figure: fold change (log scale) vs. log RPKM; panels: before FHB inoculation and 24 h after FHB inoculation]

What do these significant genes do?
• Gene Ontology resources (GO terms)
  – What terms are "enriched"?
Bailey et al. J. of Carcinogenesis. 2013

Gene expression example: cold response in Arabidopsis
• Seedlings were grown at 22 °C with 16-h-light and 8-h-dark cycles for 2 weeks; all samples were taken at the same time of the photoperiod
• Total RNA was prepared from Arabidopsis seedlings after 0, 3, 6, or 24 h cold treatment at 0 °C
• At each time point, three plates of 150 seedlings were used for RNA pools

  Time    WT               Weak-response mutant
  0 h     450 seedlings    450 seedlings
  3 h     450 seedlings    450 seedlings
  6 h     450 seedlings    450 seedlings
  24 h    450 seedlings    450 seedlings

Lee et al. The Arabidopsis Cold-Responsive Transcriptome and Its Regulation by ICE1. The Plant Cell 2005

Which genes are significantly affected by the treatment?
• Affymetrix GeneChips that contain 24,000 genes
• 939 (4%) genes were cold responsive
• A multitude of transcriptional cascades
  – Early cold-responsive genes (3 h and 6 h) encode transcription factors that likely control late-responsive genes
Lee et al. The Arabidopsis Cold-Responsive Transcriptome and Its Regulation by ICE1. The Plant Cell 2005

Analysis of DEGs stimulates more experiments
• 21 differentially-expressed genes are associated with auxin (16 downregulated)
  – Does the auxin level decrease?
• Some cold-responsive genes respond differently in WT and mutant (ice1) plants.
  – Can we model interactions?

Gene expression profiling of mutants highlights a complicated regulatory network
• GeneChip profiling of WT and CBFΔ mutants.
• Treat plants at 4 °C
  – Measure expression levels
  – Measure freezing tolerance
• The first wave of transcription factors includes CBF genes
  – These activate cold-response pathways
Park et al. Regulation of the Arabidopsis CBF regulon by a complex low-temperature regulatory network. The Plant Journal 2015

Gene expression profiling of transgenic plants
• Number of cold-regulated (COR) genes induced or repressed in transgenic plants overexpressing first-wave transcription factors
• First-wave transcription factors function together in a complex low-temperature regulatory network (172 genes)

Use global analysis to test specific hypotheses
• GolS3 is a COR gene that is only regulated by CBF.
• However, many other COR genes are only partially regulated by CBF

How do we protect plants from freezing?
• Some first-wave TFs are known to increase freezing tolerance without cold acclimation.
  – But they stunt growth
• Overexpress other first-wave TFs and measure tolerance.
• Overexpression of HSFC1 increases freezing tolerance, but also stunts growth via an unknown mechanism

Systemic Acquired Resistance (SAR) in Plants
• NPR1 knock-down doesn't inhibit bacteria-mediated SAR in barley
Dey, S., et al. Plant Phys. 2014

RNA-Seq of bacteria- and mock-infected barley
• Identify 4 "promising" TFs
• Transcript abundance (qPCR) at 3 time points after infection
• Too much variability to draw conclusions

Local and systemic response of wheat to Xanthomonas infection
• RNA-Seq to determine DEGs
• LC/MS to determine DEPs
Garcia-Seco, D, et al. Sci. Rep. 2017

What happens in wheat?
• Very different responses in leaves and roots
• Leaves activate recognition platforms.
• Roots start to produce energy and secondary metabolites.
  – Are roots more resistant now?

Mole Rats!
• Naked Mole Rat (NMR)
  – Eusocial
• Blind Mole Rat (BMR)
  – Solitary
Many amazing characteristics
Characterization of these delightful
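The RPKM normalization named on the read alignment and counting slide divides a gene's read count by the gene length in kilobases and by the number of mapped reads in millions. A minimal Python sketch of that arithmetic follows. The function name `rpkm` and every count in the example are hypothetical, chosen only to illustrate the calculation; they are not taken from the lecture's figure.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of gene per Million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    kilobases = gene_length_bp / 1_000            # gene length in kb
    millions = total_mapped_reads / 1_000_000     # sequencing depth in millions
    return read_count / (kilobases * millions)


# Hypothetical numbers: both samples give 400 reads for the gene, but in
# sample B the gene is half as long and the library is half as deep.
sample_a = rpkm(read_count=400, gene_length_bp=2_000, total_mapped_reads=20_000_000)
sample_b = rpkm(read_count=400, gene_length_bp=1_000, total_mapped_reads=10_000_000)
print(sample_a)  # 10.0
print(sample_b)  # 40.0
```

Identical raw counts can therefore correspond to quite different normalized abundances, which is why raw read counts are not compared directly across genes or samples.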

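The "corrected p-values" bullet on the slide about picking significant genes refers to multiple-testing correction: when thousands of genes are tested at once, some will appear significant by chance. The lecture does not say which correction was used, so the sketch below assumes one common choice, the Benjamini-Hochberg false discovery rate procedure. The function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-values get smaller.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five genes (an assumption for illustration).
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(raw)
significant = [p <= 0.05 for p in adj]
print(significant)  # [True, True, False, False, False]
```

Four of the five raw p-values fall below 0.05, but only two remain below that threshold after correction, which is the reason corrected values are used when calling DEGs.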