Gene expression analysis for functional genomics
Jason Fiedler, PhD
Loftsgard 474H
Functional genomics
Genomics from genes to functions
DNA → RNA → Protein → Phenotype
Transcription of genes
• RNA polymerase binds the promoter to initiate transcription
• The promoter is upstream of the gene
• The 3’→5’ strand (the antisense strand) serves as the template
• A gene's sequence is conventionally presented as the sense strand (5’→3’), which is the coding sequence
Regulation of transcription can be very complicated
• Many “regulators” can influence transcription
– Activators
– Transcription factors
– DNA-bending proteins
– RNA polymerase
Gene regulation is even more complicated in eukaryotes
Gene expression analysis/Transcriptomics = measuring mRNA abundance
[Figure: wild-type and mutant cells (cell membrane, nucleus, DNA, RNA, proteins), each with a bar chart of RNA intensity (0–500) across genes]
Relative values are typically used
Does mRNA abundance correlate with protein abundance?
[Figure: [mRNA] vs. [protein]; variance in protein abundance explained by X]
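One common way to put a number on "variance explained" is the squared Pearson correlation between mRNA and protein abundance. The sketch below assumes log-scale abundances for the same genes; the values are made up for illustration and are not taken from Vogel and Marcotte.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical log-scale abundances for five genes (illustration only).
mrna_log = [1.0, 2.0, 3.0, 4.0, 5.0]
protein_log = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(mrna_log, protein_log)
variance_explained = r ** 2  # fraction of protein variance explained by mRNA
print(f"r = {r:.2f}, variance explained = {variance_explained:.0%}")
```

With these toy numbers r is 0.8, so mRNA abundance explains about 64% of the variance in protein abundance; the rest would be attributed to other factors such as translation and protein turnover.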
Vogel and Marcotte. Nat. Rev. Genet. 13(4), 227-232. 2012
Regulation of translation can also be very complicated
Multi-dimensional analysis is needed to fully understand biology
• Phenotypes
• Traits
• Diseases
• Development
Gene expression analysis can be used to help validate genetic association studies
Quantitative-Trait Loci (QTL) mapping
Genome-Wide Association Study
Identify novel gene expression differences among “treatments”
• Development
• Stress response
• Mutational status
• Disease
• Transgenic
– KO or overexpress
• Usually a high-throughput method is used
– Many genes measured.
• Genes expressed differently in each treatment are investigated further
[Figure: differentially expressed genes (DEG), fold change from 0.001x to 1000x, plotted against disease severity]
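Fold changes that span 0.001x to 1000x are easier to compare on a log scale. The sketch below computes a log2 fold change per gene and keeps genes that pass a cutoff. The pseudocount, the cutoff and the abundance values are assumptions chosen for illustration. A fold-change cutoff alone is not a statistical test.

```python
import math

def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of treated to control; the pseudocount avoids division by zero."""
    return math.log2((treated + pseudocount) / (control + pseudocount))

def call_deg(expression, threshold=1.0):
    """Return {gene: log2FC} for genes whose |log2FC| meets the threshold.

    `expression` maps gene name -> (control abundance, treated abundance).
    """
    result = {}
    for gene, (control, treated) in expression.items():
        lfc = log2_fold_change(treated, control)
        if abs(lfc) >= threshold:
            result[gene] = lfc
    return result

# Hypothetical abundances (control, treated) for three genes.
example = {
    "geneA": (10.0, 10.0),  # unchanged
    "geneB": (3.0, 31.0),   # up-regulated
    "geneC": (63.0, 7.0),   # down-regulated
}
degs = call_deg(example)
print(degs)
```

On this scale an 8-fold increase and an 8-fold decrease are symmetric (+3 and -3), which is why DEG plots usually use log fold change.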
Gene expression experimental design
• Line/subject/population selection (isolate treatment)
• Treatment application/sampling (RNA is transient)
• mRNA isolation (mRNA comprises only ~2% of all RNA)
• Choose a technique to measure abundance
– Northern blot or rtPCR (measure a single gene or a few genes)
– Serial Analysis of Gene Expression (SAGE)
– Microarray (Affymetrix GeneChip)
– RNA-Seq
• Analyze data to infer biological meaning
– Which genes are up- and down-regulated together (or inversely)?
– What do these genes do?
– Build a regulatory model
Many RNA levels change throughout the day
This can confound differences among treatments
mRNA needs to be isolated from other RNA
• PolyA selection
– Oligo-dT, often using magnetic beads
– Isolates mRNA with a poly-A tail
• rRNA depletion
– RiboZero, RiboMinus
– Non-polyA RNAs preserved (non-coding, bacterial RNA, etc.)
– Can be less effective at removing all rRNA
Directly visualize the RNA
[Figure: RNA staining and Northern blot]
Replace coding region with reporter gene
5’ [Native promoter][OsJMJ703] 3’
5’ [Native promoter][β-glucuronidase (GUS)] 3’
• Identify locations of gene activation in transgenic hosts.
Song, T. et al. Plant Phys. and Bioch. 2018
Reverse transcription PCR (rtPCR)
Reverse transcriptase creates DNA from RNA
qPCR measures abundance of a single fragment
rtPCR measures transcript abundance associated with grain development
[Figure: relative expression]
Zhao, J., et al. Front. Plant Sci. 2018
Serial Analysis of Gene Expression (SAGE)
• NlaIII is an endonuclease that cleaves double-stranded DNA molecules into fragments
– 5’ C A T G| 3’
– 3’ |G T A C 5’
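The cut pattern above can be sketched as a simple string operation: find every CATG and cut immediately after it on the strand shown. This is a minimal illustration that tracks only the top strand; the function names and the example sequence are hypothetical.

```python
SITE = "CATG"  # NlaIII recognition sequence; the cut falls right after the G

def nlaiii_cut_positions(seq):
    """Return 0-based positions just after each CATG on the given strand."""
    seq = seq.upper()
    positions = []
    start = seq.find(SITE)
    while start != -1:
        positions.append(start + len(SITE))
        start = seq.find(SITE, start + 1)
    return positions

def digest(seq):
    """Split a sequence into top-strand fragments at every NlaIII cut."""
    seq = seq.upper()
    bounds = [0] + nlaiii_cut_positions(seq) + [len(seq)]
    fragments = [seq[a:b] for a, b in zip(bounds, bounds[1:])]
    return [f for f in fragments if f]  # drop an empty tail if seq ends in CATG

example = "AACATGTTTCATGGG"  # hypothetical sequence with two sites
print(nlaiii_cut_positions(example))
print(digest(example))
```

Because CATG reads the same on both strands, searching one strand finds every site.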
Malali Gowda et al. Plant Physiol. 2004;134:890-897
Serial Analysis of Gene Expression (SAGE)
• Ligate different gene fragments together
• Sequence and count abundance of each gene
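The counting step can be sketched as follows: each transcript contributes a short tag next to its 3'-most CATG site, and the number of times a tag is sequenced estimates that gene's abundance. The tag length of 10 bases, the transcript sequences and the observed tags are all assumptions for illustration; real SAGE variants use different tag lengths.

```python
from collections import Counter

def extract_tag(transcript, tag_length=10):
    """Return the bases just downstream of the 3'-most CATG, or None."""
    transcript = transcript.upper()
    site_start = transcript.rfind("CATG")
    if site_start == -1:
        return None
    tag = transcript[site_start + 4 : site_start + 4 + tag_length]
    return tag if len(tag) == tag_length else None

def count_tags(observed_tags, tag_to_gene):
    """Count sequenced tags per gene, ignoring tags that match no known gene."""
    counts = Counter()
    for tag in observed_tags:
        gene = tag_to_gene.get(tag)
        if gene is not None:
            counts[gene] += 1
    return counts

# Hypothetical reference transcripts.
transcripts = {
    "geneA": "GGCATGAAAACCCCGGTTTT",
    "geneB": "CATGTTTTGGGGCCAAAAAA",
}
tag_to_gene = {extract_tag(seq): gene for gene, seq in transcripts.items()}

# Hypothetical tags recovered from sequenced concatemers.
observed = ["AAAACCCCGG"] * 3 + ["TTTTGGGGCC", "ACGTACGTAC"]
counts = count_tags(observed, tag_to_gene)
print(counts)
```

Here geneA is counted three times and geneB once; the last tag matches no reference transcript and is skipped.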
Malali Gowda et al. Plant Physiol. 2004;134:890-897
Affymetrix GeneChips Expression Assay (microarray containing probes for genes)
Schematic of Gene Chip
Immobilized probe sequences are arranged in features (cells); millions of identical sequences make up 1 cell.
RNA or cDNA fragments flow over the microarray and hybridize to the probe cells.
Different probes represent different parts of the same gene
[Figure: gene A sequence covered by 25-mer probes; perfect-match (PM, +) and mismatch (MM, −) probe pairs form a probe set for gene A]
Probes are selected to be specific to the target gene and have good hybridization characteristics.
GeneChip Overview
[Figure: a 1.28 cm chip with 5 µm × 5 µm features, showing the gene A and gene B probe sets]
• Each chip contains up to 6.5 M features
• Each feature contains millions of identical probes
• Pairs of features signal specific binding
• Probe sets signal gene expression
• A GeneChip is run for each treatment, and genes with different abundance values are examined
RNA-Seq for gene expression analysis
• Cheaper per data point than other methods
• Detect genes without prior knowledge of transcriptome
• Allelic expression
• No ascertainment bias
• No background adjustment
• Requires reference sequence
• For more about RNA-Seq, see the reference paper “RNA-Seq: a revolutionary tool for transcriptomics” (Wang et al., 2009)
RNA-Seq protocol
1) Isolate mRNA
2) Generate cDNA
3) Prepare sequencing library
– Fragment
– Add adapters to ends
4) High-throughput sequencing
5) Align to reference
6) Count reads
7) Convert read counts to gene abundance
RNA isolation and processing
Adapted from Simon et al., 2009, Ann. Rev. Plant Biol. 60:305
Reads alignment and counting
• Align (map) reads to a genome or transcriptome
• Convert alignments to read counts per gene
• Normalize read counts to size of gene AND total reads in sample
– RPKM (Reads Per Kilobase per Million reads mapped)
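RPKM divides a gene's read count by its length in kilobases and by the sample's mapped reads in millions, which works out to RPKM = (reads × 10^9) / (gene length in bp × total mapped reads). The sketch below applies that formula; the total of one million mapped reads is an assumed value for illustration.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of gene per Million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)

# Hypothetical sample with 1,000,000 mapped reads in total.
total_reads = 1_000_000
long_gene = rpkm(read_count=18, gene_length_bp=2000, total_mapped_reads=total_reads)
short_gene = rpkm(read_count=9, gene_length_bp=1000, total_mapped_reads=total_reads)
print(long_gene, short_gene)
```

The 2,000 bp gene collects twice as many reads as the 1,000 bp gene, yet both come out at 9 RPKM: the raw difference reflects gene length, not expression.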
[Figure: Sample 1 (18 reads) and Sample 2 (9 reads); a 2,000 bp gene and a 1,000 bp gene]
RNA-Seq analysis: picking significant genes
• Use corrected p-values
– Multiple testing
• Account for read counts and differences among treatments
• Differentially expressed genes (DEG)
[Figure: fold change (log scale) vs. log RPKM, before FHB inoculation and 24 h after FHB inoculation]
What do these significant genes do?
• Gene Ontology resources (GO terms)
• Which terms are “enriched”?
Bailey et al. J. of Carcinogenesis. 2013
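The two ideas above, corrected p-values and enriched GO terms, can be sketched together. Enrichment of a term among the DEGs is tested here with a hypergeometric tail probability, and the resulting p-values are adjusted with the Benjamini-Hochberg procedure. Both are common choices but are assumptions here, since the slides do not name a specific test or correction; the gene counts are made up.

```python
from math import comb

def hypergeom_enrichment_p(total_genes, term_genes, deg_count, overlap):
    """P(at least `overlap` term genes among `deg_count` genes drawn at random)."""
    upper = min(term_genes, deg_count)
    numerator = sum(
        comb(term_genes, k) * comb(total_genes - term_genes, deg_count - k)
        for k in range(overlap, upper + 1)
    )
    return numerator / comb(total_genes, deg_count)

def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical example: 10 genes in total, 3 DEGs, two GO terms.
# Each entry is (genes annotated with the term, DEGs annotated with the term).
terms = {"term_1": (4, 2), "term_2": (5, 1)}
raw_p = [hypergeom_enrichment_p(10, n_term, 3, n_overlap)
         for n_term, n_overlap in terms.values()]
adj_p = benjamini_hochberg(raw_p)
for name, p, q in zip(terms, raw_p, adj_p):
    print(f"{name}: p = {p:.3f}, adjusted p = {q:.3f}")
```

The same `benjamini_hochberg` helper applies to per-gene p-values when picking DEGs, where thousands of genes are tested at once.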
Gene expression example: cold-responsive genes in Arabidopsis
• Seedlings were grown at 22 °C with 16-h-light and 8-h-dark cycles for 2 weeks; all samples were taken at the same time in the photoperiod
• Total RNA was prepared from Arabidopsis seedlings after 0, 3, 6, or 24 h of cold treatment at 0 °C
• At each time point, three plates of 150 seedlings were used for RNA pools
Time    WT              Weak-response mutant
0 h     450 seedlings   450 seedlings
3 h     450 seedlings   450 seedlings
6 h     450 seedlings   450 seedlings
24 h    450 seedlings   450 seedlings
Lee et al. The Arabidopsis Cold-Responsive Transcriptome and Its Regulation by ICE1. The Plant Cell 2005
Which genes are significantly affected by the treatment?
• Affymetrix GeneChips that contain 24,000 genes
• 939 (4%) genes were cold responsive
• A multitude of transcriptional cascades
– Early cold-responsive genes (3 h and 6 h) encode transcription factors that likely control late-responsive genes
Lee et al. The Arabidopsis Cold-Responsive Transcriptome and Its Regulation by ICE1. The Plant Cell 2005
Analysis of DEGs stimulates more experiments
• 21 differentially expressed genes are associated with auxin (16 downregulated)
– Does the auxin level decrease?
• Some cold-responsive genes respond differently in WT and mutant (ice1) plants.
– Can we model interactions?
Gene expression profiling of mutants highlights a complicated regulatory network
• GeneChip profiling of WT and CBFΔ mutants.
• Treat plants at 4 °C
– Measure expression levels
– Measure freezing tolerance
• First wave of transcription factors includes CBF genes
– Activates cold-response pathways
Park et al. Regulation of the Arabidopsis CBF regulon by a complex low-temperature regulatory network. The Plant Journal 2015
Gene expression profiling of transgenic plants
• Number of cold-regulated (COR) genes induced or repressed in transgenic plants overexpressing first-wave transcription factors
• First-wave transcription factors function together in a complex low-temperature regulatory network
(172 genes)
Use global analysis to test specific hypotheses
• GolS3 is a COR gene that is only regulated by CBF.
• However, many other COR genes are only partially regulated by CBF
How do we protect plants from freezing?
• Some first-wave TFs are known to increase freezing tolerance without cold acclimation.
– But they stunt growth
• Overexpress other first-wave TFs and measure tolerance.
• Overexpression of HSFC1 increases freezing tolerance, but also stunts growth via an unknown mechanism
Systemic Acquired Resistance in Plants
NPR1 knock-down doesn’t inhibit bacteria-mediated SAR in barley
[Figure: SAR]
Dey, S., et al. Plant Phys. 2014
RNA-Seq of bacteria- and mock-infected barley
Identify 4 “promising” TFs
• Transcript abundance (qPCR) at 3 time points after infection
• Too much variability to draw conclusions
Local and systemic response of wheat to Xanthomonas infection
Garcia-Seco, D., et al. Sci. Rep. 2017
RNA-Seq to determine DEGs
LC/MS to determine DEPs
What happens in wheat?
• Very different responses in leaves and roots
• Leaves activate recognition platforms.
• Roots start to produce energy and secondary metabolites.
– Are roots more resistant now?
Mole Rats!
• Naked Mole Rat (NMR)
– Eusocial
• Blind Mole Rat (BMR)
– Solitary
Many amazing characteristics
Characterization of these delightful rodents could be a genetic “Fountain of Youth”
Have multiple levels of cancer resistance
Skin cells secrete lots of high-molecular-weight hyaluronan
NMR cells quickly stop growth when they contact it
Miyawaki, et al. Nature Com. 7, 2016
Tian, et al. Nature 499, 2013
Genomics show that mole rat cells are missing many genes that allow cells to evade apoptosis
RNA-Seq experiment comparing NMR liver to mouse liver
Yu, et al. PLOS One. 2011
What genes are interesting?
Yu, et al. PLOS One. 2011
What do the genes do?
Yu, et al. PLOS One. 2011
Underground burrows are low in oxygen: 7–10% (animals under constant hypoxia)
Hypoxia gene expression experiment design
[Figure: 2 × 2 design; the columns are 21% oxygen and 6% oxygen, and each of the four cells has 3 replicates (× 3)]
6 h under treatment, then sacrifice, harvest liver, extract mRNA, and use RNA-Seq to measure gene expression
Schmidt, et al. Scientific Reports. 7, 2017
Differential liver gene expression in normal vs. hypoxic conditions
Mouse and mole rat transcriptomics GO term enrichment
What biological processes are upregulated in response to hypoxia?
DNA damage and repair genes are upregulated
Werner Syndrome – premature death and increased cancer risk
Interesting genes upregulated in mole rats
• A2M – strong tumor suppressor properties
• NEIL2 & XPA – DNA damage repair
• CISD2 – Mutations cause Wolfram syndrome
– Neurodegenerative, optic atrophy, shortened lifespan
– Overexpression in mice increases lifespan
• TSP1/FOXO3 pathway (in response to hypoxia)
• FGF21 (in response to hypoxia)
– Overexpression in mice increases lifespan
• BUB3/RAE1 – Mitotic spindle assembly
More hypoxia transcriptomics experiments lead to models of stress and cancer response
Fang, et al. Nature Com. 5, 2014
Transcriptomics/Gene Expression analysis is an important component of functional genomics
• Phenotypes
• Traits
• Diseases
• Development