Gene expression analysis for functional genomics

Jason Fiedler, PhD
Loftsgard 474H

Functional genomics
Genomics from to functions

DNA → RNA → Protein

Transcription of genes

• RNA polymerase binds the promoter to initiate transcription
• The promoter is upstream of the gene
• The 3’-5’ strand is the template (anti-sense strand)
• The gene sequence presented is that of the sense strand (5’-3’ strand), which is the coding sequence

Regulation of transcription can be very complicated

• Many “regulators” can influence transcription
• Activators
• Transcription factors
• DNA-bending
• RNA polymerase

Gene regulation is even more complicated in eukaryotes

Gene expression analysis/Transcriptomics = measuring mRNA abundance

[Figure: wild-type and mutant cells (nucleus, DNA, RNA, proteins, cell membrane), each with a bar chart of RNA intensity (0 to 500) per gene]

Relative values are typically used

Does mRNA abundance correlate with protein abundance?

Figure panels: [mRNA] vs. [protein]; variance in protein abundance explained by X

Vogel and Marcotte. Nat. Rev. Genet. 13(4), 227-232. 2012

Regulation of can also be very complicated

Multi-dimensional analysis is needed to fully understand biology

• Traits • Diseases • Development

Gene expression analysis can be used to help validate genetic association studies

Quantitative-trait Loci (QTL) mapping

Genome-Wide Association Study

Identify novel gene expression differences among “treatments”

• Development
• Stress response
• Mutational status
• Disease
• Transgenic
• KO or overexpress
• Usually a high-throughput method is used
  – Many genes measured
• Genes expressed differently in each treatment are investigated further

[Figure: differentially expressed genes (DEG), scale from 1000x to 0.001x; axis label: Disease severity]
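Differences like these are usually reported as a fold change between treatments, often on a log scale. The sketch below is a minimal illustration, not the method used in any of the cited studies: the gene names, expression values, pseudocount, and 2-fold cutoff are all assumptions made for the example.

```python
import math

def log2_fold_change(control, treated, pseudocount=1.0):
    """Return log2(treated / control); the pseudocount avoids division by zero."""
    return math.log2((treated + pseudocount) / (control + pseudocount))

# Hypothetical expression values (arbitrary units): gene -> (control, treated)
expression = {
    "geneA": (100.0, 820.0),
    "geneB": (250.0, 240.0),
    "geneC": (400.0, 45.0),
}

CUTOFF = 1.0  # assumed threshold: |log2 FC| >= 1, i.e. at least a 2-fold change

for gene, (control, treated) in expression.items():
    lfc = log2_fold_change(control, treated)
    if lfc >= CUTOFF:
        status = "up"
    elif lfc <= -CUTOFF:
        status = "down"
    else:
        status = "unchanged"
    print(f"{gene}: log2FC = {lfc:+.2f} ({status})")
```

On this scale a 1000x change is roughly +10 and a 0.001x change is roughly -10, so up- and down-regulation are symmetric around zero.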

Gene expression experimental design

• Line/subject/population selection (isolate treatment)
• Treatment application/sampling (RNA is transient)
• mRNA isolation (comprises only ~2% of all RNA)
• What technique to measure abundance
  – Northern Blot or rtPCR (measure single or a few genes)
  – Serial Analysis of Gene Expression (SAGE)
  – Microarray (Affymetrix GeneChip)
  – RNA-Seq
• Analyze data to infer biological meaning
  – Which genes are up- and down-regulated together (inversely)
  – What do these genes do?
  – Build a regulatory model

Many RNA levels change throughout the day

This can confound differences among treatments

mRNA needs to be isolated from other RNA

• PolyA Selection
  – Oligo-dT, often using magnetic beads
  – Isolates mRNA with poly A tail
• rRNA Depletion
  – RiboZero, RiboMinus
  – Non-polyA RNAs preserved (non-coding, bacterial RNA, etc.)
  – Can be less effective at removing all rRNA

Directly visualize the RNA

[Figure panels: RNA staining; Northern Blot]

Replace coding region with

5’ Native promoter OsJMJ703 3’

5’ Native promoter β-glucuronidase (GUS) 3’

• Identify locations of gene activation in transgenic hosts.

Song, T. et al. Plant Phys. and Bioch. 2018

Reverse transcription PCR (rtPCR)

Reverse Transcriptase Creates DNA from RNA
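At the sequence level, this step amounts to writing the reverse complement of the RNA in DNA letters. The snippet below is only a sketch of that base-pairing logic; the transcript fragment is hypothetical and no enzyme behavior (priming, processivity, errors) is modeled.

```python
# Watson-Crick pairing used when copying RNA into DNA
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}

def reverse_transcribe(mrna):
    """Return first-strand cDNA (written 5'->3') for an mRNA written 5'->3'."""
    complement = [RNA_TO_DNA[base] for base in mrna.upper()]
    # The new strand is antiparallel to its template, so reverse it
    return "".join(reversed(complement))

mrna = "AUGGCCUAA"  # hypothetical transcript fragment
cdna = reverse_transcribe(mrna)
print(cdna)  # TTAGGCCAT
```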

qPCR measures abundance of a single fragment

rtPCR measures transcript abundance associated with grain development

[Figure y-axis: Relative Expression]

Zhao, J., et al. Front. Plant Sci. 2018

Serial Analysis of Gene Expression (SAGE)

• NlaIII is an endonuclease that cleaves double-stranded DNA molecules into fragments
  – 5’ C A T G| 3’
  – 3’ |G T A C 5’
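The cut pattern shown above can be sketched in a few lines: every CATG on the top strand is followed by a cut. This is a simplified, top-strand-only illustration with a hypothetical input sequence; it ignores the overhang on the opposite strand.

```python
SITE = "CATG"  # NlaIII recognition site; the top strand is cut after the G

def nlaiii_digest(seq):
    """Split the top strand (5'->3') after every CATG."""
    seq = seq.upper()
    fragments = []
    start = 0
    pos = seq.find(SITE)
    while pos != -1:
        cut = pos + len(SITE)
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(SITE, cut)
    if start < len(seq):
        # Whatever follows the last site is the final fragment
        fragments.append(seq[start:])
    return fragments

cdna = "AACATGTTTGGCATGCC"  # hypothetical sequence
print(nlaiii_digest(cdna))  # ['AACATG', 'TTTGGCATG', 'CC']
```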

Malali Gowda et al. Plant Physiol. 2004;134:890-897

Serial Analysis of Gene Expression (SAGE)

• Ligate different gene fragments together
• Sequence and count abundance of each gene
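The counting step can be illustrated with a toy model in which every tag is a fixed-length stretch directly after a CATG anchor in the sequenced concatemer. Real SAGE libraries are more involved; the tag length, tag sequences, and tag-to-gene table below are all hypothetical.

```python
from collections import Counter

ANCHOR = "CATG"   # NlaIII site that separates tags in this simplified model
TAG_LENGTH = 10   # assumed tag length; real protocols differ

def count_tags(concatemer):
    """Count each fixed-length tag that directly follows an anchor site."""
    tags = Counter()
    pos = concatemer.find(ANCHOR)
    while pos != -1:
        start = pos + len(ANCHOR)
        tag = concatemer[start:start + TAG_LENGTH]
        if len(tag) == TAG_LENGTH:
            tags[tag] += 1
        # Resume the search after the tag so its bases are not rescanned
        pos = concatemer.find(ANCHOR, start + TAG_LENGTH)
    return tags

# Hypothetical tags and the genes they are assumed to identify
TAG_TO_GENE = {"AAAAAAAAAA": "geneA", "GGTTCCAAGG": "geneB"}

concatemer = "CATGAAAAAAAAAACATGGGTTCCAAGGCATGAAAAAAAAAA"
for tag, count in count_tags(concatemer).items():
    print(TAG_TO_GENE.get(tag, "unknown"), count)
```

Here geneA is counted twice and geneB once, which is the relative abundance the method reports.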

Malali Gowda et al. Plant Physiol. 2004;134:890-897

Affymetrix GeneChips

Expression Assay (microarray containing probes for genes)

Schematic of Gene Chip

Immobilized probe sequences are arranged in features (cells); millions of identical sequences make up 1 cell.
RNA or cDNA fragments flow over the microarray and hybridize to the probe cells.
Different probes represent different parts of the same gene.

[Figure: a probe set for gene A: 25-mer probes along the gene A sequence, labeled PM (+) and MM (-)]
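PM and MM conventionally stand for perfect-match and mismatch probes, and the MM intensity serves as an estimate of non-specific binding. One simple way to turn a probe set into a single expression value is to average the PM minus MM differences. This is a deliberately simplified sketch, not the algorithm used by the vendor software, and the intensities are made up.

```python
def probe_set_signal(pm, mm):
    """Average the PM - MM difference over all probe pairs in a probe set."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("need one MM intensity for every PM intensity")
    differences = [p - m for p, m in zip(pm, mm)]
    return sum(differences) / len(differences)

# Hypothetical intensities for four probe pairs of one gene
pm = [1200.0, 950.0, 1100.0, 1300.0]
mm = [200.0, 150.0, 300.0, 100.0]

print(probe_set_signal(pm, mm))  # 950.0
```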

Probes are selected to be specific to the target gene and have good hybridization characteristics.

GeneChip Overview

[Figure: 1.28 cm chip with 5 µm × 5 µm features; Gene A probe set and Gene B probe set]

• Each chip contains up to 6.5 M features
• Each feature contains millions of identical probes
• Pairs of features signal specific binding
• Probe sets signal gene expression

A Gene Chip is run for each treatment, and genes with different abundance values are examined

RNA-Seq for gene expression analysis

• Cheaper per data point than other methods
• Detect genes without prior knowledge of
• Allelic expression
• No ascertainment bias
• No background adjustment
• Requires reference sequence

• For more about RNA-Seq, see the reference paper “RNA-Seq: a revolutionary tool for transcriptomics” (Wang et al., 2009)

RNA-Seq protocol

1) Isolate mRNA
2) Generate cDNA
3) Prepare sequencing library
   1) fragment
   2) add adapters to ends
4) High-throughput sequence
5) Align to reference
6) Count reads
7) Convert read counts to gene abundance

RNA isolation and processing

Adapted from Simon et al., 2009, Ann. Rev. Plant Biol. 60:305

Reads alignment and counting

• Align (map) reads to a genome or transcriptome
• Convert alignments to read-counts per gene
• Normalize read counts to size of gene AND total reads in sample
  – RPKM (Reads Per Kilobase per Million reads mapped)
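A short worked example may make the normalization concrete. The read counts and gene lengths below echo the numbers on the slide figure, but pairing 18 reads with the 2,000 bp gene and 9 reads with the 1,000 bp gene, and the library size of one million mapped reads, are assumptions made for illustration.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of gene per Million mapped reads."""
    kilobases = gene_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions)

TOTAL_MAPPED = 1_000_000  # assumed library size, identical for both cases

long_gene = rpkm(read_count=18, gene_length_bp=2_000, total_mapped_reads=TOTAL_MAPPED)
short_gene = rpkm(read_count=9, gene_length_bp=1_000, total_mapped_reads=TOTAL_MAPPED)

print(long_gene, short_gene)  # 9.0 9.0
```

Under these assumptions the longer gene collects twice as many reads simply because it is twice as long, and RPKM reports the same abundance for both.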

[Figure: Sample 1 (18 reads) and Sample 2 (9 reads); a 2,000 bp gene and a 1,000 bp gene]

RNA-Seq analysis: picking significant genes

• Use corrected p-values
  – Multiple testing
• Account for read counts and differences among treatments
• Differentially-expressed genes (DEG)

[Figure panels: fold change (log scale) before FHB inoculation; fold change (log scale) 24 h after FHB inoculation]
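The slide does not say which multiple-testing correction is used. As one common choice, the Benjamini-Hochberg procedure is sketched below; treating it as the correction here is an assumption, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values, one per gene
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
adjusted = benjamini_hochberg(p_values)

significant = [p for p in adjusted if p < 0.05]
print(len(significant))  # 2
```

Four of the five raw p-values are below 0.05, but only two remain below it after correction, which is why corrected values are used when thousands of genes are tested at once.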

[Figure x-axis: Log RPKM]

What do these significant genes do?

• Gene Ontology resources (GO terms)

What terms are “enriched”?

Bailey et al. J. of . Carcinogenesis. 2013
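Whether a GO term is "enriched" among the significant genes is often judged with a hypergeometric (one-sided Fisher) test. The sketch below assumes that test; the slide does not name one, and the gene counts are hypothetical.

```python
from math import comb

def enrichment_p_value(population, annotated, selected, overlap):
    """P(X >= overlap) under the hypergeometric distribution.

    population: all genes measured
    annotated:  genes in the population carrying the GO term
    selected:   differentially expressed genes
    overlap:    selected genes that carry the GO term
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts: 20 genes measured, 5 carry the term,
# 4 genes are differentially expressed, 3 of those carry the term
p = enrichment_p_value(population=20, annotated=5, selected=4, overlap=3)
print(round(p, 4))  # 0.032
```

A small p-value means the term appears among the selected genes more often than expected by chance; with many terms tested, these p-values also need multiple-testing correction.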

Gene expression example --Cold responsive in Arabidopsis

• Seedlings were grown at 22 °C with 16-h-light and 8-h-dark cycles for 2 weeks
  – All samples taken at the same time of the photoperiod
• Total RNA was prepared from Arabidopsis seedlings after 0, 3, 6, or 24 h cold treatment at 0 °C
• At each time point, three plates of 150 seedlings were used for RNA pools

Time    WT               Weak-response mutant
0 h     450 seedlings    450 seedlings
3 h     450 seedlings    450 seedlings
6 h     450 seedlings    450 seedlings
24 h    450 seedlings    450 seedlings

Lee et al. The Arabidopsis Cold-Responsive Transcriptome and Its Regulation by ICE1. The Plant Cell 2005

Which genes are significantly affected by the treatment?

• Affymetrix GeneChips that contain 24,000 genes
• 939 (4%) genes were cold responsive
• A multitude of transcriptional cascades
  – Early cold-responsive genes (3 h and 6 h) include transcription factors that likely control late-responsive genes

Lee et al. The Arabidopsis Cold-Responsive Transcriptome and Its Regulation by ICE1. The Plant Cell 2005

Analysis of DEGs stimulates more experiments

• 21 differentially-expressed genes are associated with auxin (16 downregulated)
  – Does the auxin level decrease?

• Some cold-responsive genes don’t do the same thing in WT and mutant (ice1) plants.
  – Can we model interactions?

Gene expression profiling of mutants highlights a complicated regulatory network

• GeneChip profiling of WT and CBFΔ mutants.
• Treat plants at 4 °C
  – Measure expression levels
  – Measure freezing tolerance

• First wave of transcription factors includes CBF genes
  – Activates cold-response pathways

Park et al. Regulation of the Arabidopsis CBF regulon by a complex low-temperature regulatory network. The Plant Journal 2015

Gene expression profiling of transgenic plants


• Number of cold-regulated (COR) genes induced or repressed in transgenic plants overexpressing first-wave transcription factors
• First-wave transcription factors function together in a complex low-temperature regulatory network

[Figure label: (172 genes)]

Use global analysis to test specific hypotheses

• GolS3 is a COR gene that is only regulated by CBF.

• However, many other COR genes are only partially regulated by CBF

How do we protect plants from freezing?

• Some first-wave TFs are known to increase freezing tolerance without cold acclimation.
  – But they stunt growth

• Overexpress other first wave TFs and measure tolerance.

• Overexpression of HSFC1 increases freezing tolerance, but also stunts growth via an unknown mechanism

Systemic Acquired Resistance in Plants

NPR1 knock-down doesn’t inhibit bacteria-mediated SAR in barley

SAR

Dey, S., et al. Plant Phys. 2014

RNA-Seq of bacteria- and mock-infected barley

Identify 4 “promising” TFs

• Transcript abundance (qPCR) at 3 time points after infection
• Too much variability to draw conclusions

Local and systemic response of wheat to Xanthomonas infection

Garcia-Seco, D, et al. Sci. Rep. 2017

RNA-Seq to determine DEGs
LC/MS to determine DEPs

What happens in wheat?

• Very different responses in leaves and roots
• Leaves activate recognition platforms.
• Roots start to produce energy and secondary metabolites.
  – Are roots more resistant now?

Mole Rats!

• Naked Mole Rat (NMR)
  – Eusocial
• Blind Mole Rat (BMR)
  – Solitary

Many amazing characteristics

Characterization of these delightful rodents could be a genetic “Fountain of Youth”

Have multiple levels of cancer resistance

Skin cells secrete lots of high-molecular-weight hyaluronan
NMR cells quickly stop growth when they contact it

Miyawaki, et al. Nature Com 7, 2016
Tian, et al. Nature 499, 2013

Genomics show that mole rat cells are missing many genes that allow cells to evade apoptosis

RNA-Seq experiment comparing NMR liver to mouse liver

Yu, et al. PLOS One. 2011

What genes are interesting?

Yu, et al. PLOS One. 2011

What do the genes do?

Yu, et al. PLOS One. 2011

Underground burrows are low in oxygen – 7-10% (animals under constant hypoxia)

Hypoxia gene expression experiment design

[Figure: two groups of animals, each with 3 replicates at 21% oxygen and 3 replicates at 6% oxygen]

6 hrs under treatment, sacrifice, harvest liver, extract mRNA, RNA-Seq to measure gene expression

Schmidt, et al. Scientific Reports. 7, 2017

Differential liver gene expression in normal vs hypoxic conditions

Mouse and mole rat transcriptomics

GO term enrichment

What biological processes are upregulated in response to hypoxia?

DNA damage and repair genes are upregulated

Werner Syndrome – premature death and increased cancer risk

Interesting genes upregulated in mole rats

• A2M - strong tumor suppressor properties
• NEIL2 & XPA - DNA damage repair
• CISD2 - causes Wolfram syndrome
  – Neurodegenerative, optic atrophy, shortened lifespan
  – Overexpression in mice increases lifespan
• TSP1/FOXO3 pathway (in response to hypoxia)
• FGF21 (in response to hypoxia)
  – Overexpression in mice increases lifespan
• BUB3/RAE1 - Mitotic spindle assembly

More hypoxia transcriptomics experiments lead to models of stress and cancer response

Fang, et al. Nature Com. 5 2014

Transcriptomics/Gene Expression analysis is an important component of functional genomics

• Phenotypes
• Traits
• Diseases
• Development