Gene expression: Non-coding RNA and personalized medicine. Breaking the genetic enigma

Narayan Sastri Palla

Outline

• RNA world
• Central Dogma
• Types of RNA transcripts and their functions
• Noncoding RNA
• RNA structure
• Computational methods to identify ncRNA
• Lab exercises: microRNA target prediction

Evolution of Life from primitive RNA to complex regulatory RNA in DNA Genome

Central Dogma of Biology

RNA is an Active Player

[Figure labels: reverse transcription; long ncRNA]

Overview of the kinds of DNA sequences found in the human genome

Definitions:

Genome: the total amount of genetic material, stored as DNA.
• The nuclear genome refers to the DNA contained in the nucleus; in humans, this is the DNA in the 46 chromosomes.
• It is the nuclear genome that defines a multicellular organism; it is the same for (almost) all cells of the organism.

Transcriptome:

• The total amount of genetic information which has been transcribed by the cell. This information will be stored as RNA.
• This represents some 90% of the total genomic sequences.
• There is ~5X more RNA than DNA in a cell, most of it rRNA (~80%) and tRNA (~15%).

Transcriptome:

• The transcriptome is unique to a cell type and is a measure of gene expression.
• Different cells within an organism will have different transcriptomes. Cell types can be identified by their transcriptome.

Proteome:

• The cell’s complete protein output. This reflects all the mRNA sequences translated by the cell.
• Cell types have different proteomes and these can be used to identify a particular cell.
• Only 1–2% of the genome codes for the proteome.

If protein-coding portions of the human genome make up only 1.5%, what is the rest doing?

How can the disparity between the number of sequences transcribed and translated be explained?

It’s an RNA World

• Protein Translation
  • mRNA
  • tRNA
  • rRNA
• RNA function and maturation
  • snRNA
  • snoRNA
  • RNase P
  • Y RNA
  • RNase MRP
• RNA interference
  • miRNA
  • siRNA
  • piRNA
• Regulatory
  • lncRNA
  • lincRNA
• Telomere synthesis
  • Telomerase RNA

Non-coding RNA

• Only 1–2% of the genome codes for proteins.
• BUT a large amount of it is transcribed; some estimates have it as high as 98%.

Non-coding RNA

The difference is RNA that is an end in itself. This non-coding RNA (ncRNA) consists of:
• the introns of protein-coding genes,
• non-coding genes (what are these??),
• sequences antisense to or overlapping protein-coding genes.

Classes of ncRNAs

Class            Size (nt)   Function                          Phylogenetic distribution
tRNA             70-80       Translation                       Ubiquitous
rRNA                         Translation                       Ubiquitous
  16S/18S        1.5K
  28S+5.8S/23S   3K
  5S             130
RNase P          220-440     tRNA maturation                   Ubiquitous
MRP              250-350                                       Eukarya
snoRNA           130         Pseudouridinylation
telomerase       400-550     Addition of repeats
snRNA            100-600     Spliceosome                       Eukarya
U1 ~ U6          130-140     mRNA maturation                   Eukarya, archaea
U7               ~65         Histone mRNA maturation           Eukaryotes
7SK              ~300        Translational regulation          Vertebrata
tmRNA            300-400     Tags protein for proteolysis
miRNA            ~22         Post-transcriptional regulation   Multicellular organisms

(Bompfunewerer, et al, 2005)

Roles of ncRNAs

• Known roles for ncRNAs:
  – RNA catalyzes excision/ligation in introns.
  – RNA catalyzes the maturation of tRNA.
  – RNA catalyzes peptide bond formation.
  – RNA is a required subunit in telomerase.
  – RNA plays roles in immunity and development (RNAi).
  – RNA plays a role in dosage compensation.
  – RNA plays a role in carbon storage.
  – RNA is a major subunit in the SRP, which is important in protein trafficking.
  – RNA guides RNA modification.
  – It is thought that in the beginning there was an RNA World, where RNA was both the information carrier and the active molecule.

Functions of RNAs

• Protein Translation
  • mRNA
  • tRNA
  • rRNA

Protein translation

Messenger RNA (mRNA)

• Destiny dictated by post-transcriptional modifications
  • “Cap and tail exits cell”
• mRNA methylation widespread and likely functional
  • N6-methyladenosine (m6A)
  • MeRIP-Seq

Protein translation

Transfer RNA (tRNA)
• 15% of cellular RNA

Ribosomal RNA (rRNA)
• 80% of cellular RNA

Functions of RNAs

• RNA function + maturation
  • snRNA
  • snoRNA
  • RNase P
  • Y RNA
  • RNase MRP

RNA-protein complexes support cellular and molecular functions

RNA function and maturation

Small nuclear RNA (snRNA)

• RNA component of the spliceosome
• snRNP complex made up of 5 snRNAs and over 20 proteins
• Removes non-coding regions (introns) from pre-mRNA

RNA function and maturation

Small nucleolar RNA (snoRNA)

• Guides chemical modifications of other RNAs
  • mainly rRNA, tRNA, and snRNA
• 2 main classes of snoRNA:
  • H/ACA box snoRNAs direct conversion of uridine to pseudouridine
  • C/D box snoRNAs help add methyl groups to RNAs

RNA function and maturation

Ribonuclease P (RNaseP)

• RNA component of an RNA enzyme (ribozyme)
• Cleaves a precursor sequence from tRNA molecules, generating mature tRNA
• Also required for RNA Pol III transcription of various small noncoding RNA genes (e.g., tRNA, 5S rRNA, SRP RNA, and U6 snRNA genes)
• Because RNase P enzymes catalyze site-specific cleavage of RNA molecules, they may have pharmaceutical applications

RNA function and maturation

Y RNA

• Component of Ro Ribonucleoparticle (RoRNP) complex
• Chaperone regulating maturation of small ncRNAs
• Transcribed by RNA Pol III
• UV resistance in mammalian cells
• Essential for DNA replication
• Upregulated in human cancer tissue
• Required for increased proliferation of cancer cell lines

RNA function and maturation

Ribonuclease MRP (RNase MRP)

• RNA component of RNase MRP
• Enzymatically active ribonucleoprotein
• Initiation of mitochondrial DNA replication
• In the nucleus, involved in precursor rRNA processing

Functions of RNAs

• RNA interference
  • miRNA
  • siRNA
  • piRNA

RNA inhibiting RNA

RNA interference (RNAi)

• Biological process in which RNA molecules inhibit gene expression, typically by causing the destruction of specific mRNA molecules
• Andrew Fire and Craig C. Mello shared the 2006 Nobel Prize in Physiology or Medicine for their work on RNAi in the nematode worm C. elegans
• RNAi helps defend cells against parasitic sequences
  • viruses and transposons
• Plays an integral role in development as well as in regulation of gene expression in general
• Technological applications
  • Gene knockdown: study the physiological role of individual genes
  • Functional genomics: genome-wide RNAi screens
  • Medicine: attractive, but RNAi delivery to tissues is difficult

Small (short) interfering RNA (siRNA)

MicroRNA (miRNA)

miRNAs must first undergo extensive post-transcriptional modification before they are mature and functional

Primary transcript (Pri-miRNA)

pre-miRNA

mature miRNA

1. Pri-miRNAs are processed into 70-nucleotide precursors (pre-miRNA).
2. The precursor is cleaved to generate 21–25-nucleotide mature miRNAs.

Difficulty of discovering ncRNAs from genomes

Unlike protein-coding genes: no strong statistical sequence signals (no ORF, no polyadenylation signal)

[Figure: tRNA gene, transcribed to tRNA sequence, folded into 3D structure]

ncRNA gene finding strategies

1. Computational predictive methods
2. cDNA cloning to enrich ncRNAs
3. Detecting new transcripts with microarrays

Computational ncRNA gene finding methods

• Specific (custom-designed) ncRNA search and annotation (e.g., tRNAscan, methylation-guide snoRNA, miRNA, tmRNA)

• Reconfigurable search systems (e.g., Infernal, ERPIN, RNATOPS, FastR)
  • mechanism to profile the target ncRNA (structure); needs training data

• De novo ncRNA gene detection with
  • base composition (e.g., G+C %)
  • structure fold (e.g., RNAz)

• Comparative analysis (e.g., QRNA, EvolFold); uses consensus structure

Some ncRNA databases

• Rfam (280,000 regions of 379 families)
• NONCODE (109 transitional classes and 9 groups)
• RNAdb (800 mammalian ncRNAs, excluding tRNAs, rRNAs and snRNAs)
• Arabidopsis small RNA Project (ASRP)
• Etc.

RNA Folds into (Secondary and) 3D Structures

[Figure: RNA secondary structure diagram (helices P4, P5, P5a, P5b, P5c, P6, P6a, P6b) shown with the corresponding 3D structure. Waring & Davies (1984) Gene 28: 277. Cate, et al. (Cech & Doudna) (1996) Science 273: 1678.]

We would like to predict them from sequence.

RNA structure rules

• Canonical basepairs:
  • Watson-Crick basepairs:
    • G - C
    • A - U
  • Wobble basepair:
    • G - U
• Stacks: continuous nested basepairs (energetically favorable).
• Non-basepaired loops:
  • Hairpin loop.
  • Bulge.
  • Internal loop.
  • Multiloop.
• Pseudo-knots

RNA stem-loop (pseudoknot-free) structure example

X inactivation in mammals

An example of noncoding RNA function

[Figure: dosage compensation between XX and XY]

Xist: X inactive-specific transcript

Avner and Heard, Nat. Rev. Genetics 2001 2(1):59-67

Laboratory Exercises

microRNA computational prediction and analysis

Slides from: www.bioalgorithms.info

miRNA Pathway Illustration

Features of miRNAs

• Hundreds of miRNA genes have already been identified in the human genome.

• Most miRNAs start with a U.

• The second 7-mer from the 5' end (nucleotides 2–8) is known as the “seed.”
  • When miRNAs bind to their targets, the seed sequence has perfect or near-perfect alignment to some part of the target sequence.
  • Example: UGAGCUUAGCAG...

Features of miRNAs
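The seed definition can be made concrete with a short Python sketch. It assumes that "second 7-mer" means nucleotides 2–8 counted from the 5' end, and that the target site is written 5'→3', so it is the reverse complement of the seed. The function names `seed` and `seed_match_site` are illustrative only and are not part of any tool named in these slides.

```python
# Minimal sketch: extract an miRNA seed and the target site it would pair with.
# Assumption: the seed is nucleotides 2-8 (0-based slice 1:8) from the 5' end.

RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed(mirna: str) -> str:
    """Return nucleotides 2-8 of the miRNA (the 7-mer seed)."""
    return mirna[1:8]


def seed_match_site(mirna: str) -> str:
    """Return the target sequence, written 5'->3', that pairs perfectly
    with the seed. Strands pair antiparallel, hence the reversal."""
    return seed(mirna).translate(RNA_COMPLEMENT)[::-1]


example = "UGAGCUUAGCAG"         # start of the example sequence on the slide
print(example.startswith("U"))   # True: this miRNA starts with a U
print(seed(example))             # GAGCUUA
print(seed_match_site(example))  # UAAGCUC
```

Searching a 3' UTR for the string returned by `seed_match_site` is the simplest form of the perfect seed match used later in the target-prediction method.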

• Many miRNAs are conserved across species:
  • For half of known human miRNAs, >18% of all occurrences of one of these miRNA seeds are conserved among human, dog, rat, and mouse.

• As a rule, the full sequence of miRNAs is almost never completely complementary to the target sequence.
  • Common to see a loop or bulge after the seed when binding.
  • Loop/bulge is often a hairpin because of stability.

• The site at which miRNAs attack is often in their target's 3' UTR.

miRNA Binding

Bulges

The MRE is known as the “miRNA recognition element.” This is simply the sequence in the target that an miRNA binds to.

A hairpin is more stable than a simple bulge.

Locating miRNA Genes: Experimentally

• Locating miRNA experimentally is difficult.

• Procedure:
  1. Find a gene that causes down-regulation of another gene.
  2. Determine if no protein is encoded.
  3. Analyze the sequence to determine if it is complementary to its target.

Locating miRNA Genes: Comparative Genomics

• Idea: Find the seed binding sites.
  1. Examine well-conserved 3' UTRs among species to find well-conserved 8-mers (A + seed) that might be an miRNA target sequence.
  2. Look for a sequence complementary to this 8-mer to identify a potential miRNA seed. Once found, check the flanking sequence to see if any stable hairpin structures can form; these are potentially pre-miRNAs.
  3. To determine the possibility of secondary RNA structure, use RNAfold.

Locating miRNA Genes: Example
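As a minimal sketch of the comparative-genomics idea, the Python below finds 8-mers shared by every species' 3' UTR and then searches a genome fragment for the complementary sequence. The UTR and genome strings are invented toy data built around the 8-mer AGACTAGG used in this example, and all function names are hypothetical. The complement is taken base by base, as written on the slide; a 5'→3' reading of the pairing strand would additionally reverse it. The hairpin check with RNAfold is not reproduced here.

```python
# Sketch: conserved 8-mers in 3' UTRs, then a search for their complement.
# All sequences below are made-up toy data, not real UTRs or genomes.

DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmers(seq, k=8):
    """Set of all k-mers occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def conserved_kmers(utrs_by_species, k=8):
    """k-mers present in the UTR of every species."""
    return set.intersection(*(kmers(u, k) for u in utrs_by_species.values()))


def complement(seq):
    """Base-by-base complement (not reversed), as in the slide example."""
    return seq.translate(DNA_COMPLEMENT)


def find_all(text, pattern):
    """Start positions of every (possibly overlapping) occurrence."""
    hits, i = [], text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits


utrs = {
    "human": "CCGTAGACTAGGTTAC",
    "mouse": "TTAGACTAGGCAGTCA",
    "dog": "GGAAGACTAGGATCGT",
}
genome_fragment = "GGCATCTGATCCAAGT"

for site in conserved_kmers(utrs):
    candidate = complement(site)
    # Each hit marks a candidate seed; its flanking sequence would next be
    # passed to a folding program such as RNAfold to test for a hairpin.
    print(site, candidate, find_all(genome_fragment, candidate))
```

Real 3' UTR sets would need alignment of orthologous UTRs rather than a plain set intersection, so this only illustrates the flow of the search.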

• Suppose you found a well-conserved 8-mer in 3' UTRs (this could be where an miRNA seed binds in its target). • Example: AGACTAGG

• Look elsewhere in genome for complementary sequence (this could be an miRNA seed). • Example: TCTGATCC

• When TCTGATCC is found, check (with RNAfold) whether the sequences around it could form a hairpin; if so, this could be an miRNA gene.

Finding miRNA targets: Method

• Goal: Find the set of miRNA targets for miRNAs shared across multiple species.
  • That is, identify which genes have 3' UTRs that are attacked by miRNAs.
• Basic Assumptions:
  1. There is perfect binding to the miRNA seed.
  2. Any leftover sequence adopts the optimal RNA secondary structure.
• Basic Method: For each species’ set of 3' UTRs, find sites where there is perfect binding of the miRNA seed and “optimal folding” nearby. Look for agreement among all the species.

Method : Example

Method : Steps

1. Find a perfect match to the miRNA seed.

2. Extend the matching region if possible.

3. Find the optimal folding for the remaining sequences.

4. Calculate the energy of this interaction.

Method : Details

• Input: A set of miRNAs conserved among species and a set of 3' UTR sequences for those species.

• Method: For each organism:
  1. Find all occurrences in the UTR sequences that match the miRNA seed exactly.
  2. Extend this region with perfect or wobble pairings.
  3. With the remaining sequence of the miRNA, use the program RNAfold to find optimal folding with the next 35 bases of the UTR sequence.
  4. Calculate a score for this interaction based on the free energy of the interaction given by RNAfold.

Method : Details
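Steps 1–3 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the published implementation: the seed is taken as miRNA nucleotides 2–8, extension proceeds only from the 3' side of the seed into the UTR upstream of the site, and "the next 35 bases" is read as the 35 UTR bases upstream of the paired region. The folding and energy calculation (steps 3–4) are left to RNAfold and are not reproduced; the function names and the toy UTR are hypothetical.

```python
# Sketch of steps 1-3: exact seed match, extension with Watson-Crick or
# G-U wobble pairs, and selection of the fragments that would be folded.

RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")
PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}  # Watson-Crick plus wobble


def seed_sites(mirna, utr):
    """Step 1: positions in the UTR that pair perfectly with the seed."""
    site = mirna[1:8].translate(RNA_COMPLEMENT)[::-1]
    hits, i = [], utr.find(site)
    while i != -1:
        hits.append(i)
        i = utr.find(site, i + 1)
    return hits


def extend(mirna, utr, p):
    """Step 2: count contiguous extra pairs beyond the seed.
    miRNA base 8+n (0-based) faces UTR base p-1-n (antiparallel)."""
    n = 0
    while (8 + n < len(mirna) and p - 1 - n >= 0
           and mirna[8 + n] + utr[p - 1 - n] in PAIRS):
        n += 1
    return n


def remaining_fragments(mirna, utr, p, window=35):
    """Step 3 input: unpaired rest of the miRNA and up to `window`
    UTR bases upstream of the paired region (an assumed reading)."""
    n = extend(mirna, utr, p)
    end = p - n
    return mirna[8 + n:], utr[max(0, end - window):end]


mirna = "UGAGCUUAGCAG"        # example prefix from the slides
utr = "CCAUAGUUAAGCUCGGA"     # invented toy UTR containing one seed site

for p in seed_sites(mirna, utr):
    print(p, extend(mirna, utr, p), remaining_fragments(mirna, utr, p))
```

The two fragments returned by `remaining_fragments` are what would be handed to RNAfold; the free energy it reports would then serve as the score in step 4.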

• Method Cont.:
  5. Sum up the scores of all interactions for each UTR.
  6. Rank all of the organism's UTRs by this score (the sum of all interactions in that UTR).
  7. Repeat the above steps for each organism.
  8. Create a cutoff score and a cutoff rank for the UTRs.
  9. Select the set of genes where the orthologous genes across all the sampled species have UTRs that score and rank above this cutoff.

Method : Details
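Steps 5–9 amount to summing, ranking, thresholding, and intersecting across species, which the Python sketch below illustrates. It assumes that each site score is already a "higher is better" number (for example, the negated free energy), that orthologous genes share one identifier across species, and that the gene names, scores, and cutoffs shown are invented for illustration.

```python
# Sketch of steps 5-9: per-UTR totals, ranking, cutoffs, and the
# intersection of passing genes across species. Toy numbers only.


def rank_utrs(site_scores):
    """Steps 5-6: map gene -> (total score, rank), rank 1 = best."""
    totals = {gene: sum(scores) for gene, scores in site_scores.items()}
    ordered = sorted(totals, key=totals.get, reverse=True)
    return {gene: (totals[gene], r) for r, gene in enumerate(ordered, start=1)}


def conserved_targets(scores_by_species, min_score, max_rank):
    """Steps 7-9: genes whose UTR passes both cutoffs in every species."""
    passing = []
    for site_scores in scores_by_species.values():
        ranked = rank_utrs(site_scores)
        passing.append({
            gene for gene, (total, rank) in ranked.items()
            if total >= min_score and rank <= max_rank
        })
    return set.intersection(*passing)


# Hypothetical per-site scores for one miRNA, keyed by ortholog identifier.
scores_by_species = {
    "human": {"geneA": [12.0, 8.5], "geneB": [5.0], "geneC": [15.0]},
    "mouse": {"geneA": [11.0, 7.0], "geneB": [16.0], "geneC": [3.0]},
}

print(rank_utrs(scores_by_species["human"]))
print(conserved_targets(scores_by_species, min_score=10.0, max_rank=2))
```

Requiring both a score cutoff and a rank cutoff in every species is what makes the prediction depend on conservation rather than on a single strong site in one genome.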

• Verification:
  • Find the number of predicted binding sites per miRNA.
  • Compare it to the number of binding sites for a randomly generated miRNA.
  • The number for real miRNAs is much higher.

Results

• Predicted a large portion of already known targets and provided direction for identifying undiscovered targets.

• Found that genes are more commonly regulated by multiple small RNAs.

• Found that many small RNAs have multiple targets.

microRNA target prediction

http://diana.imis.athena-innovation.gr/DianaTools/index.php?r=site/index

http://www.targetscan.org/vert_72/

microRNA target prediction

http://mirtar.mbc.nctu.edu.tw/human/index.php

• miRTar is an actively updated web-based program
• Flexible, easy-to-use interface
• Experimentally

Databases

http://www.lncrnadb.org/
https://en.wikipedia.org/wiki/List_of_long_non-coding_RNA_databases
http://www.noncode.org/
https://omictools.com/non-coding-rna-analysis-category

http://www.mirtoolsgallery.org/miRToolsGallery/node/1055

Browsers:
http://genome.ucsc.edu/
http://epigenomegateway.wustl.edu/

Web-based analysis:
http://mirtar.mbc.nctu.edu.tw/human/index.php
http://lilab.research.bcm.edu/cpat/

Thank you!