IDENTIFICATION OF A PHOSPHO-HNRNP E1 NUCLEIC ACID CONSENSUS

SEQUENCE MEDIATING EPITHELIAL TO MESENCHYMAL TRANSITION (138 PP.)

Dissertation Advisor: Philip H. Howe

Protein translational regulation by RNA binding proteins (RBPs) is a critical process in maintaining homeostasis. Epithelial to mesenchymal transition (EMT) is a process in which epithelial cells de-differentiate and become mesenchymal, increasing the propensity toward tumorigenesis and/or metastasis. We have identified a heterogeneous nuclear ribonucleoprotein E1 (hnRNP E1)-mediated post-transcriptional operon that controls transcript-selective translational regulation of EMT-associated transcripts. In this regulatory mechanism, hnRNP E1 binds to the 3’-UTR of select transcripts and silences their translation. TGFβ reverses translational silencing through Akt2-dependent phosphorylation of hnRNP E1 at Ser-43, resulting in loss of hnRNP E1 binding to RNA. We have identified approximately forty pro-EMT / metastatic mRNAs that are regulated by this hnRNP E1 operon, and our preliminary studies have revealed a short stretch of nucleic acids, which we have termed the TGF-beta activated translational (BAT) element, present in their respective 3’-UTRs that may be responsible for hnRNP E1 binding. Herein, through the use of in-vitro and in-vivo assays, we demonstrate the contribution of BAT element mutations and constitutively high levels of pSer43 hnRNP E1 to cancer tumorigenesis and metastasis.

IDENTIFICATION OF A PHOSPHO-HNRNP E1 NUCLEIC ACID CONSENSUS

SEQUENCE MEDIATING EPITHELIAL TO MESENCHYMAL TRANSITION

A dissertation submitted

to Kent State University in partial

fulfillment of the requirements for the

degree of Doctor of Philosophy

by

Andrew S. Brown

August 2015

© Copyright

All rights reserved

Except for previously published materials

Dissertation written by

Andrew S. Brown

B.S., Youngstown State University, 2008

M.S., Youngstown State University, 2010

Ph.D., Kent State University, 2015

Approved by

Philip H. Howe, Ph.D., Chairman, Department of Biochemistry, Doctoral Advisor

Derek S. Damron, Ph.D., Professor, Department of Cellular and Molecular Biology

Srinivasan Vijayaraghavan, Ph.D., Professor, Department of Cellular and Molecular Biology

Olena Piontkivska, Ph.D., Associate Professor, Department of Biology

Bidyut K. Mohanty, Ph.D., Assistant Professor, Department of Biochemistry

Accepted by

Ernest J. Freeman, Ph.D., Director, School of Biomedical Science

James L. Blank, Ph.D., Dean, College of Arts and Sciences

TABLE OF CONTENTS

TABLE OF CONTENTS ...... iv

LIST OF FIGURES ...... vi

LIST OF TABLES ...... viii

ACKNOWLEDGEMENTS ...... x

CHAPTER I ...... 1

INTRODUCTION ...... 1

CHAPTER II ...... 5

Computational Identification of Post Translational Modification Regulated RNA Binding Motifs ...... 5

ABSTRACT ...... 6

INTRODUCTION ...... 7

EXPERIMENTAL PROCEDURES ...... 10

RESULTS ...... 15

DISCUSSION ...... 33

ACKNOWLEDGEMENTS ...... 36

FUNDING ...... 37

REFERENCES ...... 38

CHAPTER III ...... 42

Identification and Characterization of an hnRNP E1 Translational Silencing Motif ...... 42

ABSTRACT ...... 43

INTRODUCTION ...... 44

EXPERIMENTAL PROCEDURES ...... 48

RESULTS ...... 53

DISCUSSION ...... 77

ACKNOWLEDGMENTS ...... 80

FUNDING ...... 81

REFERENCES ...... 82

CHAPTER IV ...... 86

Development of a Phosphorylated Serine 43 (hnRNP E1) Specific Antibody and Implications of Expression in Metastatic Cancer ...... 86

ABSTRACT ...... 87

INTRODUCTION ...... 88

EXPERIMENTAL PROCEDURES ...... 93

RESULTS ...... 96

DISCUSSION ...... 110

ACKNOWLEDGEMENTS ...... 112

FUNDING ...... 113

REFERENCES ...... 114

CHAPTER V ...... 118

GENERAL SUMMARY ...... 118

REFERENCES ...... 125

LIST OF FIGURES

Figure 1. Systems overview of RPTS Application Capabilities ...... 20

Figure 2. 3’-UTRs become divergent across evolution while conserving regulatory motifs. ...... 24

Figure 3. RPTS predicted motif-containing genes interact with hnRNP E1...... 27

Figure 4. Predicted motif-containing genes global set enrichment (GSEA) analysis...... 30

Supplementary Figure 1. Additional RBP motif training sets...... 32

Figure 5. Translational regulation of EMT-inducing genes is upregulated by TGFβ and knockout of hnRNP E1...... 55

Figure 6. EMT-inducing genes bind hnRNP E1 through their 3’UTR and are protected from exonuclease degradation...... 59

Figure 7. hnRNP E1 binding nucleic acid motifs show a total loss of affinity for hnRNP E1 when in its phosphorylated state...... 62

Figure 8. Unbiased exonuclease digestion of hnRNP E1 bound target mRNA allows for high-throughput analysis of protected RNA fragment sequences...... 65

Figure 9. Genomic exonuclease ION torrent analysis reveals 37 consensus motifs with conserved pyrimidine residues interspaced by variable structural regions...... 68

Figure 10. Conservation of descriptor motif features in randomized synthetic RNA maintains regulatory motif properties in protein translation...... 72

Figure 11. Non-canonical TGFβ signaling cascades are significantly upregulated in metastatic cancer lines...... 76

Figure 12. In-vitro kinase of KH1 and p-KH1 enrichment...... 100

Figure 13. p-hnRNP E1 specific antibody shows specificity to hnRNP E1Ser43...... 103

Figure 14. Akt isoform expression and p-hnRNP E1 levels in metastasis...... 106

Figure 15. Clinical implication of TGFβ induced p-hnRNP E1 activity...... 109

LIST OF TABLES

TABLE 1 ...... 18

TABLE 2 ...... 91

This thesis is dedicated to my parents.

For their endless love, support, and encouragement.

ACKNOWLEDGEMENTS

I would like to express my sincere gratitude for the inspirational mentorship, guidance, and support of Dr. Philip H. Howe. His unique and patient mentorship skills have provided me a deep respect and appreciation for molecular biology.

I would also like to acknowledge the support and assistance given to me by the members of my dissertation committee, mentors at the Medical University of South Carolina, and members of my laboratory for their support and critical insight into the development of my research.

Most importantly, I would like to express my deepest gratitude for the love and support of my future-wife. Her continuous encouragement, patience, and inspiration have played a major role in the completion of this process. Thank You, Korey.

CHAPTER I

INTRODUCTION

Cancer metastasis and EMT

Epigenetic changes and environmental stress or stimuli are implicated in the progression of normal tissue toward the hallmarks of cancer, such as evasion of death signals, uncontrolled growth, and cellular de-differentiation. Many models of cancer progression implicate a single cell or sub-population of cells undergoing a genetic altering event; this event causes the affected cell to develop an invasive phenotype. According to Weinberg et al., cancer metastasis can be divided into two distinct phases: 1) physical translocation of the cancer cell from the primary tumor to the microenvironment of a distant tissue, and 2) colonization of the distant tissue (Burk et al., 2008; Chaffer & Weinberg, 2011; Karnoub et al., 2007). The epithelial-to-mesenchymal transition (EMT) is thought to be a reversible process by which a cell that is epithelial in nature undergoes a change that causes it to generate progeny with a mesenchymal phenotype. EMT is thought to be driven by transcription factors (EMT-TFs) that create a genetic profile inducing mesenchymal traits, and the silencing or deactivation of EMT-TFs allows the cell to revert to its epithelial phenotype through mesenchymal-to-epithelial transition (MET) (Chaffer & Weinberg, 2011; Gupta & Massagué, 2006). Multiple studies have demonstrated the ability of non-cancerous epithelial cells to undergo changes, through induction of EMT-TFs, that induce a cancer-stem-cell (CSC) state and cause these cells to become mesenchymal and invasive. The properties conferred on the cell by EMT-TFs are highly conducive to phase 1 of Weinberg’s metastasis model; this in turn causes an increased rate of extravasation and an increase in the colonization of the target microenvironment by primary tumor cells (Mani et al., 2008; Morel et al., 2008).

TGFβ mediated EMT

TGFβ and its related factors modulate a vast array of cellular functions and primarily cause cells either to undergo apoptosis or to become highly proliferative (Massagué, 2000, 2012; Massague et al., 1991). These dual and opposing roles of TGFβ are controlled by downstream regulatory pathway activation. Traditional canonical TGFβ signaling occurs through the SMAD proteins and causes a phosphorylation cascade ultimately leading to growth arrest and apoptosis. Multiple non-canonical pathways have also been identified; among these, the phosphatidylinositol-4,5-bisphosphate 3-kinase (PI3K) / protein kinase B (Akt) pathway has been identified as a potent mediator of EMT, tumorigenesis, metastasis, and invasiveness (Chin, Balk, & Toker, 2013; Hussey et al., 2012b; Massague et al., 1991; Xu, Lamouille, & Derynck, 2009; Zavadil & Böttinger, 2005). Akt is a serine / threonine kinase that has three currently known isoforms: Akt1, Akt2, and Akt3. The ratio of expression of the different Akt isoforms has been implicated in a variety of pathological conditions; the general expression pattern indicative of an invasive phenotype is a high ratio of Akt2 : Akt1, and there are currently no convincing models of cancer progression that are affected by Akt3 (Cheng et al., 1996; Chin et al., 2013; Liu et al., 1998). Overactivation of Akt2 via non-canonical TGFβ stimulation has been implicated in the post-translational modification of hnRNP E1 by phosphorylation of serine 43, altering its RNA binding and translational silencing capabilities (Chaudhury, Chander, & Howe, 2010; Hussey et al., 2011; Hussey et al., 2012b). hnRNP E1 is a DNA / RNA binding protein that is ubiquitously expressed by epithelial cells. Major roles of this protein vary, and its implication in the progression of cancer has been associated with the Akt2-induced serine 43 phosphorylation (p-hnRNP E1). p-hnRNP E1 modifications cause increased protein expression in a subset of genes that are shown to translocate from monosomal to polysomal fractions in high-throughput expression analysis of polyribosomal sedimentation assays (Hussey et al., 2012b). The loss of translational repression by hnRNP E1 causes increased protein levels of genes that are highly implicated in EMT and metastatic progression; this process is driven by TGFβ and requires sufficient Akt2 levels and activation.

Nucleic acid sequence conservation in 3’ UTRs

RNA binding proteins (RBPs) have been shown to bind nucleic acid motifs that have been conserved in the orthologous genes of evolutionarily divergent species. These conserved RBP binding motifs are typically located in either the 3’ or 5’ untranslated region (UTR) of mature mRNA. Conserved motifs generally bind a single RBP, and binding can be inhibited by post-translational modification (Auyeung, Ulitsky, McGeary, & Bartel, 2013; Churbanov, Rogozin, Babenko, Ali, & Koonin, 2005; Chursov, Frishman, & Shneider, 2013). The induction of EMT by non-canonical TGFβ signaling follows this pattern of RBP-mediated translational repression of target genes that contain a conserved nucleic acid motif in their 3’UTRs. This efficient mechanism of regulating developmentally necessary genes and inhibiting their expression at mature stages provides a unique ability to respond to stress, but it is also a potential point of failure and misregulation (Hafner et al., 2010b; Hussain, Zawawi, & Bayfield, 2013; Hussey et al., 2011). The hnRNP E1 / conserved nucleic acid regulatory motif system presents the opportunity for cells to inadvertently undergo EMT through pathological exposure to TGFβ at later stages in the cell’s life.

CHAPTER II

Computational Identification of Post Translational Modification Regulated RNA Binding Protein Motifs

ABSTRACT

RNA and its associated RNA binding proteins (RBPs) mediate a diverse array of cellular functions and phenotypes. The interactions between RNA and RBPs are implicated in many cellular processes, such as localization, protein translation, and RNA stability. Recently discovered mechanisms of significant evolutionary advantage include the interaction of an RBP with the 3’ and 5’ untranslated regions (UTRs) of target mRNA. These mechanisms function through the interaction of a trans-factor (RBP) with a cis-regulatory element (3’ or 5’ UTR): the RBP binds a regulatory-consensus nucleic acid motif that is conserved throughout evolution. Through signal transduction, regulatory RBPs are able to temporarily dissociate from their target sites on mRNAs and induce translation, typically through a post-translational modification (PTM). These small regulatory motifs located in the UTRs of mRNAs are subject to loss of function due to single polymorphisms or other mutations that disrupt the motif and inhibit its ability to associate into a complex with RBPs. The identification of a consensus motif for a given RBP is difficult and time consuming, and requires a significant degree of experimentation to identify each motif-containing gene on a genomic scale. We have developed a computational algorithm, RBP-PTM Target Scan (RPTS), to analyze high-throughput genomic arrays that contain differential binding induced by a PTM of an RBP of interest. We demonstrate the ability of this application to accurately predict a PTM-specific binding motif for an RBP that has no antibody capable of distinguishing the PTM of interest, obviating the need for in-vitro exonuclease digestion techniques.

INTRODUCTION

Post-translational modification (PTM) of RBPs through signal transduction can have tremendous implications for affinity to conserved RNA sequences in the 3’ and 5’ UTRs (Kuersten & Goodwin, 2003). Signal transduction cascades cause various PTMs of RBPs that have the potential to affect stability, localization, conformation, and affinity for RNA (Aletta, Cimato, & Ettinger, 1998; Dreyfuss, Kim, & Kataoka, 2002). This study aims to predict the nucleic acid regulatory motif for which a specific RBP’s affinity is modulated through PTM.

Cellular regulation through conserved regulatory motifs in the UTRs of mRNA is an essential process that maintains homeostasis. RBPs interact with motifs in mRNA and control its stability, translation, and localization, all of which play a vital role in the progression of disease (Ray et al., 2013). Advances in high-throughput, genome-wide sequencing have allowed for genomic RBP analysis by cost-effective in-vitro binding assays coupled to downstream sequencing. Computational tools are needed to aid the analysis of data generated in these studies and to give accurate predictions of regulatory nucleic acid target sites that bind RBPs in a PTM-dependent manner.

The conservation of regulatory motifs in orthologous genes across evolution is a guiding principle in the analysis of high-throughput sequencing data generated from RBP binding studies (Chen & Rajewsky, 2006; Siomi, Matunis, Michael, & Dreyfuss, 1993; Wuchty, Oltvai, & Barabasi, 2003). 3’ and 5’-UTRs contain motifs that interact with RBPs; these motifs are conserved, while the overall UTR has become divergent in both sequence and length. Divergent species have shown similar regulatory activity despite their highly variable UTRs (Churbanov, Rogozin, Babenko, Ali, & Koonin, 2005). To illustrate this finding, a complex of RBPs and a 3’-UTR motif was shown to regulate the expression of genes responsible for inflammation that is induced by stimulation with interferon (IFN)-γ. IFN-γ was shown to silence the expression of these inflammatory mediators upon PTM of RBPs involved in forming a complex with a conserved 3’-UTR element. The modification of these RBPs facilitated their dissociation from their target nucleic acid motif and abolished the inhibitory effect of complex formation. In-vitro studies were performed using a tedious method of sequentially shortening the 3’-UTR and determining binding affinity for the RBP. When the minimal sequence was found, it was compared for evolutionary conservation and found to be highly conserved in the 3’-UTRs of orthologous genes whose 3’ UTR lengths varied widely (Mukhopadhyay, Jia, Arif, Ray, & Fox, 2009).

Databases such as those hosted on PubMed (ncbi.nlm.nih.gov) allow for the fast and efficient retrieval of nucleic acid and protein sequences, as well as experimental results that contain data from high-throughput array studies (Maglott, Ostell, Pruitt, & Tatusova, 2011). The GEO database contains many studies in which the genomic affinity of 3’ and 5’-UTRs is characterized as a function of modifying an RBP. Many of the conclusions drawn from these studies simply include the signature of gene induction that was derived from analyzing the transcriptome expression levels for each sample. The underlying mechanism of these genetic alterations lies in the interaction between an RBP and a consensus nucleic acid motif that is contained in the genes directly interacting with the RBP. We have developed an algorithm capable of analyzing array data and, through evolutionary analysis, determining a minimal regulatory-consensus motif descriptor that has affinity for an individual RBP as a function of PTM.

Many existing methods rely on a lengthy process involving exonuclease digestion of RBP-bound RNA, coupled to primer ligation and downstream high-throughput sequencing. The intention of this algorithm is to detect a regulatory motif that is sensitive to PTM of a specific RBP; this could include the loss or gain of binding induced by the PTM. In a situation where there are no commercially available antibodies capable of distinguishing PTM-modified RBPs from native RBPs, this application serves as an excellent tool to predict motifs that can later be validated by multiple in-vitro binding assays.

The algorithm presented in this work has been tested on a publicly available 3’ UTR Affymetrix array (GEO, GSE40466) that was used to determine the genome-wide loss of binding of the RBP hnRNP E1 as a function of TGFβ-induced post-translational modification. hnRNP E1 is a protein that is implicated in the translational silencing of genes through binding a cis-regulatory element in the 3’ UTR of mature mRNA. The complex formed by hnRNP E1 and the 3’ UTR regulatory element can be disrupted by the phosphorylation of hnRNP E1 at Ser43 (p-hnRNP E1) induced by TGFβ (Chaudhury et al., 2010; Hussey et al., 2011). This analysis identifies 36 genes that have a significant loss of binding to p-hnRNP E1 (Hussey et al., 2012). We demonstrate the ability of our algorithm to predict a consensus motif and support this prediction with in-vitro analysis of binding kinetics.

EXPERIMENTAL PROCEDURES

RPTS Package

Here we introduce RNA Binding protein Post-translational modification Regulatory Nucleic Acid Target Predictor (RPTS), a computational analysis tool implemented in Python, capable of identifying putative nucleic acid consensus motifs that interact with a specific RBP in a PTM-dependent manner. We have made the source code available in a repository located at https://github.com/asbrown001/RPTS. Installation and configuration instructions are located in the root directory. This tool is a high-throughput wrapper that builds on the Clustal-Ω alignment engine and the Bioconductor statistical computing package (implemented in R). We handle pre-processing of raw high-throughput data with optional, user-selectable libraries. Table 2 provides a detailed description of the libraries that can be selected to perform pre-processing operations on raw data. To minimize configuration issues, certain optional libraries cannot be run locally on the client machine; for these, we have made available web services that use a preconfigured server to handle pre-processing and return a fully processed JavaScript Object Notation (JSON) object. We detect differential expression in processed gene expression data using the limma package for Bioconductor. The application executes Bioconductor / R scripts through the RPy2 library (http://rpy.sourceforge.net/) and uses the built-in process module to execute the C/C++ libraries used in alignment.

Computational Algorithm RPTS

The analysis used in this publication uses a publicly available dataset hosted in GEO, GSE40466, which contains data from 36 unique Affymetrix 3’ UTR chips. The study can be downloaded as a tape archive (.tar) file that contains 36 g-zipped (.gz) files, each corresponding to an individual chip. RPTS allows the .tar file to be imported as input and handles all subsequent compression, decompression, and computer memory management.
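As an illustration only, the in-memory handling of a .tar archive of g-zipped chip files described above might be sketched as follows. The function names are hypothetical and are not taken from the RPTS source; the sketch uses only the Python standard library and assumes the archive is small enough to hold in memory.

```python
import gzip
import io
import tarfile


def read_gzipped_members(tar_bytes):
    """Return {member name: decompressed bytes} for every .gz file in a tar archive."""
    chips = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as archive:
        for member in archive.getmembers():
            if not member.isfile() or not member.name.endswith(".gz"):
                continue
            # extractfile returns a file-like object; nothing is written to disk.
            compressed = archive.extractfile(member).read()
            chips[member.name] = gzip.decompress(compressed)
    return chips


def build_demo_archive(files):
    """Hypothetical helper: build a small in-memory .tar of gzipped payloads."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in files.items():
            data = gzip.compress(payload)
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()
```

For a real GEO download, the bytes of the .tar file would be passed to read_gzipped_members in place of the demo archive.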

RPTS handles normalization of raw data and allows a user to specify experimental groups with separate samples. Statistical analysis for differentially expressed genes is carried out by the limma Bioconductor package. Phenotype data that describes the experimental design can be input as a file or user-defined by entering data at prompts. Robust multichip average (RMA) normalization is applied to the processed data, fit to experimental design, and an empirical Bayes adjustment is performed to determine differentially expressed genes. Genes that had significant differential expression between different sample groups (PTM treatments) are listed to the user along with the degree of enrichment. Differentially expressed genes are next subjected to evolutionary analysis for conserved motif regions.

Identification of Evolutionary Conserved Motifs

The list of genes found to be significantly altered by experimental treatment is further scrutinized for evolutionary conservation. RPTS contains 3’ and 5’-UTR databases from UTRdb (Pesole et al., 2000; Pesole et al., 2002) for a variety of evolutionary divergent species. Key factors for choosing species, along with various programmatic options are described above.

RPTS contains a post-alignment algorithm that is used to compare evolutionarily conserved motifs. A recursive process to choose the highest conserved region of an alignment is utilized and the consensus from this region is added to a pool of motifs from genes that were predicted to contain the regulatory-motif.

Creating, Modifying, or Updating Query Database

This algorithm requires a database that contains nucleic acid sequences in either FASTA or UTRdb flat database format. We suggest downloading database files from UTRdb (http://utrdb.ba.itb.cnr.it/) and adding them to the “Database” directory contained in the application root.

The application walks the entire directory and scans for files with a database extension or compressed files. We suggest that the user organize downloaded database files into taxonomy-based subdirectories to allow for simple updating. The evolutionary analysis step reads each database file into memory as a custom object that supports rapid querying. To update any of these files, simply modify the contents of the “Database” directory; any modification to files in this directory will take effect on the next instantiation of the application, and no further input from the user is necessary.
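A minimal sketch of the directory scan described above is given here for illustration. The list of file extensions is an assumption (the text specifies only "database extension or compressed files"), and find_database_files is a hypothetical name rather than a function from the RPTS source.

```python
import os

# Assumed extensions; the text does not enumerate them.
DATABASE_EXTENSIONS = (".fasta", ".fa", ".dat", ".gz", ".zip")


def find_database_files(root):
    """Walk `root` recursively and return sorted paths of candidate database files."""
    matches = []
    for directory, _subdirs, filenames in os.walk(root):
        for filename in filenames:
            # str.endswith accepts a tuple, so one call tests every extension.
            if filename.lower().endswith(DATABASE_EXTENSIONS):
                matches.append(os.path.join(directory, filename))
    return sorted(matches)
```

Because the scan is repeated each time the application starts, files added to or removed from a taxonomy subdirectory are picked up without any further configuration.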

Evolutionary Divergent Orthologous RBP PTM Validation

The selection of organisms used in evolutionary alignments is performed prior to evolutionary analysis. Organisms are first validated to contain an orthologous RBP by performing a BLASTp query of the RBP of interest (RBPi). The user provides the accession number of their RBPi, which is used in a web services call to the BLASTp URL endpoint. The application interprets the results of the BLAST analysis; by default, the threshold to consider the organism as having an orthologous RBP is > 60% sequence identity, > 80% alignment length, and an E-value of 1e-10 or lower. To assess the likelihood of this RBP undergoing a similar PTM, the user provides a numerical position for the amino acid residue, or range of residues, at which the PTM occurs on their RBPi. A Clustal alignment is performed on the orthologous RBP and the RBPi, and the residue range specified by the user is compared to determine whether the ortholog has identical amino acids, or amino acids with similar properties, at those positions. If this test is passed, the organism is subjected to cytokine response analysis.
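The two checks in this section can be sketched as follows, under stated assumptions. The sketch treats the E-value cutoff as an upper bound (lower values being more significant), and the amino acid property groups are an illustrative choice, since the text does not define "similar properties". All function and variable names are hypothetical.

```python
# Assumed property groups for "amino acids that have similar properties".
SIMILAR_GROUPS = [set("ST"), set("DE"), set("KRH"), set("NQ"), set("FWY"), set("AVLIM")]


def passes_ortholog_thresholds(identity_pct, coverage_pct, evalue,
                               min_identity=60.0, min_coverage=80.0, max_evalue=1e-10):
    """Apply the default cutoffs to the summary values of one BLASTp hit."""
    return (identity_pct > min_identity
            and coverage_pct > min_coverage
            and evalue <= max_evalue)


def residues_compatible(a, b):
    """True if two aligned residues are identical or share a property group."""
    if a == "-" or b == "-":
        return False
    return a == b or any(a in group and b in group for group in SIMILAR_GROUPS)


def ptm_site_conserved(aligned_query, aligned_ortholog, start, end):
    """Check residues start..end (1-based, ungapped query numbering) in a pairwise alignment."""
    position = 0
    checked = 0
    for q, o in zip(aligned_query, aligned_ortholog):
        if q == "-":
            continue  # a gap in the query does not advance query numbering
        position += 1
        if start <= position <= end:
            if not residues_compatible(q, o):
                return False
            checked += 1
    # Every requested position must have been present in the alignment.
    return checked == end - start + 1
```

For example, a serine at the PTM position of the RBPi aligned to a threonine in the ortholog would pass under these assumed groups, while a serine aligned to an alanine would not.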

Evolutionary Divergent Organism Cytokine Validation

Many PTMs are induced by cytokine stimulation. To account for this in evolutionary analysis, we have made an optional check available to the user. If a user believes that the only way to cause a PTM on their RBPi is through cytokine stimulation, they can set this constraint to choose only organisms that contain an orthologous cytokine, with user-adjustable constraints for similarity. To perform this check, a call to BLASTp is made using the RBPi accession number input by the user. By default, > 60% sequence identity, > 80% alignment length, and an E-value of 1e-10 or lower are used as the cutoff to consider the organism as having a similar cytokine. If no orthologs are found, the organism is left out of evolutionary analysis.

Selection of Putative RNA Motifs from UTRs

To identify putative consensus motifs based on the principle of evolutionary conservation, we create objects in memory for each putative motif-containing gene. Each of these objects contains the UTR sequence of the gene represented on the high-throughput chip and a variable number of UTR sequences from orthologous genes of evolutionarily divergent organisms that pass the exclusion criteria. This object is parsed into a memory stream and aligned using the Clustal-Ω alignment engine. Alignments are performed locally on the client machine to improve performance and to avoid an excessive number of web requests. Upon completion of the alignment, the output is extracted through the Python process module and analyzed for conservation. We have employed a recursive method that scans the overall alignment and identifies regions of similarity based on the number of gaps. A user may define the sensitivity of region determination by adjusting the number of consecutive gaps and matches that define a region; default values are > 6 consecutive gaps and > 10 similar residues. Homologous regions are parsed into objects that contain the portions of the alignment within the regional boundary and are given a conservation score based on the number of identical and similar residues found in the sub-alignment. The region with the highest conservation score is then selected as the putative motif for that gene and added to a collection of confirmed genes.
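The region-finding step can be illustrated with the sketch below. It is not the recursive method used by RPTS: it is a simple single-pass scan written under our own assumptions, namely that a "gap column" is any alignment column containing a gap character, that a region ends when more than max_gap_run consecutive gap columns are seen, and that the score is the count of columns in which all sequences carry the same residue. The names are hypothetical.

```python
def column_is_gap(column):
    """A column counts as a gap column if any sequence has '-' there (assumption)."""
    return "-" in column


def column_is_identical(column):
    """True when every sequence carries the same non-gap residue in this column."""
    return "-" not in column and len(set(column)) == 1


def conserved_regions(alignment, max_gap_run=6, min_matches=10):
    """Return (start, end, score) tuples; end is exclusive, score counts identical columns."""
    columns = list(zip(*alignment))
    regions = []
    start = None      # first column of the region being built
    last_kept = None  # last non-gap column seen in the current region
    gap_run = 0
    for index, column in enumerate(columns + [None]):  # None is a sentinel that closes the final region
        is_gap = column is None or column_is_gap(column)
        if not is_gap:
            if start is None:
                start = index
            last_kept = index
            gap_run = 0
            continue
        gap_run += 1
        if start is not None and (column is None or gap_run > max_gap_run):
            end = last_kept + 1
            score = sum(column_is_identical(c) for c in columns[start:end])
            if score > min_matches:
                regions.append((start, end, score))
            start = None
    return regions


def best_region(alignment, **kwargs):
    """Return the highest-scoring region, or None if no region qualifies."""
    regions = conserved_regions(alignment, **kwargs)
    return max(regions, key=lambda region: region[2]) if regions else None
```

Lowering max_gap_run or raising min_matches makes region detection stricter, which mirrors the user-adjustable sensitivity described above.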

Western Blot Analysis and Immunoprecipitation

To measure levels of p-hnRNP E1, we employed a combinational approach of immunoprecipitation and subsequent western blot analysis. No antibody yet exists to specifically detect p-hnRNP E1; however, there are antibodies capable of recognizing the phosphorylated form of the consensus RXRXXpS, which is contained at the Ser43 site of hnRNP E1. For immunoprecipitation, protein-A coupled sepharose was incubated with α-hnRNP E1 at 4°C for a minimum of 1 hour. The protein-A / antibody mixture was washed 3x with wash buffer (PBS, 0.05% Tween-20), added to 500 µg total cellular lysate from NMuMG cells, and incubated at 4°C for a minimum of 2 hours. Beads were washed 3x with wash buffer and loaded onto a polyacrylamide gel for electrophoresis, then transferred to a membrane for western blot analysis.

RNA Immunoprecipitation Assay and Polymerase Chain Reaction

Antibody and protein-A coupled sepharose were prepared as described in the western blot and immunoprecipitation methods above. For RNA immunoprecipitation, the antibody and bead mixture was incubated with cellular lysate for a minimum of 20 minutes at 4°C. Beads were washed 3x with an RNA wash buffer (200 mM NaCl, 0.05% Tween-20), and RNA was extracted using phenol : chloroform extraction. RNA was reverse transcribed into cDNA and amplified using polymerase chain reaction; samples were electrophoresed on a 2% agarose gel.

RESULTS

Determination of RNA Consensus Binding Sequences for RNA Binding Proteins

Our example analysis of GSE40466 (GEO database) shows a number of genes that have a signature hnRNP E1 interaction that is inhibited by TGFβ-induced PTM. TGFβ induces p-hnRNP E1 and is shown to cause its dissociation from its target nucleic acid sequence. This novel regulatory system, reported previously by our lab, was shown to promote stalling at the elongation stage of translation through the interaction of hnRNP E1, a target nucleic acid sequence in the 3’-UTR, and various other protein factors (Chaudhury et al., 2010; Hussey et al., 2011). hnRNP E1 was shown to be the most crucial member of this stall complex by polyribosomal sedimentation assays. We observed that samples untreated with TGFβ had enriched 80S (monosomal) fractions for genes that were known to contain the nucleic acid regulatory sequence. In samples that had hnRNP E1 silenced, and in samples that were treated with TGFβ, these same genes shifted into the polyribosomal fractions, indicating the necessity for intact hnRNP E1. GSE40466 contains 2 distinct experiments used to characterize this signature between motif-containing 3’-UTRs and hnRNP E1. The first group compares differential binding to hnRNP E1 as a function of treatment with TGFβ. The second group of experiments compares enrichment of polyribosomal fractions as a function of both TGFβ treatment and the loss of hnRNP E1. RPTS allows for these experimental groups to be defined.

Experimental group 1 was set to scan for genes that had a strong affinity for hnRNP E1 and a subsequent loss with TGFβ treatment. Experimental group 2 was set to look for genes that were highly enriched in the 80S and lighter (monosomal) fractions in hnRNP E1-competent cells, and a subsequent shift of enrichment into the polyribosomal fractions as a function of either hnRNP E1 silencing or TGFβ treatment.

RPTS Algorithm

We have included a diagrammatic depiction of the generalized flow of user interaction with RPTS to illustrate the major points of functionality of the program (Fig. 1A). We have also included a detailed visual representation of the RPTS algorithm (Fig. 1B). To begin analyzing an experimental set of GEO data, a user may input an archive directly from GEO (http://www.ncbi.nlm.nih.gov/gds) or a compressed version of their proprietary sequencing results. RPTS analyzes these data and gives a list of putative genes that contain the regulatory motif. The motif coordinates are given and mapped with respect to their position in the mature mRNA. Each gene that contains a motif is subjected to evolutionary conservation analysis to determine the likelihood of its being an RBP regulatory motif and, if validated, is placed into a pool of validated genes. At the end of evolutionary analysis, all validated genes are subjected to a further alignment, and a descriptor sequence is generated by analyzing the consensus of the alignment and applying a nucleic acid base generalization technique outlined in supplemental results.
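The base generalization technique itself is outlined in the supplemental results and is not reproduced here. Purely as an illustration of how a consensus alignment can be collapsed into a descriptor, the sketch below uses the standard IUPAC degenerate nucleotide codes; this is our assumption and may differ from the technique actually used by RPTS. The min_fraction parameter and the function name are hypothetical.

```python
# Standard IUPAC degenerate codes, written for RNA (U in place of T).
IUPAC_CODES = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("U"): "U",
    frozenset("AG"): "R", frozenset("CU"): "Y", frozenset("CG"): "S", frozenset("AU"): "W",
    frozenset("GU"): "K", frozenset("AC"): "M", frozenset("CGU"): "B", frozenset("AGU"): "D",
    frozenset("ACU"): "H", frozenset("ACG"): "V", frozenset("ACGU"): "N",
}


def descriptor(aligned_motifs, min_fraction=0.25):
    """Collapse equal-length aligned motifs into one IUPAC-coded descriptor string."""
    result = []
    for column in zip(*aligned_motifs):
        bases = [base for base in column if base in "ACGU"]
        # Keep only bases seen in at least `min_fraction` of the non-gap positions.
        kept = {base for base in set(bases) if bases.count(base) / len(bases) >= min_fraction}
        # An all-gap column, or one where no base reaches the cutoff, is left unconstrained.
        result.append(IUPAC_CODES[frozenset(kept)] if kept else "N")
    return "".join(result)
```

Under this scheme a column holding only C and U is written as Y (pyrimidine), which is one way a descriptor can retain conserved pyrimidine positions while allowing variable positions elsewhere.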

Once a user has loaded their datasets into RPTS, the data are immediately decompressed and stored in memory. RPTS does not currently have a minimum required amount of physical random access memory (RAM); however, 2 gigabytes (GB) is recommended for efficiency. Once decompression is complete, background subtraction is performed using Bioconductor in the R statistical computing language. Raw gene intensity values are stored in memory, and the user then defines experimental groups that consist of individual samples (replicates) and sample groups (treatments). RPTS then determines which genes are statistically significant by normalizing replicate data and comparing gene expression between the sample groups contained in the experiment. To accomplish the statistical analysis, Python scripts were written to perform a one-way analysis of variance (ANOVA) across each gene contained in the array. Genes with a p-value of less than 0.05 were further scrutinized by Tukey’s HSD to reveal which sample groups contained significant differential expression of the individual gene. Patterns of gene expression are determined by RPTS and presented to the user. The user may then select the expression pattern that correlates with their expected results and proceed with evolutionary conservation analysis.
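The per-gene ANOVA screen can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the scripts used in RPTS: it uses only the standard library, so it computes the F statistic and compares it with a caller-supplied critical value instead of computing a p-value. The names `one_way_f` and `screen_genes` and the example intensities are hypothetical. In practice, a p-value and the Tukey HSD follow-up could be obtained from a statistics library such as SciPy (`scipy.stats.f_oneway`, `scipy.stats.tukey_hsd`).

```python
import math
from statistics import fmean


def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of sample groups, each a list of replicate intensities.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    means = [fmean(g) for g in groups]
    grand = fmean(x for g in groups for x in g)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    # Variation of replicates around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    if ss_within == 0:
        return (math.inf if ss_between > 0 else 0.0), df_between, df_within
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


def screen_genes(intensities, f_critical):
    """Return genes whose F statistic exceeds `f_critical`.

    `intensities` maps a gene name to its list of sample groups. The critical
    value must match the degrees of freedom of the design (about 5.14 for
    alpha = 0.05 with 2 and 6 degrees of freedom).
    """
    return [gene for gene, groups in intensities.items()
            if one_way_f(groups)[0] > f_critical]


# Made-up intensities: three sample groups with three replicates each.
example = {
    "geneA": [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]],
    "geneB": [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]],
}
print(screen_genes(example, f_critical=5.14))
```

Genes that pass this screen would then be compared pairwise between sample groups, which is the role Tukey’s HSD plays in the procedure described above.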

Evolutionary conservation analysis is performed by selecting the 3’-UTRs of orthologous genes from evolutionarily divergent organisms. RPTS contains compressed databases of 3’ and 5’ UTR sequences for a variety of organisms. The conservation analysis algorithm ensures that each organism selected for comparison meets the following criteria: 1) the organism must contain an orthologous RBP that is highly similar to the RBP of interest; 2) (optional) the RBP must have the ability to undergo a similar PTM; and 3) (optional) the organism must be capable of responding to an orthologous cytokine that is used in the experiment. Clustal-Ω alignment is performed on the orthologous genes of species that passed these criteria. Highly conserved regions are determined and scored by a recursive algorithm described in our experimental procedures, and the highest-scored region is passed to another algorithm to determine a minimal descriptor sequence based on the motif-containing genes. Finally, a descriptor sequence is generated following the logic presented in the methods section. This descriptor can be used to determine critical points of interaction with the RBP, and can be validated using synthetic RNA expression techniques and in-vitro binding assays.
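The idea of scoring conserved regions in an alignment can be illustrated with the Python sketch below. It deliberately substitutes a simple sliding-window average for the recursive scoring algorithm referred to above, which is not reproduced here. The per-column score (fraction of sequences sharing the most common base), the function names, and the example alignment are all assumptions made for illustration.

```python
from collections import Counter


def column_conservation(alignment):
    """Score each column as the fraction of sequences sharing its most common base.

    `alignment` is a list of equal-length aligned sequences; '-' marks a gap
    and never counts as a conserved base.
    """
    n = len(alignment)
    scores = []
    for column in zip(*alignment):
        counts = Counter(b for b in column if b != "-")
        scores.append(max(counts.values()) / n if counts else 0.0)
    return scores


def best_window(alignment, width):
    """Return (start, mean_score) of the most conserved window of `width` columns."""
    scores = column_conservation(alignment)
    if width < 1 or width > len(scores):
        raise ValueError("width must be between 1 and the alignment length")
    window = sum(scores[:width])
    best_start, best_sum = 0, window
    for start in range(1, len(scores) - width + 1):
        # Slide the window one column to the right.
        window += scores[start + width - 1] - scores[start - 1]
        if window > best_sum + 1e-12:  # keep the leftmost window on ties
            best_start, best_sum = start, window
    return best_start, best_sum / width


# Made-up alignment of three orthologous 3'-UTR fragments.
toy = ["AUGCCCUAAG",
       "CUACCCUAGA",
       "G-UCCCUACU"]
print(best_window(toy, width=5))
```

The window with the highest mean score corresponds to the highest-ranked region per gene that is carried forward to descriptor generation.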

TABLE 1

FIGURE 1

[Figure 1. Panel A: a GEO dataset or high-throughput experimental results (CEL) are passed to RPTS, which outputs the genes containing the RBP-regulatory motif, gene-by-gene motif coordinates, and a statistically generated motif descriptor. Panel B: flowchart of the RPTS algorithm. Recoverable steps: archive file is read into memory; decompression, conversion, and background subtraction; user-defined experiments, sample groups, and samples; background subtraction and differential expression; genes significantly affected by experimental treatment are analyzed for evolutionary conservation; 3’UTR sequences of divergent species are analyzed using Clustal-Omega; conserved regions are ranked and the highest-ranked region per gene is selected; alignment and PSSM analysis yield a general descriptor; the predicted motif is mapped to mRNA and its exact position is annotated; output is the RBP-specific regulatory motif.]

FIGURE 1 LEGEND

Figure 1. Systems overview of RPTS application capabilities.

A) System overview of the user interface with RPTS. Input may come from either a published tape archive (.tar) GEO dataset or directly from a high-throughput sequencing system, such as Affymetrix. Input data may come from multiple sources; RPTS has options that the user may set in order to combine data from multiple sources. B) The RPTS algorithm is described in detail in this panel. The general flow of decision points and data analysis is described, and key decision points are highlighted in red boxes. Output from the program is highlighted in green boxes, with a description of what is contained.

Evolutionary Conservation of 3’ UTR Regulatory Motifs

We demonstrate the techniques and theory used to find probable RBP regulatory-consensus motifs from high-throughput binding data. As noted earlier, the 3’ UTRs of evolutionarily divergent species vary significantly in both sequence and length; therefore, an evolutionarily important regulatory sequence would likely be conserved to maintain regulation and mechanistic function. Fig. 2a shows the divergent lengths, across species, of the 3’ UTR of Jak2, a gene regulated by the RBP of interest (hnRNP E1) in dataset GSE40466. The lengths of the Jak2 3’ UTRs are highly divergent; however, when they are aligned to the binding consensus identified by RPTS, the motif is highly conserved across orthologous genes in the divergent species. Fig. 2b is a Clustal-Ω (Goujon et al., 2010; Sievers et al., 2011) alignment of the 3’ UTRs and the predicted RBP consensus sequence; note the highly conserved residues in the alignment. The overall nucleic acid consensus motif that RPTS determines for a given dataset is compared at the transcriptome level using the basic local alignment search tool (BLAST) (Boratyn et al., 2013). Given the premise that RPTS has identified a unique motif contained in a subset of genes, the descriptor motif should be unique enough to be used in reverse, as a query for genes that contain regions of high similarity to the motif descriptor. When searched against the transcriptome, the descriptor should therefore return genes similar to those that RPTS predicted to contain the motif. To confirm the regulatory-consensus motif predictions, we set out to measure the uniqueness of the motif sequence using BLAST. Fig. 2c shows the overlap of BLAST-nucleotide (BLASTn) predictions obtained by using the predicted consensus motif as a search pattern against the RefSeq database of Mus musculus. BLASTn output includes genes that contain a sequence matching the reference pattern; genes with significant matching values were compared to the motif-containing gene predictions from RPTS. With our training set, we see a significant (92.7%) overlap of predictions, further supporting our algorithm as a robust identifier of RBP PTM regulatory binding motifs.
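The overlap measurement reduces to a set comparison, as in the hedged Python sketch below. The function name `overlap_percent` is hypothetical, and the choice of the BLASTn hit list as the denominator is an assumption based on the wording of the Figure 2 legend. The gene lists in the example are made up and are not the study data.

```python
def overlap_percent(blast_hits, rpts_predictions):
    """Percentage of BLASTn-hit genes that were also predicted by RPTS.

    Gene symbols are compared case-insensitively; duplicates are ignored.
    """
    blast = {g.upper() for g in blast_hits}
    rpts = {g.upper() for g in rpts_predictions}
    if not blast:
        raise ValueError("no BLASTn hits to compare")
    return 100.0 * len(blast & rpts) / len(blast)


# Illustrative gene lists only.
blast_hits = ["Jak2", "Egfr", "Cast", "Actb"]
rpts_predictions = ["Jak2", "Egfr", "Cast", "Rbms1"]
print(overlap_percent(blast_hits, rpts_predictions))  # prints 75.0
```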

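FIGURE 2

[Figure 2. Panel A: Jak2 3’-UTR lengths across species. Panel B: Jak2 evolutionary consensus alignment. Panel C: overlap between genes predicted from the binding array and genes returned by a BLASTn query with the predicted descriptor motif (> 75% similarity); 92.7% identical genes.]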

FIGURE 2 LEGEND

Figure 2. 3’-UTRs become divergent across evolution while conserving regulatory motifs.

A) Representation of Jak2 3’-UTR lengths across divergent species. There is a significant degree of variation across evolution; however, as seen in panel B, regions of these UTRs have been conserved. B) This alignment was generated by RPTS, which utilizes the multiple sequence comparison by log-expectation (Clustal-Ω) alignment method to determine regions of highest conservation in Jak2 (Clamp, Cuff, Searle, & Barton, 2004; Edgar, 2004; Thompson et al., 2006). Purines and pyrimidines are differentiated by pink and blue, respectively. C) Our training GEO set GSE40466 yielded a consensus nucleic acid sequence specific to binding hnRNP E1. In addition to the consensus sequence, the output includes the identity of genes that contain this motif. When the consensus sequence was analyzed by BLASTn, 92.7% of the genes predicted by BLAST analysis were also predicted by RPTS.

Identification of Target mRNA for RNA Binding Proteins

The identification of a PTM regulatory-consensus nucleic acid binding motif for an RBP of interest is obtained by analyzing genome-wide binding arrays in which gene intensity is read as a function of interaction with an RBP. This algorithm utilizes currently available analytic packages, such as Bioconductor for detecting differentially expressed genes and Clustal-Ω for performing alignments. It provides its own analytic approach to combine these tools into a high-throughput analytic package that is capable of detecting conserved regions in nucleic acid sequences. Ultimately, this application is capable of defining a region of nucleic acid responsible for binding to an RBP of interest in a PTM-dependent manner.

To further support our computational findings, we utilized a modified RNA-immunoprecipitation (RIP) assay to confirm the interaction (and the subsequent loss of interaction through cytokine treatment) of our example RBP, hnRNP E1, with predicted motif-containing genes. NMuMG cells have previously been confirmed to actively accumulate p-hnRNP E1 in response to TGFβ signaling (Chaudhury et al., 2010; Hussey et al., 2011; Hussey et al., 2012). Fig. 3a confirms the TGFβ-induced p-hnRNP E1 accumulation at various time points. Note that the system demonstrated here will show a loss of interaction between motif-containing genes and p-hnRNP E1. Fig. 3c confirms that 6 predicted motif-containing genes indeed interact with our RBP of interest in a treatment-dependent manner, and illustrates the loss of interaction between hnRNP E1 and target genes as a function of TGFβ treatment. To show the effect of TGFβ on overall mRNA levels of these target genes, we assayed total levels of the target gene mRNAs from cellular lysate. As the bottom panel of each gene assay in Fig. 3c shows, TGFβ does not cause a change in the total mRNA levels of these motif-containing genes.

FIGURE 3

[Figure 3. Panel A: IP: α-hnRNP E1 followed by WB: α-RXRXXpS and WB: α-hnRNP E1 in NMuMG cells over a TGFβ time course (0, 0.5, 1, 3, 6, 24 h), with NMuMG -/- hnRNP E1 cells (0 h) as a control. Panel B: schematic (see legend). Panel C: RIP: α-hnRNP E1 followed by rtPCR, and rtPCR of total RNA, for Kpna, Jak2, Rbms1, Egfr, Gna1, Cast, and β-actin in NMuMG cells over a TGFβ time course (0, 1, 3, 6, 24 h), with IgG and NMuMG -/- hnRNP E1 (0 h) controls.]

FIGURE 3 LEGEND

Figure 3. RPTS predicted motif-containing genes interact with hnRNP E1.

A) To assess the TGFβ-induced serine-43 phosphorylation of hnRNP E1, we combined immunoprecipitation with subsequent western blot analysis. Cytosolic hnRNP E1 was immunoprecipitated by α-hnRNP E1 / protein-A-sepharose, and an antibody specific to the phosphorylated form of the phosphorylation consensus sequence that contains serine-43 of hnRNP E1 was used in a subsequent western blot to assay p-Ser43-hnRNP E1. NMuMG -/- hnRNP E1 samples were used as a negative control because they lack hnRNP E1. A steady rise of p-Ser43-hnRNP E1 is seen as NMuMG cells are treated with TGFβ in a time-dependent manner. Levels of total hnRNP E1 remain unchanged by TGFβ treatment. B) Schematic representation of TGFβ stimulation on hnRNP E1. TGFβ induces kinase activity of Akt2 to cause phosphorylation of serine-43 of hnRNP E1, causing dissociation of this RBP from its target consensus motif. C) After determining the kinetics of hnRNP E1 phosphorylation, we utilized these time points to assess and confirm the binding of RPTS predicted motif-containing genes. The top panel of each gene assay shows the loss of binding through RNA immunoprecipitation (RIP) and subsequent reverse transcriptase polymerase chain reaction (rtPCR). We utilized IgG as a negative control to show the specificity of RIP samples with α-hnRNP E1. As seen in panel A, the phosphorylation levels of serine-43 peak around 3 hours; RIP analysis shows a significant loss of binding for these motif-containing genes in direct relationship to the hnRNP E1 phosphorylation kinetics. The lower panel of each gene assay shows total RNA levels as a function of TGFβ by rtPCR of total cellular RNA.

Contribution of Motif-Containing Genes to Various Annotated Pathways

Our confirmatory findings illustrated in Figs. 2 and 3 allow for further downstream analysis. Predictions of RPTS essentially define an RBP-mediated regulon, with individual genes identified and their putative RBP-binding consensus motif precisely mapped. Our aim is to visualize a system that is regulated by an individual RBP or its PTM in the context of a biological system. To determine the contributions these regulon genes make to metabolism and known cell regulatory pathways, we utilized Gene Set Enrichment Analysis (GSEA) to analyze the metabolic pathways that would be activated by the induction of our regulon genes (Mootha et al., 2003; Subramanian et al., 2005). Fig. 4a is a representation of grouped pathways that are known to have a synergistic contribution. Analyzing our predicted regulon genes for contribution to grouped metabolic pathways, we see four groupings that are highly implicated in the progression of cancer. These findings support previous observations that TGFβ-induced Ser43 phosphorylation of hnRNP E1 drives both cancer progression and metastasis (Huo et al., 2010; Shi et al., 2012; Xue et al., 2014).

Individual metabolic pathways induced by malignancy that are associated with our RBP-regulon system were also determined by using RPTS. Fig. 4b contains individual metabolic pathways that would be activated as a function of expression of our regulon genes. Again, these findings support literature that associates the loss of the repressor hnRNP E1, or its PTM by TGFβ, with the induction of cancer. Fig. 4b highlights pathways that had a p-value of less than 0.05 and a significant (greater than 1.7) fold enrichment. Many of these pathways are involved in the induction of cancer, and support the many observations that hnRNP E1 dysregulation induces cancer.

FIGURE 4

[Figure 4. Panels A, B, and C; see legend.]

FIGURE 4 LEGEND

Figure 4. Gene set enrichment analysis (GSEA) of predicted motif-containing genes.

A) The list of motif-containing genes predicted by RPTS was analyzed by GSEA for enrichment of known oncogenic pathways. Each individual block is a functional grouping of pathways; individual pathways are noted horizontally, and gene names are noted vertically. B) GSEA individual pathway enrichment analysis. Each pathway had a p-value of less than 0.05 and an enrichment of greater than 1.7-fold. When analyzed as a total group, many of these pathways are implicated in the progression of EMT, cancer, and metastasis. C) Previous data suggest that the mutation of single bases in RBP consensus motifs disrupts complex formation. This panel shows the number of SNP records contained in dbSNP (Entrez) for each RPTS motif-containing prediction. Significant numbers of SNPs occur in the loci of these genes; it is statistically plausible that some of these mutations occur within the boundaries of an individual RBP consensus motif.

SUPPLEMENTARY FIGURE 1

[Supplementary Figure 1. Panels: Nova2 and Tdp43.]

SUPPLEMENTARY FIGURE 1 LEGEND

Supplementary Figure 1. Additional RBP motif training sets.

RPTS was tested against two other RBPs for which high-throughput binding array data are available. While no bona-fide motif-containing genes have yet been determined, the motif-containing genes that RPTS determined for Nova2 and Tdp43 are highly similar to binding predictions.

DISCUSSION

The development of RPTS was motivated by the need to determine a nucleic acid consensus motif specific to an RBP and subsequently to identify, on a genome scale, all genes containing this regulatory motif in their UTR. The analysis presented in this manuscript benefits from the vast amounts of publicly available data to support our computational predictions and in-vitro observations. RPTS predictions can serve as an initial screen for analyzing high-throughput UTR assays as a function of RBP binding. The output of RPTS is directly testable through relatively inexpensive in-vitro methods that involve the synthetic production of both the RBP and multiple nucleic acid transcripts.

RPTS was engineered to analyze 3’ and 5’ UTR array data; however, its theory and core foundation should provide a robust platform for modification and analysis of DNA-binding protein and DNA arrays. A potential shortcoming of RPTS lies in its strict analysis algorithm, in which a putative gene that shows proper binding characteristics could be rejected due to a lack of evolutionary conservation in its 3’ UTR. We accept this shortcoming on the premise that genes critical to cellular stability would likely have conserved their regulatory motifs, whereas genes that show binding characteristics yet lack a conserved motif likely have a less important role in cellular function.

Our lab has employed RPTS in an ongoing study to thoroughly analyze and characterize a nucleic acid binding motif that is unique to the Ser43 site on hnRNP E1. This study is near completion and has rigorous in-vitro binding data to support RPTS predictions. We further characterize biological significance and show a clinical consequence of disrupting the binding between an RBP and a regulatory motif contained in an oncogenic subset of genes. With the planned additions to extend the capabilities of RPTS, we anticipate that it will become a continuously refined algorithm that gains robustness through usage by other groups and in-vitro validation techniques.

Future Work and Potential Applications

As mentioned before, RPTS was engineered to support the analysis of 3’ and 5’-UTR arrays as a function of RBP affinity. It is our ultimate goal to expand the basic theories and function of RPTS to support a multitude of high-throughput binding data, and to expand the downstream analysis of RPTS predictions to screen for interactions with other potent cellular biomolecules such as miRNA.

Nearly all RBP-RNA interactions that are known involve the potential for disruption of the complex through cellular signaling and other biochemical events. If the complex is disrupted, it leaves a consensus motif that is normally bound by RBPs open and a potential target for other effectors to modulate gene stability, silencing, or localization. A future goal of our lab is to create and annotate a database for confirmed RPTS motif predictions that can be used to scan predicted motifs against a multitude of cellular modulators such as miRNA, and other characterized RBP-domains for which a target sequence is already known.

The loss of interaction with regulatory RBPs can be of drastic consequence to the cell and to the organism as a whole. Many of these epigenetic events can be the starting point for various forms of cancer and cellular de-differentiation. A large database of SNPs is curated by Entrez (dbSNP) and contains an ever-growing number of SNPs that are of clinical significance. RPTS allows for the precise identification of a putative RBP binding motif, and the subsequent genome-wide identification of genes that contain this motif with corresponding motif locations. The combination of these data can be further analyzed and compared with dbSNP in order to identify potential mutations that induce loss of function through the loss of interaction with a regulatory RBP.

The output of RPTS maps consensus motifs to their exact position on a gene; we are currently writing a computational algorithm that will extract these data and map each motif to individual chromosomes to be more compatible with SNP analysis. Many of the annotated SNPs contained in dbSNP give their position by their location in the chromosome. Fig. 4c is a frequency count of SNPs located in genes predicted by this algorithm to contain an evolutionarily conserved motif in their 3’-UTR and to lose binding to p-hnRNP E1. Counts of SNPs per gene were obtained for each regulon gene in humans; these values are significantly higher when organismal constraints are removed. From a statistical standpoint, there is a high probability that some of these annotated SNPs reside in the consensus motif and, depending on the type of SNP, may disrupt the interaction with the regulatory RBP.

Previous data from our lab show that a single point mutation in the regulatory consensus of Dab2 causes a complete loss of binding and regulation (Chaudhury et al., 2010).
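The intended comparison between motif locations and annotated SNPs can be sketched as an interval lookup. The Python example below is a hypothetical illustration, not the algorithm under development: it assumes that motif boundaries and SNP positions have already been expressed in one shared coordinate system on a single chromosome, and the gene names and coordinates are invented.

```python
from bisect import bisect_left, bisect_right


def snps_in_motifs(motifs, snp_positions):
    """Return {gene: [SNP positions]} for SNPs that fall inside a motif.

    `motifs` maps a gene name to an inclusive (start, end) interval;
    `snp_positions` is an iterable of integer positions. Both are assumed to
    use the same coordinate system on the same chromosome.
    """
    ordered = sorted(snp_positions)
    hits = {}
    for gene, (start, end) in motifs.items():
        # Binary search for the slice of SNPs lying within [start, end].
        lo = bisect_left(ordered, start)
        hi = bisect_right(ordered, end)
        if hi > lo:
            hits[gene] = ordered[lo:hi]
    return hits


# Invented coordinates for demonstration.
motifs = {"GeneA": (100, 110), "GeneB": (200, 205)}
snps = [99, 100, 105, 111, 300]
print(snps_in_motifs(motifs, snps))  # prints {'GeneA': [100, 105]}
```

Sorting the SNP positions once and using binary search keeps the lookup efficient when many motifs are screened against a large SNP list.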

In summary, we have created a unique computational algorithm with a simple user interface, in which a user can upload a standardized GEO expression set and obtain a putative consensus motif descriptor along with a transcriptome-wide identification of motif-containing genes.

ACKNOWLEDGEMENTS

We thank Dr. George Hussey and other members of our laboratory for helpful comments and critical insights.

FUNDING

This work was supported by the grants CA55536 and CA154664 from the National Cancer Institute (to Philip H. Howe, Ph.D.).

REFERENCES

Aletta, J. M., Cimato, T. R., & Ettinger, M. J. (1998). Protein methylation: a signal event in post-translational modification. Trends in Biochemical Sciences, 23(3), 89-91. doi: http://dx.doi.org/10.1016/S0968-0004(98)01185-2

Boratyn, G. M., Camacho, C., Cooper, P. S., Coulouris, G., Fong, A., Ma, N., . . . Merezhuk, Y. (2013). BLAST: a more efficient report with usability improvements. Nucleic Acids Research, 41(W1), W29-W33.

Chaudhury, A., Hussey, G. S., Ray, P. S., Jin, G., Fox, P. L., & Howe, P. H. (2010). TGF-β-mediated phosphorylation of hnRNP E1 induces EMT via transcript-selective translational induction of Dab2 and ILEI. Nat Cell Biol, 12(3), 286-293. doi: http://www.nature.com/ncb/journal/v12/n3/suppinfo/ncb2029_S1.html

Chen, K., & Rajewsky, N. (2006). Deep conservation of microRNA-target relationships and 3'UTR motifs in vertebrates, flies, and nematodes. Paper presented at the Cold Spring Harbor symposia on quantitative biology.

Churbanov, A., Rogozin, I. B., Babenko, V. N., Ali, H., & Koonin, E. V. (2005). Evolutionary conservation suggests a regulatory function of AUG triplets in 5′-UTRs of eukaryotic genes. Nucleic Acids Research, 33(17), 5512-5520. doi: 10.1093/nar/gki847

Clamp, M., Cuff, J., Searle, S. M., & Barton, G. J. (2004). The jalview java alignment editor. Bioinformatics, 20(3), 426-427.

Dreyfuss, G., Kim, V. N., & Kataoka, N. (2002). Messenger-RNA-binding proteins and the messages they carry. Nat Rev Mol Cell Biol, 3(3), 195-205.

Edgar, R. C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32(5), 1792-1797.

Goujon, M., McWilliam, H., Li, W., Valentin, F., Squizzato, S., Paern, J., & Lopez, R. (2010). A new bioinformatics analysis tools framework at EMBL–EBI. Nucleic Acids Research, 38(suppl 2), W695-W699.

Huo, L.-R., Ju, W., Yan, M., Zou, J.-H., Yan, W., He, B., . . . Zhong, N. (2010). Identification of differentially expressed transcripts and translatants targeted by knock-down of endogenous PCBP1. Biochimica et Biophysica Acta (BBA)-Proteins and Proteomics, 1804(10), 1954-1964.

Hussey, G. S., Chaudhury, A., Dawson, A. E., Lindner, D. J., Knudsen, C. R., Wilce, M. C., . . . Howe, P. H. (2011). Identification of an mRNP complex regulating tumorigenesis at the translational elongation step. Molecular Cell, 41(4), 419-431.

Hussey, G. S., Link, L. A., Brown, A. S., Howley, B. V., Chaudhury, A., & Howe, P. H. (2012). Establishment of a TGFβ-induced post-transcriptional EMT gene signature. PLoS One, 7(12), e52624.

Kuersten, S., & Goodwin, E. B. (2003). The power of the 3′ UTR: translational control and development. Nat Rev Genet, 4(8), 626-637.

Maglott, D., Ostell, J., Pruitt, K. D., & Tatusova, T. (2011). Entrez Gene: gene-centered information at NCBI. Nucleic Acids Research, 39(suppl 1), D52-D57.

Mootha, V. K., Lindgren, C. M., Eriksson, K.-F., Subramanian, A., Sihag, S., Lehar, J., . . . Laurila, E. (2003). PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nature Genetics, 34(3), 267-273.

Mukhopadhyay, R., Jia, J., Arif, A., Ray, P. S., & Fox, P. L. (2009). The GAIT system: a gatekeeper of inflammatory gene expression. Trends Biochem Sci, 34(7), 324-331. doi: 10.1016/j.tibs.2009.03.004

Pesole, G., Liuni, S., Grillo, G., Licciulli, F., Larizza, A., Makalowski, W., & Saccone, C. (2000). UTRdb and UTRsite: specialized databases of sequences and functional elements of 5′ and 3′ untranslated regions of eukaryotic mRNAs. Nucleic Acids Research, 28(1), 193-196.

Pesole, G., Liuni, S., Grillo, G., Licciulli, F., Mignone, F., Gissi, C., & Saccone, C. (2002). UTRdb and UTRsite: specialized databases of sequences and functional elements of 5′ and 3′ untranslated regions of eukaryotic mRNAs. Update 2002. Nucleic Acids Research, 30(1), 335-340.

Ray, D., Kazan, H., Cook, K. B., Weirauch, M. T., Najafabadi, H. S., Li, X., . . . Hughes, T. R. (2013). A compendium of RNA-binding motifs for decoding gene regulation. Nature, 499(7457), 172-177. doi: 10.1038/nature12311

Shi, Z., Zhang, T., Long, W., Wang, X., Zhang, X., Ling, X., & Ding, H. (2012). Down-regulation of poly (rC)-binding protein 1 correlates with the malignant transformation of hydatidiform moles. International Journal of Gynecological Cancer, 22(7), 1125-1129.

Sievers, F., Wilm, A., Dineen, D., Gibson, T. J., Karplus, K., Li, W., . . . Söding, J. (2011). Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Molecular Systems Biology, 7(1), 539.

Siomi, H., Matunis, M. J., Michael, W. M., & Dreyfuss, G. (1993). The pre-mRNA binding K protein contains a novel evolutionary conserved motif. Nucleic Acids Research, 21(5), 1193-1198.

Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., . . . Lander, E. S. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America, 102(43), 15545-15550.

Thompson, J. D., Muller, A., Waterhouse, A., Procter, J., Barton, G. J., Plewniak, F., & Poch, O. (2006). MACSIMS: multiple alignment of complete sequences information management system. BMC Bioinformatics, 7(1), 318.

Wuchty, S., Oltvai, Z. N., & Barabasi, A. L. (2003). Evolutionary conservation of motif constituents in the yeast protein interaction network. Nat Genet, 35(2), 176-179. doi: 10.1038/ng1242

Xue, X., Wang, X., Liu, Y., Teng, G., Wang, Y., Zang, X., . . . Wang, J. (2014). SchA–p85–FAK complex dictates isoform-specific activation of Akt2 and subsequent PCBP1-mediated post-transcriptional regulation of TGFβ-mediated epithelial to mesenchymal transition in human lung cancer cell line A549. Tumor Biology, 35(8), 7853-7859.

CHAPTER III

Identification and Characterization of an hnRNP E1 Translational Silencing Motif

ABSTRACT

Non-canonical transforming growth factor β (TGFβ) signaling through protein kinase B (Akt2) induces phosphorylation of heterogeneous nuclear ribonucleoprotein E1 (hnRNP E1) at serine-43 (p-hnRNP E1). This post-translational modification of hnRNP E1 promotes its dissociation from a translational stall complex with target 3’ untranslated region (UTR) nucleic acid regulatory motifs, driving epithelial to mesenchymal transition (EMT) and metastasis. We have identified an hnRNP E1 consensus-binding motif and identified, genome-wide, genes that contain this post-translational modification-responsive regulatory element. This study characterizes the binding kinetics of the consensus-binding motif with hnRNP E1, its various K-homology (KH) domains, and p-hnRNP E1. Levels of p-hnRNP E1 are highly upregulated in metastatic cancer cells and low in normal epithelial tissue, and correlate with levels of Akt2 and with high phosphorylation of Akt2 at serine-474 (p-Akt2). In cellular progression series, we show a metastatic progression signature of high protein expression of Akt2, p-Akt2, and p-hnRNP E1, and a significantly lower level of total hnRNP E1 protein. Genes that are translationally silenced by hnRNP E1 and reversibly expressed by p-hnRNP E1 are highly implicated in the progression of EMT and metastasis. This study provides insight into a non-canonical TGFβ signaling cascade that is responsible for inducing EMT by aberrant expression of hnRNP E1-silenced targets. The relevance of this system in progression series and well-established human breast and colon cancer lines provides a potential new prognostic indicator of cancer progression. The resolution of this molecular mechanism provides new targets for therapeutic intervention and calls for new screens for small-molecule inhibitors of p-hnRNP E1 induction that do not disrupt other vital pathways further upstream.

INTRODUCTION

Epithelial to mesenchymal transition (EMT) is a process in which an epithelial cell reverts to a mesenchymal state, typically through cytokine stimulation (Huber, Kraut, & Beug, 2005; Thiery, 2002; Xu et al., 2009). EMT is highly associated with promoting tumor formation, tumor metastasis, and the overall progression of cancer through the loss of the cell’s epithelial characteristics and the induction of mesenchymal properties (Xu et al., 2009). EMT is promoted by non-canonical transforming growth factor β (TGFβ) signaling and downstream activation of the phosphatidylinositol-4,5-bisphosphate 3-kinase (PI3K) / protein kinase B (Akt) pathway (Hills & Squires, 2010; J. M. Lee, Dedhar, Kalluri, & Thompson, 2006; Miettinen, Ebner, Lopez, & Derynck, 1994). This non-canonical pathway of TGFβ differs in function from its canonical counterpart by promoting tumor formation and progression, as opposed to inhibiting them (Massagué, 2000, 2012). We have previously identified a mechanism of translational repression in which heterogeneous nuclear ribonucleoprotein E1 (hnRNP E1) binds to a conserved 3’ untranslated region (UTR) nucleic acid motif in the mRNA of EMT-inducing genes. This interaction is responsible for stalling peptide synthesis at the elongation stage. The repression can be interrupted by phosphorylation of serine 43 on hnRNP E1 (p-hnRNP E1), which causes dissociation of hnRNP E1 from the translational stall complex (Chaudhury, Hussey, et al., 2010b; Hussey et al., 2011; Hussey et al., 2012b).

Akt is a serine / threonine kinase with three different isoforms, Akt1, Akt2, and Akt3 (Agarwal, Brattain, & Chowdhury, 2013; Chin et al., 2013). Akt plays a central role in cellular physiology, and its isoforms differ significantly in function. The progression of cancer and induction of metastasis are highly correlated with increased expression of Akt2. When compared to normal epithelial tissue, cancer / tumor tissue shows an increased expression of Akt2 and, interestingly, a decrease in Akt1 (Mulholland et al., 2012; G. Xue et al., 2012). Akt2 is a downstream activation target of non-canonical TGFβ signaling, and is implicated in the activation of multiple pathways associated with cellular growth and proliferation (X.-F. Chen et al., 2012). We have previously implicated Akt2 as the only isoform of this enzyme that is capable of inducing p-hnRNP E1, and demonstrated the ability to halt TGFβ-induced EMT by inhibiting Akt2 kinase activity with a small molecule inhibitor (LY294005) (Chaudhury, Hussey, et al., 2010b; Hussey et al., 2012b).

RNA binding proteins (RBPs) are regulatory proteins that bind RNA, typically in its 3’ or 5’ UTR, and most frequently modulate post-transcriptional regulation of protein expression (Burd & Dreyfuss, 1994; Glisovic, Bachorik, Yong, & Dreyfuss, 2008). Multiple advances in high-throughput genomic sequencing have allowed for the resolution of RBP target binding sites, and a pattern of evolutionary conservation for solved RBP binding motifs has been established (Glisovic et al., 2008; Hafner et al., 2010b; J. Ho & Marsden, 2014). Most procedures that utilize a global approach to resolve a binding motif involve the use of exonuclease digestion (Corcoran et al., 2011; Hafner et al., 2010a, 2010b). The region of RNA that is covered by the bound RBP is protected from exonuclease digestion. After high-throughput sequencing analysis, a large number of RNA fragment sequences are resolved and analyzed to determine a binding motif for a particular RBP. hnRNP E1 is a highly expressed RBP that also has the ability to bind DNA, and has multiple sites of potential post-translational modification. We have implicated a single post-translational modification at serine 43 that causes differential binding of hnRNP E1 to a subset of EMT-inducing genes (Hussey et al., 2012b). To this end, we have developed an in-vitro exonuclease digestion assay that is capable of resolving a nucleic acid binding motif with specificity for a single RBP post-translational modification.
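The step of analyzing protected fragments to determine a binding motif can be illustrated with a simple k-mer tally. The Python sketch below is an assumption-laden simplification and not the analysis used in this study: it counts how many protected fragments contain each k-mer and reports the most widely shared one. The function names and the fragment sequences are invented for illustration.

```python
from collections import Counter


def kmer_support(fragments, k):
    """Count, for each k-mer, the number of fragments containing it at least once."""
    support = Counter()
    for fragment in fragments:
        seq = fragment.upper()
        # A set ensures that each fragment contributes at most once per k-mer.
        support.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return support


def top_kmer(fragments, k):
    """Return (k-mer, fragment count) for the most widely shared k-mer.

    Ties are broken alphabetically so that the result is deterministic.
    """
    support = kmer_support(fragments, k)
    if not support:
        raise ValueError("no fragment is at least k bases long")
    return min(support.items(), key=lambda item: (-item[1], item[0]))


# Invented protected fragments for demonstration.
fragments = ["GGACCCUAAC", "UUCCCUAGG", "ACCCUAUU"]
print(top_kmer(fragments, k=5))  # prints ('CCCUA', 3)
```

A real analysis would also need to account for background k-mer frequencies and fragment abundance, which this sketch ignores.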

We have previously identified genes that are translationally silenced by the binding of their 3’UTR to hnRNP E1, and subsequently expressed upon serine 43 phosphorylation (Hussey et al., 2012b). When analyzed by pathway enrichment analysis, these genes are shown to highly enrich pathways associated with invasiveness and metastasis. The synergistic contribution of these EMT-inducing genes, coupled to their similar hnRNP E1 binding characteristics, has given insight into a highly complex regulatory mechanism that is dependent on non-canonical TGFβ signaling and downstream Akt2 activation. Hence, the observation of high Akt2 levels in cancer tissue by our lab and others has led to the hypothesis that sufficient Akt2 levels can cause the aberrant expression of EMT-inducing genes through amplification of non-canonical TGFβ signaling. In an attempt to resolve a putative nucleic acid consensus motif, we performed computational alignments of the 3’UTRs of EMT-inducing genes found to differentially bind hnRNP E1. These alignments yielded no obvious regions of conservation amongst these approximately 40 genes, and led us to hypothesize that a unique structural motif exists in the 3’UTR of EMT-inducing genes that is conserved structurally while being highly divergent in sequence.

To address these hypotheses, we adopted a combinatorial approach of in-vitro exonuclease digestion with genomic RNA to resolve a nucleic acid motif, followed by a variety of in-vitro binding assays to better characterize the interactions between the various KH domains in hnRNP E1 and the regulatory nucleic acid motif. We further characterize this non-canonical TGFβ pathway by measuring protein levels of Akt2, p-Akt2, hnRNP E1, and p-hnRNP E1 in a well-established cellular cancer progression model to correlate activity of this pathway with metastatic potential. This cohort of genes, which contain regulatory motifs dependent on p-hnRNP E1 levels, represents a new Akt2 / hnRNP E1 mediated regulon that is responsible for inducing EMT by overstimulation of the PI3K / Akt2 pathway.

EXPERIMENTAL PROCEDURES

Cell culture – NMuMG cells were maintained and grown in DMEM media.

Cloning of 3’UTRs for select mRNA – Sequences of primers used for PCR amplification of 3’UTRs are shown in Table 2. For cloning of 3’UTRs, cDNAs were synthesized from mRNA in the total RNA pool extracted from NMuMG cells by reverse transcriptase (RT)-PCR according to manufacturer’s instructions. DNA samples were further amplified using Vent DNA polymerase (New England Biolabs). The PCR products were cloned at the SmaI site of pUC18. All the clones contained the 3’UTR sequences under a T7 promoter.

Transcription of synthetic mRNA – pUC18-based 3’UTR clones under a T7 promoter were transcribed by T7 RNA polymerase according to manufacturer’s instructions with or without [α-32P]-UTP and [α-32P]-CTP. Transcribed RNA samples were treated with RNase-free DNaseI, extracted with trizol solution, precipitated with ethanol and finally dissolved in TE.

Expression of recombinant proteins – E. coli DH5α cells containing an isopropyl β-D-1-thiogalactopyranoside (IPTG)-inducible GST-tagged protein expression plasmid were grown to an optical density at 600 nm of 0.6, and induced with 1 µM IPTG. IPTG induction was performed for 4 hours at 37°C with rotation at 270 RPM. After induction, bacterial cells were lysed with a bacterial lysis buffer (50 mM Tris-HCl, 300 mM NaCl, 20 mM imidazole, 0.05% β-mercaptoethanol, 0.5% Triton X-100, 1 mM PMSF, 0.5 µg/mL, and 10 µg/mL aprotinin) for 30 minutes at 4°C. To clear lysed bacterial debris, samples were centrifuged at an RCF of 100,000 x g for 30 minutes at 4°C. Cleared lysate was incubated with glutathione-agarose for a minimum of 1 hour at 4°C with shaking. Purified protein was obtained by elution with a reduced glutathione buffer (50 mM Tris-HCl pH = 7.5, 10 mM reduced-glutathione): a minimum of 10 volumes of elution buffer was incubated for 5 minutes at 4°C, and eluted protein was concentrated in a 3 kDa molecular weight cutoff centrifugation filter to a desired volume. For protein that was to be purified without its GST tag, washed protein-bound glutathione-agarose beads were incubated for 18 hours with PreScission protease (GE Healthcare), and expressed protein was recovered by separating the supernatant of the reaction. Concentration was performed on an as-needed basis.

RNA / protein binding assays - GST-hnRNP E1 protein purified from E. coli DH5α cells was immobilized on glutathione agarose beads. Radiolabeled or unlabeled RNA samples were bound to the immobilized beads and were incubated at 0°C for – min. The beads were washed with – buffer and finally eluted with 5 mM reduced glutathione in the same buffer. The samples were extracted with Trizol and analyzed by 7M urea-10% polyacrylamide gel electrophoresis.

Recombinant protein exonuclease digestion protection assays – Samples of RNA and GST-tagged recombinant protein were incubated in an RNA-protein binding buffer (50 mM Tris-HCl pH = 7.5, 100 mM KCl, 4% glycerol, 1 mM EDTA) for 40 minutes at 4°C. Glutathione-agarose beads were added to the reaction and allowed to bind for 1 hour at 4°C, with gentle shaking. A series of 50, 100, 150, and 200 mM NaCl washes was performed to remove unbound RNA. RNase A was added at the manufacturer-recommended concentration, and allowed to digest samples for 10 minutes at 15°C. Samples were washed with RNA-protein binding buffer, and subjected to further downstream analysis.

Preparation of digested mRNA for high throughput sequencing – Fragments of RNA from recombinant protein exonuclease digestion protection assays were removed from bound protein by proteinase-K digestion followed by trizol extraction, and resolved by 7M urea-10% polyacrylamide gel electrophoresis (PAGE). To append a primer to the 3’ end of recovered, digested RNA, a modified (5'-/5rApp/ TTT AAC CGC GAA TTC CAG /3SpC3/-3') primer was ligated to the RNA with RNA-ligase-2 (Rnl2) for 18 hours at 4°C. Ligated oligonucleotides were resolved using PAGE. For 5’ end ligation, a modified (5'-ACG GAA TTC CTC ACT rArArA-3') primer was appended to the hybrid DNA/RNA-oligonucleotide by incubation with T4-ligase for 18 hours at 4°C. Ligated oligonucleotides were resolved using PAGE, and subjected to further downstream analysis.

Computational analysis of high throughput sequencing results – Transformation from .bam to .fastq was performed utilizing a custom conversion script implemented in Python (included in supplemental). A reference nucleic acid sequence database was constructed utilizing custom Python implementations (included in supplemental); this database was subsequently used for Bowtie2 analysis (Langmead & Salzberg, 2012) to determine gene of origin (GOR) for respective reads. Evolutionary conservation and finite mapping of sequence were obtained through custom Python script implementations (included in supplemental). Read data generated by these scripts were further analyzed for statistical conservation to derive a minimal descriptor sequence describing a unique nucleic acid sequence responsible for the TGFβ-induced reversible binding with hnRNP E1.
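The gene-of-origin step described above can be illustrated with a minimal Python sketch. It is not the supplemental script itself: it assumes that Bowtie2 output is available as SAM-formatted text, that each reference name corresponds to one 3’UTR, and the function name and toy records are hypothetical.

```python
from collections import Counter

def count_reads_per_gene(sam_lines):
    """Tally primary, mapped reads per reference name from SAM text lines.

    Assumption: each reference name (SAM column 3) identifies one gene of origin.
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        rname = fields[2]
        if flag & 0x4 or rname == "*":
            continue  # read is unmapped
        if flag & 0x100 or flag & 0x800:
            continue  # secondary or supplementary alignment; avoid double counting
        counts[rname] += 1
    return counts

# Hypothetical toy records (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL)
example_sam = [
    "@SQ\tSN:Dab2_3UTR\tLN:2500",
    "read1\t0\tDab2_3UTR\t101\t42\t8M\t*\t0\t0\tCCTAGAGA\tIIIIIIII",
    "read2\t16\tDab2_3UTR\t140\t42\t8M\t*\t0\t0\tTCCGAAGA\tIIIIIIII",
    "read3\t0\tEgfr_3UTR\t12\t30\t8M\t*\t0\t0\tGACTCGGA\tIIIIIIII",
    "read4\t4\t*\t0\t0\t*\t*\t0\t0\tGGGGGGGG\tIIIIIIII",
]

read_counts = count_reads_per_gene(example_sam)
```

Counts of this kind are what a per-gene read heat map would be built from; any filtering on mapping quality would be an additional assumption.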

In-vitro translation assays – The luciferase open-reading frame (ORF) was obtained through PCR amplification of luciferase from pGL3-basic, with HindIII and BamHI cleavage sites (3’ and 5’ respectively) placed on the ends of the ORF. The ORF was amplified and placed into a pcDNA 3.1+ plasmid vector with the multiple-cloning site oriented at the 3’UTR of the luciferase ORF. Small, ~50 nucleotide regulatory oligonucleotides were ligated to cleaved pcDNA 3.1+-Luciferase plasmids to allow for transcription of RNA using T7. RNA was transcribed, containing a luciferase promoter, ORF, and 3’UTR regulatory element. Transcribed RNA was incubated with recombinant hnRNP E1 / p-hnRNP E1 and recombinant eEF1A1 protein. Incubation was performed for 5 minutes at room temperature, and the RNA was subsequently translated using the Promega rabbit reticulocyte lysate protein translation system per manufacturer’s specifications.

RNA electromobility shift assays – Synthetic RNA and recombinant proteins were prepared as previously described and allowed to incubate for 10 minutes at 4°C in RNA-protein binding buffer (40 mM Tris-HCl pH = 7.5, 30 mM KCl, 1 mM MgCl2, 0.01% NP40, 1 mM dithiothreitol). After binding, a loading buffer composed of 50% glycerol and bromophenol blue / xylene cyanol was added to samples. Samples were loaded into varying percentages of non-denaturing polyacrylamide gel and electrophoresed.

RNA pull-down assays - Synthetically transcribed RNA was prepared as previously described. RNA was fixed to activated CNBr-agarose beads (GE Healthcare) according to manufacturer recommended procedures. RNA-bead mixture was washed with a wash buffer (50 mM Tris-HCl pH = 7.5, 4% glycerol, 1 mM EDTA) and added to mammalian cell lysate. Beads and lysate were incubated for 1 hour at 4°C, and subsequently washed with wash buffer. Protein was eluted from beads using PAGE loading buffer, containing β-mercaptoethanol. Samples were subsequently transferred to membrane and analyzed for protein interaction using western blot analysis.

Phosphorylation of recombinant hnRNP E1 – Recombinant substrate proteins were prepared as previously described, and incubated with recombinant Akt2 enzyme (Signal Chem) in kinase buffer (25 mM Tris-HCl pH = 7.5, 5 mM β-glycerophosphate, 10 µM ATP, 2 mM dithiothreitol, 0.1 mM Na3VO4, 10 mM MgCl2) for 15 minutes at 30°C. Recombinant substrates were removed by extraction with glutathione-agarose, washing, and cleavage from the GST tag by PreScission protease (GE Healthcare). Phosphorylated recombinant proteins were used in various downstream analyses.

RESULTS

TGFβ regulated EMT-inducing genes are translationally regulated:

Previous work in our lab has identified a subset of EMT-inducing genes that show a signature non-canonical TGFβ translational upregulation through the dissolution of a regulatory complex containing hnRNP E1. From these identified genes, we utilized a biased approach of choosing 4 candidates that showed this signature TGFβ / hnRNP E1 interaction. To validate these candidates, we first analyzed mRNA and protein levels with respect to TGFβ treatment and hnRNP E1 knockdown (-/- hnRNP E1) (Fig. 5A). As shown, at 24 hours of TGFβ treatment we observed a significant accumulation of Egfr, Dab2, Fam3c, and Jak2 protein levels. In contrast, we show that mRNA levels of these genes do not change in response to the silencing of hnRNP E1 or TGFβ treatment (Fig. 5B and 5C). The increase in protein levels without a corresponding increase in mRNA indicates a translational rather than transcriptional regulation mechanism. This efficient, developmentally necessary mechanism is aberrantly reactivated by an increased potential for TGFβ to signal through its non-canonical pathways.

FIGURE 5

FIGURE 5 LEGEND

Figure 5. Translational regulation of EMT-inducing genes is upregulated by TGFβ and knockout of hnRNP E1.

A) Immunoblot analysis for candidate motif containing genes. NMuMG cells and a mutant hnRNP E1 knockout version, NMuMG -/- hnRNP E1, were treated with 5 ng/mL TGFβ up to 24 hours. B) Semi-quantitative PCR analysis of mRNA levels from lysates used in (A). C) Quantitative real-time PCR analysis of mRNA, normalized to expression of GAPDH mRNA levels.

TGFβ regulated EMT-inducing genes interact with hnRNP E1 in their 3’UTR:

To assess the specificity of binding for 3’UTRs of candidate genes, we developed an in-vitro assay to determine binding affinity for hnRNP E1, its various KH domains (KH1, KH2, KH3), and p-hnRNP E1. 3’UTRs of candidate genes were transcribed from plasmids containing a T7 RNA polymerase promoter, and radioactively labeled. We observed (Fig. 6A) a high affinity between full-length 3’UTR RNAs and hnRNP E1 as well as high affinity between 3’UTRs and all three KH domains; however, p-hnRNP E1 samples had a nearly complete lack of affinity for binding 3’UTR RNA.

To determine the region of 3’UTR that was in contact with hnRNP E1, we utilized a modified exonuclease protection assay with homologous samples of RNA and recombinant hnRNP E1 / p-hnRNP E1 (Fig. 6B). As hnRNP E1 protein was titrated into samples of 3’UTR RNA, we observed a smaller molecular weight RNA band appear and increase in intensity concomitant with hnRNP E1 titration (Fig. 6B). When p-hnRNP E1 was used to protect RNA from digestion, the smaller band observed in hnRNP E1 samples disappeared, supporting our previous findings for a lack of affinity with p-hnRNP E1 (Fig. 6B). After excising hnRNP E1 protected bands, we ligated 3’ and 5’ primers to these digested RNA fragments and performed Sanger sequencing. When sequencing results of these 4 candidate UTR digestion products were aligned with Clustal-omega, we observed a high degree of conservation, particularly with pyrimidine residues (Fig. 6B).

A hallmark of previously resolved regulatory motifs (for other RBPs) in nature is the conservation of the motif sequence throughout orthologous genes of evolutionarily divergent species (Grange, de Sa, Oddos, & Pictet, 1987; Vardhanabhuti, Wang, & Hannenhalli, 2007; Xie et al., 2005). Utilizing this observation, we developed a custom computational algorithm implemented in Python to analyze the conservation of our sequenced element in comparison to orthologous genes across evolutionarily divergent species. We show a graphical presentation generated using WebLogo (Crooks, Hon, Chandonia, & Brenner, 2004; Schneider & Stephens, 1990) for the degree of conservation of each base we obtained from our exonuclease protection assay when this RNA fragment was aligned to orthologous genes of evolutionarily divergent species (Fig. 6C). Alignments revealed further conservation of pyrimidine bases throughout evolution, supporting the data obtained from aligning the sequenced fragments of our four candidate genes. To provide in-vivo evidence of TGFβ-induced loss of affinity between putative regulatory motifs and hnRNP E1, we utilized an RNA pull-down assay where RNA transcripts were synthesized according to our candidate gene sequencing results and attached to CNBr beads. When these RNA motif-coupled beads were incubated with total cellular lysate from NMuMG cells treated with TGFβ, we observed a significant loss of interaction with hnRNP E1 (Fig. 6D). These data provide further evidence for a loss of affinity between this 3’UTR RNA motif and p-hnRNP E1.
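The per-base conservation displayed in a sequence logo can be approximated with a short Python sketch. It is an illustration only, not the custom script used in this study: it computes the information content of each alignment column as the maximum entropy (2 bits for four bases) minus the observed Shannon entropy, omits the small-sample correction, and uses a hypothetical toy alignment.

```python
import math
from collections import Counter

def column_information(aligned_seqs, alphabet="ACGU"):
    """Return information content (bits) for each column of an ungapped-length alignment.

    Characters outside the alphabet (for example '-') are ignored within a column.
    """
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    max_bits = math.log2(len(alphabet))
    info = []
    for i in range(length):
        column = [seq[i] for seq in aligned_seqs if seq[i] in alphabet]
        if not column:
            info.append(0.0)  # column contains only gaps or unknown symbols
            continue
        total = len(column)
        entropy = -sum((n / total) * math.log2(n / total)
                       for n in Counter(column).values())
        info.append(max_bits - entropy)
    return info

# Hypothetical toy alignment: three invariant columns and one fully variable column
toy_alignment = ["CCUA", "CCUG", "CCUC", "CCUU"]
toy_information = column_information(toy_alignment)
```

A fully conserved column scores 2 bits and a column with all four bases at equal frequency scores 0 bits, which corresponds to tall and absent letters in a logo, respectively.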

FIGURE 6

FIGURE 6 LEGEND

Figure 6. EMT-inducing genes bind hnRNP E1 through their 3’UTR and are protected from exonuclease degradation.

A) Autoradiograph analysis of RNA bound to hnRNP E1. Rows marked with “Bound” contain sample that was obtained through elution of recombinant protein by reduced glutathione elution buffer. Rows marked “Input” contain sample that was obtained by aliquoting a portion of the reaction prior to elution to ensure equal loading of radioactive RNA. B) Autoradiograph of exonuclease treated samples that were bound with either hnRNP E1 or p-hnRNP E1; titration is indicated by additional “+” characters. Sanger sequencing of digested fragments protected by hnRNP E1 is shown in alignment below the autoradiograph. C) Evolutionary conservation analysis of sequences obtained by Sanger sequencing. Homologene (NCBI) database was used to obtain orthologous gene sequences for evolutionarily divergent organisms. These sequences were aligned by Clustal-omega using the fragment sequence as an alignment template. Height of nucleic acid base in consensus logo indicates its degree of conservation. D) Immunoblot analysis of hnRNP E1 pulled down by synthetic RNA for each candidate gene from TGFβ (up to 24 hours) treated lysate.

Affinity characterization of putative motifs and hnRNP E1:

To further characterize the affinity of binding between putative RNA motif sequences and hnRNP E1, we performed RNA electromobility shift assays (REMSA). We observed a high affinity for putative motifs and hnRNP E1 (Fig. 7A), and a near total loss of affinity with p-hnRNP E1. After precisely characterizing the binding kinetics and interactions between newly resolved consensus elements, we set out to determine the uniqueness of each resolved element by using the basic local alignment search tool (BLAST) (available at https://blast.ncbi.nlm.nih.gov/) against genomic mRNA databases. We observed no significant hits outside the candidate gene isoforms of mRNA when the sequenced motif element was used as a search parameter (Fig. 7B).

There are similar features found in each of the four candidate motifs; however, their sequence similarity to one another is low. In order to determine a global consensus motif, we needed to alter our biased approach of resolving individual 3’UTRs to an unbiased approach using genomic mRNA from cells.

FIGURE 7

FIGURE 7 LEGEND

Figure 7. hnRNP E1 binding nucleic acid motifs show a total loss of affinity for hnRNP E1 when in its phosphorylated state.

A) RNA electromobility shift assay (REMSA) analysis of synthetic regulatory motif RNA. hnRNP E1 and p-hnRNP E1 were titrated (0 – 20 pMol) with 1 pMol of motif RNA. RNA was labeled with [α-32P]-UTP during transcription, and samples were run on non-denaturing polyacrylamide gel. B) BLASTn analysis of conserved motif results to determine uniqueness of these regulatory motifs in a genomic context. Sanger sequencing results were used as a query sequence (highlighted in red); stringency parameters were lowered to allow for the identification of patterns, rather than specific bases. Queries were performed against a mouse genomic database, containing genomic mRNA sequences.

Genomic binding motif identification of TGFβ / hnRNP E1 regulated genes:

We have characterized a signature protein expression and regulation pattern for TGFβ / hnRNP E1 regulated EMT-inducing genes (Fig. 8A). To identify the nucleic acid structure in the 3’UTR of these EMT-inducing genes, we first utilized a biased approach of resolving one motif per gene. When these 4 resolved motifs were analyzed for similar features, we were unable to obtain statistically significant results, and we set out to modify this assay to incorporate global mRNA rather than synthetically transcribed pools of a homologous 3’UTR RNA. We observed protected fragments of RNA in digestion reactions protected by both hnRNP E1 and p-hnRNP E1; however, a unique band of protected RNA was present in hnRNP E1 and not p-hnRNP E1 reactions (Fig. 8C). This band corresponds to a nearly 50-nucleotide fragment of RNA that was protected from exonuclease digestion by hnRNP E1 and not p-hnRNP E1. Excision of this band and ligation of 3’ and 5’ sequencing primers allowed for high-throughput Ion-torrent sequencing to reveal the genomic motif patterns that interact with the serine 43 region of hnRNP E1. We obtained nearly 800,000 reads of RNA fragments ranging in length from approximately 22 – 54 nucleotides.

FIGURE 8

FIGURE 8 LEGEND

Figure 8. Unbiased exonuclease digestion of hnRNP E1 bound target mRNA allows for high-throughput analysis of protected RNA fragment sequences.

A) Venn diagram depiction of phenotypic expression for hnRNP E1 regulated EMT-inducing genes. B) Flow diagram of unbiased, genomic approach for resolving 3’UTR consensus motif. This modification to common exonuclease assays utilizes homologous recombinant proteins for the sake of lowered background and increased specificity of sequencing results. C) Exonuclease digestion assay for genomic RNA obtained from NMuMG cells (using trizol extraction) and bound to either hnRNP E1 or p-hnRNP E1. p-hnRNP E1 serves as a control to account for the promiscuous nature of hnRNP E1 for binding other RNAs that are not specific to its serine-43 reversible binding region. A unique band found only in samples protected by hnRNP E1 was ligated with 3’ and 5’ sequencing primers and sequenced using Ion-torrent sequencing.

Analysis of high-throughput sequencing results:

We further analyzed Ion-torrent sequencing results by developing a computational algorithm that utilizes Bowtie2 and Bioconductor packages to align sequence reads with their parent genes and derive positional information of the read sequence in the context of parent gene mRNA (Langmead & Salzberg, 2012; Robinson, McCarthy, & Smyth, 2010). After stringent analysis composed of mapping reads to their parent genes, analyzing for evolutionary conservation, and resolving motif position, we obtained motif positions for each of our EMT-inducing genes (Fig. 9A). We have created a heat map to correlate the number of reads from Ion-torrent sequencing for individual genes validated to contain this motif and listed the nucleic acid sequence of the motif, with its highly conserved residues highlighted in red (Fig. 9A). We observed 3 triple pyrimidine repeats that are spaced by 5 – 9 variable nucleotides, supporting our hypothesis that the motif responsible for binding hnRNP E1 (serine 43 region) is highly variable in sequence.
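The pattern described above, three pyrimidine triplets separated by 5 – 9 variable nucleotides, can be expressed as a regular expression. The following Python sketch is a simplified, hypothetical scanner and not the descriptor derived in this study; it treats any C or U as a pyrimidine, ignores secondary structure, and uses an invented test sequence.

```python
import re

# Three pyrimidine (C/U) triplets separated by 5-9 nucleotides of any identity.
# The lookahead lets overlapping candidates be reported.
MOTIF_PATTERN = re.compile(
    r"(?=([CU]{3}[ACGU]{5,9}[CU]{3}[ACGU]{5,9}[CU]{3}))"
)

def find_candidate_motifs(sequence):
    """Return (start, matched_sequence) pairs for candidate motifs in a 3'UTR.

    DNA input is accepted; T is converted to U before scanning.
    """
    rna = sequence.upper().replace("T", "U")
    return [(m.start(1), m.group(1)) for m in MOTIF_PATTERN.finditer(rna)]

# Hypothetical 3'UTR fragment given in DNA letters
toy_utr = "GGAAGCCTAGAGATCCGAAGAGACTCGGAAG"
toy_hits = find_candidate_motifs(toy_utr)
```

Because the pattern is permissive, a scan of this kind would be expected to return many candidates in real 3’UTRs; read support and evolutionary conservation would still be needed to narrow them.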

To further annotate these newly resolved binding motifs, we developed a computational algorithm that analyzed each resolved motif and determined a minimal consensus motif descriptor sequence (Fig. 9B). When this minimal descriptor sequence was analyzed for secondary structure, we observed the formation of a dual single-strand loop, with two G-C helices of variable length (Fig. 9B). This newly determined descriptor motif was analyzed with BLASTn, and we found an approximately 92.7% overlap between BLASTn predicted genes and genes contained in Fig. 9A.

FIGURE 9

FIGURE 9 LEGEND

Figure 9. Genomic exonuclease ION torrent analysis reveals 37 consensus motifs with conserved pyrimidine residues interspaced by variable structural regions.

A) Genes that were identified to be bound to hnRNP E1 and protected from exonuclease digestion. Heat map indicates number of times a sequenced read was mapped to this parent mRNA. Annotated nucleic acid sequences show conserved pyrimidine residues highlighted in red, with interspaced regions annotated to better show alignment. Parent genes were determined for each sequenced read using a custom Python script and a custom 3’UTR database containing mouse genomic 3’UTR sequences from UTRdb. B) Structural depiction of consensus motif predicted by custom Python structural prediction algorithm, generalized hnRNP E1 specific binding motif descriptor. Descriptor sequence was inferred by a statistical comparison of all sequenced reads, followed by evolutionary conservation analysis, performed by a custom Python script.

hnRNP E1 binding characterization of consensus motif:

We developed a computational script implemented in Python to generate a randomized nucleic acid sequence following the constraints of our descriptor motif. We tested 6 randomized consensus elements and found similar binding characteristics with no significant changes in affinity for hnRNP E1, KH1, KH2, KH3, or p-hnRNP E1, and carried out the remainder of experiments utilizing a randomized consensus element with the sequence described in Fig. 10A.
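A generator of this kind can be sketched in a few lines of Python. The sketch below is not the script used in this study, and the descriptor it encodes is a simplified stand-in (three pyrimidine triplets separated by 5 – 9 variable nucleotides) rather than the descriptor of Fig. 9B; the segment encoding and all names are hypothetical.

```python
import random

# IUPAC symbols used by the hypothetical descriptor below
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U",
         "Y": "CU", "R": "AG", "N": "ACGU"}

# Hypothetical descriptor: (IUPAC symbol, minimum repeats, maximum repeats)
DESCRIPTOR = [("Y", 3, 3), ("N", 5, 9), ("Y", 3, 3), ("N", 5, 9), ("Y", 3, 3)]

def random_motif(descriptor=DESCRIPTOR, rng=None):
    """Generate one random RNA sequence that satisfies the descriptor constraints."""
    if rng is None:
        rng = random.Random()
    parts = []
    for symbol, min_len, max_len in descriptor:
        length = rng.randint(min_len, max_len)  # inclusive on both ends
        parts.append("".join(rng.choice(IUPAC[symbol]) for _ in range(length)))
    return "".join(parts)

# Six candidate elements from a seeded generator, so the draw is repeatable
seeded_rng = random.Random(43)
candidates = [random_motif(rng=seeded_rng) for _ in range(6)]
```

Seeding the generator makes a given set of candidate elements reproducible, which is convenient when the same sequences must later be synthesized and tested.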

We hypothesized the highly conserved bases (highlighted in red, Fig. 10A) were the main residues in the motif that were responsible for binding to hnRNP E1 and set out to test this hypothesis using REMSA analysis. Previous data in our lab suggested the point mutation of a residue contained in a putative Dab2 / hnRNP E1 binding motif (determined by systematically breaking apart and shortening the 3’UTR of Dab2) could significantly lower the affinity for hnRNP E1. Using this principle, we mutated the first conserved C residue in each of the three pyrimidine-rich regions (termed cY1, cY2, and cY3, 5’ to 3’, respectively) to determine the overall effect on binding affinity for hnRNP E1, and determine domain specificity for the various KH domains (Fig. 10B). Our wild type consensus motif showed a high affinity for binding hnRNP E1, KH1, KH2, and KH3, and near total loss of affinity for p-hnRNP E1. Upon mutation of cY1, we observed a significantly lower affinity for full-length hnRNP E1, and a complete loss of affinity for KH2, suggesting an interaction between cY1 and KH2. Systematic mutation of a conserved base in cY2 and cY3 revealed interactions between cY2 and KH3 and between cY3 and KH1 (Fig. 10B).
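The construction of the cY1, cY2, and cY3 mutants can be illustrated with a small Python sketch. The wild type sequence, the region coordinates, and the choice of G as the replacement purine are all hypothetical placeholders; the actual consensus sequence is the one given in Fig. 10A.

```python
def mutate_first_c(sequence, region, replacement="G"):
    """Replace the first C found inside region (start, end) with a purine.

    region uses 0-based, end-exclusive coordinates.
    """
    start, end = region
    index = sequence.find("C", start, end)
    if index == -1:
        raise ValueError("no C residue found in the requested region")
    return sequence[:index] + replacement + sequence[index + 1:]

# Hypothetical wild type element: CCU-AGAGA-UCC-GAAGAGA-CUC
TOY_WT = "CCUAGAGAUCCGAAGAGACUC"

# Hypothetical coordinates of the three pyrimidine-rich regions, 5' to 3'
TOY_REGIONS = {"cY1": (0, 3), "cY2": (8, 11), "cY3": (18, 21)}

toy_mutants = {name: mutate_first_c(TOY_WT, span)
               for name, span in TOY_REGIONS.items()}
```

Each mutant differs from the wild type at exactly one position, so any change in binding can be attributed to a single pyrimidine-rich region.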

To characterize this randomized consensus motif in the context of in-vivo protein binding, we attached the wild type motif and its cY mutants to CNBr beads to assess interaction with cellular hnRNP E1 from TGFβ-treated cells. We observed a high affinity interaction of hnRNP E1 and our wild type motif, along with a weakening of affinity for each of the mutations induced in cY regions (Fig. 10C). Mutations to conserved regions in motif elements cause a substantial loss of affinity for hnRNP E1 and could potentially mimic the effect of p-hnRNP E1.

In-vitro translation effects of conserved motif mutations and p-hnRNP E1:

We have developed a reporter assay for protein translation utilizing luciferase mRNA with a regulatory element for hnRNP E1 ligated to its 3’ end. This system mimics the translational repression mechanism and allows for a better contextual analysis of translational inhibition induced through hnRNP E1 regulatory complex and the nucleic acid motif. Translation of luciferase was completely inhibited with the addition of 1 pMol luciferase-motif RNA, 4 pMol of hnRNP E1, and 1 pMol eEF1A1 (Fig. 10D). When a similar ratio of p-hnRNP E1 to eEF1A1 was incubated with luciferase-motif RNA, no inhibition of protein translation occurred, further strengthening our observation of this motif’s specificity of binding to hnRNP E1. Subsequent mutations to conserved pyrimidine residues caused a slight decrease in the translation of luciferase; however, they were incapable of complete inhibition (Fig. 10D).

FIGURE 10

FIGURE 10 LEGEND

Figure 10. Conservation of descriptor motif features in randomized synthetic RNA maintains regulatory motif properties in protein translation.

A) A randomized hnRNP E1-specific consensus motif generated by a custom Python script capable of randomizing oligonucleotides based on descriptor constraints. This sequence was used as the “WT-Consensus” RNA for experiments contained in this figure. B) RNA electromobility shift assay (REMSA) for WT-Consensus and subsequent mutations at conserved pyrimidine residues. Recombinant K-homology (KH) domains of hnRNP E1 were used to determine interactions with conserved motif regions. C) RNA pull-down analysis of WT-consensus motif compared to a bona-fide Dab2 Motif element. NMuMG cells were treated with 5 ng/mL TGFβ for up to 6 hours, and hnRNP E1 interactions in the cellular lysate were measured through pull-down by various synthetic RNA constructs. D) In-vitro translation assay of luciferase mRNA coupled to various 3’UTR regulatory elements. Luciferase_Dab2-Motif mRNA was used as a control to compare strength of translation inhibition amongst WT-consensus and its various mutations.

Contrasting expression of Akt2 / p-Akt2 and hnRNP E1 / p-hnRNP E1 in a metastatic progression model:

We have characterized and identified nucleic acid motifs that are responsible for the reversible translational stalling through binding to hnRNP E1. To further understand the capacity of this mechanism in a biological context, we set out to analyze key members of the non-canonical TGFβ pathway that drive the phosphorylation of hnRNP E1 at Ser43. We have established a clear relationship between Akt2 / p-Akt2 levels, and hnRNP E1 / p-hnRNP E1 levels in the context of metastatic progression. We utilized the 67NR, 4T07, and 4T1 tumor progression model and NMuMG cells to show protein expression in normal epithelial, tumorigenic, and metastatic cells (Wendt, Smith, & Schiemann, 2010). Levels of Akt2 increased as a function of metastatic potential in the progression series, and remained high in NMuMG cells. To assess the activity of Akt2, we measured levels of its phosphorylated serine-474 residue (p-Akt2). NMuMG cells show virtually no p-Akt2 levels without TGFβ treatment, and when stimulated, exhibit a significant increase in activity (Fig. 11A). While responsive to TGFβ, the progression cell lines do not exhibit such a large shift in the activity of Akt2; it appears to remain constitutively active (Fig. 11A). We next set out to measure the overall levels of hnRNP E1 in these various lines and determine a ratio with p-hnRNP E1 as a function of metastatic capacity. We observed a significant level of hnRNP E1 expression in NMuMG cells that was not influenced by TGFβ, and a lower expression as the metastatic potential increased (Fig. 11A). Conversely, we observed a low expression of p-hnRNP E1 in NMuMG lines, and an increase across the progression series (Fig. 11A). p-hnRNP E1 levels are greatly increased by treatment with TGFβ in NMuMG and 4T07 lines, and not significantly altered in 67NR and 4T1 lines (Fig. 11A). These observations suggest that a cell capable of undergoing non-canonical TGFβ-induced EMT must have a sufficiently high level of Akt2, and its activity must be responsive to stimulation.

Ultimately, the deciding factor for metastatic cells appears to be the ratio of hnRNP E1 and p-hnRNP E1; we show a greatly increased ratio in 4T1 lines, which are known to be highly metastatic and aggressive.

To further understand the signature expression of these proteins in the context of well-characterized human cancer lines, we set out to assay in-vitro levels in multiple colon and breast cancer cell lines of varying metastatic capacity. We observed a high correlation of high Akt2, p-Akt2, and p-hnRNP E1 levels, along with a low hnRNP E1 expression in the most aggressive breast and colon lines as shown (Fig. 11B). These observations suggest a high correlation between non-canonical TGFβ signaling and increased metastatic potential through Akt2 activity, and p-hnRNP E1 levels.

FIGURE 11

FIGURE 11 LEGEND

Figure 11. Non-canonical TGFβ signaling cascades are significantly upregulated in metastatic cancer lines.

A) Immunoblot analysis of Akt2 and hnRNP E1 in NMuMG cells and a metastatic progression model made up of 67NR, 4T07, and 4T1 lines. 67NR cells are tumorigenic and not metastatic, 4T07 are metastatic, yet the metastases do not form secondary tumors, and 4T1 cells are metastatic and form secondary tumors. Akt2 activation is measured by assaying phosphorylated serine-474 (p-Akt2), and p-hnRNP E1 levels are analyzed through a combinatorial α-hnRNP E1 immunoprecipitation followed by α-Akt2 phospho-substrate immunoblotting. B) Human colon and breast cancer cell lines of varying metastatic potential were analyzed for non-canonical TGFβ activation. We utilized the progression series analysis as a means of correlating metastatic potential to human tumor cell lines.

DISCUSSION

This study has identified a nucleic acid descriptor sequence that specifically binds hnRNP E1 in a phospho-serine 43 dependent manner. We have further identified, on a genomic scale, all genes that contain this unique motif and precisely mapped its location within the 3’UTR. We have identified 3 conserved regions of triple-pyrimidine repeats that are separated by a highly variable region and mapped their interaction with the various KH domains of hnRNP E1. The mutation of a conserved pyrimidine residue to a purine causes a significant loss of affinity for hnRNP E1, opening the possibility for mutations that are incorporated into the genome to cause a loss of regulation for EMT-inducing genes.

This study has provided further insight into the kinetics of binding for hnRNP E1 and its target binding motifs. We have identified two aspects of this repression system that can induce its dissociation and cause a loss of translational repression, mutations to the conserved bases in target 3’UTR motifs or increased p-hnRNP E1 levels. As demonstrated by our group and others, the interaction between Akt2 and hnRNP E1 is of profound importance to driving EMT and cancer metastasis (Hills & Squires, 2010; Miettinen et al., 1994; Thiery, 2002). We have provided compelling evidence that Akt2 levels shift TGFβ signaling to induce the phosphorylation of serine 43 on hnRNP E1, facilitating its release from a transnationally silencing complex. We have shown a clear correlation between the metastatic potential of cultured human cancer cells and their levels of Akt2, and hnRNP E1, and further profiled the activation of Akt2, and levels of p-hnRNP E1 in the context of increasing metastatic potential.

We show a potentially deadly signature of low hnRNP E1, a high ratio of p-hnRNP E1, and constitutively high Akt2 and p-Akt2. We hypothesize that tumors showing this non-canonical activation signature would show substantially elevated protein expression of EMT-inducing genes.

Normal epithelial tissue has a high potential to take on the characteristics of highly metastatic tissues through its levels of Akt2 and hnRNP E1; however, we do not see the necessary signature of activation without stimulation by TGFβ (Hills & Squires, 2010; Hussey et al., 2012b; Thiery, 2002). Levels of p-Akt2 and p-hnRNP E1 are constitutively low in normal epithelial tissue, yet treatment with TGFβ increases protein expression to levels observed in tumorigenic and metastatic cultured cells. There are currently no protein expression profiles in any of the larger cancer signature databases that contain an array to detect TGFβ-regulated EMT-inducing genes.

A future goal of this study is to develop a high-throughput assay capable of detecting activation of Akt2 and levels of p-hnRNP E1 and to correlate these with levels of EMT-inducing gene protein expression in patient tissue samples. This will provide further evidence of this pathway’s involvement in the progression of cancer. A current hurdle to the development of a high-throughput assay of this nature is the detection of p-hnRNP E1. Currently, the only method to detect this modification involves time-consuming immunoprecipitation coupled to immunoblot analysis with an Akt2 substrate recognition antibody. There is no commercially available antibody capable of detecting p-hnRNP E1, and the current two-pronged detection method would not be suitable for a high-throughput screen. We have attempted to create polyclonal antibodies specific to p-hnRNP E1; unfortunately, they have shown no specificity for p-hnRNP E1. We have now begun development of a p-hnRNP E1 specific mouse monoclonal antibody, and preliminary bleed data are positive. We hope to utilize this candidate antibody in future diagnostic methods to show the clinical relevance of p-hnRNP E1 in cancer.

The development of diagnostic clinical screens and the implication of p-hnRNP E1 in cancer will prove useful for identifying a new target for therapeutic treatment. Current drugs on the market capable of lowering p-hnRNP E1 levels target this pathway at early points by inhibiting Akt2 or PI3K. Unfortunately, these drugs have many adverse side effects due to their broad inhibitory nature; many pathways driven by Akt2 and PI3K are of vital importance to cellular regulation. Inhibition at these points has proven problematic in a clinical context because of the variety of complications that develop from prolonged use. The discovery of a regulon controlled by hnRNP E1 offers a potential target site for a small-molecule inhibitor. We hypothesize that treating cells with a specific inhibitor of the phosphorylation of serine 43 of hnRNP E1 would prove to be more therapeutic than a generalized Akt2 or PI3K inhibitor. We have clearly demonstrated the ability of hnRNP E1 knockout to drive metastasis, and show a similar effect by treating NMuMG cells with TGFβ. Development of a specific p-hnRNP E1 inhibitor would therefore prove invaluable in the treatment of patients afflicted with this condition.

ACKNOWLEDGMENTS

We thank Dr. George Hussey, Dr. Bidyut Mohanty, and other members of our laboratory for helpful comments and critical insights. We also thank the Medical University of South Carolina’s Genomic Sequencing Core Facility for the ION Torrent sequencing performed by Dr. Jamie Barth, Ph.D.

FUNDING

This work was supported by the grants CA55536 and CA154664 from the National Cancer Institute (to Philip H. Howe, Ph.D.).

REFERENCES

Agarwal, E., Brattain, M. G., & Chowdhury, S. (2013). Cell survival and metastasis regulation by Akt signaling in colorectal cancer. Cellular signalling, 25(8), 1711-1719.

Burd, C. G., & Dreyfuss, G. (1994). Conserved structures and diversity of functions of RNA-binding proteins. Science, 265(5172), 615-621.

Chaudhury, A., Hussey, G. S., Ray, P. S., Jin, G., Fox, P. L., & Howe, P. H. (2010). TGF-β-mediated phosphorylation of hnRNP E1 induces EMT via transcript-selective translational induction of Dab2 and ILEI. Nature cell biology, 12(3), 286-293.

Chen, X.-F., Zhang, H.-J., Wang, H.-B., Zhu, J., Zhou, W.-Y., Zhang, H., . . . Zhang, L. (2012). Transforming growth factor-β1 induces epithelial-to-mesenchymal transition in human lung cancer cells via PI3K/Akt and MEK/Erk1/2 signaling pathways. Molecular biology reports, 39(4), 3549-3556.

Chin, Y. M. R., Balk, S., & Toker, A. (2013). Akt2-specific signaling in prostate cancer maintenance. Cancer research, 73(8 Supplement), 4090.

Corcoran, D. L., Georgiev, S., Mukherjee, N., Gottwein, E., Skalsky, R. L., Keene, J. D., & Ohler, U. (2011). PARalyzer: definition of RNA binding sites from PAR-CLIP short-read sequence data. Genome Biol, 12(8), R79.

Crooks, G. E., Hon, G., Chandonia, J. M., & Brenner, S. E. (2004). WebLogo: a sequence logo generator. Genome Res, 14(6), 1188-1190. doi: 10.1101/gr.849004

Glisovic, T., Bachorik, J. L., Yong, J., & Dreyfuss, G. (2008). RNA-binding proteins and post-transcriptional gene regulation. FEBS letters, 582(14), 1977-1986.

Grange, T., de Sa, C. M., Oddos, J., & Pictet, R. (1987). Human mRNA polyadenylate binding protein: evolutionary conservation of a nucleic acid binding motif. Nucleic acids research, 15(12), 4771-4787.

Hafner, M., Landthaler, M., Burger, L., Khorshid, M., Hausser, J., Berninger, P., . . . Munschauer, M. (2010a). PAR-CliP - a method to identify transcriptome-wide the binding sites of RNA binding proteins. Journal of visualized experiments: JoVE, (41).

Hafner, M., Landthaler, M., Burger, L., Khorshid, M., Hausser, J., Berninger, P., . . . Munschauer, M. (2010b). Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell, 141(1), 129-141.

Hills, C. E., & Squires, P. E. (2010). TGF-beta 1-induced epithelial-to-mesenchymal transition and therapeutic intervention in diabetic nephropathy. American journal of nephrology, 31(1), 68-74.

Ho, J., & Marsden, P. A. (2014). Competition and collaboration between RNA-binding proteins and microRNAs. Wiley Interdisciplinary Reviews: RNA, 5(1), 69-86.

Huber, M. A., Kraut, N., & Beug, H. (2005). Molecular requirements for epithelial–mesenchymal transition during tumor progression. Current opinion in cell biology, 17(5), 548-558.

Hussey, G. S., Chaudhury, A., Dawson, A. E., Lindner, D. J., Knudsen, C. R., Wilce, M. C., . . . Howe, P. H. (2011). Identification of an mRNP complex regulating tumorigenesis at the translational elongation step. Molecular cell, 41(4), 419-431.

Hussey, G. S., Link, L. A., Brown, A. S., Howley, B. V., Chaudhury, A., & Howe, P. H. (2012). Establishment of a TGFβ-induced post-transcriptional EMT gene signature. PLoS One, 7(12), e52624.

Langmead, B., & Salzberg, S. L. (2012). Fast gapped-read alignment with Bowtie 2. Nature methods, 9(4), 357-359.

Lee, J. M., Dedhar, S., Kalluri, R., & Thompson, E. W. (2006). The epithelial–mesenchymal transition: new insights in signaling, development, and disease. The Journal of cell biology, 172(7), 973-981.

Massagué, J. (2000). How cells read TGF-β signals. Nature reviews Molecular cell biology, 1(3), 169-178.

Massagué, J. (2012). TGFβ signalling in context. Nature reviews Molecular cell biology, 13(10), 616-630.

Miettinen, P. J., Ebner, R., Lopez, A. R., & Derynck, R. (1994). TGF-beta induced transdifferentiation of mammary epithelial cells to mesenchymal cells: involvement of type I receptors. The Journal of cell biology, 127(6), 2021-2036.

Mulholland, D. J., Kobayashi, N., Ruscetti, M., Zhi, A., Tran, L. M., Huang, J., . . . Wu, H. (2012). Pten loss and RAS/MAPK activation cooperate to promote EMT and metastasis initiated from prostate cancer stem/progenitor cells. Cancer research, 72(7), 1878-1889.

Robinson, M. D., McCarthy, D. J., & Smyth, G. K. (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 26(1), 139-140.

Schneider, T. D., & Stephens, R. M. (1990). Sequence logos: a new way to display consensus sequences. Nucleic Acids Res, 18(20), 6097-6100.

Thiery, J. P. (2002). Epithelial–mesenchymal transitions in tumour progression. Nature Reviews Cancer, 2(6), 442-454.

Vardhanabhuti, S., Wang, J., & Hannenhalli, S. (2007). Position and distance specificity are important determinants of cis-regulatory motifs in addition to evolutionary conservation. Nucleic Acids Research, 35(10), 3203-3213.

Wendt, M. K., Smith, J. A., & Schiemann, W. P. (2010). Transforming growth factor-beta-induced epithelial-mesenchymal transition facilitates epidermal growth factor-dependent breast cancer progression. Oncogene, 29(49), 6485-6498. doi: 10.1038/onc.2010.377

Xie, X., Lu, J., Kulbokas, E., Golub, T. R., Mootha, V., Lindblad-Toh, K., . . . Kellis, M. (2005). Systematic discovery of regulatory motifs in human promoters and 3′ UTRs by comparison of several mammals. Nature, 434(7031), 338-345.

Xu, J., Lamouille, S., & Derynck, R. (2009). TGF-β-induced epithelial to mesenchymal transition. Cell research, 19(2), 156-172.

Xue, G., Restuccia, D. F., Lan, Q., Hynx, D., Dirnhofer, S., Hess, D., . . . Hemmings, B. A. (2012). Akt/PKB-mediated phosphorylation of Twist1 promotes tumor metastasis via mediating cross-talk between PI3K/Akt and TGF-β signaling axes. Cancer discovery, 2(3), 248-259.

CHAPTER IV

Development of a Phosphorylated Serine 43 (hnRNP E1) Specific Antibody and Implications of Expression in Metastatic Cancer

ABSTRACT

The shift of transforming growth factor β (TGFβ) from an inducer of apoptosis to an inducer of proliferation and cancer is a process driven by abnormal non-canonical downstream signaling. TGFβ signals through its non-canonical V-Akt murine thymoma viral oncogene homolog 2 (Akt2) pathway and drives the phosphorylation of serine 43 of hnRNP E1 (p-hnRNP E1), facilitating its release from a translational repression mechanism acting on genes that drive epithelial to mesenchymal transition (EMT). Our group has identified genes regulated by hnRNP E1 / TGFβ that contain a unique nucleic acid motif in the 3’ untranslated region (UTR) of their mRNA. This regulatory motif is responsible for the reversible binding of hnRNP E1 and the subsequent loss of binding by p-hnRNP E1. We have further characterized the pathways that cause an over-accumulation of p-hnRNP E1, reprogramming the cell to initiate EMT. Screens of metastatic progression series have yielded a signature pattern of low protein expression of hnRNP E1, a high ratio of p-hnRNP E1, and high levels of Akt2 protein and activity. We have demonstrated that knockdown of hnRNP E1 in normal epithelial tissue drives cancer metastasis and invasiveness; a similar effect occurs in tissue with high Akt2 levels and high non-canonical TGFβ downstream activation. These observations and findings have provided the basis for the development of a high-throughput, multiplexed protein array assay capable of detecting the metastatic potential of a patient’s tumor tissue. A necessary component of this array is an antibody capable of specifically detecting p-hnRNP E1 and not its unmodified counterpart. We have utilized a unique method to generate a phospho-specific antibody to detect p-hnRNP E1 in conjunction with numerous other members of the non-canonical TGFβ-EMT inducing pathway.

INTRODUCTION

Heterogeneous nuclear ribonucleoproteins (hnRNPs) are a family of RNA binding proteins (RBPs) that regulate a variety of cellular processes, from RNA processing to protein translation, and are increasingly shown to interact with 3’ and 5’ untranslated regions (UTRs) to regulate gene expression (Chaudhury, Chander, et al., 2010; Díaz-Moreno et al., 2009; Gherzi et al., 2006; Kosturko et al., 2006; Ostareck-Lederer & Ostareck, 2004; Thiele et al., 2004; Zhang, 2011). Of the multiple members of this family, our group has identified hnRNP E1 as a potent mediator of epithelial to mesenchymal transition (EMT) (Chaudhury, Hussey, et al., 2010b; Hussey et al., 2011).

Previous reports have shed light on a unique translational repressor mechanism in which the 3’UTRs of multiple EMT-inducing genes bind hnRNP E1 (Chaudhury, Hussey, et al., 2010b; Hussey et al., 2011). This developmentally necessary regulatory system is of major importance in cellular development; translation is stalled at the elongation stage. This energy-efficient mechanism allows for the rapid translation of stalled transcripts when environmental stress is encountered. We have determined that phosphorylation of serine 43 on hnRNP E1 (p-hnRNP E1) induces the release of translational repression, allowing a previously stalled ribosomal / RNA translation complex to proceed; when analyzed in a polyribosomal sedimentation assay, target mRNA is found enriched in polyribosomal fractions (Chaudhury, Chander, et al., 2010; Chaudhury, Hussey, et al., 2010b; Hussey et al., 2011; Hussey et al., 2012a).

Multiple observations by many different groups have implicated transforming growth factor β (TGFβ) as a potent inducer of EMT (J. D. Ho et al., 2013; Iwano, 2009; Kasai, Allen, Mason, Kamimura, & Zhang, 2005; Padua & Massagué, 2009; Zeisberg et al., 2003). Our group has consistently shown convincing evidence to support a TGFβ-mediated shift in the ratio of hnRNP E1 to p-hnRNP E1. We have delineated the pathway through which non-canonical signaling of TGFβ is driven by V-Akt murine thymoma viral oncogene homolog 2 (Akt2) being present at a sufficient protein level (Cheng et al., 1996; Zavadil & Böttinger, 2005). This increase in Akt2 causes a shift from the canonical apoptotic phenotype displayed by epithelial cells stimulated by TGFβ to a proliferative, pro-EMT, pro-invasive metastatic phenotype (Chaudhury, Chander, et al., 2010; Chaudhury, Hussey, et al., 2010b; Hussey et al., 2011; Hussey et al., 2012a; Song, Sheng, Zhang, Jiao, & Li, 2014).

Akt2 is a serine / threonine kinase composed of Src homology 2-like (SH2-like) domains. Akt2 has been shown to be significantly overexpressed in metastatic / tumorigenic tissue, and phosphorylation of its serine 474 site (p-Akt2) has been shown to be a readout of metastatic activity (Bellacosa et al., 1995; Engelman, 2009; Liu et al., 1998). Akt2 is a mediator of many different cellular processes. Our group has discovered that Akt2 directly phosphorylates hnRNP E1 (serine 43), facilitating the release of hnRNP E1 from its 3’UTR silencing complex and inducing translation of genes that drive EMT and cancer metastasis (Chaudhury, Chander, et al., 2010; Chaudhury, Hussey, et al., 2010b; Hussey et al., 2011; Hussey et al., 2012a).

We have defined a nucleic acid regulatory motif that is located in the 3’UTRs of EMT-inducing genes. Genes that contain this motif have been identified, and the motif location has been precisely mapped (Table 2). Through multiple in-vitro assays, we were able to show the binding kinetics of this motif for hnRNP E1 and p-hnRNP E1, and that the loss of affinity for p-hnRNP E1 is sufficient to relieve translational repression, as identified in in-vitro translation assays (Chaudhury, Hussey, et al., 2010b; Hussey et al., 2011). The elucidation of this pathway and its key factors has presented the need to develop a currently unavailable p-hnRNP E1 specific antibody. The current method to detect p-hnRNP E1 via western blot analysis is to immunoprecipitate with α-hnRNP E1 and subsequently analyze with an Akt2 substrate specific antibody that recognizes the serine 43 phosphorylation of p-hnRNP E1. This time-consuming, complex method for detecting p-hnRNP E1 has presented a technical challenge to the development of clinical screens to correlate levels of endogenous p-hnRNP E1 and other mediators of this TGFβ-driven pathway in metastatic cancer tissue.

TABLE 2

We have shown a clear correlation between the ratio of hnRNP E1 : p-hnRNP E1 and metastatic potential in human cancer cells and a mouse metastatic progression model. In addition to a shift toward a higher ratio of p-hnRNP E1, we have observed high Akt2 levels and activity as a predictor of metastatic potential. This has driven the need to develop a p-hnRNP E1 specific antibody, coupled to analysis of other members driving TGFβ-mediated EMT, such as Akt2 and p-Akt2. Previous attempts by our lab to develop a rabbit polyclonal p-hnRNP E1 specific antibody were hindered by the antibody’s inability to distinguish between hnRNP E1 and p-hnRNP E1. These attempts have given insight into the necessary modifications to standard antibody production techniques.

In this manuscript, we have developed a method to specifically raise a mouse monoclonal antibody that is capable of detecting serine 43 phosphorylation of hnRNP E1. We have utilized recombinant protein expression techniques to greatly enhance the titer of p-hnRNP E1 specific antibodies and have begun using preliminary bleed sera in the development of our multidimensional metastasis potential array.

EXPERIMENTAL PROCEDURES

Recombinant protein expression – E. coli DH5α cells containing an isopropyl β-D-1-thiogalactopyranoside (IPTG)-inducible recombinant plasmid (containing various peptide expression open reading frames) were grown to an optical density at 600 nm of 0.6, then treated with 1 µM IPTG for 4 hours at 37°C to induce protein expression. After induction, cells were lysed with a bacterial lysis buffer (50 mM Tris-HCl pH = 7.5, 300 mM NaCl, 20 mM imidazole, 0.05% β-mercaptoethanol, 0.5% Triton X-100, 1 mM PMSF, 0.5 µg/mL, 10 µg/mL aprotinin) for 30 minutes at 4°C. To complete cellular lysis and clear bacterial debris, lysed cells were centrifuged at an RCF of 100,000 x g. Recombinant proteins contained a GST-tag with a PreScission (GE Healthcare) protease cleavage site between the expressed peptide and the GST-tag. Proteins were utilized in a variety of downstream analytic methods.

In-vitro kinase assay – Recombinant KH1 and hnRNP E1 were utilized as substrates and prepared as previously described. Recombinant substrates were incubated with Akt2 enzyme (Signal Chem) in kinase buffer (25 mM Tris-HCl pH = 7.5, 5 mM β-glycerophosphate, 10 µM ATP, 2 mM dithiothreitol, 0.1 mM Na3VO4, 10 mM MgCl2) for 15 minutes at 30°C. To purify substrate proteins, glutathione agarose was added to the halted kinase reaction and incubated for 1 hour at 4°C. Agarose beads were washed and subjected to PreScission (GE Healthcare) protease cleavage following the manufacturer’s specifications. The supernatant of the protease reaction was reclaimed, concentrated in a 3 kDa molecular weight cutoff filtration column, and stored in 50% glycerol at -80°C.
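The kinase buffer recipe above lists final concentrations only. A minimal sketch of the dilution arithmetic (C1·V1 = C2·V2) is given here; the stock concentrations, the helper name `stock_volumes_ul`, and the 50 µL reaction volume are illustrative assumptions, not values taken from this work.

```python
# Final concentrations from the kinase buffer recipe in the text, in mM.
KINASE_BUFFER_MM = {
    "Tris-HCl pH 7.5": 25.0,
    "beta-glycerophosphate": 5.0,
    "ATP": 0.010,  # 10 µM
    "dithiothreitol": 2.0,
    "Na3VO4": 0.1,
    "MgCl2": 10.0,
}

# HYPOTHETICAL stock concentrations in mM; replace with the stocks on hand.
STOCKS_MM = {
    "Tris-HCl pH 7.5": 1000.0,
    "beta-glycerophosphate": 500.0,
    "ATP": 10.0,
    "dithiothreitol": 1000.0,
    "Na3VO4": 100.0,
    "MgCl2": 1000.0,
}


def stock_volumes_ul(final_mm, stock_mm, total_ul):
    """Return the volume (µL) of each stock needed for total_ul of buffer.

    Uses C1 * V1 = C2 * V2 for every component; whatever volume is left
    over is reported as the remainder available for water, substrate,
    and enzyme.
    """
    volumes = {}
    for name, final in final_mm.items():
        stock = stock_mm[name]
        if final > stock:
            raise ValueError(f"stock for {name} is more dilute than the target")
        volumes[name] = final / stock * total_ul
    remainder = total_ul - sum(volumes.values())
    if remainder < 0:
        raise ValueError("stocks are too dilute to reach the target volume")
    volumes["remainder (water, substrate, enzyme)"] = remainder
    return volumes


if __name__ == "__main__":
    # 50 µL is an assumed reaction volume used only for illustration.
    for name, vol in stock_volumes_ul(KINASE_BUFFER_MM, STOCKS_MM, 50.0).items():
        print(f"{name}: {vol:.2f} µL")
```

Keeping the recipe and the stocks in separate tables means the same function can be reused for the lysis or RNA-protein binding buffers by swapping in their concentrations.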

Separation of phosphorylated proteins – Incomplete kinase activity on substrate proteins in the in-vitro kinase assay presented the need to further purify p-hnRNP E1 from hnRNP E1. Our lab has previously developed in-vitro binding techniques using a previously delineated nucleic acid consensus sequence shown to bind only hnRNP E1 and to have virtually no affinity for p-hnRNP E1. We transcribed synthetic RNA oligonucleotides utilizing a T7 transcriptional promoter and attached them to CNBr-agarose beads (GE Healthcare) following the manufacturer’s recommended procedures. A column was packed with approximately 5 mL of RNA-bound resin and equilibrated with RNA-protein binding buffer (50 mM Tris-HCl pH = 7.5, 100 mM KCl, 4% glycerol, 1 mM EDTA). Samples from the kinase reaction containing either hnRNP E1 and p-hnRNP E1 or KH1 and p-KH1 were run through the column; the flow through was collected and confirmed to contain p-hnRNP E1. We confirmed removal of hnRNP E1 by running a high salt (300 mM NaCl) elution buffer and analyzing the eluate for hnRNP E1. To confirm complete removal of hnRNP E1, we ran the flow through back into the column multiple times until hnRNP E1 was no longer detectable by western blot analysis.

Methodology for production of antibody – We obtained approximately 3 mg of purified p-KH1 peptide to be used for inoculation into BALB/c mice. Five individual BALB/c mice were inoculated with p-KH1 peptide. After multiple additional inoculations, preliminary bleed sera were obtained and analyzed using enzyme linked immunosorbent assay (ELISA) to measure affinity to a phosphopeptide containing the sequence KRIREE(pS)GARINISC and its non-phosphorylated isoform KRIREESGARINISC to assess specificity for p-KH1. Additional analysis to confirm antibody specificity was performed by western blot. Thirty µg of total cellular lysate from NMuMG cells treated with 5 ng/mL TGFβ alone, or with 5 ng/mL TGFβ plus 10 µM LY294005 (potent inhibitor of TGFβ-induced hnRNP E1 serine 43 phosphorylation), was analyzed using denaturing polyacrylamide gel electrophoresis. A commercially available α-hnRNP E1 (Abnova) antibody was used to confirm levels of total hnRNP E1; preliminary bleed sera were used to analyze p-hnRNP E1 levels and applied at a dilution of 1:1000 in 5% bovine serum albumin / PBS.

RESULTS

This work highlights key regulatory components of a highly metastatic non-canonical TGFβ downstream activation of PI3K / Akt2. We have shown, in both in-vitro and cellular models, the various components of this pathway that ultimately lead to accumulation of p-hnRNP E1 and a loss of its translational repression. Translational repression is lost through perturbation of the affinity between hnRNP E1 and target 3’UTR regulatory motifs. To this end, we have identified signature protein expression / modification patterns as a function of increasing metastatic potential. We have shown that a high ratio of p-hnRNP E1 : hnRNP E1 is a strong predictor of invasiveness and metastatic potential. Our efforts to produce a monoclonal p-hnRNP E1 specific antibody have proven successful, and we have taken the preliminary steps toward the development of a high-throughput clinical tissue analysis to measure levels and states of proteins implicated in this pathway.

Our unique approach to developing a phospho-specific antibody has yielded a highly specific mouse monoclonal antibody against p-hnRNP E1 (Ser43) that we plan to use in future clinical analyses of tumor tissues to measure the activation of this non-canonical TGFβ pathway.

High p-hnRNP E1 levels have recently been implicated in a variety of other pathologic cellular conditions. Developing a specific antibody to assay hnRNP E1 in this modified state, although difficult, has replaced the tedious methods previously required to assay this modification. Previous methods to assay p-hnRNP E1 included immunoprecipitation from cellular lysate with α-hnRNP E1 and subsequent western blot with an Akt2 substrate kinase recognition antibody (detects pSer 43). These necessary steps previously inhibited the development of a high-throughput screen, and we have taken measures to ensure versatile usage of our p-hnRNP E1 antibody. To date, we have tested this antibody in both ELISA and western blot analysis techniques. Both have proven highly successful and yielded results that indicate a high degree of specificity to p-hnRNP E1.

Producing our antibody required a large quantity of highly purified phosphorylated antigen. We decided to use recombinantly expressed KH1 (domain of hnRNP E1) as this small (~10 kDa) peptide contains the serine 43 site of phosphorylation by Akt2. Previous assays developed in our lab were used to phosphorylate KH1 by use of recombinant Akt2 enzyme. Figure 12A is a western blot analysis showing phosphorylation of serine 43 on KH1 (p-KH1). Recombinant Akt2 incubated with KH1 in the presence of ATP causes an accumulation of p-KH1. Previous experimentation in our lab resulted in the optimization of this assay on the substrate protein hnRNP E1; this was included in this confirmation to ensure a similar level of kinase activity by Akt2 on both substrates. We are able to detect p-KH1 levels by use of α-RXRXXpS antibody in western blots where purified recombinant proteins were used. Figure 12B is an RNA pull down assay demonstrating the specificity of binding between our consensus nucleic acid motif (5’ – [W]3-7 – [S]2 – [N]2-4 – CYY – [N]2-4 – [S]2-4 – [N]2-4 – CYY – [N]2-4 – [S]2-4 – [N]2-4 – CYY – [N]2-4 – [S]2 – [W]>3 – 3’) and KH1 and lack of binding to p-KH1. We show a strong affinity for KH1 and our nucleic acid motif, and a near complete loss of affinity for p-KH1. We exploit this phenomenon in the preparation of p-KH1 antigen to obtain a highly pure source.
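The consensus motif above is written in IUPAC nucleotide codes (W = A/U, S = G/C, Y = C/U, N = any base) with repeat ranges, so it can be expressed as a regular expression. The following is a minimal sketch of that translation, not the search procedure used in this work; the function names are illustrative. Read literally, [W]>3 asks for at least four trailing A/U bases, while the probe sequence given in the Figure 12 legend ends in three (UUA), so the sketch exposes that minimum as a parameter rather than assuming one reading.

```python
import re

# IUPAC codes used by the motif, expanded for an RNA alphabet.
IUPAC_RNA = {"W": "[AU]", "S": "[GC]", "N": "[ACGU]", "Y": "[CU]"}

# Probe sequence from the Figure 12 legend, written without separators.
PROBE = "AAUUGCUACCCAAUGCCUGGCCCAAGGGCAUUCCCAAAGCUUA"


def build_motif_regex(min_trailing_w=4):
    """Compile the consensus motif into a regular expression.

    min_trailing_w is the minimum length of the 3' A/U run; 4 is the
    literal reading of "[W]>3" (an assumption about the notation).
    """
    w, s, n, y = (IUPAC_RNA[code] for code in "WSNY")
    cyy = f"C{y}{y}"
    parts = [
        f"{w}{{3,7}}", f"{s}{{2}}", f"{n}{{2,4}}", cyy,
        f"{n}{{2,4}}", f"{s}{{2,4}}", f"{n}{{2,4}}", cyy,
        f"{n}{{2,4}}", f"{s}{{2,4}}", f"{n}{{2,4}}", cyy,
        f"{n}{{2,4}}", f"{s}{{2}}", f"{w}{{{min_trailing_w},}}",
    ]
    return re.compile("".join(parts))


def contains_motif(sequence, min_trailing_w=4):
    """Return True if the motif occurs anywhere in a DNA or RNA sequence."""
    rna = sequence.upper().replace(" ", "").replace("T", "U")
    return build_motif_regex(min_trailing_w).search(rna) is not None


if __name__ == "__main__":
    print(contains_motif(PROBE, min_trailing_w=3))
    print(contains_motif(PROBE, min_trailing_w=4))
```

The same function can be applied to a 3’UTR sequence to flag candidate sites, and substituting a purine into one of the CYY triplets of the probe is a quick way to check that the pattern is sensitive to the conserved pyrimidines.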

Figure 12C is a western blot analysis of 5 fractions of high salt eluate from our nucleic acid-coupled agarose column used in preparation of our p-KH1 antigen. Flow through from our purification column was repeatedly recycled through the column. After each pass, a high salt wash was used to remove KH1 bound to the nucleic acid-coupled resin, and these fractions were analyzed by western blot with α-hnRNP E1 (epitope contained within KH1). After 5 repeated cycles of KH1 removal, we determined our sample to be highly enriched for p-KH1. Figure 12D is a final confirmation of our purified p-KH1 pool, analyzed by western blot with α-RXRXXpS and compared with a non-enriched pool of in-vitro kinase phosphorylated KH1.

FIGURE 12

[Figure 12, panels A-D. A: western blot, α-RXRXXpS. B: RNA pulldown (lanes: KH1, p-KH1, CNBr-agarose + RNA, CNBr-agarose - RNA); western blot, α-hnRNP E1. C: high salt elutions 1-5; western blot, α-hnRNP E1. D: enriched versus non-enriched KH1; western blot, α-RXRXXpS.]

FIGURE 12 LEGEND

Figure 12. In-vitro kinase assay of KH1 and p-KH1 enrichment.

A) Western blot analysis of in-vitro kinase samples. Recombinant KH1 and hnRNP E1 were placed in the presence of Akt2 enzyme in a kinase buffer. An Akt2 kinase substrate phosphorylation specific antibody (α-RXRXXpS) was used to detect serine 43 phosphorylation of KH1 and hnRNP E1. B) To demonstrate the specificity of KH1 for the nucleic acid consensus motif 5’ – AAUUGCUA – CCC – AAUGCCUGG – CCC – AAGGGCAUU – CCC – AAAGCUUA – 3’, we ligated this oligonucleotide to CNBr-agarose and performed RNA pull down assays. p-KH1 affinity for this motif is significantly inhibited; binding of KH1 is heavily favored. C) Our purification technique exploited the observation in panel B. We depleted unphosphorylated KH1 from our kinase reaction by repeatedly running the sample through our nucleic acid-agarose column. High salt elutions were used to remove bound KH1 from the column; aliquots of each elution were analyzed for total KH1 levels using an α-hnRNP E1 antibody with an epitope to a conserved site in KH1. We noticed near complete depletion of unphosphorylated KH1 after 3 passes through our affinity column. D) To confirm p-KH1 enrichment, we compared processed KH1 to a sample of un-processed KH1 obtained by in-vitro kinase phosphorylation by Akt2.

Mouse monoclonal antibody to p-KH1 was obtained and analyzed for specificity to p-hnRNP E1. Figure 13C is an ELISA measuring our candidate antibody’s specificity to a phosphorylated substrate (KRIREEpSGARINISC) similar to p-KH1 and a control substrate (KRIREESGARINISC) similar to KH1. We show a significantly different affinity of binding between our antibody and the phosphorylated substrate, as indicated by an absorbance of 1.902 to 0.581 (phosphorylated substrate to control). To further test the specificity of this antibody in a more usage-appropriate scenario, we performed an in-vitro kinase assay on full length, recombinant hnRNP E1. Figure 13B is a western blot analysis with our p-hnRNP E1 antibody on various samples of hnRNP E1 run under different Akt2 kinase conditions. We show a high degree of specificity for p-hnRNP E1, indicated by the robust signal observed in western blot analysis. To test this antibody in the context of cellular lysate, we obtained lysate from NMuMG cells treated with TGFβ and NMuMG cells treated with both LY294005 (a potent inhibitor of Akt2 activation through PI3K) and TGFβ. Figure 13A is a western blot analysis comparing a commercial α-hnRNP E1 antibody, which has no specificity or altered affinity for p-hnRNP E1, with our candidate α-p-hnRNP E1 antibody. The commercial antibody detects a slight decrease of hnRNP E1 levels with prolonged LY294005 treatment, and virtually no change with TGFβ treatment alone. Our p-hnRNP E1 antibody showed significantly decreased signal intensity in samples that contained LY294005, and a gradual increase of intensity in samples treated solely with TGFβ. The phosphorylation kinetics observed mirror those of previous results that utilized our combined assay approach.
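The two ELISA absorbance values quoted above can be expressed as a fold difference. The snippet below is a minimal worked calculation using only those two numbers; it assumes no blank subtraction or replicate averaging, since none is reported here, and the function name is illustrative.

```python
def signal_to_control(phospho_abs, control_abs):
    """Ratio of absorbance on the phosphopeptide to the control peptide."""
    if control_abs <= 0:
        raise ValueError("control absorbance must be positive")
    return phospho_abs / control_abs


# Absorbance values reported in the text (phosphorylated substrate, control).
ratio = signal_to_control(1.902, 0.581)
print(f"{ratio:.2f}-fold")  # prints "3.27-fold"
```

On these figures the candidate antibody gives roughly a 3.3-fold higher signal on the phosphorylated peptide than on its unmodified counterpart.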

FIGURE 13

FIGURE 13 LEGEND

Figure 13. p-hnRNP E1 specific antibody shows specificity to hnRNP E1 Ser43.

A) A candidate mouse monoclonal antibody specific for p-hnRNP E1 was used in western blot analysis of total cellular lysates from NMuMG cells treated with 5 ng/mL TGFβ for various times. An inhibitor of PI3K / Akt2, LY294005, was used to pretreat NMuMG cells at 10 µM for 30 minutes prior to TGFβ stimulation. We observed no increase in signal from our candidate α-p-hnRNP E1 antibody in cells treated with inhibitor; however, cells treated only with TGFβ show an increase in signal that is consistent with previous kinetic analysis of p-hnRNP E1 induction. To compare specificity, a commercially available α-hnRNP E1 antibody was used to 1) confirm the size / position of hnRNP E1 bands, 2) confirm a lack of total protein increase with TGFβ, and 3) normalize levels of detected p-hnRNP E1. B) To further characterize this candidate antibody, we utilized recombinant hnRNP E1 and varied the kinase reaction by altering the levels of substrate and Akt2 enzyme. We observed no background detection of Akt2 enzyme and saw band intensity scale with the levels of both substrate and enzyme. We observed no significant detection of unphosphorylated hnRNP E1, as indicated by the last three lanes. C) ELISA confirmation of specificity for α-p-hnRNP E1. Two consensus sequences were used to coat the bottom of wells in this assay: KRIREEpSGARINISC was used as a phosphorylated consensus, and KRIREESGARINISC was used as a control peptide. We observed a significant difference in affinity between these two peptides; affinity for the phosphorylated variant was significantly higher than for its control.

Previous work in our laboratory correlated a signature pattern of protein expression for hnRNP E1 and Akt2 with metastatic potential. We further characterized these regulatory proteins for 2 post-translational modifications, p-hnRNP E1 and p-Akt2. Metastatic cells showed a low level of hnRNP E1, and high levels of Akt2, p-Akt2, and p-hnRNP E1. We observed a deviation from this pattern in NMuMG cells that undergo TGFβ-induced EMT: there were high levels of Akt2 and hnRNP E1; however, levels of p-Akt2 and p-hnRNP E1 were significantly low with no TGFβ stimulation. Akt2 is known to be overexpressed in cancerous tissue, and the ratio of Akt1 : Akt2 is hypothesized to play a role in this development. Figure 14A is a comparison of Akt1 : Akt2 levels for a well-characterized metastatic progression model: 67NR, 4TO7, and 4T1. We included NMuMG to show a comparison with normal tissue that has the capacity to undergo EMT and acquire the properties of invasiveness. We observe a shift from a higher ratio of Akt1 to Akt2 toward a predominant expression of Akt2. We further characterize signature members of the non-canonical TGFβ pathway by assessing levels of hnRNP E1 and p-hnRNP E1 throughout this progression series.

As previously observed, Figure 14B suggests hnRNP E1 levels decrease as the metastatic potential increases, and ratios of p-hnRNP E1 follow the opposite pattern.

FIGURE 14

FIGURE 14 LEGEND

Figure 14. Akt isoform expression and p-hnRNP E1 levels in metastasis.

To assess levels of Akt1, Akt2, and p-Akt2 (serine 474) in a progression series, we analyzed total cellular lysates from NMuMG, 67NR, 4TO7, and 4T1 by western blot. A) Levels of Akt1 are slightly decreased from NMuMG to the progression lines, and there appears to be no significant deviation in Akt1 levels as the metastatic potential of the series increases. We observed a high level of Akt2 in NMuMG cells and a gradual increase in Akt2 across the progression series. 4T1 cells contain a significantly higher level of Akt2 than their less metastatic 67NR counterpart. Interestingly, NMuMG cells contain nearly the same total expression of Akt2; however, when analyzed for p-Akt2, we see a sharp decrease in NMuMG levels. Akt2 activity, as indicated by p-Akt2 levels, increases along with metastatic potential across the progression series. Previous data from our lab suggest that NMuMG cells contain little to no p-Akt2; however, TGFβ stimulation significantly raises these levels within a short period of time.

B) Another hallmark of the metastatic cell phenotype is a significantly low level of hnRNP E1. To assess the relationship of total hnRNP E1 and p-hnRNP E1 with metastatic potential, we performed western blot analysis of total cellular lysates and subsequently analyzed protein levels. We observed a high level of hnRNP E1 in NMuMG cells and a sharp decrease in expression in the highly metastatic 4T1 line. Conversely, we observed a low level of p-hnRNP E1 in NMuMG cells and a high level in 4T1. Given a low total level of hnRNP E1 and a high ratio of p-hnRNP E1, we hypothesize that downstream EMT-inducing genes would show significantly high protein expression.

Our current observations of non-canonical TGFβ-induced EMT and metastasis through PI3K / Akt2 have been limited to cellular models. To gain insight into the clinical implications of our hypothesis, we analyzed publicly available data from The Cancer Genome Atlas (TCGA) to correlate tumor phenotypes to our model. Figure 15A shows a Kaplan-Meier survival analysis of tumor samples from TCGA that had greater than a 2-fold increase of protein expression for EMT-inducing genes (Table 2). We see a significant correlation between high expression of these genes and a low potential for survival. Figure 15B contains data from an MSigDB (Broad Institute) analysis of the implication of our 36 EMT-inducing genes (Table 2) in cancer signatures. MSigDB analysis revealed that 6 of our EMT-inducing genes are significantly upregulated in MCF7 (highly metastatic, highly invasive). When grouped by function, significantly elevated protein levels of Ccl2, Inhba, and Egfr mapped to the EGFR_UP annotated gene cluster with a p-value of 0.0003, and elevated protein levels of Palld, Tcf12, and Jak2 mapped to the RAF_UP annotated gene cluster with a p-value of 0.0003.

We have determined multiple related clusters of gene expression that correlate with both the in-vitro characterized progression series and clinical tumor samples and that suggest a metastatic phenotype with a low chance of survival. These results have formed the basis of a high-throughput protein expression profile to be used with clinical tissue samples to analyze the activity of non-canonical TGFβ EMT and metastasis.
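Gene-set overlaps of the kind reported for the EGFR_UP and RAF_UP clusters are commonly scored with a hypergeometric test. The sketch below illustrates that calculation only; the function name is hypothetical, and the universe size and gene-set size are placeholder assumptions rather than the parameters of our MSigDB analysis.

```python
from math import comb


def hypergeom_overlap_pvalue(universe, set_size, query_size, overlap):
    """Return P(X >= overlap) for a hypergeometric draw.

    universe   -- total number of genes considered
    set_size   -- number of genes in the annotated gene set
    query_size -- number of query genes (e.g. the EMT-inducing gene list)
    overlap    -- number of query genes found in the annotated set
    """
    max_k = min(set_size, query_size)
    total = comb(universe, query_size)
    tail = sum(
        comb(set_size, k) * comb(universe - set_size, query_size - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / total


# Placeholder numbers (assumptions, not taken from the analysis):
# 20,000 genes in the universe, 200 genes in the annotated set,
# 36 query genes, 3 of which fall in the annotated set.
p = hypergeom_overlap_pvalue(20000, 200, 36, 3)
print(f"overlap p-value = {p:.4g}")
```

The p-value obtained this way depends strongly on the assumed universe and gene-set sizes, so the value printed for the placeholder inputs should not be compared with the values reported above.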

FIGURE 15

FIGURE 15 LEGEND

Figure 15. Clinical implication of TGFβ induced p-hnRNP E1 activity.

To assess clinical features associated with high protein expression of EMT-inducing genes (Table 2), we performed computational analysis utilizing the Cancer Browser (UCSC) (Cline et al., 2013). We have observed our phenotypic expression patterns for non-canonical TGFβ-induced EMT and metastasis mainly in breast and colon cancer lines. To this end, we analyzed protein expression data for 352 patients from The Cancer Genome Atlas (TCGA, http://cancergenome.nih.gov/). Patients in whom at least 10 EMT-inducing genes showed a fold increase in protein level of 1.5 or greater were selected, and a Kaplan-Meier survival plot was created. A) Kaplan-Meier survival plot of patients that had significantly high protein expression of our EMT-inducing genes. B) A further characterization of EMT-inducing gene protein expression levels in cancerous tissues was performed using MSigDB (Broad) (Subramanian et al., 2005). We observed 2 signature pathways enriched by 6 of our EMT-inducing genes. We observed significantly higher protein expression of Ccl2, Inhba, and Egfr in MCF7 cells, enriching the EGFR_UP annotated pathway. Palld, Tcf12, and Jak2 levels were found to be enriched in the RAF_UP annotated pathway. The enrichment of these well-characterized pathways suggests a highly metastatic phenotype.
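The patient selection rule and survival estimate described in this legend can be sketched as follows. This is a minimal illustration, not the Cancer Browser implementation: the function names, the dictionary layout, and the toy patient data are all assumptions, and the toy call lowers the gene-count threshold so that the small example selects a patient.

```python
def select_patients(fold_changes, genes, min_genes=10, min_fold=1.5):
    """Return IDs of patients with >= min_genes signature genes at >= min_fold.

    fold_changes -- dict: patient ID -> {gene: protein fold change}
    genes        -- list of signature genes to count
    """
    selected = []
    for patient, levels in fold_changes.items():
        n_high = sum(1 for g in genes if levels.get(g, 0.0) >= min_fold)
        if n_high >= min_genes:
            selected.append(patient)
    return selected


def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  -- follow-up time per patient
    events -- 1 if death was observed at that time, 0 if censored
    Returns [(time, survival probability)] at each time with a death.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients leave the risk set
    return curve


# Toy data (hypothetical values, for illustration only).
toy = {
    "P1": {"Ccl2": 2.0, "Egfr": 1.6, "Jak2": 1.0},
    "P2": {"Ccl2": 1.2, "Egfr": 1.0},
}
high = select_patients(toy, ["Ccl2", "Egfr", "Jak2"], min_genes=2)
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

With the defaults (`min_genes=10`, `min_fold=1.5`) the selection function mirrors the criterion stated in the legend; the survival curve would then be computed only over the selected patients.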

DISCUSSION

The work in our lab has focused on the non-canonical function of TGFβ signaling through PI3K / Akt2. We have previously characterized a nucleic acid consensus motif responsible for binding hnRNP E1 and repressing protein expression of the transcripts that carry it, and gained significant insight into how conserved nucleic acid residues are arranged relative to the various domains of hnRNP E1. We have gone on to characterize signature expression profiles of Akt1, Akt2, p-Akt2, hnRNP E1, and p-hnRNP E1 in the context of metastatic progression. To this end, we have identified key genes and post-translational modifications that we have implicated in the progression of cancer and metastasis.

Our analysis of Akt isoforms and post-translational modifications revealed expression profiles (with respect to metastasis) similar to many published protein expression profiles of tumor and non-tumor tissues. When metastatic progression lines are analyzed, we see a strong correlation in the shift of expression from Akt1 to Akt2, and an even stronger correlation between metastasis and p-Akt2. Currently, there is no published clinical analysis of protein post-translational modification that measures levels of p-hnRNP E1. Our development of a suitable, versatile p-hnRNP E1-specific antibody has made possible the development of clinical screens to assess patient tissue samples for p-hnRNP E1 levels.

We have identified two separate genetic expression profiles that are dependent upon one another. Our first signature comprises the levels and states of 2 key protein regulators of TGFβ-induced EMT (Akt and hnRNP E1); we will use this as a means to measure activity of this aberrantly reactivated pathway. Our second signature includes the 36 EMT-inducing genes identified in Table 2; we hypothesize that a readout of high pathway activity from signature 1 would correlate with high expression of signature 2 (i.e., the genes that are regulated by our signature 1 pathway).

Current in-silico scans of small-molecule inhibitor databases suggest multiple potential small-molecule inhibitors that could specifically inhibit the phosphorylation of serine 43 on hnRNP E1. This would be an ideal treatment scenario for patients showing a high prognostic indicator of overactive non-canonical TGFβ signaling. Current treatments capable of inhibiting this pathway are ineffective and dangerous to patients because of their broad-spectrum inhibitory processes. Most drugs on the market target Akt2 or PI3K, neither of which is a highly specific target, and interfere with a vast variety of other cellular processes, causing complications for patients. We envision that a specific inhibitor capable of blocking the formation of p-hnRNP E1 would prove a valuable treatment for patients afflicted by this misregulation.

ACKNOWLEDGEMENTS

We thank Genscript for their work on mouse inoculations and further validation of our antigen. We also thank Dr. Erica Bullesbach for her guidance and training in advanced protein separation and validation techniques.

FUNDING

This work was supported by grants CA55536 and CA154664 from the National Cancer Institute (to Philip H. Howe, Ph.D.).

REFERENCES

Bellacosa, A., De Feo, D., Godwin, A. K., Bell, D. W., Cheng, J. Q., Altomare, D. A., . . .

Masciullo, V. (1995). Molecular alterations of the AKT2 oncogene in ovarian and breast

carcinomas. International journal of cancer, 64(4), 280-285.

Chaudhury, A., Chander, P., & Howe, P. H. (2010). Heterogeneous nuclear ribonucleoproteins

(hnRNPs) in cellular processes: Focus on hnRNP E1's multifunctional regulatory roles.

Rna, 16(8), 1449-1462.

Chaudhury, A., Hussey, G. S., Ray, P. S., Jin, G., Fox, P. L., & Howe, P. H. (2010). TGF-β-

mediated phosphorylation of hnRNP E1 induces EMT via transcript-selective

translational induction of Dab2 and ILEI. Nature cell biology, 12(3), 286-293.

Cheng, J. Q., Ruggeri, B., Klein, W. M., Sonoda, G., Altomare, D. A., Watson, D. K., & Testa, J.

R. (1996). Amplification of AKT2 in human pancreatic cells and inhibition of AKT2

expression and tumorigenicity by antisense RNA. Proceedings of the National Academy

of Sciences, 93(8), 3636-3641.

Cline, M. S., Craft, B., Swatloski, T., Goldman, M., Ma, S., Haussler, D., & Zhu, J. (2013).

Exploring TCGA Pan-Cancer Data at the UCSC Cancer Genomics Browser. Sci. Rep., 3.

doi: 10.1038/srep02652

Díaz-Moreno, I., Hollingworth, D., Frenkiel, T. A., Kelly, G., Martin, S., Howell, S., . . . Ramos,

A. (2009). Phosphorylation-mediated unfolding of a KH domain regulates KSRP

localization via 14-3-3 binding. Nature structural & molecular biology, 16(3), 238-246.

Engelman, J. A. (2009). Targeting PI3K signalling in cancer: opportunities, challenges and

limitations. Nature Reviews Cancer, 9(8), 550-562.

Gherzi, R., Trabucchi, M., Ponassi, M., Ruggiero, T., Corte, G., Moroni, C., . . . Briata, P.

(2006). The RNA-binding protein KSRP promotes decay of β-catenin mRNA and is

inactivated by PI3K-AKT signaling.

Ho, J. D., Robb, G. B., Tai, S. C., Turgeon, P. J., Mawji, I. A., Man, H. J., & Marsden, P. A.

(2013). Active stabilization of human endothelial nitric oxide synthase mRNA by hnRNP

E1 protects against antisense RNA and microRNAs. Molecular and Cellular Biology,

33(10), 2029-2046.

Hussey, G. S., Chaudhury, A., Dawson, A. E., Lindner, D. J., Knudsen, C. R., Wilce, M. C., . . .

Howe, P. H. (2011). Identification of an mRNP complex regulating tumorigenesis at the

translational elongation step. Molecular cell, 41(4), 419-431.

Hussey, G. S., Link, L. A., Brown, A. S., Howley, B. V., Chaudhury, A., & Howe, P. H. (2012).

Establishment of a TGFβ-induced post-transcriptional EMT gene signature. PLoS One, 7(12), e52624.

Iwano, M. (2009). EMT and TGF-beta in renal fibrosis. Frontiers in bioscience (Scholar

edition), 2, 229-238.

Kasai, H., Allen, J. T., Mason, R. M., Kamimura, T., & Zhang, Z. (2005). TGF-β1 induces

human alveolar epithelial to mesenchymal cell transition (EMT). Respiratory research,

6(1), 56.

Kosturko, L. D., Maggipinto, M. J., Korza, G., Lee, J. W., Carson, J. H., & Barbarese, E. (2006).

Heterogeneous nuclear ribonucleoprotein (hnRNP) E1 binds to hnRNP A2 and inhibits

translation of A2 response element mRNAs. Molecular biology of the cell, 17(8), 3521-

3533.

Liu, A.-x., Testa, J. R., Hamilton, T. C., Jove, R., Nicosia, S. V., & Cheng, J. Q. (1998). AKT2, a

member of the protein kinase B family, is activated by growth factors, v-Ha-ras, and v-

src through phosphatidylinositol 3-kinase in human ovarian epithelial cancer cells.

Cancer research, 58(14), 2973-2977.

Ostareck-Lederer, A., & Ostareck, D. H. (2004). Control of mRNA translation and stability in

haematopoietic cells: the function of hnRNPs K and E1/E2. Biology of the Cell, 96(6),

407-411.

Padua, D., & Massagué, J. (2009). Roles of TGFβ in metastasis. Cell research, 19(1), 89-102.

Song, Q., Sheng, W., Zhang, X., Jiao, S., & Li, F. (2014). ILEI drives epithelial to mesenchymal

transition and metastatic progression in the lung cancer cell line A549. Tumor Biology,

35(2), 1377-1382.

Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., . . .

Mesirov, J. P. (2005). Gene set enrichment analysis: A knowledge-based approach for

interpreting genome-wide expression profiles. Proceedings of the National Academy of

Sciences, 102(43), 15545-15550. doi: 10.1073/pnas.0506580102

Thiele, B.-J., Doller, A., Kähne, T., Pregla, R., Hetzer, R., & Regitz-Zagrosek, V. (2004). RNA-

binding proteins heterogeneous nuclear ribonucleoprotein A1, E1, and K are involved in

post-transcriptional control of collagen I and III synthesis. Circulation research, 95(11),

1058-1066.

Zavadil, J., & Böttinger, E. P. (2005). TGF-β and epithelial-to-mesenchymal transitions.

Oncogene, 24(37), 5764-5774.

Zeisberg, M., Hanai, J.-i., Sugimoto, H., Mammoto, T., Charytan, D., Strutz, F., & Kalluri, R.

(2003). BMP-7 counteracts TGF-β1–induced epithelial-to-mesenchymal transition and

reverses chronic renal injury. Nature medicine, 9(7), 964-968.

Zhang, Y. E. (2011). Stopped in translation: EMT control meets eukaryotic elongation.

Developmental cell, 20(3), 289-290.

CHAPTER V

GENERAL SUMMARY

The progression of metastasis is the most common cause of death in cancer patients (Ma et al., 2010). Current therapeutic agents for the clinical treatment of metastasis include many drugs that are weakly effective because of their large number of side effects. A growing understanding of genetic profiles and their association with clinical tumor phenotypes has given rise to the profiling of cancer patients and to personalized medicine. Understanding the genetic profile of individual cancers allows for a more specific and effective means of treating cancer. Continuing to advance the understanding of genetic profiles in cancer has created a need for more high-throughput screening techniques and more robust methods to identify irregularities in cancer cells and pinpoint the biochemical pathways that will be affected.

Gene expression is regulated by multiple mechanisms including transcription, splicing, mRNA stability, mRNA transport, translation, protein stability, and post-translational modification (Hay & Sonenberg, 2004). Cellular development is driven by gene expression that must be inhibited once development becomes static. Periodically, these genes need to be re-expressed for short periods of time to maintain cellular homeostasis; as a result, this process is typically regulated by translational modulation. The most efficient implementation of translational regulation is repression at the elongation stage of peptide synthesis; however, pathological mutations at a mature stage in life through various epigenetic alterations have been implicated in the reversion of cellular differentiation and an increased potential for taking on an invasive phenotype (Ruggero & Pandolfi, 2003; Sampath et al., 2004; Standart & Jackson, 1994). The interaction between RBPs and cis-regulatory elements located in the 3’UTR of developmental genes has been shown to be regulated by signaling cascades: the RBP binds its target and exerts a regulatory function, and this function can be modulated through post-translational modification of the RBP by a variety of cellular pathways (Chaudhury, Chander, et al., 2010; Hussey et al., 2011). The integrity of protein structure and of nucleic acid structural elements is of major importance for the proper inhibition that these unique regulatory complexes exert. Protein integrity is affected significantly by post-translational modifications that can cause shifts in localization, changes in affinity, and potentially signal for degradation. It is important to understand the effect that post-translational modification exerts on critical RBPs, in addition to understanding the downstream genes that are dependent on their regulatory function.

Our studies have demonstrated a non-canonical TGFβ regulon capable of inducing EMT and further causing metastasis through the aberrant expression of genes silenced by hnRNP E1. We have characterized the pathways and key regulators of this induction pathway in addition to identifying genes that are regulated by its repressor function. We have identified a key post-translational modification (serine 43 phosphorylation) that is responsible for this aberrant expression and identified it as a causative agent for inducing EMT in normal epithelial cells. The failure of this mechanism has been implicated in driving tumorigenesis, metastasis, and invasiveness, as demonstrated by multiple analytic studies of well-characterized cancer progression cellular models and by correlation with clinical phenotypic and expression observations.

The growing number of publicly available datasets from genome-wide array studies that identify and quantify mRNAs interacting with RBPs with respect to post-translational modifications provided an opportunity to predict cellular phenotypic changes and to determine a nucleic acid binding region whose affinity is sensitive to post-translational modification. We have developed a computational algorithm to aid in the in-silico prediction of mRNA targets that interact with a specific RBP. In-silico predictions are made by analyzing the interactions of the unmodified and post-translationally modified RBP with mRNA; differentially interacting genes are further analyzed and validated. This allows for the identification of genes whose expression will be altered by modification and allows for an accurate prediction of the position and sequence of RNA motifs in mRNA. We have developed simple in-vitro techniques for validating computational predictions of nucleic acid motifs specific to the post-translational modification state of an RBP. Our goal is to create a curated database of RBPs that are highly implicated in the progression of cancer and metastasis and to annotate their unique post-translationally responsive binding elements for use in the development of therapeutic treatment for patients afflicted by mutation.
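The comparison of unmodified and modified RBP interactions can be illustrated with a short sketch. This is not our prediction algorithm; it shows only the differential step under stated assumptions: the function name, the enrichment scores, the threshold, and the gene names GeneX and GeneY are hypothetical, while Dab2 and ILEI are used as example targets because they are reported targets of this pathway (Chaudhury, Hussey, et al., 2010).

```python
def differential_targets(unmodified, modified, min_enrichment=2.0):
    """Return genes enriched with the unmodified RBP but not the modified form.

    unmodified / modified -- dict: gene -> binding enrichment score
    min_enrichment        -- score at or above which a gene counts as bound
    """
    hits = []
    for gene, score in unmodified.items():
        bound_unmodified = score >= min_enrichment
        bound_modified = modified.get(gene, 0.0) >= min_enrichment
        if bound_unmodified and not bound_modified:
            hits.append(gene)
    return sorted(hits)


# Hypothetical enrichment scores, for illustration only.
with_rbp = {"Dab2": 5.0, "ILEI": 4.0, "GeneX": 3.0, "GeneY": 1.0}
with_phospho_rbp = {"Dab2": 1.1, "ILEI": 0.9, "GeneX": 3.2}

candidates = differential_targets(with_rbp, with_phospho_rbp)
print(candidates)
```

Genes returned by such a comparison are candidates only; as described above, differentially interacting genes still require further analysis and validation.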

Previous discoveries by our lab and numerous others have implicated hnRNP E1 in cancer progression. We have identified a key pathway in which hnRNP E1 plays a pivotal role in the onset, progression, and maintenance of a highly aggressive phenotype. This was illustrated by the development of an hnRNP E1-silenced NMuMG cell model. The silencing of hnRNP E1 drives the cell to exhibit aggressive, mesenchymal characteristics in both its morphological appearance and genomic profile. The introduction of these cells into immunocompromised mice revealed their metastatic nature, with metastasis occurring from the mammary fat pad to distant, vital tissues such as the brain and lung (Chaudhury, Chander, et al., 2010; Hussey et al., 2011). We show a similar phenotype in NMuMG cells that are placed in a TGFβ-rich microenvironment; this observation defined a need to understand the genes affected by the TGFβ-mediated disruption of the hnRNP E1 silencing complex. We developed a unique modification to an exonuclease protection assay utilizing recombinantly expressed homologous pools of RBPs and either synthetically transcribed or cellular-derived RNA. Coupled with high-throughput sequencing technology, we were able to precisely resolve the boundaries of nucleic acids that were uniquely protected by hnRNP E1 and not by p-hnRNP E1.

These data were utilized to support computational predictions and to assist in the further development of this assay to identify a post-translational modification-specific binding motif and all genes throughout the genome that contain it. This high-throughput binding and protection assay confirmed our hypothesis that RBP binding motifs can vary heavily in sequence while the key bases responsible for direct binding with RBPs are conserved. By defining a minimal motif consensus descriptor, we have allowed for subsequent analysis of these motifs in virtually any annotated system of nucleic acids. A growing area of interest in the progression of cancer is the function of pseudogenes. Pseudogenes are genomic DNA sequences that are similar in sequence to normal, expressed genes, yet their expression was previously thought to have no function. Pseudogenes can be transcribed into molecules such as small RNA motifs and long non-coding RNA (lncRNA), and potentially contain regulatory regions that are similar to RBP-specific motifs (Poliseno et al., 2010; Sampath et al., 2004; Tam et al., 2008). The presence of pseudogenes and their potential to interfere with regulation of RBP target genes poses yet another area of potential cancer induction. Pseudogenes have been implicated in the regulation of mRNA stability, localization, and interaction with RBPs (Hirotsune et al., 2003; J. T. Lee, 2012). Our methods have allowed for the precise identification of RBP-specific regulatory nucleic acid motifs and the subsequent computational identification of non-traditional nucleic acid sequences, such as pseudogenes, that contain them. We hypothesize that analysis of pseudogenes that contain similar regulatory motif elements can be used to better understand mechanisms by which pseudogenes alter cellular regulation.
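Scanning an annotated sequence, whether a 3’-UTR or a pseudogene transcript, with a minimal consensus descriptor can be sketched as below. The sketch assumes the descriptor is written in IUPAC nucleotide codes; the consensus `CYCUNC` and the test sequence are hypothetical placeholders and are not the motif identified in this work.

```python
import re

# IUPAC nucleotide codes mapped to RNA character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U", "T": "U",
    "R": "[AG]", "Y": "[CU]", "S": "[CG]", "W": "[AU]",
    "K": "[GU]", "M": "[AC]", "B": "[CGU]", "D": "[AGU]",
    "H": "[ACU]", "V": "[ACG]", "N": "[ACGU]",
}


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus into a regex that reports overlapping hits."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    # A lookahead with a capture group lets finditer return overlapping matches.
    return re.compile("(?=(" + body + "))")


def scan_sequence(sequence, consensus):
    """Return [(0-based position, matched subsequence)] for every motif hit."""
    rna = sequence.upper().replace("T", "U")  # accept DNA or RNA input
    pattern = consensus_to_regex(consensus)
    return [(m.start(), m.group(1)) for m in pattern.finditer(rna)]


# Hypothetical consensus and sequence, for illustration only.
hits = scan_sequence("AACTCTACGGCCCTGCAA", "CYCUNC")
print(hits)
```

Because conserved positions are written as fixed bases and variable positions as degenerate codes, the same descriptor can be applied unchanged to any collection of annotated sequences.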

The implication of hnRNP E1 and p-hnRNP E1 in the induction of EMT and cancer, and the subsequent study of the mechanism by which this occurs, has provided a unique system in which to develop prognostic methods to identify and treat patients afflicted by its misregulation. We have identified a signature panel of protein expression and enzymatic activation that we can detect through proteomic analysis of normal and cancer tissue. This signature of non-canonical TGFβ-induced EMT includes high expression of Akt2, a high degree of Akt2 activation indicated by serine 474 phosphorylation (p-Akt2), and low levels of hnRNP E1 coupled with a high ratio of p-hnRNP E1; together these have been correlated with increased metastatic potential. This signature was noted in a variety of highly invasive cancer tissues; however, normal epithelial cells such as NMuMG do not fully display this expression pattern without stimulation by a cytokine such as TGFβ. We observed sufficient levels of Akt2 that were not activated without TGFβ stimulation. We hypothesize that total Akt2 levels facilitate non-canonical signaling to a level high enough to shift the ratio of p-hnRNP E1, causing dissociation of the protein translation complex and aberrant expression of EMT-inducing genes. Our group has developed a primary mouse monoclonal antibody capable of detecting endogenous levels of p-hnRNP E1 in western blot and ELISA applications. This antibody has made previously difficult p-hnRNP E1 assays possible in a high-throughput analytic context. Previous methods used to detect p-hnRNP E1 levels required a lengthy procedure of immunoprecipitation and subsequent western blot analysis. This hurdle previously prevented a quality high-throughput ELISA screen for non-canonical TGFβ-induced cancer from being developed. We aim to use this antibody in a variety of clinical prognostic studies as well as to further develop a screen for patients affected by metastatic breast and colon cancer.

The identification of a translational repression mechanism governed by hnRNP E1 and non-canonical TGFβ signaling has provided a platform on which therapeutic drugs can be developed. We postulate, based on data provided by The Cancer Genome Atlas (TCGA, http://cancergenome.nih.gov/), that the implication of this mechanism in cancer is significant. Our development of a clinical screen and further correlation studies of this pathway will provide the basis for the development of a specific inhibitor of the formation of p-hnRNP E1 that blocks its serine 43 phosphorylation by Akt2. The development of computational prediction algorithms and validation studies for the identification of post-translational modification-specific RBP binding consensuses has streamlined the process of identifying genes that are regulated by RBPs whose mutations are highly implicated in the progression of cancer. Characterization of RBP-controlled gene expression provides the basis for understanding causative agents of metastasis and allows for directing treatment at a more specific target that is less harmful to the patient.

REFERENCES

Auyeung, V. C., Ulitsky, I., McGeary, S. E., & Bartel, D. P. (2013). Beyond secondary structure:

primary-sequence determinants license pri-miRNA hairpins for processing. Cell, 152(4),

844-858. doi: 10.1016/j.cell.2013.01.031

Burk, U., Schubert, J., Wellner, U., Schmalhofer, O., Vincan, E., Spaderna, S., & Brabletz, T.

(2008). A reciprocal repression between ZEB1 and members of the miR-200 family

promotes EMT and invasion in cancer cells. EMBO reports, 9(6), 582-589.

Chaffer, C. L., & Weinberg, R. A. (2011). A perspective on cancer cell metastasis. Science,

331(6024), 1559-1564.

Chaudhury, A., Chander, P., & Howe, P. H. (2010). Heterogeneous nuclear ribonucleoproteins

(hnRNPs) in cellular processes: Focus on hnRNP E1's multifunctional regulatory roles.

Rna, 16(8), 1449-1462.

Cheng, J. Q., Ruggeri, B., Klein, W. M., Sonoda, G., Altomare, D. A., Watson, D. K., & Testa, J.

R. (1996). Amplification of AKT2 in human pancreatic cells and inhibition of AKT2

expression and tumorigenicity by antisense RNA. Proceedings of the National Academy

of Sciences, 93(8), 3636-3641.

Chin, Y. M. R., Balk, S., & Toker, A. (2013). Akt2-specific signaling in prostate cancer

maintenance. Cancer research, 73(8 Supplement), 4090.

Churbanov, A., Rogozin, I. B., Babenko, V. N., Ali, H., & Koonin, E. V. (2005). Evolutionary

conservation suggests a regulatory function of AUG triplets in 5′-UTRs of eukaryotic

genes. Nucleic Acids Research, 33(17), 5512-5520. doi: 10.1093/nar/gki847

Chursov, A., Frishman, D., & Shneider, A. (2013). Conservation of mRNA secondary structures

may filter out mutations in Escherichia coli evolution. Nucleic Acids Research, gkt507.

Gupta, G. P., & Massagué, J. (2006). Cancer metastasis: building a framework. Cell, 127(4),

679-695.

Hafner, M., Landthaler, M., Burger, L., Khorshid, M., Hausser, J., Berninger, P., . . .

Munschauer, M. (2010). Transcriptome-wide identification of RNA-binding protein and

microRNA target sites by PAR-CLIP. Cell, 141(1), 129-141.

Hay, N., & Sonenberg, N. (2004). Upstream and downstream of mTOR. Genes & development,

18(16), 1926-1945.

Hirotsune, S., Yoshida, N., Chen, A., Garrett, L., Sugiyama, F., Takahashi, S., . . . Yoshiki, A.

(2003). An expressed pseudogene regulates the messenger-RNA stability of its

homologous coding gene. Nature, 423(6935), 91-96.

Hussain, R. H., Zawawi, M., & Bayfield, M. A. (2013). Conservation of RNA chaperone activity

of the human La-related proteins 4, 6 and 7. Nucleic Acids Res, 41(18), 8715-8725. doi:

10.1093/nar/gkt649

Hussey, G. S., Chaudhury, A., Dawson, A. E., Lindner, D. J., Knudsen, C. R., Wilce, M. C., . . .

Howe, P. H. (2011). Identification of an mRNP complex regulating tumorigenesis at the

translational elongation step. Molecular cell, 41(4), 419-431.

Hussey, G. S., Link, L. A., Brown, A. S., Howley, B. V., Chaudhury, A., & Howe, P. H. (2012).

Establishment of a TGFβ-induced post-transcriptional EMT gene signature. PLoS One,

7(12), e52624.

Karnoub, A. E., Dash, A. B., Vo, A. P., Sullivan, A., Brooks, M. W., Bell, G. W., . . . Weinberg,

R. A. (2007). Mesenchymal stem cells within tumour stroma promote breast cancer

metastasis. Nature, 449(7162), 557-563.

Lee, J. T. (2012). Epigenetic regulation by long noncoding RNAs. Science, 338(6113), 1435-

1439.

Liu, A.-x., Testa, J. R., Hamilton, T. C., Jove, R., Nicosia, S. V., & Cheng, J. Q. (1998). AKT2, a

member of the protein kinase B family, is activated by growth factors, v-Ha-ras, and v-

src through phosphatidylinositol 3-kinase in human ovarian epithelial cancer cells.

Cancer research, 58(14), 2973-2977.

Ma, L., Young, J., Prabhala, H., Pan, E., Mestdagh, P., Muth, D., . . . Valastyan, S. (2010). miR-

9, a MYC/MYCN-activated microRNA, regulates E-cadherin and cancer metastasis.

Nature cell biology, 12(3), 247-256.

Mani, S. A., Guo, W., Liao, M.-J., Eaton, E. N., Ayyanan, A., Zhou, A. Y., . . . Shipitsin, M.

(2008). The epithelial-mesenchymal transition generates cells with properties of stem

cells. Cell, 133(4), 704-715.

Massagué, J. (2000). How cells read TGF-β signals. Nature reviews Molecular cell biology, 1(3),

169-178.

Massagué, J. (2012). TGFβ signalling in context. Nature reviews Molecular cell biology, 13(10),

616-630.

Massagué, J., Cheifetz, S., Laiho, M., Ralph, D., Weis, F., & Zentella, A. (1991). Transforming

growth factor-beta. Cancer surveys, 12, 81-103.

Morel, A.-P., Lièvre, M., Thomas, C., Hinkal, G., Ansieau, S., & Puisieux, A. (2008).

Generation of breast cancer stem cells through epithelial-mesenchymal transition. PloS

one, 3(8), e2888.

Poliseno, L., Salmena, L., Zhang, J., Carver, B., Haveman, W. J., & Pandolfi, P. P. (2010). A

coding-independent function of gene and pseudogene mRNAs regulates tumour biology.

Nature, 465(7301), 1033-1038.

Ruggero, D., & Pandolfi, P. P. (2003). Does the ribosome translate cancer? Nature Reviews

Cancer, 3(3), 179-192.

Sampath, P., Mazumder, B., Seshadri, V., Gerber, C. A., Chavatte, L., Kinter, M., . . . Driscoll,

D. M. (2004). Noncanonical function of glutamyl-prolyl-tRNA synthetase: gene-specific

silencing of translation. Cell, 119(2), 195-208.

Standart, N., & Jackson, R. (1994). Regulation of translation by specific protein/mRNA

interactions. Biochimie, 76(9), 867-879.

Tam, O. H., Aravin, A. A., Stein, P., Girard, A., Murchison, E. P., Cheloufi, S., . . . Schultz, R.

M. (2008). Pseudogene-derived small interfering RNAs regulate gene expression in

mouse oocytes. Nature, 453(7194), 534-538.

Xu, J., Lamouille, S., & Derynck, R. (2009). TGF-β-induced epithelial to mesenchymal

transition. Cell research, 19(2), 156-172.

Zavadil, J., & Böttinger, E. P. (2005). TGF-β and epithelial-to-mesenchymal transitions.

Oncogene, 24(37), 5764-5774.
