Lnc-EPCAM AND Lnc-BHLHE41 AS RNA REGULATORS OF BREAST AND PREVENTION

A Dissertation Submitted to the Temple University Graduate School Board

In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY

By MARIA BARTON May 2017

Examining Committee Members:

Jose Russo, MD, FACP, Advisor, The Irma H. Russo MD Breast Cancer Research Laboratory, Fox Chase Cancer Center

Dianne Soprano, PhD, Advisor, Department of Biochemistry, Lewis Katz School of Medicine, Temple University

Ana Gamero, PhD, Department of Biochemistry, Lewis Katz School of Medicine, Temple University

Nora Engel, PhD, Department of Cancer Biology, Genetics and Epigenetics, Fels Institute, Lewis Katz School of Medicine, Temple University

Richard Katz, PhD, External Member, Department of Epigenetics, Fox Chase Cancer Center

Raghbir Athwal, PhD, External Member, Department of Pathology and Laboratory Medicine, Fels Institute, Lewis Katz School of Medicine, Temple University

© Copyright 2016 By MARIA BARTON All Rights Reserved

ii

ABSTRACT

The objective of this study was to unveil a novel area of regulation in breast cancer and breast cancer prevention through the study of a recent discovered class of genetic regulators named long non-coding

RNAs (lncRNAs). LncRNAs are RNA molecules longer than 200 nucleotides that are not translated into , but regulate the of involved in different cellular processes, including differentiation, cancer initiation and progression. The link between lncRNAs and cancer is well documented in the literature. More recently, their relevance in the transcription field is beginning to be explored and their roles have been found to vary from guiding proteins to the genome to scaffolding proteins complexes needed for the transcription of a specific gene. Initial transcriptome analysis of normal breast of parous and nulliparous postmenopausal women revealed that several lncRNAs are differentially expressed in the parous breast. This observation provided evidence of a potential role of lncRNAs in the regulation of transcription and their function in pregnancy’s preventive effect in reducing the lifetime risk of developing breast cancer. Specifically, RNA sequencing of healthy postmenopausal breast tissue biopsies from eight parous and eight nulliparous women using Illumina platform was performed. The sequencing results showed that there are 42 lncRNAs differentially expressed between parous and nulliparous breast tissue.

These data led to the hypothesis that these novel lncRNAs may be drivers in the process of development that occurs in the mammary gland during pregnancy, providing protection against breast cancer. After analysis of these 42 lncRNAs using bioinformatics tools, review of the scientific literature, and real-time

PCR analysis, two lncRNAs (lncBHLHE41 and lncEPCAM) were selected to be tested in vitro, using different molecular techniques in human epithelial breast cell lines to determine their relevance in breast cancer. This project provided novel information on lncRNAs induced by pregnancy in the breast tissue, and identified two lncRNAs as potential key regulators in breast differentiation and cancer progression. The manipulation of these lncRNAs led to evidence of their function in vitro and, using xenograft studies, we determined their relevance in vivo. Although treatment for cancer using lncRNAs as targets is in its infancy at the clinic, the advancement in knowledge and technology to study their relevance in disease could lead to the development of therapeutics for breast cancer and breast cancer prevention in the near future.

iii

To my little “chinis” who has changed my life forever

In memory of Miguel R. Barton

iv

ACKNOWLEDGEMENTS

I feel this is the hardest part for me to write. I am so grateful to so many people who have contributed to me getting to this point. I first want to start by acknowledging my undergraduate mentor Dr. Sonia Rodriguez who sprinkled on me her passion for research. The few years I was her mentoree, and had the pleasure of interacting with her diverse team, ignited the fire for research that now burns in me. I have been very lucky with the mentors I have had and that is why my Master’s Degree mentor Dr. Robert H. Silverman also deserves my gratitude for the achievements during my PhD. It was in his laboratory, a great experience at the Cleveland Clinic, that I learned how to start walking the path of becoming an independent researcher and melting into the challenging working environment of a research-based institution. That is why, when I met Dr. Russo at Fox Chase Cancer Center, I already felt at home. Dr. Russo was welcoming from the very beginning. That is just who he is. Although I was nervous thinking of the responsibilities ahead of me, I felt comfortable during our first encounter, which feels like yesterday. Thank you for letting me (and even encouraging me) to always speak my mind. I owe you a great deal of my achievements and I will be forever grateful for the opportunity of working in The Irma H. Russo MD Breast Cancer Research Laboratory.

In addition to my mentors, I want to deeply thank my advisory committee, Dr. Dianne Soprano, Dr. Nora

Engel and Dr. Ana Gamero for their continuous support and suggestions at every step during my graduate work. I took your comments into action and your input undoubtedly gave shape to this very challenging project. Thank you all for your caring guidance and advice. To my external dissertation readers, Dr. Richard

A. Katz and the generous Dr. Raghbir Athwal, it has been a pleasure meeting you both. I am so grateful to all the faculty, postdoctoral fellows and graduate students at Fox Chase Cancer Center for their availability and cooperative work ethics and Dr. Katz is a true embodiment of the institution’s values and mission. As I said, Fox Chase felt like home on my first visit, and time didn’t prove me wrong.

Another “home” away from home was Fels Institute. I am very grateful to my outside research proposal sponsor, Dr. Xavier Grana, not only for his invaluable contributions and guidance during my ORP project but also for taking me under the CBGN wing. I felt I had the best of two worlds being a Biochemistry student in a Cell, Molecular, Genetics and Epigenetics cluster.

v

To my past and present lab members, I wouldn’t have been able to do this without your suggestions, constructive criticism, and friendship. Julia, Su, Tom, Theresa, and Dr. Irma Russo helped me design and/or carry out experiments. In particular I want to thank Ricardo from the bottom of my heart for always being available (in the most broad way) to discuss results and future directions for this project. In the Russo lab,

I was honored to mentor many undergraduate and graduate students and these experiences have shaped me into a more solid scientist. Special thanks to Dominic, Olivia and Tyler for reminding me how much I love teaching.

To my graduate fellows in this great adventure, I want to thank Danielle Feather, Elsie Samakai, Umme

Ayesa, and Jenny Shrestha for their words of support and encouragement at every step. I’m happy we will be finding the light at the end of the tunnel and wish you all success in your careers.

Last but not least, I want to thank the drivers and energy fillers of my life. To the love of my life, Javier, and the little princess we added to the family, my beloved Zoe. Javier, your determination to be the best professional you possibly can have always inspired me to find the best scientist in me. I’m so proud of our

“all-terrain” family. To my sisters, Victoria and Luciana, my beautiful niece Sofia and especially my mother

Susana. Mom, you have been my utmost supporter, several times flying overseas so that I could get to this point. Gracias.

vi

TABLE OF CONTENTS

Page ABSTRACT ...... iii

DEDICATION ………………………………………………………………………………..... iv

ACKNOWLEDGEMENTS ...... v

LIST OF TABLES ………………………………………………………………………….….. x

LIST OF FIGURES ………………………………………………………………………….… xi

LIST OF ABBREVIATIONS ...... xiv

HYPOTHESIS ……………………………………………………………………………….. xvi

SPECIFIC AIMS …………………………………………………………………………….. xvii

CHAPTER 1: INTRODUCTION ...... 1 1.1. Incidence and mortality rates in recent years ...... 2 1.2. Breast Cancer Subtypes ...... 2 1.3. Breast Cancer Prevention ...... 3 1.4. Role of lncRNA in Development and Disease ...... 4 1.4.1. Molecular interactions and functional roles of lncRNA ...... 6 1.4.1.i- Epigenetic Transcriptional Regulation ...... 7 1.4.1.ii - Enhancer-associated lncRNA ...... 8 1.4.1.iii - mRNA Processing ...... 8 1.4.1.iv - miRNA Sequestration ...... 9 1.5. LncRNA Regulation ...... 12 1.5.1. Cis-Regulation ...... 12 1.5.2. Trans-Regulation ...... 16 1.6. LncRNAs and the hallmarks of cancer ...... 18 1.6.1. Sustaining Proliferative Signaling ...... 19 1.6.2. Evading Growth Suppressors ...... 21

vii

1.7. General Considerations ...... 28

CHAPTER 2: MATERIALS AND METHODS ...... 29 2.1. Data and Human Breast Sample Collection ...... 29 2.1.1. RNA isolation ...... 29 2.1.2. Library Preparation ...... 30 2.1.3. Quality Control ...... 30 2.1.4. Quantity Determination ...... 30 2.1.5. RNAseq and RNAseq Data Analysis ...... 31 2.1.6. Integrative Genome Viewer (IGV) ...... 32 2.1.7. Use of the Cancer Genome Atlas (TCGA) Breast Invasive Carcinoma Data Download, Sample Selection and TCGA Data Analysis ...... 32 2.2. Tissue culture and human breast samples ...... 32 2.2.1. General tissue culture procedures ...... 32 2.2.2. Normal and Cancer Breast Tissue Processing ...... 37 2.3. RT-qPCR ...... 37 2.4. Lentiviral Infections for overexpression of lncRNAs ...... 38 2.5. Flow Cytometry ...... 39 2.6. Fluorescence In Situ Hybridization (FISH) for detection of lncRNAs ...... 40 2.7. TUNEL Assay ...... 40 2.8. MTT Assay to measure effect of overexpression of lncRNAs ...... 41 2.9. Proliferation, Migration and Invasion by Real Time Cell Analysis (RTCA) ...... 42 2.10. Tumor tissue immune-histochemical (IHC) staining ...... 44 2.11. Xenograft study ...... 45 2.12. Statistical analyses...... 50

CHAPTER 3: RNAsequencing OF BREAST TISSUE FROM PAROUS VS. NULLIPAROUS WOMEN ...... 51 3.1. How the transcriptional machinery of the breast is affected by pregnancy ...... 51 3.2. Background ...... 52 3.3. Results ...... 54

CHAPTER 4: LncRNAs IN BREAST CELLS, BREAST TISSUE AND XENOGRAFT MICE MODEL ...... 62 4.1. Evaluating the relevance of top lncRNA candidates from healthy breast tissue in a breast cancer setting ...... 62

viii

4.2. In vitro cancer/normal model to study role of lncRNAs ...... 62 4.2.1. LncRNA levels in the in vitro model ...... 63 4.2.2. Overexpression (OE) of the selected lncRNAs ...... 69 4.3. LncRNAs in breast cancer tissue vs. normal adjacent ...... 97 4.4. LncRNAs in vivo model (xenograft study) ...... 99 4.5. Discussion ...... 107

CHAPTER 5: RNAsequencing OF BREAST TISSUEFROM PAROUS VS. NULLIPAROUS WOMEN: A CODING GENE PERSPECTIVE...... 115 5.1. Abstract to side project ...... 115 5.2. Background ...... 116 5.3. Results ...... 117 5.4. Data interpretation ...... 129

CHAPTER 6: CONCLUDING REMARKS ...... 136

CHAPTER 7: FUTURE PERSPECTIVES ...... 138

REFERENCES CITED ...... 142

ix

LIST OF TABLES

Page Table 1.1: Summary of breast tumor molecular subtypes ...... 3

Table 1.2: LncRNAs associated with human disease ...... 10

Table 2.1: Cell line characteristics ...... 33

Table 2.2 Cell line, treatment and number of animals per group in xenograft study ...... 47

Table 3.1: LncRNAs ranked by RNAseq coverage and BLAST ...... 59

Table 3.2: Lnc-BHLHE41 and lnc-EPCAM characteristics ...... 61

Table 4.1: Summary of phenotypic changes as a result of lncRNAs’ overexpression (OE)…………………………………………………………………….… 106

Table 5.1: Mean expression reads of genes involved in atRA metabolism differentiation for parous vs. nulliparous women...... 119

Table 5.2: Mean expression reads of matched normal vs. tumor tissues of all 112 female IBC-TCGA patients...... 124

Table 5.3: Mean reads of matched normal vs. tumor tissues in CSN2-expressing patients in N tissue only and N+T (n=63)...... 125

Table 5.4: Spearman correlation results of CSN2, UGT2B7, XDH, and WISP3 for CSN2-expressing patients in N tissue only and N+T tissue (n=63)...... 127

Table 5.5: Spearman correlation results of CSN2, UGT2B7, XDH, and WISP3 for all TCGA-IBC patients (n=112) ...... 128

Table 5.6: Spearman correlation results of CSN2, UGT2B7, XDH, and WISP3 for all TCGA-IBC patients (n=112) in Tumor tissue only ...... 128

x

LIST OF FIGURES

Page

Figure 1.1: Female trends in cancer ...... 1

Figure 1.2: Non-coding RNA comprise a much larger portion of the than protein-coding RNA ...... 5

Figure 1.3: LncRNA interactions and actions...... 6

Figure 1.4: The scale of lncRNAs in selected organs according to

MiTranscriptome analysis ...... 11

Figure 1.5: LncRNAs regulating in Cis and in Trans...... 15

Figure 1.6: LncRNAs associated with hallmarks of cancer...... 19

Figure 2.1: Viral production in HEK293T cells followed by infection of T-47D and

MDA-MB-231...... 39

Figure 2.2: MTT assay scheme...... 42

Figure 2.3: Real Time Cell Analysis system (RTCA) ...... 44

Figure 2.4: Survival surgery for estrogen implantation ...... 48

Figure 2.5: Timeline of xenograft experiment. End point: tumor removal after euthanasia ...... 49

Figure 3.1: Heatmap representing differentially expressed lncRNAs between parous and nulliparous women...... 55

xi

Figure 3.2: IGV images showing the coverage for 4 different lncRNA differentially expressed between P and NP women ...... 57

Figure 3.3: IGV images of lnc-BHLHE41 and lnc-EPCAM ...... 60

Figure 4.1: LncRNAs expression averaged from 3 independent experiments in the cell lines ...... 67

Figure 4.2: Heatmap representing averaged lncRNA expression per cell line ...... 67

Figure 4.3: Expression levels (-ΔΔCts) of lncBHLHE and lncEPCAM in the two cell lines selected for in-depth study ...... 68

Figure 4.4: Transfection efficiency of cells in culture ...... 73

Figure 4.5: Proliferation of T47D cells ...... 76

Figure 4.6: Proliferation of MDA-MB-231 cells ...... 78

Figure 4.7: Breast cancer OE lncBHLHE cells subjected to Click-iT Plus TUNEL assay ...... 82

Figure 4.8: Effect of lncBHLHE and lncEPCAM on migration ...... 85

Figure 4.9: Effect of lncBHLHE and lncEPCAM on invasion ...... 88

Figure 4.10: LncRNA expression in normal immortalized and cancer cells...... 91

Figure 4.11: Genomic region around lncBHLHE (lncB) and lncEPCAM (lncE) ...... 93

Figure 4.12: Evaluating Cis regulation...... 96

Figure 4.13: Breast cancer tissue evaluation for lncBHLHE (lncB) and lncEPCAM

(lncE) ...... 98

Figure 4.14: Mice weight throughout experiments...... 101

xii

Figure 4.15: Mice MDA-MB-231 tumors OE lncB and lncE, and histological sectioning...... 102

Figure 4.16: Mice T-47D tumors OE lncB and lncE, and histological sectioning..

...... 105

Figure 4.17: Cysts in mice reproductive organs (uterus and ovaries) due to the presence of estrogen pellet ...... 106

Figure 5.1: Heat map of differentially expressed protein-coding genes in the breast tissue of parous and nulliparous women ...... 118

Figure 5.2: CSN2 protein expression levels in parous and nulliparous women…

...... 120

Figure 5.3: CSN2 mRNA expression levels in parous and nulliparous women……………………………………………………………………………………...…121

Figure 5.4: Schematic of CSN2, UGT2B7, XDH, and WISP3 molecular interconnections in the normal mammary gland...... 135

xiii

LIST OF ABBREVIATIONS

AR – Androgen

ATCC – American Tissue Culture Collection atRA – all trans Retinoic Acid

BC – Breast Cancer

BC200 – Brain Cytoplasmic 200 nucleotides RNA

BWS - Beckwith-Wiedemann Syndrome

CBX – Chromobox

CCF – Cell Culture Facility

CSN2 – Casein-β

DMEM – Dulbecco’s Modified Eagle Medium

EMT – Epithelial Mesenchymal Transition

EPCAM – Epithelial Cell Adhesion Molecule

ER - eRNA – enhancer associated RNA

FCCC – Fox Chase Cancer Center

FBS – Fetal Bovine Serum

FISH – Fluorescence In Situ Hybridization

GR –

GRE – Glucocorticoid Response Elements

HBEC – Human Breast Epithelial Cell

HCC – hCG – Human Chorionic Gonadotropin

HER2 – Human Epidermal Growth Factor Receptor 2

HOTAIR - HOX Transcript Antisense RNA

HRS - hours

IBC – Invasive Breast Carcinoma

IGV – Integrative Genome Viewer

IHC – Immune-histochemical

xiv

LncRNA – Long non coding RNA

MALAT-1 - -Associated Lung Adenocarcinoma Transcript 1

MET – Mesenchymal Epithelial Transition ncRNA – non-coding RNA

NEAT-1 – Nuclear Paraspeckle Assembly Transcript 1

NSCLC - Non-Small Cell Lung Carcinoma

OE – overexpression; overexpressed

PCAT-1 - Prostate Cancer-Associated Transcript 1

PINC – Pregnancy Induced Non-Coding RNA

PR –

PRC – Polycomb Repressive Complex

RA – Retinoic Acid

RAG – retinoyl-beta-glucuronide

RAR –

RNAseq – RNA sequencing

RPMI – Royal Park Memorial Institute

RTCA – Real Time Cell Analysis

SNPs - Single Nucleotide Polymorphisms

TCGA – The Cancer Genome Atlas

TERRA - Telomeric repeat-containing RNA

TNBC – Triple Negative Breast Cancer

UGT2B7 - UDP Glucuronosyltransferase 2B7

WISP3 – WNT1 Inducible Signaling Pathway Protein 3

XDH - Xanthine Dehydrogenase

xv

HYPOTHESIS

Early pregnancy induces the regulation of the expression of long non-coding which are important players in that occurs in breast tissue, potentially conferring protection against the development of breast cancer

AIMS

This project proposes to open a new door on understanding the regulation of transcription in breast cancer prevention through the study of long non-coding RNAs (lncRNAs). LncRNAs regulate, at the transcriptional and post-transcriptional level, numerous genes involved in different cellular processes, including differentiation and cancer progression. Due to advances in technology, thousands of lncRNAs have been identified and several have been studied in disease, particularly in cancer [1-5]. Their functional roles have been found to vary widely, making their study even more challenging [6]. Our initial transcriptome analysis of normal breast of postmenopausal women reveals that chromatin remodeling genes and non-coding

RNAs are differentially expressed between parous and nulliparous women [7]. This observation provided a new paradigm in our understanding of the role of lncRNAs in the regulation of transcription and its potential function in pregnancy’s preventive effect in reducing the lifetime risk of developing breast cancer. Our preliminary data obtained using RNA sequencing of healthy postmenopausal breast tissue biopsies from parous and nulliparous women using Illumina platform (Illumina Inc., San Diego, CA) show that there are

42 lncRNAs differentially expressed between parous vs. nulliparous breast tissue. We hypothesize that these unique lncRNAs are key drivers in the process of mammary gland development that occurs during pregnancy, providing protection against breast cancer. After analysis of these 42 lncRNAs using bioinformatics tools, scientific literature, and real-time PCR, we selected two lncRNAs (lncBHLHE41 and lncEPCAM) to be tested in functional studies using different molecular techniques in human epithelial breast cells to determine their relevance in breast cancer. We propose the following two specific aims:

xvi

SPECIFIC AIMS

The specific aims of this doctoral thesis are:

1- Identification and quantification of regulatory long non-coding RNA sequences in parous

and nulliparous breast tissue

To accomplish this objective we utilized normal breast tissue of eight nulliparous and eight parous

postmenopausal women for transcriptomic analysis. RNA was isolated from the breast tissue and was

then used for RNA sequencing using Illumina HiSeq 2000 sequencer, an approach that allowed the

direct sequencing of the RNA and the alignment of the sequence reads to the genome. Through these

procedures we determined the quantity and type of lncRNAs at two different levels of breast

differentiation, nulliparous (NP, undifferentiated) and parous (P, differentiated). Our results opened the

way to direct experimental manipulation of selected lncRNAs that rendered a comprehensive picture of

the differentiation of the human breast at the transcriptomic level, findings that led to the identification

of the potential role played by lncRNAs in the prevention of breast cancer.

1a- RNA Library Preparation:

3' polyadenylated (poly(A)) tail RNA is targeted in order to ensure poly A tail RNA is separated from

non poly A tail RNA. This was accomplished with poly (T) oligos covalently attached to magnetic beads

(TruSeq Illumina technology). Reverse transcription of this RNA was performed to obtain cDNA. Due

to the 5' bias of randomly primed-reverse transcription as well as secondary structures influencing

primer binding sites, hydrolysis of RNA into 200-300 nucleotides prior to reverse transcription reduces

both problems simultaneously. Once the cDNA is synthesized it can be further fragmented. The

template is then ready for the desired sequencing method, i.e. RNAseq.

xvii

1b- Next Generation Sequencing (NGS)/RNA Seq:

High-throughput sequencing technology generated millions of short reads (50 base-pair long; paired end) from the prepared RNA library.

The characterization of in the P vs. NP tissue via measurement of RNA levels was performed to determine how the transcriptional machinery of the breast cell was affected by pregnancy.

2- Functional analysis of noncoding RNAs identified as key players in the RNAseq analysis

between parous and nulliparous breast tissue evaluated in a breast cancer setting

RNAseq data analysis was used to identify novel lncRNA regions potentially involved in the molecular mechanisms that occur in mammary gland development induced by pregnancy. We then studied the overexpression of these lncRNA regions in different breast cancer cell lines and breast cancer tissue to determine the relevance of the lncRNAs identified in normal breast and their potential connection to cancer progression. The expression profile of luminal and triple negative breast cancer cell lines was compared and further studied in vivo. Luminal and triple negative breast cancer accounts for the vast majority of breast cancer subtypes.

2a- lncRNA Expression Profiling and RNA overexpression in vitro: Breast cancer cell lines were tested to evaluate if the lncRNA expression profile is different in different breast cancer cells for the two lncRNAs of interest. Cells were infected with lentiviral construct containing the lncRNA to produce its overexpression. Also, phenotypic changes were evaluated to determine if the overexpression of the lncRNA produces any changes to the cell’s phenotype. A few of the phenotypes tested after overexpression were cell proliferation, migration, invasion, and the location of these lncRNAs in the proposed cell lines using fluorescence in situ hybridization (FISH), among others.

For a lncRNA that is upregulated in parous women, we may observe a decrease in the transformation phenotypes when we overexpress this lncRNA. We hypothesize that the upregulated lncRNAs in parous women are protecting against a potential risk to develop breast cancer, so overexpression of

xviii these lncRNAs in the cancer cell will lead to decreased cell proliferation, cell migration, and cell invasion.

2b- RNA overexpression in vivo: breast cancer cell lines which overexpressed the selected lncRNAs were injected into the mammary fat pad of SCID mice. The xenograft studies in SCID mice were the strategy used to evaluate the role of these lncRNAs of interest in vivo.

For a lncRNA that is upregulated in parous women, we expect a decrease in tumor size when we overexpress this lncRNA in cancer cells injected into SCID mice. We hypothesize that the upregulated lncRNAs in parous women are protecting against a potential risk to develop breast cancer, so overexpression of these lncRNAs in the cancer cell will lead to decreased tumorigenesis. On the other hand, if we overexpress a lncRNA that is upregulated in nulliparous women, we expect to observe an increase in tumorigenesis.

At the completion of these studies, this project has revealed novel information on lncRNAs induced by pregnancy in disease free breast cells, and also placed lncRNAs as potential key regulators in breast transformation. Data translated from our sequenced human breast tissue samples to our in vitro/in vivo findings have helped determine the function of these lncRNAs in the human breast.

The findings from this project have led to the identification of the potential role played by two of these lncRNAs in breast cancer. The manipulation of these lncRNAs can eventually lead to the development of therapeutics for breast cancer and be novel targets in breast cancer prevention.

xix

1. Park, J.Y., et al., Roles of Long Non-Coding RNAs on Tumorigenesis and Glioma Development. Brain Tumor Res Treat, 2014. 2(1): p. 1-6. 2. Hou, P., et al., LincRNA-ROR induces epithelial-to-mesenchymal transition and contributes to breast cancer tumorigenesis and metastasis. Cell Death Dis, 2014. 5: p. e1287. 3. He, X., et al., The long non-coding RNA HOTAIR is upregulated in endometrial carcinoma and correlates with poor prognosis. Int J Mol Med, 2014. 33(2): p. 325-32. 4. Lu, X., et al., Downregulation of gas5 increases pancreatic cancer cell proliferation by regulating CDK6. Cell Tissue Res, 2013. 354(3): p. 891-6. 5. Sorensen, K.P., et al., Long non-coding RNA HOTAIR is an independent prognostic marker of metastasis in estrogen receptor-positive primary breast cancer. Breast Cancer Res Treat, 2013. 142(3): p. 529-36. 6. Fatica, A. and I. Bozzoni, Long non-coding RNAs: new players in cell differentiation and development. Nat Rev Genet, 2014. 15(1): p. 7-21. 7. Russo, J., et al., Pregnancy-induced chromatin remodeling in the breast of postmenopausal women. Int J Cancer, 2012. 131(5): p. 1059-70.

xx

CHAPTER 1: INTRODUCTION

Breast cancer affects women of all ages, races and nationalities (1-3). The worldwide incidence has increased 30% since the 1970s, well above lung & bronchus, colorectum, and uterine corpus

(2). Excluding skin , breast cancer is the most common cancer diagnosed among US women, accounting for nearly one in three cancers (4). In the USA only, it is estimated that there will be around 246,660 new cases of breast cancer in women in 2016 (2). This year only, 40,450 women in the USA will die of this disease (2, 4), second leading cause of cancer for 2016 and since

1990 (Figure 1.1). All other cancers linked to the reproductive system combined (ovary, uterine corpus, etc.) won’t account for as many deaths (2).

Figure 1.1: Female trends in cancer. Trends in cancer incidence and cancer death for the female population since 1975 for incidence rate and since 1930 for cancer death. Adapted from Siegel et al. CA Cancer J Clin 2016

1

1.1. Incidence and mortality rates in recent years

Female breast cancer incidence and mortality rates vary by race and ethnicity. Non-Hispanic white and non-Hispanic black women have higher breast cancer incidence and death rates than women of other race/ethnicities. Breast cancer death rates are highest in black women, a more than 40% higher rate than white women. This mortality difference can be attributed to a variety of biologic and non-biologic factors, including differences in subtype of breast cancer, stage at diagnosis, weight linked to comorbidities (diabetes, cholesterol, etc.), tumor characteristics, as well as social environment and behavior (health access and compliance to treatment, for example) (5, 6).

Different risk factors may contribute to the lower breast cancer rates seen in Alaska

Native/American Indian, Hispanic, and Asian/Pacific islander (7). For example, Hispanic women tend to have more than 2 children and Alaska Native/American Indian women tend to have their first child at a young age (less than 25 years old). As it will be described in further detail later in this chapter both number of children (i.e. number of full term pregnancies) and age at first pregnancy are well described breast cancer preventive measures (8, 9). Asian/Pacific Islander women in general breastfeed for at least a year and also have lower rates of obesity, both factors associated with lower breast cancer risk (10, 11).

1.2. Breast Cancer Subtypes

Although often referred to as a single disease, breast cancer is distinguished by several distinct histologic subtypes and at least 4 different molecular subtypes (Luminal A, Luminal B, HER2+ and

Triple Negative Breast Cancer [TNBC]; Table 1.1). These 4 subtypes are associated with distinct risk factors and are biologically variable in presentation, development, and outcomes after treatment (12-14). Gene expression profiling techniques have allowed the breast cancer field to identify the genetic variability among tumors although standard practice has yet to be achieved.

Molecular subtypes have been determined using , including the presence or absence of hormone receptors (HR; HR-positive/HR-negative for estrogen and progesterone) and

2

overexpression of the Human Epidermal Growth Factor Receptor 2 (HER2) protein (15). Overall,

74% of breast cancer cases are luminal type A, 12% are TNBC, 10% are luminal B, and 4% are

HER2+ (HER2-enriched), with the distributions varying by race and ethnicity as reported by the

American Cancer Society (4). TNBC (ER-PR-HER2-) is considered to be more aggressive and have poorer prognoses, in part because there are currently no targeted therapies for these tumors

(16). Luminal type A breast cancers are associated with higher survival rates in part because expression of HRs is predictive of response to hormonal therapy (16).

Table 1.1: Summary of breast tumor molecular subtypes. Adapted from Dai et al. J Of Cancer

2016

Subtype Alias status Grade Outcome Luminal Luminal A ER+PR+HER2-Ki67- 1 or 2 Good Luminal B ER+PR+HER2+/-Ki67+ 2 or 3 Intermediate or poor HER2 positive HER2 overexpression ER-PR-HER2+ 2 or 3 Poor Triple Negative Basal ER-PR-HER2-, basal marker+ 3 Poor

Claudin-low ER-PR-HER2-, EMT+ 3 Poor

1.3. Breast Cancer Prevention

Female’s reproductive history is closely linked to breast cancer risk (11, 17, 18). The first full-term pregnancy (FTP) is an essential step for determining the fate of the mammary gland in subsequent decades. Pregnancy exerts a protective effect in women whose first child is born before women reach 25 years of age (17, 19, 20). Moreover, multiple FTPs significantly decrease the risk even further, whereas postponement of the first FTP to the mid-thirties increases the risk compared to nulliparous women (17, 21). Pregnancy is a hormonally complex process that only succeeds when there is a perfect synchronization of the levels of estrogen, progesterone and human Chorionic

Gonadotropin (hCG), hormones that are essential for the maintenance of pregnancy and breast development in preparation for milk production (22, 23). Primiparous women younger than 25 years

3

of age that have elevated levels of hCG during the first trimester have a 33% decreased breast cancer incidence in their postmenopausal years (18, 24). On the other hand, exposure to high estrogen levels (due to hormone replacement therapy, for example) have been associated with increased risk of developing breast cancer in pre- and postmenopausal women (24, 25). The secretion of estrogen by the ovaries together with the surge of gonadotropin releasing hormone and luteinizing hormone, trigger ovulation (26). After oocyte fertilization and implantation, hCG steadily increases for the first 12 weeks (27). Estrogen, progesterone and hCG stimulate the terminal end buds of the mammary gland to develop into organized lobular structures after undergoing cell proliferation and differentiation (28, 29). Completion of pregnancy and further breastfeeding induce long-lasting molecular changes in the mammary gland (30, 31). These changes result in a significant reduction in the incidence of all types of breast cancer (32-34).

Notably, long noncoding RNAs (lncRNAs) are genetic regulators of the molecular changes that occur by the physiological events of pregnancy, as it is discussed below and in following chapters

(35, 36).

1.4. Role of lncRNA in Development and Disease

The study of noncoding RNAs is a relatively new field in molecular biology, genetics and epigenetics. Noncoding RNAs were once thought of as the “dark matter” of the genome, but it is becoming increasingly clear that they play major roles in gene regulation (37) as 98% of the human genome is non-protein-coding material.

Noncoding RNAs are transcripts of RNA that do not code for a protein (37). These RNA transcripts can be categorized into two groups: short noncoding RNA (for example miRNA; between 18-22 nucleotides in length) and long noncoding RNA (lncRNA; equal or greater than 200 nucleotides in length) (37). Long noncoding RNAs can be transcribed from in between genes (intergenic), where are defined (intronic), or be overlapping to protein coding genes (overlapping). They can be transcribed sense to the closest protein-coding gene or be anti-sense (Figure 1.2). Moreover, they

4

can have diverse gene expression regulatory functions including transcriptional regulation, post- transcriptional regulation, or direct regulation of proteins (37). When these functions go awry, however, many necessary biological functions can be affected negatively, and this can result in disease progression, including oncogenesis and cancer development.

In the context of cancer, some lncRNAs act as tumor suppressors while others act as .

In addition, or loss of certain lncRNAs can lead to cancer development. Figure 1.3 describes the general functional roles and mechanisms of lncRNA regulation. In particular we will expand on how different types of regulatory lncRNAs affect cancer initiation and progression through a variety of mechanisms and dysregulation.

Figure 1.2: Non-coding RNA comprise a much larger portion of the human genome than protein-coding RNA (only 3% of genome). LncRNA can be designated as Intergenic, Intronic,

Enhancer, Sense, Antisense, or Bidirectional based on their genomic location relative to that of nearby protein-coding genes. Reproduced from Parasramka et al., Pharmacology & Therapeutics

2016.

5

1.4.1. Molecular interactions and functional roles of lncRNA

Noncoding RNAs, like protein coding genes, have many diverse regulatory functions. LncRNAs function in both transcriptional regulation and post-transcriptional regulation (Figure 1.3). Their functions can be divided into several categories, some of which will be further explained below: epigenetic transcriptional regulation, enhancer-associated lncRNA, mRNA processing, and miRNA sequestration, among others.

Figure 1.3: LncRNA interactions and actions. (A) LncRNAs can target genomic DNA loci and can modulate gene transcription by associating with RNA pol II or pre-initiation complexes, or through epigenetic modulation by guiding chromatin-modifying complexes to target genomic DNA loci. (B) LncRNA can contribute to RNA inhibition or degradation, by association with mRNAs and miRNAs to regulate splicing, or by acting as endogenous sponges. (C) LncRNAs can modulate

6

protein activity and localization by acting as molecular guides and scaffolds or as decoys for proteins such as transcription factors. Reproduced from Parasramka et al., Pharmacology &

Therapeutics 2016

1.4.1.i- Epigenetic Transcriptional Regulation

Epigenetic transcriptional regulation is the most common form of lncRNA function. This form of regulation typically results in repression of target genes via the coupling of the lncRNA to histone modifying or chromatin remodeling protein complexes (38).

The most common protein complexes that lncRNAs associate with during epigenetic transcriptional regulation are polycomb repressive complexes PRC1 and PRC2 (38). These complexes add repressive markers to chromatin allowing for gene transcription inhibition (38). PRC1 can be comprised of numerous proteins, the main component of which are Chromobox (CBX) proteins, proteins that act to ubiquitinate histone 2A at lysine tail 119 (H2AK119) (39). PRC2 is comprised of

3 proteins, EED, SUZ12, and EZH2 (39). EZH2 is the main component and is a histone methyltransferase that trimethylates histone 3 at lysine residue 27 (H3K27me3) (39). It is estimated that nearly 20% of all lncRNAs bind to PRC2 (40).

While PRC1 and PRC2 are the most common lncRNA partners in epigenetic modifications, many other protein complexes are involved in chromatin remodeling (38). The lncRNA HOTAIR, in addition to binding to PRC2, also binds to the LSD1/CoREST complex, a complex that represses genes by deacetylating chromatin tails (41). LncRNA AIR has been reported to interact with G9a, an H3K9 histone methyltransferase (42). KCNQ1OT1 interacts with

PRC2, G9a, and DNMT1, which methylates CpG dinucleotides (43).

Rarely, lncRNAs can use epigenetic modifications to promote gene transcription, as is the case with lncRNA HOTTIP, which interacts with WDR5 to recruit MLL histone methyltransferases to the

HoxA to induce transcription (44). All these examples show the versatility of lncRNAs, molecules which can interact with numerous chromatin modifiers and proteins involved in transcriptional regulation at the DNA level.

7

1.4.1.ii - Enhancer-associated lncRNA

In addition to lncRNA functions in epigenetic modifications, it has become clear that some lncRNAs influence gene regulation through the activity of gene enhancers (38). The main types of lncRNAs that function in this way are the newly discovered class of lncRNAs called enhancer-associated

RNAs (eRNAs) (38). eRNAs are significant in coordinating enhancer loci with gene expression and regulation (38).

However, the mechanism by which eRNAs regulate is still unclear. It has been shown, though, that eRNAs are important for establishing proximity of enhancer- loci via chromosomal looping

(45). eRNAs also work in conjunction with cell-specific enhancers to regulate gene transcription in distinct cell types (45). Again, the signaling mechanisms by which eRNAs accomplish these tasks have yet to be elucidated.

1.4.1.iii - mRNA Processing

The majority of lncRNAs function by regulating gene transcription, but there are many lncRNAs implicated in post-transcriptional mRNA processing (38). About 95% of all human mRNA is alternatively spliced as part of post-transcriptional processing (46). The primary component of mRNA processing is the nuclear paraspeckle which is a sub-cellular compartment in the space between chromatin within the nucleus (47). Nuclear paraspeckles are known to be involved in many post-transcriptional activities, including RNA editing and splicing, although their functions are not completely understood (47). Paraspeckles are thought to serve as storage sites to mRNA before they are exported to the cytoplasm and translated (38).

Two lncRNAs have been implicated in nuclear paraspeckle function and are upregulated in many cancers: MALAT1 and NEAT1 (38). They are both thought to contribute to gene expression by regulating mRNA splicing, editing, and export to the cytoplasm (48, 49).

MALAT1 (also called NEAT-2) is present in high concentration within the nuclear paraspeckles, suggesting its role in mRNA processing (46). MALAT1 associates with nuclear paraspeckles in an

8

RNA Polymerase II transcription-dependent fashion and thus is speculated to have a role in either paraspeckle structure or in trafficking of splicing factors throughout the nucleus between the paraspeckles and transcription sites (46).

NEAT1 also localizes to nuclear paraspeckles (46). Depletion of NEAT1 has been shown to disrupt the paraspeckle function (46). However, overexpression has been shown to increase the number of paraspeckles (46). As vital parts of the nuclear paraspeckle, both MALAT1 and NEAT1 could have potential roles in cancer development and progression through disruption of post- transcriptional mRNA splicing and processing. MALAT-1 is highly expressed in several tissues such as breast, prostate, lung, and colon (37, 38).

1.4.1.iv - miRNA Sequestration

Long noncoding RNA and transcripts can also interact with (miRNAs).

MiRNAs are highly conserved 21-mer RNAs that regulate the expression of genes by binding to the 3’-untranslated regions of specific messenger RNAs (mRNAs). Long noncoding RNAs function as molecular “sponges” by binding to and sequestering miRNA (50). Sequestration of miRNAs indirectly regulates the levels of protein-coding mRNAs (50). For example, the protein coding gene

PTEN and its pseudogene PTENP1 are in competition for miRNA binding sites (50). PTEN is a tumor suppressor and thus this sequestration of miRNA by its pseudogene PTENP1 may contribute to cancer progression (50).

Under normal conditions, miRNAs bind to sites of mRNA and repress of the gene (51).

PTEN works to regulate these miRNAs as a tumor suppressor (52). A lncRNA transcribed from

PTEN, PTENP1, sequesters miRNAs away from its origin gene PTEN and overexpression of

PTENP1 has been shown to result in increased mRNA stability and increased amounts of PTEN

(51, 52). Suppression of PTENP1 has shown targeting of PTEN by miRNAs and thus leading to destabilization and suppression of PTEN (52). More recently, two antisense RNA transcripts have been discovered from the PTENP1 locus, an alpha isoform and a beta isoform (51). The alpha isoform function is still to be elucidated, but seems to function by recruiting chromatin regulators to the promoter region of PTEN, leading to transcriptional repression of PTEN (51). The beta isoform

9

stabilizes PTENP1 enhancing its ability to sequester repressive miRNAs away from PTEN thus increasing PTEN protein levels (51). The disruption of this intricate regulatory network can lead to cancer initiation and development.

Table 1.2: LncRNAs associated with human disease. Adapted from Parasramka et al.,

Pharmacology & Therapeutics 2016

Disease Type of disease lncRNA Reference Cardiovascular Myocardial ischemia HIF1A Zolk et al. 2008 disease Diabetes Transient neonatal HYMA1 Mackay et al. 2002 PINK1-AS Scheele et al. 2007 Genetic disorders X-fragile syndorme FMR2 Khalil et al. 2008 Beckwith- Wiedemann KCNQ1OT1 Chiesa et al. 2012 Neurological Alzheimer's disease BACE1-AS Faghihi et al 2008 disorders Huntington TUNA Lin et al. 2014 Schizophrenia DISC2, GOMAFU Chubb et al. 2008 Parkinson naPINK1 Scheele et al. 2007 Female Cancer (breast, ovary, HOTAIR, GAS5, UCA1, Khalil et al. 2008 uterine, cervical) SRA, MALAT-1, XIST Zhang et al. 2014 Colon MALAT-1, HOTAIR, Graham et al. 2011 KCNQ1OT1, H19 Ma et al. 2015; Michalik et al 2014 Liver HOTAIR, H19, linc-RoR Anwar et al. 2012, Du et al. 2012 PCNA-AS1, HOTTIP Takahasi et al. 2014; Yuan et al. 2014 Prostate PCA3, ANRIL, MALAT-1, Chung et al. 2011; Kotake et al. 2011 NEAT-1, SRA, PTENP1 Kotare et al. 2011; Prensner et al 2011 MALAT-1, linc-p21, Dingemans et al 2014; Gutschner et al Lung TUG1 2013 CCAT2, HOTAIR Hou et al. 2014

The disruption of any of the regulatory mechanisms can result in the initiation and development of several diseases, including cancer. Although our main focus is going to be the initiation and development of carcinogenesis, Table 1.2 summarizes the importance of lncRNAs in several other diseases such as genetic & neurological disorders, diabetes, and cardiovascular disease. The relevance of lncRNAs in cancer is highlighted by the thousands of publications that appeared in the last five years that link lncRNAs with an array of cancers. Iyer et al. developed a bioinformatics pipeline to predict novel transcripts with emphasis on lncRNAs (often expressed at low levels).

10

Using TCGA database (The Cancer Genome Atlas cancer data repository), they used the RNA sequencing data from all samples within a given cancer type, across 18 organs to identify the amount of lncRNAs expressed in each organ studied (53). This pipeline identified 3 times more lncRNAs (almost 60,000) than protein coding genes, which is around 20,000 as several groups have reported (53, 54). In particular, over 1500 lncRNAs have been identified in breast cancer

(Figure 1.4) and only a few have been studied and characterized (53, 55). The challenge of determining the involvement and relevance of the subset of lncRNAs that are functional in a certain disease or system, such as cancer, hasn’t progressed as fast as the technology to identify them, and thus plenty remains to be elucidated.

Figure 1.4: The scale of lncRNAs in selected organs according to MiTranscriptome analysis.

The identification of lncRNA genes has progressed rapidly since their recognition. Nearly 8000 lncRNA genes are cancer related and/or lineage specific. Only endocrine organs and lung

11

adenocarcinoma are shown. Other cancers were evaluated in the study such as head and neck, renal, etc. Adapted from Evans et al, J Clin Invest 2016. MiTranscriptome is a catalog of human long poly- adenylated RNA transcripts derived from computational analysis of RNAseq data of more than 6,500 samples covering several cancers and tissue types.

1.5. LncRNA Regulation

Hundreds of publications over the past decade have identified exponential numbers of lncRNAs in humans. Research has shown that lncRNAs constitute a key layer of genome regulation in diverse biological processes and disease. Chromatin modifiers have been associated with lncRNAs (40,

56) to form a complex which can then target specific genomic regions to modify gene transcription in Cis or in Trans.

1.5.1. Cis-Regulation

Long noncoding RNA can work via cis-regulation, where genes are controlled locally through histone modifications (38). This type of regulation can be highly specific to a single gene or can encompass an entire chromosomal region (38). Some cis-regulatory lncRNAs include imprinting lncRNAs like H19 and KCNQ1OT1, XIST, ANRIL (Figure 1.5; A & C). Imprinting is an epigenetic mechanism that induces parental-specific gene expression in diploid mammalian cells (57).

Some lncRNAs are involved in regulation of imprinted regions of the genome and this regulation is critical for maintaining specific parent-of-origin gene expression (38). The most studied imprinted region with regards to lncRNA regulation is an imprinted region of human 11 (58, 59).

LncRNA H19 is expressed on the maternal allele and lncRNA KCNQ1OT1 is expressed on the paternal allele (58-60). LncRNA H19 and KCNQ1OT1 work to silence the IGF2 and KCNQ1 genes, respectively (60).

12

In many solid tumors, such as hepatocellular carcinoma and bladder cancer, H19 shows abnormal expression and thus has been linked to both oncogenic and tumor suppressive qualities (61-63).

There is evidence that it is directly activated by cMYC and also downregulated by during prolonged cell proliferation (64, 65). In vitro knockdown of H19 impairs cancer cell growth in cell lines and in vivo knockdown of H19 decreases cancer cell growth in hepatocellular carcinoma (61, 64).

In Beckwith-Wiedemann Syndrome (BWS), a developmental disorder that has an increased risk for cancer development, there are abnormal imprinting patterns of KCNQ1OT1 (60, 66). In addition, abnormal H19 methylation also seems to cause a higher risk of cancer development in BWS patients (66).

XIST is one of the most extensively studied lncRNAs. XIST is transcribed from one of the X in females and helps facilitate in the inactivation of that X chromosome (67, 68). On the active X chromosome, XIST is repressed by an antisense lncRNA partner TSIX, which keeps this X chromosome active (68). XIST has been identified to play a role in the development of breast cancer and some other cancers. For example, it has been reported to play a role in the development of cancer in basal-like and hereditary BRCA1-deficient breast cancer patients (69, 70). XIST has also been implicated in X chromosome abnormalities that are commonly seen in breast cancer

(38). XIST has further been linked to lymphoma and male testicular germ-cell tumors where XIST hypomethylation is used as a biomarker (71).

The mechanism by which XIST and X-chromosome inactivation may contribute to human cancer is still unknown, but there are many possibilities including X-chromosome duplication, XIST dysregulation, and overexpression of XIST or other X-linked genes (72). In human breast cancer models, it was found that nearly half of all sporadic basal-like cancers had an active X-chromosome duplication and loss of the inactive X-chromosome (73). It has also been shown that the inactive X chromosome often develops more mutations than the active X chromosome thus suggesting inability to repair damage should the inactive X chromosome become active (73). Since the inactive

X chromosome is the one containing functional XIST, these mutations could be in X-linked genes,

13

or mutations of XIST itself (74). Mutations of XIST can cause it to be inactive, thus reactivating the

X chromosome. Even in cancers with only one active X chromosome, it has been shown that the inactive X chromosome is genetically unstable with a higher rate than any other part of the genome (74).

In a mouse model developed by Yildirim et al., it was shown that loss of XIST directly causes cancer

(75). XIST was knocked out in vivo after X inactivation had taken place and then was reintroduced into female mice (75). The mice developed lethal blood cancer that killed the mice after 1.5 months and only 10% of the mice were still alive after two years. Both homozygous and heterozygous mice were affected (75).

BRCA1 loss or mutation plays an essential role in breast and development (74). It has been proposed that BRCA1 has direct action on XIST RNA coating of the inactive X chromosome and that mutations in BRCA1 might affect X-chromosome inactivation via XIST RNA coating (74). In breast, ovarian, and cell lines with XIST loss, X-chromosome inactivation was also lost, showing that proper function and expression of XIST is essential in X- chromosome inactivation and dysfunction of XIST can result in X-chromosome reactivation and cancer development (72).

Long noncoding ANRIL is located on chromosome 9 within the INK4/ARF tumor suppressor locus, antisense to CDKN2A and CDKN2B genes (38). ANRIL functions by repressing both INK4A and

INK4B, but not ARF (76). The repression of these INK4 genes occurs through direct binding to

CBX7 on the Polycomb Repressive Complex 1 (PRC1) and SUZ12 from Polycomb Repressive

Complex 2 (PRC2) (76, 77). These then apply repressive histone modifications to the INK4 locus

(38).

ANRIL also shows different splicing variants (78). Pasmant et al. reported that single nucleotide polymorphisms (SNPs) in ANRIL are correlated with a higher risk of atherosclerosis and coronary artery disease development (79).

14

Figure 1.5: LncRNAs regulating in Cis and in Trans. LncRNAs contribute to the epigenetic control of gene expression, both by activating and repressing transcription. (A) Enhancer RNAs

(eRNAs) play an RNA-mediated role in forming enhancer interactions resulting in long-range cis gene regulation by lncRNAs. (B) Activation by the JPX lncRNA as an example of a trans-acting lncRNA, which in this case facilitates XIST activation. (C) XIST as an example of a lncRNA that facilitates gene repression in cis across the majority of the X chromosome. (D) HOTAIR expression results in trans repression of HOX genes. Reproduced from Rinn Cold Spring Harb Perspect Biol

2014.

15

1.5.2. Trans-Regulation

Trans-regulatory lncRNAs work to regulate gene expression at distant locations in the genome

(38). This form of regulation is typically epigenetic in regulation type and requires the recruitment of histone modification complexes (38). Some trans-acting lncRNAs include HOTAIR, PCAT-1, and

GAS5 (Figure 1.5; B & D).

HOTAIR works to regulate HOX genes via a trans-regulatory mechanism (80). HOTAIR is transcribed from the HOXC cluster but regulates the HOXD cluster (80). HOTAIR does this by binding to two histone modification complexes: PRC2 and LSD1/CoREST1 (41, 80) acting as a scaffold between the two complexes to coordinate their function (38). This well described lncRNA has been implicated in many cancers. For example, in breast cancer and hepatocellular carcinomas, HOTAIR is upregulated (81, 82). In breast cancer, its overexpression is used as a marker for overall-survival and progression-free survival (83-85). Circulating DNA of HOTAIR in serum has recently been proposed as a novel biomarker for breast cancer (86). Moreover, the overexpression of HOTAIR in breast cancer, helps to promote abnormal PRC2 function by increasing PRC2 recruitment to target genes, where PRC2 induces repressive H3K27me3 chromatin marks (84). This causes widespread change in chromatin structure and gene repression.

Prostate Cancer-Associated Transcript 1 (PCAT-1) is a lncRNA that shows tissue specific expression and is only upregulated in prostate cancer (38). Unlike HOTAIR, PCAT-1 is repressed by PRC2, suggesting that overexpression of PCAT-1 defines a specific molecular subtype of the prostate that does not require PRC2 coordination (87). PCAT-1 supports cancer cell proliferation and functions as a of transcription by repressing genes involved in mitosis and cell division such as BRCA2 (38).

Long noncoding RNA GAS5 has many isoforms ranging up to 12 in size (88). GAS5 regulates cell survival and metabolism via the glucocorticoid receptor (GR) (88). The 3’ end of

16

GAS5 interacts with the GR DNA-binding domain and this is sufficient to repress GR-induced genes

(88). When bound to GR, GAS5 serves as a decoy that prevents GR from binding to its target DNA sequences and genes (88).

GAS5 is shown to play a role in breast cancer. In human breast tumors GAS5 expression is downregulated (89). When GAS5 is overexpressed in breast cancer cell lines, it induces cancer cell and suppresses cancer cell proliferation (89). It is speculated that in addition to GAS5 playing a role in cancer via interaction with GR, GAS5 may also interact with the

(AR) (88).

Rosen and co-workers described the first lncRNA expressed in normal tissue that was reported to be regulated by the process of mammary gland development (31, 90). Pregnancy Induced Non-

Coding RNA (PINC) is a large, alternatively spliced, noncoding RNA expressed in the terminal ductal structures of the parous mammary gland in regression after pregnancy (90). As already mentioned, terminal end buds possess progenitor properties and thus have the ability to repopulate a mammary gland and survive apoptosis (28, 31, 91). PINC expression is temporally and spatially regulated in response to developmental stimuli in vivo (90). PINC RNA is localized to distinct foci in either the nucleus or the cytoplasm in a cell-cycle-specific manner. Loss-of-function experiments showed that PINC is involved in the regulation of cell-cycle progression and breast cell survival

(90). PINC is upregulated during pregnancy but is then downregulated when the breast regresses following delivery and breast involution (92). The cells that survive involution are thought to function as alveolar progenitor cells that rapidly differentiate into milk-producing cells in subsequent pregnancies (92). Experiments done in the post-pubertal mouse mammary gland, determined that mPINC is enriched in luminal and alveolar progenitors (92). mPINC levels increase throughout pregnancy and then decline in early lactation, when alveolar cells undergo terminal differentiation.

Accordingly, mPINC expression is significantly decreased when HC11 mammary epithelial cells are induced to differentiate (92). This reduction in mPINC levels may be necessary for lactation, as overexpression of mPINC in HC11 cells blocks lactogenic differentiation, while knockdown of mPINC enhances differentiation. Shore et al. have demonstrated that the normal breast

17

regulates a lncRNA that leads to terminal differentiation of alveolar cells during pregnancy and mammary gland development to prevent abundant milk production and secretion until delivery (92).

1.6. LncRNAs and the hallmarks of cancer

In 2000, Hanahan and Weinberg proposed that there are six defining traits of cancer (93). Tumor formation is a multi-step process that involves all of the six hallmarks of cancer (94). The six basic cancer traits are: sustaining proliferative signaling, evading growth suppressors, enabling replicative immortality, activating invasion and metastasis, inducing angiogenesis, and resisting cell death (93). In 2011, Hanahan and Weinberg expanded on their original analysis, describing therapeutic targeting to ten hallmarks of cancer, including immune and epigenetic regulators (95).

Here we look at each of the original six cancer traits and include how lncRNAs play a role in tumor formation and progression in relation to each of the original hallmarks of cancer (Figure 1.6).

18

Figure 1.6: LncRNAs associated with hallmarks of cancer. LncRNAs have been implicated in key hallmarks of cancer and can contribute to the onset and progression of cancer. LncRNAs may act as oncogenes or tumor suppressors of all the processes involved in the hallmarks of cancer.

Adapted from Parasramka et al., Pharmacology & Therapeutics 2016.

1.6.1. Sustaining Proliferative Signaling

One of the most common characteristics of cancer cells, is their ability to continuously proliferate in the absence of external stimuli and independently of cell-to-cell contact (93, 94). Tumor cells can acquire the ability to sustain proliferation in many ways. They can produce their own growth factors and corresponding growth factor receptors to create self-stimulation (94). Cancer cells can also

19

produce signals to stimulate normal tumor-associated (tumor-adjacent) cells to produce tumor cell growth factors (94). In addition, growth factor receptor levels or the signaling cascade can be altered causing cancer cells to be hyper-responsive (94). Finally, cancer cells can become completely independent of growth factors and not need them to grow and proliferate through constitutive activation of downstream signaling or negative feedback mechanism disruptions (94).

Some lncRNAs that, when dysregulated, play a role in assisting cancer cells in sustaining proliferative signaling are SRA, PCAT-1, and other lncRNAs that are derived from cell cycle gene promoters.

SRA (steroid receptor RNA activator) is a lncRNA that is part of a protein complex that contains

SRC-1 (94, 96). SRA functions as a coactivator of the steroid receptors PR, ER, and androgen receptor (AR) (94). SRA is an interesting transcript in that depending on the spliced form, it can act as a lncRNA or code for a protein. As a lncRNA, elevated levels of SRA are found in breast tumors and it is believed that the increase in SRA contributes to the altered estrogen and progesterone receptor activity that occurs during breast tumorigenesis (97). As a protein-coding gene, the SRA gene produces a protein that acts as both a coactivator and corepressor (98, 99). of the SRA transcript allows for a balance between lncRNA and protein versions of SRA

(100). This transcriptional balance is speculated to be involved in breast tumorigenesis and tumor progression by regulating, or dysregulating, specific genes (101).

As mentioned, PCAT-1 is a lncRNA that is highly upregulated in metastatic and high-grade localized prostate cancers (87). This lncRNA has been shown to play a role in cancer cell proliferation. After siRNA-mediated knockdown of PCAT-1, cell proliferation decreased and when

PACT-1 was overexpressed, cell proliferation increased (94). After knockdown of PCAT-1, 255 genes were upregulated and 115 downregulated (94). Those upregulated genes were associated with mitosis and cell cycle (94). This shows that PCAT-1 functions as a repressor of a subset of genes involved in antagonizing cell cycle progression and proliferation and thus PCAT-1 may contribute to prostate cancer progression (87, 94).

20

One study identified 216 lncRNAs derived from cell cycle gene promoter regions (102). Many of the lncRNA transcripts show periodic expression throughout the cell cycle and significant altered expression in many human cancers (94). The expression of these lncRNAs is regulated by specific oncogenic stimuli, stem cell differentiation, or DNA damage (94).

1.6.2. Evading Growth Suppressors

There has been extensive study on protein-coding genes that inhibit cellular growth of cancer cells called tumor suppressor genes. Some of these include p53, PTEN, and RB. When activated, these genes can lead to cell cycle arrest or might induce apoptosis (93, 94). When these growth signals are turned off, cancer cells die. Therefore, cancer cells have found ingenious ways to overcome tumor suppressor genes. One such mechanism is the loss or mutation of the tumor suppressor gene (94). In addition, cancer cells have developed ways to inhibit tumor suppressor function using lncRNAs (94). ANRIL and lincRNA-p21 are some of the lncRNAs described to inhibit tumor suppressor genes.

ANRIL, as previously mentioned, is a lncRNA that works via cis-regulation to repress the INK4 locus (38). INK4B is a tumor suppressor located within that locus, and when expression levels are elevated ANRIL blocks its activity (77, 94). ANRIL interacts with the SUZ12 subunit of PRC2, recruiting PRC2 to the INK4/ARF locus to silence INK4B (94). In studies where ANRIL was depleted, INK4B expression levels increased and cancer cell proliferation was inhibited (94). It has also been shown that ANRIL interacts with the CBX7 subunit of PRC1 to place an H3K27 trimethylation mark on the INK4A locus to silence it (76). In prostate cancer, both CBX7 and ANRIL have elevated expression levels showing ANRIL’s correlation to cancer progression and cancer cell survival through the ability to evade growth suppressors and tumor suppressors (94). While

ANRIL shows oncogenic functions, the lncRNA growth arrest-specific 5 (GAS5) displays more tumor suppressor-type functions (94). GAS5 is transcribed from chromosome 1 and is alternatively spliced to form a lncRNA with exons that have a small and poorly conserved open reading frame

21

(103, 104). The introns of GAS5 code for many snoRNAs and these in turn may regulate important biological functions (105).

Unlike SRA, which acts as a coactivator, GAS5 functions as a “ribo-repressor”, interacting with the

DNA binding domain of glucocorticoid receptor (GR) and thus competing with glucocorticoid response elements (GREs) for receptor binding (88). This suppresses the transcription and induction of several downstream genes, including many apoptotic inhibitors, thereby sensitizing the cell to apoptosis (88). GAS5 expression in prostate and breast cancer cell lines induces both growth arrest of cancer cells and apoptosis independently of other stimuli (89). GAS5 expression is greatly reduced in actively growing leukemia cells (106). After the cell cycle is arrested due to high cell density, GAS5 levels increase (106). In breast cancer, GAS5 expression is significantly lower than normal breast epithelial tissue (89). In addition, many other cancers have alterations in the GAS5 transcript including melanoma, breast, and prostate cancers (107-109). These alterations cause low levels of GAS5 and thus GAS5 cannot enact its tumor suppressor function to stop the growth of the cancer cells.

Long intergenic noncoding RNA p21 (lincRNA-p21) is a direct p53 target located right next to the p21 (CDKN1A) gene (110). Its expression is typically activated due to DNA damage in many different tumor models (110). LincRNA-p21 interacts with hnRNP K, an RNA binding protein that acts as a repressor of transcription (110). When lincRNA-p21 is bound to hnRNP K, it mediates its binding to target genes, finally leading to gene silencing and induction of apoptosis (110). This mechanism has only been described in mice and while the transcription of human lincRNA-p21 is similarly initiated by DNA damage in human fibroblasts, the exact mechanism is still unknown (94).

It is speculated that lincRNA-p21 might directly silence the tumor suppressor p21. Activation of p21 is dependent on the silencing of the lincRNA-p21 antisense transcript (111). In human cells, this bidirectional transcription could be an endogenous regulatory mechanism that uses an antisense

RNA to directly regulate the transcription of a gene (94).

22

1.6.3. Enabling Replicative Immortality

Cancer cells, unlike normal cells, have the ability of unlimited replication. In normal cells, replication is limited to a certain number of division cycles based on the shortening of telomeres in each division (93, 94). This telomeric shortening is what determines the number of times a cell can divide.

Cancer cells seem to have found ways to overcome the shortening of telomeres during cell division to allow for unlimited replication (93, 94). There are two ways cancer cells can do this. The first is through telomerase, an enzyme that adds telomeric repeats to the ends of chromosomes, circumventing the normal shortening during cell division (112-114). In fact, about 90% of all human cancers express telomerase (112-114). The second way the other 10% of cancer cells overcome telomere shortening is through alternative lengthening of telomeres (ALT) (112, 113, 115). This strategy is a non-conservative way of lengthening telomeres via the transfer of telomere repeats between sister chromatids (112, 113, 115). TERC and TERRA are two lncRNAs that partake in the first strategy of telomeric lengthening involving telomerase.

The telomerase holoenzyme comprises of a reverse transcriptase called Telomerase Reverse

Transcriptase (TERT) and an RNA primer called Telomerase RNA Component (TERC) (116).

Synthesis of telomeres involves TERT-catalyzed reverse transcription of a small region within

TERC (94). Under normal conditions, telomeres shorten as cells age until the cell’s lifespan has been met, and the cell undergoes programmed cell death via apoptosis. The transcription of this small region within TERC is critical for tumor cells to become immortalized because it produces telomerase which allows for lengthening of telomeres, thus counteracting cell aging and enabling the cancer cells to live “indefinitely” (94). In many human cancers, TERC is amplified and overexpressed (117-119).

TERRA, telomeric repeat-containing RNA, is a group of lncRNAs derived from several subtelomeric loci (120-122). TERRA is involved in telomeric heterochromatin formation and is thought to be a negative regulator of telomerase, acting either locally or globally to inhibit telomerase activity (123,

124). This promotes telomere capping and preserves genomic and telomeric integrity (125).

23

Reduced expression of TERRA is necessary for tumor cells to lengthen telomeres, linking TERRA to cancer (94). Cancer cells that express telomerase and high methylation of subtelomeric loci also show low expression levels of TERRA (126). When cells were incubated with synthetic TERRA mimicking oligonucleotides, telomerase activity was inhibited, showing a potential therapeutic for cancers that have lowered TERRA levels (122).

1.6.4. Activating Invasion and Metastasis

Cancer cells often display morphological changes as well as changes in cell-to-cell and cell-to- matrix adhesions and interactions (93, 94). This enables cancer cells to pass through the first steps of invasion and metastasis. The invasion and metastasis process consists of multiple biological changes that allow cancer cells to invade into healthy tissues and then into blood and lymphatic vessels (127, 128). The cancer cells must evade the immune system and then exit blood vessels into target tissues to form new tumors in different body sites (93, 94). Two lncRNAs that assist in cancer cell invasion and metastasis are MALAT-1 and HOTAIR.

MALAT-1 (Metastasis-Associated Lung Adenocarcinoma Transcript 1) has been identified as a prognostic marker for metastasis and overall survival in non-small cell lung carcinoma (NSCLC)

(129). MALAT-1 is abundant in many human cell types and is highly conserved across species (49,

130). MALAT-1 is housed in the nucleus and localizes to nuclear speckles to regulate alternative splicing of pre-mRNAs by modulating active serine/ splicing factors (49, 131). MALAT-1 is a master regulator of RNA post-transcriptional modification and perturbation of cellular MALAT-1 levels modifies processing of a wide variety of transcripts directly involved in tumor biology (132).

MALAT-1 is expressed in normal cells and tissue, showing the highest expression in pancreatic and lung cells (129). In many human cancers, like lung cancer, uterine cancer, cervical cancer, and hepatocellular carcinoma (HCC), MALAT-1 is upregulated, and in NSCLC patients, MALAT-1 is associated with metastasis (94, 129, 133-135). The association of MALAT-1 with metastasis is stage-specific and tissue-specific, meaning MALAT-1 can serve as a prognostic marker for survival in early stage lung adenocarcinoma (129).

24

HOTAIR (HOX Antisense Intergenic RNA) is another lncRNA shown to be involved in cancer metastasis. As stated earlier, HOTAIR acts via a trans-regulatory mechanism; it is transcribed from the HoxC locus but regulates gene function of the HoxD locus (80).

HOTAIR is often dysregulated in many types of cancer (82, 84, 136, 137). In breast cancer,

HOTAIR expression is increased in both primary tumors and metastases (84, 94). In addition,

HOTAIR expression levels in primary tumors is correlated with metastasis and poor prognosis (94).

Overexpression of HOTAIR in epithelial cancer cells alters H3K27 methylation patterns via PRC2 and therefore changes gene expression, leading to an increase of cancer invasiveness and metastasis (84). Parallelly, HOTAIR depletion has been show to inhibit cancer invasiveness (84).

In HCC, HOTAIR expression is increased compared to non-cancerous tissues (82). In HCC patients that receive liver transplantation, high HOTAIR levels are used as a prognostic marker for

HCC recurrence and shorter survival (82).

Similarly to breast cancer, HOTAIR depletion in liver cancer cells reduced invasion and induced cancer cell death (138). Additionally, HOTAIR could be used as a potential biomarker for lymph node metastasis in HCC patients (138). HOTAIR suppression causes cancer cells to be more sensitive to tumor necrosis factor A-induced apoptosis as well as chemotherapeutic agents (82).

1.6.5. Inducing Angiogenesis

As cancer cells grow and proliferate, they increase in size and mass. Cell growth is limited by the diffusion of nutrients and oxygen into the cell, but cancer cells often grow past the normal barriers of size and mass (93, 94). This occurs through angiogenesis, or the formation of new blood vessels, which allows the cancer cells’ access to more oxygen and nutrients (93, 94). It also allows for cancer cells to dispose of their toxic metabolic waste into the blood stream (94).

To stimulate blood vessel formation, tumor cells induce angiogenic factors or block antiangiogenic signals (139). Many of the regulators of angiogenesis are signaling proteins that can bind to receptors to either activate or inhibit blood vessel formation (140). One known activator of angiogenesis is VEGF, a protein whose expression is induced by either hypoxia or oncogenic

25

signaling (141). It’s activity can be counterbalanced by Thrombospondin-1, which binds to the same receptors as VEGF (142).

Long noncoding RNAs have also been shown to play a role in the regulation of angiogenesis. One such lncRNA is aHIF. aHIF is a naturally transcribed antisense transcript from the untranslated region of the gene hypoxia inducible factor a (HIF1a) (143, 144). aHIF negatively regulates HIF1a, which is a major inhibitor of angiogenesis (143, 144). When aHIF is overexpressed, HIF1a is targeted for mRNA decay, and therefore is unable to inhibit angiogenesis (145). aHIF is a natural antisense transcript of HIF1a and is found in normal cells and tissues (143). aHIF transcripts are also often present and overexpressed in several human cancers, including renal carcinomas (143,

144). In breast cancer, expression of aHIF is used as a marker for poor prognosis (146).

1.6.6. Resisting Cell Death

There are three major pathways that lead to cell death. These pathways are meticulously altered for the cancer cells to avoid cell death (95).

The first mechanism that leads to a controlled cell death is apoptosis. Apoptosis can be initiated through several external and internal stimuli, and many malignant cancers show reduction of apoptosis as well as therapy resistance (147, 148). One way to trigger apoptosis is through DNA damage and the p53 pathway (93, 94). p53 induces the expression of many pro-apoptotic genes leading to cell death, but in many cancers, p53 is either completely lost or mutated, making cancer cells more resistant to cellular stress and less resistant to apoptotic factors (149). Tumor cells also show increased expression in anti-apoptotic regulators such as Bcl-2 and Bcl-xL (94).

A second mechanism of controlled cell death is autophagy. This usually occurs at low levels in cells but its activity can be heightened in time of cellular stress (150). Autophagy allows for cells to break down organelles and use those degraded products for energy synthesis (150, 151). Autophagy can have both positive and negative effects for cancer cells. Positively, autophagy induction following

26

Beclin-1 gene loss in mice has been shown to increase cancer susceptibility (150, 152). Negatively, under severe stress conditions, cells will shrink via autophagy into a dormant state, disallowing cancer to grow (152). However, this dormancy is reversible and may contribute to cancer recurrence after therapy (152).

The third mechanism of cell death is necrosis. Typically, necrosis is seen as “uncontrolled” cell death, but more recently, evidence has shown that necrosis is a controlled rather than non-direct method of cell death (153, 154). Necrosis can help promote cancer cell proliferation and expansion

(93, 94). Necrotic cells attract pro-inflammatory cells, and these then can activate cancer cell- promoting mechanisms such as angiogenesis, proliferation, and invasion (153, 155).

Long noncoding RNAs have been shown to be influential in the cell death decision, two examples are PCGEM1 and PANDA.

Prostate-specific transcript 1 (PCGEM1) is a lncRNA that is prostate tissue specific and prostate cancer associated (156). PCGEM1 allows for cancer cells to avoid death through apoptosis inhibition. PCGEM1 functions in apoptosis inhibition through delayed induction of the tumor suppressor genes p53 and p21 when PCGEM1 is overexpressed (157, 158). When PCGEM1 expression levels are elevated, levels of cleaved caspase 7 and cleaved PARP (which are both markers of apoptosis) are strongly reduced (157, 158). Additionally, this apoptosis inhibition and p53/p21 delay has been shown to be androgen-dependent (157, 158).

PANDA (p21 associated ncRNA DNA damage activated) also works to promote cancer cell immortality through inhibition of apoptosis. When DNA is damaged, five lncRNAs can be transcribed from the p21 promoter, one of which is PANDA (102). PANDA acts in trans to limit the expression of pro-apoptotic genes through interaction with NF-YA (102, 159).

NF-YA is a transcription factor that binds to essential cell death genes (102). PANDA inhibits the expression of pro-apoptotic genes by sequestering NF-YA away from target gene promoters (102).

Depletion of PANDA expression showed sensitivity of human fibroblasts to apoptosis (102). Kotake

27

et al. provided new insights into the regulatory mechanisms of PANDA earlier this year by showing that PANDA affects regulation of p53 tumor-suppressor protein by stabilizing p53 in response to

DNA damage (159). Thus, PANDA seems to promote cell survival by repression of apoptotic genes but also stabilizes p53 in response to DNA damage.

1.7. General Considerations

The examples above, including PANDA lncRNA, serve as indicators of the many gaps knowledge in the lncRNA field. The experimental advancements haven’t kept up with the amount of information technological advances, such as microarrays and RNAsequencing have provided. This has created a huge gap in knowledge in the ncRNA field, especially in the broad lncRNA area. However, from the information presented above, it is evident that lncRNA is not “dark matter” and that in fact many reported lncRNAs play major roles in disease. These noncoding RNA transcripts have a variety of different mechanisms of action and functions and can regulate gene expression at the genetic, transcriptional, translational, and protein levels. The further we understand and study these functions and mechanisms, the closer we can get to understanding how lncRNA can be used to prevent, screen for, or be used as therapeutics for cancer in particular (95). Current methods of cancer therapy face a variety of difficulties: many therapies are harmful to healthy tissue and other organs in the body and have insufficient delivery methods, and many are nonspecific and do not work on all subtypes of a specific cancer (94). Long noncoding RNAs could provide an answer to the above challenges. Moreover, in the case of breast tissue in particular, identifying lncRNAs expressed in parous vs. nulliparous women could give us hints of possible biomarkers to be used for breast cancer prevention.

This project proposes to tackle an untapped field on understanding the regulation of transcription in breast cancer prevention through comparing lncRNA expression between two physiologically different women. After analysis of 42 differentially expressed lncRNAs between parous and nulliparous women, we selected two lncRNAs (lncBHLHE41 and lncEPCAM) to be tested in functional studies using different molecular techniques in human epithelial breast cells to determine their relevance in breast cancer and breast cancer prevention.

28

CHAPTER 2: MATERIALS AND METHODS

2.1. Data and Human Breast Sample Collection

Breast core needle biopsies (CNBs) from 8 parous and 8 nulliparous women were obtained using

12 Gauge BARD® MONOPTY® disposable core biopsy instrument (Bard Biopsy Systems, Tempe,

AZ). The procedure was performed by an experienced physician at the Mammography Department at Sunderby Hospital, Luleå, Sweden. Three CNBs for each volunteer were taken; one core was fixed for histological analysis and the remaining cores were used for subsequent RNA extraction

(160). From this set of samples, RNA samples were used to prepare the libraries and run the

RNAseq for this project.

All volunteers who were eligible had signed an informed consent and completed a questionnaire that collected data on reproductive history, medical history, family background of cancer, use of tobacco, oral contraceptive (OC), hormone replacement therapy (HRT), and/or other medications

(160).

2.1.1. RNA isolation

Total RNA from the core biopsy samples was isolated using the Qiagen All prep RNA/DNA Mini Kit according to the manufacturer’s instructions (Qiagen, Alameda, CA). RNA quantity was assessed using NanoDrop v3.3.0 (NanoDrop Technologies, Wilmington, DE) and quality was assessed by means of the Agilent 2100 Bioanalyzer (Agilent Technologies, CA). The amount of total RNA yielded from the core biopsies ranged from 150 ng to 1 µg. The total RNA from cell lysates was isolated using the Qiagen RNA Mini Kit according to the manufacturer’s instructions (Qiagen,

Alameda, CA) using Qiagen lysis buffer supplemented with 1% beta-mercaptoethanol (Gibco;

Thermo Fisher Scientific). Only high quality RNA (RIN of 8.0 or more using Agilent Bioanalyzer

RNA gels according to protocol) was used for library preparation.

29

2.1.2. Library Preparation

Between 200ng-1µg total RNA was used for RNAseq library preparation by following the Illumina

TruSeq RNA v1 sample preparation guide. A further clean up step was performed after the enrichment step to get rid of the primer dimer formed during PCR enrichment step, if needed.

RNAseq libraries were subjected to quantification process by Qubit (Life Technologies), pooled for cBot amplification and subsequent 50 base paired-end sequencing run with Illumina HiSeq 2000 platform.

2.1.3. Quality Control

Agilent Bioanalyzer High Sensitivity DNA gels were used for the sizing and quality control of low concentrated dsDNA samples from 50 - 7000 bp. The average size of the cDNA obtained after the library construction was around 260-300 bp.

2.1.4. Quantity Determination

Accurate quantification of the number of amplifiable molecules in a library was critical to the outcome of sequencing results on Illumina next-generation sequencing platforms. Overestimation of library concentration results in lower cluster density while underestimation results in too many cluster on the flow cell, which can lead to poor cluster resolution. Both scenarios result in suboptimal sequencing capacity. Q-PCR is regarded as the gold standard for accurate library quantification of cDNA libraries. cDNA quantity was determined by q-PCR using SYBR Green I dye. A standard curve of six known

DNA concentration samples was used to determine the unknown library’s concentration.

Quantification was achieved by inference from the standard curve. Each point of the DNA standard curve was run in triplicates. 1:8000 dilutions were done to the library and samples were run in triplicates. The average was used to determine the library’s final concentration.

30

2.1.5. RNAseq and RNAseq Data Analysis

RNAseq data was generated using Illumina HiSeq 2000. After the sequencing run, demultiplexing with CASAVA was employed to generate the fastq file for each sample (reads passing filtering can be used as sequence input for alignment). Reads were aligned to the human genome (UCSC hg19 build) using TopHat software (161). Gene expression levels of protein coding genes were extracted using HTSeq (162) with RefSeq annotation (163). After removing genes with 0 sequence read from all samples, total of 20,863 genes were reported for all 16 samples. Data were then normalized by

DESeq normalization method (164) and a small pseudo count 10-5 was added before log- transformation. We removed one outlier data point per gene, per test group (parous and nulliparous) before applying the Limma moderate t-test (165) for differential expression analysis.

The outlier data point was determined by the farthest distance to the median expression level of the given gene. Forty-two (42) lncRNAs were differentially expressed between parous and nulliparous samples using p.value <= 0.05 and fold change >=2 (Chapter 3) and sixty-eight (68) genes were selected by using Benjamini-Hochberg multiple-test adjusted p.value <= 0.01 and fold change >= 2 between parous and nulliparous samples for protein coding genes (Chapter 5). It was also required, for each selected gene, to be expressed >50 (average read count after normalization) in at least one group. The samples were run in two different batches that showed no statistically significant difference between them. Thus, the results from the two batches were combined. In summary, differential expression of normal breast tissue biopsies from 8 parous and 8 nulliparous healthy postmenopausal women were compared and candidate lncRNAs obtained from this analysis were further evaluated in vitro and in vivo.

31

2.1.6. Integrative Genome Viewer (IGV)

The integrative Genomics Viewer is a visualization tool for interactive exploration of large and integrated genomic datasets. It is an easy way to easily visualize a variety of data types including

RNAseq data (166, 167). RNAseq data from our project was uploaded to the software and this allows for viewing quality of RNAseq data (i.e. coverage), expression for the different samples, exact location of the lncRNAs, its length, exact sequence, etc.

2.1.7. Use of the Cancer Genome Atlas (TCGA) Breast Invasive Carcinoma Data Download, Sample Selection and TCGA Data Analysis

The Cancer Genome Atlas (TCGA), a NIH subsidiary, is a database that contains data for numerous cancers. RNAseq data from the Invasive Breast Carcinoma dataset on TCGA (IBC-

TCGA) contained a total of 1,098 clinical data samples as of February 2016. Normal and tumor sample from the same patient (normal-adjacent matched samples) were selected. In addition, we selected samples that were classified under data level 3 (normalized by TCGA). One hundred and twelve patients had matching normal and tumor RNAseq level 3 data (a total of 224 files). The downloaded files included raw data counts and normalized counts on gene expression, exons, and splicing junctions.

2.2. Tissue culture and human breast samples

2.2.1. General tissue culture procedures

Information in Table 2.1 summarizes the phenotypes and karyotypes of the immortalized human breast epithelial cells and the breast cancer cell lines representing the main subtypes of breast cancer.

32

Table 2.1: Cell line characteristics. Adapted from Neve et al. Cancer Cell 10, 515–527,

December 2006

Cell line Gene Cluster ER PR HER2 Tumor Typer Age (years) Ethnicity Culture media Authenticated (Y/N) HCC1954 basal negative negative positive Ductal Carcinoma 61 white RPMI+10%FBS Bought from ATCC Hs578T basal negative negative IDC 74 white DMEM+10%FBS N MCF10A* basal negative negative fibrocystic disease 36 white DMEM/F12 Y MCF10F* basal negative negative fibrocystic disease 36 white DMEM/F12 Y MCF12A* basal negative negative fibrocystic disease 60 white DMEM/F12 N MCF7 luminal positive positive IDC 69 white DMEM+10%FBS Y MDAMB231 basal negative negative adenocarcinoma 51 white RPMI+10%FBS Y SKBR3 luminal negative negative positive adenocarcinoma 43 white McCoys 5A+10%FBS Y T47D luminal positive positive IDC 54 ? RPMI+10%FBS Y IDC = invasive ductal carcinoma * cell line derived from reduction mammoplasty

All cell lines were obtained from the Cell Culture Facility (CCF) at Fox Chase Cancer Center

(FCCC) – except for MCF12A – to avoid the use of ‘‘secondhand’’ cells which may result in multiple clonal variants of some cell lines (see section 2.1.ii). To maintain the collections’ integrity, cell lines were carefully maintained in culture, and stocks of the earliest-passage cells were stored. Quality control was maintained by careful analysis and reanalysis of morphology, growth rates, and gene expression. All cell extracts were made from subconfluent cells in the exponential phase of growth in full media. Information about the biological characteristics of the cell lines used and the culture conditions are summarized in the table above (Table 2.1). All cell lines were maintained in a 37°C,

5% CO2 humidified incubator for the duration of the experiments. When ready to be split, cells were detached from culture dish or flask with 0.25% trypsin/EDTA and counted using a TC20 automated cell counter from Bio-Rad (cat # 1450102). Most of the cell lines have been authenticated by CCF at FCCC (MCF10A, MCF10F, MCF-7, T-47D, MDA-MB-231, and SK-BR-3). I briefly describe below the main characteristics of the cells used in this study.

33

2.2.1.a. MCF10F and MCF10A cells

A mortal human breast cell derived from a subcutaneous mastectomy specimen of a 36-year old woman with no family history of breast cancer acquired the immortal phenotype in vitro (168). The breast was composed of lobule type 2 and was free of neoplasia, exhibiting only stromal fibrosis, cystic changes and ductal hyperplasia without atypia. Original explants of the tissue, maintained for over a year, exhibited a normal diploid chromosomal pattern. MCF10F (F=floating) cells, which exhibited immortality after extended cultivation in low calcium medium, retained the characteristics of the normal breast epithelium, such as lack of tumorigenicity in nude mice, three-dimensional growth in collagen, hormone and growth factor dependency for in vitro growth, lack of anchorage- independent growth and dome formation in confluent cultures (168). Immortalization of these cells was characterized by their continuous growth in culture medium containing either low Ca++ or the conventional level of Ca++ (1.05mM) without entering and without expressing phenotypes indicative of neoplastic transformation such as colony formation in agar or in agar- methocel. MCF10A (Adherent) cells are bona fide normal HBEC in nature, expressing genetic, cytogenetic, ultrastructural and phenotypic characteristics of normal human breast epithelium, representing the cell line closest to a normal HBEC available (169). The phenotype of MCF10A cells has been maintained stable for more than 118 passages in vitro (169). Authenticated MCF10A and MCF10F are available at the Cell Culture Facility (CCF) at Fox Chase Cancer Center (FCCC).

The media to grow these cells was prepared and bought at the CCF at FCCC.

2.2.1.b. MCF12A cells

Non-tumorigenic epithelial cells, produced by long-term culture of normal mammary tissue were kindly provided by Dr. Xiaowei Chen at FCCC. Dr. Chen obtained the cells from ATCC (American

Type Culture Collection) and only passaged them once. Cells were at an early passage when experiments were performed (passage 8). Cells were fed twice weekly with 1:1 Dulbecco’s modified

Eagle’s medium (DMEM)/F12/Ham’s F12 medium supplemented with 5% horse serum, 30 mM

NaHCO3, 10 mg/ml of insulin, 500 ng/ml of hydrocortisone, 100 ng/ml of cholera toxin, 20 ng/ml of

34

epidermal growth factor (EGF), 50 mg/ml of penicillin/streptomycin (170). Media was prepared and bought from CCF at FCCC. MCF12A is an epithelial spontaneous immortalized cell line. Derived from a 60-year old nulliparous women after reduction mammoplasty, it is a non-tumorigenic cell line (171). It shows three dimensional growth in collagen according to ATCC. However, our immune-cytological analysis indicated these cells express low levels of E-cadherin, high levels of vimentin and don’t form three dimensional growth in collagen. Thus, MCF10A cells were selected as our normal immortalized cell line to normalized results.

2.2.1.c. MCF-7 cells

Authenticated luminal MCF-7 mammary adenocarcinoma cells obtained from the Cell Culture

Facility (CFF) at Fox Chase Cancer Center (FCCC) were studied between passages 20-30. Cells were fed twice weekly with 1:1 DMEM/Ham’s F12 medium supplemented with 10% fetal bovine serum (FBS), 2 mM L-glutamine, 30 mM NaHCO3, 16 ng/ml of insulin, and 50 mg/ml of penicillin/streptomycin (170).

2.2.1.d. T-47D or T47D cells

Authenticated luminal cell line T-47D was purchased from CCF at FCCC. T-47D breast cancer cells were grown in RPMI-1640 media supplemented with 10% FBS. All studies were done with cells in low passage number (30-40). T-47D is a cell line derived from metastatic pleural effusion from a primary ductal carcinoma. It’s a breast epithelial cell line.

2.2.1.e. MDA-MB-231 or MDAMB231 cells

Authenticated triple negative breast cancer cell line MDA-MB-231 was purchased from CCF at

FCCC and studied between passages 20-30. MDA-MB-231 cells were grown in RPMI-1640 media supplemented with 10% FBS. Cell line is derived from pleural effusion metastatic site. The primary

35

breast cancer was adenocarcinoma in a 51-year-old Caucasian woman. It’s an epithelial and triple negative (ER-, PR-, HER2-) cell line.

2.2.1.f. Hs-578T

Triple negative breast cancer cell line (ER-PR-HER2-) derived from an invasive ductal carcinoma diagnosed in a 74 year old Caucasian woman. This cell line is epithelial and grows in DMEM media supplemented with 10% FBS.

2.2.1.g. SK-BR-3

Luminal B type epithelial breast cancer cell line derived from adenocarcinoma of a 43-year-old

Caucasian woman. It overexpresses HER2 and is + (ER+PR+). It grows in

McCoy’s 5a media supplemented with 10% FBS. These cells were authenticated by CCF at FCCC.

2.2.i.h. HCC1954

These breast epithelial cells were obtained from 61-year-old East Indian woman diagnosed with invasive ductal carcinoma with no lymph node metastasis. The tumor was HR- (ER-PR-) and overexpressed HER2. These cells were purchased directly from ATCC and expanded in our lab in

2013. Low passage vials were stored at -80oC for short term use and in liquid nitrogen for long term use.

36

2.2.2. Normal and Cancer Breast Tissue Processing

Frozen tissue was obtained from the Tissue Bank at FCCC. Tissues are from biopsies collected during surgery. Although at the time of pathological processing for storage in the Tissue Bank the samples were separated in normal-or-adjacent-to-the tumor and cancer, we re-evaluated the tissue by H&E to only use normal tissue that had a normal-histological appearance. Those bonafide normal breast tissue were selected for comparison of gene expression between them and cancer tissue. Each sample stored in the Tissue Bank at FCCC contain an exhaustive report collected on the patients’ clinical history before surgery and the final histopathological report. Thus, each tissue had a full report of the patients’ clinical and biological characteristics. This helped in the selection of tissue to be analyzed and the interpretation of results.

Frozen tissues were embedded in OCT (Optimal Cutting Temperature compound) and place in cryomolds previous to cutting. A drop of OCT was placed in the bottom of the mold and the tissue was placed on top. This holds the tissue in place while you fill the mold with OCT. Once the mold level was full, we placed the mold with the tissue in OCT on dry ice until ready to be cut. Frozen tissues were stored at -80oC. Only tissues that showed a clear histology were used for further analysis. Lobules, ducts and/or ductules had to be found in tissue labeled as “normal” and a clear tumor pathology had to be seen under the microscope for a tissue to be considered “tumor”.

“Tumor” tissues which presented a few normal structures were discarded and vice versa. Tissues that presented “hyperplasia” or “atypical lobules” were discarded.

2.3. RT-qPCR

Reverse Transcriptase quantitative PCR with TaqMan primer/probe detection was performed and expression levels of selected lncRNAs were determined by triplicate. After RNA is extracted from the cells of interest, a cDNA reaction using reverse transcription enzyme and random primers is performed. The cDNA obtained in step one is then used to perform step two where a real time PCR reaction is performed using a primer/probe combination targeting your DNA of interest. Each experiment was run three times (biological replicate) and each PCR reaction run was done at least

37

in duplicate (technical replicate). A house keeping gene such as GAPDH or 18S was always run in duplicate to normalize expression levels. 18S was found to be very consistent among different cell lines and throughout passages and was thus selected as our control for differential gene expression. 18S was run in duplicate in every RT-qPCR experiment that we performed.

Primer/Probes were obtained from Applied Biosystems. The assay search tool/find your Taqman assay was used to design the primer/probes. As most of our RT-qPCR targets were novel lncRNAs, we used the lncRNA’s sequence as target information. The software designs the best primer/probes for that region taking into account nucleotide balance (amount of ATCGs), specificity to the region (to avoid off target effects), melting temperature, etc.

2.4. Lentiviral Infections for overexpression of lncRNAs

We generated lentiviral constructs that contained a green fluorescent protein (GFP) tag to be used for the selection of the cells. The lncRNA full length was cloned into the lentiviral vector (p-GFP-

Lenti TR30023 8.7kb; Origene with CMV promoter-GFP reporter and U6promoter-lncRNA- puromycin selection antibiotic). HEK293T cells were used to generate lentiviral particles following standard protocol (www.addgene.org/viral-vectors/lentivirus) (Figure 2.1). HEK293T supernatant was harvested at 24, 48 and 72 hrs and stored at 4oC. First, second, and third batches were tested with Lenti-X GoStix—Instant Lentiviral Titration (Clonetech) to determine the best supernatant batch. OriGene’s Lenti-vpak packaging system was used to optimize the packing of the third generation lentivectors into virus particles. Cells of interest (T-47D and MDA-MB-231) were then co-transfected in 6-well plates with the lncRNA-GFP lentiviral vector and the packaging plasmid using a lipid based transfection reagent (MegaTran, Origene). The transduced/infected cells (T-

47D and MDA-MB-231) were left in culture with supernatant for 24 hrs and then fresh media was added. The cells were then left in culture for another 24hrs, 48hrs, 72 hrs. The cells were at each time point harvested to detect the overexpression efficiency of the specific lncRNA by quantitative real-time PCR to determine best working conditions (supernatant concentration, infection time, and harvesting time). Infection efficiencies ranged between 20-50% depending on the target cell line.

38

Expression changes were considered significant if they showed a two-fold change in expression compared to GFP controls (cells transfected with the lentiviral vector containing GFP only). Control cell lines or “infection control” (baseline cell line exposed to just the packaging plasmids and transfection reagent but no lentiviral vector) were used to determine the threshold when using flow cytometry for selection. Results shown are a result of infected cells left in culture for 2 weeks, maintained in media with puromycin, to obtain stable cell lines. Before the experiments, lncRNA overexpression levels were tested by RT-qPCR to confirm two-fold increase in expression of lncRNA compared to GFP control in prepartively GFP+ sorted cells, left in culture for at least two weeks in puromycin selective media.

Figure 2.1: Viral production in HEK293T cells followed by infection of T-47D and MDA-MB-

231. Sorting of GFP+ T-47D and GFP+ MDA-MB-231 cells should correspond with cells that overexpress (OE) the lncRNA. This is confirmed by RT-qPCR after sorting and media selection of

GFP+/puromycin/lncRNA cells. Adapted from www.addgene.com/viral-vectors/lentivirus

2.5. Flow Cytometry

Flow cytometry was used to select for cells which expressed a substantial amount of fluorescence.

Control cells or “infected control” (see lentiviral infections) were used to determine a threshold each time cells were run through flow cytometry. UnFACSed cells showed a significant reduction of lncRNA overexpression compared to FACSed cells. Briefly, cells were harvested and counted in

39

the Bio-Rad Cell Counter machine. Cells were resuspended in complete media containing antibiotics (penicillin, 100 U/ml; streptomycin, 100 µg/ml) to avoid possible contamination during flow cytometry. FACSed cells were then grown in a humidified 5% CO2 37oC incubator until there were enough cells for experiment. Before phenotypic experiments, 10% of these were used to check lncRNA overexpression.

2.6. Fluorescence In Situ Hybridization (FISH) for detection of lncRNAs

FISH is an RNA detection method that enables detection, and localization of RNA at the single cell level. Single molecule RNA FISH is an accurate method that provides insight into lncRNAs’ localization in the cells of interest. In situ hybridization against candidate lncRNAs was performed by using labeled complementary Stellaris RNA probes on paraformaldehyde-fixed cells (172).

Hybridization signals are then detected by fluorescence microscopy (173). For each target

(LncBHLHE and LncEPCAM/BC200), a mix of multiple 20-mer oligonucleotides, each labeled with a single Quasar® 670 fluorophore was designed using Stellaris web designer software

(www.biosearchtech.com/stellaris-probe-designer) and synthesized. Only the lncRNA sequence is needed to synthesize the FISH probes. LncBHLHE probe was composed of 37 probes (20 nts in length) spanning over the lncBHLHE complete RNA sequence length. LncEPCAM probe was composed of 48 probes (20 nts in length) spanning over the lncEPCAM complete RNA sequence length. For MALAT-1 probe (positive control), the Stellaris FISH probe human MALAT-1 with

Quasar 670 Dye was ordered. Adherent cells were grown on cover glass and subsequently fixed and permeabilized. Hybridizations were carried out for 16 hours at 37°C in 50 μl hybridization solution (10% dextran sulfate, 10% formamide in 2X SSC). Samples were then washed, DAPI stained, and imaged.

2.7. TUNEL Assay

To accurately evaluate the cell death induced by the lncRNAs overexpression, we analyzed the overexpressing cells using Terminal Deoxyribonucleotide Transferase-Mediated dUTP modified

40

nick-end labeling (Click-iT® Plus TUNEL assay for In Situ Apoptosis Detection, Alexa Fluor® 594 dye). The Click-iT® Plus TUNEL assay detects apoptotic cells in formalin fixed cultured cells through the use of a small, highly specific labeling moiety and a bright fluorescent dye. The 3’-OH of the DNA fragments in apoptotic cells are labeled and stained by terminal dexynucleotidyl transferase (TdT)-mediated dUTP nick end labeling method using an apoptosis in situ detection kit

(Life Technologies), according to the manufacturer’s instructions. A negative and a positive control

(using DNAase to produce DNA fragmentation, Promega, Wisconsin) were simultaneously prepared along with our generated cell lines. After incorporation of the labeling moiety into DNA fragments, detection is achieved through a catalyzed “click” reaction using conditions mild enough to preserve the fluorescent GFP signal. Fluorescence microscopy was used to capture the image of the TRITC-labeled TUNEL- positive cells. Imaging specifics: The Microscope - Olympus BX53 fluorescent microscope (Olympus); The Camera - RetigaTM 2000R Fast 1934 Digital CCD

Camera-Monochrome (QIMAGING Corporation, Burnaby, BC, Canada); the software - MetaMorph

Software version 7.7.8.0 (Molecular Devices, Sunnyvale CA).

2.8. MTT Assay to measure effect of overexpression of lncRNAs

Cell proliferation was assessed by measuring tetrazolium MTT (3-(4,5-dimethylthiazolyl-2)-2,5- diphenyltetrazolium bromide) absorbance using Vybrant MTT Cell Proliferation Kit (Molecular

Probes, Eugene, OR) (174). For this purpose the cells were seeded in 100 µL culture medium into costar 96-well flat bottom tissue culture plates at an optimal density per cell line (2000–4000 cells/well) to have a 50–80% confluent culture by the time of measurement (175). Cells at 24 hours

(hrs) would be 50% confluent and by day 3 would be 80% confluent. MTT was measured in 3 consecutive days starting the day after seeding to measure effect of overexpression of lncRNA in the cells. The day of MTT measurement, media was removed and replaced with 100 µL phenol- red-free media containing 12 mM MTT. The plate was incubated in a 37°C, 5% CO2 incubator for

4 hrs, followed by 15 hrs of incubation after the addition of HCl-SDS to solubilize the formazan product (Figure 2.2). Optical density was read at 570 nm using Epoch Microplate

Spectrophotometer (Biotek, Winnoski, VT).

41

Figure 2.2: MTT assay scheme. Cell are plated in a 96 well plate at optimal cell density depending on cell line. Absorbance is measured at 570nm at 24 hrs, 48hrs, and 72 hrs.

2.9. Proliferation, Migration and Invasion by Real Time Cell Analysis (RTCA)

Cell Assays were performed using a Real Time Cell Analysis machine at the CCF at FCCC. The xCELLigence® RTCA DP instrument uses noninvasive electrical impedance monitoring to quantify cell proliferation, and attachment quality in a label-free, real-time manner (Figure 2.3). Cells overexpressing the specific lncRNA in a specific cell line are plated in RTCA electronically integrated 16-well plates after being counted in the Bio-Rad Cell Counter machine. Different cell counts were initially tested to determine the optimum amount of cells to be analyzed by proliferation, migration and invasion. Noninvasive gold microelectrodes in 16-well plates maximize the physiological relevance of data extracted from the in vitro cellular assays by enabling label-free, real-time monitoring of cell proliferation, cell attachment quality, and cell invasion/migration. For the proliferation assay (Figure 2.3A): 5000 cells were plated and left for 3-5 days to evaluate proliferation capacity for the different cell lines tested (T-47D and MDA-MB-231). By continuously providing data in short time regimes (every 15 minutes for migration/invasion and every hour for proliferation), RTCA provided a phenotypic view of cellular behavior of our overexpressed lncRNA cell lines.

42

For the migration and invasion assay (Figure 2.3B): 20,000 cells were plated and left for 24 hrs to evaluate migration and invasion capacity. For invasion assay, the 16-well integrated Boyden chamber (CIM plate) was coated on the upper chamber with matrigel 1:40 (matrigel:serum free media). The lower chamber contains culture media with 10% fetal bovine serum (FBS). The two chambers were assembled and serum starved cells were added to the upper chamber. The migrating/invading cells are left to move to the underside of the upper chamber for 24 hrs in a humidified 5%CO2 37oC incubator. The gold microelectrodes collect data at specified intervals and real time curves are created by xCELLigence software (aceabio.com/products/xcelligence-rtca).

At end point, cells were stained for imaging. The non-migratory cells on the upper membrane surface can be removed with a cotton swab, and the migratory cells attached to the bottom surface of the membrane stained. The number of migrating/invasive cells were quantitated by imaging using Olympus upright microscope with a 40× objective.

43

Figure 2.3: Real Time Cell Analysis system (RTCA). (A) Electrode based system can be used to measure proliferation of cells in real time (Eplate). (B) Electrode based system (CIM plate) can be used to measure Migration when wells are not coated with matrigel and Invasion when wells are coated with 1:40 matrigel:media. Adapted from X-Celligence system (Roche).

2.10. Tumor tissue immune-histochemical (IHC) staining

2.10.1. Tumor tissue microarray (TMA)

TMA was constructed using paraffin‐embedded xenografts. Human breast cancer tissues known to be ER, PgR, or HER2 positive were added to the same TMA block for validation of the staining.

The use of human breast cancer samples received approval from the Institutional Review Board

(IRB) of FCCC. Staining was performed following the standard protocol using i6000 Autostainer

(BioGenex, Fremont, CA). Primary antibodies tested include Ki67. Super SensitiveTM Polymer‐HRP

Detection System (BioGenex, Fremont, CA) was used to detect the immunostaining.

44

2.10.2. Casein-β Immunostaining

Ethanol fixed, paraffin embedded breast biopsy samples from postmenopausal women were used for immunohistochemical analysis for Casein-β protein. Paraffin sections at 4 µm were deparaffinized and placed in the Antigen Retrieval Citra Plus solution (BioGenex, Fremont, CA.

#HK081-20K) and microwaved for 10 min at 100°C. After cooling for 20 min, slides were quenched with Peroxide Block (BioGenex, HK111) for 10 min, followed by blocking with Power Block

(BioGenex, #HK085) for 30 min at room temperature. The sections were then incubated overnight at 4°C with Casein-β rabbit polyclonal antibody (Santa Cruz, Dallas, Texas #SC-30041) at 1:800 dilution. The slides were washed 4×10 min with Super Sensitive Wash Buffer (BioGenex, #HK583), and then proceeded with BioGenex i6000 Autostainer for the second antibody incubation and development. Images were acquired using OLYMPUS BH-2 RFCA microscope with DP72 camera.

The staining was evaluated based on the staining intensity and scored with + for the positive staining, and -/+ for weak positive staining.

2.11. Xenograft study

The tumorigenic ability of the cell lines modified by the overexpression (OE) of the selected lncRNAs was tested in 6-8 week old female CB17/SCID mice (Fox Chase Cancer Center’s

Laboratory Animal Facility). Cells modified with OE lncRNA vector were injected subcutaneously in the mammary fat pad of the abdominal region of the mice and tumors were measured three times a week and excised when they reached a maximal diameter of 10 mm (176). We selected two lncRNAs/cell line (lncBHLHE and LncEPCAM in MDA-MB-231 and T-47D) to perform the xenograft studies. At least five (5) mice per lncRNA were evaluated in each separate xenograft experiment.

We repeated the T-47D xenograft experiment for groups GFP and lncEPCAM to increase the statistical power. For total amount of animals per group see Table 2.2.

Specifically, we subcutaneously inoculated 2x106 lncRNA OE MDA-MB-231 cells and 5x106 lncRNA OE T47D in 100ul of matrigel (1:1 matrigel:PBS) in the mammary fat pad of CB17/SCID mice (177). In total we had 8 groups as described in Table 2.2 [2 lncRNAs + 2 non-targeting control

45

(GFP cells and infection control cells) x 2 cell lines]. We utilized 5 female CB17/SCID mice per group in our T-47D experiment – original and repeat; 40 animals total– and 6/7 mice per group in the MDA-MB-231 xenograft experiment (25 CB17/SCID mice total – see Table 2.2). Tumor response was evaluated by determining the number of mice developing tumors and the number of tumors per mice. Tumor volume was calculated as follows: 0.5 × L × W2, where L (length) and W

(width) are the large and smaller diameters. Tumors developed in inoculated CB17/SCID mice were processed for Hematoxilin & Eosin (H&E) and immunocytochemical studies – details below in tissue removal and harvest. All organs (lungs, brain, liver, kidneys, spleen, bladder, uterus & ovaries) were processed for H&E to evaluate tissue abnormalities or metastasis due to tumor formation in the mammary fat pad. All animal studies were carried out using protocols approved by the IACUC of FCCC. The number of animals per group is the minimum necessary to give statistical power to the results and in this way allow a detectable significant difference between the groups.

46

Table 2.2 Cell line, treatment and number of animals per group in xenograft study

Group Cell line Treatment Number of animals 1 T47D Control/Mother cell line 5

2 T47D GFP+ 10

3 T47D lncBHLHE 5

4 T47D lncEPCAM 10

5 MDA-MB-231 Control/Mother cell line 6

6 MDA-MB-231 GFP+ 7

7 MDA-MB-231 lncBHLHE 6

8 MDA-MB-231 lncEPCAM 6

Total number of animals 55

The xenograft study in detail

Female CB17/SCID mice of 6-8 weeks old were used for the xenograft studies. Fifty-five

CB17/SCID mice obtained from the FCCC Animal Care Facilities were housed five to a cage and maintained with 12 h light/12 h darkness cycle. They received water and food at libitum. On the day of cell injection, the mice were anesthetized with 1-3 % isoflurane. MDA-MB-231 cell line was orthotopically injected in the mammary fat pad of the abdominal region of the mice at a concentration of 2.0x106 cells suspended in 100µl of matrigel (1:1 matrigel:PBS). T-47D cell line was also orthotopically injected in the mammary fat pad of the abdominal region of the mice at a concentration of 5.0x106 cells suspended in 100µl of matrigel (1:1 matrigel:PBS). Table 2.2 indicates how many animals were used for each group in total.

Before injection, cells were prepared in a sterile hood and transported to LAF in aseptic conditions

(sterile syringe and needles). Once injected, the mice were monitored 3 times a week from Day 2 until the end of the experiment to make sure no signs of discomfort or distress appeared in the duration of the experiment. As from Day 8 until the end of experiment, tumors were measured 3 times a week.

47

T-47D is an estrogen receptor positive cell line. The growth of these cells depends on higher levels of estrogen than what CB17/SCID mice produce. Thus, for T-47D xenograft models, implantation of a subcutaneous 17-β-estradiol-releasing pellet was required for the formation of tumors (177,

178). The pellets were prepared in the lab under sterile conditions for a final concentration of 0.75 mg of estrogen/pellet (Figure 2.4). The pellet is implanted 2 days before the injection of the cells and is left for the duration of the experiment. These pellets are 2*2 mm in size and weight approximately 25mg. Inhalation of 1-3% isofluorane in oxygen was used as anesthesia during survival surgery (Figure 2.4A). After disinfecting, a ~0.5 cm incision was made using surgical scissors in the loose skin at the back of the neck and the skin was bluntly dissected caudolaterally.

The pellet was installed using tweezers (forceps), and the incision was closed with a wound clip

(Figure 2.4B). The wound clip was removed at 7-10 days when the healing of skin was evident.

The pellet is maintained throughout the experiment (Figure 2.4C).

Figure 2.4: Survival surgery for estrogen implantation. (A) Mice are put asleep while a small incision is produced in the back of the neck/upper back. (B) The pellet is placed subcutaneously

48-72 hrs before cell injection and skin clipped to allow incision to heal. (C) Shape, size and color of estrogen pellets made with beewax.

48

For both cell lines, tumor growth was monitored 3 times a week by visual inspection from Day 2 to

Day 8 (Figure 2.5 – Timeline of experiment). After Day 8, the length, width and height of tumors was measured with a caliper 3 times a week for the duration of the experiment (approximately 8 weeks).

Figure 2.5: Timeline of xenograft experiment. End point: tumor removal after euthanasia

Tumor removal and tissue harvest for necropsy

At the end of the experiment, animals were euthanized by intraperitoneal injection of 1:10

Xylazine/Ketamine solution, administrated at 90mg of Ketamine/Kg of body weight and detailed necropsy was performed.

The chest and abdomen fur was wet with 70% ethanol. The palpable tumors were examined for morphology evaluation. Tumors were weighted and fixed with 10% formalin for 48 hrs before tissue processing.

A few animals had to be euthanized earlier as they started presenting signs of morbidity, and distress (rough hair coat, hunched posture and weight loss of more than 10% of their body weight).

Some animals presented with urine retention, evident at the time of necropsy when a bigger full bladder was observed.

49

2.12. Statistical analyses

Data were analyzed using the unpaired Student’s t-test. Values represent the mean ± SD from one representative experiment of three independent experiments. Tests were performed separately for each cell line. The pvalue of 0.05 or less - indicated in each experiment - was considered statistically significant. Unless noted otherwise, *pvalue<0.05; **pvalue<0.01; ***pvalue<0.001. All in vitro experiments were performed at least three times.

For the xenograft studies, using two sample two-sided t-test with a 5% Type I error, with 6 animals in each arm of MDA-MB-231 xenograft studies, we were able to detect differences in tumor size with at least 80% power. With 5 animals in each arm of T47D xenograft studies, we were able to detect differences in tumor size with at least 90% power.

For the analysis using TCGA database, we compiled a list of genes we were interested in (following results obtained when analyzing differentially expressed genes between parous vs. nulliparous women) and organized the data by patient – RNAseq raw counts of tumor vs. normal samples.

TCGA data selection included all patients who had matched normal and tumor tissues (112 total patients) as of February 2016. The differential expression analysis between normal vs. tumor tissues samples was performed using DEseq (164); one of the Bioconductor tools (179) for the R statistical programming language and environment (180). Firstly, genes with zero sequence read from all samples were removed. Then, the raw counts of all 112 samples were normalized to the estimated library size using an inherent method (estimated Size Factors) in DEseq. Finally, the negative binomial distribution of this package was employed to perform differential expression analysis using the normalized read counts. The spearman’s correlation coefficient test was used for correlation detection.

50

CHAPTER 3: RNASEQUENCING OF BREAST TISSUE

FROM PAROUS VS. NULLIPAROUS WOMEN

3.1. How the transcriptional machinery of the breast is affected by pregnancy

As described in section 2.1, normal breast tissue of nulliparous and parous postmenopausal women were used for transcriptomic analysis. The RNA isolated was used for RNA sequencing using Illumina HiSeq 2000 sequencer, an approach that allowed the direct sequencing of the RNA and the alignment of the sequence reads to the genome. The focus of this analysis was the noncoding regions of the genome. Thus, quantity and type of lncRNAs at two different levels of breast differentiation, nulliparous (NP, undifferentiated) and parous (P, differentiated) was categorized from the RNAsequencing results and data analysis. After breast tissue biopsy and

RNA extraction, our main aim for this part of the project was to determine how the transcriptional machinery of the breast is affected by pregnancy.

In chapter 2, I have described the technical details of the RNA sequencing and in this section I will emphasize some important technical aspects of this study.

3.1.1. RNA Library Preparation

After RNA extraction from breast tissue biopsies from parous (P) and nulliparous (NP) women without history of breast cancer, poly A tail RNA was obtained using poly (T) oligos covalently attached to magnetic beads (TruSeq Illumina technology). This RNA was then reverse transcribed to obtain cDNA. Then cDNA template was fragmented into 200-300 nucleotides prior to sequencing to avoid secondary structure formation that could interfere with the reading during RNAseq.

51

3.1.2. Next Generation Sequencing (NGS)/RNA Seq

High-throughput sequencing technology generates millions of short reads (50 base-pair long; paired end) from the prepared RNA library. The characterization of gene expression in the P vs.

NP tissue via measurement of RNA levels was performed to determine how the transcriptional machinery of the breast is affected by full term pregnancy that occurred in the past, when the woman was in her middle twenties to early thirties, and as a consequence how the breast tissue has been imprinted by this unique physiological process and maintained until the postmenopausal age as a specific transcriptome signature.

3.2. Background

The morphological, physiological and genomic changes resulting from pregnancy and hormonally- induced differentiation of the breast and their influence on breast cancer risk have been addressed in previous publications (32, 181-183). During postmenopausal years the breast of cancer-free parous and nulliparous women contains preponderantly Lobule 1 and the fact that nulliparous women are at higher risk of developing breast cancer than parous women indicates that Lobule 1 in these two groups differ biologically and exhibit different susceptibility to carcinogenesis (184).

The biological differences are probably at the transcriptomic level and we hypothesized that molecular changes occurring during pregnancy would be long-lasting, even decades after pregnancy.

After the Human Genome Project was declared complete in 2003, a follow-up was launched later that year called the Encyclopedia of DNA Elements (ENCODE) Project. This public research project was designed to identify every functional element in the human genome now that it had successfully been mapped (185). Through this project, over 80% of the human genome was able to be identified in one or more biochemical functions (185), despite only 1.5% of all DNA translating to protein- coding genes. This gave rise to several new types of RNA which had previously been ignored, including ribosomal RNA (rRNA), micro RNA (miRNA), and long noncoding RNA (lncRNA) (186).

52

Long noncoding RNA has been classified as RNA which is longer than 200 nucleotides (a length which was set arbitrarily to distinguish them from small regulatory RNA) which does not appear to code for a protein (i.e. no significantly large open reading frame) (187). Despite making up the majority of the unknown functional elements of the human genome, lncRNAs are one of the least understood in how they function.

With these noncoding bits of RNA (formerly thought to be byproducts of sloppy transcriptional machinery) proven to have actual biological and regulatory functions, the search for determining their roles in development and disease began a few years ago. By using emerging techniques to study these regions through whole-genome sequencing and altered gene expression (silencing or overexpression), lncRNAs were discovered as active modifiers of the three-dimensional structure of the genome and dozens of examples of the mechanism of action of specific lncRNAs were pinpointed (40, 188). XIST, also known as the X-inactive specific transcript, is an example of lncRNA which participates in gene silencing through recruitment of epigenetic modifying complex

(188). XIST, which was found to coat the inactive X chromosome, ensures gene silencing through the recruitment of PRC2 (Polycomb Repressive Complex 2), similar to HOTAIR. Importantly, the discovery of XIST and HOTAIR showed that lncRNAs are not only transcribed, but also processed through splicing and polyadenylation, and can also regulate both in cis and in trans (188) [See

Chapter 1].

Thus, we sought to evaluate the transcriptomic differences between our samples (parous vs. nulliparous) by studying their differential expression through RNA sequencing results. Due to economic and logistic reasons, the samples were run in 2 different batches. We first ran 4 parous

(P) and 2 nulliparous (NP) samples and in a later batch, we ran 4 more P samples and 6 NP samples for a total of eight (8) P and (8) NP samples. These two different batches showed no statistically significant difference between them. Thus, the results from the two batches were combined.

53

3.3. Results

After RNAseq of our 16 samples (8P and 8NP) we determined that 42 lncRNAs were differentially expressed between the 2 groups (fold change>2; p.value< 0.01). With a total of 42 long-noncoding regions of interest shown in Figure 3.1, a literature search to identify lncRNAs involved in mammary gland development and/or disease was conducted to narrow the list of candidate lncRNAs to be further studied. No literature was found detailing any of our specific lncRNAs (except for lncEPCAM which was vaguely described as BC200), so we determined that these were all novel lncRNAs in our system. Some of these regions may have aliases depending on the lncRNA database used.

We used LNCipedia, which is a comprehensive compendium of human lncRNAs (189). LNCipedia has become an integrated database of human annotated lncRNA transcripts obtained from different sources. Thus, all the aliases used by different databases that describe the same region in the genome, are combined by LNCipedia, making the search user-friendly. In addition to basic transcript information and putative RNA secondary structure, LNCipedia offers information on calculated protein coding potential, putative microRNA binding sites and up-to-date literature on a particular lncRNA of interest (190).

54

Parous Nulliparous

Upregulated Downregulated

Figure 3.1: Heatmap representing differentially expressed lncRNAs between parous and nulliparous women. Upregulated lncRNAs are represented in red and downregulated lncRNAs are represented in green. Light and dark blue colors under parous represent 2 different batches of samples. Both batches were determined to be comparable and thus samples were grouped under

“parous”. Dark and bright red represent grouped samples under nulliparous.

55

We next analyzed the coverage of earlier sequencing with the software tool called Integrated

Genomics Viewer (IGV) (166, 167). This was performed in order to ensure that custom probes and primers to be used in RT-qPCR would be specific to the region of interest. Moreover, IGV helps determine how well that specific region was covered during RNAseq.

The chromosomal location for each lncRNA was obtained from LNCipedia (189, 190) and the coverage was viewed at an 800bp resolution (Figure 3.2). Ideal coverage was defined as regions which showed high levels of readings (Figure 3.2). For each lncRNA the best coverage regions were recorded and scored from one to three, with a “one” meeting the criteria stated above (good coverage throughout the region - Figure 3.2A), and a “three” showing shorter regions (<100 bp) with sporadic coverage (Fig. 3.2B) and/or coverage not being consistent through sample of the same group (P vs NP). The coverage in IGV for a few examples of all 42 lncRNAs is shown in

Figure 3.2.

56

Figure 3.2: IGV images showing the coverage for 4 different lncRNA differentially expressed between P and NP women. RNAseq coverage was evaluated for all 42 differentially expressed lncRNAs. Figure 3.2 shows representative results. A – Two lncRNAs which show good coverage throughout gene’s length; scored as 1. Both genes are upregulated in nulliparous women. B – Poor coverage throughout gene’s length; scored as 3. Lnc-WSB1 is reported as upregulated in nulliparous women following RNAseq data analysis. On the other hand, lnc-CDH5 is reported upregulated in parous women.

57

The lncRNA’s sequence reported by LNCipedia was first verified through IGV for coverage during

RNAseq and then ran through NCBI’s Basic Local Alignment Search Tool (BLAST) in order to check for sequence specificity. Each sequence was run through Nucleotide BLAST and searched versus the human genomic for highly similar sequences. Sequences which returned multiple matches in any assembly searched by the query were discarded as potential candidates for further study due to the possibility of generating reagents that may have off-target effects.

After the list was narrowed down, Applied Biosystems (ABI) was selected to create custom primers and probes for each lncRNA. All remaining candidates were then run through ABI’s software available through their Custom TaqMan® Assay Design Tool. ABI detected that thirteen sequences were adequate for primer/probe specific design (Table 3.1). Some candidates which failed testing, failed due to sequence specificity not being high enough, possibly allowing the probe to bind to off- target areas in the genome. A few others failed due to thermodynamic criteria not being met, which was detected through their testing of melting temperatures, and the potential of the formation of secondary structures due to their nucleotide sequence. The latter highlights the stringent criteria of Custom TaqMan® Design Tool. After this analysis, the remaining thirteen lncRNAs were then ranked based primarily on their coverage using IGV. These rankings are shown in Table 3.1.

58

Table 3.1: LncRNAs ranked by RNAseq coverage and BLAST. All 13 lncRNAs were analyzed by IGV to determine quality of coverage during RNAseq and were then BLASTed to determine potentiality of off-target effects during primer design.

All 13 lncRNAs were tested in vitro for all cell lines selected Rank Name 1 lnc-BHLHE41 to represent different subtypes of breast cancer (Chapter 4). 2 lnc-EPCAM The ranking shown in Table 3.1, obtained after careful

3 lnc-NLGN2 consideration of all 42 lncRNAs differentially expressed 4 lnc-COL15A1 between parous and nulliparous, determined that lnc- 5 lnc-DCAF5 6 lnc-KLF15 BHLHE41 and lnc-EPCAM were top of the list candidates for 7 lnc-WSB1 further study. However, we considered all 13 lncRNAs for a

8 lnc-CDH5 screening of the selected cell lines (Figure 4.2). Figure 3.3 9 lnc-ALDH9A1 shows IGV images of these top two candidates showing the 10 lnc-P2RY1 11 lnc-KIAA1267 coverage throughout the regions of interest. Lnc-BHLHE41 12 lnc-GPBP1 is a 566 nt lncRNA (1304 nt unspliced) transcribed from

13 lnc-THSD4 and is located upstream of BHLHE41 (also known as SHARP1 and BHLHB3). Interestingly, BHLHE41 gene has been linked to angiogenesis

(191) and metastasis (192, 193) of endometrial cancer, lung cancer (194), pancreatic cancer (195,

196) and breast cancer (177, 197, 198), among other adult and pediatric cancers. Lnc-BHLHE41 is highly upregulated in parous women and hence, if proven, could potentially be protecting the breast of parous women from developing breast cancer. Lnc-EPCAM is a 13 kb lncRNA with only

1 that spans throughout a good portion of . This interesting very long non coding RNA is transcribed sense to EPCAM gene, which is a well-described cell adhesion molecule and its implications in cancer are well documented. This gene encodes a carcinoma-associated antigen which is expressed on most normal epithelial cells and human carcinomas (199) and functions as a calcium-independent cell adhesion molecule (200). The antigen is being used as a target for immunotherapy treatment of human carcinomas (201-203). IGV images of the two top lncRNA candidates are shown in Figure 3.3. Both lncRNAs had good coverage during

RNAsequencing for all samples and thus were considered as good candidates to be tested in vitro.

59

Figure 3.3: IGV images of lnc-BHLHE41 (top) and lnc-EPCAM (bottom). The location and regions in which these lncRNAs sit in the genome was searched to evaluate the quality of the

RNAseq results and thus determine strong potential candidates for in vitro studies in a breast cancer setting.

From the IGV images and using LNCipedia, we could determine that although lnc-EPCAM is 13kb long, the region that was highly amplified in all samples was from chr2:47562453 to chr2:47562653.

This region corresponds to a vaguely described lncRNA called BC200 (also known as BCYRN1).

This 200 nucleotide long non coding RNA was first described in 1987 when Watson and Sutcliffe observed it was expressed in the monkey brain (“and presumably also occurs in humans”) and named it Brain Cytoplasmic transcript (204) - BC200. BC200 is conserved in mice but not in zebrafish (189, 190). A decade later, Chen et al., identified the expression of this lncRNA in human tumors (205).

60

Characteristics of location in the genome, length, number of exons, transcriptional direction, and type of lncRNA (intergenic, intronic, overlapping, etc.) is described in Table 3.2. Our results from

RNAseq for these two lncRNAs regarding fold change between parous and nulliparous, and the p.value for those results are also listed in Table 3.2.

Table 3.2: Lnc-BHLHE41 and lnc-EPCAM characteristics. Genomic information and RNAseq data. Fold Change (FC) is relative to Parous women. A FC>1 represents lncRNA upregulated in parous women. A FC<1 represents a lncRNA downregulated in parous women (i.e. upregulated in nulliparous women).

lncRNA Chromosome Location Length # Exons Strand Type lnc-BHLHE41 chr12.p12.1 chr12:26278246-26279550 1304 bp 3 - intergenic 566 w/o introns lnc-EPCAM chr2.p21 chr2:47558199-47571656 13,458 bp 1 + intergenic

lncRNA Sense or antisense? Distance from gene? Fold Change Regulation P. value lnc-BHLHE41 sense to BHLHE41 243 bp upstream BHLHE41 3.68 upregulated in Parous 0.0002

lnc-EPCAM sense to EPCAM 24,631 bp upstream EPCAM 0.48 upregulated in Nulliparous 0.0041

61

CHAPTER 4: LNCRNAS IN BREAST CELLS,

BREAST TISSUE AND XENOGRAFT MICE MODEL

4.1. Evaluating the relevance of top lncRNA candidates from healthy breast tissue in a breast cancer setting

Breast cancer tumors are divided into 3 different immune-cytochemical groups (luminal, basal/triple negative and HER2+ subtype) based on their receptor expression. Several breast cancer cell lines were selected to represent these 3 molecular subtypes. After screening the top lncRNA candidates in 7 cell lines that represented luminal, basal/triple negative (TNBC), and HER2+ breast cancers, two of these lncRNAs were selected for in depth characterization. One of these lncRNAs is completely novel (lncBHLHE41) and the other one is a previously reported but vaguely studied lncRNA (LncEPCAM/BC200) that has been shown to be upregulated in breast cancer tissue. The experiments described in this chapter helped unravel the role of these two lncRNAs in cancer phenotype. After overexpressing these lncRNAs in breast cancer cells, we studied how their overexpression affected the cancer phenotype, including changes in proliferation rate, apoptosis, migration, and invasion. RNA fluorescence in situ hybridization allowed us to determine the localization of the lncRNAs in the cell’s compartments. Altogether, the findings described in this chapter provided novel information on lncRNAs induced by pregnancy in breast cells, and identified lncBHLHE and lncEPCAM/BC200 as potential key regulators in breast differentiation and breast cancer cell transformation.

4.2. In vitro cancer/normal model to study role of lncRNAs

Breast normal-like and breast cancer cell lines were evaluated for thirteen lncRNAs listed in Table

3.1 to determine their level of expression. Custom primers and probes for the selected lncRNAs were used to determine level of expression in three normal-like cell lines, two luminal, two basal/TNBC and two HER2+ breast cancer cell lines. The six breast cancer cell lines (2 luminal, 2

62

basal and 2 HER2+ cells) cover a wider scope of breast cancers. The normal-like cell lines that were used are MCF-10A and MCF-10F (169), immortal non-tumorigenic ERα negative human breast epithelial cell lines and MCF-12A (206), an immortal non-tumorigenic epithelial cell line established from tissue taken at reduction mammoplasty from a nulliparous patient with fibrocystic breast disease. The two luminal cell lines selected are MCF-7 and T-47D; both tumorigenic cell lines which are luminal A, ER+, PR+/–, HER2–, Ki67 low, endocrine responsive and often responsive. The two breast cancer basal cell lines that were tested are MDA-MB-

231 and Hs-578T. Both basal cell lines are ER–, PR–, HER2– Ki67, E-cadherin and claudin-3 low and they have an intermediate response to chemotherapy (171, 207). We also selected SKBR-3 and HCC-1954, representative of HER2+ cell lines (171). Using different cell lines, we can test the pathways in which lncRNAs are thought to regulate the transcription of other genes in a variety of breast cancers.

4.2.1. LncRNA levels in the in vitro model

In order to select the lncRNAs for functional studies, we evaluated the expression levels of the thirteen selected lncRNAs chosen after thorough consideration (Chapter 3). The detection of the levels of lncRNAs in the cell lines described in Materials and Methods section 2.1 and above, was performed by reverse transcriptase real time PCR. Total RNA was extracted from the normal-like cell lines (MCF-10A, MCF-10F and MCF-12A) and the breast cancer cell lines (MCF-7, T-47D,

MDA-MB-231, Hs-578T, SKBR-3 and HCC-1954) using QIAGEN RNeasy Plus Micro Kit. Using standard protocol, the reverse transcription and PCR reactions were performed with Applied

Biosystems’ (ABI) High Capacity cDNA transcription kits and ABI’s TaqMan Gene Expression kit, respectively. Primer/probe targeting 18S was used as the housekeeping gene for control, as 18S was previously tested to be consistent in its expression across the different cell lines evaluated.

Three of the 13 lncRNAs were not expressed in the cell lines tested (Ct values lower than the corresponding negative control).

63

All cell lines were run in three independent experiments and the passage number tested are summarized in Figure 4.1A which shows the expression levels by Ct of each lncRNAs by cell line and Figure 4.B shows the expression levels by lncRNA of interest. In a real time PCR assay a positive reaction is detected by accumulation of a fluorescent signal. The Ct (cycle threshold) is defined as the number of cycles required for the fluorescent signal to cross the threshold (i.e. exceeds background level). Thus, a higher Ct value means this gene or lncRNA is less expressed by this cell line.

A

64

65

B

66

Figure 4.1: LncRNAs expression averaged from 3 independent experiments in the cell lines

(and the passages) shown. (A) Ct values of cell lines evaluated by lncRNA. Blue bars represent the Ct values of breast normal-like/immortalized cells; red bars represent the Ct values of breast cancer cells. (B) Ct values of the lncRNAs evaluated by cell line.

Using the average expression levels of each lncRNA in the different cell lines, we generated a heatmap representing the results of the averaged expression levels of these ten lncRNAs/cell line

(light purple – highly expressed lncRNA; black - low expression of lncRNA). Figure 4.2 represents the latter results.

Figure 4.2: Heatmap representing averaged lncRNA expression per cell line (3 independent experiments; n=3)

67

From the ten lncRNAs, two lncRNAs (lncBHLHE41and lncEPCAM) were selected for the following reasons:

1- Their high differential expression between breast cancer cell lines and normal

immortalized cells. For example, lncEPCAM is not expressed in normal immortalized

cells.

2- The relevance of the nearby gene (BHLHE41 and EPCAM) in differentiation and cancer

initiation/progression showing potential genomic “hotspots”.

3- The quality of sequencing data in those regions.

4- The RNAseq results: These are the top ranked lncRNAs in the RNAseq results and further

evaluation when comparing expression of lncRNAs between parous and nulliparous tissue

(Chapter 3).

Figure 4.3: Expression levels (-ΔΔCts) of lncBHLHE and lncEPCAM in the two cell lines selected for in-depth study. Determination of lncRNAs levels in T-47D and MDAMB-231 was performed in three consecutive passages and was averaged. MCF10F (P.90, 94 and 97) was used as normalization control. **pvalue <0.01

68

4.2.2. Overexpression (OE) of the selected lncRNAs

To test the functional role of these lncRNAs, overexpression experiments were carried out for the two lncRNAs selected. The study of these lncRNAs presents a challenge as not only these are novel molecules (never described in the literature before except for a couple articles on lncEPCAM and its level of expression in cancer tissues), but also the accessibility to reagents is scarce and all the reagents used in the present study are custom made. We generated plasmids where the sequence of the lncRNA was cloned into a GFP/puromycin vector and transfected through

Lipofectamine or electroporation into several cell lines. The transfection efficiency for these methods was low (between 0.5-5%). Thus, we decided to generate lentiviral vectors that would produce overexpression of the two lncRNAs by infection of the same cell lines (see details in

Chapter 2 section 2.4). The efficiency increased notably and we were able to generate stable cell lines after puromycin treatment. Briefly, HEK293T cells were transfected with transfer vector containing cDNA of lncBHLHE41 or lncEPCAM (plus a transfection reaction containing only GFP+ or no transfer vector – Inf Ctrol). Figure 4.4A shows HEK293T cells under fluorescent microscope after 48 hrs of transfection. The supernatant of the HEK293T cells in culture containing lentiviral particles was then used to infect target cells (Chapter 2; Figure 2.1). Figure 4.4B and 4.4C show the efficiency of infection in T47D and MDAMB-231 respectively. If cells were in culture for two weeks, cells were also re-sorted and RNA was extracted to determine levels of the lncRNA were still overexpressed in the cells used for the experiments.

69

A

70

B

71

C

C C c C

72

Figure 4.4: Transfection efficiency of cells in culture. (A) HEK293T cells are transfected following standard protocol and supernatant is then collected at 24 hrs, 48 hrs and 72 hrs. Target cells (T-47D and MDA-MB-231) are then infected with these supernatants. Forty-eight (48) hr- supernatant had the most amount of viral particles (Lenti-X GoStix) and was thus used to infect the target cells. (B) T47D cells infected with lentiviral particles containing different constructs. GFP+ and Inf Ctrol are used as control for infection. Inf Ctrol represents cells that have gone through the process of infection without transfer vector. (C) MDAMB-231 cells infected with lentiviral particles containing different constructs. GFP+ and Inf Ctrol are used as control for infection. Inf Ctrol represents cells that have gone through the process of infection without transfer vector.

73

4.2.3. Functional role of the selected lncRNAs

To determine if the overexpression of these lncRNAs affected the phenotype of the cancer cell we selected 4 assays: proliferation, apoptosis, migration and invasion. A scrambled negative control and a GFP empty vector were added to each experiment to determine the effects of transfection/infection and introduction of a 8.0 kb plasmid in the cells. The cells were then harvested and the overexpression efficiency was determined via quantitative real-time PCR before using the cells for phenotypic assays. Expression changes were considered significant if they showed a two- fold change in expression compared to the GFP-empty vector.

4.2.3.a. Proliferation

Proliferation was measured by two methods as described in chapter 2 (sections 2.8 and 2.9). Cells were infected with the specified lncRNA (lncBHLHE or lncEPCAM) and were grown until at least 1 million cells. GFP marker was checked after 24-48 hrs to confirm infection in about 40% of cells for lncEPCAM (lncE) and 20-30% of cells for lncBHLHE (lncB). A fraction of these cells were used to test overexpression by collecting cell lysate, extracting RNA and running RT-qPCR. Figure 4.5 and

Figure 4.6 are representative figures of results of at least 3 independent experiments testing T-47D and MDA-MB-231, respectively, where the same cell type went through the same process of infection, RNA extraction, RT-qPCR lncRNA level check and proliferation assay. Figure 4.5A is T-

47D cells with specified construct measured using MTT method at 24 hrs, 48 hrs and 72 hrs post- plating. Figure 4.6A is a representation of MDA-MB-231 cells with specified construct measured using MTT method at 24 hrs, 48 hrs and 72 hrs post-plating. T-47D cells and MDA-MB-231 cells overexpressing lncB show a reduced proliferation rate compared to T-47D cells containing GFP+ construct (Figures 4.5A and 4.6A, respectively). On the other hand, T-47D cells and MDA-MB-231 cells overexpressing lncE show an increased proliferation rate compared to the same T-47D-GFP+ and MDA-MB-231-GFP+ controls (Figures 4.5A and 4.6A, respectively). Figures 4.5B and 4.6B are the same cells but measuring proliferation rate in real time for 60+ hours. Repeat experiments

(infections #1, #2 and #3 gave similar results with exponential growth starting at 20 hrs and all cells

(LncE, GFP+, Inf Ctrol and LncB) converging at cell index 7 – approximately 1.6x105 T-47D cells -

74

after 70 hrs of incubation) (Figure 4.5B). LncE promotes proliferation in both luminal (T-47D) and

TNBC (MDA-MB-231) cells whereas lncB inhibits proliferation in both cell lines as determined by both MTT and RTCA methods.

A

75

Figure 4.5: Proliferation of T47D cells. (A) Measured by MTT- T47D cells were plated and left to attach overnight. At 24 hrs, 48 hrs and 72 hrs post-plating, cells were subjected to MTT and absorbance was measured according to protocol. Results are representative of 4 independent infections (n=4). **Pvalue (p)<0.001; ***pvalue (p)<0.0001. (B) Proliferation rate by RTCA. Twenty thousand (20,000) cells/well were plated and followed for 65 consecutive hours with data collected every hour; 4 replicates per construct. Upper panel is the image obtained in real time. T47D were

76

recorded for 65 hrs to determine proliferation rates of cells overexpressing different constructs (Inf

Ctrol: no construct or scrambled; GFP+: GFP-expressing vector/empty vector; LncB: lncBHLHE41 overexpressing cells; LncE: lncEPCAM overexpressing cells). Lower panel represents results from the upper panel at specified time points. Results are representative of 3 independent infections

(n=3). *Pvalue (p)<0.01

A

77

Figure 4.6: Proliferation of MDA-MB-231 cells. (A) Measured by MTT - MDA-MB-231 cells were plated and left to attach overnight. At 24 hrs, 48 hrs and 72 hrs post-plating cells were subjected to

MTT and absorbance was measured according to protocol. Results are representative of 4 independent infections (n=4). *Pvalue (p)<0.01. (B) Proliferation rate by RTCA. Fifteen thousand

(15,000) cells/well were plated and followed for 48 consecutive hours with data collected every hour; 4 replicates per construct. Upper panel is the image obtained in real time. MDAMB-231 were

78

recorded for 48 hrs to determine proliferation rates of cells overexpressed with different constructs

(GFP+: GFP-expressing vector/empty vector; LncB: lncBHLHE41 over-producing cells; LncE: lncEPCAM over-producing cells). Lower panel represents results from the upper panel at specified time points. NB: Inf Ctrol overlapped with GFP+ and was removed from graph for clarity. Results are representative of 3 independent infections (n=3). Pvalue (p)<0.05.

4.2.3.b. Apoptosis

Apoptosis induced by the overexpression (OE) of the selected lncRNAs was measured using Click- iT® Plus TUNEL assay. These cells which already contain a GFP marker had to be evaluated using a method multiplexing TUNEL with the GFP label. Life Technologies Click-iT Plus TUNEL Assay detects in situ apoptosis using Alexa Fluor with 594 dye. This picolyl azide dye excites at 590nm and emits at 615nm, not interfering with GFP signal which emits at 530nm. Figure 4.7A and 4.7B show images taken for T-47D and MDAMB-231 respectively when the cells containing the specified constructs were subjected to Life Technologies Click-iT® Plus TUNEL Assay for In Situ Apoptosis

Detection. For the positive control, cells incubated overnight and subsequently fixed, were subjected to DNase treatment and analyzed for DNA breaks. Positive controls present a positive reaction in a 100% of the cells for both T-47D and MDA-MB-231. Figure 4.7A shows a weak T-47D positive reaction. No positive cells were observed for either lncB or lncE. This could be due to the nature of the cells. T-47D control cells subjected to DNase and tested with another kit called In Situ

Cell Death detection kit (Roche) also show inconclusive results for T-47D TUNEL positive reaction but not MDA-MB-231 (data not shown). Hence, T-47D apoptosis experiments are hard to evaluate.

However, Figure 4.7B represents an example of apoptotic cells in MDA-MB-231 for lncB.

Quantification of apoptotic cells in 10 frames (approximately 400 cells) yield the results presented in the graph (Figure 4.7B lower panel). Figure 4.7C shows in graphical form the results obtained from counting at least 400 cells per reaction of MDA-MB-231 OE either lncRNA (or GFP) and calculating the % TUNEL positive (TUNEL positive/total amount of cells in 10 frames). MDA-MB-

231 cells OE lncB show a slight increase in apoptosis compared to MDA-MB-231-GFP+ cells

(control). An increase of 2% to 7% is observed in MDA-MB-231 cells OE lncBHLHE (lncB). Inf Ctrol

79

and GFP+ show very few cells that undergo apoptosis. A 3.5x more apoptotic cells are seen due to the presence of the OE of lncB compared to control (GFP+ cells).

T47D OE lncBHLHE cells subjected to Click-iT Plus TUNEL assay A

80

B MDAMB-231 OE lncBHLHE cells subjected to Click-iT Plus TUNEL assay A

81

Quantification of Apoptosis in MDAMB-231 OE lncEPCAM (lncE) and lncBHLHE (lncB) using

Click-iT Plus TUNEL assay

Figure 4.7: Breast cancer OE lncBHLHE cells subjected to Click-iT Plus TUNEL assay. (A)

T47D OE lncBHLHE cells subjected to Click-iT Plus TUNEL assay. Upper panel: positive control

(cells subjected to DNAase); Middle panel: negative control; Lower panel: apoptotic reaction. A weak positive control hinders any reliable evaluation of apoptosis changes due to presence of OE of lncRNA. Results are representative of 2 independent experiments (n=2). (B) MDAMB-231 OE lncBHLHE cells subjected to Click-iT Plus TUNEL assay. Upper panel: positive control (cells subjected to DNAase); Middle panel: negative control; Lower panel: apoptotic reaction. A few more apoptotic cells are seen due to the presence of OE of lncBHLHE compared to GFP+ MDAMB-231 cells. Results are representative of 2 independent experiments (n=2). (C) Quantification of

Apoptosis in MDAMB-231 OE lncEPCAM (lncE) and lncBHLHE (lncB) using Click-iT Plus TUNEL assay. Inf Ctrol and GFP+ show very few cells that undergo apoptosis. A 3.5x more apoptotic cells are seen due to the presence of the OE of lncBHLHE compared to control (GFP+ cells). Results are representative of 2 independent experiments (n=2).

82

4.2.3.c. Migration

The xCELLigence RTCA instrument from Roche Applied Science was used to determine how these lncRNAs affect migration (and invasion – see section 4.2iii d) of MDA-MB-231, cells that are considered highly aggressive. We confirmed that MDA-MB-231 baseline cells (and MDA-MB-231

Inf Ctrol) migrate at a similar rate as MDA-MB-231 containing the GFP marker. MDA-MB-231 cell line is widely reported as highly migratory and invasive (208-210) due to the release of ample levels of MMP-9 (211) and other membrane matrix metalloproteinases (212). Figure 4.8 shows how the migration rate of MDA-MB-231 is affected by the presence of lncBHLHE and lncEPCAM. Twenty thousand (20,000) cells/well were plated at the beginning of the experiment and were let to migrate through the upper chamber to the lower chamber (see section 2.9 of Materials and Methods for details) for 24 hrs. Figure 4.8A shows the migration assay in real time. Middle panel 4.8B is the quantification at end point. Lower panel 4.8C are images of the cells that migrated to the lower chamber. At the end of the experiment, cells that do not migrate are removed and cells that migrate to the lower chamber are stained. More cells migrated in MDA-231-lncE compared to MDA-231-

GFP (MDA-231-lncB shows the least amount of stained cells). Figure 4.8 is representative of 3 independent experiments (n=3) with 4 technical replicates in each run. The high expressing E- cadherin cell line T47D, has very little to no migration capacity in vitro (213, 214). We tested to see if the introduction of lncEPCAM (or lncBHLHE) modified its non-migratory characteristic and after

48 hrs followed by the RTCA system we concluded that the presence of lncECPAM (or lncBHLHE) did not modify the above characteristic. T47D cells infected with lncRNAs showed the same low migratory effects as T47D-GFP+ and the negative control (T47D plated in serum free media in the upper chamber with no serum in the bottom chamber). After 48 hrs serum deprived T47D cells start dying (data not shown).

83

A

B

C

84

Figure 4.8: Effect of lncBHLHE and lncEPCAM on migration. MDA-MB-231 cells containing lncBHLHE and lncEPCAM were subjected to real time cell analysis migration. Upper panel (A) shows the real time results of cells being recorded every 15 minutes for 24 hours. Middle panel (B) shows results at end point (24 hrs after plating). Lower panel (C) shows stained cells on wells at end point (see Materials and Methods 2.9). Results are representative of 3 independent cell infections (n=3) with the corresponding lncRNAs averaging 4 replicates in each independent experiment. LncB = lncBHLHE41; LncE = lncEPCAM; neg ctrol = negative control – no serum added to the lower chamber.

85

4.2.3.d. Invasion

The xCELLigence RTCA instrument was used to determine how these lncRNAs affect invasion of

MDA-MB-231. We confirmed that MDA-MB-231 baseline cells invade at a similar rate as MDA-MB-

231 containing the GFP marker. Figure 4.9 shows how the invasion rate of MDA-MB-231 is affected by the presence of lncBHLHE and lncEPCAM. Twenty thousand (20,000) cells/well were plated on matrigel coated wells at the beginning of the experiment and were let to invade through the upper chamber to the lower chamber (see section 2.9 of Materials and Methods for details) for 24 hrs.

Figure 4.9A shows the invasion assay in real time. Middle panel 4.9B is the quantification at end point. Lower panel 4.9C are images of the cells that invaded to the lower chamber. At the end of the experiment, cells that do not invade are removed and cells that managed to invade to the lower chamber are stained. More cells invaded in MDA-231-lncE compared to MDA-231-GFP (MDA-231- lncB shows the least amount of stained cells). Figure 4.9 is representative of 3 independent experiments (n=3) with 4 technical replicates in each run. As mentioned in 4.2iiic, T-47D cells have very little to non-invasive capacity (213, 215) unless transformed with KRas or NRas (216). They are considered non tumorigenic (tumors take more than 10 months to grow in nude mice) unless supplemented with exogenous estrogen (178). We tested to see if the introduction of lncEPCAM

(or lncBHLHE) modified its non-invasive characteristic and after 48 hrs followed by the RTCA system we concluded that the presence of lncECPAM (or lncBHLHE) did not modify the above characteristic. T-47D cells infected with lncRNAs showed the same low invasiveness as T47D-

GFP+ (data not shown) and the negative control (T47D plated in serum free media in a matrigel coated upper chamber with no serum in the bottom chamber).

86

A

B

C

87

Figure 4.9: Effect of lncBHLHE and lncEPCAM on invasion. MDA-MB-231 cells containing lncBHLHE and lncEPCAM were subjected to real time cell analysis invasion. Upper panel (A) shows the real time results of cells being recorded every 15 minutes for 24 hours. Middle panel (B) shows results at end point (24 hrs after plating specified cells on matrigel coated wells). Lower panel (C) shows stained cells on wells at end point (see Materials and Methods 2.9). Results are representative of 3 independent cell infections with the corresponding lncRNAs (n=3) averaging 4 replicates in each independent experiment. LncB = lncBHLHE41; LncE = lncEPCAM; neg ctrol = negative control – no serum added to the lower chamber.

88

4.2.3.e. RNA Fluorescence in situ hybridization

Determining if lncBHLHE41 and lncEPCAM are nuclear or cytoplasmic is key to pinpointing their potential function in differentiation and cancer. Also, if their nuclear localization is confirmed, techniques such as ChIRP could potentially aid in determining their interaction with chromatin proteins and DNA. A collaboration with Dr. Arjun Raj (University of Pennsylvania, Philadelphia, PA) was established for training in FISH for low abundant lncRNAs, such as lncBHLHE41 and lncEPCAM. LncRNA MALAT-1 was used as positive control for these reactions as it is abundantly expressed in most cancer cell lines (217). We tested MALAT-1’s expression in a normal immortalized cell line (MCF10A) and three breast cancer cell lines (T47D, MCF-7, and MDA-MB-

231) to confirm its high expression before we tested low abundant lncRNAs such as lncBHLHE and lncEPCAM. All 4 cell lines express high amounts of MALAT-1 (Figure 4.10A) and thus lncBHLHE and lncEPCAM were studied in these cell lines. Figure 4.10B shows lncBHLHE and lncEPCAM expression levels in breast cancer cell lines. We confirmed that MCF10A does not express lncEPCAM (data not shown) which goes along with the results obtained by RT-qPCR (see Figures

4.1B and 4.2). LncBHLHE is nuclear and seems to localize to the nucleolus and lncEPCAM is both nuclear and cytoplasmic.

89

A

90

B

Figure 4.10: LncRNA expression in normal immortalized and cancer cells. (A) MALAT-1 expression in normal immortalized, luminal and triple negative breast cancer cell lines. MALAT-1

RNA was tested in our cell lines to determine the level of expression of this abundant lncRNA used as positive control. (B) LncBHLHE and lncEPCAM expression in luminal and triple negative breast cancer cell lines. LncBHLHE is nuclear and seems to localize to the nucleolus and lncEPCAM is both nuclear and cytoplasmic.

91

4.2.3.f. LncRNAs’ regulation in Cis

Long noncoding BHLHE41 (lncB) is located on chromosome 12. The genes in the landscape immediately surrounding lncBHLHE41 (lncB) were considered to see if it was possible that lncB was a regulator of their expression via cis regulation. IGV was used to examine these genes to see if there was a discernable bias between parous and nulliparous expression. Genes surrounding lncBHLHE41 included BHLHE41, IFLTD1, RASSF8, SSPN, and ITPR2 (Figure

4.11A). BHLHE41 was determined to be expressed in the parous breast, while the parous expression of SSPN was only detected on exon 3 of the gene. RASSF8 was the only gene that seemed more expressed in nulliparous samples, and it was determined the only detectable difference was on exon 6. IFLTD1 and ITPR2 had no discernable bias for either parous or nulliparous.

Long noncoding EPCAM (lncE) is located on chromosome 2. BC200 RNA, also known as BCYRN1

RNA, BC200a, NCRNA00004, LINC00004, is transcript 3 of lncE (190). Since our original determination of lncEPCAM being a key candidate in our system, databases have updated its name and is now found in NCBI and lncRNA databases associated with a few publications with the name

BC200 or BCYRN1. It’s relevance in the field of neuroscience and cancer has recently emerged and NCBI has given it the status of “reviewed” and the NCBI reference sequence (RefSeq) number

NR_001568.1. The functions of three genes in the genome near lncE were examined to see if it was plausible that lncE was regulating them in cis manner: EPCAM, CALM2, and MSH2 (Figure

4.11B). Using IGV, EPCAM was more expressed in parous women, with significant differences found in the exons. CALM2 leaned slightly towards a nulliparous favored expression. MSH2 is more expressed in parous, especially evident on exons 3-8.

92

A

B

Figure 4.11: Genomic region around lncBHLHE (lncB) and lncEPCAM (lncE). (A) Using IGV to view the genes around lncB and the selection of IFLTD1, RASSF8 and BHLHE to be further studied. (B) NCBI snapshot of lncE/BCYRN1 genome surroundings. CALM2, EPCAM and MSH2 were selected to be further studied.

After evaluating the involvement of the genes in the surrounding regions of both lncBHLHE and lncEPCAM in cancer and/or development, we decided to evaluate by RT-qPCR, the effect on the overexpression of these two lncRNAs on the mRNA expression of 6 nearby genes. BHLHE41 and

RASSF8 mRNA expression was determined in cells T47D and MDAMB-231 overexpressing

93

lncBHLHE (IFLTD1 is not expressed in neither of these cell lines). CALM2, EPCAM and MSH2 mRNA expression was determined in the same cell lines overexpressing lncEPCAM. T47D and

MDAMB-231 expressing GFP were used to normalize the results (Figure 4.12). When lncBHLHE41 was overexpressed, BHLHE41’s expression increased 2-fold in MDA-MB-231 and T-47D compared to the respective control (Figure 4.12A). When lncEPCAM is overexpressed, CALM2’s expression decreased more than half in MDA-MB-231 and T-47D compared to respective control

(Figure 4.12B).

BHLHE41 is a basic helix loop helix (bHLH) protein encoding gene which regulates circadian rhythms and cell differentiation, and behaves as a transcriptional repressor (218). It has been identified as a potential tumor suppressor gene in lung and endometrial cancers; reduced expression of BHLHE41 is correlated with cell migration, invasion, and metastasis (193, 219). In endometrial cancer, BHLHE41 has been linked to the NOTCH1 pathway (193). In triple negative breast cancer, the BHLHE41 protein represses invasion and metastasis by degrading hypoxia- inducible factor 1 α (HIF-1α) (193, 197). The parous bias of BHLHE41 in IGV also suggests tumor suppressing function as parous women are more protected from breast cancer than nulliparous

(184). When lncBHLHE41 was overexpressed, BHLHE41’s expression increased 2-fold in MDA-

MB-231 and T-47D compared to the respective control (Figure 4.12A).

RASSF8 is part of a well-known family of tumor suppressor genes; this member of the RAS family is an integral part of epithelial cell adherent junctions, and is involved in cell migration, invasion and metastasis (220, 221). It has also been proposed as a lung tumor suppressor (222, 223). We found RASSF8 increased expression in nulliparous women in our IGV analysis. We decided to evaluate this gene in our cell model to determine if there were any differences in vitro due to its highly reported role in cancer progression. RASSF8’s expression levels do not significantly change when lncBHLHE41 is overexpressed.

CALM2 is a calmodulin, a calcium binding protein responsible for cell signaling, proliferation, apoptosis, and cell cycle development (224). Mutations in CALM2 and its two nearly identical

94

calmodulin associates are related to ventricular arrhythmias occurring simultaneously with epilepsy and hindered neurodevelopment (225). In breast cancer cells, CALM2 directly binds to death receptor-5 (DR5) in a calcium dependent manner leading to the formation of death inducing signaling complex for apoptotic signaling (226). When lncEPCAM is overexpressed, CALM2’s expression decreased more than half in MDA-MB-231 and T-47D compared to respective control

(Figure 4.12B).

EPCAM or Epithelia Cell Adhesion Molecule is a type I transmembrane protein that is expressed in the majority of normal epithelial tissues and is overexpressed in most epithelial cancers including breast cancer (227, 228). It is associated with enhanced malignant potential in colorectal carcinoma and head and neck squamous cell carcinoma (229, 230). EPCAM’s protein antigen is currently the focus of immunotherapy cancer treatment (202, 231). EPCAM’s expression levels do not significantly change when lncEPCAM is overexpressed.

MSH2 is a homolog of the E. coli mismatch repair gene mutS. Heterozygous germline mutations in any of the mismatch repair (MMR) genes - MLH1, MSH2, and MSH6 - cause Lynch syndrome, an autosomal dominant cancer predisposition syndrome conferring a high risk of colorectal and endometrial cancers, among others (232). Prostate cancer is also associated with mutations in MSH2, MLH1 and MSH6 with the loss of the respective mismatch repair protein in 69 % of the tumors (233). MSH2’s expression levels do not significantly change when lncEPCAM is overexpressed.

95

A

B

Figure 4.12: Evaluating Cis regulation. Effect of lncB (A) and lncE (B) overexpression on nearby genes in MDA-MB-231 (MDA) and T47D cell lines. Fold change was determined by the following equations: ΔCt=Ct_gene – Ct18S; ΔΔCt=ΔCt_gene – ΔCt_GFP; Fold change = 2^(-ΔΔCt) where

18S is the housekeeping gene and Ct_GFP corresponds to threshold of the gene in cells that express GFP. Error bars indicate difference between duplicates for each independent experiment.

96

4.3. LncRNAs in breast cancer tissue vs. normal adjacent

Breast cancer tissues and its normal adjacent tissue were obtained from the Tissue Bank at FCCC.

Careful selection of tissues was made before testing lncBHLHE and lncEPCAM expression.

Tissues were divided in two and H&E staining was performed on all tissues, cancer and normal adjacent, to confirm morphology before RNA was extracted on the other piece. After H&E staining, a substantial amount of tissue had to be discarded as “normal” labeled tissues presented with ductal hyperplasia or, in a few cases, small regions showing ductal carcinoma in situ. On the other hand, a few tissues showed normal structures in a tissue labeled as “tumor”. We were very conservative and just chose typical normal and tumor tissue. Normal tissues should present lobular/ductal structures and tumor tissue should show abundance of epithelial cells (with little lymphocyte infiltration). Samples that were labeled as tumor but contained some/a few ductal structures, samples that presented ductal hyperplasia, or samples with atypical lobules were discarded. Figure 4.13 shows representative examples. Figure 4.13A shows normal structures

(ducts and ductules) in normal adjacent tissue and abundance of epithelial cells in tumor tissue.

Figure 4.13B shows examples of tissue samples that were unfit for further gene testing. Figure

4.13C shows lncBHLHE and lncEPCAM in representative tissues, although few samples were tested as a consequence of a stringent selection of fit samples. Tumor expression levels were normalized to normal adjacent levels. LncEPCAM is upregulated in tumor tissue and lncBHLHE is downregulated in tumor tissue (Figure 4.13 C).

97

A

B

C

Figure 4.13: Breast cancer tissue evaluation for lncBHLHE (lncB) and lncEPCAM (lncE). (A)

Shows examples of expected tissue structures and morphology for normal tissue (well defined ducts and ductules – blue arrows -) and tumor tissue with overgrowth of epithelial cells (blue arrows). (B) Examples of tissues not considered for further analysis. (C) Expression levels of lncB and lncE on representative tissues.

98

4.4. LncRNAs in vivo model (xenograft study)

All animal studies were carried out using protocol approved by the IACUC of FCCC. Female

CB17/SCID mice of 6-8 weeks of age were obtained from FCCC animal facility. A viable single‐cell suspension of T-47D or MDA-MB-231 cells overexpressing lncBHLHE41 or lncEPCAM (T-47D- lncB; T-47D-lncE; MDA-MB-231-lncB; MDA-MB-231-lncE) in 100 μL of PBS was mixed with 100 μL matrigel. Cells were then injected into the mouse mammary fat pad. Training was received and experimental conditions were maintained within and in between experiments to assure reproducible injections. Also, training was received for survival surgery. As mentioned in Materials and Methods

(chapter 2 section 2.11), T-47D cells require higher levels of estrogen than what is produced by the

CB17/SCID mice to grow tumors. An estrogen 2*2mm pellet is implanted in the back of the neck

48 hrs before cell injection. Figure 4.14 shows body weight of mice throughout 4-week experiments.

No major changes were observed in weight for the MDA-MB-231 xenograft experiment. The average weight was about 20g+/-3g and all animals looked healthy at the time of sacrifice (Figure

4.14A). However, T-47D experiment was repeated (T-47D repeat) with 5 mice injected with T47D-

GFP-control cells and 5 mice injected with T-47D-lncE in the right mammary fat pad because in the first experiment 2 T-47D-GFP and 3 T-47D-lncE mice died during the experiment due to severe weight loss (marked with ᵼ).

99

A

B

100

Figure 4.14: Mice weight throughout experiments. (A) Mice with MDA-MB-231 cell line OE either lncE (mice #1-6) or lncB (mice #7-12) were injected in the left mammary fat pad of SCID mice and compared to MDA-MB-231-GFP (mice #13-19) and MDA-MB-231-Infection Control (mice

#20-25). (B) Mice with T-47D cell line OE either lncB or lncE were injected in the right mammary fat pad of SCID mice. GFP and infection control (IC) cells were used as controls for both cell lines.

GFP and IC showed no significant difference in tumor growth.

At the end of the experiment, mice were sacrificed according to IACUC protocol and tumors were removed for weighing (Figure 4.15A) and histological analysis (4.15B). The average tumor mass is an average of all mice in a specific group (4.15C). Tumors grew very distinctively from one group to another. As Figure 4.15C shows, mice containing xenografts with MDA-MB-231-lncE cells in the mammary fat pad, grow very big tumors compared to MDA-MB-231-GFP in the 4-week of the experiment.

101

A

B

C

Figure 4.15: Mice MDA-MB-231 tumors OE lncB and lncE, and histological sectioning. (A)

Tumor growth in mice throughout four weeks. (B) The figure shows a representative section of H&E stained poorly differentiated tumor at end point (4 weeks). (C) Tumor weight at end point. ***pvalue

< 0.001.

102

Xenografts experiments using T-47D cell line require the extra step of estrogen pellet insertion as cells do not grow (or grow very slowly) without estrogen stimulation. This requires survival surgery a couple of days before cell injection and as a consequence more handling and potential exposure to immunocompromised animals. Survival surgery went smoothly and mice looked healthy and healed well after surgery. However, 2 weeks after surgery (one week after clip removal) a few mice started to lose weight. Mice were monitored and eventually 4 mice had to be sacrificed due to extreme weight loss. We also noticed drier skin and rough fur in these animals (one mouse in the lncB group died due to heart attack while holding). These events may all be a consequence of estrogen exposure. Thus, as all these animals were either in the GFP-control group or the lncE group, we decided to repeat these 2 groups with 5 mice/group for a total of 7 mice in the GFP- control group (1 mice died due to extreme weight loss in second round) and 8 mice in the lncE group. Figure 4.16 shows the results of our T-47D xenograft study, combining both rounds. Figure

4.16A shows the dissected tumors at end point. Figure 4.16B shows H&E staining of poorly differentiated adenocarcinoma. T-47D cells which OE lncE have invaded the muscle (purple arrow) showing the degree of aggressiveness of the tumor. Results in Figure 4.16C are expressed as percentage tumor growth (instead of tumor mass/weight as in Figure 4.15A) as two separate experiments’ results were combined to increase the power. T-47D xenograft and T-47D repeat xenograft (which just included GFP and lncE groups) showed the same proportional results. Both experiments were carried out identically and thus we were able to combine the results using percentage tumor growth (Figure 4.16C). As Figure 4.16C shows, mice containing xenografts with

T-47D-lncE cells in the mammary fat pad, grow very big tumors compared to T-47D-GFP

(pvalue<0.001), and mice containing T-47D-lncB cells grow smaller tumors compared to T-47D-

GFP (pvalue<0.05), in the 4-week period of the experiment.

Table 4.1 summarizes the phenotypic changes observed in vitro and in vivo as a consequence of the overexpression of lncBHLHE and lncEPCAM in T-47D and MDA-MB-231 cells.

103

A

B

C

104

Figure 4.16: Mice T-47D tumors OE lncB and lncE, and histological sectioning. (A) The figure shows tumor growth in mice throughout four weeks. (B) The figure shows a representative section of H&E stained poorly differentiated tumor at end point (4 weeks). Invasion to the muscle is marked with a purple arrow in LncE image. (C) Percent tumor growth at end point. *pvalue<0.05; ***pvalue

< 0.001.

105

The effect of the estrogen pellet was evaluated in all organs, but especially in reproductive organs such as ovaries and uterus. Unpublished data from other labs at FCCC and elsewhere that work with the xenograft model using estrogen receptor positive cells, have noticed estrogenic effects on bladder as well. Thus we decided to evaluate all organs and found no abnormalities except for distended bladder in a few mice and cysts in ovaries/uteri. Figure 4.17 shows the presence of cysts

(purple arrows) that can form in reproductive organs due to estrogenic effect.

Figure 4.17: Cysts in mice reproductive organs (uterus and ovaries) due to the presence of estrogen pellet. H&E staining of ovaries and uteri of all animals with estrogen pellet was performed to evaluate effect of the pellet in all organs. Only uterus and ovaries presented abnormalities (cysts; purple arrows) due to effect of estrogen. 10x left panel; 40x right panel.

Table 4.1: Summary of phenotypic changes as a result of lncRNAs’ overexpression (OE) Summary of Results - Phenotypic Changes due to OE compared to control Assay Method T-47D T-47D MDA-MB-231 MDA-MB-231 in vitro lncBHLHE lncEPCAM lncBHLHE lncEPCAM Proliferation MTT decreased increased decreased increased RTCA decreased increased decreased increased Apoptosis TUNEL inconclusive inconclusive increased No difference Migration RTCA decreased increased decreased increased Invasion RTCA decreased increased decreased increased in vivo Xenograft *MFPI decreased increased decreased increased *MFPI = mammary fat pad injections

106

4.5. Discussion

By using RNA sequencing (RNAseq) technology we identified 42 long non-coding regions that were significantly and differentially expressed between parous and nulliparous breast tissue samples

(36, 234). Among these 42 regions, 21 were found significantly downregulated, and 21 were upregulated in parous samples. The power of this model is that the lncRNAs were discovered directly from a cohort of 16 women who volunteered for breast biopsies of healthy tissue. To the best of our knowledge, this is the first time that normal tissue in two different physiological conditions (pregnancy vs. no pregnancy) has been studied to identify noncoding regions that are differentially expressed between the two groups. Although this is a small cohort, these findings highlight the differences among apparently (from the histological point of view) similar tissue. Given the plethora of potential functional roles lncRNAs have, we believe the lncRNAs identified in this study are a class of genetic regulators that is largely untapped. In the context of healthy tissue, these lncRNAs may be highlighting the predisposition nulliparous women have over the increased risk of developing breast cancer in their postmenopausal years. In the context of disease, these lncRNAs may serve as drivers of cancer as well as potential therapeutic entry points. In order to study these lncRNAs in the laboratory, we turned to a normal vs. cancer setting to evaluate their expression levels, relevance and potential applicability of the information discovered in a parous vs. nulliparous setting. To gain information on the lncRNAs’ expression levels in cancer, we used normal and breast cancer cell lines, plus normal adjacent and tumor breast tissue. Using real-time

PCR we evaluated the expression of these lncRNAs in nine different human breast cell lines (Figure

4.2) and human breast tissue (Figure 4.12).

The insight given by the RNAseq results can now be complemented with relevance of these lncRNAs in vitro. The presence and differential expression of these lncRNA between parous and nulliparous women opened the doors to further studying their relevance in breast cancer prevention and cancer. Thus, a selection of a wide range of cell lines was made to not only represent different subtypes of breast cancer (luminal, triple negative/basal, and HER2+) but also the three main normal immortalized cell lines available in the scientific community were tested (MCF10A, MCF10F, and MCF12A). Findings obtained studying these lncRNAs are relevant to breast cancer prevention

107

as these lncRNAs were first identified when comparing parous vs. nulliparous women. Our immediate goal was to test the expression of these lncRNAs in breast cells (normal- like/immortalized and cancer), breast tissue (tumor and normal-adjacent) and, if relevant, their role in vivo.

As already mentioned, all the lncRNAs discovered by our RNAseq analysis weren’t previously described, making their study attractive but challenging at the same time. When analyzing their characteristics and location in the genome, we discovered that lncEPCAM had previously been vaguely described in the literature. Its potential implication in breast cancer had been reported a few years back (205). This made this lncRNA particularly interesting as the few publications related to this short lncRNA were mainly describing its implication in brain pathology such as Alzheimer’s disease (235). Recently, it has become clear of the relevance of BC200 as a key regulator in cancer

(236-238), specifically breast cancer (239-241). However, the findings are still in its infancy and thus the field is open for exploration. The relevance and involvement of any of the lncRNAs described above, in either breast cancer and/or breast cancer prevention, is a novel and unexplored territory.

Technologies such as RNAseq and microarrays help in the discovery of potentially relevant molecular targets in the system of interest. Targeted techniques help evaluate these potential candidates to determine their importance. Are these lncRNAs functioning to promote disease or are they mere by-products of transcription in disease predisposition and disease progression? Our preliminary studies in healthy tissue, together with literature search, coverage analysis, sequence analysis, availability of reagents, and preliminary results on cells and cancer tissue helped us narrow down our in detailed analysis to two top candidates, lncBHLHE41 and lncEPCAM (Table

3.2).

Our data show that this subset of lncRNAs are not only differentially expressed between normal and cancer cells but also cluster the different breast cancer subtypes in luminal, basal/TNBC and

HER2+ (at least 2 cell lines were used for each subgroup). After successful overexpression of these lncRNAs in the selected cell lines, we tested transformation phenotypes of the cells using the

108

techniques described above. The assessment of the transformation phenotype changes after the manipulation of these lncRNAs, allowed us to start understanding the role of these molecules not only in the neoplastic process, but probably also in the prevention of breast cancer.

Breast normal-like and breast cancer cell lines were tested for ten lncRNAs (Figure 4.1 and 4.2) to determine their levels of expression. Custom primers and probes for the selected lncRNAs were used to determine level of expression in three normal-like cell lines (MCF12A, MCF10F, MCF10A), two luminal (MCF7 and T47D), two basal/triple negative (MDAMB231 and Hs578T), and two breast cancer cell lines overexpressing HER2/Neu (SKBR3 and HCC1954) (Figure 4.2). The six breast cancer cell lines cover a wide scope of breast cancers as main subtypes of breast cancer are represented. Normal-like cell line MCF10A, luminal breast cancer cell T47D (and for preliminary experiments MCF7) and triple negative breast cancer (TNBC) MDAMB-231 were selected for in- depth study. The luminal cell lines were chosen based on the fact that over 70% of breast cancers are of the luminal type (13). Triple negative breast cancers (TNBC) accounts for 10-20% of breast cancers and has been found to be associated with younger age, more advanced stage of diagnosis, and no current local treatment except for mastectomy with or without radiotherapy, due to lack of drug-targetable receptors (242). Although TNBC is sensitive to chemotherapy, survival after metastatic relapse is short, treatments are few, and the response rate is poor and lack durability

(242). Based on our preliminary results (Figure 4.1), we observed that the lncRNA expression levels were very similar among the three normal-like cell lines. We used MCF10 as our normal control due to its stable morphology and consistent gene expression profile, even after several passages in culture. This makes MCF10 a suitable normal control (NB: cells are immortalized and not primary cells).

To gain information on the expression levels of the selected lncRNAs in normal and breast cancer cell lines, we evaluated the expression levels of these ten lncRNAs in this in vitro model. The detection of the levels of lncRNAs was performed by reverse transcriptase real-time PCR. The expression levels for each lncRNA was calculated by the average of three consecutive passages of each cell line, which showed consistency of the expression of these lncRNA as cell line passages

109

increased. Unsupervised clustering analysis showed that the patterns of expression of these lncRNAs group each subtype of breast cell line together. Highly interesting, this reveals that these ten lncRNAs have different expression signatures for each breast cancer subtypes (luminal,

TNBC/basal and HER2+ cell lines) (Figure 4.2). We thus considered the functional studies of these lncRNAs in different breast cancer cell lines of high interest but had to choose a few lncRNAs to be studied in depth in a few cell lines. Based on expression levels in different cell lines, potential relevance in breast cancer, and location in the genome, we selected 2 lncRNA candidates to perform the functional studies shown above. Figure 4.3 shows the expression values of lncBHLHE and lncEPCAM in the selected cell lines compared to normal-like MCF10F.

We hypothesized that these lncRNAs were key drivers in the process of molecular remodeling that occurs in the mammary gland during pregnancy, providing protection against the development of breast cancer. To understand their role in cancer progression, we evaluated the changes in phenotype of breast cancer cell lines, in vitro and in vivo, after lncBHLHE and lncEPCAM were successfully overexpressed.

LncBHLHE was initially chosen as a strong candidate to be further studied in vitro and in vivo as it was highly upregulated in parous breast tissue samples. We then confirmed its high upregulation in normal immortalized breast cancer cells and normal-adjacent breast cancer tissue. All these data indicated that lncBHLHE (lncB), although being a novel lncRNA, could be acting as a tumor suppressor in breast tissue. The lack of literature made it a challenge to study. However, further indication of its position in the genome, neighboring cancer-related genes such as BHLHE (also known as SHARP-1) strengthened the interest in this lncRNA. Many cancers have different subtypes and specific genomic signatures. How do we determine if a certain lncRNA may be providing cancer signatures? In the case of lncBHLHE the finding of an association with it upregulation in normal adjacent and parous tissue suggest of its potential to be used in screening to determine its expression in high risk patients and be used as a surrogate marker. For example,

HOTAIR lncRNA is often used as a prognostic marker for many cancers, including breast cancer, where overexpression of HOTAIR is correlated with poor prognosis and higher risk for metastasis

110

(84). More specifically, lncRNAs can be used to determine cancer risk. Genetic screening for specific lncRNAs that are associated with breast cancer can determine the risk factor for developing that cancer. Similar to how familial breast cancer risk can be tested through genetic screening for

BRCA mutations (243), many lncRNAs, such as PCAT for prostate cancer (or lncBHLHE for breast cancer), can be used as a method to determine cancer risk (87).

As Figure 4.2 shows, lncB expression in cancer cells is lower than in normal cells. If this lncRNA is a regulator of tumorigenesis or tumor progression, its overexpression in cancer cells should minimize or ablate the cancer characteristics such as increased proliferation, migration, invasion, and tumor growth. Interestingly, when lncB is overexpressed both in luminal cells and TNBC, the cancer phenotype is minimized in vitro. We also observed significantly smaller tumors in vivo for cell line T-47D. As already mentioned, MDA-MB-231 is a very aggressive cell line. It is possible that the minimized cancer phenotype we see in vitro when lncB is overexpressed in MDA-MB-231 is not translated into the in vivo setting as cells have the stroma to ablate any changes in the cells and still grow tumors as large as (or even larger than) control – differences between lncB and GFP are non-significant in MDA-MB-231. Cis regulation evaluation determined that lncB may be

(directly or indirectly) interacting with and regulating nearby gene BHLHE41. When lncB is overexpressed, BHLHE41 is upregulated in both T-47D and MDA-MB-231. This protein coding gene has been described as a candidate tumor suppressor in lung cancer (194). BHLHE41 transcript is low in lung cancer cell lines and downregulated in lung adenocarcinomas. Its overexpression inhibited colony formation of lung cancer cells presumably due to inhibition of cell proliferation as a consequence of downregulation of cyclin D1 (194, 244). In endometrial cancer

(EC), overexpression of BHLHE41 in EC cells inhibited cell viability (191), migration, invasion, and metastasis (193). Immunohistochemical analysis demonstrated that BHLHE41 protein levels were lower in metastatic EC than in primary tumors, and statistical analysis revealed correlations between levels of SHARP-1/BHLHE41 and markers of EMT (193). In addition, exogenous

BHLHE41 overexpression affected the proteins levels of genes involved in EMT process supporting a role for BHLHE41 in suppressing EMT and metastasis. Finally, also in EC, overexpression of

111

BHLHE41 in vivo inhibited tumor growth and angiogenesis, and decreased HIF-1α expression

(191). A similar mechanism of action was established in breast cancer as well. Montagner et al. demonstrated that BHLHE41 is not only necessary but sufficient to limit expression of HIF-target genes (197). BHLHE41 opposes HIF-dependent TNBC migration in vitro and invasiveness/metastasis in vivo. Mechanistically, BHLHE41 binds to HIFs and promotes HIF’s proteosomal degradation (198). If lncB is proven to (directly or indirectly) regulate BHLHE41 by upregulation of BHLHE41 in the presence or binding of lncB, then the impact on breast cancer (and parous women) is evident. In fact, BHLHE41 is upregulated in T-47D and MDA-MB-231 cell lines when lncB is overexpressed. As lncB and BHLHE41 are here regarded to function as tumor suppressors, the increased expression of BHLHE41 in T-47D and MDA-MB-231 cell lines may indicate that they interact to reduce tumorigenesis (197). The overexpression of lncBHLHE would mean that an upregulation of BHLHE41 makes parous women less prone to developing breast cancer by limiting proliferation, and even migration, invasion, and metastasis.

Our data suggests that lncEPCAM is indeed expressed in breast cancer cells. This coincides with the scarce literature reporting lncEPCAM (also known as BC200 and BCYRN1) only expressed in cancer tissue and not in normal tissue (205). Importantly, overexpression of lncEPCAM leads to increased proliferation in luminal and basal/TNBC cells. LncEPCAM (lncE) overexpressed (OE) cells show statistically significant migration and invasion in both luminal and TNBC cells. In vivo, lncE OE cells produce large tumors that invade to muscle showing the aggressiveness of cells that

OE this lncRNA. Also, our preliminary data in mouse tissue indicate that there are more Ki67 positive cells in MDA-MB-231-lncE and T47D-lncE tumor cells in xenografts than in MDA-MB-231-

GFP and T47D-GFP tumors, respectively (data not shown). Although a few publications have described this lncRNA as an , reporting that lncE/BC200 RNA is highly expressed in invasive breast carcinomas (245) and other human tumors (205), it was only recently that Singh et al. reported on a possible mechanisms of action for BC200 contributing to breast carcinogenesis

(246). In 2004, Iacoangeli et al. suggested that the presence of BC200 in Ductal Carcinoma In Situ

(DCIS) was a prognostic indicator of tumor progression. BC200 had the potential to be a molecular tool in the diagnosis and prognosis of breast cancer (245). Last year, a patent by Tiedge et al.,

112

suggested BC200 RNA as the diagnostic molecular tool for breast cancer after extracting and measuring the levels of BC200 RNA in whole blood. The authors determined that patients having circulating blood levels of 25x BC200 RNA compared to control patients with no disease, have an increased risk for the development of breast cancer (247). This parameter is proposed as an early diagnostic tool with ease to obtain the sample and no or few side effects. Notably this patent is still pending. Earlier this year, Singh et al. published a paper further providing evidence of the role of

BC200 in breast cancer. They demonstrated that BC200 contains sequence complementarity to

Bcl-x mRNA and thus may facilitate the regulation of alternative splicing of Bcl-x mRNA in ER+ breast cancer cells. The authors also demonstrated that BC200 knockout (KO) suppressed ER+ tumor growth in vivo (246). We here propose that the effect on breast cancer pathogenesis is not only in ER+ breast cancer, but also in TNBC. Singh et al. determined that BC200 was expressed in MDA-MB-231 cell line but did not follow up as they determined that the expression of this lncRNA in MDA-MB-231 cells was lower than in luminal cells such as MCF-7 and T-47D. Although we can confirm results published by Singh et al. on MCF-7 cells, we expanded these results in T-47D plus we could determine that similar traits are observed in TNBC model MDA-MB-231. Thus, the Singh et al. publication earlier this year served as a solid platform to establish the high relevance of lncE/BC200 in breast cancer pathogenesis (246). They tackled how, mechanistically, BC200 is critical to cell proliferation and survival. By using CRISPR/Cas9 system they knocked out BC200 in MCF-7 cells and showed that the latter produced an increase in the level of pro-apoptotic Bcl isoforms (246). Although we find these findings very enlightening, we strongly believe this is not the only mechanism by which BC200 is regulating breast cancer development. Moreover, we have here demonstrated that BC200 effect on breast pathogenesis is not limited to ER+ breast cancer.

Our results not only strengthen Singh et al.’s results by demonstrating that these results are not only limited to MCF-7 cells, by including yet another luminal model like T-47D in the picture, but more importantly show that these results are highly relevant in TBNC as well. Since the lncRNA field emerged, experts have discussed the importance of findings for genes that are expressed at a low level. It has been proven time and again that tight regulation occurs with genes expressed at low levels, more so in the lncRNA field (248). We decided to dig further into a complex and

113

aggressive breast cancer subtype like TNBC and have confirmed the relevance of BC200 in luminal breast cancer and for the first time reported the relevance in TNBC. As previously mentioned,

BC200 effect on breast pathogenesis may not only be limited to regulation of alternative splicing of

Bcl-x by BC200 but there are sure other mechanisms contributing to this. Our preliminary results on Cis regulation of BC200 add to Singh et al.’s findings.

CALM2, a gene responsible for apoptosis, proliferation, and cell cycle progression (224, 249, 250), is downregulated in both cell lines (T-47D-lncE and MDA-MB-231-lncE) indicating that lncE, a potential regulatory oncogene, suppresses CALM2 expression to deregulate cell cycle progression and apoptosis. Haddad et al. recently suggested that CALM2 is involved in the etiology of breast cancer, especially in African American women, by performing gene-based and single-SNP analyses (251). CALM2 was included in the study because of calmodulin’s involvement in gonadotropin-releasing hormone (GnRH) signaling. As previously described by Melamed et al.,

GnRH induces calcium influx, which activates calmodulin, leading to gonadotropin gene expression

(252). Thus, CALM2 may impact breast cancer susceptibility through its effects on hormone synthesis (251). The observation that CALM2 is downregulated as a consequence of overexpression of lncE indicates that cells tend to shut down a gene responsible for cell death and controlled proliferation and cell cycle progression in favor of deregulated apoptosis and uncontrolled proliferation, and cell cycle progression.

Altogether, the overexpression of lncBHLHE41 and lncEPCAM/BC200 allowed us to understand the role of these long non coding RNAs not only in the development of the neoplastic process but also in how they may be causing the protection of breast cancer in postmenopausal parous women.

114

CHAPTER 5: RNASEQUENCING OF BREAST TISSUE

FROM PAROUS VS. NULLIPAROUS WOMEN:

A PROTEIN CODING GENE PERSPECTIVE

5.1. Abstract to side project

Casein-CSN2)a marker of mammary cell differentiation unique to parity, is extremely upregulated in our parous cohort when we evaluated protein coding genes of our RNAsequecing

(RNAseq) results and compared parous vs. nulliparous women. This called our attention and we looked into the literature and other genes in our list of differentially expressed genes between parous vs. nulliparous women to find an explanation for such outstanding upregulation. In this chapter we aimed to further pursue our understanding of the role CSN2 may have in the protection against breast cancer development in parous women. We aimed to determine if the molecular relationaship of this gene with other markers of differentiation could be playing a role in the observation that parous women are more refractory to postmenopausal breast cancer. Our findings suggest that all-trans Retinoic Acid (atRA) may bind to CSN2 and together may be the core of molecular interconnections that lead to protective transcriptomic changes in the mammary gland during pregnancy. The high upregulation of CSN2 decades after pregnancy is indicative of a molecular imprint as a consequence of pregnancy. To aid to this analysis and find a potential link to breast cancer, we evaluated the genes identified in normal breast of parous vs. nulliparous women in a setting of normal-adjacent vs. tumor samples of 112 patients using publicly available

RNAseq data in The Cancer Genome Atlas (TCGA) database. TCGA database contains information including molecular profiles, recurrence of disease, and family background of patients with specific cancer subtypes, including invasive breast carcinoma (IBC). RNAseq data of tumor tissue and its matched-adjacent normal tissue from female IBC patients is an invaluable material to validate observations made in a smaller scale.

115

The results summarized in this chapter suggest that high upregulation of CSN2 during pregnancy plays a specific role not only in the upregulation of markers of differentiation, but could also be playing a role in the protection of breast cancer progression. Our analysis suggests that CSN2 appears to be induced by atRA. The same set of genes upregulated in parous women which are involved in the atRA processing may have a protective role in the normal-adjacent tissue of TCGA- invasive breast cancer patients.

5.2. Background

Gene expression programs controlling cell differentiation are a focal point in understanding cancer biology (253, 254). The breast is the only organ in the body for which the pattern of differentiation is dramatically changed from childhood to adulthood. Further changes occur when the breast is exposed to the hormonal impacts induced by pregnancy, lactation and menopausal breast involution; all of these processes result in temporary and permanent changes in its gene expression profile (35, 255-257). It has been previously shown that breast tissue of postmenopausal parous women is more differentiated than the breast of matching age nulliparous women (20, 258) and that a full term pregnancy induces a specific genomic signature characterized by the upregulation of genes controlling splicing, chromatin remodeling, and differentiation (35, 184, 259). Among those genes that are markers of cell differentiation is casein-β (CSN2) (260-265). This gene interacts with the all trans retinoic acid (atRA) pathway which is vital for programing cell differentiation in many tissue types throughout the body during development (266). The upregulation of differentiation markers like CSN2 may be a direct consequence of atRA being induced during mammary gland development due to pregnancy and lactation. All-trans-RA in combination with other therapeutic drugs, is currently being studied clinically to treat multiple cancers (267-278). More specifically, atRA has recently been shown to inhibit cell plasticity and motility, as well as downregulate transcription factors involved in epithelial-mesenchymal transition (EMT) at both RNA and protein levels in triple negative breast cancer cells (279). Sub-populations of breast cancer cells do respond differently to atRA treatment: atRA promotes cell cycle arrest and apoptosis in luminal epithelial cells while increasing adhesion and reducing invasive capacities of myoepithelial cells (280). In the

116

context of breast cancer prevention and progression, atRA treatment is capable of re-differentiating transformed cells at early neoplastic stages (281). Biomarkers induced by atRA metabolism are potential targets for breast cancer prevention and treatment.

Our analysis and findings in a small subset of samples of parous and nulliparous women was confirmed in the TCGA-IBC dataset. TCGA data generated from our differential expression analysis demonstrated a significant upregulation in normal vs. tumor tissue of CSN2, UGT2B7, WISP3, and

XDH, consistent with our original results from the parous vs. nulliparous postmenopausal breast tissue data. WISP3 and XDH have previously been reported to either be lost or mutated in breast cancer (282-284). Central to our findings is CSN2 which has never been reported as a biomarker of differentiation in tissue that has already undergone postmenopausal involution. In summary, here we report that CSN2 is highly expressed in the normal tissue of parous postmenopausal women and by utilizing transcriptomic data from TCGA, we evaluated the interactions of CSN2 with genes that, induced by the process of cell differentiation during pregnancy, may play a preventive role in the development of breast cancer decades after pregnancy occurred.

5.3. Results

5.3.1. CSN2 is highly upregulated in postmenopausal parous breast tissue

By comparing the RNA sequencing data from 8 parous and 8 nulliparous postmenopausal women, we have determined the high upregulation of biomarkers involved in differentiation and atRA processing. We have previously reported significant differences in gene transcription between the postmenopausal nulliparous and parous breast by microarray and qRT-PCR (35, 160, 184, 285).

In the present work we confirmed these differences and increased our understanding of the molecular processes that may lead to breast cancer prevention in parous women by using RNAseq technology. The RNAseq results are depicted in Figure 5.1. We have identified 68 differentially expressed genes (fold change >= 2; adjusted p.value <= 0.01) of which 35 were downregulated and 33 were upregulated in parous breast tissues.

117

Figure 5.1: Heat map of differentially expressed protein-coding genes in the breast tissue of parous and nulliparous women. From a total of 68 differentially expressed protein-coding genes,

35 were downregulated genes in parous breast (in green) and 33 were upregulated genes (in red).

The two shades in blue represent two batches of parous samples while the two shades in red represent two batches of nulliparous samples.

118

Our initial analysis revealed three genes involved in cell differentiation: Casein-β (CSN2), WNT1

Inducible Signaling Pathway Protein 3 (WISP3), and Xanthine dehydrogenase (XDH) which were among the top upregulated genes in the postmenopausal parous breast (Table 5.1). Additionally,

XDH is essential in all-trans retinoic acid (atRA) processing. Also strongly upregulated was an enzymatic mediator in atRA metabolism, UDP Glucuronosyltransferase 2B7 (UGT2B7).

Table 5.1: Mean expression reads of genes involved in atRA metabolism and/or cell differentiation for parous vs. nulliparous women.

Gene ID Mean Parous Mean Nulliparous Log2FC P.value

CSN2 6.33 -9.54 15.87 0.0009

UGT2B7 5.68 -8.46 14.14 0.0033

XDH 9.44 7.44 2 0.0001

WISP3 6.52 4.81 1.71 0.0033

The most upregulated gene in postmenopausal parous breast tissue was CSN2 by a log2FC factor of 15.87 (p.value = 0.00090, Table 5.1). CSN2 has been implicated in atRA mammary alveolar cell morphological differentiation, the binding of retinol and atRA, and produces the inhibition of cell proliferation (262, 286). In addition to marking the presence of lactation with recruited milk proteins,

CSN2 is capable of binding molecular retinol and atRA (261, 286-288). Using paraffin embedded breast tissue of parous and nulliparous women we confirmed the immunocytochemical detection

119

of the CSN2 protein which was highly expressed in both ductules and ducts of the parous breast epithelia (Figure 5.2).

Figure 5.2: CSN2 protein expression levels in parous and nulliparous women. Representative pictures of CSN2 protein staining (magnification 400x). Parous women show high levels of CSN2 protein expression both in ductules (left panel) and ducts (right panel). Nulliparous women show low levels of CSN2 protein expression both in ductules (left panel) and ducts (right panel).

Using Integrative Genome Viewer (IGV) (166), we could visually assess the highly different

RNAseq gene expression results when comparing parous vs. nulliparous distinct mRNA expression levels (Figure 5.3).

120

Figure 5.3: CSN2 mRNA expression levels in parous and nulliparous women. Representative screenshots of CSN2 mRNA expression (parous upper panel vs. nulliparous lower panel). Parous women show high levels of CSN2 mRNA expression visualized using Integrative Genome Viewer

(IGV) from our RNAseq data. Nulliparous women present low levels of CSN2 mRNA expression visualized by IGV from our RNAseq data.

121

The second most upregulated gene in our study was UGT2B7 with a log2FC factor of 14.14 (p.value

= 0.0033, Table 5.1). atRA and derivatives and 4-hydroxyestrone, are substrates for metabolism isozyme UGT2B7 (289-292). UGT2B7 has previously been localized to normal and invasive human breast tissue parenchyma showing strong presence in the terminal ductal structures, with reduced reactivity at areas exhibiting micro-invasion in neoplastic tissues (289).

XDH (also known as xanthine oxidoreductase) was upregulated in parous tissue by a log2FC factor of 2.0 (p.value = 0.00013, Table 5.1). XDH can be converted to Xanthine Oxidase (XO) and is then capable of rapidly oxidizing retinol to atRA, the active form of vitamin A (293, 294). XDH activity is essential in atRA metabolism in numerous mammary gland processes such as the formation of the structural envelope which is needed for lactation and protection from abnormal involution (295).

WISP3 (also known as CCN6) was upregulated in parous tissue by a log2FC factor of 1.71 (p.value

= 0.0033, Table 5.1). WISP3 is well known for its role in the breast tissue by acting through a variety of tumor suppressor molecular pathways such as inducing mesenchymal-to-epithelial cell transitions (MET), suppressing invasion and motility, as well has inhibiting the development of distant metastasis (296-299).

Our present study demonstrates the differential expression in the parous breast of CSN2 protein

(Figure 5.2) and CSN2 mRNA (Figure 5.3). Our RNAseq data strongly indicates the presence of retinol to atRA metabolism in the postmenopausal parous breast, which may illustrate an anti- carcinogenic microenvironment through the induction of cellular differentiation programming and atRA-induced apoptosis.

122

5.3.2. Differentially expressed genes in normal-adjacent tissue of TCGA breast cancer patients correlate with findings in parous breast tissue

A thorough search and analysis of the 68 differentially expressed genes was done with NCBI,

PubMed, and GeneCards. We confirmed that CSN2, UGT2B7, XDH, and WISP3 are involved in atRA metabolism and/or cell differentiation. A study of open-source RNAseq data from TCGA was then pursued to evaluate a potential correlation in expression and significance of the relationship among these five genes.

Matched normal-to-tumor RNAseq data from breast tissues of 112 female IBC patients was downloaded from the TCGA databank. We speculated that if the upregulation of atRA expression is more prevalent in the parous breast as indicated by the upregulation of CSN2, UGT2B7, XDH, and WISP3, there may be a correlation in normal-to-tumor tissue as well. Of note, TCGA dataset includes a diverse population of women who are 30-90 years old and for which parity and demographic information is not available.

123

Table 5.2: Mean expression reads of matched normal vs. tumor tissues of all 112 female

IBC-TCGA patients. Represented are genes involved in atRA metabolism and/or cell differentiation

Gene ID Mean Normal Mean Tumor Log2FC p.value

CSN2 1636.8 2.0 9.70 0.0004

UGT2B7 31.5 11.3 1.48 0.0004

XDH 1160.6 500.6 1.21 0.0025

WISP3 107.4 49.6 1.12 0.0013

Differential mean RNAseq log2FC values of normal and tumor tissues of all 112 female IBC TCGA patients (Table 5.2) suggests atRA metabolism could be more active in normal tissue than tumor tissue. Moreover, p.values indicate that all five genes associated with atRA metabolism and/or cell differentiation from our healthy postmenopausal cohort dataset were significantly upregulated in the normal tissue of the mammary gland (p.values < 0.01; Table 5.3) in comparison to its adjacent tumor tissue. Notably, CSN2 differential expression in all TCGA patients was highly significant

(p.value = 0.00040). Taken together, our TCGA analysis demonstrates that all five genes are significantly upregulated in normal vs. tumor tissue in the female TCGA-IBC dataset. Moreover, the upregulated expression patterns of all five genes involved in atRA metabolism and/or cell differentiation from the normal tissue of TCGA patients directly correlates with our postmenopausal parous data (Table 5.1), potentially providing further indication that parity may establish a molecular environment supporting the integrity of un-transformed breast tissue.

124

5.3.3. CSN2-expressing TCGA-IBC patients show a prevalence of RA metabolism

To determine whether CSN2 is significant in modulating genes involved in atRA metabolism and/or cell differentiation, TCGA-IBC patients expressing CSN2 in normal tissue, or normal and tumor tissue were selected for differential RNAseq expression analysis to compare with patients with no

CSN2 expression. Sixty three patients (out of the 112) were positive for CSN2 [CSN2-expressing patients or CSN2(+)] in normal (N) tissue only (53 patients), or both normal and tumor (N+T) tissue

(10 patients). Forty two patients were negative for CSN2 [CSN2(-)] expression in both normal and tumor tissues.

Table 5.3: Mean reads of matched normal vs. tumor tissues in CSN2-expressing patients in

N tissue only and N+T (n=63). Represented are genes involved in atRA metabolism and/or cell differentiation.

Gene ID Mean Normal Mean Tumor Log2FC p.value

CSN2 2986.5 3.3 9.84 0.0004

UGT2B7 37.3 15.5 1.27 0.0160

XDH 1622.7 449.4 1.85 0.0006

WISP3 133.9 47.6 1.49 0.0009

Differential log2FC gene expression statistics obtained from 63 CSN2(+) patients (Table 5.3) exemplifies both consistent and statistically significant (p.values<= 0.02) upregulation of all four previously identified genes of interest in healthy tissues of parous vs. nulliparous women. The positive expression patterns of the normal tissues of CSN2(+) patients is consistent with the normal tissue of the entire population of TCGA patients (112 patients, Table 5.2). Moreover, this pattern

125

from CSN2(+) patients were statistically consistent with our parous vs. nulliparous data (Table 5.1).

The log2FC of CSN2(+)TCGA patients (63 patients, Table 5.3) exhibit better fold change compared to the whole TCGA population (112 patients, Table 5.2).

Log2FC scores of selected genes of interest in CSN2(+) TCGA patients are not as high in the log2FC scores as exhibited by our original data (Parous vs. Nulliparous), but are significantly upregulated (p.values < 0.02) which illustrate the difference in tissue environment (healthy tissue vs. normal-adjacent to tumor). However, the fact that the genes analyzed are significant in both cohorts (parous vs. nulliparous and normal vs. tumor) further strengthens their relevance in a normal tissue background.

5.3.4. CSN2-expressing TCGA-IBC patients have moderately correlated gene expression of

RA related genes

A Spearman correlation test using normalized tumor and normal tissue read counts (Table 5.4) of the CSN2(+) patients detected positive and moderately correlated (Spearman statistic range 0.50 to 0.65) regression scores of CSN2, UGT2B7, XDH, and WISP3. The evidence provided here suggests that there is a correlation among the transcription of these genes in CSN2(+) patients with varied biological and clinical backgrounds who express CSN2 in either normal tissue only or normal and tumor tissue.

126

Table 5.4: Spearman correlation results of CSN2, UGT2B7, XDH, and WISP3 for CSN2- expressing patients in N tissue only and N+T tissue (n=63).

CSN2 UGT2B7 XDH WISP3

CSN2 1 0.50 0.65 0.64

UGT2B7 NA 1 0.60 0.50

XDH NA NA 1 0.61

WISP3 NA NA NA 1

When looking at Spearman correlation results among our selected genes, we observed moderately positive correlation regression, even in the 112 IBC-TCGA patient population (Table 5.5) which contains diverse patient backgrounds. Notably, values plummet, showing no correlation, if we analyze correlation of these selected genes in tumor tissue only (Table 5.6) in all TCGA-IBC patients (n=112). From the Spearman correlation results of our selected genes of interest, we determined that the regression scores shift from a moderately positive correlation (Table 5.4) to a lower correlation value (Table 5.5) when adding to the analyses those patients that do not express

CSN2 (total population of 112 instead of the 63 CSN2-expressing patients). Moreover, if we look at the very low correlation values of these genes in tumor tissue only (Table 5.6), we can further highlight the molecular connection between these five genes in the normal-adjacent tissue.

127

Table 5.5: Spearman correlation results of CSN2, UGT2B7, XDH, and WISP3 for all

TCGA-IBC patients (n=112).

CSN2 UGT2B7 XDH WISP3

CSN2 1 0.45 0.49 0.50

UGT2B7 NA 1 0.55 0.53

XDH NA NA 1 0.54

WISP3 NA NA NA 1

Table 5.6: Spearman correlation results of CSN2, UGT2B7, XDH, and WISP3 for all

TCGA-IBC patients (n=112) in Tumor tissue only.

CSN2 UGT2B7 XDH WISP3

CSN2 1 0.35 0.24 0.24

UGT2B7 NA 1 0.16 0.30

XDH NA NA 1 0.09

WISP3 NA NA NA 1

128

5.4. Data interpretation

In this chapter, RNAseq results from our transcriptomic study of healthy postmenopausal parous and nulliparous breast tissue coupled with a bioinformatic analysis of publicly-available transcriptome data obtained from TCGA-IBC patients highlights CSN2 as a novel biomarker of pro- differentiation and all-trans retinoic acid (atRA) metabolism. CSN2 is largely overexpressed in our postmenopausal parous breast tissue RNAseq data with a log2FC factor of 15.87 compared to postmenopausal nulliparous breast tissue. Not only is CSN2 regarded as a primary milk protein of lactation in mammals (261, 262, 286, 300), but also marks the degree of differentiation in mammary epithelial tissue, a characteristic which is often lost in invasive breast carcinoma (IBC) (253, 301-

305). The fact that CSN2 is activated in pregnancy but remains highly upregulated in the parous breast after involution highlights the relevance of CSN2 as a biomarker of parity and differentiated tissue. CSN2 inversely correlates with breast cancer progression as its expression has previously been shown to decrease in carcinogen-treated parous mammary gland tissue in rats (306). Our present work further supports the literature of the protective role of parity that is evidenced by specific gene expression programs in postmenopausal women. In addition to CSN2, two other transcripts identified as motifs in atRA biochemistry, UGT2B7 and XDH are also significantly upregulated. Furthermore, the up-regulation of anti-breast cancer modulation gene WISP3 (296-

300, 307) furthers the evidence of the effect of early parity on breast cancer prevention (160, 184,

259, 308, 309). These genes are part of a molecular pathway associated with breast differentiation, , and the inhibition of breast cancer progression (Figure 5.4).

UGT2B7 is the only member of its enzymatic family known to metabolize atRA and derivatives with high catalytic efficiency (292, 310-313). Notably, Gestl et al. previously demonstrated that UGT2B7 presence and activity in invasive breast cancer is significantly reduced in comparison to the normal breast (289). Mechanistically, UGT2B7 conjugates atRA through glucuronidation to form retinoyl- beta-glucuronide (RAG), a bio-active metabolite whose molecular concentration has been reported to be 10-fold less toxic than atRA (314, 315). In various studies, RAG has been illustrated to be an

129

important mediator of retinoic acid biochemistry. For example, RAG has previously been shown to induce differentiation in epithelial cells and in the acute myeloid leukemia cell line HL-60 by a putative mechanism commented on in Formelli et al. (315, 316). However, RAG’s function in breast tissue has not yet been described. Important to our speculation of atRA processing and preservation, RAG is capable of converting back to atRA via hydrolysis as exhibited in atRA deficient epithelial tissues and organs in rats (317). In addition to atRA, UGT2B7 also metabolizes cancer-inducing metabolite 4-hydroxyestrone, which aids to the anti-carcinogenic process because quinone metabolites of 4-hydroxylated catechol estrogens can potentially form mutagenic DNA adducts (318, 319).The molecule 4-hydroxyestrone originates from 17-beta-estradiol which is reported to induce breast cancer initiation based on experimental evidence (320-324). All these data support the concept that UGT2B7’s activity in the breast has multiple putative tumor suppressive qualities pertaining to atRA and estrogenic processing making it an important marker of breast differentiation and therefore in cancer prevention.

XDH also plays a functional role in atRA processing within the mammary gland. It is well known that XDH is produced in the mammary gland and is also active in breast milk (294, 325-327). When converted to XO, the gene is then capable of rapidly oxidizing retinol to atRA, the active form of vitamin A (293, 294, 328, 329). Although retinol or retinaldehyde is not exclusively metabolized by

XO (330), the loss of XDH expression has been shown to be directly responsible for atRA deficiency in malignant human mammary epithelial cells (294). Not only is XDH responsible for catalyzing retinol to atRA, but it is also a biomarker of mammary gland development and differentiation (295,

328, 329, 331). In an immunohistochemical analysis of normal human breast tissue, Cook et al. demonstrated the presence of XO/XDH in the cytoplasm of epithelium lining terminal ducts (332).

Additionally, the intensity of the staining was elevated in the alveolar epithelium of lactating mammary lobules. In contrast, no immunohistochemically detectable XO/XDH was observed in intraductal in situ carcinomas and in invasive carcinomas of the breast (332). XDH has also been shown to induce , and is directly implicated in the structural envelope, as well as production and secretion of milk fat lobules during lactation in the mammary gland in vivo (295). In

130

the context of breast cancer, XDH has been shown to inhibit proliferation and stimulate differentiation of human breast cancer cells (331). Taibi et al. showed that malignant breast cancer cell lines are unable to synthesize atRA due to deficient levels or activity of XDH and cellular retinol binding protein (CRBP) (333). The inability to metabolize RA leads to toxic RA buildup, highlighting the importance of having high levels of activity of XDH in breast tissue. Further illustrating XDH regulation, Taibi et al. previously reported that the presence of estradiol decreases XDH enzyme activity and protein expression in non-tumorigenic and malignant human mammary epithelial cells

(334). From our results, the upregulation of XDH in postmenopausal parous breast tissue is consistent with the fact that less estrogenic exposure is associated with the protective role of parity

(309, 335). Furthermore, aggressive breast cancer has previously been linked to loss of XDH expression in vivo and breast cancer cell invasiveness in vitro (331) as well as with lower disease survival in progesterone receptor negative breast cancer patients (283).

There is strong experimental evidence that both UGT2B7 and XDH control atRA processing and their roles in cell differentiation are systematically regulatory in the context of mammary gland biology (289, 331, 332, 336). In the present work we propose that the high upregulation of CSN2 may be an indicator of UGT2B7 and XDH atRA biochemical activity. CSN2, together with other milk proteins, marks the process of lactation and cell differentiation (260-265, 337). Since the XDH gene is needed for lactation and protection from abnormal involution (295), we speculate its upregulation is essential to maintain mammary gland development and differentiation. CSN2 is directly involved in atRA metabolism by its implications in atRA-induced mammary epithelial cell lactogenic differentiation and inhibition of cell proliferation (262, 338, 339). Results from in vitro spectroscopic and docking experiments suggest that CSN2 can act as a carrier for retinoids through hydrophobic contacts (286). Retinoic acid signaling pathways involved in mammary gland involution and tissue remodeling further emphasize the relevance of UGT2B7, XDH, and CSN2 gene expression in postmenopausal years (340).

131

We postulate the role of another transcript WISP3 (also known as CCN6), as a potent tumor suppressor in breast cancer and cell programming marker. WISP3 is upregulated in our postmenopausal normal parous breast tissue data and also in our TCGA-IBC data analysis. To this effect, WISP3 has been reported to either be reduced or lost in 60% IBC cases, and 79% of inflammatory breast cancers (282, 341). Using CSN2 as a biomarker for lactation, Jiang et al. demonstrated that the matricellular protein, WISP3, can induce lactation in dairy cow mammary epithelial cells via mTOR/S6k1 signaling in the presence of amino acids, particularly lysine and methionine (300). The two main signaling pathways of milk protein expression in the mammary glands of mammals are JAK2/STAT5 and mTOR/S6k1 (342, 343). However, signal trans-activator transcription 5 (STAT5), a classical regulator of lactation, had no effect on WISP3 overexpression or knockdown (300). Of note, the hormonal neurophysiology of lactation associated with pregnancy is inactive in concert with involution in postpartum years (344). In addition to regulating lactation,

WISP3 is also a regulatory protein that exerts tumor suppressive functions in breast cancer by diverse molecular pathways. Recently, WISP3 overexpression was shown to suppress the transcription and protein expression of Notch1/Slug, which are well established pro-EMT regulators. Additionally, Huang et al. demonstrated that WISP3 inhibits cancer cell migration, induces mesenchymal-to-epithelial cell transition (MET), and reduces the number of breast tumor initiating cells (296). Reports by Pal et al. have shown that WISP3 antagonizes pro-EMT invasive potential from the p38 MAPK pathway which ultimately protects the developmental integrity of acinar morphogenesis in human mammary epithelial cells (297, 307). Pal et al. also demonstrated that WISP3 overexpression decreased metastasis distance in vivo (297). WISP3 is also associated with regulating reactive oxygen species (ROS) accumulation by contributing towards superoxide dismutase (SOD) expression and regulating mitochondrial function (345-347). Naturally, ROS may be generated as a consequence of XO mediated oxidization of its substrates such as retinol (294).

A functional connection between differentiation markers WISP3 and XDH is represented in Figure

5.4.

132

Retinoids have been observed to induce apoptosis and growth inhibition in numerous breast cancer cell lines through a growing number of identified pathways including microRNA networks, but mechanistically have not been fully elucidated (270, 272, 279, 305, 348-353). atRA has also been demonstrated to downregulate oncogenes such as Pin1, HER2, and ER as well as upregulate tumor suppressor genes in breast cancer cell lines (354). Recently, Berardi et al. identified myoepithelial and luminal breast cancer cells exhibit different responses to atRA treatment (280).

It has been shown that a subpopulation of myoepithelial cells respond to atRA by increasing adhesion to extracellular matrix components and by reducing invasive capacity (280). atRA induces apoptosis in luminal breast cancer cells, whereas the myoepithelial compartment responds with cellular senescence (280). atRA has additionally been shown to reduce the migration of breast cancer cell lines MCF-7 and T-47D via RAR-β (352). In the context of breast cancer prevention and progression, atRA treatment is capable of re-differentiating transformed cells at early neoplastic stages (281). Importantly, metabolism of retinol to retinoic acid was previously reported to be impaired in breast cancer cells in comparison to normal human mammary epithelial cells (333). In congruency with our TCGA bioinformatics analysis, retinoid signaling is often compromised in tumors in comparison to the surrounding normal epithelial tissue (355). Experimental data has shown that atRA fail to be synthesized in solid mammary tumors (356). Consequently, retinol accumulates and drives the self-renewal of pluripotent embryonic stem cells as well as maintains cancer-like stem cells in long-term culture (356, 357). The epigenetic mechanism which frequently donates to the aberrant retinoid signaling in carcinoma cells is the hyper methylation of the RAR-β gene (358), which is responsible for many of atRA’s anti-tumorigenic effects (349, 359). Particularly, the RAR-β-2 promoter is unresponsive in cancer cells and the RAR-β-2 promoter is active in normal cells (358, 360, 361). Accumulatively, these data suggest that the molecular control of epithelial homeostasis in the mammary gland is widely modulated by atRA pathways.

We have previously shown that parity induces a genomic signature which exemplifies a protective effect against the onset of cancer through autonomous processes in cellular differentation (160,

285, 362). Moreover, we have reported a decreased capacity for β-catenin accumulation in parous

133

breast tissue potentially due to differential methylation patterns and expression of genes involved in its regulation and degradation of the Wnt signaling pathway (362). Meier-Abt et al. elucidated how parity decreases Wnt/Notch signaling ratio, reduces nuclear β-catenin levels in basal mammary epithelial cells, and increases differentiation in mice basal mammary stem and progenitor cells in vitro and in vivo (308). Synergistic with our data, WISP3 as well as atRA have been reported to decrease notch signaling (279, 296, 362) . atRA has previously been shown to be involved in β-catenin biochemistry in SKBR3 breast cancer cells by increasing cell-cell adhesion strength and beta-catenin protein stability (363). Furthermore, atRA has been suggested to inhibit tumorigenesis by RAR’s direct interaction in preventing β-catenin-LEF/TCF transactivation (364).

LEF/TCF are potent downstream transcription factors of β-catenin involved in breast oncogenesis

(365). Our data (in combination with other reports), suggests that the decreased capacity for β- catenin accumulation caused by breast involution, WISP3 expression, and atRA metabolism, is what may partially contribute to the protective effect of pregnancy against breast cancer (259, 299,

364, 366, 367).

This chapter presents a putative network of systematic biological associations of upregulated genes in postmenopausal parous mammary tissue which may partially describe one of the multidimensional layers in the defensive nature of parous mammary tissue against the onset of breast cancer (Figure 5.4). The upregulation of CSN2, UGT2B7, XDH, and WISP3 in the parous breast and their association with atRA metabolism is significant to postmenopausal parous breast tissue homeostasis in contrast to postmenopausal nulliparous tissue. This finding is further supported by our TCGA-IBC analysis of normal-adjacent mammary tissues obtained from 112

TCGA database patients. We conclude that the anti-proliferative nature of the atRA metabolism pathway is suggestive, at least in part, of a natural mechanism that explains the refractoriness of the parous breast towards developing breast cancer and that CSN2 has been an overlooked biomarker of breast cancer prevention.

134

Figure 5.4: Schematic of CSN2, UGT2B7, XDH, and WISP3 molecular interconnections in the normal mammary gland. atRA = all-trans Retinoic Acid; RAR = Retinoic Acid Receptor; RAG = retinoyl-beta-glucuronide.

This novel molecular network of significantly upregulated genes associated with atRA processing and cell differentiation could account for one of the pathways involved in the protective advantage of postmenopausal parous breast tissue against breast cancer. The upregulation of CSN2,

UGT2B7, WISP3, and XDH observed in our postmenopausal parous vs. nulliparous RNAseq data is further confirmed when analyzing normal-adjacent vs. tumor tissue of TCGA-IBC patients.

Collectively, our data is suggestive, on the transcriptomic and protein level, that the regulatory nature of the parous breast exemplifies systematic defense pathways which may evade the onset of malignant transformation years (even decades) after pregnancy in postmenopausal years.

135

CHAPTER 6: CONCLUDING REMARKS

Long non-coding RNAs (lncRNAs) are a diverse class of molecules that function through interaction with DNA, RNA, and proteins to impact every aspect of gene regulation. With increasing advances in bioinformatics, lncRNAs now vastly outnumber annotated coding genes. Significant progress has been made in gathering and analyzing data obtained through next-generation RNA sequencing by using improved bioinformatics tools. Several databases are now available to assist in dissemination of information regarding a single or group of lncRNAs in various etiological breast cancers and several other cancer types. These databases provide information on key aspects of lncRNAs: e.g., predictive structure using the lncRNA sequence, knowledge about cellular pathways the lncRNA potentially regulates, the landscape of the lncRNA genomic locus, possible interactions with chromatin and transcription factors, etc.

A major challenge moving forward will be to distinguish functional lncRNAs from those that may simply represent byproducts of transcription. Already, many lncRNAs have been found to be critical regulators during normal processes such as development and response to exogenous stimuli. In disease, several lncRNAs have been pinpointed as drivers or promoters of specific stages or outcomes. Further insight into the structure and function of lncRNAs will be critical to take full advantage of their therapeutic potential. Last month Xue et al. published the first study to relate lncRNA structure to function (368). Inspired by historical work showing that structure plays a role in the function of other classes of RNA such as transfer RNA, Xue and coworkers deciphered the secondary structure of Braveheart lncRNA and used this information to determine how it interacts with a zinc-finger transcription factor to control the development of mouse heart muscle cells. They found that removing 11 nucleotides (2% of the entire molecule) in one of the loops of Braveheart lncRNA, they could bring normal heart development to a halt.

The latter is a good example of how determining the functional relevance of a certain lncRNA in a system is just the tip of the iceberg. In our study, we have discovered, identified, and evaluated the relevance of several lncRNAs expressed in our cohort of samples. From this study we further

136

evaluated a potential tumor suppressor (lncBHLEHE41) and an oncogene (lncEPCAM) in breast cancer. Thus, these two lncRNAs are not only relevant for breast cancer research but could also be used as predictors for breast cancer prevention. The fact that these are “non-druggable” molecules, adds a layer of complexity to using them as therapeutic targets. Antisense oligonucleotides (and other small molecules) are showing promising results, but their use in the clinic is still in its infancy. As a consequence, the need for linking function to structure is key, especially when wanting to connect the findings in breast development due to pregnancy with breast cancer progression. A deeper look into structure of lncBHLHE41 and lncEPCAM will open doors on how these RNA molecules interact with and, as a consequence, regulate protein coding genes that are also evaluated here as genetic regulators of breast cancer progression such as

CSN2 and WISP3. Our analysis was done separating non coding RNAs to protein coding genes, but a deeper understanding of the functional domains and structural elements of lncRNA molecules will undoubtedly shed light on how non coding biomarkers come together at specific locations of the genome to regulate the protein coding genes discussed in this study.

137

CHAPTER 7: FUTURE PERSPECTIVES

Scientific understanding of the involvement of lncRNAs in normal and cancer biology has recently begun to be unfold. The first step in understanding the mechanisms and functions of the multiplicity of lncRNAs discovered in the last few years in humans is a wide-scale and systematic identification and categorization of lncRNAs. However, time has proven that this is a monumental effort and has taken a significant amount of time, effort, and resources to pinpoint the relevance of a handful of lncRNAs in disease, cancer in particular. The wide range of lncRNA characteristics (length, location in genome, low abundance) plus the functional role (transcriptional vs. translational regulation as they can interact with DNA, RNA and/or protein) in the cell, together with locus of action (nuclear vs. cytoplasmic) has made the study of these intriguing molecules a slow journey over the last decade. However, as more lncRNAs are discovered due to improved and precise technology, the relevance in driving disease becomes increasingly evident. The soaring amount of software engines targeted to functionally annotate these cancer-associated transcripts, make these molecules reliable and attractive targets for therapeutic intervention in the fight against cancer and other diseases. In our study, we prioritized the lncRNAs discovered in our small set of breast biopsies to determine top candidates, which could be contributing to increased risk of developing breast cancer in postmenopausal years and contributing to deregulating cancer. Thus we selected a cancer setting to determine how these lncRNAs may contribute to breast cancer pathogenesis.

Once the mechanisms are understood, a more efficacious treatment of this disease can be achieved.

Short-term future directions would be to determine potential interactions of these two lncRNAs with nearby and in trans chromatin elements using methods such as ChIRP which is a novel technique to study lncRNA-chromatin interactions (369). Biotinylated antisense oligonucleotides are designed to target the lncRNA of interest. After RNA and chromatin are cross-linked in vivo, co-precipitated

DNA and proteins can be analyzed by DNA sequencing and mass spectrometry, respectively (370).

Since EPCAM is a proposed target for breast cancer therapy (231), research was done on the

138

potential role of lncEPCAM on the regulation of EPCAM. The sequence for both the lncRNA and the gene were entered into the BLAST software and compared for sequences of high specificity, returning numerous results. Among these results were complementary sequence matches of

>200nt, providing potential binding sites for lncEPCAM on the gene itself. Based on these possible binding sites, it was theorized that the long noncoding region may bind to the gene and recruit transcription factors to the promoter region. After using BLAST to compare DNA consensus binding sequences for several transcription factors known to bind to the EPCAM gene, multiple binding locations were found to exist on the lncRNA as well. These potential binding sites existed for transcription factors such as SOX9, ATF6, LYF-1, and several more tested. Although our preliminary RT-qPCR results on neighboring genes did not highlight EPCAM as a protein coding gene affected by lncEPCAM’s overexpression, ChIRP could indicate otherwise and expand on our

Cis-regulation results.

With the advent of CRISPR/Cas9 technology it is tempting to substitute the widely used loss-of- function methods such as siRNA and shRNA to knock down the expression of the lncRNA of interest by a method that causes editing of the DNA. As many of the lncRNAs discovered are nuclear, CRISPR/Cas9 has recently become a more appealing approach in this field.

Complementing our overexpression results with opposite results obtained by knocking out the lncRNAs of interest with CRISPR/Cas9 ought to be a short-term future direction. However, two- thirds of discovered lncRNAs loci are at risk of inadvertently deregulate neighboring genes if the lncRNA is edited as the lncRNA may be derived from bidirectional promoters or overlap with sense or antisense genes (371). Thus, as explained below, antisense oligonucleotides become an interesting alternative to specifically target the transcript from the lncRNA’s loci.

Identifying lncRNAs biomarkers, such as lncEPCAM/BC200, will additionally benefit cancer identification and treatment because these biomarkers could indicate whether the tumor is invasive and/or metastatic (372). Developing a method of identification for suitable lncRNA biomarkers for breast cancers would be a tremendous step in cancer prognosis, especially if these biomarkers could be used as early indicators of cancer. BC200 is already under consideration for a biomarker

139

of breast cancer development (247). The results presented in this project aid in the understanding of the effect of this lncRNA in luminal and TNBC pathogenesis.

Next step is to determine if specific treatments to either inhibit lncRNAs acting as oncogenes and/or proto-oncogenes, such as XIST, BC200, and HOTAIR, or to increase expression of tumor suppressor lncRNAs, such as GAS5, MALAT-1, and ZFAS1 (and potentially lncBHLHE41 if the results in this project are expanded) are efficacious treatment methods (372, 373). In particular, antisense oligonucleotides (ASOs) are capable of targeting specific genes or transcripts directly through Watson-Crick base pairing and thus can be designed based on sequence information alone

(374). For a transcript such as lncEPCAM/BC200 which is upregulated in luminal and TNBC a single ASO agent could specifically target this "non-druggable" RNA molecule. ASOs are an emerging class of agents that are beginning to be applied in preclinical studies to reduce levels of lncRNAs of interest. In the foreseeable future these agents could be used in the clinic to tackle disease progression.

Breast cancer is incredibly heterogeneous. Even within sub-classifications of breast cancer –

ER+/PR+, HER2+, and Triple Negative – breast cancer varies greatly within the same breast cancer subtype, and from patient to patient. The advantage of using lncRNAs as a biomarker for breast cancer prevention or breast cancer progression is the transcript sequence of these molecules can be targeted to quantify or use as therapeutic molecules. Understanding how their dysregulation can lead to breast cancer development is essential for tailoring treatment to a patient’s specific type of breast cancer.

In summary, given the plethora of discovered functions, we believe lncRNAs are a class of RNA that is largely unexplored as drivers of disease as well as potential therapeutic targets. The continuous active investigation in the field of lncRNAs, although slow due to its complexity, has proven satisfying in the cancer field. Breast cancer in particular lacks any lncRNA predicting therapeutic intervention (248) but results from this project and subsequent expansion of these

140

results, could lead to two new key players in breast cancer prevention (lncBHLHE41) and breast cancer development (lncEPCAM/BC200).

141

REFERENCES CITED

1. DeSantis CE, Siegel RL, Sauer AG, Miller KD, Fedewa SA, Alcaraz KI, et al.

Cancer statistics for African Americans, 2016: Progress and opportunities in reducing racial disparities. CA: a cancer journal for clinicians. 2016;66(4):290-308.

2. Siegel RL, Miller KD, Jemal A. Cancer statistics, 2016. CA: a cancer journal for clinicians. 2016;66(1):7-30.

3. Torre LA, Sauer AM, Chen MS, Jr., Kagawa-Singer M, Jemal A, Siegel RL.

Cancer statistics for Asian Americans, Native Hawaiians, and Pacific Islanders,

2016: Converging incidence in males and females. CA: a cancer journal for clinicians. 2016;66(3):182-202.

4. DeSantis CE, Fedewa SA, Goding Sauer A, Kramer JL, Smith RA, Jemal A.

Breast cancer statistics, 2015: Convergence of incidence rates between black and white women. CA: a cancer journal for clinicians. 2016;66(1):31 -42.

5. Danforth DN, Jr. Disparities in breast cancer outcomes between Caucasian and African American women: a model for describing the relationship of biological and nonbiological factors. Breast cancer research : BCR. 2013;15(3):208.

6. Silber JH, Rosenbaum PR, Clark AS, Giantonio BJ, Ross RN, Teng Y, et al.

Characteristics associated with differences in survival among black and white women with breast cancer. Jama. 2013;310(4):389-97.

7. Chlebowski RT, Chen Z, Anderson GL, Rohan T, Aragaki A, Lane D, et al.

Ethnicity and breast cancer: factors influencing differences in incidence and outcome. Journal of the National Cancer Institute. 2005; 97(6):439-48.

8. Martin JA, Hamilton BE, Osterman MJ. Births in the United States, 2015.

NCHS data brief. 2016(258):1-8.

9. Hamilton BE, Martin JA, Osterman MJ, Curtin SC, Matthews TJ. Births: Final

Data for 2014. National vital statistics reports : from t he Centers for Disease

142

Control and Prevention, National Center for Health Statistics, National Vital

Statistics System. 2015;64(12):1-64.

10. Fedewa SA, Sauer AG, Siegel RL, Jemal A. Prevalence of major risk factors and use of screening tests for cancer in the United States. Cancer epidemiology, biomarkers & prevention : a publication of the American Association for Cancer

Research, cosponsored by the American Society of Preventive Oncology.

2015;24(4):637-52.

11. Loof-Johanson M, Brudin L, Sundquist M, Rudebeck CE. Breastfeeding

Associated with Reduced Mortality in Women with Breast Cancer. Breastfeeding medicine : the official journal of the Academy of Breastfeeding Medicine. 2016.

12. Tamimi RM, Colditz GA, Hazra A, Baer HJ, Hankinson SE, Rosner B, et al.

Traditional breast cancer risk factors in relation to molecular subtypes of breast cancer. Breast cancer research and treatment. 2012;131(1):159 -67.

13. Cancer Genome Atlas N. Comprehensive molecular portraits of human breast tumours. Nature. 2012;490(7418):61-70.

14. Perou CM, Sorlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA, et al.

Molecular portraits of human breast tumours. Nature. 2000;406(6797):747 -52.

15. Anderson WF, Rosenberg PS, Katki HA. Tracking and evaluating molecular tumor markers with cancer registry data: HER2 and breast cancer. Journal of the

National Cancer Institute. 2014;106(5).

16. Blows FM, Driver KE, Schmidt MK, Broeks A, van Leeuwen FE, Wesseling J, et al. Subtyping of breast cancer by immunohistochemistry to investigate a relationship between subtype and short and long term survival: a collaborative analysis of data for 10,159 cases from 12 studies. PLoS medicine.

2010;7(5):e1000279.

17. MacMahon B. Epidemiology and the causes of breast cancer. International journal of cancer Journal international du cancer. 2006;118(10):2373-8.

143

18. Toniolo P, Grankvist K, Wulff M, Chen T, Johansson R, Schock H, et al.

Human chorionic gonadotropin in pregnancy and maternal risk of breast cancer.

Cancer research. 2010;70(17):6779-86.

19. MacMahon B, Cole P, Lin TM, Lowe CR, Mirra AP, Ravnihar B, et al. Age at first birth and breast cancer risk. Bull World Health Organ. 1970;43(2):209 -21.

20. Russo J, Balogh GA, Russo IH. Full-term pregnancy induces a specific genomic signature in the human breast. Cancer epidemiology, biomarkers & prevention : a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology. 2008;17(1):51 -66.

21. Hinkula M, Pukkala E, Kyyronen P, Kauppila A. Grand multiparit y and the risk of breast cancer: population-based study in Finland. Cancer causes & control :

CCC. 2001;12(6):491-500.

22. Nair RR, Verma P, Singh K. Immune-endocrine crosstalk during pregnancy.

General and comparative endocrinology. 2016.

23. Kobayashi S, Sugiura H, Ando Y, Shiraki N, Yanagi T, Yamashita H, et al.

Reproductive history and breast cancer risk. Breast cancer. 2012;19(4):302 -8.

24. Albrektsen G, Heuch I, Thoresen S, Kvale G. Clinical stage of breast cancer by parity, age at birth, and time since birth: a progressive effect of pregnancy hormones? Cancer epidemiology, biomarkers & prevention : a publication of the

American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology. 2006;15(1):65-9.

25. Lukanova A, Surcel HM, Lundin E, Kaasila M, Lakso HA, Schock H, et al.

Circulating estrogens and progesterone during primiparous pregnancies and risk of maternal breast cancer. International journal of cancer Journal international du cancer. 2012;130(4):910-20.

26. Christensen A, Bentley GE, Cabrera R, Ortega HH, Perfito N, Wu TJ, et al.

Hormonal regulation of female reproduction. Hormone and metabolic research =

144

Hormon- und Stoffwechselforschung = Hormones et metabolisme. 2012;44(8):587 -

91.

27. Chen T, Lundin E, Grankvist K, Zeleniuch-Jacquotte A, Wulff M, Afanasyeva

Y, et al. Maternal hormones during early pregnancy: a cross-sectional study.

Cancer causes & control : CCC. 2010;21(5):719-27.

28. Russo J, Ao X, Grill C, Russo IH. Pattern of distribution of cells positi ve for and progesterone receptor in relation to proliferating cells in the mammary gland. Breast cancer research and treatment. 1999;53(3):217 -27.

29. Russo IH, Koszalka M, Russo J. Effect of human chorionic gonadotropin on mammary gland differentiation and carcinogenesis. Carcinogenesis.

1990;11(10):1849-55.

30. Katz TA, Liao SG, Palmieri VJ, Dearth RK, Pathiraja TN, Huo Z, et al.

Targeted DNA Methylation Screen in the Mouse Mammary Genome Reveals a

Parity-Induced Hypermethylation of Igf1r That Persists Long after Parturition.

Cancer prevention research. 2015;8(10):1000-9.

31. Ginger MR, Rosen JM. Pregnancy-induced changes in cell-fate in the mammary gland. Breast cancer research : BCR. 2003;5(4):192 -7.

32. Russo J, Balogh G, Mailo D, Russo PA, Heulings R, Russo IH. The genomic signature of breast cancer prevention. Recent Results Cancer Res. 2007;174:131 -

50.

33. ElShamy WM. The protective effect of longer duration of breastfeeding against pregnancy-associated triple negative breast cancer. Oncotarget. 2016.

34. Giudici F, Scaggiante B, Scomersi S, Bortul M, Tonutti M, Zanconati F.

Breastfeeding: a reproductive factor able to reduce the risk of luminal B breast cancer in premenopausal White women. European journal of cancer prevention : the official journal of the European Cancer Prevention Organisation. 2016.

145

35. Barton M, Santucci-Pereira J, Russo J. Molecular pathways involved in pregnancy-induced prevention against breast cancer. Front Endocrinol (Lausanne).

2014;5:213.

36. Santucci-Pereira J, Barton M, Russo J. Use of Next Generation Sequencing in the Identification of Long Non-Coding RNAs as Potential Players in Breast

Cancer Prevention. Transcriptomics: Open Access. 2014;2(104).

37. Parasramka MA, Maji S, Matsuda A, Yan IK, Patel T. Long non-coding RNAs as novel targets for therapy in hepatocellular carcinoma. Pharmacology & therapeutics. 2016;161:67-78.

38. Prensner JR, Chinnaiyan AM. The emergence of lncRNAs in cancer biology.

Cancer discovery. 2011;1(5):391-407.

39. Margueron R, Reinberg D. The Polycomb complex PRC2 and its mark in life.

Nature. 2011;469(7330):343-9.

40. Khalil AM, Guttman M, Huarte M, Garber M, Raj A, Rivea Morales D, et al.

Many human large intergenic noncoding RNAs associate with chromatin -modifying complexes and affect gene expression. Proceedings of the National Academy of

Sciences of the United States of America. 2009;106(28):11667 -72.

41. Tsai MC, Manor O, Wan Y, Mosammaparast N, Wang JK, Lan F, et al. Long noncoding RNA as modular scaffold of histone modif ication complexes. Science.

2010;329(5992):689-93.

42. Nagano T, Mitchell JA, Sanz LA, Pauler FM, Ferguson-Smith AC, Feil R, et al. The Air noncoding RNA epigenetically silences transcription by targeting G9a to chromatin. Science. 2008;322(5908):1717-20.

43. Pandey RR, Mondal T, Mohammad F, Enroth S, Redrup L, Komorowski J, et al. Kcnq1ot1 antisense noncoding RNA mediates lineage-specific transcriptional silencing through chromatin-level regulation. Molecular cell. 2008;32(2):232-46.

146

44. Wang KC, Yang YW, Liu B, Sanyal A, Corces-Zimmerman R, Chen Y, et al.

A long noncoding RNA maintains active chromatin to coordinate homeotic gene expression. Nature. 2011;472(7341):120-4.

45. Wang D, Garcia-Bassets I, Benner C, Li W, Su X, Zhou Y, et al.

Reprogramming transcription by distinct classes of enhancers functionally defined by eRNA. Nature. 2011;474(7351):390-4.

46. Zong X, Tripathi V, Prasanth KV. RNA splicing control: yet another gene regulatory role for long nuclear noncoding RNAs. RNA biology. 2011;8(6):968 -77.

47. Bond CS, Fox AH. Paraspeckles: nuclear bodies built on long noncoding

RNA. The Journal of cell biology. 2009;186(5):637-44.

48. Bernard D, Prasanth KV, Tripathi V, Colasse S, Nakamura T, Xuan Z, et al.

A long nuclear-retained non-coding RNA regulates synaptogenesis by modulating gene expression. The EMBO journal. 2010;29(18):3082-93.

49. Tripathi V, Ellis JD, Shen Z, Song DY, Pan Q, Watt AT, et al. The nuclear - retained noncoding RNA MALAT1 regulates alternative splicing by modulating SR splicing factor phosphorylation. Molecular cell. 2010;39(6):925-38.

50. Poliseno L, Salmena L, Zhang J, Carver B, Haveman WJ, Pandolfi PP. A coding-independent function of gene and pseudogene mRNAs regulates tumour biology. Nature. 2010;465(7301):1033-8.

51. de Giorgio A, Krell J, Harding V, Stebbing J, Castellano L. Emerging roles of competing endogenous RNAs in cancer: insights from the regulation of PTEN.

Molecular and cellular biology. 2013;33(20):3976-82.

52. Johnsson P, Ackley A, Vidarsdottir L, Lui WO, Corcoran M, Grander D, et al.

A pseudogene long-noncoding-RNA network regulates PTEN transcription and translation in human cells. Nature structural & molecular biology. 2013;20(4):440 -6.

53. Iyer MK, Niknafs YS, Malik R, Singhal U, Sahu A, Hosono Y, et al. The landscape of long noncoding RNAs in the human transcriptome. Nature genetics.

2015;47(3):199-208.

147

54. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, et al.

Initial sequencing and analysis of the human genome. Nature. 2001;409(6822):860 -

921.

55. Evans JR, Feng FY, Chinnaiyan AM. The bright side of dark matter: lncRNAs in cancer. The Journal of clinical investigation. 2016;126(8):2775 -82.

56. Rinn JL. lncRNAs: linking RNA to chromatin. Cold Spring Harbor perspectives in biology. 2014;6(8).

57. Koerner MV, Pauler FM, Huang R, Barlow DP. The function of non -coding

RNAs in genomic imprinting. Development. 2009;136(11):1771-83.

58. Bartolomei MS, Zemel S, Tilghman SM. Parental imprinting of the mouse

H19 gene. Nature. 1991;351(6322):153-5.

59. Lee MP, DeBaun MR, Mitsuya K, Galonek HL, Brandenburg S, Oshimura M, et al. Loss of imprinting of a paternally expressed transcript, with antisense orientation to KVLQT1, occurs frequently in Beckwith-Wiedemann syndrome and is independent of insulin-like growth factor II imprinting. Proceedings of the National

Academy of Sciences of the United States of America. 1999;96(9):5203 -8.

60. Weksberg R, Nishikawa J, Caluseriu O, Fei YL, Shuman C, Wei C, et al.

Tumor development in the Beckwith-Wiedemann syndrome is associated with a variety of constitutional molecular 11p15 alterations including imprinting defects of

KCNQ1OT1. Human molecular genetics. 2001;10(26):2989-3000.

61. Matouk IJ, DeGroot N, Mezan S, Ayesh S, Abu-lail R, Hochberg A, et al. The

H19 non-coding RNA is essential for human tumor growth. PloS one.

2007;2(9):e845.

62. Lottin S, Adriaenssens E, Dupressoir T, Berteaux N, Montpellier C, Coll J, et al. Overexpression of an ectopic H19 gene enhances the tumorigenic properties of breast cancer cells. Carcinogenesis. 2002;23(11):1885-95.

148

63. Gabory A, Jammes H, Dandolo L. The H19 locus: role of an imprinted non - coding RNA in growth and development. BioEssays : news and reviews in molecular, cellular and developmental biology. 2010;32(6):473 -80.

64. Barsyte-Lovejoy D, Lau SK, Boutros PC, Khosravi F, Jurisica I, Andrulis IL, et al. The c- oncogene directly induces the H19 noncoding RNA by allele - specific binding to potentiate tumorigenesis. Cancer research. 2006;66(10):5330 -7.

65. Pantoja C, de Los Rios L, Matheu A, Antequera F, Serrano M. Inactivation of imprinted genes induced by cellular stress and tumorigenesis. Cancer research.

2005;65(1):26-33.

66. Bliek J, Maas SM, Ruijter JM, Hennekam RC, Alders M, Westerveld A, et al.

Increased tumour risk for BWS patients correlates with aberrant H19 and not

KCNQ1OT1 methylation: occurrence of KCNQ1OT1 hypomethylation in familial cases of BWS. Human molecular genetics. 2001;10(5):467-76.

67. Chow J, Heard E. X inactivation and the complexities of silencing a sex chromosome. Current opinion in cell biology. 2009;21(3):359-66.

68. Lee JT. Lessons from X-chromosome inactivation: long ncRNA as guides and tethers to the epigenome. Genes & development. 2009;23(16):1831 -42.

69. Richardson AL, Wang ZC, De Nicolo A, Lu X, Brown M, M iron A, et al. X chromosomal abnormalities in basal-like human breast cancer. Cancer cell.

2006;9(2):121-32.

70. Ganesan S, Silver DP, Greenberg RA, Avni D, Drapkin R, Miron A, et al.

BRCA1 supports XIST RNA concentration on the inactive X chromosome. Cell .

2002;111(3):393-405.

71. Kawakami T, Okamoto K, Ogawa O, Okada Y. XIST unmethylated DNA fragments in male-derived plasma as a tumour marker for testicular cancer. Lancet

(London, England). 2004;363(9402):40-2.

149

72. Weakley SM, Wang H, Yao Q, Chen C. Expression and function of a large non-coding RNA gene XIST in human cancer. World journal of surgery.

2011;35(8):1751-6.

73. Briggs SF, Reijo Pera RA. X chromosome inactivation: recent advances and a look forward. Current opinion in genetics & development. 201 4;28:78-82.

74. Chaligne R, Heard E. X-chromosome inactivation in development and cancer. FEBS letters. 2014;588(15):2514-22.

75. Yildirim E, Kirby JE, Brown DE, Mercier FE, Sadreyev RI, Scadden DT, et al.

Xist RNA is a potent suppressor of hematologic cancer in mice. Cell.

2013;152(4):727-42.

76. Yap KL, Li S, Munoz-Cabello AM, Raguz S, Zeng L, Mujtaba S, et al.

Molecular interplay of the noncoding RNA ANRIL and methylated histone H3 lysine

27 by polycomb CBX7 in transcriptional silencing of INK4a. Molecul ar cell.

2010;38(5):662-74.

77. Kotake Y, Nakagawa T, Kitagawa K, Suzuki S, Liu N, Kitagawa M, et al.

Long non-coding RNA ANRIL is required for the PRC2 recruitment to and silencing of p15(INK4B) tumor suppressor gene. Oncogene. 2011;30(16):1956 -62.

78. Burd CE, Jeck WR, Liu Y, Sanoff HK, Wang Z, Sharpless NE. Expression of linear and novel circular forms of an INK4/ARF-associated non-coding RNA correlates with atherosclerosis risk. PLoS genetics. 2010;6(12):e1001233.

79. Pasmant E, Sabbagh A, Vidaud M, Bieche I. ANRIL, a long, noncoding RNA, is an unexpected major hotspot in GWAS. FASEB journal : official publication of the

Federation of American Societies for Experimental Biology. 2011;25(2):444 -8.

80. Rinn JL, Kertesz M, Wang JK, Squazzo SL, Xu X, Brugman n SA, et al.

Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell. 2007;129(7):1311-23.

81. Hung T, Chang HY. Long noncoding RNA in genome regulation: prospects and mechanisms. RNA biology. 2010;7(5):582-5.

150

82. Yang Z, Zhou L, Wu LM, Lai MC, Xie HY, Zhang F, et al. Overexpression of long non-coding RNA HOTAIR predicts tumor recurrence in hepatocellular carcinoma patients following liver transplantation. Annals of surgical oncology.

2011;18(5):1243-50.

83. Gibb EA, Brown CJ, Lam WL. The functional role of long non -coding RNA in human carcinomas. Molecular cancer. 2011;10:38.

84. Gupta RA, Shah N, Wang KC, Kim J, Horlings HM, Wong DJ, et al. Long non-coding RNA HOTAIR reprograms chromatin state to promote canc er metastasis. Nature. 2010;464(7291):1071-6.

85. Sorensen KP, Thomassen M, Tan Q, Bak M, Cold S, Burton M, et al. Long non-coding RNA HOTAIR is an independent prognostic marker of metastasis in estrogen receptor-positive primary breast cancer. Breast cancer research and treatment. 2013;142(3):529-36.

86. Zhang L, Song X, Wang X, Xie Y, Wang Z, Xu Y, et al. Circulating DNA of

HOTAIR in serum is a novel biomarker for breast cancer. Breast cancer research and treatment. 2015;152(1):199-208.

87. Prensner JR, Iyer MK, Balbin OA, Dhanasekaran SM, Cao Q, Brenner JC, et al. Transcriptome sequencing across a prostate cancer cohort identifies PCAT -1, an unannotated lincRNA implicated in disease progression. Nature biotechnology.

2011;29(8):742-9.

88. Kino T, Hurt DE, Ichijo T, Nader N, Chrousos GP. Noncoding RNA gas5 is a growth arrest- and starvation-associated repressor of the glucocorticoid receptor.

Science signaling. 2010;3(107):ra8.

89. Mourtada-Maarabouni M, Pickard MR, Hedge VL, Farzaneh F, Williams GT.

GAS5, a non-protein-coding RNA, controls apoptosis and is downregulated in breast cancer. Oncogene. 2009;28(2):195-208.

90. Ginger MR, Shore AN, Contreras A, Rijnkels M, Miller J, Gonzalez -Rimbau

MF, et al. A noncoding RNA is a potential marker of cell fate duri ng mammary gland

151

development. Proceedings of the National Academy of Sciences of the United

States of America. 2006;103(15):5781-6.

91. Wagner KU, Boulanger CA, Henry MD, Sgagias M, Hennighausen L, Smith

GH. An adjunct mammary epithelial cell population in parous females: its role in functional adaptation and tissue renewal. Development. 2002;129(6):1377 -86.

92. Shore AN, Kabotyanski EB, Roarty K, Smith MA, Zhang Y, Creighton CJ, et al. Pregnancy-induced noncoding RNA (PINC) associates with polycomb repressive complex 2 and regulates mammary epithelial differentiation. PLoS genetics.

2012;8(7):e1002840.

93. Hanahan D, Weinberg RA. The hallmarks of cancer. Cell. 2000;100(1):57 -

70.

94. Gutschner T, Diederichs S. The hallmarks of cancer: a long non -coding RNA point of view. RNA biology. 2012;9(6):703-19.

95. Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell.

2011;144(5):646-74.

96. Lanz RB, McKenna NJ, Onate SA, Albrecht U, Wong J, Tsai SY, et al. A steroid receptor coactivator, SRA, functions as an RNA and is present in an SRC-1 complex. Cell. 1999;97(1):17-27.

97. Leygue E, Dotzlaw H, Watson PH, Murphy LC. Expression of the steroid receptor RNA activator in human breast tumors. Cancer research.

1999;59(17):4190-3.

98. Chooniedass-Kothari S, Hamedani MK, Troup S, Hube F, Leygue E. The steroid receptor RNA activator protein is expressed in breast tumor tissues.

International journal of cancer Journal international du cancer. 2006;118(4):1054 -9.

99. Kawashima H, Takano H, Sugita S, Takahara Y, Sugimura K, Nakatani T. A novel steroid receptor co-activator protein (SRAP) as an alternative form of steroid receptor RNA-activator gene: expression in prostate cancer cells and enhancement of androgen receptor activity. The Biochemical journal. 2003;369 (Pt 1):163-71.

152

100. Hube F, Guo J, Chooniedass-Kothari S, Cooper C, Hamedani MK, Dibrov

AA, et al. Alternative splicing of the first of the steroid receptor RNA activator (SRA) participates in the generation of coding and noncoding RNA isoforms in breast cancer cell lines. DNA and cell biology. 2006;25(7):418 -28.

101. Cooper C, Guo J, Yan Y, Chooniedass-Kothari S, Hube F, Hamedani MK, et al. Increasing the relative expression of endogenous non-coding Steroid Receptor

RNA Activator (SRA) in human breast cancer cells using modified oligonucleotides.

Nucleic acids research. 2009;37(13):4518-31.

102. Hung T, Wang Y, Lin MF, Koegel AK, Kotake Y, Grant GD, et al. Extensive and coordinated transcription of noncoding RNAs within cell -cycle promoters.

Nature genetics. 2011;43(7):621-9.

103. Muller AJ, Chatterjee S, Teresky A, Levine AJ. The gas5 gene is disrupted by a frameshift mutation within its longest open reading frame in several inbred mouse strains and maps to murine chromosome 1. Mammalian genome : off icial journal of the International Mammalian Genome Society. 1998;9(9):773 -4.

104. Raho G, Barone V, Rossi D, Philipson L, Sorrentino V. The gas 5 gene shows four alternative splicing patterns without coding for a protein. Gene.

2000;256(1-2):13-7.

105. Smith CM, Steitz JA. Classification of gas5 as a multi -small-nucleolar-RNA

(snoRNA) host gene and a member of the 5'-terminal oligopyrimidine gene family reveals common features of snoRNA host genes. Molecular and cellular biology.

1998;18(12):6897-909.

106. Coccia EM, Cicala C, Charlesworth A, Ciccarelli C, Rossi GB, Philipson L, et al. Regulation and expression of a growth arrest -specific gene (gas5) during growth, differentiation, and development. Molecular and cellular biology.

1992;12(8):3514-21.

107. Morrison LE, Jewell SS, Usha L, Blondin BA, Rao RD, Tabesh B, et al.

Effects of ERBB2 amplicon size and genomic alterations of chromosomes 1, 3, and

153

10 on patient response to trastuzumab in metastatic breast cancer. Genes, chromosomes & cancer. 2007;46(4):397-405.

108. Nupponen NN, Carpten JD. Prostate cancer susceptibility genes: many studies, many results, no answers. Cancer metastasis reviews. 2001;20(3 -4):155-

64.

109. Smedley D, Sidhar S, Birdsall S, Bennett D, Herlyn M, Cooper C, et al.

Characterization of chromosome 1 abnormalities in malignant melanomas. Genes, chromosomes & cancer. 2000;28(1):121-5.

110. Huarte M, Guttman M, Feldser D, Garber M, Koziol MJ, Kenzelmann -Broz D, et al. A large intergenic noncoding RNA induced by p53 mediates global gene repression in the p53 response. Cell. 2010;142(3):409-19.

111. Morris KV, Santoso S, Turner AM, Pastori C, Hawkins PG. Bidirectional transcription directs both transcriptional gene activation and suppression in human cells. PLoS genetics. 2008;4(11):e1000258.

112. Bryan TM, Englezou A, Dalla-Pozza L, Dunham MA, Reddel RR. Evidence for an alternative mechanism for maintaining telomere length in human tumors and tumor-derived cell lines. Nature medicine. 1997;3(11):1271-4.

113. Molenaar C, Wiesmeijer K, Verwoer d NP, Khazen S, Eils R, Tanke HJ, et al.

Visualizing telomere dynamics in living mammalian cells using PNA probes. The

EMBO journal. 2003;22(24):6631-41.

114. Shay JW, Bacchetti S. A survey of telomerase activity in human cancer.

European journal of cancer (Oxford, England : 1990). 1997;33(5):787-91.

115. Henson JD, Cao Y, Huschtscha LI, Chang AC, Au AY, Pickett HA, et al. DNA

C-circles are specific and quantifiable markers of alternative-lengthening-of- telomeres activity. Nature biotechnology. 2009;27(12): 1181-5.

116. Feng J, Funk WD, Wang SS, Weinrich SL, Avilion AA, Chiu CP, et al. The

RNA component of human telomerase. Science. 1995;269(5228):1236 -41.

154

117. Andersson S, Wallin KL, Hellstrom AC, Morrison LE, Hjerpe A, Auer G, et al.

Frequent gain of the human telomerase gene TERC at 3q26 in cervical adenocarcinomas. British journal of cancer. 2006;95(3):331-8.

118. Nowak T, Januszkiewicz D, Zawada M, Pernak M, Lewandowski K,

Rembowska J, et al. Amplification of hTERT and hTERC genes in leukemic cells with high expression and activity of telomerase. Oncology reports. 2006;16(2):301 -

5.

119. Phatak P, Burger AM. Telomerase and its potential for therapeutic intervention. British journal of pharmacology. 2007;152(7):1003 -11.

120. Azzalin CM, Reichenbach P, Khoriauli L, Giulotto E, Lingner J. Telomeric repeat containing RNA and RNA surveillance factors at mammalian chromosome ends. Science. 2007;318(5851):798-801.

121. Luke B, Lingner J. TERRA: telomeric repeat -containing RNA. The EMBO journal. 2009;28(17):2503-10.

122. Schoeftner S, Blasco MA. Developmentally regulated transcription of mammalian telomeres by DNA-dependent RNA polymerase II. Nature cell biology.

2008;10(2):228-36.

123. Deng Z, Norseen J, Wiedmer A, Riethman H, Lieberman PM. TERRA RNA binding to TRF2 facilitates heterochromatin formation and ORC recruitment at telomeres. Molecular cell. 2009;35(4):403-13.

124. Redon S, Reichenbach P, Lingner J. The non-coding RNA TERRA is a natural ligand and direct inhibitor of human telomerase. Nucleic acids researc h.

2010;38(17):5797-806.

125. Flynn RL, Centore RC, O'Sullivan RJ, Rai R, Tse A, Songyang Z, et al.

TERRA and hnRNPA1 orchestrate an RPA-to-POT1 switch on telomeric single- stranded DNA. Nature. 2011;471(7339):532-6.

155

126. Ng LJ, Cropley JE, Pickett HA, Reddel RR, Suter CM. Telomerase activity is associated with an increase in DNA methylation at the proximal subtelomere and a reduction in telomeric transcription. Nucleic acids research. 2009;37(4):1152 -9.

127. Fidler IJ. The pathogenesis of cancer metastasis: the 'seed and soil' hypothesis revisited. Nature reviews Cancer. 2003;3(6):453-8.

128. Talmadge JE, Fidler IJ. AACR centennial series: the biology of cancer metastasis: historical perspective. Cancer research. 2010;70(14):5649 -69.

129. Ji P, Diederichs S, Wang W, Boing S, Metzger R, Schneider PM, et al.

MALAT-1, a novel noncoding RNA, and thymosin beta4 predict metastasis and survival in early-stage non-small cell lung cancer. Oncogene. 2003;22(39):8031-41.

130. Gutschner T, Baas M, Diederichs S. Noncoding RNA gene silencing through genomic integration of RNA destabilizing elements using nucleases.

Genome research. 2011;21(11):1944-54.

131. Hutchinson JN, Ensminger AW, Clemson CM, Lynch CR, Lawrence JB,

Chess A. A screen for nuclear transcripts identifies two linked noncoding RNAs associated with SC35 splicing domains. BMC genomics. 2007;8:39.

132. Lin R, Roychowdhury-Saha M, Black C, Watt AT, Marcusson EG, Freier SM, et al. Control of RNA processing by a large non -coding RNA over-expressed in carcinomas. FEBS letters. 2011;585(4):671-6.

133. Guo F, Li Y, Liu Y, Wang J, Li Y, Li G. Inhibition of metastasis-associated lung adenocarcinoma transcript 1 in CaSki human cervical cancer cells suppresses cell proliferation and invasion. Acta biochimica et biophysica Sinica.

2010;42(3):224-9.

134. Lin R, Maeda S, Liu C, Karin M, Edgington TS. A large noncoding RNA is a marker for murine hepatocellular carcinomas and a spectrum of human carcinomas.

Oncogene. 2007;26(6):851-8.

156

135. Yamada K, Kano J, Tsunoda H, Yoshikawa H, Okubo C, Ishiyama T, et al.

Phenotypic characterization of endometrial stromal of the uterus. Cancer science. 2006;97(2):106-12.

136. Kogo R, Shimamura T, Mimori K, Kawahara K, Imoto S, Sudo T, et al. Long noncoding RNA HOTAIR regulates polycomb-dependent chromatin modification and is associated with poor prognosis in colorectal cancers. Cancer research.

2011;71(20):6320-6.

137. Niinuma T, Suzuki H, Nojima M, Nosho K, Yamamoto H, Takamaru H, et al.

Upregulation of miR-196a and HOTAIR drive malignant character in gastrointestinal stromal tumors. Cancer research. 2012;72(5):1126-36.

138. Geng YJ, Xie SL, Li Q, Ma J, Wang GY. Large intervening non -coding RNA

HOTAIR is associated with hepatocellular carcinoma progression. The Journal of international medical research. 2011;39(6):2119-28.

139. Baeriswyl V, Christofori G. The angiogenic switch in carcinogenesis.

Seminars in cancer biology. 2009;19(5):329-37.

140. Augustin HG, Koh GY, Thurston G, Alitalo K. Control of vascular morphogenesis and homeostasis through the angiopoietin-Tie system. Nature reviews Molecular cell biology. 2009;10(3):165-77.

141. Ferrara N. Vascular endothelial growth factor. Arteriosclerosis, thrombosis, and vascular biology. 2009;29(6):789-91.

142. Kazerounian S, Yee KO, Lawler J. Thrombospondins in cancer. Cellular and molecular life sciences : CMLS. 2008;65(5):700 -12.

143. Rossignol F, Vache C, Clottes E. Natural antisense transcripts of hypoxia - inducible factor 1alpha are detected in different normal and tumour huma n tissues.

Gene. 2002;299(1-2):135-40.

144. Thrash-Bingham CA, Tartof KD. aHIF: a natural antisense transcript overexpressed in human renal cancer and during hypoxia. Journal of the National

Cancer Institute. 1999;91(2):143-51.

157

145. Uchida T, Rossignol F, Matthay MA, Mounier R, Couette S, Clottes E, et al.

Prolonged hypoxia differentially regulates hypoxia-inducible factor (HIF)-1alpha and HIF-2alpha expression in lung epithelial cells: implication of natural antisense

HIF-1alpha. The Journal of biological chemistry. 2004;279(15):14871-8.

146. Cayre A, Rossignol F, Clottes E, Penault-Llorca F. aHIF but not HIF-1alpha transcript is a poor prognostic marker in human breast cancer. Breast cancer research : BCR. 2003;5(6):R223-30.

147. Adams JM, Cory S. The Bcl-2 apoptotic switch in cancer development and therapy. Oncogene. 2007;26(9):1324-37.

148. Lowe SW, Cepero E, Evan G. Intrinsic tumour suppression. Nature.

2004;432(7015):307-15.

149. Hollstein M, Sidransky D, Vogelstein B, Harris CC. p53 mutations in human cancers. Science. 1991;253(5015):49-53.

150. Levine B, Kroemer G. Autophagy in the pathogenesis of disease. Cell.

2008;132(1):27-42.

151. Mizushima N. Autophagy: process and function. Genes & development.

2007;21(22):2861-73.

152. White E, DiPaola RS. The double-edged sword of autophagy modulation in cancer. Clinical cancer research : an official journal of the American Association for

Cancer Research. 2009;15(17):5308-16.

153. Galluzzi L, Kroemer G. Necroptosis: a specialized pathway of programmed necrosis. Cell. 2008;135(7):1161-3.

154. Zong WX, Thompson CB. Necrotic death as a cell fate. Genes & development. 2006;20(1):1-15.

155. Grivennikov SI, Greten FR, Karin M. Immunity, , and cancer.

Cell. 2010;140(6):883-99.

156. Srikantan V, Zou Z, Petrovics G, Xu L, Augustus M, Davis L, et al. PCGEM1, a prostate-specific gene, is overexpressed in prostate cancer. Proceedings of the

158

National Academy of Sciences of the United States of America. 2000;97(22):12216 -

21.

157. Fu X, Ravindranath L, Tran N, Petrovics G, Srivastava S. Regulation of apoptosis by a prostate-specific and prostate cancer-associated noncoding gene,

PCGEM1. DNA and cell biology. 2006;25(3):135-41.

158. Petrovics G, Zhang W, Makarem M, Street JP, Connelly R, Sun L, et al.

Elevated expression of PCGEM1, a prostate-specific gene with cell growth- promoting function, is associated with high-risk prostate cancer patients.

Oncogene. 2004;23(2):605-11.

159. Kotake Y, Kitagawa K, Ohhata T, Sakai S, Uchida C, Niida H, et al. Long

Non-coding RNA, PANDA, Contributes to the Stabilization of p53 Tumor Suppressor

Protein. Anticancer research. 2016;36(4):1605-11.

160. Belitskaya-Levy I, Zeleniuch-Jacquotte A, Russo J, Russo IH, Bordas P,

Ahman J, et al. Characterization of a genomic signature of pregnanc y identified in the breast. Cancer prevention research. 2011;4(9):1457-64.

161. Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and

Cufflinks. Nat Protoc. 2012;7(3):562-78.

162. Anders S, Pyl PT, Huber W. HTSeq--a Python framework to work with high- throughput sequencing data. Bioinformatics. 2015;31(2):166-9.

163. Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva

O, et al. RefSeq: an update on mammalian reference sequences. Nucleic acids research. 2014;42(Database issue):D756-63.

164. Anders S, Huber W. Differential expression analysis for sequence count data. Genome biology. 2010;11(10):R106.

165. Smyth GK. Limma: linear models for microarray data. In: Gentleman R,

Carey V, Dudoit S, Irizarry R, Huber W, editors. Bioinformatics and Computational

Biology Solutions using R and Bioconductor. New York: Springer; 2005. p. 397 -420.

159

166. Robinson JT, Thorvaldsdottir H, Winckler W, Guttman M, Lander ES, Getz

G, et al. Integrative genomics viewer. Nature biotechnology. 2011;29(1):24 -6.

167. Thorvaldsdottir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer

(IGV): high-performance genomics data visualization and exploration. Briefings in bioinformatics. 2013;14(2):178-92.

168. Tait L, Soule HD, Russo J. Ultrastructural and immunocytochemical characterization of an immortalized human breast epithelial cell line, MCF -10.

Cancer research. 1990;50(18):6087-94.

169. Soule HD, Maloney TM, Wolman SR, Peterson WD, Jr., Brenz R, McGrath

CM, et al. Isolation and characterization of a spontaneously immortalized human breast epithelial cell line, MCF-10. Cancer research. 1990;50(18):6075-86.

170. Kallergi G, Tsapara A, Kampa M, Papakonstanti EA, Krasaga kis K, Castanas

E, et al. Distinct signaling pathways regulate differential opioid effects on actin cytoskeleton in malignant MCF7 and nonmalignant MCF12A human breast epithelial cells. Experimental cell research. 2003;288(1):94-109.

171. Neve RM, Chin K, Fridlyand J, Yeh J, Baehner FL, Fevr T, et al. A collection of breast cancer cell lines for the study of functionally distinct cancer subtypes.

Cancer cell. 2006;10(6):515-27.

172. Dunagin M, Cabili MN, Rinn J, Raj A. Visualization of lncRNA by single - molecule fluorescence in situ hybridization. Methods in molecular biology (Clifton,

NJ). 2015;1262:3-19.

173. Cabili MN, Dunagin MC, McClanahan PD, Biaesch A, Padovan-Merhar O,

Regev A, et al. Localization and abundance analysis of human lncRNAs at single - cell and single-molecule resolution. Genome biology. 2015;16:20.

174. Mosmann T. Rapid colorimetric assay for cellular growth and survival: application to proliferation and cytotoxicity assays. Journal of immunological methods. 1983;65(1-2):55-63.

160

175. Pogash TJ, El-Bayoumy K, Amin S, Gowda K, de Cicco RL, Barton M, et al.

Oxidized derivative of docosahexaenoic acid preferentially inhibit cell proliferation in triple negative over luminal breast cancer cells. In vitro cellular & developmental biology Animal. 2015;51(2):121-7.

176. Su Y, Pogash TJ, Nguyen TD, Russo J. Development and characterization of two human triple-negative breast cancer cell lines with highly tumorigenic and metastatic capabilities. Cancer medicine. 2016;5(3):558-73.

177. Kim RS, Avivar-Valderas A, Estrada Y, Bragado P, Sosa MS, Aguirre-Ghiso

JA, et al. Dormancy signatures and metastasis in estrogen receptor positive and negative breast cancer. PloS one. 2012;7(4):e35569.

178. Holen I, Walker M, Nutter F, Fowles A, Evans CA, Eaton CL, et al.

Oestrogen receptor positive breast cancer metastasis to bone: inhibition by targeting the bone microenvironment in vivo. Clinical & experimental metastasis.

2016;33(3):211-24.

179. Huber W, Carey VJ, Gentleman R, Anders S, Carlson M, Carvalho BS, et al.

Orchestrating high-throughput genomic analysis with Bioconductor. Nature methods. 2015;12(2):115-21.

180. R Development Core Team. R: A language and environment for statistical computing. : R Foundation for Statistical Computing; 2013 [cited 2013]. Availabl e from: http://www.R-project.org.

181. Russo IH, Koszalka M, Russo J. Comparative study of the influence of pregnancy and hormonal treatment on mammary carcinogenesis. British journal of cancer. 1991;64(3):481-4.

182. Janssens JP, Russo J, Russo I, Michiel s L, Donders G, Verjans M, et al.

Human chorionic gonadotropin (hCG) and prevention of breast cancer. Molecular and cellular endocrinology. 2007;269(1-2):93-8.

161

183. Wellings SR, Jensen HM, Marcum RG. An atlas of subgross pathology of the human breast with special reference to possible precancerous lesions. Journal of the National Cancer Institute. 1975;55(2):231-73.

184. Russo J, Santucci-Pereira J, de Cicco RL, Sheriff F, Russo PA, Peri S, et al.

Pregnancy-induced chromatin remodeling in the breast of post menopausal women.

International journal of cancer Journal international du cancer. 2012;131(5):1059 -

70.

185. Consortium EP. A user's guide to the encyclopedia of DNA elements

(ENCODE). PLoS biology. 2011;9(4):e1001046.

186. Cabili MN, Trapnell C, Goff L, Koziol M, Tazon-Vega B, Regev A, et al.

Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes & development. 2011;25(18):1915 -27.

187. Rinn JL, Chang HY. Genome regulation by long noncoding RNAs. Annu Rev

Biochem. 2012;81:145-66.

188. Marchese FP, Huarte M. Long non-coding RNAs and chromatin modifiers: their place in the epigenetic code. Epigenetics. 2014;9(1):21 -6.

189. Volders PJ, Helsens K, Wang X, Menten B, Martens L, Gevaert K, et al.

LNCipedia: a database for annotated human lncRNA transcript sequences and structures. Nucleic acids research. 2013;41(Database issue):D246 -51.

190. Volders PJ, Verheggen K, Menschaert G, Vandepoele K, Martens L,

Vandesompele J, et al. An update on LNCipedia: a database for annotated human lncRNA sequences. Nucleic acids research. 2015;43(8):4363-4.

191. Liao Y, Lu W, Che Q, Yang T, Qiu H, Zhang H, et al. SHARP1 suppresses angiogenesis of endometrial cancer by decreasing hypoxia-inducible factor-1alpha level. PloS one. 2014;9(6):e99907.

192. Amelio I, Melino G. The "Sharp" blade against HIF -mediated metastasis.

Cell cycle. 2012;11(24):4530-5.

162

193. Liao Y, He X, Qiu H, Che Q, Wang F, Lu W, et al. Suppression of the epithelial-mesenchymal transition by SHARP1 is linked to the NOTCH1 signaling pathway in metastasis of endometrial cancer. BMC cancer. 2014;14:487.

194. Falvella FS, Colombo F, Spinola M, Campiglio M, Pastorino U, Dragani TA.

BHLHB3: a candidate tumor suppressor in lung cancer. Oncogene. 2008.

195. Inaguma S, Riku M, Hashimoto M, Murakami H, Saga S, Ikeda H, et al. GLI1 interferes with the DNA mismatch repair system in pancreatic cancer through

BHLHE41-mediated suppression of MLH1. Cancer research. 2013;73(24):7313 -23.

196. Sato F, Kawamura H, Wu Y, Sato H, Jin D, Bhawal UK, et al. The basic helix-loop-helix transcription factor DEC2 inhibits TGF-beta-induced tumor progression in human pancreatic cancer BxPC-3 cells. International journal of molecular medicine. 2012;30(3):495-501.

197. Montagner M, Enzo E, Forcato M, Zanconato F, Parenti A, Rampazzo E, et al. SHARP1 suppresses breast cancer metastasis by promoting degradation of hypoxia-inducible factors. Nature. 2012;487(7407):380-4.

198. Piccolo S, Enzo E, Montagner M. p63, Sharp1, and HIFs: master regu lators of metastasis in triple-negative breast cancer. Cancer research. 2013;73(16):4978 -

81.

199. Spizzo G, Fong D, Wurm M, Ensinger C, Obrist P, Hofer C, et al. EpCAM expression in primary tumour tissues and metastases: an immunohistochemical analysis. Journal of clinical pathology. 2011;64(5):415-20.

200. Trebak M, Begg GE, Chong JM, Kanazireva EV, Herlyn D, Speicher DW.

Oligomeric state of the colon carcinoma-associated glycoprotein GA733-2 (Ep-

CAM/EGP40) and its role in GA733-mediated homotypic cell-cell adhesion. The

Journal of biological chemistry. 2001;276(3):2299-309.

201. Amoury M, Kolberg K, Pham AT, Hristodorov D, Mladenov R, Di Fiore S, et al. Granzyme B-based cytolytic fusion protein targeting EpCAM specifically kills

163

triple negative breast cancer cells in vitro and inhibits tumor growth in a subcutaneous mouse tumor model. Cancer letters. 2016;372(2):201 -9.

202. Deng Z, Wu Y, Ma W, Zhang S, Zhang YQ. Adoptive T -cell therapy of prostate cancer targeting the cancer stem cell antigen EpCAM. BMC imm unology.

2015;16:1.

203. Schmidt M, Hasenclever D, Schaeffer M, Boehm D, Cotarelo C, Steiner E, et al. Prognostic effect of epithelial cell adhesion molecule overexpression in untreated node-negative breast cancer. Clinical cancer research : an official jo urnal of the American Association for Cancer Research. 2008;14(18):5849 -55.

204. Watson JB, Sutcliffe JG. Primate brain-specific cytoplasmic transcript of the

Alu repeat family. Molecular and cellular biology. 1987;7(9):3324 -7.

205. Chen W, Bocker W, Brosius J, Tiedge H. Expression of neural BC200 RNA in human tumours. The Journal of pathology. 1997;183(3):345 -51.

206. Paine TM, Soule HD, Pauley RJ, Dawson PJ. Characterization of epithelial phenotypes in mortal and immortal human breast cells. International journal of cancer Journal international du cancer. 1992;50(3):463-73.

207. Prat A, Parker JS, Karginova O, Fan C, Livasy C, Herschkowitz JI, et al.

Phenotypic and molecular characterization of the claudin-low intrinsic subtype of breast cancer. Breast Cancer Res. 2010;12(5):R68.

208. Denoyelle C, Vasse M, Korner M, Mishal Z, Ganne F, Vannier JP, et al.

Cerivastatin, an inhibitor of HMG-CoA reductase, inhibits the signaling pathways involved in the invasiveness and metastatic properties of highly invasive b reast cancer cell lines: an in vitro study. Carcinogenesis. 2001;22(8):1139 -48.

209. Larkins TL, Nowell M, Singh S, Sanford GL. Inhibition of cyclooxygenase -2 decreases breast cancer cell motility, invasion and matrix metalloproteinase expression. BMC cancer. 2006;6:181.

164

210. Zheng Y, Rodrik V, Toschi A, Shi M, Hui L, Shen Y, et al. Phospholipase D couples survival and migration signals in stress response of human cancer cells.

The Journal of biological chemistry. 2006;281(23):15862-8.

211. Morini M, Mottolese M, Ferrari N, Ghiorzo F, Buglioni S, Mortarini R, et al.

The alpha 3 beta 1 integrin is associated with mammary carcinoma cell metastasis, invasion, and gelatinase B (MMP-9) activity. International journal of cancer Journal international du cancer. 2000;87(3):336-42.

212. Jiang WG, Davies G, Martin TA, Parr C, Watkins G, Mason MD, et al.

Expression of membrane type-1 matrix metalloproteinase, MT1-MMP in human breast cancer and its impact on invasiveness of breast cancer cells. International journal of molecular medicine. 2006;17(4):583-90.

213. Yu Y, Xiao CH, Tan LD, Wang QS, Li XQ, Feng YM. Cancer -associated fibroblasts induce epithelial-mesenchymal transition of breast cancer cells through paracrine TGF-beta signalling. British journal of cancer. 2014; 110(3):724-32.

214. van Horssen R, Hollestelle A, Rens JA, Eggermont AM, Schutte M, Ten

Hagen TL. E-cadherin promotor methylation and mutation are inversely related to motility capacity of breast cancer cells. Breast cancer research and treatment.

2012;136(2):365-77.

215. Kunz-Schughart LA, Heyder P, Schroeder J, Knuechel R. A heterologous 3 -

D coculture model of breast tumor cells and fibroblasts to study tumor -associated fibroblast differentiation. Experimental cell research. 2001;266(1):74 -86.

216. Keely PJ, Rusyn EV, Cox AD, Parise LV. R-Ras signals through specific integrin alpha cytoplasmic domains to promote migration and invasion of breast epithelial cells. The Journal of cell biology. 1999;145(5):1077 -88.

217. Zhao Z, Chen C, Liu Y, Wu C. 17beta-Estradiol treatment inhibits breast cell proliferation, migration and invasion by decreasing MALAT -1 RNA level.

Biochemical and biophysical research communications. 2014;445(2):388 -93.

165

218. Baier PC, Brzozka MM, Shahmoradi A, Reinecke L, Kroos C, Wichert SP, e t al. Mice lacking the circadian modulators SHARP1 and SHARP2 display altered and mixed state endophenotypes of psychiatric disorders. PloS one.

2014;9(10):e110310.

219. Falvella FS, Colombo F, Spinola M, Campiglio M, Pastorino U, Dragani TA.

BHLHB3: a candidate tumor suppressor in lung cancer. Oncogene.

2008;27(26):3761-4.

220. Zhang L, Wang JH, Liang RX, Huang ST, Xu J, Yuan LJ, et al. RASSF8 downregulation promotes lymphangiogenesis and metastasis in esophageal squamous cell carcinoma. Oncotarget. 2015;6(33):34510-24.

221. Wang J, Hua W, Huang SK, Fan K, Takeshima L, Mao Y, et al. RASSF8 regulates progression of cutaneous melanoma through nuclear factor -kappab.

Oncotarget. 2015;6(30):30165-77.

222. Lock FE, Underhill-Day N, Dunwell T, Matallanas D, Cooper W, Hesson L, et al. The RASSF8 candidate tumor suppressor inhibits cell growth and regulates the

Wnt and NF-kappaB signaling pathways. Oncogene. 2010;29(30):4307-16.

223. Falvella FS, Manenti G, Spinola M, Pignatiello C, Conti B, Pastorino U, et al.

Identification of RASSF8 as a candidate lung tumor suppressor gene. Oncogene.

2006;25(28):3934-8.

224. Shirasaki Y, Kanazawa Y, Morishima Y, Makino M. Involvement of calmodulin in neuronal cell death. Brain research. 2006;1083(1):189 -95.

225. Makita N, Yagihara N, Crotti L, Johnson CN, Beckmann BM, Roh MS, et al.

Novel calmodulin mutations associated with congenital arrhythmia susceptibility.

Circulation Cardiovascular genetics. 2014;7(4):466-74.

226. Fancy RM, Wang L, Zeng Q, Wang H, Zhou T, Buchsbaum DJ, et al.

Characterization of the Interactions between Calmodulin and Death Receptor 5 in

Triple-negative and Estrogen Receptor-positive Breast Cancer Cells: AN

166

INTEGRATED EXPERIMENTAL AND COMPUTATIONAL STUDY. The Journal of biological chemistry. 2016;291(24):12862-70.

227. Gao J, Liu X, Yang F, Liu T, Yan Q, Yang X. By inhibiting Ras/Raf/ERK and

MMP-9, knockdown of EpCAM inhibits breast cancer cell growth and metastasis.

Oncotarget. 2015;6(29):27187-98.

228. Gao J, Yan Q, Wang J, Liu S, Yang X. Epithelial -to-mesenchymal transition induced by TGF-beta1 is mediated by AP1-dependent EpCAM expression in MCF-7 cells. Journal of cellular physiology. 2015;230(4):775-82.

229. Liao MY, Lai JK, Kuo MY, Lu RM, Lin CW, Cheng PC, et al. An anti -EpCAM antibody EpAb2-6 for the treatment of colon cancer. Oncotarget. 2015;6(28):24947 -

68.

230. Liao MY, Kuo MY, Lu TY, Wang YP, Wu HC. Generation of an anti -EpCAM antibody and epigenetic regulation of EpCAM in . International journal of oncology. 2015;46(4):1788-800.

231. Osta WA, Chen Y, Mikhitarian K, Mitas M, Salem M, Hannun YA, et al.

EpCAM is overexpressed in breast cancer and is a potential target for breast cancer gene therapy. Cancer research. 2004;64(16):5818-24.

232. Baris HN, Barnes-Kedar I, Toledano H, Halpern M, Hershkovitz D, Lossos A, et al. Constitutional Mismatch Repair Deficiency in Israel: High Proportion of

Founder Mutations in MMR Genes and Consanguinity. Pediatric blood & cancer.

2016;63(3):418-27.

233. Dominguez-Valentin M, Joost P, Therkildsen C, Jonsson M, Rambech E,

Nilbert M. Frequent mismatch-repair defects link prostate cancer to Lynch syndrome. BMC urology. 2016;16:15.

234. Barton M, Santucci-Pereira J, López de Cicco R, Russo IH, Ross EA, Slifker

M, et al., editors. Long Non-Coding RNAs in the Postmenopausal Breast and their

Role in Cancer Prevention. 2014 AACR Annual Meeting; 2014; San Diego,

CA.2014.

167

235. Mus E, Hof PR, Tiedge H. Dendritic BC200 RNA in aging and in Alzheimer's disease. Proceedings of the National Academy of Sciences of the United States of

America. 2007;104(25):10679-84.

236. Hu T, Lu YR. BCYRN1, a c-MYC-activated long non-coding RNA, regulates cell metastasis of non-small-cell lung cancer. Cancer cell international. 2015;15:36.

237. Zhao RH, Zhu CH, Li XK, Cao W, Zong H, Cao XG, et al. BC200 LncRNA a potential predictive marker of poor prognosis in esophageal squamous cell carcinoma patients. OncoTargets and therapy. 2016;9:2221-6.

238. Wu DI, Wang T, Ren C, Liu L, Kong D, Jin X, et al. Downregulation of BC200 in ovarian cancer contributes to cancer cell proliferation and chemoresistance to carboplatin. Oncology letters. 2016;11(2):1189-94.

239. De Leeneer K, Claes K. Non Coding RNA Molecules as Potential Biomarkers in Breast Cancer. Advances in experimental medicine and biology. 2015;867:263-

75.

240. Sosinska P, Mikula-Pietrasik J, Ksiazek K. The double-edged sword of long non-coding RNA: The role of human brain-specific BC200 RNA in translational control, neurodegenerative diseases, and cancer. Mutation research Reviews in mutation research. 2015;766:58-67.

241. Shore AN, Rosen JM. Regulation of mammary epithelial cell homeostasis by lncRNAs. The international journal of biochemistry & cell biology. 2014;54:318 -30.

242. Kumar P, Aggarwal R. An overview of triple-negative breast cancer.

Archives of gynecology and obstetrics. 2016;293(2):247-69.

243. Welcsh PL, King MC. BRCA1 and BRCA2 and the genetics of breast and ovarian cancer. Human molecular genetics. 2001;10(7):705-13.

244. Bhawal UK, Sato F, Arakawa Y, Fujimoto K, Kawamoto T, Tanimoto K, et al.

Basic helix-loop-helix transcription factor DEC1 negatively regulates cyclin D1. The

Journal of pathology. 2011;224(3):420-9.

168

245. Iacoangeli A, Lin Y, Morley EJ, Muslimov IA, Bianchi R, Reilly J, et al.

BC200 RNA in invasive and preinvasive breast cancer. Carcinogenesis.

2004;25(11):2125-33.

246. Singh R, Gupta SC, Peng WX, Zhou N, Pochampally R, Atfi A, et al.

Regulation of alternative splicing of Bcl -x by BC200 contributes to breast cancer pathogenesis. Cell death & disease. 2016;7(6):e2262.

247. Tiedge H, Iacoangeli A, Soff GA, inventorsCancer Blood Test Using BC200

RNA Isolated From Peripheral Blood for Diagnosis and Treatment of Invasive

Breast Cancer. USA2015 March 26, 2015.

248. Schmitt AM, Chang HY. Long Noncoding RNAs i n Cancer Pathways. Cancer cell. 2016;29(4):452-63.

249. Gillett AM, Wallace MJ, Gillespie MT, Hooper SB. Increased expansion of the lung stimulates calmodulin 2 expression in fetal sheep. American journal of physiology Lung cellular and molecular physiology. 2002;282(3):L440-7.

250. Calaluce R, Gubin MM, Davis JW, Magee JD, Chen J, Kuwano Y, et al. The

RNA binding protein HuR differentially regulates unique subsets of mRNAs in estrogen receptor negative and estrogen receptor positive breast cancer. BMC cancer. 2010;10:126.

251. Haddad SA, Lunetta KL, Ruiz-Narvaez EA, Bensen JT, Hong CC,

Sucheston-Campbell LE, et al. Hormone-related pathways and risk of breast cancer subtypes in African American women. Breast cancer research and treatment.

2015;154(1):145-54.

252. Melamed P, Savulescu D, Lim S, Wijeweera A, Luo Z, Luo M, et al.

Gonadotrophin-releasing hormone signalling downstream of calmodulin. Journal of neuroendocrinology. 2012;24(12):1463-75.

253. Granit RZ, Slyper M, Ben-Porath I. Axes of differentiation in breast cancer: untangling stemness, lineage identity, and the epithelial to mesenchymal transition.

Wiley interdisciplinary reviews Systems biology and medicine. 2014;6(1):93 -106.

169

254. Osmanbeyoglu HU, Pelossof R, Bromberg JF, Leslie CS. Linking signali ng pathways to transcriptional programs in breast cancer. Genome research.

2014;24(11):1869-80.

255. Russo J, Yang X, Hu YF, Bove BA, Huang Y, Silva ID, et al. Biological and molecular basis of human breast cancer. Front Biosci. 1998;3:D944 -60.

256. Russo IH, Russo J. Hormonal approach to breast cancer prevention. J Cell

Biochem Suppl. 2000;34:1-6.

257. Inman JL, Robertson C, Mott JD, Bissell MJ. Mammary gland development: cell fate specification, stem cells and the microenvironment. Development.

2015;142(6):1028-42.

258. Russo J, Mailo D, Hu YF, Balogh G, Sheriff F, Russo IH. Breast differentiation and its implication in cancer prevention. Clinical cancer research : an official journal of the American Association for Cancer Research. 2005;11(2 Pt

2):931s-6s.

259. Peri S, de Cicco RL, Santucci-Pereira J, Slifker M, Ross EA, Russo IH, et al.

Defining the genomic signature of the parous breast. BMC medical genomics.

2012;5:46.

260. Liu X, Robinson GW, Hennighausen L. Activation of Stat5a and Stat5b by tyrosine phosphorylation is tightly linked to mammary gland differentiation.

Molecular endocrinology. 1996;10(12):1496-506.

261. Huynh HT, Robitaille G, Turner JD. Establishment of bovine mammary epithelial cells (MAC-T): an in vitro model for bovine lactation. Experimental cell research. 1991;197(2):191-9.

262. Lee HY, Heo YT, Lee SE, Hwang KC, Lee HG, Choi SH, et al. Short communication: retinoic acid plus prolactin to synergistically increase specific casein gene expression in MAC-T cells. Journal of dairy science. 2013;96(6):3835-

9.

170

263. Riley LG, Gardiner-Garden M, Thomson PC, Wynn PC, Williamson P,

Raadsma HW, et al. The influence of extracellular matrix and prolactin on global gene expression profiles of primary bovine mammary epithelial cells in vitro. Animal genetics. 2010;41(1):55-63.

264. Lemay DG, Ballard OA, Hughes MA, Morrow AL, Horseman ND, Nommsen -

Rivers LA. RNA sequencing of the human milk fat layer transcriptome reveals distinct gene expression profiles at three stages of lactation. PloS one.

2013;8(7):e67531.

265. Pauciullo A, Erhardt G. Molecular Characterization of the Llamas (Lama glama) Casein Cluster Genes Transcripts (CSN1S1, CSN2, CSN1S2, CSN3) and

Regulatory Regions. PloS one. 2015;10(4):e0124963.

266. Cunningham TJ, Duester G. Mechanisms of retinoic acid signalling and its roles in organ and limb development. Nature reviews Molecular cell biology.

2015;16(2):110-23.

267. Sun R, Liu Y, Li SY, Shen S, Du XJ, Xu CF, et al. Co-delivery of all-trans- retinoic acid and doxorubicin for cancer therapy with synergistic inhibition of cancer stem cells. Biomaterials. 2015;37:405-14.

268. Greve G, Schiffmann I, Lubbert M. Epigenetic priming of non -small cell lung cancer cell lines to the antiproliferative and differentiating effects of all -trans retinoic acid. Journal of cancer research and clinical oncology. 2015;141(12):2171 -

80.

269. Najafzadeh N, Mazani M, Abbasi A, Farassati F, Amani M. Low-dose all- trans retinoic acid enhances cytotoxicity of cisplatin and 5 -fluorouracil on CD44(+) cancer stem cells. Biomedicine & pharmacotherapy = Biomedecine & pharmacotherapie. 2015;74:243-51.

270. Fisher JN, Terao M, Fratelli M, Kurosaki M, Paroni G, Zanetti A, et al.

MicroRNA networks regulated by all-trans retinoic acid and Lapatinib control the

171

growth, survival and motility of breast cancer cells. Oncotarget. 2015;6(15):13176 -

200.

271. Yan Y, Li Z, Xu X, Chen C, Wei W, Fan M, et al. All -trans retinoic acids induce differentiation and sensitize a radioresistant breast cancer cells to chemotherapy. BMC complementary and alternative medicine. 2016;16:113.

272. Garattini E, Bolis M, Garattini SK, Fratelli M, Centritto F, Paroni G, et al.

Retinoids and breast cancer: from basic studies to the clinic and back again.

Cancer treatment reviews. 2014;40(6):739-49.

273. Paroni G, Fratelli M, Gardini G, Bassano C, Flora M, Zanetti A, et al.

Synergistic antitumor activity of lapatinib and retinoids on a novel subtype of breast cancer with coamplification of ERBB2 and RARA. Oncogene. 2012;31(29):3431 -43.

274. Uray IP, Dmitrovsky E, Brown PH. Retinoids and rexinoids in cancer prevention: from laboratory to clinic. Seminars in oncology. 2016;43(1):49 -64.

275. di Masi A, Leboffe L, De Marinis E, Pagano F, Cicconi L, Rochette -Egly C, et al. Retinoic acid receptors: from molecular mechanisms to cancer therapy.

Molecular aspects of medicine. 2015;41:1-115.

276. Herreros-Villanueva M, Er TK, Bujanda L. Retinoic Acid Reduces Stem Cell -

Like Features in Pancreatic Cancer Cells. Pancreas. 2015;44(6):918 -24.

277. Sonawane P, Cho HE, Tagde A, Verlekar D, Yu AL, Reynolds CP, et al.

Metabolic characteristics of 13-cis-retinoic acid (isotretinoin) and anti-tumour activity of the 13-cis-retinoic acid metabolite 4-oxo-13-cis-retinoic acid in neuroblastoma. British journal of pharmacology. 2014;171(23) :5330-44.

278. Zhang W, Zeng Z, Wei F, Chen P, Schmitt DC, Fan S, et al. SPLUNC1 is associated with nasopharyngeal carcinoma prognosis and plays an important role in all-trans-retinoic acid-induced growth inhibition and differentiation in nasopharyngeal cancer cells. The FEBS journal. 2014;281(21):4815-29.

279. Zanetti A, Affatato R, Centritto F, Fratelli M, Kurosaki M, Barzago MM, et al.

All-trans-retinoic Acid Modulates the Plasticity and Inhibits the Motility of Breast

172

Cancer Cells: ROLE OF NOTCH1 AND TRANSFORMING GROWTH FACTOR

(TGFbeta). The Journal of biological chemistry. 2015;290(29):17690 -709.

280. Berardi DE, Flumian C, Campodonico PB, Urtreger AJ, Diaz Bessone MI,

Motter AN, et al. Myoepithelial and luminal breast cancer cells exhibit different responses to all-trans retinoic acid. Cellular oncology (Dordrecht). 2015;38(4):289 -

305.

281. Arisi MF, Starker RA, Addya S, Huang Y, Fernandez SV. All trans-retinoic acid (ATRA) induces re-differentiation of early transformed breast epithelial cells.

International journal of oncology. 2014;44(6):1831-42.

282. Huang W, Zhang Y, Varambally S, Chinnaiyan AM, Banerjee M, Merajver

SD, et al. Inhibition of CCN6 (Wnt-1-induced signaling protein 3) down-regulates E- cadherin in the breast epithelium through induction of snail and ZEB1. The

American journal of pathology. 2008;172(4):893-904.

283. Rodrigues P, Furriol J, Bermejo B, Chaves FJ, Lluch A, Eroles P.

Identification of candidate polymorphisms on stress oxidative and DNA damage repair genes related with clinical outcome in breast cancer patients. International journal of molecular sciences. 2012;13(12):16500-13.

284. Kleer CG, Zhang Y, Pan Q, Merajver SD. WISP3 (CCN6) is a secreted tumor-suppressor protein that modulates IGF signaling in inflammatory breast cancer. Neoplasia. 2004;6(2):179-85.

285. Peri S, de Cicco RL, Santucci-Pereira J, Slifker M, Ross EA, Russo IH, et al.

Defining the genomic signature of the parous breast. BMC Med Genomics.

2012;5(1):46.

286. Bourassa P, N'Soukpoe-Kossi CN, Tajmir-Riahi HA. Binding of vitamin A with milk alpha- and beta-caseins. Food chemistry. 2013;138(1):444-53.

287. Hu H, Wang J, Bu D, Wei H, Zhou L, Li F, et al. In vitro culture and characterization of a mammary epithelial cell line from Chinese Holstein dairy cow.

PloS one. 2009;4(11):e7636.

173

288. Ogorevc J, Dovc P. Relative quantification of beta-casein expression in primary goat mammary epithelial cell lines. Genetics and molecular research :

GMR. 2015;14(2):3481-90.

289. Gestl SA, Green MD, Shearer DA, Frauenhoffer E, Tephly TR, Weisz J.

Expression of UGT2B7, a UDP-glucuronosyltransferase implicated in the metabolism of 4-hydroxyestrone and all-trans retinoic acid, in normal human breast parenchyma and in invasive and in situ breast cancers. The American journal of pathology. 2002;160(4):1467-79.

290. Turgeon D, Carrier JS, Levesque E, Hum DW, Belanger A. Relative enzymatic activity, protein stability, and tissue distribution of human steroid - metabolizing UGT2B subfamily members. Endocrinology. 2001;142(2):778 -87.

291. Ritter JK, Chen F, Sheen YY, Lubet RA, Owens IS. Two human liver cDNAs encode UDP-glucuronosyltransferases with 2 log differences in activity toward parallel substrates including hyodeoxycholic acid and certain estrogen derivatives.

Biochemistry. 1992;31(13):3409-14.

292. Samokyszyn VM, Gall WE, Zawada G, Freyaldenhoven MA, Chen G,

Mackenzie PI, et al. 4-hydroxyretinoic acid, a novel substrate for human liver microsomal UDP-glucuronosyltransferase(s) and recombinant UGT2B7. The Journal of biological chemistry. 2000;275(10):6908-14.

293. Nishino T, Okamoto K, Eger BT, Pai EF, Nishino T. Mammalian xanthine oxidoreductase - mechanism of transition from xanthine dehydrogenase to xanthine oxidase. The FEBS journal. 2008;275(13):3278 -89.

294. Taibi G, Paganini A, Gueli MC, Ampola F, Nicotra CM. Xanthine oxidase catalyzes the synthesis of retinoic acid. Journal of enzyme inhibition.

2001;16(3):275-85.

295. Vorbach C, Scriven A, Capecchi MR. The housekeeping gene xanthine oxidoreductase is necessary for milk fat droplet enveloping and secretion: gene

174

sharing in the lactating mammary gland. Genes & development. 2002;16(24):3223 -

35.

296. Huang W, Martin EE, Burman B, Gonzalez ME, Kleer CG. The matricellular protein ccn6 (wisp3) decreases notch1 and suppresses breast cancer initiating cells. Oncotarget. 2016.

297. Pal A, Huang W, Li X, Toy KA, Nikolovska-Coleska Z, Kleer CG. CCN6 modulates BMP signaling via the Smad-independent TAK1/p38 pathway, acting to suppress metastasis of breast cancer. Cancer research. 2012;72(18):4818 -28.

298. Lorenzatti G, Huang W, Pal A, Cabanillas AM, Kleer CG. CCN6 (WISP3) decreases ZEB1-mediated EMT and invasion by attenuation of IGF -1 receptor signaling in breast cancer. Journal of cell science. 2011;124(Pt 10):1752 -8.

299. Leask A. CCN6: a modulator of breast cancer progression. Journal of cell communication and signaling. 2016.

300. Jiang N, Wang Y, Yu Z, Hu L, Liu C, Gao X, et al. WISP3 (CCN6) Regulates

Milk Protein Synthesis and Cell Growth Through mTOR Signaling in Dairy Cow

Mammary Epithelial Cells. DNA and cell biology. 2015;34(8):524-33.

301. Sreekumar A, Roarty K, Rosen JM. The mammary stem cell hierarchy: a looking glass into heterogeneous breast cancer landscapes. Endocrine -related cancer. 2015;22(6):T161-76.

302. Condiotti R, Guo W, Ben-Porath I. Evolving views of breast cancer stem cells and their differentiation States. Critical reviews in oncogenesis.

2014;19(5):337-48.

303. Russo J, Hu YF, Yang X, Russo IH. Developmental, cellular, and molecular basis of human breast cancer. Journal of the National Cancer Institute

Monographs. 2000(27):17-37.

304. Scribner KC, Behbod F, Porter WW. Regulation of DCIS to invasive breast cancer progression by Singleminded-2s (SIM2s). Oncogene. 2013;32(21):2631-9.

175

305. Liu HX, Li XL, Dong CF. Epigenetic and metabolic regulation of breast cancer stem cells. Journal of Zhejiang University Science B. 2015;16(1):10 -7.

306. Uehara N, Unami A, Kiyozuka Y, Shikata N, Oishi Y, Tsubura A. Parous mammary glands exhibit distinct alterations in gene expression and pro liferation responsiveness to carcinogenic stimuli in Lewis rats. Oncology reports.

2006;15(4):903-11.

307. Pal A, Huang W, Toy KA, Kleer CG. CCN6 knockdown disrupts acinar organization of breast cells in three-dimensional cultures through up-regulation of type III TGF-beta receptor. Neoplasia. 2012;14(11):1067-74.

308. Meier-Abt F, Milani E, Roloff T, Brinkhaus H, Duss S, Meyer DS, et al. Parity induces differentiation and reduces Wnt/Notch signaling ratio and proliferation potential of basal stem/progenitor cells isolated from mouse mammary epithelium.

Breast cancer research : BCR. 2013;15(2):R36.

309. Russo J, Moral R, Balogh GA, Mailo D, Russo IH. The protective role of pregnancy in breast cancer. Breast cancer research : BCR. 2005;7(3):131 -42.

310. Sass JO, Forster A, Bock KW, Nau H. Glucuronidation and isomerization of all-trans- and 13-cis-retinoic acid by liver microsomes of phenobarbital - or 3- methylcholanthrene-treated rats. Biochemical pharmacology. 1994;47(3):485-92.

311. Radominska A, Little JM, Lester R, Mackenzie PI. Bile acid glucuronidation by rat liver microsomes and cDNA-expressed UDP-glucuronosyltransferases.

Biochimica et biophysica acta. 1994;1205(1):75 -82.

312. Lu Y, Bratton S, Heydel JM, Radominska-Pandya A. Effect of retinoids on

UDP-glucuronosyltransferase 2B7 mRNA expression in Caco-2 cells. Drug metabolism and pharmacokinetics. 2008;23(5):364-72.

313. Radominska-Pandya A, Chen G, Samokyszyn VM, Little JM, Gall WE,

Zawada G, et al. Application of photoaffinity labeling with [(3)H] all trans- and 9- cis-retinoic acids for characterization of cellular retinoic acid --binding proteins I and II. Protein science : a publication of the Protein Society. 2001;10(1):200 -11.

176

314. Barua AB, Sidell N. Retinoyl beta-glucuronide: a biologically active interesting retinoid. The Journal of nutrition. 2004;134(1):286S -9S.

315. Formelli F, Barua AB, Olson JA. Bioactivities of N-(4-hydroxyphenyl) retinamide and retinoyl beta-glucuronide. FASEB journal : official publication of the

Federation of American Societies for Experimental Biology. 1996;10(9):1014-24.

316. Biesalski HK, Schaffer M. Comparative assessment of the activity of beta - carotene, retinoyl-beta-D-glucuronide and retinoic acid on growth and differentiation of a human promyelocytic leukemia cell l ine HL-60. International journal for vitamin and nutrition research Internationale Zeitschrift fur Vitamin - und

Ernahrungsforschung Journal international de vitaminologie et de nutrition.

1997;67(5):357-63.

317. Kaul S, Olson JA. Effect of vitamin A defici ency on the hydrolysis of retinoyl beta-glucuronide to retinoic acid by rat tissue organelles in vitro. International journal for vitamin and nutrition research Internationale Zeitschrift fur Vitamin - und

Ernahrungsforschung Journal international de vitami nologie et de nutrition.

1998;68(4):232-6.

318. Cavalieri E, Frenkel K, Liehr JG, Rogan E, Roy D. Estrogens as endogenous genotoxic agents--DNA adducts and mutations. Journal of the National Cancer

Institute Monographs. 2000(27):75-93.

319. Cavalieri EL, Rogan EG. Depurinating estrogen-DNA adducts, generators of cancer initiation: their minimization leads to cancer prevention. Clinical and translational medicine. 2016;5(1):12.

320. Toniolo PG, Levitz M, Zeleniuch-Jacquotte A, Banerjee S, Koenig KL, Shore

RE, et al. A prospective study of endogenous estrogens and breast cancer in postmenopausal women. Journal of the National Cancer Institute. 1995;87(3):190 -

7.

177

321. Seeger H, Deuringer FU, Wallwiener D, Mueck AO. Breast cancer risk during

HRT: influence of estradiol metabolites on breast cancer and endothelial cell proliferation. Maturitas. 2004;49(3):235-40.

322. Seeger H, Wallwiener D, Mueck AO. Influence of stroma-derived growth factors on the estradiol-stimulated proliferation of human breast cancer cells.

European journal of gynaecological oncology. 2004;25(2):175 -7.

323. Russo J, Fernandez SV, Russo PA, Fernbaugh R, Sheriff FS, Lareef HM, et al. 17-Beta-estradiol induces transformation and tumorigenesis in human breast epithelial cells. FASEB journal : official publication of the Federation of American

Societies for Experimental Biology. 2006;20(10):1622-34.

324. Getoff N, Gerschpacher M, Hartmann J, Huber JC, Schittl H, Quint RM. The

4-hydroxyestrone: Electron emission, formation of secondary metabolites a nd mechanisms of carcinogenesis. Journal of photochemistry and photobiology B,

Biology. 2010;98(1):20-4.

325. Kurosaki M, Zanotta S, Li Calzi M, Garattini E, Terao M. Expression of xanthine oxidoreductase in mouse mammary epithelium during pregnancy and lactation: regulation of gene expression by glucocorticoids and prolactin. The

Biochemical journal. 1996;319 ( Pt 3):801-10.

326. Jarasch ED, Grund C, Bruder G, Heid HW, Keenan TW, Franke WW.

Localization of xanthine oxidase in mammary-gland epithelium and capillary endothelium. Cell. 1981;25(1):67-82.

327. Kooij A, Frederiks WM, Gossrau R, Van Noorden CJ. Localization of xanthine oxidoreductase activity using the tissue protectant polyvinyl alcohol and final electron acceptor Tetranitro BT. The journal of hi stochemistry and cytochemistry : official journal of the Histochemistry Society. 1991;39(1):87 -93.

328. McManaman JL, Palmer CA, Wright RM, Neville MC. Functional regulation of xanthine oxidoreductase expression and localization in the mouse mammary gland:

178

evidence of a role in lipid secretion. The Journal of physiology. 2002;545(Pt 2):567 -

79.

329. McManaman JL, Bain DL. Structural and conformational analysis of the oxidase to dehydrogenase conversion of xanthine oxidoreductase. The Journal of biological chemistry. 2002;277(24):21261-8.

330. Tang XH, Gudas LJ. Retinoids, retinoic acid receptors, and cancer. Annual review of pathology. 2011;6:345-64.

331. Fini MA, Monks J, Farabaugh SM, Wright RM. Contribution of xanthine oxidoreductase to mammary epithelial and breast cancer cell differentiation in part modulates inhibitor of differentiation-1. Molecular cancer research : MCR.

2011;9(9):1242-54.

332. Cook W, Chu R, Saksela M, Raivio K, Yeldandi A. Differential immunohistochemical localization of xanthine oxidase in normal and neoplastic human breast epithelium. International journal of oncology. 1997;11(5):1013 -7.

333. Taibi G, Carruba G, Cocciadiferro L, Granata OM, Nicotra CMA. Low Levels of Both Xanthine Dehydrogenase and Cellular Retinol Binding Protein Ar e

Responsible for Retinoic Acid Deficiency in Malignant Human Mammary Epithelial

Cells. Annals of the New York Academy of Sciences. 2009;1155(1):268 -72.

334. Taibi G, Carruba G, Miceli V, Cocciadiferro L, Nicotra CM. Estradiol decreases xanthine dehydrogenase enzyme activity and protein expression in non- tumorigenic and malignant human mammary epithelial cells. Journal of cellular biochemistry. 2009;108(3):688-92.

335. Yue W, Yager JD, Wang JP, Jupe ER, Santen RJ. Estrogen receptor - dependent and independent mechanisms of breast cancer carcinogenesis. Steroids.

2013;78(2):161-70.

336. Battelli MG, Polito L, Bortolotti M, Bolognesi A. Xanthine oxidoreductase in cancer: more than a differentiation marker. Cancer medicine. 2016;5(3):546 -57.

179

337. Ganguli N, Ganguli N, Usmani A, Majumdar SS. Isolation and functional characterization of buffalo (Bubalus bubalis) beta-casein promoter for driving mammary epithelial cell-specific gene expression. Journal of biotechnology.

2015;198:53-9.

338. Lipkie TE, Morrow AL, Jouni ZE, McMahon RJ, Ferruzzi MG. Longitudinal

Survey of Carotenoids in Human Milk from Urban Cohorts in China, Mexico, and the

USA. PloS one. 2015;10(6):e0127729.

339. Jones E, Spencer SA. Optimising the provision of human milk for preterm infants. Archives of disease in childhood Fetal and neonatal edition.

2007;92(4):F236-8.

340. Zaragoza R, Garcia-Trevijano ER, Lluch A, Ribas G, Vina JR. Involvement of

Different networks in mammary gland involution after the pregnancy/lactation cycle:

Implications in breast cancer. IUBMB life. 2015;67(4):227-38.

341. van Golen KL, Davies S, Wu ZF, Wang Y, Bucana CD, Root H, et al. A novel putative low-affinity insulin-like growth factor-binding protein, LIBC (lost in inflammatory breast cancer), and RhoC GTPase correlate wit h the inflammatory breast cancer phenotype. Clinical cancer research : an official journal of the

American Association for Cancer Research. 1999;5(9):2511-9.

342. Yang J, Kennelly JJ, Baracos VE. Physiological levels of Stat5 DNA binding activity and protein in bovine mammary gland. Journal of animal science.

2000;78(12):3126-34.

343. Yang J, Kennelly JJ, Baracos VE. The activity of transcription factor Stat5 responds to prolactin, growth hormone, and IGF -I in rat and bovine mammary explant culture. Journal of animal science. 2000;78(12):3114-25.

344. Need EF, Atashgaran V, Ingman WV, Dasari P. Hormonal regulation of the immune microenvironment in the mammary gland. Journal of mammary gland biology and neoplasia. 2014;19(2):229-39.

180

345. Miller DS, Sen M. Potential role of WISP3 (CCN6) in regulating the accumulation of reactive oxygen species. Biochemical and biophysical research communications. 2007;355(1):156-61.

346. Davis L, Chen Y, Sen M. WISP-3 functions as a ligand and promotes superoxide dismutase activity. Biochemical and biophysical research communications. 2006;342(1):259-65.

347. Patra M, Mahata SK, Padhan DK. CCN6 regulates mitochondrial function.

2016;129(14):2841-51.

348. Chen H, Zhang H, Lee J, Liang X, Wu X, Zhu T, et al. HOXA5 acts directly downstream of retinoic acid receptor beta and contributes to retinoic acid -induced apoptosis and growth inhibition. Cancer research. 2007;67(17):8007 -13.

349. Shilkaitis A, Green A, Christov K. Retinoids induce cellular senescence in breast cancer cells by RAR-beta dependent and independent pathways: Potential clinical implications (Review). International journal of oncology. 2015;47(1):35 -42.

350. Khan S, Wall D, Curran C, Newell J, Kerin MJ, Dwyer RM. MicroRNA -10a is reduced in breast cancer and regulated i n part through retinoic acid. BMC cancer.

2015;15:345.

351. Liu RZ, Garcia E, Glubrecht DD, Poon HY, Mackey JR, Godbout R. CRABP1 is associated with a poor prognosis in breast cancer: adding to the complexity of breast cancer cell response to retinoic acid. Molecular cancer. 2015;14:129.

352. Flamini MI, Gauna GV, Sottile ML, Nadin BS, Sanchez AM, Vargas-Roig LM.

Retinoic acid reduces migration of human breast cancer cells: role of retinoic acid receptor beta. Journal of cellular and molecular medicine. 201 4;18(6):1113-23.

353. Liu C, Wu HT, Zhu N, Shi YN, Liu Z, Ao BX, et al. Steroid receptor RNA activator: Biologic function and role in disease. Clinica chimica acta; international journal of clinical chemistry. 2016;459:137-46.

181

354. Wei S, Kozono S, Kats L, Nechama M, Li W, Guarnerio J, et al. Active Pin1 is a key target of all-trans retinoic acid in acute promyelocytic leukemia and breast cancer. Nature medicine. 2015;21(5):457-66.

355. Farias EF, Arapshian A, Bleiweiss IJ, Waxman S, Zelent A, Mira YLR.

Retinoic acid receptor alpha2 is a growth suppressor epigenetically silenced in

MCF-7 human breast cancer cells. Cell growth & differentiation : the molecular biology journal of the American Association for Cancer Research. 2002;13(8):335 -

41.

356. Sharma RB, Wang Q, Khillan JS. Amplification of tumor inducing putative cancer stem cells (CSCs) by vitamin A/retinol from mammary tumors. Biochemical and biophysical research communications. 2013;436(4):625-31.

357. Chen L, Khillan JS. A novel signaling by vitamin A/retinol promotes self renewal of mouse embryonic stem cells by activating PI3K/Akt signaling pathway via insulin-like growth factor-1 receptor. Stem cells. 2010;28(1):57-63.

358. de The H, Vivanco-Ruiz MM, Tiollais P, Stunnenberg H, Dejean A.

Identification of a retinoic acid responsive element in the retinoic acid receptor beta gene. Nature. 1990;343(6254):177-80.

359. Xu XC. Tumor-suppressive activity of retinoic acid receptor-beta in cancer.

Cancer letters. 2007;253(1):14-24.

360. de The H, Chomienne C, Lanotte M, Degos L, Dejean A. The t(15;17) translocation of acute promyelocytic leukaemia fuses the retinoic acid receptor alpha gene to a novel transcribed locus. Nature. 1990;347(6293):558 -61.

361. de The H, Tiollais P, Dejean A. The retinoic acid recep tors. Nouvelle revue francaise d'hematologie. 1990;32(1):30-2.

362. Russo J, Santucci-Pereira J, Russo IH. The genomic signature of breast cancer prevention. Genes (Basel). 2014;5(1):65-83.

363. Byers S, Pishvaian M, Crockett C, Peer C, Tozeren A, Sporn M, et al.

Retinoids increase cell-cell adhesion strength, beta-catenin protein stability, and

182

localization to the cell membrane in a breast cancer cell line: a role for serine kinase activity. Endocrinology. 1996;137(8):3265-73.

364. Easwaran V, Pishvaian M, Salimuddin, Byers S. Cross-regulation of beta- catenin-LEF/TCF and retinoid signaling pathways. Current biology : CB.

1999;9(23):1415-8.

365. Hatsell S, Rowlands T, Hiremath M, Cowin P. Beta-catenin and Tcfs in mammary development and cancer. Journal of mammary gland biology and neoplasia. 2003;8(2):145-58.

366. Shah S, Pishvaian MJ, Easwaran V, Brown PH, Byers SW. The role of cadherin, beta-catenin, and AP-1 in retinoid-regulated carcinoma cell differentiation and proliferation. The Journal of biological chemistry. 2002;277(28):25313-22.

367. Vallorosi CJ, Day KC, Zhao X, Rashid MG, Rubin MA, Johnson KR, et al.

Truncation of the beta-catenin binding domain of E-cadherin precedes epithelial apoptosis during prostate and mammary involution. The Journal of biological chemistry. 2000;275(5):3328-34.

368. Xue Z, Hennelly S, Doyle B, Gulati AA, Novikova IV, Sanbonmatsu KY, et al.

A G-Rich Motif in the lncRNA Braveheart Interacts with a Zinc -Finger Transcription

Factor to Specify the Cardiovascular Lineage. Molecul ar cell. 2016;64(1):37-50.

369. Chu C, Quinn J, Chang HY. Chromatin isolation by RNA purification

(ChIRP). Journal of visualized experiments : JoVE. 2012(61).

370. Chu C, Chang HY. Understanding RNA-Chromatin Interactions Using

Chromatin Isolation by RNA Purification (ChIRP). Methods in molecular biology

(Clifton, NJ). 2016;1480:115-23.

371. Goyal A, Myacheva K, Gross M, Klingenberg M, Duran Arque B, Diederichs

S. Challenges of CRISPR/Cas9 applications for long non-coding RNA genes.

Nucleic acids research. 2016.

372. Malih S, Saidijam M, Malih N. A brief review on long noncoding RNAs: a new paradigm in breast cancer pathogenesis, diagnosis and therapy. Tumour biology :

183

the journal of the International Society for Oncodevelopmental Biology and

Medicine. 2015.

373. Hansji H, Leung EY, Baguley BC, Finlay GJ, Askarian-Amiri ME. Keeping abreast with long non-coding RNAs in mammary gland development and breast cancer. Frontiers in genetics. 2014;5:379.

374. Zhou T, Kim Y, MacLeod AR. Targeting Long Noncoding RNA wi th Antisense

Oligonucleotide Technology as Cancer Therapeutics. Methods in molecular biology

(Clifton, NJ). 2016;1402:199-213.

184