COMPARATIVE GENE EXPRESSION ANALYSIS TO IDENTIFY COMMON FACTORS IN MULTIPLE CANCERS
DISSERTATION
Presented in Partial Fulfillment of the Requirements for
the Degree Doctor of Philosophy in the Graduate
School of The Ohio State University
By
Leszek A. Rybaczyk, B.A.
*****
The Ohio State University
2008
Dissertation Committee:
Professor Kun Huang, Adviser
Professor Jeffery Kuret
Professor Randy Nelson
Professor Daniel Janies
Approved by
------
Adviser
Integrated Biomedical Science Graduate Program
ABSTRACT
Most current cancer research is focused on tissue-specific genetic mutations.
Familial inheritance (e.g., APC in colon cancer), genetic mutation (e.g., p53), and overexpression of growth receptors (e.g., Her2-neu in breast cancer) can potentially lead to aberrant replication of a cell. Studies of these changes provide tremendous information about tissue-specific effects but are less informative about common changes that occur in multiple tissues. The similarity in the behavior of cancers from different organ systems and species suggests that a pervasive mechanism drives carcinogenesis, regardless of the specific tissue or species. To detect this mechanism, I applied three tiers of analysis: hypothesis testing on individual pathways to identify significant expression changes within each dataset, intersection of results between different datasets to find common themes across experiments, and Pearson correlations between individual genes to identify correlated genes within each dataset. By comparing a variety of cancers from different tissues and species, I was able to separate tissue- and species-specific effects from cancer-specific effects. I found that downregulation of Monoamine Oxidase A is an indicator of this pervasive mechanism and can potentially be used to detect pathways and functions related to the initiation, promotion, and progression of cancer.
Dedicated to my wife
ACKNOWLEDGMENTS
I want to thank my adviser, Dr. Kun Huang, for his seemingly unending patience, guidance, and advice, without which I never would have finished this research.
I am indebted to Dr. Jared Butcher for his constant support and input that proved invaluable during my research. I am also grateful to Drs. Donald Holzschu, Meredith
Bashaw, and Scott Moody for encouraging me to pursue academia.
I want to especially acknowledge my committee members, Drs. Randy Nelson, Jeff Kuret, and Dan Janies, who gave up valuable time and resources so that I could succeed.
I wish to thank Dr. Christopher Hans for volunteering to be the graduate studies representative on my committee.
I want to express my gratitude to both sets of my parents, Drs. Pramod and
Dorothy Pathak as well as Mr. and Mrs. Jerome McNally for all their help during the course of my training.
I also wish to acknowledge the administrative staff in my program who shepherded me through this difficult process.
VITA
April 23, 1980……………………...……………...... Born – Albuquerque, New Mexico
2005……………………………………………………B.A. Psychology, Ohio University
2005-present……………………Graduate Research Associate, The Ohio State University
PUBLICATIONS
Research Publication
1. L.A. Rybaczyk, M.J. Bashaw, D.R. Pathak, S. Moody, R. Gilders, D. Holzschu, “An overlooked connection: serotonergic mediation of estrogen-related physiology and pathology.” BMC Women’s Health, vol. 5; (2005): 12. (Highly accessed)
2. L.A. Rybaczyk, M.J. Bashaw, D.R. Pathak, K. Huang, “An indicator of cancer: downregulation of Monoamine Oxidase-A in multiple organs and species.” BMC Genomics, 9(1):134, 2008. (Highly accessed)
FIELDS OF STUDY
Major Field: Integrated Biomedical Sciences
TABLE OF CONTENTS
Page
Abstract……………………………………………………………………………………ii
Dedication……………………………………………………………………………...…iii
Acknowledgements……………………………………………………………………….iv
Vita………………………………………………………………………………………...v
List of Tables...... ix
List of Figures...... x
Chapters
1. Introduction………………………………..………………………………………1
1.1 Serotonin and Cancer.....……….………………………………………………3
1.2 Comparative Analysis of Gene Expression in Multiple Cancers……………...4
1.3 Organization of this Dissertation…….………………………………...... 6
2. Genechip Technology…………….………………………………………………..7
2.1 Biological Issues…………..…………………………………………………..9
2.2 Current Statistical Approaches………………………………………………10
2.3 Summary……………………………………………………………………..15
3. Serotonin Physiology in Multiple Pathologies with a Focus on Cancer…………17
3.1 Serotonin Regulation………………………………………………………...18
3.2 Serotonin in the Central Nervous System...………….………………………20
3.3 Serotonin in the Musculoskeletal System……………………………………24
3.4 Serotonin in the Vascular System……………………………………………26
3.5 Serotonin in the Immune System…………………………………………….29
3.6 Serotonin in Cancer………………………………………...………………..33
3.7 Summary………………………………………………....…………………..37
4. Hypothesis Testing of the Tryptophan/Serotonin Metabolic Pathway……….….39
4.1 Methods…….……………………………………………..…………………42
4.2 Results……....………………………………………………………………..44
4.3 Discussion..…………………………………………………………………..46
4.4 Summary……....……………………………………………………………..46
5. Whole Genome Analysis……………………….………………………………...48
5.1 Methods……..…………………….…………………………………….……50
5.1.1 Dataset Collection……………………………………………….…51
5.1.2 Dataset Handling…………………………………………………...52
5.1.3 Gene Selection……………………………………………………..52
5.2 Results………………………………………………………………………..56
5.2.1 Frequency of Differential Expression for Genes…………………..57
5.2.2 Human Genes………………………………………………………58
5.3 Discussion……..……………………………………………………………..59
5.4 Summary……………………………………………………………………..60
6. Correlating MAO-A Expression to Identify Differentially Expressed Pathways..61
6.1 Methods………………………………………………………………………62
6.1.1 Dataset Selection.…………………………………………………..62
6.1.2 Correlations………………………………………………………...63
6.2 Results………………………………………………………………………..64
6.3 Discussion……………………………………………………………………64
6.4 Summary…………...…………………………………………………………66
7. Conclusions and Future Directions……………………………...………………67
7.1 Conclusions and Future Directions for Tier I: Hypothesis Testing of the Tryptophan/Serotonin Metabolic Pathway……………...……..……………..68
7.2 Conclusions and Future Directions for Tier II: Whole Genome Analysis…...70
7.3 Conclusions and Future Directions for Tier III: Correlating MAO-A Expression to Identify Differentially Expressed Pathways…………….…….71
7.4 Conclusion…………...………..……………...... ……………………………73
References…………………………………………………………….………………….75
Appendix A Tables………………….………………………………………………..…106
Appendix B Figures….………………………………………………………………….136
LIST OF TABLES
Table Page
1 Description of first datasets identified for analysis……………………………..107
2 Genes listed in the tryptophan pathway in KEGG……………………………...110
3 Descriptive information on human datasets extracted………………………….112
4 Descriptive information on paired datasets extracted from GEO ……………...113
5 Descriptive information on animal datasets extracted from GEO ……………..114
6 The genes with a frequency of 11 out of 19…………………………………….115
7 Genes with frequency of occurrences more than 22 out of 40……………….....116
8 The DAVID output of gene function clustering of the genes with frequency of occurrences more than 22 out of 40...... 118
9 The top six signaling networks identified using Ingenuity Pathway Analysis with a frequency of occurrences more than 22 out of 40...... 122
10 The genes with significant frequency of occurrences in only human datasets....123
11 The DAVID output of gene function clustering of the genes with frequency of occurrences more than 19 out of 32...... 125
12 The top six signaling networks from Ingenuity Pathway Analysis classification of the genes with frequency of occurrences more than 18 out of 32...... 134
13 The top six signaling networks from Ingenuity Pathway Analysis classification of the genes that correlated with MAO-A...... 135
LIST OF FIGURES
Figure Page
1 A flow chart representing the analytical technique used...... 137
2 Expression of MAO-A in normal and cancer tissue samples...... 138
3 CDF of the Beta distribution for L=19 datasets...... 139
4 CDF of the Beta distribution for L=40 datasets...... 140
5 CDF of the Beta distribution for L=32 datasets...... 141
6 A histogram of the frequencies of common differentially expressed genes for the 19 datasets (Group A)...... 142
7 A histogram of the gene frequencies for 40 datasets (Group B)...... 143
8 A graph representing the significance of the various pathways for 40 datasets based on an Ingenuity Pathway Analysis...... 144
9 The distribution of significant genes in humans...... 145
10 A graph representing the significance of the various pathways for 32 human datasets based on an Ingenuity Pathway Analysis...... 146
11 Graph representing the significance of the various pathways for 32 human datasets based on an Ingenuity Pathway Analysis...... 147
12 A representation of the G2/M check point...... 148
13 The glycolytic/gluconeogenic pathway generated by IPA...... 149
14 An enlargement of the portion of the gluconeogenic pathway that negatively correlated with MAO-A...... 150
CHAPTER 1
INTRODUCTION
Many diseases have defied the ability of modern research to identify treatments and etiologies. The inability to measure thousands of genes simultaneously was one major impediment. The rapid advancement of genetic techniques including qPCR and genechip technology that occurred in the 1990’s dramatically accelerated the pace of research today. Being able to identify multiple genes whose regulation is altered has led to the discovery of genes that were previously thought not to be involved with disease. The drawback to this explosion of information is that the methods and statistics needed to handle and analyze this much information do not necessarily exist.
Bioinformatics is the science of creating, maintaining and analyzing large databases of biomedical information. A major branch of bioinformatics is the study of gene expression profiles using genechip technology. Genechips measure the expression of thousands of mRNAs simultaneously, and can be used to identify similar gene expression changes that occur in multiple types of cancer.
The potential knowledge that can be gained from reanalysis of genechip data has led to the creation of publicly available databases composed of thousands of genechip experiments. In fact, all National Institutes of Health (NIH) funded grants are required to make their data available to the public for reanalysis. One of the largest databases is the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo) maintained by the National Center for Biotechnology Information (NCBI). NCBI’s GEO consists of data derived from a
variety of genechip experiments encompassing multiple types of disease. The
availability of a large amount of genechip data from different disease types
enables researchers to carry out reanalysis to identify previously unexplored genes
as candidates for diagnostic markers or therapeutic targets.
With thousands of genechip experiments performed every year, the sheer
volume of data is astounding. One way to reduce the vast nature of the data is to
examine epidemiologic risk factors and associate them with molecular changes
for a specific disease, such as cancer. However, in cancers a single factor can be both protective and detrimental, because the effect of any risk factor depends on the specific tissue. For example, exposure to polycyclic aromatic hydrocarbons (PAHs) increases the incidence of liver cancer but is protective for breast cancer [1, 2]. In contrast, excessive levels of 17β-estradiol (E2) have been shown to be a risk factor for both breast and liver cancer. These two disparate compounds are both ligands for nuclear receptors. E2 binds to multiple subtypes of estrogen receptors, which then drive gene transcription downstream of estrogen response elements. PAHs behave in a similar fashion by binding to the aryl hydrocarbon receptor (AhR), which then initiates transcription downstream of xenobiotic/hypoxic response elements. There is much literature on the relationship between E2 and the receptor for PAHs [3]. One aspect of this literature is the relationship of both estrogen and PAHs to the small molecule serotonin (5-HT) [4, 5]. The role of 5-HT and its precursor tryptophan in cellular physiology suggests that the metabolism of tryptophan, and as a result serotonin metabolism, may be involved in the initiation, promotion, and/or progression of cancers in general.
1.1 Serotonin and Cancer
Although conventionally thought of as a neurotransmitter, 5-HT is active throughout the body and acts as an effector for multiple hormones, especially E2.
By acting as E2’s effector, 5-HT is pivotal in reproduction and wound healing.
One of the first genes to be expressed by trophoblasts is Monoamine Oxidase-A
(MAO-A), the enzyme that metabolizes serotonin. It is expressed to degrade the
5-HT that is released by maternal platelets during implantation into the uterine lining. Serotonin and its precursor tryptophan are critical to the maintenance of the maternal-fetal interaction. They act in conjunction with other factors to provide fetal immune privilege, promote angiogenesis, act as clotting factors, prevent apoptosis, act as growth signals, initiate invasion of the placental cells
into the uterine wall, signal for mitosis, and finally, after birth, reactivate the immune system. The biologic characteristics of the maternal-fetal interaction are
analogous to the hallmarks of cancer and could potentially be used to explain
many of the cellular characteristics associated with cancer. To elucidate the role of tryptophan and its derivatives on carcinogenesis, I used genechip data and compared gene expression between cancerous tissues and normal tissues for
individual types of cancer and then identified expression changes that were
common to a variety of cancer types. Initially, I focused on the reliability of expression changes within tryptophan metabolism. I then expanded my analysis to the entire genome. Lastly, I used my findings to elucidate biological processes that appear to be common among cancers.
1.2 Comparative Analysis of Gene Expression in Multiple Cancers
Because of the inherent problems associated with current genechip analytical techniques, I had to develop a new methodology that allowed me to detect changes in gene expression common to different cancers in multiple species. One goal of this project was to explore new statistical methods for carrying out comparative studies on gene expression profiles across multiple datasets that represent the same disease process. Here, I take a three-tier approach.
First, I identify the genes that are differentially expressed in each cancer by conducting hypothesis testing on individual genes to detect significant expression changes. Second, I aggregate the lists of the differentially expressed genes for different cancers to find common functional processes across multiple types of
cancer. Third, I conduct Pearson correlations between individual genes to identify conserved networks among the various cancers.
In essence, my approach can be considered a hybrid numerical-semantic method: the first stage is the traditional numerical method, the second stage is a quantitative analysis of the semantic information (the gene list), and the third stage is a combination of the numeric and semantic information. These approaches have been shown to be very effective in both the literature and my preliminary work [6, 7]. By using similarities and differences in gene expression profiles, I can identify different factors and molecular signatures that are important to different cancers.
I used this approach to identify genes that are differentially expressed in the majority of cancers. By analyzing an ensemble of genechip datasets
representing a variety of cancerous tissues and treating each dataset as a single
replicate of an experiment (Appendix B, Figure 1), I was able to identify common
expression changes in cancer that are novel and substantial. I found that one gene
in particular, MAO-A, was always changed. I then expanded the scope of my
work and found that MAO-A suppression was associated with a number of the
classical cancer hallmarks. Finally, I discovered that MAO-A downregulation
correlates with a biological network whose role in cancer is well characterized. This
network, the G2/M checkpoint, is associated with initiation, promotion, and
progression in multiple cancers.
1.3 Organization of this Dissertation
Immediately following in Chapter 2, I describe genechip technology, its biological applications, and the current state of the statistical procedures used. In Chapter 3, I provide a detailed review of the literature and describe normal serotonin physiology and its role in disease. I have shown that serotonin, acting as a hormone, plays important and consistent roles in many diseases, and I describe in detail its proposed role in breast cancer etiology [8]. Chapter 4 describes my first study, which began by examining the role of the serotonin precursor, tryptophan, in cancer. Ultimately I found that serotonin played a role in multiple cancers through the downregulation of MAO-A.
In Chapter 5, I expand my analysis to the entire genome and also expand my dataset to include other types of cancer such as sarcomas and lymphomas. I then focus strictly on human cancers to eliminate differences that might exist because of species specificity. In Chapter 6, I test for MAO-A associated genes by conducting multiple correlations on the eight human cancer datasets in which both control and cancerous tissues are from the same patient (paired data). I conclude with Chapter 7, where I discuss the findings from Chapters 4, 5, and 6. These findings provided the evidence that led me to hypothesize a putative mechanism for cancer, and I present my conclusions along with future directions for this work.
CHAPTER 2
GENECHIP TECHNOLOGY
Genechips (sometimes called microarrays) are composed of complementary RNA (cRNA) or complementary DNA (cDNA) oligonucleotides that are bound to a hard substrate; the combined oligonucleotides and substrate are called a genechip. There are two primary methods by which genechips are made. The older is “spotting” RNA oligomers (50-500bp) on a substrate such as a glass slide. A newer method is to synthesize the strands directly on the chip. The oligomers in this method are usually shorter (25-50bp) and are more densely packed than on spotted arrays. By convention, the strands are called probes or probesets, and the probesets that code for the same gene in a particular location on the chip are referred to as a spot. The exact methodologies used to manufacture genechips are well described but beyond the scope of this work.
The purpose of genechips is to measure the extent of expression of various genes. To do this, mRNA is isolated from a biological sample and reverse transcribed to make cDNA. The cDNA is then transcribed using biotin-labeled oligonucleotides to make biotin-labeled cRNA. Following transcription, the labeled RNA is conjugated with a fluorescent tag and is then placed on the genechip, where it hybridizes to the respective complementary oligonucleotide strands. The
chip is then washed to remove any unbound cRNA and placed in a fluorometer to
measure the fluorescent intensity for each spot.
Affymetrix is the leading manufacturer of genechips, and a typical Affymetrix chip can measure the expression of over 22,000 genes. Multiple genechips can be used to compare mRNA levels between different conditions, time points, species, and tissues. In addition, genechips can be used to determine the expression of splice variants.
Splice variants can be detected using traditional techniques, but testing for more than a small number of variants is highly impractical. Multiple splice variants can be identified by using genechips since there are multiple spots for each gene and each spot can measure the expression of a different exon [9]. In this way, genechips measure not only the expression of a particular gene but also the proportion in which each variant is expressed.
Techniques that are able to identify genetic expression changes such as
splice variants, decreased transcription of an enzyme, or increased production of
growth factors provide insight into how disease exerts its pathogenic effect.
Genechips do these tests en masse. A common misconception is that genechips measure genotypic changes. While there are chips that detect single nucleotide polymorphisms (SNPs), the majority of genechip experiments examine mRNA levels, which are a phenotype. By measuring these levels, genechip data can be used to identify common phenotypic changes that occur during the disease process.
2.1 Biological Issues
Genechips have been criticized for being less accurate than other methods used to measure mRNA, such as qPCR and Northern blotting. The main objection is that there is more noise associated with genechips compared to other techniques. The probesets used on genechips are shorter than those used in other techniques, potentially allowing for binding of inappropriate mRNA. Also, once the mRNA is bound to the chip, no amplification of the signal takes place. The short probesets and lack of amplification are two of the major contributors to the noise associated with genechips.
Further complicating the analysis is the difference in probeset affinities, a problem inherent to all genetic techniques. The affinity of individual spots can vary greatly depending on GC content, making comparisons between different genes difficult. This can partially be overcome by normalization, and the best technique is called GC robust multi-array average (GCRMA). GCRMA takes into account the intensities on all the chips used in the experiment as well as the GC content of each probeset. Another way to normalize arrays is by inclusion of internal standards. Affymetrix provides a set of standardization spots that can be used to create normalized intensities among chips. Although useful for Affymetrix datasets, these standardization probesets do not create a measure that can be used with other manufacturers’ chips, custom chips, or between species.
The affinity of the probesets between the various chips can be highly variable, and even calculating the affinity based on GC content and correcting for it does not guarantee that values will be comparable. Minute differences in processing methods between the different chip types and/or individual laboratory practices can drastically affect the mRNA binding and alter the affinity. Small differences in temperature can cause dramatic changes in affinity, making direct comparison of chips from different datasets impossible.
2.2 Current Statistical Approaches
Currently microarrays are used in a manner similar to large scale qPCR.
Unfortunately, often genes that are thought not to be relevant to the study are simply unanalyzed and filtered out. This is done to simplify interpretation of the results, which can be extraordinarily difficult when so many genes are measured simultaneously. By excluding certain genes, the interpretation is simplified and the probability of Type I error, or false positives, is decreased. The drawback of excluding genes is that certain genes may be highly involved but remain unidentified (Type II error). However, initially decreasing the number of genes by using domain knowledge to increase the power of the analysis is the most pragmatic way of improving the quality of the analysis, as long as a broader spectrum analysis follows.
Current statistical techniques are not the best choice for analyzing genomic expression changes. Student’s t-test is the most widely used statistical test to determine significance between two populations. One major flaw with t-tests is that they assume that populations have normal distributions and are homoscedastic; these assumptions are rarely true for real-life data, but the resulting flaws in analysis are currently accepted. In some instances, when the original distribution is log normal, the assumption of normality can be met by using a log transformation. Non-parametric techniques such as the Wilcoxon-Mann-Whitney test or the Spearman rank correlation avoid the issue of distribution by utilizing ranking. This way there is no assumption of having two normal distributions. Unfortunately, they still assume homoscedasticity.
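The contrast between the t statistic and a rank-based statistic can be illustrated with a small sketch. The intensities below are invented for illustration, and the helper names `student_t` and `mann_whitney_u` are hypothetical; this is not the analysis code used in this work. The sketch shows that a log transformation changes the pooled-variance t statistic, whereas the Mann-Whitney U statistic, which depends only on ordering, is unchanged.

```python
import math
from statistics import mean, variance


def student_t(x, y):
    """Pooled-variance (Student) t statistic for two independent samples."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))


def mann_whitney_u(x, y):
    """U statistic for x: count of (x_i, y_j) pairs with x_i > y_j; ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


# Invented, right-skewed intensities for one gene (one outlier per group).
control = [120.0, 150.0, 135.0, 980.0, 140.0]
tumor = [60.0, 75.0, 55.0, 70.0, 410.0]

log_control = [math.log2(v) for v in control]
log_tumor = [math.log2(v) for v in tumor]

t_raw = student_t(control, tumor)
t_log = student_t(log_control, log_tumor)
u_raw = mann_whitney_u(control, tumor)
u_log = mann_whitney_u(log_control, log_tumor)

print(f"t (raw) = {t_raw:.3f}, t (log2) = {t_log:.3f}")
print(f"U (raw) = {u_raw}, U (log2) = {u_log}")
```

Because the logarithm is monotone, the ranks and therefore U are identical on both scales, while the t statistic depends on how strongly the outliers inflate the variance.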
Besides the problems of assumption of normality and/or homoscedasticity, performing multiple tests increases the Type I error. Due to the inherent statistical properties associated with running multiple tests, on average α × n false positives will be present, α being the p-value threshold and n being the number of tests. To correct for this, established statistical techniques are applied. The most popular correction for multiple tests is the Bonferroni-Holm (BH) correction [10]. The BH correction is a stepwise correction, in contrast to the traditional Bonferroni correction, which is often considered too stringent [11]. Another metric that is used to try to overcome multiple testing is the false discovery rate (FDR). Although there are many ways of calculating the FDR, the most common is to multiply the p-value from a t statistic by the number of genes discovered. If a gene x has a p-value of 0.001 and 200 genes were discovered to be below your threshold α, then the FDR would be 0.001 × 200, or 0.2 [11]. Ultimately, the FDR is simply the proportion of genes discovered that are expected to be false positives. The FDR is not a significant improvement given all the limitations of a t-test.
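As a minimal sketch of the corrections discussed above, the following compares the expected number of false positives with the decisions made by the Bonferroni and Bonferroni-Holm procedures. The p-values are invented, and the function names are hypothetical; the code is meant only to make the stepwise logic concrete.

```python
def expected_false_positives(alpha, n_tests):
    """Average number of false positives when every null hypothesis is true."""
    return alpha * n_tests


def bonferroni(p_values, alpha=0.05):
    """Reject only p-values at or below alpha divided by the number of tests."""
    n = len(p_values)
    return [p <= alpha / n for p in p_values]


def holm(p_values, alpha=0.05):
    """Stepwise Bonferroni-Holm: compare the k-th smallest p-value (k from 0)
    with alpha / (n - k) and stop at the first failure."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    reject = [False] * n
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (n - rank):
            reject[idx] = True
        else:
            break  # all larger p-values are retained as well
    return reject


# Invented p-values for five genes.
p_values = [0.001, 0.012, 0.014, 0.04, 0.20]

bonferroni_calls = bonferroni(p_values)
holm_calls = holm(p_values)

print(expected_false_positives(0.05, 22000))  # uncorrected tests on a 22,000-gene chip
print(bonferroni_calls)
print(holm_calls)
```

With these values the stepwise procedure rejects three hypotheses where the single-threshold Bonferroni correction rejects one, which is the sense in which the traditional correction is considered too stringent.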
While statistical techniques can be used to determine the probability of a false positive, they are still unable to identify which results are false positives. Several algorithms have been developed that attempt to address this issue, including the FDR, Significance Analysis of Microarrays (SAM) [11], and the Univariate Permutation Test (UPT) [12]. Unfortunately, all of these methods utilize the F and t statistics, which assume normal distributions. By increasing the number of replicates of an experiment, comparative analysis can isolate truly changed genes. This requires that multiple datasets that have potentially different characteristics be analyzed together.
Direct comparison of data from identical platforms by cross-renormalization using the standards provided by the manufacturer is not ideal. The majority of publicly accessible datasets have previously been normalized by one of several normalization methods. The most common is locally weighted regression, or lowess, which is a non-linear normalization. If the data were originally normalized using a non-linear normalization, such as lowess, reapplying normalization would skew the values. Normalization of genechip data is done in order to reduce the associated noise. Following normalization, noise is random and is equally distributed among all the samples. The random distribution of the noise has less effect than other confounding factors.
If the data are not from identical chip types, then differences in scale affect the ability to compare chips. Often there is a non-linear relationship between different chip types, making transformation of values infeasible. One could attempt to eliminate the issues of affinity (biological confound) and scale (statistical confound) by creating a point estimate of the value and variance for each gene’s expression change. However, this methodology would still retain the biological confound of affinity even though it could remove issues of scale. By using the ratio of change in gene expression between disease and control and the standard error of the mean for each group, instead of point estimates, affinity and scale become moot. Repeating this technique makes it obvious that genechips are capable of detecting much smaller changes than the accepted standard of at least a two-fold change in expression. Multiple datasets are needed to detect these minute changes. The noise associated with microarray work prevents any single experiment from detecting these small changes, but by using multiple datasets, changes below this threshold can be detected.
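The reason a ratio sidesteps affinity and scale can be sketched as follows. The sketch assumes, for illustration only, that a second chip type reports the same gene with a purely multiplicative affinity factor; the intensities, the factor of 3.5, and the function names are hypothetical, and expressing each standard error relative to its group mean is a choice made here to keep the summary unit-free.

```python
import math
from statistics import mean, stdev


def sem(values):
    """Standard error of the mean for one group of replicate measurements."""
    return stdev(values) / math.sqrt(len(values))


def expression_change(disease, control):
    """Summarize one gene in one dataset as a disease/control ratio plus
    each group's SEM expressed relative to that group's mean."""
    return {
        "ratio": mean(disease) / mean(control),
        "disease_rel_sem": sem(disease) / mean(disease),
        "control_rel_sem": sem(control) / mean(control),
    }


# Hypothetical intensities for one gene on chip type A.
control_a = [100.0, 110.0, 90.0, 105.0]
disease_a = [60.0, 66.0, 54.0, 63.0]

# The same samples on a hypothetical chip type B whose probes for this gene
# bind 3.5 times more strongly (assumed multiplicative affinity difference).
affinity_b = 3.5
control_b = [v * affinity_b for v in control_a]
disease_b = [v * affinity_b for v in disease_a]

summary_a = expression_change(disease_a, control_a)
summary_b = expression_change(disease_b, control_b)

print(summary_a)
print(summary_b)
```

Under this assumption the affinity factor cancels in every quantity, so the two chip types yield the same summary. A non-linear relationship between chip types would not cancel in this simple way.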
Even with increased sensitivity, statistical procedures such as the t-statistic only provide information about the absolute location of the central tendency for each group and do not address the relative central tendency between two groups or variables. The primary reason is that using only scalar values will produce only scalar results. Visual inspection of the data is often adequate to determine other information, such as the direction of change, when there are not many variables (fewer than 5 covariates). Evaluating the residuals post hoc provides information about the direction of change but is time consuming and impractical for large datasets. Knowing the direction of change is often just as important as knowing that there is a change.
Correlations address the direction of change between two variables. The Pearson’s product moment correlation (Pearson r), Spearman’s Rho (Spearman rank), and Kendall’s Tau are the most common correlation tests used¹. The Pearson r correlation is very sensitive to multiple factors including non-normality and non-linearity. The Spearman rank is less sensitive to non-linearity. Both the Spearman rank and the Pearson r measure the strength of association between two variables. Kendall’s Tau is slightly different and is used for analyzing ordinal values. Although it is possible to produce a correlation matrix for all genes using any of these tests, examining a matrix this large will produce a great many false positives and would be impractical to interpret. These problems can be solved by using domain knowledge of specific gene expression changes. By beginning with a known genetic expression change, a single correlation matrix can be rapidly generated and then filtered based on a threshold value.
¹ A fourth test, Goodman–Kruskal Gamma, is rarely used since it measures the difference between concordant ranks and discordant ranks.
Clustering, a common technique in bioinformatics, utilizes various aspects of the different correlation tests. Clustering splits the data into related groups.
Unfortunately, clustering algorithms primarily examine positive correlations and often ignore negative correlations. Biologically, there are times that a gene is suppressed and as the suppressor increases the gene expression decreases. This would be difficult to detect without examining negative correlations.
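A seed-based correlation that keeps negative as well as positive associations can be sketched as below. The expression values, the gene labels other than the seed, the 0.8 threshold, and the function names are all invented for illustration; the sketch filters on the absolute value of Pearson r so that suppressed genes are not discarded.

```python
import math


def pearson_r(x, y):
    """Pearson product moment correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlate_with_seed(expression, seed, threshold=0.8):
    """Correlate every gene with the seed gene and keep those whose |r|
    meets the threshold, so both signs of association are retained."""
    seed_profile = expression[seed]
    hits = {}
    for gene, profile in expression.items():
        if gene == seed:
            continue
        r = pearson_r(seed_profile, profile)
        if abs(r) >= threshold:
            hits[gene] = r
    return hits


# Invented log-scale expression across six samples; only the seed name is real.
expression = {
    "MAOA": [8.0, 7.5, 6.0, 5.5, 4.0, 3.5],
    "GENE_POS": [4.1, 3.9, 3.2, 2.9, 2.2, 1.8],
    "GENE_NEG": [1.0, 1.4, 2.1, 2.6, 3.3, 3.4],
    "GENE_FLAT": [5.0, 5.2, 4.9, 5.1, 5.0, 4.8],
}

hits = correlate_with_seed(expression, "MAOA")
for gene, r in sorted(hits.items()):
    print(f"{gene}: r = {r:+.3f}")
```

Computing one row of the correlation matrix per seed gene requires a number of correlations equal to the number of genes rather than its square, which is what makes the seed-based approach fast to generate and simple to filter.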
2.3 Summary
Current statistical techniques are not the best choice for analyzing genomic expression changes because of assumptions of normality and homoscedasticity. In addition, the family-wise error rate corrections are often too conservative and produce a great many false negatives. The situation is only made worse by differences in normalization techniques that make comparing different datasets to each other directly impossible. To be able to compare datasets across diseases, species, and different types of chips, techniques similar to those used in epidemiology, such as meta-analysis, can be used. But rather than adjusting values to create point estimates, multiple datasets can be analyzed, treating each dataset as a single replicate of a single experiment. This increases the power and decreases the Type I error of the experimental analysis. By utilizing multiple datasets, the variability within single datasets can be overcome and a better estimator can be created.
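The idea of treating each dataset as one replicate can be sketched by counting how often a gene is called differentially expressed across datasets and asking how surprising that count is. The sketch below uses a binomial null model with independent datasets and a fixed per-dataset call rate; both are assumptions made here for illustration, the gene sets are invented, and this is not presented as the calculation used in later chapters.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def recurrent_genes(calls_per_dataset, per_dataset_rate, alpha=0.05):
    """Count the datasets in which each gene was called and keep genes whose
    count is unlikely under the assumed binomial null."""
    n = len(calls_per_dataset)
    counts = {}
    for called in calls_per_dataset:
        for gene in called:
            counts[gene] = counts.get(gene, 0) + 1
    recurrent = {}
    for gene, k in counts.items():
        p_value = binomial_tail(k, n, per_dataset_rate)
        if p_value <= alpha:
            recurrent[gene] = (k, p_value)
    return recurrent


# Invented calls from six datasets; each set lists the genes called in one dataset.
calls_per_dataset = [
    {"MAOA", "GENE_X", "GENE_Y"},
    {"MAOA", "GENE_Y"},
    {"MAOA", "GENE_X"},
    {"MAOA", "GENE_Y"},
    {"MAOA"},
    {"MAOA", "GENE_Y"},
]

# Assumed chance that any one gene is called in any one dataset under the null.
recurrent = recurrent_genes(calls_per_dataset, per_dataset_rate=0.2)
for gene, (k, p_value) in sorted(recurrent.items()):
    print(f"{gene}: called in {k} of {len(calls_per_dataset)} datasets, p = {p_value:.5f}")
```

Because only the per-dataset call is carried forward, no intensity values are compared across chip types, which is what allows datasets from different platforms and species to be combined.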
CHAPTER 3
SEROTONIN PHYSIOLOGY IN MULTIPLE PATHOLOGIES WITH A
FOCUS ON CANCER
The effects of serotonin play a critical role in mammalian physiology.
Serotonin is highly regulated by estrogen, and estrogen receptors and serotonin receptors coexist in cells in a wide variety of tissues. In mammalian females, estrogen that acts extracellularly is primarily produced in the reproductive organs, and concentrations in blood serum and other tissues change over the lifespan and within the ovarian cycle.[13] The most active and most studied form of estrogen in mammals is 17-β estradiol (hereafter E2), although less active forms are also present.[14] Changes in E2 typically occur in conjunction with changes in progesterone, and are to some degree dependent on progesterone priming. I will primarily focus on physiological levels of E2 assuming the presence of progesterone between puberty and menopause, and assuming its absence after menopause. Differences in estrogen concentrations are associated with physiological changes affecting the CNS, skeletal, vascular, and immune systems.
The mechanisms producing these changes have yet to be fully elucidated. [15]
This critical review of the literature illustrates how many of E2’s effects are mediated by changes in the actions of serotonin (5HT). Serotonin is usually considered to be a neurotransmitter, but surprisingly, only 1% of serotonin in the human body is found in the brain.[16] The remaining 99% is found in other tissues, primarily plasma, the gastrointestinal tract, and immune tissues, where serotonin acts as a hormone regulating various physiological functions including vasodilation,[17] clotting,[18] recruitment of immune cells,[19-21] gastrointestinal motility,[22] and initiation of uterine contraction.[23, 24] Serotonin also has peripheral functions in a wide variety of animal phyla[25-28] and is similar in chemical structure to auxin, which regulates plant cell shape, growth, and movement.[29]
3.1 Serotonin Regulation
Both naturally occurring and pharmacologically induced changes in E2 alter the concentration of serotonin through several mechanisms. First, E2 increases transcription of both tryptophan hydroxylase (TPH)² and the protein needed to activate TPH, YWAH.[30-32] This causes an increase in serotonin concentrations in the body.[33, 34] Second, E2 inhibits expression of the serotonin reuptake transporter (SERT) gene and acts as an antagonist at SERT, thus promoting the actions of serotonin by increasing the time that it remains available in synapses and interstitial spaces.[35, 36] Finally, E2 downregulates MAO-A, the enzyme responsible for serotonin metabolism,[37] and thereby increases the amount of serotonin available.
² The conversion of tryptophan to 5-hydroxytryptophan via TPH1 or TPH2 is the rate-limiting step in the synthesis of serotonin from tryptophan.
Beyond increasing concentrations of serotonin, E2 also modulates the
actions of serotonin because the activation of E2 receptors affects the distribution
and state of serotonin receptors. Higher levels of E2 in the presence of
progesterone upregulate E2 β receptors (ERβ) and downregulate E2 α receptors
(ERα).[38] ERβ activation results in upregulation of the 5HT2A receptor,[39] while ERα activation results in an increase in 5HT1A receptors via nuclear factor
kappa B (NFkB).[40] Therefore, increasing E2 causes an increase in the density
and binding of the 5HT2A receptor,[41, 42] which could explain the observed
increases in 5HT2A density for post-menstrual teenage girls.[43] 5HT2A activity
stimulates an increase in intracellular Ca++,[44] which causes changes in cellular
function.[29, 45] 5HT2A activation subsequently causes Protein Kinase C (PKC)
activation. The effects of increased Ca++ and PKC in cells are system-specific and explain many of the physiological consequences of serotonin activation. One effect of PKC activation is the uncoupling of 5HT1A auto-receptors,[46] which decreases serotonin’s effect at these receptors.[47, 48] Following 5HT2A activation of PKC, 5HT1A receptors become unable to reduce serotonin production through negative feedback, and serotonin concentrations increase.[46-48] E2 compounds this effect by directly inhibiting 5HT1A function.[49, 50]
With reduced levels of E2, 5HT1A receptors are disinhibited and counter the effects of 5HT2A receptor activation. Increased activation of 5HT1A in the immune system results in greater mitotic potential via cyclic adenosine monophosphate (cAMP) and extracellular signal-regulated kinase (ERK).[51-54] Additionally, the reinstatement of 5HT1A auto-regulation decreases serotonin concentrations by allowing negative feedback inhibition of serotonin production and release. Normal physiology depends on maintaining a balance between 5HT2A receptor-produced Ca++ inflow and 5HT1A receptor suppression of cAMP production. Pathologies result when this balance is perturbed, and the specific manifestation of these pathologies depends on which system is affected.
3.2 Serotonin in the Central Nervous System
Changes in estrogen are correlated with a variety of effects in the CNS,
such as changes in pain transmission, headache, dizziness, nausea, temperature
regulation, and mood.[55] Serotonin systems regulate these same functions[55,
56] in a direction consistent with mediation of E2 effects. For pain, E2 acts as a
central analgesic,[57] and pain sensation is inhibited by the activation of some
serotonergic neurons.[16] Analgesic drugs that exploit this effect at the 5HT2A receptor are already available.[16, 58-61] I hypothesize that E2’s upregulation of the 5HT2A receptor in the brain might contribute to E2-mediated pain relief, in which case central administration of 5HT2A receptor antagonists would decrease E2’s analgesic effects. In the spinal cord, altered expression of 5HT2A receptors can both increase and decrease pain.[60, 61] E2’s upregulation of 5HT2A in the spinal cord could be a factor in the development of fibromyalgia, which presents as increased generalized pain sensation. Serotonergic regulation of fibromyalgia is supported by evidence that fibromyalgia is comorbid with other serotonin-related pathologies,[62] and that fibromyalgia patients have altered tryptophan metabolism[63] and can be treated with 5HT2A antagonists.[62] E2’s effect on serotonin could also explain why fibromyalgia is more frequently observed in females than males.[64] Cancer pain is similar to fibromyalgia in its neurogenic origin. Cancer pain patients have decreased tryptophan levels,[65] and drugs that affect the serotonergic system appear to be more effective analgesics than opioids.
Cancer pain is often linked to sleep disturbances and depression.[66]
Depression is more common in women than in men and is known to be mediated by serotonin receptor levels.[56, 67] Specifically, depression is linked to decreased density of serotonin receptors and decreased efficacy of serotonin in the brain. The increased risk, timing of onset, and effectiveness of treatment of depression in women may be mediated by estrogen’s effect on serotonin receptors. The onset of depression in women is a characteristic of times when estrogen levels are relatively low (in early pregnancy, postpartum, and around and following menopause) or low in comparison to progesterone (the luteal phase of the menstrual cycle).[68, 69] In women with depression around or following
menopause, the effectiveness of treatment with selective serotonin reuptake inhibitors (SSRIs) is enhanced by simultaneous administration of estrogen,[67] and doses of estrogen alone are effective at treating premenstrual, postpartum, and perimenopausal depression, especially for depression linked to aberrant expression of 5HT2A receptors.[39, 70] ERβ regulates the antidepressant effect of
E2 in mice; ERβ knockout mice fail to show the decrease in immobility usually
induced by E2 doses in a forced swim test. [71] The increased levels of serotonin
and increased activity of the 5HT2A receptor caused by E2 could be the
mechanism for E2’s antidepressant effects, in which case 5HT2A receptor agonists
could also enhance the antidepressant effects of E2. Interestingly, depression and serotonin levels are specifically linked to breast cancer, for which high E2 levels have been reported to be a risk factor. Among breast cancer patients, women who are depressed, as measured by decreased MAO-A levels, have significantly worse outcomes than those who are not.[72]
In addition, many cancer patients suffer from hot flashes[66] similar to
those that occur during menopause. The loss of estrogen at menopause results in
decreased density of 5HT2A receptors and lower activity of serotonin, which could
explain aberrant temperature regulation, including hot flashes and night sweats.
Although the effects of temperature changes are felt throughout the body, 5HT2A receptors in the CNS are responsible for temperature regulation. Administration of drugs acting at the 5HT2A receptor restores normal temperature regulation following ovariectomy[73] and chemically induced changes in body temperature.[74] The nighttime prevalence of hot flashes and night sweats could
be a result of the conversion of serotonin to melatonin at night, resulting in lower
circulating serotonin levels.[75] Phytoestrogens preferentially bind to ERβ
receptors[44] and are effective at reducing hot flashes and night sweats.[76] The
mechanism by which these compounds work could be an ERβ-produced
upregulation of 5HT2A receptors.
Two of the major side effects of hormone replacement therapy (the treatment for hot flashes) and of chemotherapy are dizziness and nausea, which are
controlled in the CNS. The mechanism by which these side effects occur has not
been fully elucidated. It is possible that E2’s effect on serotonin pathways is
responsible for these symptoms. 5HT2A receptors activate vestibular neurons
which maintain balance [77] and are found in emetic centers, which are involved
in chemically-induced vomiting [78]. My hypothesis is corroborated by the use of serotonergic drugs to minimize these side effects of both treatments.[79]
Migraines are common among cancer patients[80, 81] and females are also at greater risk for headaches,[56] which can result from vasodilation in the brain [82]. Activation of an additional serotonin receptor, 5HT1B, is one
mechanism by which vasodilation occurs. 5HT1B receptors are not uncoupled by
E2 (unlike 5HT1A receptors), and their vasodilatory effect is typically balanced by
activation of 5HT2A receptors, which results in vasoconstriction [83]. After E2
exposure, increased serotonin concentrations result in greater activation of both
the 5HT1B and 5HT2A receptors. Under normal conditions, upregulation and
activation of 5HT2A receptors enable them to balance the effects of 5HT1B receptors [41, 42, 84]. I suggest that females’ increased headache risk might result if high serotonin concentrations are maintained without adequate compensatory
5HT2A activity.
3.3 Serotonin in the Musculoskeletal System
E2 and 5-HT also affect the skeletal system. As bones grow, they are
continually remodeled and reshaped. Normal bone development is affected by
growth hormone, parathyroid hormone, calcitonin, and environmental factors like
dietary calcium intake and physical activity. In addition to these factors, estrogen
and serotonin play an important role in the development and maintenance of bone
mass. For bone growth to occur, two types of cells are required: osteoblasts,
which form new bone, and osteoclasts, which resorb bone. During puberty,
osteoclasts and osteoblasts are in balance and resorb and build bone
simultaneously, but osteoporosis results when osteoclasts increase relative to
osteoblasts. These effects have been linked to E2 concentrations in both males
and females,[85, 86] and I propose that they can be explained by examining E2-
produced changes in serotonergic function in bone growth and loss. 5HT2A receptor activation causes an increase in expression of osteoblast progenitor cells,
maintaining bone density.[87] SERT activation, in contrast, increases osteoclasts in bone, aiding in bone growth in childhood,[88] but resulting in loss of bone density and increases in extracellular Ca++ postpartum[89, 90] and in menopause.[91, 92] Studies of female mice lacking ERα, ERβ, or both suggest these two receptors might counterbalance each other’s effects on longitudinal bone growth,[93] with ERβ primarily responsible for decreasing bone growth and increasing bone resorption.[94] Because ERα and ERβ have opposing effects on serotonin systems, mediation by serotonin could explain E2’s effects on the skeletal system, such as the decrease in bone density observed following menopause or when E2 function is otherwise compromised. However, bone loss begins around age 30 in men and women, and this early bone loss cannot be entirely explained by differences in E2 concentrations or by my proposed model.[95]
In muscle, 5-HT regulates glucose transport by altering the expression and insertion of GLUT transporters independently of insulin.[96, 97] The serotonergic regulation of glucose transport is mediated via two different mechanisms. First, serotonin increases GLUT4 insertion in the muscle membrane via the 5HT2A receptor.[97, 98] Second, 5-HT increases transcription of GLUT4 and GLUT1 via a MAO-A-mediated mechanism.[96] Both GLUT4 and GLUT1 are passive transporters for glucose. GLUT1 is highly expressed in the placenta,[99] while GLUT4 is primarily expressed in more mature tissue.[100]
Premenopausal women show decreased levels of insulin insensitivity, while following menopause the risk of insulin insensitivity in women parallels that of men, if it is not greater.[101] This suggests that E2 is protective against the development of diabetes. It is paradoxical, then, that some women develop gestational diabetes when E2 levels are at their highest.[102] This suggests that some other mechanism is also involved in glucose transport and regulation. There seems to be a strong relationship between circulating interleukins and the development of gestational diabetes, and a strong relationship between pre-eclampsia and gestational diabetes.[102, 103] 5-HT is capable of altering both IL-6 and IL-8 levels, while MAO-A activity is altered in placental cells from pre-eclamptic pregnancies.[20, 104, 105] Cumulatively, these findings implicate the serotonergic system in the development of insulin insensitivity in females.
3.4 Serotonin in the Vascular System
In the vascular system, estrogen and serotonin have been shown to individually alter clotting, cholesterol, vasoconstriction, and heart attack risk
[106-109]. Both high and low levels of E2 have been associated with increased risk of thromboembolism; high levels result in increased clot formation, while low levels result in slower clot breakdown. Unusually high concentrations of estrogen
(beyond normal physiological levels) directly increase the likelihood of clotting by increasing production of clotting factors VII through X in the liver.[55] In
addition, these levels of E2 might increase clotting by increasing serotonin, which
is constitutively present in human plasma and platelets and works to promote
clotting[18] and increase density of platelets.[79] Increased clotting and
thromboembolism at low concentrations of E2[110] can also be explained using
serotonergic changes. Postmenopausal women have longer latency to lysis of
clots, and E2 replacement therapy returns latencies to pre-menopausal
levels.[111] Patients with slower clot breakdowns have decreased uptake and
release of serotonin from platelets,[112] and at low E2 levels serotonin’s ability to
break down clots via the 5HT2A receptor is limited,[113, 114] so I suggest that
lower serotonin activity associated with lower E2 levels could also contribute to
increased clotting.
Increased concentrations of E2 are also associated with decreased
cholesterol, and at menopause, there is an increase in total serum cholesterol,
which is reduced by estrogen-containing hormone replacement therapy.[115] I
suggest higher cholesterol after menopause is linked to the effects of serotonin.
Serotonin increases membrane fluidity by incorporation of cholesterol into
membranes, decreasing bioavailable cholesterol.[116, 117] Increased membrane
fluidity also increases serotonergic function, creating a positive feedback
loop.[118, 119] If serotonin is an intermediary between estrogen and cholesterol,
then in the presence of high concentrations of E2, I would expect more cholesterol
incorporated into membranes, thereby reducing cholesterol present in the plasma.
My hypothesis is supported by the finding that administration of drugs that reduce concentrations of serotonin in the plasma causes increases in plasma cholesterol despite consistent levels of E2 [109].
Both clotting and cholesterol contribute to heart attack risk. Women are at lower risk of heart attack than men prior to menopause, but changes in the vascular system after menopause result in the loss of protection from heart disease.[55, 56] In females, recent evidence suggests that physiological levels of
E2 protect against heart attacks, while testosterone makes heart attacks more likely.[120] E2 acting at ERβ is responsible for this protective effect, as mice lacking ERβ have greater mortality and increased heart failure indicators following experimentally induced myocardial infarctions.[121] In addition, antidepressants have been shown to decrease the risk of myocardial infarction.[122] I suggest that these effects in females can be explained in part by serotonin receptor changes. Specifically, in the presence of physiological E2 and therefore ERβ activation, serotonin preferentially acts on 5HT2A receptors to reduce vasospasm in cardiac tissue. After menopause, when 5HT2A receptors
have been downregulated, serotonin instead acts on 5HT1A receptors, which cause
adrenergic stimulation of smooth muscle[123] and increase likelihood of cardiac
vasospasm.[124] This increases the risk of heart attack.[123, 125-127] In
addition, testosterone, which increases following menopause, compounds the
actions of serotonin at 5HT1A receptors by preventing desensitization of 5HT1A receptors.[128] These changes in sensitivity of cardiac vessels, combined with increased clotting and lipid levels, would be expected to increase the risk of heart attack, arteriosclerosis, and stroke. However, E2 is not solely responsible for protection from heart attack; progesterone also plays a role. Hormone replacement therapy (HRT) containing E2 and medroxyprogesterone instead of E2 and progesterone has been shown to increase heart attack risk.[129] Although the study showing increased heart attack risk during HRT is controversial,[130] it is possible that decreased concentrations of serotonin produced by treatment with medroxyprogesterone[124, 131] could contribute to this increased risk.
3.5 Serotonin in the Immune System
Both E2 and serotonin are also active in the immune system, and in this system, their interaction is well documented. E2 suppresses major histocompatibility complex II (MHC II) proteins in a tissue-specific manner[132] and acts centrally to suppress the immune system[133] by helping to activate
5HT2A receptors in the thymus.[42, 94, 134, 135] Estrogen treatment also indirectly suppresses MHC II protein expression via serotonin.[133, 136]
Specifically, increased 5HT2A activity causes decreased MHC II production,[137] and decreased selection against self-reactive helper T cells (TH1).[138] In
addition, the concurrent inactivation of 5HT1A receptors decreases TNF-α
production.[139, 140] Although self-reactive TH1 cells are present, I hypothesize
that E2’s suppression of MHC II prevents them from becoming activated, and
therefore while sufficient E2 is present they fail to attack tissues. Following
menopause, or when E2 levels are unusually low, suppression of MHC II and
immune function is lost, allowing self-reactive TH1 cells to become active and
pathogenic. It is possible that estrogen and serotonin’s modulation of the immune
system prevents immune attack on offspring during pregnancy (when estrogen is
at relatively high concentrations) and avoids infection after delivery (when
estrogen is relatively low).[141] Further, high levels of E2 increase expression of indoleamine 2,3-dioxygenase (IDO), shifting tryptophan metabolism away from serotonin. Increased IDO has been implicated in both maternal-fetal interactions and immune evasion by cancer.[142, 143] IDO acts by suppressing the ability of dendritic cells to activate T cells through alterations in MHC expression.[136]
MHC II protein and self-reactive T cells appear to be the common
denominators among autoimmune disorders in women, suggesting a role for E2
and serotonin in mediating these disorders. Multiple sclerosis (MS) is associated
with the presence of MHC II protein polymorphic pathogenic alleles[144, 145]
and serotonin depletion.[146] This serotonin depletion could be a consequence of
low E2, so the decrease in MS symptoms during pregnancy [147] could be
explained by higher concentrations of E2. Also the severity of MS symptoms
increases as serotonin levels decrease[148], symptoms worsen in phases of the menstrual cycle when there is low E2[149], and low levels of E2 result in changes
in the 5HT signaling pathway [150]. In female SERT knockout mice, symptoms
of experimental allergic encephalomyelitis (a MS model) are less severe and have
a greater latency to occurrence, possibly as a result of increased serotonin
availability. [151] Not only may low serotonin levels be linked to MS, but the
effects of serotonin on MS may involve 5HT2 receptors in particular. Gene-
microarray analysis of brain lesions found lower 5HT2 receptor expression in all 4
MS patients compared to controls.[152] A potential mechanism by which MS could be induced is infection with the John Cunningham virus (JC virus). JC virus causes progressive multifocal leukoencephalopathy (PML) through destruction of oligodendrocytes.[153] JC virus, a polyomavirus, downregulates the receptor that it uses to enter the cell. In the case of the JC virus, this receptor is the 5HT2A.[154]
Another possibility is serotonin depletion by conversion of serotonin to
melatonin in the absence of light. Melatonin is primarily synthesized in the pineal gland,
but has been shown to be produced in physiologically relevant concentrations in
other tissues exposed to light such as skin and hair follicles. [155-158] Alterations
in serotonin concentrations associated with increased melatonin production could
potentially explain the increased incidence of MS in more northern climates [159]
(where daylight periods are shorter) and the reason that light therapy can be
effective in reducing symptoms of MS. [160] Similarly, self-reported incidence of Type I diabetes (IDDM) is negatively correlated with exposure to UV radiation and positively correlated with latitude in Australia.[161] Melatonin suppresses estrogen function [75] and suppresses 5HT2A receptor activity.[162] Further,
melatonin might be the link between E2 and helper T-cell (TH1) activity, as
melatonin has been shown to upregulate expression of TH1-stimulating factors
such as TNF-α and IFN-γ.[163] TNF-α increases the expression of MHC class II
proteins and activates TH1 cells, [164] which are hallmarks of MS.
Similar MHC class II polymorphisms and T cell dysfunctions have been
implicated in lupus,[165, 166] and lower levels of free tryptophan[167] and MHC II protein overexpression are also linked to autoimmune attack on beta cells in Type I diabetes (IDDM).[168] Overexpression of MHC II following failure to select against self-reactive T-cells is also a useful model for rheumatoid arthritis, Graves’ disease, and Hashimoto’s thyroiditis, in which T-cells react to proteins
produced in the body, failing to discriminate them from invading organisms.[169]
Women in whom estrogen-regulated serotonin signaling is compromised would
be expected to have higher levels of MHC class II protein expression and may
present these pathologies. However, simply over-expressing MHC II proteins is
not sufficient to activate the immune system and induce autoimmune
disorders.[169] The links between autoimmune disorders, serotonergic systems,
and E2 suggest that manipulation of serotonin or E2 could be used to successfully
treat these pathologies. Consistent with this suggestion, ER agonists reduce the symptoms of autoimmune disorders.[170, 171]
3.6 Serotonin in Cancer
Having described the physiological role of serotonin, I will use breast cancer as an example of the interaction between E2 and 5-HT and their relationship to carcinogenesis. Cataloging the interactions of serotonin with all the different types of cancer is beyond the scope of this work. However, I can describe the basis for carcinogenesis before focusing on breast cancer. Carcinogenesis is conceptualized as consisting of three distinct phases: initiation, promotion, and progression. Initiation is the irreversible alteration of a normal cell; promotion involves both proliferation of initiated cells and suppression of apoptosis of these cells; and progression is the irreversible conversion of one of the promoted initiated cells to an invasive, metastatic tumor cell.[172] Therefore, any endogenous milieu that induces apoptosis or suppresses mitogenesis of initiated cells could reduce breast cancer risk.
For breast cancer, one of the prevailing theories for the role of E2 is that longer duration of lifetime exposure to E2 is associated with increased risk, so that early menarche and late menopause result in greater likelihood of developing breast cancer. [173] Adding a role for serotonin does not conflict with this idea, but it does help explain several epidemiological findings that are not accounted
for by a relationship between increased E2 exposure alone and breast cancer.
First, the highest breast cancer incidence is in post-menopausal women, when endogenous E2 levels are much lower than before menopause. As described above, the higher E2 concentrations in the presence of progesterone prior to menopause cause an increase in 5HT2A receptor density and serotonin activity that
promotes apoptosis. In contrast, 5HT1A activation (which occurs preferentially
after menopause) decreases apoptotic signaling via caspase-3 suppression.[52]
Therefore, if E2 is acting on breast cancer in part by serotonin modulation, then I
would predict that the decrease in E2 after menopause should increase risk of
breast cancer. This is consistent with the observed breast cancer incidence
curve.[174] The failure of low levels of E2 to inhibit cancer growth is also
reflected in patterns of tumor development within the estrous cycle. In mice,
breast tumor growth occurs primarily in diestrus (when E2 is low), and tumor size
is maintained or shrinks when E2 levels are high.[175] In conjunction with the alterations in serotonin receptor distribution, the loss of E2 could affect cells by diverting the metabolism of tryptophan away from serotonin and toward the formation of aryl hydrocarbon receptor (AhR) agonists. The AhR is a ligand-activated nuclear receptor that belongs to the Per-ARNT-Sim family. The only known natural ligands of the AhR are products of a reaction between aspartate aminotransferase (AST) and tryptophan. The loss of E2 increases free AST levels, which alters the metabolism of amino acids.[176] This alteration could result in the production of AhR agonists from tryptophan.[4]
Second, in Pike’s Breast Tissue Age model, a one-time rapid increase in breast tissue age, and therefore breast cancer risk, is included immediately following the first full-term pregnancy.[177] The extension of Pike’s model includes multiple births by incorporating smaller increases in risk at each additional full-term pregnancy.[178] This pattern of increased risk for breast cancer immediately following full-term pregnancies is well documented.[179-181] E2 concentrations increase steadily during pregnancy, peaking at about 100 times normal cycling levels.[15] In the days around parturition, these concentrations drop precipitously to levels below those of normal cycling females, where they are maintained for at least a month and potentially much longer (depending on suckling suppression).[182] The observed increase in breast cancer risk can be accounted for by the concurrent decrease in E2 and therefore changes in 5HT2A receptor function immediately prior to parturition.
The loss of E2's effect on serotonin could account for the immediate increase in risk. Mammary involution is mediated by serotonin, and in the absence of serotonin, cells that were meant to undergo apoptosis remain.[183] However, this cannot explain the long-term reduction in risk, which is likely related to other changes associated with parturition or lactation.
Third, obesity exerts differential effects on breast cancer risk over the
lifespan, decreasing risk prior to menopause and increasing risk following
menopause.[184, 185] Under the prevailing theory of cumulative E2 exposure,
obesity (which increases E2 levels[186]) would always be expected to increase
breast cancer risk. However, the effect of E2 using serotonin mediation described
above can account for the observed differential effects. Increased E2 in the
presence of progesterone increases activation of 5HT2A receptors, while increased
E2 in the absence of progesterone increases activation of 5HT1A receptors. The effects of these two receptors on apoptotic activity would predict that obesity exerts a protective effect before menopause and increases risk after menopause.
The importance of the presence of progesterone for this protective effect is underscored by recent HRT studies, which show that the use of estrogen and progesterone does not increase breast cancer risk,[187] while the use of estrogen
and medroxyprogesterone (which decreases serotonin in some tissues[26, 188]) has been shown to increase breast cancer risk. Consistent with the observed effects of HRT, contraception with Depo-Provera®, which contains medroxyprogesterone rather than progesterone, has been shown to increase breast cancer risk.[187, 189] This is in line with the findings that selective serotonin reuptake inhibitors (SSRIs), which increase serotonin levels, have been shown to decrease the incidence of cancer in both animals and humans.[190, 191] The increased serotonin levels that result from E2 upregulation or SSRIs may be responsible for the protective effect both exert on cancer.
3.7 Summary
Most research on pathologies in women's health has centered on changes in E2. My review of data from a variety of fields suggests that serotonin is one of many ways that estrogen exerts its effects on physiology and pathology. There is a body of literature showing that many of the effects specifically attributed to estrogen receptor function occur further downstream as a result of alterations in various pathways. These pathways ultimately relate to the primary function of E2, which is reproductive, and serotonergic mediation of the estrogen system likely provides reproductive benefits that are not yet understood. Several of the effects I have discussed produce reproductive benefits: immune suppression during pregnancy decreases the chance of lost pregnancies, postpartum activation of the immune system increases antibodies in milk, increased clotting and vasoconstriction in the uterus prevent bleeding during birth, and increased available calcium during lactation improves the quality of breast milk. Notably, the same mechanism that results in these potential benefits in the reproductive system also produces changes in the remainder of the body that have consequences in physiology and pathologies. Whether the potential reproductive benefit of these effects is adequate to account for the maintenance of the
estrogen/serotonin link remains to be explored. I suggest serotonergic mediation
might contribute to explaining E2’s effects on some pathologies, including heart
attacks, multiple sclerosis, and in particular cancer. Drugs that increase 5-HT
induce apoptosis of tumorous tissue both in vitro and in vivo.[192, 193]
However, serotonergic involvement in cancer has only been minimally described. Fortunately, the role of E2 has been extensively studied in many cancers. E2 as a risk factor has been most studied in relation to breast cancer, but increased levels of E2 have also been associated with increased risk of other reproductive cancers in both women and men.[5, 194-196] Additionally, E2 has been observed to be a risk factor for liver cancer,[197] while it appears to be protective against colon cancer.[198] The mechanisms of these associations have not been fully elucidated. These relationships between cancer risk and E2 can be better understood in the context of serotonergic signaling. Even though different cancers seem disparate, there are commonalities, as shown in the “Hallmarks of Cancer.”[199] Here, the role of 5-HT in cellular physiology suggests serotonin pathways may be involved in the initiation, promotion, and/or progression of cancers. Altering specific aspects of the serotonergic system, rather than simply increasing E2, could allow clinicians to target treatments in particular tissues or towards particular receptor types, alleviating undesirable side effects of E2 administration.
CHAPTER 4
HYPOTHESIS TESTING OF THE TRYPTOPHAN/SEROTONIN
METABOLIC PATHWAY
One of the key goals in cancer research is to identify biological changes
that distinguish normal tissue from cancerous tissue. A common approach to
identifying oncogenes has been to assess gene expression in each type of cancer
and compare it to non-cancerous tissue of the same organ. Comparisons within a
single cancer type (e.g., breast cancer) or class (e.g., leukemia) have yielded
potential oncogenic mechanisms that have been successfully used to develop
therapeutic strategies for individual cancer types. For instance, real-time
polymerase chain reaction (qPCR) research has found increased expression of
oncogenes like c-myc [200] and decreased expression of tumor suppressors like
Rb [201]. Western blotting has been used to show overexpression of functional
erb-B2 in breast cancers [202] and ovarian cancers [203]. Unfortunately,
comparing the data obtained from studies of individual types of cancer has resulted in only limited success at detecting consistent changes among different
types of cancers. One such success is the identification of a mutation in p53, a
protein responsible for repairing cellular DNA, which occurs in approximately
50% of all cancers [204]. The discovery of similarities among various cancer
tissues is the first step in identifying a common mechanism that contributes to the development of cancer. Once a change is identified, appropriate therapeutic targets can be developed to help physicians identify at-risk individuals and improve patient care. Indeed, novel therapeutic strategies have been developed as a result of the extensive study of p53 [205].
Although serotonin (5-HT), a tryptophan (Trp) derivative, is best known as a neurotransmitter,
only 1% of it is found in the nervous system. The remaining serotonin is found in the periphery and is active in the immune, circulatory, reproductive, musculoskeletal, and gastrointestinal systems [8]. Depending on receptor distributions, serotonin activity can promote or reduce apoptosis [52].
My previous work explored the activity of serotonin in an array of pathologies, particularly those in which epidemiological data suggest gender differences [8]. I propose that estrogenic effects on serotonergic function and receptor distribution could explain gender differences in pathologic incidence, as well as some of the effects of estrogen on breast cancer. Rather than being specific to breast cancer, the role of serotonin and its precursor Trp in cellular physiology suggests that tryptophan metabolism, and as a result serotonin metabolism, may be involved in the promotion or progression of cancers in general. There is some literature supporting this hypothesis [142, 191, 193, 206-211], but further research is needed to understand the exact relationship between Trp or its metabolites and cancer. Recently, a mechanism was proposed by which catabolism of Trp by
indoleamine 2,3-dioxygenase (IDO) can be linked to immune evasion in tumor
cells [142, 212]. Other studies suggest that decreased serum tryptophan levels are
predictive of poorer prognosis and quality of life in cancer patients [208]. For
serotonin specifically, studies have shown that selective serotonin reuptake
inhibitors (SSRIs), which prevent the reuptake of serotonin and thus increase
extracellular serotonin levels, have anti-cancer activity in cancer cell lines [211],
decrease the incidence of cancer in both animals [190] and humans [191], and can be
used as a treatment for lymphoma/leukemia [213].
Using genechip technology to study multiple types of cancer
simultaneously can identify whether multiple cancers have similar gene
expression changes [214]. Knowing that tryptophan/serotonin metabolism is
linked to carcinogenesis, I hypothesized that it was altered in multiple cancers
among a variety of species. To test this hypothesis, I analyzed an ensemble of
cancer genechip datasets focusing on genes involved in tryptophan metabolism,
which include serotonergic genes. I first compared gene expression
between cancerous tissues and normal tissues for each type of cancer and then identified changes that are common to a variety of cancer types. Using this technique, I was able to identify the relationship between cancer and the serotonergic system.
4.1 Methods
Nineteen genechip datasets consisting of cancerous tissue from 10 different
organs derived from human, mouse, rat, or zebrafish were extracted from the GEO
profiles database. Datasets were first identified by using the search terms “cancer”
or “metastasis”. All datasets in the GEO profiles database as of March 31, 2007
were considered. The genechip data was selected such that both control and
cancer samples were contained in the same dataset. Because of the differences in
gene expression that are inherent to cell culture [215], only data derived from
primary biopsies were used. By definition, cancer implies an invasive phenotype; therefore, all other tumor types, such as adenomas and carcinomas in situ, were excluded. Among datasets that contained multiple types of cancer, the cancer with the closest number of samples to the control was used for analysis.
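The last selection rule can be illustrated with a minimal sketch. The function name, the cancer labels, and the sample counts below are hypothetical and are not taken from any of the datasets in Appendix A; ties are resolved by taking the first entry, which is an assumption the text does not address.

```python
def pick_cancer_type(control_n, cancer_counts):
    """Return the cancer type whose sample count is closest to the control count.

    cancer_counts maps a cancer label to its number of samples.
    Ties go to the first label encountered (an assumption).
    """
    return min(cancer_counts, key=lambda name: abs(cancer_counts[name] - control_n))


# Hypothetical dataset with 10 controls and two cancer types.
example_counts = {"adenocarcinoma": 12, "squamous cell carcinoma": 31}
chosen = pick_cancer_type(control_n=10, cancer_counts=example_counts)
print(chosen)
```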
Appendix A, Table 1 describes the datasets used, which include 12 human
cancer datasets, five mouse cancer datasets, one rat cancer dataset, and one
dataset from zebrafish. All animal datasets were derived from cancers that were induced using viral, genetic, or chemical means. No xenografts were included in this analysis. In total, 242 cancerous samples and 139 control samples were used. Prior to analysis, data that was logarithmic was transformed back to its original values, and all “null” values were excluded. Data points that were from more than one patient, also called pooled samples, were also excluded. Previous reports have shown that probe-sets cannot be averaged [9]; therefore, all analyses
were performed on the probe-set with the highest mean expression value. Each
genechip dataset was normalized (both intrachip and interchip) before being
deposited in the GEO database. I independently validated the normalization in
every dataset by inspecting the distribution of expression values. Datasets that
were not appropriately normalized were excluded.
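The data-handling steps above (back-transforming logarithmic data, dropping “null” values, and keeping the probe-set with the highest mean expression) can be sketched as follows. This is only an illustration under stated assumptions: the logarithm base is assumed to be 2, “null” values are represented as None, and the function names and probe-set IDs are hypothetical.

```python
from statistics import mean


def back_transform(values, log_base=2.0):
    """Undo a log transform and drop 'null' entries (represented here as None).

    The base-2 default is an assumption; the text does not state the base.
    """
    return [log_base ** v for v in values if v is not None]


def pick_probe_set(probe_sets):
    """Return the ID of the probe-set with the highest mean expression.

    probe_sets maps a probe-set ID to the expression values for one gene.
    """
    return max(probe_sets, key=lambda pid: mean(probe_sets[pid]))


# Hypothetical gene measured by two probe-sets on a log2 scale.
raw = {"probe_a": [3.0, None, 4.0], "probe_b": [6.0, 5.0, None]}
linear = {pid: back_transform(vals) for pid, vals in raw.items()}
print(pick_probe_set(linear))
```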
My initial interest was in tryptophan-related genes, so I examined all
human, mouse, and zebrafish tryptophan pathway genes (~60 genes depending on
the species and gene chip) listed in the Kyoto Encyclopedia of Genes and Genomes
(KEGG) using BRB-array tools [216] developed by the National Cancer
Institute’s Biometrics Research Branch (NCI BRB). Appendix A, Table 2 lists all
the tryptophan genes in KEGG. First, differences in the expression levels of the
genes were examined within each of the individual cancer datasets. I performed
either related samples or independent samples t-tests (as appropriate, see
Appendix A, Table 1) on the change in expression of the genes between control
and cancer samples for each dataset. Since I focused on the behavior of individual
genes across multiple types of cancer rather than groups of genes in individual
datasets, I only compensated for the multiple t-tests over the composite number of
datasets. The probability that I would observe the same false positive from among
approximately 60 genes, in at least 16 out of the 19 separate datasets, is at most on
the order of 7.8 × 10^-17, an unlikely outcome. Therefore, finding one specific gene
(MAO-A) that is consistently down-regulated in 16 out of 19 datasets (using the
Bonferroni-Holm, or BH, correction) is an indication that the observed
down-regulation of MAO-A in cancer tissues cannot be attributed to chance
alone. Nonetheless, I did adjust for multiple t-tests among the 19 datasets by using
the Bonferroni-Holm adjustment [10, 217]. Specifically, I sorted the p-values for
t-tests of expression of each gene from smallest to largest and compared the i-th
p-value to the original alpha level (0.05) divided by (the number of datasets + 1 - i).
Thus I compared the smallest (first) p-value with 0.05/18 = 0.0028 and the largest
(last) p-value with 0.05/1 = 0.05. For each dataset, a list of tryptophan-related genes whose expression level was significantly changed using these criteria was generated. Then the frequency of each gene appearing in all the lists was counted, and the genes were sorted by frequency from high to low. Only MAO-A was differentially expressed in the majority of the datasets analyzed.
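The two calculations in this paragraph can be sketched as follows. The first function is one way to obtain a bound of the order quoted above, assuming that tests in different datasets are independent, that each test produces a false positive with probability 0.05, and that a union bound over roughly 60 genes is taken; these assumptions are mine and are not stated in the text. The second function applies the Bonferroni-Holm thresholds using the standard step-down stopping rule, which is also an assumption. The number of tests is left as a parameter rather than fixed at 18 or 19, and all function names are hypothetical.

```python
from math import comb


def prob_same_false_positive(n_genes=60, n_datasets=19, min_hits=16, alpha=0.05):
    """Union bound on the chance that any one gene is a false positive
    in at least min_hits of n_datasets independent tests."""
    tail = sum(
        comb(n_datasets, k) * alpha**k * (1 - alpha) ** (n_datasets - k)
        for k in range(min_hits, n_datasets + 1)
    )
    return n_genes * tail


def holm_reject(p_values, alpha=0.05):
    """Bonferroni-Holm step-down: compare the i-th smallest p-value
    with alpha / (m + 1 - i) and stop at the first failure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda j: p_values[j])
    reject = [False] * m
    for rank, j in enumerate(order, start=1):
        if p_values[j] > alpha / (m + 1 - rank):
            break  # every larger p-value is also retained
        reject[j] = True
    return reject


bound = prob_same_false_positive()
print(f"{bound:.1e}")
print(holm_reject([0.001, 0.04, 0.03, 0.2]))
```

Under these assumptions the bound comes out on the order of 10^-17, the same order as the figure quoted in the text.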
4.2 Results
By conducting a series of analyses focusing on tryptophan-related gene expression data (see Appendix A, Table 2 for a list of genes analyzed) in the Gene Expression Omnibus (GEO) database maintained by NCBI [186], I found that only
Monoamine Oxidase A (MAO-A, E.C. 1.4.3.4) showed consistently decreased expression in cancers among a variety of tissues from humans, rodents, and zebrafish. Specifically, only MAO-A expression was significantly altered in all 13 of the datasets that used non-cancerous patients as controls and half of the paired
datasets. Appendix A, Table 1 provides specific p-values and the mean fold change in MAO-A for the datasets analyzed. Although the extent of downregulation varied among patients, cumulatively 95.4% of all of the tissue samples from human cancer patients and 94.2% of all animal cancer cases showed lower MAO-A expression than the single lowest control sample in their respective dataset. Changes in expression for unpaired data are provided in
Appendix B, Figure 2.
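The comparison against the lowest control sample can be illustrated with a short sketch. The function names and expression values below are hypothetical, and pooling the per-dataset counts into one cumulative fraction is my reading of "cumulatively" rather than a documented procedure.

```python
def fraction_below_lowest_control(cancer_values, control_values):
    """Fraction of cancer samples whose expression is below the lowest control."""
    floor = min(control_values)
    below = sum(1 for v in cancer_values if v < floor)
    return below / len(cancer_values)


def pooled_fraction(datasets):
    """Cumulative fraction across datasets, each cancer sample being compared
    with the lowest control in its own dataset.

    datasets is a list of (cancer_values, control_values) pairs.
    """
    below = sum(
        sum(1 for v in cancer if v < min(control)) for cancer, control in datasets
    )
    total = sum(len(cancer) for cancer, _ in datasets)
    return below / total


# Hypothetical MAO-A expression values for two small datasets.
dataset_1 = ([1.0, 2.0, 5.0, 9.0], [4.0, 6.0, 8.0])
dataset_2 = ([0.5, 0.7], [1.0, 1.2])
print(fraction_below_lowest_control(*dataset_1))
print(pooled_fraction([dataset_1, dataset_2]))
```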
Within each dataset, between 67% and 100% of patients had MAO-A expression below the lowest control sample (see Appendix A, Table 1 for individual values). Examining data from individual patients among the paired data revealed a remarkable pattern of downregulation in cancerous tissue among the paired samples analyzed. Only a subset of the datasets that compared cancerous tissue to normal tissue from the same patient failed to show significant downregulation.
The three datasets that did not contain a significant shift in MAO-A expression after correcting for multiple t-tests were the two papillary thyroid cancer datasets and one gastric cancer dataset. However, 69% of thyroid cancer patients and 100% of gastric cancer patients exhibited a decrease in expression compared to non-cancerous tissue from the same patient. A possible explanation for this discordance is that a less pronounced downregulation of MAO-A occurs in patients with these two types of cancer.
4.3 Discussion
Most current cancer research is focused on tissue-specific genetic
mutations. Familial inheritance (e.g., APC in colon cancer), genetic mutation
(e.g., p53), and overexpression of growth receptors (e.g., Her2-neu in breast
cancer) can each lead to aberrant replication of a cell. Studies of these changes
provide tremendous information about tissue-specific effects but are less
informative about common changes that occur in multiple tissues. The similarity
in the behavior of cancers from different organ systems and species indicates the potential for a universal change among cancers, regardless of the specific tissue or
species. This study suggests that downregulation of MAO-A is such a change and
could be an important indicator or even a factor in the development and spread of
many types of cancers.
4.4 Summary
Identifying consistent changes in cellular function that occur in multiple
types of cancer could revolutionize the way cancer is treated. Previous work has
produced promising results such as the identification of p53. Recently drugs that
affect serotonin reuptake were shown to reduce the risk of colon cancer in man.
Here, I analyzed an ensemble of cancer datasets focusing on genes involved in the
tryptophan metabolic pathway. Genechip datasets consisting of cancerous tissue
from human, mouse, rat, or zebrafish were extracted from the GEO database. I
first compared gene expression between cancerous tissues and normal tissues for each type of cancer and then identified changes that were common to a variety of cancer types. I found that significant downregulation of MAO-A, the enzyme that metabolizes serotonin, occurred in multiple tissues from humans, rodents, and fish. MAO-A expression was decreased in 95.4% of human cancer patients and
94.2% of animal cancer cases compared to the non-cancerous controls. These are the first findings that identify a single reliable change in so many different cancers. My next study investigated the link between MAO-A suppression and the development of cancer by analyzing the entire genome in order to determine the extent to which MAO-A suppression contributes to cancer risk.
CHAPTER 5
WHOLE GENOME ANALYSIS
Genetic causes are the most commonly cited reasons for the initiation and development of cancers. During the past 40 years, a tremendous amount of research effort has been spent on identifying genes related to cancers, including proto-oncogenes
like MYC, ERK, EGFR, and KRAS and tumor suppressors such as p53, PTEN, and
Rb. According to the American Cancer Society, there are more than 100
recognized oncogenes and about 30 tumor suppressor genes [218]. In spite of the
vast number of candidates and the phenotypic similarities among cancers, no common genetic mechanism has been identified that
is applicable across cancers. For any type of
cancer, usually multiple genes have been identified; for instance, BRCA1 and
HER2 in breast cancer [219, 220] or KRAS and AXIN2 in lung cancer [221, 222].
The abundance of potential candidate oncogenes in combination with the vast
number of tumor suppressors has led to the notion that “no two cancers are
alike”. Further reinforcing this idea was a 2007 Science publication that compared
common mutations found in breast cancer patients with those found in colon
cancer patients. This study identified a list of more than 200 gene mutations for
each type of cancer [223]. Yet these results were based on only 11 patients from
each type of cancer. The two mutation lists shared very little similarity except for
the well-known tumor suppressor p53.
In spite of the dramatic difference between the genotypes of different cancers,
the fact that they are all histologically classified as “cancer” implies a strong
similarity in their pathology. These common phenotypes among the cancers are
well accepted and have been summarized in the paper “Hallmarks of Cancer” in
which six common traits of cancer are discussed [199]. According to Hanahan
and Weinberg these common traits are:
• “self-sufficiency in growth signals,
• evading apoptosis,
• insensitivity to anti-growth signals,
• sustained angiogenesis,
• limitless replicative potential,
• tissue invasion & metastasis”.
For each of these cellular traits there is a corresponding biochemical trait that is influenced by dynamic genetic expression changes that impact not only the cell
but the microenvironment as well. It seems fallacious that viruses, chemicals, or
other factors lead to mutations or deletions in so many diverse oncogenes or
tumor suppressors that then act on the same repertoire of pathways in a variety of
different tissues and species.
To identify a potential mechanism that might explain why the variety of different means of cancer initiation result in similar cellular physiology, I applied a well-established statistical approach from quality control to microarray datasets.
This novel approach allowed me to compare gene expression profiles from multiple datasets. I first used the 19 datasets from my earlier serotonergic metabolism analysis (Group A). I also wanted to increase my sample size; therefore, I relaxed my inclusion criteria and arrived at 40 unique datasets across multiple species (Group B). To assure that the gene expression profiles were relevant to human cancers, I restricted my subsequent analysis to the
32 human cancer datasets (Group C). Using this technique, I was able to identify genes that were differentially expressed in a number of different cancers, with the majority linked to the regulation of the G2/M checkpoint.
5.1 Methods
Related work published in 2004 compared the gene expression profiles for six different types of cancers and generated a list of common genes
[214, 224]. This work is the basis for the well-known gene expression data exploration portal Oncomine, which has since gone through several iterations [225].
My work shares some similarities in the concept of comparative analysis, but my methods differ significantly in terms of the objective, the inclusion/exclusion criteria for datasets, statistical methods, and pathway analysis. With the large
number of new datasets available after 2004, I can apply more stringent selection criteria and obtain results with a higher degree of specificity and confidence.
5.1.1 Dataset Collection
The original 19 genechip datasets identified for the serotonergic analysis were used for the first set of analyses. Appendix A, Table 1 describes the datasets used, which include 12 human cancer datasets, five mouse cancer datasets, one rat cancer dataset, and one dataset from zebrafish. All animal datasets were derived from tumors that were induced in the animal using viral, genetic, or chemical means. No xenografts were used in this analysis. In total, the original 19 datasets consisted of 242 cancerous samples and 139 control samples.
After analyzing the original 19 datasets, I relaxed my inclusion/exclusion criteria and expanded my analysis to include additional datasets that became available after
March 2007. I also included the results of the search terms “sarcoma”, “lymphomas”, and “brain cancer”. In my previous analyses, if there was more than one type of cancer present within a dataset, I only used the cancer that had the closest number of samples to the control. Here I analyzed each type of cancer in every dataset independently. Appendix A, Tables 3-5 describe the datasets used.
5.1.2 Dataset Handling
Prior to analysis, data that was logarithmic was transformed back to its
original values, and all “null” values were excluded. Data points that were from
more than one patient, also called pooled samples, were excluded. Previous
reports have shown that probe-sets cannot be averaged [9]; therefore, all analyses
were performed on the probe-set with the highest mean expression value. Each
genechip dataset is normalized (both intrachip and interchip) before being
deposited in the GEO database. I independently validated the normalization on
every dataset by inspecting the distribution of expression values.
5.1.3 Gene Selection
For each dataset, differentially expressed genes were identified using
BRB-array tools [216] developed by the NCI Biometrics Research Branch. First,
differences in the expression levels of the genes were examined within each of the
individual cancer datasets. I performed either related samples or independent samples t-tests (as appropriate, see Appendix A, Tables 3-5) on the change in expression levels of the genes between control and cancer samples for each dataset.
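The two kinds of t-test mentioned above differ only in how the test statistic is formed. The sketch below computes the statistic and degrees of freedom for each case using the Python standard library. It is an illustration, not the BRB-array tools implementation: the pooled-variance (Student) form of the independent samples test is an assumption, since the text does not say which variant was used, and the function names and expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance


def paired_t(control, cancer):
    """Related samples t statistic on per-patient differences (cancer - control)."""
    diffs = [c - n for n, c in zip(control, cancer)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


def independent_t(control, cancer):
    """Independent samples t statistic with pooled variance (an assumption)."""
    n1, n2 = len(control), len(cancer)
    pooled = ((n1 - 1) * variance(control) + (n2 - 1) * variance(cancer)) / (n1 + n2 - 2)
    t = (mean(cancer) - mean(control)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Hypothetical expression values for one gene.
t_paired, df_paired = paired_t([10.0, 12.0, 14.0], [7.0, 8.0, 12.0])
t_indep, df_indep = independent_t([10.0, 12.0, 14.0], [4.0, 6.0, 8.0])
print(round(t_paired, 3), df_paired)
print(round(t_indep, 3), df_indep)
```

A negative statistic here corresponds to lower expression in the cancer samples than in the controls; the p-value would then be read from a t distribution with the returned degrees of freedom.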
The probability of inferring that a particular gene is differentially
expressed in one or more given datasets is a function of the alpha level α (usually 0.05).
Normally, in bioinformatics, this alpha level leads to significant Type I
error. Under the null hypothesis of no differential gene expression, the expected
percentage of Type I errors in any one dataset is 100α. So for any given
dataset with N tests of significance, the number of false positives will be
approximately Nα. Therefore analyzing a dataset that consists of genechips
each containing 22,000 probesets, at α = 0.05, on average would result in