Washington University in St. Louis Washington University Open Scholarship Engineering and Applied Science Theses & McKelvey School of Engineering Dissertations

12-2018 Acetylation Profiles of Histone and Non-Histone Proteins in Breast Cancer Alla Karpova Washington University in St. Louis


Recommended Citation Karpova, Alla, "Acetylation Profiles of Histone and Non-Histone Proteins in Breast Cancer" (2018). Engineering and Applied Science Theses & Dissertations. 433. https://openscholarship.wustl.edu/eng_etds/433

This Thesis is brought to you for free and open access by the McKelvey School of Engineering at Washington University Open Scholarship. It has been accepted for inclusion in Engineering and Applied Science Theses & Dissertations by an authorized administrator of Washington University Open Scholarship.

WASHINGTON UNIVERSITY IN ST. LOUIS

School of Engineering and Applied Sciences

Department of Biomedical Engineering

Thesis Examination Committee:

Li Ding, Chair

Michael Brent

Gary Patti

Acetylation Profiles of Histone and Non-Histone

Proteins in Breast Cancer

by

Alla Karpova

A thesis presented to the School of Engineering of Washington University in St. Louis in partial fulfillment of the requirements for the degree of

Master of Science

December 2018

Saint Louis, Missouri

Contents

Contents
List of Figures
List of Tables
Acknowledgments
ABSTRACT OF THE THESIS
1 Background
  1.1 Histone Acetyltransferases and Deacetylases
  1.2 Histone Acetylation
  1.3 Alterations in Acetylation in Cancer
    1.3.1 Wrong Histone Acetyltransferases
    1.3.2 Wrong Histone Deacetylases
    1.3.3 Histone Acetylation in Cancer
    1.3.4 Non-histone Proteins Acetylation: Links to Cancer
2 Research Methods
  2.1 Datasets Overview
    2.1.1 Global Acetylome Dataset
    2.1.2 Global Proteomics Data
    2.1.3 Histone Acetylation Dataset
  2.2 Linear Regression Analysis
    2.2.1 Histone Acetylation Linear Model
    2.2.2 Metabolic Acetylation Linear Models
    2.2.3 Normalization of Acetylome Data for Non-histone proteins
  2.3 Gene Set Enrichment Analysis
    2.3.1 Sample subset
    2.3.2 GSEA
  2.4 Statistical Analysis
3 Findings
  3.1 Histone Acetylation
  3.2 Metabolic Proteins Expression and Acetylation Interplay
    3.2.1 Expression of metabolic enzymes characterize Basal subtype metabolism as glycolytic
    3.2.2 Differential acetylation of cytoplasmic and mitochondrial metabolic enzymes
    3.2.3 Functional role of differentially acetylated
4 Conclusions
Appendix A
References

List of Figures

Figure 1.1.1: Schematic representation of acetylation and deacetylation reactions (Drazic et al. 2016). A. N-terminal acetylation of polypeptides. B. Reversible acetylation of ε-amino group of lysines. C. Reaction specific for NAD+-dependent sirtuins.

Figure 2.1.1.1: Distribution of all detected acetylation values for each experiment separately.

Figure 2.1.1.2: Overview of global acetylome dataset coverage. A. Number of acetylated peptides vs coverage. Dashed line indicates the minimal coverage for every peptide in the dataset. B. Distribution of coverage of acetylation peptides per sample. Samples with abnormally low coverage were excluded from the analysis.

Figure 2.3.1.1: Scores plot of PCA-decomposed matrix of H2B N-terminal acetylation sites. Circle indicates a unit circle, the length of Ac-sites vectors is equal to their loading. Cluster 1 is considered as high acetylation samples and cluster 3 as low acetylation samples.

Figure 3.1.1: Number of acetylated sites per core histone reported in the dataset.

Figure 3.1.2: Unsupervised clustering of histone acetylation sites. Distance: Euclidean, clustering method: Ward’s. Red indicates the highest acetylation level, blue – the lowest, white – the missing data.

Figure 3.1.3: Spearman’s correlation between histone acetylation sites. A. Correlation plot for all histone sites. B. Distribution of Rho correlation values for H2A/H2B sites (yellow), or H3/H4 sites (green), or between the H2A/H2B and H3/H4 groups.

Figure 3.1.4: Coefficients of fitted linear models [1]. Only coefficients with p-value < 0.05 under random model are labeled.

Figure 3.1.5: Significantly enriched NCG sets in differentially expressed genes. A. Gene ratio of enriched sets. B. Running enrichment score for breast and kidney cancer sets.

Figure 3.1.6: Top ten DE genes enriched for breast cancer association. mRNA level corresponds to log2(FPKM + 0.01), protein level corresponds to normalized relative protein abundance. Significance level of Wilcoxon test is shown.

Figure 3.1.7: Spearman’s correlation between H2B N-terminal Ac-sites with luminal specific DE genes. A. GATA3, mutations are labeled: M – missense mutation, F – frameshift insertion/deletion mutation, S – splice site mutation. B. FOXA1, C. ESR1. mRNA level corresponds to log2(FPKM + 0.01), protein level corresponds to normalized relative protein abundance.

Figure 3.1.8: Unsupervised clustering of histone acetylation sites. Distance: Euclidean, clustering method: Ward’s. Red indicates the highest acetylation/protein/mRNA level, blue – the lowest, white – the missing data. Mutations are labeled: M – missense mutation, F – frameshift insertion/deletion mutation, S – splice site mutation. mRNA level corresponds to log2(FPKM + 0.01), protein level corresponds to normalized relative protein abundance.

Figure 3.1.9: Average Spearman’s correlation between mRNA and protein levels of differentially expressed genes and acetylation. A. Correlation computed for H2B K5, K11, K11_K12, K15_K16 and K20_K23. B. Correlation computed for H3 K27, K27_K36 and K36.

Figure 3.2.1.1: Top enriched pathways for acetylated proteins in breast cancer. The right part provides examples of acetylated proteins interactions, colorful nodes indicate that there are novel sites reported in the dataset, grey nodes correspond to proteins with all-known Ac-sites reported in the dataset.

Figure 3.2.1.2: Unsupervised clustering of protein expression of differentially expressed metabolic proteins and carriers (Anova, FDR < 0.05). Distance: Euclidean, clustering method: Ward’s. Red indicates the highest protein level, blue – the lowest, white – the missing data.

Figure 3.2.1.3: PFKP gene amplification, RNA level and protein level in breast cancer subtypes. Copy number level is considered as log2(copy number tumor/copy number normal). The left plot has a dashed line marking 0.5, a cutoff for gene to have one additional copy. FPKM is Fragments Per Kilobase of transcript per Million mapped reads.

Figure 3.2.2.1: Unsupervised clustering of normalized acetylation level of differentially acetylated lysines of metabolic proteins and carriers (Anova, FDR < 0.05). Distance: Euclidean, clustering method: Ward’s. Red indicates the highest acetylation level, blue – the lowest, white – the missing data.

Figure 3.2.2.2: Examples of differentially acetylated sites of metabolic enzymes.

Figure 3.2.2.3: Coefficients of fitted linear models for HDACs [2]. Only coefficients with linear model coefficient FDR < 0.1 and p-value < 0.01 under random model are shown.

Figure 3.2.2.4: Coefficients of fitted linear models for HATs [2]. Only coefficients with linear model coefficient FDR < 0.1 and p-value < 0.01 under random model are shown.

Figure 3.2.2.5: Schematic representation of Acetyl-CoA synthesis pathways in basal subtype. Blue arrows mean protein downregulation, pink arrows mean protein upregulation, yellow circles – elevated acetylation. Adapted from (Narita, Weinert, and Choudhary 2018). ACC1 - Acetyl-CoA Carboxylase Alpha, or ACACA; ACC2 - Acetyl-CoA Carboxylase Beta, or ACACB; ACLY - ATP Citrate Lyase; ACSS2 - Acyl-CoA Synthetase Short Chain Family Member 2; PDC – Pyruvate Dehydrogenase Complex.

Figure 3.2.2.6: Expression of Acetyl-CoA metabolism proteins. A. Protein expression of proteins involved in Acetyl-CoA synthesis in cytoplasm. B. Protein expression of proteins involved in fatty acid synthesis from Acetyl-CoA in cytoplasm.

Figure 3.2.3.1: Localization of GPI differentially acetylated sites relative to its active center. Differentially acetylated sites are marked in pink, not differentially acetylated sites are marked in grey. K519 residue involved in catalysis reported by (Ji Hyun Lee et al. 2001) is marked in magenta. Catalytically important sites reported by (Cordeiro et al. 2004) are colored in white.

Figure 3.2.3.2: Localization of PGK1 differentially acetylated sites relative to its active center. Differentially acetylated sites are marked in cyan. Substrates 3-PG and ADP are colored in white.

Figure 3.2.3.3: Localization of MDH2 differentially acetylated sites. Differentially acetylated sites upregulated in basal are marked in pink. Differentially acetylated sites downregulated in basal are marked in blue. Sites that are not differentially acetylated, but were previously reported as activating, are in cyan. Substrate NAD+ is colored in magenta.

Figure 3.2.3.4: Localization of IDH2 differentially acetylated sites. Differentially acetylated sites are marked in pink. Protein is colored by hydrophobicity: red the most hydrophilic, white the most hydrophobic. Substrate NADP+ is colored in green.

List of Tables

Table 1.1.1: Human major histone acetyltransferases.

Table 1.1.2: Human major histone deacetylases.

Table 3.2.2.1: Summary of protein expression and acetylation of glycolytic enzymes in breast cancer subtypes. Number of affected sites was chosen with t-test FDR < 0.1.

Table 3.2.2.2: Summary of protein expression and acetylation of TCA cycle enzymes in breast cancer subtypes. Number of affected sites was chosen with t-test FDR < 0.1.

Acknowledgments

I would like to thank my advisor Prof. Li Ding for her support and the valuable advice she gives to all her students. Special thanks go to the Steven Carr lab at the Broad Institute, who kindly provided the data used in this Thesis. I would also like to express my gratitude to my colleagues and friends in the lab, whom I am so glad to work with. This work is supported by NIH grants 5U24CA210986-03 to Steven Carr, U24CA210972 and U24CA211006 to Li Ding. I acknowledge support of computational resources from McDonnell Genome Institute and the Oncology Division of the Washington University School of Medicine. I would also like to give special thanks to the Department of Biomedical Engineering of Washington University in St. Louis and my academic advisor Prof. Dennis Barbour for advising me to pursue the field of precision medicine and bioinformatics.

I would like to thank my thesis committee, who reviewed this thesis and gave new directions to follow in the future.

ABSTRACT OF THE THESIS

Histone and Non-Histone

Proteins Acetylation Profiles in Breast Cancer

by

Alla Karpova

Master of Science in Biomedical Engineering

Washington University in St. Louis, 2018

Research Advisor: Li Ding

This study evaluates the impact of protein acetylation on breast cancer and the regulation of metabolism. Acetylation is the second most abundant post-translational modification after phosphorylation, regulating protein activity and function. Alterations in acetylation of both histone and non-histone proteins are known to be related to many human diseases, including cancer.

Acetylation and deacetylation of histones are closely associated with the regulation of gene expression, while acetylation of non-histone proteins may have a broad effect on major cellular processes, such as proliferation, metabolism, cell cycle and apoptosis, whose imbalanced regulation is essential for cancer development. Therefore, it is critical to explore the role of this post-translational modification in cancer in a systematic manner. Here, utilizing a unique acetylome dataset for 120 patients with breast cancer, as well as genomic and proteomic data, I showed the impact of acetylation on gene expression and metabolic enzymes. More specifically, the association between acetylation level and expression of the FOXA1 and GATA3 transcription factors has been established. In addition, acetylation of metabolic enzymes has been demonstrated to reveal additional information on metabolism regulation in breast cancer.

1 Background

1.1 Histone Acetyltransferases and Deacetylases

Acetylation is one of the most prevalent post-translational modifications (PTMs) in eukaryotic cells (Khoury, Baliban, and Floudas 2011). Along with phosphorylation, ubiquitination and glycosylation, acetylation controls essential cellular processes and mediates the adjustment to changing environmental conditions. Constantly developing techniques for protein detection allow for the identification of thousands of new PTM sites, leading to a deeper understanding of their function (Doll and Burlingame 2015).

Acetylation can occur in two forms: acetylation of the N-terminal (Nt) amino acid of a peptide and acetylation of the ε-amino group of lysines. Unlike Nt-acetylation, lysine acetylation is a reversible modification. It is introduced by histone acetyltransferases (HATs), which use acetyl-CoA as a source of the acetyl group and transfer it to the ε-amino group. Since acetylation is not restricted to histone proteins, as was previously thought, these enzymes are sometimes called lysine (K) acetyltransferases (KATs). Histone deacetylases catalyze the reverse reaction and remove the acetyl group from lysines (Fig. 1.1.1). Similar to HATs, they are sometimes called lysine deacetylases (KDACs).

There are 17 genes encoding proteins whose annotated main activity is acetyltransferase activity. HATs can be subdivided into three families: the p300/CBP family, the GNAT family and the MYST family. Table 1.1.1 summarizes the gene and protein names of the major acetyltransferases and provides examples of their substrates.

Figure 1.1.1. Schematic representation of acetylation and deacetylation reactions (Drazic et al. 2016). A. N-terminal acetylation of polypeptides. B. Reversible acetylation of ε-amino group of lysines. C. Reaction specific for NAD+-dependent sirtuins – class of deacetylases.

Table 1.1.1. Human major histone acetyltransferases.

FAMILY     HAT        NEW NAME   HUGO GENE SYMBOL   SUBSTRATE EXAMPLES
GNAT       HAT1       KAT1       HAT1               H2A, H4
GNAT       GCN5       KAT2A      KAT2A              H3
GNAT       PCAF       KAT2B      KAT2B              H3, H4, ACLY, PKM
P300/CBP   CBP        KAT3A      CREBBP             H2A, H2B, H3, H4, FOXO1
P300/CBP   P300       KAT3B      EP300              H3, FOXO1, SIRT2
-          TAF1       KAT4       TAF1               H3, H4
MYST       TIP60      KAT5       KAT5               H2A, H4, FOXP3
MYST       MOZ        KAT6A      KAT6A              H3, H4, p53, RUNX2
MYST       MORF       KAT6B      KAT6B              H3, RUNX2
MYST       HBO1       KAT7       KAT7               H4
MYST       MOF        KAT8       KAT8               H4, p53
GNAT       ELP3       KAT9       ELP3               H3, H4, α-tubulin
-          TFIIIC90   KAT12      GTF3C4             H3
-          SRC-1      KAT13A     NCOA1              H3, H4
-          SRC-3      KAT13B     NCOA3              H3, H4
-          SRC-2      KAT13C     NCOA2              H3, H4
-          CLOCK      KAT13D     CLOCK              ARNTL, NR3C1
-          ATF-2      -          ATF2               -

HATs can localize to different cellular compartments. The majority of them function in the nucleus, such as CBP/p300, KAT7, KAT8, HAT1 and others. However, some HATs can be found in both the cytoplasm and the nucleus: CBP/p300, KAT2B, ELP3, ATF2 and CLOCK. In mitochondria, only one acetyltransferase (ACAT1) has been identified; it modifies the pyruvate dehydrogenase complex and regulates its activity (Fan et al. 2014). Histone acetyltransferases are almost always associated with other proteins that define their target and site specificity. HAT-binding proteins usually contain various domains, such as bromodomains, chromodomains, WD40 repeats and PHD finger domains, that recognize different modifications of histones (Lee and Workman 2007).

Histone deacetylases can be subdivided into four classes: classes I, II and IV require Zn2+ as a cofactor, while class III enzymes are NAD+-dependent and are called sirtuins. HDACs of class I or II usually modify histones, transcription factors and remodeling complexes (Drazic et al. 2016). However, some members of those two classes are also found in the cytoplasm (Table 1.1.2).

Table 1.1.2. Human major histone deacetylases.

CLASS   HDAC     COFACTOR   COMPARTMENT         SUBSTRATE EXAMPLES
I       HDAC1    Zn2+       Nucleus             All core histones, RelA, AR
I       HDAC2    Zn2+       Nucleus             All core histones
I       HDAC3    Zn2+       Nucleus             All core histones, NF-κB, KAT2B, STAT1
I       HDAC8    Zn2+       Nucleus/Cytoplasm   All core histones, p53
II      HDAC4    Zn2+       Nucleus/Cytoplasm   All core histones, HIF1α, p53
II      HDAC5    Zn2+       Nucleus/Cytoplasm   GATA2, GCMa
II      HDAC6    Zn2+       Cytoplasm           α-tubulin, HSP90
II      HDAC7    Zn2+       Nucleus/Cytoplasm   PLAG1
II      HDAC9    Zn2+       Nucleus/Cytoplasm   ATDC
II      HDAC10   Zn2+       Cytoplasm           HSP70, PP1
III     SIRT1    NAD+       Nucleus             p53, FOXO1, HSF1, KAT7, CBP
III     SIRT2    NAD+       Cytoplasm           α-tubulin
III     SIRT3    NAD+       Mitochondria        GDH, TCA cycle enzymes, LCAD, ACSS2
III     SIRT4    NAD+       Mitochondria        GLUD1
III     SIRT5    NAD+       Mitochondria        CPS1, cytochrome c
III     SIRT6    NAD+       Nucleus             H3K56ac, RBBP8
III     SIRT7    NAD+       Nucleolus           H3K18ac, PAF53
IV      HDAC11   Zn2+       Nucleus             All core histones

HDAC1 and HDAC2 share extensive sequence similarity and are involved in the regulation of cell cycle and apoptosis (Reichert, Choukrallah, and Matthias 2012). HDAC3 is also essential for cell cycle regulation, along with DNA damage control (Reichert, Choukrallah, and Matthias 2012). Most class I histone deacetylases are part of larger protein complexes, such as Sin3, N-CoR/SMRT and CoREST, except for HDAC8, which has been determined to function alone (Barneda-Zahonero and Parra 2012). In general, class I HDACs are expressed ubiquitously. In contrast, class II HDACs are more tissue specific and play a crucial role in differentiation and organism development. In addition to the deacetylase domain, class II HDACs have a long regulatory domain that allows binding to tissue-specific transcription factors and thereby modulates the specificity of these histone deacetylases (M. Parra and Verdin 2010). Sirtuins can be localized in various cellular locations, including mitochondria, cytoplasm and the nucleus, where they are involved in the regulation of oxidative stress, aging, metabolism and DNA repair.

1.2 Histone Acetylation

Eukaryotic DNA is wrapped around and packaged with specialized protein complexes called nucleosomes. Each nucleosome is an octamer that consists of two copies of each of the four core histones H2A, H2B, H3 and H4. Approximately 147 bp of DNA are wrapped nearly two times around each nucleosome, with 8-114 bp of free DNA linking adjacent nucleosomes. The nucleosome is the basic unit of chromatin, and depending on how tightly the nucleosomes are packaged, chromatin can exist in two basic states: the more relaxed euchromatin and the more condensed heterochromatin (Shahbazian and Grunstein 2007).

The openness of chromatin is thought to directly impact the expression of underlying DNA with looser chromatin being more accessible to RNA Pol and transcription factors and more actively transcribed. Chromatin structure is highly dynamic, and it plays a crucial role in epigenetic gene regulation.

All core histones have a globular domain that forms the center of the nucleosome and an N-terminal tail that protrudes away from the nucleosome. These N-terminal tails are major sites of nucleosome regulation through post-translational modifications (PTMs), including acetylation, phosphorylation, methylation, ubiquitination and sumoylation (Shahbazian and Grunstein 2007). The tails of all four histones contain lysines, which are potential sites of methylation or acetylation. While methylation can be either repressing (H3K9me3, H3K27me3) or activating (H3K4me3) depending on the lysine position, lysine acetylation is generally thought to be activating.

In vitro studies suggest that lysine acetylation reduces the electrostatic attraction between the negatively charged DNA phosphates and positively charged lysines, resulting in less condensed chromatin. In vivo, lysine acetylation is regulated by “writer” enzymes, histone acetyltransferases (HATs), which catalyze the transfer of the acetyl group from acetyl-CoA onto the lysine amino group, and by “eraser” enzymes, histone deacetylases (HDACs), which remove the acetyl group from lysines. HATs and HDACs are believed to act non-specifically genome-wide, as well as at individual gene loci when targeted by transcription activators and repressors.

Histone acetylation is generally deemed an activating mark because it decreases the affinity of the nucleosome for DNA and increases DNA accessibility to transcription factors. The most studied histone acetylation mark is H3K27Ac, which marks active promoters and enhancers (Creyghton et al. 2010; Pradeepa 2017). H3K36Ac occurs in RNAP II promoters, while H3K36me3 is usually found within exons, marking actively transcribed genes (Morris et al. 2007). H3K9Ac marks the switch from transcription initiation to elongation, in contrast to H3K9me3, which is a strong repressive mark of heterochromatin (Gates et al. 2017). H3K14ac is critical for DNA damage checkpoint activation (Y. Wang et al. 2012), and together with H3K9ac marks bivalent promoters, enhancers and sometimes inactive promoters as well (Karmodiya et al. 2012). One study identified a consistent set of histone modifications marking promoter regions. This set contains the following histone modifications: H2A.Z, H2BK5ac, H2BK12ac, H2BK20ac, H2BK120ac, H3K4ac, H3K4me1, H3K4me2, H3K4me3, H3K9ac, H3K9me1, H3K18ac, H3K27ac, H3K36ac, H4K5ac, H4K8ac and H4K91ac. The authors also showed that promoters carrying these marks show higher gene expression than promoters without such modifications, and noted that the levels of these modifications correlate genome-wide (Z. Wang et al. 2008). Histone H4 lysine 16 acetylation has a relatively unusual role not related to the regulation of gene expression. H4K16ac controls nucleosome-level interactions, preventing the formation of evenly spaced nucleosomes, resulting in transcriptional repression (Blosser et al. 2009). However, another study has demonstrated that in mouse embryonic stem cells loss of H4K16 acetylation does not affect higher-order chromatin structure and, more surprisingly, that H4K16ac is another mark of active enhancers (Taylor et al. 2013). These findings suggest that histone modifications have multifarious functions, depending on the chromatin context, developmental stage and cell type.

1.3 Alterations in Acetylation in Cancer

It has been recently demonstrated that both HATs and HDACs are required to achieve the proper level of gene transcription. As discussed above, acetylation increases chromatin accessibility to RNAP and TFs; however, fine tuning of acetylation marks on the promoter is required to switch from initiation to elongation. In addition, the level of acetylation at transcription start sites must be balanced, so the activity of both HATs and HDACs is required to achieve the right transcription rate.

In general, low acetylation levels can be due to either mutations, low expression, displacement or haploinsufficiency of HATs, or overexpression/aberrant recruitment of HDACs, or the combination of these events. Similarly, elevated acetylation can be achieved by high expression, mutations and aberrant recruitment of HATs or downregulation and mutations in HDACs.

1.3.1 Wrong Histone Acetyltransferases

Mutations in HATs have been reported in many cancer types. For example, the CREBBP gene has many recurrent mutations in lymphoma and leukemia (Morin et al. 2011; Pasqualucci et al. 2011; Mullighan et al. 2011; Gui et al. 2011). In addition, EP300, a close paralog of CREBBP, was shown to have both missense and truncating mutations in solid tumors (Muraoka et al. 1996; Gayther et al. 2000). In the TCGA PanCancer study, EP300 was identified as a driver gene (tumor suppressor) for bladder (together with CREBBP), endometrial and lung cancers (Bailey et al. 2018). Two functional copies of CREBBP, but not EP300, are required to avoid defects in hematopoietic differentiation that lead to malignancies (Kung et al. 2000). Another study in mice further supports the hypothesis that EP300 and CREBBP act as tumor suppressors. In this study, embryonic stem cells deficient for p300 or CBP were injected into mouse embryos. These mice were more likely to develop hematological malignancies than the control group (Di Cerbo and Schneider 2013). p300 mediates the functioning of a number of tumor suppressor proteins, such as TGF-β, p53 and E2F, through activation of transcription of target genes (Iyer, Özdag, and Caldas 2004). However, there are clear examples of EP300 and CREBBP acting as oncogenes. For instance, both CBP and p300 were demonstrated to form fusions with MLL (KMT2A) and MOZ (KAT6A) proteins in mixed lineage leukemia (Di Cerbo and Schneider 2013). In addition, p300/CBP can acetylate more common fusion proteins such as AML1-ETO and positively modulate their contribution to leukemogenesis. These observations highlight the oncogenic potential of p300/CBP acetyltransferases. The function of HATs can be altered not only by mutations but also by upregulated expression. Overexpression of p300 was observed in breast, liver and lung carcinomas (Di Cerbo and Schneider 2013), correlating with poor prognosis. Moreover, it was demonstrated that p300 promotes the expression of androgen receptor (AR) target genes through the acetylation of AR in a ligand-independent manner in prostate cancer (Debes et al. 2003). Therefore, prostate cancer can benefit from high expression of EP300. In addition, the knockdown of EP300, but not CREBBP, significantly decreases the proliferation of prostate cancer cells. In human breast cancer lines with mutated BRCA1, p300/CBP acetylates estrogen receptor α, and the ectopic expression of WT BRCA1 downregulates p300 (Di Cerbo and Schneider 2013). PCAF together with p300 acetylates p53 and increases its DNA binding affinity, promoting growth arrest and apoptosis (Yamaguchi et al. 2009). Altogether, these results demonstrate that HATs can play both oncogenic and tumor suppressor roles in cancer development, and their exact contribution to the development of each particular cancer type is still to be identified.

1.3.2 Wrong Histone Deacetylases

Similar to HATs, HDACs are also often dysregulated in cancer. The mechanisms of dysregulation vary widely, from mutations to aberrant recruitment by oncogenic fusion proteins. For example, class I HDACs are frequently found to be upregulated in breast, pancreatic, lung and prostate carcinomas and are almost always associated with poor prognosis (Barneda-Zahonero and Parra 2012). In one study, expression of HDAC1 and HDAC3 was correlated with estrogen and progesterone receptor expression, suggesting they could be an independent prognostic marker (Krusche et al. 2005). In osteosarcoma and breast cancer cells, knockdown of HDAC1 results in cell cycle arrest and induction of apoptosis (Senese et al. 2007), while the overexpression of HDAC1, HDAC6 and HDAC8 increases cell invasion (Park et al. 2011). In addition, in breast cancer cells HDAC2 silencing enhances p53 binding ability, which correlates with cell cycle block and senescence induction (Harms and Chen 2007). Class II deacetylases were also reported to have mutations and aberrant expression in many cancer types (Barneda-Zahonero and Parra 2012). They can also affect the proliferation rate of cancer cells. For instance, HDAC5 promotes rapid cell division by regulating the repression of p14 (Yarosh et al. 2008). In addition, HDAC7 together with estrogen receptor α represses the tumor suppressor Reprimo, thereby contributing to cell growth (Malik et al. 2010). HDACs might also be involved in the development of cancer chemoresistance: it was shown that the HDAC7/HIF-1A complex might repress cyclin D1, contributing to chemoresistance (Wen et al. 2010). Not all HDACs function as potential oncogenes. In breast carcinomas, HDAC6 expression was demonstrated to be associated with better survival and was higher in ER- and PR-positive tumors (Zhang et al. 2004). HDAC6 probably deacetylates Hsp90, preventing hormone-mediated activation and decreasing the growth of breast cells (Barneda-Zahonero and Parra 2012). Sirtuins are more controversial in terms of their impact on cancer development. SIRT3 and SIRT7 were shown to be upregulated in breast cancer, while SIRT2 is, on the contrary, downregulated in gliomas and gastric carcinomas. SIRT2 appears to act as a tumor suppressor, ensuring proper passage through the mitotic checkpoint. SIRT3 overexpression opposes p53-mediated cell cycle arrest in bladder cancer cells; however, in xenograft models, SIRT3 knockdown triggers tumorigenesis (Barneda-Zahonero and Parra 2012).

These results demonstrate that HDACs play an important role in cell cycle regulation and are often associated with the outcome. However, some findings are contradictory and may vary from cancer cell lines to real patients. Hence, it is very important to examine the contribution of every HDAC.

1.3.3 Histone Acetylation in Cancer

The first studies on global histone acetylation changes in cancer cells revealed an overall reduction in H4K16 acetylation (Di Cerbo and Schneider 2013). Global loss of H3K18ac, H3K9ac and H4K16ac is generally associated with poor prognosis and a shorter life expectancy. Moreover, global loss of methylation marks such as H3K4me2, H3K9me2 and H3K27me3 is also an indicator of poor outcome (Di Cerbo and Schneider 2013). At the same time, another group of researchers associated low levels of H3K9ac and H3K18ac with a better prognosis in lung cancer (Seligson et al. 2009). The molecular mechanisms underlying such changes in global histone modification levels have not been established yet. One study has demonstrated that SIRT7 is able to deacetylate H3K18ac, resulting in inhibition of key cellular regulators, and that downregulation of SIRT7 contributes to cancer proliferation (Barber et al. 2012). Modifications of the histone core might also be dysregulated in cancer. For instance, H3K56ac, which is involved in the DNA damage response, was reported to correlate with the dedifferentiated state of cancer cells. Thus, little is known about global changes in histone acetylation, and even less of it is connected to breast cancer. Nevertheless, global dysregulation of histone acetylation may cause global changes in gene expression patterns, which may induce and enhance tumorigenesis.

1.3.4 Non-histone Proteins Acetylation: Links to Cancer

Acetylation is implicated in vital cellular processes, many of which have been linked to various diseases, including cancer. Acetylation is involved in regulation of gene expression not only in form of histone acetylation, but non-histone proteins as well. P53 is a great example of a transcription factor with activity modulated by acetylation. This key regulator of cell cycle and apoptosis is acetylation at several sites, resulting in enhanced DNA binding capacity and activation of p53-regulated genes. p53 is the most important and most frequently mutated tumor suppressor in the majority of cancer type (Narita,

Weinert, and Choudhary 2018). Another major cellular process often dysregulated in cancer is cell cycle. During the cell cycle sister chromatid are grouped together in pairs, which is accomplished by cohesion complex. A key component of cohesion complex, SMC3, surrounding chromatids as a ring, is acetylated at two sites, therefore resulting in close state of this ring and tight retention of sister chromatids together. It was also reported that acetylation modifies the activity of other cell cycle regulators such as CDK1, CDK2, Aurora kinase A and B (Narita, Weinert, and Choudhary 2018).

Taken together, these results suggest that, along with phosphorylation, acetylation may play an important role in regulation of the cell cycle in both normal and cancer cells. Moreover, acetylation of DNA damage response proteins controls the choice of pathway used to repair double-strand breaks (DSBs): non-homologous end-joining (NHEJ) or homology-directed repair (HDR). Acetylation interferes with the recruitment of the NHEJ-promoting factor TP53-binding protein 1 (53BP1) by several means: histone H4 acetylation prevents 53BP1 binding to the H4K20me2 site; acetylation of H2AK15 decreases the number of H2AK15ub sites and thus impairs 53BP1 recruitment to H2AK15ub; ATM kinase, activated during the DNA damage response, phosphorylates ACLY, thus increasing the production of acetyl-CoA in the nucleus, resulting in higher histone acetylation and incorrect chromatin localization of 53BP1; and finally, CBP acetylates 53BP1, which impairs its recruitment to double-strand breaks. Hence, acetylation of various proteins defines which pathway will be utilized to repair DSBs. If this decision-making acetylation is disrupted, NHEJ may be used more frequently, resulting in a higher rate of insertions and deletions, a common outcome of NHEJ (Narita, Weinert, and Choudhary 2018).

Finally, acetylation has been shown to be part of cell signaling as well. For instance, PTEN, which regulates the level of PIP3, can be acetylated in its catalytic and C-terminal domains. Acetylation of the catalytic domain inhibits PTEN activity, while acetylation of the C-tail promotes PTEN binding to proteins, enhancing its lipid phosphatase activity and recruiting it to signaling complexes (Narita, Weinert, and Choudhary 2018). Besides gene transcription, DNA damage response and cell signaling, acetylation is also associated with regulation of protein folding, cytoskeleton organization, RNA processing, metabolism, autophagy and other vital cellular processes.

Taken together, acetylation and acetylation-regulating enzymes are particularly important for cancer formation and are frequently positively or negatively associated with poor outcome. Recent studies by the National Cancer Institute’s Clinical Proteomic Tumor Analysis Consortium (CPTAC) have produced proteomic, phosphoproteomic and acetylome data for infiltrating breast carcinomas. These datasets provide the opportunity to evaluate the impact of acetylation on tumor progression and subtyping using a cohort of 122 breast cancer patients. In this study, we characterized global histone acetylation profiles and evaluated the differences in metabolic enzyme expression and acetylation across breast cancer subtypes. We determined an association between acetylation of N-terminal H2B histone sites and the luminal transcription factors GATA3 and FOXA1. Acetylation of H2B, but not of other histones, also significantly correlates with protein expression of many tumor-suppressor chromatin-modifying enzymes, such as KMT2A, KMT2C and KMT2D, and this correlation is lost in the basal and Her2 subtypes. Analysis of acetylation of metabolic enzymes revealed the metabolic uniqueness of the basal subtype compared to all others and demonstrated that regulation of enzymatic activity by acetylation complements that by gene and protein expression.


2 Research Methods

2.1 Datasets Overview

2.1.1 Global Acetylome Dataset

The acetylome dataset for CPTAC breast cancer was generated by the Steven Carr lab, Broad Institute, using targeted LC-MS/MS technology with isobaric tags (tandem mass tags, TMT-10) (Mertins et al. 2018). The peptides were enriched for acetylated lysines using an anti-acetyl-lysine antibody (Svinkina et al. 2015). The dataset consists of log-ratios of the intensities of each experimental sample over a common reference. The common reference is created by taking a small amount of tissue from every experimental sample and mixing them together, ensuring that peptides can be identified more consistently.

The dataset consists of 9517 detected peptides and 130 tumor samples with eight replicates. Over 9,000 detected peptides correspond to more than 10,000 unique acetylation sites for more than 300 proteins. The data were generated in 17 different experiments. Figure 2.1.1.1 shows the distribution of detected values for each experiment. The distributions are centered and scaled, so the data are normalized for batch effects.
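As an illustration of what centering and scaling per experiment means, the following minimal Python sketch z-scores the log-ratios of each experiment separately. The experiment names and values are hypothetical, and whether the original pipeline applied exactly this transform is an assumption; the thesis itself used R.

```python
from statistics import mean, stdev


def center_and_scale(values):
    """Z-score the observed values of one experiment; None marks a missing value."""
    observed = [v for v in values if v is not None]
    mu, sd = mean(observed), stdev(observed)
    return [None if v is None else (v - mu) / sd for v in values]


# Hypothetical log-ratios (sample over common reference) from two experiments.
experiments = {
    "experiment_1": [0.5, 1.5, None, 2.5],
    "experiment_2": [-3.0, -1.0, 1.0, 3.0],
}

# Each experiment is normalized on its own, so batch-level shifts are removed.
normalized = {name: center_and_scale(vals) for name, vals in experiments.items()}
```

After this step every experiment has mean 0 and standard deviation 1 over its observed values, which makes the per-experiment distributions directly comparable.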

Figure 2.1.1.1. Distribution of all detected acetylation values for each experiment separately. (x-axis: value; y-axis: density; one curve per experiment.)

Samples from one experiment have significantly lower coverage than all other samples (Fig. 2.1.1.2 B) and were excluded from the analysis. After exclusion of these samples, the minimum coverage in the dataset is 26%, and 1602 peptides, corresponding to 744 genes, have 100% coverage across samples (Fig. 2.1.1.2 A).
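The coverage figures quoted above can be computed as the percentage of non-missing values per peptide (across samples) and per sample (across peptides). The sketch below shows this on a hypothetical 3 x 4 matrix; the peptide names and the 50% cutoff for flagging low-coverage samples are assumptions made for illustration only.

```python
def coverage(values):
    """Percentage of entries that are not missing (None)."""
    return 100.0 * sum(v is not None for v in values) / len(values)


# Hypothetical peptide-by-sample matrix of log-ratios; None marks "not detected".
matrix = {
    "pep_A": [0.1, 0.4, -0.2, 0.3],
    "pep_B": [None, 0.2, None, 0.5],
    "pep_C": [0.7, None, None, None],
}
n_samples = len(next(iter(matrix.values())))

# Coverage of each peptide across samples.
peptide_coverage = {pep: coverage(row) for pep, row in matrix.items()}

# Coverage of each sample across peptides.
sample_coverage = [
    coverage([row[j] for row in matrix.values()]) for j in range(n_samples)
]

# Peptides detected in every sample, and samples below an assumed 50% cutoff.
fully_covered = [pep for pep, cov in peptide_coverage.items() if cov == 100.0]
low_coverage_samples = [j for j, cov in enumerate(sample_coverage) if cov < 50.0]
```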

Figure 2.1.1.2. Overview of global acetylome dataset coverage. A. Number of acetylated peptides vs coverage. Dashed line indicates the minimal coverage for every peptide in the dataset. B. Distribution of coverage of acetylation peptides per sample. Samples with abnormally low coverage were excluded from the analysis. (Panel A axes: coverage across samples, %, vs number of Ac-peptides with at least x% coverage, with number of genes on the secondary axis. Panel B axes: Ac-peptides coverage per sample, %, vs number of samples.)

2.1.2 Global Proteomics Data

Global proteomics data were downloaded from the CPTAC Data Coordinating Center. The dataset includes 122 tumor samples and 17 adjacent normal samples. Both acetylome and proteome data are available for 113 tumor samples.

2.1.3 Histone Acetylation Dataset

The four core histones H2A, H2B, H3 and H4 are encoded by a large number of genes, whose protein products can differ by a single amino acid in the N-terminal domain. Mass spectrometry is able to distinguish the products of different genes of one histone and therefore reports two functionally identical sites as separate entities. To facilitate the comprehension of the data, I averaged the values of all peptides belonging to one functional site of a core histone. For example, the reported peptides HIST1H2BH_K12k and HIST1H2BD_K12k were averaged to obtain a value for the H2B_K12 site.

2.2 Linear Regression Analysis

2.2.1 Histone Acetylation Linear Model

A linear model of the form $Ac \approx \alpha + \beta \cdot Pr + \varepsilon$ was fitted using the lm function in R to test for association between protein abundance and the acetylation level of a histone site in 113 tumor samples. I used the acetylation level of a histone site ($Ac$) as the dependent variable and the protein abundance of a HAT or HDAC ($Pr$) as the independent one. The protein level of histones was not included in this model since the global proteomic dataset ... Only Site : Enzyme pairs represented in more than 30 samples were considered. The p-values of the coefficient $\beta$ were adjusted to FDR using the Benjamini-Hochberg procedure.
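The per-pair fit can be sketched in Python as an ordinary least-squares regression on the samples in which both values were measured, skipping pairs with 30 or fewer such samples. This is an illustrative stand-in for the R lm call, not the code used in the thesis; the function name and the toy data are hypothetical.

```python
def fit_site_enzyme_pair(acetylation, protein, min_samples=30):
    """Least-squares fit of acetylation ~ intercept + slope * protein.

    Uses only samples where both values are present (not None). Returns
    (intercept, slope, n_used), or None if the pair is represented in
    min_samples or fewer samples or if protein abundance does not vary.
    """
    pairs = [(p, a) for a, p in zip(acetylation, protein)
             if a is not None and p is not None]
    n = len(pairs)
    if n <= min_samples:
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if sxx == 0:
        return None
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope, n


# Hypothetical data: acetylation rises linearly with enzyme abundance.
enzyme_abundance = [i / 10 for i in range(40)]
site_acetylation = [0.5 + 2.0 * x for x in enzyme_abundance]
site_acetylation[3] = None  # one sample without a detected Ac-peptide

result = fit_site_enzyme_pair(site_acetylation, enzyme_abundance)
```

The p-value of the slope and the subsequent FDR adjustment are omitted here; in R they come directly from summary(lm(...)) and p.adjust.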

To account for the possibility that such associations may appear by chance, similar linear models were fitted using random Protein : Ac-site pairs. I randomly selected 50 proteins from the global proteomics dataset and 1000 Ac-sites from the global acetylome dataset and performed linear regression analysis for every pair. I collected the resulting random $\beta$ coefficients and fitted a normal distribution to them. The fitted distribution was used to obtain the significance level of each experimental coefficient given the distribution of random $\beta$. Only pairs with a coefficient p-value < 0.05 under the random model were considered for further analysis.

2.2.2 Metabolic Enzymes Acetylation Linear Models

Similar to histone sites, associations between HATs and HDACs and acetylation of metabolic enzymes were tested using a linear model of the form $Ac \approx \alpha + \beta \cdot Pr + \gamma \cdot Pm + \varepsilon$. I used the acetylation level of a metabolic enzyme site ($Ac$) as the dependent variable and the protein abundance of a HAT or HDAC ($Pr$) and the protein abundance of the metabolic enzyme ($Pm$) as the independent ones. The protein level of histones was not included in this model since the global proteomic dataset ...

Only Site : Enzyme pairs represented in more than 30 samples were considered. The p-values of the coefficient $\beta$ were adjusted to FDR using the Benjamini-Hochberg procedure.
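With two independent variables, the coefficients can be obtained by solving the 2 x 2 normal equations on centered data. The Python sketch below does this directly; it is a hedged illustration of the model form, with hypothetical function and variable names and toy data, and it is not the R code used in the thesis.

```python
def fit_two_predictor_lm(y, x1, x2):
    """Least-squares fit of y ~ a + b * x1 + c * x2; returns (a, b, c)."""
    n = len(y)
    mean_y, mean_1, mean_2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((u - mean_1) ** 2 for u in x1)
    s22 = sum((v - mean_2) ** 2 for v in x2)
    s12 = sum((u - mean_1) * (v - mean_2) for u, v in zip(x1, x2))
    s1y = sum((u - mean_1) * (w - mean_y) for u, w in zip(x1, y))
    s2y = sum((v - mean_2) * (w - mean_y) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are collinear or constant")
    b = (s22 * s1y - s12 * s2y) / det
    c = (s11 * s2y - s12 * s1y) / det
    a = mean_y - b * mean_1 - c * mean_2
    return a, b, c


# Hypothetical abundances of an acetyltransferase and of the substrate enzyme.
hat_abundance = [0.0, 1.0, 2.0, 3.0, 4.0]
substrate_abundance = [1.0, 0.0, 2.0, 1.0, 3.0]
site_acetylation = [1.0 + 0.5 * h + 2.0 * s
                    for h, s in zip(hat_abundance, substrate_abundance)]

coefficients = fit_two_predictor_lm(site_acetylation, hat_abundance,
                                    substrate_abundance)
```

Including the substrate abundance as a covariate means the HAT/HDAC coefficient reflects association with acetylation beyond what is explained by the amount of substrate protein.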

Significant pairs were filtered using the random model p-value as described in 2.2.1.

2.2.3 Normalization of Acetylome Data for Non-histone proteins

To assess differences in the amount of protein acetylation independently of protein expression level, the global acetylation data were normalized using linear regression of the form $Ac \approx \alpha + \beta \cdot Pr + \varepsilon$. The protein abundance of any given protein ($Pr$) was used as the independent variable and the acetylation value of an Ac-site on this protein ($Ac$) was used as the dependent variable. The residual values $\varepsilon$ of every fitted model were used as new normalized acetylation values not explained by protein abundance.
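The residual-based normalization can be illustrated with a short Python sketch: fit the regression of acetylation on protein abundance and keep what is left over. The function name and the four-sample example are hypothetical; the thesis performed this step in R.

```python
def residual_normalize(acetylation, protein):
    """Return residuals of the least-squares fit acetylation ~ a + b * protein."""
    n = len(protein)
    mean_x = sum(protein) / n
    mean_y = sum(acetylation) / n
    sxx = sum((x - mean_x) ** 2 for x in protein)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(protein, acetylation))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The residual is the part of acetylation not explained by abundance.
    return [y - (intercept + slope * x) for x, y in zip(protein, acetylation)]


# Hypothetical protein abundance and Ac-site values for four samples.
protein_abundance = [1.0, 2.0, 3.0, 4.0]
site_acetylation = [1.5, 1.5, 3.5, 3.5]

normalized_acetylation = residual_normalize(site_acetylation, protein_abundance)
```

By construction the residuals sum to zero and are uncorrelated with protein abundance, so a sample with a positive value carries more acetylation at this site than its protein level alone would predict.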


2.3 Gene Set Enrichment Analysis

2.3.1 Sample subset

To select the samples with high- or low-level acetylation of N-terminal H2B sites, I performed Principal Component Analysis on the acetylation data of these sites, using the prcomp function in R with centering and scaling. Then I used the returned x matrix (the sample scores) as input for Optimal_Clusters_KMeans to determine the optimal number of k-means clusters, and then for KMeans_rcpp from the ClusterR package. The clusters lying furthest in the direction of the histone H2B site loading vectors were treated as samples with high H2B acetylation, and the clusters lying furthest in the opposite direction were chosen as samples with low H2B acetylation. This analysis was done independently for the whole sample set and for the luminal-only sample set.
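The idea behind this selection can be sketched in Python with a deliberately simplified stand-in: the first principal component is found by power iteration on standardized data, and samples are then grouped by one-dimensional k-means on their PC1 scores. This is not the prcomp / ClusterR workflow used in the thesis; the number of clusters is fixed by hand instead of being chosen by Optimal_Clusters_KMeans, and all names and values are hypothetical.

```python
from statistics import mean, stdev


def standardize_columns(rows):
    """Center and scale every column (site) of a sample-by-site matrix."""
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    sds = [stdev(c) for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def first_principal_component(rows, n_iter=200):
    """Power iteration for the leading loading vector and the sample scores."""
    weights = [1.0] * len(rows[0])
    for _ in range(n_iter):
        scores = [sum(a * w for a, w in zip(row, weights)) for row in rows]
        updated = [sum(s * row[j] for s, row in zip(scores, rows))
                   for j in range(len(weights))]
        norm = sum(v * v for v in updated) ** 0.5
        weights = [v / norm for v in updated]
    if sum(weights) < 0:  # orient PC1 so that high acetylation gives high scores
        weights = [-w for w in weights]
    scores = [sum(a * w for a, w in zip(row, weights)) for row in rows]
    return weights, scores


def kmeans_1d(values, k, n_iter=100):
    """Plain k-means on scalars (k >= 2); clusters are ordered from low to high."""
    ordered = sorted(values)
    centers = [ordered[int(i * (len(ordered) - 1) / (k - 1))] for i in range(k)]
    labels = []
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        updated = [mean([v for v, l in zip(values, labels) if l == c])
                   if c in labels else centers[c] for c in range(k)]
        if updated == centers:
            break
        centers = updated
    return labels, centers


# Hypothetical acetylation of three H2B sites in six samples.
acetylation = [
    [2.0, 2.1, 1.9], [2.2, 1.8, 2.0],      # high
    [0.0, 0.1, -0.1], [0.1, -0.1, 0.0],    # intermediate
    [-2.0, -2.1, -1.9], [-2.1, -1.9, -2.2],  # low
]
loadings, pc1_scores = first_principal_component(standardize_columns(acetylation))
cluster_labels, cluster_centers = kmeans_1d(pc1_scores, k=3)
high_samples = [i for i, l in enumerate(cluster_labels) if l == 2]
low_samples = [i for i, l in enumerate(cluster_labels) if l == 0]
```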

Figure 2.3.1.1. Score plot of the PCA-decomposed matrix of H2B N-terminal acetylation sites. Circle indicates a unit circle; the length of the Ac-site vectors is equal to their loading. Cluster 1 is considered as high acetylation samples and cluster 3 as low acetylation samples. (Axes: PC1, 68.7% explained var., and PC2, 9.8% explained var.; samples are colored by k-means group, 1 to 4.)

2.3.2 GSEA

To find genes potentially regulated by histone acetylation, the gene expression data (FPKM), provided by (?), were used. Genes with zero expression level were filtered out. The Baumgartner-Weiss-Schindler (BWS) test statistic was used as a ranking metric for differentially expressed genes (Zyla et al. 2017). The latest release (6.0) of the Network of Cancer Genes database was used as the source of gene sets for GSEA (Venkata et al. 2018; Subramanian et al. 2005). GSEA was performed in R using the GSEA function from the clusterProfiler package with the BWS statistic as the ranking metric (G. Yu et al. 2012). Only gene sets with q-value < 0.05 were considered. A similar analysis was performed with Gene Ontology (GO) sets downloaded from MSigDB version 6.2, curated by the Broad Institute (Liberzon et al. 2015).

2.4 Statistical Analysis

Mean protein expression and mean normalized acetylation in breast cancer subtypes were compared by Student’s t-test. The t-test p-values were adjusted to FDR using the Benjamini-Hochberg procedure. The differentially expressed metabolic proteins (Fig. 3.2.1.2) and the normalized acetylation sites of metabolic proteins (Fig. 3.2.2.1) used for hierarchical clustering were chosen based on the ANOVA p-value with an FDR-adjusted cutoff of 0.05.
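For reference, the Benjamini-Hochberg adjustment used throughout can be written in a few lines of Python. The sketch mirrors what p.adjust(p, method = "BH") computes in R; the input p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values from four subtype comparisons.
raw_p = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw_p)
significant = [i for i, q in enumerate(fdr) if q < 0.05]
```

Each raw p-value is multiplied by the number of tests divided by its rank, and the running minimum from the largest rank downwards keeps the adjusted values monotone.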


3 Findings

3.1 Histone Acetylation

The acetylome dataset reports ~80 acetylated peptides corresponding to various genes of the four canonical histones H2A, H2B, H3 and H4. The histone acetylation data were preprocessed as described in section 2.1.3 to facilitate interpretation of the results. The resulting histone acetylation dataset contains mostly sites from H2B histones (Fig. 3.1.1).

Figure 3.1.1. Number of acetylated sites per core nucleosome histone reported in the dataset. (x-axis: histones H2A, H2B, H3, H4; y-axis: number of Ac-sites reported.)

To evaluate general trends in histone acetylation marks across samples and identify general patterns in the acetylation of histones, unsupervised clustering of acetylation values was performed. Figure 3.1.2 shows the result of the clustering together with PAM50 subtype, stage and other clinical annotations. As seen in Figure 3.1.2, based on histone acetylation, samples can be subdivided into three groups: with overall low, average and high acetylation of histones. There is a group of samples sharing high acetylation of all four core histones; another group shares elevated H3/H4 acetylation, while H2A/H2B sites are average; and there are also patients with increased acetylation of H2B sites but low H3/H4 acetylation.

Figure 3.1.2. Unsupervised clustering of histone acetylation sites. Distance: Euclidean, clustering method: Ward’s. Red indicates the highest acetylation level, blue the lowest, white the missing data. (Rows: histone acetylation sites; column annotations: Necrosis, Age, Race, Stage, Cancer_type and PAM50.)

Additionally, H3 and H4 acetylation sites correlate with each other more strongly than with H2B and H2A sites. Indeed, the average Spearman’s correlation between sites within the H3/H4 group is much higher than that between them and the H2A/H2B group (Fig. 3.1.3). The observed correlation between histone sites can be explained by nucleosome structure. Nucleosome assembly is a sequential process, starting from the formation of H2A/H2B and H3/H4 dimers, followed by tetramer and then octamer formation.

Figure 3.1.3. Spearman’s correlation between histone acetylation sites. A. Correlation plot for all histone sites. B. Distribution of Rho correlation values for H2A/H2B sites (yellow), or H3/H4 sites (green), or between the H2A/H2B and H3/H4 groups. (Panel B axes: Spearman's Rho vs density.)
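The within-group versus between-group comparison summarized in Figure 3.1.3 can be sketched as follows. Spearman's correlation is computed as the Pearson correlation of average ranks, and pairwise values are pooled by whether both sites belong to the same histone group. The three sites and their values are hypothetical and serve only to show the bookkeeping.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical acetylation values of three sites across five samples.
sites = {
    "H3_K14": [1.0, 2.0, 3.0, 4.0, 5.0],
    "H4_K12": [1.0, 3.0, 2.0, 5.0, 4.0],
    "H2B_K5": [3.0, 1.0, 5.0, 2.0, 4.0],
}
groups = {"H3_K14": "H3/H4", "H4_K12": "H3/H4", "H2B_K5": "H2A/H2B"}

within, between = [], []
for site_a, site_b in combinations(sites, 2):
    rho = spearman(sites[site_a], sites[site_b])
    (within if groups[site_a] == groups[site_b] else between).append(rho)

mean_within = sum(within) / len(within)
mean_between = sum(between) / len(between)
```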

Many histone sites are known to be involved in transcription regulation and chromatin structure maintenance. The acetylation level of some sites may be particularly important for cancer cells and therefore tightly regulated. To determine whether there are sites regulated primarily by one of the known histone acetyltransferases, I used a linear regression approach and found histone Ac-sites that may be regulated by major acetyltransferases in breast cancer. In this model, the acetylation level of an Ac-site ($Ac$) is the dependent variable, whereas the protein abundance of an acetyltransferase ($Pr$) is the independent variable:

$Ac \approx \alpha + \beta \cdot Pr + \varepsilon$ [1]

The larger $\beta$, the larger the change in the acetylation level of an Ac-site per unit change in enzyme protein abundance. To account for random associations that might appear, similar regression models were fitted with random Ac-sites and random protein abundances. The distribution of the resulting linear model coefficients was used to determine the p-values of the experimental linear model coefficients.
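The random-model filter can be illustrated with Python's statistics.NormalDist: a normal distribution is fitted to the coefficients obtained from random pairs, and an observed coefficient is scored against it. The two-sided p-value used here is an assumption, since the text does not state the sidedness, and the random coefficients below are hypothetical placeholders for the values obtained from the random regressions.

```python
from statistics import NormalDist, mean, stdev


def random_model_p_value(observed_coefficient, random_coefficients):
    """Two-sided p-value of a coefficient under a normal fit to random ones."""
    null = NormalDist(mean(random_coefficients), stdev(random_coefficients))
    distance = abs(observed_coefficient - null.mean)
    # Probability of landing at least this far from the null mean, either side.
    return 2.0 * (1.0 - null.cdf(null.mean + distance))


# Hypothetical slope coefficients from regressions on random Protein : Ac-site pairs.
random_coefficients = [-0.2, -0.1, 0.0, 0.1, 0.2]

p_strong = random_model_p_value(0.6, random_coefficients)
p_weak = random_model_p_value(0.05, random_coefficients)
keep_strong = p_strong < 0.05
keep_weak = p_weak < 0.05
```

A coefficient is retained only if it lies far enough in the tails of the distribution of coefficients that arise from unrelated pairs.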

Figure 3.1.4 shows the breakdown of the linear model coefficients $\beta$ per HAT. As seen from Fig. 3.1.4, the CREBBP protein product (CBP) demonstrates the strongest association with the N-terminal H2B histone Ac-sites: H2B K5, K11, K12, K15, K16, K20 and K23. This observation is concordant with the findings of a recent paper stating that p300/CBP are responsible for acetylation of N-terminal H2B Ac-sites (Weinert et al. 2018).

Figure 3.1.4. Coefficients of fitted linear models [1]. Only coefficients with p-value < 0.05 under random model are labeled. (x-axis: acetyltransferase; y-axis: linear model coefficient; color: histone; point size: log10 of the FDR adjusted p-value.)

EP300 (p300) also shows association with N-terminal H2B Ac-sites, but the coefficients are smaller and less significant under the random association model. In addition, CLOCK, KAT5 and KAT7 exhibit significant associations with the H2AK124, H3K23 and H2BK57 sites, respectively; however, a physical interaction between these proteins and Ac-sites has not been reported so far. The high association between KAT7 and KAT8 proteins and the H2BK5 and K11 sites can be attributed to co-expression of these acetyltransferases with EP300 and CREBBP (Spearman’s Rho EP300/KAT7 = 0.6, Rho CREBBP/KAT8 = 0.45).

I decided to focus on the CBP : H2B association because H2B sites have better coverage than other significant sites, providing more power for statistical tests, and because this histone has never been explored in cancer studies. Since acetylation of histones in general decreases the positive charge of the nucleosome, weakening its interaction with DNA phosphate groups and resulting in a more relaxed and accessible chromatin state, global differential acetylation of histone H2B may affect the expression of some genes. One possible mechanism of action on gene expression might be associated with the removal of H2A-H2B dimers from nucleosomes upon acetylation, which may help maintain the open state of certain genome regions (Ito et al. 2000). Additionally, at least in yeast, H2B N-terminal lysines have been shown to be involved in upregulation of genes involved in NAD+ and vitamin synthesis (M. A. Parra et al. 2006), suggesting that changes in global histone acetylation can influence the expression of specific genes. To determine whether differential acetylation of H2B N-terminal sites is connected to gene expression, gene set enrichment analysis (GSEA) was performed using RNA-seq gene expression data. I compared gene expression in two groups of samples with either high or low acetylation of H2B N-terminal sites. Using Network of Cancer Genes (NCG) curated sets, we identified that the ‘Breast cancer’ set is enriched for upregulated genes (high H2B samples compared to low H2B ones), along with the kidney, bladder and liver cancer sets. In contrast, the downregulated genes were found to be enriched for melanoma-associated genes (data not shown).

Since many cancer types share cancer-associated proteins, such as p53, c-Myc, NRAS and other well-known oncogenes and tumor suppressors, it is not surprising to see other cancer types enriched as well. Nevertheless, the breast cancer set shows the largest number of genes contributing to the maximum enrichment score (34 genes versus 23 in kidney and 24 in bladder cancer).


Figure 3.1.5. Significantly enriched NCG sets in differentially expressed genes. A. Gene ratio of enriched sets. B. Running enrichment score for breast and kidney cancer sets. (Panel A: sets shown are kidney, melanoma, bladder, liver, breast and prostate; x-axis: GeneRatio; point size: Count; color: p.adjust. Panel B: x-axis: position in the ranked list of genes; y-axes: running enrichment score and ranked list metric.)
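The running enrichment score plotted in panel B follows the weighted running-sum statistic of Subramanian et al. (2005). A minimal Python sketch is given here for orientation: walking down the ranked gene list, the score increases at genes in the set (weighted by their ranking metric) and decreases otherwise, and the enrichment score is the largest deviation from zero. The gene identifiers and metric values are hypothetical, and permutation-based significance, which clusterProfiler computes, is not included.

```python
def running_enrichment_score(ranked_genes, ranking_metric, gene_set, weight=1.0):
    """Return (enrichment_score, running_profile) for one gene set.

    ranked_genes and ranking_metric must be sorted by decreasing metric.
    """
    in_set = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(in_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    hit_total = sum(abs(m) ** weight for m, hit in zip(ranking_metric, in_set) if hit)
    running, profile = 0.0, []
    for metric, hit in zip(ranking_metric, in_set):
        if hit:
            running += abs(metric) ** weight / hit_total
        else:
            running -= 1.0 / n_miss
        profile.append(running)
    # The enrichment score is the running value with the largest magnitude.
    return max(profile, key=abs), profile


# Hypothetical ranked list (e.g. by a differential-expression statistic).
ranked_genes = ["gene_1", "gene_2", "gene_3", "gene_4", "gene_5", "gene_6"]
ranking_metric = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
example_set = {"gene_1", "gene_2"}

score, profile = running_enrichment_score(ranked_genes, ranking_metric, example_set)
```

The genes encountered before the running sum reaches its maximum are the ones that contribute to the maximum enrichment score.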

The mRNA and protein expression levels of the top ten breast cancer associated genes contributing to the maximum enrichment score are shown in Fig. 3.1.6. For these genes, the difference in gene expression level is also reflected at the protein level.


Figure 3.1.6. Top ten DE genes enriched for breast cancer association. mRNA level corresponds to log2(FPKM + 0.01), protein level corresponds to normalized relative protein abundance. Significance level of Wilcoxon test is shown. (Genes shown: ESR1, GPS2, TBX3, ARID2, FOXA1, GATA3, TRAF5, ARID1A, MED12 and MAP3K1; samples are grouped by H2B status, high_H2B_acet or low_H2B_acet.)

Among the top ten breast cancer DE genes, there are three genes involved in establishing the luminal differentiation program in breast tissue: GATA3, FOXA1 and ESR1. Their expression remains high in breast cancer subtypes derived from luminal cell types and is positively associated with good outcome (Shou et al. 2016; Yoon et al. 2010). Due to this subtype specificity, it is possible that the observed difference in expression of those genes is driven by subtype rather than by H2B acetylation. To see whether H2B acetylation shows a similar association with GATA3, FOXA1 and ESR1 expression in luminal samples only, I tested the mean difference between H2B high and low samples, defined for luminal subtypes only, using a similar approach (see 2.3.1). It turned out that there is no significant difference in GATA3, FOXA1 or ESR1 mRNA level between H2B high and low luminal samples, whereas the difference in GATA3 and FOXA1 protein levels is significant (p-value < 0.05) (see Appendix A, Fig. A.1). In addition, correlation analysis between H2B Ac-sites and GATA3, FOXA1 and ESR1 mRNA or protein in luminal samples revealed that protein has a stronger association with H2B histone acetylation than mRNA for all three genes (Fig. 3.1.7). However, in all samples together both mRNA and protein correlate well with H2B acetylation (Fig. A.2). Such a dramatic difference in the correlation between mRNA and H2B acetylation in all samples versus luminal samples alone indicates that the correlation in all samples together is driven by differences between subtypes and not by H2B acetylation. The protein correlation with H2B acetylation remains similar for all samples and for luminal samples only, suggesting that this correlation is genuine.

Figure 3.1.7. Spearman’s correlation between H2B N-terminal Ac-sites with luminal specific DE genes. A. GATA3, mutations are labeled: M, missense mutation; F, frameshift insertion/deletion mutation; S, splice site mutation. B. FOXA1, C. ESR1. mRNA level corresponds to log2(FPKM + 0.01), protein level corresponds to normalized relative protein abundance. Points are colored by PAM50 subtype (LumA, LumB).

Correlation coefficients R (with p-values in parentheses) annotated in the panels, listed in panel order:

Variable         H2B_K5           H2B_K11         H2B_K11_K12       H2B_K15_K16      H2B_K20_K23
GATA3 mRNA       0.27 (0.025)     0.26 (0.033)    0.25 (0.036)      0.15 (0.2)       0.18 (0.14)
GATA3 protein    0.28 (0.017)     0.48 (2e-05)    0.39 (0.00073)    0.33 (0.0045)    0.41 (0.00046)
FOXA1 mRNA       0.059 (0.63)     0.019 (0.87)    0.041 (0.73)      0.038 (0.75)     0.075 (0.53)
FOXA1 protein    0.21 (0.081)     0.47 (4e-05)    0.35 (0.0027)     0.14 (0.23)      0.29 (0.016)
ESR1 mRNA        -0.071 (0.56)    0.045 (0.71)    0.15 (0.23)       0.039 (0.75)     0.0089 (0.94)
ESR1 protein     0.069 (0.57)     0.15 (0.2)      0.12 (0.32)       0.0072 (0.95)    0.082 (0.5)

We thus see a very low correlation between H2B acetylation and GATA3/FOXA1 mRNA levels, whereas the protein levels are highly correlated with H2B acetylation. This may indicate that these gene products are regulated at the protein level by H2B acetylation. It is known that subunits of multiprotein complexes that are not incorporated into the complex are degraded rapidly (Mueller et al. 2015). GATA3 has been demonstrated to act upstream of FOXA1 and to mediate ESR1 binding ability (Theodorou et al. 2013); therefore, these proteins may form complexes at gene regulatory regions. In support of this statement, our data indicate that protein expression of GATA3, FOXA1 and ESR1 is highly concordant in luminal breast cancer (Fig. 3.1.8): the average Spearman’s correlation between the protein abundances of these three proteins is 0.41. Hence, these findings suggest that GATA3, FOXA1 and ESR1 form a complex that is linked to H2B acetylation.

Even though three important transcription factors correlate with H2B acetylation, it is unlikely that they interact with it directly, since none of these proteins has a bromodomain able to recognize acetylated proteins. Hence, there must be a bromodomain-containing mediator able to bind H2B histone sites that serves as a linker between the TFs and the histone modification. There are at least five proteins with a bromodomain that were shown to interact with H2B acetylated lysines: BRD2, BRD3, BRD4, CBP (CREBBP gene product) and p300 (EP300 gene product). In luminal breast cancer, all five proteins demonstrate good correlation with GATA3 protein and histone H2B acetylation (Fig. 3.1.8 and Fig. A.3), suggesting that they interact with GATA3 and that this interaction involves histone H2B.

Figure 3.1.8. Unsupervised clustering of histone acetylation sites. Distance: Euclidean, clustering method: Ward’s. Red indicates the highest acetylation/protein/mRNA level, blue the lowest, white the missing data. Mutations are labeled: M, missense mutation; F, frameshift insertion/deletion mutation; S, splice site mutation. mRNA level corresponds to log2(FPKM + 0.01), protein level corresponds to normalized relative protein abundance. (Rows: BRD2, BRD3, BRD4, CREBBP, EP300; ESR1, FOXA1 and GATA3 mRNA; ESR1, FOXA1 and GATA3 PROT; H2B_K5, H2B_K11, H2B_K11_K12, H2B_K15_K16, H2B_K20_K23. Column annotations: HER2, PR, ER and PAM50.)

In Fig. 3.1.8, we can clearly see a group of samples with high acetylation of H2B and high protein expression of GATA3, FOXA1, ESR1 and bromodomain-containing proteins (on the right), and a group of samples with low expression of these proteins and low histone acetylation. However, there is also a group with elevated acetylation of H2B but average or sometimes decreased expression of the transcription factors. One possible explanation for this observation is that other proteins can interact with H2B as well, contributing to the longevity of acetylation marks. Such proteins could be found by a similar approach, searching for high protein correlation but low mRNA correlation with H2B acetylation. Such an analysis can be done in the future.

I also noticed that genes previously found to be differentially expressed between samples with high and low H2B acetylation show dramatic differences between their protein and mRNA correlations with H2B acetylation. In luminal samples, these genes correlate with H2B acetylation at the protein level but not at the mRNA level. Many of them encode chromatin-interacting proteins. This indicates that they are more likely to be involved in interaction with H2B than to be regulated by H2B at the mRNA level (Fig. 3.1.9 A). Such a correlation is not observed for H3 histone sites marking active chromatin (Fig. 3.1.9 B), or for H2A or H4 (data not shown). Among the proteins with high correlation with H2B there are transcription factors (ARID1A, ARID1B, ARID2, FOXA1, GATA3), methyltransferases (KMT2A, KMT2C and SETD2), EP300, and other chromatin-interacting and chromatin-modifying enzymes.

Figure 3.1.9. Average Spearman’s correlation between mRNA and protein levels of differentially expressed genes and histones acetylation. A. Correlation computed for H2B K5, K11, K11_K12, K15_K16 and K20_K23. B. Correlation computed for H3 K27, K27_K36 and K36. (Genes shown: ESR1, TBX3, ITCH, ARID2, GPS2, RPGR, FOXA1, GATA3, SETD2, EP300, ASXL1, ERBB2, ERBB3, TRAF5, ARID1A, ARID1B, RUNX1, KMT2A, KMT2C, KMT2D, PBRM1, NCOR1, MED12, CCND1, EXOC2, CDKN1B, MAP2K4, MAP3K1, ZFP36L1 and SMARCD1; bars are colored by type, mRNA or PROT.)

To sum up, we showed that acetylation of histone H2B is significantly associated with CBP/p300 protein level across subtypes, suggesting that CBP/p300 may acetylate H2B N-terminal sites. We also demonstrated that in luminal subtypes the protein level of various TFs, including luminal-specific TFs, correlates with H2B acetylation better than their mRNA level does, suggesting a direct or indirect interaction between those factors and histone H2B.


3.2 Metabolic Proteins Expression and Acetylation Interplay

3.2.1 Expression of metabolic enzymes characterize Basal subtype metabolism as glycolytic

Acetylation is the second most common post-translational modification in eukaryotic cells. Along with histone proteins, many non-histone proteins have been found to be acetylated. Among the acetylated proteins are ones involved in chromatin remodeling, metabolism, translation, splicing and other vital cellular processes (Choudhary et al. 2009). Acetylation of non-histone proteins has been shown to play an essential role in cancer development; therefore, it is important to see how the acetylation of those proteins can contribute to breast cancer subtyping and development.

To find out which cellular pathways are enriched for acetylated proteins in the existing breast cancer acetylome dataset, a KEGG pathway enrichment analysis was performed with the acetylated proteins reported in the dataset as the foreground and the whole list of genes from the RNA-seq data as the background. The top ten overrepresented KEGG pathways are shown in Fig. 3.2.1.1. The enriched pathways include the spliceosome, ribosome and proteasome subunits, along with enzymes involved in carbon metabolism, amino acid synthesis, etc. (Fig. 3.2.1.1). All these gene products are housekeeping and relatively abundant in every cell, hence it is not surprising to see them enriched for acetylated proteins. This observation points to one limitation of targeted mass-spectrometry measurement that we have to keep in mind while analyzing the data: less abundant proteins may not be captured by antibodies because the antibodies are saturated by an excess of housekeeping proteins.
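An over-representation analysis of this kind is commonly computed with a one-sided hypergeometric test. The sketch below assumes that approach (the exact test is not restated in this section); the function names are illustrative and all sizes in the usage example are made up. GeneRatio is taken here to be the overlap divided by the foreground size.

```python
from math import comb


def enrichment_p_value(background_size, pathway_size, foreground_size, overlap):
    """One-sided hypergeometric p-value: P(X >= overlap).

    X counts pathway members in a foreground of `foreground_size` genes drawn
    without replacement from `background_size` genes, `pathway_size` of which
    belong to the pathway.
    """
    upper = min(pathway_size, foreground_size)
    total = comb(background_size, foreground_size)
    tail = sum(
        comb(pathway_size, k)
        * comb(background_size - pathway_size, foreground_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def gene_ratio(overlap, foreground_size):
    """Fraction of foreground genes that fall into the pathway."""
    return overlap / foreground_size


# Hypothetical numbers: 20000 background genes, a 120-gene pathway,
# 1500 acetylated proteins in the foreground, 40 of them in the pathway.
p_value = enrichment_p_value(20000, 120, 1500, 40)
ratio = gene_ratio(40, 1500)
```

With many pathways tested at once, the resulting p-values would still need a multiple-testing adjustment before being compared with a cutoff.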


[Figure 3.2.1.1: dot plot titled "KEGG pathway enrichment analysis of acetylated genes in CPTAC breast cancer" (x-axis: GeneRatio, 0.02 to 0.05; dot size: Count, 30 to 70; color: p.adjust). Pathways shown: Spliceosome; Carbon metabolism; Ribosome; RNA transport; Systemic lupus erythematosus; Biosynthesis of amino acids; Valine, leucine and isoleucine degradation; Proteasome; Propanoate metabolism; Citrate cycle (TCA cycle). Right: protein interaction networks for Carbon metabolism, RNA transport, Ribosome and Spliceosome.]

Figure 3.2.1.1 Top enriched pathways for acetylated proteins in breast cancer. The right part provides examples of interactions between acetylated proteins; colored nodes indicate proteins with novel sites reported in the dataset, grey nodes correspond to proteins for which all Ac-sites reported in the dataset are already known.

I decided to focus on the acetylation of metabolic proteins in cancer and to look for subtype differences in this process. Cancer is known to rewire its metabolism to support a constant need for energy and for monomers to build new cells. One possible way to reprogram metabolism is to change the expression of metabolic genes. Another way, which regulates protein activity without changing expression, is to use post-translational modifications, and acetylation is known to be a major regulatory modification of metabolic enzymes.

First, I evaluated how protein expression of metabolic genes differs across subtypes (Fig. 3.2.1.2). In Figure 3.2.1.2 the basal subtype forms an isolated cluster, supporting the fact that the basal triple-negative subtype has a distinctive form of metabolism (Lanning et al. 2017).


[Figure 3.2.1.2: heatmap of protein expression (color scale from −4 to 6) with sample annotation tracks for Stage and PAM50 subtype (Basal, LumA, LumB, Her2, Normal_like, Normal) and a row annotation for Compartment (cytoplasm, endosome, EPR, Golgi, membrane, mitochondria, nucleus, various). Rows are metabolic proteins and carriers, for example SLC2A1, SLC2A3, LDHB, PHGDH, PFKP, PSAT1, ENO1, HK2, GLS, FBP1 and GLUD1.]

Figure 3.2.1.2. Unsupervised clustering of protein expression of differentially expressed metabolic proteins and carriers (Anova, FDR < 0.05). Distance: Euclidean, clustering method: Ward’s. Red indicates the highest protein level, blue – the lowest, white – the missing data. Many solute outer membrane carriers are upregulated in basal subtype along with glycolysis enzymes.
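Ward's method with Euclidean distance, as named in the caption, merges at each step the two clusters whose union increases the within-cluster sum of squares the least. The following is a naive sketch of that rule, assuming complete data (the missing values shown in white would need imputation or masking first); the function names and the toy profiles are hypothetical.

```python
def centroid(cluster):
    """Coordinate-wise mean of a list of equal-length points."""
    n = len(cluster)
    return [sum(p[d] for p in cluster) / n for d in range(len(cluster[0]))]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_clusters(points, n_clusters):
    """Greedy agglomeration: repeatedly merge the pair with the lowest cost."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        pairs = [
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        i, j = min(pairs, key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy example: four samples described by two protein levels each.
samples = [(2.0, 1.8), (2.2, 2.1), (-1.9, -2.0), (-2.1, -1.7)]
groups = ward_clusters(samples, 2)
```

This quadratic search is only meant to show the merging criterion; a real heatmap would use an optimized library routine and would cluster rows and columns separately.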

Luminal subtypes and the basal subtype tend to have their metabolic proteins regulated in opposite manners: genes highly expressed in one are lowly expressed in the other. Her2 cells show mixed behavior, but their expression is more concordant with the basal subtype than with luminal A/B. Seven out of 26 proteins upregulated in the basal subtype are involved in glycolysis: glucose importers GLUT1/GLUT3, hexokinases HK2/HK3, phosphofructokinase PFKP and enolase ENO1 (t-test FDR < 0.05 compared to LumA/B). In addition, lactate dehydrogenase LDHB is also overexpressed in basal compared to all other subtypes (t-test FDR < 0.001), suggesting that pyruvate (Pyr), the final product of glycolysis, is further converted into lactate and then exported from the cell by Monocarboxylic Acid Transporter 1 (MCT1), which is also upregulated (t-test FDR < 0.01 compared to LumA and Her2). Interestingly, two of the three enzymes controlling the rate-limiting reactions of glycolysis (HK and PFKP) are upregulated in basal, meaning that the basal subtype probably has an elevated glycolytic flux compared to luminal A and B. I would also like to point out that the PFKP gene has a higher copy number variation (CNV) level in the basal subtype than in other subtypes, which probably causes its upregulated expression (Fig. 3.2.1.3). Even though the Her2 subtype shows high levels of the PFKP protein product, the CNV of this gene is not altered in Her2; therefore, another mechanism might be involved in the upregulation of protein expression.

[Figure 3.2.1.3: three box plots across subtypes (Basal, Her2, LumA, LumB, Normal_like) showing PFKP copy number level, PFKP FPKM and PFKP protein level. ANOVA p-values printed on the panels: 1.2e−05, 1.4e−11 and 5.4e−16.]

Figure 3.2.1.3. PFKP gene amplification, RNA level and protein level in breast cancer subtypes. Copy number level is defined as log2(copy number tumor / copy number normal). The left plot has a dashed line marking 0.5, the cutoff for a gene to be considered as having one additional copy. FPKM is Fragments Per Kilobase of transcript per Million mapped reads.
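The two quantities defined in the caption can be written out directly. The sketch assumes a diploid normal sample (two copies) by default and the standard FPKM definition; the function names are illustrative.

```python
from math import log2

GAIN_CUTOFF = 0.5  # dashed line in the copy number panel of Figure 3.2.1.3


def copy_number_level(tumor_copies, normal_copies=2):
    """log2 ratio of tumor to normal copy number."""
    return log2(tumor_copies / normal_copies)


def has_extra_copy(tumor_copies, normal_copies=2):
    """True when the log2 ratio exceeds the 0.5 cutoff."""
    return copy_number_level(tumor_copies, normal_copies) > GAIN_CUTOFF


def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# One extra copy on a diploid background: log2(3 / 2), which lies above 0.5.
one_gain = copy_number_level(3)
```

A single-copy gain in a pure diploid sample gives log2(3/2), roughly 0.58, so the 0.5 cutoff sits just below that value.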

The same association between CNV, RNA and protein level is observed for GLUT1 and LDHB (Fig. A.4 and A.5), but not for HK2/3 (data not shown). Notably, along with the upregulation of key glycolytic enzymes, the basal subtype has significantly reduced protein expression of a key gluconeogenesis enzyme, Fructose-1,6-bisphosphatase 1 (FBP1) (t-test FDR < 0.05 compared to LumA/B), strongly supporting the importance of glycolytic flux for the basal subtype.
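The "t-test FDR" values quoted throughout are p-values adjusted for multiple testing. The adjustment method is not restated in this section; the sketch below assumes the widely used Benjamini-Hochberg procedure, and the input p-values are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four proteins tested between two subtypes.
raw = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw)
significant = [q < 0.05 for q in fdr]
```

Each adjusted value is the smallest false discovery rate at which that test would still be called significant, which is how a statement such as "FDR < 0.05" is read.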

Glycolysis is tightly connected to the serine synthesis pathway, which starts from the glycolytic intermediate 3-phosphoglycerate (3-PG). Serine synthesis appears to be upregulated in the basal subtype as well, since the enzymes controlling its first two reactions, phosphoglycerate dehydrogenase (PHGDH) and phosphoserine aminotransferase 1 (PSAT1), are upregulated at both the RNA and protein levels in the basal subtype (t-test FDR < 0.05). Serine can then be used for protein synthesis or for the synthesis of another amino acid, glycine. The enzyme catalyzing the conversion of serine into glycine in the cytoplasm, Serine Hydroxymethyltransferase 1 (SHMT1), is slightly downregulated in basal compared to luminal (t-test FDR < 0.2 compared to LumA/B), while, on the contrary, its mitochondrial counterpart SHMT2 is upregulated (t-test FDR < 0.005 compared to LumA/B). Such dysregulation of SHMT1/2 indicates that the basal subtype may use the mitochondrial conversion of serine to glycine more actively, as opposed to luminal subtypes, which rely more on the cytoplasmic reaction. In mitochondria, glycine might be further decarboxylated by glycine decarboxylase (GLDC), which is also upregulated in the basal subtype (t-test FDR < 0.05).

Tumors also often use glutamine as an alternative source of energy to fuel the TCA cycle and make the citric acid required for amino acid synthesis (Scalise et al. 2017). Glutamine can be used as a source of amino groups in many transaminase reactions. For example, glutamine is used by Phosphoribosyl Pyrophosphate Amidotransferase (PPAT), catalyzing the first step of purine synthesis; Guanine Monophosphate Synthase (GMPS), a member of the de novo guanine synthesis pathway; Glutamine--Fructose-6-Phosphate Transaminase 1/2 (GFPT1/2), controlling the flux of glucose into the hexosamine pathway; and Asparagine Synthetase (ASNS), transforming aspartate into asparagine. All these proteins are upregulated in the basal subtype (t-test FDR < 0.05 compared to LumA for PPAT; t-test FDR < 0.2 compared to LumA for GFPT1; t-test FDR < 0.05 compared to LumA/LumB for GFPT2, GMPS and ASNS). To support all these reactions, a tumor cell needs enough glutamine coming from the extracellular space. However, none of the known glutamine transporters (SLC1A4, SLC1A5, SLC6A19, SLC38A1, SLC38A2) is upregulated in any subtype, except for SLC6A14, which is upregulated in the Basal and Her2 subtypes at the mRNA level (t-test FDR < 0.05 compared to LumA/LumB) (Scalise et al. 2017). In addition to being a source of amino groups for other chemical compounds, glutamine can be hydrolyzed by glutaminase (GLS), the protein expression of which is significantly elevated in the Basal subtype (t-test FDR < 10⁻⁴). Many tumors rely on glutamine because it can fuel the TCA cycle in mitochondria: first Gln is deaminated into Glu, which is further transformed into α-ketoglutarate through a reaction catalyzed by Glutamate Dehydrogenase 1 (GLUD1). As opposed to all other enzymes we have discussed so far, the protein level of GLUD1 is significantly decreased in the Basal subtype (t-test FDR < 0.05), suggesting that fueling the TCA cycle is not a preferred option for this breast cancer subtype.

3.2.2 Differential acetylation of cytoplasmic and mitochondrial metabolic enzymes

As the analysis of gene and protein expression of metabolic enzymes shows, the Basal subtype exhibits an aerobic glycolytic form of metabolism. Under glycolytic metabolism, the TCA cycle and oxidative respiration are usually suppressed (Vander Heiden, Cantley, and Thompson 2009). However, the protein expression analysis did not reveal any changes in the expression of TCA cycle enzymes across breast cancer subtypes. Hence, other mechanisms, such as post-translational modifications, might be involved in balancing mitochondrial reactions.

Therefore, I evaluated differentially acetylated sites across subtypes. I found 36 metabolic proteins with differentially acetylated sites, 8 of which are involved in glycolysis and 9 in the TCA cycle. Figure 3.2.2.1 shows the clustering of normalized acetylation levels of the differentially acetylated sites. Almost all basal samples have high acetylation of mitochondrial enzymes, while cytoplasmic enzymes are hypoacetylated. A subset of luminal A samples has notably elevated acetylation of glycolytic enzymes such as Glyceraldehyde-3-Phosphate Dehydrogenase (GAPDH), ENO1, Glucose-6-Phosphate Isomerase (GPI), Triosephosphate Isomerase 1 (TPI1), Phosphoglycerate Kinase 1 (PGK1) and Pyruvate Kinase (PKM), as well as cytoplasmic paralogs of TCA cycle enzymes: malate dehydrogenase 1 (MDH1) and Isocitrate Dehydrogenase (NADP+) 1 (IDH1).

[Figure 3.2.2.1: heatmap of normalized acetylation levels (color scale from −20 to 5) with sample annotation tracks for Stage and PAM50 subtype (Basal, LumA, LumB, Her2, Normal_like, Normal) and a row annotation for compartment (cytoplasm, membrane, mitochondria, various). Rows are acetylation sites labeled by gene and lysine position, for example IDH2_K263, ACO2_K739, SDHA_K541, PDHB_K227, MDH2_K335, GAPDH_K186, PGK1_K131, PKM_K433 and GPI_K252.]

Figure 3.2.2.1. Unsupervised clustering of normalized acetylation level of differentially acetylated lysines of metabolic proteins and carriers (Anova, FDR < 0.05). Distance: Euclidean, clustering method: Ward’s. Red indicates the highest acetylation level, blue – the lowest, white – the missing data.

Interestingly, there is little overlap between differentially expressed and differentially acetylated glycolysis enzymes. A summary of the observed changes in glycolytic enzymes across subtypes is shown in Table 3.2.2.1.


Table 3.2.2.1 Summary of protein expression and acetylation of glycolytic enzymes in breast cancer subtypes. Number of affected sites was chosen with t-test FDR < 0.1.

GENE       PROTEIN EXPRESSION   ACETYLATION                NUMBER OF AC-SITES AFFECTED   NUMBER OF AC-SITES DETECTED
SLC2A1     Higher in basal      NA                         -                             0
SLC2A3     Higher in basal      NA                         -                             0
HK1        No change            No change                  0                             12
HK2, 3     Higher in basal      NA                         -                             0
GPI        No change            Lower in basal             6                             14
ALDOA      No change            Higher in basal and her2   2                             12
ALDOB, C   No change            NA                         -                             0
TPI        No change            Lower in basal             2                             13
GAPDH      No change            Lower in basal             8                             14
PGK1       No change            Lower in basal             9                             31
PGAM1      No change            No change                  0                             8
ENO1       Higher in basal      Lower in basal             7                             23
PKM        No change            Lower in basal             7                             12
LDHA       No change            Lower in basal             3                             11
LDHB       Higher in basal      No change                  0                             9

All differentially acetylated sites of glycolytic enzymes, except for those of aldolase A (ALDOA), are hypoacetylated in the Basal subtype. It is worth noting that the basal subtype differs the most in acetylation of metabolic enzymes, but not others (Fig. 3.2.2.2).

[Figure 3.2.2.2: box plots of normalized acetylation level across subtypes (Basal, Her2, LumA, LumB) for GPI_K252, PGK1_K131, ACO2_K465 and MDH2_K335. ANOVA p-values printed on the panels: 2.5e−09, 3.4e−06, 0.00038 and 1.3e−08.]

Figure 3.2.2.2. Examples of differentially acetylated sites of metabolic enzymes.


Table 3.2.2.2 summarizes the changes in acetylation of TCA cycle enzymes and the pyruvate dehydrogenase complex (PDC). As seen from this table, none of the TCA cycle proteins is differentially expressed, and only one subunit of PDC, DLAT, is. However, I found nine mitochondrial enzymes with upregulated acetylation in the basal subtype.

Table 3.2.2.2 Summary of protein expression and acetylation of TCA cycle enzymes in breast cancer subtypes. Number of affected sites was chosen with t-test FDR < 0.1.

GENE       PROTEIN EXPRESSION   ACETYLATION                        NUMBER OF AC-SITES AFFECTED   NUMBER OF AC-SITES DETECTED
CS         No change            No change                          0                             6
ACO2       No change            Higher in basal                    5                             15
IDH2       No change            Higher in basal                    3                             18
IDH3A,G    No change            No change                          0,0                           7,2
IDH3B      No change            Higher in basal                    2                             2
OGDH       No change            No change                          0                             3
SUCLG1,2   No change            Higher in basal                    1,1                           5,9
SUCLA2     No change            No change                          0                             3
SDHA       No change            Higher in basal                    2                             8
SDHB, C    No change            No change                          0                             1,0
FH         No change            Higher in basal                    1                             8
MDH2       No change            Higher in basal / lower in basal   3                             16
PDHA1      No change            No change                          0                             5
PDHB       No change            Higher in basal                    2                             3
DLD        No change            Higher in basal                    1                             14
DLAT       Higher in basal      No change                          0                             5

We identified that glycolysis enzymes are upregulated and hypoacetylated in basal subtype, while TCA cycle enzymes are hyperacetylated. Next, I investigated if there are HATs or HDACs that cause such difference between subtypes.

Hence, I studied the association of the expression of HATs and HDACs with the acetylation of metabolic enzymes. The association was tested using the linear model [2]:

�� ≈ � + � ∗ �� + � ∗ �� + � [2]

The deacetylase SIRT3 was found to have a strong association with a number of acetylation sites of mitochondrial enzymes. SIRT3 is a major deacetylase in mitochondria, and many TCA cycle enzymes have been reported to be its substrates (Cimen et al. 2010; W. Yu, Dittenhafer-Reed, and Denu 2012; Ozden et al. 2014). SIRT3 shows a negative association (as expected for a deacetylase) with MDH2, Acyl-CoA Dehydrogenase Very Long Chain (ACADVL), Succinate-CoA Ligase Alpha Subunit (SUCLG1) and 3-Hydroxymethyl-3-Methylglutaryl-CoA Lyase (HMGCL), which catalyzes the final step of leucine degradation in mitochondria (Fig. 3.2.2.3). These proteins seem to be under the control of the SIRT3 deacetylase in mitochondria.
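The sign convention used above (a negative coefficient for a deacetylase) can be illustrated with an ordinary least-squares fit. This is only a sketch: it assumes that model [2] regresses a site's acetylation level on an intercept and two covariates, taken here, purely as an assumption, to be the substrate protein level and the enzyme (HAT or HDAC) protein level. The function names and the toy data are hypothetical.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear_model(y, x1, x2):
    """Least-squares fit of y ~ b0 + b1 * x1 + b2 * x2; returns [b0, b1, b2]."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(3)]
    return solve(xtx, xty)


# Toy data built so that acetylation falls as the (assumed) deacetylase rises.
substrate = [0.1, 0.4, -0.3, 0.9, -0.6, 0.2]
enzyme = [1.2, -0.5, 0.3, -1.1, 0.8, 0.0]
acetylation = [0.5 + 1.0 * s - 0.8 * e for s, e in zip(substrate, enzyme)]
intercept, b_substrate, b_enzyme = fit_linear_model(acetylation, substrate, enzyme)
```

In this toy setup the fitted enzyme coefficient recovers the negative slope that was built into the data, which is the pattern reported for SIRT3 and its mitochondrial targets.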

[Figure 3.2.2.3: dot plot of linear model coefficients (y-axis: Linear model coef, from −0.5 to 1.0) for acetylation sites, grouped on the x-axis by enzyme: SIRT1, SIRT2, SIRT3, SIRT7, HDAC1, HDAC2, HDAC3, HDAC4, HDAC5, HDAC6, HDAC7, HDAC8, HDAC10, HDAC11. Dot size: log10(FDR adjusted p-val). Labeled sites include MDH2_K239, MDH2_K335, SUCLG1_K66, ACADVL_K395, ACADVL_K658 and HMGCL_K48.]

Figure 3.2.2.3. Coefficients of fitted linear models for HDACs [2]. Only coefficients with linear model coefficient FDR < 0.1 and p-value < 0.01 under random model are shown.

Linear models for another mitochondrial deacetylase, SIRT5, were not significant under the FDR cutoff and the random model p-value cutoff. All other significant associations with a negative coefficient seem to be non-functional, because all the significant sites are located in mitochondria, while the HDACs are either nuclear or cytoplasmic. A number of positive associations can be explained by the high protein coexpression of a group of HATs and HDACs (Fig. A.5). At least the protein levels of HDAC1, HDAC2 and HDAC3 correlate with the expression of various HATs.

Similar to the HDAC plot, Fig. 3.2.2.4 shows the significant linear model coefficients for HATs.

[Figure 3.2.2.4: dot plot of linear model coefficients (y-axis: Linear model coef, from −0.5 to 1.0) for acetylation sites, grouped on the x-axis (Acetyltransferase) by enzyme: ATF2, TAF1, BRD1, BRD2, BRD3, BRD4, ELP3, HAT1, KAT5, KAT7, KAT8, EP300, KAT2A, KAT2B, KAT6B, CLOCK, NCOA3, CREBBP. Dot size: log10(FDR adjusted p-val). Labeled sites include PGAM1_K251, PGK1_K141, PGK1_K361, MDH1_K118, MDH1_K239, GAPDH_K251 and ACLY_K1080.]

Figure 3.2.2.4. Coefficients of fitted linear models for HATs [2]. Only coefficients with linear model coefficient FDR < 0.1 and p-value < 0.01 under random model are shown.

Again, various HATs display a strong correlation with the acetylation of cytoplasmic enzymes such as PGAM1, PGK1, GAPDH, G6PD, GPI, IDH1, Phosphofructokinase Liver Type (PFKL), MDH1 and ATP Citrate Lyase (ACLY). Interestingly, almost all Ac-sites significantly associated with HATs are not differentially acetylated in the basal subtype, and therefore HAT expression cannot explain the difference in acetylation between subtypes. However, for one protein, PGK1, differentially acetylated sites were found to be significantly associated with HATs. In PGK1, two of the five acetylated lysines found in the linear regression analysis (K15, K141, K146, K323, K361) are differentially acetylated across subtypes. Two Ac-sites, K141 and K361, are associated with the nuclear proteins TAF1 and KAT5, which might be explained by the fact that PGK1 has an alternative function in the nucleus, regulating DNA replication and repair (J. Wang et al. 2007). Similar to PGK1, three of the four significant MDH1 Ac-sites (K107, K118, K236, K239) are differentially acetylated in breast cancer subtypes. As a result, linear regression analysis can only partially explain the differential acetylation of metabolic enzymes in the basal subtype compared to LumA/B.

The amount of acetylated protein in a cell can be affected not only by enzyme concentration, but also by the concentration of acetyl-CoA, the essential substrate of the acetyltransferase reaction. Therefore, we searched the acetyl-CoA synthesis pathways for any alterations across subtypes (Fig. 3.2.2.5).

Figure 3.2.2.5. Schematic representation of acetyl-CoA synthesis pathways in the basal subtype. Blue arrows mean protein downregulation, pink arrows mean protein upregulation, yellow circles – elevated acetylation. Adapted from (Narita, Weinert, and Choudhary 2018). ACC1 - Acetyl-CoA Carboxylase Alpha, or ACACA; ACC2 - Acetyl-CoA Carboxylase Beta, or ACACB; ACLY - ATP Citrate Lyase; ACSS2 - Acyl-CoA Synthetase Short Chain Family Member 2; PDC – Pyruvate Dehydrogenase Complex.

There are two pathways for acetyl-CoA synthesis in the cytoplasm. The first involves the ACLY enzyme and citric acid; the second involves the ACSS2 enzyme and acetate. Both cytoplasmic acetyl-CoA synthesis pathways turned out to be downregulated in the basal subtype. The main acetyl-CoA synthesis enzyme, ACLY, is decreased compared to Her2 (t-test FDR < 0.01), LumB (t-test FDR < 0.08) and LumA (t-test FDR < 0.2). Similar to ACLY, ACSS2 is slightly diminished in basal compared to all subtypes (t-test FDR < 0.2), as is mitochondrial ACSS3 (t-test FDR < 0.05), but not ACSS1 (t-test FDR < 0.3) (Fig. 3.2.2.6A). In the cytoplasm, acetyl-CoA serves as a main source of carbon for fatty acid synthesis. However, in basal, fatty acid synthesis appears to be downregulated as well. Acetyl-CoA Carboxylase Alpha (ACACA) and Fatty Acid Synthase (FASN), catalyzing the first two steps of fatty acid synthesis in the cytoplasm, are downregulated at the protein level (ACACA t-test FDR < 0.03, FASN t-test FDR < 0.001) (Fig. 3.2.2.6B). The regulatory beta subunit of the Acetyl-CoA Carboxylase Complex (ACACB) is decreased only compared to the LumA subtype (t-test FDR < 0.01).

[Figure 3.2.2.6: box plots of protein level across subtypes (Basal, Her2, LumA, LumB, Normal_like). Panel A: ACLY, ACSS2 and ACSS3; ANOVA p-values printed on the panels: 0.0017, 3.9e−07 and 0.079. Panel B: FASN, ACACA and ACACB; ANOVA p-values printed on the panels: 0.00014, 0.02 and 1.4e−08.]

Figure 3.2.2.6. Expression of Acetyl-coA metabolism proteins. A. Protein expression of proteins involved in Acetyl-coA synthesis in cytoplasm. B. Protein expression of proteins involved in fatty acid synthesis from Acetyl-coA in cytoplasm.

Such a noticeable downregulation of acetyl-CoA synthesis and metabolism in the cytoplasm can be an indicator of a low concentration of acetyl-CoA in this compartment, which can in turn explain the reduced acetylation of many Ac-sites in the cytoplasm. It has been reported that the expression of ACLY can predict the level of histone H2B, H3 and H4 acetylation (Wellen et al. 2009; Carrer et al. 2017), indicating that the expression level of this protein mirrors the concentration of its reaction product, Ac-CoA, which in turn affects the acetylation of other proteins. In mitochondria and the nucleus, another source of acetyl-CoA is active: PDC, which catalyzes pyruvate decarboxylation to Ac-CoA. All subunits of PDC have similar expression levels across subtypes, except for DLAT, which is significantly upregulated in basal (t-test FDR < 0.001 compared to LumB, FDR < 0.15 compared to LumA). Another subunit of PDC, PDHB, was found to be differentially acetylated in the basal subtype relative to LumA/B (Table 3.2.2.2), but the function of these sites is unknown. Hence, the local concentration of Ac-CoA in mitochondria and the nucleus can be maintained by PDC in the Basal subtype.

Taken together, we identified possible reasons for such large differences in the acetylation of metabolic enzymes between subtypes. The hypoacetylated state of cytoplasmic proteins in the basal subtype can be partially explained by the expression of HATs and is supported by a possibly decreased level of cytoplasmic acetyl-CoA. Hyperacetylation of mitochondrial enzymes in basal is probably due to downregulation of the mitochondrial SIRT3 deacetylase.

3.2.3 Functional role of differentially acetylated

In the previous sections, the differences in protein levels and protein acetylation levels between subtypes have been described. While upregulated expression of a protein can almost always be treated as an increased rate of the reaction catalyzed by this protein, post-translational modifications can be either activating or inhibiting. Hence, the functional role of these sites needs to be identified from the literature.

The first glycolysis enzyme found with differentially acetylated sites is GPI. An effect of acetylation on its activity has not been reported so far. However, six of the fourteen detected acetylation sites are differentially acetylated, and in three-dimensional space they are located together on one side of the protein surface that does not interact with other subunits of GPI. Four of them (K252, K447, K454, K524) frame a site believed to be important for F-6P binding (Cordeiro et al. 2004; Ji Hyun Lee et al. 2001). I hypothesize that acetylation of the lysines surrounding this site makes it more hydrophobic and less favorable for binding of the hydrophilic substrate. Thus, low acetylation may maintain high enzymatic activity.

Figure 3.2.3.1. Localization of GPI differentially acetylated sites relative to its active center. Differentially acetylated sites are marked in pink, not differentially acetylated sites are marked in grey. K519 residue involved in catalysis reported by (Ji Hyun Lee et al. 2001) is marked in magenta. Catalytically important sites reported by (Cordeiro et al. 2004) are colored in white.

Similarly, PGK1 has two differentially acetylated sites (K216 and K220) located in close proximity to the substrate. The acetylation of K220 was determined to decrease enzymatic activity by disrupting the binding of ADP (S. Wang et al. 2015). The 3D conformation of the native PGK1 active site can be seen in Fig. 3.2.3.2.

[Figure 3.2.3.2: structure of the PGK1 active site with K216, K220, 3-PG and ADP labeled.]

Figure 3.2.3.2. Localization of PGK1 differentially acetylated sites relative to its active center. Differentially acetylated sites are marked in cyan. Substrates 3-PG and ADP are colored in white.

Acetylated sites can interfere not only with substrate binding, but also with the binding of an allosteric regulator, as happens in PKM. PKM is allosterically activated by fructose-1,6-bisphosphate (FBP) (Lv et al. 2013); K433 acetylation prevents FBP from binding to PKM and therefore reduces the activity of the enzyme. Moreover, acetylation of K433 promotes the nuclear localization of PKM and triggers its protein kinase activity, as opposed to its canonical pyruvate kinase activity in glycolysis (Lv et al. 2013). The K433 site is less acetylated in the basal subtype, allowing the allosteric regulator to bind and keeping the glycolytic function of PKM at the required level.

To sum up, acetylation generally inhibits glycolytic enzymes in different ways. Given the diminished acetylation of Ac-sites in the basal subtype, we can conclude that acetylation agrees with protein expression in maintaining glycolytic flux at a high level.

TCA cycle enzymes appear to be regulated by acetylation in a similar fashion. Almost all TCA cycle enzymes are subject to deacetylation by SIRT3 (Sol et al. 2012). For instance, deacetylation of four lysines in the SDH complex by SIRT3 was shown to increase its activity (Cimen et al. 2010). Two of these four regulated lysines are differentially acetylated in breast cancer subtypes. Elevated acetylation of K179 and K538 of SDHA in the basal subtype suggests that the function of this complex is decreased. Since SDH is the only enzyme acting as part of both the TCA cycle and the respiratory chain, we can suppose that the respiratory chain is also affected by K179 and K538 acetylation. One possible explanation of why K179 acetylation can inhibit the activity of SDHA was proposed by Peter Chhoy (Chhoy et al. 2016). He postulates that acetylation of K179 may prevent the binding of the SDHE subunit of the SDH complex, which is necessary for loading FAD into the SDHA subunit.

Another enzyme, MDH2, was shown to have Ac-sites regulated by SIRT3 as well. One study has shown that increased acetylation of K185, K301, K307 and K314 upon inhibition of SIRT3 leads to increased activity of MDH2 towards malate formation (Zhao et al. 2010). On the other hand, another study has demonstrated that upon SIRT3 inhibition and calorie restriction, K239 is significantly hyperacetylated, leading to decreased enzymatic activity of MDH2 (Hebert et al. 2013). Such a change in activity can be explained by the localization of the K239 residue in the MDH2 complex: it is located near the NAD+ binding site and may impair NAD+ binding (Figure 3.2.3.3). Interestingly, the four residues reported by Zhao et al. are located together on the outer surface of the MDH2 complex.

[Figure 3.2.3.3: structure of the MDH2 complex with K185, K239, K297, K301, K307, K314, K335 and NAD+ labeled.]

Figure 3.2.3.3. Localization of MDH2 differentially acetylated sites. Sites with acetylation upregulated in basal are marked in pink. Sites with acetylation downregulated in basal are marked in blue. Sites that are not differentially acetylated but were previously reported as activating are in cyan. Substrate NAD+ is colored in magenta.

Interestingly, K239 was found to be hyperacetylated in the basal subtype, but the other four lysines were not. However, it is essential to note that under normal conditions in mitochondria, at a low NADH level, MDH2 catalyzes the oxidation of malate to oxaloacetate, even though the reverse reaction is thermodynamically more favorable. In in vitro studies of MDH2 activity the rate of the reverse reaction is always measured; therefore, it is hard to determine the exact effect of acetylation on the malate oxidation reaction happening in mitochondria. But if acetylation has the same effect on both the forward and reverse reactions and K239 acetylation is truly inhibiting, then we can say that MDH2 is less active in the basal subtype than in the others.
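The thermodynamic point about the conversion of malate to oxaloacetate can be made explicit. The worked estimate below uses the textbook standard free-energy change of about +29.7 kJ/mol for malate + NAD+ → oxaloacetate + NADH + H+ at pH 7, a value assumed here and not taken from this thesis, together with T = 310 K.

```latex
\Delta G = \Delta G^{\circ\prime} + RT \ln Q,
\qquad
Q = \frac{[\text{oxaloacetate}]\,[\text{NADH}]}{[\text{malate}]\,[\text{NAD}^{+}]}

\Delta G < 0
\iff
Q < \exp\!\left(-\frac{\Delta G^{\circ\prime}}{RT}\right)
  = \exp\!\left(-\frac{29.7\ \text{kJ/mol}}{8.314\times10^{-3}\ \text{kJ/(mol K)}\times 310\ \text{K}}\right)
  \approx 1\times10^{-5}
```

Under these assumptions, the conversion of malate to oxaloacetate proceeds only while the product of the oxaloacetate/malate and NADH/NAD+ ratios stays below roughly 10^-5, which is consistent with the requirement of a low NADH level.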

IDH2 has also been reported as a SIRT3 substrate. The specific K413 position controlled by the SIRT3 level was stated to be important for the IDH2 dimerization required for proper functioning (Zou et al. 2017). In another study, acetylation-mimicking mutagenesis demonstrated that acetylation of K180, K251, K256, K272, K275 and K413 decreases the activity of IDH2 (Xu et al. 2017). Among these sites, only K272 was found to be differentially acetylated in the current dataset, and not K180, K256, K275 or K413 (site K251 is not detected). This fact suggests that IDH2 activity may not be decreased in the basal subtype compared to others. However, there is one site, K193, which was not tested in previous studies and is located in the clasp domain of IDH2 that is involved in protein dimerization and possible tetramerization (Fig. 3.2.3.4). The K193 site lies in a highly hydrophilic region of the clasp domain, and acetylation might disrupt its structure since it makes the lysine more hydrophobic. Therefore, acetylation of the K193 residue may interfere with the IDH2 dimer formation that is critical for its proper functioning. If so, then the basal subtype has decreased IDH2 activity compared to other subtypes.


Figure 3.2.3.4. Localization of IDH2 differentially acetylated sites. Differentially acetylated sites are marked in pink. The protein is colored by hydrophobicity: red is the most hydrophilic, white the most hydrophobic. Substrate NADP+ is colored in green.

Finally, aconitase (ACO2) has also been shown to be a SIRT3 substrate, along with other TCA cycle enzymes. However, the effect of its acetylation has been poorly investigated. One study demonstrates that upon treatment with an acetylating agent, the activity of ACO2 increases at low concentrations of the agent and plummets at high concentrations (Fernandes et al. 2015). This biphasic behavior indicates that either the degree of acetylation is important, or that acetylation of the sites essential for catalysis is highly unfavorable at low concentrations of acetic anhydride. Unfortunately, all four differentially acetylated sites are located on the outer surface of ACO2, relatively far from the Fe-S cluster and the substrate binding site, so it is hard to predict whether acetylation of those sites influences the activity.

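To sum up, we found that the activity of glycolytic and TCA cycle enzymes is usually diminished upon acetylation. Since cytoplasmic glycolytic enzymes are hypoacetylated and mitochondrial TCA cycle enzymes are hyperacetylated in the basal subtype, the basal breast cancer subtype can be characterized by an increased level of glycolysis and a decreased level of TCA cycle activity.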


4 Conclusions

In this work we explored the associations of histone and non-histone protein acetylation with breast cancer subtyping. In the luminal A and B subtypes, acetylation of N-terminal sites of histone H2B significantly correlates with the expression level of the luminal-specific transcription factors FOXA1 and GATA3. Moreover, the correlation is higher with protein expression than with mRNA expression, suggesting that these factors might be part of complexes interacting with acetylated H2B sites. In addition, H2B acetylation correlates with the protein expression of a number of tumor suppressor genes, indicating that H2B acetylation may be a new prognostic factor in luminal breast cancer.

Acetylation of non-histone proteins can additionally characterize the metabolism of the basal subtype. We showed that, in addition to upregulated expression of glycolysis genes, the basal subtype maintains a lowered acetylation level of cytoplasmic enzymes but elevated acetylation of mitochondrial enzymes, which together help the cancer favor glycolysis over the TCA cycle. Such differences in acetylation can be accounted for by differential expression of mitochondrial HDACs and downregulated Ac-CoA synthesis pathways in the basal subtype compared to LumA/B. As a result, hypoacetylated cytoplasmic enzymes function more efficiently, while the activity of hyperacetylated mitochondrial enzymes appears to be decreased, supporting the shift toward aerobic glycolytic metabolism in the basal subtype of breast cancer that is established by gene expression.


Appendix A

Figure A.1. Gene and protein expression of luminal specific DE genes. Significance level of Wilcoxon test is shown. Panels: mRNA level and protein level; x-axis: gene (ESR1, GPS2, TBX3, ARID2, FOXA1, GATA3, TRAF5, ARID1A, MED12, MAP3K1); groups: H2B status (high_H2B_acet, low_H2B_acet).
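The two-group comparison summarized in Figure A.1 can be sketched as a rank-sum (Mann-Whitney) test between samples with high and low H2B acetylation. This is a minimal illustration and not the code used for the figure: the expression values are hypothetical, and the p-value uses a normal approximation without tie or continuity correction, whereas statistical packages typically apply an exact test for small groups.

```python
import math


def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: each pair with a > b counts 1, each tie 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def rank_sum_p_value(group_a, group_b):
    """Two-sided p-value from the normal approximation of U.

    Simplification: no tie correction and no continuity correction.
    """
    n_a, n_b = len(group_a), len(group_b)
    u = mann_whitney_u(group_a, group_b)
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical protein levels of one gene in the two H2B acetylation groups.
high_acet = [2.1, 2.5, 3.0, 3.2, 2.8]
low_acet = [1.0, 1.4, 0.9, 1.8, 1.2]

p_value = rank_sum_p_value(high_acet, low_acet)
print(f"U = {mann_whitney_u(high_acet, low_acet)}, p = {p_value:.4f}")
```

Because the test only uses the ordering of the values, it does not depend on whether expression is given on a raw or log scale.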

Figure A.2. Spearman’s correlation between H2B N-terminal Ac-sites and luminal specific DE genes across all subtypes. A. GATA3, mutations are labeled: M – missense mutation, F – frameshift insertion/deletion mutation, S – splice site mutation. B. FOXA1. C. ESR1. mRNA level corresponds to log2(FPKM + 0.01), protein level corresponds to normalized relative protein abundance. Each panel plots acetylation level (y-axis) against mRNA or protein level (x-axis) for the sites H2B_K5, H2B_K11, H2B_K11_K12, H2B_K15_K16 and H2B_K20_K23, with points colored by PAM50 subtype (Basal, Her2, LumA, LumB, Normal_like). Correlations annotated on the panels, in this site order:

A. GATA3 mRNA level: R = 0.39, p = 2.1e−05; R = 0.41, p = 4.6e−06; R = 0.34, p = 2e−04; R = 0.31, p = 7e−04; R = 0.44, p = 1.1e−06.

A. GATA3 protein level: R = 0.37, p = 4.1e−05; R = 0.48, p = 4.5e−08; R = 0.36, p = 6.1e−05; R = 0.34, p = 0.00019; R = 0.5, p = 7.5e−09.

B. FOXA1 mRNA level: R = 0.21, p = 0.025; R = 0.23, p = 0.014; R = 0.21, p = 0.023; R = 0.23, p = 0.015; R = 0.34, p = 0.00021.

B. FOXA1 protein level: R = 0.31, p = 0.00077; R = 0.44, p = 9.2e−07; R = 0.33, p = 0.00027; R = 0.27, p = 0.0039; R = 0.45, p = 3.3e−07.

C. ESR1 mRNA level: R = 0.28, p = 0.0021; R = 0.34, p = 0.00021; R = 0.3, p = 0.0013; R = 0.24, p = 0.0087; R = 0.36, p = 6.5e−05.

C. ESR1 protein level: R = 0.27, p = 0.0036; R = 0.25, p = 0.0058; R = 0.16, p = 0.096; R = 0.082, p = 0.38; R = 0.23, p = 0.012.

Figure A.3. Spearman’s correlation between H2B interacting bromodomain containing proteins and luminal specific DE genes across luminal subtypes. A. GATA3. B. FOXA1. C. ESR1. D. H2B K11 acetylation. Protein level corresponds to normalized relative protein abundance. Each panel plots the protein level of BRD2, BRD3, BRD4, CREBBP and EP300 (y-axis) against the variable named in the panel label (x-axis), with points colored by PAM50 subtype (LumA, LumB). Correlations annotated on the panels, in this protein order:

A. GATA3 protein level: R = 0.55, p = 5.4e−07; R = 0.4, p = 0.00064; R = 0.49, p = 1.4e−05; R = 0.25, p = 0.038; R = 0.33, p = 0.0052.

B. FOXA1 protein level: R = 0.44, p = 0.00012; R = 0.23, p = 0.052; R = 0.36, p = 0.0018; R = 0.11, p = 0.37; R = 0.37, p = 0.0015.

C. ESR1 protein level: R = 0.29, p = 0.015; R = −0.11, p = 0.34; R = 0.013, p = 0.92; R = −0.0054, p = 0.96; R = 0.23, p = 0.049.

D. H2B K11 acetylation level: R = 0.27, p = 0.023; R = 0.48, p = 2.3e−05; R = 0.48, p = 2e−05; R = 0.44, p = 0.00011; R = 0.51, p = 6.9e−06.
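The Spearman correlations reported in Figures A.2 and A.3 can be sketched as a Pearson correlation computed on ranks. The example below is illustrative only and is not the code used for the figures: the FPKM and acetylation values are hypothetical, and the function names are our own. The only element taken from the captions is the log2(FPKM + 0.01) transformation of mRNA levels.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def log2_fpkm(fpkm, pseudocount=0.01):
    """mRNA level as defined in the caption: log2(FPKM + 0.01)."""
    return math.log2(fpkm + pseudocount)


# Hypothetical per-sample values (not data from this thesis).
gata3_fpkm = [0.0, 3.5, 12.0, 48.0, 150.0, 310.0]
h2b_k11_acetylation = [-1.8, -0.9, -1.1, 0.2, 0.6, 1.4]

gata3_mrna = [log2_fpkm(v) for v in gata3_fpkm]
rho = spearman(gata3_mrna, h2b_k11_acetylation)
print(f"Spearman rho = {rho:.3f}")
```

Since the log transformation is monotonic, it leaves the ranks and therefore the Spearman coefficient unchanged; the pseudocount only ensures that samples with zero FPKM remain defined on the log scale.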

Figure A.4. SLC2A1 gene amplification, RNA level and protein (GLUT1) level in breast cancer subtypes. Copy number level is defined as log2(copy number tumor / copy number normal). The left plot has a dashed line marking 0.5, a cutoff for a gene to have one additional copy. FPKM is Fragments Per Kilobase of transcript per Million mapped reads. Panels: SLC2A1 copy number level, SLC2A1 FPKM and SLC2A1 protein level by PAM50 subtype (Basal, Her2, LumA, LumB, Normal_like). ANOVA p-values annotated on the panels: 0.00047, 2.2e−07 and 1.9e−11.
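The two quantities defined in this caption can be written out as a short worked example. This is a sketch under stated assumptions rather than the pipeline used in this work: the function names are hypothetical, a diploid normal copy number of 2 is assumed, a strict "greater than" comparison is assumed for the 0.5 cutoff, and FPKM follows its standard definition.

```python
import math


def log2_copy_ratio(tumor_copies, normal_copies=2.0):
    """Copy number level: log2(copy number tumor / copy number normal)."""
    return math.log2(tumor_copies / normal_copies)


def has_extra_copy(log2_ratio, cutoff=0.5):
    """Assumed rule: a gene is called gained when its level exceeds the cutoff."""
    return log2_ratio > cutoff


def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# One additional copy on a diploid background: 3 copies versus 2.
gain = log2_copy_ratio(3)
print(f"log2(3/2) = {gain:.3f}, gained: {has_extra_copy(gain)}")

# Hypothetical counts: 500 fragments on a 2 kb transcript, 25 million mapped.
print(f"FPKM = {fpkm(500, 2000, 25_000_000):.1f}")
```

In a pure diploid sample one extra copy gives log2(3/2) ≈ 0.585, so a cutoff of 0.5 sits just below that value.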

Figure A.5. LDHB gene amplification, RNA level and protein level in breast cancer subtypes. Copy number level is defined as log2(copy number tumor / copy number normal). The left plot has a dashed line marking 0.5, a cutoff for a gene to have one additional copy. FPKM is Fragments Per Kilobase of transcript per Million mapped reads. Panels: LDHB copy number level, LDHB FPKM and LDHB protein level by PAM50 subtype (Basal, Her2, LumA, LumB, Normal_like). ANOVA p-values annotated on the panels: 0.0065, 3.2e−16 and 2.5e−10.

Figure A.6. Self-correlation of protein expression of HATs and HDACs. Heatmap rows and columns, in order: HDAC8, SIRT2, SIRT6, NCOA1, HDAC11, SIRT3, SIRT5, HDAC10, HDAC6, KAT2A, KAT6A, ATF2, SIRT7, HDAC4, KAT5, CLOCK, HDAC5, HDAC7, KAT2B, KAT6B, HAT1, HDAC2, NCOA3, BRD1, EP300, KAT7, BRD2, SIRT1, BRD4, HDAC1, KAT8, ELP3, HDAC3, BRD3, CREBBP. The color scale runs from −1 to 1.


References

Bailey, Matthew H, Collin Tokheim, Eduard Porta-Pardo, Sohini Sengupta, Denis Bertrand, Amila Weerasinghe, Antonio Colaprico, et al. 2018. “Comprehensive Characterization of Cancer Driver Genes and Mutations.” Cell 173 (2): 371–385.e18. https://doi.org/10.1016/j.cell.2018.02.060.

Barber, Matthew F, Eriko Michishita-Kioi, Yuanxin Xi, Luisa Tasselli, Mitomu Kioi, Zarmik Moqtaderi, Ruth I Tennen, et al. 2012. “SIRT7 Links H3K18 Deacetylation to Maintenance of Oncogenic Transformation.” Nature 487 (7405): 114–18. https://doi.org/10.1038/nature11043.

Barneda-Zahonero, Bruna, and Maribel Parra. 2012. “Histone Deacetylases and Cancer.” Molecular Oncology 6 (6): 579–89. https://doi.org/10.1016/J.MOLONC.2012.07.003.

Blosser, Timothy R., Janet G. Yang, Michael D. Stone, Geeta J. Narlikar, and Xiaowei Zhuang. 2009. “Dynamics of Nucleosome Remodelling by Individual ACF Complexes.” Nature 462 (7276): 1022–27. https://doi.org/10.1038/nature08627.

Carrer, Alessandro, Joshua L D Parris, Sophie Trefely, Ryan A Henry, David C Montgomery, AnnMarie Torres, John M Viola, et al. 2017. “Impact of a High-Fat Diet on Tissue Acyl-CoA and Histone Acetylation Levels.” The Journal of Biological Chemistry 292 (8): 3312–22. https://doi.org/10.1074/jbc.M116.750620.

Cerbo, V. Di, and R. Schneider. 2013. “Cancers with Wrong HATs: The Impact of Acetylation.” Briefings in Functional Genomics 12 (3): 231–43. https://doi.org/10.1093/bfgp/els065.

Chhoy, Peter, Kristin A. Anderson, Kathleen A. Hershberger, Frank K. Huynh, Angelical S. Martin, Eoin McDonnell, Brett S. Peterson, et al. 2016. “Deacetylation by SIRT3 Relieves Inhibition of Mitochondrial Protein Function.” In Sirtuins, 105–38. Dordrecht: Springer Netherlands. https://doi.org/10.1007/978-94-024-0962-8_5.

Choudhary, C., C. Kumar, F. Gnad, M. L. Nielsen, M. Rehman, T. C. Walther, J. V. Olsen, and M. Mann. 2009. “Lysine Acetylation Targets Protein Complexes and Co-Regulates Major Cellular Functions.” Science 325 (5942): 834–40. https://doi.org/10.1126/science.1175371.

Cimen, Huseyin, Min-Joon Han, Yongjie Yang, Qiang Tong, Hasan Koc, and Emine C Koc. 2010. “Regulation of Succinate Dehydrogenase Activity by SIRT3 in Mammalian Mitochondria.” Biochemistry 49 (2): 304–11. https://doi.org/10.1021/bi901627u.

Cordeiro, Artur T., Paul A. M. Michels, Luiz F. Delboni, and Otávio H. Thiemann. 2004. “The Crystal Structure of Glucose-6-Phosphate Isomerase from Leishmania Mexicana Reveals Novel Active Site Features.” European Journal of Biochemistry 271 (13): 2765–72. https://doi.org/10.1111/j.1432-1033.2004.04205.x.

Creyghton, Menno P, Albert W Cheng, G Grant Welstead, Tristan Kooistra, Bryce W Carey, Eveline J Steine, Jacob Hanna, et al. 2010. “Histone H3K27ac Separates Active from Poised Enhancers and Predicts Developmental State.” Proceedings of the National Academy of Sciences of the United States of America 107 (50): 21931–36. https://doi.org/10.1073/pnas.1016071107.

Debes, Jose D, Thomas J Sebo, Christine M Lohse, Linda M Murphy, De Anna L Haugen, and Donald J Tindall. 2003. “P300 in Prostate Cancer Proliferation and Progression.” Cancer Research 63 (22): 7638–40.

Doll, Sophia, and Alma L Burlingame. 2015. “Mass Spectrometry-Based Detection and Assignment of Protein Posttranslational Modifications.” ACS Chemical Biology 10 (1): 63–71. https://doi.org/10.1021/cb500904b.

Drazic, Adrian, Line M. Myklebust, Rasmus Ree, and Thomas Arnesen. 2016. “The World of Protein Acetylation.” Biochimica et Biophysica Acta (BBA) - Proteins and Proteomics 1864 (10): 1372–1401. https://doi.org/10.1016/J.BBAPAP.2016.06.007.

Fan, Jun, Changliang Shan, Hee-Bum Kang, Shannon Elf, Jianxin Xie, Meghan Tucker, Ting-Lei Gu, et al. 2014. “Tyr Phosphorylation of PDP1 Toggles Recruitment between ACAT1 and SIRT3 to Regulate the Pyruvate Dehydrogenase Complex.” Molecular Cell 53 (4): 534–48. https://doi.org/10.1016/J.MOLCEL.2013.12.026.

Fernandes, Jolyn, Alexis Weddle, Caroline S Kinter, Kenneth M Humphries, Timothy Mather, Luke I Szweda, and Michael Kinter. 2015. “Lysine Acetylation Activates Mitochondrial Aconitase in the Heart.” Biochemistry 54 (25): 4008–18. https://doi.org/10.1021/acs.biochem.5b00375.

Gates, Leah A, Jiejun Shi, Aarti D Rohira, Qin Feng, Bokai Zhu, Mark T Bedford, Cari A Sagum, et al. 2017. “Acetylation on Histone H3 Lysine 9 Mediates a Switch from Transcription Initiation to Elongation.” The Journal of Biological Chemistry 292 (35): 14456–72. https://doi.org/10.1074/jbc.M117.802074.

Gayther, S A, S J Batley, L Linger, A Bannister, K Thorpe, S F Chin, Y Daigo, et al. 2000. “Mutations Truncating the EP300 Acetylase in Human Cancers.” Nature Genetics 24 (3): 300–303. https://doi.org/10.1038/73536.

Gui, Yaoting, Guangwu Guo, Yi Huang, Xueda Hu, Aifa Tang, Shengjie Gao, Renhua Wu, et al. 2011. “Frequent Mutations of Chromatin Remodeling Genes in Transitional Cell Carcinoma of the Bladder.” Nature Genetics 43 (9): 875–78. https://doi.org/10.1038/ng.907.

Harms, Kelly Lynn, and Xinbin Chen. 2007. “Histone Deacetylase 2 Modulates P53 Transcriptional Activities through Regulation of P53-DNA Binding Activity.” Cancer Research 67 (7): 3145–52. https://doi.org/10.1158/0008-5472.CAN-06-4397.

Hebert, Alexander S, Kristin E Dittenhafer-Reed, Wei Yu, Derek J Bailey, Ebru Selin Selen, Melissa D Boersma, Joshua J Carson, et al. 2013. “Calorie Restriction and SIRT3 Trigger Global Reprogramming of the Mitochondrial Protein Acetylome.” Molecular Cell 49 (1): 186–99. https://doi.org/10.1016/j.molcel.2012.10.024.

Heiden, Matthew G Vander, Lewis C Cantley, and Craig B Thompson. 2009. “Understanding the Warburg Effect: The Metabolic Requirements of Cell Proliferation.” Science (New York, N.Y.) 324 (5930): 1029–33. https://doi.org/10.1126/science.1160809.

Ito, T, T Ikehara, T Nakagawa, W L Kraus, and M Muramatsu. 2000. “P300-Mediated Acetylation Facilitates the Transfer of Histone H2A-H2B Dimers from Nucleosomes to a Histone Chaperone.” Genes & Development 14 (15): 1899–1907. https://doi.org/10.1101/GAD.14.15.1899.

Iyer, Narayanan Gopalakrishna, Hilal Özdag, and Carlos Caldas. 2004. “P300/CBP and Cancer.” Oncogene 23 (24): 4225–31. https://doi.org/10.1038/sj.onc.1207118.

Ji Hyun Lee, K. Z. Chang, V. Patel, and C. J. Jeffery. 2001. “Crystal Structure of Rabbit Phosphoglucose Isomerase Complexed with Its Substrate D-Fructose 6-Phosphate.” Biochemistry 40 (26): 7799–7805. https://doi.org/10.1021/bi002916o.

Karmodiya, Krishanpal, Arnaud R Krebs, Mustapha Oulad-Abdelghani, Hiroshi Kimura, and Laszlo Tora. 2012. “H3K9 and H3K14 Acetylation Co-Occur at Many Gene Regulatory Elements, While H3K14ac Marks a Subset of Inactive Inducible Promoters in Mouse Embryonic Stem Cells.” BMC Genomics 13 (1): 424. https://doi.org/10.1186/1471-2164-13-424.

Khoury, George A, Richard C Baliban, and Christodoulos A Floudas. 2011. “Proteome-Wide Post-Translational Modification Statistics: Frequency Analysis and Curation of the Swiss-Prot Database.” Scientific Reports 1 (September). https://doi.org/10.1038/srep00090.

Krusche, Claudia A, Pia Wülfing, Christian Kersting, Anne Vloet, Werner Böcker, Ludwig Kiesel, Henning M Beier, and Joachim Alfer. 2005. “Histone Deacetylase-1 and -3 Protein Expression in Human Breast Cancer: A Tissue Microarray Analysis.” Breast Cancer Research and Treatment 90 (1): 15–23. https://doi.org/10.1007/s10549-004-1668-2.

Kung, A L, V I Rebel, R T Bronson, L E Ch’ng, C A Sieff, D M Livingston, and T P Yao. 2000. “Gene Dose-Dependent Control of Hematopoiesis and Hematologic Tumor Suppression by CBP.” Genes & Development 14 (3): 272–77. https://doi.org/10.1101/GAD.14.3.272.

Lanning, Nathan J., Joshua P. Castle, Simar J. Singh, Andre N. Leon, Elizabeth A. Tovar, Amandeep Sanghera, Jeffrey P. MacKeigan, Fabian V. Filipp, and Carrie R. Graveel. 2017. “Metabolic Profiling of Triple-Negative Breast Cancer Cells Reveals Metabolic Vulnerabilities.” Cancer & Metabolism 5 (1): 6. https://doi.org/10.1186/s40170-017-0168-x.

Lee, Kenneth K., and Jerry L. Workman. 2007. “Histone Acetyltransferase Complexes: One Size Doesn’t Fit All.” Nature Reviews Molecular Cell Biology 8 (4): 284–95. https://doi.org/10.1038/nrm2145.

Liberzon, Arthur, Chet Birger, Helga Thorvaldsdóttir, Mahmoud Ghandi, Jill P. Mesirov, and Pablo Tamayo. 2015. “The Molecular Signatures Database Hallmark Gene Set Collection.” Cell Systems 1 (6): 417–25. https://doi.org/10.1016/J.CELS.2015.12.004.

Lv, Lei, Yan-Ping Xu, Di Zhao, Fu-Long Li, Wei Wang, Naoya Sasaki, Ying Jiang, et al. 2013. “Mitogenic and Oncogenic Stimulation of K433 Acetylation Promotes PKM2 Activity and Nuclear Localization.” Molecular Cell 52 (3): 340–52. https://doi.org/10.1016/j.molcel.2013.09.004.

Malik, Simeen, Shiming Jiang, Jason P Garee, Eric Verdin, Adrian V Lee, Bert W O’Malley, Mao Zhang, Narasimhaswamy S Belaguli, and Steffi Oesterreich. 2010. “Histone Deacetylase 7 and FoxA1 in Estrogen-Mediated Repression of RPRM.” Molecular and Cellular Biology 30 (2): 399–412. https://doi.org/10.1128/MCB.00907-09.

Mertins, Philipp, Lauren C Tang, Karsten Krug, David J Clark, Marina A Gritsenko, Lijun Chen, Karl R Clauser, et al. 2018. “Reproducible Workflow for Multiplexed Deep-Scale Proteome and Phosphoproteome Analysis of Tumor Tissues by Liquid Chromatography-Mass Spectrometry.” Nature Protocols 13 (7): 1632–61. https://doi.org/10.1038/s41596-018-0006-9.

Morin, Ryan D., Maria Mendez-Lago, Andrew J. Mungall, Rodrigo Goya, Karen L. Mungall, Richard D. Corbett, Nathalie A. Johnson, et al. 2011. “Frequent Mutation of Histone-Modifying Genes in Non-Hodgkin Lymphoma.” Nature 476 (7360): 298–303. https://doi.org/10.1038/nature10351.

Morris, Stephanie A., Bhargavi Rao, Benjamin A. Garcia, Sandra B. Hake, Robert L. Diaz, Jeffrey Shabanowitz, Donald F. Hunt, C. David Allis, Jason D. Lieb, and Brian D. Strahl. 2007. “Identification of Histone H3 Lysine 36 Acetylation as a Highly Conserved Histone Modification.” Journal of Biological Chemistry 282 (10): 7632–40. https://doi.org/10.1074/jbc.M607909200.

Mueller, S., A. Wahlander, N. Selevsek, C. Otto, E. M. Ngwa, K. Poljak, A. D. Frey, M. Aebi, and R. Gauss. 2015. “Protein Degradation Corrects for Imbalanced Subunit Stoichiometry in OST Complex Assembly.” Molecular Biology of the Cell 26 (14): 2596–2608. https://doi.org/10.1091/mbc.E15-03-0168.

Mullighan, Charles G., Jinghui Zhang, Lawryn H. Kasper, Stephanie Lerach, Debbie Payne-Turner, Letha A. Phillips, Sue L. Heatley, et al. 2011. “CREBBP Mutations in Relapsed Acute Lymphoblastic Leukaemia.” Nature 471 (7337): 235–39. https://doi.org/10.1038/nature09727.
Muraoka, M, M Konishi, R Kikuchi-Yanoshita, K Tanaka, N Shitara, J M Chong, T Iwama, and M Miyaki. 1996. “P300 Gene Alterations in Colorectal and Gastric Carcinomas.” Oncogene 12 (7): 1565–69.

Narita, Takeo, Brian T. Weinert, and Chunaram Choudhary. 2018. “Functions and Mechanisms of Non-Histone Protein Acetylation.” Nature Reviews Molecular Cell Biology. https://doi.org/10.1038/s41580-018-0081-3.

Ozden, Ozkan, Seong-Hoon Park, Brett A Wagner, Ha Yong Song, Yueming Zhu, Athanassios Vassilopoulos, Barbara Jung, Garry R Buettner, and David Gius. 2014. “SIRT3 Deacetylates and Increases Pyruvate Dehydrogenase Activity in Cancer Cells.” Free Radical Biology & Medicine 76 (November): 163–72. https://doi.org/10.1016/j.freeradbiomed.2014.08.001.

Park, Soon Young, Ji Ae Jun, Kang Jin Jeong, Hoi Jeong Heo, Jang Sihn Sohn, Hoi Young Lee, Chang Gyo Park, and Jaeku Kang. 2011. “Histone Deacetylases 1, 6 and 8 Are Critical for Invasion in Breast Cancer.” Oncology Reports 25 (6): 1677–81. https://doi.org/10.3892/or.2011.1236.

Parra, Maribel, and Eric Verdin. 2010. “Regulatory Signal Transduction Pathways for Class IIa Histone Deacetylases.” Current Opinion in Pharmacology 10 (4): 454–60. https://doi.org/10.1016/J.COPH.2010.04.004.

Parra, Michael A, David Kerr, Deirdre Fahy, Derek J Pouchnik, and John J Wyrick. 2006. “Deciphering the Roles of the Histone H2B N-Terminal Domain in Genome-Wide Transcription.” Molecular and Cellular Biology 26 (10): 3842–52. https://doi.org/10.1128/MCB.26.10.3842-3852.2006.

Pasqualucci, Laura, David Dominguez-Sola, Annalisa Chiarenza, Giulia Fabbri, Adina Grunn, Vladimir Trifonov, Lawryn H. Kasper, et al. 2011. “Inactivating Mutations of Acetyltransferase Genes in B-Cell Lymphoma.” Nature 471 (7337): 189–95. https://doi.org/10.1038/nature09730.

Pradeepa, Madapura M. 2017. “Causal Role of Histone Acetylations in Enhancer Function.” Transcription 8 (1): 40. https://doi.org/10.1080/21541264.2016.1253529.

Reichert, Nina, Mohamed-Amin Choukrallah, and Patrick Matthias. 2012. “Multiple Roles of Class I HDACs in Proliferation, Differentiation, and Development.” Cellular and Molecular Life Sciences 69 (13): 2173–87. https://doi.org/10.1007/s00018-012-0921-9.

Scalise, Mariafrancesca, Lorena Pochini, Michele Galluccio, Lara Console, and Cesare Indiveri. 2017. “Glutamine Transport and Mitochondrial Metabolism in Cancer Cell Growth.” Frontiers in Oncology 7: 306. https://doi.org/10.3389/fonc.2017.00306.

Seligson, David B., Steve Horvath, Matthew A. McBrian, Vei Mah, Hong Yu, Sheila Tze, Qun Wang, David Chia, Lee Goodglick, and Siavash K. Kurdistani. 2009. “Global Levels of Histone Modifications Predict Prognosis in Different Cancers.” The American Journal of Pathology 174 (5): 1619–28. https://doi.org/10.2353/AJPATH.2009.080874.

Senese, Silvia, Katrin Zaragoza, Simone Minardi, Ivan Muradore, Simona Ronzoni, Alfonso Passafaro, Loris Bernard, et al. 2007. “Role for Histone Deacetylase 1 in Human Tumor Cell Proliferation.” Molecular and Cellular Biology 27 (13): 4784–95. https://doi.org/10.1128/MCB.00494-07.

Shahbazian, Mona D, and Michael Grunstein. 2007. “Functions of Site-Specific Histone Acetylation and Deacetylation.” https://doi.org/10.1146/annurev.biochem.76.052705.162114.

Shou, Jiafeng, Yucheng Lai, Jinming Xu, and Jian Huang. 2016. “Prognostic Value of FOXA1 in Breast Cancer: A Systematic Review and Meta-Analysis.” The Breast 27 (June): 35–43. https://doi.org/10.1016/j.breast.2016.02.009.

Sol, Eri Maria, Sebastian A. Wagner, Brian T. Weinert, Amit Kumar, Hyun-Seok Kim, Chu-Xia Deng, and Chunaram Choudhary. 2012. “Proteomic Investigations of Lysine Acetylation Identify Diverse Substrates of Mitochondrial Deacetylase Sirt3.” PLoS ONE 7 (12): e50545. https://doi.org/10.1371/journal.pone.0050545.

Subramanian, A., P. Tamayo, V. K. Mootha, S. Mukherjee, B. L. Ebert, M. A. Gillette, A. Paulovich, et al. 2005. “Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles.” Proceedings of the National Academy of Sciences 102 (43): 15545–50. https://doi.org/10.1073/pnas.0506580102.

Svinkina, Tanya, Hongbo Gu, Jeffrey C Silva, Philipp Mertins, Jana Qiao, Shaunt Fereshetian, Jacob D Jaffe, Eric Kuhn, Namrata D Udeshi, and Steven A Carr. 2015. “Deep, Quantitative Coverage of the Lysine Acetylome Using Novel Anti-Acetyl-Lysine Antibodies and an Optimized Proteomic Workflow.” Molecular & Cellular Proteomics: MCP 14 (9): 2429–40. https://doi.org/10.1074/mcp.O114.047555.

Taylor, G. C. A., R. Eskeland, B. Hekimoglu-Balkan, M. M. Pradeepa, and W. A. Bickmore. 2013. “H4K16 Acetylation Marks Active Genes and Enhancers of Embryonic Stem Cells, but Does Not Alter Chromatin Compaction.” Genome Research 23 (12): 2053–65. https://doi.org/10.1101/gr.155028.113.

Theodorou, Vasiliki, Rory Stark, Suraj Menon, and Jason S Carroll. 2013. “GATA3 Acts Upstream of FOXA1 in Mediating ESR1 Binding by Shaping Enhancer Accessibility.” Genome Research 23 (1): 12–22. https://doi.org/10.1101/gr.139469.112.

Venkata, Santhilata Kuppili, Dimitra Repana, Joel Nulsen, Lisa Dressler, Michele Bortolomeazzi, Aikaterini Tourna, Anna Yakovleva, Tommaso Palmieri, and Francesca D Ciccarelli. 2018. “The Network of Cancer Genes (NCG): A Comprehensive Catalogue of Known and Candidate Cancer Genes from Cancer Sequencing Screens.” BioRxiv, August, 389858. https://doi.org/10.1101/389858.

Wang, Jianhua, Jincheng Wang, Jinlu Dai, Younghun Jung, Chuen-Long Wei, Yu Wang, Aaron M. Havens, et al. 2007. “A Glycolytic Mechanism Regulating an Angiogenic Switch in Prostate Cancer.” Cancer Research 67 (1): 149–59. https://doi.org/10.1158/0008-5472.CAN-06-2971.

Wang, Shiwen, Bowen Jiang, Tengfei Zhang, Lixia Liu, Yi Wang, Yiping Wang, Xiufei Chen, et al. 2015. “Insulin and MTOR Pathway Regulate HDAC3-Mediated Deacetylation and Activation of PGK1.” PLOS Biology 13 (9): e1002243. https://doi.org/10.1371/journal.pbio.1002243.

Wang, Yu, Scott P Kallgren, Bharat D Reddy, Karen Kuntz, Luis López-Maury, James Thompson, Stephen Watt, et al. 2012. “Histone H3 Lysine 14 Acetylation Is Required for Activation of a DNA Damage Checkpoint in Fission Yeast.” The Journal of Biological Chemistry 287 (6): 4386–93. https://doi.org/10.1074/jbc.M111.329417.
Wang, Zhibin, Chongzhi Zang, Jeffrey A Rosenfeld, Dustin E Schones, Artem Barski, Suresh Cuddapah, Kairong Cui, et al. 2008. “Combinatorial Patterns of Histone Acetylations and Methylations in the Human Genome.” Nature Genetics 40 (7): 897–903. https://doi.org/10.1038/ng.154.

Weinert, Brian T, Takeo Narita, Shankha Satpathy, Balaji Srinivasan, Bogi K Hansen, Christian Schölz, William B Hamilton, et al. 2018. “Time-Resolved Analysis Reveals Rapid Dynamics and Broad Scope of the CBP/P300 Acetylome.” Cell 174 (1): 231–244.e12. https://doi.org/10.1016/j.cell.2018.04.033.

Wellen, K. E., G. Hatzivassiliou, U. M. Sachdeva, T. V. Bui, J. R. Cross, and C. B. Thompson. 2009. “ATP-Citrate Lyase Links Cellular Metabolism to Histone Acetylation.” Science 324 (5930): 1076–80. https://doi.org/10.1126/science.1164097.

Wen, Wen, Jin Ding, Wen Sun, Kun Wu, Beifang Ning, Wenfeng Gong, Guoping He, et al. 2010. “Suppression of Cyclin D1 by Hypoxia-Inducible Factor-1 via Direct Mechanism Inhibits the Proliferation and 5-Fluorouracil-Induced Apoptosis of A549 Cells.” Cancer Research 70 (5): 2010–19. https://doi.org/10.1158/0008-5472.CAN-08-4910.

Xu, Yuqun, Lingwen Liu, Akira Nakamura, Shinichi Someya, Takuya Miyakawa, and Masaru Tanokura. 2017. “Studies on the Regulatory Mechanism of Isocitrate Dehydrogenase 2 Using Acetylation Mimics.” Scientific Reports 7 (1): 9785. https://doi.org/10.1038/s41598-017-10337-7.

Yamaguchi, Hirohito, Nicholas T Woods, Landon G Piluso, Heng-Huan Lee, Jiandong Chen, Kapil N Bhalla, Alvaro Monteiro, Xuan Liu, Mien-Chie Hung, and Hong-Gang Wang. 2009. “P53 Acetylation Is Crucial for Its Transcription-Independent Proapoptotic Functions.” The Journal of Biological Chemistry 284 (17): 11171–83. https://doi.org/10.1074/jbc.M809268200.

Yarosh, Will, Tomasa Barrientos, Taraneh Esmailpour, Limin Lin, Philip M Carpenter, Kathryn Osann, Hoda Anton-Culver, and Taosheng Huang. 2008. “TBX3 Is Overexpressed in Breast Cancer and Represses P14 ARF by Interacting with Histone Deacetylases.” Cancer Research 68 (3): 693–99. https://doi.org/10.1158/0008-5472.CAN-07-5012.

Yoon, Nam K., Erin L. Maresh, Dejun Shen, Yahya Elshimali, Sophia Apple, Steve Horvath, Vei Mah, et al. 2010. “Higher Levels of GATA3 Predict Better Survival in Women with Breast Cancer.” Human Pathology 41 (12): 1794–1801. https://doi.org/10.1016/j.humpath.2010.06.010.

Yu, Guangchuang, Li-Gen Wang, Yanyan Han, and Qing-Yu He. 2012. “ClusterProfiler: An R Package for Comparing Biological Themes among Gene Clusters.” Omics: A Journal of Integrative Biology 16 (5): 284–87. https://doi.org/10.1089/omi.2011.0118.

Yu, Wei, Kristin E Dittenhafer-Reed, and John M Denu. 2012. “SIRT3 Protein Deacetylates Isocitrate Dehydrogenase 2 (IDH2) and Regulates Mitochondrial Redox Status.” The Journal of Biological Chemistry 287 (17): 14078–86. https://doi.org/10.1074/jbc.M112.355206.

Zhang, Zhenhuan, Hiroko Yamashita, Tatsuya Toyama, Hiroshi Sugiura, Yoko Omoto, Yoshiaki Ando, Keiko Mita, Maho Hamaguchi, Shin-Ichi Hayashi, and Hirotaka Iwase. 2004. “HDAC6 Expression Is Correlated with Better Survival in Breast Cancer.” Clinical Cancer Research: An Official Journal of the American Association for Cancer Research 10 (20): 6962–68. https://doi.org/10.1158/1078-0432.CCR-04-0455.

Zhao, S., W. Xu, W. Jiang, W. Yu, Y. Lin, T. Zhang, J. Yao, et al. 2010. “Regulation of Cellular Metabolism by Protein Lysine Acetylation.” Science 327 (5968): 1000–1004. https://doi.org/10.1126/science.1179689.

Zou, Xianghui, Yueming Zhu, Seong-Hoon Park, Guoxiang Liu, Joseph O’Brien, Haiyan Jiang, and David Gius. 2017. “SIRT3-Mediated Dimerization of IDH2 Directs Cancer Cell Metabolism and Tumor Growth.” Cancer Research 77 (15): 3990–99. https://doi.org/10.1158/0008-5472.CAN-16-2393.

Zyla, Joanna, Michal Marczyk, January Weiner, and Joanna Polanska. 2017. “Ranking Metrics in Gene Set Enrichment Analysis: Do They Matter?” BMC Bioinformatics 18 (1): 256. https://doi.org/10.1186/s12859-017-1674-0.


Protein acetylation in breast cancer, Karpova, M.S., 2018
