Post-translational modifications regulatory networks: evolution, mechanisms and implications

Thesis

Luca Freschi

Doctorate in Biology, Philosophiae Doctor (Ph.D.)

Québec, Canada

© Luca Freschi, 2015

Résumé

Post-translational modifications (PTMs) are chemical modifications of proteins that allow the cell to finely regulate its functions and to encode and integrate environmental signals. Recent progress in experimental and bioinformatic techniques has allowed us to determine PTM profiles for entire proteomes and to identify the molecules responsible for "writing" or "erasing" these PTMs. With these data, it has become possible to begin defining cellular PTM regulatory networks. Here, we studied the evolution of these networks to better understand how they may help explain the complexity and diversity of organisms, and to better understand their mechanisms of action. First, we addressed the question of how PTM regulatory networks can be rewired after a gene duplication event by studying how the phosphoregulatory network of budding yeast was rewired after a whole-genome duplication event that took place 100 million years ago. Our results highlight the role of gene duplication as a key mechanism for the innovation and complexification of PTM regulatory networks. Next, we addressed the question of how PTMs may contribute to organismal diversity by comparing the phosphorylation profiles of human and mouse. We found substantial differences in the PTM profiles of these two species that have the potential to explain, at least in part, the phenotypic differences observed between them. We also found evidence supporting the idea that PTMs can "jump" to new positions and still regulate the same biological functions. This phenomenon must be taken into account when comparing PTM profiles from different species, to avoid overestimating the divergence caused by PTM regulation. Finally, we investigated how several alternative PTMs for the same residue can interact to regulate cellular functions. We examined two of the best-known PTMs, phosphorylation and O-GlcNAcylation, which modify serines and threonines, and we studied the potential mechanisms of interaction between these two PTMs. Our results support the hypothesis that these two PTMs control several biological functions rather than a single one. Overall, the results presented in this thesis help elucidate the evolutionary dynamics, the functional mechanisms and the biological implications of PTMs.


Abstract

Post-translational modifications (PTMs) are chemical modifications of proteins that allow the cell to finely tune its functions as well as to encode and integrate environmental signals. Recent advances in experimental and bioinformatic techniques have allowed us to determine the PTM profiles of entire proteomes as well as to identify the molecules that write PTMs to, or erase them from, each protein. These data have made it possible to define cellular PTM regulatory networks. Here, we study the evolution of these networks to gain new insights into how they may contribute to increasing organismal complexity and diversity and to better understand their molecular mechanisms of functioning. We first address the question of how and to what extent a PTM network can be rewired after a gene duplication event, by studying how the budding yeast phosphoregulatory network was rewired after a whole genome duplication event that occurred 100 million years ago. Our results highlight the role of gene duplication as a key mechanism to innovate and complexify PTM regulatory networks. Then, we address the question of how PTM networks may contribute to organismal diversity by comparing the human and mouse phosphorylation profiles. We find that there are substantial differences in the PTM profiles of these two species that have the potential to explain, at least in part, the phenotypic differences observed between them. Moreover, we find evidence supporting the idea that PTMs can jump to new positions during evolution and still regulate the same biological functions. This phenomenon should be taken into account when comparing the PTM profiles of different species, in order to avoid overestimating the divergence in PTM regulation. Finally, we investigate how multiple, alternative PTMs that affect the same residues interact with each other to control protein functions. We focus on two of the most studied PTMs, protein phosphorylation and O-GlcNAcylation, which affect serine and threonine residues, and we study their potential mechanisms of interaction in human and mouse. Our results support the hypothesis that these two PTMs control multiple biological functions rather than a single one. Overall, this work provides new findings that elucidate the evolutionary dynamics, the functional mechanisms and the biological implications of PTMs.


Table of Contents

Résumé
Abstract
Table of Contents
List of Figures
List of Abbreviations
Acknowledgements
Foreword
Chapter 1 - Introduction
  1.1 - Post-translational modifications
  1.2 - How PTMs regulate protein functions
  1.3 - PTM regulatory networks
  1.4 - Cross-talk between PTMs
  1.5 - PTM networks and the evolution of biological complexity and diversity
  1.6 - The technological advancements of the last decade make possible the study of PTM networks
  1.7 - Aims of this thesis
Chapter 2 - Phosphorylation network rewiring by gene duplication
  2.1 - Résumé
  2.2 - Abstract
  2.3 - Introduction
  2.4 - Materials and methods
  2.5 - Results and discussion
    2.5.1 - Paralogous phosphoproteins substantially diverged after WGD
    2.5.2 - Conservation and compensation of phosphosite loss by site-position turnover
    2.5.3 - Life after WGD: rewiring the cellular regulatory networks
    2.5.4 - Phosphosite loss dominates site-divergence
  2.6 - Conclusion
  2.7 - Acknowledgements
Chapter 3 - Where do phosphosites come from and where do they go after gene duplication?
  3.1 - Résumé
  3.2 - Abstract
  3.3 - Introduction
  3.4 - Methods
  3.5 - Results
  3.6 - Conclusion
  3.7 - Acknowledgements
Chapter 4 - Functional divergence and evolutionary turnover in mammalian phosphoproteomes
  4.1 - Résumé
  4.2 - Abstract
  4.3 - Introduction
  4.4 - Methods
  4.5 - Results
    4.5.1 - Conservation and divergence between human and mouse phosphoproteomes
    4.5.2 - A role for state-diverged sites in phosphoproteome divergence
    4.5.3 - Evolutionary turnover of mammalian phosphorylation sites
  4.6 - Conclusion
  4.7 - Acknowledgements
Chapter 5 - Cross-talk between O-GlcNAcylation and phosphorylation in mammalian proteomes
  5.1 - Résumé
  5.2 - Abstract
  5.3 - Introduction
  5.4 - Methods
  5.5 - Results
    5.5.1 - An extensive dataset of phosphorylation and O-GlcNAcylated sites
    5.5.2 - Phosphorylation and O-GlcNAcylation are found in the same residues more than expected by chance alone
    5.5.3 - Clues of independent regulation of multiple functions in human but not in mouse
    5.5.4 - Three-state sites and 2-state ones have different preferences for protein kinases
  5.6 - Conclusion
Chapter 6 - General conclusions
  6.1 - Summary of the study
  6.2 - Perspectives
Annex 1 - Supplementary information for Chapter 2
Annex 2 - Supplementary information for Chapter 4
Annex 3 - Supplementary information for Chapter 5
Annex 4 - qPCA: a scalable assay to measure the perturbation of protein-protein interactions in living cells
  Abstract
  Introduction
  Materials and methods
  Results and discussion
    The DHFR-qPCA signal reflects the amount of protein complex formed in the cell
    DHFR-qPCA allows to study the effect of metabolic, drug and genetic perturbations on protein complexes
  Conclusions
  Acknowledgements
References


List of Figures

Figure 2.1. Conservation and divergence of phosphoregulation among WGD paralogs.
Figure 2.2. Gains and losses of phosphosites after gene duplication.
Figure 2.3. L. kluyveri phosphoproteomics confirms that phosphosites are preferentially lost in paralogous phosphoproteins.
Figure 3.1. Algorithm used to calculate and compare the proportions of transitions between phosphorylated and phosphomimetic residues relative to control sites.
Figure 3.2. Phosphosites that are differentially lost in paralogous phosphoproteins evolve toward negatively charged residues.
Figure 3.3. Detailed analysis of the patterns of evolution of pSer and pThr sites.
Figure 3.4. Transitions between phosphorylatable and phosphomimetic amino acids need to go through a non-negatively charged intermediate.
Figure 3.5. A duplication event could provide the conditions for the intermediate non-functional site to be neutral, which would allow a transition without affecting the fitness of the organism.
Figure 4.1. Purifying selection is acting on mammalian phosphorylation sites and their phosphorylation status.
Figure 4.2. Analysis of NetPhorest scores for the different classes of sites.
Figure 4.3. Comparison of a pair of StC and StD sites.
Figure 4.4. Proportion of sites that are phosphorylated by the same protein kinase.
Figure 4.5. Evolutionary histories of candidate functionally redundant site pairs.
Figure 5.1. Number of 3-state sites in human and mouse and comparisons to random expectations.
Figure 5.2. Fraction of sites as a function of protein abundance for human and mouse O-GlcNAcylation sites and comparison of average protein abundance between all proteins and proteins that contain 3-state sites for human and mouse.
Figure 5.3. Comparison of residue conservation for 1-state, 2-state and 3-state residues in the human and mouse proteomes.
Figure 5.4. Comparison of the evolutionary conservation of the regions surrounding 1-state, 2-state and 3-state residues (+/- 5 amino acids) for the human proteome.
Figure 5.5. Kinase preferences of 3-state residues for human and mouse proteins.


List of Abbreviations

cSer: control Serine
cThr: control Threonine
DHFR: Dihydrofolate reductase
FN: False Negative
FP: False Positive
My: Million years
PCA: Protein Complementation Assay
PTM: Post-Translational Modification
PWM: Position Weight Matrices
StC: State-Conserved
StD: State-Diverged
SiD: Site-Diverged
WGD: Whole Genome Duplication


For Marcello and Elodia


Acknowledgements

My first thought goes to my advisor, Christian Landry. He has always done everything possible (and, sometimes, even the impossible) to make me a better scientist and a better man.

I would also like to thank the members of my thesis and PhD committees: Prof. Pedro Beltrao, Prof. Yves Bourbonnais, Prof. Nicolas Derome and Prof. Sabine Elowe. Thanks to their suggestions, the quality of my PhD and of this thesis has improved a lot.

I cannot forget to mention here all the members of the Landry and Aubin-Horth labs: François-Christophe Marois-Blanchet, Guillaume Diss, José-Francisco Torres-Quiroz, Isabelle Gagnon-Arsenault, Jean-Baptiste Leducq, Alexandre Dubé, Guillaume Charron, Andrée-Ève Chrétien, Francis Rousseau-Brochu, Marie Filteau, Samuel Rochette, Lou Nielly-Thibault, Mélissa Giroux, Jukka-Pekka Verta, Martha Nigg, Nadia Aubin-Horth, François-Olivier Gagnon-Hébert, Sergio Cortez-Ghio, Jennyfer Lacasse, Carole Di Poi and Lucie Grecias. They have been my family in these 5 years in Québec.

Many thanks to my lovely wife, Maryam. She has always been on my side during these months, in the good and in the tough moments.

I would also like to thank my parents, Elodia and Marcello, for all the efforts they have made during all these years to allow me to follow my dreams. What I have accomplished up to now and what I will accomplish in the future is not only my success, but also theirs.

Finally, I would like to say a few words of thanks to a special person who unfortunately has not been able to share with me the moments and the feelings of the PhD defence: my beloved grandmother Sara. Without her advice I would not be where I am now.


Foreword

This thesis is organized in 6 chapters including a general introduction and a general conclusion. Chapters 2, 3 and 4 have already been published as independent scientific articles. Chapter 5 will be submitted for publication to a scientific journal. Annex 4 includes a further paper whose subject is not directly connected with the main theme of this thesis.

Chapter 2 has been published as: Freschi L., Courcelles M., Thibault P., Michnick S.W. and Landry C.R (2011) Phosphorylation network rewiring by gene duplication, Molecular Systems Biology. 7:504

Chapter 3 has been published as: Diss G., Freschi L., Landry C.R (2012) Where do phosphosites come from and where do they go after gene duplication? International Journal of Evolutionary Biology - special issue: Molecular Evolutionary Routes that Lead to Innovations 2012: 843167

Chapter 4 has been published as: Freschi L., Osseni M., Landry C.R (2014) Functional Divergence and Evolutionary Turnover in Mammalian Phosphoproteomes, PLoS Genetics 10(1): e1004062

Annex 4 has been published as: Freschi L., Torres-Quiroz F., Dubé A.K., Landry C. R (2013) qPCA: a scalable assay to measure the perturbation of protein-protein interactions in living cells, Molecular Biosystems. 9(1):36-43

The analysis of the results and the writing of the articles have been performed by L. Freschi, under the direction of C. R Landry.

For Chapter 2 C. Landry, M. Courcelles and P. Thibault performed the phosphoproteomics experiments. S. W. Michnick contributed reagents, tools and guidance on the phosphoproteomics experiments.

For Chapter 3 G. Diss participated in the analysis of the results and the writing of the manuscript and is therefore co-first author of the article.

For Chapter 4 M. Osseni contributed to building the data set used in the study.


Chapter 1 - Introduction

1.1 - Post-translational modifications

In cells, the blueprint for all cellular functions is stored in the DNA. However, the actual effectors of cellular functions are a myriad of different molecules, among which proteins have a prominent role. The information flux follows this simple rule: the information stored in the DNA is transcribed into RNA, an intermediate messenger molecule, and then translated into proteins by ribosomes (Crick, 1970). All these steps are tightly regulated so that each protein is expressed in the right place at the right time. For instance, transcription factors that bind to the regulatory regions of genes contribute to defining their expression level (Brewster et al, 2014). Different RNA sequences have different stabilities and are translated at different rates by ribosomes, and this affects both protein abundance and protein folding (Schwanhausser et al, 2011). Finally, after translation, proteins can be further modified by the addition of chemical groups (Prabakaran et al, 2012). These additions are called post-translational modifications or PTMs. To date, more than 300 different PTMs have been reported in the literature, affecting more than 300,000 protein residues in prokaryotic and eukaryotic proteomes (Khoury et al, 2011). The attached chemical groups vary in size, from small groups such as methyl to entire proteins such as ubiquitin. Moreover, most PTMs are reversible, meaning that a protein can switch between different states (modified/non-modified) over time or as internal conditions change (Olsen et al, 2006). Notable examples of PTMs include phosphorylation, the addition of a phosphate group to serine, threonine and tyrosine residues; glycosylation, the addition of sugar moieties to several residue types; and ubiquitylation, the addition of ubiquitin to lysine residues. PTMs are an important mechanism through which the cell fine-tunes its functions, and we now turn to the details of this aspect.


1.2 - How PTMs regulate protein functions

PTMs can modify the properties of a protein in different ways: (i) they can activate or deactivate one or more functions by inducing a conformational change of the protein (Sprang et al, 1988), (ii) they can enable protein-protein interactions by changing the bulk charge at the interaction surface of the protein (Khmelinskii et al, 2009), (iii) they can contribute to determining the stability (Vazquez et al, 2000) and the half-life (Koivomagi et al, 2011) of the protein and (iv) they can determine the localization of the protein (Madeo et al, 1998). A spectacular example of the first of these scenarios is glycogen phosphorylase, an enzyme involved in glycogen metabolism (Livanova et al, 2002). This protein is present in the cell in two forms, named a and b. The b form of the enzyme has a low catalytic activity while the a form is characterized by a high one. The transition from the b form to the a form is made possible by a phosphorylation event on a specific residue (Ser-14). This molecular event ultimately causes a conformational change that increases the enzymatic activity of the protein, which breaks down glycogen chains into glucose-1-phosphate units that become available for cellular catabolic processes. An example of how PTMs regulate protein-protein interactions is provided by the proteins p53 and MDM2. p53 is a tumor suppressor protein that plays important roles in angiogenesis, genomic stability and apoptosis (Levine & Oren, 2009). p53 interacts with MDM2, a p53-specific E3 ubiquitin ligase, to form the p53-MDM2 complex (Moll & Petrenko, 2003), which prevents p53 activation in unstressed cells. The phosphorylation of the Ser-106 residue of p53 under stress conditions inhibits the interaction between these two proteins (Hsueh et al, 2013), ultimately contributing to p53 activation. PTMs can also affect the half-life of a protein. The attachment of several ubiquitin units to lysine residues is a general mechanism by which the cell targets proteins for proteasome-mediated degradation. An example of the importance of this mechanism is provided by cell cycle regulation. Indeed, progression through the cell cycle is made possible by the expression and subsequent ubiquitin-mediated proteolysis of cyclins and CDK inhibitors (Glotzer et al, 1991). Finally, PTMs can also affect the localization of proteins. For instance, the phosphorylation of two key serine residues (Ser-358 and Ser-361) of the Ikaros protein, a hematopoietic-specific transcription factor, determines the re-localization of this protein to the nucleus, where it can promote the transcription of the genes involved in lymphocyte differentiation (Uckun et al, 2012). The different mechanisms described above show how PTMs can change the properties of proteins in many different ways, thus representing a versatile mechanism to tune and regulate protein functions.

1.3 - PTM regulatory networks

One of the most important characteristics of PTMs is that, in general, they are reversible modifications of proteins. This means that the cell must possess molecular mechanisms to add and remove PTMs in order to be able to regulate protein function. Further, these mechanisms need to be specific, since different PTMs (e.g. phosphorylation and ubiquitylation) occur at different residue types (phosphorylation occurs on serine, threonine and tyrosine residues while ubiquitylation occurs on lysine residues). In the last decades we have started to unravel the molecular details of these mechanisms for many PTMs and to understand the principles behind them, the best-known example being protein phosphorylation. For this PTM we now know that the phosphate groups are added by a specific set of proteins called protein kinases and are removed by another specific set of proteins called protein phosphatases. In the human genome there are about 500 protein kinases (corresponding to about 2% of all human genes) and 200 phosphatases (Manning et al, 2002). More generally, for each PTM type (e.g. phosphorylation, ubiquitylation, etc.) there is a set of proteins, called writers, that can add the PTM to other proteins and another set, called erasers, that can remove the PTM (Lim & Pawson, 2010). In order to understand how the cell regulates its functions through PTMs we need to consider and study the entire network composed of all PTMs, writers and erasers, also called the PTM regulatory network. At the moment we are still far from having a complete understanding of the cellular PTM regulatory network and even the most recent studies mostly focus on a single PTM network or a few of them (Hunter, 2007).
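
The writer/eraser organization described above maps naturally onto a small data structure. The following is a minimal sketch and is not taken from the thesis: the class names `Site` and `PTMNetwork`, and the enzyme and protein names in the usage lines, are hypothetical placeholders chosen for illustration. It records which writers and erasers act on which sites and enforces the residue specificity of a PTM type.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """A modifiable residue, identified by protein, amino acid and position."""
    protein: str
    residue: str   # one-letter amino acid code, e.g. "S", "T", "Y", "K"
    position: int


class PTMNetwork:
    """Hypothetical container for one PTM type with its writers and erasers."""

    def __init__(self, ptm_type, target_residues):
        self.ptm_type = ptm_type
        self.target_residues = frozenset(target_residues)
        self.writers = defaultdict(set)   # site -> enzymes that add the PTM
        self.erasers = defaultdict(set)   # site -> enzymes that remove the PTM

    def _check(self, site):
        # Enforce residue specificity (e.g. ubiquitylation only on lysine).
        if site.residue not in self.target_residues:
            raise ValueError(
                f"{self.ptm_type} does not occur on residue {site.residue}"
            )

    def add_writer(self, enzyme, site):
        self._check(site)
        self.writers[site].add(enzyme)

    def add_eraser(self, enzyme, site):
        self._check(site)
        self.erasers[site].add(enzyme)

    def sites(self):
        """All sites known to be written or erased in this network."""
        return set(self.writers) | set(self.erasers)


# Usage with placeholder names (not real enzyme-substrate data).
phospho = PTMNetwork("phosphorylation", {"S", "T", "Y"})
site = Site("protein_X", "S", 14)
phospho.add_writer("kinase_A", site)
phospho.add_eraser("phosphatase_B", site)
```

Keeping one such object per PTM type keeps each network separate, so that sites shared between networks can later be found by intersecting their `sites()` sets.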


1.4 - Cross-talk between PTMs

An important aspect that characterizes PTM regulatory networks is that they are not independent from each other. Indeed, several studies reviewed in (Hunter, 2007) have revealed that the presence of one PTM at one residue can interfere with the addition of other PTMs at the same residue or at adjacent residues. This interference is often referred to as cross-talk. Two types of cross-talk have been described in the literature: positive cross-talk and negative cross-talk (Hunter, 2007). The term positive cross-talk refers to a situation in which the presence of one PTM at one residue favours the addition or the removal of another PTM. The term negative cross-talk, instead, describes a situation in which one residue can potentially undergo two or more PTMs, so that there is direct competition between the writers of the different PTM networks to modify that residue. An example of positive cross-talk between PTMs is the phosphorylation-dependent ubiquitylation of cyclin D1 (Lin et al, 2006). Cyclin D1 is a regulator of the CDK4/6 kinases. The cyclin D1/CDK complexes trigger the G1/S transition of the cell cycle. However, during S-phase cyclin D1 has to be degraded. This event is primed by the phosphorylation of Thr-286, which promotes the subsequent ubiquitylation of cyclin D1 by an E3 ligase, targeting the protein for degradation. An example of the second type of cross-talk, negative cross-talk, is provided by the protein p53. This tumor suppressor is stabilized by the acetylation of multiple specific lysine residues at the C-terminus (Lys-370, Lys-371, Lys-372, Lys-381, Lys-382) (Li et al, 2002). The acetylation of these residues impedes their ubiquitylation by MDM2, thus contributing to stabilizing p53 and increasing its half-life. These examples show that PTM regulatory networks are interconnected and that, in order to understand how the whole cellular PTM regulatory network works, we need to take into account and study the cross-talk between PTMs.

1.5 - PTM networks and the evolution of biological complexity and diversity

The organisms that populate our planet have evolved from simpler ones through evolutionary trajectories determined by natural selection, and the appearance and complexification of PTM regulatory networks are thought to have played a role in the emergence of biological complexity and diversity. A notable example that supports this scenario has been revealed by recent studies of the evolution of tyrosine phosphorylation. This PTM regulatory network is the result of several evolutionary steps that started more than a billion years ago in a single-cell eukaryotic organism. Pincus and collaborators (Pincus et al, 2008) shed light on these evolutionary steps. Limited tyrosine phosphorylation, catalyzed through cross-phosphorylation by Ser/Thr kinases, was observed in premetazoan organisms. Premetazoan organisms also possessed a reduced set of erasers for tyrosine phosphorylation (tyrosine phosphatases). However, the complete PTM network that included the set of writers (tyrosine kinases) was only observed in metazoans and choanoflagellates. The emergence of the set of writers in metazoans was also associated with an expansion of the set of erasers. The emergence of the tyrosine phosphorylation PTM network is thought to have had a huge impact on the emergence of multicellular organisms (metazoa), since tyrosine phosphorylation is a key component of the molecular machinery that allows cell-cell communication. This example shows how, by studying the evolution of PTM networks, we can understand the basis of organismal complexity and diversity.

1.6 - The technological advancements of the last decade make possible the study of PTM networks

The advancements in mass spectrometry, genomics, biochemistry and bioinformatics of the last decade have given us an unprecedented set of tools to study PTM networks. While classic techniques that rely on antibody-based Western blot analysis are still useful to detect specific PTM events, the development of protocols that enrich samples for peptides carrying a PTM of interest, coupled to mass spectrometry (Zhao & Jensen, 2009), has made it possible to determine for the first time proteome-wide PTM profiles at high throughput, in particular those of model organisms like yeast (Gruhler et al, 2005; Holt et al, 2009; Li et al, 2007), C. elegans (Zielinska et al, 2009), mouse (Huttlin et al, 2010) and human (Sharma et al, 2014). Further, techniques like peptide arrays (Chen & Turk, 2010) have allowed us to explore the specificity of the writers and provided the basis for determining writer-site associations. These associations allowed us to reconstruct the topology and the organization of some PTM networks. The results of these studies have also led to the development of algorithms (e.g. (Miller et al, 2008)) that can predict PTM sites on proteins or writer-site associations. While each of these tools and techniques has several limitations, together they have allowed us to investigate for the first time whole PTM networks at high throughput.

1.7 – Aims of this thesis

The general aim of this thesis is to study the evolution of PTM regulatory networks in order to understand how they rewire over time and in different organisms and how they cross-talk with each other. Shedding light on these aspects of PTM networks will allow us to better understand (i) the molecular mechanisms that contribute to increasing biological complexity, (ii) the molecular basis of species divergence and (iii) how the cell integrates different signals to make decisions.

We now review in more detail the specific questions addressed in each chapter.

Chapter 2 addresses the question of how eukaryotic PTM networks are rewired after gene duplication. Gene duplication is a mechanism that provides raw genetic material that can be shaped by evolution and it is thought to be one of the mechanisms at the origin of organismal complexity. Using budding yeast as a model system, we study to what extent gene duplication changed the phosphoregulatory network of this organism. We chose this PTM network because it is the one for which we have the most complete data. In this chapter we also investigate the molecular mechanisms involved in the rewiring of this phosphoregulatory network and we discuss how these mechanisms may have contributed to increasing its complexity.

In Chapter 3 we further develop the analyses of the evolutionary trajectories of yeast phosphorylation sites after the duplication event (Chapter 2) and we study how some of these trajectories that imply the loss of phosphorylation sites may actually contribute to complexifying the cellular regulatory network. We then discuss these results in the context of how gene duplication may lead to biological innovations.

In Chapter 4 we address the question of how a mammalian PTM regulatory network has been rewired by evolution, by comparing the mouse and human phosphoproteomes. Here too, we focus on this regulatory network because it is the best known one. Comparing the PTMs of human and mouse represents an important step towards understanding both the regulatory differences between these species and, more generally, the molecular basis of species divergence.

Finally, in Chapter 5 we study the cross-talk of two mammalian PTM regulatory networks that share the same target residues, the phosphorylation and O-GlcNAcylation PTM networks (the target residues being Ser and Thr), in mouse and human. While examples of cross-talk between these two PTMs have already been reported in the literature, a global assessment of the cross-talk between these two PTM networks is not yet available. In our analysis we first find evidence for global cross-talk between these two PTM networks and we then ask whether phosphorylation and O-GlcNAcylation act as two molecular switches that regulate a single function or as two molecular switches for two functions. By answering these questions we can understand some of the mechanisms by which different PTM networks interact with each other, allowing the cell to integrate different signals and to make decisions.
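
The idea of two PTMs sharing target residues can be made concrete by partitioning Ser/Thr residues according to which modifications have been observed on them. The sketch below is an illustration under stated assumptions, not the analysis of Chapter 5: the state labels (1-state for an unmodified residue, 2-state for a residue observed with only one of the two PTMs, 3-state for a residue observed with both) are one reading of the terminology used in the figure captions of Chapter 5, and the function name and the toy identifiers are hypothetical.

```python
def classify_states(candidate_residues, phospho_sites, oglcnac_sites):
    """Partition residues by the PTMs observed on them.

    All three arguments are sets of (protein, position) tuples.
    Returns a dict mapping each residue to "1-state", "2-state" or "3-state"
    (labels assumed for illustration, see the text above).
    """
    states = {}
    for residue in candidate_residues:
        # Number of distinct PTMs observed on this residue (0, 1 or 2).
        n_ptms = (residue in phospho_sites) + (residue in oglcnac_sites)
        states[residue] = f"{n_ptms + 1}-state"
    return states


# Toy data with made-up identifiers.
residues = {("P1", 10), ("P1", 25), ("P2", 7)}
phospho = {("P1", 10), ("P1", 25)}
oglcnac = {("P1", 25)}
states = classify_states(residues, phospho, oglcnac)
```

Residues labelled 3-state under this reading are the candidates for direct competition between the two PTMs.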


Chapter 2 - Phosphorylation network rewiring by gene duplication

Published as: Freschi L., Courcelles M., Thibault P., Michnick S.W. and Landry C.R (2011) Phosphorylation network rewiring by gene duplication, Molecular Systems Biology. 7:504


2.1 – Résumé

To understand how complex regulatory networks have assembled over the course of evolution, we need a detailed understanding of the dynamics that follow gene duplication events, including changes in post-translational modification profiles. We compared the phosphorylation profile of paralogous proteins of budding yeast with that of a species that diverged from Saccharomyces cerevisiae before the gene duplication event took place. We found that 100 million years of divergence after the duplication event are sufficient for the majority of phosphorylation sites to be lost or gained by one paralog or the other, with a strong bias towards losses. However, some losses may be partly compensated for by the evolution of other phosphorylation sites, given that paralogs tend to preserve the same number of sites over time. We also found that about 50% of kinase-substrate relationships may have changed during this period. Our results suggest that after duplication, proteins tend to undergo subfunctionalization events at the level of post-translational modifications. Moreover, even when phosphorylation sites are conserved during evolution, there is a turnover of the kinases that phosphorylate these sites.


2.2 - Abstract

Elucidating how complex regulatory networks have assembled during evolution requires a detailed understanding of the evolutionary dynamics that follow gene duplication events, including changes in post-translational modifications. We compared the phosphorylation profiles of paralogous proteins in the budding yeast Saccharomyces cerevisiae to those of a species that diverged from the budding yeast prior to the duplication of those genes. We found that 100 million years of post-duplication divergence are sufficient for the majority of phosphorylation sites to be lost or gained in one paralog or the other, with a strong bias towards losses. However, some losses may be partly compensated for by the evolution of other phosphosites, as paralogous proteins tend to preserve similar numbers of phosphosites over time. We also found that up to 50% of kinase-substrate relationships may have been rewired during this period. Our results suggest that after gene duplication, proteins tend to subfunctionalize at the level of post-translational regulation and that even when phosphosites are preserved, there is a turnover of the kinases that phosphorylate them.

2.3 – Introduction

Genomes and organisms gain in complexity during evolution by gene duplication followed by the functional divergence of the duplicates (Hurles, 2004). Signalling and regulatory proteins are thought to play a particularly important role in the evolution of organismal complexity (Gough & Wong, 2010). We know very little about the early evolutionary steps that follow the duplication of regulatory proteins and of the substrates they regulate. Studies on short time scales and on well-characterized organisms are needed in order to estimate the contribution of the different evolutionary forces to the assembly of novel regulatory pathways and networks.

Here we address the evolution of phosphoregulatory networks by directly studying phosphoproteins and their associated protein kinases. Protein phosphorylation regulates many, if not most, protein functions by affecting protein stability, localization, activity and ability to interact (Moses & Landry, 2010). When maintained, paralogous proteins may diverge in function following two evolutionary paths, which are not mutually exclusive. First, one paralog may evolve new functions (neofunctionalization) (Conant & Wolfe, 2008). Second, degenerative mutations may accumulate in one or both paralogs, leading to the loss of redundant functions (subfunctionalization) (Force et al, 1999; Lynch & Force, 2000). If we assume a model under which each phosphosite in a protein has a function (Holmberg et al, 2002), neofunctionalization would correspond to sites acquired after the duplication event and subfunctionalization to sites lost in one of the two paralogs. In the first case, new connections are created in the kinase-substrate network; in the second case, no new function has evolved and regulatory links are lost rather than created. We (Landry et al, 2009) and others (Lienhard, 2008) have recently suggested that a fraction of phosphorylation sites may have no specific function and may result from kinase-substrate interactions that evolved neutrally or nearly neutrally. Accordingly, a fraction of the links that are created or lost after gene duplication in these networks would represent gains and losses of phosphosites without sub- or neofunctionalization of the proteins.

In this study we used the phosphorylation network of the budding yeast Saccharomyces cerevisiae as a model. The lineage leading to the budding yeast underwent a whole genome duplication (WGD) 100 million years (My) ago (Wolfe & Shields, 1997) that affected its signalling networks significantly: while only 10% of all genes (~500 pairs) were maintained as duplicates, 30% and 33% of protein kinases and phosphatases, respectively, have been retained as duplicates (Seoighe & Wolfe, 1999). Furthermore, phosphoproteins were significantly more likely to be retained as paralogs than nonphosphorylated proteins (Amoutzias et al, 2010). Finally, duplicated kinases and their regulatory proteins differ in sequence and functions (Musso et al, 2008) and many of them show accelerated amino acid changes after the WGD (Kellis et al, 2004). Using computational and experimental analyses, we examined the extent to which phosphosites diverged after gene duplication, and addressed whether there have been accelerated gains and losses of phosphosites among these phosphoproteins and whether kinase-substrate relationships have been modified since the WGD.

2.4 - Materials and Methods

We compiled a set of 20342 phosphorylation sites on 2688 proteins from 8 large-scale studies: 21068 phosphopeptides from 6 studies (Albuquerque et al, 2008; Bodenmiller et al, 2007; Chi et al, 2007; Gruhler et al, 2005; Li et al, 2007; Reinders et al, 2007), as compiled by Amoutzias et al. (Amoutzias et al, 2010), to which we added 3616 phosphopeptides from Beltrao et al. (Beltrao et al, 2009) and 3620 phosphopeptides from Gnad et al. (Gnad et al, 2009). Raw phosphopeptides from these studies were filtered according to the following criteria: for the Gnad dataset, we considered peptides with a probability score above 0.95; for the Beltrao dataset, we selected peptides with a score greater than 0.02 that were not acetylated at the amino or carboxy terminus; then, for all datasets, we retained the peptides that matched exactly one S. cerevisiae protein in Blat searches (Kent, 2002). Peptides that matched more than one protein were eliminated because they could not be assigned unambiguously to a single protein. We used these data to assemble a first dataset. We then compiled a second dataset from the same phosphosite data, this time without the Blat filtering step. To our knowledge, these data sets of phosphorylation sites are the most comprehensive ones currently available. Finally, we compiled a third dataset of manually curated phosphosites that have been shown to be phosphorylated in small-scale experiments and whose function has been determined (Ba & Moses, 2010). The compiled data and all the other data described below are available at: http://www.bio.ulaval.ca/landrylab/download/.

We estimated the state-divergence of phosphosites between paralogous proteins by comparing cross-study conservation and reproducibility. Our data set comes from 8 distinct studies, so there are 28 possible pairwise comparisons. We only considered sites that were S/T in both paralogs. For each pair of studies we considered 2 sets of concatenated paralogous proteins, para.1 and para.2. We counted the number of sites found in para.1 in study 1 and examined how many were also found in para.1 in study 2 (cross-study reproducibility) and para.2 in study 2 (cross-study conservation) (Annex1, Figure S2.1). We did the same comparison for these two studies between sites identified in para.2 of study 1 and also in para.2 of study 2 (cross-study reproducibility) and of para.1 of study 2 (cross-study conservation). Each pair of studies therefore yields two ratios of cross-study conservation/cross-study reproducibility and this ratio gives a measure of the extent of conservation between paralogs while taking into account the reproducibility of the two studies.

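\[
\text{State conservation} \approx \frac{\text{cross-study conservation}}{\text{cross-study reproducibility}}
\]

\[
\text{State conservation} \approx \frac{\lvert \text{Study.1 para.1} \cap \text{Study.2 para.2} \rvert \,/\, \lvert \text{Study.1 para.1} \rvert}{\lvert \text{Study.1 para.1} \cap \text{Study.2 para.1} \rvert \,/\, \lvert \text{Study.1 para.1} \rvert}
\]

where Study.i para.j denotes the set of sites identified as phosphorylated in para.j by study i.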

A regression of the cross-study conservation on the cross-study reproducibility provides a rough estimate of the state-conservation between paralogs while taking reproducibility into account (Figure 2.1A).
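As an illustration of the ratio defined above, the following minimal Python sketch computes the two conservation/reproducibility ratios for one pair of studies. The data layout (each study as a dictionary of sets of aligned S/T positions) and the toy positions are assumptions made for this example; they are not the format of the compiled data sets.

```python
from itertools import combinations

def state_conservation(study1, study2):
    """Return the conservation/reproducibility ratios for one pair of studies.

    Each study is a dict mapping 'para1' and 'para2' to the set of aligned
    S/T positions observed as phosphorylated (hypothetical data layout).
    """
    ratios = []
    for focal, other in (("para1", "para2"), ("para2", "para1")):
        reference = study1[focal]
        if not reference:
            continue
        # Sites of study 1 found again in the same paralog set in study 2.
        reproducibility = len(reference & study2[focal]) / len(reference)
        # Sites of study 1 found at the homologous position of the other set.
        conservation = len(reference & study2[other]) / len(reference)
        if reproducibility > 0:
            ratios.append(conservation / reproducibility)
    return ratios

# Toy aligned positions, for illustration only.
study_a = {"para1": {1, 2, 3, 4}, "para2": {2, 5, 6}}
study_b = {"para1": {1, 2, 3, 7}, "para2": {2, 3, 5}}

ratios = state_conservation(study_a, study_b)
# Eight studies give 28 unordered pairs of studies.
n_pairs = len(list(combinations(range(8), 2)))
print(ratios, n_pairs)
```

Because the denominator |Study.1 para.1| is shared by the numerator and the denominator of the ratio, it cancels, so each ratio is simply the number of conserved sites divided by the number of reproduced sites.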

Local phosphosite turnover was tested as follows. We took all the pairs of WGD phosphoproteins where both paralogs had one or more phosphosites. For each phosphosite present in the first paralog, we examined a window of length l centered on the site, thus defining a range of positions along the sequence. Excluding all state-conserved sites (at the exact same position), we counted all the phosphosites present in the aligned second paralog inside the corresponding range of positions within a window. A site was conserved if for a given phosphosite in the first paralog there was at least one phosphosite in the second paralog inside the range of positions. We then determined the ratio of conserved sites over all sites for each window size. The random expectation was estimated using 100 randomizations of phosphosites as described below.
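The window-based test described above can be sketched as follows. This is a minimal illustration under stated assumptions: sites are given as positions in a shared alignment, the window of length l extends l // 2 positions on each side of the site, and the random expectation is approximated by redrawing the second paralog's phosphosites among its phosphorylatable positions. The function names and the toy positions are hypothetical.

```python
import random

def window_conservation(sites_a, sites_b, window):
    """Fraction of sites in paralog A with a phosphosite in paralog B inside
    a window centred on the site, excluding exactly conserved positions."""
    sites_a, sites_b = set(sites_a), set(sites_b)
    shared = sites_a & sites_b      # state-conserved sites are excluded
    focal = sites_a - shared
    others = sites_b - shared
    if not focal:
        return 0.0
    half = window // 2
    hits = sum(any(abs(s - t) <= half for t in others) for s in focal)
    return hits / len(focal)

def expected_conservation(sites_a, sites_b, st_positions_b, window,
                          n_rand=100, seed=0):
    """Mean conservation after randomly redrawing the sites of paralog B
    among its phosphorylatable positions (assumed randomization scheme)."""
    rng = random.Random(seed)
    pool = sorted(st_positions_b)
    values = []
    for _ in range(n_rand):
        redrawn = rng.sample(pool, len(sites_b))
        values.append(window_conservation(sites_a, redrawn, window))
    return sum(values) / n_rand

# Toy alignment positions, for illustration only.
sites_a = {10, 50, 90}
sites_b = {10, 55, 200}
observed_33 = window_conservation(sites_a, sites_b, 33)
observed_71 = window_conservation(sites_a, sites_b, 71)
expected_33 = expected_conservation(sites_a, sites_b, range(0, 300, 5), 33)
print(observed_33, observed_71, expected_33)
```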

The Position Weight Matrices (PWMs) used to predict the protein kinases associated with each phosphosite were derived empirically by Mok et al. (Mok et al, 2010) through in vitro peptide screening using 61 of the 122 kinases from S. cerevisiae. While these data are incomplete, they are the best currently available, as they rely on empirically derived consensus motifs rather than on purely in silico predictions. In order to assign all of the phosphosites to their most likely corresponding kinases, we extracted all of the 15-mers of the yeast proteome that correspond to a phosphosite and its 14 flanking (±7) residues. All phosphosites were then scored by summing the logarithms of the values in each kinase PWM corresponding to each of the amino acids of the 15-mer. We then assigned a protein kinase to a particular site based on the highest score for that site (Annex 1, Figure S2.2). Data on kinase-substrate interactions were obtained from Ptacek et al. (Ptacek et al, 2005) and Ubersax et al. (Ubersax et al, 2003). In the first case, the data represent interactions measured on protein microarrays between 87 different kinases and more than 4000 potential substrates. We estimated the fraction of paralogs that were phosphorylated by the same kinase, considering only pairs in which both paralogs were phosphorylated by at least one kinase. The second data set comes from an in vitro experiment testing for interactions between Cdc28 and the yeast proteome. We calculated the number of times both paralogs were phosphorylated by the kinase among all cases where at least one of the two was phosphorylated.
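The scoring scheme described above (sum of the logarithms of the PWM values over the 15-mer, then assignment to the highest-scoring kinase) can be illustrated with a minimal Python sketch. The two matrices below are toy PWMs built for this example, and the kinase names "KinaseA" and "KinaseB" are hypothetical; the real matrices are those of Mok et al. (2010). Sites closer than seven residues to a terminus are simply skipped here, which is an assumption of the sketch.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_pwm(preferred):
    """Build a toy 15-column PWM; `preferred` maps a column index to the
    residue favoured at that column (illustrative values only)."""
    pwm = []
    for i in range(15):
        if i in preferred:
            column = {aa: 0.02 for aa in AMINO_ACIDS}
            column[preferred[i]] = 0.62
        else:
            column = {aa: 0.05 for aa in AMINO_ACIDS}
        pwm.append(column)
    return pwm

def extract_15mer(sequence, site, flank=7):
    """Return the site and its 7 flanking residues on each side, or None
    when the window runs past a protein terminus."""
    start, end = site - flank, site + flank + 1
    if start < 0 or end > len(sequence):
        return None
    return sequence[start:end]

def score(peptide, pwm):
    """Sum of the logarithms of the PWM values along the peptide."""
    return sum(math.log(pwm[i][aa]) for i, aa in enumerate(peptide))

def assign_kinase(peptide, pwms):
    """Return the highest-scoring kinase and all scores."""
    scores = {name: score(peptide, pwm) for name, pwm in pwms.items()}
    return max(scores, key=scores.get), scores

pwms = {"KinaseA": toy_pwm({4: "R"}), "KinaseB": toy_pwm({8: "P"})}
protein = "M" + "AAAA" + "R" + "AA" + "S" + "A" * 7 + "K"
peptide = extract_15mer(protein, 8)   # the serine is at index 8
best, scores = assign_kinase(peptide, pwms)
print(peptide, best)
```

In this toy case the arginine three residues upstream of the serine matches the preference of the first matrix, so the site is assigned to it.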

Gains and losses of phosphosites were inferred as described in Figure 2.2A. We estimated the expected numbers of gains and losses by randomly sampling S/T sites. We divided the phosphosites into four classes according to the type of residue (S or T) and the type of region in which the residue was located (ordered or disordered), and the representation of each class was respected in the resampling. Disordered regions of proteins were predicted using DISOPRED (Ward et al, 2004b) with all the fungal protein sequences as a reference database. We performed the random sampling of S/T positions 1000 times, calculating the number of gains and losses after each resampling. The ancestral residues occupying the phosphosite positions were determined as follows. We aligned all S. cerevisiae genes to their Lachancea kluyveri and Zygosaccharomyces rouxii orthologs, these two species having diverged from the S. cerevisiae lineage prior to the whole-genome duplication. All the sequences and the orthology relationships were obtained from YGOB (Gordon et al, 2009) and alignments were performed with MUSCLE (Edgar, 2004) using default parameters. Orthology relationships were found for 4401 genes (among which 516 out of 553 of S. cerevisiae paralogous genes). For each quartet of sequences, we inferred the ancestral sequence at the first node joining the two paralogs (Figure 2.2A). The ancestral protein sequences were inferred using the codeml method implemented in PAML (Yang, 2007) with the following parameters: fix_alpha = 0, alpha = 0.04, fix_blength = 2. We reconstructed ancestral sequences using two different substitution matrices (WAG and Dayhoff); both gave similar results, so we present only the results derived from the WAG matrix. We examined the robustness of the reconstruction by performing the same analyses with an additional pre-WGD species (K. thermotolerans) added to our set. In this case we were able to reconstruct the orthology relationships and the ancestral sequence for 4388 genes (among which 516 out of 551 of S. cerevisiae paralogous genes) (Dataset 4). All analyses were performed using Perl (http://www.perl.org) and R (http://www.r-project.org/) scripts.
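The inference and resampling steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the Perl and R scripts used for the analyses: a site phosphorylated in one paralog is counted as a loss when the ancestral residue is S/T and the other paralog carries a non-S/T residue, and as a gain when both the ancestor and the other paralog carry a non-S/T residue (our reading of Figure 2.2A). The record fields and the toy records are hypothetical.

```python
import random

ST = {"S", "T"}

def classify(ancestral, other):
    """Classify an S/T site of the focal paralog from the ancestral residue
    and the residue at the homologous position of the other paralog."""
    if other in ST:
        return "residue_conserved"
    return "loss" if ancestral in ST else "gain"

def count_events(sites):
    counts = {"gain": 0, "loss": 0}
    for site in sites:
        kind = classify(site["anc"], site["other"])
        if kind in counts:
            counts[kind] += 1
    return counts

def resample(sites, n_iter=1000, seed=0):
    """Draw, within each (residue, region) class, as many S/T positions as
    there are observed phosphosites, and count gains and losses each time."""
    rng = random.Random(seed)
    classes = {}
    for site in sites:
        classes.setdefault((site["res"], site["region"]), []).append(site)
    null = []
    for _ in range(n_iter):
        draw = []
        for members in classes.values():
            n_phospho = sum(m["phospho"] for m in members)
            draw.extend(rng.sample(members, n_phospho))
        null.append(count_events(draw))
    return null

# Toy S/T positions of one paralog, for illustration only.
sites = [
    {"res": "S", "region": "disordered", "anc": "S", "other": "A", "phospho": True},
    {"res": "S", "region": "disordered", "anc": "A", "other": "A", "phospho": False},
    {"res": "S", "region": "disordered", "anc": "S", "other": "S", "phospho": False},
    {"res": "T", "region": "ordered", "anc": "T", "other": "N", "phospho": True},
    {"res": "T", "region": "ordered", "anc": "G", "other": "T", "phospho": False},
]
observed = count_events([s for s in sites if s["phospho"]])
null = resample(sites, n_iter=1000)
mean_losses = sum(n["loss"] for n in null) / len(null)
print(observed, mean_losses)
```

Comparing the observed counts with the distribution of counts in `null` gives the kind of observed-versus-expected comparison reported in Figure 2.2C.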

The Lachancea kluyveri phosphosites were identified as follows. L. kluyveri (formerly known as Saccharomyces kluyveri) strain FM628 (MATa ura3) was obtained from Marc Johnston (Washington University). Pre-cultures of 75 ml were grown overnight to OD600 ~ 3 in standard yeast YPD medium at 30°C with agitation at 600 rpm, and diluted in the morning to OD600 = 0.1 in 1 L of YPD. Cells were harvested at OD600 ~ 0.6-0.8 by centrifugation at 4,000 rpm for 20 minutes. The pellet (about 2-3 grams) was suspended in 20 ml of lysis buffer following (Albuquerque et al, 2008) with slight modifications: 50 mM Tris-HCl (pH 8.0), 150 mM NaCl, 0.2% NonidetP-50, 1.5 mM MgCl2, 0.2 mM EDTA. The lysis buffer also contained phosphatase inhibitors (PhosSTOP, Roche), protease inhibitors (Complete protease cocktail, Roche) and 1 mM PMSF. Samples were quickly frozen drop-by-drop directly in liquid nitrogen to make 1 cm³ frozen pellets and stored at -80°C. Yeast powder extracts were then produced using a Freezer/Mill (Spex SamplePrep), which cryogenically pulverizes small pellets with a magnetically driven impactor submerged in liquid nitrogen. The fine powder was then centrifuged at 14,500 rpm (rotor SA600) for 30 minutes at 4°C. The clear supernatant was treated overnight at 4°C with Benzonase (Novagen) to eliminate nucleic acids and then precipitated with cold acetone.

Protein pellets were resuspended in 1% SDS/50 mM ammonium bicarbonate (AB) and microBCA (Pierce) was used to determine protein concentration. Protein extracts (1 mg) were reduced for 20 min at 37°C with 0.5 mM Tris(2-carboxyethyl)phosphine, TCEP (Pierce), alkylated with 50 mM iodoacetamide for 20 min at 37°C and quenched by adding 50 mM DTT. Samples were diluted 10x with 50 mM AB and digested overnight at 37°C with sequencing grade trypsin (enzyme:substrate, 1:100) (Promega). The digestion was stopped by adding trifluoroacetic acid (TFA) and was followed by evaporation on a SpeedVac (Thermo Fisher Scientific, San Jose, CA). Phosphopeptides were enriched on home-made TiO2-affinity columns (1.25 mg Titansphere, 5 µm, GL Sciences) using 250 mM lactic acid (Fluka) and eluted with 30 µl of 1% ammonium hydroxide, as described previously (Thingholm et al, 2006). Samples were acidified with 1 µl of TFA, desalted using a 30 mg HLB cartridge (Waters Corporation, Milford, MA), dried and resuspended in 2% acetonitrile, ACN (Thermo Fisher Scientific), 0.2% formic acid, FA (EMD Chemicals Inc., Gibbstown) prior to analysis.

Triplicate 2D-nanoLC-MS/MS analysis of phosphopeptides was performed on an LTQ-Orbitrap XL mass spectrometer (Thermo Fisher Scientific) coupled to an Eksigent LC system. Online SCX separation (Opti-Guard 1 mm cation column, Optimize Technologies) was performed using five different ammonium acetate salt fractions, pH 3.0 (0, 250, 500, 1000 and 2000 mM) in 2% ACN (0.2% FA). Peptides eluted from each salt fraction were transferred to a pre-column reverse phase trap (4 mm length, 360 µm i.d.) and injected on a reverse phase analytical column (10 cm length, 150 µm i.d.) (Jupiter C18, 3 µm, 300 Å, Phenomenex). A linear gradient (2 to 25% ACN over 63 min followed by 25 to 40% ACN over the next 15 min) was applied to separate phosphopeptides, which were directly injected into the mass spectrometer at a flow rate of 600 nL/min. The detailed MS operation procedure is described in (Marcantonio et al, 2008). Mascot Distiller v2.1.1 (Matrix Science, London, UK) was used to extract and preprocess MS/MS spectra from the raw data files. Peptide identification was done with Mascot v2.2 using the Lachancea (Saccharomyces) kluyveri protein sequence database (http://www.ebi.ac.uk/embl/). The following parameters were used: parent and fragment tolerances of 0.02 Da and 0.5 Da, respectively, trypsin with 2 missed cleavages, and the following modifications: carbamidomethyl (C), deamidation (NQ), oxidation (M), phosphorylation (STY). ProteoConnections (Courcelles et al, 2011) was used to limit the peptide false discovery rate to 1% and to evaluate the confidence of phosphorylation site localisation. MS/MS spectra of all peptide identifications are available at http://www.thibault.iric.ca/proteoconnections. Phosphosites with a confidence score above 60% were considered for the evolutionary analyses (711 sites in 396 proteins).

2.5 – Results and discussion

2.5.1 - Paralogous phosphoproteins substantially diverged after WGD

Our dataset consists of 2726 phosphosites (serines (S), 82%; threonines (T), 16%; tyrosines (Y), 2%) that belong to one or the other member of the 352 pairs of yeast WGD paralogs for which at least one of the two proteins is a phosphoprotein. In this work we focused on S/T phosphosites as they make up 98% of all phosphosites. Among these sites, 2445 are unique to one paralog and 118 (that correspond to 236 phosphosites) occur at homologous positions, a number 7.4 times higher than expected by chance (P << 0.001, Annex 1, Figure S2.3). Phosphosites diverge in two ways. In the first, an S/T residue is phosphorylated in a protein and a residue that cannot be phosphorylated occupies the homologous position in its paralog (site-divergence). Site-divergence accounts for 69% of the sites that are unique to one paralog. In the second, an S/T is phosphorylated in one paralog and the residue at the homologous position is conserved (S/T) but not observed to be phosphorylated (state-divergence). Eighty-six percent of homologous sites that are phosphorylated are in fact state-diverged. This measure of state-divergence is strongly upwardly biased by false negative (FN) and false positive (FP) identifications and also by the fact that phosphopeptides that match more than one protein are not included in this dataset. We addressed these issues by comparing the cross-study conservation with the cross-study reproducibility. We found that state-conservation between paralogs is around 36% for filtered peptides (considering only phosphopeptides that match a single position in the proteome) and 54% for unfiltered peptides (considering all phosphopeptides) (Figure 2.1A). Protein sequence, function, localization and/or recognition by protein kinases have diverged to such an extent in 100 My that only 36-54% of the post-translational regulation of paralogs by phosphorylation appears to be conserved, despite conservation of the actual residues.
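The distinction between site-divergence and state-divergence can be made concrete with a short Python sketch. The function and the toy alignment columns below are illustrative assumptions; each column gives the residue and the observed phosphorylation status at homologous positions of the two paralogs.

```python
from collections import Counter

ST = {"S", "T"}

def classify_position(res1, phos1, res2, phos2):
    """Classify one pair of homologous positions in two paralogs."""
    if phos1 and phos2:
        return "state_conserved"
    if not phos1 and not phos2:
        return "unphosphorylated"
    # Exactly one paralog is phosphorylated: look at the partner residue.
    partner = res2 if phos1 else res1
    return "state_diverged" if partner in ST else "site_diverged"

# Toy columns: (residue 1, phosphorylated 1, residue 2, phosphorylated 2).
columns = [
    ("S", True, "S", True),
    ("S", True, "T", False),
    ("T", True, "A", False),
    ("D", False, "S", True),
    ("S", False, "T", False),
]
tally = Counter(classify_position(*column) for column in columns)
print(tally)
```

Note that a "state_diverged" call in such a tally may still reflect a false negative identification, which is why the cross-study comparison is needed.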

Figure 2.1. Conservation and divergence of phosphoregulation among WGD paralogs.

(A) The state-conservation of paralogous proteins was estimated as a regression of the cross-study conservation on the cross-study reproducibility. A 1:1 relationship is expected if all phosphosites were state-conserved. Deviation from this 1:1 relationship provides an estimate of state divergence. Filtered data: phosphopeptides that match a single protein; unfiltered data: all phosphopeptides. (B) Positive correlation in the number of phosphosites of WGD paralogous proteins. Red dots indicate average numbers in binned data and green dots the actual data. Green intensities indicate the number of points at these positions. (C) Proportion of paralogous pairs with significant conservation as a function of the window size considered. A site is considered conserved if there is a phosphorylated site in the other paralog within the window (excluding the exact position). (D) Case of putative local compensation. The fraction of conserved sites as a function of window size is shown. Blue: observed value; Grey: 95th quantile (100 permutations); Red: average of the expected distribution. (E) Fraction of paralogous phosphosites or phosphoproteins assigned to the same protein kinase. Assignments are based on PWMs from (Mok et al, 2010). The observed fraction is calculated using these assignments while the expected fraction is estimated after shuffling the assigned kinases among the pairs of paralogous sites. Ptacek: large-scale in vitro kinase-substrate interactions on microarrays (Ptacek et al, 2005). Ubersax: in vitro Cdc28-substrate interactions (Ubersax et al, 2003). (F) Distributions of the PWM scores for different classes of sites.

2.5.2 - Conservation and compensation of phosphosite loss by site-position turnover

Surprisingly, despite the low level of site-conservation between paralogous proteins, there is a highly significant correlation in the number of phosphorylation sites between paralogs (rho = 0.35, p-value < 2.2×10⁻¹⁶; Figure 2.1B). This correlation remains significant when the number of phosphosites is normalized by protein length (rho = 0.32, p-value < 6.9×10⁻¹⁴) or by the length of disordered regions (rho = 0.27, p-value < 3.8×10⁻¹⁰), which both tend to be preserved between paralogs. The correlation is also significant when only site-diverged phosphosites are considered (rho = 0.28, p-value = 2.0×10⁻¹¹). This correlation suggests that stabilizing selection is acting to maintain the overall number of phosphosites. This result is in agreement with a recent study (Beltrao et al, 2009) reporting that the phosphorylation levels of orthologous protein complexes or pathways between Candida albicans and S. cerevisiae tend to be conserved. The turnover of phosphosite position over time could be made possible by the fact that sites that appear near a site that is lost can compensate for the loss (Serber & Ferrell Jr, 2007), particularly when the charge of a region rather than that of a particular residue is important. Redundancy in the position of phosphosites has previously been proposed to explain the weak site-conservation among species (Landry et al, 2009), but so far there has been limited evidence for this (Ba & Moses, 2010; Moses & Landry, 2010).
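The rank correlations reported at the start of this section can be illustrated with a minimal, dependency-free Python sketch of Spearman's rho, computed here as the Pearson correlation of average ranks. The phosphosite counts and protein lengths below are toy values chosen for the example; the original analyses were run with R scripts.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Toy phosphosite counts and lengths for five paralogous pairs.
counts_1 = [1, 4, 2, 9, 3]
counts_2 = [2, 3, 1, 7, 5]
lengths_1 = [300, 800, 450, 1200, 500]
lengths_2 = [320, 760, 400, 1100, 650]

rho_raw = spearman_rho(counts_1, counts_2)
rho_normalized = spearman_rho(
    [c / l for c, l in zip(counts_1, lengths_1)],
    [c / l for c, l in zip(counts_2, lengths_2)],
)
print(rho_raw, rho_normalized)
```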

If this local turnover model is responsible for the overall conservation of the number of phosphosites, the proportion of conservation between paralogs should increase significantly if we consider regions of proteins rather than exact positions. We found that to be the case for a significant but limited number of paralogous pairs. We redefined the proportion of state-conserved sites as the proportion of sites in a protein that have a phosphosite in the homologous region of a given window size in its paralog. We first found that the window size that maximizes the signal is about 33 amino acids in length (Figure 2.1C). Then, we found that among the 167 pairs of paralogous proteins where both paralogs have at least one phosphosite, 11 (6.6%) showed a significant level of conservation at that window length (an example is shown in Figure 2.1D). This result may suggest either that compensation by nearby sites is relatively uncommon and is specific to some types of proteins, or that the relatively limited coverage of the yeast phosphoproteome leaves us with limited power to detect significant compensation. Another possibility is that such compensation takes place only in highly phosphorylated proteins. Indeed, we found that paralogous pairs for which there is significant functional compensation have significantly more phosphosites (mean: 9.28 vs. 3.87; Wilcoxon test: p-value < 9.5×10⁻¹¹) and also tend to contain a larger proportion of disordered residues (mean: 53% vs. 42%, p = 0.01) compared to all pairs.

2.5.3 - Life after WGD: rewiring the cellular regulatory networks

Phosphosites are phosphorylated by a variety of kinases that recognize specific motifs surrounding the phosphosites. As for many eukaryotes, around 2% (120 total) of yeast protein-coding genes code for protein kinases (Zhu et al, 2000). We examined the conservation of the relationships between our set of phosphosites and yeast kinases by assigning each phosphosite to a kinase using empirically derived Position Weight Matrices (PWMs) for 61 yeast kinases (Data set S1 from (Mok et al, 2010)). We first found that WGD paralogs are generally not biased in terms of the protein kinases that regulate them (rho = 0.99, p-value < 2×10⁻¹⁶, Annex 1, Figure S2.4). Secondly, we found that state-conserved sites are assigned to the same kinase 44% of the time, a twentyfold increase over what is expected if phosphosites were randomly matched between paralogs (p-value < 0.0001; Figure 2.1E). This number drops to 23% for state-diverged sites, again supporting the fact that state-divergence does not entirely result from FN identifications. These sites are either not phosphorylated or phosphorylated by a different kinase in a condition not addressed so far in phosphoproteomics studies. The first hypothesis is supported by the fact that, for state-diverged sites, the assigned scores are significantly higher for the phosphorylated sites than for the non-phosphorylated ones (Figure 2.1F). We estimated that the state-diverged nonphosphorylated S/T sites in reality comprise 50% of nonphosphorylated sites (Annex 1, Figure S2.5).

The low percentage of assignment (44%) of the same kinase to state-conserved sites suggests that the kinases that phosphorylate paralogous sites have changed since the WGD (Moses & Landry, 2010). We found independent support for this from large scale and small-scale kinase-substrate interaction experiments (Ptacek et al, 2005; Ubersax et al, 2003) in which kinase-substrate relationships are also conserved in similar proportions (Figure 2.1E). Overall, these analyses suggest that while a significant fraction of sites is conserved and phosphorylated in both paralogs, the flanking sequences and/or protein structure and/or localization have diverged enough for the substrate to be regulated by a different protein kinase, a regulatory network turnover that is similar to what is observed for transcriptional networks (Gasch et al, 2004; Moses & Landry, 2010). After 100 My of evolution, up to 50% of kinase-substrate relationships may be rewired, while preserving the phosphorylation status of the substrates.
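The comparison between the observed and expected fractions of paralogous sites assigned to the same kinase (Figure 2.1E) can be sketched as a simple permutation procedure. In this minimal Python sketch the expectation is obtained by shuffling the assigned kinases among the pairs of sites; the pairs listed are toy assignments invented for the example, and the empirical p-value formula is an assumption of the sketch rather than the exact procedure used for the figure.

```python
import random

def same_kinase_fraction(pairs):
    """Fraction of paralogous site pairs assigned to the same kinase."""
    return sum(a == b for a, b in pairs) / len(pairs)

def shuffled_fractions(pairs, n_iter=1000, seed=0):
    """Same-kinase fractions after shuffling the kinases assigned to the
    second member of each pair among all pairs."""
    rng = random.Random(seed)
    left = [a for a, _ in pairs]
    right = [b for _, b in pairs]
    fractions = []
    for _ in range(n_iter):
        rng.shuffle(right)
        fractions.append(same_kinase_fraction(list(zip(left, right))))
    return fractions

# Toy kinase assignments for four pairs of paralogous sites.
pairs = [("Cdc28", "Cdc28"), ("Tpk1", "Tpk1"), ("Cka1", "Cka1"), ("Hog1", "Snf1")]

observed = same_kinase_fraction(pairs)
null = shuffled_fractions(pairs)
expected = sum(null) / len(null)
# Empirical one-sided p-value with the usual +1 correction.
p_value = (sum(f >= observed for f in null) + 1) / (len(null) + 1)
print(observed, expected, p_value)
```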

2.5.4 - Phosphosite loss dominates site-divergence

A recent study on the budding yeast reported putative cases of neo- and subfunctionalization of phosphosites (Amoutzias et al, 2010), but did not compare the extent of those changes to a null model. We therefore sought to quantify whether site divergence resulted from losses or gains of phosphosites by reconstructing the ancestral sequences of the paralogous proteins and comparing the observed proportions to the neutral expectations (Figure 2.2A, 2.2B & 2.2C). We found that 25% of sites correspond to gains and 31% of sites correspond to losses. These proportions are, respectively, significantly lower and higher than expected by chance alone, based on the resampling of phosphorylatable sites in the same set of phosphoproteins (Figure 2.2C). This remains true for both ordered and disordered regions of proteins, which have been shown to evolve at different rates. We consider that these losses represent several subfunctionalization events, as non-functional phosphosites (Landry et al, 2009) are expected to evolve like randomly selected S/T residues. These results are also unlikely to result from false positives, as we performed the same analyses on a smaller number of manually curated phosphosites (Nguyen Ba & Moses, 2010; Annex 1, Figure S2.6) and observed similar results. Our results are also robust to data filtering (Annex 1, Figure S2.7) and to variation in ancestral sequence reconstruction (Annex 1, Figure S2.8).

Figure 2.2. Gains and losses of phosphosites after gene duplication.

(A) Inference of gains and losses of phosphosites. Serines (S) and threonines (T) are considered equivalent with respect to phosphorylation. !S/!T indicates residues that are neither S nor T, and pS/pT indicates phosphorylated S/T. (B) Examples of a lost (S72), a gained (S121) and a conserved site (S103) from the curated dataset (Dataset 2). (C) The number of observed losses is greater than expected by chance alone and the number of gains shows the opposite result. Results in ordered and disordered regions agree with each other.

A limitation of this analysis is that we have to assume that the phosphorylatable sites (S/T) of the ancestral sequence that correspond to phosphorylation sites in S. cerevisiae were phosphorylated in the ancestor. Only a direct observation of the phosphorylation state of the ancestral proteins would alleviate this problem. We therefore performed a phosphoproteomics experiment on Lachancea kluyveri (Souciet et al, 2009), a species that diverged from S. cerevisiae before the WGD event and that can be used as a proxy for ancestral functions (van Hoof, 2005). We identified 855 phosphosites on 429 proteins (Annex 1, File S2.1) that we mapped on our alignments. We found that a smaller proportion of phosphosites identified in L. kluyveri are also phosphorylated in the S. cerevisiae WGD paralogs (1:2) compared to the 1:1 S. cerevisiae orthologs (Figure 2.3A).

Figure 2.3. L. kluyveri phosphoproteomics confirms that phosphosites are preferentially lost in paralogous phosphoproteins.

(A) L. kluyveri phosphosites are more likely to be phosphorylated in S. cerevisiae if they are in 1:1 orthologs (142/469 sites in 108 proteins) than in 1:2 orthologs (31/181 sites in 45 proteins). (B) Ratios of the number of sites unique to S. cerevisiae to the number of shared ones with L. kluyveri for 1:1 orthologs (142/6644 sites in 108 proteins) and 2:1 orthologs (62/2681 sites in 45 proteins).

Assuming that the rate of phosphosite gain in the L. kluyveri lineage was similar in these two categories of genes (1:1 and 1:2 L. kluyveri-S. cerevisiae orthologs), this result confirms that phosphosites were more likely to be lost in the S. cerevisiae WGD paralogs and thus that gene duplication has significantly accelerated the rate of phosphosite divergence. We also found that the proportion of sites that are uniquely phosphorylated in S. cerevisiae (not found to be phosphorylated in L. kluyveri) in the WGD paralogs is comparable to that of the 1:1 orthologs (Figure 2.3B). Under a scenario where phosphosite gains accelerated the divergence of the WGD paralogs, we would have expected to see a significantly higher fraction of gains for the 2:1 orthologs than for the 1:1 ones. Our phosphoproteomics results therefore support our bioinformatics analyses based solely on ancestral sequence reconstruction and confirm the prevalence of phosphosite losses in the divergence of paralogous phosphoproteins.
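As a worked check of the comparison above, the fractions can be recomputed directly from the counts printed in the caption of Figure 2.3. The short Python sketch below only divides the numerators by the denominators as printed; it makes no assumption about the statistical test behind the figure, and the variable names are ours.

```python
# Counts as printed in the caption of Figure 2.3: (numerator, denominator).
panel_a = {"1:1": (142, 469), "1:2": (31, 181)}
panel_b = {"1:1": (142, 6644), "2:1": (62, 2681)}

def fractions(panel):
    """Divide each numerator by its denominator."""
    return {category: num / den for category, (num, den) in panel.items()}

frac_a = fractions(panel_a)
frac_b = fractions(panel_b)

for label, panel in (("A", frac_a), ("B", frac_b)):
    for category, value in panel.items():
        print(f"Panel {label}, {category} orthologs: {value:.3f}")
```

In panel A the fraction for 1:1 orthologs (about 0.30) is clearly higher than for 1:2 orthologs (about 0.17), whereas in panel B the two fractions are close (about 0.021 and 0.023), in line with the text.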

2.6 - Conclusion

A previous study considering the ancestral function of duplicated WGD proteins has shown the importance of subfunctionalization in shaping the divergence of WGD paralogs at the level of protein function (van Hoof, 2005), whereas investigations of transcriptional regulation have also found a significant contribution of neofunctionalization in the divergence of paralogs (Papp et al, 2003; Tirosh & Barkai, 2007). Our results suggest that at the level of post-translational regulation, subfunctionalization may have been the most important driving force in shaping the yeast regulatory network. One limitation of our analysis is that we consider that, when functional, each phosphosite has an independent function, which may not necessarily be the case, as several cooperative effects among phosphosites have been reported (Kapoor et al, 2000). The combined and individual effects of the sub- and neofunctionalized sites will need to be addressed experimentally to estimate the functional effects of these divergences. Further integrative analyses will also be required to elucidate the importance of neo- and subfunctionalization that take place at multiple levels (transcription, protein function, PTMs), as these may be largely dependent on each other (Jensen et al, 2006). Another key finding of our study is that 100 My may be sufficient to rewire half of the kinase-substrate relationships in a cell. This result is in agreement with the idea that protein-protein interaction networks evolve rapidly: in about 300 My of evolution, half of all the interactions are estimated to be replaced by new interactions (Wagner, 2001b).

2.7 - Acknowledgements

We thank H. Wurtele and A. Verreault for the use of their facilities and N. Lartillot and A. Moses for comments on the manuscript. This work was supported by a Canadian Institutes of Health Research (CIHR) grant GMX-191597 and FRSQ to C. R. Landry. C. R. Landry is a CIHR New Investigator. L. Freschi was supported by a Quebec Research Network on Protein Function, Structure and Engineering (PROTEO) fellowship.

Chapter 3 - Where Do Phosphosites Come from and Where Do They Go after Gene Duplication?

Published on: Diss G., Freschi L., Landry C. R (2012), Where do phosphosites come from and where do they go after gene duplication? Journal of Evolutionary Biology - special issue: Molecular Evolutionary Routes that Lead to Innovations 2012: 843167.

3.1 – Résumé

Gene duplication followed by divergence is an important mechanism for promoting innovation at the molecular level. While divergence at the level of transcriptional regulation is well documented, little is known about divergence caused by post-translational modifications (PTMs). Here, we test whether gains and losses of phosphorylated amino acids after gene duplication can specifically modify the regulation of the duplicated proteins. We show that when a phosphorylation site is lost in one paralog, transitions toward negatively charged amino acids (which can mimic the phosphorylated state constitutively) are significantly favoured. These transitions cannot occur through a single mutation, which means that the function must be lost before it is regained through phosphomimetic residues. Finally, we discuss how gene duplication can facilitate transitions between phosphorylated and phosphomimetic amino acids.

3.2 – Abstract

Gene duplication followed by divergence is an important mechanism that leads to molecular innovation. Whereas regulatory divergence at the transcriptional level is well documented, little is known about divergence of posttranslational modifications (PTMs). Here we test whether gains and losses of phosphorylated amino acids after gene duplication may specifically modify the regulation of these duplicated proteins. We show that when phosphosites are lost in one paralog, transitions from phosphorylated serines and threonines are significantly biased toward negatively charged amino acids, which can mimic their phosphorylated status in a constitutive manner. Surprisingly, these favoured transitions cannot be reached by single mutational steps, which suggests that the function of a phosphosite needs to be completely abolished before it is restored through substitution by these phosphomimetic residues. We conclude by discussing how gene duplication could facilitate the transitions between phosphorylated and phosphomimetic amino acids.

3.3 – Introduction

Gene duplication is one of the most prominent mechanisms by which organisms acquire new functions (Ohno, 1970). Spectacular examples of such gains of function resulting from gene duplications are the evolution of trichromatic vision in primates (Dulai et al, 1999), the evolution of human beta-globin genes that are involved in oxygen transport at different developmental stages (Efstratiadis et al, 1980), and the expansion of the family of immunoglobulins and other immunity-related genes that shaped the vertebrate immune system (Boulais et al, 2010; Zhang, 2003). Because of the central role of gene duplication in evolution, there has been strong interest in better understanding how these new functions evolve at the molecular level (Hurles, 2004), in determining the rate at which gene duplication occurs (Lynch & Conery, 2000; Lynch et al, 2008; Wagner, 2001a) and in testing whether the retention of paralogous genes necessarily requires the evolution of new functions (Force et al, 1999; Hurles, 2004; van Hoof, 2005). One of the most important challenges has been to determine mechanistically how specific mutations translate into new functions, as establishing sequence-function relationships remains a difficult task (Dean & Thornton, 2007).

After a gene duplication event, the two sister paralogs are identical copies of their ancestor and encode identical functions, which relaxes the selective constraints on each paralog (Lynch & Conery, 2000). Under most evolutionary models, the paralogs have to diverge to be retained on evolutionary time scales; otherwise one paralog would be lost and the system would return to its ancestral state (non-functionalization) (Hurles, 2004). There are two ways for paralogs to diverge in function. The first is the acquisition of new functions by one or both paralogs, a mechanism called neofunctionalization (Force et al, 1999; Lynch & Conery, 2000; Ohno, 1970). The second, called subfunctionalization, implies the complementary partitioning of the ancestral function between the two paralogs through losses of function (Force et al, 1999; Lynch & Conery, 2000; Lynch & Force, 2000). These two mechanisms are not mutually exclusive because the ancestral function can be partitioned by subfunctionalization and then one or both paralogs may acquire new functions by neofunctionalization, a mechanism called neosubfunctionalization (He & Zhang, 2005). An increase in the dosage of a gene product by the addition of a second identical copy of the ancestral gene can also contribute to the retention of paralogous pairs, without the need for the gain or loss of functions (Kondrashov & Koonin, 2004; Kondrashov et al, 2002).

Divergence between paralogs does not necessarily imply a divergence in a specific function but can also involve a change in the regulation of that function. For instance, the regulatory control of a protein function can be modified at the transcriptional or at the posttranslational level. Divergence in the expression patterns of duplicated transcripts is well documented (Ferris & Whitt, 1979; Force et al, 1999; Gu et al, 2002; Ohno, 1970). For example, Gu et al. showed that a large fraction of ancient duplicated gene pairs in yeast show divergent gene expression patterns (Gu et al, 2002). A more recent study showed that nearly half of the genes that duplicated after a Whole Genome Duplication (WGD) event in a forest tree species have diverged in expression by a random degeneration process (Rodgers-Melnick et al, 2012). However, little is known about the divergence of regulation by posttranslational modifications (PTMs), which take place after transcription and translation and directly affect protein activities (Moses & Landry, 2010).

PTMs are covalent modifications of one or more amino acids that affect the activity of a protein, its localization in the cell, its turnover rate, and its interactions with other molecules (Mann & Jensen, 2003). Cells use a wide range of PTMs to regulate proteins in distinct ways. Although only 20 amino acids are encoded by the genetic code, more than 200 amino acid variants or their derivatives are found in proteins after PTMs (Seo & Lee, 2004). Phosphorylation, the addition of a phosphate moiety from an ATP donor to a serine (Ser), threonine (Thr) or tyrosine (Tyr) residue by a protein kinase, is by far the best-known PTM, as it is the most common and is involved in the regulation of key biological processes of fundamental and medical interest, such as signal transduction and cell-cycle regulation (Hunter, 2000). Phosphorylation of these amino acids modifies their biochemical properties in several ways. Of particular interest for this study is the fact that the addition of a phosphate group brings two new negative charges that allow the formation of a salt bridge or contribute to the local charge of the protein (Serber & Ferrell, 2007). Given that a phosphate group is relatively large, phosphorylation can also have steric effects. Such properties can notably induce conformational changes of the protein, modify its catalytic activity or block access to its catalytic site, which results in the activation or inhibition of the target protein by direct or allosteric effects (Serber & Ferrell, 2007).

Several of the effects of protein phosphorylation can be mimicked by the negatively charged amino acids aspartic acid (Asp) and glutamic acid (Glu). Indeed, the biochemical properties of these amino acids are close to those of phosphorylated Ser or Thr residues (Tarrant & Cole, 2009). Under particular conditions, Asp and Glu are constitutive functional equivalents of phosphosites in a phosphorylated state. Biochemists have exploited this functional resemblance by replacing Ser and Thr residues with Asp and Glu in proteins of interest in order to mimic their phosphorylated status, which is why Asp and Glu are called phosphomimetic amino acids (Tarrant & Cole, 2009). The same correspondence appears to have been used by nature to evolve new phosphosites. A striking example comes from the evolution of the Activation Induced cytidine Deaminase (AID) across vertebrates, an enzyme involved in the generation of antibody diversity. The interaction of this enzyme with the Replication Protein A (RPA) promotes AID access to transcribed double-stranded DNA during immunoglobulin class switch recombination. This interaction requires a negative charge on AID, which is provided by an Asp in bony fish. In these organisms, the enzyme is constitutively capable of interacting with RPA. In amphibians and mammals, the function of the Asp residue is carried out by a phosphorylatable Ser (pSer), which allows protein kinases to regulate the interaction in a condition-specific fashion (Basu et al, 2008). It was recently suggested that this type of evolutionary transition might be common. Globally, it was shown that pSer tend to evolve from or to phosphomimetic amino acids (Asp and Glu) when gained and lost, respectively, throughout the evolution of eukaryotes (Kurmangaliyev et al, 2011; Pearlman et al, 2011).

Protein phosphoregulation has been suggested to play a role in the evolutionary fate of paralogous proteins. Most studies done so far focused on the paralogous genes of the budding yeast Saccharomyces cerevisiae because its phosphoproteome has been intensely studied (Albuquerque et al, 2008; Beltrao et al, 2009; Gnad et al, 2009). Using the yeast paralogs that derive from the WGD event, Amoutzias et al. showed that the number of phosphosites on a phosphoprotein is an important determinant for the retention of its duplicated descendants (Amoutzias et al, 2010). In a subsequent study, Freschi et al. examined the gains and losses of phosphosites in paralogous phosphoproteins and found that the great majority of phosphosites are present in one paralog and not in the other. This divergence was shown to be principally driven by losses rather than gains of phosphosites on one paralog (Freschi et al, 2011). Finally, Kaganovich and Snyder found that phosphosites tend to diverge more asymmetrically than non-phosphorylated amino acids, thus playing an important role in the divergence and retention of paralogous genes (Kaganovich & Snyder, 2012). These observations raise the question of where phosphosites come from and where they go after a gene duplication. According to the observations on phosphomimetic amino acids described above, gains and losses of phosphosites could represent two distinct types of divergence. On the one hand, the gain or the loss of phosphosites from or to a non-phosphomimetic residue would represent a divergence in the function of the protein. On the other hand, a gain or a loss could occur from or to phosphomimetic residues, leading to a modification of the control of the charged residue by the cell rather than a modification of function per se. Here we test whether this second scenario could have contributed to the divergence of paralogous proteins using the yeast phosphoproteome as a model.

3.4 - Methods

Dataset

All analyses were performed using the dataset we compiled in a previous study (Freschi et al, 2011), which is available at http://www.bio.ulaval.ca/landrylab/download/. This dataset contains 20,342 phosphosites on 2,688 proteins from eight large-scale studies (Albuquerque et al, 2008; Beltrao et al, 2009; Bodenmiller et al, 2007; Chi et al, 2007; Gnad et al, 2009; Gruhler et al, 2005; Li et al, 2007; Reinders et al, 2007). It also provides the alignments of all S. cerevisiae WGD paralogous genes with their ancestral sequence and with the orthologs of L. kluyveri and Z. rouxii. The alignments were performed using MUSCLE (Edgar, 2004) while the ancestral sequence was inferred using the Codeml method implemented in PAML (Yang, 2007). We chose to analyze only two species that diverged before the Whole Genome Duplication event for the following reasons. The majority of phosphorylation sites are located in disordered regions (Landry et al, 2009) and these regions are fast evolving. Alignment of sequences from distantly related species leads to spurious alignments or to alignments that may contain several indels. Indels decrease the number of phosphorylation sites available for the analysis, as ancestral sequences cannot be computed at these positions. Further, in our previous study (Freschi et al, 2011), we performed the analyses including an additional species that diverged prior to the whole-genome duplication and found that this did not significantly affect our results. Finally, this dataset also provides information about the localization of each residue in ordered or disordered regions of the protein, according to predictions made with DISOPRED (Ward et al, 2004b).

Approaches to study gains and losses of phosphosites

We applied two approaches to study gains and losses coming from, or going to, negatively charged amino acids. In the first approach, we used the ancestral sequence as a reference to assess the presence of a gain or a loss at a specific position. For the gains, we compared the proportion of phosphorylated residues (pSer and pThr) coming from a phosphomimetic amino acid (Asp or Glu) in the ancestral sequence to the proportion of non-phosphorylated control residues (cSer and cThr) coming from Asp or Glu. For the losses, we compared the proportion of pSer and pThr going to Asp or Glu to the proportion of cSer and cThr going to Asp or Glu, respectively. For the losses, we required the ancestral sequence to have a phosphorylatable residue and one of the two paralogs to be phosphorylated at the homologous position. Comparisons of proportions were performed using Fisher’s exact tests as implemented in R. In our second approach, we used a parsimony method to calculate the same proportions, this time using the sequences of L. kluyveri and Z. rouxii as references. In the case of a gain of phosphosites, we required the presence of the same negatively charged residue (Asp or Glu) in the reference species as well as in one of the two paralogs and a phosphorylatable residue (Ser or Thr) in the other paralog. In the case of losses of phosphosites, we required the presence of the same phosphorylatable residue (Ser or Thr) in the reference species as well as in one of the two paralogs and a negatively charged residue (Asp or Glu) in the other paralog. All proportions were calculated by dividing the number of sites coming from or going to an Asp or a Glu by the number of sites that come from or go to any of the 17 non-phosphorylatable amino acids following the same criteria (Figure 3.1).
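The parsimony criteria described above can be illustrated with a minimal Python sketch. It is not the code used in the study: the function names, the one-letter residue encoding and the tuple layout are assumptions made for illustration, and the sketch does not distinguish phosphorylated (p) from control (c) sites, which would require the phosphosite annotations of the dataset.

```python
# Illustrative sketch only; all names here are hypothetical.
PHOSPHORYLATABLE = {"S", "T"}   # Ser and Thr (Tyr is not considered here)
PHOSPHOMIMETIC = {"D", "E"}     # Asp and Glu
# The 17 amino acids other than Ser, Thr and Tyr
NON_PHOSPHORYLATABLE = set("ACDEFGHIKLMNPQRVW")


def classify_site(ref1, ref2, par_a, par_b):
    """Classify one alignment column by parsimony.

    ref1, ref2: residues of the two pre-WGD reference species.
    par_a, par_b: residues of the two paralogs.
    Returns (event, ser_or_thr, other_residue) or None if the column
    does not meet the criteria.
    """
    for kept, changed in ((par_a, par_b), (par_b, par_a)):
        # Both references and one paralog must share the same residue.
        if not (ref1 == ref2 == kept):
            continue
        if kept in NON_PHOSPHORYLATABLE and changed in PHOSPHORYLATABLE:
            return ("gain", changed, kept)   # Ser/Thr gained from `kept`
        if kept in PHOSPHORYLATABLE and changed in NON_PHOSPHORYLATABLE:
            return ("loss", kept, changed)   # Ser/Thr lost to `changed`
    return None


def phosphomimetic_fraction(events):
    """Fraction of events whose non-Ser/Thr residue is Asp or Glu."""
    events = list(events)
    if not events:
        return None
    hits = sum(1 for _, _, other in events if other in PHOSPHOMIMETIC)
    return hits / len(events)
```

In this sketch, a column such as D, D, D, S (references and paralogs) counts as a gain from Asp, whereas a column in which the two reference species disagree is discarded.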

3.5 – Results

The phosphoproteome of S. cerevisiae is the best described among eukaryotes and has been mapped by mass spectrometry, leading to the identification of high-confidence phosphosites (Albuquerque et al, 2008; Beltrao et al, 2009; Gnad et al, 2009). We assembled a data set (Freschi et al, 2011) that consists of 2,726 phosphosites (Ser, 82%; Thr, 16%; Tyr, 2%) that belong to one or the other member of the 352 pairs of yeast WGD paralogs for which at least one of the two proteins is a phosphoprotein. We inferred the ancestral sequence for each pair of paralogs using alignments with orthologous sequences from Lachancea kluyveri and Zygosaccharomyces rouxii, two species that diverged from S. cerevisiae before the WGD event. For each pair, we aligned all five sequences, we mapped the phosphosites on the sequences of the paralogs and analysed phosphosites that diverged, i.e. cases where a phosphorylatable residue was present in only one paralog.

Under a scenario where gains of phosphosites would result from selection for transitions from phosphomimetic amino acids to phosphorylated residues, we would expect phosphorylated Ser or Thr (pSer and pThr, respectively) to evolve more often from Asp or Glu than non-phosphorylated ones (cSer and cThr, respectively). Similarly, under a scenario where losses of phosphosites would result from transitions from phosphorylated residues to phosphomimetic amino acids, we would expect pSer and pThr to evolve more often to Asp and Glu than equivalent cSer and cThr. We tested these two hypotheses as described in Figure 3.1.

In the first case, we compared the proportion of pSer and pThr that were gained from Asp and Glu with that of cSer and cThr, i.e. all serines and threonines from the same set of proteins that were gained from Asp and Glu but that are not known to be phosphorylated. In the second case, we compared the ratio of sites that were lost and replaced by phosphomimetic residues in only one paralog with the ratios derived from cSer and cThr. We performed the analysis using paralogous ancestral sequences inferred with a likelihood method and also using a parsimony approach, whereby the ancestral state of phosphosites was inferred based on the conservation of the site in one of the two paralogs and its two orthologs (Figure 3.1A). Global results are presented in Figure 3.2 and detailed analyses are presented in Figure 3.3.

Figure 3.1. Algorithm used to calculate and compare the proportions of transitions between phosphorylated and phosphomimetic residues relative to control sites.

(A) Phosphosite (pS, pT) gains from phosphomimetic amino acids were identified as cases where only one of the paralogs has a phosphosite and the ancestral sequence has a phosphomimetic residue at the same position. Control sites (cS, cT) were identified in the same way but considering Ser and Thr that are not known to be phosphorylated. The ancestral sequence was inferred using likelihood or parsimony approaches. Phosphosite losses to phosphomimetic amino acids were identified as cases where one paralog has a phosphosite at a position that is occupied by a phosphomimetic amino acid in the other paralog and by a phosphorylatable amino acid in the ancestral sequence. (B) The proportion of pS or pT that evolved from or to D or E was compared to the proportion of cS or cT that evolved from or to D or E. X represents any amino acid with the exception of Ser, Thr and Tyr.

Figure 3.2. Phosphosites that are differentially lost in paralogous phosphoproteins evolve toward negatively charged residues.

Each bar represents the percentage of sites (pSer and pThr, cSer and cThr) that evolved from or to Asp or Glu. Numbers above the bars represent the total number of pSer, cSer, pThr or cThr sites that were gained or lost. Numbers above the arrows indicate p-values of the Fisher’s exact tests, bold ones being below 0.05.

Figure 3.3. Detailed analysis of the patterns of evolution of pSer and pThr sites.

Each bar represents the percentage of sites (pSer, cSer, pThr or cThr) that evolved from or to Asp or Glu. Numbers above the bars represent the total number of pSer, cSer, pThr or cThr sites that were gained or lost. Numbers above the arrows indicate p-values of the Fisher’s exact tests, bold ones being below 0.05. The top panel shows results obtained by ancestral sequence reconstruction using a likelihood approach and the bottom panel using parsimony.

A global analysis of pSer, pThr, Asp and Glu shows that phosphosites tend to be lost to Asp and Glu more frequently than cSer and cThr, and this holds true for both likelihood (16.6% vs 12.1%, respectively, p = 0.002) and parsimony (17.1% vs 9.6%, respectively, p = 0.006) reconstruction methods (Figure 3.2). However, although there is a tendency towards gains of phosphosites from Asp and Glu, the observed differences are not significant (Figure 3.2). When studied separately, phosphosites in ordered and disordered regions show the same global tendency to go toward phosphomimetic amino acids (Likelihood: 17.5% vs 10.0% in ordered regions, p = 0.058; 16.5% vs 13.7% in disordered regions, p = 0.086. Parsimony: 20.0% vs 8.1% in ordered regions, p = 0.076; 16.7% vs 11.7% in disordered regions, p = 0.110). This suggests that the effect might be more important in ordered regions of proteins, as would be expected if these residues were playing structural roles. Further, we found that phosphosites are not preferentially gained from phosphomimetic amino acids in disordered regions, while there is a non-significant tendency for this type of transition in ordered regions (Likelihood: 16.0% vs 15.7% in disordered regions, p = 0.943; 18.8% vs 13.7% in ordered regions, p = 0.294. Parsimony: 14.1% vs 14.2% in disordered regions, p = 1.000; 11.8% vs 10.2% in ordered regions, p = 0.691). Because the distinction between order and disorder reduces the number of sites in each category and does not provide opposite results, we considered both regions simultaneously in the following analyses.
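Each comparison above is a test on a 2x2 table: phosphorylated versus control sites, and sites that evolved to (or from) Asp or Glu versus sites that evolved to (or from) another non-phosphorylatable residue. The tests in this study were run in R. The sketch below is an independent, standard-library Python implementation of the usual two-sided Fisher's exact test, given only to show how such a p-value is obtained. The function name is an assumption, and the counts in the usage example are hypothetical, since only percentages are reported in the text.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    The p-value is the sum of the hypergeometric probabilities of all
    tables with the same margins that are no more probable than the
    observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Probability of observing x in the top-left cell given the margins
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    tolerance = 1 + 1e-7  # guards against floating-point ties
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * tolerance
    )
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical counts, NOT taken from the study:
    # 20 of 120 lost phosphosites and 60 of 500 lost control sites
    # were replaced by Asp or Glu.
    print(fisher_exact_two_sided(20, 100, 60, 440))
```

Rows hold the two site classes (phosphorylated, control) and columns hold the two outcomes (Asp or Glu, other residue), so the test asks whether the outcome proportions differ between the classes.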

We also examined which class of substitution could be contributing to this overall result (Figure 3.3). We first found that pSer and pThr that were gained after gene duplication follow trends that are in the expected direction although some of the comparisons are not statistically significant and other results are in the opposite direction (Figure 3.3). However, this detailed analysis showed that pSer are significantly more likely to evolve to Glu than cSer (11.6% vs 5.3%, p = 0.008) and pThr evolve significantly more frequently to Asp than cThr (9.8% vs 4.3% respectively, p = 0.013).

Protein phosphorylation is known to have a key role in regulating protein activities (Cohen, 2000). Evolutionary events such as gains and losses of phosphosites can lead to changes in protein regulation, thus rewiring the protein regulatory network of the cell (Freschi et al, 2011). In the literature, there is evidence for gains of new phosphosites coming from negatively charged residues among orthologs (Basu et al, 2008; Pearlman et al, 2011) as well as cases of losses of phosphosites to these amino acids (Kurmangaliyev et al, 2011). The biochemical properties of Glu and Asp mimic those of pSer and pThr with the exception that their charge cannot be regulated (Tarrant & Cole, 2009). These observations led us to hypothesize that coding sequence divergence of paralogous genes by neo- and subfunctionalization does not strictly involve the appearance or the partitioning of protein function. Paralogous genes could also diverge in how these functions are regulated. Divergence in regulatory control is well known at the transcriptional level (Gu et al, 2005; Rodgers-Melnick et al, 2012), but has not been specifically addressed at the posttranslational level. We tested this hypothesis on the complete set of WGD phosphoproteins of the budding yeast S. cerevisiae.

Using two different methods to infer the ancestral state of phosphorylated and non-phosphorylated Ser and Thr, we found that pSer and pThr globally have a tendency to evolve from negatively charged amino acids in paralogous phosphoproteins compared to their non-phosphorylated counterparts. The tendencies observed are in agreement with our hypothesis and with the observations made by Pearlman et al. across eukaryotes (Pearlman et al, 2011). However, the observed differences are not significant, which could be explained by a few non-exclusive scenarios. First, we are looking at a narrow evolutionary window (100 My), which contrasts with the analysis conducted by Pearlman et al., who used aligned sequences from organisms spanning the entire tree of life (Pearlman et al, 2011). Further, the mechanism proposed may apply primarily to a few sites and in ordered regions of proteins. Only a few phosphosites in these regions could be analysed here since the majority of them are found in disordered regions (Landry et al, 2009), which reduces the statistical power of our analysis. Our results regarding losses of phosphosites are in line with this hypothesis. Finally, a significant fraction of phosphosites are thought to be non-functional (Landry et al, 2009). Because these non-functional sites are not under selective pressure, they may dilute the signal coming from functional sites. Nevertheless, from our results, we cannot rule out the possibility that gains of phosphosites are not more likely to derive from phosphomimetic residues after gene duplications. A larger sample size, the study of a time window of a different length and a better knowledge of the functional importance of phosphosites may be needed to provide a final answer.

Following the same approach, we examined whether phosphorylated residues, when lost, are more likely to be replaced by Asp and Glu than when non-phosphorylated equivalent residues are lost. We found that this is the case globally and also when considering individual cases for both pSer and pThr; pSer are more likely to be replaced by Glu residues while pThr are more likely to be replaced by Asp residues. A similar trend was detectable for the transitions from pThr to Glu. These results are in agreement with those of Kurmangaliyev et al. (Kurmangaliyev et al, 2011), who also showed that pSer are more likely to evolve to phosphomimetic amino acids than cSer in the divergence of orthologs between species. Our results show that the evolutionary trajectories of pSer and pThr provide a mechanism for paralogous protein divergence. Our analyses support the hypothesis that divergence between paralogs can be generated by a loss of the posttranslational regulatory control on a function rather than by the complete loss of the function itself. Indeed, the replacement of a phosphosite by an Asp or a Glu residue may lock one paralog into a single constitutive functional state whereas the other one remains regulatable by protein kinases and phosphatases.

3.6 - Conclusion

Our results raise the question of how these transitions are made possible during evolution. The genetic code is organized in such a way that transitions between phosphorylatable and phosphomimetic amino acids involve a transition state with an amino acid that is not negatively charged, except for transitions between two Asp and two Ser codons that involve a Tyr residue (Figure 3.4).

Figure 3.4. Transitions between phosphorylatable and phosphomimetic amino acids need to go through a non-negatively charged intermediate.

However, Tyr is only rarely phosphorylated in yeast and Tyr residues are not phosphorylated by the serine/threonine kinases (Ubersax & Ferrell, 2007), which suggests that this path would not be favoured. A non-negatively charged intermediate could lead to a complete loss of the function that was performed by the negative charge and could thus be deleterious (Figure 3.5A).
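The claim about the organization of the genetic code can be checked by enumerating codons. The following Python sketch assumes the standard genetic code and uses hypothetical helper names. It lists the residues encoded by the intermediate codon on every two-mutation path between a Ser or Thr codon and an Asp or Glu codon.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(residues):
    """All codons encoding any of the given one-letter residues."""
    return [codon for codon, aa in CODON_TABLE.items() if aa in residues]


def hamming(codon1, codon2):
    """Number of nucleotide differences between two codons."""
    return sum(x != y for x, y in zip(codon1, codon2))


def single_step_neighbours(codon):
    """Codons reachable from `codon` by one point mutation."""
    for pos, base in product(range(3), BASES):
        if base != codon[pos]:
            yield codon[:pos] + base + codon[pos + 1:]


def two_step_intermediates(sources="ST", targets="DE"):
    """Residues (or '*' for stop) encoded by the intermediate codon on
    all two-mutation paths from a source codon to a target codon."""
    found = set()
    for src, dst in product(codons_for(sources), codons_for(targets)):
        if hamming(src, dst) != 2:
            continue
        for mid in single_step_neighbours(src):
            if hamming(mid, dst) == 1:
                found.add(CODON_TABLE[mid])
    return found
```

Under these assumptions, no Ser or Thr codon is a single mutation away from an Asp or Glu codon, and the intermediates on the shortest paths are Ala, Gly, Asn, Lys, Tyr or a stop codon. Tyr is the only phosphorylatable intermediate, and it arises only on the paths between the Ser codons TCT and TCC and the Asp codons GAT and GAC.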

Figure 3.5. A duplication event could provide the conditions for the intermediate non-functional site to be neutral, which would allow a transition without affecting the fitness of the organism.

(A) Without a duplication event, the loss of a negative charge could have deleterious effects if the charge is important for the function of the protein. (B) The redundant paralogous gene copy could serve as a backup and prevent deleterious effects created by the loss of the charge. The backup copy could then be retained or lost. In the latter case, the system would be different from its ancestor.

Here we propose that the relaxed constraints that follow a gene duplication event could provide the means to reach this intermediate state and to go beyond it (Figure 3.5B). After gene duplication, when one of the duplicated copies is lost, the system is assumed to go back to its ancestral state, a process called non-functionalization (Lynch & Conery, 2000). However, following our model, the duplicated copy could serve as a backup for a transition period, which would allow the other copy to reach a state that would have been unreachable otherwise (Gordon, 1994; Hansen et al, 2000; Scannell & Wolfe, 2008). After the loss of the backup copy, the system would remain different from its ancestral state since the phosphorylation profile, and thus the phosphoregulation of this protein, has changed. The term non-functionalization may thus not be suitable for such cases. In the case of a WGD event, where the vast majority of the duplicated genes are eventually lost and are thought to return to their ancestral state, these two-step transitions could potentially lead to a great burst in the evolution of phosphoregulation. Further studies at different time points following gene duplication would be needed to determine how important this mechanism could be for the evolution of phosphosites.

3.7 - Acknowledgements

This work was supported by a Canadian Institutes of Health Research (CIHR) grant GMX-191597 and a Natural Sciences and Engineering Research Council of Canada discovery grant to C. R. Landry. C. R. Landry is a CIHR New Investigator. G. Diss and L. Freschi were supported by fellowships from the Quebec Research Network on Protein Function, Structure and Engineering (PROTEO). We thank the members of the Landry laboratory, two anonymous referees and N. Aubin-Horth for comments on the manuscript.

Chapter 4 - Functional Divergence and Evolutionary Turnover in Mammalian Phosphoproteomes

Published as: Freschi L., Osseni M., Landry C. R. (2014) Functional Divergence and Evolutionary Turnover in Mammalian Phosphoproteomes, PLoS Genetics 10 (1), e1004062.

4.1 – Résumé

Here, we studied the evolution of phosphoregulation in mammals by comparing the human and mouse phosphoproteomes. We found that 84% of the positions that are phosphorylated in one species or the other are conserved at the residue level. Twenty percent of these conserved sites are phosphorylated in both species. This proportion is 2.5 times greater than expected by chance, which suggests that purifying selection tends to preserve phosphoregulation. The other 80% of the sites that are conserved at the residue level are differentially phosphorylated in human and mouse. Our results suggest that at least 5% of these sites have the potential to be true cases of divergence between the phosphorylation networks of the two species, even though the residue is conserved in the orthologous proteins of both species. We also showed that the evolutionary turnover of phosphorylation sites located at adjacent positions in human or mouse leads to an overestimation of the divergence in phosphoregulation between these two species. Our study proposes advanced analyses of phosphoproteomes and a framework for studying their contribution to phenotypic evolution.

4.2 - Abstract

Here, we studied the evolution of mammalian phosphoregulation by comparing the human and mouse phosphoproteomes. We found that 84% of the positions that are phosphorylated in one species or the other are conserved at the residue level. Twenty percent of these conserved sites are phosphorylated in both species. This proportion is 2.5 times more than expected by chance alone, suggesting that purifying selection is preserving phosphoregulation. The other 80% of the sites that are conserved at the residue level are differentially phosphorylated between species. We showed that at least 5% of them are likely to reflect true cases of phosphoregulatory divergence between mouse and human. Moreover, we showed that evolutionary turnover of phosphosites at adjacent positions in human or mouse leads to an overestimation of the divergence in phosphoregulation between these two species. Our study provides a framework for studying the contribution of phosphoregulatory divergence to phenotypic evolution.

4.3 – Introduction

Most proteins undergo chemical modifications after their synthesis (post-translational modifications, PTMs). These modifications allow a fine-tuning of protein functions and represent a mechanism to expand the coding capacity of genes (Nussinov et al, 2012). Over the past decade, methods based on mass spectrometry have accelerated the discovery of PTMs (Beausoleil et al, 2004; Choudhary et al, 2009; Huttlin et al, 2010; Kim et al, 2011; Olsen et al, 2006; Zielinska et al, 2010). Each experiment can now detect thousands of modified residues, making it possible to probe the functional state of entire proteomes. The PTM that has been studied the most is protein phosphorylation: the addition of a phosphate group to specific amino acids (serine (S), threonine (T) and tyrosine (Y) in eukaryotes). Phosphorylation has been shown to affect protein functions, interactions, stability and localization (Khmelinskii et al, 2009; Madeo et al, 1998; Sprang et al, 1988; Vazquez et al, 2000). It is thus of fundamental importance to understand how protein phosphorylation evolves within and between species because changes in phosphorylation profiles may cause changes in protein function and regulation and in organismal phenotypes, including disease development (e.g. Herbig et al, 2000).

There have been several recent reports on the evolution of phosphoproteomes. For instance, Kim and Hahn (Kim & Hahn, 2011) identified phosphorylation sites that emerged after the split between humans and chimpanzees and found that these sites are located in proteins involved in crucial biological processes such as cell division and chromatin remodelling. Other studies have looked at the evolution of a subset of phosphoproteomes on a broader evolutionary scale (Boulais et al, 2010; Malik et al, 2008). For example, Boulais and collaborators (Boulais et al, 2010) performed a phosphoproteomics analysis of mouse phagosomal proteins and then compared these proteins to their orthologs from 10 model organisms, from Drosophila to mouse. They observed that the phagosomal phosphoproteome was extensively rewired during evolution, but that some phosphorylation sites have been maintained for more than a billion years, suggesting their importance for phagosomal functions. Finally, other studies looked at the conservation and divergence of entire phosphoproteomes over a broad evolutionary scale (Boekhorst et al, 2008; Gnad et al, 2007; Landry et al, 2009; Tan et al, 2009) (reviewed in (Levy et al, 2012)) in order to understand the evolutionary mechanisms and the constraints acting on phosphorylation sites. These studies found that phosphorylated residues tend to be on average more conserved than their non-phosphorylated counterparts (Gnad et al, 2007; Landry et al, 2009) and that this is particularly true for those that were experimentally shown to play functional roles (Landry et al, 2009).

Most studies of phosphoproteome evolution so far have looked at the evolutionary conservation of phosphorylation sites in several species without knowing whether these sites are actually phosphorylated in species other than the reference. In other words, if a phosphorylation site in one species corresponded to a phosphorylatable amino acid in another species, both residues were considered as conserved phosphorylation events. This assumption was necessary because phosphorylation data were lacking for more than one species. However, we can hypothesize that residue conservation does not always imply phosphoregulatory conservation. Indeed, sites could be conserved at the residue level but differ in their phosphoregulation due to changes elsewhere in the protein, for instance in the motifs recognized by kinases and phosphatases (Ubersax & Ferrell, 2007), or upstream (in trans) in the signalling cascade. This aspect has not been addressed by previous studies, except in a few cases (Beltrao et al, 2009; Boulais et al, 2010). However, identifying such sites is of great interest, since sites that differ in their phosphoregulation despite being conserved at the residue level could lead to changes in the architecture of phosphorylation networks and, ultimately, contribute to phenotypic evolution. We examine this issue here.

Another aspect of phosphoproteomes that can be studied using evolutionary analysis is how phosphorylation sites, alone or in combination, may affect the function of a protein (Nussinov et al, 2012). Many models of phosphorylation site function stress the importance of conformational changes induced by protein phosphorylation (Barr & Bogoyevitch, 2001; Nussinov et al, 2012; Skou, 1965). In other models, phosphorylation sites regulate protein functions without the need for conformational changes but rather through changes in the local charge of the protein (Serber & Ferrell Jr, 2007), i.e. simply through bulk electrostatics. A corollary of this last model is that the protein phosphorylation code is redundant, i.e. that phosphorylation sites can change their position over time and still maintain their biological function, without affecting organismal phenotypes, as long as the number of sites in a given protein region is preserved. By looking at the patterns of evolution of phosphorylation sites, one could find traces of this redundancy in the form of rapid evolutionary turnover of phosphorylation sites (site gains and losses). This evolutionary turnover has been invoked to interpret the globally rapid evolution of phosphorylation sites in different species (Ba & Moses, 2010; Freschi et al, 2011; Gnad et al, 2007; Landry et al, 2009; Macek et al, 2008; Malik et al, 2008). However, evidence for positional redundancy of phosphorylation sites is relatively limited. Two independent pieces of evidence come from the cell cycle phosphorylation networks. Moses and collaborators (Moses et al, 2007) studied the evolution of cyclin-dependent kinase (CDK) consensus phosphorylation sites of the yeast pre-replicative complex (Bell & Dutta, 2002). They found that although orthologous proteins contained clusters of CDK consensus sites, the position and the number of phosphorylatable sites were not conserved, suggesting that phosphorylation sites tend to shift their positions during evolution. In a more recent investigation, Holt and collaborators (Holt et al, 2009) compared the positions of 547 phosphorylation sites identified in vivo on 308 Cdk1 substrates in budding yeast with their orthologous sites in other fungi. They found that the precise positioning is conserved only in very closely related species. However, in both cases the phosphorylation status of the sites in other species was not investigated, so it is not clear whether the phosphorylation sites were absent from the orthologous proteins or whether they actually shifted to another position during evolution through gains and losses. The extent to which phosphorylation site positional redundancy plays a role in overall phosphoproteome turnover therefore awaits comprehensive phosphorylation data from closely related species, which we have assembled here.

We performed an integrated analysis of phosphorylation site evolution between the human and mouse proteomes using a large dataset of phosphorylation sites (Beltrao et al, 2012; Dinkel et al, 2011; Gnad et al, 2011; Hornbeck et al, 2012; Huttlin et al, 2010; Keshava Prasad et al, 2009; Minguez et al, 2012). These two phosphoproteomes are the ones for which we have the greatest amount of phosphoproteomics data between closely related species. We estimated the extent of divergence and conservation between the two phosphoproteomes and we investigated whether phosphorylation site evolutionary turnover could contribute to this divergence.

4.4 – Methods

Phosphoproteomics and sequence data

An extensive dataset of human and mouse phosphorylation sites was built by combining data from 7 different databases and experimental studies (Beltrao et al, 2012; Dinkel et al, 2011; Gnad et al, 2011; Hornbeck et al, 2012; Huttlin et al, 2010; Keshava Prasad et al, 2009; Minguez et al, 2012). All protein sequences and orthology relationships were retrieved from ENSEMBL (version 69). In this study, only protein sequences for which we could find orthology relationships between a human protein and at least a mouse, dog and opossum protein were considered. This step allowed us to study the evolutionary history of phosphorylation sites. For human and mouse, orthology relationships were determined for the longest isoform of each protein. Each group of orthologous sequences was aligned using MUSCLE (Edgar, 2004). Disordered and ordered regions of proteins were predicted using DISOPRED (Ward et al, 2004a). In order to map phosphorylation sites to our sequences, the following procedure was applied. The sites that were already mapped onto proteins associated with ENSEMBL IDs in the original datasets were directly mapped to our sequences. In all other cases, phosphopeptides were mapped onto proteins using BLAT (Kent, 2002). All peptides that mapped to more than one protein were removed at this step. Mapped phosphorylation sites and information about protein disorder are available in Annex 2, Dataset S2.
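The filtering of ambiguously mapped peptides described above can be sketched as follows. This is a minimal illustration only: it replaces BLAT with exact substring matching, and the function name, the data layout and the toy sequences are assumptions made for the example rather than the pipeline used in this study.

```python
def map_unique_peptides(peptides, proteins):
    """Map each phosphopeptide to the single protein that contains it.

    peptides: dict peptide_id -> (peptide sequence, 0-based offset of the
              phosphorylated residue within the peptide)
    proteins: dict protein_id -> protein sequence
    Returns dict peptide_id -> (protein_id, 1-based site position).
    Peptides matching zero or several proteins are discarded.
    """
    mapped = {}
    for pep_id, (pep_seq, offset) in peptides.items():
        # Exact matching stands in for BLAT here (assumption of this sketch).
        hits = [pid for pid, seq in proteins.items() if pep_seq in seq]
        if len(hits) != 1:
            continue  # unmapped or ambiguous peptide
        pid = hits[0]
        # If the peptide occurs several times in the protein, the first
        # occurrence is used; a real pipeline would need to handle this case.
        start = proteins[pid].find(pep_seq)
        mapped[pep_id] = (pid, start + offset + 1)
    return mapped


# Toy example (hypothetical sequences).
toy_proteins = {"P1": "MASPTKLSRR", "P2": "MKLSRRAAAS"}
toy_peptides = {"a": ("ASPTK", 1), "b": ("KLSRR", 2)}
toy_mapping = map_unique_peptides(toy_peptides, toy_proteins)
```

In this toy example, peptide "a" maps only to P1 and is kept, whereas peptide "b" is found in both proteins and is removed.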

Calculating random expectations for phosphorylation sites

In order to calculate the random expectation for the number of sites belonging to each one of the different categories (StC, StD and SiD), the statuses (0: non-phosphorylated, 1: phosphorylated) of phosphorylatable amino acids were shuffled in each protein by preserving the overall proportion of sites for each residue (S, T or Y) and their localization in disordered/ordered regions. The null distributions were estimated by iterating this procedure 100 times, calculating each time the number of sites belonging to each category. We calculated random expectations by shuffling the mouse sites only. We also performed the calculations by independently shuffling both human and mouse sites and found similar results.
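The shuffling procedure described above can be illustrated with a short sketch. It assumes that "preserving the proportion of sites for each residue and region" means permuting statuses only among residues of the same type (S, T or Y) located in the same structural class; the function names and the data layout are hypothetical.

```python
import random
from collections import defaultdict


def shuffle_statuses(sites, rng):
    """Shuffle phosphorylation statuses within one protein.

    sites: list of (residue, region, status) tuples, with residue in 'STY',
           region in {'disordered', 'ordered'} and status 0 or 1.
    Statuses are permuted only among sites that share the same residue and
    region, so the number of phosphorylated sites per stratum is preserved.
    """
    strata = defaultdict(list)
    for idx, (residue, region, _status) in enumerate(sites):
        strata[(residue, region)].append(idx)
    shuffled = list(sites)
    for indices in strata.values():
        statuses = [sites[i][2] for i in indices]
        rng.shuffle(statuses)
        for i, status in zip(indices, statuses):
            residue, region, _old = sites[i]
            shuffled[i] = (residue, region, status)
    return shuffled


def null_distribution(sites, statistic, n_iter=100, seed=0):
    """Evaluate `statistic` on n_iter independently shuffled copies of `sites`."""
    rng = random.Random(seed)
    return [statistic(shuffle_statuses(sites, rng)) for _ in range(n_iter)]
```

In practice the statistic would be the number of StC, StD or SiD sites obtained after comparing the shuffled mouse statuses to the observed human ones.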

Protein abundance data and classes of abundance

Data on protein abundance were taken from PaxDb (Wang et al, 2012) (H. sapiens whole organism integrated dataset). In the analysis presented in Figure 4.1D, proteins were ordered by their abundance and divided into four equal bins.

Housekeeping proteins, tissue-specific proteins and sites with known function

Data on housekeeping genes were retrieved from Eisenberg and Levanon (Eisenberg & Levanon, 2003), who identified 575 human genes that are expressed in 47 different tissues and cell lines based on microarray data. Data on tissue-specific genes derive from an independent dataset and were retrieved from the TiGER database (Liu et al, 2008). About 5.3 million human ESTs were mapped to UniGene clusters and the expression pattern of all UniGenes in 30 human tissues was determined using the NCBI EST database. In total, 7,261 tissue-specific genes were identified. Manually curated data on functional phosphorylation sites (n = 156) were retrieved from Landry et al. (Landry et al, 2009). These sites were derived from the manual curation of the primary literature.

NetPhorest and position weight matrices scores

NetPhorest (Miller et al, 2008) was downloaded from (http://netphorest.info) and was run locally using default options. In order to calculate position weight matrix scores, 29 position weight matrices whose scores are based on the same metric were obtained from Benjamin Turk (Bullock et al, 2009; Bullock et al, 2005; Bunkoczi et al, 2007; Davis et al, 2009; Filippakopoulos et al, 2008; Gwinn et al, 2008; Hutti et al, 2004; Kikani et al, 2010; Pike et al, 2008; Rennefahrt et al, 2007; Sheridan et al, 2008; Wong et al, 2012). These matrices were used to score all 10-mer amino acid sequences in the mouse and human proteomes that have a phosphorylatable amino acid at the sixth position. The score reflects the probability of each 10-mer being phosphorylated by a specific kinase.
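The scanning of 10-mers with a phosphorylatable residue at the sixth position can be sketched as follows. The matrix values and the additive scoring rule are placeholders chosen for illustration; they do not reproduce the metric of the matrices used in this study, and the function name is hypothetical.

```python
def score_10mers(sequence, pwm):
    """Score every 10-mer whose sixth residue is S, T or Y.

    pwm: list of 10 dicts mapping amino acid -> weight (hypothetical values);
         amino acids missing from a column contribute 0.
    Returns a list of (1-based site position, 10-mer, score).
    """
    results = []
    for i, residue in enumerate(sequence):
        if residue not in "STY":
            continue
        start, end = i - 5, i + 5  # five residues before, four after the site
        if start < 0 or end > len(sequence):
            continue  # site too close to a terminus for a full 10-mer
        window = sequence[start:end]
        score = sum(column.get(aa, 0.0) for column, aa in zip(pwm, window))
        results.append((i + 1, window, score))
    return results


# Toy matrix: rewards S at the sixth position and A at the fifth position.
toy_pwm = [{} for _ in range(10)]
toy_pwm[5] = {"S": 1.0}
toy_pwm[4] = {"A": 0.5}
toy_scores = score_10mers("AAAAASAAAA", toy_pwm)
```

Sites located fewer than five residues from the N-terminus or fewer than four residues from the C-terminus are skipped in this sketch, since no complete 10-mer can be built around them.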

Comparison of proportions, distributions and correlations

Proportions were compared with 2-sample tests for equality of proportions with continuity correction. Distributions were compared with non-parametric Wilcoxon Rank Sum tests. Correlations were calculated with the Spearman method. All these statistical analyses were performed as implemented in R.
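For readers who do not use R, the comparison of two proportions can be sketched in a few lines. The sketch below computes a chi-squared statistic with Yates' continuity correction on a 2 × 2 table, which is the calculation we understand to underlie the 2-sample test for equality of proportions; the function name is hypothetical and the example counts are invented.

```python
import math


def two_proportion_test(x1, n1, x2, n2):
    """Chi-squared test (1 degree of freedom) with Yates' continuity correction.

    x1 successes out of n1 trials are compared to x2 successes out of n2.
    Both outcomes must be observed at least once overall, otherwise an
    expected count is zero and the statistic is undefined.
    Returns (statistic, two-sided p-value).
    """
    table = [[x1, n1 - x1], [x2, n2 - x2]]
    row_sums = [n1, n2]
    col_sums = [x1 + x2, (n1 - x1) + (n2 - x2)]
    total = n1 + n2
    statistic = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_sums[r] * col_sums[c] / total
            # Yates' correction: shrink each deviation by 0.5, never below 0.
            deviation = max(abs(table[r][c] - expected) - 0.5, 0.0)
            statistic += deviation ** 2 / expected
    # Survival function of the chi-squared distribution with 1 df.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


stat, p = two_proportion_test(30, 100, 10, 100)
```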

Algorithm to identify evolutionary clustered phosphorylation site pairs

Site colocalization in orthologous proteins was estimated using a window of positions centered on each human phosphorylation site. The fraction of colocalized sites over the total number of sites was calculated for a range of window sizes. In order to determine which sites were closer in linear sequence space than expected by chance alone, the mouse phosphorylation sites were shuffled in each protein by preserving the overall proportion of sites for each residue (S, T or Y) and in disordered/ordered regions, and the fraction of colocalized sites was calculated for each window length. One thousand iterations were performed in order to generate the null model. We also masked all the positions at which a phosphorylatable amino acid was present in both human and mouse. Evolutionary clustered sites were defined as sites that were more likely to be colocalized than expected by chance alone (null model). The closest pair of phosphorylation sites present in these windows was then selected (see also Annex 2, Figure S4.1). The phosphorylatable amino acids serine (S) and threonine (T) differ in their biochemical properties from tyrosine (Y), the other phosphorylatable amino acid (Taylor et al, 1995). Therefore, S/T and Y sites were considered as belonging to separate classes and as unable to compensate for each other. Only the 1,529 pairs of orthologous proteins that had at least two phosphorylation sites that diverged (site divergence), in human and mouse respectively, were considered. Among these pairs, 563 had at least one SiD site that involves a phospho-serine or phospho-threonine in both humans and mouse. Only a single pair had a SiD site that involves a phospho-tyrosine in both humans and mouse.
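The colocalization measure at the core of this algorithm can be sketched as follows. The sketch assumes that site positions are expressed as alignment columns, that only SiD sites of the same class (S/T or Y) are passed in, and that a window of width w centered on a human site extends w // 2 columns on each side; the function names are hypothetical.

```python
def colocalized_fraction(human_sites, mouse_sites, window):
    """Fraction of human sites with at least one mouse site in a centered window.

    human_sites, mouse_sites: lists of alignment columns (integers).
    window: total window width; a mouse site is counted when it lies within
            window // 2 columns of the human site.
    """
    if not human_sites:
        return 0.0
    half = window // 2
    hits = sum(
        1 for h in human_sites if any(abs(h - m) <= half for m in mouse_sites)
    )
    return hits / len(human_sites)


def closest_pair(human_sites, mouse_sites, window):
    """Return the (human, mouse) pair with the smallest distance inside the window,
    or None when no pair of sites is colocalized."""
    half = window // 2
    candidates = [
        (abs(h - m), h, m)
        for h in human_sites
        for m in mouse_sites
        if abs(h - m) <= half
    ]
    if not candidates:
        return None
    _distance, h, m = min(candidates)
    return (h, m)
```

The observed fraction for each window size would then be compared to the fractions obtained after shuffling the mouse sites, as described above.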

Testing if evolutionary clustered sites tend to be phosphorylated by the same kinase or group of kinases

The kinase that was the most likely to phosphorylate each one of the evolutionary clustered sites was inferred using NetPhorest (Miller et al, 2008) and the proportion of evolutionary clustered site pairs phosphorylated by the same kinase was determined. This number was compared to a null distribution obtained by randomly shuffling (10,000 iterations) the kinase-phosphorylation site associations between different evolutionary clustered sites. Analogous analyses were performed for StC and StD sites. We then performed the same analysis using the three best kinases predicted by NetPhorest, as proposed by Tan et al. (Tan et al, 2009). We therefore considered two evolutionary clustered sites as being phosphorylated by the same group of kinases if they shared one or more kinases (kinase group) among the three best kinases predicted to be associated with each site according to NetPhorest. This number was compared to a null distribution obtained by randomly shuffling (100 iterations) the kinase-phosphorylation site associations between different evolutionary clustered sites. Analogous analyses were performed for StC and StD sites. Finally, we repeated all the analyses described above using position weight matrices from the literature (see section NetPhorest and position weight matrices scores for further details) instead of NetPhorest to infer the kinase that was the most likely to phosphorylate each one of the StD, StC and evolutionary clustered sites.
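A permutation test of this kind can be sketched as follows. The sketch assumes that each site pair is summarized by the best predicted kinase of its two members and that shuffling the associations amounts to permuting the kinase labels of the second members across pairs; the function names and the empirical p-value formula are illustrative choices, not a description of the exact implementation used here.

```python
import random


def shared_kinase_fraction(pairs):
    """Fraction of site pairs whose two members have the same predicted kinase.

    pairs: list of (kinase of first site, kinase of second site).
    """
    return sum(a == b for a, b in pairs) / len(pairs)


def shuffled_null(pairs, n_iter=1000, seed=0):
    """Null distribution obtained by permuting the second kinase across pairs."""
    rng = random.Random(seed)
    first = [a for a, _b in pairs]
    second = [b for _a, b in pairs]
    null = []
    for _ in range(n_iter):
        rng.shuffle(second)
        null.append(shared_kinase_fraction(list(zip(first, second))))
    return null


def empirical_p(observed, null):
    """One-sided empirical p-value with the usual +1 correction."""
    return (1 + sum(value >= observed for value in null)) / (1 + len(null))
```

For the analysis based on the three best kinases, the equality test would be replaced by a test for a non-empty intersection between the two sets of predicted kinases.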

4.5 – Results

4.5.1 - Conservation and divergence between human and mouse phosphoproteomes

We assembled a dataset of human (n = 106,877) and mouse (n = 54,400) phosphorylation sites by collecting data from 7 different databases and experimental studies (Beltrao et al, 2012; Dinkel et al, 2011; Gnad et al, 2011; Hornbeck et al, 2012; Huttlin et al, 2010; Keshava Prasad et al, 2009; Minguez et al, 2012) (Annex 2, Table S4.1). We successfully mapped 128,705 sites onto 11,150 human and mouse orthologous proteins: 86,065 in humans and 42,640 in mouse (Annex 2, Figure S4.2). As previously observed (Iakoucheva et al, 2004; Landry et al, 2009), phosphorylation sites are preferentially located in disordered regions of proteins (observed vs. expected proportions: 0.69 vs. 0.62, p-value = 2.2 × 10⁻¹⁶). Given this asymmetry in the localization of phosphorylation sites, we generated all the null models of our analyses by respecting the proportion of sites in these two structural categories. Our dataset allows us to compare the human and mouse phosphoproteomes using both sequence information and the phosphorylation status of each site. Accordingly, we classified orthologous sites into three classes following Freschi et al. (Freschi et al, 2011) (Figure 4.1A): i) Site-diverged (SiD): sites phosphorylated in one species and non-phosphorylatable in the other; ii) State-conserved (StC): sites phosphorylated in both species; iii) State-diverged (StD): sites that are conserved at the residue level but that have been reported to be phosphorylated in only one of the two species.
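The three classes can be expressed as a simple decision rule on each pair of aligned orthologous residues. The sketch below is an illustration under stated assumptions: any pair of S, T or Y residues is treated as conserved at the residue level, and alignment gaps are left unclassified; the function name and the gap handling are hypothetical choices.

```python
PHOSPHORYLATABLE = set("STY")


def classify_site(human_res, mouse_res, human_phos, mouse_phos):
    """Classify one aligned pair of orthologous residues.

    human_res, mouse_res: one-letter amino acids ('-' for an alignment gap).
    human_phos, mouse_phos: True if the residue is reported as phosphorylated.
    Returns 'StC', 'StD', 'SiD' or None when the pair is not classified.
    """
    if not (human_phos or mouse_phos):
        return None  # not phosphorylated in either species
    if human_res == "-" or mouse_res == "-":
        return None  # gap handling is an assumption of this sketch
    human_ok = human_res in PHOSPHORYLATABLE
    mouse_ok = mouse_res in PHOSPHORYLATABLE
    if human_ok and mouse_ok:
        return "StC" if (human_phos and mouse_phos) else "StD"
    if (human_phos and human_ok and not mouse_ok) or (
        mouse_phos and mouse_ok and not human_ok
    ):
        return "SiD"
    return None  # phosphorylation annotated on a non-S/T/Y residue
```

For example, a phosphorylated human serine aligned with a non-phosphorylated mouse threonine is classified as StD, whereas the same serine aligned with a mouse alanine is classified as SiD.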

In order to examine the extent of conservation of phosphorylation between human and mouse, we estimated the fraction of sites belonging to each of these three categories relative to the total number of sites that are phosphorylated in human, mouse or in both species. We first looked at phosphorylation site divergence. We found that 16,863 sites (16% of the sites that are phosphorylated in human or mouse or both species) are SiD (Figure 4.1B). These sites are about 1% less abundant than random expectations obtained by shuffling the phosphorylation statuses of S/T/Y residues (Figure 4.1B), suggesting that purifying selection is acting on phosphorylation sites to maintain their function, but to a limited extent, as previously observed with different approaches (e.g. (Landry et al, 2009)). These sites, if functional, are expected to reflect differences in phosphoregulation between human and mouse. However, a fraction of these SiD sites might be positionally redundant site pairs, such that the functional divergence may be overestimated (see below).

We examined other types of conservation and divergence. We first found that 20,146 phosphorylation sites (18% of the sites that are phosphorylated in human or mouse or both species, Figure 4.1B) are StC. This proportion is 2.5 times greater than what is expected by chance alone (Figure 4.1B). We observed this strong signal for conservation in both disordered and ordered regions (Annex 2, Figure S4.3). These results suggest an overall conservation of the phosphorylation profiles between the two species, most likely as a result of purifying selection acting to maintain the phosphoregulation of these sites. We performed a similar analysis on clusters of poly-S/T/Y (stretches of two or more consecutive S/T/Y residues) rather than single residues and found the same patterns of conservation and divergence (Annex 2, Figure S4.4).

Figure 4.1. Purifying selection is acting on mammalian phosphorylation sites and their phosphorylation status.

(A) Site-diverged (SiD) sites are orthologous residues where one is phosphorylated and the other is a non-phosphorylatable amino acid (any amino acid but S, T and Y). State-conserved (StC) sites are orthologous phosphorylatable residues (S, T, Y) that are both reported to be phosphorylated. Finally, state-diverged (StD) sites are orthologous phosphorylatable residues for which only one of the two is phosphorylated. Circles with the P symbol indicate residue phosphorylation. Colors indicate the different categories of sites. (B) Number of observed SiD, StC and StD sites and their respective expected distributions as estimated by randomizing mouse phosphorylation sites. (C) Three scenarios for StD sites: false positive and false negative identifications; rapidly evolving non-functional phosphorylation sites; divergence in phosphoregulation. (D) Relationship between state-conservation and protein abundance. The four classes of protein abundance have the same number of proteins. (E) Comparison of the proportion of StC and StD sites in housekeeping and tissue-specific proteins. (F) Comparison of the proportion of sites with known functions present in StC and StD sites.

Despite an overall signal of conservation of the phosphorylation status of proteins, the most represented category of sites in our dataset is StD sites (71,550 sites or 66% of the sites that are phosphorylated in human, mouse or both species). Three different non-exclusive scenarios could explain this large number of StD sites (Figure 4.1C). The first one is that state divergence results from an incomplete coverage of phosphoproteomic data, which means that the phosphoproteomes of the two species might have been undersampled, for instance sampled at different depths or in different conditions or tissues (e.g. (Huttlin et al, 2010)). The second scenario is that a large fraction of the StD sites identified might be non-functional phosphorylation sites. Non-functional phosphorylation sites evolve rapidly (Landry et al, 2009) and could therefore lead to the poor conservation of the phosphorylation status that we observed. The third scenario is that a fraction of StD phosphorylation sites is actually diverging in its regulation. Finally, state-divergence could also be inflated by false positive identifications in one species or the other.

We examined which scenario or scenarios were compatible with our data. According to the first scenario, StD may mostly result from false-negative phosphorylation sites in the data. This is certainly the case for an important part of the data, as our dataset contains twice as much phosphorylation data for humans as for mouse, and humans are not expected to have more phosphorylation sites than mouse. We reasoned that if state-divergence is caused by false negatives in the datasets, we would expect the fraction of StC sites to increase as a function of protein abundance, since highly abundant proteins are more likely to be sampled in both species than rare proteins. Indeed, we found that the proportion of StC sites almost doubles between the two extreme classes of abundance (Figure 4.1D, see also Figure 4.2A). Admittedly, this effect could also arise if phosphoregulation were more conserved on highly expressed proteins, but this is unlikely, as it was recently shown that abundant proteins are enriched in non-functional phosphorylation sites (Levy et al, 2012) that evolve relatively rapidly (Landry et al, 2009). In addition, only conserved residues are considered in this analysis.

We also examined whether StC or StD phosphorylation sites were more likely to be found in housekeeping or tissue-specific proteins. Housekeeping proteins are expressed in all tissues, while tissue-specific ones are expressed in one or a few tissues. Accordingly, if StD sites are affected by false negatives, we would expect to find them preferentially in tissue-specific proteins. We examined the dataset of housekeeping genes (Eisenberg & Levanon, 2003) and tissue-specific genes (Liu et al, 2008) and found that StC sites are preferentially found in housekeeping proteins compared to StD sites (proportions: 0.027 vs. 0.019, p-value = 0.005, Figure 4.1E), while the trend is reversed if we look at tissue-specific proteins (proportions: 0.268 for StD vs. 0.219 for StC, p-value = 6.1 × 10⁻⁵, Figure 4.1E). This result is in agreement with our hypothesis that StD sites are affected by false negatives, although this effect could also be due to phosphoregulation being more conserved on housekeeping proteins.

In order to examine whether non-functional phosphorylation sites could contribute to poor state-conservation between species, we used a manually curated dataset of functional phosphorylation sites compiled by Landry and collaborators (Landry et al, 2009). Functional sites were identified as sites for which a phenotype was observed when phosphorylatable residues were mutated. If non-functional sites contribute to state-divergence, we would expect functional sites to be overrepresented in StC sites. We found that StC sites are enriched in functional phosphorylation sites compared to StD sites (proportions: 0.0025 vs. 0.00046, p-value < 1.19 × 10⁻¹⁴, Figure 4.1F). This observation suggests that a fraction of the StD sites we identified might be non-functional phosphorylation sites, which would explain their poor conservation status between species. It is important to consider that in both cases these observations are not biased by residue conservation, as both the StC and StD categories are composed only of phosphorylatable residues.

4.5.2 - A role for state-diverged sites in phosphoproteome divergence

Our observation that the majority of StD sites might result from false-negative phosphorylation site identifications or might be non-functional does not rule out the possibility that at least some of these sites could be actual StD sites that diverge in regulation, for instance due to changes in the sequences surrounding the phosphorylated residues. Kinase recognition motifs on substrates are difficult to compare directly due to their degeneracy (Ubersax & Ferrell, 2007). We therefore relied on kinase prediction tools for our analyses. We assigned each phosphorylation site to a protein kinase using the NetPhorest classifier (Miller et al, 2008), which associates protein kinases with phosphorylation sites based on the sequences flanking each site. NetPhorest classification is based on an atlas of consensus sequence motifs that covers 179 kinases and 104 phosphorylation-dependent binding domains and was built using in vivo and in vitro experimental data (Miller et al, 2008). If a site is phosphorylated in one species but not in the other, the sequences surrounding the phosphorylatable residue should match a kinase consensus motif better for the phosphorylated site than for the orthologous non-phosphorylated one. Given that NetPhorest provides a score (from 0 to 1) for many possible kinase-substrate associations, we selected the kinase having the best NetPhorest score and we used this score as a proxy for the probability of a given site being phosphorylated. We relaxed this assumption in some of our analyses. In addition, we performed the same analyses directly using a collection of position weight matrices derived from mammalian kinases and the results are in agreement with what we find with the NetPhorest predictions (Figure S4.5).

We first examined whether there was an association between S/T/Y phosphorylation and NetPhorest scores and found that the probability for a site to be phosphorylated strongly increases with increasing NetPhorest scores in both mouse and human data (Figure S4.6). Another result in support of this observation is that the fraction of StC sites increases as a function of NetPhorest scores (Figure 4.2A) and this relationship is independent of protein abundance. We also found that prediction scores are very similar between species for StC sites (median scores: 0.32 for the human phosphorylation sites vs. 0.32 for the mouse ones, p-value = 0.54) and higher than those of sites conserved at the residue level but non-phosphorylated in both species (median scores: 0.32 for StC vs. 0.20 for non-phosphorylated residues, p-value = 2.2 × 10⁻¹⁶; Figure 4.2B and Figure S4.7A-B). This confirms again a strong association between NetPhorest scores and the probability that a site is phosphorylated. Surprisingly, we found that scores of StC sites were also higher than the scores of the phosphorylated residues in the StD class (median scores: 0.32 vs. 0.22 for humans, p-value = 2.2 × 10⁻¹⁶; 0.32 vs. 0.26 for mouse, p-value = 2.2 × 10⁻¹⁶; Figure 4.2B-C and Figure S4.7A-B). This means that sites that are conserved and phosphorylated in both species have a significantly better match to consensus kinase motifs than those that are conserved at the residue level but phosphorylated in one species only.

Figure 4.2. Analysis of NetPhorest scores for the different classes of sites.

(A) Fraction of StC sites as a function of NetPhorest scores and protein abundance. (B) Comparison of NetPhorest scores for human and mouse phosphorylated and non-phosphorylated residues (Wilcoxon tests). (C) Comparison of NetPhorest scores for StD sites (Wilcoxon tests). (D) Correlation between human and mouse NetPhorest scores for StC sites (red) and StD sites phosphorylated in human but not in mouse (black). (E) Correlation between human and mouse NetPhorest scores for StC sites (red) and StD sites phosphorylated in mouse but not in human (black). (F) Proportion of phosphorylated sites that have higher NetPhorest scores compared to their corresponding site in the other species for StC and StD sites. Comparisons of human and mouse scores calculated with position weight matrices are shown in Figure S4.5. *: p-value < 0.05; **: p-value < 0.01; ***: p-value < 0.001.

There are several possible explanations for these differences. First, this result could derive from how predictive tools have been developed. For instance, phosphorylation sites may be more often studied on abundant proteins, which would imply that kinase prediction tools are better trained at recognizing phosphorylation sites present on abundant proteins. We tested this hypothesis and found that there is no increase in the average NetPhorest scores as a function of protein abundance (Figure S4.8), showing that the NetPhorest classification is not biased towards sites present in highly abundant proteins. Another possibility is that StD sites contain a significantly higher proportion of false-positive phosphorylation sites compared to StC sites, as the latter have been found to be phosphorylated in the two species in completely independent experiments and thus have much stronger experimental support. Indeed, false-positive sites would have low NetPhorest scores, similar to non-phosphorylated ones, and would therefore contribute to lowering the average NetPhorest score for the residues that are phosphorylated in StD sites compared to StC sites. A third possibility is that StD sites could contain a proportion of non-functional phosphorylation sites with non-consensus motifs, as shown before by Landry and collaborators (Landry et al, 2009), who found that phosphorylation sites matching kinase motifs have a higher degree of evolutionary conservation and are thus more likely to be functional. Altogether, these results suggest that the match to a consensus sequence motif could be used to prioritize phosphorylation sites for downstream functional analysis in phosphoproteomics experiments.

Despite these potentially confounding factors, we found evidence that StD is at least partly caused by divergence in regulatory motifs. We found that scores of phosphorylated StD sites are significantly higher than those of their non-phosphorylated orthologous counterparts in both pairwise comparisons (phosphorylated in human vs. non- phosphorylated in mouse, median scores: 0.216 vs. 0.214, p-value = 3.93 × 10-5; phosphorylated in mouse vs. non-phosphorylated in humans, median scores: 0.255 vs. 0.245, p-value = 6.38 × 10-5; Figure 4.2C). The fact that we see the effects in both directions rules out the possibility that NetPhorest scores are systematically higher in humans. In order to identify among the set of StD sites the ones that have the potential to be true StD sites, we directly compared matching orthologous NetPhorest scores of StC and StD sites. We found a strong correlation between the NetPhorest scores for StC sites (rho = 0.95, p-value < 2.2 × 10-16) and a weaker correlation between the scores of the StD sites, and this both for those phosphorylated in humans but not in mouse (rho = 0.89, p-value < 2.2 × 10-16, Figure 4.2D) and for those phosphorylated in mouse but not in humans (rho = 0.88, p-value < 2.2 × 10-16, Figure 4.2E). This result is confirmed when comparing the proportion of StD sites having higher scores in humans than in mouse to the same proportion calculated for StC. We found a slight but significant excess of StD sites having higher scores in human than in mouse compared to StC sites (proportions: 0.284 vs. 0.258, p-value = 8.69 × 10-13, Figure 4.2F). We found similar results for the StD sites having higher scores in mouse compared to humans (proportions: 0.291 vs. 0.261, p-value = 8.69 × 10-11, Figure 4.2F). 
By summing up all these excess StD sites that show high NetPhorest scores in one organism but low scores in the other, we concluded that at least 5% of the StD sites (phosphorylated in either human or mouse) present in our dataset have the potential to be differentially regulated between species, despite conservation of the actual phosphorylatable residues. Our results do not depend on the NetPhorest algorithm, as we performed the same analyses using position weight matrices available from the literature (Bullock et al, 2009; Bullock et al, 2005; Bunkoczi et al, 2007; Davis et al, 2009; Filippakopoulos et al, 2008; Gwinn et al, 2008; Hutti et al, 2004; Kikani et al, 2010; Pike et al, 2008; Rennefahrt et al, 2007; Sheridan et al, 2008; Wong et al, 2012) and all of our conclusions about StC and StD sites were mirrored in these tests, as shown in Figure S4.5. Overall, our results show that in addition to the actual divergence in phosphorylated sites (SiD), a significant fraction of the mouse and human phosphoproteomes have diverged through changes in the kinase recognition motifs. These changes in the phosphoregulatory status of proteins represent changes in the protein regulatory network, as illustrated for a particular subnetwork in Figure 4.3.
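The 5% lower bound can be checked with a short calculation. The sketch below uses the proportions reported for Figure 4.2F and assumes that the "excess" in each direction is simply the difference between the StD and StC proportions; the exact computation is not spelled out in this section, so this is one plausible reading rather than the definitive procedure.

```python
# Proportions reported in the text (Figure 4.2F).
std_higher_in_human, stc_higher_in_human = 0.284, 0.258
std_higher_in_mouse, stc_higher_in_mouse = 0.291, 0.261

# Assumed reading of "excess": StD proportion minus StC proportion,
# computed separately for each direction and then summed.
excess_human = std_higher_in_human - stc_higher_in_human
excess_mouse = std_higher_in_mouse - stc_higher_in_mouse
total_excess = excess_human + excess_mouse

print(f"excess (higher in human): {excess_human:.3f}")
print(f"excess (higher in mouse): {excess_mouse:.3f}")
print(f"total excess:             {total_excess:.3f}")
```

Under this reading the two excesses sum to roughly 0.056, which is consistent with the statement that at least 5% of the StD sites are candidates for differential regulation.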

Figure 4.3. Comparison of a pair of StC and StD sites.

(A) Example of StC site (human protein: NUCL; site S28). Both sites are predicted to be phosphorylated by the same kinase (CK2) by NetPhorest. The human and mouse kinase-phosphorylation networks are shown for the 10 StC sites with the highest NetPhorest scores (Table S2). The width of the edges is proportional to the NetPhorest score. (B) Example of StD site (human protein: NIN; site S1145). The two phosphorylation sites are predicted to be phosphorylated by different kinases (human: CK2, mouse: DMPK) by NetPhorest. The human and mouse kinase-phosphorylation networks are shown for the 10 StD sites with the highest difference in NetPhorest scores (Annex 2, Table S4.2). Dotted lines represent predicted kinase-phosphorylation site associations that have been rewired in mouse considering the human network as reference.

Potential StD sites are located in proteins that have fundamental cellular functions, making them good candidates for the investigation of species-specific mechanisms of regulation. Further examples are available in Annex 2, Table S4.2.

4.5.3 - Evolutionary turnover of mammalian phosphorylation sites

We next examined whether the positional turnover of phosphorylation sites could contribute to SiD between mouse and humans. One prediction of this model is that sites that are lost in one lineage could be compensated for by the gain of other sites in the proximity (Freschi et al, 2011). Similarly, sites could change their positions as a result of insertions and deletions in the surrounding regions. In order to test this prediction, we developed an algorithm to identify evolutionary clustered sites (Freschi et al, 2011), i.e. pairs of sites that are SiD between mouse and humans and that are closer to each other in the linear protein space than expected by chance alone (Annex 2, Figure S4.1).
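The idea of testing whether two SiD sites are closer than expected by chance can be illustrated with a simple permutation test. The sketch below is not the published algorithm (Freschi et al, 2011); it assumes a hypothetical neutral model in which species-specific sites are redistributed at random among candidate alignment positions, and all function and parameter names are illustrative.

```python
import random

def min_pair_distance(human_sites, mouse_sites):
    """Smallest alignment distance between a human-only and a mouse-only site."""
    return min(abs(h - m) for h in human_sites for m in mouse_sites)

def clustering_p_value(human_sites, mouse_sites,
                       human_candidates, mouse_candidates,
                       n_iter=1000, seed=0):
    """Empirical p-value for observing sites at least this close by chance.

    human_candidates / mouse_candidates are the alignment positions where a
    species-specific site could have been placed (assumed neutral model).
    """
    rng = random.Random(seed)
    observed = min_pair_distance(human_sites, mouse_sites)
    hits = 0
    for _ in range(n_iter):
        # Redistribute the same number of sites among candidate positions.
        h = rng.sample(human_candidates, len(human_sites))
        m = rng.sample(mouse_candidates, len(mouse_sites))
        if min_pair_distance(h, m) <= observed:
            hits += 1
    # Add-one correction so the empirical p-value is never exactly zero.
    return observed, (hits + 1) / (n_iter + 1)
```

With few candidate positions the null distribution has few possible configurations, so short or sparsely sampled proteins have little power to reach significance under a test of this kind.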

We found that 123 site pairs belonging to 68 proteins show significant evolutionary clustering of SiD phosphorylation sites (Annex 2, Table S4.3; alignments are available in Annex 2, Dataset S1). Ninety percent of the proteins that contain evolutionary clustered site pairs have only one or two of them (Annex 2, Figure S4.9), with few exceptions (Annex 2, Table S4.4). This number also excludes proteins for which we found a high number of evolutionary clustered site pairs due to large clusters of sites that we did not consider (NOL8, 10; KI67, 27; MDC1, 180 site pairs). The median NetPhorest score for these sites is 0.29, suggesting that they are likely to be phosphorylated and not false positives (0.20 is the median score for non-phosphorylated residues while 0.32 is the median score for phosphorylated residues). The typical window within which we found significant clustering between SiD sites is 10 amino acids (Annex 2, Figure S4.10) and approximately 80% of the sites are less than 40 amino acids apart in the alignment. The observed number of site pairs (n = 123) is likely an underestimate of the contribution of evolutionary site turnover because we need many possible configurations in the neutral model to identify them and phosphoproteomes have likely been undersampled. We found that the proportion of proteins that show significant evolutionary clustering increases with the proportion of available sites (Annex 2, Figure S4.11). Furthermore, we found that the number of evolutionary clustered sites is correlated with protein size (rho = 0.26, p-value = 0.03) and may thus be biased towards large proteins.

If these clustered SiD sites were functionally equivalent at the network level between the two species, we would expect them to be phosphorylated by the same kinases or group of kinases. We again used NetPhorest to test this hypothesis. We determined the proportion of StC, StD and evolutionary clustered sites that were likely to be phosphorylated by the same kinases or group of kinases (overlap of one or more kinases among the three best kinases predicted by NetPhorest) (Tan et al, 2009) and we compared these observations to the random expectations obtained by shuffling the mouse kinase-substrate associations. We found that the proportion of StC and StD sites predicted to be phosphorylated by the same kinases or group of kinases was more than 7 times greater than expected by chance alone, suggesting that, globally, these sites tend to be phosphorylated by the same kinases or group of kinases (Figure 4.4A-B).
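The comparison between observed kinase sharing and the shuffled expectation can be sketched as follows. This is a minimal illustration that assumes each site is represented by the list of its top-ranked predicted kinases (one list per orthologous site, in matching order); the function names and data layout are hypothetical and do not reflect the NetPhorest output format.

```python
import random

def shares_kinase(human_top, mouse_top):
    """True if the two orthologous sites share at least one predicted kinase."""
    return bool(set(human_top) & set(mouse_top))

def overlap_enrichment(human_preds, mouse_preds, n_iter=1000, seed=0):
    """Observed vs. expected proportion of site pairs sharing a kinase.

    The expectation is obtained by shuffling the mouse kinase assignments
    across sites, which breaks the orthology pairing.
    """
    rng = random.Random(seed)
    n = len(human_preds)
    observed = sum(shares_kinase(h, m)
                   for h, m in zip(human_preds, mouse_preds)) / n
    shuffled = list(mouse_preds)
    null = []
    for _ in range(n_iter):
        rng.shuffle(shuffled)
        null.append(sum(shares_kinase(h, m)
                        for h, m in zip(human_preds, shuffled)) / n)
    expected = sum(null) / n_iter
    # One-sided empirical p-value with add-one correction.
    p_value = (sum(x >= observed for x in null) + 1) / (n_iter + 1)
    return observed, expected, p_value
```

The ratio `observed / expected` corresponds to the fold enrichment quoted in the text; passing one kinase per site gives the "same kinase" comparison, and passing three gives the "group of kinases" comparison.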

Figure 4.4. Proportion of sites that are phosphorylated by the same protein kinase.

(A) Proportion of sites phosphorylated by the same kinases (NetPhorest predictions) for the different categories of sites (StD: state diverged, StC: state conserved, ECS: evolutionary clustered sites). Black dots represent the observed proportion. Orange lines represent the range of proportions expected by chance alone. P-values for StC and StD: < 0.0001; p-value for ECS: 0.03. The histogram shows the distribution of expected proportions for ECS. A similar analysis was performed using position weight matrices (Figure S4.5). (B) Proportion of sites phosphorylated by one or more shared kinases (kinase group) among the three best kinases predicted to be associated with each site according to NetPhorest. P-values for StC, StD and ECS: < 0.01.

We found a weak but significant tendency (p-value = 0.03) for the evolutionary clustered sites to be phosphorylated by the same kinase (Figure 4.4A). We then performed the same analysis considering the three best kinases found by NetPhorest, assuming that phosphorylation sites could also be functionally conserved if they are phosphorylated by closely related kinases, as in Tan et al (2009). We found that evolutionary clustered sites were 1.4 times more likely to be phosphorylated by the same group of kinases than expected by chance alone (p-value < 0.01; Figure 4.4B). This result suggests that, in general, many evolutionary clustered sites may actually be functionally equivalent. Finally, we performed this analysis using position weight matrices available from the literature (Bullock et al, 2009; Bullock et al, 2005; Bunkoczi et al, 2007; Davis et al, 2009; Filippakopoulos et al, 2008; Gwinn et al, 2008; Hutti et al, 2004; Kikani et al, 2010; Pike et al, 2008; Rennefahrt et al, 2007; Sheridan et al, 2008; Wong et al, 2012) and found qualitatively similar results (Annex 2, Figure S4.5F).

Evolutionary clustered sites could arise through losses and gains of phosphorylation sites in the two lineages. Our algorithm identifies evolutionary clustered sites, but it cannot tell whether these represent gains of phosphorylation sites that compensated for deleterious losses in the same lineage or whether they were simply the result of indels that affected the position of the sites in the human and mouse protein alignments. We therefore aligned the mouse and human proteins with several orthologs belonging to species that diverged after the human-mouse divergence (Figure 4.5A) and manually curated the data in order to identify the possible evolutionary steps that led to these configurations of phosphorylation sites.


Figure 4.5. Evolutionary histories of candidate functionally redundant site pairs.

(A) Phylogeny of the species considered for the analysis of evolutionary clustered sites. For all species we show the species name, the three-letter identifier and the common name. (B) Alignment of the Fanconi anemia group M protein (FANCM). Evolutionary clustered sites are indicated in bold. Residues that have been reported to be phosphorylated are on a green background. (C) Alignment of the disabled homolog 2 protein (DAB2). (D) Alignment of the low-density lipoprotein receptor-related protein (LRP2).


We manually identified several cases (n = 17, 14%) of evolutionary clustered sites that were most likely caused by indels changing protein length and thus the alignment. An example is found in the Fanconi anemia group M protein, an ATPase implicated in DNA repair (Meetei et al, 2005), in which S1673 and S1674 are shifted towards the C-terminus in the mouse lineage (Figure 4.5B). The remaining 86% (n = 106) of the cases of evolutionary clustered sites could not be simply explained by indels and may thus represent compensatory evolutionary events. We observed such a case in the protein DAB2 (human site: S723; mouse site: S731), which plays a potential role in ovarian carcinogenesis (Fazili et al, 1999) (Figure 4.5C). The human S723 was gained after the split of the Haplorrhini from the other primates, while the second site (S731) was lost after the split between the rodents and the primates. Another example involves the human T4634 and the mouse site S4632 on LRP2 (Figure 4.5D). This protein is a membrane receptor of absorptive epithelial cells. Mutations in this protein are associated with Donnai-Barrow syndrome, a genetic syndrome that leads to defects in vision, hearing and craniofacial features and to structural abnormalities in the brain (Kantarci et al, 2007). In this case the human T4634 site appeared in primates after the split from rodents, while the mouse S4632 site was lost after the split of the Strepsirrhini from the other primates. The biological function of these phosphorylation sites has not been determined, but they represent prime candidates for exploring, at the molecular level, the positional redundancy of phosphorylation sites.

4.6 – Conclusion

Here we compared the human and mouse phosphoproteomes in order to gain a detailed picture of phosphoregulatory conservation and divergence between these two species. We found that, globally, phosphorylation sites tend to be conserved between human and mouse. By using phosphorylation data from both species, we showed that the number of sites that are phosphorylated in both human and mouse is 2.5 times higher than expected by chance alone. In addition, we estimated phosphorylation status divergence. We found that the majority of phosphorylation sites that are conserved at the residue level between human and mouse are actually divergent with respect to their phosphorylation status (StD sites). While this is most likely largely due to incomplete coverage in the two species, we showed that at least 5% of the StD sites are actually diverging at the kinase-substrate interaction level. We also found that phosphorylation sites that are phosphorylated in both species are more likely to be functional and have higher kinase assignment scores, suggesting that this conservation criterion could be used to prioritize phosphorylation sites for further characterization (Beltrao et al, 2012; Landry et al, 2009). Taken together, these results suggest that more data are needed in these two species to be able to completely assess the conservation and divergence of their phosphoproteomes. Furthermore, the candidate StD sites might have specific regulatory properties that still have to be characterized and understood. A better understanding of these properties will allow us to make an important step forward in our attempt to describe and explain how small regulatory differences map to the important phenotypic differences among species. Mouse is the best model system to study human biology and diseases. It is therefore important to understand how these two species diverge, and phosphoregulatory evolution may play an important role.

We identified sites that are phosphorylated in one species but that have diverged in the other so that the site is not phosphorylatable (SiD sites). While the biological meaning of the majority of these sites still remains to be assessed, our analysis suggests that many of them could be functionally redundant. This result supports the finding by Moses and collaborators that phosphorylation site evolutionary turnover has a role in shaping phosphoregulation (Moses et al, 2007). If the redundancy hypothesis holds true, we might need to revisit estimations of phosphorylation conservation, since omitting positional redundancy may lead to an underestimation of phosphorylation site functional conservation. Moreover, this implies that we should consider different categories of phosphorylation sites: the ones for which the position along the protein is a determinant for their function (positionally-dependent phosphorylation sites) and those for which the global charge rather than the exact position is responsible for their function (positionally-flexible phosphorylation sites).


4.7 – Acknowledgements

We thank A. Moses, A. Nguyen Ba and all members of the Landry laboratory for their comments on the manuscript. We also thank B. Turk (Yale University) for providing the position weight matrices used in this study. This work was supported by the Canadian Institutes of Health Research (CIHR) (GMX-191597). C. R. Landry is a CIHR New Investigator. L. Freschi was supported by a fellowship from the Fonds de Recherche du Québec - Nature et Technologies (FRQ-NT) and L. Freschi and M. Osseni by the Quebec Research Network on Protein Function, Structure and Engineering (PROTEO).


Chapter 5 – Cross-talk between O-GlcNAcylation and phosphorylation in mammalian proteomes


5.1 – Summary

Post-translational modifications are molecular switches that allow the cell to exert fine control over the function of its proteins. In some cases, a residue can undergo several of these modifications, which can activate or deactivate either the same protein function or different functions. This is the case, for example, for phosphorylation and glycosylation, which affect the serines and threonines of proteins. Here, we investigated whether these two modifications could act as switches for the same biological function or for different functions. We found that residues that can reach three states (unmodified, phosphorylated, O-GlcNAcylated) are more conserved than those that can reach only two states (unmodified, phosphorylated or unmodified, O-GlcNAcylated). In addition, we found that residues that can reach three states tend to be phosphorylated by different kinases compared to residues that can reach only two states. Our results support the hypothesis that phosphorylation and O-GlcNAcylation control two different functions rather than the same function.


5.2 - Abstract

Post-translational modifications (PTMs) are molecular switches that allow the cell to finely tune protein functions. In some cases a residue can be modified by multiple, alternative PTMs that can activate or deactivate the same protein function or different functions. This is the case for serine and threonine residues, which can be phosphorylated and O-GlcNAcylated. Here, we investigate whether these two PTMs may act as switches for the same biological function or for different functions. We found that there is a greater evolutionary constraint on the residues that can shuttle between 3 states (non-modified, phosphorylated, O-GlcNAcylated) compared to the ones that can shuttle between 2 states only (non-modified, phosphorylated or non-modified, O-GlcNAcylated). Moreover, we found that 3-state and 2-state residues are likely to be regulated by different sets of kinases. Our results support the hypothesis that, at least in humans, phosphorylation and O-GlcNAcylation control multiple functions rather than the same one.


5.3 - Introduction

Post-translational modifications (PTMs) are chemical modifications of proteins that allow the modulation of protein functions and represent a means to extend the coding capacity of genes (Prabakaran et al, 2012). PTMs modulate protein activity, localization, degradation and interactions (Khmelinskii et al, 2009; Madeo et al, 1998; Sprang et al, 1988; Vazquez et al, 2000). Proteins can undergo several PTMs, and progress in mass spectrometry technologies over the last decade now allows entire proteomes to be screened for the identification and quantification of these PTMs (Olsen & Mann, 2013). Examples of PTMs include protein phosphorylation, the addition of a phosphate group to serines, threonines and tyrosines, and O-GlcNAcylation, the addition of an O-linked β-N-acetylglucosamine moiety to serines and threonines (Zeidan & Hart, 2010). Given the large number of modifications any protein can bear, one major question that emerged recently is whether these PTMs affect each other’s function, i.e. whether they cross-talk with each other (Beltrao et al, 2013; Brooks & Gu, 2003; Hunter, 2007; Latham & Dent, 2007). This interaction would in principle define a PTM “code” that would allow the cell to implement complex regulatory programs at the level of single proteins. Indeed, each PTM allows the protein to assume a new configuration or state that often determines changes in protein function (Deribe et al, 2010).

Two general modes of cross-talk have been reported in the literature: positive and negative cross-talk (Hunter, 2007). Positive cross-talk refers to a scenario in which one PTM promotes the direct or indirect addition or removal of a second modification. An example of this mode of cross-talk is the phosphorylation-dependent ubiquitination of the Sic1p protein in yeast (Nash et al, 2001; Verma et al, 1997). Sic1p is an inhibitor of the Cyclin Dependent Kinases (CDK), important regulators of cell cycle progression. This inhibition has to be released in order to start DNA replication. The phosphorylation of Sic1p by CDKs at multiple sites allows the ubiquitination of Sic1p by Cdc4. This event determines Sic1p degradation, thus allowing the cell to progress through the cell cycle. Another notable example of positive cross-talk is the interplay between lysine residues on human histones, whereby the methylation of Lys-27 of histone H3 increases the probability of the methylation of Lys-36 (Schwammle et al, 2014).


The opposite mode of action, negative cross-talk, implies that one PTM prevents another modification from occurring. Examples of negative cross-talk have been reported and again include the methylated lysines of histone H3. For instance, the tri-methylation of Lys-4 inhibits the methylation of Lys-9 (Schwammle et al, 2014). The importance of these cross-talks is illustrated by their use by microorganisms to take control of the cell or shut down the immune response. An example of this scenario is represented by the human protein MAPKK6. The phosphorylation of this protein on critical serine and threonine residues is required to activate the downstream MAPK kinases in the innate immune response to pathogens. Mukherjee and collaborators (Mukherjee et al, 2006) found that Yersinia species use the effector protein YopJ to acetylate these critical residues on MAPKK6. This competition between phosphorylation and acetylation for the same sites prevents the activation of MAPKK6, allowing Yersinia to usurp eukaryotic cellular signalling and block a pathway that is crucial for the activation of the innate immune response.

Of particular interest are the cross-talks among PTMs that occur on the same residues. PTMs occurring on the same residues are by definition mutually exclusive and thus have the potential to directly affect each other’s functions. Examples of such PTMs reported in the literature are acetylation, ubiquitination, methylation and SUMOylation on lysine residues (Latham & Dent, 2007) as well as O-GlcNAcylation and phosphorylation on serine and threonine residues (Hart et al, 2011). Although different PTMs can regulate the same function, they could also regulate different protein functions (Beltrao et al, 2013; Benayoun & Veitia, 2009). For instance, previous studies have shown that the cross-talk between protein lysine acetylation and ubiquitination has effects on protein stability (Caron et al, 2005). Acetylation at lysine residues of proteins prevents their ubiquitination and, ultimately, their degradation. In this case, the cross-talk between acetylation and ubiquitination regulates the same protein property or function: protein stability. On the other hand, different PTMs occurring on the same site can regulate two distinct functions in different contexts, for instance in different tissues, steps of the cell cycle or cell compartments. For example, Kamemura and collaborators (Kamemura et al, 2002) found that the Thr-58 residue of the c-Myc protein is preferentially phosphorylated or O-GlcNAcylated in a condition-dependent fashion in the presence or absence of mitogens, suggesting that different cellular roles of c-Myc are regulated by these two PTMs.

Here we examine the putative cross-talk between protein phosphorylation and O-GlcNAcylation. In the last decade the interest in O-GlcNAcylation and its interaction with phosphorylation has grown, as shown by the recent studies that have unveiled the role of this modification in regulating key steps of cellular metabolism (Ruan et al, 2013). Further, O-GlcNAcylation is one of the few post-translational modifications for which more than 1,000 sites have been experimentally detected (Khoury et al., 2011).

Phosphorylation and O-GlcNAcylation occur on a specific set of serine and threonine residues. Some residues are not modified and can therefore be found in only one configuration (1-state sites); others are phosphorylated but not glycosylated or vice versa, thus having one more possible configuration (2-state sites). Finally, some sites can be both glycosylated and phosphorylated (on different molecules of the same protein or at different times) and can therefore be found in three states (3-state sites). We sought to determine whether phosphorylation and O-GlcNAcylation act on 3-state sites as two independent switches that regulate different biological functions or as a single switch that regulates one function. We hypothesized that the two PTMs regulate two functions. In this case, we should observe that (i) phosphorylation and O-GlcNAcylation occur at the same residues more often than expected by chance. In addition, we would expect to observe that (ii) the 3-state sites evolve more slowly than the 2-state ones, since the two functions constitute a stronger constraint. This trend should be observed whether we consider the site or its flanking regions, since PTM sites are defined by motifs of amino acids rather than single amino acids. Finally, we would expect (iii) the 3-state sites to show different preferences for protein kinases compared to the 2-state ones, since for the 3-state sites the phosphorylation is expected to be more condition-dependent (e.g. Kamemura et al, 2002). We tested all these predictions on the human and mouse phosphoproteomes. Overall, our analyses support the hypotheses that there is a cross-talk between phosphorylation and O-GlcNAcylation and that these two PTMs are likely to control different cellular functions.


5.4 - Methods

Phosphorylation, O-GlcNAcylation, protein disorder, sequence data and protein abundance data

An extensive dataset of human and mouse phosphorylation and O-GlcNAcylation sites was built by combining data from 8 different databases and experimental studies about phosphorylation (Beltrao et al, 2012; Dinkel et al, 2011; Gnad et al, 2011; Hornbeck et al, 2012; Huttlin et al, 2010; Keshava Prasad et al, 2009; Minguez et al, 2012; Trinidad et al, 2012) and 5 about O-GlcNAcylation (Alfaro et al, 2012; Hornbeck et al, 2012; Lu et al, 2013; Trinidad et al, 2012; Wang et al, 2011). To our knowledge this set of phosphorylation and O-GlcNAcylation sites is representative of the data currently available in the literature. All protein sequences and orthology relationships were retrieved from ENSEMBL (version 69). Only protein sequences for which we could find orthology relationships between a human protein and at least a mouse, dog and opossum protein were considered. For human and mouse, orthology relationships were determined for the longest isoform of each protein. Each group of orthologous sequences was aligned using MUSCLE (Edgar, 2004). Disordered and ordered regions of proteins were predicted using DISOPRED (Ward et al, 2004a). In order to map phosphorylation sites to our sequences, the following procedure was applied. The sites that were already mapped onto proteins associated with ENSEMBL IDs in the original datasets were directly mapped to our sequences. For all other cases, phosphopeptides were mapped onto proteins using BLAT (Kent, 2002). All peptides that mapped to more than one protein were removed at this step. Mapped phosphorylation and O-GlcNAcylation sites and information about protein disorder are available in Dataset S5.1 (available on request). Finally, data about protein abundance were retrieved from PaxDb (Wang et al, 2012).

Shuffling procedure used to determine random expectations

In order to calculate the random expectation for 3-state modified sites, O-GlcNAcylation sites were shuffled in each protein while preserving the overall proportion of sites for each residue (S or T) and the localization in disordered/ordered regions. The null distributions were estimated by iterating this procedure 1000 times, calculating each time the number of 3-state sites. To calculate the random expectations for the localization of the 3-state sites in disordered or ordered regions, we considered the two PTMs as one single modification and performed the shuffling by reassigning this modification while preserving the overall proportion of co-occurrences per residue (S or T).
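The stratified shuffling described above can be sketched for a single protein as follows. This is a minimal illustration under stated assumptions: each serine or threonine is described by a hypothetical `(amino_acid, is_disordered)` tuple keyed by its position, and the function names are illustrative rather than taken from the original analysis code.

```python
import random
from collections import defaultdict

def count_three_state(phospho, glcnac):
    """Number of positions carrying both modifications (3-state sites)."""
    return len(set(phospho) & set(glcnac))

def shuffle_glcnac(residues, glcnac, rng):
    """Reassign O-GlcNAc sites within one protein.

    residues maps position -> (amino_acid, is_disordered) for every S/T.
    The number of sites per (amino_acid, is_disordered) stratum is preserved.
    """
    strata = defaultdict(list)
    for pos, key in residues.items():
        strata[key].append(pos)
    needed = defaultdict(int)
    for pos in glcnac:
        needed[residues[pos]] += 1
    shuffled = []
    for key, k in needed.items():
        shuffled.extend(rng.sample(strata[key], k))
    return shuffled

def null_three_state(residues, phospho, glcnac, n_iter=1000, seed=0):
    """Null distribution of 3-state counts for one protein."""
    rng = random.Random(seed)
    return [count_three_state(phospho, shuffle_glcnac(residues, glcnac, rng))
            for _ in range(n_iter)]
```

For a proteome-wide expectation, the per-protein counts from each iteration would be summed across proteins before comparing the observed total to the resulting null distribution.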

Evolutionary conservation

The Rate4Site software with default options was used to calculate the evolutionary rates for the 1-state, 2-state and 3-state serines and threonines (Pupko et al, 2002). The raw evolutionary rates were normalized with the following procedure. For each residue type (e.g. serine located in a disordered region), the average evolutionary rate for that residue type in that protein was calculated. Then, for each residue, the evolutionary rate calculated with Rate4Site was divided by the average evolutionary rate of the residue type in that protein. In order to avoid the bias of having a different number of species in the alignments used to determine the evolutionary rates, the same analysis was performed using alignments from a previous study (Landry et al, 2009). Finally, in order to avoid a potential bias introduced by the algorithm used to calculate the evolutionary rates, the analyses were also performed with another algorithm, as described by Gray & Kumar (2011).
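The normalization step can be written compactly. The sketch below assumes that per-residue rates have already been parsed into a list of records with hypothetical keys `protein`, `aa`, `disordered` and `rate`; it does not read Rate4Site output directly, and it assumes that the average rate of each group is non-zero.

```python
from collections import defaultdict

def normalize_rates(sites):
    """Divide each rate by the mean rate of its residue type within its protein.

    sites: list of dicts with keys 'protein', 'aa', 'disordered', 'rate'.
    A residue type is the combination of amino acid and disorder status.
    Returns the normalized rates in the same order as the input.
    """
    groups = defaultdict(list)
    for s in sites:
        groups[(s["protein"], s["aa"], s["disordered"])].append(s["rate"])
    # Mean rate per (protein, amino acid, disorder status) group.
    means = {key: sum(vals) / len(vals) for key, vals in groups.items()}
    return [s["rate"] / means[(s["protein"], s["aa"], s["disordered"])]
            for s in sites]
```

After this step a value below 1 indicates a residue that evolves more slowly than comparable residues of the same protein, which is what makes rates comparable across proteins of different abundance or structure.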

Kinase-phosphorylation site associations

NetPhorest (Miller et al, 2008) was downloaded from http://netphorest.info and was run locally using default options. The kinase-phosphorylation site associations were determined by ranking all possible associations determined by NetPhorest according to their score and taking the one with the best score.
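Selecting the best-scoring association per site amounts to a simple maximum over predictions. The sketch below assumes the predictions have been parsed into hypothetical `(site_id, kinase, score)` tuples; this layout is an illustration and is not the NetPhorest output format, and the scores in the usage example are invented.

```python
def best_kinase_per_site(predictions):
    """Keep, for each site, the kinase with the highest score.

    predictions: iterable of (site_id, kinase, score) tuples.
    Returns a dict mapping site_id -> (kinase, score).
    Ties are resolved in favor of the first prediction encountered.
    """
    best = {}
    for site, kinase, score in predictions:
        if site not in best or score > best[site][1]:
            best[site] = (kinase, score)
    return best

# Usage with made-up scores for illustration only.
example = [
    ("NIN_S1145", "CK2", 0.41),
    ("NIN_S1145", "DMPK", 0.12),
    ("NUCL_S28", "CK2", 0.55),
]
print(best_kinase_per_site(example))
```

Keeping the score alongside the kinase makes it straightforward to extend the selection to the three best kinases per site when a kinase-group comparison is needed.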


5.5 - Results

5.5.1 - An extensive dataset of phosphorylation and O-GlcNAcylation sites

We built a dataset of human (n = 86,065) and mouse (n = 43,013) phosphorylation sites by collecting data from 8 different databases and experimental studies (Beltrao et al, 2012; Dinkel et al, 2011; Gnad et al, 2011; Hornbeck et al, 2012; Huttlin et al, 2010; Keshava Prasad et al, 2009; Minguez et al, 2012; Trinidad et al, 2012). We successfully mapped these sites onto 8,889 human and 5,903 mouse proteins. We also built a dataset of O-GlcNAcylation sites in human (n = 613) and mouse (n = 810) from 5 different databases and experimental studies (Alfaro et al, 2012; Hornbeck et al, 2012; Lu et al, 2013; Trinidad et al, 2012; Wang et al, 2011). We mapped these sites onto 262 human and 316 mouse proteins, in which we counted 105 and 156 co-occurrences, respectively. Sixty-five human proteins and 84 mouse proteins contained at least one co-occurrence of phosphorylation and O-GlcNAcylation (3-state sites). Previous studies reported that phosphorylation sites tend to be located in disordered (unstructured) regions (Iakoucheva et al, 2004; Landry et al, 2009). Three-state residues also tend to be located in disordered regions in both organisms (p-value < 0.005; Annex 3, Figure S5.1).

5.5.2 – Phosphorylation and O-GlcNAcylation are found in the same residues more than expected by chance alone

We first examined whether, within the co-modified proteins, the two PTMs occur on the same residue. We counted the number of 3-state residues and randomly shuffled the O-GlcNAcylation sites in each protein to generate a null model that reflects the random expectations for each species separately. We found that the number of 3-state residues is 1.3 times greater than expected by chance in both species (p-value < 0.001), which leads us to reject the null hypothesis of independence between phosphorylation and O-GlcNAcylation (Figure 5.1).


Figure 5.1. Number of 3-state sites in human and mouse and comparisons to random expectations.

Number of 3-state (phosphorylatable and O-GlcNAcylatable) sites in human (A) and mouse (B) and comparisons to random expectations (1000 iterations, p-value < 0.001).

One potential confounding factor of our analyses is that phosphorylation and O-GlcNAcylation tend to be sampled more often on highly abundant proteins, which would artificially inflate their co-occurrence. For instance, we recently showed that there is a detection bias for phosphorylation sites towards highly abundant proteins (Freschi et al, 2014) and this could also be true for O-GlcNAcylated proteins, thereby increasing the probability of finding both modifications on highly abundant proteins. Our results are in line with these expectations (Figure 5.2A,B), suggesting that, in general, PTMs tend to be preferentially detected on highly abundant proteins.


Figure 5.2. Fraction of sites as a function of protein abundance for human and mouse O-GlcNAcylation sites and comparison of average protein abundance between all proteins and proteins that contain 3-state sites for human and mouse.

Fraction of sites as a function of protein abundance for human (A) and mouse (B) O-GlcNAcylation sites and comparison of average protein abundance between all proteins and proteins that contain 3-state sites for human (C) and mouse (D). Data about protein abundance were retrieved from PaxDb (Wang et al, 2012).

We also found that 3-state sites tend to be preferentially found in proteins with high average protein abundance in mouse, but not in humans (Figure 5.2C,D). This difference could reflect a functional difference between humans and mice, but more likely it is a side effect of biases in the protein sampling in mouse. We examined the distributions of protein abundance for the proteins with 3-state sites and found that the sampling bias towards highly abundant proteins that we observed in mouse is common across the different studies (Annex 3, Figure S5.2).

5.5.3 - Clues of independent regulation of multiple functions in humans but not in mouse

In previous studies of phosphorylation site evolution, residue conservation has been associated with functional roles in protein regulation (Beltrao et al, 2012). We therefore hypothesized that if phosphorylation and O-GlcNAcylation regulate independent protein functions, 3-state sites should be more conserved than 2-state sites, whereas if they regulate the same function, the evolutionary rates of the 3-state sites should be approximately the same as those of 2-state sites. We estimated the rates of evolution of all serines and threonines of phosphorylated or glycosylated proteins using alignments of 16 species. We normalized each rate by the average evolutionary rate of each residue type within each protein (see methods) so that the rates become independent of protein abundance and structural properties (order or disorder), both of which have been shown to affect rates of evolution (Landry et al, 2009; Levy et al, 2012). We compared the distribution of the evolutionary rates of 1-state (non-modified) serines and threonines to those of 2-state and 3-state modified residues. We found that in humans 3-state modified residues are more conserved over evolution than 2-state and 1-state residues (Figure 5.3A). We also observed that on average O-GlcNAcylated sites are more conserved than phosphorylation sites. However, we did not observe the same trend in mouse (Figure 5.3B).
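The normalization step described above can be illustrated with a minimal Python sketch. It assumes that "normalizing" means dividing each residue's rate by the mean rate of all residues of the same type (S or T) in the same protein; the exact procedure is given in the methods and may differ. The function name `normalize_rates` and the input layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def normalize_rates(sites):
    """Normalize per-residue evolutionary rates within each protein.

    `sites` is a list of (protein_id, residue_type, rate) tuples.
    Assumption: each rate is divided by the mean rate of residues of the
    same type in the same protein. Groups whose mean rate is zero are skipped.
    """
    groups = defaultdict(list)
    for protein, residue, rate in sites:
        groups[(protein, residue)].append(rate)
    averages = {key: mean(values) for key, values in groups.items()}

    normalized = []
    for protein, residue, rate in sites:
        avg = averages[(protein, residue)]
        if avg == 0:
            continue  # no meaningful ratio can be computed for this group
        normalized.append((protein, residue, rate / avg))
    return normalized
```

With this convention, a value below 1 marks a residue that evolves more slowly than the average residue of its type in the same protein, whatever the overall rate of that protein.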


Figure 5.3. Comparison of residue conservation for 1-state, 2-state and 3-state residues in the human and mouse proteomes.

Comparison of residue conservation for 1-state, 2-state and 3-state residues in the human (A) and mouse (B) proteomes (Wilcoxon tests: n.s.: non-significant; *: p-value < 0.05; **: p-value < 0.01; ***: p-value < 0.001). Panels (C) and (D) show the same analysis on the human proteome using a different measure of evolutionary conservation (Gray & Kumar, 2011) or different sequence alignments (Landry et al, 2009). A green circle indicates phosphorylation, while a blue one indicates O-GlcNAcylation.

In order to avoid a potential bias due to the number of species or the method used to calculate the evolutionary rates, we performed the same analysis using an independent method (Figure 5.3C) (Gray & Kumar, 2011) and alignments that have been used to calculate evolutionary rates in previous studies (Figure 5.3D) (Landry et al, 2009). We also looked at the regions (+/- 5 amino acids) surrounding the 1-state, 2-state and 3-state sites and we found that the regions around 3-state sites tend to be more evolutionarily conserved than those around 1-state and 2-state sites (Figure 5.4).


Figure 5.4. Comparison of the evolutionary conservation of the regions surrounding 1-state, 2-state and 3-state residues (+/- 5 amino acids) for the human proteome.

(Wilcoxon tests: *: p-value < 0.05; **: p-value < 0.01; ***: p-value < 0.001). A green circle indicates phosphorylation, while a blue one indicates O-GlcNAcylation.

This result again supports our hypothesis that phosphorylation and O-GlcNAcylation overall are likely to control independent functions.

5.5.4 – 3-state sites and 2-state sites have different preferences for protein kinases

If 3-state sites allow the regulation of multiple functions, they should have some features that distinguish them from 2-state ones. Both 2-state phosphorylated sites and 3-state sites can be phosphorylated. We would therefore expect 3-state sites to be phosphorylated by a set of kinases that differs from the set that targets 2-state sites. To test this prediction, we determined the kinase-phosphorylation site associations for all the phosphorylated and 3-state residues using NetPhorest (Miller et al, 2008; see also methods) and we compared the likelihoods of being phosphorylated by a given kinase for 2-state phosphorylated residues and 3-state residues. We found that 3-state residues show a clear preference for certain kinases compared to the residues that are phosphorylated only, and this holds true for both mouse and human (Figure 5.5).

Figure 5.5. Kinase preferences of 3-state residues for human and mouse proteins.

Kinase preferences of 3-state residues for human (A) and mouse (B) proteins. The associations were determined using NetPhorest. Colored bars represent significant trends (p-value < 0.05): green indicates significant preference while orange indicates significant avoidance.

Examples of such kinases include ATM/ATR, GSK and RCK. The example of ATM is of particular interest, since this protein is involved in the response to DNA damage and its deregulation leads to cancer (Kastan, 2008). The link between O-GlcNAcylation and ATM has already been shown in a recent study (Miura et al, 2012). Moreover, the function of this protein is also regulated by protein phosphorylation (Kozlov et al, 2011). The role of the 3-state sites in this protein now has to be investigated experimentally in order to understand how they integrate the regulatory programs encoded by phosphorylation and O-GlcNAcylation.


5.6 - Conclusion

Here, we focused on the cross-talk between phosphorylation and O-GlcNAcylation in human and mouse. We sought to extend previous studies on the cross-talk between these two PTMs (Hart et al, 2011; Zeidan & Hart, 2010) by performing a proteome-wide analysis using the most recent proteomics data available. We found that the number of 3-state (glycosylated and phosphorylated) serines and threonines is greater than expected by chance, suggesting that these sites could play a role in protein regulation. Evolutionary conservation is an indicator of functionality (Beltrao et al, 2012; Landry et al, 2009) and our results show that in humans the 3-state sites tend to be significantly more conserved than both types of 2-state sites. This suggests that phosphorylation and O-GlcNAcylation may act as independent switches to regulate two sets of protein functions, since if they acted on the same function we would have expected the 3-state modified sites not to differ in their level of conservation from the 2-state modified ones. Finally, we tried to associate putative functions with the 3-state modified sites and we found that they are more often associated with certain kinases than sites that are phosphorylated only. Our finding that phosphorylation and O-GlcNAcylation of 3-state sites are likely to regulate independent functions does not rule out the possibility that for many 3-state sites phosphorylation and glycosylation act as one single switch. The most realistic scenario is that the cell uses a mixture of these two modes to finely tune its functions. The study of the cross-talk between phosphorylation and O-GlcNAcylation is still at its dawn, but the die is cast.


Chapter 6 – General conclusions

6.1 - Summary of the study

In this thesis we studied the evolution of PTM networks to answer the following key questions:

(i) how a PTM regulatory network is rewired after gene duplication and how this process may contribute to increase the organismal complexity
(ii) how a PTM regulatory network evolves in different species
(iii) how two PTM regulatory networks that share the same target residue interfere with each other and what are the possible functional consequences of this interference

In Chapters 2 and 3 we focused on the first of these questions and determined to what extent gene duplication followed by divergence contributed to rewiring a specific eukaryotic PTM regulatory network: the phosphoregulatory network of the budding yeast S. cerevisiae. Our results (Chapter 2) show that 100 million years of evolution were sufficient to extensively rewire this PTM network. We observed major changes both at the level of phosphorylation sites and at the level of the network of kinases that phosphorylate them, so that 95% of the PTM profiles and up to 50% of the kinase-phosphorylation site associations have changed between paralogs. We then investigated the evolutionary mechanisms responsible for this rewiring and found that phosphorylation sites tended to be lost rather than gained between paralogous proteins. We proposed that this mechanism could contribute to increasing the biological complexity and the fitness of the cell. Indeed, in the case of multi-functional proteins whose functions are regulated by multiple independent phosphorylation sites, a duplication event followed by the differential loss of phosphorylation sites would allow the two duplicates to split the functions between them. Finally, we also showed that at least a fraction of the sites that have been lost between paralogs may actually have been compensated by the emergence of new phosphorylation sites at positions close to the original ones (evolutionary turnover of phosphorylation sites). Overall, our results show the effects of a duplication event on a PTM regulatory network, pointing out the importance of this mechanism in generating biological innovations.


In Chapter 3 we continued on the path paved by Chapter 2 by investigating in detail the evolutionary trajectories of the phosphorylation sites that are lost in WGD paralogs. We found that a significant fraction of them tend to be preferentially lost through substitution to aspartic and glutamic acid (Asp and Glu) residues. This is an interesting finding because these two amino acids have chemical properties that mimic those of phosphorylated serines and threonines (both Asp and Glu are negatively charged amino acids). By looking at the genetic code we noticed that this kind of transition from a phosphorylatable amino acid to a negatively charged one necessarily requires two mutations and implies a non-functional intermediate, i.e. an amino acid that does not carry any negative charge and cannot be phosphorylated. If a site is important to regulate an essential function, the presence of the non-functional intermediate would lead to a fitness defect, making this kind of transition unlikely to occur due to purifying selection. We reasoned that gene duplication would represent a way to bypass this problem, since one of the duplicated proteins could accumulate mutations while the other could exert the original function, allowing the second mutation to occur. We searched the literature and found that examples compatible with this evolutionary scenario, in which gene duplication allows transitions between phosphorylation sites and negatively charged amino acids, have already been observed and reported (e.g. Basu et al, 2008). Our results suggest that this mechanism could be general and that it has the potential to lead to new regulatory opportunities for the cell. Moreover, our results also point out once again the importance of gene duplication as a mechanism that can lead to biological innovations.
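The statement about the genetic code can be checked with a short Python sketch. It builds the standard genetic code and computes the minimum number of nucleotide substitutions separating any codon of one amino acid from any codon of another. The function names `codons_for` and `min_mutations` are illustrative and are not taken from the thesis methods.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(amino_acid):
    """Return all codons that encode the given one-letter amino acid."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


def min_mutations(source, target):
    """Minimum number of nucleotide changes between any codon of `source`
    and any codon of `target`."""
    return min(
        sum(x != y for x, y in zip(a, b))
        for a in codons_for(source)
        for b in codons_for(target)
    )


for src in "ST":
    for dst in "DE":
        print(src, "->", dst, min_mutations(src, dst))
```

Under the standard code, each of the four transitions (Ser or Thr to Asp or Glu) needs at least two nucleotide changes, so any such path passes through an intermediate codon.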

Chapter 4 focuses on the second objective of this thesis by investigating the evolutionary rewiring of the phosphoregulatory network between human and mouse. We found that a large number of phosphorylation sites are conserved between these two species and that, in general, purifying selection is acting to maintain them. However, we also found that many phosphorylation sites are species-specific, since a phosphorylated site in one species corresponds to a non-phosphorylatable residue in the other. These sites represent good candidates to explain the molecular bases of the phenotypic divergence observed between human and mouse. We also found for the first time more subtle differences in phosphoregulation, represented by sites that are conserved at the residue level in the two species but are divergent with respect to phosphoregulation. The biological impact and the functions of these sites now have to be assessed. Finally, we reported some observations that support the hypothesis of the evolutionary turnover of phosphorylation sites, which states that phosphorylation sites can jump to different but close locations during evolution and still retain their biological function. In fact, by aligning the human and mouse phosphoproteins we identified more than 100 site pairs that tended to be found in the same region of the protein more than expected by chance and tended to be phosphorylated by the same kinases. These results suggest that evolutionary turnover should be taken into account when comparing different phosphoproteomes, in order to avoid overestimating the divergence between them.

In the last chapter (Chapter 5) we focused on the third objective of the thesis by studying one of the mechanisms by which PTMs regulate protein functions. Different PTMs can occur at the same residues, meaning that at a given time a residue can carry no PTM or one of the two PTMs but not both at the same time (interference or cross-talk between PTMs). The presence of this interference raises the question of whether both PTMs control the same protein function or each PTM type controls a specific one. We investigated this problem by studying the phosphorylation and O-GlcNAcylation profiles of human and mouse. We first found that these two PTMs tend to occur at the same residues more often than expected by chance, confirming that there is an interference or cross-talk between phosphorylation and O-GlcNAcylation. Further, we found that the residues that have been experimentally found to be both phosphorylated and O-GlcNAcylated tend, in general, to have a higher level of evolutionary conservation than the residues that have been found to be phosphorylated or O-GlcNAcylated only, suggesting that they may be involved in the regulation of a larger set of functions compared to the sites that can carry only one of the two PTMs. Finally, we found that the set of kinases that phosphorylate doubly modified residues differs from the set that phosphorylates the sites that undergo only one modification. All these observations represent a description of the effects of the interference between two PTM regulatory networks at the global level.


6.2 - Perspectives

Although in the last century we have made impressive progress, much work remains to be done to understand how PTM networks are organized, how they evolve and how they are implemented in different organisms. In this section we propose some research paths that emerge from this thesis and that can bring us closer to this objective.

First, our study highlights the need for both large-scale and small-scale experimental studies of PTMs. Large-scale studies are needed in order to saturate the coverage of the PTMs that have been extensively studied up to now (phosphorylation, ubiquitination, glycosylation) and to add data for those that have not been studied yet. To this aim, we need to develop new enrichment protocols that would allow us to take full advantage of the latest generation of mass spectrometry instruments. Small-scale studies are also needed to assign functions to PTMs. Indeed, even for protein phosphorylation, the most studied PTM, although thousands of phosphorylation sites have been reported in the literature, the ones with known function remain a small fraction (Landry et al, 2009).

Another important aspect that needs to be further developed is the study of PTM writers and erasers, for several PTM networks. If we want to study how different PTM networks are integrated and how they participate in cellular regulation, we need to know the spatio-temporal regulatory patterns of the writers and erasers as well as their specificities, in order to link this information to PTM profiles, substrate abundance and PTM site occupancy.

A completely new dimension that has to be explored in order to understand how PTM networks work is how they rewire in different conditions, as also suggested by Beltrao and collaborators (Beltrao et al, 2013) in a recent review. These studies would both provide information about the network configuration in different conditions and allow us to identify the PTMs that are condition-specific, thus helping us in the task of assigning functions to PTMs.


New data about the profiles of different PTMs in different organisms are also needed to study the evolution of PTM networks. At the moment we have limited data even for the model organisms (Gruhler et al, 2005; Holt et al, 2009; Huttlin et al, 2010; Li et al, 2007; Sharma et al, 2014; Zielinska et al, 2009). With more data on model and non-model organisms we could get a better understanding of species divergence. For example, we could study how differences in regulation may reflect phenotypic differences. Further, we could study in more detail the evolutionary events that shaped PTM networks and increased their complexity. Chapters 2 and 3 of this thesis point out the importance of events like gene duplication in these processes. With more data about PTMs (in particular in mammals) we could understand when specific regulations appeared and how the same biological process can be regulated in different ways in different species.

Another limit to the understanding of PTM networks is that comprehensive datasets of PTMs for specific organs or tissues are still largely missing (Hornbeck et al, 2012). With these datasets we could study which parts of the whole PTM regulatory network are of key importance in a cell type, tissue or organ. This approach should be applied to different PTM networks, to understand how they are integrated at different scales, from cells to tissues to organs.

Future studies should also investigate the links between PTM regulation and disease. Even though this topic has not been addressed directly in this thesis, many phosphorylation sites are located in proteins implicated in diseases, but their role remains unknown. The role of PTMs as markers for disease and disease progression has also not been fully assessed, apart from specific cases (e.g. Jin & Zangar, 2009).

All these considerations suggest that we are still far from cracking the code of PTM networks, i.e. the logic and the dynamics by which PTM networks cross-talk with each other to regulate cellular functions. Nevertheless, the studies of the last decade, as well as this study, have set the directions to take in order to get there.


Annex 1 – Supplementary information for Chapter 2

Figure S2.1. Comparison of independent studies allows measuring phosphosite conservation among paralogs.

Cross-study reproducible sites are sites that are phosphorylated in one study and also found to be phosphorylated in another study for the same set of proteins. Cross-study conserved sites are sites found to be phosphorylated in one paralog in one study and in its paralog in a second study. Under a scenario where paralogous proteins were perfectly conserved, these two numbers should be equal as they are equally affected by false positive and false negative identifications. Thus, the ratio of cross-study conservation over cross-study reproducibility provides an estimate of the true state-conservation (Figure 2.1A). Only S/T sites conserved between the two paralogs are considered. 0 and 1 indicate nonphosphorylated and phosphorylated sites, respectively.
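The ratio described in this legend can be sketched in Python as follows. This is only an illustration under simplifying assumptions: sites are represented as hypothetical (protein, position) tuples, `paralog_site` is a hypothetical mapping from a site to its homologous position in the paralog, and the counting used in the actual analysis may differ in its details.

```python
def cross_study_ratio(study_a, study_b, paralog_site):
    """Estimate state conservation from two independent studies.

    study_a, study_b: sets of phosphosites, each a (protein, position) tuple.
    paralog_site: dict mapping a site to the homologous site in its paralog.
    Returns cross-study conservation divided by cross-study reproducibility.
    """
    # Sites seen as phosphorylated in both studies (same protein, same position).
    reproducible = sum(1 for site in study_a if site in study_b)
    # Sites seen in study A whose homologous paralogous site is seen in study B.
    conserved = sum(1 for site in study_a if paralog_site.get(site) in study_b)
    if reproducible == 0:
        raise ValueError("no reproducible sites: the ratio is undefined")
    return conserved / reproducible
```

Because both counts are affected by the same false positive and false negative rates, dividing one by the other is meant to cancel out the incomplete coverage of each study.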


Figure S2.2. Distribution of all PWM scores and of the maximal scores used to assign a protein kinase to a particular phosphosite.

Overall distribution of PWM scores (red) and distribution of the maximal scores for each phosphosite (blue). Maximum scores were used to assign a kinase to each phosphosite. See methods for details.


Figure S2.3. State-conserved sites are more abundant than expected by chance alone.

In order to calculate the expected number of state-conserved sites, we randomly re-assigned phosphosites to conserved S/T sites of the paralogous phosphoproteins and calculated how many occurred at homologous positions. This process was repeated 1000 times. There are 118 sites that occur at homologous positions (236 phosphosites), a number that is 7.4 times higher than expected by chance alone (P << 0.001).
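The randomization described in this legend can be sketched in Python as follows. This is a simplified illustration, not the code used for the figure: it assumes that each paralog pair is summarized by a hypothetical tuple `(n, k1, k2)`, where `n` is the number of conserved S/T positions and `k1` and `k2` are the numbers of phosphosites in each paralog, and that phosphosites are re-assigned uniformly at random among the `n` positions.

```python
import random


def expected_state_conserved(pairs, n_iter=1000, seed=0):
    """Null distribution of the number of state-conserved sites.

    pairs: list of (n, k1, k2) tuples, one per paralog pair (see lead-in).
    Returns a list with one total count per randomization.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_iter):
        total = 0
        for n, k1, k2 in pairs:
            # Re-assign the phosphosites of each paralog to random positions.
            sites_1 = set(rng.sample(range(n), k1))
            sites_2 = set(rng.sample(range(n), k2))
            # Count positions phosphorylated in both paralogs.
            total += len(sites_1 & sites_2)
        totals.append(total)
    return totals


def empirical_p(observed, totals):
    """One-sided empirical p-value with the usual +1 correction."""
    hits = sum(1 for t in totals if t >= observed)
    return (hits + 1) / (len(totals) + 1)
```

The fold enrichment reported in the legend would then correspond to the observed count divided by the mean of the returned list.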


Figure S2.4. Correlation between the relative abundance of kinases assigned to the global phosphoproteome and those found in the WGD phosphoproteome.

There is a strong positive correlation between the relative fractions of kinases assigned to the global phosphoproteome and those found in the WGD phosphoproteome (rho = 0.99, p-value < 2×10^-16). This suggests that there is no specific group of kinases that preferentially phosphorylates WGD phosphoproteins.


Figure S2.5. State-diverged nonphosphorylated S/T sites likely comprise 50% of nonphosphorylated sites.

Nonphosphorylated S/T of the state-diverged sites likely comprise cases that are false negative identifications. Because sites that have not been reported to be phosphorylated (i.e. randomly selected S/T) have lower PWM scores than phosphorylated ones (Figure 2.1F), we can use PWM scores to estimate this proportion. Thus, in order to estimate the true ratio of nonphosphorylated S/T sites in state-diverged nonphosphorylated sites, we re-created the PWM score distribution ii (sites not found to be phosphorylated among the state-diverged sites) from Figure 2.1F by randomly sampling different ratios of PWM scores from distribution i (non-phosphorylated) and distribution iv (state-conserved phosphorylated sites). We calculated the median of these distributions and iterated this procedure 100 times. Each box-plot represents the distribution of the 100 medians calculated for each of the ratios considered. A mixture of 50% of each distribution gives the same median as the median score of the state-diverged nonphosphorylated S/T.
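The mixture procedure described in this legend can be sketched in Python as follows. This is an illustration under stated assumptions: scores are drawn with replacement, the mixture size is chosen by the caller, and the function name `mixture_medians` and its parameters are hypothetical. The sampling scheme used for the figure may differ.

```python
import random
from statistics import median


def mixture_medians(scores_nonphos, scores_phos, fraction_phos, size,
                    n_iter=100, seed=0):
    """Medians of PWM-score mixtures for one mixing ratio.

    scores_nonphos: scores of non-phosphorylated sites (distribution i).
    scores_phos: scores of state-conserved phosphorylated sites (distribution iv).
    fraction_phos: fraction of the mixture drawn from scores_phos (0 to 1).
    size: number of scores in each simulated mixture (must be positive).
    Returns a list of n_iter medians, one per simulated mixture.
    """
    rng = random.Random(seed)
    n_phos = round(size * fraction_phos)
    medians = []
    for _ in range(n_iter):
        # Assumption: sampling with replacement from each distribution.
        sample = (rng.choices(scores_phos, k=n_phos)
                  + rng.choices(scores_nonphos, k=size - n_phos))
        medians.append(median(sample))
    return medians
```

Running the function over a range of `fraction_phos` values and comparing each set of medians with the observed median of distribution ii reproduces the logic of the box-plots.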


Figure S2.6. Gains and losses of phosphosites for the literature curated dataset (Dataset 2) (Nguyen Ba & Moses, 2010).

This dataset contains 394 phosphosites mapped on 118 proteins. The comparison of the number of gains and losses with the respective null models (see methods) confirms the trend observed in phosphoproteomics experiments. Random sampling was performed as detailed for Figure 2.2C. The black square represents the observed numbers in each case.


Figure S2.7. Gains and losses of phosphosites for the unfiltered dataset (Dataset 3).

The comparison of the number of gains and losses with the random expectation confirms the trend for phosphosites to be preferentially lost. Here all phosphosites that correspond to phosphopeptides that are assigned to more than one protein are also considered. Random sampling is done as detailed for Figure 2.2C. The black square represents the observed numbers in each case.


Figure S2.8. Gains and losses of phosphosites for Dataset 4 (additional species considered for the inference of the ancestral sequence).

The comparison of the number of gains and losses with the random expectation confirms the trend for phosphosites to be preferentially lost. Here the ancestral sequences were reconstructed as detailed in the methods section but including an additional species that diverged from S. cerevisiae prior to the WGD. Random sampling was done as detailed for Figure 2.2C. The black square represents the observed numbers in each case.


Supplementary Files

Supplementary files can be found at the address: http://www.bio.ulaval.ca/landrylab/download

File S2.1 Phosphosites identified in L. kluyveri. The file includes information about the position of the site along the protein, the type of residue and the confidence score.

File S2.2 Mini-website of alignments of the paralogous pairs, their orthologs and the ancestral sequences. Disordered regions of proteins are indicated by asterisks and phosphorylation sites are in bold.


Annex 2 – Supplementary information for Chapter 4

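| Study | Human #prot | Human S | Human T | Human Y | Mouse #prot | Mouse S | Mouse T | Mouse Y |
|---|---|---|---|---|---|---|---|---|
| Minguez et al. | 5899 | 25799 | 8179 | 6631 | 3278 | 11474 | 2689 | 1708 |
| Beltrao et al. | 7349 | 28666 | 9240 | 8460 | 5760 | 21876 | 4706 | 2361 |
| Phosida et al. | 2312 | 7948 | 2056 | 454 | 2818 | 10001 | 1704 | 238 |
| HPRD | 4670 | 21493 | 6511 | 2340 | - | - | - | - |
| Phosphosite.Org | 8355 | 39796 | 16160 | 15610 | 5607 | 26010 | 6538 | 3958 |
| phosphoELM | 2923 | 11085 | 2743 | 1203 | 1510 | 3070 | 636 | 379 |
| Huttlin et al. | - | - | - | - | 3193 | 14679 | 2849 | 426 |
| Non-redundant | 12341 | 61401 | 24257 | 20962 | 8179 | 38718 | 9821 | 5501 |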

Table S4.1. Number of phosphoproteins and phosphorylation sites (sorted by phosphorylatable residue) for all the studies we considered as well as the corresponding non-redundant values.

| Protein | Site | Sequence | NetPhorest score | Predicted kinase |
|---|---|---|---|---|
| NUCL_HUMAN | ENSP00000318195_28 | PKEVEEDSEDEEMSE | 0.649539 | CK2 |
| NUCL_MOUSE | ENSMUSP00000027438_28 | PKEVEEDSEDEEMSE | 0.649539 | CK2 |
| PAXB1_HUMAN | ENSP00000328992_262 | REDENDASDDEDDDE | 0.649397 | CK2 |
| PAXB1_MOUSE | ENSMUSP00000113835_264 | REDENDASDDEDDDE | 0.649397 | CK2 |
| ARI4A_HUMAN | ENSP00000347602_160 | DEKEEESSEEEDEDK | 0.649337 | CK2 |
| ARI4A_MOUSE | ENSMUSP00000035512_160 | DEKEEESSEEEDEDK | 0.649337 | CK2 |
| B3KYA7_HUMAN | ENSP00000429744_344 | LEEEEENSDEDELDS | 0.648665 | CK2 |
| B3KYA7_MOUSE | ENSMUSP00000018476_316 | LEEEEENSDEDELDS | 0.648665 | CK2 |
| RPC7L_HUMAN | ENSP00000358320_163 | KKEEEVTSEEDEEKE | 0.648586 | CK2 |
| RPC7L_MOUSE | ENSMUSP00000089544_163 | KKEEEVTSEEDEEKE | 0.648586 | CK2 |
| ARI4A_HUMAN | ENSP00000347602_159 | EDEKEEESSEEEDED | 0.648118 | CK2 |
| ARI4A_MOUSE | ENSMUSP00000035512_159 | EDEKEEESSEEEDED | 0.648118 | CK2 |
| ARI4B_HUMAN | ENSP00000355562_295 | EKEKEDNSSEEEEEI | 0.647742 | CK2 |
| ARI4B_MOUSE | ENSMUSP00000106163_295 | EKEKEDNSSEEEEEI | 0.647742 | CK2 |
| SENP3_HUMAN | ENSP00000403712_75 | PSFDASASEEEEEEE | 0.647679 | CK2 |
| SENP3_MOUSE | ENSMUSP00000005336_73 | PSFDASASEEEEEEE | 0.647679 | CK2 |
| TBD2B_HUMAN | ENSP00000300584_957 | PDKGELVSDEEEDT | 0.647633 | CK2 |
| TBD2B_MOUSE | ENSMUSP00000045413_959 | PDKGELVSDEEEDT | 0.647633 | CK2 |
| U5S1_HUMAN | ENSP00000392094_19 | YIGPELDSDEDDDEL | 0.647600 | CK2 |
| U5S1_MOUSE | ENSMUSP00000021306_19 | YIGPELDSDEDDDEL | 0.647600 | CK2 |
| NIN_HUMAN | ENSP00000245441_1145 | VTRRHVLSDLEDDEV | 0.632661 | CK2 |
| NIN_MOUSE | ENSMUSP00000082422_1133 | PATKHFLSDLGDHEA | 0.103702 | DMPK |
| F111A_HUMAN | ENSP00000434435_607 | QQDVEMMSDEDL | 0.633808 | CK2 |
| F111A_MOUSE | ENSMUSP00000119518_610 | VQNVEMLSIDF | 0.139290 | CK2 |
| OSTP_HUMAN | ENSP00000378517_191 | ATDEDITSHMESEEL | 0.601545 | CK2 |
| OSTP_MOUSE | ENSMUSP00000031243_176 | ATDEDLTSHMKSGES | 0.107948 | CK1 |
| ORC2_HUMAN | ENSP00000234296_177 | LIVPRSHSDSESEYS | 0.576608 | CK2 |
| ORC2_MOUSE | ENSMUSP00000027198_176 | IIASRSHYDSESEYS | 0.090376 | MAP2K6_MAP2K3_MAP2K4_MAP2K7 |
| K1551_HUMAN | ENSP00000310338_1198 | NSIKNSSSEEEKQKE | 0.602660 | CK2 |
| K1551_MOUSE | ENSMUSP00000041180_956 | VPQCHCSSTEKKEKD | 0.119368 | ACTR2_ACTR2B_TGFbR2 |
| LTV1_HUMAN | ENSP00000356548_211 | YDSAGLLSDEDCMSV | 0.612627 | CK2 |
| LTV1_MOUSE | ENSMUSP00000019950_206 | RSSAGFLSDGGDLSA | 0.129578 | CK2 |
| RBP2_HUMAN | ENSP00000283195_2583 | KCELSKNSDIEQSSD | 0.593321 | CK2 |
| RBP2_MOUSE | ENSMUSP00000003310_2421 | KCELPQNSDIKQSSD | 0.115794 | GRK |
| SYCC_HUMAN | ENSP00000369897_264 | LTGEEVNSCVEVLLE | 0.567883 | CK2 |
| SYCC_MOUSE | ENSMUSP00000010899_264 | LSGEEVDSKVQVLL | 0.093388 | CK2 |
| TSN1_HUMAN | ENSP00000361072_156 | CCGFTNYTDFEDSPY | 0.585526 | CK2 |
| TSN1_MOUSE | ENSMUSP00000030465_156 | CCGFNNYTDFNASRF | 0.115104 | CK2 |
| SETB1_HUMAN | ENSP00000271640_474 | LSPQAGDSDLESQLA | 0.543830 | CK2 |
| SETB1_MOUSE | ENSMUSP00000015841_473 | LSPQAADTESLESQL | 0.080428 | CK2 |

Table S4.2. Comparison of StC and StD sites. The first ten site pairs present in the table are the pairs of StC sites with the highest NetPhorest scores. The last ten pairs are the pairs of StD sites with the highest difference of NetPhorest scores between the phosphorylated site and its non-phosphorylated counterpart. Green rows refer to phosphorylated sites while grey to non-phosphorylated ones. Differences between orthologous 15mers centered on each site are highlighted in yellow.

| Protein ID | Description | Human site | Mouse site |
|---|---|---|---|
| PKP2 | plakophilin 2 | ENSP00000070846_203 | ENSMUSP00000036890_181 |
| PKP2 | plakophilin 2 | ENSP00000070846_267 | ENSMUSP00000036890_235 |
| PNN | pinin, desmosome associated protein | ENSP00000216832_552 | ENSMUSP00000021381_559 |
| NUP210 | nucleoporin 210kDa | ENSP00000254508_1862 | ENSMUSP00000032179_1839 |
| NUP210 | nucleoporin 210kDa | ENSP00000254508_1863 | ENSMUSP00000032179_1839 |
| VPS13C | vacuolar protein sorting 13 homolog C (S. cerevisiae) | ENSP00000261517_542 | ENSMUSP00000077040_839 |
| VPS13C | vacuolar protein sorting 13 homolog C (S. cerevisiae) | ENSP00000261517_734 | ENSMUSP00000077040_839 |
| VPS13C | vacuolar protein sorting 13 homolog C (S. cerevisiae) | ENSP00000261517_736 | ENSMUSP00000077040_839 |
| VPS13C | vacuolar protein sorting 13 homolog C (S. cerevisiae) | ENSP00000261517_1902 | ENSMUSP00000077040_1956 |
| DSG2 | desmoglein 2 | ENSP00000261590_984 | ENSMUSP00000057096_921 |
| ADNP2 | ADNP homeobox 2 | ENSP00000262198_1024 | ENSMUSP00000068560_1052 |
| LRP2 | low density lipoprotein receptor-related protein 2 | ENSP00000263816_4527 | ENSMUSP00000079752_4533 |
| LRP2 | low density lipoprotein receptor-related protein 2 | ENSP00000263816_4634 | ENSMUSP00000079752_4632 |
| EHBP1 | EH domain binding protein 1 | ENSP00000263991_751 | ENSMUSP00000105191_765 |
| EHBP1 | EH domain binding protein 1 | ENSP00000263991_769 | ENSMUSP00000105191_765 |
| ALMS1 | Alstrom syndrome 1 | ENSP00000264448_2751 | ENSMUSP00000071904_1916 |
| ALMS1 | Alstrom syndrome 1 | ENSP00000264448_2754 | ENSMUSP00000071904_1916 |
| NBN | nibrin | ENSP00000265433_402 | ENSMUSP00000029879_429 |
| NBN | nibrin | ENSP00000265433_497 | ENSMUSP00000029879_533 |
| NBN | nibrin | ENSP00000265433_516 | ENSMUSP00000029879_543 |
| FANCM | Fanconi anemia, complementation group M | ENSP00000267430_1413 | ENSMUSP00000054797_1379 |
| FANCM | Fanconi anemia, complementation group M | ENSP00000267430_1673 | ENSMUSP00000054797_1638 |
| FANCM | Fanconi anemia, complementation group M | ENSP00000267430_1686 | ENSMUSP00000054797_1638 |
| FANCM | Fanconi anemia, complementation group M | ENSP00000267430_1693 | ENSMUSP00000054797_1638 |
| FANCM | Fanconi anemia, complementation group M | ENSP00000267430_1721 | ENSMUSP00000054797_1638 |
| C10orf47 | chromosome 10 open reading frame 47 | ENSP00000277570_146 | ENSMUSP00000060780_225 |
| UHRF1BP1L | UHRF1 binding protein 1-like | ENSP00000279907_446 | ENSMUSP00000020112_797 |
| PDE3B | phosphodiesterase 3B, cGMP-inhibited | ENSP00000282096_561 | ENSMUSP00000032909_536 |
| RANBP2 | RAN binding protein 2 | ENSP00000283195_1146 | ENSMUSP00000003310_1141 |
| RANBP2 | RAN binding protein 2 | ENSP00000283195_2802 | ENSMUSP00000003310_2638 |
| RANBP2 | RAN binding protein 2 | ENSP00000283195_2807 | ENSMUSP00000003310_2641 |
| ZNF646 | zinc finger protein 646 | ENSP00000300850_1448 | ENSMUSP00000052641_1412 |
| CLSPN | claspin | ENSP00000312995_69 | ENSMUSP00000045344_84 |
| CLSPN | claspin | ENSP00000312995_949 | ENSMUSP00000045344_948 |
| CLSPN | claspin | ENSP00000312995_955 | ENSMUSP00000045344_948 |
| CLSPN | claspin | ENSP00000312995_1161 | ENSMUSP00000045344_1123 |
| DAB2 | disabled homolog 2, mitogen-responsive phosphoprotein (Drosophila) | ENSP00000313391_723 | ENSMUSP00000079689_731 |
| FAM123C | family with sequence similarity 123C | ENSP00000314914_307 | ENSMUSP00000054748_267 |
| MAP1S | microtubule-associated protein 1S | ENSP00000325313_582 | ENSMUSP00000019405_532 |
| MAP1S | microtubule-associated protein 1S | ENSP00000325313_640 | ENSMUSP00000019405_573 |
| MAP1S | microtubule-associated protein 1S | ENSP00000325313_643 | ENSMUSP00000019405_573 |
| DDX24 | DEAD (Asp-Glu-Ala-Asp) box polypeptide 24 | ENSP00000328690_302 | ENSMUSP00000105628_329 |
| LRRC16A | leucine rich repeat containing 16A | ENSP00000331983_1314 | ENSMUSP00000072662_1320 |
| EFCAB13 | EF-hand calcium binding domain 13 | ENSP00000332111_385 | ENSMUSP00000116040_452 |
| BMP2K | BMP2 inducible kinase | ENSP00000334836_728 | ENSMUSP00000037970_715 |
| BMP2K | BMP2 inducible kinase | ENSP00000334836_1011 | ENSMUSP00000037970_888 |
| BMP2K | BMP2 inducible kinase | ENSP00000334836_1080 | ENSMUSP00000037970_888 |
| KIF18B | kinesin family member 18B | ENSP00000341466_676 | ENSMUSP00000021311_558 |
| FGD6 | FYVE, RhoGEF and PH domain containing 6 | ENSP00000344446_553 | ENSMUSP00000020208_557 |
| FGD6 | FYVE, RhoGEF and PH domain containing 6 | ENSP00000344446_632 | ENSMUSP00000020208_557 |
| FGD6 | FYVE, RhoGEF and PH domain containing 6 | ENSP00000344446_693 | ENSMUSP00000020208_557 |
| HTT | huntingtin | ENSP00000347184_411 | ENSMUSP00000078945_638 |
| MKL1 | megakaryoblastic leukemia (translocation) 1 | ENSP00000347847_295 | ENSMUSP00000105207_335 |
| MKL1 | megakaryoblastic leukemia (translocation) 1 | ENSP00000347847_305 | ENSMUSP00000105207_345 |
| SVIL | supervillin | ENSP00000348128_221 | ENSMUSP00000115078_218 |
| SVIL | supervillin | ENSP00000348128_226 | ENSMUSP00000115078_218 |
| SVIL | supervillin | ENSP00000348128_240 | ENSMUSP00000115078_218 |
| SVIL | supervillin | ENSP00000348128_253 | ENSMUSP00000115078_218 |
| SVIL | supervillin | ENSP00000348128_253 | ENSMUSP00000115078_248 |
| SVIL | supervillin | ENSP00000348128_261 | ENSMUSP00000115078_248 |
| SVIL | supervillin | ENSP00000348128_263 | ENSMUSP00000115078_248 |
| SVIL | supervillin | ENSP00000348128_914 | ENSMUSP00000115078_857 |
| CLCC1 | chloride channel CLIC-like 1 | ENSP00000349456_506 | ENSMUSP00000102224_502 |
| CLCC1 | chloride channel CLIC-like 1 | ENSP00000349456_509 | ENSMUSP00000102224_503 |
| PDE3A | phosphodiesterase 3A, cGMP-inhibited | ENSP00000351957_475 | ENSMUSP00000038749_472 |
| PDE3A | phosphodiesterase 3A, cGMP-inhibited | ENSP00000351957_523 | ENSMUSP00000038749_526 |
| PCNT | pericentrin | ENSP00000352572_1703 | ENSMUSP00000001179_1444 |
| PCNT | pericentrin | ENSP00000352572_2370 | ENSMUSP00000001179_1990 |
| PARP9 | poly (ADP-ribose) polymerase family, member 9 | ENSP00000353512_61 | ENSMUSP00000110528_20 |
| C15orf39 | open reading frame 39 | ENSP00000353854_586 | ENSMUSP00000034846_579 |
| RP5-862P8.2 | Mitogen-activated protein kinase kinase kinase MLK4 | ENSP00000355583_542 | ENSMUSP00000034316_521 |
| RP5-862P8.2 | Mitogen-activated protein kinase kinase kinase MLK4 | ENSP00000355583_546 | ENSMUSP00000034316_521 |
| PTPRC | protein tyrosine phosphatase, receptor type, C | ENSP00000356346_1281 | ENSMUSP00000027645_1265 |
| PTPRC | protein tyrosine phosphatase, receptor type, C | ENSP00000356346_1287 | ENSMUSP00000027645_1271 |
| CEP350 | centrosomal protein 350kDa | ENSP00000356579_1195 | ENSMUSP00000120085_1200 |
| CEP350 | centrosomal protein 350kDa | ENSP00000356579_1219 | ENSMUSP00000120085_1200 |
| CEP350 | centrosomal protein 350kDa | ENSP00000356579_2204 | ENSMUSP00000120085_2219 |
| CEP350 | centrosomal protein 350kDa | ENSP00000356579_2238 | ENSMUSP00000120085_2219 |
| CEP350 | centrosomal protein 350kDa | ENSP00000356579_2239 | ENSMUSP00000120085_2221 |
| F5 | coagulation factor V (proaccelerin, labile factor) | ENSP00000356770_1155 | ENSMUSP00000083204_903 |
| KNDC1 | kinase non-catalytic C-lobe domain (KIND) containing 1 | ENSP00000357561_257 | ENSMUSP00000050586_267 |
| SLK | STE20-like kinase | ENSP00000358770_569 | ENSMUSP00000049977_554 |
| DST | dystonin | ENSP00000359790_2635 | ENSMUSP00000110756_2521 |
| ZNF217 | zinc finger protein 217 | ENSP00000360526_568 | ENSMUSP00000104783_621 |
| ATRX | alpha thalassemia/mental retardation syndrome X-linked | ENSP00000362441_33 | ENSMUSP00000109203_43 |
| ATRX | alpha thalassemia/mental retardation syndrome X-linked | ENSP00000362441_52 | ENSMUSP00000109203_43 |
| ATRX | alpha thalassemia/mental retardation syndrome X-linked | ENSP00000362441_65 | ENSMUSP00000109203_35 |
| ATRX | alpha thalassemia/mental retardation syndrome X-linked | ENSP00000362441_706 | ENSMUSP00000109203_669 |
| ATRX | alpha thalassemia/mental retardation syndrome X-linked | ENSP00000362441_899 | ENSMUSP00000109203_871 |
| ATRX | alpha thalassemia/mental retardation syndrome X-linked | ENSP00000362441_1068 | ENSMUSP00000109203_1075 |
| ITPR3 | inositol 1,4,5-trisphosphate receptor, type 3 | ENSP00000363435_1861 | ENSMUSP00000038150_1831 |
| SPEN | spen homolog, transcriptional regulator (Drosophila) | ENSP00000364912_1622 | ENSMUSP00000101412_1669 |
| SPEN | spen homolog, transcriptional regulator (Drosophila) | ENSP00000364912_2014 | ENSMUSP00000101412_2107 |
| SPEN | spen homolog, transcriptional regulator (Drosophila) | ENSP00000364912_2486 | ENSMUSP00000101412_2402 |
| ATXN2 | ataxin 2 | ENSP00000366843_872 | ENSMUSP00000056715_830 |
| RTN3 | reticulon 3 | ENSP00000367050_268 | ENSMUSP00000065810_225 |
| HIVEP1 | human immunodeficiency virus type I enhancer binding protein 1 | ENSP00000368698_479 | ENSMUSP00000056147_591 |
| BRCA2 | breast cancer 2, early onset | ENSP00000369497_384 | ENSMUSP00000038576_400 |
| SHROOM2 | shroom family member 2 | ENSP00000370299_229 | ENSMUSP00000098701_237 |
| FANCA | Fanconi anemia, complementation group A | ENSP00000373952_850 | ENSMUSP00000045217_1071 |
| CCDC88C | coiled-coil domain containing 88C | ENSP00000374507_1023 | ENSMUSP00000082177_794 |
| ACD | adrenocortical dysplasia homolog (mouse) | ENSP00000377496_411 | ENSMUSP00000048180_290 |
| MYLK3 | myosin light chain kinase 3 | ENSP00000378288_450 | ENSMUSP00000034133_432 |
| GLI3 | GLI family zinc finger 3 | ENSP00000379258_850 | ENSMUSP00000106137_851 |
| CD44 | CD44 molecule (Indian blood group) | ENSP00000398632_686 | ENSMUSP00000005218_728 |
| CD44 | CD44 molecule (Indian blood group) | ENSP00000398632_717 | ENSMUSP00000005218_728 |
| CD44 | CD44 molecule (Indian blood group) | ENSP00000398632_717 | ENSMUSP00000005218_773 |
| GORASP2 | golgi reassembly stacking protein 2, 55kDa | ENSP00000410208_448 | ENSMUSP00000028509_432 |
| TOP2A | topoisomerase (DNA) II alpha 170kDa | ENSP00000411532_1360 | ENSMUSP00000068896_1379 |
ENSMUSP00000068896_13 TOP2A
topoisomerase (DNA) II alpha 170kDa ENSP00000411532_1361 79 ENSMUSP00000068896_13 TOP2A topoisomerase (DNA) II alpha 170kDa ENSP00000411532_1392 67 ENSMUSP00000068896_14 TOP2A topoisomerase (DNA) II alpha 170kDa ENSP00000411532_1495 69 ENSMUSP00000109855_10 C9orf172 chromosome 9 open reading frame 172 ENSP00000412388_104 5 CAMSAP3 calmodulin regulated spectrin-associated protein ENSP00000416797_704 ENSMUSP00000125993_58

117

family, member 3 3 calmodulin regulated spectrin-associated protein ENSMUSP00000125993_88 CAMSAP3 family, member 3 ENSP00000416797_811 2 ENSMUSP00000092964_64 CCDC110 coiled-coil domain containing 110 ENSP00000427246_807 4 ENSMUSP00000106118_13 PRAGMIN Tyrosine-protein kinase SgK223 ENSP00000428054_148 1 RNF214 ring finger protein 214 ENSP00000431643_56 ENSMUSP00000060941_48 ENSMUSP00000045559_18 DMXL1 Dmx-like 1 ENSP00000439479_1841 29 solute carrier family 1 (neutral amino acid SLC1A5 transporter), member 5 ENSP00000444408_9 ENSMUSP00000104136_33 ENSMUSP00000021494_31 C14orf38 chromosome 14 open reading frame 38 ENSP00000452964_279 3 obscurin, cytoskeletal calmodulin and titin- ENSMUSP00000038264_32 OBSCN interacting RhoGEF ENSP00000455507_3159 81 ENSMUSP00000009713_84 MKL2 MKL/myocardin-like 2 ENSP00000459626_852 6

Table S4.3. List of evolutionary clustered sites. For each pair of evolutionary clustered sites, the list gives the name of the protein, a description of the protein and the identifiers of the two sites.

Protein ID | Description | Num. ECS
SVIL | supervillin | 8
ATRX | alpha thalassemia/mental retardation syndrome X-linked | 6
FANCM | Fanconi anemia, complementation group M | 5
CEP350 | centrosomal protein 350kDa | 5
TOP2A | topoisomerase (DNA) II alpha 170kDa | 4
VPS13C | vacuolar protein sorting 13 homolog C (S. cerevisiae) | 4
CLSPN | claspin | 4
SPEN | spen homolog, transcriptional regulator (Drosophila) | 3
BMP2K | BMP2 inducible kinase | 3
NBN | nibrin | 3
CD44 | CD44 molecule (Indian blood group) | 3
MAP1S | microtubule-associated protein 1S | 3
FGD6 | FYVE, RhoGEF and PH domain containing 6 | 3
RBP2 | RAN binding protein 2 | 3

Table S4.4. List of proteins with more than two evolutionary clustered sites. For each protein, the list gives the protein name, a description of the protein and the number of evolutionary clustered sites (Num. ECS).


Figure S4.1. Algorithm to detect evolutionary clustered sites (ECS). (A) Estimation of the colocalization of phosphorylation sites inside a window of length L. Calculations were performed for windows of amino acids of increasing length. (B) Shuffling of phosphorylation sites respecting their biochemical properties (residue: S, T or Y; location in ordered/disordered regions) and calculation of the null expectations for the colocalization inside a window of length L. Calculations were performed for windows of amino acids of increasing length. (C) Comparison of the observed and expected values of colocalization. (D) Determination of the closest phosphorylation sites for which the observed colocalization score is higher than expected by chance (null expectation).
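The comparison of observed and expected colocalization described in panels (A) to (C) can be illustrated with a minimal Python sketch. This is not the code used in this work. The function names, the use of a simple distance threshold in place of a window of length L, and the single pool of candidate positions per species (standing in for residues of the same type and the same order/disorder class) are all assumptions made for illustration.

```python
import random

def colocalized(human_sites, mouse_sites, window):
    """Count human sites that have at least one mouse site within `window` residues.

    Positions are assumed to be expressed in shared alignment coordinates.
    """
    return sum(
        any(abs(h - m) <= window for m in mouse_sites) for h in human_sites
    )

def null_counts(human_sites, mouse_sites, human_pool, mouse_pool,
                window, n_iter=1000, seed=0):
    """Shuffle sites among candidate positions and recompute the count.

    `human_pool` and `mouse_pool` are hypothetical lists of positions that
    share the biochemical properties of the observed sites.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_iter):
        h = rng.sample(human_pool, len(human_sites))
        m = rng.sample(mouse_pool, len(mouse_sites))
        counts.append(colocalized(h, m, window))
    return counts

def empirical_p(observed, counts):
    """One-sided empirical p-value with the usual +1 correction."""
    return (1 + sum(c >= observed for c in counts)) / (1 + len(counts))

# Toy example with made-up positions, repeated for increasing window sizes.
human = [10, 50, 120]
mouse = [12, 90, 118]
pool = list(range(0, 200, 3))
for w in (2, 5, 10):
    obs = colocalized(human, mouse, w)
    p = empirical_p(obs, null_counts(human, mouse, pool, pool, w))
    print(w, obs, p)
```

Scanning the window size from small to large and keeping the smallest one at which the observed count exceeds the null expectation mirrors the logic of panel (D).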


Figure S4.2. Comparison of human and mouse phosphorylation sites present in our dataset. (A) Global number of phosphorylation sites. (B) Proportion of the different phosphorylated residues (S: serine, T: threonine, Y: tyrosine).

Figure S4.3. Localization of SiD, StC and StD sites. Fraction of sites located in disordered, ordered or mixed regions for each of the three categories and comparison with the expectations. Mixed regions are regions where one site is located in a disordered region while the orthologous one is located in an ordered region.


Figure S4.4. Conservation and divergence of clusters of poly-S/T/Y. There are 158,970 poly-S/T/Y clusters (stretches of two or more consecutive S/T/Y residues) in the human proteome and 158,022 in the mouse proteome. We defined three categories of clusters: i) site-diverged clusters (SiD-c): human or mouse clusters that do not overlap with a cluster in the other species, even though they can overlap with single phosphorylation sites; ii) state-conserved clusters (StC-c): overlapping human and mouse clusters in which both the human and the mouse clusters contain at least one phosphorylation site; iii) state-diverged clusters (StD-c): overlapping human and mouse clusters in which only one of the two clusters contains at least one phosphorylation site. The plots show the number of observed SiD-c, StC-c and StD-c clusters of poly-S/T/Y (orange dots) and the comparison to random expectations (distributions in grey). The null model was generated by 1,000 iterations in which human and mouse clusters were randomized.
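The cluster definitions above can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: both sequences are taken to be in the same (gap-free) alignment coordinates, the function names are hypothetical, and clusters that overlap but carry no phosphorylation site in either species are returned as None because the legend does not assign them to a category.

```python
import re

def find_clusters(seq):
    """Return half-open (start, end) spans of two or more consecutive S/T/Y."""
    return [m.span() for m in re.finditer(r"[STY]{2,}", seq)]

def has_site(span, phospho_sites):
    """True if any phosphorylated position falls inside the span."""
    return any(span[0] <= p < span[1] for p in phospho_sites)

def classify(span, phos_here, other_spans, phos_other):
    """Classify one cluster against the clusters of the other species."""
    overlapping = [o for o in other_spans if o[0] < span[1] and span[0] < o[1]]
    if not overlapping:
        return "SiD-c"   # no overlapping cluster in the other species
    here = has_site(span, phos_here)
    other = any(has_site(o, phos_other) for o in overlapping)
    if here and other:
        return "StC-c"   # both clusters carry a phosphorylation site
    if here != other:
        return "StD-c"   # only one of the two carries a site
    return None          # overlapping but unphosphorylated in both

# Toy sequences and phosphorylation positions (made up for illustration).
human_seq, mouse_seq = "ASSTKYYA", "SSTAKAYY"
human_phos, mouse_phos = {2}, {1}
mouse_clusters = find_clusters(mouse_seq)
for span in find_clusters(human_seq):
    print(span, classify(span, human_phos, mouse_clusters, mouse_phos))
```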


Figure S4.5. Analysis of position weight matrix (PWM) scores for the different classes of sites and probability of being phosphorylated by the same protein kinase. (A) Comparison of the distributions of PWM scores for human and mouse phosphorylated and non-phosphorylated residues (Wilcoxon tests). (B) Comparison of the distributions of PWM scores for StD sites (Wilcoxon tests; *: p-value < 0.05; **: p-value < 0.01; ***: p-value < 0.001). (C) Correlation between human and mouse PWM scores for StC sites (red) and StD sites phosphorylated in human but not in mouse (black). (D) Correlation between human and mouse PWM scores for StC sites (red) and StD sites phosphorylated in mouse but not in human (black). (E) Proportion of phosphorylated sites that have higher PWM scores compared to their corresponding site in the other species for StC and StD sites. (F) Proportion of sites phosphorylated by the same kinase for the different categories of sites (StD: state diverged, StC: state conserved, ECS: evolutionary clustered sites). Black dots represent the observed proportion. Orange lines represent the range of proportions expected by chance. The histogram shows the distribution of random expectations for ECS. P-value for StD and StC: < 0.00001; p-value for ECS: 0.006.


Figure S4.6. NetPhorest scores and phosphorylation sites. Fraction of phosphorylated sites (human and mouse) as a function of the NetPhorest score.

[Figure S4.7, panels A and B: density (y-axis, 0 to 5) as a function of the NetPhorest score (x-axis, 0.0 to 0.7) for StC, StD (pho.), StD (non-pho.) and non-phosphorylated sites.]

Figure S4.7. Distributions of NetPhorest scores for the different classes of sites. (A,B) Distribution of NetPhorest scores for StC, StD and non-phosphorylated sites. Non-phosphorylated sites (red) are orthologous sites that are conserved at the residue level and both non-phosphorylated according to our phosphoproteomics data. For StD sites (in which one site is phosphorylated while the orthologous one is phosphorylatable but not phosphorylated) we present two distributions: one for phosphorylated residues and another for non-phosphorylated residues.

Figure S4.8. Relationship between NetPhorest scores in state-conserved sites and protein abundance. Distributions of NetPhorest scores for state-conserved sites (only the scores for the human residue were considered) for four classes of relative protein abundance.


Figure S4.9. Distribution of the number of evolutionary clustered sites per protein.


Figure S4.10. Distance between evolutionary clustered sites. (A) Proportion of evolutionary clustered sites as a function of the length of the window (expressed in number of amino acids) in which the clustered sites are contained. (B) Cumulative distribution of the proportion of evolutionary clustered sites as a function of the distance between them (1-100 aa).

Figure S4.11. Relationship between evolutionary clustered sites and available sites. (A) Proportion of protein pairs having evolutionary clustered sites as a function of the available sites (SiD sites). (B) Distribution of available sites present in the proteins that have evolutionary clustered sites.


Supplementary files

Supplementary files can be found at the address: http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1004062#s4

Dataset S1. Alignments of orthologous mammalian proteins for the 68 proteins that show significant clustering of SiD phosphorylation sites (i.e. that contain evolutionary clustered sites). ENSEMBL IDs of the aligned proteins are provided. Alignments are in table format. The column IDs provide information about the organism (following the ENSEMBL convention; e.g. “hsa” indicates Homo sapiens and “mus”, Mus musculus) and the type of data included (“aa” for amino acid and “p” for phosphorylation).

Dataset S2. Human and mouse phosphorylation sites for the 11,150 proteins present in our dataset (i.e. that contains evolutionary clustered sites). ENSEMBL IDs of the aligned proteins are provided. The alignment is in table format. The column IDs provide information on the organism (following the ENSEMBL convention; e.g. “hsa” indicates Homo sapiens and “mus”, Mus musculus) and the type of data included (“aa” for amino acid, “p” for phosphorylation and “diso” for disorder/order). Two columns (“hsa” and “mus”) provide information about the position of the residues along the human or mouse sequences. Protein disorder is indicated by the “*” symbol, while order is indicated by the “.” symbol. For phosphorylation sites, we provide information (columns .p.db) about the papers or datasets that list the site as being phosphorylated (“Be”, Beltrao et al., 2012; “Hp”, Keshava Prasad et al., 2009; “Hu”, Huttlin et al., 2010; “Mi”, Minguez et al., 2012; “Pe”, Dinkel et al., 2011; “Ph”, Gnad et al., 2011; “Po”, Hornbeck et al., 2012).


Annex 3 – Supplementary information for Chapter 5

Figure S5.1. Localization of 3-state (phosphorylatable and O-GlcNAcylatable) residues in disordered (A) or ordered (B) regions of human proteins, and in disordered (C) or ordered (D) regions of mouse proteins. The distributions represent the random expectations for each of the regions.


Figure S5.2. Distribution of human (A) and mouse (B) protein abundances for the proteins containing 3-state sites. PhosOrg: phosphositeOrg.


Annex 4 - qPCA: a scalable assay to measure the perturbation of protein-protein interactions in living cells

Abstract

One of the most important challenges in systems biology is to understand how cells respond to genetic and environmental perturbations. Here we show that the yeast DHFR-PCA, coupled with high-resolution growth profiling (DHFR-qPCA), is a straightforward assay to study the modulation of protein-protein interactions (PPIs) in vivo as a response to genetic, metabolic and drug perturbations. Using the canonical Protein Kinase A (PKA) pathway as a test system, we show that changes in PKA activity can be measured in living cells as a modulation of the interaction between its regulatory (Bcy1) and catalytic (Tpk1, Tpk2) subunits in response to changes in carbon metabolism, caffeine and methyl methanesulfonate (MMS) treatments and to modifications in the dosage of its enzymatic regulators, the phosphodiesterases. Our results show that the DHFR-qPCA is easily implementable and amenable to high-throughput. The DHFR-qPCA will pave the way to the study of the effects of drug, genetic and environmental perturbations on in vivo PPI networks, thus allowing the exploration of new spaces of the eukaryotic interactome.

Introduction

Protein-protein interactions (PPIs) are fundamental for all cellular functions (Vidal et al, 2011). In particular, they allow the cell to perceive external stimuli and generate appropriate physiological responses. In the last decade, the development of high-throughput techniques to study PPIs has led to the first maps of the protein interactome of several model organisms (Giot et al, 2003; Ito et al, 2001; Krogan et al, 2006; Li et al, 2004; Rual et al, 2005; Tarassov et al, 2008). These maps described new associations within and among functional modules (Tarassov et al, 2008) and among protein complexes and cellular functions (Krogan et al, 2006). One limit of these maps is that they have mainly been determined in a single experimental condition. Studying how protein interactomes change in different conditions would make it possible to understand how cells adapt to different environments and how they respond to drugs, stressors and genetic perturbations, as well as to understand the basis of cellular development and differentiation (Ideker & Krogan, 2012). In order to achieve these objectives, it is necessary to develop new techniques or to adapt existing ones. Ideally, these techniques should be simple, easily implementable and amenable to high-throughput. Because studying how cells respond to perturbations of PPIs requires being able to detect them in their endogenous environments, these assays should also be performed in living cells and among proteins that are natively regulated.

Protein Complementation Assay (PCA) is a family of techniques that is now widely used to study PPI networks (Michnick et al, 2007; Morell et al, 2009). All variants of PCA are based on the same principle: two complementary fragments of a reporter protein are fused to two proteins of interest. If the two proteins interact, the activity of the reporter protein is reconstituted such that it provides a detectable signal. One of the most cost-effective PCA techniques is the DHFR-PCA. In this case the reporter protein is a modified murine dihydrofolate reductase (Dhfr) that confers resistance to the chemical methotrexate (MTX) (Tarassov et al, 2008). Therefore, the presence of an interaction between two proteins of interest can be detected and measured as cellular growth on media with MTX. This PCA has recently been successfully used to determine a large part of the budding yeast PPI network in living cells (Tarassov et al, 2008). In yeast, the assay requires that the coding sequences of the DHFR fragments (DHFR F[1,2] and DHFR F[3]) are inserted in the genome at the 3’ end of two genes of interest to produce proteins with the DHFR fragments fused at the C-termini (Fig. 1).

Fig. 1 Rationale of the DHFR-PCA. The gene encoding the engineered mouse DHFR is split into two complementary fragments, DHFR F[1,2] and DHFR F[3], which are fused to the genes encoding two proteins of interest, A and B. The concentration of tetrahydrofolate (THF) produced increases with the amount of DHFR complexes formed and is expected to affect growth in a dose-dependent fashion.

The endogenous gene modification offers the advantage of minimally perturbing the transcriptional regulation of the gene and does not require a modification of the native localization of the proteins. Here we show that the fitness-based yeast DHFR-PCA (Tarassov et al, 2008), combined with high-resolution growth profiling (DHFR-qPCA), can be successfully used to study changes in PPIs in vivo in different conditions and genetic backgrounds, and thus represents a tool that can be used to explore new dimensions of protein interactomes. Using high-resolution growth profiling it is possible to follow the growth of yeast strains in microchambers and in real time. This allows a precise and sensitive measurement of the growth curves of several strains in parallel in different growth media.


Materials and methods

Bioinformatic analysis of previous DHFR-PCA data

The integrated dataset on protein abundance was downloaded from PaxDb (Wang et al, 2012). Data on colony size were taken from Tarassov et al. (Tarassov et al, 2008). In order to determine the correlation between PCA signal and protein abundance we calculated the Spearman’s rank correlation coefficient as implemented in R (The R project for Statistical Computing).
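For readers who want to see what the Spearman coefficient computes, the following Python sketch ranks both variables (averaging ranks over ties) and then takes the Pearson correlation of the ranks. It is an illustration only: the analysis described above was run in R, the function names here are hypothetical, and the abundance and colony-size values are made up.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up example: protein abundance (ppm) versus colony size (pixels).
abundance = [3.2, 15.0, 48.1, 120.5, 7.7]
colony_size = [410, 520, 505, 640, 430]
print(spearman(abundance, colony_size))
```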

Construction of the strains used to test the DHFR-qPCA

Diploid strains with different numbers of DHFR-fused genes were constructed as follows. Haploid strains (BY4741 and BY4742 backgrounds; BY4741: MATa, his3Δ1, leu2Δ0, met15Δ0, ura3Δ0; BY4742: MATα, his3Δ1, leu2Δ0, lys2Δ0, ura3Δ0) carrying a single DHFR fragment (DHFR F[1,2] for MATa strains and DHFR F[3] for MATα strains) were purchased from Open Biosystems (http://www.openbiosystems.com) and were crossed as described by Tarassov et al. (Tarassov et al, 2008). For each interaction listed in Supplementary Table 1, we crossed the corresponding BY4741 MATa GENE A-DHFR F[1,2] strain with the BY4742 MATα GENE B-DHFR F[3] strain to generate diploid heterozygous strains GENE A-DHFR F[1,2]/GENE A, GENE B/GENE B-DHFR F[3]. These strains are 1:1 strains, as only one allele of each gene is tagged with a DHFR fragment. Then we sporulated these 1:1 strains, dissected the tetrads and genotyped the segregation of the DHFR fragments (Treco & Winston, 2008) in order to obtain haploid strains carrying one allele of each gene tagged (MATa GENE A-DHFR F[1,2] GENE B-DHFR F[3] and MATα GENE A-DHFR F[1,2] GENE B-DHFR F[3], respectively). Finally, we crossed these haploid strains to generate 2:2 diploid strains with both alleles of each gene tagged (MATa/MATα GENE A-DHFR F[1,2]/GENE A-DHFR F[1,2], GENE B-DHFR F[3]/GENE B-DHFR F[3]). All strain genotypes are listed in Supplementary Table 2. We confirmed all integrations by colony PCR (Amberg et al, 2005) with oligonucleotides listed in Supplementary Table 3.


High-resolution growth profiling

Saturated overnight cultures in YPD with suitable antibiotics were diluted to an OD600 of 1 and then diluted 1/30 in 150µL of SC (0.669% YNB w/o ammonium sulphate w/o amino acids, 2% glucose, drop-out -lys, -met, -ade) with methotrexate 200µg/mL (Bioshop Canada Inc.) or the methotrexate solvent DMSO (Bioshop Canada Inc.) in a 100-well honeycomb plate (Growth Curves USA). We measured growth profiles using a Bioscreen C (Growth Curves USA) by reading the OD420-580 every 15 minutes with continuous agitation at 30°C. For the experiments in different conditions we added caffeine (Bioshop Canada Inc.) or MMS (Sigma-Aldrich, concentration 99%) in a gradient of concentrations (final concentrations: 1mM, 2mM, 4mM and 6mM for caffeine; 0.002%, 0.005%, 0.008% and 0.011% for MMS) or we used 2% galactose instead of 2% glucose in the SC medium. Growth curves were measured for at least three replicates from three independent precultures.

Comparison of the different parameters used to estimate growth

For each growth curve, three parameters were measured in order to obtain a quantitative estimate of cellular growth: the maximum growth rate (∆OD/∆t), the efficiency (ODfinal-ODinitial) and the lag time. The maximum growth rate was calculated as follows. We used a sliding window approach to calculate a regression line for each interval spanning 31 measurements. Then, we sorted all regression coefficients and determined the 98th percentile. We took this value as the maximum growth rate. This approach eliminates extreme values that could result from experimental errors (Hill & Otto, 2007). The lag time was defined as the average time point of the interval where the maximum growth rate was observed. Finally, the efficiency was calculated as the difference between the final and initial ODs. All analyses were performed in R (The R project for Statistical Computing).
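The three growth parameters can be sketched in a few lines of Python. This is a hedged illustration rather than the R code used here: the function names are hypothetical, the regression is fitted on raw OD values, and the 98th percentile is taken as the slope at position floor(0.98 × (n − 1)) of the sorted list, which is only one of several possible percentile conventions.

```python
import math

def slope(ts, ods):
    """Least-squares slope of OD against time for one window."""
    n = len(ts)
    mt, mo = sum(ts) / n, sum(ods) / n
    num = sum((t - mt) * (o - mo) for t, o in zip(ts, ods))
    den = sum((t - mt) ** 2 for t in ts)
    return num / den

def growth_parameters(times, ods, window=31, pct=0.98):
    """Return (maximum growth rate, lag time, efficiency) for one curve."""
    if len(times) < window:
        raise ValueError("curve is shorter than the sliding window")
    # One (slope, mean time of the window) pair per sliding window.
    fits = []
    for i in range(len(times) - window + 1):
        ts, os_ = times[i:i + window], ods[i:i + window]
        fits.append((slope(ts, os_), sum(ts) / window))
    fits.sort()
    # Slope at the chosen percentile, and the mean time of its window.
    rate, lag = fits[int(pct * (len(fits) - 1))]
    efficiency = ods[-1] - ods[0]
    return rate, lag, efficiency

# Synthetic logistic curve sampled every 15 minutes for 48 hours.
times = [0.25 * i for i in range(193)]
ods = [0.1 + 1.2 / (1 + math.exp(-0.5 * (t - 20))) for t in times]
print(growth_parameters(times, ods))
```

Using a high percentile instead of the strict maximum keeps a single noisy window from defining the growth rate, which is the rationale given in the text.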

Growth curve analysis

In order to measure the relative interaction score (M) we used the following algorithm. We subtracted the lag time calculated in DMSO (control), which estimates the effect of the perturbation on growth rate independently from the interaction, from the lag time calculated in MTX (test). We called this difference ∆L. Then, for each experiment, in order to evaluate the differences in lag times between strains or conditions on a relative scale, we calculated a relative interaction score value (M) as a proxy for effective cellular growth (Supplementary Fig 2). We first computed the maximum ∆L among the strains or conditions tested (∆L max). For each strain or condition we then determined M by subtracting ∆L from ∆L max. The minimal interaction score is thus arbitrarily set to 0.
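The score can be written compactly as a small Python function. This sketch assumes that ∆L is the lag time in MTX minus the lag time in DMSO, so that a longer MTX-specific delay gives a larger ∆L; the function name and the lag values in the example are hypothetical.

```python
def interaction_scores(lags):
    """Compute the relative interaction score M for each strain or condition.

    `lags` maps a label to a (lag_in_MTX, lag_in_DMSO) pair, in hours.
    Assumption: delta_L = lag_MTX - lag_DMSO, and M = max(delta_L) - delta_L.
    """
    delta = {label: mtx - dmso for label, (mtx, dmso) in lags.items()}
    delta_max = max(delta.values())
    return {label: delta_max - d for label, d in delta.items()}

# Made-up lag times for a 1:1 and a 2:2 strain of the same interaction.
example = {"1:1": (30.0, 10.0), "2:2": (22.0, 10.0)}
print(interaction_scores(example))
```

By construction the strain or condition with the largest ∆L receives M = 0, and every other score measures how much shorter its MTX-specific delay is.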

Spot assays

An overnight preculture in YPD was diluted to an OD600 of 1 and a ten-fold serial dilution was performed. Five µL of each cell suspension were inoculated onto SC medium plates with methotrexate 200µg/mL or DMSO.

Interactions between Ras and different RBD mutants

The DHFR F[1,2] coding sequence was amplified by PCR with oligonucleotides containing the restriction sites BspEI and XhoI, respectively, using the plasmid pAG25-linker-DHFR-F[1,2]-ADHterm (Tarassov et al, 2008) as template, and subcloned into the plasmid p413Gal1-Ras-yCD-F[1] (Ear & Michnick, 2009) cut with BspEI and XhoI in order to fuse the Ras coding sequence to the DHFR F[1,2]. The sequence encoding the DHFR F[3] was amplified from pAG32-linker-DHFR(3)-ADHterm (Tarassov et al, 2008) using oligos that contain restriction sites BspEI and XhoI. The resulting PCR fragment was digested with the restriction enzymes BspEI and XhoI and subcloned in the six plasmids p415Gal1-RBD-yCD-F[2] (Ear & Michnick, 2009) that contain the wild-type RBD residues 55-133 and six mutant RBD 55-133. All constructed plasmids were verified by sequencing. A BY4741 strain was transformed with the plasmid p413Gal1-Ras-DHFR F[1,2] while six BY4742 strains were transformed with the plasmids p415Gal1-RBD-DHFR F[3] containing the wild-type and the five RBD mutants. The BY4741 strain was crossed with all six BY4742 strains. The resulting diploid strains containing both plasmids were grown overnight in SC-Raffinose medium (0.669% YNB w/o ammonium sulphate w/o amino acids, 2% raffinose, drop-out -lys, -met, -ade, -his, -leu). High-resolution growth profiling experiments were performed in SC-Galactose medium (0.669% YNB w/o ammonium sulphate w/o amino acids, 2% galactose, drop-out -lys, -met, -ade, -his, -leu). Growth curves were measured in two experiments. In the first one, five replicates from five independent pre-cultures were used to test each interaction. In the second one, twelve independent replicates were used to test each interaction. Fig 4 shows the combined results of the two experiments.

PKA regulatory and catalytic subunits interactions in different conditions

In order to study the perturbation of the PKA complex in different conditions, we constructed diploid strains using strains from the DHFR collection (Tarassov et al, 2008). We crossed the MATa strains (BY4741 background) with the Tpk1 or the Tpk2 genes fused to the DHFR F[1,2] with a MATα strain (BY4742 background) having the Bcy1 gene tagged with DHFR F[3] to generate the diploid strains JFL001 and JFL002.

Strains carrying one additional copy of the PDE genes (PDE1 or PDE2) were obtained by transforming the JFL001 and JFL002 strains with plasmids from the MoBY collection (Ho et al, 2009). Plasmids of the MoBY collection carry one yeast gene (in this case PDE1 or PDE2) under the control of its native promoter and terminator as well as a URA selection cassette and a yeast centromeric sequence (CEN) (Ho et al, 2009). We also transformed the JFL001 and JFL002 strains with an empty pRS316 plasmid, which also has a URA cassette and a CEN sequence, to generate a no-insert control strain. We then constructed strains carrying a deletion of one allele of a different PDE gene. We performed two independent transformations to obtain haploid MATα BY4742 strains where TPK1 or TPK2 genes were tagged with the DHFR F[1,2] and BCY1 tagged with the DHFR F[3] (JFL003 and JFL004). Then we crossed the JFL003 and JFL004 strains with the ho∆ (control strain), pde1∆ and pde2∆ MATa strains from the YKO deletion collection (Winzeler et al, 1999). The strains of the YKO collection are BY4741 MATa strains where a single gene is interrupted by a fragment containing a KanMX cassette, which provides resistance to the antibiotic G418. We therefore obtained diploid strains that carry a deletion of one allele of a different PDE gene (JFL005, JFL006, JFL007 and JFL008) and control strains (JFL009 and JFL010). All strains were grown in rich medium (YPD) with antibiotics. Strains carrying the DHFR F[1,2] cassette were grown in the presence of Nourseothricin (Werner BioAgents; 100 µg/mL), those carrying DHFR F[3] were grown in the presence of Hygromycin B (Bioshop Canada Inc.; 250 µg/mL). Finally, strains carrying a KanMX cassette were grown in the presence of G418 (Bioshop Canada Inc.; 200 µg/mL).

Co-immunoprecipitation and western blotting of Tpk2-Bcy1 (PKA)

Strain JFL011 used for immunoprecipitation was constructed as follows. Plasmid pYM17 (Janke et al, 2004), which contains six repeats of the HA tag and a natNT2 marker, was amplified with specific oligonucleotides for integration at the BCY1 locus (C-terminus) in a BY4742 strain. Then, we amplified plasmid pYM20 (Janke et al, 2004), which has nine Myc tags in tandem repeats, with oligos for integration at the TPK2 locus (C-terminus). All oligonucleotides mentioned above are listed in Supplementary Table 3. This PCR product was transformed into a BY4741 strain for homologous recombination. These haploid strains were crossed to generate the JFL011 diploid strain. Six independent cultures of the strain JFL011 were grown in 5mL of YPD overnight at 30°C with shaking. The next day, cells were diluted to an OD600 of 0.1 in 100mL and grown to an OD600 of 0.5. Three of the cultures were treated with caffeine (final concentration 6mM in water) while the remaining three (controls) were treated with the same volume of water. All cultures were incubated for 30 min. After incubation, the equivalent of 15 OD600 of cells was collected, washed once with 2mL of zymolyase buffer (1M sorbitol, 0.01M phosphate buffer pH 7.6, 0.02M EDTA) and then resuspended in 4mL of the same buffer. 4µL of β-mercaptoethanol and 5µL of Zymolyase 20T (20mg/mL) were added to the cells, which were incubated at 37°C for 23 min with agitation. The spheroplasts were washed with 1M sorbitol, resuspended in 200µL of lysis buffer (50mM Tris-HCl pH 7.4, 0.01M EDTA, 150mM NaCl, 1% Triton X-100, PMSF 1mM, Aprotinin 2µg/mL, Leupeptin 20µg/mL, Pepstatin A 2µg/mL) and incubated 2h on ice.
Lysates were immunoprecipitated for 2h at 4°C with THE™ c-Myc Tag Antibody, mAB, Mouse (GenScript A00704) coupled to Dynabeads M-280 Tosylactivated (Life Technologies Corporation), washed 3 times with 500µL of cold washing buffer (0.1M Na-phosphate pH 7.4, 0.08% Tween 20) and eluted in 40µL of boiling 2X Laemmli Buffer for 10 min. The primary antibodies for Western blotting were the rabbit Anti-HA antibody (Rockland 600-401-384) for the HA tag, and THE™ c-Myc Tag Antibody. The secondary antibodies were IRDye 680 conjugated Goat Anti-Rabbit (926-32221) and IRDye 800 conjugated Goat Anti-mouse (926-32210) (LiCor). Dried membranes were scanned and processed using an Odyssey Infrared Imaging System. Pixel quantification was performed using ImageJ64 (ImageJ).

Results and discussion

The DHFR-qPCA signal reflects the amount of protein complex formed in the cell

We first examined whether the DHFR-PCA signal provides a quantitative measure of PPIs (Fig. 1), which is a minimal requirement for the measurement of changes in PPIs in different conditions. Here by quantitative we mean that the PCA signal correlates with the quantity of protein complex formed by two interacting proteins and that changes in PCA signal can be reproducibly measured. We first tested this hypothesis by combining protein abundance data (Wang et al, 2012) with the PCA data from Tarassov et al. (Tarassov et al, 2008), where PCA signal is measured as colony size on agar plates containing methotrexate. We found a highly significant correlation between the average abundance of two interacting proteins and PCA signal (rho = 0.18, p-value < 2.2e-16), and this correlation is significantly improved by considering the abundance of the least expressed protein of the interacting pair (rho = 0.32, p-value < 2.2e-16, Fig 2). This result suggests that PCA signal reflects the abundance of the protein complexes formed.


Fig. 2 Relationship between colony size (DHFR-PCA signal) and the abundance of the least expressed protein of interacting pairs. Grey dots represent raw data and blue dots binned data.

We then tested whether changes in PPIs could be detected in the absence of modification of protein abundance. We randomly selected 15 PPIs from the yeast DHFR-PCA network with high (7 pairs), medium (2 pairs) and low (6 pairs) protein abundances (see Supplementary Table 1). We constructed yeast strains carrying different combinations of DHFR-tagged genes in diploid cells: one allele of each locus (1:1) or both alleles of each locus (2:2) (Fig. 3A). These constructs allowed us to directly manipulate, by four-fold, the amount of reconstituted DHFR without modifying protein abundance.
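The four-fold factor can be checked by counting which allele combinations yield a complex carrying both DHFR fragments. The Python sketch below is a simple enumeration under two assumptions that are not tested here: both alleles of a gene are expressed equally, and tagged and untagged proteins pair at random. The function name is hypothetical.

```python
from itertools import product

def tagged_fraction(alleles_a, alleles_b):
    """Fraction of A-B complexes in which both partners carry a DHFR fragment.

    Each allele is represented by True (tagged) or False (untagged).
    """
    pairs = list(product(alleles_a, alleles_b))
    return sum(a and b for a, b in pairs) / len(pairs)

one_to_one = tagged_fraction([True, False], [True, False])  # 1:1 strain
two_to_two = tagged_fraction([True, True], [True, True])    # 2:2 strain
print(one_to_one, two_to_two, two_to_two / one_to_one)
```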


Fig. 3 (A) Fifteen diploid strains with one (1:1) or both alleles (2:2) of two genes (GENE A, light blue and GENE B, yellow) tagged with DHFR fragments (DHFR F[1,2] in red and DHFR F[3] in dark blue, respectively) were constructed in order to test whether the DHFR-PCA signal could be modulated without changing protein abundance (the genes are under the control of their native promoters). In these diploid strains, only the number of alleles of each locus that are fused to the DHFR fragments varies. (B) Parameters used to describe yeast growth curves (Slope (∆OD/∆t), Efficiency (ODfinal-ODinitial) and Lag time) and their correlations. (C) Example of raw data of a DHFR-qPCA experiment showing the growth profiles for the Vps29-Vps35 interaction. Each curve represents an independent biological replicate. While in DMSO (control) the 1:1 and 2:2 strains have the same growth profile, in methotrexate (MTX) the 2:2 has a significantly shorter lag time than the 1:1 (t-test; p-value < 0.001). (D) Results of DHFR-qPCA test for 14 PPIs (1:1 and 2:2 backgrounds; 15th interaction shown in panel C). All independent replicates are shown for each interaction. Grey points represent growth in DMSO (control), while colored points represent growth in MTX. Red points show interactions among proteins with high expression levels; blue, medium expression and black, low expression. Dashed lines associate the same interaction in the two different backgrounds. The significance of the difference in lag time between the 1:1 and 2:2 backgrounds in MTX is shown for each interaction (t-test; ***: p-value < 0.001; *: p-value between 0.01 and 0.05; NS: non-significant). (E) Spot-dilution assays show that differences in growth rates can also be detected on solid medium. Results for the Vps29-Vps35 interaction are shown (cell dilution 1:10). An isogenic strain carrying the two DHFR fragments alone expressed on plasmids provides a negative control. DMSO is the MTX solvent and is thus used as a control for growth.
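The three growth parameters named in panel B can be estimated from a series of optical density readings. The sketch below is a simplified illustration and not the procedure described in the Methods: it assumes the slope is the largest interval slope ∆OD/∆t, the efficiency is ODfinal-ODinitial, and the lag time is the midpoint of the interval in which the slope is maximal. The function name and the example readings are hypothetical.

```python
def growth_parameters(times, ods):
    """Return (slope, efficiency, lag_time) for one growth curve.

    slope      : largest slope between two consecutive readings (dOD/dt)
    efficiency : OD_final - OD_initial
    lag_time   : midpoint of the interval where the slope is largest
                 (a simple stand-in for the time needed to reach the
                 maximum growth rate)
    """
    if len(times) != len(ods) or len(times) < 2:
        raise ValueError("need at least two matched time/OD readings")
    best_slope, lag_time = float("-inf"), None
    readings = list(zip(times, ods))
    # Walk over consecutive pairs of readings and keep the steepest interval.
    for (t0, od0), (t1, od1) in zip(readings, readings[1:]):
        slope = (od1 - od0) / (t1 - t0)
        if slope > best_slope:
            best_slope, lag_time = slope, (t0 + t1) / 2
    efficiency = ods[-1] - ods[0]
    return best_slope, efficiency, lag_time


# Hypothetical readings: time in hours, OD in arbitrary units.
hours = [0, 2, 4, 6, 8, 10]
od = [0.125, 0.125, 0.25, 0.75, 1.0, 1.125]
print(growth_parameters(hours, od))
```

Real growth curves are noisy, so smoothing the readings or fitting a window of several points before taking the maximum slope would be preferable in practice.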

When only one allele of each gene is fused to the DHFR fragments (A’ and B’ for the tagged alleles, A and B for the untagged alleles), four types of protein complexes can be formed: A’-B’, A-B’, A’-B, A-B. Therefore in the 1:1 strains, only the A’-B’ complexes (1/4) would provide a DHFR-PCA signal. If both alleles of both genes are tagged (2:2 strains), all complexes would be of type A’-B’ and thus 100% of the complexes would provide a DHFR-PCA signal. In both cases, the concentrations of proteins A and B are unaltered.

We applied high-resolution growth profiling (see Methods) to these strains in DMSO (control) and MTX and estimated growth parameters from the growth profile. As a first step, we determined which growth parameter would maximize the power to detect changes in PPIs. We compared the slope, the efficiency and the lag time required to reach the maximum growth rate (Fig. 3B). We found that all parameters are strongly correlated with each other (Fig. 3B) and thus largely redundant. Because the lag time maximizes the correlation between replicates (rho = 0.90, p-value < 6.81e-5) and is therefore less sensitive to experimental error, we used it as an estimate of growth and thus of PCA signal. For 14 out of 15 protein pairs, lag time was significantly lower for the 2:2 strains than for the 1:1 strains (Fig. 3C-D). These four-fold differences in DHFR-PCA signal can also be detected on solid medium (Fig. 3E).

We also sought to test whether the PCA signal would reflect the known dissociation constant (Kd) of a protein complex. Block et al. (1996) showed how point mutations in the Ras Binding Domain (RBD) affect the Kd of the Ras-RBD complex. We tested the interactions between Ras and six RBD mutants with different Kds by DHFR-qPCA. As expected, a decrease in Kd corresponds to an increase in PCA signal (Fig. 4).
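The expected direction of the Kd effect follows from simple mass action. As a minimal sketch, assuming a one-to-one equilibrium A + B ⇌ AB with fixed total concentrations, the amount of complex is the smaller root of a quadratic in [AB]. The function name, the total concentrations and the Kd values below are hypothetical and are not the values measured by Block et al.

```python
import math


def complex_concentration(a_total: float, b_total: float, kd: float) -> float:
    """Equilibrium concentration of AB for A + B <-> AB.

    Solves Kd = (a_total - AB) * (b_total - AB) / AB for AB and returns the
    physically meaningful (smaller) root. All quantities share one unit,
    for example micromolar.
    """
    s = a_total + b_total + kd
    return (s - math.sqrt(s * s - 4.0 * a_total * b_total)) / 2.0


# Hypothetical totals (1 uM each) and a range of hypothetical Kd values (uM).
for kd in (0.01, 0.1, 1.0, 10.0):
    print(kd, round(complex_concentration(1.0, 1.0, kd), 3))
```

With protein abundance held constant, the amount of complex falls as Kd rises, which is the trend that a quantitative PPI readout is expected to follow. The growth-based signal itself need not be linear in [AB].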

Fig. 4 DHFR-qPCA signal for interactions between Ras and different RBD mutants. PCA signal increases with a decrease in Kd of the different mutant Ras-RBD protein complexes. The R89L mutant shows no interaction and its M score is arbitrarily set on this graph. The numbers in parentheses indicate the Kd of each complex in µM. Seventeen independent biological replicates in two independent experiments were used to perform DHFR-qPCA assays for each mutant Ras-RBD complex.

Altogether, these results (Fig. 2,3,4) indicate that the DHFR-qPCA provides a quantitative readout of the amount of protein complexes formed between interaction partners, even when changes in the amount of complex formed do not involve changes in protein abundance.

DHFR-qPCA allows the study of the effect of metabolic, drug and genetic perturbations on protein complexes

We next sought to test the approach directly on a canonical signaling pathway by using different perturbations and combinations of perturbations. For this purpose, we chose the well-characterized protein kinase A (PKA) complex. The PKA is a tetramer formed by two regulatory and two catalytic subunits (Zhang et al, 2012) that is regulated by cAMP levels (Fig. 5A).

Fig. 5 Condition-dependence of the interactions between the PKA regulatory and catalytic subunits in response to different perturbations. (A) The activation/inactivation of the PKA is regulated by intracellular levels of cAMP, which is modulated among other mechanisms by the enzymes adenylate cyclase and phosphodiesterase (Pde1 and Pde2 in yeast). (B) Comparison of the DHFR-qPCA signal for the Bcy1-Tpk2 interaction in glucose and galactose. (C) DHFR-qPCA signal for the Bcy1-Tpk2 interaction in cells grown in media supplemented with caffeine at different concentrations. (D) Co-immunoprecipitation of Bcy1 and Tpk2 in standard medium (YPD) and in YPD supplemented with caffeine confirms the DHFR-qPCA results (t-test, p-value < 0.05). (E) DHFR-qPCA signal for the Bcy1-Tpk2 interaction in cells grown in media supplemented with methyl methanesulfonate (MMS). (F) DHFR-qPCA signal for the Bcy1-Tpk2 interaction in cells grown in media supplemented with galactose and MMS at different concentrations. (G) DHFR-qPCA signal for the Bcy1-Tpk2 interaction in strains carrying an additional copy (on a low copy number plasmid) or a deletion of one copy (heterozygous strain) of the genes coding for the PDE enzymes (left and right panel, respectively). In all cases, n represents the number of independent biological replicates.

The PKA pathway regulates different processes such as glucose metabolism (Dechant & Peter, 2008), protein translation (Ashe et al, 2000), biogenesis (Martin et al, 2004), stress responses (Ramachandran et al, 2011), autophagy (Budovskaya et al, 2005) and lifespan (Longo, 2003). In yeast, the regulatory subunits are encoded by the gene BCY1, while the catalytic subunits are encoded by three different genes: TPK1, TPK2 and TPK3 (Johnson et al, 1987; Toda et al, 1987). We studied the interactions between the regulatory subunit (Bcy1) and two of the catalytic subunits (Tpk1 and Tpk2) by DHFR-qPCA in response to four different perturbations: 1) galactose, which leads to a decrease in PKA activity relative to glucose (Portela et al, 2003); 2) caffeine, which was shown to indirectly inhibit the PKA (Soulard et al, 2010) through the TORC1 pathway; 3) methyl methanesulfonate (MMS), a DNA damaging agent, which has recently been associated with the PKA pathway (Bandyopadhyay et al, 2010) and was shown to lead to the phosphorylation of the PKA regulatory subunit (Searle et al, 2011) and 4) dosage of the PKA regulator PDE (phosphodiesterase), which negatively regulates the PKA by degrading cAMP (Ma et al, 1999). Each of these conditions is known or has been hypothesized to affect the PKA pathway in yeast but has not been assessed at the level of protein complex dissociation.

We estimated the effect of each perturbation in control conditions (without MTX) and subtracted this effect to estimate the net PCA signal in MTX. We obtained the difference in lag time between MTX and DMSO (∆L), from which we computed a relative interaction score (M, see Methods). Our results show that the DHFR-qPCA can detect changes in PKA activity in response to metabolic, drug and genetic perturbations (Fig. 5 for Bcy1-Tpk2; Supplementary Fig. 1 for Bcy1-Tpk1). We observed more interaction between the regulatory and catalytic subunits of the PKA in galactose (Fig. 5B; Supplementary Fig. 1A) and in the presence of caffeine (Fig. 5C; Supplementary Fig. 1B) than in standard growth conditions. This confirms that the PKA is inhibited in these conditions (Portela et al, 2003; Soulard et al, 2010) and that this inhibition involves changes in the Bcy1-Tpk1 and Bcy1-Tpk2 interactions. Further, our results show that the DHFR-qPCA can detect concentration-dependent effects on the PKA complex, as PKA inhibition increases with caffeine concentration (Fig. 5C; Supplementary Fig. 1B). In this particular case, we also measured changes in the PKA complex using co-immunoprecipitation and confirmed this quantitative effect on the PKA (for Bcy1-Tpk2) (Fig. 5D). The DHFR-qPCA signal appears to provide a larger amplitude of changes in the interaction than co-immunoprecipitation, most likely because the DHFR-qPCA is a fitness-based assay, which serves as a signal amplifier. This property could be exploited, for instance, to measure subtle changes in PPIs. We also found that the effect of MMS on the PKA was stronger in galactose than in glucose (Fig. 5E,F; Supplementary Fig. 1C,D). The PKA might indeed be maximally activated in glucose, so that the addition of MMS cannot activate it further to an extent that can be detected with this assay. It is also possible that DNA damage affects the PKA pathway in a carbon source-dependent fashion, with a greater effect in non-fermentable carbon sources. Finally, we found that modifying the dosage of PDE1 or PDE2 leads to significant changes in the amount of PKA complex formed, consistent with their roles as negative regulators of the PKA complex (Ma et al, 1999) (Fig. 5G; Supplementary Fig. 1E). This result shows that the DHFR-qPCA allows the detection of subtle quantitative genetic perturbations (50% of gene product) that affect PPIs.

Conclusions

PPIs regulate many cellular processes and are therefore expected to be dynamic and condition-dependent, i.e. the degree of association among proteins will depend on the conditions to which cells are exposed. There is therefore a strong need for the development of simple assays to measure changes in PPIs. Here we show that the yeast DHFR-qPCA is a quantitative technique that allows the screening of PPIs in different conditions at low cost. Our results, together with those of Schlecht et al. (2012), show that the DHFR-qPCA can be used to study PPIs in different conditions and at high throughput. Ninety-six interactions can be tested simultaneously in a standard plate reader, or more in dedicated instruments (see Methods). Unlike PCA assays based on luciferase (Stefan et al, 2007), this assay does not allow the measurement of dynamic PPIs in real time, because it is based on fitness. However, this offers the advantage that fitness differences among conditions or strains can be amplified through generations, which may allow the detection of very small changes in interactions. The PCA signal might be saturated for interactions with low Kd and/or highly abundant proteins. However, we expect that this would occur for a limited number of interactions under natural conditions, as we see a strong correlation between the abundance of proteins and PCA signal over five orders of magnitude of protein abundance without saturation (Fig. 2). With the availability of entire yeast collections tagged with the DHFR fragments (Tarassov et al, 2008), any pairwise interaction of interest could be investigated in different conditions and in different genetic backgrounds. The DHFR-qPCA will pave the way to the study of the effects of drug, genetic and environmental perturbations on in vivo PPI networks, thus allowing the exploration of new spaces of the model eukaryotic interactome.

Acknowledgements

This work was supported by a Canadian Institutes of Health Research (CIHR) Grant GMX-191597 and partly by a Human Frontier Science Program grant (RGY0073/2010) and a Genome Québec grant. C. R. Landry is a CIHR New Investigator. L. Freschi was supported by a fellowship from the Fonds de recherche du Québec – Nature et technologies (FRQNT). L. Freschi and F. Torres-Quiroz were supported by fellowships from the Quebec Research Network on Protein Function, Structure and Engineering (PROTEO). We thank all members of the Landry laboratory and N. Aubin-Horth for their comments on the manuscript.

Supplementary information

| Gene A-DHFR F[1,2] | Gene B-DHFR F[3] | Interactions (Gene names) | Interaction (ORF names) | Abundance |
|---|---|---|---|---|
| VMA21 | VPH1 | VMA21-VPH1 | YGR105W-YOR270C | H |
| ARX1 | YBR267W | ARX1-YBR267W | YDR101C-YBR267W | H |
| DHH1 | EDC3 | DHH1-EDC3 | YDL160C-YEL015W | H |
| DHH1 | LSM7 | DHH1-LSM7 | YDL160C-YNL147W | H |
| DHH1 | PBP1 | DHH1-PBP1 | YDL160C-YGR178C | H |
| SGN1 | PUB1 | SGN1-PUB1 | YIR001C-YNL016W | H |
| TOM70 | ALO1 | TOM70-ALO1 | YNL121C-YML086C | H |
| CKB1 | CKA2 | CKB1-CKA2 | YGL019W-YOR061W | M |
| NOT5 | MOT2 | NOT5-MOT2 | YPR072W-YER068W | M |
| LSB3 | CUE5 | LSB3-CUE5 | YFR024C-A-YOR042W | L |
| MMS2 | SIP5 | MMS2-SIP5 | YGL087C-YMR140W | L |
| PEX14 | PEX17 | PEX14-PEX17 | YGL153W-YNL214W | L |
| SLA1 | END3 | SLA1-END3 | YBL007C-YNL084C | L |
| YKE2 | GIM5 | YKE2-GIM5 | YLR200W-YML094W | L |
| VPS29 | VPS35 | VPS29-VPS35 | YHR012W-YJL154C | L |

For each pair we used the data by Ghaemmaghami et al. (1) and we calculated the average abundance. Then, we classified the pairs in 3 classes: low abundance (L), medium abundance (M) and high abundance (H).

Supplementary Table 1. Protein-protein interactions selected to test for the relationship between growth and the amount of DHFR complex formed.

References

1. S. Ghaemmaghami, W. Huh, K. Bower, R. W. Howson, A. Belle, N. Dephoure, E. K. O'Shea and J. S. Weissman, Nature, 2003, 425, 737-741.

Strain Genotype LTQ001 MATa, VMA21-DHFR F[1,2]-natNT2, VPH1-DHFR F[3]-hphNT1 LTQ002 MATa, ARX1-DHFR F[1,2]-natNT2, YBR267W-DHFR F[3]-hphNT1 LTQ003 MATa, DHH1-DHFR F[1,2]-natNT2, EDC3-DHFR F[3]-hphNT1 LTQ004 MATa, DHH1-DHFR F[1,2]-natNT2, LSM7-DHFR F[3]-hphNT1 LTQ005 MATa, DHH1-DHFR F[1,2]-natNT2, PBP1-DHFR F[3]-hphNT1 LTQ006 MATa, SGN1-DHFR F[1,2]-natNT2, PUB1-DHFR F[3]-hphNT1 LTQ007 MATa, TOM70-DHFR F[1,2]-natNT2, ALO1-DHFR F[3]-hphNT1 LTQ008 MATa, CKB1-DHFR F[1,2]-natNT2, CKA2-DHFR F[3]-hphNT1 LTQ009 MATa, NOT5-DHFR F[1,2]-natNT2, MOT2-DHFR F[3]-hphNT1 LTQ010 MATa, LSB3-DHFR F[1,2]-natNT2, CUE5-DHFR F[3]-hphNT1 LTQ011 MATa, MMS2-DHFR F[1,2]-natNT2, SIP5-DHFR F[3]-hphNT1 LTQ012 MATa, PEX14-DHFR F[1,2]-natNT2, PEX17-DHFR F[3]-hphNT1 LTQ013 MATa, SLA1-DHFR F[1,2]-natNT2, END3-DHFR F[3]-hphNT1 LTQ014 MATa, YKE3-DHFR F[1,2]-natNT2, GIM5-DHFR F[3]-hphNT1 LTQ015 MATa, VPS29-DHFR F[1,2]-natNT2, VPS35-DHFR F[3]-hphNT1 LTQ016 MATα, VMA21-DHFR F[1,2]-natNT2, VPH1-DHFR F[3]-hphNT1 LTQ017 MATα, ARX1-DHFR F[1,2]-natNT2, YBR267W-DHFR F[3]-hphNT1 LTQ018 MATα, DHH1-DHFR F[1,2]-natNT2, EDC3-DHFR F[3]-hphNT1 LTQ019 MATα, DHH1-DHFR F[1,2]-natNT2, LSM7-DHFR F[3]-hphNT1 LTQ020 MATα, DHH1-DHFR F[1,2]-natNT2, PBP1-DHFR F[3]-hphNT1 LTQ021 MATα, SGN1-DHFR F[1,2]-natNT2, PUB1-DHFR F[3]-hphNT1 LTQ022 MATα, TOM70-DHFR F[1,2]-natNT2, ALO1-DHFR F[3]-hphNT1 LTQ023 MATα, CKB1-DHFR F[1,2]-natNT2, CKA2-DHFR F[3]-hphNT1 LTQ024 MATα, NOT5-DHFR F[1,2]-natNT2, MOT2-DHFR F[3]-hphNT1 LTQ025 MATα, LSB3-DHFR F[1,2]-natNT2, CUE5-DHFR F[3]-hphNT1 LTQ026 MATα, MMS2-DHFR F[1,2]-natNT2, SIP5-DHFR F[3]-hphNT1 LTQ027 MATα, PEX14-DHFR F[1,2]-natNT2, PEX17-DHFR F[3]-hphNT1

149

LTQ028 MATα, SLA1-DHFR F[1,2]-natNT2, END3-DHFR F[3]-hphNT1 LTQ029 MATα, YKE3-DHFR F[1,2]-natNT2, GIM5-DHFR F[3]-hphNT1 LTQ030 MATα, VPS29-DHFR F[1,2]-natNT2, VPS35-DHFR F[3]-hphNT1 JFL001 MATa/MATα, TPK1-DHFR F[1,2]-natNT2/TPK1, BCY1/ BCY1-DHFR F[3]- hphNT1 JFL002 MATa/MATα, TPK2-DHFR F[1,2]-natNT2/TPK2, BCY1/BCY1-DHFR F[3]- hphNT1 JFL003 MATα, TPK1-DHFR F[1,2]-natNT2, BCY-DHFR F[3]-hphNT1 JFL004 MATα, TPK2-DHFR F[1,2]-natNT2, BCY-DHFR F[3]-hphNT1 JFL005 MATa/MATα, TPK1/TPK1-DHFR F[1,2]-natNT2, BCY1/BCY1-DHFR F[3]- hphNT1, pde1∆-KanMX/PDE1 JFL006 MATa/MATα, TPK1/TPK1-DHFR F[1,2]-natNT2, BCY1/BCY1-DHFR F[3]- hphNT1, pde2∆-KanMX/PDE2 JFL007 MATa/MATα, TPK2/TPK2-DHFR F[1,2]-natNT2, BCY1/BCY1-DHFR F[3]- hphNT1, pde1∆-KanMX/PDE1 JFL008 MATa/MATα, TPK2/TPK2-DHFR F[1,2]-natNT2, BCY1/BCY1-DHFR F[3]- hphNT1, pde2∆-KanMX/PDE2 JFL009 MATa/MATα,TPK1/ TPK1-DHFR F[1,2]-natNT2, BCY1/BCY1-DHFR F[3]- hphNT1, ho∆-KanMX/HO JFL010 MATa/MATα, TPK2/TPK2-DHFR F[1,2]-natNT2, BCY1/BCY1-DHFR F[3]- hphNT1, ho∆-KanMX/HO JFL011 MATa/MATα, TPK2-Myc-hphNT1/TPK2, BCY1/BCY1-HA-natNT2

Supplementary Table 2. Genotypes of the strains constructed in this study.

| Experiments | Primer Information | Primer Sequence 5’ to 3’ |
|---|---|---|
| qPCA | C Oligo Forward YGR105W (VMA21) | GTTTAGCTGCTGCAATGGCC |
| qPCA | C Oligo Forward YOR270C (VPH1) | AAGTTTTTCGTGGGTGAAGG |
| qPCA | C Oligo Forward YDR101C (ARX1) | GCCAAGGATAAGAGGTTCGG |
| qPCA | C Oligo Forward YBR267W (YBR267W) | GACTCAACAGCGTGTTTGGC |
| qPCA | C Oligo Forward YDL160C (DHH1) | ACAGGCGTATCCTCCACCGC |
| qPCA | C Oligo Forward YEL015W (EDC3) | CTGGCTGGCCTTTGATTGCC |
| qPCA | C Oligo Forward YDL160C (DHH1) | ACAGGCGTATCCTCCACCGC |
| qPCA | C Oligo Forward YNL147W (LSM7) | TTATAGGTGTCCTAAAAGGC |
| qPCA | C Oligo Forward YDL160C (DHH1) | ACAGGCGTATCCTCCACCGC |
| qPCA | C Oligo Forward YGR178C (PBP1) | AGCGAACGGGTCGGCAATGC |
| qPCA | C Oligo Forward YIR001C (SGN1) | AAAAACACTTCAACAGTGCC |
| qPCA | C Oligo Forward YNL016W (PUB1) | ACAGCAGCAGCAACAGGGCG |
| qPCA | C Oligo Forward YNL121C (TOM70) | ATTACTTTTGCTGAAGCCGC |
| qPCA | C Oligo Forward YML086C (ALO1) | AGGATTTGAAAAAGTTCCGG |
| qPCA | C Oligo Forward YGL019W (CKB1) | GATGAGGCAGTATCTGGTCC |
| qPCA | C Oligo Forward YOR061W (CKA2) | ATTAGCTGTTCCTGAAGTGG |
| qPCA | C Oligo Forward YPR072W (NOT5) | AATCTGAGGAGGAATCATGG |
| qPCA | C Oligo Forward YER068W (MOT2) | TAAGGTTCCTATTCAGCAGC |
| qPCA | C Oligo Forward YFR024C-A (LSB3) | ACCATTCAGAAAGGGTGACG |
| qPCA | C Oligo Forward YOR042W (CUE5) | GAACCCCTGGATACTACACC |
| qPCA | C Oligo Forward YGL087C (MMS2) | ACTGGAAAAGAGCCTACACC |
| qPCA | C Oligo Forward YMR140W (SIP5) | CGAACTTGAAGATCAAATGG |
| qPCA | C Oligo Forward YGL153W (PEX14) | GATAGCAACGCCTCCATTCC |
| qPCA | C Oligo Forward YNL214W (PEX17) | TTAACAGATAGGTCCCGAGC |
| qPCA | C Oligo Forward YBL007C (SLA1) | TTACAGAACCAACCTACTGG |
| qPCA | C Oligo Forward YNL084C (END3) | GTCGATAACTGATGACTTGG |
| qPCA | C Oligo Forward YLR200W (YKE2) | ATGCGAAAAGAACATAAGGG |
| qPCA | C Oligo Forward YML094W (GIM5) | TTCCTTGTCCATCGAGGCCC |
| qPCA | C Oligo Forward YHR012W (VPS29) | TAATTCACCAAGTTTCTGCC |
| qPCA | C Oligo Forward YJL154C (VPS35) | CACCAACTGAAGTATATCCC |
| qPCA | Oligo Reverse to test DHFR integration | CCATCTTTTCGTAAATTTCTG |
| PKA | BCY1-DHFR integration Forward | TGCAGTAGACGTATTAAAGCTCAATGATCCTACAAGACATGGCGGTGGCGGATCAGGAGGC |
| PKA | BCY1-DHFR integration Reverse | AGGAAATTCATGTGGATTTAAGATCGCTTCCCCTTTTTACTTCGACACTGGATGGCGGCGTTAG |
| PKA | TPK1-DHFR integration Forward | TCAAGGTGAAGACCCATATGCTGATCTTTTCCGGGACTTCGGCGGTGGCGGATCAGGAGGC |
| PKA | TPK1-DHFR integration Reverse | AATATAGATACGAGAGGAAAATACAACAAAACATTAGTCATTCGACACTGGATGGCGGCGTTAG |
| PKA | TPK2-DHFR integration Forward | TCAAGGCGATGATCCATATGCTGAATACTTTCAAGATTTCGGCGGTGGCGGATCAGGAGGC |
| PKA | TPK2-DHFR integration Reverse | GTACTTGAAAATTGTTTTTGTGTTTTTTGGTTCATGGAACTTCGACACTGGATGGCGGCGTTAG |
| PKA | C Oligo Forward BCY1 | GTGATCAAGGGGAGAACTTTTATTT |
| PKA | C Oligo Forward TPK1 | CGACTCTAACACGATGAAAACCTAT |
| PKA | C Oligo Forward TPK2 | GGTATCGGTGACACGTCT |
| CoIP | BCY1-HA Forward | TACTGGGTCCTGCAGTAGACGTATTAAAGCTCAATGATCCTACAAGACATCGTACGCTGCAGGTCGAC |
| CoIP | BCY1-HA Reverse | AAGAGAAAGGAAATTCATGTGGATTTAAGATCGCTTCCCCTTTTTACTTAATCGATGAATTCGAGCTCG |
| CoIP | TPK1-MYC Forward | ACTACGGTGTTCAAGGTGAAGACCCATATGCTGATCTTTTCCGGGACTTCCGTACGCTGCAGGTCGAC |
| CoIP | TPK1-MYC Reverse | AAAAAAAAATATAGATACGAGAGGAAAATACAACAAAACATTAGTCATTAATCGATGAATTCGAGCTCG |
| CoIP | TPK2-MYC Forward | ATTATGGTATTCAAGGCGATGATCCATATGCTGAATACTTTCAAGATTTCCGTACGCTGCAGGTCGAC |
| CoIP | TPK2-MYC Reverse | AGAGAAAGTACTTGAAAATTGTTTTTGTGTTTTTTGGTTCATGGAACTTAATCGATGAATTCGAGCTCG |
| CoIP | Oligo Reverse to test MYC or HA integration | CGACAGTCACATCATGC |
| Kd | Oligo used to check Ras and RBD's plasmids constructions | CAACATTTTCGGTTTGTATTAC |
| Kd | Oligo Forward to amplify DHFR F[1,2] and clone in p413Gal1-Ras contain a restriction site BspEI | ATCGCAGGCTCCGGAGGTGGAGGTTCTGGAGGTATGGTTCGACCATTGAACTGC |
| Kd | Oligo Reverse to amplify DHFR F[1,2] and clone in p413Gal1-Ras contain a restriction site Xho1 | CGATGCCCGCCCCCGCTCGAGCTATGTTCTAGATTAGGTACCCAA |
| Kd | Oligo Forward to amplify DHFR F[3] and clone in p415Gal1-RBD contain a restriction site BspEI | CGTTGAGGCTCCGGAGGTGGAGGTTCTGGAGGTATGAGTAAAGTAGACATGGTT |
| Kd | Oligo Reverse to amplify DHFR F[3] and clone in p415Gal1-RBD contain a restriction site Xho1 | AGATCGCCGCCCCCGCTCGAGCTAAGTTCTAGATTAGTCTTTCTT |

C Oligos were used to confirm the integration at the proper locus.

Supplementary Table 3. Oligonucleotides used in this study.

Supplementary Fig. 1. Dynamics of the interactions between the PKA regulatory and catalytic subunits in response to different perturbations. (A) Comparison of the DHFR-qPCA signal for the Bcy1-Tpk1 interaction in glucose and galactose. (B) DHFR-qPCA signal for the Bcy1-Tpk1 interaction in cells grown in media supplemented with caffeine at different concentrations. (C) DHFR-qPCA signal for the Bcy1-Tpk1 interaction in cells grown in media supplemented with methyl methanesulfonate. (D) DHFR-qPCA signal for the Bcy1-Tpk1 interaction in cells grown in media supplemented with galactose and methyl methanesulfonate at different concentrations. (E) DHFR-qPCA signal for the Bcy1-Tpk1 interaction in strains carrying an additional copy (on a low copy number plasmid) or a deletion of one copy (heterozygous strain) of the genes coding for the PDE enzymes (left and right panel, respectively). In all cases, n represents the number of independent replicates.

Supplementary Fig. 2. Comparing PPIs using the M relative interaction score. (A) The difference between the lag times in MTX and DMSO (∆L) is calculated for all interactions. (B) M scores are calculated for each interaction by subtracting the ∆L of that interaction from the maximum ∆L of all interactions. (C) Bar graphs are generated to compare the relative interaction scores of all interactions tested.
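The two steps in panels A and B can be written out as a short calculation. The sketch below is an illustration rather than the analysis code used in this study: it assumes ∆L is the lag time in MTX minus the lag time in DMSO, so that a stronger interaction (shorter lag in MTX) gives a smaller ∆L and a larger M. The function name, the interaction labels and the lag times are hypothetical.

```python
def relative_interaction_scores(lag_mtx, lag_dmso):
    """Compute the M score of each interaction from lag times.

    lag_mtx, lag_dmso : dicts mapping an interaction name to its lag time
                        in MTX and in DMSO (same time unit).
    Returns a dict mapping each interaction name to M = max(dL) - dL.
    """
    # Step A: lag-time difference between MTX and the DMSO control.
    delta_l = {name: lag_mtx[name] - lag_dmso[name] for name in lag_mtx}
    # Step B: subtract each dL from the largest dL in the set.
    max_delta = max(delta_l.values())
    return {name: max_delta - d for name, d in delta_l.items()}


# Hypothetical lag times in hours for three interactions.
lag_in_mtx = {"pairA": 30, "pairB": 22, "pairC": 18}
lag_in_dmso = {"pairA": 10, "pairB": 10, "pairC": 10}
print(relative_interaction_scores(lag_in_mtx, lag_in_dmso))
```

Because M is defined relative to the largest ∆L in the set, scores are comparable only among interactions analysed together. The interaction with the longest relative lag receives M = 0 by construction.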

References

Albuquerque CP, Smolka MB, Payne SH, Bafna V, Eng J, Zhou HL (2008) A multidimensional chromatography technology for in-depth phosphoproteome analysis. Mol Cell Proteomics 7: 1389-1396

Alfaro JF, Gong CX, Monroe ME, Aldrich JT, Clauss TR, Purvine SO, Wang Z, Camp DG, 2nd, Shabanowitz J, Stanley P, Hart GW, Hunt DF, Yang F, Smith RD (2012) Tandem mass spectrometry identifies many mouse brain O-GlcNAcylated proteins including EGF domain-specific O-GlcNAc transferase targets. Proc Natl Acad Sci U S A 109: 7280-7285

Amberg DC, Burke DJ, Strathern JN (2005) Methods in yeast genetics: Cold Spring Harbor Laboratory Press.

Amoutzias GD, He Y, Gordon J, Mossialos D, Oliver SG, Van de Peer Y (2010) Posttranslational regulation impacts the fate of duplicated genes. Proceedings of the National Academy of Sciences of the United States of America 107: 2967-2971

Ashe MP, De Long SK, Sachs AB (2000) Glucose depletion rapidly inhibits translation initiation in yeast. Mol Biol Cell 11: 833-848

Ba ANN, Moses AM (2010) Evolution of characterized phosphorylation sites in budding yeast. Molecular Biology and Evolution 27: 2027-2037

Bandyopadhyay S, Mehta M, Kuo D, Sung MK, Chuang R, Jaehnig EJ, Bodenmiller B, Licon K, Copeland W, Shales M, Fiedler D, Dutkowski J, Guenole A, van Attikum H, Shokat KM, Kolodner RD, Huh WK, Aebersold R, Keogh MC, Krogan NJ et al (2010) Rewiring of Genetic Networks in Response to DNA Damage. Science 330: 1385-1389

Barr RK, Bogoyevitch MA (2001) The c-Jun N-terminal protein kinase family of mitogen-activated protein kinases (JNK MAPKs). Int J Biochem Cell B 33: 1047-1063

Basu U, Wang YB, Alt FW (2008) Evolution of Phosphorylation-Dependent Regulation of Activation-Induced Cytidine Deaminase. Mol Cell 32: 285-291

Beausoleil SA, Jedrychowski M, Schwartz D, Elias JE, Villen J, Li J, Cohn MA, Cantley LC, Gygi SP (2004) Large-scale characterization of HeLa cell nuclear phosphoproteins. Proceedings of the National Academy of Sciences of the United States of America 101: 12130-12135

Bell SP, Dutta A (2002) DNA replication in eukaryotic cells. Annu Rev Biochem 71: 333-374

Beltrao P, Albanese V, Kenner LR, Swaney DL, Burlingame A, Villen J, Lim WA, Fraser JS, Frydman J, Krogan NJ (2012) Systematic functional prioritization of protein posttranslational modifications. Cell 150: 413-425

Beltrao P, Bork P, Krogan NJ, van Noort V (2013) Evolution and functional cross-talk of protein post-translational modifications. Mol Syst Biol 9: 714

Beltrao P, Trinidad JC, Fiedler D, Roguev A, Lim WA, Shokat KM, Burlingame AL, Krogan NJ (2009) Evolution of Phosphoregulation: Comparison of Phosphorylation Patterns across Yeast Species. PLoS Biology 7: e1000134-e1000134

Benayoun BA, Veitia RA (2009) A post-translational modification code for transcription factors: sorting through a sea of signals. Trends in cell biology 19: 189-197

Block C, Janknecht R, Herrmann C, Nassar N, Wittinghofer A (1996) Quantitative structure-activity analysis correlating Ras/Raf interaction in vitro to Raf activation in vivo. Nat Struct Biol 3: 244-251

Bodenmiller B, Mueller LN, Mueller M, Domon B, Aebersold R (2007) Reproducible isolation of distinct, overlapping segments of the phosphoproteome. Nature Methods 4: 231-237

Boekhorst J, van Breukelen B, Heck A, Jr., Snel B (2008) Comparative phosphoproteomics reveals evolutionary and functional conservation of phosphorylation across eukaryotes. Genome biology 9: R144

Boulais J, Trost M, Landry CR, Dieckmann R, Levy ED, Soldati T, Michnick SW, Thibault P, Desjardins M (2010) Molecular characterization of the evolution of phagosomes. Mol Syst Biol 6: 423

Brewster RC, Weinert FM, Garcia HG, Song D, Rydenfelt M, Phillips R (2014) The transcription factor titration effect dictates level of gene expression. Cell 156: 1312-1323

Brooks CL, Gu W (2003) Ubiquitination, phosphorylation and acetylation: the molecular basis for p53 regulation. Curr Opin Cell Biol 15: 164-171

Budovskaya YV, Stephan JS, Deminoff SJ, Herman PK (2005) An evolutionary proteomics approach identifies substrates of the cAMP-dependent protein kinase. Proceedings of the National Academy of Sciences of the United States of America 102: 13933-13938

Bullock AN, Das S, Debreczeni JE, Rellos P, Fedorov O, Niesen FH, Guo K, Papagrigoriou E, Amos AL, Cho S, Turk BE, Ghosh G, Knapp S (2009) Kinase domain insertions define distinct roles of CLK kinases in SR protein phosphorylation. Structure 17: 352-362

Bullock AN, Debreczeni J, Amos AL, Knapp S, Turk BE (2005) Structure and substrate specificity of the Pim-1 kinase. The Journal of biological chemistry 280: 41675-41682

Bunkoczi G, Salah E, Filippakopoulos P, Fedorov O, Muller S, Sobott F, Parker SA, Zhang H, Min W, Turk BE, Knapp S (2007) Structural and functional characterization of the human protein kinase ASK1. Structure 15: 1215-1226

Caron C, Boyault C, Khochbin S (2005) Regulatory cross-talk between lysine acetylation and ubiquitination: role in the control of protein stability. BioEssays : news and reviews in molecular, cellular and developmental biology 27: 408-415

Chen C, Turk BE (2010) Analysis of serine-threonine kinase specificity using arrayed positional scanning peptide libraries. Current protocols in molecular biology / edited by Frederick M Ausubel [et al] Chapter 18: Unit 18 14

Chi A, Huttenhower C, Geer LY, Coon JJ, Syka JEP, Bai DL, Shabanowitz J, Burke DJ, Troyanskaya OG, Hunt DF (2007) Analysis of phosphorylation sites on proteins from Saccharomyces cerevisiae by electron transfer dissociation (ETD) mass spectrometry. Proceedings of the National Academy of Sciences of the United States of America 104: 2193-2198

Choudhary C, Kumar C, Gnad F, Nielsen ML, Rehman M, Walther TC, Olsen JV, Mann M (2009) Lysine acetylation targets protein complexes and co-regulates major cellular functions. Science 325: 834-840

Cohen P (2000) The regulation of protein function by multisite phosphorylation - a 25 year update. Trends in Biochemical Sciences 25: 596-601

Conant GC, Wolfe KH (2008) Turning a hobby into a job: how duplicated genes find new functions. Nature reviews Genetics 9: 938-950

Courcelles M, Lemieux S, Voisin L, Meloche S, Thibault P (2011) ProteoConnections: a bioinformatics platform to facilitate proteome and phosphoproteome analyses. Proteomics 11: 2654-2671

Crick F (1970) Central dogma of molecular biology. Nature 227: 561-563

Davis TL, Walker JR, Allali-Hassani A, Parker SA, Turk BE, Dhe-Paganon S (2009) Structural recognition of an optimized substrate for the ephrin family of receptor tyrosine kinases. Febs J 276: 4395-4404

Dean AM, Thornton JW (2007) Mechanistic approaches to the study of evolution: the functional synthesis. Nature reviews Genetics 8: 675-688

Dechant R, Peter M (2008) Nutrient signals driving cell growth. Curr Opin Cell Biol 20: 678-687

Deribe YL, Pawson T, Dikic I (2010) Post-translational modifications in signal integration. Nature structural & molecular biology 17: 666-672

Dinkel H, Chica C, Via A, Gould CM, Jensen LJ, Gibson TJ, Diella F (2011) Phospho.ELM: a database of phosphorylation sites--update 2011. Nucleic acids research 39: D261-267

Dulai KS, von Dornum M, Mollon JD, Hunt DM (1999) The evolution of trichromatic color vision by opsin gene duplication in New World and Old World primates. Genome research 9: 629-638

Ear PH, Michnick SW (2009) A general life-death selection strategy for dissecting protein functions. Nature methods 6: 813-816

Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32: 1792-1797

Efstratiadis A, Posakony JW, Maniatis T, Lawn RM, O'Connell C, Spritz RA, DeRiel JK, Forget BG, Weissman SM, Slightom JL, Blechl AE, Smithies O, Baralle FE, Shoulders CC, Proudfoot NJ (1980) The structure and evolution of the human beta-globin gene family. Cell 21: 653-668

Eisenberg E, Levanon EY (2003) Human housekeeping genes are compact. Trends Genet 19: 362-365

Fazili Z, Sun WP, Mittelstaedt S, Cohen C, Xu XX (1999) Disabled-2 inactivation is an early step in ovarian tumorigenicity. Oncogene 18: 3104-3113

Ferris SD, Whitt GS (1979) Evolution of the Differential Regulation of Duplicate Genes after Polyploidization. J Mol Evol 12: 267-317

Filippakopoulos P, Kofler M, Hantschel O, Gish GD, Grebien F, Salah E, Neudecker P, Kay LE, Turk BE, Superti-Furga G, Pawson T, Knapp S (2008) Structural coupling of SH2-kinase domains links Fes and Abl substrate recognition and kinase activation. Cell 134: 793-803

Force A, Lynch M, Pickett FB, Amores A, Yan YL, Postlethwait J (1999) Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151: 1531-1545

Freschi L, Courcelles M, Thibault P, Michnick SW, Landry CR (2011) Phosphorylation network rewiring by gene duplication. Mol Syst Biol 7

Freschi L, Osseni M, Landry CR (2014) Functional divergence and evolutionary turnover in mammalian phosphoproteomes. PLoS Genet 10: e1004062

Gasch AP, Moses AM, Chiang DY, Fraser HB, Berardini M, Eisen MB (2004) Conservation and evolution of cis-regulatory systems in ascomycete fungi. PLoS Biol 2: e398-e398

Giot L, Bader JS, Brouwer C, Chaudhuri A, Kuang B, Li Y, Hao YL, Ooi CE, Godwin B, Vitols E, Vijayadamodar G, Pochart P, Machineni H, Welsh M, Kong Y, Zerhusen B, Malcolm R, Varrone Z, Collis A, Minto M et al (2003) A protein interaction map of Drosophila melanogaster. Science 302: 1727-1736

Glotzer M, Murray AW, Kirschner MW (1991) Cyclin is degraded by the ubiquitin pathway. Nature 349: 132-138

Gnad F, de Godoy LMF, Cox J, Neuhauser N, Ren S, Olsen JV, Mann M (2009) High-accuracy identification and bioinformatic analysis of in vivo protein phosphorylation sites in yeast. Proteomics 9: 4642-4652

Gnad F, Gunawardena J, Mann M (2011) PHOSIDA 2011: the posttranslational modification database. Nucleic acids research 39: D253-260

Gnad F, Ren S, Cox J, Olsen JV, Macek B, Oroshi M, Mann M (2007) PHOSIDA (phosphorylation site database): management, structural and evolutionary investigation, and prediction of phosphosites. Genome biology 8: R250

Gordon JL, Byrne KP, Wolfe KH (2009) Additions, Losses, and Rearrangements on the Evolutionary Route from a Reconstructed Ancestor to the Modern Saccharomyces cerevisiae Genome. PLoS Genetics 5: e1000485-e1000485

Gordon R (1994) Evolution Escapes Rugged Fitness Landscapes by Gene or Genome Doubling - the Blessing of Higher Dimensionality. Comput Chem 18: 325-331

Gough NR, Wong W (2010) Focus Issue: The Evolution of Complexity. Sci Signal 3: eg5-eg5

Gray VE, Kumar S (2011) Rampant purifying selection conserves positions with posttranslational modifications in human proteins. Mol Biol Evol 28: 1565-1568

Gruhler A, Olsen JV, Mohammed S, Mortensen P, Faergeman NJ, Mann M, Jensen ON (2005) Quantitative phosphoproteomics applied to the yeast pheromone signaling pathway. Molecular & Cellular Proteomics: MCP 4: 310-327

Gu X, Zhang Z, Huang W (2005) Rapid evolution of expression and regulatory divergences after yeast gene duplication. Proceedings of the National Academy of Sciences of the United States of America 102: 707-712

Gu ZL, Nicolae D, Lu HHS, Li WH (2002) Rapid divergence in expression between duplicate genes inferred from microarray data. Trends in Genetics 18: 609-613

Gwinn DM, Shackelford DB, Egan DF, Mihaylova MM, Mery A, Vasquez DS, Turk BE, Shaw RJ (2008) AMPK phosphorylation of raptor mediates a metabolic checkpoint. Mol Cell 30: 214-226

Hansen TF, Carter AJR, Chiu CH (2000) Gene conversion may aid adaptive peak shifts. J Theor Biol 207: 495-511

Hart GW, Slawson C, Ramirez-Correa G, Lagerlof O (2011) Cross talk between O-GlcNAcylation and phosphorylation: roles in signaling, transcription, and chronic disease. Annu Rev Biochem 80: 825-858

He XL, Zhang JZ (2005) Rapid subfunctionalization accompanied by prolonged and substantial neofunctionalization in duplicate gene evolution. Genetics 169: 1157-1164

Herbig U, Griffith JW, Fanning E (2000) Mutation of cyclin/cdk phosphorylation sites in HsCdc6 disrupts a late step in initiation of DNA replication in human cells. Mol Biol Cell 11: 4117-4130

Hill JA, Otto SP (2007) The role of pleiotropy in the maintenance of sex in yeast. Genetics 175: 1419-1427

Ho CH, Magtanong L, Barker SL, Gresham D, Nishimura S, Natarajan P, Koh JLY, Porter J, Gray CA, Andersen RJ, Giaever G, Nislow C, Andrews B, Botstein D, Graham TR, Yoshida M, Boone C (2009) A molecular barcoded yeast ORF library enables mode-of-action analysis of bioactive compounds. Nat Biotechnol 27: 369-377

Holmberg CI, Tran SEF, Eriksson JE, Sistonen L (2002) Multisite phosphorylation provides sophisticated regulation of transcription factors. Trends in Biochemical Sciences 27: 619-627

Holt LJ, Tuch BB, Villen J, Johnson AD, Gygi SP, Morgan DO (2009) Global Analysis of Cdk1 Substrate Phosphorylation Sites Provides Insights into Evolution. Science 325: 1682-1686

Hornbeck PV, Kornhauser JM, Tkachev S, Zhang B, Skrzypek E, Murray B, Latham V, Sullivan M (2012) PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse. Nucleic acids research 40: D261-270

Hsueh KW, Fu SL, Chang CB, Chang YL, Lin CH (2013) A novel Aurora-A-mediated phosphorylation of p53 inhibits its interaction with MDM2. Biochimica et biophysica acta 1834: 508-515

Hunter T (2000) Signaling - 2000 and beyond. Cell 100: 113-127

Hunter T (2007) The age of crosstalk: phosphorylation, ubiquitination, and beyond. Mol Cell 28: 730-738

Hurles M (2004) Gene duplication: the genomic trade in spare parts. PLoS Biology 2: 900-904

Hutti JE, Jarrell ET, Chang JD, Abbott DW, Storz P, Toker A, Cantley LC, Turk BE (2004) A rapid method for determining protein kinase phosphorylation specificity. Nature methods 1: 27-29

Huttlin EL, Jedrychowski MP, Elias JE, Goswami T, Rad R, Beausoleil SA, Villen J, Haas W, Sowa ME, Gygi SP (2010) A tissue-specific atlas of mouse protein phosphorylation and expression. Cell 143: 1174-1189

Iakoucheva LM, Radivojac P, Brown CJ, O'Connor TR, Sikes JG, Obradovic Z, Dunker AK (2004) The importance of intrinsic disorder for protein phosphorylation. Nucleic acids research 32: 1037-1049

Ideker T, Krogan NJ (2012) Differential network biology. Mol Syst Biol 8

ImageJ -- imagej.nih.gov/ij/

Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y (2001) A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proceedings of the National Academy of Sciences of the United States of America 98: 4569-4574

Janke C, Magiera MM, Rathfelder N, Taxis C, Reber S, Maekawa H, Moreno-Borchart A, Doenges G, Schwob E, Schiebel E, Knop M (2004) A versatile toolbox for PCR-based tagging of yeast genes: new fluorescent proteins, more markers and promoter substitution cassettes. Yeast 21: 947-962

Jensen LJ, Jensen TS, de Lichtenberg U, Brunak S, Bork P (2006) Co-evolution of transcriptional and post-translational cell-cycle regulation. Nature 443: 594-597

Jin H, Zangar RC (2009) Protein modifications as potential biomarkers in breast cancer. Biomarker insights 4: 191-200

Johnson KE, Cameron S, Toda T, Wigler M, Zoller MJ (1987) Expression in Escherichia-Coli of Bcy1, the Regulatory Subunit of Cyclic Amp-Dependent Protein-Kinase from Saccharomyces-Cerevisiae - Purification and Characterization. J Biol Chem 262: 8636-8642

Kaganovich M, Snyder M (2012) Phosphorylation of yeast transcription factors correlates with the evolution of novel sequence and function. Journal of Proteome Research 11: 261-268

Kamemura K, Hayes BK, Comer FI, Hart GW (2002) Dynamic interplay between O-glycosylation and O-phosphorylation of nucleocytoplasmic proteins: alternative glycosylation/phosphorylation of THR-58, a known mutational hot spot of c-Myc in lymphomas, is regulated by mitogens. J Biol Chem 277: 19229-19235

Kantarci S, Al-Gazali L, Hill RS, Donnai D, Black GC, Bieth E, Chassaing N, Lacombe D, Devriendt K, Teebi A, Loscertales M, Robson C, Liu T, MacLaughlin DT, Noonan KM, Russell MK, Walsh CA, Donahoe PK, Pober BR (2007) Mutations in LRP2, which encodes the multiligand receptor megalin, cause Donnai-Barrow and facio-oculo-acoustico-renal syndromes. Nature genetics 39: 957-959

Kapoor M, Hamm R, Yan W, Taya Y, Lozano G (2000) Cooperative phosphorylation at multiple sites is required to activate p53 in response to UV radiation. Oncogene 19: 358-364

Kastan MB (2008) DNA damage responses: mechanisms and roles in human disease: 2007 G.H.A. Clowes Memorial Award Lecture. Molecular cancer research : MCR 6: 517-524

Kellis M, Birren BW, Lander ES (2004) Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428: 617-624

Kent WJ (2002) BLAT---The BLAST-Like Alignment Tool. Genome Research 12: 656-664

Keshava Prasad TS, Goel R, Kandasamy K, Keerthikumar S, Kumar S, Mathivanan S, Telikicherla D, Raju R, Shafreen B, Venugopal A, Balakrishnan L, Marimuthu A, Banerjee S, Somanathan DS, Sebastian A, Rani S, Ray S, Harrys Kishore CJ, Kanth S, Ahmed M et al (2009) Human Protein Reference Database--2009 update. Nucleic acids research 37: D767-772

Khmelinskii A, Roostalu J, Roque H, Antony C, Schiebel E (2009) Phosphorylation-Dependent Protein Interactions at the Spindle Midzone Mediate Cell Cycle Regulation of Spindle Elongation. Dev Cell 17: 244-256

Khoury GA, Baliban RC, Floudas CA (2011) Proteome-wide post-translational modification statistics: frequency analysis and curation of the swiss-prot database. Scientific reports 1

Kikani CK, Antonysamy SA, Bonanno JB, Romero R, Zhang FF, Russell M, Gheyi T, Iizuka M, Emtage S, Sauder JM, Turk BE, Burley SK, Rutter J (2010) Structural bases of PAS domain-regulated kinase (PASK) activation in the absence of activation loop phosphorylation. The Journal of biological chemistry 285: 41034-41043

Kim DS, Hahn Y (2011) Identification of novel phosphorylation modification sites in human proteins that originated after the human-chimpanzee divergence. Bioinformatics 27: 2494-2501

Kim W, Bennett EJ, Huttlin EL, Guo A, Li J, Possemato A, Sowa ME, Rad R, Rush J, Comb MJ, Harper JW, Gygi SP (2011) Systematic and quantitative assessment of the ubiquitin-modified proteome. Mol Cell 44: 325-340

Koivomagi M, Valk E, Venta R, Iofik A, Lepiku M, Balog ER, Rubin SM, Morgan DO, Loog M (2011) Cascades of multisite phosphorylation control Sic1 destruction at the onset of S phase. Nature 480: 128-131

Kondrashov FA, Koonin EV (2004) A common framework for understanding the origin of genetic dominance and evolutionary fates of gene duplications. Trends Genet 20: 287-290

Kondrashov FA, Rogozin IB, Wolf YI, Koonin EV (2002) Selection in the evolution of gene duplications. Genome biology 3

Kozlov SV, Graham ME, Jakob B, Tobias F, Kijas AW, Tanuji M, Chen P, Robinson PJ, Taucher-Scholz G, Suzuki K, So S, Chen D, Lavin MF (2011) Autophosphorylation and ATM activation: additional sites add to the complexity. J Biol Chem 286: 9107-9119

Krogan NJ, Cagney G, Yu HY, Zhong GQ, Guo XH, Ignatchenko A, Li J, Pu SY, Datta N, Tikuisis AP, Punna T, Peregrin-Alvarez JM, Shales M, Zhang X, Davey M, Robinson MD, Paccanaro A, Bray JE, Sheung A, Beattie B et al (2006) Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 440: 637-643

Kurmangaliyev YZ, Goland A, Gelfand MS (2011) Evolutionary patterns of phosphorylated serines. Biol Direct 6

Landry CR, Levy ED, Michnick SW (2009) Weak functional constraints on phosphoproteomes. Trends in Genetics: TIG 25: 193-197

Latham JA, Dent SY (2007) Cross-regulation of histone modifications. Nature structural & molecular biology 14: 1017-1024

Levine AJ, Oren M (2009) The first 30 years of p53: growing ever more complex. Nature reviews Cancer 9: 749-758

Levy E, Michnick S, Landry C (2012) Protein abundance is key to distinguish promiscuous from functional phosphorylation based on evolutionary information. Philosophical transactions of the Royal Society of London Series B, Biological sciences 367: 2594-2606

Li M, Luo J, Brooks CL, Gu W (2002) Acetylation of p53 inhibits its ubiquitination by Mdm2. J Biol Chem 277: 50607-50611

Li SM, Armstrong CM, Bertin N, Ge H, Milstein S, Boxem M, Vidalain PO, Han JDJ, Chesneau A, Hao T, Goldberg DS, Li N, Martinez M, Rual JF, Lamesch P, Xu L, Tewari M, Wong SL, Zhang LV, Berriz GF et al (2004) A map of the interactome network of the metazoan C. elegans. Science 303: 540-543

Li X, Gerber SA, Rudner AD, Beausoleil SA, Haas W, Villen J, Elias JE, Gygi SP (2007) Large-scale phosphorylation analysis of alpha-factor-arrested Saccharomyces cerevisiae. Journal of Proteome Research 6: 1190-1197

Lienhard GE (2008) Non-functional? Trends in Biochemical Sciences 33: 351-352

Lim WA, Pawson T (2010) Phosphotyrosine signaling: evolving a new cellular communication system. Cell 142: 661-667

Lin DI, Barbash O, Kumar KG, Weber JD, Harper JW, Klein-Szanto AJ, Rustgi A, Fuchs SY, Diehl JA (2006) Phosphorylation-dependent ubiquitination of cyclin D1 by the SCF(FBX4-alphaB crystallin) complex. Mol Cell 24: 355-366

Liu X, Yu X, Zack DJ, Zhu H, Qian J (2008) TiGER: a database for tissue-specific gene expression and regulation. BMC Bioinformatics 9: 271

Livanova NB, Chebotareva NA, Eronina TB, Kurganov BI (2002) Pyridoxal 5'-phosphate as a catalytic and conformational cofactor of muscle glycogen phosphorylase B. Biochemistry Biokhimiia 67: 1089-1098

Longo VD (2003) The Ras and Sch9 pathways regulate stress resistance and longevity. Exp Gerontol 38: 807-811

Lu CT, Huang KY, Su MG, Lee TY, Bretana NA, Chang WC, Chen YJ, Chen YJ, Huang HD (2013) DbPTM 3.0: an informative resource for investigating substrate site specificity and functional association of protein post-translational modifications. Nucleic Acids Res 41: D295-305

Lynch M, Conery JS (2000) The evolutionary fate and consequences of duplicate genes. Science 290: 1151-1155

Lynch M, Force A (2000) The probability of duplicate gene preservation by subfunctionalization. Genetics 154: 459-459

Lynch M, Sung W, Morris K, Coffey N, Landry CR, Dopman EB, Dickinson WJ, Okamoto K, Kulkarni S, Hartl DL, Thomas WK (2008) A genome-wide view of the spectrum of spontaneous mutations in yeast. Proceedings of the National Academy of Sciences of the United States of America 105: 9272-9277

Ma PS, Wera S, Van Dijck P, Thevelein JM (1999) The PDE1-encoded low-affinity phosphodiesterase in the yeast Saccharomyces cerevisiae has a specific function in controlling agonist-induced cAMP signaling. Mol Biol Cell 10: 91-104

Macek B, Gnad F, Soufi B, Kumar C, Olsen JV, Mijakovic I, Mann M (2008) Phosphoproteome analysis of E. coli reveals evolutionary conservation of bacterial Ser/Thr/Tyr phosphorylation. Molecular & cellular proteomics : MCP 7: 299-307

Madeo F, Schlauer J, Zischka H, Mecke D, Frohlich KU (1998) Tyrosine phosphorylation regulates cell cycle-dependent nuclear localization of Cdc48p. Mol Biol Cell 9: 131-141

Malik R, Nigg EA, Korner R (2008) Comparative conservation analysis of the human mitotic phosphoproteome. Bioinformatics 24: 1426-1432

Mann M, Jensen ON (2003) Proteomic analysis of post-translational modifications. Nat Biotechnol 21: 255-261

Manning G, Whyte DB, Martinez R, Hunter T, Sudarsanam S (2002) The protein kinase complement of the human genome. Science 298: 1912-1934

Marcantonio M, Trost M, Courcelles M, Desjardins M, Thibault P (2008) Combined enzymatic and data mining approaches for comprehensive phosphoproteome analyses: application to cell signaling events of interferon-gamma-stimulated macrophages. Molecular & Cellular Proteomics: MCP 7: 645-660

Martin DE, Soulard A, Hall MN (2004) TOR regulates ribosomal protein gene expression via PKA and the forkhead transcription factor FHL1. Cell 119: 969-979

Meetei AR, Medhurst AL, Ling C, Xue Y, Singh TR, Bier P, Steltenpool J, Stone S, Dokal I, Mathew CG, Hoatlin M, Joenje H, de Winter JP, Wang W (2005) A human ortholog of archaeal DNA repair protein Hef is defective in Fanconi anemia complementation group M. Nature genetics 37: 958-963

Michnick SW, Ear PH, Manderson EN, Remy I, Stefan E (2007) Universal strategies in research and drug discovery based on protein-fragment complementation assays. Nat Rev Drug Discov 6: 569-582

Miller ML, Jensen LJ, Diella F, Jorgensen C, Tinti M, Li L, Hsiung M, Parker SA, Bordeaux J, Sicheritz-Ponten T, Olhovsky M, Pasculescu A, Alexander J, Knapp S, Blom N, Bork P, Li S, Cesareni G, Pawson T, Turk BE et al (2008) Linear motif atlas for phosphorylation-dependent signaling. Science signaling 1: ra2

Minguez P, Parca L, Diella F, Mende DR, Kumar R, Helmer-Citterich M, Gavin AC, van Noort V, Bork P (2012) Deciphering a global network of functionally associated post-translational modifications. Mol Syst Biol 8: 599

Miura Y, Sakurai Y, Endo T (2012) O-GlcNAc modification affects the ATM-mediated DNA damage response. Biochimica et biophysica acta 1820: 1678-1685

Mok J, Kim PM, Lam HYK, Piccirillo S, Zhou X, Jeschke GR, Sheridan DL, Parker SA, Desai V, Jwa M, Cameroni E, Niu H, Good M, Remenyi A, Ma J-LN, Sheu Y-J, Sassi HE, Sopko R, Chan CSM, De Virgilio C et al (2010) Deciphering protein kinase specificity through large-scale analysis of yeast phosphorylation site motifs. Science Signaling 3: ra12-ra12

Moll UM, Petrenko O (2003) The MDM2-p53 interaction. Molecular cancer research : MCR 1: 1001-1008

Morell M, Ventura S, Aviles FX (2009) Protein complementation assays: Approaches for the in vivo analysis of protein interactions. Febs Lett 583: 1684-1691

Moses AM, Landry CR (2010) Moving from transcriptional to phospho-evolution: generalizing regulatory evolution? Trends in Genetics: TIG 26: 462-467

Moses AM, Liku ME, Li JJ, Durbin R (2007) Regulatory evolution in proteins by turnover and lineage-specific changes of cyclin-dependent kinase consensus sites. Proceedings of the National Academy of Sciences of the United States of America 104: 17713-17718

Mukherjee S, Keitany G, Li Y, Wang Y, Ball HL, Goldsmith EJ, Orth K (2006) Yersinia YopJ acetylates and inhibits kinase activation by blocking phosphorylation. Science 312: 1211-1214

Musso G, Costanzo M, Huangfu M, Smith AM, Paw J, San Luis B-J, Boone C, Giaever G, Nislow C, Emili A, Zhang Z (2008) The extensive and condition-dependent nature of epistasis among whole-genome duplicates in yeast. Genome Research 18: 1092-1099

Nash P, Tang X, Orlicky S, Chen Q, Gertler FB, Mendenhall MD, Sicheri F, Pawson T, Tyers M (2001) Multisite phosphorylation of a CDK inhibitor sets a threshold for the onset of DNA replication. Nature 414: 514-521

Nguyen Ba AN, Moses AM (2010) Evolution of Characterized Phosphorylation Sites in Budding Yeast. Molecular Biology and Evolution 27: 2027-2037

Nussinov R, Tsai CJ, Xin F, Radivojac P (2012) Allosteric post-translational modification codes. Trends in biochemical sciences 37: 447-455

Ohno S (1970) Evolution by gene duplication. London, New York: Allen & Unwin; Springer-Verlag.

Olsen JV, Blagoev B, Gnad F, Macek B, Kumar C, Mortensen P, Mann M (2006) Global, in vivo, and site-specific phosphorylation dynamics in signaling networks. Cell 127: 635-648

Olsen JV, Mann M (2013) Status of large-scale analysis of post-translational modifications by mass spectrometry. Mol Cell Proteomics 12: 3444-3452

Papp B, Pál C, Hurst LD (2003) Evolution of cis-regulatory elements in duplicated genes of yeast. Trends in Genetics: TIG 19: 417-422

Pearlman SM, Serber Z, Ferrell JE (2011) A Mechanism for the Evolution of Phosphorylation Sites. Cell 147: 934-946

Pike AC, Rellos P, Niesen FH, Turnbull A, Oliver AW, Parker SA, Turk BE, Pearl LH, Knapp S (2008) Activation segment dimerization: a mechanism for kinase autophosphorylation of non-consensus sites. Embo J 27: 704-714

Pincus D, Letunic I, Bork P, Lim WA (2008) Evolution of the phospho-tyrosine signaling machinery in premetazoan lineages. Proc Natl Acad Sci U S A 105: 9680-9684

Portela P, Van Dijck P, Thevelein JM, Moreno S (2003) Activation state of protein kinase A as measured in permeabilised Saccharomyces cerevisiae cells correlates with PKA-controlled phenotypes in vivo. Fems Yeast Res 3: 119-126

Prabakaran S, Lippens G, Steen H, Gunawardena J (2012) Post-translational modification: nature's escape from genetic imprisonment and the basis for dynamic information encoding. Wiley interdisciplinary reviews Systems biology and medicine 4: 565-583

Ptacek J, Devgan G, Michaud G, Zhu H, Zhu X, Fasolo J, Guo H, Jona G, Breitkreutz A, Sopko R, McCartney RR, Schmidt MC, Rachidi N, Lee S-J, Mah AS, Meng L, Stark MJR, Stern DF, De Virgilio C, Tyers M et al (2005) Global analysis of protein phosphorylation in yeast. Nature 438: 679-684

Pupko T, Bell RE, Mayrose I, Glaser F, Ben-Tal N (2002) Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues. Bioinformatics 18 Suppl 1: S71-77

Ramachandran V, Shah KH, Herman PK (2011) The cAMP-Dependent Protein Kinase Signaling Pathway Is a Key Regulator of P Body Foci Formation. Mol Cell 43: 973-981

Reinders J, Wagner K, Zahedi RP, Stojanovski D, Eyrich B, van der Laan M, Rehling P, Sickmann A, Pfanner N, Meisinger C (2007) Profiling phosphoproteins of yeast mitochondria reveals a role of phosphorylation in assembly of the ATP synthase. Mol Cell Proteomics 6: 1896-1906

Rennefahrt UE, Deacon SW, Parker SA, Devarajan K, Beeser A, Chernoff J, Knapp S, Turk BE, Peterson JR (2007) Specificity profiling of Pak kinases allows identification of novel phosphorylation sites. The Journal of biological chemistry 282: 15667-15678

Rodgers-Melnick E, Mane SP, Dharmawardhana P, Slavov GT, Crasta OR, Strauss SH, Brunner AM, DiFazio SP (2012) Contrasting patterns of evolution following whole genome versus tandem duplication events in Populus. Genome research 22: 95-105

Rual JF, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, Li N, Berriz GF, Gibbons FD, Dreze M, Ayivi-Guedehoussou N, Klitgord N, Simon C, Boxem M, Milstein S, Rosenberg J, Goldberg DS, Zhang LV, Wong SL, Franklin G, Li SM et al (2005) Towards a proteome-scale map of the human protein-protein interaction network. Nature 437: 1173-1178

Ruan HB, Singh JP, Li MD, Wu J, Yang X (2013) Cracking the O-GlcNAc code in metabolism. Trends in endocrinology and metabolism: TEM 24: 301-309

Scannell DR, Wolfe KH (2008) A burst of protein sequence evolution and a prolonged period of asymmetric evolution follow gene duplication in yeast. Genome research 18: 137-147

Schlecht U, Miranda M, Suresh S, Davis RW, St Onge RP (2012) Multiplex assay for condition-dependent changes in protein-protein interactions. Proceedings of the National Academy of Sciences of the United States of America

Schwammle V, Aspalter CM, Sidoli S, Jensen ON (2014) Large scale analysis of co-existing post-translational modifications in histone tails reveals global fine structure of cross-talk. Molecular & cellular proteomics : MCP 13: 1855-1865

Schwanhausser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337-342

Searle JS, Wood MD, Kaur M, Tobin DV, Sanchez Y (2011) Proteins in the Nutrient-Sensing and DNA Damage Checkpoint Pathways Cooperate to Restrain Mitotic Progression following DNA Damage. Plos Genetics 7

Seo J, Lee KJ (2004) Post-translational modifications and their biological functions: Proteomic analysis and systematic approaches. J Biochem Mol Biol 37: 35-44

Seoighe C, Wolfe KH (1999) Yeast genome evolution in the post-genome era. Current opinion in microbiology 2: 548-554

Serber Z, Ferrell JE (2007) Tuning bulk electrostatics to regulate protein function. Cell 128: 441-444

Sharma K, D'Souza RC, Tyanova S, Schaab C, Wisniewski JR, Cox J, Mann M (2014) Ultradeep human phosphoproteome reveals a distinct regulatory nature of Tyr and Ser/Thr-based signaling. Cell reports 8: 1583-1594

Sheridan DL, Kong Y, Parker SA, Dalby KN, Turk BE (2008) Substrate discrimination among mitogen-activated protein kinases through distinct docking sequence motifs. J Biol Chem 283: 19511-19520

Skou JC (1965) Enzymatic Basis for Active Transport of Na+ and K+ across Cell Membrane. Physiol Rev 45: 596-&

Souciet J-L, Dujon B, Gaillardin C, Johnston M, Baret PV, Cliften P, Sherman DJ, Weissenbach J, Westhof E, Wincker P, Jubin C, Poulain J, Barbe V, Ségurens B, Artiguenave F, Anthouard V, Vacherie B, Val M-E, Fulton RS, Minx P et al (2009) Comparative genomics of protoploid Saccharomycetaceae. Genome Research 19: 1696-1709

Soulard A, Cremonesi A, Moes S, Schutz F, Jeno P, Hall MN (2010) The Rapamycin-sensitive Phosphoproteome Reveals That TOR Controls Protein Kinase A Toward Some But Not All Substrates. Mol Biol Cell 21: 3475-3486

Sprang SR, Acharya KR, Goldsmith EJ, Stuart DI, Varvill K, Fletterick RJ, Madsen NB, Johnson LN (1988) Structural changes in glycogen phosphorylase induced by phosphorylation. Nature 336: 215-221

Stefan E, Aquin S, Berger N, Landry CR, Nyfeler B, Bouvier M, Michnick SW (2007) Quantification of dynamic protein complexes using Renilla luciferase fragment complementation applied to protein kinase A activities in vivo. Proceedings of the National Academy of Sciences of the United States of America 104: 16916-16921

Tan CS, Bodenmiller B, Pasculescu A, Jovanovic M, Hengartner MO, Jorgensen C, Bader GD, Aebersold R, Pawson T, Linding R (2009) Comparative analysis reveals conserved protein phosphorylation networks implicated in multiple diseases. Science signaling 2: ra39

Tarassov K, Messier V, Landry CR, Radinovic S, Molina MMS, Shames I, Malitskaya Y, Vogel J, Bussey H, Michnick SW (2008) An in vivo map of the yeast protein interactome. Science 320: 1465-1470

Tarrant MK, Cole PA (2009) The Chemical Biology of Protein Phosphorylation. Annu Rev Biochem 78: 797-825

Taylor SS, Radzioandzelm E, Hunter T (1995) Protein-Kinases .8. How Do Protein-Kinases Discriminate between Serine Threonine and Tyrosine - Structural Insights from the Insulin-Receptor Protein-Tyrosine Kinase. Faseb J 9: 1255-1266

The R project for Statistical Computing -- www.r-project.org/

Thingholm TE, Jørgensen TJD, Jensen ON, Larsen MR (2006) Highly selective enrichment of phosphorylated peptides using titanium dioxide. Nature Protocols 1: 1929-1935

Tirosh I, Barkai N (2007) Comparative analysis indicates regulatory neofunctionalization of yeast duplicates. Genome Biology 8: R50-R50

Toda T, Cameron S, Sass P, Zoller M, Wigler M (1987) Three different genes in S. cerevisiae encode the catalytic subunits of the cAMP-dependent protein kinase. Cell 50: 277-287

Treco DA, Winston F (2008) Growth and manipulation of yeast. Current protocols in molecular biology Chapter 13: Unit 13.12-Unit 13.12

Trinidad JC, Barkan DT, Gulledge BF, Thalhammer A, Sali A, Schoepfer R, Burlingame AL (2012) Global identification and characterization of both O-GlcNAcylation and phosphorylation at the murine synapse. Mol Cell Proteomics 11: 215-229

Ubersax JA, Ferrell JE, Jr. (2007) Mechanisms of specificity in protein phosphorylation. Nat Rev Mol Cell Biol 8: 530-541

Ubersax JA, Woodbury EL, Quang PN, Paraz M, Blethrow JD, Shah K, Shokat KM, Morgan DO (2003) Targets of the cyclin-dependent kinase Cdk1. Nature 425: 859-864

Uckun FM, Ma H, Zhang J, Ozer Z, Dovat S, Mao C, Ishkhanian R, Goodman P, Qazi S (2012) Serine phosphorylation by SYK is critical for nuclear localization and transcription factor function of Ikaros. Proc Natl Acad Sci U S A 109: 18072-18077

van Hoof A (2005) Conserved Functions of Yeast Genes Support the Duplication, Degeneration and Complementation Model for Gene Duplication. Genetics 171: 1455-1461

Vazquez F, Ramaswamy S, Nakamura N, Sellers WR (2000) Phosphorylation of the PTEN tail regulates protein stability and function. Mol Cell Biol 20: 5010-5018

Verma R, Annan RS, Huddleston MJ, Carr SA, Reynard G, Deshaies RJ (1997) Phosphorylation of Sic1p by G1 Cdk required for its degradation and entry into S phase. Science 278: 455-460

Vidal M, Cusick ME, Barabasi AL (2011) Interactome networks and human disease. Cell 144: 986-998

Wagner A (2001a) Birth and death of duplicated genes in completely sequenced eukaryotes. Trends Genet 17: 237-239

Wagner A (2001b) The yeast protein interaction network evolves rapidly and contains few redundant duplicate genes. Molecular Biology and Evolution 18: 1283-1292

Wang J, Torii M, Liu H, Hart GW, Hu ZZ (2011) dbOGAP - an integrated bioinformatics resource for protein O-GlcNAcylation. BMC Bioinformatics 12: 91

Wang M, Weiss M, Simonovic M, Haertinger G, Schrimpf SP, Hengartner MO, von Mering C (2012) PaxDb, a Database of Protein Abundance Averages Across All Three Domains of Life. Mol Cell Proteomics 11: 492-500

Ward JJ, McGuffin LJ, Bryson K, Buxton BF, Jones DT (2004a) The DISOPRED server for the prediction of protein disorder. Bioinformatics 20: 2138-2139

Ward JJ, Sodhi JS, McGuffin LJ, Buxton BF, Jones DT (2004b) Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. Journal of molecular biology 337: 635-645

Winzeler EA, Shoemaker DD, Astromoff A, Liang H, Anderson K, Andre B, Bangham R, Benito R, Boeke JD, Bussey H, Chu AM, Connelly C, Davis K, Dietrich F, Dow SW, El Bakkoury M, Foury F, Friend SH, Gentalen E, Giaever G et al (1999) Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science 285: 901-906

Wolfe KH, Shields DC (1997) Molecular evidence for an ancient duplication of the entire yeast genome. Nature 387: 708-713

Wong A, Zhang YW, Jeschke GR, Turk BE, Rudnick G (2012) Cyclic GMP-dependent Stimulation of Serotonin Transport Does Not Involve Direct Transporter Phosphorylation by cGMP-dependent Protein Kinase. J Biol Chem 287: 36051-36058

Yang ZH (2007) PAML 4: Phylogenetic analysis by maximum likelihood. Molecular biology and evolution 24: 1586-1591

Zeidan Q, Hart GW (2010) The intersections between O-GlcNAcylation and phosphorylation: implications for multiple signaling pathways. Journal of cell science 123: 13-22

Zhang JZ (2003) Evolution by gene duplication: an update. Trends Ecol Evol 18: 292-298

Zhang P, Smith-Nguyen EV, Keshwani MM, Deal MS, Kornev AP, Taylor SS (2012) Structure and Allostery of the PKA RII beta Tetrameric Holoenzyme. Science 335: 712-716

Zhao Y, Jensen ON (2009) Modification-specific proteomics: strategies for characterization of post-translational modifications using enrichment techniques. Proteomics 9: 4632-4641

Zhu H, Klemic JF, Chang S, Bertone P, Casamayor A, Klemic KG, Smith D, Gerstein M, Reed MA, Snyder M (2000) Analysis of yeast protein kinases using protein chips. Nat Genet 26: 283-289

Zielinska DF, Gnad F, Jedrusik-Bode M, Wisniewski JR, Mann M (2009) Caenorhabditis elegans has a phosphoproteome atypical for metazoans that is enriched in developmental and sex determination proteins. J Proteome Res 8: 4039-4049

Zielinska DF, Gnad F, Wisniewski JR, Mann M (2010) Precision mapping of an in vivo N-glycoproteome reveals rigid topological and sequence constraints. Cell 141: 897-907
