
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine 1116

Evolution of the G protein-coupled receptor signaling system

Genomic and phylogenetic analyses

ARUNKUMAR KRISHNAN

ACTA UNIVERSITATIS UPSALIENSIS
UPPSALA 2015
ISSN 1651-6206
ISBN 978-91-554-9277-9
urn:nbn:se:uu:diva-258956

Dissertation presented at Uppsala University to be publicly examined in C8:301, Uppsala Biomedical Centre (BMC), Husargatan 3, Uppsala, Wednesday, 9 September 2015 at 09:15 for the degree of Doctor of Philosophy (Faculty of Medicine). The examination will be conducted in English. Faculty examiner: Professor Torgeir Hvidsten (Department of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences).

Abstract

Krishnan, A. 2015. Evolution of the G protein-coupled receptor signaling system. Genomic and phylogenetic analyses. Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine 1116. 56 pp. Uppsala: Acta Universitatis Upsaliensis. ISBN 978-91-554-9277-9.

Signal transduction pathways mediated by G protein-coupled receptors (GPCRs) and their intracellular coupling partners, the heterotrimeric G proteins, are crucial for several physiological functions in eukaryotes, including . This thesis describes a broad genomic survey and extensive comparative phylogenetic analysis of GPCR and G protein families from a wide selection of eukaryotes. A robust mining of GPCR families in fungal genomes (Paper I) provides the first evidence that homologs of the mammalian GPCR families, including Rhodopsin, Adhesion, Glutamate and Frizzled, are present in Fungi. These findings further support the hypothesis that all main GPCR families share a common origin. Moreover, we clarified the evolutionary hierarchy by showing for the first time that Rhodopsin family members are found outside metazoan lineages. We also characterized the GPCR superfamily in two important model organisms (Amphimedon queenslandica and Saccoglossus kowalevskii) that belong to different metazoan phyla and differ greatly in morphological characteristics. Curation of the GPCR superfamily (Paper II) in Amphimedon queenslandica (an important model for understanding the evolution of animal multicellularity) reveals the presence of four of the five GRAFS families and several other GPCR families. However, we find that the sponge GPCR subset is divergent from the GPCRs of the bilaterian and eumetazoan lineages studied. Mapping of the GPCR superfamily (Paper III) in the hemichordate Saccoglossus kowalevskii (an essential model for understanding the evolution of the chordate body plan) revealed the presence of all major GPCR GRAFS families. We find that S. kowalevskii encodes local expansions of peptide and somatostatin-like GPCRs. Furthermore, we delineate the overall evolutionary hierarchy of vertebrate-like G protein families (Paper IV) and provide a comparative perspective with GPCR repertoires.
The study also maps the individual gene gain/loss events of G proteins across holozoans, with broader invertebrate taxon sampling than earlier reports. In addition, Paper V describes a broad survey of nematode chemosensory GPCR families and provides insights into the evolutionary events that shaped the GPCR-mediated chemosensory system in protostomes. Overall, our findings further illustrate the evolutionary hierarchy and the diversity of the major components of the G protein-coupled receptor signaling system in eukaryotes.

Keywords: GPCRs, G proteins, Sensory system, , Olfaction, Chemosensation, Hemichordates, Sponges, Porifera, Bilaterians, Holozoans, Fungi, Opisthokonts

Arunkumar Krishnan, , Department of Neuroscience, Functional Pharmacology, Box 593, Uppsala University, SE-75124 Uppsala, Sweden.

© Arunkumar Krishnan 2015

ISSN 1651-6206 ISBN 978-91-554-9277-9 urn:nbn:se:uu:diva-258956 (http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-258956)

You can't even begin to understand biology, you can't understand life, unless you understand what it's all there for, how it arose - and that means evolution. - Richard Dawkins

List of Papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals.

I Krishnan, A.*, Almen, MS.*, Fredriksson, R., Schioth, HB. (2012) The origin of GPCRs: identification of mammalian like Rhodopsin, Adhesion, Glutamate and Frizzled GPCRs in fungi. PLoS One, 7(1): e29817

II Krishnan, A.*, Dnyansagar, R.*, Almen, MS., Williams, MJ., Fredriksson, R., Narayanan, M., Schioth, HB. (2014) The GPCR repertoire in the demosponge Amphimedon queenslandica: insights into the GPCR system at the early divergence of animals. BMC Evolutionary Biology, 14:270

III Krishnan, A., Almen, MS., Fredriksson, R., Schioth, HB. (2013) Remarkable similarities between the hemichordate (Saccoglossus kowalevskii) and vertebrate GPCR repertoire. Gene, 526: 122-133

IV Krishnan, A., Mustafa, A., Almen, MS., Fredriksson, R., Williams, MJ., Schioth, HB. (2015) Evolutionary hierarchy of vertebrate-like heterotrimeric G protein families. Molecular Phylogenetics and Evolution, 91: 27-40

V Krishnan, A., Almen, MS., Fredriksson, R., Schioth, HB. (2014) Insights into the origin of nematode chemosensory GPCRs: putative orthologs of the Srw family are found across several phyla of protostomes. PLoS One, 9: e93048

*These authors contributed equally

Reprints were made with permission from the respective publishers.

Additional publications

In addition to the papers included in this thesis, the author has published the following papers.

 Duc, DL., Renaud, G., Krishnan, A., Almén, MS., Huynen, L., Prohaska, SJ., Ongyerth, M., Bitarello, BD., Schiöth, HB., Hofreiter, M., Stadler, PF., Prüfer, K., Lambert, D., Kelso, J., Schöneberg T. (2015) Kiwi genome provides insights into evolution of a nocturnal lifestyle. Genome Biology, Accepted

 Hamann, J., Aust, G., Arac, D., Engel, FB., Formstone, C., Fredriksson, R., Hall, RA., Harty, BL., Kirchhoff, C., Knapp, B., Krishnan, A., Liebscher, I., Lin, HH., Martinelli, DC., Monk, KR., Peeters, MC., Piao, X., Promel, S., Schoneberg, T., Schwartz, TW., Singer, K., Stacey, M., Ushkaryov, YA., Vallon, M., Wolfrum, U., Wright, M.W., Xu, L., Langenhan, T., Schioth, H.B. (2015) International Union of Basic and Clinical Pharmacology. XCIV. Adhesion G Protein-Coupled Receptors. Pharmacological Reviews, 67:338-367

 Harty, BL., Krishnan, A., Sanchez, NE., Schioth, HB., Monk, KR. (2015) Defining the gene repertoire and spatiotemporal expression profiles of adhesion G protein-coupled receptors in zebrafish. BMC Genomics, 16:62

 Krishnan, A., Schioth, HB. (2015) The role of G protein-coupled receptors in the early evolution of neurotransmission and the nervous system. Journal of Experimental Biology, 218:562-571

 Eriksson, A., Williams, MJ., Voisin, S., Hansson, I., Krishnan, A., Philippot, G., Yamskova, O., Herisson, FM., Dnyansagar, R., Moschonis, G., Manios, Y., Chrousos, GP., Olszewski, PK., Fredriksson, R., Schioth, HB. (2015) Implication of coronin 7 in body weight regulation in humans, mice and flies. BMC Neuroscience, 16:13

 Valsalan, R., Krishnan, A., Almen, MS., Fredriksson, R., Schioth, HB. (2013) Early vertebrate origin of melanocortin 2 receptor accessory proteins (MRAPs). General and Comparative Endocrinology, 188:123-132

 Vastermark, A., Krishnan, A., Houle, ME., Fredriksson, R., Cerda-Reverter, JM., Schioth, HB. (2012) Identification of distant Agouti-like sequences and re-evaluation of the evolutionary history of the Agouti-related peptide (AgRP). PLoS One, 7: e40982

Contents

Introduction
The cell membrane and its proteins
G protein-coupled receptors
The Glutamate family
The Rhodopsin family
The Adhesion family
The Frizzled family
The Secretin family
Other GPCR families
The “GPCR A” clan (CL0192)
Heterotrimeric G proteins
Diversity of GPCR signaling pathways
GPCR mediated sensory signaling
Eukaryotic evolution and taxon sampling
Aims
Paper I
Paper II
Paper III
Paper IV
Paper V
Materials and Methods
Sequence retrieval using similarity search methods
Multiple sequence alignments
Phylogenetic analysis
Domain search
HMM-HMM profile comparisons
Results and Discussion
Basal fungal species has homologs of metazoan-like GRAF(s) GPCR families (Paper I)
Invertebrate metazoans are rich in GPCRs (Papers II and III)
Evolution of G protein families and comparisons with GPCR repertoires (Paper IV)
Evolution of Nematode chemosensory GPCR system in protostomes (Paper V)
Conclusions
Future perspectives
Acknowledgements
References

Abbreviations

7TM Seven-pass transmembrane domain
ATP Adenosine triphosphate
CA Common ancestor
cAMP Cyclic adenosine monophosphate
CRD Cysteine-rich domain
CTF C-terminal fragment
ECD Extracellular domain
FZD Frizzled
GABA Gamma-aminobutyric acid
GAIN GPCR autoproteolysis-inducing domain
GDP Guanosine-5'-diphosphate
GEFs Guanine nucleotide exchange factors
GPCR G protein-coupled receptor
GPS GPCR proteolytic site
GTP Guanosine-5'-triphosphate
HMM Hidden Markov model
HRM Hormone receptor motif
IORs Insect odorant receptors
ITR Intimal thickness related receptor
LECA Last eukaryotic common ancestor
MCMC Markov chain Monte Carlo algorithms
NTF N-terminal fragment
ORs Olfactory receptors
PP Posterior probability
RGS Regulators of G protein signaling
RhoA Ras homolog gene family, member A
TM Transmembrane
VFD Venus flytrap domain

Introduction

The cell membrane and its proteins

The cell membrane helps define the structure and function of the cell by acting as a barrier that separates the interior of the cell from the outside environment [1]. The membrane is a thin, sheet-like structure formed by a phospholipid bilayer. Each phospholipid molecule is made up of two units: a polar head group containing glycerol and a phosphate group, and a hydrophobic tail formed by two hydrocarbon fatty acid chains. The non-polar fatty acid tail is covalently attached to the glycerol of the polar head region [2]. In aqueous environments, these phospholipids spontaneously arrange themselves so that the hydrophobic (water-hating) tails are packed inward, while the hydrophilic head groups face the surrounding water on both sides of the bilayer [2]. Although lipids constitute the major component of the cell membrane, a substantial proportion consists of membrane proteins that perform several essential roles in the functioning of a cell [3,4]. Major functions include transporting molecules across the plasma membrane, receiving extracellular signals and triggering intracellular responses, and maintaining stable cell-cell interactions [5]. Membrane proteins are thus among the most important biological macromolecules, and they help control the physiology of the organism. Membrane proteins can be classified into two main categories, peripheral or integral, based on the nature of their interaction with the cell membrane. Peripheral membrane proteins are temporarily associated with, or adhere to, the surface of the membrane, mainly through an amphipathic that adheres to the cell membrane or through a covalently attached lipid chain that anchors the protein to the membrane. Conversely, integral membrane proteins are embedded in the phospholipid bilayer.
The largest structural class of integral membrane proteins is α-helical; these proteins are classified as single-pass or multi-pass transmembrane proteins depending on the number of helices spanning the membrane. The membrane-spanning helices are rich in amino acids with non-polar side chains, which favor interaction with the hydrophobic core of the membrane, whereas the loops connecting the helices are hydrophilic and exposed to the aqueous environment on either side of the membrane. Other integral membrane proteins instead form a β-barrel composed of multiple β-strands, which acts as a transmembrane channel (for example, porins). In this thesis, we focus on the evolution and characterization of the largest family of integral α-helical transmembrane proteins, the G protein-coupled receptors (GPCRs), and their intracellular coupling partners, the G proteins [6,7].
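The hydrophobic character of membrane-spanning helices is what classic hydropathy plots exploit to flag candidate transmembrane segments. The sketch below is illustrative only and is not a method used in this thesis: it assumes the Kyte-Doolittle hydropathy scale with a 19-residue window and a 1.6 cut-off (values commonly suggested for that scale), and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values per amino acid (positive = hydrophobic)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD.get(aa, 0.0) for aa in seq.upper()]  # unknown residues count as neutral
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_tm_segments(seq: str, window: int = 19,
                          threshold: float = 1.6) -> list[tuple[int, int]]:
    """Merge consecutive above-threshold windows into (start, end) spans.

    Coordinates are 0-based and end-exclusive; each span covers every residue
    of the windows that passed the threshold.
    """
    segments = []
    start = None
    for i, h in enumerate(hydropathy_windows(seq, window)):
        if h >= threshold and start is None:
            start = i                                   # first passing window
        elif h < threshold and start is not None:
            segments.append((start, i - 1 + window))    # last passing window ended here
            start = None
    if start is not None:                               # run reaches the sequence end
        segments.append((start, len(seq)))
    return segments
```

For example, a run of 21 leucines flanked by 30 lysines on each side yields one candidate segment, (25, 56), which extends a few residues beyond the leucine run; this blurring is a known side effect of window averaging.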

G protein-coupled receptors

GPCRs constitute one of the largest families of membrane proteins and play a crucial role in eukaryotic signal transduction [6,7]. In humans and other animals, GPCRs mediate most cellular responses to hormones, neurotransmitters and environmental stimuli. Given their important physiological roles, they are major drug targets, with approximately 36% of current clinical drugs targeting these receptors [8,9,10,11]. GPCRs are characterized by seven membrane-spanning α-helices (the 7TM domain), an extracellular N-terminus, an intracellular C-terminus, and alternating extracellular and intracellular loops that connect the helices. Sequence analysis shows that the 7TM domain is relatively conserved within the GRAFS families, while the N-termini and the loops display great diversity in both length and sequence composition. This diversity in the N-termini and extracellular loops contributes to the remarkable ability of GPCRs to bind ligands of different shapes, sizes and chemical properties. Ligand binding induces conformational changes in the receptor that lead to heterotrimeric G protein activation and, in turn, an intracellular response [12]. The GPCR superfamily comprises about 800 members in the human genome, and several classification systems have been proposed, including A-F [13], 1-5 [14] and GRAFS [15]. Of these, the A-F and GRAFS systems are the most widely used. In this thesis, we use the GRAFS classification system to describe the evolutionary mining of GPCRs. On the basis of sequence similarity and phylogenetic inference, the GRAFS system divides the human GPCR repertoire into five main families: Glutamate (G), Rhodopsin (R), Adhesion (A), Frizzled/Taste2 (F), and Secretin (S) (Figure 1) [15].
Apart from these five main families, the GPCR superfamily also includes some small families, such as the GPR108 family, the intimal thickness-related receptor family, the ocular albinism receptor family and the Dictyostelium discoideum cAMP receptor family [15]. These families do not fall under the GRAFS system and are usually referred to as the “other” category. Mining of the GRAFS and other GPCR families across diverse metazoan taxa has shown that most GPCR families are found in metazoans and that some of them also have a pre-metazoan origin. For example, homologs of the Frizzled, Glutamate, Adhesion and cAMP families are found in Dictyostelium discoideum. However, previous mining of GPCRs focused mainly on metazoans, and no extensive mining covering several non-metazoan lineages had been performed. Moreover, the curation of GPCR repertoires in non-bilaterian metazoans and their evolutionary relationships with bilaterian GPCRs remain unresolved. In this thesis, we focus on a broad-scale mining of GPCR families in diverse eukaryotic taxa, including several previously unsampled non-metazoan species. We also study the evolution of G protein families in detail and present a comparative perspective and a robust update on the evolution of the GPCR signaling system.
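The generic GPCR topology described above (extracellular N-terminus, seven helices joined by alternating loops, intracellular C-terminus) can be made explicit with a small helper that labels each segment once the seven helix boundaries are known. This is an illustrative sketch: the function name and the toy coordinates are hypothetical, and real helix boundaries would have to come from a topology prediction or a structure.

```python
from typing import List, Tuple


def gpcr_topology(length: int,
                  tm_spans: List[Tuple[int, int]]) -> List[Tuple[str, int, int]]:
    """Label the segments of a 7TM protein from 0-based, end-exclusive TM spans.

    Assumes the canonical GPCR orientation: N-terminus outside and C-terminus
    inside, so the loop after TM1 is intracellular and loops then alternate.
    """
    if len(tm_spans) != 7:
        raise ValueError("expected exactly seven TM helices")
    spans = sorted(tm_spans)
    segments = [("N-terminus (extracellular)", 0, spans[0][0])]
    icl = ecl = 0
    for k, (start, end) in enumerate(spans):
        segments.append((f"TM{k + 1}", start, end))
        if k < 6:
            next_start = spans[k + 1][0]
            if k % 2 == 0:   # after TM1, TM3, TM5 the chain is on the cytoplasmic side
                icl += 1
                segments.append((f"ICL{icl}", end, next_start))
            else:            # after TM2, TM4, TM6 it is on the extracellular side
                ecl += 1
                segments.append((f"ECL{ecl}", end, next_start))
    segments.append(("C-terminus (intracellular)", spans[-1][1], length))
    return segments
```

With seven helices the helper returns 15 labeled segments, in the order N-terminus, TM1, ICL1, TM2, ECL1, and so on up to TM7 and the C-terminus.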

Figure 1: A simplified schematic figure depicting the most common or generic structural architecture of GRAFS GPCR families.

The Glutamate family

The Glutamate family (also known as class C GPCRs) mainly includes the metabotropic glutamate receptors (which bind the excitatory neurotransmitter glutamate), the gamma-aminobutyric acid type B receptors (GABABRs), the calcium-sensing receptor (CASR), the sweet and umami taste receptors (TAS1Rs) and other orphan receptors [16]. The Glutamate family plays important roles in many physiological processes, including synaptic transmission and calcium homeostasis. The extracellular domain (ECD) of Glutamate GPCRs is characterized by a Venus flytrap domain (VFD), which acts as the orthosteric binding site for native ligands, and a cysteine-rich domain (CRD) (Figure 1) [16,17]. The globular VFD consists of two lobes that form a cleft; the lobes switch between open and closed conformations, allowing ligands to bind in the cleft (Figure 1) [18,19,20]. This mechanism resembles the Venus flytrap plant, which closes around its prey. The CRD mediates communication between the ECD and the 7TM domain. This interaction is stabilized by disulphide bridges formed by the cysteines in the CRD, and one of these bridges also links the CRD to the VFD [16]. Extensive mining of GPCRs showed that the Glutamate family has one of the most ancient evolutionary origins among GPCR families [21]. Previous studies have found Glutamate GPCRs in the genomes of several early-diverging metazoans and in non-metazoan species [22,23,24]. Homologs of Glutamate GPCRs are found in the amoebozoan Dictyostelium discoideum [22,23,24] and in the chromalveolate Thalassiosira pseudonana [21,25]. Evolutionary mining of Glutamate GPCRs across diverse eukaryotic taxa also showed that the N-terminal domain responsible for “Venus flytrap” ligand binding is conserved in most Glutamate GPCRs [18].

The Rhodopsin family

The Rhodopsin family of GPCRs (also known as Class A) is the largest of all GPCR families, constituting about 700 of the 800 human GPCRs [15]. In humans, the Rhodopsin family can be further divided into four main groups (α, β, γ and δ) and 13 sub-groups/families [15]. This phylogenetic clustering into 13 distinct families largely correlates with differences in receptor function and in the chemical properties of the ligands. The 13 receptor families, or main branching clusters, of the Rhodopsin family tree are: the prostaglandin receptor cluster, amine receptor cluster, opsin receptor cluster, melatonin receptor cluster and MECA (melanocortin, endothelin, cannabinoid and adenosine) receptor cluster (forming the α group); the peptide receptor cluster (forming the β group); the SOG (somatostatin, opioid and galanin) receptor cluster, MCH (melanin-concentrating hormone) receptor cluster and chemokine receptor cluster (forming the γ group); and the MAS-related receptor cluster, glycoprotein receptor cluster, purine receptor cluster and olfactory receptor cluster (forming the δ group) [15]. Among these, the olfactory receptors constitute the major component of the large Rhodopsin family and are responsible for discriminating thousands of different odor molecules [26]. The olfactory receptor family is considered one of the most evolutionarily dynamic of all protein families; it is subject to birth-and-death evolution, in which new genes are created by repeated gene duplication while others are lost by pseudogenization [26]. Olfactory receptors have long evolutionary histories and are found in most chordates as well as in some non-bilaterian species, suggesting that olfaction evolved early in animal evolution [27,28]. Similarly, some Rhodopsin families, such as the amine and peptide receptor clusters, are found in most bilaterians as well as in non-bilaterians including sea anemones and placozoans [29,30,31,32], while other families are relatively recent. Previous evolutionary studies suggest that the Rhodopsin family is found in all metazoan lineages and has undergone large lineage-specific expansions and reductions [21]; most notably, the olfactory receptors have undergone large expansions and deletions [27,33]. Of all major GPCR families, the Rhodopsin family is targeted by the widest spectrum of diverse ligands [7]. Most receptors of this family have a short N-terminus, and ligand binding is largely confined to the seven transmembrane region (Figure 1) [7]. Structure-based sequence alignments and subsequent evolutionary conservation analyses show that there are several key residues in the ligand binding pocket formed by the TM bundle [7].
Most of these crucial amino acids are located in helices TM3, TM4, TM6 and TM7 and form a consensus scaffold of the ligand binding pocket; variation at these positions contributes to the ligand specificity of different receptors [7]. In addition, residues in the intracellular region and at the cytoplasmic ends of the TM helices are crucial for interactions with the receptors' coupling partners, the G proteins, and with other downstream signaling molecules such as GPCR kinases and arrestins [6,7]. One such motif is the characteristic glutamic acid/aspartic acid-arginine-tyrosine (E/DRY) motif at the cytoplasmic end of TM3; these residues are known to be important for G protein interaction and, more importantly, for keeping the receptor in its inactive ground state [34]. The depth of ligand binding differs among small molecules, larger peptide ligands and large glycoprotein hormones, which tend to interact with the N-terminus and extracellular loops in addition to the binding pocket [7].
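Short sequence motifs such as E/DRY are often screened with simple pattern matching before closer inspection. The sketch below is a hypothetical helper, not a method from this thesis: it finds every [DE]RY occurrence and, optionally, keeps only hits near the cytoplasmic end of TM3, where `tm3_end` is assumed to come from an external topology prediction and the tolerance of three residues is an arbitrary choice.

```python
import re

# Aspartic or glutamic acid, then arginine, then tyrosine
DRY_PATTERN = re.compile(r"[DE]RY")


def find_dry_motifs(seq: str) -> list[tuple[int, str]]:
    """Return (1-based start position, motif) for every E/DRY occurrence."""
    return [(m.start() + 1, m.group()) for m in DRY_PATTERN.finditer(seq.upper())]


def dry_near_tm3_end(seq: str, tm3_end: int, tolerance: int = 3) -> list[tuple[int, str]]:
    """Keep hits that start within `tolerance` residues of the TM3 end (1-based)."""
    return [(pos, motif) for pos, motif in find_dry_motifs(seq)
            if abs(pos - tm3_end) <= tolerance]
```

Filtering by position matters because a tripeptide like DRY can also occur by chance elsewhere in a sequence; only occurrences at the cytoplasmic end of TM3 correspond to the conserved motif.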

The Adhesion family

The Adhesion family is the second largest GPCR family, with 33 members in humans and 31 in mice [15,35]. Members of this family are characterized by a long N-terminus with a wide repertoire of domains, including classical cell-adhesion domains such as EGF (epidermal growth factor) and cadherin domains, which is why the family was originally named “Adhesion-like” (Figure 1) [36]. The N-terminus of Adhesion GPCRs also contains the GPCR autoproteolysis-inducing (GAIN) domain, which encompasses the highly conserved, cysteine-rich GPCR proteolytic site (GPS) motif of around 50 amino acids, located immediately before the first TM helix [37,38]. Autoproteolysis at the GPS cleaves the receptor into an extracellular N-terminal fragment (NTF) and a membrane-spanning/cytoplasmic C-terminal fragment (CTF) [39]. It is currently understood that most mature Adhesion GPCRs exist as non-covalently associated NTF-CTF complexes, and that the GPS motif and its cleavage are crucial for receptor signaling, stability and trafficking [39]. Previous sequence and phylogenetic analyses based on the 7TM domain showed that Adhesion GPCRs can be further subdivided into nine distinct families [40]. Notably, each subfamily has a unique pattern of N-terminal domain architecture, and these patterns serve as molecular signatures to distinguish the subfamilies and to identify their homologs in distant organisms [21,38]. Mining of Adhesion GPCRs in several eukaryotic taxa showed that these molecules are of ancient origin: homologs are found in metazoans and in the amoebozoan Dictyostelium discoideum, suggesting a pre-metazoan origin [21,41,42].
Until recently, Adhesion GPCRs were not known to be present in Fungi; however, Paper I provided the first evidence for their homologs in basal fungal lineages and in other non-metazoans, including the unicellular holozoans. Extensive mining of Adhesion GPCRs also showed that most non-metazoan homologs contain only the C-terminal fragment (CTF), which includes the 7TM region and the GPS motif, while the NTF is found mainly in metazoans and their closest unicellular relatives [42]. The signal transduction mechanisms and ligands of Adhesion GPCRs are still mostly unknown; however, recent progress in the field has uncovered several Adhesion GPCR-interacting partners. Unlike other major GPCR families, which mainly interact with small molecules or peptides as ligands, Adhesion GPCRs interact with several matricellular ligands [39]. For example, known interaction partners of the Adhesion GPCR CD97 include CD55 (also known as decay-accelerating factor), the integrins α5β1 and αvβ3, and CD90. Similarly, Adhesion GPCR interacts with teneurin-2, fibronectin leucine-rich transmembrane (FLRT) family members and presynaptically localized neurexin-1a, -1b, -2b, and -3b proteins, forming a trans-synaptic adhesion complex [43,44,45,46,47]. This recent evidence strengthens the view that Adhesion GPCRs are important in establishing cell-cell adhesion, and their interactions with transmembrane proteins involved in neuronal biology also suggest roles in neuronal development [39,48].
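Because the GPS lies immediately before TM1, the NTF/CTF split produced by autoproteolysis can be mimicked in silico on a sequence. This is a minimal sketch under stated assumptions: it uses the commonly cited H-L/I↓T/S/C cleavage consensus, the search window and helper name are hypothetical, and a consensus match does not show that a given receptor is actually cleaved.

```python
import re
from typing import Optional, Tuple

# Assumed GPS cleavage consensus: His, then Leu or Ile, then Thr, Ser or Cys.
# The lookahead keeps Thr/Ser/Cys out of the match, so m.end() is the cut point.
GPS_SITE = re.compile(r"H[LI](?=[TSC])")


def split_at_gps(seq: str, window: Tuple[int, int]) -> Optional[Tuple[str, str]]:
    """Split seq into (NTF, CTF) at a consensus site inside window (0-based, end-exclusive).

    The GPS sits just before TM1, so the site closest to the window end is used.
    Returns None when no consensus site is present.
    """
    start, end = window
    cut_points = [m.end() for m in GPS_SITE.finditer(seq.upper(), start, end)]
    if not cut_points:
        return None
    cut = cut_points[-1]
    return seq[:cut], seq[cut:]
```

In practice the window would be placed over the GAIN/GPS region identified by a domain search, which keeps chance matches elsewhere in the long N-terminus from being reported.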

The Frizzled family

The Frizzled genes were first identified in Drosophila melanogaster, and the name “Frizzled” comes from the phenotype of the D. melanogaster frizzled mutant, which displays disrupted polarity of epidermal cells [49,50]. Frizzled receptors are activated by secreted lipoglycoproteins of the Wingless/Int-1 (WNT) family and are crucial for embryonic development, tissue and cell polarity, neural development and many other processes in both developing and adult organisms [51]. Over the years, Frizzled receptors have been identified in diverse animals, from sponges to humans [21,50,52]. In most , the Frizzled family includes ten receptors, named FZD1-10 according to the nomenclature of the International Union of Basic and Clinical Pharmacology (IUPHAR) [50]. These receptors are characterized by a seven transmembrane domain, an extracellular region containing a cysteine-rich domain (CRD), and an intracellular segment that interacts with downstream signaling proteins [53]. The extracellular CRD acts as the ligand binding region for most Frizzled receptors [53]. Known ligands of Frizzled receptors are WNTs, R-spondins, Norrins, FRPs (Frizzled-related proteins) and CTGF (connective-tissue growth factor) [52]. IUPHAR classifies the Frizzled receptors as a separate family of GPCRs, largely on the basis of sequence comparisons with other GPCR families, the presence of the seven transmembrane segments characteristic of GPCRs, and indirect evidence pointing towards interactions with G proteins [50,52]. However, direct experimental evidence for their coupling with G proteins is still lacking [52]. Instead, a notable feature of Frizzled receptors is that they interact with several other intracellular proteins to induce intracellular signaling.
One such partner is the phosphoprotein Dishevelled, whose PDZ domain interacts directly with the highly conserved KTxxxW motif in the C-terminal region of Frizzled proteins to induce intracellular signaling [54,55,56].
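The KTxxxW notation (Lys, Thr, any three residues, Trp) translates directly into a regular expression. The sketch below is hypothetical and not taken from this thesis: it reports motifs that begin shortly after the end of TM7, where `tm7_end` is assumed to come from a topology prediction and the default distance of ten residues is an arbitrary choice.

```python
import re

KTXXXW = re.compile(r"KT...W")  # Lys, Thr, any three residues, Trp


def find_ktxxxw(seq: str, tm7_end: int, max_distance: int = 10) -> list[int]:
    """1-based start positions of KTxxxW motifs beginning at most
    max_distance residues after TM7 (tm7_end is 0-based, end-exclusive)."""
    hits = []
    for m in KTXXXW.finditer(seq.upper()):
        if 0 <= m.start() - tm7_end <= max_distance:
            hits.append(m.start() + 1)
    return hits
```

Restricting the search to the region just after TM7 reflects the motif's reported location in the C-terminal region immediately following the last helix.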

The Secretin family

The Secretin family of GPCRs comprises receptors that serve as molecular targets for peptide hormones, including the hormone secretin [16,57]. These receptors play a crucial role in hormonal homeostasis in humans and other animals and serve as important drug targets for several endocrine disorders [57]. Secretin GPCRs are generally characterized by an N-terminal extracellular domain (ECD) containing the hormone receptor motif (HRM) domain, seven transmembrane helices connected by intracellular and extracellular loops, and an intracellular C-terminus. The ECD plays an important role in binding these polypeptide hormones. Sequence analysis also shows that residues in the N-terminal region close to the TM1 helix are conserved and are implicated in ligand binding [58,59,60]. Ligands for these receptors include the hormones secretin, vasoactive intestinal peptide (VIP), pituitary adenylate cyclase-activating polypeptide (PACAP), glucagon, growth hormone-releasing hormone (GHRH), glucagon-like peptides 1 and 2 (GLP-1 and GLP-2), gastric inhibitory polypeptide (GIP), corticotropin-releasing factor (CRF) and parathyroid hormone, among others [57,58]. Given the important roles of these ligands and the signaling responses they induce, this family of receptors serves as drug targets in several diseases, including cardiovascular diseases, diabetes, migraine and bone disorders [16,61,62]. Mining of Secretin GPCRs has shown that these receptors are found mainly in bilaterians, and there is no compelling evidence for their homologs in non-bilaterian species [63]. Phylogenetic analysis based on the 7TM segment suggested that Secretin GPCRs are closely related to the Adhesion family, from which they were earlier hypothesized to have evolved [64]. Moreover, the A-F classification system groups Adhesion and Secretin GPCRs together as “Class B” GPCRs based on the sequence similarities in their 7TM regions. However, Secretin GPCRs differ from Adhesion GPCRs in many other aspects [16,39].
For example, Secretin GPCRs lack the autocatalytic processing, the multitude of N-terminal domains and the cell-cell adhesion functions commonly associated with Adhesion GPCRs [39]. Taking these observations into account, we classify novel GPCR sequences as Adhesion GPCRs based on the presence of a GPS site and long N-terminal domains.
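The Adhesion criterion above (a GPS site plus a long N-terminus), together with the HRM domain that characterizes Secretin receptors, can be written as a small rule-based classifier. This is a minimal sketch, not the exact procedure used in the papers: the 200-residue cut-off, the domain labels and the class and function names are all hypothetical, and domain annotations are assumed to come from a prior domain search.

```python
from dataclasses import dataclass, field

# Hypothetical cut-off for a "long" N-terminus (residues before TM1)
MIN_ADHESION_NTERM = 200


@dataclass
class ClassBCandidate:
    name: str
    nterm_length: int                          # residues before TM1
    domains: set = field(default_factory=set)  # N-terminal domain labels, e.g. {"GPS", "HRM"}


def assign_class_b_family(c: ClassBCandidate) -> str:
    """Rule-of-thumb split of Class B candidates into Adhesion or Secretin."""
    if "GPS" in c.domains and c.nterm_length >= MIN_ADHESION_NTERM:
        return "Adhesion"    # GPS site plus long N-terminus
    if "HRM" in c.domains and "GPS" not in c.domains:
        return "Secretin"    # hormone receptor motif, no autoproteolytic site
    return "unresolved"      # e.g. GPS-containing sequences with a short N-terminus
```

The conservative "unresolved" outcome is deliberate: many non-metazoan Adhesion homologs carry only the CTF with the GPS motif, so such sequences are better flagged for manual and phylogenetic inspection than forced into a family.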

Other GPCR families

The GPCR gene families that do not fall into the major GRAFS families mainly include the ocular albinism receptor (GPR143) [65], the intimal thickness-related receptor (ITR), GPR108 (also known as the lung seven transmembrane receptor) and the Dictyostelium discoideum cyclic AMP receptors. These GPCR genes are found as single-copy genes in most organisms and have long evolutionary histories [21,42]. The functional roles of some of these GPCR-like genes are not well understood. However, GPR143 is known to be mainly involved in pigmentation of the and is also expressed in epidermal cells, and mutations in GPR143 cause ocular albinism type 1, a condition that severely affects visual acuity in humans [65,66,67]. The D. discoideum cyclic AMP receptors are well studied and are known to be involved in D. discoideum development. Apart from these GPCR families that are commonly found in metazoans, there are other families of GPCRs found specifically in fungi. Fungal GPCRs fall mainly into six families that are unrelated to the metazoan GPCRs [68]: the pheromone sensing receptor families STE2 and STE3, putative nutrient sensing GPCRs (STM1-like), sugar/ sensing receptors (GPR1 and GIT3-like), microbial opsins (Nop-1 and ORP-1-like) and cAMP-like receptors [68]. In addition, several putative proteins have a GPCR-like architecture containing the characteristic 7TM domain but are yet to be classified and require further study [68]. Until recently, these were considered the major fungal GPCR families; however, Paper I provided the first evidence that a few basal fungal species also contain homologs of mammalian-like GPCRs.

The Pfam “GPCR A” clan (CL0192)

The Pfam database is a comprehensive collection of protein domains and families, each characterized by a manually curated profile hidden Markov model (HMM). In Pfam, most GPCR families are associated with a characteristic domain. For example, the 7tm_1 domain (Pfam ID: PF00001) denotes the Rhodopsin (Class A) family, 7tm_2 (PF00002) denotes the Adhesion/Secretin (Class B) families, and 7tm_3 (PF00003) denotes Glutamate (Class C) GPCRs. These manually curated HMM profiles, built from representative sequences of a particular family, help to differentiate between families and to identify homologs in the species of interest. Moreover, Pfam groups related families into clans, each assigned a unique clan ID, and all families belonging to a clan are considered to share a common origin. Accordingly, several GPCR families are grouped into the “GPCR A” clan (CL0192). Families are assigned to clans systematically, using several lines of evidence to evaluate their relatedness: families in a clan should have related structures, related functions, significant matches of the same sequence to HMMs from different families, and significant HMM profile-profile comparison scores [69,70]. Based on these criteria, the “GPCR A” clan currently contains 36 families: the established GRAFS families (five families), the nematode chemosensory GPCRs (19 families) and the “other” GPCR families (TAS2R, Ocular_alb, V1R, Dicty_CAR, Lung_7-TM_R, Git3, Git3_C, DUF621, DUF1182, Bac_rhodopsin, 7TM-7TMR_HD, GpcrRhopsn4). Nonetheless, some of these “other” families (Bac_rhodopsin, 7TM-7TMR_HD, GpcrRhopsn4, DUF621 and DUF1182) are only distantly related to GPCRs and need further study before they can be considered unambiguous members of the GPCR superfamily.

Heterotrimeric G proteins

Heterotrimeric G proteins act as molecular switches that turn on the intracellular signaling cascades of GPCR signaling pathways [12]. They are composed of three subunits, α, β and γ. They normally exist in an inactive state in which GDP-bound Gα is associated with the Gβγ dimer [71,72]. 

The G protein cycle proceeds as follows:
1. Activation of a GPCR by an extracellular stimulus (ligand) induces a conformational change in the receptor and, in turn, in the Gαβγ heterotrimer. This results in the release of GDP and the binding of GTP by Gα.
2. GTP binding induces conformational changes in three flexible segments of Gα, called the “switch” regions (I–III). These changes destabilize the heterotrimeric complex and lead to the dissociation of Gβγ [72,73].
3. Both GTP-bound Gα and the dissociated Gβγ can then regulate downstream effector molecules and modulate various aspects of cellular physiology [74].
4. The system returns to its inactive state when Gα hydrolyzes the bound GTP to GDP through its intrinsic GTPase activity [75]. This process is accelerated by regulator of G protein signaling (RGS) proteins that interact with the Gα subunit [76,77].
5. Following GTP hydrolysis, GDP-bound Gα reassociates with Gβγ to form the inactive heterotrimer, ready to start the G protein cycle again [12].

Structurally, the Gα subunit is the largest of the three subunits and contains a GTPase domain and an all-α-helical domain [71,78]. The GTPase domain contains the nucleotide-binding pocket and the three flexible switch loops (I, II and III) [79,80]. The Gβ subunit is characterized by seven WD40 sequence repeats, which together form a β-propeller structure [81,82]. 
The smaller Gγ subunit contains two α-helical regions connected by a loop and forms a coiled-coil interaction with the N-terminus of Gβ [81,82]. Gβ and Gγ form a tight Gβγ dimer that dissociates only under denaturing conditions. The GPCR superfamily is large and diverse, and its members sense a diverse array of extracellular signals. Despite this, these receptors interact with a relatively small number of G proteins to initiate intracellular signaling cascades [12,83]. For example, the human genome contains around 800 GPCRs, but only 16 Gα, five Gβ and 12 Gγ genes [83,84,85]. This limited repertoire of G proteins can nevertheless form a large array of heterotrimeric Gαβγ complexes, because several Gβγ dimer combinations can interact with the same Gα protein. 

Based on sequence similarity and phylogenetic grouping, G proteins are divided into several families [84,85]. The Gα subunits are generally categorized into four major classes (Gαi/o, Gαs, Gαq/11 and Gα12/13) [86]. In humans, these classes range in size from two genes (Gαs and Gα12/13) to eight genes (Gαi/o). Recently, a fifth class of Gα was identified in fishes and invertebrates; it has been lost in tetrapods [87,88]. The Gβ subunit genes in mammalian genomes fall into five major gene families (GNB1–5). GNB1–4 are highly conserved, while GNB5 is divergent [86,89]. The Gγ subunit is the smallest of the three subunits, with a protein length of around 65 to 75 amino acids. In humans, 12 genes code for Gγ subunits, and their sequences vary considerably from each other, except for the three genes that form the family (GNG11, GNGT1 and GNGT2) [84,85].
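A rough upper bound makes this combinatorial point concrete. Assume, purely for illustration, that any human Gα, Gβ and Gγ gene product could combine freely. In reality, many pairings are disfavoured or the subunits are not co-expressed, so this is a ceiling rather than a count of observed complexes. Under that assumption, the gene numbers above give:

```latex
N_{\beta\gamma} \le 5 \times 12 = 60, \qquad
N_{\alpha\beta\gamma} \le 16 \times 5 \times 12 = 960
```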

Diversity of GPCR signaling pathways

The signaling pathways mediated by GPCRs regulate a remarkable array of cellular functions and largely govern vertebrate physiology [90,91]. These functions include hormone release, neurotransmission, cardiac and smooth muscle contraction, blood pressure regulation, embryonic development and immune responses [92,93,94,95,96,97]. Given the diversity of the GPCR superfamily, these membrane-bound receptors activate several distinct intracellular signaling pathways. The induced signaling mechanisms also differ considerably between the major GPCR families. 

This diversity depends largely on two factors: the nature of the ligand bound to the receptor, and the receptor's interaction with members of the Gα subunit families. Different Gα subunits can activate different downstream effectors [89]:
- Gαi inhibits adenylate cyclase, a membrane-bound enzyme that catalyzes the production of cyclic AMP (cAMP) from ATP, and thereby decreases cAMP levels.
- Gαs stimulates adenylate cyclase and increases cAMP levels.
- Gαq activates phospholipase C (PLC).
- Gα12 and Gα13 (the Gα12/13 class) activate the small GTPase RhoA through its guanine nucleotide exchange factors (GEFs). These include p115-RhoGEF, PDZ-RhoGEF and LARG (leukemia-associated RhoGEF).

Moreover, the Gβγ dimer released from Gα can also regulate many signaling molecules. These include ion channels, such as the G protein-gated inwardly rectifying potassium channels (GIRKs), N-type calcium channels and P/Q-type calcium channels. They also include several other molecules, such as lipid kinases (phosphoinositide 3-kinase γ) and [98,99,100]. 

Apart from these interactions with effector molecules, Gα and Gβγ regulate or activate a diverse array of other proteins whose physiological roles are not yet known. For several decades, research on GPCRs and G proteins has been among the most active fields in biological science. This work has led to an understanding of many diverse intracellular signaling pathways and of their association with a large number of diseases, including cancer [101] and cardiac disorders [102]. Some of the most important pathways activated by G proteins include:
- the cAMP pathway [103];
- the MAPK/ERK (mitogen-activated protein kinase/extracellular signal-regulated kinase) pathway [104];
- the PKA (protein kinase A) pathway;
- the PKC (protein kinase C) pathway;
- the PTK (protein tyrosine kinase) pathway;
- the Rho pathway;
- the NF-κB (nuclear factor kappa B) and STAT (signal transducers and activators of transcription) pathways, among others.

Most of these molecules also interact with several other proteins and thereby control numerous physiological functions.

GPCR mediated sensory signaling

The sensory system is crucial for survival because it allows organisms to detect environmental cues. In animals, sensory signaling is largely mediated by receptors belonging to the GPCR superfamily [26]. Ion channels also contribute, mediating mechanosensation, thermosensation and other modalities [105]. Chemosensory GPCRs in vertebrates are encoded by several gene families [26]:
- the olfactory receptors (ORs);
- vomeronasal receptors type 1 and 2 (V1Rs and V2Rs);
- taste receptors type 1 and 2 (T1Rs and T2Rs);
- trace amine-associated receptors (TAARs).

In addition, GPCRs mediate vertebrate vision through the “opsin” genes, which belong to the large Rhodopsin family (Class A) of GPCRs [106]. Previous evolutionary mining of these families showed that the vomeronasal and taste receptors are most likely confined to vertebrates. In contrast, the olfactory receptors responsible for the sense of smell are found across several metazoans [21]. Indeed, vertebrate-like olfactory receptors (receptors that share sequence similarity with vertebrate ORs) are found in Nematostella vectensis. This suggests that ORs evolved before the emergence of bilaterian animals [27,28]. Previous mining of these sensory GPCRs also showed that deuterostomes (mainly chordates, hemichordates and echinoderms) contain olfactory receptors that share sequence similarity with the vertebrate OR family [21,27,28]. 

The protostomes (including arthropods, nematodes, platyhelminthes, molluscs and annelids), however, seem to have evolved several lineage-specific families for sensing the environment. These are mainly the nematode chemoreceptors (or nematode chemosensory GPCRs) and the insect odorant and gustatory receptors. The insect odorant receptors (IORs) were initially thought to be GPCRs because they have seven transmembrane regions. Subsequent studies, however, showed that they lack homology to GPCRs. Unlike GPCRs, the IORs also have a distinct topology, with their N-terminus located intracellularly [107]. Moreover, recent data show that IORs function as heteromeric ligand-gated ion channels and as cyclic nucleotide-gated ion channels. They are not known to function as receptors coupled to G proteins [108,109]. Similarly, the insect gustatory receptors, responsible for taste perception in insects (sweet and bitter taste), are unrelated to mammalian taste receptors or other GPCR families. They do, however, share homology with the insect odorant receptors [110,111]. These findings raise the possibility that insect gustatory receptors are also not GPCRs and likewise function as ligand-gated ion channels. 

Conversely, the nematode chemoreceptors found in Caenorhabditis species are considered to be GPCRs. Among the major GPCR families, they are most closely related to the Rhodopsin (Class A) family [112]. Previous analysis based on HMM-HMM profile comparisons also suggested strong relationships between nematode chemoreceptors and Rhodopsin (Class A) GPCRs. On that basis, they were hypothesized to have evolved from Class A GPCRs [21]. The nematode chemosensory GPCRs constitute a massive expansion of over 1,500 genes, about 8% of the Caenorhabditis elegans genome [113]. A complete curation of these C. elegans genes divided them into 19 large families based on sequence similarity and phylogenetic clustering [113]. Fifteen of these 19 families are grouped into three major superfamilies (Sra, Str and Srg) [113]. 
The remaining four families (srbc, srsx, srw and srz) share low sequence similarity with the others. They are therefore classified as “others” rather than grouped into any of the three superfamilies. Paper V surveys these nematode chemoreceptor gene families across a broad taxon sampling. It also clarifies their relationship with the Rhodopsin (Class A) GPCR subfamilies.

Eukaryotic evolution and taxon sampling

Several phylogenomic studies have proposed an evolutionary hierarchy of eukaryotes in which, broadly, eukaryotes can be divided into “Unikonta/Amorphea” and “Bikonta” [114,115,116,117,118]. Unikonta/Amorphea comprises Holozoa (Metazoa plus their closest unicellular relatives), Fungi and Amoebozoa. Bikonta comprises Excavata, Chromalveolata and Archaeplastida (Plantae) [114,115,116,117,118]. This thesis presents a comprehensive mining of GPCRs from various eukaryotic supergroups, with special emphasis on opisthokonts. The opisthokonts comprise Metazoa (animals), Fungi and several unicellular lineages, such as the choanoflagellates (Choanoflagellata) and filastereans (Filasterea) (see Figure 2). 

In an earlier study, we sampled GPCR repertoires from various metazoan and a few non-metazoan species. We showed that the Frizzled, cAMP, Adhesion and Glutamate families of GPCRs are found in all analyzed metazoans and in D. discoideum [21]. However, the Fungi kingdom had not previously been mined for the GRAFS families of GPCRs. In the tree of life, Fungi branch after the emergence of Amoebozoa but before the divergence of holozoans (Figure 2). Paper I describes the most comprehensive mining of GPCRs in Fungi. It provides the first evidence for the presence of four of the five GRAFS families. 

Later in opisthokont evolution, the metazoans emerged from their closest unicellular holozoan relatives, marking the transition from single-celled ancestors to multicellular animals. The metazoans can be divided into bilaterians (animals with bilateral symmetry and distinct anterior and posterior ends) and non-bilaterians. The non-bilaterians comprise ctenophores, sponges, placozoans and cnidarians. Representative species from these lineages have been sequenced. Their genomic datasets provide considerable insight into the early origin of metazoans and their morphological characteristics. These species include:
1) Amphimedon queenslandica, a demosponge belonging to an ancient group of animals that diverged from other metazoans over 600 million years ago [119];
2) Mnemiopsis leidyi, a ctenophore; ctenophores have been suggested as the sister lineage to all other animals, including the ancient sponges [120];
3) Trichoplax adhaerens [121], a placozoan representing one of the simplest free-living animals;
4) Nematostella vectensis [122], a sea anemone that, along with corals, jellyfish and hydras, belongs to the Cnidaria, the oldest eumetazoan phylum.
Paper II examines the GPCR repertoire of Amphimedon queenslandica, a demosponge belonging to an ancient group of animals (sponges/Porifera). It compares this repertoire with those of other non-bilaterians and bilaterians. 

The bilaterians comprise the deuterostomes and protostomes, which split from each other about 550 million years ago (MYA). The deuterostomes comprise echinoderms, hemichordates and chordates (cephalochordates, urochordates and vertebrates) (Figure 2). The lineages that split before chordates and share a common deuterostome ancestor with them are crucial for understanding the evolution of chordates and the morphological complexity of vertebrates. Hemichordates are the sister group to echinoderms and are closely related to chordates (Figure 2). They are thus important for understanding the origin of deuterostomes and chordate body plans. They also serve as an essential model for studying the evolution of the chordate nervous system [123]. Paper III classifies the GPCR repertoire of Saccoglossus kowalevskii and provides the first comprehensive analysis of GPCR signaling genes in the hemichordate lineage. It then compares these genes with their vertebrate counterparts.

The protostomes differ from deuterostomes in their embryonic development. They comprise several phyla, including arthropods, nematodes, molluscs, annelids and platyhelminthes. Together, the protostomes represent the majority of all animal species described so far. Many protostome genomes have been sequenced over the years, some of them from well-known model organisms (Drosophila melanogaster, Caenorhabditis elegans and Anopheles gambiae, among others). In this thesis, we have used several such genomes from protostomes and deuterostomes. 

In Paper IV, we mined G protein gene families from a broader range of metazoan genomes than previous studies. In particular, we included several previously unsampled invertebrate genomes. We studied the overall evolution of G proteins, including gene gain and loss events across metazoans. We annotated and classified G proteins from a total of 21 metazoan genomes (excluding mammals) and from two of the closest unicellular holozoan relatives. 

In Paper V, we analyzed 26 eukaryotic genomes for the presence of nematode chemosensory GPCR gene families, with special emphasis on protostomes. Our genomic sampling covered several important protostome species, including nematodes, arthropods, molluscs and platyhelminthes. It also included several other eukaryotic genomes covering basal eukaryotic branches. 

Overall, this thesis draws on a broad taxon sampling that includes several previously unsampled genomes. It thereby provides a comprehensive update on the evolution of GPCR gene families.


Figure 2: A simplified schematic representation of the evolutionary hierarchy of eukaryotic lineages within the eukaryotic supergroup Opisthokonta. The tree topology is based on several recent phylogenomic studies [114,115,116,117]. Branch lengths do not represent actual evolutionary distances. Dashed lines indicate the current uncertainty over the positioning of sponges and ctenophores.

Aims

The overall aim of this thesis was to investigate the evolution of the two most important components of the G protein-coupled receptor signaling system: the GPCRs and the G proteins. This included mining GPCRs (GRAFS families) and G proteins (vertebrate-like G protein families) across a broad sampling of eukaryotic taxa. It also included comparative phylogenetic and genomic analyses to deduce their origin, their evolutionary hierarchy and several species-specific innovations.

Paper I To understand the evolution of GPCR gene families prior to the divergence of metazoans, with special focus on fungal species.

Paper II To map and classify the repertoire of GPCRs in Amphimedon queenslandica, and perform phylogenetic comparisons with GPCRs found in eumetazoans and bilaterians.

Paper III To map and classify the repertoire of GPCRs in Saccoglossus kowalevskii, and perform phylogenetic comparisons with GPCRs found in other bilaterians.

Paper IV To trace the evolutionary hierarchy of vertebrate-like G protein families and provide a comparative perspective with the GPCR repertoires.

Paper V To study the evolution of the GPCR-mediated chemosensory system in protostomes, with special focus on nematode chemosensory GPCR families.

Materials and Methods

Sequence retrieval using similarity search methods

The most common and basic way to predict protein function and assign a family classification is to align a query sequence pairwise against a database of protein sequences. This can be done with either global or local alignment methods. Global alignment assumes that the sequences are related over their entire length, whereas local alignment looks for the best-matching regions. Heuristic local alignment programs first identify short matching segments of fixed length, which are then chained or extended at both ends into a local alignment. Local alignment is widely used and generally preferred over global alignment. Proteins belonging to the same superfamily or family often share only a few common core domains, while the regions connecting those domains may share little similarity. The most widely used local alignment programs are BLASTp and FASTA. Owing to continuous development of its algorithms, BLASTp has achieved better accuracy and speed and is currently the more popular of the two. 

A more sensitive class of search algorithms was later developed. These use a probabilistic model, the profile hidden Markov model (profile HMM), to perform sequence searches [124]. Profile HMMs are built from a set of aligned sequences, that is, a multiple sequence alignment [125]. They are more sensitive than programs such as BLAST and are often used to detect remote homologs and build protein families [126]. The HMMER package [127] is used to build profile HMMs and perform HMM searches, and it is used extensively by the Pfam database [70]. In all papers included in this thesis, we used Pfam HMM models to identify GPCR sequences in our genomes of interest. 
The Pfam database provides HMM models for each major GPCR family, built from previously defined GPCRs. These include Rhodopsin (7tm_1: PF00001), Adhesion/Secretin (7tm_2: PF00002), Glutamate (7tm_3: PF00003) and Frizzled (PF01534). In addition, Pfam has curated the “GPCR A” Pfam clan, which groups the GPCR gene families. We downloaded the Perl script pfam_scan.pl from the Pfam FTP site and used it to search each proteome against all HMM models in the Pfam database. From the Pfam search output, we retrieved all protein sequences that were best aligned with GPCR HMM models belonging to the “GPCR A” Pfam clan. From the resulting list of putative GPCRs for each genome, the sequences belonging to the families of interest were selected for further analysis. 

To further categorize the GPCR families into subfamilies, we aligned these GPCRs with the previously classified human GPCR repertoire and other vertebrate genomes using BLASTp. This approach was mainly applied to the putative Rhodopsin family GPCRs, to assign them to any of the 13 previously known Rhodopsin subfamilies. In Papers II and III, all Rhodopsin GPCRs identified by the Pfam search were searched against the human RefSeq dataset. This dataset was extended with GPCRs from known families that are not present in humans (nematode chemosensory GPCRs, the Dictyostelium cAMP receptor family, etc.). In Paper III, we also aligned the Rhodopsin family GPCR sequences with manually annotated and reviewed Rhodopsin (7tm_1) GPCRs obtained from the Swiss-Prot database. Based on the top five hits for each query sequence, we classified the GPCRs into subfamilies, which were then used for further phylogenetic analysis.
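The retrieval and subfamily-assignment steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline used in the papers. It assumes the following inputs:
- pfam_scan.pl's 15-column whitespace-separated output, with the sequence ID in column 1, the Pfam family name in column 7, the bit score in column 12 and the clan accession in column 15;
- BLASTp hits in tabular format (`-outfmt 6`, bit score in column 12);
- a hypothetical `subfamily_of` dictionary mapping reference sequence IDs to subfamily labels.

The thesis states only that the top five hits were used; the strict-majority rule below is our assumption.

```python
from collections import Counter, defaultdict

GPCR_A_CLAN = "CL0192"  # Pfam "GPCR A" clan accession


def gpcr_candidates(pfam_scan_lines, clan=GPCR_A_CLAN):
    """Return {sequence_id: best-scoring in-clan Pfam family name}.

    Assumes pfam_scan.pl's 15-column layout (0-based indices used below):
    0 = sequence ID, 6 = HMM name, 11 = bit score, 14 = clan ("No_clan" if none).
    """
    best = {}  # sequence ID -> (bit score, family name), in-clan hits only
    for line in pfam_scan_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and header/comment lines
        cols = line.split()
        if len(cols) < 15:
            continue  # not a hit row
        seq_id, family, bits, hit_clan = cols[0], cols[6], float(cols[11]), cols[14]
        if hit_clan != clan:
            continue  # ignore domains outside the GPCR A clan (e.g. N-terminal domains)
        if seq_id not in best or bits > best[seq_id][0]:
            best[seq_id] = (bits, family)
    return {seq_id: fam for seq_id, (_, fam) in best.items()}


def classify_by_top_hits(blast_lines, subfamily_of, top_n=5):
    """Give each query the subfamily held by a strict majority of its top_n
    BLASTp subjects (ranked by bit score); otherwise report "ambiguous".

    blast_lines: tab-separated -outfmt 6 rows (query, subject, ..., bit score).
    subfamily_of: hypothetical {reference sequence ID: subfamily label} lookup.
    """
    best_bits = defaultdict(dict)  # query -> {subject: best bit score over HSPs}
    for line in blast_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue
        query, subject, bits = cols[0], cols[1], float(cols[11])
        if bits > best_bits[query].get(subject, float("-inf")):
            best_bits[query][subject] = bits

    calls = {}
    for query, subjects in best_bits.items():
        ranked = sorted(subjects, key=subjects.get, reverse=True)[:top_n]
        labels = [subfamily_of.get(s, "unclassified") for s in ranked]
        label, count = Counter(labels).most_common(1)[0]
        calls[query] = label if count > len(labels) / 2 else "ambiguous"
    return calls
```

The first function restricts attention to in-clan hits before picking the best one. As a result, a multi-domain receptor (for example, an Adhesion GPCR whose N-terminal domains score highly) is still retained on the strength of its 7TM match. Collapsing multiple HSPs per subject keeps one reference sequence from casting several votes.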

Multiple sequence alignments

Aligning two or more sequences is the basis for identifying evolutionarily and/or structurally related proteins and for inferring phylogenetic relationships [128,129,130]. Many alignment programs are available. They vary in accuracy, in their ability to scale to thousands of proteins, and in their flexibility when comparing proteins that do not share the same domain architecture [129,131]. One of the most widely used tools is ClustalW, first introduced in 1994 [132]. More recent algorithmic developments have produced tools such as MAFFT [133], MUSCLE [134], T-COFFEE [135] and PROBCONS [136], all of which deliver better accuracy than ClustalW on benchmark data. 

The optimal alignment program depends on the number and nature of the sequences to be aligned. For example, T-COFFEE is highly accurate for about 2 to 100 sequences, whereas MAFFT and MUSCLE are preferred for larger datasets of about 200 to 500 sequences or more [129]. MAFFT also offers several alignment strategies [137,138]:
- progressive methods (FFT-NS-1, FFT-NS-2), which can align large numbers (>200) of sequences;
- iterative refinement methods (L-INS-i, E-INS-i and G-INS-i), for fewer than 200 sequences.

Among the iterative methods, L-INS-i is suited to sequences that share one alignable domain. The regions on either side of that domain are ignored or left unaligned [137]. G-INS-i assumes that the sequences are alignable over their entire length and performs a global alignment using the Needleman-Wunsch algorithm [137,138]. 

In most cases, we used E-INS-i rather than the other two methods, because it can align sequences with several conserved motifs embedded in long unalignable regions. E-INS-i [137] suits most GPCR families because most GPCR sequences have more than one alignable region, including the seven-transmembrane region, the N-terminal domains and the C-terminal region. These regions are often flanked by less conserved stretches of amino acids. This pattern is typical of GPCR families such as Adhesion, Frizzled, Secretin and Glutamate. These families have a long N-terminus comprising multiple domains (in most cases at least one N-terminal domain is present, see Figure 1) and a seven-transmembrane domain with extracellular and intracellular loops. 

We also used E-INS-i for other protein families (Paper IV), because it uses a generalized affine gap cost, unlike the alternative algorithms. This ensures that unalignable residues are left unaligned at the pairwise alignment stage. This is useful for aligning sequences with insertions and deletions, low-complexity regions or sequencing errors. Nevertheless, all alignments were subsequently inspected manually for the conserved motifs and domain regions characteristic of GPCRs and G proteins (Paper IV). Poorly aligned regions were either trimmed or manually corrected using BioEdit or Jalview.
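The strategy choice described above can be captured in a small helper that assembles a MAFFT command line. This is a sketch under assumptions:
- the 200-sequence cut-off and the mapping from sequence layout to strategy paraphrase the text rather than any official rule;
- the option sets are the ones the MAFFT manual gives for each named strategy;
- `run_mafft` simply assumes that a `mafft` executable is on the PATH.

```python
import shutil
import subprocess

# Option sets given in the MAFFT manual for each named strategy.
MAFFT_OPTIONS = {
    "E-INS-i": ["--genafpair", "--maxiterate", "1000"],   # motifs in long unalignable regions
    "L-INS-i": ["--localpair", "--maxiterate", "1000"],   # one alignable domain with flanks
    "G-INS-i": ["--globalpair", "--maxiterate", "1000"],  # alignable over the full length
    "FFT-NS-2": ["--retree", "2", "--maxiterate", "0"],   # fast progressive method
}


def choose_strategy(n_seqs, layout="multi_motif"):
    """Heuristic paraphrasing the text: progressive FFT-NS-2 for >= 200 sequences,
    otherwise an iterative *-INS-i method chosen by sequence layout."""
    if n_seqs >= 200:
        return "FFT-NS-2"
    return {"multi_motif": "E-INS-i", "single_domain": "L-INS-i", "global": "G-INS-i"}[layout]


def mafft_command(fasta_path, n_seqs, layout="multi_motif", threads=1):
    """Build the argument list; MAFFT writes the alignment to standard output."""
    strategy = choose_strategy(n_seqs, layout)
    return ["mafft", "--thread", str(threads), *MAFFT_OPTIONS[strategy], fasta_path]


def run_mafft(fasta_path, out_path, n_seqs, layout="multi_motif"):
    """Run MAFFT and save the alignment; raises if mafft is not installed."""
    if shutil.which("mafft") is None:
        raise FileNotFoundError("mafft executable not found on PATH")
    with open(out_path, "w") as out:
        subprocess.run(mafft_command(fasta_path, n_seqs, layout), stdout=out, check=True)
```

For example, `mafft_command("adhesion.fa", 150)` (a hypothetical file name) yields the E-INS-i call `mafft --thread 1 --genafpair --maxiterate 1000 adhesion.fa`. This matches the choice described for multi-domain GPCRs. The resulting alignment would still go through manual inspection afterwards.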

Phylogenetic analysis

Phylogenies are the most widely used methodology for resolving orthologous and paralogous relationships within and between gene families, reconstructing the tree of life and comparing genomes [139]. Phylogenetic tree reconstruction requires a multiple sequence alignment of protein or DNA sequences, from which the variation of residues at each alignment position is determined [139]. There are two broad classes of method:
- Distance-matrix methods calculate the distance between every pair of sequences and use the resulting distance matrix to reconstruct the tree. Neighbour-joining is the most widely used algorithm of this kind.
- Character-based methods compare all sequences in the alignment simultaneously, considering one character (alignment column) at a time to compute the tree score [139]. Maximum parsimony, maximum likelihood and Bayesian inference are character-based.

Each of these methods has unique strengths and limitations. For example, distance-based methods are fast, but distance calculations become problematic when sequences are divergent and contain many gaps [140,141]. Among the character-based methods, maximum parsimony seeks the tree that requires the minimum number of changes across alignment sites. Maximum likelihood (ML) instead seeks the tree that maximizes the probability of the observed data under an explicit substitution model with various adjustable parameters. However, ML searches are computationally demanding. 

Bayesian phylogenetic inference has gained popularity over the last two decades. It employs Markov chain Monte Carlo (MCMC) algorithms to estimate the posterior probabilities of trees [142,143]. The posterior probabilities of nodes are often considered easy to interpret, as they give the probability that a clade is correct given the data and the model. However, MCMC analysis is computationally demanding. For large datasets, convergence can be hard to reach: a Bayesian analysis typically runs two independent chains from different random starting trees and checks whether they converge on the same distribution. Bayesian analysis is also often criticized for inflated posterior probabilities [139]. 

To assess the robustness of tree topologies, we therefore constructed phylogenetic trees using two different methods. In most cases, trees were inferred with the Bayesian approach as implemented in MrBayes version 3.1.2 [144], using MCMC analysis to approximate the posterior probabilities. Their topologies were cross-verified by bootstrap analysis (500 replicates) using maximum likelihood as implemented in PhyML version 3.0 [145,146]. Node support was thus estimated using both posterior probabilities (PP) and bootstrap values.
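The sketch below makes the distance and bootstrap ideas above concrete. It does three things:
- computes an uncorrected p-distance, ignoring columns where either sequence has a gap;
- applies the simple Poisson correction d = -ln(1 - p) for amino acids;
- generates bootstrap pseudo-replicates by resampling alignment columns with replacement.

These are textbook choices made for illustration only. MrBayes and PhyML use far richer substitution models, and this is not how the trees in the thesis were computed. The correction diverges as p approaches 1, which shows why distances become unreliable for highly divergent sequences. Dropping gapped columns likewise shows how gaps shrink the data available for each pair.

```python
import math
import random


def p_distance(a, b, gap="-"):
    """Proportion of differing residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("the two sequences share no gap-free columns")
    return sum(x != y for x, y in pairs) / len(pairs)


def poisson_distance(a, b):
    """Poisson-corrected amino-acid distance d = -ln(1 - p)."""
    p = p_distance(a, b)
    if p >= 1.0:
        return math.inf  # saturation: the correction breaks down for fully divergent pairs
    return -math.log(1.0 - p)


def bootstrap_alignments(alignment, replicates=500, seed=0):
    """Yield pseudo-alignments built by resampling alignment columns with replacement."""
    rng = random.Random(seed)
    n_cols = len(alignment[0])
    for _ in range(replicates):
        cols = [rng.randrange(n_cols) for _ in range(n_cols)]
        yield ["".join(seq[i] for i in cols) for seq in alignment]
```

Each pseudo-alignment would be analysed exactly like the original alignment. The bootstrap value of a clade is the percentage of replicate trees that recover it.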

Domain search

The domain architectures of GPCR sequences were predicted with default settings using several online search methods: the Conserved Domain Search service (CD-Search) [147], Pfam search [70,148] and InterProScan [149,150]. Each of these methods uses a different approach to predict the domains present in a protein sequence:
- CD-Search uses an RPS-BLAST (Reverse PSI-BLAST) search, in which the query sequence is aligned against a database of position-specific scoring matrices (PSSMs) of protein domain models.
- The Pfam search engine aligns the query sequence against its large collection of HMM profiles of protein families and domains.
- InterProScan uses a consensus approach, integrating predictions from several databases and domain search engines into an output that takes the strengths and limitations of each method into account. However, InterProScan is more time-consuming than the other methods.
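Each tool reports domains as coordinates on the query, so the outputs can be reduced to a simple N- to C-terminal domain architecture for comparison between species. The sketch below assumes InterProScan's tab-separated output layout: column 1 is the protein accession, column 4 the analysis name (such as "Pfam"), column 5 the signature accession, and columns 7 and 8 the start and stop positions. It is an illustration, not the procedure used in the papers. It also keeps one member database at a time rather than building a consensus.

```python
from collections import defaultdict


def domain_architectures(tsv_lines, analysis="Pfam"):
    """Return {protein: [signature accessions ordered from N- to C-terminus]}."""
    domains = defaultdict(list)  # protein -> [(start, stop, signature)]
    for line in tsv_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8 or cols[3] != analysis:
            continue  # skip malformed rows and other member databases
        protein, signature = cols[0], cols[4]
        start, stop = int(cols[6]), int(cols[7])
        domains[protein].append((start, stop, signature))
    # Sorting the (start, stop, signature) tuples orders domains by position.
    return {p: [sig for _, _, sig in sorted(hits)] for p, hits in domains.items()}
```

Comparing such ordered lists, for example checking whether an Adhesion-like sequence carries N-terminal domains before its 7tm_2 region, is one simple way to summarize domain architectures across species.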

HMM-HMM profile comparisons

Comparing the HMM profiles of two protein families can help to detect homology between them [151]. The HHsearch program compares the HMM profiles of two protein families and estimates the probability that the families are homologous. Pfam uses this approach widely to define clans, in which homologous families are grouped together [70]. HMM-HMM profile comparison is considered highly sensitive for protein homology detection. A probability above 95% is considered the threshold at which homology between the families is almost certain [151]. We used this approach in Paper I and Paper IV to detect homology between GPCR gene families and to measure the degree of relatedness between them.
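Single-linkage clustering shows one way that pairwise HHsearch probabilities can translate into clan-like groupings. Any two families whose comparison exceeds the 95% threshold are joined, and membership propagates transitively. The sketch below implements this with a union-find structure. It is a simplification: Pfam clans also weigh structural and functional evidence and are curated manually. The family pairs and probabilities used to exercise it are placeholders, not HHsearch results.

```python
def group_homologous_families(pair_probs, threshold=95.0):
    """Single-linkage grouping of families.

    pair_probs maps (family_a, family_b) to an HHsearch-style probability (%).
    Two families are joined when their probability is strictly above threshold.
    Returns sorted groups of family names, including unlinked singletons.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for (fam_a, fam_b), prob in pair_probs.items():
        find(fam_a)  # register both families, even if they end up unlinked
        find(fam_b)
        if prob > threshold:
            union(fam_a, fam_b)

    groups = {}
    for fam in parent:
        groups.setdefault(find(fam), set()).add(fam)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])
```

Single linkage is deliberately permissive. One strong pairwise link is enough to merge two groups, which is why curated evidence beyond the probability threshold matters when clans are defined.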

Results and Discussion

Basal fungal species have homologs of metazoan-like GRAF(s) GPCR families (Paper I)

In Paper I, we performed a comprehensive search for GPCRs in species that diverged before the emergence of metazoans, with extra emphasis on the fungal phyla. We systematically mined 79 fungal genomes. This provided the first evidence that four of the five main mammalian GRAFS families (Rhodopsin, Adhesion, Glutamate and Frizzled) are present in Fungi. These findings, based on more extensive data sampling than previous studies, enabled us to revise earlier results [21] and to present an overall picture of GPCR evolution. The main findings were:
- Rhodopsin family GPCRs were found outside metazoan lineages for the first time.
- We strengthened previous findings that the cAMP family and the Rhodopsin family are closely related [21]. We suggested that the Rhodopsin family is an expansion of the more ancient cAMP branch, and estimated that it diverged from the cAMP receptor family in the common ancestor of opisthokonts.
- The Frizzled and Adhesion families were shown to be more ancient than previously thought. We hypothesized that both families diverged from the cAMP receptor family close to the split of unikonts.
- The Glutamate family was found to be the most widespread GPCR family, missing only in Plantae.
- The long, domain-rich N-termini characteristic of metazoan Adhesion GPCRs first emerged in choanoflagellates. Fungal and amoebozoan Adhesion GPCRs have only the Adhesion seven-transmembrane domain.
- In contrast, the N-terminal domains of Glutamate GPCRs previously observed in metazoans are also present in fungi.

In conclusion, our study provided a comprehensive overview of GPCR evolution in eukaryotes and demonstrated the widespread distribution of these classical membrane receptors across the eukaryotic domain. 
Our findings were extended in a recent study. It provided evidence that the last eukaryotic common ancestor (LECA) possibly possessed the Rhodopsin, Adhesion, Glutamate and cAMP families, as well as G proteins and other molecular components of GPCR signaling [42]. Taken together, it is evident that diverse eukaryotes, from unicellular relatives to complex multicellular metazoans, share a basal GPCR signal transduction system that was already present early in eukaryotic evolution.

Invertebrate metazoans are rich in GPCRs (Papers II and III)

In Papers II and III, we curated and classified the GPCR repertoires of two important emerging model organisms belonging to different metazoan phyla. Paper II describes the GPCR repertoire of the demosponge Amphimedon queenslandica. It phylogenetically compares these receptors with those found in eumetazoans and bilaterians. Similarly, Paper III describes the GPCR repertoire of Saccoglossus kowalevskii (the acorn worm). It compares this repertoire with the GPCRs of the closely related echinoderms and of chordate species. The findings from these analyses provide a platform for relating these receptors to the GPCRs of higher animals, including humans. They also help to clarify the evolution of individual GPCR families and subfamilies. 

Importantly, these two organisms represent distinct metazoan phyla and contribute to our understanding of how metazoan morphological traits evolved. Amphimedon queenslandica belongs to one of the earliest diverging animal phyla (Porifera). It is an important model for studying the evolution of animal multicellularity from single-celled ancestors. Saccoglossus kowalevskii is a hemichordate and so belongs to the deuterostome superphylum of bilaterian animals. Hemichordates are the sister group to echinoderms and closely related to chordates [123]. S. kowalevskii is also used as a model organism for studying the evolution of the central nervous system and of other morphological traits commonly found in vertebrates. 

Paper II builds on the GPCR gene mining reported with the whole-genome sequence of the demosponge Amphimedon queenslandica. It presents the complete complement of GPCRs in A. queenslandica, using both automated and manual verification of GPCR sequences. 
Moreover, the study delineated the relationships between the sponge GPCRs and those found in eumetazoans and bilaterians. Cross-genome comparisons of GPCR repertoires revealed that the sponge genome contains members of four of the five major families: Glutamate (33), Rhodopsin (126), Adhesion (40) and Frizzled (3). The sponge genome also encodes cAMP-like, intimal thickness-related receptor-like (ITR-like), lung 7TM receptor-like and ocular albinism-like (GPR143) receptors. Our phylogenetic comparisons of sponge GPCRs with those of other genomes revealed several interesting findings. First, the sponge Rhodopsin family GPCRs are divergent from those found in other metazoans, including eumetazoans and bilaterians, and the phylogenetic analyses provide no support for orthologous relationships with them. These findings suggest a scenario in which Rhodopsin (class A) GPCRs expanded independently in non-bilaterian metazoans and may have been co-opted for diverse functions related to the varied morphological characteristics of these animals. Second, our mining revealed that the sponge genome encodes a surprisingly large set of Adhesion family GPCRs, 40 sequences in total. This number is higher than in several other metazoans and comparable to that in many vertebrates, including humans. Intriguingly, the expansion of Adhesion GPCRs early in metazoan evolution, relative to the unicellular relatives of animals, is consistent with cell-cell adhesion being one of the major factors in the evolution of multicellular animals from single-celled ancestors. Overall, our analysis shows that the sponge genome contains a diversified set of GPCRs and that sponges display some of the earliest expansions of the Rhodopsin and Adhesion families.

Similarly, Paper III describes the GPCR complement of Saccoglossus kowalevskii. The study identified 260 unique GPCRs and classified 257 of them within the five main GPCR families: Glutamate (23), Rhodopsin (212), Adhesion (18), Frizzled (3) and Secretin (1). Intriguingly, this hemichordate contains several Adhesion and Glutamate family members that are commonly found in vertebrates, including humans. Comparisons with their human counterparts show that these sequences share relatively high pairwise sequence identity within the 7TM region and have highly similar N-terminal domain architectures. The 23 Glutamate family members include six GRM-like, eight GABAB-like, three CASR-like and one GPR158-like receptor. The S. kowalevskii genome also encodes 18 Adhesion GPCR-like genes, with conserved members in five of the nine known mammalian Adhesion groups. As in most vertebrates, the Rhodopsin family is the largest GPCR family in S. kowalevskii: its 212 members make up about 82% of the GPCRs found in the genome. These 212 receptors fall into all four groups of the Rhodopsin family; however, only seven of the 13 subgroups are present in S. kowalevskii.
The missing gene families include mammalian-type olfactory receptors, chemokine receptors, prostaglandin receptors and purine receptors, among others. Interestingly, the Rhodopsin family in S. kowalevskii contains large expansions of peptide and somatostatin-like receptors. Within the peptide subgroup expansions, a set of 25 sequences is similar to neuropeptide FF receptors, while other members are similar to tachykinin receptors (TACRs). This is comparable to observations in Branchiostoma floridae, which has a large expansion of about 39 sequences similar to neuropeptide FF receptors [152]. We also found that the average sequence identity between human and S. kowalevskii GPCR orthologues is around 47%, similar to that observed for two invertebrate chordates closely related to vertebrates, Ciona intestinalis (41%) and B. floridae (~47%). The human and other vertebrate orthologs of several of these GPCRs are involved in neurotransmission or other CNS-related roles. It is thus interesting to find conserved members of these GPCRs in Saccoglossus kowalevskii, which lacks a centralized, chordate-like nervous system and instead has a largely nerve-net-like architecture. In summary, our curation of GPCR families in the hemichordate S. kowalevskii provides a platform for better understanding the evolution of the hemichordate-chordate GPCR signaling system and for further investigation of hemichordate neurobiology from a GPCR perspective.
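The S. kowalevskii counts quoted above can be cross-checked with simple arithmetic, and the ~47% figure rests on a pairwise-identity calculation. The Python sketch below illustrates both under stated assumptions: the family counts are those reported in this section, but the function names `family_shares` and `percent_identity` are hypothetical, the sequences are assumed to be already aligned, and `percent_identity` uses one common convention (identical residues divided by aligned columns without gaps) that is not necessarily the convention used in Paper III.

```python
"""Arithmetic check of the S. kowalevskii GPCR counts (illustrative sketch)."""

# Counts reported for S. kowalevskii in Paper III.
TOTAL_UNIQUE_GPCRS = 260
FAMILY_COUNTS = {
    "Glutamate": 23,
    "Rhodopsin": 212,
    "Adhesion": 18,
    "Frizzled": 3,
    "Secretin": 1,
}


def family_shares(counts: dict[str, int], total: int) -> dict[str, float]:
    """Return each family's share of the whole repertoire, in percent."""
    return {family: 100.0 * n / total for family, n in counts.items()}


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two sequences taken from the same alignment.

    Assumed convention: identical residues divided by the number of columns
    in which neither sequence has a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


if __name__ == "__main__":
    classified = sum(FAMILY_COUNTS.values())
    print(f"classified: {classified} of {TOTAL_UNIQUE_GPCRS}")
    shares = family_shares(FAMILY_COUNTS, TOTAL_UNIQUE_GPCRS)
    print(f"Rhodopsin share: {shares['Rhodopsin']:.1f}%")
```

Run as a script, it reports 257 of 260 receptors classified and a Rhodopsin share of 81.5%, in line with the "about 82%" given above. Other identity conventions (for example, dividing by the full alignment length) give lower values for gapped alignments, so identities are only comparable when computed the same way.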

Evolution of G protein families and comparisons with GPCR repertoires (Paper IV)

Understanding the evolution of the GPCR signaling pathway in detail requires analysis not only of the receptors that receive the extracellular signal, but also of their intracellular coupling partners (the G proteins) and the other crucial downstream components. Until recently, studies had focused on the evolution of GPCRs or of other individual pathway components, without providing complete evolutionary histories covering all components of the GPCR signaling pathway. Recently, however, de Mendoza and colleagues presented a comprehensive analysis of the evolution of the GPCR signaling pathway based on a broad genomic survey with taxon sampling from all eukaryotic supergroups [42]. This study provided considerable insight into the evolution of the pathway and reported that representatives of the G protein families are found across metazoans and their closest unicellular relatives. Paper IV builds on these findings and maps the overall evolutionary hierarchy of vertebrate-like G proteins. In addition, Paper IV provides family-level annotation for each identified G protein gene and presents an overall map of individual gene gain and loss events, based on more extensive invertebrate taxon sampling than previous studies. The evolutionary histories of G protein gene families presented in this study support several previous findings and also add several new observations. First, our results agree with earlier findings that the Gα gene families can be traced back to the last common ancestor (CA) of holozoans.
Based on this previous evidence and our results from Paper IV, we suggest that the CA of holozoans likely possessed eight G protein genes: five Gα genes corresponding to the vertebrate Gα classes (Gαs, Gαi/o, Gαq, Gα12/13 and Gαv), two ancestral Gβ genes and one ancestral Gγ gene. Second, we suggest that the ancestral Gαi/o-like gene, already present in the closest unicellular relatives of metazoans, duplicated to give rise to Gαi-like and Gαo-like genes, possibly in the last CA of metazoans. Interestingly, this subset of genes has been retained in almost all invertebrate metazoans and was later expanded by the vertebrate tetraploidizations. Nevertheless, our broad invertebrate taxon sampling also showed that G protein genes have expanded in several species, mainly within the Gα classes (Gαi/o, Gαs, Gαq and Gα12/13) and the GNB1–4 paralogon. Moreover, our survey revealed that the eye-specific Gβ subunit (Gβe) of D. melanogaster, which is involved in its phototransduction cascade, is conserved among most arthropods. Paper IV also presents a proposed overall scenario for the expansions of G protein gene families during the early vertebrate genome doubling events. We examined previously studied families, such as the GNAI/GNAT family, the GNB1–4 paralogon and the G gamma genes (GNGT1/GNGT2/GNG11), and broadened the perspective by including other subunit families, such as GNAS/L, GNAQ/11 and GNA12/13, as well as the GNG cluster. Our results agree with earlier findings, and we also show that GNAS/L, GNAQ/11 and GNA12/13 expanded through the vertebrate tetraploidization events, giving rise to the vertebrate repertoire of G protein gene families commonly observed today. Our broad genomic survey also provides a basis for comparison with GPCR evolution, as the receptors expanded massively in several lineages. Notably, the first large expansions of GPCRs, early in metazoan evolution, were not accompanied by comparable expansions of G proteins. As noted previously, this suggests that the G protein repertoire already present in holozoans could accommodate massive expansions of GPCRs, and in this respect the GPCR signaling system appears highly modular. In summary, Paper IV concludes that most invertebrate metazoans have retained the basic subset of G proteins already present in the CA of metazoans, and that this subset underwent several losses and gains, as well as giving rise to lineage-specific subtypes such as Gβe, during metazoan evolution before being expanded in vertebrates.
In addition, several species-specific expansions suggest that some of these metazoans have diverse intracellular signaling mechanisms yet to be described. Overall, our results provide a basis for future genomic and functional studies aimed at understanding the diverse roles of GPCR/G protein-mediated intracellular signaling in multicellular animals.
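The proposed ancestral holozoan repertoire, and the idea of mapping gains and losses against it, can be expressed as a simple per-class comparison. This is a minimal sketch under stated assumptions, not the procedure used in Paper IV: the class labels are shorthand introduced here, the two ancestral Gβ genes are pooled into one hypothetical "Gbeta" class with a count of two, and each lineage's repertoire is assumed to be already annotated as per-class gene counts. A real gain/loss map would also need a species tree to place each event on a branch.

```python
from collections import Counter

# Proposed G protein repertoire of the holozoan common ancestor (eight genes).
# Class labels are shorthand invented for this sketch; the two ancestral
# Gbeta genes are pooled into a single "Gbeta" class with a count of two.
ANCESTRAL_HOLOZOAN = Counter({
    "Galpha_s": 1,
    "Galpha_i/o": 1,
    "Galpha_q": 1,
    "Galpha_12/13": 1,
    "Galpha_v": 1,
    "Gbeta": 2,
    "Ggamma": 1,
})


def compare_to_ancestor(observed, ancestral=ANCESTRAL_HOLOZOAN):
    """Classify per-class differences between a lineage and an ancestral set.

    Returns three dicts:
      losses     -- ancestral classes with fewer genes than in the ancestor
      expansions -- ancestral classes with more genes than in the ancestor
      novel      -- classes absent from the ancestral set (lineage-specific)
    """
    observed = Counter(observed)  # classes not listed now count as zero
    losses = {c: n - observed[c] for c, n in ancestral.items() if observed[c] < n}
    expansions = {c: observed[c] - n for c, n in ancestral.items() if observed[c] > n}
    novel = {c: n for c, n in observed.items() if c not in ancestral and n > 0}
    return losses, expansions, novel
```

For a hypothetical repertoire with three Galpha_i/o genes, no Galpha_v gene and one extra Gbeta_e gene, the function reports Galpha_v as lost, Galpha_i/o as expanded by two genes and Gbeta_e as a novel class. For metazoan lineages, the ancestral set would first need updating to reflect the proposed split of Gαi/o into Gαi and Gαo.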

Evolution of the nematode chemosensory GPCR system in protostomes (Paper V)

Nematode chemosensory GPCRs (NemChRs) constitute a massive expansion of putative chemoreceptor proteins that gave rise to 19 large families sharing high sequence similarity. Based on sequence analysis, previous studies suggested that these receptors likely split from the Rhodopsin family, which also includes the olfactory receptors responsible for sensing the environment in most metazoans. However, the evolutionary relationships between the nematode chemosensory GPCRs and the Rhodopsin family are not fully understood. Moreover, no broad genomic survey had previously been performed to search for homologs of these 19 gene families outside the nematode lineage. Paper V examines the relationships between the Rhodopsin (class A) GPCRs and the 19 NemChR gene families and also reports a broad survey for their homologs outside the nematode lineage. We investigated 26 eukaryotic species covering several phyla and provided the first evidence for homologs of the srw family outside the nematode lineage. Our search methodology identified 29 putative srw family members: 15 sequences in insects, 11 in molluscs and 3 in Schistosoma mansoni. The evidence is convincing because these sequences 1) share several common motifs with the previously annotated srw family members from C. elegans; 2) have the Pfam domain 7tm_GPCR_srw as their highest-scoring alignment in HMM-based Pfam searches; and 3) cluster unambiguously, with high support, with the previously annotated srw family from nematodes. Furthermore, based on HMM profile-profile comparisons and phylogenetic analysis, we showed that srw family members are closely related to the peptide and SOG subfamilies of the large Rhodopsin family. Peptide and SOG family members from four species (C. elegans, N. vectensis, D. melanogaster and T. adhaerens) are among the top hits in BLAST searches, and these sequences are consistently placed basal to the srw family sequences in our phylogenetic analysis. This topology remained consistent when we included sequences from all other subfamilies of the Rhodopsin family. Based on these findings, we suggest that the srw family split from the Rhodopsin family close to the emergence of protostomes from a bilaterian ancestor.
Interestingly, our results also show that the srsx family shares significant similarity with some vertebrate olfactory receptors, as some sequences in vertebrate genomes show similarity to both the 7tm_GPCR_Srsx domain and the 7tm_4 (olfactory) domain within their transmembrane regions. These findings suggest that these chemosensory GPCR families likely share a common origin close to the origin of bilaterians. Later, protostomes evolved several lineage-specific chemosensory families, while deuterostomes retained the vertebrate-like olfactory receptors that were already present in cnidarians. These findings also support our earlier hypothesis that all chemosensory GPCR gene families and the Rhodopsin family share a common ancestor close to the split between cnidarians and bilaterians. In other words, these findings strongly suggest that all chemosensory GPCR gene families split from the ancient Rhodopsin family. Taken together, Paper V clarifies the relationships between the nematode chemosensory GPCRs and the Rhodopsin family and provides important insights into the events that shaped the evolution of the GPCR-mediated chemosensory system in protostomes.
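The three acceptance criteria for a putative srw homolog amount to a conjunction of independent checks. The sketch below expresses that logic under explicit assumptions: the searches themselves (BLAST, Pfam HMM scans, tree building) are not shown and are assumed to have been run beforehand; the `Candidate` record and its field names are hypothetical; the 0.9 support threshold and its 0-1 scale are placeholders for the "high confidence support" mentioned in the text, not parameters from Paper V; and the Pfam domain name is compared case-insensitively because capitalization may differ between sources.

```python
from dataclasses import dataclass

# Putative srw homologs reported outside nematodes in Paper V, by taxon.
REPORTED_SRW_COUNTS = {"insects": 15, "molluscs": 11, "Schistosoma mansoni": 3}

SRW_PFAM_DOMAIN = "7tm_GPCR_srw"


@dataclass
class Candidate:
    """Evidence gathered for one putative srw homolog (hypothetical fields)."""

    seq_id: str
    shares_srw_motifs: bool  # criterion 1: motifs shared with C. elegans srw members
    top_pfam_domain: str     # criterion 2: best-scoring Pfam domain in an HMM search
    clusters_with_srw: bool  # criterion 3: groups with nematode srw in the tree
    clade_support: float     # support for that grouping, assumed on a 0-1 scale


def passes_srw_criteria(c: Candidate, min_support: float = 0.9) -> bool:
    """Accept a candidate only if all three lines of evidence agree.

    min_support is an assumed threshold standing in for "high confidence support".
    """
    motif_ok = c.shares_srw_motifs
    pfam_ok = c.top_pfam_domain.lower() == SRW_PFAM_DOMAIN.lower()
    tree_ok = c.clusters_with_srw and c.clade_support >= min_support
    return motif_ok and pfam_ok and tree_ok


def accepted_ids(candidates: list[Candidate], min_support: float = 0.9) -> list[str]:
    """Return the IDs of candidates satisfying all three criteria, in input order."""
    return [c.seq_id for c in candidates if passes_srw_criteria(c, min_support)]
```

Requiring all three checks to pass, rather than any one of them, mirrors the text's argument that the evidence is convincing because motif, domain and phylogenetic signals agree; a sequence whose best Pfam hit is an olfactory-type domain, for example, would be rejected even if it clustered with srw.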

Conclusions

The data presented in this thesis build on earlier hypotheses about the evolutionary histories of the GRAFS GPCR and G protein gene families and provide a robust update on the overall evolution of the two major components of the GPCR signaling system: the GPCRs and the G proteins. Paper I provides a comprehensive genomic survey for homologs of the GRAFS GPCR families in pre-metazoan eukaryotic lineages, with particular emphasis on fungal genomes. The data obtained from this study provided unprecedented evidence for the presence of four of the five major GRAFS families in basal fungal genomes and made it possible to revise the evolutionary histories of the major GPCR families. In particular, our data provided the first evidence for Rhodopsin family GPCRs outside the metazoan lineage, placing the origin of this family earlier than previously thought. Moreover, our findings strengthened the relationships between the cAMP receptor families and the other major GPCR families and support the idea that most GPCR gene families share a common origin. Papers II and III shift the focus towards the GPCR repertoires of two organisms that hold key evolutionary positions in the metazoan tree of life and thus serve as model organisms for studying the evolution of metazoan morphological traits. The findings from Paper II suggest that Amphimedon queenslandica, a representative of one of the earliest-diverging metazoan lineages, encodes the first metazoan expansions of the Rhodopsin and Adhesion GPCR families, indicating that a highly diversified GPCR repertoire was present early in metazoan evolution. Our comparative phylogenetic analysis also suggests that the Rhodopsin family GPCRs have likely undergone species-specific expansions, possibly being co-opted for different roles depending on the diverse morphological traits of pre-bilaterian metazoans.
Paper III reports the curation of GPCRs in Saccoglossus kowalevskii, and our comparative phylogenetic studies suggest that the acorn worm shares several conserved Rhodopsin, Adhesion and Glutamate family receptors that are commonly found in vertebrates. Paper III provides the first curation of this major, large signaling gene family in the hemichordate lineage and a basis for further understanding of hemichordate neurobiology from the perspective of GPCRs. Paper IV provides a comprehensive mining of G protein genes with broad taxon sampling that included more invertebrate genomes than previous studies. Based on our mining and comparative sequence and phylogenetic analyses, we traced the evolutionary hierarchy of the vertebrate-like G protein gene families and provided further support for previous findings that Gα genes corresponding to all major Gα classes can be traced back to the CA of holozoans. Moreover, we extend these findings and suggest that the CA of holozoans likely possessed eight G protein genes, and that this subset was already present before the divergence of metazoans and has been retained in most invertebrate species, albeit with a few gene losses and species-specific expansions. Our overall map of G protein gene gains and losses in metazoans provides a basis for further understanding of G protein-mediated intracellular signaling pathways in diverse metazoan lineages. Paper V clarifies the evolutionary relationships between the nematode chemosensory gene families and the Rhodopsin family. Our genomic survey and comparative phylogenetic analyses provided the first evidence that the srw family is found across several protostome phyla and established evolutionary links between the Rhodopsin (class A) GPCR family and nematode chemoreceptors. Moreover, the study suggests that all GPCR-mediated chemosensory families share a common origin with the large Rhodopsin family, somewhere before the protostome-deuterostome split. In summary, our results provide a robust update on the evolutionary histories of the GPCR and G protein families and strengthen our overall understanding of the evolution of the GPCR signaling system.

Future perspectives

With growing knowledge of the evolution of non-bilaterian morphological traits, the next leap forward would be to understand the roles of some of the major signaling gene families in these early-diverging metazoans. For example, the current placement of ctenophores (comb jellies) as the sister group to all other metazoans has revised earlier theories and implies either that nervous systems originated more than once, in ctenophores and independently in other metazoans, or that a nervous system present in the common ancestor of all metazoans was subsequently lost in sponges. Thus, it would be interesting to explore what roles the large expansions of GPCRs, in particular of the Rhodopsin (class A) family (a family with crucial neurobiological roles in most bilaterians), play in sponges, which lack a nervous system, and in ctenophores, whose nervous system may have evolved independently. Whether these receptors were co-opted for different roles depending on the diverse morphological traits of the non-bilaterian metazoans is an interesting open question. Moreover, much attention has been paid to the evolution of GPCR gene families, and of several other important gene families, from a vertebrate or human viewpoint. This has to some extent distorted our picture of gene family evolution, particularly with respect to species-specific biology. This approach has largely ignored species-specific expansions, and little is known about the roles of these GPCRs in several early-diverging metazoan lineages. With genomes now available from diverse metazoan lineages, it would be interesting to catalog these species-specific expansions and to study their roles further. Such an approach will enhance our understanding of the overall evolution of the GPCR signaling system, as well as of other crucial gene families that underpin the evolution of metazoan morphology.

Acknowledgements

First and foremost, I would like to express my gratitude to my main supervisor, Professor Helgi Schiöth, for accepting me as a PhD student. In particular, I am thankful to you for believing in me, supporting my ideas, and creating an environment in which I could work freely and independently. The offer to become a PhD student at such a renowned university changed my life altogether and gave me a platform to support myself and, more importantly, my family. Many thanks to my co-supervisor, Dr. Robert Fredriksson, for always being helpful, for having answers to my questions, and for correcting my manuscripts with great patience.

A special thanks to my former supervisor, Assoc. Prof. Manoj Narayanan at IIT Madras, for accepting me into your lab at IITM and for introducing me to what a research project is all about and how to pursue one. It was under your guidance that I first learnt the basics of bioinformatics and the art of writing a manuscript. With all gratitude, I must mention that without the platform that IIT provided, it would have been much more difficult to obtain a PhD position abroad and to achieve what I have so far.

Very special thanks to Mike Williams for always saying “yes I can do it, cheers” whenever I troubled you with proofreading my manuscripts. You always did it with great patience, and your contribution certainly made my manuscripts read a lot better.

I would also like to deeply thank Markus Sällman Almén, not just for helping me with my projects but for always taking care of other things as well. You were the first person to care about my accommodation problem when no one else asked about it, and it was you who first introduced me to Anica Klockars for help with accommodation. I owe a million thanks to Anica for using all your points to find me a beautiful apartment, which helped me to relax. Without your help, life in Uppsala would have been much more difficult. We (Roopa and I) always think of you whenever we relax at home, and the apartment always reminds us of you and your great help. Our best regards to you and your family; we wish you happiness and prosperity throughout your life.

I owe many thanks to all current and former members of Helgi Lab. Given the number of people in the lab, it was always fun to keep track of all its publications, and I learnt many things by reading papers by lab members with a wide range of research interests. Although I haven't had much interaction with some members of the lab, I have always felt privileged to be part of such a productive lab. Thanks to: Smitha, Nathalie, Samantha, Karin, Josefin, Mathias, Åke, Olga Titova, Philip, Praveen, Jai, Anders, Linda, Wei, Sahar, Emelie, Sofie, Frida, Emil and all other members of Helgi Lab. Special thanks to Rohit for contributing to my papers, and to Lyle for proofreading my review article and my thesis. I would also like to thank Aniruddha, Chetan, Rohit and others for brief chats in the corridors. Special thanks to Ravisankar for sharing the room with me, for the valuable time we spent cooking together and for discussing all issues at work. Importantly, thanks for being patient with all my complaints about life and lifestyle abroad and for giving me advice on several issues.

I would like to thank all the kind and helpful people I met during my two years at IIT Madras. Thanks to Navin, Deepak, Mrityunjay and others. Special thanks to Navin for teaching me the basics of biochemistry and biophysics early in the morning. You guys have always been inspiring, and it was during this period of my life that I first truly thought about pursuing a PhD. Many thanks for those fun-filled cricket and table tennis sessions as well.

I would also especially like to thank my wonderful friends from my master's studies at the University of Madras. Thanks to Kalai, Balaji, Niyaz, Kumaresan, Rao, Thirumurthy and Gugan. Although we all differ in opinions and thinking, I have always felt that we have directly or indirectly inspired each other. This truly helped each one of us pursue a PhD career, and I firmly believe that our batch is one of the very few (or perhaps the very first!) in which all the boys pursued a PhD.

And most importantly, I thank my best buddies: Narsi, Muthiah, Kiruba, Murali and Balaji. Without all of you, life would have been absolutely meaningless. I always wish that our beautiful journey goes on until the very end.

Finally, I owe everything to my Appa and Amma, who have always been integral to all the good things that have come my way. Many thanks to Appa and Amma for believing in me and supporting me whenever needed. Appa, you have always been my friend, with whom I can discuss anything. I have always had the freedom to hang around with friends and to come back home and talk about the drinks I had with my best buddies. Many thanks also to all family members and relatives, who have always wished for my success and happiness. And last but not least, I thank my wife Roopa for the unconditional love and care that you always have for me. More than being my wife, you have always been my friend and have become everything to me for the rest of my life. Without you, I wouldn't have had any thoughts of pursuing a PhD, and without you, all that I have got in my life wouldn't have been possible. Thank you all for being a part of my life and for making this possible.

References

1. Robertson JD (1959) The ultrastructure of cell membranes and their derivatives. Biochem Soc Symp 16: 3-43.
2. van Meer G, Voelker DR, Feigenson GW (2008) Membrane lipids: where they are and how they behave. Nat Rev Mol Cell Biol 9: 112-124.
3. von Heijne G (2006) Membrane-protein topology. Nat Rev Mol Cell Biol 7: 909-918.
4. Engel A, Gaub HE (2008) Structure and mechanics of membrane proteins. Annu Rev Biochem 77: 127-148.
5. Muller DJ, Wu N, Palczewski K (2008) Vertebrate membrane proteins: structure, function, and insights from biophysical approaches. Pharmacol Rev 60: 43-78.
6. Rosenbaum DM, Rasmussen SG, Kobilka BK (2009) The structure and function of G-protein-coupled receptors. Nature 459: 356-363.
7. Venkatakrishnan AJ, Deupi X, Lebon G, Tate CG, Schertler GF, et al. (2013) Molecular signatures of G-protein-coupled receptors. Nature 494: 185-194.
8. Lappano R, Maggiolini M (2011) G protein-coupled receptors: novel targets for drug discovery in cancer. Nat Rev Drug Discov 10: 47-60.
9. Rask-Andersen M, Masuram S, Schioth HB (2014) The druggable genome: Evaluation of drug targets in clinical trials suggests major shifts in molecular class and indication. Annu Rev Pharmacol Toxicol 54: 9-26.
10. Rask-Andersen M, Almen MS, Schioth HB (2011) Trends in the exploitation of novel drug targets. Nat Rev Drug Discov 10: 579-590.
11. Civelli O, Reinscheid RK, Zhang Y, Wang Z, Fredriksson R, et al. (2013) G protein-coupled receptor deorphanizations. Annu Rev Pharmacol Toxicol 53: 127-146.
12. Oldham WM, Hamm HE (2008) Heterotrimeric G protein activation by G-protein-coupled receptors. Nat Rev Mol Cell Biol 9: 60-71.
13. Kolakowski LF, Jr. (1994) GCRDb: a G-protein-coupled receptor database. Receptors Channels 2: 1-7.
14. Bockaert J, Pin JP (1999) Molecular tinkering of G protein-coupled receptors: an evolutionary success. EMBO J 18: 1723-1729.
15. Fredriksson R, Lagerstrom MC, Lundin LG, Schioth HB (2003) The G-protein-coupled receptors in the human genome form five main families. Phylogenetic analysis, paralogon groups, and fingerprints. Mol Pharmacol 63: 1256-1272.
16. Lagerstrom MC, Schioth HB (2008) Structural diversity of G protein-coupled receptors and significance for drug discovery. Nat Rev Drug Discov 7: 339-357.
17. Chun L, Zhang WH, Liu JF (2012) Structure and ligand recognition of class C GPCRs. Acta Pharmacol Sin 33: 312-323.
18. Cao J, Huang S, Qian J, Huang J, Jin L, et al. (2009) Evolution of the class C GPCR Venus flytrap modules involved positive selected functional divergence. BMC Evol Biol 9: 67.
19. Dore AS, Okrasa K, Patel JC, Serrano-Vega M, Bennett K, et al. (2014) Structure of class C GPCR metabotropic glutamate receptor 5 transmembrane domain. Nature 511: 557-562.
20. Wu H, Wang C, Gregory KJ, Han GW, Cho HP, et al. (2014) Structure of a class C GPCR metabotropic glutamate receptor 1 bound to an allosteric modulator. Science 344: 58-64.
21. Nordstrom KJ, Sallman Almen M, Edstam MM, Fredriksson R, Schioth HB (2011) Independent HHsearch, Needleman--Wunsch-based, and motif analyses reveal the overall hierarchy for most of the G protein-coupled receptor families. Mol Biol Evol 28: 2471-2480.
22. Pin JP, Galvez T, Prezeau L (2003) Evolution, structure, and activation mechanism of family 3/C G-protein-coupled receptors. Pharmacol Ther 98: 325-354.
23. Taniura H, Sanada N, Kuramoto N, Yoneda Y (2006) A metabotropic glutamate receptor family gene in Dictyostelium discoideum. J Biol Chem 281: 12336-12343.
24. Prabhu Y, Eichinger L (2006) The Dictyostelium repertoire of seven transmembrane domain receptors. Eur J Cell Biol 85: 937-946.
25. Port JA, Parker MS, Kodner RB, Wallace JC, Armbrust EV, et al. (2013) Identification of G protein-coupled receptor signaling pathway proteins in marine diatoms using comparative genomics. BMC Genomics 14: 503.
26. Nei M, Niimura Y, Nozawa M (2008) The evolution of animal chemosensory receptor gene repertoires: roles of chance and necessity. Nat Rev Genet 9: 951-963.
27. Niimura Y (2009) On the origin and evolution of vertebrate olfactory receptor genes: comparative genome analysis among 23 chordate species. Genome Biol Evol 1: 34-44.
28. Churcher AM, Taylor JS (2011) The antiquity of chordate odorant receptors is revealed by the discovery of orthologs in the cnidarian Nematostella vectensis. Genome Biol Evol 3: 36-43.
29. Anctil M (2009) Chemical transmission in the sea anemone Nematostella vectensis: A genomic perspective. Comp Biochem Physiol Part D Genomics Proteomics 4: 268-289.
30. Jekely G (2013) Global view of the evolution and diversity of metazoan neuropeptide signaling. Proc Natl Acad Sci U S A 110: 8702-8707.
31. Mirabeau O, Joly JS (2013) Molecular evolution of peptidergic signaling systems in bilaterians. Proc Natl Acad Sci U S A 110: E2028-2037.
32. Nikitin M (2015) Bioinformatic prediction of Trichoplax adhaerens regulatory peptides. Gen Comp Endocrinol 212: 145-155.
33. Dong D, He G, Zhang S, Zhang Z (2009) Evolution of olfactory receptor genes in primates dominated by birth-and-death process. Genome Biol Evol 1: 258-264.
34. Rovati GE, Capra V, Neubig RR (2007) The highly conserved DRY motif of class A G protein-coupled receptors: beyond the ground state. Mol Pharmacol 71: 959-964.
35. Bjarnadottir TK, Gloriam DE, Hellstrand SH, Kristiansson H, Fredriksson R, et al. (2006) Comprehensive repertoire and phylogenetic analysis of the G protein-coupled receptors in human and mouse. Genomics 88: 263-273.
36. Schioth HB, Nordstrom KJ, Fredriksson R (2010) The adhesion GPCRs; gene repertoire, phylogeny and evolution. Adv Exp Med Biol 706: 1-13.
37. Promel S, Langenhan T, Arac D (2013) Matching structure with function: the GAIN domain of adhesion-GPCR and PKD1-like proteins. Trends Pharmacol Sci 34: 470-478.
38. Arac D, Boucard AA, Bolliger MF, Nguyen J, Soltis SM, et al. (2012) A novel evolutionarily conserved domain of cell-adhesion GPCRs mediates autoproteolysis. EMBO J 31: 1364-1378.
39. Langenhan T, Aust G, Hamann J (2013) Sticky signaling--adhesion class G protein-coupled receptors take the stage. Sci Signal 6: re3.
40. Bjarnadottir TK, Fredriksson R, Hoglund PJ, Gloriam DE, Lagerstrom MC, et al. (2004) The human and mouse repertoire of the adhesion family of G-protein-coupled receptors. Genomics 84: 23-33.
41. Fredriksson R, Schioth HB (2005) The repertoire of G-protein-coupled receptors in fully sequenced genomes. Mol Pharmacol 67: 1414-1425.
42. de Mendoza A, Sebe-Pedros A, Ruiz-Trillo I (2014) The evolution of the GPCR signaling system in eukaryotes: modularity, conservation, and the transition to metazoan multicellularity. Genome Biol Evol 6: 606-619.
43. Boucard AA, Ko J, Sudhof TC (2012) High affinity neurexin binding to cell adhesion G-protein-coupled receptor CIRL1/latrophilin-1 produces an intercellular adhesion complex. J Biol Chem 287: 9399-9413.
44. Boucard AA, Maxeiner S, Sudhof TC (2014) Latrophilins function as heterophilic cell-adhesion molecules by binding to teneurins: regulation by alternative splicing. J Biol Chem 289: 387-402.
45. Silva JP, Lelianova VG, Ermolyuk YS, Vysokov N, Hitchen PG, et al. (2011) Latrophilin 1 and its endogenous ligand Lasso/teneurin-2 form a high-affinity transsynaptic receptor pair with signaling capabilities. Proc Natl Acad Sci U S A 108: 12113-12118.
46. O'Sullivan ML, de Wit J, Savas JN, Comoletti D, Otto-Hitt S, et al. (2012) FLRT proteins are endogenous latrophilin ligands and regulate excitatory synapse development. Neuron 73: 903-910.
47. O'Sullivan ML, Martini F, von Daake S, Comoletti D, Ghosh A (2014) LPHN3, a presynaptic adhesion-GPCR implicated in ADHD, regulates the strength of neocortical layer 2/3 synaptic input to layer 5. Neural Dev 9: 7.
48. Hamann J, Aust G, Arac D, Engel FB, Formstone C, et al. (2015) International Union of Basic and Clinical Pharmacology. XCIV. Adhesion G protein-coupled receptors. Pharmacol Rev 67: 338-367.
49. Adler PN (2002) Planar signaling and morphogenesis in Drosophila. Dev Cell 2: 525-535.
50. Schulte G (2010) International Union of Basic and Clinical Pharmacology. LXXX. The class Frizzled receptors. Pharmacol Rev 62: 632-667.
51. Huang HC, Klein PS (2004) The Frizzled family: receptors for multiple signal transduction pathways. Genome Biol 5: 234.
52. Schulte G, Bryja V (2007) The Frizzled family of unconventional G-protein-coupled receptors. Trends Pharmacol Sci 28: 518-525.
53. Janda CY, Waghray D, Levin AM, Thomas C, Garcia KC (2012) Structural basis of Wnt recognition by Frizzled. Science 337: 59-64.
54. Wang HY, Liu T, Malbon CC (2006) Structure-function analysis of Frizzleds. Cell Signal 18: 934-941.
55. Venkatakrishnan A, Flock T, Prado DE, Oates ME, Gough J, et al. (2014) Structured and disordered facets of the GPCR fold. Curr Opin Struct Biol 27C: 129-137.
56. Wang C, Wu H, Katritch V, Han GW, Huang XP, et al. (2013) Structure of the human smoothened receptor bound to an antitumour agent. Nature 497: 338-343.
57. Poyner DR, Hay DL (2012) Secretin family (Class B) G protein-coupled receptors - from molecular to clinical perspectives. Br J Pharmacol 166: 1-3.
58. Bortolato A, Dore AS, Hollenstein K, Tehan BG, Mason JS, et al. (2014) Structure of Class B GPCRs: new horizons for drug discovery. Br J Pharmacol 171: 3132-3145.
59. Miller LJ, Dong M, Harikumar KG, Gao F (2007) Structural basis of natural ligand binding and activation of the Class II G-protein-coupled secretin receptor. Biochem Soc Trans 35: 709-712.
60. Siu FK, Lam IP, Chu JY, Chow BK (2006) Signaling mechanisms of secretin receptor. Regul Pept 137: 95-104.
61. Watkins HA, Au M, Hay DL (2012) The structure of secretin family GPCR peptide ligands: implications for receptor pharmacology and drug development. Drug Discov Today 17: 1006-1014.
62. Archbold JK, Flanagan JU, Watkins HA, Gingell JJ, Hay DL (2011) Structural insights into RAMP modification of secretin family G protein-coupled receptors: implications for drug development. Trends Pharmacol Sci 32: 591-600.
63. Cardoso JC, Pinto VC, Vieira FA, Clark MS, Power DM (2006) Evolution of secretin family GPCR members in the metazoa. BMC Evol Biol 6: 108.
64. Nordstrom KJ, Lagerstrom MC, Waller LM, Fredriksson R, Schioth HB (2009) The Secretin GPCRs descended from the family of Adhesion GPCRs. Mol Biol Evol 26: 71-84.
65. Palmisano I, Bagnato P, Palmigiano A, Innamorati G, Rotondo G, et al. (2008) The ocular albinism type 1 protein, an intracellular G protein-coupled receptor, regulates melanosome transport in pigment cells. Hum Mol Genet 17: 3487-3501.
66. Shen B, Samaraweera P, Rosenberg B, Orlow SJ (2001) Ocular albinism type 1: more than meets the eye. Pigment Cell Res 14: 243-248.
67. Fukuda N, Naito S, Masukawa D, Kaneda M, Miyamoto H, et al. (2015) Expression of ocular albinism 1 (OA1), 3,4-dihydroxy-L-phenylalanine (DOPA) receptor, in both neuronal and non-neuronal organs. Brain Res 1602: 62-74.
68. Xue C, Hsueh YP, Heitman J (2008) Magnificent seven: roles of G protein-coupled receptors in extracellular sensing in fungi. FEMS Microbiol Rev 32: 1010-1032.
69. Finn RD, Mistry J, Schuster-Bockler B, Griffiths-Jones S, Hollich V, et al. (2006) Pfam: clans, web tools and services. Nucleic Acids Res 34: D247-251.
70. Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, et al. (2014) Pfam: the protein families database. Nucleic Acids Res 42: D222-230.
71. Oldham WM, Hamm HE (2006) Structural basis of function in heterotrimeric G proteins. Q Rev Biophys 39: 117-166.
72. Johnston CA, Siderovski DP (2007) Receptor-mediated activation of heterotrimeric G-proteins: current structural insights. Mol Pharmacol 72: 219-230.
73. Oldham WM, Hamm HE (2007) How do receptors activate G proteins? Adv Protein Chem 74: 67-93.
74. Koelle MR (2006) Heterotrimeric G protein signaling: Getting inside the cell. Cell 126: 25-27.
75. Kleuss C, Raw AS, Lee E, Sprang SR, Gilman AG (1994) Mechanism of GTP hydrolysis by G-protein alpha subunits. Proc Natl Acad Sci U S A 91: 9828-9831.
76. Hollinger S, Hepler JR (2002) Cellular regulation of RGS proteins: modulators and integrators of G protein signaling. Pharmacol Rev 54: 527-559.
77. De Vries L, Zheng B, Fischer T, Elenko E, Farquhar MG (2000) The regulator of G protein signaling family. Annu Rev Pharmacol Toxicol 40: 235-271.
78. Sprang SR (1997) G protein mechanisms: insights from structural analysis. Annu Rev Biochem 66: 639-678.
79. Lambright DG, Noel JP, Hamm HE, Sigler PB (1994) Structural determinants for activation of the alpha-subunit of a heterotrimeric G protein. Nature 369: 621-628.
80. Noel JP, Hamm HE, Sigler PB (1993) The 2.2 Å crystal structure of transducin-alpha complexed with GTP gamma S. Nature 366: 654-663.
81. Sondek J, Bohm A, Lambright DG, Hamm HE, Sigler PB (1996) Crystal structure of a G-protein beta gamma dimer at 2.1 Å resolution. Nature 379: 369-374.
82. Wall MA, Coleman DE, Lee E, Iniguez-Lluhi JA, Posner BA, et al. (1995) The structure of the G protein heterotrimer Gi alpha 1 beta 1 gamma 2. Cell 83: 1047-1058.
83. Wettschureck N, Offermanns S (2005) Mammalian G proteins and their cell type specific functions. Physiol Rev 85: 1159-1204.
84. Hurowitz EH, Melnyk JM, Chen YJ, Kouros-Mehr H, Simon MI, et al. (2000) Genomic characterization of the human heterotrimeric G protein alpha, beta, and gamma subunit genes. DNA Res 7: 111-120.
85. Downes GB, Gautam N (1999) The G protein subunit gene families. Genomics 62: 544-552.
86. Wilkie TM, Gilbert DJ, Olsen AS, Chen XN, Amatruda TT, et al. (1992) Evolution of the mammalian G protein alpha subunit multigene family. Nat Genet 1: 85-91.
87. Oka Y, Saraiva LR, Kwan YY, Korsching SI (2009) The fifth class of Galpha proteins. Proc Natl Acad Sci U S A 106: 1484-1489.
88. Oka Y, Korsching SI (2009) The fifth element in animal Galpha protein evolution. Commun Integr Biol 2: 227-229.
89. Hildebrandt JD (1997) Role of subunit diversity in signaling by heterotrimeric G proteins. Biochem Pharmacol 54: 325-339.
90. Simon MI, Strathmann MP, Gautam N (1991) Diversity of G proteins in signal transduction. Science 252: 802-808.
91. Hamm HE (1998) The many faces of G protein signaling. J Biol Chem 273: 669-672.
92. Preininger AM, Hamm HE (2004) G protein signaling: insights from new structures. Sci STKE 2004: re3.
93. Marinissen MJ, Gutkind JS (2001) G-protein-coupled receptors and signaling networks: emerging paradigms. Trends Pharmacol Sci 22: 368-376.
94. Malbon CC (2005) G proteins in development. Nat Rev Mol Cell Biol 6: 689-701.
95. Landry Y, Gies JP (2002) Heterotrimeric G proteins control diverse pathways of transmembrane signaling, a base for drug discovery. Mini Rev Med Chem 2: 361-372.
96. Lombardi MS, Kavelaars A, Heijnen CJ (2002) Role and modulation of G protein-coupled receptor signaling in inflammatory processes. Crit Rev Immunol 22: 141-163.
97. Hendriks-Balk MC, Peters SL, Michel MC, Alewijnse AE (2008) Regulation of G protein-coupled receptor signalling: focus on the cardiovascular system and regulator of G protein signalling proteins. Eur J Pharmacol 585: 278-291.
98. Smrcka AV (2008) G protein betagamma subunits: central mediators of G protein-coupled receptor signaling. Cell Mol Life Sci 65: 2191-2214.
99. Khan SM, Sleno R, Gora S, Zylbergold P, Laverdure JP, et al. (2013) The expanding roles of Gbetagamma subunits in G protein-coupled receptor signaling and drug action. Pharmacol Rev 65: 545-577.
100. Lin Y, Smrcka AV (2011) Understanding molecular recognition by G protein betagamma subunits on the path to pharmacological targeting. Mol Pharmacol 80: 551-557.
101. Dorsam RT, Gutkind JS (2007) G-protein-coupled receptors and cancer. Nat Rev Cancer 7: 79-94.
102. Belmonte SL, Blaxall BC (2012) Conducting the G-protein Coupled Receptor (GPCR) Signaling Symphony in Cardiovascular Diseases: New Therapeutic Approaches. Drug Discov Today Dis Models 9: e85-e90.
103. Sassone-Corsi P (2012) The cyclic AMP pathway. Cold Spring Harb Perspect Biol 4.
104. Seger R, Krebs EG (1995) The MAPK signaling cascade. FASEB J 9: 726-735.
105. Julius D, Nathans J (2012) Signaling by sensory receptors. Cold Spring Harb Perspect Biol 4: a005991.
106. Shichida Y, Matsuyama T (2009) Evolution of opsins and phototransduction. Philos Trans R Soc Lond B Biol Sci 364: 2881-2895.
107. Kaupp UB (2010) Olfactory signalling in vertebrates and insects: differences and commonalities. Nat Rev Neurosci 11: 188-200.
108. Sato K, Pellegrino M, Nakagawa T, Vosshall LB, Touhara K (2008) Insect olfactory receptors are heteromeric ligand-gated ion channels. Nature 452: 1002-1006.
109. Wicher D, Schafer R, Bauernfeind R, Stensmyr MC, Heller R, et al. (2008) Drosophila odorant receptors are both ligand-gated and cyclic-nucleotide-activated cation channels. Nature 452: 1007-1011.
110. Benton R (2008) Chemical sensing in Drosophila. Curr Opin Neurobiol 18: 357-363.
111. Amrein H, Thorne N (2005) Gustatory perception and behavior in Drosophila melanogaster. Curr Biol 15: R673-684.
112. Robertson HM, Thomas JH (2006) The putative chemoreceptor families of C. elegans. WormBook: 1-12.
113. Thomas JH, Robertson HM (2008) The Caenorhabditis chemoreceptor gene families. BMC Biol 6: 42.
114. Dunn CW, Hejnol A, Matus DQ, Pang K, Browne WE, et al. (2008) Broad phylogenomic sampling improves resolution of the animal tree of life. Nature 452: 745-749.
115. Hampl V, Hug L, Leigh JW, Dacks JB, Lang BF, et al. (2009) Phylogenomic analyses support the monophyly of Excavata and resolve relationships among eukaryotic "supergroups". Proc Natl Acad Sci U S A 106: 3859-3864.
116. Burki F, Okamoto N, Pombert JF, Keeling PJ (2012) The evolutionary history of haptophytes and cryptophytes: phylogenomic evidence for separate origins. Proc Biol Sci 279: 2246-2254.
117. Derelle R, Lang BF (2012) Rooting the eukaryotic tree with mitochondrial and bacterial proteins. Mol Biol Evol 29: 1277-1289.
118. Laurin-Lemay S, Brinkmann H, Philippe H (2012) Origin of land plants revisited in the light of sequence contamination and missing data. Curr Biol 22: R593-594.
119. Srivastava M, Simakov O, Chapman J, Fahey B, Gauthier ME, et al. (2010) The Amphimedon queenslandica genome and the evolution of animal complexity. Nature 466: 720-726.
120. Ryan JF, Pang K, Schnitzler CE, Nguyen AD, Moreland RT, et al. (2013) The genome of the ctenophore Mnemiopsis leidyi and its implications for cell type evolution. Science 342: 1242592.
121. Srivastava M, Begovic E, Chapman J, Putnam NH, Hellsten U, et al. (2008) The Trichoplax genome and the nature of placozoans. Nature 454: 955-960.
122. Putnam NH, Srivastava M, Hellsten U, Dirks B, Chapman J, et al. (2007) Sea anemone genome reveals ancestral eumetazoan gene repertoire and genomic organization. Science 317: 86-94.
123. Rottinger E, Lowe CJ (2012) Evolutionary crossroads in developmental biology: hemichordates. Development 139: 2463-2475.
124. Eddy SR (2004) What is a hidden Markov model? Nat Biotechnol 22: 1315-1316.
125. Eddy SR (1998) Profile hidden Markov models. Bioinformatics 14: 755-763.
126. Eddy SR (2011) Accelerated Profile HMM Searches. PLoS Comput Biol 7: e1002195.
127. Finn RD, Clements J, Eddy SR (2011) HMMER web server: interactive sequence similarity searching. Nucleic Acids Res 39: W29-37.
128. Dunbrack RL, Jr. (2006) Sequence comparison and protein structure prediction. Curr Opin Struct Biol 16: 374-384.
129. Edgar RC, Batzoglou S (2006) Multiple sequence alignment. Curr Opin Struct Biol 16: 368-373.
130. Bork P, Koonin EV (1998) Predicting functions from protein sequences--where are the bottlenecks? Nat Genet 18: 313-318.
131. Nuin PA, Wang Z, Tillier ER (2006) The accuracy of several multiple sequence alignment programs for proteins. BMC Bioinformatics 7: 471.
132. Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22: 4673-4680.
133. Katoh K, Misawa K, Kuma K, Miyata T (2002) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 30: 3059-3066.
134. Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32: 1792-1797.
135. Notredame C, Higgins DG, Heringa J (2000) T-Coffee: A novel method for fast and accurate multiple sequence alignment. J Mol Biol 302: 205-217.
136. Do CB, Mahabhashyam MS, Brudno M, Batzoglou S (2005) ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Res 15: 330-340.
137. Katoh K, Toh H (2008) Recent developments in the MAFFT multiple sequence alignment program. Brief Bioinform 9: 286-298.
138. Katoh K, Standley DM (2013) MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30: 772-780.
139. Yang Z, Rannala B (2012) Molecular phylogenetics: principles and practice. Nat Rev Genet 13: 303-314.
140. Kumar S, Gadagkar SR (2000) Efficiency of the neighbor-joining method in reconstructing deep and shallow evolutionary relationships in large phylogenies. J Mol Evol 51: 544-553.
141. Gascuel O, Steel M (2006) Neighbor-joining revealed. Mol Biol Evol 23: 1997-2000.
142. Huelsenbeck JP, Ronquist F, Nielsen R, Bollback JP (2001) Bayesian inference of phylogeny and its impact on evolutionary biology. Science 294: 2310-2314.
143. Ronquist F (2004) Bayesian inference of character evolution. Trends Ecol Evol 19: 475-481.
144. Holder M, Lewis PO (2003) Phylogeny estimation: traditional and Bayesian approaches. Nat Rev Genet 4: 275-284.
145. Guindon S, Gascuel O (2003) A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 52: 696-704.
146. Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, et al. (2010) New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol 59: 307-321.
147. Marchler-Bauer A, Derbyshire MK, Gonzales NR, Lu S, Chitsaz F, et al. (2015) CDD: NCBI's conserved domain database. Nucleic Acids Res 43: D222-226.
148. Coggill P, Finn RD, Bateman A (2008) Identifying protein domains with the Pfam database. Curr Protoc Bioinformatics Chapter 2: Unit 2.5.
149. Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, et al. (2009) InterPro: the integrative protein signature database. Nucleic Acids Res 37: D211-215.
150. Jones P, Binns D, Chang HY, Fraser M, Li W, et al. (2014) InterProScan 5: genome-scale protein function classification. Bioinformatics 30: 1236-1240.
151. Soding J (2005) Protein homology detection by HMM-HMM comparison. Bioinformatics 21: 951-960.
152. Nordstrom KJ, Fredriksson R, Schioth HB (2008) The amphioxus (Branchiostoma floridae) genome contains a highly diversified set of G protein-coupled receptors. BMC Evol Biol 8: 9.

Acta Universitatis Upsaliensis Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine 1116 Editor: The Dean of the Faculty of Medicine

A doctoral dissertation from the Faculty of Medicine, Uppsala University, is usually a summary of a number of papers. A few copies of the complete dissertation are kept at major Swedish research libraries, while the summary alone is distributed internationally through the series Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine. (Prior to January, 2005, the series was published under the title “Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine”.)

ACTA UNIVERSITATIS UPSALIENSIS Distribution: publications.uu.se UPPSALA urn:nbn:se:uu:diva-258956 2015