Iowa State University Capstones, Theses and Graduate Theses and Dissertations Dissertations

2011 Genomic and functional characterization of G -coupled receptors in the human pathogen Schistosoma mansoni and the model planarian Schmidtea mediterranea Mostafa Zamanian Iowa State University

Follow this and additional works at: https://lib.dr.iastate.edu/etd

Recommended Citation Zamanian, Mostafa, "Genomic and functional characterization of G protein-coupled receptors in the human pathogen Schistosoma mansoni and the model planarian Schmidtea mediterranea" (2011). Graduate Theses and Dissertations. 12153. https://lib.dr.iastate.edu/etd/12153

This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected]. Genomic and functional characterization of G protein-coupled receptors in the human pathogen Schistosoma mansoni and model planarian Schmidtea mediterranea

by

Mostafa Zamanian

A dissertation submitted to the graduate faculty

in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Major: Neuroscience

Program of Study Committee: Timothy A. Day, Major Professor Thomas J. Baum Steve A. Carlson M. Heather W. Greenlee Michael J. Kimber

Iowa State University

Ames, Iowa

2011

Copyright c Mostafa Zamanian, 2011. All rights reserved.

ii

DEDICATION

I would like to dedicate this thesis to my family. To my mom and dad, who sparked my interest in science as a child and provided a stable and nurturing environment for me to explore my own path. This could not have occurred without their unconditional love and timely wisdom.

To my sister, Maryam, for the valuable emotional support she provided throughout. iii

TABLE OF CONTENTS

LIST OF TABLES ...... vi

LIST OF FIGURES ...... vii

ACKNOWLEDGEMENTS ...... 1

1 GENERAL INTRODUCTION ...... 2

1.1 Schistosomes: Significant Human Parasites ...... 2

1.2 Planarians: Robust Model Organisms ...... 4

1.3 Genomics ...... 5

1.4 Functional Genomics: RNA Interference ...... 7

1.4.1 Canonical RNAi Pathway ...... 7

1.4.2 RNAi in Schistosomes ...... 9

1.4.3 RNAi in Planaria ...... 10

1.5 G Protein-Coupled Receptors ...... 10

1.5.1 Signal Transduction ...... 11

1.5.2 Export and Regulation ...... 13

1.5.3 Phylogenetic Overview ...... 15

1.6 In silico GPCR Mining ...... 16

1.7 In silico GPCR Classification ...... 17

1.8 GPCR Deorphanization ...... 20

1.8.1 Difficulties and Pitfalls ...... 21

1.9 General Objectives ...... 22 iv

2 The Repertoire of G Protein-Coupled Receptors in the Human Parasite

Schistosoma mansoni and the Model Organism Schmidtea mediterranea . 23

Abstract ...... 23

2.1 Background ...... 24

2.2 Results and Discussion ...... 27

2.3 Conclusions ...... 42

2.4 Acknowledgements ...... 44

2.5 Methods ...... 44

2.6 Figures ...... 47

2.7 Tables ...... 62

3 Novel RNAi-mediated Approach to Probing Platyhelminth G Protein-

Coupled Receptors ...... 73

Abstract ...... 73

3.1 Introduction ...... 74

3.2 Results and Discussion ...... 76

3.3 Conclusions ...... 80

3.4 Materials and Methods ...... 81

3.5 Acknowledgements ...... 85

3.6 Figures and Tables ...... 85

4 PROF1 Localization in Planaria ...... 93

4.1 Planarian In situ hybridization protocol ...... 93

4.2 PROF1 Localization Results ...... 94

5 CONCLUSIONS ...... 95

APPENDIX A The genome of the blood fluke Schistosoma mansoni . . . . . 97

APPENDIX B Yeast GPCR Assay ...... 104

APPENDIX C PROF1 Primers and RT-PCR ...... 108

APPENDIX D RNAi vector ...... 112 v

APPENDIX E cAMP Assay Optimization ...... 113

APPENDIX F PERL Code ...... 114

BIBLIOGRAPHY ...... 126 vi

LIST OF TABLES

Table 1.1 Genome assembly statistics...... 6

Table 1.2 Heterotrimeric G protein classification...... 11

Table 1.3 GRAFS species comparison...... 15

Table 2.1 Stage-specific GPCR tabulation...... 64

Table 2.2 GRAFS-based comparison of GPCR repertoires...... 65

Table 2.3 Planarian homologs of parasite GPCRs...... 66

Table 2.4 Planarian homologs of parasite GPCRs- continued...... 67

Table 2.5 Planarian homologs of parasite GPCRs- continued...... 68

Table 2.6 Comparison of PROF1 motifs and classical motifs...... 69

Table 2.7 Rhodopsin SVM training parameters and cross-validation accuracy. . . 70

Table 2.8 Rhodopsin SVM classifier results...... 71

Table 2.9 Amine SVM classifier results...... 72

Table 3.1 RNAi-based GtNPR-1 deorphanization cAMP raw values ...... 91

Table 3.2 5-HT candidate receptor selection ...... 92

Table A.1 Summary of Ligand-Gated Ion Channels (LGIC) ...... 100

Table A.2 Summary of Ligand-Gated Ion Channels (LGIC)- Continued ...... 101

Table A.3 Summary of Voltage-Gated and Other Ion Channels ...... 102

Table A.4 Summary of Voltage-Gated and Other Ion Channels- Continued . . . . 103

Table C.1 PROF1 RT-PCR conditions ...... 111 vii

LIST OF FIGURES

Figure 1.1 Schistosome life cycle...... 4

Figure 1.2 GPCR signal transduction...... 12

Figure 1.3 GPCR classification tree...... 17

Figure 1.4 SVM classification: the linear case...... 19

Figure 1.5 GPCR deorphanization in mammalian culture...... 21

Figure 2.1 Transmembrane domain-focused GPCR sequence mining strategy...... 52

Figure 2.2 HMM-based identification of S. mansoni and S. mediterranea GPCRs. .... 53

Figure 2.3 Manual curation and expansion of Schistosoma mansoni GRAFS GPCRs. .. 54

Figure 2.4 GRAFS phylogenetic tree...... 55

Figure 2.5 Rhodopsin phylogenetic tree...... 56

Figure 2.6 Aminergic receptors: S. mediterranea and S. mansoni...... 57

Figure 2.7 Phylogenetic analysis of PROF1 GPCRs...... 58

Figure 2.8 PROF1 multiple sequence alignment...... 59

Figure 2.9 Phylogenetic analysis of Glutamate GPCRs...... 60

Figure 2.10 Schematic of glutamate in association with LBD residues...... 61

Figure 3.1 RNAi-based deorphanization approach overview...... 85

Figure 3.2 cAMP ligand screen ...... 86

Figure 3.3 Semi-quantitative PCR reveals GtNPR-1 knockdown...... 87

Figure 3.4 RNAi-based GtNPR-1 deorphanization ...... 88

Figure 3.5 Phamacological response profile of 5-HT receptor agonists...... 89

Figure 3.6 Effects of Smed-SER85 suppression on basal motility...... 90 viii

Figure 4.1 Localization of PROF1 transcripts in S. medterranea ...... 94

Figure B.1 Heterologous GPCR expression in yeast...... 104

Figure B.2 Smp 011940 sequence alignment...... 105

Figure B.3 Full-length transcript of Smp 011940...... 107

Figure C.1 S. mediterranea PROF1 RT-PCR...... 111

Figure D.1 RNAi vector pPR244 (pDONRdT7)...... 112

Figure E.1 Membrane preparation optimization with cAMP RIA...... 113 1

ACKNOWLEDGEMENTS

I would like to express my sincerest gratitude to the many individuals who helped me over the course of my doctoral studies and research. First and foremost, I would like to thank my advisor Dr. Tim Day for his guidance and patience. Dr. Day’s refreshing approach to graduate education provided an environment that tremendously aided my development as a scientist. I will remain grateful for the opportunity he extended to me. I would like to thank Dr. Michael

Kimber for the many hours he dedicated to my molecular biology training in his role as a postdoctoral research associate, and for his open door policy later on in his role as faculty. My research experience was greatly enriched by constructive advice from Drs. Baum, Carlson and

Greenlee.

I would also like to take this opportunity to thank the many students who were involved with this work in various capacities. In particular, Dr. Ekaterina Novozhilova for her general helpfulness as the senior graduate sudent in the laboratory for much of the time I was a student.

Valuable discussions and assistance were provided by Prince Agbedanu, Michael Song, and

Nalee Xiong. To all of these individuals I am grateful. 2

1 GENERAL INTRODUCTION

This dissertation describes work towards the goal of characterizing an important superfamily of cell receptor in two prominent organisms. This is of interest from an infectious disease and human health standpoint, as well as a purely biological standpoint. The protein superfamily investigated is the G protein-coupled receptor (GPCR) family. Extensive efforts were made to delineate the receptor complements of the human parasite Schistosoma mansoni and the model organism Schmidtea mediterranea. Further work primarily focuses on validation of a novel method for elucidating receptor function in the native cell membrane environment.

The first chapter of this dissertation describes these biomolecules and organisms in appropriate depth, justifies the significance of the research topic, and outlines a hypothesis-driven research plan, and serve as a general foundation for the research manucripts that follow. These works are tied together and discussed in the final chapter, where future avenues of research are suggested.

The appendices include various supplementary documents and my summarized contributions to related published work that are referenced within manuscripts.

1.1 Schistosomes: Significant Human Parasites

Schistosomes are trematodes of the phylum Platyhelminths, and etiological agents of schisto- somiasis. In their parasitic capacity, schistosomes continue to pose a significant challenge to human health. Recent estimates place the number of infected humans at a staggering 207 million across 74 countries, with 280,000 deaths per annum attributed to schistosomiasis in sub-Saharan Africa alone [1]. In endemic areas, another 779 million people are thought to be at risk of infection [2]. It is calculated that 70 million disability-adjusted life years (DALYs) 3 are lost to schistosomiasis annually [2,3]. This figure surpasses the global burdens posed by both malaria and tuberculosis, and is nearly equivalent to that of HIV/AIDS [4]. To further exacerbate the situation, there exists strong evidence that the prevalence of this disease helps account for the disproportionately high HIV-1 infection rates observed in afflicted areas [5].

The genus Schistosoma contains a number of species that infect humans, with varying geo- graphic distributions. Among these, Schistosoma mansoni, Schistosoma japonicum, and Schis- tosoma haematobium account for a prevalence of infections. S. mansoni and S. japonicum cause intestinal schisotomiasis, while S. haematobium leads to urogenital schistosomiasis. The wide range of species-specific clinical manifestations associated with chronic schistosomiasis have been well described. These include persistent inflammation, and morbidity brought on by anemia, diarrhea, chronic abdominal pain, fatigue, malnutrition, and childhood stunting [6].

While there is great variance in the severity of pathologies, the overall health and economic impact of this disease is a crippling factor in many developing countries. The relative lack of attention schistosomiasis receives with respect to its human health burden has led to its classification as a neglected tropical disease (NTD).

At present, this overwhelming disease burden is met with a near exclusive reliance on treat- ment with the drug praziquantel. Significant advances in schistosomiasis control have been achieved in the past few years, due primarily to well-organized control programs that include aggressive chemotherapy [7,8]. While these programs produce unquestionable public health benefit, their assertive use of praziquantel raises well-founded concerns regarding the long-term efficacy of this approach. Alarming reports of emerging drug resistance and the potential for re- sistance [9,10] have spurred recognition of the pressing need for new antischistosomals [1,11–15].

The schistosome life cycle is complex, requiring an intermediate fresh water host. We can begin description of this life cycle with the deposition of schistosome eggs into water. The eggs hatch and release motile miracidia, which parasitize snails as an intermediate host. These miracadia 4 then develop into sporocysts within the intermediate snail host. After successive generations, free-living cercariae emerge and are released into water. Cercareae penetrate human skin and transition into schistosomulae. Schistosomules enter the venous circulatory system and travel to the lungs. From there, they migrate to the hepatic portal system where they undero sexual maturation and develop into male and female adults. Adult mate pairs migrate to a final in- fection site in a species-dependent manner: either the mesenteric venules of the small intestine

(intestinal schistosomiasis), or the venous plexus of the bladder (urinary schistosomiasis). The female parasites produce hundreds to thousands of eggs daily. These eggs pass through the lumen wall and are then expelled in feces or urine, allowing the life cycle to begin anew. The life cycle is displayed in Figure 1.1.

H2O H2O H2O

- - ' - - $' - - $- - E M S C S A E

  &Intermediate Host %&Host (human) % (Snail)

Figure 1.1 Schistosome life cycle.

The schistosome life-cycle is depicted in linearized fashion. Life stages: egg (E), miracidium (M), sporocyst (S), cercaria (C), schistosomule (S), and adult (A). Intermediate host (snail) and host (human) stages are outlined.

1.2 Planarians: Robust Model Organisms

Planarians are free-living bilateral organisms of the phylum Platyhelminthes. The remarkable regenerative capabilities of these organisms and their evolutionary position as basal metazoans, have advanced their use as a model system to study stem cell biology and development [16–18].

The planarian body is seeded with stem cells exhibiting totipotency. These ‘neoblasts’ con- stitute > 20% of the total organismal cell count and exhibit the capacity to differentiate into 5 any of the estimated 40 planarian cell types [19]. Planarians are hermaphroditic and rely on either or both sexual or asexual modes of reproduction to propagate, in a species-dependent manner. Asexual reproduction occurs via transverse fission along the anteroposterior axis, and can be promoted by various environmental cues including population density and light-dark cycles [20,21]. A large number of planarian species exist with varying reproductive modes com- plicated by varying ploidy. Among these, Schmidtea mediterranaea is the most widely studied.

S. mediterranaea is a stable diploid (2n = 8) with a relatively small haploid genome. Inter- estlingly, a Robertsonian chromosomal translocation involving the fusion of the whole arm of 1 to chromosome 3 has generated exclusively asexual strains of this species [22].

Both sexual and asexual strains are easily maintained and propagated in the laboratory. Clonal asexual lines derived by serial amputation severely limit genetic variation and are therefore ideal for experimental manipulation and examination of genome content. Other planarian species, such as Dugesia tigrina and Dugesia japonica, are also widely studied with respect to their strong regenerative abilities. However, their mixoploid genomes (2n = 16 and 3n = 24) cou- pled with the presence of large numbers of transposable elements [23], complicate genomic analyses. In the course of this work, we make use of S. mediterranaea and D. tigrina.

1.3 Genomics

The most recent release of the S. mansoni genome is in the form of a 381 Mb assembly dis- playing approximately 6X coverage, accompanied by a set of 11,812 predicted proteins [24].

Schistosome genomic DNA was isolated and prepared using standard methods from mixed-sex cercariae of the Puerto Rico isolate. Whole-genome shotgun Sanger sequencing was performed on insert-containing plasmid, fosmid, and bacterial artificial (BAC) libraries. The

final predicted gene set resulted from the application of multiple ab initio gene prediction al- gorithms that were further refined with transcriptome data and information incorporated from the S. japonicum genome [25]. A manually curated subset of 402 schistosome gene models served as a training input for the predition algorithms employed. 6

The current S. mediterranea genomic assembly is estimated at 865 Mb in length at 11X ∼ sequencing depth. A similar WGS (whole-genome shotgun) sequencing approach was used, whereby plasmid and fosmid libraries were constructed with genomic DNA purified from the clonally derived S2F2 strain. The predicted S. mediterranea proteome arises from 30,930 gene models that were produced by MAKER [26,27], an automated genome annotation pipeline that makes use of a number of gene prediction tools and available EST (expressed sequence tag) col- lections. However, there is evidence to suggest that this may be a vast overestimate of the true planarian gene count. In order to refine this figure, 31 S. mediterranea mRNAs were sequenced and aligned with the assembled genome and compared to corresponding MAKER predictions.

This exercise generated estimates of transcripts represented in the MAKER gene set (90%), as well as transcripts extending across multiple contigs (30%) and transcripts incorrectly split into multiple gene models (15%). Extrapolation of these figures to the whole genome indicates a more reasonable protein-coding gene count of 15,570. Table 1.1 presents the current state of both nuclear genomic assemblies.

S. mansoni S. mediterranea Size 363 Mb 865 Mb Assembly v3.1 v3.1 Coverage 6X 11.6X Repeats 45% 46% A/T Richness 65% 69% Contigs 31,407 94,682 C N50 17,677 19,025 Supercontigs 5,745 43,294 SC N50 879,876 40,862 Gene Prediction ↓ ↓ 11,809 39,930* (15,570)

Table 1.1 Genome assembly statistics. 7

1.4 Functional Genomics: RNA Interference

The recent arrival of flatworm whole genome sequence data opens avenues for functional ap- proaches to investigating the molecular physiology of both schistosomes and planarians. Careful enlistment of these genomic resources alongside improved helminth gene manipulation proto- cols, will aid in deciphering mechanisms of parasite survival and pathogenesis and may lead to new strategies for parasite control and elimination. Although significant strides have been made towards developing flatworm transgenic approaches [28,29], readily-adaptible and robust protocols are still lacking. In the case of schistosomes, this effort is hampered by the complex parasite life cycle which only allows for time-restricted maintenance of specific life stages in vitro [30]. Reports of the successful introduction of transgenes in schistosomes have thus far been transient and stage-specific, involving electroporation as a means of delivery [31,32]. The potential for germ line transgenesis [33] could eventually yield parasite strains with heritable loss-of-function and gain-of-function genotypes. Advances in parasite culture techniques could conceivably catalyze this recognized and worthwhile goal.

The best-established functional genomics tool available for the study of both pathogenic and free-living flatworms is RNA interference (RNAi). RNAi was first discovered in the nematode

Caenorhabditis elegans, where introduction of exogenous double-stranded RNA (dsRNA) was shown to result in sequence-specific gene supression [34,35]. This phenomenon has been shown to extend to protozoa and nearly all higher-order eukaryotes examined, and has rapidly become a standard tool for loss-of-function gene analysis. Long dsRNAs can elicit potent gene sup- pression when introduced into worms, flies, and plants [36]. In mammalian cells, much shorter silencing triggers are required to avoid a non-specific interferon response (IFN) [37]. What follows is an outline of the basic components of the canonical RNAi pathway.

1.4.1 Canonical RNAi Pathway

In general, the RNAi pathway involves the association of small non-coding RNAs (ncRNAs) with a ribonucleoprotein complex dubbed the RNA-induced silencing complex (RISC). RISC 8 uses sequence information from a guide strand to downregulate the translation of comple- mentary mRNAs via transcript cleavage or translational block. The RISC-mediated silencing pathways triggered by small ncRNAs that originate from exogenously-applied long dsRNAs and endogenous microRNAs (miRNAs) essentially converge after a series of RNA processing events. Other pathways involving ncRNAs derived from single stranded RNA (ssRNA) precur- sors such as piwi-interacting RNAs (piRNAs) are set aside in this overview.

Exogenous dsRNAs are first processed into 21-25 nt dsRNAs by the RNase III family ribonu- ∼ clease Dicer [38]. Dicer contains a PAZ domain which recognizes dsRNA helical ends, and two catalytic RNase III domains which cleave individual strands to produce siRNA duplexes with 2 nt 3’ overhangs [39]. The distance beetween the PAZ and RNase III domains corresponds to the length of the siRNA duplexes produced. The guide strand incorporated into RISC is determined by the relative thermodynamic stability of the 5’ ends of the two siRNA duplex strands [39–41].

In the microRNA pathway, genomically-encoded miRNAs are first transcribed by RNA poly- merase II as long pri-miRNAs with a 5’ cap and a poly-A tail [42,43]. Pri-miRNAs are processed in the nucleus by a complex consisting of the RNase III enzyme Drosha and the dsRNA-binding protein Pasha, yielding 65-70 nt pre-miRNAs folded into stem-loop structures. Following ex- ∼ port to the cytoplasm, pre-miRNAs are recognized and cleaved into imperfect miRNA:miRNA* duplexes by Dicer. As with siRNA duplexes, the mature miRNA guide strand is selected for incorporation into the RISC complex based on the relative thermodynamic stability of the 5’ ends of the two miRNA duplex strands.

The RISC complex contains members of the Argonaute (Ago) protein family [44]. Argonaute proteins play a role in guide strand selection and direct endonuclease activity against mRNA transcripts complementary to their bound siRNA or miRNA guide fragment. The strand that displays lower stability base pairing in its 5’ end preferentially associates with RISC, and its complementary passenger strand is degraded as the first RISC substrate. At this juncture in 9 the pathway, the origin of the silencing trigger (siRNA or miRNA) does not affect the mechan- ims of downstream post-transcriptional silencing. This is instead determined by the degree of complementarity between the guide strand and a target transcript. miRNAs tend to exhibit partial complementarity to the 3’ UTR of one or more target transcripts, leading to transla- tional repression [45]. This allows for cell-type and tissue specific gene regulation [46]. Perfect or near-perfect complementarity between siRNAs or miRNAs and their target transcripts leads to transcript cleavage.

In plants and nematodes, RNAi can be a systemic phenomenon. Cell-to-cell dispersion of silencing triggers in C. elegans involves a dsRNA channel (SID-1), which acts as a conduit for the rapid energy-independent import of dsRNA. There is evidence to suggest that alternative

SID-1-independent pathways are responsible for the export of dsRNA from cells [47]. Most examined animals, except some insects, house at least a single SID-1 homolog. Less universally conserved mechanisms that improve RNAi efficiency include distinct pathways for the biogenesis of secondary siRNAs. In C. elegans, this form of signal amplification occurs via RNA-dependent

RNA polymerase (RdRP) activity [48].

1.4.2 RNAi in Schistosomes

RNAi in schistosomes provides new opportunities for focused exploitation of genomic data. It is clear that schistosomes possess the cellular machinery that mediates RNAi, with a number of key molecular actors bioinformatically identified or characterized [49–53]. These include a single Dicer homolog (smDicer), two Drosha homologs (SmDrosha1 & 2), four Argonaute homologs (SmAgo1-4), and a SID-1 homolog. The latter suggests that these parasites retain cell-to-cell dsRNA transport mechanisms similar to C. elegans. miRNAs have also been iden- tified in silico in both S. mansoni and S. japonicum [54–56], further establishing the presence of the canonical miRNA-processing and RNAi pathway in schistosomes.

Since the original publications documenting RNAi in S. mansoni [57, 58], there has been an 10 accumulation of published accounts of gene silencing in S. mansoni life stages including adults, schistosomules, sporocysts and eggs [51,59–70]. For the most part, silencing has been reported for abundantly expressed genes associated with the surface tissues or gut using dsRNA or siRNA triggers, such that the RNAi-susceptibility of genes associated with other tissue types is unclear.

For example, the silencing of 32 different genes in the developing sporocyst revealed variable knockdown efficiencies and some off-target silencing, demonstrating the need to optimize and validate RNAi on a gene-by-gene basis [67].

1.4.3 RNAi in Planaria

Double-stranded RNA has been found to be useful for inducing acute gene silencing in planaria.

The susceptibility of planarians such as S. mediterranea to dsRNA-mediated gene interference has been well demonstrated. RNAi was first shown to lead to both specific and near complete gene inhibition with a microinjection protocol [71]. As with schistosomes, miRNAs have been identified in silico in planaria [72–74]. With the required internal machinery established, other protocols have since been described to achieve the desired silencing effect in planarians, in- cluding soaking [75] and bacterial feeding [76, 77] as a means of dsRNA delivery. The latter involves transforming RNAi vector constructs into RNase III-deficient bacterial cells, induction of dsRNA synthesis, and ingestion of dsRNA-containing bactera by planarians. This method has already been utilized in a high-throughput manner to examine the effects genes may have on regeneration and stem cell function by noting developmental and morphological defects, as well as other behavioral phenotypes [78].

1.5 G Protein-Coupled Receptors

G protein-coupled receptors (GPCRs) represent the largest known superfamily of proteins in the metazoa. Relevant to our aims, GPCRs are long-established as lucrative targets for therapeutic intervention, acting as targets for 50% of all prescription pharmaceuticals [79]. This is undoubt- edly a function of their extensive role in eukaryotic signal transduction, and more precisely, their involvement in a myriad of consequential stimulus-response pathways. Sequence diversity 11 within the GPCR superfamily is matched appropriately by diversity in the panel of known exogenous and endogenous ligands [80]. Although GPCRs respond to a range of molecules that include biogenic amines, peptides, odorants and classical neurotransmitters, they are defined by a common structural motif: a core domain of well-conserved seven transmembrane-spanning

(7TM) α-helices.

1.5.1 Signal Transduction

GPCRs transduce extracellular signals to intracellular signaling cascades by acting as guanine nucleotide exchange factors (GEFs). The activation of a GPCR by its cognate ligand promotes

GDP-GTP exchange in associated heterotrimeric G proteins, comprised of an α subunit and a βγ dimer. The GTP-bound Gα subunit then dissociates from the dimer and activates a particular biochemical pathway, depending on its subtype. Gαs,Gαi/o,Gαq/11, and Gα12/13 constitute the larger G protein family groupings (Table 1.2)[81]. G proteins exert biochemical influence on effector molecules, leading to downstream accumulation or reduction of second- messengers. This can lead to changes in the activity of metabolic enzymes, ion channels, transporters, and the cellular transcriptional machinery, producing a biological effect. Al- though GPCR signaling is complex and can include G protein-independent pathways [82], the primary G protein-dependent pathways that result from receptor activation are the cAMP and phosphatidylinositol pathways [83].

Family Gα Subtype

Gi/o Gi,Go,Gt,Ggust,Gz

Gs Gs,Golf

Gq/11 Gq,G11,G14,G15,G16

G12/13 G12,G13

Table 1.2 Heterotrimeric G protein classification.

Gαs and Gαi stimulate and inhibit the membrane-associated effector adenylate cyclase, which catalyzes the conversion of ATP to the second-messenger 3’,5’-cyclic AMP (cAMP), respec- tively. Gαq/11 acts on the membrane-bound effector phospholipase C beta (PLC-β), which 12 cleaves phosphatidylinositol 4,5-bisphosphate (PIP2) to produce diacyl glycerol (DAG) and the second-messenger inositol 1,4,5-triphosphate (IP3). In turn, IP3 diffuses into the cytosol

2+ and agonizes IP3-gated ion channels on the ER surface, causing an increase in cytosolic Ca concentration. Gα12/13 is implicated mostly in such processes as actin remodeling and cellular migration. The Gβγ dimer lacks a catalytic domain and acts through regulated protein-protein interactions [84]. Although the list of Gβγ-interacting proteins continues to grow, Gβγ signal- ing is generally treated as a secondary signaling pathway with respect to Gα-mediated signaling.

Figure 1.2 presents an overview of the primary GPCR signaling pathways and the endpoints most frequently assayed as a function of GPCR activation.

!"+,-./0/0"1/1023-" !"41+5&6" $'&(" !" @:A2.,"

""+,-./0/0"1/1023-" ))" )*" *)" ""41+5&6" $'&(" )" )))" *" *))"

!"&7839780:923-";")&?6" !" $'&(" #" #" $%&" !"4;2=>6"

$'&("

Figure 1.2 GPCR signal transduction.

The three primary G protein subtypes (G-αs, G-αi and G-αq) are depicted along with their corresponding intracellular signaling pathways. The biochemical endpoints most commonly monitored in studies of GPCR activation are cAMP and Ca2+.

More recent findings have further nuanced our understanding of GPCR structure and signaling.

A growing body of biophysical studies indicate the presence of functional receptor dimers and higher order oligomers in Rhodopsin GPCRs with varying symmetries [85, 86]. While 13 examples of obligate homo- and hetero-dimerization have long existed for Glutamate family

GPCRs [87], this represents a significant challenge to the classical view of receptor monomers as the functional signaling unit for the numerically dominant Rhodopsin family. Despite ambiguity as to the scale of occurrence, the possibility of widespread oligomeric signaling necessitates the development of more sophisticated pharmacological and regulatory models to describe receptor behavior [88].

1.5.2 Export and Regulation

GPCR synthesis, folding, and assembly occur at the ER. The underlying mechanisms of GPCR transport from the ER to the cell surface are not yet very well understood, at least in com- parison to the extensive work performed on the endocytic pathway. Proper folding has been shown to involve the guidance of ER chaperones and accessory proteins [89]. Further, proteins such as receptor activity modifying proteins (RAMPs) can affect the pharmacology of eventual surface-exposed receptors [90]. Correctly-folded receptors which are able to pass the ER quality control mechanism are packaged into ER-derived vesicles for export. GPCR motifs have been identified that play a role in ER export regulation by selective interaction with components of the COPII transport system, influencing such parameters as cargo concentration and rate of exit [91]. GPCRs are transported in vesicles to the cell surface through the ER-Golgi inter- mediate complex (ERGIC), the Golgi, and the trans-Golgi network (TGN). Along the export pathway, the vast majority of GPCRs undergo some form of post-translational modification

(e.g., palmitoylation and glycosylation) [92].

G protein export follows along similar lines, with both co- and post-translational modifications helping establish the physical association of G protein subunits with the cell membrane. Except in the case of Gαt, all Gα subunits are N-terminally palmitoylated [81]. Gαi subunits are also targets of N-terminal myristoylation. Although the Gβγ dimer exhibits greater hydrophobicity, this is insufficient for membrane association and Gγ subunits undergo C-terminal thio-ether- linked isoprenylation with either a farnesyl or geranylgeranyl moeity [93]. Once GPCRs and 14

G proteins have made their way to the cell surface, they are known to interact with molecules such as GRKs (GPCR kinases) [94, 95], arrestins [96, 97], and RGS (regulators of G protein signaling) proteins such as GAPs (GTPase-activating proteins) [98, 99]. These cellular actors regulate receptor internalization, coupling, ligand binding, recruitment and recycling.

The activation cycle of heterotrimeric G proteins can be described by the equilibrium of active

Gα*[GTP] and inactive Gα[GDP] subunits. GPCRs in their active conformation act as GEFs, promoting Gα GDP release and the rapid binding of GTP: [GTP]cytosol [GDP]cytosol. There  are two basic models that describe G protein-GPCR interactions [93]. In the ‘collision cou- pling’ model, G proteins associate with active GPCRs as a result of free stochastic movement within the membrane. In the ‘pre-coupling’ model, G proteins are coupled to GPCRs prior to their activation. Antagonistic to receptor-mediated GEF activity, Gα subunits have intrinsic

GTPase activity. This reaction is catalyzed by GAPs, which significantly accelerate the rate of

GTP hydrolysis and thus, signal termination.

GEF * ∗ Gα[GDP]Gβγ ) Gα[GTP] + Gβγ −−−GAP−−−

To modulate the magnitute and specificity of intracellular biochemical responses to extracel- lular ligands, receptors are tightly regulated. The best-characterized pathway for receptor downregulation is the endocytic pathway via clathrin-coated pits (CCPs) [97, 100]. In this pathway, the intracellular domains of agonist-occupied GPCRs are phosphorylated by GRKs, followed by arrestin binding. Arrestin brings about receptor desensitization by inhibition of G protein coupling. Receptor/arrestin complexes can be recruited through an adapter complex into clathrin-coated pits, resulting in receptor internalization and the formation of the early endosome. Receptors are then sorted, either marked for lysosomal destruction or recycled back to the cell surface.

Another mechanism of homologous desensitization involves the activity of enzymes accumulated 15 downstream of G protein activation, such as protein kinase A (PKA) and protein kinase C

(PKC) [101]. These enzymes can directly or indirectly (through GRKs) phosphorylate agonist- bound GPCRs. Alteration of the coupling profile of a receptor is another potential consequence of phosphorylation [102]. The molecular events governing heterologous desensitization are less well understood [103], however, it should be noted that receptor signaling pathways can exhibit extensive cross-talk leading to positive or negative forms of regulation [104].

1.5.3 Phylogenetic Overview

Early efforts to sub-classify and bioinformatically fingerprint the GPCR superfamily gave rise to the A-F classification system [105, 106]. Pharmacological organization of this receptor fam- ily was continued with the GPCRDB database [107]. More recent phylogenetic analysis of the entire known human GPCR complement revealed five primary GPCR groupings: Gluta- mate, Rhodopsin, Adhesion, and Secretin [108]. Together, the families are referred to as ‘GRAFS’, and their lineages have been shown to extend throughout the metazoa [109, 110]

(Table 1.3). Within-family phylogenetic analysis of Rhodopsin GPCRs revealed four primary clusters (α, β, γ, and δ), further sub-classified into 13 groups.

H. sapiens M. musculus D. melanogaster C. elegans G 24 112 9 6 R 752 1106 76 124 A 27 13 5 5 F 10 11 7 5 S 20 28 13 5

Table 1.3 GRAFS species comparison.

Among these, the α amine and β peptide groupings are the most populated in most examined species, including the Ecdysozoa. Many of these Rhodopsin sub-groupings are only present in mammals, and examples of lineage-specific GPCR expansions can also be identified that fall outside this classification scheme. For example, nematode chemosensory receptors in C. elegans comprise over 85% of the total GPCR complement [111]. Flatworm whole genome sequence 16 data provides an important substrate for the potential identification of similar phylum or species-specific GPCR groupings.

1.6 In silico GPCR Mining

Growing genomic resources have provided a GPCR mining platform for a number of organ- isms, including Homo sapien [112], Mus musculus [113], Gallus gallus [114], Rattus rattus [115],

Tetraodon nigrovirdis [116], Anopheles gambiae [117], Drosophila melanogaster [118], Ciona intestinalis [119], Branchiostoma floridae [120], Xenopus tropicalis [121] and Canis famil- iaris [122]. GPCR sequences have been accumulated in these genomes using a range of bioin- formatics methods that include homology-based searching (BLAST), hidden Markov models

(HMMs) and motif-driven queries [123]. The more successful GPCR mining protocols have involved the application of a combination of such methods and algorithms.

Known GPCR sequences are commonly used as queries to search against sequence assemblies with BLAST [124]. BLAST is a heuristic that approximates the optimal Smith Waterman align- ment algorithm, allowing for efficient searches against large genomic datasets. This algorithm is ideal for finding sequences that share at least moderate sequence similarity, but is limited in its ability to detect highly-diverged or novel GPCR sequences. While BLAST identifies global , motif-based approaches focus on highly-conserved and length-restricted aspects of GPCR primary structure. The ordered combination of motifs in the form of finger- prints can be used to filter false positives generated by individual motifs. While motifs can be declared with different levels of exactness, this approach suffers from similar disadvantages in the detection of atypical GPCR sequences.

Profile hidden Markov models (HMMs) represent a more sophisticated alignment-based ap- proach to the modeling of protein families. In broad terms, a hidden Markov model consists of a system of ‘hidden’ states and observed variables. Conditional probabilities are ascribed to state transitions, as well as for the emission of variables within states. In the biological 17 application of HMMs, emmitted variables are typically sequence alphabets (e.g. nucleotide or amino acid) and hidden state paths represent underlying biological units (e.g. introns or GpC islands) [125]. Profile HMMs have been introduced as a means of building consensus statis- tical models of protein familities [126]. The underlying architecture of these HMMs consists of a ‘match’, ‘insert’, and ‘delete’ state, fit for linear sequence data. Profile HMMs have seen extensive use in GPCR discovery, and HMMs have also been developed for the prediction of

G protein coupling profiles [127], TM domain boundaries [128, 129], and the identification of peptide ligands [130].

1.7 In silico GPCR Classification

Superfamily GPCR

HMM

Level 1 (Family) Glutamate Rhodopsin Adhesion Frizzled Secretin

SVM

Level 2 (subfamily) Peptide Amine Hormone (Rhod) ...

SVM

Level 3 (sub-subfamily) Dopamine Histamine Serotonin ...

Figure 1.3 GPCR classification tree.

GRAFS-based distinctions are presented at the family (level 1) plane, while consensus classification shared between the A-F and GRAFS systems are used at the subfamily (level 2) and sub-subfamily (level 3) plane. Profile HMMs can be used to effectively distinguish GPCRs from other transmembrane proteins and to place them within their primary families. SVM classifiers can be used for level 2 and 3 placement of Rhodopsin family GPCRs.

There is great overlap in the computational approaches to GPCR mining and GPCR classifi- cation, and the methods discussed above can be applied to both. While HMMs are adequate for GPCR identification and classification at the family plane, other algorithms outperform

HMM and homology-based classifiers at deeper classification planes (Figure 1.3). Numerous support vector machine (SVM) classifiers have been shown to classify GPCRs with high accu- racy at the subfamily and sub-subfamily levels. In reduced terms, SVMs represent a powerful 18 supervised-learning method for data classification. Given a combined set of positively and negatively labeled training instances, an SVM produces a binary classifier that can then be used to label unknown samples. Each instance is associated with a fixed-length numerical fea- ture vector, containing certain attributes of the data to be classified. The SVM identifies a maximum-margin separating hyperplane to distinguish between vectors representing instances of opposite sign (Figure 1.4,[131]).

SVM training datasets are populated by pairings of feature vectors and classification groups:

D Dataset = xi, yi xi , yi 1, +1 (1.1) { } | ∈ < ∈ {− }

where i = 1, ..., n and n represents the number of training instances

where D represents the dimensionality of feature vectors used in training

In the linear case, the SVM identifies a hyperplane of the form:

w⊥x + b = 0 (1.2)

where w is the normal vector perpendicular to the hyperplane

We can select normalization such that

w⊥x + b +1 (1.3) ≥

w⊥x + b 1 (1.4) ≥ − for positive (+1) and negative ( 1) support vectors, respectively −

The margin between these is given by 2/ w , and an SVM can be trained to maximize this k k distance by minimizing w . This is treated as a quadratic optimization problem with linear k k constraints [131]. Data that is not fully linearly separable is dealt with by the introduction of slack variables that introduce costs for misclassified instances. Additionally, non-linearly sep- arable feature vectors must often be mapped to a higher dimensional space by the application 9

19

of kernel functions. The kernel trick allows for the construction of a separating hyperplane in

the transformed feature space.

w

H2 -b |w| H1 Origin Margin

Figure 1.4 SVM classification: the linear case.

Figure 5. Linear separating hyperplanes for the separable case. The support vectors are circled. Recently, this approach has seen extensive use in the area of biosequence discrimination, and

relevant to our goals, the particular problem of GPCR classification. In the first study on

the matter, SVM-based classifiers were shown to drastically outperform their BLAST and Thus, we introduce positive Lagrange multipliers αi,i=1, ,l,oneforeachofthe HMM-based counterparts for level 1 and level 2 GPCR subclassification [132]. Subsequent··· inequality constraints (12). Recall that the rule is that for constraints of the form ci 0, the constraintstudies equations further improved are the multiplied predictive performance by ofpositive SVMs with theLagrange introduction of dipep- multipliers and subtracted≥ from the objectivetide composition function, feature to vectors form [133, 134 the], achieving Lagrangian. accuracies of 97.3% For and equality 96.4% for level constraints, the Lagrange multipliers are1 unconstrained. (Rhodopsin) and level 2 (amine) This classification, gives Lagrangian:respectively. Alternative feature vectors have since been similarly validated [135, 136]. 1 l l L wAlthough2 SVMs areα binaryy (x classifiers,w + theyb can)+ be appliedα to multi-class problems where the (13) P ≡ 2# # − i i i · i number of classes,i=1 k, is greater than 2. Such problemsi=1 are typically solved using either the “one-versus-rest”! (OvR) or “one-versus-one” (OvO)! method. In the OvR scenario, k binary

We must nowclassifiers minimize are trained,L suchP thatwith each classifier respect separates to onew class,b from,andsimultaneouslyrequirethatthe all others. The “winner- derivatives of L with respect to all the α vanish, all subject to the constraints α 0 P i i ≥ (let’s call this particular set of constraints 1). Now this is a convex quadratic programming problem, since the objective function is itselfC convex, and those points which satisfy the constraints also form a convex set (any linearconstraintdefinesaconvexset,andasetof N simultaneous linear constraints defines the intersection of N convex sets, which is also aconvexset).Thismeansthatwecanequivalentlysolvethefollowing“dual”problem: maximize LP ,subjecttotheconstraintsthatthegradientofLP with respect to w and b vanish, and subject also to the constraints that the α 0(let’scallthat particular set of i ≥ constraints 2). This particular dual formulation of the problem is called the Wolfe dual (Fletcher, 1987).C It has the property that the maximum of L ,subjecttoconstraints , P C2 occurs at the same values of the w, b and α,astheminimumofLP ,subjecttoconstraints 8. C1 Requiring that the gradient of LP with respect to w and b vanish give the conditions:

w = αiyixi (14) i !

αiyi =0. (15) i ! Since these are equality constraints in the dual formulation, we can substitute them into Eq. (13) to give 1 L = α α α y y x x (16) D i − 2 i j i j i · j i i,j ! ! 20 takes-all” strategy is then commonly used to label unknown samples, whereby the classifier with

k(k−1) the highest output decision function assigns the final class. In the OvO scenario, 2 binary classifiers are constructed in a pair-wise manner. A voting strategy is then typically employed in classification, whereby each classifier accounts for one vote and the class with the maximum number of votes assigns the final label. Although the OvO method has been shown to perform better on a number of fronts [137], as far as the authors are aware, all previously described

SVM-based GPCR classifiers available for public use rely on the simpler OvR method.

1.8 GPCR Deorphanization

Once identified, GPCRs require deorphanization, the process of pairing orphan receptors with their cognate ligands. The predominant approaches all require the transient or stable heterolo- gous expression of receptors in a surrogate cell system and in most cases, this expression occurs in cells derived from other species and phyla. In the absense of a flatworm cell culture line,

flatworm GPCRs have been expressed in such divergent cellular environments as CHO [138],

HEK293 [139,140], COS7 [139], yeast [140,141], and Xenopus oocyte cells [142]. The traditional work-flow begins with full-length sequence cloning and introduction of the receptor coding se- quence into an appropriate expression vector designed for the particular cell culture system.

Receptor activation assays incorporate different detection schemes (e.g. fluorescence, biolumi- nescence, and gene reporting) and are amenable to a host of receptor-specific adaptations. In mammalian cell culture, GPCR activation is commonly assayed as a function of Ca2+ mobiliza- tion from the ER (Figure 1.5). Agonist-evoked calcium flux can be detected using fluorescent calcium-sensitive dyes in a high throughput platform. In this system, co-transfected G protein chimeras [143, 144] can be used to divert GPCR signaling through the Gαq pathway, irrespec- tive of the endogenous receptor coupling profile. The carboxyl-terminus of the Gα subunit is primarily responsible for receptor specificity, and chimeric G proteins have been constructed via replacement of the five C-terminal amino acids of Gαq alpha with those of other G protein subtypes. 21

!"

+,-./0"

PIP2 DAG ))" )*" *)"

)" )))" *" *))" !"#$%&

IP3 !" #" #" $%&"

$'&(" !"#

Figure 1.5 GPCR deorphanization in mammalian culture.

Yeast have also been shown to be attractive heterologous GPCR expression systems for a num- ber of reasons. They provide a minimal eukaryotic host environment, are genetically robust and amenable to manipulation, and are low cost and relatively easy to maintain in the laboratory.

Further, they lend themselves to high throughput-screening assays with simple outputs. A number of yeast strains have been genetically modified for the purpose of heterologous GPCR expression. In these engineered strains, modifications of the endogenous yeast GPCR-mediated response pathway direct the activation of a non-native GPCR to yeast growth as a phenotypic output [145]. Our laboratory and others have had recent success in employing yeast as a means of deorphanizing both platyhelminth [140] and nematode [146] GPCRs, and we have continued on this path with other flatworm receptors (Appendix B).

1.8.1 Difficulties and Pitfalls

Current approaches to GPCR deorphanization have severe limitations and are inefficient for large-scale projects. This has introduced a significant bottleneck in the way of both the pharma- 22 cological and structural characterization of GPCRs [147,148]. Although heterologous expression is not a theoretically challenging feat, individual targets routinely prove to be recalcitrant and consume inordinate effort. The time and effort required to establish the transient or stable expression of a receptor make traditional approaches undesirable for projects of larger scale.

The processes that guide the proper folding and export of receptors are not necessarily well- conserved across cell lineages. Further, a significant set of potential concerns also present themselves post-transfection or transformation. In the event that a GPCR is successfully ex- pressed on the surface of a host cell, the GPCR must operate in conjunction with a foreign complement of accessory and signaling proteins. The possibility of a receptor with altered structure or function is therefore a consequential worry. In view of these concerns, a receptor deorphanization method that could be applied in a native cell or membrane environment could side-step some of these concerns.

1.9 General Objectives

Objective 1. Whole-genome identification and classification of GPCRs in S. mansoni and S. mediterranea using a transmembrane-focused bioinformatics protocol. The primary hypothesis is that these platyhelminths possess an extensive and complex GPCR repertoire, and exhibit a host of lineage-specific features.

Objective 2. Development of a loss-of-function GPCR deorphanization protocol that occurs in the native cell membrane environment. The primary hypothesis is that joining RNAi and

GPCR second messenger assays can lead to the elucidation of receptor agonists and G protein coupling pathways.

These objectives are pursued with the overarching goals of identifying schistosome drug targets, mapping platyhelminth receptor biology, promoting planarians as model organisms for parasite research, and establishing higher-throughput methods of receptor deorphanization and drug target validation. The selection of specific ‘rational’ targets is aided by these focused objectives. 23

2 The Repertoire of G Protein-Coupled Receptors in the Human Parasite Schistosoma mansoni and the Model Organism Schmidtea mediterranea

A paper submitted to the Journal BMC Genomics

Mostafa Zamanian 1,2,∗, Michael J Kimber 1,2, Paul McVeigh 3, Steve A Carlson 1,2, Aaron G

Maule 3, Tim A Day 1,2,∗

Abstract

Background: G protein-coupled receptors (GPCRs) constitute one of the largest groupings of eukaryotic proteins, and represent a particularly lucrative set of pharmaceutical targets. They play an important role in eukaryotic signal transduction and physiology, mediating cellular responses to a diverse range of extracellular stimuli. The phylum Platyhelminthes is of con- siderable medical and biological importance, housing major pathogens as well as established model organisms. The recent availability of genomic data for the human blood fluke Schis- tosoma mansoni and the model planarian Schmidtea mediterranea paves the way for the first comprehensive effort to identify and analyze GPCRs in this important phylum.

Results: Application of a novel transmembrane-oriented approach to receptor mining led to the discovery of 116 S. mansoni GPCRs, representing all of the major families; 104 Rhodopsin,

2 Glutamate, 3 Adhesion, 2 Secretin and 5 Frizzled. Similarly, 291 Rhodopsin, 9 Glutamate,

21 Adhesion, 1 Secretin and 11 Frizzled S. mediterranea receptors were identified. Among

1Department of Biomedical Sciences, Iowa State University, Ames, IA USA 2Interdepartmental Neuroscience Program, Iowa State University, Ames, IA USA 3Biomolecular Processes, School of Biological Sciences, Queen’s University Belfast, Belfast UK ∗Corresponding Authors. Emails: [email protected], [email protected] 24 these, we report the identification of novel receptor groupings, including a large and highly- diverged Platyhelminth-specific Rhodopsin subfamily, a planarian-specific Adhesion-like family, and atypical Glutamate-like receptors. Phylogenetic analysis was carried out following extensive gene curation. Support vector machines (SVMs) were trained and used for ligand-based clas- sification of full-length Rhodopsin GPCRs, complementing phylogenetic and homology-based classification.

Conclusions: Genome-wide investigation of GPCRs in two platyhelminth genomes reveals an extensive and complex receptor signaling repertoire with many unique features. This work provides important sequence and functional leads for understanding basic flatworm receptor biology, and sheds light on a lucrative set of anthelmintic drug targets.

2.1 Background

The G protein-coupled receptor (GPCR) superfamily constitutes the most expansive family of membrane proteins in the metazoa. These cell-surface receptors play a central role in eukary- otic signal transduction, and conform to a structural archetype consisting of a core domain of seven transmembrane (TM)-spanning α-helices. GPCRs are also established drug targets, acting as sites of therapeutic intervention for an estimated 30-50% of marketed pharmaceuti- cals [79, 149]. This is undoubtedly a function of their extensive involvement in a wide range of important physiological processes. The diverse panel of known GPCR ligands includes bio- genic amines, photons, peptides, odorants and classical neurotransmitters [80]. This diversity is mirrored by the significant degree of primary sequence variation displayed among GPCRs.

At present, there exists no comprehensive study of GPCRs for the phylum Platyhelminthes.

This important phylum houses prominent endoparasites, both flukes and tapeworms, as well as free-living species that serve as established model organisms in the realm of developmental bi- ology. Lack of sequence data and a reliance on techniques with a definably narrow expectation of success such as degenerate PCR have contributed to the very modest number of GPCRs thus 25 far identified or characterized [138–140,142,150,151] in this phylum. The arrival of EST repos- itories [49,152,153] has only marginally contributed to this number, perhaps as a consequence of GPCR under-representation [154]. The recent availability of Schistosoma mansoni [24] and

Schmidtea mediterranea [27] whole genome sequence data provides basis for the in silico accu- mulation and analysis of undiscovered and potentially novel receptors.

The blood fluke Schistosoma mansoni is the primary etiological agent of human schistosomi- asis, a chronic and debilitating condition that afflicts a staggering 207 million people in 76 countries [2] and accounts for 280,000 deaths per annum in sub-Saharan Africa alone [155].

It is calculated that up to 70 million disability-adjusted life years (DALYs) are lost to schis- tosomiasis annually [4]. This figure surpasses the global burden posed by both malaria and tuberculosis, and is nearly equivalent to that of HIV/AIDS. At present, this overwhelming disease burden is met with a near exclusive reliance on treatment with the drug praziquantel.

The threat of drug resistance [9,10] has spurred recognition of the pressing need for new antis- chistosomals [1,13,15]. In this context, as modulators of a diverse range of critical biochemical and physiological pathways, GPCRs hold great promise as potential targets for disruption of crucial parasite survival and proliferation activities.

The free-living planarian Schmidtea mediterranea is an important platyhelminth studied exten- sively for its regenerative abilities [17,156]. Like other planarians, it is abundantly seeded with totipotent stem cells with the ability to migrate and undergo division and differentiation at sites of injury. In addition to its current role as a powerful model organism for regeneration and stem cell biology, S. mediterranea presents itself as a potential parasite drug discovery model [157].

In the case of nematodes, the biology of the free-living model organism Ceanhorhabditis elegans features prominently in many anti-parasitic drug discovery efforts [158, 159]. Like C. elegans,

S. mediterranea is significantly more tractable to modern genomic approaches compared to the parasitic members of its phyla. It is relatively easy to maintain and it is amenable to RNA interference (RNAi) [71]. Genome-wide analysis and comparison of the GPCR complements of 26

S. mansoni and S. mediterranea is a major step towards engaging this hypothesis.

The growing number of sequenced genomes has provided a GPCR mining platform for a number of organisms, including Homo sapien [112], Mus musculus [113], Gallus gallus [114], Rattus rat- tus [115], Tetraodon nigrovirdis [116], Anopheles gambiae [117], Drosophila melanogaster [118],

Ciona intestinalis [119], Branchiostoma floridae [120], Xenopus tropicalis [121] and Canis fa- miliaris [122]. For these organisms, GPCR sequences have been accumulated with a range of bioinformatic methods that include homology-based searching (BLAST), hidden Markov mod- els (HMMs) and motif-driven queries [123]. The more sophisticated GPCR mining protocols have involved the application of a combination of such methods and algorithms.

Phylogenetic studies of the GPCRs in a number of eukaryotic genomes have led to the introduc- tion of the GRAFS classification system [108,109]. GRAFS outlines five major protein families thought to represent groupings of receptors with shared evolutionary ancestry present in the

Bilateria: Glutamate, Rhodopsin, Adhesion, Frizzled, and Secretin. In addition to these primary families, some organisms are known to house groupings of lineage-specific receptors that con- stitute distinct GPCR families. Examples in the phylogenetic vicinity of the Platyhelminthes include the nematode chemosensory receptors [160] and insect gustatory receptors [161]. Any in silico protocol for genome wide GPCR identification should therefore cast a broad enough net to reveal any such highly-diverged receptor groupings, while also providing stringency to limit false positives.

Here, we apply an array of sensitive methods towards the goal of identifying, manually curating and classifying putative G protein-coupled receptor sequences in two prominent platyhelminths.

Our hypothesis is that organisms in this phylum possess an extensive and complex complement of GPCRs, including phylum or species-specific GPCR groupings. We perform phylogenetic analysis of putative receptors with respect to the GRAFS classification system and employ a machine-learning approach for ligand-based classification of full-length Rhodopsin GPCRs. 27

2.2 Results and Discussion

In this study, we developed a robust transmembrane-focused strategy to identify, curate and classify putative platyhelminth GPCRs. TM-focused profile hidden Markov models (HMMs) were used to mine the predicted proteomes of S. mansoni and S. mediterranea in order to iden- tify receptors at the GPCR family plane. Subsequent rounds of filtering were used to remove false positives, followed by homology-based searches against the original genome assemblies.

Extensive manual curation of the final sequence dataset allowed for more refined phylogenetic analysis. Greater classification depth was achieved with a complementary transmembrane- focused support vector machine (SVM)-based classifier. An overview of this bioinformatics protocol is outlined in Figure 2.1.

Identification of GRAFS family receptors with TM-focused profile HMMs

Towards the goal of identifying members of the GRAFS GPCR families in our genomes of interest, we relied primarily on the use of family-specific profile HMMs. This alignment-rooted method has been successfully applied in other genomes and has been shown suitable for the identification and classification of GPCR sequences at the family level [123, 162]. In a depar- ture from previously described protocols, we chose to focus HMM training exclusively on the most highly-conserved structural features that extend throughout the GPCR superfamily. The idea behind this measure was to dampen challenges posed by the inexact gene structures that underlie the flatworm predicted proteomes, as well as the sizable phylogenetic distance of this phylum from organisms with characterized GPCR complements.

In this framework, receptor transmembrane domains are convenient markers that can be identified with greater confidence than other GPCR stretches using sensitive prediction al- gorithms such as HMMTOP [128] and TMHMM [129]. Training sequences were procured from GPCRDB [107] and processed into what we will refer to as “transmembrane-only pseu- dosequences” (TOPs), representing the ordered concatenation of TM domains flanked bi- 28 directionally by 5 amino acids (Figure 2.1b). TM-focused HMMs were constructed for each of the major GPCR families, as well as for the nematode chemosensory and insect odorant families. The Adhesion and Secretin training sets were combined to build a single HMM, given that sequences belonging to these families are not easily distinguishable beyond the N-terminal ectodomain [154].

The predicted proteomes of S. mansoni and S. meditearranea were first filtered for the removal of globular proteins. Typical strategies limit investigation to proteins with 6-8 predicted TM domains, tolerating errors in the algorithmic prediction of these regions. We significantly re- laxed this filter and broadened the search scope to include all proteins with 3-15 TM domains.

The utility of this change then was to alert us to partial sequences or incorrectly predicted gene models that may be reconstructed with manual curation and that otherwise would have been screened from detection. Family-derived profile HMMs already provide an adequately stringent

filter for distinguishing between GRAFS family receptors and other transmembrane proteins.

The proteins that survived this filter were processed into TOPs in the same manner as the training sequences. These sequences were searched against the set of profile HMMs, and the resulting hits for each GPCR family were ranked according to E-value. A primary cut-off was selected at the point where subsequent hits showed significant homology to other known proteins or GPCRs belonging to other families. This was accomplished with a BLASTp [124] search of all hits against the NCBI non-redundant (nr) database. Sequences that displayed

GPCR-related homology, along with those that returned no significant BLAST results, were retained. As evidenced later, the requirement of statistically meaningful GPCR-related ho- mology introduces an unnecessary selection bias that can mask the identification of unique receptors.

Application of the Rhodopsin HMM to the S. mansoni predicted proteome led to the examina- tion of the 400 top-ranking hits (E-value < 0.007), 77 of which remained after removal of false 29 positives via homology-based searches. Similarly, 270 of the 450 top-ranked (E-value < 0.002)

Rhodopsin HMM hits remained for S. mediterranea. Redundancy within the S. mediterranea genome assembly warranted the detection and removal of identical sequences. BLAT [163] was used to self-align the nucleotide sequences of the predicted proteins that survived the HMM

filtering process. Redundant sequences were removed and if a choice was presented, the longest member of a set of identical sequences was retained. This led to the removal of 14 Rhodopsin sequences from the S. mediterranea dataset. Figure 2.2 displays the overall transmembrane distribution for both proteomes at these various stages of processing for the Rhodopsin family.

These steps were likewise performed for the nematode chemosensory and insect odorant GPCR families, however no flatworm orthologs were identified. This is not unexpected, considering their lack of conservation among the Ecdysozoa.

Manual editing of gene models

Candidate GPCR sequences underwent manual inspection, and the corresponding gene models were edited. This labor-intensive step is crucial in improving the reliability of any further analysis on this gene family. Common manual edits included the merging or splitting of gene models, movement of intron-exon boundaries, and sequence extension or truncation in either or both directions. This process was aided by examination of open reading frames (ORFs) in the vicinity of a gene models. ORFs that housed common receptor motifs, displayed GPCR-related homology or contained transmembrane stretches were typically incorporated. In many cases, sequencing gaps prevented any meaningful improvement. S. mansoni GRAFS sequences and S. mediterranea GAFS sequences were curated in this manner. We avoided genome-wide manual curation of S. mediterranea Rhodopsin sequences in light of the dubious condition of the draft genome. The A/T rich (69%), highly repetitious (46%) and heterozygous nature of the genome has significantly complicated automated assembly efforts. However, as we elaborate later, we did construct and edit gene models for a particular grouping of Rhodopsin-like Schmidtea

GPCRs. The significant level of improvement achieved by manual gene editing is shown in

Figure 2.3. 30

Homology-based search and final gene editing

To account for the likelihood that the primary sets of gene models used do not provide per- fect accounting of all gene-encoding regions within the assemblies, we exercised a translated nucleotide BLAST (tBLASTn). For each family, putative GPCRs were combined from both species and searched locally against the original nucleotide assemblies translated in all six frames. Hits with E-value < 0.1 were manually examined for GPCR-related homology. In cases where identified regions of homology overlapped with a given gene model, that gene model was added to the sequence pool. Conversely, if no gene model was found to be present at a particular genomic location, a simple preliminary gene model was built by connecting the high-scoring segment pairs (HSP) that contributed to the tBLASTn hit. In keeping with the

HMM approach, only putative receptors with a TM count 3 were retained. This led to a ≥ further significant expansion of the total unique sequence count in both organisms (Table 2.1).

This reported sequence count is not equivalent to a receptor count, as many of these sequences may represent fragments of a single protein or prove to be redundant sequences. To bridge this gap and to improve the general state of this additional sequence data, manual editing of gene models was again performed.

Overall phylogenetic view

Putative receptor sequences were tentatively divided into three sequence bins based on the number of predicted TM domains: full-length, near full-length and partial. Full-length se- quences were those that likely had their entire 7TM domain intact as predicted by HMMTOP with user oversight. Alignments to homologous proteins were used to help make a final deci- sion with respect to the potential algorithmic miscounting of TM domains. Near full-length sequences are predicted to contain 4 TM domains, while all other sequences (< 4 TM do- ≥ mains) were placed into the partial sequence bin. Phylogenetic analysis was carried out for full-length and many near full-length receptors. Figure 2.4 displays a topological overview of the primary flatworm GPCR groupings. This phylogenetic analysis confirms the distinct and analogous presence of the primary GRAFS families, and further reveals two novel flat- 31 worm GPCR families: Platyhelminth-specific Rhodopsin-like orphan family 1 (PROF1) and

Planarian Adhesion-like receptor family 1 (PARF1).

The Rhodopsin family

The Rhodopsin family is divided into four main groups (α, β, δ, and γ) and further subdivided into 13 major sub-families via analysis of fully sequenced mammalian genomes [110]. The α and

β subfamilies are well-populated in both S. mansoni and S. mediterranea (Figure 2.5), while the δ and γ subfamilies are absent. Table 2.2 provides a preliminary classification of receptors identified with respect to the GRAFS classification system from a comparative perspective.

Alpha (α) receptors

The α subfamily houses amine, opsin-like, and melatonin receptors. Among these, the amine grouping is typically largest. This metazoan trend holds true for S. mansoni and S. mediter- ranea, each possessing at least 25 and 60 putative aminergic receptors, respectively. These numbers are greater than those observed among ecdysozoans, and in the case of S. mediter- ranea, the figure surpasses even the human amine GPCR complement. Biogenic amines such as serotonin (5-hydroxytryptamine; 5HT), dopamine, and histamine have been shown to play a prominent role in the flatworm nervous system [164,165]. Although a small number of amin- ergic GPCRs have been characterized in this phylum, the majority of receptors that mediate aminergic signaling have thus far remained elusive. Phylogenetic analysis was carried out on the

flatworm amine GPCR complement with respect to C. elegans aminergic receptors, as shown in Figure 2.6. Two diverged flatworm-specific groupings can be outlined, including one that signifies a major paralogous expansion in schistosomes. Other flatworm receptors are grouped and tentatively associated with ligands corresponding to their phylogenetic relationships with deorphanized C. elegans GPCRs.

Four -like receptors were identified in S. mansoni. Five melanopsin-like receptors were identified in S. mediterranea, along with a single receptor that displays moderate homology 32 to various ciliary . Along with the presence of cyclic nucleotide gated (CNG) ion chan- nels in the planarian genome, this raises the possibility of ciliary phototransduction. Another noteworthy observation is the conspicuous absence of melatonin-like receptors in S. mansoni, while S. mediterranea houses a relatively large complement of 9 such receptors. Melatonin is endogenously synthesized in planaria in a circadian manner [166,167], and has been implicated in regeneration [168]. Identification of melatonin receptors is a requisite for a more complete mapping of the underlying signal transduction pathway(s) in these important processes.

Beta β receptors

The β subfamily contains the great majority of peptide and peptide hormone GPCRs in exam- ined organisms. Neuropeptidergic signaling is known to play a fundamental role in flatworm locomation, reproduction, feeding, host-finding and regeneration [169, 170]. The known flat- worm neuropeptide complement has recently undergone considerable expansion with the appli- cation of bioinformatics and mass spectrometry-based (proteomics) approaches [171,172]. This represents a significant advance from the original handful of FMRFamide-like peptides (FLPs) and neuropeptide Fs (NPFs) first identified in the phylum. Many of these newly-identified amidated peptides are planarian or flatworm-specific, while others exhibit relatedness to pep- tides in other phyla, including myomodulin-like, buccalin-like, pyrokinin-like, neuropeptide FF

(NPFF)-like, and gonadotropin (or thyrotropin) releasing hormone-like peptides.

Our efforts yielded at least 81 and 35 putative peptide receptors in S. mediterranea and S. mansoni, respectively. These numbers further evidence the notion that peptidergic signaling is the predominant mode of neurotransmission in the Platyhelminthes. Flatworm peptide re- ceptors do not cluster into a single phylogenetic group (Figure 4b). One groupings contains

GPCRs that share homology with receptors responsive to peptide ligands present in the ver- tebrate lineage, while the other is mostly populated with receptors that share homology with invertebrate neuropeptide-responsive receptors including Gt-NPR1 homologs. It can be noted that the putative flatworm peptide receptor count greatly outnumbers the set of currently 33 known peptide ligands. Although this may be explained by peptide promiscuity and receptor redundancy, it is also very possible that many neuropeptides have yet to be uncovered. Ligands cannot be confidently assigned to the majority of identified receptors. While some show mod- erate homology to characterized FLP and NPF-like receptors, most receptors display weak or insignificant homology to an assortment of thyrotropin-releasing hormone, capa, sex peptide, growth hormone secretagogue, proctolin, pyrokinin, myokinin, tachykinin, galanin, and orexin receptors. These tentative BLAST-based annotations may be used with caution to help guide receptor deorphanization efforts.

Other receptors

A large number of Rhodopsin receptors could not be individually annotated with confidence, and were placed in the “Other Rhodopsin” receptor bin. Receptors in this category lack phylogenetic support to be clustered with known Rhodopsin groupings, and lack meaningful homology to receptors with known ligands. Many receptors in this bin exhibit some weak peptide or amine receptor-relatedness, but these require functional validation before they can be added to the α or β subfamily counts. Many of these receptors are likely unique to the phylum, and therefore obscure the Rhodopsin family subdivisions apparent in the vertebrate lineage.

Planarian homologs of parasite GPCRs

Given the relative tractability of planarians to experimental manipulation, we identified the nearest homologs of S. mansoni Rhodopsin receptors in the S. mediterranea pool (Tables 2.3

- 2.5). It is a reasonable expectation that there is significant conservation in the biological and pharmacological properties of receptors sharing high sequence identity between these species.

The characterization of certain planarian receptors is likely to inform us about the function and druggability of parasite receptors. Each S. mansoni receptor was first matched to its most similar S. mediterranea sequelog, and sequence pairs were ranked according to amino acid percent identity (PID): 8 receptor pairs were identified sharing > 50% PID, 15 with 40-50%

PID, 48 with 30-40% PID, and the remaining sequences with < 20% PID. The top grouping is 34 comprised exclusively of biogenic amine (GAR and 5HT) and peptide GPCRs. Among them is a receptor pair orthogolous to Gt-NPR1 [139], the only neuropeptide receptor deorphanized in this phylum. This degree of sequence conservation promotes the use of planaria as a convenient heterologous system to study parasite receptors.

Platyhelminth-specific Rhodopsin-like orphan family 1 (PROF1)

A large and distinct sequence clade comprised of 19 S. mansoni and 40 S. mediterranea proteins was identified and labeled Platyhelminth Rhodopsin Orphan Family 1 (PROF1). Members of this novel and highly-diverged phylogenetic grouping are predicted to house a 7TM domain with an extracellular N-terminus and seem to be exclusively derived from intronless genes.

Most PROF1 sequences were revealed with homology-based searches after a small number of bait sequences survived the Rhodopsin HMM filtering stage. In fact, 31 of the 40 Schmidtea

PROF1-containing ORFs were identified via tBLASTn, and only one of these ORFs coincided with an existing gene model. Similarly, 13 of 19 Schistosoma PROF1 were identified in this manner and only four of these were represented in the predicted gene set.

These receptors display remnants of classical Rhodopsin motifs at corresponding positions (Ta- ble 2.6), yet show no significant overall homology to any previously discovered GPCRs. It is important to point out that the absolute requirement of GPCR-related BLAST homology as part of the post-HMM filtering stage would have masked the identification of PROF1 recep- tors. BLASTp searches of all PROF1 sequences against the NCBI nr database (E-value cutoff

= 0.1) returned no hits for the majority of sequences. The small pool of hits that did result, exhibit both very poor homology and represent an incongruous range of receptors that include peptide, lipid and odorant GPCRs. This further highlights the unique nature of these receptors.

Maximum parsimony analysis led to the subdivision of PROF1 into three primary phylogenetic groupings with good bootstrap support (Figure 2.7). Group I is the largest among these with 27 and 13 members from S. mediterranea and S. mansoni, respectively. The lack of obvious one- 35 to-one orthologs between species suggests expansion or contraction of these receptors occurred after the splitting of planaria and trematodes in the flatworm lineage. Group II includes 6 S. mansoni and 7 S. mediterranea sequences, while group III houses 6 S. mediterranea sequences.

It is likely that the closest related receptor to the ancestral gene for this family is contained in group I or II. A multiple sequence alignment of TM domains I-IV (used for phylogenetic analysis) is shown in Figure 2.8.

Of additional interest, short PROF1-like sequence fragments were identified in both genome assemblies that could not be incorporated into full-length gene structures. These may constitute pseudogenes, or be ascribed to errors in assembly. RT-PCR was used to confirm transcript expression for a selection of putative full-length PROF1 receptors in Schmidtea: 8 from group

I, 2 from group II and 3 from group III (highlighted in Figure 2.7). Correct-sized amplicons were visualized for all 13 targets. Similarly, we selected a representative from each Schistosoma

PROF1 grouping and confirmed transcription in the adult stage: SMP084270 from group I and

SMP041880 from group II. It is not currently possible to assign functions or putative ligands for the PROF1 family. However, given that they constitute one of the largest Rhodopsin- like subfamilies conserved between these monophyletic species, we suspect that they play an important biological role in this phyla.

The Adhesion and Secretin Families

Adhesion and Secretin receptors show sequence similarity in their 7TM domains and are commonly grouped as Class II GPCRs. The phylogenetic separation of these families un- der the GRAFS paradigm is mirrored by noticeable structural differences in their N-terminal ectodomains. Archetypal Adhesion GPCRs have a long N-terminus containing a diverse ar- rangement of functional domains. In the vertebrate lineage, this family constitutes the second largest grouping of GPCRs after Rhodopsin and is further partitioned into 8 clusters (I-VIII).

Secretin GPCRs usually display N-terminal hormone-binding domains (HBD) that confer re- sponsivity to peptide hormones and are thought to descend from the group V Adhesion re- 36 ceptors [173]. Additional Adhesion-like proteins have been identified in various organisms that stake more dubious evolutionary positions. The insect Methuselah receptors are one such ex- ample that have become the subject of great investigation, attributable to their role in life-span regulation and stress resistance [174]. More recently, a cluster of Adhesion-like receptors (NvX) was identified in the basal Cnidarian N. vectensis which share some homology with Methuselah receptors [173, 175].

We have identified a novel cluster of 12 Schmidtea GPCRs that show moderate (> 20% PID) homology to NvX receptors. We denote this cluster Planarian Adhesion-like receptor family

1 (PARF1). Like NvX, PARF1 receptors contain a single Somatomedin B domain, except in the case of SMDC005966C which is predicted to contain two. Interestingly, no PARF1 or- thologs were identified in S. mansoni. A single Adhesion GPCR in S. mansoni (SMP099670) was found to house a Somatomedin B domain, but it otherwise shares no significant homol- ogy with PARF1. Two Adhesion-like Schmidtea GPCRs (SMD002396 and SMD002965) were identified that most resemble vertebrate group V orphan GPR133. Two Schmidtea GPR157 homologues (SMD002980 and SMD009091) were also identified via Adhesion/Secretin HMM, however, these receptors exhibit vague sequence similarity to more than one GPCR family [115].

Latrophilin-like receptors were found to be present in both flatworms. Schmidtea SMD011811 contains a GPS domain, and can be grouped with sequence fragments SMDC001354A and

SMDC001354B. Schistosoma SMP176830 contains a Somatomedin B domain, but shares no significant sequence similarity with the identified -like planarian receptors. Evidence of the potential druggability of these particular receptors comes from the parasitic nematode

Haemonchus contortus, where a latrophilin-like receptor has been identified as a target of an anthelmintic cyclodepsipeptide [176]. One other Adhesion-like parasite GPCR was identified

(SMP058380) that displays an N-terminal GPS domain, but with no clear planarian ortholog.

The Secretin flatworm complement is comparatively smaller. Two S. mansoni and one S. 37 mediterranea Secretin GPCRs were identified. SMP125420 and its planarian ortholog SMD004009 show high sequence similarity to diuretic hormone receptors and contain an N-terminal hor- mone receptor domain (HRM). These receptors likely play a role in homeostatic regulation.

Schistosome SMP170560 exhibits an HRM domain and parathyroid hormone receptor homol- ogy. This receptor may in fact have a planarian ortholog, but despite the recognition of a short, nearly identical Schmidtea sequence fragment, we were unable to identify the rest of the hypothetical gene within the assembly.

The Glutamate Family

Glutamate GPCRs respond to a wide range of signals, including glutamate, γ-aminobutyric acid (GABA), Ca2+ and odorants. The mammalian complement of metabotropic glutamate receptors (mGluRs) consists of 8 proteins that fall into 3 groups. They universally pos- sess a large extracellular domain that contains within it a ligand binding domain (LBD).

The Drosophila mGluR-like complement consists of two receptors, DmGluRA and DmXR.

DmGluRA shares the structural profile of mammalian mGluR2 and mGluR3. DmXR consti- tutes one member of a larger insect-specific clade, and displays an atypically-diverged LBD that responds to L-canavanine [177,178]. Outside of the metazoa, a group of 17 Dictyostelium

GABAB-like receptors (GrlA-GrlR) have been forwarded as potential evolutionary precursors to mGluRs [179, 180].

We identified 2 Schistosoma and 9 Schmidtea Glutamate-like sequences. Phylogenetic analysis of these sequences was performed with respect to both mammalian and non-mammalian Gluta- mate receptors (Figure 2.9). The S. mansoni Glutamate-like receptors both have corresponding orthologs in the Schmidtea genome. GSMP052660 and its ortholog GSMD025402 group with

DmGluRA, and most of the remaining planarian sequences fall in the phylogenetic vicinity of the major mGluR groupings. However, GSMP128940 and its ortholog GSMD001419, along with GSMD004608, seem to be significantly diverged from both GABAB and mGluR recep- tors. In the case of DmXR and Grl receptors, the examination of key LBD residues involved 38 in glutamate binding led to the eventually validated conclusion that glutamate was not the primary ligand. We perform similar analysis as depicted in Figure 2.10. Although the residues of GSMD001419 involved in α-amino and α-carboxylic groups of glutamate are conserved, the residues associated with the γ-carboxylic group are not. This runs parallel to the observations made for DmXR. GSMP128940 displays an even more atypical LBD and conserves only a single putative glutamate-interacting residue. We hypothesize that these particular receptors either bind other amino acid-derived ligands or possess unusual pharmacological profiles.

The Frizzled Family

Wnt-mediated Frizzled signaling plays a significant regulatory role in a number of crucial devel- opmental processes, including cell fate determination, cell motility, cell polarity, and synaptic organization [181]. In planaria, the canonical Wnt/β-catenin pathway is implicated as a molec- ular switch for anteroposterior polarity in regeneration [156,182]. We identified four S. mansoni

Frizzled sequences, along with the 10 S. mediterranea sequences previously identified. A single

Smoothened-like sequence was found for each species.

In humans, 10 Frizzled receptors are grouped into four clusters based on sequence identity:

Fzd1/Fzd2/Fzd7 (I), Fzd5/Fzd8 (II), Fzd4/Fzd9/Fzd10 (III), and Fzd3/Fzd6 (IV) [181]. Both

flatworm genomes house a single receptor (FSMP118970 and FSMD000018) that groups in clus- ter IV, sharing 45% amino acid identity with Drosophila Fzd1 and 38% identity with human ∼ ∼ Fzd6. Four planarian (FSMD023435, FSMD010098 and FSMD000054) and two schistosome

(FSMP139180 and FSMP155340) receptors appear to belong to cluster II. Other flatworm

Frizzled receptors show less clear relationships with their vertebrate counterparts.

Ligand-based support vector machine (SVM) Rhodopsin subclassification

Support vector machines (SVMs) represent a powerful supervised-learning method for data classification. Given a combined set of positively and negatively labeled training instances, an

SVM produces a binary classifier that can then be used to label unknown samples. Each in- 39 stance is associated with a fixed-length numerical feature vector, containing certain attributes of the data to be classified. The SVM identifies a maximum-margin separating hyperplane to distinguish between vectors representing instances of opposite sign. More often than not, training instances are not linearly separable in the feature space, and feature vectors must

first be mapped to a higher dimensional space. Non-linear classification is then performed by application of kernel functions which allow for the construction of a hyperplane in the trans- formed feature space. Recently, this approach has seen extensive use in the area of biosequence discrimination, and relevant to our goals, the particular problem of GPCR classification.

In the first study on the matter, SVM-based classifiers were shown to drastically outperform their BLAST and HMM-based counterparts for level 1 and level 2 GPCR subclassification [132].

Subsequent studies further improved the predictive performance of SVMs with the introduction of dipeptide composition feature vectors [133,134], achieving accuracies of 97.3% and 96.4% for level 1 (Rhodopsin) and level 2 (amine) classification, respectively. Alternative feature vectors have since been similarly validated [135, 136]. Although these computational approaches are touted as among the most sensitive, to the best of our knowledge, they have seen no utilization in the realm of genome-wide GPCR mining studies.

Perhaps one reason for this is that even in the case of publicly available SVM classifiers, training and validation occurs exclusively with full-length sequence data. More suitable classifiers would be tailored to the general deficiencies of sequence data resulting from in silico methods, where inexact gene structures are an unavoidable phenomenon. In this respect, we developed a classifier to complement our particular GPCR identification approach. This involved focusing

SVM training on transmembrane domains, as identification of these conserved blocks had been a primary aim of both our receptor mining and manual curation protocols. 40

Multi-class SVM

Multi-class SVMs refer to classification problems where the number of classes, k, is greater than

2. Such problems are typically solved using either the “one-versus-rest” (OvR) or “one-versus- one” (OvO) method. In the OvR scenario, k binary classifiers are trained, such that each classifier separates one class from all others. The “winner-takes-all” strategy is then commonly used to label unknown samples, whereby the classifier with the highest output decision function

k(k−1) assigns the final class. In the OvO scenario, 2 binary classifiers are constructed in a pair-wise manner. A voting strategy is then typically employed in classification, whereby each classifier accounts for one vote and the class with the maximum number of votes assigns the final label. Although the OvO method has been shown to perform better on a number of fronts [137], as far as the authors are aware, all previously described SVM-based GPCR classifiers available for online use rely on the simpler OvR method. We constructed OvO GPCR classifiers for two levels of Rhodopsin family sub-classification.

Building feature vectors for ligand-based receptor discrimination

The general fixed-length feature vector, F~ , contains frequency information for the 202(400) possible dipeptides over a given stretch of sequence, L amino acids in length. Dipeptides are counted in both possible frames and there are therefore L 1 total amino acid pairs. −

F~ = P ,P , ..., P ,P (2.1) h 1 2 399 400i fi Pi = (2.2) L 1 − where fi represents the frequency of dipeptide i

To better associate an SVM-based classification approach with our gene-mining strategy, we explored the idea of again focusing our efforts exclusively on the transmembrane domains.

Two options in the way of final feature vector construction were pursued: X~ T 1 and X~ T 7. X~ T 1 represents the 400-element dipeptide frequency vector taken over the entire length of a TM-only pseudosequence, while X~ T 7 represents the 2800-element dipeptide frequency vector generated 41 from the ordered concatenation of the dipeptide frequency vectors for the seven individual TM domains. The standard dipeptide frequency vector calculated for full-length proteins, X~ FL, was used for comparison. We will refer to the corresponding SVM classifiers as SVMT 1, SVMT 7, and SVMFL.

X~ T = F~ TM −TM (2.3) 1 { } 1 7

X~ T = F~ TM F~ TM F~ TM F~ TM F~ TM F~ TM F~ TM (2.4) 7 { } 1 ⊕ { } 2 ⊕ { } 3 ⊕ { } 4 ⊕ { } 5 ⊕ { } 6 ⊕ { } 7

X~ FL = F~ FL (2.5) { }

SVM training: cross-validation and grid search

Rhodopsin training sequences were divided into 17 subfamilies using the GPCRDB classi-

fication system. Programs were written to process this training data into feature vector form (Appendix F). Training was performed with the radial basis function (RBF) kernel,

−γkx −x k2 K(xi, xj) = e i j , and a grid search was used to tune parameters γ and C with 5- fold cross-validation. For each proposed feature vector construction, the best performing (C,γ) pair was selected in domains C = 2−5, 2−4, ... , 215 and γ = 2−15, 2−14, ... , 215 and used to train a final classifier (Table 2.7).

Our original expectation was that SVMT 1 would display lower accuracy than SVMFL, given that a smaller subset of sequence information would be used for training. We hoped that this presumed disparity would be compensated by SVMT 7 with the addition of position-specific information. Instead, both SVMT 1 and SVMT 7 registered higher cross-validation accuracies than SVMFL for Rhodopsin subfamily classification. X~ T 7 was the best-performing classifier with 99.47% accuracy. These results led us to conclude that for the Rhodopsin family, the exclusion of sequence information outside of the transmembrane bundle improves dipeptide-based SVM classification. Encouragingly, this is in agreement with structure and ligand interaction data for the Rhodopsin family [183]. The same procedure was carried out in constructing classifiers 42

for amine GPCRs. SVMT 1 was the best performing classifier with a cross-validation accuracy of 96.44%.

SVM classification results

Rhodopsin sequences with seven TM domains as predicted by HMMTOP were classified by the two-tiered SVM. TOPs were aligned and manually examined to correct for erroneously predicted

TM domains. Sequences were then subclassified with the Rhodopsin SVMT 7 classifier, and those discerned as amine-responsive were further sub-classified with the amine classifier SVMT 1.A total of 121 S. mediterranea and 58 S. mansoni sequences were classified via Rhodopsin SVM.

The majority of these receptors were identified as peptide-responsive (Table 2.7). This grouping also contains all PROF1 receptors included in this classification stage, perhaps providing some clues as to their ligands. A subset of 22 S. mediterranea and 21 S. mansoni sequences were identified as amine-responsive, and classified via amine SVMT 1. These classification outputs are detailed in Table 2.9. These results can inform receptor deorphanization efforts, alongside traditional homology-based approaches.

2.3 Conclusions

This is the first comprehensive genome-wide study of G protein-coupled receptors in the phylum

Platyhelminthes. Our transmembrane-focused receptor mining approach yielded a lower-bound estimate of 116 S. mansoni and 333 S. mediterranea GPCRs. Phylogenetic analysis established the presence of the primary metazoan GRAFS families, along with well-populated α and β

Rhodopsin subfamilies in both examined genomes. The identification of these receptors com- plements previous and ongoing efforts to identify biogenic amine and neuropeptide-like ligands in flatworms, and will help identify specific receptors that mediate important aspects of flat- worm biology associated with the aminergic and peptidergic signaling systems.

The flatworm GPCR repertoire is also shown to house entirely novel receptor groupings with large numerical representation, including a Platyhelminth-specific Rhodopsin subfamily (PROF1) 43 and a planarian-specific Adhesion-like family (PARF1). These particular lineage-specific ex- pansions, along with the many other highly-diverged receptors identified, may reveal functional innovations specific to these organisms. Many of these receptors have enhanced appeal as se- lective pharmacological targets. While their diverged structures are an attractive feature in the parasite drug discovery paradigm, it presents a challenge in posing more exact hypotheses related to receptor function.

To further aid the process of functionally pairing receptors and ligands, we provide a prelimi- nary classification of full-length receptors using SVMs. This represents the first effort to apply

SVMs to the problem of GPCR classification in a whole-genome manner, a task made difficult by the evolutionary distance of flatworms from other species with well-characterized GPCR complements. SVM results may be used in conjunction with phylogenetic and homology-based approaches to receptor classification. As the quality of the underlying gene models improves, and as a greater number of full-length receptor transcripts are sequence characterized, these

SVMs can be applied to an expanding subset of identified GPCRs. Functional characteriza- tion of flatworm GPCRs is also likely to improve SVM accuracy by providing better training examples.

The notion that schistosome GPCRs represent lucrative anthelmintic drug targets is strength- ened by data on the crucial biological role of related receptor signaling molecules in nearly- related organisms [184,185], as well as that of predicted platyhelminth GPCR ligands [164,165,

169,171]. The receptors, ligands and downstream biochemical pathways associated with GPCR signaling have been identified as potential targets for parasite life-cycle interruption [15, 186].

Enlistment of schistosome reverse genetics approaches alongside receptor sequence data can lead to the validation of specific receptors as drug targets.

In this regard, RNAi in schistosomes [69,70] provides new opportunities for focused exploitation of this dataset. A simple medium-throughput phenotypic classification system has recently 44 been described for both schistosomula and adult schistosomes [14]. These endpoints could readily be used in an RNAi-mediated GPCR loss-of-function screen. Assaying the temporal expression profiles of parasite GPCRs can also be a worthwhile measure as a selection tool for receptors expressed in intra-host stages. On this front, we further the case for planarians as a convenient model organisms to interrogate the function of trematode receptors, and provide a list of inter-species receptor pairings ranked by sequence identity.

2.4 Acknowledgements

The authors wish to acknowledge the sequencing centers and laboratories responsible for the publicly-accessible genomic data that made this work possible. We would like to thank Dr.

Alejandro Sanchez Alvarado and Eric Ross for providing us the MAKER predicted gene set.

This work was funded by a grant from the National Institutes of Health (NIH R01 AI49162) to

TAD and AGM. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

2.5 Methods

Predicted proteomes and training sequences

The most recent release of the S. mansoni genomic assembly is accompanied with a set of 13,197 predicted proteins [24]. The S. mediterranea predicted proteome consists of 31,955 predicted proteins that were produced with MAKER, although this number may represent a significant overestimate of the true protein count [26,27]. HMM and SVM training sequences were down- loaded from GPCRDB [107] in FASTA file format. In total, 268 Glutamate, 5025 Rhodopsin,

175 Adhesion, 354 Frizzled and 185 Secretin sequences were procured for HMM training. 20,920

GPCRDB sequences were used for Rhodopsin SVM training, and 2,105 sequences were used for amine SVM subclassification. 45

Nomenclature

Putative receptors retain their original GeneDB or MAKER IDs in slightly modified form. In cases where a gene model was created, receptors were given a label in similar form that includes genomic contig or scaffold information. Letters are appended to the ends of these labels where necessary to distinguish among multiple gene models associated with a single contig or scaffold.

All putative flatworm GPCR sequences are provided in association with their tentative IDs.

Transmembrane domain prediction

We applied two common algorithms, TMHMM 2.0 [129] and HMMTOP 2.1 [128], to identify transmembrane domains in our GPCR training set. HMMTOP correctly predicted 7 TM domains for 93.8% (4712/5025) of Rhodopsin family receptors, compared to 81.9% (4119/5025) in the case of TMHMM. This disparity in sensitivity held for all GPCR families, and was the basis for our decision to employ HMMTOP for most subsequent work. A robust Perl script

(Appendix F) was written to parse coordinate predictions output from HMMTOP, and to generate sequence files containing only regions of interest from the original protein sequences as required.

TM-focused Profile hidden Markov model (HMM) construction

Provided a multiple sequence alignment, HMMER-2.3.2 [126] builds a probabilistic model

(profile HMM) that can be used to query sequence databases to find (or align) homologous sequences. To prepare each GPCR family training set, predicted TM domains flanked bi- directionally by 5 amino acids were extracted and concatenated using coordinates produced in the previous section. These sequences were aligned with Muscle 3.6 [187] and a profile

HMM was constructed for each family with hmmbuild. All models underwent calibration using hmmcalibrate, with the default parameters. 46

HMM-based GPCR identification

All predicted proteins in the S. mansoni and S. mediterranea genomes with a predicted number of TM domains in the range of 3-15 were processed in a manner identical to the HMM training set. These TOP-converted protein sets were searched against our family-specific profile HMMs using hmmpfam. The resulting hits for each GPCR family were ranked according to e-value, and a cut-off was selected at the point where subsequent hits showed significant homology to other known proteins or GPCRs belonging to other families. This was accomplished with a

BLASTp search of all hits against the NBCI nr database. The BLAST results were parsed with a script (Appendix F) and top results were examined for removal of false positives.

Manual curation of putative GPCR-encoding genes

A large number of GPCR sequences underwent manual inspection of gene structure, and the original predictions were edited where possible. Common manual edits included the merging or splitting of gene models, modification of intron-exon boundaries, and sequence extension or truncation in either or both directions. All editing was performed with Artemis [188].

Curation was primarily guided by homology-based searches and identification of TM domains and family-specific GPCR motifs in ORFs that occurred in the vicinity of a gene model. In the case of S. mansoni, this labor-intensive process was aided by the extraction of GeneDB annotations for scaffolds thought to contain one or more receptors. More specifically, a script was written to compile pertinent scaffold information stored in EMBL formatted files, including the orientation, the number of predicted transmembrane domains and the top BLAST hits for all proteins identified by our profile HMMs. This data was parsed into a spreadsheet and proved significant in helping identify instances where manual curation was appropriate. In the case of S. mediterranea, annotated genomic regions were loaded into Artemis in GFF3 format and edited in a similar manner. 47

Phylogenetic analysis

Near full-length (TM > 5) receptors were first processed for removal of the N- and C-termini.

ClustalX 2.0 [189] was used to generate multiple sequence alignments of the GPCRs to be examined, with default parameters. PFAAT [190] was used to edit the resulting alignment with attention to key motifs and residues housed within transmembrane domains. Low-entropy sequence blocks present in all sequences were retained. The Phylip 3.6 [191]package was used to generate phylogenetic trees. Alignments were bootstrapped using seqboot. Maximum parsimony trees were calculated with protpars with input order randomized. Neighbor-joining trees were calculated with protdist and neighbor using the JTT (Jones-Taylor-Thornton) distance matrix and with input order randomized. Consensus trees were built with consense, and visualized and edited with FigTree.

PROF1 RT-PCR

Total RNA was extracted from flatworm (schistosome or planarian) tissue using the RNAque- ous Kit (Ambion), and RNA was treated with Turbo DNAase (Ambion) per manufacturer’s instructions. A two-step RT-PCR was performed, where reverse transcription was first carried out with the Retroscript kit (Ambion). Primers were designed for two schistosome PROF1 sequences and 13 planarian PROF1 sequences using Primer 3.0 [192] (Appendix C). PCR products were visualized by agarose gel electrophoresis to confirm transcript expression.

SVM

Programs were written to process training sequences into feature vector form for the training of three SVM classifiers: SVMT 1, SVMT 2, and SVMFL (Appendix F). TM prediction was performed on training sequences with HMMTOP, and fixed-length dipeptide frequency vectors were calculated in correspondence with each model. SVMs were implemented with the the

LIBSVM [193] package. The RBF kernel was chosen and a grid-search was performed with an available python script for selection of kernel parameters. C and γ were assayed in the domains C = 2−5, 2−4, ... , 215 and γ = 2−15, 2−14, ... , 215 to identify the C,γ pair that 48 maximizes 5-fold cross validation ACC. The classifiers were trained in accordance with the

GPCRDB ligand-based groupings, and applied to a subset of flatworm Rhodopsin receptors with 7 predicted TM domains.

2.6 Figures

Figure 2.1 - Transmembrane domain-focused GPCR sequence mining strategy.

(A) Family-specific profile HMMs are built using TM-only pseudosequences (TOPs) extracted from the GPCRDB [107] sequence repository. The predicted proteomes of both S. mansoni and S. mediterranea are processed in a manner identical to that of the training sequences and are searched against a set of family-specific profile HMMs. Results are ranked statistically and sequences meeting a conservatively selected cutoff undergo an automated BLASTp cam- paign against the NCBI “nr” database. The output is parsed, and transmembrane proteins exhibiting significant homology to non-GPCR proteins are removed. Redundant sequences are removed with the BLAT utility. The surviving sequence pool is then manually assessed and curated, followed by tBLASTn of sequences against the whole genome assemblies. Adhesion and Secretin GPCR sequences are distinguished from one another by inspection of their N- terminal ectodomains. Putative full-length Rhodopsin GPCRs, defined by the presence of an intact 7TM domain, are sub-classified via SVM. (B) Construction of TOPs is a two-step process involving the prediction of TM boundary coordinates by HMMTOP, followed by the ordered concatenation of TM domains flanked bi-directionally by 5 amino acids.

Figure 2.2 - HMM-based identification of S. mansoni and S. mediterranea GPCRs.

The transmembrane frequency distribution of the S. mansoni (left) and S. mediterranea (right) predicted proteomes is shown as predicted by HMMTOP at various junctures of the bioinfor- matics protocol for the Rhodopsin family. The top graphs overlay the HMM-derived sequence pools (black, yellow outline) on top of the entire predicted proteomes (white, black outline) in the assayed TM domain range (3-15). The middle graphs overlay the BLASTp filtered sequence pools (black, yellow outline) on top of the HMM-derived pools (white, black outline). The bot- 49 tom graphs display the final distributions upon filtering, and after the removal of redundant sequences in the case of S. mediterranea.

Figure 2.3 - Manual curation and expansion of Schistosoma mansoni GRAFS GPCRs.

The transmembrane distributions for the filtered S. mansoni HMM pool is shown before (top) and after (middle) manual editing of the underlying gene models, as predicted by HMMTOP.

The number of GPCRs with a predicted intact 7TM domain increases from 27 to 41, coupled with a significant contraction of the distribution spread. The mean TM count shifts from 6.00 to

6.41, which equates to the identification and addition of roughly 42 missing TM domains during the first round of curation. Homology-bases searches against the genome assembly increased the putative 7TM receptor count to 59 (bottom). Receptors in the 8 and 9 TM bin can be considered full-length for our purposes, as the erroneously-predicted additional TM domains can be excised for phylogenetic analysis. Inclusion of these receptors brings the total putative full-length (7TM) receptor tally to 68 (of 116 total sequences).

Figure 2.4/ 2.5 - GRAFS and Rhodopsin phylogenetic trees.

(A) Overall topological view of the combined S.mansoni and S. mediterranea GPCR comple- ments. Maximum parsimony analysis (bootstrap value = 100) was carried out using putative full-length non-Rhodopsin GPCRs and a subset of full-length Rhodopsin-like GPCRs. In addi- tion to the phylogenetic clustering of sequences into the primary GRAFS families, this analysis reveals the presence of two distinct phylum-specific groupings: PROF1 and PARF1. * Sequence family is present in S. mediterranea. ** Sequence family is present in both S. mansoni and

S. mediterranea. (B) Neighbor-joining tree of flatworm Rhodopsin-like GPCRs. To maximize the number of sequences included in this analysis, a sequence block housing TM domains I-IV was extracted from the overall alignment. This allowed for inclusion of 312 Rhodopsin-like sequences: 90 S. mansoni and 224 S. mediterranea receptors (bootstrap value = 200). The α

(amine and opsin), peptide, melatonin, and PROF1 groupings are highlighted. Branches ter- minating in Schistosoma receptors are shown in green, and branches terminating in Schmidtea 50 receptors are shown in blue. Original concensus tree with bootstrap values and sequence labels is available.

Figure 2.6 - Aminergic receptors: S. mediterranea and S. mansoni.

Neighbor-joining tree (bootstrap value = 500) of putative biogenic amine-responsive GPCRs.

Included in this analysis are 21 (of 25) S. mansoni and 58 (of 65) S. mediterranea full-length and near full-length aminergic receptors, alongside 14 known C. elegans biogenic amine recep- tors. The latter grouping includes receptors that respond to tyramine, octopamine, dopamine, serotonin, and acetylcholine [194]. Branch lengths are scaled to bootstrap support, branches ter- minating in Schistosoma receptors are shown in green, and branches terminating in Schmidtea receptors are shown in blue. Flatworm receptors are outlined (solid lines) and classified by lig- and with respect to their nearest-related C. elegans homologs. Two diverged flatworm-specific receptor groupings are outlined in dashed lines.

Figure 2.7 - Phylogenetic analysis of PROF1 GPCRs.

Maximum parsimony tree for all identified PROF1 receptors. An alignment block that included

TM domains I-IV was bootstrapped 1000 times for parsimony analysis. PROF1 can be sub- divided into 3 families with good bootstrap support (> 50%; relevant values displayed): I, II and III. Schistosome sequences are shown in green and Schmidtea sequences are shown in blue.

The tree is rooted with a Schistosoma opsin-like GPCR (AAF73286.1). Schmidtea PROF1 receptors with transcript expression confirmed by RT-PCR are marked with red asterisks.

Figure 2.8 - PROF1 multiple sequence alignment.

Multiple sequence alignment of all PROF1 receptors over a sequence range that includes TM domains I-IV (used for phylogenetic analysis). Residues are colored according to an identity threshold set at 80% within each group. The locations of individual TM domains were approx- imated by alignment to Rhodopsin and are depicted above the MSA. Red asterisks are used 51 to mark residue locations where the among-group PROF1 identity level threshold (> 80%) is met.

Figure 2.9 - Phylogenetic analysis of Glutamate GPCRs.

Maximum parsimony tree of Glutamate family GPCRs. TM domains I-VII were used for phylogenetic analysis with the alignment bootstrapped 1000 times (bootstrap support val- ues are provided). Schistosome sequences are shown in green and Schmidtea sequences are shown in blue. GSMD007320 and GSMD015264 were excluded as they remain incomplete over the sequence range used. GABAB receptors are highlighted, along with the primary verte- brate mGluR groupings and the more recently discovered insect Group X receptors. A human

Calcium-sensing receptor (AAA86503.1) was used as an outgroup. Putative flatworm GPCRs that are diverged from both the GABAB and glutamate-responsive receptors are outlined in red. The ligand-binding domains of these receptors are further analyzed in Figure 9.

Figure 2.10 - Schematic of glutamate in association with LBD residues.

Conserved mGluR LBD residues involved in glutamate binding are shown (underlined) in com- parison with the corresponding residues for flatworm Glutamate-like receptors GSMP128940 and GSMD004608. Numbers represent residue location with respect to the mouse mGluR3 sequence. Disagreement at a given position is highlighted in red. GSMP128940 displays overall divergence with the canonical glutamate binding pocket, while GSMD004608 retains only key residues that interact with the glutamate α-carboxylic and α-amino groups. 52

A S. mansoni | S. mediterranea

GPCRDB Predicted Proteins

Transmembrane Prediction Transmembrane Prediction

Filter: 3 ≤ TM domains ≤ 15

Extract/Concatenate Blocks Extract/Concatenate Blocks

Apply family HMMs

MSA Rank Results

Filter 1: e-value cutoff Profile HMM Database Filter 2: BLASTp G R F A/S Filter 3: redundant sequences

NO Full-length?

Filter: unique hits YES Manual Curation

tBLASTn Genomic Assembly Putative full-length GPCRs Partial GPCRs

G R F A/S Inspect N-terminus

SVM Classification A S

B Original Sequence !"#$%&'()*(+&& !"#,%&'()*(+&& !"#-%&'()*(+&& !"#.%&'()*(+&& !"#(%&'()*(+&& !"#/%&'()*(+&& !"#0%&'()*(+&&

!" !!" !!!" !#" #" #!" #!!"

!" !!" !!!" !#" #" #!" #!!"

Transmembrane-Only Pseudosequence (TOP)

Figure 2.1 Transmembrane domain-focused GPCR sequence mining strategy. 53

Schistosoma mansoni Schmidtea mediterranea

300 800

600 200 y y c c n n e e

u u 400

HMM pool q q e e r r F F 100 200

0 0 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 TM domain count TM domain count - false positives BLASTp filter

100 100 90 90 80 80 70 70 y y

c 60 c 60 n n e e u 50 u 50 q q e e r 40 r F 40 F 30 30 20 20 10 10 0 0 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 TM domain count TM domain count

- redundancy BLAT filter

50 100 90 40 80 70 y y c c 30 60 n n e e u Filtered pool u 50 q q e e r r 20 40 F F 30 10 20 10 0 0 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 TM domain count TM domain count

Figure 2.2 HMM-based identification of S. mansoni and S. mediterranea GPCRs. 54

Schistosoma mansoni filtered HMM pool 60

Mean = 6.00 40 St. Dev. = 1.78 R y c

n AS e

u 27 q

e G r F 20 F

0 3 4 5 6 7 8 9 10 11 12 TM domain count

Editing 60

Mean = 6.41 St. Dev. = 1.30 41 40 R y c

n AS e u q

e G r F 20 F

0 3 4 5 6 7 8 9 10 11 12 TM domain count

tBLASTn 60 59 Editing

&" !" Mean = 6.24 &" 40 St. Dev. = 1.41 R y c

n AS e u q

e G r F 20 F

#$%" 0 3 4 5 6 7 8 9 10 11 12 TM domain count

Figure 2.3 Manual curation and expansion of Schistosoma mansoni GRAFS GPCRs. 55



 









Figure 2.4 GRAFS phylogenetic tree. 56

A

PROF1

*

Peptide-like (II)

Peptide-like (I)

** Melatonin

Alpha (α)- Opsin

Alpha (α)- Amine

300.0

Figure 2.5 Rhodopsin phylogenetic tree. 57

0 . 0 2 . 8 0 7

0177 0090

3 0 . 0 8 0 4 . 5 9 1

145 P 00786 0086 M S

5

0 . .0 Dopamine 3 0 4

00088 0367

00320 Tyramine .0 00074 6 Acetylcholine 6 0 4 2 0057 5 2 . . 0 152 0 1 P 9 M Serotonin 0

0 S

0113 2 .

0126 8

4

3 6 7 134

00099 2 5 1 P 5

0 9 . 0 .0 00013 0 M 1

. 0 6 0 S S

1 M 2 1 1 P 6 R 0 .0 00007 A 150 0 2 .1 5 -G R 6 . E A 0 C -G 1 1 E 8 C 0

0 2 . .01 00167 R 5 5 E 2 .01 S 02109 4 - C C Serotonin 8 E 3 01477 E S R .0 C A - - 0293 G 3 D T E- O Y C

R P C 4 A E 2 C -S E E 0175 -S R 0.0 8 0174 E 6 0 3.0 R 0006 0113 2 3 0204 21.0 6.0 5 C 6 E . 0079-S 1 SMDC17 ER Dopamine (D1) 5.4 CE 21 5 -SE .0 R1 71.0 012214.0 0147 9.04 004728.01 00156 CE-DOP1

CE-DOP2 Dopamine (D2) CE-DOP3 SMP127310 720 SMP127

.01 00 02 4 .02 1 1. 5 3 00252 .0 01992 .0 P134 83 8 M 0121 00950 S 1 00051 .0 2 2 .0 2 00035 00556 4 R 00300 E 00565 -S 4 E .1 C 6 0 SM . 0 PS SMP04330 7 2 0 C1 S R M S 03 P043 MP145520 E S 2 MP S 7 134 - 0 .0 460 SMP043 6 E S S 320 4 C MP MP043 043 340 26 0109 0 0 2 0

160 0 P .5 . SM M 3 1 P S S 059 6 M P 4 043 00 0 00446 2 MDC94. 90 S 0 0046 9

2 01015 . 0136 0

S 2 S 0005

0 M

. M

0 P

P 8

6 126

148 . 2 0 6

1 7 .

2 0

3 0138

1

0118 0

0 2 7 .0

0015 00738

00158 00158 00593 0110 8

SMDC4046.1 5 8

. . 0 0 0 9 2 7 . Serotonin 7 6 0 5 . . 0 . 1 0 0 0 . 5 6 9

01381 Serotonin

0 0 7 .

7 0

7

149 P

M 0122 S

600.0

Figure 2.6 Aminergic receptors: S. mediterranea and S. mansoni. 58 Phylogenetic analysis of PROF1 GPCRs. Figure 2.7 59 I I S S K K A Y A A E T T R K P T K K E Y K T E P G G G G G G G N G G G D G G G G D D G H G D G D N G D N N Q Q D I I I I I I I I I I I I S L L S L F S S T T T T T V T T T V V V V V V V K T V Y A T K T E M D D Q M N N D N N H N I I I I I I I I I L L L L L F F L E E P E E E E K K K K K K K K K K K Y V Y K R R Y V T Y E A Y K Q Q D N Q Q Q N I I I I I I I I I I I I I S S F F F F S L L T T T T R K T T V V R E R R T T Y Y V Y V Y K K K KK V H N H M N N N Q W I F L S S S S S L L L L L L L L L L L L L L L L L Y Y Y Y Y K K Y T K Y Y Y H H Q Q G W W W W W W W W W W W W S R R R E K K R K K T K R K K R T E K E G G G G N D D Q G G D D D D D Q G D Q Q G H G D Q D D G G Q D G D D G D I I I I I I I L L L L L L L S S SSL F F F F F F F S S S S V T V A V V Y Y TR Y A V A Y A Y T V M M M M M M Q M H M M H I I I F L F F F F F F F F F F F F F L L F S L S F S L Y Y Y Y E Y Y Y V V R V Y Y V V A T V T T A Y Y M Q G N I I I I I I I I I I I I I I I I I II I I I I I I I I F F F L FF F L L L S F LS L V A V V V V A V V G G G G G G G I I I I I I I L S S S S L F SL L L L L L A A A A A A A V A T T T T T A AA T V V A G N M N N N N N C G G G W W I I I I I I I I I II F F F F F F F L F F LL F L F F L F LL F F L L L LLL Y V V Y Y Y Y Y Y T E M H H H H H Q H H H H Q H I I I F L P P P P P P P V P V V P P V V V P V V V P V V P P P P P P P P P P P P P P P P P P P P P P P P P P P P P C I I I I I I I I I I I I I S S S S L S S L L S S L L L L L L L S F F L L L T T V T V V V V V M N N N G N N N N N N M S S S S S S S S S V T A A V A A T T V T V A A V T Y Y M M M M M M M M M N M N N N N G C M M M N C C M N G N N D N I I I L L L L L L L S F L F F L F F L S F S F L L L S S L L L L A V V V V E V T E M M C M C M M M M N M M M M M M I I I I I I I I I I I I I I I I I I I I I I L L L L L F L L LSS L L L F L F L L V V V V V V P V V V V V T A C D I I I I I I I I I I I I F F F F F F F F F L F L L F L F F F SL F S F L F S F LL A T T A T T V V VV A V E C G G C M M S F S F S S S S S S S S S S S S T A T T A T T C N C C C C C G C C C C N N N C N N N G N G C G N G C I I I I I I I I I I I I I I I I I I I I I SS SS SS SS F F L F L L L L L S F L L L F LS L F VVV A V T V M W W W W I I I I I I I I I I I I I I I F LLS LLS S F S L F F F L L F F L F F F L L F L A A V A V V V V V GG G G G G C C I I I I I I I I I I I I II L L L L LLL F F F F F F FF L F F L F F L V V T T T T Y Y A V V Y V V V Y V V T Y Y W I I I I I I I I I I I S LLLL S V V V V V V V V V T V V V V V V TT A T T V V T V T V T V M M M M M M M M M M M M M D IIII I I I I I I I I I I L L L L F L S L V V V T V V V V A VVV G G G G M M G G G G G G G C G G G G G C G G C I I I I I I I I I I I I I I I II I I S S S L L F L L L LS S L L L L L F L F L F VV TT TT A V V V V V T G G G G G C I I I I I I I I I I I I I S S S S S S S L S S S S F S S SL SL S S S S L Y V V V A V VVV A V V C C H C C C C C C C C C C I I I I I I I I I L S S L F S L L L L S F S F F Y Y Y Y V V Y Y A A V A T A A A V V A V V T A Y A T V G C G G I I I II I I I II I I I II I L L F S L F L F F L Y Y V Y Y Y Y Y A Y Y Y Y T K Y Y K V Y R K Q Q Q Q Q Q N N Q C C M W I I I I I I I I I I I I I I I L L L L L L L F L K V V V R V R V K C H C W W W W W W W W W W W W W I S S S S S SL A A AW A A A A A A A A A AW A T T A P T AW A A A A A A A V A AW V AW A AW T A R T P A AW A A A AW A A G G N G I F L S S S S S F F F S S F F F F S F F L L L Y T Y Y R T T Y Y T V T K Y Y V K M G Q H H Q M N W L L S S RR RR K K RR RR K K KK KK K K K K K K KK K Y YY K K K Y Y R K T Y T K R K KK R Y R R K K K K K R H Q Q H Q Q H H N I I L L F L L F L L F F L L S LL L S L K Y Y R R Y K K Y K V V Y Y Y Y K K K K K K V K K V V T H N N Q N H G W S S S S S S S S S S S S P P R P P P P P P P RT RT T P P P P Y Y P Y G Q GH G G G N G G G G N N H G G H Q H G H H H N H I I I I I I I I I I F F F F F F F F L S L F L L L L S L S L L Y Y Y Y Y K V Y Y R Y K M M M M G Q K H H G W W W W W I I I S R R K V T T R K K T V R K R T T K K K R K K T K D R R D N K K K K H K K K K K N K N N N N N N N N H N D N N I I I I I S S S S I L S L S L L L L L L S S L S R R K L K S K K R R P K R G G Q R R A G G G Q M N N M M M M M M M G F Y R V K E Y R R H E N M I I F S S F F F F F F S F F F F F F L F S F F Y Y F Y Y Y Y Y C C Y Y C G C C C C C C M M M M M M M M M M M M C M M I S S K R K R T R K K K K K K K K K K K K K K K K K K K R R K K K K D D K K K K K K K K K Q K H H Q N Q Q M Q I I I I I I L L F L F S LL LL L LL L L F L L F F S L L L L L L L L L L L L L L L L L L L L L V K T T V K V M M M M M M F P P P P P P P P P P P P P P P P P T P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P * I

I L F F I F F F F F F F F F F F F F F F F F K F R F K K K Y A K K A K K K Q K M R K C M H M C W W W W W W W W I L I L S F F F F F F F F F L F F Y Y Y Y Y Y Y Y Y V Y Y Y Y Y Y Y YY Y Y Y K T Y H Y M YY Y Y Y Y Y H Y Y H M M H W I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I L L L L F V V * V V V V M

S S L L S S S L S S S T S A A A T T T V A V V A A A A A A A A A T T A T A A T A A T A A A A T A A A A A C C C C I L I L L L F L L L L L L L F L F F F F L L L F F L L L L L L L L L L L L L L LL L L L L L L F L L L V V N C C I I L LL L LL LLL F L L L F L F F F L A V V V Y Y Y V V Y C C C C C C C C C C C C C C C C C C C M M M M M C M C W W W W S R R R R R R R R R R R R R R R R * R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R K

D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D I I I I I I I I I I I I I I I I I I I I I I L L F F L L F F L F LD L F F S L L L L L F F L V A V V V V V V V V V V C S S S S S S S S S S S S S S S S S S S S S S S S S S S S S V V A A A T T A A A A A C N C C C C C C C C C C C C C C M I I F F F F L F F F F S L S F F F S L F F A V T T A T T A A V V A A V A V T V V A V V V V V T V T V T C C M M C C M I I I I I I I L F L L L L L F F L F L L L L L L L L L L L S L L L L L L L A V A A V V V V V A V V M M M M C C C C W I I I I I I L L L L L L L L L L L L L L L F L L L L L L L L L L L L L L V V V V V V V V V V V V V V V M M I I I LL L F F F F F F F F L L LL LL L L F F F F F F F F F F L F L F F L L L L F FF L F F F F S F F S L T Y R V M M H I I I I I I I I I I I I I I I I I I I I L L L L F LLL L S S L L L L V V Y V V A A A A V V V V A A A A A T T M M M M I F L F L F F F S S S S S S L F T A Y Y Y Y N N D N N C N N G G G N G N N N N G G N N N N G G G G G G N G N W W W I I I I S F F F S S S F F F S S S S S S L L S F F F F S F F L SSL L A A T V A Y A V C G C G G G G C C C C C G N C G I I S S S S S S S S S S F F S S L S S S A T A R A T R A R A A A A A A A A T A R T T T T A V V V A A R C M GG M N I I I I I I I L F L L F L F L S F S S S S S S L L L L L L L L S LS F L S F V T A V A M M N N C MM W W W W W W W W W W I I I F S F F F F L F L S S L L S T A K T E T T T T T T A T T T T T T T T T T T T D Q C Q N G C C C C C C G N Q C C F S S S S S S S S S S S S S S S S S S S S F S S S S S S Y V Y A Y Y A Y T T V E A V Y A Y A A Q N C C D Q C H C C G I I I I I I I I I S F S S S S S F S S S F S F S S F F T T T Y T V T T T T T T A M M M M M M C C C M C C C W W W W I I F F F F L F F L F F F F L F F F F F F F F F T T T V V A A T T A A T A A T T T T T Y A T A AA T T A T T T M G W I I I I F F L S L S F F L S S S SL S S S S S Y T E T Y Y T A A A A R A A A A Y E Y T T T G N D D N D N N D N M M M I I I I I I I I I I I L S S F F S LS Y V P P P P P T T Y Y Y Y Y P P Y Y P Y T Y P Y Y Y Y Y Q Q G Q N H Q C C H C I I I I I I I I I I I I I I I I L S F F L F L F L F F F L L L F F F F F F F F L L F F F F F T V V A T V M NN M N M W S F F F F L F L S S F F S S F F F F S S L F F F F S S S T T Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y A Y Y Y Y Y G G G G G G F S F R Y Y R E V R R E V R Y K E T E R R E R R R R R R R R R R T R R R R R R R P R T R E R R R R R R R R Q Q Q N N I I I I F F L L F F F F F F F F F V Y Y Y T V Y Y Y Y Y T Y Y Y Y Y H M H H M M M M H H H H H H Q H C M H M H H M I I I I I I I F F L L F L L L F L F L L F L L S F F L L L L F F F F F L L F F FF F L F L F L L S S V T T A V A T M G I I I I I L T K K K K K V K K V K K R R K R K K K K K K K K K R R R K K R R R R K K R K K R R K R R K K K R R R R Q C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C I I I I F F S S L S S S S L F S S S S S S S S L S L S L F F L S F L S F S Y A Y T P P T T A A A A T A T A T A A G W I I I I I I I I I I I I I I F S F S F F F F L S F L L F S F F F F F V T A A T E E E A T P M N G M M D D M M G H I I S S S L S R E T E T K V V K E E E E K V E P E R T P T V D N H N D D D Q N N N N D D D D D D Q D D N D D S S S S S S S S SLL S S S SS S SS S S S S S S S S S S S S S S S S SSL S S S S S S T T G G G G G G G G G G G N N N N N N I I I I I I I I I I I I I I I I F L S L F L K T E Y V T K Y K V V K Y Y E E E K Q Q M M H N Q Q Q Q Q Q Q Q W I I S L L S L L L S SLS S T T T T T T T Y Y T T T Y V N N N N Q N N N N N N N N N N N N N NN NN N N N N N N N N H N N I I I I I I I I I I I I S S F L S L F L L L L L F S F L F F Y P Y Y Y Y Y Y K Y M D D D D N D D D D G D D Q D I I I I I I I I I I I I I I I I I I I I I I I I I I LS L L L LS F F L L L L L F LD T T T P V T T T T T T V T V T T M W I I I I F F F F F S F F F F F F F F F F F F F S F F F F F S S F F F F F F F F F F F A K K K K K K K K K K K M Q Q I I I I I F F F F F F F F F F F F F F F F F L F F L L FF F L L L L L F L L F L S Y Y Y Y V Y Y Y Y Y Y W W W W F L S S S S S S S FF FF F S S Y Y Y Y Y Y A Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y V Y Y Y Y Y Y G G G N H W I II I I I I I I I I I I I I I I I I I L F L S S L L S L L L L Y Y T Y V Y T Y Y V V V V V V K V V V V Y T H Q Q W I I S L S L F S S S S L S F T K K A K K V V K K K K E E R A T T R R R K T R E K K R K K Q N N H D N N Q D N D N M S F Y A T G G G G G G G NN D Q G G G G C G G G G Q G G G G G G G G G G G G G G G G G C G G C G G G G G G G C G G G W Y Y C C H K R G G F S S S S S L L L S S S S Y A A R E E E K K T K N N N D N N N N N G N N N Q Q Q Q H D H N N N G D N Q G Q G G N D N I S L S S S S S S S S S S S S S S S S S S S S S S T T T T T T T T T T T T T T T T T T * T T T T T T T T T T T T T T

I I I I I I I I L L L L A A A AA A A V A A A V A A A A A A A A A A A V A V T A A A T A A A A A A A A A A A A V A M M F F F F F F F F Y Y Y Y Y Y Y V Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y V Y Y Y Y Y Y Y Y Y Y Y Y Y Y H W W W S S P P P P P P P A A P Y P P P P E K YY P P P Y P P P P P P P P P P P P P P P P P P A P T P P P P P P P K P T P P P I I I L L L L L L L L L L L L L L L * L L L L L L L L L L L L L L L L L L L L L L L L L L L L L L L L L L L L L V V M

G G G G G G D G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G I F F F F F K K K K K R R R Y R R R R R V K K R K K K K K K K K K R K K K K K Y R R R K K K K R K R R K K K R K K W S F S S S S S A A A A T V T T A A A A A A A A A A A A V T T A A A A A A A A A A A A A A G G M G G G G G G G G G G G I I L L L L L F F L F F F F L L S L L L L P P P P P P T P P P P P P P P P P P T P P P P P P P P P P V P P P P P P P I I I I F F F L F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F Y Y I I I I I I I I I I I I I L F L L L L L L L F L L L F FF L L L Y Y V Y D D M D D M D D D D Q M Q D D M C D D D M W W F Y Y R Y R Y H D D H D D H D D D D D H D D D D W W W W W W W W W W W W W W W W W W W W W W W W W W W W W W W W W W I I I I I I I I I I I I L L L L F L L L L L L L L L L L L L F L F L L L L L L L L L L L L L L L L L L L L M M M M M I I F L L F F F L F F L F L F L F V G G W W W W W W W W W W W W W W W W W W W W W W W W W W W W W W W W W W W W W W I I I I S S S S T T T T T A T V T T T A A T T T G G G G G G G G G G G G G G G G G G G G D G G G G G G G G G G G G G I I I F S L S S F F L F F F F F F F F F L F L F F V V Y Y Y Y Y Y Y V Y V Y Y D G M N N N N N N N N N N N W W W W W I I I I I I I S L F F F S F F L S S L F F S S S S S S S A V V V T T V V T V A V V T T T M M C C C C C C M M M C I II I I I I I I I I I I I I I I I I I I I I I I I I I L L F L F S L F F F F L F F F F F F F LS F V T V V A Q M PROF1 multiple sequence alignment. I I I I II II I I I I I I I I I I I I I I I L L L L L L L S L L L S L L L L L LL V V V V T T N N N N N M N Q N N D N C I I I I I I I I L F F F F F F F F F S T V A A T A A T T A Y T V T A T E A Y Y Y Y N C Q C N N G N N N N N N D Q C N I I I I I I I I I L S F L F S L S F F L F F F L L L L L L L L L L L F V V V V V V V V V G G M M M G M G G I I I I I I I I I I I I I L L L S L L L S S S S S F F S S F LL LL S L LL LL V V T VV A A A A V Y C M M M C C C G N N S S S E E E E D H D D D D H N D D D D D D Q D N N D N D D D DS D DS D D D D N N D D H D Q N H D D H D D D D H D D D D I L S S S S S S S S S F F F S S S S S S S F S S F S A A A T A V A A A Y Y Y A T A A V V A A A A A A A G C C C C I I I I I I I I I I I I I I I I I I I I I I I I L L F L F F F F LS F L F L F F V V A V V G G G M G G G G C M W L SLS S S S L S S S S A A A A A A A A A A A A A A A A A A A A A A A A A A A A T A T A A A A A A A A C C I I I I I I I I I I I L L L L L L L L L L L L L L L L L L L L LS LS LS L L L S L LSL LSL L L L L L L L LS L L LS L L * * * V Y M M

I I I I I I F F F S S F F F F F F F L F F S L F F F F F Y T A V A A V A A T V T T V A C G C C G C C G G G C

W W W I I I I I I I I I I I I I I S S S S S S L L S S S S S V V T T T T A T A T G G G G G G G G G M G I I I I S S L L L L L L L LSS L S L LS LS S L LS LS LS LS L L L L F L S L L L L LS L L L L V A T A M M C C M M M C LL F F F F LLS LLS F LL LL L F F F L F Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y C I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I L S * L L V V V Y V V T V V V V I

I I I S F F F S S S F S L L S L A T A A K K T V E K K Y Y T T T T Y Y Y Y V V K A T T T Y T T VV G N N M M N N N N F F F F F F F F S F F F F F Y H H Q Q Q Q Q H Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q H Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q R R R R R R R R R R R R R R R R R R R R R * R R R R R R R R R R R R R R K R R R R R R R R R R R R R R R R R R R R R D

I S S S S S S S S S S S S S S S S S S T T T T R T T T P T P P T T T T T T T C C N N N N N N C N N C I I Figure 2.8 I S L F L L L S R R RT K K K K K K K K K RT RT RT RT K A Y K K R P P K K R K K K K RT K K K RT H H M M N L S S S T K Q N Q I I I L L F F S L K K K K K R K K K K K K K K K K K K K Q Q N H Q N N D D D D D D N N N Q D Q H W I L L L L S S S S K R P R R A P K R K R R P T T T P K K K K K K A A Y K K PP K G H M Q H H H H N Q Q Q Q Q Q I I I I S L F L F F L L L L S FK L F E K K K P P Y Y T Y K T Y K R K Y Y Y Y A R Y Y R Y Y E R Y Y Y Y Y Y Y M G Q Q I I I I I L F S S S S K K R E K K K E K K K K K K K A K E R K K K T Y E K K R K K K K R A K K H N N M C C M I I I I I I I I I I I I L S F F F F S S S S L S F F F L T K Y R K Y Y K R K V V V V Y V Y KK K K K V V D Q N M W W I I I I I I II I I I L L L S L L S L L L L L L L L F F L L L T T Y R A A K Y V A T T T N N N N M M M M G M W I F F F F F F F L F F F F F F F F F F F F F F LS LS LS LS F F F F F F F F F F F F F F F F F F F F F F F F F F F Y V T V I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I L L S Y R Y T V Y V K V V Y V V V V H H N H I I I F L F F F L L L S F F F F L S F Y Y Y Y R T Y V Y Y T Y Y Y Y Y Y Y Y Y Y Y T Y Y Y Y Y Y Y V Y Y Y Y N N I I I I I I I I I I I I I I I I I I I II I I I I I F L L S F V V V A V A T A A A V C G G C C C C C C C C C C C C C III I I I I I I I I I I I I I I I I I I I I I I I I I L L F T A T V V V V V V V V A A T T T T V V V A V T V V T N C I I I I I I II I I I I S L F F L L F F L L F L L L L L L F L F F L L L F L L S L L Y Y Y Y Y T V V V V Y Y V Y A H M I I I I I I I I I I I I I I I I S S S S L L L L S L S S L F L L F P P V V T V V V V V A A A M C C C M M M M C G G G N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N H N N N N N C N N N N N N N N N N N N N N N N N * I I I I I S S S S L L S S S L S S S S S S S S S S S S S F V T T T T T T T A V A T T T M G M G G G G G G G G G G G C I I

I I I I I I I I I I I I I I I I I I I I I I F F F F F F F F F F F F F F V V V V V V V P V P V V T A V M I I I I I I I I I I I I I I II I I I I I I I I I I I I I I L L L F L L L L L F L L L L L L L L F V V V V V T V VV VV VV S S S S S S S T A G M G G M G G D G G N G G G D G G G G G G G G G G G G D G G G C G G G G G G G G G G G G G G G G G TM-I TM-II TM-III TM-IV TM-III TM-II TM-I I I I I I I I I I I I L L F F L F L F F F F F F F F F F F F L F F F F F F F F F F F F F F F F F F F F V A A V V P V I I I I I I I I I I I I I I I I L L L L F L L L L L S S S S A V A T V V V V T T V T V C M G G G G G C C G G I I I S S F F S S SL SL SL S S SL S L S S S S S S S V V A A A V A A V T A G G G G C G G G G G C G G G G G G G G G G I I I I I I I I I I I I I I I I I I I I I I I I I I I L F L L F F F F F F L F T V V AA V V V Y V V V V M M N M M I I I I I I I I I I I I I I I I I I I I I I I I I L F F L F L L L L F L F LLS L F F F V V V V VV V V V V V M G G C C W I * P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P I I I

L S L S S F F L L F F L L L L L F F L L L F F F F F L F F F S S S F F F F F L S F F F F F F T A T N N N N C C I I I I I I I I I I I I I I I I I I I I I I I I I F F F F L F F L L L L L F L L L F F V Y V V V V V V V V V V V C M F F F F F F Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y * I I I L A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A T T V V V G G G G G G G G G G G G G G G G I S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S T AA R R R A R R R R R R R R V R T R T M M C M I I I I I I L L LS F F L F L L F S F L L L F F F L L F F L L L F F L L F F L L F Y Y M M M M M W W W W W W W W W W W I I I I I I I I I I I I I I I I I I I I I F L S S F S F F F F F F F F T Y A V T R T V V V T V V V V V V V N N N G N S L F S A Y Y Y N G G G G G G G G G G G G N N G G G G G G G G G G G G G G G G G G G G G G G D G G G G G G G G G G G I I I I I I L L L F L L L L L L L L L L L L F L L L F L L L L L L V V V Y V V V V V A V V M M D D M M M M M M M M M I I I I I I I I I I I I I I I I I I I I I I I I L F F L L L L S S L L A A A V T V A V V A T V V V V V M M M M M I I I I I I I I I S SL S L S S S S S R K E VV E K E T T K T T K P K N H N H H H Q H Q Q H H H H H H H H H H H H H H Q S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S V E A A K K E Y T E E E E T T Y N D N D Q N G G D I I I I I I I L L L L L F L L L L L S L F F F F F F F L L S F F F F F L L T T V V A T V Y VV E E E T E Y Y C C C M M SMDC18000.2 SMDC12781.1 SMDC14497.4 SMDC15168.2 SMDC6858.1 SMDC21116.2 SMDC19969.1 SMDC14736.1 SMDC4350.2 SMDC15273.1 SMDC7587.4 SMDC8572.2 SMDC6858.4 SMDC1026.2 SMDC423.2 SMDC1889.2 SMPSC433 SMPSC331A SMDC85103B SMDC6472.1 SMP091950 SMDC8510.2 SMDC12554.3 SMPSC12A SMDC15385.3 SMDC14380.2 SMDC27521.1 SMDC85103A SMDC504.1 SMPSC1003 SMP083940 SMPSC331B SMPSC63 SMP084290 SMPSC12B SMDC27911.2 SMP084270 SMP084280 SMDC1917.2 SMDC2955.1 SMP001070.1 SMP041880 SMDC907.1B SMDC2150.1 SMP10893 SMP023710 SMP134960 SMDC22227.1 SMDC2514.1 SMDC907.1A SMP117340 SMDC2411.1 SMDC55981A SMDC4570.1 SMDC17493B SMDC55981B SMDC1825.1 SMDC17493A I II III 60 GABA-B mGluR- Group III Group mGluR- Group X Group mGluR- Group II Group mGluR- mGluR- Group I Group mGluR- GPCRs. Glutamate Phylogenetic analysis of Figure 2.9 61 Ala Thr A: B: Thr Ile 174 Asp A: B: Asp 194 151 O Ser A: Ser B: Ser His Tyr O A: B: Tyr 222 3 _ NH 301 Asn Asp A: B: Asp a O 68 Schematic of glutamate in association with LBD residues. Leu Ser Arg O A: B: 389 Figure 2.10 Tyr Leu A: GSMP128940 B: GSMD004608 Lys A: B: 62

2.7 Tables

Table 2.1 - Tabulated GPCR sequence count at various stages of processing.

Sequence counts are provided for each GPCR family at different stages in the receptor mining protocol. The S. mansoni count is shown after application of the TM-focused HMM and

filtering of the predicted proteome (HMM), extensive manual curation (MC), homology-based searches against the nucleotide assembly (tBLASTn) and a final round of manual curation

(Final). This progression is similarly displayed for S. mediterranea, with additional stages for the shedding of redundant sequences using BLAT (R).

Table 2.2 - GRAFS-based comparison of GPCR repertoires.

The GPCR repertoires of S. mansoni and S. mediterranea are shown from a GRAFS-based perspective, alongside those of other organisms with characterized GPCR complements. For

Rhodopsin sub-classification, BLAST searches were used to help tentatively assign putative ligands to receptors omitted from phylogenetic analysis.

Tables 2.3, 2.4, 2.5 - Identification of Planarian sequelogs of parasite Rhodopsin receptors.

The nearest S. mediterranea sequelog of each S. mansoni Rhodopsin receptor is shown, along with the length of the BLASTp overlap region and the corresponding E-value. Receptor pairs are ranked by percent identity (PID). Parasite receptors closest to top of the table are likely candidates for indirect characterization via investigation of their nearest-related planarian coun- terpart.

Table 2.6 - Comparison of PROF1 motifs and classical Rhodopsin motifs.

PROF1 motifs are compared to ubiquitous Rhodopsin family motifs. Motifs are displayed as regular expressions. The two most frequently occurring amino acids are shown for each position in order of frequency, except in cases where a particular residue is absolutely conserved or when there is no clear second in frequency rank. Red text is used to highlight positional agreement 63 between PROF1 motifs and the corresponding Rhodopsin motifs. More specifically, instances where the most frequently occurring residue is equivalent.

Table 2.7- Rhodopsin SVM training parameters and cross-validation accuracy.

RBF grid-search parameters used to train SVM models, along with the corresponding 5-fold cross-validation accuracy (ACC) for the training set. The best performing model for level

2 (subfamily) classification is SVMT 7, while the best performing model for level 3 (amine) classification is SVMT 1. Both classifiers exclusively employ transmembrane sequence data, and outperform the classifiers trained with full-length sequence data.

Table 2.7- Rhodopsin SVM classifier results.

Ligand-based classification of flatworm Rhodopsin GPCRs with Rhodopsin SVMT 7. PROF1 receptors are labeled with ‘*’. A total of 178 receptors were classified, with the vast majority placed in the peptide and amine groupings. Interestingly, all 45 PROF1 receptors were classified as peptide-responsive.

Table 2.9 - Amine SVM classifier results.

Ligand-based classification of flatworm amine-responsive GPCRs with amine SVMT 1. A total of 43 receptors were identified as aminergic via Rhodopsin SVM classification. In cases of erroneous TM boundary prediction, the SVMFL classifier was used. The classifier results display correct predictions for the three schistosome receptors thus far deorphanized in this subfamily, including two histamine-responsive GPCRs and one dopamine-responsive GPCR

(labeled with ‘*’). 64 9 22 11 291 Final 9 R 24 10 291 10 30 11 291 tBLASTn - 6 8 11 MC S. mediterranea 6 8 R 11 256 6 11 10 270 HMM 2 5 5 104 Final 3 5 5 105 tBLASTn Table 2.1 Stage-specific GPCR tabulation. 2 4 4 S. mansoni 74 MC 2 4 5 77 HMM F R G AS Family 65 0 9 5 0 0 0 0 0 0 0 0 0 9 9 1 60 81 40 96 11 12 S. mediterranea 0 0 4 0 0 0 0 0 0 0 0 5 0 2 3 2 0 25 35 19 21 S. mansoni 1 0 1 0 0 0 1 0 0 0 0 5 0 6 5 5 0 20 31 10 124 C. elegans 1 2 8 0 0 0 5 4 0 0 0 0 5 0 9 5 0 21 21 79 13 D. melanogaster 2 2 0 0 0 3 3 0 0 0 0 0 8 1 0 18 12 29 77 13 13 A. gambiae Table 2.2 GRAFS-based comparison of GPCR repertoires. 3 1 7 7 0 0 44 22 11 13 43 43 10 35 20 11 13 24 27 20 494 H. sapiens ) ) ) ) ) ) α ) ) ) ) ) ) γ ) γ α δ α δ α δ δ α γ β SEC GLR ADH TAS2 PARF1 PROF1 OLF ( LGR ( PUR ( PEP ( SOG ( OPN ( MRG ( MEC ( MTN ( AMIN ( CHEM ( FZD/SMO MCHR ( Unclassified PTGER ( F R G A/S 66

S. mansoni Query Length S. mediterranea Hit Length Overlap E-value PID SMP145540 492 mk4.017782.00.01 278 206 6.00E-90 69% SMP148210 279 mk4.007388.02.01 339 196 2.00E-65 62% SMP169680 178 mk4.000219.07.01 424 82 3.00E-25 62% SMP118040 354 mk4.000375.06.01 371 314 1.00E-103 60% SMP126730 463 mk4.007388.02.01 339 192 7.00E-62 59% SMP152540 231 mk4.009070.00.01 704 131 5.00E-43 59% SMP160020 320 mk4.013690.00.01 458 157 3.00E-43 52% SMP149770 318 mk4.013819.05.01 308 141 5.00E-39 52% SMP134820 360 mk4.011160.01.01 484 197 2.00E-51 48% SMP132410 383 mk4.000375.06.01 371 314 4.00E-78 47% SMP133550 503 mk4.010211.00.01 370 236 2.00E-53 47% SMP140620 170 mk4.018209.00.01 365 171 3.00E-39 47% SMP123350 535 mk4.000301.14.01 386 87 7.00E-24 47% SMP127720 518 mk4.012183.00.01 509 167 1.00E-39 44% SMP120620 587 mk4.002108.03.01 224 112 3.00E-26 44% SMP001070.1 197 SMDC2955.1 334 126 4.00E-26 44% SMP000900 412 mk4.001208.04.01 152 141 3.00E-25 43% SMP150180 483 mk4.012659.00.01 387 417 9.00E-80 42% SMP043340 551 mk4.005650.02.01 421 189 2.00E-35 41% SMP043290 497 mk4.005650.02.01 421 168 2.00E-34 41% SMPSC31 241 mk4.002460.05.01 306 160 1.00E-27 41% SMP172810 450 mk4.000557.00.01 337 75 2.00E-12 41% SMP134350 421 mk4.002485.01.01 254 213 9.00E-49 40% SMP011940 396 mk4.002418.01.01 447 321 2.00E-72 39% SMP194740 265 mk4.012712.00.01 279 143 3.00E-18 39% SMP149580 377 mk4.010306.00.01 320 342 1.00E-55 38% SMP145240 581 mk4.015843.00.01 270 246 2.00E-40 38% SMP058080 319 mk4.000152.09.01 261 218 6.00E-39 38% SMPSC12B 244 SMDC1889.2 392 198 6.00E-38 38% SMP161500 302 mk4.025533.00.01 287 216 1.00E-37 38% SMP043320 584 mk4.000354.16.01 413 173 5.00E-36 38% SMP043260 560 mk4.000354.16.01 413 183 1.00E-34 38% SMPSC103 526 mk4.000354.16.01 413 171 3.00E-33 38% SMP164730 542 mk4.025533.00.01 287 209 1.00E-31 38% SMP157050 387 mk4.014274.00.01 409 167 3.00E-25 38%

Table 2.3 Planarian homologs of parasite GPCRs. 67

S. mansoni Query Length S. mediterranea Hit Length Overlap E-value PID SMPSC10893 251 SMDC2411.1 385 128 6.00E-19 38% SMPSC433 371 SMDC1889.2 392 367 4.00E-66 37% SMP041880 342 SMDC2411.1 385 366 2.00E-62 37% SMP134100 293 mk4.005562.01.01 324 295 1.00E-51 37% SMP043300 522 mk4.000354.16.01 413 172 5.00E-34 37% SMP023710 465 SMDC2411.1 385 134 1.00E-22 37% SMP140250 196 mk4.005799.01.01 323 140 1.00E-21 37% SMP104210 329 mk4.011006.00.01 348 127 4.00E-20 37% SMP146450 367 mk4.012231.01.01 343 127 1.00E-16 37% SMP170020 407 mk4.033855.01.01 243 187 8.00E-30 36% SMP137050 454 mk4.002108.03.01 224 217 5.00E-35 35% SMP157640 482 mk4.022218.00.01 344 127 1.00E-12 35% SMP129810 497 mk4.006543.02.01 440 404 2.00E-61 34% SMP091950 366 SMDC1889.2 392 363 6.00E-57 34% SMP172170 243 mk4.000152.09.01 261 220 3.00E-28 34% SMP043270 490 mk4.000354.16.01 413 175 3.00E-29 33% SMP117340 192 SMDC2411.1 385 133 2.00E-21 33% SMP164140 272 mk4.004167.00.01 260 145 5.00E-18 33% SMP135660 461 mk4.001407.02.01 330 75 0.016 33% SMP167870 319 SMDC1889.2 392 291 4.00E-42 32% SMPSC1003 373 SMDC1889.2 392 362 1.00E-40 32% SMP173010 354 mk4.002460.05.01 306 259 9.00E-37 32% SMP056080 532 mk4.012712.00.01 279 283 3.00E-35 32% SMPSC12A 305 SMDC1889.2 392 234 3.00E-35 32% SMPSC74 411 mk4.001407.02.01 330 277 6.00E-29 32% SMP141880 492 mk4.011130.02.01 209 138 6.00E-16 32% SMP083880 247 mk4.008619.00.01 542 83 1.00E-07 32% SMP132730 278 mk4.024092.00.01 427 67 4.00E-06 32% SMP180030 369 mk4.007964.02.01 210 190 2.00E-20 31% SMP162870 195 mk4.021981.00.01 316 141 8.00E-20 31% SMP084280 355 SMDC1889.2 392 307 1.00E-40 30% SMPSC63 358 SMDC6472.1 368 340 4.00E-40 30% SMP072450 460 mk4.013492.00.01 417 379 1.00E-36 30% SMPSC331A 354 SMDC1889.2 392 333 8.00E-35 30% SMP149170 441 mk4.031060.00.01 384 345 2.00E-30 30%

Table 2.4 Planarian homologs of parasite GPCRs- continued. 68

S. mansoni Query Length S. mediterranea Hit Length Overlap E-value PID SMP178420 328 mk4.004400.01.01 317 149 1.00E-15 30% SMP134460 521 mk4.005650.02.01 421 444 3.00E-42 29% SMP084290 364 SMDC6472.1 368 340 8.00E-37 29% SMP007070 580 mk4.000219.07.01 424 318 1.00E-30 29% SMP159860 468 mk4.004728.01.01 352 126 2.00E-13 29% SMP132220 135 mk4.008535.00.01 402 78 8.00E-07 29% SMP153210 392 mk4.014320.00.01 350 98 6.00E-06 29% SMP134960 367 SMDC2411.1 385 345 5.00E-36 28% SMP084270 369 SMDC1889.2 392 302 1.00E-33 28% SMP059400 291 mk4.005650.02.01 421 338 1.00E-26 28% SMP127310 545 mk4.013690.00.01 458 466 1.00E-41 27% SMP145520 379 mk4.000354.16.01 413 319 2.00E-36 27% SMP083940 352 SMDC8510.2 380 350 8.00E-31 27% SMPSC15 409 mk4.000596.05.01 315 310 5.00E-28 27% SMP008850 515 mk4.011509.03.01 461 381 3.00E-26 27% SMPSC34 365 mk4.002618.00.01 384 314 2.00E-21 27% SMP180350 190 mk4.010158.01.01 325 152 2.00E-19 27% SMP027940 324 mk4.004400.01.01 317 271 2.00E-18 27% SMP180140 354 mk4.001491.09.01 305 190 7.00E-12 27% SMP170610 388 mk4.014274.00.01 409 117 4.00E-07 27% SMP153200 367 mk4.000600.00.01 372 110 0.003 27% SMPSC331B 353 SMDC1889.2 392 334 2.00E-28 26% SMP128710 551 mk4.031060.00.01 384 258 7.00E-15 26% SMP126890 324 mk4.002418.01.01 447 93 8.00E-08 26% SMP137300 328 mk4.006327.00.01 315 292 5.00E-18 25% SMP137310 327 mk4.004400.01.01 317 278 6.00E-18 24% SMP137320 324 mk4.004400.01.01 317 290 2.00E-17 24% SMP041700 367 mk4.021981.00.01 316 228 7.00E-16 24% SMP137980 292 mk4.000600.00.01 372 298 2.00E-11 23% SMP056060 487 mk4.012712.00.01 279 227 4.00E-10 23% SMPSC261 229 mk4.014320.00.01 350 176 0.002 23% SMPSC58 323 mk4.006327.00.01 315 283 7.00E-21 22% SMP128170 581 mk4.000038.11.01 391 321 2.00E-17 21% SMP080820 209 mk4.002569.02.01 367 133 5.00E-05 21%

Table 2.5 Planarian homologs of parasite GPCRs- continued. 69 VII N [F/I] . . [M/L] [N/S] [F/G] . . [Y/F] [N/D] P[N/D] F.. . . [Y/F] Y motifs. VI Rhodopsin S...I.S [F/Y] .[F/Y] ...W. . .[F/Y] W. . . .P [S/A] . P [P/L] III D D R [R/K] D [C/M] [L/V] [R/S] C [D/E] R [Y/H] Table 2.6 Comparison of PROF1 motifs and classical II LA..D [L/I] [A/S] ..[L/I] [A/T] [L/I] A ..[D/E] . .[D/E] [H/N] TM PROF1-I PROF1-II Rhodopsin PROF1-III 70

SVM model Scoring scheme γ C 5-fold ACC

SVMT 1 OvO 16.0 32.0 99.01% −8 Level 2: Rhodopsin SVMT 7 OvO 2 2048.0 99.47%

SVMFL OvO 256.0 32.0 98.65%

SVMT 1 OvO 256.0 32.0 96.44% 4.5 Level 3: Amine SVMT 7 OvO 32.0 2 95.0%

SVMFL OvO 256.0 32.0 94.77%

Table 2.7 Rhodopsin SVM training parameters and cross-validation accuracy. 71 SVM classifier results. S. med. Rhodopsin mk4.011371.00.01 mk4.005939.01.01mk4.000656.10.01 mk4.009528.00.01 mk4.001585.00.01mk4.003202.01.01 mk4.009070.00.01 mk4.005562.01.01mk4.000742.09.01 mk4.005766.00.01 mk4.000883.05.01mk4.003002.02.01 mk4.007388.02.01 mk4.002635.00.01mk4.001678.03.01 mk4.029325.00.01 mk4.007538.01.01mk4.011160.01.01 mk4.012659.00.01 mk4.000646.00.01 mk4.001569.04.01 mk4.005650.02.01 mk4.008535.00.01 mk4.009526.00.01mk4.002418.01.01 mk4.000699.11.01 mk4.022218.00.01mk4.014127.00.01 mk4.001974.03.01 mk4.000596.05.01 mk4.008555.00.01mk4.021573.00.01 SMDC21116.2* mk4.027005.00.01 mk4.002457.00.01 mk4.000160.15.01mk4.000375.06.01 SMDC14497.4* mk4.001419.04.01 mk4.000600.00.01 mk4.000787.05.01mk4.002618.00.01 SMDC6858.4* mk4.010162.00.01 mk4.022227.01.01 mk4.005268.02.01mk4.002569.02.01 SMDC504.1* mk4.005939.01.01 mk4.004400.01.01 mk4.006097.03.01mk4.008774.00.01 SMDC18000.2* mk4.001585.00.01 mk4.001491.09.01 mk4.024092.00.01mk4.002807.03.01 SMDC12781.1* mk4.007388.02.01 mk4.014245.00.01 mk4.006248.02.01mk4.004711.02.01 SMDC15168.2* SMDC943.5* mk4.000131.00.01 mk4.002150.00.01mk4.001291.01.01 SMDC175.4* mk4.010834.00.01 mk4.007776.00.01mk4.005599.03.01 SMDC423.2* SMDC8510.2* mk4.000020.08.01 mk4.000341.05.01mk4.001235.01.01 SMDC1026.2* SMDC12554.3* mk4.044215.00.01 mk4.028014.00.01mk4.010211.00.01 SMDC6472.1* SMDC27521.1* mk4.011988.00.01 SMDC1917.2* mk4.010306.00.01mk4.001235.04.01 SMDC14380.2* mk4.022607.00.01 SMDC5598.1A* mk4.002514.01.01mk4.018209.00.01 SMDC8510.3A* mk4.000557.00.01 SMDC5598.1B* mk4.013492.00.01mk4.003634.00.01 SMDC8510.3B* mk4.009248.03.01 SMDC1749.3A* mk4.030476.00.01mk4.005809.00.01 SMDC15385.3* mk4.000065.02.01 SMDC1749.3B* mk4.000617.01.01mk4.016304.00.01 SMDC15273.1* mk4.001283.01.01 SMDC27911.2* mk4.011509.03.01mk4.018682.00.01 SMDC4350.2* mk4.020797.01.01 SMDC907.1A* mk4.012790.01.01 SMDC19969.1* mk4.000219.07.01 SMDC907.1B* SMDC14736.1* SMDC1825.1* Table 2.8 S. man. Amine SMP126730SMP127310 SMP043290SMP134460 SMP043300 SMP120620 SMP134820 SMP043320 SMP180140 SMP027940 SMP043340 SMP160020 SMP137300 SMP145520 SMPSC103 SMP043260 SMP145540 SMP159860 SMP043270 SMP148210 Peptide SMP150180 SMP134960*SMPSC34 SMP180140 SMP137310 SMP164730 SMP153210 SMP056080 SMPSC58 SMP027940 SMPSC15 SMP137050 SMP137320 SMP137300 SMPSC331A* SMP007070 SMP084280* SMPSC331B* SMP091950* SMPSC433*SMPSC1003* SMP167870* SMP145240 SMPSC63*SMP170020 SMP072450 SMP011940 SMP157050 SMP141880 SMP153200 SMP149580 SMP084270* SMP173010 SMP084290* SMP008850 SMP083940* SMP123350 SMP041880* 72

S. mansoni S. mediterranea SMP126730 4 mk4.011371.00.01 1 SMP127310 3* mk4.000656.10.01 2 SMP134460 4 mk4.003202.01.01 3 SMP134820 5 mk4.000742.09.01 3 SMP027940 5 mk4.003002.02.01 1 SMP137300 5 mk4.001678.03.01 6 SMP043260 4* mk4.011160.01.01 6 SMP043270 3 mk4.001569.04.01 2 SMP043290 5 mk4.005939.01.01 5 SMP043300 6 mk4.001585.00.01 5 SMP043320 5 mk4.005562.01.01 1 SMP043340 4* mk4.000883.05.01 1 SMP145520 5 mk4.002635.00.01 2 SMP145540 3 mk4.007538.01.01 2 SMP148210 5 mk4.000646.00.01 2 SMP150180 3 mk4.005650.02.01 1 SMP120620 6 mk4.009528.00.01 2 SMP180140 5 mk4.009070.00.01 1 SMP160020 5 mk4.005766.00.01 1 SMPSC103 2 mk4.007388.02.01 5 SMP159860 3 mk4.029325.00.01 6 mk4.012659.00.01 1

1: Muscarinic acetylcholine 2: Adrenoceptors 3: Dopamine 4: Histamine 5: Serotonin 6: Octopamine 7: Trace amine

Table 2.9 Amine SVM classifier results. 73

3 Novel RNAi-mediated Approach to Probing Platyhelminth G Protein-Coupled Receptors

A paper to be submitted.

Mostafa Zamanian 1,2,∗, Prince N Agbedanu 1, Michael J Kimber 1,2, Tim A Day 1,2,∗

Abstract

G protein-coupled receptors (GPCRs) represents the largest known superfamily of membrane proteins extending throughout the Metazoa. There exists ample motivation to elucidate the functional properties of GPCRs in the phylum Platyhelminthes, given the heavy health bur- den exacted by pathogenic flatworms and the role of free-living flatworms as model organisms for the study of developmental biology and their parasite counterparts. Efforts on this front have been hampered by the unreliable nature of heterologous receptor expression platforms.

We validate and describe a loss-of-function approach for ascertaining the ligand and G protein coupling properties of GPCRs in their native cell membrane environment. RNA interference

(RNAi) was used in conjunction with a GPCR biochemical endpoint assay to monitor cAMP modulation in response to the translational suppression of individual receptors. This was used to confirm GYIRFamide as the cognate ligand for Dugesia tigrina GtNPR-1, while revealing its endogenous coupling to Gαi/o. The method was then applied to deorphanize a Schmidtea mediterranea 5-HT receptor, Smed-SER-7. A bioinformatics protocol guided the selection of receptor candidates mediating 5-HT-evoked responses. While these results establish the poten-

1Department of Biomedical Sciences, Iowa State University, Ames, IA USA 2Interdepartmental Neuroscience Program, Iowa State University, Ames, IA USA ∗Corresponding Authors. Emails: [email protected], [email protected] 74 tial of this approach, future work can help optimize and adapt this receptor deorphanization strategy to a higher-throughput platform.

3.1 Introduction

G protein-coupled receptors (GPCRs) have been the subject of intense research scrutiny due to their central role in eukaryotic signal transduction and their exploitability as drug tar- gets [79, 80, 149]. The phylum Platyhelminthes houses prominent human pathogens as well as tractable model organisms. Flatworm GPCRs represent lucrative anthelmintic targets, as evidenced by the biological activities of their putative ligands [164, 169] and the crucial bio- logical functions of these receptors in nearly-related organisms [184, 185]. Signaling pathways associated with the GPCR superfamily have been specifically identified as potential targets for life-cycle interruption of flatworm parasites [15, 186]. The identification and pharmacological characterization of GPCRs is likely to generate lucrative drug discovery leads, while enhanc- ing our basic understanding of receptor biology in this important phylum. This process has been slowed by an exclusive reliance on traditional heterologous receptor expression approaches.

Once identified, GPCRs undergo deorphanization, the process of pairing orphan receptors with their cognate ligands. The predominant approaches all require the transient or stable het- erologous expression of GPCRs in a surrogate cell system and in most cases, this expression occurs in cells derived from other species and phyla [148, 195, 196]. The complex regulatory processes that guide the correct folding and export of receptors to the cell membrane [89–92] are not necessarily well-conserved across cell lineages. Further, the structural and functional integrity of receptors can depend on the local membrane lipid environment, the composition of which can differ between native and heterologous cells [197,198]. The exact post-translational requirements for proper receptor expression and function vary greatly among receptors, making the task of identifying a suitable heterologous system a receptor-specific process of trial-and- error [148]. 75

Recalcitrance of individual flatworm GPCRs to heterologous expression platforms has intro- duced a significant bottleneck in implementing functional assays to identify receptor agonists.

Similar issues have impeded the structural elucidation of mammalian receptors [147]. Only a handful of flatworm GPCRs have thus far been deorphanized, with receptors expressed in such divergent cellular environments as CHO [138], HEK293 [139,140], COS7 [139], yeast [140,141], and Xenopus oocyte cells [142]. We describe a relatively simple loss-of-function deorphaniza- tion approach that could be applied in a native cell or membrane environment. This alternative strategy could help catalyze a first-pass mapping of receptors and ligands in this phylum.

Inversing the Paradigm: RNAi as a Deorphanization Tool

We describe and validate an RNA interference (RNAi)-based method that allows receptors to undergo deorphanization without the need for full-length cloning or transport to a heterologous expression system. In principle, a collection of putative GPCR ligands are screened against membrane preparations to evaluate their effects on second-messengers downstream of GPCR activation. RNAi is then used to assay whether observed responses can be altered or abolished by the knockdown of individual receptors from the membrane preparations. A successful “hit” confirms expression of a given receptor, functionally pairs the receptor with a given ligand, and couples the receptor with a specific G protein signaling pathway.

Bioinformatics approaches can be used to help identify receptors as putative targets for a partic- ular ligand, or conversely, to narrow the list of potential ligands for a given receptor. The recent availability of platyhelminth genomic data [24, 25, 27] has led to the accumulation of a wealth of receptor and ligand data. A comprehensive in silico protocol revealed over 116 Schistosoma mansoni and 333 Schmidtea mediterranea GPCRs, which were classified using phylogenetic, homology-based, and machine-learning approaches [199]. Bioinformatics and proteomics-based studies have also led to the expansion of the known set of putative GPCR ligands [169,171,172].

The primary biochemical endpoints of GPCR activation are typically assayed by recording 76

2+ agonist-evoked changes in cAMP (Gαs and Gαi/o) or Ca (Gαq) levels. A variety of es- tablished labeling and detection schemes (e.g. fluorescent, luminescent, and radioisotope) are available for these second messengers [200]. In this study, we focus our efforts on the Gαs and Gαi/o pathways and employ a radioimmunoassay (RIA) for cAMP detection. Monitoring adenylyl cyclase modulation of cAMP allows us to simultaneously examine two of the three major GPCR activation endpoints.

While this loss-of-function approach limits pharmacological analysis, it is adaptable to higher- throughput platforms and can serve as an efficient first-pass ligand-receptor mapping tool. It should be noted that ligands and receptors can display pharmacological promiscuity. Ligands can act through more than one receptor and receptors can respond to more than one ligand, with a range of affinities. Further, receptors responsive to a given ligand do not necessarily share the same G protein coupling profile and are likely to be expressed in different abundances.

However, this approach only concerns itself with the contribution of individual receptors to differences between control and RNAi response profiles. The scale and directionality of the differences between these profiles provide information relevant to ligand responsivity and G protein coupling, respectively. The basic logic of this deorphanization strategy is outlined in

Figure 3.1.

3.2 Results and Discussion cAMP Assay Optimization and Ligand Screen

A cell membrane preparation protocol was adapted [201] and optimized for planaria, and used to generate samples for treatment with various GPCR ligands. The downstream effects of lig- and incubation on cAMP levels were monitored using a cAMP RIA. A ligand screen was first carried out on Dugesia tigrina membrane preparations with a small number of peptides and bio- genic amines. These ligand classes are prominent in platyhelminth biology [164, 169, 171, 172], and there is a strong likelihood that a subset signal through one or more receptors coupled to either the Gαs or Gαi/o pathways. This would presumably be made apparent by stimulation 77 of basal [cAMP] or inhibition of forskolin (Fk)-stimulated [cAMP] [202] as measured by RIA, respectively.

Included in this initial screen were the only two ligands definitively coupled to planarian

GPCRs: GYIRFamide and the biogenic amine serotonin (5-HT; 5-hydroxytryptamine). It was a reasonable assumption that both GYIRFamide and 5-HT would modulate cAMP levels in a whole organism membrane preparation. We previously deorphanized the D. tigrina recep- tor GtNPR-1, showing it responded potently to the neuropeptide GYIRFamide in mammalian cell culture [138]. Chimeric G proteins (Gαqi5 and Gαqo5) were used to divert downstream

GtNPR-1 signaling through the Gαq pathway, suggesting this receptor is Gαi/o-coupled in its native environment. More recently, a Dugesia japonica 5-HT GPCR has been deorphanized using Xenopus laevis oocytes [142], and there is long-established evidence of 5-HT stimulation of cAMP in both S. mansoni [203,204] and other planarian species [205], suggesting that 5HT acts through one or more Gαs-coupled GPCRs.

Alongside GYIRFamide and 5-HT, we included neuropeptide F (NPF) and octopamine as pu- tative ligands. NPF has been shown to inhibit Fk-stimulated cAMP production in membranes isolated from S. mansoni [201]. Given the identification of planarian NPF homologues [171,172], we hypothesized that this peptide would have a similar inhibitory effect on cAMP levels. The results of this primary screen show that 10−5M 5-HT drastically stimulates cAMP produc- tion, while 10−4M GYIRFamide, 10−4M NPF, and 10−4M octopamine inhibit Fk-stimulated cAMP accumulation in Dugesia membrane preparations (Figure 3.2) to varying degrees. These changes in [cAMP] can be viewed as the additive response profile of each ligand.

We chose to pursue the response profiles of GYIRFamide and 5-HT. Provided that GtNPR-1 is a known target of GYIRFamide in D. tigrina, we first examined whether or not this would be apparent using this novel loss-of-function approach. Given that the inhibition of adenylate cy- clase by GYIRFamide is less potent than that brought on by NPF, this also serves as validation 78 of assay sensitivity. In addition to this proof or principle, we chose to pursue deorphanization of an orphan S. mediterranea 5-HT receptor, aided by the availability of receptor sequence data for this species.

Coupling Second-Messenger Assay with RNAi: GtNPR-1 Proof of Principle

Establishing RNAi-mediated Receptor Suppression

Double-stranded (ds) RNA was introduced to isolated D. tigrina colonies using a bacterial- mediated feeding protocol. Planaria were randomly selected, isolated into treatment groups, and fed either non-flatworm control dsRNA or GtNPR-1 dsRNA. A 10 day RNAi feeding cycle consisted of four evenly-spaced feedings, followed by a four-day starvation period. Semi- quantitative RT-PCR was used to confirm gene knockdown. A small number of planarians were randomly selected from both experimental and control groups to assay GtNPR-1 suppression, and the remaining planarians were used for membrane assays. Significant GtNPR-1 knockdown

(> 80%) is consistent and apparent in the experimental group, while GtNPR-1 expression remains robust in the control group (Figure 3.3) relative to endogenous standard.

Deorphanization via Comparison of Response Profiles

Membranes were prepared from both control and GtNPR-1 dsRNA-fed planarians, and treated with Fk (10−4M), GYIRFamide (10−4M), and Fk (10−4M) + GYIRFamide (10−4M). RIA was used to assay cAMP levels corresponding to these treatments. Comparison of the response profiles of control and RNAi planaria reveals near-complete abolishment of GYIRFamide-evoked inhibition of Fk-stimulated cAMP in the GtNPR-1 knockdown group (Figure 3.4, Table 3.1).

Overall, GYIRFamide reduces Fk-stimulated cAMP production by an average of 30% in the ∼ control group, and this inhibition was completely abolished by the suppression of GtNPR-1 expression in the RNAi group. In fact, combined Fk and peptide treatment led to a slight increase ( 2%) in cAMP levels compared to Fk alone. These results confirm that GtNPR-1 ∼ is agonized by GYIRFamide and further establish that this receptor is natively coupled to the

Gαi/o signaling pathway. 79

In silico Target Selection

The two ligands that most drastically stimulated and inhibited adenlyate cyclase activity in our primary ligand screen were 5-HT and NPF, respectively. These results were shown to ex- tend to S. mediterranea (data not shown). To identify and rank 5-HT receptor candidates, a profile HMM was built with sequences procured from GPCRDB [107]. Training was focused on

62 full-length invertebrate 5-HT and 5-HT-like receptors. This was used to search against S. mediterranea GPCR sequence datasets [199], and the results were ranked by E-value. The top

10 receptor candidates were used as BLASTp [124] queries against the NBCI “nr” database.

This was used to identify receptors displaying 5-HT receptor homology, and to filter against receptors that displayed a varied range of biogenic amine GPCR-related homology.

Receptors that survived the BLAST filter were compared to their nearest-related S. mansoni homologs, each returning either Smp 148210 and Smp 126730 (Table 3.2). While there is a great deal of bioinformatic evidence to suggest multiple receptor targets for 5-HT, we narrowed our list to two best-match receptors and selected mk4.001585.00.01 as the first target for RNAi- based deorphanization. This receptor is tentatively labeled Smed-SER85.

RNAi-based Deorphanization of Planarian 5HT Receptor

Smed-SER85 transcript expression was confirmed via PCR, and knockdown was illicited fol- lowing the protocol described for GtNPR-1. Similarly, a membrane preparation protocol was optimized for S. mediterranea and membranes were treated with 10−4 M Fk, 10−4 M 5-HT, 10−4

8-hydroxy-2-(di-n-propylamino) tetralin (8-OH-DPAT), and 10−4 meta-Chlorophenylpiperazine

(mCPP) (Figure 3.5). 8-OH-DPAT is an agonist of vertebrate 5-HT1A and 5-HT7 receptors, while mCPP most potently stimulates 5-HT2B and 5-HT2C . This initial profile establishes that these compounds can modulate cAMP levels in this phylum as well, presumably through the action of 5-HT receptors. 80

Smed-SER85 Mediates Flatworm Motility

The phenotypic effects of Smed-SER85 knockdown were assayed using an automated video tracking system. Planarian motility was recorded over a 15 minute period after acclimation of individual worms to the recording well environment. Figure 3.6 shows 52% decrease in ∼ distance travelled over the recording duration in a comparison of control and Smed-SER85

RNAi worms. This significant decrease in basal motility levels suggests that Smed-SER85 plays an important role in the maintenance of flatworm locomotory behavior. Previous studies have established the role of serotoninergic signaling in the regulation of flatworm locomotion

[206,207], and this motility assay establishes Smed-SER85 as the only known receptor mediating such fundamental flatworm behaviors.

3.3 Conclusions

This study shows the utility of joining RNAi with biochemical endpoint assays to as a means of deorphanizing platyhelminth GPCRs in their native membrane environment. The approach was

first validated using the only deorphanized flatworm neuropeptide GPCR (GtNPR-1), confirm- ing agonism by GYIRFamide and providing new information about its endogenous G protein coupling profile. The orphan S. mediterranea GPCR Smed-SER85 was shown to respond to

5-HT, revealing its endogenous G protein pathway and illustrating the usefulness of applying an in silico strategy to candidate receptor selection. The translational suppression of Smed-

SER85 led to a major decrease in overall flatworm motility. While 5-HT and 5-HT-related pharmacological agents had been shown to be involved in regulation of locomotory behavior, this represents the identification of the first receptor that mediates these observations. While this loss-of-function strategy side-steps some of the concerns and difficulties associated with heterologous GPCR expression, there is significant room for improving both the sensitivity and scalability of this assay.

The heavy tissue requirements of the membrane preparation protocols employed introduce a potential rate-limiting step. Further optimizations of membrane or whole cell preparation 81 protocols in this phylum could allow for more efficient and robust pharmacological analysis.

This assay could also conceivably be adapted to higher-throughput platforms, and extended to include GPCRs that signal through the Gαq pathway. Conveniently, establishing receptor- specific RNAi in planaria allows for the accumulation of loss-of-function phenotypic data in parallel to pharmacological data. In this regard, the study of planarians can inform flatworm parasite biology. Biasing the receptor and ligand pool to those best conserved between parasitic and free-living flatworms could lead to new targets for chemotherapeutic intervention.

3.4 Materials and Methods

Planarian maintenance protocol

Dugesia tigrina (Ward’s Natural Science, Rochester, NY) and Schmidtea mediterranea colonies were maintained in the laboratory with a regular feeding cycle. Dugesia tigrina colonies were fed three times a week, while Schmidtea mediterranea colonies were fed twice a week. Planarians were randomly selected and isolated in 50-worm groupings for RNAi feeding cycles and cAMP ∼ assays.

RNA interference

Primers were designed with Primer3 [192] to selectively amplify 300-500 bp fragments of

GtNPR-1 and 5HT receptor candidate Smed-SER85. BLAT [163] was used to check for poten- tial off-target silencing. A 465 bp fragment of GtNPR-1 was amplified from a full length clone of Gt-NPR1 housed in pcDNA3.1(+) with the primers 5’-TTGGATCTTTCCAGCGACTCT-3’

(forward) and 5’-ATGGTTCGTTCGACGTTTTC-3’ (reverse). A 388 bp fragment of Smed-

SER85 was amplified from Schmidtea cDNA isolated using RNAqueous (Ambion) and Ret- roscript (Ambion) with the primers 5’-CTCCGCTTTTAATTGGAGGA-3’ (forward) and 5’-

CTGTTTCTTTTTCCGGGGAT-3’ (reverse). An RNAi control sequence was amplified from

Aedis aegypti cDNA with primers 5’-ACTTCGGCGTCATTTATTGG-3’ (forward) and 5’-

GAAGGGATCATCGAAAACGA-3’ (reverse), corresponding to a 413 bp fragment of a puta- tive odorant receptor (VectorBase [208] id: AAEL013422). Second-round PCR was performed 82 for each target sequence using the original gene-specific primers flanked by Gateway Cloning system (Invitrogen) recombination sites: 5’-GGGG-attB1-3’ (forward) and 5’-GGGG-attB2-

3’ (reverse). Entry sequences were subcloned into the pPR244 (pDONRdT7) [77] (Appendix

D) destination vector with corresponding attP1 and attP2 recombination sites using the BP

Clonase II (Invitrogen) enzyme mix. Clones were transformed into TOP10 Electrocompetent

E. coli (Invitrogen) and sequence confirmed.

Bacterial-mediated dsRNA feeding

Propagation of RNAi vectors was carried out using competent RNase III-deficient HT115 (DE3) bacterial cells as previously described [77]. Colonies were scaled in liquid culture (500 ml) until an OD of 0.3-0.4 was reached, and T7 polymerase activity was IPTG-induced. After 2 hours, cells were pelleted by centrifugation and resuspected in 50 ml of media. This step was repeated with a resuspension volume of 10 ml. Glycerol stocks of dsRNA-containing bacteria

(665 ul total volume; 20% glycerol) were mixed with 275 ul of blended organic beef liver and

100 ul red blood cells (RBCs). RBCs allowed for visual monitoring of planarian food intake.

Planarians were dark-fed dsRNA-containing bacteria four times over a 10-day timeline with each feeding separated by a two-day period, and starved for at least four days prior to their use in experiments.

Semi-quantitative RT-PCR

Total RNA was extracted from individual planaria with RNAqueous (Ambion), followed by removal of DNA contaminants with TURBO DNase (Ambion). First strand cDNA synthesis was carried out with the RETROscript kit (Ambion), as part of a two-stage RT-PCR. PCR optimization was carried out with the QuantumRNA 18S Internal Standards kit (Ambion) per manufacturer instructions. 18S ribosomal RNA was used as an endogenous standard for nor- malizing measures of gene expression and reducing sample-to-sample variation. cDNA samples were used in parallel as templates for multiplex PCR with gene-specific and 18S rRNA primer pairs. PCR reaction products were visualized on 1.2% electrophoretic gel with the Kodak Gel 83

Logic 112 imaging system, and amplicon intensities were analyzed with standard software to derive relative transcript abundances.

Membrane preparation and cAMP RIA

Planaria were washed twice with cold cAMP buffer containing 50 mM sucrose, 50 mM glycyl- glycine, 10 mM creatine phosphate, 2 mM MgCl2, 0.5 mM isobutylmethylxanthine (IBMX),

1 mM dithiothreitol (DTT), 0.02 mM EGTA, 10 units/ml creatine kinase, and 0.01% bovine serum albumin. Worms were kept on ice for 5 min and then homogenized on ice for 2 min with a Teflon homogenizer. This preparation was centrifuged at 1,000 X g for 5 min, with the pellet that included cell debris discarded. This centrifugation step was then repeated. The super- natant was centrifuged at 40,000 X g for 30 min at 4◦C. The supernatant was discarded, and the membrane-containing pellet was resuspended via sonication in cAMP buffer suplemented with

0.1 mM ATP and 0.1 mM GTP. Total suspension volume was set at 500 ul/sample, such that each sample would contain cell membranes from 3 worms. 500 ul aliquots of this membrane ∼ preparation correspond to individual reactions in the cAMP assay.

Samples were incubated with various concentrations (and combinations) of forskolin and/or pu- tative ligands (peptide or biogenic amine) at 37◦C for 20 min to stimulate cAMP production.

Forskolin and peptide ligands were dissolved in Me2SO, with final reaction mixtures contain- ing <0.1% Me2SO. Me2SO has no measurable effect on cAMP in this range. Samples were centrifuged at 3,000 X g for 5 minutes after ligand incubation, and 400 ul of supernatant from each sample was transferred into a fresh tube. Three reactions was assayed for each treatment condition, and each reaction was sampled in triplicate. cAMP concentration was determined with a radioimmunoassay as described previously [209], with a lower detection bound of 1.6 fmol per tube. 84

Bioinformatics

A Profile HMM were built using HMMER-2.3.2 [126] using training sequence data from GPCRDB

[107]. Available invertebrate 5-HT full-length receptor sequences were aligned and a profile

HMM was constructed for each using hmmbuild. The model underwent calibration using hmm- calibrate, with the default parameters. S. mediterranea GPCR sequence datasets [199] were searched against this subfamily-specific profile HMM using hmmpfam. The resulting hits for were parsed with a Perl script and ranked according to E-value. The top 10 receptors candidates were used as BLASTp [124] queries against the NBCI “nr” database, and surving receptors were similarly searched against the S. mansoni predicted proteome [24].

Phenotypic Assays

Planarian motility was tracked with the Ethovision 3.1 video tracking system (Noldus). Worms were individually tracked in wells, and allowed to settle for 10 minutes proceeded by a 15 minute recording period. Worms were recorded in groups of six in a six-well plate platform. Recorded movement tracks were examined to confirm that the tracking software properly distinguished the organism from background.

Statistical Analysis

Basal cAMP levels were set as a baseline for individual RIA experiments, and cAMP values were normalized with respect to the level of Fk-stimulated cAMP (set at 100%). This allowed us to join datasets from repeated experiments with differing basal cAMP levels due to variance in the quality and yield of individual membrane preparations. One-way analysis of variance

(ANOVA) was used with Tukey’s post hoc test for multiple comparison analysis of cAMP levels associated with different treatments. A two-tailed T-test was used for motility comparison between control and RNAi groups. Significances are reported at P < 0.05, P < 0.01, and P <

0.001. 85

3.5 Acknowledgements

We would like to thank Dr. Peter Reddien and Dr. Alejandro Sanchez Alvarado for providing the pPR244 (pDONRdT7) RNAi vector. This work was funded by a grant from the National

Institutes of Health (NIH R01 AI49162) to TAD and AGM and a USDA Formula Fund Grant

(IOWV-DAY-411-23-08) to TAD.

3.6 Figures and Tables

Profile Blank Fk L Fk+L

Control A B C D

RNA-i (x) A B C' D'

I. If C > A, D ≥ B: L has aggregate stimulatory effect on [cAMP]

If C' < C, D' ≤ D: GPCR "x" is L-responsive and G-s coupled

II. If C ≤ A, D < B: L has aggregate inhibitory effect on [cAMP]

If C' ≥ C, D' > D: GPCR "x" is L-responsive and G-i coupled

III. If C ≈ A, D ≈ B: L has no net effect on [cAMP]

Figure 3.1 RNAi-based deorphanization approach overview.

The general set of experimental outcomes for an RNAi-based deorphanization experiment focused on the Gs and Gi pathway are shown. Letters placed within wells represent assayed cAMP levels for membrane preparations in response to various treatments. Potential results are described with respect to the notion that a given ligand can act on multiple GPCRs that are not necessarily coupled to the same G protein (Gs or Gi). Abbreviations: Fk, forskolin; L, ligand; RNAi (x), RNAi preparation with GPCR “x” knocked down. 86

2000

1500 ) M p (

* * ]

P 1000 *** M A c

[ ***

500

0

Control Fk [100uM] 5-HT [10uM] OCT [100uM] NPF [100uM] GYIRF [100uM]

Fk [100uM] + 5-HT [10uM]Fk [100uM] + OCT [100uM] Fk [100uM] + NPF [100uM] Fk [100uM] + GYIRF [100uM]

Figure 3.2 cAMP ligand screen

Peptide and biogenic amine ligand screen performed against isolated D. tigrina membranes. RIA cAMP outputs are shown as mean SEM, and asterisks represent statistically significant differences compared ± with either control or treatment with Fk alone; * P < 0.05, *** P < 0.001, one-way ANOVA, Tukey post hoc test. Red bars are compared with Fk treatment: octopamine (OCT), GYRIFamide (GYIRF), and neuropeptide F (NPF) all inhibit Fk-stimulated cAMP at 100 uM. The green bar is compared with the control condition: serotonin (5-HT) stimulates basal cAMP. These changes in cAMP are likely GPCR-mediated, and should therefore be altered in a ligand-specific manner by subtraction of particular receptor targets from cell membranes via RNAi. 87

Figure 3.3 Semi-quantitative PCR reveals GtNPR-1 knockdown.

Lane 1 is a 100 bp DNA ladder, lanes 2-5 represent individual GtNPR-1 dsRNA-fed planarians, and lanes 6-9 represent control dsRNA-fed planarians. The bottom band ( 300 bp) is the 18S internal ∼ standard, and the top band ( 400 bp) shows GtNPR-1 expression. The top band disappears in the ∼ experimental group, confirming near abolishment of receptor expression in these worms. Relative band intensities (GtNPR-1/18S rRNA) for GtNPR-1 RNAi group: 0.44 0.15. Relative band intensities for ± control group (band location manually selected): 0.08 0.02. This corresponds to > 80% knockdown ± of GtNPR-1 transcript. 88

Control GtNPR-1 RNAi 150 ns P *** M A c

d 100 e t a l u m i t

s 50 - k F

% 0

C Fk C Fk

Fk+GYIRF Fk+GYIRF

Figure 3.4 RNAi-based GtNPR-1 deorphanization

Treatment groups are Control (control dsRNA) and GtNPR-1 RNAi (GtNPR-1 dsRNA). Treatments are C (control), Fk (10−4 M forskolin), and Fk + GYIRF (10−4 M forskolin and 10−4 M GYIRFamide). Each bar is the mean ( SEM) of 3 separate experiments. Basal cAMP levels were set as a baseline for ± each individual experiment, and cAMP values were normalized with respect to the level of Fk-stimulated cAMP (set at 100%). This allowed us to join datasets with differing basal cAMP levels, due to variance in the quality and yield of individual membrane preparations. Analysis of the raw cAMP values of individual experiments renders the same results. Asterisks indicate significance at P < 0.001 (***), and “ns” indicates no significant difference (one-way ANOVA, Tukey post hoc test). 89

150 ***

] *** ) 100 M *** p (

P M A

c 50 [ **

0

Fk 5-HT Control mCPP

8-OH-DPAT

Figure 3.5 Phamacological response profile of 5-HT receptor agonists.

Treatments applied at 10−4 M: Fk, 5-HT, 08-OH-DPAT, and meta-Chlorophenylpiperazine (mCPP). This control profile shows that 5-HT and 8-OH-DPAT lead to an overall increase in cAMP, while mCPP has an overall inhibitory effect on cAMP levels. Treatment groups are compared to control group. Asterisks indicate significance at P < 0.001 (***) and P < 0.01 (one-way ANOVA, Tukey post hoc test). 90

800

600 ) m m (

y

t 400 i l i t *** o M

200

0

RNAi Control

Figure 3.6 Effects of Smed-SER85 suppression on basal motility.

Total locomotory distance calculated over a 15 minute recording period (n=20). Comparison of control and Smed-SER85 RNAi planarians reveals a > 50% decrease in overall motility (two-tailed t-test, P < 0.001), suggesting this receptor plays a fundamental role in maintenance of planarian locomotory behavior. 91

EXP Treatment Control GtNPR1 RNAi C 62.05 2.46 60.83 1.91 ± ± 1 Fk 102.45 4.06 101.47 1.59 ± ± Fk + G 85.02 1.59 ∗∗∗ 103.03 4.27 ns ± ± C 27.88 0.97 33.54 1.27 ± ± 2 Fk 57.37 2.68 58.78 1.64 ± ± Fk + G 48.67 1.23 ∗∗ 57.89 0.93 ns ± ± C 81.49 4.06 55.16 1.60 ± ± 3 Fk 215.96 10.99 129.79 3.61 ± ± Fk + G 195.63 6.17 ∗∗ 132.60 4.62 ns ± ±

Table 3.1 RNAi-based GtNPR-1 deorphanization cAMP raw values

RIA-determined cAMP values (pM) are provided for 3 separate experiments (mean SEM). Treatments: ± C (control), Fk (Forskolin), Fk + G (Forskolin + GYIRFamide). The amount of isolated membrane differs between experiments, as evidenced by basal cAMP levels. This is in part due to differences in the size, number, and feeding behavior of worm batches used for membrane isolation. Analysis (one- way ANOVA, Tukey) of these raw datasets establishes abolishment of cAMP inhibition brought on by GYIRFamide via GtNPR-1 suppression. For each experimental grouping, Fk is compared to Fk + G. Asterisks indicate significance at P < 0.001 (***), P < 0.01 (**), and “ns” means no significant difference. 92

S. mediterranea 5-HT HMM BLAST S. mansoni PID mk4.013690.00.01 2.90E-108 mk4.005939.01.01 3.90E-85 * Smp 148210 58% mk4.011371.00.01 1.20E-71 mk4.001585.00.01 3.00E-70 * Smp 126730 47% mk4.007388.02.01 9.60E-69 * mk4.029325.00.01 2.30E-65 mk4.000656.10.01 1.10E-61 mk4.004462.02.01 5.10E-51 * Smp 148210 mk4.011006.00.01 3.30E-50 * Smp 126730 mk4.003202.01.01 1.10E-49

Table 3.2 5-HT candidate receptor selection

Top-ranked 5-HT profile HMM hits are shown with their associated E-values. Receptors that survived the BLAST filter are marked with an asterisk (*). The nearest related S. mansoni homolog is shown and % identity (PID) values are printed for the best match to each parasite receptor. Of these two receptors (shown in bold), mk4.001585.00.01 (Smed-SER85) was selected for RNAi-based deorphanization. 93

4 PROF1 Localization in Planaria

The PROF1 receptor family, as a flatworm-specific subset of the notoriously drugable GPCR superfamily, represents a rational target for anthelmintic drug discovery. As an initial step towards deciphering the biological functions of these receptors and assaying their potential exploitability as drug targets, whole mount in situ hybridization (WISH) was used to local- ize transcript expression for two planarian PROF1 receptors. The overarching motivation is that the different PROF1 receptors and receptor subtypes are likely to display unique mRNA expression patterns that will inform us about their cognate ligands and their potential signal transduction roles with respect to flatworm anatomy.

4.1 Planarian In situ hybridization protocol

PROF1 receptor fragments were PCR amplified with the minimal T7 polymerase promoter se- quence appended to the 5’ anti-sense primer. Primers for SMDC2955.1: 5’-cgtgcctacctgattccatt-

3’ (forward) and 5’-TAATACGACTCACTATAGGGTACTtttcctcgttgggagatttg-3’ (reverse).

Digoxygenin (DIG)-labeled antisense riboprobes were synthesized using these PCR products

(Roche). Whole-mount in situ hybridization (WISH) was performed at 55◦C in hybridization solution (50% formamide, 5XSSC, 100 ug/ml yeast tRNA, 100 ug/ml heparin sodium salt,

0.1% Tween-20, 10 mM DTT, 10% dextran sulfate sodium salt). DIG-labeled riboprobe (40 ng/ml) was denatured at 72◦C for 15 min immediately prior to hybridization. BCIP/NBT was used for chromogenic color development, followed by paraformaldehyde fixation and imaging. 94

4.2 PROF1 Localization Results

WISH staining revealed that the expression pattern for SMDC2955.1 (PROF1 group II) is localized to the planarian nervous system. From this we can conclude that PROF1 receptors are endoGPCRs expressed in the planarian nervous system, as opposed to chemosensory-type

GPCRs. Further, their cognate ligands likely belong to either the neuropeptide or families. The former is of greater likelihood, given the relative conservation of aminergic GPCRs in the metazaoa in comparison to peptidergic GPCRs, and the highly- diverged nature of this receptor clade. More concrete evidence for the hypothesis that PROF1 receptors respond to neuropeptide ligands comes in the form of the transcript distributions of very recently identified flatworm-specific peptides [172]. It is also entirely possible that PROF1 ligand(s) have yet to be uncovered. This preliminary result is encouraging, given the likelihood of functional conservation with respect to schistosome biology, as the parasite nervous system and neuromusculature is widely recognized as fertile ground for drug targeting [?, 210].

Figure 4.1 Localization of PROF1 transcripts in S. medterranea

(L) The localization of a single PROF-1 subtype II transcript (labeled SMDC2955.1) is shown along the longitudinal nerve chords (red) and the cerebral ganglion (green). (R) The planarian head region reveals PROF-1 expression in the cerebral ganglion (green). 95

5 CONCLUSIONS

This dissertation describes the successful application of a sophisticated bioinformatics protocol to comprehensively identify G protein-coupled receptors in two important flatworm species, the human parasite Schistosoma mansoni and the model planarian Schmidtea mediterranea.

Transmembrane-focused hidden Markov models were used in combination with a set of filters to mine the genomic assemblies of these organisms for GPCRs. Subsequent rounds of manual gene curation and homology-based searches against the nucleotide assemblies further expanded the total receptor count and improved the quality of the underlying gene models. The final

GPCR dataset houses 116 S. mansoni and 333 S. mediterranea GPCRs. Phylogenetic analysis confirmed the presence of the primary metazoan ‘GRAFS’ families, and revealed large num- bers of lineage-specific receptors. Among these, the flatworm-specific PROF1 receptors and the planarian-specific PARF1 receptors represent the largest and most distinct groupings.

Transmembrane-focused SVMs were trained and used to sub-classify a subset of full-length

Rhodopsin and aminergic receptors. Together, these phylogenetic, homology, and machine learning-based outputs can guide future efforts to identify the cognate ligands of these GPCRs.

To learn more about the potential roles of PROF1 receptors, in situ hybridization was per- formed in S. mediterranea. The transcript distribution of a representative of PROF1 subtype

II was localized to the nervous system. This and other lines of evidence suggest that these receptors are endo-GPCRs that likely respond to flatworm-specific peptide ligands. Analysis of Glutamate GPCRs revealed receptors with non-canonical ligand binding domains that are likely to exhibit atypical pharmacology or respond to other amino-acid derived ligands. To support the adoption of planarians as flatworm parasite models, parasite-planarian sequence 96 pairs were ranked by shared sequence identity.

In light of the unpredictable nature of heterologous approaches to GPCR deorphanization, significant steps were taken to validate an RNAi-mediated loss-of-function approach to char- acterizing GPCRs in their native membrane environment. A membrane preparation protocol was optimized for use with planaria, and RNAi was used in conjunction with a biochemical endpoint assay (cAMP RIA) to associate the planarian receptor GtNPR-1 with its neuropep- tide ligand GYIRFamide. This process also coupled the receptor to its endogenous G protein signaling pathway (Gαi/o). A small ligand screen of biogenic amines and peptides yielded other leads for application of this method. Among these, we further bioionformatically ranked and pursued 5-HT receptors mediating the aggregate stimulatory effects of 5-HT on cAMP levels.

Parallel phenotypic assays revealed that downregulation of Smed-5HT85 significantly decreases planarian motility. 97

APPENDIX A The genome of the blood fluke Schistosoma mansoni

A paper published in the journal Nature 1

Matthew Berriman 2 et al. (Mostafa Zamanian 3,4, and Tim A Day 1,2)

This appendix includes the abstract and selected sections for which I shared primary respon- sibility in both the underlying analysis and write-up. Supplementary Tables 12, 13, 20, and

Supplementary Figure 4 resulted from the work summarized in these sections.

Abstract

Schistosoma mansoni is responsible for the neglected tropical disease schistosomiasis that affects

210 million people in 76 countries. Here we present analysis of the 363 megabase nuclear genome of the blood fluke. It encodes at least 11,809 genes, with an unusual intron size distribution, and new families of micro-exon genes that undergo frequent alternative splicing.

As the first sequenced flatworm, and a representative of the Lophotrochozoa, it offers insights into early events in the evolution of the animals, including the development of a body pattern with bilateral symmetry, and the development of tissues into organs. Our analysis has been informed by the need to find new drug targets. The deficits in lipid that make schistosomes dependent on the host are revealed, and the identification of membrane receptors, ion channels and more than 300 proteases provide new insights into the biology of the life cycle and new targets. Bioinformatics approaches have identified metabolic chokepoints, and a chemogenomic screen has pinpointed schistosome proteins for which existing drugs may be

1Reprinted with permission of Nature: Vol 460 | 16 July 2009 | doi:10.1038/nature08160 2Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK 3Department of Biomedical Sciences, Iowa State University, Ames, IA USA 4Interdepartmental Neuroscience Program, Iowa State University, Ames, IA USA 98 active. The information generated provides an invaluable resource for the research community to develop much needed new control tools for the treatment and eradication of this important and neglected disease.

GPCRs, ligand-gated and voltage-gated ion channels

GPCRs, ligand-gated and voltage-gated ion channels are targets for 50% of all current phar- maceuticals [211]. At least 92 putative GPCR-encoding genes are present (Supplementary

Table 12), the bulk (82) of which are from the rhodopsin family. The largest groups are the

α-subfamily (30), which includes amine receptors, and the β-subfamily (24), which contains neuropeptide and hormone receptors. The diversity of the former subfamily underlines the wide range of potential amine/neurotransmitter reactivities of schistosomes, but the tentative identities assigned need to be confirmed by functional studies, as has already been performed for a [139]. Schistosomes detect chemosensory cues, but a large, unique clade of the mediating receptors was not found. However, the 26 ‘orphan’ rhodopsin family

GPCRs may include proteins with this role. Outside the large rhodopsin family, representatives from each of the smaller families of GPCRs, glutamate family (2), frizzled family (3), and the secretin/adhesion family (4) are present.

Each of the three major ligand-gated ion channel families- the Cys-loop family, glutamate- activated cation channels, and ATP- gated ion channels- are represented in the schistosome genome. Of the 13 Cys-loop family ligand-gated ion channels, nine encode nicotinic acetyl- choline receptor subunits (Supplementary Fig. 4 and Supplementary Table 13). The remaining four anion channel subunits group among GABA (c-aminobutyric acid), glycine and glutamate receptors, but it is not possible to assign precise identities. The seven schistosome glutamate- activated cation channels comprise at least two sequences from each of the three common sub-groupings. The presence of a functional P2X receptor for ATP- mediated signalling in schistosomes was already known [212], and the data here show at least four more. 99

Voltage-gated ion channels generate and control membrane potential in excitable cells, and are central to ionic homeostasis. There are examples of successful drugs targeting voltage-gated sodium, potassium and calcium channels [213]. Although voltage-gated sodium channels were not found, at least 41 members from each of the major six transmembrane (6TM) and four transmembrane (4TM) families of potassium channels (Supplementary Table 14) are present.

The 6TM voltage-gated potassium channel family (20 members) is the largest, including the well-characterized Kv1.1 channel found in nerve and muscle of adult schsitosomes [214]. Other classes of 6TM potassium channels include the KQT channels, large calcium-activated channels, small calcium-activated channels, and cyclic-nucleotide-gated groups. This last group, compris- ing eight members, is most often associated with signal transduction in primary olfactory and visual sensory cells (Caenorhabditis elegans has only five; [215]). S. mansoni possesses six 4TM inward-rectifying TWIK-related potassium channels (about 46 in C. elegans). There are four

α and two β subunits of voltage-gated calcium channels in schistosomes, and a β subunit is implicated as a molecular target of the anti-schistosomal praziquantel [216].

Neuropeptides

Thirteen putative neuropeptides were identified (Supplementary Table 20), indicating that schistosomes may have much greater diversity than the two described previously. Apart from the neuropeptide Fs (NPFs), most are apparently restricted to the Platyhelminthes- their ab- sence from humans making them a credible source of anthelmintic drug leads. The predicted product of npp-6 (the amidated heptapeptide AVRLMRLamide) resembles molluscan myomod- ulin, whereas the two NPP-13 peptides show 100% carboxy-terminal identity with vertebrate neuropeptide-FF-like peptides (peptides ending with a C-terminal sequence PQRFamide); nei- ther of these has previously been reported in any non-vertebrate organism. The discovery of a second NPF (NPP-21b) as well as the known NPP-21a [201] is reminiscent of the vertebrate neuropeptide Y (NPY) superfamily, and strengthens the argument that NPFs and NPYs have a common ancestry. 100

GeneID Top BLAST % ID E-Value Final Annotation TreeFam

Glutamate Activated Cationic Channels NMDA Smp 126350 RF:XP 321646.2 56 4.80E-180 , NMDA TF314731 Smp 147390 RF:XP 971730.1 36 4.60E-121 Glutamate Receptor, NMDA TF314731 Smp 126350 RF:XP 321646.2 56 4.80E-180 Glutamate Receptor, NMDA TF314731 Smp 147390 RF:XP 971730.1 36 4.60E-121 Glutamate Receptor, NMDA TF314731 Kainate Smp 140920 GB:BAC06343.1 64 6.80E-120 Glutamate Receptor, Kainate TF315232 Smp 147430 RF:XP 966711.1 54 9.50E-136 Glutamate Receptor, Kainate TF315232 Smp 153780 RF:XP 611666.2 37 3.90E-124 Glutamate Receptor, Kainate TF315232 AMPA Smp 133920 RF:XP 787239.1 30 2.70E-034 Glutamate Receptor, AMPA TF315232 Smp 023290 RF:XP 787239.1 29 3.00E-039 Glutamate Receptor, AMPA TF315232

Cys-Loop Family Nicotinic Subunits Smp 132070.1 GB:ABA60381.1 51 2.80E-086 nAChR subunit TF315605 Smp 132070.2 GB:ABA60381.1 51 2.80E-086 nAChR subunit TF315605 Smp 031680 GB:AAR84361.1 99 0 nAChR subunit (ShAR1alpha) TF315605 Smp 037960 GB:ABA60381.1 50 2.80E-105 nAChR subunit TF315605 Smp 142690 GB:ABA60385.1 42 1.20E-081 nAChR subunit (ShAR1beta2) TF315605 Smp 142700 GB:ABA60381.1 49 3.10E-102 nAChR subunit TF315605 Smp 157790 GB:ABA60385.1 57 5.80E-098 nAChR subunit TF315605 Smp 176310 GB:ABA60381.1 48 3.80E-095 nAChR subunit TF315605 Smp 180570 GB:ABA60386.1 37 6.20E-062 nAChR subunit TF315605 Smp 130390 GB:ABA60386.1 76 4.40E-045 nAChR subunit TF315605 Smp 139330 GB:AAR84362.1 97 0 nAChR subunit (ShAR1beta) TF315605 Smp 101990 RF:NP 000070.1 30 6.30E-025 nAChR subunit TF315605 Non-nicotinic Cys-loop Receptor Subunits Smp 015630 GB:AAM23270.1 42 2.70E-065 Cys-loop LGIC Subunit TF315453 Smp 096480 SP:O75311 37 2.30E-068 Cys-loopLGIC Subunit TF315453 Smp 099500 RF:XP 974894.1 36 3.60E-031 Cys-loop LGIC Subunit TF315453 Smp 176730 RF:NP 001024077.1 51 1.20E-028 Cys-loop LGIC Subunit TF315453 Smp 104890 SP:P57695 47 2.40E-062 Cys-loop LGIC Subunit TF315453

Table A.1 Summary of Ligand-Gated Ion Channels (LGIC) 101

GeneID Top BLAST % ID E-Value Final Annotation TreeFam

ATP gated ion channels P2X Smp 099640 GB:CAH04147.1 48 2.00E-097 P2x Receptor Subunit TF328633 Smp 176860 GB:CAH04147.1 46 1.80E-073 P2x Receptor Subunit TF328633 Smp 179310 GB:CAH04147.1 100 5.60E-164 P2x Receptor Subunit TF328633 Smp 114060 GB:AAX11263.1 43 1.60E-031 P2x Receptor Subunit TF328633 Smp 089780 RF:NP 990079.1 37 5.50E-024 P2x Receptor Subunit TF328633 Smp 005030 GB:AAX11263.1 100 1.20E-035 P2x Receptor Subunit TF328633

Other Cyclic-nucleotide-gated cation channel Smp 056560 RF:XP 967432.1 50 1.10E-055 TF318250 Smp 152480 RF:XP 395071.2 54 5.80E-146 TF318250 Smp 152500 RF:XP 314248.2 66 7.50E-066 TF318250 Smp 155040 RF:XP 554755.1 43 8.90E-115 TF318250 Smp 194700 SP:Q90805 53 6.50E-166 TF318250 Amiloride-sensitive sodium channel Smp 058270 TF330663 Smp 083980 TF317359 Smp 175020 TF317359 Smp 052630 GB:AAK20896.1 24 6.90E-014 TF317359 Smp 162680 GB:AAF80601.1 25 2.20E-017 TF317359 Smp 093210 GB:AAF80601.1 22 2.60E-026 TF317359 Smp 180260 GB:AAK20896.1 26 2.20E-012 TF317359

Table A.2 Summary of Ligand-Gated Ion Channels (LGIC)- Continued 102

GeneID Annotation TreeFam

6TM K+ Channels SMP 081250 voltage-gated potassium channel TF313103 SMP 129380 voltage-gated potassium channel TF313103 SMP 136440 voltage-gated potassium channel TF313130 SMP 035160 voltage-gated potassium channel TF313103 SMP 035180 voltage-gated potassium channel TF315186 SMP 035870 voltage-gated potassium channel TF313103 SMP 146620 voltage-gated potassium channel TF313130 SMP 148670 voltage-gated potassium channel TF313130 SMP 151810 voltage-gated potassium channel TF313130 SMP 152350 voltage-gated potassium channel TF313130 SMP 063930 voltage-gated potassium channel SKv1.1 TF313103 SMP 157490 voltage-gated potassium channel TF313130 SMP 069240 voltage-gated potassium channel TF313103 SMP 160780 voltage-gated potassium channel TF313103 SMP 161140 voltage-gated potassium channel TF313130 SMP 163090.1 voltage-gated potassium channel TF352511 SMP 163090.2 voltage-gated potassium channel TF352511 SMP 094560 voltage-gated potassium channel TF313103 SMP 121190 voltage-gated potassium channel TF313103 SMP 194710 voltage-gated potassium channel TF313103 SMP 144310 voltage-gated potassium channel, KQT TF315186 SMP 144310 voltage-gated potassium channel, KQT TF315186 SMP 008170 calcium-activated potassium channel, large conductance TF313103 SMP 161450 calcium-activated potassium channel, large conductance TF315015 SMP 166620 calcium-activated potassium channel, large conductance TF313947 SMP 166910 calcium-activated potassium channel, large conductance TF314283 SMP 156150 calcium-activated potassium channel, small conductance TF315015 SMP 056560 cyclic-nucleotide-gated cation channel TF318250 SMP 152480 cyclic-nucleotide-gated cation channel TF318250 SMP 152500 cyclic-nucleotide-gated cation channel TF318250 SMP 155040 cyclic-nucleotide-gated cation channel TF318250 SMP 194700 cyclic-nucleotide-gated cation channel TF318250 SMP 153100 cyclic-nucleotide-gated cation channel, hyperpolarization-activated TF318250 SMP 168880 cyclic-nucleotide-gated cation channel, hyperpolarization-activated TF318250 SMP 174860 cyclic-nucleotide-gated cation channel, hyperpolarization-activated TF318250

Table A.3 Summary of Voltage-Gated and Other Ion Channels 103

GeneID Annotation TreeFam

4TM K+ Channels SMP 127170 twik family of potassium channels-related TF316115 SMP 034850 twik family of potassium channels-related TF313947 SMP 141570 twik family of potassium channels-related TF316115 SMP 046640 twik family of potassium channels-related TF316115 SMP 151120 twik family of potassium channels-related TF313947 SMP 155970 twik family of potassium channels-related TF313947

Voltage-Gated Ca2+ Channels SMP 020270 high voltage-activated calcium channel Cav1 TF312805 SMP 159990 high voltage-activated calcium channel TF312805 SMP 020170 high voltage-activated calcium channel Cav2A TF312805 SMP 004730 high voltage-activated calcium channel Cav2B TF312805 SMP 135140 high voltage-activated calcium channel beta subunit CavB1 TF316195 SMP 141660 high voltage-activated calcium channel beta subunit CavB2 TF316195 SMP 141780 Four domain-type voltage-gated ion channel alpha-1 subunit TF312843

Voltage-Gated Cl- Channels SMP 058360.1 chloride channel protein TF313867 SMP 058360.2 chloride channel protein TF313867 SMP 071970 chloride channel protein TF313867

Table A.4 Summary of Voltage-Gated and Other Ion Channels- Continued 104

APPENDIX B Yeast GPCR Assay

This appendix contains data related to progress towards the yeast-based deorphanization of a Schistosoma mansoni neuropeptide-like GPCR (Smp 011940). Figure B.1 provides a com- parison of the endogenous (left) and modified (right) pathways that allow for these strains to be used for receptor deorphanization. The first candidate receptor chosen for this approach was originally identified in a regular expression-driven ORF screen of GPCR homologues in S. mansoni, and exhibits a great deal of sequence similarity with an allatostatin-responsive GPCR

(Figure B.2).

α-factor Ligand

GTP GDP GpaI (α) GTP GDP GpaI - Gα SST2

Ste4 (β) Ste18 (γ) Ste4 (β) Ste18 (γ)

Ste11 (MEKK) Ste11 (MEKK)

Ste7 (MEK) Ste7 (MEK)

Fus3 (MAPK) Fus3 (MAPK)

Far1 Ste12 P Ste12 P

PRE PRE

FUS1 FUS1 Reporter: HIS Nucleus Nucleus

Figure B.1 Heterologous GPCR expression in yeast. 105

Figure B.2 Smp 011940 sequence alignment.

Smp 011940 (0029329) is aligned with an allatostatin receptor from Periplaneta americana (AAK52473). Yellow and green signify sequence identity and sequence similarity, respectively.

Full-length GPCR sequence characterization with 5’,3’-RACE PCR.

5’,3’-Rapid Amplification of cDNA Ends (RACE) PCR was used to elucidate full-length gene structure, and to establish with certainty the coding sequence of the GPCR to be heterologously expressed. Poly A+ RNA was extracted from adult S. mansoni tissue using TRI Reagent

(Sigma) and Dynabead’s mRNA Purification Kit (Dynal). This mRNA was used to synthesize cDNA with SMART RACE cDNA Amplification Kit (BD Biosciences). Gene specific primers were designed from ORF sequences and used for 5’ and 3’ RACE PCR in conjunction with the cDNA templates. Reactions were visualized on 1.2% agarose gel. Discrete amplicons were excised from gel, purified, and subcloned into pGEM-T Easy Vector (Promega). The resulting construct was transformed into JM109 competent Escherichia coli (Promega). Individual clones identified by a blue-white colony screen were filtered with colony PCR, and positive clones were cultured in LB broth overnight. Plasmic DNA was purified with the Wizard Plus SV Miniprep kit (Promega) and sequence confirmed. The final consensus sequence is shown in Figure B.3. 106

Sub-cloning receptor coding sequence into yeast expression vector

The Smp 011940 coding sequence was sub-cloned into the yeast expression vector Cp4258 (2µ ori AmpR LEU2 REP3 PGK-promoter-MFα1-(1-89)) [217, 218]. In the resulting construct,

GPCR expression is driven by the PGK promoter, while the 89 amino acid Mfα1-leader se- quence is fused to the receptor N terminus to promote GPCR export to the cell membrane [219].

Smp 019940 (in pcDNA3.1) was used as a template in a two-step PCR that added a 6-mer spacer and NcoI and XbaI restriction sites to the 5’ and 3’ ends of the coding sequence, respec- tively. The complete coding sequence was then ligated into Cp4258. The resulting construct was transformed into JM109 competent Escherichia coli (Promega). Individual clones were screened with a colony PCR, and positive clones were cultured in LB broth overnight. Plasmid

DNA was purified with the Wizard Plus SV Miniprep kit (Promega) and sequenced. A perfect clone was identified, and used for yeast transformation.

Transformation of yeast with recombinant plasmid.

Cy14083 cells were grown in 10 mL YPD broth overnight (30◦C, 250 RPM), achieving an optical density (OD600) of about 1. The culture was scaled up by inoculation in 300 mL YPD in a 1 L flask until OD600 measures fell in the 0.3-0.5 range. The sample was centrifuged at

4,000 x g for 5 min and resuspended in 10 mL H20. Another centrifugation step was performed at 5,000 x g for 5 min, the pellet was resuspended in 10 mL buffered lithium solution (1 mL

◦ 1M LiAc, 1 mL 10X TE and 8 mL H20) and incubated for 1 hr at 30 C. Carrier ssDNA (Salmon sperm) was boiled for 10 minutes and left to cool on ice for 45 min. 200 ug carrier ssDNA and 1 ug of recombinant plasmid (Smp 0119949 Cp4258) was added to a sterile 1.5 k mL tube. 200 uL of yeast suspension was added, followed by 1.2 mL of PEG 4000 solution

(1 mL 10X TE, 1 mL 10X LiAc, 8 mL 50% PEG 4000). The mixture was quickly vortexed and incubated for 30 min at 30◦C (250 RPM). Cells were heat shocked at 42◦C for 15 min and placed in the microcentrifuge for 5 sec. 500 uL was diluted in 1 mL 1X TE, and 200 uL of the dilution were spread on selective (-LEU) plates. Plates were incubated for 2-4 days at 30◦C until transformants appeared. Colonies were inoculated in 2 mL -LEU broth and incubated for 107

20-24 hrs at 30◦C. Yeast genomic DNA was isolated and used in a PCR screen with primers specific for Smp 019940. Four positive colonies were propagated and stored for later use in functional assays.

Figure B.3 Full-length gene transcript of Smp 011940.

As revealed by RACE PCR, the Smp 011940 transcript is 2044 nucleotides in length and contains a 1248 nucleotide open reading frame encoding a 416 amino acid peptide. 108

APPENDIX C PROF1 Primers and RT-PCR

S. mansoni and S. mediterranea PROF1 primers

> S. mansoni:

SMP084270

F: 5’ atgataagtatgaactcaagtgaattaatttttactg 3’ (Tm=57)

R: 5’ tcagtaattgtggcctgatacaacgct 3’ (Tm=58)

Predicted length: 1,107 bp (full-length)

SMP041880

F-primer: 5’ atgttacataatactacaactatagattatagtcagttagt 3’ (Tm=59)

R-primer: 5’ tcaatcttcagttctttggggtctatgca 3’ (Tm=59)

Predicted length: 1,026 bp (full-length)

> S. mediterranea: (Tm: 59-60 C for all primers)

SMDC6472.1

F: 5’ tgcaacaaatggtgacgttt 3’ / R: 5’ ggcaaggtagagatggcaaa 5’

Predicted length: 534

F: 5’ tgtgccaaaaagaactcctg 3’ / R: 5’ taaaaccggaagctgtgcat 5’

Predicted length: 1007

SMDC8510.2

F: 5’ cagcccttgggatttattga 3’ / R: 5’ gcggccaaaatatagcaaaa 5’

Predicted length: 446

F: 5’ gcattcgttttcccagtgat 3’ / R: 5’ cttggacttgtgggtgcttt 5’

Predicted length: 956

SMDC8510.3A

F: 5’ ccaaactccatgaccgaact 3’ / R: 5’ aggatccgcaaaaggagaat 5’

Predicted length: 486

F: 5’ aacaacgggcaatactctcg 3’ / R: 5’ cgcaatattttgctcagcttc 5’

Predicted length: 1009 109

SMDC14380.2

F: 5’ ccaaactccatgaccgaact 3’ / R: 5’ aggatccgcaaaaggagaat 5’

Predicted length: 493

F: 5’ aacaacgggcaatactctcg 3’ / R: 5’ cgcaatattttgctcagcttc 5’

Predicted length: 1026

SMDC504.1

F: 5’ taggagctctggcatttgct 3’ / R: 5’ gccaataaaccgcagttagc 5’

Predicted length: 433

F: 5’ tcgacttgtcaaatgtcaaagg 3’ / R: 5’ gaggataatgtcggttttgaaca 5’

Predicted length: 1042

SMDC1889.2

F: 5’ gccaattgtaccatgtgctg 3’ / R: 5’ aaaattcgaatggctgatcg 5’

Predicted length: 425

F: 5’ tccttgggatttattggttcc 3’ / R: 5’ aaaattcgaatggctgatcg 5’

Predicted length: 1039

SMDC15273.1

F: 5’ aggcgaactacgcgttcata 3’ / R: 5’ ccccataagtccacgaagaa 5’

Predicted length: 411

F: 5’ ggcgaactacgcgttcatag 3’ / R: 5’ tcccgttaaaatacgaacgaa 5’

Predicted length: 1031

SMDC7587.4

F: 5’ tgtggtgcatttctcatggt 3’ / R: 5’ ttgcaaaaactaacgccaca 5’

Predicted length: 460

F: 5’ tgatctatatatgccaagatcagc 3’ / R: 5’ aaagattttgttaatttcgtcagaaaa 5’

Predicted length: 1001

SMDC2411.1

F: 5’ ttgtcgctgtggttggaata 3’ / R: 5’ aattttcgagctcctgttgc 5’

Predicted length: 499

F: 5’ ttgtcgctgtggttggaata 3’ / R: 5’ aacgaccggaagtggattc 5’

Predicted length: 1059

SMDC2955.1

F: 5’ acgataagtggggcattgag 3’ / R: 5’ tttcctcgttgggagatttg 5’

Predicted length: 495

F: 5’ cgtgcctacctgattccatt 3’ / R: 5’ tttcctcgttgggagatttg 5’

Predicted length: 949 110

SMDC5598.1A

F: 5’ tttcctgcatttggtcttcc 3’ / R: 5’ cgcaatttccctgtcgtaat 5’

Predicted length: 500

F: 5’ ccgttaatgagttgccttgg 3’ / R: 5’ cgaaagttcggataacgtgaa 5’

Predicted length: 905

SMDC4570.1

F: 5’ ttcacgattctgttggcttg 3’ / R: 5’ tgatactgatgctggggtca 5’

Predicted length: 505

F: 5’ caagtgatgaatccccgaat 3’ / R: 5’ atgctggggtcaccaataac 5’

Predicted length: 904

SMDC5598.1B

F: 5’ ggtgagtagggccctgtgta 3’ / R: 5’ tacgccaagatgtggggtat 5’

Predicted length: 528

F: 5’ ggtgagtagggccctgtgta 3’ / R: 5’ ggtagcatatgaacattaaggtgtg 5’

Predicted length: 942 111

RT-PCR Results

Performed using primer set two (400-600 bp products)

Reaction component V (uL) Primer 1 (10 uM) 3.0 Stage Temp Time Primer 2 (10 uM) 3.0 Denature 94.0 ◦C 2:00 cDNA template 4.0 Denature 94.0 ◦C 0:30 10X Buffer 5.0 Anneal 55.0 ◦C 0:30 MgCl2 (50 mM) 2.0 Extented 72.0 ◦C 1:00 dNTP (10 mM) 1.0 x45 H2O 31.5 Hold 4.0 ◦C Platinum Taq 0.5 ∞ Total volume 50.0

Table C.1 PROF1 RT-PCR conditions

PCR reaction (left), PCR thermocycler conditions (right). Template: S. mediterranea cDNA

Figure C.1 S. mediterranea PROF1 RT-PCR.

Lanes 1 and 15 are low DNA mass ladder. Lanes 2-14 are visualized PCR reactions with PROF1 primers corresponding to (in order): SMDC6472.1, SMDC8510.2, SMDC14380.2, SMDC8510.3A, SMDC, SMDC1889.2, SMDC15273.1, SMDC7587.4, SMDC2411.1, SMDC2955.1, SMDC5598.1A, SMDC4570.1, and SMDC5598.1B. Although spurious products can be seen in some lanes, correct-sized amplicons were present for all putative PROF1 receptors that were PCR amplified. 112

APPENDIX D RNAi vector

pUC Ori

pPR244 (pDONRdT7) ccdB KanR

T7 term = 560-607Figure D.1 RNAi vector pPR244 (pDONRdT7). T7 = 612-631 AttP2= 760-992 Two T7 RNAAttP1= polymerase 2941-3172 promoters are flanked by two class I T7 transcriptional terminators to improve transcriptionT7= efficiency. 3193-3212 A gene fragment is subcloned between the attP1 and attP2 recombination sites, T7 term= 3217-3262 and the construct is;:: transformedSo 2"1- into RNase-III deficient bacteria with IPTG-inducible T7 polymerase activity to generateApprox. 500 target-specific ng of purified plasmid dsRNA. pPR244 was blotted onto Whatman filter paper. This plasmid should ONLY be transformed into DB3.1 cells or ccDB Survival2T1R Competent Cells (Invitrogen) and maintained with Kanamycin (2sug/ml) and Chloramphenicol (sOug/ml).

A Gateway BP reaction must be used to insert cDNAs. After the BP reaction, transform into a standard cloning strain (ie. DHsalpha) and select ONLY with Kanamycin . ./

The circle indicates the location of where the DNA was placed onto the filter paper. Just cut out the circle and place it in a microfuge tube with 100 ul or so of TE Buffer. The DNA will elute from the filter paper into the TE. After a couple of hours you can use a few ul to transform the bacteria and you should be on your way. Remem ber to use ON LY the correct bacterial strain plus both ofthe antibiotics (Kan + Chloramphenicol). You have a ton of DNA blotted onto that filter paper so you should have no problem. Best Wishes! 113

APPENDIX E cAMP Assay Optimization

1000

800 )

M 600 p (

] P M A c

[ 400

200

0

Control FS [10uM] DMSO [5uL] FS [100uM] 5-HT[10uM] DMSO [10uL] NPF [100uM] FS [10uM] + NPF FS [10uM] + 5-HT FS [100uM] + NPF FS [100uM] + 5-HT

Figure E.1 Membrane preparation optimization with cAMP RIA.

This sample optimization experiment was performed using a D. tigrina membrane preparation, after alteration of a S. mansoni membrane preparation protocol. Establishes that DMSO is safe to use as a solvent at the concentrations used in experiments. Establishes that 10−4 M forskolin provides a more robust cAMP range, and allows for the measurement of NPF inhibition of cAMP with better resolution than forskolin used at 10−5 M. Confirms significant 5-HT stimulation of cAMP at 10−5 M, even as 10−4 M 5-HT membrane treatments have been previously published. 114

APPENDIX F PERL Code

This appendix contains Perl source code used in the completion of this work. This includes scripts for extracting transmembrane domains, generating SVM feature vectors, calculating position-specific entropy in multiple sequence alignments, and parsing Blast output files.

TM extraction

#Author: Mostafa Zamanian ([email protected])//Iowa State University

#This script streams in two input files:

#1) FASTA-formatted sequence file (with no special characters)

#2) Corresponding HMMTOP output file (generated from 1)

#And generates two output files:

#1) FASTA-formatted sequence file housing TM-only sequences

#2) CVS-formatted spreadsheet file housing TM frequency distribution

#!/usr/bin/perl use strict; use warnings;

#Processes FASTA-formated sequence file open(FILE, "X.fasta") or die("Unable to open file"); my $a; while ($_ = ) {

if ($_ =~ />/) {

chomp $_;

$_=$_."&";

}

chomp $_; 115

$_ =~s/\s//g;

$a.=$_;

}

$a =~s/>/\&>/g;

$a = $a."&";

#Processes corresponding HMMTOP output file open(FILE2, "X_HMMTOP.out") or die("Unable to open file"); my $b; while ($_ = ) {

chomp $_;

$_ = $_."\*";

$b.=$_;

} my $b2 = $b;

#Creates output files open(OUTFILE, ">X_TMextract.fasta"); open(OUTFILE2, ">X-TMcount.csv"); my @TMcounter;

#Regular expression used to parse and identify headers-sequences pairs while ($a =~ /\>(.*?)\&([A-Z|\*]{30,})\&/g){

my $geneid = $1;

my $seq = $2;

my $geneid2 = substr($geneid, 0, 99);

#Regular expresion used to identify the HMMTOP coordinate line

#Extract and concatenate TM domains if TM-count is 3-15

if ($b2 =~ /($geneid2)\s+[A-Z]{2,3}\s+([3-9]|1[0-5])\s+(\d+.*?)\*/g){

my $TMtemp = $2;

push @TMcounter, $TMtemp;

my $COOR = $3;

my $i = 1;

my @TMaa = ’’;

while ($COOR =~ /(\d+)\s+(\d+)/g){

my $CR1 = $1;

my $CR2 = $2;

#+-5 aa 116

my $TMaatemp = substr ($seq,$CR1-6,$CR2-$CR1+11);

push @TMaa, $TMaatemp;

$i = $i+1;

}

while ($geneid =~ /(.*?)\s/) {

$geneid = $1;

}

#Prints TM-only sequences to FASTA file

print OUTFILE ">",$geneid,"-TM:",$TMtemp,"\n";

print OUTFILE join "", @TMaa;

print OUTFILE "\n";

}

}

#Hash that maps TM numbers to their frequency my %TM_counters; for (@TMcounter){

$TM_counters{$_}++;

}

#Prints hash in CSV table format while (my ($key, $value) = each %TM_counters) {

my $output = "$key ===> $value";

print $output, "\n";

print OUTFILE2 "$key,$value";

print OUTFILE2 "\n";

} close OUTFILE;close OUTFILE2;

TM extraction: joining of TM domains with character barrier

#Adds special character "-" between concatenated TM domains

#Required for SVM feature vector generation for model T7

#Swap first line for second in previous script: my $TMaatemp = substr ($seq,$CR1-6,$CR2-$CR1+11); my $TMaatemp = "-".substr ($seq,$CR1-3,$CR2-$CR1+5)."-"; 117

SVM dipeptide frequency feature vector construction: full-length model (FL)

#This script streams one input file (FASTA):

#1) Sequence file (with no special characters) with full-length seqs

#And generates one output file (txt):

#1) 400-element frequency vectors taken over each full-length sequence

#!/usr/bin/perl use strict; use warnings;

#Declare amino acid alphabet in array form my @aa1 = ("A","C","D","E","F","G"..."Q","R","S","T","V","W","Y"); my @aa2 = ("A","C","D","E","F","G"..."Q","R","S","T","V","W","Y"); my $z; my @seqnames; my @seqs;

#Read in sequence file while ($_ = <>) {

if ($_ =~ />/) {

$_=$_."*";

}

chomp $_;

$z.=$_;

}

$z =~s/\s//g;

$z =~s/>/\*>/g;

$z = $z."*"; my $z2=$z; my $z3=$z;

#Hash associating amino acids with numbers (1-20) my %AA_numbering = (

’A’ => ’1’, ’C’ => ’2’, ’D’ => ’3’, ’E’ => ’4’, ’F’ => ’5’,

’G’ => ’6’, ’H’ => ’7’, ’I’ => ’8’, ’K’ => ’9’, ’L’ => ’10’,

’M’ => ’11’, ’N’ => ’12’, ’P’ => ’13’, ’Q’ => ’14’, ’R’ => ’15’, 118

’S’ => ’16’, ’T’ => ’17’, ’V’ => ’18’, ’W’ => ’19’, ’Y’ => ’20’,);

#Generate duplet hash by first generating a duplet array my @duplet_array; my $i=0; for ($i=0; $i<20; $i++){

my $one = $aa1[$i];

my $j=0;

for ($j=0; $j<20; $j++){

my $two = $aa2[$j];

my $onetwo = $one.$two;

push @duplet_array, $onetwo;

}

}

#Create a seed hash, mapping every aa pair to numbers 1-400 my %AA_duplet_numbering ; my $k=1; for (@duplet_array) {

$AA_duplet_numbering{$_}=$k;

$k = $k +1;

}

#Extract filename my $ARGV2=’’; while ($ARGV =~/(.*?)\.fasta/g){

$ARGV2=$1;

}

#Declares feature vector output file open(OUTFILE, ">$ARGV2\_FV.txt");

#***

#Reads in and processes individual sequences while ($z2 =~ />(.*?)\*([ACDEFGHIKLMNPQRSTVWY-]+)\*/g) {

my $seqnumber = $1; #sequence label

my $sequence1 = $2; #raw sequence

my $seq_length = length($sequence1); #sequence length

#Reads in aa duplets in both frames and adds them to vector

my @duplets; 119 while ($sequence1 =~ /([ACDEFGHIKLMNPQRSTVWY]{2})/g) {

push @duplets, $1;

} my $sequence2 = substr ($sequence1,1,); while ($sequence2 =~ /([ACDEFGHIKLMNPQRSTVWY]{2})/g) {

push @duplets, $1;

}

#Creates hash that maps duplets to their frequency my %duplet_counter; for (@duplets) {

$duplet_counter{$_}++;

} my @printarray2; while (my ($key, $value) = each %duplet_counter) {

my $output = "$key ===> $value";

push @printarray2, $output;

}

#Sorts duplets alphabetically my @sorted2 = sort {$a cmp $b} @printarray2; print join "\n", @sorted2; #prints progress

#Creates ordered frequency array my @AA_duplet_ordered; for (@duplets) {

push @AA_duplet_ordered, $AA_duplet_numbering{$_};

}

#Creates ordered hash that maps duplet numbers to frequency print "\n","AA_duplet_numb => frequency:","\n"; my %duplet_counterb; for (@AA_duplet_ordered) {

$duplet_counterb{$_}++;

} print OUTFILE "x "; #where x is an integer- typically 0

#Orders hash keys into new array my @duplet_counterb_temp = sort {$a <=> $b} (keys %duplet_counterb); for (@duplet_counterb_temp) { 120

my $key = $_; #dipeptide key

my $value = $duplet_counterb{$_}; #frequency value

my $valuef = $value/($seq_length-1); #converts frequencies to proportions

print OUTFILE "$key:$valuef "; #prints final fv values to output file

}

print OUTFILE "\n";

} close (OUTFILE);

SVM dipeptide frequency feature vector construction: T1 model

#This script streams one input file (FASTA):

#1) Sequence file (with no special characters) with full-length seqs

#And generates one output file (txt):

#1) 400-element dipeptide frequency vector for each sequence

#FVs calculated over length of concatenated TM domains

#Same source code as previous script

SVM dipeptide frequency feature vector construction: T7 model

#This script streams one input file (FASTA):

#1) TM-only sequence file with TM borders marked

#And generates one output file (txt):

#1) 2800-element dipeptide frequency vector for each sequence

#Ordered concatenation of 400-element FVs calculated for each TM

#Replace everything past the line "#***" in FL script

#Reads in and processes individual sequences while ($z2 =~ />(.*?)\*([ACDEFGHIKLMNPQRSTVWY-]+)\*/g){

my $seqnumber = $1;

my $sequence1 = $2;

#Reads in individual TM partitions in order

my @sequencepartitions;

while ($sequence1 =~ /([ACDEFGHIKLMNPQRSTVWY]+)/g){

push @sequencepartitions, $1; 121

} my $x=0; for (@sequencepartitions) {

print "\n","Duplets","\n";

my @duplets =();

my $sequence1a = $_;

my $seq_length1a = length($sequence1a);

while ($sequence1a =~ /([ACDEFGHIKLMNPQRSTVWY]{2})/g){

push @duplets, $1;

}

my $sequence2a = substr ($sequence1a,1,);

while ($sequence2a =~ /([ACDEFGHIKLMNPQRSTVWY]{2})/g){

push @duplets, $1;

}

#Creates hash that maps duplets to their frequency

my %duplet_counter =();

for (@duplets) {

$duplet_counter{$_}++;

}

my @printarray2 =();

while (my ($key, $value) = each %duplet_counter) {

my $output = "$key ===> $value";

push @printarray2, $output;

}

my @sorted2 =();

my @sorted2 = sort {$a cmp $b} @printarray2; #sorts duplets alphabetically

#Creates ordered hash that maps duplet numbers to frequency

my @AA_duplet_ordered =();

for (@duplets) {

push @AA_duplet_ordered, $AA_duplet_numbering{$_};

}

#Creates ordered hash that maps duplet numbers to frequency

print "\n","AA_duplet_numb => frequency:","\n";

my %duplet_counterb =();

for (@AA_duplet_ordered) { 122

$duplet_counterb{$_}++;

}

my @printarray2f =();

#Orders hash keys into new array

my @duplet_counterb_temp =();

my @duplet_counterb_temp = sort {$a <=> $b} (keys %duplet_counterb);

for (@duplet_counterb_temp) {

my $key = $_+(400*$x);

my $value = $duplet_counterb{$_};

my $valuef = $value / ($seq_length1a -1);

my $output = "$key ===> $valuef";

push @printarray2f, $output;

print OUTFILE "$key:$valuef ";

}

$x=$x+1;

} print OUTFILE "\n";

} close (OUTFILE);

Multiple Sequence Alignment Shannon Entropy Calculator

#This script streams in one input file:

#1) FASTA-formatted multiple sequence alignment

#And generates one file:

#1) CSV formatted-file containing MSA positions and corresponding entropy values

#!/usr/bin/perl use strict; use warnings;

#Spreadsheet header open OUTFILE, (">entropy.csv"); print OUTFILE "Position",",","Entropy","\n";

#Read in MSA (fasta) and amass into single string 123 my $z; while ($_ = <>) {

if ($_ =~ />/) {

chomp $_;

$_=$_."&";

}

chomp $_;

$z.=$_;

}

$z =~s/\s//g; my $z2=$z;

#Logarithm subroutine sub log_base {

my ($base, $value) = @_;

return log($value)/log($base);

}

#Find sequence length my $alength = ’’; if ($z2 =~ />.*?\&([ACDEFGHIKLMNPQRSTVWYX-]+)/) {

my $atemp = $1;

$alength = length ($atemp);

}

#Find number of sequences my $k=0; while ($z2 =~ />/g) {

$k=$k+1;

} my $seqnumber = $k;

#Calculate Shannon Entropy for each position in alignment my $i=0; for ($i=0; $i<$alength; $i++){

print "ARRAY POSITION NUMBER: ",$i+1;

print OUTFILE $i+1,",";

my @singlets;

while ($z =~ />.*?\&([ACDEFGHIKLMNPQRSTVWYX-]+)/g){ 124

my $letters = substr ($1,$i,1);

push @singlets, $letters;

} print join ’ ’, @singlets;

#Create a counter for this position my %singlet_counter; for (@singlets) {

$singlet_counter{$_}++;

} my @printarray; my @printarray2;

#Only consider columns where seqs with gap < 1/4th of the total seq number

#Otherwise entropy set to default arbitrarily high value (5) my $seqnumberb = $seqnumber/4; my $entropy = 5; if ($singlet_counter{’-’} < $seqnumberb){

$entropy = 0;

#Counts instances of each aa for given column, converts to probabilities

#Calculates Shannon entropy with probabilities

while (my ($key, $value) = each %singlet_counter) {

my $prob = $value/$seqnumber;

my $templog = log_base(2,$prob);

$templog = (-1)*$prob*$templog;

$entropy = $entropy+$templog;

my $output2 = "$key ===> $value";

push @printarray1, $output2;

my $output = "$key ===> $templog";

push @printarray2, $output;

}

}

#Print frequencies, probabilities, and entropy to screen print "AA => frequency:","\n"; print join "\n", @printarray1; print "\n"; print "AA => probability:","\n"; 125

print join "\n", @printarray2;

print "\n";

print "ENTROPY: ", $entropy,"\n";

#Print entropy to CSV file

print OUTFILE2 $entropy,"\n";

} close (OUTFILE);

BLAST parser

#This script streams in one input file:

#1) Blast result output file (txt) generated with netblast-2.2.18

#And generates one output file:

#1) CSV formatted-file containing information about top Blast hats

#!/usr/bin/perl use strict; use warnings;

#Reads in Blast output file and stores in single string my $a; while ($_ = <>) { chomp $_;

$a.=$_;

}

#Creates csv file, fills out first row with desired parameter list open(OUTFILE, ">BLAST.csv"); print OUTFILE "query id,query length...","\n";

#Regular expression extract query and hit information while ($a =~ /Query=\s([^\s]*?)\s.*?\(([0-9]+)\sletters\)(.*?)Lambda/g){

my $queryid = $1; # query id

my $querylength = $2; #query length

my $region = $3; #region

print OUTFILE $queryid,",",$querylength,"\n";

while ($region =~ /\>(ref.*?)\s(.*?)\[(.*?)\].*?Length\s=\s([0-9]+).*? 126

Expect\s=\s(.*?)\s+Identities\s=\s(.*?),/g){

my $hitid = $1; #hit id

my $hitannotation = $2; #hit annotation

my $hitspecies = $3; #hit species

my $hitlength = $4; #hit length

my $hiteval = $5; #hit E-value

my $hitidentity = $6; #hit identity

$hitannotation =~s/\s{2,}/\s/g;

$hitannotation =~s/,/\s/g;

$hitspecies =~s/\s{2,}/\s/g;

print OUTFILE " ",","," ",",";

#Choose and print parameters of interest to file

print OUTFILE $hitid,...,"\n";

}

} close OUTFILE; 127

BIBLIOGRAPHY

[1] Cioli D, Valle C, Angelucci F, Miele AE: Will new antischistosomal drugs finally

emerge? Trends in Parasitology 2008, 24(9):379–82.

[2] Steinmann P, Keiser J, Bos R, Tanner M, Utzinger J: Schistosomiasis and water

resources development: systematic review, meta-analysis, and estimates of

people at risk. The Lancet infectious diseases 2006, 6(7):411–25.

[3] King CH, Dickman K, Tisch DJ: Reassessment of the cost of chronic helmintic

infection: a meta-analysis of disability-related outcomes in endemic schisto-

somiasis. Lancet 2005, 365(9470):1561–9.

[4] Hotez PJ, Fenwick A: Schistosomiasis in Africa: an emerging tragedy in our new

global health decade. PLoS Negl Trop Dis 2009, 3(9):e485.

[5] Chenine AL, Shai-Kobiler E, Steele LN, Ong H, Augostini P, Song R, Lee SJ, Autissier

P, Ruprecht RM, Secor WE: Acute Schistosoma mansoni Infection Increases Sus-

ceptibility to Systemic SHIV Clade C Infection in Rhesus Macaques after

Mucosal Virus Exposure. PLoS Negl Trop Dis 2008, 2(7):e265.

[6] King CH, Dangerfield-Cha M: The unacknowledged impact of chronic schistoso-

miasis. Chronic Illn 2008, 4:65–79.

[7] Fenwick A, Webster JP, Bosque-Oliva E, Blair L, Fleming FM, Zhang Y, Garba A,

Stothard JR, Gabrielli AF, Clements ACA, Kabatereine NB, Toure S, Dembele R,

Nyandindi U, Mwansa J, Koukounari A: The Schistosomiasis Control Initiative 128

(SCI): rationale, development and implementation from 2002-2008. Parasitol-

ogy 2009, 136(13):1719–30.

[8] Utzinger J, Raso G, Brooker S, Savigny DD, Tanner M, Ornbjerg N, Singer BH, N’goran

EK: Schistosomiasis and neglected tropical diseases: towards integrated and

sustainable control and a word of caution. Parasitology 2009, 136(13):1859–74.

[9] Danso-Appiah A, Vlas SJD: Interpreting low praziquantel cure rates of Schisto-

soma mansoni infections in Senegal. Trends in Parasitology 2002, 18(3):125–9.

[10] Melman SD, Steinauer ML, Cunningham C, Kubatko LS, Mwangi IN, Wynn NB, Mutuku

MW, Karanja DMS, Colley DG, Black CL, Secor WE, Mkoji GM, Loker ES: Reduced

Susceptibility to Praziquantel among Naturally Occurring Kenyan Isolates of

Schistosoma mansoni. PLoS Negl Trop Dis 2009, 3(8):e504.

[11] Utzinger J, Keiser J: Schistosomiasis and soil-transmitted helminthiasis: com-

mon drugs for treatment and control. Expert Opin Pharmacother 2004, 5(2):263–85.

[12] Caffrey CR: Chemotherapy of schistosomiasis: present and future. Curr Opin

Chem Biol 2007, 11(4):433–9.

[13] Sayed AA, Simeonov A, Thomas CJ, Inglese J, Austin CP, Williams DL: Identification

of oxadiazoles as new drug leads for the control of schistosomiasis. Nat Med

2008, 14(4):407–12.

[14] Abdulla MH, Ruelas DS, Wolff B, Snedecor J, Lim KC, Xu F, Renslo AR, Williams

J, McKerrow JH, Caffrey CR: Drug discovery for schistosomiasis: hit and lead

compounds identified in a library of known drugs by medium-throughput

phenotypic screening. PLoS Negl Trop Dis 2009, 3(7):e478.

[15] Fitzpatrick JM, Peak E, Perally S, Chalmers IW, Barrett J, Yoshino TP, Ivens AC,

Hoffmann KF: Anti-schistosomal Intervention Targets Identified by Lifecycle

Transcriptomic Analyses. PLoS Negl Trop Dis 2009, 3(11):e543. 129

[16] Reddien PW, Alvarado AS: Fundamentals of planarian regeneration. Annu Rev

Cell Dev Biol 2004, 20:725–57.

[17] Alvarado AS: Stem cells and the Planarian Schmidtea mediterranea. C R Biol

2007, 330(6-7):498–503.

[18] Aboobaker AA: Planarian stem cells: a simple paradigm for regeneration. Trends

in cell biology 2011.

[19] Alvarado AS: Planarian regeneration: its end is its beginning. Cell 2006,

124(2):241–5.

[20] Best JB, Goodman AB, Pigon A: Fissioning in planarians: control by the brain.

Science 1969, 164(879):565–6.

[21] Morita M, Best JB: Effects of Photoperiods and Melatonin on Planarian Asexual

Reproduction. J Exp Zool 1984, 231:273–282.

[22] Newmark PA, Alvarado AS: Not your father’s Planarian: a classical model enters

the era of functional genomics. Nat. Rev. Genet. 2002, 3(3):210–219.

[23] Garcia-Fern`andezJ, Bayascas-Ram´ırezJR, Marfany G, Mu˜noz-M´armolAM, Casali A,

Bagu˜n`aJ, Sal´oE: High copy number of highly similar mariner-like transposons

in planarian (Platyhelminthe): evidence for a trans-phyla horizontal transfer.

Mol Biol Evol 1995, 12(3):421–31.

[24] Berriman M, Haas BJ, LoVerde PT, Wilson RA, Dillon GP, Cerqueira GC, Mashiyama

ST, Al-Lazikani B, Andrade LF, Ashton PD, Aslett MA, Bartholomeu DC, Blandin G,

Caffrey CR, Coghlan A, Coulson R, Day TA, Delcher A, Demarco R, Djikeng A, Eyre

T, Gamble JA, Ghedin E, Gu Y, Hertz-Fowler C, Hirai H, Hirai Y, Houston R, Ivens A,

Johnston DA, Lacerda D, Macedo CD, McVeigh P, Ning Z, Oliveira G, Overington JP,

Parkhill J, Pertea M, Pierce RJ, Protasio AV, Quail MA, Rajandream MA, Rogers J,

Sajid M, Salzberg SL, Stanke M, Tivey AR, White O, Williams DL, Wortman J, Wu W, 130

Zamanian M, Zerlotini A, Fraser-Liggett CM, Barrell BG, El-Sayed NM: The genome

of the blood fluke Schistosoma mansoni. Nature 2009, 460(7253):352–8.

[25] japonicum Genome Sequencing S, Consortium FA, Zhou Y, Zheng H, Liu F, Hu W,

Wang ZQ, Gang L, Ren S: The Schistosoma japonicum genome reveals features

of host-parasite interplay. Nature 2009, 460(7253):345–51.

[26] Cantarel BL, Korf I, Robb SMC, Parra G, Ross E, Moore B, Holt C, Alvarado AS,

Yandell M: MAKER: an easy-to-use annotation pipeline designed for emerging

model organism genomes. Genome Research 2008, 18:188–96.

[27] Robb SMC, Ross E, Alvarado AS: SmedGD: the Schmidtea mediterranea genome

database. Nucleic Acids Research 2008, 36(Database issue):D599–606.

[28] Gonz´alez-Est´evez C, Momose T, Gehring WJ, Sal´oE: Transgenic planarian lines

obtained by electroporation using transposon-derived vectors and an eye-

specific GFP marker. Proc Natl Acad Sci USA 2003, 100(24):14046–51.

[29] Kalinna BH, Brindley PJ: Manipulating the manipulators: advances in parasitic

helminth transgenesis and RNAi. Trends Parasitol 2007, 23(5):197–204.

[30] Mann VH, Morales ME, Rinaldi G, Brindley PJ: Culture for genetic manipulation of

developmental stages of Schistosoma mansoni. Parasitology 2010, 137(3):451–62.

[31] Morales ME, Mann VH, Kines KJ, Gobert GN, Fraser MJ, Kalinna BH, Correnti JM,

Pearce EJ, Brindley PJ: piggyBac transposon mediated transgenesis of the hu-

man blood fluke, Schistosoma mansoni. FASEB J 2007, 21(13):3479–89.

[32] Kines KJ, Morales ME, Mann VH, Gobert GN, Brindley PJ: Integration of reporter

transgenes into Schistosoma mansoni chromosomes mediated by pseudotyped

murine leukemia virus. FASEB J 2008, 22(8):2936–48. 131

[33] Kines KJ, Rinaldi G, Okatcha TI, Morales ME, Mann VH, Tort JF, Brindley PJ: Elec-

troporation facilitates introduction of reporter transgenes and virions into

schistosome eggs. PLoS Negl Trop Dis 2010, 4(2):e593.

[34] Fire A, Xu S, Montgomery MK, Kostas SA, Driver SE, Mello CC: Potent and spe-

cific genetic interference by double-stranded RNA in Caenorhabditis elegans.

Nature 1998, 391(6669):806–11.

[35] Timmons L, Fire A: Specific interference by ingested dsRNA. Nature 1998,

395(6705):854.

[36] Mello CC, Conte D: Revealing the world of RNA interference. Nature 2004,

431(7006):338–42.

[37] Moffat J, Sabatini DM: Building mammalian signalling pathways with RNAi

screens. Nat Rev Mol Cell Biol 2006, 7(3):177–87.

[38] Macrae IJ, Zhou K, Li F, Repic A, Brooks AN, Cande WZ, Adams PD, Doudna JA:

Structural basis for double-stranded RNA processing by Dicer. Science 2006,

311(5758):195–8.

[39] Siomi H, Siomi MC: On the road to reading the RNA-interference code. Nature

2009, 457(7228):396–404.

[40] Khvorova A, Reynolds A, Jayasena SD: Functional siRNAs and miRNAs exhibit

strand bias. Cell 2003, 115(2):209–16.

[41] Ghildiyal M, Zamore PD: Small silencing RNAs: an expanding universe. Nat. Rev.

Genet. 2009, 10(2):94–108.

[42] Nilsen TW: Mechanisms of microRNA-mediated gene regulation in animal cells.

Trends Genet 2007, 23(5):243–9. 132

[43] Filipowicz W, Bhattacharyya SN, Sonenberg N: Mechanisms of post-transcriptional

regulation by microRNAs: are the answers in sight? Nat. Rev. Genet. 2008,

9(2):102–14.

[44] H¨ock J, Meister G: The Argonaute protein family. Genome Biol 2008, 9(2):210.

[45] Filipowicz W, Jaskiewicz L, Kolb FA, Pillai RS: Post-transcriptional gene silencing

by siRNAs and miRNAs. Curr Opin Struct Biol 2005, 15(3):331–41.

[46] Sood P, Krek A, Zavolan M, Macino G, Rajewsky N: Cell-type-specific signatures

of microRNAs on target mRNA expression. Proc Natl Acad Sci USA 2006,

103(8):2746–51.

[47] Jose AM, Smith JJ, Hunter CP: Export of RNA silencing from C. elegans tis-

sues does not require the RNA channel SID-1. Proc Natl Acad Sci USA 2009,

106(7):2283–8.

[48] Miska EA, Ahringer J: RNA interference has second helpings. Nat Biotechnol 2007,

25(3):302–3.

[49] Verjovski-Almeida S, DeMarco R, Martins EAL, Guimar˜aes PEM, Ojopi EPB, Paquola

ACM, Piazza JP, Nishiyama MY, Kitajima JP, Adamson RE, Ashton PD, Bonaldo MF,

Coulson PS, Dillon GP, Farias LP, Gregorio SP, Ho PL, Leite RA, Malaquias LCC, Mar-

ques RCP, Miyasato PA, Nascimento ALTO, Ohlweiler FP, Reis EM, Ribeiro MA, S´a

RG, Stukart GC, Soares MB, Gargioni C, Kawano T, Rodrigues V, Madeira AMBN,

Wilson RA, Menck CFM, Setubal JC, Leite LCC, Dias-Neto E: Transcriptome anal-

ysis of the acoelomate human parasite Schistosoma mansoni. Nat Genet 2003,

35(2):148–57.

[50] Krautz-Peterson G, Skelly PJ: Schistosoma mansoni: the dicer gene and its ex-

pression. Exp Parasitol 2008, 118:122–8.

[51] Krautz-Peterson G, Bhardwaj R, Faghiri Z, Tararam CA, Skelly PJ: RNA interference

in schistosomes: machinery and methodology. Parasitology 2009, :1–11. 133

[52] Gomes MS, Cabral FJ, Jannotti-Passos LK, Carvalho O, Rodrigues V, Baba EH, S´aRG:

Preliminary analysis of miRNA pathway in Schistosoma mansoni. Parasitol Int

2009, 58:61–8.

[53] Batista TM, Marques JT: RNAi pathways in parasitic protists and worms. Journal

of proteomics 2011.

[54] Copeland CS, Marz M, Rose D, Hertel J, Brindley PJ, Santana CB, Kehr S, Attolini CSO,

Stadler PF: Homology-based annotation of non-coding RNAs in the genomes of

Schistosoma mansoni and Schistosoma japonicum. BMC Genomics 2009, 10:464.

[55] Hao L, Cai P, Jiang N, Wang H, Chen Q: Identification and characterization of mi-

croRNAs and endogenous siRNAs in Schistosoma japonicum. BMC Genomics

2010, 11:55.

[56] Sim˜oesMC, Lee J, Djikeng A, Cerqueira GC, Zerlotini A, da Silva-Pereira RA, Dalby

AR, Loverde P, El-Sayed NM, Oliveira G: Identification of Schistosoma mansoni

microRNAs. BMC Genomics 2011, 12:47.

[57] Boyle JP, Wu XJ, Shoemaker CB, Yoshino TP: Using RNA interference to ma-

nipulate endogenous gene expression in Schistosoma mansoni sporocysts. Mol

Biochem Parasitol 2003, 128(2):205–15.

[58] Skelly PJ, Da’dara A, Harn DA: Suppression of cathepsin B expression in Schis-

tosoma mansoni by RNA interference. Int J Parasitol 2003, 33(4):363–9.

[59] Delcroix M, Sajid M, Caffrey CR, Lim KC, Dvor´akJ, Hsieh I, Bahgat M, Dissous C,

McKerrow JH: A multienzyme network functions in intestinal protein digestion

by a platyhelminth parasite. J Biol Chem 2006, 281(51):39316–29.

[60] Nabhan JF, El-Shehabi F, Patocka N, Ribeiro P: The 26S proteasome in Schisto-

soma mansoni: bioinformatics analysis, developmental expression, and RNA

interference (RNAi) studies. Exp Parasitol 2007, 117(3):337–47. 134

[61] Freitas TC, Jung E, Pearce EJ: TGF-beta signaling controls embryo development

in the parasitic flatworm Schistosoma mansoni. PLoS Pathog 2007, 3(4):e52.

[62] Krautz-Peterson G, Radwanska M, Ndegwa D, Shoemaker CB, Skelly PJ: Optimizing

gene suppression in schistosomes using RNA interference. Mol Biochem Parasitol

2007, 153(2):194–202.

[63] Ndegwa D, Krautz-Peterson G, Skelly PJ: Protocols for gene silencing in schisto-

somes. Exp Parasitol 2007, 117(3):284–91.

[64] Morales ME, Rinaldi G, Gobert GN, Kines KJ, Tort JF, Brindley PJ: RNA interference

of Schistosoma mansoni cathepsin D, the apical enzyme of the hemoglobin

proteolysis cascade. Mol Biochem Parasitol 2008, 157(2):160–8.

[65] Pereira TC, Pascoal VDB, Marchesini RB, Maia IG, Magalh˜aesLA, Zanotti-Magalh˜aes

EM, Lopes-Cendes I: Schistosoma mansoni: evaluation of an RNAi-based treat-

ment targeting HGPRTase gene. Exp Parasitol 2008, 118(4):619–23.

[66] Rinaldi G, Morales ME, Alrefaei YN, Cancela M, Castillo E, Dalton JP, Tort JF, Brindley

PJ: RNA interference targeting leucine aminopeptidase blocks hatching of

Schistosoma mansoni eggs. Mol Biochem Parasitol 2009, 167(2):118–26.

[67] Mour˜aoMM, de Moraes Mour˜aoM, Dinguirard N, Franco GR, Yoshino TP: Phenotypic

screen of early-developing larvae of the blood fluke, schistosoma mansoni,

using RNA interference. PLoS Negl Trop Dis 2009, 3(8):e502.

[68] Faghiri Z, Skelly PJ: The role of tegumental aquaporin from the human parasitic

worm, Schistosoma mansoni, in osmoregulation and drug uptake. FASEB J

2009, 23(8):2780–9.

[69] Krautz-Peterson G, Simoes M, Faghiri Z, Ndegwa D, Oliveira G, Shoemaker CB, Skelly

PJ: Suppressing glucose transporter gene expression in schistosomes impairs

parasite feeding and decreases survival in the mammalian host. PLoS Pathog

2010, 6(6):e1000932. 135

[70] Stefani´cS, Dvoˇr´akJ, Horn M, Braschi S, Sojka D, Ruelas DS, Suzuki B, Lim KC,

Hopkins SD, McKerrow JH, Caffrey CR: RNA interference in Schistosoma mansoni

schistosomula: selectivity, sensitivity and operation for larger-scale screening.

PLoS Negl Trop Dis 2010, 4(10):e850.

[71] Alvarado AS, Newmark PA: Double-stranded RNA specifically disrupts gene ex-

pression during planarian regeneration. Proc Natl Acad Sci USA 1999, 96(9):5049–

54.

[72] Palakodeti D, Smielewska M, Graveley BR: MicroRNAs from the Planarian

Schmidtea mediterranea: a model system for stem cell biology. RNA 2006,

12(9):1640–9.

[73] Lu YC, Smielewska M, Palakodeti D, Lovci MT, Aigner S, Yeo GW, Graveley BR: Deep

sequencing identifies new and regulated microRNAs in Schmidtea mediter-

ranea. RNA 2009, 15(8):1483–91.

[74] Friedl¨ander MR, Adamidi C, Han T, Lebedeva S, Isenbarger TA, Hirst M, Marra M,

Nusbaum C, Lee WL, Jenkin JC, Alvarado AS, Kim JK, Rajewsky N: High-resolution

profiling and discovery of planarian small RNAs. Proc Natl Acad Sci USA 2009,

106(28):11546–51.

[75] Orii H, Mochii M, Watanabe K: A simple ”soaking method” for RNA interference

in the planarian Dugesia japonica. Dev Genes Evol 2003, 213(3):138–41.

[76] Newmark PA, Reddien PW, Cebri`aF, Alvarado AS: Ingestion of bacterially ex-

pressed double-stranded RNA inhibits gene expression in planarians. Proc

Natl Acad Sci USA 2003, 100 Suppl 1:11861–5.

[77] Reddien PW, Bermange AL, Murfitt KJ, Jennings JR, Alvarado AS: Identification of

genes needed for regeneration, stem cell function, and tissue homeostasis by

systematic gene perturbation in planaria. Developmental Cell 2005, 8(5):635–49. 136

[78] Newmark PA: Opening a new can of worms: a large-scale RNAi screen in

planarians. Developmental Cell 2005, 8(5):623–4.

[79] Flower DR: Modelling G-protein-coupled receptors for drug design. Biochim

Biophys Acta 1999, 1422(3):207–34.

[80] Lagerstr¨omMC, Schi¨oth HB: Structural diversity of G protein-coupled receptors

and significance for drug discovery. Nat Rev Drug Discov 2008, 7(4):339–57.

[81] Milligan G, Kostenis E: Heterotrimeric G-proteins: a short history. Br J Pharmacol

2006, 147 Suppl 1:S46–55.

[82] Heuss C, Gerber U: G-protein-independent signaling by G-protein-coupled re-

ceptors. Trends Neurosci 2000, 23(10):469–75.

[83] Gilman AG: G proteins: transducers of receptor-generated signals. Annu Rev

Biochem 1987, 56:615–49.

[84] Smrcka AV: G protein betagamma subunits: central mediators of G protein-

coupled receptor signaling. Cell Mol Life Sci 2008, 65(14):2191–214.

[85] Terrillon S, Bouvier M: Roles of G-protein-coupled receptor dimerization. EMBO

Rep 2004, 5:30–4.

[86] Gurevich VV, Gurevich EV: GPCR monomers and oligomers: it takes all kinds.

Trends Neurosci 2008, 31(2):74–81.

[87] Marshall FH, Jones KA, Kaupmann K, Bettler B: GABAB receptors - the first 7TM

heterodimers. Trends in Pharmacological Sciences 1999, 20(10):396–9.

[88] Rovira X, Pin JP, Giraldo J: The asymmetric/symmetric activation of GPCR

dimers as a possible mechanistic rationale for multiple signalling pathways.

Trends in Pharmacological Sciences 2010, 31:15–21.

[89] Duvernay MT, Filipeanu CM, Wu G: The regulatory mechanisms of export traf-

ficking of G protein-coupled receptors. Cell Signal 2005, 17(12):1457–65. 137

[90] Sexton PM, Albiston A, Morfis M, Tilakaratne N: Receptor activity modifying pro-

teins. Cell Signal 2001, 13(2):73–83.

[91] Dong C, Filipeanu CM, Duvernay MT, Wu G: Regulation of G protein-coupled

receptor export trafficking. Biochim Biophys Acta 2007, 1768(4):853–70.

[92] Kobilka BK: G protein coupled receptor structure and activation. Biochim Bio-

phys Acta 2007, 1768(4):794–807.

[93] Oldham WM, Hamm HE: Heterotrimeric G protein activation by G-protein-

coupled receptors. Nat Rev Mol Cell Biol 2008, 9:60–71.

[94] Pitcher JA, Freedman NJ, Lefkowitz RJ: G protein-coupled receptor kinases. Annu

Rev Biochem 1998, 67:653–92.

[95] Ribas C, Penela P, Murga C, Salcedo A, Garc´ıa-Hoz C, Jurado-Pueyo M, Aymerich

I, Mayor F: The G protein-coupled receptor kinase (GRK) interactome:

role of GRKs in GPCR regulation and signaling. Biochim Biophys Acta 2007,

1768(4):913–22.

[96] Reiter E, Lefkowitz RJ: GRKs and beta-arrestins: roles in receptor silencing,

trafficking and signaling. Trends Endocrinol Metab 2006, 17(4):159–65.

[97] Moore CAC, Milano SK, Benovic JL: Regulation of receptor trafficking by GRKs

and arrestins. Annu Rev Physiol 2007, 69:451–82.

[98] Burchett SA: Regulators of G protein signaling: a bestiary of modular protein

binding domains. J Neurochem 2000, 75(4):1335–51.

[99] Johnston CA, Siderovski DP: Receptor-mediated activation of heterotrimeric G-

proteins: current structural insights. Molecular Pharmacology 2007, 72(2):219–30.

[100] Marchese A, Chen C, Kim YM, Benovic JL: The ins and outs of G protein-coupled

receptor trafficking. Trends Biochem Sci 2003, 28(7):369–76. 138

[101] Kelly E, Bailey CP, Henderson G: Agonist-selective mechanisms of GPCR desen-

sitization. Br J Pharmacol 2008, 153 Suppl 1:S379–88.

[102] Daaka Y, Luttrell LM, Lefkowitz RJ: Switching of the coupling of the beta2-

to different G proteins by protein kinase A. Nature 1997,

390(6655):88–91.

[103] Hosey MM: What molecular events underlie heterologous desensitization? Fo-

cus on ”receptor phosphorylation does not mediate cross talk between mus-

carinic M(3) and bradykinin B(2) receptors”. Am J Physiol 1999, 277(5 Pt

1):C856–8.

[104] Liebmann C, B¨ohmerFD: Signal transduction pathways of G protein-coupled

receptors and their cross-talk with receptor tyrosine kinases: lessons from

bradykinin signaling. Curr Med Chem 2000, 7(9):911–43.

[105] Attwood TK, Findlay JB: Fingerprinting G-protein-coupled receptors. Protein Eng

1994, 7(2):195–203.

[106] Kolakowski LF: GCRDb: a G-protein-coupled receptor database. Recept Channels

1994, 2:1–7.

[107] Horn F, Bettler E, Oliveira L, Campagne F, Cohen FE, Vriend G: GPCRDB infor-

mation system for G protein-coupled receptors. Nucleic Acids Research 2003,

31:294–7.

[108] Fredriksson R, Lagerstr¨omMC, Lundin LG, Schi¨othHB: The G-protein-coupled re-

ceptors in the form five main families. Phylogenetic analysis,

paralogon groups, and fingerprints. Molecular Pharmacology 2003, 63(6):1256–72.

[109] Schi¨othHB, Fredriksson R: The GRAFS classification system of G-protein cou-

pled receptors in comparative perspective. General and Comparative Endocrinology

2005, 142(1-2):94–101. 139

[110] Fredriksson R, Schi¨othHB: The repertoire of G-protein-coupled receptors in fully

sequenced genomes. Molecular Pharmacology 2005, 67(5):1414–25.

[111] Perez DM: From plants to man: the GPCR ”tree of life”. Mol Pharmacol 2005,

67(5):1383–4.

[112] Vassilatis DK, Hohmann JG, Zeng H, Li F, Ranchalis JE, Mortrud MT, Brown A, Ro-

driguez SS, Weller JR, Wright AC, Bergmann JE, Gaitanaris GA: The G protein-

coupled receptor repertoires of human and mouse. Proc Natl Acad Sci USA 2003,

100(8):4903–8.

[113] Bjarnad´ottirTK, Gloriam DE, Hellstrand SH, Kristiansson H, Fredriksson R, Schi¨oth

HB: Comprehensive repertoire and phylogenetic analysis of the G protein-

coupled receptors in human and mouse. Genomics 2006, 88(3):263–73.

[114] Lagerstr¨omMC, Hellstr¨omAR, Gloriam DE, Larsson TP, Schi¨othHB, Fredriksson R:

The G protein-coupled receptor subset of the chicken genome. PLoS Comput

Biol 2006, 2(6):e54.

[115] Gloriam DE, Fredriksson R, Schi¨othHB: The G protein-coupled receptor subset

of the rat genome. BMC Genomics 2007, 8:338.

[116] Metpally RPR, Sowdhamini R: Genome wide survey of G protein-coupled recep-

tors in Tetraodon nigroviridis. BMC Evol Biol 2005, 5:41.

[117] Hill CA, Fox AN, Pitts RJ, Kent LB, Tan PL, Chrystal MA, Cravchik A, Collins FH,

Robertson HM, Zwiebel LJ: G protein-coupled receptors in Anopheles gambiae.

Science 2002, 298(5591):176–8.

[118] Metpally RPR, Sowdhamini R: Cross genome phylogenetic analysis of human and

Drosophila G protein-coupled receptors: application to functional annotation

of orphan receptors. BMC Genomics 2005, 6:106. 140

[119] Kamesh N, Aradhyam GK, Manoj N: The repertoire of G protein-coupled recep-

tors in the sea squirt Ciona intestinalis. BMC Evol Biol 2008, 8:129.

[120] Nordstr¨omKJV, Fredriksson R, Schi¨othHB: The amphioxus (Branchiostoma flori-

dae) genome contains a highly diversified set of G protein-coupled receptors.

BMC Evol Biol 2008, 8:9.

[121] Ji Y, Zhang Z, Hu Y: The repertoire of G-protein-coupled receptors in Xenopus

tropicalis. BMC Genomics 2009, 10:263.

[122] Haitina T, Fredriksson R, Foord S, Schioth H, Gloriam D: The G protein-coupled

receptor subset of the dog genome is more similar to that in humans than

rodents. BMC Genomics 2009, 10:24.

[123] Davies MN, Gloriam DE, Secker A, Freitas AA, Mendao M, Timmis J, Flower DR:

Proteomic applications of automated GPCR classification. Proteomics 2007,

7(16):2800–14.

[124] Altschul S, Gish W, Miller W, Myers E, Lipman D: Basic local alignment search

tool. J. mol. Biol 1990.

[125] Eddy SR: What is a hidden Markov model? Nat Biotechnol 2004, 22(10):1315–6.

[126] Eddy SR: Profile hidden Markov models. Bioinformatics 1998, 14(9):755–63.

[127] Yabuki Y, Muramatsu T, Hirokawa T, Mukai H, Suwa M: GRIFFIN: a system for

predicting GPCR-G-protein coupling selectivity using a support vector ma-

chine and a hidden Markov model. Nucleic Acids Research 2005, 33(Web Server

issue):W148–53.

[128] Tusn´ady GE, Simon I: The HMMTOP transmembrane topology prediction

server. Bioinformatics 2001, 17(9):849–50. 141

[129] Krogh A, Larsson B, von Heijne G, Sonnhammer EL: Predicting transmembrane pro-

tein topology with a hidden Markov model: application to complete genomes.

Journal of Molecular Biology 2001, 305(3):567–80.

[130] Mirabeau O, Perlas E, Severini C, Audero E, Gascuel O, Possenti R, Birney E, Rosenthal

N, Gross C: Identification of novel peptide hormones in the human proteome

by hidden Markov model screening. Genome Research 2007, 17(3):320–7.

[131] Burges CJ: A Tutorial on Support Vector Machines for Pattern Recognition.

Data Mining and Knowledge Discovery 1998, 2:121–167.

[132] Karchin R, Karplus K, Haussler D: Classifying G-protein coupled receptors with

support vector machines. Bioinformatics 2002, 18:147–59.

[133] Bhasin M, Raghava GPS: GPCRpred: an SVM-based method for prediction of

families and subfamilies of G-protein coupled receptors. Nucleic Acids Research

2004, 32(Web Server issue):W383–9.

[134] Bhasin M, Raghava G: GPCRsclass: a web tool for the classification of amine

type of G-protein-coupled receptors. Nucleic Acids Research 2005.

[135] Guo YZ, Li M, Lu M, Wen Z, Wang K, Li G, Wu J: Classifying G protein-coupled

receptors and nuclear receptors on the basis of protein power spectrum from

fast Fourier transform. Amino Acids 2006, 30(4):397–402.

[136] Gupta R, Mittal A, Singh K: A Novel and Efficient Technique for Identifica-

tion and Classification of GPCRs. IEEE Transactions on Information Technology in

Biomedicine 2008, 12(4):541–548.

[137] Hsu CW, Lin CJ: A comparison of methods for multiclass support vector ma-

chines. Neural Networks, IEEE Transactions on 2002, 13(2):415 –425. 142

[138] Omar HH, Humphries JE, Larsen MJ, Kubiak TM, Geary TG, Maule AG, Kimber MJ,

Day TA: Identification of a platyhelminth neuropeptide receptor. International

Journal for Parasitology 2007, 37(7):725–33.

[139] Hamdan FF, Abramovitz M, Mousa A, Xie J, Durocher Y, Ribeiro P: A novel Schis-

tosoma mansoni G protein-coupled receptor is responsive to histamine. Mol

Biochem Parasitol 2002, 119:75–86.

[140] Taman A, Ribeiro P: Investigation of a in Schistosoma man-

soni: Functional studies and immunolocalization. Mol Biochem Parasitol 2009.

[141] El-Shehabi F, Ribeiro P: Histamine signalling in Schistosoma mansoni: Im-

munolocalisation and characterisation of a new histamine-responsive receptor

(SmGPR-2). International Journal for Parasitology 2010.

[142] Nishimura K, Unemura K, Tsushima J, Yamauchi Y, Otomo J, Taniguchi T, Kaneko

S, Agata K, Kitamura Y: Identification of a novel planarian G-protein-coupled

receptor that responds to serotonin in Xenopus laevis oocytes. Biol Pharm Bull

2009, 32(10):1672–7.

[143] Conklin BR, Farfel Z, Lustig KD, Julius D, Bourne HR: Substitution of three amino

acids switches receptor specificity of Gq alpha to that of Gi alpha. Nature 1993,

363(6426):274–6.

[144] Conklin BR, Herzmark P, Ishida S, Voyno-Yasenetskaya TA, Sun Y, Farfel Z, Bourne

HR: Carboxyl-terminal mutations of Gq alpha and Gs alpha that alter the

fidelity of receptor activation. Molecular Pharmacology 1996, 50(4):885–90.

[145] Ladds G, Goddard A, Davey J: Functional analysis of heterologous GPCR sig-

nalling pathways in yeast. Trends in Biotechnology 2005, 23(7):367–73.

[146] Kimber M, Sayegh L, El-Shehabi F, Song C, Zamanian M, Woods D, Day T, Ribeiro

P: Identification of an Ascaris G protein-coupled acetylcholine receptor with

atypical muscarinic pharmacology. International Journal for Parasitology 2009. 143

[147] McCusker EC, Bane SE, O’Malley MA, Robinson AS: Heterologous GPCR ex-

pression: a bottleneck to obtaining crystal structures. Biotechnol Prog 2007,

23(3):540–7.

[148] Tate CG, Grisshammer R: Heterologous expression of G-protein-coupled recep-

tors. Trends Biotechnol 1996, 14(11):426–30.

[149] Wise A, Gearing K, Rees S: Target validation of G-protein coupled receptors.

Drug Discov Today 2002, 7(4):235–46.

[150] Hoffmann KF, Davis EM, Fischer ER, Wynn TA: The guanine protein coupled

receptor rhodopsin is developmentally regulated in the free-living stages of

Schistosoma mansoni. Mol Biochem Parasitol 2001, 112:113–23.

[151] Saitoh O, Yuruzume E, Watanabe K, Nakata H: Molecular identification of a G

protein-coupled receptor family which is expressed in planarians. Gene 1997,

195:55–61.

[152] Oliveira G: The Schistosoma mansoni transcriptome: an update. Exp Parasitol

2007, 117(3):229–35.

[153] Alvarado AS, Newmark PA, Robb SM, Juste R: The Schmidtea mediterranea

database as a molecular resource for studying platyhelminthes, stem cells

and regeneration. Development 2002, 129(24):5659–65.

[154] Schi¨othHB, Nordstr¨omKJV, Fredriksson R: Mining the gene repertoire and ESTs

for G protein-coupled receptors with evolutionary perspective. Acta Physiol

(Oxf) 2007, 190:21–31.

[155] van der Werf MJ, Vlas SJD, Brooker S, Looman CWN, Nagelkerke NJD, Habbema

JDF, Engels D: Quantification of clinical morbidity associated with schistosome

infection in sub-Saharan Africa. Acta Trop 2003, 86(2-3):125–39. 144

[156] Gurley KA, Rink JC, Alvarado AS: Beta-catenin defines head versus tail identity

during planarian regeneration and homeostasis. Science 2008, 319(5861):323–7.

[157] Nogi T, Zhang D, Chan JD, Marchant JS: A novel biological activity of praziquantel

requiring voltage-operated ca channel Beta subunits: subversion of flatworm

regenerative polarity. PLoS Negl Trop Dis 2009, 3(6):e464.

[158] Hashmi S, Tawe W, Lustigman S: Caenorhabditis elegans and the study of gene

function in parasites. Trends Parasitol 2001, 17(8):387–93.

[159] B¨urglinTR, Lobos E, Blaxter ML: Caenorhabditis elegans as a model for parasitic

nematodes. International Journal for Parasitology 1998, 28(3):395–411.

[160] Thomas JH, Robertson HM: The Caenorhabditis chemoreceptor gene families.

BMC Biol 2008, 6:42.

[161] Montell C: A taste of the Drosophila gustatory receptors. Curr Opin Neurobiol

2009, 19(4):345–53.

[162] Strope PK, Moriyama EN: Simple alignment-free methods for protein classifica-

tion: a case study from G-protein-coupled receptors. Genomics 2007, 89(5):602–

12.

[163] Kent WJ: BLAT–the BLAST-like alignment tool. Genome Research 2002,

12(4):656–64.

[164] Ribeiro P, El-Shehabi F, Patocka N: Classical transmitters and their receptors in

flatworms. Parasitology 2005, 131 Suppl:S19–40.

[165] Ribeiro P, Geary TG: Neuronal signaling in schistosomes: current status and

prospects for postgenomics. Canadian Journal of Zoology 2010, 88:1–22.

[166] Morita M, Hall F, Best JB, Gern W: Photoperiodic modulation of cephalic mela-

tonin in planarians. J Exp Zool 1987, 241(3):383–8. 145

[167] Itoh MT, Shinozawa T, Sumi Y: Circadian rhythms of melatonin-synthesizing

enzyme activities and melatonin levels in planarians. Brain Res 1999, 830:165–

73.

[168] Ermakova ON, Ermakov AM, Tiras KP, Lednev VV: [Melatonin effect on the regen-

eration of the flatworm Girardia tigrina]. Ontogenez 2009, 40(6):466–9.

[169] Mcveigh P, Kimber MJ, Novozhilova E, Day TA: Neuropeptide signalling systems

in flatworms. Parasitology 2005, 131 Suppl:S41–55.

[170] Kreshchenko ND: Functions of flatworm neuropeptides NPF, GYIRF and

FMRF in course of pharyngeal regeneration of anterior body fragments of

planarian, Girardia tigrina. Acta Biol Hung 2008, 59 Suppl:199–207.

[171] McVeigh P, Mair G, Atkinson L, Ladurner P, Zamanian M, Novozhilova E, Marks N,

Day T, Maule A: Discovery of multiple neuropeptide families in the phylum

Platyhelminthes. International Journal for Parasitology 2009.

[172] Collins JJ, Hou X, Romanova EV, Lambrus BG, Miller CM, Saberi A, Sweedler JV,

Newmark PA: Genome-wide analyses reveal a role for peptide hormones in

planarian germline development. PLoS Biol 2010, 8(10):e1000509.

[173] Nordstr¨omKJV, Lagerstr¨omMC, Wall´erLMJ, Fredriksson R, Schi¨othHB: The Se-

cretin GPCRs descended from the family of Adhesion GPCRs. Mol Biol Evol

2009, 26:71–84.

[174] Alic N, Partridge L: Antagonizing Methuselah to extend life span. Genome Biol

2007, 8(8):222.

[175] Ja WW, West AP, Delker SL, Bjorkman PJ, Benzer S, Roberts RW: Extension of

Drosophila melanogaster life span with a GPCR peptide inhibitor. Nat Chem

Biol 2007, 3(7):415–9. 146

[176] Saeger B, Schmitt-Wrede HP, Dehnhardt M, Benten WP, Kr¨ucken J, Harder A, Samson-

Himmelstjerna GV, Wiegand H, Wunderlich F: Latrophilin-like receptor from the

parasitic nematode Haemonchus contortus as target for the anthelmintic dep-

sipeptide PF1022A. FASEB J 2001, 15(7):1332–4.

[177] Mitri C, Parmentier ML, Pin JP, Bockaert J, Grau Y: Divergent evolution in

metabotropic glutamate receptors. A new receptor activated by an endoge-

nous ligand different from glutamate in insects. J Biol Chem 2004, 279(10):9313–

20.

[178] Mitri C, Soustelle L, Framery B, Bockaert J, Parmentier ML, Grau Y: Plant insecticide

L-canavanine repels Drosophila via the insect orphan GPCR DmX. PLoS Biol

2009, 7(6):e1000147.

[179] Taniura H, Sanada N, Kuramoto N, Yoneda Y: A metabotropic glutamate receptor

family gene in Dictyostelium discoideum. J Biol Chem 2006, 281(18):12336–43.

[180] Anjard C, Loomis WF: GABA induces terminal differentiation of Dictyostelium

through a GABAB receptor. Development 2006, 133(11):2253–61.

[181] Huang HC, Klein PS: The Frizzled family: receptors for multiple signal trans-

duction pathways. Genome Biol 2004, 5(7):234.

[182] Petersen CP, Reddien PW: Smed-betacatenin-1 is required for anteroposterior

blastema polarity in planarian regeneration. Science 2008, 319(5861):327–30.

[183] Gloriam DE, Foord SM, Blaney FE, Garland SL: Definition of the G protein-coupled

receptor transmembrane bundle binding pocket and calculation of receptor

similarities for drug design. J Med Chem 2009, 52(14):4429–42.

[184] Wilkie TM: G-protein signaling: satisfying the basic necessities of life. Curr Biol

2000, 10(23):R853–6. 147

[185] Keating CD, Kriek N, Daniels M, Ashcroft NR, Hopper NA, Siney EJ, Holden-

Dye L, Burke JF: Whole-genome analysis of 60 G protein-coupled receptors

in Caenorhabditis elegans by gene knockout with RNAi. Curr Biol 2003,

13(19):1715–20.

[186] Taft AS, Norante FA, Yoshino TP: The identification of inhibitors of Schistosoma

mansoni miracidial transformation by incorporating a medium-throughput

small-molecule screen. Exp Parasitol 2010, 125(2):84–94.

[187] Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high

throughput. Nucleic Acids Research 2004, 32(5):1792–7.

[188] Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream MA, Barrell B:

Artemis: sequence visualization and annotation. Bioinformatics 2000, 16(10):944–

5.

[189] Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H,

Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD, Gibson TJ, Higgins DG:

Clustal W and Clustal X version 2.0. Bioinformatics 2007, 23(21):2947–8.

[190] Caffrey DR, Dana PH, Mathur V, Ocano M, Hong EJ, Wang YE, Somaroo S, Caffrey

BE, Potluri S, Huang ES: PFAAT version 2.0: a tool for editing, annotating, and

analyzing multiple sequence alignments. BMC Bioinformatics 2007, 8:381.

[191] Retief JD: Phylogenetic analysis using PHYLIP. Methods Mol Biol 2000, 132:243–

58.

[192] Untergasser A, Nijveen H, Rao X, Bisseling T, Geurts R, Leunissen JAM: Primer3Plus,

an enhanced web interface to Primer3. Nucleic Acids Research 2007, 35(Web Server

issue):W71–4.

[193] chung Chang C, Lin CJ: LIBSVM: a Library for Support Vector Machines (Ver-

sion 2.31) 2001. 148

[194] Chase DL, Koelle MR: Biogenic amine neurotransmitters in C. elegans. Worm-

Book 2007, :1–15.

[195] Mertens I, Vandingenen A, Meeusen T, Loof AD, Schoofs L: Postgenomic character-

ization of G-protein-coupled receptors. Pharmacogenomics 2004, 5(6):657–72.

[196] Chung S, Funakoshi T, Civelli O: Orphan GPCR research. Br J Pharmacol 2008, 153

Suppl 1:S339–46.

[197] Opekarov´aM, Tanner W: Specific lipid requirements of membrane proteins–

a putative bottleneck in heterologous expression. Biochim Biophys Acta 2003,

1610:11–22.

[198] Pucadyil TJ, Chattopadhyay A: Role of cholesterol in the function and organiza-

tion of G-protein coupled receptors. Prog Lipid Res 2006, 45(4):295–333.

[199] Zamanian M, Kimber MJ, McVeigh P, Carlson SA, Maule AG, Day TA: The repertoire

of G protein-coupled receptors in the human parasite Schistosoma mansoni

and the model organism Schmidtea mediterranea. BMC Genomics (Submitted)

2011, :1–33.

[200] Thomsen W, Frazer J, Unett D: Functional assays for screening GPCR targets.

Curr Opin Biotechnol 2005, 16(6):655–65.

[201] Humphries JE, Kimber MJ, Barton YW, Hsu W, Marks NJ, Greer B, Harriott P, Maule

AG, Day TA: Structure and bioactivity of neuropeptide F from the human

parasites Schistosoma mansoni and Schistosoma japonicum. J Biol Chem 2004,

279(38):39880–5.

[202] Insel PA, Ostrom RS: Forskolin as a tool for examining adenylyl cyclase expres-

sion, regulation, and G protein signaling. Cell Mol Neurobiol 2003, 23(3):305–14.

[203] Kasschau MR, Mansour TE: Serotonin-activated adenylate cyclase during early

development of Schistosoma mansoni. Nature 1982, 296(5852):66–8. 149

[204] Estey SJ, Mansour TE: Nature of serotonin-activated adenylate cyclase during

development of Schistosoma mansoni. Mol Biochem Parasitol 1987, 26(1-2):47–59.

[205] Cret`ıP, Capasso A, Grasso M, Parisi E: Identification of a 5-HT1A receptor posi-

tively coupled to planarian adenylate cyclase. Cell Biol Int Rep 1992, 16(5):427–32.

[206] Moneypenny CG, Maule AG, Shaw C, Day TA, Pax RA, Halton DW: Physiological ef-

fects of platyhelminth FMRF amide-related peptides (FaRPs) on the motility

of the monogenean Diclidophora merlangi. Parasitology 1997, 115 ( Pt 3):281–8.

[207] Farrell MS, Gilmore K, Raffa RB, Walker EA: Behavioral characterization of sero-

tonergic activation in the flatworm Planaria. Behav Pharmacol 2008, 19(3):177–82.

[208] Lawson D, Arensburger P, Atkinson P, Besansky NJ, Bruggner RV, Butler R, Campbell

KS, Christophides GK, Christley S, Dialynas E, Emmert D, Hammond M, Hill CA,

Kennedy RC, Lobo NF, MacCallum MR, Madey G, Megy K, Redmond S, Russo S,

Severson DW, Stinson EO, Topalis P, Zdobnov EM, Birney E, Gelbart WM, Kafatos

FC, Louis C, Collins FH: VectorBase: a home for invertebrate vectors of human

pathogens. Nucleic Acids Research 2007, 35(Database issue):D503–5.

[209] Richards JS, Jonassen JA, Rolfes AI, Kersey K, Reichert LE: Adenosine 3’,5’-

monophosphate, luteinizing hormone receptor, and progesterone during gran-

ulosa cell differentiation: effects of estradiol and follicle-stimulating hormone.

Endocrinology 1979, 104(3):765–73.

[210] Maule AG, Marks NJ: Parasitic flatworms: molecular biology, biochemistry, im-

munology and physiology 2006, :222–223.

[211] Overington JP, Al-Lazikani B, Hopkins AL: How many drug targets are there? Nat

Rev Drug Discov 2006, 5(12):993–6.

[212] Agboh KC, Webb TE, Evans RJ, Ennion SJ: Functional characterization of a P2X

receptor from Schistosoma mansoni. J Biol Chem 2004, 279(40):41650–7. 150

[213] Kaczorowski GJ, McManus OB, Priest BT, Garcia ML: Ion channels as drug targets:

the next GPCRs. J Gen Physiol 2008, 131(5):399–405.

[214] Kim E, Day TA, Bennett JL, Pax RA: Cloning and functional expression of a

Shaker-related voltage-gated potassium channel gene from Schistosoma man-

soni (Trematoda: Digenea). Parasitology 1995, 110 ( Pt 2):171–80.

[215] Salkoff L, Wei AD, Baban B, Butler A, Fawcett G, Ferreira G, Santi CM: Potassium

channels in C. elegans. WormBook 2005, :1–15.

[216] Jeziorski MC, Greenberg RM: Voltage-gated calcium channel subunits from platy-

helminths: potential role in praziquantel action. International Journal for Para-

sitology 2006, 36(6):625–32.

[217] Mumberg D, M¨ullerR, Funk M: Yeast vectors for the controlled expression of

heterologous proteins in different genetic backgrounds. Gene 1995, 156:119–22.

[218] Miret JJ, Rakhilina L, Silverman L, Oehlen B: Functional expression of heteromeric

calcitonin gene-related peptide and adrenomedullin receptors in yeast. J Biol

Chem 2002, 277(9):6881–7.

[219] Overton HA, Babbs AJ, Doel SM, Fyfe MCT, Gardner LS, Griffin G, Jackson HC, Procter

MJ, Rasamison CM, Tang-Christensen M, Widdowson PS, Williams GM, Reynet C:

Deorphanization of a G protein-coupled receptor for oleoylethanolamide and

its use in the discovery of small-molecule hypophagic agents. Cell Metab 2006,

3(3):167–75.