Molecular evolution of nucleoside transporters

Tamima Ashraf

A THESIS SUBMITTED TO THE FACULTY OF GRADUATE STUDIES

IN PARTIAL FULFILMENT OF THE REQUIREMENTS

FOR THE DEGREE OF MASTER OF SCIENCE

GRADUATE PROGRAMME IN BIOLOGY

YORK UNIVERSITY

TORONTO, ONTARIO

JANUARY 2008

Abstract

Mammalian cells possess two non-homologous families of nucleoside transporters (NTs): equilibrative nucleoside transporters (ENTs) and concentrative nucleoside transporters (CNTs). NTs allow the transport of natural nucleosides and of nucleoside analog drugs administered for cancer and viral therapy. To date, four ENT isoforms and three CNT isoforms have been found in mammals. However, the evolutionary history of these isoforms is not clearly understood. To understand the distribution and evolution of ENT and CNT isoforms, I used word-based and BLAST searches along with the CLUSTALW and TMHMM 2.0 programs to identify novel NT members in recently sequenced genomes.

Database analyses revealed novel NT members in mammals, fish, bird, frog, marsupial, sea squirt, sea urchin, insects, parasites, plants, fungi and bacteria, providing insight into how gene or genome duplication may have led to the evolution of different isoforms in vertebrates from a single ancestral prototype. I have also identified highly conserved residues among invertebrates and vertebrates which have remained unchanged over time and will be useful targets for structural studies. During my analyses, an unknown protein family (UPP) was identified that belongs to the major facilitator superfamily (MFS). Because of its sequence similarity to the oligosaccharide: H+ symporter (OHS) family and the nucleoside: H+ symporter (NHS) family of the MFS, I decided to functionally characterize the human homologue of this family. Cellular localization studies in MCF-7 and COS-7 cell lines revealed that the human UPP1 (hUPP1) protein is localized at the plasma membrane. Functional characterization of this protein established that hUPP1 transports hypoxanthine in MCF-7 cells and transports uridine when expressed in Xenopus oocytes. This study is the first to provide evidence that another nucleoside/nucleobase transport system exists in humans that does not have any homology with ENT or CNT proteins. This study also presents the most comprehensive analysis of the NT family and provides a better understanding of the evolution of the ENT and CNT families.

Acknowledgement

I would like to begin by thanking my supervisor, Dr. Imogen R. Coe, for taking the risk to supervise me and teaching me so much more than just nucleoside transporters. I have learnt a great deal in the past couple of years and I am extremely grateful for the amount of interest that you showed in my project.

I would like to thank my parents and my brother for their tremendous support despite being thousands of miles away. I would have never imagined surviving in Canada for six long years without their constant encouragement and support.

I would like to thank Mirza Haque, for bearing with my endless insanity and always making it worthwhile at the end. Thank you for always being the first one to listen and critique my presentations.

I would also like to thank my lab mates for always being there since my first day at the Coe lab. I am really lucky, Zlatina, that you were always there to help me with those calculations and to ease my frustration when experiments did not go as planned. Thank you, Nicole and Jen, for always taking the time to listen to my incoherent thoughts and then making sense of it all. Thank you, German, for all your valuable guidance and advice. I would also like to thank Dr. Brad Hanna at the University of Guelph for his help with the oocytes. Finally, I would like to thank God for giving me the opportunity and patience to study.

Table of Contents

Abstract
Acknowledgement
Table of Contents
List of Abbreviations
List of Contributors
List of Tables
List of Figures
Chapter 1: Molecular evolution of nucleoside transporters
Abstract
1.1 Introduction
1.1.1 Cellular and clinical relevance of nucleoside transporters
1.1.2 Classification of nucleoside transporters
1.1.3 Evolutionary history of nucleoside transporters
1.2 Thesis Objectives
1.3 Methods
1.3.1 Database analysis
1.3.2 Tree construction
1.3.3 Nomenclature
1.4 Results
1.4.1 ENT family
1.4.1.1 Novel mammalian ENTs
1.4.1.2 Novel avian ENTs
1.4.1.3 Novel fish ENTs
1.4.1.4 Novel amphibian ENTs
1.4.1.5 Novel tunicate ENT
1.4.1.6 Novel arthropod ENTs
1.4.1.7 Novel echinoderm ENTs
1.4.1.8 Novel nematode ENTs
1.4.1.9 Novel mycetozoan ENTs
1.4.1.10 Novel protozoan ENTs
1.4.1.11 Novel plant and fungal ENTs
1.4.1.12 Conserved residues among ENT family members
1.4.2 CNT family
1.4.2.1 Novel vertebrate CNTs
1.4.2.2 Novel tunicate CNTs
1.4.2.3 Novel invertebrate CNTs
1.4.2.4 Novel fungal CNTs
1.4.2.5 Novel prokaryotic CNTs
1.4.2.6 Conserved residues among CNT family members
Table legends
Figure legends
1.5 Discussion
1.5.1 The ENT family
1.5.2 The CNT family
1.6 Conclusion
1.7 References
Chapter 2: Functional characterization of a putative human membrane transporter
Abstract
2.1 Introduction
2.2 Research objectives
2.3 Methods
2.3.1 Database analysis
2.3.2 Nomenclature
2.3.3 Plasmid construction
2.3.4 COS-7, HEK-293 and MCF-7 cell culture
2.3.5 Transfection for confocal microscopy and uptake assay
2.3.6 Confocal microscopy
2.3.7 Uptake assay using cell systems
2.3.8 In vitro RNA transcription
2.3.9 Uptake assay in Xenopus oocytes
2.4 Results
2.4.1 UPP family
2.4.1.1 Novel vertebrate UPPs
2.4.1.2 Novel invertebrate UPPs
2.4.1.3 Novel prokaryotic UPPs
2.4.2 Human UPP1
2.4.2.1 Cellular localization of hUPP1
2.4.2.2 Substrate specificity of hUPP1
Table legends
Figure legends
2.5 Discussion
2.6 Conclusion
2.7 References

List of Abbreviations

ADP     Adenosine diphosphate
ATP     Adenosine triphosphate
AZT     Azidothymidine
C-      Carboxyl
CNT     Concentrative nucleoside transporter
DIPY    Dipyridamole
DMEM    Dulbecco's modified Eagle's medium
DNA     Deoxyribonucleic acid
ENT     Equilibrative nucleoside transporter
FBS     Fetal bovine serum
GFP     Green fluorescent protein
GTP     Guanosine triphosphate
hCNT    Human concentrative nucleoside transporter
hENT    Human equilibrative nucleoside transporter
HIV     Human immunodeficiency virus
hUPP    Human unknown protein product
MFS     Major facilitator superfamily
N-      Amino
NBTI    Nitrobenzylthioinosine
NHS     Nucleoside: H+ symporter
NT      Nucleoside transporter
Nup     Nucleoside permease
OHS     Oligosaccharide: H+ symporter
RNA     Ribonucleic acid
SDS     Sodium dodecyl sulfate
SLC     Solute carrier
SP      Sugar porter
TM      Transmembrane
TMD     Transmembrane domain
UCP1    Uncoupling protein 1
UPP     Unknown protein product

List of Contributors

I would like to thank Dr. Brad Hanna at the University of Guelph for maintaining the frogs, performing the surgery to remove the oocytes and providing the necessary training for oocyte sorting and microinjection.

List of Tables

Chapter 1

Table 1: The ENT family members

Table 2: The CNT family members

Table 3: The ENT protein sequence similarities between Ciona intestinalis and human

Chapter 2

Table 1: The UPP family members

Table 2: Predicted expression profile for hUPP1 and hUPP2

Table 3: Comparison of [3H]-hypoxanthine and [3H]-uridine uptake by hUPP1 expressed in MCF-7 cells

Table 4: [3H]-hypoxanthine uptake during 60 minutes of exposure in oocytes expressing pCDNA3.1(+) and hUPP1

Table 5: [3H]-uridine uptake during 60 minutes of exposure in oocytes expressing pCDNA3.1(+), hENT1 and hUPP1

List of Figures

Chapter 1

Figure 1: Predicted membrane topology of human ENT1 (hENT1)

Figure 2: Predicted membrane topology of human CNT1 (hCNT1)

Figure 3: Phylogenetic distribution of ENT family members

Figure 4: Predicted membrane topology of CiENT

Figure 5: The evolutionary history of the ENT family

Figure 6: Conserved residues among ENT family members

Figure 7: Phylogenetic distribution of CNT family members

Figure 8: The evolutionary history of the CNT family

Figure 9: Conserved residues among CNT family members

Chapter 2

Figure 1: Predicted membrane topology of Ciona UPP

Figure 2: Predicted membrane topology of LacY and NupG of E. coli

Figure 3: Predicted membrane topology of hUPP1 and hUPP2

Figure 4: Plasma membrane localization of hUPP1 in COS-7 cells

Figure 5: Plasma membrane localization of hUPP1 in MCF-7 cells

Figure 6: [3H] hypoxanthine uptake by hUPP1 expressed in MCF-7 cells

Figure 7: [3H] uridine uptake by hUPP1 expressed in MCF-7 cells

Figure 8: [3H] hypoxanthine uptake by hUPP1 expressed in Xenopus oocytes

Figure 9: [3H] uridine uptake by hUPP1 expressed in Xenopus oocytes

Figure 10: Phylogenetic distribution of UPP family

Chapter 1: Molecular evolution of nucleoside transporters

Abstract

Mammalian cells possess two families of nucleoside transporters (NTs): equilibrative nucleoside transporters (ENTs) and concentrative nucleoside transporters (CNTs). These proteins are clinically significant because of their ability to transport nucleoside analog drugs used in anticancer and antiviral therapy. ENTs transport in the direction of the endogenous concentration gradient of nucleosides across the membrane, whereas CNTs are typically Na+ symporters. In humans, four ENT isoforms and three CNT isoforms have been identified to date. NT homologues have been identified in a number of eukaryotes and prokaryotes. To better understand the distribution and composition of the NT families, I have identified novel putative ENTs and CNTs in other organisms with recently sequenced genomes. I have used protein sequence similarity analyses employing ClustalW pairwise and multiple alignments, in combination with BLAST searches, to identify NT members in a variety of organisms representing a broad phylogeny. The identification of NT isoforms in tunicate, zebrafish, pufferfish, opossum, chicken, frog, insects, parasites, fungi and plants provides insight into how gene or genome duplication may have led to the evolution of different isoforms in vertebrates from a single ancestral prototype. In addition to the identification of novel putative NTs, I have also investigated the protein sequences of ENTs and CNTs to identify highly conserved residues which have remained unchanged over time. Taken together, these data present the most comprehensive evolutionary study of the NT families to date.

1.1 Introduction

1.1.1 Cellular and clinical relevance of nucleoside transporters

The nucleoside transporter (NT) family consists of many functionally diverse membrane transporters, which facilitate the passage of hydrophilic nucleosides and nucleobases across the cell membrane (Vijayalakshmi and Belt, 1988, Wu et al. 1992, Griffith et al. 1996, Cass et al. 1999, Cabrita et al. 2002, Podgorska et al. 2005). Nucleosides can either be produced inside a cell by de novo processes or, alternatively, can be transported into the cell by NTs. NTs provide salvage pathways for several mammalian cells and protozoan parasites. For example, bone marrow cells, enterocytes and purine auxotrophs like trypanosomes cannot synthesize nucleosides on their own, and NTs are their only way to retrieve these essential molecules from the extracellular environment (De Koning et al. 2005, Carter et al. 2001). Once inside the cell, nucleosides are phosphorylated, become nucleotides and act as building blocks of nucleic acids (DNA and RNA). Nucleosides are therefore essential for nucleic acid synthesis and, through their contribution to the intracellular nucleoside pool, NTs are involved in a wide spectrum of important cellular processes like transcription (DNA), protein synthesis (RNA) and bioenergetics (ATP, GTP, ADP).

NTs are extensively studied mainly due to their ability to transport various molecules involved in cell signaling (Pastor-Anglada et al. 2007). To date, a great deal of research has been done on the physiological role of adenosine. Adenosine is known to activate cell-surface adenosine receptors linked to intracellular signal transduction pathways, influencing physiological processes like neurotransmission and cardiovascular activity (Loffler et al. 2007, Jennings et al. 2001). NTs are likely to modulate these activities by removing extracellular adenosine and thus terminating the receptor-activated signal (Jennings et al. 2001).

Nucleoside transporters are also physiologically and pharmacologically important due to their contribution as nucleoside or nucleobase analog drug transporters (King et al. 2007, Zhang et al. 2007). NTs allow (ENTs) or mediate (CNTs) the transport of various hydrophilic drugs across the antagonistic lipid bilayer. These drugs are used in treatments against cancer and viral infections. Some of the routinely administered drugs for tumor treatment are gemcitabine, capecitabine, cladribine, cytarabine, fludarabine and 5-fluorouracil (Kong et al. 2004, Galmarini et al. 2003, Galmarini et al. 2002). After they enter cells, these drugs become phosphorylated and are incorporated into nucleic acids during synthesis. Analogs typically interfere with DNA replication and repair within cells, leading to inhibition of cell division, apoptosis and, in some cases, increased cellular sensitivity to other treatments such as radiation (Baldwin et al. 1999). Many nucleoside analog antiviral compounds are being used to treat HIV and hepatitis, e.g. AZT, ribavirin and didanosine (Zhang et al. 2007, Kong et al. 2004). The identification of multiple NTs in various protozoan parasites and the increased understanding of their ability to transport various cytotoxic drugs have made them potential therapeutic targets for developing antiparasitic drugs (Parker et al. 2000). Allopurinol is a nucleoside analog that is successfully being used in treating leishmaniasis and Chagas disease (Nakajima et al. 1996, Mishra et al. 2007). As more diverse roles of NTs are identified, more studies are being undertaken to understand the evolution, structure, function and regulation of these transporters.

1.1.2 Classification of nucleoside transporters

Mammalian cells have two structurally unrelated NT systems: equilibrative and concentrative.

Equilibrative nucleoside transporter (ENT) Family

ENTs are dependent on the concentration gradient of nucleosides and nucleobases across the membrane and belong to the solute carrier family 29 (SLC29) (Baldwin et al. 2004). However, orthologues of ENTs in parasitic protozoa such as Trypanosoma and Leishmania are proton-dependent symporters (Stein et al. 2003), suggesting that despite having structural similarity to other ENTs, these protozoan parasites use an ion-dependent mechanism to transport nucleosides. Multiple isoforms of ENT members have been found in mammals, fish, insects, worms, parasites, fungi and plants (Hyde et al. 2001, Acimovic and Coe, 2002). Four isoforms of ENTs have been identified to date in humans, ENT1-ENT4 (Hyde et al. 2001, Acimovic and Coe, 2002). However, only ENT1 and ENT2 have been characterized extensively at the molecular and functional level. Human ENT1 (hENT1) was the first member of this family to be identified, in 1996, and is now known as a purine and pyrimidine nucleoside transporter (Griffith et al. 1996). The ENT1 isoform is expressed widely in liver, heart, lung, spleen, testis, kidney, colon, brain, placenta and erythrocytes (Baldwin et al. 2004). Unlike ENT1, ENT2 can transport nucleobases along with nucleosides (Yao et al. 2002). ENT2 is also abundantly expressed in a range of tissues, particularly in skeletal muscle (Ward et al. 2000). ENT3 possesses an extended hydrophilic N-terminal region, unlike ENT1 and ENT2, and has a lysosomal targeting dileucine motif which appears to direct this protein to an intracellular localization (Baldwin et al. 2005). ENT4 has recently been identified as a ubiquitously expressed monoamine/organic cation transporter which is also capable of low-affinity adenosine transport in the heart (Barnes et al. 2006). However, this protein appears to be optimally functional at acidic pH and therefore its physiological relevance is not fully understood. All four family members have a similar predicted membrane topology, with an intracellular N-terminus, 11 transmembrane (TM) helices, an extracellular C-terminus and a cytoplasmic loop linking TM6 and TM7 (Figure 1) (Sankar et al. 2002).

Concentrative nucleoside transporters (CNTs)

CNTs are typically Na+ symporters which can move nucleosides against their endogenous concentration gradient and which belong to the solute carrier family 28 (SLC28) (Gary et al. 2004). There are three known CNT isoforms in humans, hCNT1-hCNT3 (Smith et al. 2007). hCNT1 is selective for pyrimidine nucleosides and adenosine, hCNT2 is selective for purine nucleosides and uridine, and hCNT3 is broadly selective for both purine and pyrimidine nucleosides (Smith et al. 2004, Dresser et al. 2000). CNT family members have been identified in mammals, hagfish, nematodes, fungi and prokaryotes. Unlike ENTs, CNTs are primarily localized in intestinal and renal epithelia, hepatocytes, choroid plexus, macrophages, splenocytes and leukaemic cells (Mangravite et al. 2001, Lu et al. 2004). Most CNT members share the common predicted membrane topology of an intracellular N-terminus, 13 TM helices and an extracellular glycosylated C-terminus (Figure 2) (Hamilton et al. 2002). However, the proton-coupled CNT from Escherichia coli, NupC, lacks TMs 1-3 but displays the remaining predicted topology (Loewen et al. 2004).

1.1.3 Evolutionary history of nucleoside transporters

Analyzing the evolutionary history of NTs in chordates is crucial in helping to understand the diverse and intricate processes these transporters are involved in. The comparative genomics of the NT family can provide insights into how vertebrate members may have evolved from invertebrate ancestors. The comprehensive analysis of the molecular evolution of their structure and function will also provide insight into conserved regions among different taxa and whether there has been a gain or loss of gene function (Funkhouser et al. 2007). Understanding this complicated evolutionary process can also aid in future developmental studies and in generating whole animal models for drug testing. In the long run, whole animal models can benefit the development of better approaches to drug administration in humans.

To elucidate the origin of the multiple NT isoforms in mammals, comprehensive data analyses have been conducted. These analyses have revealed novel orthologues in other vertebrate and invertebrate organisms. Previous analyses of the ENT family revealed an active gene duplication process and possible gene loss (Acimovic and Coe, 2002). Gene duplication is considered the major driving force for chordate evolution, and data suggest that it also plays a role in the evolution of the ENT family (Pebusque et al. 1998, Annilo et al. 2006, Acimovic and Coe, 2002). Previous data suggest that prior to the appearance of vertebrates two ENT lineages existed, the ENT4 lineage and the ENT1/2/3 lineage, which arose from an ancestral gene duplication event (Acimovic and Coe, 2002). Two subsequent gene duplications then took place in the ENT1/2/3 lineage. These three gene duplication events are thought to be responsible for the presence of four ENT isoforms in vertebrates. However, this model has not been updated or expanded since it was first proposed (Acimovic and Coe, 2002), particularly in light of the many genomes that are now available.

In addition, previous phylogenetic analyses of functionally characterized CNTs in human, mouse, rat, pig, nematode, pathogenic yeast and bacteria showed that CNT1 and CNT2 proteins are more similar to each other than to CNT3 proteins (Yao et al. 2002), which could be the result of a split between the CNT1/2 and CNT3 lineages due to an ancient gene duplication event. The discovery of a single CNT3 in the prevertebrate hagfish suggested that the CNT3 lineage existed in prevertebrates and that the first gene duplication event in the CNT family took place before the appearance of vertebrates. However, the timing of the ancient and subsequent gene duplication events and the order of appearance of these isoforms are still unclear.

1.2 Thesis Objectives

Due to the completion of multiple genome sequencing projects in recent years, genome sequences of organisms from different taxa are now available in publicly accessible databases, providing an opportunity to identify putative NT members in a single taxon. Furthermore, I can compare newly identified putative NTs to other completed NT families and analyze the evolutionary history and relationships of the NT family as a whole. Comparison of isoforms between a protochordate (e.g. the sea squirt Ciona intestinalis) or a prevertebrate (e.g. the hagfish Eptatretus stouti) and vertebrates will provide insight into the origin of ENT1-4 and CNT1-3 in vertebrates. Integrating previous data with recent findings from extensive data mining studies will further reveal the divergence patterns of NT family members. Thus, the research objectives are:

1. To expand the NT families by identifying novel members in phylogenetically diverse organisms, particularly those with complete genomes.

2. To deduce the origin of the four ENT isoforms and three CNT isoforms and to elucidate potential mechanisms for the existence of the different isoforms and the possible physiological relevance of multiple isoforms.

1.3 Methods

1.3.1 Database analysis

To investigate the origin of ENTs and CNTs in chordates, putative NT family members were identified using word-based searches (nucleosides, transporter, concentrative, adenosine, etc.) of the Joint Genome Institute database (http://genome.jgi-psf.org/), the Ensembl database and the NCBI database. Full-length protein sequences (containing initiating methionine and stop codon) identified on the basis of word searches were subsequently used in protein BLAST searches (http://www.ncbi.nlm.nih.gov/BLAST/) to identify homologous sequences in eukaryotic and prokaryotic organisms, particularly those with partially or completely sequenced genomes that are publicly accessible (e.g. Pan troglodytes, Macaca mulatta, Monodelphis domestica). Partial and complete protein sequences were further analyzed employing ClustalW pairwise and multiple alignments (default settings) to investigate the degree of similarity between the amino acid sequences of putative and characterized proteins (http://www.ebi.ac.uk/clustalw/). The transmembrane (TM) helix prediction program TMHMM 2.0 was used to determine the putative membrane topology of the protein sequences (http://cbs.dtu.dk/services/TMHMM).
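As a rough illustration of the pairwise comparison step, the sketch below computes percent identity between two sequences that have already been aligned. It is a simplified stand-in and not the scoring that ClustalW itself reports; the function name `percent_identity`, the use of `-` as the gap character and the toy sequences are assumptions made for this example only.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Both inputs must come from the same alignment, so they have equal length.
    This is an illustrative measure, not the ClustalW pairwise score.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy example with made-up residues: 4 identical columns out of 5 ungapped ones.
print(percent_identity("MKT-LV", "MKTALI"))
```

Gapped columns are excluded from the denominator here; other conventions (for example, dividing by the length of the shorter sequence) give different percentages, so the convention should be stated whenever similarity values are compared.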

1.3.2 Tree construction

The program Webphylip was used to generate unrooted phylogenetic trees using the protein parsimony method to visualize the distribution and similarity of these proteins among various taxa (http://biocore.unl.edu/WEBPHYLIP/). Phylogenetic trees were constructed using the most conserved region (amino acids from position 100 to 220) among vertebrates and invertebrates. Putative and previously characterized ENT proteins were aligned to identify the region with the highest conservation. Because the ENT4 isoforms show low overall sequence homology to the ENT1-3 isoforms, alignments were done both including and excluding the ENT4 proteins to identify conserved residues.
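The region selection and conserved-residue search described above can be sketched as follows. This is a minimal illustration that assumes the positions 100 to 220 refer to 1-based columns of a multiple alignment, that gaps are written as `-`, and that "conserved" means identical in every sequence; the function names `slice_region` and `conserved_positions` are hypothetical and are not part of Webphylip or ClustalW.

```python
def slice_region(aligned: dict, start: int = 100, end: int = 220) -> dict:
    """Cut every aligned sequence down to columns start..end (1-based, inclusive)."""
    return {name: seq[start - 1:end] for name, seq in aligned.items()}


def conserved_positions(aligned: dict, start: int = 100, end: int = 220) -> list:
    """Return the 1-based columns in start..end where all sequences share one residue."""
    seqs = list(aligned.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have equal length")
    conserved = []
    for col in range(start - 1, min(end, length)):
        residues = {s[col].upper() for s in seqs}
        # A column counts only if one residue is present and it is not a gap.
        if len(residues) == 1 and "-" not in residues:
            conserved.append(col + 1)
    return conserved


# Toy alignment with made-up sequences; real input would be the ENT alignment.
toy = {"a": "MKTLV", "b": "MRTLI", "c": "MKT-V"}
print(conserved_positions(toy, 1, 5))
print(slice_region(toy, 2, 4))
```

Running the same function on the alignment with and without the ENT4 sequences mirrors the two alignment sets described in the text.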

1.3.3 Nomenclature

Newly identified protein sequences were designated as ENT1, 2, 3 or 4, or CNT1, 2 or 3, according to their sequence homology to previously characterized ENT or CNT proteins, preceded by an abbreviation of the genus and species name. Previously identified sequences in human, mouse and rat were named accordingly, e.g. hENT1, mENT1. In exceptional cases where the abbreviation of genus and species was the same for two organisms, the first two letters of the genus followed by the first letter of the species name were used, e.g. CafENT for Canis familiaris, since CfENT has been used to designate Crithidia fasciculata and CaENT has been used for Candida albicans. If the orthology was clear between a mammalian ENT and two duplicated genes of another organism, they were designated as ENT1.1 and ENT1.2. For several invertebrates, the ENT homology was not clear and their naming does not reflect similarity to other prototypic ENTs but rather their order of discovery.
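The naming convention can be expressed as a small helper, shown here only as a sketch. The functions `species_prefix` and `designate` and the `taken` set of already-used prefixes are hypothetical constructs for illustration; the single-letter prefixes used for human, mouse and rat (h, m, r) are not handled.

```python
def species_prefix(genus: str, species: str, taken: set) -> str:
    """Build the genus/species abbreviation, lengthening it on a collision."""
    short = genus[0].upper() + species[0].lower()
    if short not in taken:
        return short
    # Collision: first two letters of the genus plus first letter of the species.
    return genus[:2].capitalize() + species[0].lower()


def designate(genus, species, family, isoform, taken, copy=None):
    """Compose a name such as PtENT1, or MdENT1.2 for a duplicated gene."""
    name = f"{species_prefix(genus, species, taken)}{family}{isoform}"
    if copy is not None:
        name += f".{copy}"
    return name


# Prefixes assumed to be in use already (Crithidia fasciculata, Candida albicans).
used = {"Cf", "Ca"}
print(designate("Pan", "troglodytes", "ENT", 1, used))
print(designate("Canis", "familiaris", "ENT", 4, used))
print(designate("Monodelphis", "domestica", "ENT", 1, used, copy=2))
```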

1.4 Results

1.4.1 ENT family

The word-based searches and BLAST searches of the NCBI, JGI and Ensembl databases identified a total of 141 ENTs, based on sequence similarity, in a variety of taxa from Chordata, Nematoda, Platyhelminthes, Echinodermata, Arthropoda, Protozoa and Mycetozoa (Table 1, Figure 3). The compiled database consists of both previously characterized ENT proteins and novel, uncharacterized ENT proteins that require functional characterization to confirm their status as ENTs. To date, only 34 of the 141 proteins identified have been functionally characterized.

1.4.1.1 Novel mammalian ENTs

Pan troglodytes: Completion of the genome sequencing project for the closest relative of humans, the chimpanzee (Pan troglodytes), has allowed identification of putative ENTs, which would be predicted to show the least evolutionary change compared to hENTs. Four novel PtENTs (corresponding to ENT1, ENT2, ENT3 and ENT4) were obtained from data derived from the genome sequencing project. Protein and mRNA sequences are typically predicted by automated computational analysis using gene prediction programs such as GNOMON. Thus, the database describes putative PtENT1 as 711 amino acids in length (based on GNOMON analysis), which is considerably longer than hENT1. On closer inspection of the predicted protein sequence, it was evident that a novel initiating methionine had been designated as the beginning of this protein (compared to other mammalian ENT1 proteins) and that this methionine is considerably upstream of the initiating methionine in hENT1 (Unpublished data, Abdulla and Coe). Further analysis of this N-terminally extended protein by TMHMM suggests that the putative PtENT1 possesses an intracellular 254 amino acid N-terminal region, followed by the classic 11 TM domain structure that typifies ENTs. Alignment of PtENT1 with hENT1 identified another potential initiating methionine in the same location as in other mammalian ENT1 proteins, which would result in a protein of 456 amino acids, typical for ENT1 isoforms in mammals. Additionally, a putative PtENT1 that is 456 amino acids long has been located in the ENSEMBL database. Thus, I have included this protein as a putative PtENT1, while acknowledging that the existence of an additional in-frame initiating methionine raises the possibility that previously unidentified splice variants of ENT1 may exist.
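The search for an alternative initiating methionine can be illustrated with a short sketch. In the thesis the second methionine was found by alignment with hENT1; the code below uses a simpler length-based check as a stand-in, and the function name `candidate_starts`, the `tolerance` parameter and the toy sequence are assumptions for illustration rather than the procedure actually used.

```python
def candidate_starts(protein: str, expected_length: int, tolerance: int = 0) -> list:
    """List (position, product_length) for each Met that could start a typical-length protein.

    position is 1-based; product_length is the number of residues from that
    methionine to the C-terminus of the predicted sequence.
    """
    protein = protein.upper()
    hits = []
    for i, residue in enumerate(protein):
        if residue != "M":
            continue
        remaining = len(protein) - i
        if abs(remaining - expected_length) <= tolerance:
            hits.append((i + 1, remaining))
    return hits


# Made-up 30-residue sequence with an upstream Met and an internal Met at position 11.
toy_protein = "M" + "A" * 9 + "M" + "A" * 19
print(candidate_starts(toy_protein, expected_length=20))
```

A length match alone does not establish that a methionine is a genuine start site, which is why the candidates reported in this section were also checked against alignments and the ENSEMBL predictions.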

The putative PtENT3 is described as having a length of 638 amino acids, as predicted by GNOMON analysis. Although the TMHMM-generated hydropathy plot for this protein suggested a typical 11 TM domain structure, the intracellular 181 amino acid N-terminal region is unique compared to other ENT3 homologues. Alignment with hENT3 revealed that the presence of a novel upstream methionine resulted in this longer PtENT3 protein. However, the alignment also showed the presence of a second initiating methionine downstream of the first, resulting in a protein which is 475 residues in length. This "truncated" PtENT3 possesses high sequence homology and has a comparable length to other mammalian ENT3 orthologues. Since this protein is also listed as a putative ENT3 in the ENSEMBL genome browser, it has been included in Table 1 as a putative PtENT3. As with PtENT1, these observations suggest the possibility of a second splice variant of PtENT3. A small N-terminal segment has also been identified in the chimpanzee genome that has high sequence similarity to other ENT4 proteins. Although the mRNA and protein sequences are still incomplete, this partial sequence has been included as a putative PtENT4 in Table 1.

Macaca mulatta: A third primate genome, that of the rhesus monkey, revealed four prototypic ENTs (MmENT1-4). The putative MmENT1 has been described as 752 amino acids in length based on GNOMON analysis. TMHMM analysis showed that this protein has a longer N-terminal region than other mammalian ENT1 proteins. Further investigation with ClustalW alignment revealed that a novel methionine upstream of that in hENT1 was used, which generated this long intracellular region (256 amino acids) at the N-terminus. A similar long internal loop has also been identified during this analysis in PtENT1. Further resembling PtENT1, a second initiating methionine was found in MmENT1. Noting the presence of a second initiating methionine and aligning MmENT1 with hENT1 generated a protein that is 456 residues in length. An identical protein has been identified in the ENSEMBL database and, since this "truncated" version of MmENT1 has high sequence homology and comparable length to other ENT1 isoforms, it has been included in Table 1 as another putative MmENT1.

The putative MmENT4 has a large intracellular loop (200 residues) and an unusually long N-terminal region compared to other mammalian ENT4 proteins. This putative transporter also has high sequence homology to other mammalian ENT4 isoforms from residue 201 to residue 645. This 524 residue long sequence has therefore been included as another putative transporter lacking an initiating methionine. This suggests the possibility of two splice variants of MmENT4, one of which is longer than the typical ENT4 proteins and the other of which is similar in length to other ENT4 proteins but only partially identified.

Bos taurus: Four putative ENTs have also been identified in Bos taurus, BtENT1-4. A complete putative BtENT3 protein was identified that is 474 amino acids long. However, another putative BtENT3, 476 amino acids in length, was listed in the ENSEMBL database; it lacks an initiating methionine but possesses high sequence similarity to other ENT3 proteins. This suggests the possibility of another existing but unidentified isoform of BtENT3 that is longer than usual ENT3 proteins.

The putative BtENT4 is 608 amino acids in length and, when aligned with hENT4, has 78% similarity. However, further investigation revealed another BtENT4 isoform, 525 amino acids in length, that has 86% similarity to hENT4. Since both of these proteins show high similarity to other mammalian ENT4 proteins, both have been listed as putative BtENT4.

Canis familiaris: Four novel ENT isoforms have been obtained from the dog genome sequence available at the NCBI database. The putative CafENT4 protein has been identified as 548 amino acids long. The first 22 amino acids of CafENT4 give it a longer N-terminal region, including a novel upstream methionine, compared to other mammalian homologues. However, aligning this protein with other ENT4 sequences revealed the presence of another downstream initiating methionine that generated a protein 526 amino acids in length. This second initiating methionine is conserved among both functionally characterized and novel mammalian ENT4 proteins. The additional 22 amino acids upstream of the conserved methionine could be the N-terminus of another CafENT4 isoform.
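The relationship between an upstream start site and a conserved downstream methionine can be illustrated with a short sketch. This is only an illustration under stated assumptions: the function name `downstream_start_candidates` and the toy sequence are hypothetical, and the toy sequence merely reproduces the lengths reported above for CafENT4 (548 and 526 residues), not its real amino acid sequence.

```python
def downstream_start_candidates(protein: str) -> list[tuple[int, int]]:
    """List (1-based Met position, resulting protein length) for every
    methionine located downstream of residue 1."""
    candidates = []
    for index, residue in enumerate(protein):
        if index > 0 and residue == "M":
            # Starting translation here removes the first `index` residues.
            candidates.append((index + 1, len(protein) - index))
    return candidates


# Hypothetical toy sequence (NOT the real CafENT4): a 22-residue upstream
# extension followed by a 526-residue core that begins with methionine.
upstream_extension = "M" + "A" * 21
core = "M" + "G" * 525
toy_protein = upstream_extension + core

print(len(toy_protein))                          # 548
print(downstream_start_candidates(toy_protein))  # [(23, 526)]
```

In practice each candidate position would still need to be checked against an alignment with characterized homologues, as was done in the text, since an internal methionine alone does not establish a start site.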

Monodelphis domestica: A recent addition to the NCBI database is the opossum genome. Five putative ENTs have been identified in this vertebrate, including two splice variants of MdENT1 (1.1 and 1.2). Unlike other mammalian homologues, the MdENT2 protein is described as 632 residues long, which is longer than typical ENT2 proteins. Although the TMHMM-generated hydropathy plot showed the expected 11 TM domain structure, the intracellular 200 amino acid N-terminal region is uncommon for ENT2 proteins. Aligning this protein with other mammalian ENT2 amino acid sequences did not reveal another downstream initiating methionine. However, a highly homologous region was observed from residue 201 at the N-terminal region to residue 632 at the C-terminal region. Further investigation revealed that this region is highly conserved among all vertebrate ENT2 isoforms. Thus, this partial protein sequence has been listed in the table as another putative MdENT2, 433 amino acids in length and lacking the initiating methionine.

1.4.1.2 Novel avian ENTs

Gallus gallus: Three putative avian ENT proteins have also been identified in the chicken genome (GgENT1, 2 and 4). The putative GgENT1 is 449 amino acids long based on GNOMON analysis. Another putative GgENT1, listed in the ENSEMBL database as 457 amino acids long, has eight extra amino acids (ALCLFLPA) between residues 250 and 251 compared to the GNOMON prediction. The rest of the two sequences are 100% identical. Both have been listed as putative GgENT1 in Table 1. An ENT3-like isoform has not been identified in the chicken genome. Given that the ENT3 prototype has been found in all other species, it remains inconclusive whether this isoform exists in this taxon. Also, only a partial ENT4 sequence has been found, which appears to be missing the N-terminal sequence based on comparisons with other ENT4 proteins.
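The comparison of the two GgENT1 predictions amounts to locating a single contiguous insertion between two otherwise identical sequences. A minimal sketch of that check follows; the function name `find_single_insertion` and the toy sequences are assumptions made for illustration, with only the lengths (449 and 457), the insertion site and the inserted peptide taken from the text.

```python
from typing import Optional


def find_single_insertion(shorter: str, longer: str) -> Optional[tuple[int, str]]:
    """If `longer` equals `shorter` plus one contiguous insertion, return
    (number of residues before the insertion, inserted peptide).
    Return None when the two sequences differ in any other way."""
    extra = len(longer) - len(shorter)
    if extra <= 0:
        return None
    # Length of the shared prefix.
    prefix = 0
    while prefix < len(shorter) and shorter[prefix] == longer[prefix]:
        prefix += 1
    # Everything after the inserted block must match the rest of `shorter`.
    if longer[prefix + extra:] != shorter[prefix:]:
        return None
    return prefix, longer[prefix:prefix + extra]


# Hypothetical toy sequences (NOT the real GgENT1 sequences).
short_form = "M" + "G" * 249 + "S" * 199                      # 449 residues
long_form = short_form[:250] + "ALCLFLPA" + short_form[250:]  # 457 residues

print(find_single_insertion(short_form, long_form))  # (250, 'ALCLFLPA')
```

When the inserted peptide begins with the same residue as the sequence that follows it, the reported position can shift by a few residues; the decomposition remains valid but is not unique.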

1.4.1.3 Novel fish ENTs

The varying genome sizes and presence of duplicated genes made it challenging to identify ENT homologues in fish genomes. In the pufferfish Tetraodon nigroviridis, four prototypic novel ENTs were identified (TnENT1-TnENT4). During a previous comprehensive analysis of the ENT family, five putative partial ENTs were identified within another pufferfish genome (Fugu), and four of them could be classified with confidence as FrENT1-4 (Acimovic and Coe, 2002). In this study, a total of six putative ENTs have been identified: FrENT1.1, FrENT1.2, FrENT2.1, FrENT2.2, FrENT3 and FrENT4. All except FrENT3 are incomplete sequences. Since Fugu and Tetraodon possess relatively small genomes, another completed fish genome was examined, and it confirmed the presence of additional isoforms in comparison to mammalian ENTs. Seven novel ENTs have been identified in the zebrafish genome: DrENT1.1, DrENT1.2, DrENT2.1, DrENT2.2, DrENT3, DrENT4.1 and DrENT4.2. The putative DrENT3 found in the NCBI database was unusual in both length (919 residues) and the predicted location (extracellular) of the N-terminal region. In contrast, another putative ENT3 was found in the ENSEMBL genome browser, which is 385 amino acids long and lacks both an initiating methionine and a stop codon. These two proteins had identical amino acid sequences, but aligning them required gaps to be introduced into the shorter sequence. This leads to the speculation that these two could be splice variants of DrENT3. Although most of the fish ENTs had the typical 11 TM domain topology, many of the sequences are incomplete since they lack an initiating methionine or a stop codon.

1.4.1.4 Novel amphibian ENTs

Xenopus laevis, commonly known as the African clawed frog, possesses two complete putative ENT sequences which, based on sequence similarity, are orthologues of hENT1 and hENT2 and belong to the ENT1 and ENT2 lineages respectively. Another partial ENT3-like sequence was identified in this amphibian, but an ENT homologue representing the ENT4 lineage was not found.

1.4.1.5 Novel tunicate ENT

The presence of ENT homologues in urochordates was first reported by Acimovic and Coe (2002). However, at that time, only a partial sequence was identified in the Ciona genome since the sequencing of this genome was not complete. Word search and Blast search on the JGI database confirmed the presence of one complete putative ENT protein. The novel putative ENT protein in Ciona (CiENT) showed high sequence homology with human ENTs (hENT) 1 to 3 (Table 3). Predicted membrane topologies show that the putative CiENT is comparable in profile to hENT1 (Figure 4), consisting of a short intracellular NH2-terminal domain, 11 TM domains and a short extracellular COOH-terminal domain. These data are consistent with those generated by ClustalW analyses, and I included this protein as a putative ENT that represents the ENT1/2/3 lineage. The presence of a single ENT protein in this lineage suggests that further gene duplication took place in this lineage after the appearance of protochordates (Figure 5).

1.4.1.6 Novel Arthropod ENTs

Previously, three ENTs have been identified in Drosophila: DmENT1, 2 and 3 (Machado et al. 2007). Based on sequence similarity and predicted topology, DmENT3 is clearly the orthologue of hENT4. Three other arthropod genomes have been examined. As expected, three putative ENTs were found in the honey bee (Apis mellifera) and the yellow fever mosquito (Aedes aegypti). In both insects, two putative ENTs showed high sequence similarity to DmENT1 and DmENT2, and one had high similarity to DmENT3. In the Anopheles genome, two isoforms were found, representing lineage 1/2 and lineage 4 (Figure 5).

1.4.1.7 Novel echinoderm ENTs

Strongylocentrotus purpuratus (sea urchin) is a model organism for developmental studies. Analyses of the sea urchin genome revealed two ENTs representing the first two ENT lineages that were predicted to have arisen from the first gene duplication event (Figure 5). When compared to chordate ENTs, e.g. those of mouse, SpENT1 was 28%, 25% and 24% similar to mENT1, mENT2 and mENT3 respectively, but only 15% similar to mENT4. On the other hand, SpENT4 was 40% similar to mENT4 and showed lower homology (between 21% and 23%) to the other isoforms, supporting the contention that ENTs may have arisen from an ancient gene duplication which occurred before the appearance of vertebrates.
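Percentages such as those quoted above are derived from pairwise alignments. The sketch below shows one common way to compute a percent identity from two already aligned sequences, counting only columns where neither sequence has a gap. This convention is an assumption; the scores reported in this thesis come from ClustalW and may have been normalized differently. The function name and the toy alignment are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns in which neither sequence
    has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == "-" or residue_b == "-":
            continue  # gap columns are not scored
        compared += 1
        if residue_a == residue_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical toy alignment: 5 identical residues in 6 compared columns.
print(round(percent_identity("ACDE-FG", "ACNEKFG"), 1))  # 83.3
```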

1.4.1.8 Novel nematode ENTs

Previously, six ENTs have been described in the nematode C. elegans (Acimovic and Coe, 2002), and CeENT1, 2 and 3 have been functionally characterized (Appleford et al. 2004). Current sequence similarity analysis suggests that CeENTs 1-5 represent the ENT1/2/3 lineage and CeENT6 represents the highly conserved but separate ENT4 lineage. During this study, another member of this family was discovered, which has been designated CeENT7. This newly found isoform showed higher sequence similarity to DmENT2 (32%) than to DmENT3 (15%) and clearly belongs to the ENT1/2/3 lineage (Figure 5).
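The lineage assignment used here, in which a novel sequence is placed in the lineage of the reference protein it most resembles, can be sketched as follows. The similarity values are the ones quoted in the text; the function name `best_hit` and the lookup table mapping reference proteins to lineages are assumptions introduced for illustration.

```python
def best_hit(similarities: dict[str, float]) -> str:
    """Return the reference protein with the highest similarity score."""
    return max(similarities, key=similarities.get)


# Assumed lookup table: lineage represented by each reference protein.
LINEAGE_OF_REFERENCE = {
    "DmENT2": "ENT1/2/3",
    "DmENT3": "ENT4",
    "mENT1": "ENT1/2/3",
    "mENT2": "ENT1/2/3",
    "mENT3": "ENT1/2/3",
    "mENT4": "ENT4",
}

# Percent similarities reported in the text.
ce_ent7 = {"DmENT2": 32.0, "DmENT3": 15.0}
sp_ent1 = {"mENT1": 28.0, "mENT2": 25.0, "mENT3": 24.0, "mENT4": 15.0}

for name, scores in (("CeENT7", ce_ent7), ("SpENT1", sp_ent1)):
    reference = best_hit(scores)
    print(name, reference, LINEAGE_OF_REFERENCE[reference])
```

A best-hit rule of this kind is only a first approximation; it becomes unreliable when the top scores are close, which is why the divergent protozoan and mycetozoan ENTs were not assigned to a prototype.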

1.4.1.9 Novel mycetozoan ENTs

Previously, one partial putative ENT sequence was identified in Dictyostelium discoideum. During this study I have been able to identify three complete putative ENTs in this simple organism. Although they have the typical predicted 11 TM domain structure, they have low sequence homology to other invertebrate and vertebrate ENTs. Thus, their homology to specific ENT isoforms could not be determined, and they were numbered according to their order of discovery. DdENT1, DdENT2 and DdENT3 are 26%, 20% and 17% similar to hENT1 respectively.

1.4.1.10 Novel Protozoan ENTs

Numerous ENT isoforms have been identified and characterized in protozoan parasites, e.g. Leishmania donovani, Crithidia fasciculata, Trypanosoma brucei and Plasmodium falciparum (Landfear et al. 2002, Sanchez et al. 2002, Liu et al. 2005, De Koning et al. 2003). Novel ENT isoforms have been identified during this study in Trichomonas vaginalis, Tetrahymena thermophila and Giardia lamblia. The parasitic ENTs possess higher sequence variability than other ENT proteins and therefore could not be grouped according to the four ENT prototypes (Figure 5).

1.4.1.11 Novel Plant and fungal ENTs

Previously, eight potential ENT members have been identified in Arabidopsis thaliana and, to date, five of them have been functionally characterized (Li et al. 2003). Four ENT members have also been found in Oryza sativa (rice), and one of them has been characterized (Hirose et al. 2005). Additionally, I have identified two novel ENTs in Vitis vinifera (grapevine). Some fungi also contain ENT-like proteins (e.g. Candida albicans), and novel fungal ENT members have been detected in Aspergillus clavatus and in Cryptococcus neoformans (Figure 5).

1.4.1.12 Conserved residues among ENT family members

Having identified a total of 141 putative and characterized ENTs, I used these data to investigate the conservation of specific regions or residues within the ENTs which might indicate structural or functional constraints. These data can direct future experimental approaches to investigate the structure and function of ENTs. Multiple Clustal alignments revealed that the following seven residues in hENT1 are conserved among all invertebrates and vertebrates: glycine at position 22, proline at position 209, proline at position 308, asparagine at position 338, glycine at position 344, arginine at position 368 and glycine at position 408. In terms of conserved motifs, a PWN motif at position 28 of hENT1 is present in ENT1, ENT2 and ENT3 proteins, whereas the ENT4 proteins have a PYNSF motif that is conserved among mammals, birds, insects and echinoderms (Figure 6).
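Identifying residues that are conserved across a multiple alignment, and reporting them in the numbering of a reference sequence such as hENT1, can be sketched as below. This is a minimal illustration, not the procedure used to produce Figure 6: the function name, the sequence labels and the toy alignment are hypothetical, and a column is treated as conserved only when every sequence carries the same non-gap residue.

```python
def conserved_positions(alignment: dict[str, str], reference: str) -> list[tuple[int, str]]:
    """Return (reference position, residue) for every alignment column in
    which all sequences share the same non-gap residue. Positions are
    1-based and counted along the ungapped reference sequence."""
    rows = list(alignment.values())
    width = len(alignment[reference])
    if any(len(row) != width for row in rows):
        raise ValueError("all aligned sequences must have equal length")
    conserved = []
    reference_position = 0
    for column in range(width):
        reference_residue = alignment[reference][column]
        if reference_residue == "-":
            continue  # a gap in the reference has no residue number
        reference_position += 1
        if all(row[column] == reference_residue for row in rows):
            conserved.append((reference_position, reference_residue))
    return conserved


# Hypothetical toy alignment (NOT real ENT sequences).
toy_alignment = {
    "ref_toy": "MG-PWNA",
    "seq_b":   "LGKPWNS",
    "seq_c":   "MG-PYNA",
}
print(conserved_positions(toy_alignment, "ref_toy"))
# [(2, 'G'), (3, 'P'), (5, 'N')]
```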

1.4.2 CNT family

The word-based and Blast searches resulted in a total of 67 CNTs being identified, based on sequence homology, in different taxa ranging from chordates to prokaryotes (Table 2, Figure 7). The compiled database consists of previously characterized and putative CNTs.

1.4.2.1 Novel vertebrate CNTs

According to the current literature, three CNT isoforms are present in the human genome (Smith et al. 2007) and in the genomes of other mammals, e.g. mouse and rat. The current database mining study revealed that these three CNT isoforms also exist in chimpanzee, cow and opossum. However, in several mammalian genomes only one or two novel isoforms could be identified, e.g. CNT2 and CNT3 in rhesus monkey, CNT1 and CNT3 in dog, and CNT1 in pig and rabbit. Although the second or third isoforms could not be identified in these genomes, it is highly probable that all three isoforms are present and could be identified once the complete genome sequences are available. Interestingly, the chimpanzee genome was found to contain more than one isoform of CNT2. PtCNT2.1 and PtCNT2.2, 643 and 658 amino acids long respectively, are 99% identical to each other. This again demonstrates the possibility of multiple isoforms among NT family members.

The chicken genome revealed two putative CNTs, one highly similar to hCNT1 and hCNT2 (56% and 57% respectively) and the other to hCNT3 (65%). In the amphibian genome only one putative CNT was found, which is homologous to hCNT1 and hCNT2 (57% to hCNT1 and 56% to hCNT2), representing the CNT1/2 lineage, and showed lower sequence homology to hCNT3 (46%).

Two putative CNTs representing the CNT1/2 lineage and the CNT3 lineage were also found in the pufferfish genome. In the zebrafish genome, only one CNT protein could be identified, which showed high resemblance to CNT1/2 lineage members. The previously identified and characterized hagfish CNT belongs to the CNT3 lineage, and it is still to be determined whether this prevertebrate possesses a CNT1/2-like isoform. The vertebrate CNTs are highly conserved (particularly when compared to ENTs), and high sequence homology has been found among all the members, with the CNT1 and CNT2 isoforms being more similar to each other than to CNT3. The presence of all three isoforms in mammals and two isoforms in other vertebrates and invertebrates supports the contention that the CNT1/2 lineage split after the appearance of chordates (Figure 8).

1.4.2.2 Novel tunicate CNTs

A word-based search of the Ciona genome revealed three putative CNTs. However, ClustalW alignment revealed that two of these sequences are identical, and thus only one of the pair has been included in this study. Interestingly, one putative CiCNT shows high sequence homology to the vertebrate CNT1 and CNT2 isoforms (47% to hCNT1 and 48% to hCNT2), whereas the other appears to belong to the CNT3 lineage (Figure 8) and was highly homologous to vertebrate CNT3 isoforms (53% identical to hCNT3).
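Removing duplicate database hits of this kind is a simple filtering step. The sketch below keeps the first entry for each distinct sequence; the function name, the entry labels and the toy sequences are hypothetical and do not correspond to the actual Ciona records.

```python
def drop_identical_sequences(entries: dict[str, str]) -> dict[str, str]:
    """Keep only the first entry for each distinct sequence, preserving
    the original order of the entries."""
    seen = set()
    unique = {}
    for name, sequence in entries.items():
        if sequence in seen:
            continue  # identical to an entry that was already kept
        seen.add(sequence)
        unique[name] = sequence
    return unique


# Hypothetical toy hits: the first two are identical.
hits = {"hit_a": "MKTLLV", "hit_b": "MKTLLV", "hit_c": "MRSAIV"}
print(drop_identical_sequences(hits))
# {'hit_a': 'MKTLLV', 'hit_c': 'MRSAIV'}
```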

1.4.2.3 Novel invertebrate CNTs

Several invertebrate putative CNT members have been identified based on sequence similarity. Two novel putative CNTs have been found in each of the Drosophila and Anopheles genomes, one being homologous to vertebrate CNT3 isoforms and the other to CNT1 and CNT2 isoforms, thus confirming the existence of the CNT3 and CNT1/2 lineages in arthropods. In Apis mellifera and Aedes aegypti, however, only one putative CNT was found, which belonged to the CNT3 lineage (AmCNT3 is 42% identical to DmCNT3 and 32% identical to DmCNT1/2; AaCNT3 is 58% identical to DmCNT3 and 39% identical to DmCNT1/2).

Five novel CNTs have been identified in the echinoderm sea urchin. Based on sequence analyses, SpCNT3.1, 3.2, 3.3 and 3.4 were highly homologous to hCNT3 (40 to 41% identical), whereas SpCNT1/2 was only 30% identical to hCNT3. However, SpCNT3.1, 3.2 and 3.3 are 99% identical to each other, and at present it is not clear if they are splice variants of one gene. Previous data also confirm the presence of two CNT proteins in nematodes that are 66% identical to each other (Xiao et al. 2001). CeCNT1/2 and CeCNT3 represent the CNT1/2 and CNT3 lineages respectively (Figure 8).

1.4.2.4 Novel fungal CNTs

Homologues of the previously characterized CaCNT in Candida albicans were identified in several other fungi - Aspergillus niger, Neosartorya fischeri, Coprinopsis cinerea, Chaetomium globosum, Yarrowia lipolytica. All of these novel CNTs possess the predicted 12 to 13-TMD structure similar to CaCNT and show 22 to 30% similarity to vertebrate CNTs.

1.4.2.5 Novel prokaryotic CNTs

Based on sequence homology, CNT members have been identified in a wide range of prokaryotic organisms. NupC was the first prokaryotic CNT member to be discovered and characterized (Loewen et al. 2004). Homologues have been identified in gram-negative and gram-positive bacteria, including denitrifying, anaerobic, bioluminescent and pathogenic strains, e.g. Shewanella denitrificans, Klebsiella pneumoniae, Photorhabdus luminescens and Vibrio fischeri. Similar to NupC, most of them display a predicted 10 TMD topology.

1.4.2.6 Conserved residues among CNT family members

Clustal alignment has revealed 10 residues in hCNT1 that are conserved among all eukaryotic and prokaryotic CNT members. Using hCNT1 as the reference sequence, these are a glycine at position 227, a glutamate at position 308, a proline at position 387, a proline at position 470, a glycine at position 476, a lysine at position 492, an asparagine-glutamate-X-X-alanine motif starting at residue 496, and a phenylalanine at position 541 (Figure 9).
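A motif written as asparagine-glutamate-X-X-alanine, where X is any residue, corresponds to the regular expression `NE..A` in one-letter code. The following sketch locates such motifs in a protein sequence and reports 1-based start positions; the function name and the toy sequence are hypothetical, and the same approach would apply to the PWN and PYNSF motifs of the ENT family.

```python
import re


def motif_starts(sequence: str, pattern: str) -> list[int]:
    """Return the 1-based start positions of every (possibly overlapping)
    match of a regular-expression motif in a protein sequence."""
    # A lookahead lets overlapping occurrences be reported as well.
    return [match.start() + 1 for match in re.finditer(f"(?={pattern})", sequence)]


# Hypothetical toy sequence (NOT hCNT1).
toy_sequence = "MKNEQLAGGNEAAA"
print(motif_starts(toy_sequence, "NE..A"))  # [3, 10]
```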

Table legends

Table 1: The ENT family members. Functionally characterized (F.C.) ENT protein sequences were used in BLAST searches in the NCBI, ENSEMBL or JGI database to identify novel ENT members. TMHMM2.0 was used to predict the number of putative transmembrane helices (TMH).

Table 2: The CNT family members. Functionally characterized (F.C.) CNT protein sequences were used in BLAST searches in the NCBI, ENSEMBL or JGI database to identify novel CNT members. TMHMM2.0 was used to predict the number of putative transmembrane helices (TMH).

Table 3: The ENT protein sequence similarities between Ciona intestinalis and human. Full-length protein sequences were used in ClustalW multiple alignment (using default settings) to determine the sequence similarity. The putative Ciona ENT (CiENT) showed higher homology to human ENT (hENT) 1, 2 and 3 than to hENT4.

Table 1: The ENT family members (F.C. = Functionally characterized, TMH = Transmembrane helix)

No. | Name | Identifier | Organism | Protein size (residues) | Met/Stop codon | F.C. | Putative TMH
1 | hENT1 | GI:4826716 | Homo sapiens | 456 | ✓ | ✓ |
2 | hENT2 | GI:38708299 | Homo sapiens | 456 | ✓ | ✓ |
3 | hENT3 | GI:119574809 | Homo sapiens | 475 | ✓ | ✓ |
4 | hENT4 | GI:100913032 | Homo sapiens | 530 | ✓ | ✓ |
5 | PtENT1 | GI:114607582 | Pan troglodytes | 711 | ✓ | X |
5.1 | PtENT1 | ENSPTRG00000018216 | Pan troglodytes | 456* | ✓ | X |
6 | PtENT2 | GI:114638694 | Pan troglodytes | 456 | ✓ | X |
7 | PtENT3 | GI:114631148 | Pan troglodytes | 638 | ✓ | X |
7.1 | PtENT3 | ENSPTRG00000023469 | Pan troglodytes | 475* | ✓ | X |
8 | PtENT4 | GI:114613575 | Pan troglodytes | 130 | No Stop | X | Partial sequence
9 | MmENT1 | GI:109071342 | Macaca mulatta | 752 | ✓ | X |
9.1 | MmENT1 | ENSMMUG00000022685 | Macaca mulatta | 456* | ✓ | X |
10 | MmENT2 | GI:109109548 | Macaca mulatta | 393 | X | X | Partial sequence
11 | MmENT3 | GI:109089410 | Macaca mulatta | 475 | ✓ | X |
12 | MmENT4 | GI:109065906 | Macaca mulatta | 645 | ✓ | X |
12 | MmENT4 | GI:109065906 | Macaca mulatta | 524* | No Met | X | Partial sequence
13 | BtENT1 | GI:77735743 | Bos taurus | 456 | ✓ | X |
14 | BtENT2 | ENSBTAG00000012024 | Bos taurus | 454 | ✓ | X | 10
15 | BtENT3 | GI:122692299 | Bos taurus | 474 | ✓ | X |
15.1 | BtENT3.1 | ENSBTAG00000000839 | Bos taurus | 476* | No Met | X | Partial sequence
16 | BtENT4 | GI:119917365 | Bos taurus | 608 | ✓ | X |
16 | BtENT4 | ENSBTAG00000007772 | Bos taurus | 525* | ✓ | X |
17 | CafENT1 | GI:50979327 | Canis familiaris | 456 | ✓ | X |
18 | CafENT2 | GI:73983635 | Canis familiaris | 456 | ✓ | X |
19 | CafENT3 | GI:73953438 | Canis familiaris | 473 | ✓ | X |
20 | CafENT4 | GI:73958166 | Canis familiaris | 548 | ✓ | X |
20 | CafENT4 | GI:73958167 | Canis familiaris | 526* | ✓ | X |
21 | rENT1 | GI:13928948 | Rattus norvegicus | 457 | ✓ | ✓ | 11
22 | rENT2 | GI:13929038 | Rattus norvegicus | 456 | ✓ | ✓ | 11
23 | rENT3 | GI:51036680 | Rattus norvegicus | 475 | ✓ | X | 11
24 | rENT4 | GI:109496637 | Rattus norvegicus | 523 | ✓ | X | 11
25 | mENT1.1 | GI:47606215 | Mus musculus | 460 | ✓ | ✓ | 11
26 | mENT1.2 | GI:8568090 | Mus musculus | 458 | ✓ | ✓ | 11
27 | mENT2 | GI:8698687 | Mus musculus | 456 | ✓ | ✓ | 11
28 | mENT3 | GI:12963743 | Mus musculus | 475 | ✓ | ✓ | 11
29 | mENT4 | GI:22122849 | Mus musculus | 528 | ✓ | ✓ | 11
30 | MdENT1.1 | GI:126310076 | Monodelphis domestica | 455 | ✓ | X | 11
31 | MdENT1.2 | GI:126310078 | Monodelphis domestica | 452 | ✓ | X | 11
32 | MdENT2 | GI:126338872 | Monodelphis domestica | 632 | ✓ | X | 11
32 | MdENT2 | ENSMODG00000008339 | Monodelphis domestica | 448* | No Met | X | Partial sequence
33 | MdENT3 | GI:126272532 | Monodelphis domestica | 476 | ✓ | X | 11
34 | MdENT4 | GI:126334540 | Monodelphis domestica | 528 | ✓ | X | 11
35 | OcENT2 | GI:130489840 | Oryctolagus cuniculus | 456 | ✓ | X | 11
36 | GgENT1 | GI:50740543 | Gallus gallus | 449 | ✓ | X | 11
36 | GgENT1 | ENSGALG00000010182 | Gallus gallus | 457* | No Stop | X | 11
37 | GgENT2 | GI:118092643 | Gallus gallus | 458 | ✓ | X | 11
38 | GgENT4 | ENSGALG00000010497 | Gallus gallus | 514 | No Met | X | Partial sequence
39 | XlENT1 | GI:56269176 | Xenopus laevis | 459 | ✓ | X | 11
40 | XlENT2 | GI:49115927 | Xenopus laevis | 462 | ✓ | X | 11
41 | XlENT3 | GI:50415257 | Xenopus laevis | 473 | No Met | X | 11
42 | TnENT1 | GI:47218125 | Tetraodon nigroviridis | 438 | No Stop | X | 11
43 | TnENT2 | GI:47213393 | Tetraodon nigroviridis | 427 | No Met | X | 11
44 | TnENT3 | GI:47228980 | Tetraodon nigroviridis | [illegible] | No Stop | X | 11
45 | TnENT4 | GI:47208592 | Tetraodon nigroviridis | 531 | No Met | X | 11
46 | DrENT1.1 | GI:71834498 | Danio rerio | 440 | ✓ | X | 11
47 | DrENT1.2 | GI:125830058 | Danio rerio | 348 | No Stop | X | Partial sequence
48 | DrENT2.1 | Ensembl:1767 | Danio rerio | 415 | No Met | X | Partial sequence
49 | DrENT2.2 | Ensembl:33909 | Danio rerio | 444 | No Stop | X | Partial sequence
50 | DrENT3 | GI:125832534 | Danio rerio | 919 | ✓ | X | 9
50 | DrENT3 | Ensembl:39541 | Danio rerio | 385* | X | X | Partial sequence
51 | DrENT4.1 | GI:125802931 | Danio rerio | 529 | ✓ | X | 11
52 | DrENT4.2 | Ensembl:16339 | Danio rerio | 521 | ✓ | X | 11
53 | FrENT1.1 | SINFRUG00000157116 | Takifugu rubripes | 369 | X | X | Partial sequence
54 | FrENT1.2 | SINFRUG00000143396 | Takifugu rubripes | 415 | No Met | X | Partial sequence
55 | FrENT2.1 | SINFRUG00000138187 | Takifugu rubripes | 403 | No Met | X | Partial sequence
56 | FrENT2.2 | SINFRUG00000161521 | Takifugu rubripes | 423 | X | X | Partial sequence
57 | FrENT3 | SINFRUG00000120849 | Takifugu rubripes | 474 | ✓ | X | 11
58 | FrENT4 | SINFRUG00000152509 | Takifugu rubripes | 511 | X | X | Partial sequence
59 | CiENT | CI0100152425 | Ciona intestinalis | 503 | ✓ | X | 11
60 | DmENT1 | GI:24580625 | Drosophila melanogaster | 476 | ✓ | X | 11
61 | DmENT2 | GI:19920836 | Drosophila melanogaster | 458 | ✓ | ✓ | 11
62 | DmENT3 | GI:24663540 | Drosophila melanogaster | 668 | ✓ | X | 11
63 | AmENT1 | GI:110756196 | Apis mellifera | 437 | ✓ | X | 10
64 | AmENT2 | GI:66548409 | Apis mellifera | 487 | ✓ | X | 11
65 | AmENT2 | GI:66515359 | Apis mellifera | 615 | ✓ | X | 11
66 | AaENT1 | GI:108873714 | Aedes aegypti | 501 | ✓ | X | 11
67 | AaENT2 | GI:108878902 | Aedes aegypti | 447 | ✓ | X | 11
68 | AaENT3 | GI:108881958 | Aedes aegypti | 652 | ✓ | X | 11
69 | AgENT1 | GI:118791644 | Anopheles gambiae | 462 | No Met | X |
70 | AgENT4 | GI:58394421 | Anopheles gambiae | 614 | X | X |
71 | SpENT1 | GI:115683814 | Strongylocentrotus purpuratus | 514 | ✓ | X |
72 | SpENT4 | GI:72044407 | Strongylocentrotus purpuratus | 484 | ✓ | X |
73 | CeENT1 | GI:25150621 | Caenorhabditis elegans | 445 | ✓ | ✓ |
74 | CeENT2 | GI:17568767 | Caenorhabditis elegans | 450 | ✓ | ✓ |
75 | CeENT3 | GI:71997684 | Caenorhabditis elegans | 729 | ✓ | X |
76 | CeENT4 | GI:71984819 | Caenorhabditis elegans | 451 | ✓ | X |
77 | CeENT5 | GI:17567071 | Caenorhabditis elegans | 434 | ✓ | X |
78 | CeENT6 | GI:71985763 | Caenorhabditis elegans | 441 | ✓ | X |
79 | CeENT7 | GI:115534410 | Caenorhabditis elegans | 449 | ✓ | X |
80 | SjENT1 | GI:56755431 | Schistosoma japonicum | 336 | X | X | Partial sequence
81 | SjENT2 | GI:56753381 | Schistosoma japonicum | 442 | ✓ | X |
82 | DdENT1 | GI:66810734 | Dictyostelium discoideum | 430 | ✓ | X |
83 | DdENT2 | GI:66813100 | Dictyostelium discoideum | 522 | ✓ | X |
84 | DdENT3 | GI:66813098 | Dictyostelium discoideum | 482 | ✓ | X |
85 | LdENT1.1 | GI:3450834 | Leishmania donovani | 491 | ✓ | ✓ |
86 | LdENT1.2 | GI:3435100 | Leishmania donovani | 491 | ✓ | ✓ |
87 | LdENT2 | GI:8272582 | Leishmania donovani | 499 | ✓ | ✓ |
88 | TbAT1 | GI:72389715 | Trypanosoma brucei | 463 | ✓ | [illegible] |
89 | TbNT2 | GI:84043912 | Trypanosoma brucei | 462 | ✓ | ✓ | 11
90 | TbNT3 | GI:84043916 | Trypanosoma brucei | 464 | ✓ | X | 11
91 | TbNT4 | Tb927.2.6220 | Trypanosoma brucei | 462 | ✓ | X | 11
92 | TbNT5 | GI:6164680 | Trypanosoma brucei | 463 | ✓ | ✓ | 11
93 | TbNT6 | GI:84043932 | Trypanosoma brucei | 462 | ✓ | ✓ | 11
94 | TbNT7 | Tb927.2.6280 | Trypanosoma brucei | 466 | ✓ | ✓ | 11
95 | TbNT8.1 | GI:29468625 | Trypanosoma brucei | 435 | ✓ | ✓ | 11
96 | TbNT9 | Tb927.6.220 | Trypanosoma brucei | 463 | ✓ | X | 11
97 | TbNT10 | Tb09.160.5480 | Trypanosoma brucei | 462 | ✓ | X | 11
98 | TbNT11 | GI:70908254 | Trypanosoma brucei | 482 | ✓ | X | 11
99 | TbNT12 | GI:70833830 | Trypanosoma brucei | 440 | ✓ | X | 11
100 | CfENT1 | GI:10764226 | Crithidia fasciculata | 497 | ✓ | ✓ | 11
101 | CfENT2 | GI:10764228 | Crithidia fasciculata | 502 | ✓ | ✓ | 11
102 | PfENT | GI:9963825 | Plasmodium falciparum | 422 | ✓ | ✓ | 11
103 | TgAT1 | GI:6073829 | Toxoplasma gondii | 462 | ✓ | ✓ | 11
104 | TgAT2 | 44.m02769 | Toxoplasma gondii | 532 | ✓ | ✓ | 11
105 | TvENT1.1 | GI:123503176 | Trichomonas vaginalis | 408 | ✓ | X | 11
106 | TvENT1.2 | GI:121914636 | Trichomonas vaginalis | 401 | ✓ | X | 11
107 | TvENT2.1 | GI:123484141 | Trichomonas vaginalis | 428 | ✓ | X | 10
108 | TvENT2.2 | GI:123484137 | Trichomonas vaginalis | 424 | ✓ | X | 10
109 | TvENT2.3 | GI:121915517 | Trichomonas vaginalis | 336 | X | X | Partial sequence
110 | TvENT3.1 | GI:123470046 | Trichomonas vaginalis | 421 | ✓ | X | 11
111 | TvENT3.2 | GI:121916483 | Trichomonas vaginalis | 458 | ✓ | X | 11
112 | TvENT4.1 | GI:123444411 | Trichomonas vaginalis | 399 | ✓ | X | 11
113 | TvENT4.2 | GI:123416956 | Trichomonas vaginalis | 400 | ✓ | X |
114 | GlENT | GI:71075613 | Giardia lamblia | 487 | ✓ | X |
115 | TtENT1 | GI:118369603 | Tetrahymena thermophila | 420 | ✓ | X |
116 | TtENT2 | GI:118371337 | Tetrahymena thermophila | 427 | ✓ | X |
117 | TtENT3 | GI:118387598 | Tetrahymena thermophila | 448 | ✓ | X |
118 | TtENT4 | GI:118385951 | Tetrahymena thermophila | 491 | ✓ | X |
119 | TtENT5 | GI:118374999 | Tetrahymena thermophila | 417 | ✓ | X |
120 | TtENT6 | GI:118352803 | Tetrahymena thermophila | 503 | ✓ | X |
121 | TtENT7 | GI:118376600 | Tetrahymena thermophila | 479 | ✓ | X |
122 | TtENT8 | GI:118356601 | Tetrahymena thermophila | 507 | ✓ | X |
123 | TtENT9 | GI:118353878 | Tetrahymena thermophila | 419 | ✓ | X |
124 | TtENT10 | GI:118387968 | Tetrahymena thermophila | 394 | ✓ | X | 10
125 | AtENT1 | GI:30698033 | Arabidopsis thaliana | 450 | ✓ | ✓ |
126 | AtENT2 | GI:15232807 | Arabidopsis thaliana | 417 | ✓ | X |
127 | AtENT3 | GI:16518993 | Arabidopsis thaliana | 418 | ✓ | ✓ |
128 | AtENT4 | GI:15234603 | Arabidopsis thaliana | 418 | ✓ | ✓ |
129 | AtENT5 | GI:15234604 | Arabidopsis thaliana | 419 | ✓ | X |
130 | AtENT6 | GI:22328363 | Arabidopsis thaliana | 418 | ✓ | ✓ |
131 | AtENT7 | GI:22330367 | Arabidopsis thaliana | 417 | ✓ | ✓ |
132 | AtENT8 | GI:15217822 | Arabidopsis thaliana | 389 | ✓ | X |
133 | OsENT2 | GI:115472733 | Oryza sativa | 418 | ✓ | ✓ |
134 | OsENT1 | GI:64976566 | Oryza sativa | 423 | ✓ | X |
135 | OsENT3 | GI:115472735 | Oryza sativa | 418 | ✓ | X | 11
136 | OsENT4 | GI:115472737 | Oryza sativa | 276 | ✓ | X | 6
137 | VvENT1 | GI:147766692 | Vitis vinifera | 401 | ✓ | X | 10
138 | VvENT2 | GI:147841916 | Vitis vinifera | 697 | ✓ | X | 10
139 | CaENT | GI:68486699 | Candida albicans | 453 | ✓ | X | 11
140 | AcENT | GI:121707565 | Aspergillus clavatus | 446 | ✓ | X | 10
141 | CnENT | GI:58262768 | Cryptococcus neoformans | 481 | ✓ | X | 11

Table 2: The CNT family members (F.C. = Functionally characterized, TMH = Transmembrane helix)

No. | Name | Identifier | Organism | Protein length (residues) | Met/Stop codon | F.C. | Putative TMH
1 | hCNT1 | GI:9296936 | Homo sapiens | 649 | ✓ | ✓ | 13
2 | hCNT2 | GI:116242780 | Homo sapiens | 658 | ✓ | ✓ | 13
3 | hCNT3 | GI:10732815 | Homo sapiens | 691 | ✓ | ✓ | 13
4 | PtCNT1 | GI:114658654 | Pan troglodytes | 648 | ✓ | X | 13
5 | PtCNT2.1 | GI:114656802 | Pan troglodytes | 643 | ✓ | X | 13
6 | PtCNT2.2 | GI:55641999 | Pan troglodytes | 658 | ✓ | X | 13
7 | PtCNT3 | GI:114625308 | Pan troglodytes | 691 | ✓ | X | 13
8 | MmCNT2 | GI:109080955 | Macaca mulatta | 658 | ✓ | X | 13
9 | MmCNT3 | GI:109111956 | Macaca mulatta | 692 | ✓ | X | 13
11 | BtCNT1 | GI:82697361 | Bos taurus | 649 | ✓ | X | 13
12 | BtCNT2 | GI:119902661 | Bos taurus | 733 | ✓ | X | 13
13 | BtCNT3 | GI:119900722 | Bos taurus | 602 | ✓ | X | 13
14 | SsCNT1 | GI:47523032 | Sus scrofa | 647 | ✓ | X | 13
15 | CfCNT1 | GI:73951580 | Canis familiaris | 753 | ✓ | X | 13
16 | CfCNT3 | GI:73946593 | Canis familiaris | 650 | ✓ | X | 13
17 | rCNT1 | GI:9296931 | Rattus norvegicus | 648 | ✓ | ✓ | 13
18 | rCNT2 | GI:9296932 | Rattus norvegicus | 659 | ✓ | ✓ | 13
19 | rCNT3 | GI:18266722 | Rattus norvegicus | 705 | ✓ | X | 13
20 | mCNT1 | GI:51921363 | Mus musculus | 648 | ✓ | ✓ | 13
21 | mCNT2 | GI:9296945 | Mus musculus | 660 | ✓ | ✓ | 13
22 | mCNT3 | GI:10732817 | Mus musculus | 703 | ✓ | ✓ | 13
23 | MdCNT1 | GI:12628201 | Monodelphis domestica | 670 | ✓ | X | 13
24 | MdCNT2 | GI:126282009 | Monodelphis domestica | 638 | ✓ | X | 13
25 | MdCNT3 | GI:126335048 | Monodelphis domestica | 655 | ✓ | X | 13
26 | OcCNT1 | GI:12229760 | Oryctolagus cuniculus | 658 | ✓ | X | 13
27 | GgCNT1/2 | GI:118095439 | Gallus gallus | 662 | ✓ | X | 13
28 | GgCNT3 | GI:118104143 | Gallus gallus | 640 | ✓ | X | 13
29 | XlCNT1/2 | GI:50418042 | Xenopus laevis | 645 | ✓ | X | 13
30 | TnCNT1/2 | GI:47224198 | Tetraodon nigroviridis | 544 | ✓ | X | 10 to 12
31 | TnCNT3 | GI:47213420 | Tetraodon nigroviridis | 574 | No Met | X | Partial sequence
32 | DrCNT1/2 | GI:125844184 | Danio rerio | 646 | ✓ | X | 11 to 13
33 | hfCNT3 | GI:5814216 | Eptatretus stoutii | 683 | ✓ | ✓ | 11
34 | CiCNT1/2 | GW1.12Q.13.1 | Ciona intestinalis | 499 | X | X | Partial sequence
35 | CiCNT3 | GW1.168.39.1 | Ciona intestinalis | 470 | X | X | Partial sequence
36 | AgCNT1/2 | GI:118787362 | Anopheles gambiae | 472 | ✓ | X | 10 to 12
37 | AgCNT3 | GI:118787360 | Anopheles gambiae | 533 | X | X | Partial sequence
38 | DmCNT1/2 | GI:24651961 | Drosophila melanogaster | 528 | ✓ | X | 10
39 | DmCNT3 | GI:19921868 | Drosophila melanogaster | 621 | ✓ | X | 11 to 12
40 | AmCNT3 | GI:110762851 | Apis mellifera | 588 | ✓ | X | 11 to 12
41 | AaCNT3 | GI:108879688 | Aedes aegypti | 505 | ✓ | X | 11
42 | SpCNT3.1 | GI:115608985 | Strongylocentrotus purpuratus | 661 | ✓ | X | 10 to 12
43 | SpCNT3.2 | GI:115608989 | Strongylocentrotus purpuratus | 660 | ✓ | X | 10 to 12
44 | SpCNT3.3 | GI:115608987 | Strongylocentrotus purpuratus | 672 | ✓ | X | 10 to 12
45 | SpCNT3.4 | GI:115644299 | Strongylocentrotus purpuratus | 506 | ✓ | X | 13
46 | SpCNT1/2 | GI:115940835 | Strongylocentrotus purpuratus | 562 | ✓ | X | 8 to 10
47 | CeCNT1/2 | GI:8886418 | Caenorhabditis elegans | 568 | ✓ | X | 12 to 13
48 | CeCNT3 | GI:8886420 | Caenorhabditis elegans | 575 | ✓ | ✓ | 13
49 | AnCNT | GI:145240975 | Aspergillus niger | 599 | ✓ | X | 12 to 13
50 | CaCNT | GI:29501737 | Candida albicans | 608 | ✓ | ✓ | 12 to 13
51 | NfCNT | GI:119472976 | Neosartorya fischeri | 620 | ✓ | X | 12
52 | GzCNT | GI:46138033 | Gibberella zeae | 647 | ✓ | X | 12
53 | CcCNT | GI:116508443 | Coprinopsis cinerea | 579 | ✓ | X | 12
54 | MgCNT | GI:39941110 | Magnaporthe grisea | 662 | ✓ | X | 13
55 | CgCNT | GI:116205722 | Chaetomium globosum | 615 | ✓ | X | 11
56 | YlCNT | GI:50551237 | Yarrowia lipolytica | 593 | ✓ | X | 12
57 | NupC | GI:16130325 | Escherichia coli | 400 | ✓ | ✓ | 10
58 | SdCNT | GI:91792383 | Shewanella denitrificans | 420 | ✓ | X | 10
59 | SeCNT | GI:16761335 | Salmonella enterica | 400 | ✓ | X | 10
60 | KpCNT | GI:152971281 | Klebsiella pneumoniae | 400 | ✓ | X | 10
61 | SgCNT | GI:85059653 | Sodalis glossinidius | 394 | ✓ | X | 10
62 | PlCNT | GI:37525355 | Photorhabdus luminescens | 394 | ✓ | X | 10
63 | XnCNT | GI:48249476 | Xenorhabdus nematophila | 396 | ✓ | X | 10
64 | CnCNT | GI:118443815 | Clostridium novyi | 393 | ✓ | X | 10
65 | BsCNT | GI:558558 | Bacillus subtilis | 393 | ✓ | X | 10
66 | SaCNT | GI:82750228 | Staphylococcus aureus | 404 | ✓ | X | 10 to 12
67 | VfCNT | GI:59711109 | Vibrio fischeri | 418 | ✓ | X | 10

Table 3: The ENT protein sequence similarities between Ciona intestinalis and human

Sequence name | Peptide length (residues) | Sequence name | Peptide length (residues) | % Identity
CiENT | 503 | hENT1 | 456 | 40
CiENT | 503 | hENT2 | 456 | 34
CiENT | 503 | hENT3 | 475 | 33
CiENT | 503 | hENT4 | 530 | 21

36 Figure legends Figure 1: Predicted membrane topology of human ENT1 (hENTl). Putative transmembrane domains were predicted by TMHMM2.0 using full length protein sequence. The topology shows an intracellular N-terminal, 11 transmembrane helices (TMH) and an extracellular C terminal. Red lines represent extracellular loops and blue lines represent intracellular loops.

Figure 2: Predicted membrane topology of human CNT1 (hCNT1). Putative transmembrane domains were predicted by TMHMM 2.0 using the full-length protein sequence. The topology shows an intracellular N-terminus, 13 transmembrane helices (TMH) and an extracellular C-terminus. Red lines represent extracellular loops and blue lines represent intracellular loops.

Figure 3: Phylogenetic distribution of ENT family members. The unrooted tree was generated using the program PHYLIP (default settings). The ENT clusters show the distribution of the four prototypic isoforms.

Figure 4: Predicted membrane topology of CiENT. Putative transmembrane domains were predicted by TMHMM 2.0 using the full-length protein sequence. The topology shows an intracellular N-terminus, 11 transmembrane helices (TMH) and an extracellular C-terminus. Red lines represent extracellular loops and blue lines represent intracellular loops.

Figure 5: The evolutionary history of the ENT family. The appearance of novel invertebrate and vertebrate ENTs is shown. The invertebrate ENTs are found in lineage 1/2/3 and lineage 4, and vertebrate ENTs are found in all four lineages. The branch lengths do not imply a time frame.

Figure 6: Conserved residues among ENT family members. Conserved residues have been highlighted in yellow, and the conserved motifs among ENT1-3 and among ENT4 proteins have been highlighted in blue and green, respectively. Residue colours are assigned as follows: red = small and hydrophobic residues, blue = acidic, magenta = basic, green = hydroxyl + amine + basic, grey = others. (TMH = transmembrane helix)

Figure 7: Phylogenetic distribution of CNT family members. The unrooted tree was generated using the program PHYLIP (default settings). The CNT clusters show the distribution of the three prototypic isoforms and the prokaryotic CNTs.

Figure 8: The evolutionary history of the CNT family. The appearance of novel invertebrate and vertebrate CNTs is shown. The invertebrate and non-mammalian vertebrate CNTs are found in lineage 1/2 and lineage 3, and only mammalian CNTs are found in all three lineages. The branch lengths do not imply a time frame.

Figure 9: Conserved residues among CNT family members. Conserved residues have been highlighted in yellow and the conserved motif has been highlighted in blue. Residue colours are assigned as follows: red = small and hydrophobic residues, blue = acidic, magenta = basic, green = hydroxyl + amine + basic, grey = others. (TMH = transmembrane helix)
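The termini orientations quoted in the legends for Figures 2 and 4 follow from a simple parity argument: each transmembrane helix crosses the membrane once, so an odd number of helices places the C-terminus on the opposite side from the N-terminus. A minimal Python sketch of that check is given here. The function name `c_terminus_side` is hypothetical and is not part of TMHMM; only the helix counts and N-terminus orientations are taken from the legends.

```python
def c_terminus_side(n_terminus_side: str, n_helices: int) -> str:
    """Return the membrane side of the C-terminus (illustrative helper)."""
    sides = ("intracellular", "extracellular")
    if n_terminus_side not in sides:
        raise ValueError("side must be 'intracellular' or 'extracellular'")
    if n_helices < 0:
        raise ValueError("number of helices cannot be negative")
    # Each helix crosses the membrane once: an even count returns the chain
    # to the starting side, an odd count leaves it on the opposite side.
    if n_helices % 2 == 0:
        return n_terminus_side
    return sides[1] if n_terminus_side == sides[0] else sides[0]


# Values from the legends: hCNT1 has 13 predicted helices, CiENT has 11,
# and both are predicted to have an intracellular N-terminus.
print("hCNT1:", c_terminus_side("intracellular", 13))
print("CiENT:", c_terminus_side("intracellular", 11))
```

Both cases give an extracellular C-terminus, in agreement with the predicted topologies described in the legends.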

Figure 1: Predicted membrane topology of human ENT1 (hENT1)

Horizontal axis: amino acids and predicted transmembrane helices

Figure 2: Predicted membrane topology of human CNT1 (hCNT1)

Horizontal axis: amino acids (100 to 600) and predicted transmembrane helices (1 to 13)

Figure 3: Phylogenetic distribution of ENT family members

Clusters: ENT1 isoforms, ENT2 isoforms, ENT3 isoforms, ENT4 isoforms

Figure 4: The predicted membrane topology of CiENT based on TMHMM 2.0

Horizontal axis: amino acids and predicted transmembrane helices (numbered up to 11)

Figure 5: The evolutionary history of the ENT family

Lineage labels: ENT4, ENT1, ENT2, ENT3 (intermediate labels: ENT1/2, ENT1/2/3; root: ancestral ENT)
Mammals: Pan troglodytes, Macaca mulatta, Bos taurus, Canis familiaris, Monodelphis domestica
Chordates: Xenopus laevis, Tetraodon nigroviridis, Takifugu rubripes, Danio rerio, Gallus gallus
Protochordates: Ciona intestinalis
Invertebrates: Strongylocentrotus purpuratus, C. elegans, Apis mellifera, Aedes aegypti
Ancestral ENT: Aspergillus clavatus, Trichomonas vaginalis, Tetrahymena thermophila, Cryptococcus neoformans, Giardia lamblia, Vitis vinifera, Dictyostelium discoideum

Figure 6: Conserved residues among ENT family members

          TMH1              Extracellular  TMH7      TMH8     TMH9    TMH10
hENT1     GLOTT.TIIIMMFFMT  AGLLPASY       IGMFPAVT  NIFDWLG  VLARLV  SNGYL
hENT2     GLGTLLHBIFFIT     LGTMPSTY       LSVFPAIT  MIMDWLG  VCLRFL  SNGYL
hENT3     GIGSLL|m|t'FIT    TGSFPMRN       SLIYPAVC  NFADLCG  VLLRTC  SNGYL
hENT4     GEGFLLflBBlT      TGMLPKRY       LCLFPGLE  NLSDFVG  SCLRW   SNGYF
GgENT1    GLGTLLBMFFMT      AGLLPASY       IGVFPSIT  NVFDWMG  WLRVI   SNGYL
GgENT2    GVGSLLWFFIT       SSRFPMRN       IIIFPSLS  NFADWCG  VLLRTI  SNGYL
GgENT4    GVGFLLOHHIT       TGLLPKRY       LCLFPGLE  NLSDFVG  SCLRW   TNGYF
XlENT1    GLGTLLBHFFMT      AARFPASY       IGIFPAVT  NLFDWAG  VAARLV  SNGYL
XlENT2    GLGTLLBMFFIT      LTLLPQTY       LSVFPAIT  NVMDWAG  VAVRFI  TNGYF
XlENT3    GVGASLBBIFFCT     TGQFPMKH       IIIFPTIS  NFSDFCG  VFLRTL  SNGYL
TnENT1    GLATLLHUFFMT      AGILPASY       IGAFPAVT  NLMDWAG  VGLRLI  SNGYL
TnENT2    GLGTLLHMFFMT      AGLLPASY       IGVFPAIT  NLCDWGG  IVCRVI  SNGYL
TnENT3    GIGSLLBWFFIT      SGHFPMRI       IMVFPAVS  NMADFCG  VLCRTV  SNGYL
TnENT4    GVGFLLQIBIT       MGMLPKRY       LCLFPGLE  NMSDFVG  SCLRW   TNGYF
CiENT     GLGTLLBMFFIT      AASLPPRY       LACFPAIT  NLTDWLG  VLIRGV  SNGHL
SpENT1    GIGTLYBBISFIT     AAKLPEGY       LAIFPWL   NLGDFFG  WSRLL   SNGYL
SpENT4    GTGFLL^JHvT       AGMLPKKF       LCLFPGIE  NFTDLCG  SASRIL  SNGYF
DmENT1    GIGTMTBWFFVT      AGLFPSEF       LSVYPAVT  NCGDYFG  IWRMA   SNGYF
DmENT2    GVGTLMBUfcFIT     VASLPIKY       LSVFPAIQ  NVFAMLG  WLRLA   SSGYL
DmENT3    GIGFVLBMBII       ASMLPKQY       LSLYPGIE  NTSDWG   SGLRIV  TNGLA

Figure 7: Phylogenetic distribution of CNT family members

Clusters: CNT1 isoforms, CNT2 isoforms, CNT3 isoforms, CNT1/2 and CNT3 isoforms

Figure 8: The evolutionary history of the CNT family

Lineage labels: CNT3, CNT1, CNT2 (root: ancestral CNT)
Mammals: Pan troglodytes, Macaca mulatta, Bos taurus, Canis familiaris, Monodelphis domestica
Chordates: Xenopus laevis, Tetraodon nigroviridis, Gallus gallus, Danio rerio
Protochordates: Ciona intestinalis
Invertebrates: Strongylocentrotus purpuratus, Apis mellifera, Aedes aegypti
Ancestral CNT: Aspergillus niger, Shewanella denitrificans, Neosartorya fischeri, Klebsiella pneumoniae, Coprinopsis cinerea, Photorhabdus luminescens, Chaetomium globosum, Vibrio fischeri, Yarrowia lipolytica

Figure 9: Conserved residues among CNT family members

          TMH5        TMH7    TMH8   TMH10        Intracellular
hCNT1     IRTEPGFIAF  TATETL  LVYPE  LRPVAFLMGVA  GIKLFLHBY
hCNT2     IRTDLGYTVF  TATETL  LAYPE  LRPMVFMMGVE  GIKFFlBBflS^
hCNT3     LRTDPGFIAF  SPIESV  LFWPE  FMPFSFMMGVE  GYKTFF(^gy
mCNT1     IRTEPGFVAF  SATETL  LVYPE  LRPVAFLMGVA  GIKLFLJHJBY
mCNT2     IRTEPGFNAF  TAAETL  LVYPE  LRPMVFMMGVQ  GVKFFIHEBEY
mCNT3     LRTRPGFVAF  SPIESV  LFWPE  FMPFSFMMGVD  GYKTFFHHS^
MdCNT1    IRTNPGFAAF  IATETL  LVYPE  LRPIAFIMGVD  GIKFFL|QJQY
MdCNT2    IRTDPGFAAF  TATETL  LVYPE  LRPIAFMMGVD  GIKFFLBDD^
MdCNT3    LRTTYGFEAF  SPIESV  LFWPE  FMPFSFMMGVD  GYKTFF^npF
GgCNT1/2  LRTTPGIQAF  TPTETL  LVYPE  LMPVAFLMGAD  GIKIFL^^Y
GgCNT3    LRTKVGFDVF  TPVESL  LFWPE  FMPFSFMMGVD  GYKTFFJHJBY
XlCNT1/2  IRTEPGYQAF  TATESL  LVYPE  FMPIAFMMGVK  GTKIILHBJJY
TnCNT1/2  IRTQPGLIAF  SPTETL  LSYPE  FMPVAFMMGIP  GTKLFLBBBJY
TnCNT3    FRTSSGASAV  TSVESV  TFWPE  FRPLAFMMGVS  GTKTFLBHnj^
hfCNT3    LRTKPGLDAF  SPMESM  TFWPE  LMPFAFMMGVN  GMKTFFBHEJY
CiCNT1/2  LRTQAGFDAF  SATESM  LVYPE  FMPITFLMGIS  GSKIFL(HBY
CiCNT3    LRTSAGYTAI  SAIESA  LLYPE  FMPVAFLMGAD  GMKTFLHHBY
SpCNT3.1  LRTSIGFAIF  SASESL  LMYPE  FRPIAFIMGVP  GLKTFIHBSB^
SpCNT3.3  LRTSIGFAIF  SASESL  LMYPE  FRPIAFIMGVP  GLKTFIHBEEY
SpCNT3.2  LRTSIGFAIF  SASESL  LMYPE  FRPIAFIMGVP  GLKTFIBDB^
SpCNT3.4  LRTHPGFVAF  SGAESF  MVYPE  FFPLAWILGTT  GIKTFLBHBJY
SpCNT1/2  MGVRAL      TAIESI  LAYPE  FMPLAFIMGVE  GLKTFLJHR*
DmCNT1/2  LRLPFGRSIF  TVCESV  LFYPE  FIPIVFVMGVP  AQKSFlHBQJY
DmCNT3    IRWEVGRKIF  TVCESV  LYMPE  FIPLVWAMGVP  ATKTIlBH^Y
CeCNT1/2  LKWPTGRWFF  TPVESV  LMYPE  FFPLAYIMGIT  GSKTAVHBJBY
CeCNT3    LKWSTGQWFF  TPVESV  IMFPE  FFPLAYMMGVN  GTKTAVBHHJY
CaCNT     LRTKCGYDVF  SGAEAI  LRYPE  FYPIGFLLGTP  AYKFIQJHHB*
AnCNT     LRTQAGYDIF  SGAEW   IRWPE  CYPVSFLLGVS  GMKLVMHDB^
VfCNT     LYVPVGRDVL  SRAESL  IIKPE  FAPLAFLIGVP  GQKLWjnB^
NupC      LNSDVGLGFV  GKLESF  LINPY  FYPIAWVMGVP  ATKLVSHHS&4

1.5 Discussion

1.5.1 The ENT family

The comprehensive analyses conducted during this study identified novel putative ENT isoforms in various vertebrate and invertebrate organisms with recently sequenced genomes (Table 1). Although a total of 141 ENT homologues have been identified across the eukaryotic kingdoms (Animalia, Fungi and Plantae), the existence of these transporters could not be confirmed in prokaryotes. To date, no ENT member has been identified in bacteria or archaea, although these organisms do possess facilitative nucleoside transport proteins, e.g. Tsx (Nieweg et al. 1997). At present, it is not clear whether the ENT family evolved only in eukaryotes, providing an equilibrative system for bidirectional transport that maintains the intracellular nucleoside pool, or whether the sequence dissimilarity between eukaryotes and prokaryotes is simply too great for putative prokaryotic ENTs to be identified. Whereas sequence is highly conserved among ENT1-3 and among ENT4 proteins, sequence similarity becomes less evident among their invertebrate ancestors. The lack of an identifiable ENT in prokaryotes may reflect the role of facilitative nucleoside transport having been taken over by, or having always belonged to, non-ENT proteins such as proton-dependent symporters (e.g. NupG in E. coli), proton-dependent CNTs or Tsx. Prokaryotes presumably rely on these proteins for the uptake of the nucleosides required to maintain their physiological processes (Hao et al. 2004, Loewen et al. 2004, Nieweg et al. 1997).

The simple protists in which ENT members have been identified are Diplomonadida (Giardia lamblia), Parabasala (Trichomonas vaginalis) and Kinetoplastida (Trypanosoma brucei, Leishmania major, Crithidia fasciculata) (Landfear et al. 2002, Sanchez et al. 2002, Liu et al. 2005). The higher-order protists belonging to Alveolata, which include flagellates and ciliates, also contain ENT members, e.g. Toxoplasma gondii, Plasmodium falciparum and Tetrahymena thermophila (De Koning et al. 2003, Parker et al. 2000). These organisms lack the biosynthetic pathway for purine synthesis and therefore rely absolutely on transport proteins to salvage purine nucleosides from their vertebrate or invertebrate hosts. Multiple isoforms have been identified in different groups of parasites. Some of the well-characterized ones are TbAT1 in Trypanosoma brucei, LmENT1 in Leishmania major, the TgATs in Toxoplasma gondii and CfNT1 in Crithidia fasciculata (Landfear et al. 2002, Sanchez et al. 2002, De Koning et al. 2003). Interestingly, these recently characterized transporters differ in substrate specificity and affinity. While some are entirely purine selective (e.g. TgAT1), others transport both purine and pyrimidine nucleosides with high affinity (e.g. TgAT2) (De Koning et al. 2003). These protozoan ENTs also show the least sequence homology to other ENTs, and their sequence variation may explain their unusual proton dependency during nucleoside transport. Additionally, it is not clear why some parasites possess numerous NT isoforms. One possible explanation is that, since their life cycle requires more than one host, they need multiple ENTs to ensure adaptability. This might also explain the different substrate specificities of different NTs, as different hosts are associated with different environments. The expression and complete characterization of the different isoforms are necessary to understand the NT dependency of parasites and to develop nucleoside analog drugs accurately.

Other "lower" eukaryotic members of this family are the Dictyostelia ENTs. Like the protozoan ENTs, the three DdENTs identified in the Dictyostelium genome have very low sequence homology to other ENTs, and their relationship to the ENT1/2/3 and ENT4 lineages could not be deduced. Dictyostelia stand on the borderline between unicellular and multicellular organisms. At the beginning of their life cycle they are unicellular amoebae, but they have the ability to aggregate into a multicellular organism during starvation (Schaap et al. 2006). Although their taxonomic classification is debated, they are thought to belong to the Protista and to have diverged from the animal-fungal lineage after the animal-plant split took place (Eichinger et al. 2005). Dictyostelium is a model organism for studying ancient eukaryotic cell-cell signaling, motility and the development of multicellularity. Therefore, characterization of the putative DdENTs will provide useful information on their role in unicellular physiology as well as on ENT expression during the evolution of multicellularity.

According to standard molecular phylogeny, nematodes and arthropods are closely related and grouped together. Thus ENT proteins from both lineages (1/2/3 and 4) are expected to be present in arthropod and nematode genomes. To date, three ENTs have been identified in the Drosophila genome, two in the ENT1/2/3 lineage and one in the ENT4 lineage. In contrast, seven ENT isoforms have been identified in C. elegans, including a novel isoform identified during these analyses. Six of these CeENTs represent the ENT1/2/3 lineage whereas one represents the ENT4 lineage, which suggests multiple gene duplications in the ENT1/2/3 lineage and no further duplication in the ENT4 lineage. Since both of these taxa contain ENT1/2/3-like and ENT4-like proteins, it was suggested that the first gene duplication event of the ENT family occurred prior to the divergence of vertebrates from invertebrates (Acimovic and Coe 2002). Searching other sequenced arthropod genomes, such as those of the honey bee and the yellow fever mosquito, revealed a similar arrangement of ENTs. Both of these organisms contain three ENTs, two in the ENT1/2/3 lineage and one in the ENT4 lineage, which further supports the presence of two separate lineages before vertebrates appeared. One exception is the Anopheles gambiae genome, where one isoform from each lineage has been found. A revised and updated genome sequence of Anopheles gambiae would help confirm whether a third ENT isoform exists in this organism (Holt et al. 2002).
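The isoform counts quoted in this paragraph can be tabulated to make the lineage argument explicit. The following Python sketch is illustrative only: the counts are those stated in the text, while the dictionary layout and the helper name `summarize` are assumptions introduced here and not part of the original analysis.

```python
# Isoform counts as reported in the text: (ENT1/2/3 lineage, ENT4 lineage).
ent_counts = {
    "Drosophila": (2, 1),
    "C. elegans": (6, 1),
    "honey bee": (2, 1),
    "yellow fever mosquito": (2, 1),
    "Anopheles gambiae": (1, 1),
}


def summarize(counts):
    """Summarize lineage representation per species (hypothetical helper)."""
    rows = []
    for species, (n_ent123, n_ent4) in counts.items():
        rows.append({
            "species": species,
            "total": n_ent123 + n_ent4,
            # Both lineages present supports a split predating this taxon.
            "both_lineages": n_ent123 > 0 and n_ent4 > 0,
            # Copies beyond one hint at duplication within ENT1/2/3.
            "extra_ent123_copies": max(n_ent123 - 1, 0),
        })
    return rows


for row in summarize(ent_counts):
    print(row)
```

Every species in the table carries at least one member of each lineage, and only the ENT1/2/3 column shows additional copies, which is the pattern the duplication argument relies on.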

Among the seven CeENTs, only three isoforms from the ENT1/2/3 lineage have been characterized at the molecular level (Appleford et al. 2004), and they have overlapping substrate specificity, suggesting similar function. The presence of multiple isoforms of a transporter has previously been demonstrated for the nucleotide sugar transporters, of which 18 have been identified in this nematode genome (Caffaro et al. 2007). Although the role of multiple isoforms is not clear, their presence is likely due to a series of gene duplication events. These duplications can lead to proteins that are functionally redundant or that have tissue-specific or intracellular compartment-specific roles.

Interestingly, during this study two putative ENT proteins were identified in a platyhelminth, Schistosoma japonicum. This is the first time the existence of ENTs in the phylum Platyhelminthes has been reported and, based on sequence analyses, these two putative proteins belong to the ENT1/2/3 lineage. With limited data available on the Schistosoma japonicum genome, it cannot be concluded whether the ENT4 lineage is present or has been lost in this organism. However, determining the substrate specificity of the two novel transporters may help in designing nucleoside analog drugs that interfere with nucleotide synthesis and could have therapeutic applications in treating schistosomiasis, a major health problem in many tropical and sub-tropical countries (McManus et al. 2004).

The first fully sequenced genome of a non-chordate deuterostome was that of the sea urchin (Sodergren et al. 2006). This model organism is widely used for developmental studies, and analyses of its genome provide insight into the ancestral forms of chordate genes and into the loss of genes over the course of evolution (Davidson 2006). Based on their external appearance, it is not evident that sea urchins belong to the same group (deuterostomes) as humans and are more closely related to humans than flies or worms are (Sherman 2007). The two newly identified ENTs in sea urchin, representing the ENT1/2/3 and ENT4 lineages, further support the contention (Acimovic and Coe 2002) that both lineages (1/2/3 and 4) existed prior to the appearance of vertebrates.

To identify the evolutionary pattern of this transporter family further, I examined a urochordate genome. Ciona is the "simplest" form of chordate. In its larval stage, the tadpole possesses a notochord and a dorsal tubular nerve cord. The Ciona genome also possesses genes found in vertebrates as well as genes involved in cellulose metabolism similar to those of fungi and bacteria (Dehal et al. 2002, Leveugle et al. 2004), suggesting that it is an ideal organism in which to study evolutionary changes of a protein from invertebrates to vertebrates. Our analysis shows that the Ciona genome possesses only one known ENT isoform (CiENT), which appears to be a member of the ENT1/2/3 lineage. Our other data suggest that the ENT4 and ENT1/2/3 lineages both existed prior to the appearance of Ciona; therefore I would predict the presence of an ENT4 homolog in this organism. However, the lack of an ENT4 lineage in Ciona may not be completely unexpected, since deletion of ancestral duplicated genes has been identified as a significantly common trait in Ciona, most likely facilitating adaptation by retaining useful genes and losing redundant ones (Hughes et al. 2005). Further investigation of the Ciona genome and functional characterization studies are required to confirm whether an ENT4 homolog exists in this organism. Interestingly, the existence of a single Ciona ENT suggests that it is a prototypic ENT of the ENT1/2/3 lineage, which underwent further gene duplication to give rise to the ENT1, ENT2 and ENT3 isoforms in mammals.

Since the pufferfish and zebrafish diverged from their common ancestor with mammals about 450 million years ago (Jaillon et al. 2004), comparing the divergence and conservation of sequences will provide fundamental insights into the evolution of vertebrate ENTs. In Tetraodon nigroviridis, Takifugu rubripes and Danio rerio, representatives of all four ENT isoforms have been found. Homologues of these four isoforms are also present in mammals, where the ENT1 and ENT2 protein sequences are more similar to each other than either is to ENT3. These data suggest that a further gene duplication of the ENT1/2/3 lineage may have occurred, leading to two subsequent lineages, the ENT1/2 lineage and the ENT3 lineage. A third gene duplication event then possibly led to the ENT1/2 split that resulted in the ENT1 and ENT2 isoforms in mammals.

Marsupials are a diverse group of animals. The marsupial line split from the placental mammal line about 180 million years ago (Lemos et al. 2007), and marsupials are sometimes considered to reflect the ancestral mammalian state. This makes them useful for examining how ENT protein function has evolved in placental mammals. Comparing ENT function and expression among early and advanced marsupials might give us the opportunity to study specialized functions of ENTs. Proteins previously identified in both placental mammals and marsupials suggest a conservation of function, e.g. the uncoupling protein 1 (UCP1) has a similar function in maintaining nonshivering thermogenesis in small mammals and fish (Jastroch et al. 2007). The four prototypic MdENTs might likewise possess functions similar to those of other characterized mammalian ENTs.

The rest of the mammalian genomes examined (e.g. rhesus monkey, mouse, rat) were found to encode all four prototypic ENTs. However, the identification of two isoforms representing one lineage in several taxa suggests that splice variants may exist in some mammals. Recently, a splice variant of mENT1 was confirmed to be functional (Bone et al. 2007). The existence of splice variants of ENTs suggests different and novel physiological and/or pharmacological properties. Also, both the Fugu and zebrafish genomes possess multiple isoforms of ENT1 and ENT2, which is likely due to the whole-genome duplication event that is evident in teleost genomes (Taylor et al. 2001). A similar phenomenon has been observed in other teleost gene families (Chung et al. 2006). For example, the zebrafish genome has two tropoelastin genes, whereas only one has been found in the human, mouse, bovine and chicken genomes (Chung et al. 2006). The presence of multiple isoforms of the ENT prototypes indicates that this genome duplication took place after the divergence of the fishes (Acimovic and Coe 2002). Additionally, the presence of two ENT1 or two ENT2 isoforms provides the opportunity to study whether these genes have undergone sub-specialization in function or neofunctionalization. Functional characterization of the zebrafish ENTs would provide a better opportunity to determine the expression and developmental profile of ENTs in vivo and could provide a model system for studying isoform-specific function in vertebrates.

Although our findings suggest that the NT family has undergone extensive evolutionary change and has been strongly influenced by whole-genome or segmental gene duplication, some areas of sequence similarity appear to be preserved among all species. The CeENTs, DmENTs, SpENTs and the vertebrate ENTs all possess a membrane topology similar to that of mammalian ENTs, e.g. the hENTs, which suggests that ENT proteins have maintained their prototypic structure during evolution despite significant changes in primary amino acid sequence. ENT isoforms in the kingdoms Plantae and Fungi also possess the typical 11-TMD structure, showing structural conservation over time (Li et al. 2003, Hirose et al. 2005). Recently, random mutagenesis of two residues (phenylalanine and asparagine) in TM segment 8 of hENT1 that are conserved among mammalian ENTs showed that these residues are involved in protein folding and inhibitor sensitivity (Visser et al. 2007). The conserved proline residues identified in this study might have roles in folding the protein into helices, since all three conserved proline residues are located inside the TM helices or at the beginning of a helix (Orzaez et al. 2004). The PWN motif is conserved among all ENT1-3 proteins, and PYNSF among all ENT4 proteins; most likely this motif is an important determinant of protein structure and might have a role in folding, in trafficking of the protein to the membrane or in substrate recognition. Targeting this motif could provide useful structural data on hENT1 folding and targeting. The glycine residues at positions 22 and 408 of hENT1 might also have an architectural role by enabling the protein to fold back on itself (Coleman et al. 2005, Sengupta et al. 2002). The conserved arginine residue at position 368 might have a role in protein trafficking, since in Leishmania LdNT2, replacing arginine-393 with a lysine residue resulted in predominantly flagellar localization (Arastu-Kapur et al. 2003). Analyses of the other conserved residues might reveal their ancient contribution towards maintaining the ENT structure.
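Locating the PWN and PYNSF motifs in a protein sequence amounts to a simple string search. The Python sketch below shows one possible way to do this with the standard `re` module. The helper name `find_motifs` and the two short test strings are made up for illustration; they are not real ENT sequences and this is not the procedure used in this study.

```python
import re

# Motifs named in the text, keyed by the isoform group in which each is conserved.
MOTIFS = {"ENT1-3": "PWN", "ENT4": "PYNSF"}


def find_motifs(sequence, motifs=MOTIFS):
    """Return 1-based start positions of each motif (hypothetical helper)."""
    hits = {}
    for label, motif in motifs.items():
        # re.escape keeps the motif literal; positions are reported 1-based,
        # as is conventional for residue numbering.
        hits[label] = [m.start() + 1
                       for m in re.finditer(re.escape(motif), sequence)]
    return hits


# Toy strings invented for this example, NOT real transporter sequences.
toy_sequences = {
    "toy_ENT1_like": "MAVLGPWNAFIT",
    "toy_ENT4_like": "MSLGPYNSFLVT",
}

for name, seq in toy_sequences.items():
    print(name, find_motifs(seq))
```

With real sequences, the reported positions could then be compared against the predicted helix boundaries to see where the motif sits relative to the membrane.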

1.5.2 The CNT family

The evolution of the CNTs is very poorly understood, and identification of isoforms in model organisms such as Drosophila and Ciona helped us construct the first evolutionary history of this protein family. This study identified 67 putative and functionally characterized CNTs. Like the ENT family, CNT members are found in "lower" eukaryotes including insects, nematodes and fungi. In contrast to ENTs, CNT members are also present in prokaryotic organisms. NupC is the first prokaryotic CNT member to have been functionally characterized (Craig et al. 1994). But unlike most of its eukaryotic homologues, NupC transports nucleosides using a proton gradient instead of a sodium gradient (Loewen et al. 2004, Smith et al. 2005).

Similar proton dependency has been noted in Candida albicans, and it was believed that bacterial and fungal CNTs have a different ion dependency from those of other eukaryotic organisms (Loewen et al. 2003). Interestingly, the CeCNT3 transporter in C. elegans was found to be another proton-coupled, broadly selective nucleoside transporter (Xiao et al. 2001). Previously, transporter ion dependency was thought to be consistent among nematodes and mammals (e.g. sodium-dependent dopamine transporters, proton-dependent peptide transporters), but the proton-dependent CeCNT3 clearly demonstrates that this is not the case. Further cellular localization and characterization studies might reveal the physiological importance of this exceptional characteristic of CeCNT3. The presence of CNTs representing both the CNT1/2 and CNT3 lineages in nematodes suggests that both lineages were present long before the appearance of vertebrates. The identification of two putative CNTs in both Drosophila and Anopheles further supports the presence of two lineages before the appearance of vertebrates. The identification of the ancestral CNT1/2 protein in these arthropods also provides the opportunity to analyze its function. Based on sequence similarity and the known substrate specificity of functionally characterized CNT3 proteins (e.g. hfCNT3, mCNT3, hCNT3), it can be speculated that the DmCNT3 and AgCNT3 proteins will most likely be broadly selective nucleoside transporters (Smith et al. 2005). However, even though the substrate specificity of the CNT1 and CNT2 isoforms is known for some mammals (e.g. mouse, human), the substrate preference of their ancestral protein, CNT1/2, is still to be determined.

The presence of more than three CNT isoforms in an invertebrate or vertebrate genome has not been reported before. However, based on my analyses, this phenomenon occurs in echinoderms. Analysis of the sea urchin genome revealed one putative CNT isoform representing the CNT1/2 lineage and four putative isoforms representing the CNT3 lineage. The presence of CNT3-like and CNT1/2-like proteins in another invertebrate provides more evidence that only two lineages existed in invertebrates. As mentioned earlier, hCNT3 has broad substrate specificity and transports both purine and pyrimidine nucleosides with high affinity (Yao et al. 2002). The different isoforms of SpCNT3 might also possess different functions, and thus functional characterization of these putative SpCNTs may reveal their possible physiological roles.

Two novel CNTs were also found in my analyses of the Ciona genome, one representing the CNT1/2 lineage and the other representing the CNT3 lineage. The identification of only two lineages in Ciona suggests that the CNT1/2 lineage underwent gene duplication after the appearance of protochordates. In future, the identification of complete protein sequences will allow the functions of CNT isoforms in ancient chordates to be examined.

Hagfish may represent a "transition" state between early chordates and true vertebrates, and this taxon is the most ancient member of the subphylum Craniata, which existed 550 million years ago (Yao et al. 2002). This "prevertebrate" genome revealed only one CNT, which may represent an ancestral version of the CNT3 isoforms in mammals (Yao et al. 2002). The ability to identify other CNT isoforms in the hagfish genome is limited by the lack of adequate cDNA sequences, which restricts further analysis of its CNTs. None of the non-mammalian vertebrate genomes examined appeared to possess all three CNT isoforms, which led me to conclude that a second gene duplication likely took place in the CNT1/2 lineage just before the appearance of mammals.

My analyses confirm that, similar to the ENT family, the CNT family has two lineages present in invertebrates and vertebrates, the CNT1/2 lineage and the CNT3 lineage. However, it appears that the CNT1/2 lineage underwent a further gene duplication after the appearance of vertebrates and before the appearance of mammals, which resulted in the CNT1 and CNT2 isoforms. This would explain the presence of all three isoforms in mammals, whereas the other vertebrates and the invertebrates have isoforms representing only the two lineages.

Analyzing both the ENT and CNT families reveals that the CNT family transporters are well conserved, whereas the ENT transporters have notably evolved to perform more specialized functions. I was able to identify 10 conserved residues among all CNT members, from prokaryotes to mammals. Glycine-227 and glycine-476 are located at the end and beginning of TMH-5 and TMH-10, respectively, and most likely play a structural role (Coleman et al. 2005). The conserved proline-387 is located in helix 8 and proline-470 in helix 10, and both most likely contribute to helix folding (Orzaez et al. 2004). The glutamate located on helix 7 has recently been studied, and mutating this residue to asparagine or glutamine resulted in impaired transport activity (Yao et al. 2007). Lysine-492 of hCNT1 could be a site for post-translational modification, since it is a polar, charged amino acid. The motif asparagine-glutamate-X-X-alanine and the phenylalanine residue are located on the cytoplasmic side of hCNT1 and might play a role in protein-protein interaction; however, the function of this motif is only speculative. Further site-directed mutagenesis studies will reveal the specialized roles of these residues, which have kept them unchanged over time.
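The notion of a residue conserved "among all members" can be made concrete as an alignment column in which every sequence carries the same amino acid. The Python sketch below applies that definition to the TMH8 segments of five transporters as they appear in Figure 9. The helper name `conserved_columns` is an assumption introduced here, and the snippet is a simplified illustration rather than the procedure used to derive the 10 conserved residues.

```python
def conserved_columns(aligned):
    """Return (1-based column, residue) pairs identical in every sequence."""
    seqs = list(aligned.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    conserved = []
    for i in range(length):
        column = {s[i] for s in seqs}
        # A column counts as conserved only if it holds a single residue
        # type and that residue is not a gap character.
        if len(column) == 1 and "-" not in column:
            conserved.append((i + 1, seqs[0][i]))
    return conserved


# TMH8 segments copied from Figure 9 (a small subset of the alignment).
tmh8 = {
    "hCNT1": "LVYPE",
    "hCNT2": "LAYPE",
    "hCNT3": "LFWPE",
    "CeCNT3": "IMFPE",
    "NupC": "LINPY",
}

print("all five:", conserved_columns(tmh8))
eukaryotic = {name: seq for name, seq in tmh8.items() if name != "NupC"}
print("without NupC:", conserved_columns(eukaryotic))
```

In this small subset only the proline column is shared by all five sequences, while the glutamate column is shared by the four eukaryotic segments but not by NupC.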

1.6 Conclusion

This study presents the most recent comprehensive analysis of the NT family. Using recently sequenced genomes from different prokaryotes and eukaryotes, novel NT members have been identified among different taxa. By incorporating these newly identified putative members along with previously found NT proteins, I have re-examined the evolutionary history of the ENT and CNT proteins and identified evolutionarily conserved residues, which will further enhance our understanding of the structure and function of these proteins.

1.7 References

1. Acimovic Y, Coe IR. 2002. Molecular evolution of the equilibrative nucleoside transporter family: identification of novel family members in prokaryotes and eukaryotes. Molecular Biology and Evolution. 19:2199-2210.

2. Annilo T, Chen ZQ, Shulenin S, Costantino J, Thomas L, Lou H, Stefanov S, Dean M. 2006. Evolution of the vertebrate ABC gene family: Analysis of gene birth and gene death. Genomics. 88: 1-11.

3. Appleford PJ, Griffiths M, Yao SY, Ng AM, Chomey EG, Isaac RE, Coates D, Hope IA, Cass CE, Young JD, Baldwin SA. 2004. Functional redundancy of two nucleoside transporters of the ENT family (CeENT1, CeENT2) required for development of Caenorhabditis elegans. Mol Membr Biol. 21(4):247-59.

4. Arastu-Kapur S, Ford E, Ullman B, Carter NS. 2003. Functional analysis of an inosine-guanosine transporter from Leishmania donovani. The role of conserved residues, aspartate 389 and arginine 393. J Biol Chem. 278(35):33327-33.

5. Baldwin SA, Beal PR, Yao SY, King AE, Cass CE and Young JD. 2004. The equilibrative nucleoside transporter family, SLC29. Pflugers Arch. 447(5):735-43.

6. Baldwin SA, Mackey JR, Cass CE, Young JD. 1999. Nucleoside transporters: molecular biology and implications for therapeutic development. Mol Med Today. 5(5):216-24.

7. Baldwin SA, Yao SY, Hyde RJ, Ng AM, Foppolo S, Barnes K, Ritzel MW, Cass CE, Young JD. 2005. Functional characterization of novel human and mouse equilibrative nucleoside transporters (hENT3 and mENT3) located in intracellular membrane. J Biol Chem. 280(16):15880-7.

8. Barnes K, Dobrzynski H, Foppolo S, Beal PR, Ismat F, Scullion ER, Sun L, Tellez J, Ritzel MW, Claycomb WC, Cass CE, Young JD, Billeter-Clark R, Boyett MR, Baldwin SA. 2006. Distribution and functional characterization of equilibrative nucleoside transporter-4, a novel cardiac adenosine transporter activated at acidic pH. Circ Res. 99(5):510-9.

9. Bone DB, Robillard KR, Stolk M, Hammond JR. 2007. Differential regulation of mouse equilibrative nucleoside transporter 1 (mENT1) splice variants by protein kinase CK2. Mol Membr Biol. 24(4):294-303.

10. Cabrita MA, Baldwin SA, Young JD, Cass CE. 2002. Molecular biology and regulation of nucleoside and nucleobase transporter proteins in eukaryotes and prokaryotes. Biochem Cell Biol. 80(5):623-38.

11. Caffaro CE, Hirschberg CB, Berninsone PM. 2007. Functional redundancy between two Caenorhabditis elegans nucleotide sugar transporters with a novel transport mechanism. J Biol Chem. 282(38):27970-5.

12. Carter NS, Landfear SM, Ullman B. 2001. Nucleoside transporters of parasitic protozoa. Trends Parasitol. 17(3):142-5.

13. Cass CE, Young JD, Baldwin SA, Cabrita MA, Graham KA, Griffiths M, Jennings LL, Mackey JR, Ng AM, Ritzel MW, Vickers MF, Yao SY. 1999. Nucleoside transporters of mammalian cells. Pharm Biotechnol. 12:313-52.

14. Chung MI, Miao M, Stahl RJ, Chan E, Parkinson J, Keeley FW. 2006. Sequences and domain structures of mammalian, avian, amphibian and teleost tropoelastins: Clues to the evolutionary history of elastins. Matrix Biol. 25(8):492-504.

15. Coleman MD, Bass RB, Mehan RS, Falke JJ. 2005. Conserved Glycine Residues in the Cytoplasmic Domain of the Aspartate Receptor Play Essential Roles in Kinase Coupling and On-Off Switching. Biochemistry. 44(21):7687-95.

16. Craig JE, Zhang Y, Gallagher MP. 1994. Cloning of the nupC gene of Escherichia coli encoding a nucleoside transport system, and identification of an adjacent insertion element, IS186. Mol Microbiol. 11(6):1159-68.

17. Davidson EH. 2006. The sea urchin genome: where will it lead us? Science. 314(5801):939-40.

18. De Koning HP, Al-Salabi MI, Cohen AM, Coombs GH, Wastling JM. 2003. Identification and characterization of high affinity nucleoside and nucleobase transporters in Toxoplasma gondii. Int J Parasitol. 33(8):821-31.

19. De Koning HP, Bridges DJ, Burchmore RJ. 2005. Purine and pyrimidine transport in pathogenic protozoa: from biology to therapy. FEMS Microbiol Rev. 29(5):987-1020.

20. Dehal P, Satou Y, Campbell RK, Chapman J, Degnan B, De Tomaso A, Davidson B, Di Gregorio A, Gelpke M, Goodstein DM, Harafuji N, Hastings KE, Ho I, Hotta K, Huang W, Kawashima T, Lemaire P, Martinez D, Meinertzhagen IA, Necula S, Nonaka M, Putnam N, Rash S, Saiga H, Satake M, Terry A, Yamada L, Wang HG, Awazu S, Azumi K, Boore J, Branno M, Chin-Bow S, DeSantis R, Doyle S, Francino P, Keys DN, Haga S, Hayashi H, Hino K, Imai KS, Inaba K, Kano S, Kobayashi K, Kobayashi M, Lee BI, Makabe KW, Manohar C, Matassi G, Medina M, Mochizuki Y, Mount S, Morishita T, Miura S, Nakayama A, Nishizaka S, Nomoto H, Ohta F, Oishi K, Rigoutsos I, Sano M, Sasaki A, Sasakura Y, Shoguchi E, Shin-i T, Spagnuolo A, Stainier D, Suzuki MM, Tassy O, Takatori N, Tokuoka M, Yagi K, Yoshizaki F, Wada S, Zhang C, Hyatt PD, Larimer F, Detter C, Doggett N, Glavina T, Hawkins T, Richardson P, Lucas S, Kohara Y, Levine M, Satoh N and Rokhsar DS. 2002. The draft genome of Ciona intestinalis: insights into chordate and vertebrate origins. Science. 13; 298(5601):2157-67.

21. Dresser MJ, Gerstin KM, Gray AT, Loo DD, Giacomini KM. 2000. Electrophysiological analysis of the substrate selectivity of a sodium-coupled nucleoside transporter (rCNT1) expressed in Xenopus laevis oocytes. Drug Metab Dispos. 28(9):1135-40.

22. Eichinger L, Pachebat JA, Glockner G, Rajandream MA, Sucgang R, Berriman M, Song J, Olsen R, Szafranski K, Xu Q, Tunggal B, Kummerfeld S, Madera M, Konfortov BA, Rivero F, Bankier AT, Lehmann R, Hamlin N, Davies R, Gaudet P, Fey P, Pilcher K, Chen G, Saunders D, Sodergren E, Davis P, Kerhornou A, Nie X, Hall N, Anjard C, Hemphill L, Bason N, Farbrother P, Desany B, Just E, Morio T, Rost R, Churcher C, Cooper J, Haydock S, van Driessche N, Cronin A, Goodhead I, Muzny D, Mourier T, Pain A, Lu M, Harper D, Lindsay R, Hauser H, James K, Quiles M, Madan Babu M, Saito T, Buchrieser C, Wardroper A, Felder M, Thangavelu M, Johnson D, Knights A, Loulseged H, Mungall K, Oliver K, Price C, Quail MA, Urushihara H, Hernandez J, Rabbinowitsch E, Steffen D, Sanders M, Ma J, Kohara Y, Sharp S, Simmonds M, Spiegler S, Tivey A, Sugano S, White B, Walker D, Woodward J, Winckler T, Tanaka Y, Shaulsky G, Schleicher M, Weinstock G, Rosenthal A, Cox EC, Chisholm RL, Gibbs R, Loomis WF, Platzer M, Kay RR, Williams J, Dear PH, Noegel AA, Barrell B, Kuspa A. 2005. The genome of the social amoeba Dictyostelium discoideum. Nature. 435(7038):43-57.

23. Funkhouser JD, Aronson NN Jr. 2007. Chitinase family GH18: evolutionary insights from the genomic history of a diverse protein family. BMC Evol Biol. 7:96.

24. Galmarini CM, Jordheim L, Dumontet C. 2003. Pyrimidine nucleoside analogs in cancer treatment. Expert Rev Anticancer Ther. 3(5):717-28.

25. Galmarini CM, Mackey JR, Dumontet C. 2002. Nucleoside analogues and nucleobases in cancer treatment. Lancet Oncol. 3(7):415-24.

26. Gray JH, Owen RP, Giacomini KM. 2004. The concentrative nucleoside transporter family, SLC28. Pflugers Arch. 447(5):728-34.

27. Griffith DA, Jarvis SM. 1996. Nucleoside and nucleobase transport systems of mammalian cells. Biochim Biophys Acta. 1286(3):153-81.

28. Hamilton SR, Yao SY, Ingram JC, Hadden DA, Ritzel MW, Gallagher MP, Henderson PJ, Cass CE, Young JD, Baldwin SA. 2001. Subcellular distribution and membrane topology of the mammalian concentrative Na+-nucleoside cotransporter rCNT1. J Biol Chem. 276(30):27981-8.

29. Hao Xie, Simon G. Patching, Maurice P. Gallagher, Gary J. Litherland, Adrian R. Brough, Henrietta Venter, Sylvia Y. M. Yao, Amy M. L. Ng, James D. Young, Richard B. Herbert, Peter J. F. Henderson, Stephen A. Baldwin. 2004. Purification and properties of the Escherichia coli nucleoside transporter NupG, a paradigm for a major facilitator transporter sub-family. Mol Membr Biol. 21(5): 323-36.

30. Hirose N, Makita N, Yamaya T, Sakakibara H. 2005. Functional characterization and expression analysis of a gene, OsENT2, encoding an equilibrative nucleoside transporter in rice suggest a function in cytokinin transport. Plant Physiol. 138(1):196-206.

31. Holt RA, Subramanian GM, Halpern A, Sutton GG, Charlab R, Nusskern DR, Wincker P, Clark AG, Ribeiro JM, Wides R, Salzberg SL, Loftus B, Yandell M, Majoros WH, Rusch DB, Lai Z, Kraft CL, Abril JF, Anthouard V, Arensburger P, Atkinson PW, Baden H, de Berardinis V, Baldwin D, Benes V, Biedler J, Blass C, Bolanos R, Boscus D, Barnstead M, Cai S, Center A, Chaturverdi K, Christophides GK, Chrystal MA, Clamp M, Cravchik A, Curwen V, Dana A, Delcher A, Dew I, Evans CA, Flanigan M, Grundschober-Freimoser A, Friedli L, Gu Z, Guan P, Guigo R, Hillenmeyer ME, Hladun SL, Hogan JR, Hong YS, Hoover J, Jaillon O, Ke Z, Kodira C, Kokoza E, Koutsos A, Letunic I, Levitsky A, Liang Y, Lin JJ, Lobo NF, Lopez JR, Malek JA, McIntosh TC, Meister S, Miller J, Mobarry C, Mongin E, Murphy SD, O'Brochta DA, Pfannkoch C, Qi R, Regier MA, Remington K, Shao H, Sharakhova MV, Sitter CD, Shetty J, Smith TJ, Strong R, Sun J, Thomasova D, Ton LQ, Topalis P, Tu Z, Unger MF, Walenz B, Wang A, Wang J, Wang M, Wang X, Woodford KJ, Wortman JR, Wu M, Yao A, Zdobnov EM, Zhang H, Zhao Q, Zhao S, Zhu SC, Zhimulev I, Coluzzi M, della Torre A, Roth CW, Louis C, Kalush F, Mural RJ, Myers EW, Adams MD, Smith HO, Broder S, Gardner MJ, Fraser CM, Birney E, Bork P, Brey PT, Venter JC, Weissenbach J, Kafatos FC, Collins FH, Hoffman SL. 2002. The genome sequence of the malaria mosquito Anopheles gambiae. Science. 298(5591):129-49.

32. Hughes LA, Friedman R. 2005. Loss of ancestral genes in the genomic evolution of Ciona intestinalis. Evol Dev. 7(3):196-200.

33. Hyde RJ, Cass CE, Young JD, Baldwin SA. 2001. The ENT family of eukaryote nucleoside and nucleobase transporters: recent advances in the investigation of structure/function relationships and the identification of novel isoforms. Mol Membr Biol. 18(1):53-63.

34. Jaillon O, Aury JM, Brunet F, Petit JL, Stange-Thomann N, Mauceli E, Bouneau L, Fischer C, Ozouf-Costaz C, Bernot A, Nicaud S, Jaffe D, Fisher S, Lutfalla G, Dossat C, Segurens B, Dasilva C, Salanoubat M, Levy M, Boudet N, Castellano S, Anthouard V, Jubin C, Castelli V, Katinka M, Vacherie B, Biemont C, Skalli Z, Cattolico L, Poulain J, De Berardinis V, Cruaud C, Duprat S, Brottier P, Coutanceau JP, Gouzy J, Parra G, Lardier G, Chappie C, McKernan KJ, McEwan P, Bosak S, Kellis M, Volff JN, Guigo R, Zody MC, Mesirov J, Lindblad-Toh K, Birren B, Nusbaum C, Kahn D, Robinson-Rechavi M, Laudet V, Schachter V, Quetier F, Saurin W, Scarpelli C, Wincker P, Lander ES, Weissenbach J, Roest Crollius H. 2004. Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature. 431(7011):946-57.

35. Jastroch M, Withers KW, Taudien S, Frappell PB, Helwig M, Fromme T, Hirschberg V, Heldmaier G, McAllan BM, Firth BT, Burmester T, Platzer M, Klingenspor M. 2007. Marsupial uncoupling protein 1 sheds light on the evolution of mammalian nonshivering thermogenesis. Physiol Genomics. Epub.

36. Jennings LL, Hao C, Cabrita MA, Vickers MF, Baldwin SA, Young JD, Cass CE. 2001. Distinct regional distribution of human equilibrative nucleoside transporter proteins 1 and 2 (hENT1 and hENT2) in the central nervous system. Neuropharmacology. 40(5):722-31.

37. King AE, Ackley MA, Cass CE, Young JD, Baldwin SA. 2007. Nucleoside transporters: from scavengers to novel therapeutic targets. Trends Pharmacol Sci. 27(8):416-25.

38. Kong W, Engel K, Wang J. 2004. Mammalian nucleoside transporters. Curr Drug Metab. 5(1):63-84.

39. Landfear SM. 2001. Molecular genetics of nucleoside transporters in Leishmania and African trypanosomes. Biochem Pharmacol. 62(2):149-55.

40. Lemos B. 2007. The Opossum genome reveals further evidence for regulatory evolution in mammalian diversification. Genome Biol. 8(8):223.

41. Leveugle M, Prat K, Popovici C, Birnbaum D, Coulier F. 2004. Phylogenetic analysis of Ciona intestinalis gene superfamilies supports the hypothesis of successive gene expansions. J Mol Evol. 58(2):168-81.

42. Li G, Liu K, Baldwin SA, Wang D. 2003. Equilibrative nucleoside transporters of Arabidopsis thaliana. cDNA cloning, expression pattern, and analysis of transport activities. J Biol Chem. 278(37):35732-42.

43. Liu W, Arendt CS, Gessford SK, Ntaba D, Carter NS, Ullman B. 2005. Identification and characterization of purine nucleoside transporters from Crithidia fasciculata. Mol Biochem Parasitol. 140:1-12.

44. Loewen SK, Ng AM, Mohabir NN, Baldwin SA, Cass CE, Young JD. 2003. Functional characterization of a H+/nucleoside co-transporter (CaCNT) from Candida albicans, a fungal member of the concentrative nucleoside transporter (CNT) family of membrane proteins. Yeast. 20(8):661-75.

45. Loewen SK, Yao SY, Slugoski MD, Mohabir NN, Turner RJ, Mackey JR, Weiner JH, Gallagher MP, Henderson PJ, Baldwin SA, Cass CE, Young JD. 2004. Transport of physiological nucleosides and anti-viral and anti-neoplastic nucleoside drugs by recombinant Escherichia coli nucleoside-H(+) cotransporter (NupC) produced in Xenopus laevis oocytes. Mol Membr Biol. 21(1):1-10.

46. Loffler M, Morote-Garcia JC, Eltzschig SA, Coe IR, Eltzschig HK. 2007. Physiological roles of vascular nucleoside transporters. Arterioscler Thromb Vasc Biol. 27(5):1004-13.

47. Lu H, Chen C, Klaassen C. 2004. Tissue distribution of concentrative and equilibrative nucleoside transporters in male and female rats and mice. Drug Metab Dispos. 32(12):1455-61.

48. Machado J, Abdulla P, Hanna WJ, Hilliker AJ, Coe IR. 2007. Genomic analysis of nucleoside transporters in Diptera and functional characterization of DmENT2, a Drosophila equilibrative nucleoside transporter. Physiol Genomics. 28(3):337-47.

49. Mangravite LM, Lipschutz JH, Mostov KE, Giacomini KM. 2001. Localization of GFP-tagged concentrative nucleoside transporters in a renal polarized epithelial cell line. Am J Physiol Renal Physiol. 280(5):F879-85.

50. McManus DP, Bartley PB. 2004. A vaccine against Asian schistosomiasis. Parasitol Int. 53(2):163-73.

51. Mishra J, Saxena A, Singh S. 2007. Chemotherapy of leishmaniasis: past, present and future. Curr Med Chem. 14(10):1153-69.

52. Nakajima-Shimada J, Hirota Y, Aoki T. 1996. Inhibition of Trypanosoma cruzi growth in mammalian cells by purine and pyrimidine analogs. Antimicrob Agents Chemother. 40(11):2455-8.

53. Nieweg A, Bremer E. 1997. The nucleoside-specific Tsx channel from the outer membrane of Salmonella typhimurium, Klebsiella pneumoniae and Enterobacter aerogenes: functional characterization and DNA sequence analysis of the tsx genes. Microbiology. 143 (Pt 2):603-15.

54. Orzaez M, Salgado J, Gimenez-Giner A, Perez-Paya E, Mingarro I. 2004. Influence of proline residues in transmembrane helix packing. J Mol Biol. 335(2):631-40.

55. Parker MD, Hyde RJ, Yao SY, McRobert L, Cass CE, Young JD, McConkey GA, Baldwin SA. 2000. Identification of a nucleoside/nucleobase transporter from Plasmodium falciparum, a novel target for anti-malarial chemotherapy. Biochem J. 349(Pt 1):67-75.

56. Parker MD, Hyde RJ, Yao SY, McRobert L, Cass CE, Young JD, McConkey GA, Baldwin SA. 2000. Identification of a nucleoside/nucleobase transporter from Plasmodium falciparum, a novel target for anti-malarial chemotherapy. Biochem J. 349(Pt 1):67-75.

57. Pastor-Anglada M, Errasti-Murugarren E, Aymerich I, Casado FJ. 2007. Concentrative nucleoside transporters (CNTs) in epithelia: from absorption to cell signaling. J Physiol Biochem. 63(1):97-110.

58. Pebusque MJ, Coulier F, Birnbaum D, Pontarotti P. 1998. Ancient Large-Scale Genome Duplications: Phylogenetic and Linkage Analyses Shed Light on Chordate Genome Evolution. Mol Biol Evol. 9:1145-59.

59. Podgorska M, Kocbuch K, Pawelczyk T. 2005. Recent advances in studies on biochemical and structural properties of equilibrative and concentrative nucleoside transporters. Acta Biochim Pol. 52(4):749-58.

60. Sanchez MA, Tryon R, Green J, Boor I, Landfear SM. 2002. Six related nucleoside/nucleobase transporters from Trypanosoma brucei exhibit distinct biochemical functions. J Biol Chem. 277(24):21499-504.

61. Sankar N, Machado J, Abdulla P, Hilliker AJ, Coe IR. 2002. Comparative genomic analysis of equilibrative nucleoside transporters suggests conserved protein structure despite limited sequence identity. Nucleic Acids Res. 30(20):4339-50.

62. Schaap P, Winckler T, Nelson M, Alvarez-Curto E, Elgie B, Hagiwara H, Cavender J, Milano-Curto A, Rozen DE, Dingermann T, Mutzel R, Baldauf SL. 2006. Molecular phylogeny and evolution of morphology in the social amoebas. Science. 314(5799):661-3.

63. SenGupta DJ, Lum PY, Lai Y, Shubochkina E, Bakken AH, Schneider G, Unadkat JD. 2002. A Single Glycine Mutation in the Equilibrative Nucleoside Transporter Gene, hENT1, Alters Nucleoside Transport Activity and Sensitivity to Nitrobenzylthioinosine. Biochemistry. 41(5):1512-9.

64. Sherman M. 2007. Universal Genome in the Origin of Metazoa: Thoughts About Evolution. Cell Cycle. 6(15).

65. Smith KM, Ng AM, Yao SY, Labedz KA, Knaus EE, Wiebe LI, Cass CE, Baldwin SA, Chen XZ, Karpinski E, Young JD. 2004. Electrophysiological characterization of a recombinant human Na+-coupled nucleoside transporter (hCNT1) produced in Xenopus oocytes. J Physiol. 558(Pt 3):807-23.

66. Smith KM, Slugoski MD, Cass CE, Baldwin SA, Karpinski E, Young JD. 2007. Cation coupling properties of human concentrative nucleoside transporters hCNT1, hCNT2 and hCNT3. Mol Membr Biol. 24(1):53-64.

67. Smith KM, Slugoski MD, Loewen SK, Ng AM, Yao SY, Chen XZ, Karpinski E, Cass CE, Baldwin SA, Young JD. 2005. The broadly selective human Na+/nucleoside cotransporter (hCNT3) exhibits novel cation-coupled nucleoside transport characteristics. J Biol Chem. 280(27):25436-49.

68. Sodergren E, Weinstock GM, Davidson EH, Cameron RA, Gibbs RA, Angerer RC, Angerer LM, Arnone MI, Burgess DR, Burke RD, Coffman JA, Dean M, Elphick MR, Ettensohn CA, Foltz KR, Hamdoun A, Hynes RO, Klein WH, Marzluff W, McClay DR, Morris RL, Mushegian A, Rast JP, Smith LC, Thorndyke MC, Vacquier VD, Wessel GM, Wray G, Zhang L, Elsik CG, Ermolaeva O, Hlavina W, Hofmann G, Kitts P, Landrum MJ, Mackey AJ, Maglott D, Panopoulou G, Poustka AJ, Pruitt K, Sapojnikov V, Song X, Souvorov A, Solovyev V, Wei Z, Whittaker CA, Worley K, Durbin KJ, Shen Y, Fedrigo O, Garfield D, Haygood R, Primus A, Satija R, Severson T, Gonzalez- Garay ML, Jackson AR, Milosavljevic A, Tong M, Killian CE, Livingston BT, Wilt FH, Adams N, Belle R, Carbonneau S, Cheung R, Cormier P, Cosson B, Croce J, Fernandez-Guerra A, Geneviere AM, Goel M, Kelkar H, Morales J, Mulner-Lorillon O, Robertson AJ, Goldstone JV, Cole B, Epel D, Gold B, Hahn ME, Howard-Ashby M, Scally M, Stegeman JJ, Allgood EL, Cool J, Judkins KM, McCafferty SS, Musante AM, Obar RA, Rawson AP, Rossetti BJ, Gibbons IR, Hoffman MP, Leone A, Istrail S, Materna SC, Samanta MP, Stole V, Tongprasit W, Tu Q, Bergeron KF, Brandhorst BP, Whittle J, Berney K, Bottjer DJ, Calestani C, Peterson K, Chow E, Yuan QA, Elhaik E, Graur D, Reese JT, Bosdet I, Heesun S, Marra MA, Schein J, Anderson MK, Brockton V, Buckley KM, Cohen AH, Fugmann SD, Hibino T, Loza-Coll M, Majeske AJ, Messier C, Nair SV, Pancer Z, Terwilliger DP, Agca C, Arboleda E, Chen N, Churcher AM, Hallbook F, Humphrey GW, Idris MM, Kiyama T, Liang S, Mellott D, Mu X, Murray G, Olinski RP, Raible F, Rowe M, Taylor JS, Tessmar-Raible K, Wang D, Wilson KH, Yaguchi S, Gaasterland T, Galindo BE, Gunaratne HJ, Juliano C,

Kinukawa M, Moy GW, Neill AT, Nomura M, Raisch M, Reade A, Roux MM, Song JL, Su YH, Townley IK, Voronina E, Wong JL, Amore G, Branno M, Brown ER, Cavalieri V, Duboc V, Duloquin L, Flytzanis C, Gache C, Lapraz F, Lepage T, Locascio A, Martinez P, Matassi G, Matranga V, Range R, Rizzo F, Rottinger E, Beane W, Bradham C, Byrum C, Glenn T, Hussain S, Manning G, Miranda E, Thomason R, Walton K, Wikramanayke A, Wu SY, Xu R, Brown CT, Chen L, Gray RF, Lee PY, Nam J, Oliveri P, Smith J, Muzny D, Bell S, Chacko J, Cree A, Curry S, Davis C, Dinh H, Dugan-Rocha S, Fowler J, Gill R, Hamilton C, Hernandez J, Hines S, Hume J, Jackson L, Jolivet A, Kovar C, Lee S, Lewis L, Miner G, Morgan M, Nazareth LV, Okwuonu G, Parker D, Pu LL, Thorn R, Wright R, Sea Urchin Genome Sequencing Consortium. 2006. The genome of the sea urchin Strongylocentrotus purpuratus. Science. 314(5801):941-52.

69. Stein A, Vaseduvan G, Carter NS, Ullman B, Landfear SM, Kavanaugh MP. 2003. Equilibrative nucleoside transporter family members from Leishmania donovani are electrogenic proton symporters. J Biol Chem. 278(37):35127-34.

70. Taylor JS, Van de Peer Y, Braasch I, Meyer A. 2001. Comparative genomics provides evidence for an ancient genome duplication event in fish. Philos Trans R Soc Lond B Biol Sci. 356(1414):1661-79.

71. Vijayalakshmi D and Belt JA. 1988. Sodium-dependent nucleoside transport in mouse intestinal epithelial cells. Two transport systems with differing substrate specificities. J Biol Chem. 263:19419-19423.

72. Visser F, Sun L, Damaraju V, Tackaberry T, Peng Y, Robins MJ, Baldwin SA, Young JD, Cass CE. 2007. Residues 334 and 338 in transmembrane segment 8 of human equilibrative nucleoside transporter 1 are important determinants of inhibitor sensitivity, protein folding, and catalytic turnover. J Biol Chem. 282(19):14148-57.

73. Ward JL, Sherali A, Mo ZP, Tse CM. 2000. Kinetic and pharmacological properties of cloned human equilibrative nucleoside transporters, ENT1 and ENT2, stably expressed in nucleoside transporter-deficient PK15 cells. Ent2 exhibits a low affinity for guanosine and cytidine but a high affinity for inosine. J Biol Chem. 275(12):8375-81.

74. Wu X, Yuan G, Brett CM, Hui AC, Giacomini KM. 1992. Sodium-dependent nucleoside transport in choroid plexus from rabbit. Evidence for a single transporter for purine and pyrimidine nucleosides. J Biol Chem. 267:8813-8818.

75. Xiao G, Wang J, Tangen T, Giacomini KM. 2001. A novel proton-dependent nucleoside transporter, CeCNT3, from Caenorhabditis elegans. Mol Pharmacol. 59(2):339-48.

76. Yao SY, Ng AM, Loewen SK, Cass CE, Baldwin SA, Young JD. 2002. An ancient prevertebrate Na+-nucleoside cotransporter (hfCNT) from the Pacific hagfish (Eptatretus stouti). Am J Physiol Cell Physiol. 283(1):C155-68.

77. Yao SY, Ng AM, Slugoski MD, Smith KM, Mulinta R, Karpinski E, Cass CE, Baldwin SA, Young JD. 2007. Conserved glutamate residues are critically involved in Na+/nucleoside cotransport by human concentrative nucleoside transporter 1 (hCNT1). J Biol Chem. 282(42):30607-17.

78. Yao SY, Ng AM, Vickers MF, Sundaram M, Cass CE, Baldwin SA, Young JD. 2002. Functional and molecular characterization of nucleobase transport by recombinant human and rat equilibrative nucleoside transporters 1 and 2. Chimeric constructs reveal a role for the ENT2 helix 5-6 region in nucleobase translocation. J Biol Chem. 277(28):24938-48.

79. Zhang J, Visser F, King KM, Baldwin SA, Young JD, Cass CE. 2007. The role of nucleoside transporters in cancer chemotherapy with nucleoside drugs. Cancer Metastasis Rev. 26(1):85-110.

Chapter 2: Functional characterization of a putative human membrane transporter

Abstract

Nucleoside transport systems are extensively researched because of their importance in the transport of natural nucleosides and of nucleoside analog drugs administered for cancer and viral therapy. Two non-homologous nucleoside transport systems are currently known to be expressed in mammals: the equilibrative and the concentrative systems. A third system, a nucleoside/proton symporter with no obvious homology to the equilibrative or concentrative transporters, has been found only in prokaryotes. During the evolutionary analyses of the NT family, an unknown protein was found in Ciona intestinalis that had sequence homology to the oligosaccharide:H+ symporter (OHS) family and the nucleoside:H+ symporter (NHS) family. To identify human homologues of this unknown protein product (UPP), the BLAST, ClustalW and TMHMM programs were employed. Novel members of this family were identified in mammals, fish, birds, frogs, marsupials, sea squirts, insects, sea urchins, protozoa and bacteria. Although the predicted protein structure suggests that this family belongs to the major facilitator superfamily, the function of these proteins is unknown. To investigate the potential role of the human homologue of this UPP family, I determined the cellular localization of human UPP1 (hUPP1) in MCF-7 and COS-7 cells by transient transfection of a recombinant, fluorescently tagged hUPP1 followed by confocal microscopy.

I confirmed that hUPP1 is localized at the plasma membrane, supporting its putative role as a plasma membrane transporter. To functionally characterize hUPP1, I used the MCF-7 cell system and observed that hUPP1 transports hypoxanthine (10.10 picomoles/mg protein). However, when hUPP1 was heterologously expressed in Xenopus oocytes, it took up uridine (0.28 picomoles/oocyte min⁻¹). Further studies are needed to determine the complete transport profile of this protein, but this study lays the foundation for the designation of this unknown protein as a novel nucleoside/nucleobase transporter that is not a member of the classic ENT or CNT families.

2.1 Introduction

The two main nucleoside transport systems currently characterized in mammals are the equilibrative nucleoside transporters (ENTs) and the concentrative nucleoside transporters (CNTs) (Cass et al. 1999, Cabrita et al. 2002, Podgorska et al. 2005). They belong to solute carrier families 29 (SLC29) and 28 (SLC28), respectively (Baldwin et al. 2004, Gray et al. 2004). Both systems have been studied extensively because their members are present in humans and are involved in many aspects of physiology and pharmacology, such as cell signaling and drug transport (Zhang et al. 2007). ENTs are expressed in eukaryotes only, whereas CNTs exist in both prokaryotes and eukaryotes.

Additionally, a third nucleoside transport system exists that uses proton symport to transport nucleosides. This system has been detected in prokaryotes only. Its first (and only) characterized member is NupG, an E. coli nucleoside transporter (Hao et al. 2004). This nucleoside permease belongs to the major facilitator superfamily (MFS). MFS transporters are capable of transporting small solutes in response to chemiosmotic ion gradients (Pao et al. 1998). It was previously believed that members of the MFS could transport only sugars. It was later found that members of this superfamily can also transport oligosaccharides, nitrates, phosphates, oxalate, aromatic acids, nucleosides, drugs, anions, cations, etc. using ion gradients (Pao et al. 1998). NupG belongs to the nucleoside:H+ symporter (NHS) family and transports both purine and pyrimidine nucleosides with high affinity (Hao et al. 2004).

The characterized CNTs present in Caenorhabditis elegans and Candida albicans also use a proton gradient instead of a sodium gradient to transport nucleosides, but they lack any sequence homology to NupG.

During the evolutionary analyses of the ENT and CNT families, my word-search analysis revealed a predicted transporter sequence in the Ciona intestinalis genome that has no sequence homology or structural similarity to either the ENT or the CNT family (Figure 1). The function of this protein was not known, and it was therefore named 'unknown protein product' (UPP). Further analyses revealed that it has significant protein sequence homology to two families of the MFS, the NHS family and the oligosaccharide:H+ symporter (OHS) family.

Members of both the NHS and the OHS families have been identified in prokaryotes only. A well-characterized transporter of the OHS family is LacY of E. coli (Newman et al. 1981). Since the UPP protein in Ciona showed significant sequence similarity, and a similar putative 12-TMH structure, to both the LacY permease and NupG (Figure 2-a and 2-b), it raised the possibility that eukaryotic organisms might possess a similar permease that functions as either a lactose or a nucleoside transporter. If the UPP protein is indeed an MFS permease, this discovery would represent the first identification of a novel nucleoside transporter in eukaryotes that belongs to the MFS.

Since nucleoside transporters play key roles in regulating important physiological processes such as cardioprotection and cell signaling, and are known to transport many anticancer and antiviral drugs, identifying another transport system in humans could provide another means of delivering drugs to "diseased" cells. This study investigates the intriguing possibility that additional proteins exist which could contribute to nucleoside transport processes in eukaryotic cells. The research focuses on identifying homologues of this protein in vertebrates, determining whether a human homologue exists and, finally, determining the function of these proteins.

2.2 Research objectives

The main objectives of this part of my thesis are to combine bioinformatics and functional studies:

1. To identify homologous UPP proteins in different taxa and

2. To functionally characterize a UPP protein.

2.3 Methods

2.3.1 Database analysis

During the comprehensive analyses of the ENT and CNT families, a protein of unknown function was detected that was listed as a predicted transporter. Since its function is not known, it was designated UPP (unknown protein product). The full-length UPP protein sequence identified in the Ciona genome was used in BLAST searches to discover putative homologous sequences in eukaryotic organisms (http://www.ncbi.nlm.nih.gov/BLASTP/). The partial and complete protein sequences retrieved were further analyzed using ClustalW pairwise and multiple alignments (with default settings) to investigate the degree of similarity between the amino acid sequences (http://www.ebi.ac.uk/clustalw/). The program Webphylip was used to generate an unrooted phylogenetic tree by the protein parsimony method to visualize the distribution and similarity of these proteins among various taxa (http://biocore.unl.edu/WEBPHYLIP/). The transmembrane helix (TMH) prediction program TMHMM 2.0 was used to determine the putative membrane topology of the protein sequences (http://cbs.dtu.dk/services/TMHMM). The expression profiles of the human proteins were determined using the database BodyMap-Xs (http://bodymap.jp/).
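As an illustration of the kind of similarity measure obtained from a pairwise alignment, the following minimal Python sketch computes percent identity between two already-aligned sequences, counting only columns in which neither sequence has a gap. It is not the ClustalW implementation, and the function name `percent_identity` and the toy fragments are hypothetical; they are not UPP sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0  # columns without a gap in either sequence
    matches = 0   # columns where the two residues are identical
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, for illustration only).
toy_a = "MKT-LLVAG"
toy_b = "MKSALLV-G"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

Excluding gapped columns from the denominator is one common convention; other definitions divide by the full alignment length or by the shorter sequence, and would give lower values for the same alignment.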

2.3.2 Nomenclature

The protein sequences identified by word-based and BLAST searches that showed similarity to the Ciona UPP protein, but no similarity to ENTs or CNTs, were designated unknown protein product (UPP), preceded by an abbreviation of the genus and species name (e.g. Ciona intestinalis, CiUPP), except for human, mouse and rat sequences, which were named hUPP1, mUPP1, rUPP1, etc. The numbers 1 and 2 indicate the presence of two UPP proteins in one organism, where UPP1 is longer than UPP2. Additional numbering indicates the presence of more than one isoform of UPP1 or UPP2 in an organism (e.g. DmUPP1.2).
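The naming convention can be summarized in a short Python sketch. The function `upp_name` and the lookup table are hypothetical helpers written for illustration. Two things are assumptions of the sketch: the binomial names used as keys for human, mouse and rat, and the reading of "Dm" as Drosophila melanogaster.

```python
from typing import Optional

# Single-letter prefixes for human, mouse and rat sequences.
# The binomial names used as keys are an assumption of this sketch.
SPECIAL_PREFIXES = {
    ("homo", "sapiens"): "h",
    ("mus", "musculus"): "m",
    ("rattus", "norvegicus"): "r",
}


def upp_name(genus: str, species: str,
             paralog: Optional[int] = None,
             isoform: Optional[int] = None) -> str:
    """Build a UPP name such as CiUPP, hUPP1 or DmUPP1.2."""
    if not genus or not species:
        raise ValueError("genus and species must be non-empty")
    key = (genus.lower(), species.lower())
    # Default prefix: first letter of genus (upper) + first letter of species (lower).
    prefix = SPECIAL_PREFIXES.get(key, genus[0].upper() + species[0].lower())
    name = prefix + "UPP"
    if paralog is not None:
        name += str(paralog)          # 1 = longer protein, 2 = shorter protein
        if isoform is not None:
            name += "." + str(isoform)  # additional isoform numbering
    return name


print(upp_name("Ciona", "intestinalis"))
print(upp_name("Homo", "sapiens", paralog=1))
print(upp_name("Drosophila", "melanogaster", paralog=1, isoform=2))
```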

2.3.3 Plasmid construction

Commercially synthesized PJ10:hUPP1 with BglII and KpnI sites was obtained from DNA2.0 (CA, USA). Using the restriction endonucleases BglII (10,000 units/ml) and KpnI (10,000 units/ml) (New England Biolabs, Pickering, ON, Canada), hUPP1 was excised from the PJ10 vector. The vector pEGFP-C1 was also digested with BglII and KpnI. Each digestion reaction contained 40 µg of DNA and 100 units each of BglII and KpnI in an appropriate buffer, and reactions were incubated for 2 hours at 37°C. Digested vector and hUPP1 were purified from a 1% (w/v) agarose gel using a gel extraction kit (Qiagen Inc., Canada). Ligations were performed using vector:insert ratios of 1:1, 1:3 and 1:6, with 2 µl of 10x T4 ligase buffer, 2 µl of T4 DNA ligase (400,000 U/ml) (New England Biolabs) and water to a final volume of 20 µl. As a negative control, a reaction mixture without the insert was used. Ligation samples were incubated overnight at 16°C and transformed into NovaBlue competent cells (EMD Biosciences, Mississauga, ON, Canada) using the following incubation steps: 5 minutes on ice, 30 seconds at 37°C, 2 minutes on ice and finally 30 minutes at 42°C. Samples were plated on LB-agar-kanamycin plates and incubated overnight at 37°C. Colonies were inoculated in 5 ml LB medium supplemented with 50 µg/ml kanamycin. After 16 hours of incubation, plasmid DNA was purified from the bacterial culture using a GenElute plasmid miniprep kit (Sigma-Aldrich, Oakville, ON, Canada) according to the manufacturer's protocol. Purified DNA was then digested with BglII and KpnI to identify plasmids containing the insert. The positive clones were then sequenced at the Core Molecular Facility at York University. In MCF-7 cells, this construct was used in transient transfection for confocal microscopy to determine cellular localization and in transport assays to determine substrate specificity.

The construct pcDNA3.1(+):hUPP1 was also prepared by GenScript (NJ, USA) using NheI and KpnI restriction sites in order to express hUPP1 in Xenopus oocytes and to perform uptake assays to functionally characterize the protein.
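The vector:insert ratios used for ligation are molar ratios, so the mass of insert needed depends on the relative lengths of the two fragments. A minimal Python sketch of this standard conversion is given below. The vector mass and fragment lengths are placeholder values chosen for illustration, since the amounts used in the ligations are not stated here, and `insert_mass_ng` is a hypothetical helper.

```python
def insert_mass_ng(vector_ng: float, vector_bp: int, insert_bp: int,
                   insert_per_vector: float) -> float:
    """Mass of insert (ng) that gives the requested insert:vector molar ratio.

    For double-stranded DNA, moles are proportional to mass / length, so
    insert_ng = vector_ng * (insert_bp / vector_bp) * molar ratio.
    """
    if vector_bp <= 0 or insert_bp <= 0:
        raise ValueError("fragment lengths must be positive")
    return vector_ng * (insert_bp / vector_bp) * insert_per_vector


# Placeholder values (assumptions, not taken from the protocol).
VECTOR_NG = 50.0   # mass of digested vector per ligation
VECTOR_BP = 4700   # length of digested vector
INSERT_BP = 1500   # length of insert

for ratio in (1, 3, 6):  # vector:insert ratios of 1:1, 1:3 and 1:6
    mass = insert_mass_ng(VECTOR_NG, VECTOR_BP, INSERT_BP, ratio)
    print(f"vector:insert 1:{ratio} -> {mass:.1f} ng insert")
```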

2.3.4 COS-7, HEK-293 and MCF-7 cell culture

In order to determine the cellular localization and substrate specificity of hUPP1 in mammalian cells, COS-7, HEK-293 and MCF-7 cells were cultured in a 5% CO2 atmosphere at 37°C in Dulbecco's Modified Eagle's Medium (DMEM) supplemented with 10% (v/v) fetal bovine serum (Gibco) and 1x antibiotic-antimycotic solution (Sigma-Aldrich, Oakville, ON, Canada). Cells were grown on Petri dishes or coverslips.

2.3.5 Transfection for Confocal microscopy and uptake assay

COS-7, HEK-293 and MCF-7 cells were seeded in six-well plates or on coverslips and were transfected at approximately 80% confluency. One hour prior to transfection, the medium was replaced by 1 ml of fresh DMEM supplemented with 10% (v/v) FBS and 1x antibiotic-antimycotic solution. To assemble the reaction mixture, 5 µg of plasmid DNA (GFP:hUPP1 or GFP:hENT1) was mixed with 250 µl of FBS-free DMEM in one vial. In another vial, 12 µl of Lipofectamine (Invitrogen, CA, USA) was mixed with 100 µl of FBS-free DMEM. After 5 minutes of incubation at room temperature, the contents of the vials were combined and incubated for another 20 minutes at room temperature. The mixture was then added to the cells, and the cells were incubated overnight in a 5% CO2 atmosphere at 37°C. Wild-type cells were used as a negative control. GFP:hENT1 and Flag:hENT1 transfected cells were positive controls for confocal microscopy and transport assays.

2.3.6 Confocal microscopy

For organelle labeling, cells were grown on coverslips. For mitochondrial and lysosomal staining, transfected cells were incubated with MitoTracker CMXRos and LysoTracker Red DND-99 (Molecular Probes, OR, USA) for 30 minutes at room temperature. Cells were then fixed with 2% (w/v) paraformaldehyde and mounted on microscope slides using a drop of mounting medium (DAKO Cytomation, Mississauga, ON, Canada). The Golgi body was stained with anti-golgin-97 antibody according to the manufacturer's protocol (Molecular Probes). The presence of GFP in the cells was detected after 22 hours using an Olympus Fluoview 300 confocal microscope, and cells were imaged with a 60X objective using Olympus Fluoview 300 imaging software (version 4.3). Images were saved and are presented in this thesis as representative sections of a series taken through the entire cell.

2.3.7 Uptake assay using cell systems

To determine whether hUPP1 is a nucleoside or nucleobase transporter, HEK-293 and MCF-7 cells were seeded in six-well plates and standard nucleoside transport assays (Coe et al. 1996) were performed at room temperature. Cells were incubated in sodium-free transport buffer (20 mM Tris-HCl, 3 mM potassium diphosphate, 1 mM magnesium chloride, 2 mM calcium chloride, 5 mM glucose, 130 mM N-methyl-D-glucamine, pH 7.4) containing permeant (10 µM uridine or hypoxanthine) together with radiolabeled nucleoside, [3H]-uridine (specific activity 65.58 Ci/mmol), or nucleobase, [3H]-hypoxanthine (specific activity 87.5 Ci/mmol) (Moravek Biochemicals, CA, USA). After 10 seconds, the permeant solution was rapidly aspirated and the cells were washed with stop buffer (100 mM NBTI and 30 µM DIPY in sodium-free transport buffer). Cells were then solubilized in 1% Triton-X (Bioshop Canada Inc., ON) at 4°C for at least 48 hours. Aliquots were taken to measure protein content (protein assay, Bio-Rad, ON), and radioactivity was measured by standard liquid scintillation counting. Transport is expressed as picomoles per milligram protein. Statistical analyses were done using Student's unpaired two-tailed t test. A P value of less than 0.05 was considered statistically significant.
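The conversion from scintillation counts to picomoles per milligram protein can be sketched as follows. This is a minimal illustration and not necessarily the calculation used in this thesis. The function names, the label activity of 1 µCi/ml, the background count and the protein amount are hypothetical values chosen for the example; only the 10 µM permeant concentration comes from the assay described above. The sketch assumes that counts are already expressed as disintegrations per minute (dpm) and that the tracer adds a negligible amount of permeant to the solution.

```python
DPM_PER_CURIE = 2.22e12  # disintegrations per minute in one curie


def dpm_per_pmol(label_ci_per_litre, permeant_mol_per_litre):
    """Specific activity of the working solution, in dpm per pmol of permeant.

    Assumes the tracer adds a negligible amount of permeant, so the total
    permeant concentration equals that of the unlabeled compound.
    """
    if permeant_mol_per_litre <= 0:
        raise ValueError("permeant concentration must be positive")
    dpm_per_litre = label_ci_per_litre * DPM_PER_CURIE
    pmol_per_litre = permeant_mol_per_litre * 1e12
    return dpm_per_litre / pmol_per_litre


def uptake_pmol_per_mg(sample_dpm, background_dpm, activity_dpm_per_pmol, protein_mg):
    """Background-corrected uptake normalized to the protein content of the sample."""
    if protein_mg <= 0 or activity_dpm_per_pmol <= 0:
        raise ValueError("protein_mg and activity_dpm_per_pmol must be positive")
    net_dpm = sample_dpm - background_dpm
    return net_dpm / activity_dpm_per_pmol / protein_mg


if __name__ == "__main__":
    # Hypothetical working solution: 1 uCi/ml label (1e-3 Ci/l) in 10 uM permeant.
    activity = dpm_per_pmol(label_ci_per_litre=1e-3, permeant_mol_per_litre=10e-6)
    # Hypothetical well: 2500 dpm counted, 280 dpm background, 0.5 mg protein.
    print(uptake_pmol_per_mg(2500.0, 280.0, activity, 0.5))
```

Normalizing to protein rather than to cell number makes wells with different cell densities comparable, which is why the protein assay is run on an aliquot of the same lysate that is counted.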

2.3.8 In vitro RNA transcription

To determine the uptake activity of hUPP1 in oocytes, RNA transcripts of hUPP1 and hENT1 were generated using the mMessage mMachine kit (Ambion Inc.). The vectors pCDNA3.1(+):hUPP1 and pCS2+:hENT1 were linearized with the restriction enzyme NotI (10,000 units/ml) (New England Biolabs). DNA (20 µg) was digested using 100 units of enzyme in the appropriate buffer for 2 hours at 37°C. The negative control was a 1.2 kb fragment generated by linearizing the pCDNA3.1(+) vector with the restriction enzyme SmaI under the same reaction conditions. After linearization, the restriction digestions were terminated by adding 1/20th volume of 0.5 M EDTA, 1/10th volume of 3 M sodium acetate and 2 volumes of 100% ethanol. DNA was then precipitated at -20°C for at least 15 minutes and centrifuged at 14,000 rpm for 15 minutes to obtain a pellet. The pellet was resuspended in nuclease-free water. Each RNA transcription reaction was assembled with nuclease-free water, 2x NTP/CAP, 10x reaction buffer, enzyme mix (T7 or SP6) and 1 µg of linear template DNA, and incubated at 37°C for 2 hours. Next, 2 µl of DNase (2 U/µl) was added to each reaction, which was incubated at 37°C for 15 minutes to degrade the DNA template. The reaction was stopped by adding 30 µl of nuclease-free water and 30 µl of LiCl (7.5 M), and the mixture was chilled for 30 minutes at -20°C. To pellet the RNA, the contents were centrifuged at 14,000 rpm at 4°C for 15 minutes. After the supernatant was removed, the pellet was washed with 1 ml of 70% (v/v) ethanol. The wash solution was removed, and the RNA pellet was resuspended in nuclease-free water and stored at -20°C. The concentration and size of the RNA were determined by spectrophotometry and agarose gel electrophoresis.
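Where RNA concentration is read on a spectrophotometer, the usual convention is that an A260 of 1.0 in a 1 cm path corresponds to about 40 µg/ml of single-stranded RNA. The sketch below applies that convention and then works out the volume that would contain the 18 ng of RNA injected per oocyte. The absorbance reading and the dilution factor are hypothetical, the function names are illustrative, and the thesis does not state which conversion was actually used.

```python
RNA_UG_PER_ML_PER_A260 = 40.0  # conventional factor for single-stranded RNA, 1 cm path


def rna_conc_ug_per_ml(a260, dilution_factor=1.0):
    """Concentration of the undiluted RNA stock estimated from an A260 reading."""
    if a260 < 0 or dilution_factor <= 0:
        raise ValueError("a260 must be non-negative and dilution_factor positive")
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor


def injection_volume_nl(mass_ng, conc_ug_per_ml):
    """Volume in nanolitres that contains mass_ng of RNA (1 ug/ml equals 1 ng/ul)."""
    if conc_ug_per_ml <= 0:
        raise ValueError("conc_ug_per_ml must be positive")
    return mass_ng / conc_ug_per_ml * 1000.0


if __name__ == "__main__":
    # Hypothetical reading: A260 of 0.45 measured on a 1:50 dilution of the stock.
    stock = rna_conc_ug_per_ml(0.45, dilution_factor=50)
    print(f"stock concentration: {stock:.0f} ug/ml")
    print(f"volume for 18 ng: {injection_volume_nl(18.0, stock):.1f} nl")
```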

2.3.9 Uptake assay in Xenopus oocytes

To complement the transport assays in cultured cells, standard assays for the functional characterization of membrane transporters were used to determine substrate uptake in Xenopus oocytes (Yao et al. 2001). After surgery, stage VI oocytes were removed and kept in ND96 medium (96 mM NaCl, 2 mM KCl, 1 mM CaCl2, 5 mM Hepes and 2.5 mM pyruvate, pH 7.5) supplemented with gentamicin sulphate (50 µg/ml) and BSA (1% w/v). Follicular cells were removed by treating the oocytes with collagenase (1 mg/ml) (Sigma-Aldrich). RNA (18 ng) generated from constructs containing hUPP1, hENT1 (positive control) or linearized pCDNA3.1(+) (negative control) was injected individually into defolliculated oocytes using a nanoinjector (Drummond Scientific Company, PA, USA). After injection, oocytes were kept at room temperature for 72 hours in ND96 medium. Prior to each experiment, oocytes (10-14 oocytes per individual data point) were incubated for 15 minutes in transport buffer (100 mM NaCl, 2 mM KCl, 1 mM CaCl2, 1 mM MgCl2 and 10 mM Hepes, pH 7.5) with 1 nM NBTI. The transport buffer was then replaced by 200 µl of transport buffer containing uridine or hypoxanthine (100 µM) and radiolabeled substrate (0.1 mCi/ml [3H]-uridine or [3H]-hypoxanthine), and the oocytes were incubated for 60 minutes. At the end of the incubation, extracellular label was removed by six rapid washes with 6 ml of ice-cold buffer within one minute. Each oocyte was then placed in a separate scintillation vial containing 200 µl of 1% (w/v) SDS and allowed to dissolve with vigorous shaking for 45 minutes at room temperature. When the oocytes had completely dissolved, 2 ml of scintillation fluid was added to each vial and isotopic uptake in each oocyte was measured by liquid scintillation counting. Uptake is expressed as picomoles per oocyte per minute.

Statistical analyses were done using Student's unpaired two-tailed t test. A P value of less than 0.05 was considered statistically significant.

2.4 Results

2.4.1 UPP family

Using a "reverse screening" approach, 50 UPPs in chordata, arthropoda, nematoda, echinodermata, mycetozoa and bacteria were found (Table 1, Figure 11).

2.4.1.1 Novel vertebrate UPPs

In humans, two putative isoforms were found, hUPP1 and hUPP2, with lengths of 791 and 560 residues, respectively. The two isoforms were 17% similar. Because of the low homology between these two proteins, it is not yet clear whether they belong to the same family. HUPP1 contains an intracellular N-terminus, 12 TM helices (TMH) and an extracellular C-terminus, with an extracellular loop between helices 3 and 4. Although hUPP2 also showed a predicted 12 TMH topology, it has an intracellular N-terminus, an intracellular loop between helices 3 and 4 and an intracellular C-terminus (Figure 3-a and 3-b). HUPP1 was found to be expressed in a wider range of tissues and, more importantly, in a wider range of tumors than hUPP2 (Table 2).

The two UPP isoforms were also detected in the cow, dog, mouse, rat and zebrafish genomes. The UPP1 proteins were consistently longer than the UPP2 proteins. The isoforms were 16% similar to each other in cow, 20% in dog, 18% in mouse, 17% in rat and 18% in zebrafish. HUPP1 had 50 to 80% similarity to other vertebrate UPP1 homologues and 20 to 30% similarity to invertebrate UPP1 members.
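The thesis does not state how the pairwise percentages were computed, so the following is only one common way to express such a comparison: percent identity over the aligned, ungapped columns of a pairwise alignment. The function name and the toy sequences are hypothetical, and a similarity score based on a substitution matrix would give different numbers.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap.

    Both inputs must come from the same pairwise alignment, so they must have
    equal length. Columns containing a gap in either sequence are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned fragments, not taken from any UPP sequence.
    print(round(percent_identity("MKT-LLA", "MRTALLG"), 1))
```

Because the choice of denominator (aligned columns, shorter sequence or longer sequence) changes the result, values from different tools are only comparable when the same definition is used throughout.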

In the chimpanzee genome, three putative UPP1 proteins were identified. The majority of the sequence (720 amino acids) was identical in all three PtUPP1 proteins, but the C-terminal region was variable in length. Since these proteins were predicted by an automated computational gene prediction method (GNOMON), it is not clear whether each of them exists as a functional protein or whether they result from computational mis-assignment. In rhesus monkey, two UPP1 isoforms (793 and 737 amino acids long, respectively) were identified. Like the PtUPP1 proteins, the rhesus monkey UPP proteins differ only in the length of the C-terminus. As functional data for these primate UPPs are not available, they have been included as putative UPPs in Table 1.

Unlike the vertebrates mentioned above, one UPP isoform homologous to UPP1 was detected in pig, horse, opossum, chicken, pufferfish and zebrafish. A single isoform homologous to the UPP2 isoform has been detected in the amphibian genome.

2.4.1.2 Novel invertebrate UPPs

Novel UPP members were identified in nematodes, arthropods, echinoderms, parasitic protozoa and protists. In C. elegans, only one UPP member was identified; it showed 27% similarity to hUPP1 and 14% to hUPP2. The Drosophila genome revealed three putative UPPs, DmUPP1.1, DmUPP1.2 and DmUPP1.3, which were 19%, 13% and 29% similar to hUPP1 and 9%, 11% and 14% similar to hUPP2, respectively. Two UPP1 proteins were also identified in Apis mellifera and were 20% and 32% similar to hUPP1. Another arthropod, Anopheles gambiae, revealed three putative UPPs, which showed higher homology to hUPP1 (19 to 35%) than to hUPP2 (14 to 16%).

Three UPP members were identified in sea urchin: SpUPP1.1, SpUPP1.2 and SpUPP1.3. They were 19 to 31% similar to hUPP1 and 14 to 16% similar to hUPP2.

UPP homologues were also identified in Leishmania and Trypanosoma and were 17% and 20% similar to hUPP1. The protist Dictyostelium also had three UPP isoforms that were 20%, 21% and 16% similar to hUPP1 and less similar to hUPP2 (13%, 14% and 12%, respectively).

2.4.1.3 Novel prokaryotic UPPs

Prokaryotic UPP members have been identified in various gram-negative (e.g. Pseudomonas putida) and gram-positive bacteria (e.g. Thiobacillus denitrificans).

2.4.2 Human UPP1

Since hUPP1 was found to be more conserved among different taxa than hUPP2, I decided to characterize the hUPP1 protein to determine its function.

2.4.2.1 Cellular localization of hUPP1

Although hUPP1 has been identified as a putative transporter protein based on TMHMM, its location within the cell has not been confirmed. I therefore determined its cellular location using the immortalized primate kidney cell line COS-7. In COS-7 cells, hUPP1 showed distinct plasma membrane localization comparable to that of GFP-tagged hENT1, a well-established plasma membrane transporter. Since some plasma membrane proteins are also found in the membranes of intracellular organelles, I tested whether hUPP1 co-localized with mitochondria or lysosomes, but found no evidence that hUPP1 was localized to either of these structures (Figure 4).

Immunolocalization of hUPP1 in HEK-293 cells was more challenging because these cells were less tolerant of the washes required for the procedure and tended not to adhere to the coverslips. Therefore, I used another human cultured cell line, MCF-7 (an epithelial breast cancer cell line). As observed in COS-7 cells, both GFP:hUPP1 and GFP:hENT1 transfected MCF-7 cells showed expression of the protein at the plasma membrane, but no localization in either mitochondria or lysosomes (Figure 5a and 5b). However, some localization was detected in another region inside the cell, which was confirmed to be the Golgi body based on co-localization with an anti-Golgi marker (Figure 5c). The localization of the empty GFP vector is shown in Figure 6.

2.4.2.2 Substrate specificity of hUPP1

The putative structure of hUPP1 and its localization at the plasma membrane support the contention that it is a membrane transporter. However, functional characterization is required to establish whether this is indeed the case. Therefore, uptake assays were conducted in both human cells and Xenopus oocytes. To determine the substrate specificity of hUPP1, uridine and hypoxanthine were used as substrates. Uridine is considered a standard permeant in transport assays since it is known to be transported by all characterized ENTs and CNTs (Cass et al. 1998). Some NTs (e.g. hENT2) are capable of transporting nucleobases along with nucleosides, and therefore a nucleobase, hypoxanthine, was also tested. Analysis of functional transport in cultured cells is challenging because of endogenous transport activities, and the standard methodology for functional characterization of transporter proteins such as the nucleoside transporters has long been the Xenopus oocyte heterologous expression system. Since Xenopus oocytes lack nucleoside transport activity (Griffiths et al. 1997), they provide an expression system without endogenous nucleoside transport activity that could interfere with or modify the measured transport characteristics. Therefore, to further clarify the function of hUPP1, I also used this system.

The uptake assays in MCF-7 cells showed higher uptake of hypoxanthine in GFP:hUPP1 transfected cells than in untransfected MCF-7 cells (Table 3, Figure 7). In contrast, GFP:hUPP1 transfected cells did not show any increase in uridine uptake over wild-type cells (Table 3, Figure 8). In oocytes, no hypoxanthine uptake activity was detected: hUPP1-injected oocytes did not show uptake over the negative control (Table 4, Figure 9).

The rate of uridine uptake by hENT1 RNA-injected oocytes (0.28 pmoles/oocyte min⁻¹) was higher than that of the negative control (0.17 pmoles/oocyte min⁻¹). Uridine accumulation was even greater in hUPP1-injected oocytes (0.33 pmoles/oocyte min⁻¹) than in the positive control (Table 5, Figure 10). Each experiment was done at least three times using batches of oocytes from different frogs.

Table legends

Table 1: The UPP family members. A putative transporter protein sequence from the Ciona genome was used in BLAST searches to identify novel UPP members. None of the UPP proteins has been functionally characterized.

Table 2: The expression profile of human UPP1 (hUPP1) and UPP2 (hUPP2) in various tumor tissues. Using full-length protein sequences, the expression of hUPP1 and hUPP2 in various tumor cells was determined with the BodymapXS database.

Table 3: Comparison of [3H]-hypoxanthine and [3H]-uridine uptake by hUPP1 expressed in MCF-7 cells (pooled data). Uptake was measured in the presence of [3H]-hypoxanthine or [3H]-uridine (10 µM). Uptake values are presented as mean ± standard deviation. Each experiment was conducted in sextuplicate, n = 3. A statistically significant (p < 0.05) increase is indicated by *.

Table 4: [3H]-hypoxanthine uptake during 60 minutes of exposure in pCDNA3.1(+) and hUPP1 expressed oocytes. In vitro transcribed RNA was injected into oocytes, and a transport assay was performed using 100 µM hypoxanthine with radiolabeled hypoxanthine (1 µl/ml). Uptake values are presented as mean ± standard error. Each experiment was performed using a different batch of oocytes. A statistically significant (p < 0.05) increase is indicated by *.

Table 5: [3H]-uridine uptake during 60 minutes of exposure in pCDNA3.1(+), hENT1 and hUPP1 expressed oocytes. In vitro transcribed RNA was injected into oocytes, and a transport assay was performed using 100 µM uridine with radiolabeled uridine (1 µl/ml). Uptake values are presented as mean ± standard error. Each experiment was performed using a different batch of oocytes. A statistically significant (p < 0.05) increase is indicated by *.

Table 1: The UPP family members

No.  Protein    GI No.         Organism                          Protein size (residues)
1    hUPP1      GI:34533977    Homo sapiens                      791
2    hUPP2      GI:21751849    Homo sapiens                      586
3    PtUPP1.1   GI:114582282   Pan troglodytes                   735
4    PtUPP1.2   GI:114582278   Pan troglodytes                   791
5    PtUPP1.3   GI:55614358    Pan troglodytes                   747
6    MmUPP1.1   GI:109100373   Macaca mulatta                    737
7    MmUPP1.2   GI:109100369   Macaca mulatta                    793
8    EcUPP1     GI:149730992   Equus caballus                    799
9    BtUPP1     GI:76609255    Bos taurus                        799
10   BtUPP2     GI:76643911    Bos taurus                        597
11   CafUPP1    GI:74005084    Canis familiaris                  795
12   CafUPP2    GI:73955714    Canis familiaris                  572
13   SsUPP1     GI:148230170   Sus scrofa                        798
14   rUPP1      GI:27683291    Rattus norvegicus                 794
15   rUPP2      GI:27672794    Rattus norvegicus                 596
16   mUPP1      GI:30017395    Mus musculus                      775
17   mUPP2      GI:19483879    Mus musculus                      586
18   MdUPP1     GI:126326753   Monodelphis domestica             806
19   GgUPP1     GI:50750023    Gallus gallus                     786
20   XlUPP2     GI:54038672    Xenopus laevis                    614
21   TnUPP1     GI:47218574    Tetraodon nigroviridis            591
22   DrUPP1     GI:51859264    Danio rerio                       793
23   DrUPP2     GI:57525765    Danio rerio                       542
24   CiUPP1     CI0100147045   Ciona intestinalis                503
25   AgUPP1.1   GI:55236267    Anopheles gambiae                 852
26   AgUPP1.2   GI:55240576    Anopheles gambiae                 560
27   AgUPP1.3   GI:118783319   Anopheles gambiae                 730
28   DmUPP1.1   GI:45550435    Drosophila melanogaster           539
29   DmUPP1.2   GI:24659524    Drosophila melanogaster           741
30   DmUPP1.3   GI:24653693    Drosophila melanogaster           762
31   AmUPP1.1   GI:66550874    Apis mellifera                    530
32   AmUPP1.2   GI:66525378    Apis mellifera                    948
33   SpUPP1.1   GI:72007999    Strongylocentrotus purpuratus     834
34   SpUPP1.2   GI:72015615    Strongylocentrotus purpuratus     638
35   SpUPP1.3   GI:72077917    Strongylocentrotus purpuratus     776
36   CeUPP1     GI:15144376    Caenorhabditis elegans            630
37   DdUPP1.1   GI:66803813    Dictyostelium discoideum          580
38   DdUPP1.2   GI:66826307    Dictyostelium discoideum          519
39   DdUPP1.3   GI:66828613    Dictyostelium discoideum          587
40   LmUPP      GI:68128961    Leishmania major                  591
41   TbUPP      GI:71748684    Trypanosoma brucei                433
42   TcUPP      GI:71655719    Trypanosoma cruzi                 All
43   PpUPP      GI:26988561    Pseudomonas putida                384
44   OsUPP      GI:89093228    Oceanospirillum sp.               387
45   TdUPP      GI:74317941    Thiobacillus denitrificans        386
46   NsUPP      GI:82703043    Nitrosospira multiformis          382
47   MaUPP      GI:77955545    Marinobacter aquaeolei            365
48   AeT        GI:78700294    Alkalilimnicola ehrlichei         380
49   HhT        GI:88948590    Halorhodospira halophila          389
50   TtP        GI:20517508    Thermoanaerobacter tengcongensis  382

Table 2: The expression profile of human UPP1 (hUPP1) and UPP2 (hUPP2) in various tumor tissues

Tumor type                         hUPP1   hUPP2
Adrenal tumor                      X       X
Chondrosarcoma                     ✓       X
Glioma                             ✓       X
Non-glioma                         ✓       X
Breast (mammary gland) cancer      ✓       X
Colorectal tumor                   ✓       X
Esophageal tumor                   X       X
Gastrointestinal tumor             ✓       X
Germ cell tumor                    ✓       ✓
Cervical tumor                     X       X
Laryngeal cancer                   ✓       X
Oral tumor                         ✓       X
Thyroid tumor                      ✓       X
Pharyngeal tumor                   ✓       X
Kidney tumor                       ✓       X
Leukemia                           ✓       X
Liver tumor                        ✓       X
Ovarian tumor                      ✓       X
Pancreatic tumor                   ✓       X
Respiratory tract tumor            ✓       X
Retinoblastoma                     X       X
Skin tumor                         ✓       X
Soft tissue/muscle tissue tumor    X       X
Urinary bladder tumor              X       X
Uterine tumor                      ✓       X
Mixed (normal and tumor)           ✓       X

Table 3: Comparison of [3H]-hypoxanthine and [3H]-uridine uptake during 10 seconds of exposure in wild-type and hUPP1 transfected MCF-7 cells (pooled data, n=3)

             [3H]-hypoxanthine uptake   [3H]-uridine uptake
             (pmoles/mg protein)        (pmoles/mg protein)
             Mean ± SD                  Mean ± SD
GFP:hUPP1    10.70 ± 3.76*              9.54 ± 5.10
Wild type    5.405 ± 3.57               11.62 ± 4.44

* represents significant increase (p < 0.05)

Table 4: [3H]-hypoxanthine uptake during 60 minutes of exposure in pCDNA3.1(+) and hUPP1 expressed oocytes (n=3)

               pCDNA3.1(+)               hUPP1
               (Mean ± standard error)   (Mean ± standard error)
Experiment 1   0.119 ± 0.01              0.131 ± 0.008
Experiment 2   0.129 ± 0.019             0.153 ± 0.015
Experiment 3   0.118 ± 0.009             0.111 ± 0.009
Average        0.122 ± 0.013             0.132 ± 0.011

Table 5: [3H]-uridine uptake during 60 minutes of exposure in pCDNA3.1(+), hENT1 and hUPP1 expressed oocytes (n=3)

               pCDNA3.1(+)               hENT1                     hUPP1
               (Mean ± standard error)   (Mean ± standard error)   (Mean ± standard error)
Experiment 1   0.172 ± 0.015             0.287 ± 0.012             0.378 ± 0.018
Experiment 2   0.165 ± 0.015             0.293 ± 0.043             0.333 ± 0.011
Experiment 3   0.162 ± 0.011             0.252 ± 0.011             0.282 ± 0.017
Average        0.166 ± 0.014             0.277 ± 0.022*            0.331 ± 0.015*

* represents significant increase (p < 0.05)
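The averages in Table 5 can be reproduced from the three per-experiment means, as in the short sketch below. The values are copied from Table 5; the function names and the fold-over-control ratio are illustrative additions and are not statistics reported in this thesis.

```python
from statistics import mean

# Per-experiment mean uridine uptake values copied from Table 5.
TABLE5 = {
    "pCDNA3.1(+)": [0.172, 0.165, 0.162],
    "hENT1": [0.287, 0.293, 0.252],
    "hUPP1": [0.378, 0.333, 0.282],
}


def average_uptake(table):
    """Average the per-experiment means for each construct."""
    return {name: mean(values) for name, values in table.items()}


def fold_over_control(averages, control="pCDNA3.1(+)"):
    """Ratio of each construct's average uptake to the negative control."""
    base = averages[control]
    return {name: value / base for name, value in averages.items() if name != control}


if __name__ == "__main__":
    averages = average_uptake(TABLE5)
    for name, value in averages.items():
        print(f"{name}: {value:.3f}")
    for name, ratio in fold_over_control(averages).items():
        print(f"{name} relative to control: {ratio:.2f}")
```

Averaging the experiment means treats each batch of oocytes as one independent observation, which matches the n=3 stated in the table caption.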

Figure legends

Figure 1: Predicted membrane topology of Ciona UPP (CiUPP). TMHMM2.0 shows a putative 12 transmembrane domain topology for CiUPP. Red lines represent extracellular loops and blue lines represent intracellular loops.

Figure 2: Membrane topology of LacY and NupG of E. coli. As generated by TMHMM2.0, both LacY and NupG have 12 transmembrane domains. Red lines represent extracellular loops and blue lines represent intracellular loops.

Figure 3: Predicted membrane topology of hUPP1 and hUPP2. TMHMM2.0 shows a putative 12 transmembrane domain topology for both hUPP1 and hUPP2. Red lines represent extracellular loops and blue lines represent intracellular loops.

Figure 4: Plasma membrane localization of hUPP1 in COS-7 cells. Confocal microscopy of COS-7 cells transiently transfected with GFP:hENT1 and GFP:hUPP1 shows plasma membrane localization of GFP:hUPP1 and GFP:hENT1 and (a) no co-localization with mitochondrial stains, (b) no localization in the lysosome. Bar = 10 µm.

Figure 5: Plasma membrane localization of hUPP1 in MCF-7 cells. Confocal microscopy of MCF-7 cells transiently transfected with GFP:hENT1 and GFP:hUPP1 shows plasma membrane localization of GFP:hUPP1 and GFP:hENT1 and (a) no co-localization with mitochondrial stains, (b) no localization in the lysosome, (c) low levels of localization in the Golgi body (bright orange area). Bar = 10 µm.

Figure 6: Localization of pEGFP-C1 in MCF-7 cells. Cells transiently transfected with the pEGFP-C1 vector show localization of the empty vector throughout the cell. Bar = 10 µm. Image is adapted from Karanvir Wasal (Coe lab).

Figure 7: [3H]-hypoxanthine uptake by hUPP1 expressed in MCF-7 cells (pooled data). HUPP1 exhibits hypoxanthine uptake in MCF-7 cells. Uptake of [3H]-hypoxanthine (10 µM) was measured after 10 sec of exposure. HUPP1 showed significantly (p < 0.05) higher uptake than the negative control. Uptake is expressed as mean ± standard deviation. Each experiment was conducted in sextuplicate, n=3. A statistically significant (p < 0.05) increase is indicated by *.

Figure 8: [3H]-uridine uptake by hUPP1 expressed in MCF-7 cells (pooled data). HUPP1 does not exhibit uridine uptake in MCF-7 cells. Uptake of [3H]-uridine (10 µM) was measured after 10 sec of exposure. HUPP1 did not show higher uptake than the negative control. Uptake is expressed as mean ± standard deviation. Each experiment was conducted in sextuplicate, n=3. Flag:hENT1 was used as a positive control.

Figure 9: [3H]-hypoxanthine uptake by hUPP1 in Xenopus oocytes (pooled data). Uptake of [3H]-hypoxanthine (100 µM) was measured after 60 minutes of exposure in oocytes injected with hUPP1 or linearized empty vector transcripts (negative control). HUPP1 did not show higher uptake than the negative control. Uptake is expressed as mean ± standard error.

Figure 10: [3H]-uridine uptake by hUPP1 in Xenopus oocytes (pooled data). Uptake of [3H]-uridine (100 µM) was measured after 60 minutes of exposure in oocytes injected with transcripts of hUPP1, linearized empty vector (negative control) or hENT1 (positive control). HENT1 and hUPP1 showed a significant (p < 0.05) increase compared with the negative control. Uptake is expressed as mean ± standard error. A statistically significant (p < 0.05) increase is indicated by *.

Figure 11: Phylogenetic distribution of the UPP family. The unrooted tree generated by PHYLIP (using default settings) shows the grouping of eukaryotic and prokaryotic UPP members.

Figure 1: Predicted membrane topology of Ciona UPP (CiUPP)
[Plot not reproduced. Axis label: amino acids and predicted transmembrane helices.]

Figure 2-a: Membrane topology of LacY of E. coli
[Plot not reproduced. Axis label: amino acids and predicted transmembrane helices.]

Figure 2-b: Membrane topology of NupG of E. coli
[Plot not reproduced. Axis label: amino acids and predicted transmembrane helices.]

Figure 3-a: Predicted membrane topology of hUPP1
[Plot not reproduced. Axis label: amino acids and predicted transmembrane helices.]

Figure 3-b: Predicted membrane topology of hUPP2
[Plot not reproduced. Axis label: amino acids and predicted transmembrane helices.]

Figure 4: Confocal microscopy of COS-7 cells transiently transfected with GFP:hENT1 and GFP:hUPP1 shows plasma membrane localization of GFP:hUPP1 and GFP:hENT1 and (a) no co-localization with mitochondrial stains, (b) no localization in the lysosome
[Images not reproduced. Panels (a) and (b) each show GFP:hENT1 and GFP:hUPP1.]

Figure 5: Confocal microscopy of MCF-7 cells transiently transfected with GFP:hENT1 and GFP:hUPP1 shows plasma membrane localization of GFP:hUPP1 and GFP:hENT1 and (a) no co-localization with mitochondrial stains, (b) no localization in the lysosome, (c) low levels of localization in the Golgi body
[Images not reproduced. Panels (a), (b) and (c) each show GFP:hENT1 and GFP:hUPP1.]

Figure 6: Localization of pEGFP-C1 in MCF-7 cells
[Image not reproduced.]

Figure 7: [3H]-hypoxanthine uptake by hUPP1 expressed in MCF-7 cells (pooled data). Statistically significant (p < 0.05) increase was represented by *.
[Bar chart not reproduced. Groups: Control, GFP:hUPP1.]

Figure 8: [3H]-uridine uptake by hUPP1 expressed in MCF-7 cells (pooled data).
[Bar chart not reproduced. Groups: Control, GFP:hUPP1, Flag:hENT1.]

Figure 9: [3H]-hypoxanthine uptake by hUPP1 expressed in Xenopus oocytes (pooled data)
[Bar chart not reproduced. Groups: pCDNA3.1(+):Lin, hUPP1.]

Figure 10: [3H]-uridine uptake by hUPP1 expressed in Xenopus oocytes (pooled data). Statistically significant (p < 0.05) increase was represented by *.
[Bar chart not reproduced. Groups: pCDNA3.1(+):Lin, hENT1, hUPP1.]

Figure 11: Phylogenetic distribution of UPP family
[Tree not reproduced. Group labels: Eukaryotic UPP1, Prokaryotic UPPs.]

2.5 Discussion

The novel putative transporter sequence in the Ciona genome led to the identification of homologous proteins in both eukaryotes and prokaryotes, which constitute a previously undescribed family of proteins. None of the UPPs has been functionally characterized. A novel protein family that is evolutionarily conserved from bacteria to mammals invites speculation as to what cellular or physiological role its members may have. Interestingly, two human UPPs were also identified in this analysis, and both proteins have hydropathy profiles that suggest a topology of 12 TM domains. Human UPP1 (hUPP1) showed a 12 TM structure with an intracellular N-terminus and an extracellular C-terminus. The 12 TM domain structure of hUPP1 is similar to that of the MFS permeases. MFS permeases are present in the membranes of organisms from archaea to mammals (Pao et al. 1998), and these transporters are evolutionarily conserved and possess conserved residues among all taxa. For example, the sugar porter (SP) family is present in all three domains (bacteria, archaea and eukarya), and its sequence alignments show a high level of conservation. Its similar putative structure and its sequence similarity to the OHS and NHS families make hUPP1 a putative member of the MFS and an interesting target for research.

HUPP1 appears to be ubiquitously expressed throughout the human body according to expressed sequence tag (EST) counts reported by BodymapXS, including in brain, heart, intestine, kidney, lung, mammary gland, pancreas and pituitary gland. The highest levels were found in the central nervous system and lungs, suggesting that this transporter plays a significant physiological role in those tissues. Since glucose is the primary source of energy in the brain and hUPP1 is expressed in brain, it is possible that hUPP1 functions as a glucose transporter. Additionally, its presence in numerous tumor cells and its identification as a putative member of the MFS make it a potentially important transporter of drugs for clinical applications. If it transports sugar molecules like the OHS family or nucleosides like the NHS family, inhibiting its transport activity could also be a way of treating tumor cells. Inhibiting glucose transport would block the energy source, whereas inhibiting nucleoside transport would interfere with the intracellular nucleoside pool. Therefore, identifying a novel human member of the MFS raised the enticing possibility that this protein might have functional characteristics and substrate preferences similar to those of the OHS or NHS family.

While the substrate preference of hUPP1 is still not clear, during the course of this research a study by Kasho et al. confirmed that hUPP1 possesses conserved residues that are involved in sugar binding and H+ translocation (Kasho et al. 2006). The significant similarity of hUPP1 to LacY family members led them to speculate that it might be a lactose or sugar permease. When expressed in E. coli XL1-Blue cells and in Pichia pastoris strains, the protein was apparently expressed at the membrane and could not be identified in any other cellular fraction (Kasho et al. 2006). However, expression of this protein in bacteria did not result in any uptake of lactose, galactose or glucose (presented data, Ron Kaback, CSBMCB conference 2006). These data provide evidence that hUPP1 is not a homologue of the OHS family or the SP family.

In continuation of this work, our analysis has further revealed that the hUPP1 sequence has similarity to a previously characterized proton/nucleoside symporter in E. coli, NupG. The possible existence of eukaryotic homologues of NupG has been noted in the literature (Hao et al. 2004). However, a full-length human protein has not been identified to date. This raises the possibility that UPP members belong to the NHS family of the MFS, with NupG being the prokaryotic homologue of this transporter family. Therefore, I confirmed the plasma membrane localization of hUPP1 in mammalian cell lines and tested a nucleoside to determine the function of this protein.

All mammalian NTs characterized to date transport uridine (Cass et al. 1998). If hUPP1 is a homologue of the NHS family, it most likely has an affinity for uridine as well. Interestingly, when expressed in oocytes, hUPP1 did show uridine uptake comparable to that of hENT1. Similar uptake was not observed in the MCF-7 cell system. However, the low substrate concentration used in that uptake assay could be the explanation, since uptake was also not observed in oocytes when a low concentration of uridine (20 µM) was used (data not shown). Some NTs (e.g. hENT2) are capable of transporting nucleobases along with nucleosides, and it was therefore logical to test a nucleobase (hypoxanthine). Surprisingly, hypoxanthine uptake was observed in hUPP1 expressing MCF-7 cells. The lack of hypoxanthine uptake in oocytes suggests that hUPP1 is regulated or post-translationally modified differently in the human adenocarcinoma cells than in oocytes. At this moment, I cannot explain what that regulation might be, since very little is known about the structure and function of hUPP1 and this is the first study to demonstrate a possible substrate specificity of hUPP1. MCF-7 cells possess endogenous hENT2, which is capable of transporting hypoxanthine, but nucleobase transport kinetics in tumor cell lines have been less extensively studied, and other transport proteins like hUPP1 might have a role (Marshman et al. 2001). Nucleobase uptake systems in various cells are in general not well studied, and there is evidence of nucleobase transport in some cell types, e.g. Sertoli cells, where an ENT or CNT system is not involved and the uptake of nucleobases cannot be explained (Kato et al. 2006). This phenomenon suggests the presence of one or more unknown transport systems that may work together or independently to provide the route for nucleobase movement.

Although our preliminary experiments indicate that hUPP1 transports nucleosides, performing uptake assays with other substrates (e.g. adenosine, inosine, cytidine) would generate a more complete transport profile for hUPP1. Moreover, this transporter transports uridine at near-neutral pH (7.5), and it is necessary to determine how proton concentration affects its function. Since hUPP1 is expressed in many tumor cells, knowledge of its substrate specificity and detailed functional characteristics could aid therapeutic approaches against cancer. In addition, localization of this protein in the CNS might help in drug transport across the blood-brain barrier, which makes this study important for potential drug transport applications.

In this study, I have focused only on hUPP1 because of its homology to MFS permeases and its abundance in various tissues and tumors. The functional characteristics of hUPP2 are still not known. Although hUPP1 and hUPP2 have low sequence homology, hUPP2 might have significant roles in nucleoside transport and in other aspects of cellular physiology, and characterization studies are required to understand the function of this protein.

2.6 Conclusion

This study describes the identification of a novel nucleoside transporter family and provides functional data on a human member of this family. Identification of a novel nucleoside transport system that is expressed ubiquitously in the human body, including in multiple tumors, provides another potential avenue for drug therapy and an opportunity to explore the physiological role of these proteins in various organisms.

Ill References 1. Baldwin SA, Beal PR, Yao SY, King AE, Cass CE and Young JD. 2004. The equilibrative nucleoside transporter family, SLC29. Pflugers Arch. 447(5):735-43.

2. Cabrita MA, Baldwin SA, Young JD, Cass CE. 2002. Molecular biology and regulation of nucleoside and nucleobase transporter proteins in eukaryotes and prokaryotes. Biochem Cell Biol. 80(5):623-38.

3. Cass CE, Young JD, Baldwin SA. 1998. Recent advances in the molecular biology of nucleoside transporters of mammalian cells. Biochem Cell Biol. 76:761-770.

4. Cass CE, Young JD, Baldwin SA, Cabrita MA, Graham KA, Griffiths M, Jennings LL, Mackey JR, Ng AM, Ritzel MW, Vickers MF, Yao SY. 1999. Nucleoside transporters of mammalian cells. Pharm Biotechnol. 12:313-52.

5. Coe IR, Yao L, Diamond I, Gordon AS. 1996. The role of protein kinase C in cellular tolerance to ethanol. J Biol Chem. 271(46):29468-72.

6. Gray JH, Owen RP, Giacomini KM. 2004. The concentrative nucleoside transporter family, SLC28. Pflugers Arch. 447(5):728-34.

7. Xie H, Patching SG, Gallagher MP, Litherland GJ, Brough AR, Venter H, Yao SY, Ng AM, Young JD, Herbert RB, Henderson PJ, Baldwin SA. 2004. Purification and properties of the Escherichia coli nucleoside transporter NupG, a paradigm for a major facilitator transporter sub-family. Mol Membr Biol. 21(5):323-36.

8. Kasho VN, Smirnova IN, Kaback HR. 2006. Sequence alignment and homology threading reveals prokaryotic and eukaryotic proteins similar to lactose permease. J Mol Biol. 358:1060-1070.

9. Kato R, Maeda T, Akaike T, Tamai I. 2006. Characterization of novel Na+-dependent nucleobase transport systems at the blood-testis barrier. Am J Physiol Endocrinol Metab. 290:E968-E975.

10. Marshman E, Taylor GA, Thomas HD, Newell DR, Curtin NJ. 2001. Hypoxanthine transport in human tumour cell lines: relationship to the inhibition of hypoxanthine rescue by dipyridamole. Biochem Pharmacol. 61(4):477-84.

11. Newman MJ, Foster DL, Wilson TH, Kaback HR. 1981. Purification and reconstitution of functional lactose carrier from Escherichia coli. J Biol Chem. 256:11804-11808.

12. Pao SS, Paulsen IT, Saier MH Jr. 1998. Major facilitator superfamily. Microbiol Mol Biol Rev. 62(1):1-34.

13. Podgorska M, Kocbuch K, Pawelczyk T. 2005. Recent advances in studies on biochemical and structural properties of equilibrative and concentrative nucleoside transporters. Acta Biochim Pol. 52(4):749-58.

14. Yao SY, Ng AM, Sundaram M, Cass CE, Baldwin SA, Young JD. 2001. Transport of antiviral 3'-deoxy-nucleoside drugs by recombinant human and rat equilibrative, nitrobenzylthioinosine (NBMPR)-insensitive (ENT2) nucleoside transporter proteins produced in Xenopus oocytes. Mol Membr Biol. 18(2):161-7.

15. Zhang J, Visser F, King KM, Baldwin SA, Young JD, Cass CE. 2007. The role of nucleoside transporters in cancer chemotherapy with nucleoside drugs. Cancer Metastasis Rev. 26(1):85-110.
