Chapter 2 PRNP and PrP

In this chapter, I first discuss features of the PRNP gene and of vertebrate prion proteins. Then I describe the background and logic of my strategy for the discovery of PRNP homologues. Finally, I outline arguments for the Kangaroo Genome Project and the reasons why I used the tammar wallaby PRNP in comparative genomic analysis.

2.1 Vertebrate Prion Proteins

The set of known vertebrate proteins comprising the prion protein family consists of 99 members, from fish to mammals. There are 79 mammalian PrPs (78 eutherian and 1 marsupial), 14 bird PrPs, 2 reptile PrPs, 1 amphibian PrP and 4 fish PrP homologue sequences available in the NCBI Protein database (Figure 2.1).

There is conservation of the general sequence features among the vertebrate PrPs. Prion proteins can be provisionally divided into four regions with distinct composition: the basic region (region 1), the repeats or low-complexity sequence (region 2), the hydrophobic region (region 3), and the C-terminal region (region 4). However, the PrP regions from different vertebrate classes exhibit differences in their primary sequences (Chapter 6.3).

Most notably, the PrPs show conservation in the middle hydrophobic sequence, in the presence of one disulfide bond and two N-glycosylation sites in the C-terminal domain, and in the presence of the N- and C-terminal signal sequences for extracellular export and attachment of a GPI anchor. On the other hand, the N-terminal repeat region is variable, both in repeat motif length and sequence, and is entirely absent in frog PrP.

The species barrier in prion transmission is determined in part by the sequence similarity between host PrPC and exogenous PrPSc (Chapter 1.2). Variability and conservation among prion protein sequences are therefore important because of the risk of prion transmission.

Figure 2.1. Overall structures of PrP, stPrP, PrP-like and Sho proteins showing: S, signal sequence; B, basic region; H, hydrophobic region; R/PGH, PGH-rich repeats; R/GH, GH-rich repeats; B,R/RG, RG-rich basic repeats; B,R, basic repeats; N, N-glycosylation site; S-S, disulfide bridge; GPI, glycosylphosphatidylinositol anchor; GY and GYH, GY- and GYH-rich regions. Regions and attachment positions are approximately to scale. Numbers indicate the first residue of each section, and the last residue of each protein. A, Mammalian, avian and reptilian PrPs; numbers refer to human; additional N-glycosylation site for avian PrP in italics. B, Xenopus laevis PrP. C, Fugu, Tetraodon and salmon stPrP; numbers refer to PrP-461. D, Fugu and Tetraodon PrP-like; numbers refer to Fugu. E, Zebrafish PrP-like. F, Zebrafish and Fugu Sho (Chapter 4); the arrow indicates insertion region in Fugu; numbers refer to zebrafish. G, Mammalian Sho (Chapter 4); numbers refer to human.

The discovery and characteristics of mammalian PrPs are described in Chapter 1.2. Here I will summarize the history of discovery of PrPs in other species, and also outline major analyses of PrPs.

Harris et al. (1991) isolated a cDNA coding for the chicken PrP. This protein showed 33% identity with mouse PrP, but the middle hydrophobic sequence, glycosylation sites (with the third site unique to birds), disulphide bridge and GPI-anchor were conserved. The proximal repeats of bird and mammalian PrP, however, showed marked differences (Chapter 6.3). The chicken PRNP mRNA levels increased during postnatal development in brain in parallel with the levels of choline acetyltransferase mRNA.

To understand better the species barrier that determines prion transmission between humans and other primates, Schätzl et al. (1995) compared human PrP with 25 monkey and ape PrPs. The most prominent difference in this selection of PrPs was in the number of proximal repeats: whereas one fewer repeat was detected in the orang-utan, African green monkey and spider monkey PrP, one additional repeat was found in the squirrel monkey PrP relative to the human PrP. Variations in residues 90-130 (PrPC-PrPSc interface; Chapter 1.2.5) could influence human prion transmission to apes. However, this analysis also indicated that residues outside this region are involved in the species barrier. Differences in approximately one third of amino acid residues in PrP were observed when the bovine, , mink, rat, mouse, Armenian hamster, Chinese hamster and Syrian hamster PrP were included in the alignment. Genomic organization of the PRNP gene was identical in all the species (Chapter 2.2).

Windl et al. (1995) sequenced the first marsupial PrP. This sequence revealed overall conservation of mammalian prion protein (80% identity) but there were differences in the composition of proximal repeats (Chapter 6.3).

Wopfner et al. (1999) expanded the number of known mammalian PrPs to a total of 46, and the number of avian PrPs to a total of 9, and then analysed the regions of PrP that control the species barrier. Structural regions (Chapter 1.3) were conserved among mammals, as were functional positions including the two glycosylation sites, two cysteines and a residue (amino acid 231 in human) that is the attachment site for the GPI anchor. Minor differences in PrPs could strongly affect disease transmission: there are only two residues different between the dog and cat PrP, and between the ferret and mink PrP. However, whereas dog and ferret are resistant, cat and mink are susceptible to prion infection. PrPs were also highly conserved between bird species (roughly 90%), but avian PrPs showed only 30% overall identity with mammalian PrPs. The only PrP region invariably conserved between mammals and birds was the middle hydrophobic sequence bordered by residues 110-128 in human PrP (Chapter 1.2.5). The proximal repeats of avian and mammalian PrPs are different, perhaps reflecting different evolutionary pressures.

Van Rheede et al. (2003) studied molecular evolution of the mammalian (eutherian) prion protein. In order to include representatives of major clades from all 18 eutherian orders in the analysis, they sequenced 26 new eutherian PrPs. Glycosylation sites, disulphide bridge, hydrophobic region, elements of secondary structure and signal sequences are all conserved among the eutherian PrPs (Figure 2.1). The repeat number in eutherian PrPs varies from as low as two in squirrel (also shown in the lemur PrP; Gilch et al., 2000) to seven in gymnure and leaf-nosed bat. Deviations from the repeat consensus sequence were observed, as well as repeat homogenization. Not all the repeat residues implicated in binding (Chapter 2.5.2) are conserved eutherian-wide. Expansion and contraction of repeats is a frequent mutational process in the eutherian PRNP. I show in Chapter 6.3 that this also holds for the marsupial mammals.
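As an illustration of how repeat numbers such as those quoted above can be tallied, the following minimal Python sketch counts the longest run of tandem repeats in a protein sequence. The function name, the degenerate pattern and the toy sequences are assumptions made for this example only; the octarepeat consensus PHGGGWGQ is that of human PrP. The sketch is not the method used in the cited studies.

```python
import re

def longest_tandem_run(sequence: str, motif_pattern: str) -> int:
    """Return the largest number of back-to-back matches of motif_pattern.

    motif_pattern is a regular expression for ONE repeat unit
    (hypothetical helper, written for illustration).
    """
    run = re.compile(f"(?:{motif_pattern})+")   # one or more adjacent units
    unit = re.compile(motif_pattern)            # a single unit
    best = 0
    for match in run.finditer(sequence):
        best = max(best, len(unit.findall(match.group(0))))
    return best

# Toy sequences assembled from repeat units; the flanking residues are invented.
four_octarepeats = "AAA" + "PHGGGWGQ" * 4 + "TTT"
nona_plus_octa = "PQGGGGWGQ" + "PHGGGWGQ" * 4

# Exact octarepeat motif.
print(longest_tandem_run(four_octarepeats, "PHGGGWGQ"))

# Degenerate motif (an assumption) that also accepts the nonapeptide PQGGGGWGQ.
print(longest_tandem_run(nona_plus_octa, "P[HQ]GGG?GWGQ"))
```

A degenerate pattern is used in the second call because repeat units deviate from the consensus; how much degeneracy to allow is a judgment call that changes the count.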

Simonic et al. (2000) cloned a cDNA encoded by the turtle PRNP. The 270-residue protein showed 40% identity with mammalian and 58% identity with avian PrPs. Ten tandem hexarepeats were found in the N-terminal part of the protein, whose composition was different from those in birds and mammals (Chapter 6). Homology modelling of the turtle PrPC C-terminal region suggested that turtle PrP could adopt the same fold as mammalian PrP.

Strumbo et al. (2001) reported the sequence of the 216 residues in X. laevis PrP (Figure 2.1). This amphibian PrP showed more identity with avian and turtle PrPs (more than 44%) than with mammalian PrP (about 28%). The major surprise was a lack of the repeats in the N-terminal part of the protein. The conserved hydrophobic sequence was four residues shorter than in other PrPs.

Suzuki et al. (2002) reported a new gene in three fish species, encoding a protein with similarities to PrP (Figure 2.1). First, a cDNA was cloned in Fugu rubripes coding for a protein of 180 amino acids. This protein was named PrP-like, because it shared the conserved middle hydrophobic sequence and other features of PrP, including its basic nature and the predicted N-terminal signal sequence and GPI-anchor attachment. As in mammalian PRNP, the complete coding region lies within a single exon (Chapter 2.2). However, the Fugu PrP-like lacked the repeats, disulfide bridge and glycosylation sites of other vertebrate PrPs, and had a different C-terminal domain. Two other fish PrP-like sequences were discovered in Tetraodon nigroviridis and zebrafish.

Two more fish genes were later reported to encode proteins with structural features similar to PrP (Rivera-Milla et al., 2003; Oidtmann et al., 2003). Firstly, Rivera-Milla et al. (2003) reported a cDNA from Fugu rubripes encoding a protein of 461 amino acids (PrP461) that contained the conserved hydrophobic region and a C-terminal domain similar to those in other vertebrate PrPs, including the disulfide bridge and N-glycosylation sites, one of which is conserved with other PrPs. However, it had a greatly expanded repeat region. Sequence similarity between the Fugu PrP461 and mammalian PrPs was 22%. The same Fugu rubripes protein, independently discovered by Oidtmann et al. (2003), was named stPrP-1. It has a different length (450 amino acids) due to the inclusion of an extra small (30 bp) intron in its ORF (Chapter 2.2). A 605-amino-acid orthologue from Atlantic salmon Salmo salar, longer because of an expanded repeat region, was also described by Oidtmann et al. (2003). A homologous stPrP-1 from Tetraodon was found in public genomic data (Rivera-Milla et al., 2003; Oidtmann et al., 2003).

Secondly, Oidtmann et al. (2003) reported a cDNA encoded by a third related gene in Fugu, which they named stPrP-2. This stPrP-2 is closely related to stPrP-1 and has the same sequence features: a hydrophobic region that is disrupted by charged residues, a C-terminal domain with the disulfide bridge and three N-glycosylation sites, and an expanded repeat region. It was estimated that the Fugu stPrP-1, Fugu stPrP-2, salmon stPrP-1 and Fugu PrP-like show 24.8, 21.3, 17.7 and 16.3% identity, respectively, with the human PrP 27-30 (residues 90-230).
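Percent identity values such as those above depend on how the alignment is scored. The following minimal Python sketch shows one common convention, in which identity is computed over aligned columns where neither sequence has a gap. The cited studies do not state here which denominator they used, so this convention, the function name and the toy fragments are all assumptions for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns in which neither sequence has a gap.

    Both inputs must already be aligned (equal length, gaps as `gap`).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns (an assumed convention)
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# Toy aligned fragments; these are invented and are NOT real PrP sequences.
seq_a = "AGAAAAGAVVGG-LGGYML"
seq_b = "AGAAVAGAVIGGYLGG-ML"
print(round(percent_identity(seq_a, seq_b), 1))
```

Counting gapped columns in the denominator instead would lower the value, which is one reason identity figures from different studies are not always directly comparable.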

In addition to the problem of transmission of prions among mammals and to humans in the recent BSE crisis, the findings of PrP homologues in fish raise new issues: the possibility of spread of prions to farmed fish (e.g. from meat and bone meal feedstuff derived from farm animals), and vice versa. Oidtmann et al. (2003) indicated that it seems unlikely that fish could accumulate mammalian prions, but this possibility should not be excluded, as factors contributing to the species barrier are not fully understood.

2.2 PRNP and Its Homologues

The mammalian prion protein gene and its fish homologues have similar characteristics, including the exon/intron structure and a complete ORF within the 3’ terminal exon.

2.2.1 Mammalian PRNP

There is only a single PRNP gene in the mammalian genome. Interest in the structure and regulation of this gene has been intense because it dictates both the amino acid sequence of host PrPC and the level of its expression. These features determine genetic resistance or predisposition to the prion diseases (Prusiner and Scott, 1997; Prusiner, 1998).

The first PRNP gene studied was that of Syrian hamster (Basler et al., 1986). Analysis showed that it has two exons (56-82 bp and 2 kb), separated by one intron (10 kb) and that the entire ORF resides within the larger exon. Multiple transcription start sites were observed within 25 bp in the upstream promoter. The promoter region contained three Sp1-binding sites but no TATA box, features of a housekeeping gene that are in tune with the ubiquitous expression of PRNP (Chapter 2.3). Li and Bolton (1997) showed that there is another non-coding exon (99 bp) within the intron. In different brain regions, the transcript containing all three exons was expressed at 30-50% of the level of the transcript containing only exons 1 and 3. Worth noting here is that the first full-length PrP sequence was translated from the Syrian hamster PRNP DNA sequence (Chapter 1.2.2).

Human PRNP has the same gene structure as Syrian hamster PRNP (Puckett et al., 1991), with a short proximal exon coding for a 5’untranslated region of mRNA (136 bp), a single intron (13 kb) and a distal exon (2.3 kb) containing the ORF. Although no trace of the exon 2 was found in human cDNAs, the gene contains an exon 2-like sequence (Lee et al., 1998). The proximal promoter is GC-rich, typical of housekeeping genes. I analyse human PRNP characteristics in Chapter 6.

Lee et al. (1998) compared the human PRNP with the mouse and ovine PRNPs. The mouse Prnp (21 kb) encompasses three exons. The short exons 1 (47 bp) and 2 (98 bp) encode the 5’ UTR (Westaway et al., 1994a), and the complete ORF lies within exon 3 (2 kb). The splice donor and acceptor sites flanking exon 2 were different from consensus sequences, suggesting that splicing may not be obligatory. Two mouse Prnp alleles that determine different incubation times after prion infection, Prnpa and Prnpb, have different lengths (approximately 6 kb difference in the second intron) but this difference does not affect incubation times. There are multiple transcription start sites across the 25 bp promoter region. The promoters of both Prnpa and Prnpb contain binding sites for the Sp1 and AP-1 transcription factors. There are four motifs 250 bp proximal to the transcription start site: CTTTCATTTTCTC, CCATTAt/cGTAACG, TAAAGATGATTTTTA and TCAGGGAG. These are conserved in the mouse, Syrian hamster, sheep and human promoters but their functional significance is unclear.

The 20 kb sheep PRNP gene also has three exons (52, 98, 4028 bp) (Westaway et al., 1994b). The coding exon 3 is longer than those in other PRNPs. There is neither a Sp1- nor an AP-1-binding site in the promoter, but there is an AP-2-binding site.

Hills et al. (2001) reported the full genomic sequence of the 20 kb bovine PRNP (Figure 2.2). The gene structure is the same as that of the sheep PRNP, with three exons (53, 98 and 4092 bp) and two introns (2442 and 13552 bp).
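The exon and intron sizes quoted above can be checked against the stated gene length of about 20 kb. The following short Python sketch simply sums them; the variable and function names are illustrative assumptions, and the only data used are the sizes given in the text.

```python
# Exon and intron sizes (bp) of bovine PRNP as quoted in the text above.
bovine_exons_bp = [53, 98, 4092]
bovine_introns_bp = [2442, 13552]

def gene_span_bp(exons, introns):
    """Total span from the first to the last exon: all exons plus all introns."""
    return sum(exons) + sum(introns)

def exonic_fraction(exons, introns):
    """Fraction of the gene span that is exonic."""
    return sum(exons) / gene_span_bp(exons, introns)

total_bp = gene_span_bp(bovine_exons_bp, bovine_introns_bp)
print(total_bp)  # 20237, i.e. about 20 kb
print(f"{100 * exonic_fraction(bovine_exons_bp, bovine_introns_bp):.1f}% exonic")
```

The total of 20237 bp agrees with the rounded figure of 20 kb, and most of the exonic sequence lies in the long terminal exon.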

Comparative genomic analysis of the human, mouse and sheep PRNP showed that the genes accumulate transposable elements extensively and independently (Lee et al., 1998). The content of transposable elements was estimated to be 40% in human and mouse, and 57% in sheep. The 6 kb difference between mouse Prnpa and Prnpb is due to insertion of the transposable element intracisternal A-particle into the intron 2. The three-species comparisons identified conserved non-coding sequences in the intron 1 and in the 3’UTR region of the terminal exon (Chapter 6.5). The longer terminal exon is present in the bovine and sheep genes due to integration of the Bov-B, Bov-tA, and Mariner transposable elements in the 3’UTR.

Mammalian PRNP lies adjacent to one or two related genes (Chapter 5). The gene immediately distal to PRNP in eutherian mammals is PRND, encoding the doppel (Dpl) protein (Chapter 2.2.2), which is thought to have arisen by a duplication of PRNP (Mastrangelo and Westaway, 2001). The next gene adjacent to PRND, detected so far only in humans and not present in mouse, is the PRNT gene (Makrinou et al., 2002), which seems to be a pseudogene that arose from a duplication of PRND (Chapter 5.5). Further distal to PRNT in human, and to Prnd in mouse, are RASSF2, encoding Ras association domain family 2 protein, and SLC23A1, encoding solute carrier family 23 member 1 protein, both conserved in the human and mouse genomes (Chapter 5.5).

The PRNP gene is located on human 20p13, and in syntenic regions on mouse chromosome 2F3, rat chromosome 3q36, dog chromosome 24 (Ensembl), bovine chromosome 13q17, river buffalo chromosome 14q15, sheep chromosome 13q15, goat chromosome 13q15 (Iannuzzi et al., 1998) and chicken chromosome 22 (Ensembl).

In summary, these analyses have identified, as conserved features of eutherian PRNP promoters, their GC richness and the lack of a TATA box, both typical of housekeeping genes. There are some differences in gene structure and regulation between species. PRNP genes contain either three or two exons. There is a single transcription start site in PRNP except for rodents, which have multiple transcription start sites. Whereas the hamster, human, bovine and mouse promoters contain Sp1- and AP-1-binding sites, the sheep promoter does not but instead includes an AP-2-binding site.

Figure 2.2: Structure of the mammalian PRNP and fish PrP-like genes. (A) Typical mammalian PRNP has two short noncoding exons (E1 and E2), two introns, and the complete ORF within the longer terminal exon (E3). E2 is missing in the human PRNP. The sizes of exons and introns correspond to the bovine PRNP. (B) Fugu PrP-like contains one non-coding exon (E1) and the complete ORF is within the terminal exon (E2). The two rulers indicate size in kb. Exons are depicted by black rectangles. The ORF is shown as a white rectangle.

2.2.2 PRND: A Mammalian Paralogue of PRNP gene

The first mammalian paralogue of the gene encoding the prion protein was discovered in mouse by sequencing the genomic DNA 16 kb downstream of the Prnp gene (Moore et al., 1999). Prnd encodes a GPI-anchored glycoprotein of 179 amino acids dubbed “doppel” (Dpl; double in German) showing roughly 25% identity with mammalian PrP. However, Dpl contains neither the middle hydrophobic section of PrP critical for its function, nor the proximal repeats (Chapter 1.2). PRND is 27 kb distal to PRNP in human. The human and rat Dpl are 76% and 90% identical with mouse Dpl, respectively. Unlike Prnp, Prnd was expressed minimally in the adult mouse brain, and highly in testis. Expression of Prnd was upregulated in the brains of the Prnp0/0 mouse lines Ngsk Prnp0/0 and Rcm Prnp0/0, which exhibit the phenotypes described in Chapter 2.5.1.

The solution structure of recombinant mouse Dpl (amino acids 26-157) was very similar to that of PrPC (Chapter 1.3), despite limited sequence identity (Figure 2.3) (Mo et al., 2001). A globular domain contained three helices and little β-structure. Two disulfide bonds were found, one between Cys-109 and Cys-143 and the other between Cys-95 and Cys-148. Regions of secondary structure occurred at roughly the same positions in both proteins, but differences include a kink in helix αB, a shorter helix αC, shorter β-strands and a different orientation of the β-sheet in Dpl.

Prnd knockout mice develop normally (Behrens et al., 2002). Sterility was found in male but not in female Prnd-deficient mice. The spermatids from Prnd knockout males were immobile and malformed, their number was reduced and they were unable to fertilize oocytes in vitro. Acrosomal defects observed in the Prnd knockout sperm could account for infertility, perhaps due to inability of the sperm to cross the zona pellucida. Transformation of the round spermatids into testicular spermatozoa was also abnormal, as well as regional separation of the spermiogenic differentiation stages. Thus Prnd is implicated in male gametogenesis. PrPC, although expressed in testis, could not compensate for the loss of Dpl, indicating that Prnp and Prnd have non-redundant functions, at least in the male reproductive tract. Indeed, the Prnp/Prnd double knockout mice showed no additional new phenotypes (Chapter 2.5.1).

Figure 2.3: Comparison of the backbone topology of recombinant mouse Dpl and PrP. αA, α helix A; αB, α helix B; αC, α helix C (copied from Mo et al., 2001).

The proximity of PRNP and PRND, and the sequence and structural similarities of their products, indicate that they are the product of a tandem duplication. After duplication of an ancestral gene, the two duplicates evolved distinct and unrelated functions (divergent evolution). Although the two proteins retain similar architectures with slightly different topologies, their diverged amino acid compositions dictate different functions.

2.2.3 Fish PRNP Homologues

Features of the fish PRNP homologues were first defined in the Fugu genome. Suzuki et al. (2002) reported the PrP-like gene and analyzed its structure and its local genomic environment. The gene structure resembles that of mammalian PRNP: a short exon 1 (39 bp) and a long exon 2 (932 bp) harbouring the complete ORF are separated by an intron (1.5 kb) (Figure 2.2). The Fugu PrP-like transcript was expressed in skin, eyes and brain, an expression pattern different from that of mammal PRNP (Chapter 2.3). It was noted that the PrP-like resides in the same genomic region as mammalian PRNPs, proximal to RASSF2 and SLC23A1 in both Fugu and mammals. Suzuki et al. placed the PrP-like between these two genes, suggesting an evolutionary relationship between the fish PrP-like and tetrapod PRNP genes (Chapter 5.5).

Oidtmann et al. (2003) provided more details about the PRNP-related genes in Fugu. They determined that stPrP-2 lies 2 kb proximal to PrP-like, and RASSF2 and SLC23A1 were distal (Chapter 5.5). This fish genomic region did not contain the PRND, which is reported only in mammals.

The other PRNP homologue, stPrP-1, was found in a different genomic context. It contains a small intron within the ORF, unlike other members of the PRNP gene family that had been described (Oidtmann et al., 2003). Rivera-Milla et al. (2003) demonstrated expression of the PrP461 (stPrP-1) transcript in brain and liver. Oidtmann et al. (2003) showed that stPrP-1, but not stPrP-2, mRNA is expressed in brain in Fugu. Salmon stPrP-1 transcript is expressed ubiquitously (muscle, liver, skin, gills, kidney, spleen, heart, brain) but most prominently in brain, an expression pattern similar to PRNP expression in mammals (Chapter 2.3).

Suzuki et al. (2002) made the initial suggestion of an evolutionary link between the fish PrP-like and tetrapod PRNP based on their similar protein sequence features (extracellular, GPI-anchored proteins with repeats and middle hydrophobic region) and shared context (proximity to the RASSF2 and SLC23A1 genes). However, Oidtmann et al. (2003) considered that stPrP-2 had a closer evolutionary relationship with tetrapod PrPs because its C-terminal region has more similarity to mammalian PrP than does that of PrP-like. Of all the fish PrP homologues, stPrP-1 showed the highest homology with other PrPs although the gene is located in a different genomic context. I analyse these competing hypotheses in Chapter 5.

2.3 Expression of PRNP and PrPC

Mammalian PRNP is a housekeeping gene and is expressed in a heterogeneous set of cells. This was first demonstrated by Oesch et al. (1985), who cloned a partial cDNA coding for Syrian hamster PrP. The mRNA levels were the same in both normal and prion-infected brain. The mRNA was also detected in a range of other tissues: heart, lung, pancreas, liver, spleen, testis and kidney.

Regulated expression of PRNP during Syrian hamster brain development was demonstrated by Northern analysis (McKinley et al., 1987). A low level of the mRNA was found one to ten days after birth, rising to a maximum between day 10 and day 20 after birth, and then remaining constant throughout life. PrPC expression increased from a low at day 2 to a maximum at day 10 after birth. These changes correlate with morphological changes occurring during mammalian brain development, including differentiation and increase in the rates of synaptogenesis and myelination which occur after postnatal day six, suggesting PRNP’s involvement in neuronal maturation (Chapters 2.5.2 and 6.5).

Caughey et al. (1988) found that PRNP is expressed in normal and scrapie-infected mouse and hamster brains, liver and spleen. In situ hybridisation (Brown et al., 1990) revealed PRNP expression in the neurons and non-neuronal cells of mouse brain (ependymal cells, choroid plexus epithelium, astrocytes, pericytes, endothelial cells and meninges). Transcription was also found in microglial cells, alveolar lining and septal interstitial pulmonary cells and myocardium, but not in spleen.

PRNP is also expressed in a number of cell lines from mouse (epithelial cell line C127, neuroblastoma Neuro 2A cells, erythroid cell line AA60, embryo fibroblast B6-3T3 cells, B cell lymphoma cell line 1593), Syrian hamster (ovary-derived CCL61 cells), human (astrocytoma HTB14 cells, neuroblastoma HTB10 cells) and rat (glioma-derived C6Bu3 cells). No PRNP mRNA was found in the mouse myeloid cell lines 5402 (differentiated) and 7320 (undifferentiated), nor in the human lymphoma cell line MBL-2 (Caughey et al., 1988).

Human lymphocytes and lymphoid cell lines (but not erythrocytes or granulocytes) transcribe PRNP and express PrPC (Cashman et al., 1990). After activation of T lymphocytes, the abundance of PrPC on the cell surface increased. Polyclonal antibodies to PrPC suppressed concanavalin A-induced activation of lymphocytes, indicating that PrPC may participate in activation of T lymphocytes (Chapter 2.5.2).

Manson et al. (1992) studied PRNP expression during embryonic mouse development using in situ hybridisation. Transcripts were found by 13.5 or 16.5 days throughout the developing brain and spinal cord, and also in the peripheral nervous system (ganglia and nerve trunks of the sympathetic nervous system and neural cells of sensory organs). At this stage PRNP expression was also detected in the differentiating non-neuronal cells of dental lamina and kidney. In extra-embryonic tissue, the PRNP transcripts were found in the maternal cells of the placenta, and in the amnion, umbilical cord and mesodermal layer of the yolk sac.

The distribution of tissues expressing PrPC was studied in Syrian hamster (Bendheim et al., 1992). Immunohistochemical analysis localized PrPC in brain to the neurons and surrounding neuropil in the , septal, caudate and thalamic nuclei, dorsal root ganglia and dorsal root axons. PrPC was most concentrated within the hippocampus including the CA1, CA3, CA4 subfields, fimbria, pyramidal cells, dentate formation and the intervening neuropil. Cortex, fornix, caudate, thalamus, brainstem and spinal cord expressed less PrPC. In non-neuronal tissues, the circulating leukocytes, heart, myocard, lung (bronchial epithelium), stomach (parietal and glandular neuroepithelial cells), intestines, spleen, testis and ovary all expressed PrPC.

Askanas et al. (1993) demonstrated that PrPC is concentrated at the postsynaptic domain of human normal neuromuscular junctions (NMJ). At the NMJ, molecular compositions of the and immediately postsynaptic cytoplasmic domain are different from those in the nonsynaptic region of the muscle fibre.

Ford et al. (2002a) developed antibodies recognising PrPC in glutaraldehyde-fixed tissue and studied PrPC expression in the brain. PrPC expression was predominantly neural. The GABA-immunoreactive neurones showed the highest levels of expression. Dopaminergic neurones and glia, on the other hand, showed no PrPC expression. However, all the neurones expressed PRNP mRNA, indicating the importance of posttranscriptional control of mRNA activity (Chapter 6.5).

PrPC is expressed in a heterogeneous set of mouse tissues outside brain (Ford et al., 2002b), including peripheral nerves and Schwann cells, sympathetic ganglia and nerves, parasympathetic and enteric nervous system, antigen presenting and processing cells, populations of lymphocytes and the neuroendocrine system. A good correlation between mRNA and protein was found outside brain.

Barmada et al. (2004) generated transgenic mice in which the PRNP promoter drives expression of a fusion protein, PrP-EGFP (enhanced green fluorescent protein). PrP-EGFP was expressed within synapse-rich regions in brain. In the hippocampus, fluorescence was found in the synapse-rich layers such as the strata oriens, radiatum, lacunosum-moleculare and lucidum, alveus, subiculum, fimbria and hilus. PrP-EGFP was found throughout the neocortex. In the cerebellum, fluorescence was detected at high levels in the molecular layer, and at lower levels in the granule cell layer and white matter.

Morel et al. (2004) analysed expression of PrPC in normal human intestinal tissues. PrPC was expressed in enterocytes, the dominant cell population of the intestinal epithelium, and also in the vascular epithelia. The enterocytic cell line caco-2/TC7 also expresses PrPC.

2.4 Cell Biological Features of PrPC

The metabolism of PrPC determines both its normal role and its contribution to prion disease pathogenesis.

Mammalian PrPC is a membrane protein that cycles constitutively between the cell surface and early endosomes (reviewed in Harris, 2003). The biosynthetic pathway of PrPC is similar to that of other secreted and membrane proteins. It is first synthesized in the endoplasmic reticulum (ER), then post-translationally modified (cleavage of signal peptides, N-linked glycosylation and addition of the GPI anchor) in the ER and Golgi, and finally it reaches the cell surface. The PrPC molecules cycle constitutively through the cell with a transit time of approximately 1 hr: the t1/2 for internalisation and the t1/2 for return to the cell surface are both roughly 20 min, with the protein being equally divided between the two compartments (Shyng et al., 1993). Most of the molecules are recycled intact to the cell surface but a small percentage is proteolytically cleaved in the middle of the protein. Roughly 10-30% of the membrane-anchored PrPC molecules are released into the extracellular milieu. The t1/2 for degradation of PrP in lysosomes is 3-6 hrs (Taraboulos et al., 1992).
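The cycling figures above can be related to one another with a simple two-compartment model in which internalisation and return to the surface are both first-order processes. This model is an illustrative assumption, not one stated in the cited studies, and the function names below are hypothetical. Under this assumption each rate constant is k = ln 2 / t1/2, the steady-state surface fraction is k_return / (k_internalise + k_return), and the mean time for one full cycle is the sum of the two mean residence times 1/k.

```python
import math

def rate_constant(half_time_min: float) -> float:
    """First-order rate constant (per min) from a half-time (assumed kinetics)."""
    return math.log(2) / half_time_min

def surface_fraction(t_half_internalise: float, t_half_return: float) -> float:
    """Steady-state fraction of molecules at the cell surface."""
    k_in = rate_constant(t_half_internalise)   # surface -> endosome
    k_out = rate_constant(t_half_return)       # endosome -> surface
    return k_out / (k_in + k_out)

def mean_cycle_time(t_half_internalise: float, t_half_return: float) -> float:
    """Mean duration (min) of one surface -> endosome -> surface round trip."""
    return (1 / rate_constant(t_half_internalise)
            + 1 / rate_constant(t_half_return))

# Half-times of roughly 20 min each, as quoted in the text.
print(surface_fraction(20, 20))
print(round(mean_cycle_time(20, 20), 1))
```

With both half-times at 20 min the model places half of the protein at the surface, matching the equal division reported, and gives a mean round trip of about 58 min, which is compatible with the quoted transit time of approximately 1 hr.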

Most of the protein resides in membrane “rafts”, detergent-resistant domains enriched in sphingolipids that are foci for signal transduction events. Internalisation of PrPC may occur via clathrin-coated vesicles, mediated by the N-terminal part of the protein, or alternatively through a caveolae-mediated endosomal pathway. Binding of copper stimulates endocytosis.

Peters et al. (2003) analysed PrPC trafficking using cryoimmunogold electron microscopy. They found that PrPC was enriched in the caveolae, stable membrane microdomains (“rafts”) that mediate key cell processes such as signal transduction, are anchored by the actin cytoskeleton, and are enriched in caveolin, cholesterol and glycosphingolipids. PrPC was delivered to the late endosomes/lysosomes via nonclassical, caveolae-containing early endocytic structures (“caveosomes”). The GPI-anchored proteins may cycle between the cell surface and trans Golgi network via this pathway, and inhibitors of such endocytosis may be of therapeutic interest.

Early studies of PrPC localization were ambiguous: it was predominantly found in the soma with minor signal in the neuropil (Bendheim et al., 1992; Ford et al., 2002b) but PrP was found to be predominant in the neuropil as well. It could also be predominant in the synaptosomal plasma membrane but with no presence in the synaptic vesicles or cytosol (Herms et al., 1999).

Mironov et al. (2003) investigated the ultrastructural localization of PrPC in the mouse hippocampus cornu ammonis 1 (CA1) and dentate gyrus areas. They demonstrated ubiquitous cell distribution of the extracellular PrPC. Consistent with its GPI-anchored membrane association, this suggests that it diffuses along the cell membrane. PrPC was associated predominantly with the neuropil and had the same concentration within the synaptic specializations and perisynaptically. It was present at the same concentration in the presynaptic and postsynaptic membranes and within the synapse, but no PrP was found in the synaptic vesicles. Besides PrPC associated with the biosynthetic and endocytic membranous structures, a cytosolic PrP was also identified in subpopulations of unknown neurons in the hippocampus, neocortex and thalamus (CPrP cells). This cytosolic PrP could be a novel PrP entity, with structure and function different from the extracellular PrPC.

Barmada et al. (2004) confirmed existence of the cytosolic PrP. Further, they showed that the PrP-EGFP (Chapter 2.3) is localized primarily along axons and in presynaptic terminals. This distribution is consistent with retrograde and anterograde transport of PrP along axons (Moya et al., 2004) and with preferential sorting of some GPI-anchored proteins in neurons to their axonal surface. There was less PrP-EGFP on dendrites in the hippocampus and cerebellum.

In enterocytes, PrPC is also localized in raft microdomains (Morel et al., 2004; Chapter 2.3). Further, it was mainly concentrated in the lateral membrane, associated with the junctional complexes. This localization was dependent on cell-cell contacts (Chapter 2.5.2). PrPC was not found on the apical membrane.

There are three known topological forms of PrP: the extracellular GPI-anchored form (PrPC), comprising roughly 50% of total PrP, and two transmembrane entities (CtmPrP and NtmPrP) that span the cell membrane in opposite orientations and comprise about 10% and 40% of total PrP, respectively (Hegde et al., 1998). Two adjacent regions act in concert to generate the transmembrane entities: TM1 (A113-S135 in human PrP) and STE (stop transfer effector; L104-M112 in human PrP). Aberrant regulation of PrP biogenesis and topology may cause neurodegeneration. CtmPrP caused severe neurodegeneration in mice, and is a key component in the GSS disease pathway caused by the A117V mutation. NtmPrP could have a normal role.

Both PrP isoforms have two variably occupied glycosylation sites, Asn181 and Asn197 in human PrP (reviewed in Rudd et al., 2002). More than 50 glycans occupied either or both sites in PrPC. The PrPSc from the scrapie-infected hamster brain also had glycans at these sites, but contained more tri- and tetra-antennary glycan complexes. The glycans stabilise the folded part of PrPC, so altering its sugars could have functional consequences. For example, the PrP transformation occurs more readily if PrPC is unglycosylated. The oligosaccharides are also required for the intracellular trafficking of PrPC (Chapter 1.2.6). Further, they are large in comparison with PrPC itself. Molecular dynamics simulations (Zuegg and Gready, 2000) showed that the folded domain of PrPC is stabilized by an indirect effect of glycosylation, and that the glycans change the surface charge to a negative electrostatic field, which could inhibit the association of PrPC with the membrane.

Both PrP isoforms also contain a phosphatidylinositol glycolipid that attaches them to the outer leaflet of the cell membrane (Stahl et al., 1988). In fact, all vertebrate homologues of PrP were predicted to contain this GPI-anchor (Chapter 2.1). It is readily cleaved by the bacterial enzyme PI-PLC (Chapter 1.2.7), releasing PrPC from the cell membrane. Molecular dynamics simulations indicated that the GPI-anchor is flexible and maintains the protein 9-13 Å from the cell membrane (Zuegg and Gready, 2000). In general, GPI-anchored proteins are involved in signal transduction and cell activation (e.g. acetylcholinesterase in the synaptic cleft) and they show rapid locomotion (Medof et al., 1996). They may be promiscuous and reincorporate into membranes in trans, remaining fully functional (protein “painting”). Such intermembrane transfer of the mouse GPI-anchored complement restriction factors from erythrocytes to the epithelium was shown to occur in vivo under physiological conditions (Kooyman et al., 1995). By analogy with these observations, this feature of GPI-linked proteins may enable spreading of prions from neuron to neuron.

Indeed, Liu et al. (2002) demonstrated that PrPC could be transferred from cell to cell by a GPI-dependent process in vitro. This process is tightly regulated, as it occurred only after either donor or recipient cells, or both, were activated by the protein kinase C (Chapter 2.5.2) activator phorbol 12-myristate 13-acetate (PMA). The transfer was also dependent on direct cell-to-cell contact.

Exosomes are membrane vesicles released into the extracellular milieu. Follicular dendritic cells, which are implicated in the peripheral prion disease pathogenesis (Chapter 1.2.9), release and exchange exosomes with other cells (Fevrier and Raposo, 2004). Exosomes are released after exocytic fusion of multivesicular endosomes, and could act as carriers for intercellular exchange of PrPC and PrPSc (Fevrier et al., 2004).

A fraction of infectious PrPSc was released from the scrapie-infected Mov and Rov cells in association with exosomes. Native PrPC is released in the same manner. The protein composition of the PrP-carrying exosomes was evaluated by mass spectrometry. Among others, proteins involved in adhesion, membrane fusion and exosome biogenesis were found, indicating that the PrP-carrying vesicles are bona fide exosomes. Exosomes are a newly discovered mode of intercellular communication. They are released by many cell types including B cells and intestinal epithelial cells. They are enriched in cell-type specific proteins (e.g. MHC I and II in B cells), in ubiquitous proteins involved in biosynthesis of exosomes and their adhesion to target cells, in membrane raft components, and in GPI-anchored proteins. This finding is in agreement with the results of Peters et al. (2003): one fate of the caveosomes is exocytic fusion and release into the extracellular environment.

Yedidia et al. (2001) showed that roughly 10% of the newly synthesized PrPC is degraded by the ERAD-proteasome pathway, which is responsible for clearing of misfolded proteins (Chapter 1.4.1). During this process, PrP molecules are translocated to the cytosol, unglycosylated, ubiquitinated and degraded.

Ubiquitous expression indicates functional contribution of the PRNP to many cell types. Glycosylated, GPI-anchored extracellular PrPC diffuses along the cell membrane. It resides in the cell membrane foci that mediate signal transduction, cycles constitutively and is degraded by the lysosomes.

2.5 Normal Function of PRNP

The normal function of the prion protein gene remains elusive, and a number of hypotheses have been proposed.

2.5.1 Prnp Knock-Out Mice

Prnp knock-out mice were constructed to illuminate the normal function of Prnp.

Prion protein gene knock-out mice conservatively generated by disrupting the Prnp ORF have no obvious phenotype (Bueler et al., 1992; Manson et al., 1994; Weissmann and Flechsig, 2003). No major anatomical abnormalities, infertility, difference in immunological status, learning or behavioural changes were found.

A more radical Prnp knock-out (which, as well as disrupting the ORF, also included removal of the splice acceptor site of the exon 3) produced ataxia and loss of the Purkinje cells later in life (Sakaguchi et al., 1996). However, this phenotype was a consequence of the up-regulation of the Prnd gene and its high, non-physiological expression in brain (Chapter 2.2.2).

There are several explanations for the lack of phenotype in the conservative Prnp knock-out mice. The knock-out phenotype could be so subtle that a selective disadvantage may emerge only after many generations, for example as a consequence of stressful conditions. Alternatively, the functional redundancy or compensation of its loss by other molecule(s) may mask the loss of the Prnp gene. Another possibility is that the protein may have recently lost its function (Bueler et al., 1992). Finally, the knock-out phenotype may not be apparent in laboratory settings.

However, there could be more subtle phenotypic changes in the Prnp knock-out mice.

Collinge et al. (1994) reported that the CA1 hippocampal slices from Prnp knock-out mice show weakened GABAA receptor-mediated fast inhibition and impaired long-term potentiation. However, other laboratories could not confirm this observation (Lasmezas, 2003). Colling et al. (1997) reported aberrant mossy fibers in the Prnp0/0 mice hippocampus CA2 and dentate gyrus regions, similar to morphological abnormalities following epileptic seizures.

Mice devoid of Prnp exhibited alterations in circadian activity rhythms and sleep (Tobler et al., 1996), indicating involvement of Prnp in the regulation of sleep. Period lengths of the circadian activity rhythms were longer in the null mice than in wild type. In addition, the Prnp0/0 mice were less active in the first half of the dark period. The null mice also showed differences in non-rapid eye movement (NREM) sleep, in waking distribution in the dark and in sleep fragmentation. These were rescued by re-introduction of the Prnp gene. Evaluation of behavioural parameters in the Prnp0/0 mice showed normal fear-motivated memory, anxiety and exploratory behaviour but slightly increased locomotor activity (Roesler et al., 1999).

Results consistent with the mild knock-out phenotype were produced by using a tetracycline controlled transactivator to repress PrPC expression in adult mice. Tremblay et al. (1998) found no deleterious effects. After administration of doxycycline (an analogue of tetracycline) to adult mice, expression of PrPC in brain was repressed by 90% after seven days of treatment; when doxycycline was withdrawn, it took seven days for PrPC expression to return to its normal level. Doxycycline-treated mice were not susceptible to exogenous prions. The absence of systemic or CNS dysfunction upon PrPC repression also argues in favour of redundancy between PrPC and other molecule(s) as no developmental compensation and adaptation was possible using this experimental system.

Using the cre-loxP system to knock out the Prnp gene in 9-week-old mice, Mallucci et al. (2002) found that the mice remained healthy and showed no evidence of neurodegeneration. However, a significant reduction of afterhyperpolarization in the CA1 cells was found, indicating that PrPC may modulate neuronal excitability by affecting afterhyperpolarization. Bypassing developmental compensatory mechanisms induced no detrimental effect, suggesting once again functional redundancy between Prnp and other gene(s).

Coitinho et al. (2003) studied behavioural parameters in 3- and 9-month-old Prnp0/0 mice. Behavioural parameters were also compared after administration of anti-PrPC antibodies into the CA1 region of the dorsal hippocampus in normal 3- and 9-month-old rats. Memory performance normally declines with aging, starting at the age of 9-12 months in rodents. No difference from normal mice was observed in the 3-month-old Prnp0/0 mice. On the other hand, impairment of both short- and long-term memory was observed in the 9-month-old Prnp0/0 mice when compared with normal mice. This was also the case when 9-month-old rats that received anti-PrPC antibodies were compared with normal rats. Decreased locomotor activity during observation in an open field was observed in the 9-month-old Prnp0/0 mice. Normal anxiety was found in both Prnp0/0 age groups. These observations may be explained by the impairment (or modification) of PrPC physiological functions in the hippocampus of adult Prnp0/0 mice.

The Prnd gene (Chapter 2.2.2) is dispensable for prion disease pathogenesis. Its normal function must encompass reproduction, since male Prnd knock-outs are infertile. Overexpression of Prnd in the brain causes neurodegeneration that can be rescued by the expression of Prnp. Mice in which both paralogues, Prnp and Prnd, were inactivated showed no additional new phenotype (Genoud et al., 2004). Double knock-out mice had no morphologic or immunologic abnormalities apart from infertility of male mice. This analysis showed that there is no functional redundancy between Prnp and Prnd genes. Therefore, functional redundancy is likely to exist between Prnp and its other homologue(s) (Chapters 4-6).

The homologue(s) of Prnp with redundant function are unknown. Shadow of prion protein SPRN is the only human gene that is such a candidate at present (Chapters 4-7).

2.5.2 Hypotheses about the Function of PRNP

Many hypotheses have been proposed for PRNP function, including its involvement in copper transport, copper buffering, redox signalling, neuroprotection, cell-cell interactions, lymphocyte activation, nucleic acid metabolism and signal transduction. Here I will briefly outline eight hypotheses, and describe in full the ninth, which is supported by my work (Chapter 6).

2.5.2.1 PrPC Transports Copper

The endocytic pathway of PrPC could suggest a role in uptake or in efflux of an extracellular ligand (Harris, 2003). Mammalian PrPC binds copper cooperatively at five to six sites in a low micromolar range (total copper concentration is 16-20 µM in blood, 0.5-2.5 µM in cerebrospinal fluid and 15 µM in synapse) and in a pH-dependent manner (optimal at physiological pH) (Brown et al., 1997). The residues involved in copper binding are histidines that reside in the proximal octarepeats, and histidines in the C-terminal domain (His96, His111 or His140 in human PrP; Chapter 2.1). Deletion of the proximal repeats in chicken PrPC also affected copper binding (Pauly and Harris, 1998).

Prnp0/0 mice showed reduced copper content in the membrane-enriched brain and liver extracts and increased content of serum copper. Tenfold reductions of copper content were also found in the synaptosomal and endosome-enriched brain fractions, indicating that the PrPC-deficient cell membranes are also deficient in copper. Further, a reduction in the activity of copper/zinc superoxide dismutase (SOD-1) and altered electrophysiological responses in the presence of excess copper were observed in PrPC-deficient cells.

Thus, PrPC is a copper-binding cuproprotein whose low affinity copper binding may allow exchange of copper with other molecules. In this, PrPC may be similar to the proteins implicated in pathogenesis of Parkinson’s disease (monoamine oxidase), Alzheimer’s disease (amyloid precursor protein APP) and familial amyotrophic lateral sclerosis (SOD-1), which are also cuproproteins.

It is unclear how copper and PrPC could be functionally related. Bound copper may serve as a cofactor for enzymatic activity of PrPC, PrPC may act as a sink for chelation of extracellular copper ions, or PrPC may act as a carrier protein for copper uptake and delivery to intracellular targets. Pauly and Harris (1998) showed that copper rapidly and reversibly stimulates endocytosis of PrPC from the cell surface. Incubation of N2a C mouse neuroblastoma cells expressing either mouse or chicken PrP with excess CuSO4 (200 µM, 500 µM) rapidly stimulated internalisation of both PrPCs. The removal of metal reversed the PrPC distribution.

Two models for the role of PrPC in copper trafficking were hypothesized (Harris, 2003). Firstly, PrPC could serve as a receptor for uptake of copper ions from the extracellular milieu. It could bind copper on the plasma membrane via the proximal repeats and deliver it by endocytosis to the acidic endosomal compartments, where copper ions dissociate at low pH and are then transported to the cytoplasm. PrPC could then return to the cell surface to begin a new cycle. Alternatively, PrPC could facilitate cellular efflux of copper via the secretory pathway by binding copper ions in the Golgi compartments.

2.5.2.2 PrPC Buffers Copper from the Synapse

As PrPC is concentrated at the synapse both presynaptically and postsynaptically, copper binding may have an anti-oxidant effect that is important for synaptic (reviewed by Brown, 2001). At the cellular level, PrP-deficient cells are more susceptible to oxidative damage and toxicity, and show increased sensitivity to various kinds of stresses, implying a protective role of PrPC. The synaptic release of copper may increase its local concentration to up to 250 µM. This copper is usually bound to peptides or amino acids and must be taken up rapidly by the neurones. Excess copper can catalyse interconversion of various reactive oxygen species, or even generate hydroxyl radicals from water. Sequestering it from the synapse is therefore important to protect the cell from oxidative damage. PrPC-deficient cells do take up copper, but to a lesser extent than PrPC-containing cells.

Brown et al. (2001) showed that the protection of cells against oxidative stress by PrPC is proportional to the amount of copper it binds. Both purified PrPC and recombinant PrP exhibited superoxide-dismutase-like activity in a formazan formation assay. The SOD-like activity increased with the number of copper ions incorporated, and it depended on the copper concentration. This suggested that copper binding facilitates changes in the secondary structure of the protein. The SOD-like activity was inhibited when PrPs were incubated with the PrP106-126 peptide (Chapter 1.2.5). Increased resistance to oxidative stress was also shown for cells grown in excess copper, but not when PrPC was stripped away using PI-PLC. Expression of PrPC with bound copper boosted cellular resistance to oxidative stress.

Cui et al. (2003) investigated which regions of prion protein are required for the SOD-like activity. The repeats and hydrophobic region (Chapter 2.1) are indispensable for this activity, and the C-terminus is also important.

Several studies argue against a copper-transporting role for PrPC; for example, the in vitro study of Rachidi et al. (2003) indicated that the PrPC was not involved in delivery of copper at physiological concentrations (1.6 µM).

2.5.2.3 PrPC Contributes to Redox Signalling

Another suggestion is that PrPC could be a copper-sensitive stress-sensor, which is able to initiate signal transduction cascades. After sensing stimuli such as copper and/or free radicals, PrPC could trigger intracellular calcium signals that contribute to modulation of synaptic transmission and maintenance of neuronal integrity (reviewed by Vassallo and Herms, 2003). PrPC may efficiently buffer copper at the synapse in order to maintain copper concentrations in the presynaptic cytosol and protect from oxidative insult. These complementary activities should also contribute to the preservation of neuronal electrophysiology. Copper may be transported back to the cell by other transporters present on the outer side of the cell membrane.

Some features of PrPC, like its neuroprotective effect against oxidative stress, suggest that it is involved in free radical pathways, as these overlap with systems controlling homeostasis of redox-active metals such as copper. One scenario is that PrPC acts as a modulator of calcium flux in response to copper because copper enables redox signalling and triggers responses. Thus copper will bind to PrPC after its concentration increases, enabling it to participate in the redox reactions (such as SOD-like activity), in turn triggering membrane and activating Ca2+-mediated signalling cascades. Therefore PrPC may act as a sensor for strong copper/reactive oxygen species (ROS) stimuli and by generating a signal through redox chemistry it may turn on Ca2+ - mediated signalling.

2.5.2.4 PrPC has Neuroprotective Role

Several lines of evidence implicate a role of PrPC in prevention of apoptotic cell death.

Using the yeast two-hybrid system, Kurschner and Morgan (1995) identified Bcl-2 as a binding partner of PrPC. Bcl-2 specifically suppresses apoptosis in a number of cell types and it can bind proteins from the same and from other protein families. A comprising the C-terminal 183 amino acids of mouse PrPC (residues 72-254) interacted with the Bcl-2 region that contains the BH2 domain (residues 174-236). By this association, PrPC could sequester Bcl-2 from its intracellular organelle pools, and the depletion of Bcl-2 pools during prion disease and accumulation of PrPSc may contribute to apoptosis.

Kuwahara et al. (1999) established hippocampal cell lines from Prnp0/0 and Prnp+/+ mice. A stress insult (serum removal from the cell culture) caused apoptosis in the Prnp0/0 cells but not in the Prnp+/+ cells. Transduction of the Prnp0/0 cells with either PrPC- or Bcl-2-coding constructs prevented apoptosis of the cells under the serum-free conditions. Prnp0/0 cells had shorter neurites than Prnp+/+ cells, but this was also abrogated by the expression of PrPC in Prnp0/0 cells. This study strongly indicated the involvement of PrPC in prevention of cell death.

Human PrPC protected neurons against apoptosis mediated by the Bax protein (Bounhar et al., 2001). Inhibition of apoptosis depended on the proximal octarepeats but not on the GPI-anchor. Bax is not pro-apoptotic unless it is induced by insult or overexpression. However, overexpression of both Bax and PrPC prevented apoptosis in the human primary neurons. Conversely, an antisense PrPC cDNA potentiated the effect of Bax overexpression. Trafficking of PrPC past the cis-Golgi was required for neuroprotection. The PrP mutations D178N (FFI) and T183A abolished the protective effect of PrPC. Thus, PrPC could be a strong natural neuroprotector.

Activation of PrPC in vitro induced neuroprotection (Chiarini et al., 2002). An immunogenic PrPC-binding peptide (PrR; Martins et al., 1997) that binds the mouse PrPC between residues 113-128 activated the cAMP/protein kinase A (PKA) and ERK pathways, partially preventing apoptosis in retinal explants from neonatal rats or neonatal mice, but not from Prnp0/0 mice. Incubation of cells with PrR increased the intracellular levels of cAMP, activity of PKA and activation of ERKs. Addition of the PrP106-126 peptide disrupted interactions between PrPC and PrR, blocking the neuroprotective effect. Inhibitors of PKA, but not of ERKs, blocked neuroprotection, suggesting involvement of a cAMP/PKA-dependent pathway in the PrPC-mediated neuroprotection. Further, antibodies to PrPC that increased cAMP also increased the neuroprotective effect, indicating that the activation of PrPC transduces neuroprotective signals through a cAMP/PKA-dependent pathway and affects sensitivity to induced apoptosis.

A cDNA microarray analysis was used to determine which genes are over- or under- expressed in a human breast cancer cell line resistant to the cytotoxic action of tumor necrosis factor α (TNF) (Diarra-Mehrpour et al., 2004). Seventeen-fold overexpression of PRNP mRNA and also overexpression of PrPC was found in a TNF-resistant clone. Furthermore, overexpression of PrPC was able to convert TNF-sensitive cells into TNF- resistant cells. The protective effect of PrPC on tumor cells could be a consequence of its interaction with 2 and activation of the PI3K/Akt pathway.

2.5.2.5 PrPC Mediates Intercellular Contacts

PrPC binds molecules in the extracellular matrix and on the cell membrane that mediate cell-cell interactions.

The 37-kDa laminin receptor precursor (LRP) was identified as an interacting partner of PrPC using the yeast two-hybrid system in S. cerevisiae (Rieger et al., 1997). PrPC binds the same domain of LRP (residues 161-180) as does laminin. Laminin is a glycoprotein involved in cell attachment, differentiation, movement and growth. The interaction between PrPC and LRP was confirmed by re-transformation and by co-transfection in the insect (Sf9) and mammalian (COS-7) cells. The LRP level was higher in scrapie-infected N2a cells and in brains of scrapie-infected mice and hamsters. The LRP, located on the cell surface, binds elastin and laminin and mediates their action. The two extracellular proteins, PrPC and LRP, may interact on the cell surface.

Interaction between laminin and PrPC was also shown (Graner et al., 2000) by the specific and saturable fashion in which PrPC bound laminin. In brain, laminin promotes neuronal differentiation, migration of neurons and neuronal regeneration, and also acts anti-apoptotically. These effects are mediated by the cell membrane receptors because laminins are major components of the extracellular matrix. For example, the interaction between laminin and amyloid precursor protein promotes neurite outgrowth. The PrPC-laminin interaction was also involved in neuritogenesis induced by NGF and laminin in PC-12 cells, suggesting a role for PrPC in neuronal plasticity. Supporting this hypothesis are the observations that anti-PrPC antibodies inhibited neuritogenesis and that NGF treatment of the PC-12 cells increased PrPC expression by 25%. Laminin is a large (800 kDa), heterotrimeric molecule with many known isoforms. PrPC bound preferentially to the well-conserved γ-1 chain C-terminal domain of laminin that stimulates neurite outgrowth. Neuritogenesis stimulated by the γ-1 chain was abrogated in the Prnp0/0 cells. PrPC may therefore act as a laminin receptor.

In the caveolae-like membrane microdomains (“rafts”), PrPC was identified as a part of protein complexes together with three splice variants of the neural cell-adhesion molecule (N-CAM) (Schmitt-Ulms et al., 2001). The N-CAMs belong to the immunoglobulin superfamily and they mediate cell-cell interaction by triggering cytosolic signals. The PrPC-N-CAM interaction occurred through amino-acid side chains. The interacting face of PrPC, comprising its N-terminal part, the first helix and the adjacent loop, bound the β-strands C and C’ within two adjacent N-CAM fibronectin type III modules. The partners may associate early during their joint passage in the secretory pathway. Knock-out mice lacking N-CAM were susceptible to prions, indicating that N-CAM is not the protein X (Chapter 1.2.4). However, the PrPC/N-CAM association may be involved as an alternative signalling route from PrPC to Fyn tyrosine kinase (see below).

In order to identify proteins that reside near PrPC in the cell, Schmitt-Ulms et al. (2004) used time-controlled transcardiac perfusion cross-linking (tcTPC), a method that combines transcardiac perfusion and mild formaldehyde cross-linking. More than 20 proteins were identified; most of these were either integral membrane proteins or proteins that reside near the cell membrane. Some of the proteins were components of the secretory pathway. Of twenty proteins, six are GPI-anchored proteins (PrPC, N-CAM 1, N-CAM 2, myelin-associated glycoprotein, contactin-1, limbic system-associated membrane protein), and two were previously identified partners of PrPC, chaperone BiP (Chapter 1.4.1), and APP-like proteins. Most of these twenty proteins are involved in cell adhesion and neuritic outgrowth. Although it is possible that not all identified proteins are genuine interacting partners of PrPC, this analysis confirmed that PrPC is embedded within the specialized membrane microdomains (“rafts”) together with a defined subset of other GPI-attached molecules.

Morel et al. (2004) showed co-localization of PrPC and Src kinase at the junctional complexes on the lateral membrane of enterocytes. A pool of Src also co-precipitated with anti-PrPC antibodies and vice versa. Thus, PrPC could play a role in intercellular signalling and/or sensing of neighboring cells, through an interaction with Src kinases (Fyn tyrosine kinase is a member of the Src family; see below).

2.5.2.6 PrPC is Involved in Lymphocyte Activation

Evidence that PrPC is involved in the activation of T cells includes the observation that PrPC is expressed at high levels in T cells, B cells, and dendritic cells (Li et al., 2001). The composition of N-linked glycans on PrPC from these cells is different from those on PrPC from brain or neuroblastoma cells. The level of PrPC expressed on the surface of T cells increased as a consequence of cellular activation (Chapter 2.3). The memory T cells express more PrPC than naïve T cells. Anti-PrPC antibodies inhibited the proliferation of T cells in vitro. Thus PrPC may be involved in the activation of T cells.

There is a strict association between PrPC and Fyn in the lymphoblastoid T cells (Mattei et al., 2004). PrPC clustered within the glycophospholipid-enriched membrane microdomains (GEMs) where it strongly interacted with the GM1 and GM3 gangliosides. GM3 is the main constituent of GEMs, where it modulates signal transduction. The phosphorylation protein ZAP-70 was also found to interact with PrPC after T cell activation mediated by CD28 and CD3. ZAP-70 has a key role in the GEM-associated signalling pathways leading to T cell activation. PrPC could be a component of the signalling complex leading to T cell activation.

Finally, after hypothermal stimulation of the human lymphocyte cell line Jurkat E6.1, PrPC co-localized with the CD3 and GM1 in the lipid rafts (Wurm et al., 2004).

Thus, PrPC could be involved in activation of T cells.

2.5.2.7 PrPC Participates in Nucleic Acid Metabolism

PrPC has nucleocapsid protein-like properties (Gabus et al., 2001). Human PrPC mimicked the chaperone properties of the HIV Ncp7 nucleocapsid protein by actively assisting the annealing of complementary nucleic acid strands, viral RNA dimerization, hybridisation of replication primer tRNALys to the HIV-1 5’-primer binding site sequence and initiation of reverse transcription by reverse transcriptase. The transmembrane or the cytoplasmic PrP entities (Chapter 2.4) could interact with cellular and/or viral nucleic acids.

2.5.2.8 PrPs are Memory Molecules

Alternative PrP conformations other than PrPSc could exist (Tompa and Friedrich, 1998). The self-sustaining autocatalytic propagation of these states may determine the normal PrP function. A kinetic model was proposed in which PrP forms a bi-stable molecular switch that can structurally encode and stably store information. Such a mechanism could control a range of physiological processes, including the formation of memory. The mechanism for long-term synaptic stabilization mediated by the neuronal isoform of CPEB from sea hare shows similarities with this model (Chapter 1.5.2).

2.5.2.9 PrPC is Signal Transduction Protein

The final hypothesis I will discuss is that PrPC could be a signal transduction protein.

This is supported by findings of interactions between PrPC and proteins involved in signal transduction. Antibody cross-linking of PrPC in the mouse 1C11 neuronal cells triggers activation of the Fyn tyrosine kinase (Mouillet-Richard et al., 2000). In the mouse hippocampus, Fyn contributes to the molecular mechanisms for induction of long-term potentiation (a long-lasting enhancement of synaptic transmission thought to be the cellular basis for learning and memory) (Kojima et al., 1997). The 1C11 cell line is a neuroectodermal progenitor that, depending on the inducers, differentiates into either 1C11*/5-HT serotonergic cells or 1C11**/NE noradrenergic cells. PrPC is expressed in both progenitor and differentiated cells. PrPC cross-linking did not trigger a response in the progenitor cells. However, dephosphorylation of the Fyn tyrosine kinase and an increase of its kinase activity were found 10 min after ligation of the anti-PrPC antibodies 1A8 and SAF61 in the differentiated cells. Progenitor and differentiated cells have similar amounts of PrPC, but the signalling competence involving PrPC depended on the differentiation and full acquisition of neuron-associated functions. In the differentiated cells, PrPC co-immunoprecipitated with caveolin-1. Antibodies against caveolin-1 inhibited the PrPC-mediated activation of Fyn, indicating involvement of caveolin-1 in the coupling of PrPC with Fyn. The physiological extracellular signal leading to the activation of PrPC is unknown. Although PrPC was abundant in both cell bodies and neurite extensions, the neuritic PrPC was mostly due to Fyn activation. Thus, PrPC may be involved in modulation of neuronal functions.

The recombinant bovine PrP (residues 25-242) interacts with the catalytic α/α’ subunits of protein kinase CK2 (Meggio et al., 2000), a pleiotropic protein kinase that is abundant in brain. CK2 phosphorylates more than 200 substrates, most of which are involved in signal transduction and gene expression. The association between CK2 and PrP induced CK2 phosphotransferase activity. Both N-terminal and C-terminal parts of recombinant PrP were involved in this activation, but the N-terminus was more important for activation. The CK2 is extracellular and could contact PrPC on the outer side of the cell membrane leading to stabilization of the active conformation of CK2.

Recombinant mouse PrP (residues 23-231) was used as a bait to screen a mouse brain cDNA expression library in the yeast two-hybrid system (Spielhaupter and Schätzl, 2001), leading to identification of the neuronal phosphoprotein synapsin Ib, the adaptor protein Grb2 and the uncharacterized prion interactor Pint1 as potential partners of PrPC. These interactions were confirmed by co-immunoprecipitation assays. Synapsin Ib and Grb2 interacted with both the N- and C-terminal parts of PrP, but Pint1 interacted with the C-terminal part only. PrPC co-fractionated with synapsin Ib and Grb2 in microsomal preparations, indicating that these proteins interact in the intracellular, presumably Golgi, vesicles. Pint1 is a newly discovered protein, with homologues in human and C. elegans. Synapsins reversibly attach synaptic vesicles to the cytoskeleton and regulate their release, so the interaction between synapsin Ib and PrPC may contribute to the regulation of cell-cell contact and extracellular signalling. Grb2 is an adaptor involved in intracellular signal transduction, which links signals coming from extracellular proteins to their intracellular effectors. Interactions between PrPC and these proteins involved in signal transduction suggest a role for PrPC in signal transduction.

When PrPC was stimulated with various anti-PrPC antibodies in the 1C11 progenitor and differentiated cells, neurohypothalamic GT1-7 cells and T lymphoid BW5147 cells (Schneider et al., 2004), it triggered production of NADPH oxidase-dependent reactive oxygen species (ROS), and phosphorylation of the extracellular regulated kinases 1 and 2 (ERK1 and ERK2), two mitogen-activated protein kinases (MAPKs). PrPC activation led to phosphorylation of the p47PHOX subunit of NADPH oxidase, a substrate of protein kinase C. Inhibition of NADPH oxidase with diphenyleneiodonium (DPI) abolished ROS production following PrPC activation, indicating involvement of NADPH oxidase ROS production in the PrPC-mediated signalling. ROS act as chemical mediators in many signalling processes such as regulation of transcription factors and activation of kinases, including the MAPK family. After PrPC activation the ERKs, but not the other MAPKs, c-Jun NH2-terminal kinase or p38MAPK, were phosphorylated (activated). In the neuronal context, ERKs are modulators of long-term synaptic facilitation: they activate the CREB-1-mediated gene transcription (Martin et al., 1997; Si et al., 2003a; Chapter 6.5). Phosphorylation of ERKs is regulated by ROS production in the 1C11 progenitor, GT1-7 and BW5147 cells, although the GT1-7 and BW5147 cells lack caveolin-1. In the differentiated 1C11 cells, but not in the other cells, both ROS production and phosphorylation of ERKs were specifically controlled by the activation of Fyn tyrosine kinase. Thus, PrPC contributes to signalling networks in neuronal, neuroendocrine and lymphoid cells (Figure 2.4).

Using Affymetrix oligonucleotide microarrays, Mody et al. (2001) analysed patterns of gene expression in the developing mouse hippocampus. Of 11000 genes, 1926 showed dynamic changes across the five timepoints denoting major developmental events. These were the embryonic day 16 (E16) corresponding to the proliferation of neurons, and the postnatal days 1, 7, 16 and 30 (P1, P7, P16, P30) corresponding to the outgrowth and differentiation of neurons (P1, P7), formation of synapses (P16) and maturation of synaptic function (P30). Genes showed 16 different expression patterns (c0 - c15) of four major types: type I showing overall age-dependent down-regulation (c0, c1, c5), type II showing general age-dependent up-regulation to peak levels at P16 or P30 (c10, c11, c14, c15), type III showing peak expression at either P1 or P7 (c4, c8, c9, c12, c13) and type IV showing minimal expression at either P1 or P7 (c2, c3, c6). This clustering correlated with the major developmental changes. For instance, the c1 genes highly expressed at E16 were switched off after birth. The Prnp gene belonged to the type II c15 cluster, showing the highest expression at P30, when the hippocampal synapses become more active and begin to exhibit increased synaptic plasticity. The other genes that shared the expression profile with Prnp were related to the maturation of synaptic function, including the genes involved in synaptic function, signal transduction, control of transcription and translation, glucose and oxidative metabolism and membrane regulation of ionic concentration. Of particular note here is that the genes encoding PKC subunit βII and MEK protein kinase, which are involved in the PrPC-induced signalling (Figure 2.4), clustered together with Prnp within c15. This clustering of the Prnp gene and genes involved in its signalling pathway with genes contributing to mature synaptic function indicates the involvement of PRNP in synaptic plasticity (Chapter 6).

[Figure 2.4 diagram, two panels. Left panel (1C11, GT1-7, BW5147): PrP, ?, PKC, NOx, MEK, ERK. Right panel (1C11(5-HT), 1C11(NE)): PrP (Signal(s) ?), Cav, Fyn, PKC, Shc, Grb2, Ras, Raf, NOx, MEK, ERK.]

Figure 2.4: Model of the proposed PrPC-associated signalling pathways. 1C11, progenitor neuroectodermal cells; GT1-7, hypothalamic cell line; BW5147, T lymphocyte cell line; 1C11(5-HT), differentiated serotonergic cells; 1C11(NE), differentiated noradrenergic cells; PKC, protein kinase C; NOx, NADPH oxidase; MEK, MEK1 and MEK2 kinases; ERK, ERK1 and ERK2 kinases; Cav, caveolin-1; Fyn, Fyn tyrosine kinase; Shc Grb2 Ras Raf, Shc-Grb2/SOS-Ras-Raf signalling cascade (modified from Schneider et al., 2004).

Comparative genomics is a strategy to understand gene function. I used this approach to analyse the elusive function of the PRNP gene in Chapter 6. My analysis best supports the signal transduction hypothesis.

2.6 Genomes: Digging Out the Gems

A major impetus for sequencing of the human genome was its potential for discovery of new human genes related to known disease-associated genes. Study of such genes may shed light on the function of their disease-causing counterparts, reveal the basis for related diseases, uncover potential drug targets and provide new insights into disease pathogenesis mechanisms. Further, genomic sequence allows rapid discovery of paralogues of the classic drug target proteins in silico. There are also numerous similar applications to basic physiology and cell biology.

As well as the human genome, 167 genomes have been completely sequenced to date, including only five other vertebrates: mouse, rat, Fugu, chicken and chimpanzee (Genome News Network; Table 2.1). By 19 August, more than 30 genomes had been sequenced this year (Genome News Network). Genomic sequences are deposited in public biological databases, and comparison of genomes is a strategy to discover new genes, define gene regulatory elements and understand genome evolution and gene function.

2.6.1 The Human Genome

The human genome provides evidence of our evolutionary history (Lander et al., 2001). Clues about human development, physiology and evolution are all encrypted within the 2.9 Gb of DNA. Basic features of the broad genome landscape are the gene content, distribution of GC content and CpG islands, distribution of repeats and recombination rate.


Table 2.1: 167 sequenced genomes (Genome News Network, 24 August 2004)

Aeropyrum pernix; Agrobacterium tumefaciens; Anabaena; Anopheles gambiae; Apis mellifera; Aquifex aeolicus; Arabidopsis thaliana; Archaeoglobus fulgidus; Ashbya gossypii; Bacillus anthracis; Bacillus cereus; Bacillus halodurans; Bacillus subtilis; Bacteroides thetaiotaomicron; Bartonella henselae; Bartonella quintana; Bdellovibrio bacteriovorus; Bifidobacterium longum; Blochmannia floridanus; Bordetella bronchiseptica; Bordetella parapertussis; Bordetella pertussis; Borrelia burgdorferi; Bradyrhizobium japonicum; Brucella melitensis; Brucella suis; Buchnera aphidicola; Caenorhabditis briggsae; Caenorhabditis elegans; Campylobacter jejuni; Candida glabrata; Caulobacter crescentus; Chlamydia muridarum; Chlamydia trachomatis; Chlamydophila caviae; Chlamydophila pneumoniae; Chlorobium tepidum; Chromobacterium violaceum; Ciona intestinalis; Clostridium acetobutylicum; Clostridium perfringens; Clostridium tetani; Corynebacterium diphtheriae; Corynebacterium efficiens; Coxiella burnetii; Cyanidioschyzon merolae; Debaryomyces hansenii; Deinococcus radiodurans; Desulfovibrio vulgaris; Drosophila melanogaster; Encephalitozoon cuniculi; Enterococcus faecalis; Erwinia carotovora; Escherichia coli; Fugu rubripes*; Fusobacterium nucleatum; Gallus gallus*; Geobacter sulfurreducens; Gloeobacter violaceus; Guillardia theta; Haemophilus ducreyi; Haemophilus influenzae; Halobacterium; Helicobacter hepaticus; Helicobacter pylori; Homo sapiens*; Kluyveromyces waltii; Lactobacillus johnsonii; Lactobacillus plantarum; Lactococcus lactis; Leptospira interrogans; Listeria innocua; Listeria monocytogenes; Magnaporthe grisea; Mesorhizobium loti; Methanobacterium thermoautotrophicum; Methanococcoides burtonii; Methanococcus jannaschii; Methanococcus maripaludis; Methanogenium frigidum; Methanopyrus kandleri; Methanosarcina acetivorans; Methanosarcina mazei; Mus musculus*; Mycobacterium bovis; Mycobacterium leprae; Mycobacterium paratuberculosis; Mycobacterium tuberculosis; Mycoplasma gallisepticum; Mycoplasma genitalium; Mycoplasma mycoides; Mycoplasma penetrans; Mycoplasma pneumoniae; Mycoplasma pulmonis; Mycoplasma mobile; Nanoarchaeum equitans; Neisseria meningitidis; Neurospora crassa; Nitrosomonas europaea; Oceanobacillus iheyensis; Onion yellows phytoplasma; Oryza sativa; Pan troglodytes*; Pasteurella multocida; Phanerochaete chrysosporium; Photorhabdus luminescens; Picrophilus torridus; Plasmodium falciparum; Plasmodium yoelii yoelii; Porphyromonas gingivalis; Prochlorococcus marinus; Protochlamydia amoebophila; Pseudomonas aeruginosa; Pseudomonas putida; Pseudomonas syringae; Pyrobaculum aerophilum; Pyrococcus abyssi; Pyrococcus furiosus; Pyrococcus horikoshii; Pyrolobus fumarii; Ralstonia solanacearum; Rattus norvegicus*; Rhodopirellula baltica; Rhodopseudomonas palustris; Rickettsia conorii; Rickettsia prowazekii; Rickettsia sibirica; Saccharomyces cerevisiae; Saccharopolyspora erythraea; Salmonella enterica; Salmonella typhimurium; Schizosaccharomyces pombe; Shewanella oneidensis; Shigella flexneri; Sinorhizobium meliloti; Staphylococcus aureus; Staphylococcus epidermidis; Streptococcus agalactiae; Streptococcus mutans; Streptococcus pneumoniae; Streptococcus pyogenes; Streptomyces avermitilis; Streptomyces coelicolor; Sulfolobus solfataricus; Sulfolobus tokodaii; Synechococcus; Synechocystis; Thermoanaerobacter tengcongensis; Thermoplasma acidophilum; Thermoplasma volcanium; Thermosynechococcus elongatus; Thermotoga maritima; Thermus thermophilus; Treponema denticola; Treponema pallidum; Tropheryma whipplei; Ureaplasma urealyticum; Vibrio cholerae; Vibrio parahaemolyticus; Vibrio vulnificus; Wigglesworthia glossinidia; Wolbachia pipientis; Wolinella succinogenes; Xanthomonas axonopodis; Xanthomonas campestris; Xylella fastidiosa; Yarrowia lipolytica; Yersinia pestis.

Asterisks mark the six vertebrate genomes.

2.6.1.1 Gene and Protein Content

The early estimates of the gene number varied between 30000 and 100000 genes. Yet an average human gene is complex. The mean exon number per gene is 8.8, and the mean size of internal exons is 145 bp. An average gene extends across 27 kb. The mean size of introns is 3365 bp, and mean sizes of 3’UTR and 5’UTR are 770 and 300 bp respectively. The mean size of coding sequence is 1340 bp, translating into a protein of 447 amino acids. It was estimated that approximately 35% of human genes are alternatively spliced, and there are on average 3 distinct transcripts per gene. The gene density ranges from 6.4 genes/Mb (chromosome Y) to 26.8 genes/Mb (chromosome 19).

Protein-coding genes in the human genome were predicted from three lines of evidence: direct evidence of transcription (mRNA, EST), indirect evidence (homology to previously identified genes and proteins) and ab initio prediction using software that recognizes the functional signals in genes. The ab initio gene prediction methods correctly predict about 70% of individual exons and 20% of individual genes in human. As a first step, the gene prediction strategy used the Ensembl prediction system, which starts with ab initio prediction (Genscan program) and confirms gene predictions by assessing similarity with known proteins, mRNAs, ESTs and protein motifs from any organism. The protein matches were then extended using the GeneWise program. This system yielded 35500 gene and 44860 transcript predictions. Frequent mistakes with this system are fragmentation, merging and overlapping of genes.

In the second step, the Genie program predictions were combined with the Ensembl gene predictions. Genie starts with the mRNA and EST matches, and then employs the Hidden Markov Model statistics for ab initio prediction to extend these matches in both 3’ and 5’ directions. This strategy yielded fewer fragmented genes than the Ensembl system, merging 15437 Ensembl gene predictions into 9526 clusters.

In the final step, known genes from the RefSeq, SWISSPROT and TrEMBL databases were incorporated into the results, producing a final estimate of 31000 coding genes in the human genome, only twice as many as in worm or fly. This includes about 15000 known genes, and about 17000 gene predictions, which are a collection of anonymous genes and a fantastic resource for targeted gene discovery (Chapter 4). This estimate leads to calculations that, on average, 1.5% of the human genome is coding sequence. There are also several thousand non-coding genes in the human genome (tRNAs, rRNAs, spliceosomal RNAs, telomeric RNAs, snoRNAs, microRNAs, siRNAs, and other non-coding genes of unknown function). Overall, 30% of the genome would be transcribed.
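The coding fraction can be checked roughly from the figures quoted in this section. The sketch below is a back-of-envelope calculation only: it assumes that every one of the 31000 genes has the mean coding sequence of 1340 bp and that codons are counted without special treatment of the stop codon. The function names are illustrative.

```python
# Figures quoted in the text (Lander et al., 2001).
GENOME_SIZE_BP = 2.9e9   # human genome size
GENE_COUNT = 31_000      # estimated protein-coding genes
MEAN_CDS_BP = 1340       # mean coding sequence length


def protein_length(cds_bp: float) -> int:
    """Approximate number of amino acids encoded by a coding sequence.

    Assumption: three nucleotides per residue, rounded to the nearest
    whole residue; the stop codon is not treated separately.
    """
    return round(cds_bp / 3)


def coding_fraction(gene_count: int, mean_cds_bp: float, genome_bp: float) -> float:
    """Fraction of the genome occupied by coding sequence.

    Assumption: every gene contributes exactly the mean CDS length.
    """
    return gene_count * mean_cds_bp / genome_bp


if __name__ == "__main__":
    print(protein_length(MEAN_CDS_BP))
    print(f"{coding_fraction(GENE_COUNT, MEAN_CDS_BP, GENOME_SIZE_BP):.2%}")
```

Under these assumptions the mean protein length comes to 447 residues and the coding fraction to roughly 1.4%, in line with the 1.5% quoted above.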

The full set of known human proteins is more complex than those in invertebrates due to the presence of vertebrate-specific protein domains and motifs: 7% of the InterPro families are vertebrate-specific, representing 70 protein families and 24 domain families. Vertebrates have arranged pre-existing protein components into a richer collection of domain architectures. Specifically, the human genome contains more genes, domains, protein families, paralogues, multidomain proteins with multiple functions and domain architectures, in comparison with worm and fly.

2.6.1.2 GC Content and CpG Islands

There are GC-rich and GC-poor regions in the human genome. The genome-wide GC content average is 41%, ranging from 36-47.1% on a large scale (>10 Mb), and from 33.1-59% on a smaller scale. There is a strong positive correlation between the GC content and gene density. The human genome contains 28890 CpG islands, which are short genomic regions (<85 bp) with high GC content (>75%) that are associated with 5’ ends of genes.
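As an illustration of how GC content and CpG enrichment are measured on a sequence, the following minimal sketch computes both for a DNA string. The candidate test uses the widely cited Gardiner-Garden and Frommer thresholds (length of at least 200 bp, GC content of at least 50%, observed/expected CpG ratio of at least 0.6) as default parameters; these thresholds are an assumption introduced here for illustration and are not the criteria stated in this chapter. All function names are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C nucleotides in a sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def is_cpg_island_candidate(seq: str, min_len: int = 200,
                            min_gc: float = 0.5,
                            min_ratio: float = 0.6) -> bool:
    """Apply length, GC-content and CpG-ratio thresholds to one window.

    The default thresholds are illustrative assumptions, not values
    taken from the text.
    """
    return (len(seq) >= min_len
            and gc_content(seq) >= min_gc
            and cpg_obs_exp(seq) >= min_ratio)


if __name__ == "__main__":
    print(gc_content("ATGC"))
    print(is_cpg_island_candidate("CG" * 100))
    print(is_cpg_island_candidate("AT" * 100))
```

A genome-wide scan would slide such a window along each chromosome; that step is omitted here.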

2.6.1.3 Repeat Content

Repeat sequences account for more than 50% of the human genome. The repeats are evidence of evolutionary events and forces that have shaped the genome. As passive entities, they represent markers for studies of mutation and selection. As active entities, they have reshaped the genome by causing rearrangements, forming new genes, reshuffling existing genes and modulating GC content.

Transposable elements comprise 45% of the genome. The currently recognized long interspersed elements (LINEs), short interspersed elements (SINEs), long terminal repeat (LTR) retrotransposons and DNA transposons comprise 20%, 13%, 8% and 3%, respectively, of the human genome. Overall activity of these transposons has declined over the past 35-50 million years, with the possible exception of the 61 LINEs with intact ORFs. There is a remarkable variation in the repeat content across the genome, ranging from less than 2% across the four 100 kb homeobox gene clusters to 89% across 525 kb of the X chromosome in region Xp11. The absence of repeats in a genomic region may indicate many cis-regulatory elements that cannot be interrupted by insertions.

Simple sequence repeats (SSRs) are perfect or imperfect tandem repeats of a particular k-mer. Microsatellites have a short k (1-13 bp) and minisatellites have a longer k (14-500 bp). Simple sequence repeats arise by DNA polymerase slippage and comprise 3% of the genome, with a frequency of one SSR per 2 kb.
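The definition above can be illustrated with a small regular-expression scan for perfect tandem repeats. This is a minimal sketch only: it finds perfect repeats (imperfect repeats are ignored), the minimum copy number of three is an arbitrary assumption, and the function names are hypothetical. The unit-length boundaries (1-13 bp and 14-500 bp) are those given in the text.

```python
import re
from typing import List, Tuple


def find_perfect_ssrs(seq: str, min_unit: int = 1, max_unit: int = 13,
                      min_copies: int = 3) -> List[Tuple[int, str, int]]:
    """Return (start, unit, copies) for perfect tandem repeats.

    The lazy quantifier makes the regex prefer the shortest repeat unit
    at each position, so "AAAAAA" is reported as 6 x "A", not 3 x "AA".
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


def classify_ssr(unit_length: int) -> str:
    """Classify a repeat by unit length, using the ranges given in the text."""
    if 1 <= unit_length <= 13:
        return "microsatellite"
    if 14 <= unit_length <= 500:
        return "minisatellite"
    return "other"


if __name__ == "__main__":
    for start, unit, copies in find_perfect_ssrs("GGACACACACTT"):
        print(start, unit, copies, classify_ssr(len(unit)))
```

For long sequences or long repeat units a dedicated tandem-repeat finder would be more appropriate than backtracking regular expressions.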

Segmental duplications of parts of the genomic sequence (1-200 kb) occur as interchromosomal duplications, when segments are distributed to nonhomologous chromosomes, and as intrachromosomal duplications, when duplications occur within a particular chromosome. These regions comprise 3.3% of the genome. Chromosomal regions near centromeres and telomeres consist almost entirely of interchromosomal duplicated segments.

2.6.1.4 Recombination Rate

The overall occurrence of single nucleotide polymorphisms (SNPs) is roughly 1 in 1900 bp. Recombination rate varies across the genome. In general, recombination rate is higher in the distal regions of chromosomes (20 Mb from the telomere) and on the shorter chromosome arms, promoting at least one crossover per chromosome arm per meiosis. Recombination is suppressed near the centromeres.

2.6.1.5 Quality Assessment of the Human Genome Sequence

World standards for the human genome sequence fidelity state that there should be less than one error per 10000 DNA bases (99.99% accuracy), and that the sequence should be without gaps. Schmutz et al. (2004) performed a detailed evaluation of a sample of 34 Mb of the human DNA reference sequence. Accuracy of the sequence was above 99.99%, with the overall error rate 1/73369. There was 1 significant error (a single error that causes 50 contiguous base pairs to be incorrect) in 2630005 base pairs.
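The relationship between an error rate expressed as "one error per N bases" and a percentage accuracy is simple arithmetic, shown in the sketch below for the two figures quoted above. The function name is illustrative.

```python
def accuracy_percent(bases_per_error: float) -> float:
    """Convert 'one error per N bases' into percentage accuracy."""
    return 100.0 * (1.0 - 1.0 / bases_per_error)


if __name__ == "__main__":
    # Quality standard: fewer than one error per 10000 bases.
    print(f"standard: {accuracy_percent(10_000):.4f}%")
    # Error rate measured by Schmutz et al. (2004): 1/73369.
    print(f"measured: {accuracy_percent(73_369):.4f}%")
```

On this scale the measured error rate corresponds to an accuracy above the 99.99% standard, as stated in the text.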

2.6.2 The Mouse Genome

Mouse is a key experimental tool for biomedical research (Waterston et al., 2002). The mouse genome is also important for comparative genomics, since roughly 75 million years of independent evolution separate the human and mouse genomes, which now differ by nearly one substitution per two nucleotides.

The mouse genome (2.6 Gb) is 14% smaller than the human genome (Table 2.2), due to a higher deletion rate in mouse. The mouse has a higher overall GC content (42%) and a tighter GC distribution. There are fewer CpG islands in the mouse genome (15500) than in human (28890).

Only 37.5% of the mouse genome can be recognized as transposon-derived, compared with 45% of the human genome. This is due to a higher nucleotide substitution rate that makes ancient repeat sequences difficult to recognize. The neutral substitution rate in mouse (4.5 x 10^-9 per year) is twice that of human (2.2 x 10^-9 per year), perhaps determined by population size, body size or generation time. The depth of the human repeat analysis (150-200 million years), therefore, is better than that of the mouse repeat analysis (100-120 million years). Lineage-specific repeats account for 32.4% of the mouse genome compared with 24.4% in human. The rate of transposition is constant in mouse although it has declined in humans. There are 3000 individual LINEs, four SINE lineages and three LTR lineages that are potentially active in the mouse genome. LINEs are biased toward AT-rich, and SINEs toward GC-rich, genome regions.

Table 2.2: Vertebrate genomes in numbers

                    Human             Mouse             Dog                 Rat               Fugu
Genome size         2.9 Gb            2.6 Gb            2.4 Gb              2.7 Gb            365 Mb
Gene number         31000             31000             NA                  31000             39000
Gene density        6.4-26.8/Mb       NA                NA                  NA                1/10.9 kb
Average gene size   27 kb             NA                NA                  NA                NA
GC content          41%               42%               NA                  43%               44.1-53.5%
Transposons         45%               37.5%             31%                 40%               2.7%
Substitution rate   2.2 x 10^-9/year  4.5 x 10^-9/year  (2.2 x 10^-9/year)  4.9 x 10^-9/year  NA
SNP frequency       1/1900 bp         1/600 bp          1/1500 bp           NA                NA
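The neutral substitution rates quoted for human and mouse can be related to the overall human-mouse divergence with a simple calculation. The sketch below is a naive estimate: it assumes that each lineage evolved at its present-day rate for the full 75 million years since the split and applies no correction for multiple substitutions at the same site. The function name is illustrative.

```python
HUMAN_RATE = 2.2e-9    # neutral substitutions per site per year (from the text)
MOUSE_RATE = 4.5e-9    # neutral substitutions per site per year (from the text)
DIVERGENCE_YEARS = 75e6


def expected_divergence(rate_a: float, rate_b: float, years: float) -> float:
    """Substitutions per site accumulated along both lineages since the split.

    Assumption: constant rates and no multiple-hit correction.
    """
    return (rate_a + rate_b) * years


if __name__ == "__main__":
    k = expected_divergence(HUMAN_RATE, MOUSE_RATE, DIVERGENCE_YEARS)
    print(f"{k:.3f} substitutions per site")
```

Under these assumptions the estimate is about 0.5 substitutions per site, which agrees with the statement that the two genomes differ by nearly one substitution per two nucleotides.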

The SNP frequency is 1 per 500-700 bp in mouse. Mouse has roughly four-fold more short SSRs (1-5 bp unit) than human.

Both the mouse and human genomes have about 30000 protein-coding genes. About 80% of mouse genes have a single identifiable orthologue in human. Less than one percent of genes are unique to each genome. At the nucleotide level, the two genomes can be aligned across 40% of their lengths. These sequences are the orthologous sequences from the common ancestor that remained in both lineages. Over 90% of the human and mouse genomes can be partitioned into regions of conserved synteny (orthologous gene loci on the same chromosome in two species regardless of gene order and presence of intervening genes). In these genomic regions, gene order from the most recent ancestor has been conserved in both species.

Approximately 5% of the mammalian genome is under purifying selection, more than its coding potential (1.5%). This suggests that the UTRs (1%), regulatory elements, non-protein-coding genes and chromosomal structural elements are under functional selection as well.

The mammalian genome is evolving in a non-uniform manner. There is a substantial variation across the genome in all three forces that shape the genome: nucleotide substitution, deletion and insertion. Neutral substitution rate is correlated with recombination rate genome-wide.

Two general mechanisms guide protein invention in eukaryotes. First, domains can be combined to form new architectures, and second, gene families may expand in a lineage-specific manner. In the mouse lineage, many local gene family expansions have occurred. Such examples include genes involved in reproduction, immunity, development and olfaction.


Two-genome comparison between human and mouse allowed estimation of the rate of protein evolution. Measures of protein sequence evolution are the percentage of identity and the ratio between the rates of non-synonymous mutations per non-synonymous site (KA) and synonymous mutations per synonymous site (KS). In general, a KA/KS ratio <1 indicates purifying selection, a KA/KS ratio =1 indicates neutral evolution, and a KA/KS ratio >1 indicates positive selection. For the 12845 pairs of mouse-human 1:1 orthologues, the median amino acid identity was 78.5%, and the median KA/KS ratio was 0.115. The major determinant of the KA/KS ratio was variation in KA. The KS clustered tightly around 0.6 synonymous substitutions per synonymous site, indicating a similar neutral substitution rate among all proteins. Domains are under greater selective pressure than protein regions not containing domains, and catalytic domains are under greater selective pressure than non-catalytic domains. Finally, domains in the secreted class are typically under less purifying selection than are either nuclear or cytoplasmic domains. Protein domain families involved in immunity and gene transcription showed the highest median KA/KS ratio.
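The interpretation of the KA/KS ratio can be written as a small decision rule. The sketch below is illustrative only: real data never give a ratio of exactly 1, so a tolerance band around 1 is used to call neutrality, and the width of that band (0.05) is an arbitrary assumption rather than a value from the text. In practice, a statistical test would replace the fixed tolerance.

```python
def classify_selection(ka: float, ks: float, tolerance: float = 0.05) -> str:
    """Interpret a KA/KS ratio.

    ka: non-synonymous substitutions per non-synonymous site
    ks: synonymous substitutions per synonymous site
    tolerance: half-width of the band around 1 treated as neutral
               (illustrative assumption).
    """
    if ks <= 0:
        raise ValueError("KS must be positive to form the ratio")
    ratio = ka / ks
    if ratio < 1 - tolerance:
        return "purifying selection"
    if ratio > 1 + tolerance:
        return "positive selection"
    return "neutral evolution"


if __name__ == "__main__":
    # Median mouse-human values from the text: KS about 0.6 and
    # KA/KS = 0.115, so KA is about 0.069.
    print(classify_selection(ka=0.069, ks=0.6))
```

With the median values reported for mouse-human orthologues, the rule returns purifying selection, as expected for most proteins.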

2.6.3 The Rat Genome

Rat is a tool in experimental medicine and drug discovery (Gibbs et al., 2004). It is separated from mouse by 12-24 million years, and from human by about 75 million years. This third mammalian genome sequenced allowed three-way comparisons to resolve new details of mammalian evolution.

The rat genome (2.7 Gb) is smaller than human but bigger than mouse (Table 2.2). The difference between rodents is due to a different repeat content and to a different proportion of segmental duplications.

The number of genes encoded by the rat genome is similar to that in mouse and human (about 30000). Most genes (90%) have had no deletion or duplication since the last common ancestor. The intronic structures have been conserved as well. Coding density is about 1.7%. There are 435 tRNA genes, and 454 other known non-coding RNA genes defined in the rat genome. There are 15975 CpG islands in the rat genome. The GC content is 0.35% enriched in comparison with mouse (43%) due to a higher rate of A to G transitions over T to C transitions. There is also an excess of G+T over C+A on the coding strand (strand asymmetry).

In the protein-coding sequences there is an overall excess of small deletions over insertions. Based on the three-species comparisons, the rates of indel accumulation in nuclear, accumulated/secreted, mitochondrial and cytoplasmic proteins, enzymes and ligand-binding proteins are 4 x 10^-4, 3.9 x 10^-4, 3.1 x 10^-4, 2.4 x 10^-4, 2.1 x 10^-4 and 1.4 x 10^-4, respectively. Whereas the transmembrane protein regions were the most refractory to indel accumulation, the low-complexity protein regions were three times enriched in indels.

Almost all human disease-associated genes have 1:1 orthologues in the rat genome and are unlikely to be diverged, duplicated or lost. However, their rates of synonymous substitution are higher than those of remaining genes. Some rat-specific genes arose through expansion of gene families, including the genes encoding pheromones, immunity-related proteins and proteins involved in chemosensation and detoxification.

About 3% of the genome is in the large segmental duplications, associated primarily with the pericentromeric and subtelomeric regions. These regions harbour many recently expanded gene families. Intrachromosomal duplications occur three times more frequently than the interchromosomal duplications in rat.

Roughly 40% of the rat genome aligns with human and mouse and this fraction contains the vast majority of exons and regulatory elements. A portion of this eutherian core makes 5-6% of the genome that is under selective constraint. About 28% of the rat genome aligns only with mouse. This fraction contains rodent-specific repeats (40%), and the rest may be single-copy DNA deleted in the human lineage.

Nearly one third (29%) of the rat genome aligns with neither human nor mouse. Half of this sequence consists of rat-specific repeats, and about a third of this sequence consists of rodent-specific repeats deleted in mouse.


There were 250 genome rearrangements in the rodent lineage since evolutionary split between rodents and human. The neutral substitution rate appears to be three times higher in rodents than in human, with that in rat 5-10% higher than in mouse. Microdeletions occur at a two-fold higher rate in rodents than in human. There is a correlation between the local rate of microinsertions, microdeletions, transposable element insertions and nucleotide substitutions in the rat genome.

Males have a two-fold excess of nucleotide substitutions and of small indels (<50 bp), mirroring the ratio of the numbers of cell divisions between the male and female germlines.

About 40% of the rat genome is derived from transposable elements. The LINEs comprise 22% of the genome, with the family still active. Two SINE families, B2 and ID, are also active, as well as all three classes of LTR retroviral elements. The DNA transposons are inactive.

2.6.4 The Fugu rubripes Genome

The tiger pufferfish, Fugu rubripes, has the smallest vertebrate genome but its gene repertoire is similar to mammals (Venkatesh et al., 2000). Thus it could be a useful reference genome for gene discovery and discovery of conserved regulatory elements.

Although the compact genome of the tiger pufferfish Fugu rubripes has only 365 Mb (Aparicio et al., 2002), the number of protein-coding genes in human and Fugu is comparable (Table 2.2). A total of 31059 genes were predicted in the Fugu genome, with the upper bound of gene loci expected to reach 38000-40000. Genes were predicted mostly using homology evidence due to the unavailability of cDNA.

Only 2.7% of the genome matched interspersed repeats but this is probably a significant underestimate due to incompleteness of the Fugu repeats database. Rapid deletion of nonfunctional sequences may be the mechanism accounting for the repeat structure in Fugu. On the other hand, transposable elements in Fugu appear to be very active. At least 40 families of transposable elements have accumulated fewer than 5% of substitutions, indicating that they may be active.

The compactness of the Fugu genome is due to a reduction in the size of introns and intergenic regions. Roughly 75% of introns are <425 bp in length, but the number of introns, 161536, is very similar to that in human. Both gain and loss of introns were observed in the Fugu lineage. The presence of “giant” genes was also noted in Fugu. The average gene density was estimated to be one gene per 10.9 kb. Gene loci occupy one third of the genome.
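The quoted gene density can be cross-checked against the genome size and the number of predicted genes. The following sketch is a back-of-envelope calculation that assumes the density of one gene per 10.9 kb applies uniformly across the whole 365 Mb genome; the function name is illustrative.

```python
FUGU_GENOME_BP = 365e6      # genome size from the text
BP_PER_GENE = 10.9e3        # one gene per 10.9 kb
PREDICTED_GENES = 31_059    # genes predicted in the Fugu genome


def genes_from_density(genome_bp: float, bp_per_gene: float) -> int:
    """Gene count implied by a uniform gene density."""
    return round(genome_bp / bp_per_gene)


if __name__ == "__main__":
    implied = genes_from_density(FUGU_GENOME_BP, BP_PER_GENE)
    print(implied, PREDICTED_GENES)
```

The implied count, roughly 33500 genes, lies between the 31059 predicted genes and the expected upper bound of 38000-40000 gene loci.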

Variation in GC content was much lower in Fugu (44.1-53.5%) than in human.

With windows of 1, 0.5 and >1 kb, roughly 0.15, 1.3 and 5% of the Fugu genome contained duplicated segments, indicating that the large duplications are not a recent feature of the Fugu genome. However, evidence for ancient duplications comes from the existence of paralogous segments.

Most human peptides (75%) have some match in Fugu. About 6000 Fugu proteins have no match in human. There is a general human-Fugu concordance between the predicted protein classifications. Exceptions include an excess of the potassium channel subunits and kinases in Fugu, and an excess of the C2H2 zinc finger proteins in human. Olfactory receptors show a clear expansion of different families in Fugu.

Many short genomic segments are conserved between human and Fugu after separation by 450 million years of independent evolution. However, scrambling of the gene order, depending on the chromosome length, was also often found in human-Fugu comparisons.


2.6.5 The Dog Genome

Dog is an attractive choice for genetic comparisons as the characteristics of about 300 breeds are maintained by restricting gene flow between breeds. The dog genome was sequenced at 1.5-fold coverage, in 6.22 million reads covering about 50% of the 4.8 Gb diploid genome (Kirkness et al., 2003). This limited depth of sequencing permits some initial analyses.

The dog genome is estimated to be 2.3-2.4 Gb (Table 2.2). The 6.22 million reads were merged into 522011 contigs with mean span of 8.6 kb and random sequence coverage of about 77%.
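The relationship between sequencing depth and the fraction of the genome covered can be sketched with the Lander-Waterman model, in which reads are assumed to be placed uniformly at random and the expected covered fraction at c-fold depth is 1 - e^(-c). This model is introduced here as an illustration and is not stated in the text; the 1.5-fold depth, the 6.22 million reads and the genome size of about 2.4 Gb are the figures quoted above. The function names are illustrative.

```python
import math


def expected_covered_fraction(depth: float) -> float:
    """Expected fraction of the genome covered by at least one read.

    Lander-Waterman approximation: reads placed uniformly at random.
    """
    return 1.0 - math.exp(-depth)


def mean_read_length(depth: float, genome_bp: float, n_reads: float) -> float:
    """Mean read length implied by depth = n_reads * read_length / genome size."""
    return depth * genome_bp / n_reads


if __name__ == "__main__":
    print(f"covered fraction at 1.5x: {expected_covered_fraction(1.5):.3f}")
    print(f"implied mean read length: {mean_read_length(1.5, 2.4e9, 6.22e6):.0f} bp")
```

At 1.5-fold depth the model gives a covered fraction of about 0.78, close to the random sequence coverage of about 77% reported for the dog assembly.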

Roughly 31% of the sequence is repeat-derived (compared with 45% in human and 38% in mouse), but the dog repeat libraries may not be as complete as those for human and mouse. The substitution levels were similar in dog and human.

The dog-human alignments covered 18473 genes. Dog appears to have much larger complements than human of olfactory receptor genes, and genes involved in peptide metabolism.

The SNP frequency in dog was estimated to be about 1/1500 bp.

Many sequences in the dog genome differed in the presence or absence of a SINE insertion, and such polymorphisms were verified in a number of dog breeds. Approximately 7% of the 23000 SINE_cF elements are dimorphic in the sequenced poodle, and these are a valuable resource for phylogenetic studies. This kind of gene dimorphism may cause dramatic phenotypic effects (e.g. induction of the canine narcolepsy), contributing to the phenotypic diversity among the dog breeds.

At present, besides the published human, mouse, rat, dog and Fugu genome analyses, sequences of the chimpanzee and chicken genomes are also available (Ensembl). Sequencing and assembly of the Tetraodon and zebrafish genomes are near completion (Genoscope; Ensembl). Further, sequencing of the cow, pig and Brazilian opossum genomes is underway (NCBI). The National Human Genome Research Institute (NHGRI, USA) approved funding for projects to sequence the genomes of the African savannah elephant, the European common shrew, the guinea pig, two species of hedgehog, the nine-banded armadillo, the rabbit, the cat, and the orang-utan (Genome News Network). These vertebrate genomes are a priceless resource for discovery of genes and definition of gene regulatory sequences.

2.6.6 Annotation of Genomic Sequences

Automatic genome annotation is a major strategy to annotate genomic sequences (Chapter 2.6.1). However, this is a work in progress. The main problems are mistakes arising from automatic genome annotation, and the inability of current programs to predict UTRs and non-protein-coding genes. Further, the collections of transcripts and ESTs are limited.

Guigo et al. (2003) developed a two-stage multi-exon gene prediction procedure that exploits the availability of human and mouse genomic sequences. The first stage is to run gene-prediction programs (TWINSCAN, SGP2) that utilize genome alignment in combination with detection of statistical patterns in DNA. In the second stage, multi-exon genes predicted in human and mouse are compared. A gene prediction is retained only if the predicted proteins in both species align, with at least one predicted intron at the same location. A total of 1019 additional new genes were predicted using this method. The reliability of these gene predictions was 76%, as tested by RT-PCR and direct sequencing of a single exon pair from a sample of the gene predictions. Analysis of gene expression patterns indicated that this gene prediction system could be particularly sensitive to genes with tissue-restricted expression.
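The second-stage filter described above can be sketched as follows. This is a toy illustration, not the procedure of Guigo et al. (2003): the Prediction class, the similarity threshold of 0.5 and the use of difflib.SequenceMatcher as a stand-in for a real protein alignment are all assumptions made for the example, and intron positions are assumed to be already expressed in shared alignment coordinates.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import FrozenSet


@dataclass(frozen=True)
class Prediction:
    """A predicted multi-exon gene (hypothetical representation)."""
    gene_id: str
    protein: str
    # Intron positions, assumed to be mapped to alignment coordinates.
    intron_positions: FrozenSet[int]


def proteins_align(a: str, b: str, min_similarity: float = 0.5) -> bool:
    """Crude stand-in for a protein alignment test.

    SequenceMatcher.ratio() is a generic string similarity, used here
    only to keep the sketch self-contained.
    """
    return SequenceMatcher(None, a, b).ratio() >= min_similarity


def retain_prediction(human: Prediction, mouse: Prediction) -> bool:
    """Keep a prediction pair only if the proteins align and at least
    one predicted intron lies at the same location in both species."""
    shared_introns = human.intron_positions & mouse.intron_positions
    return proteins_align(human.protein, mouse.protein) and bool(shared_introns)


if __name__ == "__main__":
    human = Prediction("h1", "MKTAYIAKQR", frozenset({3, 7}))
    mouse = Prediction("m1", "MKTAYIAKQK", frozenset({7}))
    print(retain_prediction(human, mouse))
```

Requiring a conserved intron position in addition to protein similarity is what distinguishes this filter from a plain homology search, since processed pseudogenes lack introns.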

There are still transcripts and ESTs that are missing from the human collections. Ota et al. (2004) sequenced 21243 full-length human cDNAs, of which 14490 were unique. Roughly half of these were protein-coding cDNAs (5416). Of these, 1999 clusters had not been predicted by computational methods. The distribution of GC content in this category has a peak at 58%, suggesting that there may be a bias against GC-rich transcripts in the current protein-coding gene predictions. The remaining cDNAs contained no ORF, corresponding to the non-protein-coding genes.

Manual curation is at present the ultimate way to annotate genomic sequence. For example, the Vertebrate Genome Annotation database (VEGA; http://vega.sanger.ac.uk/) is a central repository for manual annotation of different vertebrate finished genome sequences. Expert manual annotators have to correct mistakes arising from automatic gene prediction by effectively integrating the ab initio gene predictions, direct evidence, homology-based evidence and comparison across multiple genomes.

This strategy can also be used in a targeted fashion to search for genes of interest in multiple genomes and to compile supporting evidence. Using this strategy to search genomic databases for the predicted PRNP paralogues, I discovered a new human PRNP paralogue, dubbed the Shadow of prion protein gene (SPRN). I compiled direct evidence, ab initio gene predictions and homology-based evidence for this gene in mammals and fish (Chapter 4).

2.7 Comparative Genomic Analysis

With the availability of many genomic sequences, it is now possible to decipher information that is encrypted within the DNA strands. Comparative genomic analysis is emerging as a major strategy to understand genomes.

Functional sequences tend to evolve more slowly than non-functional sequences (Frazer et al., 2002). By comparing genomic sequences it is therefore possible to identify conserved, functional sequences (coding and non-coding) against non-conserved, non-functional background noise. The depth of comparative analysis depends on the evolutionary distance between the sequences being compared. For instance, human-mouse (75 million years) comparisons revealed many conserved coding and non-coding regions, but it was not possible to discriminate which non-coding conserved regions are indeed functional (Waterston et al., 2002). When more species are included in comparisons (e.g. human, mouse and cow), non-coding sequences conserved in all species are more likely to be functional. At the other extreme are comparisons between human and fish (450 million years). These will primarily reveal conserved coding sequences, but conserved regulatory sequences could also be found.

Computational tools have been developed to enable comparison and analysis of genomic sequences (Frazer et al., 2003). There are two basic types of programs for alignment of long genomic sequences: global and local. Global alignments are designed to produce an optimal similarity score over the entire lengths of the sequences compared. I used the global alignment tool VISTA in my work (Mayor et al., 2000; Chapter 3). The VISTA server implements the AVID algorithm, which first finds maximal exact matches between two sequences using a suffix tree, and then identifies the best anchor points based on the length of the exact matches and the similarity of their flanking regions. Local alignments, on the other hand, are computed to produce optimal similarity scores between subregions of the sequences. I used the PipMaker program for local alignments in my analyses (Schwartz et al., 2000; Chapter 3). The underlying algorithm, BLASTZ, is a gapped BLAST program that starts by finding short, exact matches and then extends those matches to alignments that include gaps.
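
The difference between the two alignment types can be illustrated with the textbook dynamic-programming scores. The sketch below is neither AVID nor BLASTZ, both of which rely on anchoring or seeding heuristics to cope with long genomic sequences; it is a plain Needleman-Wunsch (global) and Smith-Waterman (local) scorer with a linear gap penalty, and the scoring values (match 1, mismatch -1, gap -2) are arbitrary assumptions chosen for the example.

```python
def alignment_score(a: str, b: str, local: bool,
                    match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal alignment score of a and b by dynamic programming.

    local=False: global score over the full lengths of both sequences.
    local=True:  best score over any pair of subregions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # A global alignment must pay for leading gaps.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                # A local alignment may restart anywhere, so scores never drop below zero.
                cell = max(cell, 0)
            score[i][j] = cell
            best = max(best, cell)

    return best if local else score[-1][-1]


# Two sequences that share only the internal segment ACGTACGT.
x = "TTTTACGTACGTGGGG"
y = "CCACGTACGTAA"
print("global:", alignment_score(x, y, local=False))
print("local: ", alignment_score(x, y, local=True))
```

For these two sequences the local score reflects only the shared internal segment, whereas the global score is pulled down by the unrelated flanks and the length difference. This is why local alignment suits the detection of isolated conserved elements, while global alignment suits sequences that are collinear over their whole length.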

Kellis et al. (2003) compared the genomes of four yeast species (S. cerevisiae, S. paradoxus, S. mikatae, S. bayanus) that diverged over 5-20 million years. This comparative genomic analysis allowed gene identification and determination of gene structure. Gene regulatory elements were also found. Furthermore, genes and genome regions that exhibit fast or slow evolutionary changes were identified.

Thomas et al. (2003) compared a 1.8 Mb region of human chromosome 7 harbouring 10 genes with its orthologous genomic regions from 12 other species. Human, chimpanzee, baboon, cat, dog, cow, pig, rat, mouse, chicken, Fugu, Tetraodon and zebrafish, spanning 450 million years of evolution, were included in this analysis. These sequences showed conservation that reflected both functional constraints and neutral sequence entropy. The small genomic regions (average 58 bp) conserved across these sequences, called multi-species conserved sequences (MCSs), are candidate regions for functional roles. About 2% of MCSs comprise ancestral repeats and 32% represent coding sequences or UTRs. The remaining 68% of MCSs are outside known exons, and almost none correspond to currently known regulatory elements. Many of the conserved non-coding genomic sequences identified by this strategy were previously not detectable in pairwise sequence comparisons. The human-fish comparisons detected conservation largely confined to coding sequences, but almost a third of human coding exons did not align with fish. Eliminating chimp and baboon did not affect the specificity of the MCS detection, but eliminating non-human primates, chicken and fish reduced the MCS number by 17%. Chicken sequence alone detected 40% of MCS bases (94% of the coding but only 29% of the non-coding sequences).
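
The idea of extracting short conserved blocks from a multi-species alignment can be sketched as follows. This is a deliberately simplified stand-in and not the scoring scheme of Thomas et al. (2003): it assumes a pre-computed multiple alignment given as equal-length gapped strings, and it reports only runs of columns that are identical and ungapped in every sequence. The function name and the `min_length` parameter are hypothetical.

```python
def conserved_runs(alignment, min_length=5):
    """Return half-open (start, end) column ranges identical in every aligned sequence."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    runs = []
    start = None  # first column of the run currently being extended, if any
    for col in range(length):
        column = {seq[col] for seq in alignment}
        conserved = len(column) == 1 and "-" not in column
        if conserved and start is None:
            start = col
        elif not conserved and start is not None:
            if col - start >= min_length:
                runs.append((start, col))
            start = None

    # Close a run that reaches the end of the alignment.
    if start is not None and length - start >= min_length:
        runs.append((start, length))
    return runs


toy_alignment = [
    "ACGTACGTAA",  # species 1
    "ACGTACGTCC",  # species 2
    "TCGTACGT-A",  # species 3
]
print(conserved_runs(toy_alignment))
```

Adding a sequence to `toy_alignment` can only shorten or remove runs, never create them, which mirrors the observation above that including more species sharpens the set of candidate functional regions.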

I used the public genomic data for mammals (human, mouse, rat) and fish (zebrafish, Fugu, Tetraodon) as a basis for gene discovery and comparative genomic analysis by which I determined evolutionary trajectories of the PRNP and SPRN genes (Chapter 5).

2.8 The Tammar Wallaby: an Alternative Mammalian Experimental Model and Kangaroo Genome Project

The number of vertebrate genomic sequences available limits the depth of comparative genomic analysis. O’Brien et al. (2001) discussed current limitations for comparative genomics and listed mammalian species that are a priority for sequencing. The criteria for sequencing priority include phylogeny, relevance to understanding human biology or medicine, economic importance, genomic characteristics, developmental features and species diversity among mammalian orders. Of 4600-4800 mammal species, all but 270 are eutherian (“placental”) mammals. The eighteen eutherian orders cluster into four principal clades. Human, mouse and rat all cluster in clade III. There is therefore a need to sequence representatives from the other three clades. Livestock, together with cat and dog, cluster in clade IV. Representatives of the remaining clade II (sloths, anteaters, armadillos) and clade I (Afrotheria) should also be considered, as well as marsupials and monotremes.


Graves and Westerman (2002) presented the case for a Kangaroo Genome Project. Marsupials, found only in Australasia and the Americas, are mammals since they bear fur and suckle their young with milk. Yet, independent evolution over 180 million years of separation from their eutherian relatives has sculpted different (but not inferior) mammals with quite distinct characteristics.

Three distantly related marsupial species have been of major experimental interest: tammar wallaby (Macropus eugenii), fat-tailed dunnart (Sminthopsis crassicaudata) and Brazilian opossum (Monodelphis domestica) (Graves and Westerman, 2002).

All mammals (Figure 2.5) are equally related to birds and reptiles (about 310 million years of separation), and to fish (roughly 450 million years). Marsupials (Metatheria) and Eutheria diverged about 180 million years ago. These therian mammals diverged from the egg-laying monotremes (Prototheria) roughly 210 million years ago.

Early marsupials radiated in the Americas more than 65 million years ago, and during the time of the supercontinent Gondwana they colonized Antarctica and Australia. After separation of the Americas and Australia 38-84 million years ago, Australian marsupials evolved separately. The oldest fossils found in Australia are dated 55 million years ago. The evolutionary distance between tammar wallaby (Australia) and Brazilian opossum (South America) mirrors that of human and mouse (75 million years).

The marsupial genome is roughly the same size as eutherian genomes, but it is usually divided into fewer, larger chromosomes. A basal 2n=14 karyotype, found in all marsupial superfamilies, represents the ancestral diploid marsupial karyotype. The diploid karyotype of tammar wallaby contains 16 chromosomes, and the diploid Brazilian opossum karyotype has 18 chromosomes.

Comparative gene mapping has been used to study the relationships between the mammalian genomes. The experiments comparing human and other eutherians showed, for instance, that the X chromosome content is mainly conserved among all eutherian mammals. However, most genes on the short arm of human X are autosomal (chromosome 5) in tammar wallaby, as well as in monotremes, implying that they were added onto the eutherian X after divergence of marsupials. The relatively recent evolutionary origin of this region could explain why a high number of human genes on the short arm of human X escape X inactivation. The marsupial X is subject to inactivation, but the mechanism seems to be simpler than that in eutherians, and may be ancestral. Several genes involved in eutherian sex determination have been isolated and analysed in marsupials.

Figure 2.5: Evolutionary relationship among vertebrates (Graves and Westerman, 2002). My, million years.

The depth of comparative genomics depends on the richness and evolutionary span of the species being compared. As marsupials fill the huge evolutionary gap between eutherians (which radiated roughly 105 million years ago) and the bird/reptile branch (which diverged about 310 million years ago), this lineage makes a logical choice for sequencing. Because marsupials lie at this intermediate evolutionary distance from human, the greatest promise of such an alternative mammalian experimental system lies in the identification of conserved genes and conserved regulatory sequences.

The potential of such analyses of the kangaroo genome was discussed by Wakefield and Graves (2003). Sequencing of the kangaroo genome will provide a new dimension to comparative genomic analysis, as inferred from the contribution of an Australian mammal, the tammar wallaby (Macropus eugenii), to biology, genetics and genomics. Comparison of the XPCT gene between human, mouse and tammar wallaby suggested a high ratio of conservation signal to random noise. This reduced noise level could be particularly useful for identification of gene regulatory regions.

The kangaroo genome project (Figure 2.6; http://kangaroo.genome.org.au) is an international project to achieve draft-quality sequencing of the tammar wallaby genome. The project includes mapping of the tammar genome, sequencing of DNA and analysis of gene expression. Initial funding for the project was approved in March 2004.

I outline some major discoveries that have emerged from the mammalian-wide comparisons, arguing in favour of the kangaroo genome project.


Figure 2.6: Kangaroo genome project logo.


2.8.1 The Mammalian Testis-Determining Gene

A testis-determining gene is encoded by the Y chromosome in mammals. This gene defines maleness, inducing the development of the testis from the undifferentiated gonad. The Y-borne zinc-finger (ZFY) gene was an early candidate for the testis-determining gene. It maps to the eutherian Y and also has a homologue (ZFX) on the short arm of the human X. Using human ZFY as a probe to hybridise to tammar wallaby and fat-tailed dunnart chromosome spreads, Sinclair et al. (1988) surprisingly found that it mapped to neither the Y nor the X. In marsupials, ZFY is autosomal, indicating that it is not the primary mammalian sex-determining gene.

2.8.2 Discovery of New Human Genes

It was proposed that there are two classes of Y-chromosome associated genes: single-copy genes present on both Y and X and widely expressed, and multicopy Y-specific genes expressed in testis. It was thought that one such testis-specific gene was the human RBMY (for RNA-binding motif gene, Y chromosome). RBMY genes were reported to have no X homologue in eutherians. However, Delbridge et al. (1999) first found that RBMY has a homologue on the marsupial X, and subsequently demonstrated a homologue on the human X chromosome as well by cloning, sequencing and fluorescent in situ hybridisation. Thus the new human locus, RBMX, was found after comparison between marsupials and human. This human gene is now being investigated for a role in mental retardation, since its position on the human X falls within a deletion interval containing several X-linked mental retardation genes.

2.8.3 Detection of Regulatory Elements

Chapman et al. (2003) used marsupial sequence for phylogenetic footprinting (Chapter 6.6). A BAC clone harbouring the lymphoblastic leukemia-1 (LYL1) gene was isolated from stripe-faced dunnart (Sminthopsis macroura). LYL1 is a member of the stem-cell leukemia gene family identified on the basis of translocations in T cell acute leukemia. By aligning the LYL1 promoter between human, mouse and dunnart, Chapman et al. found conserved putative transcription factor-binding sites.

I therefore isolated and characterized the prion protein gene from tammar wallaby. In a comparative genomic analysis that also included the PRNPs from four eutherian species (human, mouse, bovine, ovine), I identified mammalian-wide conserved gene regions and potential regulatory elements. I discussed these findings with respect to current hypotheses about the function of PRNP (Chapter 6). This study showed the utility of marsupial sequence in the analysis of a human disease-related gene.

2.9 The Present Study

The original aim of this study was to analyse the evolution and function of the prion protein gene. Elucidation of its normal function is essential for a better understanding of its role in prion diseases, and for the development of strategies for therapy and prevention of prion diseases.

This project gained another dimension when I discovered the new human SPRN gene and defined a new family of vertebrate Shadoo proteins (Chapter 4).

I then analysed the evolution of the PRNP and SPRN genes and showed different evolutionary trajectories for these two mammalian genes. The more conserved evolution of the SPRN gene indicates that it has a more prominent, and perhaps more important, function than PRNP, suggesting that it could substitute for the loss of PRNP in the knock-out mice (Chapter 5).

Finally, PRNP gene comparisons across the eutherian-marsupial distance enabled me to identify conserved gene regions that represent potential regulatory elements. I related this information to the hypotheses on the normal function of PRNP and concluded that my analysis best supports the signal transduction hypothesis (Chapter 6).
