View metadata, citation and similar papers at core.ac.uk brought to you by CORE

provided by Elsevier - Publisher Connector

Genomics 90 (2007) 482–492 www.elsevier.com/locate/ygeno

Promoter, alternative splice forms, and genomic structure of protocadherin 15☆ ⁎ Kumar N. Alagramam a, , Nathaniel D. Miller b, Nithin D. Adappa a, Darrell R. Pitts a, John C. Heaphy a, Huijun Yuan c, Richard J. Smith c

a Department of Otolaryngology–Head and Neck Surgery, University Hospitals of Cleveland, Case Western Reserve University, Cleveland, OH 44106, USA b Department of Biochemistry, Miami University, Miami, OH 44904, USA c Molecular Otolaryngology Research Laboratories, Department of Otolaryngology, University of Iowa, Iowa City, IA 52242, USA Received 1 September 2006; accepted 20 June 2007 Available online 15 August 2007

Abstract

We originally showed that the protocadherin 15 (Pcdh15) is necessary for hearing and balance functions; mutations in Pcdh15 affect hair cell development in Ames waltzer (av) mice. Here we extend that study to understand better how Pcdh15 operates in a cell. The original report identified 33 exons in Pcdh15, with exon 1 being noncoding; additional exons of Pcdh15 have since been reported. The 33 exons of Pcdh15 described originally are embedded in 409 kb of mouse genomic sequence, while the corresponding exons of human PCDH15 are spread over 980 kb of genomic DNA; the exons in Pcdh15/PCDH15 range in size from 9 to ∼2000 bp. The genomic organization of Pcdh15/PCDH15 bears similarity to that of 23, but differs significantly from other protocadherin , such as Pcdhα, β,orγ. A CpG island is located ∼2900 bp upstream of the PCDH15 transcriptional start site. The Pcdh15/PCDH15 promoter lacks TATAA or CAAT sequences within 100 bases upstream of the transcription start site; deletion mapping showed that Pcdh15 harbors suppressor and enhancer elements. Preliminary searches for alternatively spliced transcripts of Pcdh15 identified novel splice variants not reported previously. Results from our study show that both mouse and human protocadherin 15 genes have complex genomic structures and transcription control mechanisms. © 2007 Elsevier Inc. All rights reserved.

Keywords: Protocadherin 15; Genomic structure; Promoter elements; Alternate splicing

Protocadherins are nonclassical that are structurally understand how protocadherins function in the inner ear and eye and functionally divergent from classical cadherins [1,2]. of mammals. Mutation of mouse protocadherin 15 (Pcdh15) affects normal Promoters play a pivotal role in gene regulation. The murine development of sensory hair cells in the inner ear, causing Pcdh15 is expressed in hair cells and other epithelial cells and it deafness and balance problems in Ames waltzer (av) mice [3]. is important to identify DNA sequence elements that control Mutations in its human ortholog, PCDH15, cause Usher syn- expression of Pcdh15/PCDH15. For example, identification of drome 1F (USH1F), a form of sensory impairment in which cis-acting elements of Pcdh15 would be valuable for the deafness and blindness cosegregate [4,5]. Mutations in generation of conditional knockouts in mice, and identification PCDH15 are also associated with nonsyndromic deafness [6]. of cis-acting elements of PCDH15 would be a useful tool for Elucidation of cis-acting promoter elements, alternate splicing, gene therapy. Though mouse and sequences are and the genomic structure of Pcdh15 would help us better available, there are several reasons dedicated analysis is required. First, unlike an exon, a promoter is an ill-defined

☆ unit. Typically, a sequence upstream of the transcription start Sequence data from this article have been deposited with the GenBank Data site, including the start site, is considered the putative promoter. Library under Accession Nos. AY029205 (PCDH15 cDNA) and AF281899 (mouse Pcdh15). Second, in general, the putative promoter does not share ⁎ Corresponding author. extensive homology even with functionally correlated genes. E-mail address: [email protected] (K.N. Alagramam). The latter prevents detection by commonly used search pro-

0888-7543/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.ygeno.2007.06.007 K.N. Alagramam et al. / Genomics 90 (2007) 482–492 483 grams such as BLAST or FASTA. Therefore, we applied a Various members of the Protocadherin family are also known combination of in silico prediction tools coupled with in vitro to express alternatively spliced gene transcripts [13]. The first verification protocols to identify promoter elements of Pcdh15. evidence for alternative splicing in Pcdh15 came from Northern Results of this effort are presented here. blot analysis, which demonstrated three large transcripts (10– Alternative splicing regulates gene expression in a different 12 kb) [3]. Isoforms consisting mostly of the cytoplasmic way and it is a significant component of the complexity of the domain (CP) [6] and the extracellular domain (EC) [14] have mammalian genome. The cadherin 23 (CDH23) gene is asso- also been described, and most recently, several other spliced ciated with USH1D and DFNB12 (nonsyndromic autosomal products of Pcdh15 have been reported [15]. Here we report recessive deafness) [7,8]; mutations in murine Cdh23 are novel alternatively spliced forms of Pcdh15. Though the associated with the deaf–circling mouse mutation, waltzer [9]. functional relevance of these isoforms needs to be determined It was reported that different spliced variants were expressed in in separate experiments, it is clear that the mechanism regulating the ear and the eye, leading to different –protein Pcdh15/PCDH15 expression is complex. interactions between CDH23 and other [10]. In separate Protocadherins are distinct from classical cadherins, the other reports, it was shown that mutations in the Harmonin gene subfamily of cadherins. Classical cadherins are defined by caused 1C [11], while a mutation in alter- having five cadherin repeats in the EC domain, a single natively spliced exon D of the Harmonin gene is linked to transmembrane domain, and a conserved CP domain, which nonsyndromic deafness [12]. Similarly, identification of spliced interacts with β-catenin [1,2]. While protocadherins share the products may help to explain certain cases of nonsyndromic cadherin repeat in the EC domain, they have a higher number of mutations in PCDH15 and may also help explain how the gene repeats. Additionally, protocadherins have greater variability in mediates its function at the protein level. the CP domain, showing little or no homology to other members

Table 1 Exon/intron boundaries of the human PCDH15 gene Exon 3′ Acceptor sequence 5′ Exon Size (bp) 3′ Exon 5′ Donor sequence Intron (bp) 1 a cttcagtgccagtgaaatcaacgtgctgag CTAATA… 355 …CCGCAA gtaagttcttcccccgcccttctcttattt 144,560 2 aatgtttctttctaccttttcttatttcag ATCAGC… 119 b …ATGATG gtaaggttgcttttagcttgacctttagag 138,731 c 3 ttattacttttggttattttctctctttag ATTGCA… 66 …GGAATG gtaagtgagttttttattttacaaatgtct 122,041 c 4 tgttttcaagtacctttctttggtttgcag GTACAA… 161 …AGAGAT gttagtgattttttttctttgcccctgtaa 9,707 5 attttaaaagatgcaatttctacattgcag CCACCG… 156 …AATGAG gtcagtttctatttactctctctctgtcct 22,635 6 attgatatttaaaatcttttttcctttcag CTCACT… 120 …GATCCG gtatgtaaatacctatttgcaatgtaattt 16,658 7 atctgaagtctgtatctttaatatttgcag ACATCC… 111 …GCTAAT gtgagtattcaatttcttttaaatttcttc 12,154 8 ctacctgttgtttgtttttctccctaacag GACCGT… 171 …ACTCCG gtaaatatactcttatctttctccatttat 80,339 9 tgaatgaaactctcttaatttttttttcag GAAGAA… 109 …TTGTTG gtttgtatctgataacattttacactatca 22,774 10 attatgtactgaaatctcaatgtcttccag GGACTC… 113 …ATTAAG gtaaattaagctaatatcttatttccttgt 18,046 11 ctaaatattttcttttcttttcaattttag GCTGAA… 207 …GAAGAT gtaagttaaatacatattttgctcttgttt 10,414 12 cgtttgaaacgatgtcattttttttcaaag ACAAAA… 135 …TTTTCG gtaaagagcttttgtgtaaatatacatatg 1,540 13 tctactaagtgtatttttctcctcccaaag ATAACA… 150 …ATACAG gtaactactcacataatataagtgtacact 47,374 14 gaggatttatgttgtgattttcaattccag CTCACT… 194 …GCGAAG gtaatttatattagaaccaatgcagtcttt 20,092 15 accatcttcttttcctctctttctctgcag GAACTC… 133 …CTACAG gtatgttccttggtgtgtgtgtgtgtgtgt 42,811 16 tgataattgaattatgttttctctctgaag GCAACT… 80 …AGAAAC gtaagttaaaaatgatttcctactttaaaa 10,559 17 gtcttatttgtttgtttgttttgtcactag CACGGG… 94 …GATGGG gtgagtatgctccctgtgaaggggttttaa 12,445 18 tcacatttaatgcttgtgtttttctttaag ACCTCA… 129 …GTAAAA gtaagtggaatatttatatagatcttatat 43,566 19 atttttttgtttttttgttttatttaatag GCAACA… 306 …ATAGAG gtgtgtgggaaaaagcatttattacatgaa 2,475 20 ctaataaaactattcttctgcttattacag GCCAAA… 225 …GTAAAG gtaagggctagagatgacttctggttgcct 24,354 21 aagtactaaactcctaattgtgttttacag GATATG… 117 …CCTCCT gtaagtagaaggcattgattaatattctga 33,756 22 gcaaattaataattaataattttattttag GGATTA… 141 …TTTAAG gtcagtgtctcttttcattatgtggtctat 1,907 23 ctttaacatgaatttttattgctttttaag TTGGTG… 113 …ATATAG gtattgatttctgtccgattttgaaataca 18,756 24 ataaatacatcatcttctctaaacctgcag ACCTCC… 112 …AAGAAG gtgagtagacttttagtttgtgttctcacc 1,910 25 atgaccaggtaattgtattttaatttctag ATACAT… 139 …CAAAAA gtaagttatactacttaaagcttttaccag 35,444 26 ataaaaaatgcatcgtttcttgtcctccag GCAATA… 128 …GTGAAG gtatgcggaattatgcttaatatacagaac 36,378 c 27 ataatctctttttttttcccctccgtctag GCTACT… 216 …GTACTC gtaagtagataaaacttcaggaaagggaag 9,378 28 ttaaatgaattctgttttttcttctttcag GTCTCC… 86 …TACAGA gtaattgattttgtctccttatatatttat 16,678 29 tgtcatgtttgtactttgtttgatttatag GATCTT… 177 …TTTTAA gtaagtgctttttgttatctgtaacagctc 8,788 30 attaattttactatttatctaacaataaag ATTTTT… 219 …CAGACA gtaagtattcaaggatggtaaacctggata 2,741 31 tatttttcttactcttgctttcccaaacag GTTTN 9 bAAAGT gtaagtataattaacattttacacatcctt 1,016 32 gttttacaagccatccttactttatccaag ACGTCA… 156 …CTCAAT gtaagtagaccgttcaggaactctgagaag 4,034 33 ttacatttcattatcttcttttctttcaag TCTTTT… 2069 d Total: 6816 Total: 974,061 a Exon 1 is untranslated. b Exon 2 has a 5′ untranslated region of 28 bp followed by a 91-bp coding region. c These intron sizes could not be counted exactly, but could be accurately estimated from contig NT_02478.5. d Exon 33 has a 1501-bp coding region followed by a 568-bp 3′ untranslated region. 484 K.N. Alagramam et al. / Genomics 90 (2007) 482–492 of the cadherin superfamily. The overall 3D structure of the substantially, ranging from 1016 to 144,560 bp (Figs. 1A and cadherin repeat motif is quite similar to the immunoglobulin 1B). To determine the size of both PCDH15 and Pcdh15 motif and it has been speculated that there may be some we used the NCBI MapViewer (http://www.ncbi.nlm.nih.gov/ functional similarity between the two families [17–19]. Like mapview/). The size of the gene for human protocadherin 15 immunoglobulins, protocadherins may play various roles within including the intronic sequences was approximately 980,000 bp. multicellular organisms. In this report, we compare the genomic The mouse homolog was less than half as long, at just over structure and exon-to-domain organization of Pcdh15 to various 409,000 bp. Exon-flanking sequences have the potential to be members of the cadherin family. Our report shows that the of significant value in mutation detection in humans and will genomic structure of Pcdh15/PCDH15 is unique among make it possible to evaluate the contribution of PCDH15 to cadherins. The characterization of Pcdh15/PCDH15 described hereditary deafness, vestibular dysfunction, and retinitis pig- in this report is useful for investigations involving the regulation, mentosa (Tables 1 and 2). function, or mutational analysis of these genes. By aligning the predicted domains with the amino acid sequence as determined from the nucleotide sequence of the Results gene we determined the exons responsible for encoding the different domains (Fig. 1C). Exons 2 through 28 encode the Intron/exon boundaries and intron sizes cadherin repeats, with an average of 2.5 exons per repeat.

We determined the intron/exon boundaries for human CpG island and 5′ UTR analysis PCDH15 and identified 33 exons that ranged in size from 9 to 2069 bp (Table 1). The exon structure of mouse Pcdh15 is A CpG island 215 bp long was found approximately 2.9 kb similar to that of the human gene (Table 2). Intron sizes vary upstream of the transcriptional start site. The only other potential

Table 2 Exon/intron organization of the mouse Pcdh15 gene Exon 3′ Acceptor sequence 5′ Exon Size (bp) 3′ Exon 5′ Donor sequence 1 a ggagcgctaattgcttttcagccagctgat CAACGT… 377 …TTAAAG gcaagtaagttccttgctcccagcttttgc 2 aatgtgctttcctgcttcctgttatttcag ATCAGC… 119 b …ACGATG gtaagtccacctttactctgatctctgcag 3 aattttaacctttggttatcttttctctag ATTGCA… 66 …GAAACG gtgagtgtgttctgtatttcccagtatttt 4 tggttacaaatagctgtgtttggtttgcag GTACAA… 161 …AGAGAC gtcagtagctttatttttctctgaccctgt 5 tgtctaaaacgatgtcatttcttcccgcag CCACCA… 156 …AATGAG gtctgtgatcctgcgctgatatgtgtgttc 6 cagacagtttctctttcatgttctttccag CTCACT… 120 …GATCCG gtaagtactcagggtggtaattgttgtgag 7 gcctgacttgtgtctacttaacacttgcag ACATCC… 111 …GCAAAT gtgagtgcttgccttctcttaccttctgtt 8 c tccgttaccctcatcag GACCGT… 171 …ACTCCG gtaagtacaatttcttcttctttctttcct 9 cacagctaaacactctgaattttcttccag GAAGAA… 109 …TTGTCG gtttgtatttgccaaaattttatcctgtta 10 cagttctgaactgagtctcttgtctcacag GCACCC… 113 …ATTAAG gtaaattaaactaaaatcttatttcctttt 11 atcatagctaatttttctttaatttttcag GCTGAG… 207 …GAAGAC gtaagttccatgactatcttagcaattatt 12 ttttaagtaatgtatttttttctttcaaag ACAAAA… 135 …TTTCTG gtaaaatatttaatttatgtgtatattagt 13 gcctctctactgcgttctgtttctccgcag ATAACA… 150 …ATTCAG gtaaggccactggatggcaggtgtgtttgc 14 aagatttgtattgtgccttttcaattccag CTGACA… 194 …AAGAAG gtgactccaccaggccctgtagttctatag 15 taaacacccatctctcttctccctgtgtag GCACTC… 133 …CTACAG gtgggcctgggatatggatctgtgtgtctg 16 tataattgaattatgttttcttctctgaag GCAACT… 80 …AGAAAC gtaagttacaagggaatgatttatacttga 17 ttcttttgttttgtttctttctgcctctag CACAGG… 94 …GATGGA gtaagtgttaccacaggggcggggatgcta 18 acactgaatcttctcttctcttcccttaag ACCTCA… 129 …GTCCGG gtaagtggaatcatcgtttagcttatgtgt 19 gttgtgcgaacttttgtttcttccatttag GCAACA… 306 …ATAGAG gtgtgtagacaggcattaactccatgacat 20 gtaaccatcctcctctatgacttgtcatag GCCAAG… 225 …GTGAAG gtaaggaaccttttctctctatatgccaca 21 acagcactaaatctcaattctacttcacag GACATG… 117 …CCACCT gtaagtagagatgctggccaatgctgtggg 22 catcccgatcgatcaattatcctttttcag GGGATG… 141 …TTCAAG gtcagtgtgccattttgtttgagacatcta 23 atctgtatgcggattctgtcacttttttag CTGGTG… 113 …ATACAG gtatgaattcccatcaggctttaaaacacc 24 agaaaatgaacaccccttttaaccctgcag ACCTCC… 112 …AGGAAG gtaggtaacgagttttgcctttgttttctt 25 gaatctaatgagttgtatgctggattccag ACAAGT… 139 …CAAAAA gtaagttacattgcttaaaaggttgacagt 26 tcaccaactaagatatgtcttgtcttccag GCAATA… 128 …GTGAAG gtatgaggagagaatctttgatgggaaact 27 atgatttgtcatgtggttttctcttcgtag GCCACC… 216 …GTACTG gtaagcgcatgtgtgaggggcatgagacac 28 aattaaatacattctattttcacttttcag GTCTCC… 86 …TACAGA gtaaaggattttgtttgttaatagattttc 29 tgtcatatttatgtgttattttatatttag GATTTT… 177 …TTTTAA gtaagtggtttttattgcacttcacctcca 30 ttagtttttcaatctgtcacacaacaaaag GTTCCT… 219 …CCGACA gtgagtatgcagttgctgggtggatattct 31 cttcttcttgcacactgctttccaaaacag GTTTN 9 bAAAGT gtaagtatgattacaaatattttgcacatg 32 gtcttacaagacaaccttactctactcaag ACGCCA… 156 …CGCAAT gtgagtaatgagggaggataactgtgtggg 33 tcacaacttgtctccttcttctctttcaag TCTTTT… 2154 d a Exon 1 is untranslated. b Exon 2 has a 5′ untranslated region of 28 bp followed by a 91-bp coding region. c Exon 8 did not have any additional high-quality shotgun sequence to extend the 3′ intron sequence. d Exon 33 has a 1516-bp coding region followed by a 638-bp 3′ untranslated region. K.N. Alagramam et al. / Genomics 90 (2007) 482–492 485

Fig. 1. Genomic structure of human PCDH15. (A) Green boxes represent individual exons 1–33 (proportional to other exons, but not to scale of diagram). Dotted line indicates gaps where genomic sequence could not be aligned, but sizes could be inferred from contig NT_024078.5. (B) Scaled-down representation of the PCDH15 gene to show all exons in a single line and their positions along the ∼981-kb genomic DNA. (C) Spliced exons aligned with predicted amino acid sequence and predicted domains. upstream CpG islands are at least 60 kb away (Fig. 2). In- Determination of alternatively spliced products terestingly, an AG acceptor splice site was present for human protocadherin 15, which suggests the existence of another exon RT-PCR analysis with forward primers (94 and/or 109) on upstream of exon 1. exons 1 and 3, and reverse primers (95 and/or 109) on exons 6 and 10, showed one major product and two minor products (Fig. Sequence analysis of mouse Pcdh15 and human PCDH15 5A). The major product was the expected one, which included all of the exons internal to the primers. The minor products included Mouse Pcdh15 and human PCDH15 are highly similar in one transcript that skipped exon 4 and another that skipped their cDNA sequences and predicted amino acid sequences (Fig. exons 2 and 4 (Figs. 5B and 5C). Exon 2 codes for amino acids 3). In the EC domain, the cDNA sequences are approximately MFLQFAVWKCLPHGILIASLLVVSWGQYDDD, and exon 4 85% similar at the nucleotide level and 94% similar at the peptide codes for TILVDNMLIKGTAGGPDPTIELSLKDNVDYWV- sequence level. Similarity drops significantly in the CP domain, LLDPVKQMLFLNSTGRVLDRD. The second region of alter- with only 53% of the predicted amino acid sequences matching native splicing was contained completely within exon 33. A between the two species. Despite this dissimilarity between the portion with amino acids LFLLY…HFEQS is spliced out as a amino acid sequences, within the CP domain, the two proline- minor product. The major product is, as expected, the entire exon rich domains are almost identical between the two species, 33 (Fig. 5D). suggesting that these domains are functionally significant. Comparison of the genomic structure of PCDH15 to other Promoter analysis protocadherins in the database showed an important feature that may be unique to Pcdh15/PCDH15. Other protocadherin genes, The expression of the Pcdh15 gene in vitro based on the such as the PCDHα, PCDHγ, and the δ protocadherin sub- lacZ reporter assay results for the various plasmid constructs family, are characterized as having one large exon that encodes displayed both up and down regulation of the Pcdh15 gene all of the cadherin repeats [19,21]. PCDH15, in contrast, has 27 (Figs. 6A and 6B). Plasmid D (approximately 5.5 kb upstream exons encoding its 11 cadherin repeats. Comparison to the through exon 1) showed the highest activity. Plasmids E and F PCDHα and PCDHγ gene clusters shows this stark difference contained, in addition to the same 5.5-kb segment as plasmid in genomic organization (Fig. 4). Studies of the genomic organi- D, even more of the upstream segment. These plasmids zation of a number of classical cadherin genes have revealed that demonstrated much lower expression of lacZ, indicating the the DNA sequences encoding individual extracellular domains presence of suppressor elements in this region. This was are interrupted by two introns [22–26]. The number of introns further substantiated by plasmids G–J, which had only per cadherin repeat is not constant in PCDH15 but is typically upstream elements in the range of 10–5.5 kb and displayed 2or3. minimal expression of lacZ. The smaller plasmids (A–C) had 486 K.N. Alagramam et al. / Genomics 90 (2007) 482–492

Fig. 2. CpG islands. 225 kb of human PCDH15 genomic sequence was analyzed for CG content and putative CpG islands. The test thresholds for a reported island were an observed/expected ratio of 0.60, %C+%G N40.00, and a length N200 bp. The putative island directly upstream of exon 1 has a length of 215 bp. increased expression compared to plasmids E–J but had less Discussion expression than plasmid D. See Fig. 6 for the makeup of all of the constructs as well as their effects on the expression In the original description, PCDH15 consists of 33 exons of lacZ. [3,4]. Twenty-seven of the exons code for 11 cadherin repeats in K.N. Alagramam et al. / Genomics 90 (2007) 482–492 487

exons and both have a CP domain of which a major portion is encoded by one exon. The genetic structure of Pcdh15 bears more resemblance to Cdh23 than it does to other members of the protocadherin family (Fig. 4). While the significance of the genomic organization of protocadherins remains to be eluci- dated, a single exon coding for most of the cytoplasmic domain within a gene that undergoes a significant amount of splicing in both the mouse and the human suggests that this stretch of amino acids is functionally significant. The first evidence that more than one transcript is expressed from the Pcdh15 locus came from Northern blot analysis [3,16]. Recently, several alternatively spliced transcripts of Pcdh15 were reported to be expressed in the mouse inner ear by Friedman and colleagues [15,16]. Four main classes of alternatively spliced transcripts were defined; two of these transcripts (CD-2 and CD-3) encode new cytoplasmic domains Fig. 3. Comparison of similarities in predicted amino acid sequence between and increase the number of exons associated with the Pcdh15 mouse and human. TM, transmembrane domain. SS, signal sequence. locus to 39 (these additional exons are not shown in Fig. 1; our numbering is based on the original report of 33 exons). Based on immunolocalization studies of C-terminal isoforms in hair its EC domain and one large exon codes for ∼90% of the cells, it has been reported that a Pcdh15 C-terminal isoform, cytoplasmic domain, while typical protocadherins appear to labeled CD3, is a tip-link antigen [16]. It is also reported that the have one large exon coding for the EC repeats and multiple temporal and spatial expression patterns of the transcripts with exons coding for the cytoplasmic domain. A comparison of the different cytoplasmic domains vary in the developing hair cells genomic structure of mouse Pcdh15 to mouse Cdh23 (cadherin compared to mature hair cells; thus it is speculated that different 23) is noteworthy in this regard. The 27 cadherin repeat domains isoforms may have different functions [16]. CD-2 and CD-3 of mouse Cdh23 are coded by 59 exons and the CP domain is were also reported to be expressed in the human retina [16]. coded by 6 exons, one of which is quite large (927 bp) [27]. The Ahmed et al. [6] describe the protein product of an mRNA genomic structure of mouse Cdh23 shows a pattern similar to transcript that has a novel start site in the middle of exon 21, that of mouse Pcdh15 in that both have an EC domain resulting in an N-terminally truncated protein. Haywood- containing multiple cadherin repeats encoded by multiple Watson et al. reported the existence of several new splice

Fig. 4. Comparison of the genomic organization of (A) Pcdh α/γ clusters to that of (B) Pcdh15 and to that of (C) Cdh23. In (A) the upper orange bars represent the genomic coding regions. The lower bar represents the generic protein structure of members of the Pcdh α/γ clusters. Notice that one large region of DNA codes for all of the cadherin repeats in the cytoplasmic domain (yellow). In (B) and (C) the upper orange and black bars represent genomic coding regions. Exons are the orange hash marks and the black regions represent introns. The lower bars for (B) and (C) represent the protein structures for Pcdh15 and Cdh23, respectively. Notice that the cadherin repeats in both Cdh23 and Pcdh15 are coded for by multiple exons. 488 K.N. Alagramam et al. / Genomics 90 (2007) 482–492

Fig. 5. Analysis of the various transcripts produced by alternative splicing in Pcdh15. (A) Alternative spliced forms of Pcdh15. This schematic represents (not to scale) the exons in the protocadherin 15 gene with forward and reverse primers (94, 95, 108, 109) used in the PCR. The major and minor products are shown with all included exons attached by solid lines. Note: The numbering of the exons is based on the original description of Pcdh15 [3,4]. To use the criteria established by Haywood- Watson et al. [15], one exon should come before the first exon shown and all exon numbers would increase by 1. (B) Gel analysis of RT-PCR products. This PCR includes ϕ markers (ϕX174) on the left, and in lane 1 primer sets 94 and 95, in lane 2 primer sets 108 and 109, and in lane 3 primer sets 94 and 109. Both the major and the minor alternatively spliced products are present in this PCR. (C) Amino acid sequence of exons 2, 3, 4, and 5 of Pchd15. (D) Various transcripts produced by alternative splicing within exon 33. (Top) Representation of the 3′ end of the protocadherin 15 gene with minor expressed protein product, which splices out the 33.b segment, shown. (Bottom) Representation of the segment of exons present in the 3′ region with amino acid sequences included. Boldface amino acid sequence represents the sequence alternatively spliced out of the minor product of this gene. isoforms, including the existence of mRNA transcripts that To determine how the transcription of Pcdh15 is skipped exons 3, 4, or 7 and one that is devoid of exons 3 through controlled, both in silico and in vitro analyses were carried 15 [15]. Comparing the findings of this report to previous reports out. To identify CpG islands in the 5′ UTR of PCDH15, we [15,16], minor product B in Fig. 5, which skips exons 2 and 4, is used a high stringency standard as recommended by Jones consistent with one of the isoforms described by Ahmed et al. and Takai [28]. Our analysis showed a potential CpG island (exons 3 and 5); however, minor product A, which skips only located ∼2.9 kb upstream of exon 1. There is also a potential exon 4 (exon 5 in Ahmed et al.), and the alternative splicing that CpG island located ∼17 kb downstream of the last exon occurs internal to exon 33 (exon 34 in Ahmed) are novel. found under the same conditions (data not shown). Since it Clearly Pcdh15 is capable of forming multiple alternatively has been estimated that ∼50% of the mammalian promoters spliced products with fewer cadherin repeats, allowing for the contain CpG islands [29], this finding suggests that we formation of multiple protein isoforms from one gene. All of the have identified a putative promoter region of Pcdh15. The 5′ observations cited above demonstrate that the regulation of UTR region also was searched for promoter regions using protocadherin 15 is complex. Mouse mutants lacking specific PromoterInspector (www.genomatix.de), but no consensus splice variants and/or knockdown experiments in other animal signal for transcription initiation such as a CAAT or TATAA models, such as zebrafish, are necessary to prove the biological box was present within 100 bases upstream of the transcription relevance of splice variants of Pcdh15/PCDH15 to hair cell start site. However, a recent report by Carninci et al. shows development and function. that the classical TATA box promoter architecture represents ..Aarmme l eois9 20)482 (2007) 90 Genomics / al. et Alagramam K.N. – 492

Fig. 6. (A) Diagram showing the various fragments used for transfection and reporter assays. The encompassed region includes exon 1 and 10 kb upstream. The plasmids used (pBlue TOPO vector from Invitrogen) (lettered A–M) contain a lacZ reporter gene downstream of the above fragments. (B) Results of transfection and reporter assay of the fragments shown in (A). Numerical data generated were normalized by using an in situ plate assay to determine the transfection efficiency. The results of the plasmid alone showed essentially the same activity as plasmids G through M (data not shown). (C) 35 kb upstream of exon 1 from mouse and human PCDH15 was analyzed for regulatory elements with Signal Scan, Promoter 2.0, Proscan, BLAST, MAR Finder, and Promoter Inspector. Rat Pcdh15 is also shown for comparison. Areas of high homology exist between the rat and the mouse. The mouse gene contains many potential regulatory sequences not found in humans. No TATA box or CAAT box sequences were found for either species. 489 490 K.N. Alagramam et al. / Genomics 90 (2007) 482–492

Fig. 6 (continued). only a minority of the set of mammalian promoters in mice and Pcdh15 is 409,000 bp long and also comprises 33 exons of humans [30]. similar length. Additional exons were reported a few weeks To understand better the molecular mechanisms that before this paper was submitted [16], which further attests to the determine the expression of Pcdh15, deletion mapping of the complexity of protocadherin 15 genes. Nevertheless, it is sequence encompassing the putative promoter region was intriguing to note that roughly the same number of exons in undertaken to determine the existence of any cis-acting elements human PCDH15 are embedded in a genomic sequence much (Fig. 6). A lacZ assay was utilized because it is a more sensitive larger than its mouse counterpart. The significance of this assay compared to other assays (i.e., green fluorescent protein) divergence is not clear. These findings raise several important and is therefore more likely to detect weak expression or small questions: What is the utility of having this gene spread out over changes in promoter activity. The results were novel for this such a large segment of the genome? Both syndromic and family of protocadherins and indicate that, in addition to a nonsyndromic mutations of PCDH15 have been reported. Does probable promoter region upstream of Pcdh15, there are likely human PCDH15 show a different splice pattern in eye versus a number of suppressor sites, as well. As previously noted, ear? Can splicing account for nonsyndromic mutations (i.e., if cells transfected with plasmid D had the highest level of lacZ an exon containing the mutation was skipped over in one tissue expression. This expression decreased substantially in the pre- but not in the other)? What is the biological significance of the sence of any upstream segments beyond the 5.5-kb upstream existence of multiple isotypes of Pcdh15/PCDH15 that result sequence that comprised the D plasmid, indicating probable from variable exon splicing? For example, Ahmed et al. suppressor elements in that region. Their existence was fur- reported that some of the spliced forms of the mouse Pcdh15 ther substantiated by plasmids G–J, which contained only the may code for proteins associated with the tip-link antigen while putative upstream suppressor region (10–5.5 kb upstream of other spliced forms are expressed in the hair cell body [16]. exon 1). These plasmids resulted in low level expression. Finally, does the promoter, which potentially contains a CpG Additionally, it is of note that as the segments decreased in island and lacks TATA box sequences, lend the gene to size from the 5′ end relative to plasmid D (A–C), the activity evolutionary plasticity as suggested by Carninci et al. [30]? The of lacZ also decreased. It is possible that the upstream region elucidation of its genomic structure and regulation of RNA that was unique to plasmid D contained the major promoter expression sets the stage for further molecular analysis of sequence and that these other regions found in plasmids A–C Pcdh15/PCDH15. contained only secondary promoter activity. The findings in this paper show that both mouse and human Materials and methods protocadherin 15 genes have complex genomic structures and transcription control mechanisms. PCDH15 is ∼980,000 bp Human and mouse protocadherin genes referenced in this report and the long and comprises 33 exons ranging in size from 9 to 2069 bp. numbering of the exons in Pcdh15 are based on the original cDNA sequences K.N. Alagramam et al. / Genomics 90 (2007) 482–492 491 submitted to GenBank [3,4]. Additional exons of Pcdh15 reported [6,15] Acknowledgments since the original publication of Pcdh15 [3] have not been included in this study. This research was supported by NIDCD Grant DC05385 to Determination of genomic sequence and alignment K.N.A. N.D.M. was supported by funds from the Miami University Honors program. R.J.H.S. is partially supported by Portions of the PCDH15 cDNA sequence were compared with sequences NIDCD (R01-DC02842). We thank P. James, D. Pennock, and catalogued by the Human Genome Project using the BLAST search engine [20]. S. Hoffman for critical reading of the manuscript. Matching sequences with a high level of agreement were aligned using the Sequencher program (GeneCodes, Ann Arbor, MI, USA). The presence of AG/GT acceptor/donor splice sites coinciding with abrupt References changes from matching bases to mismatched bases was used in determining intron/exon boundaries. Because of the high homology between the human and [1] S.T. Suzuki, Protocadherins and diversity of the cadherin superfamily, mouse sequences, close agreement in the intron/exon boundaries was inferred J. Cell. Sci. 109 (Pt 11) (1996) 2609–2611. and then confirmed by using the SSAHA search engine to find matching [2] S.T. Suzuki, Recent progress in protocadherin research, Exp. Cell Res. 261 trace sequences (http://trace.ensembl.org). The boundary was assumed to be (2000) 13–18. correct if an AG/GT site was found that coincided with the sudden occurrence of [3] K.N. Alagramam, C.L. Murcia, H.Y. Kwon, K.S. Pawlowski, C.G. Wright, mismatching regions. Sequence analysis of putative exons and flanking splice R.P. Woychik, The mouse Ames waltzer hearing-loss mutant is caused by junctions compared to the sequence of RT-PCR products confirmed the mutation of Pcdh15, a novel protocadherin gene, Nat. Genet. 27 (2001) boundaries. 99–102. [4] K.N. Alagramam, H. Yuan, M.H. Kuehn, C.L. Murcia, S. Wayne, C.R. Determination of potential CpG island Srisailpathy, R.B. Lowry, R. Knaus, L. Van Laer, F.P. Bernier, S. Schwartz, C. Lee, C.C. Morton, R.F. Mullins, A. Ramesh, G. Van Camp, G.S. The following criteria were used to determine a potential CpG island 135 kb Hageman, R.P. Woychik, R.J. Smith, Mutations in the novel protocadherin upstream and 90 kb downstream of exon 1: an observed/expected ratio of CpG PCDH15 cause Usher syndrome type 1F, Hum. Mol. Genet. 10 (2001) N0.60, where the expected number of CpG per 200 nt was calculated by dividing 1709–1718. the product of the number of C nucleotides times the number of G nucleotides by [5] Z.M. Ahmed, S. Riazuddin, S.L. Bernstein, Z. Ahmed, S. Khan, A.J. the overall number of nucleotides in the window; a %C+%GN40.00%; and a Griffith, R.J. Morell, T.B. Friedman, S. Riazuddin, E.R. Wilcox, Mutations window length N200 nt. The prediction program CpG Plot/CpG report portion of the protocadherin gene PCDH15 cause Usher syndrome type 1F, Am. J. of the EMBOSS suite was used in the prediction (http://www.ebi.ac.uk/Tools/). Hum. Genet. 69 (2001) 25–34. [6] Z.M. Ahmed, S. Riazuddin, J. Ahmad, S.L. Bernstein, Y. Guo, M.F. Sabar, Prediction of exon-flanking primers P. Sieving, S. Riazuddin, A.J. Griffith, T.B. Friedman, I.A. Belyantseva, E.R. Wilcox, PCDH15 is expressed in the neurosensory epithelium of the eye and ear and mutant alleles are responsible for both USH1F and The exon-flanking primers were predicted using the Primer 3 program (http:// DFNB23, Hum. Mol. Genet. 12 (2003) 3215–3223. www.genome-wi.mit.edu/cgi-bin/primer/primer3_www.cgi). Primers for exon [7] H. Bolz, B. von Brederlow, A. Ramirez, E.C. Bryda, K. Kutsche, H.G. 33 were designed so that the PCR products would overlap and cover the entire Nothwang, M. Seeliger, C.S.C.M. del, M.C. Vila, O.P. Molina, A. Gal, C. exon. Kubisch, Mutation of CDH23, encoding a new member of the cadherin gene family, causes Usher syndrome type 1D, Nat. Genet. 27 (2001) Determination of alternatively spliced products 108–112. [8] J.M. Bork, L.M. Peters, S. Riazuddin, S.L. Bernstein, Z.M. Ahmed, S.L. RNA isolated from adult wild-type mouse brain, eye, or neonate mouse Ness, R. Polomeno, A. Ramesh, M. Schloss, C.R. Srisailpathy, S. Wayne, cochlea was used in reverse transcription reactions with Superscript reverse S. Bellman, D. Desmukh, Z. Ahmed, S.N. Khan, V.M. Kaloustian, X.C. Li, transcriptase (Gibco BRL) following the manufacturer's protocol. In general, A. Lalwani, S. Riazuddin, M. Bitner-Glindzicz, W.E. Nance, X.Z. Liu, G. PCR conditions were 94°C for 2 min followed by 30 cycles of 94°C for 30 s, Wistow, R.J. Smith, A.J. Griffith, E.R. Wilcox, T.B. Friedman, R.J. Morell, 55°C for 30 s, and 72°C for 1 min. The annealing temperature was adjusted Usher syndrome 1D and nonsyndromic autosomal recessive deafness based on the Tm of the primers. We designed primers complementary to Pcdh15 DFNB12 are caused by allelic mutations of the novel cadherin-like gene cDNA spaced at approximately 0.5, 1, 1.5, and 2 kb to scan the entirety of the CDH23, Am. J. Hum. Genet. 68 (2001) 26–37. genes. Agarose gel electrophoresis was done on the amplified cDNA produced [9] F. Di Palma, R.H. Holme, E.C. Bryda, I.A. Belyantseva, R. Pellegrino, B. from the mRNA. For mouse Pcdh15 only, when multiple products or products Kachar, K.P. Steel, K. Noben-Trauth, Mutations in Cdh23, encoding a of unexpected size resulted, these products were sequenced. Sequencing was new type of cadherin, cause stereocilia disorganization in waltzer, the done in both directions using the ABI 373A Dye Terminator cycle sequencing mouse model for Usher syndrome type 1D, Nat. Genet. 27 (2001) – system (Perkin Elmer) with the primers used for PCR. Sequences were 103–107. compared with the Pcdh15 sequence in GenBank. [10] J. Siemens, P. Kazmierczak, A. Reynolds, M. Sticker, A. Littlewood- Evans, U. Muller, The Usher syndrome proteins cadherin 23 and harmonin Promoter analysis form a complex by means of PDZ-domain interactions, Proc. Natl. Acad. Sci. U. S. A. 99 (2002) 14946–14951. The 10-kb segment upstream of exon 1 in Pcdh15 was run through [11] E. Verpy, M. Leibovici, I. Zwaenepoel, X.Z. Liu, A. Gal, N. Salem, A. several promoter prediction programs (Signal Scan, Promoter 2.0, ProScan, Mansour, S. Blanchard, I. Kobayashi, B.J. Keats, R. Slim, C. Petit, A BLAST, MAR Finder, Promoter Inspector) that did not predict any traditional defect in harmonin, a PDZ domain-containing protein expressed in the promoter elements, such as a TATAA box or CAAT boxes. We then created a inner ear sensory hair cells, underlies Usher syndrome type 1C, Nat. Genet. series of truncated versions of the putative promoter region from the 10-kb 26 (2000) 51–55. segment upstream of exon 1 using PCR. These truncated segments, which [12] X.M. Ouyang, X.J. Xia, E. Verpy, L.L. Du, A. Pandya, C. Petit, T. Balkany, included exon 1, were fused with a lacZ reporter gene and were expressed W.E. Nance, X.Z. Liu, Mutations in the alternatively spliced exons of as plasmids transfected in mouse 3T3- fibroblast cells and assayed for USH1C cause non-syndromic recessive deafness, Hum. Genet. 111 (2002) β-galactosidase activity. Thirteen different plasmids (assigned letters A–M) 26–30. were created using progressively smaller fragments from the 10-kb segment [13] M. Frank, R. Kemler, Protocadherins, Curr. Opin. Cell Biol. 14 (2002) (Fig. 6A). 557–562. 492 K.N. Alagramam et al. / Genomics 90 (2007) 482–492

[14] V. Rouget-Quermalet, J. Giustiniani, A. Marie-Cardine, G. Beaud, F. Genomic structure and chromosomal mapping of the mouse N-cadherin Besnard, D. Loyaux, P. Ferrara, K. Leroy, N. Shimizu, P. Gaulard, A. gene, Proc. Natl. Acad. Sci. U. S. A. 89 (1992) 8443–8447. Bensussan, C. Schmitt, Protocadherin 15 (PCDH15): a new secreted [25] M. Hatta, S. Miyatani, N.G. Copeland, D.J. Gilbert, N.A. Jenkins, M. isoform and a potential marker for NK/T cell lymphomas, Oncogene 25 Takeichi, Genomic organization and chromosomal mapping of the mouse (2006) 2807–2811. P-cadherin gene, Nucleic Acids Res. 19 (1991) 4437–4441. [15] R.J. Haywood-Watson II, Z.M. Ahmed, S. Kjellstrom, R.A. Bush, Y. [26] P. Huber, J. Dalmon, J. Engiles, F. Breviario, S. Gory, L.D. Siracusa, A.M. Takada, L.L. Hampton, J.F. Battey, P.A. Sieving, T.B. Friedman, Ames Buchberg, E. Dejana, Genomic structure and chromosomal mapping of the Waltzer deaf mice have reduced electroretinogram amplitudes and mouse VE-cadherin gene (Cdh5), Genomics 32 (1996) 21–28. complex alternative splicing of Pcdh15 transcripts, Investig. Ophthalmol. [27] F. Di Palma, R. Pellegrino, K. Noben-Trauth, Genomic structure, Vis. Sci. 47 (2006) 3074–3084. alternative splice forms and normal and mutant alleles of cadherin 23 [16] Z.M. Ahmed, R. Goodyear, S. Riazuddin, A. Lagziel, P.K. Legan, M. (Cdh23), Gene 281 (2001) 31–41. Behra, S.M. Burgess, K.S. Lilley, E.R. Wilcox, S. Riazuddin, A.J. Griffith, [28] P.A. Jones, D. Takai, The role of DNA methylation in mammalian G.I. Frolenkov, I.A. Belyantseva, G.P. Richardson, T.B. Friedman, The tip- epigenetics, Science 293 (2001) 1068–1070. link antigen, a protein associated with the transduction complex of sensory [29] F. Antequera, A. Bird, Number of CpG islands and genes in human and hair cells, is protocadherin-15, J. Neurosci. 26 (2006) 7022–7034. mouse, Proc. Natl. Acad. Sci. U. S. A. 90 (1993) 11995–11999. [17] M. Overduin, T.S. Harvey, S. Bagby, K.I. Tong, P. Yau, M. Takeichi, M. [30] P. Carninci, A. Sandelin, B. Lenhard, S. Katayama, K. Shimokawa, J. Ikura, Solution structure of the epithelial cadherin domain responsible for Ponjavic, C.A. Semple, M.S. Taylor, P.G. Engstrom, M.C. Frith, A.R. selective cell adhesion, Science 267 (1995) 386–389. Forrest, W.B. Alkema, S.L. Tan, C. Plessy, R. Kodzius, T. Ravasi, T. [18] L. Shapiro, A.M. Fannon, P.D. Kwong, A. Thompson, M.S. Lehmann, G. Kasukawa, S. Fukuda, M. Kanamori-Katayama, Y. Kitazume, H. Kawaji, Grubel, J.F. Legrand, J. Als-Nielsen, D.R. Colman, W.A. Hendrickson, C. Kai, M. Nakamura, H. Konno, K. Nakano, S. Mottagui-Tabar, P. Arner, Structural basis of cell–cell adhesion by cadherins, Nature 374 (1995) A. Chesi, S. Gustincich, F. Persichetti, H. Suzuki, S.M. Grimmond, C.A. 327–337. Wells, V. Orlando, C. Wahlestedt, E.T. Liu, M. Harbers, J. Kawai, V.B. [19] Q. Wu, T. Zhang, J.F. Cheng, Y. Kim, J. Grimwood, J. Schmutz, M. Bajic, D.A. Hume, Y. Hayashizaki, Genome-wide analysis of mammalian Dickson, J.P. Noonan, M.Q. Zhang, R.M. Myers, T. Maniatis, Comparative promoter architecture and evolution, Nat. Genet. 38 (2006) 626–635. DNA sequence analysis of mouse and human protocadherin gene clusters, Genome Res. 11 (2001) 389–404. [20] S.F. Altschul, T.L. Madden, A.A. Schaffer, J. Zhang, Z. Zhang, W. Miller, Online references D.J. Lipman, Gapped BLASTand PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res. 25 (1997) 3389–3402. [1] EMBOSS, CpG Plot/CpG Report, Web program to determine CpG Islands: [21] C. Redies, K. Vanhalst, F. Roy, delta-Protocadherins: unique structures and http://www.ebi.ac.uk/Tools/. functions, Cell. Mol. Life Sci. 62 (2005) 2840–2852. [2] Ensembl Trace Server, search engine to find matching mouse shotgun [22] B.C. Sorkin, W.J. Gallin, G.M. Edelman, B.A. Cunningham, Genes for sequences: http://trace.ensembl.org. two calcium-dependent cell adhesion molecules have similar structures [3] Genomatix, Web program for promoter prediction: http://genomatix.gsf.de/. and are arranged in tandem in the chicken genome, Proc. Natl. Acad. Sci. [4] ISREC, Profile Scan, Web program to predict protein motifs: http://www. U. S. A. 88 (1991) 11545–11549. isrec.isb-sib.ch/. [23] M. Ringwald, H. Baribault, C. Schmidt, R. Kemler, The structure of the [5] Primer 3, Steve Rozen, Helen J. Skaletsky (1998), Web program to design gene coding for the mouse cell adhesion molecule uvomorulin, Nucleic primers: http://www.genome-wi.mit.edu/cgi-bin/primer/primer3_www.cgi. Acids Res. 19 (1991) 6533–6539. [6] Standard Nucleotide Blast, search engine to find matching genomic [24] S. Miyatani, N.G. Copeland, D.J. Gilbert, N.A. Jenkins, M. Takeichi, sequences: http://www.ncbi.nlm.nih.gov/BLAST/.