Proc. Natl. Acad. Sci. USA Vol. 89, pp. 3280-3284, April 1992 Neurobiology Structure and evolution of four POU domain genes expressed in mouse brain (POU Bnain-/POU Brain-2/POU Brain4/POU Scip/) YOSHINOBU HARA*, ALESSANDRA C. ROVESCALLI, YONGSOK KIM, AND MARSHALL NIRENBERG Laboratory of Biochemical Genetics, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892 Contributed by Marshall Nirenberg, January 6, 1992

ABSTRACT Four mouse POU domain genomic DNA acid sequences of the are related to one another and clones-Brain-i, Brain-2, Brain-4, and Scip-and Bran-2 suggests that the genes originated by duplication of an cDNA, which are expressed in adult brain, were cloned and the ancestral class III POU domain gene.t coding and noncoding regions ofthe genes were sequenced. The amino acid sequences of the four PoU domains are highly conserved; sequences in other regions of the proteins also are MATERIALS AND METHODS conserved but to a lesser extent. The absence of introns from DNA Coming and Sequencing. Our objective was to clone the coding regions of the four POU domain genes and the POU domain genes expressed in mouse brain. Mixtures of similarity of amino acid sequences of the corresponding pro- oligodeoxynucleotides that correspond to highly conserved teins suggest that the coding region of the ancestral class HI amino acid sequences in POU domains were used as primers POU domain gene lacked introns and therefore may have for amplification ofadult mouse brain POU domain cDNA by originated by reverse transcription of a molecule of POU PCR, and the amplified DNA fragments were cloned. Se- domain mRNA followed by insertion ofthe cDNA into germ cell quence analysis revealed 10 POU domain cDNA clones. genomic DNA. Additional duplications ofthe ancestral class HI Nick-translation of clone 38 DNA [228 base pairs (bp)] POU domain gene (or mRNA) would create the Brain-i, yielded a 32P-labeled DNA probe that was used to screen an Brain-2, Brain-4, and Scip genes. adult mouse brain cDNA library. One positive Brain-2 POU domain cDNA clone (P5) was obtained (1380 bp). One million POU domain proteins bind to specific nucleotide sequences recombinants from a mouse BALB/c genomic DNA library in DNA and regulate gene expression (for reviews, see refs. in EMBL4, kindly provided by Konrad Huppi (National 1 and 2). The POU domain is a conserved amino acid Institutes of Health) and 106 recombinants from an adult sequence =150 amino acid residues long. The initial region of mouse brain cDNA library were screened with a 32P-labeled 69-72 amino acid residues is termed the POU-specific do- Brain-2 cDNA probe (nucleotides 1002-1609 in Fig. 4). DNA main, which is followed by a 15- to 25-amino acid residue inserts were subcloned in pBluescript II SK+ and KS+. The linker region and a 60-amino acid residue POU homeo- nucleotide sequences of both strands of DNA were deter- domain. Both the POU-specific domain and the homeo- mined with universal or specific oligodeoxynucleotide prim- domain are required for specific high-affinity binding to DNA ers and single-stranded DNA templates by the dideoxynu- (3, 4). cleotide chain-termination method (13). Rosenfeld and his colleagues have sorted POU domains OlUgdeoxynucleotide Probes. Four oligodeoxynucleotides into different groups on the basis of POU domain amino acid (48 bases) complementary to different sequences that encode sequence similarity (2). Three mammalian class III POU part ofthe C-terminal regions ofBrain-1, Brain-2, Brain-4, or domain cDNAs have been described-human (5) and rat (2) Scip POU domain proteins were synthesized and purified. Brain-i and Brain-2 and rat (5-8) and mouse (9-11) Scip (also Each oligodeoxynucleotide was used as a specific probe for termed Oct-6 and Tst-J)-that have closely related POU one POU domain gene; either Brain4 (nucleotides 1521- domains and are expressed in embryonic and adult brain. 1568; see Fig. 2), Brain-i (nucleotides 1890-1937; see Fig. 3), Only the POU domain regions of human and rat Brain-i and Brain-2 (nucleotides 1755-1802; see Fig. 4), or Scip (nucle- Brain-2 cDNA have been sequenced thus far (2, 5), whereas otides 1117-1164 in figure 1 of ref. 9). Each oligodeoxynu- the complete coding sequences ofrat (6-8) and mouse (9-11) cleotide probe was labeled by adding -10 residues of Scip cDNA have been reported. Scip RNA is expressed in a [32P~dATP (3000 Ci/mmol; 1 Ci = 37 GBq) to the 3' terminus subset ofneurons, oligodendroglia, Schwann cells, and in the catalyzed by terminal deoxynucleotidyl transferase. testis (5-11). The expression ofthe Scip gene is promoted by Genomkc DNA Blot Analysis. Mouse genomic DNA was cAMP (6, 7). digested with EcoRI and/or BamHI, subjected to electro- In this report, a fourth class III mouse POU domain gene, phoresis (0.7% agarose gel, 5 ,tg per lane), and transferred to Brain4, is described, which is also expressed in adult mouse nitrocellulose filters. The filters were hybridized with a brain. Brain4 is similar to the recently reported XLPOU 2 nick-translated 32P-labeled Brain-2 DNA probe (nucleotides POU domain partial cDNA of laevis (12). Three 1002-1609; see Fig. 4) or an oligodeoxynucleotide probe (106 additional mouse class III POU domain genomic clones- cpm/ml) in 4x standard saline citrate (SSC)/40%o forma- Brain-i, Brain-2, and Scip-and Brain-2 cDNA were ob- mide/0.1% SDS/lx Denhardt's solution/25 ,ug of sheared tained and the nucleotide sequences of the coding and salmon sperm DNA per ml at 420C overnight. Filters were noncoding regions were determined. Comparison of the deduced amino acid sequences of the four POU domain *To whom reprint requests should be addressed at: Laboratory of proteins shows that the structure of the genes and the amino Biochemical Genetics, National Heart, Lung, and Blood Institute, National Institutes of Health, Building 36, Room 1C06, 9000 Rock- ville Pike, Bethesda, MD 20892. The publication costs of this article were defrayed in part by page charge tThe sequences reported in this paper have been deposited in the payment. This article must therefore be hereby marked "advertisement" GenBank data base [accession nos. M88299 (Brain-1), M88300 in accordance with 18 U.S.C. §1734 solely to indicate this fact. (Brain-2), M88301 (Brain-4), and M88302 (Scip)]. 3280 Downloaded by guest on September 29, 2021 Neurobiology: Hara et al. Proc. Natl. Acad. Sci. USA 89 (1992) 3281 washed three times in lx SSC/0.1% SDS at 250C and four specific for Brain-i, Brain-2, Brain-4, or Scip RNA (Fig. 1C). times in 0.1x SSC/0.1% SDS at 420C, 520C, 620C, or 720C Two major species of Brain-1 RNA [4.8 and 3.5 kilobases with a 32P-labeled Brain-2 DNA probe or at 520C with a (kb)] were detected in RNA from brain and kidney and two labeled oligodeoxynucleotide probe and were subjected to minor species (8.0 and 1.8 kb) were detected in brain. One autoradiography for several days. major species of Brain-2 RNA (4.8 kb) and 2 minor species of RNA Blot Analysis. Total RNA was prepared from various RNA (3.5 and 7.0 kb) were detected only in brain RNA. One organs of 7-week-old mice by the guanidine isothiocyanate species ofBrain-4 RNA (4.8 kb) was found only in RNA from lysis method (14). Total RNA was fractionated by electro- brain. The Scip probe revealed one major and one minor phoresis (10 gg per lane) on 1.2% agarose formaldehyde gels species of RNA from brain (3.5 and 2.5 kb, respectively). and RNA was transferred to nitrocellulose filters. The filters Isolation of Four Mouse POU Domain Genes. One million were hybridized with an oligodeoxynucleotide probe (106 recombinants from a mouse genomic DNA library and one cpm/ml) specific for either Brain-1, Brain-2, Brain-4, or Scip million from an adult mouse brain cDNA library were RNA in 4x SSC/40% formamide/0.1% SDS/lx Denhardt's screened with a Brain-2 cDNA probe at low stringent washes solution/100 Ag of yeast tRNA per ml/25 tug of sheared (0.1x SSC at 47°C). Sixty positive genomic DNA and 50 denatured salmon sperm DNA per ml at 420C overnight. The cDNA clones were obtained. Restriction site analysis of the filters were washed three times with lx SSC/0.1% SDS at 60 genomic DNA clones (data not shown) revealed 11 kinds 250C and four times with 0.1x SSC/0.1% SDS at 520C and ofclones. Thus far, Brain4, Brain-i, Brain-2, and Scip POU then were subjected to autoradiography for several days. domain genomic DNA clones and Brain-2 cDNA clones have been identified by nucleotide sequence analysis. RESULTS AND DISCUSSION Brain4 POU Domain Gene. The nucleotide sequence (2500 bases) of cloned mouse Brain4 POU domain genomic DNA Southern Analysis and POU Domain Probe Specificity. A and the deduced amino acid sequence of the are Brain-2 POU domain cDNA clone was isolated from an adult shown in Fig. 2. An open reading frame was found for a POU mouse brain cDNA library and the POU domain region ofthe domain protein consisting of 361 amino acid residues with a DNA was used as a probe to detect POU domain genes. The calculated Mr of 39,417. No intron or typical RNA splice site specificity of the Brain-2 DNA probe for hybridization to was found in the coding sequence. Since only 2.5 kb of POU domain genes in mouse genomic DNA digested with Brain4 genomic DNA has been sequenced, the possibility of restriction enzymes and subjected to Southern analysis is introns in the noncoding regions of the gene is not excluded. shown in Fig. 1A as a function of the temperature used for The 3' noncoding region of the Brain4 gene contains repet- stringent washes of filters. Only one DNA fragment was itive AC and GC nucleotide sequences (Fig. 2, underlined detected when the filters were washed at 72°C; however, at regions), which under appropriate conditions might adopt the least four DNA fragments were detected at 42°C-62°C. The conformation of Z-DNA. The amino acid sequence of the specificity of four oligodeoxynucleotide probes complemen- Brain4 POU domain is similar to the POU domain sequence tary to different sequences in Brain-i, Brain-2, Brain4, or recently reported (12) for XLPOU 2 of X. laevis (98% Scip POU domain genes was examined by Southern analysis similarity); 79o similarity was found for 90 amino acid using a stringent wash temperature of 52°C (Fig. 1B). Each residues outside the POU domain. The 3' untranslated nu- oligodeoxynucleotide probe hybridized to a different, single cleotide sequence of XLPOU 2 cDNA has a repetitive AC DNA fragment. nucleotide sequence similar to that of Brain4 and an AT Expression of Four POU Domain Genes. The expression of repeat. No other obvious sequence similarity was found in Brain-i, Brain-2, Brain4, and Scip POU domain genes was the 3' noncoding regions of Brain4 and XLPOU 2 cDNA determined by Northern analysis with total RNA from var- compared. A rat cDNA clone (RHS2) that is the equivalent ious adult mouse tissues and oligodeoxynucleotide probes of Brain4 is described in the accompanying paper (15). A :DNA POU domain of Brain-2 C: RNA - >. co e C - 0 0 0m m0.J - + b- 0 z 0. 3 U 0 LLU ul 0 U LU 0 LU LU owZu 0 24 = 23- S".Mf Brain-1 *; .... 8.0 - 4.8 - -5 3.5 - -2 4- 1.8- 2- 0 7.0 - ; Brain-2 42°C 52*C 62°C 72°C 4.8 -j ;-5 3.5 - J B: DNA -2 m m m + 4. LU 0 LU LU 0 LU m0 Lw0 m LU Brain-4 23 - 4.8 f-5 10- 7- 5- xb..s ...... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~...... 4- 3- p,.ji

2- Brain-i Brain-2 Braln-4B.5-ScipSc.p -2

FIG. 1. Probe specificity and Southern and Northern analyses of Brain-i, Brain-2, Brain4, and Scip DNA or RNA. (A) Southern analysis of mouse genomic DNA cleaved with EcoRI (E), BamHI (B), or EcoRI and BamHI (E + B). POU domain DNA fragments were detected by hybridization with a 32P-labeled Brain-2 POU domain cDNA probe. For the stringent washes of the filters, 0.1 x SSC was used at 42°C, 52°C, 62°C, or 72°C. (B) The specificity of 32P-labeled oligodeoxynucleotide probes for Brain-i, Brain-2, Brain4, or Scrip genes in the genomic Southern analysis. (C) Northern analysis of total RNA from adult mouse tissues. Brain-1, Brain-2, Brain-4, or Scip RNA was detected with 32P-labeled oligodeoxynucleotide probes. Downloaded by guest on September 29, 2021 3282 Neurobiology: Hara et al. Proc. Natl. Acad. Sci. USA 89 (1992)

TAACTAAACCGGAATTCTTTCATGCATTAAGATCAAAATGATATTTTAATTTGTTTTATTTATTTGTTTTAATTTAAATAAGTTAAAAGGCGACCTGAAA 100 CTGAAGTTAGCCTCCTCGCTGCCCCTCCATACAAATATCTACCTTCTATTTATTCTCTGCCACTAACAGCCTTGTAGTCTCCGGGAGGGGGGAAAGGGTT 200 AACTAAAAGAGCCGCTGCCACTCTGAGCTAGGCAGCCAATGGAGCCTAAAGATTGATTCACTTCCTGCTTGGGTCTCACTGAAACTCCCGAGCTTTGCCA 300 GCCTAATTTGGAAAGCGAGCTCGGCCTCCCGCACCACCATTGGCTGTGTTTATGCCTGGCCAGGCCAAGTCGCACTGCGATTGGCCTCCGCGGGTGCCGG 400 500 600 M A T A A S N PY S I LS S SS L V H A D S A G M QQ G S P F R N P 34 C TCAGAAAC TTCTCCAAAG TGACTACTTGCAGGGAGTTCCCAGCAATGGGCATCCCC TCGGGCATCACTGGGTGACCAGTC TTAGCGACGGGGGCCCGTG 700 Q K L L Q S DYL Q G V PS N G H PL G H H WV T S L S D G G P W 67 GTCCTCCACATTGGCCACCAGCCCCCTGGACCAGCAAGACG TGAAGCCGGGACGCGAAGATC TGCAACTGGGCGCAATCATCCATCACCGC TCGCCGCAC 800 S S T L A T S P L D Q Q D V K P G R E D L Q L G A I I H H R S P H 100 GTAGCCCACCACTC GCCGCACACTAACCATCCGAACGCCTGGGGAGCGAGC CCTGCTCCAAACTCGTCCATCACG TCCAGCGGCCAACCCC TCAATGTGT 900 V A H H S P H T N H P N A W G A S P A P N S S I T S S G Q P L N V Y 134 ACTCGCAGCCAGGC TTCACCGTGAGCGGTATGC TGGAGCACGGGGGACTCACTCCACCACCAGCTGC TGCCTCCACACAGAGCCTGCATCCAGTGCTCCG 1000 S Q P G F T V S G M L E H G G L T PPP AAA S T Q S L H P V L R 167 -IX-wXC2IC DNW GGAGCCTCCAGACCATGGTGAGCTGGGCTCGCACCACTGCCAGGACCACTCTGATGAAGAGACTCCAACCTC 1100 E P P D H G EL G S H H C Q D H SD E E T P T S D L ZQ A X Q 200 1200 F X Q R R I X L G F T Q A D V G L A L G T LY G I V 8S Q T T I C R 234 GTCGKCCTACAACTCoho CAT&TACTCTTIC_ _aCT _TCATCCACAGGAAGCCCGAC 1300 P I A L Q L 8 V X N M C X LQL I W L A A D SS T G S P T 267 L T A PAATT-CTTODGA A G DA A CAGCATTGACAAGATCGCTGCTCAA_=AAA _CAr _ _CA CTCAAGMC 1400 S I D K I A A Q C R X R K R R T 8 I A V 8 V X C V L A T 2 F L R C 300 FIG. 2. Nucleotide sequence C:CCSAkGCCCTOCAGC __ 1500 and predicted amino acid se- P R P A A Q A -I 8 8 L A D 8 L Q L R R 2 VV R V W -F C N R R Q X A xi 334 quence of mouse Brain4 AAAG-AkT ACTCCGCCAGGGGATCAGCAGCCACACGAGGTTTATTCGCACACGGTGAAAACAGACGCGTCCTGCCACGATCTCTGACTGGAGGAAGCAAG 1600 genomic hT P P G D QQ P H E V Y S H T V K T D A S C H D L * 361 DNA. The POU domain is en- GAGGTGGCCCGCTGCACTGGGAGCAGTGCTGATACAATTCTCTCTTTACCTTCCTTTCAGCATTACATCCTTTATTGTTGCTATCTCTCTAGCTTGCTAG 1700 closed within a box and the cor- CTCTTCTTTTCTTTCACAGGGGTTTCTAACTTCTTATGACTTATGTTAAGAAAAG GGCCGCTCAGGCAAACAC.ACGLCLACGCG 1800 responding nucleotide and amino CGCGCGCGCGCGCGCGCACACACACACACACACACACACACACACACACATA CATTCAGGCTTCTCAACTCTGAAACACAATTATATTAAGA 1900 AGTCCAGGTGGGGTCCCTGTATTCCCTGCTGCCAGGACAATTTCCCACCAACCTTTTTCTGCTGATCAATATATATTGTTTACTCTGAGTAAGATTTGTT 2000 acid sequences of the POU-spe- TGGTTGCTCCTCTCTAAGAAGGGCAAGGAGCTGGGTAACGGGTGGGGGGCGGGGGAGGGTGGGAAGACAGATGTGTGCCTATGAATAACCTTTCAGCGCC 2100 cific domain and POU homeo- TTGGTTATAGCAGCTGTATTTCAGGTGAAATCTGTTTTACAATAGACTAGTTTTGCATTTTTTTAAAACTTCTATAGCGTTTCTTAATTGTCTGCGGTGTT 2200 domain are shown in boldface TTACTACAAGC TGTACACAATATTTG TGAGATAAT TTG TATC TAATCGACCACTCCACATTATTATTGCTATTATTATTATTTCTTCAT TGAGC TGATGG 2300 GTTTTTMATTGCTTAAAAAGGGAGGGAGGC TGAAAGGAGGAGAGAGAAAAATGCTCACC TCC TGCAC TCCCACAC TTGAGGAGAGGATATGGTGGAGAAA 2400 type. Underlined nucleotides are GTTGTTGATGCTTCTATTTTCCCCCCCGGAACTGAGTGCCTATATGGGCAGAACAAGCAAACTGAATAGGTTTGCAAAGTAGCCTGAAGGCTATTTCGCT 2500 described in the text.

Brain-i POU Domain Gene. A 15.5-kb mouse Brain-i ge- (not shown in Fig. 3) as well as repetitive GGC nucleotide nomic DNA clone was obtained and 4000 nucleotide residues sequences. The 3' noncoding region contains repetitive GCC were sequenced. The nucleotide sequence (2500 bases) of nucleotide sequences and a polyadenylylation signal starting at Brain-i genomic DNA and the deduced amino acid sequence of nucleotide 2323. No intron or typical RNA splice site was Brain-1 POU domain protein are shown in Fig. 3. The Brain-i detected in the coding sequence of Brain-1. gene contains an open reading frame for 495 amino acid resi- Brain-2 POU Domain cDNA and Genomic DNA Clones. An dues, which corresponds to a POU domain protein with a 18.5-kb clone of Brain-2 genomic DNA and multiple Brain-2 calculated Mr of 50,012. Mouse Brain-1 POU domain protein cDNA clones were obtained; 3864 nucleotides of genomic contains long amino acid repeats consisting of glycine, alanine, DNA and 1461 residues of cDNA were sequenced. The proline, or histidine. In most cases, multiple copies of a single nucleotide sequence of Brain-2 cDNA clone C4 corresponds codon determine the repetitive amino acid sequence. The 5' to nucleotides 170-1631 of Brain-2 genomic DNA shown in noncoding region of the Brain-i gene contains polypyrimidine Fig. 4. An open reading frame (1335 nucleotides) was found and polypurine regions and repetitive GA nucleotide sequences that encodes a 445-amino acid protein with a calculated Mr of

TCCGCACTCGTCCGAGGACAGGAGGAGCCGCGGAGCCC TTGCTCCCCGAGAGAGCGCCCGAAGTACGGGTCCCCGCCCCAGTGGGCGAGCC TCGCTGGAG 100 CCAGCCGATCGCTCCCGGCCAGGGGGGAGACCACGACCCCCCCTGAAGGGGCTGGCCACGGAGCTCGCCCGGAGAAGCGATCCCCCCTCCCCAGGGCGC T 200 GCTCCTGCTGCAGCGGCGGCTGCTGCTGACCGAGGCTAGCCGGCGACCCCCGCGCCCTGCCAAGCGGCCTTGCAGCTGCAGCCCCGGGCCGCGGCGGCGG 300 CfGO02GGCDGGAOC0GC0OAGGAGGAGGC.G0G0GAGAAaOCOOCOGGC0GCQGG GGACCCGGGGCGDGGGAGGAGGAGCGGGGGGGGAGGAGGCGGG 400 AGGCGGGGGGGCGCGGCGCGCGGGCGGCGTGCGGCCGCG GCTGCTGCTGCG0CGGCGGCGGTGGT00O0 GOTGGGGTGGCGGGAGCGGAGCGGC 500 ATGGCCACGGCGGC TTCTAACCCCTACCAGCCGGGGAACAGCC TGCTCACGGCCGGCTCCATTGTGCACTCGGACGCGGCCGGGGCGGGCGGCGGCGGAG 600 M A T A A S N P Y Q PG N S LL T A G S I V H S D A A G A G G GG G 34 GGGGCGGCGGCGGCGGCGGCGGCGGGGCGGGGGGCGGC GGCGGCGGCATGCAGCCGGGAAGCGCCGCCGTGACCTCGGGCGCC TACCGGGGGGACCCGTC 700 CC QCGC CGC G A G G G CC G M Q P G S A A V T S G A Y R G D P S 67 CTCGGTCAAGATGGTCCAGAGCGAC TTCATGCAGGGGGCCGCCAGCAACGGCGGCCATATGC TGAGCCACGCGCACCAGTGGGTCACCGCCCTGCCCCAC 800 S V K 1 V Q S D F M Q G A A S N GG H M L S H A H Q W V T A L P H 100 GCTGCCGCCGC TGCCGCCGCTGCTGCCGCCGCTGCCGTGGAAGCCAGCTCACCGTGG TCGGGCAGTGCGGTGGGCATGGCCGGCAGCCCCCAGCAGCCGC 900 A AA AAA A AAA A A V E A SS P W S G S A V G M A G S P Q Q P P 134 CACAGCCGCCGCCGCCGCCGCC TCAGGGCCCCGACGTGAAGGGAGGCGCTGGACGCGAAGACCTGCACGCCGGTACCGCGC TGCATCACCGCGGACCGCC 1000 Q P PPP P Q G P D V K G G A G R E D L H A G T A L H H R G P P 167 GCACCTCGGGCCCCCGCCGCCGCCCCCTCACCAAGGACACCCTGGGGGCTGGGGGGCCGCAGCCGCCGCCGCCGCTGCCGCCGCAGCCGCCGCCGCTGCC 1100 H L G P PP P P P H Q G H P G G W G A A A AA A A A A AAAAA A 200 GCTCACCTCCCGTCCATGGCGGGTGGCCAGCAGCCGCCGCCGCAGAGTCTGCTGTACTCGCAGCCCGGAGGCTTCACGGTGAACGGCATGCTGAGCGCGC 1200 A H L P S M A G G Q Q PPP Q S L L Y S Q P G G F T V NG M L S A P 234 CCCCGGGGCCCGGCGGTGGCGGCGGTGGAGCGGGCGGTGGAGCCCAGAGCCTGGTTCACCCCGGGCTAGTGCGCGGGGACACGCCCGAGTTGGCGGAGCA 1300 P G P G CGC G G A CGGC A Q S L V H P G L V R G D T P E L A E l 267 TCATCACCACCATCACCACCACGCGCACCCGCACCCGCCGCACCCGCACCACGCGCAGGGACCCCCGCATCACGGCGGCGGCGGCGCGGGGCCAGGGCTC 1 400 S ISIS R NAA I9 P S P P R P I R A Q G P P H H G G G G A G P G L 300 -tO-S~CxlIC DONAll AACAGCCACGACCCTCACTCGGACGAGGACACGCCGACGTC ______T_ 1500 N S H DP HS D E D T P T-IS D L i Q i A X Q ir Q R R I X L GC T 334 1600

Q A D V C L A L C T L Y G N V r S Q T T I C R V Z A L Q L S F X N 367 FIG. 3. Nucleotide sequence LCAT 1700 and predicted amino acid se- M C X L X P L L N I W L IZ A D S S T G S P T S I D K I AA Q C R 400 quence of mouse Brain-] genomic

IRTCIRTS _CALI_T f C_ _ LIC PCowo I 1800 DNA. The POU domain is en- X R X R T 8 I 2 V S V G A L Z 9 R 7 L X C P K P 8 AL Q Z I T N L 434 closed in a box. The amino acid c ATGACGCCCCCTGGCATCCAGCAGCA 1900 repeats are shown in boldface A D S L Q L K I. Z V V RV W I CNc R Q III R T P P G I QQ Q 4267 type. The nucleotide and amino GACGCCGGACGATG TCTACTCGCAGGTGGGTACCG TGAGCGCTGACACACCGCCACCGCACCACGGGCTTCAGACCAGCGTGCAGTGAATGCCGGGGCGC 2000 T P D D V Y S Q V G T V S A D T P P P H H G L Q T S V Q * 495 acid sequences of the POU-spe- AGCGCGAAGAGGGCGCCaCCGC e~CCACTCCGCAGCCGCTGTCAGCACCGCCATGGTCACCGTCGCCGCCGTCTCCGAGCGCCGCCGC=GAGAA 2100 cific domain and POU homeo- 0 CCACCGCTGCTGCTGCTGCCGCCGCCGCCGCCGCCGTCACCGTGCGCGCCG GCCGCGCACCCAGCACCTGGCCAGTGCGAGCATCTA 2200 domain are also shown in boldface GGCTCCTCCCCTCCGCCCAAAGACAGGCGTGCCACGGCCCCAGGAGGGGGGAGGACAGCTGGAGACCGATGAGACTCTTTCTGGAACAGAAAGGAGG 2300 GACCAAAAAAATTTTTAAGCATAATAAATACCAAGACTGTTTTATATGCATATATAACAAACAAAAACCGGAAGAGGAAAAGGGGCAACAGGGACATCTC 2400 type. Underlined nucleotides are GTTTATACTGTGGTGGCGTTCTGCTTCTGTTTTGAAAGAAGGGTGAAGATGCCTAACGCACCTAAACTGCACACTTGAGGTACTGTCCACACCGAGATGT 2500 described in the text. Downloaded by guest on September 29, 2021 Neurobiology: Hara et al. Proc. Natl. Acad. Sci. USA 89 (1992) 3283

47,148. Identical nucleotide sequences were found with sites for protein kinase C, cGMP-dependent protein kinase, Brain-2 cDNA and genomic DNA; hence, the portion of and cAMP-dependent protein kinase known to be regulated Brain-2 genomic DNA that corresponds to the Brain-2 cDNA by intracellular levels of calcium ions, cGMP, or cAMP, does not contain an intron. Brain-2 protein contains repetitive respectively. The possibility that the rate of transsynaptic residues of glycine, glutamic acid, and proline. The first communication and the rate of expression of certain genes in-frame codon for methionine in the open reading frame of may be coupled by phosphorylation ofPOU domain proteins, the cDNA is shown in Fig. 4 as the putative codon for which may alter the ability of the protein to regulate genes, initiation of protein synthesis. Two overlapping polyadeny- is a problem for future study. lylation signals are present starting at nucleotide 2262. The 5' Protein-protein interactions between POU domain genes noncoding region of the gene contains 32 consecutive GT have been reported in some cases (18, 19). Homo- and repeats (not shown in Fig. 4) and repetitive GA nucleotide heterodimer formation by Brain-i, Brain-2, Brain4, and Scip sequences. The 3' noncoding region of the Brain-2 gene might generate proteins with different properties or specific- contains repetitive GT, GA, and AC nucleotide sequences. ities for regulating the expression of subsets of genes. Scip POU Domain Gene. A fourth mouse POU domain Class III POU domain genes have been found in nematodes gene, the Scip gene, was cloned and 2766 nucleotides were (20), (18, 21), amphibians (12), and mammals sequenced (not shown here). The nucleotide sequence found (5-11), which suggests that the ancestral class III POU for mouse Scip genomic DNA confirms the sequence that domain gene originated at least 6 x 108 years ago. The was reported for mouse Scip cDNA (9-11). No intron was absence of introns from the coding regions of the four mouse detected in the coding sequence of the Scip gene. POU domain genes and the similarity of amino acid se- Sequence Similarity. A comparison of the amino acid quences of the corresponding proteins suggests that the sequences of Brain-1, Brain-2, Brain-4, and Scip POU do- coding sequence of the ancestral mouse class III POU main proteins is shown in Fig. 5. The four proteins clearly are domain gene lacked introns and therefore may have origi- related to one another. The POU domain is the most highly nated by reverse transcription of a molecule of POU domain conserved region of each protein; however, many other mRNA, followed by insertion of the cDNA into germ cell regions of similarity are present. Brain-i, Brain-2, and Scip, genomic DNA. Thus, the coding sequence of the POU but not Brain-4, proteins contain amino acid repeats 5-27 domain gene would be duplicated, but not the introns or the amino acids long that are unique, rather than conserved, 5' upstream regulatory region of the original gene. The DNA parts of the proteins. Similar di- and trinucleotide repeats are sequences that regulate expression of the original POU present in the 5' and 3' noncoding regions of these genes but domain gene would be replaced by the regulatory sequences no other obvious sequence similarity was found in the of another gene near the cDNA insertion site. The new and noncoding regions compared. the original POU domain genes might well be expressed in Putative phosphorylation sites for different kinds ofprotein different cell types and at different times during development. kinases also are shown in Fig. 5. Many highly conserved It is likely that expression ofthe newly created class III POU putative phosphorylation sites are present in or near the POU domain gene would, in a combinatorial fashion, create a new domains of the four proteins and in the N-terminal regions. set of gene regulatory proteins that might interact in different The regions immediately before and after the POU-specific ways compared to the original set. Additional duplications of domain contain many highly conserved acidic amino acid the ancestral class III POU domain gene (or mRNA) would residues and some serine or threonine residues that are create the Brain-i, Brain-2, Brain4, and Scip genes. We putative sites for phosphorylation. If fully phosphorylated, suggest that other sets of gene regulators may have originated 7-9 of the 14 or 15 amino acid residues before and after the during evolution by formation of chimeric genes by splicing POU-specific domain would be acidic. These POU domain enhancer and promoter DNA sequences from one gene to proteins contain highly conserved putative phosphorylation DNA from a second gene that encodes a protein that regu-

GGGCGCCCGAGGGAAGAAGAGGGGGGTACAGCTCTGCACCAATCACTGGCTCCGGTCTGGGAGGTTGCTAGCGGTATCCACGTAAwATCAAAGGGCGCAGA 100 GCCAATGGGAGGGGGTGGAGGGGGCGGGGCCAGGCGCG TGCCGCTGCGAGCGGC TCTGCCAAjACGAaGGLAGAGAGCTTGAGAaCGCGGGGAGAGGGGG 200 AGCGCCCAGCGAGTCAGAGAGAGTGAGCGAGAGCGAGGAGGGAGAGGAGGAGAAAGAGCGAGGGCGGGCGGGCGGGCGGGAGGCAGCGCGGCAGCAGCAG 300 TAATAGCAAGAGCAGCAA CAGAAGGCGTCGGAGCGGGCGTCGGAGCTGC CCGCTAGGGCTCCTTTAACAAGAGAGGGC TAGACCGCTAGAGAGTGG 400 GAGAAGCGGGCGAGCGAGGGGAAACCCAAGGCAGAAAAGTAACTGTCAAATGCGCGGCTCCTTTAACCAGAGCGCCCAGTCCGGCTCCGAGAGTC 500 600 M A T A A S N H YS L L T S S A S I V H A E P P G G M Q Q G A G G Y 34 ACCGCGAGGCGCAGAGCCTGGTGCAGGGCGACTACGGCGCGCTGCAGAGCAACGGGCACCCGCTCAGCCACGCTCACCAGTGGATCACCGCGCTGTCCCA 700 R E A Q S L V Q G D Y G A L Q S N GH P L S H A H Q W I T A L S H 67 CGGCGGCGGCGGCGGGGGCGGCGGCGGCGGTGGAGGAGGCGGGGG AGGCGGCGACG GGCTCCCCGTGGTCCACCAGCCCCCTAGGCCAG 800 CC C C C C re re C C CC C CC C QC C CCC D G S P W S T S P L G Q 100 CCGGACATCAAGCCCTCGGTGGTGGTACAGCAGGGTGGCCGAGGCGACGAGCTGCACGGGCCAGGAGCGCTGCAGCAACAGCATCAACAGCAACAGCAAC 900 P D I K PS VV V Q Q G G R G D E LH G P G A L Q Q Q I Q Q Q Q Q Q 134 AGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAACAGCAGCAGCAACAACAGCGACCGCCACATC TGGTGCACCACGCTGCCAACCACCATCCCGGGCC 1000 Q Q QQQ Q QQQ Q Q QQQ Q Q Q R P P HL V H H A A N H HP G P 167 CGGGGCATGGCGGAGTGCGGCGGCTGCAGC TCACC TCCCTCCCTCCATGGGAGC TTCCAACGGCGGT TTGCTC TATTCGCAGCCGAGCTTCACGGTGAAC 1100 G A W R S A A A A A H L P P S M G A S N G G L LY S Q P S F T V N 200 GGCATGCTGGGCGCAGGAGGGCAGCCGGCTGGGCTGCACCACCACGGCCTGAGGGACGCCCACGATGAGCCACACCATGCAGACCACCACCCGCATCCGC 1200 G M L G A G G Q P A G LHH H G L R D A H D E P H H A D H H P H P H 234 ACTCTCACCCACACCAGCAACCGCCCC CGCCACCTCCCCCACAAGGCCCACCGGGCCACCCAGGC GCGCACCACGACCCGCAC TCGGACGAGGACACGCC 1300 S H P H Q Q P P P P P P P Q GP P G H P G A H H D P H SD E D T P 267 POU-SPCWPIC DCUKIN GACCTC 1400 T S D L I Q V A X Q F K Q R R I X L C I T Q A D V C L A L C T L 300 1500 FIG. 4. Nucleotide sequence Y G N V 8S Q T T X C Ri R A L Q L 8 KIX M C R LI P LL N XI L 334 LIR= POJ-IC0 DCIM and predicted amino acid se- TGQAAGIYGCAGACTCATCCTCGGGCAGCCCCACCAGCATAGACAAGATCGCAGCGCAACU C c _ 1600 quence of mouse Brain-2 genomic I Z A D S SS G S P TS I D K I A A Q C R X R XX R TS I Z V 8 V 367 DNA. The POU domain is en- CTCAAACCCTAACCCT a _ aG _ T _ 1700 X C A L ZI 8 r L X C P X P S A Q Z I TS L A D 8 L Q L III V V 400 closed in a box. The amino acid A~aOTTOTTTGAhA~rhraCAGAACA~u=TGACCECCTCCCGGAGGGACTCTDCCGGGCGCCGAGGATGTGTATGGGGDGTAGTAGGGACA 1800 repeats are shown in boldface R V WU C N R R Q X Z X R h T P P G G T L P G A E D V Y G G S R D T 434 type. The nucleotide and amino CGCCACCACACCACGGGGTGCAGACGCCCGTCCAGTGAACTCAAGCGGGGGAGGGGCAGAGCTTGGGGCTCCCCTCCCCTTTGGTCCACAGTCTTTCCTG 1900 P PH H G V Q T P V Q * 445 acid sequences of the POU-spe- GGACCTCTGTTTTCCCTCTAACAACTGATTGTTTTTTTTTTTTTTTAATTATTATTTTCCCGGTCCCTTAAAAAAGGAGAAAARAAAATAGAGAAAATAGG 2000 cific domain and POU homeo- AGAAAGGAAAGTAAAACACTGGACTATCCTATATCAGGTAGCAGGTGTAATAATGGTTTTTTGACCTTTGCAGGCGAGAGTACCCAGGCAATGAAGTAGA 2100 domain are also shown in boldface ATGAATGTCTCCTAGTGAGAATGAAGGTGTGTGTGTGTGTGTGTGTGTGTGTG TGTGTGTGTGTGTGTGTGGAGAGAGAGAGAGAGGGAGAGAGAGAGA 2200 GAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAAAGTCCTGGCAAAACTATTACCAAAAAGAG 2300 type. Underlined nucleotides are CAAACTGCTGCTTCCCAATGCTCAAAGTTGACACGTGCTGCTGTGTTTATTTAATATGGATCC 2364 described in the text. Downloaded by guest on September 29, 2021 3284 Neurobiology: Hara et al. Proc. Natl. Acad. Sci. USA 89 (1992)

S c F----lB 1 H Brain-i IMATAAS*YQPG*ILLTA4. 98 Brain-2 MATAA3IY.... LLTSS IVHAEPPGG...... GA ...... ITA 65 Brain-4 IMATAASPY....1 ILSSS ...... NP.KIQ8DEGB «VGDE>:2GSN N GHLGH . H . .8L 61 Scip LM P AA i A ...... F...... *. .GHPLSHAH 74 c c |.WVASAG[S 1 2 G _- ,_- W ..--, k . . . Brain-i r~jwwramm....PWAAAA...... ---- v-VM&SmomarQAVG)"POOPPOPPPPPP.nn*,wnr2"woAv wLuprwwrrwrrrrrrwry vows.. Px~u~mXL..- i i..GITiDATI------.u . 161 Brain-2 SNGGGGGGIGGG GGWA GGGG*SPW~...... ii-+.M--4M#E4*PGA4QQQHQQQQWQQQQWQQWID.M 148 Brain-4 .STIA ...... 94

Scip ... LE ...... I 110 H 1 Brain-i HHR*PHIJGPPP. PJQGJ iGWO PPQ S. SAPPGPG 250 Brain-2 QQQM*PHLMV.HH p eGA1zS AAA...... P ...... YSQ ...... AGQP...... 211 Brain-4 HHR . Hi-. - ...... TSG.....QPL .SQ. . EHGGLTP STQW 161 . Scip HAR ;. .Q GAGGGHQPQP YA.YP.. 184 c Gc M 2 2 Brain-i _.iX...... PE HPPHPHHA GINSHDFEWDEDTPT DLEQFAKQFKQRRIKLGFTQADVGLALG 343 Brain-2 D....PHHADHHPHPHS40PPPPPpP .... HHD E ISDEDTPT DDLEQFAKQFIQRRIKGFTQADVGLALG 298

Brain-4 ....HP...... A ...... PG....HHCQ. 220 Scip LEPSPPPHLGAHGHMGakGG ..G.EE HSDEDAPS 9 DLEQFAKQFKQRRIKLG=QDVGLALG 279 A Cs c G _ C G _ a 1 c 1 21 1 23 GC 2 2 l Brain-l TLYGNVFSQTICREALQL8FKNMCKLKPLLNKWLEEASTG8PT8IDKIAA0SRKRKKRT8IEVSVKGALESHFLKCPKPSAQEITb3ADSLQLEKE--- 443 Brain-2 TLYGNVFSQTTICREELQSAKMLQPLLNWLERSSG8PTBIDKIAAqaRSRKRTIEV8VRGALESHFLKCPRP8AQEIT8IADSLQLEKE 398 Brain-4 320 Scip 379 POU-WSUCIrC DOCUN PM-HOMO DONN c

2 G 42 1

Brain-i VVRVWFCNRQ QQQTP]F . . QVGTVSS . 495 Brain-2 VVRVWFCNRR P....A...... Gs.. 445

Brain-4 VVRVWF NR P P ...... V . 361 Scip 449

FIG. 5. Comparison of amino acid sequences of mouse Brain-1, Brain-2, Brain-4, and Scip POU domain proteins. Boxes represent amino acid sequence similarities. The following criteria for a box apply to two or more consecutive amino acid residues: (i) three or four proteins contain the same amino acid residue, (ii) at least two proteins contain the same amino acid residue and a third protein has a conservative amino acid replacement. Conservative amino acid replacement families defined by Dayhoff et al. (16) are as follows: (i) L, I, M, V; (ii) G, A, S, P, T; (iii) F, Y, W; (iv) E, D, Q, N; (v) R, K, H; (vi) C. Each dot represents a gap. S, T, and Y residues shown in boldface type correspond to putative consensus phosphorylation sites; each letter or number above the site corresponds to one of the following protein kinase abbreviations: A, cAMP-dependent protein kinase; G, cGMP-dependent protein kinase; C, protein kinase C; H, growth-associated histone H1 kinase; M, calmodulin-dependent protein kinase II; P, phosphorylase kinase; S, glycogen synthase kinase-3; Y, tyrosine protein kinase; 1, casein kinase I; 2, casein kinase II. Consensus phosphorylation sites are described by Pearson and Kemp (table II in ref. 17). Putative phosphorylation sites present in two or more proteins are shown in boldface type. The amino acid sequence of mouse Scip protein was deduced from the nucleotide sequence obtained for mouse genomic DNA; the data confirm the sequence reported for mouse Scip cDNA (9-11).

lates gene expression. The most effective sets of gene regu- Swanson, L. W. & Rosenfeld, M. G. (1991) Mol. Cell. Biol. 11, lators would be retained by selection. 1739-1744. 9. Susuki, N., Rohdewohld, H., Neuman, T., Gruss, P. & Scho- ler, H. R. (1990) EMBO J. 9, 3723-3732. We thank K. Huppi for the gift of a mouse genomic DNA library, 10. Meijer, D., Graus, A., Kraay, R., Langeveld, A., Mulder, C. Le Moine and W. S. Young for exchanging sequence information M. P. & Grosveld, G. (1990) Nucleic Acids Res. 19, 7357-7365. prior to publication, and A. Peterkofsky for comments on the 11. Zimmerman, E. C., Jones, C. M., Fet, M., Hogan, B. L. M. & manuscript. Magnuson, M. A. (1991) Nucleic Acids Res. 19, 956. 12. Agarwal, V. R. & Sato, S. M. (1991) Dev. Biol. 147, 363-373. 1. Herr, W., Sturm, R. A., Clerc, R. G., Corcoran, L. M., Bal- 13. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl. timore, D., Sharp, P. A., Ingraham, H. A., Rosenfeld, M. G., Acad. Sci. USA 74, 5463-5467. Finney, M., Ruvkun, G. & Horvitz, H. R. (1988) Genes Dev. 14. Davis, L. G., Dibner, M. D., & Battey, J. F. (1986) Basic 2, 1513-1516. Methods in Molecular Biology (Elsevier, New York), p. 130. 2. Rosenfeld, M. G. (1991) Genes Dev. 5, 897-907. 15. Le Moine, C. & Young, S. W., III (1992) Proc. Natl. Acad. Sci. 3. Ingraham, H. A., Flynn, S. E., Voss, J. W., Albert, V. R., USA 89, 3285-3289. Kapiloff, M. S., Wilson, L. & Rosenfeld, M. G. (1990) Cell 61, 16. Dayhoff, M. O., Schwartz, R. M. & Orcutt, B. C. (1979) in Atlas ofProtein Sequence and Structure, ed. Dayhoff, M. 0. 1021-1033. (Natl. Biomed. Res. Found., Washington), pp. 345-352. 4. Verrijzer, C. P., Kal, A. J. & van der Vliet, P. C. (1990) Genes 17. Pearson, R. & Kemp, B. E. (1991) Methods Enzymol. 200, Dev. 4, 1964-1974. 62-81. 5. He, X., Treacy, M. N., Simmons, D. M., Ingraham, H. A., 18. Treacy, M. N., He, X. & Rosenfeld, M. G. (1991) Nature Swanson, L. W. & Rosenfeld, M. G. (1989) Nature (London) (London) 350, 577-584. 340, 35-42. 19. Voss, J. W., Wilsom, L. & Rosenfeld, M. G. (1991) Genes Dev. 6. Monuki, E. S., Weinmaster, G., Kuhn, R. & Lemke, G. (1989) 5, 1309-1320. Neuron 3, 783-793. 20. Burglin, T. R., Finney, M., Coulson, A. & Ruvkun, G. (1989) 7. Monuki, E. S., Kuhn, R., Weinmaster, G., Trapp, B. & Nature (London) 341, 239-243. Lemke, G. (1990) Science 249, 1300-1303. 21. Johnson, W. A. & Hirsh, J. (1990) Nature (London) 343, 8. He, X., Gerrero, R., Simmons, D. M., Park, R. E., Lin, C. R., 467-470. Downloaded by guest on September 29, 2021