Genomic Sequence Analysis of the MHC Class I G/F Segment in Common Marmoset (Callithrix jacchus)

Azumi Kono, Markus Brameier, Christian Roos, Shingo Suzuki, Atsuko Shigenari, Yoshie Kametani, Kazutaka Kitaura, Takaji Matsutani, Ryuji Suzuki, Hidetoshi Inoko, Lutz Walter and Takashi Shiina. J Immunol published online 5 March 2014. http://www.jimmunol.org/content/early/2014/03/05/jimmunol.1302745

Supplementary Material: http://www.jimmunol.org/content/suppl/2014/03/05/jimmunol.1302745.DCSupplemental


The Journal of Immunology is published twice each month by The American Association of Immunologists, Inc., 1451 Rockville Pike, Suite 650, Rockville, MD 20852 Copyright © 2014 by The American Association of Immunologists, Inc. All rights reserved. Print ISSN: 0022-1767 Online ISSN: 1550-6606. Published March 5, 2014, doi:10.4049/jimmunol.1302745 The Journal of Immunology


Azumi Kono,* Markus Brameier,† Christian Roos,†,‡ Shingo Suzuki,* Atsuko Shigenari,* Yoshie Kametani,§ Kazutaka Kitaura,¶ Takaji Matsutani,¶ Ryuji Suzuki,¶ Hidetoshi Inoko,* Lutz Walter,†,‡ and Takashi Shiina*

The common marmoset (Callithrix jacchus) is a New World monkey that is used frequently as a model for various human diseases. However, detailed knowledge about the MHC is still lacking. In this study, we sequenced and annotated a total of 854 kb of the common marmoset MHC region that corresponds to the HLA-A/G/F segment (Caja-G/F) between the Caja-G1 and RNF39 genes. The sequenced region contains 19 MHC class I genes, of which 14 are of the MHC-G (Caja-G) type, and 5 are of the MHC-F (Caja-F) type. Six putatively functional Caja-G and Caja-F genes (Caja-G1, Caja-G3, Caja-G7, Caja-G12, Caja-G13, and Caja-F4), 13 pseudogenes related either to Caja-G or Caja-F, three non-MHC genes (ZNRD1, PPP1R11, and RNF39), two miscRNA genes (ZNRD1-AS1 and HCG8), and one non-MHC pseudogene (ETF1P1) were identified. Phylogenetic analysis suggests segmental duplications of units consisting of basically five (four Caja-G and one Caja-F) MHC class I genes, with subsequent expansion/deletion of genes. A similar genomic organization of the Caja-G/F segment has not been observed in catarrhine primates, indicating that this genomic segment was formed in New World monkeys after the split of New World and Old World monkeys. The Journal of Immunology, 2014, 192: 000–000.

Common marmosets (Callithrix jacchus) are small New World monkeys that have gained much attention and importance as experimental animals because of their smaller body size, easier handling, shorter generation time, and lower costs compared with the more commonly studied Old World monkeys, such as the rhesus (Macaca mulatta) and cynomolgus (Macaca fascicularis) macaques. Marmosets are used in different biomedical research fields, such as Parkinson’s disease (1), drug toxicology (2, 3), transplantation (4), transgenic techniques and stem cell research (5, 6), immunity and infectious diseases (7–9), and autoimmune diseases (10). Many of these diseases are essentially influenced by polymorphisms of the MHC region (11). Hence, detailed knowledge of the experimental animals’ MHC genomic region is a prerequisite to understand the role of MHC genes in these diseases and to refine the animal model.

Detailed knowledge of the marmoset MHC (Caja) is only starting to emerge; because of the duplicated nature of this region, only fragmented sequences are available from the C. jacchus whole-genome shotgun-sequencing draft assembly. The overall structure of the MHC class I region is largely similar to the HLA, common chimpanzee (Patr), and rhesus macaque (Mamu) corresponding regions (12). The Caja class I region is divided into three segments: HLA-B/C corresponding Caja-B segment, HLA-E corresponding Caja-E segment, and HLA-A/G/F corresponding Caja-G/F segment. From the MHC-B/C segment, which has been under permanent selective pressure in the evolution of primates, a genomic sequence of 1179 kb was determined that includes nine duplicated Caja-B genes (12). In contrast, although four to seven Caja-G–like alleles were detected in each common marmoset from a recent population study (13), the genomic structure of the Caja-G/F segment between GABBR1 and ZNRD1 is still not solved, and 54 alleles of Caja-G and Caja-E are currently listed in the Immuno Polymorphism Database (IPD)-MHC database (14).

In this study, we used seven bacterial artificial chromosome (BAC) clones to determine and annotate a total 854-kb genomic sequence between the Caja-G1 and RNF39 genes, including the Caja-G/F segment, and we characterized Caja-G and Caja-F (Caja-G/F) genes and the evolutionary process of the Caja-G/F segment by phylogenetic analysis.

*Division of Basic Medical Science and Molecular Medicine, Department of Molecular Life Science, Tokai University School of Medicine, Isehara, Kanagawa 259-1143, Japan; †Primate Genetics Laboratory, German Primate Center, Leibniz Institute for Primate Research, 37077 Göttingen, Germany; ‡Bank of Primates, German Primate Center, Leibniz Institute for Primate Research, 37077 Göttingen, Germany; §Division of Basic Medical Science and Molecular Medicine, Department of Immunology, Tokai University School of Medicine, Isehara, Kanagawa 259-1143, Japan; and ¶Department of Rheumatology and Clinical Immunology, Clinical Research Center for Allergy and Rheumatology, Sagamihara, Kanagawa 228-0815, Japan

Received for publication October 11, 2013. Accepted for publication January 17, 2014.

This work was supported by Scientific Research on Science and Technology of Japan and a Grant-in-Aid for Scientific Research (B) (24300155) from the Japan Society for the Promotion of Science, the program “Pakt für Forschung und Innovation” Grant “Biodiversity” of the Leibniz Society, and institutional support from the German Primate Center.

Address correspondence and reprint requests to Dr. Takashi Shiina, Department of Molecular Life Science, Division of Basic Medical Science and Molecular Medicine, Tokai University School of Medicine, 143 Shimokasuya, Isehara, Kanagawa 259-1143, Japan. E-mail address: [email protected]

The online version of this article contains supplemental material.

Abbreviations used in this article: BAC, bacterial artificial chromosome; BLAST, basic local alignment search tool; BLAT, BLAST-like alignment tool; contig, contiguous; IPD, Immuno Polymorphism Database; LINE, long interspersed nuclear element; LTR, long terminal repeat; MID, multiple identifier; NJ, neighbor joining; ver., version.

Copyright © 2014 by The American Association of Immunologists, Inc. 0022-1767/14/$16.00

Materials and Methods

Construction of a contiguous map of overlapping BAC clones of the Caja-G/F segment

BAC library CHORI-259 (constructed from kidney cells obtained from a male common marmoset) was obtained from the BACPAC Resource Center at the Children’s Hospital Oakland Research Institute (Oakland, CA; http://bacpac.chori.org/home.htm). Hybridization screening was performed following the recommended protocols. As probes for BAC library screening, we used PCR products of 800–900 bp from the common

marmoset MHC-I genes (exons 2–4), as well as one unique miscRNA, ZNRD1-AS1 (Supplemental Table I). Positive BAC clones were ordered by comparison of EcoRI and HindIII restriction fragments and PCR-based mapping, with 11 locus-specific primer sets obtained by BAC-end sequencing (Supplemental Table I). All PCR reactions were performed using Ex Taq DNA polymerase (TaKaRa, Shiga, Japan) and the GeneAmp PCR System 9700 (Life Technologies, Carlsbad, CA).

Nucleotide sequencing

BAC clones 205N13, 62H24, 195P8, 88K6, 246G18, 41G14, and 211G14 were subjected to nucleotide sequence determination by combination of massively parallel pyrosequencing (15) and/or bidirectional shotgun sequencing (16). Pyrosequencing was performed following the recommended protocol for the Roche 454 GS Junior Bench Top System (Roche, Basel, Switzerland). Briefly, titanium rapid libraries of fragmented DNA linked to AMPure beads (Beckman Coulter Genomics, Danvers, MA) were prepared for the Roche GS Junior Bench Top System by nebulization, fragment end repair, and multiple identifier (MID)-labeled adaptor ligation and were subjected to emulsion PCR and emulsion breaking, which were performed according to the manufacturer’s protocol (Roche) (17). After the emulsion-breaking step, the beads carrying the ssDNA templates were enriched and counted, and 0.5 million beads were deposited into a PicoTiterPlate to obtain sequence reads (17). After the sequencing run, image processing, signal correction, and base-calling were performed using GS Run Processor version (ver.) 2.5 (Roche), with full processing for shotgun or paired-end filter analysis. Quality-filter sequence reads that passed the assembler software (single SFF file) were binned on the basis of the MID labels into four separate sequence SFF files using SFF file software (Roche). These files were further quality trimmed to remove poor sequence at the end of the reads with quality values < 20. The trimmed and MID-labeled sequence reads were assembled with 95% matched parameters using GS De Novo Assembler ver. 2.5 (Roche).

GS Junior sequencing also was performed with cDNA from various organs derived from a single common marmoset. Primers and sequencing were essentially as described by O’Leary et al. (18). Obtained sequences were compared by basic local alignment search tool (BLAST) search with Caja-G and Caja-E cDNA sequences available from public databases (IPD), as well as with Caja-B1, Caja-B3, Caja-B4, Caja-B6, and Caja-B7 sequences described by Shiina et al. (12). Sequence reads from individual transcripts were counted, and the frequencies of reads matching a database sequence were calculated. Because Caja-B sequences are very similar, we considered only those sequencing reads that were identical to reported Caja-B sequences.

Shotgun sequencing also was performed by the cycle-sequencing method using AmpliTaq DNA Polymerase FS fluorescently labeled BigDye terminators in the GeneAmp PCR System 9700 (Life Technologies). A 3130xl Genetic Analyzer was used for automated Sanger sequencing (Life Technologies). Individual sequences were minimally edited to remove vector sequences and were assembled into Sequencher ver. 5.0.1 (Gene Codes, Ann Arbor, MI), along with the assembled consensus sequences obtained by the next-generation sequencing method. Remaining gaps or ambiguous nucleotides were determined by the direct sequencing of PCR products obtained with appropriate PCR primers or by nucleotide sequence determination of shotgun clones.

Sequence analysis and annotation

General sequence analysis was performed using GENETYX-MAC software ver. 12.2.7 (Genetyx, Tokyo, Japan). Nucleotide similarities between sequences were calculated by Sequencher; those with nucleotide sequences in GenBank/European Molecular Biology Laboratory/DNA Data Base in Japan were searched using the BLAST program (http://www.ncbi.nlm.nih.gov/BLAST/), whereas those with C. jacchus whole-genome shotgun-sequencing draft assembly (WUGSC 3.2 (GCA_000004665.1)) generated by the Washington University Genome Sequencing Center (St. Louis, MO) and the Baylor College of Medicine (Houston, TX) were searched with BLAST-like alignment tool (BLAT) (http://genome.ucsc.edu/cgi-bin/hgBlat?command=start). Prediction of coding sequences was performed using GENSCAN (http://genes.mit.edu/GENSCAN.html). Identification and classification of repeat sequences were performed using RepeatMasker 2 (http://repeatmasker.genome.washington.edu/).

Phylogenetic analysis

Multiple-sequence alignment was created using the ClustalW Sequence Alignment program of Molecular Evolution Genetics Analysis software 5 (http://www.megasoftware.net/) (19). Phylogenetic trees of the MHC-I genes were constructed by the neighbor-joining (NJ) method (Molecular Evolution Genetics Analysis software 5) (20), using genomic sequences of exons 3–8 (alignment length: 1539 bp, excluding gap sites). NJ trees were constructed by the Maximum Composite Likelihood model and assessed using 10,000 bootstrap replicates. We used the following MHC-I sequences (DNA accession numbers) for phylogenetic analyses: HLA-A (NG_029217); HLA-B (NG_023187); HLA-C (NG_029422); HLA-E and HLA-F (NT_113891); HLA-G (NG_029039); Patr-A, Patr-B, Patr-C, Patr-E, Patr-F, and Patr-G (BA000041); Mamu-A1 (AB128833); Mamu-B1 (AB128860); Mamu-E (AB128840); Mamu-F (AB128841); Mamu-AG3 (AB128837); Mamu-G5 (AB128049); and Caja-B1, Caja-B3, Caja-B4, Caja-B6, and Caja-B7 (AB600201, AB600202).

Results

Construction of a BAC contiguous map covering the Caja-G/F segment

A total of 22 BAC clones, covering the genomic segment between the GABABR1 and RNF39 genes, was isolated by hybridization screening and assembled into three clusters (tentatively named contigs 1–3) based on comparison of restriction fragments and PCR-based mapping (Fig. 1A). In this process, clone 171K8, which overlaps 205N13, was thought to be suitable for genomic sequencing, but we excluded this clone from the following sequencing process because the draft sequence was already released (accession number: GL286223). Therefore, BAC clones 205N13, 62H24, 195P8, 88K6, 246G18, 41G14, and 211G14 were subjected to complete sequencing using a combination of massively parallel pyrosequencing and bidirectional shotgun-sequencing methods (Fig. 1A).

Genomic sequence information of the Caja-G/F segment

Nucleotide sequencing of the seven selected BAC clones was performed independently with the Roche GS Junior System. Table I lists all relevant parameters obtained from Roche GS Junior sequencing. After shotgun sequencing, gap filling, and validation of the assembled sequences, we determined the complete genomic sequences ranging between 183,844 bp (88K6) and 212,178 bp (205N13) (Table I).

Overlaps between all BAC clones were ascertained at the sequence level. Four overlaps between the BAC clones are present, and two overlaps (62H24/195P8 and 195P8/88K6) are not identical: the 67-kb overlap between 62H24 and 195P8 has 99.0% nucleotide identity, and the 124-kb overlap between BAC clones 195P8 and 88K6 shows 99.6% nucleotide identity. Therefore, these data support allelic relationships of all junctions. Finally, we constructed the 470,958-bp contiguous sequence using 274,255 bp of 205N13 and 62H24, 12,859 bp of 195P8, and 183,844 bp of 88K6. Contig 2 contains 191,375 bp derived from BAC clones 246G18 and 41G14, and contig 3 contains a 192,325-bp sequence derived from BAC clone 211G14. In total, an 854,658-bp genomic sequence (contig 1: 470,958 bp, contig 2: 191,375 bp, and contig 3: 192,325 bp; accession numbers AB809558–AB809560 in the DNA Data Base in Japan, European Molecular Biology Laboratory, and GenBank nucleotide sequence databases [http://www.ncbi.nlm.nih.gov/genbank/]) was determined.

Comparison of the Caja-G/F segment with C. jacchus draft assembly WUGSC 3.2

To compare the Caja-G/F segment with the C. jacchus draft assembly, we performed BLAT searches of our three clusters with WUGSC 3.2 sequence data using a sliding window of 25 kb. A high degree of sequence identity, ranging between 97.6 and 100%, was found for 150 kb of contig 1, 50 kb of contig 2, and 92 kb of contig 3, with draft sequences of Chr.4_GL285015, Chr.4_GL285016, and Chr.4_GL285017 scaffolds (Table II). The other sequences from the clusters show high similarities with some short parts of

draft sequences that are derived from chr. 4 or other or unknown chromosomal locations.

FIGURE 1. The sequence-ready BAC contig map (A) and gene structure (B) of the 854-kb Caja-G/F segment between the GABABR1 and RNF39 genes. (A) Three contigs of BAC clone sequences encompassing 470,958 bp, 191,375 bp, and 192,325 bp are shown. In addition, information about BAC clone 171K8 (GL286223) obtained from the public database is shown. Red bars indicate overlapping BAC clones used for genomic sequencing. Circles indicate locations of 11 sequence-tagged site markers used for PCR-based mapping. (B) Red, blue, and black text indicate Caja-F, Caja-G, and non-MHC genes, respectively. Red, black, blue, and green boxes indicate functional MHC genes, pseudogenes, non-MHC genes, and miscRNA, respectively. Upper and lower boxes indicate transcriptional orientation.

Genomic structure of the Caja-G/F segment

The GC content of the Caja-G/F segment is almost the same as in other primate MHC-A/G/F segments, ranging from 43.9% in Caja to 45.2% in HLA (Table III). An analysis of the segment using the RepeatMasker 2 program unveiled the following frequencies of interspersed repeats: 5.1% short interspersed nuclear elements (Alus + mammalian interspersed repeats [MIRs]), 31.7% long interspersed nuclear elements (LINEs) (LINE1 + LINE2 + L3/CR1), 13.0% long terminal repeat (LTR) elements, and 2.6% DNA elements. These repeats collectively occupied 53.7% of the Caja-G/F segment. The densities of the LTR and DNA elements of the Caja-G/F segment are much lower than for other simian MHC-A/G/F segments, but the LINE density of the segment is much higher than for other simian MHC-A/G/F segments (Table III).

The 854-kb genomic sequence stretching from Caja-G1 to RNF39 was subjected to gene-identification analysis using BLAST and GENSCAN, which revealed the presence of 25 genes within this segment (Fig. 1B, Table IV). Among them are 19 Caja-G/F genes, comprising 6 putatively functional Caja-G/F genes (Caja-G1, Caja-G3, Caja-G7, Caja-G12, Caja-G13, and Caja-F4), 8 Caja-G/F pseudogenes with frequently observed nonsense mutations and/or indels in their coding regions (Caja-G2, Caja-G6, Caja-G8, Caja-G11, Caja-G14, Caja-F1, Caja-F2, and Caja-F5), and 5 truncated-type Caja-G/F pseudogenes with deletion of exons 1 and 2 (Caja-G4, Caja-G5, Caja-G9, and Caja-G10) or exons 6 and 7 (Caja-F3); 3 non-MHC genes (ZNRD1, PPP1R11, and RNF39); 2 miscRNA genes (ZNRD1-AS1 and HCG8); and 1 non-MHC pseudogene (ETF1P1) (Fig. 1B, Table IV). Nucleotide similarities on coding sequences of the three non-MHC class I genes with their human orthologs were 95%, on average, and ranged from 94% (RNF39) to 96% (PPP1R11) (Table IV). Finally, sequences showing similarity to common marmoset MIC1 and MIC2 genes (12) were not observed in this segment.

Structural characteristics of the Caja-G/F genes

Six Caja-G/F genes (Caja-G1, Caja-G3, Caja-G7, Caja-G12, Caja-G13, and Caja-F4) have open reading frames coding for 360 (Caja-F4) to 365 (Caja-G1) amino acids, similar to the expressed

Table I. Sequence information obtained by Roche GS Junior Bench Top System

| BAC ID | Draft Read Numbers (a) | Draft Read Bases (Mb) | Average Read Length (bp) | Average Quality Value | Contig Numbers | Average Sequence Depth (b) | Complete Sequence Length (bp) (c) |
|---|---|---|---|---|---|---|---|
| 205N13 (d) | 30,723 | 13.0 | 422 | 29.5 | 5 | 57.2 | 212,178 |
| 62H24 | 71,627 | 27.3 | 381 | 28.9 | 5 | 129.6 | 195,377 |
| 195P8 (d) | 22,704 | 7.5 | 330 | 25.8 | 7 | 37.0 | 190,366 |
| 88K6 | 44,962 | 17.4 | 388 | 29.2 | 3 | 92.5 | 183,844 |
| 246G18 | 47,306 | 18.4 | 389 | 29.3 | 4 | 90.1 | 186,918 |
| 41G14 (d) | 23,296 | 9.6 | 411 | 28.9 | 4 | 47.0 | 186,871 |
| 211G14 (d) | 19,059 | 6.2 | 324 | 25.7 | 8 | 30.3 | 192,325 |
| Average | 37,097 | 14.2 | 382 | 28.7 | 5.1 | 69.1 | |

(a) Draft reads having >20 quality values.
(b) The average number of nucleotides contributing to a portion of an assembly.
(c) The length excludes BAC vector.
(d) Sequence was determined by next-generation sequencing and Sanger shotgun-sequencing method.
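As a quick consistency check on Table I, the draft read bases of each BAC clone should roughly equal the number of draft reads multiplied by the average read length. The Python sketch below is illustrative only: the dictionary layout, the function names, and the 0.05-Mb tolerance are assumptions introduced here, and only the numbers are taken from Table I.

```python
# Values transcribed from Table I:
# BAC ID -> (draft read number, reported draft read bases in Mb, average read length in bp)
TABLE_I = {
    "205N13": (30_723, 13.0, 422),
    "62H24": (71_627, 27.3, 381),
    "195P8": (22_704, 7.5, 330),
    "88K6": (44_962, 17.4, 388),
    "246G18": (47_306, 18.4, 389),
    "41G14": (23_296, 9.6, 411),
    "211G14": (19_059, 6.2, 324),
}


def estimated_megabases(read_count: int, mean_read_length_bp: int) -> float:
    """Estimate total read bases (Mb) as read count times mean read length."""
    return read_count * mean_read_length_bp / 1e6


def inconsistent_rows(table: dict, tolerance_mb: float = 0.05) -> list:
    """Return BAC IDs whose reported Mb differs from the estimate by more than the tolerance."""
    flagged = []
    for bac_id, (reads, reported_mb, mean_len) in table.items():
        if abs(estimated_megabases(reads, mean_len) - reported_mb) > tolerance_mb:
            flagged.append(bac_id)
    return flagged


if __name__ == "__main__":
    for bac_id, (reads, reported_mb, mean_len) in TABLE_I.items():
        estimate = estimated_megabases(reads, mean_len)
        print(f"{bac_id}: estimated {estimate:.2f} Mb, reported {reported_mb} Mb")
    print("Rows outside tolerance:", inconsistent_rows(TABLE_I))
```

The average sequence depth in Table I is reported by the assembler and is not recomputed here, because it depends on which reads were actually incorporated into the assembly.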

MHC-I genes of other primates. The transcriptional orientation of the genes is the same as for MHC-A, MHC-G, and MHC-F genes in HLA, Patr, and Mamu. Nucleotide and amino acid similarities of the coding sequences among the six Caja-G/F genes range from 74.3 to 96.9% and 74.2 to 93.7%, respectively, with Caja-F4 showing the lowest nucleotide and amino acid similarities with the other Caja-G genes (Supplemental Table II).

Nucleotide and amino acid similarities to previously released Caja-B, Caja-E, and Caja-G sequences are shown in Supplemental Table III. The nucleotide sequences of Caja-G1, Caja-G12, and Caja-G13 are identical to Caja-G*18:02 (IPD Accession Number: MHC04993), Caja-G*07:01:01 (MHC05010), and Caja-G*20:01 (MHC05027), respectively; Caja-G3 and Caja-G7 show the closest similarities to Caja-G*17:01 (MHC04991), with one nucleotide difference, and Caja-G*08:05 (MHC04982), with three nucleotide differences, respectively. The formal allele names of Caja-G3 and Caja-G7 were assigned to Caja-G*17:03 and Caja-G*08:06, respectively. In contrast, Caja class I sequences showing high similarities (>85%) to Caja-F4 were not observed in known Caja class I sequences.

All of the deduced Caja-G amino acid sequences contain the highly conserved and structurally essential amino acid residues, such as N-glycosylation sites (amino acid positions 86–88), four cysteine residues involved in disulfide bonding (positions 101, 164, 203, and 259) and CD8 binding, as well as most of the α-chain and β2 microglobulin contact sites, compared with the three-dimensional structure of HLA-A2 molecule (21, 22) (Supplemental Fig. 1). In addition, α helices, β sheets, and transmembrane structures of all six Caja-G/F sequences are similar to other MHC-A class I proteins (data not shown). Therefore, the six Caja-G/F amino acid

Table II. Nucleotide similarities with C. jacchus draft assembly

| Sequence Block for BLAT Search (kb) | BLAT Score | Nucleotide Similarity (%) | High Homologous Contig Name and Position |
|---|---|---|---|
| Contig 1 (470,958 bp) | | | |
| 1–25 | 15,571 | 99.7 | chr4_GL285016_random:1-17113 |
| 25–50 | 13,706 | 99.7 | chr4_GL285016_random:28197-43745 |
| 50–75 | 17,514 | 100.0 | chr4_GL285016_random:43748-61279 |
| 75–100 | 20,755 | 99.8 | chr4_GL285017_random:1-20857 |
| 100–125 | 24,808 | 99.8 | chr4_GL285017_random:20858-59400 |
| 125–150 | 24,471 | 99.7 | chr4_GL285017_random:59401-84067 |
| 150–175 | 18,102 | 99.1 | chrUn_GL288610:4330-22932 |
| 175–200 | 17,906 | 97.7 | chrUn_ACFV01182568:1-19094 |
| 200–225 | 5,663 | 96.1 | chr15:83425451-83431720 |
| 225–250 | 20,652 | 95.8 | chr4:30234516-30263704 |
| 250–275 | 16,381 | 99.3 | chr4_GL285062_random:15-18019 |
| 275–300 | 22,767 | 96.3 | chr4_GL285015_random:14851-46332 |
| 300–325 | 6,220 | 98.8 | chr4_GL285063_random:2240-12425 |
| 325–350 | 16,220 | 95.2 | chr4_GL285017_random:64676-83428 |
| 350–375 | 13,704 | 99.6 | chrUn_GL286689:3650-19091 |
| 375–400 | 14,475 | 94.0 | chrUn_ACFV01182568:1-16863 |
| 400–425 | 14,247 | 98.8 | chrUn_GL289115:8-15489 |
| 425–450 | 22,998 | 98.2 | chr4:30235531-30259521 |
| 450–471 | 14,512 | 99.6 | chr4_GL285015_random:1-14659 |
| Contig 2 (191,375 bp) | | | |
| 1–25 | 13,263 | 97.6 | chrUn_ACFV01182568:5387-19665 |
| 25–50 | 4,970 | 92.1 | chr13:74755841-74755841 |
| 50–75 | 18,322 | 94.0 | chr4:30243303-30686113 |
| 75–100 | 24,680 | 99.5 | chr4_GL285015_random:361-25343 |
| 100–125 | 24,682 | 99.5 | chr4_GL285015_random:25344-61338 |
| 125–150 | 16,690 | 98.6 | chrUn_ACFV01178635:1-177227 |
| 150–175 | 12,500 | 98.7 | chr4_ACFV01183594_random:2131-14992 |
| 175–191 | 8,800 | 96.6 | chr4_GL285068_random:1-12959 |
| Contig 3 (192,325 bp) | | | |
| 1–25 | 7,774 | 98.0 | chrUn_ACFV01178635:1239-9376 |
| 25–50 | 6,412 | 98.6 | chr4_ACFV01183594_random:8366-14992 |
| 50–75 | 11,800 | 96.7 | chr4_GL285068_random:1-16181 |
| 75–100 | 10,676 | 96.5 | chr4:30274821-30286711 |
| 100–125 | 23,400 | 97.6 | chr4:30243836-30317430 |
| 125–150 | 22,863 | 99.5 | chr4:30317431-30346654 |
| 150–175 | 23,395 | 99.8 | chr4:30346655-30373685 |
| 175–192 | 17,174 | 99.6 | chr4:30373686-30391011 |
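The sequence blocks in Table II follow from cutting each contig into consecutive 25-kb query windows before the BLAT search. A minimal Python sketch of such a windowing step is given below. It assumes non-overlapping windows whose step equals the window size, which is how the blocks in Table II appear to be laid out; the function name and the half-open coordinate convention are hypothetical choices made for illustration, and the BLAT search itself is not reproduced.

```python
def window_blocks(length_bp: int, window_bp: int = 25_000) -> list:
    """Split a sequence of length_bp into consecutive windows.

    Returns half-open (start, end) coordinates in bp; the last window is
    shortened so that it ends exactly at the end of the sequence.
    """
    blocks = []
    start = 0
    while start < length_bp:
        end = min(start + window_bp, length_bp)
        blocks.append((start, end))
        start = end
    return blocks


# Contig lengths as reported in the text and in Table II.
CONTIG_LENGTHS = {
    "contig 1": 470_958,
    "contig 2": 191_375,
    "contig 3": 192_325,
}

if __name__ == "__main__":
    for name, length in CONTIG_LENGTHS.items():
        blocks = window_blocks(length)
        first, last = blocks[0], blocks[-1]
        print(f"{name}: {len(blocks)} windows, first {first}, last {last}")
```

With these lengths the sketch yields 19, 8, and 8 windows for contigs 1, 2, and 3, which matches the number of rows listed per contig in Table II.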

sequences are expected to have a typical MHC class I structure. Moreover, 36 residues of the 57 Ag binding sites are diverged among the Caja-G/F sequences (Fig. 2), suggesting that the six Caja-G/F sequences differ in their Ag-binding characteristics. In the Caja-F4 sequence, 22 amino acid residues (19 amino acid changes, 3 amino acid deletions) were uniquely different from the other sequences, and changes in the amino acid polarity were observed in five sites (positions 66, 69, 116, 150, and 151) (Fig. 2), clearly distinguishing Caja-F sequences from Caja-G sequences.

Table III. Features of interspersed repeat sequences in the MHC-A/G/F segment

| Species | Contig | Reference Sequence | Nucleotide Length (bp) | GC Content (%) | SINEs (%) | LINEs (%) | LTRs (%) | DNA Elements (%) | Total (%) |
|---|---|---|---|---|---|---|---|---|---|
| Caja | Contig 1 | AB809558 | 470,958 | 44.0 | 4.7 | 33.5 | 12.9 | 2.4 | 55.3 |
| Caja | Contig 2 | AB809559 | 191,375 | 44.2 | 5.1 | 28.9 | 13.4 | 2.4 | 51.4 |
| Caja | Contig 3 | AB809560 | 192,325 | 43.4 | 6.0 | 29.8 | 12.5 | 3.0 | 52.4 |
| Caja | In total | | 854,658 | 43.9 | 5.1 | 31.7 | 13.0 | 2.6 | 53.7 |
| HLA | | BA000025 | 361,129 | 45.2 | 7.4 | 20.8 | 21.2 | 4.8 | 54.1 |
| Patr | | BA000041 | 323,419 | 45.0 | 7.4 | 15.4 | 23.3 | 4.3 | 50.4 |
| Mamu | | AB128049 | 937,292 | 44.5 | 6.6 | 18.8 | 21.6 | 4.5 | 51.4 |

GC, percentage of guanine and cytosine; SINE, short interspersed nuclear element.

Phylogenetic relationship of the Caja-G/F genes

To study the phylogenetic relationship of the Caja-G/F genes, we constructed a phylogenetic tree using the NJ method based on 42 MHC class I nucleotide sequences covering exons 3–8. The phylogenetic tree shows five major lineages: MHC-A/AG, MHC-B/C, MHC-E, MHC-F, and MHC-G (Fig. 3). As expected, the HLA, Patr, and Mamu genes show close relationships, such as HLA-G, Patr-G, and Mamu-G5 in MHC-G lineage; HLA-A, Patr-A, Mamu-A1, and Mamu-AG3 in MHC-A lineage; HLA-F, Patr-F, and Mamu-F in MHC-F lineage; HLA-E, Patr-E, and Mamu-E in MHC-E lineage; and HLA-B, HLA-C, Patr-B, Patr-C, and Mamu-B1 in MHC-B/C lineage.

Of the 19 Caja MHC class I sequences, 14 show close relationships with MHC-G genes (Caja-G groups 1–4), and 5 show close relationships with MHC-F genes (Caja-F group). The phylogenetic relationship supports that Caja-G groups 1–4 evolved after divergence from HLA, Patr, and Mamu MHC-G genes and diverged in the following order: group 4, group 3, and group 1/2. Caja-G groups 1 and 2 include putatively functional Caja-G genes (Caja-G1, Caja-G3, Caja-G7, Caja-G12, and Caja-G13), whereas Caja-G groups 3 and 4 contain only Caja-G pseudogenes. The Caja-F group includes only one putatively functional Caja-F gene (Caja-F4) and four Caja-F pseudogenes (Figs. 3, 4).

Discussion

In this study, we determined 854 kb of genomic sequence of the common marmoset MHC-G/F segment that includes 14 Caja-G and 5 Caja-F genes by using seven BAC clones. As a result of naturally occurring bone marrow chimerism in marmosets, individuals may contain up to four alleles of genetic loci. Therefore,

Table IV. Gene content of the Caja-G/F segment

| Gene Symbol | Locus Type | Direction | No. of Exons | Prominent Features | Human Reference Sequence | Identity to Human Reference Sequence (%) |
|---|---|---|---|---|---|---|
| Contig 1 (470,958 bp) | | | | | | |
| **Caja-G1** | Gene | (+) | 8 | MHC, class I, G1 | NM_002116 | 91 |
| **Caja-G2** | Pseudo | (+) | 8 | MHC, class I, G2 | NM_002116 | 87 |
| **Caja-F1** | Pseudo | (+) | 7 | MHC, class I, F1 | NM_001098479 | 86 |
| **Caja-G3** | Gene | (+) | 8 | MHC, class I, G3 | NM_002116 | 91 |
| **Caja-G4** | Pseudo | (+) | 6 | MHC, class I, G4 | NM_002116 | 84 |
| **Caja-G5** | Pseudo | (−) | 6 | MHC, class I, G5 | NM_002116 | 86 |
| **Caja-G6** | Pseudo | (+) | 8 | MHC, class I, G6 | NM_005514 | 84 |
| **Caja-F2** | Pseudo | (+) | 7 | MHC, class I, F2 | NM_001098479 | 85 |
| **Caja-G7** | Gene | (+) | 8 | MHC, class I, G7 | NM_002116 | 90 |
| **Caja-G8** | Pseudo | (+) | 8 | MHC, class I, G8 | NM_002116 | 81 |
| **Caja-G9** | Pseudo | (−) | 6 | MHC, class I, G9 | NM_002116 | 87 |
| Contig 2 (191,375 bp) | | | | | | |
| **Caja-G10** | Pseudo | (−) | 6 | MHC, class I, G10 | NR_001434 | 88 |
| **Caja-G11** | Pseudo | (+) | 8 | MHC, class I, G11 | NM_002116 | 87 |
| **Caja-F3** | Pseudo | (+) | 5 | MHC, class I, F3 | NM_002116 | 84 |
| **Caja-F4** | Gene | (+) | 7 | MHC, class I, F4 | NM_001098479 | 86 |
| **Caja-G12** | Gene | (+) | 8 | MHC, class I, G16 | NM_002116 | 91 |
| Contig 3 (192,325 bp) | | | | | | |
| **Caja-F5** | Pseudo | (+) | 7 | MHC, class I, F5 | NM_001098479 | 86 |
| **Caja-G13** | Gene | (+) | 8 | MHC, class I, G13 | NM_002116 | 91 |
| **Caja-G14** | Pseudo | (+) | 8 | MHC, class I, G14 | NM_002116 | 82 |
| ZNRD1-AS1 | Misc RNA | (−) | 6 | ZNRD1 antisense RNA 1 | NR_026751 | 86 |
| HCG8 | Misc RNA | (−) | 1 | HLA complex group 8 pseudogene 1 ortholog | NR_103542 | 85 |
| ETF1P1 | Pseudo | (+) | 1 | Eukaryotic translation factor 1 pseudogene | NM_004730 | 92 |
| ZNRD1 | Gene | (+) | 4 | Zinc ribbon domain containing 1 | NM_170783 | 94 |
| PPP1R11 | Gene | (+) | 3 | Protein phosphatase 1, regulatory subunit 11 | NM_021959 | 96 |
| RNF39 | Gene | (−) | 4 | Ring finger protein 39 | NM_025236 | 94 |

Caja class I loci are shown in bold type.
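The gene counts quoted in the text (25 genes in total; 19 class I genes, of which 14 are Caja-G and 5 are Caja-F; 6 putatively functional genes and 13 pseudogenes) can be recomputed from the rows of Table IV. The following Python sketch is one way to do this tally. The list layout and function names are assumptions made for illustration, and class I loci are identified simply by the "Caja-" prefix of the gene symbol.

```python
from collections import Counter

# (gene symbol, locus type) pairs transcribed from Table IV, in table order.
TABLE_IV = [
    ("Caja-G1", "Gene"), ("Caja-G2", "Pseudo"), ("Caja-F1", "Pseudo"),
    ("Caja-G3", "Gene"), ("Caja-G4", "Pseudo"), ("Caja-G5", "Pseudo"),
    ("Caja-G6", "Pseudo"), ("Caja-F2", "Pseudo"), ("Caja-G7", "Gene"),
    ("Caja-G8", "Pseudo"), ("Caja-G9", "Pseudo"),
    ("Caja-G10", "Pseudo"), ("Caja-G11", "Pseudo"), ("Caja-F3", "Pseudo"),
    ("Caja-F4", "Gene"), ("Caja-G12", "Gene"),
    ("Caja-F5", "Pseudo"), ("Caja-G13", "Gene"), ("Caja-G14", "Pseudo"),
    ("ZNRD1-AS1", "Misc RNA"), ("HCG8", "Misc RNA"), ("ETF1P1", "Pseudo"),
    ("ZNRD1", "Gene"), ("PPP1R11", "Gene"), ("RNF39", "Gene"),
]


def is_class_i(symbol: str) -> bool:
    """Treat every locus whose symbol starts with 'Caja-' as an MHC class I locus."""
    return symbol.startswith("Caja-")


def summarize(rows: list) -> dict:
    """Tally loci by class I membership, lineage (G or F), and locus type."""
    class_i = [(s, t) for s, t in rows if is_class_i(s)]
    other = [(s, t) for s, t in rows if not is_class_i(s)]
    return {
        "total": len(rows),
        "class_i": len(class_i),
        "caja_g": sum(s.startswith("Caja-G") for s, _ in class_i),
        "caja_f": sum(s.startswith("Caja-F") for s, _ in class_i),
        "class_i_by_type": Counter(t for _, t in class_i),
        "other_by_type": Counter(t for _, t in other),
    }


SUMMARY = summarize(TABLE_IV)

if __name__ == "__main__":
    for key, value in SUMMARY.items():
        print(key, value)
```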

overlapping BAC clone sequences of 854 kb might be a composite of up to four haplotypes.

FIGURE 2. Comparison of 57 Ag binding sites derived from the deduced amino acid sequences of Caja-G/F genes. Dots and dashes indicate identical amino acids with Caja-G1 sequence and deleted sites, respectively. Gray and black backgrounds indicate conserved sites and unique amino acids observed in Caja-F4, respectively. For additional comparison, the respective amino acid residues of HLA-A*02:01 are included.

Comparison of our sequence with that from the current draft assembly of the common marmoset genome WUGSC 3.2 suggests allelic variation ranging from 0 to 2.4% in the overlapping 150-kb segment of contig 1 (the MOG side) and ranging from 0.2 to 2.4% in the overlapping 92-kb segment of contig 3 (the RNF39 side) (Table II). Overlaps among the BAC clones in contigs 1 and 3 are supported by comparison of restriction fragments and PCR-based mapping (data not shown). Therefore, we determined the nucleotide sequences using the representative BAC clones to obtain consensus sequences of the segment, because these well-conserved segments, locating near non-MHC genes, are thought to have similar genomic structures among the Caja-G haplotypes. Furthermore, gene order and gene content of the Caja-G/F segment are consistent with those of the common marmoset draft assembly and the HLA region. The remaining 562-kb segment (321 kb in contig 1, 141 kb in contig 2, and 100 kb in contig 3), including the duplicated Caja-G/F genes, is represented only by several mini contigs in the WUGSC 3.2 sequence (Table II), which is most likely due to the current quality (draft assembly and not finished sequence) of the common marmoset draft genome sequence. Genomic diversities in the Caja-G loci were suggested to be generated considerably after duplication based on the population study of the Caja-G genes (13).
This finding supports the view that genomic structures of the Caja-G/F segment are different among the Caja-G/F haplotypes. In fact, nucleotide similarity between 10 kb of contig 1 (BAC clone 88K6 side) and 10 kb of contig 2 (BAC clone 246G18 side) was extremely low (<50%), although Caja-G9 in contig 1 is closely related to Caja-G10 in contig 2, with a high nucleotide similarity (98.9%) between them and a close phylogenetic relationship (Fig. 3). Hence, we tentatively described the location of contig 2 as being between contig 1 and contig 3 (Fig. 1B). Such difficulties in draft genomes are a known problem and are particularly evident in highly duplicated regions, such as the one analyzed in this study. Therefore, our determined genomic sequences provide additional and informative data for the complete determination of the common marmoset MHC region and provide researchers with an excellent basis to search for polymorphisms and to stimulate many fields of biomedical research involving the MHC of the common marmoset.

FIGURE 3. Nucleotide sequence–based phylogenetic tree of MHC class I genes constructed by the NJ method. Bold type indicates putatively functional Caja-G and Caja-F genes. Numbers at branches indicate bootstrap values.

FIGURE 4. Schematic gene structures showing segmental duplications involved in Caja-G and Caja-F genes. White, black, and gray boxes indicate putatively functional genes, pseudogenes, and missing genes, respectively.

Phylogenetic analysis using 42 primate MHC class I genes and common marmoset MHC-G/F allele sequences supported the classification of the common marmoset MHC class I genes of the Caja-G/F region into the MHC-G and MHC-F lineages. Interestingly, these genes were substantially expanded: four groups contain a total of 14 Caja-G genes, and one group contains 5 Caja-F genes (Fig. 3). The phylogenetic relationships, together with the positional relationships among the Caja-G and Caja-F genes (Fig. 1), also supported the presence of at least four duplicated units (Fig. 4). Unit 3 includes only the Caja-G9 pseudogene from Caja-G group 1, and we speculate that Caja-G genes belonging to Caja-G groups 2–4, as well as the Caja-F group, were deleted. Similarly, the Caja-G group 4 gene in unit 4 and genes from Caja-G groups 1 and 3 in unit 5 are missing. These missing MHC class I genes are either deleted or map to regions that are not covered by our BAC contigs (Fig. 1B). Therefore, we cannot exclude the possibility that Caja-G9 may belong to unit 4 as well, because Caja-G9 and Caja-G10 show a close relationship with Caja-G group 1 (Figs. 3, 4) and might have been duplicated after establishment of the different Caja-G groups.

Comparative genomic analysis of sequences from the duplication units of the HLA region revealed that the MIC, HCG9, HCG2, HCP5, and HCG26 sequences are contained within discrete duplication units, or duplicons (23). The hypothetical ancestral duplication unit appears to have formed the duplicated class I genomic structures seen within the HLA-A/G/F and HLA-B/C segments as the result of a series of imperfect serial duplications during primate evolution (23, 24). This segmental duplication is also observed in the Patr-A/G/F and Mamu-A/G/F segments (25). However, genomic traits for MIC, HCG2, HCG9, HCP5, and HCG26 were not found in the Caja-G/F and Caja-B segments (12), suggesting that the Caja-G/F segment (or the New World monkey MHC-G/F segment) was formed by species-specific segmental duplication of an MHC-A/G–MHC-F ancestral unit after separation from the other simian primates, through birth-and-death evolution of MHC class I genes to adapt to environmental pathogens (26).

Among the 14 Caja-G and 5 Caja-F genes, 6 putatively functional genes (Caja-G1, Caja-G3, Caja-G7, Caja-G12, Caja-G13, and Caja-F4) were identified in this study. All of the deduced Caja class I amino acid sequences contain the highly conserved and structurally essential amino acid residues (Supplemental Fig. 1), and partial Caja-G1, Caja-G3, Caja-G7, Caja-G12, Caja-G13, and Caja-F4 cDNA sequences matched perfectly with all six putatively functional genes in our RT-PCR analysis of exons 2–4 (Caja-G: 472 bp, and Caja-F4: 321 bp) using 10 RNA samples derived from C. jacchus spleen tissues (Caja-G1: AB855795, Caja-G3: AB855796, Caja-G7: AB855797, Caja-G12: AB855798, Caja-G13: AB855799, and Caja-F4: AB855800) (T. Shiina, Y. Kametani, R. Suzuki, L. Walter, manuscript in preparation).

Interestingly, we found, using the deep-sequencing approach, that Caja-B3 and Caja-B4 are transcribed in various tissues at low levels, confirming our previous hypothesis that these MHC class I genes might indeed be actively transcribed. Thus, the common marmoset contains and transcribes all simian MHC class I gene lineages, namely MHC-B/C, MHC-E, MHC-F, and MHC-G. The only exception is MHC-A, which is not present in the common marmoset or (most probably) in other New World monkey genomes. Either the loss of MHC-A, with MHC-G assuming its function as a classical MHC class I gene, happened in the lineage leading to platyrrhine primates (New World monkeys) or, alternatively, the specialized function of MHC-G and the classical function of MHC-A genes evolved in the lineage leading to catarrhine primates (Old World monkeys, apes, and humans). Unfortunately, strepsirrhine primates, such as lemurs, are not informative because the MHC class I genes of these species do not show a particularly close relationship with any of the simian MHC class I lineages (27).

HLA-G modulates immunological responses by inhibiting the activity and mediating apoptosis of cytotoxic CD8+ T cells and NK cells through interaction with their inhibitory receptors, and by inhibiting the alloreactivity of CD4+ T cells (28). Although we tried to compare the transcriptional regulatory factor sequences of the 14 Caja-G genes, we could not identify a single Caja-G gene with a function similar to HLA-G, based on the genomic information provided in this study. Future expression analyses using trophoblast cells will show whether any of the Caja MHC class I genes reveal a typical MHC-G transcription pattern.

Acknowledgments
We thank Nico Westphal for excellent technical support.

Disclosures
The authors have no financial conflicts of interest.

References
1. Philippens, I. H., J. A. Wubben, B. Finsen, and B. A. ’t Hart. 2013. Oral treatment with the NADPH oxidase antagonist apocynin mitigates clinical and pathological features of parkinsonism in the MPTP marmoset model. J. Neuroimmune Pharmacol. 8: 715–726.
2. Mansfield, K. 2003. Marmoset models commonly used in biomedical research. Comp. Med. 53: 383–392.
3. Zühlke, U., and G. Weinbauer. 2003. The common marmoset (Callithrix jacchus) as a model in toxicology. Toxicol. Pathol. 31(Suppl.): 123–127.
4. Yaguchi, M., M. Tabuse, S. Ohta, K. Ohkusu-Tsukada, T. Takeuchi, J. Yamane, H. Katoh, M. Nakamura, Y. Matsuzaki, M. Yamada, et al. 2009. Transplantation of dendritic cells promotes functional recovery from spinal cord injury in common marmoset. Neurosci. Res. 65: 384–392.
5. Hattori, Y., N. Kanamoto, K. Kawano, H. Iwakura, M. Sone, M. Miura, A. Yasoda, N. Tamura, H. Arai, T. Akamizu, et al. 2010. Molecular characterization of tumors from a transgenic mouse adrenal tumor model: comparison with human pheochromocytoma. Int. J. Oncol. 37: 695–705.
6. Sasaki, E., H. Suemizu, A. Shimada, K. Hanazawa, R. Oiwa, M. Kamioka, I. Tomioka, Y. Sotomaru, R. Hirakawa, T. Eto, et al. 2009. Generation of transgenic non-human primates with germline transmission. Nature 459: 523–527.
7. Kametani, Y., D. Suzuki, K. Kohu, M. Satake, H. Suemizu, E. Sasaki, T. Ito, N. Tamaoki, T. Mizushima, M. Ozawa, et al. 2009. Development of monoclonal antibodies for analyzing immune and hematopoietic systems of common marmoset. Exp. Hematol. 37: 1318–1329.
8. Lever, M. S., A. J. Stagg, M. Nelson, P. Pearce, D. J. Stevens, E. A. Scott, A. J. Simpson, and M. J. Fulop. 2008. Experimental respiratory anthrax infection in the common marmoset (Callithrix jacchus). Int. J. Exp. Pathol. 89: 171–179.
9. Weatherford, T., D. Chavez, K. M. Brasky, and R. E. Lanford. 2009. The marmoset model of GB virus B infections: adaptation to host phenotypic variation. J. Virol. 83: 5806–5814.
10. Kap, Y. S., J. D. Laman, and B. A. ’t Hart. 2010. Experimental autoimmune encephalomyelitis in the common marmoset, a bridge between rodent EAE and multiple sclerosis for immunotherapy development. J. Neuroimmune Pharmacol. 5: 220–230.

11. Shiina, T., K. Hosomichi, H. Inoko, and J. K. Kulski. 2009. The HLA genomic loci map: expression, interaction, diversity and disease. J. Hum. Genet. 54: 15–39.
12. Shiina, T., A. Kono, N. Westphal, S. Suzuki, K. Hosomichi, Y. F. Kita, C. Roos, H. Inoko, and L. Walter. 2011. Comparative genome analysis of the major histocompatibility complex (MHC) class I B/C segments in primates elucidated by genomic sequencing in common marmoset (Callithrix jacchus). Immunogenetics 63: 485–499.
13. van der Wiel, M. K., N. Otting, N. G. de Groot, G. G. Doxiadis, and R. E. Bontrop. 2013. The repertoire of MHC class I genes in the common marmoset: evidence for functional plasticity. Immunogenetics 65: 841–849.
14. de Groot, N. G., N. Otting, J. Robinson, A. Blancher, B. A. Lafont, S. G. Marsh, D. H. O’Connor, T. Shiina, L. Walter, D. I. Watkins, and R. E. Bontrop. 2012. Nomenclature report on the major histocompatibility complex genes and alleles of Great Ape, Old and New World monkey species. Immunogenetics 64: 615–631.
15. Rothberg, J. M., and J. H. Leamon. 2008. The development and impact of 454 sequencing. Nat. Biotechnol. 26: 1117–1124.
16. Shiina, T., G. Tamiya, A. Oka, T. Yamagata, N. Yamagata, E. Kikkawa, K. Goto, N. Mizuki, K. Watanabe, Y. Fukuzumi, et al. 1998. Nucleotide sequencing analysis of the 146-kilobase segment around the IkBL and MICA genes at the centromeric end of the HLA class I region. Genomics 47: 372–382.
17. Kita, Y. F., A. Ando, K. Tanaka, S. Suzuki, Y. Ozaki, H. Uenishi, H. Inoko, J. K. Kulski, and T. Shiina. 2012. Application of high-resolution, massively parallel pyrosequencing for estimation of haplotypes and gene expression levels of swine leukocyte antigen (SLA) class I genes. Immunogenetics 64: 187–199.
18. O’Leary, C. E., R. W. Wiseman, J. A. Karl, B. N. Bimber, S. M. Lank, J. J. Tuscher, and D. H. O’Connor. 2009. Identification of novel MHC class I sequences in pig-tailed macaques by amplicon pyrosequencing and full-length cDNA cloning and sequencing. Immunogenetics 61: 689–701.
19. Tamura, K., D. Peterson, N. Peterson, G. Stecher, M. Nei, and S. Kumar. 2011. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol. Biol. Evol. 28: 2731–2739.
20. Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4: 406–425.
21. Bjorkman, P. J., M. A. Saper, B. Samraoui, W. S. Bennett, J. L. Strominger, and D. C. Wiley. 1987. Structure of the human class I histocompatibility antigen, HLA-A2. Nature 329: 506–512.
22. Bjorkman, P. J., M. A. Saper, B. Samraoui, W. S. Bennett, J. L. Strominger, and D. C. Wiley. 1987. The foreign antigen binding site and T cell recognition regions of class I histocompatibility antigens. Nature 329: 512–518.
23. Shiina, T., G. Tamiya, A. Oka, N. Takishima, T. Yamagata, E. Kikkawa, K. Iwata, M. Tomizawa, N. Okuaki, Y. Kuwano, et al. 1999. Molecular dynamics of MHC genesis unraveled by sequence analysis of the 1,796,938-bp HLA class I region. Proc. Natl. Acad. Sci. USA 96: 13282–13287.
24. Kulski, J. K., T. Shiina, T. Anzai, S. Kohara, and H. Inoko. 2002. Comparative genomic analysis of the MHC: the evolution of class I duplication blocks, diversity and complexity from shark to man. Immunol. Rev. 190: 95–122.
25. Shiina, T., M. Ota, S. Shimizu, Y. Katsuyama, N. Hashimoto, M. Takasu, T. Anzai, J. K. Kulski, E. Kikkawa, T. Naruse, et al. 2006. Rapid evolution of major histocompatibility complex class I genes in primates generates new disease alleles in humans via hitchhiking diversity. Genetics 173: 1555–1570.
26. Nei, M., X. Gu, and T. Sitnikova. 1997. Evolution by the birth-and-death process in multigene families of the vertebrate immune system. Proc. Natl. Acad. Sci. USA 94: 7799–7806.
27. Averdam, A., B. Petersen, C. Rosner, J. Neff, C. Roos, M. Eberle, F. Aujard, C. Münch, W. Schempp, M. Carrington, et al. 2009. A novel system of polymorphic and diverse NK cell receptors in primates. PLoS Genet. 5: e1000688.
28. Fainardi, E., M. Castellazzi, M. Stignani, F. Morandi, G. Sana, R. Gonzalez, V. Pistoia, O. R. Baricordi, E. Sokal, and J. Peña. 2011. Emerging topics and new perspectives on HLA-G. Cell. Mol. Life Sci. 68: 433–451.