<<

and Immunity (2001) 2, 119–127  2001 Nature Publishing Group All rights reserved 1466-4879/01 $15.00 www.nature.com/gene The human for mannan-binding -associated serine -2 (MASP-2), the effector component of the lectin route of complement activation, is part of a tightly linked gene cluster on 1p36.2–3

C Stover1, Y Endo2, M Takahashi2, NJ Lynch1, C Constantinescu1, T Vorup-Jensen3, S Thiel3, H Friedl4, T Hankeln4, R Hall5, S Gregory5, T Fujita2 and W Schwaeble1 1Department of Microbiology and Immunology, University of Leicester, Leicester, UK; 2Department of Biochemistry, Fukushima Medical University, School of Medicine, Fukushima, Japan; 3Department of Medical Microbiology and Immunology, The Bartholin Building, University of Århus, Århus, Denmark; 4Genterprise, Institut fu¨r Molekulargenetik und Gentechnische Sicherheit, Johannes Gutenberg University Mainz, Mainz, Germany; 5The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK

The of the of complement activation, MASP-1 and MASP-2, are encoded by two separate genes. The MASP1 gene is located on chromosome 3q27, the MASP2 gene on chromosome 1p36.23–31. The genes for the classical complement activation pathway proteases, C1r and C1s, are linked on chromosome 12p13. We have shown that the MASP2 gene encodes two gene products, the 76 kDa MASP-2 and a plasma of 19 kDa, termed MAp19 or sMAP. Both gene products are components of the lectin pathway activation complex. We present the complete primary structure of the human MASP2 gene and the tight cluster that this forms with non-complement genes. A comparison of the MASP2 gene with the previously characterised C1s gene revealed identical positions of introns separating orthologous coding sequences, underlining the hypothesis that the C1s and MASP2 genes arose by exon shuffling from one ancestral gene. Genes and Immunity (2001) 2, 119–127.

Keywords: complement; lectin pathway; serine proteases; gene cluster; MASP2 gene

Introduction vertase, C4b2b, and the C5 convertase, C4b2b()n. The alternative pathway forms an amplification loop of comp- The is an integral part of the innate lement activation and is initiated by binding of the comp- antimicrobial immune defence and mediates humoral lement activation product C3b to Factor B. The sub- and cellular interactions within the immune response, sequent cleavage of Factor B by forms an including chemotaxis, phagocytosis, cell adhesion, and B enzymatically active complex, C3bBb, which converts 1 cell differentiation. Complement may be activated via native C3 to C3b. Accumulation of multiple C3b mol- three different routes, the classical pathway, the alterna- ecules in proximity of the alternative pathway convertase tive pathway, and the recently described lectin pathway. C3bBb allows the complex to cleave and activate the fifth The classical activation pathway is initiated by binding of component of complement, C5. the hexaoligomeric recognition molecule C1q to immune The lectin pathway can be activated in the absence of complexes via its globular heads. This effects a confor- immune complexes and is initiated by the recognition of mational change of its collagenous stalks which in turn carbohydrate structures present on microbial organisms leads to the activation of the C1q-associated serine pro- via macromolecular complexes present in body fluids. tease complexes C1r2 and C1s2. Of these, C1r cleaves and These complexes are composed of a multivalent pattern activates the zymogen C1s. Activated C1s subsequently recognition subunit and associated serine proteases cleaves C4 and C4b bound C2 to generate the C3 con- which initiate complement activation upon binding of the recognition subunits to microbial oligosaccharide struc- tures. To date, two pattern recognition components of the Correspondence: Dr WJ Schwaeble, Department of Microbiology and Immunology, University of Leicester, University Road, Leicester LE1 lectin activation pathway have been described, the well 9HN, UK. E-mail: ws5Ȱle.ac.uk characterised plasma glycoprotein mannan-binding lectin This work was supported by the Wellcome Trust, the German (MBL)2 and ficolin p35,3 both members of the Research Foundation (Deutsche Forschungsgemeinschaft), the family,4,5 yet differing in their binding specificities to Japanese Society for the Promotion of Science, and the EMDO microbial surface oligosaccharides. As these vary con- Foundation, Switzerland. siderably amongst members of the collectin family, the Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession numbers presence of more than one recognition subunit may result AL109811, AB033742, AJ297949, AJ299718, and AJ300188. in a wider spectrum of microbial surfaces recognised by Received 4 December 2000; accepted 1 February 2001 the lectin pathway activation complexes. MBL forms a Organisation of the human MASP2 gene locus C Stover et al 120 hexoidal macromolecular complex of six structural sub- sequence coding for the signal peptide is split by an units each composed of three identical polypeptide intron of 83 bp (Figure 1a). Exon b codes for the remain- chains. This associates with two distinct serine proteases, der of the signal peptide (R)LLTLLGLLCGSVA and the the MBL-associated serine proteases-1 and -2 (ie, MASP- N-terminal 63 residues of the CUB I domain. The C-ter- 1, MASP-2), and a 19 kDa MASP-2 related plasma pro- minal 59 residues of the CUB I domain are encoded by tein, termed MAp19 or sMAP.6 No detailed information exon c, separated from exon b by 157 bp and from exon is yet available on the stoichiometry of lectin pathway d, the single exon coding for the EGF-like domain, by activation complexes formed with ficolin p35. It is, how- 1016 bp. The alternative splice/polyadenylation exon e ever, clear that ficolin p35 complexes associate with (which contains coding sequence for the four C-terminal MASP-1, MASP-2 and MAp19/sMAP and initiate the lec- amino acids of MAp19 not found in MASP-2, namely glu- tin pathway via the aforementioned serine proteases.3 In tamic acid, glutamine, serine, and leucine, and sequence vitro, purified and recombinant MASP-2 was shown to of the 3Ј UT region specific for MAp19 mRNA) follows cleave the fourth and second component of complement after 454 bp of intronic sequence. The 3Ј boundary of (ie, C4 and C2) in absence of MASP-1.6–8 The sequential exon e is determined by the position of the poly(A) proteolytic cleavage of C4 and C4b bound C2 to form the addition site14 noted in the human MAp19 mRNA tran- C3 convertase, C4b2b, and the C5 convertase, scripts with the accession numbers Y18281–Y18284. 1262 C4b2b(C3b), respectively, is the essential step by which bp further downstream, sequence for the second CUB the lectin pathway activation complex as well as the structural domain is split by an intron of 316 bp (exon f, classical pathway activation complex initiate further 197 bp; exon g, 148 bp, respectively). An intron of 1997 downstream activation of complement.7,8 The potential of bp follows (AJ299718). The two CCP domains of MASP- MASP-2 to activate complement in absence of MASP-1 in 2 are encoded by two exons each over a stretch of 7.6 kb, vitro is supported by the analysis of sera of a recently exon h (119 bp) and exon i (79 bp), exon j (135 bp) and established MASP-1 deficient mouse strain, which exhib- exon k (75 bp), respectively. 2527 bp further downstream, its effective lectin pathway activation under physiological exon l encodes the serine protease domain of MASP-2 9 conditions. Thus, MASP-2 is the effector component of and contains the 3Ј UT region specific for MASP-2 the lectin pathway of complement activation. MASP-1 mRNA. The 3Ј boundary of exon l is determined from and MAp19 are constitutive components of the lectin the position of the poly (A) addition site14 found in cDNA pathway activation complexes, their specific functional clones Y09926, phl-1, phl-2, phl-3.15 Exons b-l are pre- role(s), however, have yet to be defined. ceded by polypyrimidine rich tracts. Sequence compari- We have previously shown that a single structural son of the clones aligned in Figure 1 revealed only two MASP2 gene of approximately 20 kb on chromosome variants (transitions) within the coding sequence of the 1p36.23–31 is transcribed and processed to generate two MASP2 gene: the codon for the amino acid position 362 gene products, a 2.6 kb MASP-2 mRNA, encoding the 76 of the mature MASP-2 protein is GTG (valine) in kDa MASP-2 serine protease, or an abundant 1.0 kb AL109811 and GCG (alanine) in AJ300188, respectively; mRNA transcript, encoding a truncated MASP-2 related the codon for amino acid position 478 is TCC (serine) in plasma protein of 19 kDa devoid of a serine protease AL109811 and TCT (serine) in AJ300188, respectively. domain, termed MAp19 or sMAP.10–13 The codon phases (positions of introns within the The present report amends and completes our pre- codon) are given in Table 1. They are identical for the viously presented organisational map (based solely on human MASP2, MASP1, and C1s genes in the ortholog- genomic PCR)10,11 by a comparative analysis of overlap- ous exonic composition represented in MASP2 exons a, ping sequences obtained from human genomic DNA iso- 16 Ј lated as independent YAC, BAC and ␭FixII clones. These b, c, d, f, g, h, i, j, and k. The codon phase at the 5 end of exon l is also identical for the orthologous position in represent approximately 222 kb of the chromosomal area 17 1p36.2–3. Furthermore, we present putative promoter the human C1r gene. A GC level of 50.37% was determined for sequence elements responsible for the strict hepatic expression pat- − tern of MASP-2 and MAp19 mRNA. encompassing the MASP2 gene ( 2137 from transcription start site to 167 bp past exon l, ie a total of 19980 bp; http://ftp.genome.washington.edu/cgi-bin/RepeatMasker). Results In keeping with this high GC content, this area is charac- terised by the abundance of ‘short interspersed nuclear Human MASP2 gene organisation elements’, especially ALU repeats (36.01% of the The revised multi-exon structure of the human MASP2 sequence) and MIR (1.59%), and scarce ‘long interspersed gene (encompassing approximately 20 kb) is presented as nuclear elements’, LINE2 (1.46%). Ј derived from sequence analysis of six overlapping clones Within 1.2 kb 5 of the determined transcription start (Figure 1a and b). site three potential promoter sites (with a score cut-off of The 5Ј boundary of exon a (62 bp) was determined by 0.9, 0.99, and 1.0, respectively) are predicted by a human mapping the transcription initiation start site. From two promoter analysis programme (http://www.fruitfly. independent 5Ј RACE reactions performed on human org/seqFtools/promoter.html).18 There are two putative cDNA, it was determined to lie 57 bp upstream of GC-box motifs.19 Several potential transcription factor the translation initiation codon (Figure 2). In comparison binding sequences are revealed in their vicinities to the published sequences for human MASP-2 and (http://bimas.dcrt.nih.gov/molbio/signal/)20,21 (Figure 2). MAp19/sMAP mRNA species, the 5Ј UT region of Most importantly, among the latter, consensus sequences human MASP-2 mRNA (Y09926) could be extended by for liver-specific (LF-A1 and TGT3) or -enriched tran- 37 bp and that of human MAp19/sMAP mRNA by 41 scription factors (HNF4, and HNF5, respectively, bp (Y18281) and 31 bp (AB008047), respectively. The http://www.cbil.upenn.edu/tess/) are contained which

Genes and Immunity Organisation of the human MASP2 gene locus C Stover et al 121

Figure 1 (a) Organisation of the human MASP2 gene localised on chromosome 1p36.23–31. Exons are denoted as boxes and enumerated a through l, their products in terms of structural and functional modules conserved in the serine proteases C1r, C1s, MASP-1, MASP-2 are indicated. Introns are indicated as lines and shortened to scale; their sizes are given in brackets. 1 indicates the transcription start site determined by 5Ј RACE analysis (see Figure 2), 17.675 kb indicates the position of the Poly(A) addition site of the MASP-2 mRNA. M, methionine; SP, signal peptide. (b) Alignment of overlapping clones. AL109811 represents a human PAC clone of approximately 150 kb which contains all of the MASP2 gene. The intronic sequence g/h (1997 bp) is deposited separately under AJ299718. ø4–12 and ø2–2 are two clones isolated from a ␭FixII genomic library in this work which have been characterised by restriction mapping, Southern blot and partial sequence analyses. AB033742 represents sequence (2819 bp) of the 5Ј part of the gene giving rise to the abundant plasma protein MAp19 (exons a through e). AJ300188 is a human genomic clone isolated from a RP11 BAC library with partial representation of the MASP2 gene (see Figure 3). AJ297949 (2577 bp) represents an amplification product obtained from human genomic DNA spanning the end of coding sequence for the EGF-like domain up to the sequence coding for the C-terminal part of the CUB II domain and encompasses sequence for the alternative splice and polyadenylation exon e. may play a role in determining the liver specific homologue, Rrp6p, which is required for processing of expression of MASP-2 and MAp19 mRNA.10,12 5.8 S rRNA prior to ribosomal subunit assembly.32 Its Clone RP11–99P18 (AJ300188) only covers sequence for gene, PMSCL2, has been located to human chromosome the 3Ј end of the MASP2 gene including exons l, k, and 1p36 and to mouse chromosome 4q27 (which is syntenic j, and 1.5 kb intronic sequence prior to exon j (Figure 1). to the human gene location).33 Two gene bank entries for As indicated in Figure 3, sequence encompassing the pro- mRNA transcripts of this gene, X66113 and L01457, shar- moter area of the MASP2 gene (Figure 2) and the entire ing the same 5Ј and 3Ј UT regions and differing in the genomic sequence for exons a through i (including the presence or absence of coding sequence for 25 amino alternative splice exon e) are replaced by a shortened, acids were analysed in comparison with the complete unrelated sequence of 1.3 kb. At its boundaries with the transcription unit of PMSCL2. The two mRNA transcripts known MASP2 gene sequence (100% identical in overlap- are alternative products from the same gene arising from ping parts), there is an increasing number of nucleotide in-frame inclusion or skipping of the nineteenth exon mismatches towards a sequence which is composed of (codon phase 0). 48% ALU repeats and 6.3% LINE2 repeats (http:// TAR DNA binding protein of 43 kDa is a transcription ftp.genome.washington.edu/cgi-bin/RepeatMasker). factor that binds specifically to pyrimidine-rich motifs contained in the HIV-1 long terminal repeat and acts to Genetic neighbourhood of the MASP2 gene repress gene expression.34 Its gene is termed TARDBP. Figure 3 (a and b) gives an overview of the MASP2 gene The International Radiation Hybrid Mapping consortium cluster, as determined from sequence of overlapping has mapped the TARDBP gene to chromosome 20 clones characterised in this work. The MASP2 gene is (http://www.ncbi.nlm.nih.gov/omim/). Indeed, data- located in close vicinity to non-complement genes. bank search (http://www.sanger.ac.uk/HGP/Chr20/) FRAP is a phospatidylinositol kinase conserved in using the full-length mRNA sequence (AL050265) yeast,22 cryptococcus neoformans,23 rat,24 and bovidae25 retrieves the following human bacterial artificial chromo- and is involved in G1 cell cycle progression.26 Its gene, some (BAC) clone, RP11–318P23 (AL359954). However, FRAP1, is located in the vicinity of a cytogenetic breakpo- the degree of identity is between 86 and 95%, whereas int.27 It has been mapped to chromosomal location the TARDBP mRNA sequence (AL050265) is 98–100% 1p36.228 and 1p36.2–3,29 respectively. The gene bank identical to corresponding sequence represented in PAC entries U88966 and NMF004958 represent two alterna- clone RP4–635E18 (AL109811), mapping to chromosome tively polyadenylated transcripts of this one gene dif- 1p36.11–31. fering only in sequence established within a GC rich The chromosomal localisation of the gene for ‘cornea- region of 46 bp. derived transcript 6’, CDT6 (Y16132), has not previously PM-Scl 100 kDa nucleolar autoantigen30,31 has a yeast been assigned. The single report in the literature ident-

Genes and Immunity Organisation of the human MASP2 gene locus C Stover et al 122

Figure 2 Promoter region, transcriptional regulatory elements, and transcription initiation start site of the human MASP2 gene. Consensus sequences of putative binding sites for transcription factors are underlined and indicated and predicted promoter sequences are in bold. Putative GC boxes are marked. The transcription initiation start site as defined by 5Ј RACE analysis is indicated as +1. Exon a (containing the 5Ј UT region of MAp19/MASP-2 mRNA transcripts and their translation initiation codon) and the 5Ј portion of exon b (coding for the remainder of the signal peptide and the N-terminal part of CUB I) are shown.

ifies CDT6 as a novel gene whose translational product Like the mRNA for the serine protease C1s, the MASP- shows homology to angiopoietins 1 and 2 and is charac- 2 mRNA transcript is encoded by 12 exons. There is one terised by a coiled-coil structural domain and a fibrino- additional exon in the MASP2 gene to allow generation gen-like domain.35 The gene for CDT6 lies within a 25 of the alternative splice and polyadenylation product, kb intron of the FRAP1 gene, in opposite transcriptional MAp19/sMAP. In the C1s and the MASP2 gene, each of orientation (Figure 3). the two CUB structural domains as well as each of the two CCP modules are encoded by two exons each. The Discussion EGF-like domain is encoded by one exon only. Likewise, the enzymatically active protease domain is encoded by The MASP2 gene extends over approximately 20 kb and one exon only and clearly distinguishes the C1s and is much smaller than the MASP1 gene (about 72 kb as MASP2 genes from the MASP1 gene, in which the pro- seen from sequence made available by the Human Gen- tease domain is encoded by six exons.16 A previously ome Sequencing Project, accession number AC007920) published comparative analysis of the genomic organis- owing to smaller introns. The human C1s gene comprises ation of the serine protease domain of MASP-2 in mouse approximately 13 kb.16 and Xenopus and of MASPs in carp, shark, and lamprey

Genes and Immunity Organisation of the human MASP2 gene locus C Stover et al 123 Table 1 Codon phases within the human MASP2 gene

Splice donor and acceptor sequences of all exons are compiled. The arrows indicate the alternative polyadenylation/splicing event that takes place in the hnRNA to generate either the MAp19 or MASP-2 mRNA. The position of the intron within the respective codon is given to the right; codon phase 0 and I denote a position of the intronic sequence before and after the first base, respectively. revealed that these MASP genes belong to the so-called genes are the result of a gene duplication event38 and AGY lineage of -like serine proteases. The mem- have remained closely linked on chromosome 12p13.39 bers of this lineage are characterised by an AGY codon MASP-1, by contrast, belongs to the TCN lineage of tryp- at the serine and a single exon coding for the sin-like serine proteases.16 Taking this into account,40 a protease domain.16 C1r was also shown to possess an model for the evolution of the MASP1, MASP2, C1r, and intronless protease domain,36 its active site serine residue C1s genes was proposed which involves events such as is also encoded by an AGY triplet. The close proximity mutation, retrotransposition, and duplication from an of the human C1r and C1s genes, both of the AGY lineage ancestral gene of the TCN type lineage showing a split (in 5Ј → 3Ј (C1r gene) and 3Ј → 5Ј (C1s gene) orientation, gene organisation for the protease domain.16,17,41,42 The at an intergenic distance of 9.5 kb37), suggested that both identical codon phases in the orthologous coding

Genes and Immunity Organisation of the human MASP2 gene locus C Stover et al 124

Figure 3 (a) The MASP2 Gene Cluster on chromosome 1p36.2–3. The transcription units for the individual genes in the vicinity of the MASP2 gene are presented as boxes and arrows indicate orientation of transcription. Names of the gene products are included. The position of this analysed chromosomal area with respect to telomere and centromere of the short (p) arm is indicated. (b) Coverage of this chromo- somal area by overlapping clones (AL109811, AJ300188, AL049653). The sequence area denoted by an asterisk represents the divergent sequence of 1340 bp solely composed of repetitive elements found in AJ300188.

sequences of the MASP1, MASP2, C1s genes, and the par- promoter region of the MASP2 gene, exons a through i, tially characterised C1r gene underline the likelihood of and includes most of the intron prior to exon j which such a genesis. The phenomenon of exon shuffling may codes for the N-terminal part of CCP II. Analysis of this have to be added to these events:43 symmetric exons, ie sequence area reveals that there is sequence identity over exons for which the phase of the 5Ј intron is identical to a stretch of 1.5 kb for the 3Ј part of the intron i/j of the the phase of the 3Ј intron, will spread more easily by exon MASP2 gene and a stretch of 4.3 kb for the 5Ј part of shuffling than nonsymmetric ones.44 Within the MASP2 the intergenic region to the PMSCL2 gene, respectively. gene, exons d, e, j, and k are symmetric, ie phase corre- Approaching the divergent sequence of 1.3 kb (solely lated, exons (coding for the EGF-like domain, the MAp19 composed of repetitive elements) from either side, there specific C-terminus, and all of CCP II). By contrast, is an increasing number of nucleotide mismatches. The introns that lie in phase 0 (ie between codons), are said origin of the divergent organisation of the MASP2 gene to be ancient, ie originally present in the progenote, and locus in this BAC clone is presently investigated by are probably more related to structural modules of the sequencing two overlapping BACs of the same human protein.45 In the MASP2 gene, phase 0 introns are found BAC library from which clone RP11–99P18 was isolated. between the exons coding for CUB I, CUB II, and CCP The serine proteases of the classical and lectin pathway I, respectively. of complement activation have a similarity score between The MASP2 gene lies in a gene-rich area of chromo- 39 and 52%.15 Their genes belong to a multigene family. some 1,46 in close proximity to non-complement genes, Genes coding for various types of trypsinogen are also such as the gene coding for TAR DNA binding protein, members of this family. , composed solely of a the gene coding for the 100 kDa Polymyositis-Sclero- serine protease domain (of the TCN type and encoded derma autoantigen, the gene coding for FRAP kinase by five exons, the boundaries differ from the ones found which harbours in one of its introns the gene for CDT6. in the orthologous part of the MASP1 gene coding for the All of these genes have a split exon architecture. serine protease domain), is thought to be the primordial In a wider genetic context, the detection on 1p35–36.2 present before the emergence of complement ser- of several and polymorphic pseudogenes of the macro- ine proteases, and novel substrate specificities are phage stimulating protein gene has lead to the concept thought to have been introduced by the evolutionary that recombination may occur frequently between these anteposition of certain modular domain structures found repeated sequences.47 In addition, a PAC clone from this in, eg, MASP-2, MASP-1, C1r, and C1s. Recent work chromosomal area (Z98257) contains sequence with 81– emphasises the importance of residue 225 ( 96% identity to a human endogenous retrovirus type C numbering) in serine proteases.50 While degradative oncovirus sequence (M74509). An adenoviral integration enyzmes such as trypsin have a proline residue at this site has been mapped to 1p36.1.48 The cytogenetic aber- position, P225, a tyrosine is found in MASP-1, MASP-2, ration described in neuroblastomas is a deletion of the C1r, C1s (and thrombin), Y225. Y225 allows Na+ binding short arm of .49 This deletion encompasses near the primary specificity site and thus enhances cata- the MASP2 gene locus (the MASP2 gene is at 5.66 cR from lytic activity. It is thought that the mutation P225 to Y225 the marker D1S54813) as seen from an analysis of several occurred in prevertebrates in the face of evolutionary framework maps (http://www.ncbi.nlm.nih.gov/ pressures, prior to the conversion of the codon for the genemap/map.cgi for chromosome 1).28,49 The signifi- active site serine (TCN to AGY) in more specialised serine cance of this is not yet clear. Here, it is of interest to note proteases found in vertebrates.50 the substitution of part of the MASP2 gene observed in The concomitant presence of members of unrelated clone RP11–99P18 (AJ300188), which encompasses the multigene families in the vicinity of the MASP1, MASP2,

Genes and Immunity Organisation of the human MASP2 gene locus C Stover et al 125 Table 2 Compilation of positional overlap of human genes belonging to four different multigene families

Presented on the left are the genes for the serine proteases MASP-2, MASP-1, C1r, C1s, and two trypsins; these are compared against members of three unrelated gene families, a transporter gene family (SLC2A), a chloride channel gene family (CLCN), and the tumor necrosis factor receptor gene family (TNFR), and their respective chromosomal locations are indicated.

C1r, C1s, and trypsinogen genes would lend support to region of the serine protease domain of MASP-2 the hypothesis of a joint ancestry of these genes and (Research Genetics, Huntsville, Alabama, USA). The weaken the theoretical possibility of a convergent evol- clones were verified by colony PCR using MASP-2 spe- ution of unrelated genes. In search of these, strikingly cific oligonucleotides (5Ј ACCT GGTGATTTTCCTTGG overlapping chromosomal positions have been found for 3Ј, bp. pos. 1372–1390 of Y09926 and 5Ј GTCACTCT the serine protease genes of interest to this work, two of CGTGGTTTATG 3Ј, bp. pos. 2278–2296 of Y09926) and the trypsinogen genes,51 the genes for certain isoforms of Southern blotting. Partial primary structure for the a glucose transporter52 and of a chloride channel53 as well MASP2 gene was established from overlapping subclones as for the genes for the two TNF receptors (http://www. obtained from a shotgun library constructed of one BAC ncbi.nlm.nih.gov/omim/, http://www.ncbi.nlm.nih. clone, RP11–99P18. Double stranded sequencing and gov/genemap99/)54,55 (Table 2). We propose that the con- analysis were performed using automated sequencer ABI sideration of such neighbouring genes which are equally 377 and Sequencher 3.0 programme (Gene Codes Corpor- members of multigene families should be included in ation, MI, USA), respectively. In parallel, two clones, ø4– phylogenetic analyses. 12 and ø2–2, were isolated from a human ␭ Fix II genomic library using a MASP-2 cDNA probe and analysed by Materials and methods restriction mapping and sequencing. One clone, AJ297949, was obtained by cyclic amplification of human genomic DNA using MASP-2 specific oligonucleotides (5Ј RACE GCGCCCACCTGCGACCACCAC 3Ј, bp pos. 461–481 of Initiation of transcription of the human MASP2 gene10,11 Y09926 and 5Ј TCCTGATTCATCTGTGACAAAGG 3Ј,bp was mapped by RACE technology using 5Ј RACE System pos. 846–868 of Y09926) and characterised by restriction for Rapid Amplification of cDNA Ends (Version 2.0) from mapping and sequencing. Life Technologies (Paisley, UK). Total RNA prepared from human liver tissue56 was transcribed to cDNA using a MASP-2 mRNA specific antisense oligonucleotide (5Ј Acknowledgements GGAGAGTTTGGGATACGGCCGTGG GTA 3Ј, bp pos. 617–643, accession number Y09926). The single strand We wish to thank The International Institute for the cDNA was purified and dC-tailed according to the manu- Advancement of Medicine, University of Leicester for facturer’s protocol. An anchor primer annealing to the kindly providing the non-tumorous human liver speci- homopolymeric tail and two other, nested MASP- men used in this study. We acknowledge the expertise of 2/MAp19 specific oligonucleotides (5Ј GCGTGGCCAG Cathy Summer (Research Genetics, Huntsville, Alabama, CACCTTGGCCCC 3Ј, bp pos. 265–286 of Y18281, 5Ј USA) in isolating the MASP-2 specific human BAC clone GCGA GTAGAAAGTGTCCTTGC 3Ј, bp pos. 326–346 of RP11–99P18 (AJ300188). The authors also wish to thank Y18281) positioned 3Ј of the above oligonucleotide were the Sanger Mapping and Sequence groups for the used for two independent cyclic amplification reactions. sequence production of RP4–635E18 (AL109811). Guelni- Two distinct amplification products of 320 bp and 380 hal Yueksekdag is acknowledged for excellent technical bp, respectively were subcloned in pGEMTeasy vector support. We especially thank Dr Jim Kaufman (Institute (Promega, Southampton, UK) and analysed by double of Animal Health, Compton, England) for his inspiring stranded cycle sequencing. comments.

Cloning of the MASP2 gene Human bacterial artificial chromosome (BAC) clones References were isolated by screening a human RP11 BAC library 1 Whaley K, Schwaeble W. Complement and complement using a radiolabeled cDNA probe specific for the coding deficiencies. Semin Liver Dis 1997; 17: 297–310.

Genes and Immunity Organisation of the human MASP2 gene locus C Stover et al 126 2 Lu JH, Thiel S, Wiedemann H, Timpl R, Reid KB. Binding of phosphatidylinositol kinase homolog required for G1 pro- the pentamer/hexamer forms of mannan-binding protein to gression. Cell 1993; 73: 585–596.

zymosan activates the proenzyme C1r2C1s2 complex, of the 23 Cruz M, Cavallo L, Gorlach J et al. Rapamycin antifungal action classical pathway of complement, without involvement of C1q. is mediated via conserved complexes with FKBP12 and TOR J Immunol 1990; 144: 2287–2294. kinase homologs in cryptococcus neoformans. Mol Cell Biol 1999; 3 Matsushita M, Endo Y, Fujita T. Complement-activating com- 19: 4101–4112. plex of ficolin and mannose-binding lectin-associated serine pro- 24 Sabatini D, Erdjument-Bromage H, Lui M, Tempst P, Snyder S. tease. J Immunol 2000; 164: 2281–2284. RAFT1: a mammalian protein that binds to FKBP12 in a rapamy- 4 Hansen S, Holmskov U. Structural aspects of and cin-dependent fashion and is homologous to yeast TORs. Cell receptors for collectins. Immunobiology 1998; 199: 165–189. 1994; 78: 35–43. 5 Hoppe H, Reid K. Collectins--soluble containing col- 25 Brown E, Albers M, Shin T et al. A mammalian protein targeted lagenous regions and lectin domains--and their roles in innate by G1-arresting rapamycin-receptor complex. Nature 1994; 369: immunity. Protein Sci 1994; 3: 1143–1158. 756–758. 6 Thiel S, Petersen S, Vorup-Jensen T et al. Interaction of C1q and 26 Vilella-Bach M, Nuzzi P, Fang Y, Chen J. The FKBP12-rapamy- Mannan-Binding Lectin (MBL) with C1r, C1s, MBL-associated cin-binding domain is required for FKBP12-rapamycin-associa- serine proteases 1 and 2, and the MBL-associated protein ted protein kinase activity and G1 progression. J Biol Chem 1999; MAp19. J Immunol 2000; 165: 878–887. 274: 4266–4272. 7 Vorup-Jensen T, Petersen S, Hansen A et al. Distinct pathways of 27 Onyango P, Lubyova B, Gardellin P, Kurzbauer R, Weith A. mannan-binding lectin (MBL)- and autoactivation Molecular cloning and expression analysis of five novel genes revealed by reconstitution of MBL with recombinant MBL- in chromosome 1p36. Genomics 1998; 50: 187–198 associated serine protease-2. J Immunol 2000; 165: 2093–2100. 28 Lench N, Macadam R, Markham A. The human gene encoding 8 Matsushita M, Thiel S, Jensenius J, Terai I, Fujita T. Proteolytic FKBP-rapamycin associated protein (FRAP) maps to chromo- activities of two types of mannose-binding lectin-associated somal band 1p36.2. Hum Genet 1997; 99: 547–549. serine protease. J Immunol 2000; 165: 2637–2642. 29 Moore P, Rosen C, Carter K. Assignment of the human FKBP12- 9 Takahashi M, Miura S, Ishii N et al. An essential role of MASP- rapamycin-associated protein (FRAP) gene to chromosome 1p36 1 in activation of the lectin pathway. Immunopharmacology 2000; by fluorescence in situ hybridization. Genomics 1996; 33: 331–332. 49: 3 (Abstract 003). 30 Gelpi C, Alguero´ A, Angeles Martinez M, Vidal S, Juarez C, 10 Stover C, Thiel S, Thelen M, Lynch N, Vorup-Jensen T, Jensenius Rodriguez-Sanchez J. Identification of protein components J, Schwaeble W. Two constituents of the initiation complex of reactive with anti-PM/Scl autoantibodies. Clin Exp Immunol the Mannan-Binding Lectin activation pathway of complement 1990; 81: 59–64. are encoded by a single structural gene. J Immunol 1999; 162: 31 Ge Q, Wu Y, Trieu E, Targoff I.Analysis of the specificity of anti- 3481–3490. PM-Scl autoantibodies. Arthritis Rheum 1994; 37: 1445–1452. 11 Takahashi M, Endo Y, Fujita T, Matsushita M. A truncated form 32 Briggs M, Burkard K, Butler J. Rrp6p, the yeast homologue of of mannose-binding lectin-associated serine protease (MASP)-2 the human PM-Scl 100-kDa autoantigen, is essential for efficient expressed by alternative polyadenylation is a component of the 5.8 S rRNA 3Ј end formation. J Biol Chem 1998; 273: 13255–13263. lectin complement pathway. Int Immunol 1999; 11: 859–863. 33 Bliskovski V, Liddell R, Ramsay E, Miller M, Mock B. Structure 12 Stover C, Thiel S, Lynch N, Schwaeble W. The rat and mouse and localization of mouse Pmscl1 and Pmscl2 genes. Genomics homologues of MASP-2 and MAp19, components of the man- 2000; 64: 106–110. nan-binding lectin activation pathway of complement. J Immunol 34 Ou SH, Wu F, Harrich D, Garcia-Martinez L, Gaynor R. Cloning 1999; 63: 6848–6859. and characterization of a novel cellular protein, TDP-43, that 13 Stover C, Schwaeble W, Lynch N, Thiel S, Speicher M. Assign- binds to human immunodeficiency virus type 1 TAR DNA ment of the gene encoding Mannan-Binding Lectin-associated sequence motifs. J Virol 1995; 69: 3584–3596. serine protease-2 (MASP-2) to human chromosome 1p36.2–3 by 35 Peek R, van Gelderen BE, Bruinenberg M, Kijlstra A. Molecular in situ hybridization and somatic cell hybrid analysis. Cytogenet cloning of a new angiopoietin-like factor from the human cor- Cell Genet 1999; 84: 148–149. Invest Ophthalmol Vis Sci 39 14 Cooke C, Alwine J. The cap and the 3Ј splice site similarly affect nea. 1998; : 1782–1788. polyadenylation efficiency. Mol Cell Biol 1996; 16: 2579–2584. 36 Tosi M, Duponchel C, Meo T, Couture-Tosi E. Complement 15 Thiel S, Vorup-Jensen T, Stover C et al. A second serine protease genes C1r and C1s feature an intronless protease domain closely associated with mannan-binding lectin that activates comp- related to haptoglobin. J Mol Biol 1989; 208: 709–714. lement. Nature 1997; 386: 506–510. 37 Kusumoto H, Hirosawa S, Salier J, Hagen F, Kurachi K. Human 16 Endo Y, Takahashi M, Nakao M et al. Two lineages of mannose- genes for complement C1r and C1s in a close tail-to-tail arrange- binding lectin-associated serine protease (MASP) in vertebrates. ment. Proc Natl Acad Sci 1988; 85: 7307–7311. J Immunol 1998; 161: 4924–4930. 38 Tosi M, Duponchel C, Meo T, Julier C. Complete cDNA 17 Endo Y, Sato T, Matsushita M, Fujita T. Exon structure of the sequence of human complement C1s and close physical linkage gene encoding the human mannose-binding protein-associated of the homologous genes C1s and C1r. Biochemistry 1987; 26: serine protease light chain: comparison with complement C1r 8516–8524. and C1s genes. Int Immunol 1996; 8: 1355–1358. 39 Van Cong N, Tosi M, Gross M et al. Assignment of the comp- 18 Reese MG, Eeckman FH. Novel Neural Network Algorithms for lement serine protease genes C1r and C1s to chromosome 12 Improved Eukaryotic Promoter Site Recognition. The 7th inter- region 12p13. Hum Genet 1988; 78: 363–368. + national Genome sequencing and analysis conference, Hilton 40 Dang Q, DiCera E. Residue 225 determines the Na -induced Head Island, South Carolina, 16–20 September 1995. of catalytic activity in serine proteases. Proc 19 Bucher P. Weight matrix descriptions of four eukaryotic RNA Natl Acad Sci USA 1996; 93: 10653–10656. polymerase II promoter elements derived from 502 unrelated 41 Lawson P, Reid K. A novel PCR-based technique using promoter sequences. J Mol Biol 1990; 212: 563–578. expressed sequence tags and gene homology for murine genetic 20 Prestridge DS. SIGNAL SCAN: A computer program that scans mapping: localization of the complement genes. Int Immunology DNA sequences for eukaryotic transcriptional elements. 1999; 12: 231–240. CABIOS 1991; 7: 203–206. 42 Irwin D. Evolution of an active-site codon in serine proteases. 21 Ghosh D. New developments of a transcription factors database. Nature 1988; 336: 429–430. Trends Biochem Sci 1991; 16: 445–447. 43 Long M, Rosenberg C, Gilbert W. Intron phase correlations and 22 Kunz, J, Henriquez R, Schneider U, Deuter-Reinhard M, Movva the evolution of the intron/exon structure of genes. Proc Natl N, Hall M. Target of rapamycin in yeast, TOR2, is an essential Acad Sci USA 1995; 92: 12495–12499.

Genes and Immunity Organisation of the human MASP2 gene locus C Stover et al 127 44 Gilbert W, de Souza S, Long M. Origin of genes. Proc Natl Acad 50 Guinto ER, Caccia S, Rose T, Futterer K, Waksman G, Di Cera Sci USA 1997; 94: 7698–7703. E. Unexpected crucial role of residue 225 in serine proteases. 45 De Souza S, Long M, Klein R, Roy S, Lin S, Gilbert W. Toward Proc Natl Acad Sci USA 1999; 96: 1852–1857. a resolution of the introns early/late debate: only phase zero 51 Emi M, Nakamura Y, Ogawa M et al. Cloning, characterization introns are correlated with the structure of ancient proteins. Proc and nucleotide sequences of two cDNAs encoding human pan- Natl Acad Sci 1998; 95: 5094–5099. creatic trypsinogens. Gene 1986; 41: 305–310. 46 Saccone S, De Sario A, Della Valle G, Bernardi G. The highest 52 Saier M. Families of transmembrane sugar transport proteins. gene concentrations in the are in telomeric Mol Microbiol 2000; 35: 699–710. bands of metaphase . Proc Natl Acad Sci 1992; 89: 53 Jentsch TJ, Friedrich T, Schriever A, Yamada H. The CLC chlor- 4913–4917. ide channel family. Pflugers Arch 1999; 437: 783–795. 47 van der Drift P, Chan A, Zehetner G, Westerveld A, Versteeg 54 Fuchs P, Strehl S, Dworzak M, Himmler A, Ambros P. Structure R. Multiple MSP pseudogenes in a local repeat cluster on 1p36.2: of the human TNF receptor 1 (p60) gene (TNFR1) and localiz- an expanding genomic graveyard? Genomics 1999; 62: 74–81. ation to chromosome 12p13. Genomics 1992; 13: 219–224. 48 Romani M, Baldini A, Volpi E, Casciano I, Nobile C, Muresu 55 Kemper O, Derre J, Cherif D, Engelmann H, Wallach D, Berger R, Siniscalco M. Concurrent mapping of an adeovirus 5/SV40 R. The gene for the type II (p75) tumor necrosis factor receptor integration site and the U1 snRNA cluster (RNU1) within 400 kb (TNF-RII) is localized on band 1p36.2-p36.3. Hum Genet 1991; of the chromosome 1p36.1. Cytogenet Cell Genet 1994; 67: 37–40. 87: 623–624. 49 White P, Maris J, Beltinger C et al. A region of consistent deletion 56 Chirgwin J, Przybyla A, MacDonald R, Rutter W. Isolation of in neuroblastoma maps within human chromosome 1p36.2–36.3. biologiocally active ribonucleic acid from sources enriched in Genetics 1995; 92: 5520–5524. ribonucleases. Biochemistry 1979; 18: 5294–5299.

Genes and Immunity