INFORMATION TO USERS

This manuscript has been reproduced from the microfilm master. UMI films the text directly from the original or copy submitted. Thus, some thesis and dissertation copies are in typewriter face, while others may be from any type of computer printer.

The quality of this reproduction is dependent upon the quality of the copy submitted. Broken or indistinct print, colored or poor quality illustrations and photographs, print bleedthrough, substandard margins, and improper alignment can adversely affect reproduction.

In the unlikely event that the author did not send UMI a complete manuscript and there are missing pages, these will be noted. Also, if unauthorized copyright material had to be removed, a note will indicate the deletion.

Oversize materials (e.g., maps, drawings, charts) are reproduced by sectioning the original, beginning at the upper left-hand comer and continuing from left to right in equal sections with small overlaps. Each original is also photographed in one exposure and is included in reduced form at the back of the book.

Photographs included in the original manuscript have been reproduced xerographically in this copy. Higher quality 6” x 9” black and white photographic prints are available for any photographs or illustrations appearing in this copy for an additional charge. Contact UMI directly to order. UMI Bell & Howell Information and Learning 300 North Zeeb Road, Ann Arbor, Ml 48106-1346 USA 800-521-0600

NOTE TO USERS

Page(s) missing in number only; text follows. Microfilmed as received.

222-229

This reproduction is the best copy available.

UMI

MOLECULAR GENETIC AND BIOCHEMICAL STUDIES OF THE

HUMAN AND MOUSE MHC COMPLEMENT CLUSTERS

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy

in the Graduate School of The Ohio State University

By

Zhenyu Yang, M.S. *****

The Ohio State Univeristy

1999

Dissertation Committee: Approved by

Chack Yung Yu, D.Phil., Adviser % - Adviser George A. Marzluf, Ph.D. Molecular, Cellular and Gail D. Wenger, Ph.D. Developmental Biology Program

Arthur H. M. Burghes, Ph.D. UMI Number: 9931705

UMI Microform 9931705 Copyright 1999, by UMI Company. All rights reserved.

This microform edition is protected against unauthorized copying under Title 17, United States Code.

UMI 300 North Zeeb Road Ann Arbor, MI 48103 Copyright by

Zhenyu Yang

1999 ABSTRACT

The major histocompatibility complex (MHC) is associated with more than 70 diseases. The goals of this study are to investigate the mechanisms of diseases associated with the MHC class HI region. As shown by definitive restriction fragment length polymorphisms, the four tandemly arranged serine/threonine kinase RP, complement component C4, steroid 21-hydroylase CYP21, and tenascin TNX are organized as a genetic unit designated as the RCCX module. Variations in the number and size of the RCCX modules can promote unequal crossover, leading to gene duplications and deletions. A TNXB-XA recombinant resulted from an unequal crossover between a monomodular and a bimodular RCCX was observed in a patient with congenital adrenal hyperplasia. Elucidation of the DNA sequence for the recombination breakpoint region enabled the designation of a diagnostic technique to detect the presence of TNXB-XA hybrids concurrent with deletions of CYP21B and C4B.

Since the major genetic factor(s) for insulin dependent diabetes mellitus (IDDM) reside in the MHC, EDDM patients were investigated for specific patterns of the RCCX modular structures. The frequency of monomodular RCCX structures is found significantly higher in the diabetic population than in the normal.

A novel gene, DOM3Z, was discovered in the intergenic region between SKI2W and RPl. The cDNA sequence and the exon-intron structure of D0M 3Z were determined.

Immunochemical and transient expression experiments revealed that human Dom3z is a nuclear . Z)OMiZ-related genes are present in simple and complex eukaryotes.

Located between the complement factor B and C4 genes are the four ubiquitously expressed genes, RD-SKI2W-DOM3Z-RP1. These four genes are configured as two head-to-head orientated gene pairs. The presence of unmethylated CpG sequences at the

5' regulatory regions of each gene pair indicates that they may be housekeeping genes.

RD, Ski2w, DomSz, and RP probably have concerted functions related to RNA transcription, translation and turnover.

The organization of the mouse MHC complement gene cluster (MCGC) is similar to that of human's. However, the duplication of mouse RCCX modules is markedly different from human as the mouse gene fragments RP2 and TNXA are larger and less conserved, suggesting an independent gene duplication event during evolution.

Ill Dedicated to My Parents

IV ACKNOWLEGMENTS

First and foremost, I am grateful to my adviser. Dr. Chack Yung Yu, for all his understanding and support during my graudate studies; for his patience in correcting my scientific errors; and for his encouragement and enthusiasm, which made my research and eventually, this thesis, possible.

I want to thank Dr. George A. Marzluf, a real gentleman, for the chance he provided for me to do my first rotation (during which I ran the very first sequencing gel in my life); for sparing time and offering encouragement; and for being there whenever I need help.

I want to express my appreciation to Dr. Gail D. Wenger, for teaching me the

FISH technique, the result of which inspired me to discover my gene DOM3Z\ and for bringing the toothbrush back from Baltimore for a "night person"©. Thank-you also goes to Dr. Arthur H. M. Burghes, for showing me how to search the databases.

I wish to thank Dr. M. Sue O'Dorisio for being my cheerleader, offering the support and encouragement I needed. It is still fresh in my memory that despite of her busy schedule, she c aie fully reviewed my first first-author paper. I want to thank Dr. William B. Zipf and the nurses in the Department of

Endocrinology at Columbus Children's Hospital for recruiting IDDM and CAR patients,

which makes the genetic analysis possible. I also appreciate all the patients who

volunteered for the study. They are the real heroes whose belief in science will eventually

benefit millions of other patients.

A special thank-you goes to Professor Manfred P. Dierich at University

Innsbruck, Austria. His personal acknowledgment has enabled me to be more confident

in myself. His graciousness has set a perfect example for me to be a decent person.

I also wish to thank Dr. Joann Moulds from The University of Texas at Houston,

not only for her help in setting up C4 allotyping experiments, but also for offering

support and being a friend.

I want to thank Jan Zinaich, manager of MCDB Program, for keeping me

informed of all opportunities, and for ensuring me happiness is the most important part of

life (This is more than an appreciation between Gemini's).

I am grateful to members of Dr. Yu's laboratory: Brad Baker, Andrew Dan gel,

Anna Mendoza, Xiaodong Qu, Kristi Rupert (my English teacher), Shanxiang Zhang, and

Bi Zhou. They are the people who have watched me through the ups and downs, shared

my happiness and pain. They are the ones who helped me in dealing with various

problems. I also thank Monica Summers and Doug Balster for sharing cell lines, and for being friends.

Thank you to all my friends, for their love, which holds me through hard times.

VI Finally, I thank the Presidential Fellowship from The Ohio State University for the financial support. VITA

June 2, 1971 ...... Bom - P.R.China

1992 ...... B.S., Biology, Beijing Normal University

1992-1994...... Research Associate,

Institute of Genetics,

Chinese Academy of Sciences, Beijing, P.R.China

1994-1998 ...... Graduate Teaching and Research Associate,

The Ohio State University

1998 ...... M.S.,

Molecular, Cellular and Developmental Biology

The Ohio State University

1998-present ...... Presidential Research Fellow,

The Ohio State University

VIII PUBLICATIONS

Original Research Articles

1. Qu, X. ,Yang.Z„ Zhang,S., Shen, L., Dangel,A.W., Hughes, J., Redman, K.L., Wu, L.-C. and Yu, C.Y. (1998) The human DEVH-box protein Ski2w from the HLA is localized in nucleoli and ribosomes. Nucleic Acids Res. 26: 4068-4077.

2. Yang Z., Shen L., Wu L.C., Dangel A.W., and Yu C.Y. (1998) Four ubiquitously expressed genes, RD-SKI2W-DOM3Z-RP1, are present between complement component genes factor B and C4 in the class EH region of the HLA. Genomics 53: 338-347.

3. Yang, Z„ Mendoza, A.R., Welch, T.R., Zipf W.B., and Yu, C.Y. (1999) Modular variations of the human MHC class EH genes for serine/threonine kinase RP, complement component C4, steroid 21-hydroxylase CYP21 and tenascin TNX (RCCX): a mechanism for gene deletions and disease associations. J. Biol. Chem. 274: 12147-12156.

4. Yang. Z. and Yu, C.Y. (1999) Organizations and gene duplications of the human and mouse MHC complement gene clusters. Exp. Clin. Immunogenet. (in press).

Peer-Reviewed Abstracts

1. Yang. Z., Wenger, G., Labanowska, J., Mendoza, A., and Yu, C.Y. (1996) The putative human helicase gene KIAA0052 that is evolutionally related to the yeast antiviral gene SKI2 and to the human HLA class EH gene SKI2W maps to 9qter and chromosome 5ql 1.2. Am. J. Hum. Genet. 59: A408.

2. Yang. Z.. Wenger, G., and Yu, C.Y. (1997) Ski2k, a putative RNA helicase involved in RNA transport, is conserved from yeast to human. Am. J. Hum. Genet. 61: A 186.

3. Yang. Z.. Mendoza, A.R., Welch, T.R., Zipf W.B., and Yu, C.Y. (1999) Frequent and concurrent duplication and deletion of four consecutive genes in the human MHC class EH region: a mechanism for disease associations. FASEB J. 13: A1472.

IX Electronic publications

1. Yang. Z.. Shen, L., Dangel, A. W. and Yu, C. Y. (1998) Homo sapiens putative RNA helicase Ski2w {SKI2W) gene, complete cds. Genebank Accession No. AF059675.

2. Yang. Z. and Yu, C. Y. (1998) Homo sapiens clone 1 HLA class EU protein DomSz {DOM3Z) mRNA, partial cds. Genebank Accession No. AF059252.

3. Yang. Z. and Yu, C. Y. (1998) Homo sapiens clone 2 HLA class EU protein DomSz {DOM3Z) mRNA, partial cds. Genebank Accession No. AF059253.

4. Yang. Z. and Yu, C. Y. (1998) Homo sapiens clone 1 HLA class Eïï protein Dom3z {DOM32) mRNA, partial cds. Genebank Accession No. AF059254.

5. Yang. Z., Mendoza, A. R. and Yu, C. Y. (1998) Homo sapiens CAH patient Tenascin TNXB gene and TNXA gene recombination breakpoint region. Genebank Accession No. AF086641.

6. Yang. Z.. Shen, L. and Yu, C. Y. (1998) Mus musculus MHC class EQ protein RPl (RPl) mRNA, partial cds. Genebank Accession No. AF099934.

ETELDS OF STUDY

Major Field: Molecular, Cellular and Developmental Biology TABLE OF CONTENTS

Page Abstract ...... ii

Dedication ...... iv

Acknowledgments ...... v

V ita...... viii

List of Tables ...... xiv

List of Figures ...... xv

Chapters:

1. Introduction...... I

2. Modular variations of the human MHC class m genes RCCX: a mechanism for gene deletions and disease associations ...... 30

2.1 Abstract ...... 30 2.2 Introduction ...... 32 2.3 Experimental Procedures ...... 36 2.4 Results...... 40 2.5 Discussion ...... 49

XI 3. Detection of Msc I polymorphism and the concurrent 120 bp deletion in TNXB/XA recombinant gene ...... 71

3.1 Abstract ...... 71 3.2 Introduction ...... 73 3.3 Experimental Procedures ...... 75 3.4 Results...... 77 3.5 Discussion ...... 81

4. Four ubiquitously expressed genes, RD-SKI2W-DOM3Z-RP1, are present in the class m region of the H LA ...... 89

4.1 Abstract ...... 89 4.2 Introduction ...... 91 4.3 Experimental Procedures ...... 95 4.4 Results...... 99 4.5 Discussion ...... 107

5. Biochemical characterization of human Dom3z ...... 123

5.1 Abstract ...... 123 5.2 Introduction ...... 124 5.3 Experimental Procedures ...... 127 5.4 Results...... 132 5.5 Discussion ...... 137

6. Organizations and gene duplications of the human and mouse MHC complement gene clusters ...... 152

6.1 Abstract ...... 152 6.2 Introduction ...... 154

XII 6.3 Experimental Procedures ...... 159 6.4 Results...... 162 6.5 Discussion ...... 170

7. Insulin dependent diabetes mellitus and MHC complement gene cluster 185

7.1 Abstract ...... 185 7.2 Introduction ...... 187 7.3 Experimental Procedures ...... 190 7.4 Results ...... 194 7.5 Discussion ...... 199

8. General discussion ...... 213

References ...... 221

XIII LIST OF TABLES

Table Page

I. I Polymorphisms of complement component C4 ...... 27

1.2 Associations between HLA and some diseases ...... 28

2.1 RCCX modular structures of the B and E families ...... 69

3.1 Detection of the 6.3 kb Msc IRFLP in TNXB genes ...... 88

4.1 Sequence identities / similarities among Dom3z related ...... 122

6.1 A comparison of human and mouse DOMSZ exon-intron structures ...... 184

7.1 RCCX modular structures and C4 allotypes for 50 IDDM patients ...... 208

7.2 RCCX modular variations in IDDM patients ...... 212

XIV LIST OF FIGURES

Figure Page

1.1 Map of the human MHC ...... 21

1.2 Polymorphism of human complement component C4A and C4B proteins ...... 22

1.3 Polymorphism in the number of C4 genes expressed ...... 23

1.4 Size dichotomy of human C4 genes ...... 24

1.5 Gene structure of 21-OHase {CYP21) genes ...... 25

1.6 Number of EST and non-EST entries submitted to GenBank ...... 26

2.1 A molecular map of the human MHC complement gene cluster ...... 56

2.2 C4 allotypes of the B and E families ...... 57

2.3 Southern blot analysis to detect the modular variations of RP, C4,CYP21 and TNX ...... 58

2.4 Southern blot analysis to segregate the RP1/RP2 variation from the C4 long/short (L/S) size dichotomy ...... 60

2.5 Taq I polymorphism to detect the TTVXA-associated 120 bp deletion in the TNXB gene of CAH patient E l ...... 61

2.6 Recombination between RCCX modules ...... 62

2.7 Elucidation of the breakpoint region for TNXB-XA recombinant ...... 63

2.8 RCCX modular variations in the human population ...... 68

3.1 The Msc I map at TNXB and CYP21B gene region ...... 83

XV 3.2 Southern blot analysis to detect the nucleotide mutations at the Msc I site (nt. 2268-2273) in TNXB genes ...... 84

3.3 Southern blot analysis to determine the RCCX modular structures of CAH patients ...... 85

3.4 A schematic diagram to show variant TNX genes by Pst I - Msc I RFLP 86

3.5 Southern blot analysis to detect the concurrent 120 bp deletion and Msc I RFLP in TNXB genes ...... 87

4.1 Exon-intron structure and 5amH I RFLP of the human SKI2W gene ...... I l l

4.2 cDNA and amino acid sequences of human DOMSZ ...... 114

4.3 An alignment of the amino acid sequences for Dom3z-related proteins 116

4.4 Exon-intron structure of the human DOMSZ gene ...... 118

4.5 PFGE analysis of the RD-SKI2W-D0MSZ-RPI com plex ...... 119

4.6 Northern blot analyses of human multiple tissue RNAs ...... 120

4.7 A molecular map of the HLA class m region ...... 121

5.1 Restriction analyses to determine the presence and orientations of DOMSZ cDNA inserts in different expression vectors ...... 141

5.2 Production and detection of bacterial fusion protein Trx-His-Dom3z ...... 144

5.3 Production and detection of bacterial fusion protein GST-Dom3z ...... 145

5.4 Cellular localization of GFP-Dom3z in HeLa cells ...... 146

5.5 Indirect immunofluorescence experiment to detect the cellular localization of Dom3z in H eL a ...... 147

5.6 Genomic Southern blot analysis showing the incorporation of GTŸ-DOMSZ into the stable cell line genome ...... 148

XVI 5.7 Reverse transcription (RT)-PCR of GFP-DOM3Z to examine the expression of the exogenous construct in the stable cell lines ...... 149

5.8 Immunoblot analysis to detect GFP-Dom3z in CHO transfectants ...... 150

5.9 Immunoblot analysis to detect the human endogenous DomSz ...... 151

6.1 Southern blot analyses of DOMSZ in primates and mammals ...... 175

6.2 Genomic sequences linking human DOMSZ to R P l and SKI2W ...... 176

6.3 Exon-intron structure and alternative splicings of human and mouse DOMSZ ...... 178

6.4 An alignment of human and mouse DomSz protein sequences ...... 179

6.5 cDNA and derived amino acid sequences of mouse R P l...... 180

6.6 A comparison of the exon-intron structures and gene duplications for human and mouse R Pl and R P 2...... 182

6.7 A comparison of the gene organizations of human and mouse MHC

complement gene clusters ...... 183

7.1 Comparisons of C4 genes and RCCX modular structures between from the IDDM and non-IDDM populations ...... 202

7.2 C4 allotypes of selected IDDM patients ...... 203

7.3 Sequence comparison of the C4d region from Rgl-associated C4BS and C4A ...... 204

7.4 RFLPs of SKI2 W in EDDM patients ...... 207

xvii CHAPTER 1

INTRODUCTION

As history turns the page of this century, it is time for us to review the research conducted on the major histocompatibility complex (MHC). MHC was first discovered in mouse as the H-2 region. About 10 years after, in 1958, the homologous system in human was also identified. It was named the human leukocyte antigen (HLA), because as blood group antigens, they are expressed on the white blood cells. Later, it was demonstrated that during organ transplantation, matching HLA antigens can significantly improves the fate of the transplant in the recipient. Otherwise, transplant rejection is an absolute consequence. HLA is important in both transfusion and transplantation medicine since several loci within HLA region are involved in histocompatibility. The

HLA was therefore also named the major histocompatibility complex, MHC (Terasaki,

1990). Overview o f the MHC region

In humans, the MHC constitutes a 4,000 kb region on the short arm of in the distal portion band 6p21.3 (Senger et a l, 1993). To date, over 200 genes have been identified in the MHC region, and they are grouped into three major linked gene clusters: Class I, II and IH {Fig. l.I) (Trowsdale, 1996). The class I group lies in the telomeric half of the MHC and spans about 1,800 kb. It contains genes encoding the classical transplantation antigens, HLA-A, -B, and -C loci. The 1,000 kb class n region lies in the centromeric half of the MHC complex. The products of this region were initially identified as immune-response loci, known as HLA-D. However, it became apparent now that the products of the class II loci, HLA-DP, -DQ, and -DR, were structurally and functionally related to the class I loci as they were all members of the immunoglobulin superfamily. These two parts are separated by the class HI region, which is 1,100 kb long, and consists of many densely spaced genes [reviewed in

(Trowsdale, 1996)].

Much of the attention has been focused on the MHC region because the genes in this region fulfill important immunologic functions. Many class I and II genes encode highly polymorphic families of cell-surface glycoproteins that present antigenic peptides to the T cells during an immune response. On the other hand, in recent years, a detailed characterization of the MHC has revealed the presence of genes that mostly encode proteins of unknown functions or with functions unrelated to the immune response

(Campbell and Trowsdale, 1993; Carroll et a i, 1987; Trowsdale, 1993). Many of these genes are located in the class El region. Although the class EH region also contains genes that encode proteins with important immune related functions, such as complement component C2, C4 and factor B, the products of most other genes have a variety of different functions [reviewed in (Aguado et al., 1996; Salter-Cid and Flajnik, 1995)].

Diversities of Complement Component C4

The complement component C4 is an abundant blood protein (0.4 mg/ml). It is essential for the immune response triggered by antibodies. During its activation and inactivation, the C4 molecules interacts with at least eight other molecules: the antigen, the antibody, complement factors C2, C3, C5, regulatory proteins factor I, C4b binding protein, and complement receptor type I (CRl) (Porter, 1985a). The covalent binding ability through the thioester carbonyl group after activation, and the relative large size with -200 kDa, enable C4 to link and provide surfaces for these protein interactions. At the same time, the binding abilities are different between two classes of C4 molecules:

C4A (acidic) exhibits 100-fold higher affinity to form amide bonds with amino groups or peptide antigens than C4B (6asic). The difference in chemical reactivity explains why

C4A binds to immune complexes three times more effectively than C4B, and therefore is primarily involved in promoting the physiological disposal of immune complexes. C4B can form ester bonds with hydroxyl groups or carbohydrate antigens 10-fold more efficient than C4A. Since C4B is hemolytically more active, it participates preferentially in the clearance of microorganisms (Law et a i, 1984; Yu, 1999). The differential chemical reactivities of C4A and C4B are determined by variations of four amino acids between positions 1 lOl-l 106, by which C4A has PCPVLD, while C4B has LSPVIH

(Belt era/., 1985; Yu etaL, 1986).

As one of the most polymorphic serum proteins besides immunoglobulins, the complexity of C4 is manifested in many aspects other than the differential chemical reactivities {Table 1.1) (Isenman and Young, 1984; Yu et a i, 1988). More than 34 allotypes of C4A and C4B have been identified (Mauff et al., 1990a, b), the locations of polymorphic residues are shown in Fig. 1.2 (Yu, 1999). They are remarkably different in electrophoretic mobilities and associations with antigenic determinants [reviewed in (Yu,

1999)]. Two blood group antigens, Rodgers (Rgl, Rg2) and Chido (Chi - Ch6), are located in the C4d region of C4A or C4B (O'Neill et a i, 1978; Tilley et al., 1978). C4d is the factor I - mediated proteolytic degradation fragment that may be linked to various cell surfaces (Reid and Porter, 1981). Amino acid variations in C4d constitute the sequential and conformational epitopes for the Rg and Ch antigenic determinants (Yu et al., 1986, 1988). Generally, C4A is associated with Rgl while C4B is associated with

Chi (Giles et al., 1988; Yu et al., 1988). However, there are some exceptions, such as the association of C4A1 with Chi and C4B5 with Rgl (Rittner et al., 1984; Roos et a i,

1984).

In most cases, the two classes of C4 molecules, C4A and C4B, are encoded by two tandem gene loci, C4A and C4B, respectively (Carroll et ai, 1984). However, there are examples for the two C4 loci coding for identical C4 isotypes or allotypes (Braun et a i, 1990; Yu and Campbell, 1987). It was reported that phenotypically, 64.5-74.5% of chromosome 6 in the population contain two different C4 genes, C4A and C4B\ 9.5-14.5% contain C4A gene only; 16-19% contain C4B gene only. Another 1-2% of chromosome 6 have three C4 genes, with either two C4A genes or two C4B genes {Fig.

1.3) (Carroll e ta l, 1986).

Besides the gene number variations, a unique feature of C4 is the size dichotomy

(Carroll et a l, 1984; Prentice et a i, 1986; Yu et al., 1986). C4 genes could be either 21 kb (long, L) or 14.6 kb (short, S), and the difference lies in intron 9. For the long C4 gene, intron 9 is 6.8 kb in size, while for the short gene, this intron is only 416 bp in size

(Dangel et al., 1994; Yu, 1991). When the sequence of the large intron 9 was determined, an endogenous retrovirus, HERV-K(C4), was discovered (Dangel et al.,

1994, 1995b; Tassabehji etal., 1994; Yu et al., 1986). HERV-K(C4) contains all the hallmark structures cf a retrovirus, including the gag, pol, and env genes {Fig. 1.4).

However, these genes have acquired many point mutations and therefore, may not produce a live retrovirus. HERV-K(C4) is organized in the opposite transcriptional orientation with respect to the C4 gene. Therefore, transcription of C4 produces the antisense version of the retrovirus. Such configuration may have selection advantage for the host because the antisense RNA for the retrovirus could hybridize to proviral genomic

RNAs and serve as a neutralizing agent to inhibit retroviral proliferation (Dangel et a i,

1994; Yu, 1999).

As it is essential for the immune response triggered by antibodies, complete deficiencies of C4A and C4B are rare and highly deleterious (Hauptmann et al., 1988;

Lokki and Colten, 1995; Uring-Lambert et al., 1989). Almost all patients suffer systemic lupus, immune complex disease in the kidney or overwhelming infections. On the other hand, partial or complete deficiencies of either C4A or C4B are more common, occurring in 10-30% of the Caucasian population (Schneider, 1990). C4 deficiencies can be due to the absence or deletion of a specific CA gene (Carroll et a i, 1985a), the presence of two genes coding for identical C4 isotypes or allotypes (Braun et al, 1990; Yu and

Campbell, 1987), or the presence of pseudogenes caused by deletions, insertions or point mutations [reviewed in (Yu, 1999)].

Genes flanking complement component C4

Downstream of C4A and C4B, a pseudogene CYP21A and a functional gene

CYP21B were identified respectively (Carroll et al., 1985b; White et a i, 1985). CYP21A contains three deleterious mutations and many other point mutations {Fig. 1.5) (Higashi etal., 1986; Rodrigues et al., 1987; White et al., 1986). CYP21B encodes steroid 21- hydroxylase, which is a microsomal enzyme expressed in the adrenal gland that catalyzes conversion of 17-hydroxyprogesterone and progesterone to 11-deoxycortisol and deoxycorticosterone respectively [reviewed in (White et al., 1994)]. Deficiency of the adrenal steroid 21-hydroxylase is the most common enzymatic defect of steroid synthesis.

It can lead to the recessively inherited disorder of adrenal steroidogenesis: congenital adrenal hyperplasia (CAH), in which inhibition of the formation of cortisol drives the adrenal cortex to overproduce androgens. This hormonal setting affects the development of genetic females by misdirecting the differentiation of external genitalia towards the male type. In classical CAH, the most common cause of genital ambiguity in females, prenatal exposure to excess androgens results in virilization of the female fetus. It occurs in about 1 in 14,000 live births. Postnatally, untreated patients do not effectively synthesize aldosterone and are salt-wasting, a condition that is potentially fatal [reviewed in (New, 1995; White et al., 1994)]. Complete deletions of functional gene CYP21B, large gene conversions to pseudo gene CYP2JA, and single point mutations in CYP21B are reported to be responsible for the disease (Strachan, 1994).

Overlapping with the 3' end of CYP21A and CYP21B genes are another pair of duplicated genes: tenascin TNXA and TNXB. The transcriptional orientations of TNX are opposite to those of CYP21 genes (Matsumoto et al., 1992b; Morel etal., 1989). The duphcation of TNX is incomplete with TNXB being 68.2 kb in size and consisting of 45 exons, while TNXA starting from intron 32 (Bristow et al., 1993; Matsumoto et a l, 1994;

Rowen et al., 1997a). Another important difference is that there is a T/VXA-specific 120 bp deletion at the boundary of exon 36 and intron 36. This deletion results in a frame shift mutation and premature termination of translation (Gitelman et al, 1992; Shen et ai, 1994).

The function of TNX has not been determined, but the striking similarities of its overall structure with tenascin-related molecules cytostatin (TNG) and restrictin (TNR) suggest that it belongs to the tenascin family (Chiquet-Ehrismann, 1995). Tenascins are believed to be important extracellular matrix proteins involved in regulating numerous developmental processes (Chiquet-Ehrismann et al., 1994). The predicted TNX protein is over 400 kDa in size and consists of five distinct domains: a signal peptide, a hydrophobic domain containing three heptad repeats, a series of 18.5 EGF-like repeats,

32 fibronectin type IQ repeats, and a carboxyl-terminal fibrinogen-like domain. TNX is ubiquitously expressed in the fetus with the most prominent expression in testis and muscle (Bristow et a l, 1993). Data obtained from rat embryo studies indicate a role of

TNX in connective tissue cell migration and late muscle morphogenesis (Burch et al.,

1995). This notion is substantiated by the report that in human, TNX deficiency is associated with connective tissue disease Ehlers-Danlos syndrome (EDS) (Burch et at.,

1997).

When the promoter region of C4A was characterized, a polyadenylation signal

AATAAA, with the characteristic feature for the 3’ end of a mammalian gene, was found

631 bp upstream of C4A transcriptional initiation sites. Novel gene R Pl (also known as

G11) was therefore discovered (Sargent et al., 1994; Shen et al., 1994). Northern blot analysis suggested that RPl is ubiquitously expressed. The deduced amino acid sequence of RPl does not reveal significant similarities to any known proteins, although a bipartite nuclear localization signal was identified (Shen et al., 1994; Yang et al., 1998). The function of RPl is unknown. Similar to TNX, RP is also partially duplicated. The duplicated gene segment RP2 is identical to the last 913 bp of the RPl gene and is unlikely to code for a protein product (Shen et a l, 1994).

MHC and disease associations

In 1967, MHC was first reported by Frenchman Amiel to be associated with

Hodgkin’s disease(Amiel, 1967). Although the association is weak, this finding more than 30 years ago is certainly significant: it opened the door to a whole new area of the

MHC field by triggering thousands of studies on the association of MHC with hundreds of diseases. At least 70 diseases have now been shown to be clearly associated with

MHC, including insulin-dependent diabetes mellitus (DDDM), congenital adrenal hyperplasia (CAH), systemic lupus erythematosus (SLE), as well as juvenile rheumatoid arthritis (JRA) (De Vries and Van Rood, 1992). Table 1.2 gives a representative list of well-established MHC and disease associations (De Vries, 1994).

Autoimmune disease is obviously in the first category on this MHC associated disease list. Most of the autoimmune diseases are polygenic. As reviewed in Vyse and

Kotzin (1998), this is first demonstrated by the genetic heterogeneity, i.e., the same phenotype resulted from the combined effect of different genes and / or alleles. There is no single gene being either necessary or sufficient for the disease development.

Therefore, the common disease genes are referred to as susceptibility genes (Heward and

Gough, 1997). The susceptibility genes may interact with one another [epistasis, (Hodge,

1981)] or act independently [additivity, (Risch, 1990)]. Secondly, the alleles that predispose to the autoimmune diseases are common in the general population. In other words, the genetic susceptibility factors for autoimmunity are not due to rare disease genes, but are instead due to common polymorphisms among apparently normal genes

(Wu et al, 1998). There is a pool of mutations or allelic variants affecting the expression and / or function of the genes involved in control of the immune response. Individually, most mutations in the genome have mild, if detectable at all, effects. However, in combination with "normal” alleles of other loci, they produce a measurable and occasionally lethal autoimmune phenotype (Vyse and Todd, 1996). The third puzzling fact is that the penetrance of autoimmune disease in carriers of the disease susceptibility genes is incomplete (Vyse and Todd, 1996). Even in the presence of a full complement of susceptibility alleles at multiple loci, overt disease does not always result (Todd,

1995). The exception is classic congenital adrenal hyperplasia (CAH) caused by CYP21 deficiency, which is the only disease linked with MHC known to show complete penetrance and to be present at birth (De Vries, 1994).

Research of insulin-dependent diabetes mellitus

The most extensively studied polygenic autoimmune disease is insulin dependent diabetes mellitus (IDDM), also known as the type 1 diabetes. IDDM is characterized by the autoimmune destruction of insulin-secreting pancreatic P cells causing tissue damage, which can lead to blindness, kidney failure and reduced life expectancy (Tisch and

McDevitt, 1996). The peak age of onset is about 12 years, and from then onwards, daily injections of insulin are required by affected individuals. With a frequency of about 0.4% in Caucasian population, IDDM is second to asthma as the most serious chronic childhood disease in the Western world [reviewed in (Cordell and Todd, 1995)].

A genome wide search for human genes that predispose to IDDM indicates that as many as twenty different chromosome regions have certain positive linkage to this disease. Among them, MHC is the strongest genetic factor (Davies et a l, 1994).

Individuals with a combination of HLA class II antigens DR3 and DR4 are also at higher risk of IDDM (Svejgaard and Ryder, 1981; W olf et a l, 1983). It has also been demonstrated that susceptibility to IDDM is most strongly determined by MHC class U

DQ P alleles that encode serine, alanine, or valine at position 57 on both chromosomes

1 0 (Nepom et al., 1987; Todd et al., 1987). Aspartic acid at position 57 mediates resistance

to IDDM, which varies in degree with the sequence of other residues in the DQ a and P chains. One theory to explain this phenomenon is that the presence or absence of an Asp

at residue 57 of the P chain may alter the peptide binding motif and the ability to bind specific immuno-pathogenic peptides (Nepom, 1990; Nepom and Kwok, 1998).

In addition to MHC class H genes, the possible candidates for IDDM susceptibility genes include C4 and several other poorly defined polymorphic genes in the class HI region (Degli-Esposti et al., 1992). There have been reports describing associations of increased frequencies of the C4A null allele and C4B3 allotype with type I diabetes (Jenhani et al., 1992; Marcelli-Barge et al, 1990). Whether these associations are a consequence of linkage disequilibrium^, with the DR and DQ loci or result from a primary association with the disease itself is still unclear. It has also been suggested that there may be a second susceptibility gene mapping close to the C4 locus (Thomsen et al.,

1988).

Important as it is, it has long been apparent that genetic susceptibility is a necessary but not sufficient predisposing factor. Even in monozygotic twins, the concordance rate is only 50% (Barnett et al., 1981), indicating the importance of a number of as yet unidentified environmental factors (Castano and Eisenbarth, 1990).

Infecting viruses are most often implicated in this category (Oldstone et al., 1991). As reviewed by Conrad et al. (1994) and Moller (1998), there are several mechanisms proposed for virus-induced autoimmunity, among which superantigens can act as activators of potentially pathogenic T cell clones (Conrad and Trucco, 1994).

II Superantigens are unique products of bacteria and viruses with the capacity to activate a large proportion of immunocompetent T lymphocytes (1-30% vs. ~1 out of 10^ of the total peripheral T cell pool) irrespective of their immunological specificity. They can cause extreme inflammation with high levels of toxic cytokines (Conrad etal., 1994;

Herman et at., 1991; Jorgensen et al., 1992). It has been reported the involvement of a pancreatic islet cell membrane-bound superantigen as a diabetes aetiopathogenetic factor

(Conrad et al., 1994). Superantigens can also be the products of endogenous retroviral genes. In 1997, a paper published in Cell proposed that a human endogenous retroviral superantigen constitutes a candidate autoimmune gene in type I diabetes (Conrad et a i,

1997). However, this conclusion is put into serious doubt recently as several other groups failed to reproduce similar experiment results (Badenhoop et al., 1999; Jaeckel et al., 1999; Muir era/., 1999).

The research of complex polygenic diseases such as IDDM, in which multiple alleles interact to produce an autoimmune phenotype, can be facilitated by studying experimental species, such as the mouse or rat. While the genes predisposing to disease in an animal model might not be the same as those for human, the underlying genetic basis could present similarities (Cordell and Todd, 1995). Non-obese diabetic (NOD) mouse, discovered at the Shionogi Research Laboratories in Japan, spontaneously develops a disease with many of the characteristics of IDDM in human (Makino et ai,

1980). Similar to human patients with IDDM, NOD mice also have cellular and humoral immune responses specific for pancreatic p cell antigens. As such, it has become the most frequently employed small animal model of IDDM, and has permitted some

1 2 important insights into the complicated pathogenesis of this disorder (Wicker et ai,

1995).

NOD mice not only share pathologic features, but also the genetic predispositions with human IDDM patients. Of the 18 genetic loci identified to date in NOD mice, the strongest contribution to the disease is again from the genes of the MHC (Wicker et al.,

1992; Wicker et at., 1995). The gene organizations of mouse MHC, or H2, are similar but not identical to that of the human (Chaplin et a l, 1983). Generally, H2 region contains similar number of genes as the equivalent human region but is smaller in size

(Gasser et al., 1994). This is due to both smaller intron length as well as smaller intergenic regions (Cho etal., 1991; Newell etal., 1996).

Gene duplications

An important feature of mouse genome is the tolerance of a large viral load, which was suggested to create potential for beneficial variations (Stoye and Coffin,

1988). Endogenous pro viruses can affect genes in their locale either by disruption, or by influence of their regulatory features on neighboring (Copeland et al,

1983; Hayward et a l, 1981). Although some of these effects are deleterious, most proviral sequences are presumably benign or would not have accumulated to their present high copy numbers. One of these examples comes from the mouse gene encoding for sex limited protein (Sip), which derived from the gene for complement component C4, but diverged in regulation and function (Karp et al., 1982; Shreffler et al., 1984). Sip requires androgen for expression and is inactive in the complement pathway, although

13 greater than 95% homology is maintained in coding and flanking regions (Hemenway et a i, 1986; Nonaka etal., 1986a, b; Ogata and Sepich, 1985). The hormone responsive expression of Sip is conferred by an endogenous pro virus which was inserted 2 kb upstream of the gene. As it imposed its regulatory features on the adjacent SLP gene, this endogenous provirus is termed imposon (IMP). That the imposon and its close relatives are stable within the mouse genome argues for some possible selective advantage provided by these elements (Stavenhagen and Robins, 1988).

Within multigene families, divergence is tolerated, so long as a functional gene is retained (Smith, 1974). Duplicated genes are allowed to acquire mutations that may eventually result in acquisition of a new function or final status as a psuedogene

(Stavenhagen and Robins, 1988). In human, an increasing number of instances of gene duplications are also being revealed as enormous progress in the physical mapping and sequencing of the has been achieved in recent years. The human genome research not only provides valuable approach to study individual genes; more significantly, it starts a whole new era of biomedical research by offering the feasibility of screening the entire genome to identify susceptibility genes of the polygenic diseases.

Search of disease susceptibility genes

The elucidation of the genetic basis of disease has been a goal of medicine for many decades. With the advent of molecular genetic technologies, over the past ten years, nearly 100 genes causing various genetic diseases have been identified by positional cloning. However, these diseases are for the most part monogenic in nature.

14 examples include cystic fibrosis and Huntington's disease. Studies of complex, polygenic diseases have proven much more difficult [reviewed in (Pratt and Dzau, 1999)].

Traditionally, linkage and association studies using candidate genes (the population based case control studies) can show the association of a gene to the disease in a disease population compared with a disease-free population (Heward and Gough,

1997). They have provided insight into the causes of polygenic diseases. However, the biggest problem here is that a positive association could result not only from a real disease causing gene, but also linkage diseqiulibrium or even an artifact of population admixture (Lander and Schork, 1994). Therefore, the number of genes that are consistently positive in multiple studies is small and the predictive power of these genetic variants is, unfortunately, limited (Spielman et al., 1993).

The entire genome screening approach makes use of genetic markers, such as microsatellites, from the existing genetic maps (Heward and Gough, 1997). A whole genome scan is the analysis of about 100 evenly spaced markers in mouse pedigree, or about 300 markers in human families, in which there are affected and non-affected progenies. The goal is to identify chromosome regions that are inherited by affected progenies more often than expected from Mendelian random segregation (Vyse and

Todd, 1996). Once genetic dissection implicates a chromosomal region, positional cloning can be used to identify the responsible gene. The availability of densely mapped genetic markers spanning the entire genome can definitely accelerate this process, and could be applied to the analysis of multifactorial traits without prior knowledge of any gene function. This realization formed a basis for the initial promotion of the Human

15 Genome Project, which promises to make a tremendous contribution to the positional cloning of complex traits by eventually providing a complete catalog of all genes in a relevant region (Tomlinson and Bodmer, 1995).

The Human Genome Project

The Human Genome Project (HGP) in U.S. was initiated about 10 years ago aiming at sequencing the entire human genome, comprised of approximately 3 billion base pairs. However, it has been estimated that only 2-3% of the genome encodes proteins. Therefore, in 1991, an intermediate goal was attempted: the defining and sequencing of the regions of the genome encoding proteins by sequencing clones from cDNA libraries (Adams et al, 1991). These sequences termed as the expressed sequence tags (ESTs), account for more than half the sequence records in GenBank {Fig 1.6)

(Boguski, 1995). Over the past 8 years, 1,304,811 entries of human EST sequences have been reported (dbEST release, March 19, 1999).

The reported ESTs are approximately an order of magnitude greater than the estimated number of genes in the human genome, which is between 50,000 and 150,000

(Fields et a l, 1994). This redundancy allows for sequence comparisons and definition of consensus sequences. A hint to the completeness of the EST database is provided by comparing the entries from the database to independently isolated genes. For example, of the 91 genes identified by positional cloning, 83 (91%) are represented in the dbEST database. Similarly, of the 94 genes cloned as oncogenes or tumor suppressors, 94% are represented (Pratt and Dzau, 1999).

16 The identification of the expressed sequences in the human genome has been useful in terms of defining gene families, discovering novel genes, and determining patterns of expression in different tissues and disease states (Pratt and Dzau, 1999).

Currently, only 10% of the ESTs correspond to known genes, another 20% have to known genes, while the remaining 70% are unknown genes with no homology and with no known function (Adams et al., 1995). Nevertheless, the ability to identify genes residing within an interval linked to a particular disease, especially if expression patterns in the tissues can be determined, will dramatically increase the power of genetic analysis and more rapidly yield candidate genes for further analysis [reviewed in (Pratt and Dzau, 1999)].

In biological research, hypotheses are formulated based on the most abstract components, and then tested utilizing a large body of rules that govern the selection and use of methods and tools best suited for addressing a particular biological question.

Hypothesis testing ultimately results in the accumulation of data, which in turn, reshape the world view and theory, leading to subsequent rounds of hypothesis formulation and testing, and an enhanced understanding of a biological question (Schena et a i, 1998). In this specific case, the need for a well-spaced map of highly polymorphic genetic markers to study multifactorial diseases promoted the Human Genome Project, which with its progress adds new dimensions to our understanding of the causes of human diseases.

The ability to rapidly genotype individuals at high density will greatly enhance the ability to determine genes playing causal roles in the development of disease on a population

17 basis and possibly to predict which individuals will be susceptible to disease and to better diagnosis and treat disease (Pratt and Dzau, 1999).

Recombination as the cause for genomic disorders

The classical mechanism for a genetic disease is that an abnormal phenotype is primarily a result of point mutation. While from the genome point of view, structural characteristics of the human genome predispose to rearrangements that also result in human disease traits. The genomic disorders are caused by an alteration of the genome that might lead to the complete loss or gain of a gene(s) sensitive to dosage effect or alternatively, might disrupt the structural integrity of a gene(s). It can occur through many mechanisms, one of which is recombination between region-specific, low-copy repeated sequences [reviewed in (Lupski, 1998)].

There is no doubt that recombination is an essential cellular process. The resultant exchange of information is critical for the survival of species. Recombination provides an effective means of generating genetic diversity that is important for evolution. It also allows cells to retrieve sequences lost by replacing the damaged section with an undamaged strand from a homologous chromosome. Although the process of homologous recombination is used to study gene function by way of gene knockouts, it certainly is also a source of harmful mutations and diseases [reviewed in (Purandare and

Patel, 1997)].

Under certain circumstances, recombination can lead to deletion and / or duplication of the genetic material. It can also result in the creation of fusion genes.

18 Sequence homology between members of gene families or pseudogenes is the trigger for such rearrangements. Unequal crossover or unequal sister chromatid exchange between a functional gene and a related pseudogene can lead to deletion of the functional gene or formation of fusion genes containing a segment derived from the pseudogene.

Alternatively, the pseudogene can act as a donor sequence in gene conversion events and introduce deleterious mutations into the functional gene [reviewed in (Fhirandare and

Patel, 1997)].

Recombinations do not occur randomly, but rather at site-specific hot spots

(Chakravarti etal., 1986; Cullen etal., 1995; Lebo etal., 1983; Wahls etal., 1990).

Specific regions within MHC have been suspected to contain hot spots for recombination, based on the studies of linkage disequilibrium between polymorphic loci (Cullen et al.,

1997; Kobori et a l, 1984; Steinmetz et a i, 1986). In this region, finding the class XU hot spots is of importance not only for its relevance in understanding meiotic recombination within the MHC, but also because of its implications for the positional cloning of candidate disease susceptibility genes (Snoek et al., 1998).

Goals of the study

Traditionally, the study of the genetic predisposition to the large number of MHC associated diseases have been mainly focused on the polymorphic class I and class II genes. However, in many cases, these gene products cannot fully explain all the disease susceptibility. Therefore, characterization of many other genes, in particular genes in the class HE region, has become increasingly important. Therefore, genetic and biochemical

19 studies of the class III genes were performed. The first purpose of the study presented in this dissertation is to establish the molecular genetic basis of the C4 and CYP21 gene duplications and deletions. Due to the concurrent gene duplications and gene deletions of

RP, C4, CYP21 and TNX, the genetic unit consisting of these four genes was termed the

RCCX module. The RCCX modular variations may contribute to the genetic instability of the MHC class in region and lead to MHC-associated diseases. The second purpose of the study is to investigate whether there are specific patterns of the RCCX modular structures in patients with insulin dependent diabetes mellitus (IDDM). As it is important to study IDDM using mice models, the comparison of human and mouse gene structures and gene organizations become apparently important. Consequently, the third purpose is to compare the gene organization of the human and mouse MHC class HI region.

Finally, although many diseases are associated with C4 gene deletion or polymorphism,

C4 may just serve as a genetic marker. It is possible that yet unidentified gene(s) in close proximity may actually be the disease susceptibility gene(s). Therefore, another important goal of this study is to determine the presence of novel gene(s) close to C4.

2 0 A. KE4 DMA RING9 RING2|COLIIA2 DPB1 \DMBTAP1 LMP7 DQB2 RINGl KE5 DPA21DPA1 RING3\ /LMP2\ /TAP2 1DQA2 DQB1 DRB2 DRA KE3 I 11 DPB2II/ DMA I \/lPR2\\//dob I |DQB3 |DQA1 DRB1| DRB3 DRB9I □ ODD 00 DQ Omi DOI 00 00 0 — I— — I— — I— — I------1— — I— — I— — I— 100 200 300 400 500 600 700 800 900 1000 CYP Hsp70 LTB CYP 21P Bf G17 G15 21 XA/ RD I Horn G4 B144 TNF B. G10 /G14 0SG\C4B|/C4A\ GIO G9 21 G7b G6 G5|G3 G1 \ AB MIC mm//G 130 mnrnfin G12\\ \|//G 1l\ C2 \G9a|G8\\l/G7aQ7|oo i oiima BAT5\| o |G2| coi \ //BAT1 m B o / \Ao o I I I I I r 1100 1200 1300 1400 1500 1600 1700 ,.,,JOOO 1900 2000 IV-2 P5.3 C. P5-2 80 A 21 90 TEN LIKE Vll-3\ll-1 1/70 16 P5- MIC OCT-3 X HSR1 MIC VI ^ IV -3 I 17B I TUBS |P5-1 |E / 30 92 I I HIV I//YP5-4 H |( 75,

I I I I I I I I I I I I I I t I I I I 1 I I— I— I— 1— I— I— I— I— I— I— I— I— 1— I— I— I— I— I— I— 1 2100 2200 2300 2400 2500 2600 2700 2800 2900 3000 3100 3200 3300 3400 3500 3600 3700 3800 3900 4000

Fig. 1.1 Map of the human MHC. From centromeric to telomeric of chromosome 6p: A, Class 11; B, Class III; and C, class I regions. (This figure is taken from Trowsdale, 1996.) Y V RP

S A WL S

PD D S PCLD NTIX A RTV “ - ï ------■ 4 41 H H LN G ILSIH S SAR SVGG C4a .Ç4ÿ. H

F/g. 7.2 Polymorphism of human complement component C4A and C4B proteins. C4 is synthesized as a single chain precursor and processed to a three chained structure ( (3—a—y) linked by disulfide bonds. The C4A and C4B isotypic residues are in bold. (This fugure is taken from Yu, 1999).

2 2 N orm al 'Half null' 'N u ll' Duplications h ap lo ty p e h ap lo ty p e h ap lo ty p e

C4A Ï'QO'

C4B C:'QO'

Fi g. 1.3 Polymorphism in the number of C4 genes expressed. (This figure is taken from Carroll et al. 1986)

23 A. Long C4 gene, 21 kb

long intron 9 (6.8 kb)

exon 1 0 env AT I pol iggg

3 'L T R 5'LTR

endogenous retrovirus HERV-K(C4) (6.36 kb)

B. Short C4 gene, 14.6 kb

short intron 9 (416 bp)

exon 9 10

Fig. 1.4 Size dichotomy of human C4 genes. A and B illustrate the exon intron organization of a long and a short C4 gene, respectively. Filled boxes represent the 5' LTR and the 3' LTR; horizontal arrows stand for transcriptional orientation. (This fugure is taken from Dangel el a l, 1995b).

24 TATAA MTAM

11III III III! I I I II ■ II I I I I iini I

21-OH ose B TaTphe^L^îTLèîr G in GGAGAC TAC TCC CTGCTCTGC G T T T T T T T G C T T CAG

21 - OHase A Vq I S er A la Leu G lu S er Pr o Vol Ph e Phe Ala STOP GTC TCr GCT CTG GAA AGC CCA G TTTTTTTTG C T TAG

Fig. 1.5 Gene structure of 21-OHase (ÇYP21) genes. Exons are boxed and numbered 1-10. The position of the TATAA box and the polyadenylation signal AATAAA are shown. The positions at which the 21-OHase A (CYP 21A) differs from 21-OHase B (CYP 2IB) are marked by bars beneath the gene structure. ( ) over a bar indicates a codon change due to nucleotide alteration. The three deleterious mutations in the OHase A (CYP 21 A) gene are illustrated below the corresponding bars. The 8 bp deletion and the T insertion, which cause frameshifts, and the C —> T transition, which generates a stop codon, are underlined. (Tais figure is taken from Rodrigues et al, 1987).

25 450000 w 400000 U EST 0 350000 u 0) 300000

250000

200000

150000 o\ 100000

50000

0 1992 1993 1994 1995*

Year

Fig. 1.6 Number of EST and non-EST entries submitted to GenBank. (This figure is taken from Boguski, 1995) 1. Electrophoretic mobility C4A C4B Agarose gel Fast (acidic) Slow (basic) SDS-PAGE (a chain) Mr 96 000 Mr 94 000

2. Thioester reactivity (a) Hemolytic activity Lower Higher (b) Rel. covalent binding affinities (i) Amino group Higher Lower (ii) Hydroxyl group Lower Higher

3. Antigenic determinants Rodgers, WH Chido, WH Blood group antigen Rg: 1,2 Ch: 1,2,3,4, 5,6

Table 1.1 Polymorphisms of complement component C4 (This table is taken from Yu, 1999). Disease HLA

Hodgkin’s disease DPB 1*0301 Graves’ disease DQA1*0102 Pemphigus DR4 Type 1 diabetes DQA 1*0301 DQB 1*0302 Multiple sclerosis DQA1*0102 DQB 1*0602 DRB1*1501 Narcolepsy DQA1*0102 DQB 1*0602 DRB5*0101 DRB1*1501 Celiac disease DQA1*0501 DQB 1*0201 Sicca syndrome DR3 IgA deficiency DR3 Idiopathic Addison’s disease DR3 Idiopathic membranous nephropathy DR3 Alloimmune thrombopenia DR52 Leprosy Tuberculoid DR3 Lepromatous DQl Tuberculosis DR2 Goodpasture’s syndrome DR2 Rheumatoid arthritis DRB 1*0404

(to be continued)

Table 1.2 Associations between HLA and some diseases (This table is taken from De Vries, 1994)

28 {Table 1.2 continued)

Disease HLA

Pernicious anaemia DR5 Fauci-articular juvenile rheumatoid arthritis DR5 Behcet's disease B51 Congenital adrenal hyperplasia (B47) C2 deficiency (DQ1-DR2-C2 NULL-B18) Systemic lupus erythematosus C4A*Q0 Ankylosing spondyhtis B27 Reiter's syndrome B27 Reactive arthritis B27 Acute anterior uveitis B27 Subactue thyroiditis B35 Fast progression of HIV infection B35 Psoriasis vulgaris cw6 Idiopathic haemochromatosis (A3)

29 CHAPTER 2

MODULAR VARIATIONS OF TEH HUMAN MHC CLASS HI GENES RCCX:

A MECHANISM FOR GENE DELETIONS AND DISEASE ASSOCIATIONS

2.1 ABSTRACT

The ficequent variations of human complement component C4 gene size and gene numbers, plus the extensive polymorphism of the proteins, render C4 an excellent marker for MHC disease associations. As shown by definitive RFLPs, the tandemly arranged genes

RP, C4, CYP21 and TNX are duplicated together as a discrete genetic unit termed ± e RCCX module. Duplications of the RCCX modules occurred by the addition of genomic fragments containing a long (L) or a short (S) C4 gene, a CYP21A or a CYP21B gene and the gene fragments TNXA and RP2. Four major RCCX structures with bimodular L-L, bimodular L-

S, monomodular L and monomodular S are present in the Caucasian population. These modules are readily detectable by Taq I RFLPs. The RCCX modular variations appear to be a root cause for the acquisition of deleterious mutations from pseudogenes or gene segments in the RCCX to their corresponding functional genes. In a patient with congenital adrenal

30 hyperplasia (CAH), we discovered a TNXB-TNXA recombinant with the deletion of RP2-

C4B-CYP21B. Elucidation of the DNA sequence for the recombination breakpoint region and sequence analyses yielded definitive proof for an unequal crossover between TNXA from a bimodular chromosome and TNXB from a monomodular chromosome.

31 2.2 INTRODUCTION

Besides the immunoglobulins, complement component C4 is probably the most polymorphic serum protein. There are two isotypes, C4A and C4B, which manifest remarkable differences in chemical reactivities and serological properties [reviewed in (Yu,

1999)]. More than thirty-four allotypes for C4A and C4B have been demonstrated by agarose gel electrophoresis, based on gross differences in electric charge (Mauff et ai,

1990a). Similar to the protein, the complement C4 genes are unusually complex with frequent variations in gene size and gene number. In addition, the genes surrounding C4A or C4B also exhibit considerable variations. These neighboring genes include RPl or RP2 at the 5' region, CYP21A or CYP21B, and TNXA or TNXB at the 3’ region {Fig. 2.1). The complex organizations of the C4A and C4B genes, together with the extensive polymorphisms of the C4A and C4B proteins render C4 an excellent marker for MHC associated diseases (Porter, 1983; Yu, 1999). For instances, congenital adrenal hyperplasia

(CAH) is mainly caused by mutations or deletions of CYP21B (Miller and Morel, 1989), systemic lupus erythematosus (SLE) is correlated with C4A deficiencies (Fielder et ai,

1983). In addition, insulin dependent diabetes mellitus (Degli-Esposti et al., 1992; Jenhani et a l, 1992), sudden infant death syndrome and spontaneous recurrent abortion (Laitinen et a l, 1991; Schneider era/., 1989), IgA deficiency and common variable immunodeficiency

(Howe et ai, 1991; Schaffer et al., 1989), IgA nephropathy (Welch et al., 1989), skin

32 vitiligo and pemphigus vulgaris (Ahmed et al., 1991 ; Venneker et ai, 1992), autism and narcolepsy (Matsuki et al, 1985; Warren et al., 1992), have all been suggested to be associated with specific alleles or nuU alleles of C4.

The human C4 genes are either 21 kb (long, L) or 14.6 kb (short, S) in size (Yu, 1991).

This dichotomous size variation is due to the presence of an endogenous retrovims HERV-

K (C4) in intron 9 of the long gene (Chu et al., 1995; Dangel et al., 1994; Tassabehji et ai,

1994). There may be one, two or three C4 genes in the MHC class HI region of chromosome 6 (Carroll et al, 1984). Most people have two C4 genes in the MHC with one coding for a C4A protein and the other coding for a C4B protein {Fig. 2.1). C4A has higher affinities to amino group containing targets; C4B has higher affinities to hydroxyl group containing targets. These differences are the result of four amino acid changes between positions 1101 - 1106 (Carroll et al, 1990; Dodds et al, 1996; Yu eta l, 1986). A significant proportion of the population has a single C4 gene in chromosome 6 coding for

C4A or for C4B. Deletion or duplication of the C4 genes are always concurrent with their downstream genes, steroid 21-hydroxylase genes, CYP21A or CYP21B (Carroll eta l,

1985a; Schneider era/., 1986).

RPl is one of the four novel genes, RD-SKI2W-DOM3Z-RP1, present in the 30 kb genomic region between complement component genes factor B (Bf) and C4 (Lévi-Strauss e ta l, 1988; Qu era/., 1998; Sargent era/., 1994; Shenera/., 1994; Yang era/., 1998). RP2 is a partially duplicated gene segment that contains only 913 bp of the sequence corresponding to the last two and half exons of RPl. RPl transcripts are ubiquitously expressed (Yang et al, 1998). Derived amino acid sequence suggested that RPl codes for a

33 nuclear protein that is probably a serine / threonine kinase (Gomez-Escobar et ai, 1998;

Shen et al., 1994; Yang et al., 1998).

The cytochrome P450 steroid 21-hydroxylase genes CYP21A or CYP21B are located

3028 bp downstream of C4A or C4B, respectively (Yu, 1991). CYP21 is essential for the biosynthesis of glucocorticoid and mineralocorticoid hormones. Complete absence of

CYP21 leads to salt-wasting, low activity of CYP21 causes simple virilizing, while below average CYP21 activity causes androgen excess [reviewed in (Miller and Morel, 1989)].

CYP21A is a pseudogene because it contains three deleterious mutations: an 8 bp deletion in exon 3 and a T nucleotide insertion in exon 7 that result in frame shift mutations, as well as a C to T transition in exon 8 that generates a premature stop codon. In addition, there are many other point mutations in coding and non-coding sequences (Higashi et ai,

1986; Rodrigues et al., 1987; White et al, 1986). If on both copies of chromosome 6, the deleterious mutations in CYP21A are incorporated into CYP21B, or the CYP21B genes are deleted, the subject suffers CAH.

The 3' ends of CYP21A or CYP21B overlap with the 3' ends of extracellular matrix protein tenascin TNXA or TNXB by 444 bp, respectively (Morel et al, 1989). The gene configurations of TNXA and TNXB are opposite to those of RP, C4, and CYP21. TNXB gene is 68.2 kb in size, consists of 45 exons, and encodes a protein of 4,289 amino acids (Bristow et al., 1993; Rowen et al., 1997b). The derived amino acid sequence of TNXB reveals a heptad, 18.5 epidermal growtn factor repeats, thirty-two fibronectin type HI repeats, and a fibrinogen domain. The overall structure of TNXB shows a striking similarity to extracellular matrix proteins tenascin/cytostatin (TN-C) and restrictin (TN-R) (Gitelman et

34 ai, 1992; Jones et al, 1988; Matsumoto et al., 1992a). TN-C is present in central and peripheral nervous system and in smooth muscle and tendon. It is probably involved in cell adhesion and cell morphology (Chiquet-Ehrishmann, 1993). TN-R is expressed in the nervous system and implicated in neural cell attachment (Rathjen, 1993). The function of

TNXB is yet to be determined. Its transcripts are ubiquitously expressed in the fetus

(Bristow et at., 1993). TNXA is a partially duplicated gene segment that corresponds to intron 32 to exon 45 of TNXB. In addition there is a 120 bp deletion at exon 36-intron 36 that results in a frame shift mutation and premature termination of translation (Gitelman et at., 1992; Shenera/., 1994).

The demonstration of the endogenous retrovirus HERV-K (C4) mediating the size variation of C4 genes (Dangel et al, 1994), the elucidation of DNA sequences for RPl and

RP2 (Shen etal., 1994), C4A and C4B (Yu, 1991; Yu etal., 1986), CYP21A and CYP21B

(Higashi et al., 1986; Rodrigues et ai, 1987; White et ai, 1986), as well as TNXA and

TNXB (Bristow et ai, 1993; Rowen et ai, 1997b; Shen et ai, 1994) provide important information for resolving the fine structures and the complex organizations of the consecutive genes RP,C4, CYP21 and TNX. The concurrent deletions / duplications of C4 and CYP21 genes (Carroll et ai, 1985a; Schneider et ai, 1986) prompted us to investigate if

RP and TNX also undergo rearrangements in normal individuals and in selected CAH patients. Diagnostic RFLPs for RPl and RP2, TNXA and TNXB have been devised. RP-C4-

CYP21-TNX genes are organized in variable, modular fashions. The unusually frequent modular variation appears to be the root cause for unequal crossovers and exchange of sequences between the functional and non-functional genes of the RCCX.

35 2.3 EXPERIMENTAL PROCEDURES

Oligonucleotides — Oligonucleotides were synthesized by an Applied Biosystem

Model 380B DNA Synthesis machine. The sequences added to facilitate cloning are represented in lower cases. For amplification and sequencing of TNX genes (Bristow et a l, 1993): RDX-5, aga gAA TTC AGT GAA ATC AGG GAG ACC; RDX-3, gag gaa

TTC CAG TGC AGC ACG GCG AA; SDX-52, GGA GCC TCA GAG TGT GCA; SDX-

32, CAA TCG GAG CCT CCA CCA; XB54H, gtg gaa ttc AAG CGA GCA CCT GAC

TCA; and XA31H, gtt gaa ttc TTT TCT TGA CTC CCA CCT G. For amplification of

CYP21A probe (Rodrigues et a l, 1987): 21A5, TGT GGC CAT TGA GGA GGA A; and

21 A3, TGC CAC CGA TCA GGA GGT C.

Isolation o f Human Genomic DNA — Genomic DNAs were isolated following standard protocols from cultured cell lines HepG2 (liver carcinoma) and MOLT4 (T-celi leukemia), peripheral blood of normal individuals, and a congenital adrenal hyperplasia patient (CAH-El). Appropriate consents from blood donors were obtained according to approved protocols by the Institutional Board of the Columbus Children's Hospital.

Complement C4 allotyping — Complement C4A and C4B allotypes from EDTA-blood plasma were determined as described (Awedeh and Alper, 1980; Sim and Cross, 1986).

36 Briefly, 10 pi of plasma was digested with 0.1 U of neuraminidase (Sigma, St Louis, MO) at

4° overnight, and with 0.1 U of carboxyl pepdidase B (Sigma) at room temperature for 30 min. Two agarose gels were prepared. Four pi of digested palsma was loaded to each gel and resolved by high voltage gel electrophoresis (Schneider and Rittner, 1997). One of the gels was subjected to immunofixation using goat antisera against human C4 (Incstar,

Stillwater, MN). Plasma proteins in the other gel were subjected to immunoblot analyses using anti-Chl or anti-Rgl monoclonal antibodies (anti-C4B, cat no. C057-325.2, lot no.

120287; anti-Rgl, RGdl; kindly provided by Dr. Joann M. Moulds, Houston, TX) at a dilution of 1:5000 and 1:1000, respectively. Immune complexes were detected by chemiluminescence method using the ECL-plus reagents (Amersham, Piscataway, NJ).

HLA typing — HLA typing of the E family was kindly performed by The Ohio State

University Tissue Typing Laboratory.

PCR of Cosmid and Genomic DNA — PCR of cosmid and genomic DNA were performed after standard procedures (Saiki, 1990). PCR products were purified from

0.8% low gelling temperature agarose and cloned into pBluescript vectors.

DNA Probes — RP: R P l.l, a 1.1 kb insert of RPl cDNA (Shen et al., 1994); RPl 3' probe, a 651 bp Nhe l-EcoR I fragment of RPl.l. C4: Pa , a 476 bp BamH l-Kpn IcDNA fragment isolated from pAT-A. pAT-A contains the almost full length cDNA insert for the human C4A4 allele (Belt et a i, 1984; Yu et ai, 1986). Pg, a C4J-specific 926 bp

37 BamR I DNA fragment subcloned from ÀJM-2a that contains a C4B5 gene (Yu et al.,

1986). CYP21: a 757 bp fragment of CYP21A, amplified from cos 2 using primers 21 AS and 21 A3. A cosmid isolated from a human genomic library D a (Y u et a i, 1992), cos 2 spans from the S' region of C4A1 gene to the 3' region of TNXB gene. The genomic DNA of cos 2 derives from the HLA haplotype A3 B47 DR7 that contains a deletion of the

CYP21B gene. TNX: a 600 bp fragment of TNXB, corresponding to exons 3S-37 of

TNXB, amplified from cos 2 using primers RDX-S and RDX-3. TNX-800: an 800 bp fragment of TNXB, immediately upstream of the TNXA/TNXB breakpoint, amplified from cos 2 using primers XB-MSS and XB-BK3.

Southern Blot Analysis — Ten micrograms of genomic DNA were digested to completion with the appropriate restriction enzymes for 16 hrs, resolved on 0.8% agarose gel, blotted onto Hybond-N membrane (Amersham, Arlington Heights, IL), and hybridized with an appropriate [a-^'P] dCTP-labeled probe, as described in (Yu and

Campbell, 1987).

DNA Sequencing and Sequence Analysis — Sequencing reactions were performed using a Sequenase kit (U.S. Biochemicals Corp., Cleveland, OH) and ^^S-ATP following the dideoxy sequencing method (Sanger et a l, 1977), or by automated sequencing method using an ABI 377 machine. DNA sequences were compiled using PC/Gene software (Intelligenetics, Mountain View, CA). Comparisons of the sequences were

38 performed by FASTA, BESTFIT, PILEUP and PRETTY programs in GCG package through the Pittsburgh Supercomputing Center (Genetics Computer Group, 1991).

39 2.4 RESULTS

Allotyping o f C4

Allotyping of the plasma C4 proteins from the B and E family members is shown in Fig. 2.2, panel I. Deduced from the immunofixation experiments, B 1 is homozygous for C4AQ0 C4B1 {lane 1); B2 and B3 are both heterozygous, with C4A3 C4B1, C4A3

C4BQ0 for B2 {lane 2) and C4A3 C4B1, C4AQ0 C4B1 for B3 {lane 3). E l has C4A1

C4BQ0, C4A3 C4BQ0 {lane 4); E2 is homozygous for C4A3 C4BQ0 {lane 5); and E3 is heterozygous for C4A1 C4B1, C4A3 C4B1 {lane 6). A normal control with homozygous

C4A3 C4B1 was shown in lane 7. Previously, it was shown by hemagglutination that the

C4A3 allotype reacts with anti-Rgl and the C4B1 reacts with anti-Chl. In this study, immunoblot analyses confirmed that in each individual except for E 1 and E2 (both of which are homozygous C4BQ0), C4B1 reacted with anti-Chl monoclonal antibody

{panel II). Noticeably, C4A1 in El {lane 4) and E3 {lane 6) also interact with the anti-

Chl antibody. In panel IQ, all individuals except B1 (which is homozygous C4AQ0) manifested positive reactions for the C4A3 allotype with the anti-Rgl monoclonal

antibody.

40 Modular variations fo r RP, C4, CYP21 and TNX

Whether the flanking genes RP (located 5’ to C4) and TNX (located 3' to CYP21) are involved in the C4-CYP21 gene deletion events were investigated. Diagnostic RFLPs were devised to detect and distinguish the presence of the RPl and RP2, as well as TNXA and TNXB. By using an RPl 3' probe for hybridization, the RPl gene can be represented by a 9.6 kb BamYi I fragment, and the RP2 (and TNXA gene segments) can be represented by a 5.0 kb BamH I fragment. By using a 600 bp TNX probe corresponding to TNXB exons 35-37, the presence of the TNXB gene can be detected by a 9.0 kb Sea I fragment, and the TNXA gene by a 4.0 kb Sea I fragment {Fig. 2.3, panel I). Previously, techniques were established for detecting the presence of the C4A and C4B genes by Nla IV RFLP, and their associations with Rgl or Chi antigenic determinants by EeoO 1091 RFLP (Yu and Campbell, 1987). In addition, the presence of CYP21A and CYP21B can be detected by the 3.2 kb and 3.7 kb Taq I fragments, respectively (White et a i, 1986). Fig. 2.3

{panels II to IV) shows a series of Southern blot analysis of DNA samples isolated from two families, B and E. Members of the B family appear to be normal. For the E family, there is a congenital adrenal hyperplasia (CAH) patient E 1.

As shown in Fig. 2.3, panel II, restriction fragments corresponding to RPl (9.6 kb

BamH I fragment) and RP2 (5.0 kb BamH I fragment) were detected in all individuals except B1 and El. In these two individuals, only the RP/-specific 9.6 kb fragment is present {lanes 1 and 5).

The presence of the C4A or C4B gene was analyzed by Nla IV restriction patterns shown in panel III-A. B1 contains the 467 bp fragment corresponding to C4B genes

41 {lane 1). El and E2 have the 276 bp and 191 bp fragments specific for C4A genes {lanes

5 and 6). Other individuals contain fragments corresponding to both C4A and C4B genes.

The associations of C4 genes with the major Chido (Chi) and Rodgers (Rgl) antigens were revealed in panel EI-B by the EcoO 1091 restriction patterns. The C4B genes in B 1 express the Chi antigen since only the 458 bp fragment can be detected {lane 1). The

C4A genes in E2 express the Rgl epitope since only the 565 bp fragment is present {lane

6). Both Rgl and Chi specific fragments are present in El {lane 5). Therefore, one of the C4A genes in El expresses Chi (that is frequently associated with C4B), the other

C4A gene expresses Rgl. All other individuals contain both C4A and C4B genes. These

C4 genes express Rgl and Chi because both 565 bp and 458 bp fragments were detectable. An additional 344 bp, EcoQ 1091 fragment exists in all individuals as the 926 bp C4d specific probe (Pg) was used. This fragment is common to C4 genes with Rgl or

Chi (Yu and Campbell, 1987).

The presence of CYP21A and CYP21B was determined by Taq I RFLP {panel IV).

Both CYP21A (3.2 kb fragment) and CYP21B (3.7 kb fragment) were detected in all samples except B1 and El. B1 only contains the functional CYP21B gene {lane 1), while the CAH patient El only contains the pseudogene CYP21A {lane 5).

The presence of TNXA and TNXB genes are determined by Sea I RFLP {panel V).

Restriction fragments for both TNXA (4.0 kb Sea I fragment) and TNXB (9.0 kb Sea I fragment) were detected in all individuals except B 1 and E l {panel II-B). These two individuals have the 9.0 kb Sea I restriction fragment corresponding to TNXB but no 4.0 kb Sea I fragment corresponding to TNXA {lanes 1 and 5).

42 From the above results, it becomes clear that individuals who have a single locus for

RP also have single gene loci for C4, CYP21 and TNX. Individuals with both RPl and

RP2 loci also have duplicated loci for C4, CYP2I and TNX. Hence, the four tandemly arranged genes RP, 04, CYP21, and TNX are duplicated or deleted together as a discrete genetic unit. This genetic unit is designated as the RCCX module.

Molecular basis of the 04 and RP Taq I RFLP and variations o f the RP, 04, 0YP21 and

TNX (ROOX) loci in the B and E families

The Taq I RFLP is the most widely used technique to illustrate the complexities of polymorphisms in the number and size of 04 genes. Four fragments of 7.0, 6.4, 6.0 and

5.4 kb in size can be detected by Southern blot analysis of Taq I digested genomic DNA hybridized to ? a , a 04 S' probe (Schneider et al., 1986; Simon et al., 1997; Yu and

Campbell, 1987). At the same time, all these Taq I fragments can also be detected by an

RPl.l probe, suggesting that the Taq I RFLP is caused by both RP and 04 gene variations. There is a BamYi I site at the 5' untranslated region of the 04A and 04B genes. To segregate the RP1/RP2 variation from the 04 long/short (JJS) size dichotomy, the BamYi i-Taq I double digests of genomic DNA were performed {Fig. 2.4, panel I).

As shown in panel U, the single 6.4 kb Taq I fragment {panel II-A, lane 1) was split to a

3.1 kb {panel II-B, lane 1) and a 3.2 kb {panel II-C, lane 1) Taq 1-BamYi I fragments corresponding to an R P l gene and a short 04 gene. The 7.0 kb and 6.0 kb Taq I fragments {panel II-A, lane 2) were split to 3.1 kb and 2.1 kb Taq i-BamYi I fragments corresponding to RPl and RP2 genes {panel II-B, lane 2), and a single 4.0 kb BamYi I-

43 Taq I fragment corresponding to long C4 genes {panel II-C, lane 2). The 7.0 kb and 5.4 kb Taq I fragments {panel II-A, lane 3) were divided into the ÆP/-associated 3.1 kb and the /?P2-associated 2.1 kb Taq \-BarnVi 1 fragments {panel II-B, lane 3), and the 4.0 kb and 3.2 kb BamH \-Taq I fragments associated with long and short C4 genes, respectively

{panel II-C, lane 3). Therefore, the restriction patterns of the Taq I genomic Southern blot for RP or C4 genes are resulted from a combination of variations of RPl or RP2, one or two loci of C4 genes, and long or short C4 genes. The basis of the Taq I RFLP is interpreted as follows: the 7.0 kb fragment indicates the presence of an RPl gene and a long C4 gene; the 6.4 kb fragment corresponds to an R Pl gene and a short C4 gene; the

6.0 kb fragment represents an RP2 and a long C4 gene; and the 5.4 kb fragment stands for an RP2 and a short C4 gene. The actual sizes of these Taq I restriction fragments derived from DNA sequences are 0.1 kb greater or smaller than the apparent sizes described above.

The specific combinations of RP1/RP2 genes with C4(L)/C4(S) genes in the B and E family members were examined by Taq 1 digests and R P l.l probe as shown in Fig. 2.5, panel I. The RP1-C4(S) haplotype is present in B1, as represented by the single 6.4 kb

Taq I fragment {lane 1). The RP1-C4(L) haplotype is present in El, as shown by the single 7.0 kb Taq I fragment {lane 5). Therefore, homozygous, single RPl gene and single C4 gene are present in B 1 and El. B2 has the homozygous RP1-C4(L)-RP2-C4(L) haplotype as revealed by the presence of the 7.0 kb and the 6.0 kb Taq I fragments {lane

2). B3 and B4 (children of B1 and B2) are heterozygous for the RP1-C4(S) haplotype and the RP1-C4(L)-RP2-C4(L) haplotype since they both contain the 7.0 kb, 6.4 kb and

44 6.0 kb Taq I fragments {lanes 3 and 4). E2 and E3 (parents of El) are heterozygous for the RP 1-C4(L)-RP2-C4(L) haplotype and the RP1-C4(L) haplotype since they both have the 7.0 kb and 6.0 kb Taq I fragments {lanes 6 and 7).

Detection of the TNXA-associated 120 bp deletion in a TNXB gene of CAH patient El

Compared with TNXB, TNXA has not only a 63 kb truncation of the 5’ region

(Bristow et a i, 1993: Rowen et al., 1997b; Shen et al., 1994), but also a 120 bp deletion spanning across the junction of exon 36 and intron 36 (Shen et al., 1994). This deletion attributes to a Taq I RFLP when a TNX 3' probe is used for Southern blot analysis: TNXA associates with a 2.4 kb fragment (TNX-2.4) and TNXB with a 2.5 kb fragment (TNX-

2.5). The analysis of this TNX Taq I polymorphism in the B and E families are shown in

Fig. 2.5, panel DI. B 1 who has monomodular RCCX (S) structure exhibited TNX-2.5

{lane 1) that is consistent with the presence of TNXB genes only. B2 who has bimodular

RCCX (L-L) structure revealed both TNX-2.4 and TNX- 2.5 {lane 2), which is concordant with the presence of both TNXA and TNXB. B3 and B4 who are heterozygous for bimodular RCCX (L-L) and monomodular RCCX (S) also revealed both TNX-2.5 and TNX-2.4 fragments.

E3 is heterozygous for monomodular RCCX (L) and bimodular RCCX (L-L), implying that she has two TNXB genes and one TNXA gene. As expected, the relative band intensity of TNX-2.5 to TNX-2.4 is 2:1 {lane 7). E2 is also heterozygous for monomodular RCCX (L) and bimodular RCCX (L-L) with two TNXB genes and one

TNXA gene in a diploid genome. However, the band intensity of TNX-2.5 is only half of

45 that for TNX-2.4 {lane 6), which is opposite to the expected result. The CAH patient El is homozygous for monomodular RCCX (L) and has only TNXB but no TNXA.

Unexpectedly, he manifested both TTVXR-related 2.5 kb fragment and TAXA-related 2.4 kb fragment. The relative band intensity for TNX-2.5 and TNX-2.4 is 1:1. These results suggest that in both El and E2, one of the TNXB genes contains the 120 bp deletion. In other words, the aberrant TNXB gene in El originates from the paternal chromosome.

Correlation of the HLA typing data for the E family {Table 2.1) and the Taq I RFLP data suggests that the aberrant TNXB gene is present in HLA haplotype A3 B35 DRl.

Based on the phenotypes of the C4A and C4B proteins, together with the haplotypes of the RCCX modules, the organizations of RCCX genotypes for the B and E family members are deduced and shown in Table 2.1. The haplotype c of the B family and the haplotype a of the E family both have bimodular RCCX structures coding for C4A3 protein from the two C4 loci.

An unequal crossover between monomodular and bimodular RCCX chromosomes leading to gene deletion and gene duplication

The presence of a monomodular RCCX structure in El with RP1-C4(L), CYP21A, and a TNXB gene with 7A%4-associated 120 bp deletion leads us to hypothesize that there was an unequal crossover between TNXA and TNXB from the homologous chromosomes. This unequal crossover resulted in the formation of an XB-XA recombinant and the deletion of the RP2-C4B-CYP21B genes {Fig. 2.6, panel I). To test this hypothesis, the genomic region spanning the putative 120 bp deletion was amplified

46 by PCR using the CAH-El genomic DNA. The strategies for PCR are depicted in panel

H of Fig. 2.6. When primers RDX5 and RDX3 were used, two PCR products in the size

of 493 bp and 613 bp were obtained. These products were cloned and sequenced. Both

sequences correspond to the exons 35 - 37 of TNX. The sequence of the 613 bp is

identical to that of the regular TNXB, while the 493 bp product appears to contain a 120

bp deletion similar to that observed in TNXA. To further prove that this is an aberrant

TNXB gene with a TTVXA-associated 120 bp deletion, a 2.6 kb genomic DNA fragment

was amplified using RYM25 and XA3IH. RYM25 is a TTVXB-specific primer because it

is located in exon 32 and is 220 bp upstream of the gene duplication breakpoint for TNXA

and TNXB. The XA3IH primer spans across the 120 bp deletion and therefore is TNXA-

specific. The 2.6 kb PCR product was cloned and sequenced to completion. The

sequences of the 2.6 kb and the 493 bp fragments overlap and so were compiled. The

resulting sequence EIXB-A was aligned with two normal TNXB sequences (TNXB-H

and TNX-M) (Bristow et ai, 1993; Rowen et al., 1997b) and two normal TNXA

sequences (TNXA-M and TNXA-Y) (Shen et al., 1994; Tee et al., 1995) obtained from

three different laboratories. In addition, we include a recombinant TNXA sequence in a pauciarticular JRA patient, LIXA-B that acquired the described 120 bp genomic DNA

sequence (Rupert et a i, 1999). The alignment starts at the breakpoint of gene duplications for TNXA-TNXB. The alignment reveals a picture for the original DNA recombination leading to the reciprocal recombinant sequences in EIXB-A and LlXA-B

{Fig. 2.7, panel I). There are nine single nucleotide changes or informative sites (marked by asterisks) which can differentiate TNXA from TNXB. A Chi sequence related to DNA

47 recombination in bacteriophage A. is located at nucleotides 450-457. Five of the

TNXA/TNXB informative sites (nucleotides 14, 60, 63, 73 and 278) are 5’ to the Chi site.

The other four sites (nucleotides 2234, 2268, 2273 and 2423) are 3' to the Chi site and they are flanking the 120 bp deletion. For ElXB-A, the sequence is TTVXB-specific for the first five informative sites, but is 77V%A-specific for the last four sites. It also acquires the TAXA-specific 120 bp deletion {Fig. 2.7, panel II). This suggests that EIXB-A is the result of a recombination between TNXB and TNXA, occurred between nucleotides 278 and 2234.

For LIXA-B, it has sequences characteristic of TNXA at the first five informative sites, but has sequences characteristic of TNXB at the last four sites and also includes the

120 bp sequence. This aberrant TNXA sequence in LIXA-B appears to be the reciprocal product of EIXB-A resulted by DNA recombination between TNXA and TNXB.

48 2.5 DISCUSSION

It is shown here that in human MHC class HI region, four tandemly arranged genes serine / threonine kinase RP, complement C4, steroid 21-hydroxylase CYP21, and tenascin TNX are organized as a genetic unit designated as an RCCX module. In a monomodular RCCX haplotype, the "full length" genes RPl and TNXB are always present, inferring relevant cellular functions for RPl and for TNXB. In an RCCX bimodular haplotype, duplication of the RCCX module occurs by the addition of a C4 gene, a CYP21 gene together with the TNXA and RP2 gene segments. This additional, modular genomic fragment is either 32.5 kb or 26.2 kb in size, depending on whether the

C4 gene contains the endogenous retrovirus HERV-K (C4). The three pseudogenes/gene segments, CYP21A, TNXA and RP2, present between the two C4 loci, probably do not encode for functional protein products. The concurrent deletions of a C4A or C4B gene with a CYP21A or CYP21B gene is a well-established phenomenon (Carroll et al., 1985a;

Schneider et al., 1986), this report provides the detailed documentation for the modular deletions or duplications of RP and TNX genes together with C4 and CYP21.

Although this multiple-gene modular variation observed in RCCX is uncommon in mammalian genetics, a similar phenomenon with concurrent variations of at least three tandem genes has been observed in a genomic region at chromosome 5ql2-13. The three genes are BTFp44 (a p44 subunit of transcriptional factor TFIIH), NAIP (neuronal

49 apoptosis inhibitory protein), and SMN (survival motor neuron). One, two or three modules of these three genes, each module spans about 500 kb in size, are present in the population (Lewin, 1995). The divergence of sequences in C4A and C4B is analogous to that in SMN^ and SMN^; the presence of the pseudogene CYP21A and the functional

CYP21B is analogous to the pseudogene NAIP^ (with deletion of exon 5) and the intact

NAIP. The absence of the gene is associated with most spinal muscular atrophy

(SMA), and deletion of the NAIP gene in most severe forms of SMA (Taylor et al.,

1998a).

The frequency of the RCCX modular variations has been studied in a population of 150 normal Caucasian females. It is discovered that 75.4% of the C4 genes are long, and 24.6% are short. Bimodular and monomodular RCCX organizations are present in about 71.6% and 16.2% of the chromosome 6, respectively. Trimodular RCCX haplotypes are uncommon and have a frequency about 12.2% [Hauptmann et al, 1988,

C. A. Blanchong and C.Y. Yu, unpublished observation]. Excluding the trimodular haplotypes (Mauff et a i, 1990a; Raum et at., 1984), there are four major RCCX modular structures in the Caucasian population: bimodular long-long (L-L), bimodular long-short

(L-S), monomodular long (L), and monomodular short (S) {Fig. 2.8). These four RCCX structures can be detected conveniently by Taq I RFLPs. The widely applied Taq 1

RFLP analysis of C4 genes yields information on the combination of RPl or RP2 with

C4(L) or C4(S). It does not yield definitive information, however, on whether the C4 gene codes for C4A or C4B proteins. The information for the presence of C4A and C4B genes may be obtained by Nla IV RFLP analysis or by direct DNA sequencing. From

50 these four RCCX organizations, twelve haplotypes for RPI/RP2, C4A/C4B,

CYP21A/CYP21B, TNXA/TNXB are observable in the normal and in disease populations.

Six of these haplotypes are more common in the normal population and they are highlighted (haplotypes 1,2, 5, 9, 10 and 12). RCCX bimodular haplotypes with two

C4B genes (haplotypes 4 and 8) are yet to be shown definitively. Bimodular haplotypes with two CYP21A genes (haplotypes 3 and 6) and a monomodular haplotype with a

CYP21A gene (haplotype 11) are present in CAH patients. In two Brazilian tribes, a fifth

RCCX organization with two short C4 genes is found (Weg-Remers et a i, 1997). This bimodular C4(S)-C4(S) combination (haplotype 13) is extremely rare in other ethnic groups.

The diversities in the number and size of the RCCX modules probably contribute to the genetic variability and instability of the MHC class EH region. A recombination between a bimodular RCCX chromosome and a monomodular RCCX chromosome may lead to the exchange or homogenization of polymorphic or mutant sequences between complement C4A and C4B loci. This has been demonstrated by the presence of the C4A- associated amino acid residues in many C4B allotypes, and vice versa. The typical examples are the reverse associations of Chi antigenic determinant with C4A1 and

C4A13, and Rgl antigenic determinant with C4B5 (Giles et al., 1988; Yu et ai, 1986,

1988). It was also shown that in an SLE patient, there is an acquisition of a 2 bp insertion in exon 29 from C4AQ0 to C4BQ0 (Barba et a l, 1993; Lokki et a l, 1999). The same type of recombination may also lead to the acquisition of mutations from pseudogene

CYP21A or gene segment TNXA to their corresponding functional genes (Harda et al..

51 1987; Tusie-Luna and White, 1995). This is manifested in many disease-associated haplotypes such as the presence of two CYP21A pseudogenes in RCCX bimodular haplotypes (Harda et a i, 1987), and a CYP21AICYP21B hybrid in CAH patients with

HLA haplotype A3 B47 DR7 and monomodular RCCX (Chu et al., 1992; Donohoue et a i, 1995). Another example comes from the presence of a TNXB/TNXA hybrid gene with the 120 bp deletion together with the deletion of the RP2-C4B-CYP21B gene in CAH patient(s), as demonstrated in this chapter.

In CAH patient E l, the 120 bp deletion in the TNXB-XA recombinant will cause a premature termination and therefore truncation of the carboxyl terminal sequences. The truncation includes three fibronectin type m repeats (with four N-linked glycosylation sites) and the entire fibrinogen domain. This mutation may diminish or knockout the function of TNX. The haplotype for the recombinant chromosome with monomodular

RCCX characterized the presence of a CYP21A pseudogene linked to the TNXB/XA hybrid is HLA A5 BBS DRl, RPl C4A3 CYP21A TNXB-XA.

Three independent observations on the deletions of the CYP21B genes, which could have arisen by a mechanism similar to that for haplotype b of CAH-El, were reported. In the first study, HLA haplotype B35 DRl was found in a salt losing CAH patient. In this case, a single C4A3 gene and a CYP21A gene were present with the deletion of C4B and

CYP21B (Jospe et a i, 1987), which is similar to haplotype b of CAH-El. Whether this patient has a recombinant TNXB-XA in the monomodular RCCX was not determined.

In the second study, a de novo deletion of C4B together with CYP21B was suggested to derive from a meiotic unequal crossover between the maternal homologous

52 chromosomes (Sinnott et al., 1990). From our current knowledge, this de novo deletion was probably resulted by an unequal crossover between TNXA from a bimodular L-S chromosome (HLA A30 BI3 DR7) and TNXB from a monomodular S chromosome (HLA

A1 B8 DR3). The recombinant has a monomodular L chromosome (HLA A30 B I3 DR3) with CYP21A and a 2.4 kb Taq I fragment at its 3' end, which is indicative of a 120 bp deletion in the TNXB gene.

In the third study, one of the chromosome 6 in a CAM patient appears to have a monomodular RCCX with the 120 bp deletion in TNXB. The other chromosome 6 has a bimodular RCCX with two CYP21A genes and no CYP21B gene, as revealed by Southern blot analyses using Taq I, Bgl H and II digested genomic DNA. Immunoblot analysis showed that this patient did not produce any TNXB protein. Therefore, both

TNXB genes from the homologous chromosomes were presumed to be non-functional.

Since the patient suffers Ehlers Danlos syndrome in addition to CAH, it is proposed that malfunction of TNXB is associated with the connective tissue disease (Burch et at.,

1997).

Another important piece of evidence for the described unequal crossover comes from studies on the molecular genetics of a pauciarticular juvenile rheumatoid arthritis (JRA) patient in our laboratory. This JRA patient has an RCCX bimodular haplotype with two

CYP21B genes and a 5'-TNXA/XB-3' hybrid with the T/VXA-specific truncation of exons

1-32 at the 5' region, and the presence of the TiVXB-specific 120 bp sequence at the 3' region (Rupert et a i, 1999). The TNXA-XB hybrid appears to be the reciprocal product in

CAH-El 5'-TNXB/XA-3', attributable to the genetic recombination between TNXA and TNXB. This notion is substantiated by the reciprocal associations of the informative sites for TNXB and for TNXA at the two ends of the hybrid sequences {Fig. 2.7).

In the bimodular (or trimodular) RCCX haplotypes, one of the duplicated genes or gene segments could afford to sequence mutations without immediate deleterious effect of knocking out the gene function. The RCCX modular variations in the population allowed sequence variations and enhanced the incorporation of diversified or mutant sequences among the paralogous genes, pseudogenes or gene segments. The selection advantage is probably the emergence of various polymorphic forms of complement C4A and C4B to tackle different microbial antigens (Kawaguchi et al., 1991). The burdens are the accompanying genetic or autoimmune diseases such as CAH, SLE and possibly EDS, caused by unequal crossovers and incorporations of deleterious mutations in the constituents of the RCCX.

In the MHC class II region between DRBl and DRA genes, there may be 1 to 3 DRB pseudogenes. In addition, DRB3, DRB4 or DRB5 can be present (Apple and Erlich,

1996). The DR locus is about 350 kb centromeric to the RCCX modules. Between heterozygous chromosomes with different DR gene number and RCCX modules, misalignments and unequal crossovers would occur during meiosis. This may result in deletion or duplication of structural genes between these two variable regions. More than ten structural genes with important functions have been discovered between DR and

RCCX (Aguado et al., 1996; Rowen et a i, 1997b). Recombinant chromosomes with the essential structural genes deleted would be lethal. Therefore, the apparent productive recombination frequencies between certain MHC haplotypes would be less than

54 expected. This would have contributed to the linkage disequilibrium of the MHC genes, as many of the "ancestral" MHC haplotypes remain largely conserved in the population.

55 10 20 kb I I I L_J to HLA Class I to HLA class II 92 kb to HSP70 350 kb to HLA DRr // $ C2 Bf RD RPl C4A C4B TNXB III II I I I I

Intcrgenicdistance ^ g 00 00 LA s s O n m m

Fig. 2.1 A molecular map of the human MHC complement gene cluster. This map represents the most

common organization of the genes in the normal population from complement component C2 gene to

extracellular matrix protein TNXB gene. Horizontal arrows represents the direction of gene transcription.

Pseudogenes or gene segments are shaded. The negative signs for the intergenic distances between CVP2IA and TNXA, CYP2IB and TNXB represent overlaps at the 3' ends of these genes (This figure is taken from Yang et al, 1998). I. Immunofixation

B1 B2 B3 El E2 E3 C

C4BI C4AI

1 2 3 4 6 7 n. Immunoblot (anti-Chl)

C4B1- # # C4AI

1 2 3 4 5 6 7 ni. Immunoblot (anti-Rgl)

C4A3 -

Fig. 2.2 C4 allotypes of the B and E families. I. Human C4A and C4B allotypes detected by immunofixation; Immunoblot analysis of the C4 allotypes using (//) anti-Chl monoclonal antibody and {JIT) anti-Rgl monoclonal antibody. The C4A1 allotype with reverse association of Chi epitope is marked by an arrow. Lane 4 is from a control plasma with C4A3 and C4B1.

57 I.

A. Bamn I RFLP forRPl and RP2 B. Sea I RFLP for TNXA and TNXB

R P l i C4 CYP2IA TNXA RP2-C4B BamH I BamH I Sea I Sea I I____ I______(9.6 kb) (4.0 kb)

TNXA \RP2\ C4 CYP21B TNXB BamH I BamH I Sea I Sea I 1___ _i I (5.0 kb) (9.0 kb)

(to be continued)

Fig. 2.3 Southern blot analysis to detect the modular variations of RP, C4, CYP21 and TNX. DNA samples were isolated from B and E family members and subjected to appropriate restriction enzyme digestions. Panel I illustrates the basis to detect R P l and RP2 by BamYl I RFLP, and to detect TNXA and TNXB by Sea I RFLP. Solid bars represent locations of the probes used. Panel II shows detection of RP 1 and RP 2 genes by BamYi I RFLP. The probe used was R P l.l. Panel III-A shows the variation of C4A and C4B isotypes by Nla IV RFLP; panel III-B reveals the expression of the Rgl and Chi antigenic determinants in C4 genes by EcoO 1091 RFLP. The probe used was Pg.

Panel IV exhibits the presence of CYP21A and CYP21B genes by Taq I RFLP. The probe used was a 757 bp fragment amplified from CYP2IA. Panel V shows the detection of TNXA and TNXB by Sea I RFLP. The probe used was a 600 bp fragment corresponding to exons 35-37 of TNXB.

58 Fig. 2.3 (continued)

II. BI B2 B3 B4 El E2 E3 E4 1 2 3 4 5 6 7 8

RPl

5.0 - K RP2

III.

A. BI B2 B3 B4 El E2 E4 BI 32 B3 B4 El E2 E4 bp 1 2 3 4 1 2 3 4 5 6 7

467.

276. — Chi

191

IV.

— CYP21B

CYP21A

BI B2 B3 B4 El E2 E4 kb 1 2 3 4 5 6 7

TNXB 9.0 —

4.0 — TNXA

59 I. RP I C4 flEKY-K(Ç^) Taq I Bam HI ' Taq I _l______I______I___ Long C4, RPl I------3.1------1------4.0-

1ERV-K

Long C4, RP2 I ______•2.1---- 1------4.0-

Taq I Bam HI Taq I Short C4, RPl 3.1- 3.2

Taq I Bam HI ' Short C4, RP2 _ J ______I L 1 2.1 1- -3.2

n. A. Taq I B. Taq I-Ba/nH I C. Taq l-BamH I

kb I kb I

RPl _ C4

RP-C4 C4

Fig. 2.4 Southern blot analysis to segregate the RP1/RP2 variation from the C4 long/short (L/S) size dichotomy.Panel I is a schematic diagram for the molecular basis of the Taq I and Taq I - BamH I RFLPs. In panel II-A, genomic DNA isolated from BI {lane 1), HepG 2 (liver carcinoma, lane 2) and Molt 4 (T- cell leukemia, lane 3) were digested with Taq I and hybridized with i?Pl.l probe. In panels II-B and II-C, ON As were double-digested with Taq I and BamH I, and subjected to ÆP1.1 probe {panel II-B) or C4 5' probe {panel II-C).

60 BI B2 B3 B4 El E2 E3 kb I 2 3 4 5 6 7

II.

3.7 —

3.2 —

III.

—TNXB —TNXA

Fig. 2.5 Taq I polymorphism to detect the 7WÆA -associated 120 bp deletion in the TNXB gene of CAH patient El. Genomic DNA isolated from B and E families were digested with Taq I and hybridized with appropriate probes to demonstrate different RP-C4 combinations (panel I), the presence of CYP21A and CYP21B genes (panel II), and the variations of TNX genes (panel HI). The 2.5 kb Taq I fragment is usually associated with TNXB, while the 2.4 kb Taq I fragment is usually associated with TNXA. An arrow indicates the unusual association of a 2.4 kb fragment with TNXB in E l.

61 to 2 0 tcb

I 1 I t 1

-W W [ A-C-B‘l^TSF-HSP70.. HLA fDR-DQ-DPl--

% 9 Bimodular RCCX

C2 Bf I r D R P l C 4A (L) - 5 xa C4B (S ) TNXB

C2 HBf IRD C4A (L) 1 TNXB ^ 2 Monomodular RCCX I

HLA A24 B8 DR3 2 — RPl

and

HLAA3B35 DRI I — RPl C4A (L) XA 1 XB

200 400 bp

>-> < < u. 41 40 39 38 37 36 35 33 3234

TNXA A ^ TNXB 1120 bp deletion I breakpoint of gene duplication

TNXA-XB RECOMBINANT

Fig. 2.6 Recombination between RCCX modules,(i) A model for an unequal crossover between a bimodular RCCX chromosome and a monomodular chromosome to generate a TNXB/XA recombinant and a TNXA/XB recombinant; {IT) PCR strategy to amplify the breakpoint region of gene recombination and its corresponding exon-intron structure.

62 Fig. 2.7 Elucidation of the breakpoint region forTNXB-XA recombinant. (7) An alignment of the normal and recombinant TNXA and TNXB genomic DNA sequences.

Nucleotide number starts at the breakpoint region {bold and underlined) of the TNXB and

TNXA gene duplication. The normal TNXB sequences, TNXB-H (GenBank accession no.

U89337) and TNXB-M (GenBank accession no. X71937) are taken from Ref. 37 and

Ref. 38, respectively. The normal TNXA sequences, TNXA-Y (GenBank accession no.

L20263) and TNXA-M (GenBank accession no. S38953) are taken from Ref. 30 and

Ref. 38, respectively. EIXB-XA is generated by this work. LIXA-XB (GenBank accession no. AF077974) is from Ref. 72. TNXA and TNXB specific sequences or informative sites are marked by asterisks. {II) A schematic diagram showing the reciprocal locations of TNXA and TNXB informative sites in the TNX recombinants of

CAH-El and JRA-Ll. Informative sites are shown as vertical strokes', an open box represents the 120 bp deletion. The continuous DNA sequence for the breakpoint region of the TNXB/TNXA recombinant (EIXB-A) is available from the GenBank under the accession number AF086641.

63 * M ** * 100 ISO •ntXD-H ...... g ...... C-O...... 0 ...... g ...... 'noCB-M ...... g ...... C-O...... c ...... g ...... BIXB-A ...... g ...... , ------C-O ...... o ...... g ...... blX A -B ...... t ...... t - a ...... g ...... g ...... TTOA-Y ...... t ...... - ...... t - a ...... g ...... o . g ...... TNXA-M ...... t ...... g ------t a ...... g ...... o - g ...... CONSDJSUS cttg tgcatttua-ctgaagacaaaoatgactgcaggaatugqcaggccfl gagi-gggg-g-cctggccCgt-cccaggAaggaggaggagtctgcagcc ctgtg-gotCcaacatccatcaaggagtccagagcaggagccaggccagg

Brcakjumt o f i^ene ihtpHcotionfor TNXB ‘TNXA

200 250 * 100 TOXB-II ...... a ...... o ...... TNXB-H ...... o ...... BIXB-A ...... a ...... - ...... o ...... LIZA-B ...... - ...... t ...... ■nOCA-Y ...... t ...... TNXA-M ...... t ...... CONSQISUS cgggagggaaaggccctgggaggggctctctaatctcccngccccgactc tgccccgtcactgccgccgccccLcattactcgctggggctgctgCcgcc tccccgaaggglggccttgcccagaCag-ggcaaaccCccccgccgcgga

150 400 450 TNXB-H ...... - ...... TNXB-H ...... BUCB-A ...... - ...... LIXA-B ...... - ...... - ...... TNXA-Y ...... - ...... a ...... TNXA-M ...... CONSO^SUS cgagtcaggagcalCttctCaagaggaacatcactggaaaacaaaatgag cggggacacagaAaccaacAgcagCggcigcACttgtggCacaggctccC ctCccagagctcgctgatgcccaccLcagAcaggccigaccAcggcacgg

500 550 600 TNXB-M ...... TNXU-M ...... il x a - A ...... LIXA-B ...... g TNXA-Y ...... TNXA-M ...... CONSENSUS ctggtggjaiiigc:cauicacctcaaccatjccagiiccacccicagctic iclcagaagggagcaccacActcclcaagclcagtgaatgtaLcccggca CgggtgggyccagagcctgtgatatcCcgaggtgggctcggcaggacacc Chi .w ^o em e 650 700 750 TNXB-H ...... TNXB-H - ...... - ...... - ...... - ...... - ...... - ...... g a c t g g - - ...... - ...... - ...... LIXA-B ...... tlX B -A ...... tn x a - y ...... TNXA-M ...... CONSENSUS ggggcgtggaagggggaagcgagcACCtgacCcagacagcgcgggagctc gcaggagtcacgaggccacagcgacitcaltgtct gaccgggcc CugaccLaiaaacitcccacctcagcctcgggccaagcctggaagataaa

800 850 900 TNXB-H ...... TNXB-M ...... - ...... - ...... - ...... B U B 'A ...... LIXA-B ...... TNXA-Y ...... - ...... TNXA-M ...... CONSENSUS a a t g g a g c a c c c c a l g g c g c c c c t c a c t c a g a t t c t c c c c t g g g c l t c t c ccacgcagCCCCAQAAGAOGACACACCAOCCCCAQAOTTAOCCCCAGAOJ CCCCIUAOCCTCCTXJAAnACCCCCOCCT/iCXJAGTCCTGACCCTrOACCOAC EXON J J {35B2i PEEDTPAPELAl'EA PBPPEEPRLOVLTVTD

(to be continued)

Fig. 2.7 Elucidation of the breakpoint region for TNXB-XA recombinant. 59

liiii Si:sis iiii iBBi ëiiiiiiiBB! iiiggii N

C o I

8 s :

0 : H" C "Hg " i 3C 5* 1 {Fig. 2.7 continued)

CUB*ALUA^B

TOXA-M ...... - ...... COI)SD)SUS tcacggyguyuACCtagcgcctcagccagcggtUtfUOJtgggcgagttog tggtgggcccggaggaAictgcagagcgacttccaLtcctggggaccaga ugaaaaggggiggiumgccigtgciggagcagmggcgagggggggactcg

2000 2050 2100 nOlB-HTOXB-H ...... 0a ...... BIXB-A ...... 0...... L U A -B ...... a ...... TTOA-Y ...... a ...... •mXA-M ...... 0...... CONSQJSUS c a o g g a g - a u c c t c c c t g c 'c c e t g c c t g c g i c a i l g t i c c t t g a c c c c t c tgcay'n>nU3Aiyk«XVcXXnUACCTCCAATrcAC7rGAAATCA0GUAGA CXrrZAUC^TAACXrirAACIWATWCCVCA^ATCWUGOC'QUACAGCTrc felYON J5 I3B00I VUKSPRDUOFSEIRET SAKVMHMPPPSHADSP

•nttBM ...... t...... t...... B U B 'A ...... o ...... L U A -B ...... - ...... t ...... TOXA-V ...... a ...... TTOCA-M ...... C...... o ...... CONSBISUH AAAifiv-^CTACCACCTOOCCXJACOCjAOgcggtgcctttgccatoigctc atcgccicgcat.ccccicccccccct(jcactct jcccaccctccagcogc cccggggLtcccLgggiaacccccgaiccccaa*gttCt.ca(jaoaACCCT XVSYgLADOO (}84H EXON 36 {3843} E P

- llOhiuirUtion : W hp tlxan . f A + M)hpIntwn 36(EIXB'A iu)ilTNXA) - 0 \ TNXB-H C7N TTfXB'H BIABA ...... C...... 0 ...... bUCA-B ...... T------A...... •nJXA-Y ...... c 0 ...... ■niXA-H ...... c — o ...... CONSENSUS CAOAOTCTrcX^tXTItXlA-OCXX-OCXXrCOQACCCACAAACTXXAOGOOCT OATCCCAOOCOCTCOCTATOAOQTOACan*OqTCTOOOTCCQAttX T r iU AOqAQA(?Ta*QCCTCTCACAOlX T r CCTCACCACOO0t g a g a t 0 B a c tg g gsVQVDGOARTQKLQGL IPOARYEVTVVSVROPE EBBPI. TOFLTT (3888} R

TOXB-M BLXB-A LIXA-B TYIXA-Y TTOCA-M CONSmsUS gaccoggggcaagaggcgggag-caagaaaacggcatgggtgggagttga gagagaacgaggagggigaaagggaggCggtggaggctccgattgcggac gggaggccagtggagtctgggyaggcacggagtagagagagcegcgggga

2600 2650 2700

BIXB-A ...... LIXA-B ...... TW A-Y A...... TTOCA-M ...... CONSENSUS cecttctgaycccctccccitcccccagT IC C T U A C C X ncC C A C A C A C nT GCGTT3CAC7xyCACTrGACCGAOCXJCTTOXXait;CTCCACT«3MGCCCC CWACJAAlXXTVIOaACACC-rATUACqTCCACXrr CACAÜCCCCT EXON 37 (3889} VPDGPTOL RALHLTEüPAVL HWKPP QNPVnTYDVgV TAP I T A

(to he continued) {Fig. 2 .7 continued)

II.

200 400 bp I I I

TNX,TNXB B HUB B AAAA A to CYF2IA II □

120 bp (Iclctioii 0\ «-J H mikpolnlof Gene DiipBcallon J Chi

(U) BP2 AAAA A BBB B B to CYP21B 10 20 kb RCCX modules I 1 1 I I

I. Bimodular (L-L) TNXB

I- C4A 21A C4B 21B 2. C4A 21A C4A 21B 3. C4A 21A C4B 2/A* ? 4. C4B 2IA C4B 2IB

n . Bimodular (L-S)

5. C4A 21A C4B 21B 6. C4A 21A C4A 21B 7. C4A 2IA C4B 2/A* ? 8. C4B 2IA C4B 2IB

in . M onom odular (L) RPl KT TNXB 9. C4A 21B 10. C4B 21B 11. C4A 2/A*

rV. Monomodular (S) RPl m : TNXB

V. B im odular (S-S) CEQZZZZKIz^ h

Fig. 2.8 RCCX modular variations in the human population. Monomodular and bimodular RCCX structures with 13 different haplotypes of RP, C4, CYP21 and TNX gene combinations are shown. The common haplotypes are in bold. Disease haplotypes associated with CAH are marked with asterisks. Abbreviations: 21 A, CYP21A; XA, TNXA. The presence of haplotypes 4 and 8 (indicated by question marks) has not been definitively demonstrated.

6 8 Family B

B1 a. RPl - C4B1 (S; Chl) - CYF21B - TNXB b. RPl - C4BI (S; Chl} - CYP21B - TNXB

B2 c. RPl - C4A3 (L; Rgl) - CYP21A - TNXA - RP2 - C4A3 (L; Rgl) - CYP21B - TNXB d. RPl - C4A3 (L; Rgl) - CYP21A - TNXA - RP2 - C4B1 {L; Chl) - CYP21B - TNXB

B3 a. RPl - C4B1 (S; Chl) - CYP21B - TNXB d. RPl - C4A3 (L; Rgl) - CYP21A - TNXA - RP2 - C4B1 (L; Chl) - CYP21B - TNXB

B4 a. RPl - C4B1 (S; Chl) - CYP21B - TNXB c. RPl - C4A3 (L; Rgl) - CYP21A - TNXA - RP2 - C4A3 (L; Rgl) - CYP21B - TNXB

Family E s E2 a. RPl - C4A3 (L; Rgl) - CYP21A - TNXA - RP2 - C4A3 (L; Rgl) -CYP21B - TNXB(2.5) b. RPl - C4A3 (L; Rgl) - CYP21A - TNXB* (2.4)

E3 c. RPl - C4A3 (L; Rgl) - CYP21A - TNXA - RP2 - C4B1 (L; Chl) - CYP21B - TNXB(2.5) d. RPl - C4A1 (L; Chl) - CYP21A - TNXB (2.5)

E1 b. RPl - C4A3 (L; Rgl) - CYP21A - TNXB* (2.4) d. RPl - C4A1 (L; Chl) - CYP21A - TNXB (2. 5)

(to be continued)

Table 2.1 RCCX Modular Structures of the B and E Families {Table 2.1 continued)

HLA haplotypes of Family E

E2 a. WLkA2B44DR4 b. HLAA3B35DRI

E3 c. HLA A29B44DRI4 cl. HLA A3 B47DR7

El b. HLAA3B35DRJ ^ d. HLA A3 B47 DR7 CHAPTERS

DETECTION OF Msc I POLYMORPHISM AND

THE CONCURRENT 120 bp DELETION IN TNXB/XA RECOMBINANT GENE

3.1 ABSTRACT

Four tandemly arranged genes in the human MHC complement gene cluster

(MCGC), serine/threonine kinase RP, complement C4, steroid 21-hydroxylase ÇYP21, and tenascin TNX are organized as a genetic unit designated as the RCCX module.

Malfunction of CYP21 leads to congenital adrenal hyperplasia (CAH). A rearranged monomodular RCCX with TNXB/XA and concurrent deletion of RP2-C4B-CYP21B was observed in a CAH patient. Nine informative sites were revealed to distinguish TNXA from TNXB, Two of the sites in TNXB can be recognized by the Msc I restriction enzyme. The conversion of TNXB specific sequences to TNXA specific sequences leads to the abolition of the Msc 1 site. When the nucleotide conversions are associated with the

71 120 bp deletion, i.e., the presence of TNXB/XA hybrid, it also indicates the deletion of

RP2-C4B-CYP21B genes. Molecular genetic techniques were developed to detect the presence of the TNXB/XA recombinants in the population. This method can provide an efficient approach to screen for the possible carrier of the CAH disease in the clinical study.

72 3.2 INTRODUCTION

In human MHC class IH region, genes located in between complement C2 and extracellular matrix tenascin X (TNX) are defined as the MHC complement gene cluster

(MCGC) (Yu, 1999). It is a highly variable genomic region that is characterized by polymorphisms, variations in gene size and gene number. Four tandemly arranged genes in this cluster, serine/threonine kinase complement C4, steroid 21-hydroxylase

CYP21, and tenascin TNX are organized as a genetic unit designated as the RCCX module. Combining the modular number variations (monomodular, bimodular, or rarely trimodular), together with the C4 gene size variations (long or short), there have been 13 different RCCX haplotypes observed. The diversities in the number and size of the

RCCX modules probably contribute to the genetic variability and instability of the MHC class in region (Yang et a i, 1998).

TNXB gene is 68.2 kb in size, consists 45 exons, and encodes a functional protein product (Bristow et a i, 1993; Rowen et ai, 1997a, b). TNXA is a partially duplicated gene segment that corresponds to intron 32 to exon 45 of TNXB. TNXA also acquires a

120 bp deletion at exon 36 - intron 36 that results in a frame shift mutation and premature termination of translation (Gitelman et al., 1992; S hen et al., 1994). In a congenital adrenal hyperplasia (CAH) patient El, an unusual chromosome rearrangement was identified. It contains a TNXB/XA hybrid gene with the 120 bp TNXB specific

73 sequence deleted. Sequence comparison of this hybrid gene with normal TNXA and

TNXB genes revealed nine single nucleotide changes or informative sites, together with the 120 bp TNXB specific sequence, which can differentiate TNXA from TNXB (Yang et a l, 1999).

The recombinant chromosome in El is resulted from the pairing and crossover between TNXA from a bimodular RCCX chromosome with the TNXB from a monomodular RCCX chromosome. It can lead to the formation of TNXB/XA hybrid with the deletion of the intervening RP2- C4B-CYP21B genes. Individual containing this recombinant chromosome is a CAH carrier (Yang et a l, 1999). The reciprocal product of the unequal crossover has TNXAJXB with two CYP21B genes. It was detected in the

JRA family L (Rupert et al, 1999). Reversed associations of the informative sites were observed at the TNXB/XA and TNXA/XB recombinants.

From the study of E l, it was found that TNXB/XA recombinant not only implies the abolition of the TNXB function, but also indicates the deletion of the C4B and

CYP21B genes, the latter of which may lead to the pathogenesis of CAH. Despite the importance, there has been no simple method to identify the recombination between

TNXA and TNXB. Therefore, molecular genetic technique was developed to detect the presence of the TNXB/XA recombinants with the deleterious 120 bp deletion in the population. This method can provide an efficient approach to screen for the possible carrier of the CAH disease in the clinical study.

74 3.3 EXPERIMENTAL PROCEDURES

Oligonucleotides — Oligonucleotides were synthesized by Life Technologies

(Grand Island, NY). For amplification of TNXB specific probe: XB-MS5, AGA TCA

GAG CCT GGC AGT GAT GGG; and XB-BK3, AAA CGC AGA GAA GGT GGG TTG

GTA.

Isolation o f Human Genomic DNA — Genomic DNAs were isolated following standard protocols from peripheral blood of normal individuals, congenital adrenal hyperplasia (GAH) patients, pauciarticular juvenile rheumatic arthritis (JRA) patients, and

IgA nephritis (IgA-N) patients. Appropriate consents from blood donors were obtained according to approved protocols by the Institutional Board of the Columbus Children's

Hospital.

DNA Probes — TNX-500 is a 500 bp fragment corresponding to exons 35 - 37 of

TNXA. It is amplified using primers RDX-5 and RDX-3 from cos 4A3. TNX-800 is an

800 bp TNXB specific fragment located immediately upstream of the TNXA/XB breakpoint.

It was amplified from cos 2 using primers XB-MS5 and XB-BK3.

75 Southern Blot Analysis — Ten micrograms of genomic DNA were digested to completion with Msc I or Taq I and resolved on 0.8% agarose gel. In a separate experiment, DNAs were subjected to Msc I - Pst I double digestion and resolved on 1.2% agarose gel. Afterward, they were blotted after standard procedures and hybridized with an appropriate [a-^”P] dCTP-labeled TNX probe.

76 3.4 RESULTS

Two nucleotide conversions at the TNXB informative sites lead to an Msc I RFLP

Among the nine TNX gene informative sites revealed from the previous study, two of them located at nucleotides 2268 and 2273, respectively, are present at less than

30 nucleotides away from the 5’ of the 120 bp TNXB specific sequence (Yang et al.,

1999). It was discovered that in TNXB, these two nucleotides are in the restriction enzyme Msc I recognition site: TGGCCA. However, in TNXA sequence, they are mutated to C and G respectively, and therefore abolished the Msc I site. Moreover, if these two conversions are conjugated with the 120 bp deletion (like the case in the CAH patient El), the abolition of the Msc I site can further serve as an indicator of the RP2-

C4B-CYP21B deletion.

Detailed sequence analysis of RCCX revealed two other Msc I sites adjacent to the one located at the TNX gene informative sites {Fig. 3.1). One is 3.1 kb upstream in

TNXB gene, which is 800 bp upstream of the TNXA/XB breakpoint. The other site is 3.2 kb downstream in the CYP21B gene. Therefore, it was predicted that in a geneomic

Southern analysis using Msc I enzyme for the digestion and a TNXB specific DNA probe, the normal TNXB gene can be represented by a 3.1 kb genomic fragment. On the other hand, if the Msc 1 site at the informative sites was abolished due to the two nucleotide conversions, a 6.3 kb Msc I genomic fragment will be detected.

77 This Msc I RFLP was investigated in 19 CAH patients, 40 IgA-N patients, 41

JRA patients and 14 normal individuals. The results are summarized in Table 3.1. Data from nineteen individuals were presented in Fig. 3.2. Among them, lane 1 represents a normal individual with standard TNXB gene. Lanes 2 and 3 are from CAH family: parent

E2 and patient El; lanes 4 - 8 are from five other CAH patients. Lanes 9 and 10 are from JRA family L: patient LI and grandparent L3 (Rupert et a i, 1999); lane 11 is from another JRA patient. Lanes 12 - 17 are from six IgA-N patients. In lanes 18 and 19, two healthy controls were included.

In lanes 1, 9 and 10, the 3.1 kb Msc I fragment representing the normal TNXB gene was observed. In lanes 2, 3, 6 — 8, 11, 13 — 17 and 19, two fragments with the sizes of 3.1 kb and 6.3 kb were detected. The 6.3 kb band indicates the mutation of the Msc I site at nt. 2268 - 2273. These twelve individuals are heterzygous for this TNXB gene mutation. There are four other individuals who are homozygous for the same mutation since they only exhibit the 6.3 kb Msc I fragment {lanes 4, 5, 12, and 18).

As shown in Fig. 3.2 and summarized in Table 3.1, there is an unexpectedly high frequency of the TNXB gene associated 6.3 kb Msc I fragment in both patient and normal population. In CAH patient E l, it has already been shown that the presence of 6.3 kb

Msc I fragment is accompanied by the 120 bp deletion in TNXB/XA recombinant (Yang et al., 1999). To determine whether the Msc I RFLP is a marker for the concurrent 120 bp deletion and the presence of the rearranged monomodular haplotype, the RCCX modular structures for two CAH patients, who have homozygous TNXB 6.3 kb Msc I fragment

{Fig. 3.2, lanes 4 and 5), were analyzed.

78 Taq I RFLP {Fig. 3.3) suggests that both patients are heterozygous for biomodular

RCCX: RP1-C4(L)-RP2-C4(L) and RP1-C4(L)-RP2-C4(S) {lanes 2 and 3). It is well known that CAH patients do not have normal CYP2I function (New, 1995; White et a i,

1994). In these two cases, although the 3.7 kb fragment representing CYP21B is detectable, the 3.2 kb fragment corresponding to CYP21A is three times more intense, indicating the presence of two CYP21A genes on one chromosome. From this experiment, it is concluded that the Msc I RFLP is not necessarily coupled to the presence of a rearranged monomodular haplotype with TNXB/XA recombinant.

Identification of the TNXB/XA hybrid gene

In order to distinguish the TNXB/XA recombinant containing the concurrent Msc I

RFLP and 120 bp deletion from the TNXB with only Msc I RFLP but no 120 bp deletion, an informative RFLP with Msc I - Pst I double digestion was specifically designed.

Sequence analysis shows that there are two Pst I sites located 263 bp upstream and 561 bp downstream of the Msc I (2268-2273) site {Fig. 3.4). In a Msc I - Pst I double digested genomic Southern blot analysis using TNX-500 as the probe, the normal TNXB gene is represented by two fragments of 561 bp and 263 bp, respectively. The TNXB with the Msc I site mutated can be revealed by a single 824 bp fragment. If there is the

120 bp deletion concurrent with the Msc I mutation (similar to the case in El), the sequence becomes a TNXB/XA hybrid and the 824 bp fragment should no longer be detected. In this case, 77VX4-like characters in between the two Pst I sites should be observed. The normal TNXA gene lacks the specific Msc I site but does contain the two

79 Pst I sites at the corresponding locations. Therefore, the TNXA gene or the TNXB/XA hybrid can be shown by a 704 bp fragment.

The nineteen individuals analyzed for the Msc I site mutation were again examined for the concurrent 120 bp deletion {Fig. 3.5). The individuals in lanes 1, 9, and

10 were previously shown to contain the normal TNXB sequence represented by the 3.1 kb Msc I fragment. This was confirmed by the presence of the 561 bp and 263 bp TNXB fragment in all three lanes. In addition, in lanes 1 and 10, the TNXA specific 704 bp fragment was also exhibited, and thus indicating that these two individuals have bimodular RCCX chromosmomes. All other individuals were previously demonstrated to be either heterozygous or homozygous for the mutation at the Msc I site. However, the

824 bp Msc I -P st I fragment corresponding to the mutation was missing in lanes 2, 3, 7,

14, and 17. It suggests that in these five individuals, the 120 bp TNXB specific sequence was concurrently deleted with the abolition of the Msc I site, and therefore, these individuals contain the TNXB/XA hybrid genes.

80 3.5 DISCUSSION

In this chapter, definitive RFLP has been designed for the detection of the Msc I site mutation in the TNXB gene. With another informative RFLP, it is also possible to conveniently identify the concurrent deletion of the 120 bp deletion in the TNXB/XA recombinant. In this study, five individuals were discovered to contain the abnormal

TNXB genes. Among them, CAH patient E l (Jane 2 in Figs. 3.2 and 3.4) and the patient’s father (lane 3 of Figs. 3.2 and 3.4) have been proven to comprise an unusual chromosome rearrangement resulting in a TNXB/XA hybrid gene and the deletion of the

RP2-C4B-CYP21B genes (Yang et a l, 1999). For the other three individuals newly identified in this study (lanes 7, 14 and 17), it is likely that they also possess the rearranged monomodular RCCX module with RP2-C4B-CYP21B genes deleted and a

TNXB/XA hybrid gene. Consequently, these people become carriers of the CAH disease.

If only the pseudogene CYP21A exists on the other chromosome 6, the person suffers

CAH. This can be demonstrated by the established case in E1 and a newly discovered patient presented in lane 7. Two other examples shown in lanes 14 and 17 do not manifest CAH phenotype, as they contain a functional CYP21B gene from the other chromosome 6. However, these two do suffer from IgA nephritis. IgA-N has been suggested to be associated with C4B deficiency (Abe et al., 1993; Welch et al., 1989). It is probably worthy of noting that at least one C4B is also possibly deleted in one of their

81 chromosomes, which bears the TNX hybrid gene resulting from the unequal crossover.

Whether this C4B deletion has a direct or indirect effect leading to the kidney disorder is yet to be determined. It has been suggested that malfunction of TNXB may be associated with a connective tissue disorder, Elner Danlo's syndrome (EDS) (Burch et al., 1997).

Therefore, these five individuals may also be carriers of the EDS.

One of the two nucleotide conversions in TNXB, T2268->C, is a silent mutation, while the other one A2273->G can lead to a Q R substitution. It is not clear whether the latter has any effect on the function of TNXB. As the abolition of the Msc I site caused by these changes was found not only in patients, but also in normal individuals, some of whom even are homozygous for the mutation, it appears that these two nucleotide changes do not cause any obviously identifiable disease. However, as it is already known that a pool of DNA mutations or allelic variants affects the expression/function of different genes. Individually, most mutations in the genome have mild, if not undetectable, effects, but in combination with "normal" alleles of other loci, they can produce a measurable phenotype (Vyse and Todd, 1996). Whether the single nucleotide polymophisms (SNPs) here in TNXB gene could have any potential damage in the whole genomic context needs to be identified. Obviously, the elucidation of the TNX function becomes extremely critical for the understanding of the possible role in disease association.

82 1 2 kb I .. I ■ I

21B (120 bp ) XA/XB

XB

M3.1 M6.3

U)00

Fig. 3.1 The Msc I map at TNXB and CYP21B gene region. The transcriptional

orientations are shown by horizontal arrows. The vertical arrows point out the 3' end of the

two genes. The duplication breakpoint of TNXA/TNXB is indicated by a dashed line. The 120

bp TNXB specific sequence is represented by a triangle. Msc I restriction sites are marked with

the polymorphic site outlined. Solid bar represents the location of the probe used. The sizes of

the Msc I restriction fragments are labeled below the map. • • • • • k b I 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

6.3 — *■

3.0 > m# wm mw, m oo4^

Fig. 3.2 Southern blot analysis to detect the nucleotide mutations at the Msc I site (nt. 2268 - 2273) in TNXB gene. Genomic DNAs were digested to completion with M.vc I and hybridized with the TNXB specific probe TNX-800. Lane 1, normal control with standard TNXA and TNXB gene; lane 2, father of CAH patient El ; lane 3, CAH patient El; lanes 4-8, CAH patients; lane 9, JRA patient LI; lane 10, grandfather of LI; lane 11, JRA patient; lanes 12-17, IgA-N patients; lanes 18 and 19, normal individuals. 7 .0 ÜfciêüÿlF jg — RP1-C4(L) 6 -0 î*W isf ». — RP2-C4(L) 5 .4 ^ u — RP2-C4(S)

W # - — CYP21B 3 .2 ___^ j ___ CYP2IA

2.5 ,. — TNXB 2 .4 ' ' — TNXA

Fig. 3.3 Southern blet analysis to determine the RCCX modular structures of CAH patients. Genomic DNAs were digested with Taq I and hybridized with RPl.l, CYP21 and TNX-500 probes. Lane 1, normal control with known homozygous RP1-C4(L)-RP2- C4(L); lanes 2 and 3, CAH patients with homozygous 6.3 kb Msc I TNXB fragment; lane 4, CAH patient El with heterozygous 6.3 / 3.1 kb Msc I TNXB fragments.

85 50 100 bp I I I

120 bp 1 T T" T

P i . 4 P. Pst I M, Msc I

263 1 561 normal XB; XA/XB 2 824 X B O n

3 704rA120') normal XA; XB/XA 263 X A (M +) 4 441 rA120')

Fig. 3.4 A schematic diagram to show variant TNX genes by Pst I — Msc I RFLP. The 120 bp TNXB specific region is indicated between two vertical arrows. The polymorphic Msc I site is outlined. Four different possibilities are shown with the sizes of the restriction fragments marked. XA/XB, TNXA/XB hybrid gene, gaining both Msc I site and the 120 bp sequence; XB (M“), TNXB gene losing the Msc I site; XB/XA, TNXB/XA hybrid gene, losing both Msc I site and the 120 bp sequence; XA (M"^, TNXA gene gaining the Msc I site.

86 # # e 6 7 8 9 10 11 12 13 14 15 16 17 18 19

# : m m ## ^ ^ m m

# W':' S3

Fig. 3.5 Southern blot analysis to detect the concurrent 120 bp deletion and Msc I RFLP in TJVXB genes. Genomic DNAs were subjected to Msc I - Pst I double digestion and hybridized with the TNX-500 probe. DNA samples are in the same order as described in Fig. 3.3. Number of Patients Disease total heterozygous homozygous 6.3/3.1 6.3/63

CAH 19 5 2

IgA-N 40 6 1

JRA 41 5 1 normal 14 3 1

Table 3.1 Detection of the 6.3 kb Msc I RFLP in TNXB genes

8 8 CHAPTER 4

FOUR UBIQUITOUSLY EXPRESSED GENES, RDSKI2W-DOM3Z-RP1,

ARE PRESENT IN THE CLASS ffl REGION OF THE MHC

4.1 ABSTRACT

The association of the MHC class III region with many diseases motivates the investigation of unidentified genes in the 30 kb segment between complement component genes Bfanà. C4. RD, which codes for a putative RNA binding protein, is 205 bp downstream of Bf. SKI2W, a DEVH-box gene probably involved in RNA tumover, is 171 bp downstream of RD. RPl is located 611 bp upstream of C4. The DNA sequence between human RD and R P l was determined and the exon-intron structure of SK12W elucidated.

SK12W consists of 28 exons. The putative RNA helicase domain of Ski2w is encoded by nine exons. Further analysis of the 2.5 kb intergenic sequence between SKI2W and RPl led to the discovery of DOM3Z. The full-length cDNA sequence of D0M 3Z encodes 396 amino acids with a leucine zipper motif. Dom3z-related proteins are present in simple

89 and complex eukaryotes. In C. elegans, Dom3z-reiated protein could be involved in the development of germ cells. Human RD-SKI2W and DOM3Z-RP1 are arranged as two

head-to-head orientated gene pairs with unmethylated CpG sequences at the common 5' regulatory region of each gene pair. The ubiquitous expression pattern suggested that these four genes are probably housekeeping genes.

90 4.2 INTRODUCTION

Many autoimmune, genetic and multifactorial disorders are associated with the dynamic genomic region of chromosome 6 known as the MHC (Porter, 1985b; Tiwari and Terasaki, 1985). As one of the most gene-dense regions in the human genome, identification of genes present in the MHC class m region and elucidation of their function are of profound interest. On the average, this region contains a gene in every

15 kb of genomic DNA (Aguado and Campbell, 1996) and the intergenic regions between adjacent genes are minimal in size. For examples, the complement component C2 gene and factor B gene are separated by only 421 bp (Wu et al, 1987), while the extracellular matrix protein tenascin-X (TNX) gene and the cytochrome P450

21-hydroxylase gene (CYP21) overlap at their 3' ends (Morel et ai, 1989). In the 30 kb genomic region between factor B (Bf) and complement C4, others have cloned RD downstream of the B f gene (Lévi-Strauss et ai, 1988); we have demonstrated the presence of SKI2W (Dangel et ai, 1995a) and RPl (Shen et ai, 1994) upstream of C4.

RP, complement C4, steroid CYP21, and tenascin TNX are duplicated together as genetic unit termed the RuCX module in 85.6% of the Caucasian population (Shen et ai, 1994; Yang, Z., Mendoza,A.R., Blanchong,C.A., Rennebohm, R. and Yu, C.Y., in preparation).

91 The RD gene is 6.7 kb in size, consists of eleven exons and is organized in a tail- to-tail configuration with Bf. The intergenic distance between the 3’ ends of RD and Bf is 205 nucleotides (Rowen et al., 1997b). The putative RD protein has 380 amino acids (Cheng era/., 1993; Speiser and White, 1989; Surowy er a/., 1990). It contains the RNA recognition motif (RRM) of the ribonucleoprotein (RNP) family that is also present in poly(A) binding proteins (Homstein et al., 1997) and in nucleolysin

(Kawakami et ai, 1992). However, the most striking feature of RD is the presence of twenty-four copies of alternate, positively and negatively charged residues, Arg-Asp

(RD) (Lévi-Strauss et al., 1988), which may be related to RNA binding. The cellular function of RD has not been determined.

Human SKI2W codes for a transcript of 4 kb in size that encodes a polypeptide of

1,246 amino acids. The Ski2w protein contains an RNA helicase domain with a

DEVH-box, two leucine zipper motifs that may be involved in protein-protein interaction, and an RGD motif that may be a ligand for cell adhesion molecules.

ATPase activities were demonstrated from fusion proteins containing human Ski2w

(Dangel et al., 1995a). Hydrolysis of ATP is required as an energy source for helicases to unwind helical structures of nucleic acids (Fuller-Pace, 1994). By indirect immunofluorescent experiments using Ski2w specific antibodies, it has been shown that Ski2w is present in the nucleolus and also in the cytoplasm (Qu et ai, 1998; Lee et ai, 1995). Immunoblot analysis further revealed that in the cytoplasm of HeLa cell extracts, Ski2w is likely associated with ribosomes and polyribosomes (Qu et ai,

1998). The human Ski2w protein shares striking and extensive sequence similarity to

92 the yeast antiviral protein Ski2p. The yeast Ski2p is involved in antiviral defense probably through its general role on the RNA tumover or the regulation of RNA degradation (Jacobs et al, 1998; Masison et al, 1995; Widner and Wickner, 1993).

The expression pattern and the localization to the nucleolus and polysomes would suggest that the human Ski2w shares similar functional properties with its yeast homolog.

The yeast SKI2 gene is intronless but the human SKI2W gene is split into multiple exons by introns. Elucidation of the human SKI2W gene structure is necessary for molecular genetic studies to investigate the mutation and polymorphism of this gene and its disease associations.

The R P l gene is about 11 kb in size, consisted of nine exons, and located 611 bp upstream of C4 (Shen et al., 1994). There is a bipartite nuclear localization signal close to the amino terminal of the protein sequence, suggesting RPl could be a nuclear protein. The primary structure of RP 1 does not show significant similarities to any known protein. Two different cDNAs, RPl and GII, have been reported. R Pl extends the reading frame of G il at the 5’ region. The larger isoform, RPl, encodes

364 amino acids (Shen et al., 1994). The smaller isoform, G il, encodes 254 or 258 amino acids (Sargent ef a/., 1994).

To study the genetics and functions of the MHC class HI genes, the DNA sequence between RD and RPl was completely determined. From this sequence the exon-intron structure of the SK12W gene is elucidated. Analysis of the 2.5 kb intergenic sequence between SK12W and RPl led to the discovery of D0M3Z, which has related genes in yeasts, a flowering plant and a nematode. RD-SK12W and DOM3Z-RP1 are arranged

93 as two head-to-head orientated gene pairs separated by only 59 nucleotides. The ubiquitous gene expression and structural features suggest that these four genes are probably housekeeping genes.

94 4.3 EXPERIMENTAL PROCEDURES

Sequence determination of the human SKI2W gene — Cos 3 A3, which contains the human C4A3 gene and about 20 kb of genomic DNA upstream of the C4 gene (Carroll et al., 1984; Yu, 1991), was digested with BamY{ I and subcloned into pBluescript vectors.

Subclones containing SKI2W genomic DNAs were screened with SKI2W cDNA probes

(Dangel et ai, 1995a). Four different subclones containing 3.0 kb, 3.0 kb, 2.2 kb, and 1.8 kb inserts were obtained. These clones were subjected to Exo DI digestion using the Erase- a-Base kit from Promega (Madison, WI). Using cos la DNA that overlaps with cos 3A3 by

6.5 kb at the 5' region (Carroll et al., 1984) as template, a 1.4 kb fragment between RD and

SKI2Wv/as generated by PCR and cloned into pCRC vector (Invitrogen, Carlsbad, CA).

For the PCR, one of the primers, RD5UT 5-GCA GCG ATA T TC ACG CTC TC-3', is located at the 5' untranslated region of the RD gene (Speiser and White, 1989), while the other primer from SKI2W, GW5 5'-TGT AAC CCA GTA TCT GGC CT-3', was derived from this study. Sanger's dideoxy sequence reactions were performed using ^^S-ATP and

Sequenase (United State Biochemicals, Cleveland, OH). Gel readings were compiled by

PC/GENE software (Intelligenetics, Mountain View, CA). Sequence analysis was performed by GCG software (Genetics Computer Group, 1991) from Pittsburgh

Supercomputing Center.

95 Sequencing of EST clones for human DOM3Z — The EST clone ID# 271616 was purchased from Genome System Inc (St. Louis, Missouri). The cDNA insert is present between the Not I and EcoR. I sites in a modified pT7T3 vector. The EST clone

HBC4394 was kindly provided by Dr. J. Takeda (Gunma University, Japan). The cDNA insert is located between the EcoK I and Xho I sites of pBluescript vector. Both clones were sequenced using a thermal sequenase sequencing kit (Amersham Pharmacia

Biotech, Arlington Heights, IL). The primers T7 and T3 were used to sequence across the cloning sites. Internal primers used for sequencing are f0,fl,f2,f3,f4, rO, rl, r3, and r4. Oligonucleotides were purchased from Life Technologies (Grand Island, NY).

Sequences of the primers used in this study are listed below. Sequences added to the primers to facilitate cloning are in lower case. fO: 5' aga gga tcc GAG AAG ACA GAG GTA GCT 3' fl: 5’CGA GGC CTG CGC TAG TAT AG 3' f2: 5' aga gga tcc GGG GAG GTG AGA AAA GTG 3' f3: 5’ GAG GTT ATG TAG ATG GGA TAG 3’ f4: 5' GAA GAT GTT TGA ATA TGT GAG G 3' rO: 5' aga gga tcc AAG GTG TTA GAA ATA TGG TTT 3' rl : 5' GTG TGG ACT TGA GTG AGG TAT 3' r2: S' GGT GAG GAG TTT TGT GAG GTG 3' ri: 5' GGA GGT GGT GGA GGG TTT GGT 3' r4: 5' TGG GAA GAG AAG AGA TGA AGG AG 3'

96 Determination of 5' cDNA sequences for human D0M3Z by 5’ RACE — Total

RNA was isolated from HeLa cells using RNAzol™ B (Tel-Test Inc, Friendswood, TX) following the manufacturer’s instruction. To obtain the 5' end of the DOM3Z cDNA, 5'

RACE was performed using a kit from Life Technologies. Briefly, the first strand cDNA was synthesized from total HeLa RNA using the rl primer. A homopolymeric tail was added to the 3' end. PCR amplification was performed using the dC-tailed cDNA, the nested r2 primer, and an anchor primer. PCR products were cloned into the pCR2.1 vector (Invitrogen). A 0.8 kb BamH I DNA fragment corresponding to the genomic region upstream of the clone 271616 sequence was used as a probe for colony hybridization. Clones containing 5' RACE products were sequenced by automated sequencing (ABI model 377, Foster City, CA) using T7 primer and M l3 reverse primer.

Sequence analyses — The DNA sequence for the intergenic region between RPl and SKI2W was submitted to NCBI (National Center for Biotechnology Information).

BLASTN program (Altschul et ai, 1990) was used to search the EST database for any related cDNA clones. Sequences of the DOM3Z EST clones and RACE products were analyzed using GCG software. Overlapping sequences were compiled to obtain the full length D0M 3Z cDNA. Dom3z amino acid sequence was deduced and used to search for related or homologous proteins using the BLASTP program (Altschul et al, 1990). The

Dom3z related protein sequences were aligned using PELEUP and PRETTY programs of the GCG package. The presence of structural motifs in the human Dom3z protein sequence was searched against PROS LI E database (Hofmann et al, 1999). To define the

97 gene structure for D0M3Z, FASTA program was used to compare the EST and RACE sequences with DOM3Z genomic DNA sequences, which correspond to genomic sequences between RPl (Shen et ai, 1994) and SKI2W (this work).

Pulsed-field gel electrophoresis (PFGE) — Human lymphoblastoid cells were embedded in low gelling temperature agarose, digested with II enzyme, resolved with Statagene rotaphor pulsed field gel machine, and blotted onto Hybond N membrane, as described previously (Yu et ai, 1993). The blot was hybridized to a complement C2 probe, PC201 (D'Eustachio et ai, 1986), a DOM3Z/RP1 5' probe (0.8 kb BamH I genomic fragment), and a 1.1 kb RPl cDNA probe, consecutively.

Southern blot analysis — Cos Ml A1 and cos M1B2 were screened from a

MOLT4 cosmid library (Yu et al., 1993) using a 2.2 kb SK12W cDNA probe (Dangel et al., 1995a). Cos KEMl (Carroll era/., 1985a) and co5 la were previously described

(Carroll er a/., 1984). Cosmid DNAs were digested with BamH I, processed after

Southern's protocol (Southern, 1975) and hybridized to a 1.7 kb SK12W cDNA probe.

Northern blot analysis — Multiple tissue Northern blots (MTN blots I and H,

Clontech, Palo Alto, CA) were hybridized with human RD, DOM3Z and RPl probes, separately. The RD probe was 1.4 kb full length cDNA generated by PCR. The D0M3Z probe was derived from the cDNA insert of EST clone 271616. The RPl probe is a 1.1 kb cDNA fragment (Shen et al, 1994).

98 4.4 RESULTS

Gene structure and polymorphism of human SKI2W

The complete DNA sequence for the human SKI2W gene has been determined by

sequencing genomic DNA from cos 3A3 and cos la (Carroll et a i, 1984). The exon-

intron structure of SKI2W was deduced by comparing genomic and cDNA sequences.

The human SKI2W gene spans 11 kb and contains 28 exons {Fig. 4.1 A). The intergenic

distance between SKI2W and the upstream RD gene is 171 bp. RD and SKI2W are

arranged in a head-to-head configuration.

The helicase domain with the putative double leucine zipper motifs is encoded by

exon 10 to exon 18. Apart from helicase boxes V and VI that are encoded by exon 18,

each of the other helicase boxes is mainly encoded by a separate exon. The helicase box

1 is coded by exon 10. The helicase box la with leucine zipper motif I is coded by exon

11. The helicase box H with leucine zipper motif II is coded by exon 12. Helicase

boxes ni and IV are encoded by exons 13 and 14, respectively. The three putative N-

linked glycosylation sites are encoded by exons 5, 14 and 22, respectively. The acidic

motif is encoded by exon 6. The RGD ceU adhesion motif is coded by exon 8. The two

consecutive sulfation sites are encoded by exon 20. Two Alu elements are present in the

SKI2W gene. Alu-C, flanked by aactatat direct repeats, is located in intron 17; Alu-], in

99 reverse orientation with respect to the SKI2W gene, is present in intron 18. This Alu

element is flanked by cctc direct repeats.

A comparison of the genomic and cDNA sequences of SKI2W revealed three

nucleotide variations. The first is a G to A transition at codon 151 (exon 5) that leads to

a Gin to Arg substitution. The second is a C to T transition at codon 1052 (exon 25) that

leads to a Leu to Phe substitution. Whether these substitutions affect the Ski2w protein

function remains to be determined. The third change is a T to C transition at codon

1,067 (exon 26) that is a silent mutation.

Southern blot analysis of cosmid clones of human SA72 Wgene isolated from

different genomic DNA libraries revealed that SKI2W gene is polymorphic. A Baniil I

restriction fragment length polymorphism (RFLP) was detected when a 1.7 kb SKI2 W

cDNA probe was used for hybridization. As shown in Fig. 4.1 B, besides a common 1.8

kb fragment that is present in all lanes, a 5.2 kb fragment was observed in cos M l A1

(lane 1) and in cos M1B2 (lane 2). However, a 2.2 kb fragment was detected in cos

KEMl (lane 3) (Carroll et ai, 1985) and in cos la (lane 4). This polymorphic BamYi I

site has been mapped to the intron 18 of the SKI2W gene, where the A/w-J element is

located.

Identification o f a new gene, D0M3Z, located in between SKI2W and RPl

To investigate whether the intergenic region between SKI2W and RPl contains an

unidentified gene, the 2.5 kb genomic sequence was submitted to NCBI to search the

EST database. Two cDNA clones, 271616 and HBC4394, were analyzed. The

1 0 0 published EST sequence is 359 bp for 271616 (Hillier et al., 1996), and 270 bp for

HBC4394 (Takeda, 1996). Subsequently, both clones were sequenced to completion.

For clone 271616, the cDNA insert is 1,227 bp in size (nucleotide number 160 - 1386,

Fig. 4.2), and contains a poly(A) tail sequence of 28 nucleotides. For clone HBC4394,

the cDNA insert is 1,315 bp in size (nucleotide number 60 - 1374). It extends clone

271616 cDNA sequence by 100 bp at the 5’ end, but is 12 bp shorter at the 3' end and no

poly(A) tail is present. Both clones contain the poly(A) signal at nucleotide number

1362 - 1367.

The cDNA inserts in the two EST clones contain a reading frame, with a stop

codon TAG at the nucleotide number 1318 - 1320. To obtain the full-length cDNA

sequence and to determine the open reading frame of the new gene DOM3Z, 5’ RACE

was performed using total RNA isolated from HeLa cells. The longest RACE product

extends the 5' end of the known cDNA sequence by 59 bp. Therefore, the composite,

full-length D0M3Z cDNA is 1386 bp in size.

Amino acid sequence encoded by D0M3Z cDNAs

Dom3z amino acid sequence was deduced from the full-length cDNA (Fig. 4.2).

Because of the in-frame stop codon at nucleotide 19-21, the ATG at the nucleotide 130 -

132 was assigned as the initiation codon. The Dom3z protein contains 396 amino acids,

of which 11.6% are proline residues. The predicted isoelectric point for Dom3z is 6.82.

The amino acid sequence was used to search for structural motifs against PROSITE

database. A leucine zipper motif is located from amino acids 103 to 124. Four potential

101 sites for N-myristoylation are located at amino acids 5, 9, 264, and 358, respectively.

There are four potential casein kinase H phosphrylation sites at positions 141, 163, 202, and 256, respectively. At amino acid number 47, there is a potential cAMP- and cGMP- dependent protein kinase phosphorylation site. In addition, six potential protein kinase

C phosphorylation sites are present at amino acids 6, 130, 256, 302, 308, and 394.

Three nucleotide changes were detected between the two DOM3Z cDNA clones.

The first variation is at nucleotide 211, which is an A to T transition that leads to the substitution from Thr to Ser at amino acid number 28. Another variation from His to

Gin at amino acid number 261 is caused by the C to A transition at nucleotide 912. The third nucleotide variation, A-> G at position 873, is a silent mutation.

DomSz-related proteins are present in simple and complex eukaryotes

BLASTP algorithm was used to search for related proteins of Dom3z in the national databases. Hypothetical proteins in Schizosaccharomyces pombe (GenBank accession no. 2440182), Saccharomyces cerevisiae (accession no. 1723981),

Caenorhabditis elegans (Dom3, accession no. 1706489), zs\d Arabdiopsis thaliana

(accession no. 2245120) all share significant sequence similarities and identities with human Dom3z {Table 4.1). Apart from the case in Arabdiopsis thaliana which has a much longer reading frame of 1,148 amino acids, the other Dom3z related proteins are similar in sizes with 393 amino acids for C. elegans, 387 amino acids for S. cerevisiae, and 352 amino acids for S. pombe. The overall sequence identities/similarities vary from 21.37 % / 45.96% between the human and baker's yeast Dom3z related proteins, to

1 0 2 30.3% / 53.33% between those of baker’s yeast and fission yeast. It is worthwhile to mention that the sequence similarity of the human Dom3z protein to the related protein in fission yeast (52.43%) is as close as that between the two yeast strains (53.33%).

An alignment of the five Dom3z related protein sequences is shown in Fig. 4.3. A consensus is deduced from conserved sequences from three or more species. Conserved sequences among all five sequences are highlighted in red and shaded in yellow. They probably represent relevant motifs important for protein structures and/or functions. The most notable conserved sequences are YWGYKFE at positions 231-237, EEYCSVVR at positions 266-273, GEVDC at positions 295-299, YVELKT at positions 334-339, and nVGFRDD at positions 373-380.

The exon-intron structure of human DOM3Z and the 5' region ofDOM3Z/RPI

The human D0M 3Z gene is present between SKI2W and R P l. The exon-intron structure of D 0M 3Z was deduced by comparing D0M 3Z cDNA sequence with the genomic sequences at the 3' region of SKI2W (this work) and the 5' region of RPl (Shen et ai, 1994). The DOM3Z gene contains 7 exons {Fig. 4.4). The first exon is 123 bp in size. It has an in-ffame stop codon at nucleotide 19-21 and therefore is untranslated.

The second exon contains six nucleotides of the remaining 5' untranslated sequence, and the coding sequence for the first 119 amino acids of the Dom3z protein. The putative leucine zipper motif is encoded by exons 2 and 3. The seventh exon encodes for the last

49 amino acids, two consecutive stop codons and an additional 56 nucleotides of the 3’ untranslated region. Stretches of T-rich sequences are present downstream of the

103 poIy(A) site, which is consistent with the 3' ends of mammalian genes (Keller, 1995;

Proudfoot, 1991). Downstream of the DOM3Z gene is the last (twenty-eighth) exon of

SKI2W. These two genes are arranged in tail-to-tail configuration and the intergenic region between them is only 59 bp.

Located at the 5' region of the DOM3Z gene is the RPl gene. DOM3Z and RPl are arranged in head-to-head configuration. The 5’ end of GlI, which is a smaller form for RPl, is 322 bp away firom the 5' end of DOM3Z. The 5’ end for the larger form of

RPl is located at the genomic region corresponding to intron 2 of DOM3Z. Therefore, the 5' regulatory regions of D0M 3Z and RPl probably overlap to some extent. There are no consensus TATA boxes but four SPl and four API sites for bindings of ubiquitous gene expression factors at the 5' regions for D0M3Z and RPl (Quandt et al,

1995).

The DNA sequence for the 5' region of DOM3Z and RPl is relatively rich in G-f-C content, with 61.3% of the nucleotides being G-f-C, in contrast to 41% normally present in a mammalian genome (Normore et al., 1976). In addition, there are numerous copies of CG dinucleotides, which are generally under-represented in the mammalian genome.

Hypomethylation of the CG sequences is an indication of active transcription (Bird,

1986, 1987). Méthylation of CG sequence may block the cleavage of some restriction enzymes that have CG dinucleotides in their recognition sequences (McClelland et al.,

1994). A n recognition site with two CG sequences, GCGCGC, is present 575 nucleotides upstream of the D0M 3Z {Fig. 4.4). Restriction mapping of the genomic

DNA between RD and C4 reveals that there are two consecutive H sites at the

104 intergenic region between SKI2W and RD, which is 13.6 kb away from the II site in the DOM3Z/RP1 complex. To investigate if these BjrjH II restriction sites are unmethylated and susceptible to enzyme cleavage, genomic DNAs isolated from two human lymphoblastoid cell lines with RCCX monomodular structures were digested with n, and resolved by PFGE for high molecular weight DNA. A 65 kb fragment was detected when the complement C2 probe (pC201) was employed (Fig. 4.5, lanes 1 and 2); a -14 kb fragment was detected when the DOM3ZJRP1 probe (B0.8) was used

{lanes 3 and 4); and a 70 kb fragment was detected when an RPl 3' probe (RPl.l) was applied {lanes 5 and 6). Detection of the 14 kb II fragment suggests that the n sites between the RP1/DOM3Z genes and between the RD/SKI2 W genes are probably unmethylated. The detection of the -70 kb fragments when C2 or RPl 3' probes were used suggested that the II sites in the C2 gene and in the CYP2I gene were methylated (and therefore resistant to BssH II cleavage) in the lymphoblastoid cells.

RD, SKI2 W, DOM3Z and RPl transcripts are ubiquitously expressed in human tissues

To study the tissue distribution patterns for transcripts of RD, SK12W, D0M 3Z and RPl, multiple tissue Northern blots were hybridized with appropriated probes {Fig.

4.6).

For RD {panel A), 1.4 kb transcripts were detected in all sixteen tissue samples.

Larger transcripts about 2.0 kb in size are also detectable. The testis {lane 4), heart {lane

9), placenta {lane 11) and pancreas {lane 16) expressed large quantities of RD

105 transcripts. The liver {lane 13) produced RD transcripts at a relatively lower level

similar to that of the lung {lane 12).

It has been shown that SKI2W transcripts of -4.0 in size were detectable in all

human cell lines and tissue samples tested (Dangel ef a/., 1995a, Qu er a/., 1998). As

shown in panel B, SKI2W is expressed in all nineteen tissue samples examined, which is

consistent with the suggestion that SKI2W is ubiquitously expressed (Qu et ai, 1998).

In addition to the 4.0 kb transcripts, slightly larger transcripts -4.2 kb in size were

present in some of the tissue samples, including thymus, ovary, small intestine, colon,

brain, placenta, liver, kidney and pancreas.

The size of the D 0M 3Z transcript is predominantly 1.5 kb but may vary from 1.4

kb to 2.5 kb. In the testis, liver and pancreas, smaller transcripts about 1 kb in size were

also observed. In addition, transcripts about 4 kb, 5 kb and 7 kb were also detectable in

the liver. The expression level for D0M 3Z is high in the testis and pancreas, and low in the lung.

For RPl, the predominant transcript is 1.4 kb in size that was detectable in all tissue RNAs. In addition, larger transcripts of 3 kb and 9 kb were present in many

tissues. Similar to the pattern observed for DOM3Z, the testis and pancreas exhibit higher expression levels of RPl, while low level of expression was observed in the lung.

Detection of multiple transcripts in most tissues was a common feature for the D0M3Z-

RPl gene pair.

106 4.5 DISCUSSION

Four ubiquitously expressed genes, RD, SKI2W, DOM3Z and R P l, are present

within the 30 kb genomic region between complement components factor B and C4

(Fig. 4.7). These four genes are arranged in two tandem gene pairs, each pair in a head-

to-head configuration. The extremely high gene density of the MHC class HI region can

be reflected by the very short intergenic distances, e.g., 171 bp between RD and SKI2W,

59 bp between SKI2W and DOM3Z, and the overlap of the 5’ regions of DOM3Z and

R P l, although these genes are arranged in opposite transcriptional orientations (Fig. 4.7

A). No TATA boxes are present but there are multiple SPl and API sites at the

intergenic and the 5' regulatory regions between RD and SKI2W, and between D0M 3Z

and R P l, which are both relatively GC-rich with multiple copies of CG sequences. The

CG sequences at the promoter regions of these gene pairs are probably unmethylated or

hypomethylated, because the 11 sites at these regions were cleavable to yield a 14

kb restriction fragment. These are hallmarks of ubiquitously expressed genes. Indeed,

Northern blot analysis showed that transcripts for all of these four genes are present in

all tissues tested. It would be of interest to investigate whether the expression of these four

genes are coordinated or controlled by similar trans-acting transcriptional factors.

Investigation on the bi-directionality of the 5' regulatory regions for RD-SK12W, and for

D0M3Z-RP1 would yield relevant information on the control of expression of these genes.

107 Pancreas is a tissue where high levels of expression are observed for all four genes.

Another major site of expression is the testis, where RD, D0M3Z, and RPl showed extraordinary high levels of expression. Transcripts from the liver for SKI2W, D0M3Z, and RPl all contain higher molecular weight isoforms. Partially spliced and multiple isoforms of DOM3Z transcripts are detectable in Northern blot analysis {Fig. 4.6 B) and in cDNA clones (Yang and Yu, 1999). The expression pattern for RPl and D0M3Z are very similar, with relatively higher levels in the testis, ovary and pancreas, and lower levels in the lung and peripheral blood lymphocytes {Fig. 4.6 B, C). It is also worthwhile to note that a targeted disruption of Bf'm mouse resulted in the loss of expression of two flanking genes, which are the upstream C2 gene and the downstream RD gene (Taylor et al.,

1998b). In the mouse MHC, C2 and Bfaie separated by 641 bp, while the 3' ends of Bf and

RD overlap by 2 bp (Rowen et ai, 1998; Yang and Yu, 1999). The very close gene placements in the MHC class III region would imply the presence of regulatory sequences of a gene in its neighboring genes. Therefore it is conceivable that the disruption of a gene in the MHC complement gene cluster would affect the expression of its neighbors.

Only the RPl has no related proteins discovered in lower eukaryotes.

Homologous or related proteins for RD, Ski2w, DomSz are present from simple eukaryotes such as yeasts to highly advanced species such as humans. DomSz related protein is also present in a flowering plant (A. thaliana). The conservation of these proteins implies that they may have very fundamental function in a biochemical pathway. Ski2w contains an RNA helicase domain and is associated with ribosomes (Qu etal., 1998). RD contains RNA binding motifs and 24 copies of Arg-Asp (RD). This gene

108 pair appears encoding proteins both involved in RNA metabolism.

Determination of the complete genomic sequence of the human SKI2W gene

provides the foundation for future genetic studies on the polymorphism and mutation of

the SKI2W gene. Unlike the yeast SKI2 gene that contains no introns, the human SKI2W

gene consists of 28 exons. The helicase domain is encoded by nine exons. The coding

of a functional protein domain by multiple exons is also exemplified by genes coding for

serine proteinases. For example, the serine proteinase domain of factor B is encoded by

eight exons (Campbell et ai, 1984).

Although the function of human DOMSZ is unknown, related genes for DOMSZ are

present in both simple and complex eukaryotes. In the nematode C. elegans, DOMS is

located downstream of the MES-S gene. The MES-S gene encodes for maternal effect

component required for normal postembryonic development of the germ line. Since MES-

3 and D0M3 belong to the same operon in C. elegans, their gene products may be

functionally related. It would be of interest to determine if human DomSz plays a role in growth and reproduction, particularly on the development of germ cells. The relatively high levels of transcripts detected in the testis and ovary are consistent with this notion.

The protein sequences for RD, Ski2w and DomSz all contain leucine zipper motifs

involved in protein interactions. The tight hnkage of RD, SKI2W, DOMSZ and RP, the ubiquitous gene expression pattern, and the potential to bind RNAs (for RD and Ski2w) imply that these proteins may have concerted functions, probably related to RNA translation and turnover. Genetic and functional studies of these four ubiquitously

109 expressed genes may help elucidate the molecular basis of disease associations with the

MHC complement gene cluster.

1 1 0 Fig. 4.1 Exon-intron structure and 5amH IRFLP of the human SA12IV gene.

A. Exon-intron structure and BamH. I restriction map. Exons of SKI2W are m filled boxes.

Exons coding for untranslated sequences are in empty boxes. Vertical arrows show the 3' end of genes. Horizontal arrows represent transcriptional start sites and orientations.

BamH I restriction sites and the sizes of restriction fragments are marked beneath exons.

The polymorphic BamH I site is marked with an asterisk. B. Southern blot analysis of

BamH. I digested cosmid DNAs M lA l Çlane 1), M1B2 {lane 2), KEM-1 {lane 3) and la

{lane 4). The probe used for hybridization is a 1.7 kb SKI2W cDNA fragment.

Ill I kb , A.

RD SKI2W ^ D0M3Z

1 2 .1 4 5 6 7 8 9 10 II 12 13 14 15 16 17 18 19 20 21 22 21 24 25 26 27 28 II mum I I I nil ' Cuz''' ' ' R O D 1 II III IV V V - VI I______Helicase domain______^

Restriction Map (kb)

B B B* B B

N) BssH II

W1.7

L/Z Leucine Zipper-like motifs B, BamU I B*, polymorphic BamH I site (to be continued)

Fig. 4.1 Exon-intron structure and BamY\. I RFLP of the human SK12W gene. {Fig. 4.1 continued)

B.

kb 1 2 3 4

5.2 —

2.2 —

1.8 —

113 Fig. 4.2 cDNA and am ino acid sequences of human DOMSZ. The first 59 bp of the cDNA was obtained by S' RACE. The remaining sequence was derived from EST clones

HBC 4394 and 271616. The initiation codon ATG and the poly(A) signal are underlined.

Asterisks indicate the positions of the stop codons. The leucine zipper motif is indicated with the Leu residues highlighted. The polymorphic sites (nt 211, 873, and 912) are shown with the polymorphic nucleotides above the nucleotide sequence, and the resulted amino acid changes below the amino acid sequence. (GenBank accession number:

AF059252).

114 5' RACE ^ ^ EST clone HBC4.W4

1 CCCGCCACTGTCGCTOAATOXTTGCATCATCGAAAGCAGAAAACCACTTTTGCATCCTTCGGCCTCTGGCGTGCCTGCCATGACQTCATAGCTCTGCGOA

EST clone 271616

1 0 0 GGTGOAAGTTGGGGAGCTTTOAGGACCTCATQGATCCCAGGGGOACCAAGAGAGOAGCTGAOAAGACAGAGGTAGCTGAGCCTCGGAACAAACTACCTCG MDPRGTKRGAEKTEVAEPRNKLPR 24 T 200 TCCAGCACCTACTCTGCCCACAGACCCTGCCCTCTACTCTGGGCCCTTTCCTTTCTACCGGCGCCCTTCGGAACTGGGCTGCTTCTCCCTGGATGCTCAA PAPTLPTDPALVSGPFPFYRRPSELGCFSLDAQ 57 S

3 0 0 CGCCAGTACCATGGAGATGCCCGAGCCCTGCGCTACTATAGCCCACCCCCCACTAACGGTCCAGGCCCCAACTTTGACCTCAGAGACGGATACCCGGATC RQYHGDARALRYYSPPPTNGPGPNFDLRDQyPDR 91

4 0 0 GATACCAGCCCCGGGACGAGGAGGTCCAGGAAAGGCTGGACCACCTGCTGTGCTGGCTCCTGGAACACCGAGGCCGGTTGGAGGGGGGTCCAGGCTGGCT yqprdeevqer | d h llcw 36EH r g r [!eq g p g w I| 1 2 4 Leucine Zipper MoiiJ

5 0 0 GGCAGAGGCCATAGTGACGTGGCGGGGGCACCTGACAAAACTGCTGACGACACCGTATGAGCGGCAGGAGGGCTGGCAGCTGGCAGCCTCCCGGTTCCAO AEAIVTWRGHLTKLUTTPYERQEGWQLAASRFQ 157

600 GGAACACTATACCTGAGTGAAGTGGAGACACCGAACGCTCGGGCCCAGAGGCTTGCTCGGCCACCGCTCCTCCGGOAGCTTATGTACATGGGATACAAAT GTLYLSEVETPNARAQRLARPPLLRELHYHGYKF191

70 0 TTGAGCAGTACATGTGTGCAGACAAACCTGGAAGCTCCCCAGACCCCTCTGGGGAGGTTAACACCAACGTGGCCTTCTGCTCTQTGCTACGCAGCCGCCT EQYMCADKPGSSPDPSGEVNTNVAFCSVLRSRL 2 2 4 a 800 GGGAAGCCACCCTCTGCTCTTCTCAGGGGAGGTAGACTGCACAGACCCCCAAGCCCCATCCACACAGCCCCCAACCTGCTATGTGGAGCTCAAGACCTCC GSHPLLFSGEVDCTDPQAPSTQPPTCYVELKTS 2 5 7

A 9 0 0 aaggagatgcacagccctggccaatggaggagtttctacagacacaagctcctgaaatggtgggctcagtcattcctcccaggggtcccgaatgttgttg KEMHSPGQWRSFYRHKLLKWWAQSFLPGVPNVVA291 Q

1000 CTGGCTTCCGTAACCCAGACGGTTTTGTCTCrrCCCTCAAGACCTTTCCTACCATGAAGATGTTTGAATATGTCAGGAATGACCOTGACGGCTGGAATCC GFRNPDGFVSSLKTFPTHKHFEYVRNDRDGWNP 3 2 4

1100 CTCTGTGTGCATGAACTTCTGTGCCGCCTTCCTTAGCTTTGCCCAGAGCACGGTTGTCCAGGATGACCCCAGGCTCGTTCATCTCTTCTCTTGGGAGCCT SVCMNFCAAFLSFAQSTVVODDPRLVHLFSWEP 3 5 7

1 200 GGCGGCCCAGTCACCGTGTCTGTACACCAAGATGCACCTTACGCCTTCCTGCCCATATGGTATGTGGAAGCTATGACTCAGGACCTCCCATCACCCCCCA OGPVTVSVHQDAPYAFLPIWYVEAMTQDLPSPPK391

1 300 AGACTCCCTCTCCCAAATAGTAATGCTTTAOAGGGAGGCAGTCATATCTCTGTGTGCAGATAATAAAAGCATATTTCTAAGAGOTT T P S P K * * 396

Fig. 4.2 cDNA and amino acid sequences of human DOMSZ. Fig. 4.3 An alignment of the amino acid sequences for Dom3z-reIated proteins from human {Hosa, accession no. AF059252), nematode (Cael, accession no. 1706489), fission yeast (Scpo, accession no. 2440182), baker's yeast (Sace, accession no. 1723981), and Arabdiopsis (Arth, accession no. 2245120). Gaps inserted into sequences are shown as dots; residues conserved among three or more species are in upper case and shown in the consensus. Conserved residues among all species are in red fonts and shaded in yellow. Unconserved residues are in lower case and shown as hyphens in the consensus.

Asterisks indicate the positions of stop codons.

116 1 n o Hoaa mdPrG tkrg .. . .aaktev aepcnklprP a P s .. .Ip td paLYsgpfp. .FyttPsE lg oPSLdaqRqy h.gDaraLrY YspPPtngpg pnFDLrdGYp Gael mshyggnPrG nsshgfgrkd ....f q g s d s klilpkltgqP IPnevqipEd nMlYetRnpp kPekqakfls eYcinyDRkl qlgtmrakkP hdklPpnnls..LDLgkGF. Scpo ...... ml ref sFYdvppahv ppvsePlEIa cYSLsrDRF.1 Ll.DdskLsY YypPPl ...... fsDLnCGFp Sace mgvsa nLFvkqRgst talkqPkEIg fYSrtkDeEy LlsDdtnLnY YylPda ...... eLDrkldLs Arch .ysipvlPaG wsndnhggrg gmgrgrwsng rggpqllprPgPypgr.ggn rggFggRyq.nYqrderfVs eLkI.skskEC L. .arkqtnF qeviylkecc fnYyifmvmv Consensus ------—P-G ------P -P— ------—Y " -R F—- P-EI " “Y'SL—DRE- L—D”’’"L"Y Y""PP—- - - ~~LDL““GF“

111 220 Hoaa dRYqPrDoav qERlDhllow LLahrgrlag g ...... p gWL aa alV TWR GhlTKlLttp YErqEgHq., LaaarfqGtl YLsEvatpna raqrlarppL Gael eCFdPkEg...... Dekilm Lhewlnqmap e q gRLKkvihea d W cWR G llTK Icstl YnkeEgWr.. IdaikrkGvI FLcEekteqa knrdlnqTDL Scpo nRFhPpk... sDpdplslVk dv ...... Im t Kglqmns s f l ____ TWR GllTKIHcap l.DprnhWety LvmdptsGil raMeE ...... rtrseT sy Sace sgFqkfkdyy kDFeDrcslr gL...... le tie s s eRliKgkkina d l l ____ TFR GiarKllsca FDspsfntvd LrivsfnGql FlkEvpeavn aakassaTEa Arch CRFellDlns hiWalralVc Fhavleqkcf rwccfilwye vRMyicpfhc qflclkeTFR nnlnKILgaa YnrhEpWe.. MgvhkrnGCI YLdvhklper pqsDL Consensus -RF-P-D ~D—D V- LL ------RLK------1 TWR G—TKII. YD--E-W L------G-I -L-E------TDL

221 330 Hoaa IR ...... aLm YkOYKFEqym caOk...... pgaapDpag aVntnvaFCS VlRarLG ...... ahp LLFaGEVDCt ...... Dpqapatqp Gael dk ...... RMs YWGhKFEqyv TLDd...... DaeqpDcss pVCCkEEYav VfRndLqCdq rln p ek rslg IFysGEVDCl ...... DrhgN ------Scpo a . . .nqdRMc YWGYKFEais TLpeindacs rDqleqrdnq dVvpdEqYCS IVklnlG . ksk LlLaGEVDCl wdkkpcsake sdvhsddgCleEdasNaenp Sace gRninqdlnv FCGYKFECla TLsnplqyCp rEvlekrckr IVshgDEYiS WRcgvG . nek LlI.gaEVDCi fd fk ...... eNgrdn Arch dR ...... Rrc YWGYcFEala TeDp ...... graygEelh hVdanvEFCS WRckLG...... ahr vMMgaEYnDCc ...... D ...vsdkg Consensus -R RM- YWGYKFE TLD ------D D -V EEYCS WR--LG ------L-h-GEVDC------D N------

331 n o Hoaa ptoYVEIiKTs Kemqapgqwr aFyrh.KLLK WWaQSFLpGVPnWaGFRnp dGfvaaLktF plSnkmfayvr n. .dRdgWnp svemnFcaaf LsFaqstVvqDdprlvhlfs Gael ...mlELKTq K gelghg... .FykysKsFK WWlQSsEvnV dhllVGl.RCq eGhvksLCsL rT rE I p..qRasWnf ragfeFlsC l FCYlhnclek Dgdacvleyr Scpo nlllYVELKTs Kkyp.. leny gmrk..KI.LK YWaQSFLlGI grIIIGFRDD nGlLleMkeh fThqlpkmlr pyfkpndWCp ....n R llv v LehaLewIkq cvkqhppsce Sace IkhYaELKcC qqvanlsdch kFer..KLFr cWlQcFLvGIPrIIyGFkDD hyvLkcveeF sTeEVpvllk nnnpqvgsac lealkW ygll ceWlEkmlpr Dedphsqlra Arch krfYVELKTC r e ...... IQSFvaGV PylWGFRDD gGrLvtCerl, CTrDI a. .hRarlkn ywqvsvissh Yrhe* ...... Consensus YVELKT- K ------F KL-K WW-QSFL-GV P-Î2VGFRDI1 -G-L--L—L -T -E I ------R--W------F------L L --I— 0 ------

441 484 Hoaa Hepggpvtva vhqda.,pya flplwYvEam tqdlpappkt papk Gael seMglhkqlq MrrvpVdetd fvpneFlEkf c" ...... Scpo FcLsyCggsk L v lrq ll* ...... Sace FkLvE.ennh Lrlseleesd eeysgLlDge hllsnqfkew rksl Arch ...... Consensus ------L 1------— — — — — — — g— — ——— — — — — — —------

Fig. 4.3 An alignment of the amino acid sequences for Dom3z-related proteins. 200 400 bp

RPl/Gll D0M3Z SKI2W

'■ CpG islands

BssH II

Fig. 4.4 Exon-intron structure of the human DOMSZ gene. Exons coding for protein sequences are in shaded boxes. Exons coding for 5' or 3' untranslated regions are in empty boxes. Transcriptional orientations are marked by horizontal arrows. An inverted, vertical arrow indicates the position of the 3' end of the SKI2W gene. The transcriptional orientations of R P l/G ll and SKI2W are opposite to that of DOMSZ. 1 2 3 4 5 6

kb

115 105- 80- 70-

20 —

Fig. 4.5 PFGE analysis of the RD-SKI2W-DOM3Z-RP1 complex. Genomic DNAs from two individuals with homozygous, monomodular RCCX structure were digested with RjjH n, resolved by PFGE and processed after Southern's procedure. Probes used are; lanes 1 and 2, complement C2 (pC201); lanes 3 and 4, 0.8 kb BamH I fragment corresponding to the S' regions of the RPl and DOMSZ gene (B0.8); lanes 5 and 6, RPl cDNA corresponding to the 3’ region of the RPl gene (RPl.l).

119 X 3 A. 73 o 2 c 3 « O c P o C 3 c o O E u = .5 o BO h c '2 ■•= S" 73 c o .3 i l o g CLJl cc IZ c . < < kb 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ?:lz

4.4 _

2-4 _

1.35 t-#:#

B. 9.5 7.5 :

4.4 . # 2.4 .

1.35.

9.5 7.5 :

4.4 .

2.4 .

1.35

D.

Fig. 4.6 Northern blet analyses of human multiple tissue RNAs hybridized to (A) RD , (B) SKI2W, (C) D0M 3Z and (D) R Pl probes.

120 10 20 kb to HLA Class I to HLA class 11 L - L - 1 .,.1 . ,1 92 kb to HSP70 350 kb to HLA DRr g A. u C2 B / RD RPl C4A XA C4B TNXB

Intergenic distance ^ ^ ^ ^ ^ ^ I I (bp) 421 205 171 59 322* 611 3 kb 611 3 kb •

B. I s ) II - % C2 B f R P l C4 TNXB ...I ___ T /i,v Bs Bs Bs Bs

Porbcs I I B s, B,v.ïH II site I

Fig. 4.7 A molecular map of the HLA class 111 region with (A) RP, complement C4, steroid CYP21, and tenascin T~NX (RCCX) bimodular structure and (B) RCCX monomodular structure. Transcriptional orientations are indicated by horizontal arrows; pseudogenes or partially duplicated gene segments are shaded. S. pombe C. elegans A. thaliana S. cerevisiae

H. sapiens 30.10/52.43 28.85/53.50 28.19/50.00 21.37/45.96

S. pombe - 25.63/47.15 29.37/58.33 30.30/53.33

C. elegans -- 24.26/47.87 23.88/48.06

A. thaliana - - - 25.89/46.45 w

Table 4.1 % Sequence Identities / Similarities among Dom3z Related Proteins CHAPTERS

BIOCHEMICAL CHARACTERIZATION OF HUMAN Dom3z

5.1 ABSTRACT

Human DOMSZ is the most recently identified member of the RD-SKI2 W-

DOM3Z-RPI gene cluster located between complement 5 / and C4 genes in the MHC class m region. It has related proteins in lower eukaryotes including baker's yeast, fission yeast, a nematode, and even a flowering plant. The function of human DomSz has not been determined. Preliminary studies have been conducted on expressing and purifying bacterial fusion proteins, generating polyclonal antibodies and establishing long-term mammalian transfectant cell lines. It has been shown by indirect immunofluorescence using DomSz antisera, and by green fluorescent protein technique, that human DomSz is mainly a nuclear protein. In immunoblot analysis using the polyclonal antibodies against the DomSz fusion proteins, a polypeptide with the apparent size about 67 kDa was detected in several tissue samples and cell lysates of culture human cell lines.

123 5.2 INTRODUCTION

As described in the previous chapter, a novel human gene DOMSZ was discovered by combining the conventional bench work with the state-of-art bioinformatics database searching. Part of the credit in this gene identification process should be given to the Human Genome Project, especially the EST databases. The

DOMSZ gene is located in one of the most gene dense region in the human genome, the

MHC class m region. It belongs to a cluster of four ubiquitously expressed genes RD-

SKI2W-D0MSZ-RP1. These four genes are arranged as two head-to-head orientated gene pairs (Yang et al., 1998).

RD encodes for a putative RNA binding protein with 380 amino acids (Cheng et at., 1993). The cellular function of RD has not been determined. However, RD protein contains the RNA recognition motif (RRM) of the ribonucleoprotein (RNP) family

(Surowy et al., 1990). Moreover, it shares amino acid sequence similarity with the U1 snRNP 70 kDa protein (Speiser and White, 1989). The presence of the RNP-Iike domain and the sequence similarity with the U1 snRNP suggest a possible function related to the

RNA splicing (Taylor et al., 1998b). The biochemical and cellular characterization of

Ski2w has been extensively discussed in the last chapter. RPl does not have any homolog in lower eukaryotes (Shen et a l, 1994). It shares limited sequence similarity with a number of different protein families, with the tyrosine kinase transforming protein

124 from Fujinami virus being the most significant one (18.7% identity, 32.5% similarity over the C-terminal 157 amino acids) (Sargent et ai, 1994). In many cases, the homology is indicative of the protein function, and indeed RPl was recently suggested to be a nuclear serine / threonine protein kinase (Gomez-Escobar et al., 1998).

As a brand new member in the four-gene cluster, the function of human DomSz has not been determined yet. However, it is already know through sequence alignment that human DomSz has related proteins in both simple and complex eukaryotes including the baker’s yeast, fission yeast, a nematode, and even a flowering plant. The conservation of these proteins implies that they have very fundamental function in an organism (Yang et al., 1998). In C. elegans, Dom-S is possibly involved in the postembryonic development of the germ line cells (Paulsen et al., 1995). It is possible that human DomSz also plays a role in growth and reproduction. Studies in rat have revealed that deletions of this region are associated with increased perinatal mortality, decreased body size, male infertility, and reduced female fertility (Kunz et al., 1980).

Taken all results together, they indicate that the MHC class EQ region may indeed play an essential role in growth, development, and reproduction, and DOMSZ may be the primary candidate for this function.

To examine the biochemical and cellular functions of human DomSz, bacterial fusion proteins for DomSz were expressed and purified. Antisera against these DomSz fusion proteins were generated. Short-term and long-term transfected mammalian cell lines of human DomSz were established. It was determined using exogenous GFP-

DomSz protein and confirmed by the indirect immunoflorescence of endogenous protein

125 that human DomSz is mainly a nuclear protein. Immunochemical studies have shown that endogenous DomSz protein is expressed in different mammalian cell lines.

126 5.3 EXPERIMENTAL PROCEDURES

Oligonucleotides — Custom primers were synthesized by Life Technologies

(Grand Island, NY). Human D0M 3Z gene specific primers^, rO, r2, and r3 were described in (Yang et al., 1998). GPP 5’ primer was located at 218 bp upstream of the

Bam HI site in the mammalian expression vector pEGFP-Cl (Clontech, Palo Alto, CA).

It was designed for amplifying the G¥?-DOM3Z fusion gene. The sequence o f GPP 5’ primer is: S' CCG ACC ACT ACC AGC AGA ACA C 3'.

PCR of human DOM3Z cDNA — Primers fO and rO were used to amplify the near full-length human D0M 3Z cDNA from the EST clone ED #271616. The Expand™ high fidelity PCR system (Roche Molecular Biochemicals, Indianapolis, IN) was utilized to ensure the accuracy of the sequence. PCR products were digested with Bam HI and purified using PCR purification kit (Qiagen Inc., Valencia, CA).

Cloning of D0M3Z cDNA into bacterial and mammalian expression systems —

The purified DOM3Z cDNA PCR product was ligated in frame with bacterial expression pETa Bam HI vector (Novagen, Madison, WI), pGEX(2T) Bam HI vector (Amersham

Pharmacia Biotech, Piscataway, NJ), and mammalian expression pEGPP-C 1 Bam HI vector, respectively. D0M 3Z cDNA probe (as described in the previous chapter) was

127 used for colony hybridization to screen for positive clones with the insert, which were confirmed by Bam HI digestion. Afterwards, the correct configured cloning products were identified using appropriate restriction enzyme digestions. The precision of the ligations at the 5’ cloning sites was determined by automated sequencing using the r3 primer.

Expression and purification ofDomSz bacterial fusion proteins — Bacterial fusion proteins Trx-His-Dom3z (from pETa-DOMiZ) and GST-Dom3z (from pGEX(2T)-D(9MiZ) were induced following the standard protocols at 30° or 37° using

0.05, 0.5, or 1 mM IPTG for 30, 60, 90 or 120 min, respectively. Harvested cell pellets were resuspended in PBS and sonicated. The supernatant and the inclusion bodies were separated after centrifugation and analyzed in 10% SDS-PAGE gel. Electroelution was applied to purify Trx-His-Dom3z directly from the gel. The appropriate protein band was excised and placed in a dialysis bag with buffer (0.2 M Tris-acetate, pH 7.4, 1% SDS, 10 mM DTT) and then electroeluted in a gel machine containing 50 mM Tris-acetate, pH

7.4, 0.1% SDS at 4°, 100 V for 3 hours. GST-Dom3z was purified using Sepharose 4B beads (Amersham Pharmacia Biotech). Briefly, French Press was applied to lyse the cells. Soluble proteins were recovered from the supernatant and incubated with the

Sepharose 4B beads. The bound fusion protein was eluted from the beads using 10 mM glutathione.

128 Generation and purification of antisera against human Dom3z bacterial fusion proteins — Purified bacterial fusion proteins Trx-His-Dom3z and GST-Dom3z were used as antigen to immunize rabbits, respectively. For each rabbit, about 50 pg of fusion protein was primarily injected into multiple subcutaneous sites, followed by four more boost injections, each time using about 20 to 10 pg antigen, with the dose gradually reduced. The rabbits were bled after the second boost, and the antisera were examined using ELISA. The specificities and avidities of the antisera were again checked after the terminal bleeding. The antisera were subjected to Immoblized E. coli lysate Kit (Pierce,

Rockford, IL) to remove the part cross-reacting with bacterial proteins for any further immuno analysis.

Transient expression of GFP-Dom3z in HeÎM cells — HeLa cells were cultured on coverslips until -50% confluent, transfected with GPP vector only and GFP-DOMiZ, respectively, using lipofectin reagent (Life Technologies). Seventy-two hours after the transfection, the green fluoresence in cells was observed under a UV microscope with appropriate fluorochome-specific filter.

Development of long-term GFP-D0M3Z transfected mammalian cell lines —

GFP-D0M3Z was digested with EcoR 1 and purified. HeLa and CHO cells were transfected with the linearized DNA using electroporation method (Ausubel et a i, 1999).

The G418 selection began 48 hrs after the transfection. The concentration of G418 starts

129 at 500 i^g/ml, and was gradually increased to 800 fag/ml within two weeks time. The stable transfected cell lines were established by limiting dilutions and clonal expansions.

Isolation of DNA, RNAfrom mammalian cells — Genomic DNAs were isolated from normal HeLa and CHO cells as well as from the stable GFŸ-D0M3Z transfectants using DNA isolation kit (Centra System, Inc., Minneapolis, MN). Total RNA were isolated using RNAzol® kit (Tel-Test, Inc., Friendswood, TX) after the manufacturer’s instructions.

Extraction of protein lysates from mammalian cell lines — Cells were lysed using

20 mM Tris, pH 7.4, 150 mM NaCl and 0.1% SDS, followed by sonication. The total cellular proteins were obtained from the supernatant after centrifugation. Cell lines used include: 293, kidney; Molt4, lymphoblastoid; HT 29, colon carcinoma; IMR32, neuroblastoma; SKNSH, neuroblastoma; Raji, Burkitt lymphoma; lung carcinoma.

Southern blot analysis — Ten micrograms of genomic DNA were digested to completion with restriction enzyme Acc I, resolved on a 0.8% agarose gel, blotted onto

Hybond-N membrane and hybridized with the human D0M 3Z cDNA probe.

Reverse transcription (RT) PCR — The RT-PCR experiments were executed using

GeneAmp RNA PCR kit (Perkin Elmer Cetus, Norwalk, CT) after the manufacturer's protocol. Specifically, random hexamer was used as the downstream primer for

130 synthesizing the first strand cDNA in reverse transcription reaction. The subsequent PCR reactions were carried out using the GFP-5 ’ primer and the DOM3Z r2 primer.

Indirect immunofluorescence experiment — HeLa cells were cultured on coverslips until -70% confluent followed by an additional 3 hours incubation in fresh medium.

Cells were then rinsed with PBS, fixed with 4% paraformaldehyde and permeated with methanol at —20°. After blocked using 10% goat serum, cells were incubated with primary antibodies against DomSz (1:50, 1:100, and 1:200 dilutions) at 4° overnight. The secondary antibody used was 1:80 diluted goat anti-rabbit IgG conjugated with FTTC

(Sigma Chemical Company, St. Louis, MO). Fluorescence was observed using a UV microscope.

Immunoblot analysis — Total cell lysates from bacteria and manmialian cells were resolved in 10% SDS-PAGE gel, and transferred onto nitrocellulose membranes by electroblotting. After blocked using 5% non-fat milk in PBS, the membranes were incubated with the primary antibodies with the appropriate dilutions for 1 hour at room temperature. GPP monoclonal antibody was purchased from Clontech (#8362-1). The signals were detected by chemiluminescence method using horseradish peroxidase (HRP) conjugated secondary antibodies and ECL-plus reagents (Amersham Pharmacia Biotech).

131 5.4 RESULTS

Cloning o f human D0M 3Z into bacterial and mammalian expression vectors

The 1.2 kb human DOM3Z cDNA fragment was amplified and digested with Bam

HI as the two primers used,^ and rO, both have Bam HI sites incorporated to facilitate cloning. The PCR fragment was cloned into bacterial pETa Bam HI vector, pGEX(2T)

Bam HI vector, and mammalian pEGFP-C 1 Bam HI vector. The scheme to determine the presence and orientation of the insert in each vector were analyzed in Fig. 5.1, panels I- m. The results for restriction digestion of clones containing DOM3Z cDNA insert were shown in panel IV.

In panel IV-A, the presence of D0M3Z in pETa vector in two clones was demonstrated by the cloning enzyme Bam HI digestion Canes 1 and 2), which could release the insert. Meanwhile, since both clones exhibit the 7 kb Nco I fragment {lanes 3 and 4) as analyzed in panel I-B, they were therefore concluded to both contain the correctly configured pETa-Z)C>M5Z. Similarly, Bam HI was also used to examine the presence of DOM3Z in pGEX {lanes 5 and 6) and pEGFP {lanes 9 and 10) respectively.

For these two constructs, Sma I digest was chosen to determine the orientation of the insert. The insert in pGEX-D0M3Z clones are correctly configured as both exhibit 5.2 kb and 960 bp Sma I fragments {lanes 1 and 8), which are consistent with the analysis in panel II-B. One clone contains the correctly constructed GFP-D0M3Z as it shows the

132 5.7 kb fragment {lane 11) predicted in panel IQ-B; while the insert in the other clone was in the opposite direction with the 5.1 kb and -800 bp Srna I fragments {lane 12) as analyzed in panel IH-C.

Expression and purification ofTrx-His-DomSz fusion protein

Bacteria transformed with correctly configured pETa-DOMiZ were induced with

IPTG to express the fusion protein Trx-His-Dom3z. The induced protein can be easily observed in the 10% SDS-PAGE gel at the apparent size of around 65 kDa {Fig. 5.2, panel I). Since the calculated fusion protein size is about 54 kDa, it becomes necessary to prove that the induced protein with the apparent size of 65 kDa is indeed the Trx-His-

Dom3z. In immunoblot analysis, the induced 65 kDa protein was detected using polyclonal antibody against GST-DomSz {Fig. 5.2, panel II). The antibody used should only react with the DomSz polypeptide because it was raised against a different fusion protein and also the minimal cross-reactions with bacterial proteins were removed through purification. Therefore, the 65 kDa protein was confirmed to be the Dom3z fusion protein. It was speculated that besides the possible inaccuracy of the protein marker, the discrepancy between the observed and calculated protein sizes may be due to the structural reason, since the Dom3z protein is relatively proline-rich (Yang et a l,

1998). The ring structures of the proline residues may retard the mobility of the Dom3z fusion protein and cause the appearance of a larger protein.

It appears that the induced Trx-His-Dom3z protein mainly exists in the peUet of the bacterial cell lysate. To obtain soluble fusion protein of Dom3z, different

133 combinations of IPTG concentration (1, 0.5, or 0.05 mM), inducing duration time ( 120,

90, 60, or 30 minutes) and temperature (37°, 30°, or room temperature) were tried to optimize the expression. Even under 0.05 mM IPTG, at room temperature for 30 minutes, the induced protein already became insoluble and could only be detected in the inclusion bodies. The insolubility of the fusion protein makes its purification by affinity chromatography very difficult. Therefore, Trx-His-Dom3z was purified directly from

SDS-PAGE gel using electroelution method. The eluted protein was shown to be the main component in the Coomassie blue stained SDS-PAGE gel {Fig. 5.2, panel I, lane 5).

Expression and purification ofGST-DomSz fusion protein

Bacteria with the correctly orientated pGEX(2T)-DDMJZ were induced to express the GST-Dom3z fusion protein using IPTG. The apparent size of the induced protein in SDS-PAGE gel is about 67 kDa {Fig. 5.3, panel I). By means of immunoblot analysis using antibody against Trx-His-Dom3z, the 67 kDa induced protein was proved to be indeed the Dom3z fusion protein {Fig. 5.3, panel II).

GST-Dom3z was also mainly present in the inclusion bodies under conventional lysis and sonication procedure. One possibility leading to the protein precipitation is that during the isolation process, the protein may be denatured because of harsh temperature and mechanical effect. To avoid this problem, harvested cells with the exogenously expressed GST-Dom3z protein were subjected to French Press. The solubility of the fusion protein was indeed significantly improved. The soluble fusion protein was purified using Sepharose-4B beads and eluted with glutathione {Fig. 5.3, panel I, lane 5).

134 Nuclear localization ofDomSz

To determine the cellular localization of DomSz, HeLa cells transiently transfected with GFP-DOM3Z were observed under a UV microscope. It was shown clearly that the fusion protein was mainly localized in cell nuclei (Fig. 5.4). Control experiment using cells transfected with GFP vector only shows that GFP is expressed in cytoplasm as well as in nuclei. It has been reasoned that the fluorescence of GFP is observed throughout the cell because the protein can freely pass the nuclear pore due to its small size (27 kDa).

To further substantiate the nuclear localization of DomSz, indirect immunofluoresence microscopy was performed. Using the purified antisera against Trx-

His-DomSz as the primary antibody, the endogenous DomSz protein was detected mainly in HeLa cell nuclei (Fig. 5.5), which is consistent with the previous observation through the GFP experiment.

Establishment of stable GFP-DOM3Z transfectant cell lines

Stable GFP-DomSz transfected HeLa and CHO cell lines were established under

G418 selection by limiting dilutions and clonal expansions. By genomic Southern blot analysis (Fig. 5.6), it was demonstrated that GFP-DOM3Z construct has been incorporated into all nine CHO (lanes 1-9) and nine HeLa (lanes 10-18) cell lines with possibly different copy numbers. Lane 19 is the genomic DNA from untransfected HeLa cells, it clearly does not have the G¥?-DOM3Z construct.

135 To examine the expression of the exogenous GYŸ-D0M3Z, RT-PCR was specifically designed using the upstream GFP 5' primer in the GFP vector while the downstream r2 primer in the DOMSZ cDNA. As shown in Fig. 5.7, all except one HeLa cell line {panel I, lane 9) exhibit a product at 560 bp as expected. It was therefore concluded the exogenous DOMSZ gene was expressed in these cell lines.

The GFP-Dom3z fusion protein expression in stable CHO transfectants was detected by immunoblot analysis using commercially available GFP monoclonal antibody. A polypeptide with an apparent size about 93 kDa was shown in Fig. 5.8.

Immunoblot analysis to detect the endogenous DomSz protein in tissue samples and mammalian cell lines

The first step leading to solve the cellular and physiological functions is the detection of the endogenous protein. Endogenous Dom3z protein was revealed using immunoblot analysis as shown in Fig. 5.9. When antisera against Trx-His-Dom3z was used, a polypeptide around 67 kDa can be detected in all the samples tested, including adrenal and liver tissue samples and 6 cell line extracts. Similar result was obtained from immunoblot analysis using antisera against GST-Dom3z. However, the calculated molecular weight from the deduced amino acid sequence is only about 44 kDa. Whether this 70 kDa protein is a product after post-translational modifications is yet to be determined.

136 5.5 DISCUSSION

Human DOMSZ is the most recently discovered gene in the RD-SKI2W-DOM3Z-

RPl gene cluster located between complement 5 /and C4 genes in the MHC class HI region (Yang et al., 1998). The exact function of DomSz has not been determined yet.

As demonstrated in this chapter, preliminary studies have been performed to help further investigations. DomSz has been shown to be mainly a nuclear protein. The endogenous protein is detectable in cell line lysate with an apparent size of 65 - 70 kDa. The development of Trx-His-DomSz and GST-DomSz fusion proteins, the generation of polyclonal antibodies, the establishment of stable HeLa and CHO cell lines transfected with GFP-DOMSZ are of significance for future research. They may provide the basic materials for in vitro biochemical, immunochemical and cellular studies.

The endogenous DomSz protein detected by immunoblot analysis has much higher apparent molecular weight (65 - 70 kDa) than the predicted size of the deduced protein (44 kDa). The discrepancy is hard to explain at this moment. From the deduced amino acid sequence, it was shown by PROSITE analysis that there are four potential sites for N-myristoylation located at amino acids 5, 9, 264, and S58, respectively (Yang et a l, 1998). Therefore, it is possible that post-translational modifications of the endogenous DomSz make the size of the mature protein larger than the predicted molecular weight. Meanwhile, it is worthy of noting that DomSz has a leucine-zipper

137 motif that may be important for protein-protein interactions. It is possible that DomSz may be dimerized as it could interact with itself, or it may be tightly asso^-m ciated with some other cellular protein and therefore exist as a complex. Although most non- covalently associated protein complex can be dissociated in the presence of ercaptoethanol and SDS in the protein sample buffer, the 67 kDa band would not disappear even after prolonged boiling in up to 200 mM P-mercaptoethanol or 4% SDS

(data not shown). Therefore, it becomes important to determine whether DomSz interacts with itself or any other protein(s), and if it does, which protein(s) is involved since it would be one relevant aspect to look into the function of DomSz.

It has been implied that DomSz, together with RD, Ski2w and RPl may have concerted functions, probably related to RNA translation and turnover (Yang et al, 1998).

The nuclear localization of human DomSz is consistent with the suggestion of its involvement in the RNA metabolism. Consequently, whether human DomSz can bind RNA becomes an obvious target to be addressed. The improved solubility and purification of

GST-DomSz may provide the essential material for biochemical studies.

Recently, in a personal communication with Dr. Arlen Johnson from the

University of Texas at Austin, it was found that the yeast version of DomSz interacts with

Ratlp, a nuclear 5' - S' exoribonuclease required for RNA turnover (Johnson, 1997).

Human Ski2w and its yeast homolog Ski2p have been suggested to be involved in the S'

- 5' RNA degradation (Jacobs et a l, 1998; Qu et a l, 1998). However, since Ski2w has been shown to be present in the nucleolus and cytoplasm of the cells (Qu et al, 1998), while

DomSz is mainly a nuclear protein, it is unlikely that DomSz would interact with Ski2w

138 directly. It is plausible that Ski2w is involved in a cytoplasmic exonuclease pathway (Qu et al, 1998), while DomSz is involved in a nuclear exonuclease pathway.

The yeast two-hybrid system would be one of the ways to fish out the protein(s) that interacts with DomSz. Besides screening the human cDNA library, it is more feasible to test the interaction of DomSz with a handful of known candidate proteins.

Preliminary results from this study are promising with the indication that DomSz may actually be interacting with itself. It would be exciting to investigate whether human

DomSz would interact with any exoribonuclease, especially the human homolog of yeast

Ratlp. With the commercially available GFP antibodies, which are reportedly designed for this purpose, immunoprécipitation experiment using as the long-term transfectants of

GFP-DOMSZ may yield relevant information.

It is logical to speculate that as yeast DomSz interacts with 5'-S' exonuclease

Ratlp, the DomSz mutants would be defective in the 5'-S' RNA degradation pathway.

One way to examine the human DomSz function is to determine whether human DomSz can rescue the yeast DomSz mutants in the exoribonuclease pathway. If the human

DomSz can complement the yeast protein function, which suggests that human DomSz also interacts with exonuclease and is involved in RNA turnover, it would be of interest to use site-directed mutagenesis to find out the protein domain that is responsible for the protein - protein interaction. Deletion of the leucine-zipper motif is another obvious experiment to test. Since Ski2w is possibly involved in the S'-5' degradation pathway, it appears that DomSz and Ski2w would both play critical role in RNA degradation. It

139 cannot be overemphasized that RNA turnover is one of the most important approaches to achieve the regulation of gene expression.

140 Fig. 5.1 Restriction analyses to determine the presence and orientations of DOMSZ cDNA inserts in different expression vectors. Schemes for digestions were illustrated with pETa-DOM3Z in panel I, pGEX-DOM3Z in panel II, and pEGFP-DOM3Z in panel m . Bam HI digestion was used to examine the presence of the insert {panels I-m, A).

Nco I (panel I, B and C) and Sma I (panels II and m , B and C) digestions were used to determine the configurations of the inserts. The enzymes chosen were marked in each analysis, and the approximate sizes of the fragments were revealed. The results of the digestions are presented in panel IV. The sizes of the insert and each vector were labeled.

141 I.

A. ,B am HI ,N c o | B. zOSObp C. •Ncol

2 k b . 7 k b

corm ct ortaniatlon opposria orionlalkw) II. A. B.

D a m H I ■ S m o l ' - 5 I Lb - 4 4 kb

p O E X O O M 3 2 - 1.2 kb; - MO

S m a l l a m H I convct orientation oppoalte orlenlallon III. A. B. 4.7 kb

Û F P - D O M 3 2 Û F P D O M 3 2

Bam HI ■Sma I - 2 4 0 bp -12K^ • S m a

Bam Ml ISma I correct orientaiion cppoiBe ofienieiroi

(to be continued)

Fig. 5.1 Restriction analyses to determine the presence and orientations of DOMSZ cDNA inserts in different expression vectors. {Fig 5.1 continued)

IV. pETa pGEX pEGFP

Bam HI Nco I Bam HI StJia I Bam HI Sma I kb 1 2 3 4 kb 5 6 7 8 kb 9 10 11 12

5.9. 4.9-

1.2 - 1.2 -

A B I. kDa 4 5

93 — 67 _

56 — 1SSSS. 42 — sssaes

28 —

II.

93 67

56

42

Fig. 5.2 Production and detection of bacterial fusion protein Trx- His-Dom3z. I. Expression and purification of Trx-His-Dom3z. Lanes 1 and 2, soluble proteins from uninduced and induced cell lysates; lanes 3 and 4, insoluble proteins from uninduced and induced cell lysates; lane 5, Trx-His-Dom3z purified using electoelution method. II. Immunoblot analysis of uninduced and induced cell lysates from Trx- His {lanes 1 and 2), as well as from Trx-His-Dom3z {lanes 3 and 4). The antiserum used was against GST-Dom3z.

144 I. 1

67 _ Ê # : 56 —

n.

93 67

56

42

F/g. 5.3 Production and detection of bacterial fusion protein GST-Dom3z. I. Expression and purification of GST- DomSz fusion protein. Lanes 1 and 2, soluble proteins from uninduced and induced cell lysates; lanes 3 and 4, insoluble proteins from uninduced and induced cell lysates; lane 5, GST- Dom3z purified using Sepharose-4B beads and eluted with glutathione. II. Immunoblot analysis of GST-Dom3z. Soluble proteins from uninduced and induced total cell lysates were shown in lanes 1 and 2; results from insoluble proteins from uninduced and induced total cell lysates were presented in lanes 3 and 4. The antiserum used was against GST-Dom3z.

145 Fig. 5.4 Cellular localization of GFP-Dom3z in HeLa cells. HeLa cells were transiently transfected with GFP-D0M3Z for 72 hrs. The green fluorescence was observed under UV microscope.

146 Fig. 5.5 Indirect immunofluorescence experiment to detect the cellular localization of Dom3z in HeLa. The primary antiserum is against Trx-His-Dom3z. kb 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

■ 5.0— !" W . ÿ

2. 1_

1. 0 _

F/g. 5.6 Genomic Southern blot analysis showing the incorporation ofGFV-DOM3Z into the stable cell line genome. The genomic DNAs were subjected to Acc I digestion and probed with the human DOM3Z cDNA. Lanes 1 to 9 are from CHO transfectants, lanes 10 to 18 are from HeLa iransfectants. Lane 19 is genomic DNA from untransfected HeLa DNA as negative control. The position of the incorporated DNA is indicated by an arrow.

148 I. RT-PCR of HeLa transfectants

bp 1 2345 6 7 8 9 10 II

560

n. RT-PCR of CHO transfectants

1 2 3 4 5 6 7

560

Fig. 5.7 Reverse transcription (RT) - PCR of GFP- DOM3Z to examine the expression of the exogenous construct in the stable ceil lines. Total RNA from HeLa (panel I) and CHO (panel H) cell lines were amplified using GFP 5’ primer and DOMSZ r2 primer. Lane 1 of each panel is 100 bp ladder. The size of the product is labeled. Lane 11 of panel I is a negative control using untransfected HeLa total RNA.

149 kDa

133 —

93 — 67 — 56 — 42 —

Fig. 5.8 Immunoblot analysis to detect GFP-Dom3z in CHO transfectants. Monoclonal antiboday against GFP (Clontech, #8362-1) was used.

150 1 2 3 4 5 6 7 8

93 67 56 —

42

Fig 5.9 Immunoblot analysis to detect the human endogenous DomSzin; Icme I, adrenal; lane 2, liver; lanes 3-8, cell lines of 293, Molt4b, HT29, IMR32, SKNSH, Raji, and A549, respectively. Antiserum against Trx-His- Dom3z was used. CHAPTER 6

ORGANIZATIONS AND GENE DUPLICATIONS OF THE

HUMAN AND MOUSE MHC COMPLEMENT GENE CLUSTERS

6.1 ABSTRACT

The MHC complement gene cluster (MCGC) in most people contains thirteen structural genes, pseudogenes and gene segments. Novel genes RD, SKI2W, DOMSZ and

RPl are organized as two head-to-head gene pairs between complement gene Bfdjxd the first locus of C4. Southern blot analysis shows that single copy genes for DOMSZ are detectable in primates and other mammals. Sequence analyses revealed that the exon- intron structures of human and mouse DOMSZ genes are identical. Both human and mouse DOMSZ transcripts exhibit splice variants at the 5’ regions, although the open reading frames remain identical. Cloning and characterization of ± e mouse R Pl cDNA revealed a reading frame for 254 amino acids with a bipartite nuclear localization signal close to the amino-terminus. The mouse RPl gene consists of 7 exons and spans 12.9 kb.

152 Located in intron 4 of mouse RPl is an endogenous retrovirus that probably confers the androgen responsive expression of the Sip protein in certain male congenic mice. The availability of the complete human and mouse MCGC genomic and cDNA sequences allows further deliberate analyses of gene duplications and evolution. The intergenic region between mouse SLP and C4 genes is more than six times larger than the corresponding region in human. It contains the functional gene steroid CYP21A, long stretches of repetitive DNA elements, and three partially duplicated gene segments

TNXA, SKI2W2 and RP2. The modular duplications of human and mouse RP-C4-

CYP21-TNX (RCCX) are sharply different as SKI2W2 is absent in the human MCGC, and

TNXA and RP2 are much smaller in size but higher in sequence conservation in humans.

153 6.2 INTRODUCTION

The MHC complement gene cluster (MCGC) is a genomic region in the class EH region of the major histocompatibility complex (MHC). This gene cluster has many intriguing features, namely, the very close gene placements with minimal intergenic regions or with overlapping 3' ends, the complete and partial duplication of genes in a modular fashion, the presence of endogenous retroviruses that modulate gene size variation, the high degree of protein polymorphism, and the disease associations (Yu,

1998). In humans the MCGC includes four genes, C2, Bf, C4A and C4B, that code for subunit proteins of the complement C3 and C5 convertases (Carroll et ai, 1984). Also located in this cluster are nine other functional genes, pseudogenes and gene segments, which are present between the complement genes and/or undergoing concurrent gene duplications or gene deletions with complement C4A or C4B in a modular fashion

(Rupert et aZ., 1999; Yang er u/., 1999; Yang era/., 1998). The products of the functional genes include an enzyme involved in the hydroxylation of steroids that is essential for hormonal homeostasis of body salts and for sex hormones (cytochrome P450 21 - hydroxylase CYP21) (Miller and Morel, 1989), an extracellular matrix protein (tenascin

TNX) (Bristow et al., 1993), and four ubiquitously expressed proteins which are probably engaged in the various stages of the regulation of gene expression (RD, Ski2w, Dom3z

154 and RPl) (Lévi-Strauss er a/., 1988; Sargent et a/., 1994; Shen er a/., 1994; Yang ef oA,

1998).

Deficiencies and polymorphisms of complement C4 have implicated associations with many genetic and immune disorders (Porter, 1983) such as systemic lupus erythematosus (SLE) (Fielder et al., 1983), insulin dependent diabetes mellitus (Lhotta et al, 1996; Mijovic ef a/., 1985), recurrent spontaneous abortion (Laitinen era/., 1991), and IgA deficiency (Volanakis et ai, 1992). Except SLE, the deficiency or malfunction of C4 itself does not necessarily lead to these disorders, suggesting the presence of disease susceptibility genes close to C4. It is possible that some of the recently identified genes in the MHC class m region could be involved in these disorders.

The novel genes RD, SKI2W, D0M3Z and RPl are located in the 30 kb genomic region between human complement genes 5 /and C4. RD (or D6S45) is located 205 bp downstream of Bf. It encodes a 46 kDa polypeptide that contains an RNA recognition domain and 24 copies of repeating Arg-Asp or Arg-Glu (Lévi-Strauss et al., 1988). The

RD protein is a subunit of NELF that is a negative factor for transcriptional elongation

(Yamaguchi er a/., 1999).

Sharing the 5' regulatory region with RD, SK12W encodes a putative RNA helicase (Dangel et al., 1995). We have shown that human Ski2w proteins are not only present in the nucleoli, they are also associated with polysomes and probably the 40S subunit of ribosomes, suggesting that Ski2w proteins are located at the sites of ribosome biosynthesis and protein synthesis (Qu et ai, 1998). The yeast homologue, Ski2p, is

155 involved in mRNA turnover and antiviral defense (Jacobs et al., 1998; Masison et ai,

1995).

The human SKI2W gene is located 2.4 kb upstream of the R Pl gene (Yang et ai,

1998). RPl (or Gil, D6S60E) probably encodes a nuclear, Mn^^-dependent, Ser/Thr protein kinase (Sargent et al., 1994; Shen et ai, 1994). D0M3Z, the latest identified gene, is located between 5A72 Wand R Pl. Dom3z-related proteins are present in simple eukaryotes such as baker's yeast, as well as complex eukaryotes like humans. The homologue in C. elegans, Dom-3, may have a related function to Mes-3, that is a maternal effect component required for normal postembryonic development of germ cells

(Paulsen et al., 1995). Northern blot analysis revealed that human DOM3Z transcripts are heterogeneous in size and are highly expressed in reproductive tissues and pancreas

(Yang eta/., 1998).

The human genes for nuclear protein kinase RP, complement C4, steroid CYP21 and tenascin TNX (RCCX) are duplicated in a modular fashion. RCCX modules have a spectrum of variations in sequence polymorphisms, number of modules, and C4 gene size

(Rupert et al., 1999; Yang et al, 1999). Although four genes were involved in the RCCX modular duplication process, only complement C4 evolved to yield two functional protein products, C4A and C4B. The human CYP21A is a pseudogene because of an 8- bp deletion in exon 3 and two other deleterious mutations in the coding sequence

(Higashi et a/., 1986; White er a/., 1986). The duplications for human RP and TAX'are incomplete as the 5' ends of the duplicated genes, RP2 and TNXA, are truncated.

156 Between human C4 locus I and locus H are CYP21A and the partially duplicated gene segments TNXA and RP2 (Bristow etal., 1993; Yang et ai, 1999).

The mouse MHC, or H2, has a class EQ region with a gene organization similar but not identical to that of the human (Chaplin et ai, 1983). The duplicated gene for mouse C4, SLP, is marked by its androgen responsive expression and by coding a C4-like protein in certain H2 haplotypes such as //2^(Shreffler et ai, 1984). The intergenic distance between the SLP and C4 loci is about 65 kb (Chaplin et al, 1983), in contrast to a size of 10 kb in the orthologous region of humans (Carroll et al., 1984).

The duplication of mouse C4 is also accompanied by CYP21. The CYP21A gene is located downstream of the SLP gene; the CYP21B gene is located downstream of the

C4 gene. In contrast to human, the mouse CYP21A is a functional gene, while CYP21B is a pseudogene because of the deletion of exon 2 and multiple mutations that cause frame shifts and premature termination codons (Chaplin et ai, 1986). It is not known whether the mechanism leading to the duplication of SLP and C4, CYP21A and CYP21B in mouse

H2 class in region is identical to that in human.

In this study we attempt to provide a detailed genetic analysis of the genes in the human and mouse MCGC, particularly DOM3Z and RP. The human genomic DNA sequence linking three novel genes RPl, D 0M 3Z and SKl2W\s presented. The mouse

DOM3Z and RPl cDNAs and genes are investigated and compared with those of human's. While the molecular genetic characterization of the human and mouse MCGC was in progress, genomic sequences of a human MCGC with a monomodular RCCX structure, and a mouse MCGC from the haplotype became available (Rowen et al.,

157 1997; Rowen era/., 1997; Rowen era/., 1997; Rowen era/., 1998; Rowen era/., 1997;

Rowen et al., 1997; Rowen et al, 1997). These sequences allowed a deliberate comparison of the gene structures and gene duplications of the MCGCs between the two mammalian species.

158 6.3 EXPERIMENTAL PROCEDURES

Southern Blot Analysis — Genomic DNAs from African green monkey (C0S7), pigtail monkey. Rhesus macaque. Owl monkey (OMK), cotton top tamarin (NPC-LC) and Squirrel monkey (DPSO) were isolated as described in (Dangel et al., 1995b).

Genomic DNAs from sheep, dog and pig were purchased from Clontech (Palo Alto, CA).

Genomic Southern blot analyses were performed following standard procedures. Briefly,

10 pg of each genomic DNA was digested to completion with Sst I, resolved in 0.8% agarose gel, blotted onto Hybond N^ membrane (Amersham Pharmacia Biotech,

Arlington Heights, IL), and hybridized with [a-^^P]dCTP-labeled human DOM3Z 0.5 kb

3' genomic DNA probe. The membrane was subsequently washed and exposed to X-ray film.

Determination of 5' cDNA Sequences for Human DOM3Z by 5' rapid amplification ofcDNA ends (RACE) — Total RNA was isolated from HeLa cells using

RNAzol™ B (Tel-Test Inc, Friendswood, TX) following the manufacturer's instruction.

5’ RACE of human DOM3Z was performed using a kit from Life Technologies (Grand

Island, NY). Briefly, the first strand of cDNA was synthesized from total HeLa RNA using the rl primer (5' GTC TCC ACT TCA CTC AGG TAT 3 ). A homopolymeric tail was added to the 3' end. PCR amplification was performed using the dC-tailed cDNA,

159 the nested r2 primer (5' CGT CAG GAG TTT TOT GAG GTG 3'), and an anchor primer.

PGR products were cloned into the pGR2.1 vector (Invitrogen, Garlsand, GA). A 0.8 kb

BamR I DNA fragment corresponding to the genomic region upstream of DOM3Z EST clone 271616 sequence was used as a probe for colony hybridization (Yang et al., 1998).

Glones containing 5' RAGE products were sequenced by automated sequencing (ABI model 377, Foster Gity, G A) using T7 primer and M13 reverse primer.

Sequence Analyses — The human DOM3Z gene sequence is obtained from our previously published genomic DNA sequences for RPl (GenBank accession no.

L26261) (Shen et al., 1994) and for SKI2W (GenBank accession no. AF059675) (Yang et al., 1998). DOM 3ZEST clones were identified by searching the National Genter for

Biotechnology Information (NGBI) EST database with the human Dom3z amino acid sequence, using the" tblastn" program (Altschul et al., 1997). The mouse Dom3z amino acid sequence was deduced from the mouse D0M3Z EST sequence (Marra et a l, 1996) and D0M 3Z gene sequence (Rowen et a l, 1998), v/ith reference to the human Dom3z amino acid sequence (Yang et a l, 1998). The human and mouse DOM3Z cDNA sequences for coding regions were compared using the GGG program (Genetics

Gomputer Group, 1991) PASTA, accessed through the Pittsburgh Supercomputing

Genter. The human and mouse Dom3z protein sequences were aligned using GGG programs PILEUP and PRETTY. Sequence identities and similarities of orthologous proteins from human and mouse were determined by GGG program BESTFIT (Rowen et a i, 1998).

160 Isolation and Sequence Determination of Mouse RPl cDNA — Using the human

/(f 1.1 cDNA probe, the mouse cDNA clone for R Pl was obtained by screening a brain,

ÀgtlO cDNA library prepared from female C57 Black B6 RNA (Clontech, Palo Alto,

CA). The cDNA insert was subcloned into pBluescript (KS) vector and sequenced manually by primer walking method using T7 Sequenase (US Biochemical, Cleveland,

OH). Mouse RPl and RP2 genomic DNA sequences were analyzed through published genomic DNA sequences upstream of mouse C4 (Nonaka et a i, 1986b), SLP (Ogata and

Zepf, 1991), and mouse genomic DNA sequence from C2 to TNXB (Rowen et a i, 1998).

161 6.4 RESULTS

D0M3Z is conserved in mammals

To study whether the DOM3Z gene is conserved among mammals. Southern blot analyses of Sst I-digested genomic DNAs were performed {Fig. 6.1). Panel I shows results on the analysis of Old World primates and New World primates. Using a 0.5 kb genomic DNA fragment corresponding to the 3' end of human D0M 3Z as a probe, a single, 1.6 kb Sst I fragment was detected in African green monkey (COS7), pigtail monkey. Rhesus macaque and human (HT29) (lanes 1-3 and 7, respectively). These identical restriction patterns suggest the presence of a single copy D0M3Z gene in the genomes of Old World primates. Similarly, restriction patterns are identical for DOM3Z genes in New World primates, as shown by the presence of a 3.5 kb Sst I fragment in cotton top tamarin (NPC-LC), Owl monkey (OMK), and Squirrel monkey (DPSO) (lanes

4-6, respectively). Panel H shows results on the analysis of non-primate mammals.

Single Sst I restriction fragments are detected in rabbit (lane 1), porcine (lane 4), rat (lane

5) and mouse (lane 6). Double restriction fragments for D0M 3Z are detectable in ovine

(lane 2) and canine (lane 3), which are most likely due to the presence of additional Sst I sites at the 3’ region of D0M3Z or, less likely, the presence of two genes.

162 Genomic DNA sequence of human D0M3Z

Human DOM3Z was discovered between SKI2W and RPl (Shen et ai, 1994;

Yang et ai, 1998). A 3.0 kb genomic DNA sequence covering the 5' end of R P l, the entire human DOM3Z gene, and the 3' end of SKI2W is shown in Fig. 6.2. The transcribed region of human DOM3Z spans from nt 601 to 2841 and is split into seven exons {Fig. 6.3, panel I), as compared with its full length cDNA sequence. The entire exon 1 and the first six nucleotides of exon 2 delineate the 5' untranslated region. The sizes of the exons vary between 95 bp (exon 6) and 362 bp (exon 2). The sizes of the introns vary between 84 bp (intron 6) and 247 bp (intron 1). The poly(A) site of human

D0M 3Z (nt 2841) is located 59 bp away from the poly(A) site of SKI2W (nt 2901), which is transcribed from the complementary DNA strand. The most 5' nucleotide of DOM3Z mRNA, as identified by 5' RACE, is 267 bp away from one of the 5' transcriptional initiation sites of R P l/G ll (Sargent et al, 1994; Yang et al, 1998).

The intergenic 5' region of D 0M 3Z and RPl is relatively G+C rich. O f the first

1200 nucleotides shown in Fig. 6.2, 61.3% consist of G+C nucleotides, in contrast to

41% normally present in a mammalian genome (Normore er a/., 1976). Remarkably, 75 copies of CpG dinucleotides, which are generally under-represented in the mammalian genome, have been found in this shared intergenic region. No TATA boxes are present but two SPl sites configured in the same orientation of D 0M 3Z are located at nt 64-76 and nt 413-450, and two SPl sites arranged in the same orientation of R P l/G ll are found at nt 577-581, and at nt 1536-1549. Three API sites are present, two of which are in the

DDAfiZconfiguration and one is in R P l/G ll configuration.

163 Heterogeneous 5' ends of human and mouse D0M3Z mRNAs

The 5' ends of human D0M 3Z mRNAs: analysis of the 5' RACE products of

D 0M 3Z performed using total RNA isolated from HeLa cells yielded complex data for the 5' sequences of DOM3Z mRNA. It appears that there are three groups of 5' sequences

{Fig. 6.3, panel II). Clones in the first group are similar to that of EST clone HBC4394 and have been reported (Yang et ai, 1998). This cDNA was designated as cDNAl and its full-length sequence is 1,386 bp. Sequence of the second group clones starts at nucleotide 46 of cDNAl, and there is a 44 bp insertion between nt 123 and 124 of cDNAl. A comparison of this cDNA sequence, cDNA2 (accession no. AF059253) with the D0M 3Z genomic sequence revealed that the 44 bp insertion is also present in the

D0M 3Z gene {Fig. 6.2, nt 927-970). Therefore, the addition of this 44 bp sequence is likely the result of an alternative splicing using a different acceptor site in intron 1 {Fig.

6.2). However, this transcript does not change the open reading frame of DOM3Z because an in-frame stop codon (nt 967-969) is present six nucleotides upstream of the initiation codon.

The third group of the RACE products (cDNA3, accession no. AF059254) retains the "intron 1" sequence, with respect to cDNAl and the D0M 3Z gene sequence {panel II,

Fig. 6.3). Again the open reading frame is not changed because of the same in-frame stop codon as described in cDNA2. This group of sequences may be the results of partially spliced or unspliced RNAs, or the presence of multiple initiation sites for

D0M 3Z transcripts.

164 The 5' variants of mouse D0M 3Z cDNA: analyses of the mouse DOM3Z cDNAs identified from the EST database also revealed three groups of variants with different 5' untranslated regions {Fig. 6.3, panel IQ). In comparison with the regular transcript mcDNAl, one of the variants (mcDNA2; accession no. AA510930) is the result of a different donor site at intron 1 that is 25 bp downstream for that of clone mcDNAl, the other variant (mcDNA3; accession no. AA471915) is caused by an unspliced intron I.

All of these variants share the same initiation codon and therefore encode for an identical protein. However, differences in the 5’ untranslated regions of the mRNAs might have variable translational efficiency for the Dom3z protein.

A comparison of human and mouse D0M3Z genes and proteins

The intron-exon boundaries of the mouse DOM3Z gene were defined by comparing sequences from mouse EST clones with the genomic sequence of DOM3Z

(Marra et al, 1996). Sequence analyses revealed that the gene structures for human and mouse DOM3Z are identical {Table 6.1). The mouse gene also contains 7 exons with an in-frame stop codon in the first exon. Similar to the human gene, the first exon and six nucleotides of the second exon are untranslated. Except for the untranslated regions from exon 1 and exon 7, the sizes of exons 2 to 6 are identical between the two species, as are the intron phases (Patthy, 1987). The sizes of the six introns in the human and mouse

D0M3Z genes are similar.

The coding regions of the human and mouse D0M3Z cDNA sequences are 86.6% identical to each other. The deduced human and mouse Dom3z proteins contain 396 and

165 397 amino acids, respectively. The two protein sequences share 89.7% identities and

93.7% similarities. There are relatively more dissimilar amino acid residues at the regions close to the amino and the carboxyl termini of the proteins {Fig. 6.4). Both proteins are proline-rich, with 11.6% and 10.8% of the amino acids being proline in the human and the mouse protein, respectively. Leucine is the second most frequent amino acid residue in both proteins, with 9.5% in the human and 10.3% in the mouse protein. In human Dom3z, there is a leucine zipper motif at amino acid no. 103-124 {Fig. 6.4); at the corresponding position in mouse protein, one of the leucine residues is substituted by a valine. The leucine zipper motif may be involved in protein-protein interactions.

An analysis of the cDNA and genomic sequences derived from our laboratory and from the GenBank database for human DOM3Z revealed a polymorphism changing codon 28 fromThr to Ser (Figs. 2 and 4), and condon 261 from His to Gin (Yang et al.,

1998). When the derived amino acid sequences from seven mouse D0M 3ZEST clones

(accession nos. AA218230, AA716917, AA178524, AA509363, AA267188, AA510930,

AA471915) (Marra et al., 1996) and from the mouse D0M 3Z gene (Rowen et al, 1998) were compared, two amino acids substimtions were found in multiple clones and therefore likely to be polymorphic residues. These substitutions are: a His to Asn change at codon 20 that is the result of a C to A transversion at nt 199, and a Leu to Ser alteration at codon 28 that is the result of a T to C transition at nt 224.

166 Elucidation of the mouse RPl cDNA and amino acid sequences

A cDNA clone, MRPI.O, for mouse RPl was isolated from a BALB/c brain cDNA library using a human RPl cDNA probe. Sequence determination revealed a 1,028 bp insert that contains a polyadenylation signal and a poly(A) tail. Screening of the related cDNA sequence from the mouse EST database yielded an EST clone, AA036570

(Marra et ai, 1996), with a sequence of 571 bp in length that overlaps with MRPI.O at the 3' end and extends its 5’ end by 102 bp. Fig. 6.5 shows a compiled cDNA sequence for mouse RPl that has a 5' untranslated region of 63 nt, an open reading frame for 254 amino acids, and a 3' untranslated region of 284 nt. A bipartite nuclear localization signal, which consists of two basic tetrapeptides separated by nine amino acid residues, is present at the amino terminal region. The derived amino acid sequence shares 85.8% sequence identities or 88.2% sequence similarities with human RPl/Gl 1. Searching of the related protein family or sequence motifs by BLASTP (Altschul et ai, 1997) and

PROSITES (Bairoch et al., 1997) did not yield information on the possible protein structures or functions of mouse RPl.

Duplication of mouse RPl and RP2 genes

The availability of the genomic DNA sequence extending from the 3' region of mouse complement C2 gene to the tenascin TNXB gene (Rowen et al., 1998) allows a deliberate analysis to characterize the duplication of the mouse RP sequences. In comparison with the cDNA sequence described above, the mouse RPl located upstream of SLP is split into 7 exons spanning 12,946 bp (F/g. 6.6, panel II). An endogenous

167 retrovirus of 6050 bp in size is present in intron 4 of the RPl gene. This endogenous retrovirus was originally discovered while investigating the mechanism leading to androgen-dependent expression of the sex-limited protein (Sip) and was termed the

"imposon" (IMP) (Stavenhagen and Robins, 1988). The IMP is located 1891 bp upstream of the SLP gene. It is flanked by a 5 bp target site repeat sequence (S'-ACATT-

3'), and is configured in the reverse transcriptional orientation with respect to that of RPl

(and SLP gene).

RP2 is located upstream of mouse C4. Similar to the case in human, the duplication of RP is incomplete. The breakpoint for the mouse RPI/RP2 gene duplication occurred in intron 2 3,194 nt downstream of the donor site of exon 2, or 715 bp upstream of the acceptor site of exon 3. The size of the duplicated region for RP2 is 3,404 bp. The corresponding region in RPl from intron 2 to exon 7 spans 9264 bp. In contrast to RPl, there is no trace of IMP integration in intron 4 of RP2, suggesting that the gene duplication of R Pl and RP2 in mouse occurred prior to the integration of IMP to RPl.

The exonic sequences from RPl and RP2 were compared. Besides the truncation of exons 1 and 2, there are 20 point mutations in the remaining five exons of RP2. In addition, there are a single nucleotide deletion in exon 3, a 10-nucleotide deletion in exon

4, and an insertion of a C nucleotide in the 3’ untranslated sequence of exon 7 (shown in

Fig. 6.5). Therefore, it is unlikely that RP2 would code for a functional protein product.

168 Duplication of Mouse TNXA and TNXB Genes

Similar to the case in human, mouse TNX and CYP21 overlap at the 3' end. The mouse TNXB gene is 59.3 kb in size. It consists of 43 exons that encode a protein of

4,114 amino acids (Matsumoto et al., 1994; Rowen et ai, 1997). Analysis of the mouse

MCGC genomic sequence from a H2^ haplotype (Rowen et ai, 1997; Rowen et al.,

1998) revealed that 10 kb of the 3' region TNX sequence are present downstream of the

(functional) CYP21A gene. This sequence, TNXA, corresponds to exon 26 to exon 43 of mouse TNXB with the sequence for exons 34 and 35 being deletea. A nonsense and a frame-shift mutation are present in exons 28 and 30, respectively. In humans, TNXB consists of 45 exons and spans 68.2 kb. The partially duplicated sequence for human

TNXA spans 4.5 kb, which corresponds to exon 33 to exon 45. Human TNXA fused directly to RP2. In mouse, there is a large stretch of highly repetitive DNA sequence spanning 28.7 kb between TNXA and RP2. Unexpectedly, a 1.8 kb DNA sequence is present in this region that is similar to mouse SK12W from intron 12 to intron 17

(including exons 13 - 17). This partially duplicated gene fragment, SKI2W2, is 94.3% identical to the corresponding region in the intact SKI2W gene. The major mutations in

SKI2W2 include an 11 bp deletion in exon 16, and a 6 bp deletion in intron 16.

169 6.5 DISCUSSION

Analysis of the mouse genomic DNA sequence at the MCGC reveals the partially duplicated sequences for TNX and RPl between the CYP21A and C4 {Fig. 6.7). The intergenic sequence between SLP and C4 in the mouse 772^ haplotype is 65.5 kb (Rowen et al., 1997), which is much larger than the orthologous region in human that is 10 kb in size. In humans the duplicated sequence for RP2 is 913 bp that spans the last two and half exons. The breakpoint of gene duplication is located in exon 5 {Fig. 6.6). The sequence of RP2 is exactly identical to that of RPl. In mouse, 3.4 kb of genomic DNA sequence corresponding to the last five exons of RPl are present upstream of mouse C4 gene. The breakpoint of gene duplication occurred in intron 2. This duplicated DNA sequence for mouse RP2 contains multiple point mutations and minideletions and therefore, mouse RP2 is unlikely to be functional even though it is relatively larger than human RP2. The duplicated region for mouse TNXA is 10 kb with 16 exons, compared with 4.5 kb and 13 exons for human TNXA. Again, there are extensive sequence variations/mutations in mouse TNXA gene fragment. Unexpectedly, an 1.8 kb partially duplicated gene fragment for SK12W'\s, present between RP2 and TNXA, which is not observed in the human MHC. Therefore, the modular duplications of RP, C4, CYP21 and

TNX genes in human and in mouse are substantially different and likely occurred independently. In some mouse haplotypes there are multiple copies of SLPIC4 and

170 CYP21 genes (Lévi-Strauss et al., 1985; Pattanakitsakul et al., 1990; Rosa era/., 1985).

The mechanism(s) leading to these gene duplications have not been determined.

In both human and mouse MCGC, there are two sets of gene duplications. The first set is a single gene duplication giving rise to C2 and factor B {Bf). In both human and mouse, C2 and factor B proteins share 41.3% amino acid sequence identities. The second set is the modular gene duplication for RP1/RP2, C4AJC4B {SLP/C4 in mouse),

CYP21A/CYP21B, and TNXAJTNXB. In human, the duplicated RCCX modules are highly similar (99% identical) in sequence. There is overwhelming evidence for homogenization of polymorphic or mutant sequences in the RCCX, modules due to misalignments between monomodular and bimodular/trimodular structures. This homogenization process is not only a driving force for functional diversity of C4, but may also be one of the root causes for MHC disease associations (Kawaguchi et ai,

1991; Yang et al., 1999; Yu, 1998). In mouse, the RCCX modules have much more sequence deviations that include selective integrations of numerous repetitive DNAs, retroelements, multiple point mutations and minideletions. These gross sequence changes would discourage misalignments. Therefore, homogenizations of the diversified sequences in the duplicated gene modules would be less efficient in the mouse MCGC than that of human’s.

The genomic sequence of SLP from the H2^ haplotype has a G-nucIeotide deletion in exon 24 (Rowen et al, 1997). Exon 24 encodes the thioester residues of the protein.

This single nucleotide deletion would lead to frame shift mutation and therefore tmncation of the Sip protein. This is of particular interest because it has in fact been

171 noted that no Sip proteins are produced in both male and female mice with haplotype

(Atkinson gf a/., 1982). Hence, the 5LP in is a mutant or a pseudogene. Frameshift

mutations have also been found in a human C4AQ0 gene caused by a 1-bp deletion in

exon 20 (Fredrikson et al., 1998), and in human C4AQ0 and C4BQ0 genes by a 2- bp

insertion in exon 29 (Barba et ai, 1993; Rupert et al., 1999).

An endogenous retrovirus IMP is present in reverse orientation in the intron 4 of

mouse RPl. The 5' LTRs of the IMP are shown to contain glucocorticoid responsive

elements that confer the sex limited expression of SLP in male mice (Adler et al., 1992;

Ramakrisman and Robins, 1998). Whether the LTRs would bestow a differential

expression of R Pl and/or DOM3Z in male and female mice has yet to be determined. In

humans, an endogenous retrovirus HERV-K(C4) is present in the intron 9 of the long C4

gene. The orientation of the endogenous retrovirus is opposite to that of the resident gene

C4. The IMP present in intron 4 of mouse RPl is also organized in the reverse

orientation with respect to the resident gene. It has been suggested that the integration of

endogenous retroviruses into the intronic sequence of structural genes in the reverse

configuration might confer protective function on the host cell. This is because antisense

RNA molecules for the retrovims are synthesized (as part of the intervening sequences)

whenever the resident genes are transcribed. These antisense molecules could anneal to

the genomic RNA and transcripts of the retrovirus and lead to their degradation (Dangel

etal., 1994).

The size of human RPl transcript was estimated to be about 1.6 kb in size. The cDNA clone R l.l yielded a sequence of 1.1 kb (Shen et al., 1994). An additional cDNA

172 sequence of 520 bp in size was obtained by multiple rounds of RT-PCR using DNA primers based on the genomic sequence at the S' region of RPl. It was suggested that human RPl gene contained 9 exons and encoded 364 amino acids. However, it was not anticipated that there would be another gene, DOM3Z, located less than 300 bp upstream of RPl but in the opposite transcriptional orientation. D0M 3Z also codes for a ubiquitously expressed transcript of similar size to that of RPl (Yang et ai, 1998).

Screening of the S' cDNA sequence previously assigned for RPl with the EST database yielded entries for DOM3Z rather than RPl. In addition, the derived protein sequence from this region is not present in mouse RPl. Therefore, it is possible that (the major versions of) the human or mouse RPl transcript is encoded by 7 exons with an open reading frame for 2S4 amino acids. If this is the case, the distance between the transcriptional start sites of human RPl and D0M 3Z wiU be 267 bp, or 616 bp between translational initiation codons of D0M3Z and RPl {Fig. 6.2).

RD, SK12W, DOM3Z and RPl, are all present between the factor B and the

C4A/SLP genes (Dangel et ai, 199S; Lévi-Strauss et ai, 1988; Sargent et ai, 1994; Shen et ai, 1994; Yang et ai, 1998). These four genes are arranged in two head-to-head configuration gene pairs, i.e., RD-SK12W and D0M3Z-RP1 {Fig. 6.7). The transcriptional start sites are separated by less than 171 bp for of human RD-SK^ZW, 267 bp between human DOM3Z and RPl, 302 bp between mouse RD and SK12W, and 144 bp between mouse DOM3Z and RPl. The absence of TATA-box sequences at the S' regulatory regions of these gene pairs in both species, the presence of CpG islands and

173 the ubiquitous expression are consistent with the hypothesis that these genes have house­ keeping functions (Yang et al., 1998).

174 i i U i ü 1 2 3 4 5 6 7

####

2 -

1.6 -

n. s - % N ° 2 : kb 1 2 3 4 5 6

3 - 2 -

1.6 -'

1 -

Fig. 6.1 Southern blot analyses of DOM3Z genes in (I) primates and (II) mammals. Genomic DNAs were digested with Sst I and a 0.5 kb 3' human D0M 3Z genomic DNA was used as a probe. In (I), DNA in lane 1 is from human, lanes 2 to 4 are DNAs from Old World primates, lanes 5 to 7 are DNAs from New World primates. HT-29, human colon carcinoma cell line; COST, African green monkey; NPC-LC, cotton top tamarin; OMK, owl monkey; DPSO, squirrel monkey.

175 Fig. 6.2 Genomic DNA sequences linking human DOM3Z toR P l and SK12W.

Sequences are obtained from previously published R Pl genomic sequence (accession no.

L26261) and SKI2W genomic sequence (accession no. AF059675), with reference to HLA class m sequence (accession AF019413). Exon-intron structure of D0M 3Z is derived by comparing genomic sequence with D0M 3Z cDNA sequences. The derived amino acid sequences are shown for DomSz only. Intergenic and intron sequences are in lower cases. The locations of 5' splice variants for DOMSZ transcripts, cDNA2 and cDNA3 are indicated. Polymorphic residues are placed above sequence for nucleotides, and below for amino acid sequences. Arrows indicate directions of transcription, translation, or orientations of ci^-acting DNA regulatory motifs. An asterisk represents a stop codon.

Nucleotide sequences for regulatory motifs are italicized.

1 7 6 .SJ5H II SPl ATGACTTCTOACACACCCOCœCCSCCGACCCroOCCACATAAATCOCCTCACOGXCACCGAACTI

5doH I R P lfG ll IniCiaCion codcn 5TTAO:TOGCTCACCCCGAACACCATCCgACTCCACACSCCCTU >:i*ri;a X\.t kX nv;T AAeTCCAAAOgICTCCgCGATt:ACgIGATa:CTCTTCCACCTCXT

<- RPI/C2I cap ait«

API rotroAOMAAayuxrcTCAAcrroTCCTcccTc

^ f x a a 1 DOMTt

m CWA3 SCATtS GACqtaaecçgattctegaccCyagc?caqagqcecctectaaeeccaettqcaCctgqaggacctggagggaqggtgggCg**gggc^gg>*agagqcaOTacg«gageccggc~ggqg

f* API .... praaenc In cWA2 ...... cggcgcccgaggggccCgaacgcctcaaggccagagcccggcgaccaggcggcccgccCagccccaaaccagCcacccccccceaggccccctcCgtagaaCggaaaccccgcacccctg

...... Kauu 3 SanK I T c e c g tc e g a g G A C C T C A T gS A T C rgACCCCCACCAAGACACGACCTGAGAACACACAGCTASCTCACCCTCOGAACAAACTACCTCGTTX ACCACCTACTCTGCCX ACACACCCTSCCC? (DMOPRGTKRCAEXTEVAEPRMKLPRPAPTLPTDPAL S

YSGPFPFYRRPSELCCFSLDAQRaYHGDARALRYYSPPPT

TAACGCTCCACCCCCCAACT T I GACXTCACACACGCATACCCCGATCGATACCACCCCCGCCACGACGAGCTCCACGAAACCCTCGACCACCTCCTGTCCTCGCrCCTGCAACACCCACC NGPGPNPDLRDGYPORYQPRDEEVQERLDHLLCWLLEHRG

CCGdTGCACCCgcgagcaaagcgcggcaggcagcacacccggagacccgcacccaccccccaaaaccaggagagecaaagccggggcgagaaCcagcccccagaagaaaeaagaccacg R L E C (119) iJCSa 3 SPl

(1 2 0 ) CPGWLAEA IVTWRGHLTKL

ccccccctCcccagaggctgagagcecgccacgc

(1 9 9 ) KPGSSPD?

GC r c r i CTCA(»CCACCTAGACTCCACACACCCCCAACCCCCATCCACACACCCCCCAACCTCCTATCTOGACCTCAAGACCTCCAACGACATCCACACCCCTCGCCAATCCACCACTTT LFSGBVOCTDPQA?ST0P?TCYVEr.KTSKBHHSPGQWRSFQ e g CTACACgcccaggaCcggggCgggcagggcgagagccgaggcgggaaggccgggaaagggactcggggaggagggcgaaggcagaaCggagggCgcacgaggggccccacggaccccggg Y R (2 7 1 )

CTACCATGAAGATGnTGAATATOTCAOOgcaagggaacgacgctgcagcccccacccgCatccccaaacaccaagaccacaggtctagcaceeagggeaacagecCgecccctcccece TKÎCMFEYVR ( 3 1 6 ) K x a a 6

(317)KDRDCWNPSVCMNFCAAPLSFAaS

C x o a 7 ACGCTTCTCCACGATGACCCCACgtgaggeaCCcagcgccgccecececcCctggaceeeaggatccagcceecggcccCcaaccgaCgcccccatecgceecccagGCTCGTTCATCTC TVVQODPR (348) (349) L V H L

I CCCTCTCCCAAATAGTAATCC I'r i 'ACACCCACCCACTCATATCTCTCTGTCCACATAATAAAACCATA'l' n v r A AGAGC T T c C e C c q c c g C c t t c c c a g c C g a g C c a c c c c t g g c e a q c a P S P R • • (396) poly(A) signal CKXÜZ i S K I2 W E x o n 23 e

pcly(A) signal £K Z2i* *

Fig. 6.2 Genomic DNA sequences linking human DOM 3Z toR P l and SKI2W.

177 200 400 bp

RPl/GIl DOM3Z S K I 2 W

120 199 272 317 349 396 CpG islands

n. bp 123 247 362 exon 1 intron 1 exon 2

cDNAl CDNA2 cDNA3 m. bp 135 176 362 exon 1 I intron 1 , exon 2

mcDNAl mcDNA2

mcDNAS

Fig. 6.3 Exon-intron structure and alternative splicings of human and mouse DOM3Z. I. Exon-intron structure of the human D 0M 3Z (Yang et al., 1998); II and III, alternative splicings to generate different 5' untranslated regions of human and mouse

D0M 3Z transcripts, respectively.

178 I s 90 Hosa MDPRGTKRgA EKTEVaePrn KLPRpaPtLp TdPaLYSGPF PFYRRPSELG CFSLDAQRQY HGDARALRYY SPPPtNGPGP nFDLRDGYPD

n s

91 I Leucine Zipper Motif | 1 8 0 Hosa RYQPRDEEVQ ERLDHLLcWl LEHRgrLEGG PGWLAeAiVT WRGHLTKLLT TPYERQEGVJQ LAASRFQGTL YLSEVETPnA RAQRLARPPL Mumu ------r - v ------n q ------g - t ------a ------

1 8 1 q 2 7 0 Hosa LRELMYMGYK FEQYMCADKP GsSPDPSGEV NTNVAFCSVL RSRLGsHPLL FSGEVDCtdP QAPsTQPPtC YVELKTSKEM HSPGQWRSFY Mumu ------g ------Y n ------I n ------c ------s ------

2 7 1 3 6 0 - Hosa RHKLLKWWAQ SFLPGVPnW AGFRNPDGFV sSLKTFPTMk MFEyVRNDRD GWNPSVCMNF CAAFLSFAQS TWQDDPRLV HLFSWEPGGP 'o Mumu ------h — ------c ------e ------n ------E ------

3 6 1 Hosa VTVSVHqDAP YAFLPiWYVE aMTQDLPspp KTPSPK. MUWU ------IT ------s ------t — p i s ------d

Fig. 6.4 An alignment of human and mouse Dom3z protein sequences. The human Dom3z sequence is shown. The mouse amino acids identical to those of human are represented by dash lines. Dissimilar sequences are shown at the corresponding positions, with conserved residues in upper case and unconserved residues in lower case. Polymorphic residues are shown above for the human sequence, and below for the mouse sequence. Hosa, Homo sapiens-, Mumu, Mus musculus. Fig. 6.5 cDNA and derived amino acid sequences of mouse RPl. The first 113 bp is obtained from EST clone AA036570 and the rest from cDNA clone MRP 1.0. The bipartite nuclear localization signal for RP1 protein is underlined. The location of exon- intron boundaries in the RPl gene is marked by vertical strokes. Mutant sequences at the

"coding region" of mouse RP2 are shown above the DNA sequence; deletions in RP2 are marked by a hyphen', insertion is marked by ^ and the inserted nucleotide(s).

1 8 0 EST AA036570 ...... 1 ATTATTCTGTGCCGGAAGACGCGGTCTTCGGGACTCCTTTCCCGGAATCCTGAAATCCGGGTTATQAATCGCAAGAGACGGCTCCTGGCG 9 0 M N R K R R L L A 9 . M R P l.O I 91 TCTGAGGCTTTTGGGGTGAAGCGGAGGCGGGCGCCGGGGCCCGTGAGAGCGGATCCTTTAAGAACAAGAGCAGGCTCGGCGCGCGAGGCC 180 10 S E A F G V K R R R APGPVRADPLRTRAGSAREA 39

181 ATCGAGGAGCTCGTGAAGCTGTTTCCCCGGGGGCTGTTCGAGGATGCGCTGCCGCCCATCGCGCTGAGGAGCCAAGTGTACAGTCTGGTG 270 40IEELVKLFPRGLFEDALPPIALRSQVYSLV 69 1 a 271 CCAGACCGGACGGTGGCCGACCTACAACTGAAGGAGCTTCAAGAGCTGGGAGAGATCCGAATTATCCAGCTAGGCTTTGACTTGGATGCT 360 70PDRTVADLQLKELQELGEIRIIQLGFDLDA 99

a - a | - a 361 CATGGGATTGTCTTCACGGAAGACTACAGGACCCGGGTCCTCAAGGCCTGTGATGGCCGACCATGTGCTGGGGCGGTGCAGAAGTTCCTG 450 lOOHGIVFTEDYRTRVLKACDGRPCAGAVQKFL 1 2 9

a g c a I c 451 GCCTCAGTGCTCCCAGCCTGTGGGGACCTCAGTTTCCAGCAGGATCAGATGACACAGACTTATGGCTTCAGGGACCCGGAGATCACGCAG 540 130ASVLPACGDLSFQQDQMTQTYGFRDPEITQ 1 5 9 g 541 CTGGTGAACGCTGGGGTCCTCACTGTCCGAGATGCTGGAAGCTGGTGGCTGGCTGTGCCTGGAGCTGGGAGATTCATCAAGTGCTTTGTT 630 160LVNAGVLTVRDAGSWWLAVPGAGRFIKCFV 1 8 9

I t t c 631 AAAGGGCGCCAGGCTGTACTGAGCATGGTGCGGAAGGCCAAGTACCGGGAGCTTGCCTTGTCAGAGCTCCTGGGCCGCAGGGCCCCTTTG 720 190KGRQAVLSMVRKAKYRELALSELLGRRAPL 2 1 9 I 721 GCAGTGCGGCTAGGTCTTGCCTACCATGTGCACGACCTCATTGGAGCCCAGCTGGTGGACTGTGTCCCCACCACTTCTGGAACCCTCCTT 810 2 2 0 AVRLGLAYHVHDLIGAQLVDCVPTTSGTLL 2 4 9

t g ''c 811 CGCCTGCCAGATACATOAGGAGCCAATGCTTGGATTCTGCAACTCACTGAGTGAGGTTCCTGGAAGTGCCACCCCAGGGTCGTGAGCAAG 900 250 R L P D T * 255

a c 901 TCACCGCAGTGGGTGCCAGGCTCTGCTGCTGCAAGCTGGGCTTCTACCTGAGCTGGGCTGTGGGCATTGCAGCTCTTGCTTCTGTGCGTG 990

a a 991 IGGAGTCAGGAGCCGTGCCAAGGGGATGAGAAGGTGGGGTTGCTGGAGACACTGGAGCAGGGAGTAGAAAACTCTGCCCTTCACGTCAGG 1080

poly(A) signal c 1081 CTGAAATTGCCAAATAAAATACTTGTGCCTGTAAAAAAAAAAAAAAAAAA 1130

Fig. 6.5 cDNA and derived amino acid sequences of mouse RPl. 1 2 kb I. HUMAN J I

DOM3Z RPl C4A Alu Alu

i 1 S/c S S S /bS J

I SVA I Exon 1 2 3 4 56 7

Breakpoint o f gene duplication

TNXA RP2 C4B

* 56 7 II. MOUSE

DOM3Z R Pl SLP

IMPOSOff HH> Exon 1 2 4 3 4 5 6 7 Breakpoint o f gene duplication

RP2 C4

3 4 5 6 7

Fig. 6.6 h. comparison of the exon-intron structures and gene duplications for human (I) and mouse (II) K P l and RP2. The coding exons are in solid, filled boxes', non-coding exons are in empty boxes. Retroelements are shown as gray, shaded boxes. Horizontal arrows represent configurations of genes or retroelements. Vertical arrows indicate the positions of RP1IRP2 gene duplications.

182 10 20 kb I I I I I I.

to HLA Class I to KLA class II 92 kb to HSP70 350 kb to HLA DRr

d RD C4A Xi C4B TNXB

I I I I I I Intcrgcnic distance - g - os g - (bp) CN — fS NO

IL // / C2 B f RPl TNXA C4 TNXB

I I Intcrgcnic distance — ct m -v (bp) 3 ' % T

Fig. 6.7 A comparison of the gene organizations of1. human and II. mouse MHC complement gene cluster. Arrows represent transcriptional orientations of functional genes. Pseudogenes and gene segments are shaded.

183 exon intron intron phase

(human/mouse, bp)

1 123/135 247/176

2 362/362 172/146 2/2

3 236/236 86/89 1/1

4 220/220 127/102 2/2

5 136/136 139/129 0/0

6 95/95 84/79 2/2

7 214/199

Table 6.1 A comparison of human and mouseDOM3Z exon-intron structures

184 CHAPTER?

INSULIN DEPENDENT DIABETES MELLITUS

AND MHC COMPLEMENT GENE CLUSTER

7.1 ABSTRACT

Insulin-dependent (type 1) diabetes mellitus (IDDM) is an immune-mediated multifactoral autoimmne disease. A genome wide search for IDDM susceptibility genes indicated that the major histocompatibility complex (MHC) is the most important locus of the disease association. Besides class II genes DR and DQ, possible candidates also include C4 and some other poorly defined polymorphic genes in the class III region.

Variations of the genetic unit RCCX module, consisting of C4 and neighboring genes

RPl, CYP21 and 77VX, can result in the genetic instability of the class III region, and consequently, lead to disease associations. It has been discovered that in the diabetic patient population (n=50), the frequency of monomodular RCCX structures (42%) is significantly higher than that in the normal population (16.2%, n=150). In the disease group, 68% of the patients have at least one chromosome with monomodular RCCX,

185 compared with 30.9% in the control group. In addition, 11 C4B3 alleles were identified in the IDDM population, which is significantly higher than in the normals (1 in 150 patients). In one of the patients, an abnormal C4B3, which is associated with Rgl was identified. Determination of the genomic sequence of the C4d region from this patient confirmed that it encodes Rgl antigenic determinant, with the C4B specific amino acids at the isotypic region. Two ubiquitously expressed genes, SKI2W and DOM3Z, located upstream of C4, were screened for possible polymorphism in IDDM patients. A Taq I

RFLP and an Nla IV RFLP in SKI2W were detected. The significance of these polymorphisms has yet to be determined. This study further consolidates the association of IDDM with the MHC class HI region.

186 7.2 INTRODUCTION

Insulin dependent (type I) diabetes mellitus (IDDM) is an immune-mediated, multi-genetic autoimmune disease with worldwide incidence. Environmental factors are also necessary to precipitate the disease. IDDM is characterized by destruction of the insulin-secreting p cells of the pancreas. Although IDDM has become one of the most extensively studied polygenic autoimmune diseases, the fundamental genetic defects leading to IDDM still remain undetermined (Davidson, 1998).

A genome wide search for human type 1 diabetes susceptibility genes indicated that the major histocompatibility complex (MHC) is the major locus for the disease association (Davies et aL, 1994). The human MHC is also known as the HLA.

Individuals with a combination of HLA class II antigens DR3 and DR4 are found to have higher risk of IDDM (Harrison et aL, 1990; Svejgaard and Ryder, 1981; W olf et aL,

1983). It has also been suggested that mutation at position 57 of HLA DQ P is strongly associated with IDDM (Nepom et aL, 1987; Todd et aL, 1987). However, the onset of

IDDM cannot be fully explained by such a mutation. Therefore, the presence of susceptibility genes in the MHC other than the class II molecules cannot be ruled out. It is likely that the specific alleles of the HLA DR or DQ are markers for the presence of nearby susceptibility genes or elements for IDDM.

In addition to HLA class II, possible candidates include C4 and some other poorly defined polymorphic genes in the class HI region (Degli-Esposti et aL, 1992). Several

187 groups have reported abnormalities of the complement system in type I diabetic patients

(Jenhani et al., 1992; Marcelli-Barge et al., 1990), including low levels of plasma C4 that may be due to hypercatabolism or to a reduction in protein synthesis (Jacob et al., 1986;

Senaldi et al., 1988). Alternatively, the increased frequency of C4 gene deletion may be responsible (Caplen et a l, 1990). Another C4 allele that has been reported to be associated with IDDM is C4B3 (Bertrams et a l, 1984; Thomsen et a l, 1988). Whether these associations of C4 with IDDM are actually a consequence of linkage disequilibrium with the DR and DQ loci or indeed result from a primary association with the disease itself is unclear. It is also possible that certain C4 alleles serve as markers for the real susceptibility gene(s) nearby, as it has already been suggested that HLA DR314 heterozygous individuals may have a second susceptibility gene that maps close to the C4 locus (Caplen et al, 1990; Thomsen et a l, 1988).

SKI2W and DOM3Z are located between 5 /and C4 (Yang et a l, 1998). Both genes are ubiquitously expressed but the levels of transcripts are significantly higher in pancreas, the organ producing insulin. The exact functions of these two genes have not been determined yet. 5/sT2Wbelongs to the helicase gene family (Dangel et a l, 1995a), which have been suggested to be associated with many inherited human disorders, including diabetes (Ellis, 1997). It is of interest to investigate whether there are any abnormalities of these genes in the IDDM patients.

In order to elucidate the susceptibility genes or genetic elements in the MHC class m region associated with IDDM, C4 gene deletions and RCCX modular variations were examined. The two housekeeping genes SKI2W and DOM3Z were also explored to hunt

188 for any possible mutations in the IDDM patient samples. The studies conducted here will help pave the way to better understanding on the genetic basis of IDDM.

189 7.3 EXPERIMENTAL PROCEDURES

Patient recruitment — Fifty juvenile onset IDDM patients were recruited from the

Columbus Children’s Hospital for this study. Five to Fifteen milliliters of peripheral blood was taken from each person based on the patient’s age. Appropriate consents were obtained according to approved protocols by the Institutional Board of the Columbus

Children's Hospital.

Isolation o f genomic DNA, total RNA and plasma — Genomic DNAs were isolated from 3 ml of peripheral blood using DNA isolation kit (Centra System, Inc.,

Minneapolis, MN). Four ml of blood was subjected to Ficoll gradient centrifugation to separate white blood cells, plasma and red blood cells. Total RNAs were isolated from white blood cells using RNAzol^ kit (Tel-Test, Inc., Friendswood, TX) after the manufacturer’s instruction. One-fourth of the white blood cells were used to create an immortalized cell line for each patient sample.

Establishment of immortalized cell lines — B lymphoblastoid cell lines were developed for each patient by transformation of B lymphocytes in the white blood cells using Epstein Barr viruses (EBV)(Walls and Crawford, 1987). Basically, white blood cells were incubated in 24-well cell culture plates with EBV, which are in the dilutions of

190 10*’, 10'“, 10'^, and 10"^ in R PM I1640 medium. The cells were fed weekly until the clumps of the transformed B lymphoblastoid cells were observed. Afterwards, cells were transferred to T25 flasks and the cultures were expanded.

Southern blot analysis — Ten micrograms of genomic DNA were digested to completion with an appropriate restriction enzyme for 16 hrs, resolved on an agarose gel, blotted and hybridized with an appropriate [a-'’“P] dCTP-labeled probe. Probes used in this study include /?P1.1; C4 Pa and Pg; CYP21, TNX (Yang et a i, 1999), DOM SZand

SKI2W (Yang et aL, 1998).

Complement C4 allotyping — Complement C4A and C4B allotypes from EDTA-blood plasma were determined as described (Awedeh and Alper, 1980; Sim and Cross, 1986).

Briefly, 10 pi of plasma was digested with 0.1 U of neuraminidase (Sigma, St Louis, MO) at

4° overnight, and with 0.1 U of carboxyl pepdidase B (Sigma) at room temperature for 30 min. Plasma proteins were resolved by high voltage gel electrophoresis (Schneider and

Rittner, 1997). C4 proteins were detected by immunofixation using goat antisera against human C4 (Incstar, Stillwater, MN). The association with Chi or Rgl antigenic determinants were examined by immunoblot analyses using anti-Chl or anti-Rgl monoclonal antibodies (anti-C4B, cat no. C057-325.2, lot no. 120287; anti-Rgl, RGdl; kindly provided by Dr. Joann M. Moulds, Houston, TX) at a dilution of 1 in 5000 and 1 in

1000, respectively. Immune complexes were detected by chemiluminescence method using the ECL-plus reagents (Amersham, Piscataway, NJ).

191 Oligonucleotides — Oligonucleotides for amplifying and sequencing the C4d region were synthesized by an Applied Biosystem Model 380B DNA Synthesis machine. The sequences of the primers are: C4E22.5 GAA GGG GCC ATC CAT AGA GA; C4E25.3

GAG GTG CTG CTG TCC CGT GA; C4E26.5 GOT GAG AGG GTT TGT GTT GAA;

C4E27.3 GAG TGT GTG GTT GAA TGG GT; C4E28.5 GAA GGG TGG ATG TGA

AAG GG; C4E29.3 TTG GGT ACT GGG GAA TGG GG; C4E30.5 GAG AGG GTG

ATT GGGG GTG GA; and C4E3L3 GTT GAG GGT TGG TTT GGT GT.

Genomic PCR of the C4d region — To ensure the accuracy of the sequence, PGR of genomic DNAs were performed utilizing the Expand™ high fidelity PGR system (Roche

Molecular Biochemicals, Indianapolis, IN) following manufacturer’s protocol. The PGR products were purified and cloned into the pGR2.1 vector (Invitrogen, Carlsbad, GA).

C4E22.5 and C4E31.3 are the primers used for the amplification.

Allotype specific PCR — Clones containing the G4B-specific G4d region were identified by allotype specific PGR, which were performed using allotype specific primers described in (Barba et aL, 1994). Clones that give the product of expected size were chosen for further analysis.

DNA Sequencing and Sequence Analysis — Sequences of the G4d region were determined by automated sequencing method using an ABI 377 machine. Sequence

192 comparisons were performed by F AST A program in GCG package (Genetics Computer

Group, 1991) through the Pittsburgh Supercomputing Center.

193 7.4 RESULTS

C4 gene deletions and RCCX modular variations in IDDM patients

Low serum concentrations of C4 are found in about one-fourth of all IDDM patients, and may be important in the etiology of the disease (Jacob et aL, 1986). One direct cause for low protein levels is the gene deletion. Therefore, to determine the role of C4A and C4B gene deletions, as well as the RCCX modular variations in IDDM patients, genomic Taq I, Nla IV, and EcoO 1091 Southern blot analyses were performed according to the established techniques described in Chapter 2 (Yang et al, 1999).

The RCCX modular structures of the 50 IDDM patients studied were listed in

Table 7.1. Among the chromosome 6 analyzed, only 50% have bimodular RCCX structure (33% are with two C4 long genes L-L, 17% are with one long and one short C4 gene L-S), while 42% have monomodular RCCX, and 8% have trimodular RCCX. In comparison, 71.6% of the chromosome 6 from non-IDDM population (n=150) have bimodular RCCX (47.7% L-L, 23.8% L-S), only 16.2% have monomodular RCCX, and the other 12.2% have trimodular RCCX {Fig. 1.1, panels I and H). The extensive deletion of C4 genes can also be demonstrated by the even more significant fact that in the diseased population, only 32% have bimodular and / or trimodular RCCX structures on both copies of chromosome 6, or bimodular RCCX on one chromosome and trimodular RCCX on the other. In comparison, the frequency is 69.1 % in the non-IDDM

194 population. In other words, 68% of the IDDM patients have a deletion of at least one C4 and its neighboring genes in the RCCX.

An intriguing aspect on the RCCX modular variations among the 50 IDDM patients is that, there is no single patient who has homozygous, bimodular RCCX structures with two C4 long genes (L-L / L-L). In the non-IDDM population, the frequency of homozygous L-L / L-L RCCX is 19.5%. Since C4 long gene is due to the presence of the endogenous retrovirus HERV-K(C4), the significantly lower percentage of C4 long gene in the diseased population suggests that this endogenous retrovirus in C4 gene could have a protective function against IDDM. The results of these comparisons are summarized in Table 7.2.

C4 protein variations in IDDM patients

Extensive C4 protein polymorphisms have been demonstrated in the IDDM population (Jenhani et al., 1992), and various allotypes of C4A and C4B are determined by immunofixation and immunoblot analyses. The allotyping results for all 50 IDDM patients studied are listed in Table 7.1. A few examples are presented in Fig. 7.2.

Deduced from the immunofixation experiments {panel I), patient 46 in lane 1 appears to have normal C4A3 C4B1; patient 20 {lane 2) is homozygous for C4A3 C4BQ0; and patient 24 {lane 3) is homozygous for C4AQ0 C4B1. Patient 11 {lane 4) has C4A3

C4B2, C4A3 C4BQ0. Patients 13 and 14 {lanes 5 and 6) both have total C4A deficiency, with C4AQ0 C4B1, C4AQ0 C4B2 and C4AQ0 C4B1, C4AQ0 C4B3, respectively. Both

195 patients 4 and 15 {lanes 1 and 8) exhibit similar allotyping pattern, and are assigned to be

C4A3 C4BQ0, C4A3 C4B3. Patient 28 {lane 9) shows C4AQ0 C4B1, C4A3 C4B6.

In immunoblot analyses, C4B1 in lanes 1, 3, 5, 6, and 9 react with anti-Chl monoclonal antibody {panel II). C4B2 in lanes 4 and 5, C4B3 in lanes 6 and 7, as well as

C4B6 in lane 9 also show positive reactions with this anti-Chl antibody. Nothing was detected in lane 2 since this patient is homozygous for C4BQ0. Unexpectedly, nothing was detected in lane 8 either, although the patient has C4B3 according to the immunofixation experiment. It turned out that this C4B3 positively reacts with anti-Rgl monoclonal antibody {panel HI, lane 8). Panel HI also confirms the presence of C4A3 associated with Rgl in lanes 1, 2, 4, 7, 8 and 9. No signal was detected in lanes 3, 5 and

6 because all three patients are homozygous for C4AQ0.

Among the 50 patients examined, 11 have C4B3 identified. In other words, 11

C4B3 alleles were present on the 100 chromosomes in the patient population. This frequency is significantly higher than that in the normals (1 in 300 chromosomes).

Illustration of Rgl-associated C4B3 sequence

As described above, judging from the mobility, one of the C4 proteins in patient

15 was assigned as C4B3. Intriguingly, it positively reacts with anti-Rgl instead of anti-

Chl antibody. The association of C4B3 with Rgl has never been reported before. To definitively prove the reverse association, genomic PCR was performed to amplify the

C4d region of this patient. The PCR products were cloned and the sequence of the C4B- specific C4d region was determined. The sequence comparison of Rgl-associated C4B3

196 with C4A was presented in Fig. 7.3. At nucleotide 1119, 1122, 1130, 1133, and 1135, the cloned genomic PCR product obviously contains C4B specific sequence, and encodes for

LSPVIH, the C4B isotypic amino acids. However, unlike most other C4B, which are associated with Chi, this C4B3 in an IDDM patient encodes for Rgl antigenic determinant characterized by VDLL at amino acid position 1188-1191. There are other nucleotide changes observed in the sequence, some of which cause amino acid substitutions. For example, the C -> T change at nucleotide 802 leads to amino acid Ala

1049 Val substitution; the A ^ G change at nucleotide 817 results in a Asp 1054

Gly substitution.

Screening for possible polymorphism of D0M2Z and SK12W in IDDM patients

As C4 is possibly a marker for the presence of real disease susceptibility gene(s) in close proximity, DOM3Z and SKI2W, two novel genes located between C4 and Bf, were studied for any possible RFLP associated with IDDM.

Genomic Taq I, Mia IV, and EcoO 1091 Southern blots used for determining

RCCX modular structures were hybridized with D0M 3Z and SKI2W probes separately.

No RFLP was observed in EcoO 1091 blots when hybridized with either probe. When hybridized with DOM3Z probe, in addition to the 2.7 kb Taq I fragment, patient no. 3 shows an extra fragment with the size about 1.8 kb {Fig. 7.4, panel I-A, lane 4).

Sequence analysis indicates that the Taq I polymorphism may actually lie in SKI2W, the gene in tail-to-tail configuration with D0M3Z. This was indeed the case as confirmed by

197 the fact that when hybridized with SKI2W probe, the 1.8 kb extra band was also detected in the same patient iFig. 7.4, panel I-B, lane 4).

In Nla IV blots, no RFLP was detected in DOM3Z gene. However, when SKI2W was used as the probe, four IDDM patients (no. 28, 38,41 and 49) exhibited the same abnormal restriction digestion pattern {Fig. 7.4, panel H, lanes 1-4).

198 7.5 DISCUSSION

MHC is the most significant genetic factor that is associated with the pathogenesis of IDDM [reviewed in (Tisch and McDevitt, 1996)]. At the same time, environmental factors, one of which is viral infection, are also important for the disease development

(Conrad etaL, 1994; Yoon, 1995). There have been reports describing the associations of the MHC class HI region with IDDM, including increased frequency of the C4A null allele (Caplen et aL, 1990; Mijovic et aL, 1985). In this chapter, preliminary studies have been presented in searching for any possible effect of the C4 gene size and gene number on IDDM. It appears that in the diseased population, C4 gene deletions occur much more frequently than in the non-IDDM population. Gene deletion is the most direct cause for the low C4 protein levels in the serum, which are found in about 25% of all IDDM patients (Senaldi et aL, 1988). The association between C4 null alleles and IDDM could be due to the linkage disequilibrium with susceptibility loci HLA DR or DQ.

Alternatively, a deficiency of C4 could direcdy predispose to the disease by impairing the efficiency of the complement system. As a major player in the immune response, C4 has a key role in virus neutralization and its deficiency might enable a virus to initiate the process leading to IDDM (Senaldi et aL, 1988).

There has been some controversial demonstration suggesting that endogenous retroviruses can code for superantigens that would lead to the activation of (3 cell

199 destructions in the pancreas (Conrad et a i, 1994; Conrad et al., 1997; Murphy et ai,

1998). However, in this study, the statistics is significant that the frequency of C4 long

genes present is much lower in the IDDM population than in the normal, which indicates

the possible protective role of HERV-K(C4) against IDDM. This endogenous retrovirus

exists in intron 9 of C4 long gene with the opposite transcription direction (Dangel et al.,

1995a). Therefore, antisense viral RNA is synthesized during the C4 transcription. Since

the antisense product is complementary to the viral genomic RNA, it can consequently block the translation of viral proteins and limit the production of superantigens.

Eventually this may reduce the chance for the induction of IDDM by viral superantigens

(Yu, 1999).

DOM3Z and SKI2W, which are located upstream of C4, were also studied for possible mutation(s) in the IDDM population. There is no DOM3Z RFLP observed in the patient population. This result is consistent with the notion that DomSz plays an essential role in a biochemical pathway (Yang et al., 1998). DomSz may have such fundamental function that any mutation could possibly have very severe consequences or even be lethal. There are some 5A72WRFLPs detected in some patients, whether they have any impact on the disease, or are just some examples of the single nucleotide polymorphism

(SNP) with no obvious phenotype by themselves, still need to be determined.

Attentions need to be paid to two important aspects; first, many MHC loci are in linkage disequilibrium with one another (Tomlinson and Bodmer, 1995). It is therefore extremely difficult to distinguish the primary "disease allele" from the associated allele.

From this point of view, HLA typing becomes so necessary that any conclusion drawn

200 about the IDDM "disease gene" before combining the typing data would seem premature.

The HLA typing of the 50 IDDM patients are being conducted as this chapter is written.

Secondly, for any polygenic disease like IDDM, the sample sizes of the diseased population as well as the well-matched control studied need to be large enough for any valid conclusion. Too often, false positive can be observed in a small subject group, and it needs to be aware that this project also faces the same potential hmitation.

Recently, there is another interesting development on the IDDM and human MHC genetics: the mapping of allograft inflammatory factor 1 (AIFI) in the class HI region.

AJFl appears to inhibit insulin secretion at low glucose concentrations and stimulate insulin secretion at high glucose concentration (Chen et a i, 1997). It would be informative to investigate whether there is any mutation in AIFl gene or any variation in the gene expression in the IDDM population. Meanwhile, it has been shown extensively that MHC molecules are related to each other functionally and evolutionarily (Salter-Cid and Flajnik, 1995). Thus, it is possible that other genetic factors or elements located in the MHC class IH region may also be involved in the secretion of insulin or destruction of the p cells in the pancreas. Take in consideration of the high RD-SKI2W-DOM3Z-RP1 transcription levels in the pancreas, these four genes remain attractive of our attention for further exploration.

201 RCCX modular stmcatres

80 60 □ IDDM ■ normals IDDM normals 40 bi- 50 71.6 20 A mono- 42 16.2 I bi- mono-

n. Bimodnlar RCCX

60 1 50 - □ IDDM IDDM normals 40 - 30 - ■ normals L-L 33 47.7 2 0 -

L-S 17 23.8 10 - 0 jL :JLL-L L-S

Fig. 7.1 Comparisons ofC4 genes and RCCX modular structures between chromosomes from the IDDM (n=100) and non-IDDM (n=300) populations. n is the number of chromosomes studied.

202 I. Immunofixation

1 2 3 4. 56789 C 4 B 1 - . ,

C 4 A 3 — #4» # ^ ^ ^ \ C 4 B 6

Patient# 46 20 24 11 13 14 4 15 28

n. Immunoblot (anti-Chl)

C 4 B 1 - # # ^ ^ ^C 4B2 —\C4B3 (Chl+) C4B6

m . Immunoblot (anti-Rgl)

_ ^ — C4B3 (Rgl+) C4A3— # # # ..

Fig. 7.2 C4 allotypes of selected EDDM patients. /. Human C4A and C4B allotypes detected by immunofixation; Immunoblot analysis of the C4 allotypes using: II. anti-Chl monoclonal antibody; and ///. anti-Rgl monoclonal antibody.

203 Fig. 7.3 Sequence comparison of the C4d region from Rgl-associated C4B3 and

C4A. The C4A sequence was taken from Yu, 1991 and used as the standard. The coding regions were shown in uppercase and the introns in lowercase. The isotypic region and the antigenic determinant region were underlined, with the amino acid sequences bolded or bolded and italicized, respectively. The nucleotide changes were labeled above the sequence and the amino acid changes below. Insertions and deletions of nucleotide were marked.

204 e x o n 22 1 TCCATAGAGAGGAGCTGGTCTATGAACTCAACCCCTTGGgtgagtgaccctctacctccagccattggtttcctaagtgggtacaggtggtgggggatgCggacagcaggacaggctgcc HREELVYELNPL (9 3 2 )

e x o n 23 121 aacttcccccatttccccagACCACCGAGGCCGGACCTTGGAAATACCTGGCAACTCTGATCCCAATATGATCCCTGATGGGGACTTTAACAGCTACGTCAGGGTTACAGgtgggagtgc DHRGRTLEIPGNSDPNMIPDGDFNSYVRVT (9 6 2 )

241 cctttagtcccttcccagtggccaccttcggattcatgtgggacctgtggatccctgcttggtcccactccccgtgagcctctgacacagagtcctcagacctccaccctctcccU ccca

e x o n 24 361 tg tagCCTCAGATCCATTGGACACTTTAGGCTCTGAGGGGGCCTTGTCACCAGGAGGCGTGGCCTCCCTCTTGAGGCTTCCTCGAGGCTGTGGGGAGCAAACCATGATCTACTTGGCTCC ASDPLDTLGSEGALSPGGVASLLRLPRGCGEQTMIYLAP

481 GACACTGGCTGCTTCCCGCTACCTGGACAAGACAGAGCAGTGGAGCACACTGCCTCCCGAGACCAAGGACCACGCCGTGGATCTGATCCAGAAAGgttCtgggtgcaagggcaagcagga TLAASRYLDKTEQWSTLPPETKDHAVDLIQK (1 0 3 2 )

601 ggggggccaggaaaggacagttctggaagatggacagcccaggaggcttacagagggaaagaaagggggcccctgatgaggatggggagcatggccttgggctcaaacagcagaagggtg " + 3 " n o t e x o n 2 5 T G T 721 agtgtcacctgagcggccacccctcctctccaagGCTACATGCGGATCCAGCAGTTTCGGAAGGCGGATGGTTCCTATGCGGCTTGGTTGTCACGGGACAGCAGCACCTGGTGAGCTTGG GYMRIQQFRKADGSYAAWLSRDSSTW (1 0 5 8 ) V G O La 841 gagagtggttccagggttctgagggggtcagggctggggcaggggtgggacagagctggtatgatgggagggtggaCaaccaggcacctgggggcgtgggcataatgagaagcaagtcct

e x o n 26 961 tatccccaaccctcctttcctgccctccagGCTCACAGCCTTTGTGTTGAAGGTCCTGAGTTTGGCCCAGGAGCAGGTAGGAGGCTCGCCTGAGAAACTGCAGGAGACATCTAACTGGCT L.TAFVLKVLSLAQEQVGGSPEKLQETSNWL

T C ACT 1081 TCTGTCCCAGCAGCAGGCTGACGGCTCGTTCCAGGACCCCTQTCCAGTGTTAGACAGGAGCATGCAGgtgcgggcatgctggggctggcccgagaagcgcctgtcggaggactctctttg LSQQQADGSFQD P C P V L D R S M Q (1110) L S I H

e x o n 27 1201 CCCCttCCCCCtCCtgtttgacatcttttCtCCCCttactagGGGGGTTTGGTGGGCAATGATGAGACTGTGGCACTCACAGCCTTTGTGACCATCGCCCTTCATCATGGGCTGGCCGTC GGLVGNDETVALTAFVTIALHHGLAV

1321 TTCCAGGATGAGGGTGCAGAGCCATTGAAGCAGAGAGTGgtaagttcagCggcgtttctgccctctgctggcccccagctctctccctttttcctcaggaacccaggggtccaggcccaa FQDEGAEPLKQRV ( 1 1 4 9 )

(to be continued)

Fig. 7.3 Sequence comparison of theC4d region from Rgl-associated C4B3 and C4A. [Fig, 7.3 continued)

e x o n 28 1441 gaccctcctcccgC.tCt:cttccagGAAGCCTCCATCTCAAAGGCAAACTCATTTTTGGGGGAGAAAGCAAG'rGCTGGGCTCCïGGGTGCCCACGCAGCTGCCATCACGGCCTATGCCCTG EASISKANSFLGEKASAGLLGAHAAAITAYAL

1561 ACACTGACCAAGGCOGCTGTCGACCTGCTCGGTGTTGCCCACAACAACCTCATGGCAATGGCCCAGGAGACTGGAGgtgaggggtigaggcgctcCggcagtgagcctgaggcccagggga T L T K A P y D L L GVAHNNLMAMAQETG (1206) "+c

1681 ct:taggatcct:gagt:gtgcccagagggagaggctggatgaagactcagaggaggaatgaagt:t:al:aagcaggggtgggttgggggagactcaggagagcccagcagggggt:ggcCaaggg '■ tc "'+C

c exon 29 A 1801 ccaggggaccaggctcttctccctgccttcctgtttacttgtggtctcccttcacttt:cagATAACCTGTACTGGGGCTCAGTCACTGGTTCTCAGAGCAATGCCGTGTCGCCCACCCCG DNLYWGSVTGSQSNAVSPTP

1921 GCTCCTCGCAACCCATCCGACCCCATGCCCCAGGCCCCAGCCCTGTGGATTGAAACCACAGCCTACGCCCTGCTGCACCTCCTGCTTCACGAGGGCAAAGCAGAGATGGCAGACCAGGCT APRNPSDPMPQAPALWIETTAYALLHLLLHEGKAEMADQA

2041 GCGGCCTGGCTCACCCGTCAGGGCAGCTTCCAAGGGGGATTCCGCAGTACCCAAgtaggggccgtccccgggctctggggggggCgggtagtcctcagaccaagggcttigcttgagtcct AAWLTRQGSFQGGFRSTQ (1 2 8 4 ) o e x o n 30 2161 ggctcaacctccctagOACACGGTGATTGCCCTGGATGCCCTGTCTGCCTACTGGATTGCCTCCCACACCACTGAGGAGAGGGGTCTCAATGTGACTCTCAGCTCCACAGGCCGGAATGG DTVIALDALSAYWIASHTTEERGLNVTLSSTGRNG

2281 GTTCAAGTCCCACGCGCTGCAGCTGAACAACCGCCAGATTCGCGGCCTGGAGGAGGAGCTGCAGgtgaaccacticcctiggtgaaccactcccCcgcctgggtagccaggacacctgggcc FKSHALQLNNRQIRGLEEELQ (1 3 4 0 )

a 2401 tcgtggccaggccagaagccgCccccaccctcccacccgr.ggaatccccgcagcacctctCccCggggtctt;cgggggaagactgact;tcct:ggct:gcgCgacctggagctccgagct;tc

2521 agtt:t:ccCcacttgtagagtaacatacacagagttcaccctacagggtcgttagaaggctgaagtgagataattcatgtgccggtataaactttgtggaaatgtgaggtggggagaggag

e x o n 31 2641 gtggggcctgttttigaggaaggagataagttattggagccgcaaaaacaggctitgcttgtgcccttctaacatLcgccttcccttCtictcgttgcCgaagTTTTCCTTGGGCAGCAAGATC ''noc ''nocFSLGSKI

2 7 6 1 AATGTGAAGGTGGGAGGAAACAGCAAAGGAACCCTGAAGG NVKVGGNSKGTLK (1 3 6 0 ) I. A. B. 1 2 3 4 1 2 3 4

*-

n . 1 2 3 4 5 6 7 8

m m

7.4 RFLPs of SKI2W in IDDM patients. I. Taq I RFLP detected in one IDDM patient {lane 4) using A. DOM3Z or B. SKI2W prohe. II. Nla IV RFLP detected in four EDDM patients {lanes 1-4) using a SKI2W probe.

207 Patient # C4A C4B allotypes Rgl Chi RP-C4 21A/21B XA/XB

1 1 1 A 3B1 2 1 RP1-C4L-RP2-C4L 1 :2 1 :2 A3 BQO RP1-C4L 2 1 1 A3 BQO 1 1 RP1-C4L 21B XB A Q O Bl RP1-C4S 3 1 1 A3 BQO 1 1 RP1-C4L 21B XB A Q O Bl RP1-C4S 4 2 1 A3 B3 2 1 RP1-C4L-RP2-C4L 1 :2 1 :2 A3 BQO RP1-C4L 5 2 1 A 3B1 2 1 RP1-C4L-RP2-C4L 1 :2 1 :2 A3 BQO RP1-C4L 6 1 2 A3 B3 1 2 RP1-C4L-RP2-C4L 1 :2 1 :2 A Q O Bl RP1-C4S 7 2 1 A 3B1 2 1 RP1-C4L-RP2-C4L 2 : 1* 1 : 1 A3 BQO RP1-C4L CAH-C 8 2 1 A 3B1 2 1 RP1-C4L-RP2-C4S 1 :2 1 :2 A3 BQO RP1-C4L I 9 3 2 3A3 2B1 3 2 RP1-C4L-RP2-C4L-RP2-C4L 3 : 2 3 : 2 RP1-C4L-RP2-C4S 10 1 2 A 3B1 1 2 RP1-C4L-RP2-C4L 1 :2 1 :2 A Q O Bl RP1-C4S 11 2 1 A3 B2 2 1 RP1-C4L-RP2-C4S 1 :2 1 :2 A3 BQO RP1-C4L 12 2 3 A 3B1 2 3 RP1-C4L-RP2-C4L 1 : 1 1 : 1 A 3B1B1 RP 1-C4L-RP2-C4S-RP2-C4S 13 0 2 A Q 0B 2 0 2 RP1-C4L 21B XB A Q O Bl RP1-C4S

(to be continued)

Table 1.1 RCCX modular structures and C4 allotypes for 50 IDDM patients {Table 7.1 continued)

Patient # C4A C4B allotypes R g l C h i RP-C4 21A/21BXA/XB

14 0 2 AQO B3 0 2 RP1-C4L 21B XB A Q O Bl RP1-C4S 15! 2 1 A3 B3 (Rgl+) 3 0 RP1-C4L-RP2-C4L 1 :2 1 :2 A3 BQO RP1-C4L 16 2 3 2A3 2B1 1B3 2 3 RP1-C4L-RP2-C4L 3 : 2 2 : 3 RP1-C4L-RP2-C4S-RP2-C4S 17 2 1 A3 A2 2 1 RP1-C4L-RP2-C4S 1 :2 1 :2 AQO Bl RP1-C4S 18 1 2 A 3B1 1 2 RP1-C4L-RP2-C4S 1 :2 1 :2 A Q O Bl RP1-C4S 19 3 1 A3 A3 3 1 RP1-C4L-RP2-C4L 1 ; 1 1 : 1 A3 B2 RP1-C4L-RP2-C4S 20 5 0 A3 A3 5 0 RP1-C4L-RP2-C4L 3 : 2 3 : 2 I A3 A3 A3 RP 1-C4L-RP2-C4S-RP2-C4S 21 1 1 A3 BQO 1 1 RP1-C4L 21B XB A Q O Bl RP1-C4S 22 2 1 A 3B1 2 1 RP1-C4L-RP2-C4L 1 :2 1 :2 A3 BQO RP1-C4L 23 1 2 A 3B1 1 2 RP1-C4L-RP2-C4L 1 :2 1 :2 A Q O Bl RP1-C4S 24 0 2 A Q O Bl 0 2 RP1-C4S 21B XB A Q O Bl RP1-C4S 25 1 1 A 3B1 1 1 RP1-C4L-RP2-C4L 1 : 1 1 :2 A 3B1 RP1-C4L-RP2-C4S 26 3 2 A 3B1 3 2 RP1-C4L-RP2-C4L 3 : 2 3 : 2 A3 A 3B1 RP 1-C4L-RP2-C4L-RP2-C4L

(to be continued) {Table 7.1 continued)

Patient # C4A C4B allotypes Rgl Chi RP-C4 21A/21B XA/XB

27 1 2 A 3B1 1 2 RP1-C4L-RP2-C4L 1 :2 1 :2 A QO Bl RP1-C4S 28 1 2 A3 B6 1 2 RP1-C4L-RP2-C4S 1 :2 1 :2 A QO Bl RP1-C4S 29 0 2 A Q O B l 0 2 RP1-C4S 21B XB A Q O Bl RP1-C4S 30 I 2 A3 B3 1 2 RP1-C4L-RP2-C4L 1 :2 1 :2 AQO Bl RP1-C4S 31 2 1 A3 A2 2 1 RP1-C4L-RP2-C4S 1 :2 1 : 2 AQO Bl RP1-C4S 32 1 2 A3 B3 1 2 RP1-C4L-RP2-C4L 1 ;2 1 :2 A Q O Bl RP1-C4S 33 2 1 A3 A3 2 1 RP1-C4L-RP2-C4L 1 :2 1 :2 AQO B3 RP1-C4S 34 4 i A3 A3 4 1 RP1-C4L-RP2-C4L 3 : 2 3 : 2 A3 A 3B1 RP 1-C4L-RP2-C4S-RP2-C4S 35 (?) 1 i A 3B1 1 1 RP1-C4L-RP2-C4L 2 : 1 1 : 2 A3 B3 RP1-C4L(?) CAH-C 36 3 0 A3 A2 3 0 RP1-C4L-RP2-C4L 1 :2 1 :2 A3 BQO RP1-C4L 37 2 I A 3B1 2 1 RP1-C4L-RP2-C4L 1 :2 1 :2 A3 BQO RP1-C4L 38 2 3 2A3 3B1 2 3 RP 1-C4L-RP2-C4S-RP2-C4S 3 : 2 3 : 2 RP1-C4L-RP2-C4S 39 2 1 A3 B3 2 1 RP1-C4L-RP2-C4L 1 :2 1 :2 A3 BQO RP1-C4S

(to be continued) {Table 1.1 continued)

Patient # C4A C4B allotypes R g l C h i R P - C 4 21A/21B XA/XB

40 2 1 A2 A3 2 1 RP1-C4L-RP2-C4L 1 :2 1 :2 A QO Bl RP1-C4S 41 0 2 A Q O Bl 0 2 RP1-C4S 21B XB A Q O Bl RP1-C4S 42 1 1 A3 B3 1 1 RP1-C4L-RP2-C4L 1 : 1 1 : 1 A 3B1 RP1-C4L-RP2-C4S 43 2 1 A 3B1 2 1 RP1-C4L-RP2-C4L 1 :2 1 :2 A2 BQO RP1-C4L 44 3 1 A3 A3 3 1 RP1-C4L-RP2-C4L 1 : 1 1 : 1 A 3B1 RP1-C4L-RP2-C4S 45 1 1 A 3B1 1 1 RP1-C4L-RP2-C4L 1 : 1 1 : 1 A 2B1 RP1-C4L-RP2-C4S

I s ) 46 1 1 A 3B1 1 I RPI-C4L-RP2-C4L 1 : 1 1 : 1 A 3B1 RP1-C4L-RP2-C4S 47 3 2 A 3B1 3 2 RP1-C4L-RP2-C4S 3 : 2 3 : 2 A3 A 3B1 RP 1-C4S-RP2-C4L-RP2-C4L 48 1 1 A3 AQOBl B2 1 2 RP1-C4L-RP2-C4S 1 : 1 1 : 1 A- mutation? RP1-C4L-RP2-C4S 49 i 1 A 3B1 1 1 RP1-C4L-RP2-C4L 1 : 1 1 : 1 A 3B1 RP1-C4L-RP2-C4S 50 1 2 A3 B3 1 2 RP1-C4L-RP2-C4L 1 :2 1 : 2 A QO Bl RP1-C4S I. A comparison of the RCCX structures and C4 genes between chromsomes of IDDM and non-EDDM population.

EDDM non-EDDM p value (n*=100) (n=300) Bimodular RCCX 50 71.6 <0.001 C4L-L 33 47.7 0.012 C4L-S 17 23.8 0.163 Monomodular RCCX 42 16.2 <0.001 Trimodular RCCX 8 12.2 0.305 C4B3 11 1 <0.001 *n: number of chromosomes studied.

n. A comparison of individuals with bi-/bi- (tri-) modular RCCX and homozygous C4 L- IÆ-L between IDDM and non-IDDM population.

IDDM non-IDDM p value (n*=50) (n=150) bi-/bi- and bi-/tri- 16 103 <0.001 C4 L-L/L-L 0 58 < 0.001

*n: number of individuals studied.

Table 7.2 RCCX modular variations in IDDM patients

212 CHAPTERS

DISCUSSION

The MHC, as one of the most important genomic regions associated with many human diseases, has received a lot of attention. Over the last two decades, enormous amounts of research conducted has indeed made MHC one of the most extensively studied regions of the human genome. Individuals at high risk to develop autoimmune and other diseases would be most informative for investigation of the basis of MHC disease associations.

In particular, class HI region has become a rising star in this area, partly due to many new genes identified, which have turned it to be one of the most gene-dense regions. Of the human genome, whose size is estimated at 3x10^, only 2-3% encodes proteins. In contrast, in the 1,100 kb class III region, there is a gene every 10-15 kb

(Aguado et a l, 1996; Yang et al.., 1998). One of the mechanisms to achieve such gene density is by gene duplication.

Duplications of large regions of DNA, including duplications of whole genes, provide substrates for genetic evolution. As analyzed by Ohno more than 30 years ago

213 (Ohno, 1970), these duplication events liberate copies of the gene to diverge and take up new functional roles in the organism, while the master gene is constrained to preserve its original role. On the other hand, duplications of smaller regions involving parts of genes are now recognized as contributors to the mutation spectrum that result in genetic diseases (Hu and Worton, 1992). Complement component gene C4 has been duplicated in the human genome. While there are a small proportion of the population (16.2%) who have single C4 gene in monomodular RCCX, most of the population (83.8%) have duplicated C4 with two, three or more copies in bi~, tri- or more modular RCCX. The

RCCX modular variation is one of the major mechanisms leading to genetic instability of the MHC class m region and disease associations.

A variety of specialized selective pressures, such as the need to increase the expression level, could promote the development of gene duplications. In MHC, gene duplication is one of the mechanisms to generate and maintain polymorphism to confer resistance to a variety of infectious diseases. The C4 molecule, C4A and C4B, are specified by duplicated C4 loci. C4A is primarily involved in promoting the physiological disposal of immune complexes. C4B participates preferentially in the clearance of microorganisms (Law et a i, 1984; Yu, 1999). In order to function properly,

C4 interacts with at least eight other molecules: the antigen, the antibody, complement factors C2, C3, C5, regulatory proteins factor I, C4b binding protein, and complement receptor type I (CRl) (Porter, 1985a). C4A and C4B differ in two of the eight interactions: with the antigen and with the antibody. The reason for C4 gene duplication could thus be the need to maintain identity of six interaction sites, while allowing

2 1 4 functional differentiation at two other sites. In other words, C4 is under pressure to retain the C4A-C4B difference in the C4d region and identity in the rest of the molecule

(Horiuchi er a/., 1993).

The other side of the coin is that the same MHC polymorphism created by gene duplications is also the price that some people have to pay through increased susceptibility to genetic and autoimmune diseases (De Vries, 1994). For example, between the duplicated copies of CYP21, CYP21B is functional, while CYP21A is a pseudogene. Individuals with only CYP21A in the genome would suffer CAH, which may have fatal consequences. Therefore, the concerted duplication of C4-CYP21 in

RCCX must be occurring because of a strong selection advantage for C4 polymorphisms, with which CYP21 genes accidentally became locked into a permanent partnership by the primigenial duplication (Horiuchi et al., 1993). The parallel C4 gene duplications in mice, and the different inactivation of the CYP21 genes are consistent with this notion.

Another potential problem resulted from the concerted duplication / deletion of the RCCX module is the unequal crossover. Normally, there is a strong selection pressure in natural populations for polymorphism at certain MHC loci. Recombination

(i.e., homologous equal crossover) within the MHC shuffles alleles among haplotypes and thus could help to maintain useful polymorphic alleles. However, as RP2 and TNXA in the RCCX module are both partially duplicated gene segments, unequal crossover could occur between RPl and RP2, or TNXA and TNXB, from homolgous chromosomes or sister chromatids. In chapter 2, it has been demonstrated that in a CAH patient, the improper pairing of TNXB from a monomodular RCCX and TNXA from a bimodular

215 RCCX has led to the unequal crossover, and consequently the deletion of RP2-C4B-

CYP21B genes (Yang et a i, 1999).

The Williams Syndrome is one of some other human diseases caused by the unequal crossover. It is an autosomal dominant syndrome based on 7ql 1.23 (OMIM no.

194050). Clinical features of WS include characteristic "elfin" faces, mental retardation, growth deficiency, and hypertension (Morris et al. 1988). About 90% of WS patients are hemizygous for the elastin gene, having deleted the copy from one chromsome

(Nickerson et at., 1995). The mechanism of the deletion is suggested to involve the improper pairing of homolgous sequences and an unequal crossover that loses a 2 Mb intervening DNA (Mazzarella and Schlessinger, 1998). Recent evidence indicates that the homologous sequences may involve the GTF2I gene and its pseudogene. The GTF2I gene encodes the transcription initiator binding protein TFII-I, and maps to the telomeric breakpoint of the 2 Mb deletion; its pseudogene GTF2I maps to the centromeric breakpoint (Perez et al., 1998).

Homologous recombination between repeated sequences has also been implicated as the mechanism by which a common deletion is produced in Prader-Willi syndrome

(PWS; OMIM no. 176270) and Angelman syndrome (AS; OMIM no. 105830) patients in chromosome 15ql l-ql3 (Christian et al., 1995; Huang et al., 1997). Seventy percent of the individuals affected with PWS and AS acquire a 4 Mb deletion in the paternal and maternal genomes, respectively. In addition, this region is also subjected to duplications.

The recent construction of a detailed Y AC map of the region should help to clarify which

216 element(s) may be responsible for this chromosomal rearrangement (Christian et ai,

1998).

As discussed in Chapter 1, homologous recombination is the major mechanism from which genomic disorders are resulted. This is in sharp contrast to the classical mechanisms of diseases, in which point mutations in a gene specifically affect the structure and expression of the proteins encoded. The introduction of genomic diseases should be credited to the recent advances of the Human Genome Project (HOP) and the completion of total genome sequences for C. elegans, yeast and many bacterial species.

It is these progresses that have enabled scientists to view genetic information in the context of the entire genome and recognize that the mechanisms for some diseases are best understood at a genomic level (Lupski, 1998).

With HOP'S promise to complete an accurate, high-quality sequence of the human genome by the end of 2003, the door has opened wide to the era of whole genome science

(Collins et al., 1998). With the free access to various databases, the process of gene discovery has been revolutionized. The birth of the novel gene D0M 3Z is one of the typical examples. After a new gene is identified, it is possible to determine the sequence of the full length transcript and alternative splicing patterns, from which primary sequence and structure of the protein produced can be deduced. It is also possible to detect sequence variations in normal individuals as well as those associated with disease.

However, there will be a compelling need to better understand the function of human genes and their roles in health and disease. Therefore, the development of new research strategies for the post-genome era, even before the entire sequence of human

217 genome is finished, has become increasingly urgent. At such a turning point of biological sciences, functional genomics, the study of gene function using the tools of the genome program, has emerged. Functional genomics requires the development of analytical strategies that are comprehensive and hierarchical. Comprehensive, because we aim to uncover the action and interaction of all of the genes in a given species. Hierarchical, because this daunting task is only possible if we find ways of grouping genes of related function in order to limit the total number of experiments to be performed (Oliver et al,

1998).

Functional genomics can be best defined as the continuum from a gene's physical stmcture to the regulation and to its role in the whole organism (Woychik et al., 1998).

Understanding of gene regulation include the study of the expression patterns in development and in all adult tissues as well as in tissue undergoing pathogenic changes.

The insights into these aspects would likely derive from the development of innovative new technologies for examining gene expression. Before the genome program, it was possible only to study the temporal and spatial expression of individual genes or small groups of cloned genes under a given experimental setting. However, this has changed with the progress of the HGP and the development of new technologies: microarray and

DNA chips. For the first time in history, it becomes possible to evaluate the temporal / spatial patterns of expression of all genes of an mammalian organism, both under normal conditions or in response to environmental or genetic changes that ultimately lead to a disease phenotype. Consequently, it comes the ability to decipher the genetic pathways

218 for normal development and to understand how these pathways are altered in specific disease states.

For many diseases, especially polygenic diseases such as IDDM, it is important to determine all the contributing genetic factors. Therefore, a genome wide search becomes obvious. However, it is extremely difficult to carry out these projects using the traditional analysis as it requires the family studies and together with genetic mapping, both are dreadfully time consuming. The advancement of the microarray technology, in which the expression patterns of all genes fi’om the whole genome can be compared between normal and diseased tissues, can address this problem in a much more accurate and efficient way.

Understanding of the role of genes in the whole organism can be clarified by studying the model systems. The completion of the yeast and C. elegans genomes, together with the ongoing effort to elucidate the Drosophila and mouse genomes will shed lights on the eventual identification of all human genes. Generating mutations in the animal models and investigation on the resulting phenotypes can provide invaluable data for our understanding of the gene function in the whole organism. Coupled with the transgenic and knock-out techniques, animal models can also be used to regenerate phenotypes and confirm discoveries made form genetic studies. Therefore, model systems have tremendous impact on our understanding of disease etiology, and will also help in designing new treatments in the future.

The Human Genome Project, the ultimate foundation on which new technologies are building, new concepts are formulating, new discoveries are making, will have

219 unprecedented impact and long-lasting value for basic biology, biomedical research, biotechnolgy, and health care. They will improve our understanding in gene- environment interactions and facilitate the development of highly accurate DNA-based medical diagnostics and therapeutics.

220 REFERENCES

Abe, J., Kohsaka, T., Tanaka, M. and Kobayashi, N. (1993). Genetic study on HLA class n and class IH region in the disease associated with IgA nephropathy. Nephron 65: 17-22.

Adams, M.D., Kelley, J.M., Gocayne, J.D., Dubnick, M., Polymeropoulos, M.H., Xiao, H., Merril, C.R., Wu, A., Olde, B. and Moreno, R.F. (1991). Complementary DNA sequencing: expressed sequence tags and human genome project. Science 252: 1651- 1656.

Adams, M.D., Kerlavage, A.R., Fleischmann, R.D., Fuldner, R.A., Bult, C.J., Lee, N.H., Kirkness, E.F., Weinstock, K.G., Gocayne, J.D. and White, O. (1995). Initial assessment of human gene diversity and expression patterns based upon 83 million nucleotides of cDNA sequence. Nature 377: 3-174.

Adler, A.J., Danielsen, M. and Robins, D M. (1992). Androgen-specific gene activation via a consensus glucocorticoid response element is determined by interaction with nonreceptor factors. Proc.War/.Acad.5c/.£/5!A 89: 11660-11663.

Aguado, B., Milner, C M., and and Campbell, R.D. (1996). Genes of the MHC class EU region and the functions of the proteins they encode. In "HLA and MHC: genes, molecules and function" (Browning, M. and and McMichael, A., Eds.), pp 39-75.

Ahmed, A.R., Wagner, R., Khatri, K., Notani, G., Awdeh, Z., Alper, C.A. and Yunis, E.J. (1991). Major histocompatibility complex haplotypes and class II genes in non- Jewish patients with pemphigus vulgaris. Proc.Natl.Acad.Sci. USA 88: 5056-5060.

Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J. (1990). Basic local alignment search tool. J.Mol.Biol. 215: 403-410.

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic AcidsRes. 25: 3389-3402.

221 NOTE TO USERS

Page(s) missing in number only; text follows. Microfilmed as received.

222-229

This reproduction is the best copy available.

UMI Amiel, J. C. (1967). Study of the leucocyte phenotypes in Hodgkin’s disease. In "Histocompatibility Testing 1967" (Curtoni, E. S., Mattiuz, P. L., and Tosi, R. M., Eds.), pp 79.

Apple, R. J. and Erlich, H. A. (1996). HLA class II genes: structure and diversity. In "HLA and MHC: genes, molecules and function" (Browning, M. and McMichael, A., Eds.), pp 97-112.

Atkinson, J.P., Karp, D R., Seeskin, E.P., Killion, C.C., Rosa, P.A., Newell, S.L. and Shreffler, D C. (1982). H-2 S region determined polymorphic variants of the C4, Sip, C2, and B complement proteins: a compilation. Immunogenetics 16: 617-623.

Ausubel, F. M., Brent, R., Kingston, R.E., Moore, D.D., Seidman, J.G., Smith, J.A., and Struhl, K. (1996). Introduction of DNA into mammalian cells. In "Current Protocols in Molecular Biology". JohnWiley & Sons, Inc.

Awdeh, Z.L. and Alper, C.A. (1980). Inherited structural polymorphism of the fourth component of human complement. Proc.Natl.Acad.Sci.USA 77: 3576-3580.

Badenhoop, K., Donner, H., Neumann, J., Herwig, J., Kurth, R., Usadel, K.H. and Tonjes, R.R. (1999). IDDM patients neither show humoral reactivities against endogenous retroviral envelope protein nor do they differ in retroviral mRNA expression from healthy relatives or normal individuals. Diabetes 48: 215-218.

Barba, G., Rittner, C. and Schneider, P.M. (1993). Genetic basis of human complement C4A deficiency. Detection of a point mutation leading to nonexpression. J.Clin.Invest. 91: 1681-1686.

Barba, G.M., Braun-Heimer, L., Rittner, C. and Schneider, P.M. (1994). A new PCR- based typing of the Rodgers and Chido antigenic determinants of the fourth component of human complement. Eur.J.Immunogenet. 21: 325-339.

Barnett, A.H., Eff, C., Leslie, R.D. and Pyke, D.A. (1981). Diabetes in identical twins. A study of 200 pairs. Diabetologia 20: 87-93.

Belt, K.T., Carroll, M.C. and Porter, R.R. (1984). The structural basis of the multiple forms of human complement component C4. Cell 36: 907-914.

Belt, K.T., Yu, C.Y., Carroll, M.C. and Porter, R.R. (1985). Polymorphism of human complement component C4. Immunogenetics 21: 173-180.

230 Bertrams, J., Hintzen, U., Schlicht, V., Schoeps, S., Gries, F.A., Louton, T.K. and Baur, M.P. (1984). Gene and haplotype frequencies of the fourth component of complement (C4) in type 1 diabetics and normal controls. Immunobiology 166: 335-344.

Bird, A.P. (1986). CpG-rich islands and the function of DNA méthylation. Nature 321: 209-213.

Bird, A. P. (1987). CpG islands as gene markers in the vertebrate nucleus. Trends Genet. 3: 342-347.

Boguski, M.S. (1995). The turning point in genome research. Trends Biochem.Sci. 20: 295-296.

Braun, L., Schneider, P.M., Giles, C M., Bertrams, J. and Rittner, C. (1990). Null alleles of human complement C4. Evidence for pseudogenes at the C4A locus and for gene conversion at the C4B locus. J.Exp.Med. 171: 129-140.

Bristow, J., Tee, M.K., Gitelman, S.E., Mellon, S.H. and Miller, W.L. (1993). Tenascin- X; a novel extracellular matrix protein encoded by the human XB gene overlapping P450c21B. J. Cell Biol. 122: 265-278.

Burch, G.H., Bedolli, M.A., McDonough, S., Rosenthal, S.M. and Bristow, J. (1995). Embryonic expression of tenascin-X suggests a role in limb, muscle, and heart development. Dev.Dyn. 203: 491-504.

Burch, G.H., Gong, Y., Liu, W., Dettman, R.W., Curry, C.J., Smith, L., Miller, W.L. and Bristow, J. (1997). Tenascin-X deficiency is associated with Ehlers-Danlos syndrome. Nat.Genet. 17: 104-108.

Campbell, R.D., Bentley, D R. and Morley, B.J. (1984). The factor B and C2 genes. Philos.Trans.R.Soc.Lond.B.Biol.Sci 306: 367-378.

Campbell, R.D. and Trowsdale, J. (1993). Map of the human MHC. Immiinol.Today 14: 349-352.

Caplen, N.J., Patel, A., Millward, A., Campbell, R.D., Ratanachaiyavong, S., Wong, F.S. and Demaine, A.G. (1990). Complement C4 and heat shock protein 70 (HSP70) genotypes and type I diabetes mellitus. Immunogenetics 32: 427-430.

231 Carroll, M.C., Campbell, R.D., Bentley, D.R. and Porter, R.R. (1984). A molecular map of the human major histocompatibility complex class HI region linking complement genes C4, C2 and factor B. Nature 307: 237-241.

Carroll, M.C., Palsdottir, A., Belt, K.T. and Porter, R.R. (1985a). Deletion of complement C4 and steroid 21-hydroxylase genes in the HLA class HI region. EMBO 7.4: 2547-2552.

Carroll, M.C., Campbell, R.D. and Porter, R.R. (1985b). Mapping of steroid 21- hydroxylase genes adjacent to complement component C4 genes in HLA, the major histocompatibility complex in man. Proc.Natl.Acad.Sci USA 82: 521-525.

Carroll, M.C., Palsdottir, A., Belt, K.T. and Yu, C.Y. (1986). Molecular genetics of the fourth component of human complement. Biochem.Soc.Symp. 51: 29-36.

Carroll, M.C., Katzman, P., Alicot, E.M., Koller, B.H., Geraghty, D.E., Orr, H.T., Strominger, J.L. and Spies, T. (1987). Linkage map of the human major histocompatibility complex including the tumor necrosis factor genes. Proc.Natl.Acad.Sci. USA 84: 8535-8539.

Carroll, M.C., Fathallah, D M., Bergamaschini, L., Alicot, E.M. and Isenman, D.E. (1990). Substitution of a single amino acid (aspartic acid for histidine) converts the functional activity of human complement C4B to C4A. Proc.Natl.Acad.Sci.USA 87: 6868-6872.

Castano, L. and Eisenbarth, O.S. (1990). Type-I diabetes: a chronic autoimmune disease of human, mouse, and rat. Annu.Rev.Immunol. 8: 647-679.

Chakravarti, A., Elbein, S.C. and Permutt, M.A. (1986). Evidence for increased recombination near the human insulin gene: implication for disease association studies. Proc.Natl.Acad.Sci.USA 83: 1045-1049.

Chaplin, D.D., Woods, D.E., Whitehead, A.S., Goldberger, G., Colten, H R. and Seidman, J.G. (1983). Molecular map of the murine S region. Proc.Natl.Acad.Sci.USA 80: 6947-6951.

Chaplin, D.D., Galbraith, L.J., Seidman, J.G., White, P C. and Parker, K.L. (1986). Nucleotide sequence analysis of murine 21-hydroxylase genes: mutations affecting gene expression. Proc.Natl.Acad.Sci.USA 83: 9601-9605.

232 Chen, Z.W., Ahren, B., Ostenson, C.G., Cintra, A., Bergman, T., Moller, C., Fuxe, K., Mutt, V., Jomvall, H. and Efendic, S. (1997). Identification, isolation, and characterization of daintain (allograft inflammatory factor 1), a macrophage polypeptide with effects on insulin secretion and abundantly present in the pancreas of prediabetic BB rats. Proc.Natl.Acad.Sci. USA 94: 13879-13884.

Cheng, J., Macon, K.J. and Volanakis, I.E. (1993). cDNA cloning and characterization of the protein encoded by RD, a gene located in the class DI region of the human major histocompatibility complex. Biochem.J. 294: 589-593.

Chiquet-Ehrismann, R., Hagios, C. and Matsumoto, K. (1994). The tenascin gene family. Perspect.Dev.Neiirobiol. 2: 3-7.

Chiquet-Ehrismann, R. (1995). Tenascins, a growing family of extracellular matrix proteins. Experientia 51: 853-862.

Chiquet-Ehrismann, R. (1993). Tenascin. In "Guidebook to the extracellular matrix and dhesion proteins" (Kries, T. and Vale, R., Eds.), Oxford University Press, Oxford, pp 93-95.

Cho, S.G., Attaya, M. and Monaco, J.J. (1991). New class E-like genes in the murine MHC. Nature 353: 573-576.

Christian, S.L., Robinson, W.P., Huang, B., Mutirangura, A., Line, M.R., Nakao, M., Surti, U., Chakravarti, A. and Ledbetter, D.H. (1995). Molecular characterization of two proximal deletion breakpoint regions in both Prader-Willi and Angelman syndrome patients. Am.J.Hum.Genet. 57: 40-48.

Christian, S.L., Bhatt, N.K., Martin, S.A., Sutcliffe, J.S., Kubota, T., Huang, B., Mutirangura, A., Chinault, A C., Beaudet, A.L. and Ledbetter, D.H. (1998). Integrated YAC contig map of the Prader-Willi/Angelman region on chromosome 15ql l-ql3 with average STS spacing of 35 kb. Genome Res. 8: 146-157.

Chu, X., Braun-Heimer, L., Rittner, C. and Schneider, P.M. (1992). Identification of the recombination site within the steroid 21- hydroxylase gene (CYP21) of the HLA-B47, DR7 haplotype. Exp.Clin.Immunogenet. 9: 80-85.

Chu, X., Rittner, C. and Schneider, P.M. (1995). Length polymorphism of the human complement component C4 gene is due to an ancient retroviral integration. Exp.Clin. Immunogenet. 12:74-81. Collins, F.S., Patrinos, A., Jordan, E., Chakravarti, A., Gesteland, R. and Walters, L. (1998). New goals for the U.S. Human Genome Project: 1998-2003. Science 282: 682- 689.

Conrad, B., Weidmann, E., Trucco, G., Rudert, W.A., Behboo, R., Ricordi, C., Rodriquez-Rilo, H., Finegold, D. and Trucco, M. (1994). Evidence for superantigen involvement in insulin-dependent diabetes mellitus aetiology. Nature 371: 351-355.

Conrad, B. and Trucco, M. (1994). Superantigens as etiopathogenetic factors in the development of insulin- dependent diabetes mellitus. Diabetes Metab.Rev. 10: 309- 338.

Conrad, B., Weissmahr, R.N., Boni, J., Arcari, R., Schupbach, J. and Mach, B. (1997). A human endogenous retroviral superantigen as candidate autoimmune gene in type I diabetes. Cell 90: 303-313.

Copeland, N.G., Hutchison, K.W. and Jenkins, N.A. (1983). Excision of the DBA ecotropic provirus in dilute coat-color revert ants of mice occurs by homologous recombination involving the viral LTRs. Cell 33: 379-387.

Cordell, H.J. and Todd, J.A. (1995). Multifactorial inheritance in type 1 diabetes. Trends Genet. 11: 499-504.

Cullen, M., Erlich, H., Klitz, W. and Carrington, M. (1995). Molecular mapping of a recombination hotspot located in the second intron of the human TAP2 locus. Am.J .Hum.Genet. 56: 1350-1358.

Cullen, M., Noble, J., Erlich, H., Thorpe, K., Beck, S., Klitz, W., Trowsdale, J. and Carrington, M. (1997). Characterization of recombination in the HLA class II region. Am.J.Hum.Genet. 60: 397-407.

D'Eustachio, P., Kristensen, T., Wetsel, R.A., Riblet, R., Taylor, B.A. and Tack, B.F. (1986). Chromosomal location of the genes encoding complement components C5 and factor H in the mouse. J.Immunol. 137: 3990-3995.

Dangel, A.W., Mendoza, A.R., Baker, B.J., Daniel, C.M., Carroll, M.C., Wu, L.C. and Yu, C.Y. (1994). The dichotomous size variation of human complement C4 genes is mediated by a novel family of endogenous retroviruses, which also establishes species- specific genomic patterns among Old World primates. Immunogenetics 40: 425-436.

234 Dangel, A.W., Shen, L., Mendoza, A.R., Wu, L.C. and Yu, C.Y. (1995a). Human helicase gene SKI2W in the HLA class HI region exhibits striking structural similarities to the yeast antiviral gene SKI2 and to the human gene KIAA0052: emergence of a new gene family. Nucleic AcidsRes 23: 2120-2126.

Dangel, A.W., Baker, B.J., Mendoza, A.R. and Yu, C.Y. (1995b). Complement component C4 gene intron 9 as a phylogenetic marker for primates: long terminal repeats of the endogenous retrovirus ERV-K(C4) are a molecular clock of evolution. Immunogenetics 42: 41-52.

Davidson, M. B. (1998). Diabetes Mellitus: Diagnosis and Treatment, 4^ Edition. W.B.Saunders Co., Philadelphia.

Davies, J.L., Kawaguchi, Y., Bennett, S.T., Copeman, J.B., Cordell, H.J., Pritchard, L.E., Reed, P.W., Gough, S.C., Jenkins, S.C. and Palmer, S.M. (1994). A genome-wide search for human type 1 diabetes susceptibility genes. Nature 371: 130-136.

De Vries, R. R. P. and Van Rood, J.J. (1992). Immunogenetics and disease. In "The generic basis of common diseases" (King, R. A., Rotter, J. I., and Momlsky, A. G., Eds.), pp 92-114.

De Vries, R.R.P. (1994). HLA and disease: past, present and future. NethJ.Med. 45: 302-308.

Degli-Esposri, M.A., Abraham, L.J., McCann, V., Spies, T., Christiansen, F.T. and Dawkins, R.L. (1992). Ancestral haplotypes reveal the role of the central MHC in the immunogenetics of IDDM. Immunogenetics 36: 345-356.

Dodds, A.W., Ren, X.D., Willis, A C. and Law, S.K. (1996). The reaction mechanism of the internal thioester in the human complement component C4. Nature 379: 177-179.

Donohoue, P.A., Guethlein, L., Collins, M.M., Van Dop, C., Migeon, C.J., Bias, W.B. and Schmeckpeper, B.J. (1995). The HLA-A3, Cw6, B47, DR7 extended haplotypes in salt losing 21- hydroxylase deficiency and in the Old Order Amish: identical class I antigens and class II alleles with at least two crossover sites in the class HI region. Tissue Antigens ^6: 163-172.

Ellis, N.A. (1997). DNA hehcases in inherited human disorders. Curr.Opin.Genet.Dev. 7: 354-363.

235 Fielder, A.H., Waiport, M.J., Batchelor, J R., Rynes, R.L, Black. C.M., Dodi, LA. and Hughes, G.R. (1983). Family study of the major histocompatibility complex in patients with systemic lupus erythematosus: importance of null alleles of C4A and C4B in determining disease susceptibility. Br.Med.J. 286: 425-428.

Fields, C., Adams, M.D., White, O. and Venter, J.C. (1994). How many genes in the human genome? Nat.Genet. 7: 345-346.

Fredrikson, G.N., GuUstrand, B., Schneider, P.M., Witzel-Schlomp, K., Sjoholm, A.G., Alper, C.A., Awdeh, Z. and Truedsson, L. (1998). Characterization of non-expressed C4 genes in a case of complete C4 deficiency: identification of a novel point mutation leading to a premature stop codon. Hum.Immunol. 59: 713-719.

Fuller-Pace, F. V. (1994) RNA helicase: modulators of RNA strucmre. Trends Cell Biol. 4: 271-274.

Gasser, D.L., Sternberg, N.L., Pierce, J.C., Goldner-Sauve, A., Feng, H., Haq, A.K., Spies, T., Hunt, C., Buetow, K.H. and Chaplin, D.D. (1994). PI and cosmid clones define the organization of 280 kb of the mouse H-2 complex containing the Cps-1 and Hsp70 loci. Immunogenetics 39: 48-55.

Genetic Computer Group, Inc., Madison, WT. (1991). GCG sequence analysis software package: progam manual (version 7).

Giles, C.M., Uring-Lambert, B., Goetz, J., Hauptmann, G., Fielder, A.H., Oilier, W., Rittner, C. and Robson, T. (1988). Antigenic determinants expressed by human C4 allotypes; a study of 325 families provides evidence for the structural antigenic model. Immunogenetics 27: 442-448.

Gitelman, S.E., Bristow, J. and Miller, W.L. (1992). Mechanism and consequences of the duplication of the human C4/P450c21/gene X locus. Mol.Cell Biol. 12: 3313-3314.

Gomez-Escobar, N., Chou, C.F., Lin, W.W., Hsieh, S.L. and Campbell, R.D. (1998). The G11 gene located in the major histocompatibility complex encodes a novel nuclear serine/threonine protein kinase. J.Biol.Chem. 273: 30954-30960.

Harada, F., Kimura, A., Iwanaga, T., Shimozawa, K., Yata, J. and Sasazuki, T. (1987). Gene conversion-like events cause steroid 21-hydroxylase deficiency in congenital adrenal hyperplasia. Proc.Natl.Acad.Sci.USA 84: 8091-8094.

236 Harrison, L.C., Campbell, I.L., Colman, P.G., Chosich, N., Kay, T.W.H., Tait, B., Bartholomeusz, K., DeAizpurua, H„ Joseph, J.L., Chu, S. and Kielczynski, W.E. (1990). Type 1 diabetes: immunopathology and immunotherapy. Adv. Endocrinol. Metab. 1: 35.

Hauptmann, G., Tappeiner, G. and Schifferli, J.A. (1988). Inherited deficiency of the fourth component of human complement. Immunodefic.Rev. 1: 3-22.

Hayward, W.S., Neel, B.G. and Astrin, S.M. (1981). Activation of a cellular one gene by promoter insertion in ALV-induced lymphoid leukosis. Nature 290: 475-480.

Hemenway, C., Kalff, M., Stavenhagen, J., Walthall, D. and Robins, D. (1986). Sequence comparison of alleles of the fourth component of complement (C4) and sex- limited protein (Sip). Nucleic AcidsRes. 14: 2539-2554.

Herman, A., Kappler, J.W., Marrack, P. and Pullen, A.M. (1991). Superantigens: mechanism of T-cell stimulation and role in immune responses. Annu.Rev.Immunol. 9: 745-772.

Heward, J. and Gough, S.C. (1997). Genetic susceptibility to the development of autoimmune disease. Clin. Sci. 93: 479-491.

Higashi, Y., Yoshioka, H., Yamane, M., Gotoh, O. and Fujii-Kuriyama, Y. (1986). Complete nucleotide sequence of two steroid 21-hydroxylase genes tandemly arranged in human chromosome: a pseudogene and a genuine gene. Proc.Natl.Acad.Sci. USA 83: 2841-2845.

Hillier, L., Clark, N., Dubuque, T., Elliston, K., Hawkins, M., Holman, M., Hultman, M., Kucaba, T., Le, M., Lennon, G., Marra, M., Parsons, J., Rifkin, L., Rohlfing, T., Soares, M., Tan, P., Trevaskis, E., Waterston, R., Willisamson, A., Wohldmann, P., and Wilson, R. (1996) yyl8e05.rl Soares melanocyte 2NbHM Homo Sapeins cDNA clone IMAGE: 271616 5', rtiRNA sequence. GenBank accession no. N43758.

Hodge, S.E. (1981). Some epistatic two-locus models of disease. 1. Relative risks and identity-by-descent distributions in affected sib pairs. Am.J.Hum.Genet. 33: 381-395.

Hofmann, K., Bucher, P., Falquet, L. and Bairoch, A. (1999). The PROS 11E database, its status in 1999. Nucleic AcidsRes 27: 215-219.

Horiuchi, Y., Kawaguchi, H., Figueroa, P., O'hUigin, C. and Klein, J. (1993). Dating the primigenial C4-CYP21 duplication in primates. Genetics 134: 331-339.

237 Homstein, E., Abramzon-Talianker, A., Wiesel, I., and Meyuhas, O. (1997) The human poiy(A)-binding protein (PABP) gene: structural and functional analysis. GenBank accession no. 1562511.

Howe, H.S., So, A.K., Farr ant, J. and Webster, A.D. (1991). Common variable immunodeficiency is associated with polymorphic markers in the human major histocompatibility complex. Clin.Exp.Immunol. 83: 387-390.

Hu, X. and Worton, R.G. (1992). Partial gene duplication as a cause of human disease. Hum.Mutat. 1: 3-12.

Huang, B., CroUa, J.A., Christian. S.L., Wolf-Ledbetter, M.E., Macha, M.E., Papenhausen, P.N. and Ledbetter, D.H. (1997). Refined molecular characterization of the breakpoints in small inv dup(15) chromosomes. Hum.Genet. 99: 11-17.

Isenman, D.E. and Young, J R. (1984). The molecular basis for the difference in immune hemolysis activity of the Chido and Rodgers isotypes of human complement component C4. J.Immunol. 132: 3019-3027.

Jacob, B.G., Richter, W.O., Schwandt, P., Fateh-Moghadam, A. and Witt, T.N. (1986). Low serum C4 concentrations and peripheral neuropathy in type I and type U diabetes. Br.Med.J. 292: 1671.

Jacobs, J.S., Anderson, A.R. and Parker, R.P. (1998). The 3' to 5' degradation of yeast mRNAs is a general mechanism for mRNA turnover that requires the SKI2 DEVH box protein and 3' to 5' exonucleases of the exosome complex. EMBO 7.17: 1497-1506.

Jaeckel, E., Heringlake, S., Berger, D., Brabant, G., Hunsmann, G. and Manns, M.P. (1999). No evidence for association between IDDMK( 1,2)22, a novel isolated retro vims, and IDDM. Diabetes 48: 209-214.

Jenhani, F., Bardi, R., Gorgi, Y., Ayed, K. and Jeddi, M. (1992). C4 polymorphism in multiplex families with insulin dependent diabetes in the Tunisian population: standard C4 typing methods and RPTP analysis. J. Autoimmun. 5: 149-160.

Johnson, A.W. (1997). Rat Ip and Xmlp are functionally interchangeable exoribonucleases that are restricted to and required in the nucleus and cytoplasm, respectively. Mo/. CeZ/B/o/. 17: 6122-6130.

238 Jones, F.S., Burgeon, M.P., Hoffman, S., Crossin, K.L., Cunningham, B.A. and Edelman, G.M. (1988). A cDNA clone for cytotactin contains sequences similar to epidermal growth factor-like repeats and segments of fibronectin and fibrinogen. Proc.Natl.Acad.Sci. USA 85: 2186-2190.

Jorgensen, J.L., Reay, P.A., Ehrich, E.W. and Davis, M.M. (1992). Molecular components of T-cell recognition. A«nM./?ev.7/n/nM«o/. 10: 835-873.

Jospe, N., Donohoue, P.A., Van Dop, C., McLean, R.H., Bias, W.B. and Migeon, C.J. (1987). Prevalence of polymorphic 21-hydroxylase gene (CA21HB) mutations in salt- losing congenital adrenal hyperplasia. Biochem.Biophys.Res Commun. 142: 798-804.

Karp, D.R., Capra, J.D., Atkinson, J.P. and Shreffler, D C. (1982). Structural and functional characterization of an incompletely processed form of murine C4 and Sip. J.Immunol. 128: 2336-2341.

Kawaguchi, H., O'hUigin, C., and Klein, J. (1991) Evolution of primate C4 and CYP21 genes. In "Molecular evolution of the major histocompatibility complex" (Klein, J. and Klein, D., Eds), pp 357-381.

Kawakami, A., Tian, Q., Duan, X., Streuli, M., Schlossman, S.F. and Anderson, P. (1992). Identification and functional characterization of a TLA-1-related nucleolysin. Proc.Natl.Acad.Sci USA 89: 8681-8685.

Keller, W. (1995). No end yet to messenger RNA 3' processing! Cell 81: 829-832.

Kobori, J.A., Winoto, A., McNicholas, J. and Hood, L. (1984). Molecular characterization of the recombination region of six murine major histocompatibility complex (MHC) I-region recombinants. J.Mol.Cell Immunol. 1: 125-135.

Kunz, H.W., Gill, T.J., Dixon, B.D., Taylor, F.H. and Greiner, D.L. (1980). Growth and reproduction complex in the rat. Genes linked to the major histocompatibility complex that affect development. y.£xp.Me^7. 152: 1506-1518.

Laitinen, T., Lokki, M.L., Tulppala, M., Ylikorkala, O. and Koskimies, S. (1991). Increased frequency of complement C4 'null' alleles in recurrent spontaneous abortions. Hum.Reprod. 6: 1384-1387.

Lander, E.S. and Schork, N.J. (1994). Genetic dissection of complex traits. Science 265: 2037-2048.

239 Law, S.K., Dodds, A.W. and Porter, R.R. (1984). A comparison of the properties of two classes, C4A and C4B, of the human complement component C4. EMBO J. 3: 1819- 1823.

Lebo, R.V., Chakravarti, A., Buetow, K.H., Cheung, M.C., Gann, H., Cordell, B. and Goodman, H. (1983). Recombination within and between the human insulin and beta- globin gene loci. Proc.Natl.Acad.Sci USA 80: 4808-4812.

Lee, S.G., Lee, I., Park, S.H., Kang, C. and Song, K. (1995). Identification and characterization of a human cDNA homologous to yeast SKI2. Genomics 25: 660-666.

Lévi-Strauss, M., Tosi, M., Steinmetz, M., Klein, J. and Meo, T. (1985). Multiple duplications of complement C4 gene correlate with H-2- controlled testosterone- independent expression of its sex-limited isoform, C4-Slp. Proc.Natl.Acad.Sci. USA 82: 1746-1750.

Lévi-Strauss, M., Carroll, M.C., Steinmetz, M. and Meo, T. (1988). A previously undetected MHC gene with an unusual periodic structure. Science 240: 201-204.

Lewin, B. (1995). Genes for SMA: multum in parvo. Cell 80: 1-5.

Lhotta, K., Schlogl, A., Uring-Lambert, B., Kronenberg, F. and Konig, P. (1996). Complement C4 phenotypes in patients with end-stage renal disease. Nephron 72: 442- 446.

Lokki, M.L. and Colten, H R. (1995). Genetic deficiencies of complement. Ann.Med. 27: 451-459.

Lokki, M.L., Circolo, A., Ahokas, P., Rupert, K.L., Yu, C.Y. and Colten, H R. (1999). Deficiency of human complement protein C4 due to identical frameshift mutations in the C4A and C4B genes. J.Immunol. 162: 3687-3693.

Lupski, J R. (1998). Genomic disorders: structural features of the genome can lead to DNA rearrangements and human disease traits. Trends Genet. 14: 417-422.

Makino, S., Kunimoto, K., Muraoka, Y., Mizushima, Y., Katagiri, K. and Tochino, Y. (1980). Breeding of a non-obese, diabetic strain of mice. Jikken.Dobutsii. 29: 1-13.

Marcelli-Barge, A., Poirier, J.C., Chantome, R., Deschamps, I., Hors, J. and Colombani, J. (1990). Marked shortage of C4B DNA polymorphism among insulin-dependent diabetic patients. Res.Immunol. 141: 117-128.

240 Marra, M., Hillier, L., Allen, M., Bowles, M., Dietrich, N., Dubuque, T., Geisel, S., Kucaba, T., Lacy, M., Le, M., Martin, J., Morris, M., Schellenberg, K., Steptoe, M., Tan, F-, Underwood, R., Moore, B., Theising, B., Wylie, T., Lennon, G., Soares, B., Wilson, R., and Waterston, R. (1996). Mus musculus cDNA clone 1MAGE:468777, 5' similar to PIR: A53439 B53439 complement C4A 5'-region protein RP-human; mRNA sequence. GenBank accesion no. AA035570.

Masison, D C., Blanc, A., Ribas, J.C., Carroll, K., Sonenberg, N. and Wickner, R.B. (1995). Decoying the cap- mRNA degradation system by a double-stranded RNA virus and poly(A)~ mRNA surveillance by a yeast antiviral system. Mol.Cell Biol. 15: 2763- 2771.

Matsuki, K., Juji, T., Tokunaga, K., Naohara, T., Satake, M. and Honda, Y. (1985). Human histocompatibility leukocyte antigen (HLA) haplotype frequencies estimated from the data on HLA class I, II, and HI antigens in 111 Japanese narcoleptics. J.Clin. Invest. 76: 2078-2083.

Matsumoto, K., Aral, M., Ishihara, N., Ando, A., Inoko, H. and Dcemura, T. (1992a). Cluster of fibronectin type HI repeats found in the human major histocompatibility complex class EH region shows the highest homology with the repeats in an extracellular matrix protein, tenascin. Genomics 12: 485-491.

Matsumoto, K., Ishihara, N., Ando, A., Inoko, H. and Dcemura, T. (1992b). Extracellular matrix protein tenascin-like gene found in human MHC class HI region. Immunogenetics 36: 400-403.

Matsumoto, K., Saga, Y., Dcemura, T., Sakakura, T. and Chiquet-Ehrismann, R. (1994). The distribution of tenascin-X is distinct and often reciprocal to that of tenascin-C. J. Cell Biol. 125: 483-493.

Mauff, G., Brenden, M., Braun-Stilwell, M., Doxiadis, G., Giles, C.M., Hauptmann, G., Rittner, C., Schneider, P.M., Stradmann-Bellinghausen, B. and Uring-Lambert, B. (1990a). C4 reference typing report. Complement Inflamm. 7: 193-212.

Mauff, G-, Alper, C.A., Dawkins, R., Doxiadis, G., Giles, C.M., Hauptmann, G., Rittner, C. and Schneider, P.M. (1990b). C4 nomenclature statement (1990b). Complement Inflamm. 7: 261-268.

Mazzarella, R. and Schlessinger, D. (1998). Pathological consequences of sequence duplications in the human genome. Genome Res. 8: 1007-1021.

241 McClelland, M., Nelson, M. and Raschke, E. (1994). Effect of site-specific modification on restriction endonucleases and DNA modification methyltransferases. Nucleic AcidsRes. 22: 3640-3659.

Mijovic, C., Fletcher, J., Bradwell, A.R., Harvey, T. and Barnett, A.H. (1985). Relation of gene expression (allotypes) of the fourth component of complement to insulin dependent diabetes and its microangiopathic complications. Br.Med.J. 291: 9-10.

Miller, W.L. and Morel, Y. (1989). The molecular genetics of 21-hydroxylase deficiency. Annu.Rev.Genet. 23: 371-393.

Moller, E. (1998). Mechanisms for induction of autoimmunity in humans. Acta Paediatr.Suppl. 424: 16-20.

Morel, Y., Bristow, J., Gitelman, S.E. and Miller, W.L. (1989). Transcript encoded on the opposite strand of the human steroid 21- hydroxylase/complement component C4 gene locus. Proc.Natl.Acad.Sci.USA 86: 6582-6586.

Muir, A., Ruan, Q.G., Marron, M.P. and She, J.X. (1999). The IDDMK( 1,2)22 retrovirus is not detectable in either mRNA or genomic DNA from patients with type 1 diabetes. Diabetes 48: 219-222.

Murphy, V.J., Harrison, L.C., Rudert, W.A., Luppi, P., Trucco, M., Fierabracci, A., Biro, P. A. and Bottazzo, G.F. (1998). Retroviral superantigens and type 1 diabetes mellitus. Cell 95: 9-11.

Nepom, B.S., Schwarz, D., Palmer, J.P. and Nepom, G.T. (1987). Transcomplementation of HLA genes in IDDM. HLA-DQ alpha- and beta- chains produce hybrid molecules in DR3/4 heterozygotes. Diabetes 36: 114-117.

Nepom, G.T. (1990). A unified hypothesis for the complex genetics of HLA associations with IDDM. Diabetes 39: 1153-1157.

Nepom, G.T. and Kwok, W.W. (1998). Molecular basis for HLA-DQ associations with IDDM. Diabetes 47: 1177-1184.

New, M.L (1995). Steroid 21-hydroxylase deficiency (congenital adrenal hyperplasia). Am.J.Med. 98: 2S-8S.

Newell, W.R., Trowsdale, J. and Beck, S. (1996). MHCDB: database of the human MHC (release 2). Immunogenetics 45: 6-8.

242 Nickerson, E., Greenberg, F., Keating, M.T., McCaskill, C. and Shaffer, L.G. (1995). Deletions of the elastin gene at 7ql 1.23 occur in approximately 90% of patients with Williams syndrome. Am.J.Hum.Genet. 56: 1156-1161.

Nonaka, M., Kimura, H., Yeul, Y.D., Yokoyama, S., Nakayama, K. and Takahashi, M. (1986a). Identification of the 5'-flanking regulatory region responsible for the difference in transcriptional control between mouse complement C4 and Sip genes. Proc.Natl.Acad.Sci.USA 83: 7883-7887.

Nonaka, M., Nakayama, K., Yeul, Y.D. and Takahashi, M. (1986b). Complete nucleotide and derived amino acid sequences of sex-limited protein (Sip), nonfunctional isotype of the fourth component of mouse complement (C4). J.Immunol. 136: 2989- 2993.

Normore, W. M., Shapiro, H. S., and Setlow, P. (1976) CRC handbook of biochemistry and molecular biology (Fastman, G.D., Eds).

O'Neill, G.J., Yang, S.Y., Tegoli, J., Berger, R. and Dupont, B. (1978). Chido and Rodgers blood groups are distinct antigenic components of human complement C4. Nature 273: 668-670.

Ogata, R.T. and Sepich, D.S. (1985). Murine sex-limited protein: complete cDNA sequence and comparison with murine fourth complement component. J.Immunol. 135: 4239-4244.

Ogata, R.T. and Zepf, N.E. (1991). The murine Sip gene. Additional evidence that sex- limited protein has no biologic function. J.Immunol. 147: 2756-2763.

Ohno, S. (1970). Evolution by gene duplication. Springer-Verlag, Berlin.

Oldstone, M B., Nerenberg, M., Southern, P., Price, J. and Lewicki, H. (1991). Virus infection triggers insuhn-dependent diabetes mellitus in a transgenic model: role of anti­ self (virus) immune response. Cell 65: 319-331.

Oliver, S.G., Winson, M.K., Kell, D.B. and Baganz, F. (1998). Systematic functional analysis of the yeast genome. Trends Biotechnol. 16: 373-378.

Pattanakitsakul, S., Nakayama, K., Takahashi, M. and Nonaka, M. (1990). Three extra copies of a C4-related gene in H-2"^ mice are C4/Slp hybrid genes generated by multiple recombinational events. Immunogenetics 32: 431-439.

243 Patthy, L. (1987). Intron-dependent evolution: preferred types of exons and introns. FEES Lett. 214: 1-7.

Paulsen, J.E., Capowski, E.E. and Strome, S. (1995). Phenotypic and molecular analysis of mes-3, a matemal-effect gene required for proliferation and viability of the germ line in C. elegans. Genetics 141: 1383-1398.

Perez, J.L., Wang, Y.K., Peoples, R., Coloma, A., Cruces, J. and Francke, U. (1998). A duplicated gene in the breakpoint regions of the 7ql 1.23 Williams- Beuren syndrome deletion encodes the initiator binding protein TFll-I and BAP-135, a phosphorylation target of BTK. Hum.Mol.Genet. 7: 325-334.

Porter, R.R. (1983). Complement polymorphism, the major histocompatibility complex and associated diseases: a speculation. Mol.Biol.Med. 1: 161-168.

Porter, R.R. (1985a). The complement components coded in the major histocompatibility complexes and their biological activities. Immunol.Rev. 87: 7-17.

Porter, R.R. (1985b). The polymorphism of the complement genes in HLA. Ann.Inst.Pasteur Immunol. 136C: 91-101.

Pratt, R E. and Dzau, V.J. (1999). Genomics and hypertension: concepts, potentials, and opportunities. Hypertension 33: 238-247.

Prentice, H.L., Schneider, P.M. and Strominger, J.L. (1986). C4B gene polymorphism detected in a human cosmid clone. Immunogenetics 23: 274-276.

Proudfoot, N. (1991). Poly(A) signals. Cell 64: 671-674.

Purandare, S.M. and Patel, P.I. (1997). Recombination hot spots and human disease. Genome Res. 7: 773-786.

Qu, X., Yang, Z., Zhang, S., Shen, L., Dangel, A.W., Hughes, J.H., Redman, K.L., Wu, L.C. and Yu, C.Y. (1998). The human DEVH-box protein Ski2w from the HLA is localized in nucleoli and ribosomes. Nucleic AcidsRes 26: 4068-4077.

Quandt, K., Freeh, K., Karas, H., Wingender, E. and Werner, T. (1995). Matind and Matlnspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucleic AcidsRes. 23: 4878-4884.

244 Ramakrishnan, C. and Robins, D M. (1997). Steroid hormone responsiveness of a family of closely related mouse pro viral elements. Mamm.Genome 8: 811-817.

Rathjen, F. G. (1993) Restrictin. In "Guidebook to the extracellular matrix and adhesion proteins" (Kries, T. and Vale, R., Eds.), Oxford University Press, Oxford, pp 87-88.

Raum, D., Awdeh, Z., Anderson, J., Strong, L., Granados, J., Teran, L., Giblett, E., Yunis, E.J. and Alper, C.A. (1984). Human C4 haplotypes with duplicated C4A or C4B. Am.J.Hum.Genet. 36: 72-79.

Reid, K.B. and Porter, R.R. (1981). The proteolytic activation systems of complement. Annu.Rev.Biochem. 50: 433-464.

Risch, N. (1990). Linkage strategies for genetically complex traits. I. Multilocus models. Am.J.Hum.Genet. 46: 222-228.

Rittner, C., Giles, C.M., Roos, M.H., Demant, P. and Mollenhauer, E. (1984). Genetics of human C4 polymorphism: detection and segregation of rare and duplicated haplotypes. Immunogenetics 19: 321-333.

Rodrigues, N.R., Dunham, I., Yu, C.Y., Carroll, M.C., Porter, R.R. and Campbell, R.D. (1987). Molecular characterization of the HLA-linked steroid 21-hydroxylase B gene from an individual with congenital adrenal hyperplasia. EMBO 7.6: 1653-1661.

Roos, M.H., Giles, C.M., Demant, P., Mollenhauer, E. and Rittner, C. (1984). Rodgers (Rg) and Chido (Ch) determinants on human C4: characterization of two C4 B5 subtypes, one of which contains Rg and Ch determinants. J.Immunol. 133: 2634-2640.

Rosa, P.A., Sepich, D.S., Shreffler, D C. and Ogata, R.T. (1985). Mice constitutive for sex-limited protein (SLP) expression contain multiple Sip gene sequences. J.Immunol. 135: 627-631.

Rowen, L., Dankers, C., Baskin, D., Faust, J., Loretz, C., Aheam, M. E., Banta, A., Swartzell, S., Smith, T. M., Spies, T., and Hood, L. (1997a) Homo sapiens HLA class m region containing tenascin X (tenascin-X) gene, partial cds; cytochrome P450 21- hydroxylase (CYP21B), complement component C4 (C4B) G11, helicase (SKI2W), RD, complement factor B (Bf), and complement component C2 (C2) genes, complete cds. GenBank accession no. AF019413.

245 Rowen, L., Dankers, C., Spies, T., Faust, J., Loretz, C., Aheam, M. E., Banta, A., Schwartzell, S., Smith, T. M., and Hood, L. (1997b) Human HLA class HI region containing cAMP response element binding protein-related protein (CREB-RP) and tenascin-X genes, complete cds, complete sequence. GenBank accession no. U89337.

Rowen, L., Mahairas, G., Qin, S., Aheam, M. E., Dankers, C., Lasky, S., Loretz, C., Schmidt, S., Tipton, S., Traicoff, R., Zackrone, K., and Hood, L. (1997c) Mus musculus major histocompatibility locus class HI region:butyrophilin-like protein gene, partial cds; Notch4, PBX2, RAGE, lysophatidic acid acyl transferase-alpha, palmitoyl-protein thioesterase 2 (PPT2), CREB-RP, and tenascin X (TNX) genes, complete cds; and CYP210HB pseudogene, complete sequence. GenBank accession no. AF030001.

Rowen, L., Qin, S., Lasky, S., Loretz, C., Dors, M., Mahairas, G., and Hood, L. (1998) Mus musculus major histocompatibility locus class HI region: complement C4 (C4) and cytochrome P450 hydroxylase A (CYP210H-A) genes, complete cds; sip pseudogene, complete sequence; NG6, SKI, and complement factor B (Bf) genes, complete cds; and complement factor C2 (C2) gene, partial cds. GenBank accession no. AF()49850.

Rupert, K. L., Renneblohm, R. M., and Yu, C. Y. (1999). An unequal crossover between the RCCX modules of the HLA leading to the presence of a CYP21B gene and a tenascin TNXB/TNXA-RP2 recombinant between C4A and C4B genes in a patient with juvenile rheumatoid arthritis. Exp. Clin Immunogenet. (in press).

Saiki, R. (1990). Amplification of genomic DNA. In PGR protocols: a guide to methods and applications (Innis, M.A., Gelfand, D.H., Sninsky,J.J. and White,T.J. Eds.) Academic Press, San Diego, pp21-28.

Salter-Cid, L. and Flajnik, M.P. (1995). Evolution and developmental regulation of the major histocompatibility complex. Crit.Rev.Immunol. 15: 31-75.

Sanger, P., Nicklen, S. and Coulson, A.R. (1977). DNA sequencing with chain- terminating inhibitors. Proc.A^ar/.Acad.Sc/.C/5'A 74: 5463-5467.

Sargent, C.A., Anderson, M.J., Hsieh, S.L., Kendall, E., Gomez-Escobar, N. and Campbell, R.D. (1994). Characterisation of the novel gene G11 lying adjacent to the complement C4A gene in the human major histocompatibility complex. Hum.Mol. Genet. 3: 481-488.

Schaffer, P.M., Palermos, J., Zhu, Z.B., Barger, B.O., Cooper, M.D. and Volanakis, I.E. (1989). Individuals with IgA deficiency and conunon variable immunodeficiency share

246 polymorphisms of major histocompatibility complex class LQ genes. Proc.Natl.Acad.Sci.USA 86: 8015-8019.

Schena, M., Heller, R.A., Theriault, T.P., Konrad, K., Lachenmeier, E. and Davis, R.W. (1998). Microarrays: biotechnology's discovery platform for functional genomics. Trends Biotechnol. 16: 301-306.

Schneider, P.M., Carroll, M.C., Alper, C.A., Rittner, C., Whitehead, A.S., Yunis, E.J. and Colten, H R. (1986). Polymorphism of the human complement C4 and steroid 21- hydroxylase genes. Restriction fragment length polymorphisms revealing structural deletions, homoduplications, and size variants. J.Clin.Invest. 78: 650-657.

Schneider, P.M., Wendler, C., Riepert, T., Braun, L., Schacker, U., Horn, M., Althoff, H., Mattem, R. and Rittner, C. (1989). Possible association of sudden infant death with partial complement C4 deficiency revealed by post-mortem DNA typing of HLA class n and m genes. Eur.J.Pediatr 149: 170-174.

Schneider, P.M. (1990). C4 DNA RFLP reference typing report. Complement Inflamm. 7: 218-224.

Schneider, P.M. and Rittner, C. (1997). Complement genetics. In Complement: a Practical Approach. (Dodds,A.W.; Sim,R.B., eds.) ERL Press at Oxford University Press, Oxford, pp 165-198.

Senaldi, G., Millward, B.A., Hussain, M.J., Pyke, D.A., Leslie, R.D. and Vergani, D. (1988). Low serum haemolytic function of the fourth complement component (C4) in insulin dependent diabetes. J.Clin.Pathol. 41: 1114-1116.

Senger, G., Ragoussis, J., Trowsdale, J. and Sheer, D. (1993). Fine mapping of the human MHC class U region within chromosome band 6p21 and evaluation of probe ordering using interphase fluorescence in situ hybridization. Cytogenet.Cell Genet. 64: 49-53.

Shen, L., Wu, L.C., Sanlioglu, S., Chen, R., Mendoza, A.R., Dangel, A.W., Carroll, M.C., Zipf, W.B. and Yu, C.Y. (1994). Structure and genetics of the partially duplicated gene RP located immediately upstream of the complement C4A and the C4B genes in the HLA class HI region. Molecular cloning, exon-intron structure, composite retroposon, and breakpoint of gene duplication. J.Biol.Chem. 269: 8466-8476.

247 Shreffler, D C., Atkinson, J.P., Chan, A C., Karp, D.R., Killion, C.C., Ogata, R.T. and Rosa, P.A. (1984). The C4 and Sip genes of the complement region of the murine H-2 major histocompatibility complex. Philos.Trans.R.Soc.Lond.B.Biol.Sci 306: 395-403.

Sim, E. and Cross, S.J. (1986). Phenotyping of human complement component C4, a class-in HLA antigen. Biochem. J. 239: 763-767.

Simon, S., Truedsson, L., Marcus-Bagley, D., Awdeh, Z., Eisenbarth, G.S., Brink, S.J., Yunis, E.J. and Alper, C.A. (1997). Relationship between protein complotypes and DNA variant haplotypes: complotype-RFLP constellations (CRC). Hum.Immunol. 57: 27-36.

Sinnott, P., Collier, S., Costigan, C., Dyer, P.A., Harris, R. and Strachan, T. (1990). Genesis by meiotic unequal crossover of a de novo deletion that contributes to steroid 21-hydroxylase deficiency. Proc.Natl.Acad.Sci. USA 87: 2107-2 111.

Smith, G.P. (1974). Unequal crossover and the evolution of multigene families. Cold Spring Mart.Symp.Quant.Biol. 38: 507-513.

Snoek, M., Teuscher, C. and van, V.H. (1998). Molecular analysis of the major MHC recombinational hot spot located within the G7c gene of the murine class HE region that is involved in disease susceptibility. J.Immunol. 160: 266-272.

Southern, E.M. (1975). Detection of specific sequences among DNA fragments separated by gel electrophoresis. J.Mol.Biol. 98: 503-517.

Speiser, P.W. and White, P C. (1989). Structure of the human RD gene: a highly conserved gene in the class III region of the major histocompatibility complex. DNA 8: 745-751.

Spielman, R.S., McGinnis, R E. and Ewens, W.J. (1993). Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (JDDM). Am.J.Hum.Genet. 52: 506-516.

Stavenhagen, J.B. and Robins, D M. (1988). An ancient provirus has imposed androgen regulation on the adjacent mouse sex-limited protein gene. Cell 55: 247-254.

Steinmetz, M., Stephan, D. and Fischer, L.K. (1986). Gene organization and recombinational hotspots in the murine major histocompatibility complex. Cell 44: 895- 904.

248 Stoye, J.P. and Coffin, J.M. (1988). Polymorphism of murine endogenous pro viruses revealed by using virus class-specific oligonucleotide probes. J.Virol. 62: 168-175.

Strachan, T. (1994). Molecular pathology of 21-hydroxylase deficiency. J.Inherit.Metab.Dis. 17: 430-441.

Surowy, C.S., Hoganson, G., Gosink, J., Strunk, K. and Spritz, R.A. (1990). The human RD protein is closely related to nuclear RNA-binding proteins and has been highly conserved. Gene 90: 299-302.

Svejgaard, A. and Ryder, L.P. (1981). HLA genotype distribution and genetic models of insulin-dependent diabetes xa&Witas. Ann.Hiim.Genet. 45: 293-298.

Takeda, J. (1995) HUMHBC4394 Human pancreatic islet Homo sapiens cDNA similar to HLA class HI RPl, mRNA sequence. GenBank accesion no. D82148.

Tassabehji, M., Strachan, T., Anderson, M., Campbell, R.D., Collier, S. and Lako, M. (1994). Identification of a novel family of human endogenous retroviruses and characterization of one family member, HERV-K(C4), located in the complement C4 gene cluster. Nucleic AcidsRes 22: 5211-5217.

Taylor, J.E., Thomas, N.H., Lewis, C.M., Abbs, S.J., Rodrigues, N.R., Davies, K.E. and Mathew, C.G. (1998a). Correlation of SMNt and SMNc gene copy number with age of onset and survival in spinal muscular atrophy. Eur.J.Hum.Genet. 6: 467-474.

Taylor, P R., Nash, J.T., Theodoridis, E., Bygrave, A.E., Waiport, M.J. and Botto, M. (1998b). A targeted disruption of the murine complement factor B gene resulting in loss of expression of three genes in close proximity, factor B, C2, and D17H6S45. J.BiolChem. 273: 1699-1704.

Tee, M.K., Thomson, A.A., Bristow, J. and Miller, W.L. (1995). Sequences promoting the transcription of the human XA gene overlapping P450c21A correctly predict the presence of a novel, adrenal-specific, truncated form of tenascin-X. Genomics 28: 171- 178.

Terasaki, P.I. (1991). History of transplantation : thirty-five recollections. UCLA Tissue Typing Laboratory, Los Angeles.

249 Thomsen, M., Molvig, J., Zerbib, A., da, P.C., Abbal, M., Dugoujon, J.M., Ohayon, E., Svejgaard, A., Cambon-Thomsen, A. and Nerup, J. (1988). The susceptibility to insulin- dependent diabetes mellitus is associated with C4 allotypes independently of the association with HLA-DQ alleles in HLA-DR3,4 heterozygotes. Immunogenetics 28: 320-327.

Tilley, C.A., Romans, D.G. and Crookston, M.C. (1978). Localisation of Chido and Rodgers determinants to the C4d fragment of human C4. Nature 276: 713-715.

Tisch, R. and McDevitt, H. (1996). Insulin-dependent diabetes mellitus. Cell 85: 291- 297.

Tiwari, J. L. and Terasaki, P. I. (1985) HLA and Disease Associations. Springer-Verlag, New York.

Todd, J.A., Bell, J.L and McDevitt, H.O. (1987). HLA-DQ beta gene contributes to susceptibility and resistance to insuhn-dependent diabetes mellitus. Nature 329: 599- 604.

Todd, J.A. (1995). Genetic analysis of type 1 diabetes using whole genome approaches. Proc.Natl.Acad.Sci USA 92: 8560-8565.

Tomlinson, I.P. and Bodmer, W.F. (1995). The HLA system and the analysis of multifactorial genetic disease. Trends Genet. 11: 493-498.

Trowsdale, J. (1993). Genomic structure and function in the MHC. Trends Genet. 9: 117-122.

Trowsdale, J. (1996).Molecular genetics of HLA class I and class H regions. In "HLA and MHC: genes, molecules and function" (Browning, M. and McMichael, A., Eds.), pp 23-38.

Tusie-Luna, M.T. and White, P C. (1995). Gene conversions and unequal crossovers between CYP21 (steroid 21- hydroxylase gene) and CYP21P involve different m&chdimsms. Proc.Natl.Acad.Sci.USA 92: 10796-10800.

Uring-Lambert, B., Mascart-Lemone, P., Tongio, M.M., Goetz, J. and Hauptmann, G. (1989). Molecular basis of complete C4 deficiency. A study of three patients. Hum.Immunol. 24: 125-132.

250 Venneker, G.T., Westerhof, W., de, V., I, Drayer, N.M., Wolthers, B.G., de Waal, L.P., Bos, J.D. and Asghar, S.S. (1992). Molecular heterogeneity of the fourth component of complement (C4) and its genes in vitiligo. J Invest.DermatoL 99: 853-858.

Volanakis, I.E., Zhu, Z.B., Schaffer, P.M., Macon, K.J., Palermos, J., Barger, B.C., Go, R., Campbell, R.D., Schroeder, H.W.J. and Cooper, M.D. (1992). Major histocompatibility complex class lH genes and susceptibility to immunoglobulin A deficiency and common variable immunodeficiency. J.Clin.Invest. 89: 1914-1922.

Vyse, T.J. and Todd, J.A. (1996). Genetic analysis of autoimmune disease. Cell 85: 311-318.

Vyse, T.J. and Kotzin, B.L. (1998). Genetic susceptibility to systemic lupus Qxyth&rxidXosns. Annu.Rev.Immunol. 16: 261-292.

Wahls, W.P., Wallace, L.J. and Moore, P.D. (1990). Hypervariable minisatellite DNA is a hotspot for homologous recorribination in human cells. Cell 60: 95-103.

Walls, E. V. and Crawford, D.H. (1987). Generation of human B lymphoblastoid cell lines using Epstein-Barr virus. In "Lymphocytes: a practical approach" (Klaus, G. G. B., Eds.), pp 149-162.

Warren, R.P., Singh, V.K., Cole, P., Odell, J.D., Pingree, C.B., Warren, W.L., DeWitt, C.W. and McCullough, M. (1992). Possible association of the extended MHC haplotype B44-SC30-DR4 with autism. Immunogenetics 36: 203-207.

Weg-Remers, S., Brenden, M., Schwarz, E., Witzel, K., Schneider, P.M., Guerra, L.K., Rehfeldt, I.R., Lima, M.T., Hartmann, D., Petzl-Erler, M.L., de, M., I and Mauff, G. (1997). Major histocompatibility complex (MHC) class DI genetics in two Amerindian tribes from southern Brazil: the Kaingang and the Guarani. Hum.Genet. 100: 548-556.

Welch, T.R., Beischel, L.S. and Choi, E.M. (1989). Molecular genetics of C4B deficiency in IgA nephropathy. Hum.Immunol. 26: 353-363.

White, P.C., Grossberger, D., Onufer, B.J., Chaplin, D.D., New, M.L, Dupont, B. and Strominger, J.L. (1985). Two genes encoding steroid 21-hydroxylase are located near the genes encoding the fourth component of complement in man. Proc.Natl.Acad.Sci USA 82: 1089-1093.

White, P.C., New, M.L and Dupont, B. (1986). Structure of human steroid 21- hydroxylase genes. Proc.Natl.Acad.Sci. USA 83: 5111-5115.

251 White, P.C., Tusie-Luna, M.T., New, M.L and Speiser, P.W. (1994). Mutations in steroid 21-hydroxylase (CYP21). Hum.Mutat. 3: 373-378.

Wicker, L.S., Appel, M.C., Dotta, P., Pressey, A., Miller, B.J., DeLarato, N.H., Fischer, P.A., Boltz, R.C.J. and Peterson, L.B. (1992). Autoimmune syndromes in major histocompatibility complex (MHC) congenic strains of nonobese diabetic (NOD) mice. The NOD MHC is dominant for insulitis and cyclophosphamide-induced diabetes. J.Exp.Med. 176: 67-77.

Wicker, L.S., Todd, J.A. and Peterson, L.B. (1995). Genetic control of autoimmune diabetes in the NOD mouse. Annu.Rev.Immunol. 13: 179-200.

Widner, W.R. and Wickner, R.B. (1993). Evidence that the SKI antiviral system of Saccharomyces cerevisiae acts by blocking expression of viral mRNA. Mol.Cell Biol. 13: 4331-4341.

Wolf, E., Spencer, K.M. and Cudworth, A.G. (1983). The genetic susceptibility to type 1 (insulin-dependent) diabetes: analysis of the HLA-DR association. Diabetologia 24: 224-230.

Woychik, R.P., Klebig, M.L., Justice, M.J., Magnuson, T.R., Avner, E.D. and Avrer, E.D. (1998). Functional genomics in the post-genome era. Mutat.Res. 400: 3-14.

Wu, A.Y., Schulman, S.J., Marconi, L.A., Reilly, C.R., Scott, B. and Lo, D. (1998). Protection against diabetes by MHC heterozygosity and reversal by cyclophosphamide. Celllmmunol. 184: 112-120.

Wu, L.C., Morley, B.J. and Campbell, R.D. (1987). Cell-specific expression of the human complement protein factor B gene: evidence for the role of two distinct 5'- flanking elements. Cell 48: 331-342.

Yamaguchi, Y., Takagi, T., Wada, T., Yano, K., Furuya, A., Sugimoto, S., Hasegawa, J. and Handa, H. (1999). NELF, a multisubunit complex containing RD, cooperates with DSIF to repress RNA polymerase 0 elongation. Cell 97: 41-51.

Yang, Z , Shen, L., Dangel, A.W., Wu, L.C. and Yu, C.Y. (1998). Four ubiquitously expressed genes, RD (D6S45)-SKI2W (SKIV2L)-DOM3Z-RPl (D6S60E), are present between complement component genes factor B and C4 in the class m region of the HLA. Genomics 53: 338-347.

252 Yang, Z. and Yu, C.Y. (1999). Organizations and gene duplications o f the human and mouse MHC complement gene clusters. Exp.Clin.Immunogenet.(\n press).

Yang, Z., Mendoza, A.R., Welch, T.R., Zipf, W.B., and Yu, C.Y. (1999) Modular variations of HLA class HI genes for serine/threonine kinase RP, complement C4, steroid 21-hydroxylase CYP21 and tenascin TNX (RCCX): a mechanism for gene deletions and disease associations. J.Biol.Chem. 274. 12147-12156.

Yoon, J.W. (1995). A new look at viruses in type 1 diabetes. Diabetes Metab.Rev. 11: 83-107.

Yu, C.Y., Belt, K.T., Giles, C.M., Campbell, R.D. and Porter, R.R. (1986). Structural basis of the polymorphism of human complement components C4A and C4B: gene size, reactivity and antigenicity. EMBO J.5: 2873-2881.

Yu, C.Y. and Campbell, R.D. (1987). Definitive RFLPs to distinguish between the human complement C4A/C4B isotypes and the major Rodgers/Chido determinants: application to the study of C4 null alleles. Immunogenetics 25: 383-390.

Yu, C.Y., Campbell, R.D. and Porter, R.R. (1988). A structural model for the location of the Rodgers and the Chido antigenic determinants and their correlation with the human complement component C4A/C4B isotypes. Immunogenetics 27: 399-405.

Yu, C.Y. (1991). The complete exon-intron structure of a human complement component C4A gene. DNA sequences, polymorphism, and linkage to the 2\- hydroxylase gene. 7 //TimMnoZ. 146: 1057-1066.

Yu, C.Y., Wu, L.C., Buluwela, L. and Milstein, C. (1993). Cosmid cloning and walking to map human CDl leukocyte differentiation antigen genes. Methods Enzymol. 217: 378-398.

Yu, C.Y. (1999). Molecular genetics of the human MHC complement gene cluster. Exp.Clin.Immunogenet. 15: 213-230.

253 IMAGE EVALUATION TEST TARGET (Q A -3)

/r

%

L à |2j^ 1.0 y ° m L à Il 2.2 L i Là L à Ui L i L à l.l U m m 1.8

1.25 1.4 1.6

150mm

V

V ^PPl_IED A KVI/4GE . Inc 1653 East Main Street . ' X • > Rochester, NY 14609 USA .a= -.^ = Phone: 716/4820300 f ' Fax: 716/288-5989

0 1993. Applied Image. Inc.. Ail Rights Reserved

o/