PREFERENTIAL ALLELIC EXPRESSION OF GENETIC INFORMATION

ON HUMAN 7

LAYLA KATIRAEE

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy

Department of Molecular Genetics University of Toronto

© Copyright by Layla Katiraee 2008

PREFERENTIAL ALLELIC EXPRESSION OF GENETIC INFORMATION

ON HUMAN

LAYLA KATIRAEE

Doctor of Philosophy

Department of Molecular Genetics

University of Toronto

2008

ABSTRACT

Genes are typically expressed in equal amounts from both parentally inherited . However, recent studies have demonstrated that can be preferentially transcribed from a locus. Non-random preferential expression of alleles can occur in a parent-of-origin pattern, known as imprinting, where epigenetic factors regulate their transcription. Alternatively, it can occur in a haplotype-specific pattern, where cis-acting polymorphisms in regulatory regions are thought to underlie the phenomenon. Both forms of unequal allelic expression have been associated with human disease. Consequently, it is important to identify genes subject to unequal allelic expression and characterize mechanisms that regulate differential transcription.

This thesis presents the results of a screen for unequal allelic expression where approximately 50 murine transcripts homologous to genes on human chromosome 7 were analyzed. Human chromosome 7 was selected due to its association with several human disorders that show parent-of-origin effects. The screen identified non-imprinted

ii

preferential allelic expression in numerous transcripts and demonstrated that such patterns can occur in tissue specific patterns.

Paraoxonase-1 (Pon1), a implicated in arthrosclerosis, was identified as having a dynamic pattern of allelic expression which varies throughout embryonic development.

This finding represents the first report of a developmentally regulated pattern of allelic variance. Carboxypeptidase-A4 (Cpa4) was identified as having a tissue-specific imprinted pattern of expression, where the maternal allele was preferentially expressed in all embryonic tissues, with the exception of the brain. The Krüppel-like factor 14 gene (Klf14), a novel imprinted transcript, was found to have ubiquitous maternal expression in all human and murine tissues analyzed. A differentially methylated region, generally associated with imprinted transcripts, was not found in the gene’s CpG island, nor was a differential pattern of histone modifications identified. However, it was determined that maternal methylation regulates the transcript.

The data in this thesis contribute to our understanding of the numerous patterns of allelic expression that exist in nature and the diverse mechanisms that regulate them.

Ultimately, quantitative analyses of allelic expression patterns and the identification of their underlying genomic DNA sequences will become standard protocol in all biomedical studies.

iii

ACKNOWLEDGMENTS

I would like to thank my supervisor Dr. Stephen Scherer for his confidence in me and for his support, as well as my committee members Dr. Timothy Hughes and Dr. Christopher

Pearson for their guidance and encouragement. I would like to thank Dr. Kazuhiko

Nakabayashi for introducing me to the field of epigenetics, for the overwhelming amount of guidance that he offered throughout the course of my doctoral degree, for exemplifying scientific creativity and discipline, for answering all my frantic emails, and for his friendship.

To Dr. Takahiro Yamada, I thank him for his assistance and collaboration in countless experiments, for his patience in my training, for his wit and humour. To all the post-docs, technicians, and students who assisted me, particularly Dr. Kohji Okamura, Dr. Makiko

Meguro-Horike, Dr. Shin-ichi Horike, Dr. Yan Ren, Michelle Lee, Maryam Mashreghi-

Mohammadi, Katherine Sansom, and Xiaochu Zhao, I owe a debt of gratitude. I’d like to thank our collaborators who made the findings in this thesis possible. I’d like to thank Dr. Johanna

Rommens for her guidance and wisdom. I’d particularly like to thank Dr. Helen Petropoulos for her encouragement, friendship, movie and television critiques, for teaching me all the basics, and for her help troubleshooting several experiments. I thank Adam Smith for his experimental assistance and the critical reading of this thesis. I’d like to thank all past and present members of TCAG and the Scherer lab, particularly the members of the Sequencing and Tissue Culture facilities, as well as the bioinformaticians. I’d like to thank Dr. Ilona Skerjanc and Dr. Robert

Hegele for helping me get where I am today.

I’d like to thank all the members of the Scherer, Minassian, and Pearson labs for their friendship and help, especially Julie Turnbull, Dr. Iulia Oprea, Julie Tomlinson, Sanjeev

Pullenayegum, Dr. Elayne Chan, and Dr. Christian Marshall. To Dr. Lars Feuk and particularly

Andrew Carson, with whom I’ve solved over 2000 crossword puzzles, hundreds of wordraces, cricklers, and iSketches, and have spent countless hours discussing the merits of TV shows, I

iv

owe an endless debt of gratitude for their friendship, support, and assistance. Moving my desk into the lab may have been one of the most difficult, yet responsible decisions I made throughout my PhD, but the ‘conversations’ we had at Sick Kids will always make me smile.

To Jennifer Skaug, who epitomizes dedication, discipline, and patience, I can never thank enough. I doubt that I will ever find a co-worker and friend as kind and understanding wherever

I may go.

I’d like to thank the members of the Association for Baha’i Studies at the University of

Toronto, the Baha’i Community of Toronto, Varqa Children’s Magazine, and all the members of my Ruhi study circles for enriching my life. To my friends in Toronto who brightened these past five years: Ted & Lindsay Slavin, Sherine & Nabil Kharooba, Lynn & Jason Arsenault, the

Sahba Family, Poopak & Hilda Raad, Robyn & Zayne Raue, and Carol Forster. I thank the countless artists whose music kept me company and entertained me on evenings and weekends

☺ I would like to thank my extended family for their support and prayers along this long trek, particularly my grandparents, my aunt Guity and uncle Faramarz, my cousins Ala, Sina, and

Lisa, as well as my in-laws Sam and Barbara, who I consider one of my guardian angels. I would like to thank my siblings Ema, Babak, and Galen and would like to point out that I’m the only real doctor in the family. This thesis would have never been completed without the love, support and encouragement of my parents, Hamid and Mahboobeh, who took care of me when

I needed it the most, always believed in me, and let me achieve my goals.

Finally, I’d like to dedicate this thesis to my husband and my best friend, David Parker.

You’ve brightened every hour of these past five years, including my darkest days. You gave me the strength that I needed to finish and held my hand every step of the way, reminding me why

I’m the lucky one. You’ve stood by me and believed in me like nobody ever has. This would have been impossible without you and I look forward to getting where I’m going with you.

v

TABLE OF CONTENTS

THESIS ABSTRACT...... ii ACKNOWLEDGEMENTS...... iv TABLE OF CONTENTS...... vi LIST OF TABLES...... ix LIST OF FIGURES...... x LIST OF APPENDICES...... xii

CHAPTER I: INTRODUCTION TO UNEQUAL ALLELIC EXPRESSION I.A UNEQUAL ALLELIC EXPRESSION...... 2 I.B UNEQUAL ALLELIC EXPRESSION IN NON-PARENT-OF-ORIGIN PATTERNS...... 4 I.B.1 Physiological importance of unequal allelic expression in non-parent of origin patterns...... 4 I.C UNEQUAL ALLELIC EXPRESSION IN PARENT-OF-ORIGIN PATTERNS...... 5 I.C.1 Imprinting disorders...... 6 I.C.2 Regulation of imprinting...... 9 I.C.2.i DNA methylation...... 9 I.C.2.ii Histone modifications...... 11 I.C.2.iii Insulation...... 12 I.C.2.iv Non-coding RNA...... 13 I.C.3 Imprinted clusters in human and murine genomes...... 17 I.C.3.i The H19/Igf2 locus...... 17 I.C.3.ii Imprinted loci on human chromosome 7...... 18 I.C.3.ii.a GRB10...... 18 I.C.3.ii.b PEG10/SGCE region...... 20 I.C.3.ii.c MEST/COPG2 region...... 21 I.C.4 Identification of imprinted genes...... 24 I.C.4.i Identification of candidate imprinted genes...... 24 I.C.4.ii Characterization of imprinted expression...... 25 I.C.5 Theories on the evolution of imprinted expression...... 26 I.D THESIS OVERVIEW...... 29 I.E REFERENCES...... 31

CHAPTER II: ALLELIC ANALYSIS OF CANDIDATE IMPRINTED GENES ON HUMAN CHROMOSOME 7 II.A INTRODUCTION...... 48 II.B MATERIALS AND METHODS...... 50 II.C RESULTS...... 57 II.C.1 Transcripts selected for imprinting analysis...... 57 II.C.2 Preferential allelic expression is a common phenomenon...... 58 II.C.3 Analysis of preferential allelic expression in placental tissues...... 63 II.D DISCUSSION...... 67 II.E REFERENCES...... 72

vi

CHAPTER III: DYNAMIC VARIATION IN ALLELE-SPECIFIC OF PARAOXONASE-1 IN MURINE TISSUES THROUGHOUT DEVELOPMENT III.A INTRODUCTION...... 77 III.B MATERIALS AND METHODS...... 80 III.B.1 Allelic expression analysis of murine tissues...... 80 III.B.2 Quantification of allelic ratios...... 81 III.B.3 Real-time quantitative PCR...... 82 III.B.4 Methylation analysis...... 82 III.B.5 Human PON1 expression analysis...... 83 III.C RESULTS...... 85 III.C.1 Pon1 allelic expression is dynamic throughout embryonic development...... 85 III.C.2 Expression from Pon1 alleles increase disproportionately throughout embryonic development...... 89 III.C.3 Screening of candidate SNPs responsible for allele specific gene expression and promoter methylation analysis...... 93 III.C.4 Allelic expression analysis for the human PON1 gene...... 95 III.D DISCUSSION...... 97 III.E REFERENCES...... 101

CHAPTER IV: IMPRINTING ANALYSIS OF MURINE CARBOXYPEPTIDASE-A4 IV.A INTRODUCTION...... 106 IV.B MATERIALS AND METHODS...... 108 IV.B.1 Expression analysis...... 108 IV.B.2 Methylation analysis...... 109 IV.C RESULTS...... 110 IV.C.1 Cpa4 is imprinted in murine embryonic tissues, yet displays biallelic expression in the fetal brain...... 110 IV.C.2 Regulation of Cpa4 imprinted expression...... 115 IV.D DISCUSSION...... 117 IV.E REFERENCES...... 121

CHAPTER V: IDENTIFICATION OF THE IMPRINTED KLF14 TRANSCRIPTION FACTOR ON HUMAN CHROMOSOME 7q32 V.A INTRODUCTION...... 126 V.B MATERIALS AND METHODS...... 128 V.B.1 RT-PCR using RNA from somatic cell hybrids, human tissues, and murine embryonic samples...... 128 V.B.2 Methylation analysis...... 128 V.B.3 Rapid amplification of cDNA ends (RACE) to determine full length sequence of KLF14...... 129 V.B.4 Chromatin immunoprecipitation (ChIP) and analysis of histone modifications...129 V.B.5 Amplification of KLF14 and KLF16 in mammalian species...... 130 V.B.6 Sub-cellular localization...... 131

vii

V.C RESULTS...... 132 V.C.1 Maternal specifix expression of human and murine KLF14 in embryonic and extra-embryonic murine tissues...... 132 V.C.2 Histone modifications in Klf14 and Mest CpG islands...... 136 V.C.3 Methylation analysis and Klf14 expression in Dnmt3a knockout mice...... 140 V.C.4 Characterization of the human and murine KLF14 transcripts...... 144 V.C.5 Expression of human and murine KLF14...... 145 V.C.6 Functional prediction and sub-cellular localization of KLF14...... 147 V.C.7 Syntenic analysis of KLF14...... 150 V.C.7 Sequence of KLF14 in RSS patients, autistic and control individuals...... 154 V.D DISCUSSION...... 158 V.D.1 Imprinting of KLF14...... 158 V.D.1 Evolution of KLF14...... 161 V.E REFERENCES...... 165

CHAPTER VI: SUMMARY AND FUTURE DIRECTIONS VI.A SUMMARY AND FUTURE DIRECTIONS...... 170 VI.B REFERENCES...... 178

viii

LIST OF TABLES

CHAPTER I: INTRODUCTION TO UNEQUAL ALLELIC EXPRESSION TABLE I.1 IMPRINTED GENE DISORDERS...... 8 TABLE I.2 IMPRINTED LOCI ON HUMAN CHROMOSOME 7...... 23

CHAPTER II: ALLELIC ANALYSIS OF CANDIDATE IMPRINTED GENES ON HUMAN CHROMOSOME 7 TABLE II.1 PRIMERS USED IN PREFERENTIAL ALLELIC EXPRESSION ANALYSIS...... 53 TABLE II.2 TRANSCRIPTS STUDIED FOR PREFERENTIAL ALLELIC EXPRESSION...... 56

CHAPTER III: DYNAMIC VARIATION IN ALLELE-SPECIFIC GENE EXPRESSION OF PARAOXONASE-1 IN MURINE TISSUES THROUGHOUT DEVELOPMENT TABLE III.1 QUANTITATIVE PCR AND PYROSEQUENCING RESULTS FOR Pon1...... 92

CHAPTER IV: IMPRINTING ANALYSIS OF MURINE CARBOXYPEPTIDASE-A4 TABLE IV.1 FREQUENCY OF JF1 ALLELE IN TISSUES OF F1 HYBRID OFFSPRING...... 113

ix

LIST OF FIGURES

CHAPTER I: INTRODUCTION TO UNEQUAL ALLELIC EXPRESSION FIGURE I.1 FORMS OF UNEQUAL ALLELIC EXPRESSION...... 3 FIGURE I.2 REGULATION OF IMPRINTED EXPRESSION...... 14

CHAPTER II: ALLELIC ANALYSIS OF CANDIDATE IMPRINTED GENES ON HUMAN CHROMOSOME 7 FIGURE II.1 EXPERIMENTAL OUTLINE OF SCREEN FOR PREFERENTIAL ALLELIC EXPRESSION OF CANDIDATE GENES ON HUMAN CHROMOSOME 7...... 52 FIGURE II.2 VARIATION IN PREFERENTIAL ALLELIC EXPRESSION LEVELS...... 60 FIGURE II.3 COMPARISON OF POLYMORPHIC PEAKS BETWEEN GENOMIC DNA AND cDNA TO DETERMINE PREFERENTIAL ALLELIC EXPRESSION...... 61 FIGURE II.4 STRAIN SPECIFIC EFFECTS IN PREFERENTIAL ALLELIC EXPRESSION...... 62 FIGURE II.5 IMPRINTING ANALYSIS OF Hoxa11...... 65 FIGURE II.6 IMPRINTING ANALYSIS OF Hoxa11 USING F2 HYBRIDS...... 66

CHAPTER III: DYNAMIC VARIATION IN ALLELE-SPECIFIC GENE EXPRESSION OF PARAOXONASE-1 IN MURINE TISSUES THROUGHOUT DEVELOPMENT FIGURE III.1 OVERVIEW OF THE MURINE Pon1 LOCUS...... 79 FIGURE III.2 EXPRESSION PATTERN OF Pon1...... 87 FIGURE III.3 COMPARISON OF PYROSEQUENCING AND SNAPSHOT METHODOLOGIES...... 88 FIGURE III.4 REAL-TIME QUANTITATIVE PCR ANALYSIS OF Pon1...... 90 FIGURE III.5 METHYLATION ANALYSIS OF Pon1 PROMOTER IN JF1 X BL6 HYBRID EMBRYOS...... 94 FIGURE III.6 HUMAN PON1 EXPRESSION...... 96

CHAPTER IV: IMPRINTING ANALYSIS OF MURINE CARBOXYPEPTIDASE-A4 FIGURE IV.1 IMPRINTED MATERNAL EXPRESSION OF Cpa4 IN MURINE TISSUES...... 112 FIGURE IV.2 IMPRINTING ANALYSIS OF 9.5 DPC TISSUES...... 114 FIGURE IV.3 METHYLATION ANALYSIS OF MURINE Cpa4 PUTATIVE PROMOTER REGION..116

CHAPTER V: IDENTIFICATION OF THE IMPRINTED KLF14 TRANSCRIPTION FACTOR ON HUMAN CHROMOSOME 7q32 FIGURE V.1 HUMAN AND MURINE KLF14 STRUCTURE...... 127 FIGURE V.2 IMPRINTING ANALYSIS OF MURINE Klf14 BY RFLP...... 134 FIGURE V.3 IMPRINTING ANALYSIS OF HUMAN AND MURINE KLF14...... 135 FIGURE V.4 EPIGENETIC MODIFICATIONS OF MURINE Klf14...... 138 FIGURE V.5 EXPRESSION OF Klf14 IN OFFSPRING OF Dnmt3a CONDITIONAL KNOCKOUT MICE...... 142 FIGURE V.6 BISULFITE SEQUENCING OF MURINE AND HUMAN DNA...... 143 FIGURE V.7 HUMAN AND MURINE KLF14 EXPRESSION...... 146 FIGURE V.8 MULTISPECIES AMINO ACID ALIGNMENT OF KLF14 OPEN READING FRAME. 148 FIGURE V.9 CELLULAR LOCALIZATION OF THE MURINE Klf14 ...... 149 FIGURE V.10 RELATIONSHIP BETWEEN THE HUMAN KLF-PROTEIN TRANSCRIPTION FACTORS...... 152 FIGURE V.11 RETROTRANSPOSITION OF KLF14 AND MAMMALIAN EVOLUTION...... 153

x

FIGURE V.12 KLF14 ORF SEQUENCES IN THE HUMAN, CHIMPANZEE AND GORILLA...... 155 FIGURE V.13 HAPLOTYPE FREQUENCIES OF KLF14 OPEN READING FRAME IN THE HUMAN POPULATION...... 157

CHAPTER VI: SUMMARY AND FUTURE DIRECTIONS FIGURE VI.1 MODEL FOR THE REGULATION OF THE Mest/Klf14 IMPRINTED CLUSTER...... 174

APPENDIX I: ANALYSIS OF PARAOXONASE-1 SPLICE VARIANT APPENDIX FIGURE 1 EXPRESSION PATTERN OF AK050119...... 182 APPENDIX FIGURE 2 COMPARISON OF PYROSEQUENCING AND SNAPSHOT METHODOLOGIES...... 183

xi

LIST OF APPENDICES

APPENDIX I: ANALYSIS OF PARAOXONASE-1 SPLICE VARIANT...... 180

APPENDIX II: LIST OF ABBREVIATIONS...... 184

xii 1

CHAPTER I: INTRODUCTION TO UNEQUAL ALLELIC EXPRESSION

2

I.A UNEQUAL ALLELIC EXPRESSION

Mendelian genetics is based on the assumption that inherited alleles are equally expressed. However, in the past half century, it has been revealed that genes can be mono- allelically or preferentially expressed. Monoallelic gene expression can occur in a stoichastic fashion, where either allele is randomly silenced, and is exemplified by the silencing of the X-chromosome in female embryos. However, this phenomenon also occurs in autosomal genes, such as protocadherin genes (1), T and B cell receptors, as well as olfactory receptors (2). Recent studies estimate that approximately 5% of autosomal genes are subject to random monoallelic expression (3). Preferential gene expression can also occur in a non-random pattern and can be broadly classified into two categories: the preferential expression of alleles in a parent-of-origin pattern (genomic imprinting) and unequal allelic expression in a non-parent-of-origin pattern, where a specific haplotype is preferentially expressed (Figure I.1).

3

Figure I.1

Figure I.1 Forms of unequal allelic expression. Column on the left depicts unequal allelic expression in a parent-of-origin pattern (imprinting). Column on right depicts unequal allelic expression in a non-parent-of-origin pattern or preferential expression of a haplotype. Within each quadrant, the red and blue rectangles on the left depict maternal and paternal alleles, respectively, where the alleles are distinguished by identification of a single nucleotide polymorphism (SNP). The rectangles with the arrows on the right show the alleles which are being preferentially expressed. The method to identify preferential allelic expression using SNPs is explained in section I.C.4.ii

4

I.B UNEQUAL ALLELIC EXPRESSION IN NON-PARENT-OF-ORIGIN PATTERNS

The preferential expression of alleles in a non-parent-of-origin pattern has been shown to be a common occurrence in human, mouse, and maize (4-8). Genes that are subject to preferential allelic expression are scattered throughout genomes and are seldom clustered (4, 9). A recent expression analysis of 1,389 genes suggested that approximately

53% of these showed differential expression in white blood cells in at least 1 of 12 individuals examined (8). The difference in expression between two alleles has been shown to be as high as 90% (10), and variations in cis-acting DNA regulatory elements are considered to be primarily responsible for such allele-specific gene expressions (10, 11).

By analyzing multi-generational pedigrees, it has been demonstrated that the preferential expression of an allele can be linked to a specific haplotype whose expression follows

Mendelian laws of inheritance (10, 12). However, analyses in unrelated individuals have demonstrated that only a minority of individuals heterozygous for a transcribed polymorphism demonstrate unequal allelic expression (5), and that the ratio of unequal gene expression can vary between tissues (4).

The physiological importance of these differences is gradually being unravelled, and is highlighted by the fact that altered levels of expression of polymorphic alleles have been implicated in disease pathogenesis and have functional significance (13-16).

I.B.1 Physiological importance of unequal allelic expression in non-parent-of-origin patterns

In 2002, Yan and colleagues demonstrated that a decrease in the expression of one allele in the adenomatous polyposis coli tumour suppressor gene was linked to the development of familial adenomatous polyposis (13). Additionally, by analyzing tumours exhibiting loss of heterozygosity, they demonstrated that the lost allele was the one

5 associated with higher levels of expression. Interestingly, the allele with decreased expression levels did not contain any coding mutations or mutations in the promoter or untranslated regions, suggesting that variations in cis-acting elements leading to unequal allelic expression may be located in introns or in distant regions.

At the same time, a common polymorphism in the cytotoxic T lymphocyte antigen

4 gene (CTLA4) was associated with an increased risk for autoimmune disorders (17). The risk associated haplotype contains a single polymorphism and was found to be expressed at lower levels. A similar study identified a haplotype of peptidylarginine deiminase citrullinating enzyme 4 (PADI4) associated with susceptibility to rheumatoid arthritis. The mRNA produced from the associated haplotype was found to have a significantly different level of stability than the non-susceptible allele (18). A study performed by Laitinen, et al, in 2004, identified a common susceptibility locus associated with asthma in various populations (16). An analysis of the locus revealed a gene (GPRA) with splice variants that differed at the 3’-exon. Immunohistochemical studies using tissues from asthmatic and healthy individuals revealed an imbalance between the two protein isoforms of GPRA in affected patients. The authors hypothesized that polymorphisms in regulatory regions may create affect splicing of GPRA. Differential expression of polymorphic alleles have also been implicated in obesity (14), myocardial infarction (19), and other complex diseases (for review, see (20)) stressing the importance of this biological phenomenon in human morbidity.

I.C UNEQUAL ALLELIC EXPRESSION IN PARENT-OF-ORIGIN PATTERNS

The expression of alleles in a parent-of-origin pattern is also known as genomic imprinting. In mammals, this phenomenon was first identified in 1975 when Takagi and

Sasaki determined that the paternal X chromosome was always inactivated in murine extra-

6 embryonic tissues (21). The term “genomic imprinting” was used in 1984, when studies showed that both maternal and paternal genomes are necessary for the proper development of an embryo, suggesting the existence of specific “imprints” on the haploid genomes during gametogenesis (22, 23). Subsequent studies determined that uniparental disomy of specific chromosomal regions lead to various phenotypic anomalies (24). The first imprinted genes were identified in 1991 (25-27), and today at least 84 imprinted genes have been identified (28).

I.C.1 Imprinting disorders

Imprinted genes often play important roles in embryonic development and regulation of growth. Consequently, aberrations in the expression of these genes have been associated with various developmental disorders (Table I.1). Additionally, several complex diseases have shown parent-of-origin effects in linkage studies, where the disease associated allele is inherited more frequently from a specific parent in affected individuals.

Most notably, parent-of-origin effects have been observed autism (29), bipolar disorder

(30), epilepsy (31), Alzheimer’s (32, 33) and Williams-Beuren syndrome (34). However, imprinted genes have not been associated in any of these disorders.

The study of uniparental disomy patients and individuals with genomic anomalies has been invaluable to the discovery of imprinted loci. Most notably, the identification of microdeletions that were in common between Angelman syndrome (AS) and Prader-Willi syndrome patients led to the identification of a single imprinting centre for the entire region

(35). Subsequent studies led to the discovery of maternally expressed genes, considered to be candidates for AS due to their loss of expression in patients with paternal uniparental disomy (pUPD) (36, 37), and mutations in the maternally expressed UBE3A gene in AS patients (38, 39).

7

In the past decade, murine studies have helped clarify the function and phenotypic effect of various imprinted genes. Knockouts of several of these transcripts have been shown to lead to early embryonic lethality, accounting for the death of parthenogenetic and androgenetic embryos (40-42). Studies of these and other imprinted genes have demonstrated that they often play important roles in the development of the placenta. These placenta-specific functions may underlie the influence that imprinted genes exert on embryonic development since they may regulate the flow of nutrients from the mother to the embryo (for review, see (43)). This concept is central in the social-conflict hypothesis of genomic imprinting (see section I.C.5).

While many imprinted disorders are characterized by overgrowth or growth retardation, several are associated with abnormal behaviour (Table I.1). Murine studies have been less successful at discovering the role of imprinted genes in behaviour due to the complexities of human cognition and intelligence. However limited these studies may be, they have demonstrated a clear role of imprinted genes in mammalian behaviour. Most notably, two paternally expressed genes (Peg1/Mest and Peg3) have been shown to regulate maternal behaviour, and knockouts of either of these genes can lead to the death of infants due to impaired maternal feeding and nesting (44, 45).

8

Table I.1 Imprinted gene disorders Disorder Locus PO Phenotype Beckwith-Wiedemann Gigantism, enlarged tongue, 11p15.5 pUPD Syndrome exomphalos, Wilms' tumor Obesity, short stature, mental Prader-Willi Syndrome 15q11-13 mUPD retardation Mental retardation, speech and Angelman Syndrome 15q11-13 pUPD language limitations, abnormal behaviour Parathyroid hormone-resistant Pseudohypoparathyrodism type 20q13.2 pUPD hypocalcemia and IB hyperphosphatemia Intrauterine growth retardation, Transient neonatal diabetes 6q24 pUPD dehydration, deficient insulin mellitus secretion Paternal uniparental disomy Skeletal abnormalities, mental 14q32.3 pUPD chromosome 14 retardation, joint contractures Intrauterine growth retardation, Russell-Silver Syndrome 7* mUPD unique facial features *Denotes that a specific locus has not been associated with the disorder PO: Parent-of-origin; pUPD, mUPD: Paternal/Maternal uniparental disomy

9

I.C.2 Regulation of imprinting

Imprinted genes often share common regulatory mechanisms. Aberrations in the establishment and maintenance of these mechanisms, which are often epigenetic in nature, can lead to loss of imprinting. In the following section, four factors which regulate imprinting will be reviewed: DNA methylation, histone modifications, insulation, and non- coding RNA expression.

I.C.2.i DNA methylation

The methylation of cytosines is an epigenetic modification essential for the proper development of many organisms, and most commonly occurs at CG dinucleotides.

Cytosine methylation is heritable and can determine gene expression patterns. DNA methylation represses transcription in several different manners: by interfering with the binding of transcription factors to DNA, by serving as a marker for transcriptional repressors that bind methylated CpGs (such as MECP2), and by recruiting histone modifying enzymes (46).

In imprinted loci, cytosine methylation is associated with differentially methylated regions (DMRs), where the unmethylated allele is most often found on the parental allele that is being expressed, although there are exceptions to this observation (47-49).

Differentially methylated regions often act as imprinting control regions (ICRs), and loss of methylation at these loci can lead to loss of imprinting in numerous genes (50, 51).

Consequently, the proper establishment and maintenance of DMRs throughout development is crucial for imprinted expression. Briefly, germ cells undergo demethylation and DMRs must be re-established in a sex-specific manner. Upon fertilization, these methylated regions must be maintained and escape the genome-wide demethylation which occurs in the pre-implantation embryo (52).

10

Not all DMRs are established in germ cells. Some DMRs, known as secondary

DMRs, are established during embryonic development. These are generally associated with imprinted genes that show tissue specific imprinting or developmental-stage specific imprinting.

The methylation of DNA is carried out by DNA methyl-transferases (DNMTs).

DNMT1 has a preference for hemi-methylated DNA, and is involved in maintaining methylation in replicating DNA (53). The DNMT3 methyl-transferases are more closely involved in imprinting mechanisms due to their role in de novo methylation. In 2004, the creation of conditional knockout mice for two Dnmt3 isoforms demonstrated that Dnmt3a is essential for the establishment of methylated regions associated with imprinted genes in germ cells (54). Consequently, the offspring from Dnmt3a conditional mutant females lacked methylation at all maternally methylated regions. Germ cells from Dnmt3a conditional mutant males lacked methylation in 2/3 paternally methylated regions examined. The same phenotype has been observed for the offspring of Dnmt3L mutants.

This protein is a member of DNMT family, but lacks methyltransferase activity. It is considered to be a cofactor for the DNMTs (55). It has been recently shown that DNMT3L forms a complex with DNMT3A, and that DNMT3L may bring the functional DNA methyltransferase DNMT3A into contact with maternally methylated DMRs. Additionally,

DNMT3L directly binds unmethylated lysine 4 of histone 3 (H3K4) demonstrating that

DNA methylation and histone modification mechanisms are intricately intertwined (56, 57).

The mechanism whereby the methylated regions established by the DNMT3 escape demethylation during embryogenesis remains poorly understood. Recently, the first factor essential in the maintenance of genomic imprinting has been identified. This protein, PGC7/Stella, protects specific DMRs from demethylation during embryogenesis,

11 but not all regions (58). The exact mechanism whereby this protective influence is exerted remains unknown: Stella may directly bind to DNA or may be part of a complex.

I.C.2.ii Histone Modifications

The modification of histones in the chromatin structure is an epigenetic mechanism which regulates gene expression. There are many different types of histone modifications, most notably, the modification of lysines, arginines, and serines in the amino terminal tail of core histones. In general, the methylation of these residues is associated with transcriptional silencing, whereas their acetylation is linked to transcriptional activity. As previously mentioned, the process whereby histone modification takes place is linked to

DNA methylation, where repressive histone modifications can serve as marks which recruit

DNMTs (59). Whole genome analyses of histone modification sites are gradually discovering genomic regions characterized by these epigenetic modifications in selected cell types (60, 61). However, the precise mechanism whereby histone modifying enzymes recognize target regions remains unknown.

Imprinted loci are characterized by the differential modification of histone residues, particularly within ICRs (Figure I.2). It has been shown that histone methylation alone can also drive genomic imprinting, where loss of DNA methylation does not affect parent-of- origin expression (62). Silenced alleles have been associated with methylation of lysines

(K) 9 and 27 on histone 3 (H3K27 and H3K9), (63) methylation of lysine 20 of histone 4

(H4K20) (64), hypoacetylation of histones 3 and 4 (H3ac and H4ac), and hypomethylation

H3K4 (65), among others.

Among the methylated histone residues, only methylation of H3K4 is associated with euchromatin. It has been recently shown that subunits of the histone deacetylase

(HDAC) complex bind to H3K4me3 with high affinity, recognizing and targeting active

12 promoter regions through this modified histone (66). Consequently, this methylated histone residue plays a role in transcriptional silencing, as suggested in the previous section

(I.C.2.i).

Unlike the methylation and demethylation of DNA, much is known about the enzymes and cofactors involved in histone modifications, particularly the HDAC complexes. More recently, histone demethylases have been identified, the first of which was the enzyme LSD1 (Lysine specific demethylase 1) which directly removes methyl groups from H3K4 and H3K9 through an oxidative demethylation reaction (67). The study of these complexes and the regions they target demonstrate that histone modifications are a dynamic process, unlike DNA methylation which is considered to be a more stable and less easily reversible mechanism (68).

I.C.2.iii Insulation

Chromatin insulators divide neighbouring chromosomal regions into distinct transcriptional domains by preventing the interaction of enhancers with promoters

(enhancer-blocking insulators) or by dividing euchromatic and heterochromatic regions

(barrier insulators) (69). The vast majority of vertebrate insulators bind the protein CTCF

(CCCTC-binding factor).

The conserved transcription factor CTCF has 11 zinc-fingers and binds a variety of different sites where it can act as an insulator, transcriptional enhancer or repressor. It has been shown to bind various imprinted loci, most notably the H19/Igf2 domain. The

DMR/ICR at this region contains several CTCF binding sites (70), deletions of which are sufficient to affect imprinting and cause disease (71). CTCF, which is sensitive to DNA

CpG methylation, only binds to the unmethylated maternal allele of the DMR from which

H19 is expressed (72). This effectively blocks access of long range elements to the

13 promoter region of Igf2 on the maternal allele, and consequently, the gene is silenced (73).

On the paternal allele, however, the methylated DMR blocks CTCF from binding. Without the presence of this insulator, enhancers interact with the promoter of Igf2, allowing for its transcription (70, 72) (Figure I.2.B). CTCF is also associated with the GRB10 locus on human chromosome 7 (74), MEG3 on human chromosome 14 (75), Rasgrf1 on murine chromosome 9 (76), and Kcnq1ot1 on murine chromosome 7 (77).

The mechanism whereby CTCF acts as an insulator is complex, yet much light has been shed on it in the past decade. Yusufzai and colleagues demonstrated that insulator regions are localized in distinct nucleolar regions by CTCF, and postulated that these chromatin domains may form loop structures that are silenced. Such closed loops may form clusters where multiple loops may be held together by CTCF’s ability to interact with itself

(78). It has also been demonstrated that the binding of CTCF to an insulator can cause accumulation of RNA polymerase II in the associated gene’s enhancer and its depletion in the promoter (79).

I.C.2.iv Non-coding RNA (ncRNA)

Non-coding transcripts are present in most imprinted loci, are often antisense or intronic to coding genes, and may regulate mono-allelic expression (80). However, not all ncRNA play a role in imprinting mechanisms, as has been demonstrated by the deletion of the H19 transcript (81).

The first ncRNA shown to play a role in the regulation of imprinting was the paternally-expressed Air. Truncation of this molecule by the insertion of a polyadenylation signal caused loss of imprinting of three maternally expressed genes in the region, including two genes that do not overlap with Air. The truncated molecule maintained its own imprinting status, and methylation at the Air promoter was unaltered. Methylation of

14 the promoter of the protein-coding gene which overlaps Air, however, was lost (82). The authors hypothesized that Air may repress the promoter in cis using a mechanism similar to

RNA interference or may recruit trans-acting repressors to the locus, such as DNMTs or

HDACs (Figure I.2.C). Such a role in chromatin remodelling has been attributed to other antisense ncRNA, particularly Tsix, which plays a crucial role in X-chromosome inactivation (83, 84). In a similar study, Kanduri and colleagues analyzed the imprinted non-coding transcript Kcnq1ot1. By incorporating polyadenylation signals at different points in the transcript, they created molecules of varying lengths and demonstrated that the degree of silencing is directly proportional to the length of Kcnq1ot1. At the same time, their study revealed that the first gene to become silenced was the overlapping protein- coding gene, and transcriptional inhibition was subsequently spread to the flanking imprinted genes. This silencing was achieved by chromatin remodelling, through H3K9 methylation (85).

15

Figure I.2

16

Figure I.2 Regulation of imprinted expression. A) Regulation of imprinted expression by DNA methylation and histone modifications. Imprinting control region (ICR) is depicted in purple. On the expressed allele (red), histones (grey) are characterised by acetylation and methylation of H3K4, creating a loose conformation. The DNA on this allele is unmethylated. On the silenced allele (blue), the histones are condensed and are characterized by methylation at residues other that H3K4. The DNA on this allele is methylated (green circles). B) Regulation of imprinted expression by insulation. The example shown mimics the patterns of expression seen in the H19/Igf2 locus. CTCF binds the unmethylated allele (red) blocking long-range enhancers (yellow segments) from interacting with gene 1. On the opposite allele (blue), methylation blocks CTCF from binding, allowing long-range enhancers to interact with gene 1. At the same time, methylation at the ICR silences the expression of gene 2. C) Regulation of imprinted expression by ncRNA. The example shown mimics the patterns of expression seen in the Air and Kcnq1ot1 loci. The non-coding antisense RNA (wave) is transcribed from the unmethylated allele (blue), silencing the protein coding overlapping transcript. On the opposite allele (red), the ICR is methylated, blocking the transcription of the non-coding transcript. This allows the transcription of the protein coding gene (arrow). (Figure I.2.B is modified from (86)).

17

I.C.3 Imprinted clusters in human and murine genomes

Imprinted genes are generally located in clusters that are scattered throughout mammalian genomes. Within these clusters, imprinted transcripts share common regulatory elements. Genomic regions which are crucial to the establishment and regulation of imprinting are termed ICRs. All ICRs are also DMRs and have differentially modified histones. Antisense transcripts that regulate imprinting are often derived from ICRs.

In the following sections, the H19/Igf2 imprinting cluster and the imprinted loci on human chromosome 7 will be reviewed. The former locus is arguably the most widely studied imprinted region. The regions on chromosome 7 will be described due to the focus of this thesis on this chromosome.

I.C.3.i The H19/Igf2 locus

This locus is located on mouse chromosome 7 and human chromosome 11p15. The

IGF2/Igf2 gene is paternally expressed (27), while H19 is expressed from the maternal allele (25). The ICR at the locus is paternally methylated and this imprint is established in the germ cell (87, 88). Deletions of the ICR cause loss of imprinting of both H19 and Igf2

(89, 90). In mice, there are three other DMRs at the locus associated with Igf2, known as

DMR0, DMR1, and DMR2 (91).

Studies have shown that Igf2 and H19 share two endoderm specific enhancers located at the 3’end of H19. Targeted deletions of these cis-acting regulatory elements cause loss of expression of H19 or Igf2, depending on the parental chromosome from which the deletion was inherited (73). Subsequent studies demonstrated that CTCF plays a crucial role in the imprinting of these genes by blocking enhancer activity from Igf2 (70, 72), as described in I.C.2.iii. Briefly, maternal methylation at the ICR blocks CTCF from binding, thereby allowing the long-range enhancers interact with Igf2 (Figure I.2.B).

18

By performing chromatin conformation capture techniques as well as chromatin immunoprecipitation (ChIP), Murrell and colleagues demonstrated that the H19 ICR and the Igf2 DMRs interact (92). On the maternal allele, the H19 ICR was found to interact with Igf2 DMR1, whereas on the paternal allele it interacted with Igf2 DMR2 thereby creating different chromatin loops on each chromosome. The authors hypothesized that the

DMR2-ICR interaction on the paternal allele would place the Igf2 promoter in close proximity with the endoderm enhancers.

In addition to acting as an insulator, studies have shown that CTCF plays a role in maintaining the hypomethylated state of the maternal ICR (93, 94). By performing RNAi,

Fedoriw and colleagues demonstrated that the maternal DMR becomes hypermethylated with decreases in CTCF levels (95). In a similar study, Schoenherr et al introduced point mutations in the four CTCF binding sites located in the ICR, successfully disrupting CTCF binding. In mice carrying the mutated allele, they observed increased methylation on the maternal allele. However, oocytes with the mutated CTCF binding sites had normal methylation patterns, suggesting that CTCF plays a role in maintaining hypomethylation on the maternal allele following fertilization (94).

I.C.3.ii Imprinted loci on human chromosome 7

There are two imprinted clusters on human chromosome 7 and one imprinted gene found on its own (Table I.2). The MEST/COPG2 cluster is found on murine chromosome 6, as is the PEG10/SGCE cluster. The singleton, GRB10, is located on murine chromosome

11.

I.C.3.ii.a GRB10

GRB10/Grb10 (Growth-Factor Receptor Bound Protein 10) is the only known imprinted gene whose expressed parental allele is distinct in different tissues. The gene is

19 maternally expressed in mice, yet specific isoforms are preferentially expressed from the paternal allele in the neonatal brain (96, 97). In humans, however, it is biallelically expressed in the vast majority of tissues, with the exception of the fetal brain, where most isoforms are paternally expressed, and skeletal muscle, where a single isoform is maternally expressed (98).

Methylation analyses have identified a germline derived DMR, which is maternally methylated in both humans and mice (99). There are two different GRB10 gene promoters in the mouse: an upstream promoter specific to maternally expressed isoforms and a downstream promoter specific to paternally expressed transcripts which overlaps the germline derived DMR (97). CTCF has been shown to bind the latter in a methylation sensitive manner and it has been suggested that the protein may act as an insulator controlling the expression of the upstream maternally expressed transcripts. Interestingly, the region to which CTCF binds is absent in the , which may account for the lack of widespread maternal expression seen in the mouse (97).

GRB10 functions as a growth inhibitor and a knockout of the gene gives rise to overgrown mice (100). At the same time, duplications of the region spanning GRB10 have been observed in several patients with Russell-Silver Syndrome (RSS), highlighting the gene’s importance in the syndrome (101, 102). Consequently, GRB10 has been considered to be a candidate for RSS (Table I.1). However, studies have failed to link GRB10 to the aetiology of RSS, particularly since the gene is not transcribed in tissues associated with growth (103). At the same time, mutations in GRB10 have not been identified in non-UPD7 patients, arguing against a role for the gene in the disease (104).

20

I.C.3.ii.b PEG10/SGCE region

The PEG10/SGCE cluster is located at 7q21. The cluster contains several imprinted genes, a fraction of which have been shown to be imprinted in the mouse yet whose status in humans remains unknown. The founding member of this cluster is Sgce (Sarcoglycan-

Epsilon), which is paternally expressed in all tissues in the human and mouse (105, 106).

Mutations in the gene have been associated with myoclonus dystonia, whose penetrance is dependent on the parent-of-origin of the SGCE mutation (107). PEG10 (Paternally

Expressed Gene 10) is also paternally expressed in all human and murine tissues examined and is located 100 basepairs downstream from human SGCE (108, 109). The transcript is retrotransposon derived and a knockout of the gene has been shown to cause early embryonic lethality in the mouse, due to placental defects (40). Several other imprinted genes have been identified in this locus. CALCR (Calcitonin Receptor) has been shown to be maternally expressed in the murine brain and maternally expressed in human-mouse somatic cell hybrids (110, 111). PPP1R9A (Neurabin) is maternally expressed in a tissue- specific pattern in both humans and mice (109, 112). Murine Pon2 and Pon3 (Paraoxonase

2 and 3) are preferentially expressed from the maternal allele in extra-embryonic tissues, although their imprinting status in humans remains unknown. Human PON1 expression has been detected from the paternal allele in human-mouse somatic cell hybrids (111), yet murine Pon1 has been shown to be biallelically expressed, bringing the human data into dispute due to the general conservation of imprinted patterns between humans and mice

(109). Finally, murine Asb4 (Ankyrin repeat and SOCS box-containing 4) has been shown to have ubiquitous maternal expression (113), yet its imprinting status in humans remains unknown.

21

A germline derived DMR has been identified, which overlaps the first exons of

Peg10 and Sgce (109). Despite the presence of numerous imprinted genes in this region, no other DMR has been identified, suggesting that the Peg10/Sgce DMR may act as an ICR

(109). The mechanism whereby the more distant imprinted genes within this region acquire mono-allelic expression has not yet been elucidated.

I.C.3.ii.c MEST/COPG2 region

The MEST/COPG2 locus is located at 7q32 and contains several imprinted genes.

The founding member of this cluster is MEST (Mesoderm-Specific Transcript), also known as PEG1. This paternally expressed transcript has mono-allelic expression in all tissues examined, in both humans and mice (114-116). This allele specific expression pattern has been shown to be isoform specific in humans. As mentioned in section I.C.1, mice lacking

Mest expression have abnormal behaviour, yet they also have intra-uterine growth retardation. As such, human MEST has been considered to be a candidate for RSS (Table

I.1). Most recently, a patient with RSS born after in vitro fertilization was described, where partial hypermethylation was observed at the MEST DMR, suggesting a role for the gene in the aetiology of the disorder (117). It remains unclear whether the hypermethylation observed in this patient is due to RSS or due to the in vitro fertilization procedure, which has been associated with imprinting disorders (118, 119). However, previous studies have failed to find mutations in MEST or abnormal methylation patterns in RSS patients (120,

121).

A transcript located intronic to MEST, termed MESTIT1, has been shown to be paternally expressed in all tissues examined (122, 123), yet a murine orthologue has not been identified to date. The human gene CPA4 (Carboxypeptidase 4), found 150 kilobases

22 centromeric to MEST, is maternally expressed in specific tissues. Murine Cpa4 is also expressed from the maternal allele, as described in chapter IV.

The human COPG2 transcript (Coatomer Protein Complex Subunit Gamma-2), which partially overlaps MEST, was found to be paternally expressed in most tissues (124).

However, these data were later disputed when subsequent studies found biallelic expression for the human gene (125) and maternal expression of murine Copg2 (126). Additionally,

COPG2IT1 (also known as CIT1), orthologous to Copg2as2 (also known as Mit1/Lb9), is paternally expressed in human and murine tissues (125, 126) and is located intronic to

COPG2, in an antisense orientation. Due to the overlap between MEST and COPG2, as well as the presence of numerous isoforms and 3’ends, the analysis of COPG2 has been complex. Lee et al. reported the existence of a paternally expressed transcript antisense to murine Copg2 (Copg2as1). This molecule was found to overlap the 3’UTRs of Mest and

Copg2. However, the authors could not exclude the possibility that the transcript was a splice variant of Copg2as2 or an isoform of Mest (126).

The retrotransposon derived KLF14, located within this cluster, has been shown to have ubiquitous maternal expression in humans and mice, as described in Chapter V.

A DMR, located at the 5’ end of MEST/Mest has been identified (116). This DMR is established in during spermatogenesis (127), and is shown to have differential histone modifications, as described in chapter V. Consequently, this DMR may act as an ICR for the imprinted cluster.

23

Table I.2 Imprinted loci on human chromosome 7 Human Murine Human Murine Tissue Locus Locus Gene Expression Expression Specific 7p12.2 11qA1 GRB10 M,P M,P Y 7q21.3 6qA1 SGCE P P N PEG10 P P N PPP1R9A M M Y CALCR M(?) M Y PON2 U M Y PON3 U M Y ASB4 U M N 7q32.3 6qA3 MEST P P N MESTIT1 P NH N CPA4 M M* Y COPG2 P (disputed) M COPG2IT1 P P N KLF14 M* M* N M: Maternal; P: Paternal; U: Unknown; NH: No Homologue; ?: Not conclusive * denotes expression patterns identified within this thesis

24

I.C.4 Identification of imprinted genes

The identification of imprinted genes is essential to understanding the epigenetic mechanisms underlying their allele-specific expression as well as the diseases caused by aberrations in their regulation. Consequently, many studies have focussed on characterizing and identifying these transcripts, as well as determining their patterns of expression.

However, since only a small fraction of genes are predicted to be imprinted, the selection of candidate imprinted genes is a cornerstone in these studies.

I.C.4.i Identification of candidate imprinted genes

Numerous approaches have been taken towards selecting imprinted genes for expression analyses. The simplest of these is the selection of transcripts located within and flanking known imprinted clusters. This clear-cut approach is arguably the most successful of methods, since the vast majority of imprinted genes are grouped together, as described in

I.C.3.

Several studies have attempted to use in silico approaches to identify imprinted transcripts. Luedi and colleagues developed an algorithm which identified DNA elements, such as repetitive elements and common motifs, shared by known imprinted genes to generate a list of 600 putative murine imprinted transcripts (128). Most recently, a similar study performed by the same group identified 156 imprinted transcripts, selected two of these for additional analyses, and demonstrated that the latter were imprinted (129).

Numerous studies have used microarray technology to identify candidate imprinted genes. In 2003, Nikaido et al used murine parthenogenetic and androgenetic embryos to identify transcripts that had opposing patterns of expression between the two RNA sources.

This large-scale study identified 2114 candidate imprinted genes (130). A previous study performed in 2002 successfully discovered three novel imprinted gene amongst the

25 candidate transcripts identified by cDNA Microarray (113). Several studies have used similar methods to identify candidate transcripts using RNA from mice with uniparental disomies (131, 132).

I.C.4.ii Characterization of imprinted expression

Once a candidate imprinted gene has been identified, its parent-of-origin pattern of must be determined, which requires the ability to distinguish parental alleles. This is most commonly achieved through the identification of transcribed polymorphisms unique to each parental chromosome in the gene of interest. Subsequently, the expression of the polymorphism is noted to determine if the gene is biallelically expressed (determined by equal levels of each allele at the polymorphic site) or if there is preferential allelic expression. In general, the expression of the polymorphism is noted in numerous tissues in order to determine if there is tissue-specific imprinting. Once preferential allelic expression is identified, it is necessary to determine if the allelic preference is in a parent-of-origin pattern. In mice, this is accomplished by performing reciprocal crosses of inbred strains of mice (Figure I.1). In humans, this is achieved by identifying preferential allelic expression in numerous samples and determining the allele that is being transcribed by comparison to parental DNA.

To determine allelic preference at polymorphic sites, several methods are used, the most common being PCR amplification of the transcribed polymorphism followed by

RFLP or sequencing. For more quantitative analyses, particularly when imprinting is leaky, pyrosequencing or SNaPshot analyses are performed, which determine allelic ratios.

SNaPshot is a single base-pair primer extension procedure, generally used for genotyping.

However, the incorporation of nucleotides can be quantified when normalized against a known control. In contrast, pyrosequencing quantitatively measures the release of

26 pyrophosphates, which occurs with the incorporation of dNTPs into the DNA strand, through a luciferase driven reaction.

The materials used for imprinting analyses are also diverse. The tissues considered to be most reliable are those obtained from dissections or microdissections of embryos from hybrid mice, since the developmental stage and tissue analyzed can be controlled. Tissues can also be obtained from androgenetic or parthenogenetic embryos. However, since these embryos are embryonic lethal and there is disregulation of numerous transcripts, results from these studies are treated with caution. Monochromosomal somatic cell hybrids have been used in numerous studies to identify imprinted transcripts (111, 122). These cell lines are murine, yet contain a single human chromosome of known parental origin. However, these cell lines can only be used to study transcripts which are expressed and imprinted in the cell type.

RNA-FISH has been successfully used to demonstrate the transcription of imprinted genes from a single parental chromosome (133, 134). However, this method is limited in its ability to identify preferential allelic expression of small molecules.

I.C.5 Theories on the evolution of imprinted expression

Among animals, imprinted gene expression exists only in therian mammals, including marsupials. Several studies have tried to identify imprinted gene expression in monotremes, however genes with parent-of-origin patterns of expression have not yet been identified (135, 136). Consequently, imprinting is hypothesized to have arisen after the divergence of monotremes from therian mammals, between 219-165 million years ago

(MYA) (137).

It has been proposed that genomic imprinting arose from an ancestral chromosome or region that was originally imprinted and subsequently duplicated and translocated (138).

27

This is supported by the observation that the paralogues of several imprinted genes are imprinted as well or located within imprinted loci. Under such a model, the duplication of imprinted loci would necessarily duplicate regulatory elements as well, thereby creating novel imprinted clusters. However, a recent study analyzing the location of therian imprinted genes in birds and monotremes demonstrated that these are randomly dispersed throughout these vertebrae genomes (139). A related theory has proposed that genomic imprinting evolved as a form of dosage compensation in order to silence duplicated genes.

A large number of imprinted transcripts are retrotransposed genes, several of which are protein coding (reviewed in (28)). This has led to the hypothesis that genomic imprinting may have evolved from mechanisms which neutralize the expression of foreign

DNA and transgenes (140). The importance of the proper expression of retrotransposed genes is supported by evidence indicating that aberrations in the expression of these molecules can lead to embryonic lethality (40, 141). It has been recently shown that retrotransposition can drive the establishment of a DMR, providing support for this theory

(142).

Under another model known as “the ovarian time bomb hypothesis”, imprinting is predicted to have developed as a mechanism, not only to prevent parthenogenesis, but also to prevent trophoblastic disease. Varmuza and Mann, who developed this theory, noted that parthenogenetic embryos have poorly developed trophoblasts (143). They suggest that the oocyte carries imprints to silence genes which promote trophoblastic invasion, thereby preventing malignant trophoblastic disease in cases of parthenogenesis. The authors note that 25% of ovarian tumors are derived from parthenogenesis, yet these fail to develop into malignant trophoblast disease. They also indicate that the risk for this disease increases

1000-fold with a molar pregnancy, which is the result of the fusion of sperm(s) with an

28 enucleated egg. Therefore, an imprinted genome would confer selective advantage to the mother (143, 144). However, the most outstanding criticism of this theory is that it fails to explain genes that are paternally silenced, as well as the fact that many imprinted genes are not associated with trophoblastic invasion (145-147).

In 2005, Reik and Lewis proposed that genomic imprinting evolved together with chromosome X-inactivation (148). Their hypothesis suggests that with the evolution of the placenta in both marsupials and eutherians, selective pressures led to the silencing of growth suppressors, particularly the silencing of growth regulating genes located on the X- chromosome by imprinted X-inactivation. This mechanism of epigenetic silencing then spread to autosomal genes. Consequently, imprinting must only exist in species where there is a placenta and X-chromosome in activation. Their hypothesis is supported by the fact that X-chromosome inactivation and imprinting exists in marsupials and eutherians, yet not in monotremes. It is also upheld by evidence indicating that autosomal imprinting and chromosome X-inactivation share many mechanistic similarities such as the use of non- coding RNA and histone modifications. However, a study published in 2006 determined that the Xist gene, a non-coding RNA essential for X-chromosome inactivation, evolved from a protein-coding gene. The results indicate that the pseudogenization of Xist occurred after the divergence of eutherians from marsupials, indicating that X-chromosome inactivation may have evolved independently in these mammalian classes (149).

Perhaps the most accepted theory of genomic imprinting is the social conflict hypothesis, also known as the kinship selection hypothesis (150, 151). This theory predicts a conflict between maternal and paternal interests in the allocation of nutrients and resources for the offspring. Maternal interests lie in the survival of the mother and in ensuring that there are sufficient resources for future pregnancies. In contrast, paternal

29 interests lay in extracting maternal resources in order to ensure the survival of the offspring and inheritance of the paternal allele, regardless of consequences to future offspring.

Consequently, paternally and maternally expressed transcripts enhance and limit growth, respectively. This functional prediction is upheld by the majority of imprinted transcripts

(see section I.C.1).

I.D THESIS OVERVIEW

The identification and study of genes exhibiting unequal gene expression is a crucial step in understanding the mechanisms underlying their allelic imbalance and comprehending the diseases caused by their improper regulation.

To contribute to a better understanding of these phenomena, this thesis focuses on the identification of differentially expressed murine transcripts homologous to genes on human chromosome 7. I hypothesize that we will gain insight into differential allelic expression by characterizing these genes as well as the mechanisms governing their regulation and evolution that, ultimately, underlie allelic preference.

This thesis describes the identification of differential allelic expression in three genes: a) The identification of differential allelic expression in a non parent-of-origin pattern in

Pon1. This transcript and its splice variant were found to be preferentially expressed in a haplotype-specific pattern in embryonic mouse livers. Quantification of total expression and of allelic ratios revealed that total expression of Pon1 increases throughout embryonic development, yet allelic ratios do not increase at proportional rates. Consequently, allelic preference can shift from one haplotype to another throughout the course of development. b) The identification of imprinted gene expression in murine Cpa4. This transcript, which is known to be imprinted in humans, was analysed in murine tissues. Cpa4 was found to have

30 tissue specific imprinting, and its promoter region was examined for differential cytosine methylation patterns. c) The identification of imprinted gene expression in human and murine KLF14. This thesis provides the first description of imprinted expression of KLF14 in both human and mouse tissues. Imprinting was observed in every tissue examined in both humans and mice, and it was found to be expressed solely from the maternal allele. Various epigenetic analyses were performed in order to gain a deeper understanding of the regulation of this transcript.

The data in this thesis demonstrates that preferential allelic expression patterns can occur in tissue specific patterns and that these patterns may not be static. Future studies may ascertain the frequency of such dynamic expression patterns and discover the mechanisms that underlie their variability. The data herein also contributes towards the understanding of the mechanisms regulating the imprinted expression of genes on human chromosome 7q32.3. Additional studies may discover the function of the genes at this locus and the mechanisms regulating tissue-specific imprinted expression of transcripts within this region.

31

I.E REFERENCES

1. Esumi, S., Kakazu, N., Taguchi, Y., Hirayama, T., Sasaki, A., Hirabayashi, T., Koide, T., Kitsukawa, T., Hamada, S. and Yagi, T. (2005) Monoallelic yet combinatorial expression of variable exons of the protocadherin-alpha gene cluster in single neurons. Nat Genet, 37, 171-6. 2. Chess, A., Simon, I., Cedar, H. and Axel, R. (1994) Allelic inactivation regulates olfactory receptor gene expression. Cell, 78, 823-34. 3. Gimelbrant, A., Hutchinson, J.N., Thompson, B.R. and Chess, A. (2007) Widespread monoallelic expression on human autosomes. Science, 318, 1136-40. 4. Lo, H.S., Wang, Z., Hu, Y., Yang, H.H., Gere, S., Buetow, K.H. and Lee, M.P. (2003) Allelic variation in gene expression is common in the human genome. Genome Res, 13, 1855-62. 5. Bray, N.J., Buckland, P.R., Owen, M.J. and O'Donovan, M.C. (2003) Cis-acting variation in the expression of a high proportion of genes in human brain. Hum Genet, 113, 149-53. 6. Cowles, C.R., Hirschhorn, J.N., Altshuler, D. and Lander, E.S. (2002) Detection of regulatory variation in mouse genes. Nat Genet, 32, 432-7. 7. Guo, M., Rupe, M.A., Zinselmeier, C., Habben, J., Bowen, B.A. and Smith, O.S. (2004) Allelic variation of gene expression in maize hybrids. Plant Cell, 16, 1707- 16. 8. Pant, P.V., Tao, H., Beilharz, E.J., Ballinger, D.G., Cox, D.R. and Frazer, K.A. (2006) Analysis of allelic differential expression in human white blood cells. Genome Res, 16, 331-9. 9. Cheung, V.G., Conlin, L.K., Weber, T.M., Arcaro, M., Jen, K.Y., Morley, M. and Spielman, R.S. (2003) Natural variation in human gene expression assessed in lymphoblastoid cells. Nat Genet, 33, 422-5. 10. Pastinen, T., Sladek, R., Gurd, S., Sammak, A., Ge, B., Lepage, P., Lavergne, K., Villeneuve, A., Gaudin, T., Brandstrom, H. et al. (2004) A survey of genetic and epigenetic variation affecting human gene expression. Physiol Genomics, 16, 184- 93.

32

11. Morley, M., Molony, C.M., Weber, T.M., Devlin, J.L., Ewens, K.G., Spielman, R.S. and Cheung, V.G. (2004) Genetic analysis of genome-wide variation in human gene expression. Nature, 430, 743-7. 12. Yan, H., Yuan, W., Velculescu, V.E., Vogelstein, B. and Kinzler, K.W. (2002) Allelic variation in human gene expression. Science, 297, 1143. 13. Yan, H., Dobbie, Z., Gruber, S.B., Markowitz, S., Romans, K., Giardiello, F.M., Kinzler, K.W. and Vogelstein, B. (2002) Small changes in expression affect predisposition to tumorigenesis. Nat Genet, 30, 25-6. 14. Yang, W.S., Tsou, P.L., Lee, W.J., Tseng, D.L., Chen, C.L., Peng, C.C., Lee, K.C., Chen, M.J., Huang, C.J., Tai, T.Y. et al. (2003) Allele-specific differential expression of a common adiponectin gene polymorphism related to obesity. J Mol Med, 81, 428-34. 15. Bray, N.J., Buckland, P.R., Williams, N.M., Williams, H.J., Norton, N., Owen, M.J. and O'Donovan, M.C. (2003) A haplotype implicated in schizophrenia susceptibility is associated with reduced COMT expression in human brain. Am J Hum Genet, 73, 152-61. 16. Laitinen, T., Polvi, A., Rydman, P., Vendelin, J., Pulkkinen, V., Salmikangas, P., Makela, S., Rehn, M., Pirskanen, A., Rautanen, A. et al. (2004) Characterization of a common susceptibility locus for asthma-related traits. Science, 304, 300-4. 17. Ueda, H., Howson, J.M., Esposito, L., Heward, J., Snook, H., Chamberlain, G., Rainbow, D.B., Hunter, K.M., Smith, A.N., Di Genova, G. et al. (2003) Association of the T-cell regulatory gene CTLA4 with susceptibility to autoimmune disease. Nature, 423, 506-11. 18. Suzuki, A., Yamada, R., Chang, X., Tokuhiro, S., Sawada, T., Suzuki, M., Nagasaki, M., Nakayama-Hamada, M., Kawaida, R., Ono, M. et al. (2003) Functional haplotypes of PADI4, encoding citrullinating enzyme peptidylarginine deiminase 4, are associated with rheumatoid arthritis. Nat Genet, 34, 395-402. 19. Helgadottir, A., Manolescu, A., Thorleifsson, G., Gretarsdottir, S., Jonsdottir, H., Thorsteinsdottir, U., Samani, N.J., Gudmundsson, G., Grant, S.F., Thorgeirsson, G. et al. (2004) The gene encoding 5-lipoxygenase activating protein confers risk of myocardial infarction and stroke. Nat Genet, 36, 233-9.

33

20. Buckland, P.R. (2004) Allele-specific gene expression differences in humans. Hum Mol Genet, 13 Spec No 2, R255-60. 21. Takagi, N. and Sasaki, M. (1975) Preferential inactivation of the paternally derived X chromosome in the extraembryonic membranes of the mouse. Nature, 256, 640-2. 22. Surani, M.A., Barton, S.C. and Norris, M.L. (1984) Development of reconstituted mouse eggs suggests imprinting of the genome during gametogenesis. Nature, 308, 548-50. 23. McGrath, J. and Solter, D. (1984) Completion of mouse embryogenesis requires both the maternal and paternal genomes. Cell, 37, 179-83. 24. Cattanach, B.M. and Kirk, M. (1985) Differential activity of maternally and paternally derived chromosome regions in mice. Nature, 315, 496-8. 25. Bartolomei, M.S., Zemel, S. and Tilghman, S.M. (1991) Parental imprinting of the mouse H19 gene. Nature, 351, 153-5. 26. Barlow, D.P., Stoger, R., Herrmann, B.G., Saito, K. and Schweifer, N. (1991) The mouse insulin-like growth factor type-2 receptor is imprinted and closely linked to the Tme locus. Nature, 349, 84-7. 27. DeChiara, T.M., Robertson, E.J. and Efstratiadis, A. (1991) Parental imprinting of the mouse insulin-like growth factor II gene. Cell, 64, 849-59. 28. Morison, I.M., Ramsay, J.P. and Spencer, H.G. (2005) A census of mammalian imprinting. Trends Genet, 21, 457-65. 29. Lamb, J.A., Barnby, G., Bonora, E., Sykes, N., Bacchelli, E., Blasi, F., Maestrini, E., Broxholme, J., Tzenova, J., Weeks, D. et al. (2005) Analysis of IMGSAC autism susceptibility loci: evidence for sex limited and parent of origin specific effects. J Med Genet, 42, 132-7. 30. Stine, O.C., Xu, J., Koskela, R., McMahon, F.J., Gschwend, M., Friddle, C., Clark, C.D., McInnis, M.G., Simpson, S.G., Breschel, T.S. et al. (1995) Evidence for linkage of bipolar disorder to chromosome 18 with a parent-of-origin effect. Am J Hum Genet, 57, 1384-94. 31. Ottman, R., Annegers, J.F., Hauser, W.A. and Kurland, L.T. (1988) Higher risk of seizures in offspring of mothers than of fathers with epilepsy. Am J Hum Genet, 43, 257-64.

34

32. Avramopoulos, D., Wang, R., Valle, D., Fallin, M.D. and Bassett, S.S. (2007) A novel gene derived from a segmental duplication shows perturbed expression in Alzheimer's disease. Neurogenetics, 8, 111-20. 33. Bassett, S.S., Avramopoulos, D., Perry, R.T., Wiener, H., Watson, B., Jr., Go, R.C. and Fallin, M.D. (2006) Further evidence of a maternal parent-of-origin effect on chromosome 10 in late-onset Alzheimer's disease. Am J Med Genet B Neuropsychiatr Genet, 141, 537-40. 34. Perez Jurado, L.A., Peoples, R., Kaplan, P., Hamel, B.C. and Francke, U. (1996) Molecular definition of the chromosome 7 deletion in Williams syndrome and parent-of-origin effects on growth. Am J Hum Genet, 59, 781-92. 35. Buiting, K., Saitoh, S., Gross, S., Dittrich, B., Schwartz, S., Nicholls, R.D. and Horsthemke, B. (1995) Inherited microdeletions in the Angelman and Prader-Willi syndromes define an imprinting centre on human chromosome 15. Nat Genet, 9, 395-400. 36. Rougeulle, C., Glatt, H. and Lalande, M. (1997) The Angelman syndrome candidate gene, UBE3A/E6-AP, is imprinted in brain. Nat Genet, 17, 14-5. 37. Vu, T.H. and Hoffman, A.R. (1997) Imprinting of the Angelman syndrome gene, UBE3A, is restricted to brain. Nat Genet, 17, 12-3. 38. Kishino, T., Lalande, M. and Wagstaff, J. (1997) UBE3A/E6-AP mutations cause Angelman syndrome. Nat Genet, 15, 70-3. 39. Matsuura, T., Sutcliffe, J.S., Fang, P., Galjaard, R.J., Jiang, Y.H., Benton, C.S., Rommens, J.M. and Beaudet, A.L. (1997) De novo truncating mutations in E6-AP ubiquitin-protein ligase gene (UBE3A) in Angelman syndrome. Nat Genet, 15, 74-7. 40. Ono, R., Nakamura, K., Inoue, K., Naruse, M., Usami, T., Wakisaka-Saito, N., Hino, T., Suzuki-Migishima, R., Ogonuki, N., Miki, H. et al. (2006) Deletion of Peg10, an imprinted gene acquired from a retrotransposon, causes early embryonic lethality. Nat Genet, 38, 101-106. 41. Guillemot, F., Nagy, A., Auerbach, A., Rossant, J. and Joyner, A.L. (1994) Essential role of Mash-2 in extraembryonic development. Nature, 371, 333-6. 42. Guillemot, F., Caspary, T., Tilghman, S.M., Copeland, N.G., Gilbert, D.J., Jenkins, N.A., Anderson, D.J., Joyner, A.L., Rossant, J. and Nagy, A. (1995) Genomic

35

imprinting of Mash2, a mouse gene required for trophoblast development. Nat Genet, 9, 235-42. 43. Angiolini, E., Fowden, A., Coan, P., Sandovici, I., Smith, P., Dean, W., Burton, G., Tycko, B., Reik, W., Sibley, C. et al. (2006) Regulation of placental efficiency for nutrient transport by imprinted genes. Placenta, 27 Suppl A, S98-102. 44. Lefebvre, L., Viville, S., Barton, S.C., Ishino, F., Keverne, E.B. and Surani, M.A. (1998) Abnormal maternal behaviour and growth retardation associated with loss of the imprinted gene Mest. Nat Genet, 20, 163-9. 45. Li, L., Keverne, E.B., Aparicio, S.A., Ishino, F., Barton, S.C. and Surani, M.A. (1999) Regulation of maternal behavior and offspring growth by paternally expressed Peg3. Science, 284, 330-3. 46. Li, E. (2002) Chromatin modification and epigenetic reprogramming in mammalian development. Nat Rev Genet, 3, 662-73. 47. Takada, S., Tevendale, M., Baker, J., Georgiades, P., Campbell, E., Freeman, T., Johnson, M.H., Paulsen, M. and Ferguson-Smith, A.C. (2000) Delta-like and gtl2 are reciprocally expressed, differentially methylated linked imprinted genes on mouse chromosome 12. Curr Biol, 10, 1135-8. 48. Parker-Katiraee, L., Carson, A.R., Yamada, T., Arnaud, P., Feil, R., Abu-Amero, S.N., Moore, G.E., Kaneda, M., Perry, G.H., Stone, A.C. et al. (2007) Identification of the imprinted KLF14 transcription factor undergoing human-specific accelerated evolution. PLoS Genet, 3, e65. 49. Caspary, T., Cleary, M.A., Baker, C.C., Guan, X.J. and Tilghman, S.M. (1998) Multiple mechanisms regulate imprinting of the mouse distal chromosome 7 gene cluster. Mol Cell Biol, 18, 3466-74. 50. Cui, H., Onyango, P., Brandenburg, S., Wu, Y., Hsieh, C.L. and Feinberg, A.P. (2002) Loss of imprinting in colorectal cancer linked to hypomethylation of H19 and IGF2. Cancer Res, 62, 6442-6. 51. Horike, S., Mitsuya, K., Meguro, M., Kotobuki, N., Kashiwagi, A., Notsu, T., Schulz, T.C., Shirayoshi, Y. and Oshimura, M. (2000) Targeted disruption of the human LIT1 locus defines a putative imprinting control element playing an essential role in Beckwith-Wiedemann syndrome. Hum Mol Genet, 9, 2075-83.

36

52. Reik, W., Dean, W. and Walter, J. (2001) Epigenetic reprogramming in mammalian development. Science, 293, 1089-93. 53. Leonhardt, H., Page, A.W., Weier, H.U. and Bestor, T.H. (1992) A targeting sequence directs DNA methyltransferase to sites of DNA replication in mammalian nuclei. Cell, 71, 865-73. 54. Kaneda, M., Okano, M., Hata, K., Sado, T., Tsujimoto, N., Li, E. and Sasaki, H. (2004) Essential role for de novo DNA methyltransferase Dnmt3a in paternal and maternal imprinting. Nature, 429, 900-3. 55. Bourc'his, D., Xu, G.L., Lin, C.S., Bollman, B. and Bestor, T.H. (2001) Dnmt3L and the establishment of maternal genomic imprints. Science, 294, 2536-9. 56. Jia, D., Jurkowska, R.Z., Zhang, X., Jeltsch, A. and Cheng, X. (2007) Structure of Dnmt3a bound to Dnmt3L suggests a model for de novo DNA methylation. Nature, 449, 248-51. 57. Ooi, S.K., Qiu, C., Bernstein, E., Li, K., Jia, D., Yang, Z., Erdjument-Bromage, H., Tempst, P., Lin, S.P., Allis, C.D. et al. (2007) DNMT3L connects unmethylated lysine 4 of histone H3 to de novo methylation of DNA. Nature, 448, 714-7. 58. Nakamura, T., Arai, Y., Umehara, H., Masuhara, M., Kimura, T., Taniguchi, H., Sekimoto, T., Ikawa, M., Yoneda, Y., Okabe, M. et al. (2007) PGC7/Stella protects against DNA demethylation in early embryogenesis. Nat Cell Biol, 9, 64-71. 59. Schlesinger, Y., Straussman, R., Keshet, I., Farkash, S., Hecht, M., Zimmerman, J., Eden, E., Yakhini, Z., Ben-Shushan, E., Reubinoff, B.E. et al. (2007) Polycomb- mediated methylation on Lys27 of histone H3 pre-marks genes for de novo methylation in cancer. Nat Genet, 39, 232-6. 60. Heintzman, N.D., Stuart, R.K., Hon, G., Fu, Y., Ching, C.W., Hawkins, R.D., Barrera, L.O., Van Calcar, S., Qu, C., Ching, K.A. et al. (2007) Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat Genet, 39, 311-8. 61. Mikkelsen, T.S., Ku, M., Jaffe, D.B., Issac, B., Lieberman, E., Giannoukos, G., Alvarez, P., Brockman, W., Kim, T.K., Koche, R.P. et al. (2007) Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature, 448, 553-60.

37

62. Lewis, A., Mitsuya, K., Umlauf, D., Smith, P., Dean, W., Walter, J., Higgins, M., Feil, R. and Reik, W. (2004) Imprinting on distal chromosome 7 in the placenta involves repressive histone methylation independent of DNA methylation. Nat Genet, 36, 1291-5. 63. Umlauf, D., Goto, Y., Cao, R., Cerqueira, F., Wagschal, A., Zhang, Y. and Feil, R. (2004) Imprinting along the Kcnq1 domain on mouse chromosome 7 involves repressive histone methylation and recruitment of Polycomb group complexes. Nat Genet, 36, 1296-300. 64. Delaval, K., Govin, J., Cerqueira, F., Rousseaux, S., Khochbin, S. and Feil, R. (2007) Differential histone modifications mark mouse imprinting control regions during spermatogenesis. Embo J, 26, 720-9. 65. Fournier, C., Goto, Y., Ballestar, E., Delaval, K., Hever, A.M., Esteller, M. and Feil, R. (2002) Allele-specific histone lysine methylation marks regulatory regions at imprinted mouse genes. Embo J, 21, 6560-70. 66. Shi, X., Hong, T., Walter, K.L., Ewalt, M., Michishita, E., Hung, T., Carney, D., Pena, P., Lan, F., Kaadige, M.R. et al. (2006) ING2 PHD domain links histone H3 lysine 4 methylation to active gene repression. Nature, 442, 96-9. 67. Shi, Y., Lan, F., Matson, C., Mulligan, P., Whetstine, J.R., Cole, P.A. and Casero, R.A. (2004) Histone demethylation mediated by the nuclear amine oxidase homolog LSD1. Cell, 119, 941-53. 68. Ayer, D.E. (1999) Histone deacetylases: transcriptional repression with SINers and NuRDs. Trends Cell Biol, 9, 193-8. 69. Gaszner, M. and Felsenfeld, G. (2006) Insulators: exploiting transcriptional and epigenetic mechanisms. Nat Rev Genet, 7, 703-13. 70. Hark, A.T., Schoenherr, C.J., Katz, D.J., Ingram, R.S., Levorse, J.M. and Tilghman, S.M. (2000) CTCF mediates methylation-sensitive enhancer-blocking activity at the H19/Igf2 locus. Nature, 405, 486-9. 71. Sparago, A., Cerrato, F., Vernucci, M., Ferrero, G.B., Silengo, M.C. and Riccio, A. (2004) Microdeletions in the human H19 DMR result in loss of IGF2 imprinting and Beckwith-Wiedemann syndrome. Nat Genet, 36, 958-60.

38

72. Bell, A.C. and Felsenfeld, G. (2000) Methylation of a CTCF-dependent boundary controls imprinted expression of the Igf2 gene. Nature, 405, 482-5. 73. Leighton, P.A., Saam, J.R., Ingram, R.S., Stewart, C.L. and Tilghman, S.M. (1995) An enhancer deletion affects both H19 and Igf2 expression. Genes Dev, 9, 2079-89. 74. Hikichi, T., Kohda, T., Kaneko-Ishino, T. and Ishino, F. (2003) Imprinting regulation of the murine Meg1/Grb10 and human GRB10 genes; roles of brain- specific promoters and mouse-specific CTCF-binding sites. Nucl. Acids. Res., 31, 1398-1406. 75. Rosa, A.L., Wu, Y.Q., Kwabi-Addo, B., Coveler, K.J., Reid Sutton, V. and Shaffer, L.G. (2005) Allele-specific methylation of a functional CTCF binding site upstream of MEG3 in the human imprinted domain of 14q32. Chromosome Res, 13, 809-18. 76. Yoon, B., Herman, H., Hu, B., Park, Y.J., Lindroth, A., Bell, A., West, A.G., Chang, Y., Stablewski, A., Piel, J.C. et al. (2005) Rasgrf1 imprinting is regulated by a CTCF-dependent methylation-sensitive enhancer blocker. Mol Cell Biol, 25, 11184- 90. 77. Fitzpatrick, G.V., Pugacheva, E.M., Shin, J.Y., Abdullaev, Z., Yang, Y., Khatod, K., Lobanenkov, V.V. and Higgins, M.J. (2007) Allele-specific binding of CTCF to the multipartite imprinting control region KvDMR1. Mol Cell Biol, 27, 2636-47. 78. Yusufzai, T.M., Tagami, H., Nakatani, Y. and Felsenfeld, G. (2004) CTCF tethers an insulator to subnuclear sites, suggesting shared insulator mechanisms across species. Mol Cell, 13, 291-8. 79. Zhao, H. and Dean, A. (2004) An insulator blocks spreading of histone acetylation and interferes with RNA polymerase II transfer between an enhancer and gene. Nucleic Acids Res, 32, 4903-19. 80. Lehner, B., Williams, G., Campbell, R.D. and Sanderson, C.M. (2002) Antisense transcripts in the human genome. Trends Genet, 18, 63-5. 81. Ripoche, M.A., Kress, C., Poirier, F. and Dandolo, L. (1997) Deletion of the H19 transcription unit reveals the existence of a putative imprinting control element. Genes Dev, 11, 1596-604. 82. Sleutels, F., Zwart, R. and Barlow, D.P. (2002) The non-coding Air RNA is required for silencing autosomal imprinted genes. Nature, 415, 810-3.

39

83. Sado, T., Hoki, Y. and Sasaki, H. (2005) Tsix silences Xist through modification of chromatin structure. Dev Cell, 9, 159-65. 84. Navarro, P., Pichard, S., Ciaudo, C., Avner, P. and Rougeulle, C. (2005) Tsix transcription across the Xist gene alters chromatin conformation without affecting Xist transcription: implications for X-chromosome inactivation. Genes Dev, 19, 1474-84. 85. Kanduri, C., Thakur, N. and Pandey, R.R. (2006) The length of the transcript encoded from the Kcnq1ot1 antisense promoter determines the degree of silencing. Embo J, 25, 2096-106. 86. Delaval, K. and Feil, R. (2004) Epigenetic regulation of mammalian genomic imprinting. Curr Opin Genet Dev, 14, 188-95. 87. Kerjean, A., Dupont, J.-M., Vasseur, C., Le Tessier, D., Cuisset, L., Paldi, A., Jouannet, P. and Jeanpierre, M. (2000) Establishment of the paternal methylation imprint of the human H19 and MEST/PEG1 genes during spermatogenesis. Hum. Mol. Genet., 9, 2183-2187. 88. Tremblay, K.D., Saam, J.R., Ingram, R.S., Tilghman, S.M. and Bartolomei, M.S. (1995) A paternal-specific methylation imprint marks the alleles of the mouse H19 gene. Nat Genet, 9, 407-13. 89. Thorvaldsen, J.L., Duran, K.L. and Bartolomei, M.S. (1998) Deletion of the H19 differentially methylated domain results in loss of imprinted expression of H19 and Igf2. Genes Dev, 12, 3693-702. 90. Leighton, P.A., Ingram, R.S., Eggenschwiler, J., Efstratiadis, A. and Tilghman, S.M. (1995) Disruption of imprinting caused by deletion of the H19 gene region in mice. Nature, 375, 34-9. 91. Moore, T., Constancia, M., Zubair, M., Bailleul, B., Feil, R., Sasaki, H. and Reik, W. (1997) Multiple imprinted sense and antisense transcripts, differential methylation and tandem repeats in a putative imprinting control region upstream of mouse Igf2. Proc Natl Acad Sci U S A, 94, 12509-14. 92. Murrell, A., Heeson, S. and Reik, W. (2004) Interaction between differentially methylated regions partitions the imprinted genes Igf2 and H19 into parent-specific chromatin loops. Nat Genet, 36, 889-93.

40

93. Pant, V., Mariano, P., Kanduri, C., Mattsson, A., Lobanenkov, V., Heuchel, R. and Ohlsson, R. (2003) The nucleotides responsible for the direct physical contact between the chromatin insulator protein CTCF and the H19 imprinting control region manifest parent of origin-specific long-distance insulation and methylation- free domains. Genes Dev, 17, 586-90. 94. Schoenherr, C.J., Levorse, J.M. and Tilghman, S.M. (2003) CTCF maintains differential methylation at the Igf2/H19 locus. Nat Genet, 33, 66-9. 95. Fedoriw, A.M., Stein, P., Svoboda, P., Schultz, R.M. and Bartolomei, M.S. (2004) Transgenic RNAi reveals essential function for CTCF in H19 gene imprinting. Science, 303, 238-40. 96. Miyoshi, N., Kuroiwa, Y., Kohda, T., Shitara, H., Yonekawa, H., Kawabe, T., Hasegawa, H., Barton, S.C., Surani, M.A., Kaneko-Ishino, T. et al. (1998) Identification of the Meg1/Grb10 imprinted gene on mouse proximal chromosome 11, a candidate for the Silver-Russell syndrome gene. Proc Natl Acad Sci U S A, 95, 1102-7. 97. Hikichi, T., Kohda, T., Kaneko-Ishino, T. and Ishino, F. (2003) Imprinting regulation of the murine Meg1/Grb10 and human GRB10 genes; roles of brain- specific promoters and mouse-specific CTCF-binding sites. Nucleic Acids Res, 31, 1398-406. 98. Blagitko, N., Mergenthaler, S., Schulz, U., Wollmann, H.A., Craigen, W., Eggermann, T., Ropers, H.H. and Kalscheuer, V.M. (2000) Human GRB10 is imprinted and expressed from the paternal and maternal allele in a highly tissue- and isoform-specific fashion. Hum Mol Genet, 9, 1587-95. 99. Arnaud, P., Monk, D., Hitchins, M., Gordon, E., Dean, W., Beechey, C.V., Peters, J., Craigen, W., Preece, M., Stanier, P. et al. (2003) Conserved methylation imprints in the human and mouse GRB10 genes with divergent allelic expression suggests differential reading of the same mark. Hum Mol Genet, 12, 1005-19. 100. Charalambous, M., Smith, F.M., Bennett, W.R., Crew, T.E., Mackenzie, F. and Ward, A. (2003) Disruption of the imprinted Grb10 gene leads to disproportionate overgrowth by an Igf2-independent mechanism. Proc Natl Acad Sci U S A, 100, 8292-7.

41

101. Monk, D., Wakeling, E.L., Proud, V., Hitchins, M., Abu-Amero, S.N., Stanier, P., Preece, M.A. and Moore, G.E. (2000) Duplication of 7p11.2-p13, including GRB10, in Silver-Russell syndrome. Am J Hum Genet, 66, 36-46. 102. Joyce, C.A., Sharp, A., Walker, J.M., Bullman, H. and Temple, I.K. (1999) Duplication of 7p12.1-p13, including GRB10 and IGFBP1, in a mother and daughter with features of Silver-Russell syndrome. Hum Genet, 105, 273-80. 103. McCann, J.A., Zheng, H., Islam, A., Goodyer, C.G. and Polychronakos, C. (2001) Evidence against GRB10 as the gene responsible for Silver-Russell syndrome. Biochem Biophys Res Commun, 286, 943-8. 104. Hitchins, M.P., Monk, D., Bell, G.M., Ali, Z., Preece, M.A., Stanier, P. and Moore, G.E. (2001) Maternal repression of the human GRB10 gene in the developing central nervous system; evaluation of the role for GRB10 in Silver-Russell syndrome. Eur J Hum Genet, 9, 82-90. 105. Piras, G., El Kharroubi, A., Kozlov, S., Escalante-Alcalde, D., Hernandez, L., Copeland, N.G., Gilbert, D.J., Jenkins, N.A. and Stewart, C.L. (2000) Zac1 (Lot1), a potential tumor suppressor gene, and the gene for epsilon-sarcoglycan are maternally imprinted genes: identification by a subtractive screen of novel uniparental fibroblast lines. Mol Cell Biol, 20, 3308-15. 106. Grabowski, M., Zimprich, A., Lorenz-Depiereux, B., Kalscheuer, V., Asmus, F., Gasser, T., Meitinger, T. and Strom, T.M. (2003) The epsilon-sarcoglycan gene (SGCE), mutated in myoclonus-dystonia syndrome, is maternally imprinted. Eur J Hum Genet, 11, 138-44. 107. Muller, B., Hedrich, K., Kock, N., Dragasevic, N., Svetel, M., Garrels, J., Landt, O., Nitschke, M., Pramstaller, P.P., Reik, W. et al. (2002) Evidence that paternal expression of the epsilon-sarcoglycan gene accounts for reduced penetrance in myoclonus-dystonia. Am J Hum Genet, 71, 1303-11. 108. Ono, R., Kobayashi, S., Wagatsuma, H., Aisaka, K., Kohda, T., Kaneko-Ishino, T. and Ishino, F. (2001) A retrotransposon-derived gene, PEG10, is a novel imprinted gene located on human chromosome 7q21. Genomics, 73, 232-7.

42

109. Ono, R., Shiura, H., Aburatani, H., Kohda, T., Kaneko-Ishino, T. and Ishino, F. (2003) Identification of a large novel imprinted gene cluster on mouse proximal chromosome 6. Genome Res, 13, 1696-705. 110. Hoshiya, H., Meguro, M., Kashiwagi, A., Okita, C. and Oshimura, M. (2003) Calcr, a brain-specific imprinted mouse calcitonin receptor gene in the imprinted cluster of the proximal region of chromosome 6. J Hum Genet, 48, 208-11. 111. Okita, C., Meguro, M., Hoshiya, H., Haruta, M., Sakamoto, Y.K. and Oshimura, M. (2003) A new imprinted cluster on the human chromosome 7q21-q31, identified by human-mouse monochromosomal hybrids. Genomics, 81, 556-9. 112. Nakabayashi, K., Makino, S., Minagawa, S., Smith, A.C., Bamforth, J.S., Stanier, P., Preece, M., Parker-Katiraee, L., Paton, T., Oshimura, M. et al. (2004) Genomic imprinting of PPP1R9A encoding neurabin I in skeletal muscle and extra- embryonic tissues. J Med Genet, 41, 601-8. 113. Mizuno, Y., Sotomaru, Y., Katsuzawa, Y., Kono, T., Meguro, M., Oshimura, M., Kawai, J., Tomaru, Y., Kiyosawa, H., Nikaido, I. et al. (2002) Asb4, Ata3, and Dcn are novel imprinted genes identified by high-throughput screening using RIKEN cDNA microarray. Biochem Biophys Res Commun, 290, 1499-505. 114. Kaneko-Ishino, T., Kuroiwa, Y., Miyoshi, N., Kohda, T., Suzuki, R., Yokoyama, M., Viville, S., Barton, S.C., Ishino, F. and Surani, M.A. (1995) Peg1/Mest imprinted gene on chromosome 6 identified by cDNA subtraction hybridization. Nat Genet, 11, 52-9. 115. Kobayashi, S., Kohda, T., Miyoshi, N., Kuroiwa, Y., Aisaka, K., Tsutsumi, O., Kaneko-Ishino, T. and Ishino, F. (1997) Human PEG1/MEST, an imprinted gene on chromosome 7. Hum Mol Genet, 6, 781-6. 116. Riesewijk, A.M., Hu, L., Schulz, U., Tariverdian, G., Hoglund, P., Kere, J., Ropers, H.H. and Kalscheuer, V.M. (1997) Monoallelic expression of human PEG1/MEST is paralleled by parent-specific methylation in fetuses. Genomics, 42, 236-44. 117. Kagami, M., Nagai, T., Fukami, M., Yamazawa, K. and Ogata, T. (2007) Silver- Russell syndrome in a girl born after in vitro fertilization: partial hypermethylation at the differentially methylated region of PEG1/MEST. J Assist Reprod Genet, 24, 131-6.

43

118. DeBaun, M.R., Niemitz, E.L. and Feinberg, A.P. (2003) Association of in vitro fertilization with Beckwith-Wiedemann syndrome and epigenetic alterations of LIT1 and H19. Am J Hum Genet, 72, 156-60. 119. Allen, C. and Reardon, W. (2005) Assisted reproduction technology and defects of genomic imprinting. Bjog, 112, 1589-94. 120. Riesewijk, A.M., Blagitko, N., Schinzel, A.A., Hu, L., Schulz, U., Hamel, B.C., Ropers, H.H. and Kalscheuer, V.M. (1998) Evidence against a major role of PEG1/MEST in Silver-Russell syndrome. Eur J Hum Genet, 6, 114-20. 121. Kobayashi, S., Uemura, H., Kohda, T., Nagai, T., Chinen, Y., Naritomi, K., Kinoshita, E.-i., Ohashi, H., Imaizumi, K., Tsukahara, M. et al. (2001) No evidence of PEG1/MEST gene mutations in Silver-Russell syndrome patients. Am J Med Genet, 104, 225-31. 122. Nakabayashi, K., Bentley, L., Hitchins, M.P., Mitsuya, K., Meguro, M., Minagawa, S., Bamforth, J.S., Stanier, P., Preece, M., Weksberg, R. et al. (2002) Identification and characterization of an imprinted antisense RNA (MESTIT1) in the human MEST locus on chromosome 7q32. Hum Mol Genet, 11, 1743-56. 123. Li, T., Vu, T.H., Lee, K.O., Yang, Y., Nguyen, C.V., Bui, H.Q., Zeng, Z.L., Nguyen, B.T., Hu, J.F., Murphy, S.K. et al. (2002) An imprinted PEG1/MEST antisense expressed predominantly in human testis and in mature spermatozoa. J Biol Chem, 277, 13518-27. 124. Blagitko, N., Schulz, U., Schinzel, A.A., Ropers, H.H. and Kalscheuer, V.M. (1999) gamma2-COP, a novel imprinted gene on chromosome 7q32, defines a new imprinting cluster in the human genome. Hum Mol Genet, 8, 2387-96. 125. Yamasaki, K., Hayashida, S., Miura, K., Masuzaki, H., Ishimaru, T., Niikawa, N. and Kishino, T. (2000) The novel gene, gamma2-COP (COPG2), in the 7q32 imprinted domain escapes genomic imprinting. Genomics, 68, 330-5. 126. Lee, Y.J., Park, C.W., Hahn, Y., Park, J., Lee, J., Yun, J.H., Hyun, B. and Chung, J.H. (2000) Mit1/Lb9 and Copg2, new members of mouse imprinted genes closely linked to Peg1/Mest. FEBS Lett, 472, 230-4. 127. Kerjean, A., Dupont, J.M., Vasseur, C., Le Tessier, D., Cuisset, L., Paldi, A., Jouannet, P. and Jeanpierre, M. (2000) Establishment of the paternal methylation

44

imprint of the human H19 and MEST/PEG1 genes during spermatogenesis. Hum Mol Genet, 9, 2183-7. 128. Luedi, P.P., Hartemink, A.J. and Jirtle, R.L. (2005) Genome-wide prediction of imprinted murine genes. Genome Res, 15, 875-84. 129. Luedi, P.P., Dietrich, F.S., Weidman, J.R., Bosko, J.M., Jirtle, R.L. and Hartemink, A.J. (2007) Computational and experimental identification of novel human imprinted genes. Genome Res, 17, 1723-30. 130. Nikaido, I., Saito, C., Mizuno, Y., Meguro, M., Bono, H., Kadomura, M., Kono, T., Morris, G.A., Lyons, P.A., Oshimura, M. et al. (2003) Discovery of imprinted transcripts in the mouse transcriptome using large-scale expression profiling. Genome Res, 13, 1402-9. 131. Schulz, R., Menheniott, T.R., Woodfine, K., Wood, A.J., Choi, J.D. and Oakey, R.J. (2006) Chromosome-wide identification of novel imprinted genes using microarrays and uniparental disomies. Nucleic Acids Res, 34, e88. 132. Choi, J.D., Underkoffler, L.A., Collins, J.N., Marchegiani, S.M., Terry, N.A., Beechey, C.V. and Oakey, R.J. (2001) Microarray expression profiling of tissues from mice with uniparental duplications of chromosomes 7 and 11 to identify imprinted genes. Mamm Genome, 12, 758-64. 133. Herzing, L.B., Cook, E.H., Jr. and Ledbetter, D.H. (2002) Allele-specific expression analysis by RNA-FISH demonstrates preferential maternal expression of UBE3A and imprint maintenance within 15q11- q13 duplications. Hum Mol Genet, 11, 1707-18. 134. Okamoto, I., Otte, A.P., Allis, C.D., Reinberg, D. and Heard, E. (2004) Epigenetic dynamics of imprinted X inactivation during early mouse development. Science, 303, 644-9. 135. Killian, J.K., Nolan, C.M., Stewart, N., Munday, B.L., Andersen, N.A., Nicol, S. and Jirtle, R.L. (2001) Monotreme IGF2 expression and ancestral origin of genomic imprinting. J Exp Zool, 291, 205-12. 136. Killian, J.K., Byrd, J.C., Jirtle, J.V., Munday, B.L., Stoskopf, M.K., MacDonald, R.G. and Jirtle, R.L. (2000) M6P/IGF2R imprinting evolution in mammals. Mol Cell, 5, 707-16.

45

137. van Rheede, T., Bastiaans, T., Boone, D.N., Hedges, S.B., de Jong, W.W. and Madsen, O. (2006) The platypus is in its place: nuclear genes and indels confirm the sister group relation of monotremes and Therians. Mol Biol Evol, 23, 587-97. 138. Walter, J. and Paulsen, M. (2003) The potential role of gene duplications in the evolution of imprinting mechanisms. Hum Mol Genet, 12 Spec No 2, R215-20. 139. Edwards, C.A., Rens, W., Clark, O., Mungall, A.J., Hore, T., Marshall Graves, J.A., Dunham, I., Ferguson-Smith, A.C. and Ferguson-Smith, M.A. (2007) The evolution of imprinting: chromosomal mapping of orthologues of mammalian imprinted domains in monotreme and marsupial mammals. BMC Evol Biol, 7, 157. 140. Barlow, D.P. (1993) Methylation and imprinting: from host defense to gene regulation? Science, 260, 309-10. 141. Sekita, Y., Wagatsuma, H., Nakamura, K., Ono, R., Kagami, M., Wakisaka, N., Hino, T., Suzuki-Migishima, R., Kohda, T., Ogura, A. et al. (2008) Role of retrotransposon-derived imprinted gene, Rtl1, in the feto-maternal interface of mouse placenta. Nat Genet. 142. Suzuki, S., Ono, R., Narita, T., Pask, A.J., Shaw, G., Wang, C., Kohda, T., Alsop, A.E., Marshall Graves, J.A., Kohara, Y. et al. (2007) Retrotransposon Silencing by DNA Methylation Can Drive Mammalian Genomic Imprinting. PLoS Genet, 3, e55. 143. Varmuza, S. and Mann, M. (1994) Genomic imprinting--defusing the ovarian time bomb. Trends Genet, 10, 118-23. 144. Weisstein, A.E., Feldman, M.W. and Spencer, H.G. (2002) Evolutionary genetic models of the ovarian time bomb hypothesis for the evolution of genomic imprinting. Genetics, 162, 425-39. 145. Haig, D. (1994) Refusing the ovarian time bomb. Trends Genet, 10, 346-7; author reply 348-9. 146. Solter, D. (1994) Refusing the ovarian time bomb. Trends Genet, 10, 346; author reply 348-9. 147. Moore, T. (1994) Refusing the ovarian time bomb. Trends Genet, 10, 347-9. 148. Reik, W. and Lewis, A. (2005) Co-evolution of X-chromosome inactivation and imprinting in mammals. Nat Rev Genet, 6, 403-10.

46

149. Duret, L., Chureau, C., Samain, S., Weissenbach, J. and Avner, P. (2006) The Xist RNA gene evolved in eutherians by pseudogenization of a protein-coding gene. Science, 312, 1653-5. 150. Haig, D. (2004) Genomic imprinting and kinship: how good is the evidence? Annu Rev Genet, 38, 553-85. 151. Wilkins, J.F. and Haig, D. (2003) What good is genomic imprinting: the function of parent-specific gene expression. Nat Rev Genet, 4, 359-68.

47

CHAPTER II: ALLELIC ANALYSIS OF CANDIDATE IMPRINTED GENES ON HUMAN CHROMOSOME 7

Data from this chapter have been included in the following manuscript and publication:

Monk D., Arnaud P., Muller P., Bour’chis D., Magnuson T., Parker-Katiraee L., Scherer S.W., Feil R., Stanier P. and Moore G.E. Epigenetic characterization of the Peg10/PEG10 imprinted clusters in mouse and man. Submitted for peer-reviewed publication.

Scherer S.W., Cheung J., MacDonald J.R., Osborne L.R., Nakabayashi K., Herbrick J.A., Carson A.R., Parker-Katiraee L., Skaug J., Khaja R., et al. (2003) Human chromosome 7: DNA sequence and biology. Science. 300:767-772

I performed the RT-PCR reactions, imprinting analyses, and candidate gene selection. The generation of F1 hybrid mice and their dissection was performed under the guidance of Dr. Takahiro Yamada and Dr. Kazuhiko Nakabayashi. Dr Kazuhiko Nakabayashi performed microarray experiments using somatic cell hybrids and cell lines from patients with UPD7, the results of which were also used to select several candidate imprinted genes. Dr. Takahiro Yamada generated glial and neuronal cell lines. Several genes were analyzed together with Dr Kazuhiko Nakabayashi and Dr Takahiro Yamada. 48

II.A INTRODUCTION

Russell-Silver syndrome (RSS) is a phenotypically heterogeneous disorder characterized by intrauterine and post-natal growth retardation, unique facial characteristics, and clinodactyly of the fifth finger (OMIM: 180860). The disease is also genetically heterogeneous with several modes of inheritance. Studies have identified abnormalities involving various chromosomes including 11 (1), 15 (2), and 17 (3, 4).

RSS has been associated with imprinted genes on human chromosome 7, due to the fact that 10% of affected patients have maternal uniparental disomy for chromosome 7

(mUPD7) (Table I.1) (5, 6). A patient has been reported with segmental mUPD spanning

7q31-qter, highlighting the importance of this region (7). Association between this region and RSS was emphasized with the discovery of a patient with mosaic mUPD for 7q21-qter

(8). Additionally, several affected patients have been identified who carry maternally inherited duplications of 7p11.2-p13, suggesting that genes within the p-arm of the chromosome may also contribute towards the aetiology of the disorder. Candidate imprinted genes within these regions have been screened in order to identify causative mutations in non-UPD patients (9-12). However, a causative gene for the chromosome 7- form of RSS has not been found, but absence of a paternally-inherited FOXP2 gene might explain the verbal dyspraxia phenotype usually observed in this sub-type (13).

Consequently, several studies have dedicated their efforts to discovering novel imprinted genes on this chromosome (14, 15).

In this study, I focused on the identification of imprinted transcripts on human chromosome 7. Candidates were selected using various methods outlined in Figure II.1 and their homologues were analyzed in the F1 hybrid offspring of inbred strains of mice. At the same time, several non-coding transcripts, specific to mice, were also studied. These 49 studies were performed on the mouse due to difficulties in obtaining human fetal tissue samples with corresponding parental DNA. These experiments were also performed assuming that most imprinted patterns of expression are maintained between humans and mice. Consequently, the discovery of an imprinted pattern of expression in a murine gene would indicate that its human homologue may exhibit preferential allelic expression in a parent-of-origin pattern and would represent a candidate gene for RSS.

By analyzing the expression of polymorphisms in tissues of F1 hybrid mice, genomic imprinting was observed in two transcripts: Cpa4 and Klf14, whose results are discussed in detail in chapters IV and V, respectively. The preferential allelic expression of

Pon1 is discussed in chapter III due to its unique, developmental-stage specific, pattern of expression. The data outlined in this chapter demonstrate that preferential allelic expression in a non-parent-of-origin pattern is a common phenomenon in mice, as has been previously demonstrated (16). Additionally, many of the genes screened for imprinted expression exhibited preferential allelic expression with tissue or murine strain specificity. 50

II.B MATERIALS AND METHODS

Forty-two candidate imprinted transcripts were selected using various methods, outlined in Figure II.1: microarray was performed using RNA from patients with mUPD7 and pUPD7, as well as RNA from somatic cell hybrid cell lines containing single copies of human chromosome 7 of known parental origin; transcripts were selected from known imprinted loci (7q21.3 and 7q32.3); and putative imprinted transcripts were selected from a genome-wide prediction of imprinted transcripts published by Nikaido, et al (17). Several transcripts were selected from the Williams-Beuren syndrome (WBS) deletion region

(7q11.23), due to parent-of-origin effects observed in this disorder (18). Subsequently, homologues to the human candidates were identified in the mouse genome. Polymorphisms were identified by PCR amplification within each candidate between evolutionarily distant strains of mice, specifically the C57BL/6 (B) strain, JF1/Ms (J), and CAST/Ei9 (C). PCR was performed using genomic primers listed in Table II.1.

Crosses were performed between the three strains of mice. All crosses were performed against C57BL/6, since the genome of this strain has been sequenced and assembled. Embryonic dissections were performed at 9.5 days post coitum (dpc), 12.5 dpc,

15.5 dpc, 17.5 dpc, and P0. Additionally, glial and neuronal cell lines were generated from brains of BxJ and JxB F1 hybrids, as previously described (19). Tissue specific dissections of the placenta were performed to analyze a subset of genes, extracting the labyrinth layer which is free of maternal contamination. RNA was extracted using TRIZOL reagent

(Invitrogen), following the manufacturer’s protocol. Two micrograms of RNA were subsequently used for cDNA synthesis using random primers (SuperScript II, Invitrogen).

PCR was performed to analyze the transcribed polymorphism in each candidate gene using primers listed in Table II.1. Amplified products were purified using 51 microCLEAN (Microzone Ltd) and were subsequently sequenced on an ABI 3730XL using

BigDye Terminator v3.1 cycle sequencing kit (Applied Biosystems), combined with Half

BigDye sequencing buffer (Sigma Aldrich).

Peak heights of each polymorphism were visually analyzed and compared to peak heights observed in genomic DNA to correct for sequencing or PCR bias. When one peak was 2/3 the height of the other peak, preferential allelic expression was considered to be present. Preferential allelic expression was confirmed by observing allelic preference in reciprocal crosses (i.e. both BxJ and JxB, or BxC and CxB). When allelic preference was observed, it was confirmed on a separate PCR reaction, and in both directions whenever possible. Due to the fact that polymorphisms were often identified in the untranslated region of the transcript, inserts/deletions were frequently observed, and sequencing in both directions was not always possible.

All PCR reactions were performed using Taq2000 (Stratagene), following the manufacturer’s protocol. Most transcripts were amplified using 36-38 cycles, and using an annealing temperature of 57°C. All primers were designed using Primer3 (20) with the exception of the primers for the paraoxonase genes, where previously designed primers were used (21). 52

Figure II.1

Differentially expressed Putative imprinted genes Transcripts located within transcripts from microarrays from microarrays using and flanking known using UPD and somatic cell parthenogenetic and imprinted loci hybrid cell lines. androgenetic embryos

Selection of putative imprinted transcripts

Identify murine homologue

Identify transcribed SNPs in homologue in F1 hybrids of inbred strains of mice

PCR amplify and sequence SNP in cDNA and gDNA

Observe ratio of peaks of SNP in cDNA sequencing electropherograms, normalized by ratio in gDNA

Presence of preferential allelic expression Absence of preferential allelic expression

Confirm allelic preference using reciprocal murine cross

Confirm allelic preference using reverse sequencing primer Absence of allelic preference and in an independent sample, whenever possible.

Non-imprinted allelic preference Imprinted expression

Tissue specific imprinting Ubiquitous imprinting

Maternal specific placental expression Imprinting in multiple tissues

Rule out possibility of maternal contamination using placental tissues from F2 hybrid mice

Figure II.1 Experimental outline of screen for preferential allelic expression of candidate genes on human chromosome 7

53

Table II.1 Primers used in preferential allelic expression analysis Gene Genomic primers cDNA primers PCLO TTGTGTGGAGTGGCACAAAT GGTGGTATCCGCTGAAAGAA TGAACATTAAGCTGCCATGC CATTGCCTCAGATGGCTTTT TAX1BP1 TTGGATTTTCTGCATACACTGA TGGCGGAGAAAGAGAAAGAA TTCCATGAAAATAAAAGCTGAATG GCTGGAATCAAAGCAAAAGC AQP1 GAAAAAGTTGGCAGGCAGAC TCTTGGAGGGAGTTGAGCAC GCCAGCTCCCTCTCTGTCTT TGATACCGCAGCCAGTGTAG BET1 GGGAAACCCTGGAAAAACAT CCTGGCAGCTATGGGAACTA GCATGGAGCAATCAGGCTA ATGATCAAATGGCCCACCTA DYNC1I1 TCTGTTGATTTGCCTTGCAG AAGGCCGCATTTGGATCTAT ATGTAACCTCAGCAGCAGCA ACGCACATGCTCTAAGATCG PDK4 GAGAAAGTTTACCACCATTTCTTCA TGTCAGGTTATGGGACAGACG GAGCTCCGAAGCTGATGACT TGCATATTTAACATTTCACCCAAG GNG1 GTCTCCTTCCTTTTAAGGTTTCC AGGACAAGTTGAAGATGGAGGT AAATTATCCGCGTATTGAAACAA TGCTTCTAAAGCAGGGTTCA HOXA4 CACACACAAATCTGTCTCTTAGGTTT CCTGGATGAAGAAGATCCAC GAACCTAAGCGCTCTCGAAC ACGCTGTGCCCCAGTATAAG TFPI2 TGTCTGAGGGAATACTGTAGCA TGTGTGAACCACGGAAACAT TGACTTCCTGCCCAGATTTT CACCAACATTTTAAATTCATCCAA TGCCCAACTTCTTGGGATAG (s) TAC1 TTTGCAGTGGCTTATGAAAGAA CATGGCCAGATCTCTCACAA AAGAAAGGCTGTTGATTTGACA CAGCATGAAAGCAGAACCAG ASNS CACATGCTCTCCCCTTTAGC TGGTTCAAGATTTTGCAGGAC TTTTGGTCACCATCAGAGCA TGTGAAACTTTTATTTCTTTTCATGC ACN9 GTGACTCAGGACGTGTGCAT CTGGGTGACCAGTACGTGAA TTTTTCATCTCCCCAACCTTT TTTTTCATCTCCCCAACCTTT SHFDG1 CATGAAGCGACGGCTACC GAGCAGCTTGGGTCTTGG TCTTTCCAGATTTCTGTTAGTGATTG TTCTCCAGCTCAGCACGTAA CCDC123 GCCTTAGATGTCTGCCTGCT AATGACATGGAGCGTTGGAT TCTCCTGAAGTCTTCCCATCA TCACTTGAGAACACCACACACA LOC253012 CCAAACAAAACAACCAAACAAA GTAGGTCTTCCCCTGCCTCT GGTTCCTTTACCCATAAAACTAGAAA GGAACGCTTGGAAGTTTCAG CYLN AGCCTACATGGCTGTGCTG AAGAGGTCCAGGAGGAGCTG AGACCTACCAGACCGAGGTG CCTTCATCCGAATTGTCTCC FLJ31340 CCATGGTTATAGAAAGTGGCTTC AATCATGGGGAGGAACCAA AGCTCACAGGACAGCTCCTC TCGCTATAGCAAAGCGAGTG GRM8 ACCCGCAATTAGAGTCACCA AGGTGGTCCCCCTTCTTCT CTTCCTCTTCTCCCCAGGAA TCACTTAGCTCTGGGGCTGT CTTTCTCCCTTGGCATGAAC (s) 54

UBE3C ACAAGCAAGCTCTGGTAGGC CTGCCTTTTGTGGTTCCATT AGCTTTCCAGGCATAAAGCA GACAGGATGATCGGCAGAAT AK038694 TGGATGGCTGTCCAGTCTTA CTGCGGAGCAGGGACAG CCCAGGCACTTCTCTGACAT CAATGTCATGTTTCAGCCAAA PHTF2 GGCTCTGCTCCAGGAGTCTA GTTGTCATCCTTTCGGCTGT GAGGGGTGCACCAGATTAAA GCTGTGCAAACAGCTCAGTG UPK3B GAATGTGATGGCCTCTGGTT CTCTTGCTCCTGGCTTTCCT CAGAATGCCAGATGCACAAT TAAGTTGGCCAAGCTGACCT NPTX2 ACCCTGGCACCCCATTAAG GATGCTGGGGATTGTCTGAG AGTCAAAAGCAGGGATGTGC GCAGGAGATCATCAACATCG AIP1 GCTAATCAACATGGGCCATTA GAAAGCGGGACCTATGAAGA TGAGCCCTAAAGAAAACCATC GTAAGGCTGGGAGGGTGTCT RSBN1L AAATCATGAACCGAAGATTTCA TGCGGATCACATAGGACAAG GGACAGGCACTGAGGAATGT CCAGGACACGAGTCTTCCTT FGL2 TCCAATTGCTGTTGAGTCTGA GGATGGCAAGTGTTCCAAGT GTGCTTTCAAGCATTCCTCA GTGCTTTCAAGCATTCCTCA HOXA11 AGATTGCCAGAAGCTTCCAA GGCCACACTGAGGACAAGG ATGCAGATTTTGCCCTTGAC TTCACAGCCACCTTTTGGTT HOXA11-AS TCATGCATCAGGGTACAAGG CTGTTTTAGAGGCGCTGAGG ATCTCCGACTGCAAGGAAAA TGCCGATCAAATCTCCTTTC LOC253012 CCAAACAAAACAACCAAACAAA GTAGGTCTTCCCCTGCCTCT GGTTCCTTTACCCATAAAACTAGAAA GGAACGCTTGGAAGTTTCAG WBSCR16 TCTGAGGCCTCCTACTCCAA ATGTGGGCTTAGCCACTTTG GTGGCTTCTCTTCCAGGTGA AGACAAGACACATCCCCAGA TSGA14 TGCTGGAAAAGACTCGAAGG CTGCCAGCTGAGAACAAGTG GTTGCTGGGGTCTTCCTGTA GCCTCTCTGCTAAACCAGCA SAMD9L CAACAACACACAGACCAGCA TCAGAATCGAAACTGGAAACTG TTTACCCAAACAAACCAGCA CTTGTCCACTCATGTTCAAGC CPA2 AGTGGCGGAAGCATCGAC CTGGATGAAGTGGCCCAAAG TGTTCATCTGTTCTGTATGTCCCTA TGTTCATCTGTTCTGTATGTCCCTA CPA5 TGGATGATGCGGGATAGAAT GCCTCGCTCTTGTTCTCACT ATAGCTGGCCCTGCTCTCC CTCACGACAGCACATCACCT CPA1 ATGCCCATCCTTGTTTCTGA CCAGATGTGAGGGGAACTGT CTCCAGATACCCGCTCCTC CCCTCACTTTCTGCACCTG GTF2I CTCACGTGCCTAACTGCTGA TCAAGCAGGAGCCAGACC TGCGTGTTTGTTCTCTCTGC CACAGGGACTTCCTTTGCTG AB041803 TCTTCCAGGGTCCGTAACTG GCTCACTGGTTTGCAGGAGT CTGACTCTGGGCTCTCAGGA GTGTGGAAATCTGGGCTTTG AK034494 TTTGAATTGTGCGAGCTGAC TTTGAATTGTGCGAGCTGAC AACACCGAGCCTCATTGAAC AACACCGAGCCTCATTGAAC 55

PON3 ATCAGCTGGAGGCTGCTTAC TAACCCCATGAAGCTGTTGA ACGCATCCAGGACTCTTTGT CAAGGAGCACAAATTCAAGTG (s) sequencing primer 56

Table II.2 Transcripts studied for preferential allelic expression Expression Human Murine Gene Candidate PAE locus locus Brain Limb Liver Lung Heart Yolk Sac Labyrinth Placenta Intestine Tongue Diaphragm Neurons Glia Eye HOXA4 7p15.2 6qB3 Nikaido Y Y N Y Y N YS HOXA11 7p15.2 6qB3 Microarray N Y N N N Y Y Y HOXA11-AS 7p15.2 6qB3 Microarray N Y N N N Y Y Y TAX1BP1 7p15.5 6qB3 Microarray Y Y Y Y Y Y AQP1 7p15.1 6qB3 Microarray Y Y Y Y Y PHTF2 7q11.23 5qA3 Nikaido Y Y Y Y Y W Y RSBN1L 7q11.23 5qA3 Nikaido Y Y Y Y Y Y Y FGL2 7q11.23 5qA3 Nikaido Y Y Y Y Y Y Y La CYLN2 7q11.23 5qG2 WBS Y W W Y W Y Y GTF2I 7q11.23 5qG2 Nikaido Y Y Y Y Y Y Y WBSCR16 7q11.23 5qG2 Nikaido Y Y Y Y Y Y Y UPK3B 7q11.23 5qG2 Nikaido N Y Y Y Y Y Y PCLO 7q21.11 5qA1 Microarray Y Y N N Y Y Y T AIP1 7q21.11 5qA3 Nikaido Y Y Y Y Y W W Y Y G FLJ31340 7q21.12 5qA1 Nikaido Y Y Y Y Y N N Y Li BET1 7q21.3 6qA1 PC Y Y N Y Y Y Y Y Y DYNC1I1 7q21.3 6qA1 PC Y Y N N N N PDK4 7q21.3 6qA1 PC Y Y Y Y Y Y Y Y G,Ne,Li,B GNG1 7q21.3 6qA1 PC N N N N N N Y E TFPI2 7q21.3 6qA1 PC N Y Y Y Y Y Y Y Y Lu TAC1 7q21.3 6qA1 PC Y Y N N Y N N Y L ASNS 7q21.3 6qA1 PC Y Y Y Y Y Y SLC25A13 7q21.3 6qA1 PC N Y Y N N Y N ACN9 7q21.3 6qA1 PC Y Y Y Y Y Y Y Y Li SHFDG1 7q21.3 6qA1 PC Y Y Y Y Y Y Y Y CCDC123 7q21.3 6qA1 PC Y Y Y Y Y Y Y Y LOC253012 7q21.3 6qA1 PC Y N N N N Y N Y Y YS PON3 7q21.3 6qA1 PC Y PON2 7q21.3 6qA1 PC Y Li LOC253012 7q21.3 6qA1 PC Y N N N N Y N Y Y Y YS SAMD9L 7q21.3 6qA1 PC Y Y Y Y Y Y Y AK076963* 6qA1 PC N N N N N N N AK038694* 6qA1 PC Y N N N N N N Y Y NPTX2 7q22.1 5qG2 Nikaido Y N N W N N W Y Y GRM8 7q32.1 6qA3.2 Nikaido Y Y N N W N Y I TSGA14 7q32.3 6qA3.3 PC Y Y Y Y Y Y Y CPA2 7q32.3 6qA3.3 PC Y Y Y Y Y Y Y Y YS CPA5 7q32.3 6qA3.3 PC N N N N N N N CPA1 7q32.3 6qA3.3 PC N N N N N N N AB041803* 6qA3.3 PC Y N N N N N N Y Y G,Ne AK034494* 6qA3.3 PC Y Y Y Y Y Y Y UBE3C 7q36.3 5qB1 Nikaido Y N Y Y Y Y Y PAE: Preferential Allelic Expression; WBS: Williams-Beuren Syndrome region; PC: Positional Candidate B: Brain; L: Limb; I: Intestine; T: Tongue; P: Placenta; YS: Yolk Sac; D: Diaphragm; Li: Liver; Lu: Lung; H: Heart; La: Labyrinth; Ne: Neuron; G: Glia; E: Eye W: Weak expression, could not be analyzed; *: Murine specific ncRNA 57

II.C RESULTS

Forty-two imprinted candidates were selected and analyzed for preferential allelic expression in a parent-of-origin pattern. Murine homologues to the human candidates were analyzed in tissues from the F1 hybrid offspring of inbred strains of mice. The imprinted candidates were analyzed in numerous tissues, yet emphasis was placed on tissues of extra- embryonic origin, namely the yolk sac and the placenta, due to the fact that several imprinted genes have preferential allelic expression only in extra-embryonic tissues (22,

23). Preferential allelic expression was identified by analyzing transcribed polymorphisms in cDNA.

II.C.1 Transcripts selected for imprinting analysis

The transcripts selected for imprinting analysis are listed in Table II.2, which also summarizes the method used for the selection of each candidate transcript. Of the 42 transcripts selected for expression analysis, 24 were positional candidates, the majority of which were located in the 7q21.3 imprinted domain that encompasses SGCE and PEG10

(described in I.C.3.ii.b). Twelve transcripts were imprinted candidates identified by microarray using tissues from parthenogenetic and androgenetic murine embryos (17).

Several of these transcripts, in addition to CYLN2, were selected due to their location in the

WBS deletion region. Five transcripts were selected from microarray data using cell lines from mUPD7 and pUPD7 patients, as well as somatic cell hybrids with a single copy of human chromosome 7. Four transcripts were murine-specific non-coding RNAs found within known imprinted regions (AB041803, AK034494, AK076963, and AK038694).

All genes were found to have expression in at least one tissue used for imprinting analysis with the exception of Cpa5 and Cpa1. The former was found to have testis-specific expression and its allelic expression could not be determined. The latter was not expressed 58 in the tissues examined. An in silico search of microarray data revealed that the transcript is expressed in tissues of the spleen and pancreas. Consequently, it could not be analyzed in this study.

Ghrhr was selected for imprinting analysis from microarray data. However, polymorphisms could not be found between the strains of mice used in this analysis, and was consequently excluded from subsequent experiments (data not shown). This transcript, together with the genes described in other chapters (Pon1, Cpa4, and Klf14), brings the total number of analyzed transcripts to 46.

II.C.2 Preferential allelic expression is a common phenomenon

The tissues in which each candidate transcript was analyzed are summarized in

Table II.2. Additionally, the tissues in which preferential allelic expression was observed are also noted. Sixteen transcripts were found to have preferential allelic expression in a non-parent-of-origin pattern in at least one tissue, demonstrating that this is a common phenomenon in mice, as previously described (16). No single transcript was found to have ubiquitous preferential allelic expression, suggesting that tissue specific factors regulate this phenomenon.

The level of preferential allelic expression varied between transcripts (Figure II.2).

Some exhibited monoallelic expression, seen in Pon2 liver expression, while others demonstrated slight decreases in the expression of an allele (seen in Tac1 limb expression).

Such differences in the degree of preferential gene expression stress the importance of comparing cDNA expression to that of genomic DNA (gDNA) (Figure II.3). As seen in

Figure II.3, the difference in peak heights in cDNA can be mistaken for preferential allelic expression. Indeed, the difference in peak heights in Gng1 (Figure II.3) is comparable to 59 that of Tac1 (Figure II.2). Yet when the gDNA amplification bias is taken into account, preferential allelic expression cannot be considered in the sample shown in Gng1.

Additionally, many strain-specific effects were seen in expression, where preferential allelic expression would be observed between the F1 hybrid offspring of two inbred strains of mice, yet such differential expression would not be seen the F1 hybrid offspring of a different cross, as seen in Figure II.4. In genes where strain specific effects were seen, differential allelic expression was still considered to be present. 60

Figure II.2

A gDNA cDNA B gDNA cDNA

CxB-Yolk Sac CxB-Limb BxC BxC

BxC-Yolk Sac BxC-Limb Hoxa4 Tac1

C gDNA cDNA B BxC-Liver

C CxB-Liver

Pon2

Figure II.2 Variation in preferential allelic expression levels. Sequencing electropherograms from the PCR amplification of cDNA from tissues of F1 hybrids of inbred mouse strains are shown. In each of the panels, the sequencing electropherograms on the left show polymorphisms in genomic DNA (gDNA), while those on the right show the expression of the polymorphism in cDNA. BxC and CxB indicate crosses between C57BL/6 (B) and CAST/Ei9 (C), where the first letter denotes the mother in the cross. Panel A shows the expression of a polymorphism in Hoxa4, indicating preferential allelic expression in the yolk sac. Panel B shows the expression of a polymorphism in Tac1 that suggests preferential allelic expression in the limb. Panel C shows the expression of two polymorphisms in Pon2 indicating preferential allelic expression in the liver. The same allele is being preferentially expressed, regardless of parental-origin, indicating that imprinting is not present. In each gene, the degree of allelic preference is variable. Pon2 shows monoallelic expression of the C allele, whereas Tac1 shows slight allelic preference.

61

Figure II.3

A B gDNA cDNA gDNA cDNA

Brain Eye CxB JxB

BxC BxC BxJ

Heart Shfdg1 Gng1

Figure II.3 Comparison of polymorphic peaks between genomic DNA and cDNA to determine preferential allelic expression. Sequencing electropherograms from PCR amplification of cDNA from tissues of F1 hybrids of inbred mouse strains are shown. In each of panels, the sequencing electropherograms on the left show polymorphisms in genomic DNA (gDNA), while those on the right show the expression of the polymorphism in cDNA. BxC, CxB, and JxB indicate crosses between C57BL/6 (B), CAST/Ei9 (C), or JF1/Ms (J) where the first letter denotes the mother in the cross. In each cDNA sequencing electropherogram, the same allelic variance is seen as in the gDNA, suggesting that the skew is not due to preferential allelic expression, but to experimental bias. All cDNA samples are from tissues at 15.5dpc, with the exception of BxJ eye, which was extracted at 17.5dpc. 62

Figure II.4 A gDNA cDNA

BxC BxJ

B C J CxB JxB Liver Pon2 B gDNA cDNA

BxC BxC BxJ

BxJ CxB JxB

Liver Pdk4

Figure II.4 Strain specific effects in preferential allelic expression. Sequencing electropherograms from the PCR amplification of cDNA from tissues of F1 hybrids of inbred mouse strains are shown. In each of the panels, the sequencing electropherograms on the left show polymorphisms in genomic DNA (gDNA), while those on the right show the expression of the polymorphism in cDNA. BxC, CxB, and JxB indicate crosses between C57BL/6 (B), CAST/Ei9 (C), or JF1/Ms (J) where the first letter denotes the mother in the cross. Differences in preferential allelic expression are seen in both Pon2 (panel A) and Pdk4 (panel B) between BxC/CxB crosses and BxJ/JxB crosses. All cDNA samples are from 15.5dpc. 63

II.C.3 Analysis of preferential allelic expression in placental tissues

Due to the high risk of contamination from tissues of maternal origin in placental dissections, the analysis of preferential allelic expression in this organ is problematic. The analysis of placental tissues which bear maternal contamination, yields results that can be mistaken for preferential allelic expression from the maternal allele (imprinting).

Consequently, additional measures were taken when maternal expression was observed in placental tissues.

The preliminary analyses of Hoxa11 and Hoxa11-AS indicated maternal specific expression. Due to the fact that Hoxa11 is involved in the differentiation of primary trophoblasts and is associated with an antisense non-coding transcript, the gene was considered to be a very good candidate for imprinted expression (24). Monoallelic maternal expression was seen in whole placental cDNA at 15.5dpc (Figure II.5a), as well as cDNA from placentas at 12.5dpc (data not shown).

To reduce the possibility of contamination from maternal tissues, a more cautious dissection of the placenta was performed, extracting the labyrinth and spongiotrophoblast layers of the placenta, which are of fetal origin. Additionally, the maternal decidua was extracted, which is the outermost layer of the placenta and is of maternal origin. An analysis of Hoxa11 in these tissues revealed that the gene was not imprinted in the labyrinth, yet it was expressed from the maternal allele in the spongiotrophobast (Figure II.5b).

However, due to the close proximity of the spongiotrophoblast with the maternal decidua, maternal contamination of this tissue could not be excluded. Similar results were obtained when Hoxa11-AS was analyzed

To determine if Hoxa11 was imprinted in the spongiotrophoblast, polymorphisms unique to each of the three strains of mice used in this study were identified (Figure II.6a). 64

Subsequently, a cross was performed between the F1 hybrid BxJ and the inbred mouse

CAST (Figure II.6a). F2 embryos of genotypes BxC and JxC were obtained and dissections were performed, extracting the labyrinth, spongiotrophoblast, and maternal decidual layers of the placenta. Additionally, the limb was extracted and was used as a control. Using the

F2 embryos of genotype BxC, it was predicted that if the genotype corresponding to the J allele was observed in cDNA, the tissues would necessarily bear maternal contamination.

The analysis of polymorphisms in cDNA revealed that the limb and labyrinth layers were of embryonic origin (BxC). However, the spongiotrophoblast layer was contaminated with maternal tissue, since it carried the maternal genotype. The maternal decidua also bore the

BxJ genotype, as was expected. Consequently, the analysis revealed that Hoxa11 was not imprinted.

Hoxa11-AS did not have SNPs unique to each mouse strain in the genomic regions examined, and consequently preferential allelic expression could not be analyzed in the F2 hybrid mice. 65

Figure II.5

A B gDNA cDNA gDNA cDNA Placenta Limb Limb Labyrinth

C BxC J

TCCCCA TCTCCA TCNCCA TCCCC TCNCC TCNCC BxJ

BxC CxB BxJ

TCNCC TCCCCA TCNCCA TCNCC TCTCC TCTCC MD ST

Figure II.5 Imprinting analysis of Hoxa11. The imprinting analysis of Hoxa11 in the F1 hybrid offspring of BxC and CxB embryos (panel A), as well as BxJ embryos (panel B) is shown. In each of the panels, the sequencing electropherograms on the left show polymorphisms in genomic DNA (gDNA), while the panels on the right show the expression of the polymorphism in cDNA. BxC, CxB, and JxB indicate crosses between C57BL/6 (B), CAST/Ei9 (C), or JF1/Ms (J) where the first letter denotes the mother in the cross. In panel A, Hoxa11 monoallelic expression from the maternal allele is seen in the placenta. In panel B, the labyrinth and spongiotrophoblast layers of the placenta were extracted. Panel B also shows expression from the maternal decidua, which of maternal origin. The results demonstrate that Hoxa11 is biallelically expressed from the labyrinth, yet suggest that the transcript is maternally expressed in the spongiotrophoblast. 66

Figure II.6 A SNP 1 2 3 Strain ♀ ♂ B A T T BxJ x C

C G C C BxC JxC J A C T

1 2 3 Corresponding B Strain

Limb

Labyrinth

Genotype A T T B G C C C

Maternal decidua

Spongiotrophoblast

A T T Genotype B A C T J

Figure II.6 Imprinting analysis of Hoxa11 using F2 hybrids. A) The genotype for each of the three strains of mice used in the analysis (C57BL/6 (B), CAST/Ei9 (C), or JF1/Ms (J)) at three polymorphisms in the Hoxa11 transcript is indicated in the table. On the right, the cross performed for the analysis is shown, where the underlined cross denotes the mouse analyzed in panel B. B) The expression of SNPs 1-3 of Hoxa11 is shown for four different tissues extracted from a BxC embryo, product of the cross depicted in panel A. The deduced genotype and its corresponding strain are indicated beneath the sequencing electropherograms. The results indicate that the limb and labyrinth (top electropherograms) are biallelically expressed from the embryo (due to the expression from BxC alleles), while the spongiotrophoblast (bottom electropherograms), which is expected to have a BxC genotype due to its embryonic origin, is expressed from the same alleles as the maternal decidua (BxJ alleles). Consequently, the spongiotrophoblast dissection from the BxC embryo is contaminated with maternal tissues. 67

II.D DISCUSSION

This chapter summarizes the allelic expression patterns for 42 murine homologues of candidate imprinted genes on human chromosome 7. These transcripts were identified using several methods, including the selection of transcripts within known imprinted domains and the selection of putative imprinted genes from microarray data (Figure II.1).

The identification of a novel imprinted gene (Klf14), detailed in chapter V, suggests that positional candidate gene selection is one of the most successful methods for imprinted gene identification. Numerous candidate genes were selected from microarray data from parthenogenetic and androgenetic embryos in a genome-wide screen performed by Nikaido, et al (17), however none of these genes showed imprinted expression, suggesting that the use of these materials is an ineffective method to identify imprinted transcripts. A similar conclusion was drawn in a recent study that analyzed 68 candidate imprinted transcripts from the Nikaido et al study located within or flanking known imprinted domains (25).

These 68 transcripts were considered to be the strongest candidates from the genome-wide screen, yet 93% of the analyzed transcripts were non-imprinted. The authors concluded that the genes identified by microarray represent those that are differentially expressed between parthenogenetic and androgenetic embryos, including transcripts that are regulated by imprinted genes.

The screen described in this chapter identified preferential allelic expression in a non-parent-of-origin pattern in numerous genes. Previous studies have demonstrated that preferential allelic expression is a common phenomenon in mice (16). Cowles, et al identified a 1.5-fold difference in allelic expression in 6% of genes in a screen of 69 transcripts. Here, preferential allelic expression was identified in 38% of transcripts examined. The difference in frequency between these studies can be attributed to the fact 68 that this study examined expression in numerous tissues at several developmental stages, while Cowles, et al examined allelic ratios in a limited number of tissues. Additionally, the study by Cowles and colleagues set a very high cutoff (1.5-fold) in identifying allelic preference and the authors state that lower levels of preferential allelic expression may be more prevalent among their examined genes. Thus, the findings in this chapter stress the importance of screening for preferential allelic expression in multiple tissues. Consequently, the number of genes undergoing preferential allelic expression may be highly underestimated due to the scarcity of studies which analyze this phenomenon in multiple tissues, among other reasons.

It is important to note that the identification of preferential allelic expression in a given tissue does not exclude the possibility of allelic preference in only a subset of cells.

Additionally, if imprinted gene expression were to occur in only a specific cell type, it may not be identified due to biallelic expression from surrounding cells. Attempts were made to study specific cell types by creating neuronal and glial cell lines since cell type specific imprinting has been observed in the brain (26). Interestingly, the analysis of Pdk4 in these cell lines revealed that it exhibits preferential allelic expression in both cell types as well as whole brain samples (Table II.2). However, such an in depth analysis could not be extended to all tissues analyzed.

The quantitative analysis of Pon1 (chapter III) demonstrates that differences in peak heights in sequencing electropherograms can correlate with expression levels. However, to accurately determine the difference in allelic ratios, a quantitative method must be employed, such as pyrosequencing or SNaPshot. Such measurements are important in understanding the effects of regulatory polymorphisms on gene expression. At the same 69 time, these measurements are not reflective of protein levels, but represent levels of the transcript present in the cell.

Sequencing and PCR amplification bias were important factors to consider in this study as potential sources of error. Much effort was taken to minimize PCR amplification bias by ensuring that the primers used did not overlap polymorphisms. Additionally, peak heights from amplified cDNA were compared to allelic ratios in gDNA to account for sequencing bias. However, due to the fact that the same primers were seldom used for both cDNA and gDNA analyses, amplification bias could not be completely ruled out.

Nevertheless, transcripts displaying preferential allelic expression generally demonstrated this phenomenon in specific tissues. Consequently, the biallelically expressed samples generated using the same sequencing and PCR conditions as the samples where preferential allelic expression was observed served as an internal control. This indicated that the observed skews in peak heights in sequencing electropherograms were not due to experimental conditions.

The analysis revealed that the difference in allelic expression occurs along a broad spectrum. Previous studies have demonstrated that differences in the expression of polymorphic alleles can be as high as 100% (27). Similar findings were observed in this chapter, where Pon2 displayed near monoallelic expression in the liver, yet other transcripts demonstrated lesser degrees of preferential allelic expression (Figure II.2).

The results of the analyses in this study demonstrate that strain-specific preferential allelic expression can occur (Figure II.4). I hypothesize that these strain specific differences in expression are due to polymorphisms located in cis-acting regulatory regions unique to each strain of mouse used in the study. Due to the scarcity of such unique polymorphisms 70 in regulatory regions between the three strains of mice used in this study, the identification of cis-acting regulatory elements may be simplified by focusing on these distinct variants.

This chapter provides a novel method of identifying maternal contamination in placental tissues. Due to the importance of imprinted transcripts in embryonic and placental development, discarding the possibility of maternal contamination in tissues used for these studies is essential. The method described in this study relies on the identification of transcribed polymorphisms unique to three different strains of mice and performing crosses between a hybrid and inbred mouse. However, this method is limited to studying transcripts where a unique genotype can be identified in the mice used. Consequently, it will not detect maternal contamination in all transcripts of interest. Using this method, it was determined that Hoxa11 is not imprinted in the spongiotrophoblast layer of the placenta, and that the observed expression of the gene in dissections of this tissue was due to maternal contamination (Figure II.6). However, since only maternal expression was observed, the study could not determine if there is any Hoxa11 expression in the spongiotrophoblast.

Consequently, additional studies, such as RNA in situ hybridization must be performed to determine if the transcript is expressed in this placental layer. Due to the fact that unique polymorphisms to each of the three strains of mice used in this study could not be identified in Hoxa11-AS, the analysis could not be extended to the antisense transcript. However, the study of Hoxa11 was sufficient to determine that the spongiotrophoblast dissections bore maternal contamination.

As high-throughput murine genotyping technologies develop, the materials described in this study can be used to discover potentially hundreds of transcripts subject to preferential allelic expression, particularly in tissue specific patterns. These studies would be limited to the analysis of transcripts where polymorphisms could be identified in genes 71 of interest. However, as noted in this screen, the absence of polymorphisms was only seen in ~2% of genes. Consequently, the use of evolutionarily distant strains of mice is an effective method for the identification of transcribed polymorphisms.

The frequency of preferential allelic expression observed in this study adds to the growing body of evidence which demonstrates that preferential allelic expression in a non- parent-of-origin pattern is a common phenomenon in mammals, where over 50% of transcripts may exhibit biased allelic expression (16, 27-30). However, this study is unique in demonstrating that preferential allelic expression occurs commonly with tissue specificity, suggesting that tissue specific factors are involved in this phenomenon. Further studies are required to identify regulatory polymorphisms which may influence allelic expression in the transcripts listed in this study, and the effect of these variants on transcription factor binding and transcriptional stability. The identification of such polymorphisms will help understand the mechanisms regulating the differential expression of alleles and uncover the effect of this phenomenon on human disease.

72

I.E REFERENCES

1. Gicquel, C., Rossignol, S., Cabrol, S., Houang, M., Steunou, V., Barbu, V., Danton, F., Thibaud, N., Le Merrer, M., Burglen, L. et al. (2005) Epimutation of the telomeric imprinting center region on chromosome 11p15 in Silver-Russell syndrome. Nat Genet, 37, 1003-7. 2. Abu-Amero, S., Price, S., Wakeling, E., Stanier, P., Trembath, R., Preece, M.A. and Moore, G.E. (1997) Lack of hemizygosity for the insulin-like growth factor I receptor gene in a quantitative study of 33 Silver Russell syndrome probands and their families. Eur J Hum Genet, 5, 235-41. 3. Eggermann, T., Eggermann, K., Mergenthaler, S., Kuner, R., Kaiser, P., Ranke, M.B. and Wollmann, H.A. (1998) Paternally inherited deletion of CSH1 in a patient with Silver-Russell syndrome. J Med Genet, 35, 784-6. 4. Ramirez-Duenas, M.L., Medina, C., Ocampo-Campos, R. and Rivera, H. (1992) Severe Silver-Russell syndrome and translocation (17;20) (q25;q13). Clin Genet, 41, 51-3. 5. Eggermann, T., Wollmann, H.A., Kuner, R., Eggermann, K., Enders, H., Kaiser, P. and Ranke, M.B. (1997) Molecular studies in 37 Silver-Russell syndrome patients: frequency and etiology of uniparental disomy. Hum Genet, 100, 415-9. 6. Preece, M.A., Price, S.M., Davies, V., Clough, L., Stanier, P., Trembath, R.C. and Moore, G.E. (1997) Maternal uniparental disomy 7 in Silver-Russell syndrome. J Med Genet, 34, 6-9. 7. Hannula, K., Lipsanen-Nyman, M., Kontiokari, T. and Kere, J. (2001) A narrow segment of maternal uniparental disomy of chromosome 7q31-qter in Silver-Russell syndrome delimits a candidate gene region. Am J Hum Genet, 68, 247-53. 8. Reboul, M.P., Tandonnet, O., Biteau, N., Belet-de Putter, C., Rebouissoux, L., Moradkhani, K., Vu, P.Y., Saura, R., Arveiler, B., Lacombe, D. et al. (2006) Mosaic maternal uniparental isodisomy for chromosome 7q21-qter. Clin Genet, 70, 207-13. 9. Kayashima, T., Yamasaki, K., Yamada, T., Sakai, H., Miwa, N., Ohta, T., Yoshiura, K., Matsumoto, N., Nakane, Y., Kanetake, H. et al. (2003) The novel imprinted 73

carboxypeptidase A4 gene (CPA4) in the 7q32 imprinting domain. Hum Genet, 112, 220-6. 10. McCann, J.A., Zheng, H., Islam, A., Goodyer, C.G. and Polychronakos, C. (2001) Evidence against GRB10 as the gene responsible for Silver-Russell syndrome. Biochem Biophys Res Commun, 286, 943-8. 11. Riesewijk, A.M., Blagitko, N., Schinzel, A.A., Hu, L., Schulz, U., Hamel, B.C., Ropers, H.H. and Kalscheuer, V.M. (1998) Evidence against a major role of PEG1/MEST in Silver-Russell syndrome. Eur J Hum Genet, 6, 114-20. 12. Meyer, E., Wollmann, H.A. and Eggermann, T. (2003) Searching for genomic variants in the MESTIT1 transcript in Silver-Russell syndrome patients. J Med Genet, 40, e65. 13. Feuk, L., Kalervo, A., Lipsanen-Nyman, M., Skaug, J., Nakabayashi, K., Finucane, B., Hartung, D., Innes, M., Kerem, B., Nowaczyk, M.J. et al. (2006) Absence of a paternally inherited FOXP2 gene in developmental verbal dyspraxia. Am J Hum Genet, 79, 965-72. 14. Yamada, T., Mitsuya, K., Kayashima, T., Yamasaki, K., Ohta, T., Yoshiura, K., Matsumoto, N., Yamada, H., Minakami, H., Oshimura, M. et al. (2004) Imprinting analysis of 10 genes and/or transcripts in a 1.5-Mb MEST-flanking region at human chromosome 7q32. Genomics, 83, 402-12. 15. Bonora, E., Bacchelli, E., Levy, E.R., Blasi, F., Marlow, A., Monaco, A.P. and Maestrini, E. (2002) Mutation screening and imprinting analysis of four candidate genes for autism in the 7q32 region. Mol Psychiatry, 7, 289-301. 16. Cowles, C.R., Hirschhorn, J.N., Altshuler, D. and Lander, E.S. (2002) Detection of regulatory variation in mouse genes. Nat Genet, 32, 432-7. 17. Nikaido, I., Saito, C., Mizuno, Y., Meguro, M., Bono, H., Kadomura, M., Kono, T., Morris, G.A., Lyons, P.A., Oshimura, M. et al. (2003) Discovery of imprinted transcripts in the mouse transcriptome using large-scale expression profiling. Genome Res, 13, 1402-9. 18. Perez Jurado, L.A., Peoples, R., Kaplan, P., Hamel, B.C. and Francke, U. (1996) Molecular definition of the chromosome 7 deletion in Williams syndrome and parent-of-origin effects on growth. Am J Hum Genet, 59, 781-92. 74

19. Mnatzakanian, G.N., Lohi, H., Munteanu, I., Alfred, S.E., Yamada, T., MacLeod, P.J., Jones, J.R., Scherer, S.W., Schanen, N.C., Friez, M.J. et al. (2004) A previously unidentified MECP2 open reading frame defines a new protein isoform relevant to Rett syndrome. Nat Genet, 36, 339-41. 20. Rozen, S. and Skaletsky, H. (2000) Primer3 on the WWW for general users and for biologist programmers. Methods Mol Biol, 132, 365-86. 21. Ono, R., Shiura, H., Aburatani, H., Kohda, T., Kaneko-Ishino, T. and Ishino, F. (2003) Identification of a large novel imprinted gene cluster on mouse proximal chromosome 6. Genome Res, 13, 1696-705. 22. Lewis, A., Mitsuya, K., Umlauf, D., Smith, P., Dean, W., Walter, J., Higgins, M., Feil, R. and Reik, W. (2004) Imprinting on distal chromosome 7 in the placenta involves repressive histone methylation independent of DNA methylation. Nat Genet, 36, 1291-5. 23. Sandell, L.L., Guan, X.J., Ingram, R. and Tilghman, S.M. (2003) Gatm, a creatine synthesis enzyme, is imprinted in mouse placenta. Proc Natl Acad Sci U S A, 100, 4622-7. 24. Zhang, Y.M., Xu, B., Rote, N., Peterson, L. and Amesse, L.S. (2002) Expression of homeobox gene transcripts in trophoblastic cells. Am J Obstet Gynecol, 187, 24-32. 25. Ruf, N., Dunzinger, U., Brinckmann, A., Haaf, T., Nurnberg, P. and Zechner, U. (2006) Expression profiling of uniparental mouse embryos is inefficient in identifying novel imprinted genes. Genomics, 87, 509-19. 26. Yamasaki, K., Joh, K., Ohta, T., Masuzaki, H., Ishimaru, T., Mukai, T., Niikawa, N., Ogawa, M., Wagstaff, J. and Kishino, T. (2003) Neurons but not glial cells show reciprocal imprinting of sense and antisense transcripts of Ube3a. Hum Mol Genet, 12, 837-47. 27. Pastinen, T., Sladek, R., Gurd, S., Sammak, A., Ge, B., Lepage, P., Lavergne, K., Villeneuve, A., Gaudin, T., Brandstrom, H. et al. (2004) A survey of genetic and epigenetic variation affecting human gene expression. Physiol Genomics, 16, 184- 93. 28. Yan, H., Yuan, W., Velculescu, V.E., Vogelstein, B. and Kinzler, K.W. (2002) Allelic variation in human gene expression. Science, 297, 1143. 75

29. Lo, H.S., Wang, Z., Hu, Y., Yang, H.H., Gere, S., Buetow, K.H. and Lee, M.P. (2003) Allelic variation in gene expression is common in the human genome. Genome Res, 13, 1855-62. 30. Pant, P.V., Tao, H., Beilharz, E.J., Ballinger, D.G., Cox, D.R. and Frazer, K.A. (2006) Analysis of allelic differential expression in human white blood cells. Genome Res, 16, 331-9.

76

CHAPTER III: DYNAMIC VARIATION IN ALLELE-SPECIFIC GENE EXPRESSION OF PARAOXONASE-1 IN MURINE TISSUES THROUGHOUT DEVELOPMENT

Data from this chapter have been included in the following manuscript:

Parker-Katiraee L., Nakabayashi K., Bousiaki E., Moore G.E., Monk D. and Scherer S.W. Dynamic allelic variation of the paraoxonase-1 gene (PON1) in liver development. To be submitted for peer-review publication.

I performed all RNA extractions, cDNA syntheses, pyrosequencing, qPCR, and analyses of the data. I performed SNaPshot sample preparations, while the reactions were performed by The Centre for Applied Genomics. Human PON1 experiments were performed in collaboration with Eleni Bousiaki, Dr David Monk and Dr Gudrun Moore at the Imperial College School of Medicine, UK, due to their collection of human fetal tissue samples and corresponding maternal DNA. Dissections were performed together with Dr Kazuhiko Nakabayashi, as well as the analysis of the RT-PCR data. 77

III.A INTRODUCTION

The preferential expression of alleles has been observed in both humans and mice

(1-4). Genome-wide studies analyzing expression in several individuals have identified differential allelic expression in over 50% of transcripts examined (1, 3). The importance of the analysis of this phenomenon at disease loci is gradually being unravelled and is highlighted by the association of altered levels of expression of polymorphic alleles with pathogenesis, as highlighted in chapter I.B.1 (for review, see (5))

This chapter describes the identification of preferential allelic expression in the

Paraoxonase-1 gene transcript (mouse gene Pon1; human gene PON1; OMIM: 168820).

The paraoxonase-1 protein (EC 3.1.1.2), known to hydrolyze organophosphate compounds, plays an important role in lipid metabolism by associating with high-density lipoproteins and preventing the oxidation of low-density-lipoproteins (6-8). As such, Pon1-knockout mice are more susceptible to atherosclerotic lesions than wild-type controls (9).

Additionally, high levels of Pon1 protein have been shown to confer protection against poisoning by organophosphorous compounds in murine models (10, 11). Studies have identified various common polymorphisms in Pon1/PON1’s gene sequence and promoter region, which have been shown to correlate with gene expression levels as well as enzymatic activity (12-15). Genotypes with lower enzymatic activity have been associated with coronary artery disease (16, 17), stressing the importance of the analysis of allelic ratios at this locus.

PON1 is a member of a family of at least three genes, all of which map to human chromosome 7q21.3. The mouse orthologues of two of these genes, Pon2 and Pon3, are part of a large cluster of imprinted genes located on chromosome 6A1 (Figure III.1) (18).

The imprinting status of Pon2 and Pon3 has not been determined in humans (Table I.2). As 78 described in chapter I.C.3.ii.b, the gene centromeric to Pon1, Ppp1r9a, has been shown to be imprinted in both human and mouse in a tissue-specific pattern (18, 19). Based on

Pon1’s location within an imprinted cluster, the gene was considered to be a potential candidate for imprinting. Therefore, its imprinting status was examined using F1 reciprocal hybrids of inbred mice strains, as outlined in chapter II.

Although a parent-of-origin pattern of allelic expression was not identified, it was discovered that Pon1 shows preferential allelic expression in the liver and that this expression level changes dynamically throughout embryonic development. This may represent the first demonstration that allele-specific effects on gene expression can be dependent on developmental stage. 79

Figure III.1

Figure III.1 Overview of the murine Pon1 locus. The 1 Mb region contains several imprinted genes (white boxes), as well as genes showing preferential or biallelic expression (black boxes) Transcriptional direction of each gene is shown by arrows. The intron-exon structure of Pon1 is depicted below, as well as its non-coding region (grey segments). Primers used for amplifying cDNA fragments and primers used for amplification of genomic DNA in SNP identification are indicated by fragments and a black bar, respectively. The triangles denote the SNPs described in this chapter.

80

III.B MATERIALS AND METHODS

III.B.1 Allelic expression analysis of murine tissues

Hybrid crosses were performed between C57BL/6J and CAST/Ei, and C57BL/6J and JF1/Ms, and tissues were obtained from the F1 generations and their reciprocal crosses at 12.5 days-post-coitum (dpc), 15.5 dpc and P0 (postnatal day 0). Total RNA was extracted using TRIZOL reagent (Invitrogen) as outlined by the manufacturer. Two micrograms of DNase treated total RNA was used for cDNA synthesis using random primers (SuperScript II, Invitrogen).

To identify single nucleotide polymorphisms (SNPs) in Pon1, a 607 bp fragment was sequenced from cDNA using primers Pon1-F and Pon1-R whose sequences are 5'-

ACAAGAACCATCGGTCTTCC-3' and 5'-CCTTCTGCTACCACCTGGAC-3' (18), respectively. The identified SNP at nucleotide 280 of NM_011134 was confirmed in genomic DNA by PCR amplification using primers Exon-4F (5'-

TGATGTCTCGAGAGCAATGG-3') and Exon-3R (5'-TGCACCACGTTTGAAACAAT-

3'). Genomic SNPs were identified in the promoter using primers Promoter-F 5'-

ATGGCCTGAGACAGATGGAC-3' and Promoter-R 5'-CCTCCTTCCACCACACAAGT-

3', which amplified an 803 bp fragment. The cycling conditions were initial denaturation at

94ºC for 5 min, followed by 35 cycles for cDNA and 40 cycles for genomic DNA of, denaturation at 94ºC for 30 s, annealing at 57.5ºC for 30 s, and extension at 72ºC for 90 s.

Products were purified using microCLEAN (Microzone Ltd), and were subsequently sequenced on an ABI 3730XL using BigDye Terminator v3.1 cycle sequencing kit

(Applied Biosystems), combined with Half BigDye sequencing buffer (Sigma Aldrich).

Biased expression levels were confirmed in at least two embryos at each stage of 81 development and for each hybrid cross.

III.B.2 Quantification of allelic ratios

Pon1 cDNA was amplified for pyrosequencing using Taq2000 (Stratagene) following the manufacturer’s protocol and using primers (forward) 5'-

GACGGGTGCTGAAGACTTAGA-3' and (reverse) 5'-biotin-

AGGCTTACTGGGATCGAAACT-3'. The biotinylated products were purified using streptavidin sepharose (GE Healthcare), following the PSQ 96 sample preparation guide.

The sequencing primer used in the pyrosequencing reaction was 5'-

GGACTAACTTTCTTTAGCAC-3' at a concentration of 0.4 μM per reaction.

Pyrosequencing was performed using the Pyro Gold Enzyme Mixture (Biotage) and analyzed using PSQ 96MA 2.1 ID system.

SNaPshot analysis was performed using Pon1 liver cDNA. Unincorporated PCR primers and dNTPs were removed from 25 μl of PCR products by the addition of 8.3 U of shrimp alkaline phosphatase (SAP) and 3.3 U of Exonuclease I (Exo) (USB). Mixtures were incubated at 37ºC for one hour followed by 75ºC for 15 min to inactivate the enzymes. SNP genotyping was performed using SNaPshot™ single-basepair extension reactions (Applied Biosystems). Primers for the extension reactions were designed according to the manufacturer’s instructions: SNP-g-mPON1-ss2 (5'-

GACTTATAATAAATGTCAATAAACACTCAC-3') was used to quantify the allelic ratios of genomic samples and mPON1 (5'-GACTAATGGACTAACTTTCTTTAGCAC-3') and was used for cDNA.

Reactions contained 7 μl of cleaned PCR product, 2 μl of SNaPshot™ multiplex enzyme mix and 2 pmol of primer for a total volume of 10 μl. Cycling conditions for the extension reaction were 25 cycles of 96ºC for 10 s, 55ºC for 5 s and 60ºC for 30 s. 82

Following the extension reaction, unincorporated ddNTPs were removed by adding 1 U of

SAP and incubating products at 37ºC for one hour followed by 75ºC for 15 min.

Volumes of 1-2 μl of SNaPshot reactions were suspended in 9 μl of Hi-Di formamide

(ABI) and run on an ABI3100 genetic analyzer (Applied Biosystems) using the POP4 polymer and dye set E5. Results were analyzed using the GENESCAN v. 3.1 software.

III.B.3 Real-time quantitative PCR

To compare Pon1 levels at different stages of embryonic development, quantitative

PCR was performed using the Brilliant SYBR Green qPCR Master Mix (Stratagene). The primers used for the reaction were qPCR-Pon1-F (5'-CGGGTGCTGAAGACTTAGAGA-

3') and qPCR-Pon1-R (5'-CTCTGACACTGCTGGCTCCT-3'). The PCR reactions were performed in triplicates and in separate tubes. Absolute quantitation of Pon1 was obtained from cDNA from hybrid mice at various stages of growth. Results were normalized by β-2- microglobulin and β-actin, which was quantified using the following primers: β-2- microglobulin-F (5'-ATGGGAAGCCGAACATACTG-3') and β-2-microglobulin-R (5'-

CAGTCTCAGTGGGGGTGAAT-3'); β-actin-F (5'-TTGTTACCAACTGGGACGAC-3') and β-actin-R (5'- TCTCAGCTGTGGTGGTGAAG-3'). Results were analyzed using the standard curve method according to the manufacturer´s instructions using the Mx3005P quantitative PCR system (Stratagene, La Jolla, USA). The system’s default PCR conditions were used, with the following modification: annealing temperature at 58ºC. The standard curve for all three transcripts quantified was developed using dilutions of a single liver cDNA sample.

III.B.4 Methylation analysis

DNA was extracted from livers of hybrid embryo from the offspring of JF1/Ms

(JF1) and C57BL/6J (BL6) using DNeasy blood and tissue kit (Qiagen). Bisulfite treatment 83 was performed using the EZ methylation protocol (Zymo Research). The bisulfite-treated promoter region of Pon1 was amplified using the following primers: (forward) 5'-

GTTAGAGTTTTTTAGAGGTATTTTGTTGG-3' and (reverse) 5'-

CATAACTAACACTCAATAAACCTCAATC-3'. One microlitre of the reaction was used for a semi-nested reaction where the same forward primer was used with the following reverse primer: 5'-CCTCAATCACATAAAAAAAATACTAAATAA-3'. The amplified product was purified using microCLEAN (Microzone Ltd) and was subcloned using the

TOPO TA-cloning system (Invitrogen).

III.B.5 Human PON1 expression analysis

Fetal tissues were obtained from the MRC Tissue Bank at Imperial College London,

Hammersmith hospital site. Local ethical approval for use of the collection was granted by the Hammersmith, Queen Charlotte’s and Chelsea and Acton Hospitals Research Ethics

Committee (2001/6028).

Total RNA was extracted from fetal tissues using TRIZOL (Life Technologies). RT-

PCR of PON1 was performed on cDNA synthesised from the total RNA following reverse transcription using the primer pair 1F (5'-TATTGTTGCTGTGGGACCTGAG-3') and 1R

(5'-CCACAGATATGTTATCCACG-3'). For each RNA sample a RT negative control was used and standard GAPDH primers were used to confirm RNA integrity.

A 97 bp DNA fragment was amplified from fetal genomic DNA using primers ex6 1F (5'-

TATTGTTGCTGTGGGACCTGAG-3') and 2R (5'-

CAGGCTAAACCCAAATACATCTC-3'), and was sequenced to assay for SNP rs662 on exon 6. The cycling conditions were: initial denaturation at 94ºC for 3 min, followed by 35 cycles denaturation at 94ºC for 30 s, annealing at 55ºC for 30 s, and extension at 72ºC for

90 s. The PCR products were purified and sequenced using primer 5'- 84

TATTGTTGCTGTGGGACCTGAG-3'. Allelic expression pattern was determined by comparing the peak heights of two alleles at a SNP site in sequence electropherograms.

When the peak height for one allele was lower than one tenth of that of the other allele, it was interpreted as monoallelic expression. When one peak was equal or higher than one tenth of the other, and lower than the half of the other, it was assigned to be preferential expression. When one peak was equal or higher than the half of the other, it was regarded as biallelic expression. 85

III.C RESULTS

III.C.1 Pon1 allelic expression is dynamic throughout embryonic development

To determine the allelic expression pattern of Pon1, a SNP (T/C) was identified in its cDNA sequence at nucleotide 280 of NM_011134. The T allele was present in

C57BL/6J (BL6) inbred strains, while the C allele was present in both JF1/Ms (JF1) and

CAST/Ei (CAST) strains. Tissues were extracted at different stages of development (12.5 days post coitum (dpc), 15.5 dpc, and P0 (postnatal day 0)) from the F1 embryos of hybrid strains BxC and BxJ (where B is the female), as well as their reciprocal crosses. Pon1 expression was determined by RT-PCR in lung, liver, yolk sac, placenta, intestine, limb, and brain. A detectable level of Pon1 expression at all three stages was observed only in liver. The transcript was also found to be expressed in brain, but at low levels and was consequently excluded from further analyses.

Allelic expression analysis of Pon1 was performed by direct sequencing of RT-PCR products, which span exons 1-6 (Figure III.1). A preferential expression pattern, independent of parent-of-origin, was observed in the liver samples and was subsequently confirmed by pyrosequencing (Figure III.2). This method has been previously shown to be a quantitative method of determining allelic ratios (20, 21). These analyses revealed a higher expression of the CAST and JF1 alleles at 12.5 dpc in each cross and a marked decrease in its allelic expression level in the later stages examined. This decrease was most notable in crosses involving CAST, where expression of the BL6 allele surpassed that of the CAST allele in P0 samples. In JF1 hybrid crosses however, the JF1 allele always had a stronger expression than of BL6, which is consistent with the allelic preference at P0 observed by Ono, et al (18). Notably, the decrease in CAST and JF1 alleles was more drastic when the BL6 was maternally inherited. This disparity in expression was best 86 observed at 15.5 dpc in crosses involving CAST, where this strain’s allelic frequency is measured at 45.3% in BxC, whereas it is 67.6% in CxB. The observed biased allelic expression patterns were reproduced in a separate set of embryonic livers by direct sequencing of RT-PCR samples at each stage of development, indicating the possibility that the allelic expression patterns are not random but developmentally regulated (Figure III.2).

Additionally, the findings were replicated using SNaPshot, a fluorescent-based primer extension method which allows quantification of differences in peak heights between two alleles (22), suggesting that the findings are not dependent on the methodology (Figure

III.3).

To identify splice variants whose increased expression may account for the differential pattern observed, 5' RACE was performed and the whole length of Pon1 (from exons 1-9) was also amplified. Additional 5' ends or splice variants were not identified.

87

Figure III.2

c c c c p p c p p c d d p d d p .5 .5 d .5 .5 d 2 5 0 2 5 0 1 1 P 1 1 P

BxC BxC

Genomic DNA CxB CxB CB-PON1-exon3

BxJ BxJ

JxB JxB Genomic DNA JB-PON1-exon3

Figure III.2 Expression pattern of Pon1. Electropherograms from sequencing of two sets of liver cDNA samples from hybrid crosses at different developmental stages are shown. Results from amplification of genomic DNA are indicated on the right. The blue and red peaks indicate C and T nucleotides respectively, where T is the allele inherited from BL6 in Pon1. Letters B, C, and J, refer to species BL6, CAST, and JF1, respectively, with the first letter of each cross representing the mother. The results show reproducibility in the two sample sets and indicate a dynamic pattern of allelic expression throughout development. 88

Figure III.3

Figure III.3 Comparison of pyrosequencing and SNaPshot methodologies. The allelic ratios of the CAST or JF1 allele and BL6 allele for Pon1 are shown in black and grey, respectively. The developmental time point and the methodology used to measure the allelic ratio are shown on the X-axis, where P and S represent measurements from pyrosequencing and SNaPshot, respectively. Different sets of cDNA liver samples were used for each methodology. The findings indicate that the dynamic differential allelic ratio observed in Pon1 is not due to the methodology or the samples used.

89

III.C.2 Expression from Pon1 alleles increase disproportionately through embryonic development

To determine if the dynamic allelic patterns of expression observed in Pon1 were due to a static expression of the CAST and JF1 alleles with increases in BL6 allele or if they were caused by decreases in the CAST and JF1 alleles, quantitative PCR was performed using the SYBR Green detection method. The expression of Pon1 was quantified in each liver samples for every developmental stage and each hybrid cross. The values were normalized by β-2-microglobulin gene expression quantities at the corresponding stage (Figure III.4). The analysis revealed that Pon1 expression increased in an exponential pattern during embryonic development. This increase in total gene expression was far more striking in crosses involving JF1. Similar results were obtained using an independent set of liver samples and by β-actin normalization (data not shown).

The latter experiment was performed to ensure that the findings were not dependent on possible variations in the expression of the control transcript.

Upon incorporation of allele frequencies obtained from pyrosequencing analysis into the quantitative measurements, it was apparent that both alleles increased, yet in disproportionate amounts as shown in Figure III.4. The BL6 allele shows a higher fold increase in expression than the CAST or JF1 allele. This is best illustrated in the JxB hybrid crosses, where a 27.7-fold increase in BL6 allelic expression is measured between 15.5 dpc and P0. In contrast, only a 2.8-fold increase in the expression of the JF1 allele was observed between the same two developmental stages (Figure III.4, Table III.1).

90

Figure III.4

91

Figure III.4 Real-time quantitative PCR analysis of Pon1. Transcript abundance of hepatic Pon1 at various stages of embryonic development in F1 hybrid mice was quantified by SYBR Green detection method in triplicates. Values in the uppermost section represent total average Pon1 measurements normalized by β-2-microglobulin expression in each sample. Frequency of CAST or JF1 and BL6 alleles measured by pyrosequencing are represented by pie charts, where the white portions represent the CAST or JF1 allele and the grey portions represent the BL6 allele. The statistical significance in change in expression between each time point was calculated using the standard t-test, and is indicated above each pair of bars, where *, **, and *** indicate p values <0.02, <0.005, and ≤0.0001, respectively. The shading in BJ-P0 expression indicates that it goes beyond the upper limit of the graph. The total expression at this developmental time point was found to be 22.2±3.0 x10-1, and the change in expression between BJ-15.5 and BJ-P0 was found to be <0.02. The four charts in the lower section represent the relative contribution of each allele towards the total expression of Pon1, where the cross and developmental time point is indicated at the bottom of each bar. Fold increase in expression of each allele from the previous time-point is also indicated at the top of the bar. Error bars were calculated using the standard deviation of both pyrosequencing and quantitative PCR results. The results indicate a disproportionate increase in the expression of each allele. JB, CB, BC and BJ refer to JxB, CxB, BxC and BxJ F1 hybrids, respectively. 92

Table III.1 Quantitative PCR and pyrosequencing results for Pon1

BC-12.5 BC-15.5 BC-P0 RE 10.9 ± 0.9 x10-3 65.5 ± 1.0 x10-3 1.15 ± 0.09 % CAST allele 66.2 ± 1.9 45.3 ± 1.0 30.7 ± 1.2 CAST RE 7.23 ± 0.60 x10-3 29.7 ± 0.8 x10-3 35.4 ± 3.1 x10-2 Fold increase in CAST RE 4.1 11.9 BL6 RE 3.70 ± 0.60 x10-3 35.8 ± 0.8 x10-3 79.9 ± 3.1 x10-2 Fold increase in BL6 RE 9.7 22.3 CB-12.5 CB-15.5 CB-P0 RE 9.79 ± 2.05 x10-3 7.91 ± 1.03 x10-2 1.16 ± 0.09 % CAST allele 76.6 ± 2.8 67.6 ± 1.0 31.7 ± 0.4 CAST RE 7.50 ± 1.59 x10-3 53.5 ± 7.0 x10-3 36.9 ± 3.2 x10-2 Fold increase in CAST RE 7.1 6.9 BL6 RE 2.29 ± 1.59 x10-3 25.6 ± 7.0 x10-3 79.5 ± 3.2 x10-2 Fold increase in BL6 RE 11.2 31.0 BJ-12.5 BJ-15.5 BJ-P0 RE 51.7 ± 3.4 x10-3 67.0 ± 2.1 x10-2 22.2 ± 3.0 x10-1 % JF1 allele 91.4 ± 1.2 85.8 ± 1.1 60.0 ± 0.8 JF1 RE 47.3 ± 3.2 x10-3 57.5 ± 1.9 x10-2 13.3 ± 1.8 x10-1 Fold increase in JF1 RE 12.2 2.3 BL6 RE 4.43 ± 3.15 x10-3 9.54 ± 1.93 x10-2 8.89 ± 1.80x10-1 Fold increase in BL6 RE 21.5 9.3 JB-12.5 JB-15.5 JB-P0 RE 16.0 ± 0.2 x10-2 33.1 ± 1.4 x10-2 14.1 ± 0.6 x10-1 % JF1 allele 94.1 ± 1.6 94.1 ± 1.1 61.7 ± 1.5 JF1 RE 15.1 ± 0.3 x10-2 31.1 ± 1.38x10-2 86.8 ± 4.3 x10-2 Fold increase in JF1 RE 2.1 2.8 BL6 RE 9.54 ± 3.04x10-3 1.95 ± 1.38x10-2 54.0 ± 4.3 x10-2 Fold increase in BL6 RE 2.0 27.7 RE: relative expression (Pon1 expression normalized by beta-2-microglobulin ) All measurements were performed in triplicates 93

III.C.3 Screening of candidate SNPs responsible for allele-specific gene expression and promoter methylation analysis

Previous studies have identified several SNPs upstream from the human PON1 gene associated with paraoxonase protein and transcript levels (13, 14, 23). Based on this observation, approximately 1 kb of the region upstream from the murine Pon1 start codon was sequenced, as well as exon 1, in order to identify SNPs in the regulatory region. Two polymorphisms were identified at -227 (G/T) and -413 (C/T), where the nucleotide prior to the ATG is -1. At the -227 polymorphism, in silico analysis using TESS

(www.cbil.upenn.edu/tess) identified a TFIID binding site present in BL6, and absent in

CAST and JF1. Further sequencing of the region 2 kb upstream from the start codon identified an average of one SNP every 57 bp, indicating relaxed selective constraint and consequently, a lower probability of the presence of regulatory elements in the region.

Consequently, the 1 kb region upstream from the start codon was identified as the putative promoter region for murine Pon1.

Additionally, the possibility of differential methylation of the Pon1 putative promoter was investigated. DNA was extracted from livers at four developmental time points (12.5 dpc, 16.5 dpc, 17.5 dpc, and P10) from JF1 x BL6 hybrid embryos. The samples were treated with bisulfite and PCR amplified to analyze the methylation of cytosines in the promoter. Since the expression analyses had revealed an increase in BL6 allele frequency throughout development in this strain, it was hypothesized that a loss of methylation on this allele would be observed. However, no discernable difference was seen in the methylation pattern between the developmental time points (Figure III.5).

94

Figure III.5

12.5 dpc 16.5 dpc 17.5 dpc P10

BL6 ♂

JF1 ♀

Figure III.5 Methylation analysis of Pon1 putative promoter in JF1 x BL6 hybrid embryos. DNA was extracted from livers at four developmental time points (12.5 dpc, 16.5 dpc, 17.5 dpc, and P10) from JF1 x BL6 hybrid embryos. The samples were treated with bisulfite and PCR amplified to analyze the methylation of cytosines in the promoter. Hollow and grey circles indicate unmethylated and methylated CpG dinucleotides, respectively. Each row of circles represents CpGs in an individual PCR product clone. The absence of methylation on the second CpG dinucleotide in JF1 samples (indicated by an arrow in the JF1 panel at 12.5 dpc) is due to a polymorphism at the site which abolishes the possibility of methylation. 95

III.C.4 Allelic expression analysis for the human PON1 gene

PON1 expression was analyzed in placenta, muscle, stomach, kidney, intestine, liver, pancreas, and lung by RT-PCR. High expression was observed in liver and weaker expression in pancreas (Figure III.6a). Consequently, only these two tissues were used in subsequent analyses.

Thirteen fetal liver samples were genotyped at an A/G polymorphic position in exon

6 of PON1. Seven of these samples, five of which were second trimester foetuses and two of which were of unknown gestational stages, were found to be heterozygous and were used for subsequent analyses. Sequencing of cDNA extracted from these seven tissues revealed monoallelic expression of PON1 in four samples, two displayed preferential expression, and the remaining was biallelically expressed (Figure III.6b). It must be noted that the latter was from an unknown gestational stage.

Ten pancreatic samples were genotyped for the same SNP, and four were found to be heterozygous. cDNA analyses revealed that three of these samples showed preferential allelic expression of PON1 and one sample showed monoallelic expression of the gene

(Figure III.6b). Since parental samples corresponding to the fetal samples analyzed were unavailable, the parent-of-origin in the expression of PON1 could not be determined.

96

Figure III.6

A Pl Mus St K In L Panc Lu M B + - + - + - + - + - + - + - + -

B

Genotyping (Fetal DNA)

Expression (Reverse sequence)

Figure III.6 Human PON1 expression. A) PON1 tissue expression. RT-PCR was performed on placenta (Pl), muscle (Mus), stomach, (St), kidney (K), intestine (In), liver (L), pancreas (P), and lung (Lu). B, blank; M, marker. Upper panel shows PON1 expression, lower panel shows GAPDH control expression. B) Monoallelic expression of human PON1 in liver and pancreas. Biallelic sequencing results of a human PON1 polymorphism from two fetal samples are shown in the top panels. cDNA sequencing of PON1 in liver (left panel) and pancreas (right panel), showing monoallelic expression of the polymorphism are shown below. 97

III.D DISCUSSION

This chapter demonstrates that Pon1 exhibits preferential allelic expression in the liver and that the allelic expression level can change in a dynamic manner throughout embryonic development and is also dependent on genetic background. The allelic expression pattern of Pon1 was examined by Ono and colleagues (2003), but only in neonatal liver and lung tissues (18). The allelic difference detected in gene expression changed throughout development in a strain-dependent manner and was found to be biased against the allele carried by CAST and JF1, although the bias was stronger in crosses involving CAST and when the BL6 allele was maternally inherited. These observations illustrate the first case of a developmentally regulated dynamic allelic expression pattern. A recent study performed by Wilkins and colleagues demonstrated that differences in allelic expression can occur between different regions of a single tissue sample from the same individual (24). Consequently, due to the fact that whole liver samples were used in the study, its findings may reflect differences in allelic expression occurring in a subset of cells within the liver.

An analysis of the human PON1 protein in infants revealed an increased expression level of 2- to 7-fold from birth until 6 to 12 months, at which point it reached a plateau

(25). The equivalent stage is reached at three weeks of age in mice and rats. It was found that Pon1 transcript levels increased substantially during embryonic growth, providing evidence that Pon1 protein activity escalates during hepatic development and plateaus at postnatal stages.

Monoallelic and preferential expression of human PON1 was observed in various liver and pancreatic fetal samples, yet the parent-of-origin could not be determined. The differences in human PON1 expression may be attributed to polymorphic imprinting, which 98 has been observed in several human genes (26, 27). However, it is important to note that preferential monoallelic expression can occur in a non-parent-of-origin pattern (28), as was seen in Pon1 at 12.5 dpc in BxJ and JxB hybrids (Figure III.2). Consequently, it is plausible to hypothesize that PON1 is expressed monoallelically at early gestational stages, preferentially later in development, and biallelically expressed neonatally. Such a dynamic pattern of expression would account for the preferential and biallelic expression patterns seen in several of the human samples. However, the differences in PON1 expression may also be due to polymorphisms in cis-acting regulatory regions between the biallelically and monoallelically expressed samples.

Two common coding polymorphisms have been identified in human PON1, L55M and Q192R, where the former has been associated with a greater production of mRNA (L allele) (15) and greater serum paraoxonase levels (12). Brophy and colleagues found that this polymorphism is in linkage disequilibrium with a polymorphism in the promoter region

(-108 C/T) (13). To determine if sequence variations in regulatory-regions may likewise account for the preferential pattern of expression observed in mice, the 1 kb region upstream from the coding region was sequenced and two SNPs were identified. In silico analysis of the SNPs identified a putative binding site for TFIID, a TATA-box binding protein required for RNA polymerase II activity, at SNP -227, implicating a possible role for this general transcription factor in the expression pattern of Pon1. Effects of this sequence variant on transcription factor binding may be confirmed by mobility shift assays.

Sequence polymorphisms affecting transcription factor binding have been observed in previous studies. Most notably, an intronic SNP in the lymphotoxin-α gene was found to be correlated to the protein’s production due to the haplotype-specific binding of a bHLH protein which gave rise to allele-specific regulation (29). However, such an analysis at bp 99

-277 in Pon1 would not exclude the possibility that cis-acting regulatory elements may be located in other regions of the gene, including introns and 3' regulatory regions.

An analysis of IL10 production and promoter cis-acting variations within the locus identified specific haplotypes which were associated with higher allelic transcription (30).

Promoter specific cis-acting variations have also been observed in the glutathione-S- transferase gene (31). Such variations may account for the difference in Pon1 gene expression levels seen between JF1 and CAST mice strains. Further analyses using reporter-based promoter studies may corroborate the impact of cis-acting variations on

Pon1 allele-specific expression.

The results of this study emphasize the importance of determining expression levels, not only in reciprocal crosses as shown here, but also at different developmental stages when analysing preferential patterns of expression (2, 18). This is highlighted by the fact that each gene in the paraoxonase family shows a different pattern of expression, although located within a cluster of imprinted genes. Such a finding could only be proven by reciprocal analyses of hybrid samples, which allows one to distinguish between monoallelic or preferential expression and true imprinting which must be parent-of-origin specific. Additionally, these results were duplicated on an independent set of liver samples further stressing the absence of random expression.

Our findings further highlight the need to determine allelic levels at disease loci, since differences in these levels are sources of phenotypic variation in human genetic diseases. If the mechanism of Pon1 regulation is conserved, this may imply that polymorphisms in human PON1 may not be a clear indicator of increased risk for atherosclerosis, since the expression of both alleles is neither equivalent nor static. 100

However, further experiments need to be performed to determine levels of human PON1 allelic variance. 101

III.E REFERENCES

1. Lo, H.S., Wang, Z., Hu, Y., Yang, H.H., Gere, S., Buetow, K.H. and Lee, M.P. (2003) Allelic variation in gene expression is common in the human genome. Genome Res, 13, 1855-62. 2. Cowles, C.R., Hirschhorn, J.N., Altshuler, D. and Lander, E.S. (2002) Detection of regulatory variation in mouse genes. Nat Genet, 32, 432-7. 3. Pant, P.V., Tao, H., Beilharz, E.J., Ballinger, D.G., Cox, D.R. and Frazer, K.A. (2006) Analysis of allelic differential expression in human white blood cells. Genome Res, 16, 331-9. 4. Ohtsuka, H., Mafune, Y., Tsunashima, K., Takahashi, H. and Kominami, R. (1994) Difference in allelic expression of genes probably associated with tumor progression in murine fibrosarcomas and cell lines. Jpn J Cancer Res, 85, 1015-22. 5. Buckland, P.R. (2004) Allele-specific gene expression differences in humans. Hum Mol Genet, 13 Spec No 2, R255-60. 6. Mackness, M.I., Arrol, S. and Durrington, P.N. (1991) Paraoxonase prevents accumulation of lipoperoxides in low-density lipoprotein. FEBS Lett, 286, 152-4. 7. Mackness, M.I., Arrol, S., Abbott, C. and Durrington, P.N. (1993) Protection of low-density lipoprotein against oxidative modification by high-density lipoprotein associated paraoxonase. Atherosclerosis, 104, 129-35. 8. Watson, A.D., Berliner, J.A., Hama, S.Y., La Du, B.N., Faull, K.F., Fogelman, A.M. and Navab, M. (1995) Protective effect of high density lipoprotein associated paraoxonase. Inhibition of the biological activity of minimally oxidized low density lipoprotein. J Clin Invest, 96, 2882-91. 9. Shih, D.M., Gu, L., Xia, Y.R., Navab, M., Li, W.F., Hama, S., Castellani, L.W., Furlong, C.E., Costa, L.G., Fogelman, A.M. et al. (1998) Mice lacking serum paraoxonase are susceptible to organophosphate toxicity and atherosclerosis. Nature, 394, 284-7. 10. Li, W.F., Furlong, C.E. and Costa, L.G. (1995) Paraoxonase protects against chlorpyrifos toxicity in mice. Toxicol Lett, 76, 219-26. 102

11. Costa, L.G., McDonald, B.E., Murphy, S.D., Omenn, G.S., Richter, R.J., Motulsky, A.G. and Furlong, C.E. (1990) Serum paraoxonase and its influence on paraoxon and chlorpyrifos-oxon toxicity in rats. Toxicol Appl Pharmacol, 103, 66-76. 12. Garin, M.C., James, R.W., Dussoix, P., Blanche, H., Passa, P., Froguel, P. and Ruiz, J. (1997) Paraoxonase polymorphism Met-Leu54 is associated with modified serum concentrations of the enzyme. A possible link between the paraoxonase gene and increased risk of cardiovascular disease in diabetes. J Clin Invest, 99, 62-6. 13. Brophy, V.H., Jampsa, R.L., Clendenning, J.B., McKinstry, L.A., Jarvik, G.P. and Furlong, C.E. (2001) Effects of 5' regulatory-region polymorphisms on paraoxonase-gene (PON1) expression. Am J Hum Genet, 68, 1428-36. 14. Suehiro, T., Nakamura, T., Inoue, M., Shiinoki, T., Ikeda, Y., Kumon, Y., Shindo, M., Tanaka, H. and Hashimoto, K. (2000) A polymorphism upstream from the human paraoxonase (PON1) gene and its association with PON1 expression. Atherosclerosis, 150, 295-8. 15. Leviev, I., Negro, F. and James, R.W. (1997) Two alleles of the human paraoxonase gene produce different amounts of mRNA. An explanation for differences in serum concentrations of paraoxonase associated with the (Leu-Met54) polymorphism. Arterioscler Thromb Vasc Biol, 17, 2935-9. 16. Serrato, M. and Marian, A.J. (1995) A variant of human paraoxonase/arylesterase (HUMPONA) gene is a risk factor for coronary artery disease. J Clin Invest, 96, 3005-8. 17. Odawara, M., Tachi, Y. and Yamashita, K. (1997) Paraoxonase polymorphism (Gln192-Arg) is associated with coronary heart disease in Japanese noninsulin- dependent diabetes mellitus. J Clin Endocrinol Metab, 82, 2257-60. 18. Ono, R., Shiura, H., Aburatani, H., Kohda, T., Kaneko-Ishino, T. and Ishino, F. (2003) Identification of a large novel imprinted gene cluster on mouse proximal chromosome 6. Genome Res, 13, 1696-705. 19. Nakabayashi, K., Makino, S., Minagawa, S., Smith, A.C., Bamforth, J.S., Stanier, P., Preece, M., Parker-Katiraee, L., Paton, T., Oshimura, M. et al. (2004) Genomic imprinting of PPP1R9A encoding neurabin I in skeletal muscle and extra- embryonic tissues. J Med Genet, 41, 601-8. 103

20. Sun, A., Ge, J., Siffert, W. and Frey, U.H. (2005) Quantification of allele-specific G-protein beta3 subunit mRNA transcripts in different human cells and tissues by Pyrosequencing. Eur J Hum Genet, 13, 361-9. 21. Wittkopp, P.J., Haerum, B.K. and Clark, A.G. (2004) Evolutionary changes in cis and trans gene regulation. Nature, 430, 85-8. 22. Norton, N., Williams, N.M., Williams, H.J., Spurlock, G., Kirov, G., Morris, D.W., Hoogendoorn, B., Owen, M.J. and O'Donovan, M.C. (2002) Universal, robust, highly quantitative SNP allele frequency measurement in DNA pools. Hum Genet, 110, 471-8. 23. Brophy, V.H., Hastings, M.D., Clendenning, J.B., Richter, R.J., Jarvik, G.P. and Furlong, C.E. (2001) Polymorphisms in the human paraoxonase (PON1) promoter. Pharmacogenetics, 11, 77-84. 24. Wilkins, J.M., Southam, L., Price, A.J., Mustafa, Z., Carr, A. and Loughlin, J. (2007) Extreme context specificity in differential allelic expression. Hum Mol Genet, 16, 537-46. 25. Cole, T.B., Jampsa, R.L., Walter, B.J., Arndt, T.L., Richter, R.J., Shih, D.M., Tward, A., Lusis, A.J., Jack, R.M., Costa, L.G. et al. (2003) Expression of human paraoxonase (PON1) during development. Pharmacogenetics, 13, 357-64. 26. Bunzel, R., Blumcke, I., Cichon, S., Normann, S., Schramm, J., Propping, P. and Nothen, M.M. (1998) Polymorphic imprinting of the serotonin-2A (5-HT2A) receptor gene in human adult brain. Mol Brain Res, 59, 90-2. 27. Jinno, Y., Yun, K., Nishiwaki, K., Kubota, T., Ogawa, O., Reeve, A.E. and Niikawa, N. (1994) Mosaic and polymorphic imprinting of the WT1 gene in humans. Nat Genet, 6, 305-9. 28. Pastinen, T., Sladek, R., Gurd, S., Sammak, A., Ge, B., Lepage, P., Lavergne, K., Villeneuve, A., Gaudin, T., Brandstrom, H. et al. (2004) A survey of genetic and epigenetic variation affecting human gene expression. Physiol Genomics, 16, 184- 93. 29. Knight, J.C., Keating, B.J. and Kwiatkowski, D.P. (2004) Allele-specific repression of lymphotoxin-alpha by activated B cell factor-1. Nat Genet, 36, 394-9. 104

30. Kurreeman, F.A., Schonkeren, J.J., Heijmans, B.T., Toes, R.E. and Huizinga, T.W. (2004) Transcription of the IL10 gene reveals allele-specific regulation at the mRNA level. Hum Mol Genet, 13, 1755-62. 31. Guy, C.A., Hoogendoorn, B., Smith, S.K., Coleman, S., O'Donovan, M.C. and Buckland, P.R. (2004) Promoter polymorphisms in glutathione-S-transferase genes affect transcription. Pharmacogenetics, 14, 45-51.

105

CHAPTER IV: IMPRINTING ANALYSIS OF MURINE CARBOXYPEPTIDASE-A4

Data from this chapter have been included in the following manuscript:

Parker-Katiraee L., Yamada T., Nakabayashi K. and Scherer S.W. The murine carboxypeptidase-A4 gene is preferentially expressed from the maternal allele in a tissue-specific pattern. To be submitted for peer-review publication.

I performed all the pyrosequencing, methylation, and imprinting analyses. Glial and neuronal cell lines were generated by Dr Takahiro Yamada. Tissue dissections were performed in collaboration with Dr Kazuhiko Nakabayashi and Dr Takahiro Yamada. Tissues from Dnmt3a knockout mice were a generous gift from Dr Hiroyuki Sasaki and Dr Masahiro Kaneda. 106

IV.A INTRODUCTION

Imprinted patterns of expression are generally maintained between humans and mice. As such, the use of mouse as a model has provided opportunity to examine the mechanisms regulating imprinted expression. Determining the allelic expression of genes in both species is of importance for the functional analysis of imprinted transcripts. At the same time, divergence in imprinted expression between the two species, which has been observed in IMPACT (1), TSSC4 (2), and COMMD1(3), among others, has provided the opportunity to determine whether differences in DNA sequence or its epigenetic modifications contribute to specificity (1). The expressions of several imprinted transcripts in the 7q32.3 locus have been previously analyzed in both human and mouse (Figure V.1).

Interestingly, Copg2 is reported to be imprinted in murine tissues, but not in humans (4, 5), suggesting that other genes in the interval may have divergence of imprinted expression.

Carboxypeptidase-A4 (human gene CPA4, mouse gene Cpa4, OMIM: 607635) is located within the human 7q32.3 imprinted locus. Its protein product is a member of the metallocarboxypeptidase family and binds latexin (6). The gene is located in a putative prostate cancer-aggressiveness locus (7-9) and has been induced in prostate cancer cell lines by histone deacetylase inhibitors (10). CPA4 is also located in a candidate region for

Russell-Silver Syndrome (RSS), as outlined in chapter II.A. Due to CPA4’s putative association with RSS as well as prostate cancer, the gene’s imprinted expression has been analyzed extensively in human tissues, as described in chapter I.C.3.ii.c (11, 12). However, its expression in murine samples has not been examined. Consequently, Cpa4 was selected as a candidate imprinted gene based on its location within the imprinted 7q32.3 cluster and its known imprinted expression in humans. 107

This chapter demonstrates that the pattern of allelic expression of Cpa4 is maintained between humans and mice. It determines that murine Cpa4 is maternally expressed in murine embryonic tissues, yet escapes imprinting in the fetal brain.

Additionally, it examines epigenetic modifications in the putative promoter region of the gene. 108

IV.B MATERIALS AND METHODS

IV.B.1 Expression analysis

Tissues were extracted from the F1 hybrid offspring of C57BL/6J (BL6) and

JF1/Ms (JF1) mice. RNA was isolated using TRIZOL reagent (Invitrogen), following the manufacturer’s protocol. Two micrograms of RNA were subsequently used for cDNA synthesis using random primers (SuperScript II, Invitrogen). Neuronal and glial cell lines were established, as previously described (13), and cDNA was obtained.

For the analysis of Cpa4 in murine tissues, cDNA was amplified using the following primers: forward 5'-GCTCTCTCTCGGGCACTAA-3' and reverse 5'-

TATACACCAAAGGTCAGTAGAGCA-3'. PCR was carried out using Taq2000

(Stratagene) following the manufacturer’s instructions with the annealing step being carried out at 58°C and cycling 37 times. Amplicons were digested with SplI (Fermentas) at 37°C for 1.5 h. Digested products were visualized on a 3% agarose gel to ensure adequate separation of the bands.

Cpa4 cDNA was amplified for pyrosequencing using Taq2000 (Stratagene) following the manufacturer’s protocol and using primers (forward) 5'-Biotin-

CTAGTGGGAGCAGCGTTGAC-3' and (reverse) 5'- CCCAGTGTCTCTCAGCTCAAA-

3'. The biotinylated products were purified using streptavidin sepharose (GE Healthcare), following the PSQ 96 sample preparation guide. The sequencing primer used in the pyrosequencing reaction was 5'- TGATGCCATTGTCGTA-3' at a concentration of 0.4 μM per reaction. Pyrosequencing was performed using the Pyro Gold Enzyme Mixture

(Biotage) and analyzed using PSQ 96MA 2.1 ID system.

109

IV.B.2 Methylation analysis

To analyze the methylation of Cpa4, DNA was extracted from the yolk sac and brain of 15.5 dpc hybrid offspring of CAST/Ei (Cast) and C57BL/6J (BL6) using TRIZOL reagent (Invitrogen). Bisulfite treatment was performed using the EZ methylation protocol

(Zymo Research). The bisulfite-treated putative promoter region of Cpa4 was amplified using the following primers: (forward) 5'-TATTATTGAGATGTTAAGTGATGA-3' and

(reverse) 5'-TAAAAACCACCAACTTTAATTTCAC-3'. One microlitre of the reaction was used for a semi-nested reaction where the same forward primer was used together with the following reverse primer: 5'-TTCACAAAAAAAATCAAATTTCAAA-3'. The amplified product was purified using microCLEAN (Microzone Ltd) and was subcloned using the TOPO TA-cloning system (Invitrogen). A standard t-test was performed in order to determine if there was a significant difference in methylation between the maternal and paternal alleles. Each clone was considered to be a single individual and the number of methylated CpGs was counted in each clone. 110

IV.C RESULTS

IV.C.1 Cpa4 is imprinted in murine embryonic tissues, yet displays biallelic expression in the fetal brain

To identify imprinted expression of the murine Cpa4, reciprocal crosses of

C57BL/6J (BL6) and JF1/Ms (JF1) were carried out and cDNA was extracted from embryonic and extra-embryonic tissues at various developmental time points (12.5 days post coitum (dpc) and 15.5 dpc).

To distinguish the two alleles of murine Cpa4, an A/G polymorphism was identified in the hybrid mice, corresponding to basepair 1125 of NM_027926. PCR was performed on cDNA and the expression of the SNP was analyzed, as described in chapter II. The results indicated preferential expression of the maternal allele (Figure IV.1). However, the paternal allele was still expressed, particularly in embryonic tissues.

To determine the degree of preferential expression present in Cpa4 expression, pyrosequencing was performed. This method has been shown to be an effective method of quantifying allelic ratios (14, 15) and has been used to identify preferential allelic expression in parent-of-origin patterns (11, 16-18). The reactions were performed in triplicate replicates, where genomic DNA was used as a positive control. The difference in allelic ratios in genomic DNA from heterozygous mice was never found to exceed 3%, demonstrating that the primers were not allele specific and that the method is quantitative.

The murine Cpa4 pyrosequencing analysis demonstrates that, with the exception of embryonic brain tissues, the transcript is maternally expressed in all samples (Table IV.1).

Embryonic brain samples exhibited allelic preference, where the BL6 allele was more abundantly expressed than the JF1 allele. This trend was observed in brain samples from two developmental time points. Extra-embryonic tissues (yolk sac and placental labyrinth) 111 were found to have greater levels of imprinted expression than samples from the embryo proper. Imprinting was observed in the yolk sac, not only at 15.5 dpc, but also at 12.5 dpc.

Imprinting was also observed in the extra-embryonic chorion at 9.5 dpc (Figure IV.2).

Imprinting has been shown to occur in cell-type specific patterns in the brain (19,

20). Consequently, Cpa4 expression was examined in neuronal and glial cell lines derived from the brains of 15.5 dpc hybrid embryos. The results confirm the absence of imprinting, which was observed in the whole brain. Again, a bias was observed towards the expression of the BL6 allele. Such a bias towards the expression of a specific strain’s allele has been demonstrated in previous studies and in chapter II (18). 112

Figure IV.1

gDNA cDNA

Placental Intestine Labyrinth Limb

JxB

JxB BxJ

Figure IV.1 Imprinted maternal expression of Cpa4 in murine tissues. Sequencing electropherograms from the PCR amplification of cDNA from tissues of F1 hybrids of inbred strains of mice are shown. In each of the panels, the sequencing electropherograms on the left show polymorphisms in genomic DNA (gDNA), while the panels on the right show the expression of the polymorphism in cDNA. JxB and BxJ indicate crosses between C57BL/6 (B) or JF1/Ms (J) where the first letter denotes the mother in the cross. The C allele (blue) is carried by JF1, while the T allele (red) is carried by BL6. All cDNA samples are from 15.5 dpc. The results indicate preferential maternal expression in the tissues shown. 113

Table IV.1 Frequency of JF1 allele in tissues of F1 hybrid offspring 12.5 dpc Yolk Sac Brain BJ 29.0 ± 1.0 BJ 35.8 ± 3.0 JB 72.9 ± 2.0 JB 38.7 ± 0.9 15.5 dpc Yolk Sac Brain Lung BJ 23.8 ± 5.2 BJ 41.4 ± 2.0 BJ 44.0 ± 4.5 JB 72.0 ± 3.3 JB 39.6 ± 2.7 JB 61.8 ± 2.9 Limb Intestine Placental Labyrinth BJ 35.2 ± 0.9 BJ 31.3 ± 1.4 BJ 22.2 ± 2.5 JB 65.1 ± 1.7 JB 65.1 ± 1.7 JB 78.3 ± 1.5 Cultures Neuronal Glial BJ 43.0 ± 4.5 BJ 47.8 ± 2.8 JB 44.2 ± 5.0 JB 29.9 ± 3.9 BJ: F1 hybrid offspring of BL6xJF1 mice; JB: F1 hybrid offspring of JF1xBL6 mice 114

Figure IV.2

BJ-9.5 JB-9.5 JF1 EmbYS Chr Emb YS Chr BL6 JF1

Figure IV.2 Imprinting analysis of 9.5 dpc tissues. Cpa4 was PCR amplified in cDNA from 9.5 dpc embryonic and extra-embryonic tissues from BL6xJF1 (BJ) and JF1xBL6 (JB) embryos (Emb: embryo; YS: yolk sac; Chr: chorion). Amplicons were treated with SplI, which digested the JF1 allele (lower band), but not the BL6 allele due to a polymorphism at the restriction site. Genomic DNA from JF1 was used as a positive control to ensure complete digestion.

115

IV.C.2 Regulation of Cpa4 imprinted expression

To determine if the putative promoter region of Cpa4 is subject to differential methylation, CpG dinucleotides were analyzed by bisulfite sequencing. The region selected to analyze demonstrated high sequence conservation, suggesting a regulatory function.

Bisulfite treated DNA from 15.5 dpc yolk sac was PCR amplified and subcloned. Maternal and paternal alleles were distinguished by polymorphisms in the amplicon. Due to the absence of SNPs between JF1 and BL6 in this region, the tissue used was from a cross between the CAST/Ei (CAST) and BL6 strains. The region was relatively depleted of CpG nucleotides, harboring only five CpG-dinucleotides in 450 basepairs of sequence. The analysis of this region revealed that the paternal allele had modestly greater levels of methylation than the maternal allele (Figure IV.3). The difference in methylation between the two parental alleles was found to be significantly different (p=0.0014). The same experiment was performed using bisulfite treated DNA from 15.5 dpc CASTxBL6 brain.

No difference was observed in methylation between the two parental alleles (p=0.50)

(Figure IV.3d)

A putative imprinting control region (ICR) has been identified at the 7q32.3 locus

(13). The differentially methylated region associated with the putative ICR is maternally methylated. Consequently, it was hypothesized that loss of methylation at this region may affect expression of Cpa4. The expression of the transcript was analyzed in the embryonic and extra-embryonic tissues from the offspring of Dnmt3a conditional knockout females.

This analysis revealed that Cpa4 is still expressed in these samples (data not shown). 116

Figure IV.3

AB Yolk Sac %M 1 2 3 4 5

C ♀ 80 47 80 67 47 Exon 1 Exons 2-3 B ♂ 100 75 94 100 81

Brain %M 1 2 3 4 5 ♀-Allele ♂-Allele C ♀ 90 20 90 70 70 C B ♂ 100 45 89 78 78

D ♀-Allele ♂-Allele

Figure IV.3 Methylation analysis of murine Cpa4 putative promoter region. The analysis was performed using bisulfite-treated DNA extracted from CASTxBL6 yolk sac (C) and brain (D) (15.5 dpc). A) Exon structure of Cpa4 is shown and the region amplified by PCR for bisulfite sequencing is indicated by a line. B) Percentage of methylated cytosines at each CpG-dinucleotide in Cpa4 promoter is shown. C) Bisulfite analysis of Cpa4 putative promoter region in yolk sac. Hollow circles and black circles indicate unmethylated and methylated CpG dinucleotides, respectively. Each row of circles represents CpGs in an individual PCR product clone. Panel on the left and right are clones derived from the maternal and paternal alleles, respectively, as determined by polymorphisms specific to each allele. The difference in methylation between the two parental alleles was found to be significantly different (p=0.0014). D) Bisulfite analysis of Cpa4 putative promoter region in brain. Panel on the left and right are clones derived from the maternal and paternal alleles, respectively, as determined by polymorphisms specific to each allele. The difference in methylation between the two parental alleles was not found to be significantly different (p=0.50). 117

IV.D DISCUSSION

This chapter describes the imprinted pattern of expression for the murine Cpa4 transcript, a member of the zinc-dependent family of metallocarboxypeptidases. Previous studies demonstrated that its human orthologue is maternally expressed in all tissues with the exception of the fetal brain (11, 12). The findings in this chapter indicate that Cpa4 has the same pattern of expression and demonstrate that imprinting is absent in both glial and neuronal cells, discarding the possibility of cell-specific imprinting in the brain (19, 20).

Additionally, these results show that the Cpa4 gene is subject to stronger imprinted expression in extra-embryonic tissues. Thus, Cpa4 is added to growing list of imprinted transcripts expressed in the placenta, strengthening the importance of imprinted gene regulation in extra-embryonic tissues (21).

Imprinted gene clusters are regulated by ICRs and the expression of transcripts within these clusters is often dependent upon proper gametic establishment and post- zygotic maintenance of differential methylation at the ICR (for review, see (22)).

Additionally, these ICRs carry allele-specific histone modifications (23, 24). Despite the presence of these epigenetic modifications, imprinted clusters often harbor genes which escape mono-allelic expression, as well as transcripts subject to tissue-specific imprinting

(2, 4, 22). Tissue-specific imprinting has been closely linked to histone modifications. A study by Lewis et al. examined the expression of transcripts with placenta-specific imprinting in DNA-methyltransferase 1 (Dnmt1) knockout embryos demonstrated that loss of methylation did not affect imprinting in these genes (25). In contrast, transcripts subject to imprinting in both embryonic and extra-embryonic tissues suffered a loss of parent-of- origin specific expression. Additionally, the authors identified enrichment for repressive histone modifications on the silenced allele, regardless of the presence of a differentially 118 methylated region (DMR). Such allele specific differences in histone modifications have been observed at several loci associated with tissue-specific imprinting, including Igf2r which also shows loss of imprinting in brain tissues (20, 26-28). However, cytosine methylation has been shown to play a key role in the parent-of-origin specific expression of other transcripts subject to tissue-specific imprinting, such as GRB10 and NDN (27, 29).

Additionally, the presence of promoters, insulators, and isoforms in tissue-specific patterns have been associated with transcripts subject to tissue-specific imprinting (30-32).

At the imprinted cluster of genes located on human chromosome 7q32.3/murine chromosome 6qA.2, a maternally methylated DMR associated with MEST has been identified which carries histone modification patterns found at ICRs (13, 33). Loss of maternal methylation has been shown to cause loss of expression of the maternally transcribed Klf14 and increased expression of Mest, leading to the hypothesis that the

MEST DMR may act as an ICR (13, 34). Despite the presence of this putative ICR, numerous genes at the locus escape imprinting and several, including CPA4, show tissue specific imprinting (Figure V.1). To determine if this DMR regulates the imprinted expression of Cpa4, its expression was examined in the offspring of Dnmt3a conditional knockout mice, and it was found to be expressed in both embryonic and extra-embryonic tissues. However, due to Cpa4’s partial imprinted expression, a loss of methylation at the

DMR may give rise to a decreased or increased level of expression, not a loss of expression or loss of imprinting. Consequently, a quantifiable method of measuring such changes in expression, such as qPCR, may detect changes in the regulation of the gene in Dnmt3a knockout mice. However, due to the lack of sufficient materials to perform these experiments, I could not complete this analysis. A more direct analysis may be performed 119 by deleting the putative ICR in a murine model, which may determine the involvement of this region in regulating Cpa4, as well as other genes in the locus.

Bisulfite sequencing of the Cpa4 promoter revealed a slight, yet statistically significant increase in methylation on the paternal allele. This differential pattern of methylation was not observed in the brain, suggesting that methylation may be associated with Cpa4 imprinting. Whether such a slight degree of differential methylation is sufficient to drive allele-specific expression remains to be studied. However, due to the role played by differential histone modifications at imprinted loci as well as the induction of Cpa4 caused by histone deacetylase inhibitors (10), it is possible that the transcript may be regulated by this epigenetic modification. Future studies may ascertain the mechanism of tissue-specific imprinted expression in Cpa4 by performing chromatin immunoprecipitation in imprinted and non-imprinted tissues. To determine if tissue specific promoters contributed to Cpa4’s pattern of expression, an in silico search for alternate 5' ends was performed. Alternate exons were not identified (data not shown), yet 5' RACE experiments using brain cDNA would be able to verify this observation and determine if the absence of imprinting in this tissue is due to transcription from a different promoter.

Previous studies have failed to determine a role for Cpa4 in the aetiology of RSS

(12). However, Cpa4 remains a strong candidate for prostate cancer aggressiveness. To determine if changes in the regulation of the transcript contribute towards this disease, future studies may examine changes in the expression and imprint of Cpa4 in affected patients.

The identification of imprinted expression in Cpa4 represents another example suggesting that most imprinted genes maintain parent-of-origin specific expression in both humans and mice. Additionally, it facilitates the identification of cis-acting regulatory 120 elements conserved between the two species. It allows for the use of mice as a model to study the transcripts at the human 7q32.3 imprinted locus and the conserved mechanisms that regulate their parent-of-origin specific expression.

121

IV.E REFERENCES

1. Okamura, K., Hagiwara-Takeuchi, Y., Li, T., Vu, T.H., Hirai, M., Hattori, M., Sakaki, Y., Hoffman, A.R. and Ito, T. (2000) Comparative genome analysis of the mouse imprinted gene impact and its nonimprinted human homolog IMPACT: toward the structural basis for species-specific imprinting. Genome Res, 10, 1878- 89. 2. Lee, M.P., Brandenburg, S., Landes, G.M., Adams, M., Miller, G. and Feinberg, A.P. (1999) Two novel genes in the center of the 11p15 imprinted domain escape genomic imprinting. Hum Mol Genet, 8, 683-90. 3. Wang, Y., Joh, K., Masuko, S., Yatsuki, H., Soejima, H., Nabetani, A., Beechey, C.V., Okinami, S. and Mukai, T. (2004) The mouse Murr1 gene is imprinted in the adult brain, presumably due to transcriptional interference by the antisense-oriented U2af1-rs1 gene. Mol Cell Biol, 24, 270-9. 4. Yamasaki, K., Hayashida, S., Miura, K., Masuzaki, H., Ishimaru, T., Niikawa, N. and Kishino, T. (2000) The novel gene, gamma2-COP (COPG2), in the 7q32 imprinted domain escapes genomic imprinting. Genomics, 68, 330-5. 5. Lee, Y.J., Park, C.W., Hahn, Y., Park, J., Lee, J., Yun, J.H., Hyun, B. and Chung, J.H. (2000) Mit1/Lb9 and Copg2, new members of mouse imprinted genes closely linked to Peg1/Mest. FEBS Lett, 472, 230-4. 6. Pallares, I., Bonet, R., Garcia-Castellanos, R., Ventura, S., Aviles, F.X., Vendrell, J. and Gomis-Ruth, F.X. (2005) Structure of human carboxypeptidase A4 with its endogenous protein inhibitor, latexin. Proc Natl Acad Sci U S A, 102, 3978-83. 7. Neville, P.J., Conti, D.V., Paris, P.L., Levin, H., Catalona, W.J., Suarez, B.K., Witte, J.S. and Casey, G. (2002) Prostate cancer aggressiveness locus on chromosome 7q32-q33 identified by linkage and allelic imbalance studies. Neoplasia, 4, 424-31. 8. Paiss, T., Worner, S., Kurtz, F., Haeussler, J., Hautmann, R.E., Gschwend, J.E., Herkommer, K. and Vogel, W. (2003) Linkage of aggressive prostate cancer to chromosome 7q31-33 in German prostate cancer families. Eur J Hum Genet, 11, 17-22. 122

9. Witte, J.S., Goddard, K.A., Conti, D.V., Elston, R.C., Lin, J., Suarez, B.K., Broman, K.W., Burmester, J.K., Weber, J.L. and Catalona, W.J. (2000) Genomewide scan for prostate cancer-aggressiveness loci. Am J Hum Genet, 67, 92-9. 10. Huang, H., Reed, C.P., Zhang, J.S., Shridhar, V., Wang, L. and Smith, D.I. (1999) Carboxypeptidase A3 (CPA3): a novel gene highly induced by histone deacetylase inhibitors during differentiation of prostate epithelial cancer cells. Cancer Res, 59, 2981-8. 11. Bentley, L., Nakabayashi, K., Monk, D., Beechey, C., Peters, J., Birjandi, Z., Khayat, F.E., Patel, M., Preece, M.A., Stanier, P. et al. (2003) The imprinted region on human chromosome 7q32 extends to the carboxypeptidase A gene cluster: an imprinted candidate for Silver-Russell syndrome. J Med Genet, 40, 249-56. 12. Kayashima, T., Yamasaki, K., Yamada, T., Sakai, H., Miwa, N., Ohta, T., Yoshiura, K., Matsumoto, N., Nakane, Y., Kanetake, H. et al. (2003) The novel imprinted carboxypeptidase A4 gene (CPA4) in the 7q32 imprinting domain. Hum Genet, 112, 220-6. 13. Parker-Katiraee, L., Carson, A.R., Yamada, T., Arnaud, P., Feil, R., Abu-Amero, S.N., Moore, G.E., Kaneda, M., Perry, G.H., Stone, A.C. et al. (2007) Identification of the imprinted KLF14 transcription factor undergoing human-specific accelerated evolution. PLoS Genet, 3, e65. 14. Sun, A., Ge, J., Siffert, W. and Frey, U.H. (2005) Quantification of allele-specific G-protein beta3 subunit mRNA transcripts in different human cells and tissues by Pyrosequencing. Eur J Hum Genet, 13, 361-9. 15. Wittkopp, P.J., Haerum, B.K. and Clark, A.G. (2004) Evolutionary changes in cis and trans gene regulation. Nature, 430, 85-8. 16. Ruf, N., Bahring, S., Galetzka, D., Pliushch, G., Luft, F.C., Nurnberg, P., Haaf, T., Kelsey, G. and Zechner, U. (2007) Sequence-based bioinformatic prediction and QUASEP identify genomic imprinting of the KCNK9 potassium channel gene in mouse and human. Hum Mol Genet, 16, 2591-9. 17. Ruf, N., Dunzinger, U., Brinckmann, A., Haaf, T., Nurnberg, P. and Zechner, U. (2006) Expression profiling of uniparental mouse embryos is inefficient in identifying novel imprinted genes. Genomics, 87, 509-19. 123

18. Nakabayashi, K., Makino, S., Minagawa, S., Smith, A.C., Bamforth, J.S., Stanier, P., Preece, M., Parker-Katiraee, L., Paton, T., Oshimura, M. et al. (2004) Genomic imprinting of PPP1R9A encoding neurabin I in skeletal muscle and extra- embryonic tissues. J Med Genet, 41, 601-8. 19. Yamasaki, K., Joh, K., Ohta, T., Masuzaki, H., Ishimaru, T., Mukai, T., Niikawa, N., Ogawa, M., Wagstaff, J. and Kishino, T. (2003) Neurons but not glial cells show reciprocal imprinting of sense and antisense transcripts of Ube3a. Hum Mol Genet, 12, 837-47. 20. Yamasaki-Ishizaki, Y., Kayashima, T., Mapendano, C.K., Soejima, H., Ohta, T., Masuzaki, H., Kinoshita, A., Urano, T., Yoshiura, K., Matsumoto, N. et al. (2007) Role of DNA methylation and histone H3 lysine 27 methylation in tissue-specific imprinting of mouse Grb10. Mol Cell Biol, 27, 732-42. 21. Ferguson-Smith, A.C., Moore, T., Detmar, J., Lewis, A., Hemberger, M., Jammes, H., Kelsey, G., Roberts, C.T., Jones, H. and Constancia, M. (2006) Epigenetics and imprinting of the trophoblast -- a workshop report. Placenta, 27 Suppl A, S122-6. 22. Reik, W. and Walter, J. (2001) Genomic imprinting: parental influence on the genome. Nat Rev Genet, 2, 21-32. 23. Fournier, C., Goto, Y., Ballestar, E., Delaval, K., Hever, A.M., Esteller, M. and Feil, R. (2002) Allele-specific histone lysine methylation marks regulatory regions at imprinted mouse genes. Embo J, 21, 6560-70. 24. Umlauf, D., Goto, Y., Cao, R., Cerqueira, F., Wagschal, A., Zhang, Y. and Feil, R. (2004) Imprinting along the Kcnq1 domain on mouse chromosome 7 involves repressive histone methylation and recruitment of Polycomb group complexes. Nat Genet, 36, 1296-300. 25. Lewis, A., Mitsuya, K., Umlauf, D., Smith, P., Dean, W., Walter, J., Higgins, M., Feil, R. and Reik, W. (2004) Imprinting on distal chromosome 7 in the placenta involves repressive histone methylation independent of DNA methylation. Nat Genet, 36, 1291-5. 26. Sakamoto, A., Liu, J., Greene, A., Chen, M. and Weinstein, L.S. (2004) Tissue- specific imprinting of the G protein Gsalpha is associated with tissue-specific differences in histone methylation. Hum Mol Genet, 13, 819-28. 124

27. Lau, J.C., Hanel, M.L. and Wevrick, R. (2004) Tissue-specific and imprinted epigenetic modifications of the human NDN gene. Nucleic Acids Res, 32, 3376-82. 28. Yamasaki, Y., Kayashima, T., Soejima, H., Kinoshita, A., Yoshiura, K., Matsumoto, N., Ohta, T., Urano, T., Masuzaki, H., Ishimaru, T. et al. (2005) Neuron-specific relaxation of Igf2r imprinting is associated with neuron-specific histone modifications and lack of its antisense transcript Air. Hum Mol Genet, 14, 2511-20. 29. Arnaud, P., Hata, K., Kaneda, M., Li, E., Sasaki, H., Feil, R. and Kelsey, G. (2006) Stochastic imprinting in the progeny of Dnmt3L-/- females. Hum Mol Genet, 15, 589-98. 30. Hikichi, T., Kohda, T., Kaneko-Ishino, T. and Ishino, F. (2003) Imprinting regulation of the murine Meg1/Grb10 and human GRB10 genes; roles of brain- specific promoters and mouse-specific CTCF-binding sites. Nucleic Acids Res, 31, 1398-406. 31. Peters, J., Wroe, S.F., Wells, C.A., Miller, H.J., Bodle, D., Beechey, C.V., Williamson, C.M. and Kelsey, G. (1999) A cluster of oppositely imprinted transcripts at the Gnas locus in the distal imprinting region of mouse chromosome 2. Proc Natl Acad Sci U S A, 96, 3830-5. 32. Blagitko, N., Mergenthaler, S., Schulz, U., Wollmann, H.A., Craigen, W., Eggermann, T., Ropers, H.H. and Kalscheuer, V.M. (2000) Human GRB10 is imprinted and expressed from the paternal and maternal allele in a highly tissue- and isoform-specific fashion. Hum Mol Genet, 9, 1587-95. 33. Kerjean, A., Dupont, J.M., Vasseur, C., Le Tessier, D., Cuisset, L., Paldi, A., Jouannet, P. and Jeanpierre, M. (2000) Establishment of the paternal methylation imprint of the human H19 and MEST/PEG1 genes during spermatogenesis. Hum Mol Genet, 9, 2183-7. 34. Kaneda, M., Okano, M., Hata, K., Sado, T., Tsujimoto, N., Li, E. and Sasaki, H. (2004) Essential role for de novo DNA methyltransferase Dnmt3a in paternal and maternal imprinting. Nature, 429, 900-3.

125

CHAPTER V: IDENTIFICATION OF THE IMPRINTED KLF14 TRANSCRIPTION FACTOR ON HUMAN CHROMOSOME 7q32

Portions of the work in this chapter have been published in:

Parker-Katiraee L., Carson A.R., Yamada T., Arnaud P., Feil R., Abu-Amero S.N., Moore G., Kaneda M., Perry G., Stone A.C., Lee C., Meguro-Horike M., Sasaki H., Kobayashi K., Nakabayashi K. and Scherer S.W. (2007) Identification of the imprinted KLF14 transcription factor undergoing human-specific accelerated evolution. PLoS Genetics. 3:e65

I performed the imprinting analysis in humans and mice, the methylation analysis in humans and mice, the characterization of the gene structure and expression, the retrotransposition timing, mutation screens, the vector constructions and functional analyses. The human imprinting study was performed together with Dr. Abu-Amero and Dr. Moore from the Imperial College School of Medicine, UK. The ChIP experiment was performed in collaboration with Dr. Takahiro Yamada. The native ChIP experiments were performed by Dr. Philippe Arnaud and Dr. Robert Feil from the Institute of Molecular Genetics, France, due to their collection of native ChIP materials. The methylation analysis on two sections of the murine CpG island were performed by Dr. Kazuhiko Nakabayashi, as well as the methylation analysis of Mest. The materials from Dnmt3a knockout mice were a generous gift from Dr. Masahiro Kaneda and Dr. Hiroyuki Sasaki. The tissue samples from eutherian mammals were a gift from the Royal Ontario Museum. Tissue dissections were performed under the guidance of Dr. Takahiro Yamada and Dr. Kazuhiko Nakabayashi. Glial and neuronal cell lines were established by Dr. Takahiro Yamada. 126

V.A INTRODUCTION

Imprinted genes are generally found in clusters where they share common regulatory elements (for review, see (1)). These clusters are characterized by the presence of imprinting control regions (ICRs), which are differentially methylated regions that regulate the parent-of-origin specific expression of several genes. Two such clusters are found on human chromosome 7, located at 7q21.3 and 7q32.3 (Table I.2). Genes located within and flanking these clusters were selected as candidate imprinted genes and their patterns of transcriptional expression were analyzed according to the methodology outlined in chapter II.

This chapter describes the identification of a novel maternally-expressed imprinted gene located at 7q32.3, telomeric to TSGA13. This gene, named Krüppel-like factor 14

(human gene KLF14, mouse gene Klf14, OMIM: 609393), is intronless and encodes for a member of the Sp/KLF family of transcription factors. These proteins are characterized by three highly conserved C2H2-type zinc fingers at the carboxy-terminal end joined to each other by linker sequences, known as Krüppel-links (2). In contrast, the N-terminus is highly variable between KLF paralogues and has lower levels of conservation between orthologues (3). Members of the KLF family are known to act as transcriptional activators, repressors, or both (4). KLF14 itself is uncharacterized and its binding sites are unknown.

The cluster of imprinted genes at 7q32.3, described in chapter I.C.3.ii.c, is characterized by the presence of MEST. Consequently, this gene and its associated differentially methylated region were used as a control in many of the experiments. The work in this chapter shows that KLF14 has monoallelic maternal expression in a variety of different embryonic and extra-embryonic tissues in human and mouse. 127

Figure V.1

Figure V.1 Human and murine KLF14 structure. A) Human KLF14 and B) Murine Klf14 structure. Genes in 7q32.3 region are shown in the upper panel with maternally and paternally expressed genes depicted in grey and black, respectively. Striped patterns represent genes with tissue specific imprinting. Arrows indicate transcriptional direction. Lower panels show the gene structure of KLF14/Klf14, including results from the rapid amplification of cDNA ends (RACE), the open reading frame (ORF), CpG island, and primers used in various analyses (represented by thin black bars). The grey block representing AK030435 denotes the fact that evidence of splicing was not identified in our experiments. 128

V.B MATERIALS AND METHODS

V.B.1 RT-PCR using RNA from somatic cell hybrids, human tissues, and murine embryonic samples

cDNA was obtained and amplified from somatic cell hybrids cell lines and human tissues (5). Briefly, RNA was isolated using TRIZOL reagent (Invitrogen), following the manufacturer’s protocol. Two micrograms of RNA were subsequently used for cDNA synthesis using random primers (SuperScript II, Invitrogen). The primers used for amplification of KLF14 are forward (5'-CCACCCAACCTATCATCCAG-3') and reverse

(5'-GTACCTCCCCAGAGTCCACA-3'). Reciprocal hybrid crosses were performed between C57BL/6J and JF1/Ms, and tissues were obtained from the F1 generations. Glia and neurons were cultured as previously described (6) and cDNA was obtained (5). To identify genomic single nucleotide polymorphism (SNP) in Klf14, an 851 bp fragment was amplified using primers AK030435F (5'-TGGACACCCTCTCCAAAGTC-3') and

AK030435R (5'-AAGCGACATCAGTGCTCCTT-3'), and a SNP corresponding to bp 451 of AK030435 was found (Figure V.1). Amplified DNA and cDNA fragments were purified using microCLEAN (Microzone Ltd), and were subsequently sequenced on an ABI

3730XL using BigDye Terminator v3.1 cycle sequencing kit (Applied Biosystems), combined with Half BigDye sequencing buffer (Sigma Aldrich).

V.B.2 Methylation analysis

Genomic DNA was extracted from BL6xJF1 12.5 dpc whole embryos and fibroblasts. Bisulfite treatment was performed using the EZ methylation protocol (Zymo

Research). One microlitre of the eluted DNA was used for PCR using the following primers: F1 5'-TGGTTGTAATAAGGTTTATTATAAGT-3'and R1 5'-

AAACCAAAACTTTCCACCATAACTA-3', F2 5'-TGGAGGATTGGGGGTATTTATA- 129

3' and R2 5'-CAAACAAATAATTTCCCAAACTACTAA-3', F3 5'-

TTTGGGGTTATTTTTTATTTGAGTT-3' and R3 5'-

TCAAACAAAATCCTAAAAACTTTTT-3'. PCR products were subcloned using the pGEM-T easy system (Promega) and transformants were sequenced.

V.B.3 Rapid amplification of cDNA ends (RACE) to determine full length sequence of KLF14

Marathon ready cDNA from 11-day embryo and placenta (Clontech) were used for

RACE to determine the full length of KLF14 in mouse and human, respectively. The manufacturer’s protocol was followed using the following primers for 3'RACE: human

GSP1 (5'- GAAGGGATGAACTCCCGTACTCTCCA-3'), human GSP1-nested (5'-

AACCAGGGATGTGAAACTGG-3'), mouse GSP1 (5'-

GGGTGTTGTGATCTCATGGAGTTG-3') and mouse GSP1-nested (5'-

TGCTAAGTTTCTGCCAAGAGC-3'). The amplified cDNA fragments were purified using microCLEAN (Microzone Ltd) and directly sequenced.

V.B.4 Chromatin immunoprecipitation (ChIP) and analysis of histone modifications

Formaldehyde fixed ChIP assay was carried out using Chromatin

Immunoprecipitation Assay Kit (Upstate Biotechnology, Lake Placid, NY) as previously described (7). Sonicated fragments of 200bp-1Kb in length were used, as determined by gel electrophoresis. The antisera used were against H3K9acK14ac (Upstate Biotechnology,

06-599), H4ac (Upstate Biotechnology, 06-866), and H3K4me2 (Upstate Biotechnology,

07-030). DNA obtained from the precipitated fractions was amplified using primers for

Mest (MCF1: 5'-AGGGGGTAGCGGGTCAATAC-3' and MCR1: 5'-

ATGTGCTGGTGGCCGAAGCAG-3'), and Klf14 (KCF1: 5'-

TTGGAGCCAGACGAGCTGGAAG-3' and KCR1: 5'- 130

AGGCTGCTGGGAATGCCATAGC-3'). Alleles between BL6 and JF1 were distinguished by MaeI and SacII polymorphisms. The dominant allele was determined by analyzing band intensities. ChIP was performed using non-fixed chromatin from BL6xJF1

13.5dpc hybrid embryos, and precipitated DNA was analysed by PCR-SSCP as previously described (8). The antibodies used for native-ChIP were directed against H3K9ac (Upstate

Biotechnology, 06-942), H3K4me2 (Upstate Biotechnology, 07-030), H3K9me3 (Upstate

Biotechnology, 07-442) and H4K20me3 (Upstate Biotechnology, 07-463).

V.B.5 Amplification of KLF14 and KLF16 in mammalian species

The KLF14 ORF was sequenced in 352 individuals (704 chromosomes), representing both patients (60 RSS and 160 autistic) and controls (78 Caucasians from the human variation panel and 54 individuals from an ethnically diverse panel containing 9

African American, four Arabic, four Armenian, four Chinese, three Greek, four Indo-

Pakistani, four Italian, four Iranian, eight Japanese, and 10 Somalian individuals).

Haplotypes were determined manually, and when necessary, samples were subcloned to determine the phase of polymorphisms. The KLF14 ORF was also sequenced from three gorillas, two orang-utans, two macaques, two bonobos and 20 chimpanzees. In the latter, at least one individual from three chimpanzee subspecies was included. Primers used to amplify KLF16 were: F1 (5'-CCCGCCACCACCGGAC-3') or F2 (5'-

CCCGGCACTACCGGAC-3') and R (5'-TGCAGGGCAGCGAGTCG-3'). KLF14 was amplified using the following primers and combinations thereof: F1 (5'-

AACTTCTTGTCGCAGTCGAG-3'), F2 (5'-ACTTCTTGTCGCAGTCGA-3'), R1 (5'-

CGTGCCTGGACTACTTCGC-3', R2 (5'-GCCCCACCTGCTGGCT-3'), R3 (5'-

GCCCCACCTGCTGGC-3').

131

V.B.6 Sub-cellular localization

A GFP tagged murine Klf14 construct was generated using pEGFP-N1 (Clontech).

Full length Klf14 was amplified by PCR from genomic DNA using primers 5'- tcagGGATCCATGTCGGCCGCCGTGGCTTGC-3' and 5'- tcagTCTAGACTACAGGCAAGCAGTGAAGCT-3'. The lower case letters are overhangs.

Underlined letters correspond to restriction sites for BamHI and XbaI. The amplified fragment was sub-cloned using Invitrogen’s TOPO cloning kit and was verified to be mutation free. The sub-cloned open reading frame and the GFP vector were both digested with BamHI and XbaI. The open reading frame was ligated in-frame to the GFP vector, and

2 micrograms of the resulting pEGFPKLF14 vector were transfected into COS-7 cells using

Lipofectamine-Plus (Invitrogen). A Myc tagged murine Klf14 construct was generated, where the full length open reading frame of the gene was amplified using the following primers: 5'-cagtcagCTCGAGAAATGTCGGCCGCCGTGGCGTG-3' and 5'- tcagtcagCTCGAGCTACAGGCAAGCAGTGAAGCT-3'. The underlined letters correspond to the restriction site for XhoI. The amplified fragment was subcloned as aforementioned. Vector pcDNA3myc, which was a gift from Dr. Berge Minassian, was digested with XhoI. The dephosphorylated vector was ligated to the open reading frame to create vector pcDNA3mycKlf14. Two micrograms of the construct were transfected into

COS-7 cells using Lipofectamine-Plus (Invitrogen). The cells were washed and fixed.

Cultures were blocked for 1 hour and incubated with anti-Myc (c-Myc (9E10): sc-40, Santa

Cruz) for 45 minutes at room temperature. Slides were washed with PBS and incubated with secondary antibody (Alexa Fluor 488 chicken anti-mouse IgG, Invitrogen, 1:400) in blocking solution. DAPI staining was performed using a 1/1000 dilution in PBS of a 0.3 mM stock (Sigma). 132

V.C RESULTS

V.C.1 Maternal specific expression of human and murine KLF14 in embryonic and extra-embryonic murine tissues

In order to determine the allelic expression of murine Klf14 in various tissues, reciprocal crosses of C57BL/6J (BL6) and JF1/Ms (JF1) were carried out and cDNA was extracted from embryonic and extra-embryonic tissues at 15.5 days post coitum (dpc). To distinguish the two parental alleles, a G/A polymorphism corresponding to nucleotide 451 of AK030435 was identified. This strain-specific variation was used for an RFLP analysis. cDNA spanning the polymorphism was PCR amplified and digested using HpaII, which cut the BL6 allele once and the JF1 allele twice (Figure V.2a). The analysis revealed a clear pattern of monoallelic expression of the maternal allele in all tissues examined (Figure

V.2b).

To confirm these findings and to determine the pattern of expression of Klf14 in tissues with low levels of expression, PCR amplified cDNA was sequenced. Equal peak heights of the two alleles were observed in sequencing electropherograms from PCR- amplified genomic DNA of BL6xJF1 F1 hybrids, indicating lack of amplification bias

(Figure V.3). cDNA from the F1 hybrids was amplified, the products were sequenced, and allelic expression was analyzed in an intronless PCR fragment amplified by primers

AK0302435F/AK030435R. Due to the intronless nature of the amplicon, samples without reverse transcriptase were also prepared to account for the possibility of genomic DNA contamination. In all tissues examined, monoallelic expression was observed, as noted by the expression of a single peak at the position corresponding to the G/A polymorphism

(Figure V.3a). A parent-of-origin pattern corresponding to the expression of the maternal allele was identified by use of the reciprocal crosses, thereby eliminating the possibility of a 133 non-imprinted allele specific expression pattern. In addition, Klf14 was found to be imprinted in tissues extracted from 9.5 dpc embryos and neonates (data not shown), indicating that the imprinted expression of Klf14 is not developmental stage specific and is an imprint established early in development.

To distinguish parental alleles of human KLF14, DNA derived from fetal samples were genotyped for a C/T polymorphism at nucleotide 336 of NM_138693, and three fetuses heterozygous for the polymorphism were identified (Figure V.3b). cDNA corresponding to KLF14 was sequenced, and the expression of the alleles was noted.

Monoallelic expression of KLF14 was observed in lung (Fetus #66), heart (Fetus #65), tongue, stomach, eye, intestine, and placental samples (Fetus #62). One informative fetus- mother DNA pair indicated monoallelic expression of the maternal allele (Fetus #66).

Additionally, expression of KLF14 was analyzed in cDNA extracted from somatic cell hybrid lines containing a single copy of a maternal or paternal human chromosome 7.

These lines have previously been shown to maintain the monoallelic expression of MEST- isoform-1 and MESTIT1 (5). PCR was carried out, amplifying a 310 bp fragment specific to

KLF14. The absence of a PCR product in a cell line not containing the human chromosome

(A9) indicated that the amplification was specific to a human chromosome 7 transcript.

Amplification of KLF14 was observed exclusively in the cell lines containing a maternal copy of human chromosome 7 (Figure V.3C), indicating maternal specific expression. 134

Figure V.2

A Uncut PCR product 821 bp 648 bp 173 bp BL6 allele 536 bp 112 bp 173 bp JF1 and CAST alleles

B

Genomic-Cast-Cut BJ-Intestine BJ-Brain BJ-Heart BJ-Lung Genomic-BL6-Cut Genomic-JF1-Cut JB-Intestine JB-Brain JB-Heart JB-Lung BJ-Labyrinth Genomic-Uncut JB-Yolk Sac JB-Yolk BJ-Yolk Sac JB-Labyrinth

1kb 650bp 500bp

200bp

Figure V.2 Imprinting analysis of murine Klf14 by RFLP. A) Digestion of Klf14 cDNA by HpaII. PCR amplified cDNA was digested with HpaII, which produced distinct patterns of digestion for each murine allele due to polymorphisms in the sequence. B) Monoallelic expression of Klf14 in murine tissues. The digested Klf14 cDNA products were run on a 1.5% agarose gel. Undigested PCR amplicons and PCR products corresponding to genomic DNA were run as positive controls. These products were of the same length as the cDNA amplicon due to the intronless nature of the transcript. These controls indicate that the PCR products were fully digested. The digested cDNA products indicate that the maternal allele is monoallelically expressed in all tissues shown. Labyrinth refers to the placental labyrinth. BJ and JB denote BL6xJF1 and JF1xBL6, respectively, where the first letter denotes the maternal allele. All tissues used were from 15.5 dpc mice. 135

Figure V.3

Figure V.3 Imprinting analysis of human and murine KLF14. A) Imprinted expression of murine Klf14. Sequence analysis of genomic DNA and RT-PCR products from 15.5 dpc hybrid mice are shown in the left and right panels, respectively. Genomic sequencing results indicate the genotype for JF1 (G) at the polymorphism. RT-PCR sequencing results show the expression of the JF1 allele in all tissues where JF1 is the maternal allele (upper row in right panel), and expression of the BL6 allele in the reciprocal cross (lower row in right panel), indicating maternal expression. B) Imprinted expression of human KLF14. The first column of panels shows genomic sequencing electropherograms for three fetal samples (rows) heterozygous for a polymorphism in KLF14. The second column presents the genotype for the corresponding maternal samples (maternal DNA was not available for Fetus #62). The third column shows sequencing results of RT-PCR products indicating the monoallelic expression of various tissues, as indicated on the right of the column. Results from Fetus #66, which is informative for parental origin, indicate that KLF14 is maternally expressed. (*Sequencing of tongue, stomach, eye, kidney, and intestine cDNA from fetus #62 showed monoallelic expression) C) Maternal expression of human KLF14 in somatic cell hybrids. RT-PCR was performed for three independent maternal or paternal monochromosomal hybrid cell lines for human chromosome 7. Results confirm the maternal expression of KLF14, as seen in B. The expression of the paternally expressed MEST and mouse A9 cell-line, which lacks human chromosome 7, are also shown. 136

V.C.2 Histone modifications in Klf14 and Mest CpG islands

As described in Chapter I.C.2.ii, the allele-specific modification of histones has been shown to be a hallmark of promoters and ICRs of imprinted loci (9, 10). Though such modifications are a feature of preferential expression when found in the promoter, they are integral at the level of ICRs. The unmethylated allele of ICRs is associated with “open chromatin” marks, such as acetylated histones and dimethylation of lysine 4 (H3K4me2).

In contrast, the methylated allele of ICRs is characterized by heterochromatic features, such as hypoacetylation on lysine 9 of histone H3 (H3K9ac), trimethylation of lysine 9 on histone 3 (H3K9me3) and trimethylation of lysine 20 on histone 4 (H4K20me3) (11, 12).

Thus, by performing chromatin immunoprecipitation (ChIP), allele-specific covalent histone modifications that have been shown to be characteristic of maternally and paternally methylated ICRs were analyzed (9, 10, 13, 14) (P. Arnaud and R. Feil, personal communication).

ChIP was performed by formaldehyde fixation using murine fibroblasts from the F1 hybrid offspring of BL6xJF1. It had been previously determined that Klf14 is imprinted in these cells (Figure V.3), and therefore epigenetic modifications associated with the gene would be present in fibroblast cells from F1 hybrids. To distinguish each allele, SNPs were identified in the CpG islands of murine Mest and Klf14, which were also RFLPs of MaeI and SacII. (Figure V.4a). Although H3K9acK14ac, H4 acetylation (at lysines 5, 8, 12, and

16) and H3K4me2 were exclusively enriched in nucleosomes of the unmethylated paternal allele of Mest, no differences were observed between the parental alleles at Klf14 (Figure

V.4b).

Subsequently native-ChIP was performed, which precipitates a higher quantitity of chromatin in comparison to formaldehyde fixed chromatin (15). This analysis was carried 137 out to determine if a more sensitive assay would identify differences in histone modifications in the CpG island of Klf14. For this analysis DNA from the 13.5dpc whole embryos of hybrid BL6xJF1 mice was used. Precipitated DNA was PCR amplified and analysed by the single strand conformation polymorphism (SSCP) method (Figure V.4c).

Again, it was determined that H3K9ac and H3K4me2 were associated with the unmethylated paternal allele in the Mest promoter and that H3K9me3 and H4K20me3 were associated with the methylated maternal allele. However, differences between the alleles of

Klf14 were not observed. 138

Figure V.4 139

Figure V.4 Epigenetic modifications of murine Klf14. A) The location and distribution of regions analyzed in Mest and Klf14. The CpG islands overlapping Mest exon 1 and Klf14 are depicted by grey bars (row 1). The regions examined in the methylation analysis and ChIP assay are indicated in rows 2 and 3, respectively. The restriction enzymes used in the ChIP assay and the polymorphisms identified in BL6 and JF1 strains are also shown. B) Analysis of histone modifications by ChIP in fibroblast cells of BL6xJF1 and JF1xBL6 hybrids. ChIP was performed using formaldehyde fixed chromatin. Antibodies against H3 acetylated at lysines 9 and 14 (H3K9acK14ac), H4 acetylated at lysines 5, 8, 12, and 16 (H4ac), and dimethylated H3K4 (H3K4me2) were used in the ChIP assay. Precipitated DNAs were PCR amplified using primers specific to the CpG islands of Mest and Klf14 and subsequently digested as shown in a). DNA before immunoprecipitation (input) and the product obtained with no antibody (N.C.) were also included in the analysis. The difference in band intensities between the precipitated products and input DNA reveals that there is preferential precipitation of H3K9acK14ac, H4ac and H3K4me2 on the paternal allele of the Mest CpG island, but no allelic differences were detected at the Klf14 region. C) Analysis of histone modifications by native-ChIP in whole embryos of BL6xJF1 hybrids. Chromatin was immunoprecipitated using antibodies against acetylated K9 on H3 (H3K9ac), H3K4me2, trimethylated K9 on H3 (H3K9me3) and trimethylated K20 of H4 (H4K20me3). Anti-chicken was used as a non- specific antibody (mock). Input DNA is denoted by I. Antibody-bound and unbound fractions of the precipitate are denoted by B and U, respectively. Precipitated DNA was PCR amplified using the same primers as in panel B. The amplified DNA was analysed by SSCP. The results show differences in histone modifications between the two parental alleles in the Mest CpG island, allelic enrichments were observed for Klf14. 140

V.C.3 Methylation analysis and Klf14 expression in Dnmt3a knockout mice

As described in Chapter I.C.2.i, the establishment of a differentially methylated region (DMR) in the germ line is the primary hallmark of imprinted loci and has been shown to be necessary for the proper imprinting of many regions (for review, see (1)). The study of these regions has been greatly enhanced by DNA methyltransferase 3a (Dnmt3a) conditional knockout mice. These mice have Dnmt3a disrupted specifically in germ cells, while somatic cells express the wild type protein. This method allows conditional knockouts to be viable and enables the study of their offspring. The progeny of female conditional knockouts die in utero at approximately 10.5 dpc, and have been shown to be hypomethylated at maternally methylated DMRs, while methylation patterns at paternally methylated regions and repetitive regions remain unaltered (16). Correspondingly, a 1.6- fold increase in the expression of Mest has been measured in these Dnmt3a-/wt embryos indicating a relaxation of the imprinted expression of the gene. The expression of Mest and

Klf14 was examined in the offspring of female Dnmt3a conditional knockout mice. These results indicate a substantial decrease in the expression of Klf14 in both embryonic and extra-embryonic tissues in the Dnmt3a-/wt embryos (Figure V.5a). This suggests that the expression of Klf14 is dependent upon the establishment of maternal imprints in oocytes

(Figure V.5b).

Subsequently, the methylation status of the CpG island located in Klf14 was examined. Bisulfite-treated DNA extracted from 12.5 dpc BL6xJF1 hybrids was PCR amplified using three different PCR primers spanning the CpG island (Figure V.4a). These amplicons were subcloned and analyzed. Surprisingly, hypomethylation of both alleles was observed throughout the CpG island (Figure V.6a). Similar results were obtained from the bisulfite sequencing of fibroblast DNA (Figure V.6b). Using the same materials, the 141 methylation of Mest was analyzed and enrichment of methylated CpGs on the maternal allele was observed, as previously described (17) (data not shown).

The methylation status of 94 CpG dinucleotides located in the open reading frame

(ORF) and 5' UTR regions of human KLF14 were examined by bisulfite sequencing of human fibroblast DNA. Extensive hypomethylation was again observed in the subcloned fragments (Figure V.6c). 142

Figure V.5

A

B WT Mest Klf14

P

M Dnmt3a -/wt Mest Klf14

P

M

Figure V.5 Expression of Klf14 in offspring of Dnmt3a conditional knockout mice. A) The expression of Mest and Klf14 was examined in tissues of Dnmt3a knockout mice. Expression was examined in two embryos (e1-2) and their corresponding extra-embryonic tissues (e1-2ex) from the offspring of female Dnmt3a conditional knockout mice, as well as a wildtype (wt) embryo. Klf14 expression is lost in the knockout mice, suggesting that its expression is dependent upon a maternally methylated region. B) Model of Klf14 expression in wildtype (wt) and Dnmt3a conditional knockout mice. In wildtype mice (upper panel), the maternally methylated CpG island in Mest (black circle) silences the expression of the gene from the maternal allele (M), while Klf14 is actively transcribed on this allele. The opposite pattern of expression is seen on the paternal strand (P), due to the unmethylated CpG island (white circle). In Dnmt3a -/wt embryos (lower panel), maternal methylation of the Mest CpG island is lost, causing increased expression of Mest and loss of expression of Klf14. 143

Figure V.6 A

B

C

Figure V.6 Bisulfite sequencing of murine and human DNA. Hollow circles and filled circles indicate unmethylated and methylated CpG dinucleotides, respectively (N, could not be determined). Each row of circles represents CpGs in an individual PCR product clone. A) Bisulfite sequencing analysis from 12.5 dpc whole embryos of BL6xJF1 hybrids. Each block corresponds to a separate region analyzed in the murine Klf14 CpG island, as shown in Figure V.4a. In each block, the top section and bottom sections correspond to clones from the maternal allele (BL6) and paternal (JF1) alleles, respectively, as determined by use of polymorphisms. The three regions analyzed indicate that the Klf14 CpG island is hypomethylated on both alleles. B) Bisulfite sequencing analysis from fibroblast DNA from BL6xJF1 hybrid embryos. The top section and bottom sections correspond to clones from the maternal allele (BL6) and paternal (JF1) alleles, respectively, as determined by use of polymorphisms. The region analysed is equivalent to the large segment in A) C) Bisulfite sequencing analysis of human fibroblast DNA. The methylation status of 94 CpG dinucleotides located in the open reading frame (ORF) and 5’ UTR regions of human KLF14 are shown. Polymorphisms unique to each allele could not be identified. The results indicate hypomethylation of the human KLF14 CpG island. 144

V.C.4 Characterization of the human and murine KLF14 transcripts

An in silico analysis of human KLF14 was performed to identify the full length transcript of the gene by EST assembly. A single intronless gene of approximately 1.4 Kb

(NM_138693) was found in the Chromosome 7 Annotation database (18). This reference sequence, derived from mRNA AF490374, contains an ORF of 972 nucleotides, as well as an in-frame stop codon. Rapid amplification of cDNA ends (RACE) was performed and single band of approximately 1.6 Kb was amplified and directly sequenced (Figure V.1a).

The unspliced fragment contained a poly-A tail and poly-A signal. 3' RACE was again performed using primers located closer to the new 3' end and a second fragment, also containing a poly-A tail, was identified.

In silico analysis was also performed to identify the full length of murine Klf14. The spliced transcript AK030435, whose putative second exon was originally used to determine the imprinted expression of the gene, contains a poly-A signal. RT-PCR was performed to confirm the expression of the spliced transcript, yet such attempts only succeeded in amplifying cDNA intronic to AK030435 and did not find evidence of splicing. An ORF of

978 nucleotides was identified, partially located in the intron of AK030435, corresponding to Klf14 (Figure V.1b). A new accession, corresponding to DQ534758, was submitted with the new cDNA sequence. 3' RACE was performed and the results did not extend the transcript beyond the 3' end of AK030435, though they identified a poly-A tail. 145

V.C.5 Expression of human and murine KLF14

RT-PCR was performed on murine cDNA, where higher levels of Klf14 expression were observed in embryonic and extra-embryonic tissues with respect to adult tissues

(Figure V.7a). Glia and neurons were cultured from mouse embryos, as previously described (6), and a much higher level of expression was observed in the latter (Figure

V.7b).

To determine the expression of KLF14 in human tissues, cDNA was obtained from numerous adult and fetal RNA samples. RT-PCR was carried out, amplifying a 310 bp fragment of KLF14. In general, the transcript was found to have low levels of expression in both human and mouse, requiring a minimum of 35 cycles of PCR for visible amplification.

The transcript was found to be expressed in many tissues (Figure V.7c), but its expression was absent in liver and lymphoblast (data not shown). It was found to have higher levels of expression in fetal tissues and placenta than in adult tissues. 146

Figure V.7

A B

C

Figure V.7 Human and murine KLF14 expression. A) Expression of Klf14 in murine embryonic and extra-embryonic and B) brain tissues. Samples lacking reverse transcriptase are indicated by -. Results indicate higher levels of expression in extra- embryonic tissues. C) Expression of KLF14 in human tissues. Human expression results concur with those of mouse expression, in that there is higher expression in pre-natal stages of development. 147

V.C.6 Functional prediction and sub-cellular localization of KLF14

To identify conserved domains, a multispecies protein sequence alignment was performed using sequences from human, chimpanzee, gorilla, dog, rat, and mouse. The analysis revealed two conserved domains: one at the N-terminus of the protein and a highly conserved region at the C-terminus. The sequence conservation and the length of these regions suggested functional importance. In silico and literature searches determined that the C-terminal region consisted of three C2H2-domains (Y/F-X-C-X2-4-C-X3-F/Y-X5-L-X2-

H-X3-4) joined by conserved motifs known as Krüppel links (T-G-E-R/K-P) (Figure V.8)

(19). These features are considered to be the hallmark characteristics of Krüppel-like transcription factors. The N-terminal domain was identified as being a conserved α-helical repression motif. This motif has been shown to associate with the PAH2 domain of mSin3A, which is part of the histone deacetylase complex (20, 21). Consequently, KLF14 may act as a transcriptional repressor.

Sub-cellular localization was carried out in order to determine if KLF14 was localized to the nucleus, due to its putative role as a transcription factor. COS-7 cells were transfected with vectors carrying tagged murine Klf14 (Figure V.9). The tags were ligated to the N-terminus of the protein to minimize disruption of the C-terminal zinc-fingers. The analysis revealed that Klf14 is indeed localized to the nucleus, supporting the hypothesis that it acts as a transcription factor. 148

Figure V.8

Human MSAAVACLDYFAAECLVSMSAGAVVHRRPPDPEGAGGAAGSEVGAAQPESALPGPGPSGP Chimp MSAAVACLDYFAAECLVSMSAGAVVHRRPPDPEGAGGAAGSEVGAAPPESALPGPGPPGP Gorilla MSAAVACLDYFAAECLVSMSAGAVVHRRPPDPEGAGGAAGSEVGAAPPESALPGPGPPGP Dog MSAAVACLDYFAAECLVSMSAGAVVHRRPPDPEGAGGAAGSEVGAAPPESALPGPGPPGP Rat MSASVACLDYFAAECLVSMSTRAVLHRRATDSEGAGAAAVSEVGEVSRESAGKGTGSRGV Mouse MSAAVACLDYFAAECLVSMSTRAVLHRRATDPEGASAAAVSEVGAVSRESAGKGTGSRGV ***:****************: **:***..*.***..** **** . *** *.*. *

Human ASVPQLPQVPAPSPGAGGAAPHLLAASVWADLRGSSGEGSWENSGEAPRASSGFSDPIPC Chimp ASVPPLPQVPAPSPGAGGAAPHLLAASVWADLRGSSGEGSWENSGEAPRASSGFSDPIPC Gorilla ASVPPLPQVPAPSPGAGGAAPHLLAASVWADLRGSSGEGSWENSGEAPRASSGFSDPIPC Dog ASVPPLPQVPAPSPGAGGAAPHLLAASVLADLRGGSGEGFGENSGEAPRASPGSSGP--- Rat LWIPPVLEVPAPSPGEGDGAPHLLAASALADLSCGAREGTKEDSEEAPCASTSCFEPTQC Mouse LWIPPVLQVPTPSPGEGDGAPHLLAASALADLSCGAREDFREDSEEAPCASTSCFEPTWC :* : :**:**** *..********. *** .: *. *:* *** **.. *

Human SVQTPCSELAPASGAAAVCAPESSSDAPAVPSAPAAPGAPAASGGFSGGALGAGPAPAAD Chimp SVQTPCSELAPASGAAAVCAPESSSDAPAVPSAPAAPGAPAASGGFSGGALGAGPAPAAD Gorilla SVQTPCSELAPASGAAAVCAPESSSDAPAVPSAPAAPGAPAASGGFSGGALGAGPAPAAD Dog ---TPCSEPAPTASAAQISGPAHSAGALEVPGAPAVPGAPAVPGAGPGAAPGACPAPAIG Rat SSPTGCSEPTQTFGEDELSDAESSCSESAILGAPEVPEEPDDSGEVPEGPPGARPGPAVG Mouse SSPTGGSEPTQAFFEDELSDAESSCSDSAILDAPEASEEPDDSGEVPEGPPGARPAPSTG ***: : :.. *.. : .**..* .*...***.*: .

Human QAPRRRSVTPAAKRHQCPFPGCTKAYYKSSHLKSHQRTHTGERPFSCDWLDCDKKFTRSD Chimp QVPRRRPVTPAAKRHQCPFPGCTKAYYKSSHLKSHQRTHTGERPFSCDWLDCDKKFTRSD Gorilla QVPRRRPVTPAAKRHQCPFPGCTKAYYKSSHLKSHQRTHTGERPFSCDWLDCDKKFTRSD Dog PVPRRRPVTPAAKRHRCPFPGCNKAYYKSSHLKSHQRTHTGERPFSCDWLDCDKKFTRSD Rat PTYRRRQITPASKRHQCSFHGCNKAYYKSSHLKSHQRTHTGERPFSCDWLDCDKKFTRSD Mouse PTYRRRQITPASKRHQCSFHGCNKAYYKSSHLKSHQRTHTGERPFSCDWLDCDKKFTRSD .***:***:***:*.* **.*************************************

Human ELARHYRTHTGEKRFSCPLCPKQFSRSDHLTKHARRHPTYHPDMIEYRGRRRTPRIDPPL Chimp ELARHYRTHTGEKRFSCPLCPKQFSRSDHLTKHARRHPTYHPDMIEYRGRRRTPRIDPPL Gorilla ELARHYRTHTGEKRFSCPLCPKQFSRSDHLTKHARRHPTYHPDMIEYRGRRRTPRIDPPL Dog ELARHYRTHTGEKRFSCPLCPKQFSRSDHLTKHARRHPAYHPDMIEYRGRRRTPRVDSQP Rat ELARHYRTHTGEKRFSCPLCPKQFSRSDHLTKHARRHPTYHPDMIEYRGRRRTPRPEPPP Mouse ELARHYRTHTGEKRFSCPLCPKQFSRSDHLTKHARRHPTYHPDMIEYRGRRRTPRPEPPP **************************************:**************** :.

Human TSEVESSASGSGP--GPAPSFTTCL------Chimp TSEVESSASGSGP--GPAPSFTTCL------Gorilla TSEVESSASGSGP--GPAPSFTTCL------Dog TSLVDS--SGSVP—-GQAPSFTTCLSDGHCC Rat PAMVESSGSDS----GQETSFTACP------Mouse PAMVESSGSDSSSSSGQETSFTACL------.: *:**.**.***:*

Figure V.8 Multispecies amino acid alignment of KLF14 open reading frame. Alignment was generated using ClustalW with default settings. The black boxes encompass the C2H2-domains, while the red box outlines the alpha-helical repression motif. 149

FigureV.9

A DAPI GFP-mKlf14 Merge

B DAPI Myc-mKlf14 Merge

Figure V.9 Cellular localization of the murine Klf14 protein. Klf14 was tagged with A) GFP and B) Myc and transfected in COS-7 cells. Fixed cells were stained with DAPI nuclear staining (left panels). The merge of the two figures demonstrates that the Klf14 signal overlaps with that of DAPI, suggesting that Klf14 is localized to the nucleus. 150

V.C.7 Syntenic analysis of KLF14

The intronless nature of KLF14 suggested that the gene may have arisen through retrotransposition (22). Phylogenetic studies of the KLF family have revealed that the

KLF14 protein is most closely related to KLF16 (NP_114124), encoded on human chromosome 19 (Figure V.10) (19). An amino acid alignment of these two proteins (blastp) demonstrated that they are 58% identical, and bear most similarity in the first 26 amino acids (N-terminus) and the zinc-finger domains (C-terminal end). Thus, it is plausible that this gene is an ancient retrotransposon-derived duplication of KLF16.

To determine the timing of the retrotransposition event during vertebrae evolution, the synteny of the region encompassing KLF14 in human, mouse, opossum, and chicken was examined using the genome browser at UCSC (genome.ucsc.edu). COPG2/TSGA13 and MKLN1 were used as reference points, which are located centromeric and telomeric to human KLF14, respectively. Two highly conserved segments corresponding to miRNAs

(miR-29 and miR-29b-2) were also used as anchors. Synteny with all these elements, including KLF14, was maintained in mouse (Figure V.11a). However, KLF14 and miR-29 were absent in the opossum, as well as the platypus (data not shown). A break in synteny was observed in chicken, where COPG2 and MKLN1 were located on different chromosome. This analysis thereby suggested that KLF14 was retrotransposed after the divergence of marsupials from eutherian mammals.

A more precise timing of this retrotransposition event in eutherian evolution was examined by amplifying KLF14 in organisms from each of the supraordinal clades:

Afrotheria, Xenarthra, Euarchontoglires, and Laurasiatheria (23). Using a variety of different PCR conditions, KLF14 was amplified from numerous species and its sequence was confirmed by direct sequencing of products. Several primers were designed, located in 151 the conserved zinc-finger domains of the gene. Multiple attempts using increasingly more permissive conditions were used to amplify DNA from red-legged short-tailed opossum

(Monodelphis brevicaudata), red-necked wallaby (Macropus rufogriseus), and echidna

(Tachyglossus aculeatus) yet bands corresponding to a KLF14 homologue were not obtained. However, KLF16 was amplified in both eutherian and marsupial mammals

(Figure V.11b).

The syntenic analysis together with the amplification of KLF14 in each of the superclades, most notably in members of the Xenarthra order (represented by the armadillo and tamandua), indicates that the gene is present in all eutherian mammals, yet absent in monotremes and marsupials. Based on estimates of mammalian evolution, this would place the retrotransposition event which gave rise to KLF14, between 130 and 170 million years ago (Mya) (i.e. prior to the divergence of Xenarthra and after the divergence of

Marsupialia) (24). At the same time, the presence of KLF16 in marsupials indicates that it is more ancient than KLF14, and supports our hypothesis that the latter gene may be a retrotransposed copy of KLF16. 152

Figure V.10

82 KLF8 (NP 009181.1) 98 KLF12-a (NP 009180.3) 40 KLF3 (NP 057615.2)

93 KLF5 (NP 001721.2) KLF7 (NP 003700.1)

100 KLF6-a (NP 001291.3) 100 KLF6-b (NP 001008490.1) KLF1 (NP 006554.1)

98 KLF2 (NP 057354.1) 100 KLF4 (NP 004226.2) KLF15 (NP 054798.1)

100 KLF10 (NP 005646.1) 97 KLF11 (NP 003588.1)

93 KLF9 (BAA06524.1) 97 KLF13 (NP 057079.2) 98 KLF14 (NP 619638.1) 100 KLF16 (NP 114124.1)

Figure V.10 Relationship between the human KLF-protein transcription factors. The neighbour-joining tree was constructed using MEGA 3.1 with 150 bootstrap replicates. Isoforms are differentiated by letters ‘a’ and ‘b’, whenever applicable. KLF14 is highlighted in red, and the accession for each protein used is indicated between parentheses. The analysis reveals that KLF14’s closest relative in the KLF family is KLF16. 153

Figure V.11

Figure V.11 Retrotransposition of KLF14 and mammalian evolution. A) Genomic distribution of KLF14 and flanking genes. KLF14 is flanked by MKLN1 and TSGA13 in human and mouse. These two genes are present in opossum, yet KLF14 is absent. In chicken, there is a syntenic break in the region, placing genes on different chromosomes. Two microRNAs (miR-29 and miR-29b-2) which lie in the fragile site adjacent to KLF14 (FRA7H) are conserved. B) Presence of KLF14 and KLF16 in distant mammals. The lower panel shows PCR amplification of KLF16 from mammals of diverse clades and the upper panel shows the amplification of KLF14. The mammals shown are (L-R) cow (Bos taurus), tree shrew (Tupaia glis), nine-banded armadillo (Dasypus novemcinctus), tamandua (Tamandua tetradactyla), red-necked wallaby (Macropus rufogriseus), and red- legged short-tailed opossum (Monodelphis brevicaudata). It indicates that KLF14 is present in eutherian, but not marsupial mammals and shows that KLF16 is more ancient than KLF14. 154

V.C.8 Sequence of KLF14 in RSS patients, autistic and control individuals

As previously mentioned, imprinted genes on chromosome 7 are hypothesized to play a role in the aetiology of several diseases, including RSS and autism. The KLF14 ORF was sequenced in 55 RSS patients and 160 autistic individuals to identify mutations that may be associated with these diseases.

Although, mutations specific to these affected populations were not identified in the

KLF14 ORF, numerous non-synonymous base-pair substitutions were discovered in the N- terminal region of the gene (Figure V.12). All polymorphic changes were found at equal frequencies in controls, suggesting that they are not associated with the aetiology of these diseases. The presence of these non-synonymous polymorphisms led to an in depth analysis of the gene in numerous populations to determine if the transcript was under positive selection.

The KLF14 ORF was sequenced in a total of 704 chromosomes, representing both patients and controls, the latter of which included an ethnically diverse panel of 61 individuals. Eight haplotypes were found (Figure V.12), two of which were specific to the

Japanese population (haplotypes 7 and 8) and one of which was found predominantly in individuals of African descent (haplotype 6) (Figure V.13). The frequency of these haplotypes in different populations varied greatly, suggesting that the gene may have recently undergone relaxed selection or accelerated evolution.

Subsequent phylogenetic analyses performed in conjunction with members of our laboratory could not reject the hypothesis of neutral selection for KLF14. However, it was determined that the gene is highly variable relative to other genes in the genome and its variability is not common to other primates, suggesting human-specific accelerated evolution. 155 Figure V.12 156

Figure V.12 KLF14 ORF sequences in the human, chimpanzee and gorilla. Eight different haplotypes (1-6, 7J, 8J) identified in diverse human populations are shown, as well as the sequence for the KLF14 ORF in the chimpanzee and gorilla (C and G respectively). Synonymous polymorphisms and non-synonymous polymorphisms are denoted by grey and black boxes, respectively. The corresponding amino acid substitution is noted above each polymorphism. The identity of the ancestral allele was identified by comparison to orang-utan (Pongo pygmaeus) and macaque (Macaca mulatta). The sequence corresponding to zinc-finger domains are enclosed, and the sequence between the boxes comprises the conserved Krüppel-link. Haplotypes specific to the Japanese population are indicated by “J”. 157

Figure V.13

Figure V.13: Haplotype frequencies of KLF14 open reading frame in the human population. The frequency of each haplotype, as defined in Figure V.12, identified in ethnic populations is shown (n= number of chromosomes genotyped).

158

V. D D ISCUSSION

V.D.1 Imprinting and function of KLF14

This chapter describes the identification of a novel imprinted gene, KLF14, and demonstrates that it is maternally expressed in every tissue examined in both human and mouse. The gene encodes a putative Krüppel-like transcription factor, containing three

C2H2 zinc-fingers joined together by a characteristic linker sequence (2). It is a member of a large family of transcription factors whose founding member is Erythroid Krüppel-like factor, which has been shown to bind the sequence CACCC in the promoter of the β-globin gene (25). The function of KLF14 is currently unknown, however, in silico analyses suggest that it may act as a transcriptional repressor. Sub-cellular localization studies using tagged Klf14 localized the protein to the nucleus, supporting its putative role as a transcription factor. Tags were added to the N-terminus of Klf14 and these did not interfere with its localization. This suggests that a nuclear localization signal at the N-terminal end of the protein does not direct its transportation from the cytoplasm to the nucleus. It is possible that, due to the small size of the protein (323 amino acids, approximately 35 kDa), the transcription factor is transferred to the nucleus by diffusion. Studies have demonstrated that proteins larger than 60 kDa can diffuse through nuclear pores, supporting this hypothesis (26).

The relatively greater expression level of the gene in embryonic and extra- embryonic tissues compared to adult tissues suggests a role for the transcript in development. Indeed, many imprinted genes play important roles in the regulation of growth and embryonic development (for a review, see (1)), and the aberrant expression of imprinted genes is associated with syndromes of overgrowth or growth restriction, as described in chapter I.C.1. This observation has led to the proposal of a conflict hypothesis, 159 outlined in chapter I.C.5, where maternally expressed genes suppress the growth of the offspring, thereby allowing nutrient supply to be available for future pregnancies and increasing the survival rate of the mother. In contrast, paternally expressed transcripts enhance fetal growth to ensure survival of their genetic offspring (27). Under this hypothesis, the maternally expressed KLF14 is predicted to suppress embryonic growth.

This, together with its predicted function as a transcriptional repressor, suggests that the protein may suppress the expression of genes which enhance fetal growth or placental development.

The analysis of the Klf14 CpG island indicates that the region is hypomethylated.

The identification of an unmethylated CpG-rich region in an imprinted cluster has been previously described, such as the promoter region of Dlk1 (28), the promoter and first exon of Gsα (29), and Ascl2. The results from ChIP experiments, performed using antibodies specific to various histone modifications, did not identify differences between the two alleles of Klf14. However, clear allele-specific precipitation of histone modifications was observed at the DMR of Mest. This is the first description of allele-specific histone modifications associated with the Mest germline DMR. Despite the lack of these trademark epigenetic features at the Klf14 CpG island, its imprinting is maintained and its expression is dependent on the function of Dnmt3a in female germ cells, as demonstrated by experiments involving Dnmt3a germline-specific KO mice. To date, the only DMR identified in this region is the maternally methylated CpG island at the 5' end of

MEST/Mest, which has been shown to be established in gametes (17). As such, it suggests that this DMR may regulate the imprinted expression of murine and human MEST and

KLF14, possibly through long range chromatin regulatory interactions, and may act as an

ICR for the entire locus, spanning CPA4 to KLF14. Further studies, such as chromatin loop 160 assays, are necessary to determine if the expression of these genes is regulated by long range insulator or enhancer elements, as has been shown for genes in the H19/Igf2 locus and KCNQ1 region (30).

Despite numerous attempts at performing RACE, the 5' end of Klf14 has not been determined. This analysis was complicated by the presence of polymorphic repetitive elements at the gene’s predicted 5' end in the mouse. Additionally, attempts at determining the transcript’s full length by performing Northern Blot hybridization produced smears despite using several probes unique to the transcript, suggesting that the transcript has a short half life and/or that it is not abundant. Identifying the 5' end of the gene is paramount for future epigenetic studies examining its promoter region. Future studies may elucidate the 5' end of the transcript by using human cDNA when performing 5' RACE, since the region’s sequence may be less polymorphic in humans.

As previously mentioned, KLF14 mutations unique to RSS patients were not identified. However, changes in the expression of the transcript or mutations in non-coding regions in RSS patients were not examined. Obtaining fibroblast samples from affected individuals is of great importance to perform such expression studies. Due to the genetically heterogeneous nature of the disorder, numerous samples would be needed in order to ascertain whether a fraction of patients have aberrations in KLF14 expression.

To determine if Klf14 contributes to the growth retardation phenotype in mice with segmental UPD for the region, it would be of interest to perform gene targeting experiments, knocking out the transcript and observing its effects on development, growth, and behaviour.

Changes in the expression of KLF14 may translate into changes in the expression of its target proteins. Identifying such target proteins will provide invaluable insight into the 161 function of KLF14. Consequently, future studies may examine the regions to which KLF14 binds by performing ChIP-on-chip. To perform such an experiment on endogenous KLF14, an antibody specific to this member of the KLF-family would be required. Attempts to identify peptides unique to KLF14 for antibody production were not successful, due to its similarity to KLF16 in hydrophilic regions. However, a similar experiment may be performed by introducing tagged KLF14 into a mammalian system, although the risk of identifying false positives would undoubtedly increase due to the protein’s over-expression.

Alternatively, changes in the expression of target genes may be analyzed by microarray experiments following over-expression and silencing of KLF14. Such experiments have previously been shown to identify transcription factor targets (31).

V.D.2 Evolution of KLF14

The analysis outlined in chapter V.C.7 suggests that KLF14 arose through the retrotransposition of KLF16. Thus, KLF14 is the ninth imprinted retrotransposed gene identified to date, and the first protein-coding maternally expressed retrotransposed gene identified in mouse, adding further support to the hypothesis that imprinting serves as a mechanism for regulating increased gene dosage (32). I postulate that KLF14 acquired imprinting through cis- and trans- acting elements associated with the more ancient MEST

CpG island, which is known to be imprinted in marsupials (33), allowing for the adaptation of the host eutherian mammal to the increased gene dosage caused by the retrotransposition of KLF16. By analysing the expression of KLF14 and other retrotransposed genes in eutherian mammals, as well as the epigenetic modifications present in these species, it should be possible to further elucidate the mechanism whereby these genes have acquired imprinted expression and the control elements upon which their imprinting depends. 162

KLF14, unlike its closest relatives in the KLF family of genes, has a large CpG island spanning the vast majority of its open reading frame. At the same time, the gene is enriched for proline and arginine amino acids, which are encoded by the codons CCN and

CGN, respectively. Hence, it is plausible that, upon the retrotransposition of KLF16, the fusion of the proline/arginine-rich exons created a CpG-rich region, which is bioinformatically detected as a CpG island, yet lacks the biological functions generally associated with such regions. Such an occurrence would also account for the absence of epigenetic modifications observed in the “CpG island”. As such, the existence of a GC- neutral differentially methylated promoter or additional exon upstream of the KLF14 ORF cannot be excluded. However, as previously mentioned, attempts to identify the 5' end of the gene in both human and mouse were fruitless (data not shown).

Through the analysis of KLF14 sequence in numerous individuals, it was revealed that the gene’s sequence is highly variable, specifically in the human species. It is plausible that this variability may be due to the monoallelic expression of the transcript, which allows for the accumulation of mutations on the silenced allele. Maternal inheritance of mutated alleles would give rise to their expression and deleterious mutations would face strong purifying selection due to haploinsufficiency. In contrast, beneficial mutations would undergo stronger and more rapid positive selection since their impact would be greater due to the gene’s monoallelic expression. The latter phenomenon has been implicated in the increased non-synonymous substitution rate on the X-chromosome (34). Consequently, the inherited variations seen in KLF14 should be non-deleterious, and possibly advantageous.

This is supported by the fact that the haplotypes carrying rare alleles are transmitted from both mothers and fathers, and are consequently expressed in healthy individuals, as evidenced in the unaffected siblings of RSS and autism patients. All of these sequence 163 variations, with the exception of a synonymous polymorphism observed in haplotype 6

(Figure V.12), occur in the variable, N-terminal end of the putative protein which has low sequence conservation, suggesting that variation in the C-terminal end of the protein may not be tolerated.

In 2004, Dorus and colleagues examined the evolutionary rates of nervous system- related genes between primates and rodents, thereby identifying genes under accelerated evolution in primates. They proposed that these genes may have played important roles in human speciation by developing human behaviour and brain size (35). Consequently, the disruption of genes under accelerated evolution or positive selection in the human lineage has been associated with disease. For example, mutations in FOXP2 underlie severe language and speech impairment/developmental verbal dyspraxia (36, 37), while

Microcephalin and ASPM are associated with microcephaly (38, 39). Thus, due to KLF14’s increased expression in neuronal cells, as well as the accelerated evolution observed in the human lineage, this gene may have played a role in the acquisition of human-specific traits in the evolution of the human species. Such a function would agree with the hypothesis postulated above, describing the role of imprinting in the variability of the gene. However, further studies are required to determine the gene’s function in the brain, particularly in neurons, in order to assess its contribution to human speciation and its putative role in cognitive disease.

Although, the sequencing of the ORF of KLF14 in numerous autistic individuals and RSS patients did not identify any mutations unique to these populations, it does not rule out the involvement of KLF14 in these or other diseases or phenotypes since mutations may be present in regulatory regions causing changes in expression levels, loss of imprinting, or transcript instability. Due to the lack of KLF14 expression in lymphoblasts, 164 transcript’s level of expression could not be quantified in patients, nor could imprinted expression be verified in these patients. Future studies may be able to ascertain the involvement of KLF14 in these disorders by obtaining fibroblast cells from patients, thereby elucidating the role of this gene in developmental disorders associated with human chromosome 7q32.3.

165

V.E REFERENCES

1. Reik, W. and Walter, J. (2001) Genomic imprinting: parental influence on the genome. Nat Rev Genet, 2, 21-32. 2. Turner, J. and Crossley, M. (1999) Mammalian Kruppel-like transcription factors: more than just a pretty finger. Trends Biochem Sci, 24, 236-40. 3. van Vliet, J., Crofts, L.A., Quinlan, K.G., Czolij, R., Perkins, A.C. and Crossley, M. (2006) Human KLF17 is a new member of the Sp/KLF family of transcription factors. Genomics, 87, 474-82. 4. Dang, D.T., Pevsner, J. and Yang, V.W. (2000) The biology of the mammalian Krüppel-like family of transcription factors. Int J Biochem Cell Biol, 32, 1103-21. 5. Nakabayashi, K., Bentley, L., Hitchins, M.P., Mitsuya, K., Meguro, M., Minagawa, S., Bamforth, J.S., Stanier, P., Preece, M., Weksberg, R. et al. (2002) Identification and characterization of an imprinted antisense RNA (MESTIT1) in the human MEST locus on chromosome 7q32. Hum Mol Genet, 11, 1743-56. 6. Mnatzakanian, G.N., Lohi, H., Munteanu, I., Alfred, S.E., Yamada, T., MacLeod, P.J., Jones, J.R., Scherer, S.W., Schanen, N.C., Friez, M.J. et al. (2004) A previously unidentified MECP2 open reading frame defines a new protein isoform relevant to Rett syndrome. Nat Genet, 36, 339-41. 7. Yang, Y., Li, T., Vu, T.H., Ulaner, G.A., Hu, J.F. and Hoffman, A.R. (2003) The histone code regulating expression of the imprinted mouse Igf2r gene. Endocrinology, 144, 5658-70. 8. Umlauf, D., Goto, Y. and Feil, R. (2004) Site-specific analysis of histone methylation and acetylation. Methods Mol Biol, 287, 99-120. 9. Fournier, C., Goto, Y., Ballestar, E., Delaval, K., Hever, A.M., Esteller, M. and Feil, R. (2002) Allele-specific histone lysine methylation marks regulatory regions at imprinted mouse genes. Embo J, 21, 6560-70. 10. Umlauf, D., Goto, Y., Cao, R., Cerqueira, F., Wagschal, A., Zhang, Y. and Feil, R. (2004) Imprinting along the Kcnq1 domain on mouse chromosome 7 involves repressive histone methylation and recruitment of Polycomb group complexes. Nat Genet, 36, 1296-300. 166

11. Martens, J.H., O'Sullivan, R.J., Braunschweig, U., Opravil, S., Radolf, M., Steinlein, P. and Jenuwein, T. (2005) The profile of repeat-associated histone lysine methylation states in the mouse epigenome. Embo J, 24, 800-12. 12. Grewal, S.I. and Moazed, D. (2003) Heterochromatin and epigenetic control of gene expression. Science, 301, 798-802. 13. Delaval, K., Govin, J., Cerqueira, F., Rousseaux, S., Khochbin, S. and Feil, R. (2007) Differential histone modifications mark mouse imprinting control regions during spermatogenesis. Embo J, 26, 720-9. 14. Wu, M.Y., Tsai, T.F. and Beaudet, A.L. (2006) Deficiency of Rbbp1/Arid4a and Rbbp1l1/Arid4b alters epigenetic modifications and suppresses an imprinting defect in the PWS/AS domain. Genes Dev, 20, 2859-70. 15. Goto, Y., Gomez, M., Brockdorff, N. and Feil, R. (2002) Differential patterns of histone methylation and acetylation distinguish active and repressed alleles at X- linked genes. Cytogenet Genome Res, 99, 66-74. 16. Kaneda, M., Okano, M., Hata, K., Sado, T., Tsujimoto, N., Li, E. and Sasaki, H. (2004) Essential role for de novo DNA methyltransferase Dnmt3a in paternal and maternal imprinting. Nature, 429, 900-3. 17. Kerjean, A., Dupont, J.M., Vasseur, C., Le Tessier, D., Cuisset, L., Paldi, A., Jouannet, P. and Jeanpierre, M. (2000) Establishment of the paternal methylation imprint of the human H19 and MEST/PEG1 genes during spermatogenesis. Hum Mol Genet, 9, 2183-7. 18. Scherer, S.W., Cheung, J., MacDonald, J.R., Osborne, L.R., Nakabayashi, K., Herbrick, J.A., Carson, A.R., Parker-Katiraee, L., Skaug, J., Khaja, R. et al. (2003) Human chromosome 7: DNA sequence and biology. Science, 300, 767-72. 19. Kaczynski, J., Cook, T. and Urrutia, R. (2003) Sp1- and Kruppel-like transcription factors. Genome Biol, 4, 206. 20. Kaczynski, J., Zhang, J.S., Ellenrieder, V., Conley, A., Duenes, T., Kester, H., van Der Burg, B. and Urrutia, R. (2001) The Sp1-like protein BTEB3 inhibits transcription via the basic transcription element box by interacting with mSin3A and HDAC-1 co-repressors and competing with Sp1. J Biol Chem, 276, 36749-56. 167

21. Zhang, J.S., Moncrieffe, M.C., Kaczynski, J., Ellenrieder, V., Prendergast, F.G. and Urrutia, R. (2001) A conserved alpha-helical motif mediates the interaction of Sp1- like transcriptional repressors with the corepressor mSin3A. Mol Cell Biol, 21, 5041-9. 22. Esnault, C., Maestre, J. and Heidmann, T. (2000) Human LINE retrotransposons generate processed pseudogenes. Nat Genet, 24, 363-7. 23. Murphy, W.J., Eizirik, E., Johnson, W.E., Zhang, Y.P., Ryder, O.A. and O'Brien, S.J. (2001) Molecular phylogenetics and the origins of placental mammals. Nature, 409, 614-8. 24. Kumar, S. and Hedges, S.B. (1998) A molecular timescale for vertebrate evolution. Nature, 392, 917-20. 25. Miller, I.J. and Bieker, J.J. (1993) A novel, erythroid cell-specific murine transcription factor that binds to the CACCC element and is related to the Kruppel family of nuclear proteins. Mol Cell Biol, 13, 2776-86. 26. Wang, R. and Brattain, M.G. (2007) The maximal size of protein to diffuse through the nuclear pore is larger than 60kDa. FEBS Lett, 581, 3164-70. 27. Moore, T. and Haig, D. (1991) Genomic imprinting in mammalian development: a parental tug-of-war. Trends Genet, 7, 45-9. 28. Takada, S., Tevendale, M., Baker, J., Georgiades, P., Campbell, E., Freeman, T., Johnson, M.H., Paulsen, M. and Ferguson-Smith, A.C. (2000) Delta-like and gtl2 are reciprocally expressed, differentially methylated linked imprinted genes on mouse chromosome 12. Curr Biol, 10, 1135-8. 29. Liu, J., Yu, S., Litman, D., Chen, W. and Weinstein, L.S. (2000) Identification of a methylation imprint mark within the mouse Gnas locus. Mol Cell Biol, 20, 5808-17. 30. Du, M., Beatty, L.G., Zhou, W., Lew, J., Schoenherr, C., Weksberg, R. and Sadowski, P.D. (2003) Insulator and silencer sequences in the imprinted region of human chromosome 11p15.5. Hum Mol Genet, 12, 1927-39. 31. Zhang, W., Walker, E., Tamplin, O.J., Rossant, J., Stanford, W.L. and Hughes, T.R. (2006) Zfp206 regulates ES cell gene expression and differentiation. Nucleic Acids Res, 34, 4780-90. 168

32. Walter, J. and Paulsen, M. (2003) The potential role of gene duplications in the evolution of imprinting mechanisms. Hum Mol Genet, 12 Spec No 2, R215-20. 33. Suzuki, S., Renfree, M.B., Pask, A.J., Shaw, G., Kobayashi, S., Kohda, T., Kaneko- Ishino, T. and Ishino, F. (2005) Genomic imprinting of IGF2, p57(KIP2) and PEG1/MEST in a marsupial, the tammar wallaby. Mech Dev, 122, 213-22. 34. Lu, J. and Wu, C.I. (2005) Weak selection revealed by the whole-genome comparison of the X chromosome and autosomes of human and chimpanzee. Proc Natl Acad Sci U S A, 102, 4063-7. 35. Dorus, S., Vallender, E.J., Evans, P.D., Anderson, J.R., Gilbert, S.L., Mahowald, M., Wyckoff, G.J., Malcom, C.M. and Lahn, B.T. (2004) Accelerated evolution of nervous system genes in the origin of Homo sapiens. Cell, 119, 1027-40. 36. Lai, C.S., Fisher, S.E., Hurst, J.A., Vargha-Khadem, F. and Monaco, A.P. (2001) A forkhead-domain gene is mutated in a severe speech and language disorder. Nature, 413, 519-23. 37. Feuk, L., Kalervo, A., Lipsanen-Nyman, M., Skaug, J., Nakabayashi, K., Finucane, B., Hartung, D., Innes, M., Kerem, B., Nowaczyk, M.J. et al. (2006) Absence of a paternally inherited FOXP2 gene in developmental verbal dyspraxia. Am J Hum Genet, 79, 965-72. 38. Bond, J., Roberts, E., Mochida, G.H., Hampshire, D.J., Scott, S., Askham, J.M., Springell, K., Mahadevan, M., Crow, Y.J., Markham, A.F. et al. (2002) ASPM is a major determinant of cerebral cortical size. Nat Genet, 32, 316-20. 39. Jackson, A.P., Eastwood, H., Bell, S.M., Adu, J., Toomes, C., Carr, I.M., Roberts, E., Hampshire, D.J., Crow, Y.J., Mighell, A.J. et al. (2002) Identification of microcephalin, a protein implicated in determining the size of the human brain. Am J Hum Genet, 71, 136-42.

169

CHAPTER VI: SUMMARY AND FUTURE DIRECTIONS

170

VI.A SUMMARY AND FUTURE DIRECTIONS

The differential expression of alleles in non-random patterns has been shown to be a complex, yet common phenomenon, where recent studies have indicated that approximately 50% of transcripts may be differentially expressed (1, 2). The findings in this thesis stress that patterns of preferential allelic expression are very context specific where these patterns can change with developmental stage, tissue, and genotype, among other factors. The mechanisms that regulate non-random differential allelic expression are diverse and can be epigenetic in nature or dependent on DNA sequence. This work sought to identify genes subject to preferential allelic expression, particularly imprinted expression, on human chromosome 7. This chromosome was selected due to its association with various disorders that show parent-of-origin effects, as well as our laboratory’s extensive and historical role in the study of this genomic region.

A recent study demonstrated that random patterns of mono-allelic expression occur on human autosomes, further stressing the need to analyze expression patterns in genes of interest (3). However, this finding adds additional complexity to the study of differential allelic expression, since it requires the analysis of expression in multiple individuals as well as comparison to parental DNA samples in order to determine if the pattern of expression is dependent on haplotype, parent-of-origin, or if it is random in nature. The screen presented in chapter II provides an effective way of distinguishing these patterns of expression through the use of inbred strains of mice. It allows for the analysis of allelic expression in various tissues, developmental stages, and genetic backgrounds. Additionally, the analysis of genes in reciprocal crosses of mice can distinguish between haplotype-specific or parent- of-origin specific expression. 171

The analysis of Pon1, a gene involved in athrosclerosis, revealed a differential pattern of allelic expression that varied with development. Although hepatic Pon1 expression increased overall throughout murine development, the expression from each individual allele increased disproportionately. Several other genes with differential patterns of allelic expression, as identified in chapter II, were examined at various time points. None presented the dynamic pattern of allelic variance seen in Pon1. However, subtle developmentally regulated variations may have been overlooked. Therefore, it is unknown if such a pattern of differential allelic expression is unique to Pon1 or if it is seen in other transcripts. Future studies examining non-random allelic expression at various developmental time points may be able to elucidate the commonality of the phenomenon.

The identification of a dynamic pattern of allelic expression has far reaching implications. It suggests that the study of allelic variance at a single developmental time point does not accurately portray differential expression. Additionally, it suggests that disease-associated alleles may not be equally expressed throughout development, which may account for variations in penetrance. Dynamic patterns of allelic variance may play a role in complex neurological diseases, among others, where neurological development and maturation may be dependent upon the proper expression of alleles at a particular developmental time point.

The analysis of murine carboxypeptidase-A4 identified a maternally expressed tissue-specific imprinted pattern of expression. This pattern of expression is conserved between humans and mice; consequently, the mechanisms which regulate the imprint are also conserved. The systematic expression analysis of known imprinted genes in additional species is a necessary exercise to understand the evolutionary basis of imprinting and the mechanisms that regulate parent-of-origin expression. Divergence in imprinted expression 172 has helped identify cis-acting regulatory elements essential for epigenetic modifications (4,

5). Consequently, the identification of imprinted expression in murine Cpa4 contributes towards the understanding of the origin and the mechanisms underlying the epigenetic regulation at the 7q32.3 cluster of genes.

Chapter V describes the identification of a novel imprinted transcription factor that is maternally expressed in all human and murine tissues examined. Its imprinted pattern of expression is dependent upon maternal methylation and its CpG island lacks differential histone modifications. Additionally, the chapter provides evidence suggesting that the gene arose through a retrotransposition event which occurred after the divergence of marsupials from eutherian mammals. Although the precise regulatory regions controlling KLF14’s parent-of-origin pattern of expression have not been elucidated, the work in this thesis sheds light on the epigenetic mechanisms directing the gene’s imprinted expression. As mentioned in chapter V, the maternally methylated CpG island associated with Mest is a candidate ICR for the region encompassing Klf14. Several attempts were made at identifying a ncRNA derived from this putative ICR in the murine locus, including strand- specific RT in the region overlapping Klf14, yet such attempts failed at identifying an antisense transcript (data not shown). Currently, there is no EST evidence for the presence of a ncRNA in the murine region, yet future studies may identify such a molecule.

The distance between Mest and Klf14 (200 kb) and the absence of a ncRNA in the region suggest that the region may be regulated by insulation (Figure VI.1). Under this model, Ctcf would be unable to bind to the maternally methylated CpG island of Mest, allowing enhancers to access the promoter region of Klf14. Maternal methylation, in addition to restrictive histone modifications, would silence Mest. In contrast, Ctcf would bind the unmethylated paternal allele, blocking interactions between enhancers and the 173

Klf14 promoter region. Additionally, hypomethylation of the CpG island would allow for the transcription of Mest. Such a model could be validated through chromatin loop assays and chromatin immunoprecipitation assays identifying Ctcf binding sites. 174

Figure VI.1

M Mest Klf14

P Mest Klf14

Figure VI.1 Model for the regulation of the Mest/Klf14 imprinted cluster. On the maternal allele (upper panel), the CpG island associated with Mest is methylated (green circles), blocking Ctcf from binding (purple oval). The absence of Ctcf on this allele allows for the interaction of distant enhancers (orange squares) with the Klf14 promoter, allowing for the transcription of this gene. On the paternal allele (lower panel), hypomethylation of the Mest CpG island allows for the binding of Ctcf. This blocks the interaction between the enhancers and Klf14. Arrows derived from the genes indicate transcriptional directions. 175

The identification of the imprinted Klf14 and Cpa4, which were positional candidates due to their location within an imprinted cluster, suggests that this method for the selection of imprinted candidates is successful. It stresses the value of allelic analysis of transcripts located within and flanking known imprinted loci. However, the development of high-throughput methodologies for genotyping and expression analyses may accelerate the identification of imprinted genes, as well as genes subject to preferential allelic expression in non-imprinted patterns. The development of these technologies for analyses in the mouse is in its very early phases when compared to the possibilities available for human studies.

Human studies are impaired by limited access to tissue samples with their corresponding

DNA and parental DNAs. However, the development of murine technologies will undoubtedly ease the identification of imprinted transcripts not located within clusters, due to possibility of assaying polymorphisms genome-wide. The development of such technologies will be greatly simplified if scientists analyzing imprinted expression worldwide were to select a single murine strain, in addition to C57BL/6, with which to perform their hybrid crosses.

As described in chapter II, Russell-Silver syndrome has been associated with human chromosome 7 due to the fact that 10% of patients with this disorder have maternal uniparental disomy for the chromosome. This has led to the hypothesis that imprinted genes on the chromosome may be involved with the disorder. However, no direct association has been made between any of the imprinted genes identified to date, including those analyzed in this thesis, and RSS. Future studies analyzing the role of imprinted genes in RSS may find such an association by performing high-density genotyping arrays on affected patients.

This analysis may uncover submicroscopic deletions or duplications on the chromosome, thereby narrowing candidate regions. Such an analysis has also been shown to be effective 176 in identifying segmental uniparental disomy by detecting stretches of homozygosity or

Mendelian inconsistencies (6). Candidate genes located within such loci can be screened for imprinted expression using the method outlined in chapter II.

Recent studies have demonstrated the commonality of genetic variability in the form of copy number variations (for review, see (7, 8)). Many of these polymorphic regions have been found to contain genes, indicating that there is variability in the number of their alleles in the human population. Whether all copies of these genes are expressed is gradually being unraveled and the importance of the analysis of gene expression in these regions is highlighted by the association of copy number variations with disease (for review, see (9, 10)). Consequently, altered levels of these polymorphic transcripts may be associated with morbidity and it is plausible that dosage sensitive genes within such regions may be regulated by non-random differential allelic expression. Thus, preferential allelic expression may gain a new facet as the phenomenon is studied in the context of genomic variability.

High-throughput analyses have estimated that 50% of transcripts, many of which are undoubtedly associated with disease, are subject to differential allelic expression in at least one individual examined in the studies. Whether such differential allelic expression represents random, imprinted, or haplotype-specific expression is not known and must be elucidated. To date, only ~100 imprinted genes have been identified, representing less than

1% of protein coding genes in the human genome. Such a number may be grossly under- estimated, due to the lack of studies analyzing gene expression at different developmental time points, tissues, and individuals. For example, there are no estimates of polymorphic imprinting, which has been shown to occur in various genes (11-13) and may represent an important factor in phenotypic variability. The work in this thesis highlights the numerous 177 patterns of expression that can occur in mammalian systems. It stresses the need to study allelic ratios when characterizing a gene’s structure and function, particularly the allelic expression of transcripts associated with disease. Such studies will help unravel the association between haplotypes or genetic variants with disease susceptibilities and contribute towards a greater understanding of transcriptional regulation. 178

VI.B REFERENCES

1. Pant, P.V., Tao, H., Beilharz, E.J., Ballinger, D.G., Cox, D.R. and Frazer, K.A. (2006) Analysis of allelic differential expression in human white blood cells. Genome Res, 16, 331-9. 2. Lo, H.S., Wang, Z., Hu, Y., Yang, H.H., Gere, S., Buetow, K.H. and Lee, M.P. (2003) Allelic variation in gene expression is common in the human genome. Genome Res, 13, 1855-62. 3. Gimelbrant, A., Hutchinson, J.N., Thompson, B.R. and Chess, A. (2007) Widespread monoallelic expression on human autosomes. Science, 318, 1136-40. 4. Okamura, K., Hagiwara-Takeuchi, Y., Li, T., Vu, T.H., Hirai, M., Hattori, M., Sakaki, Y., Hoffman, A.R. and Ito, T. (2000) Comparative genome analysis of the mouse imprinted gene impact and its nonimprinted human homolog IMPACT: toward the structural basis for species-specific imprinting. Genome Res, 10, 1878- 89. 5. Suzuki, S., Ono, R., Narita, T., Pask, A.J., Shaw, G., Wang, C., Kohda, T., Alsop, A.E., Marshall Graves, J.A., Kohara, Y. et al. (2007) Retrotransposon Silencing by DNA Methylation Can Drive Mammalian Genomic Imprinting. PLoS Genet, 3, e55. 6. Bruce, S., Leinonen, R., Lindgren, C.M., Kivinen, K., Dahlman-Wright, K., Lipsanen-Nyman, M., Hannula-Jouppi, K. and Kere, J. (2005) Global analysis of uniparental disomy using high density genotyping arrays. J Med Genet, 42, 847-51. 7. Redon, R., Ishikawa, S., Fitch, K.R., Feuk, L., Perry, G.H., Andrews, T.D., Fiegler, H., Shapero, M.H., Carson, A.R., Chen, W. et al. (2006) Global variation in copy number in the human genome. Nature, 444, 444-54. 8. Freeman, J.L., Perry, G.H., Feuk, L., Redon, R., McCarroll, S.A., Altshuler, D.M., Aburatani, H., Jones, K.W., Tyler-Smith, C., Hurles, M.E. et al. (2006) Copy number variation: new insights in genome diversity. Genome Res, 16, 949-61. 9. Pinto, D., Marshall, C., Feuk, L. and Scherer, S.W. (2007) Copy-number variation in control population cohorts. Hum Mol Genet, 16 Spec No. 2, R168-73. 10. Lupski, J.R. (2007) Genomic rearrangements and sporadic disease. Nat Genet, 39, S43-7. 179

11. Bunzel, R., Blumcke, I., Cichon, S., Normann, S., Schramm, J., Propping, P. and Nothen, M.M. (1998) Polymorphic imprinting of the serotonin-2A (5-HT2A) receptor gene in human adult brain. Mol Brain Res, 59, 90-2. 12. Jinno, Y., Yun, K., Nishiwaki, K., Kubota, T., Ogawa, O., Reeve, A.E. and Niikawa, N. (1994) Mosaic and polymorphic imprinting of the WT1 gene in humans. Nat Genet, 6, 305-9. 13. Xu, Y., Goodyer, C.G., Deal, C. and Polychronakos, C. (1993) Functional polymorphism in the parental imprinting of the human IGF2R gene. Biochem Biophys Res Commun, 197, 747-54.

180

APPENDIX I: ANALYSIS OF PARAOXONASE-1 SPLICE VARIANT

181

An in silico search for Pon1 splice variants identified an isoform lacking the paraoxonase domain (AK050119). Semi-quantitative RT-PCR using primers specific to

AK050119 revealed that it is less abundant than the major isoform of Pon1, requiring more cycles of PCR in order to observe a distinct band on an agarose gel. Pyrosequencing,

SNaPshot, and qPCR were also performed on this isoform, and a pattern of allelic expression distinct from that of Pon1 was observed (Appendix Figure 1). However, unlike

Pon1, the pyrosequencing and SNaPshot results for AK050119 were not in accordance

(Appendix Figure 2). Due to the low level of expression of the transcript, it could not be determined if the difference in measurements was due to the numerous rounds of PCR required for pyrosequencing, if the latter was accurate, or if there was variability in

AK050119 allelic expression in liver samples at the same developmental stage.

Consequently, the results were omitted from the main body of this chapter. Additionally, the biological relevance of this splice variant could not be ascertained since it lacks any enzymatic domains.

182

Appendix Figure 1

* *

A T/C A A T/C A CB-splice variant JB-splice variant

12.5dpc 15.5dpc P0 BxC

CxB

BxJ

JxB

AK050119

Appendix Figure 1. Expression pattern of AK050119. Electropherograms from sequencing of a set of liver cDNA samples from hybrid crosses at different developmental stages are shown. Results from amplification of genomic DNA are indicated on the top. The blue and red peaks indicate C and T respectively, where C is the allele inherited from BL6 in AK050119. Letters B, C, and J, refer to species BL6, CAST, and JF1, respectively, with the first letter of each cross representing the mother. The tissues in this analysis were used for SNaPshot measurements, featured in Appendix Figure 2.

183

Appendix Figure 2

Appendix Figure 2. Comparison of pyrosequencing and SNaPshot methodologies. The allelic ratios of the CAST or JF1 allele and BL6 allele for AK050119 are shown in black and grey, respectively. The developmental time point and the methodology used to measure the allelic ratio are shown on the X-axis, where P and S represent measurements from pyrosequencing and SNaPshot, respectively. Different sets of cDNA liver samples were used for each methodology. SNaPshot could not be performed on BC-12.5 and CB- 12.5 samples due to low expression levels. The data shows variability between the two methods.

184

APPENDIX II: LIST OF ABBREVIATIONS

185

KLF14: human Krüppel-like factor 14 protein

Klf14: murine Krüppel-like factor 14 protein

KLF14: human Krüppel-like factor 14 gene

Klf14: murine Krüppel-like factor 14 gene

CPA4: human carboxypeptidase-A4 protein

Cpa4: murine carboxypeptidase-A4 protein

CPA4: human carboxypeptidase-A4 gene

Cpa4: murine carboxypeptidase-A4 gene

PON1: human paraoxonase-1 protein

Pon1: murine paraoxonase-1 protein

PON1: human paraoxonase-1 gene

Pon1: murine paraoxonase-1 gene

AS: Angelman syndrome

RSS: Russell-Silver syndrome

PCR: polymerase chain reaction

RFLP: restriction fragment length polymorphism

SNP: single nucleotide polymorphism

RACE: rapid amplification of cDNA ends qPCR: quantitative polymerase chain reaction gDNA: genomic DNA

ChIP: chromatin immunoprecipitation

SSCP: single strand conformation polymorphism

186

UTR: untranslated region

pUPD: paternal uniparental disomy mUPD: maternal uniparental disomy

ICR: imprinting control region

DMR: differentially methylated region

CTCF: CCCTC-binding factor

DNMT: DNA methyl-transferase

HDAC: histone deacetylase

H3K9: lysine 9 of histone 3

H3K27: lysine 27 of histone 3

H4K20: lysine 20 of histone 4

H3K4: lysine 4 of histone 3

H3ac: acetylated histone 3

H4ac: acetylated histone 4

H3K4me3: trimethylated lysine 4 of histone 3

H3K4me2: dimethylated lysine 4 of histone 3

H3K9me3: trimethylated lysine 9 of histone 3

H3K9ac: acetylated lysine 9 of histone 3

H4K20me3: trimethylated lysine 20 of histone 4

187

JF1: JF1/Ms

CAST: CAST/Ei9

BL6: C57BL/6

JxB, BxJ, BxC, CxB: F1 hybrid offspring of JF1/Ms x C57BL/6, C57BL/6 x JF1/Ms,

C57BL/6 x CAST/Ei9, and CAST/Ei9 x C57BL/6 dpc: days post coitum

P0: post-natal day 0