SEARCH FOR FUNCTIONAL ALLELES IN THE HUMAN GENOME WITH FOCUS ON CARDIOVASCULAR DISEASE CANDIDATE GENES

DISSERTATION

Presented in Partial Fulfillment of the Requirements for

the Degree Doctor of Philosophy in the Graduate

School of The Ohio State University

By

Andrew Danner Johnson, B.S.

*****

The Ohio State University 2007

Dissertation Committee:

Professor Wolfgang Sadée, Advisor

Professor Daniel A. Janies

Professor Kirk Mykytyn

Approved by

______________________________
Advisor
Integrated Biomedical Sciences Graduate Program

Copyright by

Andrew Danner Johnson

2007

ABSTRACT

The genetic investigation of human disorders, largely through linkage mapping, has led to the discovery of candidate genes and mutations that act as risk factors for disorders with a high degree of penetrance. While twin studies have provided evidence of major genetic contributions to multifactorial diseases such as coronary artery disease, it has proven difficult to find and replicate significant genetic associations for such diseases. Recent advances in technology, throughput and understanding of widespread human genetic variation at the genomic level (e.g., the HapMap project) have allowed more genetic markers to be applied in larger studies, but we still lack a complete picture of the genetic contributions to major multifactorial diseases.

Searching for genetic variants with evidence of a direct molecular impact on the expression and function of genes vital to disease development and progression is one valid approach to this problem. There is a growing appreciation that one major class of variation acts at the level of mRNA expression. Traditional laboratory tools for studying this class of variation (e.g., reporter assays) have severe limitations, mainly that they lack the in vivo context in which the alleles are hypothesized to have a functional impact. This dissertation relies heavily on a relatively novel technique, the measurement of allelic expression imbalance (AEI) between alleles in primary human tissues. Using these measurements as phenotypic traits, we demonstrate that cis-acting alleles exerting molecular effects on mRNA expression can often be readily mapped. In the largest survey to date of AEI in primary human tissues, we find that AEI in disease candidate genes is quite common, and that the functional contributors to these expression phenotypes are often not regulatory polymorphisms but polymorphisms located within the mRNAs themselves that affect mRNA processing and function. Computational analysis of mRNA structures and genetic variation within human genomes indicates that modulation of mRNA structure by polymorphisms is likely one contributor to human phenotypic variability.

Focusing on several cardiovascular disease candidate genes, I report a number of novel findings: 1) a strong ACE AEI phenotype common in the African-American population is mapped to specific upstream regulatory alleles and is significantly associated with relevant clinical phenotypes; 2) SOD2 is subject to extremely common and extensive AEI in the human population, suggesting potential positive selection; and 3) our results call into question the strength of many previous association studies based on polymorphisms in ACE, CCL2 and NOS3, for which there is weak evidence supporting putative functional alleles.


Dedicated:

First, to Meg, my partner in all; what a long, strange and wonderful trip we are on

Second, to my family who brought me up curious

Last, to colleagues and advisors past, present and future who keep me both inspired and on track

ACKNOWLEDGMENTS

I thank my adviser, Wolfgang Sadée, for his inspiration, time and direction. He showed me many facets of asking meaningful questions and becoming a successful scientist, and he gave me the space I needed to grow while demanding that I do it;

I thank Dan Janies for frank talks from a different career perspective, collaborations both successful and published and those done out of interest, and for sharing computing resources and expertise, and teaching and travel opportunities;

I thank Kirk Mykytyn for some interesting bioinformatics questions along the way and for some darn good chili at the annual Department party;

I thank the whole lab crew: Audrey Papp, Danxin Wang, Zunyan Dai, Julia Pinsonneault, Ying Zhang and Gloria Smith for assistance with so many tiny details, and sharing moments and meetings, lunches and laughter, tea and talks and wells on 3730 plates. I fear I may never again work with such a nice group of people;

I thank Jonathan Day who gave me my start in science at Penn State after I had knocked on other doors that were closed;

I thank Helen Chamberlin who saw potential in me and acknowledged this by giving me my first start with publications;

I thank Ben Givens who laid the groundwork for me to take leadership roles in a lab setting, inadvertently set me off on computer programming and launched me into graduate school;

I thank my research collaborators outside the lab whose hard work is also presented here: David Saffen and Jeong-Eun Lim, Philip Binkley and Amanda Lesinski, Glen Cooke, Clay Marsh, Chris Baran, Julie Johnson, Yan Gong, and Taimour Langaee;

I am thankful for the support of IBGP program staff (Allan Yates, Christine Kerr, Angie Thomas, Darlene Johns) and Pharmacology staff (Sherry Ring, Gina Pace), who never faltered in helping me navigate the system and always gave me the correct answers;

I thank the Jest Jugglers of Columbus, who provided stress relief on many Thursdays and may have helped spawn some critical neural circuitry;

I thank, again, my wife Meg who lived through it and loved me all the way.

I thank the anonymous tissue donors and their families without whom this work would not have been possible.

I am also thankful for research support that contributed to this doctoral work: a predoctoral fellowship award from the American Heart Association (0515157B) and a predoctoral Distinguished University fellowship award from The Ohio State University.

Travel grants from the OSU Ray Award, the OSU Medical Center Research Day and the Pharmaceutical Sciences World Congress supported conference presentation of my research.

VITA

November 15, 1975 ...... Born – Bryn Mawr, PA

1998 ...... B.S. Biology (Vertebrate Physiology), The Pennsylvania State University

1999 – 2003 ...... Researcher, The Ohio State University

2003 – 2004 ...... Distinguished University Fellow, The Ohio State University

2005 – 2007 ...... American Heart Association Fellow, The Ohio State University

2007 ...... Distinguished University Fellow, The Ohio State University

PUBLICATIONS

Research Publications

1. F. Habib, A.D. Johnson, R. Bandschuh and D. Janies, “Large scale genotype-phenotype correlation analysis based on phylogenetic trees.” Bioinformatics, 23, 785, (2007).

2. T. Kurc, D. Janies, A.D. Johnson, S. Langella, S. Oster, S. Hastings, F. Habib, T. Camerlengo, D. Ervin, U.V. Catalyurek and J. Saltz, “An XML-based system for synthesis of data from disparate databases.” J. Am. Med. Inform. Assoc., 13, 289, (2006).

3. Y. Zhang, D. Wang, A.D. Johnson, A.C. Papp and W. Sadée, “Allelic expression imbalance of human mu-opioid receptor (OPRM1) caused by variant A118G.” J. Biol. Chem., 280, 32618, (2005).

4. D. Wang, A.D. Johnson, A. Papp, D.L. Kroetz and W. Sadée, “Multidrug resistance polypeptide 1 (MDR1, ABCB1) variant 3435C>T affects mRNA stability.” Pharmacogen. Gen., 15, 693, (2005).

5. A.D. Johnson, D. Wang and W. Sadée, “Polymorphisms affecting gene regulation and mRNA processing: Broad implications for pharmacogenetics.” Pharm. Ther., 106, 19, (2005).

6. H. McCartney, A.D. Johnson, Z. Weil and B. Givens, “Theta reset produces optimal conditions for long-term potentiation in the dentate gyrus.” Hippocampus, 14, 684, (2004).

7. A.D. Johnson, D. Fitzsimmons, J. Hagman, H.M. Chamberlin, “EGL-38 Pax regulates the ovo-related gene lin-48 during Caenorhabditis elegans organ development.” Development, 128, 2857, (2001).

FIELDS OF STUDY

Major Field: Integrated Biomedical Sciences

TABLE OF CONTENTS

Abstract

Dedication

Acknowledgments

Vita

List of Tables

List of Figures

List of Abbreviations

Chapters:

1. Introduction

2. cis-acting genetic variation in the human genome

2.1 Evidence for cis-acting effects on genes of clinical relevance
2.2 Modes of cis-acting polymorphisms and methods for discovery
2.2.1 Experimental methods for discovering cis-acting polymorphisms
2.2.2 In silico methods for discovering cis-acting polymorphisms
2.3 Review of evidence for cis-acting variation in genes of clinical relevance
2.3.1 Drug metabolizing enzymes
2.3.1.1 CYP1 family
2.3.1.2 CYP2 family
2.3.1.3 CYP3 family
2.3.1.4 Other CYPs
2.3.1.5 Other classes of drug metabolizing enzymes
2.3.2 Drug transporters
2.3.3 Drug targets and receptors
2.3.4 cis-acting polymorphisms in relevant trans factors
2.4 Summary

3. Survey of allelic expression in human target tissues

3.1 Survey of allelic expression in human target tissues
3.2 Method for allelic expression survey in human target tissues
3.2.1 Tissue sources and processing
3.2.2 Design, assay and analysis of allelic expression imbalance
3.2.3 Computational analysis of mRNA folding
3.3 Results of allelic expression survey in human target tissues
3.3.1 Allelic expression results in 42 candidate genes
3.3.2 Results in cardiovascular disease candidate genes
3.3.3 Putative functional effects of variants on mRNA folding
3.4 Summary

4. Human ACE (angiotensin I-converting enzyme 1)

4.1 Method of investigation of human ACE
4.2 Results of investigation of human ACE
4.3 Summary

5. Human CCL2 (chemokine (C-C motif) ligand 2)

5.1 Method of investigation of human CCL2
5.2 Results of investigation of human CCL2
5.3 Summary

6. Influence of human polymorphisms on RNA structures

6.1 Method of study on the influence of human polymorphisms on RNA structures
6.2 Results of study on the influence of human polymorphisms on RNA structures
6.3 Summary

Bibliography

LIST OF TABLES

Table

2.1 Polymorphisms affecting gene regulation and mRNA processing

2.2 Web references for tools and databases useful in finding and characterizing functional cis-regulatory polymorphisms

2.3 Literature and web references for gene and allele nomenclature of families with pharmacogenetic importance

3.1 Oligonucleotide primers used for PCR amplification and SNaPshot primer extension reactions for 42 genes surveyed, with reverse PCR primers also used in gene-specific cDNA synthesis

3.2 Information for 42 genes surveyed for allelic expression imbalance in human tissues

3.3 Genes in the survey showing highly significant allelic expression imbalance (minimally ±20.5 in at least one sample)

4.1 Forward and reverse oligonucleotide primers used for PCR amplification and resequencing of human ACE

4.2 Baseline characteristics of INVEST case-control cohort

4.3 INVEST primary outcome odds ratios by ACE alleles

4.4 Statistical association of ACE polymorphisms with clinical phenotypes in the INVEST cohort

4.5 INVEST rs7213516 genotype effects were modified by drug therapy

4.6 Allele frequencies for polymorphisms in the INVEST cohort

4.7 ACE clinical genotyping results from a CAD patient cohort of 117 African-Americans at OSU hospitals

5.1 Oligonucleotides used in the chemokine study, with matches to specific alleles, intentional mismatches (GC clamps), and exon-spanning dinucleotides underlined

5.2 CCL2 and CSF1 genotype and total expression level results show no significant associations

5.3 Allele frequency information for CCL2 SNP rs1024611 across studies included in meta-analysis

6.1 Reported RNA structure and polymorphism differences in the literature compared with results from the current work

6.2 Comparative analysis of structures from major and minor alleles of 22,785 validated human SNPs

LIST OF FIGURES

Figure

2.1 Mechanisms of human genetic variability

3.1 Schematic of main types of functional polymorphisms

3.2 Representative allelic expression imbalances from 4 genes in the survey

3.3 Allelic expression imbalance results for 12 cardiovascular candidate genes (all individual human heart samples’ results displayed)

3.4 In silico mRNA folding results for mature OPRM1 sequence

4.1 Human ACE gene structure and relevant genetic polymorphisms (chromosome 17q23.3)

4.2 Allelic expression imbalance of ACE in heart tissue samples

4.3 ACE allele effects in reporter gene assays

4.4 Kaplan-Meier curve for rs7213516 GG homozygotes vs. A carriers in the INVEST cohort

5.1 Human CCL2 gene structure and relevant genetic polymorphisms (chromosome 17q12)

5.2 CSF1 stimulation potently upregulates CCL2 expression in human MDMs

5.3 CCL2 allelic expression imbalance results in human heart tissues segregated by rs1024611 genotype

5.4 CCL2 allelic expression imbalance results in human MDMs segregated by rs1024611 genotype

5.5 CSF1 allelic expression imbalance results in human MDMs

5.6 Meta-analysis of CCL2 SNP rs1024611 including only diseases where multiple studies have been conducted

5.7 Meta-analysis of CCL2 SNP rs1024611 including only diseases where one study has been conducted

6.1 Examples of information types and data flow for RNA SNP sequences in the study

6.2 Example SNP structure results for rs10778

6.3 Distributions of thermodynamic energies of mRNAs across the human genome and predicted effects of SNPs on thermodynamic energies

6.4 Predicted effects of SNPs in the human genome on mRNA structure organized by SNP type

6.5 Effects of varied flanking sequence sizes on mRNA structure predictions in a human genome-wide dataset

LIST OF ABBREVIATIONS

ACE Angiotensin I-converting enzyme 1

AEI allelic expression imbalance

AIM ancestry informative marker

BAEC bovine aortic endothelial cell

BMI body mass index

CABG coronary artery bypass graft

CAD coronary artery disease

CCL2 Chemokine (C-C motif) ligand 2

CSF1 Colony stimulating factor 1

CVA cerebrovascular accident

DMSO dimethylsulfoxide

DNA deoxyribonucleic acid

ESE exonic splicing enhancer

EST expressed sequence tag

FAM 6-carboxyfluorescein

HEK human embryonic kidney

HEX 6-carboxy-2’,4,4’,5’,7,7’-hexachlorofluorescein

HCTZ hydrochlorothiazide

IRB Institutional Review Board

IRE iron regulatory element

IUPAC International Union of Pure and Applied Chemistry

LD linkage disequilibrium

MAF minor allele frequency

MDM monocyte-derived macrophage

MFE minimum free energy

MI myocardial infarction

mRNA messenger ribonucleic acid

PCR polymerase chain reaction

PMA phorbol 12-myristate 13-acetate

RNA ribonucleic acid

RTPCR real-time polymerase chain reaction

RT-PCR reverse transcriptase polymerase chain reaction

SECIS selenocysteine insertion sequence

SNP single nucleotide polymorphism

SOD2 Superoxide dismutase 2, mitochondrial

TIA transient ischemic attack

UTR untranslated region

CHAPTER 1

INTRODUCTION

The central problem of genetics research originally lay in tracking the heritability of traits within groups of organisms. The realization that DNA molecules are the template for the inheritance of organismal characteristics did not change the goal of genetics but instead provided more direct molecular material for study (Watson & Crick 1953). Over the following decades, with the development of first protein (Sanger & Tuppy 1951), then RNA and then DNA sequencing technology, substantial evidence accumulated that differences in DNA sequence between and within organisms account for variation in phenotypes, beginning with the recognition of large-scale chromosomal aberrations and eventually proceeding to finer-scale differences and genetic maps based on microsatellite repeats and single nucleotide polymorphisms (hereafter referred to as SNP in singular or plural) (de Martinville et al., 1982).

Although sequence variation between organisms and within wild, domesticated and model organisms is of much scientific interest and importance, a major focus for modern genetics is locating and characterizing variation in DNA that, in part, accounts for inter-individual differences among humans in their susceptibility to diseases.

Positional cloning techniques and genetic maps that are sparse by current standards were successful in mapping major genes that contribute to Mendelian disorders such as cystic fibrosis and Long QT syndrome, where causative alleles exhibit a high degree of penetrance (Collins et al., 1987). An elusive and challenging problem has been the search for and characterization of less penetrant genetic variants that contribute to multi-factorial disease. Multi-factorial diseases such as coronary artery disease (CAD) and cancer can develop through an interplay between genetic factors and environmental factors, such as exposure to toxins. Taking CAD as an example, a number of acknowledged risk factors with clinical utility have been defined, including gender, obesity, body mass index (BMI), smoking history, history of physical activity, blood pressure, levels and history of diabetes incidence. Most of these factors are shaped by personal lifestyle choices and environmental influences, but each of them in turn may have a significant component that is influenced by inter-individual genetic differences. The complex multivariable nature of gene and environment influences in multi-factorial disease thus makes the dissection of genetic contributors extremely challenging.

An approach often applied to studying genetic factors in multi-factorial disease (the genetic association study) is to collect DNA from a population of cases (individuals meeting a clinical definition of disease) and controls (individuals who do not display signs of the disease). Ideally these sample collections are matched for demographic variables not considered in the study (e.g., mean age and gender), and the sample size is calculated to provide adequate statistical power to detect significant effects given the number of genetic markers typed. Two distinct but overlapping types of study are undertaken in genetic association studies. The first approach is generally referred to as linkage mapping, whereby markers dispersed throughout the genome are typed with the understanding that nearby markers are often linked and co-segregate over generations. The selection of markers for these studies tends toward an unbiased representation of the genome, spacing markers at relatively even distances. Thus, a marker or grouped set of markers may associate with the disease, indicating a candidate region of the genome. Often investigators return to this region and perform a finer-scale (fine mapping) study of association with the disease in an attempt to further define the causal region or variant. More recently, this general approach has begun to be applied on a much larger scale owing to technological advances that make it feasible to study more than 500,000 markers at tractable cost in a whole genome association study. The second major approach to genetic association studies relies on a priori knowledge of candidate genes to guide the selection of genes of interest for the disease. In general this approach involves the study of a denser set of genetic markers in anywhere from one to tens of genes that are suspected to play a role in disease etiology based on prior research, including previous linkage studies. Candidate gene studies have some substantial advantages owing to their narrower focus, including a lesser need for correction for multiple hypothesis testing. Genetic association studies of both types have been widely applied to many traits and have resulted in the successful identification of a handful of common genetic variants that contribute to multi-factorial diseases (e.g., ApoE4 in familial risk for Alzheimer’s disease; Strittmatter et al., 1993). However, a major portion of genetic association studies fail to make significant findings, or their findings are not consistently replicated in further studies in distinct populations.
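For a single biallelic marker, the case-control comparison described above is often summarized as a 2x2 table of allele counts. The sketch below is a minimal illustration, assuming allele (not genotype) counts and a Pearson chi-square test without continuity correction; it computes the odds ratio and p-value and contrasts the nominal threshold with a Bonferroni-corrected threshold for a hypothetical 500,000-marker genome-wide scan. The function names and counts are illustrative and are not data from this work.

```python
import math


def allelic_association(case_counts, control_counts):
    """Allelic 2x2 association test for one biallelic SNP.

    case_counts, control_counts: (minor_allele_count, major_allele_count).
    Returns (odds_ratio, chi_square, p_value) for a 1-df Pearson test
    without continuity correction.
    """
    a, b = case_counts      # cases: minor, major allele counts
    c, d = control_counts   # controls: minor, major allele counts
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)

    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi_square = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if allele and case status were independent
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (observed[i][j] - expected) ** 2 / expected

    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return odds_ratio, chi_square, p_value


def bonferroni_threshold(alpha, n_tests):
    """Per-marker significance level that controls family-wise error at alpha."""
    return alpha / n_tests


# Hypothetical counts: 500 cases and 500 controls give 1,000 alleles per group
odds_ratio, chi_square, p_value = allelic_association((300, 700), (250, 750))
print(f"OR={odds_ratio:.2f}  chi2={chi_square:.2f}  p={p_value:.4f}")
print("nominally significant:", p_value < 0.05)
print("genome-wide significant:", p_value < bonferroni_threshold(0.05, 500_000))
```

With these hypothetical counts the marker reaches nominal significance (p of roughly 0.01) but falls far short of the Bonferroni threshold of 1e-7, which illustrates the multiple-testing burden that narrower candidate gene studies partly avoid.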

The design of genetic association studies is fraught with potential problems, including marker selection, insufficient statistical power, genotyping error, population admixture, inadequate definition of phenotypic categories, and lack of an appropriate replication cohort. These issues, and the general nature of the goal (i.e., finding genetic contributions that may explain only a few percent of the variation in a complex, multi-factorial phenotype), may largely account for the relatively few truly successful, replicated findings to date. It is the position of this work that another considerable limitation of genetic association studies is the failure to adequately wrestle with the identification and characterization of true functional variants. Two main contributors to this phenomenon are 1) reliance on linkage disequilibrium (LD) as a surrogate for the identification of true functional variants, and 2) an underlying assumption that the vast majority of functional variants are those that result in a change in protein sequence. Given a strong statistical association with a marker that has no apparent biological explanation, researchers often conclude that the association is due to linkage disequilibrium with one or more functional variants, but further effort to define these variants is often not undertaken or is reported to be fruitless, and future studies often rely on the originally associated marker alone. Beyond the failure to correctly identify the functional variants themselves, and thereby potentially reveal novel insights into biological function, this approach leads to another major dilemma: patterns of linkage disequilibrium among markers vary dramatically across the worldwide human population. Thus, while a marker may be associated with a disease and in strong LD with the functional variants in one population, a breakdown in LD with the functional variants in another population may lead to failure to replicate findings, confusion regarding the association, and a marker with poor clinical utility.
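The dependence on LD described above can be made concrete with the standard pairwise measures D, D' and r². The sketch below assumes that haplotype and allele frequencies are already known (in practice they are usually estimated from unphased genotypes, a step omitted here) and uses two hypothetical populations in which the same marker tags a functional allele perfectly in one and poorly in the other. It also reports 1/r², following the commonly used approximation that an indirect test through a marker has roughly the power of a direct test on N·r² subjects.

```python
def ld_measures(p_ab, p_a, p_b):
    """Pairwise LD between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a, p_b: population frequencies of alleles A and B
    Returns (D, |D'|, r^2).
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    d = p_ab - p_a * p_b
    # The maximum attainable |D| depends on the sign of D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r_squared = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared


# Hypothetical marker (A) and functional allele (B), each at 30% frequency,
# in two populations that differ only in how often A and B share a haplotype.
populations = {"population 1": 0.30, "population 2": 0.12}
for name, p_ab in populations.items():
    d, d_prime, r2 = ld_measures(p_ab, 0.30, 0.30)
    print(f"{name}: D={d:.3f}  D'={d_prime:.2f}  r2={r2:.3f}  "
          f"relative sample size ~ {1 / r2:.0f}x")
```

In this toy case the marker is a perfect proxy in the first population (r² = 1), whereas in the second r² is about 0.02, so a study there would need on the order of 49 times as many subjects to detect the same functional effect through the marker.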

A widely held view among genetic researchers is that variants that alter protein sequence are those most likely to impact biologically relevant traits. For example, a number of recent large-scale genetic studies have treated all variants that did not result in a protein sequence change as neutral and discarded them from further analysis (e.g., Sjoblom et al., 2006). This bias is adopted for a number of reasons: 1) most studies choose to focus on gene-centric regions (protein coding portions of genes), which make up less than 1% of the genome sequence, because these are the regions most directly attributable to biological activity and the functions of non-protein coding regions are generally less well understood; 2) a variant resulting in a change in protein sequence or protein truncation produces this change in all tissues where the gene is expressed, largely reducing the need to grapple with issues of tissue-specific regulation with regard to the disease of interest; 3) it is more cost-effective to scan only the protein coding (exonic) regions of DNA than to include intronic and inter-genic regions, which span larger regions and where the payoff is also expected to be lower; and 4) many of the well-known examples of gene-disease interactions, primarily Mendelian disorders, involve protein-changing variants. Nonetheless, careful literature reviews reveal many examples of non-protein altering variants that have a direct molecular impact on biological function and are even causal for disease (Rockman & Wray 2002; Johnson et al., 2005). Moreover, two recent genome-wide association studies for coronary heart disease have found strong genetic associations with markers that are not in known gene regions (Helgadottir et al., 2007; McPherson et al., 2007). A major limiting factor in the discovery of this type of variant is that observation of biological effects may be extremely context-specific (e.g., tissue dependent, time dependent, disease state dependent), and adequate tools to study their impact in a natural context have largely been lacking. The advent of genomic scale techniques (e.g., shotgun sequencing, EST mapping, array hybridization) and bioinformatics databases has been critical to a number of observations that highlight the likely importance of non-protein altering genetic variants: 1) human genes show a high degree of complexity in alternative splicing and translation; 2) gene expression is highly variable and regulated among tissues and contexts, including developmental stages and disease states; 3) non-protein coding regions of the genome show remarkably high levels of sequence conservation across species, indicating important biological functions; 4) RNA molecules have complex functional roles aside from the dogmatic role of coding for protein sequences; 5) human mRNA untranslated regions are on average longer than those of other vertebrates, suggesting they may play important roles in regulation or mRNA stability; and 6) at a genomic level, the largest differences between humans and “lower” organisms, which presumably account for major biological differences, lie mostly not in gene number and protein sequence but in alternative regulation of genes and differences in inter-genic sequences.
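The protein-centric filtering described above reduces to a codon-level classification of each coding variant. The sketch below shows how such a filter labels single-nucleotide changes, assuming a coding sequence that starts at the first base of the start codon and the standard genetic code; the example sequence is hypothetical. Under the filtering strategy criticized here, every variant labeled synonymous would be discarded, even though synonymous variants such as MDR1 3435C>T have been shown to affect mRNA stability (Wang et al., 2005).

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snv(cds, pos, alt):
    """Classify a single-nucleotide substitution in a coding sequence.

    cds: coding sequence beginning at the first base of the start codon
    pos: 0-based position of the substituted base within cds
    alt: the alternative base (A, C, G or T)
    """
    cds, alt = cds.upper(), alt.upper()
    codon_start = pos - pos % 3
    ref_codon = cds[codon_start:codon_start + 3]
    if len(ref_codon) != 3:
        raise ValueError("position lies in an incomplete terminal codon")
    offset = pos % 3
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-loss"
    return "nonsynonymous"


# Hypothetical coding sequence: ATG GCT TGG TAA (Met-Ala-Trp-stop)
cds = "ATGGCTTGGTAA"
for pos, alt in [(5, "C"), (3, "A"), (8, "A"), (9, "C")]:
    print(pos, alt, classify_coding_snv(cds, pos, alt))
```

Here the third-position change GCT to GCC is synonymous (alanine in both cases), GCT to ACT is nonsynonymous, TGG to TGA is nonsense, and TAA to CAA removes the stop codon.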

Standard approaches for studying the effects of genetically distinct alleles on gene expression are typically either indirect in their measurement or lack a fair approximation of true in vivo context. Approaches that are still widely applied include measurement of gene expression by DNA microarray or real-time PCR (RTPCR), or by in vitro reporter gene assay, followed by association with genotype. Both types of method are subject to significant variability due to experimental conditions and other trans-acting influences. Reporter genes lack both the full gene sequence context and the natural cellular context of the physiologically relevant tissue. Differences between individual samples in tissue and RNA quality often affect RTPCR and microarray results for genes of interest to a much greater extent than the allelic effects under study. Thus, these approaches are likely to yield a high rate of false positives that are truly due to differences in trans-acting variation, and a high rate of false negatives due to their poor discriminative capability. An alternative methodology involves directly measuring allele-specific RNA levels, an approach first adopted in the successful study of imprinting of PGK-1 and HPRT in heterozygous mouse embryos (Singer-Sam et al., 1992). This method uses allele-specific primers differing by a single nucleotide at heterozygous genetic marker positions to quantitate the RNA molecules originating from each of the two chromosomes. Thus, a genetic difference leading to a difference in gene expression is readily and directly detected. Application of this type of analysis of allelic expression imbalance (AEI) has led to the detection of significant inter-individual differences in gene expression and provided support for functional variants that may impact human disease (Wang et al., 2005; Zhang et al., 2005; Lim et al., 2006; Pinsonneault et al., 2006; Cirulli & Goldstein, 2007; Lim et al., 2007). Since its original conception, the method has been adapted to modern equipment allowing greater throughput (Yan et al., 2002; Pinsonneault et al., 2004), and a number of analytical extensions have been introduced, including: 1) the use of genomic DNA as an internal standard for technical variability, 2) internal validation using compound heterozygotes, 3) multiplexing when necessary, and 4) the use of gene-specific primers at the 3’ end of genes during reverse transcription to improve recovery and accurate quantitation of lower-copy messages. Here we extensively apply the AEI methodology, along with other techniques, to the study of cardiovascular candidate genes in primary human tissues.
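The genomic DNA normalization and compound-heterozygote validation listed above can be sketched as follows. Because genomic DNA from a heterozygote carries the two alleles at a 1:1 ratio, any deviation in its measured allelic ratio reflects assay bias, and dividing the cDNA ratio by the DNA ratio (subtracting on the log2 scale) removes that bias. The sketch assumes replicate allele-specific signals (for example, primer-extension peak areas) for each allele and averages on the log2 scale; the imbalance cut-off, the agreement tolerance, the function names and the example numbers are hypothetical illustrations, not the thresholds used in this work.

```python
import math
from statistics import mean


def log2_allelic_ratio(replicates):
    """Mean log2(allele 1 / allele 2) over replicate (signal1, signal2) pairs."""
    return mean(math.log2(s1 / s2) for s1, s2 in replicates)


def normalized_aei(cdna_replicates, gdna_replicates):
    """log2 AEI ratio: cDNA allelic ratio corrected by the genomic DNA ratio.

    Genomic DNA of a heterozygote contains both alleles 1:1, so its measured
    ratio estimates the technical bias of the allele-specific assay.
    """
    return log2_allelic_ratio(cdna_replicates) - log2_allelic_ratio(gdna_replicates)


def is_imbalanced(log2_aei, cutoff=0.5):
    """Flag imbalance beyond a hypothetical log2 cut-off (0.5 is about 1.41-fold)."""
    return abs(log2_aei) >= cutoff


def markers_concordant(log2_aei_1, log2_aei_2, tolerance=0.3):
    """Compound-heterozygote check: two markers in one transcript should show the
    same magnitude of imbalance (the sign depends on phase). Tolerance is hypothetical."""
    return abs(abs(log2_aei_1) - abs(log2_aei_2)) <= tolerance


# Hypothetical sample: the assay slightly favours allele 1 even in genomic DNA
gdna = [(1100, 1000), (1080, 1000)]
cdna = [(2200, 1000), (2160, 1000)]
aei = normalized_aei(cdna, gdna)
print(f"log2 AEI = {aei:.2f} (allele 1 / allele 2 = {2 ** aei:.2f}-fold)")
print("imbalanced:", is_imbalanced(aei))
print("concordant with a second marker at -0.9:", markers_concordant(aei, -0.9))
```

In this example the raw cDNA ratio of about 2.2 overstates the imbalance; after correcting for the 1.1-fold bias seen in genomic DNA, the sample shows a 2-fold (log2 = 1) excess of allele 1.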

Cardiovascular disease, including CAD and stroke, and related fatal events such as myocardial infarction are leading causes of death in the developed world. Despite decreases in cardiovascular-related deaths, cardiovascular disease remains a major health care burden (Mathers & Loncar 2006). The heritable component of cardiovascular disease has long been recognized, as seen in early twin studies (e.g., Marenberg et al., 1994), and a combined list of candidate genes gleaned from the literature and the major cardiovascular Programs for Genomic Applications would contain more than 650 entries. Despite the focus of genetic association studies on many of these candidate genes, there are still surprisingly few well-validated and clinically applicable genetic markers for cardiovascular disease.

The major goal of this doctoral thesis is to explore the presence and frequency of genetic alleles influencing RNA expression in a number of major cardiovascular disease candidate genes, and to locate and characterize the functional alleles leading to the observed differences. Chapter 2 presents an extensive overview of the literature regarding human genetic variation acting at the RNA level. Chapter 3 provides experimental detail and results of a survey of more than 40 candidate genes for AEI in human tissues. Chapters 4 and 5 focus on specific cardiovascular candidate genes that were the subject of more detailed study. Given my finding that AEI is common in human tissues and attributable in some cases to both synonymous (Wang et al., 2005) and nonsynonymous (Zhang et al., 2005) variation acting at the mRNA level, I used a bioinformatics approach to investigate whether differences in mRNA structure might account for and predict functional variation in the human genome, the subject of Chapter 6. Summary conclusions and discussion are found at the end of each chapter.

CHAPTER 2

cis-acting genetic variation in the human genome

Regulation at the level of transcription initiation and RNA processing defines downstream biological effects. Such regulation occurs in cis, directly affecting the regulated gene, but it can also act in trans by altering the activity of downstream genes (see Fig. 2.1). Significant interindividual differences in gene expression patterns are common and may result from both environmental factors and cis- or trans-mediated genetic effects (Singer-Sam et al., 1992; Enard et al., 2002; Whitney et al., 2003; Pastinen & Hudson 2004). There is growing evidence for abundant polymorphisms in cis-acting sequences that influence gene expression (Rockman & Wray 2002; Yan et al., 2002; Bray et al., 2003; Lo et al., 2003) and indication that a significant portion of functional polymorphisms affect cis-acting regulatory elements (Stamatoyannopoulos 2004; Wittkopp et al., 2004). Identifying the functional alleles that account for inter-individual differences remains difficult (Ioannidis 2003; Page et al., 2003; Sun et al., 2004). The genetic components of complex inter-individual differences may require resolution of multiple modest variations in genotype which collectively yield a recognizable phenotype, such as disease susceptibility or drug response.


Modes of human genetic variability involving cis- and trans-acting polymorphisms. If cis-acting polymorphisms alter signaling or transcription factor activity, multiple trans-acting changes ensue. Epigenetic changes can mimic cis-acting polymorphisms. Lastly, epistatic effects (multiple interacting polymorphisms) are likely to play a role as well. Not shown are regulatory effects exerted by small RNAs, also subject to genetic variability.

Figure 2.1: Mechanisms of human genetic variability

Phenotypic differences can arise from genetic polymorphisms acting in cis, either by changing the protein coding sequence or at the level of RNA (Day & Tuite 1998): by affecting transcription (activation or inhibition through regulatory sites or the structure of regulatory elements), mRNA processing, pre-mRNA splicing, exonic splicing enhancers (ESEs), exon skipping (Cartegni et al., 2003), mRNA stability (Sheets et al., 1990; Conne et al., 2000; Di Paola et al., 2002; Tebo et al., 2003), mRNA trafficking, or regulatory RNAs (see Fig. 2.1). The most commonly studied polymorphisms, nonsynonymous changes that alter amino acid coding, appear in many cases insufficient to account for interindividual differences in disease etiology and response to therapies. Further, it is estimated that cis-regulatory functional polymorphisms in the human genome outnumber those that alter protein sequence, and that the bulk of regulatory polymorphisms remain to be discovered (Ng & Henikoff 2002; Rockman & Wray 2002; Stamatoyannopoulos 2004; Yan & Zhou 2004). On the other hand, genome-wide linkage analysis with mRNA expression as the quantitative trait indicates that interindividual differences in mRNA profiles appear to be largely caused by trans-acting factors (Morley et al., 2004). These statements are not incompatible, since a single cis-acting polymorphism in a transcription factor or receptor could affect the expression of numerous other genes (see Fig. 2.1). Therefore, to understand the consequences of genetic variation, we must first determine whether interindividual differences in a protein’s activity are caused by polymorphisms in cis or in trans (or both), or by environmental factors. If in cis, we must then find the functional polymorphism(s) in the candidate gene that can account for the observed variation, and any epistatic interactions among them if several are present. If in trans, we search for polymorphisms in trans-acting factors (e.g., transcription factors). Lastly, we must also consider epigenetic factors such as methylation, imprinting, and chromatin structure modulation, which can be transmitted through the germline or observed in somatic cells without alterations in the primary DNA sequence (Grewal & Moazed 2003; Yan & Zhou 2004).

While many studies have addressed cis-regulatory variation, it is likely that a majority of functional variants are yet to be discovered. Novel techniques now enable broad investigation of this type of variation, which is likely to contribute substantially to knowledge of phenotypic variability of pharmacogenetic relevance. Previous extensive studies on cis-regulatory variations affecting disease susceptibility (e.g., Loktionov 2004) inform our thinking about functional variations with pharmacological and pharmacogenetic implications. Here, I review advances in the discovery of cis-regulatory variations within genes encoding drug-metabolizing enzymes, drug transporters, and drug targets and receptors (see Table 2.1).

Gene*Allele | Description | Functional reports | Reference
CYP1A2*1C | Enhancer (-3860G>A) | ↓ enzyme activity | Nakajima 1999
CYP1A2*1F | Intron 1 (-163C>A) | ↑ enzyme induction | Chida 1999, Sachse 1999, Shimoda 2002, Nordmark 2002
CYP1A2*1K | Intron 1 | Disrupt Ets BS, ↓ RNA, metabolism | Aklillu 2003
CYP1A2*7 | Intron 6 splice donor SNP (3534G>A) | PM phenotype (single individual) | Allorge 2003
CYP2A6*9 | TATA box | ↓ RNA, protein, enzyme activity | Pitarque 2001, Kiyotani 2003, Yoshida 2003
CYP2A6*1D | Enhancer (-1013A>G) | ↓ transcription (reporter assay) | Pitarque 2004
CYP2A6*1H,J | Enhancer (-745A>G) | Disrupt NF-Y BS, ↓ transcription | von Richter 2004
CYP2A6*12 | Intron 1 2A6/7 crossover | ↓ enzyme activity in vitro and in vivo | Oscarson 2002
CYP2B6*9 | Splice variant | Skip exons 4-6 | Lamba 2003
CYP2B6*1G | Promoter (-750C>T) | ↓ RNA | Lamba 2003
CYP2B6*1B | Enhancer (-2320T>C) | ↓ protein in Caucasian females | Lamba 2003
CYP2B6*1C | ESE synonymous SNP | ↓ protein in Caucasian females | Lamba 2003
CYP2C9*6 | Frameshift | Null, severe toxicity | Kidd 2001
CYP2C19*2 | Exon 5 splice variant | PTC, PM phenotype | de Morais 1994a
CYP2C19*3 | Exon 4 premature stop | PTC, PM phenotype | de Morais 1994b
CYP2C19*4 | Initiation codon | Translation ablation, PM phenotype | Ferguson 1998
CYP2C19*7 | Intron 5 splice donor SNP (IVS5+2T>A) | PM phenotype | Ibeanu 1999
CYP2D6*4A-L | Intron 3 splice variant | PTC, PM phenotype | Kagimoto 1990
CYP2D6*11 | Intron 1 splice acceptor | PTC, PM phenotype | Marez 1995
CYP2D6*41 | Promoter | IM phenotype, ↑ expression | Lovlie 2001, Zanger 2001, Gaedigk 2003
CYP2D6 | Intron 6 (2988G>A) | IM phenotype prediction | Raimundo 2004
CYP2D6*44 | Intron 6 splice donor SNP | ↓ enzyme activity | Yamazaki 2003
CYP2D7 | 138delT | Pseudogene ORF, expression | Pai 2004
CYP2E1*1D | Enhancer VNTR | ↔ transcription, ↑ enzyme induction | McCarver 1998, Hu 1999
CYP2J2*7 | SP-1 BS (-76G>T) | Disrupt SP-1 BS, ↓ transcription | King 2002, Spiecker 2004
CYP3A4*1B | Proximal promoter (-392A>G) | Disrupt nifedipine-specific repressor | Westlind 1999, Spurdle 2002, Amirimani 2003, Floyd 2003
CYP3A4 | Far upstream enhancer (-11,129_-11,128insTGT) | Disrupt USF1 BS, ↓ expression | Matsumura 2004
CYP3A5*3 | Exon 3B splice inclusion | PTC, ↓ protein, enzyme activity | Kuehl 2001, Hustert 2001
CYP3A5*6 | Exon 7 (14690G>A) | Splice defect, exon 7 deletion | Kuehl 2001, Hustert 2001
CYP3A5*7 | Exon 11 (27131 T ins) | Predicted PTC and ↓ protein | Hustert 2001
CYP4F12*v1 | Intron 1 (146 bp del) | ↓ transcription | Cauffiez 2004
CYP4F12*v2 | 9-SNP promoter allele | ↓ transcription | Cauffiez 2004
CYP7A1 | Promoter (-204A>C) | ↓ response to atorvastatin | Kajinami 2004
CYP8A1*1D-F | Promoter VNTRs | ↑ number of SP-1 BS, ↑ transcription | Chevalier 2002
UGT1A1*28 | (TA)5-8TAA repeat | ↓ transcription, protein, enzyme activity, ↑ toxicity | Bosma 1995, Iyer 2002, Fang 2004
UGT1A9*22 | Promoter (T ins) | ↑ transcription | Yamanaka 2004
TPMT*V6a | Promoter VNTRs | ↓ transcription, ↓ in vivo activity, conflicting reports | Spire-Vayron de la Moureyre 1999, Yan 2000, Alves 2000
NAT1*16 | 3’UTR AAA ins + C>A | ↓ protein, ↓ in vitro activity, disrupt predicted RNA structure | de Leon 2000
ADH4 | Promoter (-75A>C) | ↓ transcription | Edenberg 1999, Iida 2002
ABCB1 | Synonymous (3435C>T) | ↓ RNA, protein, drug transport activity | Hoffmeyer 2000, Sakaeda 2001, Wang 2005
ABCB1 | Promoter haplotypes | ↑ transcription | Takane 2004
SLC6A4 | 5’HTTLPR (14 repeats) | ↓ RNA, protein, transport activity | Heils 1996, Heinz 2000, Hranilovic 2004
SLC6A4 | 3’UTR polyA | No quantitative assay | Battersby 1999
SLCO1B1*17 | Upstream promoter | ↑ pravastatin clearance | Niemi 2004
TSER*3 | Promoter VNTR | ↑ RNA, poor 5-FU treatment outcome | Horie 1995, Villafranca 2001, McLeod 2002
TSER*3RG | SNP in 2nd VNTR | Disrupt USF-1 | Mandola 2003
TSER | 3’UTR (1494 6bp del) | ↓ RNA, ↓ stability, ↓ intratumoral protein | Ulrich 2000, Mandola 2004
HTR2A | Synonymous (102C>T) | Conflicting reports on functionality | Arranz 1998, Bray 2004
HMGCR (SNP3) | Intron 5 | ↓ response to pravastatin | Chasman 2004
HMGCR (SNP29) | Intron 15 | ↓ response to pravastatin | Chasman 2004
MMP3 | Promoter (-1171 5A>6A) | ↓ expression, ↓ response to pravastatin | de Maat 1999
LIPC | Promoter (-480C>T) | Disrupt USF BS, ↓ transcription, ↓ enzyme activity, ↓ response to treatments | Botma 2001, Zambon 2001
ACE | Intron 16 (287 bp ins) | ↓ response to fluvastatin | Marian 2000
PTP1B | 3’UTR (1484 ins G) | ↑ RNA, ↑ mRNA stability | Di Paola 2002
hGRβ | 3’UTR AUUUA SNP | ↑ mRNA stability, ↑ protein | Schaaf 2002

Table 2.1: Polymorphisms affecting gene regulation and mRNA processing.

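Polymorphisms are listed with gene, allele denoted after ‘*’ (if defined), the type of genetic alteration (if an allele is defined by a single SNP, the position and base change are given), reported functional observations, and related literature references. Abbreviations: BS, binding site; ESE, exonic splicing enhancer; IM, intermediate metabolizer; ORF, open reading frame; PM, poor metabolizer; PTC, premature termination codon; VNTR, variable number tandem repeat; ↑, increased; ↓, decreased; ↔, unchanged.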

2.2 Modes of cis-acting polymorphisms and methods for discovery

2.2.1. Experimental methods for discovering cis-acting polymorphisms

Measurement of sequence variants, primarily single nucleotide polymorphisms (SNPs), provides the fundamental units for linking genetic sequence to traits. Most genes harbor multiple sequence variations (e.g., SNPs, repeats, indels) showing a broad range of frequencies and varying degrees of linkage disequilibrium among them. In clinical genetic association studies aimed at pinpointing candidate genes, selecting the polymorphisms that yield maximum information is difficult. Confounding factors such as relative allele frequencies in the targeted populations, population admixture, and the effects of age and sex (Pinsonneault & Sadée 2003) account in part for the failure to replicate many association studies. Most polymorphisms are nonfunctional and thus serve as markers for functional alleles. Rather than using single polymorphisms, associations are now often made with haplotypes, blocks of linked polymorphisms that may demarcate trait-significant cis-regions of sequence. High-throughput SNP genotyping methods capable of screening thousands of SNPs in many samples, such as SNPlex, are now coming online (Wenz 2004). Such methods have been used to establish haplotype maps on a genome-wide basis, including genes involved in drug metabolism, at high marker density (Kamatani et al., 2004). Despite improvements in throughput for identifying and associating sequence variants, pinpointing the specific regulatory variants themselves remains a difficult problem. Linking polymorphisms to transcriptional regulation has traditionally employed gene reporter constructs and in vitro DNA-protein binding assays. More recently, haplotype-specific chromatin immunoprecipitation (haploChIP) has taken advantage of the relation between the amount of phosphorylated RNA polymerase II associated with each haplotype and its transcriptional activity (Knight et al., 2003). Via haploChIP, SNPs in regulatory regions can be investigated in conjunction with variation in transcription levels. However, these approaches provide incomplete pictures, because they lack the physiological and structural context of a target tissue or because they currently lack high-throughput capability.
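To make the notion of linked polymorphisms concrete, the sketch below computes the standard pairwise linkage disequilibrium measures D, D' and r² for two biallelic SNPs from one haplotype frequency and the two allele frequencies. It is a minimal illustration that assumes phased haplotype frequencies are already available (estimating them from unphased genotypes is not shown); the function name `ld_stats` and the example frequencies are hypothetical.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float) -> tuple[float, float, float]:
    """Pairwise LD between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A (SNP 1) and allele B (SNP 2)
    p_a, p_b: frequencies of alleles A and B (each strictly between 0 and 1)
    Returns (D, D', r^2).
    """
    d = p_ab - p_a * p_b  # deviation from independent assortment of the alleles
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical example: the rarer allele B (10%) always occurs on an A-carrying
# haplotype, so D' = 1 (complete LD) while r^2 stays low (about 0.11).
d, d_prime, r2 = ld_stats(p_ab=0.10, p_a=0.50, p_b=0.10)
print(f"D = {d:.3f}, D' = {d_prime:.2f}, r^2 = {r2:.3f}")
```

The contrast between D' and r² in the example is the practical point: a marker in complete LD with a functional allele (D' = 1) may still be a weak statistical proxy for it (low r²) when the two allele frequencies differ.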

More broadly, mRNA expression measured by microarrays has been combined with genome-wide linkage analysis, taking the expression level of each gene as the measured phenotype (Morley et al., 2004). Heritability of gene expression phenotypes can be explored through familial genotyping and transmission disequilibrium testing in nuclear families (Spielman & Ewens 1996) or pedigree disequilibrium testing in larger pedigrees (Martin et al., 2000). Using target tissues (such as immortalized lymphocytes) from family members, this type of analysis can distinguish between cis- and trans-acting genetic factors, and it reveals an abundance of functional genomic loci and, as expected, a preponderance of trans-acting effects (see Fig. 2.1). However, the technology suffers from low sensitivity and may therefore limit the detection of functional variants in target genes.
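The transmission disequilibrium test cited above reduces to a simple statistic. The sketch below implements the classic binary-trait form: among heterozygous parents, it compares how often allele 1 (b) versus allele 2 (c) is transmitted to affected offspring, giving a McNemar-type chi-square with one degree of freedom. The quantitative-trait extensions needed for expression phenotypes are not shown, and the transmission counts in the example are invented for illustration.

```python
import math


def tdt(b: int, c: int) -> tuple[float, float]:
    """Classic transmission disequilibrium test (McNemar form).

    b: transmissions of allele 1 from heterozygous parents to affected offspring
    c: transmissions of allele 2 from heterozygous parents to affected offspring
    Returns (chi-square statistic with 1 df, p-value).
    """
    if b + c == 0:
        raise ValueError("no informative (heterozygous) transmissions")
    chi2 = (b - c) ** 2 / (b + c)
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: allele 1 transmitted 60 times, allele 2 only 40 times.
chi2, p = tdt(60, 40)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")  # chi2 = 4.00, p = 0.0455
```

Only heterozygous parents contribute, which is why the test is robust to population admixture: each transmission is compared with the untransmitted allele from the same parent.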

An alternative approach involves the analysis of allele-specific expression in a relevant target tissue: each allele experiences its own regulation in the same cellular environment, with the other allele (for autosomal genes) serving as an internal control. As a result, the method controls for tissue conditions, trans-acting factors, and other environmental influences. Thus, SNPs in exonic and untranslated regions of the message can serve as markers for allele expression levels in individuals heterozygous for these markers. Taking the peptide transporter hPepT2 as one example, our laboratory has recently described a method for allele-specific measurement of mRNA expression through primer extension with fluorescently labeled dideoxynucleotide terminators after RT-PCR amplification (Pinsonneault et al., 2004). Significant differences in the relative abundance of each allele in mRNA from kidney tissues demonstrated the presence of functional cis-acting factors. The primer extension reaction can be multiplexed (Bray et al., 2004), making it possible to search for functional cis-acting polymorphisms in a large number of genes (Yan et al., 2002). Similar results can be achieved with methods employed on other platforms (Wojnowski & Brockmoller 2004), including matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF) (Ding et al., 2004) and allele-specific RT-PCR methodologies (Zhang et al., 2004). These techniques may be extended to unprocessed heterogeneous nuclear RNA (hnRNA) if exonic and untranslated markers are unavailable and hnRNA is sufficiently abundant in the target samples (Hirota et al., 2004).
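As an illustration of how allelic expression imbalance might be quantified from such primer-extension data, the sketch below compares the allele 1/allele 2 signal ratio measured in cDNA with the same ratio in genomic DNA from the same heterozygous sample. Because genomic DNA carries the two alleles at 1:1, dividing by the gDNA ratio corrects for unequal extension or detection efficiency of the two labeled terminators. This normalization, the log2 scale and the replicate averaging are assumptions made for illustration rather than the exact procedure of Pinsonneault et al. (2004), and the peak heights are invented.

```python
import math
from statistics import mean


def mean_ratio(peaks: list[tuple[float, float]]) -> float:
    """Average allele1/allele2 signal ratio over replicate measurements."""
    return mean(a1 / a2 for a1, a2 in peaks)


def aei_log2(cdna_peaks, gdna_peaks) -> float:
    """log2 allelic expression ratio in mRNA (as cDNA), normalized to genomic DNA.

    0 means balanced expression; +1 means allele 1 is expressed about twice as
    much as allele 2 after correcting for the assay bias seen in gDNA.
    """
    return math.log2(mean_ratio(cdna_peaks) / mean_ratio(gdna_peaks))


# Invented peak heights (allele 1, allele 2) from duplicate extension reactions.
cdna = [(2000.0, 1000.0), (1800.0, 900.0)]
gdna = [(1100.0, 1000.0), (900.0, 1000.0)]
print(round(aei_log2(cdna, gdna), 3))  # 1.0
```

Any cut-off for calling a sample "imbalanced" would have to come from the assay's own reproducibility (for example, the spread of gDNA ratios), so none is hard-coded here.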

Taken together, these approaches allow rapid determination of the extent of cis- or trans-acting genetic variation in a gene and the heritability of that component. Further determination of the functional alleles is challenging because regulatory regions can span large segments of genomic DNA. Therefore, in silico methods have proven helpful.

2.2.2. In silico methods for discovering cis-acting polymorphisms

Bioinformatics complements experimental investigation of regulatory polymorphisms, allowing investigators to assess whether polymorphisms lie in a sequence region with predicted functional importance (Wasserman & Sandelin 2004). Table 2.2 provides web links for tools and databases. Most tools employ phylogenetic footprinting, comparing sequences across species to find conserved regions that may mark regulatory elements, and then match these sequences against models predicting transcription factor binding sites. Pharmacogenetics-centered examples can be found in conjunction with the CREATE website (see Table 2.2: PromoLign, ReguLign). This general approach improves on earlier methods but still fails to identify many regulatory sites (Wasserman & Sandelin 2004). Acknowledging the combinatorial nature of factors binding regulatory regions, some tools use combinations of cis-regulatory modules (CRMs) for specific tissues or gene types to make successful predictions (Liu et al., 2003).
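The binding-site matching step can be sketched with a position weight matrix (PWM). The example below converts per-position base counts into log-odds scores against a uniform background, scans a sequence for its best-scoring window, and compares the reference and variant alleles of a SNP. The four-position count matrix, both sequences and the SNP are hypothetical; only the forward strand is scanned, and the conservation (footprinting) filter that real tools apply first is omitted.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts (list of dicts) into log2-odds scores."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column.get(b, 0) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def best_site_score(seq, pwm):
    """Highest PWM score over all windows of the forward strand."""
    width = len(pwm)
    return max(sum(pwm[i][seq[start + i]] for i in range(width))
               for start in range(len(seq) - width + 1))


# Hypothetical counts from 10 aligned sites of a GGAA-core motif.
counts = [{"G": 9, "A": 1}, {"G": 9, "C": 1}, {"A": 10}, {"A": 8, "T": 2}]
pwm = log_odds_pwm(counts)

reference = "CTTCCGGAAGTC"
variant = "CTTCCGTAAGTC"  # hypothetical G>T SNP inside the motif core
ref_score = best_site_score(reference, pwm)
alt_score = best_site_score(variant, pwm)
print(f"reference {ref_score:.2f}, variant {alt_score:.2f}, change {alt_score - ref_score:.2f}")
```

A large score drop, as here, flags the SNP as a candidate for disrupting factor binding; it is a prediction to be tested experimentally, not evidence of function.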

A recent tool, PupaSuite, integrates available information on the potential for individual SNPs to alter expression or function (Conde et al., 2004). PupaSuite takes into account predicted transcription factor binding sites, intron/exon boundaries, predicted ESEs (Cartegni et al., 2003), amino acid changes in predicted motifs, and additional annotation information. ESE prediction is of interest because splicing defects represent cis-regulatory variants, constitute a small but significant portion of known disease-causing mutations (Cooper & Mattox 1997), and provide potential therapeutic targets (Sierakowska et al., 1996).

Resource Name | URL reference
ARED (AU-rich element database) | rc.kfshrc.edu.sa/ared
CGAP | gai.nci.nih.gov
CREATE | pharmacogenetics.wustl.edu
dbSNP | www.ncbi.nlm.nih.gov/SNP
Environmental Genome DB | www.niehs.nih.gov/envgenom
GeneSNPs | www.genome.utah.edu/genesnps
Human Gene Mutation DB (Cardiff) | www.hgmd.cf.ac.uk/ac
Human Genome Variation Society | www.hgvs.org
HapMap Project | www.hapmap.org
Innate Immunity PGA | innateimmunity.net
JSNP database | snp.ims.u-tokyo.ac.jp
Pharm GKB | www.pharmgkb.org
PromoLign | polly.wustl.edu/promolign/main.html
PupaSuite | pupasuite.bioinfo.cipf.es
ReguLign | polly.wustl.edu/regulign/default.html
Seattle SNPs (UW-FHCRC) | pga.mbt.washington.edu
SIFT | blocks.fhcrc.org/sift/SIFT.html

Table 2.2: Web references for tools and databases useful in finding and characterizing functional cis-regulatory polymorphisms.

Post-transcriptional mRNA turnover represents another potentially important source of genetic variability arising from cis-acting polymorphisms. mRNA stability can fluctuate through modulation of a number of pathways and through changes in RNA structure and protein-RNA binding sites. For example, destabilizing adenylate-uridylate-rich elements (AREs) are found in the 3’ untranslated regions (3’UTRs) (Tebo et al., 2003) of 5-8% of human genes (Bakheet et al., 2003). Mutations in these elements are linked to disease pathology in human insulin resistance and have been suggested as stratifiers for administration of PTP1B inhibitors (Di Paola et al., 2002). Human glucocorticoid receptor beta (hGRβ) acts as a dominant negative inhibitor of hGRα (Bamberger et al., 1995), and increased hGRβ expression has been associated with steroid resistance in asthmatics (Hamid et al., 1999; Sousa et al., 2000). A SNP in the 3’UTR of hGRβ disrupts an ATTTA motif and increases mRNA stability, suggesting a functional basis for steroid resistance (Derijk et al., 2001; Schaaf & Cidlowski 2002). AREs collected in a database (Bakheet et al., 2003) demonstrate how identifying consensus motifs and novel regulatory elements, in combination with integrated genomic polymorphism databases, may allow parallel characterization of functional polymorphisms. Further effects of ‘silent’ polymorphisms in transcribed regions will likely be uncovered as methods improve for the difficult problem of predicting RNA secondary and tertiary structure (Chen et al., 1999; Zuker 2003). Silent mutations predicted to change mRNA folding in drug-related genes have previously been supported experimentally in DRD2 (Duan et al., 2003) and NAT1 (de Leon et al., 2000). Newer algorithms that include evolutionary weighting schemes will likely lead to further experimentally validated examples (e.g., comRNA, Ji et al., 2004).
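A simple version of the motif-disruption check described for hGRβ can be sketched as follows: list the ATTTA pentamers in a 3’UTR sequence and report which of them a given SNP destroys. This is deliberately minimal; real ARE classification (for example, the clusters used in ARED) relies on richer sequence patterns, and the sequence and SNP below are hypothetical rather than taken from the hGRβ 3’UTR.

```python
import re


def attta_positions(utr: str) -> list[int]:
    """0-based start positions of (possibly overlapping) ATTTA/AUUUA pentamers."""
    seq = utr.upper().replace("U", "T")
    return [m.start() for m in re.finditer(r"(?=ATTTA)", seq)]


def apply_snp(seq: str, pos: int, ref: str, alt: str) -> str:
    """Substitute a single base after checking the reference allele."""
    if seq[pos] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {seq[pos]}")
    return seq[:pos] + alt + seq[pos + 1:]


def disrupted_motifs(seq: str, pos: int, ref: str, alt: str) -> list[int]:
    """Motif start positions present in the reference but lost with the variant."""
    before = set(attta_positions(seq))
    after = set(attta_positions(apply_snp(seq, pos, ref, alt)))
    return sorted(before - after)


utr = "GCATTTATTTAGC"                      # hypothetical 3'UTR fragment
print(attta_positions(utr))                 # [2, 6]
print(disrupted_motifs(utr, 8, "T", "C"))   # [6]
```

The lookahead pattern is used so that overlapping pentamers (as in ATTTATTTA) are all counted, which matters because AREs often consist of clustered, overlapping copies of the motif.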


2.3 Review of evidence for cis-acting variation in genes of clinical relevance

2.3.1 Drug metabolizing enzymes

Numerous genes encoding proteins involved in drug metabolism and transport have been categorized, and the nomenclature of their polymorphic variants is being standardized (see Table 2.3 for references). Accounting for the largest proportion of genetic variation affecting drug therapy are the cytochrome P450s, a large, highly polymorphic family of heme-containing monooxygenases (http://www.cypalleles.ki.se/) (Rendic 2002; Guengerich 2004). The human genome encodes at least 57 cytochrome P450s and contains 58 pseudogenes (Nelson et al., 2004), which are organized into 18 families (enzymes sharing >40% amino acid identity) and 43 subfamilies (enzymes sharing >55% identity). The most important cytochrome P450s involved in drug metabolism are the members of the CYP1, CYP2 and CYP3 families (Danielson 2002).
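The identity thresholds used to group P450s into families and subfamilies can be expressed directly. The sketch below computes percent identity over an existing pairwise alignment and applies the >40% and >55% cut-offs quoted above. It assumes the alignment is already given and counts identity over columns in which neither sequence has a gap; other denominators (e.g., the length of the shorter sequence) are also in use, so borderline values can differ between tools. The short peptide strings are hypothetical.

```python
def percent_identity(aln1: str, aln2: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap ('-')."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aln1, aln2) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


def p450_grouping(identity: float) -> str:
    """Apply the family (>40%) and subfamily (>55%) identity thresholds."""
    if identity > 55.0:
        return "same subfamily"
    if identity > 40.0:
        return "same family, different subfamily"
    return "different families"


pid = percent_identity("MALLLAVF", "MALLIAVF")  # hypothetical aligned fragments
print(pid, p450_grouping(pid))  # 87.5 same subfamily
```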

Comprehensive cytochrome P450 genotyping assays are becoming feasible and are now applied in pharmaceutical trials, but a key question is whether we have determined the bulk of the functional alleles in human populations. Many alleles have been reported, but there is an inherent bias toward sequencing and genotyping of coding regions. As a result, a portion of phenotypic variation remains unaccounted for by genetic factors. Large-scale sequencing projects continue to reveal noncoding alterations that could affect the expression and function of drug-relevant genes (Iida et al., 2001b; Adjei et al., 2003; Aklillu et al., 2003; Allorge et al., 2003; Saito et al., 2003; Blaisdell et al., 2004; Cauffiez et al., 2004; Fukushima-Uesaka et al., 2004; Murayama et al., 2004). Table 2.1 lists known functional noncoding polymorphisms, which include crucial drug-relevant functional alleles.

2.3.1.1 CYP1 family

CYP1A2 has roles in metabolism of clozapine, , , , and and is generally probed with since it specifically demethylates this . There is evidence for a number of functional, noncoding alleles in the sequence of CYP1A2. A moderately frequent single nucleotide change at a demonstrated protein binding site in the enhancer region (CYP1A2*1C allele) correlates with a decrease in enzyme activity (Nakajima et al., 1999). A single base change (-163C>A) in intron 1 (CYP1A2*1F), occurring frequently in a Japanese population (Chida et al., 1999), is correlated with high enzyme induction in Caucasians (Sachse et al., 1999). However, a study on the plasma concentrations of in Japanese schizophrenics carrying CYP1A2*1F (Shimoda et al., 2002) and another on Swedish pregnant women with the same allele (Nordmark et al., 2002) dispute this connection. Another polymorphism in intron 1 (-729C>T) was found in an Ethiopian population 10 bp from one previously reported (-739T>G) (Aklillu et al., 2003). This novel allele (CYP1A2*1K) is associated with decreased in vivo metabolism, decreased expression in reporter constructs, and disrupted binding within intron 1 by a member of the E twenty six (Ets) family of proteins (Aklillu et al., 2003). The results suggest that the discrepancy between previous intron 1 studies may be due to failure to determine haplotype structures completely, since CYP1A2*1K (-163C>A, -729C>T, -739T>G) was shown to be functionally significant but CYP1A2*1F (-163C>A) and CYP1A2*1J (-163C>A, -739T>G) were not. Another SNP (3534G>A) in the donor splice site of intron 6 (CYP1A2*7) was suggested to account for extremely high clozapine concentrations at normal doses in a single individual, but this has not been replicated (Allorge et al., 2003). These results illustrate the difficulty of assigning functional properties to polymorphisms in regulatory regions and to combinations of polymorphisms in haplotypes, and, moreover, of relating them to pharmacokinetic differences in vivo, where more than one factor contributes to the phenotype.

Gene Class | URL reference | Reference
Alcohol dehydrogenases | www.gene.ucl.ac.uk/nomenclature/genefamily/ADH.php | Duester et al., 1999
Aldehyde dehydrogenases | www.aldh.org/ | Vasiliou et al., 1999
Arylamine N-acetyltransferases | louisville.edu/~dwhein01/NAT.html | Hein et al., 2000b
Cytochrome P450s | www.cypalleles.ki.se/ | Nelson et al., 2004
Dihydropyrimidine dehydrogenase | (none available) | McLeod et al., 1998
Organic anion transporting polypeptides | www.bioparadigms.org/slc/ | Hagenbuch & Meier, 2004
Sulfotransferases | (see Glatt & Meinl 2004 for amino acid changing alleles) | Blanchard et al., 2004; Glatt & Meinl, 2004
UDP glucuronosyltransferases | som.flinders.edu.au/FUSA/ClinPharm/UGT/ | Mackenzie et al., 1997

Table 2.3: Literature and web references for gene and allele nomenclature of families with pharmacogenetic importance.
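The haplotype argument made above for CYP1A2 intron 1 can be made explicit with a small sketch that matches the variants observed on one haplotype against allele definitions, considering only the sites that were actually genotyped. The definitions are simplified from the text (the reference haplotype is represented as CYP1A2*1A with no variant at these sites), and `consistent_alleles` is a hypothetical helper, not an existing nomenclature tool. It shows why a study that types only -163C>A cannot distinguish CYP1A2*1F, *1J and *1K.

```python
# Simplified intron 1 definitions taken from the text (illustrative only).
CYP1A2_ALLELES = {
    "*1A": frozenset(),                                  # reference at these sites
    "*1F": frozenset({"-163C>A"}),
    "*1J": frozenset({"-163C>A", "-739T>G"}),
    "*1K": frozenset({"-163C>A", "-729C>T", "-739T>G"}),
}


def consistent_alleles(observed, genotyped, definitions=CYP1A2_ALLELES):
    """Alleles whose definition, restricted to the genotyped sites, equals
    the set of variants observed on one (phased) haplotype."""
    observed, genotyped = set(observed), set(genotyped)
    if not observed <= genotyped:
        raise ValueError("observed variants must be among the genotyped sites")
    return sorted(name for name, defining in definitions.items()
                  if defining & genotyped == observed)


full_panel = {"-163C>A", "-729C>T", "-739T>G"}
print(consistent_alleles({"-163C>A", "-729C>T", "-739T>G"}, full_panel))  # ['*1K']
print(consistent_alleles({"-163C>A"}, {"-163C>A"}))  # ['*1F', '*1J', '*1K']
```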

2.3.1.2 CYP2 family

CYP2A6 accounts for approximately 10% of the cytochrome P450 content of human liver microsomes and is the principal enzyme in the oxidation of nicotine, cotinine and a few pharmaceuticals (e.g., fadrozole, halothane, losigamone, letrozole, methoxyflurane, SM-12502) (Pelkonen et al., 2000). Interindividual expression levels vary more than ten-fold and are attributed to environmental and genetic factors, with Asian populations having a high proportion of poor metabolizers (PMs) (Pelkonen et al., 2000). An uncommon allele (CYP2A6*12) results from a crossover event between CYP2A6 and CYP2A7 in intron 2, leading to the addition of 10 amino acids and consequently lower 7-hydroxylation activity of the enzyme (Oscarson et al., 2002). A TATA box allele (CYP2A6*9), with ~23% frequency in Asian populations and ~5% in Caucasians, correlates with lower expression levels (mRNA and protein) and enzyme activity (Pitarque et al., 2001; Kiyotani et al., 2003). Recently, an additional functional promoter allele (CYP2A6*1D) with high prevalence in Caucasians has been described, which appears to disrupt an enhancer element in reporter assays (Pitarque et al., 2004). A novel regulatory polymorphism (CYP2A6*1H) that disrupts binding of NF-Y to the CYP2A6 enhancer region affects expression; it was assayed alone and in combination with CYP2A6*1D (as CYP2A6*1J) (von Richter et al., 2004).

The proteins encoded by CYP2C genes account for approximately 20% of the total liver cytochrome P450 content in humans (Imaoka et al., 1996) and are responsible for metabolizing approximately 20% of clinically administered drugs. CYP2C19 is the cytochrome P450 isoform primarily responsible for metabolism of the anticonvulsant agent (S)-mephenytoin. Individuals can be characterized as either extensive metabolizers (EMs) or PMs. The PM phenotype occurs in 2-5% of Caucasian populations and 18-23% of Asian populations (Kupfer & Preisig 1984; Nakamura et al., 1985). The major genetic defect responsible for the CYP2C19 PM phenotype is a single mutation (681G>A) in exon 5 of CYP2C19 (CYP2C19*2), which creates an aberrant splice site. This alters the reading frame of the mRNA starting with amino acid 215 and produces a premature termination codon (PTC) 20 amino acids downstream, resulting in a truncated, nonfunctional protein (de Morais et al., 1994a). Another SNP in exon 4 (636G>A) (CYP2C19*3) also creates a PTC and is responsible for the PM phenotype in Japanese populations but not in Caucasian populations (de Morais et al., 1994b). The PM phenotype in Caucasian populations is also partially explained by other SNPs: disruption of the ATG initiation codon (change to GTG) (CYP2C19*4) (Ferguson et al., 1998), amino acid substitutions (CYP2C19*5A, *5B, *6), and a single nucleotide transversion (IVS5+2T>A) in the GT 5' donor splice site of intron 5 (CYP2C19*7) (Ibeanu et al., 1999). Overall, splicing variants contribute substantially to the PM phenotype.
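The consequence of a frame-altering change can be illustrated by scanning a coding sequence codon by codon for the first in-frame stop codon. The sketch below uses an invented 18-nucleotide open reading frame and a hypothetical one-base insertion; it models only the net frameshift in the mature transcript, not the aberrant splicing through which variants such as CYP2C19*2 produce it.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(cds: str) -> Optional[int]:
    """0-based codon index of the first in-frame stop codon, or None if absent."""
    for start in range(0, len(cds) - 2, 3):
        if cds[start:start + 3] in STOP_CODONS:
            return start // 3
    return None


def insert_base(seq: str, pos: int, base: str) -> str:
    """Insert one nucleotide before position pos (0-based)."""
    return seq[:pos] + base + seq[pos:]


wild_type = "ATGAAAGCTGATTGCTAA"         # invented ORF: 5 sense codons + TAA
mutant = insert_base(wild_type, 6, "C")  # hypothetical 1-nt insertion
print(first_stop_codon(wild_type))       # 5 (normal terminator)
print(first_stop_codon(mutant))          # 3 (premature termination codon)
```

Because three of the 64 codons are stops, a shifted frame typically encounters one within a few dozen codons, which is consistent with the PTC appearing 20 amino acids downstream in CYP2C19*2.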

CYP2C9 is the most highly expressed member of the CYP2C subfamily in hepatic tissue, and metabolizes 16% of drugs in current clinical use, including some drugs with narrow therapeutic indices, such as the hypoglycemic tolbutamide, the anticonvulsant , and the (S)- (Schwarz 2003). Several SNPs that change amino acids and reduce enzyme activity have been identified and associated with adverse drug reactions or toxicity to drugs metabolized by CYP2C9 (Aithal et al., 1999; Lee et al., 2002; Ho et al., 2003; Schwarz 2003). Deletion of an adenine at base pair 818 of the mRNA causes a frameshift and yields a nonfunctional protein (CYP2C9*6) (Kidd et al., 2001). Although the allele frequency of this variant is <1%, it has been associated with toxicity after treatment with normal doses of phenytoin (Kidd et al., 2001).

CYP2D6 is the most polymorphic cytochrome P450 gene, constitutes 2% of total hepatic cytochrome P450 content (Shimada et al., 1994; Imaoka et al., 1996), and supports the oxidative metabolism of more than 70 pharmaceuticals. Genetic polymorphisms in the coding region of the CYP2D6 gene have been extensively investigated (see review by Zanger et al., 2004). More than 70 SNPs have been identified so far, and the focus has been on the coding region and the mRNA splice sites responsible for the PM phenotype (7-10% in Caucasian populations and ~1% in Asian populations) (Zanger et al., 2004). One of the main functional defects, a splicing mutation (1846G>A) at the intron 3/exon 4 boundary (CYP2D6*4, with an allele frequency of 20-25%), shifts the consensus acceptor splice site of the third intron by one base pair, yielding a spliced mRNA with one additional base, an altered reading frame and a PTC (Kagimoto et al., 1990). Other genetic mechanisms for null alleles include frameshifts resulting from single or multiple base pair insertions or deletions (CYP2D6*3, *6, *15, *19, *20, *38, *42) (Kagimoto et al., 1990; Saxena et al., 1994; Sachse et al., 1996; Marez-Allorge et al., 1999), a SNP that creates a PTC (CYP2D6*8) (Broly et al., 1995), and other splicing mutations (CYP2D6*11, *44) (Marez et al., 1995; Yamazaki et al., 2003), in addition to entire gene deletion (CYP2D6*5) or duplications (Gaedigk et al., 1991) and amino acid changes (CYP2D6*7, *12) (Evert et al., 1994; Marez et al., 1996). However, only a few SNPs in the promoter region have been identified, and their phenotypic consequences have not been demonstrated. Since ‘normotypic’ CYP2D6 carriers still display large variations in metabolic capacity, including the intermediate metabolizer (IM) phenotype, we need to discern whether this is due to as yet unrecognized polymorphisms acting in cis or trans, or to enzyme induction effects. A SNP in the promoter region (-1548C>G) within the CYP2D6*41 allele has been associated with CYP2D6 expression, with the G allele correlating with higher levels of expression (Lovlie et al., 2001; Zanger et al., 2001). However, because this SNP is in linkage disequilibrium with other SNPs known to affect expression (Raimundo et al., 2000), it is not clear which is functional. This SNP has been used as a marker to rule out CYP2D6 PM status (Gaedigk et al., 2003). An intron 6 SNP (2988G>A) with a frequency of 8.4% in Caucasians was found to be a better predictive marker for the IM phenotype than CYP2D6*41 (Raimundo et al., 2004).
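The PM and allele frequencies quoted for CYP2D6 can be related through a short Hardy-Weinberg calculation. Assuming random mating and that the PM phenotype requires two null alleles (all null alleles pooled into one class), the expected PM frequency is q², where q is the combined null-allele frequency. This is a simplification that ignores gene duplications, reduced-function (IM) alleles and population substructure; the helper names are illustrative.

```python
import math


def expected_pm_frequency(null_allele_freqs):
    """Hardy-Weinberg frequency of individuals carrying two null alleles."""
    q = sum(null_allele_freqs)
    if not 0.0 <= q <= 1.0:
        raise ValueError("combined null-allele frequency must lie in [0, 1]")
    return q * q


def null_allele_frequency(pm_frequency):
    """Combined null-allele frequency implied by an observed PM frequency."""
    return math.sqrt(pm_frequency)


# CYP2D6*4 alone at 20-25% predicts only about 4-6% PMs ...
print([round(expected_pm_frequency([f]), 4) for f in (0.20, 0.25)])
# ... whereas 7-10% PMs implies a combined null-allele frequency of about 26-32%,
# consistent with additional null alleles contributing.
print([round(null_allele_frequency(p), 3) for p in (0.07, 0.10)])
```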

Although most frameshift mutations in cytochrome P450 genes produce nonfunctional proteins, a frameshift mutation in the CYP2D7 pseudogene generates a functional enzyme (Pai et al., 2004). This common single base pair deletion (138delT) generates an open reading frame in the CYP2D7 pseudogene and a spliced variant containing partial inclusion of intron 6. This transcript produces a functional protein that is expressed in the brain, but not in liver or kidney. The variant CYP2D7 metabolizes codeine to morphine more efficiently than CYP2D6 in Neuro2a cells and also colocalizes with µ-opioid receptors in brain tissue, suggesting a possible role in metabolism at the drug’s site of action.

Less than 1% of hepatic cytochrome P450 content is due to CYP2B6, yet it is involved in the metabolism of ~70 clinically employed drugs, including alfentanil, ketamine, bupropion, , , efavirenz, and drugs of abuse such as methylenedioxymethamphetamine and nicotine (Lang et al., 2001). CYP2B6 activity in liver microsomes varies more than 100-fold among individuals (Yamano et al., 1989; Ekins et al., 1998), with female subjects having higher levels of mRNA, protein and enzyme activity than males (Lamba et al., 2003). Large interethnic differences are also observed for CYP2B6, with Hispanic females having higher CYP2B6 activity than Caucasian or African-American females (Lamba et al., 2003).

Many variants have been identified for CYP2B6, including mRNA splice variants (Lamba et al., 2003). These variants paint a complex picture because multiple potentially functional polymorphisms are found in linkage disequilibrium and may therefore interact epistatically. The most common splice variant skips exons 4-6 (CYP2B6*9), while others lack the first 29 bp of exon 4 or contain an intron 3 insertion. Because of the presence of PTCs, all of these variants encode truncated proteins. A nonsynonymous SNP in exon 4 (15631G>T) that disrupts an ESE and a SNP (15582C>T) in an intron 3 branch site are correlated with these splicing variants. Several SNPs in the promoter region have also been reported. The -750T>C SNP (CYP2B6*1G) correlates with lower levels of expression. The -2320T>C SNP (CYP2B6*1B) in the HNF4 binding site in the promoter and the SNP in the intron 3 branch point (15582C>T) show a high degree of linkage disequilibrium and are associated with low quantities of CYP2B6 protein in Caucasian females. Recently, several missense SNPs forming null alleles were identified, but their combined frequency is only 2.6% in a Caucasian population (Lang et al., 2004). In combination, these genetic variants do not fully explain the phenotypic variability in CYP2B6 activity. Because CYP2B6 expression is correlated with constitutive androstane receptor (CAR) expression (Lamba et al., 2003), it is possible that polymorphisms in the CAR gene affect CYP2B6 expression in trans. Recently, several novel tissue-specific variant isoforms of CAR have been identified (Lamba et al., 2004), but their association with CYP2B6 expression remains to be established.

2.3.1.3 CYP3 family

The CYP3A subfamily of proteins is highly homologous but exhibits wide differences in tissue expression, implying diverse regulatory control. CYP3A4 is a critical enzyme because it is involved in the metabolism of over 30% of clinically used drugs, including cyclosporin, erythromycin, and nifedipine. Inter-individual differences in hepatic CYP3A4 expression are substantial and have not been sufficiently ascribed to known functional polymorphisms (Hirota et al., 2004). Principal among these has been a promoter mutation (CYP3A4*1B) that disrupts a nifedipine-specific repressor element (Westlind et al., 1999; Amirimani et al., 2003) and has been suggested to associate with treatment-related leukemia (Felix et al., 1998). However, different reporter constructs and conflicting results suggest a more complex regulatory picture (Spurdle et al., 2002; Floyd et al., 2003). Investigation of polymorphic trans-acting factors also provided an insufficient explanation (Zhang et al., 2001). A recent attempt to elucidate genetic components of variation scanned up to 13 kb upstream of CYP3A4 and found a novel far-upstream enhancer element with a polymorphism (-11,129_-11,128insTGT) that disrupts a USF1 binding site and reduces expression, but is relatively uncommon (3.1%) in a French population (Matsumura et al., 2004). The complexity of the still unfolding regulatory story of CYP3A4 underscores the importance of careful analysis of regulatory regions and the need for new technologies to resolve the nature and extent of genetic factors contributing to variability in expression and activity.

Polymorphic CYP3A5 expression may also contribute to interindividual and interracial differences in CYP3A-dependent drug clearance (Kuehl et al., 2001). CYP3A5 was formerly considered an extrahepatic enzyme (Wrighton et al., 1989; Schuetz et al., 1992; Murray et al., 1995; Haehner et al., 1996; Kivisto et al., 1996), only sporadically found in the liver tissue of some individuals (Wrighton et al., 1989). Kuehl et al. (2001) discovered that a SNP (6986A>G) in intron 3 (CYP3A5*3) creates a cryptic splice site and an extra exon (exon 3b) and is responsible for polymorphic expression of CYP3A5 in the liver. The CYP3A5*3 allele encodes an aberrantly spliced mRNA with a PTC, resulting in an unstable mRNA and no detectable CYP3A5 mRNA or protein in the liver. The allele frequency of CYP3A5*3 is reported to be 75.9% in a Japanese population (Saeki et al., 2003) and ~73% in an African-American population (Hustert et al., 2001a). CYP3A4 and CYP3A5 have overlapping substrate specificity (Bargetzi et al., 1989; Gorski et al., 1994; Guitton et al., 1997), and CYP3A5 can represent over 50% of the total hepatic CYP3A content in some individuals (Kuehl et al., 2001). Thus, genetic polymorphism of CYP3A5 may contribute to variability in the response to drugs metabolized by CYP3A in some human populations.

2.3.1.4 Other CYPs

Other human cytochrome P450 genes also carry reported functional regulatory polymorphisms, and their diversity indicates that more will likely be uncovered. A relatively new family of CYPs, CYP4F, contains physiologically important members. CYP4F12 is expressed in liver and intestine and ω-hydroxylates eicosanoids, fatty acids, and antihistamines (e.g., ebastine) (Bylund et al., 2001; Hashizume et al., 2001). Cauffiez et al. (2004) recently published the first study of the CYP4F12 promoter region and revealed two alleles (CYP4F12*v1, CYP4F12*v2), common in the French population, that are associated with significantly reduced expression in HepG2 cells. CYP4F12*v1 is a 192 bp deletion in intron 1 (21% frequency) that diminishes expression. CYP4F12*v2 is a nine-SNP phased haplotype (8.5% frequency) that is also associated with lower expression. In silico analysis suggests functional sites within these alleles, but further characterization is needed. The extrahepatically expressed CYP2J2 metabolizes diclofenac, bufuralol, and astemizole and contains a common mutation in a putative SP-1 site (King et al., 2002).

In that initial report, the functional effect of this SNP was only speculated upon, whereas the effects of less common amino acid changes were investigated experimentally. However, a recent article verifies that CYP2J2*7 disrupts an SP-1 binding site and significantly reduces transcription (Speicker 2004). Other proposed functional CYP polymorphisms include variable number tandem repeats (VNTRs) in the 5’ regulatory regions of CYP2E1 (McCarver et al., 1998; Hu et al., 1999) and CYP8A1 (Chevalier et al., 2002), and a point mutation in the CYP7A1 promoter recently linked with reduced response to atorvastatin (Wang et al., 1998; Kajinami et al., 2004). Taken together, the mutational record of the CYP superfamily argues for extensive consideration of regulatory mutations in all families critical to pharmacology and disease.

2.3.1.5 Other classes of drug metabolizing enzymes

Drug metabolizing enzymes other than the cytochromes also display significant interindividual allelic differences. Non-CYP genes and their variants may play increasingly important roles as pharmaceutical companies design drugs that evade the challenges of cytochrome variability (Ingelman-Sundberg 2001). The cluster of UDP-glucuronosyltransferase-1 (UGT1) genes (UGT1 superfamily) at 2q37 exhibits exon sharing and harbors promoter and missense mutations that have been associated with 2- to 6-fold lower conjugation activity (Kohle et al., 2003). Pathways involving members of the UGT1 superfamily act on approximately 35% of all drugs metabolized by phase II drug-metabolizing enzymes (Evans & Relling 1999). In addition to many rare polymorphisms in the UGT1A1 gene, a common promoter-region dinucleotide repeat (5-8 repeats; UGT1A1*28 carries seven) reduces expression by altering transcription initiation (Bosma et al., 1995; Guillemette 2003). The polymorphism correlates with lower protein levels and reduced conjugation activity (Fang & Lazarus 2004), and is implicated in irinotecan toxicity through altered glucuronidation of its active metabolite SN-38 (7-ethyl-10-hydroxycamptothecin) (Iyer et al., 1999), including adverse reactions such as neutropenia and diarrhea (Iyer et al., 2002). UGT1A1*28 is relatively frequent in many populations: Caucasian (~32%), Asian non-Japanese (~15%), African (~41%), and Italian (~36%) (Guillemette 2003). A promoter region polymorphism in UGT1A9 (UGT1A9*22), common in Japanese, Caucasians, and African-Americans, has recently been associated with higher expression in HepG2 reporter constructs, but related enzyme induction or metabolism phenotypes are unreported (Yamanaka et al., 2004).

Thiopurine S-methyltransferase (TPMT) is a prominent example of how pharmacogenetics can affect individual treatment. The enzyme is involved in the metabolism of thiopurines: the cytotoxic drugs 6-mercaptopurine (6-MP) and 6-thioguanine (6-TG), and the immunosuppressant azathioprine, which is rapidly converted to 6-MP. Amino acid-changing polymorphisms account for most of the variability in red blood cell (RBC) TPMT activity, though their frequency varies widely among populations (Vuchetich et al., 1995; McLeod & Siva 2002). Variable number tandem repeats (VNTRs) in the promoter region of TPMT have been identified (Spire-Vayron de la Moureyre et al., 1998; Spire-Vayron de la Moureyre et al., 1999) and suggested as modulators of expression and thus enzyme activity (Fessing et al., 1998; Spire-Vayron de la Moureyre et al., 1998; Yan et al., 2000; Alves et al., 2001). These repeats appear to be in linkage disequilibrium with the nonsynonymous alleles, making it difficult to establish their functional role definitively (Marinaki et al., 2003).

N-acetyltransferases (NAT) catalyze acetyl transfer from acetyl coenzyme A to an array of and arylamine and hydrazine drugs (e.g., PABA, , sulfamethazine, procainamide, nitrazepam, dapsone). The two human NATs, NAT1 and NAT2, are intronless and, so far, have mainly nonsynonymous polymorphisms that are associated with enzyme activity (Hein et al., 2000; Butcher et al., 2002). Slow acetylation genotypes correlate with adverse effects after combined isoniazid and therapy (Ohno et al., 2000) and co-trimoxazole treatment (Zielinska et al., 1998). The extent of N-acetylation also accounts for variability in the toxicity of amonafide treatment in cancer patients, with rapid acetylators experiencing significantly greater toxicity (Ratain et al., 1995). A relatively uncommon allele (NAT1*16), consisting of an AAA insertion and a C-to-A transversion in the 3’UTR, correlates with predicted structural variation, 2-fold lower expression, and a similar reduction in N-acetylation of substrates (de Leon et al., 2000). Another 3’UTR variant, NAT1*10, is suggested to correlate with higher enzyme activity (Bell et al., 1995; Payton & Sim 1998), but this has not been supported by a number of studies, perhaps because of linkage with an amino acid change not detected in some of the earlier assays (Bruhn et al., 1999; de Leon et al., 2000; Hein et al., 2000).

The sulfotransferase family (SULTs) forms sulfate conjugates with a variety of xenobiotics and endogenous small molecules and now numbers 17 distinct genes (Freimuth et al., 2004). Multiple studies report inter-individual differences in SULT expression and drug-related response, as well as evidence of a heritable component (Reiter & Weinshilboum 1982; Bonham Carter et al., 1983; Weinshilboum 1990; Glatt & Meinl 2004). Notably, no functional studies of noncoding variants have been reported despite a substantial number of such variants identified in sequencing studies (Freimuth et al., 2001; Iida et al., 2001b; Glatt & Meinl 2004).

A few aldehyde (ALDH) and alcohol (ADH) dehydrogenases are responsible for 90% of the metabolism of ethanol, a compound with potential for pharmacological interactions. Well-known coding variants in ADH2 and ALDH2 are associated with resistance to alcoholism. ADH4 is expressed in specific portions of the GI tract (Vaglenova et al., 2003) and is essential for vitamin A metabolism (Duester et al., 2003), but may also have a role in first-pass metabolism of ethanol (Yin et al., 2003). A reporter assay indicated that a single functional promoter allele affects ADH4 expression (Edenberg et al., 1999), and no coding variants were identified in a subsequent sequencing study (Iida et al., 2002), but in vivo characterizations of allelic effects at the protein or tissue level have not yet been reported. These examples show that a better understanding of polymorphic expression in many types of genes beyond the cytochrome P450 superfamily is important for understanding the genetic basis of inter-individual differences in drug response.

2.3.2 Drug transporters

Modulation of drug transporter expression potentially affects the uptake and efficacy of many compounds. Numerous genes encode solute carriers (SLC families, >300 genes) and primary active ABC transporters (48 genes). Because of the size of these gene families, we address only a few examples here. The multidrug resistance polypeptide ABCB1 (MDR1, Pgp) is an energy-dependent efflux pump that acts upon a wide range of natural and pharmacological substrates (see review by Sun et al., 2004). A synonymous SNP (3435C>T) has been associated with low expression and altered pharmacokinetics in a number of studies (Hoffmeyer et al., 2000), but others have reported conflicting results (Sakaeda et al., 2001) or dismissed it as an unlikely functional SNP because it does not alter the encoded amino acid. In these cases it is suggested that a functional allele (perhaps 2677G>A,T) must be in linkage, with 3435C>T serving as an indicator SNP. However, the synonymous polymorphism could itself affect mRNA structure, stability, or translational efficiency. Recently, Takane et al. (2004) conducted a functional analysis of ABCB1 variants. They found 10 promoter variants (7 novel), an association of 3435C>T with lower expression, an association of promoter haplotypes with differences in transcription level independent of 3435C>T, and a novel transcription factor binding site that is disrupted in a haplotype correlating with lower expression; they also ruled out differences in methylation status as a principal cause (Takane et al., 2004). Three promoter variants found in rare haplotypes were associated with higher transcriptional expression, including one (-129T>C) previously reported to be associated with high transport activity in hematopoietic stem cells (Calado et al., 2002).

While numerous studies associate ABCB1 polymorphisms with altered drug disposition and effect, these results are often not reproducible in different populations. This points to an insufficient understanding of the interactions among the multiple genetic factors determining ABCB1 expression and function.
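The "indicator SNP" argument for 3435C>T, like the linkage caveats raised above for CYP2B6 and TPMT, turns on how strongly a marker allele is associated with a putative functional allele. As a minimal illustration (not the method of the cited studies), the standard pairwise measures D' and r² can be computed from a phased two-locus haplotype frequency and the two allele frequencies. The function name `ld_stats` and all example frequencies are hypothetical.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float) -> tuple[float, float, float]:
    """Pairwise linkage disequilibrium between two biallelic loci.

    p_ab: frequency of haplotypes carrying allele A at locus 1 and allele B at locus 2
    p_a, p_b: frequencies of alleles A and B (each strictly between 0 and 1)
    Returns (D, D', r^2); D' carries the sign of D.
    """
    d = p_ab - p_a * p_b  # deviation from independent assortment
    # D' scales D by its largest possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between the two alleles on a haplotype.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical case: a rarer allele B (20%) always sits on haplotypes carrying marker allele A (50%).
d, d_prime, r2 = ld_stats(p_ab=0.2, p_a=0.5, p_b=0.2)
print(f"D={d:.3f}  D'={d_prime:.2f}  r2={r2:.2f}")
```

In this made-up example D' = 1 (no recombinant haplotypes) while r² = 0.25, so allele A would be a weak proxy for allele B in an association study even though the two are in "complete" disequilibrium; this is one way an indicator SNP can give inconsistent results across populations with different allele frequencies.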

Two noncoding polymorphisms of the serotonin transporter (5-HTT) gene SLC6A4 have been studied extensively based on allelic differences in expression in brain and other tissues (Lesch et al., 1994; Heils et al., 1996; Hranilovic et al., 2004). The contribution of serotonin neurophysiology to psychiatric disorders is substantial, and the impact of these variants may therefore be far-reaching. In particular, the short form of the promoter region polymorphism (5-HTTLPR) correlates with lower transcriptional and translational activity (Heinz et al., 2000; Hranilovic et al., 2004), blunted fenfluramine-induced prolactin release (Reist et al., 2001), greater amygdala neuronal activity in response to fearful stimuli (Hariri et al., 2002), poorer efficacy of antidepressants (e.g., citalopram (Eichhammer et al., 2003), (Rausch et al., 2002), (Pollock et al., 2000)), and antidepressant-induced mania in bipolar disorder (Mundo et al., 2001). A genetic component to variability in response to antidepressants is also suggested by family studies (e.g., O'Reilly et al., 1994). Contributions of this polymorphism to variation in colonic uptake of (Scherl & Frissora 2003) and to heroin dependence (Gerra et al., 2004) have also been suggested. However, there are dissenting results, and the debate over the functional importance of these alleles is not yet settled. Alternative alleles, splice variants, and polyadenylation signals might also play roles (Delbruck et al., 1997; Battersby et al., 1999; Michaelovsky et al., 1999; Frisch et al., 2000; Nakamura et al., 2000; Cigler et al., 2001; Sun et al., 2002), and significant population differences in genotype must be considered (Gelernter et al., 1999; Lotrich et al., 2003).

Members of the organic anion transporting polypeptide (OATP) superfamily, encoded by the SLCO genes, are broadly expressed and mediate transport of a wide range of endogenous and exogenous compounds, including anions, cations, neutral compounds, and peptidomimetic agents (Tirona & Kim 2002; Hagenbuch & Meier 2004). Regulatory region polymorphisms have been reported in a number of SLCO genes (Iida et al., 2001a; Tirona & Kim 2002). OATP-A (SLCO1A2) is expressed in brain capillary endothelial cells, suggesting a role in blood-brain barrier permeability of solutes, and this transmembrane protein transports opioid peptides (Gao et al., 2000). A SLCO1A2 SNP is localized to a putative hepatocyte nuclear factor 1α (HNF1α) binding site (Kullak-Ublick et al., 1997), but there are no functional reports on it or on any of the identified nonsynonymous polymorphisms in the gene [Source: dbSNP]. A more recently described far-upstream polymorphism (-11187G>A) in OATP-C (SLCO1B1) was associated with a 98% higher pravastatin plasma exposure (AUC), i.e., reduced clearance, in Caucasian males, and it combines with coding polymorphisms in a haplotype (SLCO1B1*17) that exhibited a similar association (Niemi et al., 2004). Drug transporters will likely provide further examples of clinically relevant cis-regulatory polymorphisms.

2.3.3 Drug targets and receptors

Modulation of the expression of drug targets is another avenue for studying inter-individual differences in therapeutic response. In the field of cancer pharmacogenetics, understanding the expression patterns in patients’ tumors or in their untransformed somatic genomes can guide the selection or administration of treatment. The principal downstream target of the common chemotherapeutic 5-fluorouracil (5-FU) is thymidylate synthase (TYMS). Three copies of a 28 bp tandem repeat (TSER*3) in the promoter of TYMS have been associated with higher TYMS levels (Horie et al., 1995) and poorer outcomes with 5-FU treatment (Horie et al., 1995; Villafranca et al., 2001). Here again, a complex regulatory picture needs dissection, as further polymorphisms show similar associations: a SNP within TSER*3 (TSER*3RG) that abolishes a USF1-binding site (Mandola et al., 2003), and a 3’UTR SNP now associated with mRNA stability (Ulrich et al., 2000; Mandola et al., 2004). Inactivation of 5-FU is principally mediated by dihydropyrimidine dehydrogenase (DPYD), and variation in its activity can have fatal consequences (Van Kuilenburg et al., 2002). A splice site transition (DPYD*2A) accounts for some observed toxicity but has low frequency (<1%) (Wei et al., 1996). Despite the description of many other polymorphisms (Wei et al., 1996; McLeod et al., 1998; Collie-Duguid et al., 2000), 5-FU toxicity remains only partially understood.

Other examples in cancer pharmacogenetics include applications of genotyping to treatment with well-known agents targeting receptor tyrosine kinases (e.g., herceptin (Arteaga & Baselga 2004)). Functional explanations for these stratifying mutations, which often involve genomic instability and high expression or loss of heterozygosity, are beginning to be understood (Sordella et al., 2004).

Drug targets in the brain, and the degree to which their genetic variation explains differences in psychoses and their treatment, have been the focus of much research. A synonymous 102C>T polymorphism of the serotonin-2A receptor gene (HTR2A) was previously proposed to associate with responsiveness to clozapine (Arranz et al., 1998). Because the SNP does not change the encoded amino acid, such an association would imply that it is in linkage disequilibrium with cis-acting regulatory polymorphisms. However, when this SNP was used as a marker for measuring relative allelic mRNA abundance, no difference in expression was detected in adult brain tissues (Bray et al., 2004), which argues against the presence of cis-acting regulatory polymorphisms. These findings do not address possible effects on translation, but they do argue for a shift of focus in future research on variants in this gene region.

An intriguing study in a large cohort of individuals treated with pravastatin recently revealed a significant association between reduction in LDL levels after treatment and two intronic SNPs in the 3-hydroxy-3-methylglutaryl-coenzyme A reductase (HMGCR) gene (Chasman et al., 2004). The SNPs are in linkage disequilibrium with each other but are not near intron-exon borders or CpG dinucleotides. The functional explanation remains an open question, but the SNPs are also in disequilibrium with a 3’UTR variant present in the mRNA, raising the possibility of altered pre-translational regulation. If a functional allele is identified and generalizes to other statins, it may help explain why a significant proportion of treated individuals do not respond readily to this blockbuster class of drugs (Schmitz & Drobnik 2003). No definitive genetic markers yet account for the observed variation, despite a number of noncoding variants previously associated with differential responses to statins (de Maat et al., 1999; Marian et al., 2000; Zambon et al., 2001; Schmitz & Drobnik 2003; Kajinami et al., 2004).

These few transporter and receptor examples highlight the important contribution that regulatory polymorphisms will make to defining genetic variability in pharmacokinetics and pharmacodynamics. The identification of allelic differences in key genes may also eventually allow sequence-specific targeting of functional cis-regulatory polymorphisms (Fluiter et al., 2003; Miller et al., 2003; Achenbach et al., 2004; Bruno et al., 2004).

Alternatively, general classes of cis-regulatory variants may be targeted. For instance, aminoglycoside antibiotics have been shown to reduce ribosomal proofreading, causing misincorporation of an amino acid at premature stop codons and thus ameliorating nonsense mutations through increased translational readthrough (Howard et al., 1996). Genotyping for cystic fibrosis, for which over 1,000 variants are known, has accordingly been used to select targeted therapy in a small, stratified subgroup of the disease population (Wilschanski et al., 2003).

2.3.4 cis-acting polymorphisms in relevant trans factors

While most genes harbor cis-acting changes relevant to their expression, genetic variants regulating gene expression in trans are likely to dominate interindividual variability in mRNA expression profiles (Wittkopp et al., 2004). This results from the pleiotropic consequences for gene expression of cis-acting polymorphisms that alter the function of transcription factors, receptors, and other signaling molecules. We therefore need to consider cis-acting polymorphisms within the genes encoding trans factors that influence drug metabolizing enzymes, transporters, and targets, reflecting the complex regulatory networks underlying pharmacogenetic phenotypes (Rushmore & Kong 2002; Akiyama & Gonzalez 2003). Among the transcription factors, we can distinguish between those determining constitutive expression and those mediating enzyme induction, a common cause of temporal changes in drug metabolizing capacity. Key players in the regulation of genes discussed in this review include members of the nuclear hormone receptor superfamily (pregnane X receptor (PXR), constitutive androstane receptor (CAR), farnesoid X receptor (FXR), hepatocyte nuclear factor 4α (HNF4α), peroxisome proliferator-activated receptor alpha (PPAR-α), and vitamin D receptor (VDR)) and other transcription factors (HNF-1α, HNF-3, HNF6, CCAAT/enhancer-binding proteins (C/EBP), and albumin D-site binding protein (DBP)) (Akiyama & Gonzalez 2003). Studies of human polymorphisms in these genes may have important pharmacological implications, but current understanding is immature.

For example, PXR stimulates transcription of a number of drug-metabolizing enzymes (e.g., CYP3A4) (Lehmann et al., 1998; Goodwin et al., 1999; Schuetz et al., 2001; Tirona et al., 2003), as well as ABCB1 in the intestine (Geick et al., 2001). Because CYP3A4 cis-variants do not adequately account for observed drug phenotypes, variability in other factors such as PXR is implicated (Lamba et al., 2002). PXR expression correlates with CYP3A subfamily expression (Westlind-Johnsson et al., 2003). In vitro assays of PXR variants encoding altered proteins have demonstrated correlated changes in CYP3A4 expression, particularly in response to rifampicin induction of the enzyme (Hustert et al., 2001b; Zhang et al., 2001). However, these variants are infrequent, and a heterozygous carrier of one such polymorphism showed normal CYP3A4 metabolism (Zhang et al., 2001). Three silent mutations in PXR also correlated with changes in its expression level (Zhang et al., 2001). Moreover, PXR is also transcribed from an alternative initiation codon (Bertilsson et al., 1998). A 6-bp deletion in the promoter of this alternative transcript, hPAR-2, disrupts a predicted HNF1α binding site and abolishes transcription in a liver cell line, but again no human phenotype was observed (Uno et al., 2003).

HNFs play a major role in the liver-specific enhancement of transcription of many cytochromes (Akiyama & Gonzalez 2003). HNF1α and HNF4α polymorphisms have been widely scanned because of their relation to diabetes in many populations (Ryffel 2001), but associated effects on drug phenotypes are not well investigated. Functional HNF1α promoter polymorphisms, including one in a putative HNF4 binding site, have been described (Gragnoli et al., 1997). Mutations were recently shown to introduce a PTC and reduce transcript stability through nonsense-mediated decay (Harries et al., 2004). The documented role of HNFs in the expression of many genes of pharmacological relevance (e.g., HNF4 regulation of CYP3A4 (Tirona et al., 2003)) warrants further work on the effects of polymorphisms in these trans factors. Because genes are regulated by multiple factors, often with overlapping specificity, characterizing the effects of variation in trans factors remains a challenging task.

Ideally, for genes of critical pharmacological importance, we will eventually arrive at a multivariate understanding that accounts for variation at the target locus and for variation affecting the contextual trans inputs to that locus.

2.4 Summary

Accounting for genetic components of variation in all phenotypes must ultimately be done at cis-sequences with pleiotropic effects reverberating in trans (see Fig. 2.1).

The examples presented here demonstrate that, as pharmacogenetics proceeds, continuing identification of novel cis-regulatory variants and their functional effects is necessary. High-throughput genotyping for clinical pharmacological applications is increasingly feasible, and screening panels are easily developed, but the identification and selection of relevant alleles lags behind. This process remains difficult for the reasons discussed here, including population-specific sources of error, the limits of experimental approaches, and the challenges of investigating the many potential regulatory modes, including accounting for epistasis and epigenetics. If well-characterized genes are indicative, a few common polymorphisms within a population will cover the bulk of the genetic variation, while many less frequent polymorphisms will explain the rest. However, in genes that have accumulated many mutations, such as CYP2D6, a complete accounting of all functional polymorphisms is needed to permit prospective clinical applications, including drug choice and dose selection. Because frequent null mutations can combine with less frequent mutations on the sister chromosome (compound heterozygosity), even low-frequency variants may be clinically important. Haplotyping may be effective in identifying key combinations of polymorphisms, but insufficient marker density and the assumption of linkage over large sequences may lead to missed functional alleles. The standardization of nomenclature (see Table 2.3) and the integration of databases (see Table 2.2) also remain ongoing challenges as the number of marker and functional alleles continues to increase. Appropriate meta-analyses can also provide a bird’s-eye view of the most important alleles within specific populations (e.g., Phillips et al., 2001). Though newer drugs may evade cytochrome metabolism, the examples given here demonstrate that many classes of genes must be pursued as potential contributors to inter-individual differences in drug response, and efforts to identify their functional cis-regulatory variants will have lasting importance.
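The compound-heterozygosity point can be made concrete with Hardy-Weinberg proportions. A minimal sketch, assuming random mating and two hypothetical loss-of-function alleles of the same gene with frequencies q1 (common) and q2 (rare): individuals lacking any functional copy make up (q1 + q2)², of which 2·q1·q2 are compound heterozygotes. The function name and the frequencies below are illustrative only and are not drawn from any gene discussed here.

```python
def null_genotype_fractions(q_common: float, q_rare: float) -> dict[str, float]:
    """Hardy-Weinberg genotype fractions for two loss-of-function alleles of one gene.

    Assumes random mating, that a chromosome carries at most one of the two
    null alleles, and that q_common + q_rare <= 1.
    """
    if q_common < 0 or q_rare < 0 or q_common + q_rare > 1:
        raise ValueError("allele frequencies must be non-negative and sum to at most 1")
    return {
        "homozygous_common": q_common ** 2,
        "compound_heterozygous": 2 * q_common * q_rare,
        "homozygous_rare": q_rare ** 2,
        "total_without_functional_copy": (q_common + q_rare) ** 2,
    }


# Hypothetical frequencies: a common null allele at 20% and a rare one at 2%.
for genotype, fraction in null_genotype_fractions(0.20, 0.02).items():
    print(f"{genotype}: {fraction:.4f}")
```

With these assumed frequencies, the rare allele raises the fraction of individuals lacking a functional copy from 4.0% to 4.84%, about a fifth more, almost entirely through compound heterozygotes rather than rare-allele homozygotes.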

CHAPTER 3

3.1 Survey of allelic expression in human target tissues

Genetic, epigenetic, and environmental factors determine phenotypic variability, including susceptibility to disease and treatment outcome. Polymorphisms that change the encoded amino acid sequence, mostly single nucleotide polymorphisms in coding regions (cSNPs), are readily detectable and have been widely studied. Although more difficult to discover, regulatory polymorphisms (rSNPs) nevertheless appear to be more prevalent than functional nonsynonymous cSNPs (Rockman & Wray 2002; Yan et al., 2002; Bray et al., 2003; Hoogendoorn et al., 2003; Buckland 2004; Secko 2004; Stamatoyannopoulos 2004; Wittkopp et al., 2004; Yan & Zhou 2004; Buckland et al., 2005; Gilad et al., 2006; Pastinen et al., 2006), as judged from genome-wide surveys and from SNP association analysis with mRNA expression trait mapping (Cheung et al., 2003; Pastinen et al., 2006; Rockman & Kruglyak 2006). This has suggested pervasive effects of regulatory polymorphisms, acting both in cis and in trans, as main factors in human phenotypic evolution and variability (Rockman et al., 2005; Pastinen et al., 2006).

Expression profiling in primates revealed that expression levels are often under stabilizing selection, with ~5% of the genes analyzed inferred to have evolved under positive selection (Rockman & Kruglyak 2006). These results emphasize changes in gene regulation as a main mechanism driving human phenotypic evolution (Rockman et al., 2005). A third type of functional polymorphism affects mRNA processing (splicing, maturation, stability, transport) and translation (see Fig. 3.1) (Johnson et al., 2005). We refer to this class of polymorphisms, which do not affect gene transcription, as ‘structural RNA polymorphisms’ (srSNPs), and hypothesize that they may be more prevalent cis-acting modifiers than cSNPs and rSNPs.

Figure 3.1: Schematic of main types of functional polymorphisms

Recent advances have enabled large-scale searches for cis-acting factors, including rSNPs and srSNPs. Integrating mRNA expression profiles with genome-wide SNP data can detect cis- and trans-acting genetic variants (Pastinen et al., 2006). To control for trans-acting factors while detecting cis-acting ones, one can measure the expression of mRNAs from each of the two alleles in the same tissue sample, since both alleles share the same trans-acting environment. Allelic expression imbalance (AEI), i.e., a different number of mRNAs generated from one allele compared to the other, is a robust and quantitative phenotype directly linked to cis-acting factors (Pinsonneault et al., 2003; Johnson et al., 2005; Wang et al., 2005; Zhang et al., 2005; Lim et al., 2006; Pastinen et al., 2006; Pinsonneault et al., 2006). These factors include polymorphisms (e.g., SNPs, copy number variations) as well as epigenetic regulation (Grewal & Moazed 2003; Yan & Zhou 2004), including X-inactivation, imprinting, and gene silencing (Popendikyte et al., 1999; Petronis 2003; Sandovici et al., 2004; Carrel et al., 2005; Liu et al., 2005). Allele-specific analysis, both experimental and by mining of expressed sequence databases, has been successfully applied to the study of cis-acting polymorphisms (Cowles et al., 2002; Yan et al., 2002; Knight et al., 2003; Pinsonneault et al., 2004; Ge et al., 2005; Lin et al., 2005; Wang et al., 2005; Zhang et al., 2005; Lim et al., 2006; Pinsonneault et al., 2006; Wang et al., 2006; Lim et al., 2007), imprinting (Szabo & Mann 1995), and splice variants (Vandenbroucke et al., 2001; Khan et al., 2002). Occurrence of AEI indicates the possible presence of rSNPs or srSNPs, the former occurring over very large regions of the gene locus (Forton et al., 2007) and the latter located anywhere in transcribed exons and introns (affecting hnRNA or mRNA).
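Allele-specific measurements carry assay-specific bias, so the allelic ratio measured in mRNA (as cDNA) is usually interpreted against the ratio measured in genomic DNA from the same heterozygous subject, where the two alleles are present at 1:1. A minimal sketch, assuming signal intensities for the two alleles of a marker SNP are already available from some allele-specific assay: the function names and the 1.2-fold call threshold are hypothetical illustrations, not the procedure or cutoff used in this work.

```python
import math


def allelic_ratio(signal_a: float, signal_b: float) -> float:
    """Allele A : allele B signal ratio at a heterozygous marker SNP."""
    if signal_a <= 0 or signal_b <= 0:
        raise ValueError("allele signals must be positive")
    return signal_a / signal_b


def normalized_aei(cdna_a: float, cdna_b: float, gdna_a: float, gdna_b: float) -> float:
    """mRNA (cDNA) allelic ratio divided by the genomic DNA allelic ratio.

    Genomic DNA of a heterozygote carries the two alleles at 1:1, so its
    measured ratio estimates allele-specific assay bias; dividing by it leaves
    the imbalance attributable to expression. 1.0 means balanced expression.
    """
    return allelic_ratio(cdna_a, cdna_b) / allelic_ratio(gdna_a, gdna_b)


def shows_aei(ratio: float, fold_cutoff: float = 1.2) -> bool:
    """Hypothetical call rule: imbalance of at least fold_cutoff toward either allele."""
    return abs(math.log2(ratio)) >= math.log2(fold_cutoff)


# Hypothetical signals; the gDNA ratio reveals a 1.1-fold assay bias toward allele A.
ratio = normalized_aei(cdna_a=2200.0, cdna_b=1000.0, gdna_a=1100.0, gdna_b=1000.0)
print(round(ratio, 2), shows_aei(ratio))  # → 2.0 True
```

Working on the log2 scale treats a 2-fold excess of either allele symmetrically, which matters because the labelling of a marker's alleles as "A" and "B" is arbitrary.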

Using AEI as the phenotype can reveal the location of functional polymorphisms with a relatively small number of subjects. This works through linkage analysis by dense SNP scanning of the gene locus, as we have shown for MAOA, TPH2, MDR1, and OPRM1 (Pinsonneault et al., 2004; Wang et al., 2005; Zhang et al., 2005; Lim et al., 2006; Pinsonneault et al., 2006; Lim et al., 2007). The presence of more than one functional cis-acting polymorphism in a gene locus can result in complex patterns of AEI. In these cases, high precision and accuracy of AEI assays are essential for finding the responsible polymorphisms. Given an AEI assay of sufficient accuracy, AEI analysis can resolve contributions from genetic and epigenetic factors, as we have shown for MAOA (Pinsonneault et al., 2006).
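The mapping logic behind using AEI as a phenotype can be illustrated with a deliberately simplified sketch. It assumes a single, fully penetrant cis-acting variant and no epigenetic effects. Under that assumption, subjects showing AEI should be heterozygous at the responsible SNP, and subjects without AEI should be homozygous there. The function names, the concordance score, and the example genotypes are hypothetical illustrations, not the scoring used in the cited studies.

```python
from typing import Dict, List, Tuple


def concordance(has_aei: List[bool], is_het: List[bool]) -> float:
    """Fraction of subjects whose heterozygosity at a candidate SNP matches their AEI status.

    Simplifying assumption: one fully penetrant cis-acting SNP explains all AEI,
    so AEI-positive subjects are heterozygous and AEI-negative subjects homozygous.
    """
    if len(has_aei) != len(is_het) or not has_aei:
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(a == h for a, h in zip(has_aei, is_het))
    return matches / len(has_aei)


def rank_candidates(has_aei: List[bool],
                    genotypes: Dict[str, List[bool]]) -> List[Tuple[str, float]]:
    """Rank candidate SNPs (hypothetical IDs) by concordance, best first."""
    scores = {snp: concordance(has_aei, het) for snp, het in genotypes.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical data for five subjects, all heterozygous for the marker SNP.
    aei_status = [True, True, False, False, True]
    candidate_het = {
        "snpX": [True, True, False, False, True],
        "snpY": [True, False, True, False, True],
    }
    for snp, score in rank_candidates(aei_status, candidate_het):
        print(f"{snp}: concordance {score:.2f}")
```

In practice, several functional variants, incomplete penetrance and epigenetic effects all lower concordance. This is why the text stresses assay precision when AEI patterns are complex.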

Recent surveys for AEI in a limited number of genes suggest that 20-50% of genes show frequent detectable AEI (Yan et al., 2002; Bray et al., 2003; Morley et al., 2004; Pastinen et al., 2004; He et al., 2005). This is corroborated by large-scale array analysis of allelic ratios in mRNA compared with genomic DNA in blood lymphocytes (Pant et al., 2006). However, the impact of rSNPs and srSNPs depends on tissue context. Selecting a physiologically relevant tissue is therefore critical to understanding the biological significance of the results. Most samples assayed in previous surveys were drawn from in vitro transformed cell lines, which are known to exhibit problems such as clonality and aberrant methylation (Morley et al., 2004; Pastinen et al., 2004; He et al., 2005).

We surveyed allelic expression imbalance in 42 genes previously implicated in major diseases or in drug treatment outcomes, most of which had already been the subject of numerous genetic studies. We employed a variety of human post-mortem and biopsy tissues, including brain, heart, liver, intestine, and kidney, as well as blood cells and transformed B-lymphoblasts. This makes it the broadest and largest survey of AEI in primary human tissues to date. Our results indicate that AEI is relatively frequent in human tissues. We present methodological approaches for measuring AEI in human tissues. We also highlight analyses indicating the presence of frequent functional genetic variants in SOD2 and ACE.

3.2 Method for allelic expression survey

3.2.1 Tissue sources and processing

The choice of tissue is important when examining AEI. Each tissue in which a gene is functionally expressed displays a distinct profile of regulatory factors and proteins involved in mRNA processing, including splice factors. We obtained autopsy or biopsy tissue samples from various organs:

- liver (for drug metabolism);
- kidney and intestine;
- peripheral white blood cells;
- several brain regions, namely prefrontal cortex, hippocampus, ventral tegmental area (VTA), amygdala, and nucleus accumbens (all part of the neural reward circuits for drugs of abuse), and the pontine nuclei of the brain stem (for SERT and TPH2).

Heart tissues were left ventricular pieces collected from the failed hearts of transplant recipients under an IRB-approved protocol at The Ohio State University. We collected tissues from up to ~100 subjects from various sources and tissue banks. These included the OSU tissue procurement division, the NIH Cooperative Human Tissue Network, and tissue banks supported by NIDA. They also included 105 brain sections from the Stanley Foundation and tissue banks at the University of Maryland and the National Disease Research Interchange.

Ninety EBV-transformed B-lymphoblast lines were obtained from the CEPH collection (Coriell cell repository), consisting of 30 trios (two parents and one offspring). Most of these tissues are from normal subjects, while some are from subjects diagnosed with , bipolar disorder, Alzheimer's disease, and cancer. Allele frequencies may vary substantially with phenotype. However, the objective of this study was to assess genes on 30 to 100 different chromosomes in order to find functional polymorphisms with frequencies of 5% or more. We expected frequent functional polymorphisms to be present in both normal and diseased populations, albeit potentially at different frequencies. Clinical association studies are not included here.
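The goal of surveying 30 to 100 chromosomes per gene can be checked with a simple binomial calculation. This is a minimal sketch, assuming chromosomes are sampled independently. It ignores the additional requirement that a subject must also be heterozygous at a marker SNP before AEI can be measured. Under these assumptions, the probability of observing at least one copy of an allele with frequency q among n chromosomes is 1 - (1 - q)^n. The function names are illustrative.

```python
import math


def prob_at_least_one(q: float, n: int) -> float:
    """Probability that an allele of frequency q occurs at least once among n
    independently sampled chromosomes (binomial model)."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - q) ** n


def chromosomes_needed(q: float, target: float = 0.9) -> int:
    """Smallest n for which prob_at_least_one(q, n) >= target."""
    if not 0.0 < q < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("q and target must lie strictly between 0 and 1")
    # Solve (1 - q)^n <= 1 - target for n, then round up.
    n = math.ceil(math.log(1.0 - target) / math.log(1.0 - q))
    # Guard against floating-point rounding at the boundary.
    while n > 0 and prob_at_least_one(q, n - 1) >= target:
        n -= 1
    while prob_at_least_one(q, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    for n_chrom in (30, 46, 100):
        print(n_chrom, round(prob_at_least_one(0.05, n_chrom), 3))
    print(chromosomes_needed(0.05))
```

For an allele at 5% frequency, this gives about 0.785 for 30 chromosomes and 0.994 for 100. Forty-five chromosomes are needed to reach a 90% chance of sampling the allele at least once.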


3.2.2 Design, assay and analysis of allelic expression imbalance

We developed an analytical approach, optimized at each step of the procedure, to resolve problems with AEI measurements in human tissues. It enables detailed study of selected genes with a throughput of 150-500 samples per hour. The approach involves PCR amplification of the DNA or mRNA regions (the latter after conversion to cDNA) containing a marker SNP. This is followed by SNaPshot primer extension analysis of each allele (Pinsonneault et al., 2004; Wang et al., 2005; Zhang et al., 2005; Lim et al., 2006; Pinsonneault et al., 2006; Lim et al., 2007) (for examples see Fig. 3.2). A number of studies have applied similar methods (e.g., Bray et al., 2003). However, the relatively high throughput and robustness of our assay enabled us to study >40 candidate genes in relevant human tissues. Our study differs from earlier studies in several respects:

- increased use of multiple marker SNPs in transcribed regions where available;
- inclusion of gene-specific primers close to the marker SNP region during cDNA synthesis, to compensate for mRNA degradation;
- multiplexing of cDNA preparation, to permit analysis of multiple genes in the same tissue sample;
- measurement of allelic variation in relation to distinct splice variants.

Unlike previous studies, in most cases we also determined overall mRNA levels of the gene in the tissue of interest by RT-PCR. We did so because we have found that accurate AEI analysis requires robust expression. Our assays commonly yielded AEI ratios with a precision of 3-8% (standard deviation) for genomic DNA and 5-10% for mRNA, varying with each gene and tissue analyzed. This allows the detection of relatively small changes in allelic ratios in both DNA and mRNA. Based on analysis of assay variation, we applied a conservative threshold cutoff for calling AEI.

Where AEI was observed, we often applied further replication and validation. This included mixing homozygous DNA of both alleles to determine assay linearity, and the use of additional marker SNPs. The latter approach provides an independent second assay of AEI ratios in compound heterozygotes, as we have shown for MDR1, MAOA, TPH2, and CACNA1C (Wang et al., 2005; Pinsonneault et al., 2006; Wang et al., 2006; Lim et al., 2007). Results from two marker SNPs in the same gene may be discrepant in the same subject. This indicates either an analytical artifact or the presence of alternative splice variants whose splicing is affected by a cis-acting polymorphism. In these cases we search for additional markers and selectively amplify splice variants (Wang et al., 2006).
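Mixing homozygous DNA of both alleles at known proportions yields a standard curve for checking assay linearity. The minimal sketch below assumes the mixing fractions and the measured log2 allele ratios are already available as lists; the example values are hypothetical. It fits measured against expected log2 ratios by ordinary least squares. A slope near 1 with a high R² would suggest a linear readout. This regression is one reasonable way to summarize such a curve, not necessarily the analysis used in the cited work.

```python
import math
from typing import List, Sequence, Tuple


def expected_log2_ratios(fractions_allele1: Sequence[float]) -> List[float]:
    """Expected log2(allele1/allele2) for mixtures of the two homozygous DNAs,
    where each value is the fraction of allele-1 homozygous DNA in the mix."""
    out = []
    for f in fractions_allele1:
        if not 0.0 < f < 1.0:
            raise ValueError("mixing fractions must lie strictly between 0 and 1")
        out.append(math.log2(f / (1.0 - f)))
    return out


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit y = intercept + slope * x; returns (slope, intercept, r_squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return slope, intercept, r_squared


if __name__ == "__main__":
    fractions = [0.2, 0.33, 0.5, 0.67, 0.8]    # hypothetical mixing series
    measured = [-1.85, -0.98, 0.02, 0.97, 1.9]  # hypothetical measured log2 ratios
    slope, intercept, r2 = fit_line(expected_log2_ratios(fractions), measured)
    print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r2:.4f}")
```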

The assay readout in this study was a primer extension method (SNaPshot, Applied Biosystems, Inc., Foster City, CA) with fluorescence detection after capillary electrophoresis (ABI 3730). Details of the procedures have been published (Anderle et al., 2004; Wang et al., 2005; Zhang et al., 2005; Lim et al., 2006; Pinsonneault et al., 2006; Wang et al., 2006; Lim et al., 2007). All oligonucleotide primers used in the assays surveying AEI in 42 genes are listed in Table 3.1. After PCR amplification, the SNaPshot readout itself is highly reproducible and contributes little to assay variability. Variability arises largely from sample preparation and PCR processing, which are the steps that require optimization. With an assay throughput of 150 samples per hour, we can survey multiple genes in relatively short periods. We tested multiplexing of the allelic assay for higher throughput. However, we found that linearity of the readout has to be established for each combination of assays, which limits the utility of this approach.

For detection of AEI, we needed samples that are heterozygous for one or more marker SNPs. The selection criteria for a marker SNP were as follows:

1) the SNP was located in the transcribed region, coding or non-coding;
2) the minor allele frequency was high (0.15 to 0.50), to yield a sufficient number of informative samples;
3) SNPs were selected from validated entries in public databases or from SNPs previously associated with phenotypes in clinical studies;
4) when readily available, two marker SNPs were selected for each gene;
5) the SNP preferably lay at least 20 bp from exon boundaries, so that the same set of primers could be used to amplify the surrounding sequence in both DNA and RNA.
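The marker SNP criteria above can be expressed as a simple filter. This is a minimal sketch in which the record fields, function names and example SNP IDs are hypothetical. Criterion 5 ("preferably") is treated as a hard filter for simplicity. Eligible SNPs are ranked by expected heterozygosity under Hardy-Weinberg equilibrium, 2pq. At the 0.15 lower bound, 2 × 0.15 × 0.85 = 0.255, so roughly a quarter of subjects are expected to be informative.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class CandidateSNP:
    """Hypothetical record for one candidate marker SNP."""
    rsid: str
    gene: str
    maf: float                   # minor allele frequency
    transcribed: bool            # lies in a transcribed region (coding or non-coding)
    validated_or_reported: bool  # validated in a public database or reported clinically
    dist_to_exon_boundary: int   # distance in bp to the nearest exon boundary


def expected_heterozygosity(maf: float) -> float:
    """Heterozygote frequency 2pq expected under Hardy-Weinberg equilibrium."""
    return 2.0 * maf * (1.0 - maf)


def passes_criteria(snp: CandidateSNP, min_maf: float = 0.15,
                    max_maf: float = 0.50, min_dist: int = 20) -> bool:
    return (snp.transcribed
            and snp.validated_or_reported
            and min_maf <= snp.maf <= max_maf
            and snp.dist_to_exon_boundary >= min_dist)


def select_markers(snps: Sequence[CandidateSNP], per_gene: int = 2) -> Dict[str, List[str]]:
    """Keep SNPs that pass all filters; pick up to per_gene per gene, most informative first."""
    chosen: Dict[str, List[str]] = {}
    eligible = [s for s in snps if passes_criteria(s)]
    eligible.sort(key=lambda s: expected_heterozygosity(s.maf), reverse=True)
    for snp in eligible:
        picks = chosen.setdefault(snp.gene, [])
        if len(picks) < per_gene:
            picks.append(snp.rsid)
    return chosen


if __name__ == "__main__":
    candidates = [
        CandidateSNP("snp1", "GENE1", 0.40, True, True, 50),
        CandidateSNP("snp2", "GENE1", 0.20, True, True, 35),
        CandidateSNP("snp3", "GENE1", 0.30, True, True, 10),   # too close to exon boundary
        CandidateSNP("snp4", "GENE1", 0.45, False, True, 80),  # not transcribed
        CandidateSNP("snp5", "GENE1", 0.25, True, True, 100),
        CandidateSNP("snp6", "GENE2", 0.05, True, True, 60),   # minor allele too rare
    ]
    print(select_markers(candidates))
```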

Genomic DNA and RNA were prepared from peripheral lymphocytes, frozen tissue samples (brain, liver, etc.), or cell pellets (B-lymphocytes).

For whole blood extractions, the buffy coat was harvested and processed in one of two ways. For RNA extraction, red cells were lysed with ammonium chloride to yield a leukocyte pellet. For DNA purification, red and white cells were lysed with a sucrose/Triton solution to yield a nuclear pellet. Lymphocyte pellets were frozen for subsequent analysis.

For tissue extractions, frozen tissue samples were pulverized under liquid nitrogen, and the frozen powder was divided into aliquots for DNA and RNA extraction.

DNA was prepared by digesting the pellet or frozen powder with SDS and proteinase K, followed by NaCl “salting out” precipitation of proteins (Miller et al., 1988). The DNA in the supernatant was further purified and recovered by ethanol precipitation.

For RNA preparation, the starting material was homogenized in Trizol™ reagent. Chloroform was added to partition the solution into three fractions: an organic phase containing the proteins, an interphase containing DNA, and an aqueous phase containing the RNA. The RNA was recovered by isopropanol precipitation followed by centrifugation. For additional purification, the RNA precipitate was dissolved in RNase-free water or Qiagen buffer. It was then purified on Qiagen RNeasy™ columns according to the manufacturer's instructions.

Complementary DNA (cDNA) was generated from the mRNA with Superscript II reverse transcriptase (Invitrogen), using oligo dT as a primer. This is intended to represent all polyadenylated transcripts in proportion to the original abundance of each gene product. However, oligo dT priming often fails in autopsy tissues, particularly where the marker SNP for AEI analysis lies far upstream of the polyadenylation site. We therefore primed cDNA synthesis with gene-specific oligonucleotide primers targeting a region immediately 3' of the marker SNP.

Because tissue samples are often in limited supply, we multiplexed up to 25 primers, permitting 25 AEI assays per cDNA preparation. Where tested, comparisons between single and multiple primers showed no significant differences. We have successfully extracted DNA and RNA from small frozen brain sections, using protocols we established for optimal results with SNaPshot analysis (Zhang et al., 2005; Lim et al., 2006; Pinsonneault et al., 2006; Lim et al., 2007). Since most of our assays are highly multiplexed, the minimum requirements for multiple analyses are on the order of 100 ng of DNA and 1 µg of RNA (depending on quality).

We adapted this primer extension method (Matyas et al., 2002; Norton et al., 2002) for the ABI 3730. A stretch of genomic DNA (~70 bp), or of cDNA made from the same tissue, is amplified by polymerase chain reaction (PCR). The polymorphism is then measured by primer extension using fluorescently labeled terminator nucleotides. The products are analyzed on an ABI 3730 capillary electrophoresis DNA analyzer and quantified with GeneMapper™ 3.0 (ABI) software. This permits injection of 48 samples every 30 min.

The signal for each incorporated fluorescently labeled nucleotide is measured as a peak area, and the position of the peak corresponds to the length of the extension primer. Varying the length of the extension primers permits multiplex analysis. Assay throughput therefore ranges from ~150 samples per hour (single SNP) to ~450 samples per hour (multiplexed to 3 SNPs). In each case, however, any effect of multiplexing on ratio measurements needs to be controlled for. The PCR amplification step cannot be readily multiplexed; instead, we combine singly amplified PCR fragments for SNaPshot analysis.

An important criterion is that both alleles display similar PCR amplification rates, so that both alleles of a heterozygous sample are amplified equally in genomic DNA and cDNA. Small differences in allelic amplification can be controlled for by normalizing the mRNA ratios to the DNA ratios, which are set to 1. If substantial differences are observed, we redesign the PCR primers. Our success rate is >90%.

We have further observed that allelic ratio measurements become erratic if mRNA levels are too low. The cutoff for reproducible results is a cycle threshold (Ct) of 26-28 in real-time RT-PCR; we do not use the method above this cutoff. For maximum precision, we dilute samples as needed so that DNA and mRNA thresholds are similar. Using cloned DNA, we establish the number of mRNA molecules present (1,000 or more for optimal results). For SNaPshot, optimized PCR conditions are used in 15 µl reactions.
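The Ct cutoff and the dilution step described above can be sketched as follows. The single cutoff value of 27 cycles is a hypothetical choice within the 26-28 range given in the text. The dilution calculation assumes perfect doubling per PCR cycle, so a difference of d cycles corresponds to a 2^d-fold difference in starting template. The function names are illustrative.

```python
from typing import Tuple

# Hypothetical single cutoff chosen from the 26-28 cycle range given in the text.
CT_CUTOFF = 27.0


def usable_for_aei(ct: float, cutoff: float = CT_CUTOFF) -> bool:
    """True if the real-time RT-PCR cycle threshold is low enough (enough template)
    for reproducible allelic ratio measurements."""
    return ct <= cutoff


def dilution_to_match(ct_dna: float, ct_cdna: float) -> Tuple[str, float]:
    """Return which template to dilute, and by what fold, so both reach a similar Ct.

    Assumes perfect doubling per cycle (100% PCR efficiency), so a difference of
    d cycles corresponds to a 2**d difference in starting template.
    """
    diff = ct_cdna - ct_dna
    if diff > 0:
        # DNA crosses the threshold earlier, i.e. it holds more template: dilute DNA.
        return "DNA", 2.0 ** diff
    if diff < 0:
        # cDNA crosses the threshold earlier: dilute cDNA.
        return "cDNA", 2.0 ** (-diff)
    return "none", 1.0


if __name__ == "__main__":
    print(usable_for_aei(25.5), usable_for_aei(29.0))
    print(dilution_to_match(ct_dna=22.0, ct_cdna=25.0))
```

Real amplification efficiencies are usually somewhat below 100%. Dilution factors from this sketch are therefore approximate starting points rather than exact values.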

Amplification conditions consist of 25-35 cycles of denaturation at 95 °C for 30 seconds, primer annealing at 60 °C for 1 minute, and extension at 72 °C for 1 minute. After amplification, the reactions are treated with exonuclease I and Antarctic alkaline phosphatase (New England Biolabs). This digests leftover PCR primers and removes 5' phosphate groups from the remaining unincorporated deoxynucleotides.

For the primer extension reaction, a gene-specific primer is designed with its 3' end immediately adjacent to the SNP position. As an internal control, we use primers in both directions. During the reaction, a single fluorescently labeled dideoxynucleotide (ddNTP) is added to the 3' end of the primer in a template-dependent manner. The SNaPshot™ kit from Applied Biosystems (ABI) supplies a different fluorescent dye label for each ddNTP, plus a modified Taq enzyme that readily incorporates the labeled dideoxynucleotides.

Allelic mRNA peak areas are normalized to the mean genomic allele ratio of heterozygous samples, which is considered to be 1 unless a sample shows gene dosage effects. Standard deviations for heterozygous genomic DNA samples range from 3-6%. mRNA/cDNA ratios are typically assayed in independent triplicate samples, each assayed twice. They show higher standard deviations (5-10%) than DNA, and the detectability of AEI varies with sample quality and the SNP analyzed. We have developed numerous controls:

- standard curves using cloned gene fragments containing the marker SNP;
- more than one marker SNP per gene;
- extension primers in both directions;
- 3 independent cDNA preparations;
- gene-specific primers for cDNA preparation.
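The normalization described above can be sketched as follows. For one subject, each replicate cDNA allele ratio (major over minor peak area) is divided by the mean genomic DNA allele ratio of heterozygous samples, so that equal allelic expression maps to 1, or 0 on a log2 scale. The replicates are then summarized by their mean and standard deviation. This is a minimal sketch. Input layout, function names and the choice to report the standard deviation in log2 units are assumptions, not the published analysis pipeline.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple

Peaks = Tuple[float, float]  # (major-allele peak area, minor-allele peak area)


def allele_ratio(peaks: Peaks) -> float:
    """Major/minor peak-area ratio for one SNaPshot measurement."""
    major, minor = peaks
    if major <= 0 or minor <= 0:
        raise ValueError("peak areas must be positive")
    return major / minor


def normalized_log2_aei(cdna_peaks: Sequence[Peaks],
                        gdna_peaks: Sequence[Peaks]) -> Tuple[float, float]:
    """Mean and SD (log2 units) of the normalized AEI ratio for one subject.

    cdna_peaks: replicate cDNA assays for the subject.
    gdna_peaks: genomic DNA assays of heterozygous samples; their mean ratio
    defines the 1:1 reference.
    """
    if not cdna_peaks or not gdna_peaks:
        raise ValueError("need at least one cDNA and one genomic DNA measurement")
    reference = mean(allele_ratio(p) for p in gdna_peaks)
    values = [math.log2(allele_ratio(p) / reference) for p in cdna_peaks]
    spread = stdev(values) if len(values) > 1 else 0.0
    return mean(values), spread


if __name__ == "__main__":
    gdna = [(1210.0, 1000.0), (1180.0, 990.0), (1195.0, 1005.0)]      # hypothetical
    cdna = [(2350.0, 1000.0), (2480.0, 1010.0), (2300.0, 980.0)]      # hypothetical
    m, s = normalized_log2_aei(cdna, gdna)
    print(f"log2 AEI = {m:.2f} +/- {s:.2f}")
```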

The SNaPshot primer extension reaction per se yields allelic ratios but not absolute levels of mRNA. For most of the genes in this report we used standard real-time

PCR conditions on an ABI7000 cycler with SYBR-Green as the marker dye to determine the amount of mRNA present relative to an internal standard (beta-actin or GAPDH).
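The text does not state how relative mRNA amounts were computed from the real-time PCR data. A common choice is the comparative (delta-Ct) method, sketched below under the assumption that the target and reference amplicons (e.g., beta-actin or GAPDH) share the same per-cycle amplification factor. The function names and the optional calibrator-based fold change (2^-ΔΔCt) are illustrative.

```python
def relative_expression(ct_target: float, ct_reference: float,
                        efficiency: float = 2.0) -> float:
    """Target mRNA level relative to a reference gene by the delta-Ct method.

    Assumes both amplicons share the same per-cycle amplification factor
    (2.0 = perfect doubling each cycle).
    """
    if efficiency <= 1.0:
        raise ValueError("amplification factor per cycle must exceed 1")
    return efficiency ** (-(ct_target - ct_reference))


def fold_change(ct_target: float, ct_reference: float,
                ct_target_cal: float, ct_reference_cal: float) -> float:
    """Expression in a sample relative to a calibrator sample (2^-delta-delta-Ct)."""
    return (relative_expression(ct_target, ct_reference)
            / relative_expression(ct_target_cal, ct_reference_cal))


if __name__ == "__main__":
    # Hypothetical Ct values: target gene at 24 cycles, reference gene at 18 cycles.
    print(relative_expression(24.0, 18.0))
    print(fold_change(22.0, 18.0, 24.0, 18.0))
```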

Symbol | SNP | Forward PCR primer | Reverse PCR primer | Extension primer
ACE | rs4309 | 5’TGAGATGGGCCATATACAGTACTAC3’ | 5’CCCGACGCAGGGAGAC3’ | 5’CTGCAGTACAAGGATCTGCC3’
ACE | rs4343 | 5’CCCTTACAAGCAGAGGTGAGCTAA3’ (DNA); 5’ACCACCTACAGCGTGGCC3’ (cDNA) | 5’CATGCCCATAACAGGTCTTCATATT3’ | 5’GACGAATGTGATGGCCAC3’
MCP1 (CCL2) | rs4586 | 5’ATGCAATCAATGCCCCAGTC3’ | 5’GCGAGCCTCTGCACTGAGAT3’ | 5’AGATCTTCCTATTGGTGAAGTTATA3’
MCP1 (CCL2) | rs13900 | 5’CAACCCAAGAATCTGCAGCTAA3’ | 5’GGCATAATGTTTCACATCAACAAAC3’ | 5’TAGCTTTCCCCAGACACC3’
SOD2 | rs4880 | 5’GGTTGTTCACGTAGGCCG3’ | 5’CAGCAGGCAGCTGGCT3’ | 5’GAGCCCAGATACCCCAAA3’
SOD2 | rs5746092 | 5’TTGCGGCGCAGCTGG3’ | 5’CTGAAGCCGCTGCCGAA3’ | 5’GGGCCTTAAGAAAGCGC3’
NNMT | rs4646335 | 5’GTCCTGTCTCTCTGAACTTTGGG3’ | 5’GAGCTGTATGCAATGCTTGCC3’ | 5’ATTGTAGACCAGAGGGAGCACT3’
LPL | rs1059611 | 5’TAAAGCAGCACATAGCACTGG3’ | 5’GCAGATAGCCACAATGACCTT3’ | 5’CCTTTCCAATATGTACAAGCTCC3’
HMGCR | rs12916 | 5’GCAAATATAAGCTGGGAAAAAAGTTT3’ | 5’AAATCCATTTTCAACTGGCAGG3’ | 5’AATTAACTACAAAATCAGGAGTTTCATCAG3’
CSF1 | rs333970 | 5’TTCCTCTCAGCATCTTCTCCAC3’ | 5’GGGCAGATGGATGGTCTGTC3’ | 5’GCCGGCAGATGTAACTGGTAC3’
PTGDS | rs6926 | 5’GGGCTGAAGCTGGGATC3’ | 5’CTGACTTGCTTCCGGAGTTT3’ | 5’CTCCCCGCCAAAGCA3’
HIF1A | rs2057482 | 5’CATTCCTTTTTTTGGACACTGGT3’ | 5’CAAGTTTGTGCAGTATTGTAGCCA3’ | 5’ATGTAGAAAATATAAATAGACTGCTTTAGGTA3’
NOS3 | rs1799983 | 5’GAAACGGTCGCTTCGACGT3’ | 5’GGCAGAAGGAAGAGTTCTGGG3’ | 5’AGTAACCTTGGAACCTTGGTGCAGGCCCCAGATGA3’
FLT1 | rs2296189 | 5’AATACTCCGTAAGACCACACGTC3’ | 5’ACTCGACTTCCTCTGAAATGGA3’ | 5’GCTGTAGATTTTGTCAAAGATAGATTC3’
CACNA1C | rs1544514 | 5'GAACGAGTGGAATATCTCTTTCTCATAA3' | 5'GCGGAGGTAGGCATTGGG3' | 5'CATAATTTTTACGGTGGAAGC3'
CACNA1C | rs216008 | 5'CCAGAGCTGCCTGTTCAAAAT3' | 5'ATGAGCTTCAGGATCATCTCCAC3' | 5'TGCTCTTCACTGGCCTCTT3'
ADRB2 | rs1042719 | 5'TTCCAGGAGCTTCTGTGCCT3' | 5'GCCGTTGCTGGAGTAGCC3' | 5'TCTTCTTTGAAGGCCTATGG3'
KCNMB1 | rs11739136 | 5'CAGGAATCCAAGTGCCACC3' | 5'CCACAGGCATGGTACTGG3' | 5'CCAACATCAGGGACCAGGAG3'
KCNMB1 | rs2656842 | 5'AAGTAGAGCCATCCATCCATGC3' | 5'GATTGGACTGGAAGAGTGGG3' | 5'CTGCTCCCCACTTGCAG3'


Table 3.1: Oligonucleotide primers used for PCR amplification and SNaPshot primer extension reactions for the 42 genes surveyed. Reverse PCR primers were also used for gene-specific cDNA synthesis.

VKORC1 | rs7294 | 5’TTTTCCTAACTCGCCCGCT3’ | 5’TGGGTGTAAAAAAGAGCGAGC3’ | 5’CCTCCTCCTGCCATACCC3’
GGCX | rs699664 | 5’GAGTGGCCTCGGAAGCTG3’ | 5’GGAAACACTGGGCTGAGGG3’ | 5’GGTGTCCTACTGCCCCC3’
CETP | rs5884 | 5’TCACCATGGGCATTTGATT3’ | 5’CCACAGCGGTGATCATTGAC3’ | 5’TGAGAGCAGCTCCGAGTCC3’
HMOX1 | rs11555832 | 5’GGAGGTTTGAGACAGCTGCC3’ | 5’CTGCAGCAGAGCCTGGAAG3’ | 5’CTGCAGCAGAGCCTGGAAG3’
ABCB1 | rs1045642 | 5’CCTATGGAGACAACAGCCGG3’ | 5’GGCATGTATGTTGGCCTCCT3’ | 5’CTCCTTTGCTGCCCTCAC3’
ABCB1 | rs1128503 | 5’TTTCTCACTCGTCCTGGTAGATCTT3’ | 5’ACTGTTTCCAACCAGGGCC3’ | 5’CTCTGCACCTTCAGGTTCAG3’
ABCB1 | rs2032582 | 5’TTGTTGAAATGAAAATGTTGTCTGG3’ | 5’CAATCATATTTAGTTTGACTCACCTTCC3’ | 5’TGAAAGATAAGAAAGAACTAGAAGGT3’
KCNMB1 | rs2656841 | 5’AAGTAGAGCCATCCATCCATGC3’ | 5’GATTGGACTGGAAGAGTGGG3’ | 5’GCAGGTGGAGAAGGCATTG3’
CYP2D6 | rs769258 | 5’TGTGTCCAGAGGAGCCCAT3’ | 5’GGCTCACCAGGAAAGCAAA3’ | 5’ACCGCCCGCCTGTGCCCATCA3’
CYP2D6 | rs1058164 | 5’TGTGTCCAGAGGAGCCCAT3’ | 5’GGCTCACCAGGAAAGCAAA3’ | 5’CGAGCAGAGGCGCTTCTCCGT3’
CYP2D6 | rs16947 | 5’TGTGTCCAGAGGAGCCCAT3’ | 5’GGCTCACCAGGAAAGCAAA3’ | 5’AGCTTCAATGATGAGAACCTG3’
CYP2D6 | rs1135840 | 5’GGCCCAGCCACCATG3’ | 5’GCACAGCACAAAGCTCATAGGG3’ | 5’GTGTCTTTGCTTTCCTGGTGA3’
CYP2C9 | rs9332242 | 5’GGATTTGTGTGGGAGAAGCC3’ | 5’TGAAACATAGGAAACTCTCC3’ | 5’AATGCCTTTTCTCACC3’
CYP2C9 | rs2017319 | 5’GGATTTGTGTGGGAGAAGCC3’ | 5’TGAAACATAGGAAACTCTCC3’ | 5’GGCGGCACAGAGGCAAA3’
SLC15A1 | rs1339067 | 5’ACATTTCTTCTCCTGGATCACCA3’ | 5’ACACTAGAAGCGTGTGGCGTT3’ | F: 5’CTGGATCACCAGTCACTGC3’; R: 5’CTGCTTGAAGTCGTCAGTTAC3’
SLC15A2 | rs1143670 | 5’AGGAAAATGGCTGTTGGTATGATC3’ | 5’CGCAACTGCAAATGCCAG3’ | 5’GCTGTTGGTATGATCCTAGC3’
SLC15A2 | rs1143671 | 5’GAAATGGCCCCAGCCC3’ | 5’CATCTGCCAGATTCAAGACTTGTAG3’ | 5’AACCTCCTGGGGACCTG3’
DRD2 | rs6277 | 5’CCAGCTGACTCTCCCCGAC3’ | 5’GCATGCCCATTCTTCTCTGG3’ | 5’CGATCACATGTCGTGAACTGACTGACTGGTTTGGCGGGGCTGTC3’
DRD2 | rs6275 | 5’CCAGCTGACTCTCCCCGAC3’ | 5’GCATGCCCATTCTTCTCTGG3’ | 5’GGAGTGCTGTGGAGACC3’
DRD2 | rs6279 | 5’AGCCTGAGTCAGGGCCC3’ | 5’ACCGCCTGCTCCACG3’ | 5’CCCAGAGGCTGAGTTTTCT3’
CHRNA4 | rs1044393 | 5’TGAACATGCACAGCCGC3’ | 5’CGAAGGCATAGGTGATGTCC3’ | 5’TTGTAGGTGCCCACGGC3’
CHRNA4 | rs1044397 | 5’CACATGCAAGAAGGAGCCCT3’ | 5’GGTGGTCTGCAATGTACTGGA3’ | 5’CCGCAGCACCAAAGC3’
CHRNA4 | rs2236196 | 5’CCCTCTCCTAGCGAAGCAGAT3’ | 5’GGTCCTTGAGCCTCTCGGG3’ | 5’CTAGCGAAGCAGATTGGAGC3’



OPRM1 | rs1799971 | 5’CCGTCAGTACCATGGACAGC3’ | 5’GAGTACGCCAAGGCATCAGT3’ | F: 5’ACTGATCGACTTGTCCCACTTAGATGGC3’; R: 5’ACTGACTGACTGACCATGGGTCGGACAGGT3’
HTR2A | rs6313 | 5’GACACCAGGCTCTACAGTAATGACTTT3’ | 5’TGTCCAGTTAAATGCATCAGAAGTG3’ | 5’ACTGACTGACTGGAACCTTGACTCATCAGAAGTGTTAGCTTCTCC3’
BDNF | rs6265 | 5’GCTTGACATCATTGGCTGACA3’ | 5’CTGGTCCTCATCCAACAGCTC3’ | 5’AACCTTGGAACCTTGGGGCTGACACT3’
SLC6A4 | rs1042173 | 5’TATCTGTTTGCTTCTAAAGGTTTC3’ | 5’TGGACACACTATTTTTCATTTTAG3’ | 5’GGTTCTAGTAGATTCCAGCAATAAAATT3’
TPH2 | rs7305115 | 5’ACGAGACTTTCTGGCAGGACTG3’ | 5’TTAATTCTCCAATGGAGGAAAGGA3’ | 5’GATCCCCTCTACACCCC3’
TPH2 | rs4290270 | 5’ACGAGACTTTCTGGCAGGACTG3’ | 5’TTAATTCTCCAATGGAGGAAAGGA3’ | 5’AAAGGAGTCCTGCTCCATA3’

SLC6A2 | rs5569 | 5’ATGGGAGGCATGGAGGCTGTC3’ | 5’CGAGAAGGAAAGTGCTGAAGGTGAC3’ | 5’GGCATGGAGGCTGTCATCAC3’
DTNBP1 | rs1047631 | 5’ATCCAGTTTTGGCTGTATGC3’ | 5’CTGTTCTTTAAGTTTCTCACACATT3’ | 5’TGTTTTATAGAGGTTCTTGATTTTTAC3’
NRG1 | SNP8NRG433E1006 | 5’CTGCTGCCACTACTGCTGCTGCT3’ | 5’CACCTTTCCCTCGATCACCA3’ | 5’TAGCACACCGAGGCCC3’
HTR1B | rs6296 | 5’CCACGTCCTCGGTCACCTCTATTAAC3’ | 5’CACAATAAAGGCTCCCAAAATGATCC3’ | 5’TCGAAATCCGGATCTCCTGTGTATGT3’
MAOA | rs1137070 | 5’AAATGGTCTCGGGAAGGTGA3’ | 5’TTTGATTCAGGTTCTTGTACCCAG3’ | 5’GGAAGGTGACCGAGAAAGA3’
MAOA | rs6323 | 5’ACTTCAGACCAGAGCTTCCAGC3’ | 5’ATGCACTTAATGACAGCTCCCA3’ | 5’GAGAAACCAGTTAATTCAGCG3’
DRD3 | rs6280 | 5’TCTGCCCCACAGGTGTAGTTC3’ | 5’GGCATCTCTGAGCCAGCTG3’ | 5’ATCTCTGAGCCAGCTGAGT3’
ESR1 | rs3798577 | 5’TGGTGTTGCATTTAGCCCTGG3’ | 5’AGCCACAACAATCCTGCACA3’ | 5’GGCATGGAGCTGAACAGTAC3’
SLC6A3 | rs6347 | 5’TTCATCATCTACCCGGAAGCC3’ | 5’GAAGAAGACCACGGCCCAG3’ | 5’ACGCTCCCTCTGTCCTC3’
COMT | rs4633 | 5’GTGACACCAAGGAGCAGCG3’ | 5’TGTCAATGGCCTCCAGCAC3’ | 5’AGCGCATCCTGAACCA3’
DAO | rs2070588 | 5’GACGGGACTGATAACAGCAGC3’ | 5’CACAAGCATCCATTCATCCAA3’ | 5’CATCCAAGTCTCCCAACACT3’
NR3C1 | rs6196 | 5’GGCAGTCACTTTTGATGAAACAGA3’ | 5’GAGTATTGAATTCCCCGAGATGTTAG3’ | 5’CAATCAGATACCAAAATATTCAAA3’
NQO2 | rs1143684 | 5’GCTGCACCGTCACAGTGTCT3’ | 5’TGATATCTTTGTCTGTGGCCCTC3’ | 5’GTCTGATTTGTATGCCATGAAC3’

3.2.3 Computational analysis of mRNA folding

As mRNA function and stability may be related to folding structure, we used Mfold version 3.0 to estimate the effect of polymorphisms (SNPs) on mRNA folding (Mathews et al., 1999; Zuker 2003). Wild-type RefSeq mRNA sequences of OPRM1, OPRD1, and OPRK1 were obtained without untranslated regions. A custom Unix program created every possible single-base substitution at each position. It fed the resulting sequences to Mfold for structure prediction and subsequent automated analysis. Two measures were calculated relative to the wild-type structure: the change in minimum free energy, and pairwise comparisons of structural interactions (paired vs. unpaired bases). Both were computed in sliding windows around the induced variants and across the complete mRNA structure.

OPRM1, OPRK1, and OPRD1 homologs were gathered through BLAST searches with the human mRNA sequences, then trimmed and aligned with ClustalW. Pfam identities were calculated for each base position separately within each receptor group, and also for overall similarity among all opioid receptors. No significant correlation was observed between base conservation and predicted effects on mRNA structure, either within receptor subtypes or across all receptors (data not shown). The human transcriptome-wide predictions are described in Chapter 6.
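The variant-scanning steps described above can be sketched as a few helper functions. The folding itself (Mfold) is not reproduced here. The sketch assumes that predicted structures have been converted to dot-bracket notation ('.' for unpaired, brackets for paired bases) and that minimum free energies are available as numbers. All function names are hypothetical. The helpers enumerate every single-base substitution, extract a window around a variant position, count bases whose paired/unpaired state changes, and compute the free-energy difference.

```python
from typing import Iterator, Tuple

RNA_BASES = "ACGU"


def single_nucleotide_variants(seq: str) -> Iterator[Tuple[int, str, str]]:
    """Yield (position, alternative base, variant sequence) for every possible
    single-base substitution; DNA input is converted to RNA (T -> U)."""
    rna = seq.upper().replace("T", "U")
    for i, ref in enumerate(rna):
        for alt in RNA_BASES:
            if alt != ref:
                yield i, alt, rna[:i] + alt + rna[i + 1:]


def window(seq: str, pos: int, half_width: int) -> str:
    """Subsequence centred on pos, clipped at the sequence ends."""
    start = max(0, pos - half_width)
    end = min(len(seq), pos + half_width + 1)
    return seq[start:end]


def paired_status_changes(wild_type: str, variant: str) -> int:
    """Count positions whose paired/unpaired state differs between two structures
    given in dot-bracket notation ('.' = unpaired, brackets = paired)."""
    if len(wild_type) != len(variant):
        raise ValueError("structures must have equal length")
    return sum((a == ".") != (b == ".") for a, b in zip(wild_type, variant))


def delta_mfe(mfe_wild_type: float, mfe_variant: float) -> float:
    """Change in minimum free energy (kcal/mol) caused by the variant."""
    return mfe_variant - mfe_wild_type


if __name__ == "__main__":
    variants = list(single_nucleotide_variants("ACGT"))
    print(len(variants), variants[0])
    print(window("ACGUACGU", 4, 1))
    print(paired_status_changes("((..))", "(....)"))
```

A sequence of length L yields 3L variants. Each variant would then be folded, and its structure compared with the wild type both within a window and over the full length.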

3.3 Results of allelic expression survey

3.3.1 Allelic expression results in 42 candidate genes

AEI analysis was applied to 42 candidate genes (see Table 3.2). It was performed in human biopsy and autopsy specimens (brain, liver, kidney, heart, intestine, peripheral blood monocytes/macrophages) and in transformed B-lymphoblasts (CEPH/Utah, Coriell repository). The candidate genes are divided into three groups: cardiovascular disorders, CNS disorders, and drug metabolism and transport. No attempt was made to cover a majority of candidate genes; there are several hundred candidates for cardiovascular diseases alone. Rather, a representative selection was sought to obtain information on the frequency of cis-acting factors.

We analyzed >4,200 individual genotypes and identified 1,008 informative heterozygous samples for use in AEI assays. On average, we surveyed 46 individual chromosomes per gene for expression imbalance, with an average marker SNP heterozygosity of ~24%. Target tissue samples were limited. This study therefore aimed at finding relatively frequent functional polymorphisms (generally >5% minor allele frequency) and was statistically well powered to do so. This design is similar to previous studies of allelic expression imbalance (Yan et al., 2002; Pastinen et al., 2004; He et al., 2005).

Details on genes, tissue sources, number of samples analyzed, marker SNPs, heterozygote frequency, and number of replicate assays are given in Table 3.2. Table 3.3 contains results for those genes showing significant AEI in at least one sample. It also lists the marker SNPs and the frequency, magnitude, and direction of AEI. As a conservative detection threshold for mRNA AEI ratios (major over minor allele), we use ±0.5 on the log2 scale (ratios of about 1:1.4 or 1.4:1). Assay readouts for selected samples that meet the threshold cutoff are shown in Figure 3.2. Given the precision of the assays, this threshold minimizes the number of false positives. Smaller AEI ratios may also be physiologically relevant but should be subjected to more extensive validation to exclude artifacts.

The results reveal AEI above our threshold in 67% of genes, with relatively frequent AEI (two or more samples) observed in 55% of genes. Several well-studied genes, such as ACE and SOD2, displayed a surprising magnitude and frequency of AEI that had not been recognized in previous genetic analyses. This implies the presence of as yet unrecognized functional polymorphisms, even though these genes had been under intense scrutiny.
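The threshold and the per-gene summaries reported in Table 3.3 can be sketched as follows. The cutoff of ±0.5 in log2 units comes from the text. Note that 2^0.5 ≈ 1.414, so a 1.4:1 ratio falls just below the cutoff. The direction codes follow the Table 3.3 legend (">1", "<1", and "<1>"). Here "<1>" is interpreted as "significant samples in both directions", and direction is computed only from samples passing the threshold. Both choices are assumptions made for this illustration, as are the function names.

```python
import math
from typing import Sequence, Tuple

THRESHOLD_LOG2 = 0.5  # conservative cutoff from the text: |log2(major/minor)| >= 0.5


def classify(log2_ratio: float, threshold: float = THRESHOLD_LOG2) -> str:
    """Call one sample's normalized AEI: '>1', '<1', or 'none' (below threshold)."""
    if log2_ratio >= threshold:
        return ">1"
    if log2_ratio <= -threshold:
        return "<1"
    return "none"


def summarize_gene(log2_ratios: Sequence[float],
                   threshold: float = THRESHOLD_LOG2) -> Tuple[float, str]:
    """Percent of heterozygous samples with AEI, and the direction code
    ('>1', '<1', '<1>' for both, or 'none') among the significant samples."""
    if not log2_ratios:
        raise ValueError("need at least one sample")
    calls = [classify(r, threshold) for r in log2_ratios]
    hits = {c for c in calls if c != "none"}
    percent = 100.0 * sum(c != "none" for c in calls) / len(calls)
    if hits == {">1", "<1"}:
        direction = "<1>"
    elif hits:
        direction = next(iter(hits))
    else:
        direction = "none"
    return percent, direction


if __name__ == "__main__":
    print(2 ** THRESHOLD_LOG2)                      # ratio at the cutoff, about 1.414
    print(classify(math.log2(1.4)))                 # a 1.4:1 ratio, just under the cutoff
    print(summarize_gene([0.1, 0.7, -0.6, 0.2]))    # hypothetical log2 AEI values
```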

In a few cases, our AEI data confirm previous studies by other investigators. For example, the modest AEI ratios observed for COMT (see Table 3.3) fall in a range similar to that reported previously (Bray et al., 2003). For NQO2, we found a frequency and extent of AEI similar to those in a previous survey in white blood cells. This suggests a common cis-acting allele that remains undiscovered (He et al., 2005). We did not observe significant AEI in HTR2A (5HT2A), consistent with previous reports (Bray et al., 2004; Pastinen et al., 2004). A more recent study does report some AEI (Fukuda et al., 2006), although without rigorous validation of the results. These results support the validity of our analysis and extend the number of chromosomes analyzed for genetic defects; that is, they further reduce the upper frequency limit of any unrecognized polymorphism in

Name | Symbol | Indication | Tissue (number of subjects analyzed) | Marker SNP | % het | Replicates (n) or assay validation

Cardiovascular disease candidate genes 63 Heart (31) 51 6 Angiotensin I converting Blood pressure, ACE Small bowel (32) rs4309 (exon 9) 37 4 enzyme CAD Liver (10) 53 4 rs4343 (exon 17) 40 6 CAD, SLE, Monocyte chemoattractant Heart (29) CCL2 inflammation, rs4586 (exon 2) 45 4 protein 1 Monocytes (26) infection rs13900 (3’UTR) 46 4 Superoxide dismutase, Oxidative stress, SOD2 Heart (34) rs4880 (exon 2) 52 4 mitochondrial cancer rs5746092 (5’UTR) 38 4 Nicotinamide N- Homocysteine Liver (19) NNMT rs4646335 (exon 1) 33 2 methyltransferase levels Small bowel (20)


Table 3.2: Information for 42 genes surveyed for allelic expression imbalance in human tissues.


Triglyceride, Lipoprotein lipase LPL cholesterol Heart (15) rs1059611 (exon 10) 23 3 metabolism Cholesterol 3-hydroxy-3- synthesis, direct Liver (8) 23 1 methylglutaryl-CoA HMGCR rs12916 (exon 19) target B-lymph (18) 20 1 reductase (statins) Colony-stimulating factor, Inflammation, CSF1 Monocytes (18) rs333970 (exon 6) 43 4 macrophage-specific endothelial function Prostaglandin D2 synthase PTGDS CAD Heart (30) rs6926 (exon 7) 46 2 Hypoxia-inducible factor HIF1A Oxidative stress Heart (18) rs2057482 (exon 15) 28 2 1, alpha subunit

Nitric oxide synthase, Oxidative stress, NOS3 Heart (22) rs1799983 (exon 8) 34 4 endothelial vascular function FMS-related tyrosine FLT1 Endothelial function Heart (19) rs2296189 (exon 23) 29 2 kinase L-type voltage-dependent calcium channel alpha CACNA1C Blood pressure Heart (37) rs1544514 (exon 4) 27 6 subunit 1C rs216008 (exon 30) 40 6 Adrenergic receptor beta2 ADRB2 Blood pressure Heart (22) rs1042719 (exon1) 46 2 BK channel beta1 subunit KCNMB1 Blood pressure Heart (30) rs11739136 (exon 3) 12 6 rs2656842 (exon 4) 46 6 rs2656841 (exon 4) 47 6 Liver (29) Vitamin K rs7294 (exon 3, VKORC1 Warfarin target Heart (26) 40 6 reductase 3'UTR) B-lymph (12)



Gamma-glutamyl GGCX Blood coagulation Liver (23) rs699664 (exon 8) 40 2 carboxylase Cholesteryl ester transfer HDL cholesterol CETP Liver (29) rs5884 (exon 14) 47 6 protein remodeling Heme 1 HMOX1 Heme catabolism Heart (5) rs11555832 (3’UTR) 8 1

Drug metabolism candidate genes Multidrug resistance ABCB1 Drug transporter Liver (22) rs1045642 (exon 26) 50 6

polypeptide 1, MDR1 rs1128503 (exon 12) 44 6 rs2032582 (exon 21) 47 6 Cytochrome P450, subfamily 2D, polypeptide CYP2D6 Drug metabolism Liver (26) rs1065852 (exon 1) 26 6 6 rs1058164 (exon 3) 32 6 rs16947 (exon 6) 37 6 rs1135840 (exon 9) 30 6 Cytochrome P450, subfamily 2C, polypeptide CYP2C9 Drug metabolism Liver (19) rs9332242 (exon 9) 11 4 9 rs1057911 (exon 9) 9 4 Human peptide transporter SLC15A1 Transporter Small bowel (21) rs1339067 (exon 17) 42 4 1 Human peptide transporter SLC15A2 Transporter Kidney (8) rs1143670 (exon 14) 45 4 2 rs1143671 (exon 15) 45 4



CNS disorder candidate genes rs6277 (exon 7) 44 6 Prefrontal cortex, Dopamine D2 receptor DRD2 CNS rs6275 (exon 7) 33 4 Cerebral lobes (30) rs6279 (3’ UTR) 31 6 rs1044393 (exon 5) 20 4 Acetylcholine nicotinic CHRNA4 CNS Striatum (13) rs1044397 (exon 5) 40 4 receptor subunit a4 rs2236196 (3’ UTR) 44 4 mu Opioid receptor Cerebral lobes, Pons OPRM1 CNS rs1799971 (exon 1) 8 6

(8) Serotonin receptor 2A HTR2A CNS Prefrontal cortex (16) rs6313 (exon 1) 15 2 Brain-derived neurotrophic BDNF CNS Cerebral lobes (9) rs6265 (exon 2) 35 4 factor Serotonin transporter SLC6A4 CNS Pons (29) rs1042173 (3’UTR) 60 3 rs7305115 (exon 7) 38 4 Tryptophan hydroxylase 2 TPH2 CNS Pons (27) rs4290270 (exon 9) 46 2 Norepinephrine transporter SLC6A2 CNS Pons (18) rs5569 (exon 9) 38 2 Dysbindin 1 DTNBP1 CNS Pons (9) rs1047631(3’UTR) 15 2 SNP8NRG433E1006 Neuregulin 1 NRG1 CNS Pons (8) 17 2 (exon 1) Serotonin receptor 1B HTR1B CNS Pons (16) rs6296 (exon 1) 33 rs1137070 (exon 14) 47 3 Monoamine oxidase A MAOA CNS Prefrontal cortex (19) rs6323 (exon 8) 50 3 Dopamine D3 receptor DRD3 CNS Prefrontal cortex (21) rs6280 (exon 2) 38 3 Estrogen receptor ESR1 CNS Prefrontal cortex (46) rs3798577 (3’UTR) 44 3 Dopamine transporter SLC6A3 CNS Striatum (21) rs6347 (exon 9) 54 3


Catechol-O- COMT CNS Prefrontal cortex (10) rs4633 (exon 3) 65 1 methyltransferase Diamine oxidase DAO CNS Prefrontal cortex (21) rs2070588 (5’UTR) 37 3 CNS/stress Glucocorticoid receptor NR3C1 Prefrontal cortex (20) rs6196 (exon 8) 19 3 hormonal Nad(p)h:menadione 1, dioxin- NQO2 CNS B-lymph (9) rs1143684 (exon 3) 30 1 inducible 2


List of candidate genes analyzed in this study, grouped by indication (disease or pharmacology). All marker SNPs are located in transcribed regions of the mature mRNA or of a splice variant. The tissues analyzed are indicated, with the total number of unique samples surveyed for AEI in parentheses. The percent heterozygosity is given for each marker SNP out of all samples genotyped in a given tissue. For some genes more than one marker SNP was used. The number of replicate assays (RNA isolations, cDNA syntheses) is indicated.

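[Figure 3.2 image. Columns: representative genomic DNA; cDNA without significant AEI; cDNA with significant AEI. Normalized log2 AEI values shown for the two cDNA columns: ACE (rs4309) -0.02 and -1.80; SOD2 (rs4880) +0.19 and +1.05; NOS3 (rs1799983) +0.05 and -0.62; CCL2 (rs4586) -0.08 and +0.90.]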

Example results of AEI analysis for ACE, SOD2, NOS3 and CCL2 in heart tissues. Each peak represents a distinct allele measured in genomic DNA or cDNA from a single heterozygous individual. Samples (columns, left to right) represent the typical DNA ratio observed, a cDNA showing insignificant deviation from the expected ratio and a cDNA sample showing highly significant deviation from unity. Normalized AEI values relative to genomic DNA are given (cDNA values listed as major:minor allele on a log2 scale).

Figure 3.2: Representative allelic expression imbalances from 4 genes in the survey
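The normalization described in the Figure 3.2 legend can be sketched as follows. This is a minimal illustration rather than the pipeline used in this study: it assumes allele signals are available as paired (major, minor) peak heights, that the genomic DNA correction is the arithmetic mean of the genomic major/minor ratios, and the function names are hypothetical.

```python
import math
from statistics import mean


def allele_ratio(major_signal: float, minor_signal: float) -> float:
    """Major-over-minor allele ratio from two positive peak signals."""
    if major_signal <= 0 or minor_signal <= 0:
        raise ValueError("allele signals must be positive")
    return major_signal / minor_signal


def normalized_log2_aei(cdna_signals, gdna_signal_pairs) -> float:
    """log2 AEI of one cDNA sample, normalized to genomic DNA.

    cdna_signals: (major, minor) signals measured in cDNA.
    gdna_signal_pairs: (major, minor) signals measured in genomic DNA, where
    both alleles are present in equal copy number; their mean ratio corrects
    for unequal detection of the two alleles by the assay.
    """
    gdna_mean_ratio = mean(allele_ratio(*pair) for pair in gdna_signal_pairs)
    return math.log2(allele_ratio(*cdna_signals) / gdna_mean_ratio)


# Hypothetical signals: genomic DNA reads 1.25:1 although the true ratio is 1:1.
gdna = [(1000, 800), (1100, 880)]
print(normalized_log2_aei((2500, 1000), gdna))  # 1.0, i.e. two-fold more major allele
```

Dividing by the genomic ratio before taking the logarithm means a sample with no imbalance scores 0 and equal imbalances in either direction are symmetric around 0, as plotted in the figures.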


Genes showing highly significant allelic expression imbalance (at least one sample with an AEI ratio of at least 2^0.5, about 1.41, or at most 2^-0.5, about 0.71; i.e., roughly 40% imbalance in either direction). AEI ratios are defined as the mRNA abundance of the major over the minor allele for a given marker SNP, normalized to the genomic allele ratios measured with SNaPshot. Frequency of AEI is given as the percentage of all samples heterozygous for the marker SNP(s). Direction of AEI ratios indicates values above unity (>1), below unity (<1), or in both directions (<1>).

Table 3.3: Genes in the survey showing highly significant allelic expression imbalance (|log2 AEI| of at least 0.5 in at least one sample).

Symbol | Tissue | Marker SNP(s) | AEI ratio range | Samples with significant AEI, % (total n) | Direction of AEI ratios
Cardiovascular disease candidate genes
ACE | Heart, Liver, Small bowel | rs4309, rs4343 | 0.2-1.5 | 48% (n=73) | <1>
MCP1/CCL2 | Heart, Monocytes | rs4586, rs13900 | 0.66-1.87 | 18% (n=55) | <1>
SOD2 | Heart | rs4880, rs5746092 | 0.9-2.1 | 83% (n=41) | >1
NNMT | Liver, Small bowel | rs4646335 | 0.58-2.9 | 15% (n=39) | <1>
CSF1 | Monocytes | rs333970 | 0.98-1.9 | 11% (n=18) | >1
PTGDS | Heart | rs6926 | 0.7-1.3 | 3% (n=30) | <1>
NOS3 | Heart | rs1799983 | 0.65-1.11 | 18% (n=22) | <1
VKORC1 | Liver | rs7294 | 0.5-1 | 68% (n=29) | <1
CETP | Liver | rs5884 | 0.67-1.89 | 77% (n=29) | <1>
Drug metabolism candidate genes
ABCB1 | Liver | rs1045642, rs1128503, rs2032582 | 1.0-1.6 | 36% (n=22) | >1
CYP2D6 | Liver | rs1065852, rs1058164, rs16947, rs1135840 | 0.1-10 | 88% (n=26) | <1>
CYP2C9 | Liver | rs2017319 | 0.6-1.2 | 21% (n=19) | <1
SLC15A2 | Kidney | rs1143670, rs1143671 | 0.6-1.1 | 13% (n=8) | <1>
CNS disorder candidate genes
DRD2 | Brain | rs6277, rs6275, rs6279 | 0.5-2.0 | 25% (n=30) | <1>
CHRNA4 | Brain | rs1044393, rs1044397, rs2229959, rs2236196 | 0.6-2.0 | 85% (n=13) | <1>
OPRM1 | Brain | rs1799971 | 1.0-3.0 | 100% (n=8) | >1
BDNF | Brain | rs6265 | 0.9-1.6 | 11% (n=9) | >1
TPH2 | Brain | rs7305115, rs4290270 | 0.92-2.55 | 48% (n=27) | >1
DTNBP1 | Brain | rs1047631 | 0.88-1.45 | 22% (n=9) | >1
MAOA | Brain | rs1137070, rs6232 | 0.3-4.0 | 89% (n=19) | <1>
DRD3 | Brain | rs6280 | 0.3-2.7 | 57% (n=21) | <1>
ESR1 | Brain | rs3798577 | 0.5-1.7 | 24% (n=46) | <1>
SLC6A3 | Brain | rs6347 | 0.1-1.2 | 95% (n=21) | <1>
COMT | Brain | rs4633 | 0.9-1.4 | 10% (n=10) | >1
DAO | Brain | rs2070588 | 0.3-4.5 | 57% (n=21) | <1>
NR3C1 | Brain | rs6196 | 0.4-1.0 | 5% (n=20) | <1
NQO2 | Coriell sample | rs1143684 | 0.88-2.60 | 33% (n=9) | >1

Table 3.3.
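The cutoffs used for Table 3.3 and Figure 3.3 can be written as a small classifier that also summarizes one gene in the format of the table. This is a sketch under stated assumptions: the 0.3 and 0.5 log2 cutoffs are taken from the Figure 3.3 legend, boundary values are treated as in that legend (strictly greater than), and the function names are hypothetical.

```python
POTENTIAL_CUTOFF = 0.3    # |log2 AEI| above this: potential AEI
SIGNIFICANT_CUTOFF = 0.5  # |log2 AEI| above this: highly significant AEI


def classify_aei(log2_aei: float) -> str:
    """Classify one sample's normalized log2 AEI value."""
    magnitude = abs(log2_aei)
    if magnitude > SIGNIFICANT_CUTOFF:
        return "highly significant"
    if magnitude > POTENTIAL_CUTOFF:
        return "potential"
    return "none"


def percent_imbalance(log2_aei: float) -> float:
    """Percent excess expression of the more highly expressed allele."""
    return (2 ** abs(log2_aei) - 1) * 100


def summarize_gene(log2_values):
    """Table 3.3-style summary for one gene's marker heterozygotes."""
    if not log2_values:
        raise ValueError("no heterozygous samples")
    n_sig = sum(classify_aei(v) == "highly significant" for v in log2_values)
    return {
        "n": len(log2_values),
        "percent_significant": 100 * n_sig / len(log2_values),
        "ratio_range": (2 ** min(log2_values), 2 ** max(log2_values)),
    }


print(round(percent_imbalance(0.5), 1))  # 41.4, the "roughly 40%" threshold
print(summarize_gene([0.6, -0.1, 0.2, -0.7]))
```

The ratio range is reported on the linear scale, as in the table, by exponentiating the smallest and largest log2 values.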

COMT and 5HT2A. Where we failed to detect significant AEI, this strongly argues against cis-acting factors affecting mRNA levels in the tissues analyzed. However, if alternative splicing occurs, the AEI assay must be performed after specific amplification of each splice variant.

In a subset of genes for which complete RT-PCR data were available and at least two samples met the AEI threshold (ACE, SOD2, CCL2, NOS3, TPH2), we tested for association between AEI and total mRNA levels. Significant associations were observed only for ACE and TPH2 (Lim et al., 2007).

Correlations between bidirectional AEI (regardless of threshold) and RT-PCR results were analyzed for ACE, SOD2, CCL2, NOS3, FLT1, HIF1A, LPL and PTGDS. We observed borderline significant correlations between AEI and RT-PCR for ACE (r=0.34, p<0.06), HIF1A (r=-0.45, p<0.06) and PTGDS (r=0.38, p<0.04). As expected and as previously reported, these results indicate that trans-acting variation between samples likely masks many cis-acting factors when total RNA or protein levels are measured.
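A correlation of this kind can be sketched as below. The sketch assumes that signed log2 AEI values are paired with a per-sample measure of total mRNA (for example, relative RT-PCR levels); the pairing, the example values and the function names are hypothetical. The p-values reported above would additionally require comparing the t statistic with a t distribution on n-2 degrees of freedom, which is not shown.

```python
import math


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient between two paired samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant variable has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t value for testing r != 0, with n - 2 degrees of freedom."""
    if abs(r) >= 1:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))


aei = [0.42, -0.10, 0.65, 0.05, -0.30, 0.80]   # hypothetical log2 AEI values
expr = [1.30, 0.95, 1.60, 1.00, 0.80, 1.45]    # hypothetical relative mRNA levels
r = pearson_r(aei, expr)
print(round(r, 2), round(t_statistic(r, len(aei)), 2))
```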

For genes showing strong AEI, the frequency of cis-acting factors can be estimated as the number of heterozygous subjects with AEI divided by the total number of subjects heterozygous for the marker SNP. Meaningful AEI may also be detected in specific disease groups or ethnogeographic populations (see ACE below), or in specific contexts (environmental conditions, tissues). VKORC1 illustrates the potential importance of examining multiple primary tissues: we observed no substantial AEI in heart tissues or in the B-lymphoblast CEPH samples, but found common and significant AEI in a group of liver biopsy samples. Important information is also embedded in the magnitude and pattern of AEI ratios, as illustrated by results from genes expressed in heart tissues.
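The frequency estimate described above is a simple proportion. The sketch below adds a Wilson score interval, which is our illustrative choice for conveying the uncertainty of small heterozygote counts and is not part of the analysis reported here; the function name is hypothetical.

```python
import math


def aei_frequency(n_with_aei: int, n_heterozygous: int, z: float = 1.96):
    """Fraction of marker heterozygotes with AEI, plus a Wilson score interval.

    Returns (estimate, lower, upper); z = 1.96 gives an approximate 95% interval.
    """
    if n_heterozygous <= 0 or not 0 <= n_with_aei <= n_heterozygous:
        raise ValueError("need 0 <= n_with_aei <= n_heterozygous and n_heterozygous > 0")
    n = n_heterozygous
    p = n_with_aei / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)


# 34 of 41 heterozygotes corresponds to the 83% reported for SOD2 in Table 3.3.
est, low, high = aei_frequency(34, 41)
print(f"{est:.0%} (95% CI {low:.0%}-{high:.0%})")
```

With only 8 to 30 heterozygotes per gene, as in several rows of Table 3.3, such intervals are wide, which is one reason the frequency estimates should be read as approximate.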

3.3.2 Results in cardiovascular disease candidate genes

Allelic expression ratios in left ventricular heart tissues from transplant recipients were measured for candidate genes related to cardiovascular diseases (see Fig. 3.3). We plot the ratio of the major allele over the minor allele, so that a ratio >1 indicates that the major marker allele is more highly expressed, whereas a ratio <1 indicates the opposite.

Four genes in the cardiovascular disease panel displayed significant AEI in more than one sample, namely ACE, SOD2, NOS3, and CCL2. Comparing the frequency of significant AEI ratios with the genotype frequency of proposed rSNPs indicates whether, and to what extent, the putative regulatory polymorphisms contribute to this phenotype. For example, we performed genotype scanning of the CCL2 locus, including a proposed promoter SNP (rs1024611) (Rovin et al., 1999). Because a cis-acting SNP can only produce AEI in subjects heterozygous for it, the finding that two subjects with significant AEI were homozygous for rs1024611 shows that this putative functional SNP does not fully account for the observed expression imbalances. Furthermore, contrary to previous reports, many heterozygotes for this SNP did not display significant AEI (see also Chapter 5).
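The reasoning used above for rs1024611 can be made explicit. A cis-acting SNP can only produce AEI in subjects heterozygous for it, so AEI in a homozygote implicates another factor, while a heterozygote without AEI argues against a strong, fully penetrant effect of that SNP in the tissue studied. The sketch below flags both kinds of discordance; it assumes genotypes are coded as two-letter strings and uses the 0.5 log2 cutoff, and the names and example data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    sample_id: str
    candidate_genotype: str  # e.g. "AG" (heterozygous) or "AA" (homozygous)
    log2_aei: float          # normalized AEI measured at a heterozygous marker SNP


def discordant_subjects(subjects, cutoff: float = 0.5):
    """Return (homozygotes with AEI, heterozygotes without AEI)."""
    homozygous_with_aei, heterozygous_without_aei = [], []
    for s in subjects:
        heterozygous = len(set(s.candidate_genotype)) == 2
        has_aei = abs(s.log2_aei) > cutoff
        if has_aei and not heterozygous:
            homozygous_with_aei.append(s.sample_id)
        elif heterozygous and not has_aei:
            heterozygous_without_aei.append(s.sample_id)
    return homozygous_with_aei, heterozygous_without_aei


cohort = [
    Subject("H1", "AG", 0.9),
    Subject("H2", "AA", -0.8),   # AEI despite homozygosity: another factor
    Subject("H3", "AG", 0.05),   # heterozygous but balanced expression
    Subject("H4", "GG", 0.1),
]
print(discordant_subjects(cohort))  # (['H2'], ['H3'])
```

A candidate that fully explained the observed AEI would return two empty lists.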

ACE, CCL2, PTGDS, and KCNMB1 show allelic ratios both below and above 1, suggesting multiple functional variants and/or incomplete linkage disequilibrium between the marker SNPs and the functional alleles. ACE shows threefold unidirectional differences in allelic mRNA abundance in a group of African American samples, suggesting the presence of a strong cis-acting factor that is enriched in this population. Allelic ratios of lesser magnitude in both directions among Caucasian American samples suggest the presence of additional cis-acting factors with smaller impact. The ACE results were confirmed by high concordance between two marker SNPs in compound heterozygotes (r²=0.98), and linearity of the assay over the observed AEI range was demonstrated by dilution mixing experiments using homozygous DNA (r²=0.99 for both SNPs). Both SOD2 and NOS3 show allelic ratios largely in a single direction, indicating either a functional SNP in the same haplotype block as the marker SNP or that the marker SNP itself is functional. We further genotyped a putative regulatory SNP upstream of NOS3, T-786C (rs2070744), but found that its genotypes did not associate with the allelic imbalance results. AEI for SOD2 mRNA is extremely common, with ratios on the order of 1.5-fold, indicating that the ‘major allele’ has ~50% greater expression (however, the allele frequency is close to 50%, making it difficult to identify the minor allele). These results were supported by an additional marker SNP, rs5746092 in the 5’UTR, which was in a modest degree of LD with rs4880 (AEI r²=0.73, n=16). The results indicate a remarkably high frequency of AEI for SOD2 (83% of samples surveyed). Given the frequency of the marker SNPs in heart failure samples (rs4880, 52% heterozygosity; rs5746092, 37% heterozygosity) and in the general population, this indicates a very common cis-acting factor influencing SOD2 expression in the human lineage.
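The two validation checks described for ACE, assay linearity by dilution mixing and concordance between two marker SNPs, both reduce to a coefficient of determination between paired log2 values. The sketch below assumes that mixtures of DNA homozygous for each allele are prepared at known major-allele fractions f, so that the expected log2 ratio is log2(f/(1-f)); the fractions, the simulated readings and the function names are hypothetical.

```python
import math


def r_squared(x, y) -> float:
    """Squared Pearson correlation between paired values (r^2 of a linear fit)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def expected_log2_ratios(major_fractions):
    """Expected log2(major/minor) for mixtures of two homozygous DNAs."""
    for f in major_fractions:
        if not 0 < f < 1:
            raise ValueError("fractions must lie strictly between 0 and 1")
    return [math.log2(f / (1 - f)) for f in major_fractions]


fractions = [0.2, 1 / 3, 0.5, 2 / 3, 0.8]     # 1:4, 1:2, 1:1, 2:1, 4:1 mixtures
expected = expected_log2_ratios(fractions)
measured = [-1.95, -1.02, 0.03, 0.97, 2.04]   # hypothetical assay readings
print(round(r_squared(expected, measured), 3))
```

The same `r_squared` call applied to log2 AEI values from two markers in compound heterozygotes gives the marker concordance measure.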

A number of genes did not show any AEI. For example, the L-type calcium channel gene CACNA1C, which features more than 55 exons across ~250 kb, surprisingly failed to reveal any detectable AEI. We subsequently used several marker SNPs and, moreover, combined the AEI analysis with quantitative analysis of all known splice variants at 12 splice sites. This failed to reveal any cis-acting factor that could have caused the large differences in CACNA1C splicing observed between individuals in human heart tissues (Wang et al., 2006).

[Figure 3.3 plot: log2 allele expression imbalance (AEI) for individual heart samples, y-axis from -2.0 to 1.5; genes shown: SOD2, KCNMB1, ACE, ADRB2, HIF1A, PTGDS, VKORC1, LPL, NOS3, CCL2, FLT1, CACNA1C.]

Allelic mRNA expression ratios (major allele over minor allele, normalized to the mean allelic ratio in genomic DNA) measured in heart failure samples for cardiovascular candidate genes. Results for individual samples are displayed, with the magnitude and direction of AEI indicated on a log2 scale (y-axis). Potential AEI in individual samples is indicated by log2 ratios above +0.3 or below -0.3, a cutoff arrived at by analyzing the extent of variation in genomic DNA ratios. For the present survey, we considered log2 ratios above +0.5 or below -0.5 to represent highly significant AEI.

Figure 3.3: Allelic expression imbalance results for 12 cardiovascular candidate genes (all individual human heart samples’ results displayed)

3.3.3 Putative functional effects of variants on mRNA folding

Our observation that a number of frequent, unidirectional allelic imbalances appear to be caused by both synonymous and nonsynonymous SNPs in coding regions (e.g., OPRM1, ABCB1, TPH2), together with results from other groups (e.g., Nackley et al., 2006), suggests that srSNPs could be as common as rSNPs. We therefore sought a mechanism whereby srSNPs may commonly affect gene expression. One potential mechanism is the alteration of locally functional structures within mRNA, because single nucleotide substitutions in single-stranded nucleic acids have a high probability of altering the folding or folding ensembles. We previously showed that mRNA structural modeling for a functional, nonsynonymous srSNP in OPRM1 corroborated in vivo AEI experiments (Zhang et al., 2005). Furthermore, RNA structure modeling for all four possible nucleotides at that position was paralleled by in vitro AEI experiments for all four alleles (Zhang et al., 2005). To assess the potential prevalence of SNP-induced changes in mRNA folding in coding sequences, we calculated estimated changes in folding energies for all possible transitions and transversions in the mature mRNAs of the µ, κ and δ opioid receptors (OPRM1, OPRK1, OPRD1). At each position in the opioid receptor sequences there is one possible transition (C<>U, G<>A) and two possible transversions (drawn from C<>G, C<>A, G<>U, A<>U). Calculating the effects of all possible OPRM1 transitions and transversions on thermodynamic folding energies reveals that a majority of SNPs have the potential to alter mRNA folding significantly, often predicting more profound changes than the known functional A118G SNP in OPRM1 (Zhang et al., 2005) (see arrow in Fig. 3.4). Thus, while a change in predicted folding does not necessarily imply a functional change in the mRNA, such changes are likely to be common and have the potential to alter any of the numerous mRNA processing steps, as well as translation (Nackley et al., 2006). One region of the OPRM1 mRNA displays low susceptibility to structural disturbance by SNP transitions and transversions, and this is reproduced in OPRD1 and OPRK1 (data not shown). Whether this domain has functional significance or may be subject to rapid evolution remains to be determined.
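The systematic scan described above can be sketched in two parts: enumerating the one transition and two transversions at every position, and scoring each variant by the global difference in single-strandedness (ss) count used in Fig. 3.4. Mfold's ss-count records, for each nucleotide, how many structures in a set of predicted foldings leave it unpaired. The folding step is deliberately left as a caller-supplied function (for example, a wrapper around Mfold or the Vienna package); summing absolute per-nucleotide differences is our assumption, and all function names are hypothetical.

```python
TRANSITION = {"A": "G", "G": "A", "C": "U", "U": "C"}


def substitutions(base: str):
    """Return the transition and the two transversions for one RNA base."""
    transition = TRANSITION[base]
    transversions = [b for b in "ACGU" if b not in (base, transition)]
    return transition, transversions


def all_single_variants(seq: str):
    """Yield (position, ref, alt, kind, variant_sequence) for every substitution."""
    seq = seq.upper().replace("T", "U")  # assumption: accept DNA or RNA input
    for i, ref in enumerate(seq):
        transition, transversions = substitutions(ref)
        options = [(transition, "transition")]
        options += [(alt, "transversion") for alt in transversions]
        for alt, kind in options:
            yield i, ref, alt, kind, seq[:i] + alt + seq[i + 1:]


def ss_count(structures):
    """Per-nucleotide number of dot-bracket structures leaving the base unpaired."""
    if not structures:
        raise ValueError("need at least one structure")
    length = len(structures[0])
    if any(len(s) != length for s in structures):
        raise ValueError("structures must have equal length")
    return [sum(s[i] == "." for s in structures) for i in range(length)]


def global_ss_difference(wt_structures, variant_structures) -> int:
    """Sum over nucleotides of |ss-count(wild type) - ss-count(variant)|."""
    wt, var = ss_count(wt_structures), ss_count(variant_structures)
    return sum(abs(a - b) for a, b in zip(wt, var))


def scan(seq: str, fold_ensemble):
    """Score every single-nucleotide variant; fold_ensemble(seq) -> list of structures."""
    seq = seq.upper().replace("T", "U")
    wt_structures = fold_ensemble(seq)
    return [
        (pos, ref, alt, kind, global_ss_difference(wt_structures, fold_ensemble(var)))
        for pos, ref, alt, kind, var in all_single_variants(seq)
    ]


print(len(list(all_single_variants("AUGGAC"))))  # 18: three substitutions at six positions
```

Plotting the score of each variant against its position would give a profile of the kind shown in Fig. 3.4.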

To address the effect of potential srSNPs on a genome-wide scale, we computed predicted changes for 34,557 validated SNPs (dbSNP Build 125) located in transcribed exonic domains, using the Vienna RNA package (see also Chapter 6). We calculated and analyzed both the minimum free energy (MFE) conformations and the ensembles of suboptimal conformations in 100 bp windows around the SNPs. Sixty percent of all transcribed human SNPs affected the MFE structure, while >90% altered the suboptimal ensembles. Therefore, srSNPs have broad potential for causing changes in mRNA folding, and hence in mRNA function, independent of any effects on RNA-protein interactions at the level of primary sequence. We used the resulting database to analyze allelic effects on mRNA folding for those SNPs that showed fairly common AEI

[Figure 3.4 plot: x-axis, nucleotide position in OPRM1 mRNA relative to the translation start site (1 to ~1100); y-axis, in silico effect of transition SNPs on OPRM1 mRNA secondary structure relative to the wild-type allele (global difference in single-strandedness count, 0 to 3500); the arrow marks OPRM1 118A>G.]

Computed changes of mRNA folding (minimum free energy conformations) induced by all possible transitions (C<>T; G<>A SNPs) in the transcribed exonic domains of OPRM1 mRNA. The arrow indicates the location of the functional SNP A118G, which affects mRNA levels in human brain (Zhang et al., 2005). The x-axis denotes the nucleotide position in the mature OPRM1 mRNA, while the y-axis represents the extent to which mRNA folding is affected by each transition relative to wild type. All conformations were calculated using Mfold, and the sum of differences in the Mfold single-strandedness count at each nucleotide was computed globally (across the full mRNA structure, as shown here) and in regional sliding windows of varied sizes. Analyses of sliding windows, of both types of transversion at each base (pyrimidine<>purine), and of A>G transitions alone all gave very similar results.

Figure 3.4: In silico mRNA folding results for mature OPRM1 sequence

folding consistently in a single direction, since we suspected these might represent srSNPs (SOD2 rs4880, rs5746092; VKORC1 rs7294; NQO2 rs1143684, for which 89% of samples showed AEI at a lower cutoff). One method for evaluating RNA structures analyzes the Boltzmann distribution of structures to determine the frequency of the thermodynamically optimal structure among all predicted structures. A high frequency of the optimal structure may indicate a stable structure (Miklos et al., 2005). This approach complements analyses based on changes in minimum free energy, which are susceptible to biases in nucleotide composition. We noted that alleles associated with AEI in

SOD2 (rs5746092), VKORC1 (rs7294) and NQO2 (rs1143684) all displayed Boltzmann frequencies well above the transcriptome-wide average (2-3 SD above the mean), and that the base change of each SNP was predicted to alter the Boltzmann frequency to a greater extent than the average SNP in the large dataset (also 2-3 SD from the mean). In particular, both SOD2 SNPs lie in regions that display highly stable structures, with rs5746092 positioned within an 18 bp helix. These results suggest that one or more of these alleles could affect gene expression through a change in mRNA structure. SNP-mRNA structure relationships for other genes in our survey, OPRM1 (Zhang et al., 2005), ABCB1 (Wang et al., 2005) and COMT (Nackley et al., 2006), have been reported previously.
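The Boltzmann frequency of the optimal structure follows from its free energy and the ensemble free energy, p = exp(-(G_MFE - G_ensemble)/RT). The sketch below computes it, together with the z-score used to compare a SNP with the transcriptome-wide distribution. The energies would come from a partition-function calculation (for example, with the Vienna package); the example values and function names are hypothetical.

```python
import math
from statistics import mean, stdev

GAS_CONSTANT = 0.0019872  # kcal/(mol*K)


def boltzmann_frequency(mfe: float, ensemble_energy: float, temp_c: float = 37.0) -> float:
    """Probability of the MFE structure in the Boltzmann ensemble (kcal/mol inputs)."""
    if ensemble_energy > mfe + 1e-9:
        raise ValueError("ensemble free energy cannot exceed the MFE")
    rt = GAS_CONSTANT * (temp_c + 273.15)
    return math.exp(-(mfe - ensemble_energy) / rt)


def z_score(value: float, background) -> float:
    """Standard deviations of `value` from the mean of a background distribution."""
    return (value - mean(background)) / stdev(background)


ref_freq = boltzmann_frequency(-32.4, -33.1)  # hypothetical reference allele
alt_freq = boltzmann_frequency(-30.9, -32.2)  # hypothetical variant allele
print(round(ref_freq, 3), round(alt_freq, 3), round(ref_freq - alt_freq, 3))
```

Applying `z_score` to a window's Boltzmann frequency, or to the allelic change in it, against the values for all SNPs in the dataset gives the kind of 2-3 SD comparison described above.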

3.4 Summary

We have developed rapid and robust assays of allelic expression imbalance (AEI) as a powerful means for discovering functional polymorphisms affecting gene regulation (rSNPs) and RNA processing (srSNPs). The effects of rSNPs and srSNPs are expected to vary with the cellular environment, so studies of human genes in physiologically relevant target tissues are of critical importance. Because one allele serves as the internal control for the other, measuring allelic mRNA expression ratios mitigates the effects of variation in mRNA levels and of tissue degradation. This study is the largest survey of AEI in physiologically relevant target tissues for a series of human candidate genes involved in disease susceptibility or treatment response. The importance of studying AEI in a variety of native tissues is shown by our results for VKORC1, a gene whose alleles are currently being proposed for pharmacogenetic testing to guide warfarin dosing, although the functional explanation remains undiscovered. We find common and robust AEI for VKORC1 in liver, the most physiologically relevant tissue, while no AEI was detected in heart tissues or in immortalized B-lymphoblast cell lines. This suggests that highly tissue-specific effects drive the major allelic effects on VKORC1. These effects would have gone undetected using B-lymphoblast cell lines alone, indicating that relying on these cell lines may give a rather incomplete picture of human genotypic and phenotypic diversity (Morley et al., 2004; Pastinen et al., 2004; He et al., 2005).

While the selection of candidate genes is far from complete, the study permits an

estimation of the prevalence of cis-acting polymorphisms in a broader subset of human

tissues than previous studies. Significant AEI (|log2 ratio| > 0.5) in more than one subject was observed for 55% of the surveyed genes (see Table 3.3). This frequency of AEI is similar to that found in a number of previous studies that selected specific candidate genes and assayed similar numbers of heterozygotes (Yan et al., 2002; Bray et al., 2003; He et al., 2005); however, it is higher than estimates from other studies, including one that made a semi-random selection of genes and measured AEI mostly in cell lines (Pastinen et al., 2004).

These differences may be attributable to the selection of candidate genes based on a priori knowledge, or differences in methodology, tissue specificity, statistical power or stringency of applied thresholds. The presence of frequent AEI was unexpected for some of the candidate genes that had already been intensely studied for genetic polymorphisms

(e.g., ACE, TPH2, DRD2). The results shown in Tables 3.2 and 3.3 provide information on the possible presence or absence of cis-acting functional polymorphisms at the mRNA level, and their frequency. Given the precision of the AEI analysis in our studies, we can detect smaller ratios, and these may indeed be significant. For example, one might suspect that a change of 20-30% in the activity of a critical gene such as HMGCR could affect cholesterol production. If no AEI is observed (genes in Table 3.2 not listed in

Table 3.3), this is a strong indication that cis-acting polymorphisms are absent in the study population. Larger sample cohorts, or disease populations, could nevertheless reveal functional polymorphisms occurring at frequencies below 1-10%.
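The dependence on cohort size can be made concrete with a simple binomial model. Assuming that each marker heterozygote independently carries a given cis-acting variant in heterozygous form with probability f, and that every such carrier shows detectable AEI, the chance of seeing at least one carrier among n heterozygotes is 1 - (1 - f)^n. These assumptions and the function names are ours, for illustration only.

```python
import math


def detection_probability(carrier_fraction: float, n_heterozygotes: int) -> float:
    """P(at least one carrier among n marker heterozygotes)."""
    if not 0 <= carrier_fraction <= 1 or n_heterozygotes < 0:
        raise ValueError("invalid arguments")
    return 1 - (1 - carrier_fraction) ** n_heterozygotes


def heterozygotes_needed(carrier_fraction: float, power: float = 0.95) -> int:
    """Smallest n with detection_probability >= power."""
    if not 0 < carrier_fraction < 1 or not 0 < power < 1:
        raise ValueError("fraction and power must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - power) / math.log(1 - carrier_fraction))


for f in (0.10, 0.05, 0.01):
    print(f, round(detection_probability(f, 30), 2), heterozygotes_needed(f))
```

Under these assumptions, around 30 heterozygotes (typical of the survey) suffice for carrier frequencies near 10%, whereas a variant carried by 1% of heterozygotes would require roughly 300, consistent with the need for larger cohorts noted above.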

Previous work has proposed regulatory polymorphisms in several of the genes studied here with AEI analyses. The main tool for demonstrating that these polymorphisms are functional has been reporter gene assays performed with cloned promoter regions in heterologous cells. If any of these proposed regulatory polymorphisms were functional in vivo in the relevant tissues analyzed here, corresponding changes in the AEI ratios would be expected. However, in several instances we failed to detect significant linkage between the observed AEI ratios and the putative regulatory SNPs. For example, AEI ratios of MAOA in several brain regions correlated poorly with the promoter pVNTR (Pinsonneault et al., 2006), a variant supported by reporter gene assays and the subject of numerous association studies. For

SERT, previous studies had identified a putative promoter polymorphism (SERT-LPR) (Lesch et al., 1996). However, our genotype scanning with AEI did not support a role for SERT-LPR, although we cannot rule out that this promoter polymorphism is active during development or under stress (Lim et al., 2006). Similarly, we failed to corroborate putative promoter SNPs for CCL2 and NOS3 (see below) and for DRD2 (Y. Zhang et al., in submission). These observations indicate either that in vitro reporter gene assays are not reliable indicators of polymorphic effects in target tissues in vivo, or that further tissue contexts must be studied to understand any impact of these putative functional alleles. A recent AEI study of four genes (MAOA, corroborating our earlier results (Pinsonneault et al., 2006); NOS3; PDYN; and NPY) supports this conclusion (Cirulli & Goldstein 2007). Taken together, we propose that establishing a functional regulatory polymorphism requires both in vivo AEI analysis and in vitro reporter assays.

Measuring allelic ratios of both DNA and mRNA in human target tissues has proven difficult, requiring optimization and validation at each step of the entire process to achieve precision and accuracy. Important steps detailed here include the selection of suitable marker SNPs, preparation of cDNA with primers that target a region immediately adjacent to the marker SNP, multiplexing of cDNA preparation to permit analysis of multiple genes in the same tissue sample, and selection of tissues with robust expression of the candidate gene. Use of a second marker SNP yields validation by an independent assay where both markers are heterozygous in an individual (Wang et al.,

2005; Pinsonneault et al., 2006; Wang et al., 2006; Lim et al., 2007). Allelic ratios measured in genomic DNA are not expected to deviate substantially from unity unless gene duplications or deletions occur, whether in normal tissues (Lim et al., 2007) or, more likely, in cancer tissues (Lengauer et al., 1998). Recent studies have suggested that gene duplications, deletions, and other rearrangements are rather frequent in the germline of normal populations and may play a role in disease (Redon et al., 2006). Among hundreds of samples analyzed for the surveyed genes, we observed deviations of the DNA ratios from unity in only two samples, both for TPH2 (Lim et al., 2007), indicating that gene duplications were extremely rare, at least among the genes surveyed here. Loss of heterozygosity could not be assessed with the SNaPshot method presently applied. Other factors may contribute to or partially mask AEI, including allele-selective epigenetic regulation of gene expression, as expected for imprinted and X-linked genes (unequal X-inactivation). This must be tested for each gene whose AEI ratios cannot be readily accounted for by polymorphisms alone. Because of the relative precision with which AEI ratios can be measured as a proximate phenotype, we were able to dissect genetic and epigenetic regulation of the X-linked MAOA, which escapes X-inactivation but is nonetheless regulated by a distinct mechanism of allelic CpG methylation in females, but not in males (Pinsonneault et al., 2006).

AEI analysis yields important information on suspected functional variants, including population frequency, magnitude of the effect, presence of more than one factor, and a first estimate of the location of the functional polymorphism relative to the marker SNP. If AEI ratios vary in magnitude, as do the allelic ratios of ACE, more than one functional polymorphism likely plays a role. If the suspected functional SNP is in complete linkage disequilibrium with the marker SNP, most or all AEI ratios are either <1 or >1, as observed with SOD2 in heart tissues (Fig. 3.3). In contrast, functional SNPs unlinked to the marker SNP are revealed by a random distribution of ratios <1 and >1 (Fig. 3.3), indicating that these are located in other haplotype blocks. A more precise location for functional polymorphisms may be attained

by scanning the gene locus for polymorphisms that are congruent with the allelic

expression ratios in all subjects heterozygous for the marker SNP(s). We have scanned a

number of the genes in this survey by selecting tagging SNPs and by direct sequencing,

leading to successful identification of putative functional alleles for OPRM1, MDR1,

MAOA, SERT, and TPH2 (Wang et al., 2005; Zhang et al., 2005; Lim et al., 2006;

Pinsonneault et al., 2006; Lim et al., 2007). For OPRM1, MDR1, and TPH2, we have

linked the observed AEI ratios to a SNP in the transcribed region of the gene, likely

involved in mRNA processing, turnover, and splicing, respectively. These three

proposed functional srSNPs affect folding and mRNA functions (Wang et al., 2005;

Zhang et al., 2005; Lim et al., 2007). Our results further indicated that a SNP in the 3’ portion of the MAOA gene does not account for altered gene expression (Pinsonneault et al., 2006). ACE gene scanning based on AEI ratios located a proposed functional rSNP in African-American samples that accounted for the strong differences we observed (see Chapter 4). Therefore, in this limited set of genes, srSNPs appear to account for frequent cis-acting polymorphisms in important target genes more often than rSNPs do.
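The interpretive rule stated above, that unidirectional ratios point to a functional variant linked to the marker (or the marker itself) while ratios on both sides of unity point to unlinked or multiple factors, can be sketched as a small function that also yields the direction codes of Table 3.3. The 0.3 log2 cutoff for counting a ratio as deviating is our assumption, borrowed from the Figure 3.3 legend; the function names and example values are hypothetical.

```python
def direction_code(log2_values, cutoff: float = 0.3) -> str:
    """Table 3.3-style direction: '>1', '<1', '<1>' (both), or 'none'."""
    above = any(v > cutoff for v in log2_values)
    below = any(v < -cutoff for v in log2_values)
    if above and below:
        return "<1>"
    if above:
        return ">1"
    if below:
        return "<1"
    return "none"


INTERPRETATION = {
    ">1": "functional variant likely in LD with the marker, or the marker itself",
    "<1": "functional variant likely in LD with the marker, or the marker itself",
    "<1>": "unlinked and/or multiple functional variants",
    "none": "no evidence of a cis-acting factor at this cutoff",
}

sod2_like = [0.55, 0.62, 0.48, 0.70, 0.35]  # hypothetical, all above unity
ace_like = [-1.6, 0.4, -0.5, 0.35, -0.1]    # hypothetical, both directions
for values in (sod2_like, ace_like):
    code = direction_code(values)
    print(code, "->", INTERPRETATION[code])
```

The rule is only a first pass: incomplete LD with a single strong variant can also produce a minority of ratios in the opposite direction, so genotype scanning remains necessary.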

Many more genes need to be studied in detail before the prevalence of these genetic mechanisms can be gauged on a genome-wide basis, but we propose that srSNPs represent at least a considerable source of human phenotypic variability and could play a significant role in evolution. One possible mechanism by which srSNPs arise is through alteration of functional mRNA structures. Nascent hnRNA and mRNA strands fold into energetically favorable structures that are studded with numerous proteins regulating maturation, splicing, and turnover. SNPs have the potential to alter mRNA-protein interactions and mRNA folding at multiple sites. Using Mfold to predict mRNA structures affected by systematic nucleotide changes in the genes encoding opioid receptors (Fig. 3.4), and similar predictions at the transcriptome-wide level, we find that any single nucleotide substitution has a 60% chance of affecting the minimum free energy conformation, and a >90% chance of changing the likely ensemble of structural conformations. Consistent with this, SNPs can be detected with a 95% discovery rate by a physical method based on single-strand conformational polymorphisms, which depends on folding-induced mobility changes of 500 bp single-stranded nucleic acids on non-denaturing HPLC. This supports the view that mRNAs exist in a dynamic state with interconverting conformations, whereby some SNPs may disturb the relative occupancy of important conformations. Our analysis of marker SNPs in SOD2, VKORC1 and NQO2 that correlate with AEI indicates that they may affect RNA structures.

Consequences of polymorphic changes in primary and secondary mRNA structures may include altered mRNA processing, turnover, splicing, and transport (Johnson et al.,

2005). Consistent with this, a number of synonymous polymorphisms in transcribed regions have been shown to affect mRNA stability (Duan et al., 2003; Wang et al., 2005).

Moreover, with approximately half of all human genes undergoing alternative splicing

(Modrek & Lee 2002; Lee & Irizarry 2003), an increasing number of genetic variants are being discovered that affect splicing (Attaie et al., 1997; Laporte et al., 1997; Plant et al.,

1999; Maillet et al., 1999; Bodzioch et al., 2001; Howe & Lynas 2001; Perez et al., 2003;

Tang et al., 2004; Murrell et al., 2005). The effects of both rSNPs and srSNPs are very likely context-dependent, varying with tissue type and environmental factors, in contrast to nonsynonymous SNPs, which affect protein sequence in every tissue. Therefore, fewer constraints exist on the evolution of rSNPs and srSNPs than on nonsynonymous cSNPs. Structural RNA SNPs can further affect translation, a process that cannot be studied with mRNA AEI. A recent report on COMT variants demonstrates a haplotype with a strong effect on protein levels, presumably resulting from altered translation via RNA structural changes (Nackley et al., 2006).

Measuring AEI ratios at the protein level, which would require the presence of nonsynonymous SNPs resulting in amino acid substitutions for allelic analysis, is a future possibility that would allow the determination of quantitative effects of polymorphisms on translation and protein turnover. This approach has already been applied to APOE3 and APOE4 variants that can be separated by electrophoresis, to address the question whether regulatory polymorphisms can modify risk of Alzheimer’s disease. Rapid analysis of protein AEI would enable broad analysis of the impact of polymorphisms on proteins.

Cardiovascular-related disorders comprise a major source of health care burden and therapeutic targeting. While numerous candidate genes have been implicated, few common contributing variants have been consistently supported, possibly in part because of a previous focus on cSNPs. We surveyed AEI for 18 cardiovascular candidate genes to search for functional variants at the mRNA level. These genes encompass drug targets and key players in oxidative stress response, inflammation and plaque development, lipid and triglyceride transport and metabolism, vasomotor tone, and heart contractility (see Table 3.2). Physiologically relevant target tissues included 65 heart failure explants from transplant recipients, livers, ex vivo monocytes, and peripheral blood monocyte-derived macrophages. AEI was detectable for 15 cardiovascular genes with a threshold of 20% imbalance in allelic expression (see Fig. 3.3), but only three genes displayed frequent and substantial AEI at a more conservative cutoff (allelic ratios >1.5).

For ACE, SOD2 and CCL2 we immediately performed a stringent test of assay validity: a separate, independent assay of the same tissue sample using a second marker SNP, with cDNA derived from a different primer. For example, two marker SNPs served to measure the substantial AEI for ACE (angiotensin I-converting enzyme), providing independent support for the accuracy of the measured AEI ratios (see Chapter 4). Notably, a common ACE polymorphism, an intron 16 insertion/deletion polymorphism (indel), has been the subject of several thousand published clinical association studies. Despite extensive study of the genetics of this critical physiological regulator, to our knowledge no definitive functional variants have been identified. There are so many conflicting association studies that a number of meta-analyses have been conducted in an attempt to reach conclusions (Sayed-Tabatabaei et al., 2006). We find common AEI of varying magnitude for ACE in both Caucasian and African-American heart tissue samples. In Caucasians, AEI ratios are mildly associated with the intron 16 indel, but also with a large haplotype block surrounding the indel, making identification of the causative variants challenging. In fact, the pattern of AEI ratios suggests that the indel is not responsible for differences in ACE expression. African-American samples show AEI of greater magnitude and display more haplotype diversity, allowing us to link the AEI with a polymorphism in the regulatory region, which is currently being followed up both in vitro and in clinical studies involving hypertensive patients (see Chapter 4).

A key factor involved in metabolizing superoxide, SOD2 (manganese superoxide dismutase 2, mitochrondrial) mRNA expression is upregulated in failing human hearts in association with oxidative stress, whereas protein levels and activity are decreased, suggesting translational or post-translational dysregulation (Sam et al., 2005). Two variants in SOD2 have previously been studied for association with cardiomyopathy

(Hiroi et al., 1999; Valenti et al., 2004), cancers (e.g., Hung et al., 2004), and other disorders including methamphetamine-induced psychosis (Nakamura et al., 2006). These studies have not shown consistent associations with either variant studied. Some evidence indicates that the nonsynonymous marker SNP (rs4880, -9A>V) used in the present study for AEI analysis affects a mitochondrial targeting sequence and may reduce mitochondrial uptake of the mature protein (Hiroi et al., 1999). Another report finds that a promoter region SNP (rs5746092) disrupts binding of AP-2 and affects SOD2 transcription (Xu et al., 1999). We find common AEI in failed heart tissues surveyed for

SOD2 (Fig. 3), with ratios consistently >1. This indicates a frequent functional SNP affecting mRNA levels may lie in the haplotype block containing the marker SNP.

Genotype scanning of the SOD2 locus with AEI ratios as the phenotype rules out the regulatory variant thought to affect transcription (rs5746092) as the main contributor to AEI, while the marker SNP itself is the most likely functional candidate, as deduced from the AEI ratios and allele frequencies. Because AEI is so frequent for this gene, we speculate that this may be a gain-of-function allele that has risen to high frequency in the human lineage owing to the protective role of SOD2. At the same time, SOD2 expression is known to be regulated by methylation (Huang et al., 1999), so we cannot rule out allele-specific epigenetic regulation as a contributing mechanism. We combined an HpaII methylation-sensitive restriction site near the rs4880 marker SNP with the AEI assay to test whether one allele was preferentially methylated in any DNA sample, and found no significant pattern of allelic methylation (data not shown). While further study of SOD2 regulation is necessary at both the mRNA and protein levels, the measured AEI ratios demonstrate functional variation in SOD2 mRNA expression in vivo that may have important implications for a number of disorders.

Less frequent or smaller AEI was observed for a number of other candidate genes, namely CCL2, NOS3, FLT1, HIF1A, HMOX1 and LPL. For CCL2 and NOS3, we genotyped additional regulatory region SNPs that have been the subject of previous reports (Rovin et al., 1999), but we failed to find a direct (CCL2) or complete (NOS3) association between these suspected functional variants and the observed AEI (data not shown). Our results for NOS3 are consistent with recently reported AEI results in brain tissues (Cirulli & Goldstein 2007). The results for CCL2 and NOS3 indicate that there may be additional functional alleles to be discovered and characterized. For the remaining genes (FLT1, HIF1A, HMOX1, and LPL), our results suggest either that these genes are not subject to common cis-regulatory differences affecting mRNA transcription and processing, at least in the tissues examined, or that specific gene activation, developmental, or disease states are required to reveal the functional nature of these variants. A small correlation was observed between AEI ratios and total mRNA expression for HIF1A and PTGDS, hinting at factors that may be uncovered by further study. Because these genes all have important physiological roles, even expression imbalances that are less common in the general population may be of clinical importance. Indeed, small changes in gene expression can have serious consequences, for example in tumor suppressor genes (Yan et al., 2002).

We found evidence for a small expression imbalance of HMGCR (3-hydroxy-3-methylglutaryl-CoA reductase), the direct target of the lipid-lowering statins. A large association study of candidate genes and pravastatin response indicated a likely functional variant, linked to two intronic SNPs in HMGCR, that causes a decreased lipid-lowering response and has a population frequency greater than 5% (Chasman et al., 2004). We observed AEI for HMGCR in two samples (8% of those surveyed). This suggests that these individuals may carry a variant predisposing them to reduced statin response, but limited follow-up haplotype tagging did not reveal any functional candidates that might confirm this (data not shown). If overall mRNA abundance is not the functional explanation of variability, effects on splicing or translation may be. Interestingly, we detected an alternative splice form of HMGCR that lacks the exon encoding the region where statins bind. This previously identified splice form (Johnson et al., 2003) showed variable but lower expression than the major isoform across a population of human liver samples; however, the ratio of the two isoforms did not directly relate to the AEI or haplotypes we measured (data not shown).

In summary, we have applied mRNA AEI analysis to 42 candidate genes, revealing many instances of previously unrecognized functional polymorphisms or other cis-acting factors. The AEI methodology can be applied on a large scale, and possibly genome-wide.

CHAPTER 4

Human ACE (angiotensin I-converting enzyme 1)

Angiotensin I-converting enzyme (ACE) has a wide tissue distribution, including plasma, endothelial cells, kidney, heart and lungs. This enzyme hydrolyzes a number of substrates, including angiotensin I, which it converts to angiotensin II as part of the renin-angiotensin system. Angiotensin II (Ang II) is a potent vasoconstrictor and pro-hypertrophic factor. Ang II also induces production of superoxide free radicals (O2-) that scavenge available nitric oxide and reduce endothelium-dependent vasodilatation. ACE has even greater affinity for bradykinin, hydrolyzing and thereby inactivating this potent vasodilator. Thus, variation in ACE levels can affect multiple key physiological processes and impinge on pathways important for the development of diseases, notably cardiovascular disease.

Angiotensin I-converting enzyme inhibitors are often first-line drugs of choice in treating hypertension and congestive heart failure. The observation that ACE enzyme activity has a heritable component (Cambien et al., 1988) indirectly led to thousands of studies aimed at uncovering genotype-phenotype associations with ACE, including response to ACE inhibitors (Arnett et al., 2005). The majority of these studies focus only on a single insertion/deletion (I/D) polymorphism in intron 16 of the gene. Meta-analyses of phenotypic associations with the I/D show a lack of association in most cases and mild associations at best (Sayed-Tabatabaei et al., 2006). Investigations of the potential functional implications of the I/D have failed to show any effect on transcription (Rosatto et al., 1999) or splicing (Lei et al., 2005), suggesting that linkage disequilibrium with other polymorphisms may account for the significant associations that have been reported.

The use of race in the analysis of hypertension and of response to antihypertensive drugs suggests that there are differences between population groups that may be attributable in part to individuals’ genetic ancestry. African-Americans are at higher risk of hypertension (Hajjar & Kotchen 2003) and end-stage renal disease (Karter et al., 2002), have poorer response and outcome with antihypertensive treatment (Exner et al., 2001; The BEST Investigators 2001; Sehgal 2004), and are more likely to experience adverse effects with ACE inhibitor treatment (McDowell et al., 2006). Genetic studies in families of African ancestry suggest a major heritable contribution of the ACE locus to ACE enzyme levels and blood pressure (Zhu et al., 2001), and comparisons of the ACE locus between European-Americans and African-Americans indicate major ancestral divergences (Rieder et al., 1999). The use of race in guiding treatment is controversial (Taylor et al., 2004), and ultimately patients may be better stratified and treated through a more complete understanding of the ancestral alleles that underlie such decisions (Bamshed 2005). Here I report novel functional alleles affecting ACE expression, discovered in a screen of heart failure tissues, and provide evidence both in vitro and in vivo that they act as functional regulatory alleles. These alleles are common in the African-American samples surveyed and progressively less common in Hispanic-Americans and Caucasian-Americans. In a follow-up clinical genetic association study in the genetic substudy of the INVEST cohort (Pepine et al., 2003), we found these alleles to be strongly associated with the primary endpoint, and more specifically with the incidence of myocardial infarction.

4.1 Method of investigation on human ACE

Tissue isolation and cDNA synthesis: Approval for use of human subjects was obtained from the OSU IRB. Left ventricle tissue from 65 heart transplant patients was obtained through The Cooperative Human Tissue Network: Midwestern Division at OSU and stored at -80 °C until extraction. Genomic DNA was prepared from tissue by a standard salting-out method. RNA was isolated by pulverizing ~100 mg of tissue over dry ice and suspending it in Trizol reagent, followed by phenol-chloroform extraction and passage through an RNeasy column (Qiagen) with DNase I treatment. RNA quantity and quality were confirmed by UV spectrophotometry and nanodrop analysis (Bioanalyzer, Agilent Biotechnologies). cDNA was synthesized in triplicate from 1.0 µg RNA using oligo dT and ACE gene-specific primers according to the manufacturer’s protocol (Superscript RTII, Invitrogen). Negative controls (RTII-) and positive expression signals for ACE were confirmed by RT-PCR on an ABI7000 instrument followed by gel electrophoresis.

Measurement of allelic expression imbalances: We used a previously described method to measure allelic expression imbalances in ACE mRNA (Wang et al., 2005). Briefly, we amplified short regions around the marker SNPs (rs4309, n=27; rs4343, n=23) in both DNA and cDNA from heterozygous individuals. Primer extension with fluorescent dideoxynucleotides allowed quantitation of the relative amounts of each allele by capillary electrophoresis on an ABI3730 instrument. Outer amplification primers were as follows: rs4309 (forward primer: TGAGATGGGCCATATACAGTACTAC; reverse primer: CCCGACGCAGGGAGAC), rs4343 (DNA forward primer: CCCTTACAAGCAGAGGTGAGCTAA; cDNA forward primer: CACTGTGTGCCACCCGAAT; reverse primer: CATGCCCATAACAGGTCTTCATATT). Extension primers were as follows: rs4309 (CTGCAGTACAAGGATCTGCC), rs4343 (GACGAATGTGATGGCCAC). Corrected allelic expression imbalance ratios for individual cDNAs were calculated by normalizing the cDNA allele ratio to the ratio of the corresponding genomic DNA peaks. Results from compound heterozygotes (heterozygous for both marker SNPs, n=18) were compared for internal validation (r² = 0.98). Homozygous DNA samples were mixed at varied ratios (1:4, 1:1, 4:1) for each SNP and assayed to determine the linearity of each assay (r² = 0.99 for both SNPs). Associations between genotypes and AEI results were tested using HelixTree software (Golden Helix, Inc., Bozeman, MT).
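The normalization described above can be sketched as follows. This is a minimal illustration, not the original analysis pipeline: it assumes allele-specific peak heights (or areas) have already been exported from the ABI3730 runs, and the function names and example numbers are hypothetical. The cDNA allele ratio is divided by the genomic DNA ratio from the same heterozygous individual; because genomic DNA carries the two alleles 1:1, any departure of the DNA ratio from 1 is treated as assay bias. The 1.8-fold cutoff mirrors the "large AEI" threshold used in the results, and the linearity check fits a least-squares line to log2 expected versus log2 observed ratios of the mixed homozygous DNA standards (the log2 scale matches Figure 4.2, but the original fitting procedure is not stated).

```python
import math
from typing import Sequence


def allele_ratio(peak_a: float, peak_b: float) -> float:
    """Ratio of two allele-specific peak signals from one primer-extension run."""
    if peak_a <= 0 or peak_b <= 0:
        raise ValueError("peak signals must be positive")
    return peak_a / peak_b


def corrected_aei(cdna_a: float, cdna_b: float, gdna_a: float, gdna_b: float) -> float:
    """cDNA allele ratio normalized to the genomic DNA allele ratio of the same sample.

    Genomic DNA of a heterozygote carries both alleles 1:1, so the observed
    DNA ratio estimates allele-specific assay bias, which is divided out here.
    """
    return allele_ratio(cdna_a, cdna_b) / allele_ratio(gdna_a, gdna_b)


def is_large_aei(ratio: float, fold: float = 1.8) -> bool:
    """True if the imbalance exceeds `fold` in either direction."""
    return max(ratio, 1.0 / ratio) > fold


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Coefficient of determination of a least-squares line through (x, y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy * sxy / (sxx * syy)


# Hypothetical example: one heterozygous heart sample.
aei = corrected_aei(cdna_a=3000, cdna_b=1000, gdna_a=1200, gdna_b=1000)
print(round(aei, 2), round(math.log2(aei), 2), is_large_aei(aei))  # 2.5 1.32 True

# Hypothetical linearity check with 1:4, 1:1 and 4:1 homozygous DNA mixtures.
expected = [math.log2(r) for r in (0.25, 1.0, 4.0)]
observed = [-1.8, 0.1, 1.9]
print(round(r_squared(expected, observed), 3))
```

The same `r_squared` helper could also be applied to the AEI ratios obtained from the two marker SNPs in compound heterozygotes, which is the internal validation described in the text.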

Resequencing of DNA samples: Genomic DNA from seven African-American samples (five with large AEI, two that did not exhibit significant AEI) was PCR amplified and directly sequenced on an ABI3730; chromatograms were analyzed with Xplorer (DNAtools, Inc.) and sequences with BLAT (Kent 2002). Amplicons were designed to cover the mRNA/cDNA sequence of the major ACE isoform, as well as genomic DNA from -3,942 bp 5’ of the transcription start site to +115 bp 3’ of the end of the consensus 3’UTR. Primers used in PCR reactions and sequencing are given in Table 4.1.

Genotyping of ACE polymorphisms: A diagram of the ACE gene structure and the relative locations (not to scale) of the genetic polymorphisms genotyped in these studies is shown in Figure 4.1. Marker SNPs (rs4309, rs4343) were genotyped in 65 heart failure samples by a melting curve dissociation approach on an ABI7000 real-time PCR instrument in order to identify heterozygotes for allelic expression assays (Papp et al., 2003). Genotyping of rs4343 was done with 600 nM of the G allele primer (GACGAATGTGATGGCCTCG), 200 nM of the A allele primer (GGGCCGGCCGCGCGACGAATGTGATGGCCGCA) and a matching reverse primer (CATGCCCATAACAGGTCTTCATATT). Genotyping of rs4309 was done with 600 nM of the C allele primer (TGCAGTACAAGGATCTGACC), 150 nM of the T allele primer (CGCCGGGCCGGCCGGTGCAGTACAAGGATCTGGCT) and 200 nM of the matching reverse primer (TCCCCAATGGCCTCATG). Genomic DNA for the INVEST-GENES clinical study was isolated from buccal samples using commercially available kits (PureGene, Gentra Systems Inc., Minneapolis, MN) and normalized to 20 ng/µL. Genotyping of the rs4290 polymorphism was performed at the University of Florida by polymerase chain reaction (PCR) followed by Pyrosequencing® (Pyrosequencing, Uppsala, Sweden) using a PSQ HS96A SNP reagent kit according to the manufacturer’s protocol (Biotage AB, Uppsala, Sweden) (Langaee & Ronaghi 2005).

Genotyping of SNPs rs7213516 and rs4291 was performed by TaqMan assay. The PCR and sequencing primers used for ACE SNP rs4290 were as follows: forward biotinylated PCR primer, 5’-GAGTGTGGGTCATTTCCTCTTT-3’; reverse PCR primer, 5’-AGTTTAGCATGGTGCCTAGCA-3’; and reverse sequencing primer, 5’-GGGCAAAACCTCATC-3’. The PCR conditions were 95 ºC for 15 min; 40 cycles of denaturation at 94 ºC for 30 s, annealing at 59 ºC for 30 s, and extension at 72 ºC for 1 min; and a final extension at 72 ºC for 7 min. The Applied Biosystems 7900 HT SNP genotyping platform was used for the TaqMan® assays, with SNP genotyping probes C_32160109_10 and C__11942507_10 (Applied Biosystems, Foster City, USA) for ACE rs7213516 A>G and rs4291 A>T, respectively. Five-µL reactions were prepared in 384-well plates, and the assays were performed and analyzed according to the manufacturer's recommendations. Additionally, assays were developed at OSU for the following polymorphisms: rs7213516, rs4290, rs4291, rs4292, rs13447447, rs4366.

Forward primer                              Reverse primer
-3040F    CAGCCCCAAATTTTGTATATGG            -2235R    GTTACTGGAGGGCAGGGATG
-2677F    TTCTCCTTTGTTGTGACGGC              -1718R    TCTGTGTGCAAATGAGCTGC
-1546F    TGTCCTCTGGTATCCACTGGCT            -542R     GACCTTAGGTGTCTTGCAGGC
-3040F*   (as above)                        -1287R    TCCTGTGAGATGCACCTCCAG
-661F     AGGCGCTCCAAAGCTCC                 +251R     GTGATGTTGGTGTCGTGCG
+122F     CTGCAGCCCGGCAACTT                 +832R     GTATCTGTCTCCGTATCGGCG
+624F     AACCGCTGTACGAGGATTTCAC            +1373R    CGATTTTGTGCAGATGTTCAGG
+1230F    TGAGATGGGCCATATACAGTACTAC         +1938R    CCCTCCGGGTAGTTGTCAGG
+1775F    CTGAAGGACATGGTCGGCTTAG            +2506R    CCACGAGTCCCCTGCATCTAC
+2312F    TGGAAACCACCTACAGCGTG              +3044R    CCCTCAAGGCCACAGGTAAGT
+3025F    CTTACCTGTGGCCTTGAGGG              +3725R    CTTCTGAGCGAGCGGAGTTC
+3258F    AAGCATCACCAAGGAGAACTATAACC        +4174R    TGTATTCACAGAGAGACTTGGAGAGGT
3'UTRfwd  GAACACTTGCCATTTTGAGCC             3'UTRrev  AGGATGGAGGAACAAACCTAGTAAC

* Amplified with -3040F, sequenced from the reverse primer (-1287R).

Table 4.1: Forward and reverse oligonucleotide primers used for PCR amplification and resequencing of human ACE.

Two SNPs (rs4291 and rs4292) were genotyped using a single PCR reaction (forward primer: AGGCGCTCCAAAGCTCC; reverse primer: GTGATGTTGGTGTCGTGCGCCC) followed by a multiplexed SNaPshot primer extension reaction (rs4291 reverse: TGGCTAGAAAGGGCCTCCTCTCTTT; rs4292 reverse: CTTCCTCCTCCGCTCCA). The nucleotides in italics are tails that do not match genomic sequence; they were added to increase the melting temperature of the products and to differentiate them from native repetitive DNA. The multiplexed primer extension was run with 100 nM of the rs4291 extension primer and 400 nM of the rs4292 extension primer.

The upstream SNP rs7213516 was genotyped by PCR with both unlabeled and FAM-labeled forward primer (FAM-GCCCCAGCACCATTTGTTAA) and a reverse primer (CAGAGACCTGACCCACGTGAG), followed by digestion with HinfI (which cuts the minor allele at one additional site), capillary electrophoresis and analysis of the resulting product sizes (minor allele peak at 87 bp, major allele peak at 99 bp). Genotyping of the upstream SNP rs4290 was done using the melting curve approach on an ABI7000 with 600 nM C allele primer (CACAGGGCAAAACCTCAACG), 200 nM T allele primer (GGCGCGCCGCGGGCCCACAGGGCAAAACCTCACCA) and 200 nM forward primer (GTCATTTCCTCTTTCCTCTGCAC). The 287-bp insertion/deletion polymorphism (rs13447447) was genotyped by PCR with both unlabeled and FAM-labeled reverse primer (FAM-GTGGCCATCACATTCGTCAG) and two forward primers, one of which was insert-specific (common: CCCATCCTTTCTCCCATTTCT; insert-specific: GACCTCGTGATCCGCCC); products were run on an ABI3730 capillary electrophoresis instrument to

distinguish product sizes (insertion allele peaks at 191 bp and 462 bp, deletion allele peak at 175 bp). The (CT)2/3 repeat polymorphism (rs4366) was genotyped by PCR with a FAM-labeled forward primer (FAM-TGGCTCCTGCCTGTACCAG) and a reverse primer (CCAAGGCTGTTCACCCGA), followed by separation by capillary electrophoresis.

[Figure 4.1 schematic: ACE gene (5 kbp scale) showing the Alu I/D (287 bp indel), CpG island, alternative start site, rs4309 C>T, rs4343 A>G and rs4366 (CT)2/3; second panel (0.5 kbp scale) showing rs4290, rs4291, rs4292, rs7213516, rs7214530 and the CpG island]

Figure 4.1: Human ACE gene structure and relevant genetic polymorphisms (chromosome 17q23.3)

Genotype assays were validated, when possible, against samples of known genotype from the HapMap database. For quality control, genotypes for each polymorphism typed at the University of Florida were confirmed through a blind genotyping program by re-genotyping 5% of samples with the same or an alternative method. Genotyping data validity was also checked by testing for Hardy-Weinberg equilibrium. Haplotypes were computationally derived and pairwise linkage disequilibrium (D′) calculated separately for each racial/ethnic group using the Polymorphism and Haplotype Analysis Suite (http://ilya.wustl.edu/~pgrn/programs.html). Assays for most polymorphisms were confirmed by 100% concordance with direct sequencing, SNaPshot primer extension, results from the University of Florida, or assays run on genomic DNA isolated from immortalized B-lymphoblast Coriell cell lines for which genotypes are publicly available as part of the HapMap project (The International HapMap Consortium, 2005).
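The size-based genotype calls for the HinfI digest of rs7213516 and for the rs13447447 indel reduce to checking which diagnostic fragment sizes appear among the detected peaks. The sketch below is a hypothetical helper, not the software actually used: the fragment sizes come from the assay descriptions above, but the ±2 bp tolerance is an assumption, and real calls also depend on size-standard calibration and peak-height thresholds set in the fragment analysis software.

```python
from typing import Dict, Iterable, List

# Diagnostic fragment sizes (bp) taken from the assay descriptions above.
RS7213516_HINFI = {"major": [99.0], "minor": [87.0]}
RS13447447_INDEL = {"I": [191.0, 462.0], "D": [175.0]}


def call_genotype(peaks_bp: Iterable[float],
                  allele_sizes: Dict[str, List[float]],
                  tol: float = 2.0) -> str:
    """Call a biallelic genotype from sized capillary electrophoresis peaks.

    An allele is considered present if any of its diagnostic fragments is
    observed within `tol` bp. Returns e.g. 'D/D', 'I/D', or 'no call'.
    """
    peaks = list(peaks_bp)
    present = [allele for allele, sizes in allele_sizes.items()
               if any(abs(p - s) <= tol for p in peaks for s in sizes)]
    if not present:
        return "no call"
    if len(present) == 1:
        return f"{present[0]}/{present[0]}"
    return "/".join(present)


print(call_genotype([99.4], RS7213516_HINFI))                  # major/major
print(call_genotype([87.2, 98.9], RS7213516_HINFI))            # major/minor
print(call_genotype([175.1, 190.6, 461.0], RS13447447_INDEL))  # I/D
```

For the indel, either insertion-specific product (191 or 462 bp) is taken as evidence of the insertion allele, because the longer product spanning the 287-bp insert may amplify less efficiently.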

To control for potential population stratification in our racially and ethnically diverse population, we used a panel of 87 autosomal ancestry informative markers (AIMs) that show large allele frequency differences across three parental populations (West Africans, Indigenous Americans, and Europeans) (Shriver et al., 2005). These AIMs were selected to be distributed across the genome and widely spaced, so that each provides independent information on genetic background. The 87 AIMs were genotyped using either allele-specific PCR with universal energy-transfer-labeled primers or competitive allele-specific PCR at Prevention Genetics (Marshfield, Wisconsin) (Myakishev et al., 2001).
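These AIM genotypes are the input for the maximum-likelihood estimates of individual ancestry proportions that are later used as covariates. The following is a simplified sketch of such an estimator, not the implementation used by Shriver et al. (2005): it assumes the parental allele frequencies at each AIM are known, that loci are independent and in Hardy-Weinberg proportions within the admixed individual, and it maximizes the binomial likelihood by a plain grid search over the three-population simplex. Function names, the grid step and the toy allele frequencies are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Reference-allele frequency in (African, European, Native American) parental populations.
Freqs = Tuple[float, float, float]


def log_likelihood(q: Sequence[float], genotypes: Sequence[int],
                   parental_freqs: Sequence[Freqs]) -> float:
    """Binomial log-likelihood of reference-allele counts (0, 1 or 2) given ancestry proportions q."""
    ll = 0.0
    for g, freqs in zip(genotypes, parental_freqs):
        f = sum(qk * pk for qk, pk in zip(q, freqs))  # expected allele frequency in this individual
        f = min(max(f, 1e-9), 1.0 - 1e-9)             # avoid log(0)
        ll += g * math.log(f) + (2 - g) * math.log(1.0 - f)
    return ll


def estimate_ancestry(genotypes: Sequence[int], parental_freqs: Sequence[Freqs],
                      steps: int = 100) -> Freqs:
    """Grid-search ML estimate of (African, European, Native American) proportions summing to 1."""
    best, best_ll = (1 / 3, 1 / 3, 1 / 3), -math.inf
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            q = (i / steps, j / steps, (steps - i - j) / steps)
            ll = log_likelihood(q, genotypes, parental_freqs)
            if ll > best_ll:
                best, best_ll = q, ll
    return best


# Toy panel: each AIM is common (0.9) in one parental population and rare (0.1) in the others.
panel: List[Freqs] = [(0.9, 0.1, 0.1), (0.1, 0.9, 0.1), (0.1, 0.1, 0.9)] * 10
# Hypothetical individual homozygous for the "African" allele at every African-informative AIM
# and carrying no copies of the European- or Native American-informative alleles.
genos = [2, 0, 0] * 10
print(estimate_ancestry(genos, panel))  # (1.0, 0.0, 0.0)
```

A grid search is chosen here for transparency; with 87 markers and three populations it is fast enough, although a numerical optimizer or EM algorithm would be the more usual choice for larger panels.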

DNA samples from the INternational VErapamil-Trandolapril STudy (INVEST) cohort: INVEST evaluated blood pressure and adverse outcomes occurring with either an atenolol-based or a verapamil SR-based hypertension treatment strategy in 22,576 patients with documented coronary artery disease (CAD) and hypertension (Pepine et al., 2003). Race/ethnicity was determined by patient report, with input from the site investigator, choosing all that applied among the following options on the INVEST data collection form: white, black, Asian, Hispanic, and “other.” In this analysis, Hispanic patients were defined as those who chose only ‘Hispanic’ on the data collection form. The design, protocol, and primary outcome have been published in detail elsewhere (Beitelshees et al., 2005). Briefly, the protocol required patients to be seen at baseline, 6, 12, 18, and 24 weeks, and then every 6 months thereafter until two years after the last patient was enrolled. At each visit, patients had BP and heart rate measured and underwent clinical assessment, and additional antihypertensive medications were added as needed to meet JNC VI BP goals (Joint National Committee 1997; Beitelshees et al., 2005). The primary outcome was the first occurrence of death (all cause), nonfatal MI, or nonfatal stroke, and all events were adjudicated by an independent events adjudication committee. Clinical Trial Registration Identifier: NCT00133692, URL: http://clinicaltrials.gov/ct/gui/show/NCT00133692?order=5

Genetic samples were collected from 5,979 INVEST patients residing in the mainland United States and Puerto Rico. Genomic DNA was collected from buccal cells in mouthwash samples as previously described (Andrisin et al., 2002). All patients provided written informed consent for participation in the genetic substudy, and the study was approved by the University of Florida IRB. We conducted a nested case-control study among the 258 INVEST-GENES patients who experienced a primary outcome event (death, nonfatal MI, or nonfatal stroke) during study follow-up (cases). A total of 774 individuals who did not have an event during study follow-up were frequency matched to cases in a 3:1 ratio for age, sex, and race/ethnicity (controls).

Baseline characteristics of cases and controls in INVEST-GENES (see Table 4.2) were compared using t-tests for continuous variables and chi-squared tests for categorical variables. Hardy-Weinberg equilibrium (HWE) of the genotype frequencies within each race/ethnic group was tested with a chi-squared test with one degree of freedom. Maximum likelihood was used to estimate each patient’s individual genomic ancestry proportions for the three parental ancestries (African, European, Native American), and two of these three terms (the estimates of African and Native American ancestry) were included in statistical models to control for potential population stratification (Shriver et al., 2005).
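The per-group HWE test described above is the standard Pearson goodness-of-fit test with one degree of freedom (three genotype classes, minus one, minus one estimated allele frequency). A minimal sketch with a hypothetical function name and example counts: for one degree of freedom the chi-square upper-tail probability equals erfc(sqrt(chi2/2)), so only the Python standard library is needed. With very small groups or rare alleles an exact test would be preferable; that is a design note, not a description of the original analysis.

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Pearson chi-square test of Hardy-Weinberg equilibrium for one biallelic marker.

    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of the chi-square distribution with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


print(hwe_chi_square(25, 50, 25))  # (0.0, 1.0): exactly the HWE expectation
print(hwe_chi_square(50, 0, 50))   # chi2 = 100.0, p far below 0.05 (no heterozygotes)
```

In the study design this test would be run separately on the genotype counts of each race/ethnic group.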

Because of the low minor allele frequencies of both the rs7213516 and rs4290 SNPs, we decided a priori to combine heterozygous patients with homozygous variant patients for all analyses, assuming a dominant mode of inheritance. Logistic regression was performed to assess the association of the genotypes/haplotypes with the primary and secondary outcomes after adjusting for ancestry and for prespecified confounding factors documented in the primary INVEST analysis, namely age (by decades), gender, race/ethnicity, and history of myocardial infarction and heart failure. The Kaplan-Meier method was used to compare time to event between genotypes. Analyses were also conducted to assess the influence of the study drugs on the genetic associations.
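The dominant coding and the Kaplan-Meier comparison can be sketched with the standard library alone. This is an illustration under stated assumptions, not the University of Florida analysis code: genotypes are assumed to be strings such as "C/T", the helper names and follow-up data are hypothetical, and when an event and a censoring share the same time the censored patient is counted as still at risk at that time (the usual convention). The adjusted logistic regression itself is not reproduced here.

```python
from typing import Dict, List, Sequence, Tuple


def dominant_code(genotype: str, minor_allele: str) -> int:
    """1 for carriers of at least one minor allele (heterozygous or homozygous variant), else 0."""
    return int(minor_allele in genotype.split("/"))


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit survival estimate; events: 1 = outcome occurred, 0 = censored.

    Returns (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_at_t = 0
        while i < len(data) and data[i][0] == t:  # group all records sharing this time
            n_events += data[i][1]
            n_at_t += 1
            i += 1
        if n_events:
            surv *= 1.0 - n_events / at_risk
            curve.append((t, surv))
        at_risk -= n_at_t  # events and censorings at t leave the risk set afterwards
    return curve


# Hypothetical follow-up data (years) for six patients.
genotypes = ["C/C", "C/T", "T/T", "C/C", "C/T", "C/C"]
times = [1.0, 2.0, 2.0, 3.0, 4.0, 5.0]
events = [1, 1, 0, 0, 1, 0]

groups: Dict[int, Tuple[List[float], List[int]]] = {0: ([], []), 1: ([], [])}
for g, t, e in zip(genotypes, times, events):
    carrier = dominant_code(g, "T")
    groups[carrier][0].append(t)
    groups[carrier][1].append(e)

for carrier, (t_list, e_list) in sorted(groups.items()):
    print(carrier, kaplan_meier(t_list, e_list))
```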

Characteristic*                        Cases (N = 258)    Controls (N = 774)
Age, mean (SD), years                  71.5 (9.9)         70.2 (9.3)
Women                                  131 (50.8)         393 (50.8)
BP, mean (SD), mmHg
  Systolic                             150.6 (19.0)       147.4 (19.0)
  Diastolic                            83.6 (11.1)        83.3 (11.1)
Race/ethnicity
  White                                158 (61.2)         472 (61.0)
  Black                                36 (14.0)          101 (13.1)
  Hispanic                             63 (24.4)          198 (25.6)
  Other/multiracial                    1 (0.4)            3 (0.4)
BMI, mean (SD), kg/m2                  27.4 (4.8)         29.0 (5.5)
Past medical history
  Myocardial infarction                96 (37.2)          230 (29.7)
  Angina pectoris                      153 (59.3)         483 (62.4)
  Stroke/TIA                           36 (14.0)          71 (9.2)
  Left ventricular hypertrophy         46 (17.8)          136 (17.6)
  Heart failure (class I-III)          28 (10.9)          29 (3.8)
  Peripheral vascular disease          43 (16.7)          88 (11.4)
Smoking
  Past                                 133 (51.6)         355 (45.9)
  Within 30 days                       34 (13.2)          83 (10.7)
Diabetes‡                              102 (39.5)         224 (28.9)
Hypercholesterolemia‡                  161 (62.4)         485 (62.7)
Renal impairment†                      14 (5.4)           18 (2.3)
Cancer                                 20 (7.8)           46 (5.9)
Medication
  Aspirin/other antiplatelet agent     162 (62.8)         451 (58.3)
  Antidiabetic medication              86 (33.3)          188 (24.3)
  Any lipid-lowering agent             106 (41.1)         331 (42.8)
  Nitrates                             92 (35.7)          232 (30.0)

* Values expressed as number (percentage) unless otherwise indicated. Percentages may not total 100 due to rounding. ‡ History of, or currently taking, antidiabetic or lipid-lowering medications. † History of, or current, elevated serum creatinine level, but less than 4 mg/dL (< 354 µmol/L).

Table 4.2: Baseline characteristics of INVEST case-control cohort.

Time-varying exposure was used for HCTZ and trandolapril, since these drugs were added at different times for each patient depending on individual BP control. A total of 983 DNA samples from the genetic substudy of the INVEST clinical trial (Pepine et al., 2003) were genotyped for the following variants: rs7213516, rs7214530, rs4290, rs4291, rs13447447, rs4366. Genotype and haplotype associations with clinical phenotypes were analyzed with multivariable models at the University of Florida. Variables included age, gender, race (self-identified as Caucasian-, Hispanic- or African-American), treatment arm (verapamil or atenolol), prescription of an ACE inhibitor (trandolapril, possible in both arms), history of myocardial infarction (MI), history of heart failure (HF), history of diabetes mellitus (DM) and ancestry (determined by genotyping of 87 ancestry informative markers). The primary clinical phenotype was a combined outcome consisting of death, nonfatal MI, or nonfatal stroke after trial enrollment. The secondary clinical endpoints were each of the three components of the primary outcome, considered separately.

DNA samples from the OSUP database: 117 African-American DNA samples in Dr. Glen Cooke’s OSUP (OSU patient) database were genotyped for markers rs7213516 and rs4290. These individuals were seen by Dr. Cooke under an OSU IRB-approved protocol. The population was highly hypertensive and enrolled under CAD-related criteria, and thus should not be considered reflective of the general population; it was considered to have a more advanced stage of disease than the Florida cohort. The general criteria for enrollment were evidence of symptomatic CAD requiring percutaneous intervention and greater than 70 percent stenosis in one or more coronary arteries. Most enrollees also had abnormal results on standard exercise stress testing.

The following clinical endpoints were adjudicated by Dr. Cooke and analyzed in combination with genotype data: history of CAD, history of MI, history of restenosis, family history of CAD, incidence of elective/stable angiography, unstable angina (UA), or acute MI after enrollment, classification of MI as ST segment-elevation (STEMI) or non-ST segment-elevation (NSTEMI), hypertension status (defined as systolic blood pressure greater than 140 mmHg on three or more separate occasions and/or prescription of antihypertensive medication), lipid elevation status (total cholesterol greater than 200 and/or LDL cholesterol greater than 130), diabetes mellitus status, use, death, restenosis event, cerebrovascular/transient ischemic attack (CVA/TIA) event (mostly pre-enrollment), UA event requiring hospitalization, coronary artery bypass graft (CABG), and target vessel restenosis (TVR). Many of these categorizations result in groups with sample sizes too small for reliable statistical analysis; thus, a combined endpoint similar to the definition used in the University of Florida population was created by combining death, MI and CVA/TIA events before analysis by genotype. Lipid status, MI (before and after enrollment), CVA/TIA, hypertension status, diabetes mellitus and family history of CAD were also tested for association with the genotypes of individual SNPs.

In vitro mutagenesis of ACE upstream constructs: An ACE upstream region construct (-4335 bp to the transcription start site) driving expression of a firefly luciferase reporter gene (pGL3.Basic, Promega) was provided by M. Eyries (see Eyries et al., 2002). Sequencing indicated that this construct contained the major allele at all sites in the region; it was therefore labeled ACEwt. Site-directed mutagenesis (Stratagene) was employed to generate altered constructs with SNP combinations: ACErs4290T (sense primer: CTCTGCACCCTTCCTTTGATGAGGTTTTGCCCT; antisense primer: AGGGCAAAACCTCATCAAAGGAAGGGTGCAGAG), ACErs7214530G (sense primer: GAGCATATTTTTAAGGGCTGGTTTTCTCTCCTGTGGTAACT; antisense primer: AGTTACCACAGGAGAGAAAACCAGCCCTTAAAAATATGCTC), and ACErs4290T/rs7214530G. A fifth construct containing three minor alleles, ACErs7213516A/rs4290T/rs7214530G, was isolated by PCR of an individual genomic DNA (forward primer: GAGACGGAGTTTTGCTCTTGTTG; reverse primer: CAGAGACCTGACCCACGTGAG), restriction digestion with MscI and BstEII, and ligation into the digested plasmid. All plasmid inserts were fully sequenced, confirming the absence of additional genetic differences.

ACE allelic reporter gene activity assays: HEK cells were cultured to confluence in 50/50 DMEM/F12 medium plus glutamine and penicillin, and seeded at 5.0 × 10⁶ cells/mL in 96-well plates. Each ACE construct (0.8 µg/well) was co-transfected with a Renilla control gene (40 ng/well) into separate wells using Lipofectamine (Invitrogen) according to the manufacturer’s protocol. Medium was replaced 4 h post-transfection, and wells were treated with either DMSO (100 nM) or PMA (100 nM). At 24 h or 48 h post-treatment, cells were assayed in RPMI medium for luciferase activity according to the manufacturer’s protocol (Kit #E2920, Promega) on a Packard Fusion plate reader system. All experiments were replicated in at least sixteen individual wells, representing transfections on at least three separate occasions. Bovine aortic endothelial cells (BAEC) at passage 12 were plated in 12-well plates in 1X DMEM medium and similarly transfected and assayed for activity.
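Dual-reporter data of this kind are commonly reduced by dividing each well's firefly signal by its Renilla signal (to correct for transfection efficiency and cell number) and expressing each construct relative to ACEwt under the same treatment and time point. The sketch below assumes that per-well luminescence readings have been exported as lists; the function names, the "variant_construct" label and the numbers are hypothetical, and the text does not specify which normalization or statistical test was actually applied.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple


def normalized_activity(firefly: Sequence[float], renilla: Sequence[float]) -> List[float]:
    """Per-well firefly/Renilla luminescence ratios."""
    if len(firefly) != len(renilla):
        raise ValueError("one Renilla reading is needed per firefly reading")
    return [f / r for f, r in zip(firefly, renilla)]


def fold_vs_reference(wells: Dict[str, List[float]],
                      reference: str = "ACEwt") -> Dict[str, Tuple[float, float]]:
    """Mean and SD of each construct's normalized activity, scaled to the reference construct's mean."""
    ref_mean = mean(wells[reference])
    return {name: (mean(v) / ref_mean, stdev(v) / ref_mean) for name, v in wells.items()}


# Hypothetical raw readings for two constructs under one treatment and time point.
wells = {
    "ACEwt": normalized_activity([1000, 1100, 900], [100, 110, 90]),
    "variant_construct": normalized_activity([2000, 2200, 2100], [100, 100, 100]),
}
for name, (fold, sd) in fold_vs_reference(wells).items():
    print(f"{name}: {fold:.2f} +/- {sd:.2f}")
```

Comparing each construct only with ACEwt from the same treatment (DMSO or PMA) and time point keeps differences in overall promoter induction from being mistaken for allelic effects.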

Total mRNA expression levels in tissues: Total ACE mRNA expression levels were measured in heart, kidney, liver and small bowel tissues and in immortalized, transformed B-lymphoblast cell lines by RT-PCR with cDNA-specific primers spanning the exon 9/10 border (forward primer: CCCCTTCCCGCTACAACTT; reverse primer spanning exons: TCCCCTGATACTTGGTTCGAA), with amplicon specificity confirmed by DNA gel electrophoresis. RT-PCR was performed using SYBR Green PCR Master Mix on an ABI7000 for 30 cycles (two-step cycling: 95 °C, 60 °C).

ACE serum enzyme activity assays: African-American serum samples from the OSUP database were selected based on rs4290 genotype (14 CC, 11 CT, 2 TT). Serum samples were not fresh, having been frozen at -20 °C for an extended period of time. For each sample, 100 µL of serum was diluted in 100 µL of autoclaved distilled water and assayed for ACE enzyme activity according to the manufacturer’s protocol (ACE Colorimetric enzymatic assay, ALPCO Diagnostics, 01-KK-ACE). Briefly, the enzyme cleaves a synthetic substrate, N-hippuryl-L-histidyl-L-leucine, to form hippuric acid and the dipeptide histidyl-leucine. Addition of hydrochloric acid quenches the reaction; the hippuric acid product is then complexed with cyanuric chloride, and absorbance is measured at 382 nm, yielding values proportional to the starting ACE serum activity. In parallel with sample measurements, a standard curve is constructed from known concentrations of hippuric acid (250, 500, 1000, 1500, 2000, 2500 µmol/L) and used to normalize sample results.
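Reading sample values off the hippuric acid standard curve amounts to a linear least-squares fit of absorbance at 382 nm against the standard concentrations listed above, followed by inverse prediction. A minimal sketch, assuming the response is linear over 250 to 2500 µmol/L; the absorbance values are invented for illustration, and the optional dilution factor is an assumption (the text does not state whether the kit calculation already corrects for the 1:1 water dilution of the serum).

```python
from typing import Sequence, Tuple

STANDARDS_UMOL_L = [250, 500, 1000, 1500, 2000, 2500]  # hippuric acid standards from the protocol


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def to_hippuric_acid(absorbance: float, slope: float, intercept: float,
                     dilution_factor: float = 1.0) -> float:
    """Invert the standard curve; the result is proportional to serum ACE activity."""
    return (absorbance - intercept) / slope * dilution_factor


# Invented standard-curve absorbances (perfectly linear here for clarity).
a382 = [0.02 + 0.0004 * c for c in STANDARDS_UMOL_L]
slope, intercept = fit_line(STANDARDS_UMOL_L, a382)
print(round(to_hippuric_acid(0.42, slope, intercept), 1))                       # about 1000.0
print(round(to_hippuric_acid(0.42, slope, intercept, dilution_factor=2.0), 1))  # about 2000.0
```

Sample absorbances falling outside the range of the standards would require extrapolation and are better re-assayed at a different dilution.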

CpG island methylation assay: 200 ng of each of four genomic DNA samples from each of three tissues (heart, liver, small bowel) was digested in separate reactions with the methylation-sensitive restriction enzymes SmaI and HhaI for 16 h at 37 °C. Separate reactions for each DNA lacking restriction enzyme were subjected to identical conditions in parallel. The restriction enzymes were selected for their ability to cut within amplicons in the ACE CpG island (see Fig. 4.1; SmaI cuts in exon 1 at +118 relative to the TSS, HhaI cuts at -563 relative to the TSS). A preamplification reaction was run for both cut and uncut reactions for 8 cycles with a 2-hour extension step to increase specificity, using 5 nM of primers carrying a 5’ sequence address tag (the tag portions match the FAM- and HEX-labeled primers listed below): HhaI forward FAM tag: CGATGGCCCACTACGTGAATGAGCCCCTCCAGCACC; HhaI forward HEX tag: ATACCGCGCCACATAGCATGAGCCCCTCCAGCACC; HhaI reverse: GGGAGAGGAGGGAGGCG; SmaI forward FAM tag: CGATGGCCCACTACGTGAAGTTTTATAACCCGCAGGGCG; SmaI forward HEX tag: ATACCGCGCCACATAGCAGTTTTATAACCCGCAGGGCG; SmaI reverse: CCGGCCTCGTCAGCAGA. A 1:15 dilution of the preamplification reactions was added to a 25-cycle PCR (annealing at 60 °C for 1 min, extension at 72 °C for 1 min) with FAM/HEX-labeled primers (250 nM) that annealed to the sequence address tags (FAM-CGATGGCCCACTACGTGAA, HEX-ATACCGCGCCACATAGCA) and the corresponding reverse primers, and the products were run through capillary electrophoresis on an ABI3730.
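Because SmaI and HhaI cannot cleave their recognition sites when the CpG is methylated, template that survives digestion and still amplifies is, to a first approximation, methylated. A minimal sketch of the resulting calculation, assuming capillary electrophoresis peak areas are available for each digested reaction and its mock-digested (no enzyme) partner, and that a dye calibration factor can be obtained from identical templates amplified with the FAM and HEX tags. The text does not state which dye tag was assigned to the cut versus uncut reaction, so the function, its parameters and the numbers are hypothetical; incomplete digestion would inflate the estimate.

```python
def methylated_fraction(cut_area: float, uncut_area: float, dye_calibration: float = 1.0) -> float:
    """Approximate fraction of copies protected from methylation-sensitive digestion.

    cut_area: peak area from the enzyme-digested reaction.
    uncut_area: peak area from the parallel reaction without enzyme.
    dye_calibration: (cut-dye signal / uncut-dye signal) for equal template input;
        1.0 assumes both dyes report equally.
    """
    if uncut_area <= 0:
        raise ValueError("uncut reaction must give a positive signal")
    fraction = (cut_area / uncut_area) / dye_calibration
    # Ratios slightly above 1 can arise from measurement noise; clip to [0, 1].
    return min(max(fraction, 0.0), 1.0)


print(methylated_fraction(450.0, 900.0))   # 0.5
print(methylated_fraction(30.0, 1000.0))   # 0.03: largely unmethylated site
```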

4.2 Results of investigation of human ACE

Allelic expression of ACE reveals major cis-acting factors: Measurement of ACE AEI in heart tissues revealed dramatic expression imbalances in African-American but not Caucasian-American samples (see Fig. 4.2). This suggested that differences in allele frequencies between the populations explained the results, and it led to further investigations to discover the contributing alleles. Notably, AEI measurements in other tissues (kidney, liver, small bowel) do seem to confirm association with specific ACE alleles (see below), but very few African-American tissue samples were available for study, making it difficult to determine whether these allelic effects are tissue-specific or more general (data not shown).

Resequencing identifies highly associated markers: Resequencing of seven African-American samples (five displaying significant AEI in heart tissue and two displaying no AEI) revealed no polymorphisms within the mRNA region (UTR or protein-coding region) that could account for the observed expression imbalances. At the same time, three polymorphisms (rs7213516, rs7214530, rs4290) in a region upstream of the ACE transcription start site were found to be highly associated with the observed imbalances: they were 1) heterozygous in most or all of the five samples displaying large AEI and 2) homozygous in both samples that did not display AEI. The association of rs4290 with AEI of large magnitude (>1.8-fold imbalance) was strong (p < 10⁻⁷). In all samples surveyed, rs7214530 was in complete linkage disequilibrium (LD) with rs4290 and thus showed the same level of significance.

[Figure: ACE AEI (log base 2 scale) in left ventricle samples, grouped as African-American and Caucasian-American; *P<0.001]

Allelic expression imbalances for ACE in human left ventricle tissues. Results are displayed on the log base 2 scale, with error bars showing standard deviations of multiple assays and cDNA syntheses run on different dates. The vertical dashed bar separates samples self-identified as African-American or Caucasian-American. Differences greater than ± 0.5 are taken to be highly significant. For the leftmost African-American sample, one RNA preparation was of low quality by NanoDrop analysis and was excluded from analysis.

Figure 4.2: Allelic expression imbalance of ACE in heart tissue samples

The variant rs7213516 was in fairly high, although incomplete, LD with rs4290 and showed a slightly lower level of association with AEI. Further genotyping of all heart DNAs with available AEI results indicated that this association held; the minor alleles of these three SNPs were not found in the Caucasian-American samples, which largely did not exhibit significant AEI in heart tissue (see Fig 4.2). Additional markers (rs4291,

rs4292, rs4357, rs4363, rs13447447, rs4366) were selected as haplotype tagging SNPs

based on published studies on ACE and genotyped for all samples with AEI data. Among

the other SNPs that were genotyped, only rs4357 showed significant association with

AEI. This variant was found to be in high LD with rs4290. The highly studied insertion/deletion variant (rs13447447) had a single-locus p-value of 0.09 for association with large AEI and showed even weaker association at lower AEI magnitude cutoffs.

Allele associated with AEI induces increased ACE reporter gene expression: Site-directed mutagenesis and transfection were used to compare the contributions of individual alleles to gene transcription in two heterologous contexts: transformed human kidney cells (HEK cells) and bovine endothelial cells (BAECs). Results from these experiments clearly showed that the T allele of rs4290 increased ACE reporter gene expression regardless of cellular context or the other alleles present. There was no significant difference attributable to the minor alleles of rs7213516 or rs7214530. These results support the hypothesis that rs4290 is a functional allele affecting ACE transcription and likely accounts for a large portion of the in vivo AEI observed in heart tissues.
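One way to express reporter results as normalized luciferase expression relative to the wild-type construct (ACEwt) is sketched below. It assumes a dual-luciferase design in which firefly signal is divided by a co-transfected Renilla control; the text does not state the normalization scheme, so the control reporter, function names, and readings are hypothetical. Welch's t statistic is shown as one common way to compare a variant construct with ACEwt, not necessarily the test used here.

```python
from math import sqrt
from statistics import mean, variance

def normalized_ratios(firefly, renilla):
    """Per-well firefly/Renilla ratios (hypothetical dual-luciferase design)."""
    return [f / r for f, r in zip(firefly, renilla)]

def fold_over_wild_type(variant, wild_type):
    """Mean normalized expression of a construct relative to the ACEwt mean."""
    return mean(variant) / mean(wild_type)

def welch_t(a, b):
    """Welch's t statistic for unequal variances (positive when mean(a) > mean(b))."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Hypothetical triplicate wells for ACEwt and an rs4290 T construct.
wt = normalized_ratios([1000, 1100, 900], [500, 500, 500])
t_allele = normalized_ratios([2000, 2200, 1800], [500, 500, 500])
print(round(fold_over_wild_type(t_allele, wt), 2))  # 2.0
```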

[Figure: normalized luciferase expression relative to the wild-type allele for ACEwt and variant constructs carrying the rs4290 T, rs7214530 G, and rs7213516 A alleles singly or in combination, under DMSO (100 nM) and PMA (100 nM) treatment]

ACE reporter gene assay results for different upstream alleles at 24 h post-transfection of HEK cells. All constructs containing the rs4290 T allele showed significantly higher expression than the wild-type allele under PMA stimulation (all p<0.05). The results displayed here are representative of results at the 48 h time point and in BAEC-transfected cells. All experiments were done on three separate occasions; the results shown are from a single plate, and the overall trend and significance were similar on the other occasions.

Figure 4.3: Effects of ACE alleles on reporter gene expression

ACE alleles associate with cardiovascular clinical endpoints in INVEST clinical genetic association study: A clinical genotyping study in collaboration with INVEST investigators at the University of Florida found that the ACE polymorphisms associated with AEI in heart tissues (rs7213516, rs4290) were also associated with the primary clinical cardiovascular outcome (see Table 4.3). A breakdown of the analysis by phenotypic category indicates that these effects are largely driven by an association of rs7213516 (and of rs4290 to a lesser but still significant degree) with an additional nonfatal MI event after enrollment in individuals receiving beta-blocker (atenolol) and/or ACE inhibitor (trandolapril) therapy (see Table 4.4). Drug-genotype effects are

shown in Table 4.5. Lower allele frequencies for these polymorphisms in Hispanic-

American and white Caucasian-American subgroups made it more difficult to assess their

significance in these groups (allele frequencies shown in Table 4.6). Notably, these markers did reach significance in these subgroups for some phenotypes, and where they did not, they showed allelic trends consistent with the results in the African-American subgroup.

ACE enzyme activity and genotype: There was a slight overall trend for carriers of the minor alleles of rs4290 and rs7213516 to have higher serum ACE enzyme activity (rs4290 CT/TT mean 34.1 vs. CC mean 27.9 µmol hippuric acid/(min·L); rs7213516 GA/AA mean 32.8 vs. GG mean 29.3 µmol hippuric acid/(min·L)), though this trend was not significant for either SNP (rs4290 p<0.41; rs7213516 p<0.66). Given the wide standard deviations in both groups, trans-acting variation between samples likely had a large effect on serum activity levels.

Odds ratio (95% CI)

Polymorphism                  Overall             African Americans   Hispanics           Caucasians
rs7213516 A carriers vs. GG   3.40 (1.74-6.62)*   4.31 (1.45-12.85)*  2.90 (0.98-8.52)    na†
rs4290 T carriers vs. CC      2.18 (1.17-4.06)*   3.90 (1.41-10.76)*  1.72 (0.57-5.19)    0.70 (0.08-6.16)
I/D ID vs. DD                 0.81 (0.54-1.21)    0.86 (0.26-2.97)    2.38 (0.93-6.08)    0.59 (0.35-0.99)
I/D II vs. DD                 1.77 (1.08-2.91)*   2.61 (0.74-9.26)    2.83 (0.86-9.32)    1.23 (0.64-2.39)

* p value < 0.05; † OR is not tractable due to the low number of Caucasian A carriers.

*Adjusted for: age, sex, race/ethnicity, BMI, smoking, INVEST treatment strategy, previous myocardial infarction, previous stroke, heart failure, diabetes, renal insufficiency, baseline SBP, diuretic use, and ACE inhibitor use


Table 4.3: INVEST primary outcome odds ratios by ACE alleles.

Primary outcome
Group                     rs7213516  rs4290   rs4291   ALU I/D  rs4366
African-American          0.0097     0.0094   0.63     0.55     0.76
Hispanic-American         0.0535     0.3622   0.13     0.14     0.23
White Caucasian-American  0.0462     na       0.22     0.032    0.02
All samples combined      0.0006     0.029    0.35     0.003    0.20

Secondary outcome - All-cause mortality
Group                     rs7213516  rs4290   rs4291   ALU I/D  rs4366
African-American          0.17       0.12     0.16     0.87     0.85
Hispanic-American         na         0.53     0.46     0.41     0.23
White Caucasian-American  na         na       0.72     0.37     0.27
All samples combined      0.79       0.44     0.89     0.51     0.14

Secondary outcome - Nonfatal MI
Group                     rs7213516  rs4290   rs4291   ALU I/D  rs4366
African-American          0.086      0.0996   0.86     0.37     0.27
Hispanic-American         0.0014     0.16     0.71     0.11     0.089
White Caucasian-American  0.0089     na       0.08     0.034    0.20
All samples combined      <0.0001    0.056    0.19     0.012    0.17

Secondary outcome - Nonfatal Stroke
Group                     rs7213516  rs4290   rs4291   ALU I/D  rs4366
African-American          0.0875     0.33     0.78     0.13     0.82
Hispanic-American         0.17       0.15     0.37     0.46     0.85
White Caucasian-American  na         na       0.55     0.35     0.28
All samples combined      0.18       0.37     0.84     0.18     0.94

Primary outcome - Primary treatment strategy - Verapamil arm
Group                     rs7213516  rs4290   rs4291   ALU I/D  rs4366
African-American          0.20       0.054    0.36     0.60     0.98
Hispanic-American         0.17       0.18     0.93     0.09     0.83
White Caucasian-American  na         na       0.70     0.28     0.32
All samples combined      0.18       0.20     0.59     0.07     0.048


Table 4.4: Statistical association of ACE polymorphisms with clinical phenotypes in the INVEST cohort.

Table 4.4 continued

Primary outcome - Primary treatment strategy - Atenolol arm
Group                     rs7213516  rs4290   rs4291   ALU I/D  rs4366
African-American          0.07       0.68     0.33     0.69     0.92
Hispanic-American         0.13       0.74     0.57     0.39     0.54
White Caucasian-American  na         na       0.13     0.03     0.11
All samples combined      0.0013     0.088    0.14     0.01     0.48

Secondary outcome - Nonfatal MI - Verapamil arm
Group                     rs7213516  rs4290   rs4291   ALU I/D  rs4366
All samples combined      0.30       0.64     0.14     0.19     0.44

Secondary outcome - Nonfatal MI - Atenolol arm
Group                     rs7213516  rs4290   rs4291   ALU I/D  rs4366
All samples combined      <0.0001    0.0061   0.10     0.07     0.31

Primary outcome - Individuals treated with the ACE inhibitor trandolapril
Group                     rs7213516  rs4290   rs4291   ALU I/D  rs4366
African-American          0.07       0.02     0.35     0.79     0.81
Hispanic-American         0.0225     0.23     0.13     0.13     0.042
White Caucasian-American  0.29       na       0.72     0.16     0.14
All samples combined      0.002      0.025    0.69     0.058    0.04

P-values for odds ratios for the association of ACE polymorphisms with clinical phenotypes in the INVEST cohort. All associations have been adjusted using a model incorporating age, gender, race, history of HF and MI, ancestry informative markers, and DM status. P-values below 0.10 are indicated in italics. “na” indicates that insufficient numbers of samples were available for analysis in that group.

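ACEI  BB  N    HR*   95% CI (L-U)   P value
-     -   87   9.93  0.61-161.59    0.107
-     +   99   8.79  1.62-47.56     0.012
+     -   425  2.23  0.86-5.77      0.098
+     +   301  4.16  1.75-9.89      0.0012

ACEI, ACE inhibitor therapy; BB, beta-blocker therapy; +, with therapy; -, without therapy; L-U, lower and upper 95% confidence limits.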

*HR: hazard ratio adjusted for age, sex, race/ethnicity, BMI, smoking, INVEST treatment strategy, previous myocardial infarction, previous stroke, heart failure, diabetes, renal insufficiency, baseline SBP, diuretic use and ACE inhibitor use, in time-dependent variables.

Table 4.5: INVEST rs7213516 genotype effects were modified by drug therapy.

                                            Minor allele frequency
Polymorphism                  Minor allele  Overall  Caucasians  Hispanics  African Americans  P value*
rs7213516 (G>A)               A             3.37%    0.17%       4.33%      16.00%             <0.0001
rs4290 (C>T)                  T             3.60%    0.57%       4.21%      16.03%             <0.0001
rs4291 (A>T)                  T             37.12%   38.78%      34.68%     32.92%             0.11
indel (insertion/deletion)    insertion     41.16%   41.67%      39.62%     40.57%             0.38
rs4366 (22/33)                22**          47.91%   55.89%      49.79%     40.16%             0.0004

*P value for chi-squared tests or Fisher's exact test for the genotype frequencies by race/ethnicity, as appropriate.
**22 is the minor allele in Hispanics and African Americans; 33 is the minor allele in Caucasians.


Table 4.6: Allele frequencies for polymorphisms in the INVEST cohort.

[Figure: Kaplan-Meier curves of event-free survival versus time (months 0-70) for rs7213516 GG homozygotes and A carriers; log-rank P=0.011]

The Kaplan-Meier curve for rs7213516 GG homozygotes vs. A carriers shows significantly greater proportional mortality in A carriers in the overall INVEST cohort. This trend was observed in all subgroups and was statistically significant in African-Americans and Caucasian-Americans (data not shown).

Figure 4.4: Kaplan-Meier curve for rs7213516 GG homozygotes vs. A carriers in the INVEST cohort
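The survival comparison in Figure 4.4 can be illustrated with a product-limit (Kaplan-Meier) estimator and an unweighted two-group log-rank test. This is a generic sketch, not the INVEST analysis code: inputs are assumed to be per-subject follow-up times and event indicators, and ties follow the usual convention that subjects censored at an event time remain at risk at that time. The covariate-adjusted hazard ratios in Table 4.5 would require a Cox model, which is not shown.

```python
import math

def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up time per subject (e.g., months); events: 1 if the endpoint
    occurred at that time, 0 if censored. Returns (time, survival) at event times.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Group all subjects sharing this time point.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= deaths + censored
    return curve

def log_rank(times_a, events_a, times_b, events_b):
    """Two-group (unweighted) log-rank test; returns (chi-square, p) with 1 df."""
    event_times = sorted({t for t, e in zip(times_a, events_a) if e}
                         | {t for t, e in zip(times_b, events_b) if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if e and x == t)
        d_b = sum(1 for x, e in zip(times_b, events_b) if e and x == t)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero")
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of chi-square with 1 df: P(Z^2 > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

For example, `kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])` steps down to survival estimates of about 0.8, 0.6, and 0.3 at times 1, 2, and 3.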

Association study in OSUP database samples: ACE SNPs (rs4290, rs7213516) were not significantly associated with a composite primary endpoint, family history of CAD, MI before or after enrollment, CVA/TIA event, or lipid, hypertension, or diabetes mellitus status among 117 African-American samples in the OSUP database (all p>0.05). Notably, the minor allele frequencies (MAF) for both SNPs were higher in African-American patients who had an MI after enrollment (rs7213516: 9 percentage points higher; rs4290: 7 percentage points higher) or a CVA (primarily before enrollment) (rs7213516: 7 percentage points higher; rs4290: 9 percentage points higher) than in those who did not. These results are consistent with the trends observed in the University of Florida INVEST clinical genotyping study. Given the relatively small numbers of events and samples, these differences were not statistically significant (see Table 4.7).
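The minor allele frequencies in Table 4.7 follow directly from the genotype counts, counting two alleles per individual. A minimal sketch (the function name is illustrative) reproduces the MI-after-enrollment values; the 9- and 7-point MAF differences quoted above are differences between the rounded table values (0.23 vs. 0.14 and 0.20 vs. 0.13).

```python
def minor_allele_frequency(hom_major, het, hom_minor):
    """MAF from genotype counts: each person carries two alleles."""
    total_alleles = 2 * (hom_major + het + hom_minor)
    return (het + 2 * hom_minor) / total_alleles

# Genotype counts from Table 4.7 (MI event after enrollment, OSUP African-American samples).
counts = {
    ("rs7213516", "MI-"): (68, 25, 1),
    ("rs7213516", "MI+"): (14, 6, 2),
    ("rs4290", "MI-"): (71, 21, 2),
    ("rs4290", "MI+"): (16, 5, 2),
}
for (snp, group), genotypes in counts.items():
    print(snp, group, round(minor_allele_frequency(*genotypes), 2))
```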

CpG island methylation of ACE shows tissue-specific differences but does not correlate with AEI: Results from a small group of DNA samples across tissues indicate

no detectable methylation at a SmaI site in exon 1 of ACE, but show methylation at a

HhaI site 563 bp upstream of the transcription start site. Methylation at the HhaI site

differed between tissues: all four small bowel DNAs showed no detectable methylation,

four heart DNAs showed 3%, 11%, 17%, and 17% methylation, respectively, and four

liver DNAs showed 2%, 8%, 12%, and 27% methylation, respectively. Within each

group of four DNAs, two were selected because they displayed significant AEI in those tissues and two were selected because they did not. There was no apparent relationship between percent methylation at the HhaI site and AEI results for the samples surveyed, suggesting that allelic methylation is not likely to account for or mask AEI for ACE in these tissues. Notably, the small bowel DNA samples were not detectably methylated in this upstream region near known transcriptional regulation sites. Interestingly, I observed via RTPCR that overall ACE mRNA expression was highest in small bowel among the surveyed tissues, consistent with a report from another group (Harmer et al., 2002). In combination, these results suggest that methylation status of the ACE promoter may be an important regulator of gene expression and may influence tissue-specific differences in expression, though additional experiments would be required to evaluate this hypothesis.

                        rs7213516                      rs4290
Phenotype               GG   GA   AA   MAF    p-value  CC   CT   TT   MAF    p-value
Post-enrollment phenotypes
Endpoint (-)            31   14   0    0.16   0.53     34   11   0    0.12   0.28
Endpoint (+)            51   17   3    0.16            53   15   4    0.16
MI event (-)            68   25   1    0.14   0.13     71   21   2    0.13   0.20
MI event (+)            14   6    2    0.23            16   5    2    0.20
Pre-enrollment phenotypes
Previous MI (-)         44   18   1    0.16   0.56     47   15   1    0.14   0.38
Previous MI (+)         38   13   2    0.16            40   11   3    0.16
CVA event (-)           66   27   0    0.15   0.13     71   22   1    0.13   0.20
CVA event (+)           16   4    3    0.22            16   4    3    0.22
Hypertension (-)        7    3    0    0.15   0.60     7    3    0    0.18   0.58
Hypertension (+)        75   28   3    0.16            80   23   4    0.17
Family CAD (-)          34   16   1    0.18   0.33     36   13   2    0.17   0.26
Family CAD (+)          48   15   2    0.15            51   13   2    0.13
High lipids (-)         14   11   0    0.25   0.14     15   10   0    0.22   0.16
High lipids (+)         68   20   3    0.15            72   16   4    0.14
Diabetes mellitus (-)   40   18   2    0.18   0.20     42   15   3    0.18   0.13
Diabetes mellitus (+)   42   13   1    0.13            45   11   1    0.11

Table 4.7: ACE clinical genotyping results from a CAD patient cohort of 117 African-Americans at OSU hospitals.

4.3 Summary

We found major cis-acting effects on ACE mRNA expression in primary human heart tissue that are attributable to upstream alleles common in individuals of African-American ancestry (see Fig. 4.2). The use of a quantitative allelic expression imbalance assay in primary tissues was critical to the discovery of these functional alleles. In vitro reporter gene assays in multiple cellular contexts confirm that these alleles (the rs4290 T allele and perhaps the rs7213516 A allele) result in a functional increase in ACE expression

(see Fig. 4.3). Because the alleles were discovered in a screen of heart failure patient tissues for a number of candidate genes (see also Chapter 3), we suspected they might play a functional role in the development of cardiovascular disease.

In the initial analysis of a genetic association study in a diverse clinical trial cohort (INVEST), we found that the alleles identified in our initial screen (rs4290 T allele; rs7213516 A allele) were highly associated with cardiovascular disease outcomes (p<0.001) (see Tables 4.3-4.5). Given that these ACE alleles were found at relatively high frequency, particularly in the African-American populations in Florida and Ohio as well as in a Nigerian population (Yoruba in the HapMap project), they may partially account for many previous reports of phenotypic variation in ACE levels and blood pressure (e.g., Zhu et al., 2001) and in response to ACE inhibitors (Exner et al., 2001; Sehgal 2004; McDowell et al., 2006) among individuals of African origin. Importantly, although the alleles we characterized are found at lower frequency in the Hispanic- and Caucasian-Americans surveyed, clinical associations remained largely significant within the overall group, suggesting that these alleles may be clinically relevant in a wider human population.

ACE alleles (rs7213516 A allele; rs4290 T allele) were most strongly associated with an additional MI event, particularly in individuals receiving ACE inhibitor and/or beta-blocker therapy (see Table 4.5). This pharmacogenetic interaction is plausible given that these antihypertensives (trandolapril and atenolol) target overlapping systems of blood pressure control in which ACE is a critical component, whereas verapamil targets a distinct pathway (Katzung 2004). Further investigation and replication are warranted before we can determine whether these alleles have clinical utility in selecting therapeutic options for particular individuals. The association with MI and the allelic expression results in heart tissue are also of particular interest given that many studies have shown that inhibiting ACE reverses cardiac remodeling and hypertrophy. Although

we have not provided a direct molecular explanation or detailed investigation of the

tissue specificity of the effects described here, our results suggest that these alleles should

be the subject of wider clinical study. An attempt to replicate the INVEST clinical genetic association in a CAD population at the Ohio State University showed similar allelic trends with regard to MI events, though the observed differences did not reach statistical significance in our analysis. However, this population had a slightly smaller number of cases, giving lower statistical power, and its enrollment criteria and study design differed substantially from those of the INVEST genetic substudy, which was specifically designed for case-control analysis. We have provided evidence for functional alleles in vitro, in primary human tissue, and in a clinical genetic study. Given that ACE has physiological roles of wide importance, including blood pressure regulation, kidney function, processing of kinins and other peptides, and degradation of amyloid-beta protein (Hemming & Selkoe 2005), we suggest that the alleles we have characterized should be the subject of substantial additional study in a variety of human diseases.

In the human tissues we studied, we found little evidence for a functional allelic effect on expression of the often-studied ACE insertion/deletion polymorphism in intron 16. Our findings are consistent with in vitro molecular studies of this polymorphism by others (Rosatto et al., 1999; Lei et al., 2005). Remarkably, thousands of genetic association studies have been conducted using this polymorphism despite a lack of strong evidence for a functional mechanism, with many studies yielding no association or associations that are small in magnitude compared with those we report here (Sayed-Tabatabaei et al., 2006). The approach of our study and the resulting significant discovery highlight two major problems currently facing human genetics research: 1) a failure to enroll and study diverse human sample populations, and 2) a reliance on statistical genetic associations alone without further investigation of functional mechanisms.


CHAPTER 5

Human CCL2 (chemokine (C-C motif) ligand 2)

Since the discovery of chemokines (Yoshimura et al., 1987), numerous human genes participating in immune cell chemotaxis have been identified and characterized.

Chemokines and their receptors are recognized as key mediators in the development of

atherosclerotic plaques (Tedgui & Mallat 2006), and as entry points for HIV into host cells during infection (Lederman et al., 2006). Given their widespread influence on inflammation, chemokines have been attributed further roles in many human disease pathologies, including arthritis (Muller-Ladner et al., 2005), gout (Cronstein &

Terkeltaub 2006), kidney disease (Segerer & Nelson 2005; Galkina & Ley 2006; Panzer et al., 2006), lupus nephritis (Tucci et al., 2005), multiple sclerosis (Sorensen 2004;

Bruck 2005; Ubogu et al., 2006), brain responses to pain and inflammation (Minami et

al., 2006; Rittner & Brack 2006), cancer growth, and metastasis (Gomperts & Strieter

2006; Rollins 2006; Zlotnik 2006). Chemokine modulation is also suggested to influence

lung diseases including (Murray et al., 2006; Schaller et al., 2006), COPD (Papi

et al., 2006), tuberculosis (Rivas-Santiago et al., 2005), sarcoidosis (Gurrieri et al., 2005),

idiopathic pulmonary fibrosis (Antoniou et al., 2005; Agostini et al., 2006), skin disorders

including atopic infection (Albanesi et al., 2005; Allam & Novak 2006; Homey et al.,

2006) and (Gaspari 2006), sepsis (Alves-Filho et al., 2006; Kobayashi et al.,

2006), viral infections (Rosenkilde & Kledal 2006), and transplantation outcomes

(Dickinson & Charron 2005; Merani et al., 2006). Because of the importance of

chemokines in these diverse pathologies, targeting of chemokine systems by

pharmacological intervention is the focus of a number of clinical trials.

Chemokines and their receptors represent potential candidate genes involved in

multifactorial disease susceptibility and progression. Among thousands of studies on

chemokine polymorphisms, many focus on polymorphisms of the gene encoding

Monocyte Chemoattractant Protein-1/C-C Chemokine Ligand 2 (MCP-1, also known as

CCL2, 17q11.2-q12). This chemokine plays a critical physiological role in gradient

signaling to attract monocytes and macrophages to tissue sites. Expression of CCL2 is a

marker in many human disease conditions, and the protein is currently being targeted in

therapeutic trials. A SNP at position -2578 in a distal enhancer region of CCL2

putatively affects transcription (Rovin et al., 1999), but conclusive biochemical data on

an underlying mechanism remain lacking. Nonetheless, this regulatory SNP has been

studied in more than 80 clinical populations for a variety of disorders. Recently, an

association was reported between CCL2 genotype for this SNP and myocardial infarction

(MI) in a large cohort within the Framingham Study (McDermott et al., 2005). However,

a subsequent study in a large Japanese population found genotype associations with

serum CCL2 levels but not with MI, suggesting there are other contributing allele

differences between populations (Iwai et al., 2006). Despite significant disease associations with CCL2 expression, it remains unclear to what extent the observed inter-individual variability is due to trans-acting disease factors or to functional genetic differences between individuals.

Another cytokine, macrophage colony-stimulating factor, also known as colony-stimulating factor-1 (CSF1, 1p21-p13), localizes to atherosclerotic plaques along with

CCL2. CSF1 shows cell-type specificity in controlling the differentiation of monocytes

to macrophages (Rohrschneider et al., 1997; Kelley et al., 1999) and increases the

expression of CCL2 in monocytes (Shyy et al., 1993) suggesting it is an important in vivo

regulator of mononuclear phagocyte chemotaxis. Recent studies show associations of

CSF1 serum levels with aortic calcification in hemodialysis patients (Nitta et al., 2001;

Kihara et al., 2005), as well as development of carotid plaques and increased intima-

media thickness (Haraguchi et al., 2006), suggesting a potential contribution to individual

differences in the risk of vascular disease. However, this cytokine has been the subject of only one previous genetic report, which implicated three CSF1 SNPs in aggressive periodontitis (Rabello et al., 2006).

Here we report the relative contribution of cis- versus trans-factors to CCL2

expression in different human tissue types, testing specifically the contribution of the

putative regulatory -2578 SNP. We further provide the first evidence that CSF1 exhibits

functional cis-acting differences at the mRNA level. To address both questions we

employed analysis of allelic mRNA expression imbalance (AEI), a method that reveals

the relative contribution of cis-acting factors to differences in expression while

effectively controlling for trans-acting factors (Johnson et al., 2005; Wang et al., 2005;

Zhang et al., 2005; Pinsonneault et al., 2006). Measuring allelic mRNA expression

ratios, we detected AEI for CCL2 mRNA expression in human monocyte-derived macrophages (MDMs) and in cardiac tissues from left ventricles of heart transplant

patients. Because regulatory polymorphisms can act in a context-dependent manner, we also measured CCL2 and CSF1 expression by both RTPCR and AEI after stimulation of

MDMs with CSF1. Our results demonstrate the presence of cis-regulatory factors affecting expression of both genes, but indicate that the commonly studied CCL2 -2578 SNP is unlikely to be the source of cis-regulatory differences in CCL2 expression. Moreover, a

meta-analysis of clinical genetic studies of the -2578 SNP with varying outcomes

weakens the argument that the -2578 SNP is functional. Our results in relevant human

tissues demonstrate that the cytokines CCL2 and CSF1 are subject to cis-acting

differences in expression that require further study.

5.1 Method of investigation of human CCL2

Sample preparation of human monocyte derived macrophages (MDM):

Mononuclear cells were isolated from 55 independent buffy coat samples obtained from the American Red Cross (Columbus, OH) via the clumping method as previously described (Eubank et al., 2003). The monocytes were then collected, washed with RPMI medium, and resuspended in RPMI 1640 supplemented with 10% FBS. Cells were plated at 3.5 × 10⁶ cells/well in a 12-well tissue culture plate and incubated at 37°C, 5% CO2 for

60 minutes to allow the monocytes to bind to the plate. All non-adherent cells were

removed by washing the plate with RPMI 1640, and fresh medium was added to the

adherent monocytes supplemented with 5% FBS and 20 ng/ml CSF1. Fresh CSF1 was

added every 2 days, and the cells were incubated at 37°C, 5% CO2 for 5-6 days to allow

differentiation. Cells were analyzed via flow cytometry and were shown to be 96.0 ±

2.6% (average ± SEM) positive for both CD14-PE-Cy7 and Mannose Receptor-FITC.

All cells were washed with RPMI 1640 and subsequently serum starved overnight at

37°C prior to stimulation with CSF1. Where indicated, cells were stimulated with 100 ng/ml CSF1 and incubated at 37°C, 5% CO2 for 48-72 hours. For MDMs, vacuum

centrifugation was applied to concentrate isolated RNA before cDNA synthesis.

Sample preparation of human heart tissue: Tissue was harvested from the left

ventricular region of failed hearts from 55 patients receiving transplants under an IRB

approved protocol. Tissue was snap frozen and pieces pulverized over dry ice. Two

independent RNA preparations were made for each sample using Trizol and chloroform

extraction. RNA was treated with DNase I (Invitrogen) for 30 minutes on a purification

column (Qiagen). Genomic DNA for both tissues was isolated using a salting out

procedure described elsewhere (Miller et al., 1988).

Sample genotyping: A diagram of the CCL2 gene with the relative positions of

markers from this study is shown in Figure 5.1. A multiplexed SNaPshot™ assay was

developed to genotype three CCL2 SNPs: rs2857657 in intron 1, rs4586 in exon 2, and

rs13900 in the 3’UTR. Briefly, outside primers flanking the SNPs (Table 5.1) were

multiplexed with ~30ng genomic DNA in a standard 30 cycle PCR reaction. Products

were treated with bacterial alkaline phosphatase (BAP) and exonuclease I (Exo-I) for 3 h at 37°C before undergoing a single-base extension reaction at 55°C with multiplexed SNP-specific extension primers (Table 5.1). Calf intestinal phosphatase (CIP) treatment was applied for 3 h to dephosphorylate unincorporated fluorescent ddNTPs, and the extension primers with incorporated fluorescent ddNTPs were denatured with formamide/LIZ dye standard and run on an ABI3730 capillary electrophoresis instrument (Applied Biosystems, Inc.). Peak calling and analysis were done with GeneMapper software, version 3.7. An alternative method was optimized to genotype rs1024611 (elsewhere referred to as -2578A>G and -2518A>G) upstream of CCL2 and rs333970 in exon 6 of CSF1, using GC-clamp primer sets (Table 5.1) (Papp et al., 2003). Primers were mixed with ~30 ng genomic DNA and SYBR green mix and cycled on an ABI7000 sequence detection system, with melting curve dissociation analysis. The alleles were distinguishable as peaks in the melting curve, and the assays were confirmed using CEPH Coriell sample DNAs of known genotype based on HapMap project results. For rs1024611, CEPH samples of genotypes AA (n=26), GA (n=18), and GG (n=2) were blindly compared to HapMap results with 100% concordance. For rs333970, separate wells were necessary to call the C and A alleles for each DNA.

[Figure: CCL2 gene structure (scale bars 1 kbp and 2.6 kbp) with marker positions rs1024611 (A>G, -2578), rs2857656 (C>G), rs4586 (C>T), and rs13900 (C>T)]

Figure 5.1: Human CCL2 gene structure and relevant genetic polymorphisms (chromosome 17q12)

Primer description                Primer sequence
rs4586 forward                    ATGCAATCAATGCCCCAGTC
rs4586 reverse                    GCGAGCCTCTGCACTGAGAT
rs4586 extension                  AGATCTTCCTATTGGTGAAGTTATA
rs13900 forward                   CAACCCAAGAATCTGCAGCTAA
rs13900 reverse                   GGCATAATGTTTCACATCAACAAAC
rs13900 extension                 TAGCTTTCCCCAGACACC
rs2857657 forward                 GGCAGAAGCACTGGGATTTAAT
rs2857657 reverse                 TCTGCACTGAGATCTTCCTATTGG
rs2857657 extension               AAGAGTCATGAGGAAAAAGCAAAAGGCAGGCAGGAGAAGA
rs333970 forward                  TTCCTCTCAGCATCTTCTCCAC
rs333970 reverse                  GGGCAGATGGATGGTCTGTC
rs333970 extension                GCCGGCAGATGTAACTGGTAC
rs1024611 allele-specific fwd.    GCCAGCACTGACCTCCC
rs1024611 A allele-specific       AAAGAAAGTCTTCTGGAAAGTCAT
rs1024611 G allele-specific       CGGGCGGCGGCGAAAGAAAGTCTTCTGGAAAGTTAC
rs333970 allele-specific reverse  GCAGTGATGAGATCCTGGGAG
rs333970 A allele-specific        GCCGGCAGATGTAACTGGCACA
rs333970 C allele-specific        GCGCGGCCCGGCGGCGCCGGCAGATGTAACTGGAACC
CCL2 exon 2 forward               AGAATCACCAGCAGCAAGTGTC
CCL2 exon 2/3 reverse             CAATGGTCTTGAAGATCACAGCTT
CSF1 exon 5 forward               TGCGTCCGAACTTTCTATGAGA
CSF1 exon 5/6 reverse             AGGCTTGGTCACCACATCTTG

Table 5.1: Oligonucleotides used in the chemokine study with matches to specific alleles, intentional mismatches (GC clamps), and exon-spanning dinucleotides underlined.

This two-well assay for rs333970 was verified by SNaPshot assay and by concordance with three HapMap samples representing genotypes CC, CA, and AA.
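Genotype calling from the GC-clamp melting curves, and the concordance check against HapMap genotypes, can be sketched as follows. The sketch assumes that the GC-clamped allele-specific primer (G for rs1024611 and C for rs333970, per Table 5.1) yields a product with a higher melting temperature than the unclamped primer, so a single temperature threshold separates the two peaks. The threshold, peak temperatures, and function names are hypothetical; for rs333970 the peaks from its two wells would first be pooled into one list.

```python
def call_genotype(peak_temps_c, low_tm_allele, high_tm_allele, threshold_c):
    """Call a biallelic genotype from melt-curve peak temperatures (deg C).

    Peaks below threshold_c are attributed to the unclamped allele's product,
    peaks at or above it to the GC-clamped allele's product.
    Returns None when no peak was detected (no call).
    """
    has_low = any(t < threshold_c for t in peak_temps_c)
    has_high = any(t >= threshold_c for t in peak_temps_c)
    if has_low and has_high:
        return low_tm_allele + high_tm_allele
    if has_low:
        return low_tm_allele * 2
    if has_high:
        return high_tm_allele * 2
    return None

def concordance(calls, reference):
    """Fraction of called samples matching reference genotypes (allele order ignored)."""
    pairs = [(c, r) for c, r in zip(calls, reference) if c is not None]
    if not pairs:
        raise ValueError("no genotype calls to compare")
    matches = sum(sorted(c) == sorted(r) for c, r in pairs)
    return matches / len(pairs)

# rs1024611: hypothetical peaks; the G allele-specific primer carries the GC clamp.
calls = [call_genotype(p, "A", "G", threshold_c=82.0)
         for p in ([78.6], [78.4, 84.9], [85.1])]
print(calls, concordance(calls, ["AA", "GA", "GG"]))
```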

Gene-specific cDNA synthesis: RNAs were analyzed for yield and quality by spectrophotometry (NanoDrop) and Agilent Bioanalyzer analysis. cDNAs were synthesized in duplicate from 1 µg RNA each using oligo dT and gene-specific primers in reverse orientation (targeting a sequence just downstream of the marker SNPs), and SuperScript II reverse transcriptase (SSRTII, Invitrogen). Products were quantitated through RTPCR using SYBR green on an ABI7000 (Applied Biosystems). β-Actin expression was measured on the same plate as a reference gene for relative quantitation. Negative controls (without SSRTII) and products were verified by gel electrophoresis. Primer sets for CCL2 and CSF1 mRNA quantitation included one primer that spanned two exons, ensuring measurement of only the specific cDNA of interest (Table 5.1).
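Relative quantitation against a same-plate reference gene such as β-actin is commonly done with the comparative Ct (2^-ΔΔCt) method. The text does not name its calculation, so the sketch below assumes this method and roughly 100% amplification efficiency for both amplicons. The calibrator (for example, an unstimulated MDM sample) and the Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Comparative Ct (2^-ddCt) fold change of a target vs. a calibrator sample.

    ct_target / ct_ref: threshold cycles of the target gene (e.g., CCL2) and of
    the beta-actin reference in the sample of interest; *_cal: the same for the
    calibrator sample. Assumes ~100% efficiency (doubling per cycle).
    """
    delta_sample = ct_target - ct_ref
    delta_calibrator = ct_target_cal - ct_ref_cal
    return 2.0 ** -(delta_sample - delta_calibrator)

# Hypothetical: CCL2 reaches threshold 2 cycles earlier (relative to beta-actin)
# in a CSF1-stimulated sample than in the unstimulated calibrator.
print(relative_expression(24.0, 18.0, 26.0, 18.0))  # 4.0
```

If the two amplicons differ markedly in efficiency, an efficiency-corrected model would be needed instead of a base of 2.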

Allelic mRNA expression assay: The method involves PCR amplification and

analysis of allelic ratios for genomic DNA and cDNA, using a marker SNP in the

transcribed region of the gene. Marker SNPs within the mRNAs were selected and

genotyped in all samples: in CCL2 exon 2 (rs4586) and 3’UTR (rs13900), and exon 6 of

CSF1 (rs333970). Allelic ratios in the PCR amplicons were determined with a primer

extension method (SNaPshot, Applied Biosystems, Inc., Foster City, CA), using the

protocol described above except that all reactions were singleplex, and reactions were run

for both genomic DNA and synthesized cDNA. Allelic ratios were calculated as the

heterozygous cDNA peak ratios (GeneMapper analysis), normalized against the average

heterozygous DNA peak ratio among the population of genomic DNA samples on the

plate. The normalized allelic ratio in genomic DNA (mean ± SD) was 1.0 ± 0.13 for rs4586 (n=53 assays) and 1.0 ± 0.11 for rs13900 (n=50) in heart tissue, and 1.0 ± 0.04 for rs4586 (n=73), 1.0 ± 0.05 for rs13900 (n=66) and 1.0 ± 0.04 for rs333970 (n=36) in MDMs. None of the allelic ratios of genomic DNA deviated significantly from the mean value, as expected if the two alleles are present in equal number in the target tissue (i.e., no chromosomal aberrations or copy number variations were detected in these two autosomal genes).
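The normalization step described above divides cDNA peak ratios by the plate-average genomic DNA ratio. It can be sketched as follows. Averaging replicate ratios on the log2 scale, rather than averaging raw ratios first, is an assumption of this sketch. The function names are hypothetical; in practice the peak ratios would come from GeneMapper output.

```python
import math
from statistics import fmean, stdev


def normalized_log2_aei(cdna_ratios, plate_gdna_ratios):
    """Mean log2 allelic expression ratio for one heterozygous sample.

    cdna_ratios: allele1/allele2 peak ratios from this sample's cDNA
        (e.g. replicate cDNA syntheses and/or marker assays).
    plate_gdna_ratios: heterozygous genomic DNA peak ratios from all samples
        on the same plate; their mean corrects for unequal peak heights.
    """
    dna_mean = fmean(plate_gdna_ratios)
    logs = [math.log2(r / dna_mean) for r in cdna_ratios]
    return fmean(logs)


def normalized_gdna_summary(plate_gdna_ratios):
    """Mean and SD of genomic DNA ratios after normalizing to their own mean."""
    dna_mean = fmean(plate_gdna_ratios)
    normalized = [r / dna_mean for r in plate_gdna_ratios]
    return fmean(normalized), stdev(normalized)


# Hypothetical peak ratios: cDNA ratio 1.8 against a plate DNA mean of 0.9
# corresponds to a twofold imbalance (log2 ratio of about 1.0).
print(normalized_log2_aei([1.8, 1.8], [0.8, 0.9, 1.0, 0.9]))
```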

CCL2 -2578 genotype meta-analysis: PubMed was searched with the terms MCP-1, CCL2, allele, polymorphism, haplotype, promoter, 2578, 2518 and SNP. All abstracts were reviewed. If there was an indication of genotyping, population information and other details were collected, and the references were examined for additional articles reporting genotypes. In this manner, 80 reports of genotyping of this SNP were identified as of December 2006. In some cases, authors were contacted to obtain additional population data not included in the original publications. All reports were included in the analysis of allele frequency, but only reports with a case-control study design were included in the odds ratio analysis. We followed published guidelines for meta-analysis of observational studies (Stroup et al., 2000). Statistical analysis of all data was done with SPSS (version 13).
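The code below is a minimal sketch of the per-study calculation underlying the odds ratio analysis. It converts genotype counts to allele counts and computes an allelic odds ratio with a Woolf (log-scale) 95% confidence interval. The study used SPSS, so this is not the exact procedure. The Haldane-Anscombe correction for empty cells, the function names and the example counts are assumptions for illustration.

```python
import math


def allele_counts(n_aa, n_ga, n_gg):
    """Convert genotype counts to (A allele count, G allele count)."""
    return 2 * n_aa + n_ga, 2 * n_gg + n_ga


def allelic_odds_ratio(case_a, case_g, control_a, control_g, z=1.96):
    """Odds ratio for the A allele relative to G, with a Woolf (log-scale) CI."""
    cells = [case_a, case_g, control_a, control_g]
    if 0 in cells:
        # Haldane-Anscombe correction keeps the log odds ratio finite.
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical case and control genotype counts (AA, GA, GG).
cases = allele_counts(30, 50, 20)
controls = allele_counts(40, 45, 15)
or_a, low, high = allelic_odds_ratio(*cases, *controls)
print(f"OR(A vs G) = {or_a:.2f}, 95% CI {low:.2f}-{high:.2f}")
```

The odds ratio here is for the A allele relative to G. Its reciprocal, with the confidence limits inverted and swapped, expresses the same comparison for G relative to A.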

5.2 Results of investigation of human CCL2

Expression of CCL2 is upregulated by CSF1 treatment: We observed robust expression of both CCL2 and CSF1 in untreated MDMs, with average RTPCR cycle threshold (CT) values of 17.9 ± 1.5 (1SD) and 21.9 ± 2.5 (1SD), respectively. Treatment with CSF1 (100 ng/ml) strongly increased CCL2 mRNA expression (see Fig. 5.2; average CT 14.7 ± 1.2 (1SD)), consistent with other reports (Shyy et al., 1993; C. Baran, unpublished results). In contrast with the enhanced CCL2 gene expression after CSF1 stimulation, there was no increase in CSF1 gene expression (see Fig. 5.2).

CCL2 and CSF1 display allelic mRNA expression imbalance (AEI): Genotyping

identified heterozygous samples for the marker SNPs used for AEI analysis in a majority of the heart tissue and MDM cell samples. Two marker SNPs (rs4586, rs13900)

measured AEI for CCL2 in both heart and MDM samples, while one marker SNP

(rs333970) measured CSF1 AEI in MDMs. AEI results for CCL2 in 32 heart samples are

displayed in Figure 5.3 on a log base 2 scale, with the bi-directionality of the bars

indicating that AEI ratios are not directly linked to either marker SNP, but rather to other

functional polymorphisms not in complete LD with the marker SNPs.

Heterozygous carriers of the putative regulatory -2578 SNP (rs1024611) did not display AEI in many cases. Moreover, one sample displaying substantial AEI was homozygous for rs1024611 (AA), arguing against a major functional role for this SNP in CCL2 mRNA expression.

RTPCR results showing the effects of CSF1 (100 ng/ml) stimulation of MDMs on mRNA expression of CCL2 and CSF1. Expression levels are calculated relative to human β-actin with the 2^(-ΔΔCT) method. Both genes are well expressed in MDMs. CCL2 mRNA expression is potently upregulated with CSF1 stimulation (paired t-test; p<0.001), while CSF1 mRNA expression showed a modest decrease with CSF1 stimulation (paired t-test; p<0.001).

Figure 5.2: CSF1 stimulation potently upregulates CCL2 expression in human MDMs

[Plot: CCL2 ratios in heterozygous heart tissues. Y-axis: AEI ratio (log base 2 scale), -1.2 to 0.6. Sample groups: rs1024611 AA, rs1024611 GA.]

Imbalances of CCL2 expression in heart tissue between chromosomes of individuals heterozygous for one or both of the marker SNPs, rs4586 (G/A) and rs13900 (C/T). Bars represent the mean of multiple cDNA ratios between major and minor allele fluorescent peaks. Ratios are normalized to the average genomic DNA ratio of all samples (assumed 1:1) and displayed on a log2 scale. Brackets above indicate the samples grouped by AA and GA genotype for rs1024611. Error bars indicate 1SD between all measurements within samples (2 RNA isolates × 1 or 2 marker assays, depending on rs4586/rs13900 genotypes). Samples whose cDNA results varied widely were excluded (defined as variability greater than 1SD between all AEI results, n=4).

Figure 5.3: CCL2 allelic expression imbalance results in human heart tissues segregated by rs1024611 genotype

In both heart tissue and MDMs, the use of two marker SNPs for CCL2 allowed internal comparison of AEI results from two independent assays of the same measurement (22 of 36 heart samples and 23 of 37 MDM samples were heterozygous for both markers). In compound heterozygous MDM samples, results from the two AEI assays were significantly correlated (Pearson r² = 0.93). The correlation between marker results was lower, but still significant, in heart samples (r² = 0.87) and CSF1-treated MDMs (r² = 0.71). AEI results for CCL2 in treated and untreated MDMs (see Fig. 5.4) are consistent with those in the heart. Not all heterozygous carriers of rs1024611 display detectable AEI, while at least one sample displaying nearly twofold AEI is homozygous (AA) for rs1024611.
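Agreement between the two CCL2 marker assays in compound heterozygotes can be quantified as a squared Pearson correlation of their log2 AEI values, as in the sketch below. The `same_phase` argument is a hypothetical input. If the reference allele of one marker lies on the opposite chromosome in some individuals, that marker's log2 ratio must be sign-flipped before correlating. The text does not specify how phase was handled, so this step is an assumption.

```python
import math
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def marker_agreement_r2(log2_aei_marker1, log2_aei_marker2, same_phase=None):
    """Squared Pearson correlation between two AEI assays of the same samples.

    same_phase (hypothetical): per-sample flags; False means the reference
    allele of marker 2 is on the other chromosome, so its log2 ratio is negated.
    """
    if same_phase is None:
        same_phase = [True] * len(log2_aei_marker2)
    aligned = [v if ok else -v for v, ok in zip(log2_aei_marker2, same_phase)]
    return pearson_r(log2_aei_marker1, aligned) ** 2


# Hypothetical log2 AEI values from the two marker assays in four samples.
print(marker_agreement_r2([0.1, 0.5, -0.3, 0.8], [0.12, -0.48, -0.25, -0.79],
                          same_phase=[True, False, True, False]))
```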

AEI results for CSF1 indicate an expression imbalance of greater magnitude, mostly in a single direction (see Fig. 5.5). Comparing CCL2 and CSF1 AEI ratios between samples in CSF1-treated and untreated MDMs showed no significant differences (paired t-tests: p<0.21 and p<0.31, respectively). However, several individual samples did display a marked difference in allelic expression of CCL2 and CSF1 between conditions. This indicates that some individuals likely carry a cis-acting difference between constitutive and induced gene expression (see Figs. 5.4, 5.5).

Genotype analysis of CCL2 and CSF1 expression: Genotype results for heart and

MDM samples are displayed in Table 5.2. We tested all variants for association with

gene expression (RTPCR) results in heart samples, and untreated and CSF1-treated

MDMs using genotype (ANOVA) and co-dominant and recessive inheritance models for

each variant. The results revealed no significant associations between expression levels

[Plot: CCL2 ratios in MDMs from heterozygous individuals. Y-axis: AEI ratio (log base 2 scale), -1.2 to 0.8. Sample groups: rs1024611 AA, rs1024611 GA.]

Allelic expression imbalances for CCL2 under untreated (white bars) and CSF1-treated conditions (grey bars) in individual MDM samples, displayed on a log2 scale. Brackets above indicate the samples grouped by AA and GA genotype for rs1024611. Error bars indicate 1SD between all measurements within samples (2 cDNA syntheses X 1 or 2 marker assays depending on rs4586/rs13900 genotypes). Samples whose cDNA results varied widely were excluded (defined as variability greater than 2SD between all AEI results, n=3).

Figure 5.4: CCL2 allelic expression imbalance results in human MDMs segregated by rs1024611 genotype

[Plot: CSF1 ratios in MDMs from heterozygous individuals. Y-axis: AEI ratio (log base 2 scale), -0.4 to 1.2.]

Allelic expression imbalances for CSF1 under untreated (white bars) and CSF1-treated conditions (grey bars) in individual MDM samples heterozygous for marker SNP rs333970 (C/A), displayed on a log2 scale. Error bars indicate the range of all measurements within samples (2 cDNA syntheses). Samples whose cDNA results varied widely were excluded (defined as variability greater than 2SD between all AEI results, n=3).

Figure 5.5: CSF1 allelic expression imbalance results in human MDMs


SNP genotype and RTPCR expression data (means by genotype, given in arbitrary units relative to the β-actin internal standard) for monocyte and heart failure samples. Genotype frequencies (%) are given in parentheses. Analysis of expression level for both genes by all alleles revealed no significant associations. Analysis was done by genotype as shown in the table (ANOVA) and also using recessive and co-dominant models (not shown). Baseline and induced levels for MDMs (not shown) also showed no significant associations.

Table 5.2: CCL2 and CSF1 genotype and total expression level results show no significant associations.

                     Monocyte      Fold induction   Fold induction   Heart failure   Heart tissue
RefSNP ID  Genotype  samples       of CCL2          of CSF1          samples         CCL2
                     (n=55)        expression       expression       (n=55)          expression
rs1024611  AA        23 (41.8)     6   p<0.73       0.7  p<0.75      24 (43.6)       9   p<0.35
           GA        31 (56.4)     7                0.7              27 (49.1)       21
           GG        1 (1.8)       3                1.0              4 (7.3)         3
rs2857657  GG        38 (69.1)     7   p<0.50       0.7  p<0.78      42 (76.4)       11  p<0.24
           GC        15 (27.3)     8                0.8              10 (18.2)       30
           CC        2 (3.6)       2                0.8              3 (5.5)         5
rs4586     AA        13 (23.6)     6   p<0.80       0.7  p<0.68      15 (27.3)       9   p<0.48
           GA        36 (65.5)     7                0.7              31 (56.4)       19
           GG        6 (10.9)      7                0.6              9 (16.4)        7
rs13900    CC        22 (40.0)     6   p<0.73       0.7  p<0.78      24 (43.6)       9   p<0.23
           CT        32 (58.2)     7                0.7              26 (47.3)       21
           TT        1 (1.8)       3                1.0              5 (9.1)         3
rs333970   AA        25 (45.5)     8   p<0.36       0.7  p<0.81
           CA        21 (38.2)     6                0.7
           CC        9 (16.4)      5                0.7

Table 5.2.

and any of the genotyped variants, regardless of the model used. AEI results for CCL2 and CSF1 displayed no significant association with any of the SNPs genotyped.
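The genotype-based association tests described above can be sketched in two steps. First, expression values are grouped under a genotype or recessive model. Second, a one-way ANOVA F statistic is computed. This is an illustrative sketch, not the SPSS procedure used in the study. The co-dominant coding is omitted because its exact definition is not given. Function names and example data are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def group_by_model(genotypes, values, model="genotype", risk_allele="G"):
    """Group expression values by genotype under a simple inheritance model.

    model="genotype": one group per genotype ("GA" and "AG" are treated alike).
    model="recessive": homozygotes for risk_allele vs. all other genotypes.
    """
    groups = defaultdict(list)
    for gt, value in zip(genotypes, values):
        if model == "genotype":
            key = "".join(sorted(gt))
        elif model == "recessive":
            key = "hom_risk" if gt == risk_allele * 2 else "other"
        else:
            raise ValueError(f"unsupported model: {model}")
        groups[key].append(value)
    return dict(groups)


def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA over a dict mapping group -> values."""
    samples = [vals for vals in groups.values() if vals]
    k = len(samples)
    n = sum(len(vals) for vals in samples)
    if k < 2 or n <= k:
        raise ValueError("need at least two non-empty groups and n > k")
    grand_mean = fmean(v for vals in samples for v in vals)
    ss_between = sum(len(vals) * (fmean(vals) - grand_mean) ** 2 for vals in samples)
    ss_within = sum((v - fmean(vals)) ** 2 for vals in samples for v in vals)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical expression values for six individuals.
groups = group_by_model(["AA", "AA", "AA", "GA", "AG", "GA"], [1, 2, 3, 4, 5, 6])
print(one_way_anova_f(groups))
```

To obtain a p-value, the F statistic is compared with an F distribution on (k - 1, n - k) degrees of freedom. SciPy's `scipy.stats.f_oneway`, for example, returns both values.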

CCL2 meta-analysis reveals population diversity and inconsistent association with human disease: We collected and analyzed 29,811 genotypes for the -2578 SNP

(rs1024611). The results show considerable diversity at this locus on a global scale. The major and minor alleles are reversed in some populations, and allele frequencies vary widely among the groups studied (see Table 5.3). Chi-square tests on all subpopulations indicate that genotype distributions fall within Hardy-Weinberg equilibrium. The exceptions are the Icelandic, Hungarian, Chinese, Korean and Mexican populations.

Analysis of control population genotype distributions alone indicates that deviations from Hardy-Weinberg equilibrium are likely due to the inclusion of patient populations, which are inherently non-random samples. The sole exception is the Korean population (see Table 5.3).

In all subpopulations deviating from equilibrium there was an underrepresentation of heterozygotes (an excess of homozygotes) relative to Hardy-Weinberg expectations. Meta-analysis of odds ratios for -2578 alleles with a variety of human diseases shows that the SNP is not consistently or strongly associated with any disease (see Fig. 5.6).
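The Hardy-Weinberg chi-square test applied to each subpopulation can be sketched with the standard library alone. The function below is a hypothetical helper, not the SPSS procedure used. It does three things:

- estimates the allele frequency from the genotype counts;
- compares observed and expected genotype counts;
- reports the heterozygote excess or deficit, where a negative value means fewer heterozygotes than expected.

```python
import math


def hwe_test(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium for one biallelic SNP.

    Returns (chi-square, p-value, observed minus expected heterozygote count).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of the first allele
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # 3 genotype classes - 1 - 1 estimated allele frequency = 1 degree of freedom;
    # for 1 df the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, n_ab - expected[1]


# Hypothetical genotype counts (AA, GA, GG) for one subpopulation.
chi2, p_value, het_diff = hwe_test(50, 40, 10)
print(f"chi2={chi2:.3f}, p={p_value:.3f}, heterozygote difference={het_diff:.1f}")
```

When expected genotype counts are small, an exact test is generally preferable to this chi-square approximation.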

140

Complete genotype frequency information from all papers in which genotypes were reported or were shared by authors after e-mail query. The left portion of the table reflects all case and control samples, including case samples from studies that did not adopt a case-control design. The right portion indicates control samples from studies with a case-control design. The dashed line separates populations in which the A allele (above) or the G allele (below) is the major allele.

Table 5.3: Allele frequency information for CCL2 SNP rs1024611 across studies included in meta-analysis.

                      All samples                                 Controls only
Population            N      AA      GA      GG      HWE p        n      AA      GA      GG      HWE p
Icelandic             535    63.7%   29.3%   6.9%    p<0.009      303    64.0%   29.7%   6.3%    p<0.17
African American      973    63.1%   32.6%   4.3%    p<0.99       305    62.3%   33.1%   4.6%    p<0.99
Spanish               3706   59.9%   35.2%   4.9%    p<0.89       1654   60.6%   34.6%   4.8%    p<0.98
Czechoslovakian       498    59.2%   35.7%   5.0%    p<0.96       359    57.4%   37.6%   5.0%    p<0.79
French                610    58.7%   34.1%   7.2%    p<0.20       0      n/a     n/a     n/a     n/a
Hungarian             1566   56.1%   35.4%   8.5%    p<0.004      777    61.3%   32.6%   6.2%    p<0.19
Brazilian             231    55.4%   35.9%   8.7%    p<0.47       0      n/a     n/a     n/a     n/a
German                6574   54.0%   38.7%   7.3%    p<0.70       3848   52.9%   39.9%   7.2%    p<0.86
European American     1397   52.8%   39.5%   7.7%    p<0.96       519    50.1%   40.1%   9.8%    p<0.61
Italian               2605   50.8%   41.3%   7.8%    p<0.76       1181   50.8%   41.3%   7.8%    p<0.89
British               528    49.8%   43.8%   6.4%    p<0.20       224    44.2%   48.2%   7.6%    p<0.24
Argentinean           592    33.3%   49.2%   17.6%   p<0.98       270    34.4%   47.0%   18.5%   p<0.85
Hispanic American     177    32.2%   49.7%   18.1%   p<0.98       110    27.3%   50.9%   21.8%   p<0.98
Asian American        16     31.3%   43.8%   25.0%   p<0.89       16     31.3%   43.8%   25.0%   p<0.89
Thai                  481    25.8%   50.5%   23.7%   p<0.97       0      n/a     n/a     n/a     n/a
-------------------------------------------------------------------------------------------------------
Chinese               704    20.0%   44.5%   35.5%   p<0.06       264    19.3%   44.3%   36.4%   p<0.37
Korean                2140   17.5%   43.0%   39.5%   p<6x10^-6    872    20.0%   42.3%   37.7%   p<0.001
Mexican               962    17.3%   44.4%   38.4%   p<0.09       193    23.8%   50.8%   25.4%   p<0.98
Japanese              5516   13.1%   45.9%   41.0%   p<0.96       3076   13.5%   48.1%   38.4%   p<0.36

Table 5.3.


Odds ratio calculations for the A allele relative to the G allele for diseases where more than one case-control study for rs1024611 is reported. Labels indicate first author, year and reported ethnic/geographic group. The horizontal bars indicate the 95% confidence interval for the odds ratio calculation. For each disease grouping, the top horizontal bar shows the overall odds ratio and 95% confidence interval across studies. Confidence intervals that include 1.0 indicate no significant difference between the alleles.

Figure 5.6: Meta-analysis of CCL2 SNP rs1024611 including only diseases where multiple studies have been conducted

[Forest plot. X-axis: Odds Ratio (G:A allele), 0.0 to 5.0. Disease groupings (number of studies) and study labels:
CAD, MI, hypertension (n=4): Tucci 2006 (Italian); Szalai 2001 (Hungarian); Iwai 2006 (Japanese); Cermakova 2005 (Czech)
Asthma (n=3): Szalai 2001 (Hungarian); Keszei 2006 (Hungarian); Yao 2004 (Chinese)
Rheumatoid Arthritis (n=4): Ye 2004 (Chinese); Gonzalez 2004 (Spanish); Lee 2003 (Korean); Hwang 2002 (Korean)
SLE (n=8): Tucci 2004 (mixed ethnicity); Sanchez 2006 (Spanish); Ye 2004 (Chinese); Aguilar 2001 (Spanish); Liao 2004 (Chinese); Nakashima 2004 (Japanese); Hwang 2002 (Korean); Kim 2002 (Korean)
SLE LN+ (n=5): Tucci 2004 (mixed ethnicity); Kim 2002 (Korean); Aguilar 2001 (Spanish); Nakashima 2004 (Japanese); Ye 2004 (Chinese)
IgA Nephropathy (n=2): Steinmetz 2004 (German); Mori 2005 (Japanese)
Anterior Uveitis (n=2): Wegscheider 2005 (Austrian); Yeo 2006 (British)
Hepatitis C (n=3): Muhlbauer 2003 (German); Bonkovsky 2005 (unreported); Glas 2004 (German)
HIV (n=5): Gonzalez 2004 (Argentinian); Alonso 2004 (Spanish); Gonzalez 2004 (African American); Gonzalez 2004 (Eur. American); Gonzalez 2004 (Hispanic American)
Type I Diabetes (n=2): Simeoni 2004 (German); Yang 2004 (British)
Alzheimer's Disease (n=5): Pola 2005 (Italian); Huerta 2004 (Spanish); Combarros 2004 (Spanish); Nishimura 2003 (Japanese); Fenoglio 2004 (Italian)
Parkinson's Disease (n=2): Nishimura 2003 (Japanese); Huerta 2004 (Spanish)
Schizophrenia (n=2): Mundo 2004 (Italian); Pae 2004 (Korean)]

Figure 5.6.
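The text does not state how the overall odds ratio for each disease grouping in Figure 5.6 was pooled. One common choice, shown here as an assumption, is an inverse-variance fixed-effect model on the log odds ratio scale. Cochran's Q and the I² statistic quantify the between-study inconsistency that the meta-analysis highlights. Standard errors can be recovered from reported 95% confidence intervals. Function names and inputs are hypothetical.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Recover the standard error of log(OR) from a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def fixed_effect_pool(studies, z=1.96):
    """Inverse-variance fixed-effect pooled OR, with Cochran's Q and I^2.

    studies: list of (log_or, se) pairs, one per study.
    """
    weights = [1 / se ** 2 for _, se in studies]
    total_w = sum(weights)
    pooled = sum(w * lo for w, (lo, _) in zip(weights, studies)) / total_w
    se_pooled = math.sqrt(1 / total_w)
    q = sum(w * (lo - pooled) ** 2 for w, (lo, _) in zip(weights, studies))
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {
        "or": math.exp(pooled),
        "ci": (math.exp(pooled - z * se_pooled), math.exp(pooled + z * se_pooled)),
        "q": q,
        "i2": i2,
    }


# Hypothetical studies reported as OR with 95% CI.
reported = [(1.4, 1.1, 1.8), (0.9, 0.6, 1.3), (1.2, 0.8, 1.8)]
studies = [(math.log(o), se_from_ci(lo, hi)) for o, lo, hi in reported]
print(fixed_effect_pool(studies))
```

When I² is large, a random-effects model (e.g. DerSimonian-Laird) is usually more appropriate than a fixed-effect pool.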

5.3 Summary

Functional polymorphisms of CCL2 have been the subject of many previous studies. Because there are no known nonsynonymous SNPs in the gene, and the gene is subject to significant regulation at the mRNA level, most studies have focused on an upstream SNP (-2578, rs1024611). The prevailing assumption is that the G allele at this position leads to

increased CCL2 transcription based on results from early publications, but disease associations with this SNP have been inconsistent. Only a few studies have investigated

effects at the mRNA or protein level, with some associating the A allele with higher

levels of CCL2 mRNA and protein (Kim et al., 2002; Simeoni et al., 2004), even though

the G allele is widely presumed to increase production. Here we demonstrate

experimentally in two physiologically relevant systems, and through meta-analysis of the

literature, that rs1024611 alleles are unlikely to be functionally relevant. Using AEI

assays that control for trans-modifying factors, we have previously assessed the

contribution of cis-regulatory variants to inter-individual differences in expression of

other genes with suspected roles in disease and response to therapies (Wang et al., 2005;

Zhang et al., 2005; Lim et al., 2006; Pinsonneault et al., 2006). The strength of this approach is that it measures the contribution of alleles relative to one another within an in vivo tissue context, in contrast to traditional reporter assays. Our results in human heart tissues show that CCL2 is regulated in an allele-dependent manner at the mRNA level in some individuals. However, the functional allele(s) are incompletely linked to the G allele at position -2578 (see Fig. 5.3).

To account for upregulation of CCL2 under a variety of pro-inflammatory conditions, we also measured allelic mRNA differences in MDMs that were untreated or

treated with CSF1. We observed a pattern similar to that in the heart tissue, indicating

allele-related differences in mRNA expression are not explained by heterozygosity at

CCL2 position -2578 (see Fig 5.4). Treatment with CSF1 potently upregulated CCL2

expression, implying that an induced state of gene expression was achieved (see Fig. 5.2).

In addition, differences were observed in AEI under constitutive and induced conditions

within some individual samples, implying an interaction between activation of CCL2

expression and allelic regulation in some subjects. Given that such differences were

observed in a few individuals with AA and GA rs1024611 genotypes and that the relative

direction of this change in AEI was variable, we conclude that at least one functional variant occurs at a lower frequency than rs1024611 and is in incomplete linkage disequilibrium with it. We did

not find any association between rs1024611 and constitutive or induced expression of

CCL2 as measured by RT-PCR in heart tissues and MDMs (see Table 5.2), further

suggesting this specific SNP does not contribute to inter-individual differences in vivo.

A meta-analysis of many genetic studies of CCL2 leads to several conclusions. The allele frequency of rs1024611 is extremely variable among world

populations (see Table 5.3). Our meta-analysis of disease risk in case-control studies

employing rs1024611 indicates there are no diseases where this SNP is consistently

linked with increased risk (see Figs. 5.6, 5.7). However, a number of studies do show

risk association, or report strong associations with clinical indicators that were not

factored into the odds ratio meta-analysis. Notably, case-control studies in Asian

populations reveal no significant associations with disease risk, with the exception of

[Forest plot. X-axis: Odds Ratio (G:A allele), 0.0 to 3.0. Diseases with a single study each:
Tuberculosis (Flores-Villanueva 2005, mixed)
Acute Pancreatitis (Papachristou 2006, mixed)
Liver transplant (Franco-Lopez 2005, Spanish)
Hemodialysis Patients (Omori 2002, Japanese)
Atopic dermatitis/eczema (Kozma 2002, Hungarian)
Crohn's Disease (Herfarth 2003, German)
Alopecia Areata (Hong 2006, Korean)
Kawasaki's Disease (Jibiki 2001, Japanese)
Behcet's Disease (Cho 2004, Korean)
Breast Cancer (Ghilardi 2005, Italian)
Multiple Sclerosis (Kroner 2004, German)
Sarcoidosis (Takada 2002, Japanese)
Depression (Pae 2004, Korean)
Bipolar disorder (Pae 2004, Korean)]

Odds ratio calculations for the A allele relative to the G allele for all diseases where only one case-control study for rs1024611 is reported. Labels indicate first author, year and reported ethnic/geographic group. The horizontal bars indicate the 95% confidence interval for the odds ratio calculation. Confidence intervals that include 1.0 indicate no significant difference between the alleles.

Figure 5.7: Meta-analysis of CCL2 SNP rs1024611 including only diseases where one study has been conducted

single studies in small Korean populations for tuberculosis (Flores-Villanueva et al.,

2005) and a sub-divided group with bipolar disorder (Pae et al., 2004). Of 9 studies that

show significantly increased risk with the G allele, six are in Southern or Eastern

European populations in diseases that include CAD (Szalai et al., 2001a), arterial

hypertension (Tucci et al., 2006), asthma (Szalai et al., 2001b; Keszei et al., 2006),

anterior uveitis (Wegscheider et al., 2005) and Alzheimer’s disease (Pola et al., 2004).

Of the remaining three studies, one shows only borderline significance at an α value of 0.05 (Sanchez et al., 2006), and two are in mixed populations (Tucci et al., 2004; Flores-Villanueva et al., 2005) that exhibit allele stratification (see below). Conversely, six study groups show significantly reduced disease risk with the G allele, and none of them is in a Southern or Eastern European population (Gonzalez et al., 2002; Hwang et al., 2002; Glas et al., 2004; Pae et al., 2004; Yang et al., 2004). It should be noted, however, that separate

studies of CAD (Cermakova et al., 2005) and Alzheimer’s disease (Fenoglio et al., 2004)

in Eastern European populations did not replicate significant risk associations.

Nonetheless, this analysis suggests that, because of differences in haplotype frequencies or linkage, rs1024611 may be a useful marker for true functional alleles in some

populations. This is consistent with our finding of a wide distribution of genotype

frequencies among world populations (see Table 5.3). The stratification of this allele

impacts the analysis of mixed population groups (e.g., overall cardiovascular disease in

Fig. 5.6; Tucci et al., 2004; Flores-Villanueva et al., 2005). Thus, more work is

necessary to identify true CCL2 functional variants and their distribution among diverse

populations, and the use of the rs1024611 marker alone in future association studies is

not recommended. This meta-analysis provides a reference for expected allele

frequencies in different groups. As an example of the potential utility of this information, consider a study associating CCL2 with tuberculosis (see Fig. 5.7). The genotype frequencies for its healthy Korean control population deviate markedly from the

frequencies observed in our analysis (see Table 5.3), calling this association into question

(Flores-Villanueva et al., 2005).
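The comparison of a study's control genotype frequencies with the reference frequencies in Table 5.3 could be quantified with a chi-square goodness-of-fit test, sketched below. Treating the reference frequencies as fixed (2 degrees of freedom) is a simplification, since they are themselves estimates. A 2×3 contingency test comparing both sets of counts would be more conservative. The observed counts in the example are hypothetical.

```python
import math


def genotype_gof(observed_counts, reference_freqs):
    """Chi-square goodness of fit of genotype counts to reference frequencies.

    Reference frequencies are treated as known, giving 3 - 1 = 2 degrees of
    freedom; for 2 df the chi-square upper-tail probability is exp(-chi2 / 2).
    """
    n = sum(observed_counts)
    expected = [n * f for f in reference_freqs]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed_counts, expected))
    return chi2, math.exp(-chi2 / 2)


# Reference: Korean control genotype frequencies (AA, GA, GG) from Table 5.3.
KOREAN_CONTROLS = (0.200, 0.423, 0.377)

# Hypothetical control counts (AA, GA, GG) from a study to be checked.
chi2, p = genotype_gof((60, 30, 10), KOREAN_CONTROLS)
print(f"chi2={chi2:.1f}, p={p:.2e}")
```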

We used bioinformatics tools to search for potential functional SNPs in the CCL2

gene region. Two promising candidates were rs11575011 and rs11575012 located at

positions -122 and -119 within a GC box that is known to bind Sp-1 and regulate CCL2

expression. These SNPs were reported at greater than 5% frequency in dbSNP, but using

allele-specific PCR in 180 chromosomes we detected no occurrences (data not shown).

Examination of the original electropherogram data for these SNPs confirmed them as

false positives (D. Geraghty, personal communication). More experimentation is required to identify and validate true functional variants that affect CCL2 expression; otherwise, genetic studies will likely continue to yield conflicting results. A further consideration

is that positive associations in the literature may also be due to linkage disequilibrium

with alleles in other chemokine genes clustered at the 17q locus which have similar

functions and tissue expression distributions, although this remains relatively untested.

Future work should focus on variants that affect transcription, mRNA processing and

degradation. Our results suggest that the particular mode of activation of CCL2 expression should be considered together with the contribution of different alleles. If definitive

functional alleles cannot be established, then efforts to establish CCL2 biomarkers should

focus on absolute transcript or protein levels in relevant tissues.
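The weight of the negative allele-specific PCR result for rs11575011 and rs11575012 can be checked with simple binomial arithmetic. If an allele had a frequency of 5%, the chance of observing no copies among 180 chromosomes is (1 - 0.05)^180. The sketch below assumes independently sampled chromosomes (i.e., Hardy-Weinberg proportions) and a perfectly sensitive assay. The function names are hypothetical.

```python
def prob_allele_unseen(allele_freq, n_chromosomes):
    """Probability that none of n independently sampled chromosomes carries the allele."""
    return (1.0 - allele_freq) ** n_chromosomes


def upper_freq_bound_if_unseen(n_chromosomes, alpha=0.05):
    """Largest allele frequency still compatible (one-sided, level alpha) with zero observations."""
    return 1.0 - alpha ** (1.0 / n_chromosomes)


print(prob_allele_unseen(0.05, 180))    # about 9.8e-05
print(upper_freq_bound_if_unseen(180))  # about 0.0165, close to the "rule of three" 3/180
```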

We encountered a number of difficulties in conducting the meta-analysis, which

disqualified some studies from inclusion: 1) repeated use of identical control populations

in multiple publications, either with or without disclosure, 2) authors’ decisions not to

report genotypes at all, or to collapse genotypes in co-dominant or recessive models, and

3) failure to disclose the ethno-geographic population or use of a mixed group without

providing a breakdown of genetic data. In most but not all cases authors readily shared

additional information with us. However, the extent of these problems suggests a need

for journals to adopt stricter standards for genetic reporting that will make future

retrospective analyses more feasible. Although 45 of the 80 studies we analyzed included

genotyping of multiple markers, most additional markers were located in entirely distinct

gene regions, with only 12 studies typing other markers within the CCL2 region. Only 5 studies, including the original rs1024611 report (Rovin et al., 1999), conducted additional sequencing in the gene region.

CSF1 plasma levels have recently been associated with increased intima-media thickness and plaque formation in patients (Haraguchi et al., 2006). CSF1-mediated upregulation of CCL2 has been demonstrated here and previously (Kelley et al., 1999), and CCL2 has a known role in atherogenesis. CSF1 and CCL2 are therefore related candidates in the inflammation-mediated progression of disease. Our results suggest that CSF1 is subject to allelic expression imbalance (see Fig. 5.5). Given the fairly high frequency of this imbalance and the fact that it occurs in a single direction, a functional polymorphism may be tightly linked with the marker SNP, rs333970. RTPCR results indicate a slight decrease in CSF1 expression with CSF1 stimulation in MDMs, suggesting a feedback inhibition mechanism. Further work is required to characterize the effects of

CSF1 variants and regulation at the mRNA and protein levels (Rabello et al., 2006), and to account for the relative contribution of known alternative splice variants produced from the CSF1 gene (Probst-Kepper et al., 2004). Unlike CCL2, CSF1 contains a CpG island in its promoter region, suggesting epigenetic regulation may also play a role.

CHAPTER 6

Influence of human polymorphisms on RNA structures

I provide here a genome-wide analysis of the distribution of RNA structures and

predicted structural effects of single nucleotide polymorphisms (SNPs) in various types of RNA. Algorithms for predicting RNA secondary structure have been widely applied, mostly to functional non-coding RNAs (tRNAs and rRNAs) rather than to protein-coding mRNAs. Recent studies support the notion that SNPs in transcribed regions of a gene can affect RNA structure and function (see Table 6.1). Comparisons of secondary structures predicted from mRNAs versus shuffled sequences in a variety of genomes indicate genome-wide selection for secondary structure. This is particularly true in eubacteria and in one eukaryote, Saccharomyces cerevisiae (Katz & Burge 2003). Conclusions regarding secondary structure in human mRNAs have been mixed (Seffens & Digby 1999;

Workman & Krogh 1999; Katz & Burge 2003; Clote et al., 2005; Meyer & Miklos 2005).
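The shuffled-sequence comparisons mentioned above can be illustrated with a deliberately simplified sketch. Nussinov-style base-pair maximization stands in for free-energy minimization, and a plain mononucleotide shuffle stands in for the composition-preserving shuffles typically used in published work. Both substitutions are assumptions for illustration; real analyses use thermodynamic folding programs such as RNAfold or mfold. Function names are hypothetical.

```python
import random
from statistics import fmean, pstdev

# Watson-Crick and G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Nussinov dynamic programming: maximum number of nested base pairs."""
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]  # dp[i][j]: best pair count for seq[i..j]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: position j is unpaired
            for k in range(i, j - min_loop):  # case: j pairs with k (loop >= min_loop)
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


def shuffle_z_score(seq, n_shuffles=100, seed=0):
    """Z-score of the native pair count against shuffled versions of the sequence."""
    rng = random.Random(seed)
    native = max_base_pairs(seq)
    scores = []
    for _ in range(n_shuffles):
        chars = list(seq)
        rng.shuffle(chars)
        scores.append(max_base_pairs("".join(chars)))
    sd = pstdev(scores)
    return 0.0 if sd == 0 else (native - fmean(scores)) / sd


print(max_base_pairs("GGGAAAUCCC"))               # 3
print(shuffle_z_score("GGGGAAAACCCC", n_shuffles=50))
```

Pair counts grow with structure, so a positive z-score here means more structure than expected. With free energies the sign is reversed, since a more negative ΔG means a more stable structure. The O(n³) cost per fold limits this pure-Python sketch to short sequences.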

Regardless of any genome-level bias for or against mRNA secondary structure, there is good evidence of functional roles for structure within specific human mRNAs. Structure in 5’ untranslated regions (UTRs) has been documented both to inhibit and to promote translation in mammalian cells (Kozak 1986; Kozak 1990; Kozak 2005;

Babendure et al., 2006). Specific structural motifs, selenocysteine insertion sequence elements (SECIS), are known to function in selenocysteine incorporation in human


A partial summary of experimental and computational investigations to date on the effects of human mRNA variation on RNA structure, expression and disease. The table is divided as follows: 6.1a, a single functional SNP is investigated; 6.1b, multiple SNPs are known or predicted based on available information; 6.1c, multinucleotide and indel variants. Known functional variants in human tRNAs are not included; for reviews of these, see Florentz and Sissler (2001) and Vilmi et al. (2005). Results from our structure-SNP database are reported for those variants for which haplotypes were unambiguous. Z-scores were calculated relative to categorical SDs (Fig. 6.3). Structure alteration was judged from differences in the bracket notations for the alleles: '-' no difference, '+' difference at two or fewer positions, '++' difference at a moderate number of positions, '+++' difference at many positions.

Table 6.1: Reported RNA structure and polymorphism differences in the literature compared with results from the current work.

                                                  ΔG            ΔΔG                     Structure    Boltzmann
                                                                                        alteration   probability
Article          Gene      dbSNP id      Region   MFE wt  Z     MFE   Z     Ens   Z     MFE    Ens    Major  Minor
Table 6.1a: Single SNPs
Allamand 2006    SEPN1     -             3'UTR    Structural differences are observed
Carpen 2005      PER2      rs2304672     5'UTR    -26.8   0.4   -0.9  -0.6  -0.8  -0.6  +++    +++    <0.01  <0.01
Steinberger 2004 SPR       -             5'UTR    Structural differences are observed
Villette 2002    GPX4      rs713041      3'UTR    -37.0   1.5   1.7   0.8   1.7   0.9   +      +      0.42   0.45
Hu 2001          SEP15     rs5845        3'UTR    -18.3   -0.5  1.5   0.6   1.9   1.0   ++     +++    0.01   0.02
                 SEP15     rs5859        3'UTR    -26.0   0.3   2.0   0.9   1.7   0.9   +      ++     0.2    0.12
Shen 1999        AARS      rs2070203     Syn      -33.4   0.9   -0.7  -0.5  1.4   0.7   +++    +++    <0.01  <0.01
                 RPA70     rs2230931     Syn      -23.4   -0.4  0.7   0.3   0.6   0.2   -      +      <0.01  <0.01
Table 6.1b: Haplotypes
Goodarzi 2005    LPL       2 haplotypes  3'UTR    Structural differences are observed
Mas 2005         GJA9      rs3743123     Syn      -22.6   -0.5  -1.4  -0.9  -0.6  -0.6  +      ++     <0.01  0.01
Puga 2005        TNFRSF1B  5 haplotypes  3'UTR    Structural differences are observed
Russcher 2005    NR3C1     rs6189        Syn, NS  -24.6   -0.2  2.2   1.0   2.5   1.4   ++     ++     0.07   0.12
Wang 2005        ABCB1     rs1045642     Syn      -28.4   0.2   -0.1  -0.2  0.7   0.2   ++     ++     <0.01  0.02
Zhang 2005       OPRM1     rs1799971     NS       -37.7   1.4   -4.9  -2.5  -4.2  -2.4  +      ++     0.01   0.04
Duan 2003        DRD2      rs6277        Syn      Structural differences are observed
Table 6.1c: Insertions and multinucleotide variants
Ding 2005        AR        -             G,Q repeats
Michlewski 2004  ATXN3,    -             Q repeats
                 CACNA1A,
                 ATN1
Myers 2004       GRIA2     -             5'UTR
Sobczak 2003     In vitro                Repeats
Popowski 2003    EDN1      rs10478694    5'UTR
Ly 2003          TERC      Multi-allelic Pseudo-knots
Shalev 2002      INS       -             5'UTR
Barette 2001     PRNP      -             24bp repeats
de Leon 2000     NAT1      -             3'UTR
Maffei 1997      HLA-DQA1  -             5'UTR

Table 6.1.

coding regions (Kryukov et al., 2003). Iron-responsive elements (IREs) in both 5’ and

3’UTRs are known to regulate translation of specific genes (Gray and Hentze 1994).

Indeed, evidence is mounting that structure in untranslated and coding exonic regions of human mRNAs and small non-coding RNAs may affect the rate of transcription, their processing (e.g., splicing, polyadenylation, editing (Athanasiadis et al., 2004)), hybridization (Kuo et al., 1997; Vickers et al., 2000), decay (Lopez de Silanes et al., 2004; Fialcowitz et al., 2005; Lopez de Silanes et al., 2005), transport, targeting and initiation, and rate of translation (Martineau et al., 2004; Kozak 2005; Russcher et al.,

2005b). Throughout the life cycle of an mRNA there are interactions with many proteins.

These interactions may both limit the RNA conformations assumed in vivo (Zhang et al.,

2006) and be influenced by the nascent RNA structure (Moore 2005). The importance of structure in eukaryotic small RNAs such as miRNAs has been recognized (Bentwich et al.,

2005; Bonnet et al., 2004), and it is suggested that pre-mRNA structure may have functional relevance (Buratti & Baralle 2004; Meyer & Miklos 2005). It is also generally accepted that functional RNA secondary structural motifs are typically small in size and fairly well-predicted in shorter sequences (Mathews et al., 1999). Ribonucleic acids fluctuate between different energetically favorable configurations due to stochastic molecular motion and other constraints, though they most likely favor the thermodynamic minimum free energy (MFE) structure over an ensemble of suboptimal structures within a free energy range. Given an RNA sequence, secondary structure prediction programs can generate both an MFE structure and an ensemble of suboptimal structures. Analysis of the MFE structure and the ensemble of suboptimal structures can provide evidence for well-determined structures that are more likely to form in vivo (Mathews et al., 1999).
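The statement that an RNA most likely favors its MFE structure over an ensemble of suboptimal structures can be made quantitative with Boltzmann weights. The probability of structure i is p_i = exp(-ΔG_i / RT) / Z, the same relation behind the Boltzmann probability columns of Table 6.1. The sketch below applies it to a short, hypothetical list of structure free energies. Folding packages compute Z over all possible structures (McCaskill-type partition function algorithms), so normalizing over a handful of listed structures overstates their probabilities. The function name is hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def boltzmann_probabilities(free_energies, temp_c=37.0):
    """Relative Boltzmann probabilities for structures with given free energies (kcal/mol)."""
    rt = R_KCAL * (temp_c + 273.15)
    g_min = min(free_energies)
    # Shifting by the minimum energy avoids overflow and leaves the ratios unchanged.
    weights = [math.exp(-(g - g_min) / rt) for g in free_energies]
    z = sum(weights)
    return [w / z for w in weights]


# Hypothetical MFE structure and two suboptimal structures (kcal/mol).
print(boltzmann_probabilities([-30.0, -29.5, -28.0]))
```

At 37 °C, RT is about 0.62 kcal/mol. A structure 1.4 kcal/mol above the MFE is therefore roughly ten times less populated.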

An additional dimension not often considered is how functional RNA structures are influenced by sequence variation. One method for detecting genetic variants, single-strand conformation polymorphism (SSCP) analysis, relies on differences in the structural conformation of variants in single-stranded DNA or RNA (Lenz et al., 1995; Ren 2000). A large portion of variants in exonic sequences are detectable by changes in RNA structure via RNA-SSCP, showing that most SNPs have the potential to affect RNA structure (Lenz et al., 1995; Sarkar et al., 1992). Experiments employing enzymes that specifically cleave paired or unpaired bases to create structural maps show that prokaryotic and human sequences differing by a single SNP can have different mRNA secondary structures (Shen et al., 1999). On the other hand, sequence variations may often be neutral with regard to effects on structure, in many cases preserving an evolved functional state (Ancel & Fontana 2000). However, even variations that are neutral in their effect on the predicted MFE structure may alter the characteristics of suboptimal structures within the ensemble of conformations, change the relative time an mRNA spends in the MFE state, or create a sequence that is likely to change structure if additional variation occurs (Ancel & Fontana 2000). Variants that alter structure may be compensated for by additional variants that epistatically preserve the ground-state structure (Chen et al., 1999).

A deeper understanding of how population variants influence RNA structure may help explain inter-individual and inter-species differences in gene expression and function. Studies in humans have supported the idea that some inter-individual genetic differences alter RNA structures and affect RNA functions, in some cases contributing to disease (Table 6.1; Florentz & Sissler 2001; Johnson et al., 2005; Wang et al., 2005; Zhang et al., 2005). Recently, Vilmi et al. resequenced 22 tRNA genes in the mitochondrial genomes of 477 Finns and examined 435 European tRNA sequences from the MitoKor database. They found that the MFE structures predicted for the 96 polymorphic tRNA sequences showed a significantly different distribution from those of wild-type tRNAs, with low-frequency alleles yielding the greatest predicted change in MFE (Vilmi et al., 2005). Pathological evidence also indicates that SNPs disrupting IRE structure in the 5’UTR of FTL are a genetic cause of hereditary hyperferritinemia-cataract syndrome (HHCS) (Beaumont et al., 1995; Girelli et al., 1995; Aguilar Martinez et al., 1997; Cazzola et al., 1997; Martin et al., 1998; Mumford et al., 1998; Allerson et al., 1999; Camaschella et al., 2000; Campagnoli et al., 2002; McLeod et al., 2002). Clinical cases of rigid spine syndrome have been attributed to SNPs in the structure-encoding sequence of the SECIS-containing 3’UTR of SEPN1 (Moghadaszadeh et al., 2001; Allamand et al., 2006). Thus, we hypothesize that synonymous, nonsynonymous and UTR variants can potentially act in a mildly deleterious and, in some cases, pathological fashion at pre- and post-translational levels through changes in RNA structure.

Bioinformatics databases and tools have been developed to predict the potential functional effects of genetic differences, particularly for nonsynonymous SNPs that alter amino acid coding and for SNPs that fall near splice junctions or predicted protein-DNA binding sites (for a review see Johnson et al., 2005). However, variants altering human RNA structures have typically been analyzed computationally only for single genes or variants, following experimental observations. Here we present the first report of the predicted effects of known variants on RNA structure across a large portion of the human genome. We used the Vienna RNA secondary structure package (Hofacker 2003) to create a human genome SNP-structural conformation dataset for analysis. We determine how frequently variants are predicted to alter RNA structures, whether allele frequencies are associated with structural differences, and whether proposed functional variants from the literature differ from the general distribution of variants. We analyze structure results with multiple analytical approaches and report a set of SNPs predicted to potentially alter RNA processing or expression via changes in RNA structure. We also present a number of challenges, and solutions, specific to genome-wide surveys of SNPs and to the consideration of RNA sequence contexts. The resulting SNP-structure dataset is available for researchers to assess structural variation for particular genes, variants and haplotypes, or to search for structure patterns. Aside from genetic variation, this is also one of the largest examinations of mRNA structure in the human genome (about 17,900 transcriptional units represented); previous reports examined 12 (Meyer & Miklos 2005), 1,855 (Katz & Burge 2003), and overlapping sets of 41 (Clote et al., 2005), 46 (Workman & Krogh 1999), and 51 (Seffens & Digby 1999) human mRNAs. Results of our analysis of human mRNA structures indicate there is considerable potential for favorable structures to form within specific regions of many mRNAs.

6.2 Method of study on the influence of human polymorphisms on RNA structures

Generation of custom sequence databases via UCSC Genome Browser tools: Our sequence database is based on the May 2004 human assembly, available as hg17 from genome.ucsc.edu. We used the RefSeq gene annotations as of November 2005 and dbSNP build 124. The UCSC source code includes a set of snpMask utilities: snpMaskChrom.c, snpMaskGenes.c and snpMaskFlank.c. The concept of "snpMasking" is to produce nucleotide sequence in which single-base substitutions are represented by IUPAC codes (Cornish-Bowden 1985). This method was developed independently by both UCSC and the WUSTL SNP Research Facility, and snpMasked sequences from the two were compared and found to be identical (Koboldt et al. 2006). snpMasking includes detection of dbSNP clustering errors, creating a blended observation of all alleles. At present, if the reference assembly base differs from the observed SNP alleles, it is not included as an additional variant. UCSC source code is available at http://hgdownload.cse.ucsc.edu/admin/jksrc.zip, or can be obtained via CVS as documented at http://genome.ucsc.edu/admin/cvs.html.

We used snpMaskFlank.c, which first stores all gene annotations in UCSC genePred format. This format includes a list of exon coordinates for each gene. snpMaskFlank.c then iterates through all genes, examining all exons. For each exon, snpMaskFlank.c selects all SNPs where class = 'snp', locType = 'exact' and chromEnd = chromStart + 1. It generates masked sequence for all exons using the absolute coordinates of the SNPs, and then constructs flanking sequence on both sides, connecting exons together as necessary. snpMaskFlank.c has a configurable #FLANKSIZE parameter, which was set to 25, 50 and 75 for our analysis.
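The masking and flanking steps can be illustrated with a short Python sketch. It is not the UCSC C implementation in snpMaskFlank.c; it assumes SNP records carrying the UCSC snp-table fields named above (class, locType, chromStart, chromEnd) plus the observed allele string, and, as a simplification, that coordinates have already been converted to 0-based offsets into the exon-joined sequence. The function names snp_mask and flank_window are hypothetical.

```python
# Sketch of "snpMasking": single-base SNPs are written into a sequence as
# IUPAC ambiguity codes, then a flanking window is cut around a SNP.
# Assumption: chromStart/chromEnd are already 0-based offsets into `seq`.

IUPAC_CODE = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def snp_mask(seq, snps):
    """Return `seq` with qualifying SNP positions replaced by IUPAC codes."""
    masked = list(seq.upper())
    for snp in snps:
        # Same selection as described for snpMaskFlank.c: single-base,
        # exactly located substitutions only.
        if snp["class"] != "snp" or snp["locType"] != "exact":
            continue
        if snp["chromEnd"] != snp["chromStart"] + 1:
            continue
        # Only the observed alleles are blended; the reference base is not
        # added as an extra allele.
        alleles = frozenset(snp["observed"].upper().split("/"))
        if len(alleles) > 1:
            masked[snp["chromStart"]] = IUPAC_CODE.get(alleles, "N")
    return "".join(masked)


def flank_window(masked, pos, flank_size):
    """Cut the SNP base +/- flank_size bases, truncating at sequence ends."""
    start = max(0, pos - flank_size)
    return masked[start:pos + flank_size + 1]
```

For example, masking "ACGTACGTAC" with an exact C/T SNP at offset 2 yields "ACYTACGTAC", and flank_window(masked, 2, 3) returns "ACYTAC", truncated at the 5’ end in the same way that flanking sequence stops at the end of an annotated RNA.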

These sequence sizes (50, 100 and 150 bp total) were chosen to allow sufficient space for structural interactions, based on the widely held assumptions that RNA secondary structure motifs are generally small and predicted well locally, while predictions for large sequences are generally less accurate (Mathews et al., 1999; Meyer & Miklos 2005). We also tested this assumption by analyzing the frequency of structure changes at positions relative to the SNP base, and adopted a total sequence length of 100 bp for subsequent analyses (see Fig. 6.5). Adopting smaller sequence contexts also reduces the sequence complexity arising from additional genetic and alternative splicing variants (see below).

Only exonic (coding or untranslated) sequence was retrieved, up to the maximum flanking length or to the end of each annotated RNA, whichever was reached first. The UCSC AltSplicing track was also incorporated to retrieve multiple flanking sequence contexts if a SNP is within an alternatively spliced region. Additional SNPs that fell within the flanking sequences were also represented by IUPAC codes and enumerated in the sequence header to allow for generation of haplotypes. All IUPAC codes were oriented to the coding strand, since genotypes are often reported on the noncoding strand. An example of sequence information in the database and how it is processed is shown in Figure 6.1. A filter limit of 9 or fewer IUPAC codes per sequence was then imposed before structure prediction in order to avoid the high computational load from exponential numbers of haplotypes in highly polymorphic regions and duplicons in the genome. This resulted in the exclusion of 1,736 SNPs but decreased the size of the sequence space by more than 8.5 × 10^12 alleles. A Perl script was used to generate all possible unique coding contexts/haplotypes for the remaining 153,397 unique coding region SNPs.
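The strand-orientation and haplotype-expansion steps can be sketched as follows, assuming IUPAC-coded DNA input. The helper names are hypothetical and this is not the Perl script used in the study. The max_codes=9 default mirrors the filter limit described above; each biallelic code doubles the number of haplotypes (2^n for n biallelic SNPs), which is why the filter removes so much sequence space.

```python
from itertools import product

# IUPAC code -> the DNA bases it stands for.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Each code maps to the code for the complementary base set (e.g. R=A/G -> Y=T/C).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def to_coding_strand(iupac_seq, strand):
    """Reverse-complement sequences reported on the minus strand."""
    seq = iupac_seq.upper()
    return seq if strand == "+" else seq.translate(COMPLEMENT)[::-1]


def haplotypes(iupac_seq, max_codes=9):
    """Expand an IUPAC sequence into every allele combination, as RNA.

    Returns an empty list when the sequence carries more than `max_codes`
    ambiguous positions, mirroring the filter described in the text.
    """
    seq = iupac_seq.upper()
    n_codes = sum(len(IUPAC_BASES[base]) > 1 for base in seq)
    if n_codes > max_codes:
        return []
    options = [IUPAC_BASES[base] for base in seq]
    # Cartesian product over the allowed bases at each position.
    return ["".join(combo).replace("T", "U") for combo in product(*options)]
```

For instance, haplotypes("GRY") returns the four RNA haplotypes GAC, GAU, GGC and GGU, while to_coding_strand("ACGR", "-") gives "YCGT".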

Alternative RNA contexts (Fig. 6.1b), SNPs with more than two alleles, and variable numbers of additional SNPs in the sequence regions (Fig. 6.1c) contributed to generate a database of 1,059,195 unique mRNA haplotype allele sequences. These sequences were separated by chromosome and used in structure prediction. Examples of the sequence formats for the IUPAC and RNA sequence databases are shown in Figure 6.1.


6.1A: Generalized view of IUPAC and mRNA haplotype databases. Each sequence header line indicates the following: a – dbSNP reference id for SNP, b – unique sequence context number for given SNP (0…n) (e.g., alternative splicing context), c – haplotype allele sequence number (0,1…n), d – total number of SNPs located in sequence. Major allele and coding strand are also indicated. A second line gives the sequence with IUPAC codes or the unique haplotype sequence.

6.1B: Example of a biallelic SNP (rs2205447) that has no additional SNPs in the flanking sequence but exists in two unique sequence contexts due to alternative splicing. Four mRNA haplotypes are generated for this SNP.

6.1C: Example of a biallelic SNP (rs4113746) that has one additional SNP in the flanking sequence region. Four RNA haplotypes are also generated but the resulting numbering scheme differs from the example in Fig. 6.1B.

Figure 6.1: Examples of information types and data flow for RNA SNP sequences in the study


Figure 6.1.

Sets of known functional RNAs in the genome that contained variants were analyzed separately from the large RefSeq set. Small RNAs containing variants were identified using a merge of the SNP and sno/miRNA tracks in the UCSC Genome Table Browser (Griffiths-Jones 2004; Weber 2005), and information from two papers (Bentwich et al., 2005; Iwai & Naraba 2005). The predicted effects of variants in these RNAs were analyzed based on structures published in the miRNA Registry at the Wellcome Trust Sanger Institute (Griffiths-Jones 2004) and the snoRNA-LBME-DB at the Laboratoire de Biologie Moleculaire Eucaryote (Lestrade & Weber 2006). IRE and SECIS elements were identified based on previous work. BLAT searches (Kent 2002) were applied to the IRE and SECIS sequences in an attempt to identify other similar regions in the human genome. We also created a merge of dbSNP and EvoFold (Pedersen et al., 2006), a comparative genomics program that found many known and novel putative functional RNA structures in the human genome. To analyze the representation of methionine within EvoFold structures, we calculated the genome frequency of methionine codons from all RefSeq protein sequences. EvoFold structure lengths were trimmed to exclude intronic sequence, methionine codons were enumerated based on RefSeq annotation, and structures were counted only once even if they contained multiple SNPs.

RNA secondary structure prediction: The RNA sequence files were distributed over a cluster of three 64-bit machines (two with single processors and one with four processors), all running a UNIX operating system. A custom Perl script managed a pipeline that fed sequences to the Vienna RNA package (Hofacker 2003) (version 1.4) and processed the results. The command line for prediction of each secondary structure followed this pattern:

RNAfold -p < InputSeq > OutputStructure

The -p flag for RNAfold specifies calculation of the partition function and base-pairing probability matrix in addition to the standard MFE structure calculation. Structures were calculated with default settings (37 °C, GU pairs allowed). An example and description of a text-based structure result in bracket notation is given in Figure 6.2a. These textual structure results served as the basis for subsequent filtering and analysis. Additionally, for every sequence a graphical view of the MFE structure and the dot plot was generated and archived (see Fig. 6.2b).
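Because -p also yields the ensemble free energy, the Boltzmann probability of the MFE structure (the quantity reported in Figure 6.2a and Table 6.2) follows from the two energies: with G_ens = -RT ln Z, the MFE probability is exp(-(G_mfe - G_ens)/RT). A minimal Python sketch, assuming energies in kcal/mol and the 37 °C default; the function name is hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def mfe_boltzmann_probability(mfe_dg, ensemble_dg, temp_c=37.0):
    """Equilibrium probability of the MFE structure within the ensemble.

    p(MFE) = exp(-(dG_mfe - dG_ensemble) / RT), energies in kcal/mol.
    Since dG_ensemble <= dG_mfe, the result lies in (0, 1].
    """
    rt = R_KCAL * (temp_c + 273.15)
    return math.exp(-(mfe_dg - ensemble_dg) / rt)
```

The averages in Table 6.2 cannot simply be plugged into this formula to recover the average Boltzmann probabilities, because the mean of an exponential is not the exponential of the mean.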

Annotation and clustering of sequences and variants by functional type: In order to conduct analyses, annotations for sequences and variants were combined from a number of sources. A unique merge of dbSNP and UCSC annotation for more than 10 million variants was created using Perl scripts. Though complete information was not available for every SNP, many records contained the following information: rsID, genome position, strand orientation, major allele, IUPAC code, dbSNP validation status, average heterozygosity among genotyped populations, SD of average heterozygosity, functional classification (e.g., synonymous, UTR, intron), number of genotyped individuals, and allele frequencies. This annotation was critical for identifying the major and minor allele(s) in each sequence and for analysis by allele frequency and functional category. Major and minor alleles were determined by the relative allele frequencies combined from all genotyped samples for each SNP in dbSNP build 124. Untranslated region SNPs have not previously been categorized as 5’ or 3’ in the human genome. We annotated UTR SNPs as 5’ or 3’ to estimate their abundance in the datasets, using SNP genome position in combination with UCSC tables for gene boundaries (rnaCluster) and RefSeq genes (refGene). Merges of SNP sets and refGene were used to annotate gene IDs and estimate the number of unique genes represented in each of the datasets presented here (17,891 genes in the 153,397 SNP set; 12,453 genes in the 34,557 SNP set; 10,501 genes in the 22,785 SNP set). Components of the annotations were also used in the filtering processes described below.

6.2a: Example mRNA secondary structure results for a single SNP (rs10778) with two alleles. The SNP base is in bold. ID and sequences are indicated on lines 1, 2, 6 and 7 in identical format to Figure 6.1. Lines 3 and 8 indicate the predicted MFE structures for the two alleles, with the MFE (∆G in kcal/mol) indicated in parentheses. Structure is given in the bracket notation adopted in Vienna: ‘.’ unpaired base, ‘(’ base paired with a 3’ partner, ‘)’ base paired with a 5’ partner. Lines 4 and 9 indicate the consensus structure for the ensemble including all suboptimal structures: ‘.’ unpaired base in > 2/3 of structures, ‘(’ and ‘)’ paired in > 2/3 of structures, ‘,’ ‘{’ ‘}’ are weaker versions of these preferences, and ‘|’ indicates that a base is paired in 2/3 of structures without a clear preference for a 5’ or 3’ partner. The free energy for the thermodynamic ensemble (∆G in kcal/mol) is indicated in braces. Lines 5 and 10 indicate the Boltzmann probabilities; assuming thermodynamic equilibrium, this represents the fraction of time spent in the MFE structure relative to all structures in the ensemble.

6.2b: Graphical depiction of the MFE structure and a dot plot representation of the structure ensemble for rs10778 (G allele). MFE structures can be reconstituted from bracket notations.

Figure 6.2: Example SNP structure results for rs10778

Figure 6.2.

Analysis of 34,577 SNPs in a low-ambiguity dataset: Ambiguity in the full sequence dataset arises from a number of sources: 1) n biallelic SNPs result in 2^n possible haplotypes, some of which do not exist in human populations; 2) SNPs without validated allele frequencies; 3) SNPs with multiple functional categorizations (i.e., combinations of synonymous, nonsynonymous, UTR, intron); 4) SNPs with additional sequence contexts due to alternative splicing; and 5) SNPs with flanking sequences of varying length due to transcript boundaries. To analyze structure results in a low-ambiguity context, we created a subset of 34,577 SNPs in RefSeq RNAs. All sequences in this subset contained 1) only one known SNP in the sequence, 2) SNP allele frequencies validated in one or more human populations, 3) a single functional categorization for the SNP, 4) the SNP in only one sequence context (no alternative splicing), and 5) a total sequence length of 100 bp. This subset was generated from another subset of 44,185 SNPs that contain no additional known variants in their flanking sequences. SNPs were removed from that set as follows: ambiguity in functional categorization (n=7,771), total sequence length less than 100 bp (n=1,203), and existence in multiple sequence contexts (n=634). The final analytical subset contained 8,905 nonsynonymous, 10,702 synonymous, and 14,950 UTR region SNPs. Perl scripts were used to calculate further heuristics on the structure data for these SNPs, including: the change in the MFE and ensemble thermodynamic energies between major and minor allele structures, the number of bases undergoing predicted structural changes from unpaired to base-paired and from base-paired to unpaired between alleles, the frequency of changes in structural behavior predicted at {SNP-n…SNP…SNP+n} positions, and the change in the Boltzmann probability of the MFE structure within the ensemble.
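The five inclusion criteria for the low-ambiguity subset can be expressed as a single predicate. The record type and its field names below are hypothetical stand-ins for the merged dbSNP/UCSC annotation, not the study’s actual data structures.

```python
from dataclasses import dataclass


@dataclass
class SnpContext:
    """Hypothetical per-SNP summary record; field names are illustrative."""
    rsid: str
    snps_in_window: int      # known SNPs in the sequence, including this one
    freq_validated: bool     # allele frequencies validated in >= 1 population
    categories: frozenset    # e.g. {"synonymous"} or {"synonymous", "UTR"}
    n_contexts: int          # distinct sequence contexts (alternative splicing)
    seq_len: int             # total length of the flanking sequence in bp


def is_low_ambiguity(ctx, window=100):
    """Apply the five inclusion criteria listed in the text."""
    return (
        ctx.snps_in_window == 1           # 1) only one known SNP
        and ctx.freq_validated            # 2) validated allele frequencies
        and len(ctx.categories) == 1      # 3) single functional category
        and ctx.n_contexts == 1           # 4) one sequence context
        and ctx.seq_len == window         # 5) full-length sequence
    )
```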

Search for haplotypes with structurally interacting SNPs: We used a Perl script to search for human variant combinations that preserved RNA structures among instances where two SNPs were located within a 100 bp window, as determined by the number of IUPAC codes (e.g., Fig. 6.1c). The SNP base positions were determined by comparing the sequences. MFE structures for each of the 4 possible haplotypes were interrogated to find instances where the two SNP bases were paired with each other. To do this, a simple count-up/count-down ladder analysis of the helical structures in bracket notation (see Fig. 6.2a) was applied. Structures were selected as interacting instances when the SNP bases were 1) both located at the bottom of the ladder, and 2) counting down reached the base of the ladder only once and exactly at the 3’ SNP position.
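One way to test whether two SNP bases pair with each other is to build a pair table from the dot-bracket string with a stack. This is a swapped-in technique, not a reimplementation of the count-up/count-down ladder rule above: it checks only that the 5’ and 3’ SNP positions are direct partners, without the additional requirement about the base of the ladder. Function names are hypothetical; positions are 0-based.

```python
def pair_table(structure):
    """Map each paired position to its partner in a dot-bracket string."""
    stack, partners = [], {}
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket string")
            j = stack.pop()  # most recent unmatched '(' is the partner
            partners[i], partners[j] = j, i
    if stack:
        raise ValueError("unbalanced dot-bracket string")
    return partners


def snps_pair_together(structure, pos5, pos3):
    """True when the 5' and 3' SNP positions pair with each other."""
    return pair_table(structure).get(pos5) == pos3
```

Applied to each of the four haplotype MFE structures, a True result flags a haplotype in which the two SNP bases form a base pair.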

6.3 Results of study on the influence of human polymorphisms on RNA structures

Computational analysis predicts SNPs as a common cause of conformational variation in human RNA secondary structure: We predicted mRNA structures derived from regions surrounding 153,397 SNPs in ~17,900 RefSeq genes. We filtered this set to reduce sources of ambiguity (see Methods), yielding structures and thermodynamic MFEs (∆G) around 34,557 SNPs in ~12,450 RefSeq genes. This approach allowed both a general analysis of mRNA structures in a genome-wide set of RNAs and an examination of changes in structures due to polymorphism (e.g., by analysis of changes in energies: ∆∆G). For all Figures and Tables where ∆G (thermodynamic minimum free energy in kcal/mol) and ∆∆G (change in thermodynamic minimum free energy from major to minor allele, i.e., ∆G(minor) - ∆G(major)) are reported, negative values correspond to lower free energies (more thermodynamically favorable). The thermodynamic distributions of RNA structures (∆G) around SNPs across the genome are displayed in Figure 6.3a. Notably wide ranges of ∆G values are observed in all SNP contexts. The distributions are slightly right-shifted toward less favorable structures, in particular in the case of UTR structures. The left tails (more favorable structures) are considerably longer than the right tails, and at the extreme they lie above the calculated normal distribution. This is in part due to the zero-energy barrier, but may also indicate some selection for a small proportion of highly favorable structures.

SNP contexts differ significantly in the distribution of ∆G values, ranking from most favorable to least favorable: synonymous, nonsynonymous, and UTR (p<0.001, ANOVA with Bonferroni correction). Nonsynonymous and synonymous sequence contexts are most similar, and the largest average difference (-3.16 kcal/mol) is between synonymous and UTR structures, which is roughly equivalent to the energy of a single Watson-Crick G-C pair. The SNP context categories also display significant differences with respect to the distribution of biallelic SNP types (see Fig. 6.4a), structure prediction results (see Figs. 6.4b, 6.4c, Table 6.2), and overall RNA nucleotide content of the sequence contexts (see Fig. 6.4d). Untranslated region sequence contexts show significantly greater cytosine content and diminished guanine and thymine content (Fig. 6.4d). Synonymous SNPs exist in sequence contexts with slightly higher average guanine and thymine content than nonsynonymous SNPs (Fig. 6.4d).

6.3a: Histogram distributions of the ∆G (kcal/mol) values for the MFE structures predicted for 69,114 human mRNA sequences. The structures include those predicted from sequences surrounding both major and minor alleles for 34,557 validated human SNPs. Histograms are separated by three SNP categories (nonsynonymous, synonymous and UTR). Means ± 1 SD are: nonsynonymous (-25.69 kcal/mol ± 8.59), synonymous (-26.40 kcal/mol ± 8.21) and UTR (-23.24 kcal/mol ± 9.42).

6.3b: Histogram distributions of the ∆∆G (kcal/mol) values for the MFE structures predicted for 22,785 validated human SNPs. Not represented are 11,772 SNPs for which both alleles predicted identical MFE structures and ∆∆G values of 0 kcal/mol; their exclusion avoids sharp spikes at 0 kcal/mol in all three SNP contexts. Means ± 1 SD are: nonsynonymous (0.27 kcal/mol ± 2.06), synonymous (0.32 kcal/mol ± 1.90) and UTR (0.20 kcal/mol ± 1.95). Calculated normal curves with identical means and standard deviations are superimposed (SPSS, version 13). Histogram distributions for ensemble ∆G and ∆∆G values showed similar patterns (data not shown).

Figure 6.3: Distributions of thermodynamic energies of mRNAs across the human genome and predicted effects of SNPs on thermodynamic energies

Figure 6.3.

Figures 6.4a, b and c are organized from the most common to least common SNP types (left to right) in human mRNAs.

6.4a: Relative frequencies in the dataset of 12 SNP types among functional categories for 34,557 validated human SNPs for which mRNA structures were analyzed.

6.4b: Percentage of SNPs of each type and functional category for which the major and minor alleles predicted identical MFE structures.

6.4c: Boxplots indicating the median (horizontal bars), interquartile ranges (boxes) and outliers for ∆∆G of MFE structures for 12 SNP types by sequence context.

6.4d: Percent sequence composition of the four nucleotides by SNP context type among 69,114 human mRNA sequences for which structures were analyzed.

Figure 6.4: Predicted effects of SNPs in the human genome on mRNA structure organized by SNP type

Figure 6.4.

To analyze the effect of SNPs on structure, we used the bracket notation for structures (e.g., see Fig. 6.2a) to compare structures between major and minor alleles for 34,557 validated, biallelic SNPs. We find that the majority of these human genetic differences alter the MFE RNA structure as well as the profile of the ensemble of top-rated suboptimal RNA structures. Some minor alleles are predicted to have MFE structures identical to those of the major alleles (34.1%), while few are predicted to have near-identical structure ensembles (6.4%). The predicted changes in thermodynamic energy for minor alleles relative to major alleles (∆∆G) are depicted in Figure 6.3b.

Analysis of SNPs predicted to change mRNA structures: We further characterized the relationship between sequence variation and RNA structure in the subset of SNPs predicted to alter MFE structures (n=22,785). The greatest frequency of structure change is predicted for transversion SNPs involving guanine exchanges. Other transversions are the next most frequent, followed by transitions (see Fig. 6.4b). Mean results for structure predictions are displayed in Table 6.2. Synonymous major and minor allele structures have, on average, slightly more negative ∆G values and larger increases in ∆∆G with minor alleles (less favorable changes) than nonsynonymous and UTR SNPs. In keeping with this, synonymous structure contexts generally have a greater percentage of bases involved in helical pairs, thus tending to form more favorable structures that would be more likely to be affected by a SNP. We analyzed the predicted structural behavior for each base position of each structure pair.

Table 6.2: Comparative analysis of structures from major and minor alleles of 22,785 validated human SNPs.

                                      NS        Syn       UTR       ANOVA p (Bonferroni corrected)
                                      (n=6041)  (n=6886)  (n=9858)  NS<>Syn  Syn<>UTR UTR<>NS
MFE ∆G, major allele (kcal/mol)       -25.8     -26.3     -23.4     0.004    <0.001   <0.001
MFE ∆G, minor allele (kcal/mol)       -25.6     -26.0     -23.2     0.011    <0.001   <0.001
MFE ∆∆G (kcal/mol)                    0.27      0.31      0.20      0.549    0.001    0.109
Ensemble ∆G, major allele (kcal/mol)  -28.9     -29.5     -26.4     0.001    <0.001   <0.001
Ensemble ∆G, minor allele (kcal/mol)  -28.6     -29.2     -26.3     0.003    <0.001   <0.001
Ensemble ∆∆G (kcal/mol)               0.26      0.30      0.19      0.561    <0.001   0.029
Boltzmann probability, major allele   0.03      0.03      0.02      0.098    1.000    0.015
Boltzmann probability, minor allele   0.03      0.02      0.02      0.002    1.000    0.003
% bases paired, major allele          52.7%     53.2%     52.6%
% bases paired, minor allele          52.4%     52.8%     52.5%
Helices disrupted : formed, MFE       1.04      1.04      1.01
Helices disrupted : formed, ensemble  1.05      1.05      1.04
Average MAF                           0.16      0.18      0.19

Comparative description and analysis of structures from 22,785 validated human SNPs whose major and minor alleles predict non-identical MFE structures. NS, nonsynonymous; Syn, synonymous. All reported values are means. Statistical analysis of differences by category was done by ANOVA with Bonferroni correction in SPSS (version 13).

There are four possible patterns of behavior at each base position of each structure pair (major > minor allele): unpaired > unpaired, unpaired > paired, paired > unpaired, and paired > paired. This analysis reveals that changes from major to minor alleles favor disruption of helical pairing interactions over the creation of new helices, regardless of SNP category (Table 6.2).
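The four per-base patterns can be tallied directly from the two dot-bracket strings. This Python sketch assumes equal-length MFE structures and treats ‘(’ and ‘)’ as paired and ‘.’ as unpaired; it counts bases, so it approximates rather than reproduces the helix-level ratios in Table 6.2. The function names are hypothetical.

```python
from collections import Counter


def pairing_state(symbol):
    """Classify one dot-bracket symbol as paired or unpaired."""
    return "paired" if symbol in "()" else "unpaired"


def pairing_changes(major, minor):
    """Count major>minor pairing transitions over all base positions."""
    if len(major) != len(minor):
        raise ValueError("structures must have equal length")
    return Counter(
        f"{pairing_state(a)}>{pairing_state(b)}" for a, b in zip(major, minor)
    )
```

For example, pairing_changes("((..))", "......") records four paired>unpaired and two unpaired>unpaired positions; comparing the paired>unpaired and unpaired>paired counts indicates whether a minor allele mainly disrupts or creates pairing.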

Examination of the frequency of predicted structural changes at surrounding bases relative to the SNP base reveals a strong central tendency despite varying flanking sequence sizes (see Fig. 6.5), showing that alterations in secondary structure interactions typically localize to nearby regions. The relationship between SNP minor allele frequency (MAF), ∆∆G, and change in structure pairing behavior was analyzed by ANOVA at different MAF cutoffs (similar to the tRNA allele analysis of Vilmi et al., 2005). There is an overall trend toward greater change in structure and larger ∆∆G among SNPs with lower MAFs (below 10% MAF vs. above 10% MAF; ∆∆G, p < 0.01; structure pairing change, p = 0.052; Bonferroni correction).
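The positional analysis behind Figure 6.5 amounts to a per-offset tally across SNPs. The sketch below assumes each allele pair is stored as equal-length MFE dot-bracket strings of length 2 × flank + 1 with the SNP at the centre, and it counts a position as changed whenever its bracket symbol differs between alleles; the study’s Perl program may have scored changes differently. The function name is hypothetical.

```python
def positional_change_frequency(pairs, flank):
    """Return {offset: fraction of SNPs whose bracket symbol changes there}.

    `pairs` is a list of (major, minor) dot-bracket strings centred on the
    SNP; offset 0 is the SNP base, negative offsets lie 5' of it.
    """
    width = 2 * flank + 1
    counts = [0] * width
    for major, minor in pairs:
        if len(major) != width or len(minor) != width:
            raise ValueError("each structure must span 2 * flank + 1 bases")
        for i, (a, b) in enumerate(zip(major, minor)):
            if a != b:  # structural behaviour differs between alleles
                counts[i] += 1
    n = len(pairs)
    if n == 0:
        return {}
    return {i - flank: c / n for i, c in enumerate(counts)}
```

Running this separately for 25, 50 and 75 bp flanks and plotting fraction against offset gives a profile of the kind shown in Figure 6.5.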

Discrimination of putative functional mRNA variants: We used different analyses to identify putative functional mRNA variants, but in all cases we searched for favorable structures (low ∆G) with SNPs predicted to create large changes (extremely high or low ∆∆G). All of the SNPs isolated by these approaches were visualized in their genome context via the UCSC Genome Browser to ensure accurate annotation of their position within the gene context, CpG island predictions, proximity to alternative processing events, and observation of nearby poly-repeats and inframe methionine codons.

Plot of frequencies with which the structural behavior of bases is altered between major and minor alleles among 34,557 validated human SNPs. The x-axis indicates base positions 5’ to 3’ (-75 to +75; -50 to +50; -25 to +25) relative to the centrally located SNPs. Analysis was done with a Perl program that compared the structural pairing behavior of individual bases between alleles using the bracket notation for structure (e.g., Fig. 6.2a). In overlapping regions the 50 bp and 75 bp flanking results are not significantly different from each other, but both differ from the 25 bp flanking results (p<0.05). Differences between SNP categories were not significant (data not shown). Comparisons including ensembles of suboptimal structures followed similar patterns (data not shown) but display consistently more frequent differences in structure, because the bracket notation for ensembles indicates relative frequencies of base-pairing behavior among multiple thermodynamic structures, in contrast to the fixed notation for single MFE structures.

Figure 6.5: Effects of varied flanking sequence sizes on mRNA structure predictions in a human genome-wide dataset

Figure 6.5.

We examined the most stable structures predicted in the single-SNP human genome structure-variation dataset (Z-score >3; -52.0 kcal/mol cutoff for MFE structures). This set contained 152 variants (36 nonsynonymous, 40 synonymous, 48 5’UTR, 28 3’UTR). Many structures with long helices are observed, up to a maximum helical stretch of 25 bp in the most favorable structure found, flanking the translation start site of NDST1 (rs3733935 G allele: -69.0 kcal/mol). A number of trends are observed among the gene regions in this set. Structures in (48) and near (38) 5’UTRs are over-represented relative to their presence within the full set. Of the synonymous and nonsynonymous structures, half are located in 5’ exons within 100 bp of translation start sites, and 40% are in close proximity within the mRNA to one or more downstream inframe methionine codons. At least eleven of the 152 variants are in or near methionine codons known to be sites of alternative translation. More than two-thirds of the structures (68%) are found within predicted CpG islands (Gardiner-Garden & Frommer 1987). In three cases, extended exonic poly-repeats are observed near strongly predicted structures (rs367398, Leu[11] in NOTCH4; rs1799925, Pro[9] in WT1; rs3021525, Ala[16] in FOXE1). No 3’UTRs shorter than 300 bp are observed, and the average 3’UTR length is greater than 1 kb. One quarter of the predicted stable structures (38) are located in exonic regions but not in close proximity to either translation start or stop regions.
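The Z-score cutoff for the most stable structures can be sketched as a one-sided threshold at three standard deviations below the mean ∆G, since more negative values are more favorable. The function name is hypothetical, and the text does not state whether the sample or population standard deviation was used; this sketch uses the sample SD from Python’s statistics module.

```python
from statistics import mean, stdev


def stable_structure_outliers(dg_by_snp, z_cut=3.0):
    """Select SNPs whose MFE dG lies more than z_cut SDs below the mean.

    Returns (cutoff in kcal/mol, sorted list of selected SNP ids).
    """
    values = list(dg_by_snp.values())
    cutoff = mean(values) - z_cut * stdev(values)
    selected = sorted(snp for snp, dg in dg_by_snp.items() if dg < cutoff)
    return cutoff, selected
```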

In a separate analysis, cutoffs for Boltzmann probabilities were used to identify 512 variants that have a set of well-predicted structures (Z-score >3; 0.145 cutoff). We hypothesized that structures with high Boltzmann probabilities might contain biologically functional structures because of their relatively confined and similar set of suboptimal structures (Miklos et al., 2005). We further screened these structures for those with 1) large ∆∆G due to the SNP, 2) a high degree of sequence conservation and regulatory potential based on UCSC genome annotation, and 3) low (favorable) ∆G. The resulting set contained 129 candidate variants (38 nonsynonymous, 43 synonymous, 14 5’UTR, 34 3’UTR) predicted to substantially alter a favorable structure. In some cases we noted that the candidates coincided with EvoFold predictions, indicating that the structures, and not simply the sequences, are likely preserved across mammals (Pedersen et al., 2006).

Based on the success of EvoFold in predicting some functional RNA structures (Pedersen et al., 2006), we used the UCSC Table Browser to create a merged set of EvoFold predictions in mRNAs that overlap known SNPs (n=936). These structures have a distinctive distribution across mRNA regions: coding exons (70.5%), 3’UTRs (27.6%), and 5’UTRs (1.8%). Exonic EvoFold structures in this set overlap in-frame methionine codons in 42% of cases, 1.84-fold more often than expected from their sequence length and the methionine codon frequency calculated from human RefSeq proteins (2.14%). Some of these form stem-loop structures in which the loop is flanked by two methionine codons located in the stem. Two consecutive methionine codons occur within this structure set 3.34-fold more often than expected assuming independent assortment of codons. A strikingly large portion of the 3’UTR structures in this set (48.3%) are located at the extreme distal end of the transcript. We also note that some gene categories are extensively represented within the EvoFold-SNP structure set: histones, ribosomal proteins, and ribonucleoprotein-related, translation initiation factor-related, vesicle-related, ubiquitin-related, myosin, and neurotransmitter receptor genes.
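The text does not state exactly how the expected methionine overlap was derived. One plausible model, sketched here under the assumption that each in-frame codon is independently a methionine codon with probability p = 0.0214, gives the chance that a structure spanning n codons contains at least one Met codon as 1 - (1 - p)^n, and the expected number of adjacent Met-Met pairs as (n - 1)p^2. The structure lengths below are hypothetical.

```python
# Methionine codon frequency quoted in the text (human RefSeq proteins).
MET_FREQ = 0.0214


def expected_met_overlap(codon_counts, p=MET_FREQ):
    """Mean probability that a structure spanning n in-frame codons contains
    at least one Met codon, treating codons as independent draws."""
    probs = [1.0 - (1.0 - p) ** n for n in codon_counts]
    return sum(probs) / len(probs)


def expected_met_doublets(codon_counts, p=MET_FREQ):
    """Expected number of adjacent Met-Met codon pairs across all structures:
    a structure of n codons has n - 1 adjacent pairs, each Met-Met with prob p**2."""
    return sum(max(n - 1, 0) * p * p for n in codon_counts)


def fold_enrichment(observed, expected):
    """Ratio of observed to expected frequency (or count)."""
    return observed / expected


# Hypothetical structure lengths in codons; real lengths come from the data set.
lengths = [12, 20, 35]
exp_frac = expected_met_overlap(lengths)
print(f"expected overlap {exp_frac:.3f}, "
      f"fold if 42% observed: {fold_enrichment(0.42, exp_frac):.2f}")
```

The doublet expectation compares against an observed count of consecutive Met codons in the same structures, which is how a figure such as the 3.34-fold excess would be obtained under this independence model.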

SNPs in structures near alternative mRNA processing sites: Our initial analyses of the 50 bp flanking structure set revealed that including sequences with a total length under 100 bp exaggerated differences between the UTR and coding variant sequence contexts, and in particular resulted in higher (less favorable) average ∆G and higher average Boltzmann probabilities for the UTR category (data not shown). Correspondingly, SNPs in the UTR category had the greatest percentage of sequences with total length less than 100 bp, due to their proximity to transcript boundaries (UTR – 7.3%, nonsynonymous – 2.0%, synonymous – 1.3%). These shorter sequences formed less secondary structure because of their diminished length, and the smaller number of potential structural interactions yielded greater average Boltzmann probabilities.

We hypothesized that this set of shorter sequences (n=1,203) would include many in close proximity to 5’ transcription and translation initiation sites, and that those predicted to form stable structures might influence the initiation or rate of transcription or translation. Additionally, we hypothesized that the set of SNPs initially removed because of multiple sequence contexts (n=634) would include many genes exhibiting alternative transcription or translation start sites, splicing, termination, or polyadenylation. The SNP contexts for shorter sequences and multiple sequence contexts (n=1,653 after accounting for overlap) were visualized in the UCSC Genome Browser to collect further annotation. Many of the SNPs in multiple sequence contexts (n=634) map to multiple sites in the genome or lie in regions where transcripts are derived from both strands. However, a portion of these SNPs (n=196) are found near sites of known or putative alternative gene processing.

The majority of SNPs (53.4%) in the shorter sequence set were located at the 5’ end of transcripts, in contrast to our estimates for the full dataset, where 3’UTR SNPs (80%) far outnumber 5’UTR SNPs (20%). The latter is consistent with prior observations that 3’UTRs are generally longer than 5’UTRs in eukaryotes and are thus likely to accumulate relatively more variation. Filtering based on sequence length and multiple sequence contexts therefore effectively enriched for SNPs at the 5’ end of transcripts near transcription and translation start sites. This group of SNPs (n=196) was examined to find candidates with 1) a large ∆∆G predicted between alleles (>2 kcal/mol), and 2) a low ∆G (one allele more favorable than -27.0 kcal/mol) and/or long helical structures. Boltzmann probability was not considered, as it was observed to be influenced by sequence length.
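The two-part screen applied to this group can be written as a simple predicate. This is a sketch only: the field names are hypothetical, ∆∆G is taken as the absolute difference between the allele MFEs, and because the text gives no numeric threshold for a "long" helix, `helix_min` is an assumed, optional parameter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlleleFold:
    """Folding summary for one SNP context (field names are hypothetical)."""
    snp_id: str
    dg_major: float          # MFE free energy, major-allele context (kcal/mol)
    dg_minor: float          # MFE free energy, minor-allele context (kcal/mol)
    longest_helix: int = 0   # longest uninterrupted stem, in base pairs


def is_candidate(fold: AlleleFold, ddg_min: float = 2.0, dg_max: float = -27.0,
                 helix_min: Optional[int] = None) -> bool:
    """Thresholds from the text: |ddG| > 2 kcal/mol, and one allele below
    -27.0 kcal/mol and/or a long helix (helix_min is an assumed parameter)."""
    ddg = abs(fold.dg_minor - fold.dg_major)
    if ddg <= ddg_min:
        return False
    favorable = min(fold.dg_major, fold.dg_minor) < dg_max
    long_helix = helix_min is not None and fold.longest_helix >= helix_min
    return favorable or long_helix


folds = [
    AlleleFold("snp_a", -30.0, -27.5),
    AlleleFold("snp_b", -20.0, -23.0, longest_helix=10),
    AlleleFold("snp_c", -30.0, -29.0),
]
print([f.snp_id for f in folds if is_candidate(f)])               # ['snp_a']
print([f.snp_id for f in folds if is_candidate(f, helix_min=8)])  # ['snp_a', 'snp_b']
```

Because both alleles are considered without preference, the ∆G test uses whichever allele is more favorable, matching the "one allele more favorable than -27.0 kcal/mol" wording.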

RNA structures containing interacting SNP bases: Analysis of RNA structures generated from sequences containing two SNPs identified 568 pairs in which the two SNP bases were predicted to interact structurally in one or more haplotypes, indicating potential multi-allelic interactions. Only 34 of the 568 SNP pairs were examined further, since only these had validation annotation for both SNPs. Among these we found SNP pairs interacting in only one haplotype (n=21), pairs following the canonical GC-AU substitution pattern often observed across species (n=8), and other patterns: GU-UA (n=1), GC-CG (n=1), AU-UA (n=1) and GC>U (n=2). In some cases (CIT, TNFAIP2, BCKDHB) the allele frequencies of structurally interacting alleles complemented the substitution pattern expected to preserve RNA structure. Notably, these three genes have been associated with human disease phenotypes, and the regions containing the predicted structures are more conserved across species than the surrounding sequences in these genes. The structure predicted in the 3’UTR of LPL is also of interest, since a haplotype containing two alleles in this precise location was recently associated with increased enzyme activity and metabolic phenotypes in a large Mexican-American cohort (Goodarzi et al., 2005).
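Detecting whether two SNP bases interact reduces to a lookup in each haplotype's base-pair table. The sketch below assumes dot-bracket structures have already been predicted for each phased two-SNP haplotype (for example with RNAfold); the helper names and toy sequences are hypothetical, and the returned pattern string mirrors the GC-AU notation used above.

```python
def pair_table(structure: str) -> list:
    """Partner index for each position of a dot-bracket string (-1 if unpaired)."""
    stack, table = [], [-1] * len(structure)
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket string")
            j = stack.pop()
            table[i], table[j] = j, i
    if stack:
        raise ValueError("unbalanced dot-bracket string")
    return table


# Watson-Crick and G-U wobble pairs.
CANONICAL = {"GC", "CG", "AU", "UA", "GU", "UG"}


def snp_pair(seq: str, structure: str, pos1: int, pos2: int):
    """Base pair formed by two SNP positions (0-based) in one haplotype, or None."""
    if pair_table(structure)[pos1] != pos2:
        return None
    return seq[pos1] + seq[pos2]


def covariation_pattern(hap1, hap2, pos1, pos2):
    """hap1/hap2 are (sequence, dot-bracket) tuples for two phased haplotypes.

    Returns e.g. "GC-AU (both canonical)" when both haplotypes pair the SNP
    bases, a note when only one does, or None when neither does."""
    a = snp_pair(*hap1, pos1, pos2)
    b = snp_pair(*hap2, pos1, pos2)
    if a and b:
        status = "both canonical" if a in CANONICAL and b in CANONICAL else "non-canonical"
        return f"{a}-{b} ({status})"
    if a or b:
        return "pairs in one haplotype only"
    return None


h1 = ("GGGAAACCC", "(((...)))")
h2 = ("AGGAAACCU", "(((...)))")
print(covariation_pattern(h1, h2, 0, 8))  # GC-AU (both canonical)
```

A "both canonical" result across haplotypes is the compensatory pattern expected if population variation preserves a helix, while "pairs in one haplotype only" corresponds to the n=21 class above.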

This suggests that further analysis of structurally interacting alleles may be useful in finding functional structures, and may reveal instances of covariation within human populations that form or preserve RNA structure. However, estimation of the phased multi-SNP haplotypes that truly exist in human populations is inexact, making extensive computational analysis potentially problematic. This is the principal reason that most of the analysis in this work is done only for single validated SNPs, where the two haplotypes are known and phasing is not an issue.

Polymorphisms in known functional RNA structures: Based on previously described functional structures (see Methods for database sources), we analyzed SNPs and multi-nucleotide variants occurring in or near described IRE, SECIS, miRNA and snoRNA structures. This examination reveals that most contain no variation or variation predicted to be neutral in structural effect, but also that some structures do harbor validated variants that may affect biological function.

Sequence from Allerson et al. (1999) was used to identify the IRE structural element in the 5’UTR of FTL (L-ferritin). None of the numerous pathological variants reported in the literature within the FTL IRE were found in dbSNP, and thus they were not available for analysis in our results set (Beaumont et al., 1995; Girelli et al., 1995; Aguilar Martinez et al., 1997; Cazzola et al., 1997; Martin et al., 1998; Mumford et al., 1998; Allerson et al., 1999; Camaschella et al., 2000; Campagnoli et al., 2002; McLeod et al., 2002). A multi-nucleotide variant (rs11553230) was noted in the IRE of FTL, but this variant has not been validated. A SNP reported in the FTH1 IRE was also absent from dbSNP (Kato et al., 2001). Known IRE structures in FTH1, ALAS2, ACO2, SLC40A1, TFRC, SLC11A2, CDC14A and CDC42BPA (Cmejla et al., 2006) were examined, but no variants were observed. The comparative genomics program EvoFold (Pedersen et al., 2006) effectively predicted many but not all of the known IREs. BLAT searches (Kent 2002) with many of the IRE-encoding sequences revealed similar sequences in retroposed regions, pseudogenes, and intronic and intergenic regions throughout the human genome, but no additional putative IREs within known protein-coding mRNAs.

The regions encoding the known functional stem-loop structures controlling selenocysteine incorporation (SECIS) were examined for nearby variants in 25 human genes (Kryukov et al., 2003). A total of 26 elements were examined, since the SELP 3’UTR contains two elements. Eleven of the SECIS elements contained no known polymorphisms. The previously described pathological variants in SEPN1 were not found in dbSNP (Moghadaszadeh et al., 2001; Allamand et al., 2006). Four SECIS elements (GPX1, C11orf31 (SELH), SELK, SELO) harbored variants in dbSNP, but these were excluded from consideration because they are computationally predicted and not validated in human populations.

Potential novel variants were discovered in BLAT alignments against the human genome for one SELP SECIS and the DIO2 SECIS. In SELP, a single-base (A>U) difference from the sequence published in Kryukov et al. (2003) was noted. Differences corresponding to the deletion of three single cytosines were noted in the alignment of the DIO2 SECIS to the genome. Ten of the SECIS elements contained variants that have been validated in human populations. Because SECIS structures involve considerable non-Watson-Crick base-pairing, they are only partially predicted by standard settings of secondary structure algorithms like Vienna. Thus, we used SECISearch 2.19 to predict structures for all alleles (Kryukov et al., 2003).

For all wild-type allele sequences, the functional SECIS structural elements were predicted as expected: Helix I – Internal loop – UGAN SECIS quartet core – Helix II – Apical Loop.

Some of the variant alleles described below are predicted to alter the SECIS structures and may affect the incorporation of selenocysteine or translational readthrough in these proteins. The putative single-base change in one SELP SECIS is an A>U shift just 2 bp upstream of the critical quartet and is predicted to close the internal loop via a U-A interaction. The putative cytosine indels in DIO2 are predicted to affect structure in the Helix I and apical loop areas of that element. SNPs in SEP15 (15 kDa) (rs5859) and TXNRD3 (rs14682) are predicted to alter structure in the apical loop region of those SECIS elements. The SNP in SEP15 led us to two previous papers reporting functional studies of this SNP and another (rs5845) in the 3’UTR, which suggest a link to cancer etiology (Kumaraswamy et al., 2000; Hu et al., 2001). A previously studied SNP in GPX4 (rs713041) was predicted to significantly alter structure in the Helix I region (Villette et al., 2002), and a validated SNP in SEPX1 (rs4987018) lies just 2 bp downstream of the SECIS quartet and is predicted to disrupt a portion of Helix II. A SNP in TXNRD2 (rs1044732) is predicted to weaken an interaction in Helix I. A number of variants are not predicted to appreciably affect SECIS elements: an indel in the second SELP SECIS still allowed a stable Helix I to form (rs10569610, -/AGUA), and SNPs in GPX3 (rs4661), SELI (rs7588538) and DIO1 (rs12095080) are far enough away that they are not predicted to affect the core SECIS structure.

Fourteen of the SECIS-containing human genes include one or more additional in-frame UGA codons located considerably upstream. In bacteria, structure immediately downstream of UGA codons is critical to selenocysteine incorporation, while in eukaryotes the SECIS elements (e.g., those above) are located, often distantly, in the 3’UTR (Berry et al., 1991). However, it has been noted that stable helical structures are located downstream of the human in-frame UGA codons in SECIS-containing genes, and these structures may stabilize readthrough (Kryukov et al., 2003).

We observed that EvoFold (Pedersen et al., 2006) predicts helices conserved across mammals downstream of in-frame upstream UGA codons in SEPN1 (exon 10), SELT (exon 2), and SELK (exon 4). In fact, experimental mutagenesis of the helix immediately downstream of the SEPN1 upstream UGA codon showed that this structure does facilitate selenocysteine readthrough (Howard et al., 2005). We therefore examined structures near in-frame UGA codons in SECIS-encoding human genes for variation.

Remarkably, only three variants were noted in regions near in-frame UGA codons: rs11552989 (not validated; 1 bp downstream of the C11orf31 (SELH) exon 2 UGA), rs6440687 (4 bp upstream of the SELT exon 2 UGA), and rs2272853 (33 bp downstream of the SELO UGA). In our structural results database, stable helices are predicted downstream of the SELO and C11orf31 UGA codons (-34.8 kcal/mol and -36.4 kcal/mol, respectively), and the SNPs are not predicted to disrupt these helices.

Of 21 variants found in human pre-microRNA (pre-miRNA) regions, twelve seem unlikely to disrupt the structure and processing of a miRNA because they preserve Watson-Crick interactions and are not located near mature miRNA sequences. The remaining 9 variants are distributed as follows: disruptive of a helical interaction and located in the mature miRNA (hsa-mir-520c, rs7255628; hsa-mir-125a, rs12975333); creating a novel helical interaction and located complementary to the mature miRNA (hsa-mir-146a, rs2910164); disruptive of a helical interaction within 10 bp of the mature miRNA (hsa-mir-521-2, rs13382089; hsa-mir-140, rs7205289; hsa-mir-27a, rs11671784; hsa-mir-516-3, rs10583889); disruptive of a helical interaction more than 10 bp from the mature miRNA (hsa-mir-492, rs2289030); and a multi-nucleotide insertion in a hairpin loop (hsa-mir-516-3, rs10670323). Notably, only 2 of these 9 potentially disruptive miRNA variants are validated in multiple human populations (rs2289030, rs2910164). None of the variants analyzed here overlap with those in a similar analysis of miRNA variants conducted in a Japanese population (Iwai & Naraba 2005).
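The distance-based categories used for pre-miRNA variants can be expressed as a small classifier. This is an illustrative sketch that assumes the helix-disruption call has already been made from the allele structures and that the mature miRNA span is known in 0-based coordinates; the function names and coordinates are hypothetical, and it does not implement the "complementary to the mature miRNA" category, which would additionally need the hairpin's base-pair table.

```python
def distance_to_mature(snp_pos: int, mature_start: int, mature_end: int) -> int:
    """Distance in nt from a variant to the mature miRNA span [start, end),
    0-based; 0 means the variant lies inside the mature sequence."""
    if mature_start <= snp_pos < mature_end:
        return 0
    if snp_pos < mature_start:
        return mature_start - snp_pos
    return snp_pos - (mature_end - 1)


def classify_premirna_variant(snp_pos, mature_start, mature_end,
                              disrupts_helix, creates_helix=False):
    """Bin a pre-miRNA variant using the distance categories described in the text."""
    if not (disrupts_helix or creates_helix):
        return "likely neutral"
    effect = "disrupts helix" if disrupts_helix else "creates helix"
    d = distance_to_mature(snp_pos, mature_start, mature_end)
    if d == 0:
        where = "in mature miRNA"
    elif d <= 10:
        where = "within 10 bp of mature miRNA"
    else:
        where = "more than 10 bp from mature miRNA"
    return f"{effect}; {where}"


# Hypothetical hairpin with the mature miRNA at positions 10-31.
print(classify_premirna_variant(40, 10, 32, disrupts_helix=True))
```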

Small nucleolar RNAs (snoRNAs), classified by their RNA structure motifs as C/D box and H/ACA box snoRNAs, and small Cajal body-specific RNAs (scaRNAs) were examined to identify variants in their functional domains. The majority of the 42 variants analyzed in these RNAs are located more than 10 bp away from known functional domains or are not expected to significantly disrupt Watson-Crick pairs or RNA targeting interactions. However, three variants are potentially disruptive to functional domains: one in the D box of HBII-52-32 adjacent to a 5-HT2C complementary domain (rs12910266) (Cavaille et al., 2000), one within SNORD1B, which is predicted to be involved in 2’-O-ribose methylation of 28S rRNA (rs16969028), and one within SNORA44 in the stem of a pseudoknot predicted to guide the pseudouridylation of 18S rRNA (rs16837624) (Kiss et al., 2004). Of these three potentially disruptive SNPs, two have been detected in multiple human populations (rs16969028, rs16837624).

6.3 Summary

The complex and varied interactions an mRNA engages in, from genesis through processing, transport, translation, and recycling, provide many regulatory points. Throughout its life cycle, any RNA has the potential to form physicochemical structural interactions. Whether or not an RNA forms and maintains particular structures is influenced by its sequence, as well as by constraints including ion concentrations, temperature, timing of processing, and interactions with proteins, other nucleic acids, and further cellular substrates. An RNA also fluctuates to varying degrees among an ensemble of many possible, and often similar, structures. Critical biological roles for RNA structure have been established in tRNAs, rRNAs, miRNAs, viral RNA genomes, and eukaryotic mRNAs (IREs, SECIS, editing sites). There is some debate over the importance of mRNA structure in eukaryotes. This question was previously addressed through analysis of MFE secondary structures for mRNA sequences from limited numbers of genes across species, including humans, and by comparison with shuffled sequences (Seffens & Digby 1999; Workman & Krogh 1999; Katz & Burge 2003; Clote et al., 2005; Meyer & Miklos 2005).

However, based on experiments by Kozak and others showing the importance of structure in eukaryotic 5’ gene regulation (for a review see Kozak 2005; see also Babendure et al., 2006), work on human functional mRNA IREs and SECIS elements (Kryukov et al., 2003), and a number of clinical associations with mRNA structure variants (Table 6.1), firm experimental evidence now supports the view that human mRNA structures harbor biological functions, and that polymorphic alteration of such structures can exert biological and even clinical effects.

This study applied genome-wide computation of human mRNA secondary structures in conjunction with all available human coding SNPs in order to analyze structures in a large group of human mRNAs and to address the effects of human genetic variation on secondary structure. We did not compare structures to those from shuffled sequences, because our purpose is to describe human alleles and structures that may truly exist in vivo. This is the largest study thus far employing RNA structure prediction in combination with validated human population genetic information. It was accomplished by programmatic integration of the UCSC and NCBI genome databases, along with Perl programs that automate RNA secondary structure prediction (Vienna version 1.4 (Hofacker 2003)) and analysis. Structures are predicted for 1,059,195 unique alleles and are available by request. Importantly, this larger set of sequences is inherently biased toward over-representation of highly polymorphic regions, since these generate more possible haplotypes of high sequence similarity. In reality, given a particular set of human SNPs, not all possible haplotypes occur in populations. This initial broad approach was adopted to provide a comprehensive set of structures as an analytical starting point.

To conduct analyses on a less ambiguous dataset than the full dataset, annotation was employed to obtain a set of 34,577 validated, biallelic SNPs without additional sequence complexity (see Methods). Analysis of structural differences between major and minor alleles in this dataset reveals that the majority (65.9%) of human coding SNPs alter the nearby MFE secondary structure, and a wider majority (93.6%) alter the profile of the ensemble of nearby secondary structures, including suboptimal structures. Thus, most human SNPs are predicted to alter RNA structure and should, in principle, be detectable through that alteration. Interestingly, single-strand conformation polymorphism (SSCP) SNP detection and genotyping approaches have high success rates (~90-95%), but some SNPs prove difficult to detect (Lenz et al., 1995; Ren 2000). Because the ensemble-level estimate (93.6%) agrees more closely with these RNA-SSCP detection rates than the MFE-level estimate (65.9%) does, analysis of the MFE structure alone may in some cases be an incomplete predictor of structural change compared with analysis of the ensemble of all predicted conformations.
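One simple way to count how often a SNP changes the MFE structure is to compare the base-pair sets of the two allele structures. The sketch below uses the standard base-pair distance (the number of pairs present in only one structure) on hypothetical dot-bracket strings; the function names are illustrative, and it covers only the MFE-level comparison, since the ensemble-level comparison would require base-pair probabilities from a partition function calculation.

```python
def base_pairs(structure: str) -> set:
    """Set of (i, j) base pairs in a dot-bracket string, 0-based."""
    stack, pairs = [], set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket string")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced dot-bracket string")
    return pairs


def bp_distance(s1: str, s2: str) -> int:
    """Number of base pairs present in exactly one of the two structures."""
    if len(s1) != len(s2):
        raise ValueError("structures must have equal length")
    return len(base_pairs(s1) ^ base_pairs(s2))


def fraction_mfe_changed(allele_structures) -> float:
    """allele_structures: list of (major_mfe, minor_mfe) dot-bracket pairs."""
    changed = sum(1 for major, minor in allele_structures
                  if bp_distance(major, minor) > 0)
    return changed / len(allele_structures)


pairs = [("(((...)))", "(((...)))"), ("(((...)))", "((.....))")]
print(fraction_mfe_changed(pairs))  # 0.5
```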

Many of the articles cited in Table 6.1 relied only on MFE structures in their analysis. We applied a Z-score analysis of ∆∆G (MFE and ensemble) to those previously reported functional variants that appeared in our dataset. These functional variants deviate on average by about 1 SD from the dataset mean, with minor alleles biased toward thermodynamically less favorable MFE structures and ensembles, and in most cases ∆∆G was similar for MFE and ensemble structures (Pearson correlation; r=0.948) (Table 6.1). Correlations between MFE and ensemble ∆G values were also high in the larger datasets (Pearson correlations in both the 22,785 and 34,557 SNP datasets: ∆Gwt (r=0.995), ∆Gsnp (r=0.995) and ∆∆G (r=0.94)). These results indicate that variants affecting functional structures generally change ∆∆G at the MFE and ensemble levels to at least a moderate extent, and that these measures are reasonable predictors. However, the thermodynamic favorability (∆G) of MFE structures or ensembles, the free energy changes due to alleles (∆∆G), and Boltzmann probabilities are only partial indicators of potential functional changes, especially given the ranges of values observed in Table 6.1.
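The Z-score and correlation calculations described here are standard; a minimal sketch follows. It assumes each variant's ∆∆G is standardized against the full dataset using the sample standard deviation (the text does not say which estimator was used), and the example ∆∆G values are hypothetical.

```python
import math
import statistics


def z_score(value: float, background: list) -> float:
    """Standardize a variant's ddG against a background distribution of ddG
    values (sample standard deviation; the text does not specify which)."""
    return (value - statistics.fmean(background)) / statistics.stdev(background)


def pearson_r(xs: list, ys: list) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ddG values (kcal/mol) for the same variants at two levels.
ddg_mfe = [1.2, -0.4, 3.1, 0.0, 2.2]
ddg_ensemble = [1.0, -0.3, 2.8, 0.1, 1.9]
print(round(pearson_r(ddg_mfe, ddg_ensemble), 3))
print(round(z_score(3.1, ddg_mfe), 2))
```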

We suggest that a combination of multiple analytical approaches be applied to identify variant structures, as demonstrated below. In specific SNP cases, structures and ensembles should be thoroughly analyzed (e.g., Zhang et al., 2005) and experimentally verified by structural mapping or the other methods demonstrated in the papers in Table 6.1.

Notably, studies employing secondary structure prediction algorithms can miss biologically important factors like long-range secondary interactions or complex tertiary motifs like pseudoknots. Nevertheless, we find that secondary structure predictions over a small sequence range capture the main portion of change due to sequence variation (see Fig. 6.5). We compared the behavior of major and minor allele structures at each position and found that structural changes resulting from allele differences have a strong tendency to alter interactions within a highly localized sequence space (see Fig. 6.5). Sequence windows of 100 bp capture much of the predicted change due to variants, while 50 bp windows appear too narrow to allow sufficient structural interaction space. One implication of this observation is that genetic variants must lie in close sequence proximity to functional RNA structures to exert a significant change, unless that change is mediated by tertiary interactions. Comparing total sequence windows of 100 bp and 150 bp in Figure 6.5 indicates that our database misses a relatively small portion of secondary structural change due to alleles affecting sequence interactions beyond a 50 bp radius.
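The per-position comparison behind this localization result can be sketched as follows, assuming major- and minor-allele structures of equal length in dot-bracket notation: find the positions whose pairing partner differs, then measure how many of them lie within a given radius of the SNP. The function names and toy structures are hypothetical, and the sketch does not reproduce the separate folding of 100 bp and 150 bp windows.

```python
def pair_table(structure: str) -> list:
    """Partner index for each position of a dot-bracket string (-1 if unpaired)."""
    stack, table = [], [-1] * len(structure)
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket string")
            j = stack.pop()
            table[i], table[j] = j, i
    if stack:
        raise ValueError("unbalanced dot-bracket string")
    return table


def changed_positions(major: str, minor: str) -> list:
    """Positions whose pairing partner differs between the allele structures."""
    if len(major) != len(minor):
        raise ValueError("structures must have equal length")
    t1, t2 = pair_table(major), pair_table(minor)
    return [i for i, (a, b) in enumerate(zip(t1, t2)) if a != b]


def fraction_within(major: str, minor: str, snp_index: int, radius: int) -> float:
    """Share of changed positions lying within `radius` nt of the SNP
    (returns 1.0 when nothing changed)."""
    changed = changed_positions(major, minor)
    if not changed:
        return 1.0
    near = sum(1 for i in changed if abs(i - snp_index) <= radius)
    return near / len(changed)


major = "(((...)))"
minor = "((.....))"  # hypothetical minor-allele fold, SNP at index 2
print(changed_positions(major, minor))      # [2, 6]
print(fraction_within(major, minor, 2, 3))  # 0.5
```

Summing these changed positions by distance over many SNPs gives a profile of how far structural change extends from the variant, which is the kind of comparison used to judge whether a 50 bp radius is sufficient.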

Synonymous SNP contexts were found to form energetically more favorable structures on average than nonsynonymous and UTR contexts. This is in large part attributable to the nucleotide composition of the contexts (see Fig. 6.4d). The substantially higher guanine and uracil content of synonymous polymorphism sequence contexts results in more favorable interactions, because each of these nucleotides can form two kinds of helical pairs (G with C or U; U with A or G). This is also reflected in a higher average percentage of helically paired bases in synonymous context structures (see Table 6.2). Structure in exon regions and within pre-mRNAs may influence alternative splicing (Buratti & Baralle 2004), and the significant difference we observe between the MFE of coding SNP contexts and that of UTR contexts of identical length may support some functional role for structures in the coding region.

We examined the distribution of the 12 SNP types among the functional categories and their effects on predicted structures. The four transversion types involving guanine were found to have the largest impact on structures, changing the MFE structure in close to 90% of cases, followed by other transversions (~75% change) and transitions (~60%) (see Fig. 6.4b). However, the fairly balanced distribution in Figure 6.3b indicates that change was predicted to be thermodynamically favorable almost as often as it was predicted to be detrimental. Although we suspect most functional variants will impede the formation of favorable helical structures, cases where variants have closed functional loops have also been reported, so variants at both ends of the ∆∆G distribution are potentially important (see Fig. 6.3b). Moreover, the patterns of SNP type and ∆∆G in Figure 6.4c do not strictly match those in Figure 6.3b (e.g., a C>G transversion often forms a thermodynamically more favorable structure, but in some contexts it is an unfavorable change). Both observations fit the thermodynamic underpinnings of secondary structure prediction, revealing that, in general, variants tend to follow the expected patterns dictated by Watson-Crick interactions, but also that the sequence context of individual variants ultimately has great influence on their predicted impact upon structure. The importance of sequence context was further noted in the analysis of variable-length sequences and of sequences differing due to alternative splicing. In particular, predicting the secondary structures of alternatively spliced mRNAs is, to our knowledge, rarely done in other studies. As expected, we find that alternative splicing has significant potential to yield different mRNA structures (data not shown). One study has shown that alternative splicing of the human proinsulin gene results in a difference in 5’UTR RNA structure and altered translational efficiency (Shalev et al., 2002). This type of observation may be important given the extensive amount of splicing now known in vertebrate genomes (Le Texier et al., 2006).
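The three SNP classes compared above can be derived mechanically from the 12 directional substitution types. A minimal sketch, assuming RNA-alphabet input (a DNA "T" is read as "U"); the function and constant names are illustrative:

```python
NUCLEOTIDES = ("A", "C", "G", "U")
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "U"), ("U", "C")}


def snp_class(ref: str, alt: str) -> str:
    """Group a directional SNP type (ref>alt) into the three classes
    compared in the text. DNA 'T' is read as RNA 'U'."""
    ref, alt = ref.upper().replace("T", "U"), alt.upper().replace("T", "U")
    if ref not in NUCLEOTIDES or alt not in NUCLEOTIDES or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if (ref, alt) in TRANSITIONS:
        return "transition"
    if "G" in (ref, alt):
        return "transversion involving G"   # G>C, C>G, G>U, U>G
    return "other transversion"             # A>C, C>A, A>U, U>A


# All 12 directional SNP types.
ALL_TYPES = [(r, a) for r in NUCLEOTIDES for a in NUCLEOTIDES if r != a]

counts = {}
for r, a in ALL_TYPES:
    label = snp_class(r, a)
    counts[label] = counts.get(label, 0) + 1
print(len(ALL_TYPES), counts)  # 12 types, 4 in each class
```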

We analyzed the minor allele frequencies (MAF) of SNPs and their predicted effect on MFE (∆∆G) and structure pairing behavior. We were initially surprised to find that, at MAF cutoffs varying from 1% to 40%, the group of rarer alleles always had larger average changes in structure and less favorable minor alleles (∆∆G). These differences are most significant near a MAF cutoff of 10%. Further analysis and the results in Figure 6.3c indicate that the effect is in part due to a difference in the distribution of SNP types with respect to MAF. In particular, ranking the 12 SNP types by their average MAF in this set revealed that SNPs creating a guanine allele, which is more often structurally favorable, had higher MAF (A>G – 1st, U>G – 3rd) than those that abolished a guanine allele (G>A – 9th, G>U – 12th). Although this analysis indicates that variants with lower MAF are slightly more likely to change RNA structure, it seems unlikely that this is due to widespread population selection to specifically preserve RNA structures. Indeed, although G>A SNPs generally have slightly lower MAF, they are considerably more abundant in human coding regions than A>G SNPs (see Fig. 6.3a).
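The MAF cutoff comparison can be sketched as a split of SNPs into rarer and more common groups, followed by a comparison of mean ∆∆G. This assumes ∆∆G is defined as ∆G(minor) minus ∆G(major), so positive values mean a less favorable minor allele; the function names and data are hypothetical, and the significance test used in the analysis is not reproduced.

```python
import statistics


def compare_at_cutoff(snps, cutoff):
    """snps: (maf, ddg) pairs, with ddg = dG(minor) - dG(major) in kcal/mol.

    Returns the mean ddg of the rarer (maf < cutoff) and more common groups,
    or None if either group is empty."""
    rare = [ddg for maf, ddg in snps if maf < cutoff]
    common = [ddg for maf, ddg in snps if maf >= cutoff]
    if not rare or not common:
        return None
    return statistics.fmean(rare), statistics.fmean(common)


def scan_cutoffs(snps, cutoffs):
    """Repeat the comparison over a range of MAF cutoffs."""
    return {c: compare_at_cutoff(snps, c) for c in cutoffs}


# Hypothetical (MAF, ddG) values; the text scanned cutoffs from 1% to 40%.
data = [(0.02, 1.5), (0.05, 0.5), (0.30, -0.2), (0.40, 0.2)]
for cutoff, result in scan_cutoffs(data, [0.01, 0.10, 0.35]).items():
    print(cutoff, result)
```

Stratifying the same comparison by SNP type (for example with the three-class grouping of substitution types) is one way to check whether a MAF effect is driven by the uneven distribution of SNP types rather than by structure itself.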

Given that our genome-wide analysis indicates most variants have the potential to alter RNA structure, we sought to identify alleles that alter a putative functional structure. In these analyses we applied filters to major and minor allele structures without preference, and considered large thermodynamic changes in either direction as potentially significant. First, we analyzed those mRNA structures with extreme thermodynamic favorability (∆G values between -54.0 and -69.0 kcal/mol). Among these structures we observe examples of extensive helical pairing and an over-representation of sequences near translation initiation sites. This suggests that some of these structures may have functional roles in translation start codon selection, initiation or ribosomal processivity. CpG islands are highly correlated with thermodynamically favorable structures, indicating that transcribed regions that exhibit DNA methylation may be more likely than other regions to form stable RNA structures.

Another analytical approach was to select structures with high Boltzmann probabilities (>0.145), on the hypothesis that the constrained thermodynamic ensemble may make these structures more likely to form in vivo (Miklos et al., 2005). This revealed a set of 46 SNP candidates that putatively alter likely RNA structures, including a number in candidate genes with strong links to disease etiology. We also analyzed structures originally discarded from the single-SNP dataset because of length constraints (n=1,203) or alternative sequence contexts (n=634). This set was significantly enriched for SNPs near the 5’ end of transcripts, as well as near sites of alternative splicing and translation initiation. Finally, we analyzed a merge of dbSNP (n=936) with a database of evolutionarily conserved RNA structures identified through comparative genomics (Pedersen et al., 2006). A large portion of conserved, favorable structures are within the translated regions of mRNAs (synonymous and nonsynonymous), consistent with our findings (see Fig. 6.3a, Table 6.2). We observe a remarkably high co-localization of these exon structures with methionine codons, which may indicate that some of these structures play a role in promoting or inhibiting translation from downstream in-frame start codons. A high percentage of EvoFold/SNP intersections are near the distal ends of 3’UTRs, indicating a potential conserved regulatory mechanism for mRNA structure at these sites, perhaps affecting RNA degradation rates, polyadenylation or miRNA targeting. Through alternative analyses we also identify thermodynamically favorable structures located in 5’ regions and in proximity to translation start sites. Taken together, these results suggest that human mRNA structures have multiple functional modalities and that there is significant potential for human variants to alter favorable mRNA structures.

We carefully scanned non-coding and mRNA structures with known biological function for human genetic variation. We find support for previously investigated variants and additionally discover validated variants in functional RNA structures that have not yet been investigated. At the same time, known variants were notably absent from, or distantly located in, the bulk of functional RNAs examined, indicating that these structures may be selectively preserved due to their biological roles. Further study of genotypes in functional RNA structures is warranted, as some variants are suggested to be causative for diseases (e.g., Allerson et al., 1999; Allamand et al., 2006). We observed that a significant number of disease-associated SNPs were absent from databases and that the proportion of non-validated, monoallelic and computationally predicted SNPs is considerable. Thus, more effort is needed to annotate human disease-associated alleles that are reported in the literature but not submitted to databases. Furthermore, genome-wide projects relying on SNP annotations must guard against both false negatives and false positives, as we have done via SNPmasking and additional filtering steps before analysis.

Given our results and recent work (see Table 6.1; Washietl et al., 2005; Pedersen et al., 2006), we conclude that there is sufficient evidence to suggest that the human genome contains many functional RNA structures, and that a portion of functional human genetic variation exerts its effects through the alteration of RNA conformations. We have shown that the tendency to form structure and the impact of individual alleles are highly dependent on sequence context. The structural database we make available provides an opportunity for researchers to search for specific structural motifs that are predicted to be polymorphic in human populations (e.g., Lopez de Silanes et al., 2004; Lopez de Silanes et al., 2005).

Clinical association studies have traditionally focused on amino acid-changing variants and given less consideration to regulatory regions and variants that affect RNA maturation. However, there is a mounting number of examples where variants of all types (synonymous, nonsynonymous, UTR, intronic) significantly alter function at a pre-translational level (Johnson et al., 2005). The impact of genetic variation on RNA structure may be much greater than our results suggest, because we have analyzed only a portion of all alleles. Our analysis of two-SNP haplotypes indicates that multi-allele interactions may mediate the preservation or disruption of functional structures.

Although SNPs are the most common variant type, structures may also be affected by other types of variation (e.g., indels, tandem repeats, translocations). Disease-associated nucleotide repeats have already been observed to influence biologically important human RNA structures (see Table 6.1). Furthermore, recent analyses indicate that there may be many non-coding and intergenic RNA structures yet to be characterized in the human genome (Washietl et al., 2005; Pedersen et al., 2006), and these may also harbor functional genetic variation. Further experimental work is needed to reveal population associations of known and putative functional RNA conformational polymorphisms, and to understand the biologically relevant structures and the functional mechanisms that are affected.

BIBLIOGRAPHY

Achenbach, J. C., Chiuman, W., Cruz, R. P. & Li, Y. (2004). DNAzymes: From creation in vitro to application in vivo. Curr Pharm Biotechnol 5(4), 321-336.

Adjei, A. A., Thomae, B. A., Prondzinski, J. L., Eckloff, B. W., Wieben, E. D. & Weinshilboum, R. M. (2003). Human estrogen sulfotransferase (SULT1E1) pharmacogenomics: Gene resequencing and functional genomics. Br J Pharmacol 139(8), 1373-1382.

Agostini, C. & Gurrieri, C. (2006). Chemokine/cytokine cocktail in idiopathic pulmonary fibrosis. Proc Am Thorac Soc 3, 357-363.

Aguilar Martinez, P., Biron, C., Blanc, F., Masmejean, C., Jeanjean, P., Michel, H. & Schved, J. F. (1997). Compound heterozygotes for hemochromatosis gene mutations: may they help to understand the pathophysiology of the disease? Blood Cells Mol Dis 23, 269-276.

Aithal, G. P., Day, C. P., Kesteven, P. J. & Daly, A. K. (1999). Association of polymorphisms in the cytochrome P450 with warfarin dose requirement and risk of bleeding complications. Lancet 353(9154), 717-719.

Akiyama, T. E. & Gonzalez, F. J. (2003). Regulation of P450 genes by liver-enriched transcription factors and nuclear receptors. Biochim Biophys Acta 1619(3), 223-234.

Aklillu, E., Carrillo, J. A., Makonnen, E., Hellman, K., Pitarque, M., Bertilsson, L. & Ingelman-Sundberg, M. (2003). Genetic polymorphism of in Ethiopians affecting induction and expression: Characterization of novel haplotypes with single-nucleotide polymorphisms in intron 1. Mol Pharmacol 64(3), 659-669.

Albanesi, C., Scarponi, C., Giustizieri, M. L. & Girolomoni, G. (2005). Keratinocytes in inflammatory skin diseases. Curr Drug Targets Inflamm Allergy 4, 329-334.

Allam, J. P. & Novak, N. (2006). The pathophysiology of atopic eczema. Clin Exp Dermatol 31, 89-93.

Allamand, V., Richard, P., Lescure, A., Ledeuil, C., Desjardin, D., Petit, N., Gartioux, C., Ferreiro, A., Krol, A., Pellegrini, N. et al. (2006). A single homozygous point mutation in a 3' untranslated region motif of selenoprotein N mRNA causes SEPN1-related myopathy. EMBO Rep 7, 450-454.

Allerson, C. R., Cazzola, M. & Rouault, T. A. (1999). Clinical severity and thermodynamic effects of iron-responsive element mutations in hereditary hyperferritinemia-cataract syndrome. J Biol Chem 274, 26439-26447.

Allorge, D., Chevalier, D., Lo-Guidice, J. M., Cauffiez, C., Suard, F., Baumann, P., Eap, C. B. & Broly, F. (2003). Identification of a novel splice-site mutation in the CYP1A2 gene. Br J Clin Pharmacol 56(3), 341-344.

Alves, S., Amorim, A., Ferreira, F. & Prata, M. J. (2001). Influence of the variable number of tandem repeats located in the promoter region of the thiopurine methyltransferase gene on enzymatic activity. Clin Pharmacol Ther 70(2), 165-174.

Alves-Filho, J. C., Tavares-Murta, B. M., Barja-Fidalgo, C., Benjamim, C. F., Basile-Filho, A., Arraes, S. M. & Cunha, F. Q. (2006). Neutrophil function in severe sepsis. Endocr Metab Immune Disord Drug Targets 6, 151-158.

Amirimani, B., Ning, B., Deitz, A. C., Weber, B. L., Kadlubar, F. F. & Rebbeck, T. R. (2003). Increased transcriptional activity of the *1b promoter variant. Environ Mol Mutagen 42(4), 299-305.

Ancel, L. W. & Fontana, W. (2000). Plasticity, evolvability, and modularity in RNA. J Exp Zool 288, 242-283.

Anderle, P., Nielsen, C. U., Pinsonneault, J., Krog, P. L., Brodin, B. & Sadée, W. (2004). Genetic variants of the human dipeptide transporter PEPT1. J Pharmacol Exp Ther 316, 636-646.

Andrisin, T. E., Humma, L. M. & Johnson, J. A. (2002). Collection of genomic DNA by the noninvasive mouthwash method for use in pharmacogenetic studies. Pharmacotherapy 22, 954-960.

Antoniou, K. M., Alexandrakis, M. G., Siafakas, N. M. & Bouros, D. (2005). Cytokine network in the pathogenesis of idiopathic pulmonary fibrosis. Sarcoidosis Vasc Diffuse Lung Dis 22, 91-104.

Arnett, D. K., Davis, B. R., Ford, C. E., Boerwinkle, E., Leiendecker-Foster, C., Miller, M. B., Black, H. & Eckfeldt, J. H. (2005). Pharmacogenetic association of the angiotensin-converting enzyme insertion/deletion polymorphism on blood pressure and cardiovascular risk in relation to antihypertensive treatment: the Genetics of Hypertension-Associated Treatment (GenHAT) study. Circulation 111(25), 3374-3383.

Arranz, M. J., Munro, J., Sham, P., Kirov, G., Murray, R. M., Collier, D. A. & Kerwin, R. W. (1998). Meta-analysis of studies on genetic variation in 5-HT2A receptors and clozapine response. Schizophr Res 32(2), 93-99.

Arteaga, C. L. & Baselga, J. (2004). Tyrosine kinase inhibitors: Why does the current process of clinical development not apply to them? Cancer Cell 5(6), 525-531.

Athanasiadis, A., Rich, A. & Maas, S. (2004). Widespread A-to-I RNA editing of Alu-containing mRNAs in the human transcriptome. PLoS Biol 2, e391.

Attaie, A., Kim, E., Wilcox, E. R. & Lalwani, A. K. (1997). A splice-site mutation affecting the paired box of PAX3 in a three generation family with Waardenburg syndrome type I (WS1). Mol Cell Probes 11, 233-236.

Babendure, J. R., Babendure, J. L., Ding, J. H. & Tsien, R. Y. (2006). Control of mammalian translation by mRNA structure near caps. RNA.

Bakheet, T., Williams, B. R. & Khabar, K. S. (2003). ARED 2.0: An update of AU-rich element mRNA database. Nucl Acids Res 31(1), 421-423.

Bamberger, C. M., Bamberger, A. M., de Castro, M. & Chrousos, G. P. (1995). Glucocorticoid receptor beta, a potential endogenous inhibitor of glucocorticoid action in humans. J Clin Invest 95(6), 2435-2441.

Bamshed, M. (2005). Genetic influences on health. Does race matter? J Am Med Assoc 294, 937-946. [Erratum, J Am Med Assoc 2005, 294, 1620].

Bargetzi, M. J., Aoyama, T., Gonzalez, F. J. & Meyer, U. A. (1989). metabolism in human liver microsomes by cytochrome P450IIIA4. Clin Pharmacol Ther 46(5), 521-527.

Barrette, I., Poisson, G., Gendron, P. & Major, F. (2001). Pseudoknots in prion protein mRNAs confirmed by comparative sequence analysis and pattern searching. Nucl Acids Res 29, 753-758.

Battersby, S., Ogilvie, A. D., Blackwood, D. H., Shen, S., Muqit, M. M., Muir, W. J., Teague, P., Goodwin, G. M. & Harmar, A. J. (1999). Presence of multiple functional polyadenylation signals and a single nucleotide polymorphism in the 3' untranslated region of the human serotonin transporter gene. J Neurochem 72(4), 1384-1388.

Beaumont, C., Leneuve, P., Devaux, I., Scoazec, J. Y., Berthier, M., Loiseau, M. N., Grandchamp, B. & Bonneau, D. (1995). Mutation in the iron responsive element of the L ferritin mRNA in a family with dominant hyperferritinaemia and cataract. Nat Genet 11, 444-446.

Beitelshees, A. L., Gong, Y., Cooper-Dehoff, R. M., Moss, J. I., Pepine, C. J. & Johnson, J. A. (2005). Variable blood pressure response to verapamil by KCNMB1 genotype. Clin Pharmacol Ther 77, P97.

Bell, D. A., Badawi, A. F., Lang, N. P., Ilett, K. F., Kadlubar, F. F. & Hirvonen, A. (1995). Polymorphism in the N-acetyltransferase 1 (NAT1) polyadenylation signal: Association of NAT1*10 allele with higher N-acetylation activity in bladder and colon tissue. Cancer Res 55(22), 5226-5229.

Bentwich, I., Avniel, A., Karov, Y., Aharonov, R., Gilad, S., Barad, O., Barzilai, A., Einat, P., Einav, U., Meiri, E. et al. (2005). Identification of hundreds of conserved and nonconserved human microRNAs. Nat Genet 37, 766-770.

Berry, M. J., Banu, L., Chen, Y. Y., Mandel, S. J., Kieffer, J. D., Harney, J. W. & Larsen, P. R. (1991). Recognition of UGA as a selenocysteine codon in type I deiodinase requires sequences in the 3' untranslated region. Nature 353, 273-276.

Bertilsson, G., Heidrich, J., Svensson, K., Asman, M., Jendeberg, L., Sydow-Backman, M., Ohlsson, R., Postlind, H., Blomquist, P. & Berkenstam, A. (1998). Identification of a human nuclear receptor defines a new signaling pathway for induction. Proc Natl Acad Sci U S A 95(21), 12208-12213.

Blaisdell, J., Jorge-Nebert, L. F., Coulter, S., Ferguson, S. S., Lee, S. J., Chanas, B., Xi, T., Mohrenweiser, H., Ghanayem, B. & Goldstein, J. A. (2004). Discovery of new potentially defective alleles of human CYP2C9. Pharmacogenetics 14(8), 527-537.

Blanchard, R. L., Freimuth, R. R., Buck, J., Weinshilboum, R. M. & Coughtrie, M. W. (2004). A proposed nomenclature system for the cytosolic sulfotransferase (SULT) superfamily. Pharmacogenetics 14(3), 199-211.

Bodzioch, M., Lapicka, K., Aslanidis, C., Kacinski, M. & Schmitz, G. (2001). Two novel mutant alleles of the gene encoding neurotrophic tyrosine kinase receptor type 1 (NTRK1) in a patient with congenital insensitivity to pain with anhidrosis: a splice junction mutation in intron 5 and cluster of four mutations in exon 15. Hum Mutat 17, 72.

Bonham Carter, S. M., Rein, G., Glover, V., Sandler, M. & Caldwell, J. (1983). Human platelet phenolsulphotransferase M and P: Substrate specificities and correlation with in vivo sulphoconjugation of paracetamol and salicylamide. Br J Clin Pharmacol 15(3), 323-330.

Bonnet, E., Wuyts, J., Rouze, P. & Van de Peer, Y. (2004). Evidence that microRNA precursors, unlike other non-coding RNAs, have lower folding free energies than random sequences. Bioinformatics 20, 2911-2917.

Bosma, P. J., Chowdhury, J. R., Bakker, C., Gantla, S., de Boer, A., Oostra, B. A., Lindhout, D., Tytgat, G. N., Jansen, P. L., Oude Elferink, R. P. et al. (1995). The genetic basis of the reduced expression of bilirubin UDP-glucuronosyltransferase 1 in Gilbert's syndrome. N Engl J Med 333(18), 1171-1175.

Botma, G. J., Verhoeven, A. J. & Jansen, H. (2001). Hepatic lipase promoter activity is reduced by the C-480T and G-216A substitutions present in the common LIPC gene variant, and is increased by upstream stimulatory factor. Atherosclerosis 154(3), 625-632.

Bray, N. J., Buckland, P. R., Hall, H., Owen, M. J. & O'Donovan, M. C. (2004). The serotonin-2A receptor gene locus does not contain common polymorphism affecting mRNA levels in adult brain. Mol Psychiatry 9(1), 109-114.

Bray, N. J., Buckland, P. R., Owen, M. J. & O'Donovan, M. C. (2003). Cis-acting variation in the expression of a high proportion of genes in human brain. Hum Genet 113(2), 149-153.

Bray, N. J., Buckland, P. R., Williams, N. M., Williams, H. J., Norton, N., Owen, M. J. & O'Donovan, M. C. (2003). A haplotype implicated in schizophrenia susceptibility is associated with reduced COMT expression in human brain. Am J Hum Genet 73, 152-161.

Broly, F., Marez, D., Lo Guidice, J. M., Sabbagh, N., Legrand, M., Boone, P. & Meyer, U. A. (1995). A nonsense mutation in the cytochrome P450 gene identified in a Caucasian with an enzyme deficiency. Hum Genet 96(5), 601-603.

Bruck, W. (2005). The pathology of multiple sclerosis is the result of focal inflammatory demyelination with axonal damage. J Neurol 252 Suppl 5, v3-9.

Bruhn, C., Brockmoller, J., Cascorbi, I., Roots, I. & Borchert, H. H. (1999). Correlation between genotype and phenotype of the human arylamine N-acetyltransferase type 1 (NAT1). Biochem Pharmacol 58(11), 1759-1764.

Bruno, I. G., Jin, W. & Cote, G. J. (2004). Correction of aberrant FGFR1 alternative RNA splicing through targeting of intronic regulatory elements. Hum Mol Genet

Buckland, P. R. (2004). Allele-specific gene expression in humans. Hum Mol Genet 13, R255-R260.

Buckland, P. R., Coleman, S. L., Hoogendoorn, B., Guy, C., Smith, S. K. & O’Donovan, M. C. (2005). A high proportion of chromosome 21 promoter polymorphisms influence transcriptional activity. Hum Mol Genet 11, 233-239.

Buratti, E. & Baralle, F. E. (2004). Influence of RNA secondary structure on the pre-mRNA splicing process. Mol Cell Biol 24, 10505-10514.


Butcher, N. J., Boukouvala, S., Sim, E. & Minchin, R. F. (2002). Pharmacogenetics of the arylamine N-acetyltransferases. Pharmacogenomics J 2(1), 30-42.

Bylund, J., Bylund, M. & Oliw, E. H. (2001). cDNA cloning and expression of , a novel human cytochrome P450. Biochem Biophys Res Commun 280(3), 892-897.

Calado, R. T., Falcao, R. P., Garcia, A. B., Gabellini, S. M., Zago, M. A. & Franco, R. F. (2002). Influence of functional MDR1 gene polymorphisms on P-glycoprotein activity in CD34+ hematopoietic stem cells. Haematologica 87(6), 564-568.

Camaschella, C., Zecchina, G., Lockitch, G., Roetto, A., Campanella, A., Arosio, P. & Levi, S. (2000). A new mutation (G51C) in the iron-responsive element (IRE) of L-ferritin associated with hyperferritinaemia-cataract syndrome decreases the binding affinity of the mutated IRE for iron-regulatory proteins. Br J Haematol 108, 480-482.

Cambien, F., Alhenc-Gelas, F., Herbeth, B., Andre, J. L., Rakotovao, R., Gonzales, M. F., Allegrini, J. & Bloch, C. (1988). Familial resemblance of plasma angiotensin-converting enzyme level: the Nancy study. Am J Hum Genet 43(5), 774-780.

Campagnoli, M. F., Pimazzoni, R., Bosio, S., Zecchina, G., DeGobbi, M., Bosso, P., Oldani, B. & Ramenghi, U. (2002). Onset of cataract in early infancy associated with a 32G-->C transition in the iron responsive element of L-ferritin. Eur J Pediatr 161, 499-502.

Carpen, J. D., Archer, S. N., Skene, D. J., Smits, M. & von Schantz, M. (2005). A single-nucleotide polymorphism in the 5'-untranslated region of the hPER2 gene is associated with diurnal preference. J Sleep Res 14, 293-297.

Carrel, L. & Willard, H. F. (2005). X-inactivation profile reveals extensive variability in X-linked gene expression in females. Nature 434, 400-404.

Cartegni, L., Wang, J., Zhu, Z., Zhang, M. Q. & Krainer, A. R. (2003). ESEfinder: A web resource to identify exonic splicing enhancers. Nucl Acids Res 31(13), 3568-3571.

Cauffiez, C., Klinzig, F., Rat, E., Tournel, G., Allorge, D., Chevalier, D., Lovecchio, T., Pottier, N., Colombel, J. F., Lhermitte, M., D'Halluin, J. C., Broly, F. & Lo-Guidice, J. M. (2004). Functional characterization of genetic polymorphisms identified in the human cytochrome P450 4F12 (CYP4F12) promoter region. Biochem Pharmacol 67(12), 2231-2238.

Cavaille, J., Buiting, K., Kiefmann, M., Lalande, M., Brannan, C. I., Horsthemke, B., Bachellerie, J. P., Brosius, J. & Huttenhofer, A. (2000). Identification of brain-specific and imprinted small nucleolar RNA genes exhibiting an unusual genomic organization. Proc Natl Acad Sci U S A 97, 14311-14316.


Cazzola, M., Bergamaschi, G., Tonon, L., Arbustini, E., Grasso, M., Vercesi, E., Barosi, G., Bianchi, P. E., Cairo, G. & Arosio, P. (1997). Hereditary hyperferritinemia-cataract syndrome: relationship between phenotypes and specific mutations in the iron-responsive element of ferritin light-chain mRNA. Blood 90, 814-821.

Cermakova, Z., Petrkova, J., Arakelyan, A., Drabek, J., Mrazek, F., Lukl, J. & Petrek, M. (2005). The MCP-1 -2518 (A to G) single nucleotide polymorphism is not associated with myocardial infarction in the Czech population. Int J Immunogenet 32, 315-318.

Chasman, D. I., Posada, D., Subrahmanyan, L., Cook, N. R., Stanton, V. P. & Ridker, P. M. (2004). Pharmacogenetic study of statin therapy and cholesterol reduction. J Am Med Assoc 291, 2821-2827.

Chen, Y., Carlini, D. B., Baines, J. F., Parsch, J., Braverman, J. M., Tanda, S. & Stephan, W. (1999). RNA secondary structure and compensatory evolution. Genes Genet Syst 74(6), 271-286.

Cheung, V. G., Conlin, L. K., Weber, T. M., Arcaro, M., Jen, K. Y., Morley, M. & Spielman, R. S. (2003). Natural variation in human gene expression assessed in lymphoblastoid cells. Nat Genet 33, 422-425.

Chevalier, D., Allorge, D., Lo-Guidice, J. M., Cauffiez, C., Lepetit, C., Migot-Nabias, F., Kenani, A., Lhermitte, M. & Broly, F. (2002). Sequence analysis, frequency and ethnic distribution of VNTR polymorphism in the 5'-untranslated region of the human gene (CYP8A1). Prostaglandins Other Lipid Mediat 70(1-2), 31-37.

Chida, M., Yokoi, T., Fukui, T., Kinoshita, M., Yokota, J. & Kamataki, T. (1999). Detection of three genetic polymorphisms in the 5'-flanking region and intron 1 of human CYP1A2 in the Japanese population. Jpn J Cancer Res 90(9), 899-902.

Cigler, T., LaForge, K. S., McHugh, P. F., Kapadia, S. U., Leal, S. M. & Kreek, M. J. (2001). Novel and previously reported single-nucleotide polymorphisms in the human 5-HT(1B) receptor gene: No association with cocaine or alcohol abuse or dependence. Am J Med Genet 105(6), 489-497.

Cirulli, E. T. & Goldstein, D. B. (2007). In vitro assays fail to predict in vivo effects of regulatory polymorphisms. Hum Mol Genet, in press.

Clote, P., Ferre, F., Kranakis, E. & Krizanc, D. (2005). Structural RNA has lower folding energy than random RNA of the same dinucleotide frequency. RNA 11, 578-591.

Cmejla, R., Petrak, J. & Cmejlova, J. (2006). A novel iron responsive element in the 3'UTR of human MRCKalpha. Biochem Biophys Res Commun 341, 158-166.

Collie-Duguid, E. S., Etienne, M. C., Milano, G. & McLeod, H. L. (2000). Known variant DPYD alleles do not explain DPD deficiency in cancer patients. Pharmacogenetics 10(3), 217-223.

Collins, F. S., Drumm, M. L., Cole, J. L., Lockwood, W. K., vande Woude, G. G. & Iannuzzi, M. C. (1987). Construction of a general human chromosome jumping library, with application to cystic fibrosis. Science 235(4792), 1046-1049.

Conde, L., Vaquerizas, J. M., Santoyo, J., Al-Shahrour, F., Ruiz-Llorente, S., Robledo, M. & Dopazo, J. (2004). PupaSNP Finder: A web tool for finding SNPs with putative effect at transcriptional level. Nucl Acids Res 32(Web Server issue), W242-248.

Conne, B., Stutz, A. & Vassalli, J. D. (2000). The 3' untranslated region of messenger RNA: A molecular 'hotspot' for pathology? Nat Med 6(6), 637-641.

Cooper, T. A. & Mattox, W. (1997). The regulation of splice-site selection, and its role in human disease. Am J Hum Genet 61(2), 259-266.

Cornish-Bowden, A. (1985). Nomenclature for incompletely specified bases in nucleic acid sequences: recommendations 1984. Nucl Acids Res 13, 3021-3030.

Cowles, C. R., Hirschhorn, J. N., Altshuler, D. & Lander, E. S. (2002). Detection of regulatory variation in mouse genes. Nat Genet 32, 432-437.

Cronstein, B. N. & Terkeltaub, R. (2006). The inflammatory process of gout and its treatment. Arthritis Res Ther 8 Suppl 1, S3.

Danielson, P. B. (2002). The cytochrome P450 superfamily: Biochemistry, evolution and drug metabolism in humans. Curr Drug Metab 3(6), 561-597.

Day, D. A. & Tuite, M. F. (1998). Post-transcriptional gene regulatory mechanisms in eukaryotes: An overview. J Endocrinol 157(3), 361-371.

de Leon, J. H., Vatsis, K. P. & Weber, W. W. (2000). Characterization of naturally occurring and recombinant human N-acetyltransferase variants encoded by NAT1. Mol Pharmacol 58(2), 288-299.

de Maat, M. P., Jukema, J. W., Ye, S., Zwinderman, A. H., Moghaddam, P. H., Beekman, M., Kastelein, J. J., van Boven, A. J., Bruschke, A. V., Humphries, S. E., Kluft, C. & Henney, A. M. (1999). Effect of the stromelysin-1 promoter on efficacy of pravastatin in coronary atherosclerosis and restenosis. Am J Cardiol 83(6), 852-856.

de Martinville, B., Wyman, A. R., White, R. & Francke, U. (1982). Assignment of first random restriction fragment length polymorphism (RFLP) locus (D14S1) to a region of human chromosome 14. Am J Hum Genet 34(2), 216-226.

de Morais, S. M., Wilkinson, G. R., Blaisdell, J., Meyer, U. A., Nakamura, K. & Goldstein, J. A. (1994a). Identification of a new genetic defect responsible for the polymorphism of (S)-mephenytoin metabolism in Japanese. Mol Pharmacol 46(4), 594-598.

de Morais, S. M., Wilkinson, G. R., Blaisdell, J., Nakamura, K., Meyer, U. A. & Goldstein, J. A. (1994b). The major genetic defect responsible for the polymorphism of S-mephenytoin metabolism in humans. J Biol Chem 269(22), 15419-15422.

Delbruck, S. J., Wendel, B., Grunewald, I., Sander, T., Morris-Rosendahl, D., Crocq, M. A., Berrettini, W. H. & Hoehe, M. R. (1997). A novel allelic variant of the human serotonin transporter gene regulatory polymorphism. Cytogenet Cell Genet 79(3-4), 214-220.

Derijk, R. H., Schaaf, M. J., Turner, G., Datson, N. A., Vreugdenhil, E., Cidlowski, J., de Kloet, E. R., Emery, P., Sternberg, E. M. & Detera-Wadleigh, S. D. (2001). A human glucocorticoid receptor gene variant that increases the stability of the glucocorticoid receptor beta-isoform mRNA is associated with rheumatoid arthritis. J Rheumatol 28(11), 2383-2388.

Di Paola, R., Frittitta, L., Miscio, G., Bozzali, M., Baratta, R., Centra, M., Spampinato, D., Santagati, M. G., Ercolino, T., Cisternino, C., Soccio, T., Mastroianno, S., Tassi, V., Almgren, P., Pizzuti, A., Vigneri, R. & Trischitta, V. (2002). A variation in 3' UTR of hPTP1B increases specific gene expression and associates with insulin resistance. Am J Hum Genet 70(3), 806-812.

Dickinson, A. M. & Charron, D. (2005). Non-HLA immunogenetics in hematopoietic stem cell transplantation. Curr Opin Immunol 17, 517-525.

Ding, C., Maier, E., Roscher, A. A., Braun, A. & Cantor, C. R. (2004). Simultaneous quantitative and allele-specific expression analysis with real competitive PCR. BMC Genet 5(1), 8.

Ding, D., Xu, L., Menon, M., Reddy, G. P. & Barrack, E. R. (2005). Effect of GGC (glycine) repeat length polymorphism in the human androgen receptor on androgen action. Prostate 62, 133-139.

Duan, J., Wainwright, M. S., Comeron, J. M., Saitou, N., Sanders, A. R., Gelernter, J. & Gejman, P. V. (2003). Synonymous mutations in the human dopamine receptor D2 (DRD2) affect mRNA stability and synthesis of the receptor. Hum Mol Genet 12(3), 205-216.

Duester, G., Farres, J., Felder, M. R., Holmes, R. S., Hoog, J. O., Pares, X. et al. (1999). Recommended nomenclature for the vertebrate alcohol dehydrogenase gene family. Biochem Pharmacol 58(3), 381-395.


Duester, G., Mic, F. A. & Molotkov, A. (2003). Cytosolic retinoid dehydrogenases govern ubiquitous metabolism of retinol to retinaldehyde followed by tissue-specific metabolism to retinoic acid. Chem Biol Interact 143-144, 201-210.

Edenberg, H. J., Jerome, R. E. & Li, M. (1999). Polymorphism of the human alcohol dehydrogenase 4 (ADH4) promoter affects gene expression. Pharmacogenetics 9(1), 25-30.

Eichhammer, P., Langguth, B., Wiegand, R., Kharraz, A., Frick, U. & Hajak, G. (2003). Allelic variation in the serotonin transporter promoter affects neuromodulatory effects of a selective serotonin transporter reuptake inhibitor (SSRI). Psychopharmacology (Berl) 166(3), 294-297.

Ekins, S., Vandenbranden, M., Ring, B. J., Gillespie, J. S., Yang, T. J., Gelboin, H. V. & Wrighton, S. A. (1998). Further characterization of the expression in liver and catalytic activity of . J Pharmacol Exp Ther 286(3), 1253-1259.

Enard, W., Khaitovich, P., Klose, J., Zollner, S., Heissig, F., Giavalisco, P., Nieselt-Struwe, K., Muchmore, E., Varki, A., Ravid, R., Doxiadis, G. M., Bontrop, R. E. & Paabo, S. (2002). Intra- and interspecific variation in primate gene expression patterns. Science 296(5566), 340-343.

Eubank, T. D., Galloway, M., Montague, C. M., Waldman, W. J. & Marsh, C. B. (2003). M-CSF induces vascular endothelial growth factor production and angiogenic activity from human monocytes. J Immunol 171, 2637-2643.

Evans, W. E. & Relling, M. V. (1999). Pharmacogenomics: Translating functional genomics into rational therapeutics. Science 286(5439), 487-491.

Evert, B., Griese, E. U. & Eichelbaum, M. (1994). A missense mutation in exon 6 of the CYP2D6 gene leading to a histidine 324 to proline exchange is associated with the poor metabolizer phenotype of sparteine. Naunyn Schmiedebergs Arch Pharmacol 350(4), 434-439.

Exner, D. V., Dries, D. L., Domanski, M. J. & Cohn, J. N. (2001). Lesser response to angiotensin-converting- therapy in black as compared with white patients with left ventricular dysfunction. N Engl J Med 344, 1351-1357.

Eyries, M., Agrapart, M., Alonso, A. & Soubrier, F. (2002). Phorbol ester induction of angiotensin-converting enzyme transcription is mediated by Egr-1 and AP-1 in human endothelial cells via ERK1/2 pathway. Circ Res 91, 899-906.

Fang, J. L. & Lazarus, P. (2004). Correlation between the UDP-glucuronosyltransferase (UGT1A1) TATAA box polymorphism and detoxification phenotype: Significantly decreased glucuronidating activity against benzo(a)pyrene-7,8-dihydrodiol(-) in liver microsomes from subjects with the UGT1A1*28 variant. Cancer Epidemiol Biomarkers Prev 13(1), 102-109.

Felix, C. A., Walker, A. H., Lange, B. J., Williams, T. M., Winick, N. J., Cheung, N. K., Lovett, B. D., Nowell, P. C., Blair, I. A. & Rebbeck, T. R. (1998). Association of CYP3A4 genotype with treatment-related leukemia. Proc Natl Acad Sci U S A 95(22), 13176-13181.

Fenoglio, C., Galimberti, D., Lovati, C., Guidi, I., Gatti, A., Fogliarino, S., Tiriticco, M., Mariani, C., Forloni, G., Pettenati, C. et al. (2004). MCP-1 in Alzheimer's disease patients: A-2518G polymorphism and serum levels. Neurobiol Aging 25, 1169-1173.

Ferguson, R. J., De Morais, S. M., Benhamou, S., Bouchardy, C., Blaisdell, J., Ibeanu, G., Wilkinson, G. R., Sarich, T. C., Wright, J. M., Dayer, P. & Goldstein, J. A. (1998). A new genetic defect in human : Mutation of the initiation codon is responsible for poor metabolism of s-mephenytoin. J Pharmacol Exp Ther 284(1), 356-361.

Fessing, M. Y., Krynetski, E. Y., Zambetti, G. P. & Evans, W. E. (1998). Functional characterization of the human thiopurine S-methyltransferase (TPMT) gene promoter. Eur J Biochem 256(3), 510-517.

Fialcowitz, E. J., Brewer, B. Y., Keenan, B. P. & Wilson, G. M. (2005). A hairpin-like structure within an AU-rich mRNA-destabilizing element regulates trans-factor binding selectivity and mRNA decay kinetics. J Biol Chem 280, 22406-22417.

Florentz, C. & Sissler, M. (2001). Disease-related versus polymorphic mutations in human mitochondrial tRNAs. Where is the difference? EMBO Rep 2, 481-486.

Flores-Villanueva, P. O., Ruiz-Morales, J. A., Song, C. H., Flores, L. M., Jo, E. K., Montano, M., Barnes, P. F., Selman, M. & Granados, J. (2005). A functional promoter polymorphism in monocyte chemoattractant protein-1 is associated with increased susceptibility to pulmonary tuberculosis. J Exp Med 202, 1649-1658.

Floyd, M. D., Gervasini, G., Masica, A. L., Mayo, G., George, A. L., Jr., Bhat, K., Kim, R. B. & Wilkinson, G. R. (2003). Genotype-phenotype associations for common CYP3A4 and variants in the basal and induced metabolism of midazolam in European- and African-American men and women. Pharmacogenetics 13(10), 595-606.

Fluiter, K., Housman, D., Ten Asbroek, A. L. & Baas, F. (2003). Killing cancer by targeting genes that cancer cells have lost: Allele-specific inhibition, a novel approach to the treatment of genetic disorders. Cell Mol Life Sci 60(5), 834-843.

Forton, J. T., Udalova, I. A., Campino, S., Rockett, K. A., Hull, J. & Kwiatkowski, D. P. (2007). Localization of a long-range cis-regulatory element of IL13 by allelic transcript ratio mapping. Genome Res 17, 82-87.


Freimuth, R. R., Eckloff, B., Wieben, E. D. & Weinshilboum, R. M. (2001). Human sulfotransferase SULT1C1 pharmacogenetics: Gene resequencing and functional genomic studies. Pharmacogenetics 11(9), 747-756.

Freimuth, R. R., Wiepert, M., Chute, C. G., Wieben, E. D. & Weinshilboum, R. M. (2004). Human cytosolic sulfotransferase database mining: Identification of seven novel genes and pseudogenes. Pharmacogenomics J 4(1), 54-65.

Frisch, A., Finkel, B., Michaelovsky, E., Sigal, M., Laor, N. & Weizman, R. (2000). A rare short allele of the serotonin transporter promoter region (5-HTTLPR) found in an aggressive schizophrenic patient of Jewish Libyan origin. Psychiatr Genet 10(4), 179-183.

Fukuda, Y., Koga, M., Arai, M., Noguchi, E., Ohtsuki, T., Horiuchi, Y., Ishiguro, H., Niizato, K., Iritani, S., Itokawa, M. et al. (2006). Monoallelic expression of the HTR2A gene in human brain and peripheral lymphocytes. Biol Psychiatry 60, 1331-1335.

Fukushima-Uesaka, H., Saito, Y., Watanabe, H., Shiseki, K., Saeki, M., Nakamura, T., Kurose, K., Sai, K., Komamura, K., Ueno, K., Kamakura, S., Kitakaze, M., Hanai, S., Nakajima, T., Matsumoto, K., Saito, H., Goto, Y., Kimura, H., Katoh, M., Sugai, K., Minami, N., Shirao, K., Tamura, T., Yamamoto, N., Minami, H., Ohtsu, A., Yoshida, T., Saijo, N., Kitamura, Y., Kamatani, N., Ozawa, S. & Sawada, J. (2004). Haplotypes of CYP3A4 and their close linkage with CYP3A5 haplotypes in a Japanese population. Hum Mutat 23(1), 100.

Gaedigk, A., Blum, M., Gaedigk, R., Eichelbaum, M. & Meyer, U. A. (1991). Deletion of the entire cytochrome P450 CYP2D6 gene as a cause of impaired drug metabolism in poor metabolizers of the debrisoquine/sparteine polymorphism. Am J Hum Genet 48(5), 943-950.

Gaedigk, A., Ryder, D. L., Bradford, L. D. & Leeder, J. S. (2003). CYP2D6 poor metabolizer status can be ruled out by a single genotyping assay for the -1584G promoter polymorphism. Clin Chem 49(6 Pt 1), 1008-1011.

Galkina, E. & Ley, K. (2006). Leukocyte recruitment and vascular injury in diabetic nephropathy. J Am Soc Nephrol 17, 368-377.

Gao, B., Hagenbuch, B., Kullak-Ublick, G. A., Benke, D., Aguzzi, A. & Meier, P. J. (2000). Organic anion-transporting polypeptides mediate transport of opioid peptides across blood-brain barrier. J Pharmacol Exp Ther 294(1), 73-79.

Gardiner-Garden, M. & Frommer, M. (1987). CpG islands in vertebrate genomes. J Mol Biol 196, 261-282.

Gaspari, A. A. (2006). Innate and adaptive immunity and the pathophysiology of psoriasis. J Am Acad Dermatol 54, S67-80.

Ge, B., Gurd, S., Gaudin, T., Dore, C., Lepage, P., Harmsen, E., Hudson, T. J. & Pastinen, T. J. (2005). Survey of allelic expression using EST mining. Genome Res 15, 1584-1591.

Geick, A., Eichelbaum, M. & Burk, O. (2001). Nuclear receptor response elements mediate induction of intestinal MDR1 by rifampin. J Biol Chem 276(18), 14581-14587.

Gelernter, J., Cubells, J. F., Kidd, J. R., Pakstis, A. J. & Kidd, K. K. (1999). Population studies of polymorphisms of the serotonin transporter protein gene. Am J Med Genet 88(1), 61-66.

Gerra, G., Garofano, L., Santoro, G., Bosari, S., Pellegrini, C., Zaimovic, A., Moi, G., Bussandri, M., Moi, A., Brambilla, F. & Donnini, C. (2004). Association between low-activity serotonin transporter genotype and heroin dependence: Behavioral and personality correlates. Am J Med Genet 126B(1), 37-42.

Gilad, Y., Oshlack, A., Smyth, G. K., Speed, T. P. & White, K. P. (2006). Expression profiling in primates reveals a rapid evolution of human transcription factors. Nature 440, 242-245.

Girelli, D., Corrocher, R., Bisceglia, L., Olivieri, O., De Franceschi, L., Zelante, L. & Gasparini, P. (1995). Molecular basis for the recently described hereditary hyperferritinemia-cataract syndrome: a mutation in the iron-responsive element of ferritin L-subunit gene (the “Verona mutation”). Blood 86, 4050-4053.

Glas, J., Torok, H. P., Tonenchi, L., Schiemann, U. & Folwaczny, C. (2004). The -2518 promotor polymorphism in the MCP-1 gene is not associated with liver cirrhosis in chronic hepatitis C virus infection. Gastroenterology 126, 1930-1931; author reply 1931-1932.

Glatt, H. & Meinl, W. (2004). Pharmacogenetics of soluble sulfotransferases (SULTs). Naunyn Schmiedebergs Arch Pharmacol 369(1), 55-68.

Gomperts, B. N. & Strieter, R. M. (2006). Chemokine-directed metastasis. Contrib Microbiol 13, 170-190.

Gonzalez, E., Rovin, B. H., Sen, L., Cooke, G., Dhanda, R., Mummidi, S., Kulkarni, H., Bamshad, M. J., Telles, V., Anderson, S. A. et al. (2002). HIV-1 infection and AIDS dementia are influenced by a mutant MCP-1 allele linked to increased monocyte infiltration of tissues and MCP-1 levels. Proc Natl Acad Sci U S A 99, 13795-13800.

Goodarzi, M. O., Wong, H., Quinones, M. J., Taylor, K. D., Guo, X., Castellani, L. W., Antoine, H. J., Yang, H., Hsueh, W. A. & Rotter, J. I. (2005). The 3' untranslated region of the lipoprotein lipase gene: haplotype structure and association with post-heparin plasma lipase activity. J Clin Endocrinol Metab 90, 4816-4823.

Goodwin, B., Hodgson, E. & Liddle, C. (1999). The orphan human pregnane X receptor mediates the transcriptional activation of CYP3A4 by rifampicin through a distal enhancer module. Mol Pharmacol 56(6), 1329-1339.

Gorski, J. C., Hall, S. D., Jones, D. R., VandenBranden, M. & Wrighton, S. A. (1994). Regioselective biotransformation of midazolam by members of the human cytochrome P450 3A (CYP3A) subfamily. Biochem Pharmacol 47(9), 1643-1653.

Gragnoli, C., Lindner, T., Cockburn, B. N., Kaisaki, P. J., Gragnoli, F., Marozzi, G. & Bell, G. I. (1997). Maturity-onset diabetes of the young due to a mutation in the hepatocyte nuclear factor-4 alpha binding site in the promoter of the hepatocyte nuclear factor-1 alpha gene. Diabetes 46(10), 1648-1651.

Gray, N. K. & Hentze, M. W. (1994). Iron regulatory protein prevents binding of the 43S translation pre-initiation complex to ferritin and eALAS mRNAs. EMBO J 13, 3882-3891.

Grewal, S. I. & Moazed, D. (2003). Heterochromatin and epigenetic control of gene expression. Science 301(5634), 798-802.

Griffiths-Jones, S. (2004). The microRNA Registry. Nucl Acids Res 32, D109-111.

Guengerich, F. P. (2004). Cytochrome P450: What have we learned and what are the future issues? Drug Metab Rev 36(2), 159-197.

Guillemette, C. (2003). Pharmacogenomics of human UDP-glucuronosyltransferase enzymes. Pharmacogenomics J 3(3), 136-158.

Guitton, J., Buronfosse, T., Desage, M., Lepape, A., Brazier, J. L. & Beaune, P. (1997). Possible involvement of multiple cytochrome P450s in fentanyl and sufentanil metabolism as opposed to alfentanil. Biochem Pharmacol 53(11), 1613-1619.

Gurrieri, C., Bortoli, M., Brunetta, E., Piazza, F. & Agostini, C. (2005). Cytokines, chemokines and other biomolecular markers in sarcoidosis. Sarcoidosis Vasc Diffuse Lung Dis 22 Suppl 1, S9-14.

Haehner, B. D., Gorski, J. C., Vandenbranden, M., Wrighton, S. A., Janardan, S. K., Watkins, P. B. & Hall, S. D. (1996). Bimodal distribution of renal cytochrome P450 3A activity in humans. Mol Pharmacol 50(1), 52-59.

Hagenbuch, B. & Meier, P. J. (2004). Organic anion transporting polypeptides of the OATP/SLC21 family: Phylogenetic classification as OATP/SLCO superfamily, new nomenclature and molecular/functional properties. Pflugers Arch 447(5), 653-665.

Hajjar, I. & Kotchen, T. A. (2003). Trends in prevalence, awareness, treatment, and control of hypertension in the United States, 1998-2000. J Am Med Assoc 290, 199-206.

Hamid, Q. A., Wenzel, S. E., Hauk, P. J., Tsicopoulos, A., Wallaert, B., Lafitte, J. J., Chrousos, G. P., Szefler, S. J. & Leung, D. Y. (1999). Increased glucocorticoid receptor beta in airway cells of glucocorticoid-insensitive asthma. Am J Respir Crit Care Med 159(5 Pt 1), 1600-1604.

Haraguchi, K., Kubo, M., Saito, T., Furuya, F., Inoue, H., Takahashi, M., Shimura, H., Tago, K. & Kobayashi, T. (2006). Serum level of macrophage colony-stimulating factor and atherosclerosis in hemodialysis patients. Nephron Clin Pract 102, c14-20.

Hariri, A. R., Mattay, V. S., Tessitore, A., Kolachana, B., Fera, F., Goldman, D., Egan, M. F. & Weinberger, D. R. (2002). Serotonin transporter genetic variation and the response of the human amygdala. Science 297(5580), 400-403.

Harmer, D., Gilbert, M., Borman, R. & Clark, K. L. (2002). Quantitative mRNA expression profiling of ACE 2, a novel homologue of angiotensin converting enzyme. FEBS Lett 532(1-2), 107-110.

Harries, L. W., Hattersley, A. T. & Ellard, S. (2004). Messenger RNA transcripts of the hepatocyte nuclear factor-1alpha gene containing premature termination codons are subject to nonsense-mediated decay. Diabetes 53(2), 500-504.

Hashizume, T., Imaoka, S., Hiroi, T., Terauchi, Y., Fujii, T., Miyazaki, H., Kamataki, T. & Funae, Y. (2001). cDNA cloning and expression of a novel cytochrome P450 (CYP4F12) from human small intestine. Biochem Biophys Res Commun 280(4), 1135-1141.

He, H., Olesnanik, K., Nagy, R., Liyanarachchi, S., Prasad, M. L., Stratakis, C. A., Kloos, R. T. & de la Chapelle, A. (2005). Allelic variation in gene expression in thyroid tissue. Thyroid 15, 660-666.

Heils, A., Teufel, A., Petri, S., Stober, G., Riederer, P., Bengel, D. & Lesch, K. P. (1996). Allelic variation of human serotonin transporter gene expression. J Neurochem 66(6), 2621-2624.

Hein, D. W., Doll, M. A., Fretland, A. J., Leff, M. A., Webb, S. J., Xiao, G. H., Devanaboyina, U. S., Nangju, N. A. & Feng, Y. (2000). Molecular genetics and epidemiology of the NAT1 and NAT2 acetylation polymorphisms. Cancer Epidemiol Biomarkers Prev 9(1), 29-42.

Heinz, A., Jones, D. W., Mazzanti, C., Goldman, D., Ragan, P., Hommer, D., Linnoila, M. & Weinberger, D. R. (2000). A relationship between serotonin transporter genotype and in vivo protein expression and alcohol neurotoxicity. Biol Psychiatry 47(7), 643-649.

Helgadottir, A., Thorleifsson, G., Manolescu, A., Gretarsdottir, S., Blondal, T., Jonasdottir, A., Jonasdottir, A., Sigurdsson, A., Baker, A., Palsson, A., Masson, G., Gudbjartsson, D. F., Magnusson, K. P., Andersen, K., Levey, A. I., Backman, V. M., Matthiasdottir, S., Jonsdottir, T., Palsson, S., Einarsdottir, H., Gunnarsdottir, S., Gylfason, A., Vaccarino, V., Hooper, W. C., Reilly, M. P., Granger, C. B., Austin, H., Rader, D. J., Shah, S. H., Quyyumi, A. A., Gulcher, J. R., Thorgeirsson, G., Thorsteinsdottir, U., Kong, A. & Stefansson, K. (2007). A common variant on chromosome 9p21 affects the risk of myocardial infarction. Science 316, 1491-1493.

Hemming, M. L. & Selkoe, D. J. (2005). Amyloid beta-protein is degraded by cellular angiotensin-converting enzyme (ACE) and elevated by an ACE inhibitor. J Biol Chem 280(45), 37644-37650.

Hiroi, S., Harada, H., Nishi, H., Satoh, M., Nagai, R. & Kimura, A. (1999). Polymorphisms in the SOD2 and HLA-DRB1 genes are associated with nonfamilial idiopathic dilated cardiomyopathy in Japanese. Biochem Biophys Res Commun 261, 332-339.

Hirota, T., Ieiri, I., Takane, H., Maegawa, S., Hosokawa, M., Kobayashi, K., Chiba, K., Nanba, E., Oshimura, M., Sato, T., Higuchi, S. & Otsubo, K. (2004). Allelic expression imbalance of the human CYP3A4 gene and individual phenotypic status. Hum Mol Genet

Ho, P. C., Abbott, F. S., Zanger, U. M. & Chang, T. K. (2003). Influence of CYP2C9 genotypes on the formation of a hepatotoxic metabolite of valproic acid in human liver microsomes. Pharmacogenomics J 3(6), 335-342.

Hofacker, I. L. (2003). Vienna RNA secondary structure server. Nucl Acids Res 31, 3429-3431.

Hoffmeyer, S., Burk, O., von Richter, O., Arnold, H. P., Brockmoller, J., Johne, A., Cascorbi, I., Gerloff, T., Roots, I., Eichelbaum, M. & Brinkmann, U. (2000). Functional polymorphisms of the human multidrug-resistance gene: Multiple sequence variations and correlation of one allele with P-glycoprotein expression and activity in vivo. Proc Natl Acad Sci U S A 97(7), 3473-3478.

Homey, B., Steinhoff, M., Ruzicka, T. & Leung, D. Y. (2006). Cytokines and chemokines orchestrate atopic skin inflammation. J Allergy Clin Immunol 118, 178-189.

Hoogendoorn, B., Coleman, S. L., Guy, C. A., Smith, K., Bowen, T., Buckland, P. R. & O'Donovan, M. C. (2003). Functional analysis of human promoter polymorphisms. Hum Mol Genet 12, 2249-2254.

Horie, N., Aiba, H., Oguro, K., Hojo, H. & Takeishi, K. (1995). Functional analysis and DNA polymorphism of the tandemly repeated sequences in the 5'-terminal regulatory region of the human gene for thymidylate synthase. Cell Struct Funct 20(3), 191-197.

Howard, M. T., Aggarwal, G., Anderson, C. B., Khatri, S., Flanigan, K. M. & Atkins, J. F. (2005). Recoding elements located adjacent to a subset of eukaryal selenocysteine-specifying UGA codons. EMBO J 24, 1596-1607.

Howard, M., Frizzell, R. A. & Bedwell, D. M. (1996). Aminoglycoside antibiotics restore CFTR function by overcoming premature stop mutations. Nat Med 2(4), 467-469.

Howe, D. & Lynas, C. (2001). The cyclin D1 alternative transcripts [a] and [b] are expressed in normal and malignant lymphocytes and their relative levels are influenced by the polymorphism at codon 241. Haematologica 86, 563-569.

Hranilovic, D., Stefulj, J., Schwab, S., Borrmann-Hassenbach, M., Albus, M., Jernej, B. & Wildenauer, D. (2004). Serotonin transporter promoter and intron 2 polymorphisms: Relationship between allelic variants and gene expression. Biol Psychiatry 55(11), 1090-1094.

Hu, Y. J., Korotkov, K. V., Mehta, R., Hatfield, D. L., Rotimi, C. N., Luke, A., Prewitt, T. E., Cooper, R. S., Stock, W., Vokes, E. E. et al. (2001). Distribution and functional consequences of nucleotide polymorphisms in the 3'-untranslated region of the human Sep15 gene. Cancer Res 61, 2307-2310.

Hu, Y., Hakkola, J., Oscarson, M. & Ingelman-Sundberg, M. (1999). Structural and functional characterization of the 5'-flanking region of the rat and human cytochrome P450 2E1 genes: Identification of a polymorphic repeat in the human gene. Biochem Biophys Res Commun 263(2), 286-293.

Huang, Y., He, T. & Domann, F. E. (1999). Decreased expression of manganese superoxide dismutase in transformed cells is associated with increased cytosine methylation of the SOD2 gene. DNA Cell Biol 18, 642-652.

Hung, R. J., Boffetta, P., Brennan, P., Malaveille, C., Gelatti, U., Placidi, D., Carta, A., Hautefeuille, A. & Porru, S. (2004). Genetic polymorphisms of MPO, COMT, MnSOD, NQO1, interactions with environmental exposures and bladder cancer risk. Carcinogenesis 25, 973-978.

Hustert, E., Haberl, M., Burk, O., Wolbold, R., He, Y. Q., Klein, K., Nuessler, A. C., Neuhaus, P., Klattig, J., Eiselt, R., Koch, I., Zibat, A., Brockmoller, J., Halpert, J. R., Zanger, U. M. & Wojnowski, L. (2001a). The genetic determinants of the CYP3A5 polymorphism. Pharmacogenetics 11(9), 773-779.

Hustert, E., Zibat, A., Presecan-Siedel, E., Eiselt, R., Mueller, R., Fuss, C., Brehm, I., Brinkmann, U., Eichelbaum, M., Wojnowski, L. & Burk, O. (2001b). Natural protein variants of pregnane X receptor with altered transactivation activity toward CYP3A4. Drug Metab Dispos 29(11), 1454-1459.

Hwang, S. Y., Cho, M. L., Park, B., Kim, J. Y., Kim, Y. H., Min, D. J., Min, J. K. & Kim, H. Y. (2002). Allelic frequency of the MCP-1 promoter -2518 polymorphism in the Korean population and in Korean patients with rheumatoid arthritis, systemic lupus erythematosus and adult-onset Still's disease. Eur J Immunogenet 29, 413-416.

Ibeanu, G. C., Blaisdell, J., Ferguson, R. J., Ghanayem, B. I., Brosen, K., Benhamou, S., Bouchardy, C., Wilkinson, G. R., Dayer, P. & Goldstein, J. A. (1999). A novel transversion in the intron 5 donor splice junction of CYP2C19 and a sequence polymorphism in exon 3 contribute to the poor metabolizer phenotype for the anticonvulsant drug S-mephenytoin. J Pharmacol Exp Ther 290(2), 635-640.

Iida, A., Saito, S., Sekine, A., Kondo, K., Mishima, C., Kitamura, Y., Harigae, S., Osawa, S. & Nakamura, Y. (2002). Thirteen single-nucleotide polymorphisms (SNPs) in the alcohol dehydrogenase 4 (ADH4) gene locus. J Hum Genet 47(2), 74-76.

Iida, A., Saito, S., Sekine, A., Mishima, C., Kondo, K., Kitamura, Y., Harigae, S., Osawa, S. & Nakamura, Y. (2001a). Catalog of 258 single-nucleotide polymorphisms (SNPs) in genes encoding three organic anion transporters, three organic anion-transporting polypeptides, and three NADH:ubiquinone oxidoreductase flavoproteins. J Hum Genet 46(11), 668-683.

Iida, A., Sekine, A., Saito, S., Kitamura, Y., Kitamoto, T., Osawa, S., Mishima, C. & Nakamura, Y. (2001b). Catalog of 320 single nucleotide polymorphisms (SNPs) in 20 quinone oxidoreductase and sulfotransferase genes. J Hum Genet 46(4), 225-240.

Imaoka, S., Yamada, T., Hiroi, T., Hayashi, K., Sakaki, T., Yabusaki, Y. & Funae, Y. (1996). Multiple forms of human P450 expressed in Saccharomyces cerevisiae. Systematic characterization and comparison with those of the rat. Biochem Pharmacol 51(8), 1041-1050.

Ingelman-Sundberg, M. (2001). Implications of polymorphic cytochrome P450-dependent drug metabolism for drug development. Drug Metab Dispos 29(4 Pt 2), 570-573.

Ioannidis, J. P. (2003). Genetic associations: False or true? Trends Mol Med 9(4), 135-138.

Iwai, N. & Naraba, H. (2005). Polymorphisms in human pre-miRNAs. Biochem Biophys Res Commun 331, 1439-1444.

Iwai, N., Kajimoto, K., Kokubo, Y., Okayama, A., Miyazaki, S., Nonogi, H., Goto, Y. & Tomoike, H. (2006). Assessment of genetic effects of polymorphisms in the MCP-1 gene on serum MCP-1 levels and myocardial infarction in Japanese. Circ J 70, 805-809.

Iyer, L., Das, S., Janisch, L., Wen, M., Ramirez, J., Karrison, T., Fleming, G. F., Vokes, E. E., Schilsky, R. L. & Ratain, M. J. (2002). UGT1A1*28 polymorphism as a determinant of irinotecan disposition and toxicity. Pharmacogenomics J 2(1), 43-47.

Iyer, L., Hall, D., Das, S., Mortell, M. A., Ramirez, J., Kim, S., Di Rienzo, A. & Ratain, M. J. (1999). Phenotype-genotype correlation of in vitro SN-38 (active metabolite of irinotecan) and bilirubin glucuronidation in human liver tissue with UGT1A1 promoter polymorphism. Clin Pharmacol Ther 65(5), 576-582.

Ji, Y., Xu, X. & Stormo, G. D. (2004). A graph theoretical approach for predicting common RNA secondary structure motifs including pseudoknots in unaligned sequences. Bioinformatics 20(10), 1591-1602.

Johnson, J. M., Castle, J., Garrett-Engele, P., Kan, Z., Loerch, P. M., Armour, C. D., Santos, R., Schadt, E. E., Stoughton, R. & Shoemaker, D. D. (2003). Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays. Science 302(5653), 2141-2144.

Johnson, A. D., Wang, D. & Sadée, W. (2005). Polymorphisms affecting gene regulation and mRNA processing: broad implications for pharmacogenetics. Pharmacol Ther 106, 19-38.

Joint National Committee on prevention, detection, evaluation and treatment of high blood pressure. (1997). The sixth report of the Joint National Committee on prevention, detection, evaluation and treatment of high blood pressure. Arch Intern Med 157, 2413-2446.

Kagimoto, M., Heim, M., Kagimoto, K., Zeugin, T. & Meyer, U. A. (1990). Multiple mutations of the human cytochrome P450IID6 gene (CYP2D6) in poor metabolizers of debrisoquine. Study of the functional significance of individual mutations by expression of chimeric genes. J Biol Chem 265(28), 17209-17214.

Kajinami, K., Brousseau, M. E., Ordovas, J. M. & Schaefer, E. J. (2004). Interactions between common genetic polymorphisms in ABCG5/G8 and CYP7A1 on LDL cholesterol-lowering response to atorvastatin. Atherosclerosis 175(2), 287-293.

Kamatani, N., Sekine, A., Kitamoto, T., Iida, A., Saito, S., Kogame, A., Inoue, E., Kawamoto, M., Harigai, M. & Nakamura, Y. (2004). Large-scale single-nucleotide polymorphism (SNP) and haplotype analyses, using dense SNP maps, of 199 drug-related genes in 752 subjects: The analysis of the association between uncommon SNPs within haplotype blocks and the haplotypes constructed with haplotype-tagging SNPs. Am J Hum Genet 75(2), 190-203.

Karter, A. J., Ferrara, A., Liu, J. Y., Moffet, H. H., Ackerson, L. M. & Selby, J. V. (2002). Ethnic disparities in diabetic complications in an insured population. J Am Med Assoc 287, 2519-2527.

Kato, J., Fujikawa, K., Kanda, M., Fukuda, N., Sasaki, K., Takayama, T., Kobune, M., Takada, K., Takimoto, R., Hamada, H. et al. (2001). A mutation, in the iron-responsive element of H ferritin mRNA, causing autosomal dominant iron overload. Am J Hum Genet 69, 191-197.

Katz, L. & Burge, C. B. (2003). Widespread selection for local RNA secondary structure in coding regions of bacterial genes. Genome Res 13, 2042-2051.

Katzung, B. G. (2004). Basic and clinical pharmacology, 9th ed. New York, NY: McGraw-Hill.

Kelley, T. W., Graham, M. M., Doseff, A. I., Pomerantz, R. W., Lau, S. M., Ostrowski, M. C., Franke, T. F. & Marsh, C. B. (1999). Macrophage colony-stimulating factor promotes cell survival through Akt/protein kinase B. J Biol Chem 274, 26393-26398.

Kent, W. J. (2002). BLAT--the BLAST-like alignment tool. Genome Res 12, 656-664.

Keszei, M., Nagy, A., Kozma, G. T., Radosits, K., Tolgyesi, G., Falus, A. & Szalai, C. (2006). Pediatric asthmatic patients have low serum levels of monocyte chemoattractant protein-1. J Asthma 43, 399-404.

Khan, S. G., Muniz-Medina, V., Shahlavi, T., Baker, C. C., Inui, H., Ueda, T., Emmert, S., Schneider, T. D. & Kraemer, K. H. (2002). The human XPC DNA repair gene: arrangement, splice site information content and influence of a single nucleotide polymorphism in a splice acceptor site on alternative splicing and function. Nucl Acids Res 30, 3624-3631.

Kidd, R. S., Curry, T. B., Gallagher, S., Edeki, T., Blaisdell, J. & Goldstein, J. A. (2001). Identification of a null allele of CYP2C9 in an African-American exhibiting toxicity to phenytoin. Pharmacogenetics 11(9), 803-808.

Kihara, T., Miyata, Y., Furukawa, M., Noguchi, M., Nishikido, M., Koga, S. & Kanetake, H. (2005). Predictive value of serum macrophage colony-stimulating factor for development of aortic calcification in haemodialysis patients: a 6 year longitudinal study. Nephrol Dial Transplant 20, 1647-1652.

Kim, H. L., Lee, D. S., Yang, S. H., Lim, C. S., Chung, J. H., Kim, S., Lee, J. S. & Kim, Y. S. (2002). The polymorphism of monocyte chemoattractant protein-1 is associated with the renal disease of SLE. Am J Kidney Dis 40, 1146-1152.

King, L. M., Ma, J., Srettabunjong, S., Graves, J., Bradbury, J. A., Li, L., Spiecker, M., Liao, J. K., Mohrenweiser, H. & Zeldin, D. C. (2002). Cloning of gene and identification of functional polymorphisms. Mol Pharmacol 61(4), 840-852.

Kiss, A. M., Jady, B. E., Bertrand, E. & Kiss, T. (2004). Human box H/ACA pseudouridylation guide RNA machinery. Mol Cell Biol 24, 5797-5807.

Kivisto, K. T., Griese, E. U., Fritz, P., Linder, A., Hakkola, J., Raunio, H., Beaune, P. & Kroemer, H. K. (1996). Expression of cytochrome P450 3A enzymes in human lung: A combined RT-PCR and immunohistochemical analysis of normal tissue and lung tumours. Naunyn Schmiedebergs Arch Pharmacol 353(2), 207-212.

Kiyotani, K., Yamazaki, H., Fujieda, M., Iwano, S., Matsumura, K., Satarug, S., Ujjin, P., Shimada, T., Guengerich, F. P., Parkinson, A., Honda, G., Nakagawa, K., Ishizaki, T. & Kamataki, T. (2003). Decreased coumarin 7-hydroxylase activities and expression levels in humans caused by genetic polymorphism in CYP2A6 promoter region (CYP2A6*9). Pharmacogenetics 13(11), 689-695.

Knight, J. C., Keating, B. J., Rockett, K. A. & Kwiatkowski, D. P. (2003). In vivo characterization of regulatory polymorphisms by allele-specific quantification of RNA polymerase loading. Nat Genet 33(4), 469-475.

Kobayashi, M., Tsuda, Y., Yoshida, T., Takeuchi, D., Utsunomiya, T., Takahashi, H. & Suzuki, F. (2006). Bacterial sepsis and chemokines. Curr Drug Targets 7, 119-134.

Koboldt, D. C., Miller, R. D. & Kwok, P. Y. (2006). Distribution of human SNPs and its effect on high-throughput genotyping. Hum Mutat 27, 249-254.

Kohle, C., Mohrle, B., Munzel, P. A., Schwab, M., Wernet, D., Badary, O. A. & Bock, K. W. (2003). Frequent co-occurrence of the TATA box mutation associated with Gilbert's syndrome (UGT1A1*28) with other polymorphisms of the UDP-glucuronosyltransferase-1 locus (UGT1A6*2 and UGT1A7*3) in Caucasians and Egyptians. Biochem Pharmacol 65(9), 1521-1527.

Kozak, M. (1986). Influences of mRNA secondary structure on initiation by eukaryotic ribosomes. Proc Natl Acad Sci U S A 83, 2850-2854.

Kozak, M. (1990). Downstream secondary structure facilitates recognition of initiator codons by eukaryotic ribosomes. Proc Natl Acad Sci U S A 87, 8301-8305.

Kozak, M. (2005). Regulation of translation via mRNA structure in prokaryotes and eukaryotes. Gene 361, 13-37.

Kryukov, G. V., Castellano, S., Novoselov, S. V., Lobanov, A. V., Zehtab, O., Guigo, R. & Gladyshev, V. N. (2003). Characterization of mammalian selenoproteomes. Science 300, 1439-1443.

Kuehl, P., Zhang, J., Lin, Y., Lamba, J., Assem, M., Schuetz, J., Watkins, P. B., Daly, A., Wrighton, S. A., Hall, S. D., Maurel, P., Relling, M., Brimer, C., Yasuda, K., Venkataramanan, R., Strom, S., Thummel, K., Boguski, M. S. & Schuetz, E. (2001). Sequence diversity in CYP3A promoters and characterization of the genetic basis of polymorphic CYP3A5 expression. Nat Genet 27(4), 383-391.

Kullak-Ublick, G. A., Beuers, U., Fahney, C., Hagenbuch, B., Meier, P. J. & Paumgartner, G. (1997). Identification and functional characterization of the promoter region of the human organic anion transporting polypeptide gene. Hepatology 26(4), 991-997.

Kumaraswamy, E., Malykh, A., Korotkov, K. V., Kozyavkin, S., Hu, Y., Kwon, S. Y., Moustafa, M. E., Carlson, B. A., Berry, M. J., Lee, B. J. et al. (2000). Structure-expression relationships of the 15-kDa selenoprotein gene. Possible role of the protein in cancer etiology. J Biol Chem 275, 35540-35547.

Kuo, K. W., Leung, M. F. & Leung, W. C. (1997). Intrinsic secondary structure of human TNFR-I mRNA influences the determination of gene expression by RT-PCR. Mol Cell Biochem 177, 1-6.

Kupfer, A. & Preisig, R. (1984). Pharmacogenetics of mephenytoin: A new drug hydroxylation polymorphism in man. Eur J Clin Pharmacol 26(6), 753-759.

Lamba, J. K., Lamba, V., Yasuda, K., Lin, Y. S., Assem, M., Thompson, E., Strom, S. & Schuetz, E. G. (2004). Expression of CAR splice variants in human tissues and their functional consequences. J Pharmacol Exp Ther 311, 811-821.

Lamba, J. K., Lin, Y. S., Schuetz, E. G. & Thummel, K. E. (2002). Genetic contribution to variable human CYP3A-mediated metabolism. Adv Drug Deliv Rev 54(10), 1271-1294.

Lamba, V., Lamba, J., Yasuda, K., Strom, S., Davila, J., Hancock, M. L., Fackenthal, J. D., Rogan, P. K., Ring, B., Wrighton, S. A. & Schuetz, E. G. (2003). Hepatic CYP2B6 expression: Gender and ethnic differences and relationship to CYP2B6 genotype and CAR (constitutive androstane receptor) expression. J Pharmacol Exp Ther 307(3), 906-922.

Lang, T., Klein, K., Fischer, J., Nussler, A. K., Neuhaus, P., Hofmann, U., Eichelbaum, M., Schwab, M. & Zanger, U. M. (2001). Extensive genetic polymorphism in the human CYP2B6 gene with impact on expression and function in human liver. Pharmacogenetics 11(5), 399-415.

Lang, T., Klein, K., Richter, T., Zibat, A., Kerb, R., Eichelbaum, M., Schwab, M. & Zanger, U. M. (2004). Multiple novel nonsynonymous CYP2B6 gene polymorphisms in Caucasians: Demonstration of phenotypic null alleles. J Pharmacol Exp Ther 311, 34-43.

Langaee, T. & Ronaghi, M. (2005). Genetic variation analyses by Pyrosequencing. Mutat Res 573(1-2), 96-102.

Laporte, J., Guiraud-Chaumeil, C., Vincent, M. C., Mandel, J. L., Tanner, S. M., Liechti-Gallati, S., Wallgren-Pettersson, C., Dahl, N., Kress, W., Bolhuis, P. A. et al. (1997). Mutations in the MTM1 gene implicated in X-linked myotubular myopathy. ENMC International Consortium on Myotubular Myopathy. European Neuro-Muscular Center. Hum Mol Genet 6, 1505-1511.

Le Texier, V., Riethoven, J. J., Kumanduri, V., Gopalakrishnan, C., Lopez, F., Gautheret, D. & Thanaraj, T. A. (2006). AltTrans: Transcript pattern variants annotated for both alternative splicing and alternative polyadenylation. BMC Bioinformatics 7, 169.

Lederman, M. M., Penn-Nicholson, A., Cho, M. & Mosier, D. (2006). Biology of CCR5 and its role in HIV infection and treatment. J Am Med Assoc 296, 815-826.

Lee, C. & Irizarry, K. (2003). Alternative splicing in the nervous system: an emerging source of diversity and regulation. Biol Psychiatry 54, 771-776.

Lee, C. R., Goldstein, J. A. & Pieper, J. A. (2002). Cytochrome P450 2C9 polymorphisms: A comprehensive review of the in-vitro and human data. Pharmacogenetics 12(3), 251-263.

Lehmann, J. M., McKee, D. D., Watson, M. A., Willson, T. M., Moore, J. T. & Kliewer, S. A. (1998). The human orphan nuclear receptor PXR is activated by compounds that regulate CYP3A4 gene expression and cause drug interactions. J Clin Invest 102(5), 1016-1023.

Lei, H., Day, I. N. M. & Vorechovsky, I. (2005). Exonization of AluYa5 in the human ACE gene requires mutations in both 3' and 5' splice sites and is facilitated by a conserved splicing enhancer. Nucl Acids Res 33, 3897-3906.

Lengauer, C., Kinzler, K. W. & Vogelstein, B. (1998). Genetic instabilities in human cancers. Nature 396, 643-649.

Lenz, H. J., Danenberg, K. D., Schnieders, B., Banerjee, D., Bertino, J. R., Leichman, L. & Danenberg, P. V. (1995). Identification of mutations by RNA conformational polymorphism “bar code” analysis. Genomics 30, 120-122.

Lesch, K. P., Balling, U., Gross, J., Strauss, K., Wolozin, B. L., Murphy, D. L. & Riederer, P. (1994). Organization of the human serotonin transporter gene. J Neural Transm Gen Sect 95(2), 157-162.

Lesch, K. P., Bengel, D., Heils, A., Sabol, S. Z., Greenberg, B. D., Petri, S., Benjamin, J., Muller, C. R., Hamer, D. H. & Murphy, D. L. (1996). Association of anxiety-related traits with a polymorphism in the serotonin transporter gene regulatory region. Science 274, 1527-1531.

Lestrade, L. & Weber, M. J. (2006). snoRNA-LBME-db, a comprehensive database of human H/ACA and C/D box snoRNAs. Nucl Acids Res 34, D158-162.

Lim, J. E., Papp, A., Pinsonneault, J., Sadée, W. & Saffen, D. (2006). Allelic expression of serotonin transporter (SERT) mRNA in human pons: lack of correlation with the polymorphism SERTLPR. Mol Psychiatry 11, 649-662.

Lim, J. E., Pinsonneault, J., Sadée, W. & Saffen, D. (2007). Tryptophan hydroxylase 2 (TPH2) haplotypes predict levels of TPH2 mRNA expression in human pons. Mol Psychiatry 12, 491-501.

Lin, W., Yang, H. H. & Lee, M. P. (2005). Allelic variation in gene expression identified through computational analysis of the dbEST database. Genomics 86, 518-527.

Liu, J., Chen, M., Deng, C., Bourc’his, D., Nealon, J. G., Erlichman, B., Bestor, T. H. & Weinstein, L. S. (2005). Identification of the control region for tissue-specific imprinting of the stimulatory G protein alpha subunit. Proc Natl Acad Sci U S A 102, 5513-5518.

Liu, R., McEachin, R. C. & States, D. J. (2003). Computationally identifying novel NF-kappa B-regulated immune genes in the human genome. Genome Res 13(4), 654-661.

Lo, H. S., Wang, Z., Hu, Y., Yang, H. H., Gere, S., Buetow, K. H. & Lee, M. P. (2003). Allelic variation in gene expression is common in the human genome. Genome Res 13(8), 1855-1862.

Loktionov, A. (2004). Common gene polymorphisms, cancer progression and prognosis. Cancer Lett 208(1), 1-33.

Lopez de Silanes, I., Galban, S., Martindale, J. L., Yang, X., Mazan-Mamczarz, K., Indig, F. E., Falco, G., Zhan, M. & Gorospe, M. (2005). Identification and functional outcome of mRNAs associated with RNA-binding protein TIA-1. Mol Cell Biol 25, 9520-9531.

Lopez de Silanes, I., Zhan, M., Lal, A., Yang, X. & Gorospe, M. (2004). Identification of a target RNA motif for RNA-binding protein HuR. Proc Natl Acad Sci U S A 101, 2987-2992.

Lotrich, F. E., Pollock, B. G. & Ferrell, R. E. (2003). Serotonin transporter promoter polymorphism in African Americans: Allele frequencies and implications for treatment. Am J Pharmacogenomics 3(2), 145-147.

Lovlie, R., Daly, A. K., Matre, G. E., Molven, A. & Steen, V. M. (2001). Polymorphisms in CYP2D6 duplication-negative individuals with the ultrarapid metabolizer phenotype: A role for the CYP2D6*35 allele in ultrarapid metabolism? Pharmacogenetics 11(1), 45-55.

Ly, H., Blackburn, E. H. & Parslow, T. G. (2003). Comprehensive structure-function analysis of the core domain of human telomerase RNA. Mol Cell Biol 23, 6849-6856.

Mackenzie, P. I., Owens, I. S., Burchell, B., Bock, K. W., Bairoch, A., Belanger, A. et al. (1997). The UDP glycosyltransferase gene superfamily: recommended nomenclature update based on evolutionary divergence. Pharmacogenetics 7(4), 255-269.

Maffei, A., Pozzo, G. D., Prisco, A., Ciullo, M., Harris, P. E., Reed, E. F. & Guardiola, J. (1997). Polymorphism in the 5' terminal region of the mRNA of HLA-DQA1 gene: identification of four groups of transcripts and their association with polymorphism in the alpha 1 domain. Hum Immunol 53, 167-173.

Maillet, P., Dalla Venezia, N., Lorenzo, F., Moriniere, M., Bozon, M., Noel, B., Delaunay, J. & Baklouti, F. (1999). A premature termination codon within an alternative exon affecting only the metabolism of transcripts that retain this exon. Hum Mutat 14, 145-155.

Mandola, M. V., Stoehlmacher, J., Muller-Weeks, S., Cesarone, G., Yu, M. C., Lenz, H. J. & Ladner, R. D. (2003). A novel single nucleotide polymorphism within the 5' tandem repeat polymorphism of the thymidylate synthase gene abolishes USF-1 binding and alters transcriptional activity. Cancer Res 63(11), 2898-2904.

Mandola, M. V., Stoehlmacher, J., Zhang, W., Groshen, S., Yu, M. C., Iqbal, S., Lenz, H. J. & Ladner, R. D. (2004). A 6 bp polymorphism in the thymidylate synthase gene causes message instability and is associated with decreased intratumoral TS mRNA levels. Pharmacogenetics 14(5), 319-327.

Marenberg, M. E., Risch, N., Berkman, L. F., Floderus, B. & de Faire, U. (1994). Genetic susceptibility to death from coronary heart disease in a study of twins. N Engl J Med 330, 1041-1046.

Marez, D., Legrand, M., Sabbagh, N., Lo-Guidice, J. M., Boone, P. & Broly, F. (1996). An additional allelic variant of the CYP2D6 gene causing impaired metabolism of sparteine. Hum Genet 97(5), 668-670.

Marez, D., Sabbagh, N., Legrand, M., Lo-Guidice, J. M., Boone, P. & Broly, F. (1995). A novel CYP2D6 allele with an abolished splice recognition site associated with the poor metabolizer phenotype. Pharmacogenetics 5(5), 305-311.

Marez-Allorge, D., Ellis, S. W., Lo Guidice, J. M., Tucker, G. T. & Broly, F. (1999). A rare G2061 insertion affecting the open reading frame of CYP2D6 and responsible for the poor metabolizer phenotype. Pharmacogenetics 9(3), 393-396.

Marian, A. J., Safavi, F., Ferlic, L., Dunn, J. K., Gotto, A. M. & Ballantyne, C. M. (2000). Interactions between angiotensin-I converting enzyme insertion/deletion polymorphism and response of plasma lipids and coronary atherosclerosis to treatment with fluvastatin: The lipoprotein and coronary atherosclerosis study. J Am Coll Cardiol 35(1), 89-95.

Marinaki, A. M., Arenas, M., Khan, Z. H., Lewis, C. M., Shobowale-Bakre el, M., Escuredo, E., Fairbanks, L. D., Mayberry, J. F., Wicks, A. C., Ansari, A., Sanderson, J. & Duley, J. A. (2003). Genetic determinants of the thiopurine methyltransferase intermediate activity phenotype in British Asians and Caucasians. Pharmacogenetics 13(2), 97-105.

Martin, E. R., Monks, S. A., Warren, L. L. & Kaplan, N. L. (2000). A test for linkage and association in general pedigrees: The pedigree disequilibrium test. Am J Hum Genet 67(1), 146-154.

Martin, M. E., Fargion, S., Brissot, P., Pellat, B. & Beaumont, C. (1998). A point mutation in the bulge of the iron-responsive element of the L ferritin gene in two families with the hereditary hyperferritinemia-cataract syndrome. Blood 91, 319-323.

Martineau, Y., Le Bec, C., Monbrun, L., Allo, V., Chiu, I. M., Danos, O., Moine, H., Prats, H. & Prats, A. C. (2004). Internal ribosome entry site structural motifs conserved among mammalian fibroblast growth factor 1 alternatively spliced mRNAs. Mol Cell Biol 24, 7622-7635.

Mas, C., Taske, N., Deutsch, S., Guipponi, M., Thomas, P., Covanis, A., Friis, M., Kjeldsen, M. J., Pizzolato, G. P., Villemure, J. G. et al. (2004). Association of the connexin36 gene with juvenile myoclonic epilepsy. J Med Genet 41, e93.

Mathers, C. D. & Loncar, D. (2006). Projections of global mortality and burden of disease from 2002 to 2030. PLoS Med 3(11), e442.

Mathews, D. H., Sabina, J., Zuker, M. & Turner, D. H. (1999). Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J Mol Biol 288, 911-940.

Matsumura, K., Saito, T., Takahashi, Y., Ozeki, T., Kiyotani, K., Fujieda, M., Yamazaki, H., Kunitoh, H. & Kamataki, T. (2004). Identification of a novel polymorphic enhancer of the human CYP3A4 gene. Mol Pharmacol 65(2), 326-334.

Matyas, G., Giunta, C., Steinmann, B., Hossle, J. P. & Hellwig, R. (2002). Quantification of single nucleotide polymorphisms: a novel method that combines primer extension assay and capillary electrophoresis. Hum Mutat 19, 58-68.

McCarver, D. G., Byun, R., Hines, R. N., Hichme, M. & Wegenek, W. (1998). A genetic polymorphism in the regulatory sequences of human : Association with increased chlorzoxazone hydroxylation in the presence of obesity and ethanol intake. Toxicol Appl Pharmacol 152(1), 276-281.

McDermott, D. H., Yang, Q., Kathiresan, S., Cupples, L. A., Massaro, J. M., Keaney, J. F., Jr., Larson, M. G., Vasan, R. S., Hirschhorn, J. N., O'Donnell, C. J. et al. (2005). CCL2 polymorphisms are associated with serum monocyte chemoattractant protein-1 levels and myocardial infarction in the Framingham Heart Study. Circulation 112, 1113-1120.

McDowell, S. E., Coleman, J. J. & Ferner, R. E. (2006). Systematic review and meta-analysis of ethnic differences in risks of adverse reactions to drugs used in cardiovascular medicine. Br Med J 332(7551), 1177-1181.

McLeod, H. L. & Siva, C. (2002). The thiopurine S-methyltransferase gene locus -- implications for clinical pharmacogenomics. Pharmacogenomics 3(1), 89-98.

McLeod, H. L., Collie-Duguid, E. S., Vreken, P., Johnson, M. R., Wei, X., Sapone, A., Diasio, R. B., Fernandez-Salguero, P., van Kuilenberg, A. B., van Gennip, A. H. & Gonzalez, F. J. (1998). Nomenclature for human dpyd alleles. Pharmacogenetics 8(6), 455-459.

McLeod, J. L., Craig, J., Gumley, S., Roberts, S. & Kirkland, M. A. (2002). Mutation spectrum in Australian pedigrees with hereditary hyperferritinaemia-cataract syndrome reveals novel and de novo mutations. Br J Haematol 118, 1179-1182.

McPherson, R., Pertsemlidis, A., Kavaslar, N., Stewart, A., Roberts, R., Cox, D. R., Hinds, D. A., Pennacchio, L. A., Tybjaerg-Hansen, A., Folsom, A. R., Boerwinkle, E., Hobbs, H. H. & Cohen, J. C. (2007). A common allele on chromosome 9 associated with coronary heart disease. Science 316, 1488-1491.

Merani, S., Truong, W. W., Hancock, W., Anderson, C. C. & Shapiro, A. M. (2006). Chemokines and their receptors in islet allograft rejection and as targets for tolerance induction. Cell Transplant 15, 295-309.

Meyer, I. M. & Miklos, I. (2005). Statistical evidence for conserved, local secondary structure in the coding regions of eukaryotic mRNAs and pre-mRNAs. Nucl Acids Res 33, 6338-6348.

Michaelovsky, E., Frisch, A., Rockah, R., Peleg, L., Magal, N., Shohat, M. & Weizman, R. (1999). A novel allele in the promoter region of the human serotonin transporter gene. Mol Psychiatry 4(1), 97-99.

Michlewski, G. & Krzyzosiak, W. J. (2004). Molecular architecture of CAG repeats in human disease related transcripts. J Mol Biol 340, 665-679.

Miklos, I., Meyer, I. M. & Nagy, B. (2005). Moments of the Boltzmann distribution for RNA secondary structures. Bull Math Biol 67, 1031-1047.

Miller, S. A., Dykes, D. D. & Polesky, H. F. (1988). A simple salting out procedure for extracting DNA from human nucleated cells. Nucl Acids Res 16, 1215.

Miller, V. M., Xia, H., Marrs, G. L., Gouvion, C. M., Lee, G., Davidson, B. L. & Paulson, H. L. (2003). Allele-specific silencing of dominant disease genes. Proc Natl Acad Sci U S A 100(12), 7195-7200.

Minami, M., Katayama, T. & Satoh, M. (2006). Brain cytokines and chemokines: roles in ischemic injury and pain. J Pharmacol Sci 100, 461-470.

Modrek, B. & Lee, C. (2002). A genomic view of alternative splicing. Nat Genet 30, 13-19.

Moghadaszadeh, B., Petit, N., Jaillard, C., Brockington, M., Roy, S. Q., Merlini, L., Romero, N., Estournet, B., Desguerre, I., Chaigne, D. et al. (2001). Mutations in SEPN1 cause congenital muscular dystrophy with spinal rigidity and restrictive respiratory syndrome. Nat Genet 29, 17-18.

Moore, M. J. (2005). From birth to death: the complex lives of eukaryotic mRNAs. Science 309, 1514-1518.

Morley, M., Molony, C. M., Weber, T. M., Devlin, J. L., Ewens, K. G., Spielman, R. S. & Cheung, V. G. (2004). Genetic analysis of genome-wide variation in human gene expression. Nature 430(7001), 743-747.

Muller-Ladner, U., Pap, T., Gay, R. E., Neidhart, M. & Gay, S. (2005). Mechanisms of disease: the molecular and cellular basis of joint destruction in rheumatoid arthritis. Nat Clin Pract Rheumatol 1, 102-110.

Mumford, A. D., Vulliamy, T., Lindsay, J. & Watson, A. (1998). Hereditary hyperferritinemia-cataract syndrome: two novel mutations in the L-ferritin iron-responsive element. Blood 91, 367-368.

Mundo, E., Walker, M., Cate, T., Macciardi, F. & Kennedy, J. L. (2001). The role of serotonin transporter protein gene in -induced mania in bipolar disorder: Preliminary findings. Arch Gen Psychiatry 58(6), 539-544.

Murayama, N., Soyama, A., Saito, Y., Nakajima, Y., Komamura, K., Ueno, K., Kamakura, S., Kitakaze, M., Kimura, H., Goto, Y., Saitoh, O., Katoh, M., Ohnuma, T., Kawai, M., Sugai, K., Ohtsuki, T., Suzuki, C., Minami, N., Ozawa, S. & Sawada, J. (2004). Six novel nonsynonymous cyp1a2 gene polymorphisms: Catalytic activities of the naturally occurring variant enzymes. J Pharmacol Exp Ther 308(1), 300-306.

Murray, G. I., Pritchard, S., Melvin, W. T. & Burke, M. D. (1995). Cytochrome p450 cyp3a5 in the human anterior pituitary gland. FEBS Lett 364(1), 79-82.

Murray, L. A., Syed, F., Li, L., Griswold, D. E. & Das, A. M. (2006). Role of chemokines in severe asthma. Curr Drug Targets 7, 579-588.

Murrell, A., Rakyan, V. K. & Beck, S. (2005). From genome to epigenome. Hum Mol Genet 14, R3-R10.

Myakishev, M. V., Khripin, Y., Hu, S. & Hamer, D. H. (2001). High-throughput SNP genotyping by allele-specific PCR with universal energy-transfer-labeled primers. Genome Res 11(1), 163-169.

Myers, S. J., Huang, Y., Genetta, T. & Dingledine, R. (2004). Inhibition of glutamate receptor 2 translation by a polymorphic repeat sequence in the 5'-untranslated leaders. J Neurosci 24, 3489-3499.

Nackley, A. G., Shabalina, S. A., Tchivileva, I. E., Satterfield, K., Korchynskyi, O., Makarov, S. S., Maixner, W. & Diatchenko, L. (2006). Human catechol-O-methyltransferase haplotypes modulate protein expression by altering mRNA secondary structure. Science 314, 1930-1933.

Nakajima, M., Yokoi, T., Mizutani, M., Kinoshita, M., Funayama, M. & Kamataki, T. (1999). Genetic polymorphism in the 5'-flanking region of human cyp1a2 gene: Effect on the cyp1a2 inducibility in humans. J Biochem (Tokyo) 125(4), 803-808.

Nakamura, K., Chen, C. K., Sekine, Y., Iwata, Y., Anitha, A., Loh, E., Takei, N., Suzuki, A., Kawai, M., Takebayashi, K. et al. (2006). Association analysis of SOD2 variants with methamphetamine psychosis in Japanese and Taiwanese populations. Hum Genet 120, 243-252.

Nakamura, K., Goto, F., Ray, W. A., McAllister, C. B., Jacqz, E., Wilkinson, G. R. & Branch, R. A. (1985). Interethnic differences in genetic polymorphism of debrisoquin and mephenytoin hydroxylation between japanese and caucasian populations. Clin Pharmacol Ther 38(4), 402-408.

Nakamura, M., Ueno, S., Sano, A. & Tanabe, H. (2000). The human serotonin transporter gene linked polymorphism (5-httlpr) shows ten novel allelic variants. Mol Psychiatry 5(1), 32-38.

Nelson, D. R., Zeldin, D. C., Hoffman, S. M., Maltais, L. J., Wain, H. M. & Nebert, D. W. (2004). Comparison of cytochrome p450 (cyp) genes from the mouse and human genomes, including nomenclature recommendations for genes, pseudogenes and alternative-splice variants. Pharmacogenetics 14(1), 1-18.

Nestler, E., Hyman, S. & Malenka, R. (2001). Molecular Neuropharmacology: A foundation for Clinical Neuroscience. First ed., The McGraw-Hill Companies, Inc., New York, NY.

Ng, P. C. & Henikoff, S. (2002). Accounting for human polymorphisms predicted to affect protein function. Genome Res 12(3), 436-446.

Niemi, M., Schaeffeler, E., Lang, T., Fromm, M. F., Neuvonen, M., Kyrklund, C., Backman, J. T., Kerb, R., Schwab, M., Neuvonen, P. J., Eichelbaum, M. & Kivisto, K. T. (2004). High plasma pravastatin concentrations are associated with single nucleotide polymorphisms and haplotypes of organic anion transporting polypeptide-c (oatp-c, slco1b1). Pharmacogenetics 14(7), 429-440.

Nitta, K., Akiba, T., Kawashima, A., Kimata, N., Miwa, N., Uchida, K., Honda, K., Takei, T., Otsubo, S., Yumura, W. et al. (2001). Serum levels of macrophage colony-stimulating factor and aortic calcification in hemodialysis patients. Am J Nephrol 21, 465-470.

Nordmark, A., Lundgren, S., Ask, B., Granath, F. & Rane, A. (2002). The effect of the cyp1a2 *1f mutation on cyp1a2 inducibility in pregnant women. Br J Clin Pharmacol 54(5), 504-510.

Norton, N., Williams, N. M., Williams, H. J., Spurlock, G., Kirov, G., Morris, D. W., Hoogendoorn, B., Owen, M. J. & O'Donovan, M. C. (2002). Universal, robust, highly quantitative SNP allele frequency measurement in DNA pools. Hum Genet 110, 471-478.

Ohno, M., Yamaguchi, I., Yamamoto, I., Fukuda, T., Yokota, S., Maekura, R., Ito, M., Yamamoto, Y., Ogura, T., Maeda, K., Komuta, K., Igarashi, T. & Azuma, J. (2000). Slow n-acetyltransferase 2 genotype affects the incidence of isoniazid and rifampicin-induced hepatotoxicity. Int J Tuberc Lung Dis 4(3), 256-261.

O'Reilly, R. L., Bogue, L. & Singh, S. M. (1994). Pharmacogenetic response to in a multicase family with affective disorder. Biol Psychiatry 36(7), 467-471.

Oscarson, M., McLellan, R. A., Asp, V., Ledesma, M., Ruiz, M. L., Sinues, B., Rautio, A. & Ingelman-Sundberg, M. (2002). Characterization of a novel /cyp2a6 hybrid allele (cyp2a6*12) that causes reduced cyp2a6 activity. Hum Mutat 20(4), 275-283.

Pae, C. U., Kim, J. J., Yu, H. S., Lee, C. U., Lee, S. J., Jun, T. Y., Lee, C. & Paik, I. H. (2004). Monocyte chemoattractant protein-1 promoter -2518 polymorphism may have an influence on clinical heterogeneity of bipolar I disorder in the Korean population. Neuropsychobiology 49, 111-114.

Page, G. P., George, V., Go, R. C., Page, P. Z. & Allison, D. B. (2003). “Are we there yet?”: Deciding when one has demonstrated specific genetic causation in complex diseases and quantitative traits. Am J Hum Genet 73(4), 711-719.

Pai, H. V., Kommaddi, R. P., Chinta, S. J., Mori, T., Boyd, M. R. & Ravindranath, V. (2004). A frameshift mutation and alternate splicing in human brain generate a functional form of the pseudogene cytochrome p4502d7 that demethylates codeine to morphine. J Biol Chem 279(26), 27383-27389.

Pant, P. V., Tao, H., Beilharz, E. J., Ballinger, D. G., Cox, D. R. & Frazer, K. A. (2006). Analysis of allelic differential expression in human white blood cells. Genome Res 16, 331-339.

Panzer, U., Steinmetz, O. M., Stahl, R. A. & Wolf, G. (2006). Kidney diseases and chemokines. Curr Drug Targets 7, 65-80.

Papi, A., Luppi, F., Franco, F. & Fabbri, L. M. (2006). Pathophysiology of exacerbations of chronic obstructive pulmonary disease. Proc Am Thorac Soc 3, 245-251.

Papp, A. C., Pinsonneault, J. K., Cooke, G. & Sadée, W. (2003). Single nucleotide polymorphism genotyping using allele-specific PCR and fluorescence melting curves. Biotechniques 34, 1068-1072.

Pastinen, T. & Hudson, T. J. (2004). Cis-acting regulatory variation in the human genome. Science 306(5696), 647-650.

Pastinen, T., Ge, B. & Hudson, T. J. (2006). Influence of human genome polymorphism on gene expression. Hum Mol Genet 15, R9-R16.

Pastinen, T., Sladek, R., Gurd, S., Sammak, A., Ge, B., Lepage, P., Lavergne, K., Villeneuve, A., Gaudin, T., Brandstrom, H. et al. (2004). A survey of genetic and epigenetic variation affecting human gene expression. Physiol Genomics 16, 184-193.

Payton, M. A. & Sim, E. (1998). Genotyping human arylamine n-acetyltransferase type 1 (nat1): The identification of two novel allelic variants. Biochem Pharmacol 55(3), 361-366.

Pedersen, J. S., Bejerano, G., Siepel, A., Rosenbloom, K., Lindblad-Toh, K., Lander, E. S., Kent, J., Miller, W. & Haussler, D. (2006). Identification and Classification of Conserved RNA Secondary Structures in the Human Genome. PLoS Comput Biol 2, e33.

Pelkonen, O., Rautio, A., Raunio, H. & Pasanen, M. (2000). Cyp2a6: A human coumarin 7-hydroxylase. Toxicology 144(1-3), 139-147.

Pepine, C. J., Handberg, E., Cooper-DeHoff, R. M., Marks, R. G., Kowey, P., Messerli, F. H., Mancia, G., Cangiano, J. L., Garcia-Barreto, D., Keltai, M., Erdine, S., Bristol, H. A., Kolb, H. R., Bakris, G. L., Cohen, J. D. & Parmley, W. W., for the INVEST Investigators. (2003). A calcium antagonist versus a non-calcium antagonist hypertension treatment strategy for patients with coronary heart disease. The International Verapamil-Trandolapril Study (INVEST): a randomized controlled trial. J Am Med Assoc 290, 2805-2816.

Perez, B., Desviat, L. R., Rodriguez-Pombo, P., Clavero, S., Navarette, R., Perez-Cerda, C. & Ugarte, M. (2003). Propionic acidemia: identification of twenty-four novel mutations in Europe and North America. Mol Genet Metab 78, 59-67.

Petronis, A. (2003). Epigenetics in bipolar disorder: New opportunities and challenges. Am J Med Genet 123C, 65-75.

Phillips, K. A., Veenstra, D. L., Oren, E., Lee, J. K. & Sadée, W. (2001). Potential role of pharmacogenomics in reducing adverse drug reactions: A systematic review. J Am Med Assoc 286(18), 2270-2279.

Pinsonneault, J. & Sadée, W. (2003). Pharmacogenomics of multigenic diseases: Sex-specific differences in disease and treatment outcome. AAPS PharmSci 5(4), E29.

Pinsonneault, J. K., Papp, A. C. & Sadée, W. (2006). Allelic mRNA expression of X-linked monoamine oxidase a (MAOA) in human brain: dissection of epigenetic and genetic factors. Hum Mol Genet 15, 2636-2649.

Pinsonneault, J., Nielsen, C. U. & Sadée, W. (2004). Genetic variants of the human H+/dipeptide transporter PEPT2: analysis of haplotype functions. J Pharmacol Exp Ther 311, 1088-1096.

Pitarque, M., von Richter, O., Oke, B., Berkkan, H., Oscarson, M. & Ingelman-Sundberg, M. (2001). Identification of a single nucleotide polymorphism in the tata box of the cyp2a6 gene: Impairment of its promoter activity. Biochem Biophys Res Commun 284(2), 455-460.

Pitarque, M., von Richter, O., Rodriguez-Antona, C., Wang, J., Oscarson, M. & Ingelman-Sundberg, M. (2004). A nicotine c-oxidase gene (cyp2a6) polymorphism important for promoter activity. Hum Mutat 23(3), 258-266.

Plant, K. E., Green, P. M., Vetrie, D. & Flinter, F. A. (1999). Detection of mutations in COL4A5 in patients with Alport syndrome. Hum Mutat 13, 124-132.

Pola, R., Flex, A., Gaetani, E., Proia, A. S., Papaleo, P., Giorgio, A. D., Straface, G., Pecorini, G., Serricchio, M. & Pola, P. (2004). Monocyte chemoattractant protein-1 (MCP-1) gene polymorphism and risk of Alzheimer's disease in Italians. Exp Gerontol 39, 1249-1252.

Pollock, B. G., Ferrell, R. E., Mulsant, B. H., Mazumdar, S., Miller, M., Sweet, R. A., Davis, S., Kirshner, M. A., Houck, P. R., Stack, J. A., Reynolds, C. F. & Kupfer, D. J. (2000). Allelic variation in the serotonin transporter promoter affects onset of paroxetine treatment response in late-life depression. Neuropsychopharmacology 23(5), 587-590.

Popendikyte, V., Laurinavicius, A., Paterson, A. D., Macciardi, F., Kennedy, J. L. & Petronis, A. (1999). DNA methylation at the putative promoter region of the human dopamine D2 receptor gene. NeuroReport 10, 1249-1255.

Popowski, K., Sperker, B., Kroemer, H. K., John, U., Laule, M., Stangl, K. & Cascorbi, I. (2003). Functional significance of a hereditary adenine insertion variant in the 5'-UTR of the endothelin-1 gene. Pharmacogenetics 13, 445-451.

Probst-Kepper, M., Hecht, H. J., Herrmann, H., Janke, V., Ocklenburg, F., Klempnauer, J., van den Eynde, B. J. & Weiss, S. (2004). Conformational restraints and flexibility of 14-meric peptides in complex with HLA-B*3501. J Immunol 173, 5610-5616.

Puga, I., Lainez, B., Fernandez-Real, J. M., Buxade, M., Broch, M., Vendrell, J. & Espel, E. (2005). A polymorphism in the 3' untranslated region of the gene for tumor necrosis factor receptor 2 modulates reporter gene expression. Endocrinology 146, 2210-2220.

Rabello, D., Soedarsono, N., Kamei, H., Ishihara, Y., Noguchi, T., Fuma, D., Suzuki, M., Sakaki, Y., Yamaguchi, A. & Kojima, T. (2006). CSF1 gene associated with aggressive periodontitis in the Japanese population. Biochem Biophys Res Commun 347, 791-796.

Raimundo, S., Fischer, J., Eichelbaum, M., Griese, E. U., Schwab, M. & Zanger, U. M. (2000). Elucidation of the genetic basis of the common 'intermediate metabolizer' phenotype for drug oxidation by cyp2d6. Pharmacogenetics 10(7), 577-581.

Raimundo, S., Toscano, C., Klein, K., Fischer, J., Griese, E. U., Eichelbaum, M., Schwab, M. & Zanger, U. M. (2004). A novel intronic mutation, 2988g>a, with high predictivity for impaired function of cytochrome p450 2d6 in white subjects. Clin Pharmacol Ther 76(2), 128-138.

Ratain, M. J., Rosner, G., Allen, S. L., Costanza, M., Van Echo, D. A., Henderson, I. C. & Schilsky, R. L. (1995). Population pharmacodynamic study of amonafide: A cancer and leukemia group b study. J Clin Oncol 13(3), 741-747.

Rausch, J. L., Johnson, M. E., Fei, Y. J., Li, J. Q., Shendarkar, N., Hobby, H. M., Ganapathy, V. & Leibach, F. H. (2002). Initial conditions of serotonin transporter kinetics and genotype: Influence on ssri treatment trial outcome. Biol Psychiatry 51(9), 723-732.

Redon, R., Ishikawa, S., Fitch, K. R., Feuk, L., Perry, G. H., Andrews, T. D., Fiegler, H., Shapero, M. H., Carson, A. R., Chen, W. et al. (2006). Global variation in copy number in the human genome. Nature 444, 444-454.

Reist, C., Mazzanti, C., Vu, R., Tran, D. & Goldman, D. (2001). Serotonin transporter promoter polymorphism is associated with attenuated prolactin response to fenfluramine. Am J Med Genet 105(4), 363-368.

Reiter, C. & Weinshilboum, R. (1982). Platelet phenol sulfotransferase activity: Correlation with sulfate conjugation of acetaminophen. Clin Pharmacol Ther 32(5), 612-621.

Ren, J. (2000). High-throughput single-strand conformation polymorphism analysis by capillary electrophoresis. J Chromatogr B Biomed Sci Appl 741, 115-128.

Rendic, S. (2002). Summary of information on human cyp enzymes: Human p450 metabolism data. Drug Metab Rev 34(1-2), 83-448.

Rieder, M. J., Taylor, S. L., Clark, A. G. & Nickerson, D. A. (1999). Sequence variation in the human angiotensin converting enzyme. Nat Genet 22, 59-62.

Rittner, H. L. & Brack, A. (2006). Chemokines and pain. Curr Opin Investig Drugs 7, 643-646.

Rivas-Santiago, B., Vieyra-Reyes, P. & Araujo, Z. (2005). Cell immunity response in human pulmonary tuberculosis. Invest Clin 46, 391-412.

Rockman, M. V. & Kruglyak, L. (2006). Genetics of global gene expression. Nat Rev Genet 7, 862-872.

Rockman, M. V. & Wray, G. A. (2002). Abundant raw material for cis-regulatory evolution in humans. Mol Biol Evol 19(11), 1991-2004.

Rockman, M. V., Hahn, M. W., Soranzo, N., Zimprich, F., Goldstein, D. B. & Wray, G. A. (2005). Ancient and recent positive selection transformed opioid cis-regulation in humans. PLoS Biol 3, e387.

Rohrschneider, L. R., Bourette, R. P., Lioubin, M. N., Algate, P. A., Myles, G. M. & Carlberg, K. (1997). Growth and differentiation signals regulated by the M-CSF receptor. Mol Reprod Dev 46, 96-103.

Rollins, B. J. (2006). Inflammatory chemokines in cancer growth and progression. Eur J Cancer 42, 760-767.

Rosatto, N., Pontremoli, R., de Ferrari, G. & Ravazzolo, R. (1999). Intron 16 insertion of the angiotensin converting enzyme gene and transcriptional regulation. Nephrol Dial Transplant 14, 868-871.

Rosenkilde, M. M. & Kledal, T. N. (2006). Targeting herpesvirus reliance of the chemokine system. Curr Drug Targets 7, 103-118.

Rovin, B. H., Lu, L. & Saxena, R. (1999). A novel polymorphism in the MCP-1 gene regulatory region that influences MCP-1 expression. Biochem Biophys Res Commun 259, 344-348.

Rushmore, T. H. & Kong, A. N. (2002). Pharmacogenomics, regulation and signaling pathways of phase i and ii drug metabolizing enzymes. Curr Drug Metab 3(5), 481-490.

Russcher, H., Smit, P., van den Akker, E. L., van Rossum, E. F., Brinkmann, A. O., de Jong, F. H., Lamberts, S. W. & Koper, J. W. (2005a). Two polymorphisms in the glucocorticoid receptor gene directly affect glucocorticoid-regulated gene expression. J Clin Endocrinol Metab 90, 5804-5810.

Russcher, H., van Rossum, E. F., de Jong, F. H., Brinkmann, A. O., Lamberts, S. W. & Koper, J. W. (2005b). Increased expression of the glucocorticoid receptor-A translational isoform as a result of the ER22/23EK polymorphism. Mol Endocrinol 19, 1687-1696.

Ryffel, G. U. (2001). Mutations in the human genes encoding the transcription factors of the hepatocyte nuclear factor (hnf)1 and hnf4 families: Functional and pathological consequences. J Mol Endocrinol 27(1), 11-29.

Sachse, C., Brockmoller, J., Bauer, S. & Roots, I. (1999). Functional significance of a c-->a polymorphism in intron 1 of the cytochrome p450 cyp1a2 gene tested with caffeine. Br J Clin Pharmacol 47(4), 445-449.

Sachse, C., Brockmoller, J., Bauer, S., Reum, T. & Roots, I. (1996). A rare insertion of t226 in exon 1 of cyp2d6 causes a frameshift and is associated with the poor metabolizer phenotype: Cyp2d6*15. Pharmacogenetics 6(3), 269-272.

Saeki, M., Saito, Y., Nakamura, T., Murayama, N., Kim, S. R., Ozawa, S., Komamura, K., Ueno, K., Kamakura, S., Nakajima, T., Saito, H., Kitamura, Y., Kamatani, N. & Sawada, J. (2003). Single nucleotide polymorphisms and haplotype frequencies of cyp3a5 in a japanese population. Hum Mutat 21(6), 653.

Saito, S., Iida, A., Sekine, A., Kawauchi, S., Higuchi, S., Ogawa, C. & Nakamura, Y. (2003). Catalog of 680 variations among eight cytochrome p450 (cyp) genes, nine esterase genes, and two other genes in the japanese population. J Hum Genet 48(5), 249-270.

Sakaeda, T., Nakamura, T., Horinouchi, M., Kakumoto, M., Ohmoto, N., Sakai, T., Morita, Y., Tamura, T., Aoyama, N., Hirai, M., Kasuga, M. & Okumura, K. (2001). Mdr1 genotype-related pharmacokinetics of digoxin after single oral administration in healthy japanese subjects. Pharm Res 18(10), 1400-1404.

Sam, F., Kerstetter, D. L., Pimental, D. R., Mulukutla, S., Tabaee, A., Bristow, M. R., Colucci, W. S. & Sawyer, D. B. (2005). Increased reactive oxygen species production and functional alterations in antioxidant enzymes in human failing myocardium. J Card Fail 11, 473-480.

Sanchez, E., Sabio, J. M., Callejas, J. L., de Ramon, E., Garcia-Portales, R., Garcia-Hernandez, F. J., Jimenez-Alonso, J., Gonzalez-Escribano, M. F., Martin, J. & Koeleman, B. P. (2006). Association study of genetic variants of pro-inflammatory chemokine and cytokine genes in systemic lupus erythematosus. BMC Med Genet 7, 48.

Sandovici, I., Naumova, A. K., Leppert, M., Linares, Y. & Sapienza, C. (2004). A longitudinal study of X-inactivation in human females. Hum Genet 115, 387-392.

Sanger, F. & Tuppy, H. (1951). The amino-acid sequence in the phenylalanyl chain of insulin. 1. The identification of lower peptides from partial hydrolysates. Biochem J 49, 463-481.

Sarkar, G., Yoon, H. S. & Sommer, S. S. (1992). Screening for mutations by RNA single-strand conformation polymorphism (rSSCP): comparison with DNA-SSCP. Nucl Acids Res 20, 871-878.

Saxena, R., Shaw, G. L., Relling, M. V., Frame, J. N., Moir, D. T., Evans, W. E., Caporaso, N. & Weiffenbach, B. (1994). Identification of a new variant cyp2d6 allele with a single base deletion in exon 3 and its association with the poor metabolizer phenotype. Hum Mol Genet 3(6), 923-926.

Sayed-Tabatabaei, F. A., Oostra, B. A., Isaacs, A., van Duijn, C. M. & Witteman, J. C. M. (2006). ACE polymorphisms. Circ Res 98, 1123-1133.

Schaaf, M. J. & Cidlowski, J. A. (2002). Auuua motifs in the 3'utr of human glucocorticoid receptor alpha and beta mrna destabilize mrna and decrease receptor protein expression. 67(7), 627-636.

Schaller, M., Hogaboam, C. M., Lukacs, N. & Kunkel, S. L. (2006). Respiratory viral infections drive chemokine expression and exacerbate the asthmatic response. J Allergy Clin Immunol 118, 295-302.

Scherl, E. & Frissora, C. L. (2003). Irritable bowel syndrome genophenomics: Correlation of serotonin-transporter polymorphisms and alosetron response. Pharmacogenomics J 3(2), 64-66.

Schmitz, G. & Drobnik, W. (2003). Pharmacogenomics and pharmacogenetics of cholesterol-lowering therapy. Clin Chem Lab Med 41(4), 581-589.

Schuetz, E. G., Schuetz, J. D., Grogan, W. M., Naray-Fejes-Toth, A., Fejes-Toth, G., Raucy, J., Guzelian, P., Gionela, K. & Watlington, C. O. (1992). Expression of cytochrome p450 3a in amphibian, rat, and human kidney. Arch Biochem Biophys 294(1), 206-214.

Schuetz, E. G., Strom, S., Yasuda, K., Lecureur, V., Assem, M., Brimer, C., Lamba, J., Kim, R. B., Ramachandran, V., Komoroski, B. J., Venkataramanan, R., Cai, H., Sinal, C. J., Gonzalez, F. J. & Schuetz, J. D. (2001). Disrupted bile acid homeostasis reveals an unexpected interaction among nuclear hormone receptors, transporters, and cytochrome p450. J Biol Chem 276(42), 39411-39418.

Schwarz, U. I. (2003). Clinical relevance of genetic polymorphisms in the human cyp2c9 gene. Eur J Clin Invest 33 Suppl 2, 23-30.

Secko, D. (2004). Computing gene regulation. The Scientist 18, 28-29.

Seffens, W. & Digby, D. (1999). mRNAs have greater negative folding free energies than shuffled or codon choice randomized sequences. Nucl Acids Res 27, 1578-1584.

Segerer, S. & Nelson, P. J. (2005). Chemokines in renal diseases. ScientificWorldJournal 5, 835-844.

Sehgal, A. R. (2004). Overlap between whites and blacks in response to antihypertensive drugs. Hypertension 43, 566-572.

Shalev, A., Blair, P. J., Hoffmann, S. C., Hirshberg, B., Peculis, B. A. & Harlan, D. M. (2002). A proinsulin gene splice variant with increased translation efficiency is expressed in human pancreatic islets. Endocrinology 143, 2541-2547.

Sheets, M. D., Ogg, S. C. & Wickens, M. P. (1990). Point mutations in aauaaa and the poly (a) addition site: Effects on the accuracy and efficiency of cleavage and polyadenylation in vitro. Nucl Acids Res 18(19), 5799-5805.

Shen, L. X., Basilion, J. P. & Stanton, V. P., Jr. (1999). Single-nucleotide polymorphisms can cause different structural folds of mRNA. Proc Natl Acad Sci U S A 96, 7871-7876.

Shimada, T., Yamazaki, H., Mimura, M., Inui, Y. & Guengerich, F. P. (1994). Interindividual variations in human liver cytochrome p-450 enzymes involved in the oxidation of drugs, carcinogens and toxic chemicals: Studies with liver microsomes of 30 japanese and 30 caucasians. J Pharmacol Exp Ther 270(1), 414-423.

Shimoda, K., Someya, T., Morita, S., Hirokane, G., Yokono, A., Takahashi, S. & Okawa, M. (2002). Lack of impact of cyp1a2 genetic polymorphism (c/a polymorphism at position 734 in intron 1 and g/a polymorphism at position -2964 in the 5'-flanking region of cyp1a2) on the plasma concentration of haloperidol in smoking male japanese with schizophrenia. Prog Neuropsychopharmacol Biol Psychiatry 26(2), 261-265.

Shriver, M. A., Mei, R., Parra, E. J., Sonpar, V., Tishkoff, S. A., Schurr, T. G., Zhadanov, S. I., Osipova, L. P., Brutsaert, T. D., Friedlaender, J., Jorde, L. B., Watkins, W. S., Bamshad, M. J., Gutierrez, G., Loi, H., Matsuzaki, H., Kittles, R. A., Argyropoulos, G., Fernandez, J. R., Akey, J. M. & Jones, K. W. (2005). Large-scale SNP analysis reveals clustered and continuous patterns of human genetic variation. Hum Genomics 2(2), 81-89.

Shyy, Y. J., Wickham, L. L., Hagan, J. P., Hsieh, H. J., Hu, Y. L., Telian, S. H., Valente, A. J., Sung, K. L. & Chien, S. (1993). Human monocyte colony-stimulating factor stimulates the gene expression of monocyte chemotactic protein-1 and increases the adhesion of monocytes to endothelial monolayers. J Clin Invest 92, 1745-1751.

Sierakowska, H., Sambade, M. J., Agrawal, S. & Kole, R. (1996). Repair of thalassemic human beta-globin mrna in mammalian cells by antisense oligonucleotides. Proc Natl Acad Sci U S A 93(23), 12840-12844.

Simeoni, E., Hoffmann, M. M., Winkelmann, B. R., Ruiz, J., Fleury, S., Boehm, B. O., Marz, W. & Vassalli, G. (2004). Association between the A-2518G polymorphism in the monocyte chemoattractant protein-1 gene and insulin resistance and Type 2 diabetes mellitus. Diabetologia.

Singer-Sam, J., Chapman, V., LeBon, J. M. & Riggs, A. D. (1992). Parental imprinting studied by allele-specific primer extension after pcr: Paternal x chromosome-linked genes are transcribed prior to preferential paternal x chromosome inactivation. Proc Natl Acad Sci U S A 89(21), 10469-10473.

Sjoblom, T., Jones, S., Wood, L. D., Parsons, W., Lin, J., Barber, T. D., Mandelker, D., Leary, R. J., Ptak, J., Silliman, N., Szabo, S., Buckhaults, P., Farrell, C., Meeh, P., Markowitz, S. D., Willis, J., Dawson, D., Willson, J. K. V., Gazdar, A. F., Hartigan, J., Wu, L., Liu, C., Parmigiani, G., Park, B. H., Bachman, K. E., Papadopoulos, N., Vogelstein, B., Kinzler, K. W. & Velculescu, V. E. (2006). The consensus coding sequences of human breast and colorectal cancers. Science 314, 268-274.

Sobczak, K., de Mezer, M., Michlewski, G., Krol, J. & Krzyzosiak, W. J. (2003). RNA structure of trinucleotide repeats associated with human neurological diseases. Nucl Acids Res 31, 5469-5482.

Sordella, R., Bell, D. W., Haber, D. A. & Settleman, J. (2004). Gefitinib-sensitizing egfr mutations in lung cancer activate anti-apoptotic pathways. Science 305(5687), 1163-1167.

Sorensen, T. L. (2004). Targeting the chemokine receptor CXCR3 and its ligand CXCL10 in the central nervous system: potential therapy for inflammatory demyelinating disease? Curr Neurovasc Res 1, 183-190.

Sousa, A. R., Lane, S. J., Cidlowski, J. A., Staynov, D. Z. & Lee, T. H. (2000). Glucocorticoid resistance in asthma is associated with elevated in vivo expression of the glucocorticoid receptor beta-isoform. J Allergy Clin Immunol 105(5), 943-950.

Speicker, M., Darius, H., Hankeln, T., Soufi, M., Sattler, A., Schaefer, J. R., Node, K., Borgel, J., Mugge, A., Lindpaintner, K., Huesing, A., Maisch, B., Zeldin, D. C. & Liao, J. K. (2004). Risk of coronary artery disease associated with polymorphism of a cytochrome p450 , cyp2j2. Circulation 110, 2132-2136.

Spielman, R. S. & Ewens, W. J. (1996). The tdt and other family-based tests for linkage disequilibrium and association. Am J Hum Genet 59(5), 983-989.

Spire-Vayron de la Moureyre, C., Debuysere, H., Fazio, F., Sergent, E., Bernard, C., Sabbagh, N., Marez, D., Lo Guidice, J. M., D'Halluin J, C. & Broly, F. (1999). Characterization of a variable number tandem repeat region in the thiopurine s-methyltransferase gene promoter. Pharmacogenetics 9(2), 189-198.

Spire-Vayron de la Moureyre, C., Debuysere, H., Mastain, B., Vinner, E., Marez, D., Lo Guidice, J. M., Chevalier, D., Brique, S., Motte, K., Colombel, J. F., Turck, D., Noel, C., Flipo, R. M., Pol, A., Lhermitte, M., Lafitte, J. J., Libersa, C. & Broly, F. (1998). Genotypic and phenotypic analysis of the polymorphic thiopurine s-methyltransferase gene (tpmt) in a european population. Br J Pharmacol 125(4), 879-887.

Spurdle, A. B., Goodwin, B., Hodgson, E., Hopper, J. L., Chen, X., Purdie, D. M., McCredie, M. R., Giles, G. G., Chenevix-Trench, G. & Liddle, C. (2002). The cyp3a4*1b polymorphism has no functional significance and is not associated with risk of breast or ovarian cancer. Pharmacogenetics 12(5), 355-366.

Stamatoyannopoulos, J. A. (2004). The genomics of gene expression. Genomics 84(3), 449-457.

Steinberger, D., Blau, N., Goriuonov, D., Bitsch, J., Zuker, M., Hummel, S. & Muller, U. (2004). Heterozygous mutation in 5'-untranslated region of sepiapterin reductase gene (SPR) in a patient with dopa-responsive dystonia. Neurogenetics 5, 187-190.

Strittmatter, W. J., Saunders, A. M., Schmechel, D., Pericak-Vance, M., Enghild, J., Salvesen, G. S. & Roses, A. D. (1993). Apolipoprotein E: high-avidity binding to beta-amyloid and increased frequency of type 4 allele in late-onset familial Alzheimer disease. Proc Natl Acad Sci U S A 90(5), 1977-1981.

Stroup, D. F., Berlin, J. A., Morton, S. C., Olkin, I., Williamson, G. D., Rennie, D., Moher, D., Becker, B. J., Sipe, T. A. & Thacker, S. B. (2000). Meta-analysis of observational studies in epidemiology: a proposal for reporting. Meta-analysis Of Observational Studies in Epidemiology (MOOSE) group. J Am Med Assoc 283, 2008-2012.

Sun, H. F., Chang, Y. T., Fann, C. S., Chang, C. J., Chen, Y. H., Hsu, Y. P., Yu, W. Y. & Cheng, A. T. (2002). Association study of novel human serotonin 5-ht(1b) polymorphisms with alcohol dependence in taiwanese han. Biol Psychiatry 51(11), 896-901.

Sun, J., He, Z. G., Cheng, G., Wang, S. J., Hao, X. H. & Zou, M. J. (2004). Multidrug resistance p-glycoprotein: Crucial significance in drug disposition and interaction. Med Sci Monit 10(1), RA5-14.

Szabo, P. E. & Mann, J. R. (1995). Allele-specific expression and total expression levels of imprinted genes during early mouse development: implications for imprinting mechanisms. Genes Dev 9, 3097-3108.

Szalai, C., Duba, J., Prohaszka, Z., Kalina, A., Szabo, T., Nagy, B., Horvath, L. & Csaszar, A. (2001a). Involvement of polymorphisms in the chemokine system in the susceptibility for coronary artery disease (CAD). Coincidence of elevated Lp(a) and MCP-1 -2518 G/G genotype in CAD patients. Atherosclerosis 158, 233-239.

Szalai, C., Kozma, G. T., Nagy, A., Bojszko, A., Krikovszky, D., Szabo, T. & Falus, A. (2001b). Polymorphism in the gene regulatory region of MCP-1 is associated with asthma susceptibility and severity. J Allergy Clin Immunol 108, 375-381.

Takane, H., Kobayashi, D., Hirota, T., Kigawa, J., Terakawa, N., Otsubo, K. & Ieiri, I. (2004). Haplotype-oriented genetic analysis and functional assessment of promoter variants in the MDR1 (ABCB1) gene. J Pharmacol Exp Ther 311, 1179-1187.

Tang, Z. Z., Liang, M. C., Lu, S., Yu, D., Yu, C. Y., Yue, D. T. & Soong, T. W. (2004). Transcript scanning reveals novel and extensive splice variations in human L-type voltage-gated calcium channel, Cav1.2 alpha1 subunit. J Biol Chem 279, 44335-44343.

Taylor, A. L., Ziesche, S., Yancy, C., Carson, P., D’Agostino, R., Ferdinand, K., Taylor, M., Adams, K., Sabolinski, M., Worcel, M. & Cohn, J. N., for The African-American Heart Failure Trial Investigators. (2004). Combination of isosorbide dinitrate and hydralazine in blacks with heart failure. N Engl J Med 351, 2049-2057. [Erratum, N Engl J Med 2005, 352, 1276-b].

Tebo, J., Der, S., Frevel, M., Khabar, K. S., Williams, B. R. & Hamilton, T. A. (2003). Heterogeneity in control of mRNA stability by AU-rich elements. J Biol Chem 278(14), 12085-12093.

Tedgui, A. & Mallat, Z. (2006). Cytokines in atherosclerosis: pathogenic and regulatory pathways. Physiol Rev 86, 515-581.

The Beta-Blocker Evaluation of Survival Trial Investigators. (2001). A trial of the beta-blocker bucindolol in patients with advanced chronic heart failure. N Engl J Med 344, 1659.

The International HapMap Consortium. (2005). A haplotype map of the human genome. Nature 437, 1299-1320.

Tirona, R. G. & Kim, R. B. (2002). Pharmacogenomics of organic anion-transporting polypeptides (OATP). Adv Drug Deliv Rev 54(10), 1343-1352.

Tirona, R. G., Lee, W., Leake, B. F., Lan, L. B., Cline, C. B., Lamba, V., Parviz, F., Duncan, S. A., Inoue, Y., Gonzalez, F. J., Schuetz, E. G. & Kim, R. B. (2003). The orphan nuclear receptor HNF4alpha determines PXR- and CAR-mediated xenobiotic induction of CYP3A4. Nat Med 9(2), 220-224.

Tucci, M., Barnes, E. V., Sobel, E. S., Croker, B. P., Segal, M. S., Reeves, W. H. & Richards, H. B. (2004). Strong association of a functional polymorphism in the monocyte chemoattractant protein 1 promoter gene with lupus nephritis. Arthritis Rheum 50, 1842-1849.

Tucci, M., Calvani, N., Richards, H. B., Quatraro, C. & Silvestris, F. (2005). The interplay of chemokines and dendritic cells in the pathogenesis of lupus nephritis. Ann N Y Acad Sci 1051, 421-432.

Tucci, M., Quatraro, C., Frassanito, M. A. & Silvestris, F. (2006). Deregulated expression of monocyte chemoattractant protein-1 (MCP-1) in arterial hypertension: role in endothelial inflammation and atheromasia. J Hypertens 24, 1307-1318.

Ubogu, E. E., Cossoy, M. B. & Ransohoff, R. M. (2006). The expression and function of chemokines involved in CNS inflammation. Trends Pharmacol Sci 27, 48-55.

Ulrich, C. M., Bigler, J., Velicer, C. M., Greene, E. A., Farin, F. M. & Potter, J. D. (2000). Searching expressed sequence tag databases: Discovery and confirmation of a common polymorphism in the thymidylate synthase gene. Cancer Epidemiol Biomarkers Prev 9(12), 1381-1385.

Uno, Y., Sakamoto, Y., Yoshida, K., Hasegawa, T., Hasegawa, Y., Koshino, T. & Inoue, I. (2003). Characterization of six base pair deletion in the putative HNF1-binding site of human PXR promoter. J Hum Genet 48(11), 594-597.

Vaglenova, J., Martinez, S. E., Porte, S., Duester, G., Farres, J. & Pares, X. (2003). Expression, localization and potential physiological significance of alcohol dehydrogenase in the gastrointestinal tract. Eur J Biochem 270(12), 2652-2662.

Valenti, L., Conte, D., Piperno, A., Dongiovanni, P., Fracanzani, A. L., Fraquelli, M., Vergani, A., Gianni, C., Carmagnola, L. & Fargion, S. (2004). The mitochondrial superoxide dismutase A16V polymorphism in the cardiomyopathy associated with hereditary haemochromatosis. J Med Genet 41, 946-950.

Van Kuilenburg, A. B., Meinsma, R., Zoetekouw, L. & Van Gennip, A. H. (2002). High prevalence of the IVS14 + 1G>A mutation in the dihydropyrimidine dehydrogenase gene of patients with severe 5-fluorouracil-associated toxicity. Pharmacogenetics 12(7), 555-558.

Vandenbroucke, I. I., Vandesompele, J., Paepe, A. D. & Messiaen, L. (2001). Quantification of splice variants using real-time PCR. Nucl Acids Res 29, E68-68.

Vasiliou, V., Bairoch, A., Tipton, K. F. & Nebert, D. W. (1999). Eukaryotic aldehyde dehydrogenase (ALDH) genes: human polymorphisms and recommended nomenclature based on divergent evolution and chromosomal mapping. Pharmacogenetics 9(4), 421-434.

Vickers, T. A., Wyatt, J. R. & Freier, S. M. (2000). Effects of RNA secondary structure on cellular antisense activity. Nucl Acids Res 28, 1340-1347.

Villafranca, E., Okruzhnov, Y., Dominguez, M. A., Garcia-Foncillas, J., Azinovic, I., Martinez, E., Illarramendi, J. J., Arias, F., Martinez Monge, R., Salgado, E., Angeletti, S. & Brugarolas, A. (2001). Polymorphisms of the repeated sequences in the enhancer region of the thymidylate synthase gene promoter may predict downstaging after preoperative chemoradiation in rectal cancer. J Clin Oncol 19(6), 1779-1786.

Villette, S., Kyle, J. A., Brown, K. M., Pickard, K., Milne, J. S., Nicol, F., Arthur, J. R. & Hesketh, J. E. (2002). A novel single nucleotide polymorphism in the 3' untranslated region of human glutathione peroxidase 4 influences lipoxygenase metabolism. Blood Cells Mol Dis 29, 174-178.

Vilmi, T., Moilanen, J. S., Finnila, S. & Majamaa, K. (2005). Sequence variation in the tRNA genes of human mitochondrial DNA. J Mol Evol 60, 587-597.

von Richter, O., Pitarque, M., Rodriguez-Antona, C., Testa, A., Mantovani, R., Oscarson, M. & Ingelman-Sundberg, M. (2004). Polymorphic NF-Y dependent regulation of human nicotine C-oxidase (CYP2A6). Pharmacogenetics 14(6), 369-379.

Vuchetich, J. P., Weinshilboum, R. M. & Price, R. A. (1995). Segregation analysis of human red blood cell thiopurine methyltransferase activity. Genet Epidemiol 12(1), 1-11.

Wang, D., Johnson, A. D., Papp, A. C., Kroetz, D. L. & Sadée, W. (2005). Multidrug resistance polypeptide 1 (MDR1, ABCB1) variant 3435C>T affects mRNA stability. Pharmacogenet Genomics 15, 693-704.

Wang, D., Papp, A. C., Binkley, P. F., Johnson, J. A. & Sadée, W. (2006). Highly variable mRNA expression and splicing of L-type voltage-dependent calcium channel alpha subunit 1C (CACNA1C) in human heart tissues. Pharmacogenet Genomics 16, 735-745.

Wang, J., Freeman, D. J., Grundy, S. M., Levine, D. M., Guerra, R. & Cohen, J. C. (1998). Linkage between cholesterol 7alpha-hydroxylase and high plasma low-density lipoprotein cholesterol concentrations. J Clin Invest 101(6), 1283-1291.

Washietl, S., Hofacker, I. L., Lukasser, M., Huttenhofer, A. & Stadler, P. F. (2005). Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome. Nat Biotechnol 23, 1383-1390.

Wasserman, W. W. & Sandelin, A. (2004). Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet 5(4), 276-287.

Watson, J. D. & Crick, F. H. (1953). Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid. Nature 171(4356), 737-738.

Weber, M. J. (2005). New human and mouse microRNA genes found by homology search. FEBS J 272, 59-73.

Wegscheider, B. J., Weger, M., Renner, W., Posch, U., Ulrich, S., Hermann, J., Ardjomand, N., Haller-Schober, E. M. & El-Shabrawi, Y. (2005). Role of the CCL2/MCP-1 -2518A>G gene polymorphism in HLA-B27 associated uveitis. Mol Vis 11, 896-900.

Wei, X., McLeod, H. L., McMurrough, J., Gonzalez, F. J. & Fernandez-Salguero, P. (1996). Molecular basis of the human dihydropyrimidine dehydrogenase deficiency and 5-fluorouracil toxicity. J Clin Invest 98(3), 610-615.

Weinshilboum, R. (1990). Sulfotransferase pharmacogenetics. Pharmacol Ther 45(1), 93-107.

Wenz, H. M. (2004). A novel high-throughput SNP genotyping system utilizing capillary electrophoresis detection platforms. Applied Biosystems, 850 Lincoln Centre Dr, Foster City, CA 94404, USA.

Westlind, A., Lofberg, L., Tindberg, N., Andersson, T. B. & Ingelman-Sundberg, M. (1999). Interindividual differences in hepatic expression of CYP3A4: Relationship to genetic polymorphism in the 5'-upstream regulatory region. Biochem Biophys Res Commun 259(1), 201-205.

Westlind-Johnsson, A., Malmebo, S., Johansson, A., Otter, C., Andersson, T. B., Johansson, I., Edwards, R. J., Boobis, A. R. & Ingelman-Sundberg, M. (2003). Comparative analysis of CYP3A expression in human liver suggests only a minor role for CYP3A5 in drug metabolism. Drug Metab Dispos 31(6), 755-761.

Whitney, A. R., Diehn, M., Popper, S. J., Alizadeh, A. A., Boldrick, J. C., Relman, D. A. & Brown, P. O. (2003). Individuality and variation in gene expression patterns in human blood. Proc Natl Acad Sci U S A 100(4), 1896-1901.

Wilschanski, M., Yahav, Y., Yaacov, Y., Blau, H., Bentur, L., Rivlin, J., Aviram, M., Bdolah-Abram, T., Bebok, Z., Shushi, L., Kerem, B. & Kerem, E. (2003). Gentamicin-induced correction of CFTR function in patients with cystic fibrosis and CFTR stop mutations. N Engl J Med 349(15), 1433-1441.

Wittkopp, P. J., Haerum, B. K. & Clark, A. G. (2004). Evolutionary changes in cis and trans gene regulation. Nature 430(6995), 85-88.

Wojnowski, L. & Brockmoller, J. (2004). Single nucleotide polymorphism characterization by mRNA expression imbalance assessment. Pharmacogenetics 14(4), 267-269.

Workman, C. & Krogh, A. (1999). No evidence that mRNAs have lower folding free energies than random sequences with the same dinucleotide distribution. Nucl Acids Res 27, 4816-4822.

Wrighton, S. A., Ring, B. J., Watkins, P. B. & VandenBranden, M. (1989). Identification of a polymorphically expressed member of the human cytochrome P-450III family. Mol Pharmacol 36(1), 97-105.

Xu, Y., Krishnan, A., Wan, X. S., Majima, H., Yeh, C., Ludewig, G., Kasarskis, E. J. & St. Clair, D. K. (1999). Mutations in the promoter reveal a cause for the reduced expression of the human manganese superoxide dismutase gene in cancer cells. Oncogene 18, 93-102.

Yamanaka, H., Nakajima, M., Katoh, M., Hara, Y., Tachibana, O., Yamashita, J., McLeod, H. L. & Yokoi, T. (2004). A novel polymorphism in the promoter region of human UGT1A9 gene (UGT1A9*22) and its effects on the transcriptional activity. Pharmacogenetics 14(5), 329-332.

Yamano, S., Nhamburo, P. T., Aoyama, T., Meyer, U. A., Inaba, T., Kalow, W., Gelboin, H. V., McBride, O. W. & Gonzalez, F. J. (1989). cDNA cloning and sequence and cDNA-directed expression of human P450 IIB1: Identification of a normal and two variant cDNAs derived from the CYP2B locus on chromosome 19 and differential expression of the IIB mRNAs in human liver. Biochemistry 28(18), 7340-7348.

Yamazaki, H., Kiyotani, K., Tsubuko, S., Matsunaga, M., Fujieda, M., Saito, T., Miura, J., Kobayashi, S. & Kamataki, T. (2003). Two novel haplotypes of CYP2D6 gene in a Japanese population. Drug Metab Pharmacokinet 18(4), 269-271.

Yan, H. & Zhou, W. (2004). Allelic variations in gene expression. Curr Opin Oncol 16(1), 39-43.

Yan, H., Dobbie, Z., Gruber, S. B., Markowitz, S., Romans, K., Giardiello, F. M., Kinzler, K. W. & Vogelstein, B. (2002). Small changes in expression affect predisposition to cancer. Nat Genet 30, 25-26.

Yan, H., Yuan, W., Velculescu, V. E., Vogelstein, B. & Kinzler, K. W. (2002). Allelic variation in human gene expression. Science 297(5584), 1143.

Yan, L., Zhang, S., Eiff, B., Szumlanski, C. L., Powers, M., O'Brien, J. F. & Weinshilboum, R. M. (2000). Thiopurine methyltransferase polymorphic tandem repeat: Genotype-phenotype correlation analysis. Clin Pharmacol Ther 68(2), 210-219.

Yang, B., Houlberg, K., Millward, A. & Demaine, A. (2004). Polymorphisms of chemokine and chemokine receptor genes in Type 1 diabetes mellitus and its complications. Cytokine 26, 114-121.

Yin, S. J., Chou, C. F., Lai, C. L., Lee, S. L. & Han, C. L. (2003). Human class IV alcohol dehydrogenase: Kinetic mechanism, functional roles and medical relevance. Chem Biol Interact 143-144, 219-227.

Yoshida, R., Nakajima, N., Nishimura, K., Tokudome, S., Kwon, J. T. & Yokoi, T. (2003). Effects of polymorphism in promoter region of human CYP2A6 gene (CYP2A6*9) on expression level of messenger ribonucleic acid and enzymatic activity in vivo and in vitro. Clin Pharmacol Ther 74(1), 69-76.

Yoshimura, T., Matsushima, K., Tanaka, S., Robinson, E. A., Appella, E., Oppenheim, J. J. & Leonard, E. J. (1987). Purification of a human monocyte-derived neutrophil chemotactic factor that has peptide sequence similarity to other host defense cytokines. Proc Natl Acad Sci U S A 84, 9233-9237.

Zambon, A., Deeb, S. S., Brown, B. G., Hokanson, J. E. & Brunzell, J. D. (2001). Common hepatic lipase gene promoter variant determines clinical response to intensive lipid-lowering treatment. Circulation 103(6), 792-798.

Zanger, U. M., Fischer, J., Raimundo, S., Stuven, T., Evert, B. O., Schwab, M. & Eichelbaum, M. (2001). Comprehensive analysis of the genetic factors determining expression and function of hepatic CYP2D6. Pharmacogenetics 11(7), 573-585.

Zanger, U. M., Raimundo, S. & Eichelbaum, M. (2004). Cytochrome P450 2D6: Overview and update on pharmacology, genetics, biochemistry. Naunyn Schmiedebergs Arch Pharmacol 369(1), 23-37.

Zhang, J., Kuehl, P., Green, E. D., Touchman, J. W., Watkins, P. B., Daly, A., Hall, S. D., Maurel, P., Relling, M., Brimer, C., Yasuda, K., Wrighton, S. A., Hancock, M., Kim, R. B., Strom, S., Thummel, K., Russell, C. G., Hudson, J. R., Jr., Schuetz, E. G. & Boguski, M. S. (2001). The human pregnane X receptor: Genomic structure and identification and functional characterization of natural allelic variants. Pharmacogenetics 11(7), 555-572.

Zhang, Q., Sun, X., Watt, E. D. & Al-Hashimi, H. M. (2006). Resolving the motional modes that code for RNA adaptation. Science 311, 653-656.

Zhang, X., Caggana, M., Cutler, T. L. & Ding, X. (2004). Development of a real-time PCR-based method for the measurement of relative allelic expression, and identification of alleles with decreased expression in human lung. J Pharmacol Exp Ther 311, 373-381.

Zhang, Y., Wang, D., Johnson, A. D., Papp, A. C. & Sadée, W. (2005). Allelic expression imbalance of human mu opioid receptor (OPRM1) caused by variant A118G. J Biol Chem 280, 32618-32624.

Zhu, X., Bouzekri, N., Southam, L., Cooper, R. S., Adeyemo, A., McKenzie, C. A., Luke, A., Chen, G., Elston, R. C. & Ward, R. (2001). Linkage and association analysis of angiotensin I-converting enzyme (ACE)-gene polymorphisms with ACE concentration and blood pressure. Am J Hum Genet 68, 1139-1148.

Zielinska, E., Niewiarowski, W. & Bodalski, J. (1998). The arylamine N-acetyltransferase (NAT2) polymorphism and the risk of adverse reactions to co-trimoxazole in children. Eur J Clin Pharmacol 54(9-10), 779-785.

Zlotnik, A. (2006). Chemokines and cancer. Int J Cancer 119, 2026-2029.

Zuker, M. (2003). Mfold web server for nucleic acid folding and hybridization prediction. Nucl Acids Res 31(13), 3406-3415.