Functional genomics of the VNN1

Luke Aris Diepeveen BSc, Hons (Genetics)

School of Chemistry and Biochemistry University of Western Australia

Supervisors: W/Prof Lawrence Abraham A/Prof Daniela Ulgiati

This thesis is presented for the degree of Doctor of Philosophy at the University of Western Australia

July 2013 Declaration

I declare that this thesis describes my own research and work not previously submitted at any other university.

Luke Aris Diepeveen

2

Abstract

Chromatin accessibility is considered an excellent measure of the functionality of genetic variants (Mikkelson et al ., 2007; Birney et al ., 2010; Pang et al ., 2013). It is common to use reporter gene and gel shift assays for determining the functionality of genetic variants, however, both approaches are limited by scalability and are lacking in chromatin context. The present study uses a novel next-gen sequencing-based approach

(‘CHA-seq’) which translates Chromatin Accessibility by Real-Time PCR (CHART-

PCR) into a robust method in order to assess allele-specific differences in chromatin arrangement across an extended genomic region . CHA-seq is a means of simultaneously sequencing genetic variants, mapping nucleosomes and nuclease hypersensitivity sites, and assessing SNP functionality.

Genetic variants within the VNN1 gene have been associated with cardiovascular disease (Jacobo-Albavera et al ., 2012; Fava et al ., 2010; Zhu and Cooper, 2007), and as such, VNN1 represents an ideal gene model for functional assessment. Göring et al .

(2007) identified two candidate functional SNPs, namely -137 (rs4897612) and -587

(rs2050153), in the VNN1 gene promoter that have been experimentally validated using chromatin arrangement assays as part of this current research. Genotyped cell lines derived from the San Antonio Family Heart Study (SAFHS) (Göring et al ., 2007), carrying different -137 and -587 alleles, were characterised for allele-dependent differences in chromatin arrangement as an indicator of function. The CHART-PCR and

CHA-seq results showed allele-specific chromatin accessibility at the -137 and the -587 variants. Further CHA-seq analysis across an extended genomic interval revealed several candidate functional genetic variants, including the rs2267951 SNP in intron 2 of VNN1 . Experimental validation using EMSA showed allele-specific DNA-complexes

3 at the rs2267951 locus, therefore supporting the CHA-seq results. Of the genetic variants selected for functional assessment, both CHA-seq and other experimental validation assays have given an equivalent measure of function in VNN1 .

Bioinformatic analysis of the VNN1 promoter showed the presence of a CpG island, of which one CpG site was disrupted by the -587A allele. Bisulfite sequencing showed allele-specific DNA methylation patterns at the -720 CpG site which were dependent on the allele at the -587 locus. Therefore, this current study provides evidence that allele- specific chromatin differences can be seen at sites remote from the actual genetic variation.

The development of enhanced genome-wide in vivo approaches such as CHA-seq allows for a more cost and time-effective assessment of whether a variant does indeed regulate chromatin arrangement and , yielding insights into the long range allele-specific mechanisms of action.

4

List of Publications

Curran, J., Jowett, J., Abraham, L., Diepeveen, L.A. , Elliott, K., Dyer, T., Kerr- Bayles, L., Johnson, M., Comuzzie, A., Moses, E.K, et al. (2009). "Genetic variation in PARL influences mitochondrial content." Human Genetics 127 (2): 183–190.

Kaskow, B.J., Diepeveen, L.A. , Proffitt, J.M., Rea, A.J., Ulgiati, D., Blangero, J., Moses, E.K., Abraham, L.J. (2013). “Molecular prioritisation strategies to identify functional genetic variants in the cardiovascular disease-associated expression QTL Vanin-1.” European Journal of Human Genetics doi:10.1038/ejhg.2013.208

5

Table of Contents

Declaration ...... 2 Abstract ...... 3 List of Publications ...... 5 Table of Contents ...... 6 List of Figures and Tables ...... 9 Abbreviations ...... 11 Acknowledgements ...... 13

Chapter 1: Literature Review ...... 15 1.1 Human genetic variation ...... 15 1.2 Classes of human genetic variation ...... 18 1.3 Complex diseases and genetic association studies ...... 21 1.3.1 Candidate gene approach ...... 24 1.3.2 Genome-wide association studies (GWAS) ...... 27 1.4 Experimental validation of candidate genetic variation ...... 31 1.4.1 Reporter gene assays ...... 34 1.4.2 Electrophoretic Mobility Shift Assay...... 36 1.4.3 Chromatin Immunoprecipitation ...... 38 1.4.4 CHART-PCR ...... 40 1.4.5 Bisulfite sequencing ...... 43 1.4.6 Genome-wide functional assays ...... 47 1.4.7 Bioinformatic analyses ...... 49 1.5 VNN1 – A gene model for novel experimental validation assays and cardiovascular disease ...... 51 1.5.1 VNN1 – The novel cardiovascular risk-gene ...... 52 1.5.2 Gene models in cardiovascular disease – case studies ...... 53 1.6 Research justification and PhD aims...... 57

Chapter 2: General methods and materials ...... 61 2.1 General reagents, biochemicals, buffers and solutions ...... 61 2.2 Bacterial Methods ...... 62 2.2.1 Bacterial culture medium ...... 62 2.2.2 Bacterial culturing conditions ...... 63 2.2.3 TOPO cloning in bacteria...... 64 2.2.4 Transformation of Bacteria and blue-white colony selection ...... 64 2.3 Eukaryotic Methods ...... 65 2.3.1 Eukaryotic cells and culturing ...... 65 2.3.2 Eukaryotic cell counting with Trypan Blue staining ...... 65 2.3.3 Cyropreservation storage and revival of eukaryotic cell lines ...... 66 2.4 Oligonucleotides ...... 66 2.5 Isolation and purification of plasmid and eukaryotic DNA ...... 69 2.5.1 Mini-prep purification of plasmid DNA ...... 69 2.5.2 Isolation and purification of DNA from eukaryotic cells ...... 69 2.6 Analysis of nucleic acids ...... 70 2.6.1 Nano-drop analysis of nucleic acids ...... 70 2.6.2 Gel Electrophoresis ...... 70 2.7 Polymerase Chain Reaction (PCR) ...... 71 2.8 Sanger DNA Sequencing ...... 71

6

2.9 Experimental validation methods for SNP functionality ...... 72 2.9.1 Chromatin Accessibility by Real-Time PCR (CHART-PCR) ...... 72 2.9.1.1 Nuclei isolation ...... 72 2.9.1.2 Nuclease digestion of chromatin ...... 72 2.9.1.3 Collection of DNA fragments ...... 73 2.9.1.4 qPCR and analysis ...... 73 2.9.2 Electrophoretic Mobility Shift Assay (EMSA) ...... 74 2.9.2.1 Nuclear protein extraction ...... 74 2.9.2.2 Binding reaction preparation ...... 75 2.9.2.3 Gel electrophoresis and detection ...... 76 2.9.3 Bisulfite Sequencing ...... 77 2.9.3.1 Bisulfite treatment ...... 78 2.9.3.2 PCR amplification ...... 78 2.9.3.3 Cloning and transformation...... 78 2.9.3.4 DNA sequencing and analysis ...... 79 2.10 Targeted-CHA-seq (T-CHA-seq)...... 79 2.10.1 T-CHA-seq library preparation ...... 80 2.10.1.1 Isolation of nuclei from cell lines...... 80 2.10.1.2 Digesting DNA with chromatin accessibility agent ...... 81 2.10.1.3 NEBNext DNA Sample Prep kit processing ...... 81 2.10.2 PCR amplification ...... 82 2.10.3 TOPO cloning and transformation ...... 84 2.10.4 Sanger DNA sequencing ...... 84 2.10.5 Analysis and bioinformatics ...... 84 2.11 Extended CHA-seq (CHA-seq) ...... 85 2.11.1 Library preparation...... 86 2.11.2 Bead selection ...... 86 2.11.3 Cluster generation and deep sequencing ...... 86 2.11.4 Analysis with IGV 2.1...... 86 2.12 Statistical Analyses ...... 87

Chapter 3: In vivo assessment of VNN1 promoter SNPs using two chromatin- sensitive nucleases in CHART-PCR ...... 88 3.1 Introduction ...... 88 3.2 Methods ...... 91 3.3 Results ...... 93 3.3.1 Assessment of relative chromatin accessibility within the VNN1 gene promoter using DNase I ...... 93 3.3.1.1 Chromatin accessibility of VNN1 SNPs relative to SPA-2 and β-actin control loci using DNase I...... 93 3.3.1.2 Assessment of allele-specific chromatin accessibility at VNN1 SNPs using DNase I ...... 95 3.3.2 Assessment of relative chromatin accessibility within the VNN1 gene promoter using MNase ...... 97 3.3.2.1 Chromatin accessibility of VNN1 SNPs relative to SPA-2 and β-actin control loci using MNase ...... 97 3.3.2.2 Assessment of allele-specific chromatin accessibility at VNN1 SNPs using MNase ...... 99 3.4 Discussion ...... 102

Chapter 4: CHA-seq - A novel functional variant screen ...... 106 4.1 Introduction ...... 106 7

4.2 CHA-seq method development ...... 110 4.2.1 Traditional chromatin arrangement assays – issues ...... 111 4.2.2 Targeted CHA-seq (T-CHA-seq) development ...... 113 4.2.2.1 T-CHA-seq library preparation – DNase I digestion optimisation ...... 113 4.2.2.2 T-CHA-seq library preparation – adapter-ligation optimisation ...... 115 4.2.2.3 PCR amplification - optimisation...... 117 4.2.2.4 T-CHA-seq methodology following PCR amplification ...... 121 4.2.3 Extended CHA-seq method development ...... 121 4.2.3.1 Library preparation – MNase digestion optimization ...... 122 4.2.3.2 CHA-seq methodology post-library preparation ...... 124 4.3 Results ...... 124 4.3.1 Targeted CHA-seq (T-CHA-seq) ...... 124 4.3.2 CHA-seq ...... 129 4.3.2.1 Allele-specific nuclease activity at the -587 SNP ...... 134 4.3.2.2 Nucleosome mapping within the -587 VNN1 promoter region ...... 136 4.3.3 CHA-seq for the VNN2 and VNN3 ...... 138 4.4 Discussion ...... 140

Chapter 5 – DNA methylation analysis of the -587 promoter region in the VNN1 gene ...... 144 5.1 Introduction ...... 144 5.2 Materials and Methods ...... 145 5.3 Results ...... 146 5.3.1 Predicted CpG sites in the -587 promoter region of the VNN1 gene ...... 146 5.3.2 Bisulfite sequencing of the -587 promoter region in the VNN1 gene and BISMA analysis ...... 148 5.3.3 Bioinformatic analysis of the -720 CpG site in the VNN1 gene promoter ... 150 5.4 Discussion ...... 152

Chapter 6 – Experimental validation of candidate functional SNP using EMSA 155 6.1 Introduction ...... 155 6.2 Methods ...... 155 6.3 PARL Results ...... 158 6.3.1 EMSA and bioinformatic analysis of the -191 SNP in the PARL gene promoter ...... 158 6.4 VNN1 Results ...... 160 6.4.1 Bioinformatic analysis of the rs2267951 VNN1 SNP ...... 160 6.4.2 EMSA analysis of the rs2267951 VNN1 SNP ...... 162 6.5 Discussion ...... 164

Chapter 7 – General discussion and future research directions ...... 166 7.1 CHA-seq contribution to functional research ...... 166 7.2 Contribution to VNN1 research ...... 168 7.3 Future research directions ...... 170

References ...... 173

Appendix – Supplementary Tables and Figures ...... 204

8

List of Figures and Tables

Table 1.1: Characteristics of traditional experimental validation assays for the assessment of candidate functional genetic variants ...... 33 Figure 1.1 Principles of CHART-PCR...... 41 Figure 1.2 Principles of methylation analysis using bisulfite genomic sequencing ...... 44

Table 2.1: Primer design guidelines ...... 68 Table 2.2: PCR primers used in T-CHA-seq ...... 83

Table 3.1: CHART-PCR primers ...... 92 Figure 3.1: Chromatin accessibility differences at the -137 and -587 VNN1 SNPs relative to SPA-2 and β-actin control regions ...... 94 Figure 3.2: Chromatin accessibility differences at the -137 and -587 VNN1 SNPs ...... 96 Figure 3.3: Chromatin accessibility differences at the -137 and -587 VNN1 SNPs relative to SPA-2 and β-actin control regions ...... 98 Figure 3.4: Chromatin accessibility differences at the -137 VNN1 SNP ...... 100 Figure 3.5: Chromatin accessibility differences at the -587 VNN1 SNP ...... 101

Figure 4.1 Principles of CHA-seq and T-CHA-seq...... 109 Figure 4.2: Chromatin accessibility differences between replicates at the β-actin promoter using CHART-PCR ...... 112 Figure 4.3: Electrophoretic patterns of DNase 1-digested chromatin ...... 114 Figure 4.4: Adapter ligation to DNase I-digested chromatin ...... 116 Figure 4.5: T-CHA-seq primer positions in the -137 VNN1 promoter region ...... 118 Figure 4.6: T-CHA-seq ‘forward’ PCR amplification of DNase I library fragments ... 119 Figure 4.7: T-CHA-seq ‘reverse’ PCR amplification of DNase I library fragments .... 120 Figure 4.8: Electrophoretic patterns of MNase-digested chromatin ...... 123 Figure 4.9: Assessment of the chromatin arrangement across the VNN1 promoter using T-CHA-seq ...... 126 Figure 4.10: Assessment of the chromatin arrangement across the VNN1 promoter using T-CHA-seq ...... 128 Table 4.1: Heterozygous VNN1 SNPs captured in CHA-seq...... 131 Figure 4.11: Distribution of heterozygous SNPs captured in CHA-seq across the VNN1 gene relative to allele skew ...... 133 Figure 4.12: Assessment of the allele-specific chromatin accessibility across the -587 VNN1 promoter region using CHA-seq ...... 135 Figure 4.13: Assessment of the chromatin arrangement across the -587 VNN1 promoter region using CHA-seq ...... 137 Table 4.2: Heterozygous VNN2 and VNN3 SNPs captured in CHA-seq ...... 139

Figure 5.1: BISMA prediction of VNN1 CpG island length ...... 147 Figure 5.2: Methylation of the -587 SNP ...... 149 Figure 5.3: Bioinformatic analysis of -720 CpG site in the VNN1 gene promoter ...... 151

Table 6.1: EMSA oligonucleotides for PARL and VNN1 SNPs ...... 157 Figure 6.1: EMSA for the -191 PARL SNP ...... 159 Figure 6.2: Bioinformatic analysis of the rs2267951 SNP in the VNN1 gene ...... 161 Figure 6.3: EMSA for the rs2267951 VNN1 SNP relative to the -191 PARL SNP ...... 163

9

Table A.1: Description of general biochemicals and reagents ...... 204 Table A.2: Distributers for laboratory equipment used in research ...... 207 Table A.3: Distributers for general laboratory disposables used in this research ...... 208 Table A.4: DNase I cutting at the -137 and -587 VNN1 gene promoter SNPs, with SPA- 2 and β-actin control loci in SAFHS cell 0054 ...... 209 Table A.5: DNase I cutting at the -137 and -587 VNN1 gene promoter SNPs in homozygous SAFHS cell lines ...... 210 Table A.6: MNase cutting at the -137 and -587 VNN1 gene promoter SNPs, with SPA-2 and β-actin control loci in heterozygous Ramos cell line ...... 211 Table A.7: MNase cutting at the -137 VNN1 gene promoter SNP in different SAFHS cell lines ...... 212 Table A.8: MNase cutting at the -587 VNN1 gene promoter SNP in different SAFHS cell lines ...... 213

10

Abbreviations

Abbreviation Description º Degrees % Percentage ANOVA Analysis of Variance ATCC American Type Culture Collection BISMA Bisulfite Sequencing DNA Methylation Analysis bp BP Blood pressure BSA Bovine Serum Albumin C Celsius CD Crohn’s disease C/EBP CCAAT/enhancer binding protein CHART-PCR Chromatin accessibility by Real-Time PCR CHA-seq Chromatin accessibility by sequencing CHD Coronary heart disease ChIP Chromatin Immunoprecipitation cm Centimeters cM Centi-morgan CO 2 Carbon dioxide CR Complement Receptor CVCD Common Variant, Common Disease CVD Cardiovascular disease ddH 2O Double deionised water DMSO Dimethyl sulphoxide DNA Deoxyribonucleic acid DNase I Deoxyribonuclease I ds Double-stranded DTT Dithiothreitol EBV Epstein-Barr Virus E.coli Escherichia coli EDTA Ethylenediaminetetraacetic acid EGTA Ethylene glycol tetraacetic acid EMSA Electrophoretic Mobility Shift Assay eQTL Expression Quantitative Trait Loci FBG Fasting blood glucose FBS Fetal Bovine Serum FH Familial hypercholesterolemia g Gram GWAS Genome-wide Association Study HCl Hydrochloric acid HDL-C High-density lipoprotein cholesterol HEPES Hydroxyethyl piperazineethanesulfonic acid Hg Mercury HGP Project HLA Human leukocyte antigen IGV Integrative Genomics Viewer indel Insertion or deletion kb Kilobase pairs

11

L Litre(s) LB Luria-Bertani LD Linkage disequilibrium LDL Low-density lipoprotein m Milli (1 x 10 -3) M Molar Mb Megabase pairs min Minute(s) MNase Micrococcal nuclease mRNA Messenger RNA NaOH Sodium Hydroxide nm Nanometers oligo Oligonucleotide PARL Presenilin Associated Rhomboid-Like PBS Phosphate Buffered Saline PCR Polymerase Chain Reaction PMSF Phenylmethylsulphonylfluoride Poly dI:dC Poly (deoxyinosinic-deoxycytidylic) acid PSI pounds per square inch qPCR Quantitative real-time PCR QTL Quantitative Trait Loci RA Rheumatoid arthritis RNA Ribonucleic acid rpm Revolutions per minute RVCD Rare Variant, Common Disease SAFHS San Antonio Family Heart Study SD Standard deviation SDS Sodium dodecyl sulphate seq Sequencing SNP Single Nucleotide Polymorphism Sp Specificity protein SPA-2 Surfactant protein 2 T1D Type 1 diabetes TAE Tris-acetate-EDTA T-CHA-seq Targeted chromatin accessibility by sequencing TE Tris-EDTA TF Transcription factor TNF Tumour Necrosis Factor Tris 2-amino-2-hydroxymethyl-1,3-propanediol U International enzyme units µ Micro (1 x 10 -6) UV Ultraviolet V Volts VNN Vanin v/v Volume per volume w/w Weight per weight w/v Weight per volume X Times

12

Acknowledgements

First and foremost I would like to thank my supervisors W/Prof Lawrie Abraham and A/Prof Daniela Ulgiati. It has been an honour to be one of your PhD students. You each have taught me, both consciously and un-consciously, how to become a good scientist. I appreciate all the contributions of time, ideas, and funding to make my PhD experience productive and enriching.

Importantly, I wish to thank the University of Western Australia, the School of Chemistry and Biochemistry and the Scholarships department for all of the funding, education and opportunities I have been given. I must also recognise the amazing opportunity I had to travel to the Texas Biomedical Research Institute. I am fortunate for the time I was able to spend there, for the use of the deep sequencing facilities, and for the cell-lines that were provided for this project. I must sincerely thank all concerned for this opportunity, especially my supervisor Lawrie.

I must express my thanks the members of the Abraham/Ulgiati group have contributed immensely to my personal and professional time at the University of Western Australia. The group has been a source of friendships as well as good advice and collaboration. I am grateful to all members including Des, Hui Min, Mahdad, Mark, Tim, Nick, Michelle, Freya and others.

My time at the University of Western Australia was made enjoyable in large part due to the many friends and groups that became a part of my life. I am appreciative of the support from members of the School of Chemistry and Biochemistry including Amber, Yat-zu, and Steve, Paul, Adam and Nehal. Thank you.

In particular, I would like to personally address the contributions of certain individuals. Firstly Belinda, you have been a constant source of joy and inspiration. I am thankful for all of your kind words and guidance. You are a true friend.

Alex, you have been both a friend and a role model. I could not have asked for a better role model who is inspirational, supportive, and thoughtful.

13

Rhonda, your determination and passion was a source of inspiration for me. When times were tough, you always managed to help me through.

Josh and Robert, you are both such amazing people. I know you both will achieve so much in science, and life.

George, I have always enjoyed discussing our joint love of tennis and the West Coast Eagles. More importantly, I am forever grateful for the opportunity you have given me to work in your laboratory during the final months of writing my thesis. I also must thank the other members of your group, Adam, Sarah, Robyn, Connie, Ken, Anne, Aurelian, Megan, Ros, Karen and Bernard for making my time in the Yeoh laboratory so fruitful and enriching.

Liz, you are an inspiration to me as both a scientist and person. You are both kind and caring, and I am so grateful for your guidance. You have given me so much, and I am truly grateful.

I would have never achieved what I have in life without the love and support of my family. Mum, Dad, Lara, Pippa, you are my inspiration. Thank you for everything and for always being there for me.

To my dearest wife, Becky, you are my world. I dedicate this PhD to you, because without you I have nothing. I was so lucky to have found you during the course of this PhD and I cannot wait for our future together.

14

Chapter 1: Literature Review

1.1 Human genetic variation

The last decade has seen significant effort committed to cataloguing human genetic variants and their association with phenotypic differences or disease (Frazer et al ., 2009;

Bodmer and Bonilla, 2008; Mackay et al ., 2009). Since the completion of the Human

Genome project (HGP) in 2001, most common variants (minor allele frequency > 1%) have been identified and assessed in genetic studies for statistical association with complex traits and diseases (Collins et al ., 2003; Bulik-Sullivan et al ., 2012; Kathiresan and Srivastava, 2012). Although genetic studies in humans have identified potential disease-risks and novel biological insights, limited progress has been made to identify the heritable component(s) of any complex human trait (Frazer et al ., 2009).

Characterising and cataloguing genetic variation will assist in understanding the functional mechanism behind which associated variants lead to certain phenotypic traits or disease. Dissection of such mechanisms will enable the prioritization of candidate genetic variants for further investigation (Stower, 2013).

One of the first large-scale studies to characterise human genetic variation was the

‘International HapMap Project’ (The International HapMap Consortium, 2003; The

International HapMap Consortium, 2004). This study aimed to characterize the influence of human genetic variation on phenotype with a focus on understanding the range of patterns and frequencies of genetic variants (Frazer et al ., 2009). The major objective of the HapMap project was to genotype at least one common variant every five kilobases across the actively transcribed regions of the human genome (The

International HapMap Consortium, 2007). Between the HapMap project launch in 2002

15 and the phase II publication of results in 2007, more than 3.1 million single nucleotide polymorphisms (SNPs) were catalogued in 270 individuals from four geographically distinct populations (The International HapMap Consortium, 2007). More recently,

HapMap phase III results provide genotype data on approximately 1.5 million SNPs from 1397 individuals in 11 populations (Pemberton et al ., 2010). With the aid of newer, high-throughput DNA sequencing technologies, over 10 million SNPs have now been catalogued and are publically available online through the HapMap project database (http://hapmap.ncbi.nlm.nih.gov/).

Ancestral and de novo mutations, in addition to the admixture generated by DNA recombination, are the primary sources of genetic variation in a given population

(Jenkins et al ., 2012; Peter et al ., 2012). In humans, the contribution of de novo mutation and recombination has been estimated on average to be 60 novel mutations/variants per generation (Conrad et al ., 2011), with an average rate of 0.5 ×

10 −9 mutations base-pair −1 year −1 over the past million years (Scally and Durbin, 2012).

This estimate, however, suggests a mutation rate roughly half that of previously derived measurements from fossil calibration and suggests a higher-than-previously-thought contribution of DNA recombination to evolution (Scally and Durbin, 2012). Human de novo mutation is not only an important substrate for selection and evolution (Lynch,

2010), but also to fitness through the generation of deleterious alleles as demonstrated by Kong et al . (2012). Kong and co-workers determined that a father’s age at conception was associated with an increase (two additional mutations per year) in de novo mutation rate of SNPs, and subsequently increased disease-risk of schizophrenia and autism in an Icelandic population .

16

Typically, DNA recombination occurs within 1-2 kb genomic ‘hotspots’. More than

30,000 hotspots have been identified in humans, suggesting that they are a ubiquitous feature of the human genome (Myers et al ., 2008; Myers et al ., 2006). For instance, known recombination hotspots have been reported within the human Prdm9 gene

(Baudat et al ., 2010), the TAP2 gene (Jeffreys et al ., 2000), the RNF212 gene (Kong et al ., 2002) and within the region 11p15.5 (Sandovici et al ., 2006). The average rate of recombination per generation in humans is 1.13 cM/Mb, however rates vary significantly (0.1-3 cM/Mb) across the genome (Auton and McVean, 2012).

Variation in DNA recombination rate can be attributed to factors such as age and sex

(Hussin et al ., 2011), genetic variation (Fledel-Alon et al ., 2011), and chromosome size

(Farré et al ., 2012). Wang et al . (2012) compared DNA recombination and de novo mutation rates and found a recombination rate of approximately 23 events per cell, whereas the aneuploidy rate ranged from 2-10% in sperm.

Gene expression is the mechanism by which genetic information is used in the synthesis of a functional product such as RNAs and protein (Cambien and Tiret, 2007). The functional impact of most genetic variants is hypothesised to be neutral (i.e. no observable phenotypic impact), with many variants achieving high frequency in human populations by chance (Powell, 2012; Cambien and Tiret, 2007). However, subsets of genetic variants occurring within regulatory or coding regions can influence gene expression levels or alter the amino acid sequence of proteins. This can result in observable differences in phenotype (Cambien and Tiret, 2007). For example, Love-

Gregory et al . (2011) identified four metabolic syndrome risk-SNPs (rs1761667, rs3211909, rs3211913, rs3211938) located within the CD36 gene which resulted in reduced CD36 protein expression. These SNPs were also found to be associated with levels of free fatty acids and high-density lipoprotein (HDL). More recently, Keenan et

17 al . (2012) reported that a coding SNP (rs4844609) in the CR1 gene, which correlated with a Ser1610Thr amino acid change, was associated with episodic memory decline in

Alzheimer’s disease. Keenan and co-workers suggested that the rs4844609 SNP alters the conformation of the functional domain of the CR1 protein, leading to the development of Alzheimer’s symptoms. Given that ‘functional’ genetic variants can interrupt or alter cellular mechanisms, understanding the role of genetic variation is fundamental to disease studies in determining how a particular disease or phenotype arises (Girirajan et al ., 2011).

1.2 Classes of human genetic variation Human genetic variants are termed as either common or rare based on the frequency of the minor allele in a population, and is divided into two classes based on nucleotide composition; single nucleotide variants and structural variants (Eichler et al ., 2007;

Redon et al ., 2006; Altshuler et al ., 2010). Structural variations primarily consist of insertions and deletions (‘indels’); in addition to copy number variations and retrotransposons (Alkan et al ., 2011; Muers, 2010; Mills et al ., 2011). Single variants account for approximately 90% of all genetic variation and occur on average every 100-

300 base pairs (Thomas et al ., 2011).

The ‘1000 Genomes Project’ was established to catalogue all classes of human genetic variation (Altshuler et al ., 2010; Abecasis et al ., 2012). This project consisted of sequencing 1092 genomes using a low-coverage (2-6 fold) whole-genome combinatorial approach. This approach consisted of targeted (50-100 fold) exonic and dense SNP genotype data in conjunction with bioinformatic methods for selecting high resolution variant calls. The 1000 genome project generated a haplotype map of 38 million SNPs, 1.4 million short insertions and deletions and more than 14,000 large

18 deletions (Altshuler et al ., 2010; Abecasis et al ., 2012). Ultimately, this project has created the most detailed public catalogue of human genetic variation and is publically available online (http://www.1000genomes.org/).

Common genetic variants in human populations are the focus of most current research studies that target phenotypic traits (Marian and Belmont, 2011). Genetic variants associated with a measurable allele-specific influence on the function of a gene product, gene expression, or phenotype are considered to be functional (Wang et al ., 2011;

Mottagui-Tabar et al ., 2005; Knight, 2005; Jian et al ., 2011). Functional genetic variants that contribute to phenotypic differences amongst individuals have been associated with cell function (Bonetti et al ., 2012; Johnson et al ., 2012; Collins et al .,

1997; Syvanen et al ., 1999) and disease (Bartoszewski et al ., 2010; Zhang et al ., 2011;

Pratt and Dzau, 1999; Wang and Moult, 2001). Ultimately, the position of genetic variants within a gene directly influences the potential functional impact on the gene

(Guo and Jamison, 2005). For example, SNPs occurring within regulatory regions, especially near the transcription start site, can influence gene expression levels and promote alternative splicing (Pickrell et al ., 2010). Interestingly, genetic variants located within regulatory regions that are distal to a particular gene, such as enhancer elements, can also influence gene expression (Chavan et al ., 2010; Steidl et al ., 2007).

In contrast, SNPs within coding regions can alter the structure and activity of the encoded functional products which may include both RNAs and proteins (Nackley et al ., 2006; Hoffmeyer et al ., 2000). For instance, SNPs within exons of protein-coding genes can alter the subsequent protein structure and activity as shown by Schaeffeler et al . (2006) in the TPMT gene and Diogo et al . (2013) in the IL2RA , IL2RB and CD2 genes.

19

In contrast to single nucleotide variants, structural variants can have a more significant influence on phenotype, often resulting in clinical manifestations such as Down and

Turner syndromes (Feuk and Carson, 2006; Lee and Scherer, 2010), and Angelman and

Prader–Willi syndromes (Sanders et al ., 2011). The molecular consequences of structural variants are locus-dependent, with variants occurring within coding regions resulting in gene fusions or truncations. Alternatively, variants within regulatory regions can lead to alterations in gene expression levels (Weischenfeldt et al ., 2013). For example, overlapping 1.5-3Mb deletions within the chromosome 22q11 band containing genes such as TBX1 , COMT , and CRKL , can lead to DiGeorge syndrome and velocardio facial syndrome due to haploinsufficiency of a number of encoded proteins

(Maynard et al ., 2002; Weischenfeldt et al ., 2013).

In response to research evidence suggesting that most candidate functional variants are located within regulatory rather than coding regions, a public research consortium designated as the ‘ENCODE’ project was launched in 2003 to improve the functional annotation of genetic variation in the human genome (Birney et al ., 2007; Rosenbloom et al ., 2012). The objective of the ENCODE research was to identify and catalogue all functional elements and variants in human and other mammalian genomes (Birney et al ., 2007; Rosenbloom et al ., 2012). As of January 2012, 27 biochemical assays

(including RNA-seq, ChIP-seq, and DNA methylation assays) had been submitted. This data covered 200 distinct human cell types and is available for public use

(http://genome.ucsc.edu). While the ENCODE project has generated significant amounts of novel information, a detailed understanding of the default function and interactions of regulatory variants in the phenotype of different cells and tissues remains incomplete (Rosenbloom et al ., 2012).

20

1.3 Complex diseases and genetic association studies Complex diseases, as opposed to ‘simple’ single-gene Mendelian disorders, are typically caused by combinations of genetic variants and/or mutations that combinatorially result in dysregulation of components of a cellular system or pathway

(Cho et al ., 2012). Complex diseases such as cancer, cardiovascular disease and diabetes are associated with the effects of multiple genes in combination with environmental and lifestyle factors (Erler and Linding, 2012; Kathiresan and Srivastava,

2012; Kodama et al ., 2012). Consequently, most complex diseases are heterogeneous given the unique genetic variation that exists between individuals and the vast array of external environmental factors (Cho et al ., 2012).

Fundamental to medical research is the identification of functional genetic variants and the characterisation of the mechanisms through which they influence the phenotype of complex diseases (Stranger et al ., 2011). Thus far, genetic variation has been associated with the disease phenotypes in terms of predisposition for disease initiation, progression and severity (Duff, 2006; Knowles and Drumm, 2012). Two common hypotheses, the

‘common variant-common disease’ (CVCD) and the ‘rare-variant-common disease’

(RVCD) are used to describe the genetic contribution of individual susceptibility to complex diseases (Schork et al ., 2009). The CVCD hypothesis suggests that a collection of common alleles (alleles with a population frequency of ≥0.05) increase the risk of common disease onset and progression (Marian and Belmont, 2011). This hypothesis suggests that common genetic variants confer a modest cumulative and interactive effect in complex disease phenotype such as diabetes or cardiovascular disease (Liu and

Song, 2010). The RVCD hypothesis, on the other hand, suggests that variants with lower-frequency (alleles with a population frequency of <0.005) and higher-penetrance significantly contribute to disease phenotype (Bodmer and Bonilla 2008). For example,

21 a number of rare genetic variants have been demonstrated to contribute to diseases such as hypertrophic cardiomyopathy (Marian, 2010) susceptibility and thus allow enhanced detection of the diseases.

Coding variants were the initial targets of complex trait research (Bodmer and Bonilla,

2008; Majewski and Pastinen, 2011; Bostein and Risch, 2003), with casual rare variants associated with common complex disorders often through amino acid substitution

(Ladouceur et al., 2012; Cohen et al., 2004). There is increasing evidence supporting the

RVCD hypothesis, with rare coding variants associated with common human disorders including breast cancer (Schork et al ., 2009), type I diabetes (Nejentsev et al ., 2009), hypertension (Ji et al ., 2008), and cholesterol disorders associated with sterol absorption and plasma LDL-cholesterol levels (Cohen et al ., 2004; Cohen et al ., 2006). For example, Goldgar et al . (2011) identified a number of truncating variants, and a rare

SNP (‘c.7271T>G’) in the ATM gene that were associated with a significantly increased risk of breast cancer. The penetrance of these variants was similar to that conferred by mutations in the known breast cancer susceptibility gene BRCA2 . While a focus of complex disease research has been to identify susceptibility variants, recent studies have identified rare genetic variants that have a ‘protective’ role in complex traits (Dai et al .,

2012). For instance, multiple rare variants have been shown to act protectively against type I diabetes (Nejentsev et al ., 2009) and psoriasis (Li et al ., 2010).

The CVCD hypothesis proposes that common diseases are typically correlated with multiple susceptibility alleles, with early supporting evidence from the ApoE ε4 allele in Alzheimer's disease (Saunders et al . 1993; Genin et al ., 2011) and a Factor V allele

(‘1691C A’) in deep-vein thrombosis (Bertina et al . 1994; Knight, 2005). More recently, common genetic variants have been identified in Crohn's disease (CD)

22

(Yamazaki et al ., 2005; Mathew, 2008) and type 1 diabetes (Todd et al ., 2007). In 2005,

Yamazakiet et al. identified a common intronic SNP (‘14,340T →C’) in the TNFSF15 gene, which encodes the tumour necrosis super-family member 15, which contributes to

CD susceptibility. A subsequent study by Duerr et al . (2006) screened a North-

American sample and reported strong association of multiple common SNPs

(rs2066843 and rs2076756) within a known CD susceptibility gene, CARD15 . Common variants associated with gene expression levels have also been linked to common complex diseases such as breast cancer (Zhang et al ., 2011). For instance, Zhang et al .

(2011) identified a common SNP (rs1044129) located in a micro-RNA binding site

(miR-367) in the 3’ untranslated region (3’UTR) of the RYR3 gene that was associated with a higher risk of breast cancer. Zhang et al . (2011) reported that miR-367 had a higher binding affinity (resulting in a decrease in gene expression) in the presence of the major allele (A) compared to the minor allele (G). Genotypes in which the G allele was present were associated with a higher risk for breast cancer development, calcification and survival of the patient (Zhang et al ., 2011).

In order to identify the genetic component of human diseases, early ‘family studies’ were performed to trace the distribution of traits from parental generations to progeny and involved the determination of mathematical segregation ratios (Borecki and

Province, 2008). Family studies have been successful in the identification of genes associated with monogenic diseases including Duchenne muscular dystrophy (Jones et al ., 2012) and Huntington disease (Anderson et al ., 2004). However, this approach has had limited success with complex traits and diseases. Complex diseases do not exhibit patterns of Mendelian inheritance. Consequently, genetic association approaches are used to identify genetic risk-variants in favour of family studies (Cambien and Tiret,

2007). Primarily through advancements in genotyping technologies, the focus of gene

23 mapping in humans has shifted towards the use of genetic association studies (Stranger et al ., 2011).

Genetic association studies incorporate large sets of SNPs and other markers located in known candidate regions or genes to statistically determine possible causal variants that show greater prevalence in case vs control samples (Doris, 2011). Genetic association is dependent upon the existence of linkage disequilibrium (LD) between genetic variants in close physical proximity within the genome Thus, the identification of possible casual variants is possible due to the presence of LD with detected variants (Stranger et al ., 2011). Thus far, over 800 genotype/SNP-phenotype associations have been identified from 600 published association studies covering 150 complex human diseases

(Bakir-Gungor and Sezerman, 2011). While association studies have provided great insight into potential disease pathways, this success must be put into perspective by considering inherent limitations (Hirschhorn et al ., 2009; McCarthy et al ., 2008). For example, a review by Jakobsdottir et al . (2009) demonstrates that while genetic associations are valuable for establishing research hypotheses, strong genetic association does not guarantee the identification of causal genetic risk variation between cases and controls. Ultimately, a more detailed background of the nature of the complex disease and possible genomic regions of susceptibility and/or biochemical pathways is required to further refine lists of potential disease-risk variants (Jostins and Barrett,

2011; Pociot et al ., 2010).

1.3.1 Candidate gene approach In order to identify and map novel genetic variation underlying human diseases or traits in association studies, researchers utilise either candidate gene or genome-wide approaches (Altshuler et al ., 2008). Candidate gene studies target genetic variation

24 within specific genes of interest, with candidates generally selected based on prior knowledge of the biological or functional effect of a gene in a particular phenotype or disease (Zhu and Zhao, 2007). The key assumption in candidate gene approaches is that a gene functionally associated with a disease-causing gene is likely to contribute to the same or similar disease state (Ideker and Sharan, 2008; Goh et al . 2007).

Based on functional data in mice, Kurreeman et al . (2007) targeted complement component 5 ( C5 ) and TNF receptor-associated factor 1 ( TRAF1 ) as candidate genes for rheumatoid arthritis (RA). Kurreeman and co-workers identified the rs10818488 SNP to be associated with a 6.1% increased risk of RA. The A allele of this SNP also found to be correlated with increased RA disease progression. Candidate gene approaches have also been used by Muise et al . (2012) to identify a novel missense variant (‘c.113

G→A’) in the neutrophil cytosolic factor 2 ( NCF2 ) gene that increases the susceptibility to and severity of inflammatory bowel disease. Muise and co-workers determined that the c.113 G →A variant correlated with increased disease progression through reduced binding of the NCF2 gene product, p67-phox, to the NADPH oxidase complex gene,

RAC2 .

Typically, candidate gene studies have high statistical power, but the selection of candidate genes is intrinsically subjective and often arbitrary (Moreau and Tranchevent,

2012). As such, traditional candidate gene studies are limited by a dependence on prior knowledge of the biochemical, functional or physiological impact of possible candidates, which is often incomplete or unknown (Zhu and Zhao, 2007). However, recently computational strategies have been developed that integrate DNA sequence data, gene expression information, functional characterisation and research literature to

25 allow improved candidate gene selection, as reviewed by Moreau and Tranchevent

(2012).

A review by Zhu and Zhao (2007) outlines three common approaches for improved candidate gene selection in genetic association studies. The first approach, ‘position- dependent’, integrates genome sequence information and candidate gene analyses to identify candidate genes primarily based on the physical linkage within a chromosomal locus previously associated with the trait of interest (Zhu and Zhao, 2007) . The second approach, outlined by Zhu and Zhao (2007), is ‘comparative genomics’. This approach enables the identification and characterisation of putative candidates using a cross- species approach in human-disease models. Thirdly, the ‘function-dependent’ strategy incorporates analyses of expression of a gene associated with a particular trait at different phenotypic stages, including signalling and regulatory pathways combined with genome-wide transcriptional profiles, to prioritise candidate genes (Zhu and Zhao,

2007) . Ultimately, combining these selection strategies together with the use of bioinformatic tools can substantially reduces the number of candidate genes for further analysis as reported by Moreau and Tranchevent (2012). For example, Rajab et al .

(2010) used a prioritisation tool, GeneDistiller, in combination with different candidate gene approaches, to prioritize 74 genes from a 2 megabase region on chromosome 17 that was associated with cardiac arrythmias. Rajab and co-workers used this integrated selection strategy to identify a mutation in the leading candidate gene PTRF , which became the focus of subsequent research (Rajab et al ., 2010).

Although the candidate gene approach is an effective and economical method for direct gene discovery, the number of causative genetic faults that have been experimentally validated is limited, thereby restricting the number candidate genes (Zhu and Zhao,

26

2007). Furthermore, the candidate gene approach has been criticized due to the non- replication of results and the inability to include all potential causative genes and polymorphisms, suggesting a large number of false-positive reports (Tabor et al ., 2002;

Morgan et al ., 2007; Pasche and Yi, 2011). Ultimately, the capabilities of traditional candidate gene approaches are limited by the dependence on existing knowledge of the underlying biology of a particular disease or phenotype (Zhu and Zhao, 2007). Given that detailed characterisation of genes or variants underpinning most complex diseases remains unknown, integrated selection strategies may represent the most effective candidate gene approach in future research (Moreau and Tranchevent, 2012; Zhu and

Zhao, 2007).

1.3.2 Genome-wide association studies (GWAS) With advances in high-throughput DNA sequencing technologies, genetic association studies have extended trait mapping to a genome-wide scale, allowing the correlation of allele frequencies of thousands of genetic variants with phenotypic variation in a population (Stranger et al ., 2011). Already with these advances, genome-wide association studies (GWAS) have yielded evidence for the involvement of genetic variants in complex human diseases including autism (Wang et al ., 2009), asthma

(Sleiman et al ., 2010), type 1-diabetes (Todd et al ., 2007; Hakonarson et al ., 2007), and cancer (Tomlinson et al ., 2012; Tao et al ., 2012). GWA studies targeting human cancers alone have yielded over 200 common low-penetrance susceptibility variants (Freedman et al ., 2011), including rs16892766 (Tomlinson et al ., 2008) and rs3802842 (Tenesa et al . 2008) in colorectal cancer, and rs1465618 and rs12621278 in prostate cancer (Eeles et al., 2009).

27

One of the fundamental advantages of GWAS is that it does not require prior knowledge of trait etiology and genomic structure, as opposed to the candidate gene approach (Stranger et al ., 2011). Consequently, the GWAS approach can identify causal genes or variants that have not previously been associated with disease expression and enable the characterisation of non-coding regions and pleiotropic genes in an unbiased way (Gottesman et al ., 2012). Essential to the GWAS approach is the necessity to distinguish between the direct effects of functional variants themselves and indirect associations due to LD in large sets of candidate variants generated in each study

(McCarthy et al ., 2008; Teng et al ., 2012). As such, the utilisation of bioinformatic and computational approaches in GWAS are critical for the prioritisation of variants most likely to contribute to the observed association and assists the selection of candidates for subsequent experimental validation (Wang et al ., 2011; Teng et al ., 2012).

A number of bioinformatic tools such as TRAP (Thomas-Chollier et al ., 2011),

BCRANK (Ameur et al ., 2009), is-rSNP (Macintyre et al ., 2010) and AASsites (Faber et al ., 2011), have been designed to prioritise candidate functional SNPs located within regulatory regions such as transcription factor binding sites and exonic splice sites.

Furthermore, a number of systems for SNP prioritisation, as proposed by Lee and

Shatkay (2009) and Nica et al . (2010), combine multiple independent prediction tools to enhance candidate SNP selection. For example, Nica et al . (2010) developed a novel

SNP prioritisation approach that accounts for local LD structure and integrates quantitative trait loci (QTLs) and GWAS data to determine the specific association indicators that arise from cis-acting QTLs. Nica et al . (2010) used their SNP prioritisation approach to identify 35 candidate disease-associated SNPs from GWAS and QTL datasets, of which the leading candidate disease-risk SNP, a non-synonymous

SNP (rs3764261) in the MT1H gene, was highly associated with high density

28 lipoprotein cholesterol (HDL-C). Previously, the MT1H gene has been associated with inflammatory bowel diseases (Goyette et al ., 2008).

In the last decade, GWAS have identified hundreds of genetic variants associated with complex diseases such as systemic lupus erythematosus, multiple sclerosis, Crohn's disease, rheumatoid arthritis and type 1 diabetes (T1D) (Hu and Daly, 2012; Hindorff et al ., 2009). In one such study, Shan et al . (2012) identified five out of nine previously identified GWAS candidate loci to be significantly associated with breast cancer in a

Tunisian population. The five loci included the rs1219648 and rs2981582 SNPs from the FGFR2 gene, rs8051542 in the TNRC9 gene, rs889312 of the MAP3K1 gene and the rs13281615 SNP located on chromosome 8 (8q24). Shan and co-workers identified homozygous rs2981582 genotypes to be strongly associated with lymph node negative breast cancer, with the minor allele also linked to an increased risk of ER+ tumor development and metastasis (Shan et al ., 2012). GWA studies have also identified candidate disease-risk SNPs in known T1D-susceptibility genes, such as HLA , INS,

PTPN22 , IFIH1 and CTLA4 (Reddy et al., 2011). Moreover, genetic determinants within the HLA class II genes are considered to account for almost 50% of T1D-risk

(Grant and Hakonarson, 2009).

GWAS have identified more than 2,000 complex trait-SNP associations, with numbers increasing rapidly (Casto and Feldman, 2011). Regardless of the success as a screening approach for identifying genetic variations involved in complex traits, GWAS research based on SNP analysis are unable to identify causal genetic variants without experimental validation (Li, M et al ., 2011). Apart from the significant cost and resources required of GWAS, a major criticism has been that the majority of the candidate SNPs identified in such studies are only associated with minor increases in

29 disease susceptibility and therefore possess limited predictive value (Manolio, 2010). In fact, the median odds ratio per candidate risk-SNP has been calculated as 1.33, which is considered small given that it does not account for the significant proportion of the heritable variation (Hindorff et al ., 2009). Furthermore, without proper research design it is possible for a number of variables to potentially confound GWAS results, including sex, age and population stratification (Novembre et al ., 2008). Confounding factors from GWAS research design can also arise from lack of well-defined case and control groups, controls for multiple testing and inadequate sample size (Pearson and Manolio,

2008). The resolution of GWAS is also the source of some debate, with typical screens only able to identify partial chromosomal regions of QTLs at the mega-base level through DNA markers which usually specify many candidate genes (Zhu and Zhao,

2007).

The results of GWAS can be considered as a source of preliminary genetic information that serves as the basis for further analysis. Such further analysis is required to accumulate evidence for subsequent prioritization and identification of strong candidate variation for experimental validation (Cantor et al ., 2010). Ultimately, the integration of traditional and fine mapping data, cross-species resources, literature, bioinformatic resources, and high-throughput sequence-based and gene expression data in GWA studies will improve the validity of candidate variant lists (McCarthy et al ., 2008; Li and Patra, 2010). Although GWAS is an effective screening approach, experimental validation is still required to determine actual functional variants within a prioritised candidate gene or GWAS hits and reveal the functional mechanisms underlying disease traits (Stranger et al ., 2011)

30

1.4 Experimental validation of candidate genetic variation Functional genomics, through experimental validation methods, attempts to elucidate the molecular basis of biological functions by characterising the relative transcriptional or functional potential of a genomic region (Werner, 2010). Reporter gene assays and the Electrophoretic Mobility Shift Assay (EMSA) are benchmark tests for experimentally validating candidate functional genetic variation within regulatory regions as shown by Pang et al . (2013), Low et al . (2012), Knight (2005), Meckler et al .

(2013) and Lattka et al . (2010). Reporter gene assays measure transcript abundance and, in turn, allow the elucidation of allele- or haplotype-specific effects on gene expression levels (Cheng and Inglese, 2012; Liu et al ., 2009; Seldon et al ., 1986; Alam and Cook,

1990). Comparatively, EMSA can discriminate differences in binding of proteins such as transcription factors at a particular locus, which may be caused by allelic differences in the DNA (Albers et al ., 2012; Hellman and Fried, 2007; Garner and Revzin, 1981).

Alternative experimental validation methods characterise genetic variants by assaying for different mechanistic roles for functionality including chromatin arrangement

(Chromatin accessibility by Real-Time PCR) (Cruikshank et al ., 2012; Rao et al ., 2001) and allele-specific FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements)

(Smith et al ., 2012; Giresi et al ., 2007) and chromatin conformation capture (Wang, S et al ., 2013), DNA methylation (bisulfite sequencing) (Shoemaker et al ., 2010; Kagami et al ., 2010; Frommer et al ., 1992), and in vivo DNA-protein interactions (Chromatin

Immunoprecipitation) (Johnson et al . 2007; Robertson et al ., 2007; O’Neill and Turner,

1995). The results of experimental validation methods can also be complemented through the use of bioinformatic analysis (Johnson, 2009; Schaefer et al ., 2012) and high-throughput DNA sequencing (Kingsley, 2011).

31

The development of post-association validation methods is required for characterizing the function of trait- or disease-associated genetic variants which may influence disease susceptibility (Gibson, 2012; Conde et al ., 2013). Consequently, experimental validation methodologies that integrate various biological data sets with genome-wide and candidate gene results will help elucidate the mechanistic role of associated SNPs, thus enabling the identification of causal disease variants for the development of therapeutic interventions (Yanase et al ., 2006; Kazani et al ., 2010; Crews et al ., 2012).

Several methodologies for experimental validation of genetic variant functionality are currently available and assay for different mechanistic roles that lead to variant functionality (Table 1.1). Combinations of various experimental validation methods have been used to characterise the functional role of genetic variants in a range of complex human diseases including schizophrenia (Li, M et al ., 2011), breast cancer

(Wang et al ., 2010), and type 2 diabetes (Pang et al ., 2013).

32

Table 1.1: Characteristics of traditional experimental validation assays for the assessment of candidate functional genetic variants

A summary of characteristics associated with current functional assays for assessing genetic variants, including: the functional mechanism assayed for (“Mechanism”), the type of assay (“Approach”), the issues and limitations associated with each assay (“Issues/Limitations”) and literature examples citing the use of the assay (“References”).

Functional assay Mechanism Approach Issues/Limitations Reference(s) Reporter gene Relative transcript in vitro • Locus-specific Cheng and Inglese, assay abundance • Associated with minor fold (2012) changes in relative expression Liu et al . (2009) • Confounding factors such as transfection efficiency and construct-concentration Seldon et al . (1986) effects Alam and Cook (1990)

Karimi et al . (2009) EMSA DNA-protein binding in vitro • Locus-specific Smith et al . (2010) interactions • Target DNA sequence must be known prior to testing Carey et al. (2009) (oligo design) • Inability to measure the co- operation of genomic sites Garner and Revzin that are essential for binding (1981)

Knight (2005) ChIP DNA-protein binding in vivo • Protein-specific Wright et al . (2010) interactions • Cumbersome (long)

• Requirement for significant O’Neill and Turner numbers of cells (1995)

Collas (2010) CHART-PCR Chromatin in vivo • Locus-specific Cruickshank et al . accessibility • Influenced by other SNPs (2012) and regulatory elements within the target region Rao et al . (2001) • Target locus sequence complexity influences optimal amplification Cruickshank et al . • Methodological issues (2009) related to optimal nuclease digestion

Bisulfite Methylation patterns in vivo • Locus-specific Zhang, Y et al . (2009) sequencing • Requirement for significant numbers of cells/chromatin Shoemakers et al . due to harsh bisulfite (2010) treatment • Target DNA sequence must be known prior to testing Frommer et al . (1992) (oligo design) Grigg and Clark (1994) Allele-specific Allelic effects on in vivo • Expensive Smith et al . (2012) FAIRE chromatin • Relies on the existence of a accessibility and ChIP-grade antibody to Giresi et al . (2007) regulatory potential recognise each DNA-bound transcription factor

Chromatin Haplotype effects on in vivo • Detection of a product does Wang, S et al . (2013) conformation chromatin not always mean a specific capture conformation interaction has occurred between two regions • Requirement of a great number of cells 33

Published genetic association studies can identify candidate functional genetic variants within gene promoters. Further studies have identified allele-specific transcription factor binding patterns that lead to altered gene expression. This represents one possible mechanism for genetic variants contributing to human disease (Butter et al ., 2012). For example, Li, M et al . (2011) identified a schizophrenia risk SNP (rs3755557) in the

GSK3 β gene promoter that alters transcription factor binding affinities, resulting in a higher promoter activity when the risk allele (A) is present as determined by reporter gene and gel shift assays in multiple cell lines. More recently, McManus et al . (2012) found that a hypertension risk-SNP (-1651T/C) in the CYP11B2 gene promoter influenced binding of the transcriptional repressor APEX1, with the -1651T allele associated with increased transcriptional activity in comparison to the -1651C allele, where the repressor binds with decreased affinity.

1.4.1 Reporter gene assays Reporter gene assays use allelic regulatory regions, such as gene promoters, to drive the expression of a reporter gene (Karimi et al ., 2009). Reporter gene expression can be assayed so as to detect relative differences in transcript abundance as a result of particular promoter variants in a cell line following transfection (Cheng and Inglese,

2012; Liu et al ., 2009; Seldon et al ., 1986; Alam and Cook, 1990). Allelic reporter constructs that significantly differ in relative transcript abundance relative to controls are considered to contain functional genetic variants (Karimi et al ., 2009).

The reporter gene assay was originally designed to define the major transcriptional elements that have an effect on basal and/or inducible gene expression, but has been refined for various applications (Karimi et al ., 2009). Further adjustments to the initial

34 methodology (Seldon et al ., 1986; Alam and Cook, 1990) have been made, including alternate methods of transfection and transformation, and the use of inducible promoters. For example, Karimi et al . (2009) reports the use of a stable transfection method that utilises FLP-recombinase to functionally assess the G-308A polymorphism in the TNF gene promoter. Furthermore, the integration of a droplet-based micro-fluidic system has been used to perform a quantitative reporter gene assay, and has increased the resolution to the single cell level, allowing accurate measurement of the expression- response profile for a nuclear receptor agonist; hormone 20E (Baret et al ., 2010).

Reporter gene assays have been used to characterise the role of disease-risk variants in a variety of human diseases, including cardiovascular diseases (Huang et al ., 2007), acute lung injury (Marzec et al ., 2007), and viral infections such as hepatitis C (He et al .,

2009). For instance, Huang et al . (2007) identified a functional SNP (-764G/C) in the

IFN -γ promoter region that conferred a two- to three-fold higher promoter activity in the presence of the G allele relative to C using reporter gene assays. Huang and co-workers suggested that this SNP is functionally important in determining viral clearance and treatment response in hepatitis C virus-infected patients. Alternatively, Marzec et al .

(2007) also used reporter gene assays to identify a promoter SNP (-617C/A) in the

NRF2 gene that is associated with reduced luciferase activity and is suggested to be associated with acute lung injury. In cardiovascular disease research, He et al . (2009) observed a 14-45% reduction in luciferase reporter expression in the presence of the

+190C coronary heart disease risk-allele (rs1043618) in the HSPA1A gene, relative to the +190G allele in cell culture.

Although reporter gene assays are considered benchmark experimental tests for function, typical reporter gene approaches are associated with inherent limitations that

35 confound functional assessment (Karimi et al ., 2009). For instance, Karimi et al . (2009) reviewed factors associated with transient reporter gene assays that confound the assessment of allelic differences by testing a SNP (-308G/A) in the TNF gene promoter identified variables that influenced allele-specific TNF transcript abundance included: the method of transfection, host cell growth history, the amount (and quality) of DNA and control vector used in transfection for normalisation purposes. An early study by

Smith and Hager (1997) also reported that aberrant transcriptional activity in reporter gene assays can be influenced by the lack of appropriate chromatin arrangement.

Furthermore, in vitro gene expression, as measured by reporter gene assays, does not necessarily correlate with in vivo gene expression (Makar et al ., 2001). Makar et al .

(2001) determined that an intronic silencing region (CRS) in the human CR2 gene did not silence transcription using in vitro reporter gene assays, but was functional upon in vivo testing. Makar and co-workers found that the CRS region was functional only in in vivo assays where the construct was stably integrated into cells to provide the appropriate chromatin context.

1.4.2 Electrophoretic Mobility Shift Assay Electrophoretic Mobility Shift Assay (EMSA) is an in vitro gel separation technique based on electrophoretic migration of DNA-protein complexes (Hellman and Fried,

2007; Garner and Revzin, 1981). EMSA can differentiate between differences in binding of proteins to regions of DNA which differ by as little as a single nucleotide

(SNPs). Furthermore, EMSA can be used to determine whether more than one protein can bind to the region, as demonstrated in the KRT1 (Tao et al ., 2006), ADRB2 (Naka et al ., 2012) and TNFSF4 gene promoters (Ria et al ., 2011). Although variations in EMSA methodology have been described in the literature, typically functional genetic variants are identified by comparing migratory differences of a control binding reaction (with no

36 protein added) relative to allele-specific (or, gene-specific) DNA-protein reactions

(Hellman and Fried, 2007).

EMSA has been used to characterise the role of functional variants in human disease studies including variants associated with increased risk of cardiovascular disease

(Göring et al ., 2007; Musunuru et al ., 2010). For example, Göring et al . (2007) used

EMSA to identify an additional DNA-protein binding complex in the presence of the -

137T allele within the promoter region of the cardiovascular risk-gene VNN1 that was not present with the -137G allele. Bioinformatic analysis using the online software P-

Match (Chekmenev et al ., 2005) supported the EMSA results and predicted binding of the transcription factor (TF) Sp1 in the presence of the -137T allele (Göring et al .,

2007). Another study by Curran et al . (2009) used EMSA to identify allele-specific binding of proteins at the -191T/C SNP in the promoter of the type 2 diabetes risk-gene

PARL . Bioinformatic analysis using P-Match (Chekmenev et al ., 2005) supported the

EMSA findings and revealed that the -191T allele is embedded in a consensus Sp1 binding site whereas the -191C SNP ablated the core sequence. More recently,

Musunuru et al . (2010) used EMSA to identify an allele-specific CCAAT/enhancer binding protein (C/EBP) transcription factor binding site, localized around the rs12740374 SNP, which influences the hepatic expression of the SORT1 gene.

Supershift EMSA extends the capabilities of traditional EMSA methodology by using antibodies specific to target proteins to confirm the identity of DNA-binding proteins

(Carey et al ., 2009; Leong et al ., 2003). For instance, Neumann et al . (2000) used supershift EMSA to confirm the identity of 2 major DNA-protein binding complexes observed in the κB gene within mature dendritic cells. Neumann and co-workers identified p65/RelA-containing heterodimers (complex 1) and RelB-containing

37 heterodimers (complex 2) binding to oligonucleotides specific for the κB gene in mature dendritic cell nuclear extract, indicating Rel-directed regulation of the gene encoding the κB transcription factor. More recently, Smith et al . (2010) identified a novel candidate functional SNP (rs327) in the LPL gene using supershift EMSA, which confirmed the in vitro binding of the FOXA2 transcription factor to the T allele- containing locus.

A recent review by Holden and Tacon (2011) cites a number of confounding factors in

EMSA, such as nuclear extract purity and the requirement of post-translational modifications that influence in vitro DNA-protein interactions. Furthermore, Dekker and Haisma (2009) have identified that histone acetyltransferase activation may be required for the binding of particular proteins to genomic DNA, thereby limiting the potential pool of candidate regulatory proteins assessed. A study by Hager et al . (2009) describes a further limitation associated with EMSA whereby structural changes in chromatin, which facilitate DNA-protein binding, are not taken into account in EMSA.

1.4.3 Chromatin Immunoprecipitation Chromatin profiling represents an effective means of identifying functional disease-risk genetic variants and for the detection of regulatory activity associated with chromatin context (Ernst et al ., 2011; Coetzee et al ., 2010). The annotation of chromatin provides a reflection of the transcriptional potential of a genomic region or of a genetic variant, as chromatin plays a central role in mediating regulatory signals and controlling access of transcription factors and other elements to the DNA (Ernst et al ., 2011). The characterisation of regulatory regions, in terms of histone modifications and transcription factor-occupied regions, can be assessed through the direct (DNA-protein binding) interaction of selected transcription factors with candidate genetic variants or

38 haplotypes (Freedman et al ., 2011; Hon et al ., 2009). For example, chromatin immunoprecipitation (ChIP) is an in vivo experimental technique consisting of the selective immunoprecipitation of a protein of interest from a chromatin preparation to determine the associated DNA sequences (Collas, 2010; O’Neill and Turner, 1995).

Since initial publications in 1995 (O’Neill and Turner, 1995), the ChIP methodology has been refined in order to target genetic variants for functional analysis, through the comparison of chromatin derived from cell lines containing genetic variants of interest

(Wright et al ., 2010) . For example, Wright et al . (2010) used ChIP to identify allele- specific enrichment of the TCF4 and beta-catenin proteins to a 300 bp region centred around the rs6983267 SNP in familial colorectal cancer (CRC) cell lines. Wright and co-workers found the ChIP DNA samples analysed using the TCF4 and beta-catenin antibodies showed enrichment with the cancer-associated rs6983267G allele in contrast to the T allele. ChIP has also been used by Menendez et al . (2006) to identify allele- specific enrichment of the p53 transcription factor in the FLT-1 promoter encompassing a SNP (‘C519T’). Menendez and co-workers identified p53 enrichment in the presence of the T allele and determined an association between the observed allele-specific p53 interaction and angiogenesis-associated diseases, including cancer (Menendez et al .,

2006).

Despite proving useful in disease research, ChIP is associated with a number of inherent limitations, relating to low signal strength (relative to controls) and resolution, that may lead to inconclusive results (Collas, 2010; Carey et al ., 2009). Carey and co-workers also report that the use of ChIP cannot solely demonstrate the functionality of a protein at a genomic region of interest (Carey et al ., 2009). Experimentally, targeted ChIP approaches can also be cumbersome and require significant numbers of cells (Collas,

39

2010), especially when targeting SNPs rather than the traditional target of haplotypes

(i.e. haploChIP') (Ameur et al ., 2009; Hultman et al ., 2010). Adding to the complexity, the functional assessment of SNPs using ChIP requires the assessment of multiple regions around and including a SNP in order to validate the influence of specific SNPs

(Charles, 2005).

1.4.4 CHART-PCR Chromatin accessibility by Real-Time PCR (CHART-PCR) is an in vivo assay (Figure

1.1) that is able to map chromatin accessibility across a gene and regulatory sequences

(Rao et al ., 2001; Cruickshank et al ., 2012). CHART-PCR assigns a relative value of accessibility of DNA binding proteins to a specific region of DNA and is an indication of the chromatin arrangement and transcriptional activity at that site (Rao et al ., 2001;

Hager et al ., 1993; Poke et al ., 2012). CHART-PCR is based on the concept that localized ‘open’ chromatin conformation confers greater sensitivity to nuclease digestion. The degree of sensitivity (or ‘chromatin accessibility’) can be amplified and measured by quantitative real-time PCR (qPCR) (Mummidi et al ., 2007).

40

Figure 1.1 Principles of CHART-PCR . CHART-PCR consists of firstly the isolation of intact nuclei. Following isolation and washing, nuclei are incubated with or without nuclease (such as micrococcal nuclease or deoxyribonuclease 1) giving “undigested” and “digested” samples. Accessibility reactions are subsequently terminated and genomic DNA recovered and prepared for analysis by quantitative real-time PCR in order to quantify target regions. Comparing the quantity of targeted genomic sequences in digested relative to undigested samples provides a measure of chromatin accessibility and expressed as a percentage of nuclease cutting across the region specified by the primers. Figure derived from www.epigentek.com

41

Mikkelson et al . (2007) and Birney et al . (2010) found that chromatin state can be read in an allele-specific manner using SNPs, which can influence the localised chromatin context and accessibility within genes. Moreover, Pang et al . (2013) reported that allele- specific chromatin accessibility differences can influence gene expression, phenotype, and functionality. CHART-PCR has been used to characterise functional chromatin accessibility in genes including CYP19 (Fürbass et al ., 2008), IL -2 (Rao et al ., 2001), and E-selectin (Edelstein et al ., 2005). More recently, CHART-PCR has been refined in order to differentiate between regions of DNA which differ by as little as a single SNP, as shown in the CR2 gene (Cruickshank et al ., 2012). Cruickshank and co-workers used

CHART-PCR to identify a systemic lupus erythematosus (SLE) risk-SNP (rs3813946) in the 5 ′ untranslated region (UTR) of the CR2 gene that resulted in allele-specific

MNase digestion and transcription factor binding within the SNP-containing region.

These data confirmed the functional effects of rs3813946 on CR2 transcription

(Cruickshank et al ., 2012), which had been previously found a decrease in transcriptional activity of CR2 in the presence of the C allele relative to the T allele (Wu et al ., 2007).

While chromatin accessibility plays an essential role in transcriptional regulation,

CHART-PCR as an assay has limited potential due to locus-specific effects, with confounding variability resulting from other SNPs and regulatory elements within the target region (Cruickshank et al ., 2009). CHART-PCR can also be considered as a low throughput assay, with inconsistencies arising when comparing multiple cell lines due to fluctuations in the degree of nuclease digestion (Karimi, 2007). Other sources of potential variation can be attributed to quantitative real-time PCR (qPCR) inconsistencies from: improper primer design (in low complexity regions), DNA hetero-

42 duplex formation, mis-priming and primer dimerisation (Freeman et al ., 1999; Bustin et al ., 2005; and Git et al ., 2010).

1.4.5 Bisulfite sequencing DNA methylation is an epigenetic mechanism that involves the addition of a methyl group to particular cytosine residues in the DNA. This process can drastically affect transcriptional regulation during development, with dysregulation of this mechanism associated with diseases such as type 2 diabetes (Volkmar et al ., 2012) and cancer

(Deng et al ., 2009; Suzuki et al ., 2012). Typically, DNA methylation is associated with reduced gene expression either directly, through reducing the accessibility of transcription factors, or indirectly, by enhancing the formation of heterochromatin (Caro et al ., 2012; Watt and Molloy, 1988; Boyes et al ., 1991). The process is reversible by de-methylation of methylated residues (Holliday et al ., 1991). Bisulfite sequencing enables the identification of the methylation status of cystosines in targeted genomic regions and is dependent on the action of bisulfite (Figure 1.2). Bisulfite preferentially deaminates unmethylated cytosines, which are converted to uracil upon desulfonation, while methylated cytosines are unaffected due to the presence of the methyl modification (Benoukraf et al ., 2013; Frommer et al., 1992). The amplification and subsequent sequencing of target regions from bisulfite-treated DNA permits the characterisation of the methylation status at single-nucleotide resolution (Deng et al .,

2009).

43

Figure 1.2 Principles of methylation analysis using bisulfite genomic sequencing . After treatment with sodium bisulfite, unmethylated cytosine residues are converted to uracil whereas 5-methylcytosine (5mC) remains unaffected. After PCR amplification, uracil residues are converted to thymine. DNA methylation status can be determined by direct PCR sequencing or cloning sequencing. Figure and legend obtained form Li and Tollefsbol (2011).

44

Bisulfite sequencing is particularly useful for the characterisation of CpG islands, as increased numbers of CG dinucleotides in associated primers increase the specificity of targeted amplification when performing PCR as they allow a tighter interaction with the target region (Rauch et al ., 2005). CpG islands are found within, or in close proximity to, more than 70% of mammalian gene promoters (Hodges et al ., 2009; Saxonov et al .,

2006). They can influence the chromatin conformation and subsequently affect the transcriptional potential of a gene, depending on the methylation status of CG dinucleotides within the island (Bird, 2002; Goll and Bestor, 2005). Robertson (2005) reviewed the effects of irregular DNA methylation and aberrant regulation of CpG island methylation status and noted associations with human diseases, including cancers

(van Rijnsoever et al ., 2002; Dong et al ., 2005) and cardiovascular disease (Dong et al .,

2002). Genetic variants located in CpG sites can greatly influence DNA methylation and therefore affect chromatin conformation and accessibility. As such, allele-specific differences in chromatin accessibility can often be detected and indicate SNP functionality (Mikkelsen et al ., 2007; Shoemakers et al ., 2010). For example, Zhang, Y et al . (2009) used bisulfite sequencing to assess DNA methylation of CpG islands on chromosome 21 in leukocytes and identified significant allele-specific methylation due to candidate functional SNPs in three chromosomal regions, including the promoter of the CBR1 gene (dbSNPs rs41563015 and rs25678). Zhang and co-workers found that

DNA methylation varied by as much as 85% between alleles and was associated with mono-allelic gene silencing of the methylated allele (Zhang, Y et al ., 2009).

A review by Tycko (2010) describes two mechanistic models describing the role of genetic variants situated within CpG islands. The first model suggests that variants can either create or delete CpG sites, which in turn influence the propensity of adjacent non- polymorphic CpG sites to become cytosine-methylated (Tycko, 2010). The alternative

45 model describes that a portion of CpG-associated genetic variants could modulate the binding of transcription factors in either a positive or negative manner (Tycko, 2010).

For example, Boumber et al . (2008) found a 12 bp indel polymorphism in a Sp1 GTF- binding site in a CpG island within the RIL gene promoter that influences the propensity of the gene to become methylated in leukaemias.

Zhang and Jeltsch (2010) review sources of potential variability associated with the bisulfite sequencing approach, including: clonal DNA amplification, incomplete bisulfite conversion, non-CpG methylation and insufficient sequence depth. Recent advances in high-throughput sequencing technologies, however, may overcome the lack of sequencing depth required to generate conclusive evidence and also allows the assessment of multiple genomic regions (Statham et al ., 2012). For instance,

Shoemakers et al . (2010) used high-throughput bisulfite sequencing to identify heterozygous SNPs in CpG sites (‘CpG-SNPs’) that disrupt DNA methylation.

Shoemakers and co-workers identified that allele-specific methylation (ASM) was present on 23-37% of heterozygous SNPs in cell culture, with a significant proportion of ASM (38-88%) influenced by SNPs disrupting potential CpG sites. In a related study,

Gertz et al . (2011) also used high-throughput bisulfite sequencing to identify allele- specific DNA methylation on the X-chromosome, and found allele-specific methylation at SNPs across six individuals of a three-generation family. Gertz and co-workers found that 80% of the variation in DNA methylation was dependent on sequence. Historically, technology has limited genome-wide scale approaches in epigenomics, but the emergence of high-throughput sequencing technologies will allow the determination of the global allele-specific methylation status (Callinan and Feinberg, 2006).

46

1.4.6 Genome-wide functional assays The last decade has seen significant advances over the traditional Sanger et al . (1977) chain-termination DNA sequencing approach, with current high-throughput (‘deep’ or

‘next-gen’) technologies capable of re-sequencing the human genome in approximately a week using a single instrument (Ståhl and Lundeberg, 2012; Metzker, 2010; Shendure and Ji 2008). Since the completion of the HGP, research has focussed on identifying the functions of particular DNA sequences at a genome-wide level. Combining new high- throughput sequencing with other techniques provides a more complete picture of how the genome and various regulatory mechanisms result in particular biological functions

(Wu et al ., 2006). Recent genome-wide functional assays that utilise high-throughput sequencing technologies have enabled the direct evaluation of the role of genetic variants in gene expression and function (Pastinen, 2010).

Deep sequencing technologies have enabled functional assays such as ChIP, to be extended to a genome-wide scale (Collas, 2010; Barski et al ., 2007; Statham et al .,

2012). A review by Collas (2010) provides a detailed evaluation of the existing integrated approaches that combine deep sequencing and ChIP methodologies (‘ChIP- seq’) for the profiling of histone modifications, histone variants, and transcription factor occupancy of the DNA. For example, ChIP-seq was used initially by Barski et al .

(2007) to characterise histone modifications associated with H2A.Z, RNA polymerase

II, and the insulator binding protein CTCF in the human genome. More recently,

Statham et al . (2012) developed an enhanced ChIP-seq methodology for directly determining the methylation status of histone-modified DNA through the combination of high-throughput bisulfite sequencing and ChIP (‘BisChIP-seq’). Statham and co- workers used BisChIP-seq to identify allele-specific methylation associated with

H3K27me3 histones in normal and prostate cancer cells (Statham et al ., 2012).

47

Allele-specific gene expression has been widely studied, with an estimated 20% of human polymorphic sites showing 1.5-fold expression differences (Zhang, K et al .,

2009), and 30% of human transcripts showing a 1.2-fold difference between alleles (Ge et al ., 2009). Thus far, the annotation of human genetic variation associated with gene expression has centred on expression quantitative trait loci (eQTL) mapping. However, next-gen sequencing technologies have enabled the development of genome-wide functional assays targeting allele-specific gene expression (or ‘cis-regulatory variation’), DNA–protein interactions and chromatin configuration (Pastinen, 2010;

Cookson et al ., 2009). For example, early studies characterizing allelic-specific gene expression utilised deep sequencing to target the human transcriptome (‘RNA-seq’), as shown by Montgomery et al . (2010) and Pickrell et al . (2010). RNA-seq is dependent upon the correlation between sequence reads associated with polymorphic sites and

RNA abundance, which enables the detection of genetic variants that show potentially skewed expression (Pastinen, 2010). For instance, Pickrell et al . (2010) used RNA-seq to identify 138 SNPs that were associated with eQTLs, including one SNP (rs7639979) showing allele-specific expression (GG>GA>AA) in the TSP50 gene in Nigerian lymphoblastoid cell lines.

While genome-wide functional assays have overcome some limitations related to traditional assays, such as locus-specificity, large-scale approaches involving comprehensive genome coverage are associated with both false positives and false negatives (Mardis and Wilson, 2009; Grunenfelder and Winzler, 2002; Huttlin et al .,

2007). Furthermore, the quality and resolution of datasets from seemingly related studies can vary considerably (von Mering et al ., 2002). In order to resolve the effects of confounding factors, the results from such genome-wide functional experiments must

48 be validated by performing further experiments (Martin and Drubin, 2003; Mattocks et al ., 2010).

A current focus in functional genomics is the elucidation of causal genetic variants that are essential for cis-regulatory control of gene expression, with association studies implicating a functional role of these genetic variants in complex traits and disease

(Pastinen, 2010). Given that the genes and variants associated with complex traits can be diverse, the integration of multiple whole-genome approaches including GWAS and genome-wide functional assays such as ChIP-seq and RNA-seq, are essential for characterising the genetic basis of complex traits (Hawkins et al ., 2010; Cowie et al .,

2006). This integrated approach enables the functional characterisation of genetic variants with single nucleotide resolution, and overcomes limitations associated with benchmark tests such as reporter gene and gel shifts assays (Pastinen, 2010; Karimi et al ., 2009). Genome-wide approaches generate large numbers of experimental data points, with difficulties arising when attempting to discern functionally relevant results from large-scale datasets (Salmon-Divon et al ., 2010). As such, bioinformatic tools represent an ideal approach for the prioritisation of candidate genetic variants from genome-wide datasets for further investigation (Kann, 2010; Sun et al ., 2009; Salmon-

Divon et al ., 2010).

1.4.7 Bioinformatic analyses Bioinformatic approaches are important for the prediction, prioritisation and verification of functional genetic variants in the human genome, as reviewed by Huang et al . (2009) and Fernald et al . (2011). Many bioinformatic approaches exist for the prioritisation of functional genetic variation by identifying transcription factor binding motifs that contain potential functional variants. Such tools include JASPAR (Sandelin et al .,

49

2004); P-Match (Chekmenev et al ., 2005); TRANSFAC (Wingender et al ., 1996; Matys et al ., 2006); and AliBaba2 (Grabe, 2002) and often utilise various methods to improve the validity of results. For example, P-Match integrates known transcription factor binding sites from the TRANSFAC database (Wingender et al ., 1996) with pattern matching and weight matrix algorithms to increase recognition accuracy and reduce error rate (Chekmenev et al ., 2005). P-Match has been used to predict transcription factor binding sites in fish (Sun et al ., 2006), mice (Jiang et al ., 2008) and humans

(Göring et al ., 2007). Similarly, AliBaba2 utilises the TRANSFAC database

(Wingender et al ., 1996) to identify transcription factor binding sites by generating predictive matrices based on input DNA sequences (Grabe, 2002), as demonstrated in the MTA1 gene (Reddy et al ., 2009) and the SVEP1 gene (Glait-Santar and Benayahu,

2011). AliBaba2 has been used to predict transcription factor binding sites in different organisms including insects (Ursic-Bedoya et al ., 2008), fish (Yeh and Klesius, 2008) and humans (Moosa et al ., 2011).

The structural influence of genetic variants at the protein level can be predicted using bioinformatic tools as reported by Lin et al . (2005), Cuff et al. (1998) and Faraggi et al .

(2012). For example, YASPIN is an online database that utilises 7-state local structure combined with Markov chain models to predict protein secondary structures (Lin et al .,

2005). YASPIN has been used by Pool et al . (2012) and Fallahi-Sichani et al . (2011) to determine protein secondary structures. Other tools that are being developed to better predict secondary and tertiary structures include SPINE X (Faraggi et al ., 2012) and

FPGA accelerator (Xia et al ., 2011).

With recent advancements in high-throughput sequencing technologies, new challenges have arisen that require further bioinformatic power. Such challenges include complex

50 multiple sequence alignments from large datasets and the non-reproducibility present amongst various current tools, as reviewed by Aniba et al . (2010). However, these limitations are, in part, overcome by the integration of multiple bioinformatic tools.

Such a high-throughput approach is likely to increase the likelihood of identifying casual genetic variants in human traits and disease (Huang et al ., 2009). For example,

Lee and Shatkay (2008) developed a functional SNP (F-SNP) application that incorporates 16 different bioinformatic tools to predict the splicing, transcriptional, translational and post-translation effects of SNPs. Multiple bioinformatic tools, via the

F-SNP database, have been used to predict functional SNPs in the FOXE1 gene (Landa et al ., 2009), and genes encoding for proline metabolic enzymes (Hu et al ., 2008). Xu and Taylor (2009) use an equivalent approach called ‘SNPinfo’ for candidate SNP prioritisation that integrates published GWAS results with LD data and in silico predictions of functional characteristics. The SNPinfo database has been used in studies to predict functional SNPs in the LIN28B gene (Permuth-Wey et al ., 2011) and the

TNFSF8 gene (Wei et al ., 2011). Typically, most genetic variants identified in GWAS have minor effects on disease susceptibility. Thus, the integration of bioinformatic approaches and functional data is essential for the prioritisation and identification of candidate functional genetic variants (Moore et al . 2010).

1.5 VNN1 – A gene model for novel experimental validation assays and cardiovascular disease The VNN1 gene has been associated with a number of human diseases (Kaskow et al .,

2012) including inflammation (Martin et al ., 2004; Berruyer et al ., 2006; Jansen et al .,

2009), malaria (Min-Oo et al ., 2007), type 1 diabetes (Roisin-Bouffay et al ., 2008) and cardiovascular disease (Göring et al ., 2007; Zhu and Cooper, 2007). Initial cardiovascular studies concerning the VNN1 gene targeted transcript levels as an

51 endophenotype to identify the VNN1 gene as a quantitative trait locus associated with high density lipoprotein–cholesterol levels and cardiovascular phenotypes (Göring et al ., 2007). More recently, genetic variants within the VNN1 gene have been found to be associated with cardiovascular disease as shown by Jacobo-Albavera et al . (2012), Fava et al . (2010) and Zhu and Cooper (2007). Furthermore, statistical and functional genetic assays have experimentally assessed a number of functional genetic variants within the

VNN1 gene including the ‘-708G/A’, ‘-667G/A’, ‘-612C/A’, ‘-587G/A’, ‘-137T/G’

SNPs in the gene promoter (Göring et al ., 2007) and the exonic rs2294757 SNP (Kerkel et al ., 2008). Reviews exemplify VNN1 as a novel cardiovascular risk-gene and an ideal gene model for further functional analysis given the experimentally validated associations of VNN1 genetic variation with cardiovascular phenotypes (Göring et al.,

2007; Kerkel et al., 2008).

1.5.1 VNN1 – The novel cardiovascular risk-gene In humans, the VNN1 gene is located on the antisense strand of (Entrez

Gene cytogenetic band: 6q23-24) within a cluster of three genes ( VNN1 , VNN2 , and

VNN3 ) which occupy a 80 kb region and comprise the Vanin gene family (Martin et al ., 2001; Galland et al ., 1998). The VNN1 gene is 33,198 bp in length and consists of seven exons (Martin et al . 2001; http://www.genecards.org/cgi- bin/carddisp.pl?gene= VNN1 ). The dbSNP database includes over 40 million validated

SNPs in the most recent human genome build (dbSNP build 135), with over seven- hundred SNPs (779 as of July 2012) located within the VNN1 gene (Sherry et al ., 2001; http://www.ncbi.nlm.nih.gov/projects/SNP/). Orthologs for the VNN1 gene have been identified in 15 species including mice (Martin et al ., 2001) and rats (Fugmann et al .,

2011). VNN1 mRNA expression has been reported in numerous human cells and tissues including leukocytes, kidney, liver, and the intestine (Galland et al ., 1998; Jansen et al .

52

2009). A genome-wide profiling study by Yanai et al . (2005) identified VNN1 expression levels to be highest within whole blood and liver.

The VNN1 gene encodes for a 513 amino acid protein, namely Vanin 1, which was first reported to have pantetheinase activity by Aurrand-Lions et al . (1996). Aurrand-Lions et al . (1996) identified similar pantetheinase activity in Vanin family members VNN2 and VNN3 , who also found that Vanin 1 played a role in regulating late adhesion steps of thymus homing under physiological (non-inflammatory) conditions. Aurrands-Lion et al . (1996) also reported the Vanin 1 protein to have a sub-cellular location on the cell membrane, linked to a lycosylphosphatidylinositol (GPI)-anchor. In a subsequent study,

Marras et al . (1999) defined Vanin 1 pantetheinase activity to involve the hydrolysis of one carbo-amide linkage in d-pantetheine to form d-pantothenate (vitamin

B5/pantothenic acid) and cysteamine (2-aminoethanethiol). A recent review by Kaskow et al . (2012) reported that Vanin 1-directed hydrolysis of pantetheine impacts on cellular processes including fatty acid metabolism and oxidative stress response.

1.5.2 Gene models in cardiovascular disease – case studies Cardiovascular diseases (CVDs) represent the leading cause of death worldwide (Moran et al ., 2012), and affected more than 80 million people in the United States in 2012 alone (Kathiresan and Srivastava, 2012). To elucidate the genetic risk factors of CVD, early research focused on the APOE gene, which encodes the lipid-transport protein apolipoprotein E (Rall and Mahley, 1992; Visvikis-Siest and Marteau, 2006; Willer et al ., 2008). The APOE gene is polymorphic with two non-synonymous SNPs (rs429358 and rs7412) which correspond to three haplotypes ( ε2, ε3, ε4) with varying population frequencies (Bekris et al ., 2008). The polymorphisms alter the amino acid sequence at positions 112 and 158 and result in three apolipoprotein E isoforms which differ in

53 structure and function (Kuhlmann et al ., 2010; McIntosh et al ., 2012). A report by

Cambien and Tiret (2007) determined that ε4 carriers (allele frequency >20% in the population) had a 40% higher risk of CHD than ε3 homozygotes, and shows the link between genetic variation and susceptibility risk for cardiovascular disease. More recently, the APOE gene has also been associated with Alzheimer’s disease (Genin et al ., 2011).

At the molecular level, combinations of genetic variants are assumed to dysregulate particular cellular pathways and contribute to human diseases such as CVD (Kim et al .,

2011). Although few causal genetic risk factors have been associated with complex

CVDs, the genetic mechanisms underlying monogenic cardiovascular traits have been more extensively characterised (Nabel, 2003; Glazier et al ., 2002; Timpson et al .,

2012). For example, low-density lipoprotein (LDL) is a risk factor for monogenic forms of CVD through the inactivation of hepatic LDL receptors, which modulates LDL levels in plasma (Timpson et al ., 2012). Familial hypercholesterolemia (FH) is a monogenic disorder associated with mutations within the LDLR gene, leading to a deficit in LDL receptors and subsequently elevated plasma cholesterol levels (Nabel, 2003; Kassim et al ., 2010). Published estimates indicate in excess of 600 mutations and variants in the

LDLR gene have been identified in patients with FH, with heritability of variants estimated to exceed 50% (Goldstein et al ., 2001; Cambien and Tiret, 2007; Browning and Browning, 2013).

Hypertension is a complex cardiovascular condition associated with elevated blood pressure in arteries and confers an increased susceptibility to heart and renal failure, stroke, and myocardial infarction (Messerli et al ., 2007). Many genetic risk factors associated with increased blood pressure have been identified (Ehret et al ., 2011; Lifton

54 et al ., 2001). However, the genetic basis of primary hypertension remains poorly understood; in part due to confounding environmental and lifestyle factors such as dietary salt consumption (He and MacGregor, 2009), physical activity (Dickinson et al .,

2006), and obesity (Haslam and James, 2005) which influence the CVD phenotype

(Taylor et al ., 2009). Thus far, hypertension research has focused on the identification of genetic risk variants in the nuclear genome; however, a recent study by Liu, C et al .

(2012) identified a number of SNPs within the mitochondrial genome that contribute to increased susceptibility to hypertension. Liu and co-workers identified a non- synonymous mitochondrial SNP 5913G>A (Asp>Asn) in the cytochrome C oxidase subunit 1 of respiratory complex IV that was found to have a significant association with blood pressure (BP) and fasting blood glucose (FBG) levels (Liu, C et al ., 2012).

On average, carriers of the rare 5913A allele were at greater risk of developing hypertension, with 7mm Hg higher systolic BP than observed at baseline levels and

17mg/dL higher mean FBG (Liu, C et al ., 2012). Lui and co-workers identified a second mitochondrial SNP, 3316G>A (Ala>Thr) in the NADH dehydrogenase subunit 1 of complex I that was also highly associated with BP and FBG levels (Liu, C et al .,

2012). The two individuals with the rare 3316A allele had 17 and 25mg/dL higher FBG than observed at baseline levels (Liu, C et al ., 2012). Estimates of the contribution of genetic variants to hypertension range from 30 to 50% (Tanira and Balushi, 2005), which reflects the importance of identifying and characterising the role of functional genetic variants in human diseases.

To elucidate the role of VNN1 in CVD, Göring et al . (2007) used genome-wide transcriptional profiles (of lymphocytes) and bioinformatic tools to identify genetic variants in the VNN1 gene promoter that were highly associated with transcript abundance and high-density lipoprotein cholesterol (HDL-C) concentrations. Göring

55 and co-workers reported five SNPs (-708G/A, -667G/A, -612C/A, -587G/A, -137T/G) in the VNN1 gene promoter that were identified to be putatively functional based on the genetic association with transcript abundance and HLD-C concentration (Göring et al .,

2007). The -137T/G SNP had the highest association with HDL-C concentrations and was functionally assessed using EMSA, which indicated an allele-specific binding profile of DNA-protein complexes (Göring et al ., 2007). Bioinformatic analysis of the -

137T/G SNP using P-Match identified allele-specific stimulating protein 1 (Sp1) binding in the presence of the -137T allele (Göring et al ., 2007). The transcription factor

Sp1 has previously been associated with high gene expression and contains a zinc finger protein motif which directly binds to DNA and allows the TF to enhance transcriptional activity (Jiang et al ., 2005). More recently, Jacobo-Albavera et al . (2012) also found

VNN1 gene expression and the -137T/G SNP to be associated with HDL-C levels in

Mexican children.

To determine the role of DNA methylation in VNN1 gene expression, Kerkel et al .

(2008) used a methylation-sensitive restriction endonuclease (HpaII) to assess heterozygous VNN1 genetic variants their effect on gene expression. Kerkel and co- workers identified the A allele of an exonic SNP, rs2294757, to have preferential upregulation of gene expression, indicating allele-specific methylation and the presence of functional SNPs in the VNN1 gene (Kerkel et al ., 2008). Subsequently, Fava et al .

(2010) found an association between the rs2294757 SNP and cardiovascular disease symptoms such as hypertension.

The identification of causal genetic risk factors represents the primary focus of cardiovascular research, with genetic variants in the VNN1 gene representing novel targets for analysis based on published studies (Zhu and Cooper, 2007; Fava et al .,

56

2010; Jacobo-Albavera et al ., 2012; Göring et al ., 2007; Kerkel et al ., 2008). Clearly, the VNN1 gene is involved with cardiovascular phenotypes, however, the specific roles of these genetic variants in disease is not obvious (Kaskow et al ., 2012). While predictive bioinformatic tools and genetic association studies have been used to identify candidate functional variants in the VNN1 gene (Zhu and Cooper, 2007; Fava et al .,

2010), further experimental research is required to elucidate the role of VNN1 genetic variants in CVD in order to complement functional studies, such as those done by

Göring et al . (2007) and Kerkel et al . (2008).

1.6 Research justification and PhD aims Recent advances in bioinformatics and functional genomics have allowed greater precision in predicting the function of human genetic variants in complex traits and disease (Collas, 2010; Statham et al ., 2012). However, a review of the literature indicates there is a still a clear requirement for robust, rapid and reproducible experimental validation methods for assessing candidate functional genetic variants, such as those within the VNN1 gene. The VNN1 gene is clearly involved in inflammation, but specific roles in other pathways are unclear, with dysregulation of the

VNN1 gene also resulting in the manifestation of other diseases (Kaskow et al ., 2012).

Vanin 1 pantetheinase activity is critical in maintenance of normal cellular processes.

Abnormal gene activity, and consequent dysregulation of such processes, can lead to disease and indicates the importance of characterising the effect of genetic variants on gene expression (Kaskow et al ., 2012). In fact, recently published studies show that

VNN1 genetic variants have a role in cardiovascular disease (Jacobo-Albavera et al .,

2012; Zhu and Cooper, 2007; Fava et al ., 2010; Göring et al ., 2007; Kerkel et al .,

2008), which promotes VNN1 as the ideal gene model for undertaking functional analyses using novel experimental validation approaches. Since there is limited

57 information regarding the functional mechanism by which genetic variants influence human disease, this thesis will significantly contribute to the understanding of the role of functional genetic variants in the cardiovascular risk-gene VNN1 .

The current study will include the development of a novel in vivo assay for the identification and characterisation of functional genetic variants within the VNN1 gene, which overcomes limitations associated with functional assays available at the commencement of this study (2009). This novel assay, namely ‘CHA-seq’, is based on the measurement of chromatin accessibility across regions containing genetic variants to account for patterns of transcription factor binding, nucleosomal/histone positioning and

DNA methylation, which are directly associated with functional genetic variation. The

VNN1 gene was used as a model for the development of CHA-seq, with the aim of extending this approach to a genome-wide scale using novel high-throughput sequencing technologies.

The aims of this PhD research were to:

1. Use existing chromatin arrangement methods (i.e. CHART-PCR) to assess the

functionality of candidate genetic variants in the VNN1 gene promoter.

2. Develop a novel experimental validation assay (‘CHA-seq’) to evaluate the

functionality of candidate genetic variants in the VNN1 gene promoter.

3. Experimentally validate candidate functional genetic variants in the VNN1

gene promoter derived from the CHA-seq approach.

58

To address the first aim of this study, CHART-PCR was used to characterise chromatin accessibility in the VNN1 gene promoter using cultured lymphocytic cell lines derived from the San Antonio Family Heart Study (SAFHS) (Göring et al ., 2007), in addition to the Ramos lymphocytic cell line obtained from the American Type Culture Collection

(ATCC). Cell lines were selected for functional analysis based on the particular genotypes present at the -137T/G and -587G/A SNP loci in the VNN1 gene promoter.

The -137T/G and -587G/A VNN1 SNPs were selected for further functional analysis based on preliminary findings in our laboratory and the results of published studies such as Göring et al . (2007). The -137T/G and -587G/A SNPs were assessed using two chromatin-sensitive nucleases (micrococcal nuclease and deoxyribonuclease 1) in

CHART-PCR to determine the presence or absence of allele-specific chromatin accessibility, which is indicative of functionality.

To address the second aim, chromatin derived from the SAFHS cell lines with different allelic combinations were used to develop the novel ‘CHA-seq’ approach to functionally characterise genetic variants in the VNN1 gene. The CHA-seq approach incorporated chromatin-sensitive nucleases and high-throughput DNA sequencing in order to simultaneously identify and functionally assess candidate SNPs based on chromatin accessibility. CHA-seq was used to validate the findings of CHART-PCR of the candidate functional -137T/G and -587G/A SNPs and to screen for other novel candidate functional SNPs in the VNN1 gene.

To address the third research aim, novel candidate functional VNN1 SNPs identified from CHA-seq were selected for further functional assessment in order to characterise the role of the candidate genetic variants in the VNN1 gene. A combined bioinformatic and experimental validation approach was used to confirm the CHA-seq results which

59 incorporated bioinformatic or in silico predictive methods such as P-Match and

AliBaba, in addition to ‘traditional’ experimental methods such as EMSA and bisulfite sequencing.

60

Chapter 2: General methods and materials

2.1 General reagents, biochemicals, buffers and solutions Biochemical reagents used in the current study were of the highest grade available and purchased from primarily Sigma Chemical Company (Sigma, Australia), Roche

Diagnostics Corporation (Roche Diagnostics Australia Pty Ltd, Australia), Invitrogen

Corporation, Australia Corporation (Invitrogen Corporation, Australia), QIAGEN

(QIAGEN, Australia) and BD Biosciences (Becton, Dickinson and Co, USA). All disposable plastic (polypropylene and polystyrene) lab-ware was purchased sterile from

Sarstedt (SARSTEDT AG and Co, Germany), Nalgene (Nalge Nunc International,

USA) or Corning Life Sciences (Corning Incorporated, USA), with a detailed list of lab- ware shown in Table A.3 (Appendix – Supplementary Tables and Figures).

Biochemicals and buffers were prepared in accord with the methods described by

Sambrook and Russell (2001) using sterile non-pyrogenic (Baxter Internation Inc.,

USA), double deionised Millipore-filtered water (ddH 2O), Dulbecco’s phosphate buffered saline (Invitrogen Corporation, Australia) or molecular biology-grade organic solvents as required. Solutions were prepared in sterile plastic tubes or glass vessels that had been previously washed several times with ddH 2O and autoclaved (30min at 121˚C,

15 PSI). Solutions were sterilised, depending on the contents, by autoclaving (30min at

121˚C, 15 PSI) or using a 0.22µm Acrodisc (Pall Corp, Australia) syringe filter units.

Commonly used 1X Tris-EDTA (TE) and 50X Tris-Acetate-EDTA (TAE) buffers were made up routinely for general laboratory use. To make a 100 mL solution of TE Buffer,

1 mL of 1 M Tris-HCl (pH 8.0) and 0.2 mL EDTA (0.5 M) was added together in a glass beaker and made up to volume with double distilled water. pH was adjusted to 8.0

61 using 1 M HCl and 1 M NaOH. TAE buffer was prepared as a 50X stock solution by dissolving 242 g Tris Base in sterile double distilled water, adding 57.1 mL Glacial

Acetic Acid, and 100 mL of 0.5M EDTA (pH 8.0) solution, and maded up to 1 L with water. This stock solution was diluted 1/50 with water to make a 1X working solution.

Where necessary, reagents were weighed using a model 310 electronic balance (MG

Scientific, USA) and heated and/or stirred with a Thermolyne Cimarec 2 magnetic stirrer (Thermo Fisher Scientific Inc, USA). The pH of solutions was measured using a

PerpHecT pH/temperature meter (Orion Research, USA).

General biochemicals and reagents utilised in the current study were sourced from various distributers listed in detail in Table A.1 (Appendix – Supplementary Tables and

Figures), with a detailed list of the corresponding equipment used in this research shown in Table A.2 (Appendix – Supplementary Tables and Figures).

2.2 Bacterial Methods

2.2.1 Bacterial culture medium Microbiological reagents were obtained from Difco Laboratories (Becton, Dickinson and Company, USA), BDH Laboratories (Merck, Germany) and Sigma Chemical

Company (Sigma, Australia). Bacterial cell culture media and ampicillin antibiotics were prepared in accord with the methods described by Sambrook and Russell (2001).

All media and solutions were prepared to the prescribed volumes with double-deionised water and balanced to pH 7 with 1 M HCl and 1 M NaOH. To facilitate positive bacterial selection, the antibiotic ampicillin is added to bacterial culture media prior to experimentation at a concentration of 100 µg/ µL. Aliquots of ampicillin were stored at -

20 oC for no more than 1 week.

62

1 L of LB ( Luria-Bertaini ) broth was prepared by dissolving 10 g Bacto Tryptone, 5 g

Yeast Extract and 10 g NaCl into sterile double distilled water and made up to 1 L with water. pH was adjusted to 8.0 using 1 M HCl and 1 M NaOH, followed by autoclaving at 121°C for 30 minutes. LB broth was supplemented with 100 mg/mL ampicillin immediately prior to use. Luria-Bertani (LB) agar was prepared using the same methodology as LB broth, with the addition of 15 g of Bacto Agar. Following autoclaving, LB agar was cooled to ~60°C and supplemented with 100 mg/mL ampicillin, and poured (~25 mL) into perti dishes and left to cool at room temperature until set. Once the agar was set, plates were sealed in parafilm and stored at 4 oC until required, however, LB agar dishes were stored for no longer than two weeks. Prior to use, plates were warmed in a 37 oC incubator for 30 minutes and supplemented with 20

µL X-gal (Promega, USA) for subsequent selection of white recombinant colonies using traditional blue-white colony screening (Section 2.2.4).

2.2.2 Bacterial culturing conditions All recombinant cloning and bacterial procedures were carried out using the One Shot

MAX efficiency DH5 α-T1 R strain of Escherichia Coli (Life Technologies, Australia), stored at -80 oC, with the following genotype: F- φ80lacZ ∆M15 ∆(lacZYA-argF) U169 deoR recA1 endA1 hsdR17 (rk-, mk+) phoA supE44 λ- thi-1 gyrA96 relA1 tonA

o Bacterial cells were grown at 37 C with 5% CO 2 in a CO2 Water Jacketed Incubator

(Forma Scientific, USA).

63

2.2.3 TOPO cloning in bacteria The TOPO TA Cloning kit (Invitrogen Corporation, Australia) was used in this project to incorporate Taq-polymerase-amplified PCR products into the pCR2.1-TOPO plasmid vector and conducted according to the manufacturer’s instructions, with the following modifications to the reaction set-up. In a 1.5 mL micro-centrifuge tube, 1 µL of pCR2.1

TOPO vector was added to 1 µL of ‘salt solution’ and 1-4 µL of fresh PCR product.

Reaction volume was made up to 6 µL with sterile double distilled water and mixed gently by pipette. TOPO Reaction was left to incubate for five minutes at room temperature, followed immediately by transformation (Section 2.2.4) or stored for later use at -20°C. TOPO reactions were subsequently transformed into competent E.Coli cells (refer to section 2.2.4) or stored at -80°C for use at a later date.

2.2.4 Transformation of Bacteria and blue-white colony selection Bacterial transformations were performed using One Shot MAX Efficiency DH5 α-T1 R

Competent Cells (Invitrogen Corporation, Australia). In brief, 2 µL of pCR2.1-TOPO plasmid vector containing sequence of interest was added to one 50 µL vial of competent cells and stored on ice for 30 minutes. Cells were then heat shocked at 42°C for 45 seconds, followed by two minutes on ice, and subsequently the addition of 500

µL of pre-warmed (37°C) LB broth (See Section 2.2.1). The transformation reaction was then incubated in 37°C water bath for one hour, and then transferred onto LB agar plates (See Section 2.2.1) and grown overnight at 37°C with 5% CO 2. Blue-white colony selection was used in this current study for the selection of recombinant bacterial colonies and performed using the method as described by Sambrook and Russell (2001).

64

2.3 Eukaryotic Methods

2.3.1 Eukaryotic cells and culturing

All mammalian cell lines were cultured at 37°C with 5% CO 2 in a humidified incubator

(Forma Scientific, USA). Human haematopoietic suspension cell lines including Ramos

(RA 1) were obtained from the American Tissue Culture Collection (ATCC, USA), and

Epstein Bar Virus (EBV) transformed lymphoblastoid cell lines were obtained from the

SAFHS cohort (Göring et al ., 2007) . SAFHS cell lines included 0054, 0194, 0306, 0525,

0029, 0188 . Cells were propagated in RPMI 1640 cell culture media (Invitrogen

Corporation, Australia) supplemented with 10% (v/v) heat inactivated foetal bovine serum (FBS; Invitrogen Corporation, Australia), antibiotics (100 U/mL penicillin and

100 µg/mL streptomycin, Invitrogen Corporation, Australia) and 4 mM L-glutamine

(Invitrogen Corporation, Australia). FBS was heat inactivated at 56˚C for three hours, dispensed into 50 mL aliquots and stored at -20˚C until used. Suspension cell cultures were grown in filter-top tissue culture flasks (Nalge Nunc International, USA) and maintained between 0.5-1.5 x 10 6 viable cells/mL by subculturing every 2 days.

2.3.2 Eukaryotic cell counting with Trypan Blue staining To calculate cell density, a 10 µL aliquot of cells were transferred to a 1.5 mL

Eppendorf tube and mixed with 20 µL of Trypan Blue Stain (Sigma, Australia). Stained cells were loaded into a Neubauer Counting Chamber (ProSciTech, Australia) and counted at 20x magnification under a Wild M40 light microscope (Wild Heerbrugg,

Gais, Switzerland). Cells appearing white in colour and uniformly spherical were considered viable and were counted. Cells appearing dark blue in colour were deemed non-viable and were not considered. Cell density was calculated by counting the number viable cells from the four quadrants of the Counting Chamber and then dividing

65 by the number of quadrants considered (4), and then multiplying by the dilution factor

(3) to give cell density in terms of x 10 4 cells/mL.

2.3.3 Cyropreservation storage and revival of eukaryotic cell lines All mammalian cell lines were maintained in storage by cryopreservation. For long- term storage, cells were grown in complete media for approximately five days post- revival from cryopreservation storage (or until cells are in mid-late log growth phase) and were harvested through centrifugation at 1000 rpm (Jouan C312 swinging rotor centrifuge) for five minutes at room temperature in 50 mL Falcon tubes (Sarstedt,

Germany). Following the removal of the supernatant, pellet cells were resuspended in complete media to a density of 2-3 x 10 6 cells/mL. 700 µL Aliquots of resuspended cells were added to 200 µL FBS and 100 µL Dimethyl sulfoxide (DMSO) and stored in

1.5 mL cryovials (Nalge Nunc International, USA) at -80°C for 1-2 days before being transferred to liquid nitrogen for long term storage.

To revive the cells from cryopreservation, vials were initially thawed at 37°C and then transferred to 9 mL complete-media in a 15 mL Falcon tube, and harvested by centrifugation at 1000 rpm (Jouan C312 swinging rotor centrifuge) for five minutes at room temperature. Following the removal of the supernatant, pelleted cells were resuspended in 2-5 mL of fresh complete growth media and transferred to a sterile 25

2 cm filter-top culture flask (Stratagene, USA) and grown at 37°C with 5% CO 2.

2.4 Oligonucleotides All oligonucleotide primers were obtained from Sigma-Aldrich Pty Ltd, Castle Hill

NSW, Australia (Sigma, Australia) purified by desalting for use in Polymerase Chain

Reactions (PCR) (Section 2.8) and DNA sequencing (Section 3.9), whereas

66 oligonucleotides used for EMSA were obtained cleaned by High-performance liquid chromatography. Oligonucleotides, supplied desiccated, were resuspended in an appropriate volume of sterile water to a final concentration of 100 µM, vortexed, and centrifuged. Aliquots of 10 µM were prepared and stored at -20 oC until required.

Oligonucleotides for use in PCR or qPCR were designed in accord with the commonly used guidelines outlined in Table 2.1. Oligonucleotides required for use in the assessment of SNPs in EMSA (Section 2.10.2) were designed to span fifteen nucleotides either side (both 5’ and 3’) of a SNP (total length of 31 bp including SNP) with the forward (5’ to 3’) oligonucleotide of each pair possessing a 5’ biotin modification. To simulate double stranded DNA (dsDNA), 10 µL of 100 µM forward and reverse oligos in TE buffer (See Section 2.1) were added together into the same 1.5 mL Eppendorf tube and denatured at 95˚C for ten minutes. The tube was left to cool for two hours at room temperature to facilitate ligation of the single stranded oligos to form a double stranded nucleic acid complex.

67

Table 2.1: Primer design guidelines

Shown are the primer design parameters, derived from those specified by Sigma- Aldrich (http://www.sigmaaldrich.com/etc/medialib/docs/Sigma- Aldrich/General_Information/1/oligo_architect_glossary.Par.0001.File.tmp/oligo_archit ect_glossary.pdf), that were followed in this current research.

Characteristic Parameters Length 20-25 bp Secondary structure None to weak Repetitive sequences None preferably, but no more than 3 identical nucleotides in a row. GC cap (3’ G or C nucleotide, or ideally GC Present di-nucleotide) GC content 40-60% Predicted primer dimer (using Sigma None Genosys online primer design software) Melting temperature ~65˚C, with a difference of less than 1˚C between forward and reverse primers

68

2.5 Isolation and purification of plasmid and eukaryotic DNA

2.5.1 Mini-prep purification of plasmid DNA Plasmids carrying DNA sequence of interest were isolated and purified using the

QIAGEN Spin Mini-prep kit (QIAGEN, Australia) in accord with the manufacturer’s instructions. Prior to purification, transformed bacteria were spread onto LB agar plates

(See Section 2.2.1) supplemented with 100 µg/mL ampicillin and incubated at 37 ˚C overnight. The following day individual bacterial colonies were picked and inoculated in 5 mL of LB broth (See Section 2.2.1) containing ampicillin (100 µg/mL) and incubated overnight at 37 ˚C with vigorous shaking (200-250 rpm in the Jouan C312 swinging rotor centrifuge). Bacterial cells were then harvested by centrifugation at 3000 rpm (Jouan C312 swinging rotor centrifuge) for ten minutes. DNA was subsequently isolated from the cell pellet according to protocol supplied with the QIAGEN Plasmid

Purification kits. The purified DNA was resuspended in 100 µL endotoxin-free Buffer

TE (QIAGEN, Australia) and stored at -20 ˚C.

2.5.2 Isolation and purification of DNA from eukaryotic cells Eukaryotic cell lines carrying genomic DNA of interest was isolated and purified using the QIAamp DNA blood mini kit (QIAGEN, Australia) in accord with the manufacturer’s instructions. Prior to purification, cell lines of interest in culture media were harvested by centrifugation at 1000 rpm (Jouan C312 swinging rotor centrifuge) for five minutes. Following the removal of the supernatant, genomic DNA was extracted and purified using the QIAamp DNA blood mini kit (QIAGEN, Australia) according to the manufacturer’s protocol outlined in the QIAamp DNA Mini and Blood

Mini Handbook that was supplied with the kit. The purified DNA was resuspended in

150 µL endotoxin-free Buffer TE (QIAGEN, Australia) and stored at -20 ˚C.

69

2.6 Analysis of nucleic acids

2.6.1 Nano-drop analysis of nucleic acids To determine the concentration and relative purity of nucleic acid samples, 1 µL aliquots were loaded onto a Nanodrop ND-100 spectrophotometer (NanoDrop, USA) for analysis. Absorbance is measured at two wavelengths: 260 nm for nucleic acids, and

280 nm for peptide bonds. The 260 nm measurement was used to calculate the concentration of the nucleic acid sample. The relative purity of the sample was determined from the A 260/280 ratio. Samples relatively free from protein contamination return an A 260/280 ratio between 1.6 and 2.0.

2.6.2 Gel Electrophoresis 1-2% (w/v) agarose (Promega Corporation, USA) gels containing 0.5 µg/mL ethidium bromide (Sigma, Australia) were prepared using 1x TAE Buffer (See Section 2.1). DNA samples were adjusted to approximately 18 µL with ddH 2O and combined with 2 µL of

Gel Loading Dye Blue (6X) (New England Biolabs, USA) immediately prior to loading the gel. DNA samples were run alongside maker standards such as the NEB PCR

Marker (sizes: 50 bp, 150 bp, 300 bp, 500 bp, 766 bp) and the NEB 1 kb DNA ladder

(sizes: 500 bp, 1000 bp, 1500 bp, 2000 bp, 3000 bp, 4000 bp, 5-10 kb bands indistinguishable) from New England Biolabs (New England Biolabs, USA).

Electrophoresis was performed using a BIO-RAD Electrophoresis tank (BIO-RAD,

USA) filled with TAE buffer to cover the gel and with a constant voltage of 90V for 30-

60 minutes. Following gel electrophoresis, separated DNA fragments were visualised by fluorescence of ethidium bromide using the Spectroline UV trans-illuminator

(Spectronics, USA) and images captured using the FinePix S1 digital camera (Fujifilm

Corporation, Japan).

70

2.7 Polymerase Chain Reaction (PCR) In general, conventional PCR amplifications were carried out using the PTC-

Thermocycler (MJ Research Inc) and Platinum Taq polymerase (Invitrogen

Corporation, Australia) according to the manufacturer’s instructions. Briefly, PCR was performed in sterile 0.2 mL Eppendorf tubes with 50 µL reactions including 5 µL DNA template (~50-100 ng), 5 µL 10x PCR reaction buffer, 2 µL MgCl 2 (50 mM), 2 µL dNTPs (dATP, dGTP, dCTP, and dTTP at 10 mM each, Roche Diagnostics) 2 µL primer mix (10 µM each forward and reverse sequence specific primers), 0.5 µL

Platinum Taq polymerase and 33.5 µL sterile water. Typical thermo-cycling conditions were as follows: initial denaturation of 95˚C for 5 minutes, followed by 10-40 cycles of

95˚C denaturation for 30 seconds, 58-68˚C (5°C less than primer melting temperature) annealing for 30 seconds and 72˚C extension for 1 minute; followed by extension at

72˚C for 5 minutes; finally cooled to 4˚C until required.

2.8 Sanger DNA Sequencing Samples to be sequenced by traditional Sanger-based chemistry

(http://www.agrf.org.au/index.php?id=72#Purified DNA Process) were sent to

Australian Genome Research Facility (AGRF) on Murray St, Perth, Western Australia.

For AGRF sequencing of purified plasmid DNA, 1 µg of DNA was adjusted to a volume of 12 µL in sterile water and was added to 1 µL of 10 mM of the appropriate primer in a 1.5 mL Eppendorf tube. Samples were then sent to AGRF for sequencing.

For those sequences integrated into the pCR2.1 TOPO vector, M13F primer (5´-

GTAAAACGACGGCCAG-3´) was used.

71

2.9 Experimental validation methods for SNP functionality

2.9.1 Chromatin Accessibility by Real-Time PCR (CHART-PCR) The CHART-PCR protocol was originally developed by Frances Shannon and co- worker at the Australian National University (Rao et al ., 2001). In brief, CHART-PCR consists of firstly the isolation of intact nuclei. Following isolation and washing, nuclei were incubated with or without nuclease (micrococcal nuclease or deoxyribonuclease 1) giving “undigested” and “digested” samples. Accessibility reactions were subsequently terminated and genomic DNA recovered and prepared for analysis by quantitative real- time PCR in order to quantify target regions. Comparing the quantity of targeted genomic sequences in digested relative to undigested samples provides a measure of chromatin accessibility and expressed as a percentage of nuclease cutting across the region specified by the primers.

2.9.1.1 Nuclei isolation Approximately 1x10 7 cultured cells were harvested by centrifugation in 50 mL Falcon tubes at 1000 rpm (Jouan C312 swinging rotor centrifuge) for five minutes at 4°C. The cell pellet was washed once with ice-cold PBS before re-suspension in 1 mL ice-cold

Igepal Lysis buffer (10 mM Tris.HCl [pH 7.4], 10 mM NaCl, 3 mM MgCl 2, 0.5%

Igepal CA-630, 0.15 mM spermine and 0.5 mM spermidine) and five minutes incubation on ice. Nuclei were collected by centrifugation at 3000 rpm (Jouan C312 swinging rotor centrifuge) (4°C) for five minutes.

2.9.1.2 Nuclease digestion of chromatin

Isolated nuclei were washed with 1 mL of Digestion buffer without CaCl 2 (10 mM

Tris.HCl [pH 7.4], 15 mM NaCl, 60 mM KCl, 0.15 mM spermine and 0.5 mM spermidine) and centrifuged at 3000 rpm (Jouan C312 swinging rotor centrifuge) (4°C) for five minutes. Washed nuclei were then resuspended in 350 µL of Digestion buffer 72 with CaCl 2 (prepared by the addition of 25 µL of 1 M CaCl 2 and 150 µL of 1 M MgCl 2 to 25 mL of Digestion buffer without CaCl 2). This cell suspension was divided into six

50 µL aliquots in sterile, labelled 1.5 mL Eppendorf tubes with each containing approximately 1.25 x 10 6 nuclei. Triplicate samples were used for both MNase/DNase I accessibility reactions (“digested”) and for controls incubated without nuclease

(“undigested”). 50 µL of pre-warmed (five minutes at 37°C) Digestion buffer (with

CaCl 2) without nuclease was added to “undigested” samples, while 50 µL of pre- warmed (five minutes at 37°C) Digestion buffer (with CaCl 2 and MgCl 2) supplemented with 20 U/mL MNase (New England Biolabs, USA) or DNase I (Invitrogen

Corporation, Australia) was added to the “digested” samples. All samples were then incubated in a 37°C water bath for ten minutes. To stop digestion, 30 µL stop solution

(3.3% SDS (w/v), 66 mM EDTA [pH 8.1], 6.6 mM EGTA [pH 8.1]) was added followed by the addition of 20 µL of proteinase K (New England Biolabs, USA).

2.9.1.3 Collection of DNA fragments Genomic DNA was isolated using the QIAamp DNA blood mini kit (QIAGEN,

Australia) using reagents and centrifuges at room temperature according to the manufacturer’s instructions. DNA was recovered in 150 µL 1x TE buffer (See Section

2.1). Eluates were analysed using the Nanodrop ND-100 spectrophotometer (NanoDrop,

USA) routinely to ensure the intergrity of individual assays (Section 2.6.1).

2.9.1.4 qPCR and analysis Quantitative real time PCR (qPCR) was performed using the SensiMix SYBR No-ROX

Kit (Bioline Pty Ltd, Australia) prepared in a 96-well plate (Bio-Rad Laboratories, USA) using 5 µL DNA (approximately 100 ng), 2 µL of primer mix (forward and reverse specific primers at 10 mM each), 10 µL of 2X SensiMix SYBR - No-ROX, and made up

73 to a final reaction volume of 20 µL with sterile ddH 2O. Thermal cycling conditions for the

CFX96 Real-Time Detection System (Biorad Laboratories Inc., USA) were as follows:

95°C for 10 minutes; followed by 40 cycles of 95°C for 15 seconds, 64°C for 15 seconds and 72°C for 15 seconds. Percentage chromatin accessibility was calculated as a ratio of relative abundance of digested to undigested amplicon, with the measurement calculated using the CFX Manager software (Biorad Laboratories Inc., USA). Statistics were performed using GraphPad Prism v4 (GraphPad Software, USA).

2.9.2 Electrophoretic Mobility Shift Assay (EMSA) The majority of EMSA protocols are based on those initially described by Garner and

Revzin (1981). In short, the EMSA approach consists of firstly the isolation of nuclear protein extract from a cultured cell line of interest. Following isolation and washing, allelic double-stranded oligonucleotides specifying a target genomic region were incubated with or without nuclear protein extract. Binding reactions were terminated and prepared for gel electrophoresis, and subsequently quantified using the

‘Chemilluminescent nucleic acid detection module’ (Thermo-Scientific, Australia) to identify the presence of allele-specific DNA-protein binding complexes indicative of functionality. For the functional assessment of genetic variants using EMSA, migratory differences of allelic samples incubated in nuclear extract are compared to relative to a control binding reaction absent of protein relative to determine the presence or absence of allele-specific DNA-protein complexes.

2.9.2.1 Nuclear protein extraction Nuclei were collected from 5x10 7 cells from culture cell lines by firstly harvesting by centrifugation at 1000 rpm (Jouan C312 swinging rotor centrifuge). Following the removal of supernatant, the cell pellet was resuspended in 5 mL ice-cold PBS. Cells were collected by centrifugation again, re-suspended in 1 mL of ice-cold PBS and 74 transferred to a clean 1.5 mL Eppendorf tube. Cells were centrifuged down at 14000 rpm (Jouan C312 swinging rotor centrifuge), 4C for one minute, with the subsequent supernatant discarded. The cell pellet was resuspended in 200 µL ice cold Buffer A (10 mM HEPES [pH 7.9], 1.5 mM MgCl 2, 10 mM KCl, 0.5 mM DTT, supplemented prior to use with 2 µL of 0.5 mM PMSF and 1 µL of 1 µg/mL pepstatin for every 1 mL used) using a 28 gauge needle through which cells pass 6-8 times on ice. The cell suspension was then collected by centrifugation at 14000 rpm (Jouan C312 swinging rotor centrifuge), 4˚C, for ten seconds, with the resulting supernatant discarded. The crude nuclei were extracted by re-suspending the cell pellet in 30 µL ice cold Buffer C (20 mM HEPES [pH 7.9], 1.5 mM MgCl 2, 420 mM KCl, 0.5 mM DTT, 0.2 mM EDTA [pH

8.0], 25% [v/v] Glycerol, supplemented with 2 µL of 0.5 mM PMSF, 1 µL of 1 µg/mL pepstatin and 1 µL of 1ug/mL leupeptin prior to use for every 1 mL used), followed by mixing and incubation on ice for 20 minutes. An equal volume of Buffer D (20 mM

HEPES [pH 7.9], 0.5 mM DTT, 0.2 mM EDTA [pH 8.0], 20% [v/v] Glycerol, supplemented with 2 µL of 0.5 mM PMSF, 1 µL of 1 µg/mL pepstatin and 1 µL of 1

µg/mL leupeptin prior to use for every 1 mL used) was added to the nuclei and mixed.

Finally, the nuclei were collected by centrifugation at 14000 rpm (Jouan C312 swinging rotor centrifuge), 4˚C, for ten minutes. The purified nuclei were subsequently divided into 20 µL aliquots, snap frozen in liquid nitrogen and stored at -80˚C until used.

2.9.2.2 Binding reaction preparation To prepare the DNA retardation gel (Invitrogen Corporation, Australia) for electrophoresis, the gel wells were washed with 0.5X running buffer (diluted from 5X stock of 54 g trimza base, 27.5 g boric acid, added to 20 mL 0.5 M EDTA, and made-up to 1 L with sterile water) with a Pasteur pipette to remove bubbles and contaminants.

The gel was assembled into the electrophoresis tank (Invitrogen Corporation, Australia),

75 and filled with 0.5X running buffer. The gel was pre-run at 100V for 30 minutes.

During the incubation the DNA binding reaction was prepared in 1.5 mL Eppendorf tubes on ice with nuclear extract (4 µL of 20% ficoll, 2 µL of 200 mM HEPES [pH 7.9],

1 µL of 20 mM EDTA [pH 8.0], 2 µL of 10 mM DTT, 1 µL of Poly dI:dC, 6 µL of sterile water and 1.5 µg of nuclear extract) and without (4 µL of 20% ficoll, 2 µL of 200 mM HEPES [pH 7.9], 1 µL of 1 M KCl, 1 µL of 20 mM EDTA [pH 8.0], 2 µL of 10 mM DTT, 1 µL of Poly dI:dC and 6.5 µL of sterile water) followed by 10 minutes incubation on ice. Following incubation, labeled sequence-specific oligonucleotides were added to the binding reactions (2.5 µL in the nuclear protein reaction and 8 µL in the reaction without extract) and left to incubate on ice for 30 minutes.

2.9.2.3 Gel electrophoresis and detection Samples were mixed with 2 µL of bromophenol blue dye (Merck, Australia) and loaded quickly onto the pre-run retardation gel, with an additional lane of dye to visually monitor the migration progress of the samples. The gel was run for one hour at 100V.

Following the electrophoresis, the gel tank was reassembled with the addition of a nylon membrane (Invitrogen Corporation, Australia) placed in direct contact with the DNA retardation gel, and run at 30V for 30 minutes to facilitate the transfer of the DNA- protein complexes formed in the binding reaction. The membrane was then removed from the gel tank, covered with a sheet of plastic wrap, and exposed on a UV trans- illuminator for ten minutes. The detection of immobilized nucleic acids was carried out using the ‘Chemilluminescent nucleic acid detection module’ (Thermo-Scientific,

Australia) according to the manufacturer’s protocol. The membrane was then exposed onto chemiluminescence film (GE Healthcare Limited, Australia) by firmly pressing the membrane onto a sheet of film in a film cassette (Amersham, UK) for up to five minutes, followed by the submersion of the exposed film in developer solution (diluted

76

1:5 with sterile water prior to use) (Kodak, Australia) until bands became visible. The film was then immediately placed into fixer solution (diluted 1:5 with sterile water prior to use) (Kodak, Australia) to terminate the reaction. Excess fixer solution was removed by rinsing the film in water. All exposure steps took place in a dark room with minimal illumination. Genetic variants associated with allele-specific DNA-protein complexes in

EMSA were considered to be functional.

2.9.3 Bisulfite Sequencing The majority of current bisulfite sequencing methodologies are based on those initially described by Frommer et al . (1992). In brief, bisulfite sequencing consists of firstly the bisulfite treatment of genomic DNA isolated from a cultured cell line of interest using the EpiTect Bisulfite Kit (QIAGEN, Australia). Following isolation and washing, genomic DNA was enriched for a genomic region of interest using sequence-specific primers in PCR. Bisulfite-enriched target regions were subsequently cloned into TOPO vectors using the TOPO TA Cloning kit (Invitrogen Corporation, Australia), and transformed into competent E.coli . Following blue-white colony selection, positive recombinant colonies were picked and grown overnight at 37˚C in LB broth supplemented with ampicillin and subsequently purified using mini-prep spin columns

(QIAGEN, Australia). Finally, purified DNA was sequenced and analysed using bioinformatic tools (BISMA) to determine the presence of allele-specific DNA methylation patterns. Genetic variants that were associated with allele-specific DNA methylation following bisulfite sequencing were considered to be functional.

77

2.9.3.1 Bisulfite treatment Genomic DNA was isolated from seven cultured human leukocyte cell lines from the

SAFHS cohort carrying different -587 genotypes (two -587G/G homozygotes, three -

587G/A heterozygotes and two -587A/A homozygotes) and purified using DNA Blood mini columns (QIAGEN, Australia). Isolated DNA was treated with bisulfite to convert unmethylated cytosine bases into uracils, and subsequently purified using the EpiTect

Bisulfite Kit (QIAGEN, Australia).

2.9.3.2 PCR amplification The converted DNA was enriched for the VNN1 gene promoter by PCR using primers

(5’-CAGTGAGCCTGTACCCCTGG-3’ and 5’-TATTCTCCCCATGTGTGTGCC-3’) flanking the predicted CpG island (between -758 to -496 upstream of the VNN1 transcriptional start site) and PCR amplified using 40 cycles with the protocol outlined in Section 2.7.

2.9.3.3 Cloning and transformation

The amplification products were cloned (Section 2.2.3) into the pCR2.1-TOPO vector

(Invitrogen Corporation, Australia) and transformed (Section 2.2.4) into competent

E. coli (Invitrogen Corporation, Australia) for subsequent Sanger-based DNA sequencing (Australian Genome Research Facility, Perth, Western Australia). The transformation mix was plated onto LB-agar plates (Section 2.2.1) supplemented with amplicillin and X-gal for blue-white colony selection (Section 2.2.4). White recombinant colonies were picked the following day and grown overnight at 37˚C in LB broth (Section 2.2.1) supplemented with ampicillin and subsequently purified (Section

2.5.1) using mini-prep spin columns (QIAGEN, Australia).

78

2.9.3.4 DNA sequencing and analysis Isolated clones were then sent for Sanger-based sequencing (Australian Genome

Research Facility, Perth, Western Australia) using the endogenous M13F primer (5´-

GTAAAACGACGGCCAG-3´). Sequences were analysed using BISMA

(http://biochem.jacobs-university.de/BDPC/BISMA/manual_unique.php) to identify the methylation states of predicted CG di-nucleotides in CpG islands. Allele-specific differences in methylation patterns, localised near a particular SNP, suggest the potential for differential gene expression and functionality. Further bioinformatic verification was performed using AliBaba 2.1 (http://www.gene- regulation.com/pub/programs/alibaba2/index.html).

2.10 Targeted-CHA-seq (T-CHA-seq) In the present study, two novel in vivo approaches (‘targeted’ CHA-seq [T-CHA-seq] and

CHA-seq) were developed in order to characterise allele-specific differences in chromatin arrangement within the VNN1 gene. Firstly in T-CHA-seq, a DNase I-digested genomic library was prepared from a -137T/G heterozygous cell line from the SAFHS cohort and prepared using the NEBNext DNA Sample Prep kit (New England Biolabs, USA).

Following library preparation DNA fragments were enriched for the -137 promoter region of the VNN1 gene using two rounds of sequence-specific primers in PCR . VNN1 - enriched fragments were subsequently cloned into TOPO vectors using the TOPO TA

Cloning kit (Invitrogen Corporation, Australia), and transformed into competent E. coli .

Following blue-white colony selection, positive recombinant colonies were picked and grown overnight at 37˚C in LB broth supplemented with ampicillin and subsequently purified using mini-prep spin columns (QIAGEN, Australia). This T-CHA-seq library was subsequently sequenced using traditional Sanger sequencing in order to map nuclease digestion sites around the -137 SNP. Finally, purified DNA was sequenced and analysed using bioinformatic tools (BLAST) to align allelic sequence reads to the -137 promoter 79 region and determine the presence of an allele-specific bias (“allele-skew”) in sequence reads as a direct reflection of localised chromatin accessibility and functionality. To facilitate a direct comparison between different alleles at a given SNP, SAFHS cell lines containing heterozygous SNPs were used and loci correlating with high allele-skews were considered to be functional genetic variants.

2.10.1 T-CHA-seq library preparation

2.10.1.1 Isolation of nuclei from cell lines Nuclei from 1 x 10 8 cells (~100 mL) of a eukaryotic cell line were harvested by centrifugation at 1000 rpm (Jouan C312 swinging rotor centrifuge) for five minutes at

4OC (using two 50 mL Falcon tubes). Following the removal of supernatant, the cell pellet was washed in 50 mL ice-cold PBS and collected again by centrifugation. The pellet was subsequently resuspended in 4.5 mL Igepal Lysis buffer (10 mM Tris.HCl

[pH 7.4], 10 mM NaCl, 3 mM MgCl2, 0.5% Igepal CA-630, 0.15 mM spermine and 0.5 mM spermidine), and incubated on ice for five minutes while swirling occasionally.

Cells were transferred to six 1.5 mL Eppendorf tubes and centrifuged at 3000 rpm

(Jouan C312 swinging rotor centrifuge) for five minutes at 4ºC. Resulting supernatant was then removed and the remaining crude nuclei were washed with 1 mL Digestion buffer without CaCl 2 (10 mM Tris.HCl [pH 7.4], 15 mM NaCl, 60 mM KCl, 0.15 mM spermine and 0.5 mM spermidine), and centrifuged at 3000 rpm (Jouan C312 swinging rotor centrifuge) for five minutes at 4 OC. The nuclei were then resuspended in 300 µL warmed Digestion buffer with CaCl 2 (prepared by the addition of 25 µL of 1 M CaCl 2 and 150 µL of 1 M MgCl 2 to 25 mL of Digestion buffer without CaCl 2).

80

2.10.1.2 Digesting DNA with chromatin accessibility agent The nuclei from all six 1.5 mL Eppendorf tubes were transferred to a 50 mL Falcon tube

(~1.8 mL) and warmed at 37°C for two minutes. Following the incubation, 1.8 mL

DNase I (T-CHA-seq) (2 µL of 251 U/µL DNase I [Invitrogen Corporation, Australia] was diluted into 51 µL Digestion buffer with CaCl 2 [Section 2.9.1.2], with 36 µL of this dilution added to 1.8 mL of Digestion buffer with CaCl 2 to generate a 1X working solution) dilution was added to the nuclei and digested for six minutes at 37°C. A 2 mL aliquot of buffer AL (QiaAmp DNA blood mini kit, QIAGEN, Australia) was then added to the digested nuclei and mixed by inversion. An aliquot of 2.2 mL PBS and 200

µL Qiagen Protease (QiaAmp DNA blood mini kit, QIAGEN, Australia) was combined with digested nuclei. The samples were then transferred to four 1.5 mL Eppendorf tubes and incubated at 55 ºC overnight to terminate digestion. Digested nuclei were diluted with 7 mL sterile water and 10 mL buffer AL, and were subsequently mixed by inversion. Samples were purified to isolate digested DNA fragments using the QiaAmp

DNA blood mini kit (QIAGEN, Australia) according to the manufacturer’s protocol.

2.10.1.3 NEBNext DNA Sample Prep kit processing A T-CHA-seq library of compatible DNA fragments was prepared for deep sequencing using the NEBNext DNA Sample Prep kit (New England Biolabs, USA) in accord with the manufacturer’s instructions provided with the kit. The NEBNext DNA Sample Prep kit (New England Biolabs, USA) consists of several workflow steps including end- repair and adapter sequence annealing and adapter-specific amplification using PCR.

Size selection was required to generate a library with an average size of 330 bp for subsequent deep sequencing using the Illumina/Solexa Genome Analyser II (Illumina,

USA). Consequently, the prepared T-CHA-seq library was sized using Agarose Gel

81

Electrophoresis (Section 2.6.2) by relative comparison to a control (100 bp DNA ladder,

New England Biolabs, USA) on a 2% agarose gel (and run for 30 min at 90V). Gel pieces containing library fragments of sizes between 150-400 bp (mono- and di- nucleosomal sizes) were cut out while visualising with a dark reader - blue light transilluminator (Clare Chemical Research, USA). Library fragments were purified using the QIAquick Gel Extraction kit (QIAGEN, Australia).

2.10.2 PCR amplification Two rounds of PCR amplification (Section 2.7) using nested sets of sequence-specific primers (Table 2.2) were used to increase target yield of the -137 VNN1 locus from both the forward (5’ to 3’) and reverse (3’ to 5’) direction using the protocol outlined in

Section 2.7, with 20 cycles for each round. In each round of amplification, one of the target specific primers was paired with an adapter-specific primer to ensure only library fragments are amplified. After each round of amplification, products were purified using the QIAquick PCR purification Kit (QIAGEN, Australia) to ensure no primer sequence were present in subsequent rounds of PCR. Following two rounds of amplification for each of the ‘forward’ and ‘reverse’ libraries consisting of enriched -

137 VNN1 sequence, libraries were combined to ensure the entire -137 promoter region was mapped for DNase I cut sites.

82

Table 2.2: PCR primers used in T-CHA-seq .

Two rounds of PCR amplification were performed using nested sets of sequence- specific primers of the sequence shown (“DNA sequence”), in order to increase target yield of the -137 VNN1 locus from both the forward (5’ to 3’) and reverse (3’ to 5’) direction. In each round of amplification, one corresponding VNN1 -specific primer was paired with an adapter-specific primer to ensure only library fragments are amplified. After each round of amplification, products are purified using the QIAquick PCR purification Kit (QIAGEN, Australia) to ensure no primer sequence is present in subsequent rounds of PCR.

Primer Round of DNA sequence (5’ to 3’) orientation nested PCR Novel First round 5’-GCAGCCCTAGCGAACTAATG-3’ forward Second 5’-TCTTGAGTTAATCTTCACAAATTACTGC-3’ primers round Novel First round 5’-GTCCAGAACTTAGTACCTTAAACTACGGAAG 3’ reverse Second 5’-CGCTATGATTTCATAGCATTGTTACATAAC 3’ primers round Adapter- First and 5’-ACTCTTTCCCTACACGACGC-3’ specific second primer round

83

2.10.3 TOPO cloning and transformation The PCR-enriched amplification products were cloned (Section 2.2.3) into the pCR2.1-

TOPO vector (Invitrogen Corporation, Australia) and transformed (Section 3.2.4) into

One-Shot chemically competent E.coli (Invitrogen Corporation, Australia). The transformation mix was plated onto LB-agar plates (Section 2.2.1) supplemented with amplicillin and X-gal for blue-white colony selection (Section 2.2.4). Positive white recombinant colonies were picked the following day and grown overnight at 37˚C in LB broth (Section 2.2.1) supplemented with ampicillin and subsequently purified (Section

2.5.1) using mini-prep spin columns (QIAGEN, Australia).

2.10.4 Sanger DNA sequencing Prepared T-CHA-seq library fragments were sequenced by traditional Sanger-based chemistry (http://www.agrf.org.au/index.php?id=72#Purified DNA Process) at the

Australian Genome Research Facility (AGRF) on Murray St, Perth, Western Australia using 1 µg of DNA in 12 µL and 1 µL of 10 mM M13F primer (5´-

GTAAAACGACGGCCAG-3´).

2.10.5 Analysis and bioinformatics In order to map DNase I digestion sites around the -137 SNP and determine the presence or absence of an allele-specific bias in read number (or ‘allele-skew’), T-CHA-seq sequence reads were mapped to the -137 promoter region using bioinformatic alignment tools such as BLAST, which are available online at the National Center for

Biotechnology Information (NCBI) website (http://blast.ncbi.nlm.nih.gov/Blast.cgi).

Significant allele-skews were considered to be functional.

84

2.11 Extended CHA-seq (CHA-seq) A MNase-digested genomic library was prepared from a -587G/A heterozygous cell line from the SAFHS cohort and subjected to high-throughput Illumina sequencing in order to map allelic-nuclease digestion sites across the entire VNN1 gene. This CHA-seq approach was developed as a refinement in methodology over the targeted T-CHA-seq approach, in order to screen larger genomic regions for functional genetic variants. As in T-CHA- seq, significant allele-skews were taken as a direct reflection of allele-specific nuclease accessibility and functionality at heterozygous loci.

In summary, CHA-seq consisted of firstly the isolation of intact chromatin from a -

587G/A heterozygous cell line from the SAFHS of interest, followed by the nuclease digestion with micrococcal nuclease and preparation of a DNA fragment library compatible for next-gen sequencing using the NEBNext DNA Sample Prep kit (New

England Biolabs, USA). Following library preparation DNA fragments were enriched for the Vanin gene family using sequence-specific biotinylated probes specific for the

SureSelect DNA Target Enrichment system. Bead-enriched fragments were subsequently processed for cluster generation on the Illumina cluster station (Agilent,

USA) and sequenced using Illumina next-gen sequencing technology on the Illumina

Genome Analyzer II platform (Agilent, USA) in accord with the manufacturer’s instructions. Finally, sequence reads were analysed using the Integrated Genome

Viewer (IGV) 2.1 (Broad Institute) software and aligned to the hg19 reference sequence. Raw sequence data was used to interpret chromatin accessibility by comparing numbers of allelic sequence reads at heterozygous SNPs. Genetic variants with significant allele-specific bias in sequence read number (‘allele-skew”) were considered to be functional.

85

2.11.1 Library preparation CHA-seq library preparation was equivalent to that of T-CHA-seq, as described in

Section 2.10.1, with the only modification being the substitution of MNase for DNase I in order to map nucleosome positions, rather than mapping transcription factor binding sites.

2.11.2 Bead selection In order to increase sequencing coverage of the Vanin gene family, a bead selection step was included in the CHA-seq methodology. Following preparation of the MNase- digested library using NEBNext DNA Sample Prep kit (New England Biolabs, USA), target sequences containing the entire Vanin gene family were supplied to Agilent to synthesise pools of biotinylated probes specific for the SureSelect DNA Target

Enrichment system. Libraries were purified using a bead-based selection protocol with the SureSelect Target Enrichment System - Illumina Single-End Sequencing Platform

Library Prep Version 1.2 (Agilent, USA) according to the manufacturer’s instructions.

2.11.3 Cluster generation and deep sequencing Prepared libraries were processed for cluster generation on the Illumina cluster station

(Agilent, USA) and subsequently sequenced using Illumina next-gen sequencing technology on the Illumina Genome Analyzer II platform (Agilent, USA) in accord with the manufacturer’s instructions.

2.11.4 Analysis with IGV 2.1 DNA sequence reads were then analysed using the Integrated Genome Viewer (IGV)

2.1 (Broad Institute) software and aligned to the hg19 reference sequence. The IGV software allowed the visualisation of base sequence, read lengths and polymorphisms. 86

Raw sequence data was used to interpret chromatin accessibility by comparing numbers of allelic sequence reads at heterozygous SNPs. Genetic variants with significant allele- skews were considered to be functional.

2.12 Statistical Analyses Statistical analyses (t-test, ANOVA, distribution test) using Graphpad Prism 5 software

(http://www.graphpad.com/prism/prism.htm) were used to determine statistical significance in mean chromatin accessibilities between sample sets for each target locus in CHART-PCR. For this research, CHART-PCR data was obtained from a minimum of three independent experiments performed in triplicate. Statistical significance (*) as shown was determined using student’s unpaired t-test using a confidence interval of

95% (P<0.05).

Bioinformatic analysis was performed by submitting a sequence of interest to publicly available online databases which match to large collections of documented regulators, enabling the prediction of transcription factor binding sites occurring in the submitted sequence. Based on the predictions from multiple databases, a model can be developed as to the role the submitted sequence has in a particular function in a biochemical pathway. For example, EMSA (section 2.10.2) was used to validate allele-specific

DNA-protein binding differences predicted from online databases such as AliBaba2.1

(Grabe, 2002), and P-Match (Chekmenev et al ., 2005).

87

Chapter 3: In vivo assessment of VNN1 promoter SNPs using two chromatin-sensitive nucleases in CHART- PCR

3.1 Introduction Chromatin arrangement is central to the regulation of cellular signals, controlling access of transcription factors to the DNA, and consequently, gene expression (Ernst et al .,

2011). Furthermore, genetic variants may influence chromatin arrangement and accessibility within genes (Mikkelson et al ., 2007; Birney et al ., 2010), potentially affecting gene expression and phenotype (Pang et al ., 2013). CHART-PCR was initially designed to characterise the chromatin arrangement and in vivo transcription factor binding differences within the IL-2 proximal gene promoter (Rao et al ., 2001). Since

2001, the CHART-PCR methodology has been refined to enable cell-type and allele- specific comparisons of chromatin accessibility associated with genetic variants, as demonstrated for the CR2 gene (Cruikshank et al ., 2012).

In CHART-PCR, chromatin accessibility can be assessed using nucleases which digest

DNA depending on the chromatin arrangement, including restriction enzymes Dra I and

HinfI (Rao et al ., 2001; Bulanenkova et al ., 2011). A key assumption of CHART-PCR is that the degree of chromatin accessibility in a particular genomic region is directly related to the level of physical protection to nuclease digestion conferred by the occupancy of histones and bound proteins (Bell et al ., 2011; An et al ., 1998; Grayling et al ., 1997). Micrococcal Nuclease (MNase) and Deoxyribonuclease I (DNase I) are commonly used nucleases in chromatin studies. They differ in terms of recognition sites and the mode of nucleolytic digestion (Xiao et al ., 2007). Complementary structural chromatin information can be derived from the results of nuclease digestion including

88 the positions of nucleosomes and hypersensitive sites (Sajan et al ., 2012; Mannherz et al ., 1995; Smith et al ., 1983; Heins et al ., 1967).

DNase I is an endonuclease which preferentially cleaves DNA at phosphodiester bonds adjacent to a pyrimidine nucleotide, producing 5'-phosphate-terminated oligonucleotides and 3’-hydroxylated ends (Liu, Z et al ., 2012). DNase I footprinting is an approach commonly used to map the chromatin arrangement of gene promoter regions, potential transcription factor binding sites and areas of hypersensitivity within the genome (Boyle et al ., 2011; Drouin et al ., 1997; Decker et al ., 2009). The integration of a deep sequencing approach has extended DNase I hypersensitive site and chromatin mapping to a genome-wide scale (‘DNase-seq’) (John et al ., 2011; Thurman et al ., 2012).

MNase (or S7 nuclease) is an endo-exonuclease, typically isolated from Staphylococcus aureus , that preferentially digests naked DNA and linker DNA between adjacent nucleosomes, thereby enriching for nucleosome-associated DNA which is partially or completely protected from MNase cutting (Chung et al ., 2010; Axel, 1975; Rizzo et al .,

2012). The occupancy and position of nucleosomes can therefore be assayed using

MNase (Rizzo et al ., 2012; Li, Z et al ., 2011; Yigit et al ., 2013). Typically, the quantification of MNase-digested DNA fragments has been performed using indirect- end-label analysis (Kent and Mellor, 1995; Wu and Winston, 1997), or by qPCR as in

CHART-PCR (Cruikshank et al ., 2008; Cruikshank et al ., 2012). More recently, MNase nucleosome mapping has shifted away from the quantification of reduced cutting frequency, to a genome-wide approach that targets the abundance of recovered nucleosome-sized DNA fragments by microarrays (Lantermann et al ., 2010) or high-

89 throughput sequencing technologies (‘MNase-seq’) (Wei et al ., 2012; Gaffney et al .

2012, Li et al ., 2012).

To interpret the level of nuclease digestion in CHART-PCR, control genomic regions are fundamental to establishing benchmark chromatin accessibility in order to compare samples between different cell lines (Rao et al ., 2001; Cruikshank et al ., 2012). For example, Surfactant protein 2 ( SPA-2) and Beta-actin ( β-actin ) have been used in

CHART-PCR as chromatin accessibility controls to characterise the chromatin state of the human CR2 gene with MNase in a number of immortalised human cell lines (U937,

REH, Ramos and Raji) (Cruikshank et al ., 2008). Cruikshank et al . (2008) determined an SPA-2 regulatory region to be representative of transcriptionally silent heterochromatin, as this region has not been found to be expressed in hematopoietic cells (Li et al ., 1998). Alternatively, β-actin has been demonstrated to have constitutively high gene expression in most eukaryotic non-muscle cells and therefore is commonly used as an internal control for the quantification of mRNA levels (Zhang et al ., 2005; Danilition et al ., 1991). For example, Cruikshank et al . (2012) used sequence near the transcriptional start site of the β-actin gene as a control in order to characterise the chromatin accessibility of a functional SNP (rs3813946) in the CR2 gene using

CHART-PCR (Cruikshank et al ., 2012). Consequently, SPA-2 and β-actin were used as control genomic regions in the current study to interpret the relative level of chromatin accessibility in CHART-PCR and to facilitate comparisons between multiple cell lines.

In this chapter, the relative level of chromatin accessibility was elucidated at candidate functional SNPs in the VNN1 gene model. The degree of chromatin accessibility in

DNA heterozygous for both -137 (rs4897612) and -587 (rs2050153) promoter variants, which are 450bp apart, was assessed by comparing digestion patterns to SPA-2 and β-

90 actin controls. Furthermore, both nucleases were used to assess the VNN1 promoter for allele-specific chromatin accessibility by examining the level of DNase I hypersensitivity sites and the degree of nucleosome protection from MNase in cell lines homozygous for both the -137 and -587 SNPs.

3.2 Methods A detailed description of the CHART-PCR methodology is shown in Section 2.9.1

(Chapter 2), with a brief summary included below.

Nuclei were isolated and digested with 20 U/mL nuclease using approximately 1 x 10 7 genotyped Ramos (ATCC, USA) and SAFHS cohort cells. Nuclease-digested DNA fragments were purified using the QIAamp DNA blood mini kit (QIAGEN, Australia), followed by quantitative real time PCR using approximately 100 ng DNA, 0.5µM of primers (See Table 3.1) and SensiMix SYBR No Rox (Bioline Pty Ltd, Australia) in a final reaction volume of 20 µL. Thermal cycling conditions for the CFX96 Real-Time

Detection System (Biorad Laboratories Inc., USA) were as follows: 95°C for 10 min; followed by 40 cycles of 95°C for 15 seconds, 64°C for 15 seconds and 72°C for 15 seconds. Percentage chromatin accessibility was determined as a ratio of relative abundance of digested to undigested amplicon, calculated using the CFX Manager software (Biorad Laboratories Inc., USA). Statistics were performed using GraphPad

Prism v4 (GraphPad Software, USA).

91

Table 3.1: CHART-PCR primers

Sequence-specific forward and reverse (FWD and REV) primers were used for quantitative real-time PCR analysis of the -587 and -137 VNN1 SNPs and the SPA-2 and β-actin promoter sequences. Specific sequences are shown (“Primer sequence”) and were derived from published reports of promoter sequences regulating SPA-2 (Cruikshank et al ., 2008; Li et al ., 1998) and β-actin (Cruikshank et al ., 2008; Danilition et al ., 1991) or designed based on the sequence surrounding the VNN1 SNPs. Forward (FWD) and reverse (REV) primers are indicated by suffix. Pairs used in qPCR assays are grouped together and share the same prefix along with the predicted amplicon size (“Expected amplicon size”).

Primer designation Primer sequence (5’ to 3’) Expected amplicon size (bp) -587 CHART-PCR FWD GAGAATCACATGAACCCGAGAGG 129 -587 CHART-PCR REV GTGAGGACACCACTCTAATCTTTAACACAG -137 CHART-PCR FWD CCTAGCGAA CTAATGTACATAGAGTTCTTGAG 104 -137 CHART-PCR REV AACGCTATGATTTCATAGCATTGTTACATAAC SPA-2 CHART-PCR CTAAGTATTCCTCCAGCCTGAGTGTTC 145 FWD SPA-2 CHART-PCR GGTGGACAACAGCATTTATAGCATG REV β-actin CHART-PCR CTGCACTGTGCGGCGAAGCCGGTG 177 FWD β-actin CHART-PCR CGGACGCGGTCTCGGCGGTGGTG REV

92

3.3 Results

3.3.1 Assessment of relative chromatin accessibility within the VNN1 gene promoter using DNase I

3.3.1.1 Chromatin accessibility of VNN1 SNPs relative to SPA-2 and β-actin control loci using DNase I To determine the relative level of DNase I digestion within the -137 and -587 VNN1 promoter regions, the CHART-PCR methodology was applied on DNase I-digested chromatin from heterozygous -137T/G and -587G/A cell lines and compared to control sequences near the transcriptional start site of SPA-2 and β-actin . The results showed that the -137 SNP had an equivalent DNase I cutting (23.0%) to that of β-actin (20.5%), indicating high chromatin accessibility (Figure 3.1). In contrast, the level of DNase I cutting observed at the -587 locus was 17.1%, implying a moderate-to-high accessibility by comparison to controls ( β-actin = 20.5% and SPA-2 = 7.5%). These results provide evidence that the chromatin states of the -137 and -587 promoter regions are transcriptionally active, at a level equivalent to that of the β-actin gene promoter.

93

35 *P=0.005 30 *P=0.0048 *P=0.010 25 *P=0.024 20 15 (% cutting) (% 10 5 Chromatin Accessibility 0

/G A-2 SP -Actin -137T -587G/A B Genotype

Figure 3.1: Chromatin accessibility differences at the -137 and -587 VNN1 SNPs relative to SPA-2 and β-actin control regions . Chromatin accessibility at each VNN1 SNP, expressed as a ratio of undigested to DNase I-digested chromatin (% cutting). The data were obtained from the Ramos cell line which is heterozygous at each SNP (genotype) and assayed in triplicate from independent cultures (n=3). DNase I cutting values for each replicate are shown in full in Table A.4 (Appendix – Supplementary Tables and Figures). Significant difference between genotypes (P<0.05) was determined by an unpaired Student’s t-test and indicated by an asterisk (*). Error bars represent standard error of the mean across replicates of each sample.

94

3.3.1.2 Assessment of allele-specific chromatin accessibility at VNN1 SNPs using DNase I To evaluate allele-specific chromatin accessibility, DNase I was used to characterise chromatin derived from cell lines, homozygous at the -137 (-137T/T and -137G/G) and

-587 (-587G/G and -587A/A) loci. The CHART-PCR results showed significant

(P<0.05) allele-specific DNase I digestion at both loci (Figure 3.2). The -137T/T homozygote (30.4%) had a 1.2-fold higher degree of DNase I cutting than the -137G/G homozygote (25.5%). Similarly, the -587A/A homozygote (22.6%) possessed a 1.2-fold higher degree of DNase I cutting than -587G/G (18.2%). On average, 1.3-fold higher frequency of DNase I cutting was observed in the -137 homozygotes compared to -587 homozygotes (Figure 3.2), which supports the results from chromatin comparison in heterozygous cell lines (Figure 3.1), which demonstrated the -137 locus has a higher level of chromatin accessibility and transcriptional potential

95

*P=0.004 *P=0.001

35 *P=0.003 *P=0.047

30 *P=0.003 25 20 15 (% cutting) (% 10 5 Chromatin Accessibility 0

G /G T/T G/ 37 -137 -1 -587G -587A/A Genotype

Figure 3.2: Chromatin accessibility differences at the -137 and -587 VNN1 SNPs . Chromatin accessibility at each VNN1 SNP, expressed as a ratio of undigested to DNase I-digested chromatin (% cutting). The data were obtained from cultured human cell lines derived from the SAFHS cohort for each genotype and assayed in triplicate from independent cultures (n=3). DNase I cutting values for each replicate are shown in full in Table A.5 (Appendix – Supplementary Tables and Figures). Significant difference between genotypes (P<0.05) was determined by an unpaired Student’s t-test and indicated by an asterisk (*). Error bars represent standard error of the mean across replicates of each sample.

96

3.3.2 Assessment of relative chromatin accessibility within the VNN1 gene promoter using MNase

3.3.2.1 Chromatin accessibility of VNN1 SNPs relative to SPA-2 and β-actin control loci using MNase MNase was used to evaluate the relative level of chromatin accessibility in the VNN1 promoter region surrounding the heterozygous -137T/G and -587G/A loci by relative comparison to SPA-2 and β-actin controls as per the CHART-PCR method. A mean

MNase cutting frequency of 30.8% was observed at the -137 locus, 25.9% at -587,

20.4% at SPA-2 and 29.9% for the β-actin locus (Figure 3.3). These results are suggestive that the -587 SNP had low-to-moderate chromatin accessibility, whilst the -

137 SNP had moderate-high chromatin accessibility, similar to that of the β-actin control. The observed levels of chromatin accessibility at this locus using MNase support the data obtained using DNase I (Figure 3.1). Results obtained using MNase and DNase I (Figure 3.3) provide evidence that the -137 and -587 promoter reigons are transcriptionally active regions with moderate-high chromatin accessibility

97

*P=0.008 *P=0.027 *P=0.007 35 *P=0.015 30 *P=0.027 25 20 15 (% cutting) (% 10 5 Chromatin Accessibility 0

G /A T/ G 7 7 SPA-2 -13 -58 B-Actin Genotype

Figure 3.3: Chromatin accessibility differences at the -137 and -587 VNN1 SNPs relative to SPA-2 and β-actin control regions . MNase-treated cell lines of the genotypes shown were used to calculate chromatin accessibility expressed as a ratio of undigested to MNase digested DNA (% cutting). The data were obtained from the Ramos cell line and assayed in triplicate from independent cultures (n=3). MNase cutting values for each replicate are shown in full in Table A.6 (Appendix – Supplementary Tables and Figures). Significance (P<0.05) was determined between genotypes by an unpaired Student’s t-test and shown by an asterix (*). Error bars show the standard error of the mean between replicates of each sample.

98

3.3.2.2 Assessment of allele-specific chromatin accessibility at VNN1 SNPs using MNase To differentiate allele-specific chromatin accessibility differences at SNPs within the

VNN1 gene promoter, the chromatin accessibility of DNA from cells homozygous and heterozygous for the -137 and -587 SNPs were compared using CHART-PCR (MNase digestion). The results showed a MNase cutting percentage in -137G/G homozygotes of

13.6% (cell line 0306) and 13.6% (cell line 0525), which was significantly lower than that of the -137T/T homozygotes with 24.8% (cell line 0188) and 29.1% (cell line 0194)

(Figure 3.4A). The -137T/G heterozygotes had a MNase cutting level intermediate to that of the -137 homozygotes with 18.8% (cell line 0029) and 16.6% (cell line 0054)

(Figure 3.4A). The average nuclease cutting for each genotype were 13.6% for the -

137G/G homozygotes, 26.9% for the -137T/T homozygotes, and 17.7% for the heterozygotes (Figure 3.4B). The -137T/T homozygote had a 2-fold higher nuclease cutting level than that of -137G/G, indicating the presence of allele-specific chromatin accessibility at the -137 locus.

At the -587 locus, the -587A/A homozygotes had a MNase cutting level of 26.6% (cell line 0306) and 27.6% (cell line 0525), which was higher than that of the -587G/G homozygotes with 7.2% (cell line 0188) and 13.9% (cell line 0194) (Figure 3.5A). The degree of MNase cutting in -587G/A heterozygotes was intermediate compared to the -

587 homozygotes with 15.0% (cell line 0029) and 16.3% (cell line 0054) (Figure 3.5A).

The average MNase cutting of the genotypes were 10.6% for the -587G homozygotes,

27.1% for the -587A homozygotes, and 15.6% for the heterozygotes (Figure 3.5B). The

-587A homozygote had a 2.6-fold higher degree of MNase cutting than that of -587G/G, indicating the presence of allele-specific chromatin accessibility at the -587 locus.

99

A

35 30 25 20 15 (% cutting) (% 10 5 Chromatin Accessibility 0 ) G) /G) T (G/ (T/G) (T 25 29 54 0306(G/G) 05 00 00 0188(T/ 0194(T/T) SAFHS cohort ID (-137 genotype)

B

*P<0.0003 35 *P<0.0001 30 25 *P=0.0179 20 15

(% cutting) 10 5 Chromatin Accessibility Chromatin 0

G G T / T/ 7 7G/ 37T -13 -1 -13 Genotype

Figure 3.4: Chromatin accessibility differences at the -137 VNN1 SNP . MNase- treated cell lines of the genotypes shown were used to calculate chromatin accessibility expressed as a ratio of undigested to MNase-digested DNA (% cutting). Figure 3.4A shows the data obtained from cultured cell lines derived from the SAFHS cohort for each genotype and assayed in quadruplicate or greater from independent cultures (n ≥4). MNase cutting values for each replicate are shown in full in Table A.7 (Appendix – Supplementary Tables and Figures). Averages of the genotypes are shown in Figure 3.4B. Significance (P<0.05) between means was established by One-way ANOVA, with Turkey's Multiple Comparison Test used to determine significance between cell lines (genotypes) as shown in Figure 3.4B as an asterix (*). Error bars show the standard error of the mean between replicates of each sample.

100

A

35 30 25 20 15 (% cutting) (% 10 5 Chromatin Accessibility 0 ) ) /A A /A) G) (A (G/ (G 4(G/ 9 1 0306 0525(A/A) 0029 0054 0188(G/G) 0 SAFHS cohort ID (-587 genotype)

B

*P<0.0001 35 *P<0.0001 30 25 20 *P=0.0048 15 (% cutting) (% 10 5 Chromatin Accessibility 0

/A G

-587A/A -587 -587G/G Genotype

Figure 3.5: Chromatin accessibility differences at the -587 VNN1 SNP . MNase- treated cell lines of the genotypes shown were used to calculate chromatin accessibility expressed as a ratio of undigested to MNase digested DNA (% cutting). Figure 3.5A shows the data were obtained from cultured human cell lines derived from the SAFHS cohort for each genotype and assayed in quadruplicate or greater from independent cultures (n ≥4). MNase cutting values for each replicate are shown in full in Table A.8 (Appendix – Supplementary Tables and Figures). Averages of the genotypes are shown in Figure 3.5B. Significance (P<0.05) between means was established by One-way ANOVA, with Turkey's Multiple Comparison Test used to determine significance between cell lines (genotypes) as shown in Figure 3.4B as an asterix (*). Error bars show the standard error of the mean between replicates of each sample. 101

3.4 Discussion CHART-PCR results in the present study are suggestive that the -137 and -587 SNPs have an allele-specific effect on chromatin accessibility in VNN1, which supports the findings of Göring et al. (2007) which identified these SNPs to be candidate functional genetic variants. The overall level of nucleosome protection within the VNN1 gene promoter localised to the -137 (30.8% for -137T/G) and -587 (25.9% for -587G/A) loci, was found to be low relative to control genomic regions (20.4% for SPA-2 and 29.9% for β-actin ). This observation supports research that indicates that the VNN1 gene is highly expressed within leukocytes (Galland et al ., 1998; Jansen et al . 2009; Yanai et al ., 2005). Furthermore, the level of DNA-binding protein occupancy at the -137 and -

587 VNN1 SNPs, as a measure of DNase I hypersensitivity using CHART-PCR, was shown to be approximately equivalent (23.0% for -137T/G, and 17.1% for -587G/A) to that of the transcriptionally-active β-actin locus (20.5%). Consequently, chromatin within the VNN1 promoter is likely to confer accessibility to the binding of transcription factors and other potential regulators. In fact, allele-specific DNA-protein binding complexes have been experimentally demonstrated at the -137 VNN1 SNP using

EMSA, with Sp1 being the likely candidate in the presence of the T allele (Göring et al .,

2007). Sp1 has been associated with high gene expression, and contains a zinc finger protein motif which directly binds DNA and enhances transcription (Jiang et al ., 2005).

Since high chromatin accessibility is observed in the presence of the -137T allele, increased VNN1 expression due to greater Sp1 accessibility would be expected.

Alternatively, the 2-fold difference in the level of nucleosome protection in the presence of the -137 alleles may also be sufficient to inhibit the binding of the Sp1 transcription factor reported by Göring et al . (2007). Consequently, the allele-specific effects in chromatin arrangement at the -137 and -587 variants are likely to influence VNN1 gene expression.

102

At the -587 VNN1 SNP, the CHART-PCR results showed allele-specific chromatin accessibility with DNase I (1.2-fold higher at -587A/A) and MNase (2.6-fold higher at -

587A/A), demonstrating that this genetic variant has a functional role at the chromatin level. While Göring et al . (2007) identified the -587 SNP to be a functional candidate based on statistical approaches, no allele-specific DNA-protein binding complexes were found with EMSA (Göring et al ., 2007). Therefore, both findings indicate the functional mechanism at the -587 SNP is not due to allele-specific chromatin accessibility contributing to differential transcription factor binding, as seen at the -137 SNP. In

Chapter 5 of the present study, the VNN1 gene promoter was assessed for allele-specific

DNA methylation associated with the -587 SNP using bisulfite sequencing.

Two control genomic regions, namely SPA-2 and β-actin , were assessed using CHART-

PCR to demonstrate the relative level of chromatin accessibility to DNase I and MNase within the VNN1 gene promoter. SPA-2 and β-actin were selected to represent transcriptionally-silent heterochromatin and transcriptionally-active euchromatin controls respectively (Li et al ., 1998; Zhang et al ., 2005; Danilition et al ., 1991;

Cruikshank et al ., 2012). Importantly, the chromatin accessibilities of the two control regions were found to be significantly different using both DNase I and MNase, which matched the findings of Cruikshank et al (2008). For instance, the present study reported 1.5-fold higher MNase digestion at the β-actin locus relative to SPA-2 in

Ramos cells, with Cruikshank et al (2008) reporting a similar fold difference (2.0) using an equivalent methodology in Ramos cells. Therefore, both SPA-2 and β-actin are effective as control genomic regions for chromatin accessibility within VNN1 .

103

While the mode of nucleolytic digestion and the recognition sites differed for DNase I and MNase, both nucleases were able to identify allele-specific chromatin accessibility at the -137 and -587 VNN1 SNPs. Functionality, in terms of allele-specific chromatin accessibility, was assigned to SNPs that showed a significant difference (P<0.05) between alleles as implemented by Cruikshank et al . (2012) at the rs3813946 SNP in the CR2 gene. The approach used in the present study, extended upon that of

Cruikshank et al . (2012) to include DNase I, which gave equivalent results to that of

MNase. While the digestion level of MNase and DNase I differed acrossed cell lines, both nucleases corresponded with equivalent fold differences when comparing the heterozygous -137T/G and -587G/A (1.3-fold with DNase I and 1.2-fold with MNase) in Ramos cells. Therefore, the use both DNase I and MNase was adequate for the identification of functional genetic variants based on chromatin accessibility and potentially could be applied to other nucleases and genetic variants.

The selection of both DNase I and MNase for use in chromatin arrangement studies provided a greater level of confidence in assigning function to genetic variants. Taken together, the results of CHART-PCR using DNase I and MNase indicate complementary structural chromatin information regarding the VNN1 gene promoter, including the relative positions of nucleosomes and hypersensitive sites.

Although CHART-PCR studies have enabled the in vivo characterisation of chromatin arrangement and transcriptional potential in multiple genes and organisms (Fürbass et al ., 2008; Su et al ., 2004; Goriely et al ., 2004; Pompei et al ., 2007), variability can arise when targeting the alleles of individual genetic variants. Given the requirement for the comparison of chromatin between different homozygotic cell lines, chromatin accessibility values derived from CHART-PCR remain a relative measures approach

104

(Cruikshank et al ., 2008), rather than an ‘absolute’ figure that accounts for all biological variation when comparing multiple cell lines. For example, cell line-specific MNase digestion was observed at both the -137 and -587 VNN1 SNPs in the present study.

MNase digestion levels at the -137T/G locus in the Ramos cell line was 30.8%, as compared to 18.8% (‘0029’) and 16.6% (‘0054) in the SAFHS cell lines. In this case, the characteristics of the cell line such as genotype and culture conditions also need to be considered when interpreting CHART-PCR measurements (Baral et al ., 2012).

Whole-genome comparisons of heterozygotic cell lines overcome biological variation associated with assessment of multiple cell lines by simultaneously assessing both alleles at a given genetic variant in the context of a genome (Pastinen, 2010).

Consequently, the development of novel genome-wide approaches for targeting chromatin accessibility at genetic variants may overcome these inherent limitations associated with current assays such as CHART-PCR.

105

Chapter 4: CHA-seq - A novel functional variant screen

4.1 Introduction Genetic variants correlating with allele-specific effects on gene expression or phenotype are considered to be functional (Wang et al ., 2011; Jian et al ., 2011). Functional genetic variants have been associated with cell function (Bonetti et al ., 2012; Johnson et al .,

2012) and disease (Bartoszewski et al ., 2010; Zhang et al ., 2011). Functional genetic variation can be attributed to allelic sequence changes in regulatory elements, which may alter gene expression, often without an obvious effect on phenotype (Kleinjan and van Heyningen, 2005; Pastinen et al ., 2004). For example, Pastinen et al . (2004) surveyed allelic imbalances in gene expression in heterozygous human cells, and reported that 23 of the 126 genes tested showed a bias in gene expression toward one allele. This allelic imbalance approach involved quantitative genotyping of heterozygous individuals for genetic variants within RNA transcripts and compared observed allele ratios to assumed equimolar (50:50) allele ratios. Pastinen et al . (2004) determined an average observed allele ratio of 37:63 as the cutoff for determining the presence of a significant allelic imbalance. As such, an allelic imbalance of greater than

37:63 is a likely indicator of functional genetic variants.

Traditionally, reporter gene and gel shift assays have been considered sufficient to assess the in vitro functionality of genetic variants; however there are many variables that may confound results (Karimi et al ., 2009; Holden and Tacon, 2011). Furthermore, these ‘gold standard’ approaches have inherent limitations including locus-specificity and the lack of endogenous chromatin context (Hager et al ., 2009). While more recent in vivo approaches such as CHART-PCR can be used to functionally interrogate

106 endogenous chromatin contexts, the applicability of using existing locus-specific assays to interpret genetic variants in a genome-wide context remains time and cost-inefficient

(Barski et al ., 2007).

With advances in DNA sequencing technologies, genome-wide approaches have become efficient alternatives for identifying functional genetic variants (Hunt et al .,

2008; Satake et al ., 2009; Collas, 2010; Statham et al ., 2012). The use of sequencing to characterise chromatin arrangement, as opposed to quantitative real-time PCR in the

CHART-PCR approach, not only allows scalability but also eliminates PCR-based issues, including primer design, hetero-duplex formation, mis-priming and primer dimerisation (Freeman et al ., 1999; Bustin et al ., 2005; and Git et al ., 2010). The current study demonstrates a novel sequencing-based approach for assessing functionality of non-coding sequence variation, by developing strategies that assess function in an in vivo context.

The development of strategies that assess differences in chromatin arrangement as a measure of functional difference in an in vivo context represents an ideal approach to greatly increase time and cost-efficiency and to assign function to a particular genetic variant within an extended genomic region. The current study translates traditional chromatin arrangement assays such as the CHART-PCR method into a novel, robust approach (CHA-seq) in order to assess allele-specific differences in chromatin arrangement across an extended genomic region, based on next generation sequencing. The CHA-seq approach represents a means of simultaneously sequencing genetic variants, mapping nucleosomes and nuclease hypersensitivity sites, and assessing SNP functionality. The development of enhanced genome-wide in vivo approaches allows for a more time and

107 cost-effective assessment of whether a variant does indeed regulate gene expression and yields insights into the long range allele-specific mechanisms of action.

In this study, two novel in vivo approaches (‘targeted CHA-seq’ [T-CHA-seq] and

‘extended CHA-seq’ [CHA-seq]) were developed in order to characterise allele-specific differences in chromatin arrangement within the VNN1 gene (Figure 4.1). For the development of T-CHA-seq, a DNase I-digested genomic library was prepared from a -

137T/G heterozygous cell line from the SAFHS cohort and amplified for the -137 SNP using two rounds of ‘nested’ PCR. This T-CHA-seq library was subsequently sequenced using traditional Sanger sequencing in order to map nuclease digestion sites around the -137

SNP. For CHA-seq, a MNase-digested genomic library was prepared from a -587G/A heterozygous cell line from the SAFHS cohort, and subjected to high-throughput Illumina sequencing in order to map allelic-nuclease digestion sites across the entire VNN1, VNN2 and VNN3 genes. Subsequently, library sequence reads for both approaches were compared to establish the presence of an allele-specific bias in read number (or ‘allele- skew’) as a direct reflection of allele-specific nuclease digestion and functionality. To facilitate a direct comparison between different alleles at a given SNP, SAFHS cell lines containing heterozygous SNPs were used in the two novel approaches and loci correlating with high allele-skews were considered to be functional genetic variants.

108

A. B.

CHA-seq T-CHA-seq

Figure 4.1 Principles of CHA-seq and T-CHA-seq. A: CHA-seq consists of firstly the preparation of a MNase-digested genomic library using the NEBNext DNA Sample Prep kit (New England Biolabs) on a -587G/A heterozygous cell line from the SAFHS cohort. After bead selection to enrich for genes of interest, high-throughput Illumina sequencing is performed and reads are mapped to the human genome using the Integrated Genome Viewer (IGV)(Broad Institute) in order to map allelic-nuclease digestion sites across the VNN1, VNN2 and VNN3 genes. B: T-CHA-seq consists of firstly the preparation of a DNase I-digested genomic library using the NEBNext DNA Sample Prep kit (New England Biolabs) on a -137T/G heterozygous cell line from the SAFHS cohort. After amplification of the -137 SNP using two rounds of ‘nested’ PCR, the library is cloned in competent E. coli and subsequently sequenced using Sanger sequencing in order to map nuclease digestion sites around the -137 SNP.

109

4.2 CHA-seq method development The two novel CHA-seq methods required optimisation at multiple key steps in library preparation (nuclease digestion and adapter-ligation) and PCR amplification

(optimisation of the PCR conditions in T-CHA-seq) as shown in Sections 4.2.2 (T-

CHA-seq) and 4.2.3 (CHA-seq), with the optimised CHA-seq methodologies described in detail in Chapter 2 (Section 2.10 for T-CHA-seq and Section 2.11 for CHA-seq).

In order to determine the presence or absence of an allele-specific bias in chromatin accessibility via CHA-seq, a proportion is calculated based on the sequence reads of

MNase or DNase-digested fragments containing each allele at a given heterozygous

SNP locus relative to the total number of reads. This allelic proportion relative to a random distribution (50%), or ‘allele-skew’, is a direct reflection of chromatin arrangement and nuclease-sensitivity. For example, if 60% of sequence reads contained the A allele at a given heterozygous SNP locus (and consequently 40% the alternate allele), then the allele-skew (from a random distribution) is 10%. A two-tailed Student’s t-test was then applied to determine statistical significance (P<0.05). Those SNPs that recorded a CHA-seq allele ratio of greater than 37:63, or 13% allele skew, were identified as candidate functional variants based on the allele bias cutoff set by Pastinen et al . (2004).

T-CHA-seq is a site-specific and more cost-effective approach for functional assessment of genetic variants, in comparison to CHA-seq, as it does not require deep sequencing.

The T-CHA-seq method is of particular use when candidate SNPs or preliminary data is available regarding functionality. Alternatively, CHA-seq is a large-scale functional screen providing the DNA base sequence, novel SNP identification, chromatin arrangement assessment, nucleosomal positioning information, and the ability to predict 110 areas of nuclease hypersensitivity (and consequently transcription factor binding sites).

A limitation of CHA-seq, when using Illumina sequencing technology, is the inherent error rate of 0.99% for comparisons of sequence reads (Illumina Inc, 2009). Therefore, when interpreting CHA-seq results, this error rate was considered when determining the viability of further functional assays for candidate SNPs.

4.2.1 Traditional chromatin arrangement assays – issues Although the traditional CHART-PCR approach has been sufficient for characterising the chromatin arrangement at specific genomic loci as evident in the literature

(Cruikshank et al ., 2012; Rao et al ., 2001), this approach requires extensive optimisation to obtain consistent results. Of particular importance to the reproducibility of CHART-PCR, is the normalisation to chromatin accessibility controls. Variation in the chromatin accessibility of control loci such as β-actin (Figure 4.2), illustrates that there are inconsistencies arising from the chromatin preparation which can significantly influence chromatin accessibility values when comparing replicates and multiple cell lines. In order to overcome these issues, extensive optimisation was required in order to target VNN1 SNPs for the establishment of consistent chromatin accessibility measurements using CHART-PCR (Chapter 3). Essential to this optimisation was the establishment of an optimal nuclease concentration to reliably discriminate chromatin accessibility. For example, the β-actin locus shown in Figure 4.2 had a higher MNase concentration (4 enzyme units) and hence higher nuclease digestion to that used in the optimised CHART-PCR methodology, shown in Figure 3.3 (1 enzyme unit). Partial nuclease digestion proved better at discriminating chromatin accessibility and was used in CHART-PCR (Chapter 3) and CHA-seq in the current study.

111

100

90 (% cutting) (%

80 Chromatin Accessibility 0 ) ) ) ) ) )

0029(1)0029(20029(30054(1)0054(20054(30331(1)0331(20331(3 SAFHS chohort ID (replicate number)

Figure 4.2: Chromatin accessibility differences between replicates at the β-actin promoter using CHART-PCR . MNase-treated cell lines from the SAFHS cohort (‘0029’, ‘0054’ and ‘0331’) were used to calculate chromatin accessibility expressed as a ratio of undigested to MNase-digested DNA (% cutting) at sequence near the β-actin promoter in CHART-PCR. The data were obtained from SAFHS cell lines shown and assayed in triplicate (n=3). No significance (P<0.05) was observed between samples as determined by one-way ANOVA (P value = 0.98).

112

4.2.2 Targeted CHA-seq (T-CHA-seq) development To demonstrate the utility of CHA-seq as a functional assay, a targeted CHA-seq (T-

CHA-seq) approach was developed to verify the results of CHART-PCR (Chapter 3) which compared allele-specific chromatin accessibility at the -137 VNN1 SNP. T-CHA- seq was applied to SAFHS cell line ‘0331’, which was heterozygous at the -137T/G locus, and assessed for chromatin accessibility using DNase I. In this case, DNase I was selected as the chromatin-sensitive nuclease based on its ability to discriminate genomic regions that are under protein occupancy, given that the -137 locus is likely to be regulated by Sp1 (Göring et al ., 2007).

4.2.2.1 T-CHA-seq library preparation – DNase I digestion optimisation Chromatin was isolated from a SAFHS cell line (‘0331’) which was heterozygous for multiple VNN1 SNPs (including -137 and -587) and treated with DNase I. Various reaction conditions and enzyme concentrations were trialled. In order to determine optimal digestion, prepared chromatin was exposed to different DNase I concentrations

(1 U, 2 U, 3 U, 4 U and 6 U) and incubation lengths at 37˚C (2, 3, 4, 5 and 6 minutes) as shown in Figure 4.3. The 6 minute incubation using 6 U DNase I (lane 7) was selected based on the presence of a higher proportion of fragments located within the mono- and di-nucleosomal size range (~100-400 bp) which was optimal for subsequent library preparation steps using the NEBNext DNA Sample Prep kit (New England Biolabs,

USA). Additional experimentation was conducted to assess longer nuclease incubation periods which showed that digestion was complete after six minutes. Optimal DNase I digestion (6 minutes incubation using 6 U) was used for all subsequent preparations.

113

Figure 4.3: Electrophoretic patterns of DNase 1-digested chromatin . Electrophoresis was performed using a 1% agarose gel stained with ethidium bromide and run for 45 minutes at 90V. The NEB PCR Marker was loaded in lane 1 along with the NEB 100 bp DNA ladder in lane 14, undigested chromatin derived from the SAFHS cohort in lane 2 (3 µg chromatin) and lane 8 (1 µg chromatin, note faint band due to loading spillage). 3 µg of chromatin derived from the SAFHS cohort was incubated in 6 U of DNase I with different incubation times and loaded in lane 3 (2 minutes), lane 4 (3 minutes), lane 5 (4 minutes), lane 6 (5 minutes) and lane 7 (6 minutes). 1 µg of SAFHS chromatin incubated for 6 minutes incubated in different DNase I concentrations was loaded in lane 9 (1 U), lane 10 (2 U), lane 11 (3 U), lane 12 (4 U) and lane 13 (6 U). Optimal digestion of the SAFHS chromatin was established using 6 minute incubation with 6 U DNase I.

114

4.2.2.2 T-CHA-seq library preparation – adapter-ligation optimisation DNase 1-digested fragments within the 100-400 bp range were selected following agarose gel electrophoresis, cut out using a scalpel, and purified using the QIAquick Gel

Extraction kit (QIAGEN, Australia). The fragments were subsequently prepared according to the protocol outlined in the NEBNext DNA Sample Prep kit (New England

Biolabs, USA). It was of particular importance to ensure the optimal size of DNA fragments following the ligation of adapters using agarose gel electrophoresis. As shown in Figure 4.4, the highest proportion of fragments were approximately 300 bp

(di-nucleosomal) as estimated by comparison to the NEB PCR marker (New England

Biolabs, USA). An additional 50 bp was also observed, which matches the size of the adapters used in the library preparation and was subsequently removed using the

QIAquick PCR purification kit (QIAGEN, Australia).

115

1 2

766bp

500bp

300bp

150bp

50bp

Figure 4.4: Adapter ligation to DNase I-digested chromatin . Electrophoresis was performed using a 1% agarose gel stained with ethidium bromide and run for 45 minutes at 90V. The NEB PCR Marker was loaded in lane 1 along with DNase I- digested fragments that had been prepared using the NEBNext DNA Sample Prep kit (New England Biolabs, USA) following the ligation of adapter sequence in lane 2.

116

4.2.2.3 PCR amplification - optimisation In order to target specific genomic regions, namely the -137 VNN1 SNP, PCR amplification (Section 2.7) was used to enrich prepared library fragments following preparation using the NEBNext DNA Sample Prep kit (New England Biolabs, USA).

Two rounds of PCR amplification using ‘nested’ sets (primer sets positioned internally to the previous set) of VNN1 -specific primers were designed to increase the yield of fragments containing the -137 locus from the forward (first round primer: 5’-

GCAGCCCTAGCGAACTAATG-3’ and second round primer: 5’-

TCTTGAGTTAATCTTCACAAATTACTGC-3’) and reverse (first round primer: 5’-

GTCCAGAACTTAGTACCTTAAACTACGGAAG 3’ and second round primer: 5’-

CGCTATGATTTCATAGCATTGTTACATAAC 3’) direction (Figure 4.5). In each round of ‘nested’ PCR amplification, a VNN1 -specific primer was paired with an adapter-specific primer (5’-ACTCTTTCCCTACACGACGC-3’) to ensure only library fragments were amplified. The results following PCR amplification showed the presence of a ‘smear’ which represented a range of different sized fragments. The size of the first and second round of ‘forward’ amplicons were within a range of approximately 100-350 bp, indicating the presence of different VNN1 -enriched fragments (Figure 4.6), with the reverse library resulting in equivalent sized amplicons

(Figure 4.7). The forward and reverse -137T/G amplicon libraries were subsequently pooled and sequenced using Sanger-based chemistry (Section 4.3.1).

117

5’-CTATGGGTCAAGGAGAGAGGCCTGGAACACATCTTTCCTTCACCGAAA

TCAGGAGGAACCAACTCTGCTGACACCTTCATCTGGGACTCCCACCCTCCAG

AACTGCAAAGCAATAAATTTTTTATTTTTTACACCACCCAGTTTATTGTATTT

TGTTAG GCAGCCCTAGCGAACTAATG TACATAGAGTTCTTGAGTTAATCTTCA

CAAATTACTGC AATAAGG KAGGGTCTTTTGTTATGTAACAATGCTATGAAAT

CATAGCGTTTTCTTAATTAA CTTCCGTAGTTTAAGGTACTAAGTTCTGGAC ACC

ACGTGTCTTCTTTCTATAAATACCAGGACATGCTCTGTTTTTCAGCACT -3’

Figure 4.5: T-CHA-seq primer positions in the -137 VNN1 promoter region . Two rounds of PCR amplification were performed using nested sets of sequence-specific primers in order to increase target yield of the -137 VNN1 locus from both the forward (5’ to 3’) and reverse (3’ to 5’) direction. Shown are the consensus sequences of the first (italics) and second-round ‘forward’ (black boxes) and ‘reverse’ (grey boxes) T-CHA- seq PCR primers within the VNN1 promoter region relative to the -137 SNP ( K). In each round of amplification, one corresponding VNN1 -specific primer was paired with an adapter-specific primer to ensure only library fragments were amplified.

118

I-digested library

1 2 3 4 5 6

766bp

500bp 300bp 150bp 50bp

Figure 4.6: T-CHA-seq ‘forward’ PCR amplification of DNase I library fragments . Electrophoresis was performed using a 1% agarose gel stained with ethidium bromide and run for 45 minutes at 90V. The NEB PCR Marker (sizes: 50 bp, 150 bp, 300 bp, 500 bp, 766 bp) was loaded in lane 1 and 6 along with the ‘forward’ (5’ to 3’) PCR amplification products of the T-CHA-seq library in lane 2 (first round amplicons) and lane 4 (second round amplicons). Controls (no template) were run in lane 3 (first round) and 5 (second round). *Note: A 1% agarose gel was selected to confirm whether amplification was successful rather than to establish the size range which was determined in Figure 4.4.

119

1 2 3 4 5

766bp

500bp 300bp 150bp 50bp

Figure 4.7: T-CHA-seq ‘reverse’ PCR amplification of DNase I library fragments . Electrophoresis was performed using a 1% agarose gel stained with ethidium bromide and run for 45 minutes at 90V. The NEB PCR Marker (sizes: 50 bp, 150 bp, 300 bp, 500 bp, 766 bp) was loaded in lane 1 along with the ‘reverse’ (3’ to 5’) PCR amplification products of the T-CHA-seq library in lane 2 (first round amplicons) and lane 4 (second round amplicons). Controls (no template) were run in lane 3 (first round) and 5 (second round). *Note: A 1% agarose gel was selected to confirm whether amplification was successful rather than to establish the size range which was determined in Figure 4.4.

120

4.2.2.4 T-CHA-seq methodology following PCR amplification Following PCR enrichment of the DNase I-digested libraries for the -137 VNN1 SNP,

TOPO TA cloning (Invitrogen Corporation, Australia) and bacterial transformation steps using One Shot MAX Efficiency DH5 α-T1 R Competent Cells (Invitrogen

Corporation, Australia) were performed (See Sections 2.2.3 and 2.2.4) in order to isolate and amplify single library fragments within plasmid vectors, which were subsequently purified and sequenced using Sanger-based chemistry. Following blue-white colony selection (See Section 2.2.4), positive bacterial clones carrying enriched library fragments were purified using the QIAGEN Spin Mini-prep kit (QIAGEN, Australia)

(See Section 2.5.1), and subsequently sequenced using Sanger-based chemistry at the

Australian Genome Research Facility (AGRF) (See Section 2.8). The DNA sequence reads were downloaded (http://www.agrf.org.au/) and then processed using bioinformatic tools (BLAST). Results are shown in Figures 4.10 and 4.11.

4.2.3 Extended CHA-seq method development The CHA-seq approach was modified from the targeted T-CHA-seq approach in order to screen larger genomic regions for functional genetic variants (Section 2.11). CHA- seq required a bead selection step (as opposed to locus-specific primers in T-CHA-seq) and deep-sequencing in order to extend the target assay region. CHA-seq was applied to a SAFHS cell line (‘0331’) which was heterozygous for multiple VNN1 SNPs

(including -137 and -587) and assessed for chromatin accessibility using MNase. The use of MNase generated detailed information regarding nucleosome positions and relative levels of nucleosome protection across the VNN1 gene that extends upon the

CHART-PCR results (Chapter 3).

121

4.2.3.1 Library preparation – MNase digestion optimization Chromatin was isolated from a SAFHS cell line (‘0331’) and treated with MNase under various reaction conditions to determine optimal nuclease digestion to generate a higher distribution of fragments within the mono- and di-nucleosomal size range (Figure 4.8).

Prepared chromatin was exposed to different MNase concentrations (0.5, 1, and 2 enzyme units) and incubation times (5, 10 and 15 minutes) (Figure 4.8). As shown in

Figure 4.8, a ten minute incubation using one enzyme unit of MNase was selected as optimal and thus utilised for subsequent library preparation using the NEBNext DNA

Sample Prep kit (New England Biolabs, USA).

122

1 2 3 4 5 6 7

4000bp 3000bp 2000bp 1500bp

1000bp

500bp

Figure 4.8: Electrophoretic patterns of MNase-digested chromatin . Electrophoresis was performed using a 1% agarose gel stained with ethidium bromide and run for 45 minutes at 90V. The NEB 1 kb DNA ladder (5-10 kb bands indistinguishable) was loaded in lane 1 along with micrococcal nuclease-digested chromatin isolated from a SAFHS cohort cell line (‘0331’) using different incubation lengths and enzyme concentrations in lanes 2 (10 minute digestion, 0.5 U MNase), 3 (10 minute digestion, 1 U MNase), 4 (10 minute digestion, 2 U MNase), 5 (5 minute digestion, 1 U MNase), 6 (10 minute digestion, 1 U MNase) and 7 (15 minute digestion, 1 U MNase).

123

4.2.3.2 CHA-seq methodology post-library preparation Following preparation of the MNase-digested library using the NEBNext DNA Sample

Prep kit (New England Biolabs, USA), target sequences containing the entire Vanin gene family were supplied to Agilent to synthesise pools of biotinylated probes specific for the SureSelect DNA Target Enrichment system. Libraries were purified using a bead-based selection protocol with the SureSelect Target Enrichment System - Illumina

Single-End Sequencing Platform Library Prep Version 1.2 (Agilent, USA) in accordance with the manufacturer’s instructions.

Prepared libraries were processed for cluster generation on the Illumina cluster station

(Agilent, USA) and subsequently sequenced using Illumina next-gen sequencing technology on the Illumina Genome Analyzer II platform (Agilent, USA). DNA sequence reads were analysed using the IGV 2.1 software and aligned to the hg19 reference sequence. Raw sequence data was used to interpret chromatin accessibility by comparing numbers of allelic sequence reads at heterozygous SNPs (Table 4.1 and 4.3).

The optimised protocol for CHA-seq is described in detail in Chapter 2 (Section 2.11).

4.3 Results

4.3.1 Targeted CHA-seq (T-CHA-seq) To assess allele-specific chromatin accessibility, a DNase I-treated genomic library was prepared from a -137G/A heterozygous cell line from the SAFHS cohort, enriched for the -

137 locus using two rounds of PCR amplification, and subjected to Sanger sequencing to map the nuclease digestion sites. The results showed the presence of an area of high nuclease accessibility directly adjacent (5’ upstream) to the -137 SNP, suggesting the influence of histone or bound protein protection localised to the -137 locus (Figure 4.9).

Comparison of the number of sequenced library fragments ending at a particular nucleotide

(and hence mapping the sites of digestion by the nuclease) for each -137 variant, indicated 124 an allele-specific bias (allele skew) in the digestion pattern (Figure 4.9). Only 11.4% of the total number of fragments carried -137G, while the majority carried the T allele (89.6%).

This is consistent with the results of the CHART-PCR (Chapter 3) that showed the -137G allele to be correlated with greater chromatin condensation than the -137T allele.

125

Figure 4.9: Assessment of the chromatin arrangement across the VNN1 promoter using T-CHA-seq . Shown is the total coverage across the VNN1 promoter sequence, the number of reads (vertical axis) carrying the -137 variant (K), and the position of the 3’ terminal nucleotide located in the region derived from T-CHA-seq results. Sequence reads containing the -137T allele are shown as black columns and those with the -137G allele in grey. Sites of digestion by DNase I are plotted across (horizontal axis) the -137 VNN1 promoter region, with the number of sequenced fragments ending at each nucleotide indicated by the vertical axis for each -137 allele (T or G). The areas of highest sequence reads (15) are 15, 16, and 17 base-pairs upstream of the -137 SNP and contain the T allele. The number of reads containing the -137T allele (132) were higher than that of the -137G allele (17) indicating an allelic bias towards the T allele .

126

To further delineate exact positions of DNase I hypersensitive sites within the -137 promoter region, the total number of T-CHA-seq sequence reads were mapped to the -

137 VNN1 promoter region and included those reads terminating prior to the -137 locus

(Figure 4.10). There appears to be two regions of higher DNase I accessibility ( ≥13 reads) adjacent to the -137 SNP (1-5 bp 3’ downstream, and 11-15 bp 5’ upstream of the

SNP) (Figure 4.12).

127

Analysis of hypersensitivity about the -137 VNN1 gene promoter SNP using sequenced library fragment ends

Figure 4.10: Assessment of the chromatin arrangement across the VNN1 promoter using T-CHA-seq . Shown is the total sequence read coverage across the VNN1 gene promoter (horizontal axis) with the position of the 3’ terminal nucleotide located in the - 137 (K) promoter region derived from T-CHA-seq results. Numbers of sequence reads are shown (vertical axis) as black columns, with the areas of highest sequence reads (16) occurring at positions -136 and -135.

128

4.3.2 CHA-seq To map nuclease digestion sites across the entire Vanin gene family, including the VNN1 gene, a MNase-digested genomic library was prepared from a -587 G/A heterozygous cell line from the SAFHS cohort and subjected to deep sequencing. Using a prepared library of

DNA fragments covering the VNN1 gene, 41 heterozygous SNPs were captured following sequencing and analysed using the Integrated Genome Viewer (IGV) 2.1 software (Broad Institute) to determine the location and allele skew for each VNN1 SNP

(Table 4.1).

To assign rs numbers, the chromosome position (based on the hg19 human reference genome) of captured CHA-seq SNPs were matched to the dbSNP database

(http://www.ncbi.nlm.nih.gov/snp) and verified by comparing CHA-seq reads to the

SNP sequence in dbSNP. At the time of analysis (29-6-13), there was 1042 SNPs in dbSNP that matched to the VNN1 gene. A total of 12 heterozygous SNPs captured in

CHA-seq corresponded with rs numbers. The remaining SNPs may be either novel genetic variants within the SAFHS cohort, or base substitutions introduced as part of

Illumina sequencing error. For example, the SNP at chromosome position 133,035,606 in the VNN1 promoter had not been assigned an rs number at the time of experimentation (2010), but has since been designated rs45513194. Therefore, the novel

SNPs identified in the current study may be verified in future studies. To reduce the likelihood of error introduced by the Illumina sequencing, only those captured SNPs with rs numbers were considered for further functional analysis.

A summary of the distribution of captured heterozygous SNPs across the VNN1 gene indicates a number of densely populated genomic regions carrying captured heterozygous SNPs (Figure 4.11). Of interest is the presence of five intronic SNPs that

129 are co-located between chromosomal positions 133,023,083 to 133,023,087 which clearly show an active region that affects the conformation of VNN1 chromatin (Figure

4.11). Of the 41 captured SNPs in CHA-seq, 37 were found to be significantly different from random distribution (50:50) between alleles (P<0.005) (Table 4.1). Noticeably, the

-587 SNP (88% A and 12% G) and the rs2267951 SNP (85% T, and 15% C) have an allele-specific influence on VNN1 chromatin, with a further 21 SNPs that are also likely to contribute (greater than 13% + 1% Illumina sequencing error rate) based on the allele ratio cutoff set by Pastinen et al . (2004).

130

Table 4.1: Heterozygous VNN1 SNPs captured in CHA-seq .

Shown is a summary of the Integrated Genome Viewer (IGV) software (Broad Institute) output from CHA-seq that detected 41 heterozygous SNPs in the VNN1 gene by comparison to the hg19 human reference sequence. Shown for each captured VNN1 SNP are the genomic positions relative to the transcriptional start site (TSS), position within the VNN1 gene (promoter/5’upstream, intronic or exonic), alleles, total CHA-seq sequence reads carrying the SNP, rs SNP numbers (N/A = no dbSNP match), allele frequencies (allele 1 and 2), and allele skew as a direct reflection of chromatin accessibility and functionality (percentage). Allele-skews that were statistically significant using the critical t-value (2.97% at P=0.005, df=41, 2-tailed) are bolded as shown. The range of nucleosomal protection at genetic variants in the VNN1 gene was between 1-38%. The average allele skew across all captured VNN1 SNPs was 13.6%, with the rs2050153 (-587) variant having the highest recorded allele-skew (38%).

Chr 6 Genomic location Alleles Total rs SNP Allele Allele Allele position reads number 1 % 2 % skew 133,035,897 promoter/5'upstream T-C 2817 rs7760367 56 44 6 133,035,856 promoter/5'upstream C-T 3250 rs13204527 53 46 3.5 133,035,776 promoter/5'upstream A-G 5253 rs2050153 88 12 38 (-587) 133,035,606 promoter/5'upstream G-C 4955 rs45513194 61 39 11 133,035,099 exon 1 A-G 1432 rs2294757 65 34 15.5 133,034,654 intron 1 A-T 1069 N/A 55 45 5 133,034,653 intron 1 G-A 1070 N/A 54 46 4 133,034,652 intron 1 G-C 1070 N/A 54 45 4.5 133,034,650 intron 1 T-C 1070 N/A 54 46 4 133,034,647 intron 1 A-T 1070 N/A 55 45 5 133,034,549 intron 1 C-T 1099 N/A 56 44 5.5 133,034,550 intron 1 C-T 1099 N/A 55 45 6 133,034,553 intron 1 C-T 1099 N/A 56 44 6 133,030,761 intron 2 G-A 758 rs148984574 56 44 6 133,023,821 intron 2 C-T 1439 rs113832566 66 34 16 133,023,820 intron 2 A-G 1439 N/A 66 34 16 133,023,174 intron 2 A-C 1175 N/A 61 39 11 133,023,167 intron 2 A-C 1175 N/A 61 39 11 133,023,087 intron 2 G-C 1951 N/A 71 26 22.5 133,023,086 intron 2 A-T 1951 N/A 73 26 23.5 133,023,085 intron 2 G-A 1951 N/A 73 27 23 133,023,084 intron 2 A-G 1951 N/A 74 26 24 133,023,083 intron 2 C-A 1913 rs78040505 73 27 23 133,020,445 intron 2 C-T 1806 N/A 66 34 16 133,020,383 intron 2 A-T 1913 N/A 68 32 18 133,020,380 intron 2 T-A 1913 N/A 68 32 18 133,020,379 intron 2 T-G 1913 N/A 68 32 18 133,020,376 intron 2 G-T 1912 rs2267949 68 32 18 133,019,859 intron 2 T-C 1522 rs2267951 85 15 35 133,019,482 intron 2 T-C 1041 N/A 51 49 1 133,013,475 exon 5 T-G 2044 N/A 73 27 23 133,013,474 exon 5 T-C 2043 N/A 73 27 23 131

133,013,374 exon 5 A-C 1524 N/A 69 31 19 Chr 6 Genomic location Alleles Total rs SNP Allele Allele Allele position reads number 1 % 2 % skew

133,013,369 exon 5 A-C 1520 N/A 69 31 19 133,012,903 intron 5 A-G 846 N/A 51 49 1 133,012,888 intron 5 G-A 846 N/A 51 49 1 133,012,885 intron 5 A-T 846 N/A 51 49 1 133,012,858 intron 5 A-T 1254 N/A 65 35 15 133,010,071 intron 5 T-C 854 N/A 72 28 22 133,005,239 intron 6 C-T 1656 rs45525739 70 30 20 133,004,992 intron 6 A-T 3015 rs9389025 65 35 15 Average 63.5 36.2 13.6

132

Figure 4.11: Distribution of heterozygous SNPs captured in CHA-seq across the VNN1 gene relative to allele skew . Shown is a graphical summary of the Integrated Genome Viewer (IGV) software (Broad Institute) output from CHA-seq that detected 41 heterozygous SNPs in the VNN1 gene by comparison to the hg19 human reference sequence. Shown are the positions of captured VNN1 SNPs relative to the transcriptional start site (x-axis) and the associated allele-skew as a percentage (y-axis). Exonic (black rectangles), intronic (black bar between exons) and promoter (points with negative x-axis values) regions are shown in the VNN1 gene. The highest recorded allele-skews were observed at the -587 SNP (38%) in the promoter region, and at the rs2267951 SNP (35%) within intron 2 (red arrows).

133

4.3.2.1 Allele-specific nuclease activity at the -587 SNP To interpret allele-specific areas of MNase sensitivity within the -587 VNN1 promoter region derived from CHA-seq, the number of sequenced library fragments ending at a particular nucleotide (and hence mapping the sites of digestion by the nuclease) for each -

587 variant were compared (Figure 4.12). An allele-specific bias in the digestion pattern was shown for the -587 locus. In fact, only 12% of the total number of fragments carried -

587G, with the majority carrying the A allele (88%).

In the presence of the -587A allele, there were two areas (20-30 bp upstream from the -

587 locus and 70-80 bp downstream) that correlated with a high number of terminating sequenced reads (Figure 4.12). Therefore, it is likely the -587A locus is located with an active chromatin region. These findings are consistent with the CHART-PCR results which showed that the presence of the -587G allele correlates with greater chromatin condensation than when the A allele is observed.

134

Figure 4.12: Assessment of the allele-specific chromatin accessibility across the - 587 VNN1 promoter region using CHA-seq . Shown is a graphical summary of the Integrated Genome Viewer (IGV) software (Broad Institute) output from CHA-seq that detected 41 heterozygous SNPs in the VNN1 gene by comparison to the hg19 human reference sequence. The CHA-seq results shown represent the total sequence coverage (ticks every 10 base-pairs) across the -587 (red) promoter region (horizontal axis) in the VNN1 gene along with the positions and numbers of the 3’ terminal nucleotide following micrococcal nuclease digestion (vertical axis). Sequence reads containing the -587A allele are shown above the horizontal axis and reads containing the -587G allele are shown below the horizontal axis. The numbers shown on these columns represent the number of sequenced fragments whose 3’ nucleotide terminates at that position. Of note were two areas (indicated by black arrows) containing the -587A allele that were densely populated and possessed the highest number of terminating sequenced reads, suggesting the presence of active chromatin regions.

135

4.3.2.2 Nucleosome mapping within the -587 VNN1 promoter region To map nucleosome positions within the -587 VNN1 promoter region (and determine the relative level of nucleosome protection), sequenced CHA-seq library fragments containing the -587 locus (G or A) were analysed using the Internal Genome Viewer

(IGV) 2.1 software from the Broad Institute (Figure 4.13). The region surrounding the -

587 SNP displayed a periodicity in the number of MNase cut sites that was consistent with the positioning of the protective nucleosomes in the region (Figure 4.13). The

‘novel’ SNP (newly identified at the time of this research, but since has been designated rs45513194) shows partial protection and is positioned at the edge of a potential nucleosome, whereas the -587 SNP (rs2050153) is located in an area of high sequence read frequency, indicating the presence of little or no influence from nucleosomes. The remaining two SNPs (rs7760367 and rs3204527) within the -587 VNN1 promoter region are likely to be non-functional SNPs due to nucleosome protection.

136

CHA-seq reads (number)

VNN1 gene promoter (chromosome position)

Figure 4.13: Assessment of the chromatin arrangement across the -587 VNN1 promoter region using CHA-seq . Shown is an adapted summary of the Integrated Genome Viewer (IGV) software (Broad Institute) output from CHA-seq showing the - 587 (rs2050153) promoter region in the VNN1 gene correlated with the total sequence coverage across the region. Sequence reads (grey bars) corresponding to particular loci are shown along the horizontal axis (genomic position), with read number on the vertical axis. Also shown are the putative positions of the nucleosomes (dashed ovals) based on sequence coverage, with the -587 locus predicted to lie within a region with no nucleosome protection. The labelled red arrows indicate the positions of genetic variants located within the -587 promoter region, with the percentage of reads carrying each allele proportionally represented as red and blue bars underneath each arrow. It is of note is that the -587 SNP is predicted to occur within an area with little to no nucleosome protection.

137

4.3.3 CHA-seq for the VNN2 and VNN3 genes The VNN2 and VNN3 genes were subjected to CHA-seq using a prepared MNase- digested library isolated from a -587G/A heterozygous cell line (SAFHS cohort) in order to map nuclease digestion sites and illustrate the scalability of the CHA-seq approach.

As shown in Table 4.2, 24 heterozygous SNPs were captured across both genes and analysed using IGV to determine the location and allele skew of SNPs. All of the 24

VNN2 and VNN3 SNPs captured in CHA-seq were significantly skewed (P<0.005), of which 11 were in the VNN2 gene and 13 located in the VNN3 gene. Of the 11 VNN2

SNPs, 4 were located in the promoter region and the remaining 7 positioned within intronic regions. In the VNN3 gene, 3 were located in the promoter and 10 located within intronic sequences. The SNP located within intron 3 of the VNN3 gene

(chromosome 6 position 133,048,287) had the highest recorded allele skew (28%) and is a likely candidate for further functional assessment (Table 4.2).

There were 24 SNPs in the dbSNP database (http://www.ncbi.nlm.nih.gov/snp) that matched to the VNN2 gene and 389 SNPs to the VNN3 gene at the time of analysis (29-

6-13). In CHA-seq, one captured heterozygous SNP at chromosome position

133,048,214 in VNN3 matched to the rs9493417 in dbSNP.

138

Table 4.2: Heterozygous VNN2 and VNN3 SNPs captured in CHA-seq .

Shown is a summary of the Integrated Genome Viewer (IGV) software (Broad Institute) output from CHA-seq which detected 11 heterozygous SNPs in the VNN2 gene and 13 in the VNN3 gene by comparison to the hg19 human reference sequence. Shown for each captured VNN1 SNP are the genomic positions (chromosome position), position within the VNN2 and VNN3 genes (promoter/5’upstream, intronic or exonic), alleles, total CHA-seq sequence reads carrying the SNP, rs SNP numbers (N/A = no dbSNP match), allele frequencies (allele 1 and 2), and allele skew as a direct reflection of chromatin accessibility and functionality (percentage). Allele skews that were statistically significant as determined by their critical t-value (3.09% at P=0.005, df=24, 2-tailed) are bolded. The range of nucleosomal protection at genetic variants in the VNN2 and VNN3 genes was between 5-28%. The average allele skew across all captured VNN2 and VNN3 SNPs was 14.6%, with the VNN3 variant at chromosome position 133,048,287 possessing the highest recorded allele skew (28%) (red).

SNP location (chromosome Total rs SNP Allele Allele 2 Allele position) Gene/genomic location Alleles reads number 1 % % skew 133,085,576 VNN2 promoter/5'upstream A-T 1445 N/A 39 61 11 133,085,564 VNN2 promoter/5'upstream C-T 1448 N/A 61 39 11 133,085,553 VNN2 promoter/5'upstream C-T 1451 N/A 61 39 11 133,085,552 VNN2 promoter/5'upstream A-T 1451 N/A 60 40 10 133,079,482 VNN2 intron 1 C-T 803 N/A 33 67 17 133,076,897 VNN2 intron 4 A-G 711 N/A 29 71 21 133,073,728 VNN2 intron 4 A-C 1382 N/A 63 37 13 133,073,726 VNN2 intron 4 A-T 1385 N/A 37 63 13 133,073,721 VNN2 intron 4 C-T 1384 N/A 63 36 13 133,072,720 VNN2 intron 4 C-T 1518 N/A 61 39 11 133,070,042 VNN2 intron 6 A-G 1033 N/A 45 55 5

133,056,346 VNN3 promoter/5’upstream G-T 1227 N/A 31 69 19 133,056,344 VNN3 promoter/5’upstream A-T 1227 N/A 69 31 19 133,056,340 VNN3 promoter/5’upstream A-T 1227 N/A 31 69 19 133,053,371 VNN3 intron 2 C-T 859 N/A 66 34 16 133,053,369 VNN3 intron 2 A-G 859 N/A 34 66 16 133,053,367 VNN3 intron 2 C-T 859 N/A 34 66 16 133,048,325 VNN3 intron 3 A-G 1669 N/A 37 63 13 133,048,324 VNN3 intron 3 A-G 1669 N/A 62 38 12 133,048,321 VNN3 intron 3 G-T 1669 N/A 62 38 12 133,048,316 VNN3 intron 3 A-T 2268 N/A 72 28 22 133,048,287 VNN3 intron 3 C-T 2875 N/A 78 22 28 133,048,214 VNN3 intron 3 A-G 1689 rs9493417 37 63 13 133,045, 554 VNN3 intron 5 A-C 634 N/A 40 60 10 Average 14.6

139

4.4 Discussion Genetic variants that alter chromatin accessibility and transcription factor binding comprise a central regulatory mechanism by which sequence variation leads to changes in gene expression in humans (Degner et al ., 2012) . A key result from this present study is the utility of a novel sequencing approach (CHA-seq) to characterise functional genetic variants on a larger scale and more time and cost-effectively than existing chromatin arrangement assays (CHART-PCR).

T-CHA-seq was applied to the -137 promoter region in VNN1, which showed a significant allele-bias in sequence reads (38.5% for the -137T allele). Taken together,

CHART-PCR and T-CHA-seq results are suggestive that the -137 variant confers allele- specific chromatin accessibility differences in the VNN1 gene promoter. Göring et al .

(2007) identified allele-specific binding at the -137 variant using EMSA, supporting the findings of both T-CHA-seq and CHART-PCR approaches from the present study.

Furthermore, the mapping of nuclease cut sites identified that the -137 locus is present within a region of high DNase I hypersensitivity. Consequently, this result and the findings of Göring et al . (2007) support the presence of bound proteins at the -137 locus that confer allele-specific nuclease protection.

In order to assess allele-specific differences in chromatin arrangement across an extended genomic region, the CHART-PCR assay was translated into a genome-wide format (CHA- seq). CHA-seq involved the use of a prepared library of DNA fragments covering the

VNN1 gene and which captured 41 heterozygous SNPs. Of these, 37 showed a significant allele skew (P<0.005), including the SNP captured at position 133,035,099

(rs2294757) which has been reported to have an allele-specific influence in DNA methylation (Kerkel et al ., 2008). Previously, Pastinen et al . (2004) determined an average observed allele ratio of 37:63 as the cutoff for determining the presence of a 140 significant allelic imbalance. Taking this allele skew as a baseline for functional candidates and incorporating the 1% Illumina sequencing error rate (i.e. 14%), CHA-seq has identified 23 novel candidate SNPs that have an allele-specific role in VNN1 chromatin organisation. Of the 23 candidate functional SNPs, 17 occur in intronic regions, five in exonic regions and one in the promoter (-587). The majority (17/23) of candidate SNPs, however, are localised to the promoter and intronic regions of the

VNN1 gene, and are suggestive of a region under significant transcriptional regulation through allele-specific effects on chromatin (Tsukada et al ., 2006). Interestingly, the rs2267951 SNP in intron 2 recorded the second highest allele skew in CHA-seq and was selected for further in vitro characterisation by EMSA in order to verify SNP functionality and the veracity of the CHA-seq results (Chapter 6). The remaining genetic variants with significant allele skews represent ideal candidates for further functional validation.

Comparison of the number of sequenced library fragments ending at a particular nucleotide

(and hence mapping the sites of digestion by the nuclease) for each -587 variant indicated an allele-specific bias in nuclease digestion, with the majority of sequence reads terminating within the promoter when carrying the -587A allele (88%). Furthermore, the -587 promoter region displayed a periodicity in the number of MNase cut sites that correlated with nucleosome positions within the region. The nucleosome periodicity suggests that the -

587A locus is located within an active euchromatic region, which therefore indicates greater transcriptional potential. Furthermore, the level of predicted protection of SNPs within the -587 promoter region was inversely correlated to the allele skew at these

SNPs. Consequently, this may indicate that candidate functional variants with significant allele-specific chromatin accessibility or CHA-seq allele-skews, such as the

141

-587 variant, have reduced nucleosome protection against MNase digestion due to histone proteins.

To assess the scalability of the CHA-seq approach, this method was applied to the

VNN2 and VNN3 genes. The results of CHA-seq showed the presence of two candidate functional SNPs within the VNN2 gene and eight in the VNN3 gene, based on a comparison to the 14% baseline allele skew. All candidate SNPs in the VNN2 and VNN3 genes were present within non-coding regions, suggesting that the role of these genetic variants is likely to be associated with chromatin structure and/or transcriptional regulation. Further experimental validation would assist in the elucidation of the functional mechanisms associated with these VNN2 and VNN3 candidate variants.

It was expected that CHA-seq would capture all heterozygous SNPs across the entire

VNN1 sequence, including the -137 SNP, using a prepared MNase-digested library from the SAFHS cohort (note that T-CHA-seq used DNase I). However, a periodicity was observed in densely populated and under-populated regions with respect to SNP- containing sequence reads over extended genomic regions. The cause of this unexpected periodicity is unclear and may be due to inconsistent capture of MNase-digested library fragments during bead selection (see Section 4.2). Consequently, further optimisation of the CHA-seq methodology is required to ensure capture of all genetic variants in other

CHA-seq libraries. Since the completion of the current study, a CHA-seq library using

DNase I was prepared using an optimised methodology developed as part of this research, however, the -137 locus was again not captured. This research area is subject to further experimentation. An additional limitation of CHA-seq, is the inherent

Illumina sequencing error rate (1%), which may introduce false polymorphisms with

DNA sequence reads.

142

In the present study, CHA-seq has identified multiple candidate functional genetic variants across the VNN1 , VNN2 and VNN3 genes. The CHA-seq results are suggestive that this approach may represent a time- and cost-effective alternative for identifying and characterising functional genetic variants. These candidate functional variants in the

Vanin gene family will provide targets for verification and future research. This potential application however, requires knowledge of the heritable component of allele- specific chromatin accessibility. Consequently, CHA-seq would represent an ideal approach for rapidly determining chromatin accessibility patterns within familial groups.

143

Chapter 5 – DNA methylation analysis of the -587 promoter region in the VNN1 gene

5.1 Introduction DNA methylation plays a crucial role in transcriptional regulation and is typically associated with gene silencing either directly, by reducing the accessibility of transcription factors to the DNA, or indirectly, by enhancing the formation of heterochromatin (Caro et al ., 2012; Watt and Molloy, 1988; Boyes et al ., 1991). These changes are reversible upon de-methylation (Holliday et al ., 1991). Bisulfite sequencing is the standard approach for characterising DNA methylation patterns and is dependent upon the treatment of genomic DNA with bisulfite prior to sequencing, converting unmethylated cytosines to uracils and leaving methylated cytosines in their original state (Rohde et al ., 2010; Frommer et al ., 1992). Bisulfite sequencing is particularly useful for the characterisation of genetic variants located within CpG islands, which can influence chromatin and gene expression depending upon the methylation state of associated CpG sites (Bird, 2002; Goll and Bestor, 2005); as seen in the CBR1 gene

(Zhang, Y et al ., 2009) and the RIL gene (Boumber et al ., 2008).

Göring et al . (2007) identified the -587 VNN1 variant to be putatively functional based on the genetic association with transcript abundance and HLD-C concentration in the

SAFHS cohort. Functional analysis identified no allele-specific DNA-protein complexes at the -587 locus using EMSA, indicating that differential transcription factor regulation was not the functional mechanism at this locus (Göring et al ., 2007).

Taken together, the CHART-PCR (Chapter 3) and CHA-seq (Chapter 4) results indicated that the -587 VNN1 variant was in a region of condensed chromatin, hence it was important to determine whether the methylation status of the -587 locus was

144 directly responsible for the difference. Previously, Kerkel et al . (2008) reported differential CpG methylation in the VNN1 gene using three haematopoietic cell lines , and identified a functional exonic SNP (rs2294757) that was associated with preferential gene expression through allele-specific methylation in the VNN1 gene. In this current study, bisulfite sequencing was performed on genotyped lymphoblastoid cell lines from the SAHFS population to assess the DNA methylation status of the -587 promoter region in the VNN1 gene.

5.2 Materials and Methods A detailed description of the bisulfite sequencing methodology is shown in Section

2.9.3 (Chapter 2), with a brief summary included below.

Genomic DNA was isolated from seven cultured human leukocyte cell lines from the

SAFHS cohort carrying different -587 genotypes (two -587G/G homozygotes, three -

587G/A heterozygotes and two -587A/A homozygotes) and purified using DNA Blood mini columns (QIAGEN, Australia). Isolated DNA was treated with bisulfite to convert unmethylated cytosines bases into uracils, and subsequently purified using EpiTect spin columns (QIAGEN, Australia). The converted DNA was enriched for the VNN1 gene promoter by PCR using primers (5’-CAGTGAGCCTGTACCCCTGG-3’ and 5’-

TATTCTCCCCATGTGTGTGCC-3’) flanking the predicted CpG island (between -758 to -496 upstream of the VNN1 transcriptional start site). The amplification products were cloned into the pCR2.1-TOPO vector (Life Technologies Co., USA) for sequencing (Australian Genome Research Facility, Perth, Western Australia).

Sequences were analysed using the online analysis program BISMA (Bisulfite

Sequencing DNA Methylation Analysis) (http://biochem.jacobs- university.de/BDPC/BISMA/) using the default analysis parameters to identify the 145 methylation state of predicted CG di-nucleotides in CpG islands. Further bioinformatic verification was performed using AliBaba 2.1 (http://www.gene- regulation.com/pub/programs/alibaba2/index.html) in order to determine whether transcription factor binding sites occured within CpG sites.

5.3 Results

5.3.1 Predicted CpG sites in the -587 promoter region of the VNN1 gene To investigate DNA methylation in the VNN1 gene promoter, bioinformatic analysis was performed in the region surrounding the -587 SNP, indicating the presence of a

CpG island. A 2 kb region upstream of the transcriptional start site of the VNN1 gene was analysed using BISMA, which identified the presence of a CpG island composed of

10 CG di-nucleotides (Figure 5.1). Interestingly, the -587G allele was imbedded within a CG di-nucleotide. However, the -587A allele ablates the CG-dinucleotide and reduces the length of the CpG island (Figure 5.1). In fact, the predicted CpG island length was

264 bp in the presence of the -587G allele, but only 204 bp when the A allele was present (Figure 5.1).

146

Figure 5.1: BISMA prediction of VNN1 CpG island length . Shown are bioinformatic predictions from Bisulfite Sequencing DNA Methylation Analysis (BISMA) (http://biochem.jacobs-university.de/BDPC/BISMA/) which interrogated the VNN1 gene promoter for potential CpG islands in the presence of different -587 alleles (R= G or A, in red). BISMA predicted the presence of 10 methylated CG di-nucleotides (grey boxes) in the VNN1 gene promoter. Black dots illustrate the predicted CpG island length of 264 bp when the -587G allele is present, with grey dots correlating with a CpG island length of 204 bp when the -587A allele is present. Of interest is that the presence of the -587A allele disrupts the formation of a CpG site which occurs in the presence of the - 587G allele and therefore correlates with an allele-specific difference in the length (60 bp) of the CpG island.

147

5.3.2 Bisulfite sequencing of the -587 promoter region in the VNN1 gene and BISMA analysis Seven genotyped cell lines from the SAFHS cohort, differing in their -587 genotype, were assessed using bisulfite sequencing to determine the DNA methylation status of the predicted VNN1 promoter CpG island (Figure 5.2). As shown in Figure 5.2, the -

587G site in both homozygotes and heterozygotes shows near complete cytosine methylation (on the opposite strand). In fact, all CpG di-nucleotides displayed heavy methylation, except at the -720 CpG site. For this site, the methylation pattern correlated with the genotype at the -587 locus. In haplotypes carrying the -587G allele, the -720 CpG site was generally un-methylated (38 out of 48 sequences). Conversely, when the -587A allele was present, the -720 CpG site was almost exclusively cytosine methylated (41 out of 42 sequences). This pattern was present even in the heterozygous lines, indicating that the allele-specific pattern is not cell line specific but rather is determined by genotype. The biological consequence of this is uncertain, but is likely responsible for the -587 allelic differences apparent in chromatin arrangement. The observed variability in unmethylated CpG sites (apart from the -720 site) may be attributed to factors such as: clonal DNA amplification, incomplete bisulfite conversion or non-CpG methylation (Zhang and Jeltsch, 2010).

148

Figure 5.2: Methylation of the -587 SNP . Bisulfite sequencing analysis of genotyped lymphoblastoid lines from the SAFHS population. Cell line designations and -587 genotype are shown on the left. Black boxes indicate DNA methylation, whereas white boxes indicate no methylation. “A” represents the alternate allele at -587 which is not a substrate for CpG methylation. The grey shading indicates heterozygous cell lines. Near complete methylation (41 out of 42 sequences) was seen at the -587G allele. Additionally, the methylation status at -587 appears to affect methylation at -720, a non- variant site.

149

5.3.3 Bioinformatic analysis of the -720 CpG site in the VNN1 gene promoter The bisulfite sequencing results and bioinformatic analysis showed an association between the methylation status of the -720 CpG site and the allele present at the -587 locus (Figure 5.2). To determine a potential functional mechanism of allele-specific

DNA methylation at the -720 CpG site, bioinformatic analysis was conducted using

AliBaba 2.1 (http://www.gene-regulation.com/pub/programs/alibaba2/index.html).

Analysis revealed a CCAAT/enhancer-binding protein beta (C/EBP β) transcription factor binding site at the -720 locus (Figure 5.3). Furthermore, the GATA-binding factor

1 (GATA-1) transcription factor was predicted to bind downstream of the -720 locus

(Figure 5.3).

150

Figure 5.3: Bioinformatic analysis of -720 CpG site in the VNN1 gene promoter . Shown is an adapted output from the online bioinformatic tool AliBaba 2.1 (Grabe, 2002), which interrogated the -720 promoter region in the VNN1 gene (red) for potential transcription factor binding sites. A CCAAT/enhancer-binding protein beta (C/EBPbeta) binding site (AliBaba score: 20) encompasses the -720 CpG site as shown, while a GATA-binding factor 1 (GATA -1) binding site (AliBaba score: 29) is predicted further downstream (3’ direction).

151

5.4 Discussion Chromatin arrangement is dependent upon DNA sequence and methylation status

(Statham et al ., 2012). A genetic variant may alter chromatin arrangement over a large genomic region whereby alternate alleles affect chromatin arrangement and hence differentially regulate gene expression. In the current study, the chromatin architecture was assessed in the -587 promoter region of VNN1 . The CHART-PCR and CHA-seq results showed that the -587G variant resides within a region of condensed chromatin.

Conversely, when the A allele is present at the -587 locus, chromatin is less condensed.

Furthermore, methylation at the -587G allele was supported through BISMA analysis.

Additionally, the length of the CpG island surrounding the -587G allele is greater than that surrounding the -587A allele. The results of the current study suggest that the methylation status of the VNN1 gene promoter at the -587 SNP directly affect chromatin structure and the access of the transcriptional machinery to this gene. Thus, the allele present at the -587 SNP is likely to be functional and influence gene expression.

Interestingly, DNA methylation at a site 133 bp upstream (-720) of the -587 SNP appears to be inversely correlated with the -587 genotype, suggesting that allele- dependant chromatin changes can be seen at sites remote from the actual genetic variation. Typically, the -587G allele was methylated whereas the -720 CpG site was unmethylated. In contrast, the -587A allele was not methylated and was seemingly associated with a methylated -720 CpG site. Kerkel et al . (2008) reported a similar pattern of differential CpG methylation at the VNN1 -720 site in three haematopoietic cell lines derived from the perifieral blood of non-specified adults. Using methylation- sensitive SNP analysis (MSNP), VNN1 was identified as a gene regulated by allele- specific DNA methylation (Kerkel et al ., 2008). However, in contrast to the current study, Kerkel et al . (2008) suggest that the genotype at -587 does not correlate with -

152

720 methylation status which was a novel finding in the current study. The absence of a correlation in the Kerkel et al . (2008) study may be attributed to the difference in the selection of experimental validation method, population-specific differences, or unknown -137 genotype. In fact, the CHART-PCR and CHA-seq results are suggestive that the -137 variant contributes to the differences in chromatin seen in the VNN1 promoter. However, given the almost complete linkage disequilibrium that exists between the -587 and -137 loci in the SAFHS population (Göring et al ., 2007), it is unclear which genetic variant has a functional role in affecting VNN1 chromatin arrangement. A possible solution may be to re-analyse the results of the Kerkel et al .

(2008) study by accounting for the -137 genotype, or to functionally assess a cell line from the SAFHS cohort carrying -137 and -587 loci that are not in linkage disequilibrium.

Besides the -720 CpG site, the bisulfite sequencing data showed overall methylation of the VNN1 CpG island regardless of -587 genotype. Given the linkage disequilibrium between -587 and -137 in the SAFHS population (Göring et al ., 2007), it is possible that the effects seen at the -720 site are due to allele-specific events at the -137 site. A feasible mechanism is that allele-specific Sp1 transcription factor binding leads to allele-specific changes to chromatin arrangement of an extended promoter region upstream of the VNN1 gene, and consequently, differential gene expression.

The bisulfite sequencing and bioinformatic analysis showed that allele-specific changes to chromatin arrangement of an extended VNN1 promoter region may correlate with differential C/EBP β binding at the -720 CpG site. C/EBP β is a transcriptional activator of genes involved in immune and inflammatory responses (Akira et al ., 1990).

Previously, C/EBP β binding has been experimentally demonstrated to be influenced by

153

DNA methylation in mammalian cell lines (Ikeda et al ., 2008). Given that VNN1 is a key regulator of responses to inflammatory stimuli (Min-Oo et al ., 2007), C/EBP β represents a candidate activator of the VNN1 gene under the influence of allele-specific

DNA methylation. Experimental verification using ChIP-seq or EMSA could be conducted at a later date to confirm the C/EBP β interaction in the VNN1 promoter.

154

Chapter 6 – Experimental validation of candidate functional SNP using EMSA

6.1 Introduction A number of candidate functional genetic variants were identified in VNN1 using CHA- seq (Chapter 4), including the rs2267951 SNP which recorded the second-highest allele- skew (35%), which could influence VNN1 expression. Previously, Göring et al . (2007) used genome-wide transcriptional profiles and bioinformatic analyses to identify cis- regulatory genetic variants in VNN1 . However, there have been no reports that the rs2267951 variant is a functional candidate. In the current study, EMSA was used to experimentally validate the functionality of the rs2267951 SNP and to substantiate the veracity of the CHA-seq approach.

In order to provide a positive control for allele-specific nuclear factor binding in EMSA, a candidate functional SNP (‘-191T/C’) in the Presenilin Associated Rhomboid-Like

(PARL , formally PSARL) gene promoter was selected based on strong genetic association with mitochondrial content levels (Curran et al., 2009). Following functional analysis of the -191T/C SNP, EMSA and bioinformatic analysis was applied to experimentally validate the CHA-seq results for the rs2267951 VNN1 SNP.

6.2 Methods A detailed description of the EMSA methodology is shown in Section 2.9.2 (Chapter 2).

Of note, complementary 5' biotinylated oligonucleotides representing both allelic forms of the promoter variants were obtained commercially from Sigma-Aldrich Pty Ltd

(Sigma, Australia). The sequences of the EMSA oligonucleotides are shown in Table

155

6.1. Nuclear extract preparation, binding reactions and gel analysis and detection were performed as described previously (Section 2.9.2).

For EMSA assessment at the -191T/C SNP, nuclear extract was isolated from the Jurkat

E6-1 cell line (ATCC #TIB-152) (Curran et al ., 2009), whereas nuclear extract was isolated from the ‘0331’ SAFHS cell line for the rs2267951 SNP. The 0331 cell line was selected as it was previously used for CHA-seq experiments (Chapter 4).

156

Table 6.1: EMSA oligonucleotides for PARL and VNN1 SNPs

Electrophoretic Mobility Shift Assays (EMSA) were performed using sets of sequence- specific oligonucleotides obtained from Sigma-Aldrich Pty Ltd (Sigma, Australia) of the sequence shown (“DNA SEQUENCE/MODIFICATION”). Oligonucleotides were designed to target the -191 PARL SNP (rows 1-4) and the rs2267951 VNN1 SNP (rows 5-8). Two pairs of allelic oligonucleotides were required for each DNA-protein binding reaction targeting a particular SNP. Each pair consisted of two complementary base sequences (forward from 5’ to 3’ and reverse from 3’ to 5’) that differ only by the SNP allele (as shown in bold). The forward oligonucleotide from each complementary pair also possessed a 5’ biotin modification to facilitate visualisation of DNA-protein binding complexes. Forward (FWD) and reverse (REV) primers are indicated by their respective suffix. Pairs used in EMSA assays are grouped together and share the same prefix.

OLIGONUCLEOTIDE DNA Sequence/Modification (5’-3’)

DESIGNATION

-191 EMSA (RARE 5’ Biotin – GAGCCAGCTGAGCAG CGGGAGGGGAAGCGGT 3’ ALLELE) FWD -191 EMSA (RARE 5’ ACCGCTTCCCCTCCC GCTGCTCAGCTGGCTC 3’ ALLELE) REV -191 EMSA (COMMON 5’ Biotin – GAGCCAGCTGAGCAG TGGGAGGGGAAGCGGT 3’ ALLELE) FWD -191 EMSA (COMMON 5’ ACCGCTTCCCCTCCC ACTGCTCAGCTGGCTC 3’ ALLELE) REV RS2267951 EMSA (T 5’ – Biotin – CTGAGATTGGATTT TGTGGGGTGAGGTGAGT- 3’ ALLELE) FWD RS2267951 EMSA (T 5’- ACTCACCTCACCCCAC AAAATCCAATCTCAG -3’ ALLELE) REV RS2267951 EMSA (C 5’ – Biotin – CTGAGATTGGATTT CGTGGGGTGAGGTGAGT- 3’ ALLELE) FWD RS2267951 EMSA (C 5’ – ACTCACCTCACCCCAC GAAATCCAATCTCAG -3’ ALLELE) REV

157

6.3 PARL Results

6.3.1 EMSA and bioinformatic analysis of the -191 SNP in the PARL gene promoter To provide a positive control for allele-specific nuclear factor binding, EMSA was performed by comparing sequences that differed only at the -191T/C nucleotide. The results shown in Figure 6.1 suggests that the -191 SNP did affect binding of nuclear factors, with a major complex preferentially binding to the -191T allele in nuclear extract conditions (lane 4). This complex is absent in the -191C sample (lane 3). In order identify transcription factor binding sites within the PARL gene promoter, bioinformatic analysis using P-Match was performed, and revealed that the -191T allele is embedded in a Sp1 binding site (TRANSFAC 6.0 Core similarity = 0.957, matrix similarity = 0.967). In contrast, the -191C SNP disrupts the core sequence. These results suggest that the -191 SNP contributes to allele-specific differences in PARL expression, possibly via Sp1-mediated regulation.

158

Figure 6.1: EMSA for the -191 PARL SNP . Shown are the results of the ESMA that compares the gel-migratory patterns of -191T/C specific oligos in the absence of nuclear extract (lane 1 and 2), and in the presence of Jurkat lymphocyte nuclear extract (lanes 3 and 4) and analysed on a 6% non-denaturing polyacrylamide gel. A difference in banding patterns is observed between lanes 3 and 4 which differ in -191 alleles indicate the presence of an additional complex (indicated by arrowhead) in the presence of T allele (lane 4) under nuclear extract conditions. In contrast, this band was not seen in the presence of the C allele (lane 3). No complex was observed without the presence of nuclear extract (lanes 1 and 2).

159

6.4 VNN1 Results

6.4.1 Bioinformatic analysis of the rs2267951 VNN1 SNP To determine whether the rs2267951 SNP is embedded within a transcription factor binding site, bioinformatic analysis was conducted using AliBaba 2.1 (TRANSFAC 7.0)

(http://www.gene-regulation.com/pub/programs/alibaba2/index.html) which predicted

C/EBP α and (yeast-specific) MIG1 binding sites specifically at the T allele (Figure 6.2).

Conversely, no transcription factor binding sites were predicted at the rs2267951 locus in the presence of the C allele. However, Sp1 was predicted to bind downstream (3’) of both alleles (Figure 6.2).

160

A

B

Figure 6.2: Bioinformatic analysis of the rs2267951 SNP in the VNN1 gene . Shown is the output from the online bioinformatic tool AliBaba 2.1 (Grabe, 2002), which interrogated the VNN1 gene region (red text) surrounding the rs2267951 SNP (labelled) for potential transcription factor binding sites. As shown in Figure 6.2 A, three transcription factor binding sites were predicted within VNN1 in the presence of the T allele of the rs2267951, including CCAAT/enhancer-binding protein alpha (C/EBPalp)(AliBaba score: 23), MIG1 (AliBaba score: 27) and Specificity Protein 1 (Sp1) (AliBaba score: 34). As shown in Figure 6.2 B is the AliBaba 2.1 output which identified a Sp1 binding site (AliBaba score: 34) in the VNN1 gene region (red text) in which the C allele of the rs2267951 is embedded. Note that the T allele introduces two transcription factor binding sites (C/EBPalpa, MIG1) which were not observed in the presence of the C allele.

161

6.4.2 EMSA analysis of the rs2267951 VNN1 SNP To determine whether the rs2267951 SNP was associated with allele-specific DNA- protein binding in vitro , as observed in the positive control (-191 PARL SNP), EMSA was undertaken on sequences that differed by only the single rs2267951 nucleotide. The results shown in Figure 6.3B indicated that the rs2267951 SNP does affect binding of nuclear factors, with a major complex preferentially binding to the T allele in nuclear extract derived from the SAFHS cohort (lane 4). This complex was absent in the presence of the C allele (lane 3). A similar EMSA result was observed at the -191T/C nucleotide (Figure 6.3A).

162

A B

Figure 6.3: EMSA for the rs2267951 VNN1 SNP relative to the -191 PARL SNP . Shown are the results of the ESMA that compares the gel-migratory patterns of rs2267951-specific oligonucleotides (Figure 6.3B on right) in the absence of nuclear extract (lane 1 and 2) and in the presence of SAFHS (cell line ‘0331’) nuclear extract (lane 3 and 4) relative to the positive control at the -191 PARL SNP (Figure 6.3A on left). Figure 6.3 A shows a difference in banding patterns between lanes 3 and 4 which differ in -191 alleles. An additional complex (indicated by arrowhead) is seen the presence of T allele (lane 4) under nuclear extract conditions in contrast to the C allele (lane 3), where no band is present. No complex was observed without the presence of nuclear extract (lane 1 and 2). Figure 6.3 B shows a difference in banding patterns between lanes 3 and 4 which differ in rs2267951 alleles. An additional complex (indicated by arrowhead) is seen in the presence of T allele (lane 4) under nuclear extract conditions, in contrast to the C allele (lane 3), where this band is not present. No complex was observed without nuclear extract (lane 1 and 2).

163

6.5 Discussion In the current study, two genetic variants (-191T/C PARL SNP and rs2267951 VNN1

SNP) were validated for functionality at the molecular level using EMSA and bioinformatic analysis. The purpose of these analyses was to verify the veracity of the

CHA-seq results for the rs2267951 SNP in the VNN1 gene.

EMSA results and bioinformatic analyses where suggestive that the presence of the common -191T SNP introduces a Sp1 transcription factor binding site into the PARL promoter. This binding site creates a transcription domain that could regulate PARL expression when the T allele is present. This is verified by the statistical analysis which indentifies an association of the -191T/C SNP with mitochondrial content levels (Curran et al ., 2009). Taken together, the -191T/C SNP represents a functional genetic variant and has been used in this current study as a positive control for allele-specific nuclear factor binding.

For VNN1 , the EMSA results show the presence of an additional DNA-protein complex in the presence of the T allele at the rs2267951 locus. This result confirms that the rs2267951 variant would be functional in CHA-seq. Bioinformatic analysis suggested a

CCAAT/enhancer-binding protein alpha (C/EBP α) transcription factor binding site in the VNN1 gene in the presence of the T allele. C/EBP α is a transcription factor that is encoded by the CEBPA gene in humans (Dufour et al ., 2009; Szpirer et al ., 1992; Cao et al ., 1991). The C/EBP α protein contains the basic Leucine Zipper Domain (bZIP domain), which permits the binding to target promoters and enhancers (Dufour et al .,

2009; Ellenberger, 1994). For example, C/EBP α has been associated with body weight homeostasis as a transcription enhancer (Miller et al ., 1996). A potential consequence of allele-specific C/EBP α regulation would be increased VNN1 expression in the presence of the T allele. Given the association between body weight and cardiovascular disease 164

(Lavie et al ., 2009; Berrington de Gonzalez et al ., 2010), the rs2267951 SNP may confer allele-specific functional differences in cardiovascular phenotypes.

The results of EMSA provide further experimental evidence of function for the rs2267951 variant, which is supported by the bioinformatic analysis. This analysis revealed an allele-specific C/EBP α binding site in the presence of the T allele in VNN1 .

Therefore, the EMSA and bioinformatic results are suggestive that CHA-seq is a reasonable measure of function for genetic variants.

165

Chapter 7 – General discussion and future research directions

7.1 CHA-seq contribution to functional research Reporter gene and gel shift assays have been used successfully over the last two decades to determine the functionality of genetic variants. However, both approaches possess limitations that affect their utility. For example, these assays lack scalability and do not provide the proper chromatin context and hence are not always representative of function in an in vivo context. In order to overcome limitations associated with current functional assays, two novel in vivo assays (T-CHA-seq and CHA-seq) were developed as part of the current study to provide a reasonable alternative for functional assessment of genetic variants. The CHA-seq approach was developed in order to quantitate differences in chromatin arrangement as a measure of functional differences at genetic variants by extending CHART-PCR into a genome-wide format using next-gen DNA sequencing.

The CHART-PCR method has been refined to enable cell-type and allele-specific comparisons of chromatin accessibility associated with genetic variants (Cruikshank et al ., 2012), however, the assay is not efficiently scalable in the current format.

Traditionally, CHART-PCR consists of the targeted amplification of specific genomic regions of interest. However, this process requires significant time and resources to characterise entire genes or genomes and thus is limited by its ability to analyse short target sequences. To overcome this limitation, next-gen sequencing was incorporated as part of the CHA-seq approach in order to provide a cost- and time-efficient alternative for the quantification of chromatin accessibility at genetic variants across extended genomic regions. The CHA-seq approach has been used successfully to analyse the

Vanin gene family in the current study. Other CHA-seq features of this approach include the ability to: generate a high-resolution DNA sequence map specifying the

166 alleles of genetic variants within a region or genome of interest, identify novel SNPs, map nucleosome positions relative to genetic variants, and to functionally assess genetic variants through interpretation of allele-specific biases in sequence read numbers

(‘allele skew’).

Chromatin accessibility is considered a reliable measure of the functionality of genetic variants (Mikkelson et al ., 2007; Birney et al ., 2010; Pang et al ., 2013) and has been used extensively in the current research to identify and characterise candidate functional genetic variants in the VNN1 gene. Traditionally, chromatin arrangement, as measured by CHART-PCR, is calculated as a ratio of relative abundance of digested to undigested amplicon and expressed as a percentage of cutting within a genomic region. In contrast, chromatin arrangement, as measured by the CHA-seq approach, is calculated as an allele-specific bias in sequence read numbers at genetic variants. At the -137 and -587

VNN1 SNPs, both CHA-seq and CHART-PCR results showed matching allele-specific chromatin arrangement, which is indicative that DNA sequencing is a viable alternative for the quantification of functional genetic variants.

The selection of chromatin-sensitive nuclease is an important feature in the interpretation of CHA-seq results, since nucleases may differ in recognition sites and mode of nucleolytic digestion. While MNase has been used to map nucleosome occupancies and positions (Rizzo et al ., 2012), DNase I is commonly used to map gene promoter regions for chromatin arrangement, potential transcription factor binding sites and areas of hypersensitivity (Boyle et al ., 2011; Drouin et al ., 1997). Consequently, the selection of nuclease for the CHA-seq approach was a means for studying specific functional properties of genetic variants. For example, DNase I was specifically selected in T-CHA-seq to assess chromatin accessibility associated with the allele-specific Sp1

167 binding at the -137 locus (Göring et al ., 2007). As such, the use of multiple nucleases can provide complementary information regarding the chromatin structure in a particular region. This allows a more conclusive interpretation of the specific functional mechanism of genetic variants.

7.2 Contribution to VNN1 research A significant output of the current study was the characterisation of three experimentally verified functional genetic variants, namely rs4897612 (-137), rs2050153 (-587) and rs2267951 in the VNN1 gene. Taken together, the results from CHART-PCR (Chapter 3) and CHA-seq (Chapter 4) showed that the -137 and -587 SNPs within the VNN1 promoter, previously predicted to be functional through extensive statistical interrogation of transcript levels from the SAFHS cohort ( Göring et al ., 2007) , were actually functional due to allele- specific chromatin accessibility. Previously, Göring et al . (2007) used EMSA to show the presence of allele-specific DNA-protein complexes in the presence of the -137T allele that are consistent with the ability to preferentially bind the activating transcription factor Sp1 and the allele-specific chromatin accessibility observed in CHART-PCR (Chapter 3). In contrast, Göring et al . (2007) reported that the -587 alleles displayed no differences in transcription factor binding using EMSA. However, the application of the CHART-PCR and CHA-seq methods showed significant allelic differences in chromatin accessibility surrounding the -587 SNP and consequently indicated the functional potential at this variant in addition to the -137 SNP. As the -587G allele is contained within a CG di-nucleotide and hence is a potential site of cytosine methylation, BISMA analysis of the region surrounding the -587 SNP revealed allele-specific differences in the length of the CpG island predicted in the VNN1 promoter. Assessment of the methylation status of the region surrounding the -

587 SNP, using bisulfite sequencing, showed an allele-specific methylation pattern at a single CpG site (-720 locus) located upstream of the -587 SNP. This indicated that allele- specific chromatin differences can be seen at sites remote from the causal genetic variant(s).

168

Similar results have been previous ly reported in a study by Kerkel et al . (2008) in which three haematopoietic cell samples showed a similar pattern of differential CpG methylation at the -720 site to that in the SAFHS cell lines used in the current study. However, in contrast to the results of the current research, the results of Kerkel et al . (2008) suggest that the -587 genotype does not correlate with the -720 methylation status. Given the strong linkage disequilibrium between the -587 and -137 SNPs in the SAFHS cohort, as reported by Göring et al . (2007) , it is possible that the effects seen at the -720 site are due to allele- specific events at the -137 locus. Potentially, the allele-specific DNA methylation may be caused by allele-specific Sp1 transcription factor binding, resulting in allele-specific chromatin arrangement of an extended promoter region upstream of the VNN1 gene. Such an occurrence could lead to differential gene expression. Allele-specific changes to chromatin arrangement within an extended genomic region have been demonstrated previously by Shoemakers et al . (2010) and Zhang, Y et al . (2009).

Given the nature of the functional effects of the -137 and -587 variants, it is possible that the effects of these variants are independent. However, given the linkage disequilibrium observed between them, it is more likely that the two functional variants are part of a haplotype that regulates genotype-specific VNN1 expression. This interaction may be complicated by the presence of additional functional genetic variants (CHA-seq candidates in Chapter 4) located in regions distinct from the VNN1 gene promoter. For instance, the

CHA-seq results were suggestive that the rs2267951 VNN1 SNP to be functional based on a significant allele-skew, which was subsequently verified using EMSA and bioinformatic analysis supporting the presence of allele-specific C/EBP α binding at this site. Potentially, functional haplotypes containing multiple functional genetic variants may exist in genes other than VNN1 .

169

Given that VNN1 appears to play a significant role in determining levels of ‘protective’

HDL-cholesterol (Göring et al ., 2007; Jacobo-Albavera et al ., 2012), functional genetic variants associated with high levels of VNN1 gene expression, such as the -137 and -587

SNPs, may lead to reduced HDL-C levels which is known to contribute to an increased risk of CVD.

7.3 Future research directions The elucidation of functional genetic variants that alter human phenotypes, such as disease susceptibility, is essential to characterising the relationship between genotype and phenotype (Stranger et al ., 2011; Cooper and Shendure, 2011). Although comprehensive catalogues of human genetic variation exist, the identification of causal functional variants in particular phenotypes or disease has proven difficult (Cooper and

Shendure, 2011). Perhaps the most comprehensive catalogue of functional genetic variants is the ENCODE database which was developed in order to improve the annotation of functional genetic variation in the human genome (Birney et al ., 2007;

Rosenbloom et al ., 2012). While ENCODE predictions are useful in facilitating hypothesis generation and improving the prioritisation of variants, they are insufficient for verifying the biological function and interactions of regulatory variants in the phenotype of different cells and tissues when not supported by other data (Rosenbloom et al ., 2012). For example, a candidate functional variant may be predicted by the

ENCODE database to disrupt a consensus transcription factor binding motif, however, these predictions are limited without experimental validation to verify allele-specific binding and the associated biological consequence. Consequently, functional assays, such as CHA-seq and T-CHA-seq that incorporate prioritisation and experimental validation may represent a more time and cost-efficient alternative to greatly improve the rate of discovery of functional genetic variants in phenotypes of interest. 170

The results of the current study are suggestive that CHA-seq is a time and cost-effective in vivo approach for the identification of functional variation in extended genomic regions. A key recommendation for the application of CHA-seq to cell lines possessing genetic variants of interest is to incorporate one or more chromatin-sensitive nucleases in the CHA-seq approach, followed by the application of T-CHA-seq to interrogate the effects of individual genetic variants. The incorporation of additional experimental validation tests, such as EMSA and bisulfite sequencing, will enhance the positive identification of functional variants by providing further supporting evidence. Further

CHA-seq experiments will ultimately involve the generation of whole genome maps across different populations. The application of CHA-seq to populations comprised of familial groups would enable confirmation that observed allele-specific patterns are heritable. The identification of heritable functional variants, in contrast to cell or individual-specific functional variants, would further establish the veracity of the CHA- seq approach to identify true functional variants.

Previous Vanin research used transcript levels as an endophenotype to identify VNN1 as a quantitative trait locus associated with high density lipoprotein–cholesterol levels and cardiovascular phenotypes (Göring et al ., 2007). More recently, genetic variants within

VNN1 have been found to be associated with cardiovascular disease (Jacobo-Albavera et al ., 2012; Fava et al ., 2010; Zhu and Cooper, 2007). However, limited experimental validation has been performed to determine specific functional variants in VNN1

(Göring et al ., 2007; Kerkel et al ., 2008). This current research, has contributed to the number of experimentally validated functional genetic variants in VNN1 through the application of the novel CHA-seq assay.

171

The research described in this thesis was able to establish that allele-specific chromatin differences can be seen at sites distinct from the actual genetic variant, and suggested the presence of co-operative functional genetic variants within the VNN1 gene. As functional variants may operate in a cell or tissue-dependent manner (Dimas et al .,

2009), the findings of the current study would require further interrogation using different cell types, to elucidate whether the mechanisms of action associated with functional variants are ubiquitous or specific to certain cells.

Of interest in the current study was the presence of a periodicity in densely populated and under-populated regions with respect to SNP-containing CHA-seq reads over extended genomic regions. It is not known whether this periodicity is a direct reflection of large-scale chromatin arrangement across the VNN1 gene or a consequence of library preparation. In either case, this observation requires further testing in order to optimise

CHA-seq for entire genomes.

The importance of characterising genetic variation is becoming apparent due to the increasing number of functional variants associated with a wide array of diseases

(Stranger et al ., 2011). Consequently, the knowledge of an individual’s SNP genotype may provide a basis for assessing susceptibility to disease and the subsequent selection of optimal treatments (Shastry, 2007). Alternatively, these functional genetic variants may facilitate the development of customised genetic therapies. Functional VNN1 variants identified in the present study may represent potential targets for therapeutic intervention, or represent genetic markers for cardiovascular disease susceptibility.

Ultimately, the long term goal in this field of research is to be able to predict disease susceptibility based on genotype and knowledge of which genetic variants are functional (Cooper and Shendure, 2011).

172

References

Abecasis, G.R., Auton, A., Brooks, L.D., DePristo, M.A., Durbin, R.M., Handsaker, R.E., Kang, H.M., Marth, G.T., McVean, G.A., Altshuler, D.M. , et al. (2012). "An integrated map of genetic variation from 1,092 human genomes." Nature 491 (7422): 56–65.

Akira, S., Isshiki, H., Sugita, T., Tanabe, O., Kinoshita, S., Nishio, Y., Nakajima, T., Hirano, T. and Kishimoto, T. (1990). "A nuclear factor for IL-6 expression (NF-IL6) is a member of a C/EBP family." The EMBO Journal 9(6): 1897-1906.

Alam, J. and Cook, J.L. (1990). "Reporter genes: Application to the study of mammalian gene transcription." Analytical Biochemistry 188 (2): 245-254.

Albers, C.A., Paul, D.S., Schulze, H., Freson, K., Stephens, J.C., Smethurst, P.A., Jolley, J.D., Cvejic, A., Kostadima, M., Bertone, P. , et al. (2012). "Compound inheritance of a low-frequency regulatory SNP and a rare null mutation in exon-junction complex subunit RBM8A causes TAR syndrome." Nature Genetics 44 (4): 435-439.

Alkan, C., Coe, B.P. and Eichler, E.E. (2011). "Genome structural variation discovery and genotyping." Nature Reviews Genetics 12 (5): 363-376.

Altshuler, D., Daly, M.J. and Lander, E.S. (2008). "Genetic mapping in human disease." Science 322 (5903): 881–888.

Altshuler, D.M., Lander, E.S., Ambrogio, L., Bloom, T., Cibulskis, K., Fennell, T.J., Gabriel, S.B., Jaffe, D.B., Shefler, E., Sougnez, C.L. , et al. (2010). "A map of human genome variation from population-scale sequencing." Nature 467 (7319): 1061–1073.

Ameur, A., Rada-Iglesias, A., Komorowski, J. and Wadelius, C. (2009). "Identification of candidate regulatory SNPs by combination of transcription-factor-binding site prediction, SNP genotyping and haploChIP." Nucleic Acids Research 37 (12): e85-e85.

An, W., Leuba, S., Van Holde, K. and Zlatanova, J. (1998). "Biochemistry Linker histone protects linker DNA on only one side of the core particle and in a sequence- dependent manner." Proceedings of the National Academy of Sciences 95 (7): 3396- 3401.

Anderson, M.A., Tanzi, R.E., Watkins, P.C., Ottina, K., Wallace, M.R., Sakaguchi, A.Y., Youngh, A.B., Shoulsonh, I. and Bonillah, E. (2004). "A polymorphic DNA marker genetically linked to Huntington's disease." Landmarks in Medical Genetics: Classic Papers with Commentaries 306 (51): 153.

Aniba, M.R., Poch, O. and Thompson, J.D. (2010). "Issues in bioinformatics benchmarking: the case study of multiple sequence alignment." Nucleic Acids Research 38 (21): 7353-7363.

Aurrand-Lions, M., Galland, F., Bazin, H., Zakharyev, V., Imhof, B. and Naquet, P. (1996). "Vanin 1, a novel GPI-linked perivascular molecule involved in thymus homing." Immunity 5(5): 391–405.

173

Auton, A. and McVean, G. (2012). "Estimating Recombination Rates from Genetic Variation in Humans." Evolutionary Genomics: Methods in Molecular Biology 856 : 217-237.

Axel, R. (1975). "Cleavage of DNA in nuclei and chromatin with staphylococcal nuclease." Biochemistry 14 (13): 2921–2925.

Bakir-Gungor, B. and Sezerman, O.U. (2011). "A New Methodology to Associate SNPs with Human Diseases According to Their Pathway Related Context." PLoS ONE 6(10): e26277.

Baral, A., Kumar, P., Halder, R., Mani, P., Yadav, V.K., Singh, A., Das, S.K. and Chowdhury, S. (2012). "Quadruplex-single nucleotide polymorphisms (Quad-SNP) influence gene expression difference among individuals." Nucleic Acids Research 40 (9): 3800–3811.

Baret, J.-C., Beck, Y., Billas-Massobrio, I., Moras, D. and Griffiths, A.D. (2010). "Quantitative Cell-Based Reporter Gene Assays Using Droplet-Based Microfluidics." Chemistry & Biology 17 (5): 528-536.

Barski, A., Cuddapah, S., Cui, K., Roh, T.Y., Schones, D.E., Wang, Z., Wei, G., Chepelev, I. and Zhao, K. (2007). "High-resolution profiling of histone methylations in the human genome." Cell 129 (4): 823-837.

Bartoszewski, R.A., Jablonsky, M., Bartoszewska, S., Stevenson, L., Dai, Q., Kappes, J., Collawn, J.F. and Bebok, Z. (2010). "A synonymous single nucleotide polymorphism in ∆F508 CFTR alters the secondary structure of the mRNA and the expression of the mutant protein." The Journal of Biological Chemistry 285 (37): 28741–28748.

Baudat, F., Buard, J., Grey, C., Fledel-Alon, A., Ober, C., Przeworski, M., Coop, G. and De Massy, B. (2010). "PRDM9 is a major determinant of meiotic recombination hotspots in humans and mice." Science 327 (5967): 836-840.

Bekris, L.M., Millard, S.P., Galloway, N.M., Vuletic, S., Albers, J.J., Li, G., Galasko, D.R., DeCarli, C., Farlow, M.R., Clark, C.M. , et al. (2008). "Multiple SNPs Within and Surrounding the Apolipoprotein E Gene Influence Cerebrospinal Fluid Apolipoprotein E Protein Levels." Journal of Alzheimer's Disease 13 (3): 255–266.

Bell, O., Tiwari, V.K., Thomä, N.H. and Schübeler, D. (2011). "Determinants and dynamics of genome accessibility." Nature Reviews Genetics 12 (8): 554-564.

Benoukraf, T., Wongphayak, S., Hadi, L.H.A., Wu, M. and Soong, R. (2013). "GBSA: a comprehensive software for analysing whole genome bisulfite sequencing data." Nucleic Acids Research 41 (4): e55.

Berrington de Gonzalez, A., Hartge, P., Cerhan, J., Flint, A., Hannan, L., MacInnis, R., Moore, S., Tobias, G., Anton-Culver, H., Freeman, L. , et al. (2010). "Body mass index and mortality among 1.46 million white adults." The New England Journal of Medicine 363 (23): 2211–2219.

Berruyer, C., Pouyet, L., Millet, V., Martin, F., LeGoffic, A., Canonici, A., Garcia, S., Bagnis, C., Naquet, P. and Galland, F. (2006). "Vanin-1 licenses inflammatory mediator 174 production by gut epithelial cells and controls colitis by antagonizing peroxisome proliferator-activated receptor gamma activity." The Journal of experimental medicine 203 (13): 2817–2827.

Bertina, R.M., Koeleman, B.P.C., Koster, T., Rosendaal, F.R., Dirven, R.J., de Ronde, H., van der Velden, P.A. and Reitsma, P.H. (1994). "Mutation in blood coagulation factor V associated with resistance to activated protein C." Nature 369 (6475): 64 - 67.

Bird, A. (2002). "DNA methylation patterns and epigenetic memory." Genes and Development 16 (1): 6-21.

Birney, E., John A. Stamatoyannopoulos, Anindya Dutta, Roderic Guigó, Thomas R. Gingeras, Elliott H. Margulies, Weng, Z. and al., e. (2007). "Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project." Nature 447 (7146): 799-816.

Birney, E., Lieb, J., Furey, T., Crawford, G. and Iyer, V. (2010). "Allele-specific and heritable chromatin signatures in humans." Human Molecular Genetics 19 (2): 204–209.

Bodmer, W. and Bonilla, C. (2008). "Common and rare variants in multifactorial susceptibility to common diseases." Nature Genetics 40 (6): 695–701.

Bonetti, S., Trombetta, M., Linda, B., Dauriz, M., Malerba, G., Trabetti, E., Pignatti, P., Bonora, E. and Bonadonna, R. (2012). "Common variants of monogenic diabetes genes may affect [beta] cell functional mass in patients with newly diagnosed type 2 diabetes." Endocrine Abstracts 29 : OC3.1.

Borecki, I.B. and Province, M.A. (2008). "Genetic and Genomic Discovery Using Family Studies." Circulation 118 (10): 1057-1063.

Botstein, D. and Risch, N. (2003). "Discovering genotypes underlying human phenotypes: past successes for Mendelian disease, future approaches for complex disease." Nature Genetics 33 : 228–237.

Boumber, Y.A., Kondo, Y., Chen, X., Shen, L., Guo, Y., Tellez, C., Estecio, M.R., Ahmed, S. and Issa, J.P. (2008). "An Sp1/Sp3 binding polymorphism confers methylation protection." PLoS Genetics 4(8): e1000162.

Boyes, J. and Bird, A. (1991). "DNA methylation inhibits transcription indirectly via a methyl CpG binding protein." Cell 64 (6): 1123-1134.

Boyle, A., Song, L., Lee, B.-K., London, D., Keefe, D., Birney, E., Iyer, V., Crawford, G. and Furey, T. (2011). "High-resolution genome-wide in vivo footprinting of diverse transcription factors in human cells." Genome Research 21 (3): 456-464.

Browning, S.R. and Browning, B.L. (2013). "Identity-by-descent-based heritability analysis in the Northern Finland Birth Cohort." Human Genetics 132 (2): 129-138.

Bulanenkova, S., Kozlova, A., Kotova, E., Snezhkov, E., Azhikina, T., Akopov, S., Nikolaev, L. and Sverdlov, E. (2011). "Dam methylase accessibility as an instrument for analysis of mammalian chromatin structure." Epigenetics 6(9): 1078 – 1084.

175

Bulik-Sullivan, B.K. and Sullivan, P.F. (2012). "The authorship network of genome- wide association studies." Nature Genetics 44 (2): 113.

Bustin, S.A., Benes, V., Nolan, T. and Pfaffl, M.W. (2005). "Quantitative real-time RT- PCR – a perspective." Journal of Molecular Endocrinology 34 (3): 597–601.

Butter, F., Davison, L., Viturawong, T., Scheibe, M., Vermeulen, M., Todd, J.A. and Mann, M. (2012). "Proteome-Wide Analysis of Disease-Associated SNPs That Show Allele-Specific Transcription Factor Binding." PLoS Genetics 8(9): e1002982.

Callinan, P.A. and Feinberg, A.P. (2006). "The emerging science of epigenomics." Human Molecular Genetics 15 (suppl 1): R95-R101.

Cambien, F. and Tiret, L. (2007). "Genetics of Cardiovascular Diseases From Single Mutations to the Whole Genome." Circulation 116 (15): 1714-1724.

Cantor, R.M., Lange, K. and Sinsheimer, J.S. (2010). "Prioritizing GWAS Results: A Review of Statistical Methods and Recommendations for Their Application." The American Journal of Human Genetics 86 (1): 6–22.

Cao, Z., Umek, R. and McKnight, S. (1991). "Regulated expression of three C/EBP isoforms during adipose conversion of 3T3-L1 cells." Genes and Development 5(9): 1538–1552.

Carey, M., Peterson, C. and Smale, S. (2009). Chromatin Immunoprecipitation (ChIP) . Cold Spring Harbor, New York.

Carey, M., Peterson, C. and Smale, S. (2009). Transcriptional Regulation in Eukaryotes: Concepts, Strategies, and Techniques . Cold Spring Harbor, New York.

Caro, E., Stroud, H., Greenberg, M.V.C., Bernatavichute, Y.V., Feng, S., Groth, M., Vashisht, A.A., Wohlschlegel, J. and Jacobsen, S.E. (2012). "The SET-Domain Protein SUVR5 Mediates H3K9me2 Deposition and Silencing at Stimulus Response Genes in a DNA Methylation–Independent Manner." PLoS Genetics 8(10): e1002995.

Casto, A.M. and Feldman, M.W. (2011). "Genome-Wide Association Study SNPs in the Human Genome Diversity Project Populations: Does Selection Affect Unlinked SNPs with Shared Trait Associations?" PLoS Genetics 7(1): e1001266.

Chavan, S.V., Maitra, A., Roy, N., Patwardhan, S., Chavan, P.R..(2010). “Genetic variants in the distal enhancer region of the PSA gene and their implication in the occurrence of advanced prostate cancer” Molecular Medicine Reports 3(5): 837-843.

Chekmenev, D.S., Haid, C. and Kel, A.E. (2005). "P-Match: transcription factor binding site search by combining patterns and weight matrices." Nucleic Acids Research 33 (suppl 2): W432–W437.

Cheng, K.C.-C. and Inglese, J. (2012). "A coincidence reporter-gene system for high- throughput screening." Nature Methods 9(10): 937-937.

Cho, D.-Y., Kim, Y.-A. and Przytycka, T.M. (2012). "Network Biology Approach to Complex Diseases." PLoS Computational Biology 8(12): e1002820. 176

Chung, H.-R., Dunkel, I., Heise, F., Linke, C., Krobitsch, S., Ehrenhofer-Murray, A.E., Sperling, S.R. and Vingron, M. (2010). "The Effect of Micrococcal Nuclease Digestion on Nucleosome Positioning Data." PLoS ONE 5(12): e15754.

Coetzee, G.A., Jia, L., Frenkel, B., Henderson, B.E., Tanay, A., Haiman, C.A. and Freedman, M.L. (2010). "A systematic approach to understand the functional consequences of non-protein coding risk regions." Cell Cycle 9(2): 47–51.

Cohen, J.C., Kiss, R.S., Pertsemlidis, A., Marcel, Y.L., McPherson, R. and Hobbs, H.H. (2004). "Multiple rare alleles contribute to low plasma levels of HDL cholesterol." Science 305 (5685): 869–872.

Cohen, J.C., Pertsemlidis, A., Fahmi, S., Esmail, S., Vega, G.L., Grundy, S.M. and Hobbs, H.H. (2006). "Multiple rare variants in NPC1L1 associated with reduced sterol absorption and plasma low-density lipoprotein levels." Proceedings of the National Academy of Sciences 103 (6): 1810-1815.

Collas, P. (2010). "The Current State of Chromatin Immunoprecipitation." Molecular Biotechnology 45 (1): 87-100.

Collins, F., Guyer, M. and Charkravarti, A. (1997). "Variations on a theme: cataloging human DNA sequence variation." Science 278 (5343): 1580–1581.

Collins, F.S., Morgan, M. and Patrinos, A. (2003). "The Human Genome Project: lessons from large-scale biology." Science 300 (5617): 286-290.

Conde, L., Bracci, P.M., Richardson, R., Montgomery, S.B. and Skibola, C.F. (2013). "Integrating GWAS and expression data for functional characterization of disease- associated SNPs: an application to follicular lymphoma." The American Journal of Human Genetics 92 (1): 126-130.

Conrad, D.F., Keebler, J.E., DePristo, M.A., Lindsay, S.J., Zhang, Y., Casals, F., Idaghdour, Y., Hartl, C.L., Torroja, C., Garimella, K.V. , et al. (2011). "Variation in genome-wide mutation rates within and between human families." Nature Genetics 43 (7): 712–714.

Cookson, W., Liang, L., Abecasis, G., Moffatt, M. and Lathrop, M. (2009). "Mapping complex disease traits with global gene expression." Nature Reviews Genetics 10 (3): 184–194.

Cooper, G.M. and Shendure, J. (2011). “Needles in stacks of needles: finding disease- causal variants in a wealth of genomic data.” Nature Reviews Genetics 12 (9): 628-640.

Cowie, A., Cheng, J., Sibley, C.D., Fong, Y., Zaheer, R., Patten, C.L., Morton, R.M., Golding, G.B. and Finan, T.M. (2006). "An Integrated Approach to Functional Genomics: Construction of a Novel Reporter Gene Fusion Library for Sinorhizobium meliloti." Applied and Environmental Microbiology 72 (11): 7156–7167.

Crews, K.R., Hicks, J.K., Pui, C.-H., Relling, M.V. and Evans, W.E. (2012). "Pharmacogenomics and Individualized Medicine: Translating Science Into Practice." Clinical Pharmacology & Therapeutics 92 (4): 467–475. 177

Cruickshank, M., Fenwick, E., Abraham, L. and Ulgiati, D. (2008). "Quantitative differences in chromatin accessibility across regulatory regions can be directly compared in distinct cell-types." Biochemical and Biophysical Research Communications 367 (2): 349-355.

Cruickshank, M., Karimi, M., Mason, R., Fenwick, E., Mercer, T., Tsao, B., Boackle, S. and Ulgiati, D. (2012). "Transcriptional effects of a lupus-associated polymorphism in the 5 ′ untranslated region (UTR) of human complement receptor 2 (CR2/CD21)." Molecular Immunology 52 (3): 165–173.

Cruickshank, M.N., Fenwick, E., Karimi, M., Abraham, L.J. and Ulgiati, D. (2009). "Cell- and stage-specific chromatin structure across the Complement receptor 2 (CR2/CD21) promoter coincide with CBF1 and C/EBP-beta binding in B cells." Molecular Immunology 46 (13): 2613-2622.

Cuff, J., Clamp, M., Siddiqui, A., Finlay, M. and Barton, G. (1998). "JPred: a consensus secondary structure prediction server." Bioinformatics 14 (10): 892-893.

Curran, J., Jowett, J., Abraham, L.J, Diepeveen, L.A, Elliott, K., Dyer, T., Kerr- Bayles, L., Johnson, M., Comuzzie, A., Moses, E. , et al. (2009). "Genetic variation in PARL influences mitochondrial content." Human Genetics 127 (2): 183–190.

Dai, Y., Jiang, R. and Dong, J. (2012). "Weighted selective collapsing strategy for detecting rare and common variants in genetic association study." BMC Genetics 13 (1): 7.

Danilition, S., Fredrickson, R., Taylor, C. and Miyamoto, N. (1991). "Transcription factor binding and spacing constraints in the human beta-actin proximal promoter." Nucleic Acids Research 19 (24): 6913-6922.

Decker, T., di Magliano, M.P., McManus, S., Sun, Q., Bonifer, C., Tagoh, H. and Busslinger, M. (2009). "Stepwise Activation of Enhancer and Promoter Regions of the B Cell Commitment Gene Pax5 in Early Lymphopoiesis." Immunity 30 (4): 508-520.

Degner, J.F., Pai, A.A., Pique-Regi, R., Veyrieras, J.-B., Gaffney, D.J., Pickrell, J.K., De Leon, S., Michelini, K., Lewellen, N., Crawford, G.E. , et al. (2012). "DNase I sensitivity QTLs are a major determinant of human expression variation." Nature 482 (7385): 390-394.

Dekker, F. and Haisma, H. (2009). "Histone acetyl transferases as emerging drug targets." Drug Discovery Today 14 (19): 942−948.

Deng, J., Shoemaker, R., Xie, B., Gore, A., LeProust, E.M., Antosiewicz-Bourget, J., Egli, D., Maherali, N., Park, I.-H., Yu, J. , et al. (2009). "Targeted bisulfite sequencing reveals changes in DNA methylation associated with nuclear reprogramming." Nature Biotechnology 27 (4): 353-360.

Dickinson, H.O., Mason, J.M., Nicolson, D.J., Campbell, F., Beyer, F.R., Cook, J.V., Williams, B. and Ford, G.A. (2006). "Lifestyle interventions to reduce raised blood pressure: a systematic review of randomized controlled trials." Journal of Hypertension 24 (2): 215–233. 178

Dimas, A.S., Deutsch, S., Stranger, B.E., Montgomery, S.B., Borel, C., Attar-Cohen, H., Ingle, C., Beazley, C., Arcelus, M.G., Sekowska, M., et al . (2009). “Common Regulatory Variation Impacts Gene Expression in a Cell Type–Dependent Manner.” Science . 325 (5945): 1246-1250.

Diogo, D., Kurreeman, F., Stahl, E.A., Liao, K.P., Gupta, N., Greenberg, J.D., Rivas, M.A., Hickey, B., Flannick, J., Thomson, B. , et al. (2012). "Rare, Low-Frequency, and Common Variants in the Protein-Coding Sequence of Biological Candidate Genes from GWASs Contribute to Risk of Rheumatoid Arthritis." The American Journal of Human Genetics 92 (1): 15-27.

Dong, C., Yoon, W. and Goldschmidt-Clermont, P. (2002). "DNA Methylation and Atherosclerosis." Journal of Nutrition 132 (8): 2406-2409.

Dong, K., Li, B., Qing, Y., Liu, J., Li, C. and Sun, Z. (2005). "Aberrant Methylation in CpG Islands of p15 and p16 Tumor Suppressor Genes in Pancreatic Cancer Tissue." The Chinese-German Journal of Clinical Oncology 4(4): 213-217.

Doris, P.A. (2011). "The Genetics of Blood Pressure and Hypertension: The Role of Rare Variation." Cardiovascular Therapeutics 29 (1): 37–45.

Drouin, R., Angers, M., Dallaire, N., Rose, T.M., Khandjian, E.W. and Rousseau, F. (1997). "Structural and functional characterization of the human FMR1 promoter reveals similarities with the hnRNP-A2 promoter region." Human Molecular Genetics 6(12): 2051–2060.

Duerr, R.H., Taylor, K.D., Brant, S.R., Rioux, J.D., Silverberg, M.S., Daly, M.J., Steinhart, A.H., Abraham, C., Regueiro, M., Griffiths, A. , et al. (2006). "A genome- wide association study identifies IL23R as an inflammatory bowel disease gene." Science 314 (5804): 1461–1463.

Duff, G.W. (2006). "Evidence for genetic variation as a factor in maintaining health." The American Journal of Clinical Nutrition 83 (2): 431S-435S.

Dufour, A., Schneider, F., Metzeler, K.H., Hoster, E., Schneider, S., Zellmeier, E., Benthaus, T., Sauerland, M.-C., Berdel, W.E., Büchner, T. , et al. (2010). "Acute Myeloid Leukemia With Biallelic CEBPA Gene Mutations and Normal Karyotype Represents a Distinct Genetic Entity Associated With a Favorable Clinical Outcome." Journal of Clinical Oncology 28 (4): 570-577.

Edelstein, L., Pan, A. and Collins, T. (2005). "Chromatin Modification and the Endothelial-specific Activation of the E-selectin Gene." The Journal of Biological Chemistry 280 (12): 11192-11202.

Eeles, R.A., Kote-Jarai, Z., Al Olama, A.A., Giles, G.G., Guy, M., Severi, G., Muir, K., Hopper, J.L., Henderson, B.E., Haiman, C.A. , et al. (2009). "Identification of seven new prostate cancer susceptibility loci through a genome-wide association study." Nature Genetics 41 (10): 1116–1121.

Ehret, G.B., Munroe, P.B., Rice, K.M., Bochud, M., Johnson, A.D., Chasman, D.I., Smith, A.V., Psaty, B.M., Abecasis, G.R., Chakravarti, A. , et al. (2011). "Genetic 179 variants in novel pathways influence blood pressure and cardiovascular disease risk." Nature 478 (7367): 103–109.

Eichler, E.E., Nickerson, D.A., Altshuler, D., Bowcock, A.M., Brooks, L.D., Carter, N.P., Church, D.M., Felsenfeld, A., Guyer, M., Lee, C. , et al. (2007). "Completing the map of human genetic variation." Nature 447 (7141): 161–165.

Ellenberger, T. (1994). "Getting a grip in DNA recognition: structures of the basic region leucine zipper, and the basic region helix-loop-helix DNA-binding domains." Current Opinion in Structural Biology 4(1): 12–21.

Erler, J.T. and Linding, R. (2012). "Network Medicine Strikes a Blow against Breast Cancer." Cell 149 (4): 731–733.

Ernst, J., Kheradpour, P., Mikkelsen, T.S., Shoresh, N., Ward, L.D., Epstein, C.B., Zhang, X., Wang, L., Issner, R., Coyne, M. , et al. (2011). "Mapping and analysis of chromatin state dynamics in nine human cell types." Nature 473 (7345): 43–49.

Faber, K., Glatting, K.-H., Mueller, P.J., Risch, A. and Hotz-Wagenblatt, A. (2011). "Genome-wide prediction of splice-modifying SNPs in human genes using a new analysis pipeline called AASsites." BMC Bioinformatics 12 (Suppl 4): S2.

Fallahi-Sichani, M., El-Kebir, M., Marino, S., Kirschner, D. and Linderman, J. (2011). "Multiscale computational modeling reveals a critical role for TNF-α receptor 1 dynamics in tuberculosis granuloma formation." The Journal of Immunology 186 (6): 3472-3483.

Faraggi, E., Zhang, T., Yang, Y., Kurgan, L. and Zhou1, Y. (2012). "SPINE X: Improving protein secondary structure prediction by multistep learning coupled with prediction of solvent accessible surface area and backbone torsion angles." Journal of Computational Chemistry 33 (3): 259–267.

Farré, M., Micheletti, D. and Ruiz-Herrera, A. (2012). "Recombination Rates and Genomic Shuffling in Human and Chimpanzee—A New Twist in the Chromosomal Speciation Theory." Molecular Biology and Evolution 6(6): e20321.

Fava, C., Montagnana, M., Danese, E., Sjogren, M., Hedblad, B. and Melander, O. (2010). "Vanin-1 I26t Polymorphism and Hypertension in Two Large Urban-Based Prospective Studies in Swedes." Journal of Hypertension 28 : e341.

Fernald, G., Capriotti, E., Daneshjou, R., Karczewski, K. and Altman, R. (2011). "Bioinformatics challenges for personalized medicine." Bioinformatics 27 (13): 1741- 1748.

Feuk, L., Carson, A.R. and Scherer, S.W. (2006). "Structural variation in the human genome." Nature Reviews Genetics 7(2): 85–97.

Fledel-Alon, A., Leffler, E.M., Guan, Y., Stephens, M., Coop, G. and Przeworski, M. (2011). "Variation in Human Recombination Rates and Its Genetic Determinants." PLoS ONE 6(6): e20321.

Frazer, K.A., Murray, S.S., Schork, N.J. and Topol, E.J. (2009). "Human genetic 180 variation and its contribution to complex traits." Nature Reviews Genetics 10 (4): 241- 251.

Freedman, M.L., Monteiro, A.N.A., Gayther, S.A., Coetzee, G.A., Risch, A., Plass, C., Casey, G., De Biasi, M., Carlson, C., Duggan, D. , et al. (2011). "Principles for the post- GWAS functional characterization of cancer risk loci." Nature Genetics 43 (6): 513– 518.

Freeman, W., Walker, S. and Vrana, K. (1999). "Quantitative RT-PCR: Pitfalls and Potential." BioTechniques 26 : 112-125.

Frommer, M., McDonald, L., Millar, D., Coiiis, C., Watt, F., Grigg, G., Molloy, P. and Paul, C. (1992). "A genomic sequencing protocol that yields a positive display of 5- methylcytosine residues in individual DNA strands." Proceedings of the National Academy of Sciences 89 (5): 1827-1831.

Fugmann, T., Borgia, B., Révész, C., Godó, M., Forsblom, C., Hamar, P., Holthöfer, H., Neri, D. and Roesli, C. (2011). "Proteomic identification of vanin-1 as a marker of kidney damage in a rat model of type 1 diabetic nephropathy." Kidney International 80 (3): 272-281.

Fürbass, R., Selimyan, R. and Vanselow, J. (2008). "DNA methylation and chromatin accessibility of the proximal Cyp19 promoter region 1.5/2 correlate with expression levels in sheep placentomes." Journal of Molecular Reproduction and Development 75 (1): 1–7.

Gaffney, D.J., McVicker, G., Pai, A.A., Fondufe-Mittendorf, Y.N., Lewellen, N., Michelini, K., Widom, J., Gilad, Y. and Pritchard, J.K. (2012). "Controls of nucleosome positioning in the human genome." PLoS Genetics 8(11): e1003036.

Galland, F., Malergue, F., Bazin, H., Mattei, M., Aurrand-Lions, M., Theillet, C. and Naquet, P. (1998). "Two human genes related to murine Vanin-1 are located on the long arm of human chromosome 6." Genomics 53 (2): 203-213.

Garner, M. and Revzin, A. (1981). "A gel electrophoresis method for quantifying the binding of proteins to specific DNA regions: application to components of the Escherichia coli lactose operon regulatory system." Nucleic Acids Research 9(13): 3047-3060.

Ge, B., lok, D.K., Kwan, T., Grundberg, E., Morcos, L., Verlaan, D.J., Le, J., Koka, V., Lam, K.C., Gagné, V. , et al. (2009). "Global patterns of cis variation in human cells revealed by high-density allelic expression analysis." Nature Genetics 41 (11): 1216– 1222.

Genin, E., Hannequin, D., Wallon, D., Sleegers, K., Hiltunen, M., Combarros, O., Bullido, M.J., Engelborghs, S., De Deyn, P., Berr, C. , et al. (2011). "APOE and Alzheimer disease: a major gene with semi-dominant inheritance." Molecular Psychiatry 16 (9): 903–907.

Gertz, J., Varley, K., Reddy, T., Bowling, K., Pauli, F., Parker, S., Kucera, K., Willard, H. and Myers, R. (2011). "Analysis of DNA Methylation in a Three-Generation Family Reveals Widespread Genetic Influence on Epigenetic Regulation." PLoS Genetics 7(8): 181 e1002228.

Gibson, G. (2012). "Rare and common variants: twenty arguments." Nature Reviews Genetics 13 (2): 135-145.

Giresi, P.G., Kim, J., McDaniell, R.M., Iyer, V.R. and Lieb, J.D. (2007) "FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active regulatory elements from human chromatin." Genome Research 17 (6): 877-885.

Girirajan, S., Brkanac, Z., Coe, B.P., Baker, C., Vives, L., Vu, T.H., Shafer, N., Bernier, R., Ferrero, G.B., Silengo, M. , et al. (2011). "Relative Burden of Large CNVs on a Range of Neurodevelopmental Phenotypes." PLoS Genetics 7(11): e1002334.

Git, A., Dvinge, H., Salmon-Divon, M., Osborne, M., Kutter, C., Hadfield, J., Bertone, P. and Caldas, C. (2010). "Systematic comparison of microarray profiling, real-time PCR, and next-generation sequencing technologies for measuring differential microRNA expression." RNA 16 (5): 991-1006.

Glait-Santar, C. and Benayahu, D. (2011). "SVEP1 promoter regulation by methylation of CpG sites." Gene 490 (1): 6–14.

Glazier, A.M., Nadeau, J.H. and Aitman, T.J. (2002). "Finding genes that underlie complex traits." Science 298 (5602): 2345-2349.

Goh, K.I., Cusick, M.E., Valle, D., Childs, B., Vidal, M. and Barabasi, A.L. (2007). "The human disease network." Proceedings of the National Academy of Sciences 104 (21): 8685–8690.

Goldgar, D.E., Healey, S., Dowty, J.G., Da Silva, L., Chen, X., Spurdle, A.B., Terry, M.B., Daly, M.J., Buys, S.M., Southey, M.C. , et al. (2011). "Rare variants in the ATM gene and risk of breast cancer." Breast Cancer Research 13 (4): R73-R73.

Goldstein, J.L., Hobbs, H.H. and Brown, M.S. (2001). Familial hypercholesterolemia . McGraw-Hill Book Co., New York.

Goll, M.G. and Bestor, T.H. (2005). "Eukaryotic cytosine methyltransferases." Annual Review of Biochemistry 74 : 481–514.

Goriely, S., Van Lint, C., Dadkhah, R., Libin, M., De Wit, D., Demonté, D., Willems, F. and Goldman, M. (2004). "A defect in nucleosome remodeling prevents IL-12 (p35) gene transcription in neonatal dendritic cells." The Journal of Experimental Medicine 199 (7): 1011-1016.

Göring, H., Curran, J., Johnson, M., Dyer, T., Charlesworth, J., Cole, S., Jowett, J., Abraham, L., Rainwater, D., Comuzzie, A. , et al. (2007). "Discovery of expression QTLs using large-scale transcriptional profiling in human lymphocytes." Nature Genetics 39 (10): 1208 - 1216.

Gottesman, O., Drill, E., Lotay, V., Bottinger, E. and Peter, I. (2012). "Can Genetic Pleiotropy Replicate Common Clinical Constellations of Cardiovascular Disease and Risk?" PLoS ONE 7(9): e46419.

182

Goyette, P., Lefebvre, C., Ng, A., Brant, S.R., Cho, J.H., Duerr, R.H., Silverberg, M.S., Taylor, K.D., Latiano, A., Aumais, G. , et al. (2008). "Gene-centric association mapping of chromosome 3p implicates MST1 in IBD pathogenesis." Mucosal Immunology 1(2): 131–138.

Grabe, N. (2002). "AliBaba2: Context specific identification of transcription factor binding sites." In Silico Biology 2(1): 1-15.

Grant, S.F. and Hakonarson, H. (2009). "Genome-wide association studies in type 1 diabetes." Current Diabetes Reports 9(2): 157-163.

Grayling, R., Bailey, K. and Reeve, J. (1997). "DNA binding and nuclease protection by the HMf histones from the hyperthermophilic archaeon Methanothermus fervidus." Extremophiles 1(2): 79-88.

Grunenfelder, B. and Winzler, E.A. (2002). "Treasures and traps in genome-wide data sets: case examples from yeast." Nature Reviews Genetics 3(9): 653-661.

Guo, Y. and Jamison, D. (2005). "The distribution of SNPs in human gene regulatory regions." BMC Genomics 6(1): 140.

Hager, G., Archer, T., Fragoso, G., Bresnick, E., Tsukagoshi, Y., John, S. and Smith, C. (1993). "Influence of Chromatin Structure on the Binding of Transcription Factors to DNA." Cold Spring Harbor Symposia on Quantitative Biology 58 : 63-71.

Hager, G., McNally, J. and Misteli, T. (2009). "Transcription dynamics." Molecular Cell 35 (6): 741-753.

Hakonarson, H., Grant, S.F., Bradfield, J.P., Marchand, L., Kim, C.E., Glessner, J.T., Grabs, R., Casalunovo, T., Taback, S.P., Frackelton, E.C. , et al. (2007). "A genome- wide association study identifies KIAA0350 as a type 1 diabetes gene." Nature 448 (7153): 591-594.

Haslam, D.W. and James, W.P. (2005). "Obesity." Lancet 366 (9492): 1197–209.

Hawkins, R.D., Hon, G.C. and Ren, B. (2010). "Next-generation genomics: an integrative approach." Nature Reviews Genetics 11 (7): 476-486.

He, F.J. and MacGregor, G.A. (2009). "A comprehensive review on salt and health and current experience of worldwide salt reduction programmes." Journal of Human Hypertension 23 (6): 363–384.

He, M., Guo, H., Yang, X., Zhang, X., Zhou, L., Cheng, L., Zeng, H., Hu, F.B., Tanguay, R.M. and Wu, T. (2009). "Functional SNPs in HSPA1A Gene Predict Risk of Coronary Heart Disease." PLoS ONE 4(3): e4851.

Heins, J., Suriano, J., Taniuchi, H. and Anfinsen, C. (1967). "Characterization of a nuclease produced by Staphylococcus aureus." The Journal of Biological Chemistry 242 (5): 1016-1020.

Hellman, L.M. and Fried, M.G. (2007). "Electrophoretic Mobility Shift Assay (EMSA) for Detecting Protein-Nucleic Acid Interactions." Nature Protocols 2(8): 1849–1861. 183

Hirschhorn, J.N. (2009). "Genomewide association studies—illuminating biologic pathways." The New England Journal of Medicine 360 (17): 1699-1701.

Hodges, E., Smith, A.D., Kendall, J., Xuan, Z., Ravi, K., Rooks, M., Zhang, M.Q., Ye, K., Bhattacharjee, A., Brizuela, L. , et al. (2009). "High definition profiling of mammalian DNA methylation by array capture and single molecule bisulfite sequencing." Genome Research 19 (9): 1593-1605.

Hoffmeyer, S., Burk, O., von Richter, O., Arnold, H., Brockmöller, J., Johne, A., Cascorbi, I., Gerloff, T., Roots, I., Eichelbaum, M. , et al. (2000). "Functional polymorphisms of the human multidrug-resistance gene: Multiple sequence variations and correlation of one allele with P-glycoprotein expression and activity in vivo." Proceedings of the National Academy of Sciences 97 (7): 3473-3478.

Holden, N. and Tacon, C. (2011). "Principles and problems of the electrophoretic mobility shift assay." Journal of Pharmacological and Toxicological Methods 63 (1): 7- 14.

Holliday, R. and Ho, T. (1991). "Gene silencing in mammalian cells by uptake of 5- methyl deoxycytidine-5'-triphosphate." Somatic Cell and Molecular Genetics 17 (6): 537-542.

Hon, G., Wang, W. and Ren, B. (2009). "Discovery and Annotation of Functional Chromatin Signatures in the Human Genome." PLoS Computational Biology 5(11): e1000566.

Hu, C., Bart Williams, D., Zhaorigetu, S., Khalil, S., Wan, G. and Valle, D. (2008). "Functional genomics and SNP analysis of human genes encoding proline metabolic enzymes." Amino Acids 35 (4): 655-664.

Hu, X. and Daly, M. (2012). "What have we learned from six years of GWAS in autoimmune diseases, and what is next?" Current Opinion in Immunology 24 (5): 571– 575.

Huang, D.W., Sherman, B.T. and Lempicki, R.A. (2009). "Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists." Nucleic Acids Research 37 (1): 1–13.

Huang, Y., Yang, H., Borg, B., Su, X., Rhodes, S., Yang, K., Tong, X., Tang, G., Howell, C., Rosen, H. , et al. (2007). "A functional SNP of interferon-γ gene is important for interferon-α-induced and spontaneous recovery from hepatitis C virus infection." Proceedings of the National Academy of Sciences 104 (3): 985-990.

Hultman, K., Tjarnlund-Wolf, A., Odeberg, J., Eriksson, P. and Jern, C. (2010). "Allele- specific transcription of the PAI-1 gene in human astrocytes." Thrombosis & Haemostasis 104 (5): 998-998.

Hunt, K., Zhernakova, A., Turner, G., Heap, G., Franke, L., Bruinenberg, M., Romanos, J., Dinesen, L., Ryan, A., Panesar, D. , et al. (2008). "Newly identified genetic risk variants for celiac disease related to the immune response." Nature Genetics 40 (4): 395- 402. 184

Hussin, J., Roy-Gagnon, M.-H., Gendron, R., Andelfinger, G. and Awadalla, P. (2011). "Age-Dependent Recombination Rates in Human Pedigrees." PLoS Genetics 7(9): e1002251.

Huttlin, E.L., Hegeman, A.D., Harms, A.C. and Sussman, M.R. (2007). "Prediction of error associated with false-positive rate determination for peptide identification in large- scale proteomics experiments using a combined reverse and forward peptide sequence database strategy." Journal of Proteome Research 6(1): 392-398.

Ideker, T. and Sharan, R. (2008). "Protein networks in disease." Genome Research 18 (4): 644-652.

Ikeda, R., Nishida, T., Watanabe, F., Shimizu-Saito, K., Asahina, K., Horikawa, S. and Teraoka, H. (2008). "Involvement of CCAAT/enhancer binding protein-beta (C/EBPbeta) in epigenetic regulation of mouse methionine adenosyltransferase 1A gene expression." The International Journal of Biochemistry & Cell Biology 40 (9): 1956- 1969.

Illumina. (2009). "Genomic Sequencing." Retrieved 20/04/2013, from http://www.illumina.com/Documents/products/datasheets/datasheet_genomic_sequence. pdf.

Jacobo-Albavera, L., Aguayo-de la Rosa, P.I., Villarreal-Molina, T., Villamil-Ramírez, H., León-Mimila, P., Romero-Hidalgo, S., López-Contreras, B.E., Sánchez-Muñoz, F., Bojalil, R., González-Barrios, J.A. , et al. (2012). " VNN1 Gene Expression Levels and the G-137T Polymorphism Are Associated with HDL-C Levels in Mexican Prepubertal Children." PLoS ONE 7(11): e49818.

Jakobsdottir, J., Gorin, M.B., Conley, Y.P., Ferrell, R.E. and Weeks, D.E. (2009). "Interpretation of Genetic Association Studies: Markers with Replicated Highly Significant Odds Ratios May Be Poor Classifiers." PLoS Genetics 5(2): e1000337.

Jansen, P., Kamsteeg, M., Rodijk-Olthuis, D., van Vlijmen-Willems, I., de Jongh, G., Bergers, M., Tjabringa, G., Zeeuwen, P. and Schalkwijk, J. (2009). "Expression of the vanin gene family in normal and inflamed human skin: induction by proinflammatory cytokines." Journal of Investigative Dermatology 129 (9): 2167–2174.

Jeffreys, A.J., Ritchie, A. and Neumann, R. (2000). "High resolution analysis of haplotype diversity and meiotic crossover in the human TAP2 recombination hotspot." Human Molecular Genetics 9(5): 725-733.

Jenkins, P.A., Song, Y.S. and Brem, R.B. (2012). "Genealogy-Based Methods for Inference of Historical Recombination and Gene Flow and Their Application in Saccharomyces cerevisiae." PLoS ONE 7(11): e46947.

Ji, W.Z., Foo, J.N., O’Roak, B.J., Zhao, H., Larson, M.G., Simon, D.B., Newton-Cheh, C., State, M.W., Levy, D. and Lifton, R.P. (2008). "Rare independent mutations in renal salt handling genes contribute to blood pressure variation." Nature Genetics 40 (5): 592- 599.

Jian, Y., Manolio, T.A., Pasquale, L.R., Boerwinkle, E., Caporaso, N., Cunningham, 185

J.M., de Andrade, M., Feenstra, B., Feingold, E., Hayes, M.G. , et al. (2011). "Genome partitioning of genetic variation for complex traits using common SNPs." Nature Genetics 43 (6): 519-525.

Jiang, L., Yao, M., Shi, J., Shen, P., Niu, G. and Fei, J. (2008). "Yin yang 1 directly regulates the transcription of RE-1 silencing transcription factor." Journal of Neuroscience Research 86 (6): 1209–1216.

Jiang, Y., Wang, L., Gong, W., Wei, D., Le, X., Yao, J., Ajani, J., Abbruzzese, J., Huang, S. and Xie, K. (2005). "A high expression level of insulin-like growth factor I receptor is associated with increased expression of transcription factor Sp1 and regional lymph node metastasis of human gastric cancer." Clinical and Experimental Metastasis 21 (8): 755-764.

John, S., Sabo, P.J., R.E., T., Sung, M.H., Biddie, S.C., Johnson, T.A., Hager, G.L. and Stamatoyannopoulos, J.A. (2011). "Chromatin accessibility pre-determines glucocorticoid receptor binding patterns." Nature Genetics 43 (3): 264–268.

Johnson, A.D. (2009). "Single-Nucleotide Polymorphism Bioinformatics. A Comprehensive Review of Resources." Circulation: Cardiovascular Genetics 2(5): 530- 536.

Johnson, A.R., Lao, S., Wang, T., Galanko, J.A. and Zeisel, S.H. (2012). "Choline Dehydrogenase Polymorphism rs12676 Is a Functional Variation and Is Associated with Changes in Human Sperm Cell Function." PLoS ONE 7(4): e36047.

Johnson, D.S., Mortazavi, A., Myers, R.M. and Wold, B. (2007). "Genome-wide mapping of in vivo protein-DNA interactions." Science 316 (5830): 1497–1502.

Jones, T.I., Chen, J.C., Rahimov, F., Homma, S., Arashiro, P., Beermann, M.L., King, O.D., Miller, J.B., Kunkel, L.M., Emerson Jr, C.P. , et al. (2012). "Facioscapulohumeral muscular dystrophy family studies of DUX4 expression: evidence for disease modifiers and a quantitative model of pathogenesis." Human Molecular Genetics 21 (20): 4419- 4430.

Jostins, L. and Barrett, J.C. (2011). "Genetic risk prediction in complex disease." Human Molecular Genetics 20 (R2): R182-R188.

Kagami, M., O'Sullivan, M.J., Green, A.J., Watabe, Y., Arisaka, O., Masawa, N., Matsuoka, K., Fukami, M., Matsubara, K., Kato, F. , et al. (2010). "The IG-DMR and the MEG3-DMR at Human Chromosome 14q32.2: Hierarchical Interaction and Distinct Functional Properties as Imprinting Control Centers." PLoS Genetics 6(6): e1000992.

Kann, M.G. (2010). "Advances in translational bioinformatics: computational approaches for the hunting of disease genes." Briefings in bioinformatics 11 (1): 96–110.

Karimi, M. (2007). Functional analysis of the -308G/A polymorphism in the tumour necrosis factor promoter. School of Biomedical, Biomolecular and Chemical Sciences. Perth, The University of Western Australia. PhD Thesis .

Karimi, M., LC, G., Cruickshank, M., Moses, E. and Abraham, L. (2009). "A critical assessment of the factors affecting reporter gene assays for promoter SNP function: a 186 reassessment of −308 TNF polymorphism function using a novel integrated reporter system." European Journal of Human Genetics 17 (11): 1454–1462.

Kaskow, B., Proffit, J., Blangero, J., Moses, E. and Abraham, L. (2012). "Diverse biological activities of the vascular non-inflammatory molecules – The Vanin pantetheinases." Biochemical and Biophysical Research Communications 417 (2): 653– 658.

Kassim, S.H., Wilson, J.M. and Rader, D.J. (2010). "Gene therapy for dyslipidemia: a review of gene replacement and gene inhibition strategies." Journal of Clinical Lipidology 5(6): 793–809.

Kathiresan, S. and Srivastava, D. (2012). "Genetics of Human Cardiovascular Disease." Cell 148 (6): 1242–1257.

Kazani, S., Wechsler, M.E. and Israel, E. (2010). "The role of pharmacogenomics in improving the management of asthma." Journal of Allergy and Clinical Immunology 125 (2): 295–302.

Keenan, B.T., Shulman, J.M., Chibnik, L.B., Raj, T., Tran, D., Sabuncu, M.R., Allen, A.N., Corneveaux, J.J., Hardy, J.A., Huentelman, M.J. , et al. (2012). "A coding variant in CR1 interacts with APOE-4 to influence cognitive decline." Human Molecular Genetics 21 (10): 2377-2388.

Kent, N.A. and Mellor, J. (1995). "Chromatin structure snap-shots: Rapid nuclease digestion of chromatin in yeast." Nucleic Acids Research 23 (18): 3786–3787.

Kerkel, K., Spadola, A., Yuan, E., Kosek, J., Jiang, L., Hod, E., Li, K., Murty, V., Schupf, N., Vilain, E. , et al. (2008). "Genomic surveys by methylation-sensitive SNP analysis identify sequence-dependent allele-specific DNA methylation." Nature Genetics 40 (7): 904-908.

Kim, Y.-A., Wuchty, S. and Przytycka, T.M. (2011). " Identifying Causal Genes and Dysregulated Pathways in Complex Diseases." PLoS Computational Biology 7(3): e1001095.

Kingsley, C.B. (2011). "Identification of causal sequence variants of disease in the next generation sequencing era." Disease Gene Identification 700 : 37-46.

Kleinjan, D.A. and van Heyningen, V. (2005). “Long-Range Control of Gene Expression: Emerging Mechanisms and Disruption in Disease.” The American Journal of Human Genetics 76 (1): 8-32.

Knight, J. (2005). "Regulatory polymorphisms underlying complex disease traits." Journal of Molecular Medicine 83 (2): 97-109.

Charles, K.J. (2005). "HaploChIP: an in vivo assay." Methods in Molecular Biology 311 : 49-60.

Knowles, M.R. and Drumm, M. (2012). "The Influence of Genetics on Cystic Fibrosis Phenotypes." Cold Spring Harbor Perspectives in Medicine 2(12): a009548.

187

Kodama, K., Horikoshic, M., Toda, K., Yamada, S., Hara, K., Irie, J., Sirota, M., Morgan, A.A., Chen, R., Ohtsu, H. , et al. (2012). "Expression-based genome-wide association study links the receptor CD44 in adipose tissue with type 2 diabetes." Proceedings of the National Academy of Sciences 109 (18): 7049–7054.

Kong, A., Frigge, M.L., Masson, G., Besenbacher, S., Sulem, P., Magnusson, G., Gudjonsson, S.A., Sigurdsson, A., Jonasdottir, A., Jonasdottir, A. , et al. (2012). "Rate of de novo mutations and the importance of father's age to disease risk." Nature 488 (7412): 471-475.

Kong, A., Gudbjartsson, D.F., Sainz, J., Jonsdottir, G.M., Gudjonsson, S.A., Richardsson, B., Sigurdardottir, S., Barnard, J., Hallbeck, B., Masson, G. , et al. (2002). "A high-resolution recombination map of the human genome." Nature Genetics 31 (3): 241-247.

Kuhlmann, I., Minihane, A.M., Huebbe, P., Nebel, A. and Rimbach, G. (2010). "Apolipoprotein E genotype and hepatitis C, HIV and herpes simplex disease risk: a literature review." Lipids in Health and Disease 9(1): 8.

Kurreeman, F.A.S., Padyukov, L., Marques, R.B., Schrodi, S.J., Seddighzadeh, M.S.-R., G, van der Helm-van Mil, A.H.M., Allaart, C.F., Verduyn, W., Houwing-Duistermaat, J., Alfredsson, L. , et al. (2007). "A Candidate Gene Approach Identifies the TRAF1/C5 Region as a Risk Factor for Rheumatoid Arthritis." PLoS Medicine 4(9): e278.

Ladouceur, M., Dastani, Z., Aulchenko, Y.S., Greenwood, C.M.T. and Richards, J.B. (2012). "The Empirical Power of Rare Variant Association Methods: Results from Sanger Sequencing in 1,998 Individuals." PLoS Genetics 8(2): e1002496.

Landa, I., Ruiz-Llorente, S., Montero-Conde, C., Inglada-Pérez, L., Schiavi, F., Leskelä, S., Pita, G., Milne, R., Maravall, J., Ramos, I. , et al. (2009). "The Variant rs1867277 in FOXE1 Gene Confers Thyroid Cancer Susceptibility through the Recruitment of USF1/USF2 Transcription Factors." PLoS Genetics 5(9): e1000637.

Lantermann, A.B., Straub, T., Stralfors, A., Yuan, G.C., Ekwall, K. and Korber, P. (2010). "Schizosaccharomyces pombe genome-wide nucleosome mapping reveals positioning mechanisms distinct from those of Saccharomyces cerevisiae." Nature Structural & Molecular Biology 17 (2): 251–257.

Lattka, E., Eggers, S., Moeller, G., Heim, K., Weber, M., Mehta, D., Prokisch, H., Illig, T. and Adamski, J. (2010). "A common FADS2 promoter polymorphism increases promoter activity and facilitates binding of transcription factor ELK1." Journal of Lipid Research 51 (1): 182-191.

Lavie, C., Milani, R. and Ventura, H. (2009). "Obesity and Cardiovascular Disease Risk Factor, Paradox, and Impact of Weight Loss." Journal of the American College of Cardiology 53 : 1925-1932.

Lee, C. and Scherer, S.W. (2010). "The clinical context of copy number variation in the human genome." Expert Reviews in Molecular Medicine 12 : e8.

Lee, P. and Shatkay, H. (2008). "F-SNP: computationally predicted functional SNPs for 188 disease association studies." Nucleic Acids Research 36 (suppl 1): D820-D824.

Lee, P.H. and Shatkay, H. (2009). "An integrative scoring system for ranking SNPs by their potential deleterious effects." Bioinformatics 25 (8): 1048-1055.

Leong, P., Andrews, G., Johnson, D., Dyer, K., Xi, S., Mai, J., Robbins, P., Gadiparthi, S., Burke, N., Watkins, S. , et al. (2003). "Targeted inhibition of Stat3 with a decoy oligonucleotide abrogates head and neck cancer cell growth." Proceedings of the National Academy of Sciences 100 (7): 4138-4143.

Li, J., Gao, E., Seidner, S. and Mendelson, C. (1998). "Differential regulation of baboon SP-A1 and SP-A2 genes: structural and functional analysis of 5’-flanking DNA." American Journal of Physiology 275 (6): L1078-L1088.

Li, M., Mo, Y., Luo, X.J., Xiao, X., Shi, L., Peng, Y.M., Qi, X.B., Liu, X.Y., Yin, L.D., Diao, H.B. , et al. (2011). "Genetic association and identification of a functional SNP at GSK3 β for schizophrenia susceptibility." Schizophrenia Research 133 (1-3): 165-171.

Li, Y and Tollefsbol, T.O. (2011). "DNA methylation detection: Bisulfite genomic sequencing analysis." Epigenetics Protocols . Humana Press: p11-21

Li, Y.H., Liao, W., Cargill, M., Chang, M., Matsunami, N., Feng, B.J., Poon, A., Callis- Duffin, K.P., Catanese, J.J., Bowcock, A.M. , et al. (2010). "Carriers of Rare Missense Variants in IFIH1 Are Protected from Psoriasis." Journal of Investigative Dermatology 130 (12): 2768-2772.

Li, Z., Gadue, P., Chen, K., Jiao, Y., Tuteja, G., Schug, J., Li, W. and Kaestner, K.H. (2012). "Foxa2 and H2A.Z mediate nucleosome depletion during embryonic stem cell differentiation." Cell 151 (7): 1608–1616.

Li, Z., Schug, J., Tuteja, G., White, P. and Kaestner, K.H. (2011). "The nucleosome map of the mammalian liver." Nature structural & molecular biology 18 (6): 742-746.

Lifton, R.P., Gharavi, A.G. and Geller, D.S. (2001). "Molecular mechanisms of human hypertension." Cell 104 (4): 545-556.

Lin, K., Simossis, V., Taylor, W. and Heringa, J. (2005). "A simple and fast secondary structure prediction method using hidden neural networks." Bioinformatics 21 (2): 152- 159.

Liu, A.M.F., New, D.C., Lo, R.K.H. and Wong, Y.H. (2009). Reporter Gene Assays. Cell-Based Assays for High-Throughput Screening, Humana Press. 486: 109-123.

Liu, C., Yang, Q., Hwang, S.-J., Sun, F., Johnson, A.D., Shirihai, O.S., Vasan, R.S., Levy, D. and Schwartz, F. (2012). "Association of Genetic Variation in the Mitochondrial Genome With Blood Pressure and Metabolic Traits." Hypertension 60 (4): 949-956.

Liu, S. and Song, Y. (2010). "Building Genetic Scores to Predict Risk of Complex Diseases in Humans: Is It Possible?" Diabetes 59 (11): 2729-2731.

Liu, Z., Zhan, Y., Bai, Y. and Sun, J. (2012). "AND logic gate application based on 189 colorimetric screening of enzyme activity." Solid State Sciences 14 (7): 870–873.

Love-Gregory, L., Sherva, R., Schappe, T., Qi, J.-S., McCrea, J., Klein, S., Connelly, M.A. and Abumrad, N.A. (2011). "Common CD36 SNPs reduce protein expression and may contribute to a protective atherogenic profile." Human Molecular Genetics 20 (1): 193-201.

Low, S.-K., Takahashi, A., Cha, P.-C., Zembutsu, H., Kamatani, N., Kubo, M. and Nakamura, Y. (2012). "Genome-wide association study for intracranial aneurysm in the Japanese population identifies three candidate susceptible loci and a functional genetic variant at EDNRA." Human Molecular Genetics 21 (9): 2102-2110.

Lynch, M. (2010). "Rate, molecular spectrum, and consequences of human mutation." Proceedings of the National Academy of Sciences 107 (3): 961-968.

Macintyre, G., Bailey, J., Haviv, I. and Kowalczyk, A. (2010). "is-rSNP: a novel technique for in silico regulatory SNP detection." Bioinformatics 26 (18): i524-i530.

Mackay, T.F.C., Stone, E.A. and Ayroles, J.F. (2009). "The genetics of quantitative traits: challenges and prospects." Nature Reviews Genetics 10 (8): 565-577.

Majewski, J. and Pastinen, T. (2011). "The study of eQTL variations by RNA-seq: from SNPs to phenotypes." Trends in Genetics 27 (2): 72–79.

Makar, K., Ulgiati, D., Hagman, J. and Holers, V. (2001). "A site in the complement receptor 2 (CR2/CD21) silencer is necessary for lineage specific transcriptional regulation." International Immunology 13 (5): 657-664.

Mannherz, H., Peitsch, M., Zanotti, S., Paddenberg, R. and Polzar, B. (1995). "A new function for an old enzyme: the role of DNase I in apoptosis." Current Topics in Microbiology and Immunology 198 : 161-174.

Manolio, T.A. (2010). "Genomewide association studies and assessment of the risk of disease." The New England Journal of Medicine 363 (2): 166–176.

Maras, B., Barra, D., Duprè, S. and Pitari, G. (1999). "Is pantetheinase the actual identity of mouse and human vanin-1 proteins?" Journal of Clinical Investigation 461 (3): 149-152.

Mardis, E.R. and Wilson, R.K. (2009). "Cancer genome sequencing: a review." Human Molecular Genetics 18 (R2): R163-R168.

Marian, A.J. (2010). "Hypertrophic cardiomyopathy: from genetics to treatment." European Journal of Clinical Investigation 40 (4): 360–369.

Marian, A.J. and Belmont, J. (2011). "Strategic Approaches to Unraveling Genetic Causes of Cardiovascular Diseases." Circulation Research 108 (10): 1252-1269.

Martin, A.C. and Drubin, D.G. (2003). "Impact of genome-wide functional analyses on cell biology research." Current Opinion in Cell Biology 15 (1): 6-13.

Martin, F., Malergue, F., Pitari, G., Philippe, J., Philips, S., Chabret, C., Granjeaud, S., 190

Mattei, M., Mungall, A., Naquet, P. , et al. (2001). "Vanin genes are clustered (human 6q22-24 and mouse 10A2B1) and encode isoforms of pantetheinase ectoenzymes." Immunogenetics 53 (4): 296-306.

Martin, F., Penet, M.-F., Malergue, F., Lepidi, H., Dessein, A., Galland, F., de Reggi, M., Naquet, P. and Gharib, B. (2004). "Vanin-1(-/-) mice show decreased NSAID- and Schistosoma-induced intestinal inflammation associated with higher glutathione stores." Journal of Clinical Investigation 113 (4): 591-597.

Marzec, J., Christie, J., Reddy, S., Jedlicka, A., Vuong, H., Lanken, P., Aplenc, R., Yamamoto, T., Yamamoto, M., Cho, H.-T. , et al. (2007). "Functional polymorphisms in the transcription factor NRF2 in humans increase the risk of acute lung injury." The FASEB Journal 21 (9): 2237-2246.

Mathew, C.G. (2008). "New links to the pathogenesis of Crohn disease provided by genome-wide association scans." Nature Reviews Genetics 9(1): 9-14.

Mattocks, C.J., Morris, M.A., Matthijs, G., Swinnen, E., Corveleyn, A., Dequeker, E., Müller, C.R., Pratt, V. and Wallace, A. (2010). "A standardized framework for the validation and verification of clinical molecular genetic tests." European Journal of Human Genetics 18 (12): 1276–1288.

Matys, V., Kel-Margoulis, O., Fricke, E., Liebich, I., Land, S., Barre-Dirrie, A., Reuter, I., Chekmenev, D., Krull, M., Hornischer, K. , et al. (2006). "TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes." Nucleic Acids Research 34 (suppl 1): D108-D110.

Maynard, T.M., Haskell, G.T., Lieberman, J.A. and LaMantia, A.S. (2002). "22q11 DS: genomic mechanisms and gene function in DiGeorge/velocardiofacial syndrome." International Journal of Developmental Neuroscience 20 (3): 407–419.

McCarthy, M.I., Abecasis, G.R., Cardon, L.R., Goldstein, D.B., Little, J., Ioannidis, J.P.A. and Hirschhorn, J.N. (2008). "Genome-wide association studies for complex traits: consensus, uncertainty and challenges." Nature Reviews Genetics 9(5): 356-369.

McIntosh, A.M., Bennett, C., Dickson, D., Anestis, S.F., Watts, D.P., Webster, T.H., Fontenot, M.B. and Bradley, B.J. (2012). "The Apolipoprotein E (APOE) Gene Appears Functionally Monomorphic in Chimpanzees (Pan troglodytes)." PLoS ONE 7(10): e47760.

McManus, F., Sands, W., Diver, L., MacKenzie, S.M., Fraser, R., Davies, E. and Connell, J.M. (2012). "APEX1 Regulation of Aldosterone Synthase Gene Transcription Is Disrupted by a Common Polymorphism in Humans." Circulation Research 111 (2): 212-219.

Meckler, J.F., Bhakta, M.S., Kim, M.S., Ovadia, R., Habrian, C.H., Zykovich, A., Yu, A., Lockwood, S.H., Morbitzer, R., Elsäesser, J. , et al. (2013). "Quantitative analysis of TALE–DNA interactions suggests polarity effects." Nucleic Acids Research 41 (7): 4118-4128.

Menendez, D., Krysiak, O., Inga, A., Krysiak, B., Resnick, M. and Schönfelder, G. (2006). "A SNP in the flt-1 promoter integrates the VEGF system into the p53 191 transcriptional network." Proceedings of the National Academy of Sciences 103 (5): 1406-1411.

Messerli, F.H., Williams, B. and Ritz, E. (2007). "Essential hypertension." Lancet 370 (9587): 591-603.

Metzker, M. (2010). "Sequencing technologies - the next generation." Nature Reviews Genetics 11 (1): 31-46.

Mikkelsen, T., Ku, M., Jaffe, D., Issac, B., Lieberman, E., Giannoukos, G., Alvarez, P., Brockman, W., Kim, T.-K., Koche, R. , et al. (2007). "Genome-wide maps of chromatin state in pluripotent and lineage-committed cells." Nature 448 (7153): 553-560.

Miller, S., De Vos, P., Guerre-Millo, M., Wong, K., Hermann, T., Staels, B., Briggs, M. and Auwerx, J. (1996). "The adipocyte specific transcription factor C/EBPalpha modulates human ob gene expression." Proceedings of the National Academy of Sciences 93 (11): 5507-5511.

Mills, R.E., Pittard, W.S., Mullaney, J.M., Farooq, U., Creasy, T.H., Mahurkar, A.A., Kemeza, D.M., Strassler, D.S., Ponting, C.P., Webber, C. , et al. (2011). "Natural genetic variation caused by small insertions and deletions in the human genome." Genome Research 21 (6): 830-839.

Min-Oo, G., Fortin, A., Pitari, G., Tam, M., Stevenson, M. and Gros, P. (2007). "Complex genetic control of susceptibility to malaria: positional cloning of the Char9 locus." The Journal of Experimental Medicine 204 (3): 511–524.

Montgomery, S.B., Sammeth, M., Gutierrez-Arcelus, M., Lach, R.P., Ingle, C., Nisbett, J., Guigo, R. and Dermitzakis, E.T. (2010). "Transcriptome genetics using second generation sequencing in a Caucasian population." Nature 464 (7289): 773–777.

Moore, J.H., Asselbergs, F.W. and Williams, S.M. (2010). "Bioinformatics challenges for genome-wide association studies." Bioinformatics 26 (4): 445-455.

Moosa, M., Ayub, M., Bashar, A., Sarwardi, G., Khan, W., Khan, H. and Yeasmin, S. (2011). "Combination of two rare mutations causes B-thalassaemia in a Bangladeshi patient." Genetics and Molecular Biology 34 (3): 406-409.

Moran, A.E., Oliver, J.T., Mirzaie, M., Forouzanfar, M.H., Chilov, M., Anderson, L., Morrison, J.L., Khan, A., Zhang, N., Haynes, N. , et al. (2012). "Assessing the Global Burden of Ischemic Heart Disease : Part 1: Methods for a Systematic Review of the Global Epidemiology of Ischemic Heart Disease in 1990 and 2010." Global Heart 7(4): 315–329.

Moreau, Y. and Tranchevent, L.C. (2012). "Computational tools for prioritizing candidate genes: boosting disease gene discovery." Nature Reviews Genetics 13 : 523– 536.

Morgan, T.M., Krumholz, H.M., Lifton, R.P. and Spertus, J.A. (2007). "Non-validation of reported genetic risk factors for acute coronary syndrome in a large-scale replication study." JAMA: The Journal of the American Medical Association 297 (14): 1551-1561.

192

Mottagui-Tabar, S., Faghihi, M., Mizuno, Y., Engström, P., Lenhard, B., Wasserman, W. and Wahlestedt, C. (2005). "Identification of functional SNPs in the 5-prime flanking sequences of human genes." BMC Genomics 6(1): 18.

Muers, M. (2010). "Genomic variation: Human retrotransposons keep it active." Nature Reviews Genetics 11 (8): 527-527.

Muise, A.M., Xu, W., Guo, C.-H., Walters, T.D., Wolters, V.M., Fattouh, R., Lam, G.Y., Hu, P., Murchie, R., Sherlock, M. , et al. (2012). "NADPH oxidase complex and IBD candidate gene studies: identification of a rare variant in NCF2 that results in reduced binding to RAC2." Gut 61 (7): 1028-1035.

Mummidi, S., Adams, L.M., Van Compernolle, S.E., Kalkonde, M., Camargo, J.F., Kulkarni, H., Bellinger, A.S., Bonello, G., Tagoh, H., Ahuja, S.S. , et al. (2007). "Production of Specific mRNA Transcripts, Usage of an Alternate Promoter, and Octamer-Binding Transcription Factors Influence the Surface Expression Levels of the HIV Coreceptor CCR5 on Primary T Cells." The Journal of Immunology 178 (9): 5668- 5681.

Musunuru, K., Strong, A., Frank-Kamenetsky, M., Lee, N.E., Ahfeldt, T., Sachs, K.V., Li, X., Li, H., Kuperwasser, N., Ruda, V.M. , et al. (2010). "From noncoding variant to phenotype via SORT1 at the 1p13 cholesterol locus." Nature 466 (7307): 714-719.

Myers, S., Freeman, C., Auton, A., Donnelly, P. and McVean, G. (2008). "A common sequence motif associated with recombination hot spots and genome instability in humans." Nature Genetics 40 (9): 1124–1129.

Myers, S., Spencer, C.C.A., Auton, A., Bottolo, L., Freeman, C., Donnelly, P. and McVean, G. (2006). "The distribution and causes of meiotic recombination in the human genome." Biochemical Society Transactions 34 (4): 526–530.

Nabel, E.G. (2003). "Cardiovascular disease." The New England Journal of Medicine 349 (1): 60-72.

Nackley, A., Shabalina, S., Tchivileva, I., Satterfield, K., Korchynskyi, O., Makarov, S., Maixner, W. and Diatchenko, L. (2006). "Human Catechol-O-Methyltransferase Haplotypes Modulate Protein Expression by Altering mRNA Secondary Structure." Science 314 (5807): 1930-1933.

Naka, I., Hikami, K., Nakayama, K., Koga, M., Nishida, N., Kimura, R., Furusawa, T., Natsuhara, K., Yamauchi, T., Nakazawa, M. , et al. (2012). "A functional SNP upstream of the beta-2 adrenergic receptor gene (ADRB2) is associated with obesity in Oceanic populations." International Journal of Obesity : 1-7.

Nejentsev, S., Walker, N., Riches, D., Egholm, M. and Todd, J.A. (2009). "Rare Variants of IFIH1, a Gene Implicated in Antiviral Responses, Protect Against Type 1 Diabetes." Science 324 (5925): 387-389.

Neumann, M., Fries, H.-W., Scheicher, C., Keikavoussi, P., Kolb-Mäurer, A., Bröcker, E.-B., Serfling, E. and Kämpgen, E. (2000). "Differential expression of Rel/NF-κB and octamer factors is a hallmark of the generation and maturation of dendritic cells." Blood 95 (1): 277-285. 193

Nica, A.C., Montgomery, S.B., Dimas, A.S., Stranger, B.E., Beazley, C., Barroso, I. and Dermitzakis, E.T. (2010). "Candidate causal regulatory effects by integration of expression QTLs with complex trait genetic associations." PLoS Genetics 6(4): e1000895.

Novembre, J., Johnson, T., Bryc, K., Kutalik, Z., Boyko, A.R., Auton, A., Indap, A., King, K.S., Bergmann, S., Nelson, M.R. , et al. (2008). "Genes mirror geography within Europe." Nature 456 (7218): 98–101.

O’Neill, L. and Turner, B. (1995). "Histone H4 acetylation distinguishes coding regions of the human genome from heterochromatin in a differentiation-independent manner." The EMBO Journal 14 (16): 3946–3957.

Pang, D.X., Smith, A.J. and Humphries, S.E. (2013). "Functional analysis of TCF7L2 genetic variants associated with type 2 diabetes." Nutrition, Metabolism & Cardiovascular Diseases 23 (6): 550-556.

Pasche, B. and Yi, N. (2010). "Candidate gene association studies: Successes and failures." Current Opinion in Genetics & Development 20 (3): 257–261.

Pastinen, T. (2010). "Genome-wide allele-specific analysis: insights into regulatory variation." Nature Reviews Genetics 11 (8): 533-538.

Pastinen, T., Sladek, R., Gurd, S., Sammak, A., Ge,B., Lepage, P., Lavergne., K., Villeneuve, A., Gaudin, T., Brändström., et al . (2004). “A survey of genetic and epigenetic variation affecting human gene expression.” Physiological Genomics 16 (2): 184-193.

Pearson, T.A. and Manolio, T.A. (2008). "How to interpret a genome-wide association study." JAMA: The Journal of the American Medical Association 299 (11): 1335–1344.

Pemberton, T.J., Wang, C., Li, J.Z. and Rosenberg, N.A. (2010). "Inference of unexpected genetic relatedness among individuals in HapMap Phase III." The American Journal of Human Genetics 87 (4): 457-464.

Permuth-Wey, J., Kim, D., Tsai, Y.-Y., Lin, H.-Y., Chen, Y.A., Barnholtz-Sloan, J., Birrer, M.J., Bloom, G., Chanock, S.J., Chen, Z. , et al. (2011). "LIN28B polymorphisms influence susceptibility to epithelial ovarian cancer." Cancer Research 71 (11): 3896-3903.

Peter, B.M., Huerta-Sanchez, E. and Nielsen, R. (2012). "Distinguishing between Selective Sweeps from Standing Variation and from a De novo Mutation." PLoS Genetics 8(10): e1003011.

Pickrell, J.K., Marioni, J.C., Pai, A.A., Degner, J.F., Engelhardt, B.E., Nkadori, E., Veyrieras, J.-B., Stephens, M., Gilad, Y. and Pritchard, J.K. (2010). "Understanding mechanisms underlying human gene expression variation with RNA sequencing." Nature 464 (7289): 768-772.

Pociot, F., Akolkar, B., Concannon, P., Erlich, H.A., Julier, C., Morahan, G., Nierras, C.R., Todd, J.A., Rich, S.S. and Nerup, J. (2010). "Genetics of Type 1 Diabetes: What's 194

Next?" Diabetes 59 (7): 1561–1571.

Poke, F.S., Upcher, W.R., Sprod, O.R., Young, A., Brettingham-Moore, K.H. and Holloway, A.F. (2012). "Depletion of c-Rel from Cytokine Gene Promoters Is Required for Chromatin Reassembly and Termination of Gene Responses to T Cell Activation." PLoS ONE 7(7): e41734.

Pompei, L., Jang, S., Zamlynny, B., Ravikumar, S., McBride, A., Hickman, S.P. and Salgame, P. (2007). "Disparity in IL-12 release in dendritic cells and macrophages in response to Mycobacterium tuberculosis is due to use of distinct TLRs." The Journal of Immunology 178 (8): 5192-5199.

Pool, R., Heringa, J., Hoefling, M., Schulz, R., Smith, J.C. and Feenstra, K. (2012). "Enabling grand-canonical Monte Carlo: Extending the flexibility of GROMACS through the GromPy python interface module." Journal of Computational Chemistry 33 (12): 1207-1214.

Powell, R. (2012). "The Evolutionary Biological Implications of Human Genetic Engineering." Journal of Medicine and Philosophy 37 (3): 204-225.

Pratt, R. and Dzau, V. (1999). "Genomics and hypertension. Concepts, potentials and opportunities." Journal of Hypertension 33 (1): 238–245.

Rajab, A., Straub, V., McCann, L.J., Seelow, D., Varon, R., Barresi, R., Schulze, A., Lucke, B., Lützkendorf, S., Karbasiyan, M. , et al. (2010). "Fatal cardiac arrhythmia and long-QT syndrome in a new form of congenital generalized lipodystrophy with muscle rippling (CGL4) due to PTRF-CAVIN mutations." PLoS Genetics 6(3): e1000874.

Rall, S.C. and Mahley, R.W. (1992). "The role of apolipoprotein E genetic variants in lipoprotein disorders." Journal of Internal Medicine 231 (6): 653–659.

Rao, S., Procko, E. and Shannon, M. (2001). "Chromatin Remodeling, Measured by a Novel Real-Time Polymerase Chain Reaction Assay, Across the Proximal Promoter Region of the IL-2 Gene." The Journal of Immunology 167 (8): 4494-4503.

Raunch, T.A. and Pfeifer, G.P. (2005). "Methylated-CpG island recovery assay: a new technique for the rapid detection of methylated-CpG islands in cancer." Laboratory Investigation 85 (9): 1172-1180.

Reddy, M.V.P.L., Wang, H., Liu, S., Bode, B., Reed, J.C., Steed, R.D., Anderson, S.W., Steed, L., Hopkins, D. and She, J.-X. (2011). "Association between type 1 diabetes and GWAS SNPs in the southeast US Caucasian population." Genes and Immunity 12 (3): 208–212.

Reddy, S.D.N., Pakala, S.B., Ohshiro, K., Rayala, S.K. and Kumar, R. (2009). "MicroRNA-661, a c/EBP α Target, Inhibits Metastatic Tumor Antigen 1 and Regulates Its Functions." Cancer Research 69 (14): 5639-5642.

Redon, R., Ishikawa, S., Fitch, K.R., Feuk, L., Perry, G.H., Andrews, D., Fiegler, H., Shapero, M.H., Carson, A.R., Chen, W. , et al. (2006). "Global variation in copy number in the human genome." Nature 444 (7118): 444-454.

195

Ria, M., Lagercrantz, J., Samnegård, A., Boquist, S., Hamsten, A. and Eriksson, P. (2011). "A Common Polymorphism in the Promoter Region of the TNFSF4 Gene Is Associated with Lower Allele-Specific Expression and Risk of Myocardial Infarction." PLoS ONE 6(3): e17652.

Rizzo, J.M., Bard, J.E. and Buck, M.J. (2012). "Standardized collection of MNase-seq experiments enables unbiased dataset comparisons." BMC Molecular Biology 13 (1): 15.

Robertson, G., Hirst, M., Bainbridge, M., Bilenky, M., Zhao, Y., Zeng, T., Euskirchen, G., Bernier, B., Varhol, R., Delaney, A. , et al. (2007). "Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing." Nature Methods 4(8): 651–657.

Robertson, K. (2005). "DNA methylation and human disease." Nature Reviews Genetics 6(8): 597-610.

Rohde, C., Zhang, Y., Reinhardt, R. and Jeltsch, A. (2010). "BISMA - Fast and accurate bisulfite sequencing data analysis of individual clones from unique and repetitive sequences." BMC Bioinformatics 11 (1): 230.

Roisin-Bouffay, C., Castellano, R., Valéro, R., Chasson, L., Galland, F. and Naquet, P. (2008). "Mouse vanin-1 is cytoprotective for islet beta cells and regulates the development of type 1 diabetes." Diabetologia 51 (7): 1192–1201.

Rosenbloom, K.R., Dreszer, T.R., Long, J.C., Malladi, V.S., Sloan, C.A., Raney, B.J., Cline, M.S., Karolchik, D., Barber, G.P., Clawson, H. , et al. (2012). "ENCODE whole- genome data in the UCSC Genome Browser: update 2012." Nucleic Acids Research 40 (D1): D912-D917.

Sajan, S.A. and Hawkins, R.D. (2012). "Methods for identifying higher-order chromatin structure." Annual Review of Genomics and Human Genetics 13 : 59-82.

Salmon-Divon, M., Dvinge, H., Tammoja, K. and Bertone, P. (2010). "PeakAnalyzer: Genome-wide annotation of chromatin binding and modification loci." BMC Bioinformatics 11 (1): 415.

Sambrook, J. and Russell, D.W. (2001). Molecular Cloning: A Laboratory Manual . Cold Spring Harbor Laboratory Press, New York.

Sandelin, A., Alkema, W., Engström, P., Wasserman, W. and Lenhard, B. (2004). "JASPAR: an open access database for eukaryotic transcription factor binding profiles." Nucleic Acids Research 32 (suppl 1): D91-D94.

Sanders, S.J., Ercan-Sencicek, A.G., Hus, V., Luo, R., Murtha, M.T., Moreno-De-Luca, D., Chu, S.H., Moreau, M.P., Gupta, A.R., Thomson, S.A. , et al. (2011). "Multiple recurrent de novo CNVs, including duplications of the 7q11.23 Williams syndrome region, are strongly associated with autism." Neuron 70 (5): 863–885.

Sandovici, I., Kassovska-Bratinova, S., Vaughan, J.E., Stewart, R., Leppert, M. and Sapienza, C. (2006). "Human Imprinted Chromosomal Regions Are Historical Hot- Spots of Recombination." PLoS Genetics 2(7): e101.

196

Satake, W., Nakabayashi, Y., Mizuta, I., Hirota, Y., Ito, C., Kubo, M., Kawaguchi, T., Tsunoda, T., Watanabe, M., Takeda, A. , et al. (2009). "Genome-wide association study identifies common variants at four loci as genetic risk factors for Parkinson's disease." Nature Genetics 41 (12): 1303 - 1307.

Saunders, A.M., Strittmatter, W.J., Schmechel, D., George-Hyslop, P.H., Pericak- Vance, M.A., Joo, S.H., Rosi, B.L., Gusella, J.F., Crapper-MacLachlan, D.R., Alberts, M.J. , et al. (1993). "Association of apolipoprotein E allele epsilon 4 with late-onset familial and sporadic Alzheimer's disease." Neurology 43 (8): 1467–1472.

Saxonov, S., Berg, P. and Brutlag, D.L. (2006). "A genome-wide analysis of CpG dinucleotides in the human genome distinguishes two distinct classes of promoters." Proceedings of the National Academy of Sciences 103 (5): 1412–1417.

Scally, A. and Durbin, R. (2012). "Revising the human mutation rate: implications for understanding human evolution." Nature Reviews Genetics 13 (10): 745-753.

Schaefer, C., Meier, A., Rost, B. and Bromberg, Y. (2012). "SNPdbe: constructing an nsSNP functional impacts database." Bioinformatics 28 (4): 601-602.

Schaeffeler, E., Eichelbaum, M., Reinisch, W., Zanger, U. and Schwab, M. (2006). "Three Novel Thiopurine S-Methyltransferase Allelic Variants (TPMT*20, *21, *22) – Association With Decreased Enzyme Function." Human Mutation 27 (9): 976.

Schork, N.J., Murray, S.S., Frazer, K.A. and Topol, E.J. (2009). "Common vs. Rare Allele Hypotheses for Complex Diseases." Current Opinion in Genetics & Development 19 (3): 212–219.

Selden, R.F., Howie, K.B., Rowe, M.E., Goodman, H.M. and Moore, D.D. (1986). "Human Growth Hormone as a Reporter Gene in Regulation Studies Employing Transient Gene Expression." Molecular and Cellular Biology 6(9): 3173-3179.

Shan, J., Mahfoudh, W., Dsouza, S.P., Hassen, E., Bouaouina, N., Abdelhak, S., Benhadjayed, A., Memmi, H., Mathew, R.A., Aigha, I.I. , et al. (2012). "Genome-Wide Association Studies (GWAS) breast cancer susceptibility loci in Arabs: susceptibility and prognostic implications in Tunisians." Breast Cancer Research and Treatment 135 (3): 715–724.

Shastry, B.S. (2007). “SNPs in disease gene mapping, medicinal drug development and evolution.” Journal of Human Genetics 52 (11): 871-880.

Shendure, J. and Ji, H. (2008). "Next-generation DNA sequencing." Nature Biotechnology 26 (10): 1135-1145.

Sherry, S., Ward, M., Kholodov, M., Baker, J., Phan, L., Smigielski, E. and Sirotkin, K. (2001). "dbSNP: the NCBI database of genetic variation." Nucleic Acids Research 29 (1): 308-311.

Shoemaker, R., Deng, J., Wang, W. and Zhang, K. (2010). "Allele-specific methylation is prevalent and is contributed by CpG-SNPs in the human genome." Genome Research 20 (7): 883–889.

197

Sleiman, P., Flory, J., Imielinski, M., Bradfield, J., Annaiah, K., Willis-Owen, S.A., Wang, K., Rafaels, N.M., Michel, S., Bonnelykke, K., et al. (2010). "Variants of DENND1B associated with asthma in children." The New England Journal of Medicine 362 (1): 36-44.

Smith, A.J., Palmen, J., Putt, W., Talmud, P.J., Humphries, S.E. and Drenos, F. (2010). "Application of statistical and functional methodologies for the investigation of genetic determinants of coronary heart disease biomarkers: lipoprotein lipase genotype and plasma triglycerides as an exemplar." Human Molecular Genetics 19 (20): 3936-3947.

Smith AJP, Howard P, Shah S, Eriksson P, Stender S, et al. (2012) “Use of Allele- Specific FAIRE to Determine Functional Regulatory Polymorphism Using Large-Scale Genotyping Arrays” . PLoS Genetics 8(8): e1002908

Smith, C. and Hager, G. (1997). "Transcriptional regulation of mammalian genes in vivo. A tale of two templates." The Journal of Biological Chemistry 272 (44): 27493- 27496.

Smith, R., Seale, R. and Yu, J. (1983). "Transcribed chromatin exhibits an altered nucleosomal spacing." Proceedings of the National Academy of Sciences 80 (18): 5505- 5509.

Ståhl, P.L. and Lundeberg, J. (2012). "Toward the Single-Hour High-Quality Genome." Annual Review of Biochemistry 81 : 359-378.

Statham, A., Robinson, M., Song, J., Coolen, M., Stirzaker, C. and Clark, S. (2012). "Bisulfite sequencing of chromatin immunoprecipitated DNA (BisChIP-seq) directly informs methylation status of histone-modified DNA." Genome Research 22 (6): 1120- 1127.

Steidl, U., Steidl, C., Ebralidze, A., Chapuy, B., Han, H-J., Will, B., Rosenbauer, F., et al . (2007) "A distal single nucleotide polymorphism alters long-range regulation of the PU. 1 gene in acute myeloid leukemia." Journal of Clinical Investigation 117 (9): 2611- 2620.

Stower, H. (2013). "Evolution: Explosive human genetic variation." Nature Reviews Genetics 14 (1): 5.

Stranger, B.E., Stahl, E.A. and Raj, T. (2011). "Progress and Promise of Genome-Wide Association Studies for Human Complex Trait Genetics." Genetics 187 (2): 367-383.

Su, L., Creusot, R.J., Gallo, E.M., Chan, S.M., Utz, P.J., Fathman, C.G. and Ermann, J. (2004). "Murine CD4+ CD25+ regulatory T cells fail to undergo chromatin remodeling across the proximal promoter region of the IL-2 gene." The Journal of Immunology 173 (8): 4994-5001.

Sun, B., Chang, M., Chen, D. and Nie, P. (2006). "Gene structure and transcription of IRF-2 in the mandarin fish Siniperca chuatsi with the finding of alternative transcripts and microsatellite in the coding region." Immunogenetics 58 (9): 774-784.

Sun, J., Jia, P., Fanous, A.H., Webb, B.T., van den Oord, E.J.C.G., Chen, X., Bukszar, J., Kendler, K.S. and Zhao, Z. (2009). "A multi-dimensional evidence-based candidate 198 gene prioritization approach for complex diseases–schizophrenia as a case." Bioinformatics 25 (19): 2595-6602.

Suzuki, H., Maruyama, R., Yamamoto, E. and Kai, M. (2012). "DNA methylation and microRNA dysregulation in cancer." Molecular Oncology 6(6): 567-578.

Syvanen, A., Landegren, U., Isaksson, A., Gyllensten, U. and Brookes, A. (1999). "Enthusiasm mixed with skepticism about singlenucleotide polymorphism markers for dissecting complex disorders." European Journal of Human Genetics 7: 98-101.

Szpirer, C., Riviere, M., Cortese, R., Nakamura, T., Islam, M., Levan, G. and Szpirer, J. (1992). "Chromosomal localization in man and rat of the genes encoding the liver- enriched transcription factors C/EBP, DBP, and HNF1/LFB-1 (CEBP, DBP, and transcription factor 1, TCF1, respectively) and of the hepatocyte growth factor/scatter factor gene (HGF)." Genomics 13 (2): 293–300.

Tabor, H.K., Risch, N.J. and Myers, R.M. (2002). "Candidate-gene approaches for studying complex genetic traits: practical considerations." Nature Reviews Genetics 3(5): 391-397.

Tanira, M.O. and Al Balushi, K.A. (2005). "Genetic variations related to hypertension: a review." Journal of Human Hypertension 19 (1): 7-19.

Tao, H., Cox, D.R. and Frazer, K.A. (2006). "Allele-Specific KRT1 Expression Is a Complex Trait." PLoS Genetics 2(6): e93.

Tao, S., Wang, Z., Feng, J., Hsu, F.-C., Jin, G., Kim, S.-T., Zhang, Z., Gronberg, H., Zheng, L.S., Isaacs, W.B. , et al. (2012). "A genome-wide search for loci interacting with known prostate cancer risk-associated genetic variants." Carcinogenesis 33 (3): 598-603.

Taylor, J.Y., Maddox, R. and Wu, C.Y. (2009). "Genetic and Environmental Risks for High Blood Pressure Among African American Mothers and Daughters." Biological Research for Nursing 11 (1): 53–65.

Tenesa, A., Farrington, S.M., Prendergast, J.G., Porteous, M.E., Walker, M., Haq, N., Barnetson, R.A., Theodoratou, E., Cetnarskyj, R., Cartwright, N. , et al. (2008). "Genome-wide association scan identifies a colorectal cancer susceptibility locus on 11q23 and replicates risk loci at 8q24 and 18q21." Nature Genetics 40 (5): 631–637.

Teng, M., Ichikawa, S., Padgett, L.R., Wang, Y., Mort, M., Cooper, D.N., Koller, D.L., Foroud, T., Edenberg, H.J., Econs, M.J. , et al. (2012). "regSNPs: a strategy for prioritizing regulatory single nucleotide substitutions." Bioinformatics 28 (14): 1879- 1886.

The International HapMap Consortium (2003). "The International HapMap Project." Nature 426 (6968): 789–796.

The International HapMap Consortium (2004). "Integrating ethics and science in the International HapMap Project." Nature Reviews Genetics 5(6): 467–475.

The International HapMap Consortium (2007). "A second generation human haplotype 199 map of over 3.1 million SNPs." Nature 449 (7164): 851-861.

Thomas, P.E., Klinger, R., Furlong, L.I., Hofmann-Apitius, M. and Friedrich, C.M. (2011). "Challenges in the association of human single nucleotide polymorphism mentions with unique database identifiers." BMC Bioinformatics 12 (suppl 4): S4.

Thomas-Chollier, M., Hufton, A., Heinig, M., O'Keeffe, S., El Masri, N., Roider, H.G., Manke, T. and Vingron, M. (2011). "Transcription factor binding predictions using TRAP for the analysis of ChIP-seq data and regulatory SNPs." Nature Protocols 6(12): 1860–1869.

Thurman, R.E., Rynes, E., Humbert, R., Vierstra, J., Maurano, M.T., Haugen, E., Sheffield, N.C., Stergachis, A.B., Wang, H., Vernot, B. , et al. (2012). "The accessible chromatin landscape of the human genome." Nature 489 (7414): 75-82.

Timpson, N.J., Wade, K.H. and Smith, G.D. (2012). "Mendelian Randomization: Application to Cardiovascular Disease." Current Hypertension Reports 14 (1): 29-37.

Todd, J.A., Walker, N.M., Cooper, J.D., Smyth, D.J., Downes, K., Plagnol, V., Bailey, R., Nejentsev, S., Field, S.F., Payne, F. , et al. (2007). "Robust associations of four new chromosome regions from genome-wide analyses of type 1 diabetes." Nature Genetics 39 (7): 857-864.

Tomlinson, I.P., Webb, E., Carvajal-Carmona, L., Broderick, P., Howarth, K., Pittman, A.M., Spain, S., Lubbe, S., Walther, A., Sullivan, K. , et al. (2008). "A genome-wide association study identifies colorectal cancer susceptibility loci on 10p14 and 8q23.3." Nature Genetics 40 (5): 623–630.

Tomlinson, I.P.M., Houlston, R.S., Montgomery, G.W., Sieber, O.M. and Dunlop, M.G. (2012). "Investigation of the effects of DNA repair gene polymorphisms on the risk of colorectal cancer." Mutagenesis 27 (2): 219-223.

Tsukada, S., Tanaka, Y., Maegawa, H., Kashiwagi, A., Kawamori, R. and Maeda, S. (2006). "Intronic Polymorphisms within TFAP2B Regulate Transcriptional Activity and Affect Adipocytokine Gene Expression in Differentiated Adipocytes." Molecular Endocrinology 20 (5): 1104-1111.

Tycko, B. (2010). "Allele-specific DNA methylation: beyond imprinting." Human Molecular Genetics 19 (R2): R210-R220.

Ursic-Bedoya, R., Nazzari, H., Cooper, D., Triana, O., Wolff, M. and Lowenberger, C. (2008). "Identification and characterization of two novel lysozymes from Rhodnius prolixus, a vector of Chagas disease." Journal of Insect Physiology 54 (3): 593–603. van Rijnsoever, M., Grieu, F., Elsaleh, H., Joseph, D. and Iacopetta, B. (2002). "Cancer Characterisation of colorectal cancers showing hypermethylation at multiple CpG islands." Gut 51 (6): 797-802.

Visvikis-Siest, S. and Marteau, J.-B. (2006). "Genetic variants predisposing to cardiovascular disease." Current Opinion in Lipidology 17 (2): 139-151.

Volkmar, M., Dedeurwaerder, S., Cunha, D.A., Ndlovu, M.N., Defrance, M., Deplus, 200

R., Calonne, E., Volkmar, U., Igoillo-Esteve, M., Naamane, N. , et al. (2012). "DNA methylation profiling identifies epigenetic dysregulation in pancreatic islets from type 2 diabetic patients." The EMBO Journal 31(6): 1405–1426. von Mering, C., Krause, R., Snel, B., Cornell, M., Oliver, S.G., Fields, S. and Bork, P. (2002). "Comparative assessment of large-scale data sets of protein-protein interactions." Nature 417 (6887): 399-403.

Wang, J., Fan, H.C., Behr, B. and Quake, S.R. (2012). "Genome-wide single-cell analysis of recombination activity and de novo mutation rates in human sperm." Cell 150 (2): 402-412.

Wang, K., Zhang, H., Ma, D., Bucan, M., Glessner, J.T., Abrahams, B.S., Salyakina, D., Imielinski, M., Bradfield, J.P., Sleiman, P.M. , et al. (2009). "Common genetic variants on 5p14.1 associate with autism spectrum disorders." Nature 459 (7246): 528-533.

Wang, L., Ellsworth, K.A., Moon, I., Pelleymounter, L.L., Eckloff, B.W., Martin, Y.N., Fridley, B.L., Jenkins, G.D., Batzler, A., Suman, V.J. , et al. (2010). "Therapeutics, Targets, and Chemical Biology: Functional Genetic Polymorphisms in the Aromatase Gene CYP19 Vary the Response of Breast Cancer Patients to Neoadjuvant Therapy with Aromatase Inhibitors." Cancer Research 70 (1): 319-328.

Wang, S., Wen, F., Wiley, G.B., Kinter, M.T., Gaffney, P.M. (2013) “An Enhancer Element Harboring Variants Associated with Systemic Lupus Erythematosus Engages the TNFAIP3Promoter to Influence A20 Expression” PLoS Genetics 9(9): e1003750.

Wang, W., Ronaghi, M., Chong, S.S. and Lee, C.G.L. (2011). "pfSNP: An integrated potentially functional SNP resource that facilitates hypotheses generation through knowledge syntheses." Human Mutation 32 (1): 19-24.

Wang, Z. and Moult, J. (2001). "SNPs, Protein Structure, and Disease." Human Mutation 17 (4): 263-270.

Watt, F. and Molloy, P. (1988). "Cytosine methylation prevents binding to DNA of a HeLa cell transcription factor required for optimal expression of the adenovirus major late promoter." Genes and Development 2(9): 1136-1143.

Wei, G., Hu, G., Cui, K. and Zhao, K. (2012). "Genome-wide mapping of nucleosome occupancy, histone modifications, and gene expression using next-generation sequencing technology." Methods in Enzymology 513 : 297-313.

Wei, S., Niu, J., Zhao, H., Liu, Z., Wang, L.-E., Han, Y., Chen, W., Amos, C., Rafnar, T., Sulem, P. , et al. (2011). "Association of a novel functional promoter variant (rs2075533 C>T) in the apoptosis gene TNFSF8 with risk of lung cancer—a finding from Texas lung cancer genome-wide association study." Carcinogenesis 32 (4): 507- 515.

Weischenfeldt, J., Symmons, O., Spitz, F. and Korbel, J.O. (2013). "Phenotypic impact of genomic structural variation: insights from and for human disease." Nature Reviews Genetics 14 (2): 125-138.

Werner, T. (2010). "Next generation sequencing in functional genomics." Briefings in 201

Bioinformatics 11 (5): 499-511.

Willer, C.J., Sanna, S., Jackson, A.U., Scuteri, A., Bonnycastle, L.L., Clarke, R., Heath, S.C., Timpson, N.J., Najjar, S.S., Stringham, H.M. , et al. (2008). "Newly identified loci that influence lipid concentrations and risk of coronary artery disease." Nature Genetics 40 (2): 161 - 169.

Wingender, E., Dietze, P., Karas, H. and Knüppel, R. (1996). "TRANSFAC: a database on transcription factors and their DNA binding sites." Nucleic Acids Research 24 (1): 238-241.

Wright, J., Brown, S. and Cole, M. (2010). "Upregulation of c-MYC in cis through a Large Chromatin Loop Linked to a Cancer Risk-Associated Single-Nucleotide Polymorphism in Colorectal Cancer Cells." Molecular Cell Biology 20 (6): 1411-1420.

Wu, H., Boackle, S.A., Hanvivadhanakul, P., Ulgiati, D., Grossman, J.M., Lee, Y., Shen, N., Abraham, L.J., Mercer, T.R., Park, E. , et al. (2007). "Association of a common complement receptor 2 haplotype with increased risk of systemic lupus erythematosus." Proceedings of the National Academy of Sciences of the United States of America 104 (10): 3961–3966.

Wu, J., Smith, L.T., Plass, C.P. and Huang, T.H.M. (2006). "ChIP-chip comes of age for genome-wide functional analysis." Cancer Research 66 (14): 6899-6902.

Wu, L. and Winston, F. (1997). "Evidence that Snf–Swi controls chromatin structure over both the TATA and UAS regions of the SUC2 promoter in Saccharomyces cerevisiae." Nucleic Acids Research 25 (21): 4230–4234.

Xia, F., Dou, Y., Lei, G. and Tan, Y. (2011). "FPGA accelerator for protein secondary structure prediction based on the GOR algorithm." BMC Bioinformatics 12 (suppl 1): S5.

Xiao, F., Widlak, P. and Garrard, W.T. (2007). "Engineered apoptotic nucleases for chromatin research." Nucleic Acids Research 35 (13): e93.

Xu, Z. and Taylor, J. (2009). "SNPinfo: integrating GWAS and candidate gene information into functional SNP selection for genetic association studies." Nucleic Acids Research 37 (suppl 2): W600-W605.

Yamazaki, K., McGovern, D., Ragoussis, J., Paolucci, M., Butler, H., Jewell, D., Cardon, L., Takazoe, M., Tanaka, T., Ichimori, T. , et al. (2005). "Single nucleotide polymorphisms in TNFSF15 confer susceptibility to Crohn's disease." Human Molecular Genetics 14 (22): 3499–3506.

Yanai, I., Benjamin, H., Shmoish, M., Chalifa-Caspi, V., Shklar, M., Ophir, R., Bar- Even, A., Horn-Saban, S., Safran, M., Domany, E. , et al. (2005). "Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification." Bioinformatics 21 (5): 650-659.

Yanase, K., Tsukahara, S., Mitsuhashi, J. and Sugimoto, Y. (2006). "Functional SNPs of the breast cancer resistance protein †therapeutic effects and inhibitor development." Cancer letters 234 (1): 73-80.

202

Yeh, H.-Y. and Klesius, P. (2008). "Molecular cloning, sequencing and characterization of channel catfish (Ictalurus punctatus, Rafinesque 1818) cathepsin S gene." Veterinary Immunology and Immunopathology 126 (3): 382–387.

Yigit, E., Bischof, J.M., Zhang, Z., Ott, C.J., Kerschner, J.L., Leir, S., Buitrago- Delgado, E., Zhang, Q., Wang, J.P., Widom, J. , et al. (2013). "Nucleosome mapping across the CFTR locus identifies novel regulatory factors." Nucleic Acids Research 41 (5): 2857-2868.

Zhang, K., Li, J.B., Gao, Y., Egli, D., Xie, B., Deng, J., Li, Z., Lee, J.H., Aach, J., Leproust, E.M. , et al. (2009). "Digital RNA allelotyping reveals tissue-specific and allele-specific gene expression in human." Nature Methods 6(8): 613–618.

Zhang, L., Liu, Y., Song, F., Zheng, H., Hu, L., Lu, H., Liu, P., Hao, X., Zhang, W. and Chen, K. (2011). "Functional SNP in the microRNA-367 binding site in the 3'UTR of the calcium channel ryanodine receptor gene 3 (RYR3) affects breast cancer risk and calcification." Proceedings of the National Academy of Sciences 108 (33): 13653-13658.

Zhang, W.M., Zhang, Y., Zhang, L.H., Wang, S.G., Zhu, T.Y., Lin, D. and Ma, G.Z. (2005). "Nucleotide sequence and mRNA expression analysis of β-actin gene in the orange-spotted grouper Epinephelus coioides." Fish Physiology and Biochemistry 31 (4): 373-383.

Zhang, Y. and Jeltsch, A. (2010). "The Application of Next Generation Sequencing in DNA Methylation Analysis." Genes 1(1): 85-101.

Zhang, Y., Rohde, C., Reinhardt, R., Voelcker-Rehage, C. and Jeltsch, A. (2009). "Non- imprinted allele-specific DNA methylation on human autosomes." Genome Biology 10 (12): R138

Zhu, M. and Zhao, S. (2007). "Candidate Gene Identification Approach: Progress and Challenges." International Journal of Biological Sciences 3(7): 420-427.

Zhu, X. and Cooper, R. (2007). "Admixture Mapping Provides Evidence of Association of the VNN1 Gene with Hypertension." PLoS ONE 2(11): e1244.

203

Appendix – Supplementary Tables and Figures

Table A.1: Description of general biochemicals and reagents

As shown are the biochemical reagents (“Biochemical/reagent”) that were used in the current study. Biochemicals were of the highest grade available and purchased from a variety of distributers (“Distributer/Supplier”).

Biochemical/reagent Distributer/Supplier 1X Dulbecco’s Phosphate Buffered Saline Life Technologies Australia Pty Ltd, (PBS) Mulgrave, VIC, Australia (Life Technologies, Australia) 100 base pair DNA Ladder (500 µg/mL) New England Biolabs, Ipswich, MA, USA (New England Biolabs, USA) 1 kilobase DNA Ladder (500 µg/mL) New England Biolabs, Ipswich, MA, USA (New England Biolabs, USA) Agarose, analytical grade Promega Corporation, Madison WI, USA (Promega, USA) Ampicillin sodium salt Sigma-Aldrich Pty Ltd, Castle Hill, NSW Australia (Sigma, Australia) Bacto-tryptone Difco Laboratories, Detroit MI, USA (Difco, USA) Bacto-yeast extract Difco Laboratories, Detroit MI, USA (Difco, USA) Boric acid Sigma-Aldrich Pty Ltd, Castle Hill, NSW Australia (Sigma, Australia) Bovine Serum Albumin (BSA) New England Biolab, Ipswich, MA, USA (New England Biolabs, USA) Bromophenol blue Merck Pty Limited, Kilsyth, Victoria, Austraila (Merck, Australia) DH5 α-T1 R Chemically Competent Life Technologies Australia Pty Ltd, Escherichia Coli Mulgrave, VIC, Australia (Life Technologies, Australia) Difco Agar Difco Laboratories, Detroit, MI, USA (Difco, USA) Dimethyl sulfoxide (DMSO) Merck Pty. Ltd, Kilsyth VIC, Australia (Merch, Australia) Deoxyribonuclease I (DNase I) Invitrogen Corporation, Mount Waverley, VIC, Australia (Invitrogen Corporation, Australia) Dithiothreitol (DTT) Sigma-Aldrich Pty Ltd, Castle Hill, NSW Australia (Sigma, Australia) Ethylenediaminetetraacetic acid (EDTA) Sigma-Aldrich Pty Ltd, Castle Hill, NSW Australia (Sigma, Australia) Ethylene glycol tetraacetic acid (EGTA) Sigma-Aldrich Pty Ltd, Castle Hill, NSW Australia (Sigma, Australia) Ethanol, absolute APS Ajax Finechem, Auburn NSW, Australia (APS Ajax Finechem, Australia)

204

Biochemical/reagent Distributer/Supplier Ethidium Bromide Sigma-Aldrich Pty Ltd, Castle Hill, NSW Australia (Sigma, Australia) Fetal Bovine Serum (FBS) Life Technologies, Gaitherburg MD, USA (Life Technologies, USA) Ficoll GE Healthcare Technologies, Sydney, Australia (GE Healthcare Limited, Australia) Gel Loading Dye Blue (6X) New England Biolabs, Ipswich, MA, USA (New England Biolabs, USA) Glycerol Merck Pty Limited, Kilsyth, Victoria, Austraila (Merck, Australia) HEPES MP Biomedicals, LLC, Ohio, USA (MP Biomedicals, USA) Hi-Di Formamide Applied Biosystems Pty Ltd, Scoresby Victoria, Australia (Applied Biosystems, Australia) Hydrogen Chloride School of Biochemistry Chemical Store Isopropanol (propan-2-ol) Merck Pty. Limited, Kilsyth VIC, Australia (Merch, Australia) Liquid Nitrogen School of Biochemistry Chemical Store, Crawley, WA, Australia Leupeptin Sigma-Aldrich Pty Ltd, Castle Hill, NSW Australia (Sigma, Australia) Magnesium Chloride Sigma-Aldrich Pty Ltd, Castle Hill, NSW Australia (Sigma, Australia) Micrococcal Nuclease (MNase) New England Biolab, Ipswich, MA, USA (New England Biolabs, USA) Penicillin-Streptomycin Life Technologies Australia Pty Ltd, 10 000 Units/mL Penicillin Mulgrave, VIC, Australia 10 000 µg/mL Streptomycin (Life Technologies, Australia) Pepstatin Sigma-Aldrich Pty Ltd, Castle Hill, NSW Australia (Sigma, Australia) Phosphate Buffered Saline (PBS) Life Technologies, Gaitherburg MD, USA (Life Technologies, USA) Phenylmethylsulfonyl fluoride (PMSF) Sigma-Aldrich Pty Ltd, Castle Hill, NSW Australia (Sigma, Australia) Poly-deoxy-inosinic-deoxy-cytidylic Roche Diagnostics Corporation, (Poly[d(I-C)]) Indianapolis IN, USA (Roche Diagnostics, USA) Potassium Chloride CHEM-SUPPLY, Gillman, SA, Australia Proteinase K New England Biolab, Ipswich, MA, USA (New England Biolabs, USA) RPMI-1640 cell culture medium Life Technologies, Gaitherburg MD, USA (Life Technologies, USA) Sodium Acetate Laboratory Supply, Osborn Park, WA, Australia (Laboratory Supply, Australia) Sodium Chloride Merck Pty. Limited, Kilsyth VIC, Australia (Merch, Australia) Sodium dodecyl sulphate (SDS) Sigma-Aldrich Pty Ltd, Castle Hill, NSW 205

Australia (Sigma, Australia) Biochemical/reagent Distributer/Supplier Sodium Hydroxide School of Biochemistry Chemical Store, Crawley, WA, Australia Trimza base Sigma-Aldrich Pty Ltd, Castle Hill, NSW Australia (Sigma, Australia) Trypan Blue Stain Sigma-Aldrich Pty Ltd, Castle Hill, NSW Australia (Sigma, Australia) Water for Irrigation Baxter Healthcare Pty Ltd, Old Toongabbie, NSW, Australia X-Gal (5-bromo-4-chloro-3-indolyl-beta- Promega Corporation, Madison WI, USA D-galacto-pyranoside) (Promega, USA)

206

Table A.2: Distributers for laboratory equipment used in research

As shown is the equipment (“Equipment”) that was used in the current study. Equipment was of the highest standard available and was previously purchased from a variety of distributers (“Distributer/Supplier”).

Equipment Distributer/Supplier Beckman J2-MC Centrifuge Beckman Coulter Inc., Fullerton CA, USA (Beckman Coulter, USA) Bioline Shaking Incubator Edwards Instrument Company, Narellan NSW, Australia (Edwards Instrument Company, Australia) CFX96 Real-Time PCR Detection System Bio-Rad Laboratories, CA, USA (Bio-Rad Laboratories, USA) C0 2 Water Jacketed Incubator Forma Scientific Inc., Marietta OH, USA (Forma Scientific, USA) Dark reader - blue light transilluminator Clare Chemical Research, CO, USA (Clare Chemical Research, USA) Electrophoresis Tank BIO-RAD Laboratories, Hercules, CA, USA (BIO-RAD, USA) Eppendorf Centrifuge 5415C Eppendorf AG, Hamberg, Germany (Eppendorf, Germany) Eppendorf Refridgerative Microcentrifuge Eppendorf AG, Hamberg, Germany 5417R (Eppendorf, Germany) Hypercassette Amersham International plc, Buckinghamshire, England, UK (Amersham, UK) Jouan C312 swinging rotor centrifuge Jouan, St-Herblain, France (Jouan, France) Nanodrop ND-100 NanoDrop Technologies, DE, USA (NanoDrop, USA) Neubauer Counting Chamber ProSciTech, Kirwan, QLD, Australia PTC-100 Programmable Thermal MJ Research Inc., San Francisco, CA, USA Controller (MJ Research, USA) Spectroline UV Transilluminator Spectronics Corporation, Westbury, NY, USA (Spectronics, USA) Wild M40 light microscope Wild Heerbrugg, Gais, Switzerland XCell SureLock Electrophoresis Cell Invitrogen Corporation, Mount Waverley, VIC, Australia (Invitrogen Corporation, Australia)

207

Table A.3: Distributers for general laboratory disposables used in this research

As shown are the general laboratory disposables (“Disposables”) that were used in the current study. Disposables were of the highest standard available and were purchased from a variety of distributers (“Distributer/Supplier”).

Disposables Distributer/Supplier 1.5mL Cryovials Nalge Nunc International, NY, USA (Nalge Nuc International, USA) 96-well plates Bio-Rad Laboratories, CA, USA (Bio- Rad Laboratories, USA) 15ml and 50mL Falcon Tubes Sarstedt Inc., Nürnberg, Germany (Sarstedt, Germany) 0.2mL and 1.5ml Eppendorf micro- Eppendorf AG, Hamberg, Germany centrifuge tubes (Eppendorf, Germany) 25 cm 2 and 75 cm 2 filter-top culture flask Stratagene, La Jolla, CA, USA Petri dishes Sarstedt Inc., Nürnberg, Germany (Sarstedt, Germany) Pasteur pipettes Sarstedt Inc., Nürnberg, Germany (Sarstedt, Germany)

208

Table A.4: DNase I cutting at the -137 and -587 VNN1 gene promoter SNPs, with SPA-2 and β-actin control loci in SAFHS cell 0054

DNase I-treated cultured cell line of the genotypes (“Genotype”) shown were used to calculate chromatin accessibility expressed as a ratio of undigested to DNase I-digested (DNase I cutting [%]). The data were obtained from the Ramos cell line and assayed in triplicate (n=3). The average DNase I cutting values for each replicate are shown in addition to the overall average (“Average”) of each genotype.

Genotype -137T/G -587G/A SPA-2 β-actin DNase I cutting 19.995 15.991 3.754 17.761 (%) 24.990 15.310 7.218 20.387 23.940 19.949 11.591 23.417 Average 23.0 17.1 7.5 20.5

209

Table A.5: DNase I cutting at the -137 and -587 VNN1 gene promoter SNPs in homozygous SAFHS cell lines

DNase I-treated cell lines of the genotypes (“Genotype”) shown were used to calculate chromatin accessibility expressed as a ratio of undigested to DNase I-digested (DNase I cutting [%]). The data were obtained from cultured human cell lines derived from the SAFHS cohort for each genotype and assayed in triplicate (n=3). The average DNase I cutting values for each replicate are shown in addition to the overall average (“Average”) of each genotype.

Genotype 137TT 137GG 587GG 587AA DNase I cutting (%) 31.815 25.357 20.203 24.558 29.767 26.024 17.761 22.562 29.604 24.974 16.576 20.650 Average 30.4 25.5 18.2 22.6

210

Table A.6: MNase cutting at the -137 and -587 VNN1 gene promoter SNPs, with SPA-2 and β-actin control loci in heterozygous Ramos cell line

MNase-treated cell lines of the genotypes (“Genotype”) shown were used to calculate chromatin accessibility expressed as a ratio of undigested to MNase-digested (MNase cutting [%]). As shown are the data obtained from cultured cell lines derived from the SAFHS cohort for each genotype and assayed in triplicate (n=3). The average MNase cutting values for each replicate are shown in addition to the overall average (“Average”) of each genotype.

Genotype -137T/G -587G/A SPA-2 β-actin MNase cutting (%) 28.327 25.346 17.166 28.105 30.898 26.273 21.606 30.466 33.118 26.193 22.299 31.191 Average 30.8 25.9 20.4 29.9

211

Table A.7: MNase cutting at the -137 VNN1 gene promoter SNP in different SAFHS cell lines

MNase-treated cell lines of the genotypes (“Genotype”) shown were used to calculate chromatin accessibility expressed as a ratio of undigested to MNase-digested (MNase cutting [%]). As shown are the data obtained from cultured cell lines (“Cell line designation") derived from the SAFHS cohort for each genotype and assayed in quadruplicate or greater (n ≥4) based on availability of frozen stocks. The average MNase cutting values for each replicate are shown in addition to the overall average (“Average”) of each genotype.

Cell line designation 0306 0525 0029 0054 0188 0194 Genotype -137G/G -137G/G -137T/G -137T/G -137T/T -137T/T MNase 11.747 12.340 21.510 12.653 22.957 22.727 cutting (%) 12.433 21.123 19.013 13.713 21.399 38.598 10.698 11.564 19.319 16.756 29.553 19.720 12.974 10.188 25.238 23.097 26.713 28.998 19.938 12.554 16.266 24.455 29.462 16.548 23.647 35.055 13.903 Average 13.6 13.6 18.8 16.6 24.8 29.1

212

Table A.8: MNase cutting at the -587 VNN1 gene promoter SNP in different SAFHS cell lines

MNase-treated cell lines of the genotypes (“Genotype”) shown were used to calculate chromatin accessibility expressed as a ratio of undigested to MNase-digested (MNase cutting [%]). As shown are the data obtained from cultured cell lines (“Cell line designation") derived from the SAFHS cohort for each genotype and assayed in quadruplicate or greater (n ≥4) based on availability of frozen stocks. The average MNase cutting values for each replicate are shown in addition to the overall average (“Average”) of each genotype.

Cell line designation 0306 0525 0029 0054 0188 0194 Genotype -587A/A -587A/A -587G/A -587G/A -587G/G -587G/G MNase 32.842 33.869 9.057 12.544 7.509 16.278 cutting (%) 18.842 31.503 18.037 16.039 9.276 12.497 19.361 27.52 17.154 20.028 2.247 16.353 38.251 20.535 15.615 16.427 7.442 10.318 29.265 24.748 13.712 9.32 21.325 15.773 15.645 Average 26.6 27.6 15.0 16.3 7.2 13.9

213