INTEGRATING POPULATION GENOMICS AND MEDICAL GENETICS FOR UNDERSTANDING THE GENETIC AETIOLOGY OF EYE TRAITS

FAN QIAO (M.Sc. University of Minnesota)

A THESIS SUBMITTED FOR THE DEGREE OF DOCTOR OF PHILOSPHY

SAW SWEE HOCK SCHOOL OF PUBLIC HEALTH NATIONAL UNIVERSITY OF SINGAPORE

2012

Acknowledgements

I would like to express my sincerest gratitude to my supervisor, Prof. Yik-

Ying Teo, for his guidance, patience and encouraging high standards in my

work through this study. He spent hours reviewing my original manuscripts,

gave constructive feedback and made detailed corrections. His support has been invaluable for me to write this doctoral thesis.

I am also deeply grateful to my supervisor, Prof. Seang-Mei Saw, for her continuous support, suggestions and providing research resources for me to accomplish my work. Her passion in research and the determination to slow the myopic progression in children has influenced me greatly.

My sincere thanks also go to Dr. Yi-Ju Li, who encouraged me to move a step forward in my career and broadened my research experience. Her unflinching courage confronting ill health will inspire me for my whole life. I am also thankful to Dr. Ching-Yu Cheng. The conversations with Ching-Yu were always valuable for me to understand the clinical relevance of ocular diseases. My thanks are also due to Dr. Chiea-Chuen Khor for his prompt comments in reviewing my papers and the insight provided. I also wish to thank Dr. Liang Kee Goh for providing the infrastructure to support me at the beginning of this research, and Prof. Terri L Young and Prof. Tien-Yin Wong for their dedication along this project.

During this research, I have worked with many collaborators for whom I have great regard. In particular, I am indebted to Dr. Veluchamy A. Barathi for performing and expression in ocular tissues. The discussion with her regarding the animal model of was an interesting exploration.

1

It is also my pleasure to acknowledge Dr. Akira Meguro and Dr. Isao Nakata

for kindly sharing their data in the replication study and stimulating

discussions.

Many thanks go to my office-mates and colleagues, Zhou Xin, Chen Peng,

Xiaoyu, Haiyang, Huijun, Rick, Queenie, Vivian, Chenwei and Wang Pei for

their cheerful discussion and a source of inspiration.

Finally, I would like to thank my family for their wholehearted support given to me - I owe everything to them.

For me, the journey over the past several years has been more like a

process of cultivation. The best way to express my gratitude is, without

attachment to a self, to help others in my life.

2

Tables of Contents

SUMMARY…………………………………………………………………..6

LIST OF TABLES………………………………………………...... 8

LIST OF FIGURES…………………………………………………...... 9

1 CHAPTER 1 INTRODUCTION ...... 12

1.1 Statistical analysis of genome-wide association studies ...... 12 1.1.1 Linkage disequilibrium based association mapping...... 12 1.1.2 Study design and analytical strategy ...... 13 1.1.2.1 Data quality control ...... 13 1.1.2.2 Population structure ...... 14 1.1.2.3 Study design ...... 16 1.1.2.4 Multiple testing ...... 17 1.1.3 Phenotype classification...... 18 1.1.3.1 Binary/quantitative traits ...... 18 1.1.3.2 Paired eye measurements ...... 19 1.1.4 Meta-analysis of genome-wide association studies ...... 22 1.1.4.1 Imputation on genotyped data ...... 22 1.1.4.2 Statistics in the meta-analysis ...... 23 1.1.4.3 Statistical challenges in analyzing multi-ethnic populations ...... 26

1.2 Recombination variation between populations ...... 28 1.2.1 Recombination and genetic diversity ...... 28 1.2.2 Variation in inter-population recombination ...... 29 1.2.3 Current approaches of quantifying recombination differences ...... 30

1.3 Refractive errors and the aetiology of myopia ...... 32 1.3.1 Types of refractive errors ...... 33 1.3.1.1 Myopia, hyperopia and ocular biometrics ...... 33 1.3.1.2 ...... 34 1.3.2 Experimental animal myopia models ...... 35 1.3.2.1 Deprivation myopia and inducing myopia ...... 35 1.3.2.2 Emmetropisation and the role of scleral changes in eye growth ...... 37 1.3.2.3 Peripheral refraction ...... 37 1.3.3 Roles of environmental factors in controlling refraction ...... 38 1.3.4 Genetic basis of myopia ...... 41 1.3.4.1 Familial aggregation and segregation...... 41 1.3.4.2 Estimates of heritability ...... 43 1.3.5 Genetic loci associated with or linked to refractive errors ...... 46 1.3.5.1 Myopic loci identified from genome-wide linkage studies ...... 46 1.3.5.2 Candidate gene studies ...... 50 1.3.5.3 Genome-wide association studies ...... 57 3

1.3.6 Intervention to slow myopia progression ...... 61

2 CHAPTER 2 STUDY AIMS ...... 65

3 CHAPTER 3 GENETIC VARIANTS ON 1Q41 INFLUENCE OCULAR AXIAL LENGTH AND HIGH MYOPIA ...... 67

3.1 Abstract ...... 67

3.2 Background ...... 68

3.3 Methods ...... 70 3.3.1 Study cohorts ...... 70 3.3.2 Data quality control ...... 74 3.3.3 Statistical methods ...... 77 3.3.4 Functional studies ...... 78 3.3.4.1 Gene expression in human ...... 78 3.3.4.2 Myopia-induced mouse model ...... 79

3.4 Results ...... 82 3.4.1 Datasets after quality control ...... 82 3.4.2 Locus at chromosome 1q41 achieved genome-wide significance ...... 83 3.4.3 Association with high myopia on the identified SNPs ...... 84 3.4.4 Gene expression ...... 85

3.5 Discussion ...... 86

4 CHAPTER 4 GENOME-WIDE META-ANALYSIS OF FIVE ASIAN COHORTS IDENTIFIES PDGFRA AS A SUSCEPTIBILITY LOCUS FOR CORNEAL ASTIGMATISM ...... 103

4.1 Abstract ...... 103

4.2 Background ...... 104

4.3 Methods ...... 106 4.3.1 Study cohorts ...... 106 4.3.2 Data quality control ...... 109 4.3.3 Statistical methods ...... 113

4.4 Results ...... 115 4.4.1 Datasets after quality control ...... 115 4.4.2 Gene PDGFRA exhibiting genome-wide significance...... 116

4.5 Discussion ...... 117

5 CHAPTER 5 GENOME-WIDE COMPARISON OF ESTIMATED RECOMBINATION RATES BETWEEN POPULATIONS ...... 130 4

5.1 Study summary ...... 130

5.2 Methods ...... 131 5.2.1 Development of recombination variation score ...... 131 5.2.2 Simulation ...... 134 5.2.3 Estimation of recombination rates ...... 137 5.2.4 Simulation ...... 138

5.2.5 SNP annotation, copy number variation and FST calculation ...... 141 5.2.6 Quantification of variations in linkage disequilibrium ...... 141

5.3 Results ...... 143 5.3.1 Simulation studies on power and false positive rates ...... 143 5.3.2 Application to HapMap and Singapore Genome Variation Project ...... 145 5.3.3 Recombination variation and Linkage disequilibrium variation highly correlated 148 5.3.4 Regions with largest recombination variation less frequent in ...... 149

5.4 Discussion ...... 149

6 CHAPTER 6 CONCLUSION ...... 181

6.1 Identified genetic variants associated with refractive errors ...... 181

6.2 Transferability of the genetic variants for refractive errors across populations 182

6.3 Statistical meta-analysis of GWAS in diverse populations ...... 184

6.4 Missing heritability of myopia ...... 185

6.5 Recombination variations and implications in genetic association studies ...... 187

7 PUBLICATIONS ...... 190

8 REFERENCES ...... 191

5

Summary For complex human diseases, identifying the underlying genetic factors

has previously primarily relied on either genome-wide linkage scans to narrow

down the chromosomal regions that are linked to disease-causing genes or the candidate gene approach based on known mechanisms of disease pathogenesis. During the past few years, genome-wide association studies have emerged as popular tools to identify genetic variants underlying common and complex diseases, greatly advancing our understanding of the genetic architecture of human diseases.

Refractive errors are complex ocular disorders, as the underlying causes are both genetic and environmental in origin. The need for continued research into the genetic aetiology of refractive errors is considerable, especially considering a mismatch between high heritability in twin studies and the paucity of evidence for associated genetic variation. This thesis seeks to address the potential roles of genetic factors involved in refractive errors.

Through a meta-analysis of three genome-wide association scans on ocular biometry of axial length in Asians, we have determined that a genetic locus on chromosome 1q41 is associated with axial length and high myopia. In addition, our meta-analysis in five genome-wide association studies in Asians has revealed that genetic variants on chromosome 4q12 are associated with corneal astigmatism, exhibiting strong and consistent effects over Chinese,

Malays and Indians.

Inter-population variation in patterns of linkage disequilibrium, largely shaped by underlying homologous recombination, influences the transferability of genetic risk loci across different populations. Understanding 6

the recombination variation provides the insight into fine-mapping of the

functional polymorphisms by leveraging on the genetic diversity of different

populations. This motivates an attempt to quantify the recombination

variations between populations. For this purpose, a quantitative measure

(varRecM) is proposed to evaluate the extent of inter-population differences in

recombination rates. Our findings suggest that significant fine-scale differences exist in the recombination profiles of Europeans, Africans and East

Asians. Regions that emerged with the strongest evidence harbour candidate genes for population-specific positive selection, and for genetic syndromes.

7

List of Tables

Table 1. Summary of analytic approaches for quantitative trait two-eye data in genome-wide association studies...... 21

Table 2. Myopia loci identified from genome-wide linkage studies...... 49

Table 3. Candidate genes studied for high myopia...... 54

Table 4. Genetic loci identified from genome-wide association studies...... 59

Table 5. Characteristics of study participants in the five Asian cohorts...... 92

-5 Table 6. Top SNPs (Pmeta-value ≤ 1 × 10 ) associated with AL from the meta3analysis in the three Asian cohorts...... 93

Table 7. Association between genetic variants at chromosome 1q41 and high myopia in the five Asian cohorts...... 94

Table 8. Characteristics of the participants in five studies...... 128

Table 9. Top SNPs (P-value ≤ 5 x 10-6) identified from combined meta- analysis of five Asian population cohorts...... 129

Table 10. varRecM scores at top percentiles for pair-wise comparisons of the three HapMap populations between CEU and JPT + CHB, CEU and YRI, YRI and JPT + CHB...... 175

Table 11. The 20 strongest signals of varRecM scores in comparisons of HapMap populations...... 176

Table 12. The 20 strongest signals of varRecM score in comparison of populations of SGVP Chinese and Indians, and Chinese and HapMap East Asians...... 179

8

List of Figures

Figure 1. Impact of population stratification on genotype frequencies in the case-control association study...... 15

Figure 2. Cross-sectional view of the human eye structure ……...... ………34

Figure 3. The implicated genes likely to be involved in the visual signal transmission and scleral remodeling…………………………………………61

Figure 4. Principal component analysis (PCA) was performed in SiMES to assess the extent of population structure...... 95

Figure 5. Principal Component Analysis (PCA) of discovery cohorts SCES, SCORM and SiMES with respect to the four population panels in phase 2 of the HapMap samples (CEU-European, YRI-African, CHB-Chinese, JPT- Japanese)...... 96

Figure 6. Quantile-Quantile (Q-Q) plots of P-values for association between all SNPs and AL in the individual cohort (A) SCES, (B) SCORM,(C) SiMES, and combined meta-analysis of the discovery cohorts (D) SCES + SCORM + SiMES...... 97

Figure 7. Manhattan plot of -log10(P) for the association on axial length from the meta-analysis in the combined cohorts of SCES, SCORM and SiMES....98

Figure 8. The chromosome 1q41 region and its association with axial length in the Asian cohorts...... 99

Figure 9. mRNA expression of ZC3H11B, SLC30A10 and LYPLAL1 in human tissues...... 100

Figure 10. Transcription quantification of ZC3H11A, SLC30A10 and LYPLAL1 in mouse retina, retinal pigment epithelium and sclera in induced myopic eyes, fellow eyes and independent control eyes...... 101

Figure 11. Immunofluorescent labelling of (A) ZC3H11A (B) SLC30A10 and (C) LYPLAL1 in mouse retina, retinal pigment epithelium and sclera in induced myopic eyes, fellow eyes and independent control eyes...... 102

Figure 12. Principal Component Analysis (PCA) of SP2, SiMES, SINDI, SCORM with respect to the population panels in phase 2 of the HapMap samples (CEU-European, YRI-African, CHB-Chinese, JPT-Japan)...... 122

Figure 13. Principal component analysis (PCA) was performed in SINDI to assess the extent of population structure...... 123

9

Figure 14. Quantile-Quantile (Q-Q) plots of P-values for association between all SNPs and corneal astigmatism in the combined meta-analysis of (A) individual cohort SP2, (B) SiMES, (C) SINID, (D) SCORM, (E) STARS and (F) SP2 + SiMES + SINDI + SCORM + STARS...... 124

Figure 15. (A) Manhattan plot of log10(P-values) in the combined discovery cohort of SP2, SiMES, SINDI, SCORM and STARS. The blue horizontal line presents the threshold of suggestive significance (P = 1.00 × 10-5). (B) Regional plot of the association signals from the meta-analysis of the five GWAS cohorts around the PDGFRA gene locus...... 125

Figure 16. Forest plot of the estimated allelic odds ratios for the lead SNP rs7677751...... 126

Figure 17. Linkage disequilibrium (LD) calculated in terms of r2 for Singapore Chinese samples from SP2 (A), Malays samples from SiMES (B) and Indians panels from SINID (C)...... 127

Figure 18. Illustration of ranking the recombination differences from two populations...... 156

Figure 19. Evaluation of false positive rates (FPR) of varRecM method.....157

Figure 20. Power performance of varRecM method...... 158

Figure 21. Accumulative density plots of varRecM scores from five pair comparisons between HapMap and SGVP populations...... 159

Figure 22. Distribution of population-specific recombination peak regions in the top 1% of the varRecM scores...... 160

Figure 23. Top regions of largest varRecM scores with overlapping signals of positive selection...... 161

Figure 24. Plots of the top 20 regions of the varRecM scores for the comparison between samples of HapMap CEU and JPT+CHB...... 164

Figure 25. Plots of the top 20 regions of the varRecM scores for the comparison between samples of HapMap CEU and YRI...... 166

Figure 26. Plots of the top 20 regions of the varRecM scores for the comparison between samples of HapMap JPT+CHB and YRI...... 168

Figure 27. Plots of the top 20 regions of the varRecM scores for the comparison between samples of SGVP CHS and INS...... 170

Figure 28. Plots of the top 20 regions of the varRecM scores for the comparison between samples of SGVP CHS and HapMap JPT+CHB...... 172

10

Figure 29. Scatter plot of varLD score versus varRecM score among HapMap and SGVP populations...... 173

Figure 30. Odds ratio of extreme varRecM scores presenting in intergenic versus gene regions...... 174

11

1 Chapter 1 Introduction

In this Chapter, I will initially introduce the genome-wide association

studies (GWAS) and the GWAS meta-analysis, and also highlight the statistical challenges for paired-eye data. Subsequently, I will provide the background and motivation of the study in inter-population recombination variations. The last section will include a literature review on the aetiology of refractive errors, particularly myopia.

1.1 Statistical analysis of genome-wide association studies

1.1.1 Linkage disequilibrium based association mapping

Mapping disease genes primarily depends on linkage studies and association mapping. The former exploits within-family correlations between the disease and the genetic markers (i.e. microsatellite) linked to disease- related genes by calculating the logarithm of odds (LOD) scores1. Mutations

for more than 1,600 Mendelian diseases have been discovered by linkage

studies; however, it is less successful for complex (polygenic) disorders.

The genome-wide design is proposed as a powerful means to identify

common variants that underlie complex human traits2,3. GWAS typically

survey between 500,000 to 1,000,000 single nucleotide polymorphisms

(SNPs) across the entire simultaneously4. Such a dense set of

SNPs (known as tag SNPs) across the genome is chosen based on the linkage

disequilibrium (LD) pattern of genotyped SNPs within a particular

chromosomal region in HapMap reference samples, thanks to the launch of the

international HapMap project5. In the simple scenario, an association study

12

compares the frequency of alleles or genotypes for a particular variant between the cases and controls. The current design of GWAS relies on genetic correlations between the genotyped markers and underlying functional polymorphisms, named LD-mapping. LD is the non-random association of alleles at two or more loci. The amount of LD depends on the difference between observed and expected (which is assumed randomly distributed) allelic frequencies. SNPs in high LD are likely to transmit to the same offspring in subsequent generations. It is hoped that a true causal SNP not genotyped in a study would be captured through a minimal level of LD with an informative nearby genotyped SNP exhibiting significant association with the disease.

1.1.2 Study design and analytical strategy

1.1.2.1 Data quality control

GWAS rely on commercial SNP chips, predominantly by Illumina

(http://www.illumina.com/) and Affymetrix (http://www.affymetrix.com/).

Regardless of the type of SNP chips used, a rigorous quality control (QC) procedure is very important to ensure the success of the study. While both

Affymetrix and Illumina have their own genotype-calling algorithms for raw data analysis, one should make sure that the best practice of genotype calling protocol is applied. Several QC check points are often examined in a GWAS including the sample call rate, Hardy-Weinberg equilibrium (HWE), the minor allele frequency (MAF), genotype missingness per marker, and population structure6. Although there is no gold standard for these QC check points, examples of thresholds that we would recommend are: excluding samples with 13

call rates <95%, and excluding SNPs which are out of HWE (p< 10-6) in control samples, MAF < 0.01, or genotype missingness >10%. Population structure is another important QC task to investigate and will be described in the next section.

1.1.2.2 Population structure

Early views of the role of population structure in genetic association studies of unrelated individuals focused on the concern that cryptic population substructure would raise the false-positive rate of statistical tests above their nominal level. For instance, in a case-control dataset, we assume that there are two underlying subpopulations with different allele frequencies at the SNP and that the number of cases is disproportionally high in one subpopulation

(Figure 1). Although genotype frequencies are identical in the cases and controls within a population 1 or population 2, it appears there are dramatic differences in CC and TT genotypes among cases and controls in the combined data. Under this scenario, the failure to account for population stratification, a confounding factor of allele frequency differences, could result in a false-positive association between a certain SNP and the disease status.

14

Figure 1. Impact of population stratification on genotype frequencies in the case-control association study. The percentages of individuals carrying different genotypes in cases in the population 1, combined populations and population 2 respectively are on top panel; analogously for controls in bottom panel. Cases are overrepresented in population 1.

Price and colleagues proposed a computational feasible approach to detect and correct population stratification7. In their approach, principal components analysis (PCA) was used to model ancestry differences between cases and controls. The EIGENSTRAT approach identifies ancestry differences among samples along eigenvectors of a covariates matrix. The ancestry outliers will be excluded from further association analyses. In addition to excluding these samples, the EIGENSTRAT approach is used to adjust the amounts attributable to ancestry for the top eigenvectors

(http://genepath.med.harvard.edu/~reich /Software.htm). Patterson and colleagues pointed out that top eigenvectors could be caused by a large set of markers in a high (or complete) LD block8. Hence they recommended pruning the markers in tight LD before performing PCA.

15

1.1.2.3 Study design

Case-control or cross-sectional study designs are widely adopted to

evaluate the association between the disease and multiple SNPs. The statistical

approach to analyse GWAS data is similar to traditional epidemiology studies,

except the same test is repeated for each SNP. Cochran-Armitage’s trend test,

χ2 test and logistical regression model are largely utilised in the case-control

design to study the overrepresentation of the mutated allele in cases versus

controls9.

Although most GWAS phenotype data, employing the existing

epidemiology cohorts, are collected longitudinally, they are usually analysed

in a case-control fashion. The incorporation of longitudinal information such

as modelling time to event and repeated measurements will add merit to

GWAS10. Analysing the longitudinal data of repeated measurements is

however computational intensive, and lacks efficient software. An alternative

way is to use the aggregate outcome of interest, i.e. changes in the outcome

over time, but the use of limited or partial data can compromise the statistical

power11.

For a family-based GWAS, the transmission disequilibrium test (TDT) is

used to measure the excessive-transmission of an allele from heterozygous

parents to the affected offspring under the condition of Mendel’s law12. TDT

has been generalised for multiple sibling using family based association tests

(FBATs)13. Such tests are extended to quantitative traits, named quantitative

transmission disequilibrium test (QTDT) and family-based association tests for quantitative traits (QFAM), and both are implemented in the QTDT software package (http://www.sph.umich.edu/csg/abecasis/QTDT/). 16

Compared to the population-based case-control design, family-based

association study in the use of trios of families is robust against the population

stratification14. However, the recruitment of parents-offsprings usually requires more research resources than that of unrelated subjects in population- based study, particularly posing challenges for late-onset diseases.

Furthermore, to obtain the similar statistical power, costs increase in genotyping trios to that of genotyping two individuals in the case-control study 15. These factors might explain the popularity of population-based

design in current GWAS.

1.1.2.4 Multiple testing

Testing multiple hypotheses simultaneously to draw the correct statistical inference is the most challenging aspect of a GWAS. It is now common to assay one million variants in a GWAS, and this effectively constitutes

1,000,000 hypothesis tests. A conventional significance threshold of 5% is thus expected to artificially identify 5,000 markers that are “correlated” to the trait. To address this issue of multiple testing, geneticists have adopted a stringent statistical significance level of 5.0 × 10-8, commonly defined as

attaining genome-wide significance, as the benchmark for evaluating the

fidelity of the association signal at each marker9. Notably, the Bonferroni

correction is simple but conservative, as assuming the independence of one

million genetic variants and all tests conducted without considering the inter-

marker correlation. Replication is thus considered as the gold standard for

GWAS publications16. Currently, the identification of candidate genetic loci

for replication is mainly driven by the level of statistical evidence from single-

17

marker association tests (either the p-value or the Bayes factor) for further

downstream functional evaluation.

1.1.3 Phenotype classification

1.1.3.1 Binary/quantitative traits

In gene mapping, ocular phenotypes are usually classified into two broad

types: qualitative (or binary) and quantitative (or continuous) traits.

Dichotomous traits have been featured in GWAS for age-related (AMD)17,18, primary open-angle (POAG)19,20,

cataract21 and high myopia22,23. The affected individuals are usually classified

on the basis of diagnosis from the worse eye or both eyes, while controls

exhibit no sign of syndrome for both eyes. Although assessing the binary

outcome is more directly relevant to clinical application, quantitative traits

(endophenotypes or intermediate traits) underlying diseases are also valuable

in the dissection of the genetic architecture, as they take the full-spectrum

measures into account. For instance, central corneal thickness (CCT) and cup- to-disc ratio (CDR) are presented as quantitative endophenotypes of open-

angle glaucoma (OPRG)24. Mapping genes for CCT25-27 and CDR28,29 in the

GWAS would shed light on the joint genetic aetiology of OPRG.

A “myopia” gene may be practically relevant to the hyperopic defocus

whereas quantitative trait locus (QTL) for affecting ocular

component growth is responsible for the entire phenotypic spectrum. It is

possible that genes involved in a quantitative trait (refractive error) also play a

role in the extreme forms of the trait (high myopia)30.

18

1.1.3.2 Paired eye measurements

Often, the primary interest in ophthalmological genetic studies is to locate

shared quantitative genetic loci (QTL) that exert effects on both eyes31-33, as

the physiological mechanism underlying inter-eye difference of phenotypic

abnormalities remains elusive and inadequately understood. Therefore, for

quantitative traits collected from both eyes, an immediate question is whether

the analyses should be performed on data from one eye or two eyes. In seven

GWAS papers on eye-related QTL that have been published

(http://www.genome.gov/gwastudies), the analytic strategies varied from the

use of right eye26,27,29 or a randomly chosen eye28 to the averaged measurement from two eyes25,34,35. Conducting analysis on one eye alone is a

simple approach to avoid the statistical model complexity. However, using

partial data of one eye only might be statistically inefficient. Averaging ocular

measurements between two eyes has been suggested to yield higher

heterogeneity estimates than using information from one eye only; therefore

this tends to have more power in genetic studies36. Using averaged ocular

measurements therefore has been the convention in QTL linkage studies in the

myopia genetics research community37-40. However, in a few scenarios the

traits might be moderately or weakly correlated between two eyes41. Neither

the use of data from one eye nor an average from both eyes is appropriate due

to the negligence of phenotypic dissimilarity.

A wide array of statistical approaches has emerged recently for the

detection of the pleiotropic genetic factors contributing to multiple correlated

traits, which could also be applied to two-eye data (see Table 1). The

simultaneous consideration of all correlated phenotypes has been shown to be 19

statistically powered to exploit pleiotropic genetic effects over univariate

analysis42-45. The first approach is to combine dependent test statistics or estimators from the univariate analyses for a global assessment on association42,46-48. In brief, GWAS tests are conducted for two eyes separately.

The two test statistics from both eyes (for example, z scores) are combined

subsequently in a linear form weighted by the covariance matrix estimates42,48.

Correcting for twice the number of markers is not relevant here since for each

marker only one global test is performed using the combined statistics. This

simple approach does not rely on any complicated model assumption as well.

The second approach is to transform multiple traits to an optimal single

phenotype with enhanced heritability, and one such example is principle

component analysis43,49. This dimension reduction technique involves

intensive computation, thus the application in two-eye data might not be

straightforward. The third one is model-based joint analysis of bivariate traits,

including generalized estimating equations (GEE)44,50-52, the mixed-effect

model45,53 and tree-based regression model54, etc. Among these, the GEE

model is most statistically efficient to perform bivariate association tests44,52.

To date, few statistical software packages incorporating model-based joint

analyses on bivariate traits are available55, and much more effort should be

devoted to this area.

20

Table 1. Summary of analytic approaches for quantitative trait two-eye data in genome-wide association studies

Approaches Comments

Data from One eye

-either eye or a Simple; less powerful if the correlation between the two randomized eye traits is low

Data from Both Eyes

Transform bivariate traits to one trait

-average measurements Simple and efficient; statistically less efficient if the correlation between bivariate traits is low and missing data are present on either eye.

-principle components Statistically powerful; complex; reduce the phenotypes analysis43,49 to a single trait; computationally intensive

Combining univariate test Simple and powerful; capable of handling paired-eye statistics traits not highly correlated; robust for partially missing trait values

Model-based approaches

-GEE44,50-52 Statistically powerful; robust for various correlation structures; efficient on both normal and nonnormal traits; complex

-mixed-effect model50 Statistically powerful; complex; robust for various correlation structures of multiple traits; computationally intensive

-tree-based regression54 Analytically complex; capable of assessing multiloci association test for multivariate traits; computation extremely intensive

21

1.1.4 Meta-analysis of genome-wide association studies

Accumulated evidence suggests that most of the GWAS are underpowered

for the variants with small effect sizes (ORs of 1.0 ~ 1.5), and the associated

SNPs generally explain a small fraction of the genetic risk56. Meta-analysis

provides a robust approach to enhance statistical power and effective sample

size by pooling evidence from multiple independent association studies57,58.

The application of meta-analysis in ophthalmology has become a standard

practice to identify genes that are associated with eye disorders26-29,34,35.

1.1.4.1 Imputation on genotyped data

If the individual GWAS is conducted with different genotyping platforms

(Illumina or Affymetrix), the meta-analysis strategy could only utilise a small

subset of overlapped markers. In addition, if the causal polymorphism is a

common untyped SNP and in varying degrees of LD with the genotyped SNP

nearby in different populations, the meta-analysis also has limited power to detect true association in the combined data. One way to address these issues

is to perform imputation using the HapMap reference panels, which provide a

powerful framework for the assessment of the complete array of genetic

variants (most of which are un-typed). Step-by-step guidelines and techniques for performing imputation-based genome-wide meta-analysis was reviewed by de Bakker and colleagues58. The development of several imputation methods

for inferring the genotypes of untyped markers has provided a solution for this

problem (for a review, see59). The basic idea behind imputation is to utilise the

correlation among untyped and typed markers to infer the genotypes of

untyped markers in each dataset. With the imputation programs becoming

22

available, we now can impute untyped markers at the first stage to allow

assessing multiple datasets for the same set of SNPs.

The accuracy of imputation largely depends on two factors. First, the

overall level of LD reflects the distance over which the genotypic correlations

permit imputation to extend, so the imputation is more accurate in high-LD

regions60. Second, the level of genetic similarity of the study population to the

reference panels affects the utility of the haplotypes copied from the reference

samples in imputing genotypes in the study populations. Imputation accuracy

based on HapMap reference panels is highest in European populations, which

are closely related to the HapMap CEU panel, and lowest in Africans with a

diverse genetic background. If GWAS are conducted in populations which are

not represented by the available high density reference panels in HapMap

data, for example, Malays and Indians, mixtures of reference panels are

recommended to maximize imputation accuracy61 .

In addition, it should be noted that imputation is generally computational intensive. IMPUT60,62, MACH62, and BEAGLE63 are the frequently used programs. Each has different strengths and weaknesses, but none of them is optimal for all situations64.

1.1.4.2 Statistics in the meta-analysis

Meta-analysis in the setting of genetic studies refers to combining

summary statistics of overlapping SNPs from multiple genetic association

studies. Since combining raw individual genotype and phenotype data across

studies to perform pooled analysis is difficult in general, the meta-analysis

23

based on the summary results is a surrogate to assess the association tests

across all datasets. Here, we describe a few meta-analysis methods in GWAS.

First, the simplest meta-analysis method is Fisher's methods Tfisher = -2 * ∑

2 log(pi), where pi is p value of study i, i=1, …, k. Tfisher follows a χ

distribution of 2k degrees of freedom where k is the total number of datasets.

Since Fisher’s method takes only information from the p-values, it is

important to keep in mind that it should be applied to the markers with the

same direction of the effect to the susceptibility of the disease.

Second, Mantel-Haenszel methods are commonly used for dichotomous

traits if the information for a 2 × 2 contingency table can be recovered from

each study65. In combining odds ratio, weight is usually given proportionally to the precision of the results in each study.

Third, if a 2 × 2 table is not available in each study, such as if p-values

were obtained from logistical regression framework in order to adjust for

potential confounding covariates, using z-score statistics to compute the meta- p values is the best option. The z-score statistics are wildly used in practice for meta-analysis since a z-score could be easily converted in each study and the

direction of effect is manifested in itself58. For quantitative traits, the pooled

weighted effect size is commonly calculated as the sum of the individual

effect size using inverse variance of each study as weight. Such an approach is

also known as a fixed-effect model under an assumption of the same expected

effect size between studies. Combined effect size is calculated as:

Tmeta = ∑Tiwi,

where Ti is the effect size of study i and wi is the inverse variance of effect

size of study i. The pooled standard error of Tmeta is: 24

SEmeta= 1 �∑푤푖 Then a pooled z-score is obtained as

zmeta = , which follows a chi-square distribution 푇푚푒푡푎 푆퐸푚푒푡푎 with 1 degree of freedom. In cases where the variance is not given in the

summary statistics or standard error is not on the same unit (for example, the

quantitative trait is not measure on the same unit), a z-score can then be

summed across multiple studies weighting them by study sample size:

= , where wi = . 훽푖 푁푖 푍푚푒푡푎 ∑ 푆퐸푖 푤푖 �푁푡표푡푎푙 It is unlikely that every dataset for a meta-analysis is derived from a single homogenous population with the same genetic effect. Therefore, it is important to access the heterogeneity across datasets. A commonly used method to assess between-study heterogeneity is called Cochran’s Q statistic, for which the large values of Cochran’s Q favour the alternative hypothesis of heterogeneity. For datasets i = 1, … , k, T1, … , Tk is the study-specific effect size. The Cochran’s Q statistic is computed by:

= ( ), = where 푘 푘 ∑푖=1 푤푖푇푖 푖 푖 푘 푄 ∑푖=1 푤 푇 − 푇 푇 ∑푖=1 푤푖

and wi is the inverse of the estimated variance in dataset i. Q is distributed as a chi-square distribution with k-1 degrees of freedom. An alternative form, statistic I2 (inconsistency), derived from Q, 100% × (Q-degree of freedom), is

a measure of the percentage of heterogeneity versus total variation across

studies. Values of I2 over 50% indicate the presence of heterogeneity. If

25

evidence of heterogeneity is demonstrated, measures to identify its possible

cause are needed before drawing any explicit conclusion.

Finally, given the presence of inter-study heterogeneity, a random-effect

model in the meta-analysis makes an assumption that individual studies are

sampled from populations that may have different true effect sizes.

Differences in observed effect sizes arise from two resources: random errors

and true variations in expected effect sizes. In practice, the meta-analysis is

conducted in diverse populations using different study design, sample

ascertainment and phenotype definition. More advanced statistical analyses

are expected to accommodate these issues in the trans-ethnic mapping66-68. No matter what statistical strategies are adopted in such scenarios, additional cohorts for replication or fine-mapping approaches are required to further investigate on the true genetic variants of interest.

1.1.4.3 Statistical challenges in analyzing multi-ethnic populations

A meta-analysis of GWAS across multi-ethnic groups enables us to

uncover the shared genetic variants underlying susceptibility to diseases, an

essential component of the next phase of GWAS to gain a broader view of

disease aetiology69. Heterogeneity, where the genetic effect exits but the effect

sizes vary in different populations, poses a major challenge in the multi-ethnic

meta-analysis. One example is the ε4 allele of the apolipoprotein E gene

(APOE), which is associated with Alzheimer’s disease in Caucasians of per- allele odds ratio above 2, but not significantly associated in African

Americans70. Therefore, the association signals at APOE are expected to dilate

in the pooled data comprising both Caucasians and Africans.

26

The ethnic-specific genetic risk offers a clue to understand the interaction of the identified genetic locus with the undetermined environmental or genetic

variables that influence diseases. The gene-gene (epistatic) or gene-

environmental interactions occur when the effect of a causal variant manifests

under a certain genetic or environmental background71. Environment can have

a substantial role to influence the effect sizes at a given susceptibility locus.

However, the detection of gene-gene or gene-environmental interaction is a

daunting task. Little robust evidence has been provided for ocular diseases.

Heterogeneity can also occur when the genetic association between the

causal variant and the genotyped SNPs varies in different ethnic groups, or in

different samples but the same population (due to the sampling error). The

different LD pattern can generate spurious associations in terms of both size

and direction of effects at the genotyped SNPs, confounding the underlying

true effect of the casual variant72.

Allele frequency also has an impact on effect sizes of the risk allele. It has been noted that wide variation in allele frequencies at susceptibility loci to the complex diseases across populations. One such example is rs19061170 in the complement factor H gene, which has a large effect size with age-related macular degeneration in Caucasians that is much smaller in East Asian populations; the risk allele is found at low frequencies in East Asians (5%), but at moderate frequencies in Caucasians (35%)73.

Allelic heterogeneity (or population-specific causal variants) is also

noteworthy, where the causal variants reside at different loci but likely in the

same functional unit across different populations. However, current meta-

27

analysis approaches for the combination of genetic association results in the

presence of allelic heterogeneity are underpowered74. Allelic heterogeneity is

believed to be enriched for rare variants, and gene-based or regional-based

meta-analysis is expected in exploring sequencing or exome sequencing data

targeting rare variants75.

1.2 Recombination variation between populations

1.2.1 Recombination and genetic diversity

Homologous recombination is one of the key evolutionary determinants of

genomic diversity through the introduction of new haplotypes that alter the

extent and pattern of linkage disequilibrium (LD)76. The most striking feature of recombination in human is the tendency to cluster in highly localized regions named ‘hotspots’ in the human genome of typical 1 to 2 kb in width77,78. Extensive heterogeneity in recombination rates has been catalogued between species79-81. In the comparison between the genomes of human and

, where even though 99% of the genomes were conserved,

remarkable differences in recombination and LD patterns were observed82,83.

Meiotic recombination landscape is transient over evolutionary time84,85 and

highly variable between individuals78,86.

Homologous recombination is an important evolutionary determinant of

genomic diversity by producing novel combination of alleles, resulting in

selection for or against new haplotypes, and linkage disequilibrium (LD)

decay76. LD mapping is one of the key features to permit the success of

genome-wide assessment by linking the untyped functional polymorphism and

28

surrounding assayed markers87,88. Recombination rate has also been widely

used as a surrogate for LD in SNP imputation algorithms60. As such,

understanding recombination variation does not only provides insight into the

genome evolutionary process that has shaped the genetic diversity along the

human history, but also builds a foundation for genetic studies to disentangle

the genes that are associated with common diseases89.

1.2.2 Variation in inter-population recombination

Within the human species, our understanding of fine-scale differences in rates of recombination between human populations remains relatively limited.

Studies have shown that, on a broad scale, recombination rates generally remain evolutionarily conserved in the entire genome90-92. Of the

approximately 30,000 potential recombination hotspots estimated from

European, African and Han Chinese ancestries, only half are common to three

populations, and the remaining are population-specific93. Significant variation

in recombination rates have been documented mostly in regions containing

polymorphic inversions90,94,95, although these comparisons have mainly been

performed at a broad scale across regions stretching megabases in lengths. At

a finer scale comparison across kilobases of the genome, population-specific

spikes or peaks in rates have similarly been reported92,96,97. Recently, Hinch

and colleagues have inferred that 2,500 recombination hotspots, defined as

localized regions of elevated recombination, are active in West Africans but

not Europeans, and that there appears to be a scarcity of hotspots that are

unique to the people of European ancestry98. This observation, along with

findings from the sequenced genes by the Seattle SNPs program96,97, suggests

29

that the intensity and location of recombination hotspots can differ

substantially across different populations. The International HapMap

Consortium5,99 provided the first large-scale database with sufficiently dense

genotyping across the human genome in multiple populations for investigating

recombination. However, there is no study to-date that systematically

interrogates the whole genome for evidence of inter-population variation in recombination rates.

Such inter-population differences in recombination patterns can provide vital opportunities in fine-mapping the functional polymorphisms that underpin the association signals from large-scale genetic studies, through leveraging on different patterns of LD in multiple populations. Understanding the similarities in recombination rates across multiple populations is also important in bioinformatic analyses that depend on recombination rates, such as in genotype imputation and in surveying the human genome for signatures of positive natural selection. Variation in recombination, particularly at disease-associated regions, is likely to have important consequences to genetic association studies.

1.2.3 Current approaches of quantifying recombination differences

Comparing recombination differences at a fine scale relies on the

availability of genetic maps at a high resolution. However, generating a

precise genome-wide map of recombination via direct experimental mapping

of hotspots is not feasible. Genetic maps of recombination commonly

employed in genotype imputation and population selection surveys, such as 30

those from the International HapMap Project5 or from the Singapore Genome

Variation Project100, are usually probabilistically inferred from genotype data

across at least tens of samples from a particular population by correlating

recombination events with the breakdown of linkage disequilibrium (LD)

observed across the population samples based on population coalescent

theory101. The resolution of these maps thus depends on the density of the

SNPs in these databases, and typically yields a resolution in the order of

kilobases. Such LD-based estimates of recombination rate are sex-averaged

over tens of thousands of generations, and are likely to be influenced by the

locus-specific demographic forces102,103. Despite the potential limitations,

these estimated rates of recombination have yielded remarkable insights into

the process of human evolution, leading to the identification of 13-basepair

motifs that are enriched in hotspots104 and the discovery of the PRDM9 gene

as a genetic modifier of recombination activity105.

Current metrics that prioritise genomic regions exhibiting differences in recombination profiles or differential presence of recombination hotspots tend to rely on ad-hoc thresholds, such as: (i) searching for recombination rates exceeding 5 cM/Mb over 2 kb in one population but yet less than 1 cM/Mb in the other population98; (ii) possessing a standardized rate of 10 over a 10kb region in one population but less than 3 or 1 in the second population106; (iii) a five-fold increase in the mean recombination rate in only one population107;

or (iv) spanning a genetic distance of more than 0.01cM within a physical

distance of less than 100kb in only one population but not the other108. Using

different definitions can alter the number and positions of the detected

31

hotspots, and simply querying whether hotspots overlap between populations

may neglect vital information on the local recombination profile.

1.3 Refractive errors and the aetiology of myopia

Refractive errors broadly comprise two types of ocular abnormalities:

spherical errors and cylindrical errors. Spherical errors include myopia

(commonly known as nearsightedness) and hyperopia (farsightedness), while

the condition of cylindrical errors is usually called astigmatism.

Myopia, a multifactorial disorder, represents one of the most common

refractive errors. It is most often associated with subsequent long-term

pathological outcomes. Myopia is caused by a variety of ocular, optical or

functional difficulties manifested while visually interacting with the external

environment109. Environmental factors such as the extent of near work, level

of educational attainment and amount of outdoor activities have been

documented to affect myopia development110. On the other hand, compelling

evidence points to the genetic basis of myopia and more than twenty myopic

loci have been reported from genome-wide linkage studies, some of which show evidence of replication in the independent studies33. In the last five

years, genome-wide association studies have suggested that several genes are

associated with myopia, which are currently awaiting further confirmation and

the assessment of their biological function.

In contrast to myopia, very little data is currently available with regard to elucidating the aetiology of astigmatism. No environmental factors have been recognised to influence the development of astigmatism. Although

32

astigmatism is a heritable trait, no prior study has reported any genes

associated with astigmatism.

1.3.1 Types of refractive errors

1.3.1.1 Myopia, hyperopia and ocular biometrics

When light rays are focused in front of the retina, leading to blurred vision

on far objects, myopia occurs (Figure 2A). Similarly, when light rays are focused behind the retina, it is called hyperopia, as the near objects are blurred.

Myopia poses a considerable public health burden. It is highly prevalent, especially in urban areas of East and Southeast Asia, where 80% of children completing high school have myopia109. Myopia-associated pathological

complications could lead to degenerative changes in the retina and the

choroid, which are not prevented by optical correction. This subsequently

increases the risk of visual impairment through myopic maculopathy,

choroidal neovascularisation and retinal detachment111,112.

Spherical refractive errors are measured on a continuous dioptric scale, an

optical power of in diopters (D) that is necessary to correct the myopic or

hyperopic eye, and are generally quantified using the spherical equivalent (SE;

the algebraic sum of the value of the sphere and half the cylindrical value).

Various categorisations have been applied when describing different refractive

states. By convention, an eye presenting with an SE beyond 0.5 D is referred

to as being hyperopic; a value between -0.5 and 0.5 D is referred to as being

emmetropic. Myopia is defined of the SE at least -1.00 or -0.50 D, and can be

33

further divided into mild (-3.0 < SE ≤ -0.5), moderate (-6.0 < SE ≤ - 3.0) and

high myopia (SE ≤ -6.0).

The spherical refractive error status is contributed by the underlying ocular

biometrics: the optical power of the and lens, and the axial length (AL)

of the eyeball (Figure 2A). AL is composed of the anterior chamber depth

(ACD), lens thickness and vitreous chamber depth (VCD)113. Particularly, myopic subjects are more likely to have a longer axial length. A 1mm increase in AL, mainly through the elongation of the vitreous chamber, is equivalent to a myopic shift of -2.00 to -3.00 D without corresponding changes in the optical power of the cornea and lens. In contrast, the differences in lens thickness and corneal curvature (CC) by comparing myopic to emmetropic subjects are minimal114. Therefore, the control of the AL and excessive

elongation of the eyes is crucial for achieving normal vision in .

A B cornea axial length

lens retina retina

Figure 2. Cross-sectional view of the human eye structure. A) myopic eye; B)

astigmatic eye.

1.3.1.2 Astigmatism

Cylindrical refractive errors commonly refer to astigmatism, where the

light rays do not bend properly to achieve a single focus point on the retina

34

(Figure 2B). While astigmatism comprises corneal and non-corneal components, it typically results from the unequal curvature of two principle meridians in the anterior surface of the cornea, which is known as corneal astigmatism115,116.

The presence of a high degree of astigmatism during early development is

believed to be associated with refractive amblyopia117-119, as evidenced by

decreased best-corrected visual acuity which cannot be remedied by external

corrective lenses. Early abnormal visual input caused by uncorrected

astigmatism can lead to orientation-dependent visual deficits, despite optical correction of visual acuity later in life120. In addition, it has been suggested

that optical blurring by astigmatism may predispose the development of

myopia121-124. Astigmatism is highly prevalent across most populations, with at least 1 in 3 adults above 30 years of age suffering from astigmatism of 0.5

D or greater125.

1.3.2 Experimental animal myopia models

1.3.2.1 Deprivation myopia and inducing myopia

Deprivation myopia occurs when the eyesight is deprived by limited illumination and degraded vision image, e.g. as a result of wearing a diffusing goggle (form deprivation myopia), or a negative/positive spectacle lens

(induced myopia) in front of the eye. Such a phenomenon has been observed in a wide range of species including the , fish, tree shrew, rhesus monkey, guinea pig and mouse126. A negative lens in front of the eye induces

hyperopic defocus (image behind the retina photoreceptors) that results in the

elongation of the eyeball to compensate for the optical effects of the lens. 35

Therefore the eye becomes myopic because of the excessive elongated axial

length. Analogously, a positive lens causes the image to form in front of the

retina (myopic defocus) and reduces the ocular growth rate. Nevertheless,

findings related to its association with hyperopia are less consistent in primates as compared to chicks and mice. It is noteworthy to mention that, in

the animal model, such induced myopia or hyperopia generally shows a

significant degree of recovery after lens removal127.

Myopia induction in animals following alteration of the visual input requires the eye or brain to be able to distinguish myopic defocus from hyperopic defocus. Although the retina produces biochemical signals which control eye growth in response to local defocus, it has become clear that both retinal and central elements play roles in the emmetropisation process, with the central nervous system exhibiting fine-tuning128. Of particular interest are

genes that express in the opposite direction in ocular tissues when subjects

alternately wear negative and positive lenses. The transcription factor ZENK, a

so-called ‘STOP” sign for myopia, is found expressed differently within

glucagonergic amacrine cells in myopia- versus hyperopia-induced mouse models129. Dopamine has been shown to be involved in the optical regulation

of eye growth in myopia-induced animal model, while its gene expression is

also mainly restricted to the amacrine cells in the retina130. Muscarinic

receptors are known to regulate several important physiologic processes in eye

growth, and antagonists to these receptors, such as atropine and pirenzepine,

are effective in stopping the excessive ocular growth that results in myopia131.

However, the primary mechanism of genes as well as the pathway used by the

eye to detect the direction of defocus remains unclear. 36

1.3.2.2 Emmetropisation and the role of scleral changes in eye growth

The endogenous process in matching axial length to the focal place in the

eye growth is called emmetropisation. This biological mechanism involves the

detection of myopic or hyperopic defocus at the retina, signal transmission

across the retinal pigment epithelium and choroid, and alteration of the scleral

matrix132. In myopic human eyes, it is speculated that the emmetropisation

mechanism is defective, with a loss of ability to use myopia defocus to slow

the growth of the eyes. In this case, eyes will gradually become more myopic.

This diminished emmetropisation was also observed in an animal study

showing that wearing positive lens in myopic-induced older tree shrews had

less of an effect; most of the eyes remained myopic while wearing the lens in

older tree shrews, which was in contrast to what was found in infant tree

shrews133.

A larger body of evidence shows that the changes in refraction in animal

models are primarily due to changes in AL, rather than in corneal or lens

parameters. In the process of AL elongation, scleral remodelling plays a

pivotal role in eye size regulation, with sclera thinning and changes in

collagen fibril architecture through the turnover of extra-cellular matrix

(ECM) materials; this is evident in mammals134.

1.3.2.3 Peripheral refraction

There is a growing interest in understanding the role of peripheral

refraction in controlling eye growth. The visual optical device implemented in

the animal model that affects the entire field of view can alter the pattern of peripheral refraction as well. Emerging evidence in animal studies suggests

37

that the peripheral retinal signals can dominate axial growth and central

refractive development when there are conflicting visual signals in the central

and peripheral retina135. Smith and colleagues found that infant monkeys with

peripheral form deprivation but intact central vision were significantly less

hyperopic or more myopic compared to the age-matched controls, suggesting

that the peripheral retina contributes to emmetropising responses136. A recent

study in monkeys also showed that foveal ablation by itself did not produce

alterations in either the central or peripheral refractive errors of treated eyes

137. However, emmetropisation appears not to be affected by changes in

peripheral refraction in chicks, possibly due to different patterns in the

distribution of photoreceptors on the retina in chicks and primates138.

1.3.3 Roles of environmental factors in controlling human refraction

Numerous studies support factors such as the level of educational attainment, near work and outdoor activities having an effect on myopia onset or progression. Evidence has also recently emerged to support a potential role of peripheral refractive errors in myopia development.

Level of education has been consistently associated with myopia across different ethnic groups in a large number of epidemiological studies, where higher academic achievements appear to be positively correlated with myopia139-141. Education level usually correlates with the time spent on reading and writing, so this can be treated as a surrogate of near work.

Near work has long been regarded as an important factor for the

development of myopia. Under the accommodation theory, the eye increases

38

its optical power to maintain a clear visual image when reviewing a near object. The active accommodation to a near target is the driver to move the retina toward the conjugate point, resulting in axial eye elongation142. In humans, clinical evidence has showed that atropine, a potent cycloplegic, inhibits myopic progression under the non-accommodative mechanism143. The important role of near work may also be implicated from the fact that more intensive near work concurs with the increased prevalence of myopia seen in the past few decades. In a longitudinal cohort study in school children aged 7 to 9 years (Singapore cohort study for risk factors of myopia; SCORM), students who read more than two books per week were three times more likely to have higher myopia compared to those who read less than two books per week144. In a population-based sample of 12-year-old Australian schoolchildren recruited in the Sydney Myopia Study (SMS), the positive association between the time of continuously reading (>30 min)/reading distance (<30cm) and myopia was maintained145. It is noteworthy that the association between near work and myopia assessed so far is probably not very precise, partially because of the inaccuracy of the questionnaire in measuring the amount of near work. Although it remains to be elucidated whether the total amount of time of near work, or the modifiable reading habit

(i.e. intensity of reading, reading posture, or proper lighting), is associated with myopia, most ophthalmologists recommend taking breaks during periods of high-intensity near work to prevent the onset of myopia.

A recent study reported that the number of sports and hours of outdoor activity, rather than the number of reading hours per week, was significantly associated with myopia in children aged 8 to 13 years in the U.S146. Similarly, 39

students in Australia who had performed high levels of outdoor activity were

found to be more hyperopic than those involved in a low amount of outdoor

activity147, where the protective effect of outdoor activities appeared to be

associated with the total time spent outdoors, rather than sports per se. The

biological basis however remains unclear. The authors postulated that high

light intensity promotes the retinal transmitter dopamine, which can inhibit

eye elongation147,148. Recent animal experiments suggested that the bright light

level could significantly retard the development of form-deprivation myopia149. An alternative possibility is the greater depth of three-dimensional

structures from a large outdoor field, which could impact the patterns of

defocus on the retina150.

Studies in primates have shown that hyperopic defocus in the retinal

periphery could stimulate foveal myopic progression135. In humans, myopes

tend to show less myopic off-axis (periphery hyperiopia) because of their

relatively less oblate ocular shape, while hyperopic eyes tend to show less

hyperopic off-axis (periphery myopia)151,152. In a longitudinal cohort of about

1,000 children aged between 6 and 14 years in the Collaborative Longitudinal

Evaluation of Ethnicity and Refractive Error (CLEERE) study in the U.S.,

children who became myopic were more hyperopic relative peripheral

refractive errors than emmetropes from 2 years before the onset through to 5

years after the onset of myopia153. Following 2,043 non-myopic third-grade

children for five years, Mutti and colleagues recently found that the

association between hyperopic peripheral defocus and myopia onset was

inconsistent across different ethnic groups, as the highest risk for myopia was

noted in Asian children154. Myopia progression was greater per diopter of 40

more hyperopic relative peripheral refractive error, but only by a small amount

(−0.024 D per year).

Other environmental factors such as higher glycaemic index diets, body

mass index, and early exposure to night lights or depth of the field have also been postulated for myopia development155. Collectively, the environmental

factors account for a small proportion of the variation in myopia156.

1.3.4 Genetic basis of myopia

Evidence from family and twin studies has supported a substantial genetic

component in myopia, astigmatism and underlying ocular parameters156-163.

The following section mainly highlights the findings for myopia, and relevant ocular biometrics. The genetic basis for corneal astigmatism will be addressed in Chapter 4.

Syndromic myopia refers to the condition where high myopia is one of the symptoms of a Mendelian disorder, such as connective tissue disorders

(Marfan and Sticker syndromes). In the following section, I will focus on simple non-syndromic myopia rather than degenerated syndromic myopia.

1.3.4.1 Familial aggregation and segregation

Familial aggregation represents a scenario where relatives with the same genes and/or shared environmental factors are more phenotypically similar than unrelated individuals. The recurrence risk (λ) refers to the risk of being affected given an affected sibling, relative to the risk of being affected in the overall population. The study of familial aggregation is a central theme in

41

genetic field in order to evaluate the relevant genetic and environmental

effects.

Ample evidence shows the clustering of myopia within a family. The Old

Order Amish (OOA) is an isolated population with homogeneous agrarian

lifestyle, representing an ideal population to investigate the genetic heritability

and familial aggregation of refractive errors. In 967 siblings in 269 OOA

families, λs were estimated at 2.36 (95% CI: 1.65-3.19) and 4.52 (95% CI:

2.27 – 8.38) for mild (SE ≤ -0.5 D) and moderate myopia (SE ≤ -2.5 D), respectively164. In 296 siblings in a cohort from the United Kingdom, λs were

approximately 5 to 20 for high myopia (SE ≤ - 6.0 D), and 1.5 to 3 for moderate myopia (-3.0 D ≤ SE ≤ -1.0 D)165. Similarly, in 1,259 nuclear

families of Tehran, for myopia at threshold of SE at -0.5 D or -2.0 D, λs

ranged from 2.09 to 3.86 in siblings and 1.82 to 3.81 among parent-offspring

pairs166. Strong familial aggregation for myopia was also observed in 759

siblings in 241 families recruited from the Salisbury Eye Evaluation Study in

Maryland167, where the odds ratios of myopia (SE ≤ -0.5 D) or moderate myopia (SE ≤ - 2.0 D) were greater than 2 for an individual having a myopic sibling relative to non-myopic sibling. Interestingly, in the Framingham Eye

Study cohort, the estimated odds ratio of myopia (SE ≤ -1.0 D) was 5 for an age difference of 2 years, while it was reduced to half that for an age difference of 10 years, showing a pattern of the decrease in the strength of the between-sibling association for myopia with increasing between-sibling age differences168.

Eye size has also been shown to vary in children according to the parental

history of myopia. A cross-sectional study of 716 schoolchildren (aged 6 to 14 42

years) demonstrated that children with two myopic parents had longer eyes

and less hyperopic refractive error than children with only one myopic parent

or no myopic parents169. A one-year follow-up study in 4,468 Hong Kong

Chinese children aged 5 to 16 years showed more rapid changes in AL growth

and myopia progression among children with a stronger parental history of

myopia170. In SCORM, parental myopia was significantly associated with the

3 year change in AL after adjustment for school, age, sex, race, and reading in

books per week114. As a contrast, in a longitudinal cohort of the Beaver Dam

Eye study, although strong familial aggregation was evident for a sibling to be myopic (λ = 4.18), no statistically significant correlation between siblings in changes of refraction, cylinder power or astigmatism were seen171.

Based on the evidence illustrated from familial aggregation studies, segregation analysis has been further adopted to elucidate the inheritance mode for myopia: a single gene of a large effect (monogenetic) versus multiple genes of small or modest effects (polygenetic). Through the complex segregation analysis, a study conducted in nuclear families of both Japanese and European ancestries has failed to support the influence of a single major gene on refraction172. Furthermore, Klein and colleagues showed that the

heritance mode of refection was polygenic, rather than showing single–gene

Mendelian inheritance, in 620 extended pedigrees of 2138 Caucasians from

the Beaver Dam Eye Study40.

1.3.4.2 Estimates of heritability

Heritability is the proportion of phenotypic variation that is attributable to

genetic variation among subjects. Both twin studies and families data are

43

utilised to estimate the heritability of the trait of interest. In the simplest

scenario, twin studies examine whether the identical monozygotic (MZ) twins

raised together are more similar to each other for a given phenotype than

dizygotic (DZ) twins, who share half of their genes. The correlation between

MZ twins is expected to be twice that of DZ twins under the assumption of

additive genetic effects. The heritability (h2) is thus approximately twice of the

difference in the correlation of MZ and DZ twin pairs:

2 h = 2(rMZ – rDZ) where r = inter-pair correlation

Since the similarity of the phenotype for MZ twins in contrast to DZ twins

could also be due to shared environment factors, the underlying Equal

Environment Assumption (EEA) in the heritability estimation for MZ and DZ

twin is assumed to hold in the heritability studies.

Family studies are also adopted to estimate heritability in a more

complicated familial relationship. Provided the genetic covariance between

individuals of differing degree of relatedness, the Structure Equation

Modelling (SEM) method is widely used in the heritability estimation for twin

and family data, by solving a series of linear equations simultaneously and

permitting the joint estimation of the parameters.

The heritability estimate was 61% (95% CI: 34% - 88%) for ocular refractive error (SE) in 759 siblings of 241 families residing in Maryland167.

Similarly, the estimation was 69% (95% CI: 58%–85%) in the OOA population164,173. Twin studies have yielded higher heritability for refractive

errors, with 83% (95% CI: 77% - 87%) for 90 MZ and 86 DZ female twins in

44

Finland, and 85% in 226 MZ and 280 DZ twin pairs aged 49 to 79 years from the United Kingdom using a model with additive genetic and environmental components157. Recently, a large twin study comprising of 1,152 MZ and

1,149 DZ twin pairs yielded similar results (heritability for SE =0.77; 95% CI:

0.68-0.84)162. However, interpreting results from twin studies should be performed with caution. The heritability calculation might be overestimated as the excess correlation in MZ twins compared to DZ twins might be also due the larger shared environmental exposure within MZ twins.

Ample evidence supports an underlying genetic basis, not only for refractive error but also for underlying ocular dimensions. The majority of heritability estimates for ocular parameters are found to be greater than 0.5.

For instance, in 2,375 participants in a population-based family cohort from the Beaver Dam Eye Study, Klein and colleagues reported that the adjusted heritability estimates were 0.58 ± 0.13 for SE, 0.67 ± 0.14 for AL, 0.78 ± 0.14 for ACD, and 0.95 ± 0.11 for CC159. In 723 participants from 132 pedigrees in

Australia in the Gene in Myopia (GEM) family study, similar adjusted heritability estimates were reported for SE at 0.50 ± 0.05, 0.73 ± 0.04 for AL,

0.78 ± 0.04 for ACD, and 0.16 ± 0.06 for CC156. In 345 MZ and 267 DZ twin pairs between the ages of 18 and 88 years in Austria, the heritability was estimated in the men and women separately. The heritability values were found 0.88 and 0.75 in the men and woman for SE, 0.94 and 0.92 for AL, and

0.51 and 0.78 for ACD, respectively158. For a comprehensive review on the heritability for the ocular traits, please refer to a recent review article174.

45

1.3.5 Genetic loci associated with or linked to refractive errors

Genetic variants that predispose an individual to an increased risk of

developing myopia have been studied for several decades. The majority of

near-sighted individuals develop juvenile onset myopia (or school myopia),

with a peak incidence between 10 and 16 years of age33. A more severe form

of pathological myopia is contrasted with juvenile-onset myopia by its earlier

age of onset, the greater amount of myopia (≤ -6.0 D), and a higher frequency

of fundus abnormalities. As a result, the following questions may arise: what

are the genetic variants involved in various level of myopia? Are the genetic

factors for pathologic myopia (or early-onset myopia) similar to those for

common and moderate myopia (juvenile-onset myopia)?

1.3.5.1 Myopic loci identified from genome-wide linkage studies

More than 20 putative myopia loci have been implicated in the last two decades through genome-wide linkage studies (Table 2). Typically, most studies utilised families with a history of high myopia, and a severe form of myopia usually defined as SE ≤ -6.00 D; as a result, a number of genetic loci have been reported to link to high myopia. The list includes genetic loci on chromosome 2p37.1175, 4p22-q27176, 5p15.33-15.2177, 5p15.1-13.1178, 7p15179,

7p36180, 10q21.1181, 12q21-23182, 14q22.1 – 14q24.2183, 17q21-22184,

18p11.31185, Xq23-27.2186 and Xq28187. Of these implicated loci predisposed

to high myopia, several loci have been replicated in independent samples. For

instance, chromosome 7p15 (MYP17) and 7p36 (MYP4) were subsequently

replicated for quantitative refraction in American pedigrees188. The

46

chromosomal locus 12q21-23 (MYP3) was found to associate with high

myopia in an independent Caucasian samples from Europe189. The genetic

locus 18p11.31 (MYP2) identified from Chinese and American multi-

generation families showed significant association in 15 Chinese families190.

Further sequencing of these linkage peaks within MYP2, however, failed to

establish any genes involved in myopia development in the Caucasian

population191. A locus on chromosome Xq23-27.2 (MYP13) was replicated in an independent Chinese family study192, adjacent to MYP1 (Xq28). Although

a few genetic regions, i.e. loci at 7p15 (MYP17) and 7p36 (MYP4), have

shown association to quantitative refraction in replication studies, others,

including 18p (MYP2) and 12q (MYP3), do not appear to play a similar role

in moderate or mild myopia193,194.

A study on a milder form of myopia as affected samples defined as SE ≤ -

1.00 D pointed to a myopia locus to chromosome 22q12 (MYP6) in a large sample set of Ashkenazi Jewish individuals195. MYP6 was replicated in two

studies using either the same definition for myopia196 or quantitative trait of

refraction197.

A few genome-wide linkage studies have been conducted for quantitative

refraction. The implicated genetic regions pinpointed to chromosome loci

3q26, 4q12, 8p23 and 11p13 in 221 dizygotic twins39 and 1p36 in 49

multigenerational Ashkenazi Jewish families198. Hammon and colleagues

identified the strongest linkage signal on chromosome 11p13 (MYP7), which

contains the biologically relevant candidate paired box gene 6 (PAX6)39.

Chromosome 8p23 (MYP10) was further replicated in an OOA population for mild myopia199. 3q26 (MYP8) was also mapped to quantitative refraction and 47

hyperopia via a linkage scan188. Wojciechowski and colleagues performed a

genome-wide linkage meta-analysis for refractive error in pooled samples of

Caucasians, OOAs, African Americans and Ashkenazi Jewish subjects, but

chromosome 1q36 (MYP14) showed no significant association200. In an

international collaborative effort combining 254 pedigrees of Caucasian

families for quantitative refraction, the MYP14 genetic locus was replicated

with multipoint linkage scores above 1.5201.

Although twenty regions in the human genome susceptible to refractive

error and myopia have been implicated, no myopia genes have been

consistently identified from these regions by fine-mapping or deep-

sequencing. In addition, it is still an open question whether there is any

overlap between genes that might be responsible for both the rarer forms of

high myopia and the more common, less severe juvenile-onset myopia. This scenario reflects the complexity in the disease architecture of myopia pathogenesis.

AL is the most important endophenotype of myopia. Presently there have been only two genome-wide linkage studies performed in European descent populations that suggested the presence of AL quantitative trait loci (QTLs) on 2p24202 and 5q (at 98 centimorgans) along with two classical

genetic loci (MYP3 at 12q21 and MYP9 at 4q12)203, but there are no reports

of any genes that are indisputably confirmed to be associated with AL.

48

Table 2 Myopia loci identified from genome-wide linkage studies. Location Myopia Study Inherita Affected status AAO LOD Ref locus population nce (yrs)

1p36 MYP14 Ashkenazi QTL SE ≤ -1.00; na 9.54 198 Jewish mean SE -3.46 Caucasian QTL mean SE −5.74 na 1.5 201 2q37.1 MYP12 Caucasian AD SE ≤ -7.25; mean <12 4.75 175 SE -14.46 3q26 MYP8 English QTL mean SE < 0 na 3.7 39 [-12.12, 7.25] English AD mean SE -0.55 na 204 American QTL Mean SE 0.44 na P=5.34×10-4 188 [-12.12, 8.38] 3q28 MYP Caucasian AD SE ≤ -5.00 na 11.5 205 unknown [-5,-18] 4q12 MYP9 UK QTL mean SE < 0 na 3.3 39 [-12.12, 7.25] 4p22 - q27 MYP11 Han Chinese AD SE [-5 ,-20] <6 3.11 176 5p15.33 -p15.2 MYP16 Han Chinese AD SE ≤ -6.00 na 4.81 177 Mean SE -11.59 5p15.1 -p13.3 MYP19 Han Chinese AD SE ≤ -6.00 & AL <12 3.02 178 >26 7p15 MYP17 French and AD SE ≤ -6.00 na 4.07 179 Algerian African QTL SE: 2.87 ± 3.58 na 5.87 38 American American QTL Mean SE: 0.44 na P=9.43 ×10-4 188 [-12.12, 8.38] 7q36 MYP4 French and AD SE ≤ -6.00 na 2.81 180 Algerian American QTL Mean SE:0.44 na P=2.32 ×10-3 188 [-12.12 , 8.38] 8p23 MYP10 UK QTL -12.12 ≤ SE ≤ na 4.1 39 7.25 Old Order AD SE ≤ -1.00 na 2.03 199 Amish 10q21.1 MYP15 Caucasian AD SE ≤ -6.00 na 3.22 181 11p13 MYP7 UK QTL SE [-12.12, 7.25] na 6.1 39 Ashkenazi AD SE ≤ -1.00 na >2 206 Jewish 12q21-23 MYP3 German/Italian AD SE ≤ -6.00 5.9 3.85 182 UK AD SE ≤ -6.00 na 2.54 189 14q22.1- q24.2 MYP18 Chinese AR SE ≤ -6.00 na 2.19 183 17q21-22 MYP5 English/Canadi AD SE ≤ -6.00, mean 8.9 3.17 184 an SE -13.95 18p11.31 MYP2 American & AD SE ≤ -6.00 6.8 9.59 185 Chinese Hong Kong SE ≤ -6.00 na 2.1 207 Chinese 22q12 MYP6 American AD SE ≤ -1.00 na 3.54 195 families of Jewish descent Jewish descent AD SE ≤ -1.00 na 4.73 196 American QTL Mean SE 0.44 na P=0.0033 188,1 [-12.12 , 8.38] 97 Xq23 - 25 MYP13 Han Chinese XR SE ≤ -6.00 <6 2.75 186 Xq23 – 27.2 Han Chinese XR SE ≤ -6.00 na 2.79 192 Xq28 MYP1 Danish XR SE ≤ -6.00 1.5-5 4.8 187 Asian Indian XR SE ≤ -6.00 na 5.3 208

49

SE – spherical equivalent in diopters; AL – axial length in mm; LOD-logarithm of odds; AAO – age at onset of myopia; na- not available.

1.3.5.2 Candidate gene studies

A number of candidate genes have been queried in genetic studies of

refractive errors based on their potential biological function, differential gene

expression in animal models, or until recently, their physical locations residing

in known myopia loci identified from the linkage studies or genome-wide association studies. Table 3 summarises the candidate genes studied for myopia at the time of writing this thesis.

Sclera thinning is an important process involved in the development of myopia. 90% of the human sclera is composed of ECM collagen fibrils. The mutation in collagen genes causes autosomal dominant syndromes, such as osteogenesis imperfecta, Ehlers-Danlos syndrome (collage, type 1, COL1A1) and Stickler’s syndrome (collagen, type 2, COL2A1). A positive association with COL1A1 was observed, but only in a Japanese study209. In contrast, two

studies supported the role of COL2A1 in the pathogenesis of myopia in 123

nuclear families of mixed ethnicities (62% Caucasian)210 and in a Duke-

Cardiff cohort totalling 276 families of Caucasians211. The SNP identified in

these two cohorts was rs1635529. However, the assessment of rs1635529 or

other investigated SNPs in COL2A1 for high myopia has failed to show

significance in the Chinese population212,213.

The structural organisation of the ECM also largely depends on the

cellular activity of fibroblasts, which is partially regulated by members of a

major family of zinc- and calcium-dependent endopeptidases: the matrix metalloproteinases (MMPs) and tissue inhibitors of metalloproteinases 50

(TIMP)214. In a family-based study, Wojciechowski and colleagues found

positive associations for rs1939008 (MMP1) and rs9928731 (MMP2) with

quantitative refraction in OOA families, but not Ashkenazi Jewish (ASHK)

families215. The authors speculated that strong environmental exposure may mask the effect in ASHK society where much a more rigorous schooling system was implemented. Significant association of MMP3 and MMP9 for

low myopia in the Caucasian population has also been observed216. However,

MMP isoforms 1, 2 and 3 do not appear to play critical roles in the

development of high myopia in the Japanese population217. The MMP3 and

TIMP1 genes also showed no significant association for high myopia in

Taiwanese men218.

Hepatocyte growth factor (HGF) and its receptor, cMET, which modulates

the MMP and TIMP pathways, have been proposed to play an active role in

the scleral remodelling. The association of rs373350 in HGF with myopia was

first reported in 128 families with severe myopic sibs defined as SE ≤ -10.0

D219. For the Caucasian population, the same SNP was found to be associated with HGF in 146 multiplex Caucasian families for moderate myopia220, but a

different set of SNPs within HGF showed significant association221. The

association of cMET with high myopia was noted in school children in

Singapore (Khor, et al., 2009), but these findings need further assessment in other studies.

Transforming growth factor-β (TGF-β) is known to initiate myofibroblast formation and fibrosis as well as the production of collagen222. Although five

members of the TGF-β family have currently been identified, only TGF-β

isoforms 1, 2, and 3 are detected in eyes, and TGF-β2 is the predominant 51

form. The mRNA levels of TGF-β are significantly reduced in retina, choroid and sclera in correlation with axial elongation of myopia (Liu et al., 1996).

The activation of TGFB1 expression in ocular tissues mediated by

transcription factor EGR1 (early growth response type 1 gene) is thought to

influence ocular elongation. The SNP rs1800470 in TGFB1, which is

associated with high myopia, was first reported in Taiwanese individuals (201 cases and 86 controls)223. Another SNP (rs4803455) in TGFB1 also showed a

significant association with high myopia in two Chinese studies224,225. With

regard to TGFB2, emerging evidence suggests its role in the development of

myopia, as rs7550232 in the promoter region in TGFβ2 was implicated with

high myopia226.

Insulin-like growth factor 1 (IGF-1) lies within the MYP3 loci and plays

an important role in cell proliferation, differentiation and apoptosis. It has

been speculated that high-glycaemic load carbohydrate diets may contribute to

the development and progression of myopia, perhaps by affecting sensitivity

to insulin or increasing free-circulating IGF-1 levels227. The role of the IGF-1 gene in association with human myopia was first examined in 265 multiplex families with 1,391 Caucasians, revealing significant associations between

SNP rs6214 with both high and mild myopia228. The association of IGF1 gene

with myopia was replicated in several Chinese populations229,230.

Trabecular-meshwork-inducible glucocorticoid response (TIGR) gene

encodes myocillin (MYOC). The expression of MYOC in the trabecular

meshwork is affected by TGFB and mechanical stretching. In addition, the

ECM components comprise small leucine-rich /proteoglycans

(SLRPs) to regulate collagen fibril diameter, including lumican (LUM), 52

fibromodulin (FMOD), prolargin (PRELP) and opticin (OPTC)231. Given their roles in ECM assembly and expression in the eye, these candidates are likely to regulate the shape and size of eyes. However, there has been no data so far to support the roles of TIGR, MYOC and SLRPs in myopia development.

In addition to the growth factors and extracellular matrix components, a

‘major’ gene controlling the activity of many ‘myopia genes’ is also proposed to be involved in eye growth. For example, paired box (PAX) genes encode tissue-specific transcription factors and, among those, the PAX6 gene is involved in oculogenesis and ocular growth. The correct dosage of PAX6 is crucial for normal eye development: over-expression of Pax6 in mice results in microphthalmia, retinal dysplasia and defective retinal ganglion cell axon guidance232. Interestingly, despite a lack of evidence in the Caucasian population, three positive association studies in Chinese individuals suggest the involvement of PAX6 in the pathogenesis of high myopia232-234.

53

Table 3 Candidate genes studied for myopia.

Chr Gene Ref Phenotype Study population (sample Chosen criteria Significant sizes) SNPs 1 FMOD 226 High myopia Chinese ECM ns (rs7543418) (195 cases, 94 controls) 1 TGFB2 226 High myopia Chinese ECM rs7550232 (195 cases, 94 controls) ns (rs991967) 1 TIGR 235 High myopia Chinese MYP2 ns (MYOC) (70 cases, 69 controls) 236 High myopia Chinese rs235858, (162 families) rs2421853 237 High myopia Caucasian ns (164 Cardiff and 87 Duke families ) 3 SOX2 238 Low/moderate Caucasian Eye growth ns (n=596) 3 LEPREL1 205 High myopia Caucasian Linkage study c.1523 G>F ( a pedigree) 4 FGF2 210 Low/moderate American ECM/gene ns myopia (62% Caucasian, 123 expression families) 226 High myopia Chinese ns(rs308395, (195 cases, 94 controls) rs41348645) 239 High myopia Chinese ns (1064 cases, 1001 controls) 5 CTNND2 240 High myopia Chinese MYP16/ GWAS rs6885224 (287 cases, 673 controls) 7 cMET 220 All Caucasian ECM ns (146 multiplex families) 241 Moderate Chinese (n=650) rs2073560 myopia 242 Low/moderate, Caucasian (n=818 ) ns hypermetropric 7 HGF 219 High myopia Han Chinese ECM rs3735520 (SE ≤ -10.0 ) (128 families) 243 High myopia Chinese ns (288 cases, 208 controls) 220 All Caucasian rs3735520 (146 multiplex families) rs2286194

221 All Caucasian (n=551) rs1743, rs4732402, rs12536657, rs10272030, rs9642131. rs5745718, rs12536657 244 VCD Chinese (n=814) rs3735520 11 BDNF 210 Low/moderate American Function ns myopia (62% Caucasian, 123 families) 11 CHRM1 245 High myopia Chinese Function rs11823728, (194 cases, 109 controls) rs544978, rs5422969 11 MMP1 216 Low myopia Caucasian ECM/gene ns (SE ≤ -1.0) (n=366) expression 215 SPH (spherical AMISH (n=358) rs1939008 component of ASHK (n=535) (AMISH only) refraction) 217 High myopia Japanese ns (SE ≤ -6.0 & AL (case 725, controls 546) ≥ 26) 11 MMP3 218 High myopia Taiwanese ECM/gene ns (case 216; control 474) expression 216 Low myopia Caucasian (n=366) -1612 5A/6A (SE ≤ -1.0) 54

217 High myopia Japanese ns (SE ≤ -6.0 & AL (case 725, controls 546) ≥ 26) 11 PAX6 39 Low/moderate UK (221 dizygotic twin) MYP7 ns myopia 210 Low myopia American ns (SE ≤ -0.75) (62% Caucasian, 123 families) 238 Low/moderate Caucasian (n=596) ns myopia 233 High myopia Chinese rs667773 (188 cases, 85 controls) 234 High myopia Chinese (164 families with rs3026393 170 highly myopic offspring) 232 High myopia ( Chinese rs12421026, SE ≤ -8.0) (300 cases, 300 controls) rs2071754, rs3026393, rs1506 and rs12421026 12 COL2A1 210 Low myopia USA ECM rs1635529 (SE ≤ -0.75) (62% Caucasian, 123 families) 211 High myopia English/Welsh rs1034762, (SE ≤ -5.0) (146 Duke and 130 Cardiff rs1635529, families) rs1793933, rs3803183, rs17122571, rs1635529 212 High myopia Chinese ns (697 cases, 762 controls) 213 High myopia Chinese ns (581 cases, 384 controls) 12 DCN 246 High myopia Taiwanese ECM ns (120 cases, 137 controls) 247 High myopia Taiwanese ns (120 cases, 137 controls) 12 DSPG3 247 High myopia Taiwanese ns (120 cases, 137 controls) 12 LUM 246 High myopia Taiwanese ECM rs3759223 (120 cases, 137 controls) 243 High myopia Chinese ns (288 cases and 208 controls) 248 High myopia English and Finnish n/a (125 cases, 308 controls) 249 High myopia Chinese c1567 ( 201 cases, 86 controls) 247 High myopia Taiwanese rs3759223, (120 cases, 137 controls) rs3741834 12 IGF1 228 High myopia Caucasian (265 families) MYP3 and animal rs10860860, models rs2946834, rs6214 250 High myopia Polish (42 multiplex families) ns 230 High myopia Chinese rs12423791 (421 cases, 401 controls) 244 Lens thickness Chinese (n=814) rs6214 229 High myopia Chinese Haplotype (300 cases, 300 controls) rs12423791, rs7956547, rs5742632 15 GJD2 251 High myopia Japanese GWAS r634990, rs524952 (1125 cases, 929 controls) 244 Ocular Chinese (n=814) ns biometrics 15 RASGRF1 252 High myopia Chinese GWAS rs2218817, (640 cases, 1052 controls) rs10034228, rs1585471 251 High myopia Japanese rs8027441 (1125 cases, 929 controls) 16 MMP2 215 SPH (spherical AMISH (n=358) Gene expression rs9928731

55

component of ASHK (n=535) refraction) 217 High myopia Japanese ns (SE ≤ -6.0 &AL (case 725, controls 546) >=26) 17 COL1A1 253 High myopia Taiwanese MYP5 ns (471 cases, 623 controls) 209 High myopia Japanese rs2075555, (SE ≤ -9.25) (330 cases, 330 controls) rs2269336 254 High myopia Japanese ns (SE ≤ -6.0 and (427 cases, 420 controls) AL ≥ 26) 211 High myopia English/Welsh (146 Duke and ns (SE ≤ -5.0) 130 Cardiff families) 212 High myopia Chinese ns (SE ≤ -6.0 and (697 cases, 762 controls) AL ≥ 26) 17 RARA 255 Low/moderate, Caucasian (n=802) Gene expression ns hyperopic 18 TGIF 190 High myopia Chinese MYP2 657 (T - > G) (71 cases, 105 controls) 256 High myopia Caucasian (n=21, ns sequencing) 257 High myopia Japanese ns (SE ≤ -9.0) (330 cases, 330 controls) 258 AL, CC, Low Caucasian rs8082866, myopia (SE≤ - (257 cases, 294 controls) rs2020436 0.5) ns after multiple testing 243 High myopia Chinese ns (288 cases, 208 controls) 19 TGFB1 223 High myopia Taiwanese Gene rs1800470 (201 cases, 86 controls) expression/function 259 High myopia Japanese ns (SE ≤ -9.25 ) (300 cases, 300 controls) 243 High myopia Chinese ns (288 cases, 208 controls) 224 High myopia Chinese rs1800470, (SE ≤ -10.0) (300 cases, 300 controls) rs4803455

225 High myopia Singapore Chinese rs4803455 20 BMP2 260 High myopia Chinese (201 cases, 86 Gene expression rs2288255 controls) 20 MMP9 216 Low myopia Caucasian (n=366) ECM/ Gene 6 (SE ≤ -1.0) expression (Arg- > Gln) 21 COL18A1 210 Low myopia American Function ns (SE ≤ -0.75) (62% Caucasian, 123 families) 21 UMODL1 261 High myopia Japanese Linkage study rs2839471 (520 cases, 520 controls) 23 TIMP1 218 High myopia Taiwanese Gene expression (216 cases, 474 controls) SE – Spherical equivalent in diopters; AL – Axial length in mm; VCD - Vitreous chamber depth; ns - not significant; ECM – Extracellular matrix; High myopia is defined as SE ≤ -6.0 D unless otherwise indicated in the parentheses. Controls are non-myopic controls.

BDNF - brain-derived neurotrophic factor; BMP2 –bone morphogenetic protein 2; COL18A1- collagen, type XVIII, alpha 1; COL2A1 – collagen, type II, alpha 1; COL1A1 – collagen, type 1, alpha 1; CTNND2 – catenin (cadherin – associated protein); cMET – growth factor receptor c-met; DCN – decorin; DSPG3 - dermatan sulfate proteoglycan 3; FMOD - fibromodulin ; FGF2 - fibroblast growth factor 2; FMOD – fibromodulin; GJD2 – gap junction protein, delta 2; HGF - hepatocyte growth factor; IGF1 – Insulin-like Growth factor-1; LEPREL1 – leprecan-like 1; LUM – Lumican; MMP-matrix metallopeptidase; PAX6 –paired box gene 6; RASGRF1 – RAS protein-specific guanine nucleotide –releasing factor 1. RARA – retinoic acid receptor, alpha; SOX2 – sex determining region Y –box 2; TIGR/MYOC – myocilin; 56

TGFB1 - transforming growth factor-beta1; TGIF - Transforming growth beta-induced factor; TGFB2 - transforming growth factor-beta2; TIMP1 – tissue inhibitor of metalloproteinase metallopeptidase inhibitor 1; UMODL1 – uromodulin –like 1.

1.3.5.3 Genome-wide association studies

In the past several years, with the advent of cheap and high-density array-

based DNA marker technologies, genome-wide scale assessments of the

genetic effect for each typed marker have become common practice87,88.

GWAS have transformed population-based epidemiology studies into

repertoires for further genetic exploration. The ophthalmology genetic

community has started to benefit from the convergence of population-based

epidemiological studies and genetic assays to identify genes involved in

susceptibility to eye disorders, including refractive errors17-21,23,34,35,73,262.

Table 4 lists the recent GWAS findings for refractive errors or myopia and the

subsequent replication results.

Two companion GWAS pinpointed genetic susceptibility variants for

quantitative refraction at 15q1434 and 15q2535 in Caucasian populations. A

locus in the region 15q14 was further successfully replicated for SE in a large-

scale meta-analysis study composed of 42,848 Caucasians and 12,332

Asians263 and in a Japanese cohort for high myopia251, whereas the association

at the 15q25 region was mixed in the pooled data. For chromosome loci

15q14, the implicated polymorphisms are in a putative regulatory region near

the genes GJD2 and ACTC1. GJD2 encodes a neuron-specific protein

(connexin36) in retinal photoreceptors, amacrine and bipolar cells, implying an important modulator of retinal signal transmission. ACTC1 encodes the 42- kDa alpha cardiac muscle actin 1, with the function in eyes being largely 57

unknown. The gene closest to chromosome 15q25 is RASGRF1, which is

highly expressed in human retina and known to be functionally involved in

eye development. RASGRF1 is regulated by muscarinic receptor and retinoic

acid, both of which are postulated in the biological mechanisms for refractive

controls.

In a meta-analysis of two GWAS in Chinese adults and children, our group

identified that minor allele of rs6885224 in CTNND2 at chromosome 5 was

-6 22 associated with an elevated risk of high myopia (Pmeta = 7.84 × 10 ) .

CTNND2 is known to play a crucial role in retinal morphogenesis, adhesion, and retinal cell architectural integrity via the regulation of adhesion molecules.

Interestingly, the most prominent SNP rs6885224 was subsequently found to be associated with high myopia (n=1203) and moderate myopia (n=615) versus emmetropia (n=955) in an independent Chinese sample set, but with a protective effect for the same minor allele240.

Nakanishi and colleagues reported the first GWAS for pathological

myopia (defined as AL ≥ 26mm on both eyes) and detected a novel susceptibility locus at chromosome 11q24.1 in a Japanese cohort23. The

adjacent BLID gene discovered in this study along with the gene PSARL

identified from another genome-wide linkage study204 are both involved in

mitochondrial apoptosis. However, the association with BLID was not

replicated in 1,818 high myopia cases (SE ≤ -10.0 D) and 1,052 controls in the

Chinese population264.

In a case-control data comprising 500 high myopia cases (SE ≤ -6.0 D and

AL ≥ 26 mm on either eye) and 500 emmetropic controls in South China, Shi

58

and colleagues reported a strong association for high myopia at gene MIPEP

on chromosome 13q12.12 in a GWAS, which encodes proteins targeting to the

mitochondrial matrix, suggesting a potential novel pathway leading to

myopia265. The associated gene MIPEP for high myopia was also noted in our

GWAS data (p < 0.05). Using a much smaller sample size for high myopia, Li

and colleagues identified chromosome locus 4q25 associated with high

myopia across older Chinese and much younger cohorts266.

Table 4 Genetic loci identified from genome-wide association studies

Locus (gene) Ethnics Most significant SNP Phenotype Sample size Ref 11q24.1 (BLID) Japanese rs577948 High myopia 830/934 23 Chinese ns(rs577948,rs11218544, High myopia 1818/1052 264 rs2839471) (SE ≤ -10.0D) 15q14 European rs634990 SE 15,608 34 (GJD2,ACTC1) 15q25 European rs8027411 SE 17,685 35 (RASGRF1) Japanese 15q14: rs634990,rs524952 High myopia 1125/366 (929) 251 15q25: rs8027411,rs17175798 Chinese ns (GJD2) Ocular biometrics 244 5q15 (CTNND2) Chinese rs6885224 High myopia 1,246/2,584 22 Chinese rs6885224 (opposite effect) High myopia 1203 high 240 myopia/955 controls 4q25 (intergenic) Chinese rs10034228 High myopia 2,993/10,406 266 Chinese rs10034228/rs2218817/ High myopia 1,052/1255 252 rs1585471/rs6837348 13q12.12(MIPEP) Chinese rs9318086 High myopia 3,222/6,341 265 SE – Spherical equivalent; high myopia is defined as either SE ≤ -6.0 D or axial length ≥ 26 mm, unless otherwise specified

The first genome-wide exome sequencing study is worthy of note. Shi and

colleagues identified the association of protein 644 isoform

1 (ZNF644) sequence variants with susceptibility to high myopia in two

affected individuals in a large Han Chinese family and the findings were

confirmed in 600 Chinese adults of high myopia cases (both SE < -6.0 D)

and 600 normal controls267. In a multi-ethnic case-control study, primarily 59

comprised of Caucasians, the rare variants in ZNF644 screened by deep-

sequencing were also observed to be significantly enriched in 131 high

myopia cases compared to 672 controls268. ZNF644 belongs to the C2H2-type

zinc-finger protein family, which contains seven C2H2-type zinc fingers and

is predicted to be a transcription factor in gene expression regulation in the

retina and retinal pigment epithelium (RPE).

The sequence variants identified thus far from standard GWAS approaches

exhibit very modest effect sizes on myopia risk (per-allele odds ratio ≤ 1.5).

This observation reflects the complexity in the disease architecture of myopia

pathogenesis. The implicated genes for myopia identified through candidate

gene studies, GWAS or exome sequencing fall into three categories as shown

in Figure 3: 1) The majority are known to be involved in the process of sclera remodelling, for example COL2A1, COL2A2, HGF, TGFB1, TGFB2 etc.; 2)

Photoreceptor-related genes, like GJD2 for retinal signal transmission on

RPE; and 3) Genes hypothesised to regulate the ‘myopia important genes’ at

the transcriptional level, such as ZNF644. Interestingly, none of the genes

involved in sclera remodelling in the ECM pathway (Category 1) reached

genome-wide significance for myopia in the published GWAS.

60

Visual input

Retina Neurotransmission in the retina : GJD2, RASGRF1 Retinal pigment epithelium

Choroid Signaling cascade Regulate transcription levels: ZNF644, PAX6 Mitochondrial apoptosis: BLID, MIPEP

sclera Scleral remodeling: COL2A1, COL1A1, HGF, cMET, MMP1, MMP2, MMP3, MMP9, MMP13, LUM, PAX6, TGFB1, TGFB2, CTNND2, TGIF, IGF1

Figure 3. The implicated genes likely to be involved in the visual signal transmission and scleral remodeling. COL2A1 – collagen, type II, alpha 1; COL1A1 – collagen, type 1, alpha 1; CTNND2 – catenin (cadherin – associated protein); cMET – growth factor receptor c-met; GJD2 – gap junction protein, delta 2; HGF - hepatocyte growth factor; IGF1 – Insulin-like Growth factor-1; LUM – Lumican; MMP-matrix metallopeptidase; PAX6 –paired box gene 6; RASGRF1 – RAS protein-specific guanine nucleotide – releasing factor 1. TGFB - transforming growth factor-beta 1; TGIF - Transforming growth beta-induced factor; TGFB2 - transforming growth factor-beta2; UMODL1 – uromodulin –like 1; ZNF644 – zinc finger protein 644 isoform 1.

1.3.6 Intervention to slow myopia progression

Different aetiological factors have been proposed for myopia onset and

progression, such as anomalous accommodative activity and defocus of the

retinal image. Depending on the implicated mechanism, different clinical

approaches have been developed. The effectiveness of current strategies to

control the progression of myopia in children has been reviewed

extensively269,270.

Using single vision lenses (SVLs) is the most common routine to correct

vision, although it does not retard the progression of myopia. Myopic

individuals exhibit a greater accommodative lag (poor focusing accuracy

while looking at near work) than emmetropes. Accommodative lag results in

61

light being focused behind the retina during near work, which may stimulate the eye growth progressively after myopia onset. It is thus proposed that the reduction in accommodative lag (or load) in reviewing a near object with bifocal or multifocal spectacles should reduce myopic progression in already myopic children. In the Correction of Myopia Evaluation Trial (COMET), however, children wearing progression additional lenses (PALS) were found to have a small slowing of myopia progression compared to those wearing

SVLs, with negligible clinical significance271. Similar findings were noted in several other trails as systematically reviewed by Walline and colleagues269.

Interestingly, a subset of myopic children - for example, esophores at near with high accommodative lag - showed a larger treatment effect142, suggesting that accommodation lag may be a potential indicator for some individuals who could actually benefit from the treatment.

The uncorrected myopia decreases the need for accurate accommodation and poses the question of whether myopia progression can be reduced by diminishing accommodation by reading without spectacles or wearing under- corrected spectacles. In a three-year perspective study in myopic school children aged 9 to 11 years, the myopia progression was not correlated with full-time spectacle wearing, limited-time wearing for distance vision only or bifocal lenses272. The most interesting finding in this study, however, was that the shorter the average reading distance in the follow-up time, the faster the myopic progression was. In a randomised trial, Chung and colleagues found that the myopic progression was significantly accelerated in the under- corrected myopic children as compared to the full-time fully corrected group,

62

suggesting that the presence of blurred vision at any distance may stimulate

the progression of myopia regardless of the sign of defocus in the susceptible

eyes273. However, the findings have been criticised for not accounting for

‘running period’ at the onset of experiment.

One treatment modality is the use of overnight rigid gas permeable (RGP)

contact lens to reshape the corneal surface. It has been shown that RGP lenses

effectively reduce the degree of hyperopic field curvature in myopic eyes274.

Experimental studies support the concept that the peripheral hyperopic

defocus promotes axial myopia135. It has been shown that the rigid contact lens

corrected not only the central refractive error but also induced a myopic

change in relative peripheral refractive error, while the latter could be

effective for controlling eye growth in myopic children275. Two recent

randomised clinical trials reported significant changes in axial length in the

corneal reshaping group compared to those wearing spectacles276,277 or soft contact lenses278. Nevertheless, an early trial over 24-months in children aged

6 to 12 years in Singapore showed a statistically insignificant reduction in

myopia in the participants wearing rigid contact lenses during the daytime

when compared with the spectacle-corrected group279. Notably, there are

number of risks with overnight contact lens wearing, including corneal

infection and scarring, and the potentially-related microbial keratitis280.

The positive effects for slowing myopia progression have been exhibited

by cycloplegic agents centred on atropine and pirenzepine. Such types of

medicines are thought to reduce myopic progression by eliminating

accommodation. Several studies have reported that children receiving

63

pirenzepine gel, cyclopentolate eye drops, or atropine eye drops showed significantly less myopic progression compared with children receiving placebo281,282. However, many ophthalmologists do not embrace this treatment option due to concern over the atropine toxicity, photophobia, and significant catch-up after treatment removal.

64

2 Chapter 2 Study Aims

Study 1: Genetic Variants on Chromosome 1q41 Influence ocular Axial

Length and High Myopia (Chapter 3)

As a highly heritable ocular biometry of refractive errors, identification of quantitative trait loci influencing axial length (AL) variation would be valuable in informing the biological aetiology of myopia. This is the first

study using GWAS approach to investigate genes susceptibility to AL. The

aims of this study are to

1) discover genetic variants associated with AL in Asian populations

2) provide evidence of association for myopia at the implicated genetic

locus

Study 2: Genome-wide Meta-Analysis of Five Asian Cohorts Identifies

PDGFRA as a Susceptibility Locus for Corneal Astigmatism (Chapter 4)

Corneal astigmatism is associated with reduced visual acuity and is highly

heritable, but currently there is a paucity of research to investigate the genetic

aetiology of corneal astigmatism. This study aims to

1) identify genetic variants associated with corneal astigmatism in Asians

2) explore the transferability of the implicated loci in multiple Asian

populations

Study 3: Genome-wide Comparison of Estimated Recombination Rates

between Populations (Chapter 5)

Homologous recombination is the only force which breaks down linkage

disequilibrium (LD), where the inter-population pattern of LD is a 65

fundamental basis on the transferability of genome-wide associated genetic

loci. Inter-population differences in recombination patterns yield important

insights into trans-ethnic fine-mapping as well as recent human evolution. Ad-

hoc thresholds have been previously used to identify genomic regions

exhibiting either concurring or differential evidence of hotspot differences .

No formal statistical matrix has been applied to quantify the inter-population recombination rate variation based on estimated rates. This study aims to

1) introduce a novel strategy to evaluate the inter-population genetic

recombination differences

2) describe the pattern of genome-wide recombination differences among

populations

66

3 Chapter 3 Genetic Variants on Chromosome 1q41

Influence Ocular Axial Length and High Myopia

This Chapter is a slightly modified version of the paper published in PLoS

Genet 2012, 8(6): e1002753. doi:10.1371/journal.pgen.1002753

3.1 Abstract

As one of the leading causes of visual impairment and blindness, myopia poses a significant public health burden in Asia. The primary determinant of myopia is an elongated ocular axial length (AL). Here we report a meta- analysis of three genome-wide association studies on AL conducted in 1,860

Chinese adults, 929 Chinese children and 2,155 Malay adults respectively. We identified a genetic locus on chromosome 1q41 harbouring the zinc-finger

11B pseudogene ZC3H11B showing genome-wide significant association with

-10 AL variation (rs4373767, β = -0.16 mm per minor allele, Pmeta = 2.69 × 10 ).

The minor C allele of rs4373767 was also observed to significantly associate

with decreased susceptibility to high myopia (per-allele odds ratio (OR) =

-7 0.75, 95% CI: 0.68 – 0.84, Pmeta = 4.38 × 10 ) in 1,118 highly myopic cases

and 5,433 controls. ZC3H11B and two neighboring genes SLC30A10 and

LYPLAL1 were expressed in the human neural retina, retinal pigment epithelium and sclera. In an experimental myopia mouse model, we observed

significant alterations to gene and protein expression in the retina and sclera of

the unilateral induced myopic eyes for the murine genes ZC3H11A, SLC30A10

and LYPLAL1. This supports the likely role of genetic variants at chromosome

1q41 in influencing AL variation and high myopia.

67

3.2 Background

Myopia increases the risk of visual morbidity and poses a considerable public health and economic burden globally, especially in Asia, where the prevalence is significantly higher than other parts of the world 283. Human

myopia primarily results from an abnormal increase in ocular axial length

(AL), the distance between the anterior and posterior poles of the eye globe,

whereas the role of corneal curvature and lens thickness is minimal 114. A 1

millimeter (mm) increase in AL is equivalent to a myopic shift of -2.00 to -

3.00 diopters (D) with no corresponding changes in the optical power of the

cornea and lens. High myopia, often defined as ocular spherical equivalent

(SE) refraction below -6.00 D, is associated with an abnormally long AL, and

this affects between 1% to 10% of the general population284. The degenerative

changes in the retina and the choroid due to the excessive elongation of the

globe are not prevented by optical correction and this subsequently increases

the risk of visual morbidity through myopic maculopathy, choroidal

neovascularization, and macular holes111. The active

remodelling of the sclera, mediated by the signaling cascade initiated in the retina under visual input, has also been found to be critical in determining

axial growth, and thus the refractive state of the eye285.

It is well documented that near work, educational attainment and outdoor

activities are environmental factors contributing to myopia286. Evidence from

family and twin studies has also supported a substantial genetic component in

spherical refractive error and AL157,159,160. The heritability of the quantitative

trait AL has been estimated to be as high as 94% comparable to that for SE

(for a review, see174). Although linkage scans on pedigrees (myopia loci 68

MYP1 to MYP18; see http://www.omim.org) and genome-wide association

studies (GWAS)22,23,34,35,265,266 have implicated several regions in the human

genome as being significant for refractive error and myopia, no myopia genes

have been consistently identified within or across different population groups.

This scenario reflects the complexity in the disease architecture of myopia

pathogenesis.

Genetic factors influencing AL and refraction appear to be at least partly

shared, given previous literature from twin studies illustrating that at least half

of the covariance between AL and refraction are due to common genetic

factors287. The measurement of AL is more precise and less prone to errors

compared to cycloplegic or non-cycloplegic assessments of refraction. As AL is an endophenotype for spherical refractive error, identifying genes that are responsible for AL variation provides insight into myopia predisposition and development. Presently there are only two genome-wide linkage studies performed in European descent populations that suggest the presence of AL quantitative trait loci (QTLs) on chromosomes 2p24202 and 5q (at 98

centimorgans) along with two classical myopia loci (MYP3 at 12q21 and

MYP9 at 4q12)203, and there are no reports of any genes that are indisputably confirmed to be associated with AL.

We thus performed a meta-analysis of three genome-wide surveys of AL in a total of 4,944 individuals in Asian populations comprising (i) Chinese adults from the Singapore Chinese Eye Study (SCES); (ii) Chinese children from the Singapore cohort Study of the Risk factors for Myopia (SCORM); and (iii) Malay adults from the Singapore Malay Eye Study (SiMES). SNPs that have been identified from this meta-analysis to be significantly associated 69

with AL were further assessed for association with high myopia in an

additional two independent case-control studies from Japan. We also examined the expression patterns of the candidate genes located in the vicinity of the identified SNPs in human ocular tissues and in the eyes of myopic mice.

3.3 Methods

3.3.1 Study cohorts

Discovery cohorts

Singapore Chinese Eye Study (SCES). SCES is an ongoing population- based cross-sectional survey of eye diseases in Chinese adults aged 40 to 80 years residing in the Southwestern part of Singapore. The study began in 2007 and a detailed description was published elsewhere288. In brief, a total of 2,226

residents in the Southwestern area of Singapore completed comprehensive

ophthalmologic examinations, including visual acuity assessments, refraction,

lens and retinal imaging, and slit lamp examinations. Genome-wide

genotyping was performed in 1,952 individuals. Completed post quality

control (QC) data for GWAS were available for 1,860 adults with AL

measurements.

Singapore Cohort study of the Risk factors for Myopia (SCORM). A

total of 1,979 children in grades 1, 2, and 3 from three schools in Singapore

were recruited from 1999 to 2001289. The children were examined on their

respective school premises annually by a team of eye care professionals. The

GWAS was conducted in a subset of 1,116 Chinese children22,290. The

70

phenotype used in this study was based on the AL measured on the 4th annual

examination of the study (children at age 10 to 12 years). Complete post-

filtering data on AL measurements and SNP data were available in 929

children.

Singapore Malay Eye Study (SiMES). SiMES is a population-based

cross-sectional survey of eye diseases in Malay adults aged 40 to 80 years

living in Singapore. It was conducted between August of 2004 and June of

2006291. A total of 4,168 Malay residents in the Southwestern area of

Singapore were identified and invited for a detailed ocular examination where

3,280 (78.7%) participated. Genome-wide genotyping was performed in 3,072 individuals26,29. Complete post-filtering data for GWAS with AL

measurements were available for 2,155 subjects.

Validation cohorts for high myopia

Japan dataset 1. The Japan dataset 1 consisted of 483 high myopia cases

and 1,194 general healthy population controls. High myopia status was

determined primarily on the basis of AL ≥ 28 mm for both eyes, which

corresponded to the spherical equivalent (SE) cut-off of at least -9.00 D292.

Cases were recruited at the Center for Macular Disease of Kyoto University

Hospital, the High Myopia Clinic of Tokyo Medical and Dental University,

and the Fukushima Medical University Hospital. Details of the data have been

reported elsewhere 23. The population controls were recruited at the Aichi

Cancer Center Research Institute.

Japan dataset 2. The Japan dataset 2 was comprised of 504 high myopia

cases (SE ≤ -9.00 D in either eye) and 550 non-highly myopic controls (SE ≥ - 71

3.00 D in both eyes). Less stringent thresholds were adopted for controls for

the purpose of ease of recruitment from the clinics. Given the large phenotypic

separation between the cases and controls, and assumption of

homoscedasticity across genotype categories, such a study design using the

extreme on one end (i.e. SE ≤ -9.00 D) but sampling less extreme controls (i.e.

SE ≥ -3.00 D) still provides sufficient statistical power to detect the true

positive signals in the association study293. Cases were recruited at the

Yokohama City University and Okada Eye Clinic. Controls were obtained

from the Yokohama City University and Tokai University Hospital.

Measurements of AL, refractive error and covariates

All the studies used a similar protocol for ocular phenotype measurements.

For subjects in SCES and SiMES, AL for both eyes were measured using

optical laser interferometry (IOLMaster V3.01, Carl Zeiss; Meditec AG Jena,

Germany)288,291. Children in the SCORM study underwent AL measurements using the A-scan ultrasound biometry machine (Echoscan US-800; Nidek Co,

Tokyo, Japan)289. For subjects in the Japan dataset 1, applanation A-scan

ultrasongraphy (UD-6000, Tomey, Nagoya, Japan) or partial coherence

interferometry (IOLMaster, Carl Zeiss Meditec, Dublin, CA) were used to

measure AL. AL was assessed using a portable A-scan Biometer/pachymeter

(AL-2000, Tomey, Negoya, Japan) for the participants in the Japan dataset 2.

Non-cycloplegic refraction in SCES and SiMES as well as cycloplegic

refraction in SCORM (three drops of 1% cyclopentolate at 5 minutes apart)

were measured by autorefractor (Canon RK-5, Tokyo, Japan)294. For subjects

in the Japan dataset 2, refraction was measured using auto-refraction ARK- 72

730A (NIDEK), ARK-700A (NIDEK) and KR-8100P (TOPCON). SE was calculated as the sphere power plus half of the cylinder power for each eye.

To perform the genetic association of high myopia in SCES and SiMES, we used the definition adopted by the Japan case-control studies and defined high myopia cases as subjects having SE ≤ -9.0 D in at least one eye, and non high-myopia controls as samples with SE ≥ -3.0 D in both eyes. For children from SCORM aged 10 to 12 years, cases were defined as SE ≤ -6.0 D for at least one eye, while controls were defined as SE ≥ -1.0 D for both eyes; this is approximately equivalent to the projected SE of -9.0 and -3.0 respectively at university age based on the estimated annual progression rate in SE of -0.6 D for Chinese myopic children and -0.3 D in the controls295. Given the small

sample sizes of high myopia cases identified in our population-based cohorts,

in the supplementary analysis, we further applied the commonly adopted

criteria of SE≤ -6.0 D in either eye as cases. Controls were defined as SE ≥ -

1.0 D in both eyes. For SCORM children, we retained the same criteria in both

analyses.

Age, gender, height and level of education were obtained from all

Singapore participants who underwent ophthalmologic examination.

Education was measured on an ordinal scale from no formal education to the

highest educational level. For participants in SCORM, the education of the

child was defined by the level of educational attainment of the father, as a

marker of socioeconomic status.

73

3.3.2 Data quality control

Discovery cohorts

For SCES, a total of 1,952 venous blood-derived samples were genotyped

using Illumina Human 610 Quad Beadchips (Illumina Inc., San Diego, US)

according to the manufacturer’s protocols. Samples which failed genotyping

or with low call rate (< 95%, n = 11), with excessive heterozygosity (defined

as sample heterozygosity exceeding 3 standard deviations from the mean

sample heterogzygosity; n = 3), with gender discrepancies (n = 2) were

excluded, as were cryptically related samples identified by the identity-by- state (IBS) (n = 41) and population structure in the principal components analyses (PCA) (n = 6). The criteria to define cryptically related samples and outliers with population structure in the discovery cohorts are described in the following paragraph. After the removal of the samples, SNP quality control

(QC) was then applied on a total of 579,999 autosomal SNPs for the 1,889 post-QC samples. SNPs were excluded based on (i) high rates of missingness

(> 5%) (n = 26,437); (ii) monomorphism or minor allele frequency (MAF) <

1% (n = 59,633); or (iii) genotype frequencies deviating from Hardy-

Weinberg Equilibrium (HWE) defined as HWE P-value < 10-6 (n = 1,821).

This yielded 492,108 autosomal SNPs. Those individuals with missing data on

phenotypes were further removed (n = 29). Finally, 492,108 SNPs in 1,860

samples were available for analyses.

For SCORM, 1,116 DNA samples (1,037 from buccal swab and 79 from

saliva) were genotyped on the Illumina HumanHap 550 Beadchips and 550

Duo Beadarrays. A total of 108 samples were excluded, comprising (i) 70

samples with call rates below 98%; (ii) 6 with poor genotyping quality; (iii) 11 74

samples identified from sib-ships; (iv) 18 with inconsistent gender

information; and (v) 3 due to population structure. This left a total of 1,008

samples for further SNP QC. Based on 514,849 autosomal SNPs, we excluded

32,669 markers if they had missing genotype calls > 5%, MAF < 1%, or

significantly deviated from HWE (P < 10-6) 22. A final set of 929 samples with

482,180 post-QC SNPs and completed AL measurement were included in

analyses.

For SiMES, 3,072 DNA samples were genotyped using the Illumina

Human 610 Quad Beadchips. The detailed QC procedures were provided

elsewhere296. In brief, we omitted a total of 530 individuals due to: (i)

subpopulation structure (n = 170); (ii) cryptic relatedness (n = 279); (iii)

excessive heterozygosity or high missingness rate > 5% (n = 37); and (iv)

gender discrepancy (n = 44). After the removal of the samples, SNP QC was

then applied on a total of 579,999 autosomal SNPs for the 2,542 post-QC

samples. SNPs were excluded based on: (i) high rates of missingness ( > 5%)

(n = 26,343); (ii) monomorphism or MAF < 1% (n = 34,891); or (iii) genotype

frequencies deviating from HWE (P < 10-6) (n = 3,645). This yielded 515,120

SNPs after the same SNP QC criteria. Individuals without valid measurements

for AL were further removed (n = 387). After the above filtering criteria,

515,120 SNPs in 2,155 samples were available for association analyses.

In our discovery cohorts, IBS was estimated with the genome-wide SNP

data using PLINK software to assess the degree of recent shared ancestry for a

pair of individuals297. For a pair of putatively-related samples defined as an

identity by descent (IBD) value greater than 0.1856, we removed one individual from each pair of monzygotic twins/duplicates, parent-offspring or 75

full-siblings etc. Population structure was ascertained using PCA with the

EIGENSTRAT program and genetic outliers were defined as individuals whose ancestry was at least 6 standard deviations from the mean on one of the top ten inferred axes of variation7.

For SiMES Malays, we also excluded the samples falling in the main clusters of PCA plots of the Chinese and Indians ethnic groups, as described in the previous study296. In SiMES, we noticed some degree of admixture in genetic ancestry of Malays and thus adjusted for ancestry along the top five axes of variation, as the spread of principal component scores was greater for the top five eigenvectors in the bivariate plots of PCA (Figure 4), The top ten principal components explained a small percentage of the global genetic variability of 1.3% while top five explained 1.0%, suggesting, all together, they had minimal effects on our association analyses.

Validation cohorts for high myopia

High myopia cases in the Japan dataset 1 were genotyped using Illumina

Human-Hap550 and 660 chips23, while controls in the Japan dataset 1 were genotyped on Illumina Human-Hap610 chips. Subjects in the Japan dataset 2 were genotyped on the Affymetrix GeneChip Human Mapping 500K Array

Set (Affymetrix Inc., Santa Clara, US). For SNPs not available on the

Affymetric chips (rs43737678, rs10779363 and rs7544369), genotyping was performed with TaqMan 5′ exonuclease assays using primers supplied by

Applied Biosystems (Foster City, US). The probe fluorescence signal was detected using the TaqMan Assay for Real-Time PCR (7500 Fast Real-Time

PCR System, Applied Biosystems). 76

3.3.3 Statistical methods

The primary analysis was performed on the AL quantitative trait. As a strong correlation exists in AL measurements from both eyes (r > 0.9), we used the mean AL across both eyes in the GWAS analysis, as was recommended in a review298. Linear regression was used to interrogate the association of each SNP with AL after adjusting for age, gender, height and level of education, under the assumption of an additive genetic effect where the genotypes of each SNP are coded numerically as 0, 1 and 2 for the number of minor alleles carried. In addition, for SiMES, the top five principal components of genetic ancestry from the EIGENSTRAT PCA were also included as covariates to account for the effects of population substructure as described in genotype QC section290. Association tests between each genetic marker and phenotype were carried out using PLINK software297 (version

1.07). Analyses were also repeated without adjustment for education level or height for the purpose of comparison.

In the discovery phase, we conducted a meta-analysis of GWAS results from 3 cohorts for AL using a weighted-inverse variance approach by fixed- effect modelling in METAL (http://www.sph.umich.edu/csg/abecasis/metal).

In the secondary analyses, SNPs that have been identified from the primary analyses were tested for association with high myopia onset (as a binary trait) and SE (as a quantitative trait). For Singapore cohorts, the association analyses adjusted for the same covariates as the primary analyses within a linear regression and logistic regression framework respectively. For Japan case-

77

control datasets, only age and gender were included as covariates in the model

for high myopia, as the other covariates were not available.

The regional association plots were constructed by SNAP

(http://www.broadinstitute.org/mpg /snap ). Haploview 4.1

(http://www.broad.mit.edu/mpg/ haploview) was used to visualize the LD of

the genomic regions. Genotyping quality of all reported SNPs has been

visually evaluated by the intensity clusterplots. The coordinates reported in

this paper are on NCB136 (hg18).

For functional studies in the myopic mouse model, gene expression of all

three identified genes in control and experimental groups was quantified using

the 2-∆∆Ct method299. The standard student’s t-test was performed to determine the significance of the relative fold change of mRNA between the myopic eyes of the experimental mice with the independent age-matched controls.

3.3.4 Functional studies

3.3.4.1 Gene expression in human

Genes ZC3H11B , SLC30A10 and LYLPLAL1, adjacent to the top

associated SNPs, as well as GAPDH (house keeping gene) were run using

10ul reactions with Qiagen’s PCR products consisting of 1.26 ul H2O, 1.0 ul

10X buffer, 1.0 ul dNTPs, 0.3 ul MgCl, 2.0 ul Q- Solution, 0.06 ul taq polymerase, 1.0 ul forward primer, 1.0 ul reverse primer and 1.5.0 ul cDNA.

The reactions were run on a Eppendorf Mastercycler Pro S thermocycler with touchdown PCR ramping down 1°C per cycle from 72°C to 55°C followed by

50 cycles of 94°C for 0:30, 55°C for 0:30 and 72°C for 0:30 with a final 78

elongation of 7:00 at 72°C. All primer sets were designed using Primer3300.

The gel electrophoresis was run on a 2% agarose gel at 70 volts for 35

minutes. The primers from a custom tissue panel including Clontech’s Human

MTC Panel I, Fetal MTC Panel I and an ocular tissue panel were used to

amplify the relevant transcripts. The adult ocular samples were obtained from

normal eyes of an 82-year-old Caucasian female from the North Carolina Eye

Bank, Winston-Salem, North Carolina, USA. The fetal ocular samples were

from 24-week fetal eyes obtained by Advanced Bioscience Resources Inc.,

Alameda, California, USA. All adult ocular samples were stored in Qiagen’s

RNAlater within 6.5 hours of collection and shipped on ice overnight to the

lab. Fetal eyes were preserved in RNAlater within minutes of harvesting and shipped over night on ice. Whole globes were dissected on the arrival day.

Isolated tissues were snap-frozen and stored at -80°C until RNA extraction.

RNA was extracted from each tissue sample independently using the Ambion mirVana total RNA extraction kit. The tissue samples were homogenized in

Ambion lysis buffer using an Omni Bead Ruptor Tissue Homogenizer per

protocol. Reverse transcription reactions were performed with Invitrogen

SuperScript III First-Strand Synthesis kit.

3.3.4.2 Myopia-induced mouse model

Gene expression in a mouse model of myopia

Experimental myopia was induced in B6 wild-type (WT) mice (n = 36) by

applying a -15.00 D spectacle lens on the right eye (experimental eye) for 6 weeks since post-natal day 10. The left eyes were uncovered and served as

79

contra-lateral fellow eyes. Age matched naive mice eyes were used as

independent control eyes (n = 36). Each eye was refracted weekly using the

automated infrared photorefractor as described previously301. AL was

measured by AC- Master, Optic low coherence interferometry (Carl-Zeiss), in- vivo at 2, 4 and 6 weeks after the induction of myopia302. The minus-lens-

induced eyes after six weeks were significantly associated with increased AL

and myopic shift in refraction of < -5.00 D as compared to independent

control eyes (n = 36, P = 3.00 × 10-6 for AL, and 2.05 x 10-4 for refraction).

Eye tissues were collected at 6 weeks post myopia induction for further

analyses.

Total RNA was isolated from pooled cryogenically ground mouse neural

retina (retina), retinal pigment epithelium (RPE) and sclera for three batches

using TRIzol Reagent (Invitrogen, Carlsbad, CA) with each batch (n = 6)

comprising the myopic eye, fellow eye and control eye. RNA concentration

and quality were assessed by the absorbance at 260 nm and the ratio of

absorbance ratio at 260 and 280nm respectively, using Nanodrop® ND-1000

Spectrophotometer (Nanodrop Technologies, Wilmington, DE). RNA was

purified using the RNeasy® Mini kit (Qiagen, GmbH).

500 ng of purifed RNA was reverse-transcribed into cDNA using random

primers and reagents from iScriptTM select cDNA synthesis kit (Bio-rad

Laboratories, Hercules, CA). The pseudogene ZC3H11B (zinc finger CCCH

type containing 11B) is not characterized in the mouse genome, therefore we

examined a similar gene ZC3H11A (zinc finger CCCH type containing 11A) in mice. ZC3H11A in mice and ZC3H11B in humans are highly conserved with 79% nucleotide similarity by BLAST alignment analysis 80

(http://blast.ncbi.nlm.nih.gov). We used quantitative Real-Time PCR (qRT-

PCR) to validate the gene expression. qRT-PCR primers were designed using

ProbeFinder 2.45 (Roche Applied Science, Indianapolis, IN) and this was performed using a Lightcycler 480 Probe Master (Roche Applied Science,

Indianapolis, IN). The reaction was run in a Lightcycler 480 for 45 cycles under the following conditions: 95˚C for 10s, 56˚C for 10s and 72˚C for 30s.

Gene expressions in the retina, RPE and sclera after six weeks of myopic eyes and the fellow eyes were compared to the control eyes. Glyceraldehyde 3-

phosphate dehydrogenase (GAPDH) was used as an endogenous internal

control.

Immunohistochemistry

Whole mouse eyes (6 weeks minus lens treated myopic, contra-lateral

fellow and independent control eyes, n = 6 per type) were embedded in frozen

tissue matrix compound at -20°C for 1 hour. Prepared tissue blocks were

sectioned with a cryostat at 6 microns thicknesses and collected on clean polysine™ glass slides. Slides with the sections were air dried at room temperature (RT) for 1 hour and fixed with 4% para-formaldehyde for 10 min.

After washing 3X with 1x PBS for 5 minutes, 4% bovine serum albumin

(BSA) diluted with 1x PBS was added as a blocking buffer. The slides were then covered and incubated for 1 hour at RT in a humid chamber. After rinsing with 1x PBS, a specific primary antibody raised in rabbit against ZC3H11A,

SLC30A10 and raised in goat against LYPLAL1 (Abcam, Cambridge, UK) diluted (1:200) with 4% BSA was added and incubated further at 4°C in a humid chamber overnight. After washing 3X with 1x PBS for 10 min, 81

fluorescein-labeled goat anti-rabbit secondary antibody (1:800, Invitrogen-

Molecular Probes, Eugene, OR) and fluorescein-labeled rabbit anti-goat

secondary antibody (1:800, Santa Cruz Biotechnology, Inc. CA, USA) was

applied respectively and incubated for 90 min at RT. After washing and air-

drying, slides were mounted with antifade medium containing DAPI (4,6-

diamidino-2-phenylindole; Vectashield, Vector Laboratories, Burlingame,

CA) to visualize the cell nuclei. Sections incubated with 4% BSA and omitted

primary antibody were used as a negative control. A fluorescence microscope

(Axioplan 2; Carl Zeiss Meditec GmbH, Oberkochen, Germany) was used to

examine the slides and capture images. Experiments were repeated in

duplicates from three different samples.

3.4 Results

3.4.1 Datasets after quality control

A genome-wide meta-analysis of three GWAS on AL was performed in the post quality control samples from SCES (n = 1,860), SCORM (n = 929) and SiMES (n = 2,155). Principal component analysis (PCA) of these samples with reference to the HapMap Phase 2 individuals showed that the two

Chinese cohorts (SCES and SCORM) are indistinguishable with respect to samples of Han Chinese descent, and the differentiation from samples of

Japanese descent is evident only on the fourth principal component (Figure

5). The SiMES Malays are genetically similar to the Chinese-descent samples relative to individuals with European or African ancestries. The distributions of AL measurements in the three cohorts were approximately Gaussian and

82

the baseline characteristics are summarised in Table 5. The mean AL were

23.98 mm (SD = 1.39 mm), 24.10 mm (SD = 1.18 mm) and 23.57 mm (SD =

1.04 mm) for SCES, SCORM and SiMES respectively. Moderate to high

correlations between AL and SE were observed (SCES/SCORM/ SiMES;

Pearson correlation coefficient r = -0.75, -0.76 and -0.62 respectively).

3.4.2 Locus at chromosome 1q41 achieved genome-wide significance

The meta-analysis was performed on 456,634 SNPs present in all three studies, and the quantile-quantile (QQ) plots of the P-values showed only modest inflation of the test statistics in SCES and in the meta-analysis

(genomic control inflation factor: λmeta = 1.03; λSCES = 1.05; λSCORM = 1.00;

λSiMES = 1.00, Figure 6).

A cluster of four SNPs on chromosome 1q41 (rs4373767, rs10779363,

rs7544369 and rs4428898) attained genome-wide significance on meta-

analysis for AL, adjusting for age, gender, height and education level (Figure

7). Analyses conducted without adjustment for height or education level

yielded the same pattern of results. The most significant SNP rs4373767 (Pmeta

= 2.69 × 10-10) explained 0.98% of AL variance in SCES, 0.86% in SCORM

and 0.73% in SiMES, and each copy of the minor allele (cytosine) decreased

AL by 0.16mm on average (Table 6). These top associated SNPs at

chromosome 1q41 remained significant after adjustment for genomic control

-8 (Pmeta ≤ 1.85 × 10 ). Table 6 also lists three genetic loci at chromosome

2p13.1 (SEMA4F), 2p21 (SPTBN1) and 5q11.1 (PARP8) exhibiting suggestive

evidence of association with AL that were seen in at least one SNP with P-

values < 1 ×10-5.

83

3.4.3 Association with high myopia on the identified SNPs

To assess whether these four SNPs at chromosome 1q41 have any role in

high myopia predisposition, we performed association testing of these SNPs

with high myopia in two independent case-control studies from Japan

consisting of 987 high myopes and 1,744 controls. High myopes were defined

as individuals with SE ≤ -9.00 D or AL ≥ 28 mm. All four SNPs exhibited

consistent evidence of association (P < 0.05) in both Japanese studies,

suggesting a potential role of these SNPs for high myopia (Table 7).

We further dichotomized the quantitative refraction from our three

population-based studies (SCORM, SCES, and SIMES) to define samples as high myopes and controls according to similar criteria from the Japanese datasets. High myopes in SCES and SiMES were younger and more highly educated than controls. While the case-control associations of these 4 SNPs with high myopia did not achieve statistical significance in SCES and SiMES, this is likely a consequence of the small sample sizes since the direction and magnitude of the odds ratios were highly similar across all cohorts. The meta- analysis of 1,118 high myopia cases and 5,433 controls from all the five cohorts yielded strong evidence of association with high myopia at these SNPs

-6 -8 (Pmeta between 1.45 × 10 to 7.86 × 10 , Table 7), with no evidence of inter-

study heterogeneity (P ≥ 0.75 for heterogeneity). The minor allele cytosine at

rs4373767 lowered the odds of high myopia by 25% with respect to the

-7 thymidine allele (ORmeta = 0.75, 95% CI: 0.68 – 0.84, Pmeta = 4.38 × 10 ). The

stringent definition of high myopia (SE ≤ -9.00D) used here only considered between 1.0% to 2.4% of our samples as cases, and relaxing this criterion to the commonly adopted threshold of SE ≤ -6.00D identified more myopia cases 84

- and increased the statistical support of all four SNPs (Pmeta between 1.47 × 10

7 to 9.13 × 10-9).

This associated interval spans approximately 70kb in the extended linkage

disequilibrium (LD) block within an intergenic region on chromosome 1q41

(pairwise r2 > 0.5 with the most significant SNP rs4373767, Figure 8A). Zinc

finger family CCCH-type 11B pseudogene ZC3H11B (RefSeq NG_007367.2)

is embedded between the associated top SNPs rs4373767 and rs10779363

(Figure 8B). The most significant SNP rs4373767 is located 223kb downstream from SLC30A10 (RefSeq NM_018713.2), which is a member of solute carrier family 30, and 354kb downstream of LYPLAL1 (RefSeq

NM_138794.3), encoding a lysophospholipase-like protein.

3.4.4 Gene expression

The mRNA expression levels of ZC3H11B, SLC30A10 and LYPLAL1 were surveyed in 24-week human fetal and adult tissues using reverse-

transcriptase polymerase chain reaction (RT-PCR). Whilst ZC3H11B and

LYPLAL1 were found to be expressed across all the tissues including brain, placenta, neural retina, retina pigment epithelium (RPE) and sclera, the expression of ZC3H11B was more abundant compared to LYPLAL1 (Figure

9). SLC30A10 was expressed in all tissues but the adult sclera, analogous to observations made in other zinc transporters303.

Gene expressions for ZC3H11A, SLC30A10 and LYPLAL1 from the tissues

of myopic (with SE < -5.0 D) and fellow non-occluded eyes of the

experimental mice were compared with age-matched control tissues (Figure

85

10). The mRNA levels of ZC3H11A, a gene that is conserved with respect to

ZC3H11B in human, were significantly down-regulated in myopic eyes

compared to naive controls (retina/ RPE/sclera, Fold change = -2.88 , -3.24 and -2.07; P = 2.60 × 10-5, 2.62 × 10-6 and 1.08 × 10-4, respectively). At the

neighboring gene SLC30A10, there was a similarly significant reduction in the expression of mRNA in the retina tissue of myopic eyes in contrast to independent controls (retina/RPE, Fold change = -2.02, -2.69; P = 2.00 × 10-4,

2.00 × 10-4, respectively), with elevated expression in the sclera (Fold change

= 4.58; P = 4.02 × 10-4). Another neighboring gene LYPLAL1 exhibited up-

regulation of transcription levels in retina tissue but was down-regulated in the

sclera (retina/ RPE/sclera, Fold change = 2.71, 3.45 and -2.36; P = 1.50 × 10-4,

1.50 × 10-4 and 1.54 × 10-4, respectively).

Immunohistochemical results confirmed the localization of ZC3H11A,

SLC30A10 and LYPLAL1 proteins in the neural retina, RPE and sclera (Figure

11). For ZC3H11A, positive immunostaining intensity was reduced

significantly in the myopic tissues of experimental mice compared to the non-

myopic independent controls (Figures 10A). This is consistent with the

differential expression patterns at the transcription level. For SLC30A10 and

LYPLAL1, there were also similarly noticeable changes in the expression of

proteins to that of their mRNA levels (Figures 10B and 10C).

3.5 Discussion

We report that the chromosome 1q41 locus (most significant SNP

rs4373767) is associated with AL in a meta-analysis of three GWAS

86

performed in the study cohorts consisting of Chinese adults, Chinese children,

and Malay adults. The discovery of chromosome 1q41 as a locus for high

myopia in our data is further supported by validation in two independent

Japanese cohorts, and the observed genetic effects are highly consistent across

all five studies. The pseudogene ZC3H11B and two nearby genes SLC30A10

and LYPLAL1 were found to be expressed in the human retina and sclera. The

potential roles in regulating myopia at three candidate genes were further

implicated by the concordant changes in the pattern of transcription and

protein expression in the mouse model.

The ZC3H11B pseudogene belongs to the CCCH-type zinc finger family,

whereas such type of zinc finger protein has been shown as a RNA-binding motif to facilitate the mRNA processing at transcription 304. Emerging evidence suggests that pseudogenes, resembling known genes but not producing proteins, play a significant role in pathological conditions by competing for binding sites to regulate the transcription of its protein-coding counterpart 305-307. Although the function of the ZC3H11B in humans is presently unknown, the implicated role of the murine gene ZC3H11A

(conserved gene of ZC3H11B in mouse) in myopia development is in keeping with previous findings that several zinc finger proteins are involved in myopia

308,309. Given their role as transcription factors 310, zinc finger protein ZENK

has been proposed to function as a messenger in modulating the visual

signaling cascade in the chicken retina, where the expression of the ZENK was

suppressed by the condition of minus defocus (induced myopic eye growth)

and enhanced by positive defocus (induced hyperopic eye growth)311-313.

Similarly, it has been reported that ZENK knockout mice had elongated AL 87

and a myopic shift in refraction309. Moreover, early growth response gene

type1 EGR-1 (the human homologue of ZENK) has been shown to activate

transforming growth factor beta 1 gene TGFB1 by binding its promoter314,315,

a gene that is implicated to be associated with myopia224,225. Another zinc

protein finger protein 644 isoform ZNF644 has recently been identified to be

responsible for high myopia using whole genome exome sequencing in a Han

Chinese family308, whereas its influence on “myopia genes” remains to be

elucidated. In light of this, the observation that ZC3H11B is abundantly

expressed in retina and sclera, together with the significant down-regulation of the coding counterpart ZC3H11A in myopic mice eyes, suggests it may promote or inhibit the transcription of ocular growth genes vital in myopia development.

One of the two neighboring genes SLC30A10 is an efflux transporter that reduces cytoplasmic zinc concentrations316. The SLC30 zinc transporters are

expressed abundantly in human RPE cells, and the retina has been observed to

possess the highest concentration of zinc in the human body303. Zinc

deficiency in the intracellular retina has thus been implicated in the

pathogenesis of age-related macular degeneration (AMD)317,318, and in RPE-

photoreceptor complex deficits, which can affect visual signal transduction

from retina to sclera and lead to visual impairment319. LYPLAL1 functions as a

triglyceride lipase and this gene has been shown to be up-regulated in

subcutaneous adipose tissue in obese individuals320-322. While the relationship

between LYPLAL1 and myopia is unknown, elevated saturated-fat intake has

been proposed to influence myopia development through the retinoid receptor

pathway227,323,324. Interestingly, the SNPs pinpointing chromosome 1q41 in 88

our study are 1Mb away from the transforming growth factor beta 2 gene

(TGFβ2) which has been implicated in the down-regulation of mRNA levels

in myopia progression of an induced tree shrew myopia model325. None of

these nearby genes, however, are within the LD block containing our

identified SNPs.

Chromosome 1q41 is a previously reported locus for refraction from a

linkage analysis of 486 pedigrees in the Beaver Dam Eye Study, US197. Using

microsatellite markers, Klein et al identified novel regions of linkage to SE on

chromosome 1q41, whereas the peak spanned a broad region near Marker

D1S2141 (multipoint P < 1.9 × 10-4). This result however was not replicated

in a subsequent genome-wide linkage scan for SE with denser SNP markers,

partially due to varying information of linkage conveyed by SNPs versus

microsatellites188. The identified variants at chromosome 1q41 in our study

were noted to exhibit weaker, albeit still significant, association with SE in

SCES and SCORM (rs4373767, SCES/SCORM: P = 3.54 × 10-3, 3.49 × 10-2,

respectively), but not in SiMES (3.51 × 10-1 ), which is consistent with the

lower correlation of AL and SE seen in the SiMES data, partially from

increasing lens opalescence in the Malay population326,327.

Our data have shown that genetic variants on chromosome 1q41 influence

the physiological attribute of AL and are also associated with high myopia.

Elongation of AL is the major underlying structural determinant of high

myopia, mostly accompanied with prolate eyeballs and thinning of the sclera,

macula and retina111. Thus, high myopia is also defined as AL of >26 mm in

some studies23,328. It is possible that genes involved in a quantitative trait

(refraction or underlying AL) also play a role in the extreme forms of the trait 89

(high myopia)30. Two recent GWAS performed in general Caucasians population have identified genetic variants for quantitative refraction at

chromosome 15q1434 and 15q2535, of which the locus on 15q14 was subsequently confirmed to be associated with high myopia in the Japanese251.

Our GWAS results herein highlight AL QTLs’ relevance for high myopia

predisposition, which advances our understanding of the genetic aetiology of

myopia at different levels of severity.

The meta-analysis of three GWAS in our discovery suggests that the

quantitative trait locus at chromosome 1q41 accounts for variation in AL in

both school children and adults, regardless of age differences. Notably, the

early-onset of myopia in childhood may continuously progress toward high

myopia in later life, while adult-onset of myopia is usually in the low or moderate form329,330. The significant association on chromosome 1q41 for

high myopia in adults and children thus also implicates this locus identified

for AL is likely to be associated with early-onset myopia.

The prevalence of myopia among Asian population is considerably higher

than in Caucasians 283. Although distinct genetic mechanisms governing

myopia may exist for populations with different genetic backgrounds, we

believe there are polymorphisms involved in refractive variation that are

shared across populations. However, the allele frequencies of these identified

SNPs vary across populations. For instance, the minor C allele of rs4373767

was a major allele in the HapMap Africans and Europeans with frequency of

0.92 and 0.62 respectively. The effect size of the top SNPs on AL was found

greatly attenuated in Singapore Indian samples (rs4373767; β = -0.04 mm per

90

minor allele C, P = 0.151, MAF = 0.48; details of Indian cohort in Chapter 4).

In addition, we note that the variability in refraction attributed to AL may vary

in different ethnic groups. For example, AL has been reported to account for a

larger proportion of the variation in refraction in East-Asian children compared to their Caucasian counterparts 331, therefore the increased power of

refraction may reflect more variation in factors other than pure elongation of

AL in certain ethnic groups. Such heterogeneity may confer different

statistical power and large sample sizes will be required to study the

transferability of the same variants in Caucasian populations 332,333. Recent

largest GWAS conducted by the international Consortium for refractive Error

and Myopia has confirmed 1q41 locus influencing axial length across

Europeans (data has not published yet), which will be discussed in the Chapter

6.

In conclusion, our findings suggest that common variants at chromosome

1q41 are associated with AL and high myopia in a pediatric and an adult

cohort, the latter incorporating Chinese, Malay and Japanese populations.

Further evaluation of causal variants and underlying pathway mechanisms

may contribute to early identification of children at highest risk of developing

myopia, and eventually lead to appropriate interventions to retard the

progression of myopia.

91

Table 5: Characteristics of study participants in the five Asian cohorts. Characteristics SCESc SCORMc SiMESc Japan Dataset 1d Japan Dataset 2e High myopia Controls High myopia Controls Individuals (n) 1,860 929 2,155 483 1,194 504 550 Male (%) 51.5 51.7 49.3 33.7 41.3 43.3 49.5 Age a (yrs) 58.4 (9.5) 10.8 (0.8) 57.7 (13.9) 58.8 (13.2) 50.3 37.8 (11.9) 39.7 (15.9) (12.6) Range of age [44, 85] [10, 12] [40, 80] [14, 91] [20, 79] [12, 76] [21, 75] Height a (cm) Male 168.5 (6.3) 144.8 (8.7) 165.5 (6.4) NAf NA NA NA Female 156.7 (5.5) 145.5 (8.9) 152.3 (6.2) NA NA NA NA Average ALa (mm) 23.97 (1.39) 24.13 (1.18) 23.57 (1.04) 30.08 (1.38) NA 27.83 (1.28) NA Range of AL [20.64, 33.36] [21.05,28.20] [20.48, 31.11] [28.00, 38.03] NA [24.25, 34.74] NA Average SEa (diopter) -0.77 (2.64) -2.02 (2.26) -0.05 (1.90) -14.86 (4.28) NA -11.61 (2.22) NA Range of SE [-15.40, 6.25] [-11.09, 3.78] [-17.46, 8.56] [-42.00, -2.50] NA [-23.00, -9.25] ≥ -3.0 a Data presented are means ( standard deviation). AL, ocular axial length; SE, spherical equivalent. b The education levels of the children in SOCRM was presented by the level of educational attainment of the father, as cGWAS cohorts. SCES, Singapore Chinese Eye Study; SCORM, Singapore Cohort study of the Risk factors for Myopia; SiMES, Singapore Malay Eye Study. dFor the Japan dataset 1, high myopia, AL ≥ 28mm for both eyes; controls, general healthy population. e For the Japan dataset 2, high myopia, SE ≤ -9.0 D for either eye; controls, SE ≥ -3.0 D for both eyes. fNA, data not available.

92

-5 Table 6. Top SNPs (Pmeta-value ≤ 1 × 10 ) associated with AL from the meta-analysis in the three Asian cohorts.

SCESc SCORMc SiMESc Meta-analysis (n = 1,860) (n = 929) (n = 2,155) (n = 4,944) d Nearest β β β βmeta a b e SNP Gene CHR BP MA MAF (s.e) P MAF (s.e.) P MAF (s.e.) P (s.e.) Pmeta Phet rs4373767 ZC3H11B 1 217826305 C 0.30 -0.21 2.55 ×10-6 0.32 -0.16 1.80 ×10-3 0.24 -0.12 1.22 ×10-3 -0.16 2.69 ×10-10 0.23 (0.05) (0.05) (0.04) (0.02) rs10779363 ZC3H11B 1 217853513 C 0.29 -0.21 5.27 ×10-6 0.31 -0.17 1.63 ×10-3 0.24 -0.11 1.70 ×10-3 -0.15 7.83 ×10-10 0.23 (0.05) (0.05) (0.04) (0.02) rs7544369 ZC3H11B 1 217856085 T 0.29 -0.21 7.17 ×10-6 0.31 -0.16 2.56 ×10-3 0.24 -0.12 1.45 ×10-3 -0.15 1.10 ×10-9 0.29 (0.05) (0.05) (0.04) (0.03) rs4428898 ZC3H11B 1 217806589 G 0.30 -0.21 4.49 ×10-6 0.31 -0.14 7.46 ×10-3 0.22 -0.10 4.79 ×10-3 -0.14 9.07 ×10-9 0.19 (0.05) (0.05) (0.04) (0.02) rs4557020 SPTBN1 2 54571685 T 0.37 -0.11 9.08 ×10-3 0.37 -0.15 2.81 ×10-3 0.31 -0.11 8.10 ×10-4 -0.12 2.61 ×10-7 0.76 (0.04) (0.05) (0.03) (0.02) rs282544 PARP8 5 50062222 C 0.35 -0.10 2.38 ×10-2 0.33 -0.14 6.13 ×10-3 0.40 -0.09 2.40 ×10-3 -0.10 4.15 ×10-6 0.70 (0.04) (0.05) (0.03) (0.02) rs1137 SEMA4F 2 74792684 C 0.16 -0.01 8.68 ×10-1 0.18 0.22 (0.06) 5.97 ×10-4 0.26 0.14 3.01 ×10-5 0.12 4.26 ×10-6 0.02 (0.06) (0.03) (0.03) rs2404958 PARP8 5 50098792 T 0.35 -0.10 2.91 ×10-2 0.33 -0.15 5.32 ×10-3 0.40 -0.09 2.46 ×10-3 -0.10 4.72 ×10-6 0.66 (0.04) (0.05) (0.03) (0.02) rs10735496 ZC3H11B 1 217790029 C 0.29 -0.14 1.95 ×10-3 0.30 -0.09 7.61 ×10-2 0.26 -0.10 3.66 ×10-3 -0.11 5.75 ×10-6 0.71 (0.05) (0.05) (0.03) (0.02) rs4671938 SPTBN1 2 54546708 G 0.35 -0.08 5.85 ×10-2 0.34 -0.12 1.61 ×10-2 0.33 -0.11 8.92 ×10-4 -0.10 7.46 ×10-6 0.82 (0.04) (0.05) (0.03) (0.02) rs32396 PARP8 5 50142196 A 0.35 -0.09 3.78 ×10-2 0.33 -0.14 6.98 ×10-3 0.40 -0.09 2.57 ×10-3 -0.10 7.73 ×10-6 0.69 (0.04) (0.05) (0.03) (0.02) rs12055210 PARP8 5 50025458 A 0.35 -0.10 2.38 ×10-2 0.33 -0.14 8.39 ×10-3 0.40 -0.09 4.01 ×10-3 -0.10 9.11 ×10-6 0.70 (0.04) (0.05) (0.03) (0.02) rs11954386 PARP8 5 50021409 A 0.35 -0.10 2.17 ×10-2 0.33 -0.14 6.34 ×10-3 0.40 -0.09 5.35 ×10-3 -0.10 9.44 ×10-6 0.63 (0.04) (0.05) (0.03) (0.02) aMA, minor allele. bMAF, minor allele frequency in each cohort. cGWAS cohorts. SCES - Singapore Chinese Eye Study; SCORM - Singapore Cohort study of the Risk factors for Myopia; SiMES - Singapore Malay Eye Study. dβ, coefficient of linear regression; s.e., standard error for coefficient β. Association between each genetic marker and AL was examined using linear regression, adjusted for age, gender, height and level of education. The effect sizes denote changes in millimeter of AL per each additional copy of the minor allele. e Phet, P-value for heterogeneity by Cochran’s Q test across three study cohorts.

93

Table 7. Association between genetic variants at chromosome 1q41 and high myopia in the five Asian cohorts.

Japan Dataset 1 Japan Dataset 2 SCESb SCORMb SiMESb Meta–analysis (483/1,194)c (504/550) (44/1,305) (65/332) (22/2,052) (1,118/5,433) d e SNP BP MA OR P OR P OR P OR P OR P OR Pmeta Phet a (95% CI) (95% CI) (95% CI) (95% CI) (95% CI) (95% CI) rs4428898 217806589 G 0.74 2.33 ×10-4 0.76 2.15 ×10-3 0.73 2.06 ×10-1 0.63 4.89 ×10-2 0.72 4.37 ×10-1 0.74 7.86 ×10-8 0.96 (0.64, 0.87) (0.64, 0.91) (0.45, 1.19) (0.40, 0.99) (0.32, 1.64) (0.66, 0.83) rs4373767 217826305 C 0.74 1.44 ×10-4 0.81 1.80 ×10-2 0.73 1.99 ×10-1 0.59 2.59 ×10-2 0.77 5.16 ×10-1 0.75 4.38 ×10-7 0.75 (0.63, 0.86) (0.68, 0.96) (0.44, 1.18) (0.38, 0.94) (0.35, 1.69) (0.68, 0.84) rs10779363 217853513 C 0.74 2.11 ×10-4 0.81 1.41 ×10-2 0.68 1.35 ×10-1 0.62 4.13 ×10-2 0.76 4.96 ×10-1 0.76 7.81 ×10-7 0.82 (0.63, 0.87) (0.68, 0.96) (0.41, 1.13) (0.39, 0.98) (0.34, 1.68) (0.68, 0.85) rs7544369 217856085 T 0.75 3.05 ×10-4 0.82 2.32 ×10-2 0.67 1.45×10-1 0.62 4.17 ×10-2 0.74 4.56 ×10-1 0.76 1.45 ×10-6 0.79 (0.64, 0.88) (0.69, 0.97) (0.39, 1.15) (0.39, 0.98) (0.33, 1.64) (0.69, 0.85) aMA, minor allele. bGWAS cohorts; SCES - Singapore Chinese Eye Study; SCORM - Singapore Cohort study of the Risk factors for Myopia; SiMES - Singapore Malay Eye Study. cThe sample sizes for each study denote the number of high-myopia cases versus controls. For the Japan dataset 1, high myopia, AL ≥ 28mm for both eyes; controls, general healthy population; For the Japan dataset 2, SCES, and SiMES, high myopia, SE ≤ -9.0 D for either eye; controls, SE ≥ -3.0 D for both eyes; For SCORM children, high myopia, SE ≤ -6.0 D for either eye; controls, SE ≥ -1.0 D for both eyes. dOR, odds ratio per copy of minor allele. e Phet, P-value for heterogeneity by Cochran’s Q test across five study cohorts.

94

Figure 4. Principal component analysis (PCA) was performed in SiMES to assess the extent of population structure. Each figure represents a bivariate plot of two principal components from the PCA analysis of genetic diversity within SiMES. (A) 1st eigenvector against 2nd eigenvector, (B) 2nd eigenvector against 3rd eigenvector, (C) 3rd eigenvector against 4th eigenvector and (D) 1nd eigenvector against 5th eigenvector . The first 5 principal components were used as covariates to account for population structure.

95

Figure 5. Principal Component Analysis (PCA) of discovery cohorts SCES, SCORM and SiMES with respect to the four population panels in phase 2 of the HapMap samples (CEU - European, YRI – African, CHB – Chinese, JPT – Japanese). ( A) Principal components 1 versus 2; The principal components (PCs) were calculated with SCES, SCORM, SiMES and four HapMap panels on the thinned set of 102,122 SNPs (r2 <0.2). (B) Principal components 1 versus 2; (C) Principal components 1 versus 3; (D) Principal components 1 versus 4. For figure(B-D), The PCs were calculated with SCES, SCORM, SiMES and HapMap Asian population panels on the thinned set of 86,516 SNPs (r2 <0.2).

96

Figure 6. Quantitle –Quantitle (Q-Q) plots of P-values for association between all SNPs and AL in the individual cohort (A) SCES, (B) SCORM,(C) SiMES, and combined meta-analysis of the discovery cohorts (D) SCES + SCORM + SiMES.

97

Figure 7. Manhattan plot of -log10(P) for the association on axial length from the meta-analysis in the combined cohorts of SCES, SCORM and SiMES. The red horizontal line denotes genome-wide significance (P = 5.0 × 10-8).

98

Figure 8. The chromosome 1q41 region and its association with axial length in the Asian cohorts. A) Regional plots for AL from the meta-analysis of three Asian GWAS cohorts: SCES, SCORM and SiMES. The association signals in a 1 megabase (Mb) region at chromosome 1q41 from 217,400 kb to 218,400 kb around the top SNP rs4373767 (red diamond) are plotted. The degree of pair-wise LD between the rs4373767 and any genotyped SNPs in this region is indicated by red shading, measured by r2. Superimposed on the plots are gene locations and recombination rates in HapMap Chinese and Japanese populations (blue lines). B) LD plot showing pair-wise r2 for all the SNPs genotyped in HapMap database residing between rs4428898 and rs7544369, inclusively, at chromosome 1q41. The four identified top SNPs are in red rectangles. The LD plot is generated by Haploview using SNPs (MAF > 1%) genotyped on Han Chinese and Japanese samples in the HapMap database. All coordinates are in Build hg18.

99

Figure 9. mRNA expression of ZC3H11B, SLC30A10 and LYPLAL1 in human tissues . Expression of mRNA for the three genes was examined in human brain, placenta, neural retina (retina), retinal pigment epithelium (RPE) and sclera from adult tissues, and retina/RPE and sclera from 24-week gestation fetal tissues using reverse transcription polymerase chain reaction (RT-PCR). Glyceraldehyde 3-phosphate dehydrogenase (GAPDH) is a housekeeping gene and was used as an internal control for the quantification of mRNA expression. NTC (No template control) served as a negative control with the use of water rather than cDNA during PCR.

100

Figure 10. Transcription quantification of ZC3H11A, SLC30A10 and LYPLAL1 in mouse retina, retinal pigment epithelium and sclera in induced myopic eyes, fellow eyes and independent control eyes. Myopia was induced using -15 diopter negative lenses in the right eye of mice for 6 weeks. Uncovered left eyes were served as fellow eyes and age-matched naive mice eyes were controls. Quantification of mRNA expression in mice neural retina (retina), retinal pigment epithelium (RPE) and sclera using quantitative real-time PCR. The bar represents the fold changes of mRNA for each gene after normalization using GAPDH as reference. The mRNA levels of murine ZC3H11A, a gene that is conserved with respect to ZC3H11B in human, SLC30A10 and LYPLAL1 in myopic and fellow retina, RPE and sclera are compared with independent controls with P-values as follows: ZC3H11A (retina/ RPE/sclera, P = 2.60 × 10-5, 2.62 × 10-6 and 1.08 × 10-4 respectively), SLC30A10 (P = 2.00 × 10-4, 2.00 × 10-4 and 4.02 × 10-4 respectively) and LYPLAL1 (P = 1.50 × 10-4, 1.50 × 10-4, 1.54 × 10-4 respectively). *P<0.0001.

101

.

Figure 11. Immunofluorescent labelling of (A) ZC3H11A (B) SLC30A10 and (C) LYPLAL1 in mouse retina, retinal pigment epithelium and sclera in induced myopic eyes, fellow eyes and independent control eyes. The neural retina (retina), retinal pigment epithelium (PRE) and scleral cells were immunolabeled with the polycolonal antibodies against ZC3H11A, SLC30A10 and LYPLAL1 and were co-labeled with 4',6-diamidino-2- phenylindole (DAPI). Negative controls were devoid of a fluorescence signal, treated with the secondary antibody alone and DAPI. No immunostaining was observed in the negative controls. Scale bar represents 50µM and magnification is 200X. The florescence intensity labeled of the green color shows the localization of proteins and blue color indicates the nuclei that were stained with DAPI. Expression of the proteins had a trend in abundance similarly to that of their mRNA levels as depicted in Figure 10. Lower level of expression was determined for ZC3H11A in all tissues for myopic mice. Similarly significant reduction was shown in the expression of SLC30A10 in retina and RPE while higher level of expression was found in myopic sclera. LYPLAL1 showed higher level of expression in the retina and RPE tissue but reduced expression in the sclera in myopic mice. The following abbreviations represent the retinal layers: nerve fibre layer (NFL), ganglion cell layer (GCL), inner plexiform layer (IPL), inner nuclear layer (INL), outer plexiform layer (OPL), outer nuclear layer (ONL), photo receptor layer (PRL) and retinal pigment epithelium (RPE).

102

4 Chapter 4 Genome-Wide Meta-Analysis of Five Asian

Cohorts Identifies PDGFRA as a Susceptibility Locus for

Corneal Astigmatism

This Chapter is a slightly modified version of the paper published in PLoS Genet

2011, 7(12): e1002402. doi:10.1371/journal.pgen.1002402

4.1 Abstract

Corneal astigmatism refers to refractive abnormalities and irregularities in the curvature of the cornea, and this interferes with light being accurately focused at a single point in the eye. This ametropic condition is highly prevalent, influences visual acuity and is a highly heritable trait. There is currently a paucity of research in the genetic aetiology of corneal astigmatism. Here we report the results from five genome-wide association studies of corneal astigmatism across three Asian populations, with case-control datasets of Chinese (1,991 cases /954 controls),

Malays (1,018 cases /1,220 controls), Indians (825 cases /1,314 controls) and an independent 397 Chinese family trios. Variants in PDGFRA on chromosome 4q12

(lead SNP: rs7677751, allelic odds ratio = 1.26 (95% CI: 1.16-1.36), Pmeta = 7.87

× 10-9) were identified to be significantly associated with corneal astigmatism, exhibiting consistent effect sizes across all five cohorts. This highlights the potential role of variants in PDGFRA in the genetic aetiology of corneal astigmatism across diverse Asian populations.

103

4.2 Background

Astigmatism is a condition where light rays are prevented from focusing at a single point in the eye, resulting in blurred vision at any near or far distance. The

reported age-adjusted prevalence of astigmatism was 37.8% for Chinese adults284,

54.8% in rural Asian Indians334, 37% (≤ -0.75 D) for Caucasian in Australia335

and 36.2% in the US336. The prevalence of astigmatism in children varies considerably across different studies and ethnic groups. For instance, the prevalence of astigmatism (≤ -0.75 D) in school-children ranges from 13.6% in

Australia337, 20% in Northern Ireland338, 28.4% for Singapore school children122,

to 42.7% for Chinese children in urban China339.

Although the precise cause of astigmatism is unknown, genetic factors have

been implicated in the aetiology of corneal astigmatism. Studies have reported a higher risk of developing astigmatism in individuals whose sibling or parents have astigmatism125. However, only a few segregation analyses have been

conducted for astigmatism. A study examined the mode of inheritance of

astigmatism in 476 individuals from 125 nuclear familiars with probands affected

by astigmatism340. The results suggested familial transformation existing for

affected probands (≤ -0.75 D), and single-major-locus inheritance when the

corneal astigmatism was coded as ordinal outcome by the severity degree.

Proportion of genetic contribution to astigmatism development is generally

lower than that for spherical refractive error, with the estimated heritability, in

general, ranging from 30% to 60%157,341-344. For instance, Hammond and

colleagues157 investigated the inheritance of astigmatism for 226 monzygotic

104

(MZ) and 280 dizygotic (DZ) twins in the United Kingdom and found genetic

effects accounted for 42% to 61% of the variation in corneal astigmatism.

Similarly, heritability estimate was as high as 60% in 345 MZ and 267 DZ twin

pairs in Austria in myopia (GEM) twin study341. While most of the twin studies

have been conducted in Caucasian populations, a study on Chinese twins in

Taiwan reported a heritability estimate of 46% for corneal astigmatism,

suggesting that genetic factors account for a similar extent in the aetiology of

astigmatism for Asian populations343. However, in a recent study in 52 MZ and

47 female twin pairs aged 66-79 years, the genetic contribution to the corneal

astigmatism is not significant345; this may be due to the small samples size.

No genetic loci have been systematically and consistently identified to be implicated in the development of corneal astigmatism. Here we report the findings from the meta-analyses of five genome-wide association studies (GWAS) performed in 8,513 individuals from three Asian populations. The discovery phase of our study comprises 4,254 individuals from two population-based

GWAS performed in adults of Chinese and Malay ethnicities from the Singapore

Prospective Study Program (SP2) and the Singapore Malay Eye Study (SiMES) respectively. The replication phase comprises of data from three other genome- wide surveys of: (i) 2,139 Indian adults from the Singapore Indian Eye Study

(SINDI); (ii) 929 Chinese school children from the Singapore Cohort Study of the

Risk Factors for Myopia (SCORM); and (iii) 397 Chinese trios of parents and astigmatic offsprings from the Singaporean Chinese in the Strabismus, and Refractive Error Study (STARS).

105

4.3 Methods

4.3.1 Study cohorts

Singapore Prospective Study Program (SP2): Participants included in SP2

were recruited from a revisit of four previously conducted population-based

surveys in Singapore: Thyroid and Heart Study 1982-1984346, National Health

Survey 1992347, National University of Singapore Heart Study 1993–1995348 and

347-349 the National Health Survey 1998 . These studies comprise random samplings of individuals stratified by ethnicity from the entire Singapore population. From

2003 to 2007, a total of 10,747 adults aged 40 years or older were invited in this follow-up survey of which 5,157 underwent a health examination and blood samples were drawn. The present genome-wide genotyping involved individuals of Chinese descent only (n = 2,867). Complete post-filtering data on corneal astigmatism for GWAS were available for 2,016 individuals.

Singapore Malay Eye Study (SiMES): SiMES is a population-based cross- sectional survey on eye diseases for Malay adults aged 40 to 80 years living in

Singapore. It was conducted between August of 2004 and June of 2006. The details of the study design and methods have been previously described291. In

brief, a total of 4,168 Malay residents in the Southwestern area of Singapore were

identified and invited for a detailed ocular examination which 3,280 (78.7%)

participated. Genome-wide genotyping was performed in 3,072 individuals26,29.

Individuals with cataract surgery and missing corneal astigmatism measurements

were excluded from the study. Complete post-filtering data on corneal

106

astigmatism for GWAS were available for 2,238 individuals.

Singapore Indian Eye Study (SINDI): SINDI is a population-based survey of major eye diseases288 in ethnic Indians aged 40 to 80 years living in the South-

Western part of Singapore and was conducted from August 2007 to December

2009. In brief, 4,497 Indian adults were eligible and 3,400 participated. Genome-

wide genotyping was performed in 2,953 individuals26,29. As in the discovery

cohorts, participants were excluded from the study if they had cataract surgery

and missing corneal astigmatism data. Complete post-filtering data on corneal astigmatism were available for 2,134 participants.

Singapore Cohort study of the Risk factors for Myopia (SCORM): A total of

1,979 children in grades 1, 2, and 3 from three schools were recruited from 1999 to 2001 with detailed information described elsewhere289. The children were

examined on the school premises annually by a team of eye care professionals.

The GWAS was conducted in a subset of Chinese children of 1,116 subjects22.

The phenotype used in this study was based on the corneal astigmatism measured

on the 4th annual examination of the study (children at age 10 to 12 years).

Complete post-filtering data on corneal astigmatism measurements and SNP data

were available in 929 SCORM children.

Singaporean Chinese in the Strabismus, Amblyopia and Refractive Error

Study (STARS): STARS is a population-based survey of Chinese families with

children aged 6 to 72 months residing in the south-western and western region of

Singapore. Disproportionate random sampling by 6-month age groups resulted in

the recruitment and subsequent eye examination of 3,009 Chinese children

107

between May 2006 and November 2008. Details of the study design and

methodology have been previously described350. A total of 1,451 samples from

440 nuclear families were included for genome-wide genotyping. In all, 397 trio-

sets of parents and their offsprings with corneal astigmatism had complete post-

filtered genotype data.

Measurements and definition of corneal astigmatism

All studies used a similar protocol to measure ocular phenotypes including

corneal curvature, autorefraction and cylinder power by a team of eye care

professionals. Participants in SP2, SIMES and SINDI underwent non-cycloplegic

automated refractive assessments using the autorefractor (Canon RK-5, Tokyo,

Japan). For SCORM and STARS, cycloplegic measurements (Canon RK-F1,

Tokyo, Japan) were performed 30 minutes after three drops of 1% cyclopentolate which were administered 5 minutes apart.

Corneal curvature radii in the horizontal and vertical meridian were determined with keratometry in millimeters351. The keratometer measured the

anterior corneal surface and used a refractive index of 1.3375 to account for the

contribution from the posterior corneal surface to derive the corneal refractive

power in diopters. Corneal cylinder power was calculated as the difference

between the flattest and steepest meridian of the keratometry readings in diopters

of power.

As the corneal cylinder power between the right and left eyes are strongly

correlated across all five cohorts (Pearson’s correlation coefficient r ranging from

0.51 to 0.79; P < 0.001), the mean corneal cylinder power over both eyes was

108

used to define corneal astigmatism. Averaging ocular measurements between two

eyes in genetic studies has been suggested to be statistically more powerful than

using the information from only one eye36, and this approach has been

consistently adopted in genome-wide studies of myopia34,35. As with previous

studies337,340, we have defined individuals with average corneal cylinder power ≤ -

0.75 D as cases, and those with average corneal power between -0.75 D and 0 D

as controls.

4.3.2 Data quality control

For SP2, a total of 2,867 blood-derived DNA samples were genotyped using

the Illumina Human610 Quad and 1Mduov3 Beadchips. For the samples that were

genotyped on the two platforms, the genotypes from the denser platform were

used in our study. For SiMES (n = 3,072), SINDI (n = 2,593) and STARS (n =

1,451), the Illumina Human610 Quad Beadchips were used for genotyping all

DNA samples. For the 1,116 SCORM children, DNA samples were genotyped on

the Illumina HumanHap 550 Duo Beadchips®.

In brief, for case-control study design, QC criteria included a first round for

autosomes SNP QC to obtain a cleaned set of genotypes for sample QC, by

excluding SNPs with: (i) missingness (per-SNP call rate) > 5%; (ii) minor allele

frequency (MAF) < 1%; and (iii) HWE p-value < 10-7. Using the subset of SNPs passing the first round QC, samples were then excluded based on the following conditions: (i) per-sample call rates of less than 95%; (ii) excessive heterozygosity (defined as the sample heterozygosity to be beyond 3 standard

109

deviations from the mean sample heterozygosity); (iii) cryptic relatedness; (iv)

gender discrepancies; and (v) deviation in population membership from

population structure analysis. A second round of SNP QC was then applied to the

remaining samples passing quality checks. We excluded SNPs with missingness >

5%, gross departure from HWE (P value < 10-6), MAF <1 % and low

concordance between duplicate samples on different genotype platforms (relevant

to SP2 samples only).

Population structure was ascertained using principal components analyses

(PCA) with the EIGENSTRAT program7. Population substructure of SP2,

SiMES, SINDI and SCORM was examined by PCA with respect to three

population panels in the HapMap samples (Figure 12). Due to the presence of population structure within the Malay (Figure 4) and Indian samples (Figure 13), we adjusted for the top 5 principal components in the association analyses for the

SiMES and SINDI datasets.

For the STARS trios, we additionally excluded samples and trio-sets on the basis of excessive Mendelian inconsistencies defined as having > 1% of the post-

QC SNPs exhibiting Mendelian errors. SNPs with more than 10% Mendelian errors are excluded from the association analyses, and the genotypes leading to

Mendelian errors in all other remaining SNPs are coded as missing. As family trios are more robust to the presence of population structure, we did not exclude any samples due to population structure.

110

Detailed data quality control (QC) procedures for each study are as follows

(except the QC procedure for SCORM and SiMES are provided in Chapter 3

Method Session):

SP2: Of the 2,867 blood-derived DNA samples, 392 samples were genotyped on the HumanHap 550v3, 1,459 samples on the 610-Quad, 817 samples on the

1M-Duov3, 191 samples on both 550v3 and 1M-Duov3, and 8 samples on both

610-Quad and 1M-Duov3. For the samples that were genotyped on two platforms, we used the genotypes from the denser platform in our study. We excluded 443 individuals on the following conditions, sample call rates of less than 95%, excessive heterozygosity, cryptic relatedness by IBS, population structure ascertainment, and gender discrepancies as listed in the main text. This left 2,434 post-QC SP2 samples. During the SNPs QC procedure, we excluded SNPs with low genotyping call rates (> 5% missingness) or monomorphic, with MAF < 1%, or with significant deviation from HWE (P< 10-6). This yielded a post-QC set of

462,580 SNPs. As SP2 samples are genotyped on different platforms, the concordance of the duplicate samples plated on different Beadarrays chips was also examined as quality of genotyping. The average SNP concordance rate between chips for the post-QC duplicated samples was 0.995. We additionally assessed the SNPs that are present on different platforms for extreme variations in allele frequencies with a 2-degree of freedom chi-square test of proportions, removing 62 SNPs with P-values < 0.0001. We further excluded those with missing phenotype data on corneal astigmatism (n = 418). This yields a final set

111

of 462,518 SNPs that are common to all 2,016 SP2 samples (1231 cases, 785

controls).

SiMES: Using the same quality control criteria, we omitted a total of 530

individuals from the total of 3,280 samples in the study, including those of

subpopulation structure (n = 170), cryptic relatedness (n = 279), excessive

heterozygosity or high missingness rate > 5% (n = 37), and gender discrepancy (n

= 44). After the removal of the samples, SNP QC was then applied on a total of

579,999 autosomal SNPs for the 2,542 post-QC samples. SNPs were excluded based on (i) high rates of missingness (> 5%) (n = 26,343); (ii) monomorphic

SNPs or MAF < 1% (n = 34,891); or (iii) genotype frequencies deviated from

HWE (p < 1 × 10-6) (n = 3,053). This yielded 515,712 SNPs after the same SNP

QC criteria. Those with missing data or having cataract surgery were further removed (n = 304). Finally, 515,712 SNPs in 2,238 samples (1018 cases, 1220 controls) were available for association analysis.

SINDI: We excluded 415 subjects from the total of 2,953 genotyped samples

based on: excessive heterozygosity or high missingness rate > 5% (n = 34),

cryptic relatedness (n = 326), issues with population structure ascertainment (n =

39) and gender discrepancies (n = 16). This left a total of 2,538 individuals with

579,999 autosomal SNPs. During SNP QC procedure, i) 17,923 SNPs were

removed due to high rates of missingness (> 5%), ii) and 17,966 SNP that were

monomorphic or had MAF < 1%, iii) and 2,958 SNPs out of HWE (p < 10-6).

After SNP QC, 541,152 SNPs were available for analysis. We further excluded

any individuals with missing data on corneal astigmatism measurements and

112

having history of cataract surgery (n = 399). This yielded a final dataset of 2,139 individuals (825 cases, 1,314 controls) and 541,152 post-QC SNPs.

STARS: Of the total of 1,451 genotyping samples, the individuals were excluded based on gender discrepancy (n = 17), high missingness (> 5%), excessive heterozygosity (n = 11) and excessive Mendelian inconsistency

(Mendelian error per family > 1%; n = 14). This left a set of 1,408 samples in 436 families after quality control, of which 57 samples (from 29 families) were further excluded due to founder singleton, or non-founder without 2 parents, yielding

1,351 individuals in 407 families. SNP QC was performed on the 576,979 autosomal SNPs by excluding those with high missingness rate > 5% (n = 1,984), gross departure of HWE for parents (P < 10-8) (n = 908), monomorphic SNPs and

SNPs with MAF <1% (n = 80,804), and SNPs exhibiting significant degree of

Mendelian inconsistencies (> 10% of all the trios) (n = 7). This led to a final set of

493,594 post-QC SNPs. We further restricted the family-based analyses to 1,191 individuals in 397 parents-trios (317 nuclear families) having corneal astigmatic child. In all, 493,594 SNPs for 1,191 samples from 397 parent-trios were available for family GWAS on corneal astigmatism.

4.3.3 Statistical methods

The genome-wide association tests were performed using PLlNK (version

1.07; http://pngu.mgh.harvard.edu/~purcell/plink/) as the primary analysis tool. A logistic regression adjusted for age and gender is used to model the association of genetic markers with corneal astigmatism. For each of SiMES and SINDI, the top

113

5 principal components of genetic ancestry from the EIGENSTART PCA were

also included as covariates to adjust for population stratification in these

populations. We assumed an additive genetic model where the genotypes of each

SNP is coded as 0, 1, and 2 for the number of minor alleles carried, in keeping

with increments in allelic dosage. For family GWAS association tests in STARS,

a transmission disequilibrium test (TDT) is used to measure significant distortions

in transmission of an allele from heterozygous parents to the affected offspring

under the condition of Mendel’s law352.

We also performed a quantitative trait analysis with the average corneal

cylinder power as the outcome. This is performed in PLINK for the unrelated

samples, and in FBAT (http://www.hsph.harvard.edu/~fbat/fbat.htm) for the

family trios. As the distribution of the quantitative trait of corneal cylinder power

is skewed, we performed a normal quantile transformation 353 prior to the

association analysis for unrelated samples. For family-based data, no

transformation was conducted since the FBAT does not require normal trait55

A meta-analysis of all five datasets is performed using weighted-inverse variance estimated from fixed-effect modelling in METAL

(http://www.sph.umich.edu/csg/abecasis/metal/). We adopt the method by

Kazeem and Farrall352 to pool the evidence from the case-control analyses and the

family trio TDT. For the quantitative trait analysis, the overall z statistics is

calculated as a weighted sum of the z-statistics from the linear regressions in the

non-familial data and FBAT analysis for the family-based data, weighted by

number of unrelated individuals or trios in the respective studies58.

114

Genotyping quality of all reported SNPs in this paper have been visually assessed by checking the intensity clusterplots.

4.4 Results

4.4.1 Datasets after quality control

The characteristics of the post-QC samples from the five studies are summarised in Table 8. The post-QC SP2 dataset comprised 2,016 adults, of which 1,231 individuals had corneal astigmatism (≤ -0.75 D) and 785 subjects were defined as non-astigmatic controls. The post-QC datasets comprised 2,238 adults (1,018 cases and 1,220 controls) in SiMES, 825 cases and 1,314 controls in

SINID, and 760 cases and 169 controls in SCORM. 397 trio-sets of parents and their offsprings with corneal astigmatism passed sample QC (n = 1,191). In total,

the genome-wide meta-analysis was conducted on 460,528 autosomal genotyped

SNPs passed stringent quality control criteria present in all studies consisting of

8,513 individuals.

There was no evidence of over-inflation of statistical significances due to

population structure in either of the cohorts (SP2 λGC = 1.006, SiMES λGC =

1.007, SINDI λGC = 1.009, SCORM λGC = 1.01, STARS λGC = 1.021 ) or in the

meta-analysis of all studies (overall λGC = 1.002). Suggestive evidence of

association (defined as 10-7 < P-value < 10-5) were seen in each of cohort

(Figures 14A -E), as well as in the meta-analysis of five cohorts where a collection of SNPs deviated from their expected distributions in the quantile- quantile plots of the P-values (Figure 14F).

115

4.4.2 Gene PDGFRA exhibiting genome-wide significance

Three SNPs (rs7677751, rs2307049 and rs7660560) in meta-analysis attained

genome-wide significance of P-value < 5 × 10-8 and were found to cluster in the

platelet-derived growth factor receptor alpha (PDGFRA) gene on chromosome

4q12 (lowest P = 7.87 × 10-9 at rs7677751; Table 9; Figure 15). The direction

and magnitude of the effect sizes at these SNPs in all five cohorts were highly

similar. No significant evidence of effect size heterogeneity was detected across

the SNPs (heterogeneity I2 P-value ≥ 0.246), and the minor allele frequencies of

these SNPs are consistently similar across all five studies. Interestingly, these

SNPs are located within the MYP9 region identified previously as a candidate

locus for myopia through linkage scans 39.

At the most significant SNP rs7677751 in PDGFRA, the frequency of the risk

T-allele ranged from 0.19 to 0.26 in the five cohorts and conferred a 26% higher risk of corneal astigmatism than the C allele (OR = 1.26, 95% CI = 1.16 – 1.36) in the meta-analysis across all five studies (Figure 16). This SNP alone explains

0.41% of the variation in corneal cylinder power. In addition, a general genetic model identified that the 5.5% of the individuals in the combined cohorts that carry the TT genotype at rs7677751 had a 1.65-fold (95% CI = 1.33 – 2.06, P- value = 6.23 × 10-6) increase in the risk of developing corneal astigmatism

compared to those that are not carrying any copies of the risk allele. All of the

associated SNPs spanned 10kb within PDGFRA at 4q12, and a high degree of

linkage disequilibrium is seen at this locus in all three Asian populations

(Chinese, Malays and Asian Indians; Figure 17). 116

4.5 Discussion

We have performed a genome-wide survey for corneal astigmatism across

8,513 individuals, combining the data from five GWAS performed in Chinese and

Malay adults, Indian adults, Chinese children and family trios. We observed a

strong and consistent association with the onset of corneal astigmatism at the

PDGFRA gene locus on chromosome 4q12 across all five Asian cohorts, with

three SNPs in this locus exhibiting evidence stronger than genome-wide significance in the meta-analysis. To the best of our knowledge, this is the first

GWAS to investigate the genetic aetiology of corneal astigmatism in a genome- wide fashion.

The PDGFRA gene spans 69kb with 23 coding and resides on chromosome 4q12. The receptor for platelet-derived growth factor (PDGF) contains two types of subunit, a- and β- PDGFRA, which are differentially expressed on the cell surface354. PDGFR-a binds to three forms of PDGF (PDGF-

AA, AB and BB) and mediates many biological process including embryonic

development, angiogenesis, cell proliferation and differentiation. The role of

PDGFRA in cellular growth and proliferation is underlined by its contribution to

the pathogenesis of gastrointestinal stromal tumours355. A large body of evidence

has shown that both PDGF and its receptors are expressed in corneal epithelium,

stromal fibroblasts and endothelium356,357 as well as proliferative retinal tissues in eyes358-360. Along with other cytokines (epidermal growth factor, transforming growth factor-a,-β etc), studies have further suggested that PDGF and its

117

receptors can mediate corneal fibroblast migration, matrix remodelling and play

an important role in corneal wound healing357,361-363. The corneal stroma

comprises a large portion of the cornea; the sensitivity of stromal tissue to various

growth factors is well described364. The administration of PDGF resulted in

keratinocyte elongation using rabbit corneal stroma tissue365. In light of this, a

role for PDGFRA in the regulation of ocular development and parameters cannot

be excluded, and our study suggests that genetic polymorphisms within PDGFRA

may be involved in the regulation of corneal biometrics resulting in the

occurrence of corneal astigmatism.

In addition, Hammond and colleagues reported that 4q12 (MYP9; LOD 3.3)

was significant linked with myopia from a genome-wide linkage study of 221 dizygotic twin pairs39, and subsequent replication revealed nominal significance

of 4q12 (P = 0.065) for refractive error in African-American families38. We thus

undertook a candidate SNP approach with the identified SNPs to investigate the

possible association between PDGFRA and (i) the onset of high myopia; (ii) the refractive error as a quantitative trait. We did not observe any striking association between the identified variants with either outcomes, suggesting that the association of PDGFRA with corneal astigmatism is probably not through any shared aetiology with myopia.

The most significant SNP in our analyses rs7677751 is located in the intron 1 of PDGFRA. Interestingly, among the SNPs identified, rs2228230 is coding- synonymous (valine:GTC > valine:GTT) and resides in exon 18, while rs3690 is within the untranslated-3’ region (Figure 17). These three SNPs (rs7677751,

118

rs2228230 and rs3690) are strongly correlated with each other (pair-wise Pearson correlation coefficient r ranging from 0.77 to 0.81), although the association evidence at the latter two SNPs did not reach genome-wide significance. As the next closest gene (GSX2) from the 5’ end of PDGFRA is 127kb away and is not within the LD block with our identified SNPs, it is unlikely that the signals

observed in our study are attributed to functional variants located beyond

PDGFRA.

Our group recently reported a strong association between variants in PDGFRA

with corneal curvature366. Corneal curvature is an ocular dimension defined as the

average of the radius of corneal curvature at the horizontal and vertical meridians.

Myopic eyes have been found to have steeper (reduced radius of

curvature), but the significant correlation between corneal curvature and refractive

error was not consistently observed331,367,368. Excessively flatter cornea is

associated with cornea plana, producing high hyprotropia and likely resulting in

angle-closure glaucoma33,369. On the other hand, corneal astigmatism is an eye-

disorder, where the cornea is more curved in one meridian direction compared to

the other. This fragmentizes the light rays entering the eye, leading to the inability

to focus onto a single point in the eye115. It is thus interesting that the same

PDGFRA gene has been identified in two ocular outcomes that are biologically

different, given the presence of a weak correlation between corneal astigmatism

and corneal curvature (Spearman correlation coefficient r between 0.088 and

0.192 in our cohorts), pointing to a possible pleiotropic contribution of PDGFRA.

119

Our study has adopted a binary definition of corneal astigmatism (affected and unaffected) that is commonly adopted in clinical practice and eye-trait epidemiology337,340. One caveat of this definition is the potential for misclassifying the affected status, particularly for samples with cylinder power around the cutoff threshold of -0.75D. To evaluate the robustness of our findings to the choice of threshold used, we additionally performed the association analysis at the identified SNPs with four different combinations of the thresholds used to define cases and controls. We observed that the odds ratios were highly similar across all four scenarios, with the combined evidence at rs7677751 ranging from

-4 -8 Pmeta of 1.5 × 10 to 6.7 × 10 . Unsurprisingly, the association evidence was weakest in the scenario with the most stringent thresholding (≤ -1.5 for cases and

> -0.5 for controls), given this stringency comes at the expense of decreasing the number of individuals in each study. We additionally performed a secondary analysis treating corneal cylinder power as a quantitative trait. Strong statistical evidence was consistently observed at the three leading SNPs (rs7677751, P =

1.76 ×10-7; rs2307049, P = 3.41 × 10-7 and rs7660560, P = 4.41 × 10-7), indicating that our findings are robust to the definition of the phenotype.

Owing to the relatively small sample sizes within each of the five GWAS studies, we have chosen to priortise our survey to identify genetic variants that contribute to the aetiology of corneal astigmatism in multiple Asian populations.

While Malays have been observed to be genetically closer to the Chinese, the

Asian Indians tend to be genetically closer to the Caucasians100. Our discovery at

PDGFRA thus suggests that part of the underlying biological pathway responsible 120

for astigmatism development is common to multiple populations, although there

may be population-specific genetic variants that our current study is not

sufficiently powered to identify.

Our study has included two pediatric Chinese populations (SCORM and

STARS) with school or pre-school children who are still progressing to their final

phenotype. It was documented that a high degree of astigmatism occurs during

infancy and the early childhood 370. The prevalence rates remain stable during

young adulthood (18 to 40 years), but increase consistently during late adulthood

at aged 40 years or older115,284. Studies have also indicated that the age-related change in astigmatism is associated with meridians changes in the cornea125.

Children and adolescents have a predominance of “within-the-rule” corneal astigmatism in general, where the vertical curve is greater than the horizontal

(axis of 1° to 15°); while in older adults, it shifts to “against-the-rule” astigmatism

(axis of 75° -105°)371,372. However, our study considers corneal astigmatism without reference to the axis nor the longitudinal changes from children to adults.

Whether PDGFRA plays the same role in pediatric and adults populations will however need further investigation.

121

Figure 12 Principal Component Analysis (PCA) of SP2, SiMES, SINDI, SCORM with respect to the population panels in phase 2 of the HapMap samples (CEU - European, YRI – African, CHB – Chinese, JPT – Japanese).

122

Figure 13 Principal component analysis (PCA) was performed in SINDI to assess the extent of population structure. Each figure represents a bivariate plot of two principal components from the PCA analysis of genetic diversity within SINDI. (A) 1st eigenvector against 2nd eigenvector, (B) 2nd eigenvector against 3rd eigenvector, (C) 3rd eigenvector against 4th eigenvector and (D) 1nd eigenvector against 5th eigenvector . The first 5 principal components were used as covariates to account for population structure.

123

A B

C D

E F

Figure 14 Quantile-Quantile (Q-Q) plots of P-values for association between all SNPs and corneal astigmatism in the combined meta-analysis of (A) individual cohort SP2, (B) SiMES, (C) SINID, (D) SCORM, (E) STARS and (F) SP2 + SiMES + SINDI + SCORM + STARS. 124

A

B

Figure 15 (A) Manhattan plot of log10(P-values) in the combined discovery cohort of SP2, SiMES, SINDI, SCORM and STARS. The blue horizontal line presents the threshold of suggestive significance (P = 1.00 x 10-5). (B) Regional plot of the association signals from the meta-analysis of the five GWAS cohorts around the PDGFRA gene locus. A region of 400kb around the lead SNP (rs7677751, red diamond) is shown. The LD between the lead SNP and the neighbouring SNPs is represented by the shading of the squares, with increasing shade of red indicating higher LD as measured by r2. The blue lines represent the recombination rates of JPT+CHB panels from HapMap II.

125

Figure 16. Forest plot of the estimated allelic odds ratios for the lead SNP rs7677751. The allelic odd ratios for allele T of rs7677751 and 95% confidence intervals are presented for the five studies separately (black rectangles or green rectangles), and for the overall meta-analysis across all five studies (red diamond).

126

A

B

C

Figure 17 Linkage disequilibrium (LD) calculated in terms of r2 for Singapore Chinese samples from SP2 (A), Malays samples from SiMES (B) and Indians panels from SINID(C). Black squares show perfect LD whereas shades of grey show decreasing LD.

127

Table 8. Characteristics of the participants in five studies

Cohorts SP2 SIMES SINDI SCORM STARS Individuals genotyped 2687 3280 2953 1116 1451 (440 families) Individuals after QC 2434 2542 2538 1008 1351 (407 families) Individuals in GWAS 2016 2238 2139 929 1191 (397 parents-trios) a Male (%) 45.8% 48.9% 51.2% 51.8% 52.3% a Age (SD) 47.9 (11.2) 57.7 (10.7) 55.9 (8.9) 10.8 (0.8) 7.5 (3.8) Cases 1231 1018 825 760 NA Controls 785 1220 1314 169 NA Corneal cylinder powerb (SD) Cases -1.38 (0.73) -1.37 (0.94) -1.21 (0.58) -1.52 (0.68) 1.30 (0.78)* Controls -0.48 (0.15) -0.46 (0.15) -0.46 (0.16) -0.47 (0.14) NA aInformation of gender and age is based on data from the offsprings with corneal astigmatism in STARS family cohort. bAveraged across both eyes; NA, not available; Age is in years. SP2 - Singapore Prospective Study Program; SiMES - Singapore Malay Eye Study; SINDI - Singapore Indian Eye Study; SCORM - Singapore Cohort study of the Risk factors for Myopia; STARS - Singaporean Chinese in the Strabismus, Amblyopia and Refractive Error Study.

128

Table 9 Top SNPs (P-value ≤ 5 x 10-6) identified from combined meta-analysis of five Asian population cohorts.

SP2 SiMES SINDI SCORM STARS Meta-analysis (n=2,016) (n = 2,238) (n = 2,139) (n = 929) (397 trios) (n=8,513)

SNP GENE CHR BPa Ab EAFc OR P OR P OR P OR P OR P OR s.e. P Pe

rs7677751 PDGFRA 4 54819217 T 0.23 1.35 3.76E-04 1.27 6.26E-04 1.23 4.09E-03 1.22 2.00E-01 1.12 3.74E-01 1.30 0.05 7.87E-09 0.786

rs7660560 PDGFRA 4 54829151 A 0.23 1.32 1.46E-03 1.28 5.34E-04 1.26 1.37E-03 1.13 4.22E-01 1.14 3.20E-01 1.29 0.05 1.15E-08 0.839

rs2307049 PDGFRA 4 54824911 A 0.23 1.31 2.17E-03 1.26 7.55E-04 1.27 1.08E-03 1.14 4.03E-01 1.14 3.17E-01 1.28 0.05 1.58E-08 0.885

rs10189905 2 199387355 G 0.12 0.72 2.56E-03 0.78 4.13E-03 0.83 1.09E-01 1.08 7.05E-01 0.61 5.70E-03 0.76 0.07 5.57E-07 0.508

rs3690 PDGFRA 4 54856570 C 0.28 1.40 6.25E-04 1.29 1.53E-03 1.18 2.17E-02 1.03 8.58E-01 1.09 5.28E-01 1.33 0.06 7.77E-07 0.391

rs4864872 PDGFRA 4 54847041 T 0.20 1.41 5.94E-04 1.26 3.19E-03 1.19 1.49E-02 1.05 7.69E-01 1.06 6.71E-01 1.32 0.06 1.24E-06 0.370

rs2228230 PDGFRA 4 54846797 T 0.20 1.41 5.94E-04 1.26 3.19E-03 1.19 1.67E-02 1.05 7.69E-01 1.06 6.71E-01 1.32 0.06 1.43E-06 0.362

rs10032688 4 54855415 A 0.19 1.39 8.28E-04 1.26 3.07E-03 1.20 1.39E-02 1.02 8.97E-01 1.11 4.80E-01 1.31 0.06 1.64E-06 0.470

rs17084051 4 54782338 A 0.24 1.35 3.68E-04 1.26 1.08E-03 1.14 7.77E-02 1.23 1.68E-01 1.04 7.55E-01 1.29 0.05 2.16E-06 0.246

rs6758183 2 226709230 A 0.16 1.37 2.36E-02 1.41 1.58E-03 1.21 1.10E-02 1.17 5.24E-01 1.33 1.78E-01 1.39 0.09 2.80E-06 0.772

rs7676985 4 54759330 A 0.26 1.26 2.41E-03 1.19 1.10E-02 1.14 6.09E-02 1.36 3.82E-02 1.09 4.51E-01 1.22 0.05 2.98E-06 0.693

rs1547904 PDGFRA 4 54841146 T 0.19 1.35 1.77E-03 1.24 6.10E-03 1.19 1.52E-02 1.09 6.19E-01 1.07 6.27E-01 1.28 0.06 4.41E-06 0.626

rs6792584 SUCLG2 3 67614078 G 0.24 1.10 2.61E-01 1.36 8.33E-06 1.16 4.06E-02 1.02 8.72E-01 1.17 2.21E-01 1.24 0.05 4.72E-06 0.205

rs2270676 KCNE3 11 73846059 G 0.20 0.74 1.89E-03 0.80 1.67E-03 0.90 1.76E-01 1.08 6.75E-01 0.80 1.18E-01 0.78 0.06 5.09E-06 0.392 aBase pair positions are indicated according to the NCBI build 136 (hg18); bEffect allele; cAverage effect allele frequency in the discovery cohort; s Standard error for odds ratios; eP-value for heterogeneity I2 between five study cohorts.

129

5 Chapter 5 Genome-Wide Comparison of Estimated

Recombination Rates between Populations

5.1 Study summary

Inter-population differences in recombination patterns yield important insights

into recent human evolution and trans-ethnic fine-mapping. Although recombination profiles are largely conserved at a broad scale between populations, preliminary findings have suggested there are variations in fine-scale differences in recombination rates.

In this study, a signal-processing strategy (varRecM) is proposed to evaluate variations in inter-population recombination rates. Generally speaking, recombination rates estimated from two populations can differ in the following two aspects (see Figure 18): in (i) the kurtosis or ‘peakedness’ of the recombination profiles (e.g. Figure 18A to C); or in (ii) the level of coupling, or phase shifting, between the rates (e.g. Figure 18D to F). This method simultaneously quantifies the extent in the differential peakedness of the recombination profiles as well as the degree of phase shift, and adopts a sliding window approach to interrogate the whole genome for any variation in recombination patterns. For each sliding window, a numerical score (the varRecM score) that reflects the extent of recombination variation is assigned. A larger score represents greater evidence of variation in either the peakedness or the phase shift of the recombination rate profiles. Genomic regions harbouring

130

significant differences in recombination rate profiles are thus more likely to

generate scores that lie in the right tail of the genome-wide distribution.

Using LD-based recombination rates that have been computed from the

International HapMap Project (HapMap)5 and the Singapore Genome Variation

Project (SGVP)100, we subsequently apply our varRecM method to compare the

recombination rates between five population panels composed of European, East

Asian, African, Singapore Chinese and Singapore Indian samples. This represents

the first attempt to comprehensively catalogue recombination rate variations

across different human populations on a genome-wide scale. While overall patterns of recombination were conserved between populations, several regions emerged with strong evidence of recombination rate variation that encompass genes exhibiting population-specific signatures of positive natural selection.

Intergenic regions are more likely to exhibit recombination differences between populations.

5.2 Methods

5.2.1 Development of recombination variation score

We calculate the recombination rate differences between two populations with

the varRecM score, which quantifies the extent in the differential peakedness and

the degree of phase shift in the two recombination profiles from each of the two

populations. The approach is implemented genome-wide by adopting a sliding

131

window of 50 kb throughout the whole genome where consecutive windows are

shifted by one SNP. Assuming there are two recombination rate profiles estimated

from the two populations across L SNPs within each genomic window, we define

the varRecM score as

| ( ) ( )|

( 1) + ( 2) 푔 − 푔 푤 1 2 푔 푔 where g(1) and g(2) denote the cumulative recombination measured in

centimorgans (commonly defined as the genetic distance) within the window for

population 1 and population 2 respectively and each metric is calculated as the

area under the curve of the recombination rates across the window; and w denotes a composite weight metric. While we have taken the absolute value in the numerator, in practice we can identify which population spans a larger genetic distance for the same window (thus indicating a higher likelihood for recombination to take place) from the sign of g(1) – g(2). The weight metric w is

defined as the product of two components wm and wcor, where: (i) wm effectively

measures the kurtosis of the recombination peak (for the population that spans the

larger genetic distance) relative to the averaged recombination in the window and

is defined as

( ) = log max ( ) + 1 푘 푘 푚 10 푔 푤 � 푙 푐푙 − � 퐷 (k) with cl denotes the recombination rate (in cM/Mb) between SNP l and SNP (l +

1), k denotes the population that spans the larger genetic distance, and D defines

132

the physical distance of the window (in Mb); and (ii) wcor represents the extent of

‘uncoupling’, or dissimilarity in correlation, between the two series of

recombination rates across L SNPs in the two populations and is given as

= 1 max lag , lag = 0, 1, 2, 3,…

푤푐표푟 − lag��푟 �� Here, rlag represents the lagged cross-correlation coefficient between the two recombination rate series and is defined as

lag ( ) ( ) ( ) ( ) lag = 퐿− 1 1 2 2 lag∑푙=1( )�푐푙+푙푎푔( −) 푐̅ � �푐푙lag − (푐̅ ) � ( ) 푟 2 2 퐿− 1 1 퐿− 2 2 �∑푙=1 �푐푙+푙푎푔 − 푐̅ � �∑푙=1 �푐푙 − 푐̅ � with ( ) denoting the average recombination rate across the window for 푘 population푐̅ k.

To provide a more robust assessment of the cross-correlation coefficient, particularly when a recombination peak occurs at the boundary of the window, the calculation of rlag is performed either with an additional buffer region of 4kb at each end of the window, or includes the recombination rates from two additional

SNPs at each end, depending on which scheme includes more SNPs. The inclusion of the lag component explicitly allows for different SNP densities in the two populations. The use of the genetic distance also mitigates the need for the same SNPs to be present in the two populations.

The rationale behind the inclusion of the first component wm is to up-weigh

the score of a window where there is a clear regional elevation of recombination

rate beyond the averaged rate across the window (for example, a recombination 133

hotspot). Typically wm is between 0 and 2, except in the rare occasion that the

maximum estimated recombination rate is at least 100 cM/Mb above the region-

averaged recombination rate. The intuition behind the second component wcor,

bounded between 0 and 1 inclusive, is to down-weigh the scores where the cross-

correlation between the two series of recombination rates is high, since a high

level of coupling between the two series of recombination rates suggests a smaller

difference in recombination pattern between the two populations in this genomic

window.

5.2.2 Simulation

Simulating recombination rates for two populations

For evaluating the performance of the method, we simulated recombination

rates for two populations across a 200kb region spanning 401 evenly distributed

SNPs. In population k (where k ∈ {1, 2}), we introduced a recombination hotspot

starting at a physical position dk, with a width of wk (in kb) that is distributed as

2 Normal (2, 0.5 ), and spans a genetic distance of gk (in cM). The assumed size of the hotspot is consistent with those observed in sperm typing78,373. The genetic

distance here is equivalently the area under the recombination hotspot versus

physical distance curve, or a measure of the cumulative peakedness of the

recombination hotspot. The background recombination rate used in our simulation

is assumed to follow an exponential distribution with a mean of 0.24cM/Mb. This

is based on the observation that the average recombination rate is 1.2cM/Mb

across the human genome, and yet only 20% of the recombination occurs outside

134

recombination hotspots374. In our simulations, we assume the presence of a

recombination hotspot in each of the two populations, varying in intensity and

location of the hotspots. The recombination rate for the hotspot in the first

2 population g1 is distributed as Normal (µ, 0.5 ) with µ measured in cM/Mb. The

recombination rate in the second population g2 is defined with respect to λ, which

represents the ratio of g1 to g2 (or λ = g1/g2). Larger values of λ correspond to

greater differences in the recombination profiles between the two populations.

For the assessment of the specificity of the varRecM metric, let ddiff denote the distance (in kb) between the starting locations of the two hotspots, assumed to be

2 2 distributed as Normal (0, σd ) with σd ∈ {0, 0.5, 1, 2}. Varying σd thus controls

2 the proximity of the hotspots in the two populations, and larger values of σd

effectively introduce greater differences in the simulated recombination profiles.

In addition to adding moderate jitter in the starting positions of the recombination

hotspots, we also evaluate the robustness of the varRecM metric in situations

where moderate differences exist in the peakedness of the recombination hotspots

between the two populations but not in the location of the hotspots. This is

achieved by setting µ as 25cM/Mb (for the first population) and λ ∈ {1, 5, 10}.

We also evaluate the effect of varying the lag parameter in the metric across a range of values (0, 2, 4, 6, 8, 12 and 10 log10(L/2)), which broadly corresponds to

lag distances of 0kb, 1kb, 2kb, 3kb, 4kb, 6kb and 5 log10(L/2)kb given a SNP

density of 2 SNPs/kb.

Our assessment of the sensitivity of varRecM assumes the recombination

hotspot for population 1 is located in the center of the simulated 200kb region, 135

while the hotspot for population 2 is randomly distributed between the 75kb and

125kb mark in each window. Given that the size of each hotspot follows a Normal

(2, 0.52) distribution, this simulates a high likelihood that the two hotspots are

found in significantly different locations. In addition, we vary the relative

intensities of the hotspots by setting µ ∈ {6, 25, 50} and allowing λ to range from 1 (no difference in the peak intensities) to 50 (where the genetic distance from the recombination hotspot in population 1 is fifty times the genetic distance in population 2).

By performing 10,000 simulations under each of the different settings for σd,λ

and µ, we count the frequencies where a large varRecM score than a pre-defined

genome-wide threshold is obtained, since this represents stronger evidence of

inter-population variation in recombination rates relative to the genome-wide

distribution. The empirical distributions of the varRecM score from the pairwise

genome-wide comparisons of the three HapMap populations displayed similar

values for the top quantiles. We thus averaged the varRecM scores at the top 1%

(and also the top 5% and top 0.1%) across the three pairs of comparisons , and

used these values as thresholds in our calculation of statistical power and false

positive rates. The power and false positive rates in our simulations are thus

defined as the proportion of simulations out of 10,000 where obtained score

exceeds the thresholds at the top 5%, 1% and 0.1% given the different

combinations of σd ,λ and µ. In all likelihood, our choice of genome-wide thresholds will result in a marginal under-estimate of the statistical power and false positive rates. 136

5.2.3 Estimation of recombination rates

Data

Our genome-wide comparisons utilised recombination rates estimated from

the genotypes of autosomal SNPs in the three population panels in Phase 2 of the

HapMap5, as well as in the three Asian populations from the SGVP100. Phase 2 of

HapMap surveyed around 2.8 million SNPs in each of the three panels, consisting

of: (i) 60 unrelated individuals in Utah with European ancestry (CEU); (ii) 60

unrelated Yoruba individuals sampled from the Ibadan region of Nigeria (YRI);

(iii) 45 unrelated Han Chinese from Beijing (CHB) and 45 unrelated Japanese

from Tokyo (JPT), which have been pooled as the East Asian panel (JPT+CHB).

The SGVP surveyed around 1.6 million autosomal SNPs in three population

groups, consisting of 96 Chinese (CHS), 89 Malays (MAS) and 83 Asian Indians

(INS) in Singapore. Population-specific recombination rates for the SGVP

samples are downloaded from the SGVP website (http://www.nus-

cme.org.sg/sgvp). To avoid any inadvertent confounding in our analyses that are

solely attributed to the different SNP densities, we thinned the SNPs from the

HapMap populations to a similar set of SNPs found in SGVP and re-estimated the recombination rates for each population panel separately.

Recombination rate estimation

137

Both the HapMap and SGVP used LDhat101 to calculate the recombination

rates within each population, yielding an estimated rate in cM/Mb for each

successive pair of SNPs. For the HapMap populations, per generation

recombination rates c were obtained as the ratio of the estimated rate from LDhat

to the effective population size, which have been estimated as 11,418 for CEU,

17,469 for YRI and 14,269 for JPT+CHB60,375. For the SGVP populations, the

effective population size of 14,269 was used for the Chinese and Malays, while

the average of the effective population sizes for CEU and JPT+CHB (Ne =

12,844) was used for the Singapore Indians.

5.2.4 Simulation

Simulating recombination rates for two populations

For evaluating the performance of the method, we simulated recombination rates for two populations across a 200kb region spanning 401 evenly distributed

SNPs. In population k (where k ∈ {1, 2}), we introduced a recombination hotspot starting at a physical position dk, with a width of wk (in kb) that is distributed as

2 Normal (2, 0.5 ), and spans a genetic distance of gk (in cM). The assumed size of the hotspot is consistent with those observed in sperm typing78,373. The genetic

distance here is equivalently the area under the recombination hotspot versus

physical distance curve, or a measure of the cumulative peakedness of the

recombination hotspot. The background recombination rate used in our simulation

is assumed to follow an exponential distribution with a mean of 0.24cM/Mb. This

is based on the observation that the average recombination rate is 1.2cM/Mb

138

across the human genome, and yet only 20% of the recombination occurs outside

recombination hotspots374. In our simulations, we assume the presence of a

recombination hotspot in each of the two populations, varying in intensity and

location of the hotspots. The recombination rate for the hotspot in the first

2 population g1 is distributed as Normal (µ, 0.5 ) with µ measured in cM/Mb. The

recombination rate in the second population g2 is defined with respect to λ, which

represents the ratio of g1 to g2 (or λ = g1/g2). Larger values of λ correspond to

greater differences in the recombination profiles between the two populations.

For the assessment of the false positive rate of the varRecM metric, let ddiff

denote the distance (in kb) between the starting locations of the two hotspots,

2 2 assumed to be distributed as Normal (0, σd ) with σd ∈ {0, 0.5, 1, 2}. Varying σd thus controls the proximity of the hotspots in the two populations, and larger

2 values of σd effectively introduce greater differences in the simulated recombination profiles. In addition to adding moderate jitter in the starting positions of the recombination hotspots, we also evaluate the robustness of the varRecM metric in situations where moderate differences exist in the peakedness of the recombination hotspots between the two populations but not in the location of the hotspots. This is achieved by setting µ as 25cM/Mb (for the first

population) and λ ∈ {1, 5, 10}. We also evaluate the effect of varying the lag

parameter in the metric across a range of values (0, 2, 4, 6, 8, 12 and 10

log10(L/2)), which broadly corresponds to lag distances of 0kb, 1kb, 2kb, 3kb,

4kb, 6kb and 5 log10(L/2)kb given a SNP density of 2 SNPs/kb.

139

Our assessment of the power of varRecM assumes the recombination hotspot for population 1 is located in the center of the simulated 200kb region, while the hotspot for population 2 is randomly distributed between the 75kb and 125kb mark in each window. Given that the size of each hotspot follows a Normal (2,

0.52) distribution, this simulates a high likelihood that the two hotspots are found in significantly different locations. In addition, we vary the relative intensities of the hotspots by setting µ ∈ {6, 25, 50} and allowing λ to range from 1 (no difference in the peak intensities) to 50 (where the genetic distance from the recombination hotspot in population 1 is fifty times the genetic distance in population 2).

By performing 10,000 simulations under each of the different settings for σd,λ and µ, we count the frequencies where a large varRecM score than a pre-defined genome-wide threshold is obtained, since this represents stronger evidence of inter-population variation in recombination rates relative to the genome-wide distribution. The empirical distributions of the varRecM score from the pairwise genome-wide comparisons of the three HapMap populations displayed similar values for the top quantiles. We thus averaged the varRecM scores at the top 1%

(and also the top 5% and top 0.1%) across the three pairs of comparisons , and used these values as thresholds in our calculation of statistical power and false positive rates. The power and false positive rates in our simulations are thus defined as the proportion of simulations out of 10,000 where obtained score exceeds the thresholds at the top 5%, 1% and 0.1% given the different combinations of σd ,λ and µ. In all likelihood, our choice of genome-wide 140

thresholds will result in a marginal under-estimate of the statistical power and false positive rates.

5.2.5 SNP annotation, copy number variation and FST calculation

Annotation information for all the SNPs was obtained from the PLINK

retrieval interface (http://pngu.mgh.harvard.edu/~purcell/plink/psnp.shtml) which

uses the TAMAL database 376 and maps to the information from the UCSC

genome browser, HapMap and dbSNP build 126. We additionally classify each

SNP as whether it is: (i) non-synonymous; (ii) synonymous; (iii) found within a

gene; or (iv) not within a gene. We used Version 10 of the database for copy

number variation from the Database of Genomic Variants (http://projects.tcag.ca).

For each SNP that is present in both populations, we calculated the FST using the Weir and Cockerham estimator377. This metric quantifies the difference in

allele frequencies at a SNP between two populations. For each window in the

calculation of the varRecM score, we adopt an established approach378 that uses

the maximum FST across the SNPs to represent the regional differentiation

between two populations.

5.2.6 Quantification of variations in linkage disequilibrium

We used the varLD programme379 to assess the extent of LD variation

between two populations across the genome, which considers sliding windows of

50 SNPs with a 1-SNP shift. The position of each window is summarised as the

141

average physical position across the 50 SNPs. Further details of the varLD

algorithm can be found in the original publication379. To evaluate the correlation

between the varRecM and varLD scores calculated for each pair of populations,

we calculated both the average and the maximum varLD scores across all varLD

windows possessing averaged positions that fall within the start and end positions

for each of the 50kb varRecM windows. We consider the varRecM windows with

scores that are found in 15 bins, each spanning 0.1 quantile of the varRecM

scores. The 15 quantile intervals considered are: 4.9th – 5th, 9.9th – 10th, 19.9th –

20th, 29.9th – 30th, 39.9th – 40th, 49.9th – 50th, 59.9th – 60th, 69.9th – 70th, 79.9th –

80th, 89.9th – 90th, 94.9th – 95th, 97.4th – 97.5th, 98.9th – 99th, 99.4th – 99.5th, 99.9th –

100th. For example, the 94.9th – 95th quantile interval contains all the windows where the varRecM scores are found to be at least as large as 94.9% of the genome-wide distribution but less than the top 5% of the scores. We calculate the

Pearson’s correlation coefficient between the varLD scores and the varRecM scores across the regions in these 15 bins. In addition, we also calculated the correlation between varLD and varRecM scores after adjusting for the effect of population differentiation as quantified by FST. This is achieved by regression the varLD scores as a linear function of the varRecM scores and the FST for each

region.

142

5.3 Results

5.3.1 Simulation studies on power and false positive rates

Our formulation of the varRecM metric explicitly locates genomic regions

where there is a significant deviation, such as a spike, in the recombination profile

of one population that is not present in the second population. The varRecM score

is negligible when the recombination profile is relatively flat across both

populations. Thus, in assessing the power and false positive rates (FPRs) of

varRecM, our simulations concentrated on situations where there is at least one

spike in the recombination rates of each population, with varying degrees of

peakedness and with different hotspot locations. We compared the varRecM score for each simulated window against three thresholds at the top 0.1%, 1% and 5% quantiles of the genome-wide distributions obtained from pairwise comparisons of the HapMap populations. This approach of prioritizing regions found in the

extreme tails of the genome-wide distribution is similar to other population

genetics strategies for highlighting unique features in the human genome, such as

extreme evidence of LD variation379 and positive natural selection378,380-382.

We first evaluate the relevance of the lag component in the varRecM metric

in minimizing the FPR. In addition to varying the relative intensities of the

recombination profiles (quantified as λ ∈ {1, 5, 10}) in our simulations, we also

allow minor variations in the hotspots locations in the two populations. This

mimics the situation where the recombination rates in each population have been

estimated with a slightly different set of SNPs due to the use of different

143

genotyping technology which can introduce minor deviations in the positions of

the hotspots. When the simulated hotspots are located at exactly the same

positions (corresponding to σd = 0), the FPR at the 1% threshold is almost entirely

zero regardless of the relative intensities λ and the lag values used (Figure 19A).

Unsurprisingly, as the positions of the hotspots start to vary (σd > 0), higher FPRs are observed although this can be controlled by increasing the lag value (Figure

19B-D). Similar patterns are observed with the use of the 0.1% and 5% thresholds. However, increasing the lag reduces the power to detect genuine variations in recombination rates (Figure 20A) although this attenuation of power is marginal for moderately small lag values at the top 1% and 5% threshold (i.e. <

6). Based on the SNP densities found in Phase 2 of the HapMap and SGVP, we calibrate the varRecM metric to adopt a lag distance equivalent to approximately

4kb in all subsequent analyses.

To investigate the sensitivity of the varRecM metric to variations in recombination rates, we vary the ratios of hotspot intensities and the locations of the hotspots in the two populations (Figure 20B-D). Unsurprisingly, the power of varRecM is observed to increase with larger differences in the hotspot intensities

(as λ increases from 1 to 50), particularly when the average intensity in the reference population is high (µ of 25 and 50 cM/Mb). The power is generally lower for detecting variations as a result of smaller spikes in recombination profiles at the threshold of top 0.1% and 1% (Figure 20B).

144

5.3.2 Application to HapMap and Singapore Genome Variation Project

Using the estimated recombination rates for HapMap and SGVP, five genome-wide scans of recombination rate variations are performed for the following population pairings: (i) CEU versus JPT+CHB; (ii) CEU versus YRI;

(iii) JPT+CHB versus YRI; (iv) CHS versus INS; (v) CHS versus JPT+CHB. We adopt a sliding window approach where each window spans 50kb and consecutive windows are shifted by 1 SNP (see Subjects and methods). Overlapping regions that emerged in the top 1% of each scan are merged to localize the start and end positions, and to avoid double-counting. In general, the genome-wide distribution of the varRecM statistics has a high kurtosis with long tails (Figure 21). Our survey of the two relatively homogeneous populations (CHS and JPT+CHB, median FST = 0.3%) revealed that the varRecM scores tend to be clustered around zero, with a median score of 0.0063, or approximately four-fold lower than the median scores from the comparison between YRI and non-YRI HapMap populations (Table 10). Using the associated varRecM score in the top 1% of the genome-wide distribution, we observed an apparent pattern in the spatial distribution of the population-specific recombination peak regions. From our pairwise comparisons between HapMap CEU and non-CEU samples (YRI ,

JPT+CHB), 114 regions exhibited strong evidence of CEU-specific recombination peaks across the autosomes (Figure 22). The same analyses comparing JPT+CHB against CEU and YRI identified 88 regions, while the corresponding number for YRI-specific recombination peaks is 294.

145

Amongst the strongest twenty regions that emerged in the comparison

between the HapMap Europeans (CEU) and East Asian (JPT+CHB), six regions

harboured genetic evidence of population-specific positive selection (Table 11) such as the region on chromosome 15 that encompassed the SLC24A5 pigmentation gene which is well-established to be positively selected in

Europeans but not in other populations383. Recombination peaks spanning 0.12cM

and 0.05cM are found adjacent to SLC24A5 in JPT+CHB (Figure 23A ) and YRI respectively, and this is notably absent in CEU. Another region encapsulated the

LPP gene on chromosome 3q28. The region spans a 100kb window where a significant spike in the recombination rates was observed in CEU, but not in

JPT+CHB (Figure 23B). Intriguingly the recombination hotspot was also present in CHS, and this window emerged with strong evidence of recombination variation in our analysis between CHS and HapMap JPT+CHB (Table 12). The

LPP gene has previously been reported as a strong candidate for recent positive selection in the HapMap East Asians382, and subsequent studies have suggested

that this selection signal is present only in northern Chinese populations but not in

southern Chinese populations384. It is however not necessarily true that a selection sweep appears to coincide with a region of low recombination, as we observed spikes in recombination in regions reported to be undergoing positive selection.

One example is at the region encompassing PHACTR1, which is a candidate gene for positive selection in HapMap East Asian381 but yet exhibits evidence of

significant recombination in JPT+CHB (Figure 23C).

146

For the comparisons between YRI and the non-African populations (CEU,

CHB+JPT), only two top regions contain population-specific selection (Table

11). The first is in the region around the SLC24A5 gene in the comparison

between CEU and YRI; the second is an extended region surrounding GRP85 on

chromosome 7 which has been identified to be positive selected in East Asians380-

382 (Figure 23D). In addition, although recombination hotspots are thought to be highly active in YRI populations, strong population-specific recombination peaks are also observed for CEU (10 of 20 top regions) and CHB + JPT (6 of 20) in the comparison to YRI (Table 12).

In addition to locating candidates of positive selection, there are other regions that emerged with strong evidence of inter-population differences in recombination rates (Table 11 & Table 12, Figure 24 - 28). Several genes are related with severe genetic syndromes. For example, the region encompassing

SPINK5 (associated with Netherton syndrome) on chromosome 5q32 was consistently identified among the top signals in the comparisons between YRI and non-Africans (CEU, JPT+CHB); PLCE1 on chromosome 10q23, which is associated with diffuse mesangial sclerosis (DMS), has two strong hotspots in the gene for JPT+CHB populations but a significantly weaker recombination profile in YRI; and SLC2A2 on chromosome 3q26, which is a major genetic factor for

Fancoin-Bickel syndrome, has a hotspot that is present only in CEU but not in

JPT+CHB or YRI.

147

5.3.3 Recombination variation and Linkage disequilibrium variation highly

correlated

To examine the relationship between inter-population variations in recombination rates and LD patterns, we measured the average and maximum varLD scores for each window used in the varRecM score calculation. We observed strong correlations between the varRecM and averaged varLD metrics in all of our comparisons (Pearson’s correlation coefficient r ranging from 0.873 to

0.965 with corresponding p-values ranging from 2.10 × 10-5 to 6.38 × 10-9, see

Figure 29), except in the comparison between the two East Asian panels (CHS

versus JPT+CHB, r = -0.039, P-value = 0.889). Similar patterns were observed

when we use the maximum varLD scores. The strongest correlation between

inter-population LD variation and recombination differences was observed in the

comparison between JPT+CHB and YRI. To address whether this strong

relationship is driven by the differences in allele frequency spectrum between the

populations which may inadvertently bias both the LD calculations and the

recombination rate estimation, we calculated the FST for every SNP that is found in both populations and calculated the maximum FST across the SNPs in each window. Similar correlation patterns between the varRecM and varLD scores remained after accounting for the FST within a regression framework, while no

significant association was observed between varRecM and FST (P-values > 0.05).

148

5.3.4 Regions with largest recombination variation less frequent in genes

We observed there was an enrichment of windows exhibiting high varRecM scores in non-gene regions (P-values < 10-7, Figure 30), except in the comparison between CHS and JPT+CHB (P-value = 0.614). In the comparison between CEU and JPT+CHB, high varRecM scores (defined as in the top 1% of the genome- wide distribution) were 1.11 times (95% CI: 1.07 – 1.15, P-value = 1.23 × 10-9) more likely to be found in nongenic regions as compared to varRecM scores below the top percentile. Similar odds ratios were observed in the comparisons between CEU and YRI (OR = 1.23, 95% CI: 1.19 – 1.27, P-value = 8.66 × 10-24);

JPT+CHB and YRI (OR = 1.16, 95% CI: 1.12 – 1.20, P-value = 1.76 × 10-18);

CHS and INS (OR =1.10, 95% CI: 1.06 – 1.17, P-value =5.67 × 10-8). This suggests that extreme inter-population variations in recombination rates are generally less likely to happen in gene regions, which concurs with previous observations that recombination in humans tends to be more conserved in gene regions 5,101.

5.4 Discussion

We have introduced a quantitative approach to evaluate fine-scale inter- population variation in recombination rates across the whole genome based on the high-resolution recombination maps. This method is especially powerful for locating genomic regions where a recombination spike exists in one population but not in another population. By introducing a lag component in the formula, our

149

metric is robust to small shifts in the relative positions of recombination hotspots

that are found in both populations. We adopted the same approach as most

population genetics strategies for LD variations379 and positive selection379-381 by

prioritizing genomic regions with extreme evidence when benchmarked against

the rest of the genome. We applied our method to pairs of populations from

HapMap and SGVP, observing significant fine-scale differences in the recombination profiles of Europeans, Africans and East Asians. The number of

regions in the top 1% exhibiting strong evidence of YRI-specific recombination peaks is at least twice the number of population-specific recombination peaks for

CEU or JPT+CHB. Seven of the top twenty regions that emerged with the strongest evidence of recombination differences between HapMap CEU and JPT+

CHB coincide with genomic regions exhibiting strong signatures of population- specific positive selection, while a few of the top regions contain genomic signatures of positive selection from the comparisons between African and non-

African populations. In addition, multiple top regions encompass genes associated with the severe genetic syndromes. Large inter-population variation in recombination is also less frequent in genes, and unsurprisingly we detected a strong correlation between variations in recombination rates and LD patterns.

Overall the varRecM score in each window indicates the extent of recombination rate variation between the two populations in consideration, with larger values corresponding to higher amount of differences in the recombination rate profiles in the two populations. These differences can be due to differential peakedness in the recombination rates, phase shifting, or both. As with most 150

population genetic metrics, it is difficult to interpret the statistical significance of each varRecM score and instead we consider the overall distribution of the varRecM scores across the entire human genome. This allows us to identify specific genomic regions which exhibit the strongest evidence of recombination rate variation between the populations. For example, a varRecM score that falls in the top 0.1% of the genome-wide distribution suggests that the difference in the recombination rates in this particular window is more pronounced than 99.9% of the genome. We emphasize that this does not necessarily indicate the biological relevance or significance of this genomic window, other than to highlight the region as potentially interesting given the evidence of significant diversity in recombination patterns. To the best of our knowledge, there is currently no formal statistical metric for defining recombination rate variation without first defining the presence or absence of recombination hotspots. The varRecM score is thus the first metric that performs an agnostic survey of recombination rate differences with the rates directly.

One of the challenges we encounter in formulating a metric for evaluating inter-population variation in recombination rates is justifying the sensitivity of the method – how do we know that the regions we identified truly concur with genuine differences in recombination? There are two aspects to this question: (i) that there are genuine differences in the biological mechanisms between different populations that resulted in recombination preferentially occurring in one population but not in the other; (ii) that methods used to estimate recombination rates are biased by inherent artifacts in the data used, such as erroneous genotype 151

calls, loss of heterozygosity due to copy numbers or positive selection. We have

used recombination rates that were fundamentally estimated via the coalescence

of haplotype data385,386. This inference of historical recombination can be

distorted locally by positive selection, since the time tracing back to the most recent common ancestor (MRCA) in a coalescent based-model is underestimated

when local genetic diversity is low or reaches fixation387,388. Thus, the

concordance between the regions of suppressed recombination we identified with

known signatures of intensive positive selection (i.e. SLC24A5) is reassuring,

providing the means for bench-marking the performance of the method as well.

Conversely, it has been suggested, perhaps surprisingly, that selective sweeps

could also decrease or even eliminate LD on the selected loci, but such LD

reduction has the potential to create false evidence of recombination

hotspots103,389,390. The top regions identified in our study with the strongest

evidence of elevated rates of recombination at well-established selection loci thus

warrants further investigation for such scenarios.

Our study has approached recombination rate differences at the population

level. However accumulating evidence suggests that differential hotspot presence

across individuals or between males and females may be in part attributed to

germline genetic differences. Several studies have suggested that recombination

may be activated by: (i) a single base change at certain SNPs84,373,391; (ii) the

presence of a certain haplotype392; or (iii) the presence of a specific 13-mer motif of binding site for the zinc finger104. Recent studies have also discovered the

152

coding zinc-finger protein PRDM9 gene can influence both allelic and non-allelic

recombination and can account for up to 18% of hotspot variance in humans105,393-

395. It has been shown through pedigree analyses393 and sperm typing394 that

individuals carrying the PRDM9 variants lacking the motif-recognizing alleles

exhibited dramatically lower genome-wide hotspot usage. Substantial allele

frequency differences in PRDM9 between populations have been suggested to

result in potential shifts in recombination hotspots98. The results from our study

are consistent with the findings that recombination hotspots can be polymorphic

among human populations. Our approach thus serves to identify genomic regions

that may be investigated further for association with PRDM9 variants across

different populations.

Recombination is one of the key forces that breakdowns LD, and the rate of

recombination has been widely used in SNP imputation algorithms60. Our

genome-wide surveys identified a close relationship between the evidence of LD

variation and recombination rate variation. This observation is important in trans-

ethnic experiments that aim to fine-map the functional polymorphisms by

leveraging on the genomic diversity of different populations396,397, as it highlights

the use of population-specific recombination rates for carrying out the fine- mapping. Since regions priortised for trans-ethnic fine-mapping tend to be disease-associated regions that exhibited variable LD patterns between populations, using recombination rates that are averaged across multiple populations (such as those available on the HapMap resource) may not accurately define the haplotype breakpoints in the different populations. We did not observe 153

any correlation between the recombination and LD differences in two closely-

related populations: CHS and JPT+CHB, likely because one or both of the

varRecM and varLD statistics will be highly skewed in such comparisons and

scores at the tail end of the genome-wide distribution may not effectively differentiate the degree of inter-population recombination variation.

Recently, Wegmann and colleagues have found that regions at a broad scale of 1Mb showing the strongest differences in recombination are disrupted by common structural variation (inversions) in the comparison between admixture

American Africans and Europeans95. Some regions exhibiting largest recombination differences in our study also harboured copy number variation,

especially in the comparisons between Africans and non-Africans. However, in

contrast to the previous study, these structural variants were generally very rare

(frequency less than 0.05) and therefore less likely to distort the local

recombination rate estimation within these top identified regions.

Our findings of genomic regions that exhibit population-specific

recombination hotspots illustrate that there can be variations in fine-scale recombination profiles between populations. The regions with the largest recombination difference residing in the population-specific sites of positive selection may reflect artifacts that are introduced due to a suppression of heterozygosity in the genotype data as a result of the positive selection signatures, which inadvertently affect the estimation of recombination rates. The fine-scale comparison of recombination profiles will priortise genomic regions for further

154

studies into the biological basis of heterogeneous recombination across different human populations, thus providing further insight into the process of evolution.

155

Figure 18. Illustration of ranking the recombination differences from two populations. Suppose there are two recombination rate profiles estimated from different populations, population 1 (yellow line) and population 2 (blue line), while the genetic distance (the area under the curve) for population 1 is assumed larger than that for population 2 in a fixed window. We rank the recombination differences between population 1 and 2 as depicted by the red arrow. For the upper panel, the region with the extent variation in the magnitude of the recombination peaks between these two populations is greatest in C) where high peak in population 1 and depleted recombination in population 2. For the lower panel, the recombination difference is incremental from D) to E), which is attributed to the level of phase shifting of two recombination rates.

156

A B

C D

Figure 19. Evaluation of false positive rates (FPR) of varRecM method.

The false positive rate of the varRecM approach is evaluated for the two recombination hotspots which are at the same or close proximity location with varying intensity. The distance between the starting positions of two hotspots is 2 distributed as Normal (0, σd ), where A) σd =0 kb, B) σd = 0.5 kb, C) σd = 1.0 kb

and D) σd =2.0 kb. The recombination rate for the hotspot in the first population 2 g1 is distributed as Normal (25, 0.5 ). The recombination rate in the second population g2 is defined with respect to λ, where λ ∈ {1, 5, 10},*, which represents the recombination hotspot intensity ratio of g1 to g2. The lag parameter is varying across a range of values (0, 2, 4, 6, 8, 12 and 10 log10(L/2)) where L =100, the total SNP in the 50 kb of the window in the simulation study. Give a SNP density of 2 SNPs/kb in the simulated data, the lag values corresponds to lag

distances of 0kb, 1kb, 2kb, 3kb, 4kb, 6kb and 5 log10(L/2)kb. Threshold in the false positive rates calculation is at the top 1% genome-wide.

157

A B

C D

Figure 20. Power performance of varRecM method. The power of the varRecM metric is evaluated using the simulated data with varying intensity and phase shifting of the recombination hotspots between two populations. Recombination hotspot for population g1 is located at the center of the simulated 50kb window, where the hotspot rate is distributed as Normal (µ, 0.52) and µ is set as A) 25 cM/Mb, B) 6 cM/Mb, C) 25 cM/Mb, and D) 50 cM/Mb. The hotspot for population g2 is randomly distributed within each window of 50kb, and the rate is determined by λ, intensity ratio of g1 to g2. For A), λ is fixed at 10; For B), C) and D), λ ranges from 1 (no difference in hotspot intensity) to 50 (where the hotspot intensity in population g1 is fifty times the hotspot intensity in population g2). The lag distance is adopted as 4kb in the power evaluation for B), C) and D).

158

Figure 21. Accumulative density plots of varRecM scores from five pair comparisons between HapMap and SGVP populations. CEU - JPT + CHB: comparisons between HapMap Europeans (CEU) and East Asians (JPT+CHB) (black line); CEU - YRI: comparisons between HapMap European (CEU) and Africans (YRI) (red line); JPT + CHB - YRI: comparisons between HapMap East Asians (JPT + CHB) and Africans (YRI) (green line); CHS - INS: comparisons between SGVP Chinese (CHS) and Indian (INS) (blue line); CHS - JPT + CHB: comparisons between SGVP Chinese (CHS) and HapMap East Asians (JPT + CHB) (pink line).

159

Figure 22. Distribution of population-specific recombination peak regions in the top 1% of the varRecM scores. We identified those population-specific recombination peak regions that contained signals in the top 1% of the genome- wide distributions in two specific population-pair comparisons (say between population A and population B, and between population A and population C) but not in the top 5% of the distribution in the third population-pair (between population B and population C). These regions are indicative that the signals are driven mainly by one population (population A in our example here). The number of regions for each population is: 114 for CEU, 88 for JPT+CHB and 294 for YRI.

160

A SLC24A5 ( chr 15)

B LPP ( chr 3) ) /Mb cM

C varRecM PHACTR1 (chr 6) Recombinationrate (

D GPR85 (chr 7)

. Chromosome position (Mb)

Figure 23. Top regions of largest varRecM scores with overlapping signals of positive selection. The top regions illustrated are from the comparisons between HapMap Europeans (CEU) and East Asians (JPT+CHB) (Fig. 23A-C), and HapMap JPT+ CHB versus Africans (YRI) (Fig. 23D). The varRecM scores are depicted as dark blue points. Recombination rates for CEU (orange line), JPT+CHB (brown line) and YRI (sky blue line). A) A strong recombination peak

161

in JPT+CHB but suppressed in CEU at 46.3 Mb on chromosome 15. The deficit of rates in CEU is within the extended selection region at gene SLC24A5. The horizontal grey bar near the x-axis denotes the position undergoing selection for CEU and the red bar below indicates the position of the gene SLC24A5. B) Strong recombination rate is within LPP (red bar) at 190.0 Mb on chromosome 3 in CEU with the selection signals for JPT+CHB (grey bar), while the rate is depleted for JPT+CHB at the same location. C) Recombination peaks at 12.9 Mb on chromosome 6 in JPT+CHB, but absent in CEU. High recombination in PHACTR1 genes in JPT+CHB coincides with selection reported in the Chinese population. D) High recombination peak in JPT+CHB but suppressed in YRI at 112.5Mb on chromosome 7, corresponding to a selection region in JPT+CHB encapsulating gene GPR85.

162

163

Figure 24. Plots of the top 20 regions of the varRecM scores for the comparison between samples of HapMap CEU and JPT+CHB. Recombination rates of CEU (black line) and JPT+CHB (Green line). Vertical grey dotted line depicts the top varRecM region.

164

165

Figure 25. Plots of the top 20 regions of the varRecM scores for the comparison between samples of HapMap CEU and YRI. Recombination rates of CEU (black line) and YRI (Green line).

166

167

Figure 26. Plots of the top 20 regions of the varRecM scores for the comparison between samples of HapMap JPT+CHB and YRI. Recombination rates of JPT+CHB (black line) and YRI (Green line).

168

169

Figure 27. Plots of the top 20 regions of the varRecM scores for the comparison between samples of SGVP CHS and INS. Recombination rates of CHS (black line) and INS (Green line).

170

171

Figure 28. Plots of the top 20 regions of the varRecM scores for the comparison between samples of SGVP CHS and HapMap JPT+CHB. Recombination rates of CHS (black line) and JPT +CHB (Green line).

172

A B C

D E

Figure 29. Scatter plot of varLD score versus varRecM score among HapMap and SGVP populations. A) CEU - JPT+CHB (Pearson correlation coefficient r = 0.873; p = 2.10 × 10-5 ), B) CEU- YRI (r = 0.897; p = 5.72 × 10-6 ), C) JPT+ CHB – YRI (r = 0.965; p = 6.38 × 10-9), D) CHS-INS (r = 0.893; p = 7.49 × 10-6 ), E) CHS - JPT+CHB (r = -0.039; p = 0.889). Fitted line is predicated from the linear regression function by regressing varLD on varReCM scores.

173

1.4

1.2

1

0.8

Odds Ratios 0.6

0.4

0.2

0 CEU - JPT+CHB CEU - YRI JPT+CHB - YRI CHS - INS CHS - JPT+CHB

Figure 30. Odds ratio of extreme varRecM scores presenting in intergenic versus gene regions.

CEU –JPT + CHB: comparisons between HapMap Europeans (CEU) and East Asians (JPT+CHB); CEU - YRI: HapMap European (CEU) and Africans (YRI); JPT + CHB - YRI: HapMap East Asians (JPT + CHB) and Africans (YRI); CHS - INS: SGVP Chinese (CHS) and Indian (INS); CHS - JPT + CHB: SGVP Chinese (CHS) and HapMap East Asians (JPT + CHB). For all the comparisons, regions with extreme varRecM scores are defined as > top 1% in the distribution and non- extreme scores at ≤ top 1%, except for comparison between SGVP CHS and HapMap JPT +CHS; for CHS and JPT +CHB, regions with extreme varRecM scores are defined as exceeding top 0.1%. Y error bar represents 95% CI of the odds ratio.

174

Table 10. varRecM scores at top percentiles for pair-wise comparisons of the three HapMap populations between CEU and JPT + CHB, CEU and YRI, YRI and JPT + CHB CEU - YRI - Percentage\populations JPT+CHB CEU -YRI JPT+CHB Mean SE top 5% 0.216 0.265 0.246 0.242 0.014 top 1 % 0.447 0.475 0.443 0.455 0.010 top 0.5% 0.552 0.565 0.537 0.551 0.008 top 0.1% 0.799 0.773 0.727 0.766 0.021

175

Table 11. The 20 strongest signals of varRecM scores in comparisons of HapMap populations. Chr Position (Mb Population with Genes Population Selection Structure variationc in hg18) strong under position recombination selection (Mb)b (peak rate cM/Mb)a CEU – JPT +CHB 1 192.78 – 192.88 CEU ( 28.2) 2 35.15 – 35.24 JPT + CHB (30.5 ) Copy number difference in HapMap samples398

3 172.22 – 172.35 CEU (20.2) SLC2A2, TNIK 3 189.93 – 190.03 CEU (52.1) LPP JPT + CHB382,399 189.73 – 190.42 4 148.40 – 148.50 CEU (42.5) 4 149.00 – 149.10 CEU (23.2) ARHGAP10 CEU381,400 148.74 – 149.14 4 161.91 – 162.00 JPT + CHB (39.5) Copy number loss in Caucasians or African- Americans 401 / copy number gain in HapMap samples (MAF=0.07) 398, Germans402 5 43.85 – 43.95 CEU (46.9) 5 151.85 – 151.92 JPT + CHB (61.3) 6 12.88 – 13.02 JPT + CHB (17.8) PHACTR1 JPT+CHB381,403/ 12.62 – 13.20 Chinese399 7 92.16 – 92.26 CEU (22.3) CDK6 CEU381,403 92.12 – 92.22 7 135.72 – 135.81 CEU (81.0) 8 136.27 – 136.36 CEU (54.0) 10 126.92 – 127.01 JPT + CHB (28.6) JPT + CHB380,404 126.90 – 127.00 11 71.46 – 71.52 CEU (18.5) LRRC51, NUMA1, FOLR3 13 31.56 – 31.66 JPT + CHB (57.9) FRY 15 34.04 – 34.12 CEU (44.4) 15 46.26 – 46.36 JPT + CHB (47.8) SLC24A5, CEU381,382,399,403 45.94 – 46.80 SLC12A,MY EF2 17 71.01 – 71.11 CEU (35.4) CASKIN2, Copy number loss in LLGL2, Caucasian/African- TSEN54 American, HGDP, Japanese 401,405,406 /gain in HGDP405 18 63.70 – 63.77 CEU (14.1) CEU - YRI 1 98.02 – 98.12 CEU (82.1) DPYD 1 102.55 – 102.62 YRI (15.3) Copy number loss in Caucasians, Germans, HapMap, Canadian Ontario controls 402,407,408 2 8.24 – 8.35 YRI (34.1) Copy number gain in Caucasian individuals 407 2 151.09 – 151.19 CEU (88.6) 2 226.52 – 226.62 CEU (32.7) 3 179.54 – 179.64 CEU (59.9) 4 42.42 – 42.52 CEU (99.1) 4 54.43 – 54.52 CEU (70.4) 5 147.45 – 147.55 YRI (40.4) SPINK5, SPINK5L2 7 139.56- 139.62 CEU (44.0) 9 119.25 – 119.35 CEU (42.0) 10 2.24 – 2.34 YRI (36.6) 10 5.37 – 5.46 CEU (44.8) NET1,UCN3 , TUBAL3 10 122.83 – 122.93 YRI (51.5) Copy number differences in HapMap samples398 11 15.75 – 15.83 YRI (37.2) 11 18.99 – 19.08 YRI (64.6) MRGPRX2, Copy number ZDHHC13 differences in HapMap samples398 12 2.79 – 2.88 CEU (42.0) TULP3,NRI Copy number loss in P2,ITFG2,F CEPH diverse KBP4, population 409 15 46.26 – 46.36 YRI (34.5) SLC12A1, CEU381,382,399,403 45.94 – 46.80 SLC24A5, MYEF2 18 57.80 – 57.89 YRI (55.8) PIGN Copy number difference in CEPH diverse population 409 20 12.34 – 12.40 YRI (35.8) Copy number differences in HapMap samples398 JPT + CHB - YRI 3 82.53 – 82.59 YRI (28.8) Copy number gain in HapMap samples398 4 12.84 – 12.93 YRI (31.0) Copy number loss in CEPH diverse population 409 5 51.19 – 51.30 JPT + CHB (34.8) 5 58.32 – 58.42 YRI (22.5) PDE4D 5 147.45 – 147.55 YRI (40.4) SPINK5, SPINK5L2 5 151.85 – 151.95 JPT + CHB (61.3) 6 67.91 – 67.99 YRI (21.7) 7 112.43 – 112.54 JPT + CHB (40.4) GPR85 JPT + CHB380-382 111.87 – 112.59 Copy number difference in HapMap samples398 7 135.72 – 138.81 YRI (36.1) 10 95.89 – 96.01 JPT + CHB (57.9) PLCE1 11 91.13 – 91.27 YRI (27.7) 11 100.81 – 100.91 JPT + CHB (38.8) TRPC6 12 65.73 – 65.83 YRI (30.0) 15 34.83 – 34.95 YRI (28.8) C15orf41 15 63.24 – 63.33 YRI (57.8) CLPX,CILP, PARP16 16 75.38 – 75.47 YRI (32.2) Copy number loss in Caucasians407 17 22.53 – 22.61 JPT + CHB (36.1) 18 63.72 – 63.77 YRI (20.7) 20 12.33 – 12.43 YRI (35.8) 21 22.72 – 22.80 YRI (26.0) Copy number loss in Caucasian/African- American 401 a Peak rate is the estimated recombination rate in cM/Mb for the population- specific recombination peak present in each top region.

177

b Region undergoing selection is the stretch spanning a maximum length and collapsed from all the overlap regions reported from studies410. c Copy number variations have to be at least 10 kb to be included and overlap with the peak rate regions. The frequency of the copy number difference is rare ( < 0.05) in the source populations unless otherwise specified in the table. CEU -JPT+CHB: comparisons between HapMap Europeans (CEU) and East Asians (JPT+CHB); CEU - YRI: comparisons between HapMap European (CEU) and Africans (YRI); JPT + CHB - YRI: comparisons between HapMap East Asians (JPT + CHB) and Africans (YRI).

178

Table 12. The 20 strongest signals of varRecM score in comparison of populations of SGVP Chinese and Indians, and Chinese and HapMap East Asians. Chr Position (Mb Population with Genes Population Selection Structure variation c in hg18) strong under position recombination selection (Mb)b (peak rate cM/Mb)a CHS -INS 1 214.96 – 215.06 INS (72.3) ESRRG 1 215,07 – 215.20 INS (49.6) ESRRG 1 230.65 – 230.75 INS (38.7) SIPA1L2 2 25.06 – 25.15 CHS (19.9) EFR3B 2 209.42 – 209.50 CHS (25.7) Copy number difference in Indian411 3 164.98 – 165.07 INS (38.7) 4 29.80 – 29.89 INS (29.5) 4 72.07 – 72.16 INS (53.0) DCK,MOB CEU403,412 72.04 – 72.41 KL1A 4 73.90 – 73.95 INS (16.3) 4 135.62 – 135.72 INS (24.5) CEU400,404,413 135.10 – 135.64 Common copy number loss in Indian411 5 66.21 – 66.30 INS (28.0) MAST4 7 122.16 – 122.22 INS (19.0) CADPS2 8 127.38 – 127.46 CHS (51.1) 9 24.10 – 24.19 CHS (73.2) 9 132.68 – 132.78 INS (67.4) ABL1,QRF P, FIBCD1 12 70.85 – 70.92 INS (34.7) 12 124.77 – 124.85 INS (65.4) 18 74.82 – 74.91 INS (28.5) SALL3 Copy number loss in Indian and Singapore Chinese411 20 52.60 – 52.73 INS (39.3) DOK5 JPT+CHB404,412 52.43 – 52.75 21 18.34 – 18.40 CHS (66.8) Copy number loss in Singapore Chinese411 CHS – JPT + CHB 1 21.56 – 21.65 CHS (30.7) NBPF3, ECE1 1 214.95 – 215.06 JPT + CHB (66.8) ESRRG 1 215.08 – 215.20 JPT + CHB (44.9) ESRRG 3 189.93 – 190.03 CHS (41.4) LPP JPT + 189.73 – 190.42 CHB382,399 4 123.05 – 123.14 CHS (37.2) TRPC3 4 142.23 – 142.33 JPT + CHB (41.6) RNF150 5 153.38- 153.44 JPT + CHB (39.5) MFAP3, FAM114A2 6 67.91 – 68.02 CHS (18.4) 8 57.00 – 57.10 JPT + CHB (29.3) LYN JPT + 56.96 – 57.18 CHB403,412 8 111.71 – 111.81 CHS (20.5) JPT+CHB381,404 111.41 – 111.86 9 119.53 – 119.62 JPT + CHB (47.1) TLR4 10 107.85 – 107.94 JPT + CHB (42.1) 11 100.50 – 100.60 JPT + CHB (17.0) PGR JPT + CHB404 100.49 – 100.60 11 100.81 – 100.91 JPT + CHB (38.8) TRPC6 12 70.85 – 70.93 JPT + CHB (40.9) 15 43.97 – 44.07 JPT + CHB (27.2) JPT+CHB404/As 43.92 – 44.33 ian American399 18 30.71 – 30.82 CHS (14.5) DTNA 21 18.35 – 18.40 CHS (66.8) 179

21 18.93 – 18.99 CHS (75.1) Copy number gain in Singapore Chinese411 22 29.02 – 29.11 CHS (10.5) TBC1D10, RNF215, SEC14L2 a Peak rate is the estimated recombination rate in cM/Mb for the population- specific recombination peak present in each top region. b Region undergoing selection is the stretch spanning a maximum length and collapsed from all the overlap regions reported from studies 410. c Copy number variations have to be at least 10 kb to be included and overlap with the peak rate regions. The frequency of the copy number difference is rare ( < 0.05) in the source populations unless otherwise specified in the table. CHS - INS: comparisons between SGVP Chinese (CHS) and Indian (INS); CHS - JPT + CHB: comparisons between SGVP Chinese (CHS) and HapMap East Asians (JPT + CHB).

180

6 Chapter 6 Conclusion

This dissertation is comprised of three studies (Study 1, 2 and 3). Since each

study has its own discussion, in this final chapter, I will only summarise the key

results and major implications, and suggest areas for further research.

6.1 Identified genetic variants associated with refractive errors

In Study 1 (Chapter 3), the genetic locus on chromosome 1q41 containing

zinc-finger pseudogene ZC3H11B was found to associate with AL through a

meta-analysis of three GWAS in Chinese and Malays. This is the first genome-

wide association scan, to best of our knowledge, to explore the genetic association

for AL. The discovery of chromosome 1q41 as a locus for high myopia in our

data was further supported by validation in two independent Japanese cohorts.

Similar genetic effects at 1q41 for high myopia in Chinese, Malays and Japanese

were observed, suggesting that underlying biological mechanisms of the genetic

variation are likely to be conserved across these three Asian populations. Our

GWAS results herein highlight AL QTLs relevant for high myopia predisposition,

which advances our understanding of the genetic aetiology of common and high

myopia.

In Study 2 (Chapter 4), our genome-wide meta-analysis across five genome-

wide surveys from three genetically diverse populations in Asia revealed that

genetic variants in the PDGFRA gene on chromosome 4q12 are significantly associated with corneal astigmatism. We observed a strong and consistent

181

association with the onset of corneal astigmatism at the PDGFRA gene locus on

chromosome 4q12 across Chinese, Malays and Indians.

Due to the nature of the GWAS design of ‘indirect mapping’, the identified

SNPs at locus chromosome 1q41 or within gene PDGFRA are probably not the

casual variants, but are likely to be in LD with the causal SNPs. Further strategies

to pinpoint the causal mutation including fine mapping and deep-sequencing on

this genomic region in different population will be required.

6.2 Transferability of the genetic variants for refractive errors across

populations

The international Consortium for Refractive Error and Myopia (CREAM) has

recently been launched, representing the largest genome-wide scan on AL in

about 23,000 individuals from 18 cohorts in Europe, United States, Australia and

Asia. Meta-analysis on 2.5 million autosomal SNPs by whole-genome imputation using HapMap 2 reference panels reveals the most significant association for AL on chromosome 1q41 in the proximity of pseudogene ZC3H11B ( meta-P = 9.6 ×

10-12 ), along with genetic locus on chromosome 15q14 and seven novel loci (data

not published). In a parallel genome-wide meta-analysis of GWAS in 27

Caucasian populations from the CREAM Consortium, use of spherical equivalent

as a continuous phenotype has identified 24 distinct genetic regions, while 13 loci

show significant evidence of replication in the Asian populations.

Chromosome 4q12, harbouring the gene PDGFRA, is a previously reported

myopia loci (MYP9). Interestingly, our group also found a strong association

182

between variants in PDGFRA with corneal curvature in Asians414, where corneal

curvature is one of the underlying ocular parameters of refractive error, pointing

to a possible pleiotropic contribution of PDGFRA on corneal astigmatism and

corneal curvature. Recent GWAS in Australian of northern European ancestry

confirmed the important role of the PDGFRA gene in controlling the corneal

curvature415. Significant associations of PDGFRA to corneal curvature and corneal astigmatism were also found in UK European Children (data not shown).

These outcomes, along with finding from the CREAM Consortium, provide evidence for substantial overlap in genetic aetiology of refractive error between

Caucasians and Asians.

The transferability of the genetic variants for refractive errors across different ethnic groups will have important implication for preventive or therapeutic

intervention for myopia in the future. The discovery of common risk loci in

GWAS raises the possibility of using these variants for risk predication in a

clinical setting. Assuming that the genetic variants for myopia are shared across

different population, a generic risk predication model will be used to classify

individuals into high or low risk for myopia onset and enable targeted preventive

treatment. A good genetic prediction model has been built up for age-related

macular degeneration, utilizing multiple large-effect genetic variants in addition

to epidemiological risk factors416. Limited predicative value, however, is noted for

most common diseases so far417. Substantial improvements will only be expected,

as the field of genomics is progressing, to better characterise genetic variation via

183

fine-mapping, whole-genome sequencing as well as functional testing, and assess

gene-environment interactions.

6.3 Statistical meta-analysis of GWAS in diverse populations

The use of diverse populations will be an essential component of the next

phase of GWAS. Meta-analysis combining many studies as possible provides the

opportunity to scrutinise less significant SNPs embedded in the individual GWAS

that may reflect the genuine association at larger sample sizes. In Study 2, the

existing inter-population variation of LD (i.e. Indians versus Chinese) may

diminish the statistical power to detect genetic loci at these regions with different

LD patterns across ethnic groups, although the false-positive rate generally is not

affected. Such inter-population LD variation could yield different effect sizes of the association (or even opposite direction in the association) at the genotyped

SNP, while the underlying genetic effect of the casual variant is the same72.

Rather than combining test statistic of the same SNP, regional-based meta-

analysis method is useful in trans-ethnic studies for assessing regional evidence of

phenotypic association, while accommodating different patterns of LD and the

presence of allelic heterogeneity68.

In addition, the assumption of fixed-effect model adopted in the current meta-

analysis model does not account for the scenario where the effect sizes of genetic

variants can vary across studies. Interestingly, the conventional random-effect

model yields less significant p-values than the fixed-effect model in the presence

of inter-study heterogeneity 418. There are future opportunities to perform more

184

advanced statistical analyses to accommodate this issue in the trans-ethnic mapping. For example, Han and Eskin have proposed a revised random-effect model that relaxed the conservative assumption of null-hypothesis, showing significantly improved power in comparison with the traditional random-effect model66. By accounting for similarity in allelic effects between genetically close-

related populations but allowing heterogeneity between more diverse ethnic

groups, Morris has demonstrated that such an approach significantly increased the

power compared to the uniform fixed- or random-effect models67.

6.4 Missing heritability of myopia

In Study 1, the effect sizes of the observed association for AL were modest, and the proportion of variance explained was about 1% for AL. The odds ratio of the risk allele (major allele) at the implicated SNP (rs4373767) for high myopia was below 1.35, similarly to other implicated loci reported in previous

GWAS22,265,266.

There are many possible reasons for such ‘missing heritability’ of ocular traits,

considering the heritability estimate as high as 88% for quantitative refraction.

One is the gene-environmental interaction. There is abundant evidence for

environmental impacts on the development of myopia. The importance of the

environmental influence on the development of myopia is strongly supported in

experimental animal models, which show that changes in visual experience can

generate signals that promote eye growth, leading to myopia 126. In humans, the

continuously increased prevalence of myopia globally, especially in urban East

185

Asia in the last several decades, has suggested the roles of environmental factors,

such as levels of educational attainment, the extent of near work, and outdoor

activities110. The lifestyle changes may alter the distribution of genetic effects in

some subgroups as well109. However, unless there is a priori knowledge that

particular environmental factors or genes interplay with each other and reasonable

fraction of individuals experience the adverse exposure, a direct search for pairs

of interacting loci and/or interaction with environmental factors can be difficult.

Another possibility is that large-effect rare variants may contribute to the genetic and phenotypic variation419. Rare variants (MAF <5%) are generally

omitted in GWAS due to a lack of power. The presence of allelic heterogeneity

poses a challenge for rare variants in exome sequencing studies, as it is expected

that, in the same functional unit, different causal mutations of extremely low

frequencies could be carried by different individuals. The statistical analysis of

rare variants is fundamentally different from association statistics used for testing

common variants. The challenges arise in several aspects including extremely low

power in the single-marker test, allelic heterogeneity, effect heterogeneity of the

variants within the same region (gene), etc.420. Developing practical computer

software to prioritise promising rare variants for further functional assay is

currently an active research area for statistical geneticists.

Genome-wide scans for refractive errors on structure variations, such as copy

number variations (CNVs) referring to the chromosomal deletions or duplications,

are lag behind many complex diseases. Different computational algorithms have

been developed to generate CNV calls based on the current available SNP chips

186

from GWAS, however large variation of copy number detection among different

programs is noted421. With the availability of genome-wide sequencing data in the near future, it is expected that studies on genome-wide CNVs for myopia will be reported.

6.5 Recombination variations and implications in genetic association studies

In Study 3 (Chapter 5), a signal processing approach (varRecM) was introduced to evaluate differences in two estimated recombination rate profiles.

To the best of our knowledge, there is currently no formal statistical metric for defining recombination rate variation without first defining the presence or absence of recombination hotspots.

Applying the metric to genome-wide recombination rates estimated from

HapMap and SGVP suggests that significant fine-scale differences exist in the recombination profiles of Europeans, Africans and East Asians. This observation is important in trans-ethnic studies, as it provides the insight into the fine- mapping of the functional polymorphisms in diverse populations. Assuming that a functional variant of the similar effect size and the same direction is present across multiple populations, whether this association is detectable largely depends on the pattern of LD in the region. Recombination is one of the key forces that breakdowns LD, and the rate of recombination has been widely used as a surrogate measure for LD in genetic association studies, for instance, SNP imputation algorithms60. Therefore, comparison of inter-population

recombination variation at fine scale, which is relevant to the extent of LD

187

variation in the same region, is of great value to studies that aim to identify the

genetic associations transferable across different populations. Top regions exhibiting large recombination variations will be studied further for the relevance

to the disease-associated regions and subsequently the transferability of genetic

associations at these regions.

Until very recently two largest meta-analyses of GWAS ever have uncovered a total of 39 genetic loci implicated with myopia 422,423. The association signals

from both studies are primarily discovered from the samples of Caucasian

descent. In the CREAM GWAS on spherical equivalent, a number of these loci

have been successfully replicated in Asian populations comprising Chinese,

Malays and Indians, whereas others have failed. Since the causal variants are

highly likely to be in LD with these associated SNPs, the variation of LD patterns

can challenge replication studies across populations. Further evaluation of inter-

population differences in the patterns of recombination (varRecM scores), or

similarly the extent of LD, will facilitate our understanding of the transferability

of theses ‘myopia’ genetic loci in multiple populations.

This analysis also shows that the large variations in recombination profiles are

generally found in non-gene regions, an observation that appears to be concordant

with previous reports that recombination tends to be suppressed in genes93 even

though the biological mechanisms for this are still poorly understood424. An

interesting question remaining is why some hotspots are less likely to be shared

between different human populations? Our findings prioritise genomic regions for

188

further studies to provide insight into the biological basis of heterogeneous

recombination in different populations.

In summary, this dissertation demonstrates the value of integrating population genomics and medical genetics to carry out large-scale genetic association studies in multiple populations in elucidation of the genetic architecture of refractive errors. The disease-predisposing genetic variation identified in this dissertation will shed light on the complete picture of the genetic pathways involved in the occurrence and development of refractive errors and further characterisation of the causal mutations responsible for the genetic aetiology.

189

7 Publications

This thesis is based on the following papers,

Fan, Q., Barathi, V.A., Cheng, C.Y., Zhou, X., Meguro, A., Nakata, I., Khor, C.C., Goh, L.K., Li, Y.J., Lim, W., Ho CEH, Hawthorne F., Zheng Y., Chua D., Inoko H., Yamashiro K., Ohno-Matsui K., Matsuo K., Matsuda F., Vithana E., Seielstad M., Mizuki N., Beuerman RW., E-Shyong Tai E-S., Yoshimura., Aung T., Young TL., Wong T-Y., Teo Y-Y.*, Saw S-M.* (2012). Genetic variants on chromosome 1q41 influence ocular axial length and high myopia. PLoS Genet 8, e1002753.

Fan, Q., Ong, R., Xu, H., Small K.S., Saw, S-M, Teo,Y-Y. Genome-wide comparison of estimated recombination rates between populations; under review (2013)

Fan, Q., Zhou, X., Khor, C.C., Cheng, C.Y., Goh, L.K., Sim, X., Tay, W.T., Li, Y.J., Ong, R.T., Suo, C., Cornes B.,Ikram MK.,Chia K-S., Seielstad M., Liu J., Vithana E., Young TL.,Tai E-S., Wong T-Y.*, Aung T.*, Teo Y-Y.*, Saw S-M.* (2011). Genome-wide meta-analysis of five Asian cohorts identifies PDGFRA as a susceptibility locus for corneal astigmatism. PLoS Genet 7, e1002402.

Fan, Q., Teo, Y-Y., and Saw, S-M. (2011). Application of advanced statistics in ophthalmology. Invest Ophthalmol Vis Sci 52, 6059-6065.

and book chapter:

Li Y.J & Fan Q (2010). Statistical analysis of Genome-wide associaton studies for myopia. In: Myopia: animal models to clincial trials. Singapore, World Scentific Publishing Co. Pte. Ltd, 215-235.

Relevant work: Han, S.*, Chen, P.*, Fan, Q*., Khor, C.C., Sim, X., Tay, W.T., Ong, R.T., Suo, C., Goh, L.K., Lavanya, R., Zheng,Y., Wu R., Seielstad M., Vithana E., Liu J., Chia K.S., Lee J.J., Tai, E-S., Wong, T-Y. *, Aung T. *, Teo Y-Y. *, and Saw S- M*. (2011). Association of variants in FRAP1 and PDGFRA with corneal curvature in Asian populations from Singapore. Hum Mol Genet, 20, 3693-3698

Li, Y.J., Goh, L.K., Khor, C.C., Fan, Q., Yu, M., Han, S., Sim, X., Ong, R.T., Wong, T.Y., Vithana, E.N., Yap E., Seielstad M., Wong T-Y., Tai E-S., Young TY., Saw S-M. (2011). Genome-wide association studies reveal genetic variants in CTNND2 for high myopia in Singapore Chinese. Ophthalmology 118, 368- 375. * Joint first/last authors 190

8 References

1. Morton, N.E. Sequential tests for the detection of linkage. Am J Hum Genet 7, 277-318 (1955). 2. Altshuler, D., Daly, M.J. & Lander, E.S. Genetic mapping in human disease. Science 322, 881-8 (2008). 3. McCarthy, M.I. et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 9, 356-69 (2008). 4. Hirschhorn, J.N. & Daly, M.J. Genome-wide association studies for common diseases and complex traits. Nat Rev Genet 6, 95-108 (2005). 5. Frazer, K.A. et al. A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851-61 (2007). 6. Anderson, C.A. et al. Data quality control in genetic case-control association studies. Nat Protoc 5, 1564-73 (2010). 7. Price, A.L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38, 904-9 (2006). 8. Patterson, N., Price, A.L. & Reich, D. Population structure and eigenanalysis. PLoS Genet 2, e190 (2006). 9. Balding, D.J. A tutorial on statistical methods for population association studies. Nat Rev Genet 7, 781-91 (2006). 10. Kerner, B., North, K.E. & Fallin, M.D. Use of longitudinal data in genetic studies in the genome-wide association studies era: summary of Group 14. Genet Epidemiol 33 Suppl 1, S93-8 (2009). 11. Rasmussen-Torvik, L.J. et al. Impact of repeated measures and sample selection on genome-wide association studies of fasting . Genet Epidemiol (2010). 12. Spielman, R.S., McGinnis, R.E. & Ewens, W.J. Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet 52, 506-16 (1993). 13. Rabinowitz, D. & Laird, N. A unified approach to adjusting association tests for population admixture with arbitrary pedigree structure and arbitrary missing marker information. Hum Hered 50, 211-23 (2000). 14. Freedman, M.L. et al. Assessing the impact of population stratification on genetic association studies. Nat Genet 36, 388-93 (2004). 15. Laird, N.M. & Lange, C. Family-based designs in the age of large-scale gene- association studies. Nat Rev Genet 7, 385-94 (2006). 16. Chanock, S.J. et al. Replicating genotype-phenotype associations. Nature 447, 655-60 (2007). 17. Dewan, A. et al. HTRA1 promoter polymorphism in wet age-related macular degeneration. Science 314, 989-92 (2006). 18. Chen, W. et al. Genetic variants near TIMP3 and high-density lipoprotein- associated loci influence susceptibility to age-related macular degeneration. Proc Natl Acad Sci U S A 107, 7401-6 (2010). 19. Thorleifsson, G. et al. Common variants near CAV1 and CAV2 are associated with primary open-angle glaucoma. Nat Genet 42, 906-9 (2010).

191

20. Nakano, M. et al. Three susceptible loci associated with primary open-angle glaucoma identified by genome-wide association study in a Japanese population. Proc Natl Acad Sci U S A 106, 12838-42 (2009). 21. Lin, H.J. et al. Single-nucleotide polymorphisms in chromosome 3p14.1- 3p14.2 are associated with susceptibility of Type 2 diabetes with cataract. Mol Vis 16, 1206-14 (2010). 22. Li, Y.J. et al. Genome-wide association studies reveal genetic variants in CTNND2 for high myopia in Singapore Chinese. Ophthalmology 118, 368-75 (2011). 23. Nakanishi, H. et al. A genome-wide association analysis identified a novel susceptible locus for pathological myopia at 11q24.1. PLoS Genet 5, e1000660 (2009). 24. Charlesworth, J. et al. The path to open-angle glaucoma gene discovery: endophenotypic status of intraocular pressure, cup-to-disc ratio, and central corneal thickness. Invest Ophthalmol Vis Sci 51, 3509-14 (2010). 25. Lu, Y. et al. Common genetic variants near the Brittle Cornea Syndrome locus ZNF469 influence the blinding disease risk factor central corneal thickness. PLoS Genet 6, e1000947 (2010). 26. Vithana, E.N. et al. Collagen-related genes influence the glaucoma risk factor, central corneal thickness. Hum Mol Genet 20, 649-58 (2011). 27. Vitart, V. et al. New loci associated with central cornea thickness include COL5A1, AKAP13 and AVGR8. Hum Mol Genet 19, 4304-11 (2010). 28. Ramdas, W.D. et al. A genome-wide association study of optic disc parameters. PLoS Genet 6, e1000978 (2010). 29. Khor, C.C. et al. Genome-wide association studies in Asians confirm the involvement of ATOH7 and TGFBR3, and further identify CARD10 as a novel locus influencing optic disc area. Hum Mol Genet 20, 1864-72 (2011). 30. Plomin, R., Haworth, C.M. & Davis, O.S. Common disorders are quantitative traits. Nat Rev Genet 10, 872-8 (2009). 31. Evans, K. & Bird, A.C. The genetics of complex ophthalmic disorders. Br J Ophthalmol 80, 763-8 (1996). 32. Wiggs, J.L. Genetic etiologies of glaucoma. Arch Ophthalmol 125, 30-7 (2007). 33. Young, T.L., Metlapally, R. & Shay, A.E. Complex trait genetics of refractive error. Arch Ophthalmol 125, 38-48 (2007). 34. Solouki, A.M. et al. A genome-wide association study identifies a susceptibility locus for refractive errors and myopia at 15q14. Nat Genet 42, 897-901 (2010). 35. Hysi, P.G. et al. A genome-wide association study for myopia and refractive error identifies a susceptibility locus at 15q25. Nat Genet 42, 902-5 (2010). 36. Carbonaro, F. et al. Repeated measures of intraocular pressure result in higher heritability and greater power in genetic linkage studies. Invest Ophthalmol Vis Sci 50, 5115-9 (2009). 37. Ciner, E. et al. Genome-wide scan of African-American and white families for linkage to myopia. Am J Ophthalmol 147, 512-517 e2 (2009). 38. Ciner, E., Wojciechowski, R., Ibay, G., Bailey-Wilson, J.E. & Stambolian, D. Genomewide scan of ocular refraction in African-American families shows significant linkage to chromosome 7p15. Genet Epidemiol 32, 454-63 (2008). 39. Hammond, C.J., Andrew, T., Mak, Y.T. & Spector, T.D. A susceptibility locus for myopia in the normal population is linked to the PAX6 gene region on 192

chromosome 11: a genomewide scan of dizygotic twins. Am J Hum Genet 75, 294-304 (2004). 40. Klein, A.P. et al. Support for polygenic influences on ocular refractive error. Invest Ophthalmol Vis Sci 46, 442-6 (2005). 41. Murdoch, I.E., Morris, S.S. & Cousens, S.N. People and eyes: statistical approaches in ophthalmology. Br J Ophthalmol 82, 971-3 (1998). 42. Xu, X., Tian, L. & Wei, L.J. Combining dependent tests for linkage or association across multiple phenotypic traits. Biostatistics 4, 223-9 (2003). 43. Klei, L., Luca, D., Devlin, B. & Roeder, K. Pleiotropy and principal components of heritability combine to increase power for association analysis. Genet Epidemiol 32, 9-19 (2008). 44. Yang, F., Tang, Z. & Deng, H. Bivariate association analysis for quantitative traits using generalized estimation equation. J Genet Genomics 36, 733-43 (2009). 45. Jiang, C. & Zeng, Z.B. Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics 140, 1111-27 (1995). 46. O'Brien, P.C. Procedures for comparing samples with multiple endpoints. Biometrics 40, 1079-87 (1984). 47. Wei, L.J. & Johnson, W.E. Combining Dependent Tests with Incomplete Repeated Measurements. Biometrika 72, 359-364 (1985). 48. Yang, Q., Wu, H., Guo, C.Y. & Fox, C.S. Analyze multivariate phenotypes in genetic association studies by combining univariate association tests. Genet Epidemiol 34, 444-54 (2010). 49. Lange, C. et al. A family-based association test for repeatedly measured quantitative traits adjusting for unknown environmental and/or polygenic effects. Stat Appl Genet Mol Biol 3, Article17 (2004). 50. Liu, J., Pei, Y., Papasian, C.J. & Deng, H.W. Bivariate association analyses for the mixture of continuous and binary traits with the use of extended generalized estimating equations. Genet Epidemiol 33, 217-27 (2009). 51. Liu, Y.Z. et al. Powerful bivariate genome-wide association analyses suggest the SOX6 gene influencing both obesity and osteoporosis phenotypes in males. PLoS One 4, e6827 (2009). 52. Lange, C., Silverman, E.K., Xu, X., Weiss, S.T. & Laird, N.M. A multivariate family- based association test using generalized estimating equations: FBAT-GEE. Biostatistics 4, 195-206 (2003). 53. Hackett, C.A., Meyer, R.C. & Thomas, W.T. Multi-trait QTL mapping in barley using multivariate regression. Genet Res 77, 95-106 (2001). 54. Yu, K. et al. A partially linear tree-based regression model for multivariate outcomes. Biometrics 66, 89-96 (2010). 55. Lange, C., DeMeo, D., Silverman, E.K., Weiss, S.T. & Laird, N.M. PBAT: tools for family-based association studies. Am J Hum Genet 74, 367-9 (2004). 56. Goldstein, D.B. Common genetic variation and human traits. N Engl J Med 360, 1696-8 (2009). 57. Munafo, M.R. & Flint, J. Meta-analysis of genetic association studies. Trends Genet 20, 439-44 (2004). 58. de Bakker, P.I. et al. Practical aspects of imputation-driven meta-analysis of genome-wide association studies. Hum Mol Genet 17, R122-8 (2008).

193

59. Marchini, J. & Howie, B. Genotype imputation for genome-wide association studies. Nat Rev Genet 11, 499-511 (2010). 60. Marchini, J., Howie, B., Myers, S., McVean, G. & Donnelly, P. A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet 39, 906-13 (2007). 61. Huang, L. et al. Genotype-imputation accuracy across worldwide human populations. Am J Hum Genet 84, 235-50 (2009). 62. Guan, Y. & Stephens, M. Practical issues in imputation-based association mapping. PLoS Genet 4, e1000279 (2008). 63. Browning, S.R. & Browning, B.L. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Hum Genet 81, 1084-97 (2007). 64. Ellinghaus, D., Schreiber, S., Franke, A. & Nothnagel, M. Current software for genotype imputation. Hum Genomics 3, 371-80 (2009). 65. Mantel, N. & Haenszel, W. Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst 22, 719-48 (1959). 66. Han, B. & Eskin, E. Random-effects model aimed at discovering associations in meta-analysis of genome-wide association studies. Am J Hum Genet 88, 586-98 (2011). 67. Morris, A.P. Transethnic meta-analysis of genomewide association studies. Genet Epidemiol 35, 809-22 (2011). 68. Wang, X. et al. A statistical method for region-based meta-analysis of genome- wide association studies in genetically diverse populations. Eur J Hum Genet 20, 469-75 (2012). 69. Rosenberg, N.A. et al. Genome-wide association studies in diverse populations. Nat Rev Genet 11, 356-66 (2010). 70. Tang, M.X. et al. The APOE-epsilon4 allele and the risk of Alzheimer disease among African Americans, whites, and Hispanics. JAMA 279, 751-5 (1998). 71. Begum, F., Ghosh, D., Tseng, G.C. & Feingold, E. Comprehensive literature review and statistical considerations for GWAS meta-analysis. Nucleic Acids Res 40, 3777-84 (2012). 72. Lin, P.I., Vance, J.M., Pericak-Vance, M.A. & Martin, E.R. No gene is an island: the flip-flop phenomenon. Am J Hum Genet 80, 531-8 (2007). 73. Klein, R.J. et al. Complement factor H polymorphism in age-related macular degeneration. Science 308, 385-9 (2005). 74. Asimit, J. et al. An evaluation of different meta-analysis approaches in the presence of allelic heterogeneity. Eur J Hum Genet 20, 709-12 (2012). 75. Kiezun, A. et al. Exome sequencing and the genetic basis of complex traits. Nat Genet 44, 623-30 (2012). 76. Nachman, M.W. Single nucleotide polymorphisms and recombination rate in humans. Trends Genet 17, 481-5 (2001). 77. Jeffreys, A.J., Murray, J. & Neumann, R. High-resolution mapping of crossovers in human sperm defines a minisatellite-associated recombination hotspot. Mol Cell 2, 267-73 (1998). 78. Jeffreys, A.J., Kauppi, L. & Neumann, R. Intensely punctate meiotic recombination in the class II region of the major histocompatibility complex. Nat Genet 29, 217-22 (2001). 194

79. Lynn, A., Ashley, T. & Hassold, T. Variation in human meiotic recombination. Annu Rev Genomics Hum Genet 5, 317-49 (2004). 80. Jensen-Seaman, M.I. et al. Comparative recombination rates in the rat, mouse, and human genomes. Genome Res 14, 528-38 (2004). 81. Nachman, M.W. Variation in recombination rate across the genome: evidence and implications. Curr Opin Genet Dev 12, 657-63 (2002). 82. Winckler, W. et al. Comparison of fine-scale recombination rates in humans and . Science 308, 107-11 (2005). 83. Ptak, S.E. et al. Fine-scale recombination patterns differ between chimpanzees and humans. Nat Genet 37, 429-34 (2005). 84. Jeffreys, A.J. & Neumann, R. The rise and fall of a human recombination hot spot. Nat Genet 41, 625-9 (2009). 85. Coop, G. & Przeworski, M. An evolutionary view of human recombination. Nat Rev Genet 8, 23-34 (2007). 86. Jeffreys, A.J. & Neumann, R. Reciprocal crossover asymmetry and meiotic drive in a human recombination hot spot. Nat Genet 31, 267-71 (2002). 87. Risch, N. & Merikangas, K. The future of genetic studies of complex human diseases. Science 273, 1516-7 (1996). 88. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661-78 (2007). 89. Jorde, L.B., Watkins, W.S. & Bamshad, M.J. Population genomics: a bridge from evolutionary history to genetic medicine. Hum Mol Genet 10, 2199-207 (2001). 90. Serre, D., Nadon, R. & Hudson, T.J. Large-scale recombination rate patterns are conserved among human populations. Genome Res 15, 1547-52 (2005). 91. Matise, T.C. et al. A second-generation combined linkage physical map of the human genome. Genome Res 17, 1783-6 (2007). 92. Evans, D.M. & Cardon, L.R. A comparison of linkage disequilibrium patterns and estimated population recombination rates across multiple populations. Am J Hum Genet 76, 681-7 (2005). 93. Myers, S., Bottolo, L., Freeman, C., McVean, G. & Donnelly, P. A fine-scale map of recombination rates and hotspots across the human genome. Science 310, 321-4 (2005). 94. Jorgenson, E. et al. Ethnicity and human genetic linkage maps. Am J Hum Genet 76, 276-90 (2005). 95. Wegmann, D. et al. Recombination rates in admixed individuals identified by ancestry-based inference. Nat Genet 43, 847-53 (2011). 96. Crawford, D.C. et al. Evidence for substantial fine-scale variation in recombination rates across the human genome. Nat Genet 36, 700-6 (2004). 97. Fearnhead, P. & Smith, N.G. A novel method with improved power to detect recombination hotspots from polymorphism data reveals multiple hotspots in human genes. Am J Hum Genet 77, 781-94 (2005). 98. Hinch, A.G. et al. The landscape of recombination in African Americans. Nature 476, 170-5 (2011). 99. A haplotype map of the human genome. Nature 437, 1299-320 (2005). 100. Teo, Y.Y. et al. Singapore Genome Variation Project: a haplotype map of three Southeast Asian populations. Genome Res 19, 2154-62 (2009).

195

101. McVean, G.A. et al. The fine-scale structure of recombination rate variation in the human genome. Science 304, 581-4 (2004). 102. Clark, A.G., Wang, X. & Matise, T. Contrasting methods of quantifying fine structure of human recombination. Annu Rev Genomics Hum Genet 11, 45-64 (2010). 103. McVean, G. The structure of linkage disequilibrium around a selective sweep. Genetics 175, 1395-406 (2007). 104. Myers, S., Freeman, C., Auton, A., Donnelly, P. & McVean, G. A common sequence motif associated with recombination hot spots and genome instability in humans. Nat Genet (2008). 105. Myers, S. et al. Drive against hotspot motifs in primates implicates the PRDM9 gene in meiotic recombination. Science 327, 876-9 (2010). 106. Kong, A. et al. Fine-scale recombination rate differences between sexes, populations and individuals. Nature 467, 1099-103 (2010). 107. Laayouni, H. et al. Similarity in recombination rate estimates highly correlates with genetic differentiation in humans. PLoS One 6, e17913 (2011). 108. Khil, P.P. & Camerini-Otero, R.D. Genetic crossovers are predicted accurately by the computed human recombination map. PLoS Genet 6, e1000831 (2010). 109. Morgan, I.G., Ohno-Matsui, K. & Saw, S.M. Myopia. Lancet 379, 1739-48 (2012). 110. Wojciechowski, R. Nature and nurture: the complex genetics of myopia and refractive error. Clin Genet 79, 301-20 (2011). 111. Saw, S.M., Gazzard, G., Shih-Yen, E.C. & Chua, W.H. Myopia and associated pathological complications. Ophthalmic Physiol Opt 25, 381-91 (2005). 112. Wong, T.Y., Klein, B.E., Klein, R., Tomany, S.C. & Lee, K.E. Refractive errors and incident cataracts: the Beaver Dam Eye Study. Invest Ophthalmol Vis Sci 42, 1449-54 (2001). 113. Grosvenor, T.P. Primary care optometry : anomalies of refraction and binocular vision, xiii, 665 p. (Butterworth-Heinemann, Boston, 1996). 114. Saw, S.M. et al. Eye growth changes in myopic children in Singapore. Br J Ophthalmol 89, 1489-94 (2005). 115. Read, S.A., Collins, M.J. & Carney, L.G. A review of astigmatism and its possible genesis. Clin Exp Optom 90, 5-19 (2007). 116. Keller, P.R., Collins, M.J., Carney, L.G., Davis, B.A. & van Saarloos, P.P. The relation between corneal and total astigmatism. Optom Vis Sci 73, 86-91 (1996). 117. Wang, Y. et al. Prevalence and causes of amblyopia in a rural adult population of Chinese the Handan Eye Study. Ophthalmology 118, 279-83 (2011). 118. Dobson, V., Harvey, E.M., Clifford-Donaldson, C.E., Green, T.K. & Miller, J.M. Amblyopia in astigmatic infants and toddlers. Optom Vis Sci 87, 330-6 (2010). 119. Harvey, E.M., Dobson, V., Miller, J.M. & Clifford-Donaldson, C.E. Amblyopia in astigmatic children: patterns of deficits. Vision Res 47, 315-26 (2007). 120. Harvey, E.M. Development and treatment of astigmatism-related amblyopia. Optom Vis Sci 86, 634-9 (2009). 121. Kee, C.S., Hung, L.F., Qiao-Grider, Y., Ramamirtham, R. & Smith, E.L., 3rd. Astigmatism in monkeys with experimentally induced myopia or hyperopia. Optom Vis Sci 82, 248-60 (2005). 122. Tong, L. et al. Prevalence rates and epidemiological risk factors for astigmatism in Singapore school children. Optom Vis Sci 79, 606-13 (2002). 196

123. KhabazKhoob, M. et al. Keratometry measurements, corneal astigmatism and irregularity in a normal population: the Tehran Eye Study. Ophthalmic Physiol Opt 30, 800-5 (2010). 124. Heidary, G., Ying, G.S., Maguire, M.G. & Young, T.L. The association of astigmatism and spherical refractive error in a high myopia cohort. Optom Vis Sci 82, 244-7 (2005). 125. Hashemi, H., Hatef, E., Fotouhi, A. & Mohammad, K. Astigmatism and its determinants in the Tehran population: the Tehran eye study. Ophthalmic Epidemiol 12, 373-81 (2005). 126. Lawrence, M.S. & Azar, D.T. Myopia and models and mechanisms of refractive error control. Ophthalmol Clin North Am 15, 127-33 (2002). 127. Sivak, J.G. The role of the lens in refractive development of the eye: animal models of ametropia. Exp Eye Res 87, 3-8 (2008). 128. Wallman, J., Gottlieb, M.D., Rajaram, V. & Fugate-Wentzek, L.A. Local retinal regions control local eye growth and myopia. Science 237, 73-7 (1987). 129. Morgan, I. & Megaw, P. Using natural STOP growth signals to prevent excessive axial elongation and the development of myopia. Ann Acad Med Singapore 33, 16-20 (2004). 130. Stone, R.A., Lin, T., Laties, A.M. & Iuvone, P.M. Retinal dopamine and form- deprivation myopia. Proc Natl Acad Sci U S A 86, 704-6 (1989). 131. Barathi, V.A. & Beuerman, R.W. Molecular mechanisms of muscarinic receptors in mouse scleral fibroblasts: Prior to and after induction of experimental myopia with atropine treatment. Mol Vis 17, 680-92 (2011). 132. Norton, T.T. & Siegwart, J.T., Jr. Animal models of emmetropization: matching axial length to the focal plane. J Am Optom Assoc 66, 405-14 (1995). 133. Siegwart, J.T., Jr. & Norton, T.T. Binocular lens treatment in tree shrews: Effect of age and comparison of plus lens wear with recovery from minus lens-induced myopia. Exp Eye Res 91, 660-9 (2010). 134. McBrien, N.A., Jobling, A.I. & Gentle, A. Biomechanics of the sclera in myopia: extracellular and cellular factors. Optom Vis Sci 86, E23-30 (2009). 135. Charman, W.N. & Radhakrishnan, H. Peripheral refraction and the development of refractive error: a review. Ophthalmic Physiol Opt 30, 321-38 (2010). 136. Smith, E.L., 3rd, Kee, C.S., Ramamirtham, R., Qiao-Grider, Y. & Hung, L.F. Peripheral vision can influence eye growth and refractive development in infant monkeys. Invest Ophthalmol Vis Sci 46, 3965-72 (2005). 137. Huang, J., Hung, L.F. & Smith, E.L., 3rd. Effects of foveal ablation on the pattern of peripheral refractive errors in normal and form-deprived infant rhesus monkeys (Macaca mulatta). Invest Ophthalmol Vis Sci 52, 6428-34 (2011). 138. Schippert, R. & Schaeffel, F. Peripheral defocus does not necessarily affect central refractive development. Vision Res 46, 3935-40 (2006). 139. Wong, T.Y., Foster, P.J., Johnson, G.J. & Seah, S.K. Education, socioeconomic status, and ocular dimensions in Chinese adults: the Tanjong Pagar Survey. Br J Ophthalmol 86, 963-8 (2002). 140. Teasdale, T.W. & Goldschmidt, E. Myopia and its relationship to education, intelligence and height. Preliminary results from an on-going study of Danish draftees. Acta Ophthalmol Suppl 185, 41-3 (1988).

197

141. Gwiazda, J., Deng, L., Dias, L. & Marsh-Tootle, W. Association of education and occupation with myopia in COMET parents. Optom Vis Sci 88, 1045-53 (2011). 142. Gwiazda, J.E. et al. Accommodation and related risk factors associated with myopia progression and their interaction with treatment in COMET children. Invest Ophthalmol Vis Sci 45, 2143-51 (2004). 143. McBrien, N.A., Moghaddam, H.O. & Reeder, A.P. Atropine reduces experimental myopia and eye enlargement via a nonaccommodative mechanism. Invest Ophthalmol Vis Sci 34, 205-15 (1993). 144. Saw, S.M. et al. Nearwork in early-onset myopia. Invest Ophthalmol Vis Sci 43, 332-9 (2002). 145. Ip, J.M. et al. Role of near work in myopia: findings in a sample of Australian school children. Invest Ophthalmol Vis Sci 49, 2903-10 (2008). 146. Jones, L.A. et al. Parental history of myopia, sports and outdoor activities, and future myopia. Invest Ophthalmol Vis Sci 48, 3524-32 (2007). 147. Rose, K.A. et al. Outdoor activity reduces the prevalence of myopia in children. Ophthalmology 115, 1279-85 (2008). 148. Cohen, Y., Belkin, M., Yehezkel, O., Solomon, A.S. & Polat, U. Dependency between light intensity and refractive development under light-dark cycles. Exp Eye Res 92, 40-6 (2011). 149. Ashby, R.S. & Schaeffel, F. The effect of bright light on lens compensation in chicks. Invest Ophthalmol Vis Sci 51, 5247-53 (2010). 150. Flitcroft, D.I. The complex interactions of retinal, optical and environmental factors in myopia aetiology. Prog Retin Eye Res (2012). 151. Mutti, D.O., Sholtz, R.I., Friedman, N.E. & Zadnik, K. Peripheral refraction and ocular shape in children. Invest Ophthalmol Vis Sci 41, 1022-30 (2000). 152. Lundstrom, L., Mira-Agudelo, A. & Artal, P. Peripheral optical errors and their change with accommodation differ between emmetropic and myopic eyes. J Vis 9, 17 1-11 (2009). 153. Mutti, D.O. et al. Refractive error, axial length, and relative peripheral refractive error before and after the onset of myopia. Invest Ophthalmol Vis Sci 48, 2510-9 (2007). 154. Mutti, D.O. et al. Relative peripheral refractive error and the risk of onset and progression of myopia in children. Invest Ophthalmol Vis Sci 52, 199-205 (2011). 155. Morgan, I. & Rose, K. How genetic is school myopia? Prog Retin Eye Res 24, 1-38 (2005). 156. Chen, C.Y. et al. Heritability and shared environment estimates for myopia and associated ocular biometric traits: the Genes in Myopia (GEM) family study. Hum Genet 121, 511-20 (2007). 157. Hammond, C.J., Snieder, H., Gilbert, C.E. & Spector, T.D. Genes and environment in refractive error: the twin eye study. Invest Ophthalmol Vis Sci 42, 1232-6 (2001). 158. Dirani, M. et al. Heritability of refractive error and ocular biometrics: the Genes in Myopia (GEM) twin study. Invest Ophthalmol Vis Sci 47, 4756-61 (2006). 159. Klein, A.P. et al. Heritability analysis of spherical equivalent, axial length, corneal curvature, and anterior chamber depth in the Beaver Dam Eye Study. Arch Ophthalmol 127, 649-55 (2009).

198

160. Lyhne, N., Sjolie, A.K., Kyvik, K.O. & Green, A. The importance of genes and environment for ocular refraction and its determiners: a population based study among 20-45 year old twins. Br J Ophthalmol 85, 1470-6 (2001). 161. He, M. et al. Heritability of anterior chamber depth as an intermediate phenotype of angle-closure in Chinese: the Guangzhou Twin Eye Study. Invest Ophthalmol Vis Sci 49, 81-6 (2008). 162. Lopes, M.C., Andrew, T., Carbonaro, F., Spector, T.D. & Hammond, C.J. Estimating heritability and shared environmental effects for refractive error in twin and family studies. Invest Ophthalmol Vis Sci 50, 126-31 (2009). 163. Vitart, V. et al. Heritabilities of ocular biometrical traits in two croatian isolates with extended pedigrees. Invest Ophthalmol Vis Sci 51, 737-43 (2010). 164. Peet, J.A., Cotch, M.F., Wojciechowski, R., Bailey-Wilson, J.E. & Stambolian, D. Heritability and familial aggregation of refractive error in the Old Order Amish. Invest Ophthalmol Vis Sci 48, 4002-6 (2007). 165. Farbrother, J.E., Kirov, G., Owen, M.J. & Guggenheim, J.A. Family aggregation of high myopia: estimation of the sibling recurrence risk ratio. Invest Ophthalmol Vis Sci 45, 2873-8 (2004). 166. Fotouhi, A. et al. Familial aggregation of myopia in the Tehran eye study: estimation of the sibling and parent offspring recurrence risk ratios. Br J Ophthalmol 91, 1440-4 (2007). 167. Wojciechowski, R. et al. Heritability of refractive error and familial aggregation of myopia in an elderly American population. Invest Ophthalmol Vis Sci 46, 1588-92 (2005). 168. Familial aggregation and prevalence of myopia in the Framingham Offspring Eye Study. The Framingham Offspring Eye Study Group. Arch Ophthalmol 114, 326- 32 (1996). 169. Zadnik, K., Satariano, W.A., Mutti, D.O., Sholtz, R.I. & Adams, A.J. The effect of parental history of myopia on children's eye size. JAMA 271, 1323-7 (1994). 170. Lam, D.S. et al. The effect of parental history of myopia on children's eye size and growth: results of a longitudinal study. Invest Ophthalmol Vis Sci 49, 873-6 (2008). 171. Lee, K.E., Klein, B.E., Klein, R. & Fine, J.P. Aggregation of refractive error and 5- year changes in refractive error among families in the Beaver Dam Eye Study. Arch Ophthalmol 119, 1679-85 (2001). 172. Ashton, G.C. Segregation analysis of ocular refraction and myopia. Hum Hered 35, 232-9 (1985). 173. Parssinen, O. et al. Heritability of spherical equivalent: a population-based twin study among 63- to 76-year-old female twins. Ophthalmology 117, 1908-11 (2010). 174. Sanfilippo, P.G., Hewitt, A.W., Hammond, C.J. & Mackey, D.A. The heritability of ocular traits. Surv Ophthalmol 55, 561-83 (2010). 175. Paluru, P.C., Nallasamy, S., Devoto, M., Rappaport, E.F. & Young, T.L. Identification of a novel locus on 2q for autosomal dominant high-grade myopia. Invest Ophthalmol Vis Sci 46, 2300-7 (2005). 176. Zhang, Q. et al. A new locus for autosomal dominant high myopia maps to 4q22- q27 between D4S1578 and D4S1612. Mol Vis 11, 554-60 (2005).

199

177. Lam, C.Y. et al. A genome-wide scan maps a novel high myopia locus to 5p15. Invest Ophthalmol Vis Sci 49, 3768-78 (2008). 178. Ma, J.H. et al. Identification of a locus for autosomal dominant high myopia on chromosome 5p13.3-p15.1 in a Chinese family. Mol Vis 16, 2043-54 (2010). 179. Paget, S. et al. Linkage analysis of high myopia susceptibility locus in 26 families. Mol Vis 14, 2566-74 (2008). 180. Naiglin, L. et al. A genome wide scan for familial high myopia suggests a novel locus on chromosome 7q36. J Med Genet 39, 118-24 (2002). 181. Nallasamy, S. et al. Genetic linkage study of high-grade myopia in a Hutterite population from South Dakota. Mol Vis 13, 229-36 (2007). 182. Young, T.L. et al. A second locus for familial high myopia maps to chromosome 12q. Am J Hum Genet 63, 1419-24 (1998). 183. Yang, Z., Xiao, X., Li, S. & Zhang, Q. Clinical and linkage study on a consanguineous Chinese family with autosomal recessive high myopia. Mol Vis 15, 312-8 (2009). 184. Paluru, P. et al. New locus for autosomal dominant high myopia maps to the long arm of chromosome 17. Invest Ophthalmol Vis Sci 44, 1830-6 (2003). 185. Young, T.L. et al. Evidence that a locus for familial high myopia maps to chromosome 18p. Am J Hum Genet 63, 109-19 (1998). 186. Zhang, Q. et al. Novel locus for X linked recessive high myopia maps to Xq23-q25 but outside MYP1. J Med Genet 43, e20 (2006). 187. Schwartz, M., Haim, M. & Skarsholm, D. X-linked myopia: Bornholm eye disease. Linkage to DNA markers on the distal part of Xq. Clin Genet 38, 281-6 (1990). 188. Klein, A.P. et al. Linkage analysis of quantitative refraction and refractive errors in the beaver dam eye study. Invest Ophthalmol Vis Sci 52, 5220-5 (2011). 189. Farbrother, J.E. et al. Linkage analysis of the genetic loci for high myopia on 18p, 12q, and 17q in 51 U.K. families. Invest Ophthalmol Vis Sci 45, 2879-85 (2004). 190. Lam, D.S. et al. TGFbeta-induced factor: a candidate gene for high myopia. Invest Ophthalmol Vis Sci 44, 1012-5 (2003). 191. Scavello, G.S., Jr. et al. Genomic structure and organization of the high grade Myopia-2 locus (MYP2) critical region: mutation screening of 9 positional candidate genes. Mol Vis 11, 97-110 (2005). 192. Zhang, Q., Li, S., Xiao, X., Jia, X. & Guo, X. Confirmation of a genetic locus for X- linked recessive high myopia outside MYP1. J Hum Genet 52, 469-72 (2007). 193. Mutti, D.O. et al. Genetic loci for pathological myopia are not associated with juvenile myopia. Am J Med Genet 112, 355-60 (2002). 194. Ibay, G. et al. Candidate high myopia loci on chromosomes 18p and 12q do not play a major role in susceptibility to common myopia. BMC Med Genet 5, 20 (2004). 195. Stambolian, D. et al. Genomewide linkage scan for myopia susceptibility loci among Ashkenazi Jewish families shows evidence of linkage on chromosome 22q12. Am J Hum Genet 75, 448-59 (2004). 196. Stambolian, D. et al. Genome-wide scan of additional Jewish families confirms linkage of a myopia susceptibility locus to chromosome 22q12. Mol Vis 12, 1499-505 (2006).

200

197. Klein, A.P. et al. Confirmation of linkage to ocular refraction on chromosome 22q and identification of a novel linkage region on 1q. Arch Ophthalmol 125, 80- 5 (2007). 198. Wojciechowski, R. et al. Genomewide scan in Ashkenazi Jewish families demonstrates evidence of linkage of ocular refraction to a QTL on chromosome 1p36. Hum Genet 119, 389-99 (2006). 199. Stambolian, D. et al. Genome-wide scan for myopia in the Old Order Amish. Am J Ophthalmol 140, 469-76 (2005). 200. Wojciechowski, R. et al. Genomewide linkage scans for ocular refraction and meta-analysis of four populations in the Myopia Family Study. Invest Ophthalmol Vis Sci 50, 2024-32 (2009). 201. Abbott, D. et al. An international collaborative family-based whole genome quantitative trait linkage scan for myopic refractive error. Mol Vis 18, 720-9 (2012). 202. Biino, G. et al. Ocular refraction: heritability and genome-wide search for eye morphometry traits in an isolated Sardinian population. Hum Genet 116, 152-9 (2005). 203. Zhu, G. et al. Genetic dissection of myopia: evidence for linkage of ocular axial length to chromosome 5q. Ophthalmology 115, 1053-1057 e2 (2008). 204. Andrew, T. et al. Identification and replication of three novel myopia common susceptibility gene loci on chromosome 3q26 using linkage and linkage disequilibrium mapping. PLoS Genet 4, e1000220 (2008). 205. Mordechai, S. et al. High myopia caused by a mutation in LEPREL1, encoding prolyl 3-hydroxylase 2. Am J Hum Genet 89, 438-45 (2011). 206. Simpson, C.L., Wojciechowski, R., Ibay, G., Stambolian, D. & Bailey-Wilson, J.E. Dissecting the genetic heterogeneity of myopia susceptibility in an Ashkenazi Jewish population using ordered subset analysis. Mol Vis 17, 1641-51 (2011). 207. Lam, D.S. et al. Familial high myopia linkage to chromosome 18p. Ophthalmologica 217, 115-8 (2003). 208. Ratnamala, U. et al. Refinement of the X-linked nonsyndromic high-grade myopia locus MYP1 on Xq28 and exclusion of 13 known positional candidate genes by direct sequencing. Invest Ophthalmol Vis Sci 52, 6814-9 (2011). 209. Inamori, Y. et al. The COL1A1 gene and high myopia susceptibility in Japanese. Hum Genet 122, 151-7 (2007). 210. Mutti, D.O. et al. Candidate gene and locus analysis of myopia. Mol Vis 13, 1012- 9 (2007). 211. Metlapally, R. et al. COL1A1 and COL2A1 genes and myopia susceptibility: evidence of association and suggestive linkage to the COL2A1 locus. Invest Ophthalmol Vis Sci 50, 4080-6 (2009). 212. Zhang, D. et al. An association study of the COL1A1 gene and high myopia in a Han Chinese population. Mol Vis 17, 3379-83 (2011). 213. Wang, J. et al. High myopia is not associated with single nucleotide polymorphisms in the COL2A1 gene in the Chinese population. Mol Med Report 5, 133-7 (2012). 214. Rada, J.A., Shelton, S. & Norton, T.T. The sclera and myopia. Exp Eye Res 82, 185- 200 (2006).

201

215. Wojciechowski, R., Bailey-Wilson, J.E. & Stambolian, D. Association of matrix metalloproteinase gene polymorphisms with refractive error in Amish and Ashkenazi families. Invest Ophthalmol Vis Sci 51, 4989-95 (2010). 216. Hall, N.F., Gale, C.R., Ye, S. & Martyn, C.N. Myopia and polymorphisms in genes for matrix metalloproteinases. Invest Ophthalmol Vis Sci 50, 2632-6 (2009). 217. Nakanishi, H. et al. Single-nucleotide polymorphisms in the promoter region of matrix metalloproteinase-1, -2, and -3 in Japanese with high myopia. Invest Ophthalmol Vis Sci 51, 4432-6 (2010). 218. Liang, C.L. et al. Evaluation of MMP3 and TIMP1 as candidate genes for high myopia in young Taiwanese men. Am J Ophthalmol 142, 518-20 (2006). 219. Han, W., Yap, M.K., Wang, J. & Yip, S.P. Family-based association analysis of hepatocyte growth factor (HGF) gene polymorphisms in high myopia. Invest Ophthalmol Vis Sci 47, 2291-9 (2006). 220. Yanovitch, T. et al. Hepatocyte growth factor and myopia: genetic association analyses in a Caucasian population. Mol Vis 15, 1028-35 (2009). 221. Veerappan, S. et al. Role of the hepatocyte growth factor gene in refractive error. Ophthalmology 117, 239-45 e1-2 (2010). 222. Jobling, A.I., Wan, R., Gentle, A., Bui, B.V. & McBrien, N.A. Retinal and choroidal TGF-beta in the tree shrew model of myopia: isoform expression, activation and effects on function. Exp Eye Res 88, 458-66 (2009). 223. Lin, H.J. et al. The TGFbeta1 gene codon 10 polymorphism contributes to the genetic predisposition to high myopia. Mol Vis 12, 698-703 (2006). 224. Zha, Y. et al. TGFB1 as a susceptibility gene for high myopia: a replication study with new findings. Arch Ophthalmol 127, 541-8 (2009). 225. Khor, C.C. et al. Support for TGFB1 as a susceptibility gene for high myopia in individuals of Chinese descent. Arch Ophthalmol 128, 1081-4 (2010). 226. Lin, H.J. et al. Sclera-related gene polymorphisms in high myopia. Mol Vis 15, 1655-63 (2009). 227. Cordain, L., Eaton, S.B., Brand Miller, J., Lindeberg, S. & Jensen, C. An evolutionary analysis of the aetiology and pathogenesis of juvenile-onset myopia. Acta Ophthalmol Scand 80, 125-35 (2002). 228. Metlapally, R. et al. Genetic association of insulin-like growth factor-1 polymorphisms with high-grade myopia in an international family cohort. Invest Ophthalmol Vis Sci 51, 4476-9 (2010). 229. Mak, J.Y., Yap, M.K., Fung, W.Y., Ng, P.W. & Yip, S.P. Association of IGF1 gene haplotypes with high myopia in Chinese adults. Arch Ophthalmol 130, 209-16 (2012). 230. Zhuang, W. et al. Association of insulin-like growth factor-1 polymorphisms with high myopia in the Chinese population. Mol Vis 18, 634-44 (2012). 231. Rada, J.A., Achen, V.R., Penugonda, S., Schmidt, R.W. & Mount, B.A. Proteoglycan composition in the human sclera during growth and aging. Invest Ophthalmol Vis Sci 41, 1639-48 (2000). 232. Jiang, B. et al. PAX6 haplotypes are associated with high myopia in Han chinese. PLoS One 6, e19587 (2011). 233. Tsai, Y.Y. et al. A PAX6 gene polymorphism is associated with genetic predisposition to extreme myopia. Eye (Lond) 22, 576-81 (2008).

202

234. Han, W. et al. Association of PAX6 polymorphisms with high myopia in Han Chinese nuclear families. Invest Ophthalmol Vis Sci 50, 47-56 (2009). 235. Leung, Y.F., Tam, P.O., Baum, L., Lam, D.S. & Pang, C.C. TIGR/MYOC proximal promoter GT-repeat polymorphism is not associated with myopia. Hum Mutat 16, 533 (2000). 236. Tang, W.C. et al. Linkage and association of myocilin (MYOC) polymorphisms with high myopia in a Chinese population. Mol Vis 13, 534-44 (2007). 237. Zayats, T. et al. Myocilin polymorphisms and high myopia in subjects of European origin. Mol Vis 15, 213-22 (2009). 238. Simpson, C.L. et al. The Roles of PAX6 and SOX2 in Myopia: lessons from the 1958 British Birth Cohort. Invest Ophthalmol Vis Sci 48, 4421-5 (2007). 239. An, J. et al. The FGF2 gene in a myopia animal model and human subjects. Mol Vis 18, 471-8 (2012). 240. Lu, B. et al. Replication study supports CTNND2 as a susceptibility gene for high myopia. Invest Ophthalmol Vis Sci 52, 8258-61 (2011). 241. Khor, C.C. et al. cMET and refractive error progression in children. Ophthalmology 116, 1469-74, 1474 e1 (2009). 242. Schache, M., Chen, C.Y., Dirani, M. & Baird, P.N. The hepatocyte growth factor receptor (MET) gene is not associated with refractive error and ocular biometrics in a Caucasian population. Mol Vis 15, 2599-605 (2009). 243. Wang, P. et al. High myopia is not associated with the SNPs in the TGIF, lumican, TGFB1, and HGF genes. Invest Ophthalmol Vis Sci 50, 1546-51 (2009). 244. Chen, J.H. et al. Endophenotyping reveals differential phenotype-genotype correlations between myopia-associated polymorphisms and eye biometric parameters. Mol Vis 18, 765-78 (2012). 245. Lin, H.J. et al. Muscarinic acetylcholine receptor 1 gene polymorphisms associated with high myopia. Mol Vis 15, 1774-80 (2009). 246. Wang, I.J. et al. The association of single nucleotide polymorphisms in the 5'- regulatory region of the lumican gene with susceptibility to high myopia in Taiwan. Mol Vis 12, 852-7 (2006). 247. Chen, Z.T., Wang, I.J., Shih, Y.F. & Lin, L.L. The association of haplotype at the lumican gene with high myopia susceptibility in Taiwanese patients. Ophthalmology 116, 1920-7 (2009). 248. Majava, M. et al. Novel mutations in the small leucine-rich repeat protein/proteoglycan (SLRP) genes in high myopia. Hum Mutat 28, 336-44 (2007). 249. Lin, H.J. et al. Association of the lumican gene functional 3'-UTR polymorphism with high myopia. Invest Ophthalmol Vis Sci 51, 96-102 (2010). 250. Rydzanicz, M. et al. IGF-1 gene polymorphisms in Polish families with high-grade myopia. Mol Vis 17, 2428-39 (2011). 251. Hayashi, H. et al. Association of 15q14 and 15q25 with High Myopia in Japanese. Invest Ophthalmol Vis Sci (2011). 252. Gao, Y. et al. Common variants in chromosome 4q25 are associated with myopia in Chinese adults. Ophthalmic Physiol Opt 32, 68-73 (2012). 253. Liang, C.L. et al. Systematic assessment of the tagging polymorphisms of the COL1A1 gene for high myopia. J Hum Genet 52, 374-7 (2007).

203

254. Nakanishi, H. et al. Absence of association between COL1A1 polymorphisms and high myopia in the Japanese population. Invest Ophthalmol Vis Sci 50, 544-50 (2009). 255. Veerappan, S. et al. The retinoic acid receptor alpha (RARA) gene is not associated with myopia, hypermetropia, and ocular biometric measures. Mol Vis 15, 1390-7 (2009). 256. Scavello, G.S., Paluru, P.C., Ganter, W.R. & Young, T.L. Sequence variants in the transforming growth beta-induced factor (TGIF) gene are not associated with high myopia. Invest Ophthalmol Vis Sci 45, 2091-7 (2004). 257. Hasumi, Y. et al. Analysis of single nucleotide polymorphisms at 13 loci within the transforming growth factor-induced factor gene shows no association with high myopia in Japanese subjects. Immunogenetics 58, 947-53 (2006). 258. Pertile, K.K. et al. Assessment of TGIF as a candidate gene for myopia. Invest Ophthalmol Vis Sci 49, 49-54 (2008). 259. Hayashi, T., Inoko, H., Nishizaki, R., Ohno, S. & Mizuki, N. Exclusion of transforming growth factor-beta1 as a candidate gene for myopia in the Japanese. Jpn J Ophthalmol 51, 96-9 (2007). 260. Liu, H.P. et al. A novel genetic variant of BMP2K contributes to high myopia. J Clin Lab Anal 23, 362-7 (2009). 261. Nishizaki, R. et al. New susceptibility locus for high myopia is linked to the uromodulin-like 1 (UMODL1) gene region on chromosome 21q22.3. Eye (Lond) 23, 222-9 (2009). 262. Neale, B.M. et al. Genome-wide association study of advanced age-related macular degeneration identifies a role of the hepatic lipase gene (LIPC). Proc Natl Acad Sci U S A 107, 7395-400 (2010). 263. Verhoeven, V.J. et al. Large scale international replication and meta-analysis study confirms association of the 15q14 locus with myopia. The CREAM consortium. Hum Genet (2012). 264. Wang, Q. et al. Replication study of significant single nucleotide polymorphisms associated with myopia from two genome-wide association studies. Mol Vis 17, 3290-9 (2011). 265. Shi, Y. et al. Genetic variants at 13q12.12 are associated with high myopia in the han chinese population. Am J Hum Genet 88, 805-13 (2011). 266. Li, Z. et al. A genome-wide association study reveals association between common variants in an intergenic region of 4q25 and high-grade myopia in the Chinese Han population. Hum Mol Genet 20, 2861-8 (2011). 267. Shi Y, e.a. Exome Sequencing Identifies ZNF644 Mutations in High Myopia. PLoS Genet 7(6), e1002084 (2011). 268. Tran-Viet KN, S.G.E., Soler V, Powell C, Lim SH, Klemm T, Saw SM, Young TL. Study of a US cohort supports the role of ZNF644 and high-grade myopia susceptibility. Mol Vis. 18, 937-44 (2012). 269. Walline, J.J. et al. Interventions to slow progression of myopia in children. Cochrane Database Syst Rev, CD004916 (2011). 270. Ostrow, G. & Kirkeby, L. Update on myopia and myopic progression in children. Int Ophthalmol Clin 50, 87-93 (2010).

204

271. Gwiazda, J. et al. A randomized clinical trial of progressive addition lenses versus single vision lenses on the progression of myopia in children. Invest Ophthalmol Vis Sci 44, 1492-500 (2003). 272. Parssinen, O., Hemminki, E. & Klemetti, A. Effect of spectacle use and accommodation on myopic progression: final results of a three-year randomised clinical trial among schoolchildren. Br J Ophthalmol 73, 547-51 (1989). 273. Chung, K., Mohidin, N. & O'Leary, D.J. Undercorrection of myopia enhances rather than inhibits myopia progression. Vision Res 42, 2555-9 (2002). 274. Shen, J., Clark, C.A., Soni, P.S. & Thibos, L.N. Peripheral refraction with and without contact lens correction. Optom Vis Sci 87, 642-55 (2010). 275. Queiros, A., Gonzalez-Meijome, J.M., Jorge, J., Villa-Collar, C. & Gutierrez, A.R. Peripheral refraction in myopic patients after orthokeratology. Optom Vis Sci 87, 323-9 (2010). 276. Kakita, T., Hiraoka, T. & Oshika, T. Influence of overnight orthokeratology on axial elongation in childhood myopia. Invest Ophthalmol Vis Sci 52, 2170-4 (2011). 277. Cho, P., Cheung, S.W. & Edwards, M. The longitudinal orthokeratology research in children (LORIC) in Hong Kong: a pilot study on refractive changes and myopic control. Curr Eye Res 30, 71-80 (2005). 278. Walline, J.J., Jones, L.A. & Sinnott, L.T. Corneal reshaping and myopia progression. Br J Ophthalmol 93, 1181-5 (2009). 279. Katz, J. et al. A randomized trial of rigid gas permeable contact lenses to reduce progression of children's myopia. Am J Ophthalmol 136, 82-90 (2003). 280. Walline, J.J. et al. The current state of corneal reshaping. Eye Contact Lens 31, 209-14 (2005). 281. Tong, L. et al. Atropine for the treatment of childhood myopia: effect on myopia progression after cessation of atropine. Ophthalmology 116, 572-9 (2009). 282. Siatkowski, R.M. et al. Two-year multicenter, randomized, double-masked, placebo-controlled, parallel safety and efficacy study of 2% pirenzepine ophthalmic gel in children with myopia. J AAPOS 12, 332-9 (2008). 283. Pan, C.W., Ramamurthy, D. & Saw, S.M. Worldwide prevalence and risk factors for myopia. Ophthalmic Physiol Opt 32, 3-16 (2012). 284. Wong, T.Y. et al. Prevalence and risk factors for refractive errors in adult Chinese in Singapore. Invest Ophthalmol Vis Sci 41, 2486-94 (2000). 285. McBrien, N.A. & Gentle, A. Role of the sclera in the development and pathological complications of myopia. Prog Retin Eye Res 22, 307-38 (2003). 286. Saw, S.M., Katz, J., Schein, O.D., Chew, S.J. & Chan, T.K. Epidemiology of myopia. Epidemiol Rev 18, 175-87 (1996). 287. Dirani, M., Shekar, S.N. & Baird, P.N. Evidence of shared genes in refraction and axial length: the Genes in Myopia (GEM) twin study. Invest Ophthalmol Vis Sci 49, 4336-9 (2008). 288. Lavanya, R. et al. Methodology of the Singapore Indian Chinese Cohort (SICC) eye study: quantifying ethnic variations in the epidemiology of eye diseases in Asians. Ophthalmic Epidemiol 16, 325-36 (2009). 289. Saw, S.M. et al. A cohort study of incident myopia in Singaporean children. Invest Ophthalmol Vis Sci 47, 1839-44 (2006).

205

290. Fan, Q. et al. Genome-wide meta-analysis of five Asian cohorts identifies PDGFRA as a susceptibility locus for corneal astigmatism. PLoS Genet 7, e1002402 (2011). 291. Foong, A.W. et al. Rationale and methodology for a population-based study of eye diseases in Malay people: The Singapore Malay eye study (SiMES). Ophthalmic Epidemiol 14, 25-35 (2007). 292. Grosvenor, T.P. Primary care optometry, xiii, 510 p. (Butterworth- Heinemann/Elsevier, St. Louis, Mo., 2007). 293. Schork, N.J., Nath, S.K., Fallin, D. & Chakravarti, A. Linkage disequilibrium analysis of biallelic DNA markers, human quantitative trait loci, and threshold- defined case and control subjects. Am J Hum Genet 67, 1208-18 (2000). 294. Saw, S.M. et al. Prevalence and risk factors for refractive errors in the Singapore Malay Eye Survey. Ophthalmology 115, 1713-9 (2008). 295. Fan, D.S. et al. Prevalence, incidence, and progression of myopia of school children in Hong Kong. Invest Ophthalmol Vis Sci 45, 1071-5 (2004). 296. Sim, X. et al. Transferability of type 2 diabetes implicated loci in multi-ethnic cohorts from Southeast Asia. PLoS Genet 7, e1001363 (2011). 297. Purcell, S. et al. PLINK: a tool set for whole-genome association and population- based linkage analyses. Am J Hum Genet 81, 559-75 (2007). 298. Fan, Q., Teo, Y.Y. & Saw, S.M. Application of advanced statistics in ophthalmology. Invest Ophthalmol Vis Sci 52, 6059-65 (2011). 299. Brink, N., Szamel, M., Young, A.R., Wittern, K.P. & Bergemann, J. Comparative quantification of IL-1beta, IL-10, IL-10r, TNFalpha and IL-7 mRNA levels in UV- irradiated human skin in vivo. Inflamm Res 49, 290-6 (2000). 300. Rozen, S. & Skaletsky, H. Primer3 on the WWW for general users and for biologist programmers. Methods Mol Biol 132, 365-86 (2000). 301. Schaeffel, F., Burkhardt, E., Howland, H.C. & Williams, R.W. Measurement of refractive state and deprivation myopia in two strains of mice. Optom Vis Sci 81, 99-110 (2004). 302. Barathi, V.A., Boopathi, V.G., Yap, E.P. & Beuerman, R.W. Two models of experimental myopia in the mouse. Vision Res 48, 904-16 (2008). 303. Leung, K.W. et al. Expression of ZnT and ZIP zinc transporters in the human RPE and their regulation by neurotrophic factors. Invest Ophthalmol Vis Sci 49, 1221- 31 (2008). 304. Liang, J., Song, W., Tromp, G., Kolattukudy, P.E. & Fu, M. Genome-wide survey and expression profiling of CCCH-zinc finger family reveals a functional module in macrophage activation. PLoS One 3, e2880 (2008). 305. Poliseno, L. et al. A coding-independent function of gene and pseudogene mRNAs regulates tumour biology. Nature 465, 1033-8 (2010). 306. Salmena, L., Poliseno, L., Tay, Y., Kats, L. & Pandolfi, P.P. A ceRNA Hypothesis: The Rosetta Stone of a Hidden RNA Language? Cell 146, 353-8 (2011). 307. D'Errico, I., Gadaleta, G. & Saccone, C. Pseudogenes in metazoa: origin and features. Brief Funct Genomic Proteomic 3, 157-67 (2004). 308. Shi, Y. et al. Exome sequencing identifies ZNF644 mutations in high myopia. PLoS Genet 7, e1002084 (2011).

206

309. Schippert, R., Burkhardt, E., Feldkaemper, M. & Schaeffel, F. Relative axial myopia in Egr-1 (ZENK) knockout mice. Invest Ophthalmol Vis Sci 48, 11-7 (2007). 310. Laity, J.H., Lee, B.M. & Wright, P.E. Zinc finger proteins: new insights into structural and functional diversity. Curr Opin Struct Biol 11, 39-46 (2001). 311. Fischer, A.J., McGuire, J.J., Schaeffel, F. & Stell, W.K. Light- and focus-dependent expression of the transcription factor ZENK in the chick retina. Nat Neurosci 2, 706-12 (1999). 312. Bitzer, M. & Schaeffel, F. Defocus-induced changes in ZENK expression in the chicken retina. Invest Ophthalmol Vis Sci 43, 246-52 (2002). 313. Simon, P., Feldkaemper, M., Bitzer, M., Ohngemach, S. & Schaeffel, F. Early transcriptional changes of retinal and choroidal TGFbeta-2, RALDH-2, and ZENK following imposed positive and negative defocus in . Mol Vis 10, 588-97 (2004). 314. Liu, C., Adamson, E. & Mercola, D. Transcription factor EGR-1 suppresses the growth and transformation of human HT-1080 fibrosarcoma cells by induction of transforming growth factor beta 1. Proc Natl Acad Sci U S A 93, 11831-6 (1996). 315. Baron, V., Adamson, E.D., Calogero, A., Ragona, G. & Mercola, D. The transcription factor Egr1 is a direct regulator of multiple tumor suppressors including TGFbeta1, PTEN, p53, and fibronectin. Cancer Gene Ther 13, 115-24 (2006). 316. Seve, M., Chimienti, F., Devergnas, S. & Favier, A. In silico identification and expression of SLC30 family genes: an expressed sequence tag data mining strategy for the characterization of zinc transporters' tissue expression. BMC Genomics 5, 32 (2004). 317. van Leeuwen, R. et al. Dietary intake of antioxidants and risk of age-related macular degeneration. JAMA 294, 3101-7 (2005). 318. Ugarte, M. & Osborne, N.N. Zinc in the retina. Prog Neurobiol 64, 219-49 (2001). 319. Huibi, X., Kaixun, H., Qiuhua, G., Yushan, Z. & Xiuxian, H. Prevention of axial elongation in myopia by the trace element zinc. Biol Trace Elem Res 79, 39-47 (2001). 320. Steinberg, G.R., Kemp, B.E. & Watt, M.J. Adipocyte triglyceride lipase expression in human obesity. Am J Physiol Endocrinol Metab 293, E958-64 (2007). 321. Heid, I.M. et al. Meta-analysis identifies 13 new loci associated with waist-hip ratio and reveals sexual dimorphism in the genetic basis of fat distribution. Nat Genet 42, 949-60 (2010). 322. Lindgren, C.M. et al. Genome-wide association scan meta-analysis identifies three Loci influencing adiposity and fat distribution. PLoS Genet 5, e1000508 (2009). 323. Cordain, L., Eades, M.R. & Eades, M.D. Hyperinsulinemic diseases of civilization: more than just Syndrome X. Comp Biochem Physiol A Mol Integr Physiol 136, 95- 112 (2003). 324. Lim, L.S. et al. Dietary factors, myopia, and axial dimensions in children. Ophthalmology 117, 993-997 e4 (2010).

207

325. Gao, H., Frost, M.R., Siegwart, J.T., Jr. & Norton, T.T. Patterns of mRNA and protein expression during minus-lens compensation and recovery in tree shrew sclera. Mol Vis 17, 903-19 (2011). 326. Wong, T.Y., Foster, P.J., Johnson, G.J. & Seah, S.K. Refractive errors, axial ocular dimensions, and age-related cataracts: the Tanjong Pagar survey. Invest Ophthalmol Vis Sci 44, 1479-85 (2003). 327. Wu, R. et al. Smoking, socioeconomic factors, and age-related cataract: The Singapore Malay Eye study. Arch Ophthalmol 128, 1029-35 (2010). 328. Tokoro, T. On the definition of pathologic myopia in group studies. Acta Ophthalmol Suppl 185, 107-8 (1988). 329. Dirani, M., Shekar, S.N. & Baird, P.N. Adult-onset myopia: the Genes in Myopia (GEM) twin study. Invest Ophthalmol Vis Sci 49, 3324-7 (2008). 330. Jensen, H. Myopia in teenagers. An eight-year follow-up study on myopia progression and risk factors. Acta Ophthalmol Scand 73, 389-93 (1995). 331. Ip, J.M. et al. Variation of the contribution from axial length and other oculometric parameters to refraction by age and ethnicity. Invest Ophthalmol Vis Sci 48, 4846-53 (2007). 332. McCarthy, M.I. & Hirschhorn, J.N. Genome-wide association studies: potential next steps on a genetic journey. Hum Mol Genet 17, R156-65 (2008). 333. Teo, Y.Y. et al. Power consequences of linkage disequilibrium variation between populations. Genet Epidemiol 33, 128-35 (2009). 334. Raju, P. et al. Prevalence of refractive errors in a rural South Indian population. Invest Ophthalmol Vis Sci 45, 4268-72 (2004). 335. Attebo, K., Ivers, R.Q. & Mitchell, P. Refractive errors in an older population: the Blue Mountains Eye Study. Ophthalmology 106, 1066-72 (1999). 336. Vitale, S., Ellwein, L., Cotch, M.F., Ferris, F.L., 3rd & Sperduto, R. Prevalence of refractive error in the United States, 1999-2004. Arch Ophthalmol 126, 1111-9 (2008). 337. Huynh, S.C., Kifley, A., Rose, K.A., Morgan, I.G. & Mitchell, P. Astigmatism in 12- year-old Australian children: comparisons with a 6-year-old population. Invest Ophthalmol Vis Sci 48, 73-82 (2007). 338. O'Donoghue, L. et al. Refractive and corneal astigmatism in white school children in Northern Ireland. Invest Ophthalmol Vis Sci (2011). 339. He, M. et al. Refractive error and visual impairment in urban children in southern china. Invest Ophthalmol Vis Sci 45, 793-9 (2004). 340. Clementi, M., Angi, M., Forabosco, P., Di Gianantonio, E. & Tenconi, R. Inheritance of astigmatism: evidence for a major autosomal dominant locus. Am J Hum Genet 63, 825-30 (1998). 341. Dirani, M., Islam, A., Shekar, S.N. & Baird, P.N. Dominant genetic effects on corneal astigmatism: the genes in myopia (GEM) twin study. Invest Ophthalmol Vis Sci 49, 1339-44 (2008). 342. Grjibovski, A.M., Magnus, P., Midelfart, A. & Harris, J.R. Epidemiology and heritability of astigmatism in Norwegian twins: an analysis of self-reported data. Ophthalmic Epidemiol 13, 245-52 (2006). 343. Yeh, L.K. et al. The genetic effect on refractive error and anterior corneal aberration: twin eye study. J Refract Surg 23, 257-65 (2007).

208

344. Cagigrigoriu, A., Gregori, D., Cortassa, F., Catena, F. & Marra, A. Heritability of corneal curvature and astigmatism: a videokeratographic child-parent comparison study. Cornea 26, 907-12 (2007). 345. Parssinen, O., Kauppinen, M., Kaprio, J., Koskenvuo, M. & Rantanen, T. Heritability of corneal refraction and corneal astigmatism: a population-based twin study among 66- to 79-year-old female twins. Acta Ophthalmol (2012). 346. Hughes, K. et al. Cardiovascular diseases in Chinese, Malays, and Indians in Singapore. II. Differences in risk factor levels. J Epidemiol Community Health 44, 29-35 (1990). 347. Tan, C.E., Emmanuel, S.C., Tan, B.Y. & Jacob, E. Prevalence of diabetes and ethnic differences in cardiovascular risk factors. The 1992 Singapore National Health Survey. Diabetes Care 22, 241-7 (1999). 348. Hughes, K., Aw, T.C., Kuperan, P. & Choo, M. Central obesity, insulin resistance, syndrome X, lipoprotein(a), and cardiovascular risk in Indians, Malays, and Chinese in Singapore. J Epidemiol Community Health 51, 394-9 (1997). 349. Cutter, J., Tan, B.Y. & Chew, S.K. Levels of cardiovascular disease risk factors in Singapore following a national intervention programme. Bull World Health Organ 79, 908-15 (2001). 350. Dirani, M. et al. Prevalence of refractive error in Singaporean Chinese children: the strabismus, amblyopia, and refractive error in young Singaporean Children (STARS) study. Invest Ophthalmol Vis Sci 51, 1348-55 (2010). 351. Zadnik, K., Mutti, D.O. & Adams, A.J. The repeatability of measurement of the ocular components. Invest Ophthalmol Vis Sci 33, 2325-33 (1992). 352. Kazeem, G.R. & Farrall, M. Integrating case-control and TDT studies. Ann Hum Genet 69, 329-35 (2005). 353. Peng, B., Yu, R.K., Dehoff, K.L. & Amos, C.I. Normalizing a large number of quantitative traits using empirical normal quantile transformation. BMC Proc 1 Suppl 1, S156 (2007). 354. Kawagishi, J., Kumabe, T., Yoshimoto, T. & Yamamoto, T. Structure, organization, and transcription units of the human alpha-platelet-derived growth factor receptor gene, PDGFRA. Genomics 30, 224-32 (1995). 355. Heinrich, M.C. et al. PDGFRA activating mutations in gastrointestinal stromal tumors. Science 299, 708-10 (2003). 356. Hoppenreijs, V.P., Pels, E., Vrensen, G.F., Felten, P.C. & Treffers, W.F. Platelet- derived growth factor: receptor expression in corneas and effects on corneal cells. Invest Ophthalmol Vis Sci 34, 637-49 (1993). 357. Kim, W.J., Mohan, R.R. & Wilson, S.E. Effect of PDGF, IL-1alpha, and BMP2/4 on corneal fibroblast chemotaxis: expression of the platelet-derived growth factor system in the cornea. Invest Ophthalmol Vis Sci 40, 1364-72 (1999). 358. Robbins, S.G. et al. Platelet-derived growth factor ligands and receptors immunolocalized in proliferative retinal diseases. Invest Ophthalmol Vis Sci 35, 3649-63 (1994). 359. Fruttiger, M. et al. PDGF mediates a neuron-astrocyte interaction in the developing retina. Neuron 17, 1117-31 (1996). 360. Cui, J. et al. PDGF receptors are activated in human epiretinal membranes. Exp Eye Res 88, 438-44 (2009).

209

361. Li, D.Q. & Tseng, S.C. Differential regulation of cytokine and receptor transcript expression in human corneal and limbal fibroblasts by epidermal growth factor, transforming growth factor-alpha, platelet-derived growth factor B, and interleukin-1 beta. Invest Ophthalmol Vis Sci 37, 2068-80 (1996). 362. Vij, N., Sharma, A., Thakkar, M., Sinha, S. & Mohan, R.R. PDGF-driven proliferation, migration, and IL8 chemokine secretion in human corneal fibroblasts involve JAK2-STAT3 signaling pathway. Mol Vis 14, 1020-7 (2008). 363. Kaur, H., Chaurasia, S.S., Agrawal, V. & Wilson, S.E. Expression of PDGF receptor- alpha in corneal myofibroblasts in situ. Exp Eye Res 89, 432-4 (2009). 364. Thom, S.B. et al. Effect of topical anti-transforming growth factor-beta on corneal stromal haze after photorefractive keratectomy in rabbits. J Cataract Refract Surg 23, 1324-30 (1997). 365. Kim, A., Lakshman, N., Karamichos, D. & Petroll, W.M. Growth factor regulation of corneal keratocyte differentiation and migration in compressed collagen matrices. Invest Ophthalmol Vis Sci 51, 864-75 (2010). 366. Han, S. et al. Association of variants in FRAP1 and PDGFRA with corneal curvature in Asian populations from Singapore. Hum Mol Genet (2011). 367. Grosvenor, T. & Goss, D.A. Role of the cornea in emmetropia and myopia. Optom Vis Sci 75, 132-45 (1998). 368. Llorente, L., Barbero, S., Cano, D., Dorronsoro, C. & Marcos, S. Myopic versus hyperopic eyes: axial length, corneal shape and optical aberrations. J Vis 4, 288- 98 (2004). 369. Wright, K.W. & Spiegel, P.H. Pediatric ophthalmology and strabismus, xxiii, 1084 p. (Springer, New York, 2003). 370. Mutti, D.O. et al. Refractive astigmatism and the toricity of ocular components in human infants. Optom Vis Sci 81, 753-61 (2004). 371. Baldwin, W.R. & Mills, D. A longitudinal study of corneal astigmatism and total astigmatism. Am J Optom Physiol Opt 58, 206-11 (1981). 372. Cowen, L. & Bobier, W.R. The pattern of astigmatism in a Canadian preschool population. Invest Ophthalmol Vis Sci 44, 4593-600 (2003). 373. Jeffreys, A.J., Neumann, R., Panayi, M., Myers, S. & Donnelly, P. Human recombination hot spots hidden in regions of strong marker association. Nat Genet 37, 601-6 (2005). 374. Kong, A. et al. A high-resolution recombination map of the human genome. Nat Genet 31, 241-7 (2002). 375. Tenesa, A. et al. Recent human effective population size estimated from linkage disequilibrium. Genome Res 17, 520-6 (2007). 376. Hemminger, B.M., Saelim, B. & Sullivan, P.F. TAMAL: an integrated approach to choosing SNPs for genetic studies of human complex traits. Bioinformatics 22, 626-7 (2006). 377. Cockerham, C.C. & Weir, B.S. Estimation of inbreeding parameters in stratified populations. Ann Hum Genet 50, 271-81 (1986). 378. Pickrell, J.K. et al. Signals of recent positive selection in a worldwide sample of human populations. Genome Res 19, 826-37 (2009). 379. Teo, Y.Y. et al. Genome-wide comparisons of variation in linkage disequilibrium. Genome Res 19, 1849-60 (2009).

210

380. Sabeti, P.C. et al. Genome-wide detection and characterization of positive selection in human populations. Nature 449, 913-8 (2007). 381. Voight, B.F., Kudaravalli, S., Wen, X. & Pritchard, J.K. A map of recent positive selection in the human genome. PLoS Biol 4, e72 (2006). 382. Grossman, S.R. et al. A composite of multiple signals distinguishes causal variants in regions of positive selection. Science 327, 883-6 (2010). 383. Lamason, R.L. et al. SLC24A5, a putative cation exchanger, affects pigmentation in and humans. Science 310, 1782-6 (2005). 384. Xu, S. et al. Genomic dissection of population substructure of Han Chinese and its implication in association studies. Am J Hum Genet 85, 762-74 (2009). 385. Hudson, R.R. Two-locus sampling distributions and their application. Genetics 159, 1805-17 (2001). 386. McVean, G., Awadalla, P. & Fearnhead, P. A coalescent-based method for detecting and estimating recombination from gene sequences. Genetics 160, 1231-41 (2002). 387. Tajima, F. Relationship between DNA polymorphism and fixation time. Genetics 125, 447-54 (1990). 388. O'Reilly, P.F., Birney, E. & Balding, D.J. Confounding between recombination and selection, and the Ped/Pop method for detecting selection. Genome Res 18, 1304-13 (2008). 389. Reed, F.A. & Tishkoff, S.A. Positive selection can create false hotspots of recombination. Genetics 172, 2011-4 (2006). 390. Stephan, W., Song, Y.S. & Langley, C.H. The hitchhiking effect on linkage disequilibrium between linked neutral loci. Genetics 172, 2647-63 (2006). 391. Hellenthal, G., Pritchard, J.K. & Stephens, M. The effects of genotype-dependent recombination, and transmission asymmetry, on linkage disequilibrium. Genetics 172, 2001-5 (2006). 392. Kong, A. et al. Sequence variants in the RNF212 gene associate with genome- wide recombination rate. Science 319, 1398-401 (2008). 393. Baudat, F. et al. PRDM9 is a major determinant of meiotic recombination hotspots in humans and mice. Science 327, 836-40 (2010). 394. Berg, I.L. et al. PRDM9 variation strongly influences recombination hot-spot activity and meiotic instability in humans. Nat Genet 42, 859-63 (2010). 395. Parvanov, E.D., Petkov, P.M. & Paigen, K. Prdm9 controls activation of mammalian recombination hotspots. Science 327, 835 (2010). 396. Teo, Y.Y., Ong, R.T., Sim, X., Tai, E.S. & Chia, K.S. Identifying candidate causal variants via trans-population fine-mapping. Genet Epidemiol 34, 653-64 (2010). 397. Zaitlen, N., Pasaniuc, B., Gur, T., Ziv, E. & Halperin, E. Leveraging genetic variability across populations for the identification of causal variants. Am J Hum Genet 86, 23-33 (2010). 398. Redon, R. et al. Global variation in copy number in the human genome. Nature 444, 444-54 (2006). 399. Williamson, S.H. et al. Localizing recent adaptive evolution in the human genome. PLoS Genet 3, e90 (2007). 400. Carlson, C.S. et al. Genomic regions exhibiting positive selection identified from dense genotype data. Genome Res 15, 1553-65 (2005).

211

401. Shaikh, T.H. et al. High-resolution mapping and analysis of copy number variations in the human genome: a data resource for clinical and research applications. Genome Res 19, 1682-90 (2009). 402. Pinto, D., Marshall, C., Feuk, L. & Scherer, S.W. Copy-number variation in control population cohorts. Hum Mol Genet 16 Spec No. 2, R168-73 (2007). 403. Wang, E.T., Kodama, G., Baldi, P. & Moyzis, R.K. Global landscape of recent inferred Darwinian selection for Homo sapiens. Proc Natl Acad Sci U S A 103, 135-40 (2006). 404. Kimura, R., Fujimoto, A., Tokunaga, K. & Ohashi, J. A practical genome scan for population-specific strong selective sweeps that have reached fixation. PLoS One 2, e286 (2007). 405. Jakobsson, M. et al. Genotype, haplotype and copy-number variation in worldwide human populations. Nature 451, 998-1003 (2008). 406. Park, H. et al. Discovery of common Asian copy number variants using integrated high-resolution array CGH and massively parallel DNA sequencing. Nat Genet 42, 400-5 (2010). 407. Simon-Sanchez, J. et al. Genome-wide SNP assay reveals structural genomic variation, extended homozygosity and cell-line induced alterations in normal individuals. Hum Mol Genet 16, 1-14 (2007). 408. Zogopoulos, G. et al. Germ-line DNA copy number variation frequencies in a large North American population. Hum Genet 122, 345-53 (2007). 409. Wong, K.K. et al. A comprehensive analysis of common copy-number variations in the human genome. Am J Hum Genet 80, 91-104 (2007). 410. Akey, J.M. Constructing genomic maps of positive selection in humans: where do we go from here? Genome Res 19, 711-22 (2009). 411. Xu, H. et al. SgD-CNV, a database for common and rare copy number variants in three Asian populations. Hum Mutat 32, 1341-9 (2011). 412. Tang, K., Thornton, K.R. & Stoneking, M. A new approach for using genome scans to detect recent positive selection in the human genome. PLoS Biol 5, e171 (2007). 413. Kelley, J.L., Madeoy, J., Calhoun, J.C., Swanson, W. & Akey, J.M. Genomic signatures of positive selection in humans and the limits of outlier approaches. Genome Res 16, 980-9 (2006). 414. Han, S. et al. Association of variants in FRAP1 and PDGFRA with corneal curvature in Asian populations from Singapore. Hum Mol Genet 20, 3693-8 (2011). 415. Mishra, A. et al. Genetic variants near PDGFRA are associated with corneal curvature in Australians. Invest Ophthalmol Vis Sci (2012). 416. Hageman, G.S. et al. Clinical validation of a genetic model to estimate the risk of developing choroidal neovascular age-related macular degeneration. Hum Genomics 5, 420-40 (2011). 417. Jostins, L. & Barrett, J.C. Genetic risk prediction in complex disease. Hum Mol Genet 20, R182-8 (2011). 418. Stephens, M. & Balding, D.J. Bayesian statistical methods for genetic association studies. Nat Rev Genet 10, 681-90 (2009). 419. Bodmer, W. & Bonilla, C. Common and rare variants in multifactorial susceptibility to common diseases. Nat Genet 40, 695-701 (2008). 212

420. Morris, A.P. & Zeggini, E. An evaluation of statistical approaches to rare variant analysis in genetic association studies. Genet Epidemiol 34, 188-93 (2010). 421. Winchester, L., Yau, C. & Ragoussis, J. Comparing CNV detection methods for SNP arrays. Brief Funct Genomic Proteomic 8, 353-66 (2009). 422. Verhoeven, V.J. et al. Genome-wide meta-analyses of multiancestry cohorts identify multiple new susceptibility loci for refractive error and myopia. Nat Genet 45, 314-8 (2013). 423. Kiefer, A.K. et al. Genome-wide analysis points to roles for extracellular matrix remodeling, the visual cycle, and neuronal development in myopia. PLoS Genet 9, e1003299 (2013). 424. Hellenthal, G. & Stephens, M. Insights into recombination from population genetic variation. Curr Opin Genet Dev 16, 565-72 (2006).

213