INTEGRATING POPULATION GENOMICS AND MEDICAL GENETICS FOR UNDERSTANDING THE GENETIC AETIOLOGY OF EYE TRAITS
FAN QIAO (M.Sc. University of Minnesota)
A THESIS SUBMITTED FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
SAW SWEE HOCK SCHOOL OF PUBLIC HEALTH NATIONAL UNIVERSITY OF SINGAPORE
2012
Acknowledgements
I would like to express my sincerest gratitude to my supervisor, Prof. Yik-
Ying Teo, for his guidance, patience and encouraging high standards in my
work through this study. He spent hours reviewing my original manuscripts,
gave constructive feedback and made detailed corrections. His support has been invaluable for me to write this doctoral thesis.
I am also deeply grateful to my supervisor, Prof. Seang-Mei Saw, for her continuous support, suggestions and providing research resources for me to accomplish my work. Her passion in research and the determination to slow the myopic progression in children has influenced me greatly.
My sincere thanks also go to Dr. Yi-Ju Li, who encouraged me to move a step forward in my career and broadened my research experience. Her unflinching courage confronting ill health will inspire me for my whole life. I am also thankful to Dr. Ching-Yu Cheng. The conversations with Ching-Yu were always valuable for me to understand the clinical relevance of ocular diseases. My thanks are also due to Dr. Chiea-Chuen Khor for his prompt comments in reviewing my papers and the insight provided. I also wish to thank Dr. Liang Kee Goh for providing the infrastructure to support me at the beginning of this research, and Prof. Terri L Young and Prof. Tien-Yin Wong for their dedication along this project.
During this research, I have worked with many collaborators for whom I have great regard. In particular, I am indebted to Dr. Veluchamy A. Barathi for performing gene and protein expression in ocular tissues. The discussion with her regarding the animal model of myopia was an interesting exploration.
It is also my pleasure to acknowledge Dr. Akira Meguro and Dr. Isao Nakata
for kindly sharing their data in the replication study and stimulating
discussions.
Many thanks go to my office-mates and colleagues, Zhou Xin, Chen Peng, Xiaoyu, Haiyang, Huijun, Rick, Queenie, Vivian, Chenwei and Wang Pei, for their cheerful discussions and for being a source of inspiration.
Finally, I would like to thank my family for their wholehearted support given to me - I owe everything to them.
For me, the journey over the past several years has been more like a
process of cultivation. The best way to express my gratitude is, without
attachment to a self, to help others in my life.
Table of Contents
SUMMARY ...... 6
LIST OF TABLES ...... 8
LIST OF FIGURES ...... 9
1 CHAPTER 1 INTRODUCTION ...... 12
1.1 Statistical analysis of genome-wide association studies ...... 12
1.1.1 Linkage disequilibrium based association mapping ...... 12
1.1.2 Study design and analytical strategy ...... 13
1.1.2.1 Data quality control ...... 13
1.1.2.2 Population structure ...... 14
1.1.2.3 Study design ...... 16
1.1.2.4 Multiple testing ...... 17
1.1.3 Phenotype classification ...... 18
1.1.3.1 Binary/quantitative traits ...... 18
1.1.3.2 Paired eye measurements ...... 19
1.1.4 Meta-analysis of genome-wide association studies ...... 22
1.1.4.1 Imputation on genotyped data ...... 22
1.1.4.2 Statistics in the meta-analysis ...... 23
1.1.4.3 Statistical challenges in analyzing multi-ethnic populations ...... 26
1.2 Recombination variation between populations ...... 28
1.2.1 Recombination and genetic diversity ...... 28
1.2.2 Variation in inter-population recombination ...... 29
1.2.3 Current approaches of quantifying recombination differences ...... 30
1.3 Refractive errors and the aetiology of myopia ...... 32
1.3.1 Types of refractive errors ...... 33
1.3.1.1 Myopia, hyperopia and ocular biometrics ...... 33
1.3.1.2 Astigmatism ...... 34
1.3.2 Experimental animal myopia models ...... 35
1.3.2.1 Deprivation myopia and inducing myopia ...... 35
1.3.2.2 Emmetropisation and the role of scleral changes in eye growth ...... 37
1.3.2.3 Peripheral refraction ...... 37
1.3.3 Roles of environmental factors in controlling human refraction ...... 38
1.3.4 Genetic basis of myopia ...... 41
1.3.4.1 Familial aggregation and segregation ...... 41
1.3.4.2 Estimates of heritability ...... 43
1.3.5 Genetic loci associated with or linked to refractive errors ...... 46
1.3.5.1 Myopic loci identified from genome-wide linkage studies ...... 46
1.3.5.2 Candidate gene studies ...... 50
1.3.5.3 Genome-wide association studies ...... 57
1.3.6 Intervention to slow myopia progression ...... 61
2 CHAPTER 2 STUDY AIMS ...... 65
3 CHAPTER 3 GENETIC VARIANTS ON CHROMOSOME 1Q41 INFLUENCE OCULAR AXIAL LENGTH AND HIGH MYOPIA ...... 67
3.1 Abstract ...... 67
3.2 Background ...... 68
3.3 Methods ...... 70
3.3.1 Study cohorts ...... 70
3.3.2 Data quality control ...... 74
3.3.3 Statistical methods ...... 77
3.3.4 Functional studies ...... 78
3.3.4.1 Gene expression in human ...... 78
3.3.4.2 Myopia-induced mouse model ...... 79
3.4 Results ...... 82
3.4.1 Datasets after quality control ...... 82
3.4.2 Locus at chromosome 1q41 achieved genome-wide significance ...... 83
3.4.3 Association with high myopia on the identified SNPs ...... 84
3.4.4 Gene expression ...... 85
3.5 Discussion ...... 86
4 CHAPTER 4 GENOME-WIDE META-ANALYSIS OF FIVE ASIAN COHORTS IDENTIFIES PDGFRA AS A SUSCEPTIBILITY LOCUS FOR CORNEAL ASTIGMATISM ...... 103
4.1 Abstract ...... 103
4.2 Background ...... 104
4.3 Methods ...... 106
4.3.1 Study cohorts ...... 106
4.3.2 Data quality control ...... 109
4.3.3 Statistical methods ...... 113
4.4 Results ...... 115
4.4.1 Datasets after quality control ...... 115
4.4.2 Gene PDGFRA exhibiting genome-wide significance ...... 116
4.5 Discussion ...... 117
5 CHAPTER 5 GENOME-WIDE COMPARISON OF ESTIMATED RECOMBINATION RATES BETWEEN POPULATIONS ...... 130
5.1 Study summary ...... 130
5.2 Methods ...... 131
5.2.1 Development of recombination variation score ...... 131
5.2.2 Simulation ...... 134
5.2.3 Estimation of recombination rates ...... 137
5.2.4 Simulation ...... 138
5.2.5 SNP annotation, copy number variation and FST calculation ...... 141
5.2.6 Quantification of variations in linkage disequilibrium ...... 141
5.3 Results ...... 143
5.3.1 Simulation studies on power and false positive rates ...... 143
5.3.2 Application to HapMap and Singapore Genome Variation Project ...... 145
5.3.3 Recombination variation and Linkage disequilibrium variation highly correlated ...... 148
5.3.4 Regions with largest recombination variation less frequent in genes ...... 149
5.4 Discussion ...... 149
6 CHAPTER 6 CONCLUSION ...... 181
6.1 Identified genetic variants associated with refractive errors ...... 181
6.2 Transferability of the genetic variants for refractive errors across populations ...... 182
6.3 Statistical meta-analysis of GWAS in diverse populations ...... 184
6.4 Missing heritability of myopia ...... 185
6.5 Recombination variations and implications in genetic association studies ...... 187
7 PUBLICATIONS ...... 190
8 REFERENCES ...... 191
Summary
For complex human diseases, identifying the underlying genetic factors has primarily relied on either genome-wide linkage scans, which narrow down the chromosomal regions linked to disease-causing genes, or the candidate gene approach, which is based on known mechanisms of disease pathogenesis. During the past few years, genome-wide association studies have emerged as popular tools to identify genetic variants underlying common and complex diseases, greatly advancing our understanding of the genetic architecture of human diseases.
Refractive errors are complex ocular disorders, as the underlying causes are both genetic and environmental in origin. The need for continued research into the genetic aetiology of refractive errors is considerable, especially considering a mismatch between high heritability in twin studies and the paucity of evidence for associated genetic variation. This thesis seeks to address the potential roles of genetic factors involved in refractive errors.
Through a meta-analysis of three genome-wide association scans of axial length, an ocular biometric measure, in Asians, we have determined that a genetic locus on chromosome 1q41 is associated with axial length and high myopia. In addition, our meta-analysis of five genome-wide association studies in Asians has revealed that genetic variants on chromosome 4q12 are associated with corneal astigmatism, exhibiting strong and consistent effects across Chinese, Malays and Indians.
Inter-population variation in patterns of linkage disequilibrium, largely shaped by underlying homologous recombination, influences the transferability of genetic risk loci across different populations. Understanding recombination variation provides insight into the fine-mapping of functional polymorphisms by leveraging the genetic diversity of different populations. This motivates an attempt to quantify recombination variation between populations. For this purpose, a quantitative measure (varRecM) is proposed to evaluate the extent of inter-population differences in recombination rates. Our findings suggest that significant fine-scale differences exist in the recombination profiles of Europeans, Africans and East Asians. Regions that emerged with the strongest evidence harbour candidate genes for population-specific positive selection, and for genetic syndromes.
List of Tables
Table 1. Summary of analytic approaches for quantitative trait two-eye data in genome-wide association studies...... 21
Table 2. Myopia loci identified from genome-wide linkage studies...... 49
Table 3. Candidate genes studied for high myopia...... 54
Table 4. Genetic loci identified from genome-wide association studies...... 59
Table 5. Characteristics of study participants in the five Asian cohorts...... 92
Table 6. Top SNPs (Pmeta-value ≤ 1 × 10⁻⁵) associated with AL from the meta-analysis in the three Asian cohorts...... 93
Table 7. Association between genetic variants at chromosome 1q41 and high myopia in the five Asian cohorts...... 94
Table 8. Characteristics of the participants in five studies...... 128
Table 9. Top SNPs (P-value ≤ 5 × 10⁻⁶) identified from combined meta-analysis of five Asian population cohorts...... 129
Table 10. varRecM scores at top percentiles for pair-wise comparisons of the three HapMap populations between CEU and JPT + CHB, CEU and YRI, YRI and JPT + CHB...... 175
Table 11. The 20 strongest signals of varRecM scores in comparisons of HapMap populations...... 176
Table 12. The 20 strongest signals of varRecM score in comparison of populations of SGVP Chinese and Indians, and Chinese and HapMap East Asians...... 179
List of Figures
Figure 1. Impact of population stratification on genotype frequencies in the case-control association study...... 15
Figure 2. Cross-sectional view of the human eye structure ……...... ………34
Figure 3. The implicated genes likely to be involved in the visual signal transmission and scleral remodeling…………………………………………61
Figure 4. Principal component analysis (PCA) was performed in SiMES to assess the extent of population structure...... 95
Figure 5. Principal Component Analysis (PCA) of discovery cohorts SCES, SCORM and SiMES with respect to the four population panels in phase 2 of the HapMap samples (CEU-European, YRI-African, CHB-Chinese, JPT-Japanese)...... 96
Figure 6. Quantile-Quantile (Q-Q) plots of P-values for association between all SNPs and AL in the individual cohort (A) SCES, (B) SCORM, (C) SiMES, and combined meta-analysis of the discovery cohorts (D) SCES + SCORM + SiMES...... 97
Figure 7. Manhattan plot of -log10(P) for the association on axial length from the meta-analysis in the combined cohorts of SCES, SCORM and SiMES....98
Figure 8. The chromosome 1q41 region and its association with axial length in the Asian cohorts...... 99
Figure 9. mRNA expression of ZC3H11B, SLC30A10 and LYPLAL1 in human tissues...... 100
Figure 10. Transcription quantification of ZC3H11A, SLC30A10 and LYPLAL1 in mouse retina, retinal pigment epithelium and sclera in induced myopic eyes, fellow eyes and independent control eyes...... 101
Figure 11. Immunofluorescent labelling of (A) ZC3H11A (B) SLC30A10 and (C) LYPLAL1 in mouse retina, retinal pigment epithelium and sclera in induced myopic eyes, fellow eyes and independent control eyes...... 102
Figure 12. Principal Component Analysis (PCA) of SP2, SiMES, SINDI, SCORM with respect to the population panels in phase 2 of the HapMap samples (CEU-European, YRI-African, CHB-Chinese, JPT-Japanese)...... 122
Figure 13. Principal component analysis (PCA) was performed in SINDI to assess the extent of population structure...... 123
Figure 14. Quantile-Quantile (Q-Q) plots of P-values for association between all SNPs and corneal astigmatism in the individual cohorts (A) SP2, (B) SiMES, (C) SINDI, (D) SCORM, (E) STARS and in the combined meta-analysis of (F) SP2 + SiMES + SINDI + SCORM + STARS...... 124
Figure 15. (A) Manhattan plot of -log10(P-values) in the combined discovery cohort of SP2, SiMES, SINDI, SCORM and STARS. The blue horizontal line presents the threshold of suggestive significance (P = 1.00 × 10⁻⁵). (B) Regional plot of the association signals from the meta-analysis of the five GWAS cohorts around the PDGFRA gene locus...... 125
Figure 16. Forest plot of the estimated allelic odds ratios for the lead SNP rs7677751...... 126
Figure 17. Linkage disequilibrium (LD) calculated in terms of r² for Singapore Chinese samples from SP2 (A), Malay samples from SiMES (B) and Indian samples from SINDI (C)...... 127
Figure 18. Illustration of ranking the recombination differences from two populations...... 156
Figure 19. Evaluation of false positive rates (FPR) of varRecM method.....157
Figure 20. Power performance of varRecM method...... 158
Figure 21. Accumulative density plots of varRecM scores from five pair comparisons between HapMap and SGVP populations...... 159
Figure 22. Distribution of population-specific recombination peak regions in the top 1% of the varRecM scores...... 160
Figure 23. Top regions of largest varRecM scores with overlapping signals of positive selection...... 161
Figure 24. Plots of the top 20 regions of the varRecM scores for the comparison between samples of HapMap CEU and JPT+CHB...... 164
Figure 25. Plots of the top 20 regions of the varRecM scores for the comparison between samples of HapMap CEU and YRI...... 166
Figure 26. Plots of the top 20 regions of the varRecM scores for the comparison between samples of HapMap JPT+CHB and YRI...... 168
Figure 27. Plots of the top 20 regions of the varRecM scores for the comparison between samples of SGVP CHS and INS...... 170
Figure 28. Plots of the top 20 regions of the varRecM scores for the comparison between samples of SGVP CHS and HapMap JPT+CHB...... 172
Figure 29. Scatter plot of varLD score versus varRecM score among HapMap and SGVP populations...... 173
Figure 30. Odds ratio of extreme varRecM scores presenting in intergenic versus gene regions...... 174
1 Chapter 1 Introduction
In this chapter, I first introduce genome-wide association studies (GWAS) and GWAS meta-analysis, and highlight the statistical challenges posed by paired-eye data. I then provide the background and motivation for the study of inter-population recombination variation. The last section reviews the literature on the aetiology of refractive errors, particularly myopia.
1.1 Statistical analysis of genome-wide association studies
1.1.1 Linkage disequilibrium based association mapping
Mapping disease genes primarily depends on linkage studies and association mapping. The former exploit within-family correlations between the disease and genetic markers (e.g. microsatellites) linked to disease-related genes by calculating logarithm of odds (LOD) scores1. Mutations for more than 1,600 Mendelian diseases have been discovered by linkage studies; however, the approach has been less successful for complex (polygenic) disorders.
The genome-wide association design was proposed as a powerful means to identify common variants that underlie complex human traits2,3. GWAS typically survey between 500,000 and 1,000,000 single nucleotide polymorphisms (SNPs) across the entire human genome simultaneously4. Such a dense set of SNPs (known as tag SNPs) is chosen based on the linkage disequilibrium (LD) pattern of genotyped SNPs within each chromosomal region in the reference samples of the International HapMap Project5. In the simplest scenario, an association study compares the frequency of alleles or genotypes for a particular variant between cases and controls. The current design of GWAS relies on genetic correlations between the genotyped markers and the underlying functional polymorphisms, an approach known as LD mapping. LD is the non-random association of alleles at two or more loci. The amount of LD is measured by the difference between the observed haplotype frequency and the frequency expected if the alleles at the loci were independently distributed. Alleles at SNPs in high LD tend to be transmitted together to offspring in subsequent generations. It is hoped that a true causal SNP not genotyped in a study will be captured through at least a minimal level of LD with an informative nearby genotyped SNP that exhibits significant association with the disease.
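To make the definition of LD concrete, the following is a minimal sketch of the standard pairwise measures D, D' and r² for two biallelic loci. The function name and its arguments are illustrative assumptions and are not taken from any software used in this thesis; haplotype frequencies are assumed to be known (in practice they are estimated from phased or unphased genotypes).

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A at locus 1
           and allele B at locus 2 (assumed known)
    p_a  : frequency of allele A at locus 1
    p_b  : frequency of allele B at locus 2
    """
    # D is the observed haplotype frequency minus the frequency
    # expected if the two loci were independent.
    d = p_ab - p_a * p_b

    # D' rescales D by its maximum attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between the alleles at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Two loci whose alleles always occur together: complete LD.
d, d_prime, r2 = ld_measures(p_ab=0.5, p_a=0.5, p_b=0.5)
# Two independent loci: no LD.
d0, d_prime0, r2_0 = ld_measures(p_ab=0.25, p_a=0.5, p_b=0.5)
```

A tag SNP with high r² to an untyped causal variant retains most of the association signal, which is why r² is the measure usually consulted when selecting tag SNPs.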
1.1.2 Study design and analytical strategy
1.1.2.1 Data quality control
GWAS rely on commercial SNP chips, predominantly those from Illumina (http://www.illumina.com/) and Affymetrix (http://www.affymetrix.com/). Regardless of the type of SNP chip used, a rigorous quality control (QC) procedure is essential to the success of the study. While both Affymetrix and Illumina have their own genotype-calling algorithms for raw data analysis, one should make sure that the best-practice genotype-calling protocol is applied. Several QC checkpoints are often examined in a GWAS, including the sample call rate, Hardy-Weinberg equilibrium (HWE), the minor allele frequency (MAF), genotype missingness per marker, and population structure6. Although there is no gold standard for these QC checkpoints, examples of thresholds that we would recommend are: excluding samples with call rates <95%, and excluding SNPs that are out of HWE (p < 10⁻⁶) in control samples, or that have MAF < 0.01 or genotype missingness >10%. Population structure is another important aspect of QC and is described in the next section.
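The example thresholds above can be expressed as simple filters. The sketch below is illustrative only: the function names are hypothetical, the HWE test is assumed to be the 1-degree-of-freedom χ² goodness-of-fit test (exact tests are often preferred in practice), and genotype counts are assumed to be already tabulated per SNP.

```python
import math


def hwe_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) goodness-of-fit p-value for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:                   # monomorphic SNP: test undefined
        return 1.0
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def sample_passes_qc(call_rate, min_call_rate=0.95):
    """Keep samples whose call rate is at least 95%."""
    return call_rate >= min_call_rate


def snp_passes_qc(n_aa, n_ab, n_bb, n_missing, control_counts,
                  min_maf=0.01, max_missing=0.10, min_hwe_p=1e-6):
    """Apply the MAF, missingness and HWE (in controls) thresholds to one SNP.

    n_aa, n_ab, n_bb : genotype counts among all called samples
    n_missing        : number of samples with a missing genotype
    control_counts   : (n_aa, n_ab, n_bb) among control samples only
    """
    n_called = n_aa + n_ab + n_bb
    if n_called == 0:
        return False
    freq_a = (2 * n_aa + n_ab) / (2 * n_called)
    maf = min(freq_a, 1 - freq_a)
    missing_rate = n_missing / (n_called + n_missing)
    return (maf >= min_maf
            and missing_rate <= max_missing
            and hwe_pvalue(*control_counts) >= min_hwe_p)
```

Restricting the HWE test to controls, as in the text, avoids discarding SNPs whose departure from HWE in cases reflects a genuine association.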
1.1.2.2 Population structure
Early views of the role of population structure in genetic association studies of unrelated individuals focused on the concern that cryptic population substructure would raise the false-positive rate of statistical tests above their nominal level. For instance, suppose that a case-control dataset contains two underlying subpopulations with different allele frequencies at a SNP, and that the number of cases is disproportionately high in one subpopulation (Figure 1). Although genotype frequencies are identical in cases and controls within population 1 and within population 2, there appear to be dramatic differences in the frequencies of the CC and TT genotypes between cases and controls in the combined data. Under this scenario, the failure to account for population stratification, which confounds the association through allele frequency differences, could result in a false-positive association between the SNP and disease status.
Figure 1. Impact of population stratification on genotype frequencies in the case-control association study. The percentages of individuals carrying different genotypes in cases in the population 1, combined populations and population 2 respectively are on top panel; analogously for controls in bottom panel. Cases are overrepresented in population 1.
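The scenario in Figure 1 can be reproduced with a small numerical sketch. The sample sizes and allele frequencies below are invented for illustration and are not the values underlying the figure; the point is only that the allelic odds ratio equals 1 within each subpopulation yet departs markedly from 1 once the subpopulations are pooled.

```python
def allele_counts(n_people, t_freq):
    """Return (T alleles, C alleles) for n_people diploid individuals."""
    t = round(2 * n_people * t_freq)
    return t, 2 * n_people - t


def odds_ratio(case_t, case_c, ctrl_t, ctrl_c):
    """Allelic odds ratio of the T allele in cases versus controls."""
    return (case_t * ctrl_c) / (case_c * ctrl_t)


def stratification_demo():
    # Hypothetical population 1: T allele frequency 0.8, cases overrepresented.
    p1_case = allele_counts(800, 0.8)
    p1_ctrl = allele_counts(200, 0.8)
    # Hypothetical population 2: T allele frequency 0.2, controls overrepresented.
    p2_case = allele_counts(200, 0.2)
    p2_ctrl = allele_counts(800, 0.2)

    or_pop1 = odds_ratio(*p1_case, *p1_ctrl)
    or_pop2 = odds_ratio(*p2_case, *p2_ctrl)

    # Pool the two populations, ignoring ancestry.
    pooled_case = (p1_case[0] + p2_case[0], p1_case[1] + p2_case[1])
    pooled_ctrl = (p1_ctrl[0] + p2_ctrl[0], p1_ctrl[1] + p2_ctrl[1])
    or_pooled = odds_ratio(*pooled_case, *pooled_ctrl)
    return or_pop1, or_pop2, or_pooled


or_pop1, or_pop2, or_pooled = stratification_demo()
# or_pop1 and or_pop2 are both 1.0, while or_pooled is about 4.5
```

No allele is associated with disease in either subpopulation, so the pooled odds ratio is produced entirely by the confounding of ancestry with case status.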
Price and colleagues proposed a computationally feasible approach to detect and correct for population stratification7. In their approach, principal components analysis (PCA) is used to model ancestry differences between cases and controls. The EIGENSTRAT approach identifies ancestry differences among samples along the eigenvectors of a covariance matrix. Ancestry outliers are excluded from further association analyses. In addition to excluding these samples, the EIGENSTRAT approach adjusts genotypes and phenotypes by the amounts attributable to ancestry along the top eigenvectors (http://genepath.med.harvard.edu/~reich/Software.htm). Patterson and colleagues pointed out that top eigenvectors could be caused by a large set of markers in a high (or complete) LD block8. Hence they recommended pruning the markers in tight LD before performing PCA.
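The idea of inferring ancestry axes from genotypes can be sketched in a few lines. The following is not the EIGENSTRAT implementation: it is a pure-Python illustration that assumes complete genotype data coded 0/1/2, standardises each SNP by its mean and by sqrt(p(1-p)), and extracts only the leading eigenvector of the sample-by-sample matrix by power iteration. All names and the toy data are hypothetical; real analyses would use dedicated software on LD-pruned markers.

```python
import math


def standardise(genotypes):
    """Centre each SNP and scale by sqrt(p(1-p)); drop monomorphic SNPs."""
    n = len(genotypes)
    columns = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        mean = sum(col) / n
        p = mean / 2.0                      # estimated allele frequency
        if p <= 0.0 or p >= 1.0:
            continue
        sd = math.sqrt(p * (1.0 - p))
        columns.append([(g - mean) / sd for g in col])
    return [[c[i] for c in columns] for i in range(n)]


def top_principal_component(genotypes, n_iter=200):
    """Leading eigenvector of the sample-by-sample matrix X X^T (power iteration)."""
    x = standardise(genotypes)
    n = len(x)
    k = [[sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)]
         for i in range(n)]
    v = [float(i + 1) for i in range(n)]    # arbitrary deterministic start
    for _ in range(n_iter):
        w = [sum(k[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0.0:
            break
        v = [t / norm for t in w]
    return v


# Toy data: rows are samples, columns are SNPs (0/1/2 copies of an allele).
# The first three samples and the last three come from two different groups.
toy = [
    [2, 2, 0, 1],
    [2, 1, 0, 0],
    [2, 2, 0, 1],
    [0, 0, 2, 1],
    [0, 1, 2, 2],
    [0, 0, 2, 1],
]
pc1 = top_principal_component(toy)
```

In this toy example the sign of each sample's coordinate on the first component separates the two groups; such coordinates are what would be inspected for ancestry outliers or included as covariates.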
1.1.2.3 Study design
Case-control or cross-sectional study designs are widely adopted to evaluate the association between the disease and multiple SNPs. The statistical approach to analysing GWAS data is similar to that of traditional epidemiological studies, except that the same test is repeated for each SNP. The Cochran-Armitage trend test, the χ² test and the logistic regression model are widely used in the case-control design to test for overrepresentation of the mutated allele in cases versus controls9.
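As an illustration of a single-SNP case-control test, the sketch below implements the Cochran-Armitage trend test for a 2 × 3 table of genotype counts with additive weights (0, 1, 2). The function name and the example counts are assumptions made for illustration; the p-value uses the 1-degree-of-freedom χ² approximation.

```python
import math


def cochran_armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test for genotype counts.

    cases, controls : genotype counts ordered by number of risk alleles (0, 1, 2)
    weights         : scores per genotype; (0, 1, 2) assumes an additive model
    Returns (chi-square statistic with 1 df, p-value).
    """
    r_total = sum(cases)
    s_total = sum(controls)
    n_total = r_total + s_total
    col = [r + s for r, s in zip(cases, controls)]   # genotype totals

    # Trend statistic: weighted difference between case and control counts.
    t_stat = sum(w * (r * s_total - s * r_total)
                 for w, r, s in zip(weights, cases, controls))

    # Variance of the trend statistic under the null hypothesis.
    var = sum(w * w * c * (n_total - c) for w, c in zip(weights, col))
    var -= 2 * sum(weights[i] * weights[j] * col[i] * col[j]
                   for i in range(len(col))
                   for j in range(i + 1, len(col)))
    var *= r_total * s_total / n_total
    if var == 0:
        return 0.0, 1.0

    chi2 = t_stat ** 2 / var
    p_value = math.erfc(math.sqrt(chi2 / 2))   # chi-square survival function, 1 df
    return chi2, p_value


# Hypothetical counts: the risk allele is enriched in cases.
chi2, p_value = cochran_armitage_trend(cases=(10, 40, 50), controls=(50, 40, 10))
```

In a GWAS the same function would be evaluated once per SNP, which is what gives rise to the multiple-testing burden discussed in section 1.1.2.4.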
Although most GWAS phenotype data, drawn from existing epidemiological cohorts, are collected longitudinally, they are usually analysed in a case-control fashion. Incorporating longitudinal information, such as time to event and repeated measurements, would add merit to GWAS10. Analysing longitudinal data with repeated measurements is, however, computationally intensive, and efficient software is lacking. An alternative is to use an aggregate outcome of interest, e.g. the change in the outcome over time, but the use of limited or partial data can compromise statistical power11.
For a family-based GWAS, the transmission disequilibrium test (TDT) is used to measure the excess transmission of an allele from heterozygous parents to affected offspring relative to the expectation under Mendel's law12. The TDT has been generalised to multiple siblings using family-based association tests (FBATs)13. Such tests have been extended to quantitative traits, namely the quantitative transmission disequilibrium test (QTDT) and family-based association tests for quantitative traits (QFAM), and both are implemented in the QTDT software package (http://www.sph.umich.edu/csg/abecasis/QTDT/).
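The TDT statistic itself is simple to compute once transmissions from heterozygous parents have been counted. The sketch below assumes the standard McNemar-type form of the test, with a 1-degree-of-freedom χ² approximation; the function name and the counts are hypothetical.

```python
import math


def tdt(transmitted, untransmitted):
    """Transmission disequilibrium test.

    transmitted   : number of heterozygous parents transmitting the allele
                    of interest to an affected offspring
    untransmitted : number of heterozygous parents transmitting the other allele
    Returns (chi-square statistic with 1 df, p-value).
    """
    total = transmitted + untransmitted
    if total == 0:
        return 0.0, 1.0
    # Under Mendel's law each allele is transmitted with probability 1/2,
    # so the two counts are expected to be equal.
    chi2 = (transmitted - untransmitted) ** 2 / total
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical example: 70 transmissions versus 30 non-transmissions.
chi2, p_value = tdt(70, 30)   # chi2 is 16.0
```

Because only transmissions within families are compared, the statistic is unaffected by allele frequency differences between subpopulations.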
Compared with the population-based case-control design, a family-based association study using family trios is robust against population stratification14. However, the recruitment of parent-offspring trios usually requires more research resources than that of unrelated subjects in a population-based study, and is particularly challenging for late-onset diseases. Furthermore, to obtain similar statistical power, genotyping trios costs more than genotyping pairs of individuals in a case-control study15. These factors might explain the popularity of the population-based design in current GWAS.
1.1.2.4 Multiple testing
Testing multiple hypotheses simultaneously while drawing correct statistical inferences is the most challenging aspect of a GWAS. It is now common to assay one million variants in a GWAS, and this effectively constitutes 1,000,000 hypothesis tests. A conventional significance threshold of 5% is thus expected to spuriously identify 50,000 markers that are “correlated” with the trait. To address this issue of multiple testing, geneticists have adopted a stringent statistical significance level of 5.0 × 10⁻⁸ (the 5% level divided by one million tests), commonly defined as attaining genome-wide significance, as the benchmark for evaluating the fidelity of the association signal at each marker9. Notably, the Bonferroni correction is simple but conservative, as it assumes that the one million genetic variants are independent and that all tests are conducted without considering inter-marker correlation. Replication is thus considered the gold standard for GWAS publications16. Currently, the identification of candidate genetic loci for replication and further downstream functional evaluation is mainly driven by the level of statistical evidence from single-marker association tests (either the p-value or the Bayes factor).
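The arithmetic behind these figures is shown in the short sketch below; the function names are illustrative only.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of null markers passing a per-test threshold alpha."""
    return n_tests * alpha


def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that controls the family-wise error rate at alpha."""
    return alpha / n_tests


n_markers = 1_000_000
false_hits = expected_false_positives(n_markers, 0.05)   # about 50,000
genome_wide = bonferroni_threshold(0.05, n_markers)      # about 5e-8
```

Because neighbouring markers are correlated through LD, the effective number of independent tests is smaller than the number of markers, which is why this threshold is conservative.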
1.1.3 Phenotype classification
1.1.3.1 Binary/quantitative traits
In gene mapping, ocular phenotypes are usually classified into two broad types: qualitative (or binary) and quantitative (or continuous) traits. Dichotomous traits have featured in GWAS for age-related macular degeneration (AMD)17,18, primary open-angle glaucoma (POAG)19,20, cataract21 and high myopia22,23. Affected individuals are usually classified on the basis of the diagnosis in the worse eye or in both eyes, while controls exhibit no sign of the disease in either eye. Although assessing a binary outcome is more directly relevant to clinical application, quantitative traits (endophenotypes or intermediate traits) underlying diseases are also valuable in dissecting the genetic architecture, as they take the full spectrum of measurements into account. For instance, central corneal thickness (CCT) and cup-to-disc ratio (CDR) are regarded as quantitative endophenotypes of open-angle glaucoma (POAG)24. Mapping genes for CCT25-27 and CDR28,29 in GWAS would shed light on the joint genetic aetiology of POAG.
A “myopia” gene may be practically relevant to the hyperopic defocus, whereas a quantitative trait locus (QTL) for refractive error affecting ocular component growth is responsible for the entire phenotypic spectrum. It is possible that genes involved in a quantitative trait (refractive error) also play a role in the extreme forms of the trait (high myopia)30.
1.1.3.2 Paired eye measurements
Often, the primary interest in ophthalmological genetic studies is to locate shared quantitative trait loci (QTL) that exert effects on both eyes31-33, as the physiological mechanisms underlying inter-eye differences in phenotypic abnormalities remain elusive. Therefore, for quantitative traits collected from both eyes, an immediate question is whether the analyses should be performed on data from one eye or two eyes. In the seven GWAS papers on eye-related QTL that have been published (http://www.genome.gov/gwastudies), the analytic strategies varied from the use of the right eye26,27,29 or a randomly chosen eye28 to the averaged measurement from two eyes25,34,35. Conducting the analysis on one eye alone is a simple approach that avoids statistical model complexity. However, using partial data from one eye only might be statistically inefficient. Averaging ocular measurements between two eyes has been suggested to yield higher heterogeneity estimates than using information from one eye only, and therefore tends to have more power in genetic studies36. Using averaged ocular measurements has therefore been the convention in QTL linkage studies in the myopia genetics research community37-40. However, in a few scenarios the traits might be only moderately or weakly correlated between the two eyes41. In such cases, neither the use of data from one eye nor an average from both eyes is appropriate, because both ignore the phenotypic dissimilarity between the eyes.
A wide array of statistical approaches has emerged recently for the detection of pleiotropic genetic factors contributing to multiple correlated traits, which could also be applied to two-eye data (see Table 1). The simultaneous consideration of all correlated phenotypes has been shown to have greater statistical power to exploit pleiotropic genetic effects than univariate analysis42-45. The first approach is to combine dependent test statistics or estimators from the univariate analyses for a global assessment of association42,46-48. In brief, GWAS tests are conducted for the two eyes separately. The two test statistics (for example, z scores) are subsequently combined in a linear form weighted by the covariance matrix estimates42,48. Correcting for twice the number of markers is not necessary here, since for each marker only one global test is performed using the combined statistic. This simple approach also does not rely on any complicated model assumptions. The second approach is to transform multiple traits into an optimal single phenotype with enhanced heritability; one such example is principal component analysis43,49. This dimension reduction technique involves intensive computation, so its application to two-eye data might not be straightforward. The third is model-based joint analysis of bivariate traits, including generalized estimating equations (GEE)44,50-52, the mixed-effect model45,53 and the tree-based regression model54. Among these, the GEE model is the most statistically efficient for bivariate association tests44,52. To date, few statistical software packages incorporating model-based joint analyses of bivariate traits are available55, and much more effort should be devoted to this area.
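The first approach, combining the per-eye test statistics, can be sketched as follows. This is a simplified illustration and not necessarily the exact weighting scheme of the cited methods: it assumes equal weights for the two eyes, z scores that are each standard normal under the null hypothesis, and a correlation rho between the two statistics that has been estimated separately (for example from the genome-wide distribution of the paired statistics). The function name is hypothetical.

```python
import math


def combine_two_eye_z(z_right, z_left, rho):
    """Combine correlated z scores from right-eye and left-eye analyses.

    rho is the (assumed known) correlation between the two z scores under
    the null hypothesis. The sum z_right + z_left then has variance
    2 + 2 * rho, so dividing by its standard deviation gives a combined
    statistic that is standard normal under the null.
    Returns (combined z, two-sided p-value).
    """
    variance = 2.0 + 2.0 * rho
    if variance <= 0.0:
        raise ValueError("rho must be greater than -1")
    z = (z_right + z_left) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Perfectly correlated eyes add no information: the combined z equals 2.0.
z_same, p_same = combine_two_eye_z(2.0, 2.0, rho=1.0)
# Uncorrelated eyes reinforce each other: the combined z is about 2.83.
z_indep, p_indep = combine_two_eye_z(2.0, 2.0, rho=0.0)
```

The two calls show why accounting for the inter-eye correlation matters: treating highly correlated eyes as independent would overstate the evidence for association.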
Table 1. Summary of analytic approaches for quantitative trait two-eye data in genome-wide association studies
Approaches | Comments
Data from one eye
- either eye or a randomized eye | Simple; less powerful if the correlation between the two traits is low
Data from both eyes
Transform bivariate traits to one trait
- average measurements | Simple and efficient; statistically less efficient if the correlation between bivariate traits is low and missing data are present on either eye
- principal components analysis43,49 | Statistically powerful; complex; reduces the phenotypes to a single trait; computationally intensive
Combining univariate test statistics | Simple and powerful; capable of handling paired-eye traits not highly correlated; robust for partially missing trait values
Model-based approaches
- GEE44,50-52 | Statistically powerful; robust for various correlation structures; efficient on both normal and nonnormal traits; complex
- mixed-effect model50 | Statistically powerful; complex; robust for various correlation structures of multiple traits; computationally intensive
- tree-based regression54 | Analytically complex; capable of assessing multiloci association test for multivariate traits; computation extremely intensive
1.1.4 Meta-analysis of genome-wide association studies
Accumulated evidence suggests that most GWAS are underpowered to detect variants with small effect sizes (odds ratios (ORs) of 1.0 to 1.5), and that the associated SNPs generally explain only a small fraction of the genetic risk56. Meta-analysis provides a robust approach to enhancing statistical power and effective sample size by pooling evidence from multiple independent association studies57,58. The application of meta-analysis in ophthalmology has become standard practice for identifying genes associated with eye disorders26-29,34,35.
1.1.4.1 Imputation on genotyped data
If the individual GWAS are conducted on different genotyping platforms (Illumina or Affymetrix), a meta-analysis could only utilise the small subset of overlapping markers. In addition, if the causal polymorphism is a common untyped SNP that is in varying degrees of LD with the nearby genotyped SNP in different populations, the meta-analysis also has limited power to detect a true association in the combined data. One way to address these issues is to perform imputation using the HapMap reference panels, which provide a powerful framework for assessing the complete array of genetic variants (most of which are untyped). Step-by-step guidelines and techniques for performing imputation-based genome-wide meta-analysis were reviewed by de Bakker and colleagues58, and several methods have been developed for inferring the genotypes of untyped markers (for a review, see59). The basic idea behind imputation is to utilise the correlation between untyped and typed markers to infer the genotypes of untyped markers in each dataset. With imputation programs now available, we can impute untyped markers at the first stage, so that multiple datasets can be assessed for the same set of SNPs.
The accuracy of imputation largely depends on two factors. First, the
overall level of LD reflects the distance over which the genotypic correlations
permit imputation to extend, so the imputation is more accurate in high-LD
regions60. Second, the level of genetic similarity of the study population to the
reference panels affects the utility of the haplotypes copied from the reference
samples in imputing genotypes in the study populations. Imputation accuracy
based on HapMap reference panels is highest in European populations, which
are closely related to the HapMap CEU panel, and lowest in Africans with a
diverse genetic background. If GWAS are conducted in populations which are
not represented by the available high density reference panels in HapMap
data, for example, Malays and Indians, mixtures of reference panels are
recommended to maximize imputation accuracy61.
In addition, it should be noted that imputation is generally computationally intensive. IMPUTE60,62, MACH62, and BEAGLE63 are frequently used programs. Each has different strengths and weaknesses, and none of them is optimal for all situations64.
1.1.4.2 Statistics in the meta-analysis
Meta-analysis in the setting of genetic studies refers to combining
summary statistics of overlapping SNPs from multiple genetic association
studies. Since combining raw individual genotype and phenotype data across
studies to perform pooled analysis is difficult in general, the meta-analysis
based on the summary results is a surrogate to assess the association tests
across all datasets. Here, we describe a few meta-analysis methods in GWAS.
First, the simplest meta-analysis method is Fisher's method, $T_{fisher} = -2 \sum_{i=1}^{k} \log(p_i)$, where $p_i$ is the p-value of study $i$, $i = 1, \ldots, k$. $T_{fisher}$ follows a $\chi^2$
distribution with $2k$ degrees of freedom, where $k$ is the total number of datasets.
Since Fisher’s method takes only information from the p-values, it is
important to keep in mind that it should be applied to the markers with the
same direction of the effect to the susceptibility of the disease.
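As a minimal sketch of Fisher's method, the Python snippet below combines a list of p-values using only the standard library. The function name `fisher_combined_p` and the example p-values are assumptions made for illustration. Because the reference distribution has an even number of degrees of freedom (2k), its upper-tail probability has a closed form, so no statistical package is needed.

```python
import math


def fisher_combined_p(p_values):
    """Return (T_fisher, combined p-value) for independent study p-values."""
    k = len(p_values)
    if k == 0 or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    t_fisher = -2.0 * sum(math.log(p) for p in p_values)
    # Upper tail of a chi-square with 2k degrees of freedom:
    # P(X > t) = exp(-t/2) * sum_{j=0}^{k-1} (t/2)^j / j!
    half = t_fisher / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return t_fisher, math.exp(-half) * total


# Hypothetical p-values from three studies with the same direction of effect
t_stat, p_combined = fisher_combined_p([0.04, 0.01, 0.20])
print(f"T_fisher = {t_stat:.3f}, combined p = {p_combined:.5f}")
```

Note that the statistic ignores the sign of the effect, which is why the restriction to markers with a consistent direction of effect matters.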
Second, Mantel-Haenszel methods are commonly used for dichotomous
traits if the information for a 2 × 2 contingency table can be recovered from
each study65. When combining odds ratios, each study is usually weighted in proportion to the precision of its results.
Third, if a 2 × 2 table is not available in each study, for example when p-values
were obtained from a logistic regression framework in order to adjust for
potential confounding covariates, using z-score statistics to compute the meta-analysis p-values is the preferred option. The z-score statistics are widely used in practice for meta-analysis, since a z-score can easily be derived in each study and its sign carries the
direction of effect58. For quantitative traits, the pooled
weighted effect size is commonly calculated as the weighted average of the individual
effect sizes, using the inverse variance of each study as its weight. Such an approach is
also known as a fixed-effect model, under the assumption of the same expected
effect size between studies. The combined effect size is calculated as:
$T_{meta} = \frac{\sum_i T_i w_i}{\sum_i w_i}$,
where $T_i$ is the effect size of study $i$ and $w_i$ is the inverse variance of the effect
size of study $i$. The pooled standard error of $T_{meta}$ is:
$SE_{meta} = \frac{1}{\sqrt{\sum_i w_i}}$.
Then a pooled z-score is obtained as
$z_{meta} = \frac{T_{meta}}{SE_{meta}}$,
which follows a standard normal distribution under the null hypothesis (equivalently, $z_{meta}^2$ follows a chi-square distribution with 1 degree of freedom). In cases where the variance is not given in the
summary statistics, or the standard errors are not on the same scale (for example, the
quantitative trait is not measured in the same unit), the z-scores can instead be
summed across multiple studies, weighted by study sample size:
$Z_{meta} = \sum_i \frac{\beta_i}{SE_i} w_i$, where $w_i = \sqrt{N_i / N_{total}}$.
It is unlikely that every dataset for a meta-analysis is derived from a single homogeneous population with the same genetic effect. Therefore, it is important to assess the heterogeneity across datasets. A commonly used measure of between-study heterogeneity is Cochran’s Q statistic, for which large values favour the alternative hypothesis of heterogeneity. For datasets $i = 1, \ldots, k$, let $T_1, \ldots, T_k$ be the study-specific effect sizes. Cochran’s Q statistic is computed by:
$Q = \sum_{i=1}^{k} w_i (T_i - \bar{T})^2$, where $\bar{T} = \frac{\sum_{i=1}^{k} w_i T_i}{\sum_{i=1}^{k} w_i}$
and $w_i$ is the inverse of the estimated variance in dataset $i$. Under the null hypothesis of homogeneity, Q follows a chi-square distribution with $k-1$ degrees of freedom. A related statistic, $I^2$ (inconsistency), is derived from Q as $I^2 = 100\% \times (Q - df)/Q$, where $df = k-1$, and is
a measure of the percentage of total variation across studies that is due to
heterogeneity. Values of $I^2$ over 50% are commonly taken to indicate the presence of heterogeneity. If
evidence of heterogeneity is demonstrated, measures to identify its possible
cause are needed before drawing any explicit conclusion.
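The fixed-effect quantities described in this section can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the pipeline used in this thesis: the function names and the example effect sizes are hypothetical, the pooled estimate is taken as the weight-normalised sum so that it is consistent with the pooled standard error, the two-sided p-value is taken from the standard normal distribution, and I² is truncated at zero as is conventional.

```python
import math


def fixed_effect_meta(effects, std_errors):
    """Inverse-variance fixed-effect meta-analysis with Cochran's Q and I^2."""
    if len(effects) != len(std_errors) or len(effects) < 2:
        raise ValueError("need effects and standard errors from >= 2 studies")
    weights = [1.0 / se ** 2 for se in std_errors]        # w_i = 1 / SE_i^2
    w_sum = sum(weights)
    t_meta = sum(w * t for w, t in zip(weights, effects)) / w_sum
    se_meta = 1.0 / math.sqrt(w_sum)
    z_meta = t_meta / se_meta
    p_value = math.erfc(abs(z_meta) / math.sqrt(2.0))     # two-sided normal p
    q = sum(w * (t - t_meta) ** 2 for w, t in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, 100.0 * (q - df) / q) if q > 0 else 0.0
    return {"t_meta": t_meta, "se_meta": se_meta, "z_meta": z_meta,
            "p_value": p_value, "q": q, "df": df, "i2": i2}


def sample_size_weighted_z(z_scores, sample_sizes):
    """Combine per-study z-scores with weights sqrt(N_i / N_total)."""
    n_total = sum(sample_sizes)
    return sum(z * math.sqrt(n / n_total)
               for z, n in zip(z_scores, sample_sizes))


# Hypothetical per-study effect sizes (betas) and standard errors
result = fixed_effect_meta([0.10, 0.20, 0.15], [0.05, 0.05, 0.05])
z_n = sample_size_weighted_z([2.0, 2.0], [100, 100])
print(result)
print(f"sample-size weighted z = {z_n:.3f}")
```

The sample-size weighted form only needs the sign and magnitude of each study's z-score, which is why it remains usable when effect sizes are reported on different scales.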
Finally, given the presence of inter-study heterogeneity, a random-effect
model in the meta-analysis makes an assumption that individual studies are
sampled from populations that may have different true effect sizes.
Differences in observed effect sizes arise from two sources: random errors
and true variations in expected effect sizes. In practice, the meta-analysis is
conducted in diverse populations using different study designs, sample
ascertainment and phenotype definitions. More advanced statistical analyses
are expected to accommodate these issues in the trans-ethnic mapping66-68. No matter what statistical strategies are adopted in such scenarios, additional cohorts for replication or fine-mapping approaches are required to further investigate the true genetic variants of interest.
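A random-effect model can be sketched as follows. The text does not name a specific estimator, so the snippet below uses the DerSimonian-Laird moment estimator of the between-study variance (τ²) as one common choice; this choice, the function name `random_effects_dl` and the example data are assumptions made for illustration.

```python
import math


def random_effects_dl(effects, std_errors):
    """Random-effects pooled estimate using the DerSimonian-Laird tau^2."""
    k = len(effects)
    if k != len(std_errors) or k < 2:
        raise ValueError("need effects and standard errors from >= 2 studies")
    w = [1.0 / se ** 2 for se in std_errors]              # fixed-effect weights
    w_sum = sum(w)
    t_fixed = sum(wi * ti for wi, ti in zip(w, effects)) / w_sum
    q = sum(wi * (ti - t_fixed) ** 2 for wi, ti in zip(w, effects))
    c = w_sum - sum(wi ** 2 for wi in w) / w_sum
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0  # between-study variance
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    w_star_sum = sum(w_star)
    t_random = sum(wi * ti for wi, ti in zip(w_star, effects)) / w_star_sum
    se_random = 1.0 / math.sqrt(w_star_sum)
    return t_random, se_random, tau2


# Hypothetical example: two studies with clearly different effect sizes
t_re, se_re, tau2 = random_effects_dl([0.0, 0.4], [0.1, 0.1])
print(f"pooled effect = {t_re:.3f}, SE = {se_re:.3f}, tau^2 = {tau2:.3f}")
```

When τ² is greater than zero the pooled standard error is larger than under the fixed-effect model, reflecting the additional uncertainty from true variation in effect sizes between studies.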
1.1.4.3 Statistical challenges in analyzing multi-ethnic populations
A meta-analysis of GWAS across multi-ethnic groups enables us to
uncover the shared genetic variants underlying susceptibility to diseases, an
essential component of the next phase of GWAS to gain a broader view of
disease aetiology69. Heterogeneity, where the genetic effect exists but the effect
sizes vary in different populations, poses a major challenge in the multi-ethnic
meta-analysis. One example is the ε4 allele of the apolipoprotein E gene
(APOE), which is associated with Alzheimer’s disease in Caucasians with a per-allele odds ratio above 2, but is not significantly associated in African
Americans70. Therefore, the association signals at APOE are expected to be diluted
in pooled data comprising both Caucasians and Africans.
The ethnic-specific genetic risk offers a clue to understanding the interaction of the identified genetic locus with the undetermined environmental or genetic
variables that influence diseases. Gene-gene (epistatic) or gene-environment
interactions occur when the effect of a causal variant manifests
only under a certain genetic or environmental background71. The environment can play
a substantial role in influencing the effect sizes at a given susceptibility locus.
However, the detection of gene-gene or gene-environment interactions is a
daunting task, and little robust evidence has been provided for ocular diseases.
Heterogeneity can also occur when the genetic association between the
causal variant and the genotyped SNPs varies in different ethnic groups, or in
different samples from the same population (due to sampling error). The
different LD patterns can generate spurious associations in terms of both size
and direction of effects at the genotyped SNPs, confounding the underlying
true effect of the causal variant72.
Allele frequency also has an impact on the effect sizes of the risk allele. Wide variation in allele frequencies at susceptibility loci for complex diseases has been noted across populations. One such example is rs1061170 in the complement factor H gene, which has a large effect size for age-related macular degeneration in Caucasians but a much smaller one in East Asian populations; the risk allele is found at low frequencies in East Asians (5%), but at moderate frequencies in Caucasians (35%)73.
Allelic heterogeneity (or population-specific causal variants) is also
noteworthy, where the causal variants reside at different loci but likely in the
same functional unit across different populations. However, current meta-
analysis approaches for the combination of genetic association results in the
presence of allelic heterogeneity are underpowered74. Allelic heterogeneity is
believed to be enriched for rare variants, and gene-based or region-based
meta-analysis is expected to be used when exploring whole-genome or exome sequencing data
targeting rare variants75.
1.2 Recombination variation between populations
1.2.1 Recombination and genetic diversity
Homologous recombination is one of the key evolutionary determinants of
genomic diversity through the introduction of new haplotypes that alter the
extent and pattern of linkage disequilibrium (LD)76. The most striking feature of recombination in humans is its tendency to cluster in highly localized regions of the genome named ‘hotspots’, typically 1 to 2 kb in width77,78. Extensive heterogeneity in recombination rates has been catalogued between species79-81. In a comparison between the human and chimpanzee genomes,
remarkable differences in recombination and LD patterns were observed
even though 99% of the genomes were conserved82,83.
The meiotic recombination landscape is transient over evolutionary time84,85 and
highly variable between individuals78,86.
By producing novel combinations of alleles, recombination also exposes new
haplotypes to selection and drives the decay of
LD76. LD mapping is one of the key features underlying the success of
genome-wide assessment by linking the untyped functional polymorphism and
surrounding assayed markers87,88. Recombination rate has also been widely
used as a surrogate for LD in SNP imputation algorithms60. As such,
understanding recombination variation not only provides insight into the
evolutionary processes that have shaped genetic diversity throughout
human history, but also builds a foundation for genetic studies to disentangle
the genes that are associated with common diseases89.
1.2.2 Variation in inter-population recombination
Within the human species, our understanding of fine-scale differences in rates of recombination between human populations remains relatively limited.
Studies have shown that, on a broad scale, recombination rates generally remain evolutionarily conserved in the entire genome90-92. Of the
approximately 30,000 potential recombination hotspots estimated from
European, African and Han Chinese ancestries, only half are common to all three
populations, and the remainder are population-specific93. Significant variation
in recombination rates has been documented mostly in regions containing
polymorphic inversions90,94,95, although these comparisons have mainly been
performed at a broad scale across regions stretching megabases in length. At
a finer scale comparison across kilobases of the genome, population-specific
spikes or peaks in rates have similarly been reported92,96,97. Recently, Hinch
and colleagues have inferred that 2,500 recombination hotspots, defined as
localized regions of elevated recombination, are active in West Africans but
not Europeans, and that there appears to be a scarcity of hotspots that are
unique to the people of European ancestry98. This observation, along with
findings from the sequenced genes by the Seattle SNPs program96,97, suggests
that the intensity and location of recombination hotspots can differ
substantially across different populations. The International HapMap
Consortium5,99 provided the first large-scale database with sufficiently dense
genotyping across the human genome in multiple populations for investigating
recombination. However, no study to date has systematically
interrogated the whole genome for evidence of inter-population variation in recombination rates.
Such inter-population differences in recombination patterns can provide vital opportunities for fine-mapping the functional polymorphisms that underpin the association signals from large-scale genetic studies, by leveraging the different patterns of LD in multiple populations. Understanding the similarities in recombination rates across multiple populations is also important in bioinformatic analyses that depend on recombination rates, such as genotype imputation and surveys of the human genome for signatures of positive natural selection. Variation in recombination, particularly at disease-associated regions, is likely to have important consequences for genetic association studies.
1.2.3 Current approaches of quantifying recombination differences
Comparing recombination differences at a fine scale relies on the
availability of genetic maps at a high resolution. However, generating a
precise genome-wide map of recombination via direct experimental mapping
of hotspots is not feasible. Genetic maps of recombination commonly
employed in genotype imputation and population selection surveys, such as
those from the International HapMap Project5 or from the Singapore Genome
Variation Project100, are usually probabilistically inferred from genotype data
across at least tens of samples from a particular population by correlating
recombination events with the breakdown of linkage disequilibrium (LD)
observed across the population samples based on population coalescent
theory101. The resolution of these maps thus depends on the density of the
SNPs in these databases, and typically yields a resolution in the order of
kilobases. Such LD-based estimates of recombination rate are sex-averaged
over tens of thousands of generations, and are likely to be influenced by the
locus-specific demographic forces102,103. Despite the potential limitations,
these estimated rates of recombination have yielded remarkable insights into
the process of human evolution, leading to the identification of 13-basepair
motifs that are enriched in hotspots104 and the discovery of the PRDM9 gene
as a genetic modifier of recombination activity105.
Current metrics that prioritise genomic regions exhibiting differences in recombination profiles or differential presence of recombination hotspots tend to rely on ad-hoc thresholds, such as: (i) searching for recombination rates exceeding 5 cM/Mb over 2 kb in one population but yet less than 1 cM/Mb in the other population98; (ii) possessing a standardized rate of 10 over a 10kb region in one population but less than 3 or 1 in the second population106; (iii) a five-fold increase in the mean recombination rate in only one population107;
or (iv) spanning a genetic distance of more than 0.01cM within a physical
distance of less than 100kb in only one population but not the other108. Using
different definitions can alter the number and positions of the detected
hotspots, and simply querying whether hotspots overlap between populations
may neglect vital information on the local recombination profile.
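The sensitivity of hotspot comparisons to the chosen definition can be illustrated with a small sketch. The Python snippet below applies two simplified, hypothetical readings of the threshold rules listed above to the same toy data: an absolute rule in the style of criterion (i), and a fold-change rule loosely in the style of criterion (iii), here interpreted as a ratio between the two populations. The function names, the per-window rates and these interpretations are all assumptions made for illustration; they do not reproduce the cited methods.

```python
def differential_windows_absolute(rates_a, rates_b, high=5.0, low=1.0):
    """Windows where one population exceeds `high` cM/Mb and the other is below `low`."""
    flagged = []
    for i, (a, b) in enumerate(zip(rates_a, rates_b)):
        if (a > high and b < low) or (b > high and a < low):
            flagged.append(i)
    return flagged


def differential_windows_fold(rates_a, rates_b, fold=5.0):
    """Windows where the rate in one population is at least `fold` times the other."""
    flagged = []
    for i, (a, b) in enumerate(zip(rates_a, rates_b)):
        low_rate, high_rate = sorted((a, b))
        if high_rate > 0 and high_rate >= fold * low_rate:
            flagged.append(i)
    return flagged


# Hypothetical recombination rates (cM/Mb) in five consecutive 2 kb windows
pop_a = [0.5, 6.0, 12.0, 0.2, 3.0]
pop_b = [0.4, 0.8, 4.0, 1.5, 3.5]

by_absolute = differential_windows_absolute(pop_a, pop_b)
by_fold = differential_windows_fold(pop_a, pop_b)
print("absolute-threshold rule flags windows:", by_absolute)
print("fold-change rule flags windows:", by_fold)
```

On these toy data the two rules flag different sets of windows, and neither rule flags the third window even though its rate differs threefold between the populations, which illustrates how threshold-based comparisons can discard information on the local recombination profile.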
1.3 Refractive errors and the aetiology of myopia
Refractive errors broadly comprise two types of ocular abnormalities:
spherical errors and cylindrical errors. Spherical errors include myopia
(commonly known as nearsightedness) and hyperopia (farsightedness), while
the condition of cylindrical errors is usually called astigmatism.
Myopia, a multifactorial disorder, represents one of the most common
refractive errors. It is most often associated with subsequent long-term
pathological outcomes. Myopia is caused by a variety of ocular, optical or
functional difficulties manifested while visually interacting with the external
environment109. Environmental factors such as the extent of near work, level
of educational attainment and amount of outdoor activities have been
documented to affect myopia development110. On the other hand, compelling
evidence points to the genetic basis of myopia and more than twenty myopic
loci have been reported from genome-wide linkage studies, some of which show evidence of replication in independent studies33. In the last five
years, genome-wide association studies have suggested that several genes are
associated with myopia, which are currently awaiting further confirmation and
the assessment of their biological function.
In contrast to myopia, very little data is currently available with regard to elucidating the aetiology of astigmatism. No environmental factors have been recognised to influence the development of astigmatism. Although
astigmatism is a heritable trait, no prior study has reported any genes
associated with astigmatism.
1.3.1 Types of refractive errors
1.3.1.1 Myopia, hyperopia and ocular biometrics
Myopia occurs when light rays are focused in front of the retina, leading to blurred vision
of far objects (Figure 2A). Conversely, when light rays are focused behind the retina, near objects are blurred and the condition is called hyperopia.
Myopia poses a considerable public health burden. It is highly prevalent, especially in urban areas of East and Southeast Asia, where 80% of children completing high school have myopia109. Myopia-associated pathological
complications could lead to degenerative changes in the retina and the
choroid, which are not prevented by optical correction. This subsequently
increases the risk of visual impairment through myopic maculopathy,
choroidal neovascularisation and retinal detachment111,112.
Spherical refractive errors are measured on a continuous dioptric scale, i.e. the
optical power of the lens, in diopters (D), that is necessary to correct the myopic or
hyperopic eye, and are generally quantified using the spherical equivalent (SE;
the algebraic sum of the value of the sphere and half the cylindrical value).
Various categorisations have been applied when describing different refractive
states. By convention, an eye presenting with an SE above +0.5 D is referred
to as being hyperopic; a value between -0.5 and +0.5 D is referred to as being
emmetropic. Myopia is defined as an SE of -0.50 D or less (or, in some studies, -1.00 D or less), and can be
further divided into mild (-3.0 < SE ≤ -0.5), moderate (-6.0 < SE ≤ -3.0) and
high myopia (SE ≤ -6.0).
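The spherical equivalent and the categories above translate directly into a short calculation. The Python sketch below is illustrative only: the function names are hypothetical, the -0.5 D cut-off for myopia is used (rather than the alternative -1.0 D), and an SE of exactly +0.5 D is treated as emmetropic.

```python
def spherical_equivalent(sphere, cylinder):
    """SE in diopters: sphere plus half the cylinder (algebraic sum)."""
    return sphere + cylinder / 2.0


def classify_refraction(se):
    """Map an SE value (in D) to the refractive categories described in the text."""
    if se <= -6.0:
        return "high myopia"
    if se <= -3.0:
        return "moderate myopia"
    if se <= -0.5:
        return "mild myopia"
    if se > 0.5:
        return "hyperopia"
    return "emmetropia"


# Hypothetical prescription: sphere -2.50 D, cylinder -1.00 D
se = spherical_equivalent(-2.50, -1.00)
print(f"SE = {se:+.2f} D -> {classify_refraction(se)}")
```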
The spherical refractive error status is contributed by the underlying ocular
biometrics: the optical power of the cornea and lens, and the axial length (AL)
of the eyeball (Figure 2A). AL is composed of the anterior chamber depth
(ACD), lens thickness and vitreous chamber depth (VCD)113. In particular, myopic subjects are more likely to have a longer axial length. A 1 mm increase in AL, mainly through the elongation of the vitreous chamber, is equivalent to a myopic shift of -2.00 to -3.00 D without corresponding changes in the optical power of the cornea and lens. In contrast, the differences in lens thickness and corneal curvature (CC) between myopic and emmetropic subjects are minimal114. Therefore, the control of AL, and in particular of excessive
elongation of the eyes, is crucial for achieving normal vision in humans.
Figure 2. Cross-sectional view of the human eye structure. A) myopic eye; B)
astigmatic eye. (Labelled structures: cornea, lens, retina, axial length.)
1.3.1.2 Astigmatism
Cylindrical refractive errors commonly refer to astigmatism, where the
light rays do not bend properly to achieve a single focus point on the retina
(Figure 2B). While astigmatism comprises corneal and non-corneal components, it typically results from the unequal curvature of the two principal meridians in the anterior surface of the cornea, which is known as corneal astigmatism115,116.
The presence of a high degree of astigmatism during early development is
believed to be associated with refractive amblyopia117-119, as evidenced by
decreased best-corrected visual acuity which cannot be remedied by external
corrective lenses. Early abnormal visual input caused by uncorrected
astigmatism can lead to orientation-dependent visual deficits, despite optical correction of visual acuity later in life120. In addition, it has been suggested
that optical blurring by astigmatism may predispose to the development of
myopia121-124. Astigmatism is highly prevalent across most populations, with at least 1 in 3 adults above 30 years of age suffering from astigmatism of 0.5
D or greater125.
1.3.2 Experimental animal myopia models
1.3.2.1 Deprivation myopia and induced myopia
Deprivation myopia occurs when vision is degraded by limited illumination or a degraded visual image, e.g. as a result of wearing a diffusing goggle (form-deprivation myopia), or a negative/positive spectacle lens
(induced myopia) in front of the eye. Such a phenomenon has been observed in a wide range of species including the chicken, fish, tree shrew, rhesus monkey, guinea pig and mouse126. A negative lens in front of the eye induces
hyperopic defocus (image behind the retina photoreceptors) that results in the
elongation of the eyeball to compensate for the optical effects of the lens.
Therefore the eye becomes myopic because of the excessively elongated axial
length. Analogously, a positive lens causes the image to form in front of the
retina (myopic defocus) and reduces the ocular growth rate. Nevertheless,
findings on the induction of hyperopia by positive lenses are less consistent in primates than in chicks and mice. It is noteworthy that, in
the animal model, such induced myopia or hyperopia generally shows a
significant degree of recovery after lens removal127.
Myopia induction in animals following alteration of the visual input requires the eye or brain to be able to distinguish myopic defocus from hyperopic defocus. Although the retina produces biochemical signals which control eye growth in response to local defocus, it has become clear that both retinal and central elements play roles in the emmetropisation process, with the central nervous system exhibiting fine-tuning128. Of particular interest are
genes whose expression changes in opposite directions in ocular tissues when subjects
alternately wear negative and positive lenses. The transcription factor ZENK, a
so-called ‘STOP’ sign for myopia, is found to be expressed differently within
glucagonergic amacrine cells in myopia- versus hyperopia-induced mouse models129. Dopamine has been shown to be involved in the optical regulation
of eye growth in myopia-induced animal models, while its gene expression is
also mainly restricted to the amacrine cells in the retina130. Muscarinic
receptors are known to regulate several important physiologic processes in eye
growth, and antagonists to these receptors, such as atropine and pirenzepine,
are effective in stopping the excessive ocular growth that results in myopia131.
However, the primary mechanism of these genes, as well as the pathway used by the
eye to detect the direction of defocus, remains unclear.
1.3.2.2 Emmetropisation and the role of scleral changes in eye growth
The endogenous process of matching axial length to the focal plane during
eye growth is called emmetropisation. This biological mechanism involves the
detection of myopic or hyperopic defocus at the retina, signal transmission
across the retinal pigment epithelium and choroid, and alteration of the scleral
matrix132. In myopic human eyes, it is speculated that the emmetropisation
mechanism is defective, with a loss of ability to use myopic defocus to slow
the growth of the eyes. In this case, eyes will gradually become more myopic.
This diminished emmetropisation was also observed in an animal study
in which wearing a positive lens had less of an effect in older tree shrews with
induced myopia: most of their eyes remained myopic while wearing the lens,
in contrast to what was found in infant tree
shrews133.
A large body of evidence shows that the changes in refraction in animal
models are primarily due to changes in AL, rather than in corneal or lens
parameters. In the process of AL elongation, scleral remodelling plays a
pivotal role in eye size regulation, with scleral thinning and changes in
collagen fibril architecture through the turnover of extra-cellular matrix
(ECM) materials; this is evident in mammals134.
1.3.2.3 Peripheral refraction
There is a growing interest in understanding the role of peripheral
refraction in controlling eye growth. The visual optical device implemented in
the animal model that affects the entire field of view can alter the pattern of peripheral refraction as well. Emerging evidence in animal studies suggests
that the peripheral retinal signals can dominate axial growth and central
refractive development when there are conflicting visual signals in the central
and peripheral retina135. Smith and colleagues found that infant monkeys with
peripheral form deprivation but intact central vision were significantly less
hyperopic or more myopic compared to the age-matched controls, suggesting
that the peripheral retina contributes to emmetropising responses136. A recent
study in monkeys also showed that foveal ablation by itself did not produce
alterations in either the central or peripheral refractive errors of treated eyes137.
However, emmetropisation appears not to be affected by changes in
peripheral refraction in chicks, possibly due to different patterns in the
distribution of photoreceptors on the retina in chicks and primates138.
1.3.3 Roles of environmental factors in controlling human refraction
Numerous studies support factors such as the level of educational attainment, near work and outdoor activities having an effect on myopia onset or progression. Evidence has also recently emerged to support a potential role of peripheral refractive errors in myopia development.
Level of education has been consistently associated with myopia across different ethnic groups in a large number of epidemiological studies, where higher academic achievements appear to be positively correlated with myopia139-141. Education level usually correlates with the time spent on reading and writing, so this can be treated as a surrogate of near work.
Near work has long been regarded as an important factor for the
development of myopia. Under the accommodation theory, the eye increases
its optical power to maintain a clear visual image when viewing a near object. The active accommodation to a near target is the driver that moves the retina toward the conjugate point, resulting in axial eye elongation142. In humans, clinical evidence has shown that atropine, a potent cycloplegic, inhibits myopic progression via a non-accommodative mechanism143. The important role of near work may also be inferred from the fact that more intensive near work coincides with the increased prevalence of myopia seen in the past few decades. In a longitudinal cohort study in school children aged 7 to 9 years (Singapore cohort study for risk factors of myopia; SCORM), students who read more than two books per week were three times more likely to have higher myopia compared to those who read less than two books per week144. In a population-based sample of 12-year-old Australian schoolchildren recruited in the Sydney Myopia Study (SMS), a positive association of continuous reading time (>30 min) and close reading distance (<30 cm) with myopia was also observed145. It is noteworthy that the association between near work and myopia assessed so far is probably not very precise, partially because of the inaccuracy of questionnaires in measuring the amount of near work. Although it remains to be elucidated whether the total amount of time of near work, or the modifiable reading habit
(i.e. intensity of reading, reading posture, or proper lighting), is associated with myopia, most ophthalmologists recommend taking breaks during periods of high-intensity near work to prevent the onset of myopia.
A recent study reported that the number of hours of sports and outdoor activity, rather than the number of reading hours per week, was significantly associated with myopia in children aged 8 to 13 years in the U.S.146. Similarly,
students in Australia who had performed high levels of outdoor activity were
found to be more hyperopic than those involved in a low amount of outdoor
activity147, where the protective effect of outdoor activities appeared to be
associated with the total time spent outdoors, rather than sports per se. The
biological basis however remains unclear. The authors postulated that high
light intensity promotes the retinal transmitter dopamine, which can inhibit
eye elongation147,148. Recent animal experiments suggested that the bright light
level could significantly retard the development of form-deprivation myopia149. An alternative possibility is the greater depth of three-dimensional
structures from a large outdoor field, which could impact the patterns of
defocus on the retina150.
Studies in primates have shown that hyperopic defocus in the retinal
periphery could stimulate foveal myopic progression135. In humans, myopes
tend to be less myopic off-axis (relative peripheral hyperopia) because of their
relatively less oblate ocular shape, while hyperopic eyes tend to be less
hyperopic off-axis (relative peripheral myopia)151,152. In a longitudinal cohort of about
1,000 children aged between 6 and 14 years in the Collaborative Longitudinal
Evaluation of Ethnicity and Refractive Error (CLEERE) study in the U.S.,
children who became myopic had more hyperopic relative peripheral
refractive errors than emmetropes from 2 years before the onset through to 5
years after the onset of myopia153. Following 2,043 non-myopic third-grade
children for five years, Mutti and colleagues recently found that the
association between hyperopic peripheral defocus and myopia onset was
inconsistent across different ethnic groups, as the highest risk for myopia was
noted in Asian children154. Myopia progression was greater per diopter of
more hyperopic relative peripheral refractive error, but only by a small amount
(−0.024 D per year).
Other environmental factors such as higher glycaemic index diets, body
mass index, early exposure to night lights and depth of field have also been postulated to influence myopia development155. Collectively, the environmental
factors account for a small proportion of the variation in myopia156.
1.3.4 Genetic basis of myopia
Evidence from family and twin studies has supported a substantial genetic
component in myopia, astigmatism and underlying ocular parameters156-163.
The following section mainly highlights the findings for myopia, and relevant ocular biometrics. The genetic basis for corneal astigmatism will be addressed in Chapter 4.
Syndromic myopia refers to the condition where high myopia is one of the symptoms of a Mendelian disorder, such as connective tissue disorders
(Marfan and Stickler syndromes). In the following section, I will focus on simple non-syndromic myopia rather than degenerated syndromic myopia.
1.3.4.1 Familial aggregation and segregation
Familial aggregation represents a scenario where relatives who share genes and/or environmental factors are more phenotypically similar than unrelated individuals. The sibling recurrence risk ratio (λs) refers to the risk of being affected given an affected sibling, relative to the risk of being affected in the overall population. The study of familial aggregation is a central theme in
the genetic field, as it helps to evaluate the relevant genetic and environmental
effects.
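The recurrence risk ratio is a simple quotient, sketched below in Python. The function name and the counts are hypothetical and chosen purely for illustration; they are not taken from any of the studies cited in this section, and the sketch ignores ascertainment corrections that a real analysis would require.

```python
def sibling_recurrence_risk_ratio(affected_sibs, total_sibs, prevalence):
    """lambda_s = P(affected | affected sibling) / population prevalence."""
    if total_sibs <= 0 or not 0.0 < prevalence <= 1.0:
        raise ValueError("need a positive sibling count and a prevalence in (0, 1]")
    sibling_risk = affected_sibs / total_sibs
    return sibling_risk / prevalence


# Hypothetical data: 60 of 200 siblings of affected probands are affected,
# and the population prevalence of the trait is 15%.
lambda_s = sibling_recurrence_risk_ratio(60, 200, 0.15)
print(f"lambda_s = {lambda_s:.2f}")
```

A value above 1 indicates familial clustering, although by itself it cannot separate shared genes from shared environment.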
Ample evidence shows the clustering of myopia within a family. The Old
Order Amish (OOA) is an isolated population with a homogeneous agrarian
lifestyle, representing an ideal population to investigate the genetic heritability
and familial aggregation of refractive errors. In 967 siblings in 269 OOA
families, λs were estimated at 2.36 (95% CI: 1.65-3.19) and 4.52 (95% CI:
2.27-8.38) for mild (SE ≤ -0.5 D) and moderate myopia (SE ≤ -2.5 D), respectively164. In 296 siblings in a cohort from the United Kingdom, λs were
approximately 5 to 20 for high myopia (SE ≤ -6.0 D), and 1.5 to 3 for moderate myopia (-3.0 D ≤ SE ≤ -1.0 D)165. Similarly, in 1,259 nuclear
families of Tehran, for myopia at threshold of SE at -0.5 D or -2.0 D, λs
ranged from 2.09 to 3.86 in siblings and 1.82 to 3.81 among parent-offspring
pairs166. Strong familial aggregation for myopia was also observed in 759
siblings in 241 families recruited from the Salisbury Eye Evaluation Study in
Maryland167, where the odds ratios of myopia (SE ≤ -0.5 D) or moderate myopia (SE ≤ -2.0 D) were greater than 2 for an individual with a myopic sibling relative to an individual with a non-myopic sibling. Interestingly, in the Framingham Eye
Study cohort, the estimated odds ratio of myopia (SE ≤ -1.0 D) was 5 for siblings with an age difference of 2 years but fell to half that for an age difference of 10 years, indicating that the between-sibling association for myopia weakens as the age difference between siblings increases168.
Eye size has also been shown to vary in children according to the parental
history of myopia. A cross-sectional study of 716 schoolchildren (aged 6 to 14
years) demonstrated that children with two myopic parents had longer eyes
and less hyperopic refractive error than children with only one myopic parent
or no myopic parents169. A one-year follow-up study in 4,468 Hong Kong
Chinese children aged 5 to 16 years showed more rapid changes in AL growth
and myopia progression among children with a stronger parental history of
myopia170. In SCORM, parental myopia was significantly associated with the
3-year change in AL after adjustment for school, age, sex, race, and the number of
books read per week114. In contrast, in a longitudinal cohort of the Beaver Dam
Eye Study, although there was strong familial aggregation of myopia among siblings (λ = 4.18), no statistically significant correlation between siblings in changes of refraction, cylinder power or astigmatism was seen171.
Based on the evidence from familial aggregation studies, segregation analysis has been adopted to elucidate the mode of inheritance for myopia: a single gene of large effect (monogenic) versus multiple genes of small or modest effects (polygenic). A complex segregation analysis conducted in nuclear families of both Japanese and European ancestries failed to support the influence of a single major gene on refraction172. Furthermore, Klein and colleagues showed that the
mode of inheritance of refraction was polygenic, rather than single-gene
Mendelian, in 620 extended pedigrees of 2,138 Caucasians from
the Beaver Dam Eye Study40.
1.3.4.2 Estimates of heritability
Heritability is the proportion of phenotypic variation that is attributable to
genetic variation among subjects. Both twin and family data are
utilised to estimate the heritability of the trait of interest. In the simplest
scenario, twin studies examine whether the identical monozygotic (MZ) twins
raised together are more similar to each other for a given phenotype than
dizygotic (DZ) twins, who share on average half of their segregating genes. The correlation between
MZ twins is expected to be twice that of DZ twins under the assumption of
additive genetic effects. The heritability ($h^2$) is thus approximately twice the
difference between the correlations of MZ and DZ twin pairs:
$$h^2 = 2(r_{MZ} - r_{DZ})$$
where $r$ is the intra-pair correlation.
Since the greater phenotypic similarity of MZ twins relative to DZ twins
could also be due to shared environmental factors, heritability studies rely on the Equal
Environment Assumption (EEA): that MZ and DZ twin pairs are exposed
to shared environmental influences to the same degree.
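The twin-correlation formula above can be evaluated directly. The following is a minimal sketch under the Equal Environment Assumption; the shared-environment and unique-environment terms are the standard Falconer companions to the heritability formula and are an addition not discussed in the text, and the correlations used are hypothetical rather than values from the studies cited here.

```python
def falconer_estimates(r_mz, r_dz):
    """Falconer-style variance components from intra-pair twin correlations.

    h2: additive genetic variance, 2 * (r_mz - r_dz)
    c2: shared environment, 2 * r_dz - r_mz   (assumed extension)
    e2: unique environment, 1 - r_mz          (assumed extension)
    """
    for name, r in (("r_mz", r_mz), ("r_dz", r_dz)):
        if not -1.0 <= r <= 1.0:
            raise ValueError(f"{name} must lie in [-1, 1]")
    h2 = 2.0 * (r_mz - r_dz)
    c2 = 2.0 * r_dz - r_mz
    e2 = 1.0 - r_mz
    return {"h2": h2, "c2": c2, "e2": e2}


# Hypothetical correlations for spherical equivalent in MZ and DZ pairs.
estimates = falconer_estimates(r_mz=0.80, r_dz=0.45)
print(estimates)
```

The three components sum to one by construction. Estimates outside the range 0 to 1 suggest that the simple additive model does not fit the data, which is one reason the structural equation approach is preferred in practice.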
Family studies are also used to estimate heritability across more
complicated familial relationships. Given the expected genetic covariance between
individuals of differing degrees of relatedness, the Structural Equation
Modelling (SEM) method is widely used for heritability estimation in twin
and family data; it solves a series of linear equations simultaneously,
permitting the joint estimation of the parameters.
The heritability estimate was 61% (95% CI: 34% - 88%) for ocular refractive error (SE) in 759 siblings of 241 families residing in Maryland167.
Similarly, the estimate was 69% (95% CI: 58%–85%) in the OOA population164,173. Twin studies have yielded higher heritability estimates for refractive
errors, with 83% (95% CI: 77% - 87%) for 90 MZ and 86 DZ female twins in
Finland, and 85% in 226 MZ and 280 DZ twin pairs aged 49 to 79 years from the United Kingdom using a model with additive genetic and environmental components157. Recently, a large twin study comprising 1,152 MZ and
1,149 DZ twin pairs yielded similar results (heritability for SE = 0.77; 95% CI:
0.68-0.84)162. However, results from twin studies should be interpreted with caution. Heritability might be overestimated, as the excess correlation in MZ twins compared to DZ twins might also be due to greater sharing of environmental exposures within MZ pairs.
Ample evidence supports an underlying genetic basis, not only for refractive error but also for underlying ocular dimensions. The majority of heritability estimates for ocular parameters are found to be greater than 0.5.
For instance, in 2,375 participants in a population-based family cohort from the Beaver Dam Eye Study, Klein and colleagues reported that the adjusted heritability estimates were 0.58 ± 0.13 for SE, 0.67 ± 0.14 for AL, 0.78 ± 0.14 for ACD, and 0.95 ± 0.11 for CC159. In 723 participants from 132 pedigrees in
Australia in the Genes in Myopia (GEM) family study, similar adjusted heritability estimates were reported for SE (0.50 ± 0.05), AL (0.73 ± 0.04) and
ACD (0.78 ± 0.04), whereas the estimate for CC was much lower (0.16 ± 0.06)156. In 345 MZ and 267 DZ twin pairs between the ages of 18 and 88 years in Australia, heritability was estimated in men and women separately. The heritability values in men and women were 0.88 and 0.75 for SE, 0.94 and 0.92 for AL, and
0.51 and 0.78 for ACD, respectively158. For a comprehensive overview of the heritability of ocular traits, please refer to a recent review article174.
1.3.5 Genetic loci associated with or linked to refractive errors
Genetic variants that predispose an individual to an increased risk of
developing myopia have been studied for several decades. The majority of
near-sighted individuals develop juvenile onset myopia (or school myopia),
with a peak incidence between 10 and 16 years of age33. Pathological myopia, a more severe form,
is distinguished from juvenile-onset myopia by its earlier
age of onset, its greater degree of myopia (≤ -6.0 D), and a higher frequency
of fundus abnormalities. As a result, the following questions arise: what
are the genetic variants involved in the various levels of myopia? Are the genetic
factors for pathologic myopia (or early-onset myopia) similar to those for
common and moderate myopia (juvenile-onset myopia)?
1.3.5.1 Myopic loci identified from genome-wide linkage studies
More than 20 putative myopia loci have been implicated in the last two decades through genome-wide linkage studies (Table 2). Most studies utilised families with a history of high myopia, a severe form of myopia usually defined as SE ≤ -6.00 D; as a result, a number of genetic loci have been reported to be linked to high myopia. These include loci on chromosomes 2q37.1 [175], 4p22-q27 [176], 5p15.33-p15.2 [177], 5p15.1-p13.3 [178], 7p15 [179],
7q36 [180], 10q21.1 [181], 12q21-23 [182], 14q22.1-q24.2 [183], 17q21-22 [184],
18p11.31 [185], Xq23-27.2 [186] and Xq28 [187]. Of these loci implicated
in high myopia, several have been replicated in independent samples. For
instance, chromosomes 7p15 (MYP17) and 7q36 (MYP4) were subsequently
replicated for quantitative refraction in American pedigrees [188]. The
chromosomal locus 12q21-23 (MYP3) was found to be associated with high
myopia in an independent Caucasian sample from Europe189. The genetic
locus 18p11.31 (MYP2), identified from Chinese and American multi-generation
families, showed significant association in 15 Chinese families190.
Further sequencing of these linkage peaks within MYP2, however, failed to
establish any genes involved in myopia development in the Caucasian
population191. A locus on chromosome Xq23-27.2 (MYP13), adjacent to MYP1 (Xq28), was replicated in an independent Chinese family study192. Although
a few genetic regions, i.e. loci at 7p15 (MYP17) and 7q36 (MYP4), have
shown association to quantitative refraction in replication studies, others,
including 18p (MYP2) and 12q (MYP3), do not appear to play a similar role
in moderate or mild myopia193,194.
A study of a milder form of myopia, with affected status defined as SE ≤ -1.00 D,
mapped a myopia locus to chromosome 22q12 (MYP6) in a large sample set of Ashkenazi Jewish individuals195. MYP6 was replicated in two
studies using either the same definition of myopia196 or the quantitative trait of
refraction197.
A few genome-wide linkage studies have been conducted for quantitative
refraction. The implicated regions were chromosomal loci
3q26, 4q12, 8p23 and 11p13 in 221 dizygotic twin pairs39 and 1p36 in 49
multigenerational Ashkenazi Jewish families198. Hammond and colleagues
identified the strongest linkage signal on chromosome 11p13 (MYP7), which
contains the biologically relevant candidate gene paired box 6 (PAX6)39.
Chromosome 8p23 (MYP10) was further replicated in an OOA population for mild myopia199. Chromosome 3q26 (MYP8) was also mapped to quantitative refraction and
hyperopia via a linkage scan188. Wojciechowski and colleagues performed a
genome-wide linkage meta-analysis for refractive error in pooled samples of
Caucasians, OOAs, African Americans and Ashkenazi Jewish subjects, but
chromosome 1p36 (MYP14) showed no significant association200. In an
international collaborative effort combining 254 pedigrees of Caucasian
families for quantitative refraction, the MYP14 genetic locus was replicated
with multipoint linkage scores above 1.5201.
Although twenty regions in the human genome susceptible to refractive
error and myopia have been implicated, no myopia genes have been
consistently identified from these regions by fine-mapping or deep-
sequencing. In addition, it is still an open question whether there is any
overlap between genes that might be responsible for both the rarer forms of
high myopia and the more common, less severe juvenile-onset myopia. This scenario reflects the complexity in the disease architecture of myopia pathogenesis.
AL is the most important endophenotype of myopia. Presently there have been only two genome-wide linkage studies performed in European descent populations that suggested the presence of AL quantitative trait loci (QTLs) on chromosomes 2p24202 and 5q (at 98 centimorgans) along with two classical
genetic loci (MYP3 at 12q21 and MYP9 at 4q12)203, but there are no reports
of any genes that are indisputably confirmed to be associated with AL.
Table 2 Myopia loci identified from genome-wide linkage studies.

| Location | Myopia locus | Study population | Inheritance | Affected status | AAO (yrs) | LOD | Ref |
|---|---|---|---|---|---|---|---|
| 1p36 | MYP14 | Ashkenazi Jewish | QTL | SE ≤ -1.00; mean SE -3.46 | na | 9.54 | 198 |
| | | Caucasian | QTL | mean SE -5.74 | na | 1.5 | 201 |
| 2q37.1 | MYP12 | Caucasian | AD | SE ≤ -7.25; mean SE -14.46 | <12 | 4.75 | 175 |
| 3q26 | MYP8 | English | QTL | mean SE < 0 [-12.12, 7.25] | na | 3.7 | 39 |
| | | English | AD | mean SE -0.55 | na | | 204 |
| | | American | QTL | mean SE 0.44 [-12.12, 8.38] | na | P = 5.34 × 10⁻⁴ | 188 |
| 3q28 | MYP unknown | Caucasian | AD | SE ≤ -5.00 [-5, -18] | na | 11.5 | 205 |
| 4q12 | MYP9 | UK | QTL | mean SE < 0 [-12.12, 7.25] | na | 3.3 | 39 |
| 4p22-q27 | MYP11 | Han Chinese | AD | SE [-5, -20] | <6 | 3.11 | 176 |
| 5p15.33-p15.2 | MYP16 | Han Chinese | AD | SE ≤ -6.00; mean SE -11.59 | na | 4.81 | 177 |
| 5p15.1-p13.3 | MYP19 | Han Chinese | AD | SE ≤ -6.00 & AL > 26 | <12 | 3.02 | 178 |
| 7p15 | MYP17 | French and Algerian | AD | SE ≤ -6.00 | na | 4.07 | 179 |
| | | African American | QTL | SE: 2.87 ± 3.58 | na | 5.87 | 38 |
| | | American | QTL | mean SE 0.44 [-12.12, 8.38] | na | P = 9.43 × 10⁻⁴ | 188 |
| 7q36 | MYP4 | French and Algerian | AD | SE ≤ -6.00 | na | 2.81 | 180 |
| | | American | QTL | mean SE 0.44 [-12.12, 8.38] | na | P = 2.32 × 10⁻³ | 188 |
| 8p23 | MYP10 | UK | QTL | -12.12 ≤ SE ≤ 7.25 | na | 4.1 | 39 |
| | | Old Order Amish | AD | SE ≤ -1.00 | na | 2.03 | 199 |
| 10q21.1 | MYP15 | Caucasian | AD | SE ≤ -6.00 | na | 3.22 | 181 |
| 11p13 | MYP7 | UK | QTL | SE [-12.12, 7.25] | na | 6.1 | 39 |
| | | Ashkenazi Jewish | AD | SE ≤ -1.00 | na | >2 | 206 |
| 12q21-23 | MYP3 | German/Italian | AD | SE ≤ -6.00 | 5.9 | 3.85 | 182 |
| | | UK | AD | SE ≤ -6.00 | na | 2.54 | 189 |
| 14q22.1-q24.2 | MYP18 | Chinese | AR | SE ≤ -6.00 | na | 2.19 | 183 |
| 17q21-22 | MYP5 | English/Canadian | AD | SE ≤ -6.00; mean SE -13.95 | 8.9 | 3.17 | 184 |
| 18p11.31 | MYP2 | American & Chinese | AD | SE ≤ -6.00 | 6.8 | 9.59 | 185 |
| | | Hong Kong Chinese | | SE ≤ -6.00 | na | 2.1 | 207 |
| 22q12 | MYP6 | American families of Jewish descent | AD | SE ≤ -1.00 | na | 3.54 | 195 |
| | | Jewish descent | AD | SE ≤ -1.00 | na | 4.73 | 196 |
| | | American | QTL | mean SE 0.44 [-12.12, 8.38] | na | P = 0.0033 | 188, 197 |
| Xq23-25 | MYP13 | Han Chinese | XR | SE ≤ -6.00 | <6 | 2.75 | 186 |
| Xq23-27.2 | | Han Chinese | XR | SE ≤ -6.00 | na | 2.79 | 192 |
| Xq28 | MYP1 | Danish | XR | SE ≤ -6.00 | 1.5-5 | 4.8 | 187 |
| | | Asian Indian | XR | SE ≤ -6.00 | na | 5.3 | 208 |

SE – spherical equivalent in diopters; AL – axial length in mm; LOD – logarithm of odds; AAO – age at onset of myopia; na – not available.
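To give a feel for the LOD scores reported in Table 2, the sketch below computes a two-point LOD score in the simplest textbook setting: phase-known, fully informative meioses, with the likelihood at a given recombination fraction compared against free recombination (0.5). This is an assumption made for illustration; the studies in Table 2 used parametric, non-parametric or multipoint methods on real pedigrees, and the function name and counts are hypothetical.

```python
import math


def two_point_lod(recombinants, meioses, theta):
    """LOD score for `recombinants` observed in `meioses` informative,
    phase-known meioses at recombination fraction `theta`."""
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    non_recombinants = meioses - recombinants
    likelihood_linked = (theta ** recombinants) * ((1.0 - theta) ** non_recombinants)
    if likelihood_linked == 0.0:
        # Observing a recombinant is impossible when theta is exactly 0.
        return float("-inf")
    likelihood_unlinked = 0.5 ** meioses
    return math.log10(likelihood_linked / likelihood_unlinked)


# Hypothetical family data: 10 informative meioses, no recombinants.
lod_at_zero = two_point_lod(recombinants=0, meioses=10, theta=0.0)
print(f"LOD at theta = 0: {lod_at_zero:.2f}")
```

Ten recombinant-free meioses give a LOD of 10 × log10(2), just above the conventional threshold of 3 for declaring linkage, which illustrates why large or multiple pedigrees are needed to reach the scores listed in the table.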
1.3.5.2 Candidate gene studies
A number of candidate genes have been queried in genetic studies of
refractive errors based on their potential biological function, differential gene
expression in animal models, or until recently, their physical locations residing
in known myopia loci identified from the linkage studies or genome-wide association studies. Table 3 summarises the candidate genes studied for myopia at the time of writing this thesis.
Scleral thinning is an important process in the development of myopia. Collagen fibrils of the ECM make up 90% of the human sclera. Mutations in collagen genes cause autosomal dominant syndromes, such as osteogenesis imperfecta and Ehlers-Danlos syndrome (collagen, type 1, COL1A1) and Stickler’s syndrome (collagen, type 2, COL2A1). A positive association with COL1A1 was observed, but only in a Japanese study209. In contrast, two
studies supported the role of COL2A1 in the pathogenesis of myopia in 123
nuclear families of mixed ethnicities (62% Caucasian)210 and in a Duke-
Cardiff cohort totalling 276 families of Caucasians211. The SNP identified in
these two cohorts was rs1635529. However, the assessment of rs1635529 or
other investigated SNPs in COL2A1 for high myopia has failed to show
significance in the Chinese population212,213.
The structural organisation of the ECM also largely depends on the
cellular activity of fibroblasts, which is partially regulated by members of a
major family of zinc- and calcium-dependent endopeptidases, the matrix metalloproteinases (MMPs), and by the tissue inhibitors of metalloproteinases
(TIMPs)214. In a family-based study, Wojciechowski and colleagues found
positive associations for rs1939008 (MMP1) and rs9928731 (MMP2) with
quantitative refraction in OOA families, but not Ashkenazi Jewish (ASHK)
families215. The authors speculated that strong environmental exposure may mask the genetic effect in the ASHK community, where a much more rigorous schooling system is in place. Significant association of MMP3 and MMP9 with
low myopia in the Caucasian population has also been observed216. However,
MMP isoforms 1, 2 and 3 do not appear to play critical roles in the
development of high myopia in the Japanese population217. The MMP3 and
TIMP1 genes also showed no significant association for high myopia in
Taiwanese men218.
Hepatocyte growth factor (HGF) and its receptor, cMET, which modulate
the MMP and TIMP pathways, have been proposed to play an active role in
scleral remodelling. The association of rs3735520 in HGF with myopia was
first reported in 128 Han Chinese families with severely myopic sibs (SE ≤ -10.0
D)219. In the Caucasian population, the same SNP was found to be associated with moderate myopia in 146 multiplex families220, whereas a
different set of SNPs within HGF showed significant association in another study221. The
association of cMET with high myopia was noted in school children in
Singapore (Khor, et al., 2009), but these findings need further assessment in other studies.
Transforming growth factor-β (TGF-β) is known to initiate myofibroblast formation and fibrosis as well as the production of collagen222. Although five
members of the TGF-β family have currently been identified, only TGF-β
isoforms 1, 2, and 3 are detected in eyes, and TGF-β2 is the predominant
form. The mRNA levels of TGF-β are significantly reduced in retina, choroid and sclera in correlation with axial elongation of myopia (Liu et al., 1996).
The activation of TGFB1 expression in ocular tissues mediated by
transcription factor EGR1 (early growth response type 1 gene) is thought to
influence ocular elongation. An association of SNP rs1800470 in TGFB1 with
high myopia was first reported in Taiwanese individuals (201 cases and 86 controls)223. Another SNP (rs4803455) in TGFB1 also showed a
significant association with high myopia in two Chinese studies224,225. With
regard to TGFB2, emerging evidence suggests a role in the development of
myopia, as rs7550232 in the promoter region of TGFB2 was implicated in
high myopia226.
The gene encoding insulin-like growth factor 1 (IGF-1) lies within the MYP3 locus, and IGF-1 plays
an important role in cell proliferation, differentiation and apoptosis. It has
been speculated that high-glycaemic load carbohydrate diets may contribute to
the development and progression of myopia, perhaps by affecting sensitivity
to insulin or increasing free-circulating IGF-1 levels227. The association of the IGF1 gene with human myopia was first examined in 265 multiplex families comprising 1,391 Caucasians, revealing significant associations of
SNP rs6214 with both high and mild myopia228. The association of the IGF1 gene
with myopia was replicated in several Chinese populations229,230.
The trabecular-meshwork-inducible glucocorticoid response (TIGR) gene
encodes myocilin (MYOC). The expression of MYOC in the trabecular
meshwork is affected by TGFB and mechanical stretching. In addition, the
ECM components include small leucine-rich proteins/proteoglycans
(SLRPs) that regulate collagen fibril diameter, including lumican (LUM),
fibromodulin (FMOD), prolargin (PRELP) and opticin (OPTC)231. Given their roles in ECM assembly and their expression in the eye, these candidates are likely to regulate the shape and size of the eye. However, there have been no data so far to support the roles of TIGR/MYOC and the SLRPs in myopia development.
In addition to the growth factors and extracellular matrix components, a
‘major’ gene controlling the activity of many ‘myopia genes’ is also proposed to be involved in eye growth. For example, paired box (PAX) genes encode tissue-specific transcription factors and, among those, the PAX6 gene is involved in oculogenesis and ocular growth. The correct dosage of PAX6 is crucial for normal eye development: over-expression of Pax6 in mice results in microphthalmia, retinal dysplasia and defective retinal ganglion cell axon guidance232. Interestingly, despite a lack of evidence in the Caucasian population, three positive association studies in Chinese individuals suggest the involvement of PAX6 in the pathogenesis of high myopia232-234.
Table 3 Candidate genes studied for myopia.
| Chr | Gene | Ref | Phenotype | Study population (sample sizes) | Selection criteria | Significant SNPs |
|---|---|---|---|---|---|---|
| 1 | FMOD | 226 | High myopia | Chinese (195 cases, 94 controls) | ECM | ns (rs7543418) |
| 1 | TGFB2 | 226 | High myopia | Chinese (195 cases, 94 controls) | ECM | rs7550232; ns (rs991967) |
| 1 | TIGR (MYOC) | 235 | High myopia | Chinese (70 cases, 69 controls) | MYP2 | ns |
| | | 236 | High myopia | Chinese (162 families) | | rs235858, rs2421853 |
| | | 237 | High myopia | Caucasian (164 Cardiff and 87 Duke families) | | ns |
| 3 | SOX2 | 238 | Low/moderate | Caucasian (n=596) | Eye growth | ns |
| 3 | LEPREL1 | 205 | High myopia | Caucasian (a pedigree) | Linkage study | c.1523 G>T |
| 4 | FGF2 | 210 | Low/moderate myopia | American (62% Caucasian, 123 families) | ECM/gene expression | ns |
| | | 226 | High myopia | Chinese (195 cases, 94 controls) | | ns (rs308395, rs41348645) |
| | | 239 | High myopia | Chinese (1064 cases, 1001 controls) | | ns |
| 5 | CTNND2 | 240 | High myopia | Chinese (287 cases, 673 controls) | MYP16/GWAS | rs6885224 |
| 7 | cMET | 220 | All | Caucasian (146 multiplex families) | ECM | ns |
| | | 241 | Moderate myopia | Chinese (n=650) | | rs2073560 |
| | | 242 | Low/moderate, hypermetropic | Caucasian (n=818) | | ns |
| 7 | HGF | 219 | High myopia (SE ≤ -10.0) | Han Chinese (128 families) | ECM | rs3735520 |
| | | 243 | High myopia | Chinese (288 cases, 208 controls) | | ns |
| | | 220 | All | Caucasian (146 multiplex families) | | rs3735520, rs2286194 |
| | | 221 | All | Caucasian (n=551) | | rs1743, rs4732402, rs12536657, rs10272030, rs9642131, rs5745718, rs12536657 |
| | | 244 | VCD | Chinese (n=814) | | rs3735520 |
| 11 | BDNF | 210 | Low/moderate myopia | American (62% Caucasian, 123 families) | Function | ns |
| 11 | CHRM1 | 245 | High myopia | Chinese (194 cases, 109 controls) | Function | rs11823728, rs544978, rs5422969 |
| 11 | MMP1 | 216 | Low myopia (SE ≤ -1.0) | Caucasian (n=366) | ECM/gene expression | ns |
| | | 215 | SPH (spherical component of refraction) | AMISH (n=358), ASHK (n=535) | | rs1939008 (AMISH only) |
| | | 217 | High myopia (SE ≤ -6.0 & AL ≥ 26) | Japanese (725 cases, 546 controls) | | ns |
| 11 | MMP3 | 218 | High myopia | Taiwanese (216 cases, 474 controls) | ECM/gene expression | ns |
| | | 216 | Low myopia (SE ≤ -1.0) | Caucasian (n=366) | | -1612 5A/6A |
| | | 217 | High myopia (SE ≤ -6.0 & AL ≥ 26) | Japanese (725 cases, 546 controls) | | ns |
| 11 | PAX6 | 39 | Low/moderate myopia | UK (221 dizygotic twin) | MYP7 | ns |
| | | 210 | Low myopia (SE ≤ -0.75) | American (62% Caucasian, 123 families) | | ns |
| | | 238 | Low/moderate myopia | Caucasian (n=596) | | ns |
| | | 233 | High myopia | Chinese (188 cases, 85 controls) | | rs667773 |
| | | 234 | High myopia | Chinese (164 families with 170 highly myopic offspring) | | rs3026393 |
| | | 232 | High myopia (SE ≤ -8.0) | Chinese (300 cases, 300 controls) | | rs12421026, rs2071754, rs3026393, rs1506 and rs12421026 |
| 12 | COL2A1 | 210 | Low myopia (SE ≤ -0.75) | USA (62% Caucasian, 123 families) | ECM | rs1635529 |
| | | 211 | High myopia (SE ≤ -5.0) | English/Welsh (146 Duke and 130 Cardiff families) | | rs1034762, rs1635529, rs1793933, rs3803183, rs17122571, rs1635529 |
| | | 212 | High myopia | Chinese (697 cases, 762 controls) | | ns |
| | | 213 | High myopia | Chinese (581 cases, 384 controls) | | ns |
| 12 | DCN | 246 | High myopia | Taiwanese (120 cases, 137 controls) | ECM | ns |
| | | 247 | High myopia | Taiwanese (120 cases, 137 controls) | | ns |
| 12 | DSPG3 | 247 | High myopia | Taiwanese (120 cases, 137 controls) | | ns |
| 12 | LUM | 246 | High myopia | Taiwanese (120 cases, 137 controls) | ECM | rs3759223 |
| | | 243 | High myopia | Chinese (288 cases, 208 controls) | | ns |
| | | 248 | High myopia | English and Finnish (125 cases, 308 controls) | | n/a |
| | | 249 | High myopia | Chinese (201 cases, 86 controls) | | c1567 |
| | | 247 | High myopia | Taiwanese (120 cases, 137 controls) | | rs3759223, rs3741834 |
| 12 | IGF1 | 228 | High myopia | Caucasian (265 families) | MYP3 and animal models | rs10860860, rs2946834, rs6214 |
| | | 250 | High myopia | Polish (42 multiplex families) | | ns |
| | | 230 | High myopia | Chinese (421 cases, 401 controls) | | rs12423791 |
| | | 244 | Lens thickness | Chinese (n=814) | | rs6214 |
| | | 229 | High myopia | Chinese (300 cases, 300 controls) | | Haplotype rs12423791, rs7956547, rs5742632 |
| 15 | GJD2 | 251 | High myopia | Japanese (1125 cases, 929 controls) | GWAS | rs634990, rs524952 |
| | | 244 | Ocular biometrics | Chinese (n=814) | | ns |
| 15 | RASGRF1 | 252 | High myopia | Chinese (640 cases, 1052 controls) | GWAS | rs2218817, rs10034228, rs1585471 |
| | | 251 | High myopia | Japanese (1125 cases, 929 controls) | | rs8027411 |
| 16 | MMP2 | 215 | SPH (spherical component of refraction) | AMISH (n=358), ASHK (n=535) | Gene expression | rs9928731 |
| | | 217 | High myopia (SE ≤ -6.0 & AL ≥ 26) | Japanese (725 cases, 546 controls) | | ns |
| 17 | COL1A1 | 253 | High myopia | Taiwanese (471 cases, 623 controls) | MYP5 | ns |
| | | 209 | High myopia (SE ≤ -9.25) | Japanese (330 cases, 330 controls) | | rs2075555, rs2269336 |
| | | 254 | High myopia (SE ≤ -6.0 and AL ≥ 26) | Japanese (427 cases, 420 controls) | | ns |
| | | 211 | High myopia (SE ≤ -5.0) | English/Welsh (146 Duke and 130 Cardiff families) | | ns |
| | | 212 | High myopia (SE ≤ -6.0 and AL ≥ 26) | Chinese (697 cases, 762 controls) | | ns |
| 17 | RARA | 255 | Low/moderate, hyperopic | Caucasian (n=802) | Gene expression | ns |
| 18 | TGIF | 190 | High myopia | Chinese (71 cases, 105 controls) | MYP2 | 657 (T -> G) |
| | | 256 | High myopia | Caucasian (n=21, sequencing) | | ns |
| | | 257 | High myopia (SE ≤ -9.0) | Japanese (330 cases, 330 controls) | | ns |
| | | 258 | AL, CC, low myopia (SE ≤ -0.5) | Caucasian (257 cases, 294 controls) | | rs8082866, rs2020436; ns after multiple testing |
| | | 243 | High myopia | Chinese (288 cases, 208 controls) | | ns |
| 19 | TGFB1 | 223 | High myopia | Taiwanese (201 cases, 86 controls) | Gene expression/function | rs1800470 |
| | | 259 | High myopia (SE ≤ -9.25) | Japanese (300 cases, 300 controls) | | ns |
| | | 243 | High myopia | Chinese (288 cases, 208 controls) | | ns |
| | | 224 | High myopia (SE ≤ -10.0) | Chinese (300 cases, 300 controls) | | rs1800470, rs4803455 |
| | | 225 | High myopia | Singapore Chinese | | rs4803455 |
| 20 | BMP2 | 260 | High myopia | Chinese (201 cases, 86 controls) | Gene expression | rs2288255 |
| 20 | MMP9 | 216 | Low myopia (SE ≤ -1.0) | Caucasian (n=366) | ECM/gene expression | Exon 6 (Arg -> Gln) |
| 21 | COL18A1 | 210 | Low myopia (SE ≤ -0.75) | American (62% Caucasian, 123 families) | Function | ns |
| 21 | UMODL1 | 261 | High myopia | Japanese (520 cases, 520 controls) | Linkage study | rs2839471 |
| 23 | TIMP1 | 218 | High myopia | Taiwanese (216 cases, 474 controls) | Gene expression | |

SE – spherical equivalent in diopters; AL – axial length in mm; VCD – vitreous chamber depth; ns – not significant; ECM – extracellular matrix. High myopia is defined as SE ≤ -6.0 D unless otherwise indicated in parentheses. Controls are non-myopic controls.
BDNF – brain-derived neurotrophic factor; BMP2 – bone morphogenetic protein 2; COL18A1 – collagen, type XVIII, alpha 1; COL2A1 – collagen, type II, alpha 1; COL1A1 – collagen, type I, alpha 1; CTNND2 – catenin (cadherin-associated protein), delta 2; cMET – growth factor receptor c-met; DCN – decorin; DSPG3 – dermatan sulfate proteoglycan 3; FGF2 – fibroblast growth factor 2; FMOD – fibromodulin; GJD2 – gap junction protein, delta 2; HGF – hepatocyte growth factor; IGF1 – insulin-like growth factor 1; LEPREL1 – leprecan-like 1; LUM – lumican; MMP – matrix metallopeptidase; PAX6 – paired box gene 6; RASGRF1 – RAS protein-specific guanine nucleotide-releasing factor 1; RARA – retinoic acid receptor, alpha; SOX2 – sex determining region Y-box 2; TIGR/MYOC – myocilin;
TGFB1 – transforming growth factor-beta 1; TGIF – transforming growth factor-beta-induced factor; TGFB2 – transforming growth factor-beta 2; TIMP1 – tissue inhibitor of metalloproteinase 1; UMODL1 – uromodulin-like 1.
1.3.5.3 Genome-wide association studies
In the past several years, with the advent of cheap and high-density array-
based DNA marker technologies, genome-wide scale assessments of the
genetic effect for each typed marker have become common practice87,88.
GWAS have transformed population-based epidemiological studies into
resources for further genetic exploration. The ophthalmic genetics
community has started to benefit from the convergence of population-based
epidemiological studies and genetic assays to identify genes involved in
susceptibility to eye disorders, including refractive errors17-21,23,34,35,73,262.
Table 4 lists the recent GWAS findings for refractive errors or myopia and the
subsequent replication results.
Two companion GWAS pinpointed genetic susceptibility variants for
quantitative refraction at 15q14 [34] and 15q25 [35] in Caucasian populations. The
15q14 locus was subsequently replicated for SE in a large-scale
meta-analysis of 42,848 Caucasians and 12,332
Asians [263] and in a Japanese cohort for high myopia [251], whereas the evidence for association
at the 15q25 region was mixed in the pooled data. For the chromosomal locus
15q14, the implicated polymorphisms are in a putative regulatory region near
the genes GJD2 and ACTC1. GJD2 encodes a neuron-specific protein
(connexin36) found in retinal photoreceptors and in amacrine and bipolar cells, suggesting that it is an important modulator of retinal signal transmission. ACTC1 encodes the 42-kDa alpha cardiac muscle actin 1, whose function in the eye is largely
unknown. The gene closest to the 15q25 signal is RASGRF1, which is
highly expressed in the human retina and known to be functionally involved in
eye development. RASGRF1 is regulated by muscarinic receptors and retinoic
acid, both of which have been postulated to play a part in the biological mechanisms of refractive
control.
In a meta-analysis of two GWAS in Chinese adults and children, our group
found that the minor allele of rs6885224 in CTNND2 on chromosome 5 was
associated with an elevated risk of high myopia ($P_{\mathrm{meta}} = 7.84 \times 10^{-6}$)22.
CTNND2 is known to play a crucial role in retinal morphogenesis, adhesion, and retinal cell architectural integrity via the regulation of adhesion molecules.
Interestingly, the most prominent SNP rs6885224 was subsequently found to be associated with high myopia (n=1203) and moderate myopia (n=615) versus emmetropia (n=955) in an independent Chinese sample set, but with a protective effect for the same minor allele240.
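The way a meta-analysis combines per-study effects, and why an opposite direction of effect in a replication sample matters, can be sketched with a fixed-effects inverse-variance model. This is a generic illustration and not necessarily the model used in the studies cited above; the function name, effect sizes and standard errors are all hypothetical.

```python
import math


def inverse_variance_meta(betas, standard_errors):
    """Fixed-effects inverse-variance meta-analysis of per-study log odds ratios.

    Returns the pooled effect, its standard error, the z statistic and the
    two-sided p value from the normal distribution.
    """
    if len(betas) != len(standard_errors) or not betas:
        raise ValueError("betas and standard_errors must be non-empty and equal in length")
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p_value


# Hypothetical per-allele log odds ratios for the same allele in two cohorts.
concordant = inverse_variance_meta([0.20, 0.20], [0.10, 0.10])
# Same magnitude, but the allele is protective in the second cohort.
discordant = inverse_variance_meta([0.20, -0.20], [0.10, 0.10])
print("concordant p =", concordant[3])
print("discordant p =", discordant[3])
```

When the two cohorts agree in direction, pooling sharpens the evidence; when the same allele is a risk allele in one sample and protective in the other, the pooled effect cancels towards zero, which is why a direction flip is usually reported as a failure to replicate.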
Nakanishi and colleagues reported the first GWAS for pathological
myopia (defined as AL ≥ 26 mm in both eyes) and detected a novel susceptibility locus at chromosome 11q24.1 in a Japanese cohort23. The
adjacent BLID gene discovered in this study along with the gene PSARL
identified from another genome-wide linkage study204 are both involved in
mitochondrial apoptosis. However, the association with BLID was not
replicated in 1,818 high myopia cases (SE ≤ -10.0 D) and 1,052 controls in the
Chinese population264.
In a GWAS of a case-control dataset comprising 500 high myopia cases (SE ≤ -6.0 D and
AL ≥ 26 mm in either eye) and 500 emmetropic controls from South China, Shi
and colleagues reported a strong association with high myopia at the MIPEP gene
on chromosome 13q12.12. MIPEP encodes a protein targeted to the
mitochondrial matrix, suggesting a potential novel pathway leading to
myopia265. The association of MIPEP with high myopia was also noted in our
GWAS data (p < 0.05). Using a much smaller sample size for high myopia, Li
and colleagues identified chromosome locus 4q25 associated with high
myopia across older Chinese and much younger cohorts266.
Table 4 Genetic loci identified from genome-wide association studies
| Locus (gene) | Ethnicity | Most significant SNP | Phenotype | Sample size | Ref |
|---|---|---|---|---|---|
| 11q24.1 (BLID) | Japanese | rs577948 | High myopia | 830/934 | 23 |
| | Chinese | ns (rs577948, rs11218544, rs2839471) | High myopia (SE ≤ -10.0 D) | 1818/1052 | 264 |
| 15q14 (GJD2, ACTC1) | European | rs634990 | SE | 15,608 | 34 |
| 15q25 (RASGRF1) | European | rs8027411 | SE | 17,685 | 35 |
| | Japanese | 15q14: rs634990, rs524952; 15q25: rs8027411, rs17175798 | High myopia | 1125/366 (929) | 251 |
| | Chinese | ns (GJD2) | Ocular biometrics | | 244 |
| 5p15 (CTNND2) | Chinese | rs6885224 | High myopia | 1,246/2,584 | 22 |
| | Chinese | rs6885224 (opposite effect) | High myopia | 1203 high myopia/955 controls | 240 |
| 4q25 (intergenic) | Chinese | rs10034228 | High myopia | 2,993/10,406 | 266 |
| | Chinese | rs10034228/rs2218817/rs1585471/rs6837348 | High myopia | 1,052/1255 | 252 |
| 13q12.12 (MIPEP) | Chinese | rs9318086 | High myopia | 3,222/6,341 | 265 |

SE – spherical equivalent; high myopia is defined as either SE ≤ -6.0 D or axial length ≥ 26 mm, unless otherwise specified.
The first genome-wide exome sequencing study is worthy of note. Shi and
colleagues identified the association of zinc finger protein 644 isoform
1 (ZNF644) sequence variants with susceptibility to high myopia in two
affected individuals in a large Han Chinese family and the findings were
confirmed in 600 Chinese adults with high myopia (SE < -6.0 D in both eyes)
and 600 normal controls267. In a multi-ethnic case-control study composed primarily
of Caucasians, rare variants in ZNF644 identified by deep sequencing
were also observed to be significantly enriched in 131 high
myopia cases compared to 672 controls268. ZNF644, which contains seven C2H2-type zinc fingers, belongs to the C2H2-type
zinc-finger protein family and
is predicted to be a transcription factor regulating gene expression in the
retina and retinal pigment epithelium (RPE).
The sequence variants identified thus far from standard GWAS approaches
exhibit very modest effect sizes on myopia risk (per-allele odds ratio ≤ 1.5).
This observation reflects the complexity in the disease architecture of myopia
pathogenesis. The implicated genes for myopia identified through candidate
gene studies, GWAS or exome sequencing fall into three categories as shown
in Figure 3: 1) The majority are known to be involved in the process of scleral remodelling, for example COL2A1, COL1A1, HGF, TGFB1, TGFB2 etc.; 2)
Photoreceptor-related genes, like GJD2 for retinal signal transmission on
RPE; and 3) Genes hypothesised to regulate the ‘myopia important genes’ at
the transcriptional level, such as ZNF644. Interestingly, none of the genes
involved in sclera remodelling in the ECM pathway (Category 1) reached
genome-wide significance for myopia in the published GWAS.
[Figure 3 diagram. Labels: visual input; retina; retinal pigment epithelium; choroid; sclera; signaling cascade. Gene groups shown: neurotransmission in the retina (GJD2, RASGRF1); regulation of transcription levels (ZNF644, PAX6); mitochondrial apoptosis (BLID, MIPEP); scleral remodeling (COL2A1, COL1A1, HGF, cMET, MMP1, MMP2, MMP3, MMP9, MMP13, LUM, PAX6, TGFB1, TGFB2, CTNND2, TGIF, IGF1).]
Figure 3. The implicated genes likely to be involved in visual signal transmission and scleral remodeling. COL2A1 – collagen, type II, alpha 1; COL1A1 – collagen, type I, alpha 1; CTNND2 – catenin (cadherin-associated protein), delta 2; cMET – growth factor receptor c-met; GJD2 – gap junction protein, delta 2; HGF – hepatocyte growth factor; IGF1 – insulin-like growth factor 1; LUM – lumican; MMP – matrix metallopeptidase; PAX6 – paired box gene 6; RASGRF1 – RAS protein-specific guanine nucleotide-releasing factor 1; TGFB1 – transforming growth factor-beta 1; TGIF – transforming growth factor beta-induced factor; TGFB2 – transforming growth factor-beta 2; UMODL1 – uromodulin-like 1; ZNF644 – zinc finger protein 644 isoform 1.
1.3.6 Intervention to slow myopia progression
Different aetiological factors have been proposed for myopia onset and
progression, such as anomalous accommodative activity and defocus of the
retinal image. Depending on the implicated mechanism, different clinical
approaches have been developed. The effectiveness of current strategies to
control the progression of myopia in children has been reviewed
extensively269,270.
Using single vision lenses (SVLs) is the most common routine to correct
vision, although it does not retard the progression of myopia. Myopic
individuals exhibit a greater accommodative lag (poor focusing accuracy
while looking at near work) than emmetropes. Accommodative lag results in
light being focused behind the retina during near work, which may stimulate the eye growth progressively after myopia onset. It is thus proposed that the reduction in accommodative lag (or load) in viewing a near object with bifocal or multifocal spectacles should reduce myopic progression in already myopic children. In the Correction of Myopia Evaluation Trial (COMET), however, children wearing progressive addition lenses (PALs) were found to have only a small slowing of myopia progression compared to those wearing
SVLs, with negligible clinical significance271. Similar findings were noted in several other trials as systematically reviewed by Walline and colleagues269.
Interestingly, a subset of myopic children - for example, esophores at near with high accommodative lag - showed a larger treatment effect142, suggesting that accommodation lag may be a potential indicator for some individuals who could actually benefit from the treatment.
Uncorrected myopia decreases the need for accurate accommodation, which poses the question of whether myopia progression can be reduced by diminishing accommodation through reading without spectacles or wearing under-corrected spectacles. In a three-year prospective study in myopic school children aged 9 to 11 years, the myopia progression was not correlated with full-time spectacle wearing, limited-time wearing for distance vision only or bifocal lenses272. The most interesting finding in this study, however, was that the shorter the average reading distance during follow-up, the faster the myopic progression. In a randomised trial, Chung and colleagues found that the myopic progression was significantly accelerated in the under-corrected myopic children as compared to the full-time fully corrected group,
suggesting that the presence of blurred vision at any distance may stimulate
the progression of myopia regardless of the sign of defocus in the susceptible
eyes273. However, the findings have been criticised for not accounting for
a ‘running period’ at the onset of the experiment.
One treatment modality is the use of overnight rigid gas permeable (RGP)
contact lenses to reshape the corneal surface. It has been shown that RGP lenses
effectively reduce the degree of hyperopic field curvature in myopic eyes274.
Experimental studies support the concept that the peripheral hyperopic
defocus promotes axial myopia135. It has been shown that the rigid contact lens
corrected not only the central refractive error but also induced a myopic
change in relative peripheral refractive error, while the latter could be
effective for controlling eye growth in myopic children275. Two recent
randomised clinical trials reported significant changes in axial length in the
corneal reshaping group compared to those wearing spectacles276,277 or soft contact lenses278. Nevertheless, an early trial over 24 months in children aged
6 to 12 years in Singapore showed a statistically insignificant reduction in
myopia in the participants wearing rigid contact lenses during the daytime
when compared with the spectacle-corrected group279. Notably, there are a
number of risks with overnight contact lens wear, including corneal
infection and scarring, and the potentially related microbial keratitis280.
Cycloplegic agents, chiefly atropine and pirenzepine, have shown positive
effects in slowing myopia progression. These medicines are thought to reduce
myopic progression by eliminating accommodation. Several studies have reported
that children receiving
pirenzepine gel, cyclopentolate eye drops, or atropine eye drops showed significantly less myopic progression compared with children receiving placebo281,282. However, many ophthalmologists do not embrace this treatment option due to concerns over atropine toxicity, photophobia, and significant catch-up after treatment is stopped.
2 Chapter 2 Study Aims
Study 1: Genetic Variants on Chromosome 1q41 Influence Ocular Axial
Length and High Myopia (Chapter 3)
Axial length (AL) is a highly heritable ocular biometric component of refractive error, so the identification of quantitative trait loci influencing AL variation would be valuable in informing the biological aetiology of myopia. This is the first
study to use a GWAS approach to investigate genetic susceptibility to AL variation. The
aims of this study are to
1) discover genetic variants associated with AL in Asian populations
2) provide evidence of association for myopia at the implicated genetic
locus
Study 2: Genome-wide Meta-Analysis of Five Asian Cohorts Identifies
PDGFRA as a Susceptibility Locus for Corneal Astigmatism (Chapter 4)
Corneal astigmatism is associated with reduced visual acuity and is highly
heritable, but currently there is a paucity of research to investigate the genetic
aetiology of corneal astigmatism. This study aims to
1) identify genetic variants associated with corneal astigmatism in Asians
2) explore the transferability of the implicated loci in multiple Asian
populations
Study 3: Genome-wide Comparison of Estimated Recombination Rates
between Populations (Chapter 5)
Homologous recombination is the only force which breaks down linkage
disequilibrium (LD), and the inter-population pattern of LD is a
fundamental basis for the transferability of genome-wide associated genetic
loci. Inter-population differences in recombination patterns yield important
insights into trans-ethnic fine-mapping as well as recent human evolution. Ad
hoc thresholds have been previously used to identify genomic regions
exhibiting either concurring or differential evidence of hotspot differences.
No formal statistical metric has been applied to quantify the inter-population recombination rate variation based on estimated rates. This study aims to
1) introduce a novel strategy to evaluate the inter-population genetic
recombination differences
2) describe the pattern of genome-wide recombination differences among
populations
3 Chapter 3 Genetic Variants on Chromosome 1q41
Influence Ocular Axial Length and High Myopia
This Chapter is a slightly modified version of the paper published in PLoS
Genet 2012, 8(6): e1002753. doi:10.1371/journal.pgen.1002753
3.1 Abstract
As one of the leading causes of visual impairment and blindness, myopia poses a significant public health burden in Asia. The primary determinant of myopia is an elongated ocular axial length (AL). Here we report a meta-analysis of three genome-wide association studies on AL conducted in 1,860
Chinese adults, 929 Chinese children and 2,155 Malay adults respectively. We identified a genetic locus on chromosome 1q41 harbouring the zinc-finger
11B pseudogene ZC3H11B showing genome-wide significant association with
AL variation (rs4373767, β = -0.16 mm per minor allele, Pmeta = 2.69 × 10⁻¹⁰).
The minor C allele of rs4373767 was also observed to significantly associate
with decreased susceptibility to high myopia (per-allele odds ratio (OR) =
0.75, 95% CI: 0.68 – 0.84, Pmeta = 4.38 × 10⁻⁷) in 1,118 highly myopic cases
and 5,433 controls. ZC3H11B and two neighboring genes SLC30A10 and
LYPLAL1 were expressed in the human neural retina, retinal pigment epithelium and sclera. In an experimental myopia mouse model, we observed
significant alterations to gene and protein expression in the retina and sclera of
the unilateral induced myopic eyes for the murine genes ZC3H11A, SLC30A10
and LYPLAL1. This supports the likely role of genetic variants at chromosome
1q41 in influencing AL variation and high myopia.
3.2 Background
Myopia increases the risk of visual morbidity and poses a considerable public health and economic burden globally, especially in Asia, where the prevalence is significantly higher than other parts of the world 283. Human
myopia primarily results from an abnormal increase in ocular axial length
(AL), the distance between the anterior and posterior poles of the eye globe,
whereas the role of corneal curvature and lens thickness is minimal 114. A 1
millimeter (mm) increase in AL is equivalent to a myopic shift of -2.00 to -
3.00 diopters (D) with no corresponding changes in the optical power of the
cornea and lens. High myopia, often defined as ocular spherical equivalent
(SE) refraction below -6.00 D, is associated with an abnormally long AL, and
this affects between 1% and 10% of the general population284. The degenerative
changes in the retina and the choroid due to the excessive elongation of the
globe are not prevented by optical correction and this subsequently increases
the risk of visual morbidity through myopic maculopathy, choroidal
neovascularization, retinal detachment and macular holes111. The active
remodelling of the sclera, mediated by the signaling cascade initiated in the retina under visual input, has also been found to be critical in determining
axial growth, and thus the refractive state of the eye285.
It is well documented that near work, educational attainment and outdoor
activities are environmental factors contributing to myopia286. Evidence from
family and twin studies has also supported a substantial genetic component in
spherical refractive error and AL157,159,160. The heritability of the quantitative
trait AL has been estimated to be as high as 94%, comparable to that for SE
(for a review, see174). Although linkage scans on pedigrees (myopia loci
MYP1 to MYP18; see http://www.omim.org) and genome-wide association
studies (GWAS)22,23,34,35,265,266 have implicated several regions in the human
genome as being significant for refractive error and myopia, no myopia genes
have been consistently identified within or across different population groups.
This scenario reflects the complexity in the disease architecture of myopia
pathogenesis.
Genetic factors influencing AL and refraction appear to be at least partly
shared, given previous literature from twin studies illustrating that at least half
of the covariance between AL and refraction are due to common genetic
factors287. The measurement of AL is more precise and less prone to errors
compared to cycloplegic or non-cycloplegic assessments of refraction. As AL is an endophenotype for spherical refractive error, identifying genes that are responsible for AL variation provides insight into myopia predisposition and development. Presently there are only two genome-wide linkage studies performed in populations of European descent that suggest the presence of AL quantitative trait loci (QTLs) on chromosomes 2p24 202 and 5q (at 98
centimorgans) along with two classical myopia loci (MYP3 at 12q21 and
MYP9 at 4q12)203, and there are no reports of any genes that are indisputably confirmed to be associated with AL.
We thus performed a meta-analysis of three genome-wide surveys of AL in a total of 4,944 individuals in Asian populations comprising (i) Chinese adults from the Singapore Chinese Eye Study (SCES); (ii) Chinese children from the Singapore Cohort study of the Risk factors for Myopia (SCORM); and (iii) Malay adults from the Singapore Malay Eye Study (SiMES). SNPs that have been identified from this meta-analysis to be significantly associated
with AL were further assessed for association with high myopia in an
additional two independent case-control studies from Japan. We also examined the expression patterns of the candidate genes located in the vicinity of the identified SNPs in human ocular tissues and in the eyes of myopic mice.
3.3 Methods
3.3.1 Study cohorts
Discovery cohorts
Singapore Chinese Eye Study (SCES). SCES is an ongoing population-based cross-sectional survey of eye diseases in Chinese adults aged 40 to 80 years residing in the Southwestern part of Singapore. The study began in 2007 and a detailed description has been published elsewhere288. In brief, a total of 2,226
residents in the Southwestern area of Singapore completed comprehensive
ophthalmologic examinations, including visual acuity assessments, refraction,
lens and retinal imaging, and slit lamp examinations. Genome-wide
genotyping was performed in 1,952 individuals. Complete post-quality-control
(QC) data for GWAS were available for 1,860 adults with AL
measurements.
Singapore Cohort study of the Risk factors for Myopia (SCORM). A
total of 1,979 children in grades 1, 2, and 3 from three schools in Singapore
were recruited from 1999 to 2001289. The children were examined on their
respective school premises annually by a team of eye care professionals. The
GWAS was conducted in a subset of 1,116 Chinese children22,290. The
phenotype used in this study was based on the AL measured on the 4th annual
examination of the study (children at age 10 to 12 years). Complete post-
filtering data on AL measurements and SNP data were available in 929
children.
Singapore Malay Eye Study (SiMES). SiMES is a population-based
cross-sectional survey of eye diseases in Malay adults aged 40 to 80 years
living in Singapore. It was conducted between August of 2004 and June of
2006291. A total of 4,168 Malay residents in the Southwestern area of
Singapore were identified and invited for a detailed ocular examination, of whom
3,280 (78.7%) participated. Genome-wide genotyping was performed in 3,072 individuals26,29. Complete post-filtering data for GWAS with AL
measurements were available for 2,155 subjects.
Validation cohorts for high myopia
Japan dataset 1. The Japan dataset 1 consisted of 483 high myopia cases
and 1,194 general healthy population controls. High myopia status was
determined primarily on the basis of AL ≥ 28 mm for both eyes, which
corresponded to the spherical equivalent (SE) cut-off of at least -9.00 D292.
Cases were recruited at the Center for Macular Disease of Kyoto University
Hospital, the High Myopia Clinic of Tokyo Medical and Dental University,
and the Fukushima Medical University Hospital. Details of the data have been
reported elsewhere 23. The population controls were recruited at the Aichi
Cancer Center Research Institute.
Japan dataset 2. The Japan dataset 2 comprised 504 high myopia
cases (SE ≤ -9.00 D in either eye) and 550 non-highly myopic controls (SE ≥ -3.00
D in both eyes). Less stringent thresholds were adopted for controls for
the purpose of ease of recruitment from the clinics. Given the large phenotypic
separation between the cases and controls, and assumption of
homoscedasticity across genotype categories, such a study design using the
extreme on one end (i.e. SE ≤ -9.00 D) but sampling less extreme controls (i.e.
SE ≥ -3.00 D) still provides sufficient statistical power to detect the true
positive signals in the association study293. Cases were recruited at the
Yokohama City University and Okada Eye Clinic. Controls were obtained
from the Yokohama City University and Tokai University Hospital.
Measurements of AL, refractive error and covariates
All the studies used a similar protocol for ocular phenotype measurements.
For subjects in SCES and SiMES, AL for both eyes was measured using
optical laser interferometry (IOLMaster V3.01, Carl Zeiss Meditec AG, Jena,
Germany)288,291. Children in the SCORM study underwent AL measurements using the A-scan ultrasound biometry machine (Echoscan US-800; Nidek Co,
Tokyo, Japan)289. For subjects in the Japan dataset 1, applanation A-scan
ultrasonography (UD-6000, Tomey, Nagoya, Japan) or partial coherence
interferometry (IOLMaster, Carl Zeiss Meditec, Dublin, CA) was used to
measure AL. AL was assessed using a portable A-scan biometer/pachymeter
(AL-2000, Tomey, Nagoya, Japan) for the participants in the Japan dataset 2.
Non-cycloplegic refraction in SCES and SiMES as well as cycloplegic
refraction in SCORM (three drops of 1% cyclopentolate at 5 minutes apart)
were measured by autorefractor (Canon RK-5, Tokyo, Japan)294. For subjects
in the Japan dataset 2, refraction was measured using the ARK-730A (NIDEK),
ARK-700A (NIDEK) and KR-8100P (TOPCON) auto-refractors. SE was calculated as the sphere power plus half of the cylinder power for each eye.
To perform the genetic association of high myopia in SCES and SiMES, we used the definition adopted by the Japan case-control studies and defined high myopia cases as subjects having SE ≤ -9.0 D in at least one eye, and non high-myopia controls as samples with SE ≥ -3.0 D in both eyes. For children from SCORM aged 10 to 12 years, cases were defined as SE ≤ -6.0 D for at least one eye, while controls were defined as SE ≥ -1.0 D for both eyes; this is approximately equivalent to the projected SE of -9.0 and -3.0 respectively at university age based on the estimated annual progression rate in SE of -0.6 D for Chinese myopic children and -0.3 D in the controls295. Given the small
sample sizes of high myopia cases identified in our population-based cohorts,
in the supplementary analysis, we further applied the commonly adopted
criteria of SE ≤ -6.0 D in either eye as cases. Controls were defined as SE ≥ -1.0
D in both eyes. For SCORM children, we retained the same criteria in both
analyses.
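The spherical equivalent formula and the case-control definitions above can be written out as a short sketch. The function names `spherical_equivalent` and `classify` and the label "excluded" are illustrative assumptions and are not taken from the study's analysis scripts; the default cut-offs are the primary adult definition (SE ≤ -9.0 D in at least one eye for cases, SE ≥ -3.0 D in both eyes for controls).

```python
def spherical_equivalent(sphere: float, cylinder: float) -> float:
    """SE in dioptres: sphere power plus half of the cylinder power."""
    return sphere + cylinder / 2.0


def classify(se_right: float, se_left: float,
             case_cutoff: float = -9.0,
             control_cutoff: float = -3.0) -> str:
    """Label one subject as 'case', 'control' or 'excluded' (hypothetical labels)."""
    worse_eye = min(se_right, se_left)  # the more myopic (more negative) eye
    if worse_eye <= case_cutoff:        # at least one eye at or below the case cut-off
        return "case"
    if worse_eye >= control_cutoff:     # both eyes at or above the control cut-off
        return "control"
    return "excluded"                   # intermediate refraction, used in neither group


# Adult definition, and the SCORM children definition (-6.0 D / -1.0 D)
print(classify(spherical_equivalent(-8.5, -1.0), -2.0))
print(classify(-6.5, -0.5, case_cutoff=-6.0, control_cutoff=-1.0))
```

Passing `case_cutoff=-6.0` and `control_cutoff=-1.0` gives both the SCORM definition and the supplementary adult definition described above.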
Age, gender, height and level of education were obtained from all
Singapore participants who underwent ophthalmologic examination.
Education was measured on an ordinal scale from no formal education to the
highest educational level. For participants in SCORM, the education of the
child was defined by the level of educational attainment of the father, as a
marker of socioeconomic status.
3.3.2 Data quality control
Discovery cohorts
For SCES, a total of 1,952 venous blood-derived samples were genotyped
using Illumina Human 610 Quad Beadchips (Illumina Inc., San Diego, US)
according to the manufacturer’s protocols. Samples which failed genotyping
or with low call rate (< 95%, n = 11), with excessive heterozygosity (defined
as sample heterozygosity exceeding 3 standard deviations from the mean
sample heterozygosity; n = 3), or with gender discrepancies (n = 2) were
excluded, as were cryptically related samples identified by identity-by-state (IBS) analysis (n = 41) and samples showing population structure in the principal components analyses (PCA) (n = 6). The criteria to define cryptically related samples and outliers with population structure in the discovery cohorts are described later in this section. After the removal of these samples, SNP quality control
(QC) was then applied on a total of 579,999 autosomal SNPs for the 1,889 post-QC samples. SNPs were excluded based on (i) high rates of missingness
(> 5%) (n = 26,437); (ii) monomorphism or minor allele frequency (MAF) <
1% (n = 59,633); or (iii) genotype frequencies deviating from Hardy-
Weinberg Equilibrium (HWE) defined as HWE P-value < 10⁻⁶ (n = 1,821).
This yielded 492,108 autosomal SNPs. Those individuals with missing data on
phenotypes were further removed (n = 29). Finally, 492,108 SNPs in 1,860
samples were available for analyses.
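The three SNP-level filters above (missingness, MAF and HWE) can be illustrated with a minimal sketch operating on genotype counts for a single SNP. This is not the pipeline used in the study: the function names are hypothetical, and the HWE P-value is computed here with a one-degree-of-freedom chi-square approximation as a simplifying assumption, whereas dedicated tools typically use an exact test.

```python
import math


def hwe_chi2_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square (1 df) P-value for departure from Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df


def snp_passes_qc(n_aa: int, n_ab: int, n_bb: int, n_missing: int,
                  max_missing: float = 0.05, min_maf: float = 0.01,
                  min_hwe_p: float = 1e-6) -> bool:
    """Apply the missingness, MAF and HWE thresholds stated in the text."""
    called = n_aa + n_ab + n_bb
    if called == 0 or n_missing / (called + n_missing) > max_missing:
        return False                          # (i) missingness > 5%
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    if min(freq_a, 1.0 - freq_a) < min_maf:
        return False                          # (ii) monomorphic or MAF < 1%
    return hwe_chi2_p(n_aa, n_ab, n_bb) >= min_hwe_p  # (iii) HWE P < 1e-6 fails


print(snp_passes_qc(250, 500, 250, n_missing=0))
print(snp_passes_qc(500, 0, 500, n_missing=0))
```

As a consistency check on the SCES counts reported above, 579,999 − 26,437 − 59,633 − 1,821 = 492,108 SNPs, and 1,889 − 29 = 1,860 samples.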
For SCORM, 1,116 DNA samples (1,037 from buccal swab and 79 from
saliva) were genotyped on the Illumina HumanHap 550 Beadchips and 550
Duo Beadarrays. A total of 108 samples were excluded, comprising (i) 70
samples with call rates below 98%; (ii) 6 with poor genotyping quality; (iii) 11
samples identified from sib-ships; (iv) 18 with inconsistent gender
information; and (v) 3 due to population structure. This left a total of 1,008
samples for further SNP QC. Based on 514,849 autosomal SNPs, we excluded
32,669 markers if they had missing genotype calls > 5%, MAF < 1%, or
significantly deviated from HWE (P < 10⁻⁶) 22. A final set of 929 samples with
482,180 post-QC SNPs and complete AL measurements was included in
the analyses.
For SiMES, 3,072 DNA samples were genotyped using the Illumina
Human 610 Quad Beadchips. The detailed QC procedures were provided
elsewhere296. In brief, we omitted a total of 530 individuals due to: (i)
subpopulation structure (n = 170); (ii) cryptic relatedness (n = 279); (iii)
excessive heterozygosity or high missingness rate > 5% (n = 37); and (iv)
gender discrepancy (n = 44). After the removal of the samples, SNP QC was
then applied on a total of 579,999 autosomal SNPs for the 2,542 post-QC
samples. SNPs were excluded based on: (i) high rates of missingness (> 5%)
(n = 26,343); (ii) monomorphism or MAF < 1% (n = 34,891); or (iii) genotype
frequencies deviating from HWE (P < 10⁻⁶) (n = 3,645). This yielded 515,120
SNPs after the same SNP QC criteria. Individuals without valid measurements
for AL were further removed (n = 387). After the above filtering criteria,
515,120 SNPs in 2,155 samples were available for association analyses.
In our discovery cohorts, IBS was estimated with the genome-wide SNP
data using PLINK software to assess the degree of recent shared ancestry for a
pair of individuals297. For a pair of putatively related samples, defined by an
identity-by-descent (IBD) value greater than 0.1856, we removed one individual from each pair (monozygotic twins/duplicates, parent-offspring pairs, full siblings, etc.).
Population structure was ascertained using PCA with the
EIGENSTRAT program and genetic outliers were defined as individuals whose ancestry was at least 6 standard deviations from the mean on one of the top ten inferred axes of variation7.
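The standard-deviation rules used for sample exclusion (6 SD on any of the top ten principal components here, and 3 SD for sample heterozygosity earlier in this section) can be sketched as follows. The functions `sd_outliers` and `pca_outliers` are hypothetical names, and the sketch assumes the principal component scores have already been computed elsewhere; it is not the EIGENSTRAT implementation, which may apply the rule iteratively.

```python
from statistics import mean, stdev


def sd_outliers(values, n_sd):
    """Indices of values lying at least n_sd sample SDs from the mean."""
    m, s = mean(values), stdev(values)
    if s == 0:
        return set()                      # no spread, so nothing can be an outlier
    return {i for i, v in enumerate(values) if abs(v - m) >= n_sd * s}


def pca_outliers(pc_scores, n_sd=6.0, n_axes=10):
    """Flag samples that are outliers on any of the first n_axes components.

    pc_scores is a list with one row per sample; each row holds that
    sample's scores on the principal components, in order.
    """
    flagged = set()
    for axis in range(min(n_axes, len(pc_scores[0]))):
        flagged |= sd_outliers([row[axis] for row in pc_scores], n_sd)
    return flagged


# 99 unremarkable samples plus one sample far out on the first axis
scores = [[float((-1) ** i), float(i % 3 - 1)] for i in range(99)] + [[50.0, 0.0]]
print(pca_outliers(scores))
```

The same `sd_outliers` helper with `n_sd=3` corresponds to the heterozygosity criterion.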
For SiMES Malays, we also excluded the samples falling in the main clusters of the Chinese and Indian ethnic groups in the PCA plots, as described in the previous study296. In SiMES, we noticed some degree of admixture in the genetic ancestry of Malays and thus adjusted for ancestry along the top five axes of variation, as the spread of principal component scores was greater for the top five eigenvectors in the bivariate plots of PCA (Figure 4). The top ten principal components explained a small percentage (1.3%) of the global genetic variability, while the top five explained 1.0%, suggesting that, all together, they had minimal effects on our association analyses.
Validation cohorts for high myopia
High myopia cases in the Japan dataset 1 were genotyped using Illumina
Human-Hap550 and 660 chips23, while controls in the Japan dataset 1 were genotyped on Illumina Human-Hap610 chips. Subjects in the Japan dataset 2 were genotyped on the Affymetrix GeneChip Human Mapping 500K Array
Set (Affymetrix Inc., Santa Clara, US). For SNPs not available on the
Affymetrix chips (rs4373767, rs10779363 and rs7544369), genotyping was performed with TaqMan 5′ exonuclease assays using primers supplied by
Applied Biosystems (Foster City, US). The probe fluorescence signal was detected using the TaqMan Assay for Real-Time PCR (7500 Fast Real-Time
PCR System, Applied Biosystems).
3.3.3 Statistical methods
The primary analysis was performed on the AL quantitative trait. As a strong correlation exists in AL measurements from both eyes (r > 0.9), we used the mean AL across both eyes in the GWAS analysis, as was recommended in a review298. Linear regression was used to interrogate the association of each SNP with AL after adjusting for age, gender, height and level of education, under the assumption of an additive genetic effect where the genotypes of each SNP are coded numerically as 0, 1 and 2 for the number of minor alleles carried. In addition, for SiMES, the top five principal components of genetic ancestry from the EIGENSTRAT PCA were also included as covariates to account for the effects of population substructure as described in genotype QC section290. Association tests between each genetic marker and phenotype were carried out using PLINK software297 (version
1.07). Analyses were also repeated without adjustment for education level or height for the purpose of comparison.
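The additive coding described above can be illustrated with a deliberately simplified sketch: a single-SNP ordinary least squares regression of AL on minor-allele dosage. The covariates used in the study (age, gender, height, education and, for SiMES, principal components) are omitted here as a stated simplification, and the function names and toy data are assumptions made for illustration only.

```python
import math


def minor_allele_count(genotype: str, minor_allele: str) -> int:
    """Additive coding: 0, 1 or 2 copies of the minor allele (e.g. 'CT' -> 1)."""
    return genotype.count(minor_allele)


def regress_on_dosage(dosages, phenotypes):
    """Simple OLS of phenotype on dosage; returns (beta, standard error)."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    beta = sxy / sxx                               # change in phenotype per minor allele
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosages, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)            # residual variance divided by Sxx
    return beta, se


# Toy data (not from the study): genotypes and mean AL in mm
genotypes = ["TT", "CT", "CC", "CT", "TT", "CC"]
axial_length = [24.1, 23.8, 23.7, 23.9, 23.9, 23.7]
dosages = [minor_allele_count(g, "C") for g in genotypes]
beta, se = regress_on_dosage(dosages, axial_length)
print(beta, se)
```

In the covariate-adjusted model the same dosage term enters a multiple regression, and its coefficient is interpreted as the per-allele effect on AL in mm.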
In the discovery phase, we conducted a meta-analysis of GWAS results from the three cohorts for AL using an inverse-variance weighted approach with fixed-effect modelling in METAL (http://www.sph.umich.edu/csg/abecasis/metal).
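For reference, the standard fixed-effect inverse-variance estimator can be sketched in a few lines. Each cohort's effect estimate is weighted by the reciprocal of its squared standard error. The function name and the toy inputs are illustrative assumptions; this is not METAL's code, and the two-sided P-value is taken from a normal approximation.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis.

    Returns the pooled effect, its standard error and a two-sided P-value.
    """
    weights = [1.0 / se ** 2 for se in ses]                # w_i = 1 / SE_i^2
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))                 # two-sided normal P-value
    return beta, se, p


# Toy per-cohort estimates (not the study's results)
beta, se, p = fixed_effect_meta([-0.2, -0.1], [0.1, 0.1])
print(beta, se, p)
```

With equal standard errors the pooled estimate is the simple average of the cohort estimates; cohorts with smaller standard errors otherwise receive more weight.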
In the secondary analyses, SNPs that had been identified from the primary analyses were tested for association with high myopia onset (as a binary trait) and SE (as a quantitative trait). For Singapore cohorts, the association analyses adjusted for the same covariates as the primary analyses within a logistic regression and linear regression framework respectively. For Japan case-control
datasets, only age and gender were included as covariates in the model
for high myopia, as the other covariates were not available.
The regional association plots were constructed by SNAP
(http://www.broadinstitute.org/mpg/snap). Haploview 4.1
(http://www.broad.mit.edu/mpg/haploview) was used to visualize the LD of
the genomic regions. Genotyping quality of all reported SNPs has been
visually evaluated by the intensity clusterplots. The coordinates reported in
this paper are on NCBI36 (hg18).
For functional studies in the myopic mouse model, gene expression of all
three identified genes in control and experimental groups was quantified using
the 2^(-∆∆Ct) method299. The standard Student’s t-test was performed to determine the significance of the relative fold change of mRNA between the myopic eyes of the experimental mice and the independent age-matched controls.
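The relative quantification referred to above is commonly computed as 2 raised to the power of -∆∆Ct, where ∆Ct is the threshold cycle of the target gene minus that of the reference gene and ∆∆Ct is the difference in ∆Ct between the experimental and control samples. A minimal sketch follows; the function name, argument names and Ct values are illustrative assumptions and not measurements from this study.

```python
def fold_change_ddct(ct_target_exp: float, ct_ref_exp: float,
                     ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Relative expression (experimental vs control) by the 2^(-ddCt) method."""
    delta_ct_exp = ct_target_exp - ct_ref_exp      # normalise to reference gene (e.g. GAPDH)
    delta_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    delta_delta_ct = delta_ct_exp - delta_ct_ctrl  # experimental relative to control
    return 2.0 ** (-delta_delta_ct)


# Toy Ct values: the target crosses threshold one cycle earlier in the experimental eye
print(fold_change_ddct(24.0, 18.0, 25.0, 18.0))
```

A value above 1 indicates higher expression in the experimental sample than in the control, and a value below 1 indicates lower expression; the calculation assumes roughly 100% amplification efficiency for both genes.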
3.3.4 Functional studies
3.3.4.1 Gene expression in human
Genes ZC3H11B, SLC30A10 and LYPLAL1, adjacent to the top
associated SNPs, as well as GAPDH (housekeeping gene) were run using
10 ul reactions with Qiagen’s PCR products consisting of 1.26 ul H2O, 1.0 ul
10X buffer, 1.0 ul dNTPs, 0.3 ul MgCl, 2.0 ul Q-Solution, 0.06 ul taq polymerase, 1.0 ul forward primer, 1.0 ul reverse primer and 1.5.0 ul cDNA.
The reactions were run on an Eppendorf Mastercycler Pro S thermocycler with touchdown PCR ramping down 1°C per cycle from 72°C to 55°C followed by
50 cycles of 94°C for 0:30, 55°C for 0:30 and 72°C for 0:30 with a final
elongation of 7:00 at 72°C. All primer sets were designed using Primer3 300.
The gel electrophoresis was run on a 2% agarose gel at 70 volts for 35
minutes. The primers from a custom tissue panel including Clontech’s Human
MTC Panel I, Fetal MTC Panel I and an ocular tissue panel were used to
amplify the relevant transcripts. The adult ocular samples were obtained from
normal eyes of an 82-year-old Caucasian female from the North Carolina Eye
Bank, Winston-Salem, North Carolina, USA. The fetal ocular samples were
from 24-week fetal eyes obtained by Advanced Bioscience Resources Inc.,
Alameda, California, USA. All adult ocular samples were stored in Qiagen’s
RNAlater within 6.5 hours of collection and shipped on ice overnight to the
lab. Fetal eyes were preserved in RNAlater within minutes of harvesting and shipped overnight on ice. Whole globes were dissected on the day of arrival.
Isolated tissues were snap-frozen and stored at -80°C until RNA extraction.
RNA was extracted from each tissue sample independently using the Ambion mirVana total RNA extraction kit. The tissue samples were homogenized in
Ambion lysis buffer using an Omni Bead Ruptor Tissue Homogenizer per
protocol. Reverse transcription reactions were performed with Invitrogen
SuperScript III First-Strand Synthesis kit.
3.3.4.2 Myopia-induced mouse model
Gene expression in a mouse model of myopia
Experimental myopia was induced in B6 wild-type (WT) mice (n = 36) by
applying a -15.00 D spectacle lens on the right eye (experimental eye) for 6 weeks from post-natal day 10. The left eyes were uncovered and served as
contra-lateral fellow eyes. Age matched naive mice eyes were used as
independent control eyes (n = 36). Each eye was refracted weekly using the
automated infrared photorefractor as described previously301. AL was
measured by optical low coherence interferometry (AC-Master, Carl Zeiss) in vivo at 2, 4 and 6 weeks after the induction of myopia302. The minus-lens-induced
eyes after six weeks were significantly associated with increased AL
and myopic shift in refraction of < -5.00 D as compared to independent
control eyes (n = 36, P = 3.00 × 10⁻⁶ for AL, and 2.05 × 10⁻⁴ for refraction).
Eye tissues were collected at 6 weeks post myopia induction for further
analyses.
Total RNA was isolated from pooled cryogenically ground mouse neural
retina (retina), retinal pigment epithelium (RPE) and sclera for three batches
using TRIzol Reagent (Invitrogen, Carlsbad, CA) with each batch (n = 6)
comprising the myopic eye, fellow eye and control eye. RNA concentration
and quality were assessed by the absorbance at 260 nm and the ratio of
absorbance at 260 and 280 nm respectively, using Nanodrop® ND-1000
Spectrophotometer (Nanodrop Technologies, Wilmington, DE). RNA was
purified using the RNeasy® Mini kit (Qiagen, GmbH).
500 ng of purified RNA was reverse-transcribed into cDNA using random
primers and reagents from the iScript™ select cDNA synthesis kit (Bio-Rad
Laboratories, Hercules, CA). The pseudogene ZC3H11B (zinc finger CCCH
type containing 11B) is not characterized in the mouse genome, therefore we
examined a similar gene ZC3H11A (zinc finger CCCH type containing 11A) in mice. ZC3H11A in mice and ZC3H11B in humans are highly conserved with 79% nucleotide similarity by BLAST alignment analysis
(http://blast.ncbi.nlm.nih.gov). We used quantitative Real-Time PCR (qRT-
PCR) to validate the gene expression. qRT-PCR primers were designed using
ProbeFinder 2.45 (Roche Applied Science, Indianapolis, IN) and this was performed using a Lightcycler 480 Probe Master (Roche Applied Science,
Indianapolis, IN). The reaction was run in a Lightcycler 480 for 45 cycles under the following conditions: 95˚C for 10s, 56˚C for 10s and 72˚C for 30s.
Gene expression in the retina, RPE and sclera of the myopic eyes and the fellow eyes after six weeks was compared with that in the control eyes. Glyceraldehyde 3-
phosphate dehydrogenase (GAPDH) was used as an endogenous internal
control.
Immunohistochemistry
Whole mouse eyes (myopic eyes after 6 weeks of minus-lens treatment, contralateral
fellow eyes and independent control eyes; n = 6 per type) were embedded in frozen
tissue matrix compound at -20°C for 1 hour. Prepared tissue blocks were
sectioned with a cryostat at a thickness of 6 microns and collected on clean polysine™ glass slides. Slides with the sections were air-dried at room temperature (RT) for 1 hour and fixed with 4% paraformaldehyde for 10 min.
After washing three times with 1x PBS for 5 minutes, 4% bovine serum albumin
(BSA) diluted with 1x PBS was added as a blocking buffer. The slides were then covered and incubated for 1 hour at RT in a humid chamber. After rinsing with 1x PBS, specific primary antibodies raised in rabbit against ZC3H11A and
SLC30A10, and raised in goat against LYPLAL1 (Abcam, Cambridge, UK), diluted 1:200 with 4% BSA, were added and incubated at 4°C in a humid chamber overnight. After washing three times with 1x PBS for 10 min,
fluorescein-labeled goat anti-rabbit secondary antibody (1:800, Invitrogen-
Molecular Probes, Eugene, OR) and fluorescein-labeled rabbit anti-goat
secondary antibody (1:800, Santa Cruz Biotechnology, Inc., CA, USA) were
applied, respectively, and incubated for 90 min at RT. After washing and air-
drying, slides were mounted with antifade medium containing DAPI (4',6-
diamidino-2-phenylindole; Vectashield, Vector Laboratories, Burlingame,
CA) to visualize the cell nuclei. Sections incubated with 4% BSA, with the
primary antibody omitted, were used as a negative control. A fluorescence microscope
(Axioplan 2; Carl Zeiss Meditec GmbH, Oberkochen, Germany) was used to
examine the slides and capture images. Experiments were repeated in
duplicate on three different samples.
3.4 Results
3.4.1 Datasets after quality control
A genome-wide meta-analysis of three GWAS on AL was performed in the post quality control samples from SCES (n = 1,860), SCORM (n = 929) and SiMES (n = 2,155). Principal component analysis (PCA) of these samples with reference to the HapMap Phase 2 individuals showed that the two
Chinese cohorts (SCES and SCORM) are indistinguishable with respect to samples of Han Chinese descent, and the differentiation from samples of
Japanese descent is evident only on the fourth principal component (Figure
5). The SiMES Malays are genetically similar to the Chinese-descent samples relative to individuals with European or African ancestries. The distributions of AL measurements in the three cohorts were approximately Gaussian and
the baseline characteristics are summarised in Table 5. The mean AL was
23.98 mm (SD = 1.39 mm), 24.10 mm (SD = 1.18 mm) and 23.57 mm (SD =
1.04 mm) for SCES, SCORM and SiMES, respectively. Moderate to high
correlations between AL and SE were observed (SCES/SCORM/SiMES;
Pearson correlation coefficient r = -0.75, -0.76 and -0.62, respectively).
3.4.2 Locus at chromosome 1q41 achieved genome-wide significance
The meta-analysis was performed on 456,634 SNPs present in all three studies, and the quantile-quantile (QQ) plots of the P-values showed only modest inflation of the test statistics in SCES and in the meta-analysis
(genomic control inflation factor: λmeta = 1.03; λSCES = 1.05; λSCORM = 1.00;
λSiMES = 1.00, Figure 6).
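As a minimal sketch of how a genomic control inflation factor of this kind is commonly computed, the snippet below takes λ as the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.455). The function names and the evenly spaced P-values are hypothetical; the software actually used for the thesis analysis is not specified here.

```python
from statistics import NormalDist, median

# Median of a 1-df chi-square distribution: squared normal quantile at 0.75 (about 0.455).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def p_to_chi2(p):
    """Convert a two-sided P-value (0 < p <= 1) to a 1-df chi-square statistic."""
    z = NormalDist().inv_cdf(p / 2.0)   # lower-tail quantile; the sign vanishes on squaring
    return z * z


def genomic_inflation(p_values):
    """Lambda = median observed chi-square / median expected under the null."""
    return median(p_to_chi2(p) for p in p_values) / CHI2_1DF_MEDIAN


def gc_adjust(chi2, lam):
    """Genomic-control correction: deflate the statistic by lambda when lambda > 1."""
    return chi2 / max(lam, 1.0)


# Hypothetical, perfectly uniform P-values: lambda should be very close to 1.
p_grid = [(i + 0.5) / 1000 for i in range(1000)]
lam = genomic_inflation(p_grid)
print(round(lam, 3))
```

A λ near 1.00, as reported for SCORM and SiMES, indicates test statistics that follow the null distribution closely; values such as 1.03 to 1.05 indicate mild inflation that the correction in `gc_adjust` would remove.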
A cluster of four SNPs on chromosome 1q41 (rs4373767, rs10779363,
rs7544369 and rs4428898) attained genome-wide significance on meta-
analysis for AL, adjusting for age, gender, height and education level (Figure
7). Analyses conducted without adjustment for height or education level
yielded the same pattern of results. The most significant SNP rs4373767 (Pmeta
= 2.69 × 10⁻¹⁰) explained 0.98% of AL variance in SCES, 0.86% in SCORM
and 0.73% in SiMES, and each copy of the minor allele (cytosine) decreased
AL by 0.16 mm on average (Table 6). These top associated SNPs at
chromosome 1q41 remained significant after adjustment for genomic control
(Pmeta ≤ 1.85 × 10⁻⁸). Table 6 also lists three genetic loci at chromosome
2p13.1 (SEMA4F), 2p21 (SPTBN1) and 5q11.1 (PARP8) exhibiting suggestive
evidence of association with AL, each with at least one SNP with a
P-value < 1 × 10⁻⁵.
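The pooling of per-cohort effect sizes can be sketched as follows, assuming an inverse-variance fixed-effects model together with Cochran's Q for heterogeneity. The weighting scheme and the function names are assumptions made for illustration. The inputs are the rounded estimates for rs4373767 as printed in Table 6, so the pooled β of about -0.16 mm is recovered, but the P-values will not match the published ones exactly because the inputs carry only two decimals.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance pooled effect, its s.e., two-sided P-value and Cochran's Q."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))   # two-sided normal tail probability
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, se, p, q


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form implemented for positive even df only")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


# Rounded estimates for rs4373767 (SCES, SCORM, SiMES) as printed in Table 6.
beta, se, p, q = fixed_effects_meta([-0.21, -0.16, -0.12], [0.05, 0.05, 0.04])
p_het = chi2_sf_even_df(q, df=2)   # df = number of studies - 1
print(f"beta={beta:.3f} se={se:.3f} P={p:.2e} Q={q:.2f} Phet={p_het:.2f}")
```

The closed-form tail probability covers the two cases that occur in this chapter: three discovery cohorts (df = 2) and five case-control cohorts (df = 4).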
3.4.3 Association with high myopia on the identified SNPs
To assess whether these four SNPs at chromosome 1q41 have any role in
high myopia predisposition, we performed association testing of these SNPs
with high myopia in two independent case-control studies from Japan
consisting of 987 high myopes and 1,744 controls. High myopes were defined
as individuals with SE ≤ -9.00 D or AL ≥ 28 mm. All four SNPs exhibited
consistent evidence of association (P < 0.05) in both Japanese studies,
suggesting a potential role of these SNPs for high myopia (Table 7).
We further dichotomized the quantitative refraction from our three
population-based studies (SCORM, SCES and SiMES) to define samples as high myopes and controls according to criteria similar to those used for the Japanese datasets. High myopes in SCES and SiMES were younger and more highly educated than controls. While the case-control associations of these four SNPs with high myopia did not achieve statistical significance in SCES and SiMES, this is likely a consequence of the small sample sizes, since the direction and magnitude of the odds ratios were highly similar across all cohorts. The meta-analysis of 1,118 high myopia cases and 5,433 controls from all five cohorts yielded strong evidence of association with high myopia at these SNPs
(Pmeta between 1.45 × 10⁻⁶ and 7.86 × 10⁻⁸, Table 7), with no evidence of inter-
study heterogeneity (P ≥ 0.75 for heterogeneity). The minor allele cytosine at
rs4373767 lowered the odds of high myopia by 25% with respect to the
thymine allele (ORmeta = 0.75, 95% CI: 0.68 – 0.84, Pmeta = 4.38 × 10⁻⁷). The
stringent definition of high myopia (SE ≤ -9.00 D) used here classified only 1.0% to 2.4% of our samples as cases, and relaxing this criterion to the commonly adopted threshold of SE ≤ -6.00 D identified more myopia cases
and increased the statistical support for all four SNPs (Pmeta between 1.47 × 10⁻⁷
and 9.13 × 10⁻⁹).
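The dichotomization of refraction into cases and controls can be sketched as below. The default cutoffs follow the definitions given in the footnote to Table 7 (high myopia: SE ≤ -9.0 D in either eye; controls: SE ≥ -3.0 D in both eyes), with the SCORM thresholds passed explicitly. The function name and the example refractions are hypothetical.

```python
def classify_high_myopia(se_right, se_left, case_cutoff=-9.0, control_cutoff=-3.0):
    """Return 'case', 'control' or None (excluded) from spherical equivalent in dioptres."""
    worse_eye = min(se_right, se_left)      # the more myopic (more negative) eye
    if worse_eye <= case_cutoff:            # either eye at or beyond the case cutoff
        return "case"
    if worse_eye >= control_cutoff:         # both eyes at or above the control cutoff
        return "control"
    return None                             # intermediate refraction: excluded


print(classify_high_myopia(-9.5, -7.0))     # case
print(classify_high_myopia(-1.0, 0.5))      # control
print(classify_high_myopia(-5.0, -2.0))     # None
# Thresholds listed for SCORM children in the Table 7 footnote.
print(classify_high_myopia(-6.5, -5.0, case_cutoff=-6.0, control_cutoff=-1.0))  # case
```

Individuals with intermediate refraction fall into neither group, which is why the case and control counts in Table 7 are smaller than the full cohort sizes.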
This associated interval spans approximately 70kb in the extended linkage
disequilibrium (LD) block within an intergenic region on chromosome 1q41
(pairwise r2 > 0.5 with the most significant SNP rs4373767, Figure 8A). Zinc
finger family CCCH-type 11B pseudogene ZC3H11B (RefSeq NG_007367.2)
is embedded between the associated top SNPs rs4373767 and rs10779363
(Figure 8B). The most significant SNP rs4373767 is located 223kb downstream from SLC30A10 (RefSeq NM_018713.2), which is a member of solute carrier family 30, and 354kb downstream of LYPLAL1 (RefSeq
NM_138794.3), encoding a lysophospholipase-like protein.
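For readers less familiar with the LD measure used to delimit the associated interval, the sketch below computes r² as the squared Pearson correlation between minor-allele counts at two SNPs. This genotype-based version is an approximation, chosen here for brevity, to the haplotype-based r² that tools such as Haploview report; the genotype vectors are hypothetical.

```python
def genotype_r2(x, y):
    """Squared Pearson correlation between two vectors of minor-allele counts (0, 1, 2).

    Both SNPs must be polymorphic in the sample, otherwise the variance is zero.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)


# Hypothetical genotypes for eight individuals at two SNPs (illustration only).
snp_a = [0, 1, 2, 1, 0, 0, 1, 2]
snp_b = [0, 1, 2, 1, 0, 1, 1, 2]
r2 = genotype_r2(snp_a, snp_b)
print(round(r2, 3), r2 > 0.5)  # 0.821 True
```

Under the r² > 0.5 criterion mentioned above, this hypothetical pair of SNPs would be counted as belonging to the same extended LD block.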
3.4.4 Gene expression
The mRNA expression levels of ZC3H11B, SLC30A10 and LYPLAL1 were surveyed in 24-week human fetal and adult tissues using reverse-
transcriptase polymerase chain reaction (RT-PCR). Whilst ZC3H11B and
LYPLAL1 were found to be expressed across all the tissues including brain, placenta, neural retina, retinal pigment epithelium (RPE) and sclera, the expression of ZC3H11B was more abundant compared to LYPLAL1 (Figure
9). SLC30A10 was expressed in all tissues but the adult sclera, analogous to observations made in other zinc transporters303.
Gene expressions for ZC3H11A, SLC30A10 and LYPLAL1 from the tissues
of myopic (with SE < -5.0 D) and fellow non-occluded eyes of the
experimental mice were compared with age-matched control tissues (Figure
10). The mRNA levels of ZC3H11A, a gene that is conserved with respect to
ZC3H11B in humans, were significantly down-regulated in myopic eyes
compared to naive controls (retina/RPE/sclera, fold change = -2.88, -3.24 and -2.07; P = 2.60 × 10⁻⁵, 2.62 × 10⁻⁶ and 1.08 × 10⁻⁴, respectively). At the
neighboring gene SLC30A10, there was a similarly significant reduction in mRNA expression in the retina and RPE of myopic eyes compared with independent controls (retina/RPE, fold change = -2.02 and -2.69; P = 2.00 × 10⁻⁴ and
2.00 × 10⁻⁴, respectively), with elevated expression in the sclera (fold change
= 4.58; P = 4.02 × 10⁻⁴). Another neighboring gene, LYPLAL1, exhibited up-
regulation of transcription levels in the retina and RPE but was down-regulated in the
sclera (retina/RPE/sclera, fold change = 2.71, 3.45 and -2.36; P = 1.50 × 10⁻⁴,
1.50 × 10⁻⁴ and 1.54 × 10⁻⁴, respectively).
Immunohistochemical results confirmed the localization of ZC3H11A,
SLC30A10 and LYPLAL1 proteins in the neural retina, RPE and sclera (Figure
11). For ZC3H11A, positive immunostaining intensity was reduced
significantly in the myopic tissues of experimental mice compared to the non-
myopic independent controls (Figure 11A). This is consistent with the
differential expression patterns at the transcription level. For SLC30A10 and
LYPLAL1, the changes in protein expression were likewise similar
to those of their mRNA levels (Figures 11B and 11C).
3.5 Discussion
We report that the chromosome 1q41 locus (most significant SNP
rs4373767) is associated with AL in a meta-analysis of three GWAS
performed in the study cohorts consisting of Chinese adults, Chinese children,
and Malay adults. The discovery of chromosome 1q41 as a locus for high
myopia in our data is further supported by validation in two independent
Japanese cohorts, and the observed genetic effects are highly consistent across
all five studies. The pseudogene ZC3H11B and two nearby genes SLC30A10
and LYPLAL1 were found to be expressed in the human retina and sclera. The
potential roles of these three candidate genes in regulating myopia were further
supported by the concordant changes in the pattern of transcription and
protein expression in the mouse model.
The ZC3H11B pseudogene belongs to the CCCH-type zinc finger family;
this type of zinc finger has been shown to act as an RNA-binding motif that facilitates mRNA processing at transcription 304. Emerging evidence suggests that pseudogenes, which resemble known genes but do not produce proteins, play a significant role in pathological conditions by competing for binding sites to regulate the transcription of their protein-coding counterparts 305-307. Although the function of ZC3H11B in humans is presently unknown, the implicated role of the murine gene ZC3H11A
(conserved gene of ZC3H11B in mouse) in myopia development is in keeping with previous findings that several zinc finger proteins are involved in myopia
308,309. Given the role of zinc finger proteins as transcription factors 310, ZENK
has been proposed to function as a messenger in modulating the visual
signaling cascade in the chicken retina, where the expression of ZENK was
suppressed by minus defocus (induced myopic eye growth)
and enhanced by positive defocus (induced hyperopic eye growth)311-313.
Similarly, it has been reported that ZENK knockout mice had elongated AL
and a myopic shift in refraction309. Moreover, early growth response gene
type 1 EGR-1 (the human homologue of ZENK) has been shown to activate
the transforming growth factor beta 1 gene TGFB1 by binding its promoter314,315;
TGFB1 has itself been implicated in myopia224,225. Another zinc
finger protein, ZNF644 (zinc finger protein 644 isoform), has recently been identified to be
responsible for high myopia using whole-exome sequencing in a Han
Chinese family308, although its influence on “myopia genes” remains to be
elucidated. In light of this, the observation that ZC3H11B is abundantly
expressed in retina and sclera, together with the significant down-regulation of the coding counterpart ZC3H11A in myopic mouse eyes, suggests it may promote or inhibit the transcription of ocular growth genes vital in myopia development.
One of the two neighboring genes, SLC30A10, is an efflux transporter that reduces cytoplasmic zinc concentrations316. The SLC30 zinc transporters are
expressed abundantly in human RPE cells, and the retina has been observed to
possess the highest concentration of zinc in the human body303. Intracellular zinc
deficiency in the retina has thus been implicated in the
pathogenesis of age-related macular degeneration (AMD)317,318, and in RPE-
photoreceptor complex deficits, which can affect visual signal transduction
from retina to sclera and lead to visual impairment319. LYPLAL1 functions as a
triglyceride lipase and this gene has been shown to be up-regulated in
subcutaneous adipose tissue in obese individuals320-322. While the relationship
between LYPLAL1 and myopia is unknown, elevated saturated-fat intake has
been proposed to influence myopia development through the retinoid receptor
pathway227,323,324. Interestingly, the SNPs pinpointing chromosome 1q41 in
our study are 1 Mb away from the transforming growth factor beta 2 gene
(TGFβ2), whose mRNA levels have been reported to be down-regulated
during myopia progression in an induced tree shrew myopia model325. None of
these nearby genes, however, is within the LD block containing our
identified SNPs.
Chromosome 1q41 is a previously reported locus for refraction from a
linkage analysis of 486 pedigrees in the Beaver Dam Eye Study, US 197. Using
microsatellite markers, Klein et al. identified novel regions of linkage to SE on
chromosome 1q41, with the peak spanning a broad region near marker
D1S2141 (multipoint P < 1.9 × 10⁻⁴). This result, however, was not replicated
in a subsequent genome-wide linkage scan for SE with denser SNP markers,
partially due to the differing linkage information conveyed by SNPs versus
microsatellites188. The identified variants at chromosome 1q41 in our study
were noted to exhibit weaker, albeit still significant, association with SE in
SCES and SCORM (rs4373767, SCES/SCORM: P = 3.54 × 10⁻³ and 3.49 × 10⁻²,
respectively), but not in SiMES (P = 3.51 × 10⁻¹), which is consistent with the
lower correlation of AL and SE seen in the SiMES data, partially owing to
increasing lens opalescence in the Malay population326,327.
Our data have shown that genetic variants on chromosome 1q41 influence
the physiological attribute of AL and are also associated with high myopia.
Elongation of AL is the major underlying structural determinant of high
myopia, mostly accompanied by prolate eyeballs and thinning of the sclera,
macula and retina111. Thus, high myopia is also defined as AL of >26 mm in
some studies23,328. It is possible that genes involved in a quantitative trait
(refraction or underlying AL) also play a role in the extreme forms of the trait
(high myopia)30. Two recent GWAS performed in general Caucasian populations have identified genetic variants for quantitative refraction at
chromosome 15q14 34 and 15q25 35, of which the locus on 15q14 was subsequently confirmed to be associated with high myopia in the Japanese251.
Our GWAS results herein highlight the relevance of AL QTLs for high myopia
predisposition, which advances our understanding of the genetic aetiology of
myopia at different levels of severity.
The meta-analysis of three GWAS in our discovery phase suggests that the
quantitative trait locus at chromosome 1q41 accounts for variation in AL in
both school children and adults, regardless of age differences. Notably,
early-onset myopia in childhood may progress continuously toward high
myopia in later life, while adult-onset myopia is usually of the low or moderate form329,330. The significant association on chromosome 1q41 with
high myopia in adults and children thus also implies that this locus identified
for AL is likely to be associated with early-onset myopia.
The prevalence of myopia among Asian populations is considerably higher
than in Caucasians 283. Although distinct genetic mechanisms governing
myopia may exist for populations with different genetic backgrounds, we
believe there are polymorphisms involved in refractive variation that are
shared across populations. However, the allele frequencies of these identified
SNPs vary across populations. For instance, the minor C allele of rs4373767
was the major allele in the HapMap Africans and Europeans, with frequencies of
0.92 and 0.62, respectively. The effect size of the top SNPs on AL was found to be
greatly attenuated in Singapore Indian samples (rs4373767; β = -0.04 mm per
minor allele C, P = 0.151, MAF = 0.48; details of Indian cohort in Chapter 4).
In addition, we note that the variability in refraction attributed to AL may vary
in different ethnic groups. For example, AL has been reported to account for a
larger proportion of the variation in refraction in East-Asian children compared to their Caucasian counterparts 331; therefore the increased power of
refraction may reflect more variation in factors other than pure elongation of
AL in certain ethnic groups. Such heterogeneity may confer different
statistical power, and large sample sizes will be required to study the
transferability of the same variants in Caucasian populations 332,333. The recent
largest GWAS, conducted by the international Consortium for Refractive Error
and Myopia, has confirmed that the 1q41 locus influences axial length in
Europeans (data not yet published), which will be discussed in Chapter
6.
In conclusion, our findings suggest that common variants at chromosome
1q41 are associated with AL and high myopia in a pediatric and an adult
cohort, the latter incorporating Chinese, Malay and Japanese populations.
Further evaluation of causal variants and underlying pathway mechanisms
may contribute to early identification of children at highest risk of developing
myopia, and eventually lead to appropriate interventions to retard the
progression of myopia.
Table 5: Characteristics of study participants in the five Asian cohorts.

Characteristics | SCESᶜ | SCORMᶜ | SiMESᶜ | Japan Dataset 1ᵈ: High myopia | Japan Dataset 1ᵈ: Controls | Japan Dataset 2ᵉ: High myopia | Japan Dataset 2ᵉ: Controls
Individuals (n) | 1,860 | 929 | 2,155 | 483 | 1,194 | 504 | 550
Male (%) | 51.5 | 51.7 | 49.3 | 33.7 | 41.3 | 43.3 | 49.5
Ageᵃ (yrs) | 58.4 (9.5) | 10.8 (0.8) | 57.7 (13.9) | 58.8 (13.2) | 50.3 (12.6) | 37.8 (11.9) | 39.7 (15.9)
Range of age | [44, 85] | [10, 12] | [40, 80] | [14, 91] | [20, 79] | [12, 76] | [21, 75]
Heightᵃ (cm), male | 168.5 (6.3) | 144.8 (8.7) | 165.5 (6.4) | NAᶠ | NA | NA | NA
Heightᵃ (cm), female | 156.7 (5.5) | 145.5 (8.9) | 152.3 (6.2) | NA | NA | NA | NA
Average ALᵃ (mm) | 23.97 (1.39) | 24.13 (1.18) | 23.57 (1.04) | 30.08 (1.38) | NA | 27.83 (1.28) | NA
Range of AL | [20.64, 33.36] | [21.05, 28.20] | [20.48, 31.11] | [28.00, 38.03] | NA | [24.25, 34.74] | NA
Average SEᵃ (diopter) | -0.77 (2.64) | -2.02 (2.26) | -0.05 (1.90) | -14.86 (4.28) | NA | -11.61 (2.22) | NA
Range of SE | [-15.40, 6.25] | [-11.09, 3.78] | [-17.46, 8.56] | [-42.00, -2.50] | NA | [-23.00, -9.25] | ≥ -3.0

ᵃ Data presented are means (standard deviation). AL, ocular axial length; SE, spherical equivalent.
ᵇ The education level of the children in SCORM was represented by the level of educational attainment of the father, as
ᶜ GWAS cohorts. SCES, Singapore Chinese Eye Study; SCORM, Singapore Cohort study of the Risk factors for Myopia; SiMES, Singapore Malay Eye Study.
ᵈ For the Japan dataset 1, high myopia, AL ≥ 28 mm for both eyes; controls, general healthy population.
ᵉ For the Japan dataset 2, high myopia, SE ≤ -9.0 D for either eye; controls, SE ≥ -3.0 D for both eyes.
ᶠ NA, data not available.
Table 6. Top SNPs (Pmeta-value ≤ 1 × 10⁻⁵) associated with AL from the meta-analysis in the three Asian cohorts.

Cohort sizes: SCESᶜ, n = 1,860; SCORMᶜ, n = 929; SiMESᶜ, n = 2,155; meta-analysis, n = 4,944.

SNP | Nearest gene | CHR | BP | MAᵃ | SCES MAFᵇ | SCES β (s.e.)ᵈ | SCES P | SCORM MAFᵇ | SCORM β (s.e.)ᵈ | SCORM P | SiMES MAFᵇ | SiMES β (s.e.)ᵈ | SiMES P | βmeta (s.e.) | Pmeta | Phetᵉ
rs4373767 | ZC3H11B | 1 | 217826305 | C | 0.30 | -0.21 (0.05) | 2.55 × 10⁻⁶ | 0.32 | -0.16 (0.05) | 1.80 × 10⁻³ | 0.24 | -0.12 (0.04) | 1.22 × 10⁻³ | -0.16 (0.02) | 2.69 × 10⁻¹⁰ | 0.23
rs10779363 | ZC3H11B | 1 | 217853513 | C | 0.29 | -0.21 (0.05) | 5.27 × 10⁻⁶ | 0.31 | -0.17 (0.05) | 1.63 × 10⁻³ | 0.24 | -0.11 (0.04) | 1.70 × 10⁻³ | -0.15 (0.02) | 7.83 × 10⁻¹⁰ | 0.23
rs7544369 | ZC3H11B | 1 | 217856085 | T | 0.29 | -0.21 (0.05) | 7.17 × 10⁻⁶ | 0.31 | -0.16 (0.05) | 2.56 × 10⁻³ | 0.24 | -0.12 (0.04) | 1.45 × 10⁻³ | -0.15 (0.03) | 1.10 × 10⁻⁹ | 0.29
rs4428898 | ZC3H11B | 1 | 217806589 | G | 0.30 | -0.21 (0.05) | 4.49 × 10⁻⁶ | 0.31 | -0.14 (0.05) | 7.46 × 10⁻³ | 0.22 | -0.10 (0.04) | 4.79 × 10⁻³ | -0.14 (0.02) | 9.07 × 10⁻⁹ | 0.19
rs4557020 | SPTBN1 | 2 | 54571685 | T | 0.37 | -0.11 (0.04) | 9.08 × 10⁻³ | 0.37 | -0.15 (0.05) | 2.81 × 10⁻³ | 0.31 | -0.11 (0.03) | 8.10 × 10⁻⁴ | -0.12 (0.02) | 2.61 × 10⁻⁷ | 0.76
rs282544 | PARP8 | 5 | 50062222 | C | 0.35 | -0.10 (0.04) | 2.38 × 10⁻² | 0.33 | -0.14 (0.05) | 6.13 × 10⁻³ | 0.40 | -0.09 (0.03) | 2.40 × 10⁻³ | -0.10 (0.02) | 4.15 × 10⁻⁶ | 0.70
rs1137 | SEMA4F | 2 | 74792684 | C | 0.16 | -0.01 (0.06) | 8.68 × 10⁻¹ | 0.18 | 0.22 (0.06) | 5.97 × 10⁻⁴ | 0.26 | 0.14 (0.03) | 3.01 × 10⁻⁵ | 0.12 (0.03) | 4.26 × 10⁻⁶ | 0.02
rs2404958 | PARP8 | 5 | 50098792 | T | 0.35 | -0.10 (0.04) | 2.91 × 10⁻² | 0.33 | -0.15 (0.05) | 5.32 × 10⁻³ | 0.40 | -0.09 (0.03) | 2.46 × 10⁻³ | -0.10 (0.02) | 4.72 × 10⁻⁶ | 0.66
rs10735496 | ZC3H11B | 1 | 217790029 | C | 0.29 | -0.14 (0.05) | 1.95 × 10⁻³ | 0.30 | -0.09 (0.05) | 7.61 × 10⁻² | 0.26 | -0.10 (0.03) | 3.66 × 10⁻³ | -0.11 (0.02) | 5.75 × 10⁻⁶ | 0.71
rs4671938 | SPTBN1 | 2 | 54546708 | G | 0.35 | -0.08 (0.04) | 5.85 × 10⁻² | 0.34 | -0.12 (0.05) | 1.61 × 10⁻² | 0.33 | -0.11 (0.03) | 8.92 × 10⁻⁴ | -0.10 (0.02) | 7.46 × 10⁻⁶ | 0.82
rs32396 | PARP8 | 5 | 50142196 | A | 0.35 | -0.09 (0.04) | 3.78 × 10⁻² | 0.33 | -0.14 (0.05) | 6.98 × 10⁻³ | 0.40 | -0.09 (0.03) | 2.57 × 10⁻³ | -0.10 (0.02) | 7.73 × 10⁻⁶ | 0.69
rs12055210 | PARP8 | 5 | 50025458 | A | 0.35 | -0.10 (0.04) | 2.38 × 10⁻² | 0.33 | -0.14 (0.05) | 8.39 × 10⁻³ | 0.40 | -0.09 (0.03) | 4.01 × 10⁻³ | -0.10 (0.02) | 9.11 × 10⁻⁶ | 0.70
rs11954386 | PARP8 | 5 | 50021409 | A | 0.35 | -0.10 (0.04) | 2.17 × 10⁻² | 0.33 | -0.14 (0.05) | 6.34 × 10⁻³ | 0.40 | -0.09 (0.03) | 5.35 × 10⁻³ | -0.10 (0.02) | 9.44 × 10⁻⁶ | 0.63

ᵃ MA, minor allele.
ᵇ MAF, minor allele frequency in each cohort.
ᶜ GWAS cohorts. SCES - Singapore Chinese Eye Study; SCORM - Singapore Cohort study of the Risk factors for Myopia; SiMES - Singapore Malay Eye Study.
ᵈ β, coefficient of linear regression; s.e., standard error for coefficient β. Association between each genetic marker and AL was examined using linear regression, adjusted for age, gender, height and level of education. The effect sizes denote the change in AL (in millimeters) per additional copy of the minor allele.
ᵉ Phet, P-value for heterogeneity by Cochran’s Q test across three study cohorts.
Table 7. Association between genetic variants at chromosome 1q41 and high myopia in the five Asian cohorts.

Cases/controlsᶜ: Japan Dataset 1, 483/1,194; Japan Dataset 2, 504/550; SCESᵇ, 44/1,305; SCORMᵇ, 65/332; SiMESᵇ, 22/2,052; meta-analysis, 1,118/5,433.

SNP | BP | MAᵃ | Japan 1 ORᵈ (95% CI) | Japan 1 P | Japan 2 OR (95% CI) | Japan 2 P | SCES OR (95% CI) | SCES P | SCORM OR (95% CI) | SCORM P | SiMES OR (95% CI) | SiMES P | Meta OR (95% CI) | Pmeta | Phetᵉ
rs4428898 | 217806589 | G | 0.74 (0.64, 0.87) | 2.33 × 10⁻⁴ | 0.76 (0.64, 0.91) | 2.15 × 10⁻³ | 0.73 (0.45, 1.19) | 2.06 × 10⁻¹ | 0.63 (0.40, 0.99) | 4.89 × 10⁻² | 0.72 (0.32, 1.64) | 4.37 × 10⁻¹ | 0.74 (0.66, 0.83) | 7.86 × 10⁻⁸ | 0.96
rs4373767 | 217826305 | C | 0.74 (0.63, 0.86) | 1.44 × 10⁻⁴ | 0.81 (0.68, 0.96) | 1.80 × 10⁻² | 0.73 (0.44, 1.18) | 1.99 × 10⁻¹ | 0.59 (0.38, 0.94) | 2.59 × 10⁻² | 0.77 (0.35, 1.69) | 5.16 × 10⁻¹ | 0.75 (0.68, 0.84) | 4.38 × 10⁻⁷ | 0.75
rs10779363 | 217853513 | C | 0.74 (0.63, 0.87) | 2.11 × 10⁻⁴ | 0.81 (0.68, 0.96) | 1.41 × 10⁻² | 0.68 (0.41, 1.13) | 1.35 × 10⁻¹ | 0.62 (0.39, 0.98) | 4.13 × 10⁻² | 0.76 (0.34, 1.68) | 4.96 × 10⁻¹ | 0.76 (0.68, 0.85) | 7.81 × 10⁻⁷ | 0.82
rs7544369 | 217856085 | T | 0.75 (0.64, 0.88) | 3.05 × 10⁻⁴ | 0.82 (0.69, 0.97) | 2.32 × 10⁻² | 0.67 (0.39, 1.15) | 1.45 × 10⁻¹ | 0.62 (0.39, 0.98) | 4.17 × 10⁻² | 0.74 (0.33, 1.64) | 4.56 × 10⁻¹ | 0.76 (0.69, 0.85) | 1.45 × 10⁻⁶ | 0.79

ᵃ MA, minor allele.
ᵇ GWAS cohorts; SCES - Singapore Chinese Eye Study; SCORM - Singapore Cohort study of the Risk factors for Myopia; SiMES - Singapore Malay Eye Study.
ᶜ The sample sizes for each study denote the number of high-myopia cases versus controls. For the Japan dataset 1, high myopia, AL ≥ 28 mm for both eyes; controls, general healthy population. For the Japan dataset 2, SCES and SiMES, high myopia, SE ≤ -9.0 D for either eye; controls, SE ≥ -3.0 D for both eyes. For SCORM children, high myopia, SE ≤ -6.0 D for either eye; controls, SE ≥ -1.0 D for both eyes.
ᵈ OR, odds ratio per copy of minor allele.
ᵉ Phet, P-value for heterogeneity by Cochran’s Q test across five study cohorts.
Figure 4. Principal component analysis (PCA) was performed in SiMES to assess the extent of population structure. Each figure represents a bivariate plot of two principal components from the PCA of genetic diversity within SiMES. (A) 1st eigenvector against 2nd eigenvector, (B) 2nd eigenvector against 3rd eigenvector, (C) 3rd eigenvector against 4th eigenvector and (D) 1st eigenvector against 5th eigenvector. The first 5 principal components were used as covariates to account for population structure.
Figure 5. Principal Component Analysis (PCA) of discovery cohorts SCES, SCORM and SiMES with respect to the four population panels in phase 2 of the HapMap samples (CEU - European, YRI - African, CHB - Chinese, JPT - Japanese). (A) Principal components 1 versus 2; the principal components (PCs) were calculated with SCES, SCORM, SiMES and four HapMap panels on the thinned set of 102,122 SNPs (r² < 0.2). (B) Principal components 1 versus 2; (C) Principal components 1 versus 3; (D) Principal components 1 versus 4. For panels (B-D), the PCs were calculated with SCES, SCORM, SiMES and HapMap Asian population panels on the thinned set of 86,516 SNPs (r² < 0.2).
Figure 6. Quantile-Quantile (Q-Q) plots of P-values for association between all SNPs and AL in the individual cohorts (A) SCES, (B) SCORM, (C) SiMES, and in the combined meta-analysis of the discovery cohorts (D) SCES + SCORM + SiMES.
Figure 7. Manhattan plot of -log10(P) for the association with axial length from the meta-analysis in the combined cohorts of SCES, SCORM and SiMES. The red horizontal line denotes genome-wide significance (P = 5.0 × 10⁻⁸).
Figure 8. The chromosome 1q41 region and its association with axial length in the Asian cohorts. A) Regional plots for AL from the meta-analysis of three Asian GWAS cohorts: SCES, SCORM and SiMES. The association signals in a 1 megabase (Mb) region at chromosome 1q41 from 217,400 kb to 218,400 kb around the top SNP rs4373767 (red diamond) are plotted. The degree of pair-wise LD between the rs4373767 and any genotyped SNPs in this region is indicated by red shading, measured by r2. Superimposed on the plots are gene locations and recombination rates in HapMap Chinese and Japanese populations (blue lines). B) LD plot showing pair-wise r2 for all the SNPs genotyped in HapMap database residing between rs4428898 and rs7544369, inclusively, at chromosome 1q41. The four identified top SNPs are in red rectangles. The LD plot is generated by Haploview using SNPs (MAF > 1%) genotyped on Han Chinese and Japanese samples in the HapMap database. All coordinates are in Build hg18.
Figure 9. mRNA expression of ZC3H11B, SLC30A10 and LYPLAL1 in human tissues. Expression of mRNA for the three genes was examined in human brain, placenta, neural retina (retina), retinal pigment epithelium (RPE) and sclera from adult tissues, and retina/RPE and sclera from 24-week gestation fetal tissues using reverse transcription polymerase chain reaction (RT-PCR). Glyceraldehyde 3-phosphate dehydrogenase (GAPDH) is a housekeeping gene and was used as an internal control for the quantification of mRNA expression. NTC (No template control) served as a negative control with the use of water rather than cDNA during PCR.
Figure 10. Transcription quantification of ZC3H11A, SLC30A10 and LYPLAL1 in mouse retina, retinal pigment epithelium and sclera in induced myopic eyes, fellow eyes and independent control eyes. Myopia was induced using -15 diopter negative lenses in the right eye of mice for 6 weeks. Uncovered left eyes served as fellow eyes and age-matched naive mouse eyes were controls. mRNA expression in mouse neural retina (retina), retinal pigment epithelium (RPE) and sclera was quantified using quantitative real-time PCR. The bars represent the fold changes of mRNA for each gene after normalization using GAPDH as reference. The mRNA levels of murine ZC3H11A (a gene that is conserved with respect to ZC3H11B in humans), SLC30A10 and LYPLAL1 in myopic and fellow retina, RPE and sclera are compared with independent controls, with P-values as follows: ZC3H11A (retina/RPE/sclera, P = 2.60 × 10⁻⁵, 2.62 × 10⁻⁶ and 1.08 × 10⁻⁴, respectively), SLC30A10 (P = 2.00 × 10⁻⁴, 2.00 × 10⁻⁴ and 4.02 × 10⁻⁴, respectively) and LYPLAL1 (P = 1.50 × 10⁻⁴, 1.50 × 10⁻⁴ and 1.54 × 10⁻⁴, respectively). *P<0.0001.
Figure 11. Immunofluorescent labelling of (A) ZC3H11A, (B) SLC30A10 and (C) LYPLAL1 in mouse retina, retinal pigment epithelium and sclera in induced myopic eyes, fellow eyes and independent control eyes. The neural retina (retina), retinal pigment epithelium (RPE) and scleral cells were immunolabeled with polyclonal antibodies against ZC3H11A, SLC30A10 and LYPLAL1 and were co-labeled with 4',6-diamidino-2-phenylindole (DAPI). Negative controls, treated with the secondary antibody alone and DAPI, were devoid of a fluorescence signal. Scale bar represents 50 µm and magnification is 200X. The green fluorescence shows the localization of the proteins and the blue color indicates the nuclei stained with DAPI. Protein expression followed a trend in abundance similar to that of the mRNA levels depicted in Figure 10. A lower level of expression was found for ZC3H11A in all tissues of myopic mice. A similarly significant reduction was shown in the expression of SLC30A10 in retina and RPE, while a higher level of expression was found in myopic sclera. LYPLAL1 showed a higher level of expression in the retina and RPE but reduced expression in the sclera of myopic mice. The following abbreviations represent the retinal layers: nerve fibre layer (NFL), ganglion cell layer (GCL), inner plexiform layer (IPL), inner nuclear layer (INL), outer plexiform layer (OPL), outer nuclear layer (ONL), photoreceptor layer (PRL) and retinal pigment epithelium (RPE).
4 Chapter 4 Genome-Wide Meta-Analysis of Five Asian
Cohorts Identifies PDGFRA as a Susceptibility Locus for
Corneal Astigmatism
This Chapter is a slightly modified version of the paper published in PLoS Genet
2011, 7(12): e1002402. doi:10.1371/journal.pgen.1002402
4.1 Abstract
Corneal astigmatism refers to refractive abnormalities and irregularities in the curvature of the cornea, and this interferes with light being accurately focused at a single point in the eye. This ametropic condition is highly prevalent, influences visual acuity and is a highly heritable trait. There is currently a paucity of research in the genetic aetiology of corneal astigmatism. Here we report the results from five genome-wide association studies of corneal astigmatism across three Asian populations, with case-control datasets of Chinese (1,991 cases /954 controls),
Malays (1,018 cases /1,220 controls), Indians (825 cases /1,314 controls) and an independent 397 Chinese family trios. Variants in PDGFRA on chromosome 4q12
(lead SNP: rs7677751, allelic odds ratio = 1.26 (95% CI: 1.16-1.36), Pmeta = 7.87
× 10⁻⁹) were identified to be significantly associated with corneal astigmatism, exhibiting consistent effect sizes across all five cohorts. This highlights the potential role of variants in PDGFRA in the genetic aetiology of corneal astigmatism across diverse Asian populations.
4.2 Background
Astigmatism is a condition where light rays are prevented from focusing at a single point in the eye, resulting in blurred vision at any near or far distance. The
reported age-adjusted prevalence of astigmatism was 37.8% for Chinese adults284,
54.8% in rural Asian Indians334, 37% (≤ -0.75 D) for Caucasians in Australia335
and 36.2% in the US336. The prevalence of astigmatism in children varies considerably across different studies and ethnic groups. For instance, the prevalence of astigmatism (≤ -0.75 D) in school-children ranges from 13.6% in
Australia337, 20% in Northern Ireland338, 28.4% for Singapore school children122,
to 42.7% for Chinese children in urban China339.
Although the precise cause of astigmatism is unknown, genetic factors have
been implicated in the aetiology of corneal astigmatism. Studies have reported a higher risk of developing astigmatism in individuals whose sibling or parents have astigmatism125. However, only a few segregation analyses have been
conducted for astigmatism. A study examined the mode of inheritance of
astigmatism in 476 individuals from 125 nuclear families with probands affected
by astigmatism340. The results suggested familial transmission for
affected probands (≤ -0.75 D), and single-major-locus inheritance when
corneal astigmatism was coded as an ordinal outcome by degree of severity.
The proportion of genetic contribution to astigmatism development is generally lower than that for spherical refractive error, with the estimated heritability generally ranging from 30% to 60%157,341-344. For instance, Hammond and colleagues157 investigated the inheritance of astigmatism in 226 monozygotic (MZ) and 280 dizygotic (DZ) twins in the United Kingdom and found that genetic effects accounted for 42% to 61% of the variation in corneal astigmatism. Similarly, the heritability estimate was as high as 60% in 345 MZ and 267 DZ twin pairs in the Genes in Myopia (GEM) twin study in Australia341. While most twin studies have been conducted in Caucasian populations, a study on Chinese twins in Taiwan reported a heritability estimate of 46% for corneal astigmatism, suggesting that genetic factors contribute to a similar extent to the aetiology of astigmatism in Asian populations343. However, in a recent study of 52 MZ and 47 DZ female twin pairs aged 66-79 years, the genetic contribution to corneal astigmatism was not significant345; this may be due to the small sample size.
No genetic loci have been systematically and consistently implicated in the development of corneal astigmatism. Here we report the findings from the meta-analyses of five genome-wide association studies (GWAS) performed in 8,513 individuals from three Asian populations. The discovery phase of our study comprises 4,254 individuals from two population-based GWAS performed in adults of Chinese and Malay ethnicities from the Singapore Prospective Study Program (SP2) and the Singapore Malay Eye Study (SiMES) respectively. The replication phase comprises data from three other genome-wide surveys of: (i) 2,139 Indian adults from the Singapore Indian Eye Study (SINDI); (ii) 929 Chinese school children from the Singapore Cohort Study of the Risk Factors for Myopia (SCORM); and (iii) 397 Chinese trios of parents and astigmatic offspring from the Singaporean Chinese in the Strabismus, Amblyopia and Refractive Error Study (STARS).
4.3 Methods
4.3.1 Study cohorts
Singapore Prospective Study Program (SP2): Participants included in SP2 were recruited from a revisit of four previously conducted population-based surveys in Singapore: the Thyroid and Heart Study 1982-1984346, the National Health Survey 1992347, the National University of Singapore Heart Study 1993-1995348 and the National Health Survey 1998347-349. These studies comprise random samplings of individuals stratified by ethnicity from the entire Singapore population. From 2003 to 2007, a total of 10,747 adults aged 40 years or older were invited to this follow-up survey, of whom 5,157 underwent a health examination and provided blood samples. The present genome-wide genotyping involved individuals of Chinese descent only (n = 2,867). Complete post-filtering data on corneal astigmatism for GWAS were available for 2,016 individuals.
Singapore Malay Eye Study (SiMES): SiMES is a population-based cross-sectional survey of eye diseases in Malay adults aged 40 to 80 years living in Singapore. It was conducted between August 2004 and June 2006. The details of the study design and methods have been previously described291. In brief, a total of 4,168 Malay residents in the south-western area of Singapore were identified and invited for a detailed ocular examination, of whom 3,280 (78.7%) participated. Genome-wide genotyping was performed in 3,072 individuals26,29. Individuals with cataract surgery or missing corneal astigmatism measurements were excluded from the study. Complete post-filtering data on corneal astigmatism for GWAS were available for 2,238 individuals.
Singapore Indian Eye Study (SINDI): SINDI is a population-based survey of major eye diseases288 in ethnic Indians aged 40 to 80 years living in the south-western part of Singapore and was conducted from August 2007 to December 2009. In brief, 4,497 Indian adults were eligible and 3,400 participated. Genome-wide genotyping was performed in 2,953 individuals26,29. As in the discovery cohorts, participants were excluded from the study if they had undergone cataract surgery or had missing corneal astigmatism data. Complete post-filtering data on corneal astigmatism were available for 2,139 participants.
Singapore Cohort study of the Risk factors for Myopia (SCORM): A total of 1,979 children in grades 1, 2 and 3 from three schools were recruited from 1999 to 2001, with detailed information described elsewhere289. The children were examined on the school premises annually by a team of eye care professionals. The GWAS was conducted in a subset of 1,116 Chinese children22. The phenotype used in this study was based on the corneal astigmatism measured at the 4th annual examination of the study (children aged 10 to 12 years). Complete post-filtering data on corneal astigmatism measurements and SNP data were available for 929 SCORM children.
Singaporean Chinese in the Strabismus, Amblyopia and Refractive Error Study (STARS): STARS is a population-based survey of Chinese families with children aged 6 to 72 months residing in the south-western and western regions of Singapore. Disproportionate random sampling by 6-month age groups resulted in the recruitment and subsequent eye examination of 3,009 Chinese children between May 2006 and November 2008. Details of the study design and methodology have been previously described350. A total of 1,451 samples from 440 nuclear families were included for genome-wide genotyping. In all, 397 trio-sets of parents and their offspring with corneal astigmatism had complete post-filtered genotype data.
Measurements and definition of corneal astigmatism
All studies used a similar protocol to measure ocular phenotypes, including corneal curvature, autorefraction and cylinder power, with measurements taken by a team of eye care professionals. Participants in SP2, SiMES and SINDI underwent non-cycloplegic automated refractive assessments using an autorefractor (Canon RK-5, Tokyo, Japan). For SCORM and STARS, cycloplegic measurements (Canon RK-F1, Tokyo, Japan) were performed 30 minutes after three drops of 1% cyclopentolate administered 5 minutes apart.
Corneal curvature radii in the horizontal and vertical meridian were determined with keratometry in millimeters351. The keratometer measured the
anterior corneal surface and used a refractive index of 1.3375 to account for the
contribution from the posterior corneal surface to derive the corneal refractive
power in diopters. Corneal cylinder power was calculated as the difference
between the flattest and steepest meridian of the keratometry readings in diopters
of power.
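The conversion described above can be illustrated with a short sketch. It assumes the standard keratometric relation, power (D) = (n - 1) × 1000 / radius (mm), with n = 1.3375 as stated in the text; the function names are hypothetical and are not taken from the instruments' software.

```python
KERATOMETRIC_INDEX = 1.3375  # refractive index stated in the text


def radius_mm_to_diopters(radius_mm: float) -> float:
    """Convert an anterior corneal radius of curvature (mm) to refractive power (D)."""
    if radius_mm <= 0:
        raise ValueError("radius must be positive")
    return (KERATOMETRIC_INDEX - 1.0) * 1000.0 / radius_mm


def corneal_cylinder(radius_a_mm: float, radius_b_mm: float) -> float:
    """Power of the flattest meridian minus power of the steepest meridian (<= 0 D)."""
    powers = [radius_mm_to_diopters(radius_a_mm), radius_mm_to_diopters(radius_b_mm)]
    return min(powers) - max(powers)


if __name__ == "__main__":
    # Illustrative radii only, not data from the cohorts.
    print(round(radius_mm_to_diopters(7.80), 2))
    print(round(corneal_cylinder(7.80, 7.65), 2))
```

Taking the flattest minus the steepest meridian gives a non-positive cylinder, which matches the negative-cylinder convention used for the case definition (≤ -0.75 D).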
As the corneal cylinder power of the right and left eyes is strongly correlated in all five cohorts (Pearson’s correlation coefficient r ranging from 0.51 to 0.79; P < 0.001), the mean corneal cylinder power over both eyes was used to define corneal astigmatism. Averaging ocular measurements between two eyes in genetic studies has been suggested to be statistically more powerful than using the information from only one eye36, and this approach has been consistently adopted in genome-wide studies of myopia34,35. As with previous studies337,340, we defined individuals with average corneal cylinder power ≤ -0.75 D as cases, and those with average corneal cylinder power between -0.75 D and 0 D as controls.
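The phenotype definition above can be written as a small sketch. The threshold of -0.75 D is taken from the text; the function names are hypothetical, and the handling of positive values (returned as unclassified) is an assumption, since the stated definition covers only values of 0 D or below.

```python
from typing import Optional

CASE_THRESHOLD_D = -0.75  # cut-off stated in the text


def mean_cylinder(right_eye_d: float, left_eye_d: float) -> float:
    """Average corneal cylinder power (D) over both eyes."""
    return (right_eye_d + left_eye_d) / 2.0


def classify(mean_cyl_d: float) -> Optional[str]:
    """Return 'case', 'control', or None if outside the stated definition."""
    if mean_cyl_d <= CASE_THRESHOLD_D:
        return "case"
    if mean_cyl_d <= 0.0:
        return "control"
    return None  # assumption: positive values are left unclassified


if __name__ == "__main__":
    print(classify(mean_cylinder(-1.00, -0.50)))  # mean is exactly -0.75 D
    print(classify(mean_cylinder(-0.50, -0.25)))
```

Because the cut-off is inclusive, an individual whose two-eye average equals -0.75 D is counted as a case.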
4.3.2 Data quality control
For SP2, a total of 2,867 blood-derived DNA samples were genotyped using the Illumina HumanHap 550v3, Human610 Quad and 1M-Duov3 Beadchips. For the samples that were genotyped on two platforms, the genotypes from the denser platform were used in our study. For SiMES (n = 3,072), SINDI (n = 2,953) and STARS (n = 1,451), the Illumina Human610 Quad Beadchips were used for genotyping all DNA samples. For the 1,116 SCORM children, DNA samples were genotyped on the Illumina HumanHap 550 Duo Beadchips.
In brief, for the case-control study design, a first round of autosomal SNP QC was applied to obtain a cleaned set of genotypes for sample QC, by excluding SNPs with: (i) missingness > 5% (per-SNP call rate < 95%); (ii) minor allele frequency (MAF) < 1%; and (iii) Hardy-Weinberg equilibrium (HWE) P-value < 10⁻⁷. Using the subset of SNPs passing the first round of QC, samples were then excluded based on the following conditions: (i) per-sample call rates of less than 95%; (ii) excessive heterozygosity (defined as sample heterozygosity beyond 3 standard deviations from the mean sample heterozygosity); (iii) cryptic relatedness; (iv) gender discrepancies; and (v) deviation in population membership from population structure analysis. A second round of SNP QC was then applied to the remaining samples passing quality checks. We excluded SNPs with missingness > 5%, gross departure from HWE (P-value < 10⁻⁶), MAF < 1% and low concordance between duplicate samples on different genotyping platforms (relevant to SP2 samples only).
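The second-round SNP filters listed above can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: the thresholds come from the text, but the function names are hypothetical and the HWE check uses a 1-degree-of-freedom chi-square goodness-of-fit test as a stand-in, which is an assumption (an exact test may have been used in practice).

```python
import math
from typing import Optional, Sequence


def hwe_chi2_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square (1 df) P-value for departure from Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic: no test possible
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df


def snp_passes_qc(genotypes: Sequence[Optional[int]],
                  max_missing: float = 0.05,
                  min_maf: float = 0.01,
                  min_hwe_p: float = 1e-6) -> bool:
    """Genotypes are allele counts 0/1/2, with None for a missing call."""
    if not genotypes:
        return False
    called = [g for g in genotypes if g is not None]
    if not called or 1.0 - len(called) / len(genotypes) > max_missing:
        return False
    n_aa, n_ab, n_bb = called.count(0), called.count(1), called.count(2)
    freq_b = (2 * n_bb + n_ab) / (2 * len(called))
    if min(freq_b, 1.0 - freq_b) < min_maf:
        return False
    return hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= min_hwe_p
```

A SNP typed in 100 samples with genotype counts 49/42/9 passes all three filters, whereas one that is heterozygous in every sample fails the HWE check.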
Population structure was ascertained using principal components analyses
(PCA) with the EIGENSTRAT program7. Population substructure of SP2,
SiMES, SINDI and SCORM was examined by PCA with respect to three
population panels in the HapMap samples (Figure 12). Due to the presence of population structure within the Malay (Figure 4) and Indian samples (Figure 13), we adjusted for the top 5 principal components in the association analyses for the
SiMES and SINDI datasets.
For the STARS trios, we additionally excluded samples and trio-sets on the basis of excessive Mendelian inconsistencies defined as having > 1% of the post-
QC SNPs exhibiting Mendelian errors. SNPs with more than 10% Mendelian errors are excluded from the association analyses, and the genotypes leading to
Mendelian errors in all other remaining SNPs are coded as missing. As family trios are more robust to the presence of population structure, we did not exclude any samples due to population structure.
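The Mendelian consistency check for a trio can be sketched as below. Genotypes are assumed to be coded as the number of copies (0, 1 or 2) of one allele at an autosomal SNP; the function names are hypothetical. The resulting error rate would be compared against the thresholds stated above (more than 1% of SNPs per trio, or more than 10% of trios per SNP).

```python
from itertools import product
from typing import Iterable, Optional, Tuple

Trio = Tuple[Optional[int], Optional[int], Optional[int]]  # (father, mother, child)

# Alleles (0 or 1 copy) that a parent with a given genotype can transmit.
GAMETES = {0: {0}, 1: {0, 1}, 2: {1}}


def is_mendelian_consistent(father: int, mother: int, child: int) -> bool:
    """True if the child's genotype can arise from one allele of each parent."""
    return any(a + b == child
               for a, b in product(GAMETES[father], GAMETES[mother]))


def mendelian_error_rate(trios: Iterable[Trio]) -> float:
    """Fraction of fully genotyped trios showing a Mendelian error."""
    complete = [t for t in trios if None not in t]
    if not complete:
        return 0.0
    errors = sum(not is_mendelian_consistent(*t) for t in complete)
    return errors / len(complete)
```

Trios with any missing genotype are skipped, so the denominator counts only informative trios.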
Detailed data quality control (QC) procedures for each study are as follows (except that the QC procedures for SCORM and SiMES are provided in the Methods section of Chapter 3):
SP2: Of the 2,867 blood-derived DNA samples, 392 samples were genotyped on the HumanHap 550v3, 1,459 samples on the 610-Quad, 817 samples on the 1M-Duov3, 191 samples on both 550v3 and 1M-Duov3, and 8 samples on both 610-Quad and 1M-Duov3. For the samples that were genotyped on two platforms, we used the genotypes from the denser platform in our study. We excluded 433 individuals on the following conditions: sample call rates of less than 95%, excessive heterozygosity, cryptic relatedness by IBS, population structure ascertainment, and gender discrepancies, as listed in the main text. This left 2,434 post-QC SP2 samples. During SNP QC, we excluded SNPs that had low genotyping call rates (> 5% missingness), were monomorphic or had MAF < 1%, or deviated significantly from HWE (P < 10⁻⁶). This yielded a post-QC set of 462,580 SNPs. As SP2 samples were genotyped on different platforms, the concordance of duplicate samples plated on different Beadchips was also examined as a measure of genotyping quality. The average SNP concordance rate between chips for the post-QC duplicated samples was 0.995. We additionally assessed the SNPs present on different platforms for extreme variations in allele frequencies with a 2-degree-of-freedom chi-square test of proportions, removing 62 SNPs with P-values < 0.0001. We further excluded individuals with missing phenotype data on corneal astigmatism (n = 418). This yielded a final set of 462,518 SNPs common to all 2,016 SP2 samples (1,231 cases, 785 controls).
SiMES: Using the same quality control criteria, we omitted a total of 530 individuals from the 3,072 genotyped samples, including those excluded for subpopulation structure (n = 170), cryptic relatedness (n = 279), excessive heterozygosity or high missingness rate > 5% (n = 37), and gender discrepancy (n = 44). After the removal of these samples, SNP QC was applied to a total of 579,999 autosomal SNPs for the 2,542 post-QC samples. SNPs were excluded based on (i) high rates of missingness (> 5%) (n = 26,343); (ii) monomorphism or MAF < 1% (n = 34,891); or (iii) genotype frequencies deviating from HWE (P < 1 × 10⁻⁶) (n = 3,053). This yielded 515,712 SNPs. Individuals with missing data or a history of cataract surgery were further removed (n = 304). Finally, 515,712 SNPs in 2,238 samples (1,018 cases, 1,220 controls) were available for association analysis.
SINDI: We excluded 415 subjects from the total of 2,953 genotyped samples based on: excessive heterozygosity or high missingness rate > 5% (n = 34), cryptic relatedness (n = 326), issues with population structure ascertainment (n = 39) and gender discrepancies (n = 16). This left a total of 2,538 individuals with 579,999 autosomal SNPs. During SNP QC, we removed (i) 17,923 SNPs with high rates of missingness (> 5%); (ii) 17,966 SNPs that were monomorphic or had MAF < 1%; and (iii) 2,958 SNPs out of HWE (P < 10⁻⁶). After SNP QC, 541,152 SNPs were available for analysis. We further excluded individuals with missing data on corneal astigmatism measurements or a history of cataract surgery (n = 399). This yielded a final dataset of 2,139 individuals (825 cases, 1,314 controls) and 541,152 post-QC SNPs.
STARS: Of the total of 1,451 genotyped samples, individuals were excluded based on gender discrepancy (n = 17), high missingness (> 5%), excessive heterozygosity (n = 11) and excessive Mendelian inconsistency (Mendelian error per family > 1%; n = 14). This left a set of 1,408 samples in 436 families after quality control, of which 57 samples (from 29 families) were further excluded because they were founder singletons or non-founders without two parents, yielding 1,351 individuals in 407 families. SNP QC was performed on the 576,979 autosomal SNPs by excluding those with high missingness rate > 5% (n = 1,984), gross departure from HWE in parents (P < 10⁻⁸) (n = 908), monomorphic SNPs and SNPs with MAF < 1% (n = 80,804), and SNPs exhibiting a significant degree of Mendelian inconsistency (> 10% of all the trios) (n = 7). This led to a final set of 493,594 post-QC SNPs. We further restricted the family-based analyses to 1,191 individuals in 397 parent-offspring trios (317 nuclear families) with a child affected by corneal astigmatism. In all, 493,594 SNPs for 1,191 samples from 397 trios were available for the family GWAS on corneal astigmatism.
4.3.3 Statistical methods
The genome-wide association tests were performed using PLINK (version 1.07; http://pngu.mgh.harvard.edu/~purcell/plink/) as the primary analysis tool. A logistic regression adjusted for age and gender was used to model the association of genetic markers with corneal astigmatism. For each of SiMES and SINDI, the top 5 principal components of genetic ancestry from the EIGENSTRAT PCA were also included as covariates to adjust for population stratification in these populations. We assumed an additive genetic model where the genotypes of each SNP are coded as 0, 1 and 2 for the number of minor alleles carried, in keeping with increments in allelic dosage. For family GWAS association tests in STARS, a transmission disequilibrium test (TDT) was used to measure significant distortions in the transmission of an allele from heterozygous parents to the affected offspring relative to the expectation under Mendel’s law352.
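The TDT statistic can be illustrated with a minimal sketch of its standard McNemar-type form, chi-square = (b - c)² / (b + c) with 1 degree of freedom, where b and c are the numbers of heterozygous parents who did and did not transmit the test allele. The function name and the example counts are hypothetical and are not results from STARS.

```python
import math
from typing import Tuple


def tdt(transmitted: int, untransmitted: int) -> Tuple[float, float, float]:
    """Return (chi-square, P-value, transmission odds ratio) for one allele.

    `transmitted` and `untransmitted` count transmissions of the test allele
    from heterozygous parents to affected offspring.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous-parent) transmissions")
    chi2 = (transmitted - untransmitted) ** 2 / total
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square upper tail, 1 df
    odds_ratio = transmitted / untransmitted if untransmitted else float("inf")
    return chi2, p_value, odds_ratio


if __name__ == "__main__":
    # Illustrative counts only.
    print(tdt(60, 40))
```

Under Mendel's law each allele of a heterozygous parent is transmitted with probability 0.5, so b and c are expected to be equal when there is no linkage and association.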
We also performed a quantitative trait analysis with the average corneal cylinder power as the outcome. This was performed in PLINK for the unrelated samples, and in FBAT (http://www.hsph.harvard.edu/~fbat/fbat.htm) for the family trios. As the distribution of corneal cylinder power is skewed, we performed a normal quantile transformation353 prior to the association analysis for unrelated samples. For family-based data, no transformation was conducted since FBAT does not require a normally distributed trait55.
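A normal quantile (rank-based inverse normal) transformation can be sketched as follows. The text does not state which rank offset was applied, so the (rank - 0.5) / n convention used here is an assumption, as are the function name and the use of average ranks for ties.

```python
from statistics import NormalDist
from typing import List, Sequence


def normal_quantile_transform(values: Sequence[float]) -> List[float]:
    """Map values to standard normal quantiles via their ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # 1-based average rank for the run
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    # Assumed offset: (rank - 0.5) / n keeps every quantile strictly inside (0, 1).
    return [standard_normal.inv_cdf((r - 0.5) / n) for r in ranks]
```

The transformation preserves the ordering of the trait values while forcing the marginal distribution to be approximately standard normal, which removes the skew before linear regression.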
A meta-analysis of all five datasets was performed using inverse-variance weighting under a fixed-effect model in METAL (http://www.sph.umich.edu/csg/abecasis/metal/). We adopted the method of Kazeem and Farrall352 to pool the evidence from the case-control analyses and the family trio TDT. For the quantitative trait analysis, the overall z-statistic was calculated as a weighted sum of the z-statistics from the linear regressions in the non-familial data and the FBAT analysis for the family-based data, weighted by the number of unrelated individuals or trios in the respective studies58.
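The two pooling schemes described above can be sketched as follows. The first function is the standard fixed-effect inverse-variance estimator applied to log odds ratios. The second combines z-statistics with weights equal to the square root of the sample size, which is the sample-size scheme documented for METAL; that the thesis used exactly this weighting is an assumption, and all function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def inverse_variance_meta(log_ors: Sequence[float],
                          std_errs: Sequence[float]) -> Tuple[float, float, float, float]:
    """Fixed-effect pooled log OR, its standard error, z-statistic and two-sided P."""
    weights = [1.0 / se ** 2 for se in std_errs]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, z, p_value


def sample_size_weighted_z(z_scores: Sequence[float],
                           sample_sizes: Sequence[float]) -> float:
    """Combine z-statistics using sqrt(N) weights (assumed weighting scheme)."""
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    return numerator / math.sqrt(sum(w * w for w in weights))
```

For the case-control analysis the inputs would be the per-cohort log odds ratios and standard errors; for the quantitative trait the inputs would be the per-cohort z-statistics and the numbers of unrelated individuals or trios.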
Genotyping quality of all reported SNPs in this paper has been visually assessed by checking the intensity cluster plots.
4.4 Results
4.4.1 Datasets after quality control
The characteristics of the post-QC samples from the five studies are summarised in Table 8. The post-QC SP2 dataset comprised 2,016 adults, of whom 1,231 had corneal astigmatism (≤ -0.75 D) and 785 were defined as non-astigmatic controls. The post-QC datasets comprised 2,238 adults (1,018 cases and 1,220 controls) in SiMES, 825 cases and 1,314 controls in SINDI, and 760 cases and 169 controls in SCORM. In STARS, 397 trio-sets of parents and their offspring with corneal astigmatism passed sample QC (n = 1,191). In total, the genome-wide meta-analysis was conducted on 460,528 autosomal genotyped SNPs that passed stringent quality control criteria and were present in all studies, comprising 8,513 individuals.
There was no evidence of inflation of statistical significance due to population structure in any of the cohorts (SP2 λGC = 1.006, SiMES λGC = 1.007, SINDI λGC = 1.009, SCORM λGC = 1.01, STARS λGC = 1.021) or in the meta-analysis of all studies (overall λGC = 1.002). Suggestive evidence of association (defined as 10⁻⁷ < P-value < 10⁻⁵) was seen in each cohort (Figures 14A-E), as well as in the meta-analysis of the five cohorts, where a collection of SNPs deviated from their expected distributions in the quantile-quantile plot of the P-values (Figure 14F).
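The genomic control inflation factor reported above is conventionally computed as the median of the observed 1-degree-of-freedom chi-square statistics divided by the median expected under the null (about 0.455). A minimal sketch under that standard definition follows; the function name is hypothetical, and P-values are assumed to lie in (0, 1].

```python
from statistics import NormalDist, median
from typing import Sequence

_STANDARD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 df, about 0.455.
MEDIAN_CHI2_1DF = _STANDARD_NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation(p_values: Sequence[float]) -> float:
    """Lambda_GC from two-sided P-values of 1-df association tests."""
    # A two-sided P-value p corresponds to |z| = -inv_cdf(p / 2); squaring gives chi-square.
    chi2 = [_STANDARD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / MEDIAN_CHI2_1DF
```

Values close to 1, such as those reported for the five cohorts, indicate that the bulk of the test statistics follow the null distribution.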
4.4.2 Gene PDGFRA exhibiting genome-wide significance
Three SNPs (rs7677751, rs2307049 and rs7660560) attained genome-wide significance (P-value < 5 × 10⁻⁸) in the meta-analysis and were found to cluster in the platelet-derived growth factor receptor alpha (PDGFRA) gene on chromosome 4q12 (lowest P = 7.87 × 10⁻⁹ at rs7677751; Table 9; Figure 15). The direction and magnitude of the effect sizes at these SNPs were highly similar in all five cohorts. No significant evidence of effect size heterogeneity was detected across the SNPs (heterogeneity I² P-value ≥ 0.246), and the minor allele frequencies of these SNPs are similar across all five studies. Interestingly, these SNPs are located within the MYP9 region identified previously as a candidate locus for myopia through linkage scans39.
At the most significant SNP rs7677751 in PDGFRA, the frequency of the risk T allele ranged from 0.19 to 0.26 in the five cohorts, and the T allele conferred a 26% higher risk of corneal astigmatism than the C allele (OR = 1.26, 95% CI = 1.16-1.36) in the meta-analysis across all five studies (Figure 16). This SNP alone explains 0.41% of the variation in corneal cylinder power. In addition, a general genetic model identified that the 5.5% of individuals in the combined cohorts who carry the TT genotype at rs7677751 had a 1.65-fold (95% CI = 1.33-2.06, P-value = 6.23 × 10⁻⁶) increase in the risk of developing corneal astigmatism compared to those not carrying any copies of the risk allele. All of the associated SNPs span 10 kb within PDGFRA at 4q12, and a high degree of linkage disequilibrium is seen at this locus in all three Asian populations (Chinese, Malays and Asian Indians; Figure 17).
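As a rough consistency check on the figures reported above, the standard error of the log odds ratio can be recovered from the 95% confidence interval and converted to a Wald P-value. The sketch below assumes a symmetric interval on the log scale; because the confidence limits are rounded to two decimals, the result should only agree with the reported meta-analysis P-value in order of magnitude. The function names are hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% critical value of the standard normal


def se_from_ci(lower: float, upper: float) -> float:
    """Standard error of log(OR) implied by a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2.0 * Z_95)


def wald_p_value(odds_ratio: float, std_err: float) -> float:
    """Two-sided Wald P-value for log(OR) / SE."""
    z = math.log(odds_ratio) / std_err
    return math.erfc(abs(z) / math.sqrt(2.0))


def additive_homozygote_or(per_allele_or: float) -> float:
    """OR for two copies of the allele implied by a multiplicative per-allele effect."""
    return per_allele_or ** 2


if __name__ == "__main__":
    se = se_from_ci(1.16, 1.36)
    print(se, wald_p_value(1.26, se), additive_homozygote_or(1.26))
```

The squared per-allele odds ratio (about 1.59) is close to the 1.65-fold estimate for TT carriers from the general model, which is what an approximately additive effect on the log-odds scale would predict.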
4.5 Discussion
We have performed a genome-wide survey for corneal astigmatism across 8,513 individuals, combining the data from five GWAS performed in Chinese and Malay adults, Indian adults, Chinese children and family trios. We observed a strong and consistent association with the onset of corneal astigmatism at the PDGFRA gene locus on chromosome 4q12 across all five Asian cohorts, with three SNPs in this locus exceeding the threshold for genome-wide significance in the meta-analysis. To the best of our knowledge, this is the first study to investigate the genetic aetiology of corneal astigmatism in a genome-wide fashion.
The PDGFRA gene spans 69 kb with 23 coding exons and resides on chromosome 4q12. The receptor for platelet-derived growth factor (PDGF) contains two types of subunit, PDGFR-α and PDGFR-β, which are differentially expressed on the cell surface354. PDGFR-α binds to three forms of PDGF (PDGF-AA, AB and BB) and mediates many biological processes including embryonic development, angiogenesis, cell proliferation and differentiation. The role of PDGFRA in cellular growth and proliferation is underlined by its contribution to the pathogenesis of gastrointestinal stromal tumours355. A large body of evidence has shown that both PDGF and its receptors are expressed in corneal epithelium, stromal fibroblasts and endothelium356,357 as well as in proliferative retinal tissues358-360. Studies have further suggested that PDGF and its receptors, along with other cytokines (epidermal growth factor, transforming growth factor-α and -β, etc.), can mediate corneal fibroblast migration and matrix remodelling, and play an important role in corneal wound healing357,361-363. The corneal stroma comprises a large portion of the cornea, and the sensitivity of stromal tissue to various growth factors is well described364. The administration of PDGF resulted in keratinocyte elongation in rabbit corneal stroma tissue365. In light of this, a role for PDGFRA in the regulation of ocular development and parameters cannot be excluded, and our study suggests that genetic polymorphisms within PDGFRA may be involved in the regulation of corneal biometrics resulting in the occurrence of corneal astigmatism.
In addition, Hammond and colleagues reported that 4q12 (MYP9; LOD 3.3) was significantly linked with myopia in a genome-wide linkage study of 221 dizygotic twin pairs39, and subsequent replication revealed nominal significance of 4q12 (P = 0.065) for refractive error in African-American families38. We thus undertook a candidate SNP approach with the identified SNPs to investigate the possible association between PDGFRA and (i) the onset of high myopia; and (ii) refractive error as a quantitative trait. We did not observe any striking association between the identified variants and either outcome, suggesting that the association of PDGFRA with corneal astigmatism is probably not through any shared aetiology with myopia.
The most significant SNP in our analyses, rs7677751, is located in intron 1 of PDGFRA. Interestingly, among the SNPs identified, rs2228230 is coding-synonymous (valine:GTC > valine:GTT) and resides in exon 18, while rs3690 is within the 3′ untranslated region (Figure 17). These three SNPs (rs7677751, rs2228230 and rs3690) are strongly correlated with each other (pair-wise Pearson correlation coefficient r ranging from 0.77 to 0.81), although the association evidence at the latter two SNPs did not reach genome-wide significance. As the next closest gene (GSX2) from the 5′ end of PDGFRA is 127 kb away and is not within the LD block containing our identified SNPs, it is unlikely that the signals observed in our study are attributable to functional variants located beyond PDGFRA.
Our group recently reported a strong association between variants in PDGFRA and corneal curvature366. Corneal curvature is an ocular dimension defined as the average of the radii of corneal curvature at the horizontal and vertical meridians. Myopic eyes have been found to have steeper corneas (reduced radius of curvature), but a significant correlation between corneal curvature and refractive error has not been consistently observed331,367,368. An excessively flat cornea is associated with cornea plana, producing high hyperopia and likely resulting in angle-closure glaucoma33,369. Corneal astigmatism, on the other hand, is an eye disorder in which the cornea is more curved in one meridian than in the other. This prevents the light rays entering the eye from converging, leading to an inability to focus at a single point in the eye115. It is thus interesting that the same PDGFRA gene has been identified for two ocular outcomes that are biologically different, given the weak correlation between corneal astigmatism and corneal curvature (Spearman correlation coefficient r between 0.088 and 0.192 in our cohorts), pointing to a possible pleiotropic contribution of PDGFRA.
Our study has adopted a binary definition of corneal astigmatism (affected and unaffected) that is commonly adopted in clinical practice and eye-trait epidemiology337,340. One caveat of this definition is the potential for misclassifying the affected status, particularly for samples with cylinder power around the cut-off threshold of -0.75 D. To evaluate the robustness of our findings to the choice of threshold, we additionally performed the association analysis at the identified SNPs with four different combinations of the thresholds used to define cases and controls. We observed that the odds ratios were highly similar across all four scenarios, with the combined evidence at rs7677751 ranging from Pmeta of 1.5 × 10⁻⁴ to 6.7 × 10⁻⁸. Unsurprisingly, the association evidence was weakest in the scenario with the most stringent thresholding (≤ -1.5 D for cases and > -0.5 D for controls), given that this stringency comes at the expense of decreasing the number of individuals in each study. We additionally performed a secondary analysis treating corneal cylinder power as a quantitative trait. Strong statistical evidence was consistently observed at the three leading SNPs (rs7677751, P = 1.76 × 10⁻⁷; rs2307049, P = 3.41 × 10⁻⁷; and rs7660560, P = 4.41 × 10⁻⁷), indicating that our findings are robust to the definition of the phenotype.
Owing to the relatively small sample sizes within each of the five GWAS, we have chosen to prioritise our survey to identify genetic variants that contribute to the aetiology of corneal astigmatism in multiple Asian populations. While Malays have been observed to be genetically closer to the Chinese, Asian Indians tend to be genetically closer to Caucasians100. Our discovery at PDGFRA thus suggests that part of the underlying biological pathway responsible for astigmatism development is common to multiple populations, although there may be population-specific genetic variants that our current study is not sufficiently powered to identify.
Our study has included two pediatric Chinese populations (SCORM and STARS) with school or pre-school children who are still progressing to their final phenotype. A high degree of astigmatism has been documented during infancy and early childhood370. The prevalence rates remain stable during young adulthood (18 to 40 years), but increase consistently during late adulthood from 40 years of age115,284. Studies have also indicated that the age-related change in astigmatism is associated with meridional changes in the cornea125. Children and adolescents generally have a predominance of “with-the-rule” corneal astigmatism, where the vertical curve is greater than the horizontal (axis of 1° to 15°), while in older adults it shifts to “against-the-rule” astigmatism (axis of 75° to 105°)371,372. However, our study considers corneal astigmatism without reference to the axis or to longitudinal changes from childhood to adulthood. Whether PDGFRA plays the same role in pediatric and adult populations will need further investigation.
Figure 12 Principal Component Analysis (PCA) of SP2, SiMES, SINDI, SCORM with respect to the population panels in phase 2 of the HapMap samples (CEU - European, YRI – African, CHB – Chinese, JPT – Japanese).
Figure 13 Principal component analysis (PCA) was performed in SINDI to assess the extent of population structure. Each panel represents a bivariate plot of two principal components from the PCA of genetic diversity within SINDI: (A) 1st eigenvector against 2nd eigenvector, (B) 2nd eigenvector against 3rd eigenvector, (C) 3rd eigenvector against 4th eigenvector and (D) 1st eigenvector against 5th eigenvector. The first 5 principal components were used as covariates to account for population structure.
Figure 14 Quantile-quantile (Q-Q) plots of P-values for association between all SNPs and corneal astigmatism in the individual cohorts (A) SP2, (B) SiMES, (C) SINDI, (D) SCORM and (E) STARS, and (F) in the combined meta-analysis of SP2 + SiMES + SINDI + SCORM + STARS.
Figure 15 (A) Manhattan plot of -log10(P-values) in the combined cohort of SP2, SiMES, SINDI, SCORM and STARS. The blue horizontal line represents the threshold of suggestive significance (P = 1.00 × 10⁻⁵). (B) Regional plot of the association signals from the meta-analysis of the five GWAS cohorts around the PDGFRA gene locus. A region of 400 kb around the lead SNP (rs7677751, red diamond) is shown. The LD between the lead SNP and the neighbouring SNPs is represented by the shading of the squares, with increasing shade of red indicating higher LD as measured by r². The blue lines represent the recombination rates of the JPT+CHB panels from HapMap II.
Figure 16. Forest plot of the estimated allelic odds ratios for the lead SNP rs7677751. The allelic odds ratios for allele T of rs7677751 and 95% confidence intervals are presented for the five studies separately (black rectangles or green rectangles), and for the overall meta-analysis across all five studies (red diamond).
Figure 17 Linkage disequilibrium (LD) calculated in terms of r² for (A) Singapore Chinese samples from SP2, (B) Malay samples from SiMES and (C) Indian samples from SINDI. Black squares show perfect LD whereas shades of grey show decreasing LD.
Table 8. Characteristics of the participants in five studies
Cohorts | SP2 | SiMES | SINDI | SCORM | STARS
Individuals genotyped | 2867 | 3072 | 2953 | 1116 | 1451 (440 families)
Individuals after QC | 2434 | 2542 | 2538 | 1008 | 1351 (407 families)
Individuals in GWAS | 2016 | 2238 | 2139 | 929 | 1191 (397 parents-trios)
Male (%)ᵃ | 45.8% | 48.9% | 51.2% | 51.8% | 52.3%
Age (SD)ᵃ | 47.9 (11.2) | 57.7 (10.7) | 55.9 (8.9) | 10.8 (0.8) | 7.5 (3.8)
Cases | 1231 | 1018 | 825 | 760 | NA
Controls | 785 | 1220 | 1314 | 169 | NA
Corneal cylinder powerᵇ (SD), cases | -1.38 (0.73) | -1.37 (0.94) | -1.21 (0.58) | -1.52 (0.68) | 1.30 (0.78)*
Corneal cylinder powerᵇ (SD), controls | -0.48 (0.15) | -0.46 (0.15) | -0.46 (0.16) | -0.47 (0.14) | NA
ᵃInformation on gender and age is based on data from the offspring with corneal astigmatism in the STARS family cohort. ᵇAveraged across both eyes. NA, not available; age is in years.
SP2 - Singapore Prospective Study Program; SiMES - Singapore Malay Eye Study; SINDI - Singapore Indian Eye Study; SCORM - Singapore Cohort study of the Risk factors for Myopia; STARS - Singaporean Chinese in the Strabismus, Amblyopia and Refractive Error Study.
Table 9 Top SNPs (P-value ≤ 5 × 10⁻⁶) identified from the combined meta-analysis of five Asian population cohorts.
Sample sizes: SP2 (n = 2,016), SiMES (n = 2,238), SINDI (n = 2,139), SCORM (n = 929), STARS (397 trios), meta-analysis (n = 8,513).
SNP | Gene | CHR | BPᵃ | Aᵇ | EAFᶜ | SP2 OR | SP2 P | SiMES OR | SiMES P | SINDI OR | SINDI P | SCORM OR | SCORM P | STARS OR | STARS P | Meta OR | Meta s.e.ᵈ | Meta P | Pᵉ
rs7677751 | PDGFRA | 4 | 54819217 | T | 0.23 | 1.35 | 3.76E-04 | 1.27 | 6.26E-04 | 1.23 | 4.09E-03 | 1.22 | 2.00E-01 | 1.12 | 3.74E-01 | 1.30 | 0.05 | 7.87E-09 | 0.786
rs7660560 | PDGFRA | 4 | 54829151 | A | 0.23 | 1.32 | 1.46E-03 | 1.28 | 5.34E-04 | 1.26 | 1.37E-03 | 1.13 | 4.22E-01 | 1.14 | 3.20E-01 | 1.29 | 0.05 | 1.15E-08 | 0.839
rs2307049 | PDGFRA | 4 | 54824911 | A | 0.23 | 1.31 | 2.17E-03 | 1.26 | 7.55E-04 | 1.27 | 1.08E-03 | 1.14 | 4.03E-01 | 1.14 | 3.17E-01 | 1.28 | 0.05 | 1.58E-08 | 0.885
rs10189905 | - | 2 | 199387355 | G | 0.12 | 0.72 | 2.56E-03 | 0.78 | 4.13E-03 | 0.83 | 1.09E-01 | 1.08 | 7.05E-01 | 0.61 | 5.70E-03 | 0.76 | 0.07 | 5.57E-07 | 0.508
rs3690 | PDGFRA | 4 | 54856570 | C | 0.28 | 1.40 | 6.25E-04 | 1.29 | 1.53E-03 | 1.18 | 2.17E-02 | 1.03 | 8.58E-01 | 1.09 | 5.28E-01 | 1.33 | 0.06 | 7.77E-07 | 0.391
rs4864872 | PDGFRA | 4 | 54847041 | T | 0.20 | 1.41 | 5.94E-04 | 1.26 | 3.19E-03 | 1.19 | 1.49E-02 | 1.05 | 7.69E-01 | 1.06 | 6.71E-01 | 1.32 | 0.06 | 1.24E-06 | 0.370
rs2228230 | PDGFRA | 4 | 54846797 | T | 0.20 | 1.41 | 5.94E-04 | 1.26 | 3.19E-03 | 1.19 | 1.67E-02 | 1.05 | 7.69E-01 | 1.06 | 6.71E-01 | 1.32 | 0.06 | 1.43E-06 | 0.362
rs10032688 | - | 4 | 54855415 | A | 0.19 | 1.39 | 8.28E-04 | 1.26 | 3.07E-03 | 1.20 | 1.39E-02 | 1.02 | 8.97E-01 | 1.11 | 4.80E-01 | 1.31 | 0.06 | 1.64E-06 | 0.470
rs17084051 | - | 4 | 54782338 | A | 0.24 | 1.35 | 3.68E-04 | 1.26 | 1.08E-03 | 1.14 | 7.77E-02 | 1.23 | 1.68E-01 | 1.04 | 7.55E-01 | 1.29 | 0.05 | 2.16E-06 | 0.246
rs6758183 | - | 2 | 226709230 | A | 0.16 | 1.37 | 2.36E-02 | 1.41 | 1.58E-03 | 1.21 | 1.10E-02 | 1.17 | 5.24E-01 | 1.33 | 1.78E-01 | 1.39 | 0.09 | 2.80E-06 | 0.772
rs7676985 | - | 4 | 54759330 | A | 0.26 | 1.26 | 2.41E-03 | 1.19 | 1.10E-02 | 1.14 | 6.09E-02 | 1.36 | 3.82E-02 | 1.09 | 4.51E-01 | 1.22 | 0.05 | 2.98E-06 | 0.693
rs1547904 | PDGFRA | 4 | 54841146 | T | 0.19 | 1.35 | 1.77E-03 | 1.24 | 6.10E-03 | 1.19 | 1.52E-02 | 1.09 | 6.19E-01 | 1.07 | 6.27E-01 | 1.28 | 0.06 | 4.41E-06 | 0.626
rs6792584 | SUCLG2 | 3 | 67614078 | G | 0.24 | 1.10 | 2.61E-01 | 1.36 | 8.33E-06 | 1.16 | 4.06E-02 | 1.02 | 8.72E-01 | 1.17 | 2.21E-01 | 1.24 | 0.05 | 4.72E-06 | 0.205
rs2270676 | KCNE3 | 11 | 73846059 | G | 0.20 | 0.74 | 1.89E-03 | 0.80 | 1.67E-03 | 0.90 | 1.76E-01 | 1.08 | 6.75E-01 | 0.80 | 1.18E-01 | 0.78 | 0.06 | 5.09E-06 | 0.392
ᵃBase pair positions are given according to NCBI build 36 (hg18); ᵇEffect allele; ᶜAverage effect allele frequency in the discovery cohort; ᵈStandard error for odds ratios; ᵉP-value for heterogeneity I² between the five study cohorts. A dash indicates that no gene is listed for the SNP.
Chapter 5 Genome-Wide Comparison of Estimated Recombination Rates between Populations
5.1 Study summary
Inter-population differences in recombination patterns yield important insights into recent human evolution and trans-ethnic fine-mapping. Although recombination profiles are largely conserved at a broad scale between populations, preliminary findings have suggested that recombination rates differ between populations at a fine scale.
In this study, a signal-processing strategy (varRecM) is proposed to evaluate variations in inter-population recombination rates. Recombination rates estimated from two populations can differ in two aspects (see Figure 18): (i) the kurtosis or ‘peakedness’ of the recombination profiles (e.g. Figure 18A to C); or (ii) the level of coupling, or phase shifting, between the rates (e.g. Figure 18D to F). This method simultaneously quantifies the extent of the differential peakedness of the recombination profiles and the degree of phase shift, and adopts a sliding window approach to interrogate the whole genome for any variation in recombination patterns. Each sliding window is assigned a numerical score (the varRecM score) that reflects the extent of recombination variation. A larger score represents greater evidence of variation in either the peakedness or the phase shift of the recombination rate profiles. Genomic regions harbouring significant differences in recombination rate profiles are thus more likely to generate scores that lie in the right tail of the genome-wide distribution.
Using LD-based recombination rates that have been computed from the
International HapMap Project (HapMap)5 and the Singapore Genome Variation
Project (SGVP)100, we subsequently apply our varRecM method to compare the
recombination rates between five population panels composed of European, East
Asian, African, Singapore Chinese and Singapore Indian samples. This represents
the first attempt to comprehensively catalogue recombination rate variations
across different human populations on a genome-wide scale. While overall patterns of recombination were conserved between populations, several regions emerged with strong evidence of recombination rate variation that encompass genes exhibiting population-specific signatures of positive natural selection.
Intergenic regions are more likely to exhibit recombination differences between populations.
5.2 Methods
5.2.1 Development of recombination variation score
We calculate the recombination rate differences between two populations with the varRecM score, which quantifies the extent of the differential peakedness and the degree of phase shift in the two recombination profiles from each of the two populations. The approach is implemented genome-wide by adopting a sliding window of 50 kb throughout the whole genome where consecutive windows are shifted by one SNP. Assuming there are two recombination rate profiles estimated from the two populations across L SNPs within each genomic window, we define the varRecM score as
\[ \mathrm{varRecM} = \frac{\left| g^{(1)} - g^{(2)} \right|}{g^{(1)} + g^{(2)}} \times w \]
where g(1) and g(2) denote the cumulative recombination measured in centimorgans (commonly defined as the genetic distance) within the window for population 1 and population 2 respectively, each calculated as the area under the curve of the recombination rates across the window; and w denotes a composite weight metric. While we have taken the absolute value in the numerator, in practice we can identify which population spans a larger genetic distance for the same window (thus indicating a higher likelihood for recombination to take place) from the sign of g(1) – g(2). The weight metric w is defined as the product of two components wm and wcor, where: (i) wm effectively measures the kurtosis of the recombination peak (for the population that spans the larger genetic distance) relative to the averaged recombination in the window and is defined as
\[ w_m = \log_{10}\left( \max_{l} c_l^{(k)} - \frac{g^{(k)}}{D} + 1 \right) \]
where cl(k) denotes the recombination rate (in cM/Mb) between SNP l and SNP (l + 1), k denotes the population that spans the larger genetic distance, and D denotes the physical distance of the window (in Mb), so that g(k)/D is the average recombination rate across the window; and (ii) wcor represents the extent of ‘uncoupling’, or dissimilarity in correlation, between the two series of recombination rates across L SNPs in the two populations and is given as
\[ w_{cor} = 1 - \max_{\mathrm{lag}} \left| r_{\mathrm{lag}} \right|, \quad \mathrm{lag} = 0, 1, 2, 3, \ldots \]
Here, rlag represents the lagged cross-correlation coefficient between the two recombination rate series and is defined as
\[ r_{\mathrm{lag}} = \frac{\sum_{l=1}^{L-\mathrm{lag}} \left( c_{l+\mathrm{lag}}^{(1)} - \bar{c}^{(1)} \right) \left( c_{l}^{(2)} - \bar{c}^{(2)} \right)}{\sqrt{\sum_{l=1}^{L-\mathrm{lag}} \left( c_{l+\mathrm{lag}}^{(1)} - \bar{c}^{(1)} \right)^{2}} \sqrt{\sum_{l=1}^{L-\mathrm{lag}} \left( c_{l}^{(2)} - \bar{c}^{(2)} \right)^{2}}} \]
with c̄(k) denoting the average recombination rate across the window for population k.
To provide a more robust assessment of the cross-correlation coefficient, particularly when a recombination peak occurs at the boundary of the window, the calculation of rlag is performed either with an additional buffer region of 4kb at each end of the window, or includes the recombination rates from two additional
SNPs at each end, depending on which scheme includes more SNPs. The inclusion of the lag component explicitly allows for different SNP densities in the two populations. The use of the genetic distance also mitigates the need for the same SNPs to be present in the two populations.
The rationale behind the inclusion of the first component wm is to up-weight the score of a window where there is a clear regional elevation of recombination rate beyond the averaged rate across the window (for example, a recombination hotspot). Typically wm is between 0 and 2, except on the rare occasion that the maximum estimated recombination rate is at least 100 cM/Mb above the region-averaged recombination rate. The intuition behind the second component wcor, bounded between 0 and 1 inclusive, is to down-weight the scores where the cross-correlation between the two series of recombination rates is high, since a high level of coupling between the two series of recombination rates suggests a smaller difference in recombination pattern between the two populations in this genomic window.
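The definition above can be sketched in code. The following is a minimal illustration only, not the implementation used in this study: the function names, the default maximum lag of 8 and the tolerance for flat profiles are our assumptions, both populations are assumed to share the same SNP positions, and the buffer region at the window boundaries is omitted.

```python
import numpy as np

def genetic_distance(pos_mb, rates):
    """Area under the step-wise rate curve, in cM.
    rates[l] is the rate (cM/Mb) between SNP l and SNP l + 1."""
    return float(np.sum(np.asarray(rates, float) * np.diff(pos_mb)))

def max_abs_lagged_corr(c1, c2, max_lag):
    """Largest |r_lag| for lag = 0..max_lag, using window-wide means."""
    m1, m2 = c1.mean(), c2.mean()
    n = len(c1)
    best = 0.0
    for lag in range(max_lag + 1):
        a = c1[lag:] - m1        # series 1 shifted forward by `lag`
        b = c2[:n - lag] - m2    # series 2, truncated to the same length
        denom = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
        if denom > 1e-12:        # assumed tolerance: skip (near-)flat series
            best = max(best, abs(float(np.sum(a * b) / denom)))
    return min(best, 1.0)

def varrecm(pos_mb, c1, c2, max_lag=8):
    """varRecM = |g1 - g2| / (g1 + g2) * w_m * w_cor for one window."""
    c1, c2 = np.asarray(c1, float), np.asarray(c2, float)
    g1, g2 = genetic_distance(pos_mb, c1), genetic_distance(pos_mb, c2)
    if g1 + g2 == 0:
        return 0.0
    # k is the population spanning the larger genetic distance
    ck, gk = (c1, g1) if g1 >= g2 else (c2, g2)
    window_mb = pos_mb[-1] - pos_mb[0]
    w_m = np.log10(ck.max() - gk / window_mb + 1.0)
    w_cor = 1.0 - max_abs_lagged_corr(c1, c2, max_lag)
    return float(w_m * w_cor * abs(g1 - g2) / (g1 + g2))
```

Two identical profiles give a score of zero because the numerator vanishes, whereas a hotspot present in only one population gives a positive score that grows with the height of the peak.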
5.2.2 Simulation
Simulating recombination rates for two populations
For evaluating the performance of the method, we simulated recombination rates for two populations across a 200kb region spanning 401 evenly distributed SNPs. In population k (where k ∈ {1, 2}), we introduced a recombination hotspot starting at a physical position dk, with a width of wk (in kb) that is distributed as Normal (2, 0.5^2), and spanning a genetic distance of gk (in cM). The assumed size of the hotspot is consistent with those observed in sperm typing78,373. The genetic distance here is equivalently the area under the recombination hotspot versus physical distance curve, or a measure of the cumulative peakedness of the recombination hotspot. The background recombination rate used in our simulation is assumed to follow an exponential distribution with a mean of 0.24cM/Mb. This is based on the observation that the average recombination rate is 1.2cM/Mb across the human genome, and yet only 20% of the recombination occurs outside recombination hotspots374 (0.2 × 1.2 = 0.24). In our simulations, we assume the presence of a recombination hotspot in each of the two populations, varying in intensity and location of the hotspots. The recombination rate for the hotspot in the first population g1 is distributed as Normal (µ, 0.5^2) with µ measured in cM/Mb. The recombination rate in the second population g2 is defined with respect to λ, which represents the ratio of g1 to g2 (or λ = g1/g2). Larger values of λ correspond to greater differences in the recombination profiles between the two populations.
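The simulation set-up can be illustrated with a short sketch. It is a simplified reading of the description above and not the code used for the thesis: the rectangular hotspot shape, the 0.5kb floor on the hotspot width, the treatment of g1 and g2 as hotspot rates in cM/Mb, and all function and parameter names are our assumptions.

```python
import numpy as np

def simulate_profile(rng, start_kb, hotspot_rate, n_snps=401,
                     region_kb=200.0, background_mean=0.24):
    """Rates (cM/Mb) for the n_snps - 1 intervals between adjacent SNPs."""
    pos_kb = np.linspace(0.0, region_kb, n_snps)
    # Background rate: exponential with mean 0.24 cM/Mb
    rates = rng.exponential(background_mean, size=n_snps - 1)
    # Hotspot width ~ Normal(2, 0.5^2) kb; the 0.5 kb floor is an assumption
    width_kb = max(rng.normal(2.0, 0.5), 0.5)
    mid_kb = 0.5 * (pos_kb[:-1] + pos_kb[1:])
    inside = (mid_kb >= start_kb) & (mid_kb < start_kb + width_kb)
    rates[inside] += hotspot_rate   # rectangular hotspot (assumption)
    return pos_kb, rates

def simulate_pair(rng, mu=25.0, lam=5.0, sigma2_d=0.5, centre_kb=100.0):
    """One pair of profiles with intensity ratio lam and jittered start."""
    rate1 = max(rng.normal(mu, 0.5), 0.0)   # Normal(mu, 0.5^2)
    rate2 = rate1 / lam                     # lambda = g1 / g2
    d_diff = rng.normal(0.0, np.sqrt(sigma2_d)) if sigma2_d > 0 else 0.0
    pos_kb, c1 = simulate_profile(rng, centre_kb, rate1)
    _, c2 = simulate_profile(rng, centre_kb + d_diff, rate2)
    return pos_kb, c1, c2
```

Calling `simulate_pair(np.random.default_rng(1))` returns 401 SNP positions and two series of 400 interval rates, one per population.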
For the assessment of the specificity of the varRecM metric, let ddiff denote the distance (in kb) between the starting locations of the two hotspots, assumed to be distributed as Normal (0, σd^2) with σd^2 ∈ {0, 0.5, 1, 2}. Varying σd thus controls the proximity of the hotspots in the two populations, and larger values of σd^2 effectively introduce greater differences in the simulated recombination profiles. In addition to adding moderate jitter in the starting positions of the recombination hotspots, we also evaluate the robustness of the varRecM metric in situations where moderate differences exist in the peakedness of the recombination hotspots between the two populations but not in the location of the hotspots. This is achieved by setting µ as 25cM/Mb (for the first population) and λ ∈ {1, 5, 10}. We also evaluate the effect of varying the lag parameter in the metric across a range of values (0, 2, 4, 6, 8, 12 and 10 log10(L/2)), which broadly corresponds to lag distances of 0kb, 1kb, 2kb, 3kb, 4kb, 6kb and 5 log10(L/2)kb given a SNP density of 2 SNPs/kb.
Our assessment of the sensitivity of varRecM assumes the recombination hotspot for population 1 is located in the center of the simulated 200kb region, while the hotspot for population 2 is randomly distributed between the 75kb and 125kb mark in each window. Given that the size of each hotspot follows a Normal (2, 0.5^2) distribution, this simulates a high likelihood that the two hotspots are found in significantly different locations. In addition, we vary the relative intensities of the hotspots by setting µ ∈ {6, 25, 50} and allowing λ to range from 1 (no difference in the peak intensities) to 50 (where the genetic distance from the recombination hotspot in population 1 is fifty times the genetic distance in population 2).
By performing 10,000 simulations under each of the different settings for σd, λ and µ, we count the frequencies where a larger varRecM score than a pre-defined genome-wide threshold is obtained, since this represents stronger evidence of inter-population variation in recombination rates relative to the genome-wide distribution. The empirical distributions of the varRecM score from the pairwise genome-wide comparisons of the three HapMap populations displayed similar values for the top quantiles. We thus averaged the varRecM scores at the top 1% (and also the top 5% and top 0.1%) across the three pairs of comparisons, and used these values as thresholds in our calculation of statistical power and false positive rates. The power and false positive rates in our simulations are thus defined as the proportion of simulations out of 10,000 where the obtained score exceeds the thresholds at the top 5%, 1% and 0.1% given the different combinations of σd, λ and µ. In all likelihood, our choice of genome-wide thresholds will result in a marginal under-estimate of the statistical power and false positive rates.
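The threshold and exceedance calculations described above amount to a few lines of code. The sketch below is illustrative only; the function names are hypothetical and the quantile is taken with NumPy's default linear interpolation, which the text does not specify.

```python
import numpy as np

def averaged_threshold(genomewide_scores_by_pair, top_fraction):
    """Average, across population pairs, of the score at the given top
    fraction (e.g. 0.01 for the top 1%) of each genome-wide distribution."""
    cutoffs = [np.quantile(np.asarray(s, float), 1.0 - top_fraction)
               for s in genomewide_scores_by_pair]
    return float(np.mean(cutoffs))

def exceedance_rate(simulated_scores, threshold):
    """Proportion of simulated windows whose score exceeds the threshold.
    This is the power when the profiles truly differ, and the false
    positive rate when they do not."""
    scores = np.asarray(simulated_scores, float)
    return float(np.mean(scores > threshold))
```

With 10,000 simulated scores per setting, `exceedance_rate` is evaluated once for each of the three thresholds (top 5%, 1% and 0.1%).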
5.2.3 Estimation of recombination rates
Data
Our genome-wide comparisons utilised recombination rates estimated from
the genotypes of autosomal SNPs in the three population panels in Phase 2 of the
HapMap5, as well as in the three Asian populations from the SGVP100. Phase 2 of
HapMap surveyed around 2.8 million SNPs in each of the three panels, consisting
of: (i) 60 unrelated individuals in Utah with European ancestry (CEU); (ii) 60
unrelated Yoruba individuals sampled from the Ibadan region of Nigeria (YRI);
(iii) 45 unrelated Han Chinese from Beijing (CHB) and 45 unrelated Japanese
from Tokyo (JPT), which have been pooled as the East Asian panel (JPT+CHB).
The SGVP surveyed around 1.6 million autosomal SNPs in three population
groups, consisting of 96 Chinese (CHS), 89 Malays (MAS) and 83 Asian Indians
(INS) in Singapore. Population-specific recombination rates for the SGVP samples were downloaded from the SGVP website (http://www.nus-cme.org.sg/sgvp). To avoid any inadvertent confounding in our analyses that is solely attributable to the different SNP densities, we thinned the SNPs from the HapMap populations to a similar set of SNPs found in SGVP and re-estimated the recombination rates for each population panel separately.
Recombination rate estimation
Both the HapMap and SGVP used LDhat101 to calculate the recombination
rates within each population, yielding an estimated rate in cM/Mb for each
successive pair of SNPs. For the HapMap populations, per generation
recombination rates c were obtained as the ratio of the estimated rate from LDhat
to the effective population size, which have been estimated as 11,418 for CEU,
17,469 for YRI and 14,269 for JPT+CHB60,375. For the SGVP populations, the
effective population size of 14,269 was used for the Chinese and Malays, while
the average of the effective population sizes for CEU and JPT+CHB (Ne =
12,844) was used for the Singapore Indians.
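As a worked illustration of the scaling step, the sketch below converts a population-scaled rate into cM/Mb. It assumes the usual convention that LDhat reports ρ = 4Ne·r per kb; the factor of 4, the per-kb unit and the function name are assumptions on our part, since the text above states only that the estimate is divided by the effective population size.

```python
def rho_to_cm_per_mb(rho_per_kb, ne):
    """Convert a population-scaled rate (assumed rho = 4 * Ne * r per kb)
    to a per-generation rate in cM/Mb."""
    r_per_kb = rho_per_kb / (4.0 * ne)   # crossover probability per kb
    return r_per_kb * 100.0 * 1000.0     # x100 for cM, x1000 kb per Mb

# Effective population sizes quoted in the text
NE = {"CEU": 11418, "YRI": 17469, "JPT+CHB": 14269}
NE["CHS"] = NE["MAS"] = NE["JPT+CHB"]
# Singapore Indians: average of CEU and JPT+CHB, (11418 + 14269) / 2 = 12843.5
NE["INS"] = round((NE["CEU"] + NE["JPT+CHB"]) / 2)
```

The averaged value rounds to the 12,844 quoted above for the Singapore Indians.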
5.2.5 SNP annotation, copy number variation and FST calculation
Annotation information for all the SNPs was obtained from the PLINK
retrieval interface (http://pngu.mgh.harvard.edu/~purcell/plink/psnp.shtml) which
uses the TAMAL database 376 and maps to the information from the UCSC
genome browser, HapMap and dbSNP build 126. We additionally classify each
SNP as whether it is: (i) non-synonymous; (ii) synonymous; (iii) found within a
gene; or (iv) not within a gene. We used Version 10 of the database for copy
number variation from the Database of Genomic Variants (http://projects.tcag.ca).
For each SNP that is present in both populations, we calculated the FST using the Weir and Cockerham estimator377. This metric quantifies the difference in
allele frequencies at a SNP between two populations. For each window in the
calculation of the varRecM score, we adopt an established approach378 that uses
the maximum FST across the SNPs to represent the regional differentiation
between two populations.
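Summarising a window by its maximum FST can be sketched as follows. The per-SNP FST values are taken as given (the Weir and Cockerham estimator itself is not implemented here), and the function name and the choice to return NaN for a window without SNPs are our assumptions.

```python
import numpy as np

def window_max(snp_pos, snp_values, win_start, win_end):
    """Maximum per-SNP value (e.g. FST) among SNPs lying in each
    [start, end] window; NaN when a window contains no SNP."""
    order = np.argsort(snp_pos)
    pos = np.asarray(snp_pos)[order]
    vals = np.asarray(snp_values, float)[order]
    lo = np.searchsorted(pos, win_start, side="left")    # first SNP >= start
    hi = np.searchsorted(pos, win_end, side="right")     # one past last SNP <= end
    return np.array([vals[a:b].max() if b > a else np.nan
                     for a, b in zip(lo, hi)])
```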
5.2.6 Quantification of variations in linkage disequilibrium
We used the varLD programme379 to assess the extent of LD variation
between two populations across the genome, which considers sliding windows of
50 SNPs with a 1-SNP shift. The position of each window is summarised as the
average physical position across the 50 SNPs. Further details of the varLD
algorithm can be found in the original publication379. To evaluate the correlation
between the varRecM and varLD scores calculated for each pair of populations,
we calculated both the average and the maximum varLD scores across all varLD
windows possessing averaged positions that fall within the start and end positions
for each of the 50kb varRecM windows. We consider the varRecM windows with scores that are found in 15 bins, each spanning 0.1 percentile points of the varRecM score distribution. The 15 quantile intervals considered are: 4.9th – 5th, 9.9th – 10th, 19.9th – 20th, 29.9th – 30th, 39.9th – 40th, 49.9th – 50th, 59.9th – 60th, 69.9th – 70th, 79.9th – 80th, 89.9th – 90th, 94.9th – 95th, 97.4th – 97.5th, 98.9th – 99th, 99.4th – 99.5th, 99.9th – 100th. For example, the 94.9th – 95th quantile interval contains all the windows where the varRecM scores are found to be at least as large as 94.9% of the genome-wide distribution but less than the top 5% of the scores. We calculate the Pearson’s correlation coefficient between the varLD scores and the varRecM scores across the regions in these 15 bins. In addition, we also calculated the correlation between varLD and varRecM scores after adjusting for the effect of population differentiation as quantified by FST. This is achieved by regressing the varLD scores as a linear function of the varRecM scores and the FST for each region.
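One way to read the binning procedure is sketched below. It assumes that each of the 15 bins contributes a single point (the mean varRecM score and the mean varLD score of the windows in that bin) and that the correlation is computed over these 15 points; this reading, the inclusive bin boundaries and the function name are our assumptions rather than details stated above.

```python
import numpy as np

# Upper percentile of each of the 15 bins; each bin is 0.1 percentile wide
UPPER = [5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 97.5, 99, 99.5, 100]

def binned_correlation(varrecm_scores, varld_scores, width=0.1):
    """Pearson correlation between per-bin mean varRecM and mean varLD."""
    x = np.asarray(varrecm_scores, float)
    y = np.asarray(varld_scores, float)
    bin_x, bin_y = [], []
    for upper in UPPER:
        lo_val, hi_val = np.percentile(x, [upper - width, upper])
        mask = (x >= lo_val) & (x <= hi_val)   # both ends inclusive
        bin_x.append(x[mask].mean())
        bin_y.append(y[mask].mean())
    return float(np.corrcoef(bin_x, bin_y)[0, 1])
```

The sketch needs enough windows for every bin to be non-empty (a few thousand or more), which is easily met in a genome-wide scan.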
5.3 Results
5.3.1 Simulation studies on power and false positive rates
Our formulation of the varRecM metric explicitly locates genomic regions
where there is a significant deviation, such as a spike, in the recombination profile
of one population that is not present in the second population. The varRecM score
is negligible when the recombination profile is relatively flat across both
populations. Thus, in assessing the power and false positive rates (FPRs) of
varRecM, our simulations concentrated on situations where there is at least one
spike in the recombination rates of each population, with varying degrees of
peakedness and with different hotspot locations. We compared the varRecM score for each simulated window against three thresholds at the top 0.1%, 1% and 5% quantiles of the genome-wide distributions obtained from pairwise comparisons of the HapMap populations. This approach of prioritizing regions found in the
extreme tails of the genome-wide distribution is similar to other population
genetics strategies for highlighting unique features in the human genome, such as
extreme evidence of LD variation379 and positive natural selection378,380-382.
We first evaluate the relevance of the lag component in the varRecM metric
in minimizing the FPR. In addition to varying the relative intensities of the
recombination profiles (quantified as λ ∈ {1, 5, 10}) in our simulations, we also
allow minor variations in the hotspot locations in the two populations. This
mimics the situation where the recombination rates in each population have been
estimated with a slightly different set of SNPs due to the use of different
genotyping technology which can introduce minor deviations in the positions of
the hotspots. When the simulated hotspots are located at exactly the same
positions (corresponding to σd = 0), the FPR at the 1% threshold is almost entirely
zero regardless of the relative intensities λ and the lag values used (Figure 19A).
Unsurprisingly, as the positions of the hotspots start to vary (σd > 0), higher FPRs are observed, although this can be controlled by increasing the lag value (Figure 19B-D). Similar patterns are observed with the use of the 0.1% and 5% thresholds. However, increasing the lag reduces the power to detect genuine variations in recombination rates (Figure 20A), although this attenuation of power is marginal for moderately small lag values (i.e. < 6) at the top 1% and 5% thresholds. Based on the SNP densities found in Phase 2 of the HapMap and SGVP, we calibrate the varRecM metric to adopt a lag distance equivalent to approximately 4kb in all subsequent analyses.
To investigate the sensitivity of the varRecM metric to variations in recombination rates, we vary the ratios of hotspot intensities and the locations of the hotspots in the two populations (Figure 20B-D). Unsurprisingly, the power of varRecM is observed to increase with larger differences in the hotspot intensities
(as λ increases from 1 to 50), particularly when the average intensity in the reference population is high (µ of 25 and 50 cM/Mb). The power is generally lower for detecting variations as a result of smaller spikes in recombination profiles at the threshold of top 0.1% and 1% (Figure 20B).
5.3.2 Application to HapMap and Singapore Genome Variation Project
Using the estimated recombination rates for HapMap and SGVP, five genome-wide scans of recombination rate variations are performed for the following population pairings: (i) CEU versus JPT+CHB; (ii) CEU versus YRI; (iii) JPT+CHB versus YRI; (iv) CHS versus INS; (v) CHS versus JPT+CHB. We adopt a sliding window approach where each window spans 50kb and consecutive windows are shifted by 1 SNP (see Methods). Overlapping regions that emerged in the top 1% of each scan are merged to localize the start and end positions, and to avoid double-counting. In general, the genome-wide distribution of the varRecM statistics has a high kurtosis with long tails (Figure 21). Our survey of the two relatively homogeneous populations (CHS and JPT+CHB, median FST = 0.3%) revealed that the varRecM scores tend to be clustered around zero, with a median score of 0.0063, or approximately four-fold lower than the median scores from the comparison between YRI and non-YRI HapMap populations (Table 10). Using the associated varRecM score in the top 1% of the genome-wide distribution, we observed an apparent pattern in the spatial distribution of the population-specific recombination peak regions. From our pairwise comparisons between HapMap CEU and non-CEU samples (YRI, JPT+CHB), 114 regions exhibited strong evidence of CEU-specific recombination peaks across the autosomes (Figure 22). The same analyses comparing JPT+CHB against CEU and YRI identified 88 regions, while the corresponding number for YRI-specific recombination peaks is 294.
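The merging of overlapping top-scoring windows into regions can be sketched as a standard interval merge. The function name and the convention that windows touching at a shared boundary are merged are our assumptions.

```python
def merge_windows(windows):
    """Merge overlapping (start, end) windows on one chromosome into
    non-overlapping regions, so each region is counted once."""
    merged = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1]:
            # Overlaps the current region: extend its end if needed
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(region) for region in merged]
```

For instance, four windows spanning 100 to 150, 120 to 170, 160 to 180 and 300 to 350 (in kb) collapse into two regions, 100 to 180 and 300 to 350.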
Amongst the strongest twenty regions that emerged in the comparison between the HapMap Europeans (CEU) and East Asians (JPT+CHB), six regions harboured genetic evidence of population-specific positive selection (Table 11), such as the region on chromosome 15 that encompasses the SLC24A5 pigmentation gene, which is well-established to be positively selected in Europeans but not in other populations383. Recombination peaks spanning 0.12cM and 0.05cM are found adjacent to SLC24A5 in JPT+CHB (Figure 23A) and YRI respectively, and these are notably absent in CEU. Another region encapsulated the LPP gene on chromosome 3q28. The region spans a 100kb window where a significant spike in the recombination rates was observed in CEU, but not in JPT+CHB (Figure 23B). Intriguingly, the recombination hotspot was also present in CHS, and this window emerged with strong evidence of recombination variation in our analysis between CHS and HapMap JPT+CHB (Table 12). The LPP gene has previously been reported as a strong candidate for recent positive selection in the HapMap East Asians382, and subsequent studies have suggested that this selection signal is present only in northern Chinese populations but not in southern Chinese populations384. However, a selective sweep does not necessarily coincide with a region of low recombination, as we observed spikes in recombination in regions reported to be undergoing positive selection. One example is the region encompassing PHACTR1, which is a candidate gene for positive selection in HapMap East Asians381 yet exhibits evidence of significant recombination in JPT+CHB (Figure 23C).
For the comparisons between YRI and the non-African populations (CEU, JPT+CHB), only two top regions contain signals of population-specific selection (Table 11). The first is in the region around the SLC24A5 gene in the comparison between CEU and YRI; the second is an extended region surrounding GPR85 on chromosome 7, which has been identified to be positively selected in East Asians380-382 (Figure 23D). In addition, although recombination hotspots are thought to be highly active in YRI populations, strong population-specific recombination peaks are also observed for CEU (10 of 20 top regions) and JPT+CHB (6 of 20) in the comparison to YRI (Table 12).
In addition to locating candidates of positive selection, there are other regions that emerged with strong evidence of inter-population differences in recombination rates (Table 11 & Table 12, Figure 24 - 28). Several of the genes in these regions are associated with severe genetic syndromes. For example, the region encompassing SPINK5 (associated with Netherton syndrome) on chromosome 5q32 was consistently identified among the top signals in the comparisons between YRI and non-Africans (CEU, JPT+CHB); PLCE1 on chromosome 10q23, which is associated with diffuse mesangial sclerosis (DMS), has two strong hotspots in the gene for JPT+CHB populations but a significantly weaker recombination profile in YRI; and SLC2A2 on chromosome 3q26, which is a major genetic factor for Fanconi-Bickel syndrome, has a hotspot that is present only in CEU but not in JPT+CHB or YRI.
5.3.3 Recombination variation and linkage disequilibrium variation highly correlated
To examine the relationship between inter-population variations in recombination rates and LD patterns, we measured the average and maximum varLD scores for each window used in the varRecM score calculation. We observed strong correlations between the varRecM and averaged varLD metrics in all of our comparisons (Pearson’s correlation coefficient r ranging from 0.873 to
0.965 with corresponding p-values ranging from 2.10 × 10-5 to 6.38 × 10-9, see
Figure 29), except in the comparison between the two East Asian panels (CHS
versus JPT+CHB, r = -0.039, P-value = 0.889). Similar patterns were observed
when we use the maximum varLD scores. The strongest correlation between
inter-population LD variation and recombination differences was observed in the
comparison between JPT+CHB and YRI. To address whether this strong
relationship is driven by the differences in allele frequency spectrum between the
populations which may inadvertently bias both the LD calculations and the
recombination rate estimation, we calculated the FST for every SNP that is found in both populations and calculated the maximum FST across the SNPs in each window. Similar correlation patterns between the varRecM and varLD scores remained after accounting for the FST within a regression framework, while no
significant association was observed between varRecM and FST (P-values > 0.05).
5.3.4 Regions with largest recombination variation less frequent in genes
We observed an enrichment of windows exhibiting high varRecM scores in non-gene regions (P-values < 10-7, Figure 30), except in the comparison between CHS and JPT+CHB (P-value = 0.614). In the comparison between CEU and JPT+CHB, high varRecM scores (defined as in the top 1% of the genome-wide distribution) were 1.11 times (95% CI: 1.07 – 1.15, P-value = 1.23 × 10-9) more likely to be found in nongenic regions as compared to varRecM scores below the top percentile. Similar odds ratios were observed in the comparisons between CEU and YRI (OR = 1.23, 95% CI: 1.19 – 1.27, P-value = 8.66 × 10-24); JPT+CHB and YRI (OR = 1.16, 95% CI: 1.12 – 1.20, P-value = 1.76 × 10-18); and CHS and INS (OR = 1.10, 95% CI: 1.06 – 1.17, P-value = 5.67 × 10-8). This suggests that extreme inter-population variations in recombination rates are generally less likely to happen in gene regions, which concurs with previous observations that recombination in humans tends to be more conserved in gene regions5,101.
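An odds ratio of this kind, with its 95% confidence interval, can be computed from a 2 × 2 table of window counts. The sketch below uses the standard log odds ratio (Woolf) interval; the text does not state which interval was used, so this choice, the function name and the cell labels are assumptions.

```python
import math

def odds_ratio_ci(top_nongenic, top_genic, rest_nongenic, rest_genic, z=1.96):
    """Odds ratio for top-scoring windows being nongenic, with a
    log-scale (Woolf) confidence interval; all counts must be positive."""
    a, b, c, d = top_nongenic, top_genic, rest_nongenic, rest_genic
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

An interval that excludes 1 indicates that top-scoring windows are distributed differently between genic and nongenic regions than the remaining windows.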
5.4 Discussion
We have introduced a quantitative approach to evaluate fine-scale inter-population variation in recombination rates across the whole genome based on high-resolution recombination maps. This method is especially powerful for locating genomic regions where a recombination spike exists in one population but not in another. By introducing a lag component in the formula, our metric is robust to small shifts in the relative positions of recombination hotspots that are found in both populations. We adopted the same approach as most population genetics strategies for LD variations379 and positive selection379-381 by prioritizing genomic regions with extreme evidence when benchmarked against the rest of the genome. We applied our method to pairs of populations from HapMap and SGVP, observing significant fine-scale differences in the recombination profiles of Europeans, Africans and East Asians. The number of regions in the top 1% exhibiting strong evidence of YRI-specific recombination peaks is at least twice the number of population-specific recombination peaks for CEU or JPT+CHB. Seven of the top twenty regions that emerged with the strongest evidence of recombination differences between HapMap CEU and JPT+CHB coincide with genomic regions exhibiting strong signatures of population-specific positive selection, while a few of the top regions contain genomic signatures of positive selection from the comparisons between African and non-African populations. In addition, multiple top regions encompass genes associated with severe genetic syndromes. Large inter-population variation in recombination is also less frequent in genes, and unsurprisingly we detected a strong correlation between variations in recombination rates and LD patterns.
Overall, the varRecM score in each window indicates the extent of recombination rate variation between the two populations under consideration, with larger values corresponding to greater differences in the recombination rate profiles of the two populations. These differences can be due to differential peakedness in the recombination rates, phase shifting, or both. As with most population genetic metrics, it is difficult to interpret the statistical significance of each varRecM score, and instead we consider the overall distribution of the varRecM scores across the entire human genome. This allows us to identify specific genomic regions which exhibit the strongest evidence of recombination rate variation between the populations. For example, a varRecM score that falls in the top 0.1% of the genome-wide distribution suggests that the difference in the recombination rates in this particular window is more pronounced than in 99.9% of the genome. We emphasize that this does not necessarily indicate the biological relevance or significance of this genomic window, other than to highlight the region as potentially interesting given the evidence of significant diversity in recombination patterns. To the best of our knowledge, there is currently no formal statistical metric for defining recombination rate variation without first defining the presence or absence of recombination hotspots. The varRecM score is thus the first metric that performs an agnostic survey of recombination rate differences using the rates directly.
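The empirical ranking against the genome-wide distribution can be sketched as below. This is a minimal illustration that assumes the scores are held in a one-dimensional array with one value per window; the function name is hypothetical and the scores in the example are synthetic.

```python
import numpy as np

def top_fraction_mask(scores, fraction):
    """Flag windows whose score lies in the top `fraction` of the distribution.

    Returns the boolean mask and the empirical cutoff. No p-value is
    attached: the ranking is purely relative to the rest of the genome.
    """
    scores = np.asarray(scores, dtype=float)
    cutoff = np.quantile(scores, 1.0 - fraction)
    return scores >= cutoff, cutoff

# Synthetic scores standing in for genome-wide window scores.
scores = np.arange(1000)
top_1_percent, cutoff_1 = top_fraction_mask(scores, 0.01)
top_01_percent, cutoff_01 = top_fraction_mask(scores, 0.001)
```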
One of the challenges we encountered in formulating a metric for evaluating inter-population variation in recombination rates is justifying the sensitivity of the method: how do we know that the regions we identified truly reflect genuine differences in recombination? There are two aspects to this question: (i) that there are genuine differences in the biological mechanisms between different populations that resulted in recombination preferentially occurring in one population but not in the other; (ii) that methods used to estimate recombination rates are biased by inherent artifacts in the data used, such as erroneous genotype calls, loss of heterozygosity due to copy number variation or positive selection. We have used recombination rates that were fundamentally estimated via the coalescence of haplotype data385,386. This inference of historical recombination can be distorted locally by positive selection, since the time tracing back to the most recent common ancestor (MRCA) in a coalescent-based model is underestimated when local genetic diversity is low or reaches fixation387,388. Thus, the concordance between the regions of suppressed recombination we identified and known signatures of intensive positive selection (e.g. SLC24A5) is reassuring, providing a means for benchmarking the performance of the method as well. Conversely, it has been suggested, perhaps surprisingly, that selective sweeps could also decrease or even eliminate LD at the selected loci, and such LD reduction has the potential to create false evidence of recombination hotspots103,389,390. The top regions identified in our study with the strongest evidence of elevated rates of recombination at well-established selection loci thus warrant further investigation for such scenarios.
Our study has approached recombination rate differences at the population level. However, accumulating evidence suggests that differential hotspot presence across individuals or between males and females may be in part attributed to germline genetic differences. Several studies have suggested that recombination may be activated by: (i) a single base change at certain SNPs84,373,391; (ii) the presence of a certain haplotype392; or (iii) the presence of a specific 13-mer motif that serves as a binding site for the zinc finger104. Recent studies have also discovered that the gene encoding the zinc-finger protein PRDM9 can influence both allelic and non-allelic recombination and can account for up to 18% of hotspot variance in humans105,393-395. It has been shown through pedigree analyses393 and sperm typing394 that individuals carrying PRDM9 variants lacking the motif-recognizing alleles exhibited dramatically lower genome-wide hotspot usage. Substantial allele frequency differences in PRDM9 between populations have been suggested to result in potential shifts in recombination hotspots98. The results from our study are consistent with the findings that recombination hotspots can be polymorphic among human populations. Our approach thus serves to identify genomic regions that may be investigated further for association with PRDM9 variants across different populations.
Recombination is one of the key forces that break down LD, and the rate of recombination has been widely used in SNP imputation algorithms60. Our genome-wide surveys identified a close relationship between the evidence of LD variation and recombination rate variation. This observation is important in trans-ethnic experiments that aim to fine-map the functional polymorphisms by leveraging the genomic diversity of different populations396,397, as it highlights the use of population-specific recombination rates for carrying out the fine-mapping. Since regions prioritised for trans-ethnic fine-mapping tend to be disease-associated regions that exhibit variable LD patterns between populations, using recombination rates that are averaged across multiple populations (such as those available on the HapMap resource) may not accurately define the haplotype breakpoints in the different populations. We did not observe any correlation between the recombination and LD differences in two closely related populations, CHS and JPT+CHB, likely because one or both of the varRecM and varLD statistics will be highly skewed in such comparisons and scores at the tail end of the genome-wide distribution may not effectively differentiate the degree of inter-population recombination variation.
Recently, Wegmann and colleagues found that regions at a broad scale of 1 Mb showing the strongest differences in recombination are disrupted by common structural variation (inversions) in the comparison between admixed African Americans and Europeans95. Some regions exhibiting the largest recombination differences in our study also harboured copy number variation, especially in the comparisons between Africans and non-Africans. However, in contrast to the previous study, these structural variants were generally very rare (frequency less than 0.05) and therefore less likely to distort the local recombination rate estimation within these top identified regions.
Our findings of genomic regions that exhibit population-specific recombination hotspots illustrate that there can be variations in fine-scale recombination profiles between populations. The regions with the largest recombination differences residing in population-specific sites of positive selection may reflect artifacts introduced by a suppression of heterozygosity in the genotype data as a result of positive selection, which inadvertently affects the estimation of recombination rates. The fine-scale comparison of recombination profiles will prioritise genomic regions for further studies into the biological basis of heterogeneous recombination across different human populations, thus providing further insight into the process of evolution.
Figure 18. Illustration of ranking the recombination differences between two populations. Suppose there are two recombination rate profiles estimated from different populations, population 1 (yellow line) and population 2 (blue line), where the genetic distance (the area under the curve) for population 1 is assumed to be larger than that for population 2 in a fixed window. We rank the recombination differences between populations 1 and 2 as depicted by the red arrow. In the upper panel, the variation in the magnitude of the recombination peaks between the two populations is greatest in C), where there is a high peak in population 1 and depleted recombination in population 2. In the lower panel, the recombination difference increases from D) to E), which is attributed to the degree of phase shifting between the two recombination rate profiles.
Figure 19. Evaluation of false positive rates (FPR) of the varRecM method.
The false positive rate of the varRecM approach is evaluated for two recombination hotspots that are at the same or nearby locations with varying intensity. The distance between the starting positions of the two hotspots is distributed as Normal(0, σd²), where A) σd = 0 kb, B) σd = 0.5 kb, C) σd = 1.0 kb and D) σd = 2.0 kb. The recombination rate for the hotspot in the first population g1 is distributed as Normal(25, 0.5²). The recombination rate in the second population g2 is defined with respect to λ, where λ ∈ {1, 5, 10},*, which represents the recombination hotspot intensity ratio of g1 to g2. The lag parameter varies across a range of values (0, 2, 4, 6, 8, 12 and 10 log10(L/2)), where L = 100 is the total number of SNPs in the 50 kb window in the simulation study. Given a SNP density of 2 SNPs/kb in the simulated data, the lag values correspond to lag distances of 0 kb, 1 kb, 2 kb, 3 kb, 4 kb, 6 kb and 5 log10(L/2) kb. The threshold in the false positive rate calculation is the top 1% genome-wide.
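The conversion from lag values (in SNPs) to lag distances (in kb) stated in the caption follows directly from the simulated SNP density. A short arithmetic check, using only the numbers given in the caption (the variable names are ours):

```python
import math

SNPS_PER_KB = 2   # SNP density of the simulated data, as stated in the caption
L = 100           # number of SNPs in the simulated 50 kb window

# Lag values in units of SNPs, as listed in the caption.
lags_in_snps = [0, 2, 4, 6, 8, 12, 10 * math.log10(L / 2)]

# Dividing by the SNP density converts each lag to a physical distance in kb.
lags_in_kb = [lag / SNPS_PER_KB for lag in lags_in_snps]
```

The last entry equals 5 log10(L/2) kb, roughly 8.5 kb for L = 100, so the data-dependent lag is the largest of the lag distances evaluated.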
Figure 20. Power performance of the varRecM method. The power of the varRecM metric is evaluated using simulated data with varying intensity and phase shifting of the recombination hotspots between two populations. The recombination hotspot for population g1 is located at the center of the simulated 50 kb window, where the hotspot rate is distributed as Normal(µ, 0.5²) and µ is set as A) 25 cM/Mb, B) 6 cM/Mb, C) 25 cM/Mb, and D) 50 cM/Mb. The hotspot for population g2 is randomly distributed within each 50 kb window, and the rate is determined by λ, the intensity ratio of g1 to g2. For A), λ is fixed at 10; for B), C) and D), λ ranges from 1 (no difference in hotspot intensity) to 50 (where the hotspot intensity in population g1 is fifty times the hotspot intensity in population g2). A lag distance of 4 kb is adopted in the power evaluation for B), C) and D).
Figure 21. Cumulative density plots of varRecM scores from five pairwise comparisons between HapMap and SGVP populations. CEU - JPT+CHB: comparisons between HapMap Europeans (CEU) and East Asians (JPT+CHB) (black line); CEU - YRI: comparisons between HapMap Europeans (CEU) and Africans (YRI) (red line); JPT+CHB - YRI: comparisons between HapMap East Asians (JPT+CHB) and Africans (YRI) (green line); CHS - INS: comparisons between SGVP Chinese (CHS) and Indians (INS) (blue line); CHS - JPT+CHB: comparisons between SGVP Chinese (CHS) and HapMap East Asians (JPT+CHB) (pink line).
Figure 22. Distribution of population-specific recombination peak regions in the top 1% of the varRecM scores. We identified those population-specific recombination peak regions that contained signals in the top 1% of the genome-wide distributions in two specific population-pair comparisons (say between population A and population B, and between population A and population C) but not in the top 5% of the distribution in the third population-pair (between population B and population C). These regions are indicative that the signals are driven mainly by one population (population A in our example here). The number of regions for each population is: 114 for CEU, 88 for JPT+CHB and 294 for YRI.
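The selection rule in this caption can be expressed as a simple filter over the three pairwise score vectors. The sketch below assumes that the three comparisons are evaluated on the same set of windows in the same order; the function and argument names are hypothetical and the scores in the example are synthetic.

```python
import numpy as np

def population_specific_windows(ab, ac, bc, strict=0.01, lenient=0.05):
    """Windows attributed to population A.

    A window is kept when its score is in the top `strict` fraction of both
    the A-B and A-C comparisons, but outside the top `lenient` fraction of
    the B-C comparison.
    """
    ab, ac, bc = (np.asarray(x, dtype=float) for x in (ab, ac, bc))
    in_top_ab = ab >= np.quantile(ab, 1.0 - strict)
    in_top_ac = ac >= np.quantile(ac, 1.0 - strict)
    in_top_bc = bc >= np.quantile(bc, 1.0 - lenient)
    return in_top_ab & in_top_ac & ~in_top_bc

# Synthetic example: the same windows rank highest in A-B and A-C,
# and lowest in B-C.
ab = np.arange(1000)
ac = np.arange(1000)
bc = np.arange(1000)[::-1]
specific_to_a = population_specific_windows(ab, ac, bc)
```

Using a more lenient threshold (top 5%) for the third comparison makes the exclusion conservative: a window is attributed to population A only when there is little evidence of a difference between B and C.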
[Figure 23 panels: A) SLC24A5 (chr 15); B) LPP (chr 3); C) PHACTR1 (chr 6); D) GPR85 (chr 7). Axes: chromosome position (Mb) against recombination rate (cM/Mb) and varRecM score.]
Figure 23. Top regions of largest varRecM scores with overlapping signals of positive selection. The top regions illustrated are from the comparisons between HapMap Europeans (CEU) and East Asians (JPT+CHB) (Fig. 23A-C), and HapMap JPT+CHB versus Africans (YRI) (Fig. 23D). The varRecM scores are depicted as dark blue points. Recombination rates are shown for CEU (orange line), JPT+CHB (brown line) and YRI (sky blue line). A) A strong recombination peak in JPT+CHB but suppressed in CEU at 46.3 Mb on chromosome 15. The deficit of rates in CEU is within the extended selection region at gene SLC24A5. The horizontal grey bar near the x-axis denotes the region undergoing selection in CEU and the red bar below indicates the position of the gene SLC24A5. B) A strong recombination peak within LPP (red bar) at 190.0 Mb on chromosome 3 in CEU, overlapping the selection signals for JPT+CHB (grey bar), while the rate is depleted for JPT+CHB at the same location. C) A recombination peak at 12.9 Mb on chromosome 6 in JPT+CHB that is absent in CEU. High recombination in the PHACTR1 gene in JPT+CHB coincides with selection reported in the Chinese population. D) A high recombination peak in JPT+CHB but suppressed in YRI at 112.5 Mb on chromosome 7, corresponding to a selection region in JPT+CHB encapsulating the gene GPR85.
Figure 24. Plots of the top 20 regions of the varRecM scores for the comparison between samples of HapMap CEU and JPT+CHB. Recombination rates of CEU (black line) and JPT+CHB (green line). The vertical grey dotted line depicts the top varRecM region.
Figure 25. Plots of the top 20 regions of the varRecM scores for the comparison between samples of HapMap CEU and YRI. Recombination rates of CEU (black line) and YRI (green line).
Figure 26. Plots of the top 20 regions of the varRecM scores for the comparison between samples of HapMap JPT+CHB and YRI. Recombination rates of JPT+CHB (black line) and YRI (green line).
Figure 27. Plots of the top 20 regions of the varRecM scores for the comparison between samples of SGVP CHS and INS. Recombination rates of CHS (black line) and INS (green line).
Figure 28. Plots of the top 20 regions of the varRecM scores for the comparison between samples of SGVP CHS and HapMap JPT+CHB. Recombination rates of CHS (black line) and JPT+CHB (green line).
Figure 29. Scatter plot of varLD score versus varRecM score among HapMap and SGVP populations. A) CEU - JPT+CHB (Pearson correlation coefficient r = 0.873; p = 2.10 × 10⁻⁵), B) CEU - YRI (r = 0.897; p = 5.72 × 10⁻⁶), C) JPT+CHB - YRI (r = 0.965; p = 6.38 × 10⁻⁹), D) CHS - INS (r = 0.893; p = 7.49 × 10⁻⁶), E) CHS - JPT+CHB (r = -0.039; p = 0.889). The fitted line is predicted from the linear regression of varLD on varRecM scores.
[Figure 30 plot: odds ratios (y-axis, 0 to 1.4) for the comparisons CEU - JPT+CHB, CEU - YRI, JPT+CHB - YRI, CHS - INS and CHS - JPT+CHB.]
Figure 30. Odds ratios of extreme varRecM scores occurring in intergenic versus gene regions.
CEU - JPT+CHB: comparisons between HapMap Europeans (CEU) and East Asians (JPT+CHB); CEU - YRI: HapMap Europeans (CEU) and Africans (YRI); JPT+CHB - YRI: HapMap East Asians (JPT+CHB) and Africans (YRI); CHS - INS: SGVP Chinese (CHS) and Indians (INS); CHS - JPT+CHB: SGVP Chinese (CHS) and HapMap East Asians (JPT+CHB). For all comparisons, regions with extreme varRecM scores are defined as those in the top 1% of the distribution and non-extreme scores as those outside the top 1%, except for the comparison between SGVP CHS and HapMap JPT+CHB, for which extreme varRecM scores are defined as those in the top 0.1%. The y error bars represent the 95% CI of the odds ratio.
Table 10. varRecM scores at top percentiles for pair-wise comparisons of the three HapMap populations: CEU and JPT+CHB, CEU and YRI, YRI and JPT+CHB
Percentage\populations | CEU - JPT+CHB | CEU - YRI | YRI - JPT+CHB | Mean | SE
top 5% | 0.216 | 0.265 | 0.246 | 0.242 | 0.014
top 1% | 0.447 | 0.475 | 0.443 | 0.455 | 0.010
top 0.5% | 0.552 | 0.565 | 0.537 | 0.551 | 0.008
top 0.1% | 0.799 | 0.773 | 0.727 | 0.766 | 0.021
Table 11. The 20 strongest signals of varRecM scores in comparisons of HapMap populations.
Chr | Position (Mb in hg18) | Population with strong recombination (peak rate cM/Mb)a | Genes | Population under selection | Selection position (Mb)b | Structure variationc
CEU - JPT+CHB
1 | 192.78 – 192.88 | CEU (28.2) | - | - | - | -
2 | 35.15 – 35.24 | JPT+CHB (30.5) | - | - | - | Copy number difference in HapMap samples398
3 | 172.22 – 172.35 | CEU (20.2) | SLC2A2, TNIK | - | - | -
3 | 189.93 – 190.03 | CEU (52.1) | LPP | JPT+CHB382,399 | 189.73 – 190.42 | -
4 | 148.40 – 148.50 | CEU (42.5) | - | - | - | -
4 | 149.00 – 149.10 | CEU (23.2) | ARHGAP10 | CEU381,400 | 148.74 – 149.14 | -
4 | 161.91 – 162.00 | JPT+CHB (39.5) | - | - | - | Copy number loss in Caucasians or African-Americans401 / copy number gain in HapMap samples (MAF=0.07)398, Germans402
5 | 43.85 – 43.95 | CEU (46.9) | - | - | - | -
5 | 151.85 – 151.92 | JPT+CHB (61.3) | - | - | - | -
6 | 12.88 – 13.02 | JPT+CHB (17.8) | PHACTR1 | JPT+CHB381,403 / Chinese399 | 12.62 – 13.20 | -
7 | 92.16 – 92.26 | CEU (22.3) | CDK6 | CEU381,403 | 92.12 – 92.22 | -
7 | 135.72 – 135.81 | CEU (81.0) | - | - | - | -
8 | 136.27 – 136.36 | CEU (54.0) | - | - | - | -
10 | 126.92 – 127.01 | JPT+CHB (28.6) | - | JPT+CHB380,404 | 126.90 – 127.00 | -
11 | 71.46 – 71.52 | CEU (18.5) | LRRC51, NUMA1, FOLR3 | - | - | -
13 | 31.56 – 31.66 | JPT+CHB (57.9) | FRY | - | - | -
15 | 34.04 – 34.12 | CEU (44.4) | - | - | - | -
15 | 46.26 – 46.36 | JPT+CHB (47.8) | SLC24A5, SLC12A, MYEF2 | CEU381,382,399,403 | 45.94 – 46.80 | -
17 | 71.01 – 71.11 | CEU (35.4) | CASKIN2, LLGL2, TSEN54 | - | - | Copy number loss in Caucasian/African-American, HGDP, Japanese401,405,406 / gain in HGDP405
18 | 63.70 – 63.77 | CEU (14.1) | - | - | - | -
CEU - YRI
1 | 98.02 – 98.12 | CEU (82.1) | DPYD | - | - | -
1 | 102.55 – 102.62 | YRI (15.3) | - | - | - | Copy number loss in Caucasians, Germans, HapMap, Canadian Ontario controls402,407,408
2 | 8.24 – 8.35 | YRI (34.1) | - | - | - | Copy number gain in Caucasian individuals407
2 | 151.09 – 151.19 | CEU (88.6) | - | - | - | -
2 | 226.52 – 226.62 | CEU (32.7) | - | - | - | -
3 | 179.54 – 179.64 | CEU (59.9) | - | - | - | -
4 | 42.42 – 42.52 | CEU (99.1) | - | - | - | -
4 | 54.43 – 54.52 | CEU (70.4) | - | - | - | -
5 | 147.45 – 147.55 | YRI (40.4) | SPINK5, SPINK5L2 | - | - | -
7 | 139.56 – 139.62 | CEU (44.0) | - | - | - | -
9 | 119.25 – 119.35 | CEU (42.0) | - | - | - | -
10 | 2.24 – 2.34 | YRI (36.6) | - | - | - | -
10 | 5.37 – 5.46 | CEU (44.8) | NET1, UCN3, TUBAL3 | - | - | -
10 | 122.83 – 122.93 | YRI (51.5) | - | - | - | Copy number differences in HapMap samples398
11 | 15.75 – 15.83 | YRI (37.2) | - | - | - | -
11 | 18.99 – 19.08 | YRI (64.6) | MRGPRX2, ZDHHC13 | - | - | Copy number differences in HapMap samples398
12 | 2.79 – 2.88 | CEU (42.0) | TULP3, NRIP2, ITFG2, FKBP4, | - | - | Copy number loss in CEPH diverse population409
15 | 46.26 – 46.36 | YRI (34.5) | SLC12A1, SLC24A5, MYEF2 | CEU381,382,399,403 | 45.94 – 46.80 | -
18 | 57.80 – 57.89 | YRI (55.8) | PIGN | - | - | Copy number difference in CEPH diverse population409
20 | 12.34 – 12.40 | YRI (35.8) | - | - | - | Copy number differences in HapMap samples398
JPT+CHB - YRI
3 | 82.53 – 82.59 | YRI (28.8) | - | - | - | Copy number gain in HapMap samples398
4 | 12.84 – 12.93 | YRI (31.0) | - | - | - | Copy number loss in CEPH diverse population409
5 | 51.19 – 51.30 | JPT+CHB (34.8) | - | - | - | -
5 | 58.32 – 58.42 | YRI (22.5) | PDE4D | - | - | -
5 | 147.45 – 147.55 | YRI (40.4) | SPINK5, SPINK5L2 | - | - | -
5 | 151.85 – 151.95 | JPT+CHB (61.3) | - | - | - | -
6 | 67.91 – 67.99 | YRI (21.7) | - | - | - | -
7 | 112.43 – 112.54 | JPT+CHB (40.4) | GPR85 | JPT+CHB380-382 | 111.87 – 112.59 | Copy number difference in HapMap samples398
7 | 135.72 – 138.81 | YRI (36.1) | - | - | - | -
10 | 95.89 – 96.01 | JPT+CHB (57.9) | PLCE1 | - | - | -
11 | 91.13 – 91.27 | YRI (27.7) | - | - | - | -
11 | 100.81 – 100.91 | JPT+CHB (38.8) | TRPC6 | - | - | -
12 | 65.73 – 65.83 | YRI (30.0) | - | - | - | -
15 | 34.83 – 34.95 | YRI (28.8) | C15orf41 | - | - | -
15 | 63.24 – 63.33 | YRI (57.8) | CLPX, CILP, PARP16 | - | - | -
16 | 75.38 – 75.47 | YRI (32.2) | - | - | - | Copy number loss in Caucasians407
17 | 22.53 – 22.61 | JPT+CHB (36.1) | - | - | - | -
18 | 63.72 – 63.77 | YRI (20.7) | - | - | - | -
20 | 12.33 – 12.43 | YRI (35.8) | - | - | - | -
21 | 22.72 – 22.80 | YRI (26.0) | - | - | - | Copy number loss in Caucasian/African-American401
a Peak rate is the estimated recombination rate in cM/Mb for the population-specific recombination peak present in each top region.
b The region undergoing selection is the maximal stretch obtained by collapsing all overlapping regions reported across studies410.
c Copy number variations have to be at least 10 kb to be included and overlap with the peak rate regions. The frequency of the copy number difference is rare (< 0.05) in the source populations unless otherwise specified in the table.
CEU - JPT+CHB: comparisons between HapMap Europeans (CEU) and East Asians (JPT+CHB); CEU - YRI: comparisons between HapMap Europeans (CEU) and Africans (YRI); JPT+CHB - YRI: comparisons between HapMap East Asians (JPT+CHB) and Africans (YRI).
Table 12. The 20 strongest signals of varRecM score in comparisons of SGVP Chinese and Indians, and of SGVP Chinese and HapMap East Asians.
Chr | Position (Mb in hg18) | Population with strong recombination (peak rate cM/Mb)a | Genes | Population under selection | Selection position (Mb)b | Structure variationc
CHS - INS
1 | 214.96 – 215.06 | INS (72.3) | ESRRG | - | - | -
1 | 215.07 – 215.20 | INS (49.6) | ESRRG | - | - | -
1 | 230.65 – 230.75 | INS (38.7) | SIPA1L2 | - | - | -
2 | 25.06 – 25.15 | CHS (19.9) | EFR3B | - | - | -
2 | 209.42 – 209.50 | CHS (25.7) | - | - | - | Copy number difference in Indian411
3 | 164.98 – 165.07 | INS (38.7) | - | - | - | -
4 | 29.80 – 29.89 | INS (29.5) | - | - | - | -
4 | 72.07 – 72.16 | INS (53.0) | DCK, MOBKL1A | CEU403,412 | 72.04 – 72.41 | -
4 | 73.90 – 73.95 | INS (16.3) | - | - | - | -
4 | 135.62 – 135.72 | INS (24.5) | - | CEU400,404,413 | 135.10 – 135.64 | Common copy number loss in Indian411
5 | 66.21 – 66.30 | INS (28.0) | MAST4 | - | - | -
7 | 122.16 – 122.22 | INS (19.0) | CADPS2 | - | - | -
8 | 127.38 – 127.46 | CHS (51.1) | - | - | - | -
9 | 24.10 – 24.19 | CHS (73.2) | - | - | - | -
9 | 132.68 – 132.78 | INS (67.4) | ABL1, QRFP, FIBCD1 | - | - | -
12 | 70.85 – 70.92 | INS (34.7) | - | - | - | -
12 | 124.77 – 124.85 | INS (65.4) | - | - | - | -
18 | 74.82 – 74.91 | INS (28.5) | SALL3 | - | - | Copy number loss in Indian and Singapore Chinese411
20 | 52.60 – 52.73 | INS (39.3) | DOK5 | JPT+CHB404,412 | 52.43 – 52.75 | -
21 | 18.34 – 18.40 | CHS (66.8) | - | - | - | Copy number loss in Singapore Chinese411
CHS - JPT+CHB
1 | 21.56 – 21.65 | CHS (30.7) | NBPF3, ECE1 | - | - | -
1 | 214.95 – 215.06 | JPT+CHB (66.8) | ESRRG | - | - | -
1 | 215.08 – 215.20 | JPT+CHB (44.9) | ESRRG | - | - | -
3 | 189.93 – 190.03 | CHS (41.4) | LPP | JPT+CHB382,399 | 189.73 – 190.42 | -
4 | 123.05 – 123.14 | CHS (37.2) | TRPC3 | - | - | -
4 | 142.23 – 142.33 | JPT+CHB (41.6) | RNF150 | - | - | -
5 | 153.38 – 153.44 | JPT+CHB (39.5) | MFAP3, FAM114A2 | - | - | -
6 | 67.91 – 68.02 | CHS (18.4) | - | - | - | -
8 | 57.00 – 57.10 | JPT+CHB (29.3) | LYN | JPT+CHB403,412 | 56.96 – 57.18 | -
8 | 111.71 – 111.81 | CHS (20.5) | - | JPT+CHB381,404 | 111.41 – 111.86 | -
9 | 119.53 – 119.62 | JPT+CHB (47.1) | TLR4 | - | - | -
10 | 107.85 – 107.94 | JPT+CHB (42.1) | - | - | - | -
11 | 100.50 – 100.60 | JPT+CHB (17.0) | PGR | JPT+CHB404 | 100.49 – 100.60 | -
11 | 100.81 – 100.91 | JPT+CHB (38.8) | TRPC6 | - | - | -
12 | 70.85 – 70.93 | JPT+CHB (40.9) | - | - | - | -
15 | 43.97 – 44.07 | JPT+CHB (27.2) | - | JPT+CHB404 / Asian American399 | 43.92 – 44.33 | -
18 | 30.71 – 30.82 | CHS (14.5) | DTNA | - | - | -
21 | 18.35 – 18.40 | CHS (66.8) | - | - | - | -
21 | 18.93 – 18.99 | CHS (75.1) | - | - | - | Copy number gain in Singapore Chinese411
22 | 29.02 – 29.11 | CHS (10.5) | TBC1D10, RNF215, SEC14L2 | - | - | -
a Peak rate is the estimated recombination rate in cM/Mb for the population-specific recombination peak present in each top region.
b The region undergoing selection is the maximal stretch obtained by collapsing all overlapping regions reported across studies410.
c Copy number variations have to be at least 10 kb to be included and overlap with the peak rate regions. The frequency of the copy number difference is rare (< 0.05) in the source populations unless otherwise specified in the table.
CHS - INS: comparisons between SGVP Chinese (CHS) and Indians (INS); CHS - JPT+CHB: comparisons between SGVP Chinese (CHS) and HapMap East Asians (JPT+CHB).
Chapter 6 Conclusion
This dissertation comprises three studies (Study 1, 2 and 3). Since each
study has its own discussion, in this final chapter I will only summarise the key
results and major implications, and suggest areas for further research.
6.1 Identified genetic variants associated with refractive errors
In Study 1 (Chapter 3), the genetic locus on chromosome 1q41 containing the zinc-finger pseudogene ZC3H11B was found to be associated with AL through a meta-analysis of three GWAS in Chinese and Malays. This is the first genome-wide association scan, to the best of our knowledge, to explore the genetic association for AL. The discovery of chromosome 1q41 as a locus for high myopia in our data was further supported by validation in two independent Japanese cohorts. Similar genetic effects at 1q41 for high myopia in Chinese, Malays and Japanese were observed, suggesting that the underlying biological mechanisms of the genetic variation are likely to be conserved across these three Asian populations. Our GWAS results herein highlight AL QTLs relevant for high myopia predisposition, advancing our understanding of the genetic aetiology of common and high myopia.
In Study 2 (Chapter 4), our genome-wide meta-analysis across five genome-wide surveys from three genetically diverse populations in Asia revealed that genetic variants in the PDGFRA gene on chromosome 4q12 are significantly associated with corneal astigmatism. We observed a strong and consistent association with the onset of corneal astigmatism at the PDGFRA gene locus on chromosome 4q12 across Chinese, Malays and Indians.
Due to the ‘indirect mapping’ nature of the GWAS design, the identified SNPs at the chromosome 1q41 locus or within the PDGFRA gene are probably not the causal variants, but are likely to be in LD with the causal SNPs. Further strategies to pinpoint the causal mutations, including fine mapping and deep sequencing of these genomic regions in different populations, will be required.
6.2 Transferability of the genetic variants for refractive errors across populations
The international Consortium for Refractive Error and Myopia (CREAM) has recently been launched, representing the largest genome-wide scan on AL in about 23,000 individuals from 18 cohorts in Europe, the United States, Australia and Asia. Meta-analysis of 2.5 million autosomal SNPs, obtained by whole-genome imputation using HapMap 2 reference panels, reveals the most significant association for AL on chromosome 1q41 in the proximity of the pseudogene ZC3H11B (meta-P = 9.6 × 10⁻¹²), along with a genetic locus on chromosome 15q14 and seven novel loci (data not published). In a parallel genome-wide meta-analysis of GWAS in 27 Caucasian populations from the CREAM Consortium, use of spherical equivalent as a continuous phenotype has identified 24 distinct genetic regions, while 13 loci show significant evidence of replication in the Asian populations.
Chromosome 4q12, harbouring the gene PDGFRA, is a previously reported myopia locus (MYP9). Interestingly, our group also found a strong association between variants in PDGFRA and corneal curvature in Asians414, where corneal curvature is one of the underlying ocular parameters of refractive error, pointing to a possible pleiotropic contribution of PDGFRA to corneal astigmatism and corneal curvature. A recent GWAS in Australians of northern European ancestry confirmed the important role of the PDGFRA gene in controlling corneal curvature415. Significant associations of PDGFRA with corneal curvature and corneal astigmatism were also found in UK children of European ancestry (data not shown). These outcomes, along with findings from the CREAM Consortium, provide evidence for substantial overlap in the genetic aetiology of refractive error between Caucasians and Asians.
The transferability of the genetic variants for refractive errors across different ethnic groups will have important implications for preventive or therapeutic interventions for myopia in the future. The discovery of common risk loci in GWAS raises the possibility of using these variants for risk prediction in a clinical setting. Assuming that the genetic variants for myopia are shared across different populations, a generic risk prediction model could be used to classify individuals into high or low risk for myopia onset and enable targeted preventive treatment. A good genetic prediction model has been built for age-related macular degeneration, utilizing multiple large-effect genetic variants in addition to epidemiological risk factors416. Limited predictive value, however, has been noted for most common diseases so far417. Substantial improvements can be expected only as the field of genomics progresses to better characterise genetic variation via fine-mapping, whole-genome sequencing and functional testing, and to assess gene-environment interactions.
6.3 Statistical meta-analysis of GWAS in diverse populations
The use of diverse populations will be an essential component of the next phase of GWAS. Meta-analysis combining as many studies as possible provides the opportunity to scrutinise less significant SNPs embedded in the individual GWAS that may reflect genuine associations at larger sample sizes. In Study 2, the existing inter-population variation of LD (e.g. Indians versus Chinese) may diminish the statistical power to detect genetic loci in regions with different LD patterns across ethnic groups, although the false-positive rate is generally not affected. Such inter-population LD variation could yield different effect sizes of the association (or even opposite directions of association) at the genotyped SNP, while the underlying genetic effect of the causal variant is the same72. Rather than combining test statistics of the same SNP, a region-based meta-analysis method is useful in trans-ethnic studies for assessing regional evidence of phenotypic association, while accommodating different patterns of LD and the presence of allelic heterogeneity68.
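The way LD differences alter the effect seen at a genotyped SNP can be made concrete under a simplified model. The sketch below assumes a single causal variant with an additive effect, Hardy-Weinberg equilibrium, and a linear regression of the trait on the genotyped (tag) SNP; under these assumptions the expected marginal effect at the tag SNP is the causal effect scaled by the signed LD correlation r and by the ratio of genotype standard deviations. The function name and all numerical values are hypothetical.

```python
import math

def tag_snp_effect(beta_causal, r, p_causal, p_tag):
    """Expected marginal effect at a tag SNP under a single-causal-variant model.

    beta_causal: additive effect of the causal allele
    r:           signed LD correlation between the tag and causal alleles
    p_causal, p_tag: allele frequencies of the causal and tag alleles
    """
    scale = math.sqrt(p_causal * (1.0 - p_causal) / (p_tag * (1.0 - p_tag)))
    return beta_causal * r * scale

# Same causal effect, different LD in two hypothetical populations.
effect_pop1 = tag_snp_effect(beta_causal=0.5, r=0.8, p_causal=0.3, p_tag=0.3)
effect_pop2 = tag_snp_effect(beta_causal=0.5, r=-0.4, p_causal=0.3, p_tag=0.3)
```

With identical causal effects, the two hypothetical populations show tag-SNP effects of different magnitude and opposite sign, which is the scenario described above.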
In addition, the fixed-effect model adopted in the current meta-analysis does not account for the scenario where the effect sizes of genetic variants vary across studies. Interestingly, the conventional random-effect model yields less significant p-values than the fixed-effect model in the presence of inter-study heterogeneity418. There are future opportunities to perform more advanced statistical analyses to accommodate this issue in trans-ethnic mapping. For example, Han and Eskin have proposed a revised random-effect model that relaxes the conservative assumption under the null hypothesis, showing significantly improved power in comparison with the traditional random-effect model66. By accounting for similarity in allelic effects between genetically closely related populations but allowing heterogeneity between more diverse ethnic groups, Morris has demonstrated that such an approach significantly increases power compared to the uniform fixed- or random-effect models67.
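The contrast between the fixed-effect and the conventional random-effect model can be illustrated with the standard inverse-variance formulas. The sketch below assumes the DerSimonian-Laird estimator for the between-study variance, which is the usual conventional random-effect model but is our assumption here; it does not implement the revised models of Han and Eskin or Morris. Function names and input values are hypothetical.

```python
import math

def fixed_effect(betas, ses):
    """Inverse-variance weighted fixed-effect estimate and its standard error."""
    weights = [1.0 / s ** 2 for s in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    return beta, math.sqrt(1.0 / sum(weights))

def random_effect_dl(betas, ses):
    """Conventional random-effect estimate (DerSimonian-Laird tau^2)."""
    weights = [1.0 / s ** 2 for s in ses]
    beta_fe, _ = fixed_effect(betas, ses)
    # Cochran's Q measures the excess dispersion of the study estimates
    q = sum(w * (b - beta_fe) ** 2 for w, b in zip(weights, betas))
    k = len(betas)
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # The between-study variance is added to each study's variance
    re_weights = [1.0 / (s ** 2 + tau2) for s in ses]
    beta = sum(w * b for w, b in zip(re_weights, betas)) / sum(re_weights)
    return beta, math.sqrt(1.0 / sum(re_weights)), tau2

# Hypothetical heterogeneous study estimates, for illustration only.
betas, ses = [0.1, 0.5, -0.2], [0.1, 0.1, 0.1]
beta_fe, se_fe = fixed_effect(betas, ses)
beta_re, se_re, tau2 = random_effect_dl(betas, ses)
```

When the study estimates disagree, the estimated between-study variance is positive and the random-effect standard error exceeds the fixed-effect one, which gives the less significant p-values noted above; when the estimates agree, the two models coincide.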
6.4 Missing heritability of myopia
In Study 1, the effect sizes of the observed association for AL were modest, and the proportion of variance in AL explained was about 1%. The odds ratio of the risk allele (major allele) at the implicated SNP (rs4373767) for high myopia was below 1.35, similar to those of other implicated loci reported in previous
GWAS22,265,266.
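As a rough guide to why a modest per-allele effect accounts for only a small share of trait variance, the variance explained by a single additive SNP under Hardy-Weinberg proportions is 2p(1 - p)β² divided by the trait variance, where p is the allele frequency and β the per-allele effect. The sketch below uses hypothetical values; they are not the estimates from Study 1.

```python
def variance_explained(maf, beta, trait_sd=1.0):
    """Fraction of trait variance explained by one additive SNP under HWE."""
    return 2.0 * maf * (1.0 - maf) * beta ** 2 / trait_sd ** 2


# Hypothetical SNP: allele frequency 0.30, effect of 0.15 trait standard deviations.
r2 = variance_explained(maf=0.30, beta=0.15)
print(f"Variance explained: {r2:.2%}")
```

Under these assumed values a single common variant explains just under 1% of the trait variance, which is far below heritability estimates for refraction.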
There are many possible reasons for such ‘missing heritability’ of ocular traits,
given heritability estimates as high as 88% for quantitative refraction.
One is gene-environment interaction. There is abundant evidence for
environmental impacts on the development of myopia. This influence is
strongly supported by
experimental animal models, which show that changes in visual experience can
generate signals that promote eye growth, leading to myopia126. In humans, the
continuous increase in the prevalence of myopia globally, especially in urban East
Asia in the last several decades, has suggested the roles of environmental factors,
such as levels of educational attainment, the extent of near work, and outdoor
activities110. Lifestyle changes may also alter the distribution of genetic effects in
some subgroups109. However, unless there is a priori knowledge that
particular environmental factors or genes interact with each other, and a reasonable
fraction of individuals experience the adverse exposure, a direct search for pairs
of interacting loci and/or interactions with environmental factors can be difficult.
Another possibility is that large-effect rare variants may contribute to the genetic and phenotypic variation419. Rare variants (MAF <5%) are generally
omitted in GWAS due to a lack of power. The presence of allelic heterogeneity
poses a challenge for rare variants in exome sequencing studies, as it is expected
that, in the same functional unit, different causal mutations of extremely low
frequencies could be carried by different individuals. The statistical analysis of
rare variants is fundamentally different from the association statistics used for testing
common variants. Challenges arise in several respects, including extremely low
power in single-marker tests, allelic heterogeneity, and heterogeneity of effects among
variants within the same region (gene)420. Developing practical computer
software to prioritise promising rare variants for further functional assays is
currently an active research area for statistical geneticists.
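The loss of power in single-marker tests under allelic heterogeneity, and the gain from pooling variants within a functional unit, can be illustrated with a simple collapsing (burden) test. This is a minimal sketch of one generic approach, assuming a one-sided Fisher exact test on carrier counts; it is not a method used in this thesis, and the function names and genotype data are hypothetical.

```python
from math import comb


def fisher_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher exact p-value for carrier enrichment in cases."""
    total_carriers = case_carriers + control_carriers
    n = n_cases + n_controls
    upper = min(total_carriers, n_cases)
    # Hypergeometric tail: probability of at least this many carriers in cases.
    tail = sum(
        comb(n_cases, x) * comb(n_controls, total_carriers - x)
        for x in range(case_carriers, upper + 1)
    )
    return tail / comb(n, total_carriers)


def carrier_count(genotypes):
    """Number of individuals carrying at least one rare allele in the gene."""
    return sum(1 for person in genotypes if any(g > 0 for g in person))


# Hypothetical rare-allele counts: rows are individuals, columns are four
# rare variants in one gene. Each affected carrier has a different variant.
cases = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
controls = [[0, 0, 0, 0] for _ in range(6)]

# Single-marker test for the first variant only.
single_case = sum(1 for person in cases if person[0] > 0)
single_control = sum(1 for person in controls if person[0] > 0)
p_single = fisher_one_sided(single_case, len(cases), single_control, len(controls))

# Collapsed (burden) test: carriers of any rare variant in the gene.
p_burden = fisher_one_sided(
    carrier_count(cases), len(cases), carrier_count(controls), len(controls)
)

print(f"single-marker p = {p_single:.3f}, burden p = {p_burden:.3f}")
```

Collapsing in this way assumes that the pooled variants act in the same direction; when effects within the region are heterogeneous, a simple burden statistic loses power, which is one of the challenges noted above.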
Genome-wide scans of structural variation in refractive errors, such as copy
number variations (CNVs; chromosomal deletions or duplications),
lag behind those for many complex diseases. Different computational algorithms have
been developed to generate CNV calls based on the currently available SNP chips
from GWAS; however, large variation in copy number detection among different
programs has been noted421. With genome-wide sequencing data becoming available in the near future, it is expected that genome-wide CNV studies of myopia will be reported.
6.5 Recombination variations and implications in genetic association studies
In Study 3 (Chapter 5), a signal processing approach (varRecM) was introduced to evaluate differences in two estimated recombination rate profiles.
To the best of our knowledge, there is currently no formal statistical metric for defining recombination rate variation without first defining the presence or absence of recombination hotspots.
Applying the metric to genome-wide recombination rates estimated from
HapMap and SGVP suggests that significant fine-scale differences exist in the recombination profiles of Europeans, Africans and East Asians. This observation is important for trans-ethnic studies, as it provides insight into the fine-mapping of functional polymorphisms in diverse populations. Assuming that a functional variant with a similar effect size and the same direction of effect is present across multiple populations, whether the association is detectable largely depends on the pattern of LD in the region. Recombination is one of the key forces that break down LD, and the rate of recombination has been widely used as a surrogate measure for LD in genetic association studies, for instance in SNP imputation algorithms60. Therefore, comparison of inter-population
recombination variation at fine scale, which is relevant to the extent of LD
variation in the same region, is of great value to studies that aim to identify the
genetic associations transferable across different populations. Regions exhibiting the largest recombination variation will be studied further for their relevance
to disease-associated regions and, subsequently, for the transferability of genetic
associations at these regions.
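The link between local recombination rate and the extent of LD can be made concrete with the standard deterministic expectation that the LD coefficient D decays by a factor (1 - c) per generation of random mating, where c is the recombination fraction between two loci. The sketch below ignores drift, mutation and selection, and is unrelated to the varRecM metric itself; the function names, recombination fractions and allele frequencies are hypothetical.

```python
def ld_after(d0, recomb_rate, generations):
    """LD coefficient after t generations: D_t = D_0 * (1 - c) ** t."""
    return d0 * (1.0 - recomb_rate) ** generations


def r_squared(d, p_a, p_b):
    """Squared correlation between two biallelic loci with frequencies p_a, p_b."""
    return d ** 2 / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


# Same starting LD and allele frequencies in two hypothetical populations
# that differ only in the local recombination fraction between the two SNPs.
d0, p_a, p_b, generations = 0.20, 0.40, 0.40, 1000

r2_low_recomb = r_squared(ld_after(d0, 0.0001, generations), p_a, p_b)
r2_high_recomb = r_squared(ld_after(d0, 0.001, generations), p_a, p_b)

print(f"r^2 with low recombination:  {r2_low_recomb:.3f}")
print(f"r^2 with high recombination: {r2_high_recomb:.3f}")
```

A tenfold difference in the local recombination fraction leaves the marker a far weaker proxy for the second locus in one population than in the other, which is why fine-scale recombination differences bear on the transferability of association signals.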
Very recently, the two largest meta-analyses of GWAS to date uncovered a total of 39 genetic loci implicated in myopia422,423. The association signals
in both studies were discovered primarily in samples of Caucasian
descent. In the CREAM GWAS on spherical equivalent, a number of these loci
were successfully replicated in Asian populations comprising Chinese,
Malays and Indians, whereas others were not. Since the causal variants are
highly likely to be in LD with these associated SNPs, variation in LD patterns
can hamper replication across populations. Further evaluation of inter-
population differences in the patterns of recombination (varRecM scores), or
similarly the extent of LD, will facilitate our understanding of the transferability
of these ‘myopia’ genetic loci in multiple populations.
This analysis also shows that the large variations in recombination profiles are
generally found in non-gene regions, an observation that appears to be concordant
with previous reports that recombination tends to be suppressed in genes93 even
though the biological mechanisms for this are still poorly understood424. An
interesting remaining question is why some hotspots are less likely to be shared
between different human populations. Our findings prioritise genomic regions for
further studies to provide insight into the biological basis of heterogeneous
recombination in different populations.
In summary, this dissertation demonstrates the value of integrating population genomics and medical genetics to carry out large-scale genetic association studies in multiple populations to elucidate the genetic architecture of refractive errors. The disease-predisposing genetic variation identified in this dissertation will shed light on the genetic pathways involved in the occurrence and development of refractive errors and will support further characterisation of the causal mutations responsible for the genetic aetiology.
7 Publications
This thesis is based on the following papers:
Fan, Q., Barathi, V.A., Cheng, C.Y., Zhou, X., Meguro, A., Nakata, I., Khor, C.C., Goh, L.K., Li, Y.J., Lim, W., Ho CEH, Hawthorne F., Zheng Y., Chua D., Inoko H., Yamashiro K., Ohno-Matsui K., Matsuo K., Matsuda F., Vithana E., Seielstad M., Mizuki N., Beuerman RW., E-Shyong Tai E-S., Yoshimura., Aung T., Young TL., Wong T-Y., Teo Y-Y.*, Saw S-M.* (2012). Genetic variants on chromosome 1q41 influence ocular axial length and high myopia. PLoS Genet 8, e1002753.
Fan, Q., Ong, R., Xu, H., Small K.S., Saw, S-M., Teo, Y-Y. Genome-wide comparison of estimated recombination rates between populations; under review (2013).
Fan, Q., Zhou, X., Khor, C.C., Cheng, C.Y., Goh, L.K., Sim, X., Tay, W.T., Li, Y.J., Ong, R.T., Suo, C., Cornes B., Ikram MK., Chia K-S., Seielstad M., Liu J., Vithana E., Young TL., Tai E-S., Wong T-Y.*, Aung T.*, Teo Y-Y.*, Saw S-M.* (2011). Genome-wide meta-analysis of five Asian cohorts identifies PDGFRA as a susceptibility locus for corneal astigmatism. PLoS Genet 7, e1002402.
Fan, Q., Teo, Y-Y., and Saw, S-M. (2011). Application of advanced statistics in ophthalmology. Invest Ophthalmol Vis Sci 52, 6059-6065.
and book chapter:
Li Y.J. & Fan Q. (2010). Statistical analysis of genome-wide association studies for myopia. In: Myopia: animal models to clinical trials. Singapore, World Scientific Publishing Co. Pte. Ltd, 215-235.
Relevant work:
Han, S.*, Chen, P.*, Fan, Q.*, Khor, C.C., Sim, X., Tay, W.T., Ong, R.T., Suo, C., Goh, L.K., Lavanya, R., Zheng, Y., Wu R., Seielstad M., Vithana E., Liu J., Chia K.S., Lee J.J., Tai, E-S., Wong, T-Y.*, Aung T.*, Teo Y-Y.*, and Saw S-M.* (2011). Association of variants in FRAP1 and PDGFRA with corneal curvature in Asian populations from Singapore. Hum Mol Genet 20, 3693-3698.
Li, Y.J., Goh, L.K., Khor, C.C., Fan, Q., Yu, M., Han, S., Sim, X., Ong, R.T., Wong, T.Y., Vithana, E.N., Yap E., Seielstad M., Wong T-Y., Tai E-S., Young TY., Saw S-M. (2011). Genome-wide association studies reveal genetic variants in CTNND2 for high myopia in Singapore Chinese. Ophthalmology 118, 368-375.
* Joint first/last authors
8 References
1. Morton, N.E. Sequential tests for the detection of linkage. Am J Hum Genet 7, 277-318 (1955). 2. Altshuler, D., Daly, M.J. & Lander, E.S. Genetic mapping in human disease. Science 322, 881-8 (2008). 3. McCarthy, M.I. et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 9, 356-69 (2008). 4. Hirschhorn, J.N. & Daly, M.J. Genome-wide association studies for common diseases and complex traits. Nat Rev Genet 6, 95-108 (2005). 5. Frazer, K.A. et al. A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851-61 (2007). 6. Anderson, C.A. et al. Data quality control in genetic case-control association studies. Nat Protoc 5, 1564-73 (2010). 7. Price, A.L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38, 904-9 (2006). 8. Patterson, N., Price, A.L. & Reich, D. Population structure and eigenanalysis. PLoS Genet 2, e190 (2006). 9. Balding, D.J. A tutorial on statistical methods for population association studies. Nat Rev Genet 7, 781-91 (2006). 10. Kerner, B., North, K.E. & Fallin, M.D. Use of longitudinal data in genetic studies in the genome-wide association studies era: summary of Group 14. Genet Epidemiol 33 Suppl 1, S93-8 (2009). 11. Rasmussen-Torvik, L.J. et al. Impact of repeated measures and sample selection on genome-wide association studies of fasting glucose. Genet Epidemiol (2010). 12. Spielman, R.S., McGinnis, R.E. & Ewens, W.J. Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet 52, 506-16 (1993). 13. Rabinowitz, D. & Laird, N. A unified approach to adjusting association tests for population admixture with arbitrary pedigree structure and arbitrary missing marker information. Hum Hered 50, 211-23 (2000). 14. Freedman, M.L. et al. Assessing the impact of population stratification on genetic association studies. 
Nat Genet 36, 388-93 (2004). 15. Laird, N.M. & Lange, C. Family-based designs in the age of large-scale gene- association studies. Nat Rev Genet 7, 385-94 (2006). 16. Chanock, S.J. et al. Replicating genotype-phenotype associations. Nature 447, 655-60 (2007). 17. Dewan, A. et al. HTRA1 promoter polymorphism in wet age-related macular degeneration. Science 314, 989-92 (2006). 18. Chen, W. et al. Genetic variants near TIMP3 and high-density lipoprotein- associated loci influence susceptibility to age-related macular degeneration. Proc Natl Acad Sci U S A 107, 7401-6 (2010). 19. Thorleifsson, G. et al. Common variants near CAV1 and CAV2 are associated with primary open-angle glaucoma. Nat Genet 42, 906-9 (2010).
20. Nakano, M. et al. Three susceptible loci associated with primary open-angle glaucoma identified by genome-wide association study in a Japanese population. Proc Natl Acad Sci U S A 106, 12838-42 (2009). 21. Lin, H.J. et al. Single-nucleotide polymorphisms in chromosome 3p14.1- 3p14.2 are associated with susceptibility of Type 2 diabetes with cataract. Mol Vis 16, 1206-14 (2010). 22. Li, Y.J. et al. Genome-wide association studies reveal genetic variants in CTNND2 for high myopia in Singapore Chinese. Ophthalmology 118, 368-75 (2011). 23. Nakanishi, H. et al. A genome-wide association analysis identified a novel susceptible locus for pathological myopia at 11q24.1. PLoS Genet 5, e1000660 (2009). 24. Charlesworth, J. et al. The path to open-angle glaucoma gene discovery: endophenotypic status of intraocular pressure, cup-to-disc ratio, and central corneal thickness. Invest Ophthalmol Vis Sci 51, 3509-14 (2010). 25. Lu, Y. et al. Common genetic variants near the Brittle Cornea Syndrome locus ZNF469 influence the blinding disease risk factor central corneal thickness. PLoS Genet 6, e1000947 (2010). 26. Vithana, E.N. et al. Collagen-related genes influence the glaucoma risk factor, central corneal thickness. Hum Mol Genet 20, 649-58 (2011). 27. Vitart, V. et al. New loci associated with central cornea thickness include COL5A1, AKAP13 and AVGR8. Hum Mol Genet 19, 4304-11 (2010). 28. Ramdas, W.D. et al. A genome-wide association study of optic disc parameters. PLoS Genet 6, e1000978 (2010). 29. Khor, C.C. et al. Genome-wide association studies in Asians confirm the involvement of ATOH7 and TGFBR3, and further identify CARD10 as a novel locus influencing optic disc area. Hum Mol Genet 20, 1864-72 (2011). 30. Plomin, R., Haworth, C.M. & Davis, O.S. Common disorders are quantitative traits. Nat Rev Genet 10, 872-8 (2009). 31. Evans, K. & Bird, A.C. The genetics of complex ophthalmic disorders. Br J Ophthalmol 80, 763-8 (1996). 32. Wiggs, J.L. 
Genetic etiologies of glaucoma. Arch Ophthalmol 125, 30-7 (2007). 33. Young, T.L., Metlapally, R. & Shay, A.E. Complex trait genetics of refractive error. Arch Ophthalmol 125, 38-48 (2007). 34. Solouki, A.M. et al. A genome-wide association study identifies a susceptibility locus for refractive errors and myopia at 15q14. Nat Genet 42, 897-901 (2010). 35. Hysi, P.G. et al. A genome-wide association study for myopia and refractive error identifies a susceptibility locus at 15q25. Nat Genet 42, 902-5 (2010). 36. Carbonaro, F. et al. Repeated measures of intraocular pressure result in higher heritability and greater power in genetic linkage studies. Invest Ophthalmol Vis Sci 50, 5115-9 (2009). 37. Ciner, E. et al. Genome-wide scan of African-American and white families for linkage to myopia. Am J Ophthalmol 147, 512-517 e2 (2009). 38. Ciner, E., Wojciechowski, R., Ibay, G., Bailey-Wilson, J.E. & Stambolian, D. Genomewide scan of ocular refraction in African-American families shows significant linkage to chromosome 7p15. Genet Epidemiol 32, 454-63 (2008). 39. Hammond, C.J., Andrew, T., Mak, Y.T. & Spector, T.D. A susceptibility locus for myopia in the normal population is linked to the PAX6 gene region on
chromosome 11: a genomewide scan of dizygotic twins. Am J Hum Genet 75, 294-304 (2004). 40. Klein, A.P. et al. Support for polygenic influences on ocular refractive error. Invest Ophthalmol Vis Sci 46, 442-6 (2005). 41. Murdoch, I.E., Morris, S.S. & Cousens, S.N. People and eyes: statistical approaches in ophthalmology. Br J Ophthalmol 82, 971-3 (1998). 42. Xu, X., Tian, L. & Wei, L.J. Combining dependent tests for linkage or association across multiple phenotypic traits. Biostatistics 4, 223-9 (2003). 43. Klei, L., Luca, D., Devlin, B. & Roeder, K. Pleiotropy and principal components of heritability combine to increase power for association analysis. Genet Epidemiol 32, 9-19 (2008). 44. Yang, F., Tang, Z. & Deng, H. Bivariate association analysis for quantitative traits using generalized estimation equation. J Genet Genomics 36, 733-43 (2009). 45. Jiang, C. & Zeng, Z.B. Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics 140, 1111-27 (1995). 46. O'Brien, P.C. Procedures for comparing samples with multiple endpoints. Biometrics 40, 1079-87 (1984). 47. Wei, L.J. & Johnson, W.E. Combining Dependent Tests with Incomplete Repeated Measurements. Biometrika 72, 359-364 (1985). 48. Yang, Q., Wu, H., Guo, C.Y. & Fox, C.S. Analyze multivariate phenotypes in genetic association studies by combining univariate association tests. Genet Epidemiol 34, 444-54 (2010). 49. Lange, C. et al. A family-based association test for repeatedly measured quantitative traits adjusting for unknown environmental and/or polygenic effects. Stat Appl Genet Mol Biol 3, Article17 (2004). 50. Liu, J., Pei, Y., Papasian, C.J. & Deng, H.W. Bivariate association analyses for the mixture of continuous and binary traits with the use of extended generalized estimating equations. Genet Epidemiol 33, 217-27 (2009). 51. Liu, Y.Z. et al. Powerful bivariate genome-wide association analyses suggest the SOX6 gene influencing both obesity and osteoporosis phenotypes in males. 
PLoS One 4, e6827 (2009). 52. Lange, C., Silverman, E.K., Xu, X., Weiss, S.T. & Laird, N.M. A multivariate family- based association test using generalized estimating equations: FBAT-GEE. Biostatistics 4, 195-206 (2003). 53. Hackett, C.A., Meyer, R.C. & Thomas, W.T. Multi-trait QTL mapping in barley using multivariate regression. Genet Res 77, 95-106 (2001). 54. Yu, K. et al. A partially linear tree-based regression model for multivariate outcomes. Biometrics 66, 89-96 (2010). 55. Lange, C., DeMeo, D., Silverman, E.K., Weiss, S.T. & Laird, N.M. PBAT: tools for family-based association studies. Am J Hum Genet 74, 367-9 (2004). 56. Goldstein, D.B. Common genetic variation and human traits. N Engl J Med 360, 1696-8 (2009). 57. Munafo, M.R. & Flint, J. Meta-analysis of genetic association studies. Trends Genet 20, 439-44 (2004). 58. de Bakker, P.I. et al. Practical aspects of imputation-driven meta-analysis of genome-wide association studies. Hum Mol Genet 17, R122-8 (2008).
59. Marchini, J. & Howie, B. Genotype imputation for genome-wide association studies. Nat Rev Genet 11, 499-511 (2010). 60. Marchini, J., Howie, B., Myers, S., McVean, G. & Donnelly, P. A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet 39, 906-13 (2007). 61. Huang, L. et al. Genotype-imputation accuracy across worldwide human populations. Am J Hum Genet 84, 235-50 (2009). 62. Guan, Y. & Stephens, M. Practical issues in imputation-based association mapping. PLoS Genet 4, e1000279 (2008). 63. Browning, S.R. & Browning, B.L. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Hum Genet 81, 1084-97 (2007). 64. Ellinghaus, D., Schreiber, S., Franke, A. & Nothnagel, M. Current software for genotype imputation. Hum Genomics 3, 371-80 (2009). 65. Mantel, N. & Haenszel, W. Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst 22, 719-48 (1959). 66. Han, B. & Eskin, E. Random-effects model aimed at discovering associations in meta-analysis of genome-wide association studies. Am J Hum Genet 88, 586-98 (2011). 67. Morris, A.P. Transethnic meta-analysis of genomewide association studies. Genet Epidemiol 35, 809-22 (2011). 68. Wang, X. et al. A statistical method for region-based meta-analysis of genome- wide association studies in genetically diverse populations. Eur J Hum Genet 20, 469-75 (2012). 69. Rosenberg, N.A. et al. Genome-wide association studies in diverse populations. Nat Rev Genet 11, 356-66 (2010). 70. Tang, M.X. et al. The APOE-epsilon4 allele and the risk of Alzheimer disease among African Americans, whites, and Hispanics. JAMA 279, 751-5 (1998). 71. Begum, F., Ghosh, D., Tseng, G.C. & Feingold, E. Comprehensive literature review and statistical considerations for GWAS meta-analysis. Nucleic Acids Res 40, 3777-84 (2012). 72. Lin, P.I., Vance, J.M., Pericak-Vance, M.A. 
& Martin, E.R. No gene is an island: the flip-flop phenomenon. Am J Hum Genet 80, 531-8 (2007). 73. Klein, R.J. et al. Complement factor H polymorphism in age-related macular degeneration. Science 308, 385-9 (2005). 74. Asimit, J. et al. An evaluation of different meta-analysis approaches in the presence of allelic heterogeneity. Eur J Hum Genet 20, 709-12 (2012). 75. Kiezun, A. et al. Exome sequencing and the genetic basis of complex traits. Nat Genet 44, 623-30 (2012). 76. Nachman, M.W. Single nucleotide polymorphisms and recombination rate in humans. Trends Genet 17, 481-5 (2001). 77. Jeffreys, A.J., Murray, J. & Neumann, R. High-resolution mapping of crossovers in human sperm defines a minisatellite-associated recombination hotspot. Mol Cell 2, 267-73 (1998). 78. Jeffreys, A.J., Kauppi, L. & Neumann, R. Intensely punctate meiotic recombination in the class II region of the major histocompatibility complex. Nat Genet 29, 217-22 (2001).
79. Lynn, A., Ashley, T. & Hassold, T. Variation in human meiotic recombination. Annu Rev Genomics Hum Genet 5, 317-49 (2004). 80. Jensen-Seaman, M.I. et al. Comparative recombination rates in the rat, mouse, and human genomes. Genome Res 14, 528-38 (2004). 81. Nachman, M.W. Variation in recombination rate across the genome: evidence and implications. Curr Opin Genet Dev 12, 657-63 (2002). 82. Winckler, W. et al. Comparison of fine-scale recombination rates in humans and chimpanzees. Science 308, 107-11 (2005). 83. Ptak, S.E. et al. Fine-scale recombination patterns differ between chimpanzees and humans. Nat Genet 37, 429-34 (2005). 84. Jeffreys, A.J. & Neumann, R. The rise and fall of a human recombination hot spot. Nat Genet 41, 625-9 (2009). 85. Coop, G. & Przeworski, M. An evolutionary view of human recombination. Nat Rev Genet 8, 23-34 (2007). 86. Jeffreys, A.J. & Neumann, R. Reciprocal crossover asymmetry and meiotic drive in a human recombination hot spot. Nat Genet 31, 267-71 (2002). 87. Risch, N. & Merikangas, K. The future of genetic studies of complex human diseases. Science 273, 1516-7 (1996). 88. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661-78 (2007). 89. Jorde, L.B., Watkins, W.S. & Bamshad, M.J. Population genomics: a bridge from evolutionary history to genetic medicine. Hum Mol Genet 10, 2199-207 (2001). 90. Serre, D., Nadon, R. & Hudson, T.J. Large-scale recombination rate patterns are conserved among human populations. Genome Res 15, 1547-52 (2005). 91. Matise, T.C. et al. A second-generation combined linkage physical map of the human genome. Genome Res 17, 1783-6 (2007). 92. Evans, D.M. & Cardon, L.R. A comparison of linkage disequilibrium patterns and estimated population recombination rates across multiple populations. Am J Hum Genet 76, 681-7 (2005). 93. Myers, S., Bottolo, L., Freeman, C., McVean, G. & Donnelly, P. 
A fine-scale map of recombination rates and hotspots across the human genome. Science 310, 321-4 (2005). 94. Jorgenson, E. et al. Ethnicity and human genetic linkage maps. Am J Hum Genet 76, 276-90 (2005). 95. Wegmann, D. et al. Recombination rates in admixed individuals identified by ancestry-based inference. Nat Genet 43, 847-53 (2011). 96. Crawford, D.C. et al. Evidence for substantial fine-scale variation in recombination rates across the human genome. Nat Genet 36, 700-6 (2004). 97. Fearnhead, P. & Smith, N.G. A novel method with improved power to detect recombination hotspots from polymorphism data reveals multiple hotspots in human genes. Am J Hum Genet 77, 781-94 (2005). 98. Hinch, A.G. et al. The landscape of recombination in African Americans. Nature 476, 170-5 (2011). 99. A haplotype map of the human genome. Nature 437, 1299-320 (2005). 100. Teo, Y.Y. et al. Singapore Genome Variation Project: a haplotype map of three Southeast Asian populations. Genome Res 19, 2154-62 (2009).
101. McVean, G.A. et al. The fine-scale structure of recombination rate variation in the human genome. Science 304, 581-4 (2004). 102. Clark, A.G., Wang, X. & Matise, T. Contrasting methods of quantifying fine structure of human recombination. Annu Rev Genomics Hum Genet 11, 45-64 (2010). 103. McVean, G. The structure of linkage disequilibrium around a selective sweep. Genetics 175, 1395-406 (2007). 104. Myers, S., Freeman, C., Auton, A., Donnelly, P. & McVean, G. A common sequence motif associated with recombination hot spots and genome instability in humans. Nat Genet (2008). 105. Myers, S. et al. Drive against hotspot motifs in primates implicates the PRDM9 gene in meiotic recombination. Science 327, 876-9 (2010). 106. Kong, A. et al. Fine-scale recombination rate differences between sexes, populations and individuals. Nature 467, 1099-103 (2010). 107. Laayouni, H. et al. Similarity in recombination rate estimates highly correlates with genetic differentiation in humans. PLoS One 6, e17913 (2011). 108. Khil, P.P. & Camerini-Otero, R.D. Genetic crossovers are predicted accurately by the computed human recombination map. PLoS Genet 6, e1000831 (2010). 109. Morgan, I.G., Ohno-Matsui, K. & Saw, S.M. Myopia. Lancet 379, 1739-48 (2012). 110. Wojciechowski, R. Nature and nurture: the complex genetics of myopia and refractive error. Clin Genet 79, 301-20 (2011). 111. Saw, S.M., Gazzard, G., Shih-Yen, E.C. & Chua, W.H. Myopia and associated pathological complications. Ophthalmic Physiol Opt 25, 381-91 (2005). 112. Wong, T.Y., Klein, B.E., Klein, R., Tomany, S.C. & Lee, K.E. Refractive errors and incident cataracts: the Beaver Dam Eye Study. Invest Ophthalmol Vis Sci 42, 1449-54 (2001). 113. Grosvenor, T.P. Primary care optometry : anomalies of refraction and binocular vision, xiii, 665 p. (Butterworth-Heinemann, Boston, 1996). 114. Saw, S.M. et al. Eye growth changes in myopic children in Singapore. Br J Ophthalmol 89, 1489-94 (2005). 115. Read, S.A., Collins, M.J. 
& Carney, L.G. A review of astigmatism and its possible genesis. Clin Exp Optom 90, 5-19 (2007). 116. Keller, P.R., Collins, M.J., Carney, L.G., Davis, B.A. & van Saarloos, P.P. The relation between corneal and total astigmatism. Optom Vis Sci 73, 86-91 (1996). 117. Wang, Y. et al. Prevalence and causes of amblyopia in a rural adult population of Chinese the Handan Eye Study. Ophthalmology 118, 279-83 (2011). 118. Dobson, V., Harvey, E.M., Clifford-Donaldson, C.E., Green, T.K. & Miller, J.M. Amblyopia in astigmatic infants and toddlers. Optom Vis Sci 87, 330-6 (2010). 119. Harvey, E.M., Dobson, V., Miller, J.M. & Clifford-Donaldson, C.E. Amblyopia in astigmatic children: patterns of deficits. Vision Res 47, 315-26 (2007). 120. Harvey, E.M. Development and treatment of astigmatism-related amblyopia. Optom Vis Sci 86, 634-9 (2009). 121. Kee, C.S., Hung, L.F., Qiao-Grider, Y., Ramamirtham, R. & Smith, E.L., 3rd. Astigmatism in monkeys with experimentally induced myopia or hyperopia. Optom Vis Sci 82, 248-60 (2005). 122. Tong, L. et al. Prevalence rates and epidemiological risk factors for astigmatism in Singapore school children. Optom Vis Sci 79, 606-13 (2002).
123. KhabazKhoob, M. et al. Keratometry measurements, corneal astigmatism and irregularity in a normal population: the Tehran Eye Study. Ophthalmic Physiol Opt 30, 800-5 (2010). 124. Heidary, G., Ying, G.S., Maguire, M.G. & Young, T.L. The association of astigmatism and spherical refractive error in a high myopia cohort. Optom Vis Sci 82, 244-7 (2005). 125. Hashemi, H., Hatef, E., Fotouhi, A. & Mohammad, K. Astigmatism and its determinants in the Tehran population: the Tehran eye study. Ophthalmic Epidemiol 12, 373-81 (2005). 126. Lawrence, M.S. & Azar, D.T. Myopia and models and mechanisms of refractive error control. Ophthalmol Clin North Am 15, 127-33 (2002). 127. Sivak, J.G. The role of the lens in refractive development of the eye: animal models of ametropia. Exp Eye Res 87, 3-8 (2008). 128. Wallman, J., Gottlieb, M.D., Rajaram, V. & Fugate-Wentzek, L.A. Local retinal regions control local eye growth and myopia. Science 237, 73-7 (1987). 129. Morgan, I. & Megaw, P. Using natural STOP growth signals to prevent excessive axial elongation and the development of myopia. Ann Acad Med Singapore 33, 16-20 (2004). 130. Stone, R.A., Lin, T., Laties, A.M. & Iuvone, P.M. Retinal dopamine and form- deprivation myopia. Proc Natl Acad Sci U S A 86, 704-6 (1989). 131. Barathi, V.A. & Beuerman, R.W. Molecular mechanisms of muscarinic receptors in mouse scleral fibroblasts: Prior to and after induction of experimental myopia with atropine treatment. Mol Vis 17, 680-92 (2011). 132. Norton, T.T. & Siegwart, J.T., Jr. Animal models of emmetropization: matching axial length to the focal plane. J Am Optom Assoc 66, 405-14 (1995). 133. Siegwart, J.T., Jr. & Norton, T.T. Binocular lens treatment in tree shrews: Effect of age and comparison of plus lens wear with recovery from minus lens-induced myopia. Exp Eye Res 91, 660-9 (2010). 134. McBrien, N.A., Jobling, A.I. & Gentle, A. Biomechanics of the sclera in myopia: extracellular and cellular factors. Optom Vis Sci 86, E23-30 (2009). 
135. Charman, W.N. & Radhakrishnan, H. Peripheral refraction and the development of refractive error: a review. Ophthalmic Physiol Opt 30, 321-38 (2010). 136. Smith, E.L., 3rd, Kee, C.S., Ramamirtham, R., Qiao-Grider, Y. & Hung, L.F. Peripheral vision can influence eye growth and refractive development in infant monkeys. Invest Ophthalmol Vis Sci 46, 3965-72 (2005). 137. Huang, J., Hung, L.F. & Smith, E.L., 3rd. Effects of foveal ablation on the pattern of peripheral refractive errors in normal and form-deprived infant rhesus monkeys (Macaca mulatta). Invest Ophthalmol Vis Sci 52, 6428-34 (2011). 138. Schippert, R. & Schaeffel, F. Peripheral defocus does not necessarily affect central refractive development. Vision Res 46, 3935-40 (2006). 139. Wong, T.Y., Foster, P.J., Johnson, G.J. & Seah, S.K. Education, socioeconomic status, and ocular dimensions in Chinese adults: the Tanjong Pagar Survey. Br J Ophthalmol 86, 963-8 (2002). 140. Teasdale, T.W. & Goldschmidt, E. Myopia and its relationship to education, intelligence and height. Preliminary results from an on-going study of Danish draftees. Acta Ophthalmol Suppl 185, 41-3 (1988).
141. Gwiazda, J., Deng, L., Dias, L. & Marsh-Tootle, W. Association of education and occupation with myopia in COMET parents. Optom Vis Sci 88, 1045-53 (2011). 142. Gwiazda, J.E. et al. Accommodation and related risk factors associated with myopia progression and their interaction with treatment in COMET children. Invest Ophthalmol Vis Sci 45, 2143-51 (2004). 143. McBrien, N.A., Moghaddam, H.O. & Reeder, A.P. Atropine reduces experimental myopia and eye enlargement via a nonaccommodative mechanism. Invest Ophthalmol Vis Sci 34, 205-15 (1993). 144. Saw, S.M. et al. Nearwork in early-onset myopia. Invest Ophthalmol Vis Sci 43, 332-9 (2002). 145. Ip, J.M. et al. Role of near work in myopia: findings in a sample of Australian school children. Invest Ophthalmol Vis Sci 49, 2903-10 (2008). 146. Jones, L.A. et al. Parental history of myopia, sports and outdoor activities, and future myopia. Invest Ophthalmol Vis Sci 48, 3524-32 (2007). 147. Rose, K.A. et al. Outdoor activity reduces the prevalence of myopia in children. Ophthalmology 115, 1279-85 (2008). 148. Cohen, Y., Belkin, M., Yehezkel, O., Solomon, A.S. & Polat, U. Dependency between light intensity and refractive development under light-dark cycles. Exp Eye Res 92, 40-6 (2011). 149. Ashby, R.S. & Schaeffel, F. The effect of bright light on lens compensation in chicks. Invest Ophthalmol Vis Sci 51, 5247-53 (2010). 150. Flitcroft, D.I. The complex interactions of retinal, optical and environmental factors in myopia aetiology. Prog Retin Eye Res (2012). 151. Mutti, D.O., Sholtz, R.I., Friedman, N.E. & Zadnik, K. Peripheral refraction and ocular shape in children. Invest Ophthalmol Vis Sci 41, 1022-30 (2000). 152. Lundstrom, L., Mira-Agudelo, A. & Artal, P. Peripheral optical errors and their change with accommodation differ between emmetropic and myopic eyes. J Vis 9, 17 1-11 (2009). 153. Mutti, D.O. et al. Refractive error, axial length, and relative peripheral refractive error before and after the onset of myopia. 
Invest Ophthalmol Vis Sci 48, 2510-9 (2007). 154. Mutti, D.O. et al. Relative peripheral refractive error and the risk of onset and progression of myopia in children. Invest Ophthalmol Vis Sci 52, 199-205 (2011). 155. Morgan, I. & Rose, K. How genetic is school myopia? Prog Retin Eye Res 24, 1-38 (2005). 156. Chen, C.Y. et al. Heritability and shared environment estimates for myopia and associated ocular biometric traits: the Genes in Myopia (GEM) family study. Hum Genet 121, 511-20 (2007). 157. Hammond, C.J., Snieder, H., Gilbert, C.E. & Spector, T.D. Genes and environment in refractive error: the twin eye study. Invest Ophthalmol Vis Sci 42, 1232-6 (2001). 158. Dirani, M. et al. Heritability of refractive error and ocular biometrics: the Genes in Myopia (GEM) twin study. Invest Ophthalmol Vis Sci 47, 4756-61 (2006). 159. Klein, A.P. et al. Heritability analysis of spherical equivalent, axial length, corneal curvature, and anterior chamber depth in the Beaver Dam Eye Study. Arch Ophthalmol 127, 649-55 (2009).
160. Lyhne, N., Sjolie, A.K., Kyvik, K.O. & Green, A. The importance of genes and environment for ocular refraction and its determiners: a population based study among 20-45 year old twins. Br J Ophthalmol 85, 1470-6 (2001). 161. He, M. et al. Heritability of anterior chamber depth as an intermediate phenotype of angle-closure in Chinese: the Guangzhou Twin Eye Study. Invest Ophthalmol Vis Sci 49, 81-6 (2008). 162. Lopes, M.C., Andrew, T., Carbonaro, F., Spector, T.D. & Hammond, C.J. Estimating heritability and shared environmental effects for refractive error in twin and family studies. Invest Ophthalmol Vis Sci 50, 126-31 (2009). 163. Vitart, V. et al. Heritabilities of ocular biometrical traits in two croatian isolates with extended pedigrees. Invest Ophthalmol Vis Sci 51, 737-43 (2010). 164. Peet, J.A., Cotch, M.F., Wojciechowski, R., Bailey-Wilson, J.E. & Stambolian, D. Heritability and familial aggregation of refractive error in the Old Order Amish. Invest Ophthalmol Vis Sci 48, 4002-6 (2007). 165. Farbrother, J.E., Kirov, G., Owen, M.J. & Guggenheim, J.A. Family aggregation of high myopia: estimation of the sibling recurrence risk ratio. Invest Ophthalmol Vis Sci 45, 2873-8 (2004). 166. Fotouhi, A. et al. Familial aggregation of myopia in the Tehran eye study: estimation of the sibling and parent offspring recurrence risk ratios. Br J Ophthalmol 91, 1440-4 (2007). 167. Wojciechowski, R. et al. Heritability of refractive error and familial aggregation of myopia in an elderly American population. Invest Ophthalmol Vis Sci 46, 1588-92 (2005). 168. Familial aggregation and prevalence of myopia in the Framingham Offspring Eye Study. The Framingham Offspring Eye Study Group. Arch Ophthalmol 114, 326- 32 (1996). 169. Zadnik, K., Satariano, W.A., Mutti, D.O., Sholtz, R.I. & Adams, A.J. The effect of parental history of myopia on children's eye size. JAMA 271, 1323-7 (1994). 170. Lam, D.S. et al. 
The effect of parental history of myopia on children's eye size and growth: results of a longitudinal study. Invest Ophthalmol Vis Sci 49, 873-6 (2008). 171. Lee, K.E., Klein, B.E., Klein, R. & Fine, J.P. Aggregation of refractive error and 5-year changes in refractive error among families in the Beaver Dam Eye Study. Arch Ophthalmol 119, 1679-85 (2001). 172. Ashton, G.C. Segregation analysis of ocular refraction and myopia. Hum Hered 35, 232-9 (1985). 173. Parssinen, O. et al. Heritability of spherical equivalent: a population-based twin study among 63- to 76-year-old female twins. Ophthalmology 117, 1908-11 (2010). 174. Sanfilippo, P.G., Hewitt, A.W., Hammond, C.J. & Mackey, D.A. The heritability of ocular traits. Surv Ophthalmol 55, 561-83 (2010). 175. Paluru, P.C., Nallasamy, S., Devoto, M., Rappaport, E.F. & Young, T.L. Identification of a novel locus on 2q for autosomal dominant high-grade myopia. Invest Ophthalmol Vis Sci 46, 2300-7 (2005). 176. Zhang, Q. et al. A new locus for autosomal dominant high myopia maps to 4q22-q27 between D4S1578 and D4S1612. Mol Vis 11, 554-60 (2005).
177. Lam, C.Y. et al. A genome-wide scan maps a novel high myopia locus to 5p15. Invest Ophthalmol Vis Sci 49, 3768-78 (2008). 178. Ma, J.H. et al. Identification of a locus for autosomal dominant high myopia on chromosome 5p13.3-p15.1 in a Chinese family. Mol Vis 16, 2043-54 (2010). 179. Paget, S. et al. Linkage analysis of high myopia susceptibility locus in 26 families. Mol Vis 14, 2566-74 (2008). 180. Naiglin, L. et al. A genome wide scan for familial high myopia suggests a novel locus on chromosome 7q36. J Med Genet 39, 118-24 (2002). 181. Nallasamy, S. et al. Genetic linkage study of high-grade myopia in a Hutterite population from South Dakota. Mol Vis 13, 229-36 (2007). 182. Young, T.L. et al. A second locus for familial high myopia maps to chromosome 12q. Am J Hum Genet 63, 1419-24 (1998). 183. Yang, Z., Xiao, X., Li, S. & Zhang, Q. Clinical and linkage study on a consanguineous Chinese family with autosomal recessive high myopia. Mol Vis 15, 312-8 (2009). 184. Paluru, P. et al. New locus for autosomal dominant high myopia maps to the long arm of chromosome 17. Invest Ophthalmol Vis Sci 44, 1830-6 (2003). 185. Young, T.L. et al. Evidence that a locus for familial high myopia maps to chromosome 18p. Am J Hum Genet 63, 109-19 (1998). 186. Zhang, Q. et al. Novel locus for X linked recessive high myopia maps to Xq23-q25 but outside MYP1. J Med Genet 43, e20 (2006). 187. Schwartz, M., Haim, M. & Skarsholm, D. X-linked myopia: Bornholm eye disease. Linkage to DNA markers on the distal part of Xq. Clin Genet 38, 281-6 (1990). 188. Klein, A.P. et al. Linkage analysis of quantitative refraction and refractive errors in the beaver dam eye study. Invest Ophthalmol Vis Sci 52, 5220-5 (2011). 189. Farbrother, J.E. et al. Linkage analysis of the genetic loci for high myopia on 18p, 12q, and 17q in 51 U.K. families. Invest Ophthalmol Vis Sci 45, 2879-85 (2004). 190. Lam, D.S. et al. TGFbeta-induced factor: a candidate gene for high myopia. 
Invest Ophthalmol Vis Sci 44, 1012-5 (2003). 191. Scavello, G.S., Jr. et al. Genomic structure and organization of the high grade Myopia-2 locus (MYP2) critical region: mutation screening of 9 positional candidate genes. Mol Vis 11, 97-110 (2005). 192. Zhang, Q., Li, S., Xiao, X., Jia, X. & Guo, X. Confirmation of a genetic locus for X-linked recessive high myopia outside MYP1. J Hum Genet 52, 469-72 (2007). 193. Mutti, D.O. et al. Genetic loci for pathological myopia are not associated with juvenile myopia. Am J Med Genet 112, 355-60 (2002). 194. Ibay, G. et al. Candidate high myopia loci on chromosomes 18p and 12q do not play a major role in susceptibility to common myopia. BMC Med Genet 5, 20 (2004). 195. Stambolian, D. et al. Genomewide linkage scan for myopia susceptibility loci among Ashkenazi Jewish families shows evidence of linkage on chromosome 22q12. Am J Hum Genet 75, 448-59 (2004). 196. Stambolian, D. et al. Genome-wide scan of additional Jewish families confirms linkage of a myopia susceptibility locus to chromosome 22q12. Mol Vis 12, 1499-505 (2006).
197. Klein, A.P. et al. Confirmation of linkage to ocular refraction on chromosome 22q and identification of a novel linkage region on 1q. Arch Ophthalmol 125, 80- 5 (2007). 198. Wojciechowski, R. et al. Genomewide scan in Ashkenazi Jewish families demonstrates evidence of linkage of ocular refraction to a QTL on chromosome 1p36. Hum Genet 119, 389-99 (2006). 199. Stambolian, D. et al. Genome-wide scan for myopia in the Old Order Amish. Am J Ophthalmol 140, 469-76 (2005). 200. Wojciechowski, R. et al. Genomewide linkage scans for ocular refraction and meta-analysis of four populations in the Myopia Family Study. Invest Ophthalmol Vis Sci 50, 2024-32 (2009). 201. Abbott, D. et al. An international collaborative family-based whole genome quantitative trait linkage scan for myopic refractive error. Mol Vis 18, 720-9 (2012). 202. Biino, G. et al. Ocular refraction: heritability and genome-wide search for eye morphometry traits in an isolated Sardinian population. Hum Genet 116, 152-9 (2005). 203. Zhu, G. et al. Genetic dissection of myopia: evidence for linkage of ocular axial length to chromosome 5q. Ophthalmology 115, 1053-1057 e2 (2008). 204. Andrew, T. et al. Identification and replication of three novel myopia common susceptibility gene loci on chromosome 3q26 using linkage and linkage disequilibrium mapping. PLoS Genet 4, e1000220 (2008). 205. Mordechai, S. et al. High myopia caused by a mutation in LEPREL1, encoding prolyl 3-hydroxylase 2. Am J Hum Genet 89, 438-45 (2011). 206. Simpson, C.L., Wojciechowski, R., Ibay, G., Stambolian, D. & Bailey-Wilson, J.E. Dissecting the genetic heterogeneity of myopia susceptibility in an Ashkenazi Jewish population using ordered subset analysis. Mol Vis 17, 1641-51 (2011). 207. Lam, D.S. et al. Familial high myopia linkage to chromosome 18p. Ophthalmologica 217, 115-8 (2003). 208. Ratnamala, U. et al. 
Refinement of the X-linked nonsyndromic high-grade myopia locus MYP1 on Xq28 and exclusion of 13 known positional candidate genes by direct sequencing. Invest Ophthalmol Vis Sci 52, 6814-9 (2011). 209. Inamori, Y. et al. The COL1A1 gene and high myopia susceptibility in Japanese. Hum Genet 122, 151-7 (2007). 210. Mutti, D.O. et al. Candidate gene and locus analysis of myopia. Mol Vis 13, 1012-9 (2007). 211. Metlapally, R. et al. COL1A1 and COL2A1 genes and myopia susceptibility: evidence of association and suggestive linkage to the COL2A1 locus. Invest Ophthalmol Vis Sci 50, 4080-6 (2009). 212. Zhang, D. et al. An association study of the COL1A1 gene and high myopia in a Han Chinese population. Mol Vis 17, 3379-83 (2011). 213. Wang, J. et al. High myopia is not associated with single nucleotide polymorphisms in the COL2A1 gene in the Chinese population. Mol Med Report 5, 133-7 (2012). 214. Rada, J.A., Shelton, S. & Norton, T.T. The sclera and myopia. Exp Eye Res 82, 185-200 (2006).
215. Wojciechowski, R., Bailey-Wilson, J.E. & Stambolian, D. Association of matrix metalloproteinase gene polymorphisms with refractive error in Amish and Ashkenazi families. Invest Ophthalmol Vis Sci 51, 4989-95 (2010). 216. Hall, N.F., Gale, C.R., Ye, S. & Martyn, C.N. Myopia and polymorphisms in genes for matrix metalloproteinases. Invest Ophthalmol Vis Sci 50, 2632-6 (2009). 217. Nakanishi, H. et al. Single-nucleotide polymorphisms in the promoter region of matrix metalloproteinase-1, -2, and -3 in Japanese with high myopia. Invest Ophthalmol Vis Sci 51, 4432-6 (2010). 218. Liang, C.L. et al. Evaluation of MMP3 and TIMP1 as candidate genes for high myopia in young Taiwanese men. Am J Ophthalmol 142, 518-20 (2006). 219. Han, W., Yap, M.K., Wang, J. & Yip, S.P. Family-based association analysis of hepatocyte growth factor (HGF) gene polymorphisms in high myopia. Invest Ophthalmol Vis Sci 47, 2291-9 (2006). 220. Yanovitch, T. et al. Hepatocyte growth factor and myopia: genetic association analyses in a Caucasian population. Mol Vis 15, 1028-35 (2009). 221. Veerappan, S. et al. Role of the hepatocyte growth factor gene in refractive error. Ophthalmology 117, 239-45 e1-2 (2010). 222. Jobling, A.I., Wan, R., Gentle, A., Bui, B.V. & McBrien, N.A. Retinal and choroidal TGF-beta in the tree shrew model of myopia: isoform expression, activation and effects on function. Exp Eye Res 88, 458-66 (2009). 223. Lin, H.J. et al. The TGFbeta1 gene codon 10 polymorphism contributes to the genetic predisposition to high myopia. Mol Vis 12, 698-703 (2006). 224. Zha, Y. et al. TGFB1 as a susceptibility gene for high myopia: a replication study with new findings. Arch Ophthalmol 127, 541-8 (2009). 225. Khor, C.C. et al. Support for TGFB1 as a susceptibility gene for high myopia in individuals of Chinese descent. Arch Ophthalmol 128, 1081-4 (2010). 226. Lin, H.J. et al. Sclera-related gene polymorphisms in high myopia. Mol Vis 15, 1655-63 (2009). 227. 
Cordain, L., Eaton, S.B., Brand Miller, J., Lindeberg, S. & Jensen, C. An evolutionary analysis of the aetiology and pathogenesis of juvenile-onset myopia. Acta Ophthalmol Scand 80, 125-35 (2002). 228. Metlapally, R. et al. Genetic association of insulin-like growth factor-1 polymorphisms with high-grade myopia in an international family cohort. Invest Ophthalmol Vis Sci 51, 4476-9 (2010). 229. Mak, J.Y., Yap, M.K., Fung, W.Y., Ng, P.W. & Yip, S.P. Association of IGF1 gene haplotypes with high myopia in Chinese adults. Arch Ophthalmol 130, 209-16 (2012). 230. Zhuang, W. et al. Association of insulin-like growth factor-1 polymorphisms with high myopia in the Chinese population. Mol Vis 18, 634-44 (2012). 231. Rada, J.A., Achen, V.R., Penugonda, S., Schmidt, R.W. & Mount, B.A. Proteoglycan composition in the human sclera during growth and aging. Invest Ophthalmol Vis Sci 41, 1639-48 (2000). 232. Jiang, B. et al. PAX6 haplotypes are associated with high myopia in Han Chinese. PLoS One 6, e19587 (2011). 233. Tsai, Y.Y. et al. A PAX6 gene polymorphism is associated with genetic predisposition to extreme myopia. Eye (Lond) 22, 576-81 (2008).
234. Han, W. et al. Association of PAX6 polymorphisms with high myopia in Han Chinese nuclear families. Invest Ophthalmol Vis Sci 50, 47-56 (2009). 235. Leung, Y.F., Tam, P.O., Baum, L., Lam, D.S. & Pang, C.C. TIGR/MYOC proximal promoter GT-repeat polymorphism is not associated with myopia. Hum Mutat 16, 533 (2000). 236. Tang, W.C. et al. Linkage and association of myocilin (MYOC) polymorphisms with high myopia in a Chinese population. Mol Vis 13, 534-44 (2007). 237. Zayats, T. et al. Myocilin polymorphisms and high myopia in subjects of European origin. Mol Vis 15, 213-22 (2009). 238. Simpson, C.L. et al. The Roles of PAX6 and SOX2 in Myopia: lessons from the 1958 British Birth Cohort. Invest Ophthalmol Vis Sci 48, 4421-5 (2007). 239. An, J. et al. The FGF2 gene in a myopia animal model and human subjects. Mol Vis 18, 471-8 (2012). 240. Lu, B. et al. Replication study supports CTNND2 as a susceptibility gene for high myopia. Invest Ophthalmol Vis Sci 52, 8258-61 (2011). 241. Khor, C.C. et al. cMET and refractive error progression in children. Ophthalmology 116, 1469-74, 1474 e1 (2009). 242. Schache, M., Chen, C.Y., Dirani, M. & Baird, P.N. The hepatocyte growth factor receptor (MET) gene is not associated with refractive error and ocular biometrics in a Caucasian population. Mol Vis 15, 2599-605 (2009). 243. Wang, P. et al. High myopia is not associated with the SNPs in the TGIF, lumican, TGFB1, and HGF genes. Invest Ophthalmol Vis Sci 50, 1546-51 (2009). 244. Chen, J.H. et al. Endophenotyping reveals differential phenotype-genotype correlations between myopia-associated polymorphisms and eye biometric parameters. Mol Vis 18, 765-78 (2012). 245. Lin, H.J. et al. Muscarinic acetylcholine receptor 1 gene polymorphisms associated with high myopia. Mol Vis 15, 1774-80 (2009). 246. Wang, I.J. et al. The association of single nucleotide polymorphisms in the 5'- regulatory region of the lumican gene with susceptibility to high myopia in Taiwan. Mol Vis 12, 852-7 (2006). 
247. Chen, Z.T., Wang, I.J., Shih, Y.F. & Lin, L.L. The association of haplotype at the lumican gene with high myopia susceptibility in Taiwanese patients. Ophthalmology 116, 1920-7 (2009). 248. Majava, M. et al. Novel mutations in the small leucine-rich repeat protein/proteoglycan (SLRP) genes in high myopia. Hum Mutat 28, 336-44 (2007). 249. Lin, H.J. et al. Association of the lumican gene functional 3'-UTR polymorphism with high myopia. Invest Ophthalmol Vis Sci 51, 96-102 (2010). 250. Rydzanicz, M. et al. IGF-1 gene polymorphisms in Polish families with high-grade myopia. Mol Vis 17, 2428-39 (2011). 251. Hayashi, H. et al. Association of 15q14 and 15q25 with High Myopia in Japanese. Invest Ophthalmol Vis Sci (2011). 252. Gao, Y. et al. Common variants in chromosome 4q25 are associated with myopia in Chinese adults. Ophthalmic Physiol Opt 32, 68-73 (2012). 253. Liang, C.L. et al. Systematic assessment of the tagging polymorphisms of the COL1A1 gene for high myopia. J Hum Genet 52, 374-7 (2007).
254. Nakanishi, H. et al. Absence of association between COL1A1 polymorphisms and high myopia in the Japanese population. Invest Ophthalmol Vis Sci 50, 544-50 (2009). 255. Veerappan, S. et al. The retinoic acid receptor alpha (RARA) gene is not associated with myopia, hypermetropia, and ocular biometric measures. Mol Vis 15, 1390-7 (2009). 256. Scavello, G.S., Paluru, P.C., Ganter, W.R. & Young, T.L. Sequence variants in the transforming growth beta-induced factor (TGIF) gene are not associated with high myopia. Invest Ophthalmol Vis Sci 45, 2091-7 (2004). 257. Hasumi, Y. et al. Analysis of single nucleotide polymorphisms at 13 loci within the transforming growth factor-induced factor gene shows no association with high myopia in Japanese subjects. Immunogenetics 58, 947-53 (2006). 258. Pertile, K.K. et al. Assessment of TGIF as a candidate gene for myopia. Invest Ophthalmol Vis Sci 49, 49-54 (2008). 259. Hayashi, T., Inoko, H., Nishizaki, R., Ohno, S. & Mizuki, N. Exclusion of transforming growth factor-beta1 as a candidate gene for myopia in the Japanese. Jpn J Ophthalmol 51, 96-9 (2007). 260. Liu, H.P. et al. A novel genetic variant of BMP2K contributes to high myopia. J Clin Lab Anal 23, 362-7 (2009). 261. Nishizaki, R. et al. New susceptibility locus for high myopia is linked to the uromodulin-like 1 (UMODL1) gene region on chromosome 21q22.3. Eye (Lond) 23, 222-9 (2009). 262. Neale, B.M. et al. Genome-wide association study of advanced age-related macular degeneration identifies a role of the hepatic lipase gene (LIPC). Proc Natl Acad Sci U S A 107, 7395-400 (2010). 263. Verhoeven, V.J. et al. Large scale international replication and meta-analysis study confirms association of the 15q14 locus with myopia. The CREAM consortium. Hum Genet (2012). 264. Wang, Q. et al. Replication study of significant single nucleotide polymorphisms associated with myopia from two genome-wide association studies. Mol Vis 17, 3290-9 (2011). 265. Shi, Y. et al. 
Genetic variants at 13q12.12 are associated with high myopia in the Han Chinese population. Am J Hum Genet 88, 805-13 (2011). 266. Li, Z. et al. A genome-wide association study reveals association between common variants in an intergenic region of 4q25 and high-grade myopia in the Chinese Han population. Hum Mol Genet 20, 2861-8 (2011). 267. Shi, Y. et al. Exome sequencing identifies ZNF644 mutations in high myopia. PLoS Genet 7, e1002084 (2011). 268. Tran-Viet, K.N. et al. Study of a US cohort supports the role of ZNF644 and high-grade myopia susceptibility. Mol Vis 18, 937-44 (2012). 269. Walline, J.J. et al. Interventions to slow progression of myopia in children. Cochrane Database Syst Rev, CD004916 (2011). 270. Ostrow, G. & Kirkeby, L. Update on myopia and myopic progression in children. Int Ophthalmol Clin 50, 87-93 (2010).
271. Gwiazda, J. et al. A randomized clinical trial of progressive addition lenses versus single vision lenses on the progression of myopia in children. Invest Ophthalmol Vis Sci 44, 1492-500 (2003). 272. Parssinen, O., Hemminki, E. & Klemetti, A. Effect of spectacle use and accommodation on myopic progression: final results of a three-year randomised clinical trial among schoolchildren. Br J Ophthalmol 73, 547-51 (1989). 273. Chung, K., Mohidin, N. & O'Leary, D.J. Undercorrection of myopia enhances rather than inhibits myopia progression. Vision Res 42, 2555-9 (2002). 274. Shen, J., Clark, C.A., Soni, P.S. & Thibos, L.N. Peripheral refraction with and without contact lens correction. Optom Vis Sci 87, 642-55 (2010). 275. Queiros, A., Gonzalez-Meijome, J.M., Jorge, J., Villa-Collar, C. & Gutierrez, A.R. Peripheral refraction in myopic patients after orthokeratology. Optom Vis Sci 87, 323-9 (2010). 276. Kakita, T., Hiraoka, T. & Oshika, T. Influence of overnight orthokeratology on axial elongation in childhood myopia. Invest Ophthalmol Vis Sci 52, 2170-4 (2011). 277. Cho, P., Cheung, S.W. & Edwards, M. The longitudinal orthokeratology research in children (LORIC) in Hong Kong: a pilot study on refractive changes and myopic control. Curr Eye Res 30, 71-80 (2005). 278. Walline, J.J., Jones, L.A. & Sinnott, L.T. Corneal reshaping and myopia progression. Br J Ophthalmol 93, 1181-5 (2009). 279. Katz, J. et al. A randomized trial of rigid gas permeable contact lenses to reduce progression of children's myopia. Am J Ophthalmol 136, 82-90 (2003). 280. Walline, J.J. et al. The current state of corneal reshaping. Eye Contact Lens 31, 209-14 (2005). 281. Tong, L. et al. Atropine for the treatment of childhood myopia: effect on myopia progression after cessation of atropine. Ophthalmology 116, 572-9 (2009). 282. Siatkowski, R.M. et al. 
Two-year multicenter, randomized, double-masked, placebo-controlled, parallel safety and efficacy study of 2% pirenzepine ophthalmic gel in children with myopia. J AAPOS 12, 332-9 (2008). 283. Pan, C.W., Ramamurthy, D. & Saw, S.M. Worldwide prevalence and risk factors for myopia. Ophthalmic Physiol Opt 32, 3-16 (2012). 284. Wong, T.Y. et al. Prevalence and risk factors for refractive errors in adult Chinese in Singapore. Invest Ophthalmol Vis Sci 41, 2486-94 (2000). 285. McBrien, N.A. & Gentle, A. Role of the sclera in the development and pathological complications of myopia. Prog Retin Eye Res 22, 307-38 (2003). 286. Saw, S.M., Katz, J., Schein, O.D., Chew, S.J. & Chan, T.K. Epidemiology of myopia. Epidemiol Rev 18, 175-87 (1996). 287. Dirani, M., Shekar, S.N. & Baird, P.N. Evidence of shared genes in refraction and axial length: the Genes in Myopia (GEM) twin study. Invest Ophthalmol Vis Sci 49, 4336-9 (2008). 288. Lavanya, R. et al. Methodology of the Singapore Indian Chinese Cohort (SICC) eye study: quantifying ethnic variations in the epidemiology of eye diseases in Asians. Ophthalmic Epidemiol 16, 325-36 (2009). 289. Saw, S.M. et al. A cohort study of incident myopia in Singaporean children. Invest Ophthalmol Vis Sci 47, 1839-44 (2006).
290. Fan, Q. et al. Genome-wide meta-analysis of five Asian cohorts identifies PDGFRA as a susceptibility locus for corneal astigmatism. PLoS Genet 7, e1002402 (2011). 291. Foong, A.W. et al. Rationale and methodology for a population-based study of eye diseases in Malay people: The Singapore Malay eye study (SiMES). Ophthalmic Epidemiol 14, 25-35 (2007). 292. Grosvenor, T.P. Primary care optometry, xiii, 510 p. (Butterworth- Heinemann/Elsevier, St. Louis, Mo., 2007). 293. Schork, N.J., Nath, S.K., Fallin, D. & Chakravarti, A. Linkage disequilibrium analysis of biallelic DNA markers, human quantitative trait loci, and threshold- defined case and control subjects. Am J Hum Genet 67, 1208-18 (2000). 294. Saw, S.M. et al. Prevalence and risk factors for refractive errors in the Singapore Malay Eye Survey. Ophthalmology 115, 1713-9 (2008). 295. Fan, D.S. et al. Prevalence, incidence, and progression of myopia of school children in Hong Kong. Invest Ophthalmol Vis Sci 45, 1071-5 (2004). 296. Sim, X. et al. Transferability of type 2 diabetes implicated loci in multi-ethnic cohorts from Southeast Asia. PLoS Genet 7, e1001363 (2011). 297. Purcell, S. et al. PLINK: a tool set for whole-genome association and population- based linkage analyses. Am J Hum Genet 81, 559-75 (2007). 298. Fan, Q., Teo, Y.Y. & Saw, S.M. Application of advanced statistics in ophthalmology. Invest Ophthalmol Vis Sci 52, 6059-65 (2011). 299. Brink, N., Szamel, M., Young, A.R., Wittern, K.P. & Bergemann, J. Comparative quantification of IL-1beta, IL-10, IL-10r, TNFalpha and IL-7 mRNA levels in UV- irradiated human skin in vivo. Inflamm Res 49, 290-6 (2000). 300. Rozen, S. & Skaletsky, H. Primer3 on the WWW for general users and for biologist programmers. Methods Mol Biol 132, 365-86 (2000). 301. Schaeffel, F., Burkhardt, E., Howland, H.C. & Williams, R.W. Measurement of refractive state and deprivation myopia in two strains of mice. Optom Vis Sci 81, 99-110 (2004). 302. 
Barathi, V.A., Boopathi, V.G., Yap, E.P. & Beuerman, R.W. Two models of experimental myopia in the mouse. Vision Res 48, 904-16 (2008). 303. Leung, K.W. et al. Expression of ZnT and ZIP zinc transporters in the human RPE and their regulation by neurotrophic factors. Invest Ophthalmol Vis Sci 49, 1221-31 (2008). 304. Liang, J., Song, W., Tromp, G., Kolattukudy, P.E. & Fu, M. Genome-wide survey and expression profiling of CCCH-zinc finger family reveals a functional module in macrophage activation. PLoS One 3, e2880 (2008). 305. Poliseno, L. et al. A coding-independent function of gene and pseudogene mRNAs regulates tumour biology. Nature 465, 1033-8 (2010). 306. Salmena, L., Poliseno, L., Tay, Y., Kats, L. & Pandolfi, P.P. A ceRNA Hypothesis: The Rosetta Stone of a Hidden RNA Language? Cell 146, 353-8 (2011). 307. D'Errico, I., Gadaleta, G. & Saccone, C. Pseudogenes in metazoa: origin and features. Brief Funct Genomic Proteomic 3, 157-67 (2004). 308. Shi, Y. et al. Exome sequencing identifies ZNF644 mutations in high myopia. PLoS Genet 7, e1002084 (2011).
309. Schippert, R., Burkhardt, E., Feldkaemper, M. & Schaeffel, F. Relative axial myopia in Egr-1 (ZENK) knockout mice. Invest Ophthalmol Vis Sci 48, 11-7 (2007). 310. Laity, J.H., Lee, B.M. & Wright, P.E. Zinc finger proteins: new insights into structural and functional diversity. Curr Opin Struct Biol 11, 39-46 (2001). 311. Fischer, A.J., McGuire, J.J., Schaeffel, F. & Stell, W.K. Light- and focus-dependent expression of the transcription factor ZENK in the chick retina. Nat Neurosci 2, 706-12 (1999). 312. Bitzer, M. & Schaeffel, F. Defocus-induced changes in ZENK expression in the chicken retina. Invest Ophthalmol Vis Sci 43, 246-52 (2002). 313. Simon, P., Feldkaemper, M., Bitzer, M., Ohngemach, S. & Schaeffel, F. Early transcriptional changes of retinal and choroidal TGFbeta-2, RALDH-2, and ZENK following imposed positive and negative defocus in chickens. Mol Vis 10, 588-97 (2004). 314. Liu, C., Adamson, E. & Mercola, D. Transcription factor EGR-1 suppresses the growth and transformation of human HT-1080 fibrosarcoma cells by induction of transforming growth factor beta 1. Proc Natl Acad Sci U S A 93, 11831-6 (1996). 315. Baron, V., Adamson, E.D., Calogero, A., Ragona, G. & Mercola, D. The transcription factor Egr1 is a direct regulator of multiple tumor suppressors including TGFbeta1, PTEN, p53, and fibronectin. Cancer Gene Ther 13, 115-24 (2006). 316. Seve, M., Chimienti, F., Devergnas, S. & Favier, A. In silico identification and expression of SLC30 family genes: an expressed sequence tag data mining strategy for the characterization of zinc transporters' tissue expression. BMC Genomics 5, 32 (2004). 317. van Leeuwen, R. et al. Dietary intake of antioxidants and risk of age-related macular degeneration. JAMA 294, 3101-7 (2005). 318. Ugarte, M. & Osborne, N.N. Zinc in the retina. Prog Neurobiol 64, 219-49 (2001). 319. Huibi, X., Kaixun, H., Qiuhua, G., Yushan, Z. & Xiuxian, H. Prevention of axial elongation in myopia by the trace element zinc. 
Biol Trace Elem Res 79, 39-47 (2001). 320. Steinberg, G.R., Kemp, B.E. & Watt, M.J. Adipocyte triglyceride lipase expression in human obesity. Am J Physiol Endocrinol Metab 293, E958-64 (2007). 321. Heid, I.M. et al. Meta-analysis identifies 13 new loci associated with waist-hip ratio and reveals sexual dimorphism in the genetic basis of fat distribution. Nat Genet 42, 949-60 (2010). 322. Lindgren, C.M. et al. Genome-wide association scan meta-analysis identifies three Loci influencing adiposity and fat distribution. PLoS Genet 5, e1000508 (2009). 323. Cordain, L., Eades, M.R. & Eades, M.D. Hyperinsulinemic diseases of civilization: more than just Syndrome X. Comp Biochem Physiol A Mol Integr Physiol 136, 95-112 (2003). 324. Lim, L.S. et al. Dietary factors, myopia, and axial dimensions in children. Ophthalmology 117, 993-997 e4 (2010).
325. Gao, H., Frost, M.R., Siegwart, J.T., Jr. & Norton, T.T. Patterns of mRNA and protein expression during minus-lens compensation and recovery in tree shrew sclera. Mol Vis 17, 903-19 (2011). 326. Wong, T.Y., Foster, P.J., Johnson, G.J. & Seah, S.K. Refractive errors, axial ocular dimensions, and age-related cataracts: the Tanjong Pagar survey. Invest Ophthalmol Vis Sci 44, 1479-85 (2003). 327. Wu, R. et al. Smoking, socioeconomic factors, and age-related cataract: The Singapore Malay Eye study. Arch Ophthalmol 128, 1029-35 (2010). 328. Tokoro, T. On the definition of pathologic myopia in group studies. Acta Ophthalmol Suppl 185, 107-8 (1988). 329. Dirani, M., Shekar, S.N. & Baird, P.N. Adult-onset myopia: the Genes in Myopia (GEM) twin study. Invest Ophthalmol Vis Sci 49, 3324-7 (2008). 330. Jensen, H. Myopia in teenagers. An eight-year follow-up study on myopia progression and risk factors. Acta Ophthalmol Scand 73, 389-93 (1995). 331. Ip, J.M. et al. Variation of the contribution from axial length and other oculometric parameters to refraction by age and ethnicity. Invest Ophthalmol Vis Sci 48, 4846-53 (2007). 332. McCarthy, M.I. & Hirschhorn, J.N. Genome-wide association studies: potential next steps on a genetic journey. Hum Mol Genet 17, R156-65 (2008). 333. Teo, Y.Y. et al. Power consequences of linkage disequilibrium variation between populations. Genet Epidemiol 33, 128-35 (2009). 334. Raju, P. et al. Prevalence of refractive errors in a rural South Indian population. Invest Ophthalmol Vis Sci 45, 4268-72 (2004). 335. Attebo, K., Ivers, R.Q. & Mitchell, P. Refractive errors in an older population: the Blue Mountains Eye Study. Ophthalmology 106, 1066-72 (1999). 336. Vitale, S., Ellwein, L., Cotch, M.F., Ferris, F.L., 3rd & Sperduto, R. Prevalence of refractive error in the United States, 1999-2004. Arch Ophthalmol 126, 1111-9 (2008). 337. Huynh, S.C., Kifley, A., Rose, K.A., Morgan, I.G. & Mitchell, P. 
Astigmatism in 12-year-old Australian children: comparisons with a 6-year-old population. Invest Ophthalmol Vis Sci 48, 73-82 (2007). 338. O'Donoghue, L. et al. Refractive and corneal astigmatism in white school children in Northern Ireland. Invest Ophthalmol Vis Sci (2011). 339. He, M. et al. Refractive error and visual impairment in urban children in southern China. Invest Ophthalmol Vis Sci 45, 793-9 (2004). 340. Clementi, M., Angi, M., Forabosco, P., Di Gianantonio, E. & Tenconi, R. Inheritance of astigmatism: evidence for a major autosomal dominant locus. Am J Hum Genet 63, 825-30 (1998). 341. Dirani, M., Islam, A., Shekar, S.N. & Baird, P.N. Dominant genetic effects on corneal astigmatism: the genes in myopia (GEM) twin study. Invest Ophthalmol Vis Sci 49, 1339-44 (2008). 342. Grjibovski, A.M., Magnus, P., Midelfart, A. & Harris, J.R. Epidemiology and heritability of astigmatism in Norwegian twins: an analysis of self-reported data. Ophthalmic Epidemiol 13, 245-52 (2006). 343. Yeh, L.K. et al. The genetic effect on refractive error and anterior corneal aberration: twin eye study. J Refract Surg 23, 257-65 (2007).
344. Cagigrigoriu, A., Gregori, D., Cortassa, F., Catena, F. & Marra, A. Heritability of corneal curvature and astigmatism: a videokeratographic child-parent comparison study. Cornea 26, 907-12 (2007). 345. Parssinen, O., Kauppinen, M., Kaprio, J., Koskenvuo, M. & Rantanen, T. Heritability of corneal refraction and corneal astigmatism: a population-based twin study among 66- to 79-year-old female twins. Acta Ophthalmol (2012). 346. Hughes, K. et al. Cardiovascular diseases in Chinese, Malays, and Indians in Singapore. II. Differences in risk factor levels. J Epidemiol Community Health 44, 29-35 (1990). 347. Tan, C.E., Emmanuel, S.C., Tan, B.Y. & Jacob, E. Prevalence of diabetes and ethnic differences in cardiovascular risk factors. The 1992 Singapore National Health Survey. Diabetes Care 22, 241-7 (1999). 348. Hughes, K., Aw, T.C., Kuperan, P. & Choo, M. Central obesity, insulin resistance, syndrome X, lipoprotein(a), and cardiovascular risk in Indians, Malays, and Chinese in Singapore. J Epidemiol Community Health 51, 394-9 (1997). 349. Cutter, J., Tan, B.Y. & Chew, S.K. Levels of cardiovascular disease risk factors in Singapore following a national intervention programme. Bull World Health Organ 79, 908-15 (2001). 350. Dirani, M. et al. Prevalence of refractive error in Singaporean Chinese children: the strabismus, amblyopia, and refractive error in young Singaporean Children (STARS) study. Invest Ophthalmol Vis Sci 51, 1348-55 (2010). 351. Zadnik, K., Mutti, D.O. & Adams, A.J. The repeatability of measurement of the ocular components. Invest Ophthalmol Vis Sci 33, 2325-33 (1992). 352. Kazeem, G.R. & Farrall, M. Integrating case-control and TDT studies. Ann Hum Genet 69, 329-35 (2005). 353. Peng, B., Yu, R.K., Dehoff, K.L. & Amos, C.I. Normalizing a large number of quantitative traits using empirical normal quantile transformation. BMC Proc 1 Suppl 1, S156 (2007). 354. Kawagishi, J., Kumabe, T., Yoshimoto, T. & Yamamoto, T. 
Structure, organization, and transcription units of the human alpha-platelet-derived growth factor receptor gene, PDGFRA. Genomics 30, 224-32 (1995). 355. Heinrich, M.C. et al. PDGFRA activating mutations in gastrointestinal stromal tumors. Science 299, 708-10 (2003). 356. Hoppenreijs, V.P., Pels, E., Vrensen, G.F., Felten, P.C. & Treffers, W.F. Platelet- derived growth factor: receptor expression in corneas and effects on corneal cells. Invest Ophthalmol Vis Sci 34, 637-49 (1993). 357. Kim, W.J., Mohan, R.R. & Wilson, S.E. Effect of PDGF, IL-1alpha, and BMP2/4 on corneal fibroblast chemotaxis: expression of the platelet-derived growth factor system in the cornea. Invest Ophthalmol Vis Sci 40, 1364-72 (1999). 358. Robbins, S.G. et al. Platelet-derived growth factor ligands and receptors immunolocalized in proliferative retinal diseases. Invest Ophthalmol Vis Sci 35, 3649-63 (1994). 359. Fruttiger, M. et al. PDGF mediates a neuron-astrocyte interaction in the developing retina. Neuron 17, 1117-31 (1996). 360. Cui, J. et al. PDGF receptors are activated in human epiretinal membranes. Exp Eye Res 88, 438-44 (2009).
209
361. Li, D.Q. & Tseng, S.C. Differential regulation of cytokine and receptor transcript expression in human corneal and limbal fibroblasts by epidermal growth factor, transforming growth factor-alpha, platelet-derived growth factor B, and interleukin-1 beta. Invest Ophthalmol Vis Sci 37, 2068-80 (1996). 362. Vij, N., Sharma, A., Thakkar, M., Sinha, S. & Mohan, R.R. PDGF-driven proliferation, migration, and IL8 chemokine secretion in human corneal fibroblasts involve JAK2-STAT3 signaling pathway. Mol Vis 14, 1020-7 (2008). 363. Kaur, H., Chaurasia, S.S., Agrawal, V. & Wilson, S.E. Expression of PDGF receptor- alpha in corneal myofibroblasts in situ. Exp Eye Res 89, 432-4 (2009). 364. Thom, S.B. et al. Effect of topical anti-transforming growth factor-beta on corneal stromal haze after photorefractive keratectomy in rabbits. J Cataract Refract Surg 23, 1324-30 (1997). 365. Kim, A., Lakshman, N., Karamichos, D. & Petroll, W.M. Growth factor regulation of corneal keratocyte differentiation and migration in compressed collagen matrices. Invest Ophthalmol Vis Sci 51, 864-75 (2010). 366. Han, S. et al. Association of variants in FRAP1 and PDGFRA with corneal curvature in Asian populations from Singapore. Hum Mol Genet (2011). 367. Grosvenor, T. & Goss, D.A. Role of the cornea in emmetropia and myopia. Optom Vis Sci 75, 132-45 (1998). 368. Llorente, L., Barbero, S., Cano, D., Dorronsoro, C. & Marcos, S. Myopic versus hyperopic eyes: axial length, corneal shape and optical aberrations. J Vis 4, 288- 98 (2004). 369. Wright, K.W. & Spiegel, P.H. Pediatric ophthalmology and strabismus, xxiii, 1084 p. (Springer, New York, 2003). 370. Mutti, D.O. et al. Refractive astigmatism and the toricity of ocular components in human infants. Optom Vis Sci 81, 753-61 (2004). 371. Baldwin, W.R. & Mills, D. A longitudinal study of corneal astigmatism and total astigmatism. Am J Optom Physiol Opt 58, 206-11 (1981). 372. Cowen, L. & Bobier, W.R. 
The pattern of astigmatism in a Canadian preschool population. Invest Ophthalmol Vis Sci 44, 4593-600 (2003). 373. Jeffreys, A.J., Neumann, R., Panayi, M., Myers, S. & Donnelly, P. Human recombination hot spots hidden in regions of strong marker association. Nat Genet 37, 601-6 (2005). 374. Kong, A. et al. A high-resolution recombination map of the human genome. Nat Genet 31, 241-7 (2002). 375. Tenesa, A. et al. Recent human effective population size estimated from linkage disequilibrium. Genome Res 17, 520-6 (2007). 376. Hemminger, B.M., Saelim, B. & Sullivan, P.F. TAMAL: an integrated approach to choosing SNPs for genetic studies of human complex traits. Bioinformatics 22, 626-7 (2006). 377. Cockerham, C.C. & Weir, B.S. Estimation of inbreeding parameters in stratified populations. Ann Hum Genet 50, 271-81 (1986). 378. Pickrell, J.K. et al. Signals of recent positive selection in a worldwide sample of human populations. Genome Res 19, 826-37 (2009). 379. Teo, Y.Y. et al. Genome-wide comparisons of variation in linkage disequilibrium. Genome Res 19, 1849-60 (2009).
210
380. Sabeti, P.C. et al. Genome-wide detection and characterization of positive selection in human populations. Nature 449, 913-8 (2007). 381. Voight, B.F., Kudaravalli, S., Wen, X. & Pritchard, J.K. A map of recent positive selection in the human genome. PLoS Biol 4, e72 (2006). 382. Grossman, S.R. et al. A composite of multiple signals distinguishes causal variants in regions of positive selection. Science 327, 883-6 (2010). 383. Lamason, R.L. et al. SLC24A5, a putative cation exchanger, affects pigmentation in zebrafish and humans. Science 310, 1782-6 (2005). 384. Xu, S. et al. Genomic dissection of population substructure of Han Chinese and its implication in association studies. Am J Hum Genet 85, 762-74 (2009). 385. Hudson, R.R. Two-locus sampling distributions and their application. Genetics 159, 1805-17 (2001). 386. McVean, G., Awadalla, P. & Fearnhead, P. A coalescent-based method for detecting and estimating recombination from gene sequences. Genetics 160, 1231-41 (2002). 387. Tajima, F. Relationship between DNA polymorphism and fixation time. Genetics 125, 447-54 (1990). 388. O'Reilly, P.F., Birney, E. & Balding, D.J. Confounding between recombination and selection, and the Ped/Pop method for detecting selection. Genome Res 18, 1304-13 (2008). 389. Reed, F.A. & Tishkoff, S.A. Positive selection can create false hotspots of recombination. Genetics 172, 2011-4 (2006). 390. Stephan, W., Song, Y.S. & Langley, C.H. The hitchhiking effect on linkage disequilibrium between linked neutral loci. Genetics 172, 2647-63 (2006). 391. Hellenthal, G., Pritchard, J.K. & Stephens, M. The effects of genotype-dependent recombination, and transmission asymmetry, on linkage disequilibrium. Genetics 172, 2001-5 (2006). 392. Kong, A. et al. Sequence variants in the RNF212 gene associate with genome- wide recombination rate. Science 319, 1398-401 (2008). 393. Baudat, F. et al. PRDM9 is a major determinant of meiotic recombination hotspots in humans and mice. 
Science 327, 836-40 (2010). 394. Berg, I.L. et al. PRDM9 variation strongly influences recombination hot-spot activity and meiotic instability in humans. Nat Genet 42, 859-63 (2010). 395. Parvanov, E.D., Petkov, P.M. & Paigen, K. Prdm9 controls activation of mammalian recombination hotspots. Science 327, 835 (2010). 396. Teo, Y.Y., Ong, R.T., Sim, X., Tai, E.S. & Chia, K.S. Identifying candidate causal variants via trans-population fine-mapping. Genet Epidemiol 34, 653-64 (2010). 397. Zaitlen, N., Pasaniuc, B., Gur, T., Ziv, E. & Halperin, E. Leveraging genetic variability across populations for the identification of causal variants. Am J Hum Genet 86, 23-33 (2010). 398. Redon, R. et al. Global variation in copy number in the human genome. Nature 444, 444-54 (2006). 399. Williamson, S.H. et al. Localizing recent adaptive evolution in the human genome. PLoS Genet 3, e90 (2007). 400. Carlson, C.S. et al. Genomic regions exhibiting positive selection identified from dense genotype data. Genome Res 15, 1553-65 (2005).
211
401. Shaikh, T.H. et al. High-resolution mapping and analysis of copy number variations in the human genome: a data resource for clinical and research applications. Genome Res 19, 1682-90 (2009). 402. Pinto, D., Marshall, C., Feuk, L. & Scherer, S.W. Copy-number variation in control population cohorts. Hum Mol Genet 16 Spec No. 2, R168-73 (2007). 403. Wang, E.T., Kodama, G., Baldi, P. & Moyzis, R.K. Global landscape of recent inferred Darwinian selection for Homo sapiens. Proc Natl Acad Sci U S A 103, 135-40 (2006). 404. Kimura, R., Fujimoto, A., Tokunaga, K. & Ohashi, J. A practical genome scan for population-specific strong selective sweeps that have reached fixation. PLoS One 2, e286 (2007). 405. Jakobsson, M. et al. Genotype, haplotype and copy-number variation in worldwide human populations. Nature 451, 998-1003 (2008). 406. Park, H. et al. Discovery of common Asian copy number variants using integrated high-resolution array CGH and massively parallel DNA sequencing. Nat Genet 42, 400-5 (2010). 407. Simon-Sanchez, J. et al. Genome-wide SNP assay reveals structural genomic variation, extended homozygosity and cell-line induced alterations in normal individuals. Hum Mol Genet 16, 1-14 (2007). 408. Zogopoulos, G. et al. Germ-line DNA copy number variation frequencies in a large North American population. Hum Genet 122, 345-53 (2007). 409. Wong, K.K. et al. A comprehensive analysis of common copy-number variations in the human genome. Am J Hum Genet 80, 91-104 (2007). 410. Akey, J.M. Constructing genomic maps of positive selection in humans: where do we go from here? Genome Res 19, 711-22 (2009). 411. Xu, H. et al. SgD-CNV, a database for common and rare copy number variants in three Asian populations. Hum Mutat 32, 1341-9 (2011). 412. Tang, K., Thornton, K.R. & Stoneking, M. A new approach for using genome scans to detect recent positive selection in the human genome. PLoS Biol 5, e171 (2007). 413. Kelley, J.L., Madeoy, J., Calhoun, J.C., Swanson, W. & Akey, J.M. 
Genomic signatures of positive selection in humans and the limits of outlier approaches. Genome Res 16, 980-9 (2006). 414. Han, S. et al. Association of variants in FRAP1 and PDGFRA with corneal curvature in Asian populations from Singapore. Hum Mol Genet 20, 3693-8 (2011). 415. Mishra, A. et al. Genetic variants near PDGFRA are associated with corneal curvature in Australians. Invest Ophthalmol Vis Sci (2012). 416. Hageman, G.S. et al. Clinical validation of a genetic model to estimate the risk of developing choroidal neovascular age-related macular degeneration. Hum Genomics 5, 420-40 (2011). 417. Jostins, L. & Barrett, J.C. Genetic risk prediction in complex disease. Hum Mol Genet 20, R182-8 (2011). 418. Stephens, M. & Balding, D.J. Bayesian statistical methods for genetic association studies. Nat Rev Genet 10, 681-90 (2009). 419. Bodmer, W. & Bonilla, C. Common and rare variants in multifactorial susceptibility to common diseases. Nat Genet 40, 695-701 (2008). 212
420. Morris, A.P. & Zeggini, E. An evaluation of statistical approaches to rare variant analysis in genetic association studies. Genet Epidemiol 34, 188-93 (2010). 421. Winchester, L., Yau, C. & Ragoussis, J. Comparing CNV detection methods for SNP arrays. Brief Funct Genomic Proteomic 8, 353-66 (2009). 422. Verhoeven, V.J. et al. Genome-wide meta-analyses of multiancestry cohorts identify multiple new susceptibility loci for refractive error and myopia. Nat Genet 45, 314-8 (2013). 423. Kiefer, A.K. et al. Genome-wide analysis points to roles for extracellular matrix remodeling, the visual cycle, and neuronal development in myopia. PLoS Genet 9, e1003299 (2013). 424. Hellenthal, G. & Stephens, M. Insights into recombination from population genetic variation. Curr Opin Genet Dev 16, 565-72 (2006).
213