Supplementary Material
QIMR Genotyping and Quality Control
Individuals were genotyped as part of several different genotyping projects undertaken at
QIMR. Depending upon the project in which they were included, subjects were genotyped on the Illumina 317K, 370K or 610K platforms. Strict quality control (QC) procedures were applied to the genotype data to reduce the potential for false positives during the analysis.
All samples were imputed to the International HapMap 2 project reference panel with the software MaCH (Li et al., 2010), using 274,604 SNPs common to all genotyping platforms.
Further QC steps were carried out to remove SNPs that were poorly imputed. Full details of the genotyping, imputation and QC procedures are given in Medland et al. (2009). After the QC process and removal of non-genotyped individuals, 564 cases and 1571 controls remained, of which 486 cases and 1056 controls were unrelated and used in the primary analysis.
QC steps included testing for deviation from Hardy-Weinberg equilibrium (p < 10⁻⁶) and for
Mendelian errors, and removing individuals and SNPs with >5% missingness. SNPs with
MAF (minor allele frequency) < 0.01 or genotype quality score < 0.7 were removed.
Individuals showing evidence of non-European ancestry based on the genotyping information were also removed.
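The SNP-level thresholds listed above can be illustrated with a minimal sketch. The record type `SnpSummary`, its field names and the helper functions are hypothetical and introduced only for illustration; the actual QC pipeline is described in Medland et al. (2009) and is not reproduced here. Only the four thresholds are taken from the text.

```python
from dataclasses import dataclass

# Thresholds taken from the QIMR QC description in the text.
HWE_P_MIN = 1e-6      # SNPs with HWE p < 10^-6 are removed
MAX_MISSING = 0.05    # SNPs with >5% missingness are removed
MIN_MAF = 0.01        # SNPs with MAF < 0.01 are removed
MIN_QUALITY = 0.7     # SNPs with genotype quality score < 0.7 are removed


@dataclass
class SnpSummary:
    """Hypothetical per-SNP summary record (field names are illustrative)."""
    snp_id: str
    hwe_p: float
    missing_rate: float
    maf: float
    quality: float


def passes_qc(snp: SnpSummary) -> bool:
    """Return True if the SNP survives all four SNP-level filters."""
    return (
        snp.hwe_p >= HWE_P_MIN
        and snp.missing_rate <= MAX_MISSING
        and snp.maf >= MIN_MAF
        and snp.quality >= MIN_QUALITY
    )


def filter_snps(snps):
    """Keep only the SNPs that pass QC, preserving input order."""
    return [s for s in snps if passes_qc(s)]
```

The Mendelian-error and ancestry checks are not covered by this sketch, since they require family and principal-component information rather than per-SNP summaries.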
NESDA/NTR Genotyping and QC
Participants were genotyped on the Affymetrix Perlegen 5.0 platform, and 1,235,109 SNPs from the HapMap3 CEU+TSI populations were imputed using Beagle 3.0.4.
STR Genotyping and Quality Control
Quality control filtering included removal of SNPs with more than 3% missing information, <1% minor allele frequency or deviation from Hardy-Weinberg equilibrium (p ≤ 1 × 10⁻⁷), and removal of individuals with more than 3% missing genotypes, a failed sex check, heterozygosity more than 5 SD from the sample mean, or unresolved cryptic relatedness.
Imputation to HapMap 2 (build 36) was performed using IMPUTE2.
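As an illustration of the individual-level heterozygosity filter mentioned above, the following sketch flags samples whose heterozygosity lies more than 5 SD from the sample mean. The function name and the input format (a mapping from sample ID to heterozygosity rate) are assumptions made for this example; the software actually used for this step is not stated in the text.

```python
from statistics import mean, stdev


def heterozygosity_outliers(het_by_sample, n_sd=5.0):
    """Return sorted IDs of samples more than n_sd SDs from the mean.

    het_by_sample: dict mapping a sample ID to its heterozygosity rate
    (hypothetical input format, assumed for this sketch).
    """
    values = list(het_by_sample.values())
    mu = mean(values)
    sd = stdev(values)  # sample standard deviation
    if sd == 0:
        return []
    return sorted(
        sample for sample, het in het_by_sample.items()
        if abs(het - mu) > n_sd * sd
    )
```

Note that the outlier itself contributes to the mean and SD, so in very small samples no individual can exceed a 5 SD threshold; the filter is only meaningful for cohort-sized data.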
ALSPAC
ALSPAC genotyping was carried out at the Centre National de Génotypage (CNG) using the
Illumina Human660W-quad array. Quality control measures included the removal of SNPs with >5% missing information, MAF <1% or deviation from HWE (p < 1.0 × 10⁻⁶). Individuals were excluded if they had >5% missing information, evidence of non-European ancestry from principal component analysis of the GWAS data, or indeterminate X chromosome heterozygosity. A total of 8,340 participants and 526,688 SNPs passed these quality control filters. Autosomal SNPs were imputed to the HapMap CEU population (release 22) using MaCH (v1.0.16), and X chromosome SNPs were imputed to NCBI build 36, HapMap 3 release 2 (Feb 2009) using Minimac (v4.4.3). Prior to analysis, quality control measures were applied to the imputed genotypes: SNPs with MAF <1% were excluded, as were those with R² ≤ 30%.
Association Analysis
Association analysis was initially performed in the Australian discovery sample, with subsequent attempted replication of the most significant SNPs in the Australian replication sample. Data from the Dutch, Swedish and U.K. samples became available after the initial replication attempt, and a meta-analysis of the Australian discovery, NESDA/NTR, Swedish and U.K. samples was carried out.
Statistical Power
Our study was underpowered to detect common risk variants with effect sizes typical of those found for other psychiatric disorders (0.3% power to detect a risk allele of frequency 0.2-0.8 and a genotype relative risk of 1.15).
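To show how power depends on sample size, allele frequency and genotype relative risk, the sketch below approximates the power of a single-stage allelic case/control test using a normal approximation. It is a simplified illustration and is not the method used for the figures quoted in the text, so it should not be expected to reproduce them. Two assumptions are made here and are not taken from the source: a multiplicative risk model, and a control allele frequency equal to the population frequency (a rare-disease approximation). The function name and parameters are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def allelic_test_power(n_cases, n_controls, risk_allele_freq, grr, alpha=5e-8):
    """Approximate power of a two-sided allelic test (normal approximation).

    Assumptions of this sketch (not from the source): multiplicative
    model, control allele frequency equal to the population frequency.
    """
    p_ctrl = risk_allele_freq
    # Expected risk-allele frequency in cases under a multiplicative model.
    p_case = grr * p_ctrl / (grr * p_ctrl + (1.0 - p_ctrl))
    # Standard error of the allele-frequency difference (2N alleles per group).
    se = sqrt(
        p_case * (1.0 - p_case) / (2 * n_cases)
        + p_ctrl * (1.0 - p_ctrl) / (2 * n_controls)
    )
    z_expected = (p_case - p_ctrl) / se
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    # Probability that the test statistic falls in either rejection tail.
    return (
        _STD_NORMAL.cdf(z_expected - z_crit)
        + _STD_NORMAL.cdf(-z_expected - z_crit)
    )
```

For example, `allelic_test_power(486, 1056, 0.2, 1.15)` evaluates the approximation for the unrelated Australian discovery sample at the allele frequency and relative risk mentioned above.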
Logistic regression of PPD on SNP imputation dosage scores was conducted in unrelated individuals from the Australian discovery sample using R. Association testing of genotyped SNPs in the Australian replication sample was performed using PLINK. Association testing between SNP dosage scores and PPD in both the GAIN and Swedish replication samples was also performed in PLINK (Purcell et al., 2007), with ancestry principal components included as covariates in all analyses. Meta-analysis of the Australian discovery sample and the Dutch and Swedish replication samples (total number of cases = 1,420, total number of controls = 9,473) was performed for all SNPs analysed in at least two studies (n = 2,473,712) using PLINK (Purcell et al., 2007). A total of 967,839 SNPs were analysed in all four studies. The estimate from each study was weighted by the standard error of the log of the odds ratio, and the results from both a fixed-effects and a random-effects meta-analysis were computed.
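The fixed-effects and random-effects estimates described above can be sketched for a single SNP as follows. This is a generic inverse-variance implementation with a DerSimonian-Laird estimate of between-study variance; it is assumed here to correspond to the kind of calculation a standard meta-analysis tool performs, but it is not PLINK's code, and the function names and return format are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return 2.0 * NormalDist().cdf(-abs(z))


def meta_analyse(log_ors, ses):
    """Fixed- and random-effects meta-analysis of per-study log odds ratios.

    log_ors: per-study log odds ratios for one SNP
    ses:     matching standard errors
    """
    # Fixed effects: inverse-variance weights.
    weights = [1.0 / se ** 2 for se in ses]
    sum_w = sum(weights)
    beta_fixed = sum(w * b for w, b in zip(weights, log_ors)) / sum_w
    se_fixed = sqrt(1.0 / sum_w)

    # Cochran's Q and the DerSimonian-Laird between-study variance tau^2.
    q = sum(w * (b - beta_fixed) ** 2 for w, b in zip(weights, log_ors))
    k = len(log_ors)
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random effects: weights additionally include tau^2.
    weights_re = [1.0 / (se ** 2 + tau2) for se in ses]
    sum_w_re = sum(weights_re)
    beta_random = sum(w * b for w, b in zip(weights_re, log_ors)) / sum_w_re
    se_random = sqrt(1.0 / sum_w_re)

    return {
        "beta_fixed": beta_fixed,
        "se_fixed": se_fixed,
        "p_fixed": two_sided_p(beta_fixed / se_fixed),
        "q": q,
        "tau2": tau2,
        "beta_random": beta_random,
        "se_random": se_random,
        "p_random": two_sided_p(beta_random / se_random),
    }
```

When the studies agree (Q no larger than its degrees of freedom), tau² is zero and the two models give identical estimates; heterogeneity widens the random-effects standard error.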
Our two-stage design in the Australian samples has 80% power to detect a risk allele of frequency 0.1-0.8 and genotype relative risk 1.5 at the genome-wide significance level of p < 5 × 10⁻⁸ (Skol et al., 2006). The meta-analysis of all 4 samples provides >90% power to detect a risk allele of frequency 0.2-0.8 and a genotype relative risk of 1.35 for a SNP that is included in all three samples. Secondary analyses such as sign tests between the 4 cohorts and gene-based analyses using VEGAS (Liu et al., 2010) were also performed.
We performed a sign test to assess whether the associated SNPs show the same direction of effect in the replication samples. A positive result in the sign test would support the hypothesis that many of the most significant SNPs are truly associated with PPD, but that the sample size is too small to detect them at the genome-wide significance level. We used a binomial test to evaluate the probability of seeing the observed number of SNPs with the same sign in the replication samples.
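The binomial sign test can be written in a few lines. The sketch below computes a one-sided probability of observing at least the given number of concordant SNPs when each SNP has a 50% chance of matching direction under the null; whether the original analysis used a one-sided or two-sided test is not stated, so the one-sided form is an assumption, and the function name is hypothetical.

```python
from math import comb


def sign_test_p(n_same, n_total, p_null=0.5):
    """One-sided binomial probability P(X >= n_same) under the null.

    n_same:  number of SNPs with the same direction of effect
    n_total: number of SNPs evaluated
    p_null:  chance of a matching direction under no association
    """
    return sum(
        comb(n_total, k) * p_null ** k * (1.0 - p_null) ** (n_total - k)
        for k in range(n_same, n_total + 1)
    )
```

For instance, `sign_test_p(9, 20)` evaluates a result in which 9 of 20 SNPs share the direction of effect seen in the discovery sample.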
The gene-based test VEGAS (Liu et al., 2010) was applied in the Australian discovery sample to identify genes associated with PPD, with genes defined as ± 50 kb from the start/stop sites. The test combines the test statistics for all of the SNPs in the gene into a single gene-based test statistic, accounting for the correlation between them due to linkage disequilibrium.
In total, 17,787 genes were tested, and genes with a p-value less than 2.8 × 10⁻⁶ (0.05/17,787) were considered significant.
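The idea behind a simulation-based gene test of this kind can be sketched as follows: sum the per-SNP 1-df chi-square statistics in the gene, then compare that sum against sums simulated from correlated normal draws whose correlation matrix is the LD matrix. This is a simplified illustration and not the VEGAS implementation; the function names, the add-one empirical p-value and the requirement that the LD matrix be positive definite are assumptions of this sketch.

```python
import random
from math import sqrt

# Bonferroni threshold for the number of genes tested, as given in the text.
GENE_SIGNIFICANCE_THRESHOLD = 0.05 / 17787


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def gene_based_p(chi2_stats, ld_matrix, n_sim=10000, seed=0):
    """Empirical p-value for the sum of per-SNP chi-square statistics.

    chi2_stats: 1-df chi-square statistics for the SNPs in the gene
    ld_matrix:  SNP-by-SNP correlation (r) matrix, assumed positive definite
    """
    rng = random.Random(seed)
    lower = cholesky(ld_matrix)
    n = len(chi2_stats)
    observed = sum(chi2_stats)
    exceed = 0
    for _ in range(n_sim):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Correlate the independent draws: x = L z has covariance L L^T.
        x = [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        if sum(v * v for v in x) >= observed:
            exceed += 1
    # Add-one estimator so the empirical p-value is never exactly zero.
    return (exceed + 1) / (n_sim + 1)
```

A gene would then be declared significant if its p-value fell below `GENE_SIGNIFICANCE_THRESHOLD`; resolving p-values that small by simulation requires far more replicates than the default used here.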
Australian Replication Sample Genotyping
Sequences in regions containing SNPs selected for the replication study were downloaded from the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/) and were cross-checked using Sequenom databases (https://mysequenom.com) before assay design. Multiplexed assays were designed for 62 SNPs using the Sequenom MassARRAY Assay Design software (version 3.1). SNPs were typed using iPLEX™ Gold chemistry and analysed using a Sequenom MassARRAY Compact Mass Spectrometer (Sequenom Inc, San Diego, CA, USA). The SAP and iPLEX reactions were performed according to the manufacturer's instructions, while the PCR stage of the process was performed using a 2.5 µL half reaction.
The post-PCR products were spotted on a Sequenom SpectroChip 2, and the data were processed and analysed using Sequenom MassARRAY TYPER 4.0 software. Two SNPs produced ambiguous calls and were removed from the analysis. A total of 21 individuals were removed from the replication sample for missingness > 0.05.
Association analysis results
The association results from the Australian discovery sample with p < 0.001 are listed in Supplementary Table 2. The Q-Q plot is shown in Supplementary Figure 1. Twenty independent SNPs (r² < 0.65) were associated at p < 10⁻⁴. SNP rs9360356 on chromosome 6 is found in a plausible candidate brain-expressed gene (DeRosse et al., 2008), BAI3 (brain-specific angiogenesis inhibitor 3) (Kee et al., 2004). SNPs representing 19 independent regions associated at p < 10⁻⁴ were genotyped in the Australian replication sample, drawn from the same phenotyped sample, and showed a minimum association p = 0.03, which is not significant given the level of multiple testing (not reported) (Supplementary Table 3). None of the SNPs were significant in the Dutch, Swedish and UK replication samples after correcting for multiple testing (minimum uncorrected p = 0.02).
No genes reached significance in the gene-based test (Supplementary Table 4). However, the two most associated genes were two closely linked genes that encode pregnancy-specific beta-1-glycoproteins, PSG8 and PSG3 (p = 0.0001 and p = 0.0002). The association signal for both genes resulted from an overlapping set of SNPs, of which rs8112446 was the most associated (p = 0.0003) (Supplementary Figure 3). After conditioning on rs8112446, no SNPs in either gene were associated at nominal significance. The human pregnancy-specific glycoproteins (PSGs) are transcribed from a family of 11 genes (Teglund et al., 1994) located in a 700 kb region of chromosome 19 and are primarily synthesized by the syncytiotrophoblast during pregnancy (Zhou et al., 1997). Reduced serum levels of PSGs during the first trimester of pregnancy have been associated with small for gestational age (SGA) fetuses and spontaneous preterm delivery (Pihl et al., 2009). Low levels of PSGs later in pregnancy are also associated with the same features (Gordon et al., 1977; Tamsen et al., 1983; Westergaard et al., 1985). Replication was attempted for 10 SNPs in the region, including the most significant SNPs in both genes, 4 nonsynonymous SNPs identified using SNPper (Riva and Kohane, 2002), and 4 SNPs known to be cis-eQTLs for the PSG3 gene, identified using the seeQTL database (Xia et al., 2012). No SNPs in the region showed nominal significance in the replication sample, with rs8112446, the top SNP in the region in the discovery analysis, having p = 0.84 in the Australian replication sample.
A meta-analysis of SNPs overlapping in at least two of the studies (n = 2,473,712) did not identify any genome-wide significant SNPs. The results for the most significant independent (at r² < 0.5) SNPs in the meta-analysis are shown in Supplementary Table 5. The most significant SNP was rs6918856 (p = 8.9 × 10⁻⁸), which approached genome-wide significance.
This SNP is located on chromosome 6 in a region with no annotated RefSeq genes. The SNP is found in close proximity to an H3K27Ac site identified in ENCODE that harbors binding sites for the GATA-2, c-FOS and p300 transcription factors. Furthermore, the SNP is in strong LD with SNPs located within the transcription factor binding sites.
The Australian, Swedish and UK samples were imputed to HapMap II and therefore the top 20 independent SNPs in the Australian sample were all found in the Swedish and UK samples. Only 9 of the 20 SNPs had the same direction of effect in the Swedish sample, which is fewer than would be expected by chance. Furthermore, only 8 of the top 20 SNPs from the discovery sample were directly typed or imputed in the GAIN sample. A further 4 of the top 20 SNPs had a proxy SNP at r² > 0.8 in the GAIN sample, giving a total of 12 SNPs that could be evaluated in the sign test. Six of the 12 SNPs had the same direction of effect as in the discovery sample. Nineteen of the top 20 SNPs in the Australian sample were typed or imputed in the UK sample. Only 5 of the 19 SNPs had the same direction of effect in the UK sample.
To further test for evidence of a polygenic architecture for PPD, we meta-analysed the results from the Australian, Dutch and Swedish cohorts and then extracted the 200 most significant independent (at r² < 0.5) SNPs. We then tested whether the direction of effect was the same in the UK sample for those SNPs. Only 94 of the 200 SNPs had the same direction of effect in the UK sample, which was not significant.
Permutation Analysis in NESDA/NTR
We tested the significance of this result by randomly sampling 208 cases (both PPD and non-PPD MDD cases) and 761 controls 1,000 times from the entire NESDA/NTR MDD case/control sample and performing the profile scoring analysis using all clumped BPD SNPs.
When sampling from the entire NESDA/NTR cohort, only 18 of the 1,000 replicates had a lower p-value or a higher R² than the true p-value and R² (p = 0.018). The same sampling approach was then applied with the sampled cases and controls restricted to females. By sampling only females, we would expect to oversample PPD cases relative to the previous analysis. In this analysis, 23 replicates had a lower p-value or a higher R² than the true values (p = 0.023).
Supplementary Figure 1. Q-Q plot from the GWAS analysis in the Australian Discovery Sample.
Supplementary Figure 2. Q-Q plot of meta-analysis of all 4 samples.
Supplementary Figure 3
References
DEROSSE, P., LENCZ, T., BURDICK, K. E., SIRIS, S. G., KANE, J. M. & MALHOTRA, A. K. 2008. The genetics of symptom-based phenotypes: toward a molecular classification of schizophrenia. Schizophr Bull, 34, 1047-53.
GORDON, Y. P., GRUDZINSKAS, J. G., JEFFREY, D. & CHARD, T. 1977. Concentrations of pregnancy-specific beta 1-glycoprotein in maternal blood in normal pregnancy and in intrauterine growth retardation. Lancet, 1, 331-3.
KEE, H. J., AHN, K. Y., CHOI, K. C., WON SONG, J., HEO, T., JUNG, S., KIM, J. K., BAE, C. S. & KIM, K. K. 2004. Expression of brain-specific angiogenesis inhibitor 3 (BAI3) in normal brain and implications for BAI3 in ischemia-induced brain angiogenesis and malignant glioma. FEBS Lett, 569, 307-16.
LI, Y., WILLER, C. J., DING, J., SCHEET, P. & ABECASIS, G. R. 2010. MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet Epidemiol, 34, 816-34.
LIU, J. Z., MCRAE, A. F., NYHOLT, D. R., MEDLAND, S. E., WRAY, N. R., BROWN, K. M., HAYWARD, N. K., MONTGOMERY, G. W., VISSCHER, P. M., MARTIN, N. G. & MACGREGOR, S. 2010. A versatile gene-based test for genome-wide association studies. Am J Hum Genet, 87, 139-45.
MEDLAND, S. E., NYHOLT, D. R., PAINTER, J. N., MCEVOY, B. P., MCRAE, A. F., ZHU, G., GORDON, S. D., FERREIRA, M. A., WRIGHT, M. J., HENDERS, A. K., CAMPBELL, M. J., DUFFY, D. L., HANSELL, N. K., MACGREGOR, S., SLUTSKE, W. S., HEATH, A. C., MONTGOMERY, G. W. & MARTIN, N. G. 2009. Common variants in the trichohyalin gene are associated with straight hair in Europeans. Am J Hum Genet, 85, 750-5.
PIHL, K., LARSEN, T., LAURSEN, I., KREBS, L. & CHRISTIANSEN, M. 2009. First trimester maternal serum pregnancy-specific beta-1-glycoprotein (SP1) as a marker of adverse pregnancy outcome. Prenatal Diagnosis, 29, 1256-1261.
PURCELL, S., NEALE, B., TODD-BROWN, K., THOMAS, L., FERREIRA, M. A., BENDER, D., MALLER, J., SKLAR, P., DE BAKKER, P. I., DALY, M. J. & SHAM, P. C. 2007. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet, 81, 559-75.
RIVA, A. & KOHANE, I. S. 2002. SNPper: retrieval and analysis of human SNPs. Bioinformatics, 18, 1681-5.
SKOL, A. D., SCOTT, L. J., ABECASIS, G. R. & BOEHNKE, M. 2006. Joint analysis is more efficient than replication-based analysis for two-stage genome-wide association studies. Nat Genet, 38, 209-13.
TAMSEN, L., AXELSSON, O. & JOHANSSON, S. G. 1983. Serum levels of pregnancy-specific beta 1-glycoprotein (SP1) in women with pregnancies at risk. Gynecol Obstet Invest, 16, 253-60.
TEGLUND, S., OLSEN, A., KHAN, W. N., FRANGSMYR, L. & HAMMARSTROM, S. 1994. The pregnancy-specific glycoprotein (PSG) gene cluster on human chromosome 19: fine structure of the 11 PSG genes and identification of 6 new genes forming a third subgroup within the carcinoembryonic antigen (CEA) family. Genomics, 23, 669-84.
WESTERGAARD, J. G., TEISNER, B., HAU, J., GRUDZINSKAS, J. G. & CHARD, T. 1985. Placental function studies in low birth weight infants with and without dysmaturity. Obstet Gynecol, 65, 316-8.
XIA, K., SHABALIN, A. A., HUANG, S., MADAR, V., ZHOU, Y. H., WANG, W., ZOU, F., SUN, W., SULLIVAN, P. F. & WRIGHT, F. A. 2012. seeQTL: a searchable database for human eQTLs. Bioinformatics, 28, 451-2.
ZHOU, G. Q., BARANOV, V., ZIMMERMANN, W., GRUNERT, F., ERHARD, B., MINCHEVA-NILSSON, L., HAMMARSTROM, S. & THOMPSON, J. 1997. Highly specific monoclonal antibody demonstrates that pregnancy-specific glycoprotein (PSG) is limited to syncytiotrophoblast in human early and term placenta. Placenta, 18, 491-501.