Zhao et al. BMC Proceedings 2014, 8(Suppl 1):S59
http://www.biomedcentral.com/1753-6561/8/S1/S59

PROCEEDINGS Open Access

Testing optimally weighted combination of variants for hypertension

Xingwang Zhao2†, Qiuying Sha1†, Shuanglin Zhang1, Xuexia Wang2*

From Genetic Analysis Workshop 18, Stevenson, WA, USA. 13-17 October 2012

Abstract
Testing rare variants directly is possible with next-generation sequencing technology. In this article, we propose a sliding-window-based optimal-weighted approach to test for the effects of both rare and common variants across the whole genome. We measured the genetic association between a disease and a combination of variants of a single-nucleotide polymorphism window using the newly developed tests TOW and VW-TOW and performed a sliding-window technique to detect disease-susceptible windows. By applying the new approach to unrelated individuals of Genetic Analysis Workshop 18 on replicate 1 chromosome 3, we detected 3 highly susceptible windows across chromosome 3 for diastolic blood pressure and identified 10 of 48,176 windows as the most promising for both diastolic and systolic blood pressure. Seven of 9 top variants influencing diastolic blood pressure and 8 of 9 top variants influencing systolic blood pressure were found in or close to our top 10 windows.

Background
Hypertension is a common chronic destructive disease with unknown complex etiology [1]. More than 1 billion people worldwide have hypertension, defined as blood pressure (BP) ≥140 mm Hg systolic (SBP) or ≥90 mm Hg diastolic (DBP) [2], which is a major risk factor for stroke, myocardial infarction, heart failure, and a cause of chronic kidney disease [3-5]. Both genetic and environmental bases are likely to contribute to this disease. Ehret et al. conducted a large-scale genome-wide association study of hypertension in 2011 and identified 10 novel loci related to BP physiology [6]. Although numerous common genetic variants with small effects on BP have been identified [6-8], the identified variants account for only a small fraction of disease heritability [9]. One potential source of missing heritability is the contribution of rare variants. Recently, next-generation sequencing technology has enabled the sequencing of the whole genome of large groups of individuals, which makes directly testing rare variants feasible. The Genetic Analysis Workshop 18 (GAW18) data, which consists of a whole genome sequencing data set, is a large-scale pedigree-based sample with 959 individuals, 464 directly sequenced and the rest imputed.

Several statistical methods have been proposed to detect associations of rare variants, including the combined multivariate and collapsing (CMC) method [10] and the weighted sum statistic (WSS) [11]. We have proposed a novel test for measuring the effect of an optimally weighted combination of variants (TOW) [12]. In addition, based on the TOW, we proposed a variable weight-TOW (VW-TOW) aiming to test effects of both rare and common variants. Both TOW and VW-TOW are applicable to quantitative and qualitative traits, allow covariates, and are robust to directions of effects of causal variants.

In this article, we report a novel whole genome sliding window approach to detect genetic association between a trait and single-nucleotide polymorphism (SNP) regions across the entire genome. This approach integrates TOW and VW-TOW with the concept of sliding window [13].
Applied to the GAW18 replicate 1, chromosome 3 data set, our approach yielded results consistent with the top genes influencing simulated SBP and DBP, which were generated from the GAW18 simulation model.

* Correspondence: [email protected]
† Contributed equally
2 Joseph J. Zilber School of Public Health, University of Wisconsin, P.O. Box 413, Milwaukee, WI 53201, USA
Full list of author information is available at the end of the article

© 2014 Zhao et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Methods
Consider a sample of $n$ individuals. Each individual has been genotyped at $M$ variants in a genomic region. Denote $y_i$ as the quantitative trait value of the $i$th individual. Denote $X_i = (x_{i1}, \ldots, x_{iM})^T$ as the genotypic score of the $i$th individual, where $x_{im} \in \{0, 1, 2\}$ is the number of minor alleles that the $i$th individual has at the $m$th variant. Suppose we have $p$ covariates. Let $(z_{i1}, \ldots, z_{ip})^T$ denote the covariates of the $i$th individual. We adjust both the trait value $y_i$ and the genotypic score $x_{im}$ for the covariates by applying linear regressions. That is,

$$y_i = \alpha_0 + \alpha_1 z_{i1} + \cdots + \alpha_p z_{ip} + \varepsilon_i$$

and

$$x_{im} = \alpha_{0m} + \alpha_{1m} z_{i1} + \cdots + \alpha_{pm} z_{ip} + \tau_{im}.$$

Let $\tilde{y}_i$ and $\tilde{x}_{im}$ denote the residuals of $y_i$ and $x_{im}$, respectively. Denote $\tilde{X}_i = (\tilde{x}_{i1}, \ldots, \tilde{x}_{iM})^T$ as the residuals of the genotypic score of the $i$th individual.

Using the generalized linear model (GLM) to model the relationship between trait values and genotypes is equivalent to modeling the relationship between the residuals of trait values and the residuals of genotypes through GLM (1), where $g(\cdot)$ is a monotone "link" function.

$$g\big(E(\tilde{y}_i \mid \tilde{X}_i)\big) = \beta_0 + \beta_1 \tilde{x}_{i1} + \cdots + \beta_M \tilde{x}_{iM} \qquad (1)$$

Under the GLM, the score test statistic to test the null hypothesis $H_0: \beta = 0$ is given by $S = U^T V^{-1} U$, where

$$U = \sum_{i=1}^{n} (\tilde{y}_i - \bar{\tilde{y}})(\tilde{X}_i - \bar{\tilde{X}}) \quad \text{and} \quad V = \frac{1}{n} \sum_{i=1}^{n} (\tilde{y}_i - \bar{\tilde{y}})^2 \sum_{i=1}^{n} (\tilde{X}_i - \bar{\tilde{X}})(\tilde{X}_i - \bar{\tilde{X}})^T.$$

The statistic $S$ asymptotically follows a chi-square distribution with $k = \mathrm{rank}(V)$ degrees of freedom (DF). For rare variants, however, the score test may lose power as a result of the sparse data and a large DF $k$. In rare variant association studies, to test for the effect of the weighted combination of variants, $x_i = \sum_{m=1}^{M} w_m x_{im}$, the score test statistic becomes

$$S(w_1, \ldots, w_M) = n \frac{\left( \sum_{i=1}^{n} (\tilde{y}_i - \bar{\tilde{y}})(\tilde{x}_i - \bar{\tilde{x}}) \right)^2}{\sum_{i=1}^{n} (\tilde{y}_i - \bar{\tilde{y}})^2 \sum_{i=1}^{n} (\tilde{x}_i - \bar{\tilde{x}})^2} = n \frac{\left( \sum_{m=1}^{M} w_m \sum_{i=1}^{n} (\tilde{y}_i - \bar{\tilde{y}})(\tilde{x}_{im} - \bar{\tilde{x}}_m) \right)^2}{\sum_{i=1}^{n} (\tilde{y}_i - \bar{\tilde{y}})^2 \sum_{i=1}^{n} (\tilde{x}_i - \bar{\tilde{x}})^2},$$

where $\tilde{x}_i = \sum_{m=1}^{M} w_m \tilde{x}_{im}$. Because rare variants are essentially independent, we have

$$\sum_{i=1}^{n} (\tilde{x}_i - \bar{\tilde{x}})^2 = \sum_{m=1}^{M} \sum_{l=1}^{M} w_m w_l \sum_{i=1}^{n} (\tilde{x}_{im} - \bar{\tilde{x}}_m)(\tilde{x}_{il} - \bar{\tilde{x}}_l) \approx \sum_{m=1}^{M} w_m^2 \sum_{i=1}^{n} (\tilde{x}_{im} - \bar{\tilde{x}}_m)^2.$$

Let

$$a_m = \frac{\sum_{i=1}^{n} (\tilde{y}_i - \bar{\tilde{y}})(\tilde{x}_{im} - \bar{\tilde{x}}_m)}{\sqrt{\sum_{i=1}^{n} (\tilde{x}_{im} - \bar{\tilde{x}}_m)^2}} \quad \text{and} \quad u_m = w_m \sqrt{\sum_{i=1}^{n} (\tilde{x}_{im} - \bar{\tilde{x}}_m)^2}.$$

Substituting these into $S(w_1, \ldots, w_M)$ gives $S \approx n \left( \sum_{m=1}^{M} u_m a_m \right)^2 / \left( \sum_{i=1}^{n} (\tilde{y}_i - \bar{\tilde{y}})^2 \sum_{m=1}^{M} u_m^2 \right)$, which by the Cauchy-Schwarz inequality is maximized when $u_m = a_m$. The optimal weight is therefore given by

$$w_m^o = \frac{\sum_{i=1}^{n} (\tilde{y}_i - \bar{\tilde{y}})(\tilde{x}_{im} - \bar{\tilde{x}}_m)}{\sum_{i=1}^{n} (\tilde{x}_{im} - \bar{\tilde{x}}_m)^2}.$$

Let $\tilde{x}_i^o = \sum_{m=1}^{M} w_m^o \tilde{x}_{im}$. Then

$$S_0(w_1^o, \ldots, w_M^o) = n \frac{\sum_{i=1}^{n} (\tilde{y}_i - \bar{\tilde{y}})(\tilde{x}_i^o - \bar{\tilde{x}}^o)}{\sum_{i=1}^{n} (\tilde{y}_i - \bar{\tilde{y}})^2}.$$

We propose the new test statistic TOW to test the effect of the optimally weighted combination of variants $\sum_{m=1}^{M} w_m^o x_{im}$ as

$$T_T = \sum_{i=1}^{n} (\tilde{y}_i - \bar{\tilde{y}})(\tilde{x}_i^o - \bar{\tilde{x}}^o).$$

$T_T$ is equivalent to $S_0(w_1^o, \ldots, w_M^o)$ because $\sum_{i=1}^{n} (\tilde{y}_i - \bar{\tilde{y}})^2$ is a constant.

The optimal weight $w_m^o$ puts large weights on the variants that have strong associations with the trait of interest and adjusts for the direction of the association. It also puts large weights on rare variants, because the denominator of $w_m^o$ is small when the minor allele is rare. TOW therefore targets rare variants and will lose power when testing for the effect of both rare and common variants. For testing the effects of both rare and common variants, we propose a new statistic, VW-TOW. We divide variants into rare (minor allele frequency [MAF] < the rare variant threshold [RVT]) and common (MAF > RVT), and apply TOW to the rare and common variants separately.

Define the test statistic of VW-TOW as $T_{VW-T} = \min_{0 \le \lambda \le 1} p_\lambda$, where $p_\lambda$ is the p value of

$$T_\lambda = \lambda \frac{T_r}{\sqrt{\mathrm{var}(T_r)}} + (1 - \lambda) \frac{T_c}{\sqrt{\mathrm{var}(T_c)}},$$

and $T_r$ and $T_c$ denote the test statistics of TOW for rare and common variants, respectively. Here, we evaluate the minimization by dividing the interval $[0, 1]$ into $K$ subintervals of equal length. Let $\lambda_k = k/K$ for $k = 0, 1, \ldots, K$. Then $\min_{0 \le \lambda \le 1} p_\lambda$ is evaluated as $\min_{0 \le k \le K} p_{\lambda_k}$.

We use permutation tests to evaluate p values of both $T_T$ and $T_{VW-T}$. To evaluate the p value of the test $T_T$, let $T_T^0$ denote the value of the test statistic based on the original data set. For each permutation, we randomly resample from the residuals of the trait values and denote the value of the test statistic based on the permuted data set by $T_T^{per}$. We perform the permutation procedure many times. Then the p value of the test is the proportion of permutations with $T_T^{per} > T_T^0$.

We perform $B$ permutations to evaluate the p value of $T_{VW-T}$. Let $T_r^{(b)}$ and $T_c^{(b)}$ denote the values of $T_r$ and $T_c$ based on the $b$th permuted data, where $b = 0$ represents the original data. Based on $T_r^{(b)}$ and $T_c^{(b)}$ ($b = 0, 1, \ldots, B$), we can calculate $T_{\lambda_k}^{(b)}$ for $b = 0, 1, \ldots, B$ and $k = 0, 1, \ldots, K$, where $\mathrm{var}(T_r)$ and $\mathrm{var}(T_c)$ are estimated using $T_r^{(b)}$ and $T_c^{(b)}$ ($b = 1, \ldots, B$). Then, we transfer $T_{\lambda_k}^{(b)}$ to $p_{\lambda_k}^{(b)}$ by

$$p_{\lambda_k}^{(b)} = \frac{\sum_{i=0}^{B} I\left( T_{\lambda_k}^{(i)} > T_{\lambda_k}^{(b)} \right)}{B},$$

where $I(\cdot)$ is the indicator function.
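The TOW statistic and its permutation p value described in Methods can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`residualize`, `tow_statistic`, `tow_permutation_pvalue`) are hypothetical, covariate adjustment is done by ordinary least squares with an intercept, and "resampling from residuals of trait values" is taken to mean shuffling the trait residuals across individuals. VW-TOW is not covered.

```python
import numpy as np


def residualize(values, covariates):
    """Residuals of a least-squares regression of `values` on covariates plus intercept.

    `values` may be a trait vector (n,) or a genotype matrix (n, M).
    """
    design = np.column_stack([np.ones(len(covariates)), covariates])
    coef, *_ = np.linalg.lstsq(design, values, rcond=None)
    return values - design @ coef


def tow_statistic(y_res, x_res):
    """Return (T_T, optimal weights) from trait and genotype residuals."""
    yc = y_res - y_res.mean()
    xc = x_res - x_res.mean(axis=0)
    cross = xc.T @ yc                      # sum_i (y_i - ybar)(x_im - xbar_m), per variant
    sum_sq = (xc ** 2).sum(axis=0)         # sum_i (x_im - xbar_m)^2, per variant
    # Optimal weight w_m^o = cross / sum_sq; monomorphic variants get weight 0 (assumption).
    weights = np.divide(cross, sum_sq, out=np.zeros_like(cross), where=sum_sq > 0)
    x_opt = xc @ weights                   # centered optimally weighted combination
    return float(yc @ x_opt), weights


def tow_permutation_pvalue(y_res, x_res, n_perm=1000, seed=0):
    """Proportion of permutations whose statistic exceeds the observed one."""
    rng = np.random.default_rng(seed)
    t_obs, _ = tow_statistic(y_res, x_res)
    exceed = 0
    for _ in range(n_perm):
        t_perm, _ = tow_statistic(rng.permutation(y_res), x_res)
        exceed += t_perm > t_obs
    return exceed / n_perm
```

In a sliding-window analysis, `x_res` would hold the residualized genotypes of the variants in one window, and the two functions would be called once per window. Because the weights are recomputed inside every permutation, the data-driven choice of weights is accounted for in the null distribution.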