Multipoint Association Mapping for Longitudinal Family Data: an Application to Hypertension Phenotypes Yen-Feng Chiu1*, Chun-Yi Lee1 and Fang-Chi Hsu2
Total Page:16
File Type:pdf, Size:1020Kb
The Author(s) BMC Proceedings 2016, 10(Suppl 7):54 DOI 10.1186/s12919-016-0049-2 BMC Proceedings PROCEEDINGS Open Access Multipoint association mapping for longitudinal family data: an application to hypertension phenotypes Yen-Feng Chiu1*, Chun-Yi Lee1 and Fang-Chi Hsu2 From Genetic Analysis Workshop 19 Vienna, Austria. 24-26 August 2014 Abstract It is essential to develop adequate statistical methods to fully utilize information from longitudinal family studies. We extend our previous multipoint linkage disequilibrium approach—simultaneously accounting for correlations between markers and repeat measurements within subjects, and the correlations between subjects in families—to detect loci relevant to disease through gene-based analysis. Estimates of disease loci and their genetic effects along with their 95 % confidence intervals (or significance levels) are reported. Four different phenotypes—ever having hypertension at 4 visits, incidence of hypertension, hypertension status at baseline only, and hypertension status at 4 visits—are studied using the proposed approach. The efficiency of estimates of disease locus positions (inverse of standard error) improves when using the phenotypes from 4 visits rather than using baseline only. Background so estimating the position of a disease locus and its stand- Approaches for analyzing longitudinal family data have ard error is robust to a wide variety of genetic mecha- been categorized into 2 groups [1]: (a) first summarizing nisms; (b) it provides estimates of disease locus positions, repeated measurements into 1 statistic (eg, a mean or along with a confidence interval for further fine mapping; slope per subject) and then using the summarized statis- and (c) it uses linkage disequilibrium between markers to tic as a standard outcome for genetic analysis; or (b) localize the disease locus, which may not have been typed. simultaneous modeling of genetic and longitudinal pa- We extended this approach to map susceptibility genes rameters. In general, joint modeling is appealing because using longitudinal nuclear family data with an application (a) all parameter estimates are mutually adjusted, and to hypertension. Four different outcomes were used based (b) within- and between-individual variability at the on the proposed method: (I) ever having hypertension levels of gene markers, repeat measurements, and family (“Ever”), (II) incidence event with status changed from un- characteristics are correctly accounted for [1]. affected to affected (“Progression”), (III) first available visit The semiparametric linkage disequilibrium mapping for as baseline only (“Baseline”), and (IV) all available time the hybrid family design we developed previously [2] uses points (“Longitudinal”). We compared the estimates of the all markers simultaneously to localize the disease locus disease locus positions, their standard errors, the genetic without making an assumption about genetic mechanism, effect estimate at the disease loci, and their significance except that only 1 disease gene lies in the region under for the 4 phenotypes to examine the efficiency gained study. The advantages of this approach are (a) it does not from using repeated longitudinal phenotypes. require the specification of an underlying genetic model, Methods * Correspondence: [email protected] Genome-wide genotypes and phenotype data 1Institute of Population Health Sciences, National Health Research Institutes, Miaoli, 35053, Taiwan, Republic of China Association mapping was conducted on chromosome 3 Full list of author information is available at the end of the article of the genome-wide association study (GWAS) data. A © 2016 The Author(s). Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. The Author(s) BMC Proceedings 2016, 10(Suppl 7):54 Page 316 of 415 total of 65,519 single-nucleotide polymorphisms and (SNPs) included in 1095 genes were genotyped on D ‐1 if the transmitted paternal allele at t is HtðÞ Y kilðÞ¼t ; chromosome 3 for 959 individuals from 20 original 1 0 if the transmitted paternal allele at t is htðÞ pedigrees in Genetic Analysis Workshop 19 (GAW19). D ‐ t HtðÞ Y kilðÞ¼t 1 if the nontransmitted paternal allele at is ; Of these individuals, there were 178 (38 %) affected 2 0 if thenontransmittedpaternalalleleatt ishtðÞ “ ” D offspring out of 469 offspring for phenotype (I) Ever ; for the unaffected offspring kil . Then, we define the prefe- D D 130 (31 %) out of 421 offspring for phenotype (II) kil kil rential transmission statistic Y T ðÞ¼t Y ðÞt −Y ðÞt for “ ” kil 1 2 Progression ; 64 (11 %) out of 600 offspring for D D kil kil “ ” XT ðÞ¼t X ðÞt −X ðÞt phenotype (III) Baseline ; and 60 (11 %) out of 565 the paternal side and kil 1 2 for the offspring to approximately 85 (45 %) out of 189 off- maternal side for a trio; similarly, the preferential D D spring across the 4 visits (or 87 [21.63 %] out of 402 kil kil transmission statistic Y U ðÞ¼t Y ðÞt −Y ðÞt and offspring on average) for phenotype (IV) “Longitu- kil 1 2 D D kil kil dinal” (Table 1). To compare phenotypes (I) and (II), XU ðÞ¼t X ðÞt −X ðÞt kil 1 2 for an unaffected trio for only individuals with at least 2 measurements were in- both parental sides, respectively, where kil =1,…, cluded in the “ever” phenotype. PedCut [3] was used to N1il (for unaffected), N1il (N2il)isthenumberof split large pedigrees with members more than 20 affected (unaffected) offspring in the family i at the members into nuclear pedigrees. Consequently, we an- lth time point, i =1,… n, l =1,…, L (L=1or4inthis alyzed a total of 138 pedigrees with 1,495 individuals study). (the IDs for missing parents were added to form trios). The expectation of the statistic is μ k ðÞ¼δ; π hi 1 il j In divided pedigrees, the nuclear families contained be- ÀÁ ÀÁÀÁN EYT tj jΦ ¼ − θt ; τ C −θt ; τ πj for case- tween 3 and 25 individuals. Five SNPs were removed kil 1 1 2 j 1 j hiÀÁ because they failed the test of Hardy-Weinberg equi- μ ðÞ¼δ; π EYU tj jΦ ¼ − parent trios and 2 kil j k 2 librium (HWE) (p value < 10 4). The HWE test was ÀÁÀÁ il − θ Cà −θ N π performed using PLINK 1.07 [4] based on 56 unrelated 1 2 tj; τ 1 tj; τ j for control-parent trios, θ subjects. (For information on PLINK, see http:// where tj; τ is the recombination fraction between pngu.mgh.harvard.edu/purcell/plink/.) A total of 22,056 marker position tj and disease locus position τ,the genotypes from various SNPs with genotyping errors recombination fraction Θ is a parametric function −4 (genotyping error rate was around 3.51 × 10 )werefur- of the parameter of primary interest (τ,thephys- ther excluded by the MERLIN 1.1.2 computing package ical position of the functional variant), N is the (see http://www.sph.umich.edu/csg/abecasis/merlin/tour/ number of generations since the initiation of the linkage.html). None of the covariates was adjusted for in disease variant, Φ1 denotes the event that the this approach. Φ offspring is affected, 2 representshi the event that the offspring is unaffected, C ¼ EYT ðÞτ jΦ ¼ E hihihikil 1 Multipoint linkage disequilibrium mapping à XT ðÞτ jΦ , C ¼ EYU ðÞτ jΦ ¼ EXU ðÞτ jΦ ; δ Suppose M markers were genotyped in the region R at loca- kil 1 kil 2 kil 2 à tions of 0 ≤ t1 < t2 < … < tM ≤ T. We assume there are 2 al- ¼ ðÞτ; N; C; C is the vector of parameters, and πj = H t h t h τ μ leles per marker. With ( ) being the target allele at marker Pr [ ( j)| ( )]. 1kil j is the probability for an affected t h t position ,and ( ) being the nontarget allele, we define offspring to receive a target allele, and −μ is the 2kil j t HtðÞ Dk 1 if the transmitted paternal allele at is probability for an unaffected offspring to receive a Y ilðÞ¼t ; 1 t htðÞ 0 if the transmitted paternal allele at is target allele. The statistic Z k ¼ XTk þ Y Tk and 1 il j il j il j D t HtðÞ Z ¼ X þ Y Y kilðÞ¼t 1 if the nontransmitted paternal allele at is ; 2kil j Ukil j Ukil j were used to estimate the pa- 2 t htðÞ 0 if the nontransmitted paternal allele at is rameters. The estimating equations used to solve D δ for the affected offspring kil , for parameters are: Table 1 Number of offspring for different phenotypes Ever Progression Baseline Visit 1 Visit 2 Visit 3 Visit 4 Affected offspring 178 130 64 60 78 125 85 All offspring 469 421 600 565 426 429 189 Percentage 0.38 0.31 0.11 0.11 0.18 0.29 0.45 Number of nuclear families 174 149 213 203 168 165 79 The Author(s) BMC Proceedings 2016, 10(Suppl 7):54 Page 317 of 415 ÀÁ Xn XL XN il XM will present the details of this proposed methodology 1 ∂μ k j δ; π^j S ðÞ¼δ 1 il ð Þ elsewhere. 1 ∂δ 1 i¼ l¼ k j¼ 1 1 il¼1 1 Gene-based association mapping was conducted for all noÀÁ −1 SNPs on chromosome 3. This approach accounts for Cov Z k Z k −2μ δ; π^j ; 1 il j 1 il j 1kil j correlations between markers and repeated phenotypes ÀÁ n L N M within subjects, and correlations between subjects per X X X2il X ∂μ δ; π^j S ðÞ¼δ 2kil j ð Þ family.