Identity-By-Descent Estimation with Population- and Pedigree-Based Imputation in Admixed Family Data Mohamad Saad
Total Page:16
File Type:pdf, Size:1020Kb
Marshall University Marshall Digital Scholar Biochemistry and Microbiology Faculty Research 2016 Identity-by-descent estimation with population- and pedigree-based imputation in admixed family data Mohamad Saad Alejandro Q. Nato Jr. Marshall University, [email protected] Fiona L. Grimson Steven M. Lewis Lisa A. Brown See next page for additional authors Follow this and additional works at: https://mds.marshall.edu/sm_bm Part of the Genetic Phenomena Commons, Genetic Processes Commons, and the Genetic Structures Commons Recommended Citation Saad M, Nato AQ, Grimson FL, et al. Identity-by-descent estimation with population- and pedigree-based imputation in admixed family data. BMC Proceedings. 2016;10(Suppl 7):295-301. doi:10.1186/s12919-016-0046-5. This Conference Proceeding is brought to you for free and open access by the Faculty Research at Marshall Digital Scholar. It has been accepted for inclusion in Biochemistry and Microbiology by an authorized administrator of Marshall Digital Scholar. For more information, please contact [email protected], [email protected]. Authors Mohamad Saad, Alejandro Q. Nato Jr., Fiona L. Grimson, Steven M. Lewis, Lisa A. Brown, Elizabeth M. Blue, Timothy A. Thornton, Elizabeth A. Thompson, and Ellen M. Wijsman This conference proceeding is available at Marshall Digital Scholar: https://mds.marshall.edu/sm_bm/238 The Author(s) BMC Proceedings 2016, 10(Suppl 7):7 DOI 10.1186/s12919-016-0046-5 BMC Proceedings PROCEEDINGS Open Access Identity-by-descent estimation with population- and pedigree-based imputation in admixed family data Mohamad Saad1,2, Alejandro Q. Nato Jr2, Fiona L. Grimson3, Steven M. Lewis3, Lisa A. Brown1, Elizabeth M. Blue2, Timothy A. Thornton1, Elizabeth A. Thompson3 and Ellen M. Wijsman1,2,4* From Genetic Analysis Workshop 19 Vienna, Austria. 24-26 August 2014 Abstract Background: In the past few years, imputation approaches have been mainly used in population-based designs of genome-wide association studies, although both family- and population-based imputation methods have been proposed. With the recent surge of family-based designs, family-based imputation has become more important. Imputation methods for both designs are based on identity-by-descent (IBD) information. Apart from imputation, the use of IBD information is also common for several types of genetic analysis, including pedigree-based linkage analysis. Methods: We compared the performance of several family- and population-based imputation methods in large pedigrees provided by Genetic Analysis Workshop 19 (GAW19). We also evaluated the performance of a new IBD mapping approach that we propose, which combines IBD information from known pedigrees with information from unrelated individuals. Results: Different combinations of the imputation methods have varied imputation accuracies. Moreover, we showed gains from the use of both known pedigrees and unrelated individuals with our IBD mapping approach over the use of known pedigrees only. Conclusions: Our results represent accuracies of different combinations of imputation methods that may be useful for data sets similar to the GAW19 pedigree data. Our IBD mapping approach, which uses both known pedigree and unrelated individuals, performed better than classical linkage analysis. Background on a dense single nucleotide polymorphism (SNP) In the last few years, imputation approaches have been panel, (b) genotype remaining individuals on a sparse widely used in genetic studies, especially in the context SNP panel, and (c) use statistical methods for imput- of genome-wide association studies and population- ation, generally based on hidden Markov models, to based designs. This is a result of an attractive feature of impute untyped SNPs from the dense panel into these imputation: it increases genetic information at low-cost. individuals with only sparse (or no) genotyped SNPs. Moreover, imputation allows meta-analysis between dif- Imputation approaches have been proposed for both ferent studies, especially when different genotyping family- and population-based designs. However, the platforms are used. The main idea of imputation is to: performance of these approaches has not yet been thor- (a) genotype or sequence a small subset of individuals oughly evaluated and compared in pedigree data. The availability of the International Haplotype Map Project * Correspondence: [email protected] (HapMap) [1] and the 1000 Genomes Project [2] have 1 Department of Biostatistics, University of Washington, Seattle, WA, USA allowed population-based imputation to become wide- 2Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA spread. However, these databases are not ideal for Full list of author information is available at the end of the article © 2016 The Author(s). Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. The Author(s) BMC Proceedings 2016, 10(Suppl 7):7 Page 296 of 415 family-based imputation methods, which require that estimation within pedigrees, required for pedigree- the dense SNP panel individuals be selected from pedi- based imputation, we used a set of 351 well-spaced grees. In addition, imputation approaches for large ped- SNPs (Marker Set 2 [MS-2]; Table 1) extracted from igrees are more challenging, and only quite recently has the SNP data of all genotyped individuals in these 16 an approach been proposed and implemented in the pedigrees. Physical positions were converted to meiotic program called Genotype Imputation Given Inheritance map positions (cM) by reference to the Rutgers map (GIGI) [3] that is able to efficiently handle large pedi- [6], and converted to positions based on the Haldane grees and accurately impute rare variants. The search map function. For IBD mapping analysis based on a for rare risk variants has made imputation in family- meiotic map, with the knowledge of the trait simula- based designs more important, because of enrichment tion model provided in the “Answers” file obtained of such variants in relatively large pedigrees [4] since with the data, MS-2 was used to compute IBD within linkage analysis may be powerful for initiating rare vari- pedigrees. We focused the IBD mapping analysis on 7 ant identification. An attractive study design is to (a) pedigrees (Pedigrees 05, 06, 07, 08, 10, 21, 25; total perform linkage analysis, (b) perform imputation in the number of 529 individuals) that showed high levels of region(s) with linkage signals, and (c) perform associ- relatedness on the basis of Genetic Analysis Workshop ation analysis on the imputation data in these regions. 18 analysis [7]. We selected 17 pairs of individuals (21 It is well-established that larger pedigrees are advanta- distinct individuals) who had shown high levels of pair- geous for this linkage analysis component. wise kinship and were not related by the pedigree infor- Identity-by-descent (IBD) underpins both linkage ana- mation. For these 21 cryptically related individuals, we lysis and imputation. For linkage analysis, once IBD has used Marker Set 3 (MS-3) (Table 1) to infer IBD among been determined, the pedigree structure is no longer “unrelated” individuals. needed for subsequent computations. For imputation, IBD within pedigrees or (ancestrally) across individuals Selection of dense SNP panel individuals for imputation is used as a source of correlation between individuals for Among the 464 sequenced individuals, we chose to family- and population-based imputation. include 100 or 200 of them as if genotyped on the In this study, we evaluated and compared the perform- dense SNP panel. In proportion to the number of ance of several family- and population-based imputation sequenced individuals in each pedigree, we selected approaches in the Mexican American pedigree data pro- individuals following 2 strategies: (a) random selec- vided by Genetic Analysis Workshop 19 (GAW19). We tion or (b) via GIGI-Pick (a method to help with in- also evaluated the effect on imputation accuracy of the formative choices) [5]. For GIGI-Pick, we used the number of selected individuals for sequencing and the genome-wide coverage option, which selects subjects way they were selected, randomly or by tailored selection based on the pedigree structure. In the group of the with GIGI-Pick [5]. In addition, we evaluated the per- remaining 264 or 364 individuals, we extracted geno- formance of a new IBD mapping approach that we types of SNPs present in the SNP chip of our region propose, which combines IBD information inferred (a) of interest (~700 SNPs), to form the set of sparse using a sparse SNP panel from known pedigrees, and (b) SNP panel individual data set. using a dense SNP panel from unrelated individuals across pedigrees. Population-based imputation Weperformed2-stepimputation,withpre-phasingof Methods markers in the first step, and imputation in the second Data step. The pre-phasing algorithms/programs we used All our analyses used chromosome 3. For population- were: BEAGLE [8], IMPUTE2 [9], MaCH [10], and