Fast Identity by Descent Detection Across 500,000 UK Biobank Samples Reveals Recent Evolutionary History and Population Structure
PgmNr 99: Fast identity by descent detection across 500,000 UK Biobank samples reveals recent evolutionary history and population structure.
Authors: J. Nait Saada 1; A. Gusev 2,3; P.F. Palamara 1
Affiliations: 1) Department of Statistics, University of Oxford, Oxford, Oxfordshire, United Kingdom; 2) Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; 3) Brigham & Women’s Hospital, Division of Genetics, Boston, MA 02215, USA
Detection of identical-by-descent (IBD) segments provides a fundamental measure of genetic relatedness and plays a key role in a wide range of genomic analyses. We developed a new method, called FastSMC, that enables accurate biobank-scale detection of IBD segments transmitted by common ancestors living up to several hundred generations in the past. FastSMC combines a fast heuristic search for IBD segments with accurate coalescent-based likelihood calculations, and enables estimation of the age of the common ancestors transmitting IBD regions. We used coalescent simulations to verify that FastSMC is more accurate than existing methods in detecting IBD regions within 25, 50, 100, 150 and 200 generations (e.g. within the past 100 generations, the area under the precision-recall curve improves by 10% over RefinedIBD, 11% over GERMLINE and 80% over RaPID, after fine-tuning all methods), while requiring only marginally more time than GERMLINE, the most scalable method. We applied FastSMC to 487,409 phased British samples from the UK Biobank. We detected ~217 billion IBD segments transmitted by shared ancestors within the past 50 generations, obtaining a fine-grained picture of genetic relatedness within the past two millennia in the UK. We reconstructed region-specific effective population size within the past 50 generations, detecting substantially smaller recent effective size in the North of the country.
After excluding close relatives (≤3rd degree cousins), the sharing of recent ancestry remained highly predictive of geographic co-localization, enabling us to estimate the birth coordinates of a random sample with an average error of 91 km (K-nearest-neighbors using the closest 5 individuals), a 68% improvement over standard genomic correlation. We sought evidence of recent positive selection by identifying loci with an unusually high density of coalescence times within the past 50 generations, summarized by the density of recent coalescence (DRC) statistic. We detected 12 genome-wide significant signals, including 5 loci with previous evidence of positive selection (e.g. LCT, HLA and LDLR) and 7 novel loci (including MRC1, associated with cardiovascular disease). Furthermore, the DRC statistic along the genome was significantly correlated with summary association statistics for LDL (p < 0.01 after adjusting for MAF and LD), consistent with recent evolutionary pressure on the trait. These results underscore the presence of subtle population structure and the widespread action of natural selection during recent millennia.
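The K-nearest-neighbors prediction of birth coordinates mentioned above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names (`predict_birth_coords`, `mean_prediction_error`), the pairwise `sharing` matrix, the toy coordinates in km, and the rule that sharing decays with distance are all assumptions made for illustration. The idea shown is only the general one: for a target sample, take the k samples with which it shares the most IBD and average their coordinates.

```python
import numpy as np

def predict_birth_coords(sharing, coords, target, k=5):
    """Average the coordinates of the k samples sharing the most IBD with `target`.

    `sharing` is a hypothetical (n, n) matrix of pairwise IBD sharing and
    `coords` an (n, 2) array of planar coordinates in km.
    """
    scores = np.asarray(sharing, dtype=float)[target].copy()
    scores[target] = -np.inf                 # the target is never its own neighbor
    neighbors = np.argsort(scores)[-k:]      # indices of the k largest sharing values
    return np.asarray(coords, dtype=float)[neighbors].mean(axis=0)

def mean_prediction_error(sharing, coords, k=5):
    """Mean Euclidean distance (km) between predicted and true coordinates."""
    coords = np.asarray(coords, dtype=float)
    errors = [
        np.linalg.norm(predict_birth_coords(sharing, coords, i, k) - coords[i])
        for i in range(len(coords))
    ]
    return float(np.mean(errors))

# Toy data (assumption): two geographic clusters, sharing decays with distance.
coords = np.array(
    [[0, 0], [10, 0], [0, 10], [10, 10], [500, 500], [510, 500]], dtype=float
)
pairwise_km = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
sharing = 1000.0 / (1.0 + pairwise_km)

predicted = predict_birth_coords(sharing, coords, target=0, k=3)
print(predicted)                              # approximately [6.67 6.67]
print(mean_prediction_error(sharing, coords, k=3))
```

In this toy setting, sample 0 is placed at the centroid of its three nearest relatives in the same cluster. With real data, close relatives would first be excluded, as the abstract describes, and coordinates would need a projection in which Euclidean distance is meaningful.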