Large‑Scale Whole‑Genome Sequencing of Three Diverse Asian Populations in Singapore
Total Page:16
File Type:pdf, Size:1020Kb
This document is downloaded from DR‑NTU (https://dr.ntu.edu.sg) Nanyang Technological University, Singapore. Large‑scale whole‑genome sequencing of three diverse Asian populations in Singapore Wu, Degang; Dou, Jinzhuang; Chai, Xiaoran; Bellis, Claire; Wilm, Andreas; Shih, Chih Chuan; Soon, Wendy Wei Jia; Bertin, Nicolas; Lin, Clarabelle Bitong; Khor, Chiea Chuen; DeGiorgio, Michael; Cheng, Shanshan; Bao, Li; Karnani, Neerja; Hwang, William Ying Khee; Davila, Sonia; Tan, Patrick; Shabbir, Asim; Moh, Angela; ...Wang, Chaolong 2019 Wu, D., Dou, J., Chai, X., Bellis, C., Wilm, A., Shih, C. C., Soon, W. W. J., Bertin, N., Lin, C. B., Khor, C. C., DeGiorgio, M., Cheng, S., Bao, L., Karnani, N., Hwang, W. Y. K., Davila, S., Tan, P., Shabbir, A., Moh, A., ...Wang, C. (2019). Large‑scale whole‑genome sequencing of three diverse Asian populations in Singapore. Cell, 179(3), 736‑749. https://dx.doi.org/10.1016/j.cell.2019.09.019 https://hdl.handle.net/10356/150428 https://doi.org/10.1016/j.cell.2019.09.019 © 2019 Elsevier Inc. All rights reserved. This paper was published in Cell and is made available with permission of Elsevier Inc. Downloaded on 01 Oct 2021 10:58:27 SGT 1 Large-scale whole-genome sequencing of three diverse Asian populations in Singapore 2 3 Degang Wu1,2,36, Jinzhuang Dou1,36, Xiaoran Chai1,36, Claire Bellis3,4, Andreas Wilm5, Chih Chuan 4 Shih5, Wendy Wei Jia Soon6, Nicolas Bertin1, Clarabelle Bitong Lin3, Chiea Chuen Khor3, Michael 5 DeGiorgio7,8, Shanshan Cheng2, Li Bao2, Neerja Karnani9,10, Sonia Davila11,12, Patrick Tan11,13,14,15, 6 Asim Shabbir16, Angela Moh17, Eng-King Tan18,19, Jia Nee Foo3,20, Liuh Ling Goh21, Khai Pang 7 Leong22, Roger S.Y. Foo3,23,24, Carolyn Su Ping Lam23,25,26,27, Arthur Mark Richards23,24,28, Ching-Yu 8 Cheng29,30,31, Tin Aung29,30,31, Tien Yin Wong29,30,31, Huck Hui Ng10,32,33,34, Jianjun Liu3,24,*, and 9 Chaolong Wang2,1,37,* on behalf of the SG10K Consortium35 10 11 1 Computational and Systems Biology, Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore 138672, 12 Singapore 13 2 Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and 14 Technology, Wuhan, Hubei 430030, China 15 3 Human Genetics, Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore 138672, Singapore 16 4 Genomics Research Centre, Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane 4000, 17 Australia 18 5 Bioinformatics Core, Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore 138672, Singapore 19 6 Integrated Genome Analysis Platform (iGAP), Genome Institute of Singapore, Agency for Science, Technology and Research, 20 Singapore 138672, Singapore 21 7 Departments of Biology and Statistics, Pennsylvania State University, University Park, PA 16802, USA 22 8 Institute for CyberScience, Pennsylvania State University, University Park, PA 16802, USA 23 9 Singapore Institute for Clinical Sciences, Agency for Science, Technology and Research, Singapore 117609, Singapore 24 10 Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117597, Singapore 25 11 SingHealth Duke-NUS Institute of Precision Medicine, Singapore 169856, Singapore 26 12 Cardiovascular and Metabolic Disorders Program, Duke-National University of Singapore Medical School, Singapore 169857, 27 Singapore 28 13 Cancer and Stem Biology Program, Duke-National University of Singapore Medical School, Singapore 169857, Singapore 29 14 Cancer Science Institute of Singapore, National University of Singapore, Singapore 117599, Singapore 30 15 Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore 138672, Singapore 31 16 Department of Surgery, National University Health System, Singapore 119228, Singapore 32 17 Clinical Research Unit, Khoo Teck Puat Hospital, Singapore 768828, Singapore 33 18 Department of Neurology, National Neuroscience Institute, Singapore General Hospital Campus, Singapore 169608, Singapore 34 19 Neuroscience and Behavioural Disorders Program, Duke-National University of Singapore Medical School, Singapore 169857, 35 Singapore 36 20 Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore 308232, Singapore 37 21 Personalised Medicine Service, Tan Tock Seng Hospital, Singapore 308433, Singapore 38 22 Clinical Research and Innovation Office, Tan Tock Seng Hospital, Singapore 308433, Singapore 39 23 Cardiovascular Research Institute, National University Heart Centre, Singapore 119074, Singapore 40 24 Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117597, Singapore 41 25 Department of Cardiology, National Heart Centre Singapore, Singapore 169609, Singapore 42 26 Cardiovascular Sciences Academic Clinical Program, Duke-National University of Singapore Medical School, Singapore 169857, 43 Singapore 44 27 Department of Cardiology, University Medical Center Groningen, Groningen 9713GZ, the Netherlands 45 28 Christchurch Heart Institute, University of Otago, Christchurch 8011, New Zealand 46 29 Singapore Eye Research Institute, Singapore National Eye Centre, Singapore 169856, Singapore 47 30 Ophthalmology & Visual Sciences Academic Clinical Program (Eye ACP), Duke-National University of Singapore Medical School, 48 Singapore 169857, Singapore 49 31 Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117597, Singapore 50 32 Stem Cell and Regenerative Biology, Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore 51 138672 52 33 Department of Biological Sciences, National University of Singapore, Singapore 117558, Singapore 53 34 School of Biological Sciences, Nanyang Technological University, Singapore 637551, Singapore 54 35 Additional authors of the SG10K Consortium are listed in the Supplemental Note 55 36 These authors contributed equally 56 37 Lead Contact 57 * Correspondence: [email protected] (C.W.), [email protected] (J.L.) 58 59 60 1 1 Summary 2 3 Underrepresentation of Asian genomes has hindered population and medical genetics research on 4 Asians, leading to population disparities in precision medicine. By whole-genome sequencing of 4,810 5 Singapore Chinese, Malays, and Indians, we found 98.3 million SNPs and small insertions/deletions, 6 over half of which are novel. Population structure analysis demonstrated great representation of Asian 7 genetic diversity by three ethnicities in Singapore, and revealed a Malay-related novel ancestry 8 component. Furthermore, demographic inference suggested that Malays split from Chinese ~24,800 9 years ago, and experienced significant admixture with East Asians ~1,700 years ago, coinciding with 10 the Austronesian expansion. Additionally, we identified 20 candidate loci for natural selection, among 11 which 14 harbored robust associations with complex traits and diseases. Finally, we showed that our 12 data can substantially improve genotype imputation in diverse Asian and Oceanian populations. These 13 results highlight the value of our data as a resource to empower human genetics discovery across broad 14 geographic regions. 15 2 1 Introduction 2 3 We have gained profound insights into the population history and the genetic basis of phenotype 4 diversity and disease etiology by mining variation in human genomes (Fan et al., 2016; MacArthur et 5 al., 2017; Nielsen et al., 2017; Timpson et al., 2018). While earlier efforts were mostly based on array 6 genotyping of common variants, whole-genome sequencing (WGS) has become a powerful approach 7 for understanding the human evolutionary and migration history and discovering disease-related 8 genetic variants by surveying the genome in an unbiased and comprehensive fashion. A remarkable 9 milestone is the 1000 Genomes Project (1KGP), which sequenced >2,500 genomes from 26 10 populations at ~7.4× and cataloged over 88 million variants (The 1000 Genomes Project Consortium, 11 2015). The resource provided by 1KGP has empowered numerous genome-wide association studies 12 (GWAS) through imputation, allowing for detection of genetic association at low frequency variants 13 and thus enabling a deeper understanding of the genetic architecture of complex diseases (Das et al., 14 2016; Timpson et al., 2018). The direct assessment of causal variants enabled by large-scale 15 sequencing analysis has led to the convergence of population and clinical genetics (Ashley, 2016; 16 MacArthur et al., 2014), where population genetic information has become the cornerstone of precision 17 medicine, with applications in the diagnosis of Mendelian diseases (Manrai et al., 2016; Rehm et al., 18 2015), optimization of medication usage (Relling and Evans, 2015), drug development (Nelson et al., 19 2015), and disease risk prediction (Chatterjee et al., 2016). 20 Comparing across populations, most genetic variants are rare and population-specific, 21 highlighting the importance of characterizing local population diversity (Hindorff et al., 2018; Manrai 22 et al., 2016). Consequently, many countries have initiated population-based WGS studies 23 (Gudbjartsson et al., 2015; The GoNL Consortium, 2014; The UK10K Consortium, 2015). 24 Nevertheless, the Eurocentric biases in human genetics research have caused intensifying concerns 25 about the exacerbation of health disparities during the implementation of precision medicine (Martin