bioRxiv preprint doi: https://doi.org/10.1101/2020.07.03.166413; this version posted July 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Fine-scale Population Structure and Demographic History of Han Chinese Inferred from Haplotype Network of 111,000 Genomes

Ao Lan1,†, Kang Kang2,1,†, Senwei Tang1,2,†, Xiaoli Wu1,†, Lizhong Wang1, Teng Li1, Haoyi Weng2,1, Junjie Deng1, WeGene Research Team1,2, Qiang Zheng1,2, Xiaotian Yao1,* & Gang Chen1,2,3,*

1 WeGene, Shenzhen Zaozhidao Technology Co., Ltd., Shenzhen 518042, China
2 Shenzhen WeGene Clinical Laboratory, Shenzhen 518118, China
3 Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China

† These authors contributed equally to this work.
* Correspondence: Xiaotian Yao:
[email protected] & Dr. Gang Chen:
[email protected]

ABSTRACT

Han Chinese is the most populous ethnic group in the world, with a complex substructure that mirrors its cultural diversification. Previous studies have characterized the spectrum of genetic polymorphism in Han Chinese, but high-resolution investigations that resolve its fine-scale substructure and trace the genetic imprints of its demographic history are still lacking. Here we construct a haplotype network consisting of 111,000 genome-wide genotyped Han Chinese individuals from direct-to-consumer genetic testing and over 1.3 billion identity-by-descent (IBD) links. We observed a clear separation of northern and southern Han Chinese and captured 5 subclusters and 17 sub-subclusters by hierarchical clustering of the haplotype network, corresponding to geography (especially mountain ranges), immigration waves, and clans with cultural-linguistic segregation.