Whole-genome sequencing of giant pandas provides insights into demographic history and local adaptation

Shancen Zhao, Pingping Zheng, Shanshan Dong, Xiangjiang Zhan, Qi Wu, Xiaosen Guo, Yibo Hu, Weiming He, Shanning Zhang, Wei Fan, Lifeng Zhu, Dong Li, Xuemei Zhang, Quan Chen, Hemin Zhang, Zhihe Zhang, Xuelin Jin, Jinguo Zhang, Huanming Yang, Jian Wang, Jun Wang & Fuwen Wei

Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China. BGI-Shenzhen, Shenzhen, China. College of Life Sciences, University of the Chinese Academy of Sciences, Beijing, China. Wildlife Conservation Society, Beijing, China. China Conservation and Research Center for the Giant Panda, Wolong, China. Chengdu Research Base of Giant Panda Breeding, Chengdu, China. Shaanxi Wild Animal Rescue and Research Center, Louguantai, Xi'an, China. Beijing Zoo, Beijing, China. Shenzhen Key Laboratory of Transomics Biotechnologies, BGI-Shenzhen, Shenzhen, China. Department of Biology, University of Copenhagen, Copenhagen, Denmark. These authors contributed equally to this work. Correspondence should be addressed to F.W. ([email protected]) or Jun Wang ([email protected]).

Received 13 March; accepted 14 November; published online 16 December 2012.

© 2012 Nature America, Inc. All rights reserved.

The panda lineage dates back to the late Miocene and ultimately leads to only one extant species, the giant panda (Ailuropoda melanoleuca). Although global climate change and anthropogenic disturbances are recognized to shape animal population demography, their contribution to panda population dynamics remains largely unknown. We sequenced the whole genomes of 34 pandas at an average 4.7-fold coverage and used this data set together with the previously deep-sequenced panda genome to reconstruct a continuous demographic history of pandas from their origin to the present. We identify two population expansions, two bottlenecks and two divergences. Evidence indicated that, whereas global changes in climate were the primary drivers of population fluctuation for millions of years, human activities likely underlie recent population divergence and serious decline. We identified three distinct panda populations that show genetic adaptation to their environments. However, in all three populations, anthropogenic activities have negatively affected pandas for 3,000 years.

We carried out whole-genome resequencing of 34 wild giant pandas (Fig. 1a and Supplementary Table 1). This sample constitutes ~2% of the current estimates of the entire wild panda population, the highest percentage of individuals assessed for existing population genomics studies. Genome alignment indicated an average of 91.5% sequencing coverage and 4.7-fold depth for each individual relative to the panda's 2.25-Gb genome (Supplementary Table 2). To improve SNP inference quality, we estimated the probabilities of individual genotypes and population allele frequencies for each site and identified a total of 13,020,055 SNPs with ≥99% probability of being variable over the panda population.

We inferred three distinct genetic clusters, Qinling (QIN), Minshan (MIN) and Qionglai-Daxiangling-Xiaoxiangling-Liangshan (QXL), among the current panda population using frappe (Fig. 1a,b) and Admixture (Supplementary Fig. 1). Previous studies only showed a distinct QIN cluster; our larger study revealed that the MIN and QXL populations were also genetically distinct. We found no population substructure present in the QIN or MIN population but detected two subpopulations within the QXL population (K = 4; Fig. 1b): one comprising some Xiaoxiangling and Qionglai individuals and the other comprising Daxiangling, Liangshan and the remaining Qionglai individuals. The fixation index (FST) strongly supported this three-population stratification (P < 0.05; Supplementary Table 3). Principal-components analysis (PCA) and an allele-shared matrix (Online Methods) (Fig. 1c,d) provided additional corroborative evidence. The first eigenvector separated these three genetic populations (Fig. 1c). The second eigenvector indicated that the Liangshan population was separate from the other populations, but this assignment was ambiguous because of limited sampling in the Liangshan population (n = 2 individuals). Overall, the three populations showed similar genetic diversity (1.04–1.30 × 10^−3 for Watterson's estimator (θw) and 1.13–1.37 × 10^−3 for the average pairwise diversity within populations (θπ); Supplementary Fig. 2), confirming the results from a study using ten microsatellite loci that indicated that the panda has substantial genetic variability.

To reconstruct the demographic history of the giant panda, we used the pairwise sequentially Markovian coalescent (PSMC) model to examine changes in the local density of heterozygotes across the panda genome. PSMC analysis showed a well-defined demographic history from 8 million to 20,000 years ago (Fig. 2a), covering the period of the chronological distribution of three fossil panda species or subspecies (primal panda Ailurarctos lufengensis, pygmy panda Ailuropoda microta and baconi panda Ailuropoda melanoleuca baconi). Considering the time since the origin of the panda, demography showed population peaks at ~1 million years ago and
~40,000 years ago and population bottlenecks at ~0.2 million years ago and ~20,000 years ago (Fig. 2a). Notably, we found that these fluctuations in effective population size (Ne) were significantly negatively correlated with changes in the amount of atmospheric dust, as inferred by the mass accumulation rate (MAR) of Chinese loess (Pearson's correlation R = −0.30, P < 0.05), an index indicating cold and dry or warm and wet climatic periods in China.

The first population expansion coincided with a dietary switch to bamboo ~3 million years ago, when pygmy pandas emerged. Fossil evidence indicates that the earliest (primal) pandas were omnivores or carnivores living in swamp habitats lacking bamboo, whereas pygmy pandas mainly ate bamboo, as indicated by their specialized cranial and dental adaptations. This hypothesis is supported at the molecular level by the concurrent pseudogenization of the umami taste gene Tas1r1, which is associated with the pandas' decreased reliance on meat. The low levels of MAR during that time (Fig. 2a) indicate warm and wet weather conditions, which were ideal for the spread of bamboo forests.

The panda population declined around 0.7 million years ago, and the first bottleneck occurred about 0.2 million years ago (Fig. 2a), around the same time as the two largest glaciations in China, the Naynayxungla Glaciation (0.78–0.50 million years ago) and the Penultimate Glaciation (0.30–0.13 million years ago). Additionally, fossil evidence indicated that, from ~0.75 million years ago, the pygmy panda had been replaced by the subspecies A. melanoleuca baconi, which has the largest body size of all the panda species.

Figure 1 Current geographic populations of the giant panda and inferred genetic populations. (a) Sampling sites and genetic structure detected by frappe analysis. The genetic populations (K = 3 populations) were mapped using ArcGIS v9.2 on the basis of the proportion of an individual's ancestry attributed to a given population. The QIN population is shown in red, the MIN population is shown in yellow, and the QXL population is shown in green. Inset, the shaded area represents current panda habitats. (b) Genetic populations of the studied pandas inferred by frappe analysis. The number of populations (K) was predefined from 2 to 7. Symbols following each panda ID indicate where sampling occurred. (c) Results obtained from PCA using autosomal SNPs. Principal components 1 and 2 are shown. (d) A rooted neighbor-joining tree constructed from the allele-shared matrix of SNPs among the wild pandas, with the polar bear as an outgroup. The scale bar represents the p-distance.
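The reported relationship between effective population size and loess MAR is a Pearson correlation. As a minimal sketch of how such a coefficient is computed, the following uses two short series that are invented purely for illustration (they are not the study's data, and the function name `pearson_r` is an assumption of this example, not something from the paper):

```python
import math

def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values only: Ne (x10^4) and MAR (g/cm^2/1,000 years)
# sampled at the same time points.
ne_series = [2.0, 3.5, 5.0, 3.0, 1.5, 4.0]
mar_series = [9.0, 6.0, 4.0, 8.0, 12.0, 5.0]
r = pearson_r(ne_series, mar_series)
```

A negative `r`, as in this toy example, corresponds to the pattern described in the text: population size tends to be lower when dust accumulation (a proxy for cold, dry periods) is higher. The significance test behind the reported P value is not shown here.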

A cold climate, as evidenced by high MAR (Fig. 2a), might have contributed to the extinction of pygmy pandas and facilitated the origin of baconi pandas, or, possibly, the larger baconi pandas evolved from pygmy pandas as they adapted to the extreme weather.

The second population expansion occurred after the retreat of the Penultimate Glaciation (Fig. 2a, MAR decline), and the panda population reached its pinnacle between 30,000 and 50,000 years ago. The warm weather during the Greatest Lake Period (30,000–40,000 years ago) could have contributed to the population expansion, as the alpine conifer forests, the primary habitat for pandas, reached their greatest extent at this time. The second population bottleneck occurred during the last glacial maximum (~20,000 years ago), when substantial alpine glaciations (for example, Gongga glacial II; ref. 19) would likely have resulted in extensive loss of panda habitats.

Reconstruction of more recent panda demographic history could not be carried out using the PSMC approach because the power of this approach is greatly reduced for events occurring more recently than 20,000 years ago (Supplementary Fig. 3), owing to the limited number of recombination events in a single genome in this relatively short time interval. We therefore used diffusion approximations for demographic inference (∂a∂i) to simulate recent demographic fluctuations on the basis of the SNPs we identified in our panda populations. The results of ∂a∂i analysis overlapped with and supported the PSMC findings of the second population expansion and its subsequent decline (Supplementary Fig. 4) and provided information on panda population history up to the present.

This simulation showed that the QIN and non-QIN populations diverged ~0.3 million years ago (95% confidence interval (CI) of 0.1–0.7 million years ago; Fig. 2b), corresponding with the onset of the Penultimate Glaciation. About 40,000 years ago (CI = 4,900–58,900 years ago), the non-QIN population expanded by 300%, while the QIN population lost ~80% of its initial effective size; this occurred at a time when there was marked concurrent habitat expansion in the regions inhabited by non-QIN pandas (Supplementary Note). After this event, the non-QIN population began to decline, while the QIN population remained stable. Our data showed that, ~2,800 years ago (CI = 400–4,100 years ago), the non-QIN cluster diverged into the MIN and QXL populations, which gave rise to today's pattern of three genetically distinct panda populations. These three populations further fluctuated but in different ways: the QIN population decreased, the MIN population increased slightly, and the QXL population increased more substantially (Fig. 2b and Supplementary Table 4).

The QIN population's decline correlated with the most extensive linkage disequilibrium (LD; Supplementary Fig. 5). Probable causes of the QIN population decline include habitat loss and human activities. Arboreal pollen studies have indicated that there was an extensive and continuing decline in forest habitats in northern China, including the area populated by QIN pandas, around 4,000 years ago; however, paleobiological studies have indicated that this was unlikely to have been caused by concurrent changes in climate, as there was no differential impact on the populations of wet habitat–adapted species (for example, Pinus species) and dry habitat–adapted species (for example, Quercus species). Instead, there is evidence indicating that deforestation in the QIN-populated area was associated with anthropogenic disturbances. At the beginning of the Spring and Autumn Period in China (770–486 BC), revolutionary improvements in farming technology greatly advanced local agricultural development and increased the capacity to reclaim woodland (Supplementary Note). For the next 2,500 years (~500 BC–present), the northern Qinling region remained one of the most prosperous areas in central China, giving rise to centralized geopolitical power and extensive human settlements, but resulting in depletion of the surrounding forest (Supplementary Note). Additionally, humans hunted and raised giant pandas for entertainment, sacrifice (during the Han Dynasty) and rewards (during the Tang Dynasty) (Supplementary Note).

Figure 2 Demographic history of the giant panda reconstructed from the reference and population resequencing genomes. (a) PSMC result showing a demographic history from the panda's origin to 10,000 years ago. The red line represents the estimated effective population size (Ne), and the 100 thin blue curves represent the PSMC estimates for 100 sequences randomly resampled from the original sequence. The brown line shows the MAR of Chinese loess (g/cm^2/1,000 years). Generation time (g) = 12 years, and neutral mutation rate per generation (µ) = 1.29 × 10^−8. The approximate chronological ranges of three fossil panda species or subspecies (primal, pygmy and baconi panda) are shaded in pink, orange and blue, respectively. Note that PSMC simulation cannot detect population changes more recent than 20,000 years ago. (b) ∂a∂i result showing the demographic history of the panda from ~300,000 years ago to the present. The density of the heatmap shows fluctuations in effective population size. Two population divergences (304,664 and 2,777 years ago) and a population peak (38,879 years ago) are indicated by dashed lines. The average number of migrants per year between any two populations in each time interval is shown beside each arrow.
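PSMC reports time and population size in scaled units, so plotting a curve in years and individuals needs the generation time and mutation rate quoted in the Figure 2 caption. The sketch below applies the conventional PSMC scaling, N0 = θ0 / (4µs), years = 2·N0·t·g and Ne = λ·N0. It assumes a bin size of s = 100 bp (a common PSMC setting, not stated in this text), and the value of `theta0` in the example is hypothetical, chosen only to give round numbers:

```python
def scale_psmc(theta0, t_scaled, lambdas, mu=1.29e-8, gen_time=12.0, bin_size=100):
    """Convert scaled PSMC output to years and effective population sizes.

    theta0   : scaled mutation rate per bin, as estimated by PSMC
    t_scaled : time-interval boundaries in units of 2*N0 generations
    lambdas  : relative population sizes (Ne / N0) for each interval
    mu       : neutral mutation rate per site per generation (Figure 2 caption)
    gen_time : generation time in years (Figure 2 caption)
    bin_size : bases per PSMC bin (assumed here, not given in the text)
    """
    n0 = theta0 / (4.0 * mu * bin_size)
    years = [2.0 * n0 * t * gen_time for t in t_scaled]
    ne = [lam * n0 for lam in lambdas]
    return n0, years, ne

# Hypothetical inputs for illustration only.
n0, years, ne = scale_psmc(theta0=0.0258, t_scaled=[0.5, 1.0], lambdas=[2.0, 0.5])
```

With these illustrative inputs N0 works out to about 5,000, so one scaled time unit corresponds to roughly 120,000 years. Because both axes depend on µ and g, a different choice of either parameter shifts the whole curve, so dates read from such a plot carry that uncertainty.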

Together, along with lack of evidence for a climatic effect, these human activities seem to have had an important role in the decline of the QIN population. It does remain possible that stochastic ecological events (for example, large-scale bamboo flowering and die-off or infectious disease) or panda-relevant physiological problems (for example, degenerative reproductive ability) contributed to this decline. However, there is no solid evidence to indicate that panda reproductive ability was compromised nor any support for stochastic events, leaving anthropogenic disturbance as the most plausible explanation for the decline.

The non-QIN (MIN and QXL) clusters were geographically divided by the Min River valley, along which the ancient Shu people established a kingdom (6,700–2,300 years ago) and built the most important road connecting their kingdom with the outside world (Supplementary Fig. 6). Such a geographic barrier, accompanied by regional deforestation and human activity, might have established the initial separation of the two populations ~2,800 years ago (Fig. 2b). The increase in the MIN and QXL populations (Fig. 2b) coincided with the retreat of the Shu people from panda habitats to the lowlands and abandonment of the road. Regional reduction in human activities should have allowed habitat recovery in these regions. Alternatively, colonization of new habitats would have enabled population expansion. About 2,400 years ago, a new north-south road was established in the Qinling Mountains (Supplementary Fig. 6), which would have placed more anthropogenic pressure on the QIN pandas, resulting in the decline of this population.

Figure 3 Annotation of genes containing selected SNPs on the basis of the KEGG database. Red bars, the number of genes under directional selection between the QIN and non-QIN panda populations; yellow bars, the number of genes under balancing selection between the QIN and non-QIN populations; blue bars, the number of genes under directional selection between the MIN and QXL populations.

To examine local population adaptation, we used coalescence-based simulation methods, a Bayesian test and a 'model-free' global FST test to detect selection signals in coding DNA sequences (CDS) across the whole genome (Online Methods). Between QIN and non-QIN populations, a total of 111 (134 SNPs) and 152 (212 SNPs) genes were
found to be under directional and balancing selection, respectively (Supplementary Tables 6 and 7). KEGG annotation showed that the largest groups of selected genes were involved in the sensory system (Fig. 3). Of these, two genes, Tas2r49 and Tas2r3, were directionally selected across the two panda populations (Supplementary Table 6). Studies in humans have indicated that these genes are functionally relevant to bitter taste, and they may have a similar role in pandas. A derived allele frequency (DAF) test indicated that Tas2r49 was positively selected in the QIN population. Consistent with this finding, field observations showed that QIN pandas, compared to non-QIN (for example, QXL) pandas, consume more bamboo leaves, which are higher in alkaloids (a major bitter component) than other parts of the plant.

We also found eight olfactory receptor genes under directional selection and 24 under balancing selection (Supplementary Tables 6 and 7). Odor perception as a form of olfactory communication is crucial for panda reproduction and survival, given their solitary existence in dense forest. However, only ligands of the genes OR51L1 and OR52R1 have been identified in giant panda scent marks. More detailed analysis of the function of the other olfactory receptor genes might therefore be worthwhile.

The MIN and QXL populations, compared to the QIN and non-QIN populations, have fewer directionally selected genes (n = 44; Supplementary Table 8), indicating less variation in the selection processes between them, which is consistent with their lower interpopulation habitat heterogeneity and genetic differentiation. The largest group of selected genes in these populations is related to the sensory system (Fig. 3), but we saw no directional selection signals for the two receptor genes involved in bitter taste, Tas2r49 and Tas2r3. We also identified eight olfactory receptor genes under directional selection in the MIN and QXL populations, but only OR51L1 overlapped with those identified in the QIN and non-QIN populations (Supplementary Tables 6 and 8).

Giant pandas once inhabited most of China and neighboring countries in southeast Asia, but today they are confined to six relatively isolated mountain habitats in western China. In this study, integration of genomic and population genomics approaches provided a continuous outline of the history of the panda population and demonstrated that recent anthropogenic disturbances are likely a major reason for the panda's current endangered status. Although the presence of substantial genetic diversity in panda populations improves our chances of saving this iconic species, human activities have already fragmented some populations (for example, the QXL population) into small geographically isolated populations (for example, Xiaoxiangling), putting them at greater risk of extinction in the long term. For such small populations, translocation of wild-caught individuals or release of captive-bred individuals might be a useful means for genetic rescue by reestablishing gene flow. However, our data indicate that it will be important to monitor the evidence for selection and local adaptation in these fragmented panda populations, as reintroduction candidates ill-suited to a particular environment will be unlikely to promote the development of a robust population. This study may also serve as a model for other endangered species in assessing and establishing the most effective long-term conservation solutions.

URLs. Giant panda genome; polar bear (Ursus maritimus) genome; GenBank; KEGG; Ensembl; TreeBeST; LASTZ (at the Miller Lab website).

© 2012 Nature America, Inc. All rights reserved. NCBI Sequence Read Archive (SRA) under accession accession under (SRA) Archive Read Sequence NCBI Accession codes. the in v available are references associated any and Methods M Nature Ge Nature The The authors declare no competing interests.financial the revised manuscript. Q.C. analyzed the data. X. Zhan, F.W., P.Z., S. Zhao, Q.W. and S.D. wrote and research. S. Zhao, Q.W., S.D., X. Zhan, P.Z., X.G., W.H., W.F., D.L., X. Zhang and prepared the samples. S. Zhao, P.Z., X. Zhan, Y.H., Jian Wang and H.Y. performed the sequencing and analysis. P.Z.,supervised S. Zhang, L.Z., H.Z., Z.Z., X.J. and J.Z. F.W. the designed research and interpreted data. Jun Wang led the genome B. data analysis. on forbear Li polar assistance B. Wang, Y. Huang, G. Wang, and C. Lin F. and assistance for Xi laboratory Wemanuscript. G. Tian, M. Jian,thank also H. Jiang, M. Zhao, Q. Zhang, M. Holyoak, for S. Kumar of comments and Swaisgood revisions this and R.R. with map, Gutenkunst on of distribution for panda analysis the R.N. suggestions We sample during collection. for T.assistance acknowledge Meng for generation Wildlife Shanghai ParkPanda, the Zoo, Shanghai the Zoo and Zhengzhou the WeChina. of GiantCenter the Fuzhou the Zoo, Chongqing Research the thank of(KSCX2-EW-Z-4) and AcademyStateAdministration the of Forestry Sciences Innovation Knowledge of Program the Chinese the (31230011), of China by study This was supported grants from National Foundationthe Natural Science Note: Supplementary information is available in the 9. 8. 7. 6. 5. 4. 3. 2. 1. r at online available is information permissions and Reprints Published online at COMPETING FINACIAL INTERESTS COMPETING AU A e e c

p

eth r r kno T State Forestry Administration. Forestry State R. Li, and strategies ages. ice Quaternary research the of legacy climate: genetic The G. and Hewitt, Ecology S.H. Schneider, & T.L. Root, Qiu, Z. & Qi, G. Zhang, B. Zhang, Alexander, D.H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry admixture: individual of Estimation N.J. Risch, & P. Wang, J., Peng, H., Tang, X. Yi, an end to the ‘‘EvolutionaryEnd’’?the Deadto end an individuals. unrelated in considerations. design (2005). study and analytical Science China (2010). 311–317 (2000). implications. Palasiatica Vertebrata s i n i ∂ H o t a s O n ∂ / i ods i, H. Li for suggestions on PSMC simulations and L. Goodman, J. i, on H. for Li PSMC and suggestions simulations L. Elser,Goodman, n wl R R C

(Science Press, Beijing, 2006). Beijing, Press, (Science o et al. et d t al. et f

e 329

x t e ONT . n h et al. et h dg Sequencing of 50 human exomes reveals adaptation to high altitude. high to adaptation reveals exomes human 50 of Sequencing e t etics h cmlt gnm sqec o te in panda. giant the of sequence genome complete The , 75–78 (2010). 75–78 , m

Science p ment l RIBU . Genetic viability and population history of the giant panda, puttingpanda, giant the ofpopulation historyandviability Genetic Ailuropoda a h p Panda resequencing reads have been deposited in the t t e p

r : / . T

/ aDV 269 27 s w I ON w , 153–169 (1989). 153–169 , Genome Res. Genome w found from the late Miocene deposits in Lufeng, Yunnan. , 334–341 (1995). 334–341 , A . S n NCE ONLINE PUBLIC ONLINE NCE a t The 3rd National Survey Report on Giant Panda in Panda Giant on Report Survey National 3rd The u r e . c o m

19 / d Mol. Biol. Evol. Biol. Mol. o , 1655–1664 (2009). 1655–1664 , i f i n d e ee. Epidemiol. Genet. r o / n 1 l 0 i n . 1 e A 0

v TION

3 24 e 8 r h s / , 1801–1810 (2007).1801–1810 , Nature t i n t o p g n

. : 2

/ o / 4 w S f

9 R t w

h 4 405 w A 28 e . .

. p

Nature n 0 289–301 , a

a , 907–913 , 5 p t

u 3 e o

r r 3 . e n 5

. c 463 l 3 i o . n m e , /

10. Weir, B.S. & Cockerham, C. Estimating F-statistics for the analysis of population structure. Evolution 38, 1358 (1984).
11. Patterson, N., Price, A.L. & Reich, D. Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006).
12. The Bovine HapMap Consortium. Genome-wide survey of SNP variation uncovers the genetic structure of cattle breeds. Science 324, 528–532 (2009).
13. Li, H. & Durbin, R. Inference of human population history from individual whole-genome sequences. Nature 475, 493–496 (2011).
14. Wang, J. On the taxonomic status of species, geological distribution and evolutionary history of Ailuropoda. Acta Zool. Sinica 20, 191–201 (1974).
15. Sun, Y.B. & An, Z.S. Late Pliocene-Pleistocene changes in mass accumulation rates of eolian deposits on the central Chinese Loess Plateau. J. Geophys. Res. 110, D23101 (2005).
16. Pei, W. Evolutionary history of giant pandas. Acta Zool. Sinica 20, 188–190 (1974).
17. Jin, C. et al. The first skull of the earliest giant panda. Proc. Natl. Acad. Sci. USA 104, 10932–10937 (2007).
18. Zhao, H., Yang, J., Xu, H. & Zhang, J. Pseudogenization of the umami taste receptor gene Tas1r1 in the giant panda coincided with its dietary switch to bamboo. Mol. Biol. Evol. 27, 2669–2673 (2010).
19. Zheng, B., Xu, Q. & Shen, Y. The relationship between climate change and Quaternary glacial cycles on the Qinghai-Tibetan Plateau: review and speculation. Quat. Int. 97–98, 93–101 (2002).
20. Hu, J. & Wei, F. Comparative ecology of giant pandas in the five mountain ranges of their distribution in China. in Giant Pandas: Biology and Conservation (eds. Lindburg, D. & Baragona, K.) 137–148 (University of California Press, London, 2004).
21. Zhan, X., Zheng, Y., Wei, F., Bruford, M.W. & Jia, C. Molecular evidence for Pleistocene refugia at the eastern edge of the Tibetan Plateau. Mol. Ecol. 20, 3014–3026 (2011).
22. Gutenkunst, R.N., Hernandez, R.D., Williamson, S.H. & Bustamante, C.D. Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genet. 5, e1000695 (2009).
23. Ren, G. & Beug, H.J. Mapping Holocene pollen data and vegetation of China. Quat. Sci. Rev. 21, 1395–1422 (2002).
24. Ren, G. Decline of the mid-to-late Holocene forests in China: climatic change or human impact? J. Quaternary Sci. 15, 273–281 (2000).
25. Ren, N. Illustrations and Annotations of Huayang Guo Zhi (Shanghai Ancient Books Publishing House, Shanghai, 1987).
26. Excoffier, L. & Lischer, H.E.L. Arlequin suite ver3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol. Ecol. Resour. 10, 564–567 (2010).
27. Beaumont, M.A. & Nichols, R.A. Evaluating loci for use in the genetic analysis of population structure. P. Roy. Soc. Lond. B. Bio. 263, 1619–1626 (1996).
28. Foll, M. & Gaggiotti, O. A genome-scan method to identify selected loci appropriate for both dominant and codominant markers: a Bayesian perspective. Genetics 180, 977–993 (2008).
29. Cockerham, C.C. & Weir, B. Estimation of gene flow from F-statistics. Evolution 47, 855–863 (1993).
30. Meyerhof, W. et al. The molecular receptive ranges of human TAS2R bitter taste receptors. Chem. Senses 35, 157–170 (2010).
31. Sabeti, P.C. et al. Genome-wide detection and characterization of positive selection in human populations. Nature 449, 913–918 (2007).
32. Schaller, G.B., Hu, J., Pan, W. & Zhu, J. The Giant Panda of Wolong (University of Chicago Press, Chicago, 1985).
33. Pan, W. et al. A Chance for Lasting Survival (Beijing University Press, Beijing, 2001).
34. Swaisgood, R.R. et al. Chemical communication in giant pandas. in Giant Pandas: Biology and Conservation (eds. Lindburg, D.G. & Baragona, K.) 106–120 (University of California Press, Berkeley, CA, 2004).
35. Hagey, L. & MacDonald, E. Chemical cues identify gender and individuality in giant pandas (Ailuropoda melanoleuca). J. Chem. Ecol. 29, 1479–1488 (2003).
36. Zhan, X. et al. Molecular censusing doubles giant panda population estimate in a key nature reserve. Curr. Biol. 16, R451–R452 (2006).
37. Zhu, L. et al. Drastic reduction of the smallest and most isolated giant panda population: implications for conservation. Conserv. Biol. 24, 1299–1306 (2010).

ONLINE METHODS

Sampling information. Blood and tissue samples were obtained from 34 wild giant pandas. Sampling covered the 6 main geographic distributions, with 8 individuals from the Qinling Mountains, 7 from the Minshan Mountains, 15 from the Qionglai Mountains, 2 from the Liangshan Mountains, 1 from the Daxiangling Mountains and 1 from the Xiaoxiangling Mountains (Supplementary Table 1).

Library construction and sequencing. Genomic DNA was extracted from blood or muscle samples. For each individual, 1–3 µg of DNA was sheared into fragments of 200–800 bp with the Covaris system. DNA fragments were then treated according to the Illumina DNA sample preparation protocol: fragments were end repaired, A-tailed, ligated to paired-end adaptors and PCR amplified with 500-bp inserts for library construction. Sequencing was performed on the Illumina HiSeq 2000 platform, and 100-bp paired-end reads were generated.

Read alignment. We used the Burrows-Wheeler Aligner (ref. 38) to map paired-end reads to the reference genome (ref. 4). First, the reference was indexed. Second, the command 'aln -t 3 -e 10' was used to find the suffix array coordinates of good matches for each read. Third, the command 'sampe -a 500 -o 1000' converted suffix array coordinates to pseudochromosomal coordinates and paired reads. Other parameters were set to the default.

Population SNP detection. We adopted an algorithm using a Bayesian approach to detect the population SNPs. Details of SNP calling are provided in the Supplementary Note.

SNP validation. A total of 366 SNPs from 4 pandas were validated by PCR and Sanger sequencing. In the test, 355 SNPs were confirmed to be polymorphic, 8 of which were homozygous; 3 SNPs were erroneously inferred. The error rate for population SNP calling was estimated to be 0.82–3.01%.

Principal-components analysis. We conducted PCA on autosomal biallelic SNPs using EIGENSOFT3.0 software (ref. 11). Eigenvectors from the covariance matrix were generated with the R function reigen, and significance levels were determined using the Tracy-Widom test (Supplementary Table 9).

Phylogenetic tree inference. We identified homologous regions between the panda and polar bear genomes using LASTZ (see URLs) and extracted SNPs within syntenic regions. Genotypes of the polar bear at the corresponding SNP positions were considered to be the outgroup. A neighbor-joining rooted tree was generated by TreeBeST (see URLs).

Population structure analyses. Genetic structure was inferred using the programs frappe and Admixture, which implement an expectation-maximization algorithm and a block-relaxation algorithm, respectively. To explore the convergence of individuals, we predefined the number of genetic clusters K from 2–7 and ran both programs 5 times. The maximum iteration of the expectation-maximization algorithm was set to 10,000 in the frappe analysis. Default methods and settings were used in the Admixture analysis.

θπ, θw and FST calculations. The average pairwise diversity within a population (θπ) (ref. 39) and Watterson's estimator (θw) (ref. 40) were calculated with sliding windows of different sizes (10, 100 and 500 kb) that had 90% overlap between adjacent windows. Population differentiation was measured by pairwise FST among three panda populations.

Linkage disequilibrium. To evaluate LD decay, the correlation coefficient (r²) between any two loci was calculated using Haploview (ref. 41). Parameters were set as follows: -maxdistance 100, -dprime, -minMAF 0.01, -hwcutoff 0.0001 and -minGeno 0.6. Average r² was calculated for pairwise markers with the same distance, and LD decay was drawn using an R script.
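The sliding-window diversity estimates described above can be illustrated with a minimal sketch. It assumes hard derived-allele counts per site are already known, whereas the study itself worked from estimated genotype probabilities and allele frequencies, so this is not the authors' pipeline. The function names and the toy data are hypothetical.

```python
def harmonic_number(n_chromosomes):
    """a_n = sum_{i=1}^{n-1} 1/i, the denominator of Watterson's estimator."""
    return sum(1.0 / i for i in range(1, n_chromosomes))


def site_pi(derived_count, n_chromosomes):
    """Average pairwise difference at one biallelic site: 2k(n-k) / (n(n-1))."""
    k, n = derived_count, n_chromosomes
    return 2.0 * k * (n - k) / (n * (n - 1))


def sliding_theta(positions, derived_counts, n_chromosomes, seq_len, window, overlap=0.9):
    """Per-site theta_pi and theta_w in windows that overlap by `overlap`.

    Assumption: positions are 0-based and every site is biallelic.
    Returns a list of (start, end, theta_pi, theta_w) tuples.
    """
    step = max(1, round(window * (1.0 - overlap)))  # 90% overlap -> step = window / 10
    a_n = harmonic_number(n_chromosomes)
    results = []
    for start in range(0, seq_len - window + 1, step):
        end = start + window
        # Keep only segregating sites that fall inside the current window.
        counts = [k for p, k in zip(positions, derived_counts)
                  if start <= p < end and 0 < k < n_chromosomes]
        theta_pi = sum(site_pi(k, n_chromosomes) for k in counts) / window
        theta_w = len(counts) / a_n / window
        results.append((start, end, theta_pi, theta_w))
    return results


# Toy example (hypothetical data): 3 SNPs on a 30-bp sequence, 4 chromosomes sampled.
windows = sliding_theta([5, 15, 25], [1, 2, 1], n_chromosomes=4, seq_len=30, window=20)
```

With real data, `window` would be 10,000, 100,000 or 500,000 and `n_chromosomes` would be twice the number of individuals in the population being summarized.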

Demographic history reconstruction using the PSMC approach. For the autosomal sequences (ref. 4), scaffolds shorter than 50 kb (~2.6% of all scaffolds) were excluded to improve the accuracy of inferring historical recombination events, and a total of 1,680,757 heterozygous loci were used to reconstruct the demographic history with the PSMC model. Parameters were set as follows: -N30, -t15, -r5 and -p '4+25*2+4+6'. The estimated time to the most recent common ancestor (TMRCA) is given in units of 2N0 time, and the relative population size (Ne) at state t was scaled to N0 (the present effective population size). The neutral mutation rate µ was used to infer N0 and scale the TMRCA and Ne values into chronological time. The sequence divergence between the panda and polar bear was estimated to be 3.53%. Divergence between the two species was estimated to have occurred 16.4 million years ago, and a mean generation time (g) for pandas was set at 12 years (ref. 42) (n = 46). These values give µ = (0.0353 × 12)/(2 × 16.4 × 10⁶) = 1.29 × 10⁻⁸ mutations per generation for the giant panda. Following Li's procedure (ref. 13), we applied a bootstrapping approach, repeating sampling 100 times to estimate the variance of simulated results.

Correlation statistics for Ne inferred by PSMC results and MAR. We extracted Ne values for every time interval from the PSMC simulation results. We then averaged the corresponding MAR for the same intervals. We used Pearson's correlation in SPSS16.0 software to estimate the correlation of the two factors (Supplementary Note).

Recent demographic history inference using ∂a∂i simulations. Of the SNPs identified in the 34 resequenced pandas, we only considered those from intergenic regions in autosomal sequences to ensure their neutrality. To minimize the effect of low-coverage sequencing, SNPs with more than 40-fold sequencing coverage at the population level were retained for the ∂a∂i simulations (ref. 22). The polar bear genome sequence was used to infer ancestral alleles, and a statistical procedure was performed to correct ascertainment bias of the ancestral state, in which the trinucleotide substitution matrix specific for carnivores was kindly provided by the authors (ref. 43) (D.G. Hwang, University of Washington).

Four divergence models among three genetic populations of pandas were considered (Supplementary Note). The model with the maximum log-likelihood value was chosen as the optimal one (Supplementary Table 10). The ancestral population size (Na) was estimated on the basis of the calculated θ value and the mutation rate. Population size and chronological split time were derived from parameters scaled by Na. Nonparametric bootstrapping was performed 50 times to determine the variance of each parameter using 50 new data sets with equal numbers of loci (111,161) sampled with replacement from the original data set.

Detecting SNPs under selection for pairwise populations. We performed FST-based approaches to investigate the selection signals across the whole genome. On the basis of the population structure detected, we defined two pairs of populations to detect the selection signals: (i) the non-QIN (MIN and QXL) and QIN pair and (ii) the MIN and QXL pair. We chose the SNPs from CDS regions and excluded those with minor allele frequency of <0.05. A total of 37,999 and 37,405 loci were used for the analyses of the two comparisons, respectively.

First, the two pairs of populations were tested using the finite island model (with the FDIST approach, ref. 27) in Arlequin (ref. 26) to detect outliers. Considering the hierarchical genetic structure within the QIN and non-QIN populations, this pair was also analyzed using the hierarchical island model implemented in the same software to estimate loci that were outliers with respect to FCT values (the proportion of total genetic variance due to differences among groups of populations). Parameters were set as default values, with the exception of allowing 10% missing data in our data and setting 200,000 simulations in each test. The P value for each locus was estimated using a kernel density approach. After completing the analysis, we performed false discovery rate (FDR) correction of P values, and each locus received a q value (ref. 44). We considered the loci with q < 0.05 as possible outliers (Supplementary Figs. 7a and 8a). Then, a Bayesian test was performed using the program BayeScan (ref. 28). We ran 20 pilot runs of 50,000 iterations with an additional burn-in of 500,000 iterations and a thinning interval of 20. Other parameters were set at the default values. Because recent studies suggested that BayeScan is a conservative estimate (refs. 45,46), loci with FDR < 0.1 were considered to be outliers in this analysis (Supplementary Figs. 7d and 8b). For each pair of populations, we also measured the pairwise global FST between the two populations and calculated FST for every locus across the whole genome to detect the highly differentiated SNPs between populations. The top 1% of SNPs ranked with raw FST values were considered to be potential outliers (Supplementary Figs. 7e and 8c).

doi:10.1038/ng.2494
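The mutation-rate arithmetic in the PSMC section can be checked with a short sketch. The first function reproduces the stated formula, µ = (divergence × generation time)/(2 × split time). The second converts a PSMC time expressed in units of 2N0 generations into years, following the scaling described in the text. Both function names are hypothetical, and the N0 value in the usage lines is an invented placeholder, not a result from the study.

```python
def neutral_mutation_rate(divergence, generation_years, split_years):
    """Per-generation mutation rate: mu = d * g / (2 * T).

    divergence       -- pairwise sequence divergence between the two species
    generation_years -- mean generation time g in years
    split_years      -- divergence time T in years
    """
    return divergence * generation_years / (2.0 * split_years)


def scaled_time_to_years(t_scaled, n0, generation_years):
    """Convert a time given in units of 2*N0 generations into years."""
    return t_scaled * 2.0 * n0 * generation_years


# Values quoted in the text: 3.53% divergence, 12-year generations, 16.4 Myr split.
mu = neutral_mutation_rate(0.0353, 12, 16.4e6)
print(f"{mu:.2e}")  # 1.29e-08

# Hypothetical N0 of 10,000, for illustration only.
years = scaled_time_to_years(0.5, n0=10_000, generation_years=12)
```

The factor of 2 in the denominator reflects that divergence accumulates along both lineages since the split.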

To minimize the detection of false positives, we considered those loci identified by two or more methods to be outliers (Supplementary Tables 11 and 12) as true selected loci. In addition, we generated a derived allele frequency distribution for (i) the alleles under balancing selection, (ii) the alleles under directional selection and (iii) the unselected alleles for the QIN and non-QIN populations. As expected, compared with directionally selected or unselected alleles, there was an enrichment in intermediate frequency for the derived allele frequency of alleles subjected to balancing selection (Supplementary Fig. 9), indicating that the false positives in the SNP set of balancing selection should be limited.

Derived allele frequency test. We used the DAF test (ref. 31) to localize the signal of selection to populations. First, we inferred ancestral alleles using the polar bear genome. We then calculated and compared the derived allele frequency of each locus under directional selection between two populations to detect which population harbored a higher frequency of derived alleles. Populations with higher frequencies of derived alleles were assumed to be under positive selection.

Annotation of loci under selection. We annotated genes with selected SNPs using Ensembl (see URLs) and KEGG (see URLs) and then classified each gene according to the KEGG pathways and KEGG Brite function databases (Supplementary Tables 13 and 14). Olfactory and taste receptor genes were examined using Ensembl, KEGG and GenBank annotations.
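The comparison behind the derived allele frequency test can be sketched as follows. This is a simplified illustration, not the published DAF test implementation. It assumes biallelic sites, genotypes written as two-letter strings, an ancestral allele already inferred from the outgroup, and "N" as the missing-data code. The function names and the example genotypes are hypothetical.

```python
def derived_allele_frequency(genotypes, ancestral):
    """Fraction of called alleles that differ from the ancestral allele.

    genotypes -- iterable of diploid genotype strings such as "AG" or "GG"
    ancestral -- ancestral allele inferred from the outgroup genome
    """
    total = 0
    derived = 0
    for genotype in genotypes:
        for allele in genotype:
            if allele == "N":  # assumed missing-data code; skip uncalled alleles
                continue
            total += 1
            if allele != ancestral:
                derived += 1
    if total == 0:
        raise ValueError("no called alleles at this locus")
    return derived / total


def population_with_higher_daf(pop_genotypes, ancestral):
    """Return (name, DAF) of the population with the highest derived allele frequency."""
    dafs = {name: derived_allele_frequency(g, ancestral)
            for name, g in pop_genotypes.items()}
    best = max(dafs, key=dafs.get)
    return best, dafs[best]


# Invented genotypes at one locus whose ancestral allele is "A".
locus = {"QIN": ["AG", "GG", "GG"], "non-QIN": ["AA", "AG", "AA"]}
name, daf = population_with_higher_daf(locus, ancestral="A")
```

In this toy locus the derived allele G is at 5/6 in the first population and 1/6 in the second, so the first would be read as the population carrying the signal.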

38. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
39. Tajima, F. Evolutionary relationship of DNA sequences in finite populations. Genetics 105, 437–460 (1983).
40. Watterson, G.A. On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7, 256–276 (1975).
41. Barrett, J.C. et al. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21, 263–265 (2005).
42. Wei, F. et al. A study on the life table of wild giant pandas. Acta Theriol. Sinica 9, 81–86 (1989).
43. Hwang, D.G. & Green, P. Bayesian Markov chain Monte Carlo sequence analysis reveals varying neutral substitution patterns in mammalian evolution. Proc. Natl. Acad. Sci. USA 101, 13994–14001 (2004).
44. Storey, J.D. A direct approach to false discovery rates. J. R. Stat. Soc., B 64, 479–498 (2002).
45. Huang, K., Whitlock, R., Press, M.C. & Scholes, J.D. Variation for host range within and among populations of the parasitic plant Striga hermonthica. Heredity 108, 96–104 (2012).
46. Buckley, J., Butlin, R.K. & Bridle, J.R. Evidence for evolutionary change associated with the recent range expansion of the British butterfly, Aricia agestis, in response to climate change. Mol. Ecol. 21, 267–280 (2012).