The University of Bradford Institutional Repository

http://bradscholars.brad.ac.uk

This work is made available online in accordance with publisher policies. Please refer to the repository record for this item and our Policy Document available from the repository home page for further information.

To see the final version of this work please visit the publisher’s website. Available access to the published online version may require a subscription.

Link to publisher’s version: http://dx.doi.org/10.1038/ncomms10815

Citation: Adhikari K, Fontanil T, Cal S, et al. (2016) A genome-wide association scan in admixed Latin Americans identifies loci influencing facial and scalp hair features. Nature Communications, 7, Article No. 10815.

Copyright statement: © 2016 The Authors. Published by Nature. This work is licensed under a Creative Commons Attribution 4.0 International License. http://creativecommons.org/licenses/by/4.0/

ARTICLE

Received 13 Jul 2015 | Accepted 25 Jan 2016 | Published 1 Mar 2016 DOI: 10.1038/ncomms10815 OPEN A genome-wide association scan in admixed Latin Americans identifies loci influencing facial and scalp hair features

Kaustubh Adhikari1, Tania Fontanil2, Santiago Cal2, Javier Mendoza-Revilla1,3, Macarena Fuentes-Guajardo1,4, Juan-Camilo Chaco´n-Duque1, Farah Al-Saadi1, Jeanette A. Johansson5, Mirsha Quinto-Sanchez6, Victor Acun˜a-Alonzo1,7, Claudia Jaramillo8, William Arias8, Rodrigo Barquera Lozano7,9, Gasto´n Macı´nPe´rez7,9, Jorge Go´mez-Valde´s10, Hugo Villamil-Ramı´rez9,Ta´bita Hunemeier11,w, Virginia Ramallo6,11, Caio C. Silva de Cerqueira6,11, Malena Hurtado3, Valeria Villegas3, Vanessa Granja3, Carla Gallo3, Giovanni Poletti3, Lavinia Schuler-Faccini11, Francisco M. Salzano11, Maria-Ca´tira Bortolini11, Samuel Canizales-Quinteros9, Francisco Rothhammer12, Gabriel Bedoya8, Rolando Gonzalez-Jose´6, Denis Headon5, Carlos Lo´pez-Otı´n2, Desmond J. Tobin13, David Balding1,14 & Andre´s Ruiz-Linares1

We report a genome-wide association scan in over 6,000 Latin Americans for features of scalp hair (shape, colour, greying, balding) and facial hair (beard thickness, monobrow, eyebrow thickness). We found 18 signals of association reaching genome-wide significance (P values 5 10 8 to 3 10 119), including 10 novel associations. These include novel loci for scalp hair shape and balding, and the first reported loci for hair greying, monobrow, eyebrow and beard thickness. A newly identified locus influencing hair shape includes a Q30R substitution in the Protease Serine S1 family member 53 (PRSS53). We demonstrate that this enzyme is highly expressed in the hair follicle, especially the inner root sheath, and that the Q30R substitution affects enzyme processing and secretion. The genome regions associated with hair features are enriched for signals of selection, consistent with proposals regarding the evolution of hair.

1 Department of Genetics, Evolution and Environment, and UCL Genetics Institute, University College London, London WC1E 6BT, UK. 2 Departamento de Bioquı´mica y Biologı´a Molecular, IUOPA, Universidad de Oviedo, Oviedo 33006, Spain. 3 Laboratorios de Investigacio´n y Desarrollo, Facultad de Ciencias y Filosofı´a, Universidad Peruana Cayetano Heredia, Lima, 31, Peru´. 4 Departamento de Tecnologı´aMe´dica, Facultad de Ciencias de la Salud, Universidad de Tarapaca´, Arica 1000009, Chile. 5 Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian EH25 9RG, UK. 6 Centro Nacional Patago´nico, CONICET, Puerto Madryn U9129ACD, Argentina. 7 National Institute of Anthropology and History, Me´xico 4510, Me´xico. 8 GENMOL (´tica Molecular), Universidad de Antioquia, Medellı´n 5001000, Colombia. 9 Unidad de Geno´mica de Poblaciones Aplicada a la Salud, Facultad de Quı´mica, UNAM-Instituto Nacional de Medicina Geno´mica, Me´xico 4510, Me´xico. 10 Facultad de Medicina, UNAM, Me´xico 4510, Me´xico. 11 Departamento de Gene´tica, Universidade Federal do Rio Grande do Sul, Porto Alegre 91501-970, Brasil. 12 Instituto de Alta Investigacio´n, Universidad de Tarapaca´, Arica 1000000, Chile. 13 Centre for Skin Sciences, Faculty of Life Sciences, University of Bradford, Bradford BD7 1DP, Victoria, UK. 14 Schools of BioSciences and Mathematics and Statistics, University of Melbourne, Melbourne 3010, Australia. w Present address: Departamento de Gene´tica e Biologia Evolutiva, Universidade de Sa˜o Paulo, 05508-090, SP, Brasil. Correspondence and requests for materials should be addressed to A.R.-L. (email: [email protected]).

NATURE COMMUNICATIONS | 7:10815 | DOI: 10.1038/ncomms10815 | www.nature.com/naturecommunications 1 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms10815

here is great diversity in primates with regard to hair to moderate, but significant, correlations between them and with appearance, including its distribution, shape and colour1. basic covariates (using a 10 3 Bonferroni-adjusted threshold; THair plays a range of important functions, including Supplementary Table 2). The highest significant trait correlations thermal regulation, camouflage, sensory and social signalling, and occur between: beard and eyebrow density and monobrow the evolution of hair has been proposed to be influenced by (r ¼ 0.14 to r ¼ 0.24); balding and beard density (r ¼ 0.15); hair both natural and sexual selection1. differ from other greying and balding (r ¼ 0.13) and hair greying and beard density primates in having lost most terminal body hair (via hair (r ¼ 0.13). The highest trait-covariate correlations detected were: follicle miniaturization), possibly in connection to the adaptive age with hair greying (r ¼ 0.56), balding (r ¼ 0.15), beard density development of more efficient sweating, linked to bipedalism2,3. (r ¼ 0.28) and eyebrow thickness (r ¼0.17); balding and hair However, considerable head hair has been retained in modern colour with sex (r ¼0.35 and r ¼0.10); European ancestry humans and its appearance shows extensive variation between correlates most strongly with hair colour (r ¼0.32) and beard individuals4–6. Head hair appearance is highly heritable7–10 and density (r ¼ 0.3). Based on a kinship matrix derived from the SNP certain traits show high differentiation between continental native data19, we estimated narrow-sense heritability using GCTA20.We populations. For instance, variation in hair colour is essentially found significant values for all the traits examined restricted to West Eurasia11, whereas straight hair is virtually (Supplementary Table 3), with the highest heritability being absent from sub-Saharan Africa4. It has been proposed that estimated for hair colour (1) and the lowest for hair greying variation in head hair appearance has been influenced by (0.27). These estimates of heritability are similar to other available selection during the evolution of modern humans2,3,11. estimates based on family data7–10. Although genetic association studies for human hair traits are few to date, loci influencing male pattern baldness, scalp hair GWAS for hair features. We performed genome-wide colour and shape (curliness) have been identified in samples of association tests on 9,143,600 chip genotyped and imputed SNP European and East Asian ancestry6. Consistent with the key role data using multivariate linear regression, as implemented in of androgens in balding, the androgen receptor (AR) gene region PLINK21, using an additive genetic model adjusting for: age, sex has been highlighted as a major determinant among several loci and the first five SNP principal components (Supplementary associated with male-pattern baldness12. The associated Fig. 3). All the traits scored showed genome-wide significant with hair colour, involved in various aspects of melanocyte association (P values 5 10 8) with SNPs in at least one biology and melanin synthesis, also often impact on skin and eye o genomic region (Fig. 1 and Table 1). We re-examined the pigmentation13. Interestingly, different genes have been association signals for each index SNP in every country sample associated with straight hair in Europeans and East Asians, separately (by performing independent association tests) and suggesting that this trait evolved independently at least twice. combined results as a meta-analysis using METAL22. For each The most robust associations for straight hair have implicated SNP, significant effects were in the same direction in all countries, Trichohyalin (TCHH, a structural hair ) in Europeans14,15, the variability of effect size across countries reflecting sample size and EDAR (a cell signalling receptor) in East Asians16, illustrating (Fig. 2, Supplementary Fig. 4 and Supplementary Table 4). To the range of cellular mechanisms that can impact on hair shape. probe the extent of phenotypic variation captured by the genetic To further our understanding of the genetic basis of variation data, we constructed a phenotype prediction model combining in human hair, we performed a genome-wide association study the index SNPs (Table 1) and a BLUP (Best Linear Unbiased (GWAS) in individuals of mixed European, Native American and Predictor) random effects component calculated from the African ancestry, who show high genetic diversity and extensive genome-wide kinship matrix19, in addition to covariates variation in head hair appearance. We report ten novel genetic (Supplementary Table 5). The BLUP term is sensitive to the associations, including the first reported loci for hair greying and effect of SNPs associated below the threshold for genome-wide for facial hair distribution/density. A newly identified locus significance. The prediction accuracy of this model was broadly influencing scalp hair shape encodes a serine protease (PRSS53) consistent with the heritability estimates (Supplementary that we show is expressed in the hair follicle, with strongest Table 3). Highest prediction accuracy was observed for hair association seen for a Q30R substitution that we show affects colour, with 41% of the total phenotypic variance for this trait enzyme processing. Similar to what has been observed in the being captured by the model, including 20% explained by the EDAR gene region, we find evidence of recent selection in East index SNPs and the BLUP component. The lowest prediction Asians at PRSS53. accuracy was seen for monobrow, for which 16.2% of the total phenotypic variance was captured by the model. The difference Results between the heritability estimates and the prediction accuracy partly reflects limitations of the model, including that the BLUP Study population and phenotypes. Our study sample consists of component is likely to capture imperfectly the polygenicity of the 6,630 volunteers from the CANDELA cohort recruited in five traits examined (Supplementary Fig. 5). Latin American countries (Brazil, Colombia, Chile, Me´xico and Peru´; Supplementary Table 1)17,18. In these individuals, we performed a categorical assessment (in men and women) of: scalp Candidate genes in genome regions associated with hair traits. hair shape (curliness), colour, balding and greying as well as Several of the regions showing genome-wide significant associa- (in men) of beard thickness (that is, density), monobrow tion include strong candidate genes. Scalp hair shape and beard and eyebrow thickness (Fig. 1 and Supplementary Fig. 1). thickness are strongly associated with SNPs in 2q12 (Fig 3a,b) and These individuals were genotyped on Illumina’s Omni two other traits (eyebrow thickness and monobrow) show Express BeadChip. After applying quality control filters, 669,462 genome-wide suggestive P values for SNPs in this same single-nucleotide polymorphisms (SNPs) and 6,357 individuals region (Supplementary Fig. 6). Associated SNPs overlap the were retained for further analyses (including 2,922 males). EDAR (ectodysplasin A receptor) gene. EDAR acts as part of Average autosomal admixture proportions for this sample were the EDA-EDAR-EDARADD signalling pathway23 during estimated as: 48% European, 46% Native American and 6% prenatal development to specify the location, size and shape of African, but with substantial inter-individual variation ectodermal appendages, such as hair follicles, teeth and glands23. (Supplementary Fig. 2). Several of the traits examined show low Strongest association with hair shape was observed for SNP

2 NATURE COMMUNICATIONS | 7:10815 | DOI: 10.1038/ncomms10815 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | DOI: 10.1038/ncomms10815 ARTICLE

Mono-brow Eyebrow Hair greying

Beard Hair colour

Hair shape Balding TCHH AR/EDA2R EDAR PRSS53 PAX3 SLC24A5 FOXL2 HERC2 TYR LNX1 GRID1 IRF4 SLC45A2 GATA3 PREP FOXP2 rs3827760 rs183671 rs1426654 14 rs12913832 rs12203592

12 rs11803731

10 rs112458845 rs2814331 rs11150606 ) rs4864809 rs117717824 p rs4258142 rs598952 rs17143387 (

8 rs6901317 rs2218065 10 6 –Log 4

2

0 1 2 3 4 5 6 7 8 9 X Y 10 11 12 13 14 15 16 17 18 20 22 PAR Figure 1 | GWAS results overview. At the top are shown drawings illustrating the seven hair features examined in the CANDELA study sample. Thick lines connect these features with the candidate genes identified in regions with SNPs reaching genome-wide significant association (Table 1). At the bottom is shown a composite Manhattan plot displaying all significantly associated SNPs for the hair features examined. The rs number of the SNP with the smallest P value is shown at the top of each association peak (Table 1 index SNP). Composite panels in this and subsequent figures were made using Photoshop67. rs3827760 (Fig. 3a and Table 1), coding for a V370A substitution rs3827760 (Supplementary Fig. 7). These intronic SNPs are in EDAR. This variant has been robustly associated with hair located in a different LD block and there is a recombination shape in East Asians16,24,25. In a previous study of the CANDELA hotspot between them and rs3827760 (Fig 3a,b and sample, which examined 30 ancestry informative markers, we Supplementary Fig. 7A). Interestingly, the first intron of EDAR found association between hair shape and a SNP in EDAR that is is rich in regulatory elements and SNPs in this intron show in strong linkage disequilibrium (LD) with rs3827760 in the 1000 evidence of recent selection in Europeans (discussed further genomes data17. The EDAR SNP showing strongest association below). Hypohidrotic ectodermal dysplasia is a Mendelian with beard density is not the coding SNP rs3827760 associated disorder caused by mutations in the EDA-EDAR-EDARADD with hair shape but rather rs365060 located B62 kb upstream of pathway and is characterized by sparse scalp hair, eyebrows and rs3827760 in the first intron of EDAR (Fig. 3b). Several SNPs in eye lashes23. Transgenic mice with increased Edar function have this intron have smaller association P values than rs3827760 and been shown to have thickened and straightened hair fibres26,27. analyses conditioning on rs3827760 suggest that the association We therefore examined chin hair follicle density in wild-type signal of SNPs in this intron is independent from that observed at mice and in an Edar gain-of-function transgenic strain

NATURE COMMUNICATIONS | 7:10815 | DOI: 10.1038/ncomms10815 | www.nature.com/naturecommunications 3 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms10815

Table 1 | Features of index SNPs showing strongest genome-wide significant association (P value o5 10 8) to the scalp and facial hair features examined in the CANDELA sample.

Trait Region SNP Closest gene P value Alleles ancestral4derived Hair shape 1q21 rs11803731 TCHH 3 10 12 A4T Hair shape 2q12 rs3827760 EDAR 3 10 119 A4G Hair shape 10p14 rs17143387 GATA3 4 10 8 T4G Hair shape 16p11 rs11150606 PRSS53 7 10 9 T4C Beard thickness 2q12 rs365060 EDAR 1 10 15 G4C Beard thickness 4q12 rs4864809 LNX1 1 10 8 A4G Beard thickness 6q21 rs6901317 PREP 4 10 8 G4T Beard thickness 7q31 rs117717824 FOXP2 1 10 8 G4T Eyebrow thickness 3q22 rs112458845 FOXL2 4 10 10 G4A Monobrow 2q36 rs2218065 PAX3 5 10 8 A4G Hair greying 6p25 rs12203592 IRF4 3 10 13 C4T Hair colour 5p13 rs183671 SLC45A2 2 10 59 T4G Hair colour 6p25 rs12203592 IRF4 1 10 13 C4T Hair colour 11q14 rs598952 TYR 2 10 8 T4A Hair colour 15q13 rs12913832 HERC2/OCA2 5 10 104 A4G Hair colour 15q21 rs1426654 SLC24A5 1 10 18 G4A Balding 10q22 rs2814331 GRID1 4 10 9 C4T Balding Xq12 rs4258142 AR/EDA2R 2 10 8 C4T

SNP, single-nucleotide polymorphism. Regions with intragenic index SNPs are shown in bold (the gene name is underlined if this index SNP is coding). SNPs in regions reaching only genome-wide suggestive association, P value o10 5,are listed in Supplementary Table 6.

(EdarTg951/Tg951)26. Consistent with the effect of EDAR on beard expressed around the eyes up to the time that hair is formed thickness, we found that the EdarTg951/Tg951 strain has (E13.5)35 and a mutant with altered Foxl2 expression (and BPES significantly lower chin hair follicle density compared with features) typically shows hair loss around the eyes36. wild-type mice (Fig. 4a,b). Monobrow shows genome-wide significant association with Apart from 2q12, scalp hair shape is associated with SNPs in SNPs in 2q36 with strongest association being observed for three other genomic regions (in 1q21, 10p14 and 16p11). marker rs2395845 located B70 kb downstream of the paired box Genome-wide significant association in the 16p11 region was gene 3 (PAX3) gene, an interesting candidate in the region strongest for rs11150606, the derived allele of which codes for a (Fig. 3e). Rare mutations of PAX3 have been shown to cause Q30R substitution in the Protease Serine S1 family member 53 Waardenburg syndrome type 1 (WS1). WS is a clinically and (PRSS53; Fig. 3c and Table 1). Proteases and protease inhibitors genetically heterogeneous Mendelian disorder of neural crest are known to be important for epidermal keratinization and derivatives whose manifestations include deafness, a range of regulate hair growth and cycling28. A spontaneous mouse mutant pigmentation abnormalities, broad nasal bridge and monobrow (frizzy, fr), characterized by curly whiskers, carries an amino-acid (seen in B85% of WS1 patients37). Intronic SNPs within PAX3 substitution in another serine protease (Prss8)29 and a have been implicated in recent GWASs of facial morphology, conditional knock-out with no expression of Prss8 in the particularly in relation with nasion position (the point just above epidermis shows hair abnormalities in newborns and defects in the nasal bridge)38,39. PAX3 is a key transcription factor during corneocyte morphogenesis, epidermal lipid composition, embryogenesis and analysis of mouse mutants have confirmed profilaggrin processing and tight junction assembly30. that it is essential to guide normal development of neural crest The 1q21.3 region overlaps the trichohyalin gene (TCHH), derivatives40. which has been previously associated with scalp hair curliness Hair greying shows genome-wide significant association to in Europeans14,15. Strongest association was seen for SNP SNP rs12203592 in intron 4 of the interferon regulatory factor 4 rs11803731, encoding a M790L substitution in TCHH gene (IRF4; Fig. 3f). This SNP also shows association with hair (Supplementary Fig. 8), consistent with previous GWASs14,15. colour in our sample (Table 1) and in previous studies this SNP TCHH is expressed in cornifying keratinocytes of epithelia, has been associated with skin, hair and eye pigmentation13. particularly in the inner root sheath (IRS) and the hair Recent in-vitro studies have shown that rs12203592 impacts on fibre medulla of hair follicles where it is involved in the cross- the function of an enhancer element regulating IRF4 expression linking of the cornified envelope with cellular keratin filaments31. and the induction of tyrosinase (TYR), a key enzyme in melanin SNPs in 10p14 overlap LINC00708, B150 kb downstream of the synthesis41. In addition to IRF4, hair colour (but not hair greying) GATA-binding protein 3 gene (GATA3), an interesting candidate shows genome-wide significant association to four other in this region (Supplementary Fig. 8). Gata3 is expressed in the well-established pigmentation gene regions (SLC45A2, TYR, hair follicle IRS of mice and a Gata3 null mutant shows abnormal OCA2/HERC2 and SLC24A5; Table 1 and Supplementary hair growth and shape32. Interestingly, Gata3 mutant hair follicles Fig. 8)13. Finally, for balding, we replicate association to the have greatly reduced expression of trichohyalin33. well-established Androgen Receptor/Ectodysplasin A2 Receptor Eyebrow thickness shows genome-wide significant association (AR/EDA2R) locus on Xq12 (ref. 12; Supplementary Fig. 8). to SNPs on 3q23 overlapping the forkhead box L2 (FOXL2) gene The other four genomic regions showing genome-wide (Fig. 3d). Rare mutations in the FOXL2 gene region (including association include no strong candidate genes with established coding variants as well as upstream and downstream intergenic roles in hair biology (Table 1 and Supplementary Fig. 8). Beard rearrangements) cause blepharophimosis syndrome (BPES)34,an thickness is associated with SNPs in 7q31, 4q12 and 6q21 where autosomal dominant eyelid malformation often accompanied by the nearest genes are, respectively: forkhead box P2 (FOXP2), thick eyebrows. Mouse experiments have shown that Foxl2 is ligand of numb-protein X 1 (LNX1) and prolyl endopeptidase

4 NATURE COMMUNICATIONS | 7:10815 | DOI: 10.1038/ncomms10815 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | DOI: 10.1038/ncomms10815 ARTICLE

abc 10p14 - rs17143387 : G - Hair shape 16p11 - rs11150606 : C - Hair shape 2q12 - rs365060 : G - Beard thickness

Colombia Colombia Colombia

Brazil Brazil Brazil

Chile Chile Chile

Mexico Mexico Mexico

Peru Peru Peru

Meta Meta Meta

–0.2 –0.15 –0.1 –0.05 0 0.05 –0.25–0.2 –0.15 –0.1 –0.05 0 0.05 0.1 0.15 –0.5 –0.4 –0.3 –0.2 –0.1 0 0.1 0.2 0.3 def 4q12 - rs4864809 : G - Beard thickness 6q21 - rs6901317 : T - Beard thickness 7q31 - rs117717824 : T - Beard thickness

Colombia Colombia Colombia

Brazil Brazil Brazil

Chile Chile Chile

Mexico Mexico Mexico

Peru Peru Peru

Meta Meta Meta

–0.4 –0.35 –0.3 –0.25 –0.2 –0.15 –0.1 –0.050 0.05 –0.5 –0.4 –0.3 –0.2 –0.1 0 0.1 0.2 –0.8 –0.6 –0.4 –0.2 0 0.2 0.4 ghi 3q22 - rs112458845 : A - Eyebrow thickness 2q36 - rs2218065 : G - Mono-brow 6p25 - rs12203592 : T - Hair greying Colombia Colombia Colombia

Brazil Brazil Brazil

Chile Chile Chile

Mexico Mexico Mexico

Peru Peru Peru

Meta Meta Meta

–0.3 –0.2 –0.1 0 –0.1 0.2 0.3 –0.4 –0.3 –0.2 –0.1 00.1 –0.1 –0.05 0 0.05 0.1 0.15 0.2 0.25 j 10q22 - rs2814331 : T - Balding Colombia

Brazil

Chile

Mexico

Peru

Meta

–0.12–0.1 –0.08 –0.06 –0.04 –0.02 0 0.02 0.04 Figure 2 | Effect sizes for the derived allele at index SNPs (Table 1) in ten genomic regions not previously associated with hair traits. (a) 10p14 hair shape, (b) 16p11 hair shape, (c) 2q12 beard thickness, (d) 4q12 bear thickness, (e) 6q12 beard thickness (f) 7q31 beard thickness (g) 3q22 eyebrow thickness, (h) 2q36 monobrow, (i) 6p25 hair greying, (j) 10q22 balding. Blue boxes represent regression coefficients (x axis) estimated in each country. Red boxes represent effect sizes estimated in the combined meta-analysis. Blue box sizes are proportional to sample size. Horizontal bars indicate a 95% confidence interval of width equal to 2 standard errors. Meta-analysis P values are shown in Supplementary Table 4. Similar plots for regions previously associated with hair traits that were replicated here are shown in Supplementary Fig. 4.

(PREP; Supplementary Fig. 8). None of these genes have known rs11150606 encoding the R30Q substitution in PRSS53, functions specifically related to hair. Similarly, balding is associated with scalp hair shape. Bioinformatic analysis indicates associated with SNPs in the second intron of the Glutamate that this amino-acid change could introduce a subtilisin/ receptor delta-1 subunit gene (GRID1) on 10q22 but this gene has kexin-like proprotein convertase site in PRSS53 (Supplementary no documented role in hair biology. Fig. 9), and thus could have the potential of affecting processing of the enzyme. To probe into the role of PRSS53 in hair development, we performed immunohistochemistry of human PRSS53 and hair shape. Among the SNPs in regions showing scalp hair follicles during active hair growth (anagen phase; genome-wide significant association, a functional role potentially Fig. 4c–j). Expression of PRSS53 was mainly detected in the underlying the observed association is most suggestive for developing IRS and pre-cortex of the hair fibre and in some

NATURE COMMUNICATIONS | 7:10815 | DOI: 10.1038/ncomms10815 | www.nature.com/naturecommunications 5 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms10815 Recombination rate (cM/Mb) Recombination rate (cM/Mb) abcRecombination rate (cM/Mb) 20 2q12 - Hair shape r 2 100 2q12 - Beard r 2 100 10 r 2 16p11 - Hair shape 100 rs3827760 rs365060 120 0.8 0.8 0.8 rs11150606 0.6 0.6 8 0.6 80 0.4 80 15 0.4 80 0.4 100 0.2 0.2 0.2

-value) 6 60 P -value) 80 60 -value) 60 ( P P 10 4 40 ( ( 60 10 10 40 10 40 2 20 40 5 –Log 20 20 0 0 –Log 20 –Log BCL7C FBXL19 HSD3B7 STX4 PRSS53 PRSS36 PYCARD ITGAM 0 0 0 0 MIR4519 ORAI3 STX1BZNF668 KAT8 FUS LOC101928736 SETD1A ZNF646 PRSS8 PYCARDOS

GCC2 RANBP2 EDAR SH3RF3-AS1 LIMS1 RANBP2 EDAR SH3RF3–AS1 MIR4266 MIR762 VKORC1 TRIM72 LIMS1 CCDC138 SH3RF3 CCDC138 SH3RF3 CTF1 BCKDK PYDC1

MIR4265 MIR4265 FBXL19–AS1 109.2 109.4 109.6 109.8 109.2 109.4 109.6 109.8 30.9 31 31.1 31.2 31.3 Position on chr2 (Mb) Position on chr2 (Mb) Position on chr16 (Mb)

deRecombination rate (cM/Mb) f Recombination rate (cM/Mb) Recombination rate (cM/Mb) 3q22 - Eyebrow r 2 100 10 r 2 2q36 - Mono-brow 100 14 r 2 6p25 - Hair greying 100 rs112458845 10 0.8 0.8 0.8 rs12203592 0.6 0.6 12 0.6 0.4 80 0.4 0.4 8 rs2218065 80 80 8 0.2 0.2 10 0.2

-value) 60 -value) P 6 6 60 ( -value) 8 60 P ( P ( 10 4 40 10 4 40 6 10 40 2 20 –Log 4 –Log 2 20 0 0 –Log 20 2 CEP70 PIK3CB FOXL2 PRR23C PISRT1 MRPS22 0 0 FAIM FOXL2NB BPESC1 0 0 PRR23A PAX3 PRR23B CCDC140 DUSP22 IRF4 EXOC2 138.4 138.6 138.8 139 222.9 223 223.1 223.2 0.2 0.3 0.4 0.5 Position on chr3 (Mb) Position on chr2 (Mb) Position on chr6 (Mb)

Figure 3 | Association plots for six regions with SNPs showing genome-wide significant association to hair traits. (a) 2q12 hair shape, (b) 2q12 beard thickness, (c) 16p11 hair shape, (d) 3q22 eyebrow thickness, (e) 2q36 monobrow, (f) 6q25 hair greying. The index SNP in each region (Table 1) is shown as a purple diamond. At the top of the figure are shown the association results (on a -log10 P scale; left y axis) for all genotyped and imputed SNPs. The dot colour indicates the strength of LD (r2) between the index SNP and each SNP (based on the 1000genomes AMR data set). Recombination rate across the region, in the AMR data, is shown as a continuous blue line (scale on the right y axis). Genes in the region are shown in the middle. These plots were produced using LocusZoom68. Below each LocusZoom plot we show an LD heatmap (using r2, red indicating r2 ¼ 1 and white indicating r2 ¼ 0) produced using Haploview69. Coordinates used are from sequence build 37. Plots for regions not shown here are presented in Supplementary Fig. 8. bulbar melanocytes (Fig. 4c). PRSS53 expression was increased in not yet known how the ‘slippage’ of the inner versus outer hair subpopulations of IRS keratinocytes corresponding to early and follicle components occurs during outward hair fibre growth, but late stages of hair fibre keratinization and IRS cornification it is likely that protease activity is involved. Expression of PRSS53 (Fig. 4d,e). This pattern of expression is consistent with the hair- changes in the upper IRS (Fig. 4f), at the level of the sebaceous shape association we observe, as the cornifying IRS is thought to gland, where the IRS undergoes a controlled dissolution, required impart significant hair-shaping influence on the hair fibre42. for exiting of the hair fibre to the skin surface. Expression of PRSS53 appears to be modulated as a function To evaluate the cellular impact of the Q30R substitution in of hair fibre differentiation, as evidenced by high PRSS53 PRSS53, we expressed both forms of the enzyme in 293-EBNA expression at the level of the hair follicle where the hair cells, a human cell line showing pro-protein convertase activity45. fibre undergoes dissolution of nuclear DNA (Fig. 4e)43. Western blot analysis of transfected cell extracts revealed a Double immunofluorescence for PRSS53 and TCHH revealed different signal peptide processing in the two forms of the enzyme co-expression of these in specific elements of the IRS and that PRSS53 Q30R has a slightly faster electrophoretic (Fig. 4g–j), suggesting that PRSS53 may also be associated with mobility, consistent with an extra proteolytic cleavage (Fig. 5a and the maturation of this hair follicle sheath. The only other Supplementary Fig. 9). These differences were not detected when component of the hair follicle that expresses TCHH is the the cell cultures were incubated with an inhibitor of pro-protein medulla of the hair fibre and this was also found to express convertases (decanoyl-RVKR-CMK, DECA). Consistent with an PRSS53 (Fig. 4i). The medulla is a non-obligate component of the altered processing of the enzyme affecting its secretion, the hair fibre, but when present it impacts on the mechanical PRSS53 Q30R variant was less abundant in the cell culture media, properties of hair44. It is interesting that PRSS53 is expressed in except when cells were incubated in the presence of DECA the companion layer of the IRS (Fig. 4e,h–j), directly opposing the (Fig. 5b). In agreement with the reduced secretion of PRSS53 non-keratinizing, non-upwardly moving outer root sheath. It is Q30R seen in the western blot analyses, immunocytochemistry

6 NATURE COMMUNICATIONS | 7:10815 | DOI: 10.1038/ncomms10815 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | DOI: 10.1038/ncomms10815 ARTICLE

ab Edar+/+ EdarTg951/Tg951 60

Upper lip 2 50 P = 0.028 Tongue 40

30 Lower jaw 20

10 Hair placodes per mm 0 +/+ Tg951/Tg951

cdORS ef IRS pCo ORS IRS Sg

FP IRS IRS Co HCo IRS * *

IRS ghIRS * i j* IRS IRS ORS IRS IRS pCo Co IRS IRS

Co Co * * Md FP *

Figure 4 | EDAR effects on mouse facial hair follicle density and expression of PRSS53 in anagen (growing) human hair follicles. (a) Frontal photographs showing part of the lower facial region from 14.5-day-old mouse embryos stained by in-situ hybridization for detecting Sostdc1 to reveal hair placodes (primordia of hair follicles) as blue foci. We compared placode density in the lower jaw of wild-type ( þ / þ ) mice with an Edar transgenic having a high copy number of Edar (EdarTg951/Tg951)26. The black scale bar equals 0.5 mm. (b) Bar plot comparing placode density in mice with different Edar genotypes (n ¼ 4). Mean density in Edar þ/ þ mice was 53 placodes per mm2 (standard deviation ¼ 1.3) and 35 placodes per mm2 (standard deviation ¼ 1.5) in EdarTg951/Tg951 mice, the difference between means being significant (exact P value of 0.028). Error bars represent ±3 standard deviations. (c–f) Anagen human hair follicle stained with anti-PRSS53 antibody (green) and with anti-melanocyte antibody (red), and counterstained with 4,6-diamidino-2-phenylindole (DAPI; blue, nuclei). (c) Hair follicle bulb showing PRSS53 expression in the developing IRS, pre-cortex and in some melanocytes (arrow and inset) as indicated by the yellow–orange staining. (d) Mid hair follicle showing expression of PRSS53 in maturing IRS keratinocytes (arrows). (e) Distal hair follicle showing high expression of PRSS53 in IRS cells around the level of DNA degradation in hair fibre (HF) keratinocytes (as indicated by a reduction in DAPI staining in this region HF). PRSS53 is also expressed in the IRS companion layer (CL; *). (f) Upper distal hair follicle at sebaceous gland level (Sg) showing PRSS53 expression in scattered peri-follicular cells just below the Sg and in the IRS at the point of its dissolution (arrows). (g–j) Anagen human hair follicle stained with anti-PRSS53 antibody (green) and with anti-TCHH antibody (red), and counter-stained with DAPI (blue, nuclei). (g) Hair follicle bulb showing co-localization (orange/yellow) of PRSS53 and TCHH in the developing IRS, especially in the most external IRS layer. (h) Supra-bulbar region of the hair follicle showing expression of PRSS53 in the developing companion layer (*) of the IRS (green) and some co- localization with TCHH in the inner IRS. (i) Mid hair follicle showing expression of PRSS53 in the IRS companion layer (*) and the central medulla (Md) of the developing HF. (j) Upper hair follicle showing PRSS53 expression in the companion layer of the IRS (*) and in the TCHH-positive IRS. Co, hair fibre cortex; FP, follicular papilla; IRS, inner root sheath; Md, medulla; pCo, pre-cortex; ORS, outer root sheath; Sg, sebaceous gland. Grey scale bars correspond to 40 mm in each figure. indicates a greater accumulation of this variant in the whether there is enrichment for signatures of selection at the ER-transGolgi network (Fig. 5c). Altogether, these in-vitro regions showing evidence of association with hair traits in the analyses confirm that the Q30R substitution in PRSS53 can CANDELA sample. For this purpose, we used the affect processing and secretion of the enzyme. Composite of Multiple Signals (CMS) statistic calculated in the three main reference East Asian, European and African populations from the 1000 Genomes Project data47 (ASN, CEU Signatures of selection at associated gene regions. Some of the and YRI, respectively). We contrasted the distribution of CMS strongest signals of selection in the human genome detected in scores at gene regions showing at least suggestive association to recent genome-wide searches involve pigmentation genes in hair features (that is, regions marked by SNPs with association Europeans and EDAR in East Asians27,46. To assess whether P valueso10 5; Supplementary Table 6) with the distribution of selection could have contributed broadly to shape variation at CMS scores across the genome. We found significantly higher gene regions impacting on human hair features, we examined CMS scores in the hair-associated gene regions compared with

NATURE COMMUNICATIONS | 7:10815 | DOI: 10.1038/ncomms10815 | www.nature.com/naturecommunications 7 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms10815

ab Cell extracts Medium – DECA + DECA – DECA + DECA

C PRSS53PRSS53C Q30RPRSS53PRSS53 Q30R 75 C PRSS53PRSS53C Q30RPRSS53PRSS53 Q30R

50 50 Actin

c FLAG Calreticulin Merge

PRSS53

FLAG Calreticutlin Merge

PRSS53 Q30R

FLAG Calreticulin

Negative control

d 16p11

12 ASN rs11150606

8

4 CMS scores 0

STX4 PRSS53 PRSS36 ZNF668 KAT8 ZNF646 VKORC1 PRSS8 BCKDK 30.90 30.9531.00 31.05 30.10 Position (Mb)

Figure 5 | Processing of PRSS53 and signals of selection in the PRSS53 gene region. Comparison of PRSS53 and PRSS53 (Q30R) from cell extracts (a)and media (b), after expression in 293-EBNA cells cultured in the absence ( )orpresence(þ ) of DECA (decanoyl-RVKR-CMK, a pro-protein convertase inhibitor). (a) The top two bands seen in lanes loaded with PRSS53 cell extracts, result from partial processing of the signal peptide. In the absence of DECA, there appears to be an accumulation of PRSS53 (Q30R) without signal peptide. PRSS53 (Q30R) also has a slightly faster mobility compared with PRSS53 (a difference of eight amino acids based on the location of the pro-protein convertase site Supplementary Fig. 9). In the presence of DECA, both forms of the enzyme appear identically processed and there is no difference in mobility of the proteins. (b) The medium from cell cultures grown in the absence of DECA shows a reduced amount of PRSS53 (Q30R), compared with PRSS53. In the presence of DECA, the amount of protein in the media is similar for the two forms of the enzyme. Recombinant proteins were detected using an anti-FLAG antibody after migration on 13% SDS–polyacrylamide gel electrophoresis gels. Molecular weight markers (kDa) are indicated on the left. Beta-actin was used as a loading control (C ¼ cells transfected with an empty vector). Full immunoblots for a and b are shown in Supplementary Fig. 11. (c) Immunostaining of 293-EBNA cells expressing PRSS53 and PRSS53 (Q30R). The enzyme was detected using an anti-FLAG antibody and the endoplasmic reticulum (ER) stained with an anti-calreticulin antibody. There is more abundant co-localization of the enzyme with the ER (white arrows) for PRSS53 (Q30R) compared with PRSS53, consistent with the intracellular accumulation of PRSS53 (Q30R). Negative controls, lacking primary antibodies for FLAG and calreticulin, are also shown. The scale bar indicates 10 mm. Magnification was the same for all photographs. (d) The top panel shows CMS scores (red dots) in the 1000 genomes ASN (JPT þCHB) data for SNPs in the 16p11 region associated with hair shape. The dotted line indicates the empirical significance threshold (1%). The SNP with the highest CMS score is rs11150606, coding for the Q30R substitution in PRSS53 (highlighted in purple) associated with hair curliness (Table 1 and Fig. 3c). Genes in the region are shown in the panel below (introns in magenta, exons in green). Coordinates are from human genome sequence Build 37.

8 NATURE COMMUNICATIONS | 7:10815 | DOI: 10.1038/ncomms10815 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | DOI: 10.1038/ncomms10815 ARTICLE the genome-wide distribution (one-sided Mann–Whitney U-test that the TCHH SNP with smallest P value observed here P value ¼ 2 10 8, P ¼ 2 10 5 and P ¼ 2 10 5 in ASN, (rs11803731) has also been strongly implicated by previous CEU and YRI, respectively) and a significantly higher proportion analyses in Europeans15,48, suggesting that this variant might be of SNPs with empirically significant CMS scores (that is, SNPs in directly affecting hair shape. The M790L substitution in TCHH the top 1% of the distribution; one-sided Mann–Whitney U-test encoded by rs11803731 is not predicted to result in major P values of 5 10 7,1 10 4 and 4 10 4, in ASN, CEU structural alterations of TCHH, but it has been proposed that it and YRI, respectively). Noticeably, significant CMS scores were could affect the post-translational processing of TCHH, with observed in ASN for SNPs in the PRSS53 region associated with regulatory implications48. It will be important to assess whether hair shape, the highest CMS score (12.96, empirical P value TCHH is a substrate for PRSS53, given our observation that this 3 10 4) being observed for SNP rs11150606, encoding the enzyme is enriched in all layers of the IRS during the Q30R substitution in PRSS53 (Fig. 5d). In the EDAR region, we cornification/hardening stages of the hair fibre, as well as in the observe the previously documented strong signal of selection in medulla (a component influencing the physico-mechanical ASN for variants around SNP rs3827760 (refs 27,46, associated characteristics of the hair fibre6). Proteases and protease with hair shape (Table 1). We also observe a significant signal of inhibitors are important for epidermal keratinization and can selection in CEU for the intronic EDAR variants associated here regulate hair growth and cycling28. For example, optimal with beard thickness, independently of rs3827760, discussed desquamation of the IRS (and overall hair growth cycling) is above (Fig. 3b and Supplementary Fig. 7b). perturbed in lysosomal cysteine protease cathepsin L knock-out mice49. Moreover, given the similarity of PRSS53 with the kallikreins (the largest family of secreted serine protease Discussion endopeptidases) it is possible that PRSS53 lyses substrates in The analyses presented here have enabled us to expand cornified tissues that ultimately desquamate50. Interestingly, it has substantially the set of gene regions known to impact on been suggested that GATA3 (in the 10p14 region associated with variation in human head hair appearance. This task has been hair shape) inhibits the expression of serine protease inhibitor facilitated by the extensive phenotypic and genetic diversity of the Kazal type-5 (SPINK5), mutations of which cause Netherton CANDELA sample, a result of Latin American history involving syndrome, an autosomal recessive congenital ichthyosis admixture between Africans, Europeans and Native Americans17. characterized by so-called Bamboo hair, epidermal hyperplasia The predominant European and Native American ancestry of the and an impaired epidermal barrier function51. CANDELA sample is expected to provide especially high power The colour of hair results from melanin pigments transferred for the detection of genetic effects at loci with differentiated allele to hair fibre keratinocytes from hair follicle melanocytes. These frequencies between those two continental populations. melanocytes differentiate from melanoblasts that migrate from Consistently, allele frequencies at most index SNPs identified the neural crest into hair follicles early in development6. Some here show large differences between Europeans and East Asians/ hair follicle melanoblasts remain undifferentiated and serve as Native Americans (Supplementary Table 7). This strong stem cells for the periodic replenishment of mature melanocytes, differentiation in allele frequencies could relate partly to melanogenesis occurring only in the anagen phase of the hair selection acting on the associated gene regions, as proposed for growth cycle. Among the several hundred gene products known the evolution of human hair appearance2,3,11. The enrichment we to participate in melanogenesis, recent association studies have observe for significant CMS scores at gene regions associated with identified a handful that influence hair colour variation in hair features is consistent with this scenario. Interestingly, the Europeans13. Most of these associations have been replicated here pattern of variation at PRSS53 is similar to that seen for EDAR, (Table 1), including that of the derived T allele at SNP rs12203592 with significant CMS scores in East Asians and a functional in IRF4 with lighter hair colour. It has been shown experimentally changing variant reaching high frequency only in East that IRF4 interacts with the microphthalmia-associated Asia (Supplementary Table 7). These observations, and the fact transcription factor (MITF, a key regulator of the expression of that other genes are associated with straight hair in Europeans, many pigment enzymes and differentiation factors), to activate are in line with the proposal that scalp hair shape has been the the expression of TYR (a rate-limiting essential enzyme in subject of recent selection in humans2,3. melanin synthesis). The derived T allele at rs12203592 leads to There is increasing interest in elucidating the mechanisms reduced TYR expression and melanin synthesis, consistent with influencing the shape of the growing hair fibre. Based on mouse the association of this allele with lighter hair colour41. In line with studies it has been proposed that EDAR signalling is involved in the geographic distribution of light hair colour, the T allele at determining hair shape through regulating expression of the key rs12203592 is essentially absent outside Europe (Supplementary hair growth signal Shh (Sonic hedgehog)26. Higher Edar function Table 7). Interestingly, we find that the T allele at SNP increases Shh expression and causes it to become symmetrically rs12203592 is also associated with increased hair greying. expressed in the hair bulb, leading to straight hair growth Experimental evidence suggests that the mechanism of hair presumably through promotion of symmetric cell proliferation26. greying involves incomplete maintenance of melanocyte stem Structurally, a key player in shaping hair is TCHH, involved in cells in the hair follicle6. Importantly, MITF is known to affect crosslinking the cornified envelope and keratins of IRS cells31. melanocyte survival via its regulation of anti-apoptotic Bcl2 TCHH is one of the earliest differentiation proteins of the expression, a key factor in protection of the hair follicle against growing (anagen) hair follicle bulb, and is enriched in the IRS oxidative stress6. To probe the mechanism by which IRF4 might (representing B30% of its protein content). TCHH confers impact on hair greying, it will therefore be important to evaluate significant mechanical strength to the IRS via its enrichment in whether the T allele at rs12203592 influences MITF in terms of arginines and glutamines, which represent over 40% of melanoblast stem cell maintenance and survival or via melanocyte TCHH amino acids. Many of the arginine residues undergo loss post differentiation. citrullination/deimination, whereas the glutamines are involved The skin on different parts of the body has different hair in intra- and interprotein chain crosslinks during the characteristics, with the final hair distribution dependent upon cornification/hardening of the IRS31. The progressive hardening the spacing pattern laid down during development, the extent of of the IRS is thus thought to contribute to the moulding of the skin growth that occurs subsequent to pattern establishment, and still-pliable hair fibre during hair growth. It is interesting to note to hormonal and ageing effects. The development of skin at

NATURE COMMUNICATIONS | 7:10815 | DOI: 10.1038/ncomms10815 | www.nature.com/naturecommunications 9 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms10815 different body sites is known to be controlled by an underlying separately for shaven and unshaven individuals and scores for these two groups transcription factor code52 to which FOXL2, FOXP2 and PAX3 subsequently merged. As an interview of the volunteers indicated that most women may contribute in defining hair distribution on specific areas of modified their eyebrows, monobrow and eyebrow thickness were also only scored in men. Both these traits were scored on three-point scales: eyebrow thickness as the face. As hair of the beard is produced through a two-stage low, medium or high, and monobrow as none, medium or high. process of embryonic hair follicle patterning followed by a The frequency distribution of the traits in the CANDELA sample analysed here post-pubertal androgen-driven transformation into terminal hair, is shown in Supplementary Fig. 1. beard thickness could be modulated by genes acting either prenatally or at puberty. Our finding of reduced hair placode DNA genotyping and quality control. DNA samples from participants were density in embryonic mice with increased Edar expression genotyped on the Illumina HumanOmniExpress chip including 730,525 SNPs. suggests that the basis for this variation lies in the recognized PLINK v1.9 (ref. 57) was used to exclude SNPs and individuals with more than 5% 53 missing data, markers with minor allele frequency o1%, related individuals and role of EDAR in developmental hair patterning . It is likely that those who failed the X-chromosome sex concordance check (sex estimated from EDAR function affects hair follicle density on most or all of the X-chromosome heterozygosity not matching recorded sex information). After human body, as described in the mouse26, but that on the head applying these filters, 669,462 SNPs and 6,357 individuals (2,922 males and 3,435 this effect is most readily apparent as variation in beard thickness. females) were retained for further analysis. Because of the admixed nature of the study sample (Supplementary Fig. 2), there is an inflation in Hardy–Weinberg Analyses focusing on the mechanism by which associated genetic P values. We therefore did not exclude markers based on Hardy–Weinberg variants affect regional facial hair density should provide insights deviation. into developmental patterning in humans, and perhaps yield clues into the genetic basis for the striking modification of hair Statistical genetic analyses. P values for Pearson correlation coefficients were 54 distribution that has occurred in human evolution . obtained by permutation. Narrow-sense heritability (computed as the additive Elucidating the genetic architecture of normal variation in hair phenotypic variance explained by a kinship matrix computed from the chip 20 traits has implications outside basic bioscience. Among human genotypes) was estimated using GCTA by fitting an additive linear model with a random effect term whose variance is given by the kinship matrix, with age and sex visible phenotypes hair appearance is perhaps the mostly easily as covariates. The kinship matrix was obtained using the LDAK approach19, which modified, a feature prominently exploited by the cosmetics industry. accounts for LD between SNPs. African, European and Native American ancestry This industry has traditionally focused on the development of was estimated from a set of 93,328 autosomal SNPs (LD-pruned from the full 58 products altering the appearance of keratinous hair fibres after their chip data) via supervised runs of ADMIXTURE . Reference putative parental population data included in the ADMIXTURE analyses for Africans and exit from the skin surface. However, there is currently great interest Europeans were chosen from HAPMAP and for Native Americans from in exploring whether hair appearance can be modified as it is selected Amerindian populations as described in Ruiz-Linares et al.17 formed in the hair follicle55. This includes evaluating whether hair The chip genotype data were phased using SHAPEIT2 (ref. 59). IMPUTE2 greying could be slowed or blocked, and elucidating the mechanism (ref. 60) was then used to impute genotypes at untyped SNPs using variant positions from the 1000 Genomes Phase I data61. The 1000 Genomes reference by which IRF4 influences hair greying could provide targets of data set included haplotype information for 1,092 individuals for 36,820,992 intervention for this purpose. Similarly, modulating the activity of variant positions. Positions that are monomorphic in 1000 Genomes Latin PRSS53 in the IRS and medulla is a candidate pathway with a view American samples (CLM, MXL and PUR) were excluded, leading to 11,025,002 to purposefully altering hair shape. The genetics of hair appearance SNPs being imputed in our data set. Of these, 22,737 SNPs had imputation quality is also of interest in anthropology and forensics, particularly for the scores o0.3 and were excluded. The IMPUTE2 genotype probabilities at each locus were converted into best-guess genotypes using PLINK57. SNPs with uncalled prediction of hair features based on genetic information. The genotypes in 45% of samples or minor allele frequency o1% were excluded. The implementation of this so-called ‘forensic DNA phenotyping’ final imputed data set used in the GWAS analyses included genotypes for 9,143,600 promises to contribute an investigative tool in cases where a SNPs. biological sample is available but there is lack of other information PLINK 1.9 (ref. 57) was used to perform genome-wide association tests for each phenotype using multiple linear regression with an additive genetic model regarding the identity of its contributor. The development of this incorporating age, sex and five genetic PCs (principal components) as covariates. approach in Europeans is fairly advanced for hair colour, and is The genetic PCs were obtained from the LD-pruned data set of 93,328 SNPs using beginningtobeexploredforbaldingandhairshape56. Considering PLINK 1.9. These PCs were selected by inspecting the proportion of variance the high genetic and phenotypic diversity of Latin American explained and checking scatter and screen plots (Supplementary Fig. 3A). Individual outliers were removed and PCs recalculated after each removal. Using populations, appropriate tools will need to be in place for reliable these PCs, the Q–Q plots for all association tests showed no sign of inflation, the phenotypic prediction in that context and the results presented here genomic control factor l beingo1.02 in all cases (Supplementary Fig. 3B). We represent a step in that direction. previously showed that using five PCs in GWAS of the CANDELA sample completely removes the inflation, which is observed when PC adjustment is not used, and increasing the number of PCs included in the regression from five to ten Methods does not provide additional gain18. Association analysis on the imputed data set Study subjects. A total of 6,630 volunteers from five Latin American countries were performed using the best-guess imputed genotypes in PLINK and using the (Brazil, Chile, Colombia, Mexico and Peru), part of the CANDELA consortium IMPUTE2 genotype probabilities obtained in SNPTEST v2.5 (ref. 62). Association sample (http://www.ucl.ac.uk/silva/candela)17,18, were included in this study results from both approaches were consistent with each other and with the results (Supplementary Table 1). Ethics approval was obtained from: the Universidad from the chip genotype data. For analysis of the X chromosome, an inactivation Nacional Auto´noma de Me´xico (Me´xico), the Universidad de Antioquia model was used (male genotypes encoded as 0/2 and female genotypes as 0/1/2). (Colombia), the Universidad Peruana Cayetano Heredia (Peru´), the Universidad de Individuals with red hair were excluded from the final hair colour GWAS, as it was Tarapaca´ (Chile), the Universidade Federal do Rio Grande do Sul (Brazil) and the a rare phenotype (0.55%). Similarly, few individuals were scored as having frizzy University College London (UK). All participants provided written informed hair (2.4%) and these individuals were excluded from the hair shape GWAS. consent. Blood samples were collected by a certified phlebotomist and DNA Analyses for hair greying were performed with the five-point scores or with a extracted following standard laboratory procedures. two-point scale of some greying or no greying and produced similar results. A meta-analysis was carried out for the index SNPs identified in the GWAS (Table 1) by testing for association separately in each country sample and Hair phenotyping. Scalp hair features were recorded by physical examination combining the results using the meta-analysis software METAL22 (as implemented of the volunteers. Natural hair colour was scored in four categories (1-red/reddish, in PLINK 1.9). Forest plots were produced with MATLAB combining all regression 2-blond, 3-dark blond/light brown or 4-brown/black). Greying was scored on a coefficients and standard errors. Cochran’s Q statistic was computed for each trait five-point scale: 1-for no greying, 2-for predominant non-greying, 3-for B50% to test for effect size heterogeneity across country samples. For SNPs with greying, 4-for predominant greying and 5-for totally white hair. Hair curliness significant heterogeneity, a random effects model was used for meta-analysis63. was scored as 1-straight, 2-wavy, 3-curly or 4-frizzy. Balding was scored on a A prediction model was constructed for each hair trait examined, including the three-point scale (none, medium, high) in both women and men. Although associated index SNPs (Table 1) as fixed effects and a random effect term obtained frequency of balding in women is low, their inclusion more than doubles sample via BLUP (Best Linear Unbiased Predictor) using the genome-wide kinship matrix size, thus adding considerable power to the analyses (Supplementary Fig. 10). obtained from the chip data19. The BLUP component is thus sensitive to the effect Facial hair traits were scored using photographs of the faces of the individuals. of SNPs below the threshold for genome-wide significance. Prediction results were Beard density was scored in men using a three-point scale (low, medium or high), obtained through tenfold cross-validation64: the set of samples were randomly split

10 NATURE COMMUNICATIONS | 7:10815 | DOI: 10.1038/ncomms10815 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | DOI: 10.1038/ncomms10815 ARTICLE into ten chunks of 10% and used as the test data set, whereas the remaining 90% till 100 ml using and Speed-Vac centrifuge (Savant). Then, 12 ml of each sample was used as training data set to fit a prediction model. For each run, R2 estimates were loaded per lane for further identification as indicated above. A pCEP control were calculated (conditional on the covariates) to obtain the proportion of plasmid (empty vector) was used as a negative control for all these assays. phenotypic variance explained by the model. Average prediction scores across the Cell staining. For immunocytochemical analysis of 293-EBNA cells expressing tenfold runs were calculated. Genetic PCs, age and sex (except for facial hair traits PRSS53 or PRSS53 Q30R, cells were fixed with 4% paraformaldehyde in which are male-only) were used as covariates. phosphate-buffered saline buffer and cells were then blocked with 15% fetal bovine To evaluate an enrichment of selection signals at gene regions associated with serum. To detect recombinant PRSS53 and PRSS53 Q30R proteins, blocked slides hair traits, we examined the CMS scores of selection calculated for the three main were incubated overnight with an anti-FLAG antibody (Sigma), followed by 2 h of 1000 Genomes Project populations46: ASN (JPT þ CHB), CEU and YRI. We incubation with a secondary Alexa 488-conjugated antibody (Life Technologies). obtained empirical significance cutoffs (1%) separately for each population An anti-calreticulin antibody (Mirus) was employed to examine co-localization based on CMS scores for 3,071,032, 3,179,944 and 3,312,050 SNPs (in ASN, CEU of both forms of PRSS53 with endoplasmic reticulum. To detect calreticulin, a and YRI, respectively). In each population, we estimated the mean CMS score for secondary Alexa 546-conjugated antibody was employed. For negative controls, SNPs in a ±2 kb region around each gene, based on the UCSC RefSeq annotation same protocol was employed with the exception that primary antibodies were (excluding gene regions with less than four SNPs). We obtained the distribution absent. Images were obtained using a fluorescence microscope and a digital camera of mean CMS scores for genes including SNPs associated with the hair traits (Axiovert). examined here and compared it with the distribution for all other genes in the genome (19,835, 20,093 and 20,386 gene regions in ASN, CEU and YRI, respectively). We used a suggestive significance level cutoff for inclusion in the set References of hair-associated genes (that is, SNP association P value o10 5; Table 1 and 1. Bradley, B. J. M., N.I. The primate palette: the evolution of primate coloration. Supplementary Table 6), resulting in 53, 53 and 55 gene regions being included for Evol. Anthropol. 17, 97–111 (2008). ASN, CEU and YRI, respectively. If associated SNPs were in intergenic regions, we 2. Jablonski, N. G. Skin: A Natural History (Univ. of California Press, 2006). included the gene most closely located to this SNP. We contrasted the distribution 3. Jablonski, N. G. The naked truth. Sci. Am. 302, 42–49 (2010). of mean CMS scores at these gene regions with the distribution for all other gene 4. Loussouarn, G. et al. Worldwide diversity of hair curliness: a new method of regions in the genome using a one-sided Mann–Whitney U-test. assessment. Int. J. Dermatol. 46(Suppl 1): 2–6 (2007). 5. Panhard, S., Lozano, I. & Loussouarn, G. Greying of the human hair: a worldwide survey, revisiting the ’50’ rule of thumb. Br. J. Dermatol. 167, Assessment of mouse chin hair follicle density. We examined chin hair placode 865–873 (2012). (follicle primordium) density in E14.5 day mouse embryos. We focused on 6. Westgate, G. E., Botchkareva, N. V. & Tobin, D. J. The biology of hair diversity. embryonic day 14.5 (E14.5) as at this embryonic stage the hair pattern is laid out, Int. J. Cosmet. Sci. 35, 329–336 (2013). so that the sites of future hair follicle growth are readily quantifiable by detecting 7. Nyholt, D. R., Gillespie, N. A., Heath, A. C. & Martin, N. G. Genetic basis of focal expression of the marker gene Sostdc1 (ref. 65). We performed in-situ male pattern baldness. J. Invest. Dermatol. 121, 1561–1564 (2003). 18 hybridization staining embryos for Sostdc1 to visualize the placodes present on 8. Gunn, D. A. et al. Why some women look young for their age. PLoS ONE 4, the lower jaw. At this stage, placodes are visible as rings or filled rings of Sostdc1 e8021 (2009). expression. Embryos were positioned to view the lower jaw, imaged from a frontal 9. Medland, S. E., Zhu, G. & Martin, N. G. Estimating the heritability of hair view and placodes were counted within a rectangular area on the lower jaw. curliness in twins of European ancestry. Twin. Res. Hum. Genet. 12, 514–518 Placode density for each embryo was determined using Image-Pro software (Media (2009). Cybernetics). þ / þ 2 10. Lin, B. D. et al. Heritability and Genome-Wide Association Studies for Hair Average hair placode density for the Edar was 53 placodes per mm Color in a Dutch Twin Family Based Sample. Genes (Basel) 6, 559–576 (2015). (standard deviation ¼ 1.3, sample size ¼ 4) compared with EdarTg951/Tg951, which 11. Frost, P. The Puzzle of European Hair, Eye, and Skin Color. Adv. Anthropol. 4, was reduced to 35 placodes per mm2 (standard deviation ¼ 1.5, sample size ¼ 4). 78–88 (2014). All values in the wild-type group were higher than in the transgenic group. A 12. Li, R. et al. Six novel susceptibility Loci for early-onset androgenetic alopecia non-parametric Mann–Whitney U-test was applied to test the difference between and their unexpected association with common diseases. PLoS Genet. 8, the two sets of hair density values. An exact P value calculation was used. e1002746 (2012). 13. Liu, F., Wen, B. & Kayser, M. Colorful DNA polymorphisms in humans. Semin. Immunohistochemistry of PRSS53. Unshaven, full-thickness human adult scalp Cell Dev. Biol. 24, 562–575 (2013). with terminal hair was used snap frozen in liquid nitrogen in cubes of 2 cm3. 14. Medland, S. E. et al. Common variants in the trichohyalin gene are associated Cryosections of 6–8 mm were cut using a cryostat onto adhesive glass slides with straight hair in Europeans. Am. J. Hum. Genet. 85, 750–755 (2009). and incubated with anti-PRSS53 antibody (NBP1, #90678 from Novus) and 15. Eriksson, N. et al. Web-based, participant-driven studies yield novel genetic anti-trichohyalin antibody (AE15, # IQ337 from Immuquest) using standard associations for common traits. PLoS Genet. 6, e1000993 (2010). double immunofluorescence protocols. IgG isotype controls were used at the same 16. Tan, J. et al. The adaptive variant EDARV370A is associated with straight hair concentration as the smallest primary antibody dilution. Co-distribution and in East Asians. Hum. Genet. 132, 1187–1191 (2013). co-localization of both antigens in the hair follicle were determined by merging of 17. Ruiz-Linares, A. et al. Admixture in latin america: geographic structure, the PRSS53- and trichohyalin-positive channels. phenotypic diversity and self-perception of ancestry based on 7,342 individuals. PLoS Genet. 10, e1004572 (2014). 18. Adhikari, K. et al. A genome-wide association study identifies multiple loci for In-vitro analysis of PRSS53. Mutagenesis. To introduce the Q30R substitution in variation in human ear morphology. Nat. Commun. 6, 7500 (2015). the PRSS56 sequence, the QuikChange II XL Site-Directed Mutagenesis Kit 19. Speed, D., Hemani, G., Johnson, M. R. & Balding, D. J. Improved heritability (Agilent) was employed following the instructions by the manufacturer. Primers 0 estimation from genome-wide SNPs. Am. J. Hum. Genet. 91, 1011–1021 (2012). designed for mutagenesis were: poli3MUT-FOR 5 -CAGCGTGCCTGTGGACGG 20. Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome- CGTGGCCCCGGC-30 and poli3MUR-REV 50-GCCGGGGCCACGCCGTCCAC 0 wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011). AGGCACGCTG-3 , and we employ as template a previously published plasmid 21. Purcell, S. et al. PLINK: a tool set for whole-genome association and containing the entire PRSS53 sequence, which includes a FLAG epitope66.PCR population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007). conditions were as follows: 95 °C, 1 min (1 cycle); 95 °C, 50 s, 66°, 50 s and 68° 22. Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of 12 min (18 cycles); and 68 °C, 7 min (1 cycle). PCR products were visualized in a genomewide association scans. Bioinformatics 26, 2190–2191 (2010). 1.0% agarose gel and DNA sequence was verified before experimental use. 23. Mikkola, M. L. Molecular aspects of hypohidrotic ectodermal dysplasia. Cell culture and transfection. 293-EBNA cells were routinely maintained at Am. J. Med. Genet. A 149A, 2031–2036 (2009). 37 °Cin5%CO2 in DMEM supplemented with 10% fetal bovine serum and 50 mgml 1 streptomycin and 100 U ml 1 penicillin (Life Technologies). 24. Fujimoto, A. et al. A scan for genetic determinants of human hair morphology: Expression vectors were transfected into cells using TransIT-X2 Dynamic Delivery EDAR is associated with Asian hair thickness. Hum. Mol. Genet. 17, 835–843 System (Mirus) as recommended by the manufacturer. Cell conditioned medium (2008). was obtained from 293-EBNA cultures for 2 days in medium without added serum. 25. Fujimoto, A. et al. A replication study confirmed the EDAR gene to be a major When indicated, the proprotein convertase inhibitor decanoyl-RVKR-CMK was contributor to population differentiation regarding head hair thickness in Asia. added at 20 mmol L 1. Hum. Genet. 124, 179–185 (2008). Polyacrylamide gel electrophoresis and western blot. Cell extracts were resolved 26. Mou, C. et al. Enhanced ectodysplasin-A receptor (EDAR) signaling alters by 13% polyacrylamide gel electrophoresis, transferred to a nitrocellulose multiple fiber characteristics to produce the East Asian hair form. Hum. Mutat. membrane and then incubated overnight with an anti-FLAG antibody (Sigma). 29, 1405–1411 (2008). Immunoreactive proteins were visualized using horseradish peroxidase (HRP)- 27. Kamberov, Y. G. et al. Modeling recent human evolution in mice by expression peroxidase-labelled anti-rabbit antibody (Pierce), and developed with the Luminata of a selected EDAR variant. Cell 152, 691–702 (2013). Forte Western HRP substrate (Millipore). An anti-actin antibody (Sigma) was 28. Bhogal, R. K., Mouser, P. E., Higgins, C. A. & Turner, G. A. Protease employed to ascertain equal loading. To detect the presence of recombinant activity, localization and inhibition in the human hair follicle. Int. J. Cosmet. Sci. proteins in conditioned medium, 4 ml of conditioned medium were concentrated 36, 46–53 (2014).

NATURE COMMUNICATIONS | 7:10815 | DOI: 10.1038/ncomms10815 | www.nature.com/naturecommunications 11 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms10815

29. Spacek, D. V. et al. The mouse frizzy (fr) and rat ’hairless’ (frCR) mutations are 61. Genomes Project Consortium et al. An integrated map of genetic variation natural variants of protease serine S1 family member 8 (Prss8). Exp. Dermatol. from 1,092 human genomes. Nature 491, 56–65 (2012). 19, 527–532 (2010). 62. Marchini, J. & Howie, B. Genotype imputation for genome-wide association 30. Leyvraz, C. et al. The epidermal barrier function is dependent on the serine studies. Nat. Rev. Genet. 11, 499–511 (2010). protease CAP1/Prss8. J. Cell Biol. 170, 487–496 (2005). 63. Higgins, J. P. & Thompson, S. G. Quantifying heterogeneity in a meta-analysis. 31. Mlitz, V. et al. Trichohyalin-like proteins have evolutionarily conserved roles in Stat. Med. 21, 1539–1558 (2002). the morphogenesis of skin appendages. J. Invest. Dermatol. 134, 2685–2692 64. Hastie, T., Tibshirani, R. & Friedman, J. H. The Elements of Statistical Learning: (2014). Data Mining, Inference, and Prediction, xxii 745 (Springer, 2009). 32. Kaufman, C. K. et al. GATA-3: an unexpected regulator of cell lineage 65. Narhi, K. et al. Sostdc1 defines the size and number of skin appendage placodes. determination in skin. Genes Dev. 17, 2108–2122 (2003). Dev. Biol. 364, 149–161 (2012). 33. Kurek, D., Garinis, G. A., van Doorninck, J. H., van der Wees, J. & Grosveld, F. 66. Cal, S. et al. Identification and characterization of human polyserase-3, a novel G. Transcriptome and phenotypic analysis reveals Gata3-dependent signalling protein with tandem serine-protease domains in the same polypeptide chain. pathways in murine hair follicles. Development 134, 261–272 (2007). BMC Biochem 7, 9 (2006). 34. D’Haene, B. et al. Disease-causing 7.4kb cis-regulatory deletion disrupting 67. Incorporated, A. S. Adobe Photoshop CS6 (San Jose, 2012). conserved non-coding sequences and their interaction with the FOXL2 68. Pruim, R. J. et al. LocusZoom: regional visualization of genome-wide promotor: implications for mutation screening. PLoS Genet. 5, e1000522 (2009). association scan results. Bioinformatics 26, 2336–2337 (2010). 35. Heude, E. et al. Etiology of craniofacial malformations in mouse models of 69. Barrett, J. C., Fry, B., Maller, J. & Daly, M. J. Haploview: analysis and blepharophimosis, ptosis, and epicanthus inversus syndrome. Hum. Mol. Genet. visualization of LD and haplotype maps. Bioinformatics 21, 263–265 (2005). 24, 1670–1681 (2015). 36. Shi, F. et al. A piggyBac insertion disrupts Foxl2 expression that mimics BPES Acknowledgements syndrome in mice. Hum. Mol. Genet. 23, 3792–3800 (2014). 37. Pingault, V. et al. Review and update of mutations causing Waardenburg We are grateful to the volunteers for their enthusiastic support for this research. We ´ ´ syndrome. Hum. Mutat. 31, 391–406 (2010). thank Alvaro Alvarado, Monica Ballesteros Romero, Ricardo Cebrecos, Miguel Angel ´ ´ 38. Liu, F. et al. A genome-wide association study identifies five loci influencing Contreras Sieck, Francisco de Avila Becerril, Joyce De la Piedra, Marıa Teresa Del Solar, ´ facial morphology in Europeans. PLoS Genet. 8, e1002932 (2012). Paola Everardo Martınez, William Flores, Martha Granados Riveros, Ilich Jafet Moreno, ´ 39. Paternoster, L. et al. Genome-wide association study of three-dimensional Jodie Lampert, Paola Leon-Mimila, Francisco Quispealaya, Diana Rogel Diaz, Ruth ˜ facial morphology identifies a variant in PAX3 associated with nasion position. Rojas, Norman Russell, Vanessa Sarabia, Rosilene Paim, Ricardo Gunski, Sergeant Joao ˆ Am. J. Hum. Genet. 90, 478–485 (2012). Felisberto Menezes Cavalheiro and Major Eugenio Correa de Souza Junior for assistance 40. Monsoro-Burq, A. H. PAX transcription factors in neural crest development. with volunteer recruitment, sample processing and data entry. We also thank Richard Semin. Cell Dev. Biol. 44, 87–96 (2015). Baker for technical assistance with the human skin immunofluorescence, Pardis Sabeti 41. Praetorius, C. et al. A polymorphism in IRF4 affects human pigmentation and Matteo Fumagalli for advice on the analysis of CMS scores, Elfride De Baere for through a tyrosinase-dependent MITF/TFAP2A pathway. Cell 155, 1022–1033 information on clinical features of BPES patients, Doug Speed for advice on the (2013). prediction analysis, Barbara Kremeyer for comments on the manuscript and Emiliano 42. Steinert, P. M., Parry, D. A. & Marekov, L. N. Trichohyalin mechanically Bellini for the face illustrations in Fig. 1. The following institutions kindly provided ´ strengthens the hair follicle: multiple cross-bridging roles in the inner root facilities for the assessment of volunteers: Escuela Nacional de Antropologıa e Historia ´ ´ ´ shealth. J. Biol. Chem. 278, 41409–41419 (2003). and Universidad Nacional Autonoma de Mexico (Mexico); Pontificia Universidad ´ ´ 43. Fischer, H. et al. Essential role of the keratinocyte-specific endonuclease Catolica del Peru, Universidad de Lima and Universidad Nacional Mayor de San Marcos ´ ° DNase1L2 in the removal of nuclear DNA from hair and nails. J. Invest. (Peru); Universidade Federal do Rio Grande do Sul (Brazil); 13 Companhia de ¸˜ ´ Dermatol. 131, 1208–1215 (2011). Comunicacoes Mecanizada do Exercito Brasileiro (Brazil). This work was funded by 44. Wagner, R. & Joekes, I. Hair medulla morphology and mechanical properties. grants from: the Leverhulme Trust (F/07 134/DF to ARL), BBSRC (BB/I021213/1 to J. Cosmet. Sci. 58, 359–368 (2007). ARL and Institute Strategic Programme grant to The Roslin Institute); Universidad de 45. Tsuji, A. et al. Engineering of alpha I-antitrypsin variants selective for Antioquia, Colombia (CODI sostenibilidad de grupos 2013- 2014 and MASO ´ subtilisin-like proprotein convertases PACE4 and PC6: importance of the P2’ 2013-2014); Ministerio de Economıa y Competitividad and Instituto de Salud Carlos III residue in stable complex formation of the serpin with proprotein convertase. (RTICC), Spain. C.L.-O. is an Investigator of the Botin Foundation supported by Protein Engin. Des. Sel. 20, 163–170 (2007). the Banco Santander through its Santander Universities Global Division. 46. Grossman, S. R. et al. Identifying recent adaptations in large-scale genomic data. Cell 152, 703–713 (2013). Author contributions 47. Grossman, S. R. et al. A composite of multiple signals distinguishes causal K.A., M.Q.-S., R.G.-J., D.H., D.B., A.R.-L. conceived and designed the study. K.A., variants in regions of positive selection. Science 327, 883–886 (2010). J.M.-R., M.F.-G., V.A.-A., C.J., W.A., R.B.L., G.M.P., J.G.-V., H.V.-R., T.H., V.R., C.C.S. 48. Medland, S. E. et al. Common variants in the trichohyalin gene are associated de C., M.H., V.V., V.G., D.H. and D.J.T. contributed reagents/material. T.F., S.C., J.M.-R., with straight hair in Europeans. Am. J. Hum. Genet. 85, 750–755 (2009). J.C.C.-D., F.A.-S., J.A.J., M.Q.-S., D.H., C.L.-O. performed the experiments. K.A., T.F., 49. Tobin, D. J. et al. The lysosomal protease cathepsin L is an important regulator S.C., J.M.-R., J.A.J., M.Q.-S., D.H., C.L.-O., D.J.T. and A.R.-L. analysed data. V.A.-A., of keratinocyte and melanocyte differentiation during hair follicle J.G.-V., C.G., G.P., L.S.-F., F.M.S., M.-C.B., S.C.-Q., F.R., G.B., R.G.-J., D.H., D.J.T., D.B. morphogenesis and cycling. Am. J. Pathol. 160, 1807–1821 (2002). and A.R.-L. supervised the research (PI). K.A. and A.R.-L. wrote the manuscript, 50. Eissa, A. & Diamandis, E. P. Human tissue kallikreins as promiscuous modulators incorporating input from other authors. Critical revision of the manuscript was of homeostatic skin barrier functions. Biol. Chem. 389, 669–680 (2008). done by C.G., G.P., S.C.-Q., R.G.-J., D.H., D.J.T. and D.B. 51. Le, N. A. et al. Regulation of serine protease inhibitor Kazal type-5 (SPINK5) gene expression in the keratinocytes. Environ Health Prev. Med. 19, 307–313 (2014). 52. Johansson, J. A. & Headon, D. J. Regionalisation of the skin. Semin. Cell Dev. Additional information Biol. 25-26, 3–10 (2014). Supplementary Information accompanies this paper at http://www.nature.com/ 53. Mou, C., Jackson, B., Schneider, P., Overbeek, P. A. & Headon, D. J. Generation naturecommunications of the primary hair follicle pattern. Proc. Natl Acad. Sci. USA 103, 9075–9080 (2006). Competing financial interests: The authors declare no competing financial interests. 54. Held, L. I. The evo-devo puzzle of human hair patterning. Evol. Biol. 37, 113–122 (2010). Reprints and permission information is available online at http://npg.nature.com/ 55. Matama, T., Gomes, A. C. & Cavaco-Paulo, A. Hair coloration by gene reprintsandpermissions/ regulation: fact or fiction? Trends Biotechnol. 33, 707–711 (2015). 56. Pospiech, E. et al. Evaluation of the predictive capacity of DNA variants associated How to cite this article: Adhikari, K. et al. A genome-wide association scan in admixed with straight hair in Europeans. Forensic Sci. Int. Genet. 19, 280–288 (2015). Latin Americans identifies loci influencing facial and scalp hair features. Nat. Commun. 57. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger 7:10815 doi: 10.1038/ncomms10815 (2016). and richer datasets. Gigascience 4, 7 (2015). 58. Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009). This work is licensed under a Creative Commons Attribution 4.0 59. O’Connell, J. et al. A general approach for haplotype phasing across the full International License. The images or other third party material in this spectrum of relatedness. PLoS Genet. 10, e1004234 (2014). article are included in the article’s Creative Commons license, unless indicated otherwise 60. Howie, B., Fuchsberger, C., Stephens, M., Marchini, J. & Abecasis, G. R. Fast in the credit line; if the material is not included under the Creative Commons license, and accurate genotype imputation in genome-wide association studies through users will need to obtain permission from the license holder to reproduce the material. pre-phasing. Nature Genet. 44, 955–959 (2012). To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

12 NATURE COMMUNICATIONS | 7:10815 | DOI: 10.1038/ncomms10815 | www.nature.com/naturecommunications SUPPLEMENTARY FIGURES Supplementary Figure 1: Frequency distribution of hair traits in the CANDELA sample (N=6,630)

100%

90%

80%

70%

60%

50%

40%

30%

20%

10%

0% Hair Shape Beard Eyebrow Mono-Brow Hair Color Hair Graying Balding 1 2 3 4 5

Hair trait Category 1 Category 2 Category 3 Category 4 Category 5 Hair shape Straight Wavy Curly - - Beard Low Medium High - - Eyebrow Low Medium High - - Monobrow None Medium High - - Hair color Blond Dark Brown/Black - - blond/Light brown Hair graying No graying Predominantly 50% graying Predominant Totally white no graying graying hair Balding None Medium High - -

1

Supplementary Figure 2: Estimated African, European and Native American ancestry (%) in the CANDELA individuals included in the GWAS for hair traits (N=6,357)

Individual ancestry barplots for each country are shown below. Individuals within each country are sorted by increasing European ancestry.

Admixture Estimates

100% 90% 80%

70% 60% 50%

40% 30% 20%

10% 0% ZPE ZPE ZPE ZPE ZPE ZPE ZPE ZBR ZBR ZBR ZBR ZCH ZCH ZCH ZCH ZCH ZCH ZCH ZCH ZCH ZCH ZCH ZAN ZAN ZAN ZAN ZAN ZAN ZAN ZAN ZAN ZAN Colombia Brazil Chile ZME ZME MexicoZME ZME ZME ZME ZME ZME Peru

European African Amerindian

Mean ancestry estimates for each country and overall:

Native Country African European American Colombia 10% 61% 29% Brazil 9% 79% 12% Chile 5% 46% 49% Mexico 5% 38% 58% Peru 5% 31% 65% Overall 6% 48% 46%

2

Supplementary Figure 3: Selection of genetic Principal Components for inclusion in the GWAS analyses

A) Scree plot: Principal components (x-axis below) were extracted from an LD-pruned SNP dataset (see methods). The proportion of genetic variance explained by each PC is shown below (y-axis).

0.035

0.03

0.025

0.02

0.015

0.01

0.005

0 12345678910

B) GWAS Q-Q plot: Example Q-Q plot from the GWAS for hair graying obtained after inclusion of the top 5 genetic PCs as covariates. No residual inflation due to population substructure is observed (SNPs with high P values lie exactly on the diagonal relating observed and expected values). Genomic inflation factor λ = 1.007.

3

Supplementary Figure 4: Effect sizes for index SNPs in genomic regions previously associated with hair traits which were replicated here and are not shown in Figure 2.

Blue boxes represent regression coefficients (x-axis) estimated in each country. Red boxes represent effect sizes estimated in the combined meta-analysis. Box sizes are proportional to sample size. Horizontal bars indicate standard errors. Meta-analysis P values are shown in Supplementary Table 4.

a) 1q21 – Hair Shape

4 b) 2q12 – Hair Shape

c) 5p13 – Hair Color

5 d) 6p25 – Hair Color

e) 11q14 – Hair Color

6 f) 15q13 – Hair Color

g) 15q21 – Hair Color

7 h) Xq12 – Balding

8

Supplementary Figure 5: Polygenic architecture of hair traits

For all hair traits there is no residual inflation of association P values, after correction for population substructure, a conclusion supported by all traits having genomic control inflation factor λ < 1.02. Also, in Q-Q plots SNPs with high P values lie exactly on the diagonal relating observed and expected values (Supplementary Figure 3B). However, hair traits can have a large number of SNPs with low P values deviating from the diagonal at the tail, suggesting that there are many SNPs with smaller effect sizes which are associated with the trait, albeit not reaching the genome-wide significance threshold of 5 × 10-8. As an example, the Q-Q plot for hair color is shown below.

9

Supplementary Figure 6: Regional association plots for eyebrow thickness and monobrow in 2q12.

a) 2q12 – Eyebrow thickness

10 b) 2q12 – Monobrow

11

Supplementary Figure 7: The signal of association at EDAR intronic SNPs with beard thickness is independent of rs3827760.

Supplementary Figure 7a shows that in the GWAS for beard thickness, several SNPs in the first intron of EDAR have smaller P values than rs3827760, the missense EDAR index SNP strongly associated with hair shape (Table 1). These intronic SNPs are in an LD block separate from rs3827760. We evaluated whether the association signal at these SNP is independent from rs3827760 by performing tests conditioning on rs3827760 (Supplementary Figure 7b). This analysis was restricted to the 58 SNPs with P values < 10-3 in the initial GWAS

(panel labelled “subset” in Figure 7b), leading to a follow-up Bonferroni-corrected –log10(p-value) significance threshold of 3.06. Supplementary Figure 7b shows that after conditioning on rs3827760, a significant association signal is still observed for several SNPs in the first intron of EDAR, including rs365060, the index SNP associated with beard density in the initial GWAS (Table 1). For all SNPs the derived allele is associated with lower beard thickness and the direction of these effects does not change after conditioning on rs3827760 (results not shown). Supplementary Figure 7a also shows that the first intron of EDAR is rich in regulatory DNA elements, based on annotations from the UCSC Genome Browser (https://genome.ucsc.edu/). The bottom two panels in Supplementary Figure 7b display the CMS scores for SNPs in the EDAR region (in CEU and ASN). We observe the highly significant CMS scores in ASN corresponding to the strong signal of selection reported for variants around rs3827760, associated with hair shape (Table 1). Interestingly, we also observe significant CMS scores in CEU for the intronic EDAR variants associated here with beard thickness. a)

12 b)

13

Supplementary Figure 8: Regional association plots for regions not shown in Figure 3

The index SNP in each region (Table 1) is shown as a purple diamond. At the top of the figure are shown the association results (on a -log10 P scale; left y-axis) for all genotyped and imputed SNPs. The dot color indicates the strength of LD (r2) between the index SNP and each SNP (based on the 1000genomes AMR dataset). Recombination rate across the region, in the AMR data, is shown as a continuous blue line (scale on the right y- axis). Genes in the region are shown in the middle. Plots for regions representing novel associations include at the bottom an LD heatmap similar to those shown in Figure 3 (using D’, red indicating D’=1 and white indicating D’=0). Plots for regions that have been previously associated with the same trait omit the LD heatmap.

a) 1q21 – Hair Shape

14 b) 10p14 – Hair shape

15 c) 4q12 – Beard

16 d) 6q21 - Beard

17 e) 7q31 - Beard

18 f) 5p13 – Hair Color

g) 6p25 – Hair Color

19 h) 11q14 – Hair Color

i) 15q13 – Hair Color

20 j) 15q21 – Hair Color

21 k) 10q22 - Balding

22 l) Xq12 – Balding

23

Supplementary Figure 9: Proprotein convertase cleavage site in PRSS53 (Q30R).

A)

Analysis of the complete amino acid sequence of PRSS53 and PRSS53 Q30R was performed at http://elm.eu.org/. No proprotein corvertase sites were detected in PRSS53. Potential proprotein convertase sites detected in PRSS53 Q30R are boxed.

PRSS53:

PRSS53 Q30R:

24

B)

At the top are shown Sanger sequence profiles for the PRSS53 gene region encoding the Q30R substitution associated with hair shape. Below is shown the N-terminal sequence of PRSS53 Q30R with the location of the signal peptide cleavage site indicated by blue arrows. In the case of PRSS53 30R (on the right), the putative proprotein convertase cleavage site is highlighted by another blue arrow.

30Q 30R

Q R

* *

N-MKWCWGPVLLIAGATVLMEGLQAAQRACG QRGP… N-MKWCWGPVLLIAGATVLMEGLQAAQRACGRRGP…

Signal peptide Signal peptide

RACGRRG

RXXXRRX

25

Supplementary Figure 10: Distribution of balding trait across sexes

As indicated in Supplementary Figure 1, Balding was coded into three categories: none, medium and high. The frequency of balding for males and females separately and in the full sample is displayed below.

26

Supplementary Figure 11: Expression of PRSS53 in 293-EBNA cells

Comparison of PRSS53 and PRSS53 (Q30R) from cell extracts (A) and media (B), after expression in 293-EBNA cells cultured in the absence (-) or presence (+) of DECA (decanoyl-RVKR-CMK, a pro-protein convertase inhibitor). These are uncropped western blot images corresponding to those shown in Figure 5 a-b and are labelled below as in Figure 5.

Figure 5a: Cell extracts -DECA +DECA

250 250 150 150 100 100 75 75

50 50 WB: FLAG WB: FLAG

50 50

37 37 WB: Actin WB: Actin

-DECA +DECA Figure 5b: Medium

250 250 150 150

100 100

75 75

27 50 50

WB: FLAG WB: FLAG SUPPLEMENTARY TABLES Supplementary Table 1: Features of the CANDELA individuals included in this study.

Total Colombia Brazil Chile Mexico Peru

Sample size 6357 1507 651 1745 1207 1247

Percentage 100 23.7 10.2 27.5 19.0 19.6

% Female 54.0 55.9 68.5 39.6 60.3 58.4

Age (years)

Min 18 18 18 18 18 18

Mean 24.2 24.0 25.8 25.2 24.4 22.2

Max 45 40 45 45 44 44

S.D. 5.7 5.3 6.3 5.8 5.6 5.2

Age, for Males

Min 18 18 18 18 18 18

Mean 24.9 24.7 25.8 25.3 25.1 23.0

Max 45 40 45 45 44 44

S.D. 5.7 5.5 6.4 5.5 5.6 5.7

Age, for Females

Min 18 18 18 18 18 18

Mean 23.8 23.5 25.4 25.2 24.0 21.6

Max 45 40 44 45 41 42

S.D. 5.7 5.0 4.2 6.2 4.7 4.7

28

Supplementary Table 2: Correlation of hair traits and covariates in the CANDELA sample included in the GWAS (N=6,357)

A) Correlation between hair traits:

Pearson correlation coefficient values are presented in the lower left triangle while corresponding permutation P values are presented in the upper right triangle. Correlations with significant P values (<0.001, Bonferroni- adjusted threshold) and their corresponding P values are highlighted in bold.

HC HG HS BL BT ET MB

Hair color (HC) 3.9E-03 8.6E-07 1.7E-04 1.5E-02 8.3E-06 2.1E-02

Hair graying (HG) -0.03 1.9E-01 3.5E-52 9.1E-11 6.6E-06 3.8E-01

Hair shape (HS) 0.05 0.01 7.5E-01 4.4E-03 2.1E-01 6.5E-03

Balding (BL) 0.04 0.16 0.00 4.3E-14 2.2E-03 1.2E-09

Beard thickness (BT) -0.05 0.13 0.06 0.15 5.7E-20 1.3E-12

Eyebrow thickness (ET) 0.09 -0.09 -0.03 -0.06 0.18 2.0E-32

Monobrow (MB) 0.03 -0.01 -0.04 0.09 0.14 0.24

29

B) Correlation between hair traits and covariates:

Pearson correlation coefficient

Ancestry Native Trait Sex Age African European American Hair color -0.10 -0.02 0.01 -0.32 0.31 Hair graying -0.05 0.56 0.00 0.03 -0.03 Hair shape 0.02 -0.01 0.15 0.11 -0.16 Balding -0.35 0.15 -0.05 -0.07 0.08 Beard thickness - 0.28 0.04 0.30 -0.30 Eyebrow thickness - -0.17 0.03 0.05 -0.06 Monobrow - 0.02 0.00 0.11 -0.10

P value

Ancestry Native Trait Sex Age African European American Hair color 1.3E-16 1.2E-01 5.4E-01 1.5E-10 1.0E-144 Hair graying 1.5E-05 0.0E+00 8.9E-01 6.7E-03 7.8E-03 Hair shape 1.9E-01 6.8E-01 4.6E-36 2.2E-20 8.1E-37 Balding 5.5E-190 3.3E-36 1.0E-04 8.3E-08 1.3E-10 Beard thickness - 1.6E-45 5.6E-02 1.4E-51 7.3E-53 Eyebrow thickness - 4.2E-17 1.6E-01 9.6E-03 3.1E-03 Monobrow - 1.6E-01 9.9E-01 7.8E-13 3.5E-12

Individual continental ancestry was estimated from the chip data obtained here (Supplementary Figure 2).

Sex was coded as: female=1, male=0.

Correlations with significant P values (<0.001, Bonferroni-adjusted threshold), are highlighted in bold.

30

Supplementary Table 3: Narrow-sense heritability (h2) of hair traits, estimated from the population data obtained here.

Trait h2 S.E. P-value

Hair color 1.00 0.05 0.0E+00

Hair graying 0.27 0.05 4.4E-13

Hair shape 0.64 0.05 0.0E+00

Balding 0.39 0.05 0.0E+00

Beard thickness 0.74 0.12 0.0E+00

Eyebrow thickness 0.37 0.11 7.2E-10

Monobrow 0.74 0.12 0.0E+00

31

Supplementary Table 4: Meta-analysis association P values for index SNPs.

For the index SNPs in Table 1 we performed association tests independently in the five country samples and the P values combined using a meta-analysis. For each SNP we provide below the basic meta-analysis P value, a test of heterogeneity P value (obtained through Cochran’s Q statistic) and a random effects (R.E. model) meta- analysis P value. With 20 tests, the Bonferroni-corrected significance threshold for P-values is 0.05/20 = 2.5E-03. Both meta-analysis P values are significant at this level for all SNPs. For 4 SNPs there is significant evidence of heterogeneity, based on the Cochran’s Q P value (highlighted in bold). For those SNPs one should preferably consider the random effects P value, while for the other SNPs one should consider the basic meta-analysis P- value.

Chromosome SNP Trait P value Cochran’s P value Q P value (R.E. model) 1 rs11803731 Hair Shape 1.94E-12 1.42E-01 1.34E-07 2 rs3827760 Hair Shape 2.11E-131 1.00E-04 3.91E-12 10 rs17143387 Hair Shape 4.73E-08 7.78E-01 4.73E-08 16 rs11150606 Hair Shape 2.22E-09 8.95E-01 2.22E-09 5 rs183671 Hair Color 2.74E-59 1.00E-04 2.18E-07 6 rs12203592 Hair Color 6.42E-10 6.10E-02 1.89E-05 11 rs598952 Hair Color 2.60E-11 3.43E-02 4.03E-05 15 rs12913832 Hair Color 1.59E-88 1.00E-04 4.26E-09 15 rs1426654 Hair Color 1.63E-23 1.00E-04 1.11E-04 6 rs12203592 Hair Graying 1.66E-12 5.76E-02 4.25E-06 2 rs365060 Beard Thickness 3.34E-17 1.67E-01 7.36E-09 4 rs4864809 Beard Thickness 2.27E-07 9.55E-01 2.27E-07 6 rs6901317 Beard Thickness 2.07E-07 6.63E-02 1.16E-03 7 rs117717824 Beard Thickness 9.94E-08 3.74E-01 2.71E-07 2 rs3827760 Eyebrow Thickness3.95E-04 2.80E-01 1.74E-03 3 rs112458845 Eyebrow Thickness1.11E-10 5.90E-02 5.59E-04 2 rs3827760 Monobrow 6.01E-07 4.53E-01 6.01E-07 2 rs2218065 Monobrow 1.77E-07 7.79E-02 3.40E-04 10 rs2814331 Balding 1.95E-10 2.16E-01 7.00E-07 X rs4258142 Balding 2.48E-04 4.20E-01 2.48E-04

32

Supplementary Table 5: Prediction of hair traits

% Phenotypic variation explained Trait Age+Sex Genetic PCs Index SNPs + BLUP Total Hair Shape 0.41 9.70 14.96 25.07 Beard 10.68a 12.54 17.53 40.75 Eyebrow 2.44a 3.69 16.30 22.43 Monobrow 0.22a 5.51 10.45 16.18 Hair Color 1.31 18.38 20.16 39.85 Hair Graying 19.51 1.91 6.62 28.04 Balding 13.16 5.37 8.77 27.30

a Sex was not included as a covariate for facial traits as these were assessed only in males.

While the total prediction values can have a similar interpretation to the heritability estimates, these are not directly numerically comparable for several reasons:

i) While h2 is usually an upper bound for R2, with typical sample sizes we expect prediction R2 to be relatively lower (Makowsky et al. 2011), because of the difficulty in accurately estimating effect sizes for many SNPs in the BLUP computation as compared to estimating a few variance components for heritability. ii) Heritability is calculated on the whole dataset while prediction scores are calculated in test subsets which are different from the training subsets used to build the models (Hastie et al. 2009). Since the test set on which prediction accuracy is calculated is different from the training set that produces the prediction model, the accuracy is likely to be smaller in magnitude in the test set than in the training set. E.g. the training R2 for hair color was 100%, matching the observed heritability values. iii) Prediction captures the proportion of phenotypic variation explainable only by the index SNPs + BLUP component, in contrast to the genome-wide kinship matrix used in heritability (Wray et al. 2013). The genome-wide kinship matrix used in heritability is thus more likely to capture polygenic effects from many genes with smaller effect sizes. iv) The mathematical formulae are slightly different, e.g. while obtaining fraction of variance explained during calculation of heritability, the variance component due to covariates is removed first from the total phenotypic variance in the denominator (Yang et al. 2011).

33

Supplementary Table 6: SNPs showing suggestive association with hair traits

Table 1 presented SNPs showing genome-wide significant association with hair traits. SNPs with P values below the genome-wide significant threshold (5 × 10-8) but above the suggestive significant threshold (10-5) are presented here. If a genomic region has several associated SNPs, only the SNP with lowest P-value is shown. Annotations regarding SNP type (for intragenic SNPs), genes in the region (±300 KB from the SNP) and potential regulatory role were obtained from the UCSC Genome Browser. This list includes the two suggestively significant associations of the EDAR rs3827760 SNP with eyebrow thickness and Monobrow, as mentioned in the main text. A dot indicates that no relevant information is available for that SNP.

CHR SNP BP P Trait SNP type Genes in region Regulatory annotation

4 rs4864363 1376917504.38E-07 Hair . . . Shape

7 rs73199888 1097570181.52E-07 Hair . EIF3IP1(+156.7kb) . Shape

10 rs4881147 3342774 2.99E-06 Hair . PFKP(+163.8kb)| PITRM1(+127.7kb)| . Shape PITRM1-AS1(+152kb)

14 rs3742377 100166228 1.40E-07 Hair intron CCDC85C(+95.5kb)| CCNK(+188.4kb)| . Shape CYP46A1| EML1(-93.52kb)| HHIPL1(+23.22kb)| MIR5698(+25.99kb)| SETD3(+219kb)

18 rs4800451 20716805 5.57E-06 Hair intron CABLES1| MIR4741(+203.4kb)| Promoter Shape RBBP8(+110.4kb)| TMEM241(-159.2kb)

1 rs11121667 11038476 3.76E-07 Beard intron ANGPTL7(-210.9kb)| C1orf127| . thickness CASZ1(+181.7kb)| EXOSC10(-88.2kb)| MASP2(-48.1kb)| MTOR(-128.1kb)| MTOR-AS1(-165.5kb)| SRM(-76.17kb)| TARDBP(-34.2kb)| UBIAD1(-294.8kb)

1 rs6684877 39557810 8.18E-06 Beard intron AKIRIN1(+86.07kb)| GJA9(+210.5kb)| CTCF_Binding thickness GJA9-MYCBP(+210.5kb)| MACF1| _Site| MYCBP(+218.8kb)| NDUFS5(+57.5kb)| Enhancer RHBDL2(+150.4kb)| RRAGC(+232.3kb)

5 rs1003136 1273478151.10E-06 Beard . FBN2(-245.8kb)| LINC01184(-9.428kb)| . thickness SLC12A2(-71.67kb)

10 rs10788096 1222422931.49E-06 Beard intron C10orf85(-115.2kb)| MIR5694(- . thickness 102.3kb)| PPAPDC1A| WDR11-AS1(-

34

279kb)

1 rs10908366 38121934 6.51E-06 Eyebrow . C1orf109(-25.31kb)| C1orf122(- . thickness 151.5kb)| CDCA8(-36.14kb)| DNALI1(+89.48kb)| EPHA10(-59.71kb)| GNL2(+60.35kb)| INPP5B(-204.4kb)| LINC01137(+181.9kb)| MANEAL(- 137.8kb)| MEAF6(+141.5kb)| MIR5581(+155.3kb)| MIR6732(+176kb)| MTF1(-153.3kb)| RSPO1(+21.34kb)| SNIP1(+102kb)| YRDC(-146.7kb)| ZC3H12A(+172kb)

2 rs3827760 1089972621.16E-07 Eyebrow missense GCC2(-68.31kb)| LIMS1(-153.5kb)| Enhancer thickness SULT1C2(+70.89kb)| SULT1C2P1(+27.01kb)| SULT1C3(+115.5kb)| SULT1C4

4 rs1868245 87848739 2.30E-06 Eyebrow intron AFF1(-7.414kb)| C4orf36(+35.16kb)| Promoter_Fla thickness KLHL8(-232.5kb)| LOC100506746| nking_Region PTPN13(+112.4kb)| SLC10A6(+78.32kb)

5 rs9654415 72544390 2.56E-07 Eyebrow . BTF3(-249.9kb)| FCHO2(+158kb)| . thickness FOXD1(-197.7kb)| TMEM171(+116.7kb)| TMEM174(+73.42kb)

5 rs10515457 1326508712.77E-06 Eyebrow intron FSTL4| HSPA4(+210.2kb)| MIR1289-2(- . thickness 112.4kb)| ZCCHC10(+288.6kb)

7 rs201287033 1508232277.22E-06 Eyebrow intron ABCB8(+78.36kb)| ABCF2(-81.7kb)| Promoter thickness AGAP3| AOC1(+264.8kb)| ASB10(- 49.56kb)| ASIC3(+73.38kb)| ATG9B(+101.6kb)| CDK5(+68.18kb)| CHPF2(-106.3kb)| FASTK(+45.26kb)| GBX1(-22.45kb)| KCNH2(+147.8kb)| MIR671(-112.3kb)| NOS3(+111.5kb)| NUB1(-215.6kb)| SLC4A2(+49.61kb)| SMARCD3(-112.8kb)| TMUB1(+42.61kb)| WDR86(-255kb)| WDR86-AS1(-283kb)

9 rs62578126 1293753383.37E-06 Eyebrow . LMX1B(-1.383kb)| MVB12B(+106kb)| TF_binding_si thickness NRON(+202.6kb)| ZBTB34(-247.6kb)| te ZBTB43(-191.9kb)

15 rs112386516 78375229 1.48E-06 Eyebrow . ACSBG1(-87.96kb)| CIB2(-21.72kb)| Promoter_Fla CRABP1(-257.4kb)| DNAJA4(-181.3kb)| 35

thickness IDH3A(-66.49kb)| LOC91450(+88.66kb)| nking_Region LOC645752(+156kb)| MIR5003(- 0.645kb)| SH2D7(-9.697kb)| TBC1D2B(+5.235kb)| WDR61(-200.3kb)

16 rs12597422 53887738 2.95E-06 Eyebrow intron FTO| FTO-IT1(-185.6kb)| . thickness RPGRIP1L(+150kb)

2 rs3827760 1095136011.46E-07 Monobro missense CCDC138(+20.75kb)| EDAR| . w LIMS1(+209.9kb)| MIR4265(-244.3kb)| RANBP2(+111.3kb)| SH3RF3(-232.4kb)| SH3RF3-AS1(-230.2kb)

4 rs724818 121627113 4.16E-06 Monobro intron PRDM5 . w

5 rs7702331 72551134 4.28E-07 Monobro . ANKRA2(-296.9kb)| BTF3(-243.1kb)| . w FCHO2(+164.8kb)| FOXD1(-191kb)| TMEM171(+123.5kb)| TMEM174(+80.16kb)

6 rs12197419 1709037 1.74E-06 Monobro intron FOXC1(+94.91kb)| GMDS . w

9 rs12236890 3157137 2.02E-06 Monobro . RFX3(-61.16kb) . w

14 rs1187437 54704459 3.51E-07 Monobro . BMP4(+280.9kb)| CDKN3(-159.2kb)| . w CGRRF1(-272.1kb)| CNIH1(-189.2kb)| GMFB(-236.7kb)| MIR5580(+289.3kb)

2 rs213546 234739870 6.72E-06 Hair intron DNAJB3(+87.21kb)| HJURP(-5.476kb)| . Graying LOC100286922(+75.88kb)| MROH2A| MSL3P1(-34.22kb)| SPP2(-219.5kb)| TRPM8(-86.17kb)| UGT1A1(+57.92kb)| UGT1A3(+57.92kb)| UGT1A4(+57.92kb)| UGT1A5(+57.92kb)| UGT1A6(+57.92kb)| UGT1A7(+57.92kb)| UGT1A8(+57.92kb)| UGT1A9(+57.92kb)| UGT1A10(+57.92kb)| USP40(+265.6kb)

4 rs2085601 89895944 9.01E-06 Hair intron FAM13A| FAM13A-AS1(+244.7kb)| . Graying GPRIN3(-269.5kb)| HERC3(+266.3kb)| NAP1L5(+276.9kb)| TIGD2(-138kb)

8 rs7009516 24208847 4.26E-06 Hair intron ADAM7(-89.66kb)| ADAM28| Enhancer Graying ADAMDEC1(-32.95kb)

36

11 rs1912702 79173082 1.82E-06 Hair . MIR708(+59.93kb)| . Graying MIR5579(+39.81kb)| TENM4(+21.39kb)

14 rs11621135 71659609 5.92E-06 Hair . LOC145474(-295kb)| PCNX(+77.51kb)| . Graying SNORD56B(-205.4kb)

15 rs281229 47718455 2.71E-07 Hair intron SEMA6D . Graying

22 rs9626711 47686103 4.67E-06 Hair . LOC101927722(-170.9kb)| . Graying TBC1D22A(+114.8kb)

3 rs9820421 188949286 3.24E-06 Hair Color intron TPRG1| TPRG1-AS2(-7.188kb) .

9 rs58680346 7604965 1.78E-06 Hair Color. TMEM261(-191.5kb) .

10 rs12355139 20565765 5.47E-06 Hair Colorintron MIR4675(-275.1kb)| PLXDC2 Promoter_Fla nking_Region

16 rs4782309 88721538 1.28E-07 Hair Color . APRT(-154.3kb)| CBFA2T3(-219.7kb)| . CDT1(-148.6kb)| CTU2(-51.35kb)| CYBA(+4.046kb)| GALNS(-158.6kb)| IL17C(+14.66kb)| LOC339059(-87.64kb)| LOC100129697(-284.9kb)| LOC100289580(-76.05kb)| MIR4722(- 61.15kb)| MIR5189(+186.1kb)| MVD| PABPN1L(-208.2kb)| PIEZO1(-60.21kb)| RNF166(-41.36kb)| SNAI3(-22.55kb)| SNAI3-AS1(-8.242kb)| TRAPPC2L(- 202kb)| ZC3H18(+23.17kb)| ZFPM1(+120kb)| ZNF469(+214.4kb)

18 rs9949121 42306884 5.70E-06 Hair Colorintron LOC101927921(+195.2kb)| MIR4319(- . 243.2kb)| SETBP1

23 rs5971087 23285900 4.93E-07 Hair Colorintron DDX53(+265.7kb)| PTCHD1(-67.08kb)| . PTCHD1-AS

Supplementary Table 7: Continental allele frequencies of index SNPs

Allele frequencies are provided for all index SNPs listed in Table 1. CEU, YRI, CHB are Europeans, Yoruba and Chinese from the 1000 genomes project. NAM are Native Americans and CAN is the CANDELA sample examined here. NAM data are from populations included in Reich et al. (2012).

Trait Region SNP Closest gene Alleles Derived Allele Frequency (%) 37

Ancestral>derivedCEU YRI CHBNAM CAN Hair Shape 1q21 rs11803731 TCHH A>T 23 0 0 0 9 Hair Shape 2q12 rs3827760 EDAR A>G 0 0 94 98 42 Hair Shape 10p14 rs17143387 GATA3 T>G 16 4 52 73 41 Hair Shape 16p11 rs11150606 PRSS53 T>C 4 0 83 47 23 Beard thickness 2q12 rs365060 EDAR G>C 93 35 2 2 51 Beard thickness 4q12 rs4864809 LNX1 A>G 61 42 41 60 61 Beard thickness 6q21 rs6901317 PREP G>T 35 8 66 57 42 Beard thickness 7q31 rs117717824 FOXP2 G>T 0 0 5 13 5 Eyebrow thickness 3q22 rs112458845 FOXL2 G>A 99 68 96 36 72 Monobrow 2q36 rs2218065 PAX3 A>G 64 34 38 25 42 Hair Graying 6p25 rs12203592 IRF4 C>T 16 0 0 0 6 Hair Color 5p13 rs183671 SLC45A2 T>G 98 17 8 5 50 Hair Color 6p25 rs12203592 IRF4 C>T 16 0 0 0 6 Hair Color 11q14 rs598952 TYR T>A 28 74 74 95 57 Hair Color 15q13 rs12913832 HERC2/OCA2 A>G 77 0 0 1 23 Hair Color 15q21 rs1426654 SLC24A5 G>A 100 1 3 1 56 Balding 10q22 rs2814331 GRID1 C>T 85 93 41 96 90 Balding Xq12 rs4258142 AR/EDA2R C>T 85 0 100 99 88

38

SUPPLEMENTARY REFERENCE

Wray, N.R. et al. (2013). Pitfalls of predicting complex traits from SNPs. Nat Rev Genet 14, 507-15.

Makowsky, R. et al. (2011). Beyond missing heritability: prediction of complex traits. PLoS Genet 7, e1002051.

Yang, J., Lee, S.H., Goddard, M.E. & Visscher, P.M. (2011). GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet 88, 76-82.

Hastie, T., Tibshirani, R. & Friedman, J.H. (2009). The elements of statistical learning: data mining, inference, and prediction, xxii, 745 p. (Springer, New York, NY, 2009).

Reich, D. et al. (2012), Reconstructing Native American population history. Nature 488, 370–374.

39