<<

The Journal of Human Genetics, Volume 96 Supplemental Data Katarzyna Bryc, Eric Y. Durand, J. Michael Macpherson, David Reich, and Joanna L. Mountain AB 100.0 100.0 European European 50.0 Latino 50.0 Latino African American African American

10.0 10.0 5.0 5.0

1.0 1.0 0.5 0.5 within population (%) within population (%)

Frequency of ancestry proportion 0.1 Frequency of ancestry proportion 0.1 0 % 2 % 4 % 6 % 8 % 0 % 2 % 4 % 6 % 8 % 10 % 12 % 14 % 16 % 18 % 20 % 22 % 24 % 26 % 28 % 30 % 32 % 34 % 36 % 38 % 40 % 42 % 44 % 46 % 48 % 50 % 52 % 54 % 56 % 58 % 60 % 62 % 64 % 66 % 68 % 70 % 72 % 74 % 76 % 78 % 80 % 82 % 84 % 86 % 88 % 90 % 92 % 94 % 96 % 98 % 10 % 12 % 14 % 16 % 18 % 20 % 22 % 24 % 26 % 28 % 30 % 32 % 34 % 36 % 38 % 40 % 42 % 44 % 46 % 48 % 50 % 52 % 54 % 56 % 58 % 60 % 62 % 64 % 66 % 68 % 70 % 72 % 74 % 76 % 78 % 80 % 82 % 84 % 86 % 88 % 90 % 92 % 94 % 96 % 98 % Proportion African ancestry Proportion CD 100.0 100.0 African American African American 50.0 Latino 50.0 Latino Europeans with > 2% African ancestry Europeans with > 2% African ancestry

10.0 10.0 5.0 5.0

1.0 1.0 0.5 0.5 within population (%) within population (%)

Frequency of ancestry proportion 0.1 Frequency of ancestry proportion 0.1 0 % 2 % 4 % 6 % 8 % 0 % 2 % 4 % 6 % 8 % 10 % 12 % 14 % 16 % 18 % 20 % 22 % 24 % 26 % 28 % 30 % 32 % 34 % 36 % 38 % 40 % 42 % 44 % 46 % 48 % 50 % 52 % 54 % 56 % 58 % 60 % 62 % 64 % 66 % 68 % 70 % 72 % 74 % 76 % 78 % 80 % 82 % 84 % 86 % 88 % 90 % 92 % 94 % 96 % 98 % 10 % 12 % 14 % 16 % 18 % 20 % 22 % 24 % 26 % 28 % 30 % 32 % 34 % 36 % 38 % 40 % 42 % 44 % 46 % 48 % 50 % 52 % 54 % 56 % 58 % 60 % 62 % 64 % 66 % 68 % 70 % 72 % 74 % 76 % 78 % 80 % 82 % 84 % 86 % 88 % 90 % 92 % 94 % 96 % 98 % Proportion African ancestry Proportion American ancestry E

4

3

State district_of_columbia 2

density

1

0

0.00 0.25 0.50 0.75 1.00 African_ancestry

Figure S1: Histogram of ancestry, in bins of 2%, in self-reported African American, Latino, and European American individuals. The vertical bars represent the proportion of individuals from each self-reported cohort that are estimated to have proportion African an- cestry fall within each ancestry bin. Note that the y-axis is shown in a log scale to illustrate fine-scale differences among cohorts. Histogram of African ancestry (A) and Native American (B) in European (red bars), Latinos (gold bars), and (blue bars). Histogram of African (C) and Native American (D) ancestry in African Americans, Latinos, and only those that have at least 2% African ancestry. (E) Qualitative dif- ferences in African ancestry distributions in African Americans from and Georgia. Restricted to states for which we had at least 50 individuals, D.C. had the lowest mean African ancestry, and Georgia had the highest mean African ancestry. The distribution of the ancestry proportions of self-reported African American individuals from these states are displayed using geom density in ggplot2 from R. 1 Self−reported African Americans Self−reported African Americans with > 2% with > 1% Native American ancestry

Percent of Percent of African African Americans Americans

10 20

5 10

0

Figure S2: Frequency of self-reported African American individuals with at least 2% (left) and 1% (right) Native American ancestry across states with at least 20 individu- als. The geographic distribution of self-reported African Americans with Native American ancestry. States with fewer than 20 individuals are excluded and shaded in gray. The proportion of individuals with Native American ancestry, out of the total number of African Americans per state, is shown by shade of .

African American ancestry tract lengths Latino ancestry tract lengths European American ancestry tract lengths

! ! 1000 ! ! ! ! ! ! !! ! ! ! ! ! !!!!! !! !!! !!!!! !!!!!!!! !!!! 500 10000 !!!!!!! !!!!! ! ! !!!!!!!!!! !!!! ! ! !!!!!!!!! !!!! !!! ! !!!!!!!! !!! !! !! !!!!!! !!!!! !!!!! ! !!!! !!!!! !!!!! !!!! ! ! !!!!!!! !!!! !!!! !! !!! !!!!! !!!!!!! !!!! ! ! ! !! !!! !!!!!! !! !! !!!!!!! ! !! !!! !!!!! ! ! !!!! !! !!! ! !!!!!!! !! !!!!! 10000 !! !! !!! !!!! !!!!!!!!! !!!!!!!! !!! !!!!! !!! ! !! !!!! !! !! !!! ! !! !! !!!!!! !! !! !!!! !! ! !! !! !! ! !! !!!!!! ! !!! !! ! ! !!!!! !! ! !!! !! !! !! ! !!!!! ! !!! ! !! !!! ! ! ! !! !!!!!! ! ! !!! ! ! !!!!!!! ! !!! !!!!!! !! !!! !!! !! ! ! ! ! ! !! !! !!!! ! ! !!! !!! !!! !!! !! !! !! !!!! !! !! !!!!! ! ! ! ! !! !!! !!!!!!! ! !! !!!! ! ! !!!!!! !!! !! !! !! !!! !!! ! ! !! ! ! 1000 !! !!!! ! !!! ! ! !!!!! 100 ! ! !! ! !!! !!!!! !!! !!! !!!! !! !! ! ! !! !!!!! !! ! !!!!! ! ! ! !! !!! !!!! !!!!! !! !!!!! !!! ! ! ! !!!!! !!!! !! !!!!! ! ! ! !! !!! ! !! !!!!! !!! ! ! !!! !!!!! !! ! !! !!!!!!! !!! !! !! ! !!! !!!!!!! !! !!! !!!!! ! !!!!!!!!! !! !! ! !!!!! !! !!!! !!! !!! ! ! !!!!! !! ! !!! !!!! !! ! ! !!! 50 ! !! ! ! ! ! !!! ! ! !!!!! !!! !! !!!!!! !! ! !!!! ! !!!! ! ! !! !!! ! ! ! ! ! !! !! !!! ! ! !!!!!!!! ! ! !! ! Ancestry !! !!! ! !!!! !!! !!!!! !! ! ! ! !!! ! !!! !!! !!!! ! !! !! ! !! !!! !!! !!! !!!! !!! ! ! ! !! !! ! !!!! !! !! !!!!!!!! !! ! !!! !! !!! !!! ! !! ! ! ! ! ! !! !!!! !! !!! !! !!!!! ! ! !!!!!!! !!! !!! ! !!! !!!!! ! ! 100 ! ! !! !!!! !!! !!!! !! ! !! ! american ! !!!!! ! !!! !!!!! !!! !!! !! ! !! ! !!! !! ! !!!!! ! ! ! ! ! !! ! ! !!!! !!! !!! !!! !! ! !!! !!! !!!! !! !! ! ! !!!! ! !! !! !! ! ! ! ! ! ! !!!! 100 !!!!!!!! !!!!! ! !!!! !!! ! ! ! ! !!! ! ! !! !! !! ! ! ! !! ! ! !! ! ! european ! !! ! !! !! !!!!! ! ! ! !!!!!! !! ! ! !!!!!!! ! ! !! !! ! ! !! ! ! !!! ! !! ! !! ! !!!!!!!! ! ! !!! !! ! ! ! ! !!! !!!!! ! ! !! ! 10 ! !!!! ! !!! !!!! ! ! !! !!!! ! !! ! ! ! !!!!! ! !!!! ! !!!!!! !!! !! ! ! !!!! ! ! !!!! african !! ! !! !! ! ! ! !!!!!!! ! ! ! !! ! !! ! !! ! !! ! ! ! !! !! !!!!!!!! !!!!!!! ! !! ! ! !! ! ! !! !!!!!!! !! !! !! ! ! !! ! !!! ! ! !! !! !! !! !!!!!!!! ! !!!!!!!! !! !! !! ! ! ! ! !!!!!!!!!! !! ! !! ! ! ! !!! !! !! Number of tracts !! ! !!! !!! !! !!!! !!!! Number of tracts ! !!!!!! !! !!!!!! ! Number of tracts ! !! ! !!! !!!!!!!! !! ! 5 10 ! ! ! ! ! ! !! ! ! !!!!! ! ! !!!! ! ! ! !!!! !!! !!!! ! !!! !!! !! ! !! ! !!!!!!! ! !!! ! ! ! ! ! ! ! !! !! ! ! ! ! ! !! ! !!!!!!! !! ! !!!!!!! !! ! !! ! !!!! ! ! ! ! ! ! ! ! !!!! ! ! !!! !! ! !! !!! !!!! ! ! ! ! !! !!!!! !!! !!!! !! ! !! !!!!! !!!!!!!!!! !!!!!! ! !!!!!!!!!!!! ! ! ! ! ! !!!! !!!!! !! ! !!!! ! !!!!! !! !!!!! ! !!!!!!!!!!!!!!!! !! ! ! !! !

1 ! !!!!!! !!!!!! !!! !! ! ! !!! !!!!! !!! !! 1 ! !!!!!!!!!!!!!!!!!!!!!!!!!!!!! !!! !!!!!!!!!!! !!! 1 !!!!!!!!!!!!!!!!!!!!!!!!! !!! ! !! ! ! ! !

0 50 100 150 200 250 300 0 50 100 150 200 250 300 0 50 100 150 200 250 300

Length of ancestry tract (cM) Length of ancestry tract (cM) Length of ancestry tract (cM)

Figure S3: Distribution of the lengths of ancestry segments for African Americans, Lati- nos, and Europeans with at least 2% African ancestry. The lengths of segments, or tracts, of ancestry, and the frequency of those tracts is shown by points, colored by population. The num- ber of ancestry tracts is shown on a log scale. Counts are shown self-reported African Amer- icans, Latinos, and European Americans. The number of tracts of Native American (gold), European (red) and African (blue) ancestry tracts is shown for each bin of 1Mb of segment length.

2 African Americans Latinos European Americans African African African

Eur. N.Am. Eur. N.Am. Eur. N.Am.

Figure S4: Ternary plots of African, European, and Native American ancestry in self- reported African American, Latino, and European American individuals. Each point rep- resents a self-reported individual and is positioned within the triangle reflecting the amount of ancestry estimated from each population. Note that each individual is plotted as a semi- transparent point to convey density of individuals. Only a random sample of 10,000 of the European Americans are shown for plotting purposes.

3 Mexican Puerto Rican Cuban African African African

Eur. N.Am. Eur. N.Am. Eur. N.Am.

Dominican Central American South American African African African

Eur. N.Am. Eur. N.Am. Eur. N.Am.

White Black African African

Eur. N.Am. Eur. N.Am.

Figure S5: Ancestry of self-reported Latinos by secondary self-reported subpopulation. Each individual is shown projected onto the triangle by their genome-wide proportions of African, European, and Native American ancestry, by their self-reported Hispanic sub-identity. Proportion of ancestry can be computed for an individual from the distance in dropping a per- pendicular line from the point to the edge opposite the vertex.

4 Self-reported European Americans Self-reported African Americans Self-reported Latinos

Proportion Proportion Proportion British\ British\ British\ Irish Irish Irish of European of European of European ancestry ancestry ancestry 0.400 0.5 0.4 0.375 0.3 0.4 0.350 0.2 0.3 0.325 0.300 0.1

Proportion Proportion Proportion Iberian Iberian Iberian of European of European of European ancestry ancestry ancestry

0.03 0.04 0.30 0.03 0.25 0.02 0.02 0.20 0.01 0.15 0.01 0.10 0.00 0.00

Proportion Proportion Proportion French French French German German German of European of European of European ancestry ancestry ancestry 0.16 0.125 0.06 0.14 0.100 0.04 0.12 0.075

0.10 0.02 0.050

0.08 0.00 0.025

Figure S6: Differences in the European subpopulation Ancestry Composition among self- reported European Americans, African Americans, and Latinos from different states. The relative amoung of European ancestry, out of the total mean European ancestry, estimated for each state. Shown for inferred British/Irish ancestry, inferred Iberian ancestry, and inferred Italian ancestry. The proportion of sub-population ancestry, normalized by the total estimated European ancestry, for each state is shown by shade of red.

5 Self−reported European Americans Self−reported European Americans with > 2% Native American ancestry with > 1% Native American ancestry

Percent of Percent of European European Americans Americans 4 8 3 6 2 4 1 2 0 0

Figure S7: Frequency of self-reported European Americans with at least 2% Native Amer- ican ancestry (left) and 1% Native American ancestry (right). The geographic distribution of self-reported European Americans with Native American ancestry. States with fewer than 20 individuals are excluded and shaded in gray. The proportion of individuals with Native Amer- ican ancestry, out of the total number of European Americans per state, is shown by shade of green.

6 Self−reported European Americans Self−reported European Americans with > 2% African ancestry with > 1% African ancestry

Percent of Percent of European European Americans Americans 5 12.5 4 10.0 3 7.5 2 5.0 1 2.5 0

Figure S8: Frequency of self-reported European Americans with at least 2% African an- cestry (left) and 1% African ancestry (right). The geographic distribution of self-reported European Americans with African ancestry. States with fewer than 20 individuals are excluded and shaded in gray. The proportion of individuals with African ancestry, out of the total number of European Americans per state, is shown by shade of green.

7 A

! !! !! !! !!!!!!!!!!!!!!!! !! !!!!!!!!!!!! !!!!!! 1.0 !! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !

0.8 ! !

! !

!

0.6 ! !

! reporting as African American − 0.4

! ! ! !

! ! 0.2

! Probability of self ! ! !

!! !!!!!!!!!! ! ! ! !! 0.0

0 20 40 60 80 100

Proportion of African ancestry (%) B 1.0

0.8

0.6 reported ancestry

− 0.4 European Americans Latinos 0.2 African Americans Proportion of individuals

in each self 0.0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 − − − − 100 − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − 0 2 4 6 − 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 Proportion of African ancestry (%) C 1.0

0.8

0.6 reported ancestry

− 0.4 European Americans Latinos 0.2 African Americans Proportion of individuals

in each self 0.0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 − − − − 100 − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − 0 2 4 6 − 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 Proportion of Native American ancestry (%)

Figure S9: Relationship between the amount of African ancestry and African American versus European American self-reported identity. (A) Using ancestry data jointly from both African Americans and European Americans, we show the probability of self-reporting as African American by proportion of African ancestry. The probability for each bin of 1% ancestry is shown (points), and the gray area is shaded to emphasize the transition region. (B) Proportion of individuals that self-report as European American, African American, and Latino, by proportion of African ancestry. (C) The proportion of individuals that self-report as European American, African American, and Latino by the proportion of Native American ancestry. 8 Self-reported European Americans

Mean Mean Mean British\ Ashkenazi Balkan Irish ancestry ancestry ancestry 0.20 0.55 0.020 0.50 0.15 0.015 0.45 0.10 0.010 0.40 0.35 0.05 0.005 0.30 0.000 0.25

Mean Mean Mean East French Finnish European German ancestry ancestry 0.03 ancestry 0.16

0.075 0.02 0.14 0.050 0.12 0.01 0.025 0.10 0.00 0.000 0.08

Mean Mean Mean Middle Iberian Italian Eastern ancestry ancestry ancestry 0.08 0.03 0.06 0.010 0.02 0.04 0.005 0.01 0.02 0.00 0.00 0.000

Mean Mean Sardinian Scandinavian ancestry ancestry 0.20 0.0015 0.15 0.0010 0.10 0.0005 0.05 0.0000

Figure S10: Differences in the European subpopulation ancestry among self-reported European Americans from different states. Shown for all European subpopulations that are carried at greater than 1% frequency in some state. The mean ancestry proportion among self- reported European Americans from each state is shown by shade of red. Ancestries that do not achieve at least 1% mean average ancestry in any state are not shown.

9 African ancestry in African Americans Native American ancestry in African Americans 0.85 !NJ !SC 0.010 !FL !NY !AL !GA !WA !OK 0.80 !AR !MS

State !CO!CA !LA State !TX !TN 0.008 CT region region ! TX !MO ! Midwest ! ! Midwest !MINY LA 0.75 ! ! MI SC VA!NC ! Northeast ! AL! ! Northeast IL ! NC ! !CA!IN OH! PA ! !GA MA!OK ! ! VA !MS ! ! South 0.006 !OH ! MD !DC ! South !CO !PA !NJ IN ! !MD ! TN WI ! West IL! ! West ! !MA MO! ! FL 0.70 !DC !KSWI CT !AR! KY KY! !KS! ! Mean state African ancestry 0.004

WA ! American ancestryMean state Native 0.65 !WV !WV

10 20 30 40 50 10 20 30 40 50 State population density of African Americans in 2010 State population density of African Americans in 2010

African ancestry in African Americans Native American ancestry in African Americans !SC !NJ

!FL 0.010 !AL !GA NY 0.80 !MS !AR ! !OK!WA

TN TX ! ! State LA CO CA State !CT 0.008 ! ! ! !MO region region !MI NY TX 0.75 !LA ! ! Midwest ! ! Midwest !VANC ! IL MI!SC OH IN ! !CA ! Northeast !AL ! Northeast ! ! OK ! NC !!MA MS PA !!GA !PA !NJ !CO ! South 0.006 !OH! !VADC ! South MD ! !MD! ! !IN !WI TN ! West ! IL ! West MO !MA ! 0.70 !DC ! FL KY !!WIAR !KS !CT ! ! !KS !KY

Mean state African ancestry 0.004

WA

! American ancestryMean state Native 0.65 !WV !WV

0 10 20 30 0 10 20 30 State population density of Latinos in 2010 State population density of Latinos in 2010

Figure S11: Correlations of African and Native American ancestry components of African Americans with population density of African Americans and Latinos by state. The x- axis show the state estimated population density of African Americans (top row) and Latinos (bottom row), and the y-axis show the mean state ancestry proportions. Each point represents a state and is labeled by the two-letter state abbreviation, for states with at least 10 individuals. The blue line shows a regression fit between the two variables, and the 95% confidence interval for the line fit is shown in gray. Each state is colored by region.

10 African ancestry in Europeans Native American ancestry in Europeans !LA !AK 0.006 0.004 !NM MT !NV SC !ID AL! ! ! !ND !LA 0.003 State State CA 0.004 region ! region !NC !TX !VA ! ! !GA Midwest !MS Midwest FL OK ! ! Northeast !OR ! ! Northeast !MS 0.002 SC !VT!WA ! KY UT !OK !TN ! South ! ! South ! HI TX !AR ! KY ! ! West !AZ ! !FL !VA ! West WV !WY KS AL NM!!AKCA COMN! MO AR ! 0.002 !ORHI ! !MOOH !! ! ! !NC !ID! ! DE 0.001 !WI !MI !KSNV !NJ ! !MA !IL!NY !MDGA CO ! !WV !IN !TN ! !DC

Mean state African ancestry ME ! !IN !NY MD ! !WAAZ!!RIMA !IL ! IANERI DE !WYUT !MI !NH! ! ! PA!OH!NJ ! !VT DC !!CT !MT !PA ! ! IA American ancestryMean state Native SD MEND! WI !CT ! !NH NE! 0.000 !SD !!MN

0 10 20 30 40 50 0 10 20 30 40 50 State population density of African Americans in 2010 State population density of African Americans in 2010

African ancestry in Europeans Native American ancestry in Europeans !LA !AK 0.006 0.004 !NM NV SC !MT ! AL! !ID ! LA !ND! State 0.003 State region CA region 0.004 !NC ! !VA !TX !GA ! Midwest ! Midwest MS !FL ! MS ! Northeast !OK OR ! Northeast ! 0.002 ! KY !SC ! TN OK ! South !VT !WA ! South ! ! UT AR ! ! !TX ! West ! West WV !HI ! !AK CA NM !KY VA !FL !AZ 0.002 !OHMO !HI OR ! ! !!WY ! DE!!ID !ALMN !KS ! !KS !NJ NV !MO! !AR!NC !CO CO ! 0.001 MIWI Mean state African ancestry IN ! ! ! IL ! MD !NY IN!MDGA!MA ! !NY !!MA!WA!RI !IL !AZ !WV!TN! !!DC !MI WY !UT !ME VT !DC ! MT!PA ! !IA !DE!NE !RI ! IA American ancestryMean state Native !!NHOH!PA !NJ MEND! WI !CT !CT !!!NH! NE !SD!MN ! !SD 0.000 0 10 20 30 40 0 10 20 30 40 State population density of Latinos in 2010 State population density of Latinos in 2010

Figure S12: Correlations of African and Native American ancestry components of Euro- pean Americans with population density of African Americans and Latinos by state.The x-axis show the state estimated population density of African Americans (top row) and Latinos (bottom row), and the y-axis show the mean state ancestry proportions. Each point represents a state and is labeled by the two-letter state abbreviation, for states with at least 10 individuals. The blue line shows a regression fit between the two variables, and the 95% confidence interval for the line fit is shown in gray. Each state is colored by region.

11 African ancestry in Latinos Native American ancestry in Latinos !TX 0.25 NM 0.20 ! !CA !IL !COAZ !LA 0.20 0.15 WI !KS State !UT State !NC region region GA !PA NY ! !OR OK 0.15 !MI! ! Midwest ! ! Midwest !DC !MN MO MA NJAR VA ! VA 0.10 WA! CT !MI! ! ! ! Northeast ! ! ! !DC ! Northeast !MN !IN PA NY GA !WIMA OH ! South !!MO ! ! ! South ! CT! FL !MD ! !MD !OH!TN 0.10 !NJ ! ! ! West IA West !OK ! !NC IL 0.05 !HI LA !WA ! !KY ! !IA !FL Mean state African ancestry !AR 0.05 HI KY!IN TX ! !AZ!CA! ! !CO!KS

OR American ancestryMean state Native !NM 0.00 !TN !UT 0.00 0 10 20 30 40 50 0 10 20 30 40 50 State population density of African Americans in 2010 State population density of African Americans in 2010

African ancestry in Latinos Native American ancestry in Latinos !LA 0.25 0.20

!NC GA !TX !PA ! NY 0.15 MI ! NM ! !DC 0.20 !IL !CA ! !MO VA State State ! !CO !AZ !MN region region OH!WI MA ! ! CT ! Midwest ! Midwest !MD ! 0.10 !NJ OK ! 0.15 ! ! Northeast !WI KS Northeast WA !IL ! ! ! South !UT ! South !IA !FL !AR 0.05 KY !IN HI TX ! West OK!OR ! West ! ! KS !AZ !CA MN ! ! !CO ! NJ !OR 0.10 MI!AR!VA!MA ! !NM ! DC!WA!CT !TN ! !IN Mean state African ancestry UT ! MO!PA !GA !NY ! MD 0.00 OH ! !FL ! !TN Mean state Native American ancestryMean state Native IA ! !NC 0.05 LA !HI !KY!

10 20 30 40 10 20 30 40 State population density of Latinos in 2010 State population density of Latinos in 2010

Figure S13: Correlations of African and Native American ancestry components of Latinos with population density of African Americans and Latinos by state.The x-axis show the state estimated population density of African Americans (top row) and Latinos (bottom row), and the y-axis show the mean state ancestry proportions. Each point represents a state and is labeled by the two-letter state abbreviation, for states with at least 10 individuals. The blue line shows a regression fit between the two variables, and the 95% confidence interval for the line fit is shown in gray. Each state is colored by region.

12 A N. Am. ancestry in European Americans B Af. ancestry in European Americans

C N. Am. ancestry in African Americans

Figure S14: Ancestry Composition versus ADMIXTURE estimates of Native American and African ancestry. (A) Estimates for Native American ancestry for European Americans with at least 1% Native American ancestry (inferred from Ancestry Composition). (B) Estimates of African ancestry for European Americans with at least 1% African ancestry. (C) Estimates of Native American ancestry in African Americans with at least 1% Native American ancestry.

13 Starting positions for ancestry segments in Native American segments African segments European Americans, for each 1cM bin of genetic position European segments 30 20

chr1 10 0 30 20

chr2 10 0 30 20

chr3 10 0 30 20

chr4 10 0 30 20

chr5 10 0 250 200 150

chr6 100 50 0 30 20

chr7 10 0 30 20

chr8 10 0 30 20

chr9 10 0 30 20

chr10 10 0 30 20

chr11 10 0 30 20

chr12 10 0 30 20

chr13 10 0 30 20

chr14 10 0 30 20

chr15 10 0 30 20

chr16 10 0 30 20

chr17 10 0 30 20

chr18 10 0 30 20

chr19 10 0 30 20

chr20 10 0 30 20

chr21 10 0 30 20

chr22 10 0 30

npar 20 − 10

chrX 0

Figure S15: Distribution of ancestry segment start positions across the genome in self- reported European Americans. The number of segments that start within a 1cM position along the genome, for each chromosome, are shown by a vertical bar, colored corresponding to African (blue), European (red) or Native American (green) ancestry. Since the vast majority of segments start at the left-most part of each chromsome, the first 5cM of each chromosome are omitted from each plot.

14 Starting positions for ancestry segments in Native American segments African segments Latinos, for each 1cM bin of genetic position European segments 600 400

chr1 200 0 600 400

chr2 200 0 600 400

chr3 200 0 600 400

chr4 200 0 600 400

chr5 200 0 1500 1000

chr6 500 0 600 400

chr7 200 0 600 400

chr8 200 0 600 400

chr9 200 0 600 400

chr10 200 0 600 400

chr11 200 0 600 400

chr12 200 0 600 400

chr13 200 0 600 400

chr14 200 0 600 400

chr15 200 0 600 400

chr16 200 0 600 400

chr17 200 0 600 400

chr18 200 0 600 400

chr19 200 0 600 400

chr20 200 0 600 400

chr21 200 0 600 400

chr22 200 0 600

npar 400 − 200

chrX 0

Figure S16: Distribution of ancestry segment start positions across the genome in self- reported Latinos.The number of segments that start within a 1cM position along the genome, for each chromosome, are shown by a vertical bar, colored corresponding to African (blue), European (red) or Native American (green) ancestry. Since the vast majority of segments start at the left-most part of each chromsome, the first 5cM of each chromosome are omitted from each plot.

15 Starting positions for ancestry segments in Native American segments African segments African Americans, for each 1cM bin of genetic position European segments 300 200

chr1 100 0 300 200

chr2 100 0 300 200

chr3 100 0 300 200

chr4 100 0 300 200

chr5 100 0 400 300 200 chr6 100 0 300 200

chr7 100 0 300 200

chr8 100 0 300 200

chr9 100 0 300 200

chr10 100 0 300 200

chr11 100 0 300 200

chr12 100 0 300 200

chr13 100 0 300 200

chr14 100 0 300 200

chr15 100 0 300 200

chr16 100 0 300 200

chr17 100 0 300 200

chr18 100 0 300 200

chr19 100 0 300 200

chr20 100 0 300 200

chr21 100 0 300 200

chr22 100 0 300

npar 200 − 100

chrX 0

Figure S17: Distribution of ancestry segment start positions across the genome in self- reported African Americans.The number of segments that start within a 1cM position along the genome, for each chromosome, are shown by a vertical bar, colored corresponding to African (blue), European (red) or Native American (green) ancestry. Since the vast majority of segments start at the left-most part of each chromsome, the first 5cM of each chromosome are omitted from each plot.

16 A 10 African Americans

8

6

4

2 within population (%)

Frequency of ancestry proportion 0 2 % 4 % 6 % 8 % 10 % 12 % 14 % 16 % 18 % 20 % 22 % 24 % 26 % 28 % 30 % 32 % 34 % 36 % 38 % 40 % 42 % 44 % 46 % 48 % 50 % 52 % 54 % 56 % 58 % 60 % 62 % 64 % 66 % 68 % 70 % 72 % 74 % 76 % 78 % 80 % 82 % 84 % 86 % 88 % 90 % 92 % 94 % 96 % 98 % − − − − 100 % 0 2 4 6 − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 Proportion African ancestry 98 B 60 Europeans with > 2% African ancestry 50

40

30

20

within population (%) 10

Frequency of ancestry proportion 0 2 % 4 % 6 % 8 % 10 % 12 % 14 % 16 % 18 % 20 % 22 % 24 % 26 % 28 % 30 % 32 % 34 % 36 % 38 % 40 % 42 % 44 % 46 % 48 % 50 % 52 % 54 % 56 % 58 % 60 % 62 % 64 % 66 % 68 % 70 % 72 % 74 % 76 % 78 % 80 % 82 % 84 % 86 % 88 % 90 % 92 % 94 % 96 % 98 % − − − − 100 % 0 2 4 6 − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 Proportion African ancestry 98 C 100.0 African Americans 50.0 Europeans with > 2% African ancestry

10.0 5.0

1.0 0.5 within population (%)

Frequency of ancestry proportion 0.1 2 % 4 % 6 % 8 % 10 % 12 % 14 % 16 % 18 % 20 % 22 % 24 % 26 % 28 % 30 % 32 % 34 % 36 % 38 % 40 % 42 % 44 % 46 % 48 % 50 % 52 % 54 % 56 % 58 % 60 % 62 % 64 % 66 % 68 % 70 % 72 % 74 % 76 % 78 % 80 % 82 % 84 % 86 % 88 % 90 % 92 % 94 % 96 % 98 % − − − − 100 % 0 2 4 6 − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 Proportion African ancestry 98

Figure S18: Distribution of African ancestry in African Americans and European Ameri- cans. (A) Histogram of African ancestry proportions of self-reported African Americans. (B) Histogram of those European Americans that are estimated to have at least 2% African ances- try. (C) Combined histogram of African Americans and European Americans that carry at least 2% African ancestry. Note that histogram C is shown on a log-scale to allow visualization of fine-scale differences between populations. Bins representing less than 0.1% of individuals are not shown.

17 Medical researchers in the regularly assess research participants race and ethnicity to ensure that inclusion in their research is fair and equitable. The definitions of race and ethnicity used in research are the same as the US Census categories, meaning medical researchers define race and ethnicity socially rather than biologically. If you were born or live in the United States, this survey seeks to understand how you identify yourself in terms of these socially-defined categories. Whether or not you are from the United States, the survey asks about your geographic roots. Your answers will help 23andMe understand the genetic diversity of these categories. Your responses to this survey may be used in both health-related and ancestry-related re- search and in summarizing the ethnic breakdown of participants for some of our federally- funded studies, such as those funded by the National Institutes of Health (NIH). Over time your responses will enable 23andMe to improve its health and ancestry reports. Want to help improve 23andMes health and ancestry features? Tell us how you identify in terms of ethnic and racial categories. Tell us what you know about your geographic roots. Your answers may lead not only to new research findings, but also to new 23andMe health and ancestry reports. This survey is about how you identify yourself in terms of the socially- defined categories of ethnicity and race, and about your geographic roots. Your responses to this survey may be used in both health-related and ancestry-related research, and in sum- marizing the ethnic breakdown of participants for some of our federally-funded studies, such as those funded by the National Institutes of Health (NIH). Estimated time to complete: Less than 5 minutes

Table S1: Introduction text to the ethnicity survey. We note that the text clearly states that the survey will be used in ancestry-related research.

18 African Americans European Americans Latinos State Af. N.Am. Eur. Size Af. N.Am. Eur. Size Af. N.Am. Eur. Size 81% 0.7% 17% * 0.5% 0.1% 98.9% *** --- - --- - 0.2% 0.4% 98.5% *** --- - --- - 0.1% 0.1% 99.2% *** 4% 18% 69% ** 80% 0.5% 18% * 0.2% 0.1% 99.3% *** 5% 10% 80% * California 73% 0.8% 24% *** 0.2% 0.3% 98.1% *** 4% 19% 65% *** 72% 0.8% 25% * 0.2% 0.1% 99.2% *** 4% 18% 67% ** 77% 0.5% 21% * 0.1% 0.0% 99.0% *** 10% 9% 75% * DC 70% 0.6% 28% ** 0.1% 0.1% 99.2% *** 14% 9% 64% * --- - 0.2% 0.1% 99.5% *** --- - 81% 0.5% 17% * 0.3% 0.1% 98.7% *** 6% 7% 80% *** Georgia 81% 0.6% 17% ** 0.4% 0.1% 99.3% *** 16% 8% 71% * --- - 0.2% 0.1% 97.6% *** 4% 5% 57% * --- - 0.2% 0.3% 98.7% *** --- - 74% 0.5% 24% *** 0.1% 0.1% 99.1% *** 8% 19% 63% *** 73% 0.6% 25% * 0.1% 0.1% 99.3% *** 5% 8% 83% * --- - 0.1% 0.1% 99.5% *** 6% 5% 79% * 69% 0.5% 29% * 0.2% 0.1% 99.5% *** 4% 14% 75% * 69% 0.4% 29% * 0.3% 0.1% 99.3% *** 4% 4% 90% * 75% 0.8% 23% ** 0.6% 0.3% 98.5% *** 22% 4% 70% * --- - 0.1% 0.1% 99.6% *** --- - 72% 0.6% 26% * 0.1% 0.1% 99.2% *** 10% 7% 76% * Massachusetts 73% 0.5% 25% * 0.1% 0.1% 98.1% *** 11% 10% 73% * 75% 0.7% 23% ** 0.1% 0.1% 98.9% *** 15% 9% 69% * Minnesota --- - 0.0% 0.1% 99.4% *** 12% 10% 70% * 80% 0.6% 18% * 0.3% 0.2% 99.1% *** --- - 76% 0.5% 22% ** 0.2% 0.1% 99.4% *** 14% 8% 76% * Montana --- - 0.1% 0.3% 99.2% *** --- - --- - 0.1% 0.1% 99.3% *** --- - --- - 0.2% 0.3% 98.2% *** --- - --- - 0.1% 0.0% 99.5% *** --- - New Jersey 72% 1.1% 25% ** 0.2% 0.0% 98.3% *** 9% 10% 73% ** New --- - 0.2% 0.4% 98.7% *** 3% 20% 67% ** 75% 0.9% 22% *** 0.1% 0.1% 97.8% *** 15% 8% 69% *** 74% 0.6% 23% ** 0.4% 0.1% 98.9% *** 17% 5% 75% * North Dakota --- - 0.1% 0.3% 98.8% *** --- - 73% 0.6% 24% ** 0.2% 0.0% 99.1% *** 11% 7% 78% * 73% 0.9% 25% * 0.3% 0.2% 99.1% *** 8% 11% 72% * --- - 0.2% 0.2% 98.8% *** 3% 11% 74% * 72% 0.6% 26% *** 0.1% 0.0% 99.0% *** 15% 8% 72% ** --- - 0.1% 0.1% 98.7% *** --- - 83% 0.7% 15% * 0.5% 0.2% 99.0% *** --- - South Dakota --- - 0.0% 0.0% 99.8% *** --- - 77% 0.5% 21% ** 0.3% 0.1% 99.1% *** 2% 6% 89% * 78% 0.7% 20% *** 0.2% 0.3% 98.9% *** 5% 21% 64% *** --- - 0.1% 0.2% 98.9% *** 1% 13% 78% * Vermont --- - 0.1% 0.2% 99.1% *** --- - 74% 0.6% 23% ** 0.4% 0.1% 98.9% *** 13% 10% 71% * 66% 0.9% 30% * 0.1% 0.2% 99.0% *** 7% 9% 76% * 64% 0.2% 34% * 0.2% 0.1% 98.9% *** --- - Wisconsin 71% 0.5% 27% * 0.1% 0.1% 99.4% *** 11% 14% 68% * --- - 0.1% 0.1% 99.6% *** --- -

Table S2: Mean ancestry proportions and sample sizes of 23andMe African Americans, European Americans, and Latinos. To protect participant privacy, ancestries have been rounded, and states with fewer than 10 individuals from a cohort are not reported. Sample sizes between 10 and 49 individuals are denoted by (*), between 50 and 99 individuals by (**) and 100 or more individuals as (***). Mean levels of European (Eur.), African (Af.) and Native American (N. Am.) ancestry are reported for each state.

19 Region African European Native Sample (States included in region) ances- ances- Amer- size try try ican ancestry West 72.6% 24.3% 0.9% * , Hawaii, California, Montana, Oregon, Utah, Arizona, Idaho, Nevada, Wyoming, Alaska, Washington, Colorado Midwest 73.6% 24.1% 0.6% * Missouri, Nebraska, Ohio, Kansas, Michi- gan, Wisconsin, Indiana, Illinois, Minnesota, Iowa, North Dakota, South Dakota Northeast 73.2% 24.3% 0.8% ** Rhode Island, Pennsylvania, Vermont, New York, New Hampshire, Massachusetts, Con- necticut, New Jersey, D.C., Maine South 77.1% 21.9% 0.6% ** Alabama, Texas, Kentucky, Florida, Geor- gia, Virginia, Louisiana, Maryland, North Carolina, Arkansas, South Carolina, West Virginia, Oklahoma, Mississippi, Tennessee, Delaware

Table S3: Mean ancestry proportions and sample sizes of 23andMe African Americans, by region. Sample sizes between 100 and 499 individuals are denoted by (*), between 500 and 999 individuals by (**) and 1000 or more individuals as (***). Mean levels of European, African and Native American ancestry are reported for each subpopulation.

20 African Americans Proportion female contribution fAfrican,male =31% fAfrican,female =42.2% 58% fEuropean,male =18.8% fEuropean,female =5.2% 22% fN.American,male =0.2% fN.American,female =0.6% 75%

Latinos Proportion female contribution fAfrican,male =2.3% fAfrican,female =3.9% 63% fEuropean,male =40.7% fEuropean,female =24.4% 37% fN.American,male =7.0% fN.American,female =11.0% 61%

European Americans Proportion female contribution fAfrican,male =0.04% fAfrican,female =0.1% 71% fEuropean,male =49.9% fEuropean,female =48.7% 49% fN.American,male =0.02% fN.American,female =0.2% 91%

Table S4: Best fit estimates of African American, Latino, and European American ancestry contributions for males and females, by ancestral population. For African Americans, we estimate a male:female European ratio of 3.6, meaning that of European ancestors to African Americans, over three times as many were male as were female. Proportion female contribution is calculated for each cohort, for each ancestry, as ffemale . ffemale+fmale

Subpopulation European African Native American Sample Size Central American 53% 9% 26% * Mexican 61% 3% 24% *** South American 69% 5% 17% ** White 73% 5% 14% *** Cuban 84% 6% 4% * Puerto Rican 69% 14% 8% * Dominican 56% 28% 7% * Black 46% 42% 6% *

Table S5: Mean ancestry proportions and sample sizes of 23andMe Latinos by subpopu- lation. Mean proportions of ancestry among Latino individuals that selected “Hispanic” who also chose to select another identity, or selected one or more other ethnicities are provided. To protect participant privacy, ancestries have been rounded to the nearest percent. Sample sizes between 100 and 499 individuals are denoted by (*), between 500 and 999 individuals by (**) and 1000 or more individuals as (***). Mean levels of European, African and Native American ancestry are reported for each subpopulation.

21 Logit Regression Results ======Dep. Variable: 0 No. Observations: 161460 Model: Logit Df Residuals: 161454 Method: MLE Df Model: 5 Date: Mon, 12 May 2014 Pseudo R-squ.: 0.9416 Time: 15:10:22 Log-Likelihood: -1357.8 converged: True LL-Null: -23269. LLR p-value: 0.000 ======coef std err z P>|z| [95.0% Conf. Int.] ------ancestry 20.0753 1.069 18.775 0.000 17.980 22.171 age 4.148e-05 0.005 0.009 0.993 -0.009 0.010 sex 0.2906 0.168 1.730 0.084 -0.039 0.620 age-ancestry-interaction 0.0472 0.020 2.308 0.021 0.007 0.087 sex-ancestry-interaction -2.3224 0.768 -3.024 0.002 -3.828 -0.817 intercept -7.1956 0.261 -27.600 0.000 -7.707 -6.685 ======

Logit Regression Results ======Dep. Variable: 0 No. Observations: 161460 Model: Logit Df Residuals: 161456 Method: MLE Df Model: 3 Date: Mon, 12 May 2014 Pseudo R-squ.: 0.9416 Time: 15:10:25 Log-Likelihood: -1359.3 converged: True LL-Null: -23269. LLR p-value: 0.000 ======coef std err z P>|z| [95.0% Conf. Int.] ------ancestry 19.6822 0.865 22.745 0.000 17.986 21.378 age-ancestry-interaction 0.0476 0.016 2.887 0.004 0.015 0.080 sex-ancestry-interaction -1.5871 0.639 -2.485 0.013 -2.839 -0.335 intercept -7.0514 0.084 -84.239 0.000 -7.216 -6.887 ======

Logit Regression Results ======Dep. Variable: 0 No. Observations: 161460 Model: Logit Df Residuals: 161458 Method: MLE Df Model: 1 Date: Mon, 12 May 2014 Pseudo R-squ.: 0.9413 Time: 15:12:43 Log-Likelihood: -1365.8 converged: True LL-Null: -23269. LLR p-value: 0.000 ======coef std err z P>|z| [95.0% Conf. Int.] ------ancestry 21.0602 0.375 56.096 0.000 20.324 21.796 intercept -7.0477 0.084 -84.331 0.000 -7.212 -6.884 ======

Table S6: Logistic regression model results for predicting European American versus African American self-reported identity. Logistic regression was performed using python’s module statsmodels. The three models shown below include the full model, a model in- cluding only the most significant parameters, and a simple model using proportion African ancestry and intercept.

22 X (test) alpha stderr European Americans with 1% 2% African ancestry 0.972757 0.002220 European Americans with > 2% African ancestry 0.942362 0.002508

Table S7: Estimates of admixture from ADMIXTOOLS f4 test. Estimates of admixture from Africans into European Americans, stratified by our estimates of African ancestry, are shown. Populations used for validation include 1000 Genomes populations from , , and Yoruba from Nigeria. In the f4(X, A, O, B, C) test, we used A = TSI, B (control) = GBR, O (outgroup) = Chimp, and C = YRI.

Cohort Prop N. Am. ancestry N. Am. haplogroups Total N Rate European Americans 0.01–0.02 96 1,278 7.5% European Americans > 0.02 774 2,697 28.7% African Americans 0.01–0.02 16 838 1.9% African Americans > 0.02 34 305 11.1% 4GP Europeans all countries 21 15,651 0.13% 4GP Europeans excl. 7 15,021 0.047%

Table S8: Rates of mtDNA haplogroups A, B, C and D in African Americans and European Americans with Native American ancestry. Estimates of the number of individuals that carry Native American mtDNA haplogroups corresponds, as expected, with the estimate of genome- wide Native American ancestry. Individuals from each cohort with Native American ancestry were stratified by their estimated amount of Native American ancestry, and the number of A, B, C or D mtDNA haplogroups, and the rate of these Native American specific haplogroups is shown for each estimated amoung of Native American ancestry.

23