Genes and human history

Gil McVean, Department of Statistics, Oxford

Contact: [email protected]

• Where does the variation come from?

• How old are the genetic differences between us?

• Are these differences important?

How different are our genomes?

Serological techniques for detecting variation

[Figure: rabbit anti-A antibodies and human type A; blood groups A, B, AB, O]

Blood group systems in humans

• 28 known systems – 39 genes, 643 alleles

System              Genes               Alleles
ABO                 ABO                 102
Chido-Rodgers       C4A, C4B            7+
Colton              AQP1                7
Cromer              DAF                 10
Diego               SLC4A1              78
Dombrock            DO                  9
Duffy               FY                  9
Gerbich             GYPC                9
GIL                 AQP3                2
H/h                 FUT1, FUT2          27/22
I                   GCNT2               7
Indian              CD44                2
Kell                KEL, XK             33/30
Kidd                SLC14A1             8
Knops               CR1                 24+
Landsteiner-Wiener  ICAM4               3
Lewis               FUT3, FUT6          14/20
Lutheran            LU                  16
MNS                 GYPA, GYPB, GYPE    43
OK                  BSG                 2
P-related           A4GALT, B3GALT3     14/5
RAPH-MER2           CD151               3
Rh                  RHCE, RHD, RHAG     129
Scianna             ERMAP               4
Xg                  XG, CD99            -
YT                  ACHE                4

http://www.bioc.aecom.yu.edu/bgmut/summary.htm

Protein electrophoresis

• Changes in mass/charge ratio resulting from amino acid substitutions in proteins can be detected

[Figure: starch or agar gel, with the direction of travel marked]

• In humans, about 30% of all loci show polymorphism with a 6% chance of a pair of randomly drawn alleles at a locus being different
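
The 6% figure is an expected heterozygosity: the chance that two alleles drawn at random at a locus differ. A minimal Python sketch of that calculation follows, assuming the allele frequencies at the locus are known. The function name and the example frequencies are illustrative assumptions, not values from Lewontin and Hubby or Harris.

```python
def expected_heterozygosity(freqs):
    """Probability that two alleles drawn at random (with replacement) differ."""
    total = sum(freqs)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Two draws match with probability sum(p_i^2); they differ otherwise.
    return 1.0 - sum(p * p for p in freqs)


# Hypothetical locus with one common and one rare electrophoretic allele.
example = expected_heterozygosity([0.97, 0.03])
print(round(example, 4))
```

With these made-up frequencies the result is close to 6%, which shows that a locus can be clearly polymorphic and still have a low chance of two random alleles differing.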

Lewontin and Hubby (1966), Harris (1966)

The rise of DNA sequencing

[Slide: a full page of raw DNA sequence, beginning GATAAGACGGTGATACTCACGCGACGGGCTTGGGCGCC]

SNPs

Single Nucleotide Polymorphisms

[Slide: the same page of sequence, with one site highlighted where two copies differ]

TGCATTGCGTAGGC
TGCATTCCGTAGGC

1 in 1000 between any two genomes

Different, but not that different

• Humans are one of the least diverse organisms

Species               Diversity (percent)
Humans                0.08 - 0.1
Chimpanzees           0.12 - 0.17
Drosophila simulans   2
E. coli               5
HIV1                  30
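
Diversity figures of this kind can be read as the average proportion of sites that differ between two sequences drawn from the species. The sketch below computes that quantity for a set of aligned sequences. The function name and the three short toy sequences are assumptions made for illustration; real estimates come from much longer alignments.

```python
from itertools import combinations


def pairwise_diversity(seqs):
    """Average proportion of differing sites over all pairs of aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    # Count mismatched positions in every pair, then normalise by sites compared.
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / (len(pairs) * length)


# Toy alignment: the second sequence carries one single-base difference.
toy = ["TGCATTGCGTAGGC", "TGCATTCCGTAGGC", "TGCATTGCGTAGGC"]
print(f"{100 * pairwise_diversity(toy):.2f} percent")
```

Here two of the three pairs differ at one site each, so the toy value is 2 differences over 3 × 14 compared sites, far higher than the human figure only because the sequences are so short.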

c. 3,000,000 SNPs in 270 people

c. 25,000,000 SNPs in 1000 people

How do we differ? – Let me count the ways

• Single nucleotide polymorphisms
    TGCATTGCGTAGGC
    TGCATTCCGTAGGC

• Short indels (=insertion/deletion)
    TGCATT---TAGGC
    TGCATTCCGTAGGC

• Microsatellite (STR) repeat number
    TGCTCATCATCATCAGC
    TGCTCATCA------GC

• Minisatellites

≤100bp

• Repeated genes – rRNA, histones 1-5kb

• Large inversions, deletions – Y, Copy Number Variants (CNVs)

Y chromosome variation

• Non-pathological rearrangements of the AZFc region on the Y chromosome

Copy-number variation in genes

• Variation in gene copy number can contribute to phenotypic variation

Perry et al. (2007)

Where does genetic variation come from?

• You will pass on about 60 new mutations to each of your children

• Most of these are destined to die out within a few generations

• Most variation is inherited from our ancestors
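
The claim that most new mutations die out within a few generations can be illustrated with a standard branching-process approximation. This sketch assumes a neutral mutation in a large population of constant size, where each copy leaves a Poisson-distributed number of copies with mean 1. That model and the function name are assumptions added for illustration; the slides do not specify a model.

```python
import math


def prob_lost_by(generations):
    """Probability that a single new neutral copy has left no descendants.

    Assumes a Galton-Watson process with Poisson(1) offspring per copy,
    so the loss probability obeys q[t+1] = exp(q[t] - 1) with q[0] = 0.
    """
    q = 0.0
    for _ in range(generations):
        q = math.exp(q - 1.0)
    return q


for t in (1, 5, 10, 100):
    print(t, round(prob_lost_by(t), 3))
```

Under these assumptions a new copy is lost in the very first generation with probability exp(-1), about 37%, and more than 80% of new mutations are gone within ten generations.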

[Figure: our genealogical tree, with labels: Me, You, Mutations in our ancestors, Our genomes, Inherited mutations]

mtDNA Eve

Vigilant et al. (1991)

Recombination means that different parts of the genome have different trees

• Looking back in time, recombination means that different parts of your genome follow different evolutionary paths

• This means that the genealogical tree will change along the genome

Grandpaternal sequence Grandmaternal sequence

TCAGGCATGGATCAGGGAGCT x TCACGCATGGAACAGGGAGCT

TCAGGCATGG AACAGGGAGCT
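
The crossover drawn above can be reproduced in a few lines of Python. This is a minimal sketch assuming a single crossover between two aligned sequences of equal length. The function name `recombine` and the crossover position of 10 are illustrative choices, with the position picked to match the gap shown in the recombinant sequence.

```python
def recombine(first, second, crossover):
    """Return `first` up to the crossover point, followed by `second` after it."""
    if len(first) != len(second):
        raise ValueError("sequences must have the same length")
    if not 0 <= crossover <= len(first):
        raise ValueError("crossover point out of range")
    return first[:crossover] + second[crossover:]


grandpaternal = "TCAGGCATGGATCAGGGAGCT"
grandmaternal = "TCACGCATGGAACAGGGAGCT"

# Crossover after the tenth base, matching the gap drawn on the slide.
child = recombine(grandpaternal, grandmaternal, 10)
print(child)  # TCAGGCATGGAACAGGGAGCT
```

The two grandparental sequences differ at two sites. The recombinant carries the grandpaternal base at the first and the grandmaternal base at the second, so the two sites trace back through different ancestors.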

How old?

[Figure: timeline marking the human – chimp split, the autosomal MRCA, the origin of H. sapiens, Homo erectus and Australopithecus afarensis]

Ancient variation in the human genome I

• Inversion on chromosome 17

Stefansson et al. (2005)

Ancient variation in the human genome II

• Trans-specific polymorphism in the HLA

Lawlor et al. (1988), Horton et al. (1998)

Did early humans breed with Neanderthals?

Neanderthals

mtDNA sequences say no…

Ovchinnikov et al. (2000)

But…

• There is some evidence for interbreeding: unusual haplotypes found in Europe are composed of SNPs not found in non-European populations

Plagnol and Wall (2006)

What are the genetic differences that make us human?

Chromosomal changes

• Human chromosome 2 is a fusion of two chromosomes in great apes

• There are several inversion differences between the chromosomes

Feuk et al. (2005)

Gene loss

• Loss of enzymes that make sialic acid
  – Sugar on cell surface that mediates a variety of recognition events involving pathogenic microbes and toxins

• Myosin heavy chain
  – Associated with gracilization

Wang et al. (2006)

Gene evolution

• FOXP2 is a highly conserved gene (across the mammals), expressed in the brain. Mutations in the gene in humans are associated with specific language impairment

• Across the entire mammalian phylogeny, there have been very few amino-acid-changing substitutions

• However, two amino acid changes have become fixed in the lineage leading to modern humans since the split with the chimpanzee lineage

Enard et al. (2002)

Are the genetic differences between people and peoples important?

[Figure: "Genome ?" linked to diet, infectious disease, physical environment and mating success]

Detecting recent adaptive evolution

• Let’s look closely at the dynamics of the fixation process for adaptive mutations

• The fixation of a beneficial mutation is associated with a change in the patterns of linked neutral genetic variation

• This is known as the hitch-hiking effect (Maynard Smith and Haigh 1974)

• Looking for the signature of hitch-hiking can be a good way of detecting very recent fixation events

Lactose persistence

Lactose intolerance

Skin pigmentation

Lamason et al. (2005)

Disease resistance

• Mutations in the Duffy gene are associated with protection against malarial infection (Plasmodium vivax)

An aside on the genetics of race

• It is sometimes claimed that there is a ‘genetic basis to race’

• What is true is that groups of individuals from different parts of the world tend to have similar genomes because they share recent ancestry

Rosenberg et al. (2002)

• But there are very few ‘fixed’ genetic differences between populations (I can think of one example – the FY gene)

• The differences between populations are in terms of the combinations of variants,

Evidence for widespread local adaptation

[Figure legend: protein-changing, protein unchanging]

The International HapMap Consortium (2007)

Classes of selected genes

Voight et al. (2005)

Reading

• Human genetic variation
  – Rosenberg et al. Genetic structure of human populations. Science 2002, 298:2381-2385.
  – Conrad et al. A worldwide survey of haplotype variation and linkage disequilibrium in the human genome. Nature Genet. 2006, 1251-1260.
  – McVean et al. Perspectives on human genetic variation from the International HapMap Project. PLoS Genetics 2005, 1:e54.

• The origin of modern humans
  – Reed & Tishkoff. African human diversity, origins and migrations. Curr. Opin. Genet. Dev. 2006, 16:597-605.
  – Jobling et al. Human evolutionary genetics: origins, peoples, and disease. Garland Science, 2004.
  – Harding & McVean. A structured ancestral population for the evolution of modern humans. Curr. Opin. Genet. Dev. 2004, 14:667-674.

• Natural selection
  – Lamason et al. SLC24A5, a putative cation exchanger, affects pigmentation in zebrafish and humans. Science 2005, 310:1782-1786.
  – Sabeti et al. Positive natural selection in the human lineage. Science 2006, 312:1614-1620.
  – Tishkoff et al. Convergent adaptation of human lactase persistence in Africa and Europe. Nature Genet. 2007, 39:31-40.