Genes and Human History
Gil McVean, Department of Statistics, Oxford
Contact: [email protected]

• Where does the variation come from?
• How old are the genetic differences between us?
• Are these differences important?

How different are our genomes?

Serological techniques for detecting variation
[Figure: rabbit anti-A antibodies and human blood types A, B, AB and O]

Blood group systems in humans
• 28 known systems
  – 39 genes, 643 alleles

System               Genes                Alleles
ABO                  ABO                  102
Chido-Rodgers        C4A, C4B             7+
Colton               AQP1                 7
Cromer               DAF                  10
Diego                SLC4A1               78
Dombrock             DO                   9
Duffy                FY                   9
Gerbich              GYPC                 9
GIL                  AQP3                 2
H/h                  FUT1, FUT2           27/22
I                    GCNT2                7
Indian               CD44                 2
Kell                 KEL, XK              33/30
Kidd                 SLC14A1              8
Knops                CR1                  24+
Landsteiner-Wiener   ICAM4                3
Lewis                FUT3, FUT6           14/20
Lutheran             LU                   16
MNS                  GYPA, GYPB, GYPE     43
OK                   BSG                  2
P-related            A4GALT, B3GALT3      14/5
RAPH-MER2            CD151                3
Rh                   RHCE, RHD, RHAG      129
Scianna              ERMAP                4
Xg                   XG, CD99             -
YT                   ACHE                 4

http://www.bioc.aecom.yu.edu/bgmut/summary.htm

Protein electrophoresis
• Changes in mass/charge ratio resulting from amino acid substitutions in proteins can be detected
[Figure: starch or agar gel, showing charged proteins and the direction of travel]
• In humans, about 30% of all loci show polymorphism, with a 6% chance that a pair of randomly drawn alleles at a locus are different
Lewontin and Hubby (1966), Harris (1966)

The rise of DNA sequencing
[Figure: a slide filled with raw DNA sequence]

SNPs: Single Nucleotide Polymorphisms
[Figure: the same DNA sequence with single-nucleotide differences highlighted]
• About 1 site in 1000 differs between any two genomes

Different, but not that different
• Humans are one of the least diverse organisms

Species               Diversity (percent)
Humans                0.08 - 0.1
Chimpanzees           0.12 - 0.17
Drosophila simulans   2
E. coli               5
HIV-1                 30

• c. 3,000,000 SNPs in 270 people
• c. 25,000,000 SNPs in 1000 people

How do we differ?
– Let me count the ways
• Single nucleotide polymorphisms
    TGCATTGCGTAGGC
    TGCATTCCGTAGGC
• Short indels (= insertion/deletion)
    TGCATT---TAGGC
    TGCATTCCGTAGGC
• Microsatellite (STR) repeat number
    TGCTCATCATCATCAGC
    TGCTCATCA------GC
• Minisatellites (≤100 bp)
• Repeated genes – rRNA, histones (1-5 kb)
• Large inversions, deletions – Y chromosome, Copy Number Variants (CNVs)

Y chromosome variation
• Non-pathological rearrangements of the AZFc region on the Y chromosome

Copy-number variation in genes
• Variation in gene number can contribute to phenotypic variation
Perry et al. (2007)

Where does genetic variation come from?
• You will pass on about 60 new mutations to each of your children
• Most of these are destined to die out within a few generations
• Most variation is inherited from our ancestors

[Figure: "Me" and "You"]

Mutations in our ancestors
[Figure: our genealogical tree, our genomes, and the inherited mutations]

mtDNA Eve
Vigilant et al. (1991)

Recombination means that different parts of the genome have different trees
• Looking back in time, recombination means that different parts of your chromosomes follow different evolutionary paths
• This means that the genealogical tree will change along the genome

    Grandpaternal sequence      Grandmaternal sequence
    TCAGGCATGGATCAGGGAGCT   x   TCACGCATGGAACAGGGAGCT
                 TCAGGCATGG AACAGGGAGCT

How old?

[Figure: timeline showing the human – chimp split, the autosomal MRCA, the origin of H. sapiens, Homo erectus and Australopithecus afarensis]

Ancient variation in the human genome I
• Inversion on chromosome 17 (Stefansson et al. 2005)

Ancient variation in the human genome II
• Trans-specific polymorphism in the HLA
Lawlor et al. (1988), Horton et al. (1998)

Did early humans breed with Neanderthals?
• Neanderthal mtDNA sequences say no…
Ovchinnikov et al. (2000)

But…
• There is some evidence for interbreeding in the presence of unusual haplotypes found in Europe, composed of SNPs not found in non-European populations
Plagnol and Wall (2006)

What are the genetic differences that make us human?
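Returning briefly to the recombination slide: the grandpaternal and grandmaternal sequences shown there can be used for a small worked example. The sketch below is illustrative only. The function names and the cut position of 10 are assumptions made for this example, not part of the lecture. It joins the left part of one grandparental sequence to the right part of the other, then lists the sites at which the result differs from each grandparent.

```python
# Illustrative sketch: one crossover between the two grandparental
# sequences shown on the recombination slide.
# Function names and the cut position are assumptions for this example.

GRANDPATERNAL = "TCAGGCATGGATCAGGGAGCT"
GRANDMATERNAL = "TCACGCATGGAACAGGGAGCT"


def crossover(left_source, right_source, cut):
    """Return left_source[:cut] joined to right_source[cut:]."""
    if len(left_source) != len(right_source):
        raise ValueError("sequences must have the same length")
    return left_source[:cut] + right_source[cut:]


def differing_sites(a, b):
    """Return the 0-based positions at which two aligned sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Any cut between the two variable sites gives the same recombinant.
recombinant = crossover(GRANDPATERNAL, GRANDMATERNAL, cut=10)

print(recombinant)
print("grandparents differ at:", differing_sites(GRANDPATERNAL, GRANDMATERNAL))
print("vs grandpaternal:", differing_sites(recombinant, GRANDPATERNAL))
print("vs grandmaternal:", differing_sites(recombinant, GRANDMATERNAL))
```

The recombinant matches the grandpaternal sequence at the left-hand variable site and the grandmaternal sequence at the right-hand one, which is why the two halves trace back through different ancestors.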
Chromosomal changes
• Human chromosome 2 is a fusion of two chromosomes in great apes
• There are several inversion differences between the chromosomes
Feuk et al. (2005)

Gene loss
• Loss of enzymes that make sialic acid
  – Sugar on cell surface that mediates a variety of recognition events involving pathogenic microbes and toxins
• Myosin heavy chain
  – Associated with gracilization
Wang et al. (2006)

Gene evolution
• FOXP2 is a highly conserved gene (across the Mammalia), expressed in the brain. Mutations in the gene in humans are associated with specific language impairment
• Across the entire mammalian phylogeny, there have been very few amino acid-changing substitutions
• However, two amino acid changes have become fixed in the lineage leading to modern humans since the split with the chimpanzee lineage
Enard et al. (2002)

Are the genetic differences between people and peoples important?

[Figure: the genome and the pressures acting on it: diet, infectious disease, physical environment, mating success]

Detecting recent adaptive evolution
• Let’s look closely at the dynamics of the fixation process for adaptive mutations
• The fixation of a beneficial mutation is associated with a change in the patterns of linked neutral genetic variation
• This is known as the hitch-hiking effect (Maynard Smith and Haigh 1974)
• Looking for the signature of hitch-hiking can be a good way of detecting very recent fixation events

Lactose persistence
Lactose intolerance
Skin pigmentation
Lamason et al. (2005)

Disease resistance
• Mutations in the Duffy gene are associated with protection against malarial infection (Plasmodium vivax)

An aside on the genetics of race
• It is sometimes claimed that there is a ‘genetic basis to race’
• What is true is that groups of individuals from different parts of the world tend to have similar genomes because they