Genetic Modeling of Human History Part 1: Comparison of Common Descent and Unique Origin Approaches
Research Article

Ola Hössjer,*¹ Ann Gauger,² and Colin Reeves³

¹ Department of Mathematics, Stockholm University, Sweden
² Biologic Institute, Redmond, WA, USA
³ Applied Mathematics Research Centre, Coventry University, United Kingdom

Abstract

In a series of two papers (Part 1 and 2) we explore what can be said about human history from the DNA variation we observe among us today. Population genetics has been used to infer that we share a common ancestry with apes, that most of our human ancestors emigrated from Africa 50 000 years ago, that they possibly had some mixing with Neanderthals, Denisovans and other archaic populations, and that the early Homo population was never smaller than a few thousand individuals. Population genetics uses mathematical principles for how the genetic composition of a population develops over time through various forces of change, such as mutation, natural selection, genetic drift, recombinations and migration.
In this article (Part 1) we investigate the assumptions about this theory and conclude that it is full of gaps and weaknesses. We argue that a unique origin model where humanity arose from one single couple with created diversity seems to explain data at least as well, if not better. We finally propose an alternative simulation approach that could be used in order to validate such a model. The mathematical principles of this model are described in more detail in our second paper (Part 2).

Cite as: Hössjer O, Gauger A, Reeves C (2016) Genetic modeling of human history part 1: comparison of common descent and unique origin approaches. BIO-Complexity 2016 (3):1–15. doi:10.5048/BIO-C.2016.3.
Editor: Douglas D. Axe
Received: June 4, 2016; Accepted: October 5, 2016; Published: November 4, 2016
Copyright: © 2016 Hössjer, Gauger, Reeves. This open-access article is published under the terms of the Creative Commons Attribution License, which permits free distribution and reuse in derivative works provided the original author(s) and source are credited.
Notes: A Critique of this paper, when available, will be assigned doi:10.5048/BIO-C.2016.3.c.
*Email: [email protected]

INTRODUCTION

We all have a genetic fingerprint. The more closely related we are, the more similar our fingerprints are. Genetic data can be used for a number of purposes: to find out whether we carry a risk gene of an inheritable disease, to ascertain that a young man is the father of a newborn child, to find evidence against a suspect of a crime, or to retrieve our ethnic mix at ancestry.com. In this paper and a subsequent one we will investigate what human genetic data has to say about common ancestry. In academia it has more or less been taken for granted that humans have a common ancestor with chimps. The prevailing view of this common descent scenario is that the first steps to humanity arose from ape-like ancestors, whose numbers were never less than a few thousand individuals at any time in history. But there are virtually no attempts to check the results of a scenario by which humanity descends from a single first couple. We will call this second scenario the unique origin model.

From a scientific point of view it is important to compare and test both scenarios. In this first article (Part 1) we will describe the results of comparing and testing them against each other. The conclusion of our qualitative argument is that a unique origin scenario is in fact more plausible. We end by suggesting a quantitative model by which such scenarios can be tested more formally. The mathematics of this quantitative model is described in more detail in our second article (Part 2).

In more detail, this paper (Part 1) is organized as follows: In Section 1 we introduce some basic concepts from population genetics. Although the content is well known to many readers, it makes the article more self-contained and easier to follow. Then in Section 2 we describe in more detail different versions of the two competing common descent and unique origin models. They are compared and evaluated in Section 3, using several different criteria. Finally, in Section 4 we briefly discuss how the hypotheses of the unique origin model can be tested with data, as a preparation for our accompanying paper (Part 2).

1. POPULATION GENETICS

Population genetics is a discipline that describes how the genetic makeup of a group of people changes over time [1]. It has numerous applications, but here we will use it as a tool for comparing different scenarios of human history. Before that, we will review some basic principles of genetics [2].

1A. Human DNA and Its Inheritance

Each one of us has genetic information stored in our cells in terms of DNA. Most of this information is contained in the cell nucleus as 46 chromosomes, 23 of which are inherited from our father and 23 from our mother. Forty-four of these are non-sex (autosomal) chromosomes and come in almost identical (homologous) pairs. The remaining two chromosomes determine our sex. Females have two copies of an X-chromosome, one from each parent, whereas males have one Y-chromosome from the father, and one X-chromosome from the mother. There is also some DNA in the mitochondria.

The chromosomes of the cell nucleus and the mitochondria of the cell cytoplasm are DNA molecules, whose structure is a double-stranded helix. Both strands are written in an alphabet that consists of the four letters adenine (A), guanine (G), cytosine (C) and thymine (T). Each such letter is usually referred to as a nucleotide or base, and they are connected along each strand by strong chemical covalent bonds. Between the two strands, the nucleotides pair up by weaker hydrogen bonds, A connecting to T, and G to C. Each pair, A-T or G-C, is referred to as a base pair, and our genome can be thought of as a book with three billion base pairs of nuclear DNA along with sixteen thousand base pairs of mitochondrial DNA (Fig. 1).

The human genome is estimated to contain about 20,000–25,000 genes. These genes comprise only a small part of DNA, and each one of them carries information about various types of proteins through the genetic code. The mechanism of this code is that triplets (codons) of nucleotides are translated into one of twenty possible amino acids, the building blocks of all proteins. There is some redundancy in the genetic code. Since most of the 4³ = 64 possible nucleotide triplets code for amino acids, some codons will correspond to the same amino acid.

Chromosome 1A
G A C C T A T T T A G A C A
C T G G A T A A A T C T G T

Chromosome 1B
G A C C T A T T T G G A C A
C T G G A T A A A C C T G T

(SNP at the 10th base pair)

Figure 2. A small part of a DNA molecule, with 14 base pairs, exists in two copies in the worldwide human population. The upper copy is the same as in Figure 1, whereas the lower differs in that the 10th base pair is C-G rather than T-A. This position is a single nucleotide polymorphism (SNP), with two possible alleles T and C at the coding strand. doi:10.5048/BIO-C.2016.3.f2

1B. DNA Variation Among Individuals

We don’t all look the same, and a major reason for this is the fact that our genomes are not identical. There are small differences that make each one of us genetically unique. Most parts of DNA are the same for all humans, but those that vary are called polymorphisms. A Single Nucleotide Polymorphism (SNP) is the most common kind of variation.¹ For each base pair of the DNA molecule, the convention is to refer to only one of its two nucleotides, the one located on the coding strand.² This nucleotide usually exists in two variants at a SNP, for instance C or T, also referred to as the two alleles of the SNP (Fig. 2). There are several million bi-allelic SNPs along the human genome, and for those located on one of the 22 autosomes, each person will have two copies of each, one on the chromosome inherited from the father and the other on the chromosome that the mother passed on. For a SNP with alleles C and T, there are three possible pairs of alleles, C on both chromosomes (CC), C on one chromosome and T on the other (CT), and T on both