Population Genetics and Molecular Evolution

Bachelor’s Degree in Bioinformatics

Prof. Antonio Barbadilla

Session 1. Introduction to Population Genetics Antonio Barbadilla Group Genomics, Bioinformatics & Evolution Institut Biotecnologia I Biomedicina Departament de Genètica i Microbiologia UAB

Course 2018-19

Bachelor’s Degree in Bioinformatics 2 Prof. Antonio Barbadilla Population Genetics and Molecular Evolution - Session 1. Introduction to Population Genetics

Outline

• Darwinian evolution and Population Thinking
• Levels of genetic variation
• Population Genetics: the kinematics and dynamics of evolutionary changes in populations
• Types of genetic variation and frequencies
• Measures of variation
• Surveys of genetic variation
• Population genetics databases
• The nature of genetic variation
• The golden age of population genetics
• Readings & exercises

Why the rich diversity of life?


1859

Charles R. Darwin (1809-1882)

Descent with modification

https://www.evogeneao.com/

Richard C. Lewontin


Plato’s Essentialism

Population Thinking

Genetic variation within populations is the raw material of evolution. Individual variation is the only reality of the species.

Ernst Mayr (1904-2005)

Charles Darwin (1809-1882)

[Figure labels: Barrier; Time]

https://www.evogeneao.com/

Evolution is the process of conversion of individual variation into species variation

The only thing that is transmitted from generation to generation is the genetic material

[Figure labels: Genotype, development; Transmission; Next generation]

[Figure: single nucleotide difference, G vs C]

Charles Darwin (1809-1882), Gregor Mendel (1822-1884)

Evolution is the process of conversion of individual variation into species variation

• Divergence: species level
• Polymorphism: population level
• Mutation: individual level

Levels of genetic variation

• Divergence (species level): Divergence = Substitution = Fixation
• Polymorphism (population level)
• Mutation (individual level)

Population Genetics and Molecular Evolution

• Molecular Evolution: divergence (species level)
• Population Genetics: polymorphism (population level)
• Mutation (individual level)

How does allelic frequency change over time?

Population genetics: the kinematics and dynamics of evolutionary changes in populations

[Figure: allelic frequency (from 0 to 1) against time]

Evolution and Population Genetics

"Nothing in Biology Makes Sense Except in the Light of Evolution" (Theodosius Dobzhansky)

"Nothing in Evolution Makes Sense Except in the Light of Population Genetics" (Michael Lynch)

Population Genetics

The problematic of population genetics is the description and explanation of genetic variation within and between populations

Theodosius Dobzhansky (1900-1975)

[Figure label: Starting 1966]

The Population is the substrate where evolution occurs

Mendelian population: a group of interbreeding individuals sharing a common gene pool (diploid, sexual reproduction with Mendelian inheritance)

The basic unit of evolution is the allele (gene) frequency

• Genetic variation or genetic polymorphism: existence in a population of two or more allelic forms at appreciable frequencies
• Gene or allele frequency (basic unit of evolution): f(A), the proportion of a given allele in the population

Gene with two alleles, A and a:

p = f(A), q = f(a), p + q = 1

[Figure legend: Allele A; Allele a]

Structure of population genetics: a theory of forces

Population genetics: the kinematics and dynamics of evolutionary changes

Factors changing gene frequencies in populations:

• Mutation
• Migration
• Genetic drift
• Natural selection

The theory of Population Genetics

Founders of Population Genetics (1918-1932):

• Ronald Fisher (1890-1962)
• J. B. S. Haldane (1892-1964)
• Sewall Wright (1889-1988)

The struggle for the measurement of genetic variation

1. Before the 60’s: morphological and immunological polymorphism (blood groups: ABO, Rh, MN, ... 40 in humans)
2. Protein polymorphism (allozymes) by gel electrophoresis (Lewontin & Hubby 1966; Harris 1966)
3. Polymorphisms at the DNA level
   • Microsatellites
   • DNA sequences

Present: Population Genomics era

Molecular Population Genomics: Flybook_Casillas_Barbadilla (pdf)

50 years of molecular population genetics

Harris H., 1966 Enzyme polymorphisms in man. Proc R Soc Lond B Biol Sci 164: 298–310

R. C. Lewontin

Gel protein electrophoresis to study protein variation in populations

Monomorphic gene

Polymorphic gene (allozyme polymorphism): F (fast-migrating allele), S (slow-migrating allele)

Measures of variation

Hypothetical population: N = 10 individuals, 20 gene copies (alleles A and a)

• Allele frequency

$f(A) = N_A / 2N = 7/20$
$f(a) = N_a / 2N = 13/20$

• Genotype frequency

$f(AA) = N_{AA} / N = 1/10$
$f(Aa) = N_{Aa} / N = 5/10$
$f(aa) = N_{aa} / N = 4/10$

• Observed heterozygosity

$f(Aa) = N_{Aa} / N = 5/10$

Example: Electrophoretic study of the enzyme glucose phosphate isomerase in a population of mice

Genotype            F/F   F/S   S/S   Total
N. individuals        4     7     5      16
N. alleles F          8     7     0      15
N. alleles S          0     7    10      17
N. alleles F + S      8    14    10      32

Estimate from the table data:

• Allele frequency
• Genotype frequency
• Observed heterozygosity

Estimates for the glucose phosphate isomerase example (16 mice: 4 F/F, 7 F/S, 5 S/S)

• Allele frequency (from the allele count, or equivalently from the genotype count)

$\hat{p} = f(F) = \frac{15}{32} = \frac{4 + \frac{1}{2} \cdot 7}{16} = 0.469$
$\hat{q} = 1 - \hat{p} = 0.531$

• Genotype frequency

f(FF) = 4/16, f(FS) = 7/16, f(SS) = 5/16

• Observed heterozygosity

$\hat{H} = 7/16 = 0.4375$

Genotypic and allelic frequencies for the MN blood group locus in several human populations

                        Genotype                 Gene frequencies
Population              MM      MN      NN       p(M)    p(N)
Eskimo                  0.835   0.156   0.009    0.913   0.087
Aboriginal Australia    0.024   0.304   0.672    0.176   0.824
Egyptian                0.278   0.489   0.233    0.523   0.477
Germany                 0.297   0.507   0.196    0.550   0.450
China                   0.332   0.486   0.182    0.575   0.425
Nigeria                 0.301   0.495   0.204    0.548   0.452
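The frequency estimates in the mouse example and in the MN table can be reproduced with a few lines of Python. This is only a minimal sketch: the function names are illustrative (they do not come from the slides) and a biallelic, diploid locus is assumed.

```python
from fractions import Fraction


def frequencies_from_counts(n_hom1, n_het, n_hom2):
    """Return (p, q, observed heterozygosity) from genotype counts.

    Assumes a diploid, biallelic locus; p is the frequency of the
    allele carried twice by the first homozygote class.
    """
    n = n_hom1 + n_het + n_hom2                # number of individuals
    p = Fraction(2 * n_hom1 + n_het, 2 * n)    # allele copies / gene copies
    q = 1 - p
    h_obs = Fraction(n_het, n)                 # proportion of heterozygotes
    return p, q, h_obs


def allele_frequency_from_genotype_freqs(f_hom, f_het):
    """Allele frequency from genotype frequencies: f(AA) + f(Aa) / 2."""
    return f_hom + f_het / 2


# Glucose phosphate isomerase example: 4 F/F, 7 F/S, 5 S/S mice.
p, q, h = frequencies_from_counts(4, 7, 5)
print(float(p), float(q), float(h))   # 0.46875 0.53125 0.4375

# MN blood group, Eskimo row: f(MM) = 0.835, f(MN) = 0.156.
print(round(allele_frequency_from_genotype_freqs(0.835, 0.156), 3))   # 0.913
```

Exact fractions are used for the counts so that 15/32 and 7/16 appear without rounding; the slide values 0.469 and 0.531 are these fractions rounded to three decimals.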

Link here to see the table of ABO Blood_type_distribution_by_country

First estimate of genetic variability in man based on 10 allozymic loci (Harris 1966)

[Table columns: Locus; Polymorphic (Yes / No); Heterozygosity]

• Average heterozygosity
• Proportion of polymorphic loci: P = 3/10 = 0.30

Levels of heterozygosity (H) and proportion of polymorphic genes (P) from allozyme studies of groups of plants and animals

Average estimates (N = 243): P = 0.26, H = 0.07

STR: Short tandem repeats
CNV: Copy number variation

Single nucleotide polymorphism (SNP)

[Figure: a G/C single nucleotide polymorphism]

Short tandem repeats ATGGCTGCACACACACACACATGCTGA -> (CA)7
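As an illustration of how a short tandem repeat such as (CA)7 can be located in a sequence, here is a minimal Python sketch using a regular expression. The function name is hypothetical, and the test sequence is built as flank + 7 copies of CA + flank to mirror the slide example rather than being copied from real data.

```python
import re


def longest_tandem_repeat(seq, motif):
    """Return (start, copies) for the longest uninterrupted run of motif.

    Returns (-1, 0) when the motif does not occur in seq.
    """
    best_start, best_copies = -1, 0
    pattern = f"(?:{re.escape(motif)})+"       # one or more adjacent copies
    for match in re.finditer(pattern, seq):
        copies = len(match.group(0)) // len(motif)
        if copies > best_copies:
            best_start, best_copies = match.start(), copies
    return best_start, best_copies


# Flanking sequence + (CA)7 + flanking sequence, as in the slide example.
seq = "ATGGCTG" + "CA" * 7 + "TGCTGA"
start, copies = longest_tandem_repeat(seq, "CA")
print(f"(CA){copies} starting at 0-based position {start}")
```

In a population, different individuals may carry different copy numbers at the same STR locus, so the `copies` value is what would be compared across samples.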

Functional genome regions

Classification of Single Nucleotide Polymorphism (SNP)

• Coding SNP
   • Synonymous
   • Non-synonymous or replacement
• Non-coding SNP: CNS, 5’ and 3’ UTR, intron, 5’ and 3’ intergenic

CNS = Conserved non-coding sequence
UTR = Untranslated region
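The distinction between synonymous and non-synonymous coding SNPs can be sketched in Python by translating the codon before and after the change. This sketch assumes the standard nuclear genetic code; the function name and the example codons are illustrative and are not taken from the slides.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(codon, position, new_base):
    """Classify a single-base change inside a codon.

    codon: three-letter DNA codon (reference allele)
    position: 0, 1 or 2, the position of the SNP within the codon
    new_base: the alternative allele at that position
    """
    mutated = codon[:position] + new_base + codon[position + 1:]
    if CODON_TABLE[codon] == CODON_TABLE[mutated]:
        return "synonymous"
    return "non-synonymous"


print(classify_coding_snp("GAA", 2, "G"))   # GAA (Glu) -> GAG (Glu)
print(classify_coding_snp("GAA", 1, "T"))   # GAA (Glu) -> GTA (Val)
```

Changes to or from a stop codon are reported here as non-synonymous; a fuller classification would usually treat them as a separate class.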

Description levels of genetic variation: one-dimensional vs multi-dimensional

SNPs in BRCA2

• One-dimensional: SNP to SNP
• Multi-dimensional: haplotype

Individual 1 acgtagcatcgtatgcgttagacgggggggtagcaccagtacag
Individual 2 acgtagcatcgtatgcgttagacgggggggtagcaccagtacag
Individual 3 acgtagcatcgtatgcgttagacgggggggtagcaccagtacag
Individual 4 acgtagcatcgtttgcgttagacgggggggtagcaccagtacag
Individual 5 acgtagcatcgtttgcgttagacgggggggtagcaccagtacag
Individual 6 acgtagcatcgtttgcgttagacggcatggcaccggcagtacag
Individual 7 acgtagcatcgtttgcgttagacggcatggcaccggcagtacag
Individual 8 acgtagcatcgtttgcgttagacggcatggcaccggcagtacag
Individual 9 acgtagcatcgtttgcgttagacggcatggcaccggcagtacag

Which sequence data set is more variable?

      Sequence data set 1           Sequence data set 2

 1.   A G C G T T C T G C T C G     A G C G T T C T G C T C G
 2.   A G A G T T C T G C T C G     A G C G T T C T G C T C G
 3.   A G C T T T A T G C T C G     A G C G T T C T G C T C G
 4.   A G A G T T C T G C T C G     T G C G T T C T G C T C G
 5.   A G A G T T A T G C T C G     A G C G T T C T G C T C G
 6.   A G C G T T C T G C T C G     A G C G T T C T G C T A G
 7.   A G C T T T A T G C T C G     A G G G T T C T G C T C G
 8.   A G C G T T A T G C T C G     A G C G T T C T G C T C G
 9.   A G A T T T A T G C T C G     T G C G T T C T G C T C G
10.   A G A G T T A T G C T C G     A G C G T T A T G C T C G
11.   A G A G T T C T G C T C G     A G C G T T C T G C T C G
12.   A G C T T T A T G C T C G     A G C G T T C T G C T C G
13.   A G A G T T C T G C T C G     A G C G G T C T G C C C G
14.   A G C T T T A T G C T C G     A G C G T T C T G C T C G

(from Casillas 2007)

Common measures of nucleotide diversity

Uni-dimensional

• S, s: Number of segregating sites (per DNA sequence or per site, respectively). Nei (1987)
• Η, η: Minimum number of mutations (per DNA sequence or per site, respectively). Tajima (1996)
• k: Average number of nucleotide differences (per DNA sequence) between any two sequences. Tajima (1983)
• π: Nucleotide diversity, the average number of nucleotide differences per site between any two sequences. Nei (1987); Jukes and Cantor (1969); Nei and Gojobori (1986)
• θ, θW: Nucleotide polymorphism, the proportion of nucleotide sites that are expected to be polymorphic in any suitable sample. Watterson (1975); Tajima (1993; 1996)

Multi-dimensional

• D: The first and most common measure of linkage disequilibrium, dependent on allele frequencies. Lewontin and Kojima (1960)
• D’: Another measure of association, independent of allele frequencies. Lewontin (1964)
• R, R2: Statistical correlation between two sites. Hill and Robertson (1968)
• ZnS: Average of R2 over all pairwise comparisons. Kelly (1997)

(from Casillas & Barbadilla 2017)
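The multi-dimensional measures D, D’ and R2 listed above can be illustrated for a pair of biallelic sites. The sketch below uses the usual textbook definitions, which the slides name but do not spell out; the argument names (`p_ab`, `p_a`, `p_b`) are notation introduced here, and D’ is reported as an absolute value.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic sites.

    p_ab: frequency of the haplotype carrying allele A at site 1 and B at site 2
    p_a, p_b: frequencies of allele A (site 1) and allele B (site 2)
    Both sites must be polymorphic (0 < p_a < 1 and 0 < p_b < 1).
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both sites must be polymorphic")
    d = p_ab - p_a * p_b                       # deviation from independence
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max                   # D scaled by its maximum
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Only haplotypes AB and ab are present, each at frequency 0.5:
print(linkage_disequilibrium(0.5, 0.5, 0.5))    # (0.25, 1.0, 1.0)

# Alleles at the two sites are combined at random:
print(linkage_disequilibrium(0.25, 0.5, 0.5))   # (0.0, 0.0, 0.0)
```

ZnS would then be the average of the r2 values over all pairs of segregating sites in the sample.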

Symbols

S = number of segregating sites
m = number of analyzed nucleotides
n = sample size (number of sequences)
$k_{ij}$ = number of differences between sequences i and j

• Number of segregating sites per nucleotide (Watterson 1975):

$S/m$

• Watterson θ estimator (1975):

$\hat{\theta}_W = \dfrac{S/m}{\sum_{i=1}^{n-1} 1/i}$

• π, nucleotide diversity or expected nucleotide heterozygosity (Tajima 1983): average number of differences per site between pairs of randomly sampled sequences

$\pi = \dfrac{1}{m \binom{n}{2}} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} k_{ij}$
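The three estimators defined above can be sketched in Python as follows. The function names are illustrative. As input, the sketch uses the 16 variable sites of the Rhodopsin 3 data set of Drosophila simulans shown later in this session (n = 5, m = 500), and it assumes that all remaining sites are invariant, so that only m needs to be supplied separately.

```python
from itertools import combinations


def segregating_sites(seqs):
    """S: number of alignment columns with more than one nucleotide."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def watterson_theta(seqs, m):
    """Watterson estimator per site: (S / m) / sum_{i=1}^{n-1} 1/i."""
    n = len(seqs)
    a_n = sum(1 / i for i in range(1, n))
    return (segregating_sites(seqs) / m) / a_n


def nucleotide_diversity(seqs, m):
    """pi: average pairwise differences per site over all n(n-1)/2 pairs."""
    n = len(seqs)
    n_pairs = n * (n - 1) // 2
    total_differences = sum(
        sum(a != b for a, b in zip(s1, s2))    # k_ij for one pair
        for s1, s2 in combinations(seqs, 2)
    )
    return total_differences / (m * n_pairs)


# Variable sites only; invariant sites add no differences.
rhodopsin3 = [
    "TCTACCTCCTCGGTTA",
    "TCCTACCTCCTGGTTT",
    "CTCCCCCTCTTTGCTA",
    "CTCCCCCTTCTGACTT",
    "CTCCCTCTTTTGGCCA",
]
m = 500

print(segregating_sites(rhodopsin3) / m)            # S/m
print(round(watterson_theta(rhodopsin3, m), 4))     # Watterson theta
print(round(nucleotide_diversity(rhodopsin3, m), 4))  # pi
```

Summing the differences pair by pair, as done here, gives the same total as summing the number of differing pairs site by site.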

Site frequency spectrum

Application example

Data summary: sample n = 4 sequences, size m = 10 nucleotides

• Number of segregating sites per nucleotide: S/m = 3/10 = 0.3
• Watterson θ estimator: θW = (3/10)/(1 + 1/2 + 1/3) = 0.164
• Nucleotide diversity: π = 0.167 (pairwise differences averaged over all pairs of sequences, divided by m)

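Site frequency spectrum

[Figure: site frequency spectrum of the application example. Number of sites (frequency) against minor allele frequency: 2 sites at 0.25 and 1 site at 0.50]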
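A folded site frequency spectrum like the one above can be tabulated by counting, for each segregating site, how often the minor allele occurs. The sketch below is only illustrative: the four sequences are a made-up alignment chosen to match the summary of the application example (n = 4, m = 10, S = 3), not the original data, and sites with more than two alleles are skipped.

```python
from collections import Counter


def folded_sfs(seqs):
    """Map minor allele frequency -> number of biallelic segregating sites."""
    n = len(seqs)
    spectrum = Counter()
    for column in zip(*seqs):
        counts = Counter(column)
        if len(counts) == 2:                   # biallelic segregating site
            minor_count = min(counts.values())
            spectrum[minor_count / n] += 1
    return dict(sorted(spectrum.items()))


# Hypothetical alignment: variable sites at columns 1, 3 and 9.
toy_alignment = [
    "AGCTTAGCAT",
    "AGCTTAGCAT",
    "AGGTTAGCAT",
    "TGGTTAGCTT",
]

print(folded_sfs(toy_alignment))   # {0.25: 2, 0.5: 1}
```

The spectrum is called folded because it uses the minor allele rather than the derived allele, so no outgroup sequence is needed.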

Nucleotide variation in the gene Rhodopsin 3 of Drosophila simulans (size m = 500 bp, number of sequences n = 5)

Site          1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
Sequence 1    T  C  T  A  C  C  T  C  C  T  C  G  G  T  T  A
Sequence 2    T  C  C  T  A  C  C  T  C  C  T  G  G  T  T  T
Sequence 3    C  T  C  C  C  C  C  T  C  T  T  T  G  C  T  A
Sequence 4    C  T  C  C  C  C  C  T  T  C  T  G  A  C  T  T
Sequence 5    C  T  C  C  C  T  C  T  T  T  T  G  G  C  C  A
Differences   6  6  4  7  4  4  4  4  6  6  4  4  4  6  4  6

Differences: number of heterozygous comparisons out of all pairs for each site. All pairs = $\binom{n}{2}$ = 10 comparisons.

Data summary: sample n = 5 sequences, size m = 500 nucleotides

• Number of segregating sites per nucleotide: 16/500 = 0.0320
• Watterson θ estimator: θW = (16/500)/(1 + 1/2 + 1/3 + 1/4) = 0.0154
• Nucleotide diversity: π = 79/(500 x 10) = 0.0158 (estimation site by site)

Software for the estimation of nucleotide variation

DnaSP — DNA Sequence Polymorphism, is a software package for the analysis of nucleotide polymorphism from aligned DNA sequence data. Rozas, J., Ferrer-Mata, A., Sánchez-DelBarrio, J.C., Guirao-Rico, S., Librado, P., Ramos-Onsins, S.E., Sánchez-Gracia, A. (2017). DnaSP 6: DNA Sequence Polymorphism Analysis of Large Datasets. Mol. Biol. Evol. 34: 3299-3302. DOI: 10.1093/molbev/msx248

PopGenome - An efficient swiss army knife for population genomic analyses in R. Bastian Pfeifer, Ulrich Wittelsbürger, Sebastian E. Ramos-Onsins, Martin J. Lercher; PopGenome: An Efficient Swiss Army Knife for Population Genomic Analyses in R, Molecular Biology and Evolution, Volume 31, Issue 7, 1 July 2014, Pages 1929–1936, https://doi.org/10.1093/molbev/msu136

Variscan is a software package for the analysis of DNA sequence polymorphisms at the whole genome scale. Hutter, S., Vilella, A, J. and Rozas, J. (2006). Genome-wide DNA polymorphism analyses using VariScan. BMC Bioinformatics 7: 409

MEGA, Molecular Evolutionary Genetics Analysis, is a software package used for estimating rates of molecular evolution, as well as generating phylogenetic trees, and aligning DNA sequences. Available for Windows, Linux and Mac OSX (since ver. 5.x). Sudhir Kumar, Glen Stecher, Koichiro Tamura; MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets, Molecular Biology and Evolution, Volume 33, Issue 7, 1 July 2016, Pages 1870–1874, https://doi.org/10.1093/molbev/msw054

Arlequin3.5 software can be used for calculations of nucleotide diversity and a variety of other statistical tests for intra-population and inter-population analyses. Available for Windows.

Nucleotide diversity in the Drosophila melanogaster genome

[Figure: nucleotide diversity by site class: Syn, Non-syn, UTR, Intron, Intergenic. From Barrón 2015 (PhD Thesis Dissertation)]

Genetic diversity in metazoans (Romiguier et al. 2014)

Romiguier, J., Gayral, P., Ballenghien, M., Bernard, A., Cahais, V., Chenuil, A., Chiari, Y., Dernat, R., Duret, L., Faivre, N., Loire, E., Lourenco, J.M., Nabholz, B., Roux, C., Tsagkogeorga, G., Weber, A.A., Weinert, L.A., Belkhir, K., Bierne, N., Glémin, S. & Galtier, N. (2014) Comparative population genomics in animals uncovers the determinants of genetic diversity. Nature 2014 515:261-3

Genome-wide genetic diversity across the metazoan tree of life

Each branch of the tree represents a species (n = 76). The leftmost vertical coloured bar is the estimated genome-wide genetic diversity (πs), the central bar is the prediction of πs based on a linear model with propagule size as the explanatory variable (P <10−14, r2 = 0.56), and the rightmost bar is the prediction of πs based on a linear model with average distance between GPS records, maximal distance between GPS records, average distance to Equator and invasive status as explanatory variables (P = 0.16). Each thumbnail corresponds to one metazoan family. Species are in the same order as in Supplementary Table 2 (from top to bottom).

Reticulitermes grassei (termite): πs = 0.001

Bostrycapulus aculeatus (sea snail): πs = 0.083

Ellegren and Galtier 2016

Population Genetics Databases

PopFly genome browser

http://popfly.uab.cat

• 1100 samples (960 analyzed)
• 30 populations / 18 countries / 5 continents
• 6 metapopulations (geographic origin)

Sergi Hervás

Population genomics resources available for four Drosophila species.

S. Hervas, E. Sanz, S. Casillas, J. Pool and A. Barbadilla. 2017. PopFly: the Drosophila population genomics browser. Bioinformatics. https://doi.org/10.1093/bioinformatics/btx301

Databases of genetic variation

• Online Mendelian Inheritance in Man (catalog of human genetic and genomic disorders): http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM
• International HapMap Project: http://www.hapmap.org (in disuse)
• Entrez dbSNP: http://www.ncbi.nlm.nih.gov/projects/SNP/
• Database of Genotype and Phenotype: http://view.ncbi.nlm.nih.gov/dbgap
• 1000 Genomes Project: http://www.internationalgenome.org/1000-genomes-browsers
• Database of Genomic Variants (a curated catalogue of human genomic structural variation): http://dgv.tcag.ca/dgv/app/home
• Human Genetic Variation Database: http://www.hgvd.genome.med.kyoto-u.ac.jp/
• PopHuman: http://pophuman.uab.es

                      Pilot phase     Phase I             Phase III
Individuals           179             1,092               2,504
Populations           4               14                  26
SNPs                  15 million      38 million          84.7 million
Small indels          1 million       1.4 million         3.6 million
Structural variants   20,000 SVs      14,000 deletions    60,000 SVs

Sònia Casillas

Roger Mulet

Figure 1. PopHuman analysis pipeline. http://pophuman.uab.cat

Casillas*, S., R. Mulet*, P. Villegas-Mirón, S. Hervás, E. Sanz, D. Velasco, J. Bertranpetit, H. Laayouni & A. Barbadilla. 2017. PopHuman: the human population genomics browser. Nucl. Acids Res. gkx943, https://doi.org/10.1093/nar/gkx943 (* Equal contribution)

The Great Obsession of population genetics (Gillespie 2004)

What evolutionary forces led to the observed pattern of genetic variation?

Neutral Theory of Molecular Evolution (1968)

Mutations are mainly neutral or strongly deleterious

[Figure: DFE (distribution of fitness effects of new mutations) under Kimura’s Neutral Theory; vertical axis: frequency]

"The coming years will only see the data rush grow: bigger samples, new species, extinct species, data linked to phenotype, temporal data, and so on. Data are at their most fun when they bring to light things you would never have imagined." (Gil McVean)

McVean, G. 2015 Population Genetics: More Traits, More Populations, and More Species. Where Next for Genetics and Genomics? PLoS biology 2015 13: e1002216

Conclusion

POPULATION GENETICS

Exercises

Exercise 1: Estimate S/m, θW and π from these two data sets

      Sequence data set 1           Sequence data set 2

 1.   A G C G T T C T G C T C G     A G C G T T C T G C T C G
 2.   A G A G T T C T G C T C G     A G C G T T C T G C T C G
 3.   A G C T T T A T G C T C G     A G C G T T C T G C T C G
 4.   A G A G T T C T G C T C G     T G C G T T C T G C T C G
 5.   A G A G T T A T G C T C G     A G C G T T C T G C T C G
 6.   A G C G T T C T G C T C G     A G C G T T C T G C T A G
 7.   A G C T T T A T G C T C G     A G G G T T C T G C T C G
 8.   A G C G T T A T G C T C G     A G C G T T C T G C T C G
 9.   A G A T T T A T G C T C G     T G C G T T C T G C T C G
10.   A G A G T T A T G C T C G     A G C G T T A T G C T C G
11.   A G A G T T C T G C T C G     A G C G T T C T G C T C G
12.   A G C T T T A T G C T C G     A G C G T T C T G C T C G
13.   A G A G T T C T G C T C G     A G C G G T C T G C C C G
14.   A G C T T T A T G C T C G     A G C G T T C T G C T C G

Exercise 2: Estimate manually the most common measures or summary statistics of nucleotide variation (S/m, θW, π and the site frequency spectrum) for the 8 aligned sequences given below. m, the analyzed sequence length, is 100. Only variable sites are shown.

Sequence 1   … A … G … C … G …
Sequence 2   … A … G … T … G …
Sequence 3   … A … A … T … G …
Sequence 4   … T … G … T … T …
Sequence 5   … T … G … T … T …
Sequence 6   … T … G … C … T …
Sequence 7   … T … G … T … T …
Sequence 8   … T … G … T … T …

Readings & Videos

Readings

•Casillas, S. and A. Barbadilla. 2017. Molecular Population Genetics. Genetics 205: 1003–1035. -> read pages 1003-1011 and discussion next Session

•What Use Is Population Genetics? Brian Charlesworth. Genetics. 2015

•Perspectives: McVean, G. 2015 Population Genetics: More Traits, More Populations, and More Species. Where Next for Genetics and Genomics? PLoS biology 2015 13: e1002216

Videos / Online resources

• Introduction to Genetics and Evolution - Coursera - Prof. Mohamed Noor

• Web introduction population genetics (in Spanish - Antonio Barbadilla)
