Zoogeography

Dr. Valerio Ketmaier

Gene trees

DNA sequences can be used to infer phylogenies

DNA sequence alignment

Human      GCGCTCGGGTCTAGCCTCT
Chimp      GCGATTCGGGTTAGCCTCT
Gorilla    GCGTCGCGTCTAGTCTCT
Orangutan  GGCTTTGGTCCAGCGCT

DNA alignment

Human      GCGCT-CGGGTCTAGCCTCT
Chimp      GCGATTCGGGT-TAGCCTCT
Gorilla    GCG--TCGCGTCTAGTCTCT
Orangutan  -GGCTTTGG-TCCAG-CGCT

Only aligned sequences can be analysed.

Informative sites: the sites at which the mutant allele is present at least twice in the dataset.

1 …GCGCTTCGGCTCCTGCGTGCTTAG…
2 …GCGCTTCGGCTCCTGCGTGCTTAG…
3 …GCGCTTCGGCTCCTGCGTGCTTAG…
4 …GCGCTTCGGCTCCTGCGTGCTTAG…
Unresolved tree (all four sequences are identical)

1 …GCGCTTTGGCTCCTGCGTGCTTAG…
2 …GCGCTTCGGCTCTTGCGTGCTTAG…
3 …GCGCTTCGGCTCCTGCGTACTTAG…
4 …GCGCTTCCGCTCCTGCGTGCTTAG…
Unresolved tree (every mutation occurs in one sequence only)

1 …GCGCTTTGGCTCCTGAGTGCTTAG…
2 …GCGCTTCGGCTCTTGAGTGCTTAG…
3 …GCGCGTCGGCTCCTGCGTACTTAG…
4 …GCGCGTCCGCTCCTGCGTGCTTAG…
Resolved tree (two sites are shared by sequences 1 and 2 versus 3 and 4)
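The three example alignments above can be checked mechanically. The following is a minimal sketch, assuming the usual parsimony-informative criterion (a column counts when at least two different states each occur at least twice); the function name `informative_sites` is illustrative and not part of the lecture.

```python
from collections import Counter


def informative_sites(alignment):
    """Return 0-based indices of columns where at least two states
    each occur in at least two sequences."""
    sites = []
    for index, column in enumerate(zip(*alignment)):
        counts = Counter(column)
        states_seen_twice = sum(1 for n in counts.values() if n >= 2)
        if states_seen_twice >= 2:
            sites.append(index)
    return sites


# Second example from the slides: only single-sequence mutations.
unresolved = [
    "GCGCTTTGGCTCCTGCGTGCTTAG",
    "GCGCTTCGGCTCTTGCGTGCTTAG",
    "GCGCTTCGGCTCCTGCGTACTTAG",
    "GCGCTTCCGCTCCTGCGTGCTTAG",
]

# Third example from the slides: two shared mutations.
resolved = [
    "GCGCTTTGGCTCCTGAGTGCTTAG",
    "GCGCTTCGGCTCTTGAGTGCTTAG",
    "GCGCGTCGGCTCCTGCGTACTTAG",
    "GCGCGTCCGCTCCTGCGTGCTTAG",
]

print(informative_sites(unresolved))  # no informative columns
print(informative_sites(resolved))    # columns grouping 1+2 against 3+4
```

Only the columns reported for the third alignment carry grouping information; the single-sequence mutations add branch length but do not favour any topology.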

Informative site: a site at which the mutant allele occurs at least twice, as in the resolved example.

Phylogeny reconstruction methods

GCGCTTCGGCTCCT
GCGATTCGGCTTCT
GCGCTTCGCCTTCT
GGGATTTGGCCCCG

• Neighbor-joining (NJ)
• Maximum parsimony (MP)
• Maximum likelihood (ML)
• Bayesian

Distance-based phylogeny reconstruction (UPGMA, Least squares, Neighbour-joining)

[Figure: tree with tips Seq1, Seq2, Seq3 and Seq4]

Sequence distances:

        Seq1  Seq2  Seq3  Seq4
Seq1     -
Seq2     5     -
Seq3     1     4     -
Seq4    21    22    20     -
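To illustrate how a distance matrix like the one above turns into a tree, here is a minimal UPGMA sketch (average-linkage clustering). It returns only the topology in bracket notation, without branch lengths; the function name `upgma_topology` and the dictionary layout are assumptions made for this example.

```python
from itertools import combinations


def upgma_topology(names, dist):
    """Cluster leaves by average pairwise distance and return the
    resulting topology as a bracket-notation string."""
    # Each cluster is (bracket string, tuple of leaf names).
    clusters = [(name, (name,)) for name in names]

    def average_distance(c1, c2):
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Join the two clusters with the smallest average distance.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_distance(*pair))
        merged = ("(" + c1[0] + "," + c2[0] + ")", c1[1] + c2[1])
        clusters = [merged] + [c for c in clusters if c not in (c1, c2)]
    return clusters[0][0] + ";"


names = ["Seq1", "Seq2", "Seq3", "Seq4"]
dist = {
    frozenset(("Seq1", "Seq2")): 5,
    frozenset(("Seq1", "Seq3")): 1,
    frozenset(("Seq2", "Seq3")): 4,
    frozenset(("Seq1", "Seq4")): 21,
    frozenset(("Seq2", "Seq4")): 22,
    frozenset(("Seq3", "Seq4")): 20,
}

print(upgma_topology(names, dist))
```

Seq1 and Seq3 join first (distance 1), Seq2 joins next (average distance 4.5), and Seq4 is added last. UPGMA assumes equal rates in all lineages; neighbour-joining relaxes that assumption but needs a slightly longer implementation.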

“Brackets notation”: (((Seq1,Seq3),Seq2),Seq4); (Seq4 is the outgroup)

How to calculate distances? (saturation problem)

Saturation: multiple mutations at the same site result in underestimation of evolutionary distances.

[Figure: the number of differences plotted against time. The expected curve keeps rising, the observed curve levels off; the difference is due to saturation. Time-axis labels: Human, Chimp, Red colobus monkey, Mouse.]

Example (sequence after each step, with the cumulative number of substitutions):

…GCGCTTCGGC…    0
…GCGTTTCCGC…    2
…GCGTATTCGC…    4
…GCGCATTCGC…    5
…ACCCATACGC…    8
…TCCCATACTC…   10
…TCCCACACTC…   11
…TTGCACACTC…   13
…TTGCGCACTT…   15

Observed: 8
Actual: 15
(Figure legend: sites changed once; sites changed more than once)

How to calculate distances?

Substitution probability matrix:

From\to   A     C     G     T
A         Paa   Pac   Pag   Pat
C         Pca   Pcc   Pcg   Pct
G         Pga   Pgc   Pgg   Pgt
T         Pta   Ptc   Ptg   Ptt

Jukes & Cantor 1969:

From\to   A   C   G   T
A         -   α   α   α
C         α   -   α   α
G         α   α   -   α
T         α   α   α   -

Kimura 1980 (α: transitions, β: transversions):

From\to   A   C   G   T
A         -   β   α   β
C         β   -   β   α
G         α   β   -   β
T         β   α   β   -
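The saturation example above can be verified by counting differences directly. This is a small sketch that compares the starting sequence with each later sequence as transcribed from the slide; the variable names are illustrative.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


ancestor = "GCGCTTCGGC"

# (sequence, cumulative number of substitutions that actually occurred)
steps = [
    ("GCGTTTCCGC", 2),
    ("GCGTATTCGC", 4),
    ("GCGCATTCGC", 5),
    ("ACCCATACGC", 8),
    ("TCCCATACTC", 10),
    ("TCCCACACTC", 11),
    ("TTGCACACTC", 13),
    ("TTGCGCACTT", 15),
]

observed = [hamming(ancestor, seq) for seq, _ in steps]
for (seq, actual), seen in zip(steps, observed):
    print(seq, "actual:", actual, "observed:", seen)
```

The observed count falls behind the actual count as soon as a site is hit a second time (the third step reverts an earlier change), and ends at 8 observed differences for 15 actual substitutions.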

Model-based corrections of observed distances:

• Jukes & Cantor 1969 (JC): d = -3/4 ln(1 - 4p/3) (p: proportion of nucleotides that differ between two sequences)

• Kimura 1980 (K2P): d = ln[1/(1 - 2s - v)]/2 + ln[1/(1 - 2v)]/4 (s and v: proportions of transitions and transversions)
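A minimal sketch of the two corrections, assuming the standard JC and K2P forms (both logarithmic terms of K2P enter with a positive sign). The function names are illustrative. Both functions raise an error when the logarithm's argument is not positive, which is what happens once the sequences are saturated.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of sites that differ between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def jc_distance(p):
    """Jukes-Cantor correction: d = -3/4 ln(1 - 4p/3)."""
    arg = 1.0 - 4.0 * p / 3.0
    if arg <= 0:
        raise ValueError("p too large: JC distance undefined (saturation)")
    return -0.75 * math.log(arg)


def k2p_distance(s, v):
    """Kimura two-parameter correction with transition proportion s
    and transversion proportion v."""
    a = 1.0 - 2.0 * s - v
    b = 1.0 - 2.0 * v
    if a <= 0 or b <= 0:
        raise ValueError("s, v too large: K2P distance undefined")
    return 0.5 * math.log(1.0 / a) + 0.25 * math.log(1.0 / b)


p = 0.1  # hypothetical: 10 % of sites differ
print(jc_distance(p))               # slightly larger than p
print(k2p_distance(p / 3, 2 * p / 3))  # same value: K2P reduces to JC here
```

When one third of the differences are transitions and two thirds are transversions, as expected under equal rates, K2P gives the same value as JC. For the two end-point sequences of the saturation example (8 differences in 10 sites, p = 0.8), `jc_distance` raises an error because p exceeds 3/4.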

Other substitution models: Felsenstein 1981 (F81); Hasegawa, Kishino & Yano 1985 (HKY85); and many others.

Maximum Parsimony Method

The method predicts the evolutionary tree that minimizes the number of steps required to generate the observed variation in the sequences.

Step 0: Input: multiple sequence alignment.
Step 1: For each aligned position, identify phylogenetic trees that require the smallest number of evolutionary changes to produce the observed sequence changes.
Step 1.5: Continue analysis for every position in the sequence alignment.
Step 2: Sequence variations at each site in the alignment are placed at the tips of the trees. Identify the tree (or trees) that produce the smallest number of changes overall for all sequence positions.

Because all possible trees are examined, the method is best suited to sequences that are quite similar and to a small number of sequences. With such an exhaustive search it is guaranteed to find the most parsimonious tree.
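The steps above can be sketched for four sequences, where only three unrooted topologies exist. The sketch below uses Fitch's algorithm to count the minimum number of changes per column; this is one standard way to do the counting, not necessarily the procedure used in the lecture. Trees are written as nested tuples of sequence indices, and the alignment is the "resolved" four-sequence example from the informative-sites slides.

```python
def fitch(tree, column):
    """Return (possible ancestral states, minimum number of changes)
    for one alignment column. A tree is a leaf index or a (left, right) pair."""
    if isinstance(tree, int):
        return {column[tree]}, 0
    left_states, left_cost = fitch(tree[0], column)
    right_states, right_cost = fitch(tree[1], column)
    common = left_states & right_states
    if common:
        return common, left_cost + right_cost
    # No shared state: one extra change is needed at this node.
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over all columns."""
    return sum(fitch(tree, column)[1] for column in zip(*alignment))


alignment = [
    "GCGCTTTGGCTCCTGAGTGCTTAG",  # 1
    "GCGCTTCGGCTCTTGAGTGCTTAG",  # 2
    "GCGCGTCGGCTCCTGCGTACTTAG",  # 3
    "GCGCGTCCGCTCCTGCGTGCTTAG",  # 4
]

# The three possible groupings of four sequences (0-based indices).
topologies = {
    "((1,2),(3,4))": ((0, 1), (2, 3)),
    "((1,3),(2,4))": ((0, 2), (1, 3)),
    "((1,4),(2,3))": ((0, 3), (1, 2)),
}

scores = {name: parsimony_score(tree, alignment)
          for name, tree in topologies.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

The four single-sequence mutations cost one step on every topology; only the two informative columns separate the trees, costing one step each on ((1,2),(3,4)) and two steps each on the alternatives.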

8. Lecture WS 2003/04 Bioinformatics III 6

Maximum likelihood approach

The method uses probability calculations to find the tree that best accounts for the variation in a set of sequences. It is similar to the maximum parsimony method in that the analysis is performed on each column of a multiple sequence alignment. All trees are considered.

Because the rate of appearance of new mutations is very small, the more mutations are needed to fit a tree to the data, the less likely that tree is.

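Start with an evolutionary model of sequence change that provides estimates of rates of substitution of one base for another (transitions and transversions).

Base   A                  C                  G                  T
A      -u(aπC+bπG+cπT)    uaπC               ubπG               ucπT
C      ugπA               -u(gπA+dπG+eπT)    udπG               ueπT
G      uhπA               ujπC               -u(hπA+jπC+fπT)    ufπT
T      uiπA               ukπC               ulπG               -u(iπA+kπC+lπG)

Each off-diagonal entry is the rate of change from the row base to the column base; each diagonal entry is minus the sum of the other entries in its row.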
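A minimal sketch of how such a rate matrix can be assembled: each off-diagonal entry is the overall rate u times a relative rate times the frequency of the target base, and the diagonal is set so that every row sums to zero. All numeric values below are hypothetical placeholders, and the function name `rate_matrix` is illustrative.

```python
BASES = "ACGT"


def rate_matrix(u, relative_rate, pi):
    """Build a substitution rate matrix as a nested dict q[from][to].

    relative_rate[x][y] is the relative rate of change x -> y and
    pi[y] is the frequency of the target base y."""
    q = {}
    for x in BASES:
        row = {y: u * relative_rate[x][y] * pi[y] for y in BASES if y != x}
        # Diagonal entry: minus the sum of the other entries in the row.
        row[x] = -sum(row.values())
        q[x] = row
    return q


# Hypothetical parameters: transitions (A<->G, C<->T) twice as fast
# as transversions, unequal base frequencies.
transitions = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
relative_rate = {
    x: {y: (2.0 if (x, y) in transitions else 1.0) for y in BASES if y != x}
    for x in BASES
}
pi = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}

q = rate_matrix(u=1.0, relative_rate=relative_rate, pi=pi)
for x in BASES:
    print(x, [round(q[x][y], 3) for y in BASES])
```

Simpler models are special cases: with all relative rates and all frequencies equal, the matrix has the Jukes & Cantor form.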

Maximum likelihood approach

Step 1: Align the set of sequences.
Step 2: Examine substitutions in each column for their fit to a set of trees that describe possible phylogenetic relationships among the sequences.

Each tree has a certain likelihood based on the series of mutations that are required to give the sequence data. The probability of each tree is the product of the mutation rates in each branch of the tree, which itself is the product of the rate of substitution in each branch times the branch length.

$$P_{\text{tree}_i} = \prod_{\text{branch}=1}^{n(i)} \text{mutation rate}_{\text{branch}} = \prod_{\text{branch}=1}^{n(i)} \left(\text{rate of substitution in branch}\right) \times \left(\text{length of branch}\right)$$

Advantage of the maximum likelihood approach: it allows trees to be evaluated when mutation rates vary among lineages, and it can be used for more diverse sequences. Disadvantage: it is computationally intensive.
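As a small illustration of the likelihood idea, the sketch below treats the simplest possible case: two aligned sequences under the Jukes & Cantor model, where the only free parameter is the distance d. This is not the simplified product formula from the slide but the standard JC69 site likelihood, chosen here as an assumption because it fits in a few lines. The function names and the example counts are hypothetical.

```python
import math


def jc69_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of distance d for two sequences that differ
    at n_diff of n_sites positions, under the JC69 model."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e   # probability a site keeps its base
    p_diff = 0.25 - 0.25 * e   # probability of one specific other base
    return ((n_sites - n_diff) * math.log(0.25 * p_same)
            + n_diff * math.log(0.25 * p_diff))


def ml_distance(n_sites, n_diff, lo=1e-6, hi=5.0, iterations=200):
    """Maximise the log-likelihood over d by ternary search
    (the function has a single peak when fewer than 3/4 of sites differ)."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if (jc69_log_likelihood(m1, n_sites, n_diff)
                < jc69_log_likelihood(m2, n_sites, n_diff)):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0


# Hypothetical data: 100 aligned sites, 10 differences.
d_hat = ml_distance(100, 10)
print(d_hat)
```

For this two-sequence case the maximum likelihood estimate coincides with the Jukes & Cantor corrected distance, -3/4 ln(1 - 4p/3) with p = 0.1. On a full tree the same principle applies, but the likelihood has to be summed over all possible ancestral states at the internal nodes, which is where the computational cost comes from.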

Infer relationships among three species (A, B, C) with an outgroup: three possible trees (topologies).

[Figure: the prior probability distribution over the three topologies (probability axis up to 1.0) is combined with the model and the data (observations) to give the posterior probability distribution over the same topologies (probability axis up to 1.0).]

Molecular divergence is clock-like

DNA divergence between the species (Human, Orangutan):

D ~ 3% (DNA divergence)
T ~ 10 MY (time of divergence)
m = D / 2T

The rate of evolution of haemoglobin (from Kimura 1983)

[Figure: amino acid divergence (0 to 1) plotted against time (0 to 500 Myr). Shows pairwise divergence of the protein sequence of hemoglobin for 13 pairs of species.]

The rate of molecular evolution at the protein level seems to be too constant to be explained by natural selection.

Living fossils (100s of MY old): molecular evolution continues

[Figure: amino acid divergence of two haemoglobin genes (α & β) within the same species: Human 147, Shark 150.]

Rates of protein evolution in human and shark lineages are approx. equal.

[Images: Platypus, Echidna, Latimeria, Port Jackson shark, Ginkgo biloba, Cycas circinalis]

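Molecular clock

[Figure: tree with tips seq1, seq2, seq3 and an outgroup. Two sequences, GCGCATCGTGCCTGGCTTGT and GCGGTTCGGGTCTAGCCTCT, are separated by divergence D; the two branches leading to them have lengths T1 and T2.]

D = rT1 + rT2 = 2rT (with T1 = T2 = T)

Thus, T = D / 2r

D: divergence
r: rate of divergence (not always known)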

Assuming a molecular clock (r is constant over time), it is possible to estimate the time of divergence, T (e.g. the human/chimp divergence).

Molecular clock in Hawaiian honeycreeper

(Bromham and Penny 2003 Nat Rev Genet)

Molecular clock calibration (estimation of divergence rate, r)

[Figure: tree of seq1, seq2 and seq3 with divergences D1 and D2. The older split happened T1 years ago (known); the time of the younger split, T2, is unknown.]

If the time of divergence (T1) is known, it is possible to estimate the rate of divergence (r):

r = D1 / 2T1

Assuming r is the same throughout the tree, we can estimate T2:

T2 = D2 / 2r

Dating events with the molecular clock
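The two calibration steps above (r = D1 / 2T1, then T2 = D2 / 2r) can be written out as a short calculation. The calibration values are the human-orangutan figures quoted earlier in the lecture (D of about 3 %, T of about 10 MY); the second divergence, 1.2 %, is a hypothetical value chosen only to show the arithmetic.

```python
def divergence_rate(d_known, t_known):
    """Calibrate the clock: r = D1 / (2 * T1)."""
    return d_known / (2.0 * t_known)


def divergence_time(d, rate):
    """Date a split with a calibrated clock: T2 = D2 / (2 * r)."""
    return d / (2.0 * rate)


# Calibration point: D1 ~ 3 % divergence, T1 ~ 10 million years.
r = divergence_rate(0.03, 10e6)

# Hypothetical second pair of sequences with 1.2 % divergence.
t2 = divergence_time(0.012, r)

print(r)   # substitutions per site per year
print(t2)  # years since the split
```

With these numbers the rate is 1.5 x 10^-9 per site per year and the hypothetical split dates to about 4 million years ago. The estimate is only as good as the assumption that r is the same on every branch.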

combined sequence of hemoglobins alpha and beta, cytochrome c, and fibrinopeptide A

Given the number of differences…

The molecular clock estimates the divergence.

Graur and Li (2000)

These data show a slow-down in apes and monkeys and a speed-up in horse-monkey.

Molecular clock rates

Slow
• Plant mitochondrial DNA (~5x10^-10)
• Chloroplast DNA (~10^-9)
• Amino acid substitutions in proteins (varies)
• Nuclear DNA (~10^-8)
• Animal mitochondrial DNA (~5x10^-8)
Fast

Variation in molecular clock rate

1. Generation time: Shorter generation time will accelerate the clock because it shortens the time to fix new mutations.

2. Mutation rate: Species-characteristic differences in polymerases or other biological properties that affect the fidelity of DNA replication, and hence the incidence of mutations. (Pereira and Baker 2006 MBE)

3. Gene function: Changes in the function of a protein as evolutionary time proceeds. This might particularly be expected in the case of gene duplication.

4. Natural selection: Organisms are continually adapting to the physical and biotic environments, which change endlessly in patterns that are unpredictable and differently significant to different species.

Violation of molecular clock

[Figure: tree of Mouse, Human and Opossum with lineage-specific rates r1 and r2.]

The clock is not always the same in different species. E.g. species with shorter generation times have a faster molecular clock (“generation time effect”).

Molecular clock rate: body mass effect in animals

Cytochrome b sequences of tube-nosed seabirds

The larger the body, the longer the generation time and the slower the molecular clock.

The authors claim that taxa with larger body mass have a slower molecular clock.

Nunn and Stanley 1998 MBE

How reliable is the tree?
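A common way to answer this question is bootstrap resampling of alignment columns. The sketch below is a minimal illustration for four sequences, using the four-sequence example alignment from the bootstrap slide. Instead of building a full tree for every replicate, it uses a simple distance criterion for quartets as a stand-in: the grouping with the smallest sum of within-pair differences wins, and ties count as unresolved. That shortcut, the function names and the random seed are assumptions made for this example.

```python
import random


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def best_pairing(seqs):
    """Return the grouping of four sequences with the strictly smallest
    sum of within-pair differences, or None if there is a tie."""
    pairings = {
        "12|34": ((0, 1), (2, 3)),
        "13|24": ((0, 2), (1, 3)),
        "14|23": ((0, 3), (1, 2)),
    }
    scores = {
        name: hamming(seqs[a], seqs[b]) + hamming(seqs[c], seqs[d])
        for name, ((a, b), (c, d)) in pairings.items()
    }
    ranked = sorted(scores, key=scores.get)
    if scores[ranked[0]] < scores[ranked[1]]:
        return ranked[0]
    return None


def bootstrap_support(seqs, grouping, replicates=1000, seed=0):
    """Fraction of bootstrap replicates that recover the given grouping."""
    rng = random.Random(seed)
    length = len(seqs[0])
    hits = 0
    for _ in range(replicates):
        # Draw columns at random, with replacement, up to the original length.
        columns = [rng.randrange(length) for _ in range(length)]
        resampled = ["".join(s[i] for i in columns) for s in seqs]
        if best_pairing(resampled) == grouping:
            hits += 1
    return hits / replicates


seqs = [
    "GCTTTGGCCTGAGTGCAG",  # 1
    "GCTTCGGCTTGAGTGCAG",  # 2
    "GCGTCGGCCTGCGTACAG",  # 3
    "GCGTCCGCCTGCGTGCAG",  # 4
]

print(best_pairing(seqs))
print(bootstrap_support(seqs, "12|34", replicates=1000, seed=1))
```

In this alignment only two columns support the grouping of sequences 1 and 2 against 3 and 4, so a replicate loses the grouping whenever neither of those columns is drawn. With real data, each replicate would be analysed with the same tree-building method as the original alignment.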

Bootstrap tests the robustness of the topology of the tree.

[Figure: tree of Seq1, Seq2, Seq3 and Seq4 with internal nodes A and B.]

How reliable are nodes A & B?

Original dataset:
1 …GCTTTGGCCTGAGTGCAG…
2 …GCTTCGGCTTGAGTGCAG…
3 …GCGTCGGCCTGCGTACAG…
4 …GCGTCCGCCTGCGTGCAG…

• Choose random position
• Take column of nucleotides
• Add to new dataset
• Repeat until length of new dataset = length of old dataset
• Make a tree from the new dataset
Repeat 1000 times

New dataset (one bootstrap replicate):
1 …G G A GCTTGGTCGTTGCGA…
2 …G G A GTTCGGTTGTTGCGA…
3 …G A C GCTCGGTCGGTACGA…
4 …G G C GCTCGCTCGGTGCGA…

[Figure: tree built from the replicate, with tips Seq1, Seq2, Seq3 and Seq4.]

See how often the nodes A & B are present in the bootstrap replicates.

Levantina

• Museum material
• Two mitochondrial genes (COI; 16S)

Table 1. Taxa included in the study and their geographic origin. For each individual we detail the presence/absence of the umbilicus in the shell (O/C), the voucher number in the collections of the Zoologisches Museum Hamburg (ZMH) and the Museum für Naturkunde Berlin (ZMB), the composite COI/16S haplotype identifier number and the corresponding GenBank Accession numbers (COI and 16S separately).

Taxon | Umbilicus (O/C) | Location | Specimen voucher | mtDNA haplotype | GenBank N. COI/16S
Levantina malziana | C | Greece, Karpathos, Mertonas, E Arhangells | ZMH.1555b | 1 | XXX / XXX
Levantina malziana | C | Greece, Karpathos, Mertonas, E Arhangells | ZMH.1555c | 2 | XXX / XXX
Levantina malziana | C | Greece, Karpathos, Mertonas, E Arhangells | ZMH.1555d | 2 | XXX / XXX
Levantina malziana | C | Greece, Karpathos, Mertonas, E Arhangells | ZMH.1555e | 3 | XXX / XXX
Levantina malziana | C | Greece, Karpathos, Mertonas, E Arhangells | ZMH.1555f | 4 | XXX / XXX
Levantina malziana | C | Greece, Karpathos, Mertonas, E Arhangells | ZMH.1555g | 5 | XXX / XXX
Levantina malziana | C | Greece, Karpathos, Mertonas, E Arhangells | ZMH.1555i/j | 3 | XXX / XXX
Levantina malziana | C | Greece, Rhodes, Moni Amos, Kap Ladiko | ZMH.1557a | 6 | XXX / XXX
Levantina malziana | C | Greece, Rhodes, Moni Amos, Kap Ladiko | ZMH.1557b | 7 | XXX / XXX
Levantina malziana | C | Greece, Rhodes, Moni Amos, Kap Ladiko | ZMH.1557d | 8 | XXX / XXX
Levantina malziana | C | Greece, Rhodes, Charaki, Feraklos | ZMH.1557e | 9 | XXX / XXX
Levantina malziana | C | Greece, Rhodes, Kamiros Castle | ZMH.1557f | 2 | XXX / XXX
Levantina malziana | C | Greece, Rhodes, Kamiros Castle | ZMH.1557h | 3 | XXX / XXX
Levantina malziana | C | Greece, Rhodes, Kamiros Castle | ZMH.1557i | 6 | XXX / XXX
Levantina malziana | C | Greece, Rhodes, Profitis Ilias (900m) | ZMH.1557j | 1 | XXX / XXX
Levantina malziana | C | Greece, Rhodes, Profitis Ilias (900m) | ZMH.1557k | 6 | XXX / XXX
Levantina malziana | C | Greece, Rhodes, Monolithos Castell | ZMH.1557n | 10 | XXX / XXX
Levantina malziana | C | Greece, Rhodes, Monolithos Castell | ZMH.1557o | 11 | XXX / XXX
Levantina malziana | C | Greece, Symi, northern slope Vigala | ZMH.1558a | 5 | XXX / XXX
Levantina malziana | C | Greece, Symi, NW Hariani harbour, Th.Sikidi | ZMH.1558c | 12 | XXX / XXX
Levantina malziana | C | Greece, Symi, NW Hariani harbour, Th.Sikidi | ZMH.1558d | 13 | XXX / XXX
Levantina malziana | C | Greece, Symi, NW Hariani harbour, Th.Sikidi | ZMH.1558e | 14 | XXX / XXX
Levantina malziana | C | Greece, Symi, NW Hariani harbour, Th.Sikidi | ZMH.1558f | 13 | XXX / XXX
Levantina malziana | C | Greece, Symi, NW Hariani harbour, Th.Sikidi | ZMH.1558g | 14 | XXX / XXX
Levantina malziana | C | Greece, Symi, Panormitis, N monastery | ZMH.1561a | 24 | XXX / XXX
Levantina malziana | C | Greece, Nimos / or Sesklia, both in Symi group | ZMH.1556a | 15 | XXX / XXX
Levantina malziana | C | Turkey, Karaova / Knidos (?) | ZMH.1565 | 16 | XXX / XXX
Levantina spiriplana | O | Greece, Karpathos, Larniotisa, near Kap Volakas | ZMH.1559a | 17 | XXX / XXX
Levantina spiriplana | O | Greece, Karpathos, Pigadia, Patella mountain | ZMH.1559d | 17 | XXX / XXX
Levantina spiriplana | O | Greece, Karpathos, Pigadia, Patella mountain | ZMH.1559e | 17 | XXX / XXX
Levantina spiriplana | O | Greece, Karpathos, Pigadia, Patella mountain | ZMH.1559f | 18 | XXX / XXX
Levantina spiriplana | O | Greece, Karpathos, Profitis Ilias, SE Aperi | ZMH.1559g | 19 | XXX / XXX
Levantina spiriplana | O | Greece, Karpathos, E Mentes | ZMH.1559i | 20 | XXX / XXX
Levantina spiriplana | O | Greece, Karpathos, E Mentes | ZMH.1559j | 20 | XXX / XXX
Levantina spiriplana | O | Greece, Rhodes, rampert d´Ambrosie gate | ZMH.1560a | 21 | XXX / XXX
Levantina spiriplana | O | Greece, Rhodes, rampert d´Ambrosie gate | ZMH.1560b | 21 | XXX / XXX
Levantina spiriplana | O | Greece, Rhodes, rampert d´Ambrosie gate | ZMH.1560c | 21 | XXX / XXX
Levantina spiriplana | O | Greece, Rhodes, rampert d´Ambrosie gate | ZMH.1560d | 22 | XXX / XXX
Levantina spiriplana | O | Greece, Rhodes, Filerimos | ZMH.1560e | 23 | XXX / XXX
Levantina spiriplana | O | Greece, SW Symi, Xisos | ZMH.1561b | 25 | XXX / XXX
Levantina spiriplana | O | Greece, SW Symi, Xisos | ZMH.1561c | 23 | XXX / XXX
Levantina spiriplana | O | Greece, SW Symi, Xisos | ZMH.1561d | 23 | XXX / XXX
Levantina spiriplana | O | Greece, Symi, Skoumisa Bay, Kefala | ZMH.1561e | 23 | XXX / XXX
Levantina spiriplana | O | Greece, Dodecanese (?) | ZMB 127533 | 27 | XXX / XXX
Levantina spiriplana | O | Turkey, Dotça (mix with L. hierosolyma, Israel?) | ZMH.1564 | 30 | XXX / XXX
Levantina hierosolyma | O | Jerusalem | ZMB.74072 | 27 | XXX / XXX
Levantina hierosolyma | O | Jerusalem | ZMB.9126 | 27 | XXX / XXX
Levantina hierosolyma | O | Jerusalem, Givat Ram Campus, Hebrew Univ (Mienis) | ZMH.1562 | 27 | XXX / XXX
Levantina hierosolyma | O | Jerusalem, Givat Ram Campus, Hebrew Univ (Mienis) | ZMH.1563a | 27 | XXX / XXX
Levantina hierosolyma | O | Jerusalem, Givat Ram Campus, Hebrew Univ (Mienis) | ZMH.1563b | 29 | XXX / XXX
Levantina caesareana | C | Arbell Cliff, Lake Tiberias (Heller) | ZMH.1568 | 27 | XXX / XXX
Levantina caesareana | C | Arbell Cliff, Lake Tiberias (Heller) | ZMH.1569d | 31 | XXX / XXX
Levantina caesareana | O | Arbell Cliff, Lake Tiberias (Heller) | ZMH.1569e | 27 | XXX / XXX
Levantina caesareana | C | Arbell Cliff, Lake Tiberias (Heller) | ZMH.1569g | 32 | XXX / XXX
Levantina caesareana | C | Arbell Cliff, Lake Tiberias (Heller) | ZMH.1569h/i | 33 | XXX / XXX
Codringtonia codringtonii | - | Greece, Peleponnes (leg 2011 ?) Lakatnik ? | ZMB 107155 | N/A | XXX / XXX
Codringtonia codringtonii | - | - | Kotsakiozi et al. (2012) | N/A | JQ239955 / JQ240123
Codringtonia eucineta | - | - | Kotsakiozi et al. (2012) | N/A | JQ239934 / JQ240103
Codringtonia intusplicata | - | - | Kotsakiozi et al. (2012) | N/A | JQ239967 / JQ240134
Codringtonia helenae | - | - | Kotsakiozi et al. (2012) | N/A | JQ239977 / JQ240145
Assyriella guttata | C | Turkey, Harput near Elazig | ZMB 127531 | 26 | XXX / XXX
Assyriella guttata | C | Turkey, Anatolien, 3km SE Mardin | ZMB 127532 | 28 | XXX / XXX
Gyrostomella leachii | - | Libya, Tripoli, Djebel Garim | ZMB 86612-1 | N/A | XXX / XXX
Gyrostomella leachii | - | Libya, Tripoli | ZMB 86612-2 | N/A | XXX / XXX
Helix aspersa | - | - | Reginer et al. 2011 | N/A | HQ203051
Helix aspersa | - | - | Guiller & Madec, 2010 | N/A | EU912763

Software: Sequencher, MrBayes, FigTree, Rasp

[Figure: phylogenetic tree of the mtDNA haplotypes (tips numbered 1 to 33) with Gyrostomella leachii, Assyriella guttata, Codringtonia and Helix aspersa. Nodes carry paired support values (e.g. 81/82, 98/99, 100/100) and estimates with ranges in parentheses (e.g. 6.19 (2.29-11.12); 4.43* (3.45-5.39)). Legend for geographic origin: Karpathos Is., Rhodes Is., Symi Is., Nimos Is., Continental Turkey, Israel.]

[Figure: second tree of the same haplotypes. Area codes: K, R, S, N, CT, IS. Nodes are labelled I to XIII and carry values between 0.75 and 0.99.]