<<

Overview and Introduction to

Eric L. Stevens, Ph.D. International Policy Analyst International Affairs Staff [email protected] Center for Food Safety and Applied Overview: Phylogenetics

• Understand the principal of bacterial

• Understand the underlying principles in which phylogenetic trees are created

• Have a conceptual understanding of the different ways to create a • AND HOW WGS DATA IS USED

• Understand how to read a phylogenetic tree Phylogeny

• The evolutionary history of a or a group of species over time

Image courtesy of http://evolution.berkeley.edu/evolibrary/article/0_0_0/evo_03 Molecular data vs. / 5

Molecular Data Morphology

• Strictly heritable • Can be influenced by • Unambiguous data environmental factors • Quantitative • Ambiguous modifiers • assessment • Qualitative easy • Homology assessment • Relationship of distantly difficult related can be inferred • Close relationships can be • Abundant and easily inferred generated • Problems when working with reduced visible morphology 5 WHAT IS PHYLOGENETICS?

• The study of evolutionary history/relationships among organisms or species based on heritable traits (DNA) • Homologous sequences

• Includes : • The classification and naming ’s sketch 1836: the of species first phylogenetic tree?

6 7

Cell Division and Lineages

Overview: • Daughter cells have the same genotype as their parent (plus any or plasmid/phage incorporation)

• Overtime the lineages that descended from a single cell will acquire enough mutations (SNPs) to be differentiated from each other

• Estimating the shared ancestor of all lineages is important in determining if the DNA from this from this clinical patient is related to the DNA from that bacteria that was contaminating that food that the person ate.

7 8

Understanding bacterial evolution and genetic relatedness

Bacterial • A bacterial cell replicates its genetic 3-5 million material and then divides in half.

Cell Division • Sometimes during replication a occurs in the DNA and the Daughter Cell 1 Daughter Cell 2 genome of the daughter cell might be slightly different than the parent

8 Detecting a mutation based on cell division

1 32 cell cells Overview: 2 • Assuming a 64 cells cells of 0.003 mutations per 4 genome per cell division it cells 128 cells would take 9 cell divisions to see a mutation in a 8 256 cells cells single cell (out of 512 cells). 16 512 cells cells

9 10

Detecting a mutation based on cell division Overview: • If the group of 512 cells were used for , then then the fraction of reads with the mutation AGGATTGTTGGCAG GGAATGTTGGCAGT (or variant) would not be high GAATGTTGGCAGTC enough to detect by current AATGTTGGCAGTCG sequencing technologies

AGGAATGTTGGCAGTCG • Remember, when condensing 4 reads into one genome sequence, if three of the reads show A and the other shows a T, will select the A as what the is at that position

10 11

Detecting a mutation

1 32 cell cells

2 AGGATTGTTGGCAG cells 64 cells GGATTGTTGGCAGT 4 GATTGTTGGCAGTC cells 128 ATTGTTGGCAGTCG cells AGGATTGTTGGCAGTCG 8 256 cells cells

16 512 cells cells

11 12

Cell Division and Lineages Overview: This is where Phylogenetics is helpful!

12 13

13 Phylogenetic concepts: Interpreting a Phylogeny

Sequence A

Sequence B Sequence C

Sequence D

Sequence E

Present Time Constructing Phylogenetic trees

• What is the goal of your work?

• Aligning homologous sequences • DNA/RNA/ • Are the groups closely related or distantly? • What sequences are you choosing? Phylogenetic trees can be represented in several ways

16 What data goes into making the tree is important

From , Baxevanis and Ouellette, 2nd Edition, 2001, p. 327, Wiley Pub. 17 Example: 16S RNA

• Secondary structure is shown • Highly conserved among all species* • i.e. homologous sequence (from a common ancestor) 16S rRNA Phylogenetic Tree

Pace, N. 1997. A molecular view of microbial diversity and the biosphere. Science 276:734–740. trees can be different from a Species tree!

http://www.math.duke.edu/mathbio/proj_stat.html 20 Bacteria Tree from 31

Wu et al. Vol 462|24/31 December 2009| doi:10.1038/nature08656 1056 Discriminating power of increasing sequence data

Nat Rev Microbiol. Oct 2013; 11(10): 728–736.

Constructing Phylogenetic Trees

• A tree is characterized by how it looks (topology) and its branch lengths • Branches represent time or proportional to number of changes

• Three main methods for construction: • Parsimony • Distance-based • Maximum Likelihood Constructing Phylogenetic Trees

• Trees can be rooted • Evolutionary relationship is implied • Use an to “root” • Example: S. bongori is the outgroup and roots the tree for S. enterica

• Trees can be unrooted • No evolutionary directionality • Want to know which are more alike Constructing Phylogenetic Trees Neighbor-joining Maximum parsimony Maximum likelihood

Very fast Slow Very slow

Easily trapped in local Assumptions fail when Highly dependent on optima evolution is rapid assumed evolution model

Good for generating Best option when Good for very small data tentative tree, or tractable (<30 taxa, sets and for testing trees choosing among multiple strong conservation) built using other methods trees

26 Parsimony • What is the tree that requires the fewest evolutionary changes to explain the data • i.e. fewest number of mutations to explain sequence variation

• Directly based on the sequence • Does not take into account revertant mutation • Does not take into consideration types of mutation ( vs ) Parsimony Example

1 2 3 4 5 6 Position 1 G G G G G G 2 G G G A G T

3 G G A T A G Sequence

4 G A T C A T

Informative Informative

Uninformative Uninformative Uninformative Uninformative Parsimony Example

1 2 3 4 5 6 Position 1 G G G G G G 2 G G G A G T

3 G G A T A G Sequence

4 G A T C A T

Informative Informative

Uninformative Uninformative Uninformative Uninformative Maximum Likelihood

• Want to find a tree that will maximize the chance that the tree is correct • Probabilities (likelihoods) are considered for every mutation (nucleotide substitution) for a multiple . • Transitions are more likely than (~3:1) • If C, T, and G are represented at a site, The sequences that have C and T are probably closer* Maximum Likelihood

• Each base of every position of every sequence is considered separately (independent) and given a log-likelihood value and the sum is used to estimate branch lengths • For every Topology (possible tree)!!!

• Good theoretical background • General Consistent • Computationally expensive (a lot of time) Distance-based

• Construct a distance matrix for each pair of sequences (e.g. how many differences)

1 A 0 1 B 3 0 2 1 C 4 3 0 A B C A B C

• That distance matrix represents one tree • Great for very similar sequences • Very fast • Loses information ASSESSING CONFIDENCE IN TREES

• Measure of confidence in the inferred tree. • Is the tree likely to change if we got more data, or if we had used slightly different data? • Are some parts of the tree more robust than others?

• Bootstrapping • Create multiple new alignments by resampling the columns of the observed data matrix • Construct a tree for the ‘bootstrap’ alignment • The bootstrap support for each branch is the % of bootstrap trees that branch appears in.

33 Assessing Confidence In Trees

34 Baldauf 2003. TRENDS in Vol.19 No.6 June 2003 Goal of Phylogenetic Trees using WGS Data: Infer evolutionary relationships based on nucleotide differences And match clinical to food/environmental isolates

..CTAGCTAG…...CTAGCTAG.. Clinical 1 1 ..CTAGCTAG…...CTAGCTAG.. Clinical 2 0 ..CTAGCTAG…...CTAGCTAG.. Food A ..CTAGCTAG…...CTAGCTAG.. Envr A SNP 2 2 ..ATAGCTAG…...TTAGCTAG.. Envr B SNP ..ATAGCTAG…...TTAGCTAG.. Envr C 0-2 2-3 ..ATAGCTAG…...CTAGGTAG.. Envr D ..ATAGCTAG…...CTAGGTAG.. Envr E SNPs SNPs

..ATAGCTAG…...CTACCTAG.. Food E 1-2 3 ..ATAGCTAG…...CTACCTAG.. Food E 0-2 SNPs ..ATAGCTAG…...CTACCTAG.. Food E ..ATAGCTAG……GTAGCTAG.. Envr C SNPs

35 36 INTERPRETATION OF TREE & SNPS

1. mtDNA Forensic Testing Framework

2. Results binned into 3 groups 1. Include/Match (<=20 SNPs) 2. Inconclusive (20-100 SNPs) 3. Exclude/Non-Match (>100 SNPs)

3. Statistical Odds Ratio method in development, databases growing

37 S. Bareilly Phylogeny

FDA1202 SAL2919 Coriander Powder India FDA1203 SAL2921 Frozen Baila Bangladesh FDA1201 SAL2918 Coriander Bangladesh FDA1112 SAL2877 Frozen Undeveined Shrimp India FDA1203 SAL2921 Frozen Baila Bangladesh FDA1116 SAL2885 Coriander Powder India FDA1206 SAL2924 Stomach Vietnam SAL3133 Clinical MD FDA1146 SAL2903 Hilsa Fish Thailand FDA1148 SAL2904 Frozen Rock Lobster Tails United Arab Emirates FDA1143 SAL2920 Lobster Tails Taiwan

FDA1155 SAL2889 Fresh FDA1144 SAL2883 Frozen Whole Tilapia Thailand FDA1159 SAL0955 Unknown CantaloupeSAL3132 Clinical USA EnvironmentalFDA713 SAL0949 USA FDA1160MD SAL0956 FDA1114 SAL2881 Frozen Raw Shrimp India EnvironmentalFDA1145 USA SAL2898 Chili EnvironmentalFDA1107 USA SAL2914 Pabda FDA1141Powder Thailand SAL2884 Frozen Crab with FishFDA1150 Bangladesh SAL2439 Frog SAL3140 Clinical NY FDA1157Claws Sri SAL2908 Lanka Ground FDA1140Legs Unknown SAL2895 Red Chili FDA1147Red Pepper SAL2886 USA Fennel Seeds PowderFDA1165 Pakistan SAL2894 Raw SAL3130 Clinical MD UnitedFDA1161 Arab SAL2876 Emirates Whisker FDA1163Shrimp Vietnam SAL2879 Frozen Raw Esomus FDA1164Fish Vietnam SAL2887 Sand Goby FDA1117Swaison Whole SAL2888 Vietnam Frozen SAL3126 Clinical MD FDA1139Fish Vietnam SAL2890 Kheer ShrimpFDA1123 India SAL2901 Sesame MixFDA1118 Pakistan SAL2891 Coriander SAL3139 Clinical NY FDA1124Seeds India SAL2902 FDA1132Powder India SAL2916 FDA1142Coconut India SAL2910 Shell-on FDA1200Shrimp India SAL2917 Cumin SAL3148 Clinical NY ShrimpFDA1129 Sri SAL2911 Lanka Organic FDA1131Powder India SAL2915 Frozen FDA1207Black Pepper SAL2925 India Chili FDA1126Rohu Rish SAL2906 India Ginger SAL3128 Clinical MD PowderFDA1120 India SAL2896 Crushed PowderFDA1138 India SAL2900 20-25 SNPs ChilisFDA1119 India SAL2893 Frozen CorianderFDA1156 Mexico SAL2892 SAL3127 Clinical MD FDA1121Fish India SAL2897 Sesame FDA364 Irrigation Water USA SeedFDA1149 India SAL2438 Nonfat Dry SAL3157 Clinical NY ATCCFDA1152 9115 SAL2434 FDA1154Milk Unknown SAL2436 FDA1153Poultry Meal SAL2435 USA Poultry FeatherFDA1137 Meal SAL2913 USA SAL3131 Clinical MD FeatherFDA1130 Meal SAL2912 USA Cayenne ScallopsFDA1128 Indonesia SAL2909 Punjabi FDA1204Pepper India SAL2922 Chili FDA1125Cheole Spice SAL2905 India Turmeric SAL3155 Clinical NY FDA1115Powder SAL2882India Frozen Raw PowderFDA1127 India SAL2907 FDA1113Peeled Shrimp SAL2880 India Frozen FDA1202Shrimp India SAL2919 Coriander SAL3151 Clinical NY FDA1201Shrimp India SAL2918 Coriander PFGE PowderFDA1112 India SAL2877 Frozen BangladeshFDA1203 SAL2921 Frozen Baila UndeveinedFDA1205 SAL2923 Shrimp Bangladeshi India Fresh Water Fish (Bacha) SAL3129 Clinical MD BangladeshFDA1116 SAL2885 Coriander FDA1206Bangladesh SAL2924 Fish SAL3133Powder ClinicalIndia SAL3141 Clinical NY FDA1146Stomach Vietnam SAL2903 Hilsa MDFDA1148 SAL2904 Frozen Rock Lobster Tails FishFDA1143 Thailand SAL2920 Lobster FDA1144United Arab SAL2883 Emirates Frozen Whole SAL3145 Clinical NY <=5 SNPs FDA1114Tails Taiwan SAL2881 Frozen Raw TilapiaSAL3150 Thailand Clinical Match ShrimpSAL3140 IndiaClinical SAL3130NY Clinical SAL3152 Clinical NY SAL3126NY Clinical SAL3139MD Clinical SAL3148MD Clinical SAL3158 Clinical NY SAL3128NY Clinical SAL3127NY Clinical SAL3157MD Clinical SAL3131MD Clinical SAL3153 Clinical NY 110-130 SNPs SAL3155NY Clinical SAL3151MD Clinical SAL3129NY Clinical SAL3141NY Clinical SAL3146 Clinical NY SAL3145MD Clinical SAL3152NY Clinical SAL3158NY Clinical SAL3153NY Clinical SAL3142 Clinical NY SAL3146NY Clinical SAL3142NY Clinical SAL3143NY Clinical SAL3143 Clinical NY SAL3154NY Clinical SAL3144NY Clinical SAL3156NY Clinical SAL3125NY Clinical SAL3154 Clinical NY SAL3149NY Clinical SAL3147MD Clinical NY NY SAL3144 Clinical NY SAL3156 Clinical NY SAL3125 Clinical MD SAL3149 Clinical NY SAL3147 Clinical NY NGS distinguishes geographical structure among closely related Salmonella Bareilly strains Importance of a Balanced Approach

Clinical Maximum Food and Samples WGS Benefit Environmental Samples

40 Note:

• These slides are for teaching purposes only and have been collected from images that I have made, from the CDC and FDA, and from around the web.

• The findings and conclusions in this report are those of the author and do not necessarily represent the official position of the Food and Drug Administration