Abstract Modeling and Phylodynamic

ABSTRACT

MODELING AND PHYLODYNAMIC SIMULATIONS OF AVIAN INFLUENZA

by Liam Mosley

Avian Influenza Viruses (AIV) are highly adaptive and mutate continuously throughout their life cycle. Subtype H5N1, also known as Highly Pathogenic Asian Avian Influenza, is of particular interest due to its rapid spread from Asia to other countries. Constant mutations in the protein sequences of AIVs cause antigenic drift, which leads to the spread of epidemics to livestock, causing billions of dollars in socio-economic losses each year. Consequently, containment of AIV epidemics is of vital importance. Computational approaches to epidemic forecasting, specifically phylodynamic simulations, enhance in vivo analysis by enabling analysis of ecological parameters and evolutionary traits and by predicting antigenic shifts to assist vaccine design. This work introduces the HASEQ model, an improvement on existing phylodynamic simulation models that uses actual Hemagglutinin (HA) protein sequences, simulates mutations through amino acid substitution models, and implements an amino-acid-level antigenic analysis algorithm to model natural selection pressure. In contrast to prior approaches that rely on abstract representations of virus strains and mutations, HASEQ manipulates and yields actual HA strains to allow for robust validation and direct application of results to inform epidemic containment efforts. The validity of the HASEQ model is assessed via comparisons to WHO Nomenclature refined to represent strains present in three high-risk countries. The model is calibrated and validated using thousands of simulations with wide-ranging parameter settings, requiring over 2,500 hours of computation time. Results show that the model improvements yield results with the expected evolutionary characteristics at the cost of a 10-fold increase in computational run time.
MODELING AND PHYLODYNAMIC SIMULATIONS OF AVIAN INFLUENZA

A Thesis

Submitted to the Faculty of Miami University in partial fulfillment of the requirements for the degree of Master of Science in Computer Science

Department of Computer Science and Software Engineering

by Liam Mosley
Miami University
Oxford, Ohio
2019

Advisor: Dr. Dhananjai Rao
Reader: Dr. Eric Rapos
Reader: Dr. Eric Bachmann

Contents

1 Introduction
  1.1 Motivation
  1.2 Contributions
2 Background
  2.1 Epidemiological Model
  2.2 Phylodynamic Simulations
  2.3 PhySim
    2.3.1 Simulation Methods
    2.3.2 Euclidean Models of Avian Influenza
  2.4 Related Work
3 Phylodynamic Model Improvements and Implementation
  3.1 HASEQ Model Introduced
  3.2 Virus Strain Representation
  3.3 P-Epitope
  3.4 Modeling Mutations on Influenza Strains
4 Validation and Analysis
  4.1 P-Epitope Validation
  4.2 PhySim Calibration and Comparative Analysis
    4.2.1 Turkey
    4.2.2 Nigeria
    4.2.3 Vietnam
  4.3 Generalized Sensitivity Analysis: Turkey
  4.4 BLAST Analysis
  4.5 Run-time and Memory Analysis
  4.6 Discussion
5 Conclusion
  5.1 Limitations
  5.2 Summary
  5.3 Future Work
References

List of Figures

2.1 Ecological model of the influenza life cycle
2.2 A phylogenetic tree showing the different virus lineages for Vietnam
2.3 SEIR Compartmental Model
2.4 Combined Ecological and SIS models with example parameters from PhySim
2.5 Antigenic drift caused via mutations in the Euclidean model
3.1 A diagram of the PhySim program (Green = main, Yellow = abstract classes). Arrows represent inheritance, straight lines represent interaction without direct relationships.
3.2 Generalized PhySim diagram with changes outlined in red. Arrows represent inheritance, straight lines represent interaction without direct relationships.
3.3 Expected number of substitutions for particular residues in an average HA Sequence
3.4 Using matrix exponentiation, figure (a) is transformed into figure (b); this example uses a large value for t to show the differences in substitution rates
4.1 A comparison of the risk function with typical antigenic distances for both the geometric (orange) and HASEQ (blue) scales
4.2 A comparison of P-Epitope value distributions for inter-clade (orange) and intra-clade (blue) distances; the sample was drawn from 100 different sequences from 10 different clades in the H5N1 2012 nomenclature
4.3 Phylograms produced from HASEQ and Geometric simulations compared to the reference phylogram for Turkey
4.4 Calibration exploration for Turkey. Results from 10 runs, brightly colored nodes represent successful runs.
4.5 Graph of average HASEQ and Geometric simulation infective populations over the course of 10 runs with a 95% Confidence Interval on the Geometric population for Turkey
4.6 Infective population for the last 5 simulation years with a 95% Confidence Interval on the Geometric population for Turkey
4.7 Phylograms produced from HASEQ and Geometric simulations compared to the reference phylogram for Turkey
4.8 Calibration exploration for Nigeria. Results from 10 runs, brightly colored nodes represent successful runs.
4.9 Graph of average HASEQ and Geometric simulation infective populations over the course of 5 runs with a 95% Confidence Interval on the Geometric population for Nigeria
4.10 Infective population for the last 5 simulation years with a 95% Confidence Interval on the Geometric population for Nigeria
4.11 Phylograms produced from HASEQ and Geometric simulations compared to the reference phylogram for Vietnam
4.12 Calibration exploration for Vietnam. Results from 10 runs, brightly colored nodes represent successful runs.
4.13 Graph of average HASEQ and Geometric simulation infective populations over the course of 10 runs with a 95% Confidence Interval on the Geometric population for Turkey
4.14 Infective population for the last 5 simulation years with a 95% Confidence Interval on the Geometric population for Vietnam
4.15 Generalized Sensitivity Analysis (GSA) results for Turkey. The x-axis values in each sub-chart show the range of values explored for each parameter. The y-axis shows d_{m,n} for the values explored.
4.16 Summary of d_{m,n} values for Figure 4.15
4.17 Correlation between parameter variables for PhySim runs using the HASEQ model

ACKNOWLEDGEMENTS

First, I would like to thank my advisor, DJ Rao, for all of the advice and guidance he has given me throughout my time in the Computer Science Department. Second, I would like to thank the members of my thesis committee for taking time out of their schedules to go over my work. Last, I would like to show my appreciation to my family and friends, who supported me and helped push me along the way while I finished my degree.

Chapter 1 Introduction

1.1 Motivation

Avian Influenza Viruses cause billions of dollars of socio-economic losses every year. Between 2014 and 2015 alone, the spread of AIVs prompted the culling of 45 million turkeys and chickens to contain a single epidemic. There are a variety of approaches to containing AIV epidemics, such as vaccination, culling of populations, and livestock isolation. Vaccination efforts are of pivotal importance because they can prevent epidemics before they have the chance to spread [1]. Approaches to deciding how best to contain an epidemic typically fall into two groups, the first of which is in vivo, or live, analysis.
In vivo approaches to strain selection revolve around sampling AIV host populations and using the density of different strains in those populations to determine which strains are most commonly present. Unfortunately, this process can take months to complete, whereas epidemic containment requires constant monitoring and quick action. Because of these time constraints, it is much harder to advise regions on how to contain epidemics as they are happening.

The second group consists of in silico, or computational, approaches. Phylodynamic simulations are one such computational approach. Focused on modeling the spread of epidemics, phylodynamic simulations allow researchers to analyze and predict when future epidemics will arise and how to tackle their containment. Because in silico approaches focus on preventative measures, they are of increasing importance to containment efforts.

1.2 Contributions

Current phylodynamic models are limited in their representation of virus strains. Viruses are modeled as Euclidean coordinate points, where each dimension represents a multitude of characteristics for a particular strain. These geometric, or Euclidean, models can be scaled up or down to represent different abstract dimensions of influenza strains; as the number of dimensions increases, so do the computational requirements of the simulation. Previously, there was motivation for finding scaled-down representations of virus strains by modeling antigenic drift, or the evolutionary distance, as changes in 2-D vector values [2]. These representations are limited to an antiquated equation for antigenic distance, and newer, more proven measures, such as P-Epitope [3], have since been introduced into the literature. Along with improvements in the measure of antigenic distance, the advent of machine learning has brought with it more accurate models of amino acid substitution.
Current phylodynamic simulations model the change of virus strains over time as uniform random mutations in the nucleotide structure of protein sequences [4]. Models such as FLU have been shown to be effective at capturing individual amino acid substitution rates [5]. Each amino acid is encoded by multiple nucleotides, and amino acids are responsible for defining the shape of proteins [6]. This work implements these substitution models to obtain a more accurate representation of how viruses mutate.

In summary, this work introduces an improvement on existing phylodynamic simulation models by implementing current measures of antigenic distance and adapting amino acid substitution models to represent changes in virus strains at the protein level.
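The two ideas named in this section, a sequence-level antigenic distance and non-uniform amino acid substitution, can be illustrated with a minimal Python sketch. It is not the HASEQ or PhySim implementation. It assumes the common definition of P-Epitope as the largest fraction of differing residues within any single epitope region. The epitope site indices, the substitution weights, the example sequence, and the function names `p_epitope` and `mutate_site` are all hypothetical placeholders chosen for illustration; in particular, the weights are toy values and not the FLU matrix.

```python
import random

# Hypothetical epitope regions: 0-based site indices into an aligned HA
# sequence. Real H5 epitope site lists come from the literature; these
# are placeholders for illustration only.
EPITOPES = {
    "A": [2, 3, 4, 5],
    "B": [8, 9, 10],
}

# Toy substitution weights (NOT the FLU matrix): relative chance that the
# key residue is replaced by each listed residue.
TOY_WEIGHTS = {
    "K": {"R": 0.7, "N": 0.2, "E": 0.1},
    "D": {"E": 0.6, "N": 0.4},
}


def p_epitope(seq_a, seq_b, epitopes=EPITOPES):
    """Return the largest fraction of differing residues over any epitope."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    best = 0.0
    for sites in epitopes.values():
        diffs = sum(1 for i in sites if seq_a[i] != seq_b[i])
        best = max(best, diffs / len(sites))
    return best


def mutate_site(seq, site, rng, weights=TOY_WEIGHTS):
    """Replace one residue using non-uniform substitution weights."""
    options = weights.get(seq[site])
    if not options:
        return seq  # no weights known for this residue; leave it unchanged
    new = rng.choices(list(options), weights=list(options.values()), k=1)[0]
    return seq[:site] + new + seq[site + 1:]


# Made-up 13-residue fragment; site 2 is a K inside hypothetical epitope A.
parent = "MEKIVLLKDAVKS"
child = mutate_site(parent, 2, random.Random(1))
distance = p_epitope(parent, child)
print(child, distance)
```

Because the strain is stored as a residue string rather than a coordinate point, the distance is computed directly from sequence differences, and the output of a simulated mutation is itself a sequence that could be compared against observed strains.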
