Flaviviruses Versus the Host Cell, and Evolution in the Primate Interferon Response

by

Alison Ruth Gilchrist

B.Sc., University of California at San Diego

A thesis submitted to the

Faculty of the Graduate School of the

University of Colorado in partial fulfillment

of the requirements for the degree of

Doctor of Philosophy

Department of Molecular Cellular and Developmental Biology

2020

Committee Members:

Sara L. Sawyer, Chair

Robert L. Garcea

Rushika Perera

Robin D. Dowell

Sabrina L. Spencer

Gilchrist, Alison Ruth (Ph.D., Molecular Cellular and Developmental Biology)

Flaviviruses Versus the Host Cell, and Evolution in the Primate Interferon Response

Thesis directed by Prof. Sara L. Sawyer

Long-term interactions between viruses and their hosts often develop into genetic arms races, which result in fast-evolving proteins (i.e. proteins evolving under positive natural selection), especially among immune proteins. A bioinformatic screen of proteins in a component of the primate innate immune response, the interferon system, demonstrated that proteins farther downstream of interferon induction are more likely to be evolving under positive natural selection than proteins in interferon induction pathways. One of the proteins under positive selection in this screen, STING, is a known target of proteases from a group of viruses called flaviviruses. The haplotypes of STING (three of which are studied in this work) demonstrate a range of phenotypes of antagonism and interferon induction that may help explain the evolutionary history of this crucial immune protein.

The cleavage of STING demonstrates that the dengue virus protease targets host proteins for cleavage as well as viral proteins. In an attempt to identify novel targets of the dengue virus protease, a machine learning screen was used to predict possible motifs based on known motifs. This resulted in the identification of DGAT2, a newly described target of flavivirus proteases. The ability to cleave DGAT2, a host protein involved in maintaining lipid homeostasis, improves dengue virus replication, and is a conserved property of all flavivirus proteases tested. DGAT2 is not evolving under positive selection, making it a host-virus antagonistic interaction that has not resulted in an evolutionary arms race. However, identifying and describing this host-virus interaction helps us understand how dengue virus and other flaviviruses alter the host lipid environment during replication.

Dedication

For my parents.

“To know how much there is to know is the beginning of learning to live.”

-Dorothy West (The Richer, the Poorer)

Acknowledgements

To Sara: thank you for setting up the support systems that propped me and my science up when we needed it, and for caring about the lab safety and organization that made being a bench scientist so fun. And thank you to Bob, Rushika, Robin, Sabrina, and one-time committee member David for being unfailingly kind, supportive, and helpful.

Thank you to Nicholas Meyerson for being an unwavering hero. You helped me so much, for so long, that I will understand if you never want to speak to me again (but please do). Alex Stabell and Maryska Kaczmarek: thank you so much for welcoming me into the lab, and for your help and friendship over the years. Thank you to Vanessa Bauer for being an incredible lab manager; I will never forgive you for setting an impossible standard. Thank you to Elena Judd for being a wonderful technician, and then an even more wonderful undergraduate researcher; you’re going to do great things in the future. To everyone else I met and worked with in the Sawyer Lab, including but not limited to Cody Warren, Qing Yang, Camille Paige, Kyle Clark, Will Fattor, Arturo Barbachano-Guerrero, Joe Timpona, Emma Worden-Sapper, Sharon Wu, and Obaiah Dirasantha: I’ll never forget any of you! And I can’t wait to check in on everyone over the coming years.

Thanks to my mom for letting me cry over the phone so many times, and also for bragging about me so much, making it extremely hard to drop out of grad school. Thanks to my dad for helping me climb mountains, physical and metaphorical, my whole life. Thanks to Andrew and Daniel for nothing related to research, but everything related to sibling solidarity, and to Lena Meyer for being my not-a-sister sister, and an endless well of support and love when I needed it.

Thank you to MCDB for a wonderful six years. To the entering MCDB class of 2014: I can’t believe I was so lucky to spend my PhD with such wonderful classmates. To Adrian, Graycen, Julie, Daniel, Brad, Kate, and Abby: an extra special thank you for being part of what kept me sane. For every Science Buff, Nerd-Niter, and ComSciCon-RMW committee member I spent time with in the last six years: I hope I learned even a small portion of what you all have to teach scientists. To every other friend and colleague whom I don’t have the space to thank: know that I appreciated you dearly and you helped get me through grad school.

Contents

Chapter

1 Introduction
1.1 Viruses and hosts in evolutionary combat
1.2 Positive selection in the interferon pathway
1.3 STING: an immune protein that is a target of flavivirus proteases
1.4 Using machine learning to predict new targets of flavivirus proteases
1.5 DGAT2: a novel target of flavivirus proteases
1.6 Thesis organization

2 Positive selection in the interferon pathway
2.1 Positive natural selection
2.2 Positive selection in immune pathways
2.3 The interferon response
2.4 Screening for positive selection in interferon pathways
2.5 Characterization of multiple sequence alignments
2.6 Interferon-stimulated genes experience more intense positive selection than interferon-induction genes or randomly-selected genes
2.7 Discussion
2.8 Methods

3 STING: an immune protein that is a target of flavivirus proteases
3.1 Flaviviruses, and dengue viruses in particular, are world-wide pathogens with major consequences for human health
3.2 Human STING and the interferon response
3.3 Human STING, but not human 78Q STING, is cleaved by multiple flaviviruses
3.4 Rodent STING is under positive selection
3.5 The interferon response is stimulated by STING transfection
3.6 Active dengue virus protease inhibits interferon production
3.7 Small differences in virus replication in cell lines expressing different STING alleles
3.8 Not all flavivirus proteases inhibit the STING-dependent interferon response
3.9 Discussion
3.10 Methods

4 DGAT2: a novel target of flavivirus proteases identified by machine learning
4.1 Predicting targets of the dengue virus protease by machine learning
4.2 DGAT is cleaved by the dengue virus protease
4.3 Mutation of the DGAT2 cleavage motif reduces viral infection
4.4 Confirmation with DGAT2 KO A549s
4.5 Chemically inhibiting DGAT2 activity
4.6 Cleavage of DGAT2 is conserved in the flavivirus family
4.7 The cleavage of DGAT2: discussion
4.8 Attempted further validation of the machine learning approach to predicting targets of flavivirus proteases
4.9 Methods

5 Conclusion

Bibliography

Appendix

A Genes Analyzed in the Interferon Positive Selection Screen
B Host Proteins Predicted by Machine Learning

Tables

Table

2.1 Many interferon genes are known to be evolving under positive selection
2.2 Genes in the interferon induction pathway and genes stimulated by interferon evolving under positive selection

Figures

Figure

1.1 An example of the arms race between virus and host proteins
1.2 DGAT2 Biochemistry
2.1 Simplified diagram of the interferon response
2.2 Quality and equity metrics for the three families of multiple sequence alignments compared
2.3 Characterization of multiple species alignments
2.4 Interferon-stimulated genes have a higher whole-gene dN/dS value than other genes, and have more codons under positive selection than other genes
3.1 Dengue virus cleaves STING during infection
3.2 The SNPs of HAQ STING are geographically distinct
3.3 The 78Q SNP prevents cleavage of human STING
3.4 DENV2 cleaves STING and HAQ STING, but not 78Q STING
3.5 Most flaviviruses cleave STING and HAQ STING, but not 78Q STING
3.6 Rodent Sting1 is under positive selection, but only in the Hystricomorpha clade
3.7 Transfecting STING induces interferon production
3.8 Transfecting STING and DENV2 protease dampens interferon production
3.9 STING SNPs and protease antagonism
3.10 Non-cleavable STING reduces DENV2 replication in A549s
3.11 STING alleles and flavivirus protease antagonism
3.12 STING alleles and flavivirus protease antagonism, part 2
4.1 The dengue virus polyprotein is cleaved by the dengue virus protease
4.2 Schematic of the machine learning protocol
4.3 DGAT2 motif identified and the structure of DGAT2
4.4 DGAT2 is cleaved by the dengue virus protease
4.5 DGAT2 cleavage product not significantly degraded by the proteasome
4.6 Wild-type A549 cells complemented with mutant DGAT2 did not inhibit dengue virus replication
4.7 Endogenous DGAT2 is significantly reduced after transfection of DGAT2-targeting siRNA
4.8 Non-cleavable DGAT2 reduces DENV2 replication
4.9 The presence of non-cleavable DGAT2 inhibits dengue replication in DGAT2 knockout cells
4.10 DGAT2 inhibition in DGAT2 mutant cell lines did not result in significantly higher dengue virus replication
4.11 DGAT2 cleaved by all tested flavivirus proteases
4.12 Proposed model for flavivirus cleavage of DGAT2
4.13 Example screening of predicted host proteins by western blotting
4.14 The vast majority of predicted proteins were not cleaved by the dengue virus protease
4.15 Quantifying western blots led to misleading results
4.16 Mass spectrometry schematic and subcellular fractionation of Huh7 cells transfected with dengue virus protease
4.17 SLC25A6 is not cleaved by the dengue virus protease
4.18 Accuracy of machine learning algorithm trained on dengue motif training data decreases with phylogenetic distance

Chapter 1

Introduction

Viruses and their hosts have been locked in evolutionary combat for millennia. Throughout the course of my thesis research I studied the evolutionary signatures in primate immune proteins that may have been caused by viruses over the evolution of primates, and two specific interactions between a family of viruses (flaviviruses) and host proteins (STING and DGAT2). In this introduction I will give an overview of the questions that drove my PhD research and describe briefly the work I completed.

1.1 Viruses and hosts in evolutionary combat

Animals and viruses have been locked in evolutionary conflict for as long as animals have been evolving. Viral “fossils” (elements of viral genomes integrated into host genomes) tell us that viruses are an ancient and persistent threat [73, 41, 20]. If a virus is capable of killing the host, it is a source of intense natural selection on the host population.

Natural selection is a theory proposed by Charles Darwin in the 19th century [16]. This theory proposes that organisms better adapted to their environment tend to survive and produce more offspring. In the context of viruses, the theory of natural selection suggests that a virus that is more suited to infecting and replicating in host cells will produce more progeny than viruses that are less suited to infecting host cells. Likewise, animals that are less susceptible to infection are less likely to be infected by a virus, less likely to experience the negative health impacts of virus infection, and therefore more likely to survive and produce offspring. Therefore viruses and their hosts are in a slow battle of evolving selective traits through evolutionary time, a battle that often plays out on the frontlines: proteins [65].

As a classic example of where and how this battle might play out, picture a virus that needs to enter a host cell by binding a host receptor protein. A virus that can bind and enter may be able to create more infectious progeny, and therefore the genes that encode that binding protein will be passed on. But this places the host under selective pressure: if a mutation in a host appears that changes the receptor such that the virus can no longer bind, that host individual will resist infection and will be more fit in its environment. The gene that encodes the receptor in that individual is more likely to be passed on to future generations. Then the virus will be under selective pressure, and on and on ad infinitum [65]. This back-and-forth evolutionary pressure sometimes results in a flip-flopping of amino acids between a few states, a pattern that gave rise to the “Red Queen Hypothesis” [113]. In Lewis Carroll’s Through the Looking Glass, the Red Queen explains to Alice that in her world: “it takes all the running you can do, to keep in the same place” [11]. Similarly, two proteins locked in an evolutionary back-and-forth keep “running in place” between the same two states (Figure 1.1). In one form of Red Queen Dynamics, the direct opposition of these proteins may result in successive fixation of advantageous mutations, which represents an “arms race” evolutionary signature. An evolutionary “arms race” sometimes results in progressively more extreme phenotypes in the two opposing proteins, rather than the same difference again and again [83].

Figure 1.1: Adaptation via natural selection, or positive selection, is identifiable due to the changes affecting the sequences of proteins. In this cartoon example, a protein receptor and a viral spike protein alternately evolve mutations that break or re-create the sequence that allows their binding.

[Figure 1.1 labels: virus, host, virus mutates, host mutates.]

The type of evolution described above (selection for change, as compared to selection that maintains a sequence as the status quo) is sometimes called “positive selection” and is characterized by a higher rate of nonsynonymous mutations (dN) than synonymous mutations (dS). In other words, if dN/dS is greater than 1, then nonsynonymous mutations may have been selected for in this gene. This mode of evolution is in contrast to a situation in which a gene’s sequence is under evolutionary pressure to stay constant. In this case, nonsynonymous mutations would be selected against, and dN/dS would be less than 1. Most genes in primate genomes are under this kind of selection: purifying selection [65]. Therefore, genes that are under positive selection are interesting targets of study, because this situation leads to questions such as: do these genes encode proteins that interact directly with pathogens? Are the regions of the genes that are evolving rapidly the sites of contact?
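The dN/dS criterion in this section can be summarized in one expression. As a brief sketch, the ratio is written here as ω; that symbol is a common shorthand in the molecular evolution literature and an assumption of this sketch, since the text itself writes dN/dS. The neutral case is included for completeness.

```latex
\omega = \frac{d_N}{d_S}, \qquad
\begin{cases}
\omega > 1 & \text{nonsynonymous changes favored (consistent with positive selection)} \\
\omega \approx 1 & \text{no net selection on amino acid changes (neutral evolution)} \\
\omega < 1 & \text{nonsynonymous changes removed (purifying selection)}
\end{cases}
```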

1.2 Positive selection in the interferon pathway

Immunity proteins are often evolving under positive selection, being an essential site of interaction between virus and host proteins and therefore heavily influenced by the selection pressure exerted by viruses [20]. Therefore, we might expect to see an increased number of proteins under positive selection in the interferon pathway. The interferon response is a component of the innate immune system (the part of the immune system that is always present in cells, and does not have to “adapt” to new pathogens) and plays an important role in defending human cells against viruses [23]. Because viruses replicate within cells of the host, their nucleic acids and proteins are exposed, to varying degrees, to the cellular environment. To exploit this vulnerability, hosts have evolved numerous intracellular sensors that recognize viral nucleic acids and proteins [23].

When cellular sensors detect one of these virus-specific structures, a signaling cascade is activated which ultimately leads to the production and secretion of one or more of several possible interferon proteins [63, 61]. Interferons then produce transcriptional changes in the infected cell, inducing expression of hundreds of host genes (called “interferon-stimulated genes,” or ISGs) that collectively act to limit viral replication [84]. Almost any aspect of the viral life-cycle can be targeted by interferon-stimulated proteins [23, 84]. The resulting interferon-stimulated proteins act with a diversity of mechanisms to halt viral replication. Interferons do not just effect these changes in the infected cell; they also signal to neighboring (even uninfected) cells and induce the same transcriptional changes in those cells [119]. In solid tissues, this signalling produces a “firewall” of protected cells around the infected ones, making cell-to-cell spread of the virus difficult.

Viruses are known to target proteins that are both up- and downstream of the production of interferon molecules themselves [63]. Viruses sometimes inhibit interferon altogether by neutralizing the sensors and signaling pathways that lead to interferon production, while other times viral antagonists are directed at specific downstream effects produced by interferon-stimulated genes.

When considering the many proteins involved in this pathway, I initially hypothesized that the genes responsible for inducing the production of interferon would be antagonized more often by viruses than genes that are turned on as a result of interferon production. Therefore I expected to find a greater proportion of genes under positive selection in the upstream pathways of interferon and ISG induction. Counter to this hypothesis, I found that the interferon-stimulated genes, and not interferon-pathway induction genes, are evolving significantly more rapidly than a random set of genes. These results are more fully described in Chapter 2.

1.3 STING: an immune protein that is a target of flavivirus proteases

Stimulator of interferon genes (STING) is an example protein in the interferon response that is under positive selection. STING is part of the well-categorized DNA-sensing pathway of cells.

Double-stranded DNA is sensed by the protein cyclic-GMP-AMP synthase (cGAS), which catalyzes the formation of cyclic 2’3’-GMP-AMP (2’3’-cGAMP). 2’3’-cGAMP binds directly to STING, the activation of which ultimately results in increased interferon production [109].

STING has been shown to be under positive selection in primates, and I and others have independently confirmed this finding using updated datasets with a greater number of primate sequences [69]. STING is also targeted by virus proteins. For example, many flaviviruses encode proteins that target STING for degradation or inactivation [127, 18, 99, 71, 17, 1]. I showed that STING is cleaved during dengue virus infection by the dengue virus protease. Moreover, I showed that multiple flavivirus proteases can cleave human STING, and that when amino acid position 78 of STING is altered, these proteases cannot cleave STING.

There is a minor allele of human STING in which the cleavage motif is mutated [43]. There is also a major allele of STING that does not have a mutated cleavage motif, but that acts significantly differently in how it stimulates the interferon response [39]. This allele, the HAQ allele, does not induce interferon as robustly as the other two alleles. I was interested in how these three different alleles are targeted by the dengue virus protease, and how infection of dengue virus is affected by the expression of these alleles. This work is described in more depth in Chapter 3.

1.4 Using machine learning to predict new targets of flavivirus proteases

The fact that the dengue virus protease can cleave STING, as well as other host proteins, leads to an interesting experimental challenge. The question is: what other human or host proteins can the dengue virus protease target and cleave? And further, how do we identify those proteins?

The motifs recognized by the dengue virus protease are diverse, making it difficult to identify possible cleavage motifs by eye. High-throughput biological methods such as peptide-library scanning vary in their specificity and accuracy and can be costly or time consuming [50]. High-throughput computational methods can be influenced heavily by the most common known motifs, and do not necessarily identify the great diversity of motifs that could possibly be cleaved by the protease [98].

A previous graduate student in the Sawyer Lab, Alexander Stabell, attempted to design a computational screening approach based on machine learning [100]. In this method, an algorithm is trained on motifs that are known to be cleaved by the dengue protease. The hope is that the trained algorithm can then recognize motifs in the human proteome that may be cleaved by the protease. He initially curated training data from available virus genomes and ran several iterations of this machine learning algorithm, changing parameters such as which features of the amino acids were considered, length of motifs, and the inclusion of the STING motif. One of the consistently predicted proteins from these initial screens was diacylglycerol O-acyltransferase 2 (DGAT2), a protein which we demonstrated was indeed cleaved by the protease in co-transfection experiments. I further screened approximately 100 proteins generated by these initial screens for cleavage by the dengue virus protease, none of which were cleaved.
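To make the general idea of "train on known motifs, then scan a proteome" concrete, here is a minimal sketch. It is not the algorithm used in this work, whose features and parameters are not specified in this chapter. It swaps in a much simpler technique, a position-specific log-odds scoring matrix with a uniform background, and the training motifs and protein sequence below are invented placeholders, not real cleavage data.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_log_odds_matrix(motifs, pseudocount=1.0):
    """Return per-position log-odds scores from equal-length training motifs."""
    width = len(motifs[0])
    if any(len(m) != width for m in motifs):
        raise ValueError("all training motifs must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # uniform background: a simplification
    total = len(motifs) + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for pos in range(width):
        counts = Counter(m[pos] for m in motifs)  # missing residues count as 0
        matrix.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return matrix


def scan_protein(sequence, matrix):
    """Score every window of the sequence; return hits sorted best-first."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Skip windows containing non-standard residue codes.
        if all(aa in AMINO_ACIDS for aa in window):
            score = sum(column[aa] for column, aa in zip(matrix, window))
            hits.append((score, start, window))
    return sorted(hits, reverse=True)


# Hypothetical toy data, invented for illustration only.
TOY_MOTIFS = ["KRSA", "RRSG", "KRGA", "RKSA"]
TOY_PROTEIN = "MAAKRSAGLLD"

matrix = build_log_odds_matrix(TOY_MOTIFS)
ranked_hits = scan_protein(TOY_PROTEIN, matrix)
best_score, best_start, best_window = ranked_hits[0]
print(best_start, best_window)
```

A scoring matrix of this kind is dominated by the most common training motifs, which is the limitation of conventional computational methods noted above; that limitation is part of the motivation for a learned model.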

In a later formalized iteration of the computational component of this project, I curated motifs from over 3000 dengue virus genomes (serotypes 1-4) and worked with postdoctoral researcher Jacob Stanley to recreate this machine learning algorithm with more formal mathematical and biological requirements [101]. This algorithm generated a different list of human proteins predicted to be cleaved by the dengue virus protease. However, after testing many of the proteins predicted by machine learning from the various iterations of the computational screen, DGAT2 is the only validated positive result. Future work can be done to systematically clone and further test the output of this algorithm in cleavage assays. For my thesis, I focused on further understanding the cleavage of DGAT2 by flavivirus proteases. However, I do write briefly about some further screening techniques I explored, as well as future possibilities for screening of these predicted targets of the dengue virus protease.

1.5 DGAT2: a novel target of flavivirus proteases

DGAT2 is a diacylglycerol O-acyltransferase that catalyzes the terminal and only committed step in triacylglycerol synthesis by using diacylglycerol and fatty acyl CoA as substrates (Figure 1.2) [12]. It is an endoplasmic reticulum-resident protein that is encoded by the DGAT2 gene on chromosome 11 [12]. By machine learning, we identified a possible cleavage site at amino acid position 123, and went on to show that DGAT2 is indeed cleaved at this position.

The cleavage of DGAT2 had immediately interesting implications for dengue virus replication, as the process by which flaviviruses replicate in host cells is intimately connected to host lipid membranes. Lipid homeostasis under normal cellular conditions maintains a pool of lipids (including diacylglycerols) that can be converted to phospholipids for membrane synthesis, while storing some lipid content as triacylglycerols for later energy use [14]. This conversion of diacylglycerols to triacylglycerols is partly accomplished by DGAT2 [12]. Cleaving DGAT2 would therefore shift cellular homeostasis towards phospholipid synthesis and membrane synthesis. This hypothesis tracks with the observation that local surface area of lipid membranes is increased during dengue infection [118]. The cleavage of DGAT2 may be a part of the process that creates this improved environment for dengue replication.

To test how the ability to cleave DGAT2 impacts dengue replication, I created cell lines expressing either wild-type (cleavable) DGAT2 or mutant (non-cleavable) DGAT2 and showed that the presence of non-cleavable DGAT2 inhibited dengue virus genomic RNA and viral progeny production. I also showed that DGAT2 is cleaved by the proteases of multiple flaviviruses, suggesting that the cleavage of DGAT2 is a conserved mechanism by which flaviviruses replicate. I describe the work I completed examining this hypothesis in Chapter 4.

Figure 1.2: A) DGAT1 and DGAT2 convert diacylglycerols to triacylglycerols in the endoplasmic reticulum. Triacylglycerols are stored in lipid droplets as part of the energy storing mechanism of animal cells. B) In the conversion of diacylglycerols to triacylglycerols, fatty-acyl CoA is used as a co-substrate.

[Figure 1.2 panel labels. A: Phospholipid, Fatty Acyl CoA, Diacylglycerol, Triacylglycerol, DGAT1, DGAT2. B: Fatty-acyl CoA, 1,2-Diacylglycerol, Triacylglycerol, DGAT2.]

1.6 Thesis organization

Though I hope this introduction provides an adequate road map for how this thesis is organized, and for a retroactive discussion of how my dissertation projects progressed, the remainder of this thesis is organized such that each chapter can be read as a standalone story, with methods included. Therefore some of the information above is recapped and expanded in later chapters.

Chapter 2

Positive selection in the interferon pathway

2.1 Positive natural selection

Natural selection is the Darwinian theory that advantageous traits in a population will be selected for[16]. Mutations occur randomly due to DNA damage or faulty replication by host polymerases, but fixation or loss of those mutations depends on selection on a population level[2].

A house-keeping gene (i.e. a gene for which it is advantageous that the function stay consistent) is more likely to be conserved over long evolutionary periods, as mutations are selected against[130].

A small percentage of genes have more flexibility: mutations that arise in these genes may become fixed in the population because they confer a selective advantage.

For protein coding genes, studying how the protein sequence has changed over the course of evolutionary time can reveal the signatures of selection. One useful measure of a changing sequence over time requires knowing the DNA sequence and the protein sequence. With this information it is possible to identify whether a fixed mutation is a synonymous (amino acid conserving) or nonsynonymous (amino acid changing) mutation. With multiple sequences of the same gene from different animals, it is possible to approximate rates of synonymous and nonsynonymous fixation events over the period of evolutionary time those animals represent [124]. This calculation is accomplished by counting the number of synonymous changes and normalizing to the number of possible synonymous changes (dS) and, similarly, counting the number of nonsynonymous changes and normalizing to the number of possible nonsynonymous changes (dN). The normalization to possible changes is necessary because of the degenerate nature of the codon table: because nonsynonymous mutations occur more often than synonymous mutations by random chance, computational models have been developed that use statistical frameworks to account for these unequal substitution rates.
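The normalization described here can be illustrated with a short counting sketch for two aligned coding sequences. This is a simplified counting approach in the style of Nei and Gojobori, not the statistical models used for the screen in this chapter. The assumptions, all made for brevity, are: the standard genetic code, no gaps or stop codons in the input, changes to stop codons counted as nonsynonymous, codons differing at more than one position skipped, and a Jukes-Cantor correction for multiple hits. Function names are hypothetical.

```python
import math
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Expected number of synonymous sites in one codon (between 0 and 3)."""
    aa = CODON_TABLE[codon]
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == aa:
                sites += 1 / 3  # one of three possible changes at this position
    return sites


def jukes_cantor(p):
    """Correct a proportion of differing sites for multiple substitutions."""
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq1, seq2):
    """Return (dN, dS) for two aligned, in-frame coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    syn_sites = syn_diffs = nonsyn_diffs = 0.0
    n_codons = len(seq1) // 3
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        # Possible synonymous changes: average over the two codons.
        syn_sites += (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        n_diffs = sum(a != b for a, b in zip(c1, c2))
        if n_diffs == 1:  # simplification: multi-difference codons are skipped
            if CODON_TABLE[c1] == CODON_TABLE[c2]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
    nonsyn_sites = 3 * n_codons - syn_sites
    dS = jukes_cantor(syn_diffs / syn_sites)
    dN = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    return dN, dS


# Toy example: one synonymous difference (TTT vs TTC, both phenylalanine).
dN, dS = dn_ds("TTTGGGATGCTT", "TTCGGGATGCTT")
print(dN, dS)
```

The function returns dN and dS separately rather than their ratio, because dS can be zero for short or closely related sequences. Likelihood-based codon models handle such cases, and multiple changes within a codon, more rigorously than this counting sketch.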

A gene that is highly conserved among many animals, i.e. a gene that has a higher dS than dN, has most likely experienced selection against nonsynonymous mutations. Housekeeping genes generally fall into this category of genes that are under “purifying selection”. In contrast, a gene that has a significantly higher dN than dS has probably experienced some selection to diversify. Positive natural selection, herein shortened to simply “positive selection”, refers to this latter scenario. The other explanation for a high dN/dS ratio is that there is no selection either way, and the gene is evolving “neutrally.” Neutral evolution is usually only detected in non-coding genes, but it must be ruled out before we can claim with confidence that a gene is under positive selection.

Genes that are evolving under positive selection can give us insight into the selection pressures placed on animals in the past. These rapidly-evolving genes are often evolving in direct opposition to pathogens, which can evolve in turn to evade host adaptations. This antagonism places pressure anew on the host, resulting in the “tit-for-tat” evolution that leads to the signatures of positive selection.

2.2 Positive selection in immune pathways

The genes that encode immune proteins are often under selective pressure imposed by pathogens, resulting in the signatures of positive selection [20]. Immune proteins are more likely to be interacting directly with pathogenic elements, and are also less likely to be housekeeping proteins, and therefore more able to tolerate nonsynonymous mutations. Examples of immune proteins under positive selection in primates include pattern recognition sensors and their downstream signalling molecules (e.g. MAVS, TRIM5, RNASEL), cytokines and cytokine receptors (e.g. IL3, CXCR2, CASP1), marker molecules (e.g. CD4, CD5, HLA-DPA1), complement proteins (e.g. C5, C8B, C9), and antimicrobial function proteins (e.g. TF, LTF) [114]. Primates have likely experienced pressure in the form of pathogenic antagonism that selects for nonsynonymous mutations in the genes encoding these proteins. As a result, the proteins rapidly change, allowing them to evade or bind even as viruses evolve the ability to attack or evade in turn.

2.3 The interferon response

One part of the human innate immune system, the interferon response, plays an important role in defending human cells against viruses [23]. Hosts have evolved numerous intracellular sensors that recognize viral nucleic acids and proteins [23]. When cellular sensors detect one of these virus-specific structures, a signaling cascade is activated which ultimately leads to the production and secretion of one or more of several possible interferon proteins [63, 61].

There are three classes of interferon: types I, II, and III. The three types display distinct expression patterns and have many roles in innate and adaptive immunity [80]. In humans, the primary anti-viral interferon proteins, and the ones generally produced after infection by viruses, fall under Type I. This type includes proteins encoded by 13 IFN-α genes, and single genes for IFN-β, IFN-ε, IFN-κ, and IFN-ω. All 17 type I IFNs bind to and signal through a shared heterodimeric receptor complex composed of a single chain each of IFNAR1 and IFNAR2, both in an autocrine and in a paracrine manner [61]. This signalling event produces transcriptional changes, inducing expression of hundreds of host genes (called “interferon-stimulated genes,” or ISGs) that collectively act to limit viral replication [84].

Almost any aspect of the viral life-cycle can be targeted by interferon-stimulated proteins [23, 84]. The products of interferon-stimulated genes act with a diversity of mechanisms to halt viral replication. Some, such as IRF7, MDA5, and SOCS1, are regulators of interferon production. The upregulation of these proteins can amplify or dampen the effects induced by interferon expression; if the former, more ISGs are expressed and the downstream effects are more intense. Some interferon-stimulated proteins, e.g. TRIM family proteins, bind directly to viral proteins to prevent their action. Some, such as RNASEL, inhibit translation in the cell, which inhibits virus replication as well as host translation. Still others have DNA or RNA editing ability (e.g. APOBEC proteins and ADAR1) that is detrimental to virus genome integrity. All these mechanisms and more inhibit virus replication and egress.

Table 2.1: Many genes that are in the interferon induction pathway and that are stimulated by interferon have been reported to be evolving under positive selection. Many of these have direct interactions with diverse viral proteins, including protein(s) encoded by the viruses in column two.

Category               Gene under positive selection   Known direct virus interactions   Citations
Induction              MB21D1/cGAS                     many classes of viruses           Mozzi et al 2015, Ma et al 2016
Induction              IFI16                           HCMV                              van der Lee 2017, Dell'Oste et al 2014
Induction              ISG15                           influenza                         Zhao et al 2013, Zhao et al 2010
Induction              MAVS                            HCV                               van der Lee 2017, Anggakusuma et al 2016
Induction              STING                           flaviviruses                      Mozzi et al 2015, Stabell et al 2018, Ding et al 2018
Induction              TRIM25                          influenza                         Malfavon-Borja et al 2013, Gack et al 2009
Interferon-stimulated  ADAR                            RNA viruses                       Forni et al 2014, Pfaller et al 2018
Interferon-stimulated  MxB                             many classes of viruses           Mitchell et al 2015, Haller et al 2011
Interferon-stimulated  EIF2AK2/PKR                     influenza                         Elde et al 2009, Dauber et al 2009
Interferon-stimulated  RNAse L                         TMEV                              van der Lee 2017, Sorgeloos et al 2013
Interferon-stimulated  Tetherin                        HIV                               Lim et al 2010, McNatt et al 2009
Interferon-stimulated  TRIM15                          retroviruses                      Malfavon-Borja et al 2013, Uchil et al 2008
Interferon-stimulated  TRIM22                          influenza                         Sawyer et al 2007, Di Pietro et al 2013
Interferon-stimulated  TRIM31                          retroviruses                      Malfavon-Borja et al 2013, Uchil et al 2008
Interferon-stimulated  TRIM38                          retroviruses                      Malfavon-Borja et al 2013, Uchil et al 2008
Interferon-stimulated  TRIM5                           HIV                               Johnson et al 2009, Sawyer et al 2005
Interferon-stimulated  RSAD2/Viperin                   RNA viruses                       Lim et al 2012, Panayiotou et al 2018

Because Type I interferon proteins, interferon pathway signalling molecules, and ISGs are so crucial to anti-viral activity in host cells, it is reasonable to expect that many of them are engaged in antagonistic relationships with viruses over an evolutionary time scale. In other words, many interferon proteins have likely been evolving in an arms-race-like scenario with long-term viral antagonists. Indeed, many interferon pathway proteins are already known to be evolving with signatures of positive selection, and many of these have known viral antagonists (Table 2.1).

However, a broad screen of positive selection in interferon genes has not been published, and our understanding of where positive selection may have been most intense in these pathways is still incomplete.

2.4 Screening for positive selection in interferon pathways

In collaboration with Elena Judd, an undergraduate researcher in the Sawyer Lab for several years, I conducted a bioinformatic screen to detect the signature of positive selection in the Type I interferon response. On initiating this screen, I expected that genes that were most upstream in the many intersecting interferon pathways would be more likely to be evolving under positive selection. I expected that viruses that interfered with interferon production and signalling at the top of the cascade would be more likely to effectively inhibit interferon-regulated anti-viral activity and would therefore be more successful antagonists, and that this would drive an arms race dynamic in proteins early on in interferon induction pathways and interferon-stimulated gene (ISG) production. In contrast, viruses that interfered with ISGs might still be susceptible to other ISGs that they were not directly antagonizing, making them less successful pathogens and less likely to apply the evolutionary pressure needed to drive an arms race scenario.

To test this hypothesis, I attempted to divide the interferon pathway into two broad categories: genes that induce interferon-stimulated genes (the “induction” category) and ISGs themselves (the “ISG” category). The induction category includes a broad array of pathogen-associated pattern sensors, signalling proteins, and interferons themselves. The ISG category includes the genes that are upregulated by Type I interferon signalling but do not themselves signal for increased downstream expression (Figure 2.1).

In other words, I separated genes involved in type I interferon responses into two temporal categories. The interferon “induction” category contains genes acting upstream of interferon-stimulated gene transcription, or genes that are “early” actors after virus infection and sensing. For instance, genes encoding proteins that identify pathogens, signal, and ultimately produce secreted interferon molecules act as an immediate consequence of infection. The signalling cascades that are activated when cells bind interferon are the next step: still early in infection, these lie between the initiation of the interferon response and the final cell state. Interferon-stimulated genes are then the hundreds of downstream genes that become expressed or over-expressed in the presence of interferon (temporally, the latest changes). These categories have some crossover in the sense that some of the genes that are stimulated by interferon are also signalling genes (in which case they were put in the induction category). Because there is so much cross-talk in the interferon system, it was difficult to differentiate further than this without analyzing individual families of genes separately. However, I hoped that these category designations would help us spot any patterns that emerged when we screened the genes in these categories for signatures of positive selection.

Next, we curated lists of genes from these two halves of the Type I interferon response. We reviewed recent literature and created a list of over 100 genes involved in the signaling cascade that results in the production of interferon α and β molecules, or that have a role in the signaling cascade that responds to these interferon molecules (herein called “induction genes”). We also gathered a similar list of 100 interferon-stimulated genes. Finally, a list of 100 random human genes was formed using a random gene set generator. The list of random genes was included as a baseline for comparison, as well as to provide a check that we were not using conditions that were noticeably more conservative or lenient. Previous groups had estimated the proportion of non-immune genes evolving under positive selection to be around 25%, so we expected that approximately 25 of the 100 genes in our random list would be under positive selection.

Figure 2.1: We considered the “induction genes” to be any gene encoding a protein upstream of ISGs, including sensors of initial infection (pattern recognition receptors, toll-like receptors (TLRs), and nucleotide sensors), signaling cascade proteins, interferon molecules, and interferon receptors. This included signaling molecules in the paracrine response to interferon produced by neighboring cells.

[Figure 2.1 schematic. Labels: Virus; Original Cell; Neighboring Cell; viral nucleotides; PAMP; TLRs; Nucleotide Sensors; Pattern Recognition Receptors; Signaling Cascade; IFN-β; Type I Interferon; Interferon Receptors; Paracrine Response; ISGs.]

In order for the evolution of a gene to be assessed, uniquely descended versions of the gene must be compared to one another. Because we were most interested in evolution in the human lineage, we analyzed these genes in humans and nonhuman primates. For each human gene, primate orthologs were collected from GenBank and used to make a multiple sequence alignment for each gene. After visually inspecting and curating all alignments (see methods for alignment and quality control pipeline), we ended up with high-quality multiple sequence alignments for 131 interferon-induction genes, 100 interferon-stimulated genes, and 100 random genes (for names of genes analyzed, see Appendix A, first column).

2.5 Characterization of multiple sequence alignments

Because our goal was to compare evolutionary signatures between each of the three categories of genes, we first wished to confirm that the three datasets were similar in other qualities. First, we assessed the species composition of each dataset. Because we attempted to gather a sequence from each gene for as many primates as we could, the same species were sampled repeatedly in our data collection. We show the species from which orthologs were obtained, with the branch thicknesses demonstrating the percentage of the 331 multiple sequence alignments in which each species is represented (Figure 2.2A). The thicker the branch, the more sequence alignments that species is represented in. All species are represented in over 50% of the alignments, and only white-cheeked gibbon and black snub-nosed monkey were represented in fewer than 75% of the alignments. This illustrates the phenomenal amount of high quality sequence data available from primates, which has only become more comprehensive since we completed our data collection period in 2018.

We then plotted the proportion of multiple sequence alignments against number of species represented in the alignment (Figure 2.2B). In all three categories, the majority of genes analyzed have at least 18 species represented in the alignments. This was an important measure because our lab has previously shown that the more species included in positive selection analysis, the more repeatable the conclusion of whether or not a gene is under positive selection [59]. No genes with fewer than 10 species represented in an alignment were included in our data analysis.
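The two summaries described above (how often each species appears across alignments, and whether an alignment meets the 10-species minimum) can be sketched as follows. This is a minimal illustration, not the code used for Figure 2.2; the function names and the toy input are hypothetical.

```python
from collections import Counter


def species_coverage(alignments):
    """Return {species: percent of alignments that contain that species}.

    `alignments` maps a gene name to the list of species in its alignment
    (a hypothetical input format chosen for this sketch).
    """
    counts = Counter()
    for species_in_alignment in alignments.values():
        counts.update(set(species_in_alignment))  # count each species once per gene
    total = len(alignments)
    return {species: 100.0 * n / total for species, n in counts.items()}


def passes_minimum_species(species_in_alignment, minimum=10):
    """True if the alignment has at least `minimum` distinct species."""
    return len(set(species_in_alignment)) >= minimum


# Toy data, invented for illustration only.
toy = {
    "GENE_A": ["human", "chimp", "gorilla"],
    "GENE_B": ["human", "chimp"],
    "GENE_C": ["human", "gorilla"],
    "GENE_D": ["human", "chimp", "gorilla"],
}
coverage = species_coverage(toy)
```

In this toy input, human is present in every alignment and chimp in three of four, so the corresponding branches would be drawn at 100% and 75%.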

Figure 2.2: Quality and equity metrics for the three families of multiple sequence alignments compared. A) A phylogenetic tree, wherein the percentage of alignments that included each species is indicated by the width of the branch leading to it. B) The number of species in the final multiple sequence alignments in each of the three categories of genes. C) Tree length is the sum of the branch lengths along the tree or, in other words, the average number of nucleotide substitutions per site in an alignment. The relative frequencies of lengths are plotted as a separate histogram for each category, and the average tree length of each category is indicated.

[Figure 2.2 graphic. Panel A: primate species tree with branch widths indicating representation in >90%, 75 to 90%, or 50 to 75% of alignments. Species shown (grouped in the figure as Hominids, Old World Monkeys, and New World Monkeys): Human, Chimp, Bonobo, Gorilla, Orangutan, White-cheeked Gibbon, Drill, Sooty Mangabey, Olive Baboon, Rhesus Macaque, Crab-eating Macaque, Pig-tailed Macaque, Green Monkey, Colobus, Golden Snub-nosed Monkey, Black Snub-nosed Monkey, Owl Monkey, Marmoset, Squirrel Monkey, White-headed Capuchin. Panel B: “Number of Species in Alignments for Each Category”; x-axis: Number of species (20 to 10); y-axis: Proportion of genes. Panel C: “Tree Length of Alignments for Each Category”; x-axis: Tree Length (0 to 2.0); y-axis: Relative Frequency. Series in B and C: Random, Induction, Interferon Stimulated.]

We also compared the tree lengths of the multiple sequence alignments in each of the three categories. Tree length is a measure of sequence diversity, and is the average number of nucleotide substitutions per site in a multiple sequence alignment [124]. The three datasets may not be equivalent by this metric, as a higher rate of nonsynonymous mutations would increase tree length. Indeed, the interferon-stimulated gene category was found to have alignments with greater tree lengths than either the induction or random gene sets (Figure 2.2C). This result hinted that interferon-stimulated genes may be evolving either more rapidly or more neutrally than either induction genes or random genes. We next tested these possibilities formally.
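To make the definition of tree length concrete, the sketch below sums the branch lengths of a tree written in Newick format. It only illustrates the definition; in this work the tree length was read from the PAML M0 output rather than computed this way, and the toy tree and its branch lengths are invented.

```python
import re


def tree_length(newick):
    """Sum every branch length (the number following each ':') in a Newick string."""
    return sum(float(x) for x in re.findall(r":\s*([0-9.eE+-]+)", newick))


# Hypothetical three-species tree; branch lengths are in substitutions per site.
toy_tree = "((human:0.01,chimp:0.012):0.008,gorilla:0.02);"
toy_length = tree_length(toy_tree)  # 0.01 + 0.012 + 0.008 + 0.02
```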

2.6 Interferon-stimulated genes experience more intense positive selection than interferon-induction genes or randomly-selected genes

After all alignments had been curated, we analyzed each using the Phylogenetic Analysis by Maximum Likelihood (PAML) program[124]. PAML takes a codon alignment and a user-defined tree as input. For the purpose of our analysis, we used a species tree based on accepted primate relationships, as shown in Figure 2.2A [75]. PAML then fits each alignment to different models of codon evolution and calculates the likelihood of each of these models given the data. By testing the alignment against various models (four in this study), we can determine which are more likely than others.

We first wished to determine if any of our 331 genes are under positive selection and, if so, how this varies between the three categories of genes. As discussed above, selection acts more strongly on nonsynonymous substitutions than on synonymous substitutions. Gene regions that have experienced repeated rounds of natural selection with a skew towards protein-altering mutations therefore exhibit a characteristic inflation of the rate of nonsynonymous (dN) DNA substitutions compared to synonymous (dS) substitutions (denoted by dN/dS > 1). To do so, we compared the likelihood of two models, M8 and M8a, using a Chi-squared test to determine whether the null model (M8a) could be rejected in favor of a nested model of positive selection (M8) for each gene. M8a is a null model that places all of the codons of a multiple sequence alignment into 10 bins distributed along a beta distribution of dN/dS values (Fig. 2.3A). This beta distribution is bounded by 0 < dN/dS < 1, and the parameters of the beta distribution, which describe the shape of the curve, are estimated during the optimization of the model. M8, a more complex model that allows for positive selection, also describes a beta distribution of dN/dS values less than one, but allows a bin with a dN/dS value greater than one (Fig. 2.3B) [124]. The likelihood of the two models being an accurate description of the codon distribution given the data is computed, and the null model is either rejected or not.

The null model (M8a) was rejected in favor of the model for positive selection (M8; p<0.05) for 25 of the random genes, 33 of the interferon induction genes, and 35 of the interferon-stimulated genes (for a list of which genes passed this test and their associated 2Δ(lnl) values and p-values, see Appendix A). To help correct for multiple testing and avoid false positives, we ran the Benjamini-Hochberg procedure at two different levels of conservatism: with a False Discovery Rate (FDR) of 20% (less conservative) and with an FDR of 10% (more conservative). After running the Benjamini-Hochberg procedure with an FDR of 20%, these numbers remained constant. However, with a more conservative FDR of 10%, a greater number of genes from the random and interferon induction categories dropped out (7 genes and 4 genes, respectively) compared to the number of genes dropping out from the interferon-stimulated category (2 genes) (Figure 2.3C). This finding illustrates that the p-values for genes under positive selection in the interferon-stimulated category were generally lower, making them less likely to fall out after correction.
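The Benjamini-Hochberg step-up procedure referred to above can be sketched in a few lines. This is a generic illustration with invented p-values; how the tests were grouped for correction in our analysis (for example, per category) is not specified by this sketch, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values, fdr):
    """Return a list of booleans, True where a test survives BH at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    # Find the largest rank k (1-based) such that p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Every test ranked at or below the cutoff is retained.
    keep = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            keep[idx] = True
    return keep


# Invented p-values for ten hypothetical genes.
toy_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.065, 0.074, 0.205, 0.212, 0.216]
kept_at_20 = sum(benjamini_hochberg(toy_p, 0.20))
kept_at_10 = sum(benjamini_hochberg(toy_p, 0.10))
```

With these toy values, more tests are retained at the lenient 20% FDR than at the stricter 10% FDR, which mirrors the pattern of genes dropping out as the correction becomes more conservative.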

After Benjamini-Hochberg correction with an FDR of 20%, the number of genes under positive selection in the interferon-stimulated category was not significantly different from that in either of the other categories. However, after Benjamini-Hochberg correction with an FDR of 10%, the number of genes under positive selection in the interferon-stimulated category was significantly larger than the number of genes under positive selection in the random and induction categories (two-tailed Fisher’s exact test, p<0.05; Figure 2.3D). I have presented both analyses of these data for full transparency.
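For readers who wish to reproduce this kind of comparison, a two-tailed Fisher’s exact test on a 2x2 table can be written with the standard library alone. The sketch below is a generic implementation, not the software used in this chapter; the example counts are the random and interferon-stimulated gene counts reported in Figure 2.3C.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_observed * (1 + 1e-7))


# Rows: random genes, interferon-stimulated genes.
# Columns: under positive selection, not under positive selection.
p_fdr10 = fisher_exact_two_tailed(18, 82, 33, 67)  # counts at 10% FDR
p_fdr20 = fisher_exact_two_tailed(25, 75, 35, 65)  # counts at 20% FDR
```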

Figure 2.3: A, B) Graphical illustrations of the M8a and M8 nested codon models in PAML [124]. M8a is a null model that places all the codons in an alignment into bins that fall along a beta distribution of dN/dS values. In M8a, no codons can be assigned to a bin with dN/dS > 1. M8 also allows codons to fall into a beta distribution of dN/dS values less than one. However, some codons can be placed into a bin with a dN/dS value greater than one. The double-sided arrow indicates that the dN/dS value of this bin is optimized in the fitting of the data to the model. C) 331 gene alignments were fit to either the M8 or the M8a model. A likelihood ratio test of nested models was conducted, and the final column indicates the number of genes in each category for which the null model M8a could be rejected in favor of the model of positive selection (p < 0.05). We applied a Benjamini-Hochberg correction to control for false positives at two different test stringencies, indicated in the two tables. D) Proportion of genes in each category that are under positive selection (red) is shown, from the tables in panel C. Using a two-tailed Fisher’s exact test and Benjamini-Hochberg correction at 20% FDR, the number of genes rejecting the neutral model (M8a) was not significantly different between any two categories. However, at 10% FDR the difference between random genes and interferon-stimulated genes was significant.

[Panels A and B: schematic plots of Proportion of Sites against dN/dS for the neutral evolution model (M8a) and the positive selection model (M8).]

Panel C, after Benjamini-Hochberg correction at 20% FDR:

Category               Analyzed   Reject M8a
Random                 100        25
Induction              131        33
Interferon-stimulated  100        35

Panel C, after Benjamini-Hochberg correction at 10% FDR:

Category               Analyzed   Reject M8a
Random                 100        18
Induction              131        29
Interferon-stimulated  100        33

Panel D, percentage of genes evolving under positive selection:

Category               20% FDR    10% FDR
Random                 25.0%      18.0%
Induction              25.2%      22.1%
Interferon-stimulated  35.0%      33.0%

We next wondered if the intensity of selection might be different between these categories. We can approximate this intensity based on the whole-gene dN/dS value, or the average rate of nonsynonymous mutations (dN) divided by the rate of synonymous mutations (dS). Unlike the M8 versus M8a test described above, this measure does not sort individual codons along an estimated beta distribution. Instead, it is calculated using the PAML model M0, which allows only a single estimated dN/dS value. In M0, the single estimated bin can take any value greater than zero (Figure 2.4A, top). The average whole-gene dN/dS values for interferon-stimulated genes were significantly different from those for both our random set and the interferon induction set (Kolmogorov–Smirnov test; Figure 2.4A, bottom). This result suggests that interferon-stimulated genes are evolving more rapidly than the other genes, or are experiencing relaxed negative selection (or relaxed constraint). In other words, there could be selection for nonsynonymous changes (positive selection) or a lack of selection on either synonymous or nonsynonymous changes, which would drive the dN/dS ratio closer to one but not indicate positive selection.
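The two-sample Kolmogorov–Smirnov comparison of whole-gene dN/dS distributions can be illustrated as below. This is a sketch under stated assumptions: the p-value uses the standard asymptotic series with a common small-sample adjustment, which is an approximation and not necessarily what the statistics software used here computes, and the dN/dS values shown are invented.

```python
from math import exp, sqrt


def ks_two_sample(x, y):
    """Return (D, approximate two-sided p-value) for two samples."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    # Walk both sorted samples, tracking the largest gap between the two ECDFs.
    while i < n and j < m:
        v = min(xs[i], ys[j])
        while i < n and xs[i] <= v:
            i += 1
        while j < m and ys[j] <= v:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Asymptotic Kolmogorov distribution with a small-sample adjustment (approximate).
    en = sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.3:
        return d, 1.0  # the series converges poorly here; the p-value is ~1
    p = 2 * sum((-1) ** (k - 1) * exp(-2 * k * k * lam * lam) for k in range(1, 101))
    return d, min(1.0, max(0.0, p))


# Invented whole-gene dN/dS values for two hypothetical categories.
toy_low = [0.10, 0.15, 0.20, 0.25, 0.30]
toy_high = [0.45, 0.50, 0.60, 0.70, 0.80]
d_stat, p_approx = ks_two_sample(toy_low, toy_high)
```

Because the test compares whole distributions, a significant result says that the two sets of dN/dS values differ in some way, which is why the follow-up analyses of individual codon bins are needed to interpret it.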

To differentiate between these two possibilities, we next looked more closely at the codons assigned to the dN/dS > 1 bin in the M8 model. We tested whether, when PAML fit codons to bins, a higher percentage of codons were placed in the M8 bin above 1 for interferon-stimulated genes than for genes in the random or induction categories. Indeed, the average percentage of codons that fell in the estimated M8 bin greater than one was significantly greater for interferon-stimulated genes (Figure 2.4B). This result suggests that interferon-stimulated genes are experiencing more positive natural selection, and not just relaxed evolutionary constraint.

Figure 2.4: A) Top: M0 is a codon model in PAML where all codons in an alignment are assigned to a single estimated dN/dS value. Below: box plot of the whole gene average dN/dS values determined by M0 in each category. * p-value<0.05. B) Top: The M8 model of codon evolution, as explained in Figure 2.3B. Below: box plot of percentage of codon sites per gene in the dN/dS > 1 bin in the M8 model. * p-value<0.05; *** p-value<0.001. C) Top: M2, a simple model that allows for positive selection, places all codons into one of three bins: a bin at dN/dS < 1 (conserved), a bin at dN/dS = 1 (neutral), and a bin at dN/dS > 1 (positive selection). The double-sided arrow indicates that the dN/dS value of this bin is optimized in the fitting of the data to the model. Below: box plot of the proportion of codon sites per gene in the dN/dS < 1 bin in model M2. * p-value<0.05.

[Figure 2.4 graphic. Panel titles: A. Average dN/dS (M0); B. Positive selection (M8); C. Conserved, neutral, and positive selection (M2). Top row: schematic plots of Proportion of Sites against dN/dS. Bottom row: box plots of Whole Gene dN/dS, % of sites in M8 bin > 1, and % of sites in M2 bin < 1, each for the Random, Induction, and Interferon-Stimulated categories.]

One prediction of more codons under positive selection is that fewer codons would be evolving under conservative evolution; in other words, fewer codons would be evolving with a low dN/dS ratio. M2, a simple model that allows for positive selection, places all codons into one of three bins: a bin at dN/dS less than one (conserved), a bin at dN/dS = 1 (neutral), and a bin at dN/dS > 1 (positive selection) (Figure 2.4C, top). We tested whether, when PAML fit codons to the bins in the M2 model, it placed fewer codons in the M2 bin less than 1 for interferon-stimulated genes than for genes in the induction or random category. Indeed, for interferon-stimulated genes the average percentage of codons placed in the M2 bin less than 1 is significantly less than for random genes (Figure 2.4C, bottom). Together, the results of Figure 2.4C show us that the interferon-stimulated genes are evolving in a significantly different way, with more nonsynonymous changes, than interferon induction or random genes.

2.7 Discussion

We found that our randomly chosen sample of interferon-stimulated genes is evolving more rapidly than canonical interferon induction genes and more rapidly than our sample of random human genes. This conclusion is counter to my original hypothesis (that interferon induction genes would be evolving more rapidly) but may be an important addition to our understanding of host-virus antagonism in the interferon response. Rapidly evolving genes are key in enforcing species barriers in viral spillover, and protect us from zoonoses. While all of the human immune system is important, only the parts that are functionally divergent from the immune systems of other animals are important in the defense against zoonotic viruses. In other words, any immune obstacle that a virus has already overcome in an animal will not be a barrier protecting humans unless that obstacle has taken on a different kind of anti-viral function in humans. Arms races also drive rapid sequence evolution at the interaction interfaces between host and virus proteins, as they each jockey to establish or destroy these interactions [94]. That means that these interactions play out differently in different species, and thus these evolutionary dynamics enforce species barriers to the transmission of pathogens.

We had hypothesized that viruses are more likely to evolve mechanisms to halt the production of antiviral cellular states by antagonizing the initial expression of interferons or their stimulated genes rather than the individual proteins which produce antiviral states. Intuitively, it makes sense to “turn off the tap” instead of trying to mop up the after-effects of an induced interferon response. Therefore, we expected that the induction pathway would be under greater pressure to evolve rapidly and that we would see a higher signal of positive selection in the induction pathway. Instead, we found that interferon-stimulated genes are evolving more rapidly than both a randomly drawn set of human genes and proteins involved in ISG induction. It is possible that induction genes are under more evolutionary constraint in order to preserve specific functions in their respective signaling pathways. Previous work studying the evolution of interferons suggests that interferon genes themselves are evolving under different evolutionary constraints[58]. Interferon molecules and receptors, though much expanded in extant species, are an ancient class of proteins that has had to evolve under the constraints of binding partner compatibility [35, 123]. Interferon-stimulated genes may have more flexibility to obtain and tolerate mutations. Because interferon-stimulated genes are sometimes more specific, or can rely on the redundancies of the induced response, nonsynonymous mutations may be tolerated to a greater extent. For example, mammalian cells have several ways to shut down host and virus translation during viral infection: the IFIT family of proteins, ISG15, and ZAP are all examples of proteins that are induced by interferon and prevent viruses and hosts from translating RNA [51]. Redundancy in this specific antiviral defense might mean that mutations can be more easily tolerated in each individual protein.

Interferon-stimulated genes remain relatively understudied in terms of their mechanistic anti-viral action [85]. However, they may be at the forefront of the host-virus “arms race” that has implications for the pathogenicity of viruses, the ability of viruses to spill over to new hosts, and the evolution of our immune systems. We have identified many interferon induction and interferon-stimulated genes that are under positive selection and that do not have known host-virus interactions, suggesting that there are still antagonistic relationships to untangle in the never-ending battle between viral proteins and primate immune systems (Table 2.2).

2.8 Methods

Definition of Gene Categories

The list of 131 induction genes was curated from reviews of interferon signaling pathways[33, 34, 44, 91, 5]. We ignored genes listed in these reviews that were solely a part of the DNA damage pathway. We do not assume that this “induction” category is a complete list of genes upstream of ISG production, but rather have treated it as a representative list of genes known to be implicated in several induction pathways. Any gene mentioned in the reviews that could not be unambiguously identified (i.e. gene name was listed as an alias for multiple genes) was removed from this list. The list of 100 interferon-stimulated genes was curated from published literature[42, 84, 87, 86]. These genes were verified by the Interferome database, with the criterion that each interferon-stimulated gene was upregulated at least twofold by type I interferons[81]. A list of random human genes was formed using a random gene set generator, and immune genes were excluded and replaced. We did not place the same gene in more than one category. If a gene is implicated in a canonical induction pathway, but is also upregulated by interferons, it was placed in the induction category.

Table 2.2: In the analysis described throughout this chapter, we identified many interferon-related genes that are evolving under positive selection in primates. I list them here alphabetically and separated by category, and have indicated genes that have been previously identified as rapidly evolving in primates (*) and genes that have known interactions with pathogenic elements (†)[60, 45, 95, 96, 121, 52, 122, 120, 40, 67, 74, 116, 57, 26, 93, 29, 47, 68].

Induction genes evolving under positive selection:
CASP10, CIITA*, CISH, DDX58*†, DDX60*†, EPOR, IFI16*, IFNAR1*, IFNAR2, JAK3, MAVS*†, MB21D1*†, MNDA, OAS1*†, OAS2*†, PTPRC†, PYHIN1, RNASEL*†, SPP1, STAT2*†, TLR1*†, TLR2*†, TLR4*†, TLR5*†, TLR6*†, TLR7*†, TLR8*†, TLR9*†, TMEM173*†, TRIM21*†, TRIM25*†, TYK2, ZBP1*

Interferon-stimulated genes evolving under positive selection:
ADAR*†, APOBEC3F†, APOBEC3G*†, APOL2*, APOL6*, BST2*†, CCL8, CD47, CEACAM1†, CRP, DAPK1, EIF2AK2*†, GBP2*†, IFI27†, IFI44, IFI44L, IFI6, IFIT1*, IFIT2*, IFIT5*, MLKL, MX1*†, MX2*†, PHF11, RSAD2*†, RTP4, SAMD9, SAMHD1*†, SLFN12*, TAGAP, TMEM140, TNK2, TRIM22*†, TRIM5*†, ZC3HAV1*†

* Previously identified as being under positive selection in primates
† Published interaction with pathogen

Creation of Multiple Sequence Alignments for Each Gene

The longest human isoform of each gene, along with any simian primate orthologs available, was collected from the NCBI Gene database. We collected and retained as many primate sequences as possible, including sequences that were labeled as unassigned gene loci, as long as that sequence returned the correct human ortholog in a reciprocal BLAST search back to the human genome. In some cases, primate ortholog sequences contained “n”s, suggesting that these bases did not meet certain quality thresholds. These sequences were retained, but note that PAML treats “n”s as gaps and will therefore not analyze codons in multiple sequence alignments that contain them. Further, any sequence that was marked holistically as “Low Quality” on NCBI was not included. The cDNA sequences were then translated to amino acids and aligned with the MUSCLE algorithm using the Unipro UGENE software or MEGA[72, 108]. Pal2Nal was then used to reference this amino acid alignment, resulting in a final alignment of cDNA by codon[106]. The result was over 300 multiple sequence alignments containing human and primate orthologs of interferon-related or random genes.
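The idea of converting an amino acid alignment back into a codon alignment can be sketched as below: each residue in the aligned protein is replaced by its codon, and each gap by three gap characters. This is a simplified illustration of the concept only, not Pal2Nal itself; the function name and example sequences are hypothetical.

```python
def thread_codons(aligned_protein, cdna):
    """Map an ungapped coding sequence onto one gapped row of a protein alignment."""
    codons = [cdna[i:i + 3] for i in range(0, len(cdna), 3)]
    residues = aligned_protein.replace("-", "")
    if len(codons) < len(residues):
        raise ValueError("cDNA is too short for the protein sequence")
    out = []
    k = 0  # index of the next unused codon
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")  # a protein gap becomes a three-base gap
        else:
            out.append(codons[k])
            k += 1
    return "".join(out)


# Toy example: Met, gap, Lys in the protein alignment.
toy_codon_row = thread_codons("M-K", "ATGAAA")
```

Aligning at the amino acid level and then threading codons keeps gaps in multiples of three, so the reading frame is preserved for codon-based analysis.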

Each multiple sequence alignment was then manually inspected and edited. Our pruning and quality control pipeline consisted of these steps:

1) We removed from the alignments any ortholog containing a gap (missing sequence) that spanned >10% of the length of the cognate human gene. This was done because PAML will not analyze codon sites containing a gap. We did not remove an ortholog if it had multiple gaps relative to the human sequence, as long as each gapped region was <10% of the length of the human gene.

2) We removed from the alignments any ortholog that aligned poorly to other sequences in the alignments for a contiguous stretch spanning >10% of the length of the human gene. We did this because simian primate sequences tend to align with very high identity, since divergence is low in this clade, and such poorly aligned regions usually indicate mis-annotation or errors in gene prediction[59].

3) We trimmed sequence from either terminal end of the alignments (starting at the start codon and ending at the stop codon) if fewer than ten orthologs in the alignment had the same start or termination site as the human sequence. In this case, we stopped trimming alignments at the first conserved site (or site where amino acid variation tracked with phylogeny).

4) We manually inspected all remaining gaps in the multiple sequence alignments. We deleted codon columns where more than one amino acid misaligned at the edge of a gap.

5) We deleted all regions in the alignments where an ortholog contained more than four amino acids in a row that did not align to any other orthologs in the alignment.

6) After all of these curation steps, multiple sequence alignments containing fewer than 10 orthologs (including human) were not analyzed further. This is because we have previously shown that the accuracy of evolutionary tests improves as the number of primate species and overall tree length of an alignment increases[59].
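Two of the steps above lend themselves to a short sketch: the removal of orthologs with a long contiguous gap (step 1) and the minimum of 10 orthologs per alignment (step 6). Our curation was done by manual inspection, so the code below is only an illustration of the stated thresholds; the function names and the dictionary-based alignment format are assumptions.

```python
import re


def longest_gap_fraction(ortholog_row, human_row):
    """Longest contiguous gap in an ortholog, as a fraction of the human gene length."""
    human_length = len(human_row.replace("-", ""))
    longest = max((len(run) for run in re.findall(r"-+", ortholog_row)), default=0)
    return longest / human_length


def prune_alignment(alignment, human_key="human", max_gap=0.10, min_orthologs=10):
    """Drop orthologs with a gap spanning more than max_gap of the human gene.

    Returns the pruned alignment, or None if fewer than min_orthologs
    sequences (including human) remain.
    """
    human_row = alignment[human_key]
    kept = {name: row for name, row in alignment.items()
            if name == human_key or longest_gap_fraction(row, human_row) <= max_gap}
    return kept if len(kept) >= min_orthologs else None


# Toy alignment: human plus nine orthologs, each with a short (10%) gap.
toy_alignment = {"human": "A" * 20}
for i in range(9):
    toy_alignment[f"species_{i}"] = "AA--" + "A" * 16
pruned = prune_alignment(toy_alignment)
```

Note that this simplification counts any run of gap characters in the ortholog, whereas manual curation also took the surrounding alignment context into account.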

Evolutionary analysis

Positive selection was detected using the Phylogenetic Analysis by Maximum Likelihood (PAML) program. The codeml program packaged in PAML accounts for the differences in rates of transition/transversion substitutions, unequal codon frequencies, and the probabilities of mutation across the codon[124]. PAML requires that the codon alignment be accompanied by a phylogenetic tree to accurately identify rates of substitutions. A master phylogenetic tree with the twenty possible primate species was made using Perelman et al. 2011 as a reference and modified as necessary for each gene[75]. The tree length of each multiple sequence alignment was determined from the output file of model M0.

PAML fits the multiple sequence alignments to different models of codon substitution[124]. For the analysis outlined in this chapter, we used the M0, M2, M8a, and M8 models. We used likelihood ratio tests to determine which model, M8 or M8a, best fit the data for the evolution of each gene. PAML provides a log likelihood (lnl) value for each alignment under both the null and positive selection models. The difference between these values is then doubled, referred to here as “2Δlnl”, and used to perform Chi-squared tests with a single degree of freedom. We used a threshold of p<0.05 to reject the null hypothesis that there is no difference in how well models M8 and M8a fit the data. Genes for which the null hypothesis was rejected were determined to be evolving under positive selection.
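The likelihood ratio test described here reduces to a short calculation. The sketch below doubles the difference in log likelihoods and converts it to a p-value using the chi-squared distribution with one degree of freedom, for which the survival function can be written with the complementary error function. The log likelihood values are invented for illustration, and the function name is hypothetical.

```python
from math import erfc, sqrt


def likelihood_ratio_test(lnl_null, lnl_alt):
    """Return (2 * delta lnL, p-value) for nested models, chi-squared with 1 df."""
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # For one degree of freedom, P(chi2 >= x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log likelihoods for M8a (null) and M8 (positive selection).
toy_stat, toy_p = likelihood_ratio_test(lnl_null=-1001.92, lnl_alt=-1000.0)
```

A statistic of about 3.84 corresponds to p of about 0.05, so under this test M8a is rejected only when M8 improves the log likelihood by roughly 1.92 units or more.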

In instances where M8a was rejected in favor of M8, specific codons are identified which have elevated rates of nonsynonymous fixed mutations. This is determined by the Bayes empirical Bayes (BEB) method, which accounts for sampling errors in the parameters of the model[124]. The codons identified by BEB, and the posterior probability with which they are predicted to fall in the dN/dS > 1 bin, were recorded.

Chapter 3

STING: an immune protein that is a target of flavivirus proteases

Stimulator of interferon genes (STING) is a primate immune protein that is part of the interferon response. Like many immune proteins, it is evolving under positive selection. Also like many immune proteins, STING is known to be targeted by viruses. Previous work in the Sawyer Lab showed that the dengue virus protease targets human STING, and other groups showed that this cleavage is conserved for at least Zika virus and West Nile virus. I showed that STING is also cleaved by the proteases from the tick-borne flaviviruses Powassan virus and Langat virus, extending the cleavage of STING beyond the group of mosquito-borne flaviviruses. I also went on to show that the HAQ allele of STING (a circulating human allele of STING) is cleaved by the dengue virus protease, and possibly to a greater extent than the reference allele. However, the human 78Q mutation prevents cleavage by the dengue virus protease. Finally, I performed some experiments testing the inhibition of the STING-stimulated interferon response by flavivirus proteases, and uncovered some interesting evidence decoupling the inhibition of interferon from the cleavage of STING.

3.1 Flaviviruses, and dengue viruses in particular, are world-wide pathogens with major consequences for human health

Flaviviruses are a group of viruses that includes important human pathogens such as dengue virus, West Nile virus and Zika virus. They are nested under the Flaviviridae family, which is a family of small enveloped viruses with positive-sense RNA genomes of approximately 13 kb.

The Flaviviridae family includes three genera: flaviviruses, pestiviruses, and hepaciviruses. Most members of the genus Flavivirus are arthropod-borne, transmitted either by mosquitos (e.g., yellow fever virus, dengue virus, West Nile virus) or by ticks (e.g., Powassan virus, Langat virus)[6]. Many of these viruses are important human or veterinary pathogens, and dengue especially is a cause of extreme suffering in countries all over the world. However, dengue virus poses a relatively low biosafety risk (there is low risk of contracting dengue virus from lab work), which makes studying dengue relatively easy while also providing the foundation for work that may translate to studying other flaviviruses, and possibly also pestiviruses and hepaciviruses.

Dengue viruses are responsible for 390 million infections annually, approximately 96 million of which result in clinical disease[7]. There are four described serotypes that infect humans (DENV-1-4), all of which emerged from a reservoir that is believed to reside in primates[115]. The immune protection developed against any one serotype of dengue virus typically does not confer full protection from infections resulting from any of the other three serotypes, and in fact pre-existing antibodies can worsen symptoms of dengue infection[10]. Although recent improvements to dengue vaccines have been extremely promising, there is currently no vaccine that demonstrates significant protection against all four serotypes of dengue or that prevents dengue hemorrhagic fever[8].

Dengue virus is already an urgent problem in tropical regions of the world, and the number of people at risk is expected to increase as global warming affects the range of the Aedes aegypti mosquito vector [7, 64].

Dengue virus replicates on host lipid membranes, contorting the membrane such that the viral genome and viral proteins remain protected from the host cytoplasm. This contortion of the membrane prevents the host from recognizing pathogen-associated molecular patterns (PAMPs) and up-regulating the immune response as a result. Dengue virus also actively prevents the immune system from being up-regulated by binding or inhibiting host proteins. One example of this antagonism of the host immune response is the ability of dengue viruses to bind and inhibit the host immune protein Stimulator of Interferon Genes (STING).

3.2 Human STING and the interferon response

STING (also called MITA, MPYS, ERIS, or TMEM173) was identified by several different groups around the same time. STING is a multi-pass transmembrane protein found in the endoplasmic reticulum (ER) that acts as a critical component in the innate immune sensing pathway for intracellular pathogens[36, 37, 132, 38, 105, 9]. Although originally described as part of the response to cytosolic DNA sensing, STING is also activated upon RNA virus infection[131, 32].

STING is a part of the cGAS-STING axis, in which cyclic GMP-AMP synthase (cGAS) is the initial binder of foreign DNA. Upon binding DNA, cGAS catalyzes the production of cyclic 2'3'-GMP-AMP (2'3'-cGAMP), which is a potent binder and activator of STING[104]. Activated STING recruits and is phosphorylated by TANK-binding kinase-1 (TBK1), leading to recruitment of interferon regulatory factor-3 (IRF3). TBK1 then also phosphorylates IRF3, causing IRF3 to dimerize and translocate to the nucleus. Nuclear IRF3 upregulates the transcription of genes encoding cytokines, chemokines, and interferon[90, 89]. Thus cGAS binding DNA in the cytoplasm (where double-stranded DNA would not be found under normal cellular conditions) prompts the cell to upregulate the innate immune response. The link between RNA virus infection and cGAS-STING signalling is less clear, but may be a result of disrupted mitochondrial membranes and the consequent release of mitochondrial DNA into the cytoplasm, where it can be sensed by cGAS[103].

STING is known to be cleaved by a protease encoded by the dengue virus RNA genome. Dengue virus uses host ribosomes to translate the positive-strand RNA viral genomes, the products of which (polyproteins) are embedded directly into the ER membrane. Two of the proteins in these polyproteins, NS2B and NS3, form a protease. Autocatalytic cleavage releases this protease from the polyprotein, and the protease then acts along the cytoplasmic face of the polyprotein while host proteases act along the lumenal side. Ultimately all the individual proteins are released from the polyprotein to act independently (replication of all viruses in the flavivirus family, including dengue virus, is summarized in [6]). Research into dengue virus-host interactions has previously revealed that the dengue virus protease also targets some host proteins. Before the identification of DGAT2 (described in Chapter 4), the dengue virus protease was known to cleave STING, the mitofusins MFN1 and MFN2, and FAM134B[1, 128, 49]. In early studies that showed STING was cleaved by the dengue virus protease, the cleavage motif was placed at amino acid site 96, due to the similarity of the motif at that site to known dengue virus protease cleavage motifs.

In fact, STING was identified as a target of the dengue protease because two groups discovered that dengue virus inhibits type I IFN production in human primary dendritic cells and that this inhibition requires a proteolytically active NS2B3 protease complex. These groups then scanned for protease target motifs in the STING signalling axis, which resulted in the initial conclusion that the protease was targeting position 96[1, 127]. However, the Sawyer Lab showed that the motif being targeted was in fact at position 78, and went on to show that STING in primates (most of which do not encode an identical eightmer motif around position 78) is generally not cleaved by the dengue virus protease[99]. For that project, I showed that in STING KO A549 cells that had been complemented with wild-type or R78W HA-tagged STING, dengue infection leads to cleaved wild-type STING but not cleaved mutant STING (Figure 3.1;[99]).

Subsequently another group showed that STING is cleaved at position 78 by the proteases from dengue virus, Zika virus, and West Nile virus but not by the protease from yellow fever virus[18]. This finding illustrated that the cleavage of STING is partly, but not completely, conserved among flaviviruses and strengthened the idea that flaviviruses and human STING are in an antagonistic evolutionary relationship, each evolving in direct opposition to the action of the other. The role of STING is to up-regulate an antiviral response and the role of flaviviruses (in this case, the protease of flaviviruses) is to prevent that interferon up-regulation.

3.3 Human STING, but not human 78Q STING, is cleaved by multiple flaviviruses

There are two major alleles of STING in humans. The second most common allele, known as the HAQ allele, differs from the reference allele at three amino acid positions (R71H-G230A-R293Q)[39]. This STING allele may have an interesting evolutionary history, as revealed by the individual geographic distributions of the three single-nucleotide polymorphisms (SNPs). The R293Q and G230A-R293Q isoforms occur predominantly within the African subpopulation, while HAQ is much more frequent in the Asian and Hispanic American subpopulations than in the African and European subpopulations (Figure 3.2). Yi et al. showed that there are common SNPs in the non-coding portions of the gene that suggest haplotype origins predating the radiation of ethnicities[126]. They argued that the biased distributions of the haplotypes reflect selection for particular isoforms in populations after they had been geographically separated. The HAQ allele induces Type I interferon less efficiently compared to the reference allele, due primarily to the R71H and R293Q SNPs. This difference in immune signalling may have driven selection in different populations.

Figure 3.1: Wild-type A549 cells, STING KO cells, and STING KO cells complemented with empty vector, wild-type STING, or R78W STING were all mock infected, or infected with DENV2 (MOI 5) for 24 hours. Cell lysates were collected and a western blot was probed with an anti-HA antibody (to detect stably transduced STING constructs, as well as the cleavage product, if present) and with an antibody against dengue virus NS3 protein. Only when the cleavage motif of STING is intact does the protease recognize and cleave STING.

[Figure 3.1 image: western blot. Lanes: wild-type A549, STING KO, STING KO + empty vector, STING KO + wild-type STING, STING KO + mutant STING, each at DENV2 MOI 0 and 5. Blots: anti-HA (STING-HA and cleaved product), anti-DENV NS3, anti-Actin.]

There is also a SNP that exists at low frequency in human populations at the motif recognized by flavivirus proteases. This mutation changes amino acid 78 from an arginine to a glutamine. This R78Q change immediately became interesting to us, as we hypothesized that cells expressing this allele of STING would not be affected by an active dengue virus protease. This hypothesis was based on the fact that mouse STING encodes a glutamine at this position (at the motif recognized by the dengue virus protease) and is not cleaved. I first tested whether R78Q STING was cleaved by dengue virus. To do this, I used site-directed mutagenesis to change the codon encoding 78R to one encoding 78Q. I found that, as expected, R78Q STING was not cleaved (Figure 3.3A). In the presence of an active, but not an inactive, dengue virus protease, the "wild-type" (i.e. the reference allele) version of STING is cleaved, and a cleavage product appears below the full-length band. In the R78Q mutant version of STING, this cleavage product is not visible (Figure 3.3).

I also tested whether proteases from the four circulating urban serotypes of dengue virus, as well as the protease from a sylvatic strain of dengue virus, are capable of cleaving R78Q STING. This assay reveals whether the R78Q amino acid change "breaks" the ability of all dengue viruses to cleave STING. I found the same pattern for each protease: the major STING allele (WT STING) is cleaved, while the R78Q STING allele is not cleaved (Figure 3.3B). Thus the ability to cleave STING, but not R78Q STING, is a conserved trait among dengue viruses.

Figure 3.2: The individual single-nucleotide polymorphisms that distinguish the HAQ allele (R71H, G230A, R293Q) have distinct evolutionary histories, as revealed by their geographic distributions. Here, the proportion of sequenced individuals with the major allele (bottom of each bar) or minor allele (top of each bar) is graphed, divided by geographical population (AFR=Africa, AMR=Americas, EAS=East Asia, EUR=Europe, SAS=South Asia). Data were collected from the Ensembl project. The most distinct difference between SNPs is that the SNP causing the R71H amino acid change is found much less often in Africa than the other SNPs that make up the HAQ haplotype, suggesting that different selection pressure is placed on this position depending on geographic location.

[Figure 3.2 image: three bar charts under the title "HAQ STING associated SNPs", one each for Position 71 (alleles C, T), Position 230 (alleles C, G), and Position 293 (alleles C, T). x-axis: population (AFR, AMR, EAS, EUR, SAS); y-axis: 0 to 100.]

To test the differences in cleavage between the three human alleles described above (the reference allele, the HAQ allele, and the R78Q allele), I first requested two matched B-cell lines (i.e. same gender, same country of origin, same ethnicity) from the Coriell collection of banked cell lines. These lines have been sequenced, so I knew that one of the four possible STING alleles included the R78Q mutation. In fact, of the four alleles I sequenced from these two cell lines, two were the reference allele (herein referred to as Allele 1), one was the HAQ allele (herein referred to as Allele 2), and one was the reference allele with the 78Q mutation (herein referred to as Allele 3) (Figure 3.4A). I cloned each of these alleles into plasmids such that they were HA-tagged, which allowed me to test cleavage by flavivirus proteases in co-transfection assays.

First, I co-transfected each STING-HA construct with wild-type dengue virus protease. I found that both Allele 1 and 2 (R78 alleles) were cleaved by the active dengue virus protease but Allele 3 (R78Q) was not cleaved (Figure 3.4B). This assay confirmed that the dengue virus protease cleaves both STING alleles with a cleavage motif that contains an arginine at position 78, but not an allele that has a glutamine at position 78.

I went on to test the proteases from several flaviviruses against these three alleles. I found that many flavivirus proteases cleaved with the same pattern: they cleaved Allele 1 and Allele 2 but not the R78Q allele (Figure 3.5). The proteases from dengue virus, Zika virus, West Nile virus, Powassan virus, and Langat virus all cleaved R78 STING alleles. The proteases from yellow fever virus and Rio Bravo virus (not shown) did not cleave any version of STING. Notably, the proteases that did cleave STING included the proteases from two tick-borne viruses, Powassan and Langat viruses. This novel finding expanded the list of proteases that are known to target STING, and may illustrate that the cleavage of STING is a somewhat conserved mechanism of immune dampening between two clades of the flavivirus family.

Figure 3.3: The 78Q SNP prevents cleavage of human STING. A) Plasmids encoding the NS2B3-3XFLAG protease complex (proteolytically active or inactive) and STING-HA (wild-type or R78Q) were co-transfected into 293T cells, and 48 hours later lysates were collected and analyzed by immunoblotting. A cleavage product only appears in the conditions with wild-type STING and active dengue virus protease. B) Plasmids encoding the NS2B3-3XFLAG protease complex from DENV1-4, or a plasmid encoding an untagged NS2B3 protease complex from a sylvatic strain, and STING-HA (wild-type or R78Q) were co-transfected into 293T cells, and 48 hours later lysates were collected and analyzed by immunoblotting. Cleavage is detected only when wild-type STING-HA is transfected.

[Figure 3.3 image: immunoblots. A) Lanes: WT STING or SNP STING (R78Q), each with inactive (I) or active (A) protease; bands labeled inactive protease, active protease, STING, cleavage product, and Actin. B) Lanes: proteases from DENV1, DENV2, DENV3, DENV4, and a sylvatic strain, each with WT or 78Q STING; blots: Protease (anti-FLAG), STING (anti-HA) with cleavage product, and Actin (anti-Actin).]

Figure 3.4: DENV2 cleaves STING and HAQ STING, but not 78Q STING. A) Three human STING alleles were cloned from human B-cells: the STING reference allele (A1), HAQ STING (A2), and R78Q STING (A3). R78W STING is included here to illustrate the change made to the reference allele for this control construct. B) Plasmids encoding the NS2B3-3XFLAG protease complex and STING-HA (Allele 1-3) were co-transfected into 293T cells, and 48 hours later lysates were collected and analyzed by immunoblotting. A cleavage product only appears in the conditions with R78 STING, and not when R78Q STING is co-transfected with an active protease.

[Figure 3.4 image: A) Schematic of STING constructs: Reference (A1); HAQ (A2) with R71H, G230A, and R293Q; R78Q (A3); and R78W. Key: motif recognized by flavivirus proteases; amino acid change from reference allele. B) Immunoblot of Alleles A1, A2, and A3 co-transfected with DENV2 protease; blots: STING-HA with cleavage product, and active protease (V5).]

Figure 3.5: Most flaviviruses cleave STING and HAQ STING, but not 78Q STING. For all conditions below, plasmids encoding the NS2B3-V5 protease complex from six different flaviviruses as well as plasmids encoding STING-HA (Allele 1-3) were co-transfected into 293T cells, and 48 hours later lysates were collected and analyzed by immunoblotting. A cleavage product only appears in the conditions with R78 STING (Allele 1 and 2), and not when R78Q STING is co-transfected with an active protease. However, the protease from yellow fever virus (YFV) does not cleave any version of STING.

[Figure 3.5 image: immunoblots of STING alleles (lanes R1, R2, Q) co-transfected with proteases from DENV2, ZIKV, WNV (top) and YFV, POW, LCTV (bottom); blots: STING-HA with cleavage product, and active protease (V5).]

3.4 Rodent STING is under positive selection

Powassan virus and Langat virus are two tick-borne flaviviruses, distinct from the other viruses in Figure 3.5, which are mosquito-borne flaviviruses. They also primarily infect rodents, which shifted my focus from primate STING to rodent STING. Primate STING is under positive selection, but it has not been shown that rodent STING is under positive selection. To determine whether or not rodent STING is under positive selection, and specifically whether the codons encoding the motif recognized by the flavivirus proteases are under positive selection, I collected STING sequences from the following rodent genomes: mouse, rat, chinchilla, beaver, shrew mouse, Ryukyu mouse, kangaroo rat, naked mole-rat, Chinese hamster, golden hamster, Mongolian gerbil, Damara mole-rat, prairie deer mouse, prairie vole, guinea pig, white-footed mouse, Grammomys, marmot, 13-striped ground squirrel, and yellow-bellied marmot.

I then conducted positive selection analysis on Sting1 for this phylogeny of rodents (for an explanation of positive selection and a description of how this analysis is conducted, see Chapter 2).

I found that when considering 12 species of this clade, Sting1 is under positive selection; however, neither position 76 (the amino acid that aligns to R78 in human STING) nor any other position in the motif recognized by the flavivirus proteases is under positive selection. I also found that the tree length was very long in this analysis (about 5), which indicated that the phylogeny under consideration may represent too much evolutionary distance. Therefore I also split the major rodent phylogeny into separate clades and re-analyzed them to identify codons under positive selection in those clades. I found that codon 76 was under positive selection only in Hystricomorpha (tree length ≈ 1). Interestingly, neither Powassan virus nor Langat virus is known to replicate in any rodents in the clade Hystricomorpha. It is possible that one of these viruses does replicate in these rodents but has not been detected, or that there is another rodent flavivirus that has yet to be discovered. It is also possible that this codon is under positive selection pressure for a reason unrelated to flaviviruses. Therefore a future project may be, first, to broaden the positive selection analysis to a greater number of Hystricomorpha species to confirm that position 76 is evolving under positive selection in the clade. Including more species in a positive selection analysis (assuming the tree length stays low) usually makes the results of the analysis more reproducible and probably more biologically meaningful [59]. Second, it would be interesting to study whether the cleavage motif sequence in rodent Sting1 affects the replication of flaviviruses such as Langat or Powassan virus.

3.5 The interferon response is stimulated by STING transfection

The human R78Q allele is not cleaved by flaviviruses, suggesting that the STING-dependent interferon response would be less dampened by the dengue protease in cells expressing this allele. Therefore I wanted to be able to show, as we had already shown for the R78W version of STING, that in the presence of the dengue virus protease, R78Q STING expression leads to more interferon expression than expression of wild-type STING does (R78W allele referenced in Figure 3.4A).

I showed that STING transfection alone upregulates the STING-dependent interferon response in 293T cells. This upregulation can be explained in two ways. First, 293T cells do not express STING, so the overexpression of STING drives the kinetics of oligomerization towards increased aggregation and therefore increased IRF3 activation. Second, because STING is part of the cellular response to double-stranded DNA, the presence of the transfected plasmid could itself be stimulating the action of its translated product.

In the Sawyer Lab we have a 293T cell line that is stably transduced with a luciferase gene under the control of an interferon-stimulated response element (ISRE). The ISRE promoter is bound by ISGF3, a transcription factor activated by the binding of interferons to cell surface receptors. This assay allows us to measure the induction of interferon indirectly, by measuring the luminescence that these cells emit when interferon is produced by the population and activates downstream pathways in neighboring cells. With no STING transfection, the levels of ISRE-promoter activity in the population are very low. After transfection with a plasmid producing STING, the interferon response is increased, as demonstrated by high luciferase levels.

To make sure that each STING allele induced the interferon response, I transfected STING into these ISRE-luc 293T cells, titrating the levels of STING plasmid and making up the difference in DNA concentration with empty plasmid such that the same amount of DNA was always transfected. I found that the levels of interferon induced by the STING alleles differed significantly; the HAQ allele upregulated interferon to a much lower level than Allele 1 or 3. This result has been shown previously, and both previous groups and I have found that the reduced up-regulation of interferon is primarily dependent on the R71H and R293Q mutations. Thereafter, I chose to transfect amounts of all three alleles that made interferon induction comparable. Thus the reduction in interferon induction that was dependent on the protease could be compared proportionately.

Figure 3.6: The cleavage motif in rodent Sting1 is under positive selection, but only in the Hystricomorpha clade. A) A phylogenetic tree of rodents with species that are known to be infected by Powassan or Langat virus in the wild. B) PAML was run on the Sting1 gene from all species in the tree from A (top) or 5 species from the Hystricomorpha clade (two mole-rat species, guinea pig, chinchilla, and degu; bottom). When considering Sting1 in rodents, the cleavage motif is not under positive selection. However, the cleavage motif is under positive selection in the Hystricomorpha clade.

[Figure 3.6 image: A) Rodent phylogeny with rabbit as outgroup; tips in order: marmot, squirrel, beaver, jerboa, prairie vole, hamster, gerbil, rat, mouse, mole-rat, guinea pig, chinchilla; clade labels: Sciuromorpha, Castorimorpha, Myomorpha, Hystricomorpha; marmot, squirrel, prairie vole, rat, and mouse are marked with an asterisk. B) Schematics of Sting1 (codons 1 to 379) for all species (12 species) and for Hystricomorpha only (5 species), annotated with the cleavage motif and the nucleotide binding region; marked sites are under positive selection (M8 vs M7; BEB > 0.9).]

3.6 Active dengue virus protease inhibits interferon production

To test the antagonism of STING-dependent interferon activation by the dengue virus protease, I co-transfected plasmids encoding one of the three STING alleles and a plasmid encoding an active or inactive dengue virus protease. By normalizing the luciferase signal in cells co-transfected with active protease to cells co-transfected with the inactive protease, I could determine to what extent the active protease was inhibiting interferon activation.
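The normalization described above amounts to dividing, at each protease dose, the luciferase signal obtained with active protease by the signal obtained with inactive protease. The sketch below illustrates that arithmetic only; the function name and all numbers are hypothetical and are not measurements from this work. A ratio of 1.0 means no protease-dependent dampening, and lower values mean stronger dampening.

```python
def relative_interferon(active_rlu, inactive_rlu):
    """Luciferase signal with active protease divided by inactive protease, per dose.

    Hypothetical helper for illustration; inputs are matched lists of
    relative light units (RLU), one value per protease dose.
    """
    if len(active_rlu) != len(inactive_rlu):
        raise ValueError("active and inactive series must have the same length")
    ratios = []
    for active, inactive in zip(active_rlu, inactive_rlu):
        if inactive <= 0:
            raise ValueError("inactive-protease signal must be positive")
        ratios.append(active / inactive)
    return ratios

# Hypothetical RLU values for one STING allele (illustration only).
doses_ng = [0, 4, 8]
inactive = [10000.0, 9000.0, 8000.0]
active = [10000.0, 4500.0, 2000.0]

for dose, ratio in zip(doses_ng, relative_interferon(active, inactive)):
    print(f"{dose} ng protease: active/inactive = {ratio:.2f}")
```

Computing the ratio separately for each allele puts alleles with different baseline interferon induction on a common scale, which is the purpose of the normalization.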

The results of this experiment were surprising to me. I found that compared to inactive protease, active protease reduced the interferon response produced by all four versions of STING I tested (Figure 3.8A). I also found that the active dengue protease was not equally effective against the different human alleles. Notably, it appeared to be more effective at dampening the immune response caused by the HAQ allele of STING (clearest in Figure 3.8B). And, surprisingly, there was no significant difference in immune response dampening between co-transfections with Allele 1 and Allele 3 in my hands. Moreover, with my control, R78W STING, the STING-dependent response was dampened as much as with Allele 1 and 3. A literature search confirmed that besides the luciferase experiment we published in eLife in 2018 (which showed small differences), no other group has done a direct comparison between the cleavage of wild-type and 78Q STING.

As well as finding that the dengue virus protease appeared to be more effective at dampening the immune response caused by the HAQ allele of STING, I consistently found that Allele 2 (the HAQ allele) was more efficiently cleaved by the dengue virus protease. This result was intriguing because of the evolutionary history referenced above. It was tempting to hypothesize that there is selection on the HAQ allele, specifically the R71H SNP in Africa, preventing its fixation because it is more easily antagonized by flaviviruses. However, the fact that HAQ STING is less effective at prompting the cellular innate immune response may partially explain this phenotype. Indeed, when I compared the interferon response induced by A2 H71R (higher compared to A2 without any mutations) and the cleavage of A2 H71R (less compared to A2 without mutations), they correlated very well. The same correspondence is seen for A2 A230G (interferon induction is the same, and STING cleavage is also the same). My model is that because HAQ STING is less able to prompt the immune response, transfection of HAQ STING leads to less inhibition of translation, which is a downstream effect of interferon upregulation. Therefore more protease is expressed and STING is cleaved more efficiently. Although another group has shown that the haplotypes of STING are cleaved differently, the question of whether this is dependent on interferon activation is left unresolved [102].

Figure 3.7: Transfecting STING induces interferon production. Increasing amounts of STING were transfected into 293T cells expressing an ISRE-controlled firefly luciferase, with the difference made up by empty vector plasmid. 48 hours later cells were collected and luciferase activity was measured. Here, four versions of STING are compared: three human alleles and R78W STING. Allele 2 stimulates the STING-dependent interferon response to a significantly reduced level.

[Figure 3.7 image: plot titled "Interferon response vs STING titration". x-axis: ng STING transfected (0, 1, 2, 4, 8, 16, 32, 64); y-axis: relative ISRE-luc units (log scale, 10^2 to 10^5); series: Allele 1 (78R, cleavable), Allele 2 (78R, cleavable), Allele 3 (78Q, not cleavable), 78W (not cleavable).]

Figure 3.8: Transfecting STING and DENV2 protease dampens interferon production. A) The four alleles of STING were transfected with increasing amounts of dengue virus protease (inactive or active) and luciferase activity was measured after 48 hours. Counter to expectations, active protease reduced STING-dependent luciferase activity compared to inactive protease for every allele (data from one biological replicate). B) Same data type as in A, two biological replicates, displayed as luciferase activity from transfections with active protease divided by transfections with inactive protease. The difference is generally higher in transfections with Allele 2, and there is little to no difference between Allele 1 (cleavable) and Allele 3 or R78W STING (non-cleavable).

[Figure 3.8 image: A) Four panels: Allele 1 (78R, cleavable), Allele 2 (78R, cleavable), Allele 3 (78Q, not cleavable), and 78W (not cleavable). x-axis: ng protease transfected (0 to 20); y-axis: RLU (log scale); series: inactive DENV2 protease, active DENV2 protease. B) Plot titled "Interferon response knockdown with active protease relative to inactive protease". x-axis: ng DENV2 protease per well (0 to 20); y-axis: IFN with active relative to inactive (0.0 to 1.6); one series per STING version.]

3.7 Small differences in virus replication in cell lines expressing different STING alleles

I also compared infections in A549 cell lines expressing different alleles of human STING. To do this, I stably transduced STING KO A549 cells with plasmids expressing Allele 1, 2, or 3, as well as R78W STING as a control. I then infected these cell lines with dengue virus. I began by infecting cells expressing wild-type STING (i.e. STING Allele 1, the reference allele) or 78W STING, as previous data suggested we should see a large decrease in replication in cells expressing 78W STING. To my surprise and annoyance, I saw a much smaller decrease than we had previously reported. At MOI 0.1 infection, I saw a decrease of about half a log (Figure 3.10A). This decrease was not present at infections of MOI 0.3, or at lower MOIs.

Although I continued to test whether the expression of 78W STING reduced DENV2 infection at different timepoints and MOIs, the change was most apparent at MOI 0.1, 24 hours post infection.

I then tested whether infection of A549 cells complemented with the three human STING alleles resulted in significantly less infectious progeny in the supernatant. I found that expression of R78Q and R78W STING both resulted in less infectious progeny compared to both 78R STING alleles and empty vector.

Figure 3.9: STING SNPs and protease antagonism. A) The SNPs that define HAQ STING affect how interferon is induced after STING transfection. Shown here is the amount of STING-dependent interferon activity induced after transfection of STING constructs, as measured by ISRE-controlled luciferase activity. Relative luciferase activity for each transfection condition is normalized to Allele 2 (A2), which consistently produced the lowest level of interferon after transfection. Both an H71R change and a Q293R change on the A2 background restore interferon production to near A1 levels. B) STING alleles are cleaved differently by the dengue virus protease. HAQ STING (A2) is cleaved to a greater extent, and both an H71R change and a Q293R change on the A2 background reduce cleavage to near A1 levels.

[Figure 3.9 image: A) Bar chart titled "IFN induction after STING transfection". y-axis: fold change of ISRE-luc activity over A2 (0 to 10); bars: A2, A1, A2 - H71R, A2 - A230G, A2 - Q293R, at 4 ng and 8 ng STING transfected. B) Immunoblot with lanes A1, A3 (uncleavable), A2, A2 - H71R, A2 - A230G, A2 - Q293R; blots: STING-HA with cleavage product, STING-HA (long exposure) with cleavage product, active protease (V5), and actin.]

3.8 Not all flavivirus proteases inhibit the STING-dependent interferon response

Supporting my hypothesis that cleavage may not be sufficient for immune dampening, not all the flaviviruses that cleave human STING inhibit STING-dependent interferon induction. Because I knew that at least two tick-borne flavivirus proteases cleave STING and had not been shown to dampen the interferon response, I co-transfected human STING and different flavivirus proteases, titrating the proteases, and found that only the dengue, Zika, and West Nile virus proteases dampened interferon production (Figure 3.11, and data from one protease concentration summarized in Figure 3.12). In contrast, the proteases from yellow fever, Powassan, and Langat viruses did not reduce interferon production. This inability to reduce interferon production could reflect less efficient cleavage of STING by the tick-borne flavivirus proteases during infection. However, I have observed that levels of STING cleavage are relatively similar for all flaviviruses (Figure 3.5). Instead, I hypothesize that STING cleavage by these tick-borne flaviviruses is not sufficient to reduce interferon production. These data make me more confident that cleavage of STING is not the only variable at play when considering the dampening of STING-dependent interferon by flavivirus proteases.

Figure 3.10: Non-cleavable STING reduces dengue virus replication in A549s. A) STING KO A549 cells were complemented with empty vector, wild-type human STING (78R, cleavable) or R78W STING (78W, non-cleavable) and infected for 24 hours at indicated MOIs. Titer of virus in the supernatant was determined by plaque assay. At MOI 0.1, cells expressing 78W STING had significantly fewer plaque-forming units in the supernatant. B) The R78Q human STING allele also reduced dengue virus replication after 24 hours (MOI 0.1), as compared to cells expressing cleavable STING.

[Figure 3.10 panels. A: "Titer after DENV2 infection at MOI 0.1" and "Titer after DENV2 infection at MOI 0.3", titer (PFU/mL, log scale) for EV, 78R, and 78W. B: "Titer (pfu/mL) versus STING construct", titer (PFU/mL, log scale) by cell line: STING 78W, empty vector, Allele 1 (78R), Allele 2 (78R), Allele 3 (78Q).]

Figure 3.11: STING alleles and flavivirus protease antagonism. 293T cells expressing an ISRE promoter-luciferase construct were co-transfected with a plasmid encoding one of three human STING alleles and a plasmid encoding one of eight flavivirus proteases. Each protease was transfected in increasing concentration (ng/well indicated on x-axis; difference made up with empty plasmid) and transfections were allowed to incubate for 48 hours. Cell lysates were collected and relative luciferase units were quantified. Although the interferon response was dampened by some of the proteases when transfected (dengue, Zika, West Nile), other proteases had no impact on interferon production caused by any of the STING alleles (Powassan, Langat, yellow fever virus).

[Figure 3.11 panels: STING, HAQ STING, and R78Q STING. Each plots Relative Luciferase Units (RLU, log scale) against ng of protease transfected per well (0, 4, 8, 16, 32, 40) for the Powassan, yellow fever, Langat, Langat S138A, West Nile, dengue, dengue S135A, and Zika proteases.]

Figure 3.12: STING alleles and flavivirus protease antagonism. Data from one concentration of protease in Figure 3.11 (40 ng) displayed as luciferase units from 40 ng protease transfection over luciferase units from 40 ng empty plasmid transfection.

[Figure 3.12 panels: STING, HAQ STING, and R78Q STING. Each plots the luciferase ratio by co-transfected protease: YFV, ZIKV, WNV, POW, LGTV, LGTV S138A, DENV, DENV S135A. One condition is marked ND.]

3.9 Discussion

In this chapter I discussed my work on human STING and its cleavage by flavivirus proteases. Early in my thesis work I showed that STING is cleaved by dengue virus during infection. This cleavage is only detectable at high titers of infection, and the cleavage of STING is very inefficient. This has always been a very interesting observation, and it led to our hypothesis that the virus only needs to cleave a small amount of STING to dampen the interferon response effectively. In other words, it only needs to cleave the STING that would be directly responsible for sensing the byproducts of infection (ER stress or the release of mitochondrial DNA). This hypothesis was supported by the fact that replication is significantly reduced in cells expressing non-cleavable STING. It was also supported by early work showing that STING-dependent interferon production is reduced to a greater extent in cells expressing a cleavable form of STING.

However, my work suggests a different interpretation: that STING cleavage is not sufficient for STING-dependent interferon dampening. There is much more to be explored when considering the interaction between STING and flavivirus proteases. How does the 78Q mutation really affect STING cleavage and interferon dampening by proteases? If cleavage is not sufficient for immune dampening, what else is required? Are all proteases interacting with STING in the same manner? The cleavage of STING during infection by the dengue virus protease is minimal; is the cleavage alone enough to significantly affect infection outcome? I hypothesize that the transfection of the active dengue virus protease does result in STING cleavage (as shown many ways, and by many people) and does reduce interferon production, but that these two facts are unconnected. I think it is possible that the protease is cleaving STING but that cleavage is not necessary for interferon dampening, and instead binding or some indirect action is causing this effect. For example, the binding of the protease to STING could prevent oligomerization, and therefore prevent interferon production. However, the binding of other proteases to STING may not prevent oligomerization.

However, in my hands the amount of infectious virus produced by A549 cells did decrease, although not to the levels I was expecting based on prior research, suggesting that differing STING alleles do affect virus replication. I would like to continue addressing the conflicting evidence of these results in the future, and hope to resolve some of these questions, possibly starting by re-doing these experiments with unbiased eyes and more experience.

3.10 Methods

Plasmids

The FLAG-tagged DENV2 NS2B3, expressed from the pCR3.1 plasmid, was a gift to the Sawyer Lab from Yi-Ling Lin. This plasmid includes a 3x FLAG tag at the C-terminus of NS3. Sequences for the DENV serotype 1, 3, and 4 proteases were ordered from IDT as gene blocks and cloned into the pCR3.1 plasmid. The other V5-tagged flavivirus NS2B3 plasmids were gifts from Sonja Best. These NS2B3 constructs are expressed from the pcDNA6.2 plasmid, and all include a V5 tag at the C-terminus of NS3. STING constructs used for functional analysis were amplified from cDNA libraries constructed from the HEK 293T cell line, or B-cell lines ordered from the Coriell Collection of banked, sequenced cell lines. All B-cells ordered through Coriell originated in female individuals of Gujarati Indian ethnicity, country of origin USA (IDs: NA20856, NA20854, and NA20859). An HA tag was engineered onto the 3' end of the gene sequences, separated from the coding sequence by a flanking region that was used to clone these sequences into the pLPCX mammalian expression plasmid.

STING cleavage assays

HEK 293T cells were grown at 37 °C in DMEM supplemented with 10% FBS, Pen/Strep, and L-glutamine. 24 hours prior to transfection, cells were plated at a density of 5.0 × 10^5 cells per well in a 12-well dish in antibiotic-free media. Wells were transfected with 800 ng plasmid encoding STING and 800 ng plasmid encoding each flavivirus NS2B3 using TransIT 293 reagent (Mirus MIR 2704). After 24 hours, cells were collected for analysis of cleavage by western blot.

Western blotting

Cells were lysed in RIPA buffer supplemented with protease inhibitor (Roche, 4693159001). Protein concentration was calculated using the Bradford method. 15% 37.5:1 acrylamide/bisacrylamide gels were used to run 10-20 µg of whole cell lysate for each sample. Protein was transferred overnight at 30 volts onto a polyvinyl membrane. Blocking was performed with a 10% milk solution in tris-buffered saline supplemented with 0.1% TWEEN20. Primary antibodies were used against HA (3f10 clone Sigma 11867423001), Flag (M2 clone Sigma F3165), V5 (SV5-Pk1, Abcam ab27671), Beta Actin (Mouse mAb Cell Signaling 8H10D10), and dengue virus NS3 (mouse polyclonal antibody raised against purified full-length NS3 from dengue 2 strain 16681). Secondary antibodies used were goat-anti-mouse-HRP (Thermo 62-6520) and goat-anti-rabbit (Thermo 65-6120). Blots were developed using ECL Prime (Amersham RPN2232) and imaged using the ImageQuant LAS 4000 or the Synergy HTX Multi-Mode Reader from BioTek Instruments.

Dengue infection assays

For all infection experiments, the indicated cell lines were plated out in F-12K antibiotic-free media with 10% FBS and allowed to attach to the plate for 24 hours. The indicated MOI was calculated for each well and dengue virus 2 (16681) was allowed to attach to cells for 1 hour at room temperature. Unattached virus was then removed from cells by two washes with DPBS. 2% serum in DMEM media was added to cells and they were maintained at 37 °C with 5% CO2. After 24 hours the virus supernatant was removed for downstream titration on BHK21 cells. At the same time, cells were removed for downstream western blotting or for cell-associated viral RNA quantification. For titering infections: BHK cells were plated in six-well plates in antibiotic-free media. When confluent, the virus collected from infection supernatant was used to make three or six 10-fold dilutions. 250 µL of each dilution was allowed to attach to BHK cells for 1 hour at room temperature. After 1 hour, MEM with 5% FBS was combined with 1% agarose (boiled and allowed to cool to 60 °C), and this was pipetted over the infected BHK cells. 3-5 days later, when plaques were visible under the agarose, 1 mL of 0.03% neutral red solution (Sigma Aldrich N2889) was added to each well and maintained at 37 °C overnight. The neutral red and media were then aspirated off and plaques were visualized and counted on a white lightbox.
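The arithmetic behind the plaque assay and the MOI calculation can be sketched as follows. This is a minimal illustration and not the exact calculation used for the figures: the function names and all example numbers are hypothetical. It assumes one countable well per sample and uses the 250 µL inoculum stated above.

```python
def titer_pfu_per_ml(plaque_count, dilution, inoculum_ml=0.25):
    """Return PFU/mL of the undiluted supernatant from one counted well.

    `dilution` is the fraction of undiluted supernatant in the inoculum,
    e.g. 1e-3 for the third 10-fold dilution. The 0.25 mL default is the
    inoculum volume given in the methods.
    """
    return plaque_count / (dilution * inoculum_ml)


def stock_volume_ml(moi, cells_per_well, stock_pfu_per_ml):
    """Return the volume of virus stock that delivers `moi` PFU per cell."""
    return moi * cells_per_well / stock_pfu_per_ml


# Hypothetical numbers, for illustration only.
titer = titer_pfu_per_ml(plaque_count=25, dilution=1e-3)
volume = stock_volume_ml(moi=0.1, cells_per_well=2e5, stock_pfu_per_ml=1e6)
print(f"titer: {titer:.2e} PFU/mL")
print(f"inoculum: {volume * 1000:.1f} uL of stock per well")
```

In practice the count would come from the dilution with well-separated plaques, and replicate wells would be averaged before reporting a titer.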

Luciferase assays

HEK 293T cells stably expressing firefly luciferase controlled by an ISRE promoter were transfected with plasmids expressing STING constructs with or without plasmids expressing flavivirus protease, as indicated. Transfected cells were maintained at 37 °C with 5% CO2 for 48 hours. Cells were then lysed in passive lysis buffer (Promega E194A) for 15 minutes at room temperature. Luciferase was visualized using the Promega Luciferase Assay and a Biotek plate reader.
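The normalization used to summarize the protease titration (luciferase units with protease over luciferase units with an equal amount of empty plasmid, as in Figure 3.12) can be sketched as below. The replicate structure, the function name, and every reading are hypothetical; the sketch only assumes that replicate wells are averaged before the ratio is taken.

```python
from statistics import mean


def fold_over_empty(protease_rlu, empty_rlu):
    """Mean RLU with protease divided by mean RLU with empty plasmid."""
    return mean(protease_rlu) / mean(empty_rlu)


# Hypothetical replicate readings (RLU) at a single plasmid amount.
empty_plasmid = [40000, 42000, 38000]
readings = {
    "DENV": [10000, 11000, 9000],
    "DENV S135A": [39000, 41000, 40000],
}

for protease, rlu in readings.items():
    print(f"{protease}: {fold_over_empty(rlu, empty_plasmid):.2f}")
```

A ratio near 1 indicates no change relative to empty plasmid, and a ratio below 1 indicates dampening of the ISRE reporter signal.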

Rodent Sting1 sequences

Rodent Sting1 sequences were collected from GenBank and aligned using the MUSCLE algorithm in Unipro UGENE. Positive selection was detected using the Phylogenetic Analysis by Maximum Likelihood (PAML) program. A master phylogenetic tree of the relevant rodent species was created and used as the accompanying phylogenetic tree [107]. The models M7 (null) and M8 (allowing for positive selection) were compared. In instances where M7 was rejected in favor of M8, specific codons for which the Bayes empirical Bayes (BEB) posterior probability was higher than 0.9 were recorded as being under positive selection.

Chapter 4

DGAT2: a novel target of flavivirus proteases identified by machine learning

As described in Chapter 3, STING is a known human target of the dengue virus protease. The discovery of novel targets of the dengue virus protease would increase our understanding of dengue biology, both the biochemical properties of the protease and the cellular process of replication. By using a machine learning approach, the Sawyer Lab identified DGAT2 as a target of the dengue virus protease. I went on to show that the ability to cleave DGAT2 is important for efficient replication of dengue virus. I also showed that DGAT2 is cleaved by every flavivirus protease that was co-transfected with DGAT2, illustrating that the cleavage of DGAT2 may be a conserved mechanism by which flaviviruses alter the cellular environment during replication. Finally, I screened over 100 proteins that had been predicted by machine learning to be cleaved by the dengue virus protease; however, none of these proteins showed conclusive evidence of cleavage.

4.1 Predicting targets of the dengue virus protease by machine learning

As discussed in Chapter 3, dengue virus and related flaviviruses are hugely important for human health, and are poised to become an even greater threat as the world becomes warmer and wetter. Although vaccines and mosquito control are currently our best path forward in protecting millions of people from flaviviruses, understanding the biology of flaviviruses can help us design better anti-virals, better vaccines, and better models for further study. One thing it would be beneficial to know more about is the interaction of the viral protease with host proteins, and what cellular functions are manipulated as a result.

Upon infection, the viral genomic RNA of dengue virus is immediately translated by host ribosomes and encodes one open reading frame that is translated as a single polyprotein[6]. This polyprotein is rapidly cleaved into 10 individual proteins, three of which form the virus particle: the envelope (E) protein, the membrane (M) protein, and the capsid (C) protein. The remaining seven nonstructural proteins (NS1, NS2A, NS2B, NS3, NS4A, NS4B and NS5) establish sites on membranes of the endoplasmic reticulum (ER) where replication of viral RNA occurs[6]. These viral replication complexes are sequestered to some extent in viral compartments (VCs) in the ER, which are invaginations of the ER membrane[55, 111, 70, 118]. Replicated viral RNA is packaged at adjacent vesicles on the ER membrane into immature viral particles, and these particles are trafficked through the lumen of the ER and Golgi complex before being released as mature virus particles at the plasma membrane[118]. Dengue virus replication influences host lipid homeostasis by actively inducing membrane remodeling and altering the lipid composition of infected cells. Electron microscopy has demonstrated the extent of lipid membrane remodeling, and biochemical techniques have described the effects on lipid composition and diversity in host cells during dengue virus replication[70, 31, 76, 15]. The quantity and relative concentration of several lipid species, such as phosphatidylcholines and triglycerides, are significantly changed during infection in both animal and mosquito cells[30, 15, 28, 62]. It has also been demonstrated that during dengue infection, autophagy-dependent processing of lipid droplets and triglycerides releases free fatty acids[31]. Moreover, the dengue NS3 protein induces the redistribution of the host fatty acid synthase to sites of viral replication and increases cellular fatty acid synthesis[30]. These examples show the extent to which dengue virus is capable of altering lipid biochemistry in infected cells.

The non-structural proteins of dengue virus are crucial sites of host-virus interaction, some of which appear to be necessary for the changes in lipid architecture, while others are important for immune-dampening[66, 4, 1, 27]. Two of these non-structural proteins, NS2B and NS3, form the dengue virus protease complex, which is responsible for the cleavage of the viral polyprotein at cytoplasmic sites. This protease is also known to target some host proteins for degradation, most notably the immune signaling protein STING[1, 127, 99]. The cleavage of STING is discussed in depth in Chapter 3. Importantly for this chapter, Zika and West Nile viruses are also known to target STING for degradation during infection, suggesting that identifying targets of the dengue virus protease can lead to identifying common targets of flavivirus proteases[18].

The targets of the dengue virus protease on the viral polyprotein are well-described (Figure 4.1). The protease complex is composed of the nonstructural proteins NS2B and NS3, and is often referred to as NS2B3. NS3 contains the proteolytic active site, but the activity of this protease is dependent on the NS2B co-factor [22]. NS2B3 recognizes an eight amino acid motif, cleaving between the fourth and fifth position [50, 88]. It cleaves at least six known locations on the viral polyprotein, separating C and prM, NS2A and NS2B, NS2B and NS3, NS3 and NS4A, NS4A and NS4B, and NS4B and NS5[13, 79, 78, 3, 53, 54, 110].

Alex Stabell, a previous Sawyer Lab graduate student, theorized that we could exploit the fact that these motifs are well-described to train a machine learning algorithm with the goal of predicting motifs recognized by the protease in the human proteome. To test this idea, I curated the data for several iterations of machine learning algorithms that were designed and coded first by Alex Stabell, and later by Jacob Stanley, a postdoc in the Dowell Lab.

To generate a list fully capturing the diversity of known motifs recognized by dengue virus proteases, I downloaded 3608 non-redundant viral genome sequences representing the four dengue virus serotypes. This data set included 1466 DENV-1 genomes, 1187 DENV-2 genomes, 730 DENV-3 genomes, and 225 DENV-4 genomes. I then translated and aligned these genomic sequences and manually curated motif data consisting of the eight amino acids surrounding each of the six known cleavage sites on the aligned dengue polyproteins.

Figure 4.1: The dengue virus polyprotein is cleaved by the dengue virus protease. The ten proteins that make up the dengue virus polyprotein (C, PrM, E, NS1, NS2A, NS2B, NS3, NS4A, NS4B, NS5) are initially translated as a single open reading frame before being cleaved by the dengue virus protease complex (NS2B/NS3) at the sites indicated by black arrows. (Schematic adapted from Umareddy et al 2007[112].)

[Figure 4.1 schematic: the polyprotein (C, PrM, E, NS1, NS2A, NS2B, NS3, NS4A, NS4B, NS5) arranged across the ER membrane between the ER lumen and the cytoplasm. Legend: dengue protease cleavage site; protease.]

The proteases from the four dengue virus serotypes have conserved protease active sites; therefore, I combined the motif data from all serotypes when curating training data for the predictor[46]. I reduced this list to a non-redundant list of motifs, which represented the positive training data. The negative training data consisted of the cytoplasmic regions of the dengue virus polyprotein, excluding the targeted motifs, divided into eight amino acid-long sequences using a sliding window approach. Each amino acid had features assigned to it, such as hydrophobicity, molecular volume, isoelectric point, and other continuous variables. Therefore, the machine learning algorithm was trained on information that directly relates to the protease's ability to target and cleave the site (i.e. biochemical features of the amino acids), and not simply on the frequency of a given amino acid occurring at a given position. We then used this trained algorithm to predict targets of the dengue virus protease in the human proteome (for a basic schematic of this process, see Figure 4.2A). Because the dengue protease complex is associated with the ER membrane, we limited our target analysis to the cytoplasmic regions of transmembrane proteins of the human proteome, under the assumption that the dengue virus protease is most likely to interact with and therefore cleave proteins in the ER. Indeed, every known host target of the dengue virus protease (and of flavivirus proteases) is a transmembrane protein.
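To make the construction of the training data concrete, the sketch below shows one way the sliding-window negative set and the per-residue feature encoding could be produced. It is an illustration under stated assumptions, not the code written by Alex Stabell or Jacob Stanley. The function names are hypothetical, the toy sequence is invented, only windows that exactly match a known motif are excluded, and the Kyte-Doolittle hydropathy index stands in for the full feature set (molecular volume, isoelectric point, and the other features would be added per position in the same way).

```python
MOTIF_LENGTH = 8

# Kyte-Doolittle hydropathy index, used here as one example feature.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sliding_windows(region, length=MOTIF_LENGTH):
    """Every overlapping window of `length` residues, in order."""
    return [region[i:i + length] for i in range(len(region) - length + 1)]


def negative_motifs(region, positive_motifs):
    """Non-redundant windows of a region that are not known cleaved motifs."""
    positives = set(positive_motifs)
    windows = (w for w in sliding_windows(region) if w not in positives)
    return list(dict.fromkeys(windows))  # drops duplicates, keeps order


def encode(motif):
    """One continuous feature per position (hydropathy only in this sketch)."""
    return [KYTE_DOOLITTLE[residue] for residue in motif]


# Toy sequence and motif, not taken from any dengue genome.
toy_region = "LNKRRSAGMIT"
toy_positives = ["LNKRRSAG"]

for motif in negative_motifs(toy_region, toy_positives):
    print(motif, encode(motif))
```

Encoding residues by continuous biochemical features, rather than by identity, is what lets a classifier score motifs that never appear in the viral training data.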

We ran several iterations of this machine learning algorithm, changing parameters such as which features of the amino acids were considered, length of motifs, and the inclusion of the STING motif, an example of a well-described motif in the human proteome known to be cleaved. Jacob Stanley was instrumental in formalizing the computational components of this experiment, and as an illustration of the output of his final iteration of the algorithm I plotted the curated positive data, negative data, and predicted human motifs generated by the machine learning algorithm classifying human proteome data as individual logo plots (Figure 4.2B).

We found the motif surrounding a cleavage site between amino acids 123 and 124 (sequence: GGRR/SQWV, where / indicates the point of cleavage) in human DGAT2 to be a predicted target in several of our initial iterations (Figure 4.3A). This motif would be accessible to the protease, as it is in a cytoplasmic region of DGAT2 (Figure 4.3B). Importantly for later mutation experiments, the motif does not fall in the regions that are predicted to be biochemically active or lipid binding. Stanley et al. also identified the same motif in DGAT2 as a predicted target of the dengue virus protease when considering just motif information from the viral polyprotein. DGAT2 is involved in lipid homeostatic pathways that have been shown to be important for dengue virus replication. Specifically, DGAT2 converts the substrates diacylglycerol and fatty acyl CoA into the product triacylglycerols, which are subsequently stored in lipid droplets[12]. Both lipid biosynthesis and the storage of triacylglycerols are disrupted during dengue infection [30, 31]. Therefore, we hypothesized that DGAT2 might be a functionally important target of the dengue virus protease.

Figure 4.2: Schematic of the machine learning protocol. A) Motifs that are known to be recognized and cleaved by the dengue virus protease were curated and used to train a machine learning algorithm designed by Stanley et al [101]. The machine learning algorithm was then used to screen proteins in the human transmembrane proteome. B) A comparison of the motifs used as negative data, motifs used as positive data, and the motifs in the human proteome that were predicted to be cleaved, where the relative height of the letters denotes the relative frequency of each amino acid in the list of motifs.

[Figure 4.2 panels. A: flowchart with four steps. (1) Identify cleaved motifs (positive training data) on dengue polyproteins from genomes available in virus genome databases, and identify negative training data from un-cleaved regions of polyprotein. (2) Train machine learning algorithm on curated training data. (3) Apply algorithm to human transmembrane proteome to predict targets of the dengue virus protease. (4) Validate targets by co-transfection with protease, visualize cleavage if present by western blot. B: three sequence logos (negative training data, positive training data, predicted motifs), positions 1 to 8 from N to C, with letter height in bits.]

4.2 DGAT2 is cleaved by the dengue virus protease

To test whether DGAT2 was cleaved by the dengue virus protease, I first selected the human DGAT2 isoform to clone. DGAT2 has two described isoforms produced by alternative splicing; isoform 1 (388 amino acids) is the longer canonical sequence, so I used this version of DGAT2 for all the experiments below (Figure 4.3B). As the shorter isoform 2 is missing amino acids 41-83, a sequence located primarily in the N-terminal, non-cleaved cytoplasmic region, I do not expect the results to be different for this isoform. DGAT2 was cloned from a human cDNA library (produced from HEK 293T cells) and ligated into an expression plasmid with an HA tag at the C-terminus.

To evaluate the cleavage of DGAT2 produced from this construct, I co-transfected this plasmid with a plasmid encoding the dengue virus protease. When the dengue virus NS2B and NS3 genes are expressed from a plasmid, the region is translated into a small polyprotein that then cleaves itself to become the functional protease complex. I used a plasmid expressing this NS2B-NS3 region from the New Guinea C isolate of dengue virus serotype 2, adjacent to a 3x Flag tag at the C-terminus of NS3.

When I co-transfected HEK-293T cells with DGAT2-HA and the dengue virus protease and allowed for expression for 24 hours, I was able to visualize an HA-tagged cleavage product by western blot at approximately 30 kDa (Figure 4.4A). 30 kDa is the predicted size of the cleavage product if DGAT2-HA was cleaved at amino acid position 123, suggesting that DGAT2-HA was indeed cleaved at the predicted position. This cleavage product was not produced in the presence of an inactive dengue virus protease with a point mutation at the active site (S135A), confirming that the dengue protease cleaves wild-type DGAT2-HA in a transfection context. To confirm the motif predicted by machine learning was the target of the protease, I made mutations in the DGAT2-HA sequence altering the encoded motif (R122A, R123A, full motif compared to wild-type in Figure 4.3A). Wild-type and mutant DGAT2-HA were expressed equally well in transfection conditions.

Figure 4.3: A motif in DGAT2 was identified by machine learning using several different trained algorithms. A) The motif predicted to be cleaved by the dengue virus protease in DGAT2 is compared to the motif recognized by the protease in human STING. I found that cleavage of DGAT2 was inhibited by two arginine to alanine mutations at positions P1 and P2. B) The structure of DGAT2 includes two transmembrane domains (blue), two cytoplasmic domains (yellow), a lipid-binding domain, and an acyltransferase active domain. Here the cleavage site is also indicated with a red arrow.

A. Cleavage by protease (between P1 and P1')

              P4  P3  P2  P1  P1' P2' P3' P4'
Human STING   S   R   Y   R   G   S   W   Y
Human DGAT2   G   G   R   R   S   Q   W   V
Mutant DGAT2  G   G   A   A   S   Q   W   V

[Figure 4.3 panel B: DGAT2 domain map spanning residues 1 to 388, with positions 80-87, 123, and 161-164 marked. Labels: cytoplasmic, TM, TM, cytoplasmic; lipid-binding domain; acyltransferase active domain; cleavage by protease.]

When the plasmid encoding mutant DGAT2-HA was co-transfected with the inactive or active protease, no cleavage product was detected in either condition (Figure 4.4A). This result indicates that the motif predicted to be cleaved by the dengue virus protease was correctly identified by the machine learning algorithm, and that altering the amino acids at this motif prevents cleavage by NS2B3.
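The expected size of the HA-tagged cleavage product and the sequence of the non-cleavable motif can be checked with a few lines of arithmetic. The sketch below is a rough estimate under stated assumptions: it uses an average residue mass of about 110 Da, counts the nine-residue HA epitope, and ignores any linker between DGAT2 and the tag. The residue numbering of the motif (120 to 127) follows from the cleavage site between amino acids 123 and 124, and the function names are hypothetical.

```python
AVG_RESIDUE_KDA = 0.110  # rough average mass per residue (assumption)
HA_TAG = "YPYDVPDYA"     # standard HA epitope; linker residues are ignored

DGAT2_LENGTH = 388       # isoform 1, from the text
MOTIF = "GGRRSQWV"       # GGRR/SQWV, cleaved between residues 123 and 124
MOTIF_START = 120        # residue number of the first motif position (P4)


def c_terminal_fragment_kda(length, cleaved_after, tag=HA_TAG):
    """Approximate mass of the tagged fragment downstream of the cut."""
    residues = length - cleaved_after + len(tag)
    return residues * AVG_RESIDUE_KDA


def apply_substitutions(motif, motif_start, substitutions):
    """Apply substitutions such as 'R122A', checking the original residue."""
    residues = list(motif)
    for sub in substitutions:
        old, position, new = sub[0], int(sub[1:-1]), sub[-1]
        index = position - motif_start
        if residues[index] != old:
            raise ValueError(f"{sub}: expected {old}, found {residues[index]}")
        residues[index] = new
    return "".join(residues)


print(f"{c_terminal_fragment_kda(DGAT2_LENGTH, 123):.1f} kDa")
print(apply_substitutions(MOTIF, MOTIF_START, ["R122A", "R123A"]))
```

Under these assumptions the estimate lands close to the band observed at approximately 30 kDa, and the two substitutions reproduce the mutant motif shown in Figure 4.3A.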

To confirm that DGAT2 is cleaved during infection with dengue virus, I created two stable cell lines using a lentiviral transduction approach. A549 cells were stably transduced with wild-type or mutant (non-cleavable) DGAT2-HA. I infected these cells with dengue virus serotype 2 (strain 16681) at an MOI of 10 and visualized DGAT2-HA by immunoblotting. I found that replicating virus reduced the amount of wild-type DGAT2-HA protein, although a cleavage product was not detected (Figure 4.4B). Virus infection reduced the levels of mutant DGAT2-HA slightly; however, the reduction was significantly greater when DGAT2-HA was cleavable. The reduction in mutant DGAT2-HA may reflect the global reduction in protein expression caused by the interferon response.

Because we have previously noted that a proteasome inhibitor sometimes helps visualize STING cleavage, I also tested the cleavage of DGAT2 in the presence of proteasome and autophagy inhibitors (MG132 and NH4Cl, respectively). Neither compound significantly affected my ability to detect the cleavage product of DGAT2, suggesting that the degradation of these products is not fast enough to impair detection (Figure 4.5). Therefore I did not use MG132 in any future experiments with DGAT2.

4.3 Mutation of the DGAT2 cleavage motif reduces viral infection

The fact that DGAT2 is cleaved by dengue virus during infection led us to two possible hypotheses. First, the presence of an exogenous, highly expressed, active protease in cells could lead to non-specific cleavage of human proteins without directly impacting replication of the virus. Alternatively, the ability to cleave DGAT2 could be an evolved mechanism that directly benefits dengue virus during replication. I tested these two possibilities by comparing the levels of dengue virus replication in the presence of cleavable or non-cleavable DGAT2.

Figure 4.4: DGAT2 is cleaved by the dengue virus protease. A) 293T cells were co-transfected with wild-type (cleavable) or mutant (non-cleavable) DGAT2-HA with active or inactive protease for 48 hours. Cell lysates were collected and a western blot was performed. A cleavage product of DGAT2 was expected at approximately 30 kDa if the motif could be cleaved. This cleavage product is only present when wild-type DGAT2 was transfected. B) A549 cells stably transduced with wild-type (cleavable) or mutant (non-cleavable) DGAT2-HA were infected with dengue virus (MOI 10) for 24 hours. In the presence of dengue virus the full-length band for wild-type DGAT2 almost completely disappeared while the intensity of the band for mutant DGAT2 only slightly decreased.

[Figure 4.4 panels. A: western blot of wild-type or mutant DGAT2-HA with inactive (I) or active (A) DENV2 protease, probed for DGAT2 (anti-HA), DENV2 protease (anti-FLAG), and actin. B: western blot of wild-type or mutant DGAT2-HA with (+) or without (-) DENV2 infection, probed for DGAT2 (anti-HA), DENV2 protease, and actin.]

Figure 4.5: Treatment with a proteasome inhibitor did not significantly impact the intensity of either the full-length or cleavage product band of cleaved DGAT2. Plasmids expressing wild-type (cleavable) DGAT2-V5 were co-transfected with plasmids expressing active or inactive protease. After 24 hours, the media was replaced with MG132- or NH4Cl-containing media to inhibit degradation by the proteasome or the lysosome, respectively. No treatment significantly affected the ability to visualize the DGAT2 cleavage product.

[Figure 4.5 panel: western blot of DGAT2 co-transfected with active or inactive protease under no treatment, 5 uM MG132, 10 uM MG132, or 10 mM NH4Cl, probed with anti-FLAG (NS2B3 and processed NS3), anti-V5 (full-length DGAT2 and cleavage product), and anti-actin.]

For my first attempt, I complemented A549 cells with cleavable or non-cleavable DGAT2-HA, without knocking down or knocking out endogenous DGAT2. I reasoned that a non-cleavable DGAT2 would be dominant, because there would be expression of a non-cleavable DGAT2 in the cell, inhibiting the ability of the virus to replicate if the cleavage of DGAT2 is necessary. After complementing these cells I infected them at MOI 0.1 or MOI 1.0 and collected virus from the supernatant at various time-points. However, I did not see a significant difference in virus production when comparing cells expressing cleavable DGAT2 to non-cleavable DGAT2 at either MOI (Figure 4.6).

I wanted to compare replication of dengue virus in these two conditions in the absence of endogenous DGAT2, to see if the presence of endogenous DGAT2 was obscuring the phenotype.

To accomplish this, I took an siRNA approach. First, I designed an siRNA that would target endogenous DGAT2. I tested the RNA levels of DGAT2 in A549 cells transfected with this siRNA, and found that DGAT2 RNA levels were significantly reduced in cells transfected with anti-DGAT2 siRNA compared to a mock siRNA transfection (Figure 4.7). Next, I created lentiviral plasmid constructs encoding cleavable or non-cleavable DGAT2-HA that were “immune” to the siRNA knock-down of endogenous DGAT2 by adding several synonymous mutations at the siRNA-targeted sequence. I created lentivirus with these constructs and transduced A549 cells to create two matched stable cell lines.
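The idea of making a construct "immune" to an siRNA by synonymous recoding can be illustrated with a short sketch. This is not the procedure or the sequence used for the DGAT2 constructs: the target below is an arbitrary in-frame toy sequence, the function names are hypothetical, and the rule of picking the synonymous codon with the most nucleotide changes is an assumption made only for the example.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons(dna):
    """Split an in-frame DNA string into codons."""
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def translate(dna):
    return "".join(CODON_TABLE[codon] for codon in codons(dna))


def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def most_different_synonym(codon):
    """Synonymous codon with the most nucleotide changes (or the codon itself)."""
    synonyms = [c for c, aa in CODON_TABLE.items()
                if aa == CODON_TABLE[codon] and c != codon]
    return max(synonyms, key=lambda c: mismatches(c, codon), default=codon)


def recode(dna):
    """Replace every codon that has a synonym, leaving the protein unchanged."""
    return "".join(most_different_synonym(codon) for codon in codons(dna))


# Arbitrary in-frame toy sequence; not the real DGAT2 siRNA target site.
target = "GGCGGACGTCGTTCTCAGTGGGTG"
immune = recode(target)
assert translate(immune) == translate(target)
print(target, translate(target))
print(immune, translate(immune), mismatches(target, immune), "nucleotide changes")
```

Codons for methionine and tryptophan have no synonyms and are left unchanged, so the number of mismatches that can be introduced depends on the amino acids encoded at the targeted site.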

Figure 4.6: Complementing wild-type A549 cells with mutant DGAT2 did not significantly inhibit dengue virus replication. Wild-type A549 cells were stably transduced with wild-type or mutant (non-cleavable) DGAT2-HA. These cells were then infected with dengue virus. A) Cells were infected at MOI 0.1 or MOI 1.0 for 36 hours and supernatant was collected at 8 hours, 16 hours, 24 hours, and 36 hours for titering on BHK cells. B) These A549 cells were infected at MOI 0.1 for 60 hours and supernatant was collected every 12 hours. Although at MOI 0.1 in both A and B infection was lower in cells expressing DGAT2, the effect was very subtle.

[Figure 4.6 plots. A) DENV2 infection: PFU/mL vs. hours from infection (0, 8, 16, 24, 36) for wild-type and mutant DGAT2 cells at MOI 0.1 and MOI 1.0. B) DENV2 infection, MOI 0.1: PFU/mL (log scale) vs. hours from infection (0 to 60, every 12 hours) for wild-type A549s, A549s + wild-type DGAT2, and A549s + mutant DGAT2.]

Figure 4.7: Endogenous DGAT2 is significantly reduced after transfection of DGAT2-targeting siRNA. A549 cells were transfected with an siRNA targeting DGAT2, or mock transfected. After 48 hours, cell lysates were collected and RNA was prepared. DGAT2 expression relative to GAPDH was quantified by qPCR, and DGAT2 expression in cells transfected with DGAT2-targeting siRNA was significantly reduced compared to mock-transfected cells.

[Figure 4.7 plot: DGAT2 expression relative to non-transfected cells; DGAT2 RNA (percent relative to untreated) for no siRNA vs. DGAT2-targeting siRNA.]

To test dengue genomic RNA replication in these cell lines after infection, I first either mock transfected (transfection reagent, no siRNA) or transfected these cell lines with anti-DGAT2 siRNA and incubated for 48 hours. I infected these cells with dengue virus (MOI 0.1) and collected cellular RNA 24 hours post-infection. I created cDNA libraries and quantified dengue virus RNA levels by qPCR using DENV-2-specific primers. I found that without anti-DGAT2 siRNA treatment, dengue virus genomic RNA levels were slightly reduced in cells expressing non-cleavable DGAT2-HA compared to cells expressing cleavable DGAT2-HA (Figure 4.8A). The effect was dramatically increased by the knock-down of endogenous (cleavable) DGAT2 by the addition of anti-DGAT2 siRNA. The absence of endogenous DGAT2 increases the effect of DGAT2-HA cleavability on DENV genome replication, suggesting that the presence of non-cleavable DGAT2-HA hinders the ability of dengue virus to produce genomic RNA (Figure 4.8A).
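The relative qPCR quantities reported in this section (DGAT2 relative to GAPDH, and DENV RNA in one cell line relative to another) are the kind of value commonly computed with the 2^-ΔΔCt method. The sketch below illustrates that calculation only; the text does not state which quantification method was used, the Ct values are invented, and the function name `relative_expression` is hypothetical. It also assumes perfect (two-fold per cycle) amplification efficiency for both primer sets.

```python
def relative_expression(ct_target_test, ct_ref_test, ct_target_control, ct_ref_control):
    """Fold change of a target transcript (test vs. control) by the 2^-ddCt method.

    Each sample's target Ct is first normalized to a reference gene (dCt),
    then the test sample is compared to the control sample (ddCt).
    """
    delta_test = ct_target_test - ct_ref_test
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_test - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the target crosses threshold 3 cycles later in the
# test sample while the reference gene is unchanged.
fold = relative_expression(
    ct_target_test=27.0, ct_ref_test=18.0,
    ct_target_control=24.0, ct_ref_control=18.0,
)
print(f"target RNA is at {fold * 100:.1f}% of control")
```

A three-cycle delay corresponds to an eight-fold reduction, which is why small Ct differences translate into large changes on a linear scale.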

I next tested whether the quantity of infectious particles released from these cells is also dependent on the ability of dengue virus to cleave DGAT2. For this experiment, I used the A549 cell lines expressing siRNA “immune” versions of cleavable or non-cleavable DGAT2-HA. Nontransduced A549 cells or A549 cells expressing the cleavable or non-cleavable version of siRNA “immune” DGAT2-HA (three cell lines in total) were transfected with anti-GFP or anti-DGAT2 siRNA and incubated for 48 hours. I then infected these cells with dengue virus (MOI 0.1), collected the supernatant 24 hours post-infection, and titered the virus by plaque assay on BHK cells. Importantly, I found that the quantity of infectious virus was not significantly different between A549 cells expressing only endogenous DGAT2 and A549 cells expressing endogenous DGAT2 and exogenous, cleavable DGAT2-HA (Figure 4.8B, left gray vs middle gray bars). This result indicates that the presence of a transduced copy of cleavable DGAT2 did not significantly affect virus replication. Moreover, both in nontransduced A549 cells (left gray and red) and A549 cells transduced with cleavable DGAT2-HA (middle gray and red), the levels of infectious virus in the supernatant were not significantly altered when the cells were treated with anti-DGAT2 siRNA versus anti-GFP siRNA. Thus, the presence of DGAT2-HA, when the protein could be cleaved, did not affect infectious particle production.

In contrast, the presence of exogenous non-cleavable DGAT2-HA led to significantly less infectious virus in the supernatant (Figure 4.8B, compare anti-GFP conditions in the three different cell lines). Moreover, in cells expressing non-cleavable DGAT2-HA, the transfection of anti-DGAT2 siRNA resulted in significantly less infectious virus than the transfection of anti-GFP siRNA. This result suggests that the presence of non-cleavable DGAT2 greatly impacts the ability of dengue virus to form infectious particles. In summary, these results suggest that functional DGAT2 is detrimental for dengue virus genome replication and progeny production, and that the virus may have evolved to cleave DGAT2 for a replication advantage.
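For readers less familiar with plaque assays, the titers reported here are derived from plaque counts by a simple back-calculation. The sketch below shows that arithmetic under the usual assumptions (one plaque per infectious particle, counts taken from a single countable dilution); the plaque count, dilution, and inoculum volume are invented, and `titer_pfu_per_ml` is a hypothetical helper rather than anything used in this work.

```python
def titer_pfu_per_ml(plaque_count, dilution, inoculum_ml):
    """PFU/mL = plaques / (dilution of the counted well x inoculum volume in mL)."""
    if dilution <= 0 or inoculum_ml <= 0:
        raise ValueError("dilution and inoculum volume must be positive")
    return plaque_count / (dilution * inoculum_ml)


# Hypothetical well: 42 plaques counted at a 10^-3 dilution with a 0.1 mL inoculum.
titer = titer_pfu_per_ml(plaque_count=42, dilution=1e-3, inoculum_ml=0.1)
print(f"{titer:.2e} PFU/mL")
```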

Figure 4.8: Non-cleavable DGAT2 reduces DENV2 replication when endogenous DGAT2 is knocked down by siRNA. A) Copies of cell-associated dengue virus RNA quantified by qPCR in non-cleavable DGAT2-HA cells relative to cleavable DGAT2-HA cells after mock transfection or after siRNA knock-down of endogenous DGAT2. Data presented are representative of two biological replicates. B) Infectious dengue virus from supernatant after 24-hour infection was quantified by plaque assay. Data presented are representative of two biological replicates, three technical replicates each. (*: p<0.05, **: p<0.01, ***: p<0.001, ****: p<0.0001; error bars are standard error of the mean.)

[Figure 4.8 plots. A) Fold change in copies of DENV RNA in +DGAT2 stable A549s: fold change in DENV2 RNA relative to WT DGAT2 mock siRNA-treated cells (log scale), for mock and DGAT2 siRNA. B) DENV titers in +DGAT2 stable A549s: DENV titers (pfu/mL, log scale) for GFP- or DGAT2-targeting siRNA in WT A549s, A549s + cleavable DGAT2, and A549s + non-cleavable DGAT2.]

4.4 Confirmation with DGAT2 KO A549s

I originally planned to test whether infection was reduced in DGAT2 KO Huh7 cells complemented with wild-type or cleavage motif mutant HA-tagged DGAT2. Huh7 cells are more relevant to dengue virus infection than A549 cells, and have two copies of DGAT2, compared to A549 cells, which have three copies. I was able to successfully make a DGAT2 knock-out Huh7 cell line, but was unable to stably transduce these cells with HA-tagged DGAT2 using lentivirus-based methods. To make a DGAT2 KO Huh7 cell line, I used a ribonucleoprotein (RNP) complex transfection approach. I chose a guide targeting endogenous DGAT2 from the IDT predesigned gRNA database, formed an RNP complex composed of Cas9 endonuclease and the triggering RNA system, and transfected these RNP complexes into Huh7 cells using the Lonza nucleofector technology. I then screened for single colonies that had frame-shifting indels in DGAT2, and grew those up to clonal populations.

However, I was unable to transduce DGAT2 KO Huh7 cells with DGAT2-expressing plasmids using a standard lentiviral approach. In the future, I would like to use a different transduction method to stably introduce these constructs into Huh7 cells. In the meantime, however, I created a DGAT2 KO A549 cell line which I was able to stably transduce with DGAT2 constructs. To create this cell line, I created a pSpCas9(BB)-2A-Puro (PX459) V2.0 plasmid with the guide RNA sequence predesigned by IDT (as I knew from the above work in Huh7 cells that it would target endogenous DGAT2 efficiently). I transfected this plasmid into A549 cells and treated the cells with puromycin, then diluted the surviving population to create clonal populations. I then identified a clonal population with frame-shifting indels in all three copies of DGAT2 and used these cells for downstream experiments.
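The clone-selection criterion described above (a frame-shifting indel in every copy of the gene) can be expressed as a short check. This is a simplified sketch under stated assumptions: alleles are given as hypothetical reference/edited sequence pairs around the cut site, an indel counts as frame-shifting when the length change is not a multiple of three, and in-frame edits that happen to create a stop codon are deliberately ignored. The function names are illustrative only.

```python
def is_frameshift(reference, edited):
    """True when the edit changes the length by a non-multiple of three."""
    return (len(edited) - len(reference)) % 3 != 0


def is_full_knockout(alleles):
    """True when every sequenced allele (reference, edited) carries a frameshift."""
    return all(is_frameshift(ref, alt) for ref, alt in alleles)


# Hypothetical clones from a cell line with three copies of the gene.
REF = "ACGTACGTACGT"
clone_a = [(REF, "ACGTACGTACG"),       # 1 bp deletion
           (REF, "ACGTACGTACGTAA"),    # 2 bp insertion
           (REF, "ACGTACGT")]          # 4 bp deletion
clone_b = [(REF, "ACGTACGTACG"),       # 1 bp deletion
           (REF, "ACGTACGTA"),         # 3 bp deletion, stays in frame
           (REF, REF)]                 # unedited allele

print(is_full_knockout(clone_a), is_full_knockout(clone_b))
```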

I first stably transduced these DGAT2 KO A549 cells with cleavable or non-cleavable DGAT2-HA. Then I was able to repeat the siRNA-based experiments but in a system that was more reliable and that required less manipulation throughout the experiment. I infected these three cell lines with dengue virus (MOI 0.1) for 24 hours, then titered the virus in the supernatant and quantified cell-associated viral RNA. Cell-associated viral RNA differed significantly from that in DGAT2 KO A549 cells only in cells expressing non-cleavable DGAT2, where it was significantly reduced (Figure 4.9A). Infectious progeny in the supernatant was slightly increased in cells expressing cleavable DGAT2 compared to DGAT2 KO cells, which was surprising when compared to the results using DGAT2-targeting siRNA in wild-type A549 cells (Figure 4.9B). We hypothesize that long-term depletion of DGAT2 may impair cell health, and therefore be indirectly detrimental for virus replication. In cells expressing non-cleavable DGAT2, infectious progeny was significantly reduced compared to DGAT2 KO cells and cells expressing wild-type DGAT2. In summary, these results suggest that functional DGAT2 is detrimental for dengue virus genome replication and progeny production, and that the virus may have evolved to cleave DGAT2 for a replication advantage.

4.5 Chemically inhibiting DGAT2 activity

My working model for the cleavage of DGAT2 by flaviviruses was that DGAT2 cleavage alters the cellular environment by reducing DGAT2 activity. In other words, the conversion of diacylglycerols to triacylglycerols was being reduced, and more phospholipids were being created and used for membrane expansion. Because DGAT2 was still active in cells where non-cleavable DGAT2 was present, dengue was not able to limit DGAT2 activity and was therefore not able to adequately alter the cellular environment in a way that was conducive to infection.

To test this theory, I infected non-cleavable DGAT2-expressing cells in the presence of a DGAT2 inhibitor. My goal was to show that when DGAT2 was non-cleavable but DGAT2 activity was inhibited, dengue virus replicated to higher titers than in cells expressing mutant DGAT2 in which DGAT2 activity was not inhibited. The inhibitor I ordered has been shown to significantly inhibit the activity of DGAT2 in HeLa cells[24].

Figure 4.9: Non-cleavable DGAT2 reduces DENV2 replication when endogenous DGAT2 is knocked out. A) DGAT2 KO A549s were transduced with cleavable or non-cleavable DGAT2 and infected with dengue virus (MOI 0.1) for 24 hr. Copies of cell-associated dengue virus RNA were quantified by qPCR. B) Virus in the supernatant from the replicates in A was titered by plaque assay. Data presented are representative of three biological replicates. (*: p<0.05, **: p<0.01, ***: p<0.001, ****: p<0.0001; error bars are standard error of the mean.)

[Figure 4.9 plots. A) Fold change in copies of DENV RNA in DGAT2 KO, +DGAT2 stable A549s. B) DENV titers in DGAT2 KO, +DGAT2 stable A549s (log scale). Conditions in both panels: DGAT2 KO A549s + no DGAT2, cleavable DGAT2, or non-cleavable DGAT2.]

However, drug treatment at various concentrations (as high as 500 uM) did not significantly affect the ability of dengue virus to replicate. For these experiments I infected non-cleavable DGAT2-expressing A549 cells, then treated them with increasing levels of PF-06424439 after a 1-hour adsorption period. After 24 hours, I collected virus and performed qPCR and plaque assays to measure virus RNA production and infectious particle production, respectively. I performed these experiments for two separate titrations (Figure 4.10). Although there was a slight upward trend as concentration increased, the difference was not significant. I hypothesize that one reason this experiment might not have shown a significant increase in virus production is that the drug has been shown to limit endogenous levels of DGAT2 activity, but not DGAT2 activity in cells overexpressing the protein. In the future, I would like to be able to measure the levels of DGAT2 activity in cells stably transduced with overexpressed DGAT2 (cleavable and non-cleavable) before and after DGAT2 inhibitor treatment.

Figure 4.10: Mutant (non-cleavable) DGAT2 A549 cells were infected with dengue virus (MOI 0.1) for 24 hours in media supplemented with increasing concentrations of a DGAT2 inhibitor (PF-06424439). Replication increased

[Figure 4.10 plots: two titrations of DENV2 PFU vs. DGAT2 inhibitor concentration. First titration: 0 and ten-fold steps from 0.125 nM to 125 uM. Second titration: 0 and ten-fold steps from 0.5 uM to 5 mM.]

4.6 Cleavage of DGAT2 is conserved in the Flavivirus genus

Dengue virus is one member of a group of related RNA viruses in the genus Flavivirus (Figure 4.11A)[92]. The cleavage of one host protein (STING) is conserved for at least some flaviviruses, both mosquito-borne and tick-borne[1, 18, 99]. I speculated that the cleavage of DGAT2 might also be at least partly conserved in the Flavivirus genus. To test this, I co-transfected plasmids expressing human DGAT2-HA with plasmids expressing the proteases of multiple flaviviruses. For each of these proteases, I used a plasmid expressing the NS2B-NS3 region from the indicated flavivirus genome, adjacent to a V5 tag at the C-terminus of NS3. Like the dengue NS2B3 described above, upon translation, the proteolytic domain of NS3 autocatalytically processes the NS2B3 polypeptide to form NS3 and NS2B proteins, which act together as the flaviviral protease complex. I tested the ability of the proteases from Zika, West Nile, yellow fever 17D (vaccine strain), yellow fever Asibi (virulent strain), Powassan, Langat, and Rio Bravo viruses to cleave DGAT2-HA in a co-transfection context. I found that the ability to cleave DGAT2-HA appears to be completely conserved among the flaviviruses tested (Figure 4.11B). This result suggests that the cleavage of DGAT2 is a conserved mechanism by which flaviviruses alter the cellular environment during infection. In fact, the motif recognized by the flavivirus proteases is conserved in all nonhuman primate versions of DGAT2, as well as in the sparrow and marmot versions of DGAT2, animals relevant to West Nile virus and Powassan virus, respectively (Figure 4.11C). Therefore, this mechanism could be conserved not just in human cells, but in the cells of most flavivirus hosts.
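The conservation check summarized in Figure 4.11C amounts to asking whether the same motif appears in each ortholog. A minimal sketch of that comparison follows. The motif `GGRRSQWV` is taken from Figure 4.11C, but the flanking residues, the `divergent_example` entry, and the function name `motif_positions` are placeholders invented for illustration; real use would substitute full-length DGAT2 ortholog sequences.

```python
MOTIF = "GGRRSQWV"  # motif shown in Figure 4.11C


def motif_positions(sequence, motif=MOTIF):
    """Return every 0-based start index at which the motif occurs exactly."""
    width = len(motif)
    return [i for i in range(len(sequence) - width + 1)
            if sequence[i:i + width] == motif]


# Placeholder fragments: only the motif itself comes from the figure.
orthologs = {
    "human": "AAAGGRRSQWVAAA",
    "alpine_marmot": "TTGGRRSQWVTT",
    "sparrow": "SSSGGRRSQWVSS",
    "divergent_example": "AAAGGKRSQWVAAA",  # hypothetical single substitution
}

conserved = {name: bool(motif_positions(seq)) for name, seq in orthologs.items()}
for name, has_motif in conserved.items():
    print(f"{name}: {'motif present' if has_motif else 'motif absent'}")
```

An exact-match search is the strictest possible criterion; a substitution-tolerant scoring scheme would be needed to ask whether a diverged motif is still likely to be cleaved.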

Figure 4.11: DGAT2 is cleaved by the proteases from multiple flaviviruses. A) A phylogenetic tree representing the relationships between the flaviviruses tested, built using sequence from the NS3 protein. Mosquito-borne flaviviruses are marked by the pink box, tick-borne flaviviruses are marked by the blue box, and Rio Bravo virus, a flavivirus without a known vector, is marked by a yellow box. A phylogeny built using the flavivirus NS3 aligns as expected with accepted flavivirus relationships[48]. B) HA-tagged DGAT2 co-transfected with C-terminally V5-tagged proteases from multiple flaviviruses. C) The DGAT2 motif is conserved in nonhuman animals relevant to flavivirus infections.

[Figure 4.11 panels. A) Tree tips: dengue, Zika, West Nile, and yellow fever viruses (mosquito-borne); Langat and Powassan viruses (tick-borne); Rio Bravo virus; hepatitis C virus (hepacivirus). B) Lanes by protease: none (-), DENV2, ZIKV, WNV, YFV (17D), YFV (Asibi), POW, LGTV, Rio Bravo. Bands: DGAT2-HA and cleavage product (markers at 50, 37, and 25 kDa), active protease (V5; markers at 100 and 75 kDa), and actin.]

C) Motif alignment:
Human                  ...GGRRSQWV...
Alpine marmot          ...GGRRSQWV...   (relevant for POWV)
Yellow-bellied marmot  ...GGRRSQWV...   (relevant for POWV)
Sparrow                ...GGRRSQWV...   (relevant for WNV)
Thrush                 ...GGRRSQWV...   (relevant for WNV)

4.7 The cleavage of DGAT2: discussion

The ability to alter the lipid environment of host cells is crucial for flaviviruses, which are dependent on host lipid membranes for replication compartments and autophagy of lipids for energy[15, 6, 76, 31, 30]. In this chapter I showed that efficient dengue virus replication depends on the ability of the virus to cleave DGAT2, an acyltransferase that catalyzes the synthesis of triacylglycerols using diacylglycerol and fatty acyl CoA as substrates [12]. This adds to a body of work implicating the triacylglycerol homeostatic pathways in Flaviviridae replication. Heaton et al. have previously shown that fatty acid synthase (FASN) activity is increased during dengue virus infection[30]. FASN is a multi-enzyme complex that catalyzes fatty acid synthesis [97]. Amongst other destinations, fatty acids are shuttled into the diacylglycerol synthesis pathway [25, 56]. Diacylglycerol can in turn be used for phospholipid synthesis or triacylglycerol synthesis [125, 19]. Triacylglycerols are stored in lipid droplets or, in some cells, form lipoproteins, biochemical assemblies whose primary purpose is to transport hydrophobic lipid molecules in water [125]. In dengue virus infected cells, lipid droplets are reduced, decreasing the amount of lipids stored as triacylglycerols[31]. Here I implicate an intermediate protein of this triacylglycerol synthesis and storage pathway, DGAT2, adding to the mechanistic understanding of dengue’s perturbation of lipid homeostasis (Figure 4.12).

The cleavage of DGAT2 is accomplished by the dengue virus protease, which previously was known to cleave only a few host proteins [1, 127, 49]. DGAT2 is cleaved by the dengue virus protease in both a transfection context and in a live infection context. The motif predicted by machine learning was confirmed to be the motif recognized by the protease by mutation of the encoding sequence. I went on to show that inhibiting the ability of the dengue virus protease to cleave DGAT2 during infection significantly reduces dengue virus genome replication and progeny production. Combining these data with previous work suggests a model where dengue virus is increasing the supply of fatty acids for phospholipid synthesis while actively decreasing the conversion of these fatty acids to stored triacylglycerol moieties (Figure 4.12). I propose that the cleavage of DGAT2 impairs the ability of cells to shuttle diacylglycerols into a pathway where they are committed to storage as triacylglycerols, keeping them in the pool of reagents available for membrane synthesis. As the virus remodels membranes to create viral compartments and convoluted membrane structures, the increased pool of phospholipids would allow for a virus-induced local surface area increase. Future experiments will reveal whether inhibiting the diacylglycerol O-acyltransferase activity of DGAT2 restores the ability of the virus to replicate in cells expressing non-cleavable DGAT2. This would conclusively pinpoint DGAT2’s triacylglycerol-catalysis activity as inhibiting efficient replication of dengue virus. Because the phenotype of DGAT2 cleavage is conserved among viruses in the Flavivirus genus, I suggest that this model of lipid manipulation in humans may be generally applicable to the genus. We also suggest, based on the conservation of the motif recognized by the flavivirus proteases in nonhuman primates, rodents, and birds, that this is a generally applicable mechanism used by flaviviruses to replicate in all hosts. Future work testing the replication and progeny generation of flaviviruses besides dengue virus in cells, both human and nonhuman, where DGAT2 is non-cleavable will reveal to what extent this phenotype is conserved. Finally, this work reveals the importance of studying lipid manipulation to improve flavivirus therapeutics, and provides an avenue for exploring new therapies[117, 129].

Figure 4.12: Proposed model for the effect of flavivirus cleavage of DGAT2. In our model, the cleavage of DGAT2 decreases the number of diacylglycerols converted to stored triacylglycerols, freeing diacylglycerols to be used in phospholipid synthesis. Figure adapted from Yen et al.[125].

[Figure 4.12 schematic: fatty acyl CoA and diacylglycerol are converted to triacylglycerol by DGAT1 and DGAT2, or diacylglycerol is used for phospholipid synthesis. Two panels, without (-) and with (+) flavivirus; in the latter, DGAT2 is marked as targeted by flaviviruses.]

4.8 Attempted further validation of the machine learning approach to predicting targets of flavivirus proteases

Although I attempted to further validate the predictions of the machine learning algorithms produced by Alex Stabell and Jacob Stanley, I was not successful. For the screening of these proteins, I ordered 108 plasmids encoding human genes from the Functional Genomics Facility (University of Colorado Anschutz Medical Campus). All of these clones had a C-terminal V5 tag. I prioritized genes encoding proteins that had the highest predicted likelihood of being cleaved, i.e. the highest posterior probability. Each plasmid was sequenced to confirm that the correct gene had been cloned, and that the reading frame was correct; during this process, I identified several Broad ORF clones for which the gene was incorrect or out-of-frame. I co-transfected each plasmid with another plasmid encoding the active dengue virus protease or a plasmid encoding an inactive dengue virus protease. I could then directly compare by western blotting the expression of each protein in the presence or absence of an active protease. I expected to be able to see a cleavage product (e.g. the lower band seen when STING is cleaved) or some reduction of the brightness in the full-length band (e.g. the reduction of the full-length band seen when DGAT2 is cleaved). Because the cleavage motif was also predicted by machine learning, I knew what size of product was most likely.
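Knowing the predicted cut position lets one estimate where a cleavage product should run on a gel. The sketch below shows one rough way to do that; it is an illustration rather than the calculation used in this work. It assumes an average mass of about 0.11 kDa per residue, and the protein length, cut position, and tag length in the example are hypothetical. With a C-terminal tag, only the C-terminal fragment would be detected by an antibody against the tag.

```python
AVG_RESIDUE_KDA = 0.110  # rough average mass per amino acid; an approximation


def fragment_sizes_kda(protein_length, cut_after, c_terminal_tag_length=0):
    """Estimate the sizes (kDa) of the N- and C-terminal cleavage fragments.

    cut_after is the number of residues N-terminal to the predicted scissile bond.
    """
    if not 0 < cut_after < protein_length:
        raise ValueError("cut_after must fall inside the protein")
    n_fragment = cut_after * AVG_RESIDUE_KDA
    c_fragment = (protein_length - cut_after + c_terminal_tag_length) * AVG_RESIDUE_KDA
    return round(n_fragment, 1), round(c_fragment, 1)


# Hypothetical 400-residue protein cut after residue 100, carrying a 14-residue tag.
n_kda, c_kda = fragment_sizes_kda(protein_length=400, cut_after=100,
                                  c_terminal_tag_length=14)
print(f"N-terminal fragment ~{n_kda} kDa, tagged C-terminal fragment ~{c_kda} kDa")
```

Membrane proteins and post-translationally modified proteins often migrate away from their calculated mass, so an estimate like this only narrows down where to look.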

The vast majority of host proteins I co-transfected with the dengue virus protease remained uncleaved, and expression was not reduced. Here, I have included an example blot that illustrates the diversity I saw in terms of expression, size, and replicate-to-replicate variation (Figure 4.14A). For example, some genes did not express to levels I could detect by western blot, and expression was greatly variable between genes, despite all being on the same promoter. This was expected, as promoter is not the only determinant of expression. However, one frustrating aspect of screening by western blotting was finding expression reduced in one replicate, but not others. This led to several initial false positives. For example, based on this first western blot, UBXN4 initially seemed like a good candidate of a protein that might be degraded by the dengue virus protease before later replicates ruled it out. I also experimented with proteasome and lysosome inhibitors, as with DGAT2, in case preventing degradation by the proteasome or lysosome helped visualize cleavage products for other genes. In some cases, such as the provided example of UBXN4, treatment with MG132 increased the brightness of the full-length band (although in this case did not preserve a cleavage product) (Figure 4.14B). Therefore I included at least one replicate with MG132 for each gene.

Figure 4.13: Example screening of predicted host proteins by western blotting. A) The gene encoding each protein of interest is co-transfected with either an inactive or active dengue virus protease. Cell lysates are then collected and western blots are performed. In the presence of inactive protease, there should be a band at the size of the full-length protein. In the presence of an active protease, there may be reduced intensity of the full-length band or the appearance of a cleavage product. B) In the case of STING, there is no reduction in intensity of the full-length band but there is increased intensity of a cleavage product band. In the case of DGAT2, there is both decreased intensity of the full-length band and increased intensity of the cleavage product band. A negative result, e.g. SCD, has neither.

[Figure 4.13 panels. A) Schematic: V5-tagged protein of interest co-transfected with inactive or active dengue protease and detected by anti-V5 western blot. B) Anti-V5 western blot of STING, DGAT2, and SCD, each with inactive or active protease; annotations "Positive Control" and "Predicted by machine learning"; size markers at 50, 37, 25, and 20 kDa.]

Figure 4.14: The vast majority of predicted proteins were not cleaved by the dengue virus protease. A) When plasmids encoding predicted protease-target genes were co-transfected with active protease, there was generally no reduction in protein intensity or appearance of a cleavage product compared to co-transfection with an inactive dengue virus protease. This is an example sample of proteins that were screened, showing the variety of size, expression, and phenotype seen while screening via transfection and western blot. UBXN4 originally looked promising, but was eventually eliminated as a positive result. B) Proteins that originally seemed like good candidates for follow-up from initial screening (such as UBXN4) were subjected to co-transfections with the addition of proteasome inhibitor and autophagy inhibitor. Here for follow-up testing of UBXN4, I tested co-transfection in the presence of two different concentrations of MG132 and one concentration of NH4Cl. Although band intensity was brighter in the presence of MG132, there was no appearance of a cleavage product.

[Figure 4.14 panels. A) Anti-V5 western blots of ABHD4 (38.8 kDa), EMD (29.0 kDa), PKMYT1 (54.5 kDa), SGK196 (40.1 kDa), TYRO3 (96.9 kDa), and UBXN4 (56.8 kDa), each with inactive or active protease; active protease and actin blots below. B) UBXN4 with active or inactive protease, untreated or treated with MG132 (5 uM, 10 uM) or NH4Cl (10 mM); anti-Flag and anti-V5 blots.]

In an attempt to standardize the screening, such that I was not basing my reporting purely on whether proteins looked like they were degraded by eye, I began to quantify my western blots. To do this, I took advantage of the Stain-Free Imaging Technology from Bio-Rad. For each western I took images of the total protein on the gel, the total protein transferred to the membrane, and the final blot after primary and secondary antibody addition. I normalized the intensity of the full-length band (and possible cleavage band) to the background of each lane and compared normalized intensity in the different cleavage conditions. In the case of DGAT2, this quantification was a nice visual way to see the decreased intensity of the full-length band and the increased intensity of the cleavage product in the presence of an active dengue virus protease. However, in other cases this quantification was misleading: although for individual replicates the change in intensity would be large, skewing the data to appear as if the protein was being degraded in the presence of the protease, further replicates would show no change in intensity and I would ultimately conclude that the protein was not being cleaved or degraded by the protease. For that reason, I stopped quantifying the western blots and relied on detection by eye. Even after screening over 100 proteins that had been predicted to be cleaved by machine learning, I did not identify any proteins that were cleaved by the dengue virus protease beyond DGAT2.
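The problem described above, where a single replicate suggests degradation that later replicates do not support, can be made explicit by requiring the reduction to hold in every replicate. The sketch below is one hypothetical way to encode that rule; it is not the analysis performed in this work. It assumes that normalization means dividing band intensity by lane background, the 0.5 threshold is arbitrary, and all intensities are invented.

```python
def normalized_intensity(band, lane_background):
    """Band intensity expressed relative to the background of its own lane."""
    return band / lane_background


def active_to_inactive_ratios(replicates):
    """Per-replicate ratio of normalized intensity: active vs. inactive protease."""
    ratios = []
    for rep in replicates:
        inactive = normalized_intensity(rep["inactive_band"], rep["inactive_background"])
        active = normalized_intensity(rep["active_band"], rep["active_background"])
        ratios.append(active / inactive)
    return ratios


def consistently_reduced(replicates, threshold=0.5):
    """True only if the full-length band drops below the threshold in every replicate."""
    return all(ratio <= threshold for ratio in active_to_inactive_ratios(replicates))


# Invented intensities: one protein reduced in all replicates, one in only the first.
reproducible = [
    {"inactive_band": 1000, "inactive_background": 100, "active_band": 400, "active_background": 100},
    {"inactive_band": 900, "inactive_background": 100, "active_band": 360, "active_background": 100},
    {"inactive_band": 1100, "inactive_background": 100, "active_band": 440, "active_background": 100},
]
one_off = [
    {"inactive_band": 1000, "inactive_background": 100, "active_band": 300, "active_background": 100},
    {"inactive_band": 1000, "inactive_background": 100, "active_band": 1000, "active_background": 100},
    {"inactive_band": 1000, "inactive_background": 100, "active_band": 950, "active_background": 100},
]

print(consistently_reduced(reproducible), consistently_reduced(one_off))
```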

Figure 4.15: Quantifying western blots led to misleading results about which proteins may have reduced expression in the presence of an active protease. Although in some cases the consistency of the western blots supports the quantification (for example, DGAT2), the quantification of other blots did not have results that reflected any consistency in cleavage.

[Figure 4.15 plots: normalized intensity in the presence of mutant vs. wild-type protease for the DGAT2 full-length band and the DGAT2 cleavage product band (replicate 1), and for the EMD and MINPP1 full-length bands (replicates 1 to 3).]

As a final attempt to produce a list of proteins that are cleaved in vitro to compare to the list of computational predictions, I worked with the Old Lab to perform some mass spectrometry trial experiments (Figure 4.16A). Although others have performed mass spectrometry on cells infected with dengue virus, I hoped that a transfection of just the protease, especially when compared to an inactive protease, would result in a dataset we could mine for proteins that had been cleaved as a direct consequence of translated protease (in contrast to indirect activities of immune system upregulation during infection). I was not just interested in proteins that were significantly reduced after transfection of the protease, but also the changing peptides between samples containing inactive and active dengue virus proteases (Figure 4.16B). To identify these, I transfected Huh7 cells with inactive or active dengue virus protease, and then performed subcellular fractionations on the cellular lysate. This was performed to narrow the scope of host proteins under consideration, as I was primarily interested in transmembrane ER proteins. However, I found that subcellular fractionation did not result in a fraction that was noticeably enriched for the protease, either active or inactive, compared to the others (Figure 4.16C), so all fractions were analyzed in the Old Lab. After digesting the samples with chymotrypsin, Christopher Ebmeier performed mass spectrometry and identified peptide sequences significantly different between active protease transfection samples and inactive protease transfection samples. We then compared these peptides to the list of predicted peptides based on the list of proteins predicted to be cleaved and the exact motif at which they were predicted to be cleaved. Unfortunately, there were no matches between datasets. Although this was disappointing, the full list of predicted proteins has not been fully explored.
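The comparison between observed peptides and predicted peptides rests on a simple idea: if the viral protease cuts a protein at a given position, chymotrypsin digestion yields peptides that end or begin at that position and that are absent from the digest of the intact protein. The sketch below illustrates how such predicted peptides could be derived. It is not the Old Lab's pipeline: the digestion rule is a simplification (cleavage C-terminal to F, Y, W, or L unless the next residue is P, with no missed cleavages), and the protein sequence, cut position, and function names are hypothetical.

```python
def chymotrypsin_digest(sequence, residues="FYWL"):
    """Simplified in-silico digest: cut after F, Y, W, or L unless followed by P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in residues and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def cleavage_specific_peptides(sequence, cut_after):
    """Peptides expected only when the protein was first cut after position cut_after."""
    intact = set(chymotrypsin_digest(sequence))
    cleaved = (set(chymotrypsin_digest(sequence[:cut_after]))
               | set(chymotrypsin_digest(sequence[cut_after:])))
    return cleaved - intact


# Hypothetical protein with a hypothetical viral protease cut after residue 8.
protein = "MAKFGSRRSAGWTEYK"
diagnostic = cleavage_specific_peptides(protein, cut_after=8)
print(sorted(diagnostic))
```

Short diagnostic peptides like these may fall below the length a mass spectrometer reliably identifies, which is one practical reason a true cleavage event could be missed by this kind of comparison.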

Ultimately, Jacob Stanley published a classifier that identified 257 proteins possessing a possible cleavage motif[101]. Because the list was continually evolving over the course of several years, as the machine learning algorithm was tweaked and improved, many of the proteins I tested over the course of my PhD are no longer included on the list of predicted proteins, and, conversely, many of the proteins on the final list I have not cloned and tested. Therefore, there is potentially a treasure trove of undiscovered novel targets of the dengue virus protease on this list. I have included this list as Appendix B (accession numbers and gene IDs are listed) for those who might be interested in further exploration of the predicted targets, as it was not published in the 2020 conference proceedings paper. (Note: interestingly, DGAT1, but not DGAT2 is included on this list—the final iteration of the predictor included a second stage of classification based on structure, a stage which is not discussed in this work but which eliminated DGAT2 from the final list of predicted proteins.)

There are several ways to sort this list to prioritize proteins to test against the dengue virus protease, many of which I did with early lists. First, as the Sawyer Lab is especially interested in proteins that are evolving in an antagonistic relationship with pathogens, it would be interesting to do an evolutionary analysis of each protein on the list: specifically looking at whether the motifs that are predicted to be cleaved are evolving in primates with a higher rate of nonsynonymous mutations than synonymous mutations compared to other regions of the protein. If so, this might demonstrate that over the course of primate evolution, those proteins have been under selective pressure from flaviviruses to evolve motifs that are not cleavable.

Another way to interrogate this list computationally before choosing proteins to test at the bench would be to compare this list to other lists of host proteins shown to be interacting with dengue virus or other flaviviruses. For example, Enard et al published a preprint focused on virus interacting proteins, specifically proteins that interact with RNA viruses, which combined data 88

Figure 4.16: Mass spectrometry schematic and subcellular fractionation of Huh7 cells transfected with dengue virus protease. A) Huh7 cells were transfected with active or inactive dengue virus protease for 24 hours. Cell lysate was collected and subcellular fractionation was performed. Lysates were then treated with chymotrypsin and subjected to TMT labeling in the Old Lab before mass spectrometry was performed. B) The mass spectrometry data were analyzed to identify proteins for which the peptides expected only when the protein is cleaved at the predicted motif were significantly enriched in active-protease samples. C) Subcellular fractionation showed that the protease was not significantly more associated with any one of the large ER, small ER, or nuclear envelope-associated ER membrane fractions.

[Figure 4.16 panels. A: wild-type or S135A (inactive) DENV2 NS2B3; subcellular fractionation, chymotrypsin digest, TMT labeling, and MS (Sawyer Lab, Old Lab). B: peptides after chymotrypsin digestion of a targeted protein in the presence of S135A (inactive) versus active protease, yielding different peptides. C: anti-FLAG and anti-Actin blots of WCL, nucleus, large ER, and small ER fractions with active (A) or inactive (I) Flag-tagged protease; markers at 100, 75, and 50 kDa.]

from both high-throughput and low-throughput screens [21]. Comparing the proteins that they listed as dengue-interacting with the top hits of our most recent machine learning screen, I identified one protein in common: SLC25A6. I cloned SLC25A6 and tested whether it was cleaved by the dengue virus protease when the two were co-transfected. Ultimately, I was unable to show that SLC25A6 was cleaved (Figure 4.17B). However, I do not think that this single data point should dissuade a more comprehensive literature search and analysis of other proteins that have been predicted by machine learning to be cleaved by dengue virus.

There is also more to do in further optimizing the machine learning screen itself. The most recent version of the training data does not include any host proteins; therefore the algorithm does not take into account the motifs found in STING, DGAT2, FAM134B, or MFN1/2. Introducing those motifs into the training data may move new proteins up the list of predictions to screen. We also queried the entire human transmembrane protein list, which may mean that we are unnecessarily testing proteins that spend most of their lifetime in membranes where they are not easily accessible to the dengue protease. If we curated the list of human proteins being screened more carefully, more relevant proteins might again rise to the top of the list.

Finally, similar machine learning algorithms could be created with motifs from other flaviviruses, which could produce lists to compare to the dengue list but could also reveal interesting new candidates relevant only to non-dengue flaviviruses. As a slight detour from the molecular biology approach I have primarily used to validate the machine learning algorithms designed by Alex Stabell and Jacob Stanley, Jacob and I took a phylogenetic approach to testing the algorithm. Our goal was to test how the algorithm performed on the motifs of other flaviviruses at various evolutionary distances from dengue. The conservation of cleavage motifs would be expected to increase with decreasing evolutionary distance, so motifs from flaviviruses more closely related to dengue would be predicted to be cleaved more often by the machine learning algorithm than those of more distantly related viruses.

I first found complete polyprotein sequences for six other flaviviruses. Using a representative NS3 protein sequence from each, I created a phylogenetic tree of all ten viruses (the six new

Figure 4.17: SLC25A6 is not cleaved by the dengue virus protease. SLC25A6 became interesting because it is a known dengue interacting protein, but does not show up in knock-out studies designed to identify genes for which knocking out the gene results in increased dengue virus replication. However, when a plasmid expressing SLC25A6-HA is co-transfected with an active protease, no reduction in band intensity or cleavage product is seen.

[Figure 4.17 blot: lanes with inactive (I) or active (A) DENV2 protease; bands labeled SLC25A6 (markers at 50 and 37 kDa), active protease and inactive protease (markers at 100 and 75 kDa), and actin (marker at 25 kDa).]

Figure 4.18: Accuracy of machine learning algorithm trained on dengue motif training data decreases with phylogenetic distance. Shown is the maximum likelihood tree of ten flaviviruses created using NS3 sequence, compared to the percentage of proteolytically cleaved motifs correctly predicted by the machine learning algorithm trained on dengue virus motif data. Percentage decreases in proportion to evolutionary distance.

[Figure 4.18 image: maximum likelihood tree (branch length scale 0.2) beside a column labeled "Percentage of sites correctly classified". Values as labeled in the figure: tick-borne encephalitis virus 20%, Powassan virus 0%, yellow fever virus 58%, West Nile virus 46%, Japanese encephalitis virus 32%, Zika virus 79%, dengue virus serotypes 1-4 98%.]

flaviviruses plus the four serotypes of dengue). This tree matches the relationships within the flavivirus family predicted using whole genome alignments, and is supported by the literature[48]. I then curated motif data from the complete list of polyprotein sequences for each flavivirus. Jacob Stanley tested whether these motifs are correctly classified as cleaved by the machine learning algorithm trained on dengue virus training data. The percentage of cleavage motifs that are correctly classified by the dengue-trained first stage is inversely correlated with evolutionary distance from dengue, consistent with our expectation. This implies that a machine learning algorithm trained on data from a non-dengue flavivirus would be likely to identify different host proteins. In the future, predictions could be made for species-specific flaviviruses using protein sequences from the host each virus infects. Just as DGAT2 was identified for dengue in humans, novel targets could be identified in reservoir species for other viruses. Altogether, although my dissertation work eventually focused on validating and confirming the importance of DGAT2 as a target of flavivirus proteases, I think there are several interesting research questions to be explored that emerged from the computational component of this screen.

4.9 Methods

Flavivirus sequence and motif curation for machine learning

Non-redundant viral polyprotein sequences were obtained from the NIAID Virus Pathogen Database and Analysis Resource (ViPR) through the web site at http://www.viprbrc.org[77]. Viruses from the same taxon and serotype (e.g. yellow fever viruses were grouped together regardless of strain, while dengue viruses were divided by serotype) were aligned using the MUSCLE algorithm in the Unipro UGENE software. Once aligned, the known eight-residue cleavage sites were identified and sequence data from those sections of the alignment were collected. These lists were reduced to non-redundant lists of motifs. The machine learning positive training data consisted of all the motifs recognized by dengue virus serotypes 1-4. For determining the accuracy of the machine learning algorithm for non-dengue flaviviruses, individual non-redundant lists of motifs were curated.
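The curation step can be sketched in a few lines of Python. This is a minimal illustration and not the UGENE-based workflow used here: it assumes the sequences are already aligned (equal length, with `-` marking gaps) and that the alignment column where each eight-residue site begins is known. The function name `collect_motifs`, the choice to skip windows that contain gaps, and the toy sequences are all hypothetical.

```python
def collect_motifs(aligned_seqs, site_starts, motif_len=8):
    """Return the non-redundant motifs found at known cleavage-site columns.

    aligned_seqs: equal-length strings from one alignment ('-' marks a gap).
    site_starts:  0-based alignment columns where each cleavage motif begins.
    """
    motifs = []
    seen = set()
    for seq in aligned_seqs:
        for start in site_starts:
            window = seq[start:start + motif_len]
            # Assumption: windows interrupted by a gap, or running off the
            # end of the alignment, are skipped rather than repaired.
            if len(window) < motif_len or "-" in window:
                continue
            if window not in seen:
                seen.add(window)
                motifs.append(window)
    return motifs


# Toy alignment (made-up sequences, not real polyproteins).
toy_alignment = [
    "ACDEAKRRSWPLNFGH",
    "ACDEAKRRSWPLNFGH",  # duplicate strain: contributes nothing new
    "ACDEAKKRSWPLN-GH",  # one substitution in site 1, a gap in site 2
]
print(collect_motifs(toy_alignment, site_starts=[4, 8]))
```

With the toy input, the duplicate strain adds nothing, the substituted strain adds one new motif for the first site, and its gapped second site is dropped.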

Brief overview of machine learning methodology

The machine learning aspect of this project was conducted by Alex Stabell and Jacob Stanley, and the final iteration is summarized here for context. Jacob used a support vector machine (SVM) algorithm with a radial basis function (RBF) kernel, implemented using scikit-learn. The eight-residue motif training data were converted to feature matrices representing biological data about the individual amino acids and their placement, and each feature was scaled to the range [-1, +1] within the data sets. Because there was a significant difference in the size of the two training data sets (positive training data = 195 motifs and negative training data = 35,702 motifs), subsamples of the larger class were taken and compared to the entirety of the smaller class during training.

In this way, a collection of equivalent classifiers was generated, each trained on a distinct subset of the negative training data. 100 non-overlapping subsamples were used to create 100 classifiers, which were applied to the target data, and the results were combined to form a final classification. For each individual classifier, 15% of the data was randomly left out for testing (equal parts positive and negative class), and 5-fold cross-validation was performed with the remaining 85% in order to optimize the hyperparameters C and γ, resulting in final values of C = 1.1 and γ = 0.12. The resulting 100 binary SVM motif classifiers were subsequently applied to the curated human protein data. In order to identify a target site as cleaved, we required that all 95 classify the site as such.
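The overall structure of this approach (feature scaling, non-overlapping subsamples of the larger class, one classifier per subsample, and a vote threshold) can be sketched as follows. This is a hypothetical illustration rather than Jacob Stanley's implementation. To stay dependency-free it swaps the scikit-learn RBF-kernel SVM for a nearest-centroid stand-in, omits the held-out test split and the cross-validated hyperparameter search, and uses made-up two-feature data with ten classifiers instead of 100. All function names and the vote threshold used in the demonstration are assumptions.

```python
import random


def scale_features(rows, lo=-1.0, hi=1.0):
    """Min-max scale each feature column of `rows` to the range [lo, hi]."""
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        new_row = []
        for value, cmin, cmax in zip(row, mins, maxs):
            if cmax == cmin:
                new_row.append((lo + hi) / 2)  # constant feature: use midpoint
            else:
                new_row.append(lo + (hi - lo) * (value - cmin) / (cmax - cmin))
        scaled.append(new_row)
    return scaled


def split_negatives(negatives, n_subsets, seed=0):
    """Shuffle the larger class and cut it into non-overlapping subsamples."""
    shuffled = negatives[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_subsets] for i in range(n_subsets)]


def centroid(rows):
    return [sum(col) / len(col) for col in zip(*rows)]


def train_stand_in(positives, negatives):
    """Stand-in for the SVM: label a point by its nearest class centroid."""
    pos_c, neg_c = centroid(positives), centroid(negatives)

    def predict(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, pos_c))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, neg_c))
        return d_pos < d_neg  # True means "predicted cleaved"

    return predict


def called_cleaved(classifiers, x, min_votes):
    """Combine the ensemble: require at least `min_votes` positive calls."""
    votes = sum(1 for clf in classifiers if clf(x))
    return votes >= min_votes


# Scaling demonstration on raw (hypothetical) feature values.
print(scale_features([[0, 10], [5, 20], [10, 30]]))

# Made-up training data whose features already lie within [-1, +1]:
# a small positive class and a ten-fold larger negative class.
rng = random.Random(1)
positives = [[rng.uniform(0.5, 1.0), rng.uniform(0.5, 1.0)] for _ in range(5)]
negatives = [[rng.uniform(-1.0, 0.0), rng.uniform(-1.0, 0.0)] for _ in range(50)]

# Every classifier sees all positives and its own slice of the negatives.
subsets = split_negatives(negatives, n_subsets=10)
classifiers = [train_stand_in(positives, subset) for subset in subsets]

print(called_cleaved(classifiers, [0.9, 0.8], min_votes=10))
print(called_cleaved(classifiers, [-0.5, -0.5], min_votes=10))
```

Splitting the negative class this way keeps each classifier's training set roughly balanced while still using every negative example once across the ensemble. The vote threshold is left as a parameter because it controls how conservative the final call is.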

Plasmids and sequences

The FLAG-tagged DENV2 NS2B3, expressed from the pCR3.1 plasmid, was a gift to the Sawyer Lab from Yi-Ling Lin. This plasmid includes a 3x FLAG tag at the C-terminus of NS3. The other, V5-tagged flavivirus NS2B3 plasmids were gifts from Sonja Best, except for the plasmid expressing the Rio Bravo virus protease. The Rio Bravo virus protease complex sequence was ordered from IDT as a gene block and cloned into the Gateway cloning donor plasmid, then flipped into the Gateway expression plasmid pcDNA6.2. All of these V5-tagged NS2B3 constructs are expressed from the pcDNA6.2 plasmid, with the V5 tag at the C-terminus of NS3. DGAT2 constructs used for functional analysis were amplified from cDNA libraries constructed from the HEK 293T cell line. An HA tag was engineered onto the 3' end of the gene sequences, separated from the coding sequence by a flanking region that was used to clone these sequences into the pLPCX mammalian expression plasmid. To create siRNA "immune" DGAT2 plasmids, gene blocks were ordered with four synonymous mutations in the siRNA targeting region and the same flanking regions as above, and were Gibson cloned into the pLPCX mammalian expression plasmid. These synonymous mutations converted the wild-type sequence from 5' CTACTTCACTTGGCTGGTG 3' to 5' GTATTTTACCTGGCTGGTG 3'. To create non-cleavable DGAT2 mutant constructs, site-directed mutagenesis was performed with the following primers: 5' CCCAAGAAAGGTGGCGCAGCATCACAGTGGGTCCGAAACTGGGCTGTGTGG 3' and 5' CCCACTGTGATGCTGCGCCACCTTTCTTGGGTGTGTTCCAGTCAAACACC 3'. The resulting DGAT2 motif sequence is changed from wild-type "GGRRSQWV" to "GGAASQWV". Site-directed mutagenesis was performed using the Phusion High-Fidelity DNA Polymerase from New England Biolabs.
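The sequence changes described above can be sanity-checked with a short script. This is a sketch under stated assumptions: the reading frame of the siRNA target is not given in the text, so the script assumes codons begin at the second base of the 19-mer, and it assumes the forward mutagenesis primer is in frame from its first base. The helper names are hypothetical; the codon table is the standard genetic code.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order for each codon position.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate complete codons from the start of `dna`."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def mismatches(a, b):
    """0-based positions at which two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


wild_type = "CTACTTCACTTGGCTGGTG"   # siRNA target in wild-type DGAT2
immune = "GTATTTTACCTGGCTGGTG"      # siRNA "immune" version
forward_primer = "CCCAAGAAAGGTGGCGCAGCATCACAGTGGGTCCGAAACTGGGCTGTGTGG"

print(mismatches(wild_type, immune))
# Assumed frame: skip the first base so that codons start at position 1.
print(translate(wild_type[1:]), translate(immune[1:]))
print(translate(forward_primer))
```

Under these assumptions the script prints `[0, 3, 6, 9]`, then `YFTWLV YFTWLV`, then `PKKGGAASQWVRNWAVW`. The two targets differ at four positions and encode the same six residues in the assumed frame, and the primer encodes the mutated "GGAASQWV" motif. The first mismatch falls in the codon that precedes the 19-mer, so whether it is synonymous cannot be checked from these sequences alone.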

For screening the predicted targets of the dengue virus protease, plasmids encoding individual genes were ordered from the Functional Genomics Facility at the University of Colorado Anschutz Medical Campus. These plasmids are part of the hORFeome V8.1 Library curated by the Broad Institute. Each gene has been cloned into the pLX-TRC307WBC vector, is controlled by the EF1a promoter sequence, and has a C-terminal V5 tag. Every plasmid screened was first fully sequenced with external plasmid primers and internal gene-specific primers. Genes that could not be sequenced or for which the sequence revealed frame-shifting mutations were removed from the screen.

DGAT2 sequences from primates were collected from GenBank and aligned using the MUSCLE algorithm in Unipro UGENE. PAML was not run using primate DGAT2 sequences as DGAT2 is an extremely conserved gene. DGAT2 sequences from animals relevant to Powassan and West Nile viruses were collected from GenBank and aligned to human DGAT2. Only the cleavage motif from this alignment is shown in Figure 4.11C.

Protein cleavage assays

HEK 293T cells were grown at 37 °C in DMEM supplemented with 10% FBS, Pen/Strep, and L-glutamine. 24 hours prior to transfection, cells were plated at a density of 5.0 × 10^5 cells per well in a 12-well dish in antibiotic-free media. For DGAT2 cleavage assays, cells were transfected with 400 ng plasmid encoding DGAT2 and 800 ng plasmid encoding each flavivirus NS2B3 using TransIT-293 reagent (Mirus MIR 2704). For screening predicted targets of the machine learning algorithm, 800 ng of plasmid encoding the predicted target and 800 ng of plasmid encoding the dengue virus protease were co-transfected. After 24 hr, cells were collected for analysis of cleavage by western blot. In instances when MG-132 was used, a ready-made solution from Sigma-Aldrich (M7449) was added to antibiotic-free media at the indicated concentrations and used to replace the transfection media after 24 hours. In instances when NH4Cl was used, it was added to antibiotic-free media at the indicated concentration and used to replace the transfection media after 24 hours.

Western blotting

Cells were lysed in RIPA buffer supplemented with protease inhibitor (Roche, 4693159001). Protein concentration was calculated using the Bradford method. For western blots that were not stain-free, 15% 37.5:1 acrylamide/bisacrylamide gels were used to run 10-20 µg of whole cell lysate for each sample. For stain-free western blots, TGX Stain-Free FastCast Acrylamide Kit 12% gels were used to run the same amount of whole cell lysate. For either method, protein was transferred overnight at 30 volts or for 1 hour at 100 volts onto a polyvinyl membrane. Blocking was performed with a 10% milk solution in tris-buffered saline supplemented with 0.1% TWEEN20. Primary antibodies were used against HA (3f10 clone Sigma 11867423001), Flag (M2 clone Sigma F3165), V5 (SV5-Pk1, Abcam ab27671), Beta Actin (Mouse mAb Cell Signaling 8H10D10), and dengue virus NS3 (mouse polyclonal antibody raised against purified full-length NS3 from dengue 2 strain 16681). Secondary antibodies used were goat-anti-mouse-HRP (Thermo 62-6520) and goat-anti-rabbit (Thermo 65-6120). Blots were developed using ECL Prime (Amersham RPN2232) and imaged using the Synergy HTX Multi-Mode Reader from BioTek Instruments.

Dengue infection assays

For all infection experiments, the indicated cell lines were plated out in F-12K antibiotic-free media with 10% FBS and allowed to attach to the plate for 24 hours. The indicated MOI was calculated for each well based on the number of cells plated (assuming enough time for cell doubling), and dengue virus 2 (16681) was allowed to attach to cells for 1 hour at room temperature. Unattached virus was then removed from cells by two washes with DPBS. DMEM with 2% serum was added to cells and they were maintained at 37 °C with 5% CO2. After 24 hours the virus supernatant was removed for downstream titration on BHK21 cells. At the same time, cells were removed for downstream western blotting or for cell-associated viral RNA quantification. For titering infections, BHK cells were plated in six-well plates in antibiotic-free media. When the cells were confluent, the virus collected from infection supernatant was used to make three or six 10-fold dilutions. 250 µL of each dilution was allowed to attach to BHK cells for 1 hour at room temperature. After 1 hour, MEM with 5% FBS was combined with 1% agarose that had been boiled and allowed to cool to 60 °C, and this was pipetted over the infected BHK cells. 3-5 days later, when plaques were visible under the agarose, 1 mL of 0.03% neutral red solution (Sigma Aldrich N2889) was added to each well and maintained at 37 °C overnight. The neutral red and media were then aspirated off, and plaques were visualized and counted on a white lightbox.
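The arithmetic behind the plaque assay and the MOI calculation can be sketched as below. The formulas are the standard ones (titer equals plaques divided by the product of dilution and inoculum volume; inoculum volume equals MOI times cell number divided by titer) and are not quoted from the protocol itself. The 0.25 mL inoculum comes from the text; the plaque count, dilution, cell number, MOI, and function names are hypothetical.

```python
def titer_pfu_per_ml(plaque_count, dilution, inoculum_ml=0.25):
    """Plaque-forming units per mL of the undiluted supernatant.

    dilution: the dilution of the counted well, e.g. 1e-4 for the
    fourth step of a 10-fold series.
    """
    return plaque_count / (dilution * inoculum_ml)


def inoculum_volume_ml(moi, cells_at_infection, titer):
    """Volume of virus stock (mL) that delivers the requested MOI."""
    return moi * cells_at_infection / titer


# Hypothetical count: 30 plaques in the well that received the 10^-4 dilution.
titer = titer_pfu_per_ml(plaque_count=30, dilution=1e-4)
print(f"{titer:.3g} PFU/mL")

# Hypothetical plating: 1e5 cells plated, assumed to double once before infection.
cells = 2 * 1e5
volume = inoculum_volume_ml(moi=0.5, cells_at_infection=cells, titer=titer)
print(f"{volume * 1000:.1f} µL of stock per well")
```

Doubling the plated cell number mirrors the assumption in the protocol that cells have time to divide once between plating and infection.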

DGAT2 siRNA knock-down

A549 cells were first transduced with siRNA "immune" plasmids (described above) as follows. Each pLPCX-DGAT2 construct was transfected into HEK 293T cells with plasmids expressing NB-tropic murine leukemia virus (MLV) Gag-Pol and VSV-G. Supernatants were collected and used to transduce 10^5 A549 cells in the presence of 10 µg/mL polybrene. 24 hours post transduction, cells were selected in 0.75 µg/mL puromycin. Successful transduction was confirmed by western blotting for HA. These cell lines were then plated out in 12-well dishes with F-12K antibiotic-free media with 10% FBS. 30 pmol/well of a DGAT2-targeting siRNA with the sequence 5' CTACTTCACTTGGCTGGTG 3' (Horizon Discovery/Dharmacon) was transfected using Invitrogen Lipofectamine RNAiMAX. As a negative control, cells were mock transfected (RNAiMAX, no siRNA) or transfected with anti-GFP siRNA. 48 hours after transfection, cells were infected with DENV2 as described above.

CRISPR-Cas9 mediated disruption of DGAT2

A549 cells were transfected with the pSpCas9(BB)-2A-Puro (PX459) V2.0 plasmid with the guide RNA sequence 5' CACCGACGGCCTTACCTGGCTACAC 3'. Cells were then treated with 0.75 µg/mL puromycin in F-12K media. Surviving A549 cells were diluted to single cells and grown up as homogeneous populations. A549 cells have three alleles of DGAT2 due to chromosomal abnormalities. Clonal populations were screened for cells in which the coding sequence was disrupted in all three DGAT2 alleles. The region surrounding the guide RNA was amplified using the following primers: 5' AGGAACAAGGGCAAACATTG 3' and 5' CACCAAGGAATGGTGTGTTG 3'. Amplified genomic DNA was Sanger sequenced to determine the nature of the CRISPR-Cas9-mediated genomic disruption. A cell line with confirmed disruption of all three DGAT2 alleles was then re-complemented with HA-tagged DGAT2, either the wild-type sequence (cleavable) or the mutant sequence (non-cleavable). These DGAT2 sequences were packaged into retroviral particles by cotransfecting each pLPCX-DGAT2 construct into 293T cells with plasmids expressing NB-tropic murine leukemia virus (MLV) Gag-Pol and VSV-G. Supernatants were collected and used to transduce 10^5 A549 cells in the presence of 10 µg/mL polybrene. 24 hours post transduction, cells were selected in 0.75 µg/mL puromycin. Successful transduction was confirmed by western blotting for HA.

Chemical inhibition of DGAT2

The chemical inhibitor of DGAT2, PF-06424439, was ordered from Tocris (Cat. No. 6348) and diluted in DMSO to a concentration of 50 mM. Dilutions were made in DMSO, then mixed with antibiotic-free DMEM supplemented with 10% FBS and L-glutamine to the desired final concentrations. Inhibitor-spiked media, or DMSO-spiked media, was used as the replacement media after a 1-hour dengue adsorption period.

qPCR quantification of dengue virus RNA genome

RNA was collected from infected or mock-infected cells using the QIAGEN RNeasy Plus Mini Kit. Parallel cDNA libraries were created using oligo dT primers or a DENV2-specific primer (5' CATTCCATTTTCTGGCGTTCT 3'), using the SuperScript IV Reverse Transcriptase kit from Thermo Fisher Scientific. The host RNA GAPDH was quantified with GAPDH-specific primers (5' TGATGGCATGGACTGTGGTC 3' and 5' TGCTGGGGTCTAGGCTGTTT 3') and dengue virus genomic RNA was quantified using DENV2-specific primers (5' ACAAGTCGAACAACCTGGTCCAT 3' and 5' GCCGCACCATTGGTCTTCTC 3'). Ct values from DENV2 amplification were normalized to GAPDH amplification Ct values for each sample, and the fold change from untreated to treated condition or from wild-type to mutant condition was calculated. All qPCR reactions were done in triplicate per sample, and almost every qPCR-based experiment was performed with three experimental replicates.
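A normalization of this kind is commonly computed with the 2^-ΔΔCt method, sketched below. The text states only that DENV2 Ct values were normalized to GAPDH and that fold change was calculated, so the exact formula, the assumption of roughly 100% amplification efficiency, the averaging of technical triplicates, the function names, and the Ct values are all assumptions made for illustration.

```python
from statistics import mean


def delta_ct(denv_ct, gapdh_ct):
    """DENV2 Ct normalized to GAPDH Ct, using means of technical replicates."""
    return mean(denv_ct) - mean(gapdh_ct)


def fold_change(treated, control):
    """Relative DENV2 RNA (treated versus control) by the 2^-ddCt method.

    Each argument is a (denv_ct_values, gapdh_ct_values) pair.
    """
    ddct = delta_ct(*treated) - delta_ct(*control)
    return 2 ** (-ddct)


# Hypothetical triplicate Ct values.
control = ([20.0, 20.1, 19.9], [18.0, 18.0, 18.0])
treated = ([22.0, 22.1, 21.9], [18.0, 18.0, 18.0])
print(round(fold_change(treated, control), 3))
```

In this made-up example the treated sample crosses threshold two cycles later than the control at equal GAPDH, which corresponds to about a four-fold reduction in DENV2 RNA.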

Subcellular fractionation

Infection or transfection of Huh7 cells was performed in 10 cm plates. For transfection of Huh7 cells, 2000 ng of plasmid DNA was transfected using FuGENE Transfection Reagent (Promega). After 24 hours, cells were trypsinized, the trypsin was blocked with DMEM supplemented with 10% FBS, and tubes were spun at 4 °C in a swinging bucket rotor at 1000 rpm for 3 minutes. The pellet was washed with 5 ml of 4 °C PBS, then spun at 1000 rpm for 3 minutes. 1 ml of 4 °C lysis buffer (T20, EDTA, PBS, protease inhibitor, and phosphatase inhibitor) was added, and the pellet was resuspended fully and then incubated on ice for 20 minutes. The solution was dounced with a stainless steel dounce homogenizer ten times. The sample was then centrifuged in 15 ml conicals at 2000 rpm for 5 minutes. The supernatant was transferred to new tubes and centrifuged at 4000 rpm for 10 minutes, while the nuclear pellet was resuspended in PBS and 4X SDS buffer (16% SDS, 40 mM TCEP, 160 mM ClAA, 200 mM Tris pH 8.5) was added. Supernatant from centrifuged samples was transferred to new tubes, and the large ER pellet was resuspended as above. Supernatant was centrifuged in a fixed angle rotor at max speed for 10 minutes. Supernatant (cytosolic fraction) was decanted and 4X SDS buffer was added. 4X SDS buffer was also added to the pellets after they had been resuspended (small ER fraction). Samples were then used for western blotting or mass spectrometry analysis.

Chapter 5

Conclusion

Over the course of my dissertation research I have become extremely interested in the family of viruses called flaviviruses. As a molecular biologist, I find the number of undescribed interactions between flavivirus proteins and host proteins both overwhelming and incredibly exciting. There is much to discover about how flaviviruses remodel the cellular environment to enhance replication, a process which clearly involves an entire repertoire of diverse mechanisms. From the perspective of a virologist, flaviviruses are an interesting group of viruses that have similar genetic structures and replication models, with subtle variations that may determine which are highly infectious in humans, which are transmitted by mosquito versus by tick, and which are likely to become dangerous epidemic viruses in the way Zika became epidemic in 2015.

Throughout my dissertation work I studied how the proteases of multiple flaviviruses interact with host proteins, including the immune protein STING and the lipid metabolism enzyme DGAT2. I demonstrated that proteases from mosquito-borne and tick-borne flaviviruses cleave STING (although the feature is not completely conserved in the family) but that this cleavage does not necessarily dampen the innate immune response, as our original model proposed. I showed that DGAT2 is cleaved by every flavivirus protease tested, and that losing this ability to cleave DGAT2 inhibits dengue virus replication.

As the climate warms and grows wetter, the geographic range of mosquitoes is likely to increase. With mosquitoes come mosquito-borne diseases: dengue, Zika, malaria and more. Anthropogenic climate change is no longer something we can prevent; we must turn to the problem of adaptation.

Dengue has been a problem for centuries in tropical areas, but soon an even greater swath of the world will have to confront it. Likewise, other mosquito-borne flaviviruses (and most likely, some tick-borne flaviviruses) will reach new populations. We must continue studying the diseases that will only become more devastating in the coming years.

Bibliography

[1] Sebastian Aguirre, Ana M. Maestre, Sarah Pagni, Jenish R. Patel, Timothy Savage, Delia Gutman, Kevin Maringer, Dabeiba Bernal-Rubio, Reed S. Shabman, Viviana Simon, Juan R. Rodriguez-Madoz, Lubbertus C. F. Mulder, Glen N. Barber, and Ana Fernandez-Sesma. DENV Inhibits Type I IFN Production in Infected Cells by Cleaving Human STING. PLoS Pathogens, 8(10):e1002934, 2012.

[2] Griffiths AJF, Miller JH, and Suzuki DT. An Introduction to Genetic Analysis. W. H. Freeman, 2000.

[3] Carlos F Arias, Frank Preugschat, and James H Strauss. Dengue 2 Virus NS2B and NS3 Form a Stable Complex That Can Cleave NS3 within the Helicase Domain. Virology, 193(2):888–899, 1993.

[4] J Ashour, M Laurent-Rolle, P Y Shi, and A Garcia-Sastre. NS5 of Dengue Virus Mediates STAT2 Binding and Degradation. Journal of Virology, 83(11):5408–5418, 2009.

[5] Glen N Barber. STING-dependent signaling. Nature Immunology, 12(10):929–930, 2011.

[6] Sonja M. Best. Flaviviruses. Current Biology, 26(24):R1258–R1260, 2016.

[7] Samir Bhatt, Peter W. Gething, Oliver J. Brady, Jane P. Messina, Andrew W. Farlow, Catherine L. Moyes, John M. Drake, John S. Brownstein, Anne G. Hoen, Osman Sankoh, Monica F. Myers, Dylan B. George, Thomas Jaenisch, G. R. William Wint, Cameron P. Simmons, Thomas W. Scott, Jeremy J. Farrar, and Simon I. Hay. The global distribution and burden of dengue. Nature, 496(7446):504–507, 2013.

[8] Shibadas Biswal, Humberto Reynales, Xavier Saez-Llorens, Pio Lopez, Charissa Borja-Tabora, Pope Kosalaraksa, Chukiat Sirivichayakul, Veerachai Watanaveeradej, Luis Rivera, Felix Espinoza, LakKumar Fernando, Reynaldo Dietze, Kleber Luz, Rivaldo Venâncio da Cunha, José Jimeno, Eduardo López-Medina, Astrid Borkowski, Manja Brose, Martina Rauscher, Inge LeFevre, Svetlana Bizjajeva, Lulu Bravo, and Derek Wallace. Efficacy of a Tetravalent Dengue Vaccine in Healthy Children and Adolescents. New England Journal of Medicine, 381(21):2009–2019, 2019.

[9] Dara L Burdette and Russell E Vance. STING and the innate immune response to nucleic acids in the cytosol. Nature Immunology, 14(1):19–26, 2012.

[10] Jose Luis Slon Campos, Juthathip Mongkolsapaya, and Gavin R. Screaton. The immune response against flaviviruses. Nature Immunology, 19(11):1189–1198, 2018.

[11] Lewis Carroll. Through the Looking-Glass. Macmillan, 1871.

[12] Sylvaine Cases, Scot Stone, Ping Zhou, Eric Yen, Bryan Tow, Kathryn D. Lardizabal, Toni Voelker, and Robert V. Farese. Cloning of DGAT2, a second mammalian diacylglycerol acyltransferase, and related family members. Journal of Biological Chemistry, 276(42):38870–38876, 2001.

[13] T J Chambers, R C Weir, A Grakoui, D W McCourt, J F Bazan, R J Fletterick, and C M Rice. Evidence that the N-terminal domain of nonstructural protein NS3 from yellow fever virus is a serine protease responsible for site-specific cleavages in the viral polyprotein. Proceedings of the National Academy of Sciences, 87(22):8898–8902, 1990.

[14] Chandramohan Chitraju, Tobias C. Walther, and Robert V. Farese. The triglyceride synthesis enzymes DGAT1 and DGAT2 have distinct and overlapping functions in adipocytes. Journal of Lipid Research, 60(6):1112–1120, 2019.

[15] Nunya Chotiwan, Barbara G. Andre, Irma Sanchez-Vargas, M. Nurul Islam, Jeffrey M. Grabowski, Amber Hopf-Jannasch, Erik Gough, Ernesto Nakayasu, Carol D. Blair, John T. Belisle, Catherine A. Hill, Richard J. Kuhn, and Rushika Perera. Dynamic remodeling of lipids coincides with dengue virus replication in the midgut of Aedes aegypti mosquitoes. PLOS Pathogens, 14(2):e1006853, 2018.

[16] Charles Darwin. On the origin of species by means of natural selection, or, The preservation of favoured races in the struggle for life. London, John Murray, 1859.

[17] Qiang Ding, Xuezhi Cao, Jie Lu, Bing Huang, Yong-Jun Liu, Nobuyuki Kato, Hong-Bing Shu, and Jin Zhong. Hepatitis C virus NS4B blocks the interaction of STING and TBK1 to evade host innate immunity. Journal of Hepatology, 59(1):52–58, 2013.

[18] Qiang Ding, Jenna M Gaska, Florian Douam, Lei Wei, David Kim, Metodi Balev, Brigitte Heller, and Alexander Ploss. Species-specific disruption of STING-dependent antiviral cellular defenses by the Zika virus NS2B3 protease. Proceedings of the National Academy of Sciences, 115(27):E6310–E6318, 2018.

[19] Thomas Oliver Eichmann and Achim Lass. DAG tales: the multiple faces of diacylglycerol—stereochemistry, metabolism, and signaling. Cellular and Molecular Life Sciences, 72(20):3931–3952, 2015.

[20] Michael Emerman and Harmit S. Malik. Paleovirology—Modern Consequences of Ancient Viruses. PLoS Biology, 8(2):e1000301, 2010.

[21] David Enard and Dmitri Petrov. Ancient RNA virus epidemics through the lens of recent adaptation in human genomes. bioRxiv, page 2020.03.18.997346, 2020.

[22] B Falgout, M Pethel, Y M Zhang, and C J Lai. Both nonstructural proteins NS2B and NS3 are required for the proteolytic processing of dengue virus nonstructural proteins. Journal of virology, 65(5):2467–75, 1991.

[23] Volker Fensterl, Saurabh Chattopadhyay, and Ganes C Sen. No Love Lost Between Viruses and Interferons. Annual Review of Virology, 2(1):549–572, 2015.

[24] Kentaro Futatsugi, Daniel W Kung, Suvi T M Orr, Shawn Cabral, David Hepworth, Gary Aspnes, Scott Bader, Jianwei Bian, Markus Boehm, Philip A Carpino, Steven B Coffey, Matthew S Dowling, Michael Herr, Wenhua Jiao, Sophie Y Lavergne, Qifang Li, Ronald W Clark, Derek M Erion, Kou Kou, Kyuha Lee, Brandon A Pabst, Sylvie M Perez, Julie Purkal, Csilla C Jorgensen, Theunis C Goosen, James R Gosset, Mark Niosi, John C Pettersen, Jeffrey A Pfefferkorn, Kay Ahn, and Bryan Goodwin. Discovery and Optimization of Imidazopyridine-Based Inhibitors of Diacylglycerol Acyltransferase 2 (DGAT2). Journal of Medicinal Chemistry, 58(18):7173–7185, 2015.

[25] M Gaster, A C Rustan, and H Beck-Nielsen. Differential Utilization of Saturated Palmitate and Unsaturated Oleate: Evidence From Cultured Myotubes. Diabetes, 54(3):648–656, 2005.

[26] Magdalena Gayà-Vidal and M Mar Albà. Uncovering adaptive evolution in the human lineage. BMC Genomics, 15(1):599, 2014.

[27] Angela M Green, P Robert Beatty, Alexandros Hadjilaou, and Eva Harris. Innate Immunity to Dengue Virus Infection and Subversion of Antiviral Responses. Journal of Molecular Biology, 426(6):1148–1160, 2014.

[28] Rebekah C. Gullberg, J. Jordan Steel, Venugopal Pujari, Joel Rovnak, Dean C. Crick, and Rushika Perera. Stearoly-CoA desaturase 1 differentiates early and advanced dengue virus infections and determines virus particle infectivity. PLOS Pathogens, 14(8):e1007261, 2018.

[29] Dustin C Hancks, Melissa K Hartley, Celia Hagan, Nathan L Clark, and Nels C Elde. Overlapping Patterns of Rapid Evolution in the Nucleic Acid Sensors cGAS and OAS1 Suggest a Common Mechanism of Pathogen Antagonism and Escape. PLoS Genetics, 11(5):e1005203, 2015.

[30] Nicholas S. Heaton, Rushika Perera, Kristi L. Berger, Sudip Khadka, Douglas J. LaCount, Richard J. Kuhn, and Glenn Randall. Dengue virus nonstructural protein 3 redistributes fatty acid synthase to sites of viral replication and increases cellular fatty acid synthesis. Proceedings of the National Academy of Sciences, 107(40):17345–17350, 2010.

[31] Nicholas S. Heaton and Glenn Randall. Dengue Virus-Induced Autophagy Regulates Lipid Metabolism. Cell Host & Microbe, 8(5):422–432, 2010.

[32] Christian K Holm, Stine H Rahbek, Hans Henrik Gad, Rasmus O Bak, Martin R Jakobsen, Zhaozaho Jiang, Anne Louise Hansen, Simon K Jensen, Chenglong Sun, Martin K Thomsen, Anders Laustsen, Camilla G Nielsen, Kasper Severinsen, Yingluo Xiong, Dara L Burdette, Veit Hornung, Robert Jan Lebbink, Mogens Duch, Katherine A Fitzgerald, Shervin Bahrami, Jakob Giehm Mikkelsen, Rune Hartmann, and Søren R Paludan. Influenza A virus targets a cGAS-independent STING pathway that controls enveloped RNA viruses. Nature Communications, 7(1):10680, 2016.

[33] Veit Hornung. SnapShot: Nucleic Acid Immune Sensors, Part 1. Immunity, 41(5):868–868.e1, 2014.

[34] Veit Hornung. SnapShot: Nucleic Acid Immune Sensors, Part 2. Immunity, 41(6):1066–1066.e1, 2014.

[35] A. L. Hughes. The evolution of the type I interferon gene family in mammals. Journal of Molecular Evolution, 41(5):539–548, 1995.

[36] Hiroki Ishikawa and Glen N Barber. STING is an endoplasmic reticulum adaptor that facilitates innate immune signalling. Nature, 455(7213):674–678, 2008.

[37] Hiroki Ishikawa, Zhe Ma, and Glen N Barber. STING regulates intracellular DNA-mediated, type I interferon-dependent innate immunity. Nature, 461(7265):788–792, 2009.

[38] L Jin, P M Waterman, K R Jonscher, C M Short, N A Reisdorph, and J C Cambier. MPYS, a Novel Membrane Tetraspanner, Is Associated with Major Histocompatibility Complex Class II and Mediates Transduction of Apoptotic Signals. Molecular and Cellular Biology, 28(16):5014–5026, 2008.

[39] L Jin, L-g Xu, I V Yang, E J Davidson, D A Schwartz, M M Wurfel, and J C Cambier. Identification and characterization of a loss-of-function human MPYS variant. Genes & Immunity, 12(4):263–269, 2011.

[40] Wei Jin, Dong-Dong Wu, Xin Zhang, David M. Irwin, and Ya-Ping Zhang. Positive Selection on the Gene RNASEL: Correlation between Patterns of Evolution and Function. Molecular Biology and Evolution, 29(10):3161–3168, 2012.

[41] Welkin E. Johnson. Endless Forms Most Viral. PLoS Genetics, 6(11):e1001210, 2010.

[42] Melissa Kane, Trinity M Zang, Suzannah J Rihn, Fengwen Zhang, Tonya Kueck, Mudathir Alim, John Schoggins, Charles M Rice, Sam J Wilson, and Paul D Bieniasz. Identification of Interferon-Stimulated Genes with Antiretroviral Activity. Cell Host & Microbe, 20(3):392– 405, 2016.

[43] K. J. Karczewski, L. C. Francioli, G. Tiao, B. B. Cummings, J. Alf¨oldi, Q. Wang, R. L. Collins, K. M. Laricchia, A. Ganna, D. P. Birnbaum, L. D. Gauthier, H. Brand, M. Solomonson, N. A. Watts, D. Rhodes, M. Singer-Berk, E. M. England, E. G. Seaby, J. A. Kosmicki, R. K. . . Walters, and D. G. MacArthur. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature, 2020.

[44] Taro Kawai and Shizuo Akira. The roles of TLRs, RLRs and NLRs in pathogen recognition. International Immunology, 21(4):317–337, 2009.

[45] Julie A Kerns, Michael Emerman, and Harmit S Malik. Positive Selection and Increased Antiviral Activity Associated with the PARP-Containing Isoform of Human Zinc-Finger An- tiviral Protein. PLoS Genetics, 4(1):e21, 2008.

[46] Asif M. Khan, Olivo Miotto, Eduardo J. M. Nascimento, K. N. Srinivasan, A. T. Heiny, Guang Lan Zhang, E. T. Marques, Tin Wee Tan, Vladimir Brusic, Jerome Salmon, and J. Thomas August. Conservation and Variability of Dengue Virus Proteins: Implications for Vaccine Design. PLoS Neglected Tropical Diseases, 2(8):e272, 2008.

[47] Kristin M. Kohler, Miriam Kutsch, Anthony S. Piro, Graham Wallace, J¨orn Coers, and Matthew F. Barber. A rapidly evolving polybasic motif modulates bacterial detection by guanylate binding proteins. bioRxiv, page 689554, 2019. 106

[48] Goro Kuno, Gwong-Jen J Chang, K Richard Tsuchiya, Nick Karabatsos, and C Bruce Cropp. Phylogeny of the Genus Flavivirus. Journal of Virology, 72(1):73–83, 1998.

[49] Nicholas J. Lennemann and Carolyn B. Coyne. Dengue and Zika viruses subvert reticulophagy by NS2B3-mediated cleavage of FAM134B. Autophagy, 13(2):00–00, 2017.

[50] Jun Li, Siew Pheng Lim, David Beer, Viral Patel, Daying Wen, Christine Tumanut, David C Tully, Jennifer A Williams, Jan Jiricek, John P Priestle, Jennifer L Harris, and Subhash G Vasudevan. Functional Profiling of Recombinant NS3 Proteases from All Four Serotypes of Dengue Virus Using Tetrapeptide and Octapeptide Substrate Libraries. Journal of Biological Chemistry, 280(31):28766–28774, 2005.

[51] Melody M H Li, Margaret R MacDonald, and Charles M Rice. To translate, or not to translate: viral and host mRNA regulation by interferon-stimulated genes. Trends in cell biology, 25(6):320–9, 2015.

[52] E S Lim, H S Malik, and M Emerman. Ancient Adaptive Evolution of Tetherin Shaped the Functions of Vpu and Nef in Human Immunodeficiency Virus and Primate Lentiviruses. Journal of Virology, 84(14):7124–7134, 2010.

[53] C Lin, S M Amberg, T J Chambers, and C M Rice. Cleavage at a novel site in the NS4A region by the yellow fever virus NS2B-3 proteinase is a prerequisite for processing at the downstream 4A/4B signalase site. Journal of virology, 67(4):2327–35, 1993.

[54] M Lobigs. Flavivirus premembrane protein cleavage and spike heterodimer secretion require the function of the viral proteinase NS3. Proceedings of the National Academy of Sciences, 90(13):6218–6222, 1993.

[55] Jason M. Mackenzie, Malcolm K. Jones, and Paul R. Young. Immunolocalization of the Dengue Virus Nonstructural Glycoprotein NS1 Suggests a Role in Viral RNA Replication. Virology, 220(1):232–240, 1996.

[56] Katherine Macrae, Clare Stretton, Christopher Lipina, Agnieszka Blachnio-Zabielska, Marcin Baranowski, Jan Gorski, Anna Marley, and Harinder S. Hundal. Defining the role of DAG, mitochondrial function, and lipid deposition in palmitate-induced proinflammatory signaling and its counter-modulation by palmitoleate. Journal of Lipid Research, 54(9):2366–2378, 2013.

[57] Ray Malfavon-Borja, Sara L Sawyer, Lily I Wu, Michael Emerman, and Harmit S Malik. An Evolutionary Screen Highlights Canonical and Noncanonical Candidate Antiviral Genes within the Primate TRIM Gene Family. Genome Biology and Evolution, 5(11):2141–2154, 2013.

[58] J´er´emy Manry, Guillaume Laval, Etienne Patin, Simona Fornarino, Yuval Itan, Matteo Fu- magalli, Manuela Sironi, Magali Tichit, Christiane Bouchier, Jean-Laurent Casanova, Luis B Barreiro, and Lluis Quintana-Murci. Evolutionary genetic dissection of human interferons. The Journal of Experimental Medicine, 208(13):2747–2759, 2011.

[59] Ross M McBee, Shea A Rozmiarek, Nicholas R Meyerson, Paul A Rowley, and Sara L Sawyer. The E↵ect of Species Representation on the Detection of Positive Selection in Primate Gene Data Sets. Molecular Biology and Evolution, 32(4):1091–1096, 2015. 107

[60] A McLysaght, P F Baldi, and B S Gaut. Extensive gene gain associated with adaptive evolution of poxviruses. Proceedings of the National Academy of Sciences, 100(26):15655– 15660, 2003.

[61] Finlay McNab, Katrin Mayer-Barber, Alan Sher, Andreas Wack, and Anne O’Garra. Type I interferons in infectious disease. Nature Reviews Immunology, 15(2):87–103, 2015.

[62] Carlos Fernando Odir Rodrigues Melo, Jeany Delafiori, Mohamad Ziad Dabaja, Diogo Noin de Oliveira, Tatiane Melina Guerreiro, Tatiana Elias Colombo, Maur´ıcio Lacerda Nogueira, Jose Luiz Proenca-Modena, and Rodrigo Ramos Catharino. The role of lipids in the incep- tion, maintenance and complications of dengue virus infection. Scientific Reports, 8(1):11826, 2018.

[63] Emily V Mesev, Robert A LeDesma, and Alexander Ploss. Decoding type I and III interferon signalling during viral infection. Nature Microbiology, 4(6):914–924, 2019.

[64] Jane P. Messina, Oliver J. Brady, Nick Golding, Moritz U. G. Kraemer, G. R. William Wint, Sarah E. Ray, David M. Pigott, Freya M. Shearer, Kimberly Johnson, Lucas Earl, Laurie B. Marczak, Shreya Shirude, Nicole Davis Weaver, Marius Gilbert, Raman Velayudhan, Peter Jones, Thomas Jaenisch, Thomas W. Scott, Robert C. Reiner, and Simon I. Hay. The current and future global distribution and population at risk of dengue. Nature Microbiology, 4(9):1508–1515, 2019.

[65] Nicholas R Meyerson and Sara L Sawyer. Two-stepping through time: mammals and viruses. Trends in microbiology, 19(6):286–94, 2011.

[66] Sven Miller, Stefan Kastner, Jacomine Krijnse-Locker, Sandra B¨uhler, and Ralf Barten- schlager. The Non-structural Protein 4A of Dengue Virus Is an Integral Membrane Protein Inducing Membrane Alterations in a 2K-regulated Manner. Journal of Biological Chemistry, 282(12):8873–8882, 2007.

[67] Patrick S Mitchell, Corinna Patzina, Michael Emerman, Otto Haller, Harmit S Malik, and Georg Kochs. Evolution-Guided Identification of Antiviral Specificity Determinants in the Broadly Acting Interferon-Induced Innate Immunity Factor MxA. Cell Host & Microbe, 12(4):598–604, 2012.

[68] Christopher Monit, Elizabeth R. Morris, Christopher Ruis, Bart Szafran, Grant Thilt- gen, Ming-Han Chloe Tsai, N. Avrion Mitchison, Kate N. Bishop, Jonathan P. Stoye, Ian A. Taylor, Ariberto Fassati, and Richard A. Goldstein. Positive selection in dNT- Pase SAMHD1 throughout mammalian evolution. Proceedings of the National Academy of Sciences, 116(37):18647–18654, 2019.

[69] Alessandra Mozzi, Chiara Pontremoli, Diego Forni, Mario Clerici, Uberto Pozzoli, Nereo Bresolin, Rachele Cagliani, and Manuela Sironi. OASes and STING: Adaptive Evolution in Concert. Genome Biology and Evolution, 7(4):1016–1032, 2015.

[70] Christopher Netherton, Katy Mo↵at, Elizabeth Brooks, and Thomas Wileman. A Guide to Viral Inclusions, Membrane Rearrangements, Factories, and Viroplasm Produced During Virus Replication. Advances in Virus Research, 70(Clin. Infect. Dis.451986):101–182, 2007. 108

[71] Sayuri Nitta, Naoya Sakamoto, Mina Nakagawa, Sei Kakinuma, Kako Mishima, Akiko Kusano-Kitazume, Kei Kiyohashi, Miyako Murakawa, Yuki Nishimura-Sakurai, Seishin Azuma, Megumi Tasaka-Fujita, Yasuhiro Asahina, Mitsutoshi Yoneyama, Takashi Fujita, and Mamoru Watanabe. Hepatitis C virus NS4B protein targets STING and abrogates RIG- I-mediated type I interferon-dependent innate immunity. Hepatology, 57(1):46–58, 2013.

[72] Konstantin Okonechnikov, Olga Golosova, and Mikhail Fursov. Unipro UGENE: a unified bioinformatics toolkit. Bioinformatics, 28(8):1166–1167, 2012.

[73] Maulik R Patel, Michael Emerman, and Harmit S Malik. Paleovirology—ghosts and gifts of viruses past. Current Opinion in Virology, 1(4):304–309, 2011.

[74] Maulik R Patel, Yueh-Ming Loo, Stacy M Horner, Michael Gale, and Harmit S Malik. Convergent evolution of escape from hepaciviral antagonism in primates. PLoS biology, 10(3):e1001282, 2012.

[75] Polina Perelman, Warren E Johnson, Christian Roos, Hector N Seu´anez, Julie E Horvath, Miguel A M Moreira, Bailey Kessing, Joan Pontius, Melody Roelke, Yves Rumpler, Maria Paula C Schneider, Artur Silva, Stephen J O’Brien, and Jill Pecon-Slattery. A molecular phylogeny of living primates. PLoS Genetics, 7(3):e1001342, 2011.

[76] Rushika Perera, Catherine Riley, Giorgis Isaac, Amber S. Hopf-Jannasch, Ronald J. Moore, Karl W. Weitz, Ljiljana Pasa-Tolic, Thomas O. Metz, Jiri Adamec, and Richard J. Kuhn. Dengue Virus Infection Perturbs Lipid Homeostasis in Infected Mosquito Cells. PLoS Pathogens, 8(3):e1002584, 2012.

[77] Brett E. Pickett, Eva L. Sadat, Yun Zhang, Jyothi M. Noronha, R. Burke Squires, Victoria Hunt, Mengya Liu, Sanjeev Kumar, Sam Zaremba, Zhiping Gu, Liwei Zhou, Christopher N. Larson, Jonathan Dietrich, Edward B. Klem, and Richard H. Scheuermann. ViPR: an open bioinformatics database and analysis resource for virology research. Nucleic Acids Research, 40(D1):D593–D598, 2012.

[78] F Preugschat, E M Lenches, and J H Strauss. Flavivirus enzyme-substrate interactions studied with chimeric proteinases: identification of an intragenic important for substrate recognition. Journal of Virology, 65(9):4749–4758, 1991.

[79] F Preugschat, C W Yao, and J H Strauss. In vitro processing of dengue virus type 2 non- structural proteins NS2A, NS2B, and NS3. Journal of virology, 64(9):4364–74, 1990.

[80] R E Randall and S Goodbourn. Interferons and viruses: an interplay between induction, sig- nalling, antiviral responses and virus countermeasures. Journal of General Virology, 89(1):1– 47, 2008.

[81] Irina Rusinova, Sam Forster, Simon Yu, Anitha Kannan, Marion Masse, Helen Cumming, Ross Chapman, and Paul J. Hertzog. INTERFEROME v2.0: an updated database of anno- tated interferon-regulated genes. Nucleic Acids Research, 41(D1):D1040–D1046, 2013.

[82] Wioletta Rut, Katarzyna Groborz, Linlin Zhang, Sylwia Modrzycka, Marcin Poreba, Rolf Hilgenfeld, and Marcin Drag. Profiling of flaviviral NS2B-NS3 protease specificity provides a structural basis for the development of selective chemical tools that di↵erentiate dengue from Zika and West Nile viruses. Antiviral Research, page 104731, 2020. 109

[83] Hanna Schenk, Hinrich Schulenburg, and Arne Traulsen. How long do Red Queen dynamics survive under genetic drift? A comparative analysis of evolutionary and eco-evolutionary models. BMC Evolutionary Biology, 20(1):8, 2020. [84] William M Schneider, Meike Dittmann Chevillotte, and Charles M Rice. Interferon-stimulated genes: a complex web of host defenses. Annual review of immunology, 32(1):513–45, 2014. [85] John W Schoggins. Interferon-stimulated genes: roles in viral pathogenesis. Current Opinion in Virology, 6:40–46, 2014. [86] John W Schoggins and Charles M Rice. Interferon-stimulated genes and their antiviral e↵ector functions. Current Opinion in Virology, 1(6):519–525, 2011. [87] John W Schoggins, Sam J Wilson, Maryline Panis, Mary Y Murphy, Christopher T Jones, Paul Bieniasz, and Charles M Rice. A diverse range of gene products are e↵ectors of the type I interferon antiviral response. Nature, 472(7344):481–485, 2011. [88] Sergey A. Shiryaev, Igor A. Kozlov, Boris I. Ratnikov, Je↵rey W. Smith, Michal Lebl, and Alex Y. Strongin. Cleavage preference distinguishes the two-component NS2B–NS3 serine proteinases of Dengue and West Nile viruses. Biochemical Journal, 401(3):743–752, 2007. [89] Chang Shu, Xin Li, and Pingwei Li. The mechanism of double-stranded DNA sensing through the cGAS-STING pathway. Cytokine & Growth Factor Reviews, 25(6):641–648, 2014. [90] Chang Shu, Guanghui Yi, Tylan Watts, C Cheng Kao, and Pingwei Li. Structure of STING bound to cyclic di-GMP reveals the mechanism of cyclic dinucleotide recognition by the immune system. Nature Structural & Molecular Biology, 19(7):722–724, 2012. [91] Ke Shuai and Bin Liu. Regulation of JAK–STAT signalling in the immune system. Nature Reviews Immunology, 3(11):900–911, 2003. [92] Peter Simmonds, Paul Becher, Jens Bukh, Ernest A. Gould, Gregor Meyers, Tom Monath, Scott Muerho↵, Alexander Pletnev, Rebecca Rico-Hesse, Donald B. Smith, Jack T. Stapleton, and ICTV Report Consortium. 
ICTV Virus Taxonomy Profile: Flaviviridae. Journal of General Virology, 98(1):2–3, 2017. [93] Manuela Sironi, Mara Biasin, Rachele Cagliani, Federica Gnudi, Irma Saulle, Salom`eIbba, Giulia Filippi, Sarah Yahyaei, Claudia Tresoldi, Stefania Riva, Daria Trabattoni, Luca De Gioia, Sergio Lo Caputo, Francesco Mazzotta, Diego Forni, Chiara Pontremoli, Juan Antonio Pineda, Uberto Pozzoli, Antonio Rivero-Juarez, Antonio Caruz, and Mario Clerici. Evolu- tionary Analysis Identifies an MX2 Haplotype Associated with Natural Resistance to HIV-1 Infection. Molecular Biology and Evolution, 31(9):2402–2414, 2014. [94] Manuela Sironi, Rachele Cagliani, Diego Forni, and Mario Clerici. Evolutionary insights into host–pathogen interactions from mammalian sequence data. Nature Reviews Genetics, 16(4):224–236, 2015. [95] Hortense Slevogt, Solveig Zabel, Bastian Opitz, Andreas Hocke, Julia Eitel, Philippe D N’Guessan, Lothar Lucka, Kristian Riesbeck, Wolfgang Zimmermann, Janine Zweigner, Bet- tina Temmesfeld-Wollbrueck, Norbert Suttorp, and Bernhard B Singer. CEACAM1 inhibits Toll-like receptor 2–triggered antibacterial responses of human pulmonary epithelial cells. Nature Immunology, 9(11):1270–1278, 2008. 110

[96] E E Smith and H S Malik. The apolipoprotein L family of programmed cell death and immunity genes rapidly evolved in primates at discrete sites of host-pathogen interactions. Genome Research, 19(5):850–858, 2009. [97] Stuart Smith. The animal fatty acid synthase: one gene, one polypeptide, seven enzymes. The FASEB Journal, 8(15):1248–1259, 1994. [98] Jiangning Song, Fuyi Li, Andr´eLeier, Tatiana T Marquez-Lago, Tatsuya Akutsu, Gholam- reza Ha↵ari, Kuo-Chen Chou, Geo↵rey I Webb, and Robert N Pike. PROSPERous: high- throughput prediction of substrate cleavage sites for 90 proteases with improved accuracy. Bioinformatics, 34(4):684–687, 2017. [99] Alex C Stabell, Nicholas R Meyerson, Rebekah C Gullberg, Alison R Gilchrist, Kristofor J Webb, William M Old, Rushika Perera, and Sara L Sawyer. Dengue viruses cleave STING in humans but not in nonhuman primates, their presumed natural reservoir. eLife, 7:e31919, 2018. [100] Alexander C. Stabell. Host and Viral Molecular Patters Utilized for Pathogenicity and Immunity. PhD thesis, University of Colorado at Boulder, 2017. [101] Jacob T Stanley, Alison R Gilchrist, Alex C Stabell, Mary A Allen, Sara L Sawyer, and Robin D Dowell. Two-stage ML Classifier for Identifying Host Protein Targets of the Dengue Protease. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing, 25:487– 498, 2020. [102] Chan-I Su, Yu-Ting Kao, Chao-Chen Chang, Yao Chang, Tzong-Shiann Ho, H. Sunny Sun, Yi-Ling Lin, Michael M. C. Lai, Yu-Huei Liu, and Chia-Yi Yu. DNA-induced 2 3-cGAMP enhances haplotype-specific human STING cleavage by dengue protease. Proceedings of the National Academy of Sciences, 117(27):15947–15954, 2020. [103] Bo Sun, Karin B. Sundstr¨om, Jun Jie Chew, Pradeep Bist, Esther S. Gan, Hwee Cheng Tan, Kenneth C. Goh, Tanu Chawla, Choon Kit Tang, and Eng Eong Ooi. Dengue virus activates cGAS through the release of mitochondrial DNA. Scientific Reports, 7(1):3594, 2017. [104] L Sun, J Wu, F Du, X Chen, and Z J Chen. 
Cyclic GMP-AMP Synthase Is a Cytosolic DNA Sensor That Activates the Type I Interferon Pathway. Science, 339(6121):786–791, 2012. [105] W Sun, Y Li, L Chen, H Chen, F You, X Zhou, Y Zhou, Z Zhai, D Chen, and Z Jiang. ERIS, an endoplasmic reticulum IFN stimulator, activates innate immune signaling through dimerization. Proceedings of the National Academy of Sciences, 106(21):8653–8658, 2009. [106] Mikita Suyama, David Torrents, and Peer Bork. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Research, 34(suppl 2):W609–W612, 2006. [107] Mark T Swanson, Carl H Oliveros, and Jacob A Esselstyn. A phylogenomic rodent tree reveals the repeated evolution of masseter architectures. Proceedings of the Royal Society B: Biological Sciences, 286(1902):20190672, 2019. [108] Koichiro Tamura, Glen Stecher, Daniel Peterson, Alan Filipski, and Sudhir Kumar. MEGA6: Molecular Evolutionary Genetics Analysis Version 6.0. Molecular Biology and Evolution, 30(12):2725–2729, 2013. 111

[109] Jianli Tao, Xiang Zhou, and Zhengfan Jiang. cGAS-cGAMP-STING: The three musketeers of cytosolic DNA sensing and signaling. IUBMB Life, 68(11):858–870, 2016.

[110] K F Teo and P J Wright. Internal proteolysis of the NS3 protein specified by dengue virus 2. Journal of General Virology, 78(2):337–341, 1997.

[111] Indira Umareddy, Alex Chao, Aruna Sampath, Feng Gu, and Subhash G. Vasudevan. Dengue virus NS4B interacts with NS3 and dissociates it from single-stranded RNA. Journal of General Virology, 87(9):2605–2614, 2006.

[112] Indira Umareddy, Olivier Pluquet, Qing Yin Wang, Subhash G Vasudevan, Eric Chevet, and Feng Gu. Dengue virus serotype infection specifies the activation of the unfolded protein response. Virology Journal, 4(1):91, 2007.

[113] Leigh Van Valen. A new evolutionary law. Evolution Theory, 1973.

[114] Robin van der Lee, Laurens Wiel, Teunis J.P. van Dam, and Martijn A. Huynen. Genome- scale detection of positive selection in nine primates predicts human-virus evolutionary con- flicts. Nucleic Acids Research, 45(18):gkx704–, 2017.

[115] Nikos Vasilakis and Scott C. Weaver. The History and Evolution of Human Dengue Emer- gence. volume 72 of Advances in Virus Research, pages 1–76. 2008.

[116] Estelle Vasseur, Michele Boniotto, Etienne Patin, Guillaume Laval, H´el`ene Quach, Jeremy Manry, Brigitte Crouau-Roy, and Lluis Quintana-Murci. The Evolutionary Landscape of Cytosolic Microbial Sensors in Humans. The American Journal of Human Genetics, 91(1):27– 37, 2012.

[117] Valerie A. Villareal, Mary A. Rodgers, Deirdre A. Costello, and Priscilla L. Yang. Targeting host lipid synthesis and metabolism to inhibit dengue and hepatitis C viruses. Antiviral Research, 124:110–121, 2015.

[118] Sonja Welsch, Sven Miller, Ines Romero-Brey, Andreas Merz, Christopher K E Bleck, Paul Walther, Stephen D Fuller, Claude Antony, Jacomine Krijnse-Locker, and Ralf Barten- schlager. Composition and Three-Dimensional Architecture of the Dengue Virus Replication and Assembly Sites. Cell Host & Microbe, 5(4):365–375, 2009.

[119] Florian Wimmers, Nikita Subedi, Nicole van Buuringen, Daan Heister, Judith Vivi´e, Inge Beeren-Reinieren, Rob Woestenenk, Harry Dolstra, Aigars Piruska, Joannes F. M. Jacobs, Alexander van Oudenaarden, Carl G. Figdor, Wilhelm T. S. Huck, I. Jolanda M. de Vries, and Jurjen Tel. Single-cell analysis reveals that stochasticity and paracrine signaling con- trol interferon-alpha production by plasmacytoid dendritic cells. Nature Communications, 9(1):3317, 2018.

[120] G Wlasiuk and M W Nachman. Adaptation and Constraint at Toll-Like Receptors in Pri- mates. Molecular Biology and Evolution, 27(9):2172–2186, 2010.

[121] Gabriela Wlasiuk, Soofia Khan, William M. Switzer, and Michael W. Nachman. A History of Recurrent Positive Selection at the Toll-Like Receptor 5 in Primates. Molecular Biology and Evolution, 26(4):937–949, 2009. 112

[122] Gabriela Wlasiuk and Michael W. Nachman. Promiscuity and the rate of molecular evolution at primate immunity genes. Evolution, 64(8):2204–2220, 2010.

[123] Christopher H. Woelk, Simon D.W. Frost, Douglas D. Richman, Prentice E. Higley, and Sergei L. Kosakovsky Pond. Evolution of the interferon alpha gene family in eutherian mammals. Gene, 397(1-2):38–50, 2007.

[124] Z Yang. PAML 4: Phylogenetic Analysis by Maximum Likelihood. Molecular Biology and Evolution, 24(8):1586–1591, 2007.

[125] Chi-Liang Eric Yen, Scot J. Stone, Suneil Koliwad, Charles Harris, and Robert V. Farese. The- matic Review Series: Glycerolipids. DGAT enzymes and triacylglycerol biosynthesis. Journal of Lipid Research, 49(11):2283–2301, 2008.

[126] Guanghui Yi, Volker P. Brendel, Chang Shu, Pingwei Li, Satheesh Palanathan, and C. Cheng Kao. Single Nucleotide Polymorphisms of Human STING Can A↵ect Innate Immune Re- sponse to Cyclic Dinucleotides. PLoS ONE, 8(10):e77846, 2013.

[127] Chia-Yi Yu, Tsung-Hsien Chang, Jian-Jong Lian, Ruei-Lin Chiang, and Yi-Ling Lee. Dengue Virus Targets the Adaptor Protein MITA to Subvert Host Innate Immunity. PLoS Pathogens, 8(6):e1002780, 2012.

[128] Chia-Yi Yu, Jian-Jong Liang, Jin-Kun Li, Yi-Ling Lee, Bi-Lan Chang, Chan-I Su, Wei-Jheng Huang, Michael M.C. Lai, and Yi-Leng Lin. Dengue Virus Impairs Mitochondrial Fusion by Cleaving Mitofusins. PLoS Pathogens, 11(12):e1005350, 2015.

[129] Jingshu Zhang, Yun Lan, and Sumana Sanyal. Modulation of Lipid Droplet Metabolism—A Potential Target for Therapeutic Intervention in Flaviviridae Infections. Frontiers in Microbiology, 8:2286, 2017.

[130] Liqing Zhang and Wen-Hsiung Li. Mammalian Housekeeping Genes Evolve More Slowly than Tissue-Specific Genes. Molecular Biology and Evolution, 21(2):236–239, 2004.

[131] Zhiqiang Zhang, Bin Yuan, Musheng Bao, Ning Lu, Taeil Kim, and Yong-Jun Liu. The helicase DDX41 senses intracellular DNA mediated by the adaptor STING in dendritic cells. Nature Immunology, 12(10):959–965, 2011.

[132] Bo Zhong, Yan Yang, Shu Li, Yan-Yi Wang, Ying Li, Feici Diao, Caoqi Lei, Xiao He, Lu Zhang, Po Tien, and Hong-Bing Shu. The Adaptor Protein MITA Links Virus-Sensing Receptors to IRF3 Transcription Factor Activation. Immunity, 29(4):538–550, 2008. Appendix A

Genes Analyzed in the Interferon Positive Selection Screen

Induction genes: The following genes were included in the "Induction" category as described in Chapter 2.

Table A.1
ID - Induction genes  Whole Gene dN/dS   2Δ(lnL)  p value  Reject M8a
AKIRIN2               0.0700              0.0003  9.9E-01
ATG5                  0.0630              0.2616  6.1E-01
CARD9                 0.0483              0.0063  9.4E-01
CASP10                0.9150             52.2728  4.8E-13  yes
CASP8                 0.3304              3.0818  7.9E-02
CD14                  0.4138              1.4626  2.3E-01
CD36                  0.3225              1.4128  2.3E-01
CHUK/IKKa             0.0909              0.0010  9.7E-01
CIITA                 0.2700              8.4731  3.6E-03  yes
CISH/CIS              0.2197              9.2954  2.3E-03  yes
CREBBP/CBP            0.0657              1.8502  1.7E-01
CTNNB1/B-catenin      0.0022              0.0016  9.7E-01
DDX1                  0.0331              0.0018  9.7E-01
DDX21                 0.2414              2.3803  1.2E-01
DDX3X                 0.0110              0.0005  9.8E-01
DDX41                 0.0151              0.1504  7.0E-01
DDX58/RIG-I           0.3424             26.3673  2.8E-07  yes
DDX60                 0.3048              7.1414  7.5E-03  yes
DHX36                 0.1037              0.5746  4.5E-01
DHX58/LGP2            0.1694              0.0754  7.8E-01
DHX9                  0.0577              2.8420  9.2E-02
EP300/p300            0.1260              0.0173  9.0E-01
EPOR                  0.2343              5.1661  2.3E-02  yes
EXOC2/sec5            0.0749              0.6531  4.2E-01
FADD                  0.1257              0.0020  9.6E-01
HMGB1                 0.0822              0.0002  9.9E-01
HMGB2                 0.1298              0.0014  9.7E-01
IFI16                 0.9906             47.3944  5.8E-12  yes
IFIH1/MDA5            0.2576              1.1263  2.9E-01
IFNAR1                0.5373              9.3921  2.2E-03  yes
IFNAR2                0.6334             16.8713  4.0E-05  yes
IKBKB/IKKb            0.0748              1.3433  2.5E-01
IKBKE/IKKe            0.0747              0.0686  7.9E-01
IKBKG/NEMO            0.0907              0.0319  8.6E-01
IRAK1                 0.2162              0.4751  4.9E-01
IRAK2                 0.2650              0.0670  8.0E-01
IRAK4                 0.1793              1.5165  2.2E-01
IRF1                  0.0882              1.4724  2.2E-01
IRF3                  0.2981              2.6909  1.0E-01
IRF5                  0.1002              1.5953  2.1E-01
IRF8                  0.1269              0.0410  8.4E-01
IRF9                  0.2191              0.5627  4.5E-01
ISG15                 0.1712              0.2474  6.2E-01
JAK1                  0.0427              0.0795  7.8E-01
JAK2                  0.0842              0.1176  7.3E-01
JAK3                  0.0782              4.3449  3.7E-02  yes
JUN/AP-1              0.0285              0.5282  4.7E-01
KAT2A/GCN5            0.0193              0.0285  8.7E-01
LRRFIP1               0.4667              0.0915  7.6E-01
LY96/MD2              0.3999              2.2003  1.4E-01
MAP3K7/tak1           0.0338              0.0015  9.7E-01
MAPK1/ERK2            0.0001              0.0000  1.0E+00
MAPK14/P38            0.0067              0.0002  9.9E-01
MAPK3/ERK1            0.0122              0.0005  9.8E-01
MAPK8/JNK             0.0248              0.0000  9.9E-01
MAVS/IPS-1/CARDIF     0.7411             22.3711  2.2E-06  yes
MB21D1/cGAS           0.8347             62.9534  2.1E-15  yes
MNDA                  0.9779             20.4509  6.1E-06  yes
MYD88                 0.0921              0.0000  1.0E+00
NFKB1/p50             0.1479              0.0003  9.9E-01
NFKB2                 0.0989              0.0435  8.3E-01
NFKBIA/IKBA           0.0328              0.1524  7.0E-01
NOD1                  0.1442              1.7412  1.9E-01
NOD2                  0.2131              3.0306  8.2E-02
OAS1                  1.0420             93.5793  3.9E-22  yes
OAS2                  0.5739             36.2962  1.7E-09  yes
OAS3                  0.3350              3.4669  6.3E-02
OASL                  0.3789              0.5259  4.7E-01
PIAS1                 0.0211              0.0006  9.8E-01
PIAS2/PIASX           0.0456              0.0009  9.8E-01
PIAS3                 0.0714              0.3518  5.5E-01
PIAS4/PIASY           0.0304              0.6474  4.2E-01
PRKCD                 0.0260              0.0015  9.7E-01
PRMT1                 0.0034              0.0020  9.6E-01
PTPN1/PTP1B           0.1509              1.6174  2.0E-01
PTPN11/SHP2           0.0139              0.0009  9.8E-01
PTPN2/TCPTP           0.1310              0.0029  9.6E-01
PTPN6/SHP1            0.0303              0.0085  9.3E-01
PTPRC                 0.7324            223.5897  1.5E-50  yes
PYHIN1/IFIX           0.8408             24.9545  5.9E-07  yes
RELA/p65              0.1053              1.9057  1.7E-01
RELB                  0.0514              1.5708  2.1E-01
RIPK1/RIP1            0.3745              2.2220  1.4E-01
RIPK2/RICK            0.2332              0.1376  7.1E-01
RNASEL                0.6365             45.6741  1.4E-11  yes
SOCS1                 0.0410              0.1699  6.8E-01
SOCS2                 0.1601              0.0061  9.4E-01
SOCS3                 0.0388              0.0031  9.6E-01
SOCS4                 0.1184              0.0014  9.7E-01
SOCS5                 0.0748              0.0609  8.1E-01
SOCS6                 0.0777              3.7759  5.2E-02
SPP1/OPN              0.3689             13.7453  2.1E-04  yes
SSR2/TRAPB            0.0459              0.1369  7.1E-01
STAT1                 0.0396              0.1289  7.2E-01
STAT2                 0.3195             16.1977  5.7E-05  yes
STAT3                 0.0038              0.0007  9.8E-01
STAT4                 0.1075              0.0563  8.1E-01
STAT5A                0.0363              0.6894  4.1E-01
STAT5B                0.0375              0.8534  3.6E-01
STAT6                 0.1671              2.1439  1.4E-01
TAB1                  0.0098              0.2344  6.3E-01
TAB2                  0.0543              0.0019  9.7E-01
TAB3                  0.1224              0.0020  9.6E-01
TBK1                  0.0637              0.0490  8.2E-01
TICAM1                0.2446              0.2959  5.9E-01
TICAM2                0.2991              2.0749  1.5E-01
TIRAP                 0.1990              3.5073  6.1E-02
TLR1                  0.4994             24.0502  9.4E-07  yes
TLR2                  0.5535             16.1628  5.8E-05  yes
TLR3                  0.2810              0.8524  3.6E-01
TLR4                  0.5889             40.4600  2.0E-10  yes
TLR5                  0.4667             25.4056  4.7E-07  yes
TLR6                  0.4457             17.3012  3.2E-05  yes
TLR7                  0.2398              4.8459  2.8E-02  yes
TLR8                  0.4080             42.9476  5.6E-11  yes
TLR9                  0.1851              4.8728  2.7E-02  yes
TMEM173/STING         0.5985             11.4975  7.0E-04  yes
TRADD                 0.1807              0.0007  9.8E-01
TRAF3                 0.0341              0.0071  9.3E-01
TRAF6                 0.1243              0.0193  8.9E-01
TREX1                 0.3275              1.6494  2.0E-01
TRIM21/RO52           0.4101              5.4792  1.9E-02  yes
TRIM25                0.1792              7.9011  4.9E-03  yes
TYK2                  0.1199             10.8684  9.8E-04  yes
UBE2N/UBC13           0.0883              0.0000  1.0E+00
UBE2V1/UEV1A          0.0001              0.0001  9.9E-01
UNC93B1               0.0506              0.0592  8.1E-01
USP18/UBP43           0.2849              2.1082  1.5E-01
XRCC5/KU80            0.1271              0.0002  9.9E-01
XRCC6/KU70            0.0575              0.0261  8.7E-01
ZBP1/DAI              0.6563              4.4370  3.5E-02  yes
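The relationship between the likelihood-ratio statistic (twice the log-likelihood difference between models M8 and M8a) and the p value reported for each gene can be illustrated with a short sketch. It assumes the statistic is compared against a chi-square distribution with one degree of freedom and that the null model M8a is rejected at p < 0.05; both assumptions appear consistent with the tabulated values (for example, CIITA), but they are inferred from the table rather than stated in this appendix. The function names are hypothetical.

```python
import math


def lrt_p_value(two_delta_lnl: float) -> float:
    """Upper-tail chi-square probability, assuming 1 degree of freedom.

    For 1 df the survival function has the closed form
    P(X > x) = erfc(sqrt(x / 2)), so no external library is needed.
    """
    return math.erfc(math.sqrt(two_delta_lnl / 2.0))


def rejects_m8a(two_delta_lnl: float, alpha: float = 0.05) -> bool:
    """True if M8a is rejected in favor of M8 (alpha = 0.05 is an assumption)."""
    return lrt_p_value(two_delta_lnl) < alpha


if __name__ == "__main__":
    # Statistics copied from the induction-gene table above.
    for gene, stat in [("CIITA", 8.4731), ("CASP8", 3.0818)]:
        p = lrt_p_value(stat)
        print(f"{gene}: p = {p:.1E}, reject M8a: {rejects_m8a(stat)}")
```

Using the closed form for one degree of freedom keeps the sketch dependency-free; a general chi-square survival function would be needed for any other number of degrees of freedom.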

ISGs: The following genes were included in the "ISG" category as described in Chapter 2.

Table A.2
ID - ISGs             Whole Gene dN/dS   2Δ(lnL)  p value  Reject M8a
ADAR                  0.2816             18.8812  1.4E-05  yes
ADM                   0.2005              2.0133  1.6E-01
AKT3                  0.0143              0.0010  9.8E-01
APOBEC3F              0.7692             32.4308  1.2E-08  yes
APOBEC3G              1.1111             52.8569  3.6E-13  yes
APOL2                 1.0074              6.6728  9.8E-03  yes
APOL6                 1.9105             90.8168  1.6E-21  yes
ATRIP                 0.3979              2.2793  1.3E-01
B2M                   0.3842              3.5950  5.8E-02
BCL3                  0.0884              0.0025  9.6E-01
BST2                  0.7316              6.9366  8.4E-03  yes
C19orf66              0.0362              1.7822  1.8E-01
CCL5                  0.2164              0.0008  9.8E-01
CCL8                  0.5076              5.5783  1.8E-02  yes
CD47                  0.1990              5.2844  2.2E-02  yes
CD74                  0.1627              0.0268  8.7E-01
CDKN1A                0.2784              0.0176  8.9E-01
CEACAM1               0.8984            115.3073  6.7E-27  yes
CEBPD                 0.0465              2.4635  1.2E-01
CH25H                 0.1196              0.0112  9.2E-01
CNP                   0.1183              0.0000  1.0E+00
CRP                   0.5408              7.8621  5.0E-03  yes
CXCL10                0.4630              1.4201  2.3E-01
DAPK1                 0.0474              7.8698  5.1E-03  yes
DDIT4                 0.0878              0.4498  5.0E-01
EHD4                  0.0201              0.5926  4.4E-01
EIF2AK2               1.2242             77.0594  1.7E-18  yes
ELF1                  0.1127              1.1833  2.8E-01
ETV6                  0.0485              0.0093  9.2E-01
EXT1                  0.0485              0.0005  9.8E-01
FAM46C                0.0619              1.3902  2.4E-01
FFAR2                 0.1537              3.6133  5.7E-02
G3BP2                 0.0364              1.6686  2.0E-01
GBP1                  0.3368              2.4086  1.2E-01
GBP2                  0.3525             11.7588  6.1E-04  yes
GJA4                  0.0332              0.4403  5.1E-01
GPATCH11 (CCDC75)     0.1408              0.9484  3.3E-01
GPR37                 0.1837              0.0933  7.6E-01
HPSE                  0.3512              3.5068  6.1E-02
IDO1                  0.3825              3.4091  6.5E-02
IFI27                 0.9423              8.5947  3.4E-03  yes
IFI44                 1.0653             36.5698  1.5E-09  yes
IFI44L                1.0500             62.0586  3.3E-15  yes
IFI6                  0.2470              5.0251  2.5E-02  yes
IFIT1                 0.5973             10.1585  1.4E-03  yes
IFIT2                 0.3873             10.7827  1.0E-03  yes
IFIT3                 0.4417              3.3543  6.7E-02
IFIT5                 0.2333              4.2097  4.0E-02  yes
IFITM1                0.1687              0.3643  5.5E-01
IL15                  0.4036              0.0032  9.5E-01
IL1R1                 0.3507              1.0956  3.0E-01
IRF2                  0.1143              0.0318  8.6E-01
ISG20                 0.0936              0.0496  8.2E-01
JADE2                 0.0465              0.5054  4.8E-01
LGALS3BP              0.1639              3.0625  8.0E-02
LY6E                  0.2634              0.0034  9.5E-01
MAFF                  0.0260              0.0351  8.5E-01
MAP3K14               0.1785              1.1685  2.8E-01
MAP3K5                0.0409              0.0002  9.9E-01
MAP3K8                0.0372              0.0026  9.6E-01
MARCKS                0.1284              0.2799  8.7E-01
MASTL                 0.3466              0.0172  9.0E-01
MCOLN2                0.3719              3.5622  5.9E-02
MCUB                  0.3839              0.1308  7.2E-01
MLKL                  0.9782             23.0296  1.6E-06  yes
MOV10                 0.0648              0.0213  8.8E-01
MX1                   0.2585              8.4569  3.6E-03  yes
MX2                   0.3811             11.7163  6.2E-04  yes
NAMPT                 0.1000              0.0098  9.2E-01
NAPA                  0.0118              0.0001  9.9E-01
P2RY6                 0.0930              0.2131  6.4E-01
PHF11                 0.5532             14.1589  1.7E-04  yes
PIM3                  0.0212              1.0189  3.1E-01
PLSCR1                0.3909              1.0704  3.0E-01
PML                   0.1820              0.1086  7.4E-01
PSMB8                 0.2425              0.2456  6.2E-01
PSMB9                 0.1175              0.0287  8.7E-01
RSAD2                 0.2429              6.9089  8.6E-03  yes
RTP4                  0.6956             13.7229  2.1E-04  yes
SAMD9                 0.4302             18.6453  1.6E-05  yes
SAMHD1                0.5268             35.1896  3.0E-09  yes
SAT1                  0.0917              0.7050  4.0E-01
SLC15A3               0.1381              0.6495  4.2E-01
SLC16A4               0.4281              0.1345  7.1E-01
SLC1A1                0.1480              0.0180  8.9E-01
SLC25A28              0.0857              0.0072  9.3E-01
SLFN12                0.5016             14.1367  1.7E-04  yes
SSBP3                 0.0129              0.0000  1.0E+00
TAGAP                 0.3124             10.1447  1.4E-03  yes
THBD                  0.2537              1.7609  1.8E-01
TMEM140               0.7685             16.5456  4.8E-05  yes
TNFSF10               0.3363              1.1868  2.8E-01
TNK2                  0.0639              5.1616  2.3E-02  yes
TRIM22                0.5745             12.1927  4.8E-04  yes
TRIM38                0.3269              1.0961  3.0E-01
TRIM5                 1.1614             75.1855  4.3E-18  yes
TRIM56                0.0775              0.5619  4.5E-01
TTC39B                0.1217              0.0035  9.5E-01
XAF1                  0.7768              1.1386  2.9E-01
ZC3HAV1               0.6941             15.8065  7.0E-05  yes

Random Genes: The following genes were included in the "Random" category as described in Chapter 2.

Table A.3
ID - Random set       Whole Gene dN/dS   2Δ(lnL)  p value  Reject M8a
ACAA2                 0.3698             15.8759  6.8E-05  yes
AGPAT4                0.0484              0.0210  8.8E-01
ATG7                  0.2156              0.2089  6.5E-01
AZU1                  0.1635              8.1549  4.3E-03  yes
BBS2                  0.2454              9.4608  2.1E-03  yes
CCDC172               0.3991              2.6441  1.0E-01
CEBPZ                 0.2689              6.4626  1.1E-02  yes
CFLAR                 0.4485             14.4483  1.4E-04  yes
CHGA                  0.2787              1.3334  2.5E-01
CHKB                  0.1509              0.0149  9.0E-01
CHN2                  0.0183              0.0015  9.7E-01
CHPF                  0.0502              0.8781  3.5E-01
CINP                  0.3976              1.5359  2.2E-01
DCSTAMP               0.5148              1.2236  2.7E-01
DPH6                  0.1444              0.4414  5.1E-01
DPP3                  0.0711              0.1748  6.8E-01
DRAXIN                0.2383              2.0519  1.5E-01
DTD1                  0.0555              0.0772  7.8E-01
DTD2                  0.2110              8.3170  3.9E-03  yes
ESYT2                 0.1792              0.0196  8.9E-01
FAM102A               0.0379              0.0402  8.4E-01
FAM103A1              0.2217              0.0192  8.9E-01
FARP2                 0.2524              1.1943  2.7E-01
FGD2                  0.1196              0.6198  4.3E-01
FGF14                 0.0110              0.0001  9.9E-01
FITM1                 0.0578              0.0107  9.2E-01
FSCN2                 0.0553              0.7869  3.8E-01
GIPC2                 0.3787              0.5465  4.6E-01
GML                   0.8533             17.4923  2.9E-05  yes
GNAT3                 0.0421              0.0016  9.7E-01
GPANK1                0.3158              1.3303  2.5E-01
GPX5                  0.3246              5.5806  1.8E-02  yes
GREB1L                0.1308              0.3178  5.7E-01
GRP                   0.4185             10.5372  1.2E-03  yes
HMCES                 0.2971              4.4039  3.6E-02  yes
IMMP2L                0.1049              0.0039  9.5E-01
KCNAB1                0.0146              0.0009  9.8E-01
KIF15                 0.2310              3.4069  6.5E-02
KLHL22                0.0179              0.0496  8.2E-01
LCOR                  0.2018              0.0000  1.0E+00
LETM2                 0.4094              2.3757  1.2E-01
LIPH                  0.2392              0.5064  4.8E-01
LMBR1L                0.0881              0.0644  8.0E-01
LPIN2                 0.1004              0.1001  7.5E-01
LRRC6                 0.3005             15.0731  1.0E-04  yes
LRRTM1                0.0316              0.0083  9.3E-01
MCAM                  0.1610              0.0150  9.0E-01
MED23                 0.0137              0.2641  6.1E-01
MGEA5                 0.0468              0.0458  8.3E-01
MMP27                 0.3812              7.8843  5.0E-03  yes
MRPL41                0.0903              3.1420  7.6E-02
MRPS30                0.3766             10.2508  1.4E-03  yes
MSANTD3               0.0490              0.0290  8.6E-01
MTSS1L                0.0186              0.0236  8.8E-01
NDUFC1                0.6308              0.0971  7.6E-01
NPC2                  0.3067              3.8953  4.8E-02  yes
NR1D1                 0.0714              0.0002  9.9E-01
NYAP2                 0.0684              0.2332  6.3E-01
OMD                   0.3458              0.0007  9.8E-01
PAX8                  0.0238              0.0036  9.5E-01
PFKL                  0.0220              0.0204  8.9E-01
PHYH                  0.2519              2.6947  1.0E-01
PIP5K1A               0.0445              0.7330  3.9E-01
PIWIL4                0.2351              0.0020  9.6E-01
PLPPR4                0.0257              0.0137  9.1E-01
PPIL2                 0.0579              0.0035  9.5E-01
PUS7                  0.1784              8.8247  3.0E-03  yes
PYGO1                 0.1130              0.0149  9.0E-01
QPCT                  0.2249              0.0115  9.1E-01
RASEF                 0.1983              0.0934  7.6E-01
RBL2                  0.0880             13.9698  1.9E-04  yes
RBM19                 0.1822              7.9921  4.7E-03  yes
RBPMS2                0.1230              0.0434  8.4E-01
RHPN1                 0.1380              0.0219  8.8E-01
RNF13                 0.0262              0.0003  9.9E-01
RNF150                0.0565              0.0160  9.0E-01
RNFT2                 0.0551              0.0515  8.2E-01
RRAD                  0.0681              0.0066  9.4E-01
SAP130                0.1239              1.3741  2.4E-01
SASH1                 0.1485              0.0733  7.9E-01
SCYL1                 0.0896              0.0019  9.7E-01
SH2D3A                0.3174              0.0278  8.7E-01
SRRM4                 0.1911             11.5691  6.7E-04  yes
STARD7                0.1585              0.0002  9.9E-01
STK10                 0.0843              0.7610  3.8E-01
TDRD1                 0.4209              2.3273  1.3E-01
TEX22                 0.1974              0.0041  9.5E-01
TFG                   0.0929              4.2436  3.9E-02  yes
THOC7                 0.0271              0.0006  9.8E-01
TMEM198               0.0371              0.0667  8.0E-01
TMEM239               0.4250              4.1700  4.1E-02  yes
TMEM45A               0.5787              4.5164  3.4E-02  yes
TXLNB                 0.5059              8.6014  3.4E-03  yes
UBE2L6                0.3018              3.5935  5.8E-02
UBR2                  0.1151            137.8108  8.0E-32  yes
UNC5D                 0.0630              0.0646  8.0E-01
WDR76                 0.4145              3.9680  4.6E-02  yes
ZCCHC8                0.2681              4.9286  2.6E-02  yes
ZNF654                0.1549              0.0043  9.5E-01
ZNF835A               0.0870              0.0000  1.0E+00

Appendix B

Host Proteins Predicted by Machine Learning

Table B.1

Refseq ID       Gene names (synonyms included)
NP_000027.2     AMPD1
NP_000077.1     CLN3 BTS
NP_000092.2     CYBA
NP_000103.2     SLC26A2 DTD DTDST
NP_000171.1     GUCY2D CORD6 GUC1A4 GUC2D RETGC RETGC1
NP_000264.2     GPR143 OA1
NP_000383.1     ABCC2 CMOAT CMOAT1 CMRP MRP2
NP_000672.3     ADRA2A ADRA2R ADRAR
NP_000674.2     ADRA2C ADRA2L2 ADRA2RL2
NP_000693.1     ATP1A2 KIAA0778
NP_000786.1     DRD2
NP_000788.2     DRD4
NP_000819.3     GRIA3 GLUR3 GLURC
NP_000832.1     GRM4 GPRC1D MGLUR4
NP_000834.2     GRM6 GPRC1F MGLUR6
NP_000946.2     PTGER1
NP_000947.2     PTGER2
NP_001032.2     SI
NP_001042.1     SSTR3
NP_001057.1     TNFRSF1B TNFBR TNFR2
NP_001107.2     ADCY9 KIAA0520
NP_001142.2     SLC25A4 ANT1
NP_001143.2     SLC25A5 ANT2
NP_001162.4     ABCC6 ARA MRP6
NP_001388.1     ECE1
NP_001399.1     CELSR2 CDHF10 EGFL2 KIAA0279 MEGF3
NP_001471.2     GALR1 GALNR GALNR1
NP_001513.2     GUCY2F GUC2F RETGC2
NP_001627.2     SLC25A6 ANT3 CDABP0051
NP_001642.1     AQP5
NP_001737.1     CANX
NP_002021.3     FPR3 FPRH1 FPRL2
NP_002044.2     GBP1
NP_002102.4     HTT HD IT15
NP_002228.2     KCNG1
NP_002246.5     KIR2DL4 CD158D KIR103AS
NP_002310.2     LRCH4 LRN LRRN1 LRRN4
NP_002541.2     OR3A1 OLFRA03
NP_002634.1     PIGF
NP_002829.3     PTPRC CD45
NP_002837.1     PTPRN ICA3 ICA512
NP_002932.1     ROBO1 DUTT1
NP_003040.1     SLC10A1 NTCP GIG29
NP_003114.1     SPN CD43
NP_003171.2     SYT5
NP_003569.1     SOAT2 ACACT2 ACAT2
NP_003792.1     GPAA1 GAA1
NP_003821.1     SIGLEC5 CD33L2 OBBP2
NP_003963.1     BTAF1 TAF172
NP_004084.1     EFNB2 EPLG5 HTKL LERK5
NP_004111.2     GBP2
NP_004299.1     ARHGAP1 CDC42GAP RHOGAP1
NP_004436.4     EPHB6
NP_004467.1     FOLH1 FOLH NAALAD1 PSM PSMA GIG27
NP_004551.2     ROR2 NTRKR2
NP_004645.2     USP9Y DFFRY
NP_004659.2     MGAM MGA MGAML
NP_004742.1     GCNT3
NP_004787.2     NRXN3 KIAA0743
NP_004811.1     CYP7B1
NP_004817.2     ECEL1 XCE UNQ2431/PRO4991
NP_004870.3     EI24 PIG8
NP_004883.3     SEC22B SEC22L1
NP_004966.1     KCNB1
NP_004987.2     ABCC1 MRP MRP1
NP_005003.2     ROR1 NTRKR1
NP_005155.1     ABCD2 ALD1 ALDL1 ALDR ALDRP
NP_005217.2     S1PR3 EDG3
NP_005259.1     GJB5
NP_005328.2     NCKAP1L HEM1
NP_005490.1     UBA2 SAE2 UBLE1B HRIHFB2115
NP_005682.2     ABCC9 SUR2
NP_006022.3     PCNT KIAA0402 PCNT2
NP_006181.1     ORC2 ORC2L
NP_006311.2     PGRMC2 DG6 PMBP
NP_006349.1     SLC25A17 PMP34
NP_006484.1     CLN5
NP_006728.2     KIR3DL2 CD158K NKAT4
NP_006773.2     ZFPL1
NP_006785.1     GPR75
NP_008832.2     MYO9A MYR7
NP_009091.3     OR2H2 FAT11 OLFR2 OR2H3
NP_009198.4     TMC6 EVER1 EVIN1
NP_036211.2     DGAT1 AGRP1 DGAT
NP_036392.2     HACL1 HPCL HPCL2 PHYH2 HSPC279
NP_036497.1     OR2A5 OR2A26 OR2A8
NP_038465.1     LRP12 ST7
NP_055149.2     FRRS1L C9orf4
NP_055421.1     HERC3 KIAA0032
NP_055426.1     MDN1 KIAA0301
NP_055486.2     UBE3C KIAA0010 KIAA10
NP_055508.3     ECE2 KIAA0604 UNQ403/PRO740
NP_055680.3     NCAPD2 CAPD2 CNAP1 KIAA0159
NP_055747.1     NLGN1 KIAA1070
NP_055761.2     SPAST ADPSP FSP2 KIAA1083 SPG4
NP_055806.2     WDFY3 KIAA0993
NP_055845.1     FRYL AF4P12 KIAA0826
NP_055986.1     XPO6 KIAA0370 RANBP20
NP_055994.2     RRP12 KIAA0690
NP_056121.2     USP24 KIAA1057
NP_056122.1     FAM189A1 KIAA0574 TMEM228
NP_056144.3     MAU2 KIAA0892 SCC4
NP_056224.3     EP400 CAGH32 KIAA1498 KIAA1818 TNRC12
NP_057043.1     TMX2 TXNDC14 CGI-31 My009 PIG26 PSEC0045 UNQ237/PRO270
NP_057368.3     CNOT1 CDC39 KIAA1007 NOT1 AD-005
NP_057686.2     CCR10 GPR2
NP_059139.3     POLE3 CHRAC17
NP_059974.1     OR2M4
NP_060034.9     STAB2 FEEL2 FELL FEX2 HARE
NP_060145.3     CDHR2 PCDH24 PCLKC
NP_060214.2     ST7L ST7R
NP_060264.4     FOCAD KIAA1797
NP_060356.2     ULK4
NP_060542.4     HEATR1 BAP28 UTP10
NP_060754.2     GBP3
NP_060893.2     STYK1 NOK
NP_061142.2     ABCA5 KIAA1888
NP_061720.2     DNAH7 KIAA0944
NP_065073.3     ARFGEF3 BIG3 C6orf92 KIAA1244
NP_065396.1     KIR2DL5A CD158F CD158F1 KIR2DL5
NP_065816.2     UBR4 KIAA0462 KIAA1307 RBAF600 ZUBR1
NP_066921.2     CACNA1H
NP_068708.1     ST7 FAM4A1 HELG RAY1
NP_071760.2     DNAJC1 HTJ1
NP_075561.3     CPLANE1 C5orf42 JBTS17
NP_078781.3     NOX5
NP_078817.2     ARMH3 C10orf76
NP_078898.3     FASTKD1 KIAA1800
NP_079010.2     CLMN KIAA1188
NP_079070.1     EPHX3 ABHD9
NP_079164.1     PLPPR3 LPPR3 PHP2 PRG2
NP_079406.3     HKDC1
NP_079413.3     SPG11 KIAA1840
NP_079429.2     ATP10B ATPVB KIAA0715
NP_079519.1     SLC19A3
NP_112145.1     OR2H1 OR2H6 OR2H8
NP_115666.2     SLF1 ANKRD32 BRCTD1
NP_116215.1     SLC35B4 YEA4 PSEC0055
NP_150092.2     GABRG3
NP_443173.2     GBP4
NP_443174.1     GBP5 UNQ2427/PRO4987
NP_542402.3     KCNE4
NP_543008.3     OXGR1 GPR80 GPR99 P2RY15 P2Y15
NP_570115.1     GIMAP1 IMAP1
NP_597702.2     GRIN3A KIAA1973
NP_612486.2     CDAN1 UNQ664/PRO1295
NP_653200.2     NIPA1 SPG6
NP_653223.2     DCST2
NP_653300.2     ATP1A4 ATP1AL2
NP_659471.1     TOR1AIP2 IFRG15 LULL1
NP_660150.1     PIGM
NP_667340.2     ALS2CL
NP_683707.3     OSBPL9 ORP9 OSBP4
NP_689549.2     AGBL1
NP_689947.2     C6orf89 BRAP UNQ177/PRO203
NP_714915.3     TMEM67 MKS3
NP_742066.2     PLEKHH2 KIAA2028
NP_742067.3     UBR3 KIAA2024 ZNF650
NP_775837.2     PRR14L C22orf30
NP_775838.3     EPHX4 ABHD7 EH4 EPHXRP
NP_775933.1     RNF175
NP_775945.1     DCBLD1
NP_776175.2     PRTG
NP_777562.1     LDLRAD3 LRAD3
NP_835361.1     SLC35B2 PAPST1 PSEC0149
NP_848612.2     PIGW
NP_852478.1     ITGA1
NP_859052.3     QSOX2 QSCN6L1 SOXN
NP_859070.3     TMCO4
NP_870989.1     GRM7 GPRC1G MGLUR7
NP_871001.1     RXFP4 GPR100 RLN3R2
NP_878918.2     SYNE2 KIAA1011 NUA
NP_898884.1     SLC9C1 SLC9A10
NP_932069.1     SLC10A6 SOAT
NP_932173.1     SCN5A
NP_940862.2     GBP6
NP_940870.2     MMS22L C6orf167
NP_940890.3     FAM83H
NP_940933.3     ATP9B ATPIIB NEO1L HUSSY-20
NP_995314.1     NCKAP1 HEM2 KIAA0587 NAP1
NP_996880.1     GPR152 PGR5
NP_997281.2     GBP7 GBP4L
NP_997698.1     ABCA2 ABC2 KIAA1062
NP_001001891.2  ANO7 NGEP PCANAP5 TMEM16G
NP_001001914.1  OR2G3
NP_001001921.1  OR5AS1
NP_001004064.1  OR8J3
NP_001005160.1  OR52A5
NP_001005203.2  OR8S1
NP_001005205.2  OR8J1
NP_001005276.1  OR2AE1 OR2AE2
NP_001005326.1  OR4F6 OR4F12
NP_001005480.2  OR2A2 OR2A17P OR2A2P
NP_001005501.1  OR4K2
NP_001008938.1  CKAP5 KIAA0097
NP_001012302.2  ANO9 PIG5 TMEM16J TP53I5
NP_001012660.1  GRAMD2A GRAMD2
NP_001017395.2  TMCC1 KIAA0779
NP_001018091.1  KIR2DL5B
NP_001028285.1  ENTPD8 UNQ2492/PRO5779
NP_001033073.1  SLC38A10 PP1744
NP_001034679.2  USP9X DFFRX FAM USP9
NP_001036054.1  TMEM8B C9orf127 NGX6
NP_001073678.1  TMEM200C TTMA
NP_001073942.1  MFSD2B
NP_001091985.1  MRGPRF GPR140 GPR168 MRGF PSEC0142
NP_001092903.1  JAKMIP1 GABABRBP JAMIP1 MARLIN1
NP_001116848.1  TMEM72 C10orf127 KSP37
NP_001120795.1  GRM8 GPRC1H MGLUR8
NP_001129941.1  TMEM108 KIAA1690 RTLN UNQ1875/PRO4318
NP_001136438.1  GREB1L C18orf6 KIAA1772
NP_001137291.1  TPCN1 KIAA1169 TPC1
NP_001138912.2  TYW1B RSAFD2
NP_001139243.1  ADGRG1 GPR56 TM7LN4 TM7XN1 UNQ540/PRO1083
NP_001139541.1  MFSD10 TETRAN
NP_001153682.1  SLC25A13 ARALAR2
NP_001153705.1  ATP1A1
NP_001156685.1  FGFR3 JTK4
NP_001156908.1  TBCK TBCKL HSPC302
NP_001163802.1  LANCL3
NP_001166245.1  DPY19L3
NP_001167587.1  DMXL2 KIAA0856
NP_001167635.1  PRRT4
NP_001172014.1  ATP12A ATP1AL1
NP_001185971.1  STRA6 PP14296 UNQ3126/PRO10282/PRO19578
NP_001188326.1  EDNRB ETRB
NP_001229757.1  ELOVL5 ELOVL2 PRO0530
NP_001243000.2  RNF213 ALO17 C17orf27 KIAA1554 KIAA1618 MYSTR
NP_001243143.1  ATP1A3
NP_001243415.1  GOLGB1
NP_001257870.1  JAKMIP2 JAMIP2 KIAA0555 NECC1
NP_001269205.1  PRAC2
NP_001284594.1  TTC39A C1orf34 KIAA0452
NP_001287709.1  SEMA6A KIAA1368 SEMAQ
NP_001288294.1  LYST CHS CHS1
NP_001289958.1  PKP3
NP_001291386.1  LILRB5 LIR8
NP_001293013.1  CFAP54 C12orf55 C12orf63
NP_001295177.1  EVI5 NB4S
NP_001295399.1  CYSLTR2 CYSLT2 CYSLT2R PSEC0146
NP_001296171.1  MYO15B KIAA1783 MYO15BP
NP_001307585.1  PCID2 HT004
NP_001307906.1  MAN2A2
NP_001308832.1  HACD4
NP_001310015.1  JAKMIP3