KINOMES OF SELECTED PARASITIC HELMINTHS - FUNDAMENTAL AND APPLIED IMPLICATIONS

Andreas Julius Stroehlein

BSc (Bingen am Rhein, Germany) MSc (Berlin, Germany)

ORCID ID 0000-0001-9432-9816

Submitted in fulfilment of the requirements of the degree of Doctor of Philosophy

July 2017

Melbourne Veterinary School, Faculty of Veterinary and Agricultural Sciences,

The University of Melbourne

Produced on archival quality paper

SUMMARY

Worms (helminths) are a large, paraphyletic group of organisms including free-living and parasitic representatives. Among the latter, many species of roundworms (phylum Nematoda) and flatworms (phylum Platyhelminthes) are of major socioeconomic importance worldwide, causing debilitating diseases in humans and livestock. Recent advances in molecular technologies have allowed for the analysis of genomic and transcriptomic data for a range of helminth species. In this context, studying molecular signalling pathways in these species is of particular interest and should help to gain a deeper understanding of their evolution and fundamental biology. To this end, the objective of the present thesis was to characterise and curate the protein kinase complements (kinomes) of parasitic worms based on available transcriptomic data and draft genome sequences using a bioinformatic workflow, in order to increase our understanding of how kinase signalling regulates fundamental biology and to gain new insights into the evolution of protein kinases in parasitic worms. In addition, this work aimed to investigate protein kinases with regard to their potential as targets for the development of novel anthelmintic small-molecule agents. This thesis consists of a literature review, four chapters describing original research findings and a general discussion. A detailed assessment of the literature (Chapter 1) revealed that, despite a recent increase in the availability of transcriptomic and genomic data sets for parasitic worms, very little is known about the protein kinases encoded in these genomes and their associated functions.
In addition, the bioinformatic tools currently used for kinase identification and classification do not permit the accurate characterisation of kinase complements of parasitic worms, as they do not take into account the draft state of genomic data sets and the substantial diversity in kinases of species that are only distantly related to well-curated model organisms. Therefore, the aims of this thesis were: (i) to comprehensively identify, classify, curate and functionally annotate the full complements of protein kinases in the genomes of parasitic worms, by (ii) establishing an advanced bioinformatic workflow system to carry out this task; (iii) to explore fundamental aspects of kinase signalling in worms based on developmental transcriptomes and cross-species comparisons; and (iv), from an applied perspective, to identify protein kinases with potential as anthelmintic targets. An integrated bioinformatic workflow relying on a pairwise-comparative approach was established and used to define the complete kinomes of the blood flukes Schistosoma haematobium and S. mansoni (phylum Platyhelminthes; Trematoda; Chapter 2). For such flatworms, scant kinome data and vast phylogenetic distance from any well-curated model organism represented a challenge for kinase identification and classification. Therefore, trematode-specific stochastic models were inferred to identify and classify kinases prior to pairwise curation, which proved superior to the generalised models used in kinase identification tools available at the time. The transcription profiles of the curated kinase genes were then investigated and, employing this and other functional information, kinases were assessed in detail for their potential as drug targets. Subsequently, using an in silico approach, small-molecule effectors were inferred.
Next, using the well-curated kinome of the best-characterised metazoan organism, Caenorhabditis elegans (a free-living nematode), as a reference, the complete kinome of Haemonchus contortus, a parasitic nematode of ruminants, was defined (Chapter 3). Based on this curated data set, the transcriptional regulation of kinase genes across parasite development was investigated, and these data were then integrated into an improved, ranking-based drug target prediction pipeline. This study showed that, using the kinome of the free-living nematode as the reference, the curation of the H. contortus kinome was readily possible. However, this was not the case for species that were distantly related to C. elegans, such that a distinct approach had to be taken. Therefore, for the identification, characterisation and curation of kinase sequences of distant taxa, such as those in the class Enoplea, pairwise analyses were undertaken between closely related species within each of the genera Trichinella and Trichuris (Chapters 4 and 5). Kinomes of four enoplean species, with unique biology and evolution, were investigated and compared. These analyses showed that enopleans have remarkably compact kinomes compared with other worms and, complemented with advanced three-dimensional modelling, revealed a novel enoplean-specific protein kinase. In conclusion, the present thesis has contributed significantly to gaining a deep understanding of the protein kinomes in socioeconomically important parasitic worms (Nematoda and Trematoda), and has provided a bioinformatic framework for the exploration of kinomes of the plethora of parasitic worms (Chapter 6). Although established for worm kinomes, the workflow system developed will have broad applicability to almost any group of eukaryotic organisms. Importantly, the findings presented in this thesis provide a practical resource for future functional investigations of signalling pathways in parasitic worms, which has considerable fundamental implications for studying worm biology, physiology and evolution. Given that protein kinases are recognised as attractive targets for small-molecule drugs, the results should also have significant applied implications for future anti-parasitic drug discovery, repurposing and development.

DECLARATION

The work described in the thesis was performed in the Faculty of Veterinary and Agricultural Sciences of the University of Melbourne between April 2014 and July 2017. The scientific work was performed solely by the author with the exception of the assistance which has been specifically acknowledged. The thesis is less than 100,000 words in length, exclusive of tables, figures, references and appendices. No part of this thesis has been submitted for any other degree or diploma.

......

Andreas Julius Stroehlein

July 2017

ACKNOWLEDGEMENTS

I would like to express my deepest gratitude to my supervisor Robin Gasser for the opportunity to do my PhD in his group. I am very thankful for his continuous encouragement and support and for providing me with so many opportunities to become a ‘real’ parasitologist from the day I arrived in Melbourne and throughout my entire PhD. I have learned so much from him regarding parasites, research, teaching and life. I hope in the future I will be able to pass on some of his wisdom to others and become the teacher and mentor he has been for me.

I am indebted to my PhD co-supervisor Neil Young. He is a great scientist and has mentored and supported me every day during my PhD candidature. His invaluable input and continuous support have helped shape my project and have allowed me to thrive as a scientist. His suggestions and our fruitful discussions have made me a better critical thinker and researcher. Thank you so much, Neil. I am looking forward to working with you in the future.

I would like to thank Alexander Maier for running “Concepts in Parasitology” (CiP) organised by the Australian Society for Parasitology and everyone else who contributed to the success of this course. I was fortunate enough to participate in CiP in December 2015 and this experience has had a tremendous impact on me and my professional career as a parasitologist and scientist. I have formed many ties with great scientists in the Society and have built lasting friendships with many Australian and international researchers that are equally as passionate about parasites as I am.

I would like to thank all scientists that have contributed to my work or supported me otherwise during my PhD candidature: Aaron Jex (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia), Paul Sternberg (California Institute of Technology, Pasadena, CA, USA), Patrick Tan (Genome Institute of Singapore, Duke-NUS Graduate Medical School, Republic of Singapore), Peter Boag (Monash University, Melbourne, Australia), Andreas Hofmann (Griffith University, Brisbane, Australia), Pasi Korhonen and Abdul Jabbar (University of Melbourne, Australia), Bill Chang (Yourgene Biosciences, Taiwan), Giuseppe La Rosa and Edoardo Pozio (Istituto Superiore di Sanità, Rome, Italy) and Peter Nejsum (Aarhus University, Denmark).

I would like to especially thank my friends in Melbourne that have become so important in my life during the last three years: Luke, Nina, Glenn, Julian, thank you for making Melbourne my ‘home base’ and for all the good times we have shared so far! Many more to come, for sure. Thank you, Sabrina for being a great (long lost but found again) friend that makes this place so far away from home feel a little more like home.

I also would like to thank my friends that I ‘left behind’ when I left Germany: Martin, Jonas and Johannes, you have been my best friends for such a long time and you will continue to be even though we are currently so far apart. Keeping in contact with you and hearing about life at home during my time in Australia means the world to me and I hope to see you all again very soon! I also want to thank my friend Amy for her love, for believing in me, supporting me and always being just a phone call away if I needed her advice or just an open ear.

I would like to thank all the lab members that have made working in this group in the last three years a pleasure: Clare, for being a great colleague and friend who always has an open ear for me, be it work- or life-related. Ross, for being a great office mate, a knowledgeable bioinformatician, for his help with tricky code and for taking excellent care of our computing infrastructure, without which we could not do the work we are doing. I want to thank Pasi for his extraordinary bioinformatic knowledge and the useful advice he gave on all my projects. Thank you, Brendan, for inspiring me as a scientist and for your constructive and innovative ideas around R, statistics and most other bioinformatics-related things. Anson, thank you for your continuous contributions to the success of the entire lab. I would also like to thank Sarah and Yaqing for their support and guidance with the drug screening assay. Thank you to all lecturers, technical staff and tutors involved in Parasitology teaching, especially Christine Andersen and Léa Indjein. Thank you for showing me the world of parasites and being so infectiously passionate about teaching others about it. For the friendly and open atmosphere, characterised by cooperation and colleagueship, I want to thank all the members of the Parasitology group. Thank you for a great time in an enthusiastic environment and being a wonderful team to work with!

Personal funding was provided by the Melbourne International Research Scholarship and Melbourne International Fee Remission Scholarship, the Sir Ian Clunies-Ross Prize (2015), the Elizabeth Ann Crespin Scholarship (2016) and the Dr Sue Newton Travelling Scholarship (2017) by the University of Melbourne, and research scholarships by Yourgene Biosciences, Taiwan, for which I am very grateful. Travel funding from the Australian and German Societies for Parasitology and the Faculty of Veterinary and Agricultural Sciences (University of Melbourne) is also gratefully acknowledged. Research funding was also provided through grants held by Prof Robin B. Gasser (National Health and Medical Research Council; Australian Research Council; Wellcome Trust, UK). Other support from the Australian Academy of Science, the Australian-American Fulbright Commission, Alexander von Humboldt Foundation, Melbourne Water Corporation, The University of Melbourne Business Improvement Program, Victorian Life Sciences Computation Initiative (VLSCI) and staff at WormBase is also gratefully acknowledged.

My heartfelt acknowledgement goes to my partner Mia. I met you towards the end of my PhD, and although life has been very turbulent at times, I wouldn’t want to miss a second of the good times we had together and the memories we made.

Finally, I want to thank my family, Brigitte, Joachim, Christina and Benjamin for their love, support and encouraging words during this time that I spent so far away from them. I’m sorry I couldn’t visit as often as I would have liked to.

PREFACE AND DISSEMINATION OF RESEARCH FINDINGS

Scientific papers published or submitted by the author in collaboration with supervisors and other colleagues are listed in the following:

Peer-reviewed articles published in international scientific journals:

Stroehlein AJ, Young ND, Jex AR, Sternberg PW, Tan P, Boag PR, Hofmann A, Gasser RB, 2015. Defining the Schistosoma haematobium kinome enables the prediction of essential kinases as anti-schistosome drug targets. Sci. Rep. 5, 17759.

Stroehlein AJ, Young ND, Korhonen PK, Jabbar A, Hofmann A, Sternberg PW, Gasser RB, 2015. The Haemonchus contortus kinome - a resource for fundamental molecular investigations and drug discovery. Parasit. Vectors 8, 623.

Stroehlein AJ, Young ND, Korhonen PK, Chang BCH, Sternberg PW, La Rosa G, Pozio E, Gasser RB, 2016. Analyses of compact Trichinella kinomes reveal a MOS-like protein kinase with a unique N-terminal domain. G3 (Bethesda) 6, 2847-2856.

Stroehlein AJ, Young ND, Korhonen PK, Chang BCH, Nejsum P, Pozio E, La Rosa G, Sternberg PW, Gasser RB, 2017. Whipworm kinomes reflect a unique biology and adaptation to the host animal. Int. J. Parasitol. In press.

Conference proceedings and seminars given:

Stroehlein AJ, 2014. Investigations of kinomes of parasitic helminths - fundamental and applied implications. Faculty of Veterinary and Agricultural Sciences Postgraduate Symposium, The University of Melbourne, Melbourne, Australia, 21 November 2014.

Stroehlein AJ, Young ND, Gasser RB, 2015. Investigations of kinomes of parasitic helminths - fundamental and applied implications. 8th Short Course for Young Parasitologists, Hamburg, Germany, 11-14 March 2015.

Stroehlein AJ, Young ND, Gasser RB, 2015. Characterisation and curation of the Schistosoma haematobium kinome - fundamental and applied implications. 16th Drug Design & Development Seminar, Berlin, Germany, 16-18 March 2015.

Stroehlein AJ, Young ND, Gasser RB, 2015. Investigations of kinomes of parasitic helminths - fundamental and applied implications. Confirmation of Candidature, Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Melbourne, Australia, 14 April 2015.

Stroehlein AJ, Young ND, Sternberg PW, Boag PR, Jex AR, Hofmann A, Gasser RB, 2015. Defining the Schistosoma haematobium kinome as a basis for the prediction and prioritisation of kinases as anti-schistosome drug targets. Joint Annual Conference of the New Zealand and Australian Societies for Parasitology, Auckland, New Zealand, 2 July 2015.

Stroehlein AJ, 2015. Investigations of helminth kinomes - from understanding parasite biology to drug discovery. Mahidol Vivax Research Unit (MVRU), Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand, 10 July 2015.

Stroehlein AJ, Young ND, Korhonen PK, Chang BCH, Sternberg PW, La Rosa G, Pozio E, Gasser RB, 2016. Computational analyses of compact Trichinella kinomes as a resource for fundamental and applied investigations. International Congress for Tropical Medicine and Malaria, Brisbane, Australia, 18-22 September 2016.

Stroehlein AJ, Young ND, Gasser RB, 2016. Curating worm kinomes - a resource for investigating parasite biology and novel drug targets. Faculty of Veterinary and Agricultural Sciences Research Symposium, The University of Melbourne, Melbourne, Australia, 30 November - 2 December 2016.

Stroehlein AJ, 2017. The kinomes of socioeconomically important parasitic helminths - fundamental and applied implications. PhD Oration, Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Melbourne, Australia, 6 April 2017.

Stroehlein AJ, Young ND, Gasser RB, 2017. A bioinformatic pipeline for the curation, identification and classification of protein kinases of parasitic helminths. Annual Conference of the Australian Society for Parasitology, Leura, Australia, 26-29 June 2017.

Stroehlein AJ, Young ND, Gasser RB, 2017. The kinomes of parasitic provide a resource to investigate signalling mechanisms in roundworms. Annual Conference of the Australian Society for Parasitology, Leura, Australia, 26-29 June 2017.

TABLE OF CONTENTS

The present thesis contains six chapters, with a reference list, tables and figures at the end of each chapter. Chapters 2-5 describe original research findings published or accepted for publication in international peer-reviewed scientific journals.

Chapter 1 - Literature review

1.1 Introduction - an historic account of kinase research
1.2 Protein kinase complements, their evolution and classification
1.3 Structural and biochemical aspects of kinase function
1.4 Roles of kinases in disease processes and as drug targets
1.5 Parasitic helminths - socioeconomic importance and biology
1.5.1 Water- and food-borne helminths
1.5.2 Soil-transmitted helminths
1.6 Control of helminth infections and associated challenges
1.7 Aspects of drug discovery and development
1.8 Nematode model systems for anthelmintic drug discovery
1.9 Genomic-guided and computational drug discovery
1.10 Resources and databases for protein kinases
1.11 Methods for the characterisation and annotation of kinomes
1.12 Draft kinomes of helminths
1.13 Conclusions from the literature review and aims of this thesis
1.14 References

Chapter 2 - Defining the Schistosoma haematobium kinome enables the prediction of essential kinases as anti-schistosome drug targets

Abstract
2.1 Introduction
2.2 Methods
2.2.1 Defining the S. haematobium kinome
2.2.2 Transcription analysis
2.2.3 Drug target prediction and prioritisation
2.3 Results
2.3.1 The S. haematobium kinome
2.3.2 Transcription profiles
2.3.3 Druggable kinases and their prioritisation
2.4 Discussion
2.5 References

Chapter 3 - The Haemonchus contortus kinome - a resource for fundamental molecular investigations and drug discovery

Abstract
3.1 Introduction
3.2 Methods
3.2.1 Defining the H. contortus kinome
3.2.2 Transcription analysis
3.2.3 Drug target prediction and prioritisation
3.3 Results
3.3.1 The H. contortus kinome
3.3.2 Transcription profiles
3.3.3 Kinases with potential as drug targets
3.4 Discussion
3.4.1 The H. contortus kinome
3.4.2 Transcriptional regulation of kinase genes in H. contortus
3.4.3 Protein kinases of H. contortus as potential drug targets
3.5 Conclusions
3.6 References

Chapter 4 - Analyses of compact Trichinella kinomes reveal a MOS-like protein kinase with a unique N-terminal domain

Abstract
4.1 Introduction
4.2 Material and methods
4.2.1 Defining kinomes
4.2.2 Phylogenetic analysis
4.2.3 Functional and structural annotation of kinase sequences
4.3 Results
4.3.1 The protein kinase complements of T. spiralis and T. pseudospiralis
4.3.2 Functional annotation of Trichinella kinomes
4.4 Discussion
4.5 References

Chapter 5 - Whipworm kinomes reflect a unique biology and adaptation to the host animal

Abstract
5.1 Introduction
5.2 Materials and methods
5.2.1 Defining and curating kinomes
5.2.2 Kinase classification and functional annotation
5.2.3 Transcription analysis
5.2.4 Kinase sequence comparisons among nematodes
5.3 Results
5.3.1 Kinomes of Trichuris
5.3.2 Transcription profiles
5.3.3 Kinome comparisons among nematodes
5.4 Discussion
5.5 References

Chapter 6 - General discussion

6.1 Technical achievements
6.2 Sequence curation and functional annotation
6.3 Fundamental and applied achievements
6.4 Prospects and future extensions
6.5 Conclusion
6.6 References

List of appendices

CHAPTER 1 - Literature review

1.1 Introduction - an historic account of kinase research

Knowledge of the molecular biology of cells and how they communicate is crucial to understanding the life of eukaryotic organisms. The discovery that all cells contain deoxyribonucleic acid (DNA) and that this molecule holds the information required for the production of messenger RNA (mRNA) and subsequent synthesis of proteins that assume essential structural and enzymatic functions in all cells led to the formulation of the “Sequence Hypothesis” and the “Central Dogma” of molecular biology (Crick, 1958, 1970). These early discoveries, followed by advances in the ability to read (‘sequence’) nucleotide and amino acid sequences (Edman and Begg, 1967; Wu, 1972; Jay et al., 1974; Padmanabhan et al., 1974; Sanger and Coulson, 1975; Maxam and Gilbert, 1977; Sanger et al., 1977), enabled studies of single genes, transcripts and proteins at the molecular level. It was only then that molecular investigations of enzymes, such as protein kinases, became feasible and more common (Figure 1.1), although the principle of reversible protein phosphorylation, involving a phosphorylating enzyme (kinase) and a de-phosphorylating enzyme (phosphatase), had been discovered earlier (Fischer and Krebs, 1955; Sutherland and Wosilait, 1955). Progress in sequencing technologies allowed kinase genes (mainly from mammalian cell lines, vinegar fly (Drosophila melanogaster) and yeast) to be characterised directly instead of having to deduce their properties via enzyme purification and observed effects (Hanks, 1987; Hunter, 1987). The sequencing of a range of kinase genes enabled the comparison of their sequences, which led to the identification of conserved residues, domains and subdomains, and facilitated the inference of a first phylogeny for protein kinases (Hanks et al., 1988).
Most of the early functional studies of protein kinases, in particular of their involvement in the cell cycle and cell division processes, were performed in yeast (Reed et al., 1985; Simanis and Nurse, 1986; Brizuela et al., 1987; Draetta et al., 1987; Lee and Nurse, 1987). Other areas of kinase research focused on receptor tyrosine kinase signalling processes involved in the development of D. melanogaster (see Perrimon, 1994; Duffy and Perrimon, 1996). In addition, signalling processes regulated by kinases were also investigated in Caenorhabditis elegans (see Sternberg and Horvitz, 1991; Eisenmann and Kim, 1994; Duggan and Chalfie,
1995), a small, free-living roundworm that had been previously established as a model for multicellular organisms (Brenner, 1974, 1988). This nematode is employed as a model because of its rapid life cycle, ease of maintenance and manipulation in a laboratory setting as well as the fact that adult worms consist of a fixed number of somatic cells - 959 in the hermaphrodite and 1031 in the male (Sulston et al., 1983). In addition to increased sequencing efforts and functional investigations of kinase genes/kinases in these model organisms, researchers also solved the first three-dimensional crystal structure of the catalytic subunit of a kinase - the cyclic adenosine monophosphate (cAMP)-dependent protein kinase (PKA) (Knighton et al., 1991; Zheng et al., 1993). This work represented a major milestone in kinase research, providing crucial insight into kinase sub-structures and their roles in catalysing protein phosphorylation. Importantly, this work also laid the foundation for intensive research on small molecules that inhibit mutated/deregulated protein kinases, an avenue that had been proposed earlier, given the roles of kinase oncogenes in the growth of viral tumours in birds (Martin, 1970; Brugge and Erikson, 1977; Hunter and Sefton, 1980) and cancers of humans (Heisterkamp et al., 1983; Shtivelman et al., 1985; Varmus, 1985; Morange, 1993) (Figure 1.1). With advances in sequencing techniques (Burke et al., 1987; O'Connor et al., 1989; Shizuya et al., 1992; Kim et al., 1996) came efforts to sequence the transcriptome (Adams et al., 1995; Korenberg et al., 1995) and the entire nuclear genome of an organism (Gibbs, 1995; Little, 1995), to gain a global view of protein-coding genes. The first complete eukaryotic genome sequenced was that of the yeast Saccharomyces cerevisiae (see Goffeau et al., 1996), soon followed by the first genomes of multicellular organisms including C. elegans (see C. elegans Sequencing Consortium, 1998), C. 
briggsae (see Stein et al., 2003), D. melanogaster (see Adams et al., 2000), mouse (Mouse Genome Sequencing Consortium et al., 2002) and human (Venter et al., 2001). These studies enabled, for the first time, the analysis of the full complement of protein kinases (“kinome”) encoded in a genome (Hunter and Plowman, 1997; Plowman et al., 1999; Manning et al., 2002a; Caenepeel et al., 2004). This work provided insights into kinase evolution (Manning et al., 2002b) and enabled functional studies of entire kinomes (Reinke et al., 2000; Maeda et al., 2001; Bimbo et al., 2005). Following the sequencing of the genomes of these model organisms, there was an increased effort to sequence pathogens of socioeconomic importance, including a malaria parasite of humans, Plasmodium falciparum (see Gardner et al., 2002) and parasitic worms (helminths). The first draft genome of a parasitic helminth was that of the filarial nematode Brugia malayi (see Blaxter et al., 2002; Ghedin et al., 2004; Ghedin et al., 2007), which allowed for the characterisation of its kinome and comparisons with those of C. elegans and C. briggsae, revealing a major difference in the kinome composition between these free-living nematodes and B. malayi. The advent of advanced, short-read (“next-generation”) sequencing (NGS) technologies, such as the Illumina Solexa platform (Metzker, 2005; Mardis, 2008a, b; Schuster, 2008), marked another milestone in the history of sequencing technologies and allowed many other genomes to be sequenced at a fraction of the cost spent on the first eukaryotic genome projects, and in a substantially shorter time-frame. During this period of expansion in the number of genome projects (Figure 1.1), numerous draft genomes of parasitic worms became available (Berriman et al., 2009; Schistosoma japonicum Genome Sequencing and Functional Analysis Consortium, 2009; Jex et al., 2011, 2014; Mitreva et al., 2011; Protasio et al., 2012; Young et al., 2012, 2014; Laing et al., 2013; Schwarz et al., 2013, 2015; Foth et al., 2014; Tang et al., 2014; Zhu et al., 2015; Howe et al., 2016; Korhonen et al., 2016) - with many more likely to be completed in the future, at low sequencing cost. In addition, the ongoing advances in sequencing technologies (“third-generation sequencing”), which are able to produce longer reads, will most likely improve the quality of future helminth genome assemblies (Pettersson et al., 2009; Metzker, 2010; van Dijk et al., 2014; Reuter et al., 2015; Goodwin et al., 2016).
Currently available genomic and transcriptomic data sets for parasitic helminths now provide an unprecedented resource for the exploration of their kinase signalling mechanisms at the molecular level and hold promise for finding novel intervention strategies against parasitic helminths, which have a major socioeconomic impact on the animal industry and human health (Bethony et al., 2006; Brindley et al., 2009; Hotez et al., 2009; Hotez and Kamath, 2009; Brindley and Hotez, 2013). Given that protein kinases were recognised relatively early as being of critical importance in the pathogenesis of disease and the fact that they are now recognised drug targets (Blume-Jensen and Hunter, 2001; Cohen, 2001, 2002; Cohen and Alessi, 2013), studying these molecules in parasitic worms provides major prospects for developing new chemical agents against helminths (anthelmintics) in the future. The purpose of this chapter was (i) to provide an historic account of kinase research from single-molecule studies to the global characterisation of whole kinomes; (ii) to describe important aspects of kinase biochemistry, evolution, classification and their role(s) in disease; (iii) to review the current state of kinase drug discovery and development in both the human and infectious/parasitic disease areas; (iv) to summarise the principles of the biology and life
cycles of key representatives of socioeconomically important parasitic worms; (v) to describe currently available kinome data of eukaryotic organisms obtained from genomic and transcriptomic studies; (vi) to critically appraise the methods used for the identification and classification of protein kinase sequences; and (vii) to delineate the prospects and implications of fundamental and applied investigations of helminth kinase biology and signalling processes at the genomic and transcriptomic levels. The research aims of this thesis were formulated based on the conclusion from this literature review.

1.2 Protein kinase complements, their evolution and classification

Protein kinases are enzymes (transferases) that phosphorylate a substrate via the transfer of a phosphoryl group from an energy-rich molecule (adenosine triphosphate; ATP). The phosphorylation of the substrate induces a structural or biochemical modification leading to changes in its activity, reactivity and/or conformation. Substrates are phosphorylated at the hydroxyl group of an amino acid residue, which can be either serine/threonine (Krebs, 1993; Graves and Krebs, 1999), tyrosine (Schlessinger and Ullrich, 1992; Hubbard and Till, 2000) or other residues (Matthews, 1995). Although most kinases specifically recognise and phosphorylate only one type of residue, others (termed “dual-specificity kinases”) act on both serine/threonine and tyrosine residues (Stern et al., 1991; Hunter et al., 1992). Additionally, some kinases phosphorylate themselves (autophosphorylation) or are phosphorylated by other kinases. Based on the residues that they recognise and phosphorylate, most kinases belong to one of two main superfamilies: the serine/threonine kinases (STKs) and the tyrosine kinases (TKs). Kinases comprise one of the largest classes of proteins in most eukaryotic genomes, representing approximately 2-4% of protein-coding genes (Hanks, 2003; Champion et al., 2004). They are ubiquitously present in eukaryotes and are involved in most signal transduction pathways regulating a wide range of cellular functions, such as cell growth, proliferation, transcriptional regulation, apoptosis and cellular sub-localisation of proteins (Cohen, 2000; Manning, 2005). Given their importance in these processes, the structure and function of protein kinases have been investigated for many years (reviewed by Johnson and Hunter, 2005; Endicott et al., 2012; Knight et al., 2013; Schlessinger, 2014) mainly in tractable model organisms such as S. cerevisiae (see Bharucha et al., 2008; Sharifpoor et al., 2012) and C. elegans (see Plowman et al., 1999; Maeda et al., 2001; Kamath and Ahringer, 2003; Sugimoto, 2004; Sonnichsen et al., 2005; Lehmann et al., 2013), which has greatly
advanced the understanding of the molecular biology of eukaryotic organisms and cellular signalling mechanisms. The sequencing of genomes for a range of eukaryotic organisms in the beginning of this century was a tremendous achievement by a large community of researchers and enabled the investigation of kinase signalling on a genome-wide scale. The first eukaryotic organism for which all kinase genes were identified was S. cerevisiae (see Hunter and Plowman, 1997) and the kinomes of a range of metazoan species followed shortly thereafter, including those of the model organisms C. elegans (see Plowman et al., 1999), D. melanogaster (see Morrison et al., 2000; Manning et al., 2002b), human (Manning et al., 2002a) and mouse (Caenepeel et al., 2004) (Table 1.1). In 2006, the kinome of the amoebozoan protist Dictyostelium discoideum was published (Goldberg et al., 2006), representing one of the most ancient eukaryotic organisms and an important phylogenetic branch point. Phylogenetic analyses of kinase sequences of this and other organisms for which kinomes had become available then provided insights into kinase evolution and function, from yeast to higher-order metazoan organisms (Manning et al., 2002b; Manning, 2005; Goldberg et al., 2006). Such studies revealed lineage-specific expansions and contractions, and facilitated the classification of kinases into groups that were functionally and structurally conserved throughout evolution (Hanks and Hunter, 1995; Manning et al., 2002b; Hanks, 2003; Manning, 2005). This classification is based on sequence similarity within the catalytic kinase domain, the presence of additional domains, known biological functions and levels of sequence conservation across phylogenetically divergent species. 
The present protein kinase classification scheme is based on the original scheme developed by Hanks and Hunter (Hanks et al., 1988; Hanks and Hunter, 1995) and was further expanded by the work of others (Manning et al., 2002a, b; Hanks, 2003). The currently accepted “Standard Kinase Classification Scheme” (http://kinase.com/wiki/index.php/Standard_Kinase_Classification_Scheme) is subject to change as new kinases are discovered or existing kinases are reclassified into novel families and/or subfamilies based on their structure and/or function. Eukaryotic protein kinases (ePKs) all share a conserved catalytic domain and are divided into nine groups (Table 1.1):

1. The AGC group contains kinases that are modulated by cyclic nucleotides, phospholipids or calcium, and is represented mainly by the protein kinase families A, G and C (PKA, PKG, PKC), Akt, 3-phosphoinositide-dependent kinase (PDK1), dystrophia myotonica protein kinase (DMPK), microtubule-associated Ser/Thr kinase (MAST), ribosomal S6 kinase (RSK), nuclear DbF2-related kinase and G protein-coupled receptor kinases (GRKs) (Pearce et al., 2010).
2. The CAMK group consists of two calmodulin/calcium-regulated kinase families, CAMK1 and CAMK2, and several families of kinases that are not calcium-regulated, including CAMKL (CAMK-like) kinases and brain-selective kinases (BRSKs) (Soderling, 1999; Swulius and Waxham, 2008).
3. The CMGC group is one of the largest groups in most kinomes, consisting of four families (cyclin-dependent kinases (CDKs), mitogen-activated protein kinases (MAPKs), glycogen synthase kinase 3 (GSK) and cell division cycle (CDC)-like kinases (CLKs)), members of which play diverse roles in cell cycle control, transcriptional regulation, splicing and other processes (Krishna and Narang, 2008; Varjosalo et al., 2013).
4. The CK1 group contains a small number of representatives for most metazoans but is expanded in many nematode species including C. elegans and C. briggsae (see Manning, 2005). This group is subdivided into the families casein kinase 1 (CK1), venus kinase receptors (VKRs; Vanderstraete et al., 2013), Tau tubulin kinases (TTBKs) and TTBK-like kinases (TTBKLs) (Ikezu and Ikezu, 2014).
5. The group of receptor guanylate cyclases (RGCs) is a relatively small group in most organisms, representing sequences that contain an active guanylate cyclase domain and a catalytically inactive kinase domain that appears to have a regulatory function (Singh et al., 1988; Garbers, 1990; Jaleel et al., 2006). Similar to CK1, this group is also substantially expanded in several nematode species (Manning, 2005; Zaru et al., 2017).
6. The STE group contains homologs of proteins encoded by the yeast sterile (ste) genes (STE7, STE11 and STE20) that form the MAPK cascade, transducing signals from the surface of the cell to the nucleus, thus regulating gene expression in response to extracellular stimuli (Chang and Karin, 2001).
7. The group of tyrosine kinases (TKs) represents kinases that phosphorylate their substrates on tyrosine residues, as opposed to most kinases from the other groups that phosphorylate either serine or threonine residues. TKs can be further subdivided into receptor TKs (RTKs) and non-transmembrane (i.e. cytoplasmic) TKs (CTKs), and play vital roles in cell proliferation, differentiation, subcellular localisation and metabolism (Schlessinger and Ullrich, 1992; Hubbard and Till, 2000).
8. Tyrosine kinase-like kinases (TKLs) are similar in sequence to tyrosine kinases but mostly phosphorylate substrates on serine/threonine residues. This group includes families that form part of the MAPK cascade (MAP3Ks), receptor kinase families as well as kinases involved in immunity and cytoskeletal processes (Goldberg et al., 2006). Several families in this group are expanded in plants and D. discoideum, suggesting that they are of ancient origin (Goldberg et al., 2006; Zulawski et al., 2014).
9. The “Other” group is a diverse group of kinase families and single kinases that are ePKs but share limited similarity with any of the eight other ePK groups (Manning et al., 2002b; Hanks, 2003).

In addition to these nine recognised groups representing ePKs, many other kinases have the same protein kinase-like (PKL) fold and catalytic mechanism, despite sharing limited sequence similarity. Thus, divergent families within the PKL superfamily, including Alpha, ABC1 (ABC1 domain-containing), PIKK (phosphatidyl inositol 3’ kinase-related kinases), and RIO (right open reading frame), are termed ‘atypical’ kinases (aPKs) or simply PKLs (Scheeff and Bourne, 2005; Kannan et al., 2007a). Taken together, the nine ePK groups, the aPKs and other members of the PKL superfamily are currently divided into 240 families and 218 subfamilies (http://kinase.com/kinbase/).
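The group-and-family hierarchy described above can be held in a simple lookup table. The following Python sketch is illustrative only: the names EPK_GROUPS and group_of are hypothetical, the family lists contain only the families named in this section (not the full content of KinBase), and groups for which no individual families are named here are deliberately left empty.

```python
# Families named in this section for each of the nine ePK groups.
# Assumption: this is an illustrative subset, not the complete classification.
EPK_GROUPS = {
    "AGC": ["PKA", "PKG", "PKC", "Akt", "PDK1", "DMPK", "MAST", "RSK", "GRK"],
    "CAMK": ["CAMK1", "CAMK2", "CAMKL", "BRSK"],
    "CMGC": ["CDK", "MAPK", "GSK", "CLK"],
    "CK1": ["CK1", "VKR", "TTBK", "TTBKL"],
    "RGC": [],    # no individual families named in the text
    "STE": ["STE7", "STE11", "STE20"],
    "TK": [],     # text gives only the RTK/CTK subdivision
    "TKL": [],    # no individual families named in the text
    "Other": [],  # heterogeneous by definition
}


def group_of(family):
    """Return the ePK group that lists `family`, or None if it is not listed."""
    for group, families in EPK_GROUPS.items():
        if family in families:
            return group
    return None


if __name__ == "__main__":
    for name in ("MAPK", "TTBK", "RIO"):
        print(name, "->", group_of(name))
```

A family that is absent from the table, such as the atypical RIO kinases, returns None, which mirrors the distinction drawn above between the nine ePK groups and the aPKs.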

1.3 Structural and biochemical aspects of kinase function

Despite their sequence and functional divergence, most eukaryotic kinases have a conserved protein kinase catalytic domain consisting of 12 sequence subdomains (Figure 1.2A) with several highly-conserved residues that are essential for catalytic activity (Hanks, 2003). Insights from the solved crystal structure of PKA and genome-wide comparisons of kinases provided new knowledge about how these conserved residues interact with ATP and protein substrates and the role(s) that they play in the three-dimensional conformation of kinases required for catalytic activity (Knighton et al., 1991; Zheng et al., 1993). These studies revealed that the catalytic domain of kinases assumes a bilobal structure (Figure 1.2B). The smaller, amino-(N-)terminal lobe mainly functions in ATP binding and is comprised of the first four of the 12 subdomains. Subdomain I, also known as the “glycine-rich loop”, has a conserved Gly-x-Gly-x-x-Gly motif and represents the most mobile part of the protein kinase, which assumes an ‘open’ or a ‘closed’ conformation. This loop consists of two
β-strands and, in the closed conformation, anchors the α- and β- (i.e. the non-transferrable) phosphates and positions the γ-phosphate of ATP for its transfer to the protein substrate (Hanks and Hunter, 1995; Johnson et al., 2001). Subdomain II is represented by β-strand 3 in the small lobe and contains a lysine residue that is critical for catalytic function. This residue plays an important role in orienting the ATP molecule by interacting with its α- and β-phosphates and is often part of a Val-Ala-Ile-Lys (VAIK) motif (Hanks and Hunter, 1995; Pearce et al., 2010). Strand 3 is followed by a small helix (“B-helix”), which can be variable and is not always present in protein kinase structures, and by the large “C-helix” (subdomain III) that contains a conserved glutamic acid residue. This residue is centrally located in the C-helix and plays a stabilising role in the interactions between the lysine in subdomain II and the α- and β-phosphates of ATP. This subdomain is the only subdomain that contains helical elements in the N-lobe of the kinase core. The hydrophobic β-strand 4 constitutes subdomain IV which does not contain conserved residues and has no known direct role in the recognition of a kinase substrate or catalytic function. The last β-strand of the N-lobe is linked to the first α-helix of the larger C-lobe via subdomain V (“hinge region”). The C-lobe is comprised mostly of α-helices and harbours structural elements and residues important for peptide binding and catalytic processes. This lobe represents the more stable part of the kinase structure, compared with the relatively flexible N-lobe. Subdomain VIa (“E-helix”) represents a very hydrophobic part of the kinase structure and is of structural importance, but does not directly play a role in either ATP or substrate binding. Subdomain VIb consists of the two β-strands 6 and 7, and has a conserved His-Arg-Asp-x-x-x-x-Asn (HRDxxxxN) motif. 
This subdomain is also known as the “catalytic loop” because the aspartic acid residue in this loop acts as a catalytic base in the phosphotransfer process, and the asparagine facilitates the positioning of a divalent Mg2+ cation, which interacts with the oxygens of the α- and γ-phosphates of the ATP (Zheng et al., 1993; Adams, 2001). Subdomain VII represents the “Mg-binding loop” and parts of the “activation loop”, a very stable element of the kinase structure that requires phosphorylation at a serine/threonine or tyrosine residue to render the kinase catalytically active - a process facilitated either by autophosphorylation or phosphorylation by another kinase (Taylor and Kornev, 2011). The Mg-binding loop contains an Asp-Phe-Gly (DFG) motif, in which the aspartic acid assumes a critical role in the recognition of a second Mg2+ ion that bridges the β- and γ-phosphates of ATP (Zheng et al., 1993; Adams, 2001; Taylor and Kornev, 2011). The next subdomain (VIII) is the “P+1 loop”, a stable structural element that contains a
conserved Ala-Pro-Glu (APE) motif, which is important for the docking of peptide substrates and for C-lobe stability (Hanks and Hunter, 1995; Nolen et al., 2004). Together, the Mg-binding loop, activation loop and P+1 loop constitute the “activation segment”. Subdomain IX (“F-helix”) is a very hydrophobic helix that spans the centre of the C-lobe and is structurally important (Kornev et al., 2008; Kornev and Taylor, 2010). Subdomain X (“G-helix”) is less highly conserved among different protein kinases compared with other subdomains and represents a recognition- and docking-site for protein substrates (Dar et al., 2005; Kornev and Taylor, 2010). The last subdomain (XI) of the C-lobe is the “H-helix”, which has a conserved arginine residue that interacts with the conserved glutamic acid in the APE motif of subdomain VIII, thus stabilising the C-lobe (Hanks and Hunter, 1995). In addition to these conserved subdomains and residues in the catalytic core of kinases, these enzymes often have non-catalytic accessory domains that are fused N- and/or C-terminally to the catalytic subunit (for an extended list of known accessory domains in kinases see Manning et al., 2002a). These domains play crucial roles in protein-protein interactions, kinase activation, dimerisation and other processes important for the correct establishment of a kinase-signalling cascade (Huse and Kuriyan, 2002; Kannan et al., 2007b; Langeberg and Scott, 2015). In some cases, regulatory subunits are not encoded by the same gene, but rather are separate proteins that form a functional complex with the catalytic subunit (Berman et al., 2005). 
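The conserved motifs named in this section (Gly-x-Gly-x-x-Gly, VAIK, HRDxxxxN, DFG and APE) can be written as simple regular expressions over one-letter amino acid codes. The Python sketch below is a crude illustration and not the method used in this thesis: the function name find_motifs and the toy sequence are hypothetical, the sequence is constructed for demonstration and is not a real kinase, and exact-string matching will miss the many genuine kinases whose motifs deviate from these consensus forms.

```python
import re

# Consensus motifs from the text; "." stands for any residue (the "x" positions).
MOTIFS = {
    "GxGxxG": r"G.G..G",      # glycine-rich loop, subdomain I
    "VAIK": r"VAIK",          # catalytic lysine, subdomain II
    "HRDxxxxN": r"HRD....N",  # catalytic loop, subdomain VIb
    "DFG": r"DFG",            # Mg-binding loop, subdomain VII
    "APE": r"APE",            # P+1 loop, subdomain VIII
}

# Hypothetical toy sequence assembled from the motifs plus filler residues.
TOY_SEQUENCE = "MKGTGSFGRVVAIKLLEHRDLKPENILDFGLAAPEVL"


def find_motifs(sequence):
    """Return {motif name: [0-based start positions]} for each consensus motif."""
    seq = sequence.upper()
    hits = {}
    for name, pattern in MOTIFS.items():
        # The lookahead reports overlapping occurrences as well.
        hits[name] = [m.start() for m in re.finditer(f"(?=({pattern}))", seq)]
    return hits


def in_canonical_order(hits):
    """True if every motif occurs and their first occurrences follow the
    N- to C-terminal order of the subdomains (I, II, VIb, VII, VIII)."""
    if not all(hits[name] for name in MOTIFS):
        return False
    firsts = [hits[name][0] for name in MOTIFS]
    return firsts == sorted(firsts)


if __name__ == "__main__":
    hits = find_motifs(TOY_SEQUENCE)
    for name, positions in hits.items():
        print(name, positions)
    print("canonical order:", in_canonical_order(hits))
```

Checking the order of the motifs, and not only their presence, reflects the fixed N- to C-terminal arrangement of the subdomains described above; a sequence lacking one of them would be a candidate for closer inspection, for example as a possible pseudokinase.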
Although most of the structural properties of protein kinases were initially derived from the crystal structure of PKA, the structures of many other kinases were solved shortly after (for early reviews see Bossemeyer, 1995; Hubbard and Till, 2000), which provided new insights into mechanisms of kinase activation and phosphorylation, revealed variation in activation loops and regulatory mechanisms among kinases and helped better understand which structural features render a kinase catalytically inactive (i.e. a pseudokinase) (Nolen et al., 2004; Taylor and Kornev, 2011; Langeberg and Scott, 2015). Currently (May 2017), there are more than 5200 solved crystal structures of protein kinases deposited in the Protein Data Bank (PDB; http://www.rcsb.org/pdb/), including 3340 protein-serine/threonine kinases (Enzyme Commission (EC) number: 2.7.11), 1491 protein-tyrosine kinases (EC number: 2.7.10), 200 protein-histidine kinases (EC number: 2.7.13) and 172 dual-specificity kinases (EC number: 2.7.12).
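As a quick consistency check, the four per-class counts quoted above can be summed and compared with the stated total of "more than 5200" structures. The Python sketch below uses only the figures given in the text; the variable names are hypothetical.

```python
# Solved kinase structures in the PDB by EC class, as quoted in the text (May 2017).
PDB_KINASE_STRUCTURES = {
    "2.7.11": ("protein-serine/threonine kinases", 3340),
    "2.7.10": ("protein-tyrosine kinases", 1491),
    "2.7.13": ("protein-histidine kinases", 200),
    "2.7.12": ("dual-specificity kinases", 172),
}

total = sum(count for _, count in PDB_KINASE_STRUCTURES.values())
print(f"total structures: {total}")  # 3340 + 1491 + 200 + 172 = 5203

# Share of each enzyme class among the listed structures.
for ec, (label, count) in PDB_KINASE_STRUCTURES.items():
    print(f"EC {ec} {label}: {count} ({count / total:.1%})")
```

The sum of 5203 agrees with the stated total, and serine/threonine kinases account for roughly two thirds of the listed structures.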

1.4 Roles of kinases in disease processes and as drug targets

The increase in structural investigations of protein kinases has mainly been driven by the fact that mutated or otherwise deregulated kinases are involved in the onset of diseases in humans, such as cancers (Cohen, 2001). In this context, understanding protein kinases and their mechanistic features at the structural level allowed new strategies to be devised to target them with small molecule compounds. This task was previously deemed challenging, if not impossible, given the high intracellular concentration of ATP a small molecule inhibitor would have to compete with and the challenges associated with highly selective targeting of the structurally conserved site for ATP recognition (Cohen, 2002). Despite these challenges, between 1999 and 2001, two compounds shown to target and selectively inhibit protein kinases were approved by the US Food and Drug Administration (FDA): rapamycin (sirolimus; Sehgal, 1998) and Gleevec (imatinib; Druker et al., 2001). These discoveries resulted in an increased interest in protein kinases as drug targets, rapidly making them the second-most targeted proteins after G-protein-coupled receptors (Cohen, 2002). In the last decade, many compounds inhibiting kinases have been investigated as drugs against a range of diseases including inflammatory and autoimmune diseases, Parkinson’s disease, hypertension and different types of human cancers (Cohen and Alessi, 2013). As of July 2015 (Wu et al., 2016), there were 28 FDA-approved drugs that target protein kinases, predominantly indicated for use as anti-cancer drugs (Eglen and Reisine, 2009, 2011; Wu et al., 2015). Most kinases (n = 24) targeted by these drugs are tyrosine kinases (TKs), and only two targets belong to each of the three groups CMGC, STE and TKL (Wu et al., 2016). 
An analysis of ~19,000 inhibitors (Hu et al., 2015) showed that, although members of eight major kinase groups have been investigated as drug targets, to date, kinases representing only about half of the human kinome have been targeted using small molecules (Fedorov et al., 2010). Most of these compounds are so-called Type I inhibitors - ATP-competitive molecules that bind to the active (“DFG-in”) conformation of kinases (Zhang et al., 2009). In contrast, Type II inhibitors bind to and stabilise the inactive (“DFG-out”) kinase conformation (Liu and Gray, 2006; Zhao et al., 2014). However, human tumour cells can become resistant to such inhibitors (the most prominent example being imatinib) (Krishnamurty and Maly, 2010) if a residue in the hinge region of a kinase (the ‘gatekeeper’ residue) is mutated (Liu et al., 1998; Gorre et al., 2001), thus stabilising a conserved hydrophobic spine (Kornev et al., 2006; Kornev et al., 2008), activating the kinase and sterically blocking access to the inhibitor binding site (Daub et al., 2004; Azam et al., 2008).

To overcome resistance and/or to achieve higher target selectivity, compounds were developed that do not bind to the ATP-binding site but instead target an adjacent pocket (Type III inhibitor), a remote site (Type IV inhibitor), or bi-valently bind to both the ATP-binding pocket and the substrate recognition site (Type V inhibitor) (Cox et al., 2011; Gavrin and Saiah, 2013; Müller et al., 2015). Although the latter three inhibitor types have the advantage of usually being more selective, the promiscuity associated with ATP-competitive inhibitors has been successfully exploited to target multiple kinase targets at the same time with just one drug (“targeted polypharmacology”; Hopkins et al., 2006; Bilanges et al., 2008; Metz and Hajduk, 2010; Achenbach et al., 2011).

1.5 Parasitic helminths - socioeconomic importance and biology

Discoveries showing that kinases can be effectively and relatively selectively targeted by small molecule compounds, and functional investigations showing their importance in virtually all biological signalling processes and in disease, have spawned studies of these enzymes in infectious eukaryotic organisms that cause diseases in humans and animals. Eukaryotic pathogens often occupy biological niches within their host (Garnick, 1992; Despommier, 1993) and employ sophisticated counter-measures to evade host immune detection or attack (Maizels et al., 2009; McSorley et al., 2013; Maizels and McSorley, 2016), which often requires specialised signalling mechanisms (Gilabert et al., 2016; Lok, 2016). Although there is a limited understanding of the biochemical signalling frameworks governing complex biological traits, the life cycles of some eukaryotic parasites are relatively well understood, and often involve multiple developmental stages and one or more hosts required for development and reproduction. The following sections provide an account of socioeconomically important parasitic trematodes (phylum Platyhelminthes; class Trematoda) and roundworms (phylum Nematoda), in accord with the focus of the thesis as a whole (Table 1.2).

1.5.1 Water- and food-borne helminths

Parasitic trematodes usually have complex, indirect life cycles that require a mollusc (mostly a freshwater snail) as an intermediate host and sometimes involve a second intermediate host (e.g., a fish) (Roberts and Janovy Jr., 2009). Their generalised life cycle has numerous phases: a ciliated larva, called a miracidium, develops inside an egg, hatches and penetrates a molluscan intermediate host. Once inside this host, it sheds its ciliated ectoderm and develops into the sporocyst. This developmental stage undergoes
asexual reproduction giving rise to either daughter generations of sporocysts or slightly more differentiated rediae (a stage that is absent from the life cycles of some genera of trematodes, such as schistosomes (blood flukes)). Embryos within the sporocysts or rediae then develop into cercariae that emerge from the mollusc and either penetrate the skin of a definitive host (e.g., in the case of schistosomes) or encyst as metacercariae that can be ingested by a definitive host via vegetation (e.g., in the case of Fasciola spp.) or through a second, infected intermediate host (e.g., an infected fish, in the case of Opisthorchis spp. and Clonorchis spp.). Once inside the definitive host, metacercariae excyst and mature into adult hermaphroditic worms, undergo tissue migration and eventually reach their predilection site/s, such as the liver, bile ducts, other parts of the gastrointestinal tract or lungs, depending on worm species. In the case of schistosomes, the cercariae, after having directly penetrated the definitive host, shed their tail and are disseminated via the blood stream. Juvenile schistosomules then mature into dioecious adults that mate, migrate to the mesenteric veins around the bladder (Schistosoma haematobium) or intestine (Schistosoma mansoni) and start producing eggs. Eggs that become entrapped in tissues of the definitive host induce an immune-mediated response, leading to inflammation, granulomatous changes and subsequent fibrosis (Rollinson, 2009). Clinical manifestations of chronic schistosomiasis include abdominal pain, enlarged liver and blood in the stool or urine (depending on species) (Colley et al., 2014; Caffrey, 2015). In addition, long-term, chronic infection with S. haematobium can be associated with the development of squamous cell carcinoma (Palumbo, 2007). Schistosomiasis, caused by water-borne flukes, is a neglected tropical disease (NTD). 
At least 230 million people are infected with schistosomes, and almost 700 million are at risk of contracting an infection (World Health Organization, 2012a; Colley et al., 2014). Schistosomiasis is responsible for more than three million of a total of 26 million disability-adjusted life years (DALYs) attributable to all NTDs (King et al., 2005; Fenwick, 2012; Murray et al., 2012). The two main species, S. haematobium and S. mansoni, are the cause of more than 160 million infections in sub-Saharan Africa, with the latter of the two having the greater impact on mortality, and being accountable for about two thirds of all schistosome infections in this region (van der Werf et al., 2003; World Health Organization, 2012a). Schistosoma mansoni is also found in Latin America, whereas Schistosoma japonicum is present mainly in China and South-East Asia (Colley et al., 2014). In addition to the substantial impact on human health caused by water-borne flukes, food-borne trematodes also cause chronic diseases in humans and other animals (Fried et al., 2004; Keiser and Utzinger, 2005, 2009; Fenwick, 2012; Murray et al., 2012; Toledo et al., 2012;
Lane et al., 2015). For example, collectively, the ‘Southeast Asian’ liver fluke, Opisthorchis viverrini (see Kaewkes, 2003) and the ‘Chinese’ liver fluke, Clonorchis sinensis (see Lun et al., 2005) infect more than 50 million people worldwide (Fürst et al., 2012) and cause cholangiocarcinoma in chronically infected patients (Sripa et al., 2012; Lai et al., 2016; Zheng et al., 2017), which is associated with substantial morbidity (World Health Organization, 2015). Other food-borne flukes that infect humans but appear to cause less detriment include intestinal flukes such as Echinostoma spp. (see Kostadinova and Gibson, 2000; Chai, 2007) and Fasciolopsis buski (see Keiser and Utzinger, 2009), as well as lung flukes of the genus Paragonimus (see Yokogawa et al., 1960; Liu et al., 2008). The main flukes that infect ruminants and cause substantial economic losses to the livestock industry are paramphistomes (see Durie, 1953) and liver flukes of the genus Fasciola (see Dorny et al., 2011). Importantly, many food-borne fluke species are zoonotic, which means that they are transmissible to humans and cause disease. Although flukes are responsible for the majority of food-borne helminthiases, some parasitic worms of the phylum Nematoda are also food-borne pathogens. For example, nematodes of the genus Trichinella infect a wide range of domesticated and wild vertebrate animals, including mammals, reptiles and/or birds (Murrell and Pozio, 2000). Humans mainly become infected when eating undercooked, inadequately cured or raw meat from an infected animal. Given the broad host range of Trichinella, trichinellosis represents a zoonotic disease of importance worldwide, although most human infections occur in European countries, likely due to a higher consumption of pork products in this region (Murrell and Pozio, 2011; Devleesschauwer et al., 2015).

1.5.2 Soil-transmitted helminths

In addition to the detriment caused by water- and food-borne helminths, an even heavier burden to human health is attributed to soil-transmitted helminths (STHs; Bethony et al., 2006; World Health Organization, 2012a, b). These parasitic worms belong to the diverse phylum Nematoda, which is comprised of more than 25,000 species (Zhang, 2013). The main soil-transmitted species infecting humans are the large roundworm Ascaris lumbricoides, whipworm (Trichuris trichiura) and multiple species of hookworm (Necator americanus, Ancylostoma duodenale and Ancylostoma ceylanicum). In contrast to most trematodes, these nematodes have a direct life cycle, i.e. they do not require an intermediate host to develop and/or reproduce. However, a period of development in the soil is required for eggs or larvae to become infective. Eggs are passed in the faeces and contaminate surrounding soil, thus
enabling the transmission of STHs. NTDs caused by intestinal nematodes affect more than two billion people worldwide, predominantly in communities with poor infrastructure and without access to sufficient sanitation (Hotez et al., 2009; World Health Organization, 2012b). Trichuris trichiura lives in the colon and caecum of humans, whereas A. lumbricoides and hookworms inhabit the small intestine. Soil contaminated with either infective eggs (A. lumbricoides and T. trichiura) or third-stage larvae (hookworms) is the source of infection. These stages are either ingested or penetrate the skin of the host (in the case of hookworm larvae) and then develop into adult worms. Depending on species, females can produce between 3000 (in the case of T. trichiura) and 200,000 (in the case of A. lumbricoides) eggs per day, which are shed via the faeces and disseminated in the environment (Bethony et al., 2006). In the most impoverished areas of the world, many children are infected with multiple species of STHs, causing reduced growth and slowed intellectual development (Hotez et al., 2009; World Health Organization, 2012b), exacerbating poverty due to long-term ill health and decreased productivity. In addition to soil-transmitted nematodes that infect humans, many parasites of the phylum Nematoda pose a substantial economic burden to livestock industries (Loyacano et al., 2002; Sutherland and Scott, 2009; Roeber et al., 2013; Charlier et al., 2014). Globally, species of Haemonchus, Teladorsagia (Ostertagia) and Trichostrongylus (order Strongylida) are common and pathogenic gastrointestinal parasites of ruminants (mainly affecting cattle, sheep and goats), causing economic losses of tens of billions of dollars per year; in Australia alone, costs associated with reduced animal productivity, deaths and parasite control amount to more than 500 million dollars annually (Lane et al., 2015).

1.6 Control of helminth infections and associated challenges

The fact that helminth parasites can infect humans and other animals via a range of different sources, including contaminated water, food and/or soil, and that many of them are zoonotic represents a multifactorial challenge for the effective prevention and control of infections and disease. For NTDs, many efforts aim to improve sanitation infrastructure, provide safe water and food, and educate people about disease transmission, risk factors and the need for hygiene (Bethony et al., 2006; Keiser and Utzinger, 2009; Colley et al., 2014). Another major intervention strategy for some worms is the control of vectors or intermediate hosts such as mosquitoes and molluscs (Hotez et al., 2009). In addition, there are efforts to understand social-ecological factors and to inform policy (Utzinger, 2012; Webster et al.,
2014). For livestock animals, worm control measures include educating farmers about appropriate pasture and grazing management, nutritional supplementation, treatment strategies as well as the breeding and selection of resilient animals (Besier and Love, 2012; Besier, 2012; Knox et al., 2012; Kearney et al., 2016). Despite the implementation of such control measures, parasite control still relies heavily on chemical intervention. There have been ongoing efforts to develop vaccines against a range of different parasitic worms, including cestodes (tapeworms; phylum Platyhelminthes), schistosomes, liver flukes and hookworms (cf. Miller, 1978; Rickard et al., 1995; Lightowlers et al., 2000; Loukas et al., 2007; Smout et al., 2009; Piratae et al., 2012; Hotez et al., 2013; Toet et al., 2014; Bottazzi, 2015; Sotillo et al., 2016), but, thus far, only a single vaccine (Barbervax®, www.barbervax.com.au) has been brought to market for the use in sheep against a gastrointestinal nematode (Haemonchus contortus) (see Besier and Smith, 2014). Although there have been promising applications of this vaccine (Bassetto et al., 2014; Besier et al., 2015), Barbervax® does not provide complete protection. Thus, in the absence of commercial vaccines against helminths, the mainstay for the control of infections is a small number of anthelmintics, approved for the treatment of helminthiases of humans and/or other animals (Keiser and Utzinger, 2010; Prichard et al., 2012). Most of these drugs target components of the worm’s nervous system; for example, piperazine and macrocyclic lactones, such as ivermectin and moxidectin, target gamma-aminobutyric acid (GABA) receptors. Other compounds, including amino-acetonitrile derivatives (monepantel; Kaminsky et al., 2008a, b), imidazothiazoles (levamisole), spiroindoles (derquantel) and tetrahydropyrimidines (pyrantel/morantel), target a different class of neurotransmitter receptors, the acetylcholine receptors (Holden-Dye and Walker, 2014). 
Other targets of the nervous system include calcium-activated potassium channels, which are targeted by cyclooctadepsipeptides (emodepside; Kulke et al., 2014). While these drugs are mainly used for the treatment of nematode infections, the control of trematodiases often relies on two key drugs, praziquantel (Caffrey, 2015) and triclabendazole (Caffrey et al., 2012). The latter belongs to the class of benzimidazoles, which also includes albendazole and mebendazole (Keiser and Utzinger, 2010). Anthelmintics of this class target the cytoskeleton of the parasite by binding to β-tubulin and thus inhibit the formation of microtubules (Lacey, 1990; Holden-Dye and Walker, 2014). Although drugs for the treatment of most helminth infections are available and their widespread use in mass drug administration (MDA) programmes for humans has led to the
elimination or substantial reduction of some NTDs in several areas (Prichard et al., 2012; Rollinson et al., 2013), there are multiple challenges associated with the administration of these drugs. First, helminthiases of humans often affect the poorest of communities and, therefore, drugs need to be supplied to people at no cost (Hotez et al., 2009). Furthermore, MDA programmes need to provide education and training for medical staff in affected areas, and ensure efficient medicine logistics, administration and treatment monitoring (World Health Organization, 2012b; Webster et al., 2014); for example, some drugs have substantial side effects, resulting in poor patient compliance (Bethony et al., 2006). In the veterinary field, efforts aim to educate farmers about the correct timing, type and dosage of treatment against parasitic worms to minimise the inefficient use of anthelmintics (Besier and Love, 2012; Besier, 2012). In addition, consumers’ preference for minimal chemical intervention in animal production has recently restricted the use of particular anthelmintics (Knox et al., 2012). Although these challenges are affecting the efficacy of intervention programmes, the main factor compromising effective disease control is resistance against currently available anthelmintics. In livestock animals, many economically important parasites developed resistance against most anthelmintics within a relatively short period of time after their introduction (Kaplan, 2004; Wolstenholme et al., 2004; Jabbar et al., 2006; Wolstenholme and Kaplan, 2012; Scott et al., 2013; Rose et al., 2015). Resistance can develop via a range of mechanisms: changes in the molecular target can lead to reduced/no binding of a drug; changes in metabolism can favour an increased drug efflux/breakdown or insufficient drug distribution to the target site; and the overexpression of the target gene can shift the drug-target stoichiometry, thus evading drug action (Wolstenholme et al., 2004). 
Several aspects of parasite biology favour the development of resistance: most parasitic nematodes of veterinary importance reproduce sexually, have a rapid and direct life cycle and can be very fecund, allowing for relatively pronounced genetic recombination and spontaneous mutation events, and spread of resistance alleles (Gilleard and Beech, 2007). The movement of host populations and the high genetic diversity between/among populations of parasitic nematodes further promote this development. In addition, an excessive and irrational use of existing drugs to improve productivity represents substantial selection pressure, ‘driving’ widespread resistance (Keiser and Utzinger, 2010; Kaplan and Vidyashankar, 2012). Human MDA programmes face similar issues, with a risk of resistance developing (Vercruysse et al., 2011; Webster et al., 2014), and the reliance on single drugs (e.g., the use of praziquantel for treating schistosomiasis) inducing selective pressure that could favour
resistant parasite populations. Low efficacy and cure rates have been reported both in laboratory and field settings for a range of different human helminths, including schistosomes, hookworms, common roundworms and whipworms (see Keiser and Utzinger, 2008; Olsen et al., 2009; Soukhathammavong et al., 2012; Wang et al., 2012), but true resistance (i.e. treatment-refractory parasites) has not yet been reported (Keiser and Utzinger, 2010). Nevertheless, it is possible that the resistance situation now seen in the veterinary field might unfold in human parasites, given continued MDA pressure (Webster et al., 2014; Bergquist et al., 2017). Therefore, there is a need for sustained efforts towards the identification and characterisation of new drug targets and the development of novel anthelmintics against parasitic worms of humans and other animals. In particular, there is a need for anthelmintics that have efficacy against all or most developmental stages of a parasite inside a host, have broad-spectrum activity and/or have a sterilising effect on adult worms (Geary, 2012; Kaplan and Vidyashankar, 2012; Bergquist et al., 2017). However, no new anthelmintics have entered the market since the introduction of ivermectin in 1981 (Kaplan and Vidyashankar, 2012), monepantel in 2008 (Kaminsky et al., 2008a, b) and derquantel in 2010 (Little et al., 2010). Another relatively novel drug, tribendimidine, whose toxicity profile is not yet defined, is only approved for use in China (Epe and Kaminsky, 2013). With no new anthelmintics with novel modes of action in sight, there is a risk that resistance will substantially compromise parasite control. In addition, there is little financial incentive for pharmaceutical companies to invest in research and development (R&D) of drugs against NTDs, given that they would have to be provided to affected countries/communities at virtually no cost (Molyneux, 2004; Hotez et al., 2009; Caffrey et al., 2012).

1.7 Aspects of drug discovery and development

The discovery and development of a new drug can be broadly divided into five main phases (Hughes et al., 2011). In the first discovery stage, the goal is to find and validate a ‘hit’, which is defined as a compound that shows activity (i.e. elicits the desired effect) in a screening assay. Compound libraries used for high-throughput screening (HTS) can contain several hundreds of thousands of chemical molecules. They are usually selected based on several physicochemical features that make them more likely to be a successful starting point for the development of a drug. For example, Lipinski’s “rule-of-five” defines a set of chemical properties that a molecule needs to have to be considered “drug-like” (Lipinski, 2004). Historically, the discovery of new chemical entities with the desired features to treat a
disease relied on the screening of small molecules in cell cultures or whole organisms. This approach, called phenotypic drug discovery, usually does not make any assumptions about the targeted protein or the mode of action of a small molecule, but rather selects compounds that exhibit the desired change in phenotype (Swinney and Anthony, 2011). In contrast, target-based drug discovery (TDD) builds on the hypothesis that the inhibition or activation of a particular molecular target (usually a protein) will lead to a desired effect, such as the disruption of a biochemical pathway (Swinney and Anthony, 2011). Based on this assumption, a large, random set of chemical entities (“screening library”) can be tested against the designated recombinant target protein. Another, more sophisticated option is to restrict the compounds that are to be tested to a specific chemical class based on significant prior knowledge (e.g., the three-dimensional structure of the target protein or known active compounds acting on the targeted protein). This approach helps to exclude unsuitable compounds in the early stages of the discovery process. TDD approaches usually allow for a much higher throughput compared to phenotypic screening approaches, and a lot of progress has been made in the development and automation of HTS assays in recent years (Macarron et al., 2011). Screening processes also include cross-screenings to exclude compounds that exhibit toxic effects, are unspecific inhibitors, or have “off-target” hits to unrelated targets (Baell and Walters, 2014), properties that could lead to potentially adverse effects for the patient. A third screening method is the computational docking of ligands stored in ‘virtual libraries’ against the three-dimensional model of a target protein, an approach known as “virtual screening” or “virtual docking” (Ferreira et al., 2015; Sarnpitak et al., 2015).
The three-dimensional structure of a target required for this approach can be elucidated by nuclear magnetic resonance (NMR) spectroscopy or X-ray crystallography, or computationally inferred by structural homology modelling based on a known protein structure (Szymczyna et al., 2009; Yang et al., 2015). In addition to its importance in early drug discovery, virtual screening also has useful applications in the later stages of drug development. For example, structural analysis of binding pockets and ligands can help to optimise the binding affinity of a compound to a target (McInnes, 2007). This and similar approaches are part of the ‘hit-to-lead’ and lead optimisation phase, the second phase in the drug development process. In this stage, the first goal is to optimise the efficacy of an active molecule (‘hit’) and present a structure for further optimisation (‘lead’). One important step of this phase is the investigation of (predicted) changes in activity/toxicity of a molecule upon chemical or computational modification of its side chains. These
experiments are known as structure-activity-relationship (SAR) studies. Additionally, there is a focus on the analysis of pharmacokinetic properties, as well as absorption, distribution, metabolism, excretion and toxicity (ADMET; Lin et al., 2003). Subsequently, the lead structure is further optimised to make a compound more potent or more selective, or to increase its binding affinity. In this context, it is important that all favourable features (e.g., low toxicity and/or high efficacy) of the original hit structure are retained. Following the hit-to-lead phase, a drug candidate must undergo extensive pre-clinical and clinical trials and eventually regulatory approval (Hughes et al., 2011). The entire drug discovery and development process, from target identification to the marketing of a drug, is a very expensive and slow process, often taking 10-15 years and costing between one and five billion dollars (Hughes et al., 2011; Geary et al., 2015). Additionally, many drug candidates fail in one of the stages, for example due to insufficient efficacy or an unacceptable toxicity profile, which leads to major financial losses associated with discontinued development efforts (Hughes et al., 2011). Given the low success rate and potentially small returns on investments for a new drug, in recent years, the repurposing or repositioning of approved drugs or drug-like molecules has gained popularity (McInnes, 2007; Ekins and Williams, 2011; Ekins et al., 2011). Using a drug that has already been approved for a particular indication as a starting point has the advantage that one can circumvent some or even many of the costly steps in the R&D process, such as testing for toxicity and off-target effects, because the candidate drug has already passed these stages (Ashburn and Thor, 2004). This approach has also gained popularity in the search for new anthelmintics. 
Drugs investigated as repurposing candidates include anti-malarials, antibiotics and veterinary anthelmintics for use in humans (Olliaro et al., 2011; Panic et al., 2014). Another strategy involves the medium- to high-throughput phenotypic screening of known drugs, discontinued drug candidates or molecules with drug-like properties against parasitic worms (Abdulla et al., 2009; Preston et al., 2016), with the aim of finding drugs/drug leads that can be repurposed as anthelmintics. In this context, many studies have investigated protein kinase inhibitors approved for the treatment of human cancers as potential drug development/repurposing candidates against parasitic infections or diseases (e.g., Preston et al., 2015a), in particular against schistosomiasis (Dissous and Grevelding, 2011; Gelmedin et al., 2015). Other studies draw from the extensive knowledge in traditional, plant-based medicine and aim to find lead compounds from plant extracts that have anthelmintic activity in vitro (Kumarasingha et al., 2016).
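
The drug-likeness filter referred to at the start of this section (Lipinski, 2004) can be illustrated with a minimal sketch. The thresholds below are the commonly cited rule-of-five limits; the `Compound` class, the tolerance of one violation and all descriptor values are assumptions made for illustration only, and a real workflow would compute the descriptors with a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    """Pre-computed physicochemical descriptors (hypothetical container)."""
    name: str
    mol_weight: float       # molecular weight in Da
    log_p: float            # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(c: Compound) -> int:
    """Count how many of the four rule-of-five limits are exceeded."""
    checks = [
        c.mol_weight > 500,
        c.log_p > 5,
        c.h_bond_donors > 5,
        c.h_bond_acceptors > 10,
    ]
    return sum(checks)


def is_drug_like(c: Compound, max_violations: int = 1) -> bool:
    """Assumption: tolerate at most one violation, a common convention."""
    return rule_of_five_violations(c) <= max_violations


# Invented descriptor values, not measurements for any real molecule.
library = [
    Compound("hypothetical_a", 350.0, 2.1, 2, 5),
    Compound("hypothetical_b", 720.0, 6.3, 6, 12),
]
drug_like = [c.name for c in library if is_drug_like(c)]
```

Such a filter only removes compounds with unfavourable properties from a library before screening; it does not predict activity against a target.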

1.8 Nematode model systems for anthelmintic drug discovery

Many recent drug discovery efforts rely on phenotypic screening systems that measure the change in viability and/or motility of a parasite upon drug exposure (Preston et al., 2015a). Consequently, these systems usually do not inform about the mode of action of the tested molecule. However, this knowledge is desired for a new drug candidate that has shown activity in vitro, as it can inform the future use of an anthelmintic and, potentially, the development of reliable diagnostic tests for resistance. Detailed functional molecular studies of the interaction between drug and parasite in vitro can help to obtain this evidence. However, both phenotypic screening assays and functional studies in parasitic worms require the establishment of the parasite life cycle in a laboratory setting (in vitro and/or in vivo), which can be costly and challenging, in particular for parasites involving one or more intermediate hosts (Caffrey et al., 2012). This issue can hamper functional studies and drug discovery efforts in parasite species for which drug resistance is emerging and new drugs are needed. Therefore, C. elegans, a small, free-living nematode with a direct, 3-day life cycle, and arguably the best-studied multicellular (metazoan) organism (Harris et al., 2014), has been used routinely as a tractable surrogate system to study the biology of nematodes and discover new anthelmintics (Burns and Roy, 2012; Holden-Dye and Walker, 2012). The use of C. elegans also allows for the application of established molecular genetic techniques, such as gene knockdown (Holden-Dye and Walker, 2007, 2014; Burns et al., 2015), which do not work in some parasitic nematodes (Knox et al., 2007; Dalzell et al., 2012; Hagen et al., 2012), although they have been applied successfully to trematodes (e.g., Hagen et al., 2014; Guidi et al., 2015; Hagen et al., 2015). Despite the ease of genetic modification and maintenance of C.
elegans in a laboratory setting, functional studies of parasite biology (for example, of virulence factors) are not feasible in a free-living nematode (Gilleard, 2004, 2006), and studying anthelmintic resistance in C. elegans rather than the parasitic nematode itself also has its limitations (Gilleard, 2013). Thus, H. contortus has been established as a model to investigate the biology of parasitic nematodes, and for drug discovery and anthelmintic resistance studies (Gilleard, 2006; Lanusse et al., 2016). A number of features make H. contortus attractive for these undertakings. First, its close phylogenetic relationship to C. elegans within the clade V nematodes (Rhabditina; cf. Blaxter and Koutsovoulos, 2015) allows for knowledge transfer/extrapolation from studies in C. elegans. Furthermore, adult worms are relatively large (~2.5 cm) and females produce thousands of eggs per day, creating an abundance of biological and genetic material (Gilleard, 2013). Other reasons include that
infective L3 larvae can be cryopreserved or stored at 10 °C for up to three months (Preston et al., 2015a), and in vivo studies in the ruminant host are very feasible (Gilleard, 2006). In addition to the usefulness of functional studies of parasite genes or proteins for anthelmintic drug discovery and development, knowledge of the three-dimensional structure of a target protein also represents an important asset for the drug development process. In recent years, there have been increased efforts to obtain the three-dimensional structures for a range of parasite proteins. For example, the PDB (http://www.rcsb.org/pdb/) contains a substantial number of solved structures of protein kinases of parasitic protozoans (e.g., Leishmania spp., n = 5; Plasmodium spp., n = 21; Toxoplasma gondii, n = 40). However, this database only holds six crystal structures of C. elegans protein kinases and no such structures for parasitic nematodes or flatworms. While such data are clearly needed, there are several reasons for this dearth of information. First, experimental determination of protein structures is still an expensive process and often cannot be carried out in a high-throughput manner. Second, to obtain enough proteinaceous material for the structural characterisation of a protein kinase, a substantial amount of parasite material is needed, requiring the maintenance of the parasite in culture, which can (depending on the parasite and the developmental stage) be costly and challenging.

1.9 Genomic-guided and computational drug discovery

Given the challenges associated with functional and structural studies in parasitic helminths and the substantial costs associated with such experiments, researchers often carefully select a relatively small set of genes or proteins that are worth investigating experimentally. In this context, the analysis of genomic and transcriptomic data, among other ‘-omic’ data sets (cf. Preidis and Hotez, 2015), provides useful starting points, as it allows for the integration of data on parasite biology, small-molecule ligands and drug targets, and evidence from functional studies (Brindley et al., 2009; Cantacessi et al., 2015). With a continuing decrease in the cost of nucleotide sequencing facilitated by NGS technologies (Hayden, 2014; van Dijk et al., 2014; Goodwin et al., 2016), draft genomes and transcriptomes of many parasitic worms are now available; to date (June 2017), 134 draft genomes representing 114 species of parasitic worms are reported on “WormBase ParaSite” (version WBPS9; http://parasite.wormbase.org/index.html; Howe et al., 2016). Although other resources for the storage and analysis of genomic and transcriptomic data sets of helminths exist (e.g., Helminth.net, Martin et al., 2015; NEMBASE, Parkinson et al., 2004
and HelmDB, Mangiola et al., 2013), WormBase ParaSite currently represents the largest and most comprehensive collection of such data sets. Recently, researchers have started to mine such data (Loging et al., 2007) to prioritise proteins encoded in parasite genomes as drug targets and predict molecules as starting points for anthelmintic drug development (Table 1.3). Such approaches usually integrate phenotypic information from orthologs in experimentally studied organisms, such as C. elegans or D. melanogaster, to infer proteins that likely elicit a lethal phenotype upon gene perturbation in the parasite, and thus are essential (cf. Caffrey et al., 2009; Young et al., 2012). Additionally, transcriptomic data are often employed to gain insights into the transcriptional regulation of genes across different life cycle stages of a parasite, which can provide first clues about the role of genes/proteins in the developmental stages that could be targeted by a drug (usually all stages inside the definitive host). Drug target and drug information is also frequently integrated into the prioritisation approach, employing databases such as ChEMBL (Gaulton et al., 2012) and DrugBank (Law et al., 2014). Inferences of potential targets and associated small molecule ligands are mainly based on sequence similarity between parasite proteins and known targets in other organisms, and on filtering related small molecules according to certain biochemical properties (cf. Lipinski, 2004) or activity data (e.g., from a screening assay) available in the database. Some databases/resources are specifically designed for drug target prioritisation in parasites. For example, the “Tropical Disease Research (TDR) Targets” resource (Agüero et al., 2008; Magarinos et al., 2012) allows for the prioritisation/prediction of drug targets in parasites causing tropical diseases. 
However, TDR Targets is mainly designed for the analysis of unicellular eukaryotic parasites (see also Heiges et al., 2006; Aurrecoechea et al., 2009, 2010) and only includes data for two helminth species (B. malayi and S. mansoni). In addition to information on drugs and their targets, functional annotation of proteins inferred from genomic and transcriptomic data can provide important insights into their roles in different biochemical pathways. Such data are inferred through similarity searches against proteins deposited in pathway resources, such as the Kyoto Encyclopedia of Genes and Genomes (KEGG; Kanehisa and Goto, 2000) and the Reactome (Fabregat et al., 2016) databases. Pathway network information is then integrated into the drug target prediction process, for example, by defining ‘chokepoints’ - proteins that have a unique role in a pathway and whose inhibition will likely interrupt an essential reaction (cf. Schwarz et al., 2013; Taylor et al., 2013a). The analysis of biochemical/metabolic pathways of helminths often reveals many unique features that are likely related to their lifestyle or pathogenesis. For
example, many metabolic pathways are reduced or lost in particular lineages of parasitic worms (cf. Tsai et al., 2013; Tyagi et al., 2015), whereas others are substantially expanded (cf. Schwarz et al., 2015). Such information can also provide useful clues for drug target prediction efforts, for example, by revealing parasite-specific target proteins. Once all of this information is combined, different strategies for drug target prediction/prioritisation can be applied: filtering according to certain properties, or assigning a weight to all properties (according to their perceived importance) and subsequent ranking of proteins/compounds (Shanmugam et al., 2012). Sometimes, a combination of these approaches can be applied as well, for example, by including only those genes that are transcribed in a particular developmental stage and assigning weights to all other criteria to then follow a ranking-based approach. While some studies employ a relatively broad prioritisation strategy, without focusing on a particular class of proteins (see e.g., Young et al., 2012; Schwarz et al., 2013), others confine their searches and predictions to a particular class of enzymes, which allows for the integration of additional information, such as data on subfamily classification (Martin et al., 2009; Goldberg et al., 2013) or residues essential for enzymatic activity (Boudeau et al., 2006; Murphy et al., 2014). Focusing on a single class of enzymes also has the advantage of being able to manually integrate additional evidence from the literature. In this context, many studies have focused on protein kinases, given their known roles in many essential cellular signalling processes and/or in disease (Cohen, 2000, 2001; Blume-Jensen and Hunter, 2001), and the fact that they are amenable to targeting by small molecule drugs (Cohen, 2002; Cohen and Alessi, 2013). For example, Caffrey et al. 
(Caffrey et al., 2009) inferred three protein kinases as potential drug targets in the blood fluke S. mansoni using a chemo-genomic filtering approach. Another study (Taylor et al., 2013b), which investigated drug targets in parasitic nematodes, inferred 68 protein kinases to be essential based on a comparison of kinomes among parasitic and free-living nematodes. Based on this essentiality prediction, the authors (Taylor et al., 2013b) inferred these proteins to represent ‘good’ drug targets. However, the inclusion of C. elegans in this analysis might have reduced the number of kinases inferred to be conserved among the three parasitic nematodes. Instead, a subtractive approach (i.e. kinases that are conserved in all parasitic nematodes studied but not in C. elegans) might have revealed sets of kinases that are more likely to be specific to parasitic nematodes, and thus might represent promising targets. Subsequently, the anthelmintic activity of 18 kinase inhibitors targeting human homologs of the identified parasite kinases was tested against C. elegans, H. contortus and B. malayi (Taylor et al.,
2013b). Taken together, these studies have provided promising starting points for future drug discovery and development efforts. In the search for novel anthelmintics, the computational identification of kinase inhibitors that have been approved for the treatment of human diseases and have potential to be repurposed for a different indication (Ekins et al., 2011) has recently gained momentum (Panic et al., 2014; Preston et al., 2015b). Particularly in schistosomes, this so-called ‘piggy-back’ approach has been employed extensively and has resulted in the functional investigation of a range of human kinase inhibitors as anti-schistosome agents (Dissous and Grevelding, 2011). In total, the effect of 17 kinase inhibitors on different developmental stages of schistosomes was functionally investigated (Gelmedin et al., 2015). Of these inhibitors, three are FDA-approved; both nilotinib and dasatinib inhibit several schistosome Abl and Src/Abl kinases and cause diminished pairing stability and reduced viability, but the latter also induces apoptosis in several tissues, interferes with sperm maturation and leads to a collapse of the gastrodermis, eventually causing the death of adult worms (Beckmann et al., 2012). A third Abl and Src/Abl inhibitor, imatinib, has also shown promising in vitro activity against adult S. mansoni worms (Beckmann and Grevelding, 2010; Buro et al., 2014), but in vivo tests in mice and hamsters revealed a lack of efficacy caused by the interference of the drug with blood components (Katz et al., 2013; Beckmann et al., 2014). In addition to those approved compounds and their targets, a range of other protein kinases have been investigated to gain insight into their role in schistosome biology and their potential as drug targets (reviewed by Dissous and Grevelding, 2011; Beckmann et al., 2012; Morel et al., 2014; Walker et al., 2014).
As these examples show, repurposing of well-studied compounds is a viable option to find new starting points for drug development against parasitic helminths. However, they also illustrate that most studies focus on single genes/proteins that represent well-established drug targets in other organisms and have approved drugs associated with them. Clearly, to move from this gene-centric to a more genome-centric approach, there is a need for better, integrative tools and resources that facilitate the identification, classification and curation of protein kinases, allowing researchers to effectively harness, on a global scale, the large amounts of genomic, transcriptomic and functional data that are now available.
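
The combined filtering and weighting strategy for drug target prioritisation outlined in this section can be sketched as follows. This is a minimal illustration and not the method of any of the cited studies: the `Candidate` fields, the weights and the choice of stage-specific transcription as a hard filter are all assumptions chosen to mirror the criteria mentioned in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Evidence collected for one predicted protein (hypothetical fields)."""
    protein_id: str
    transcribed_in_target_stage: bool   # hard filter
    lethal_ortholog_phenotype: bool     # e.g. inferred from a model organism
    is_chokepoint: bool
    has_known_ligand: bool


# Assumed weights reflecting the perceived importance of each criterion.
WEIGHTS = {
    "lethal_ortholog_phenotype": 3.0,
    "is_chokepoint": 2.0,
    "has_known_ligand": 1.0,
}


def score(c: Candidate) -> float:
    """Sum the weights of all criteria that the candidate satisfies."""
    return sum(w for attr, w in WEIGHTS.items() if getattr(c, attr))


def prioritise(candidates: list[Candidate]) -> list[Candidate]:
    """Filter on stage-specific transcription, then rank by weighted score."""
    kept = [c for c in candidates if c.transcribed_in_target_stage]
    # Ties are broken by identifier so that the ranking is reproducible.
    return sorted(kept, key=lambda c: (-score(c), c.protein_id))
```

Changing the weights changes the ranking, which is why such scores are best read as a way of ordering candidates for manual inspection rather than as evidence of essentiality.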

1.10 Resources and databases for protein kinases

Attempts to integrate information on kinases from a range of resources to build a “Protein Kinase Resource” date back to before the complete human genome sequence was published (Smith et al., 1997; Smith, 1999; Petretti and Prigent, 2005; Niedner et al., 2006). However, this resource is no longer available or supported. The “Kinases encoded in Genomes” database (KinG; Krupa et al., 2004) is another, similar resource for protein kinases and associated information. Although, according to the website (http://king.mbu.iisc.ernet.in), this database was updated in 2014, including several additional kinomes of model organisms, the retrieval of information and the sequence- or domain combination-based search (as claimed) are not possible online, suggesting that there is no ongoing technical support for this resource. A similar kinome database (KinBase; http://kinase.com/web/current/kinbase/; maintained by the Manning group at Genentech Inc., in collaboration with the Razavi-Newman Center for Bioinformatics at the Salk Institute for Biological Studies, and Cell Signaling Technology Inc.) contains the kinome data sets of 15 different species (Table 1.1), most of which represent model organisms or species at phylogenetic branch points. This resource is mainly a collation of individually published kinome projects and allows for the retrieval of full-length amino acid sequences, catalytic domain sequences and kinase group, family and subfamily classifications, and provides a basic overview of evolutionary relationships among kinase sequences. However, 209 of the 7597 sequences (2.75%) in KinBase are less than 200 amino acids long, suggesting that they are fragments of protein kinases. These numbers show that even the best-curated kinase database contains draft or un-curated data, a statement that is further supported by differing numbers reported in different studies for these model organisms (Appendix 1.1).
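
The length-based check behind the KinBase figure quoted above can be reproduced in a few lines. The following is a minimal sketch: the FASTA parser is deliberately simple, the function names are hypothetical, and the 200-residue threshold is taken from the text rather than from any documented KinBase criterion.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA-formatted text into {identifier: sequence}."""
    records: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the identifier.
            current = line[1:].split()[0]
            records[current] = ""
        elif current is not None:
            records[current] += line
    return records


def short_sequences(records: dict[str, str], min_length: int = 200) -> list[str]:
    """Return identifiers of sequences shorter than min_length residues."""
    return sorted(name for name, seq in records.items() if len(seq) < min_length)


def percent(part: int, total: int) -> float:
    """Percentage of part in total, rounded to two decimal places."""
    return round(100.0 * part / total, 2)


# The proportion reported in the text: 209 of 7597 sequences.
fragment_share = percent(209, 7597)  # 2.75
```

A short sequence is only an indication of a fragment; confirming it would require inspecting the gene model or the alignment to the kinase catalytic domain.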
The “Protein Kinase Ontology Browser” (ProKinO; Gosal et al., 2011; McSkimming et al., 2015) represents an ontological framework that integrates a wide range of information on the sequence, structure, function, mutations and pathways of protein kinases. By using a controlled vocabulary and a defined ontology, this framework allows integrative analyses of protein kinases to be performed, which can aid in the formulation of testable hypotheses. The authors (Gosal et al., 2011; McSkimming et al., 2015) demonstrated the utility of ProKinO by mining and annotating the human cancer kinome. Although this platform contains information on 14 other, phylogenetically diverse organisms (including C. elegans), it contains relatively sparse information on individual protein kinases for these species, suggesting that this resource is mainly targeted at the human cancer research community. A recent expansion of ProKinO termed KinView (McSkimming et al., 2016) further supports
this notion by providing an interactive ‘visualisation’ interface facilitating comparative analyses between subsets of kinases (e.g., families/subfamilies) and integrating other information, including natural sequence variation, cancer variants, post-translational modifications and residue-specific annotation. Another example of such a specialised resource is KinMutBase (Ortutay et al., 2005), which holds information on disease-causing variations in protein kinase domains. In addition to databases containing information on protein kinases, PhosphoSitePlus (Hornbeck et al., 2015) and PhosphoPOINT (Yang et al., 2008) contain data on post-translational protein modifications (PTM) including phosphorylation sites, and protein-protein interactions (PPI), presenting important ‘interactome’ resources for protein kinases. Another database, “Kinase Pathway Database” (Koike et al., 2003), employs natural-language processing to extract and integrate information from published articles to create a knowledgebase on signalling pathways of protein kinases. Although these resources are useful in the broader context of kinase research, they are mostly restricted to the analysis of human kinases, and do not provide kinase identification or classification capabilities.

1.11 Methods for the characterisation and annotation of kinomes

Given the ever-expanding number of available complete and draft genomes, there is a clear need for reliable automatic identification and classification of protein kinases. To address this need, several software packages have been developed. Historically, the identification of protein kinases in sequence repositories relied on sequence similarity searches using the “Basic Local Alignment Search Tool” (BLAST; Altschul et al., 1997). However, kinases may exhibit substantial sequence divergence in regions outside of the conserved kinase catalytic domain, resulting in low BLAST scores. To address this limitation, hidden Markov models (HMMs) have been employed for protein domain detection (Eddy, 1996). HMMs are frequently used for pattern detection in the context of speech recognition (Krogh et al., 1994), but it has become clear that these models are readily applicable to the detection of functional protein domains or conserved nucleotide patterns. They are statistical descriptions of sequence conservation from multiple sequence alignments (MSAs) and have been shown to outperform standard local alignment-based methods for sequence search/comparison (such as BLAST), both in terms of sensitivity and specificity (Sonnhammer et al., 1997). A collection of HMMs for most protein domains, including two models representing the sequence of the catalytic domain of serine/threonine kinases (Pkinase; PF00069) and tyrosine kinases (Pkinase_Tyr; PF07714), are stored in the
Pfam database (Sonnhammer et al., 1997). The application of these two Pfam HMMs allows for the subdivision of protein kinases into two major classes: kinases that phosphorylate serine and/or threonine residues and kinases that phosphorylate tyrosine residues. In recent years, other methods have been developed and have improved the classification of protein kinases. For instance, Kinomer (Miranda-Saavedra and Barton, 2007; Martin et al., 2009) is a computer program that improves the classification of kinases by constructing and applying 12 group- or family-specific hidden Markov models. This approach facilitated the classification of some previously unclassified kinases. The Kinomer HMM library outperforms protein BLAST-based approaches (Camacho et al., 2009) and the general Pfam HMMs in both the identification and group-level classification of protein kinases (Martin et al., 2009). This improved accuracy is facilitated by group-specific HMMs compared with a single HMM for the entire protein kinase superfamily, recognising homologs with higher scores and rejecting non-homologs with larger E-values (Miranda-Saavedra and Barton, 2007). This method has been applied for the characterisation of more than 40 draft kinomes, including those of microsporidia (see Miranda-Saavedra et al., 2007), amoebae (see Clarke et al., 2013), apicomplexans (see Talevich et al., 2014), fungi, algae, plants and (Martin et al., 2009). The major limitation of Kinomer is that it can only classify kinases to the group level and has limited capabilities for the classification of aPKs, only being able to identify protein kinases within four of a total of at least 13 families. Further classification of kinases into subfamilies cannot be achieved using this tool, and novel families/subfamilies are likely to be missed or classified at the group level only.
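
The principle of group-level classification with a library of group-specific HMMs can be illustrated with a short sketch that assigns each sequence to the group of its best-scoring model. This is not the Kinomer implementation: the `Hit` record, the E-value cut-off and the example scores are assumptions, and in practice the hits would be parsed from the output of an HMM search program.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One sequence-versus-model match (hypothetical record layout)."""
    seq_id: str
    group: str        # kinase group represented by the HMM, e.g. "AGC"
    bit_score: float
    e_value: float


def classify_by_best_hit(hits: list[Hit], max_e_value: float = 1e-5) -> dict[str, str]:
    """Assign each sequence to the group of its highest-scoring significant hit.

    Sequences whose hits all exceed max_e_value are left out of the result,
    i.e. they remain unclassified.
    """
    best: dict[str, Hit] = {}
    for hit in hits:
        if hit.e_value > max_e_value:
            continue  # reject weak matches (assumed cut-off)
        current = best.get(hit.seq_id)
        if current is None or hit.bit_score > current.bit_score:
            best[hit.seq_id] = hit
    return {seq_id: hit.group for seq_id, hit in best.items()}


# Invented scores for illustration only.
example_hits = [
    Hit("seq1", "AGC", 210.5, 1e-60),
    Hit("seq1", "CAMK", 150.2, 1e-40),
    Hit("seq2", "TK", 95.0, 1e-20),
    Hit("seq3", "CMGC", 12.0, 0.3),
]
groups = classify_by_best_hit(example_hits)
```

Because the assignment depends only on the models in the library, a sequence from a family that no model represents is either left unclassified or placed in the closest existing group, which mirrors the limitation described above.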
In contrast, the program Kinannote (Goldberg et al., 2013) produces a draft kinome and comparative analyses for a predicted proteome using a single-line command, and is currently the only tool that automatically classifies protein kinases to the sub-family level using the controlled vocabulary of Hanks and Hunter (Hanks et al., 1988; Hanks and Hunter, 1995). In a first step, Kinannote employs an ePK HMM derived from a manual alignment of the Dictyostelium kinome (Goldberg et al., 2006). By using a model that represents an early evolutionary branch point and employing a relaxed cut-off, this first filtering step aims to reduce the search space for following steps, while retaining sensitivity to evolutionary divergent kinase sequences. Subsequently, a position-specific scoring matrix (PSSM; cf. Henikoff and Henikoff, 1996), built from an MSA of protein kinase domains of the Dictyostelium kinome and other kinomes in KinBase, is employed to identify conserved sequence motifs in the kinase catalytic domain that are important for kinase activity. The resultant score is then applied in the second phase of the algorithm together with results of a


search against KinBase using BLAST. In this phase, kinases are identified based on sequence similarity to kinases in KinBase, as well as HMM and PSSM scores. Depending on these scores, sequences are either retained for subsequent classification via a search against KinBase using BLAST, or are added to the draft kinome set for further curation and designated either “twilight” hits or “protein kinase subdomain-containing proteins”. Sequences without BLAST hits and with scores below the defined cut-off values are considered not to be kinases and are excluded from further analysis. Based on published evidence, Kinannote outperforms Kinomer in terms of sensitivity (Goldberg et al., 2013), is user-friendly and can rapidly process a large number of amino acid sequences. Furthermore, its output is human-readable, providing a foundation for manual curation of the submitted sequence data. Nevertheless, this tool has several limitations. First, it accepts only protein sequence data as input and does not allow additional information that could aid subsequent curation, such as RNA-Seq evidence or genomic data, to be provided. Second, recently diverged kinase sequences might not be identified by the Dictyostelium HMM even when a relaxed cut-off is employed. In addition, most aPKs are not detected by the HMM, because it is built from ePKs and thus will only detect the small number of aPKs with sequences similar enough to those of ePKs. Third, the BLAST-based classification approach is unable to (fully) classify divergent/novel families/subfamilies (e.g., those of parasitic helminths) as they are not represented in KinBase. Some of these limitations can be overcome by employing orthology- (Li et al., 2003) or phylogeny-based (Yang and Rannala, 2012) approaches. For example, a phylogenetic analysis of the kinome of P. falciparum revealed a novel kinase family with a conserved Phe-Ile-Lys-Lys (FIKK) motif, which, upon investigation of additional apicomplexan kinomes, was shown to be common among, yet specific to, many apicomplexans (see Talevich et al., 2011). Additionally, kinases can be grouped based on the presence of particular accessory domains and their order within the amino acid sequence (Manning et al., 2002a, b; Hanks, 2003), for example, by employing domain search tools such as InterProScan (Jones et al., 2014). However, to our knowledge, this approach has not been applied to support the global classification of protein kinases. Other approaches, such as that employed in the “Conserved Domain Database” (CDD; Marchler-Bauer et al., 2013), rely on experimentally determined three-dimensional structures of kinases to define functional domains. Although this strategy can allow for the identification of divergent kinase sequences with conserved structures, it does not achieve family or subfamily classification. In contrast, computational structural homology modelling


of unclassified kinases (e.g., using the program I-TASSER; Yang et al., 2015) could provide clues that do allow for the classification of kinases into families/subfamilies. In addition, kinase sequences can be further characterised based on the presence (and characteristics) of phosphorylation sites (Hornbeck et al., 2015) or based on their predicted catalytic activity (Boudeau et al., 2006; Murphy et al., 2014). Taken together, although many tools for the identification, classification and annotation of protein kinases exist, there is no tool that combines and integrates all of the described methodologies to achieve a reliable, comprehensive classification, which can be universally applied to kinomes from a broad range of diverse organisms. This situation represents a challenge for the annotation of helminth kinomes, particularly because most HMM-based methods and databases do not include helminth sequences in their model-building steps. Furthermore, none of the tools described here provides outputs that readily enable detailed, (semi-)automated curation of amino acid and nucleotide sequences inferred from draft genomes.
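The score-based filtering described earlier in this section for Kinannote can be summarised as a simple triage rule. The sketch below is illustrative only and is not the Kinannote implementation: the cut-off values, the function name `triage` and the handling of borderline cases are assumptions, and the two “further curation” designations mentioned in the text are collapsed into a single label because the rule that separates them is not given here.

```python
from enum import Enum


class Call(Enum):
    CLASSIFY = "classify via best KinBase BLAST hit"
    CURATE = "add to draft kinome for manual curation"
    REJECT = "not considered a protein kinase"


def triage(hmm_score: float, pssm_score: float, has_blast_hit: bool,
           hmm_cutoff: float = 20.0, pssm_cutoff: float = 10.0) -> Call:
    """Triage one sequence from its HMM score, PSSM score and BLAST evidence.

    The cut-offs are hypothetical placeholders. A sequence needs both scores
    at or above their cut-offs to count as passing (an assumption).
    """
    passes_scores = hmm_score >= hmm_cutoff and pssm_score >= pssm_cutoff
    if has_blast_hit and passes_scores:
        return Call.CLASSIFY  # well supported: classify by similarity
    if has_blast_hit or passes_scores:
        return Call.CURATE    # partial support: keep for manual curation
    return Call.REJECT        # no BLAST hit and scores below cut-offs
```

A rule of this kind makes explicit why divergent sequences tend to accumulate in the curation category: they can pass the relaxed score thresholds without having a close relative in the reference database.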

1.12 Draft kinomes of helminths

Despite these challenges, most published draft genome projects of parasitic worms provide an estimate of the number of encoded protein kinases (Table 1.3), and sometimes include a classification of kinases into the recognised groups, families and/or subfamilies. However, these estimates vary substantially for the same species among different publications, and depend heavily on the methodology applied for the identification and classification of protein kinases. In some cases (e.g., Desjardins et al., 2013; Bennuru et al., 2016), the authors note that additional curation of kinome data sets is required in the future, given potential problems with gene models and/or misclassifications. Most studies do not extend their kinome analysis beyond counting the protein sequences that contain a kinase catalytic domain. Such analyses are usually based either on a match to one of the two HMMs representing this domain in the Pfam database (PF00069 and PF07714) or on matches to known protein kinase sequences in a database (e.g., UniProtKB/Swiss-Prot; Boutet et al., 2007) using BLAST. Although a relatively large number of such estimates are available for draft genomes of parasitic worms, no well-curated kinomes have been published in their own right to date, with the exception of the eukaryotic protein kinase complement of S. mansoni (see Andrade et al., 2011). However, the latter study did not identify and classify divergent PKLs/aPKs, such as RIO kinases, although investigating such divergent sequences in parasites is warranted, given that they might represent parasite-specific drug targets or could provide important clues regarding the unique biology of a parasite. In addition to the S. mansoni kinome, draft kinomes for two parasitic nematodes (Loa loa and Wuchereria bancrofti) have been defined (Goldberg et al., 2013) employing the automated classification method in Kinannote and subsequent refinement based on sequence similarity with the kinases of D. discoideum, S. cerevisiae, D. melanogaster and H. sapiens, and orthology with C. elegans kinase sequences. Furthermore, as part of the L. loa genome project (Desjardins et al., 2013), the draft kinomes of several nematodes, including L. loa, B. malayi, W. bancrofti, Ascaris suum, Pristionchus pacificus and Meloidogyne hapla, were identified based on orthology with C. elegans sequences and/or a protein kinase HMM representing Dictyostelium ePKs (Goldberg et al., 2006). These two studies report substantially different kinomes for both L. loa and W. bancrofti, indicating that the choice between the two methods has a significant influence on the resultant kinome data sets. Other studies have not defined whole draft kinomes, but used Kinomer and a BLAST-based approach to define a subset of kinases as potential drug targets (Taylor et al., 2013b) and to study a single signalling pathway in S. japonicum (see Wang et al., 2006), respectively. While all of these studies can provide starting points for functional investigation, future work on worm kinomes and associated aspects is currently compromised by missing or inadequately curated data.
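The extent to which two methods disagree on the kinome of a single species can be quantified by comparing the sets of predicted kinase identifiers. The following is a minimal sketch under the assumption that both studies use a shared set of protein identifiers; the function name `compare_kinomes` and the identifiers in the example are hypothetical and do not reproduce data from the studies cited above.

```python
from typing import Dict, Set


def compare_kinomes(a: Set[str], b: Set[str]) -> Dict[str, float]:
    """Summarise the overlap between two predicted kinome sets.

    Returns the number of shared and method-specific predictions and the
    Jaccard index (shared / union), which is 1.0 for identical sets.
    """
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Hypothetical identifiers standing in for two published kinome predictions
method_a = {"kinase_1", "kinase_2", "kinase_3"}
method_b = {"kinase_2", "kinase_3", "kinase_4", "kinase_5"}
summary = compare_kinomes(method_a, method_b)
```

In practice, differences in gene models between genome annotation versions would need to be reconciled before such a comparison is meaningful.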

1.13 Conclusions from the literature review and aims of this thesis

In the last decade, significant advances in nucleotide sequencing technologies have facilitated the decoding of a wide range of genomes and transcriptomes of eukaryotic organisms. These -omic data sets now provide researchers with a wealth of information that can be mined and analysed employing sophisticated algorithms to infer biologically meaningful data. In particular, using such data sets to study the biochemical signalling machinery and pathways of an organism can provide important insights into its fundamental molecular biology. In this context, protein kinases represent important enzymes that play essential roles in the regulation of many cellular processes and, when deregulated, are responsible for the onset of some key diseases. Therefore, the sequences, structures and catalytic function of kinases have been extensively investigated in many model organisms, and these investigations have demonstrated that these enzymes are amenable to targeting by small-molecule drugs. In spite of their biological importance, only a few kinase complements of pathogenic organisms have been studied to date. In particular, there is a dearth of information on this


important class of proteins in parasitic worms that cause water-, food- and soil-borne diseases in humans and animals with substantial associated morbidity and mortality. Despite the significant impact that these parasites have on global health and the problems associated with their control, few studies have investigated their protein kinases as potential drug targets, mainly due to a lack of comprehensively characterised and annotated kinomes for parasitic helminths. Although many tools are available for the characterisation and annotation of kinase complements, the draft state of many helminth genomes, the substantial phylogenetic divergence of parasitic worms from established model organisms and unique genomic features representing adaptations to a parasitic lifestyle represent hurdles for the automated identification and classification of protein kinases in these pathogens. While the availability of genomic and transcriptomic data for many helminths provides major opportunities and prospects to gain a thorough understanding of their fundamental biology, evolution, development, physiology, biochemistry and pathogenicity, integrative approaches to harness these resources are scarce. This thesis addresses some of these issues and gaps. The main objectives of this thesis were: (a) to comprehensively identify, classify, curate and functionally annotate the full complements of protein kinases encoded in the genomes of some key parasitic worms, by establishing an advanced bioinformatic workflow system; (b) to explore fundamental aspects of kinase signalling in such worms based on developmental transcriptomes and cross-species comparisons; and (c), from an applied perspective, to identify protein kinases with potential as anthelmintic targets, and associated small-molecule effectors.

The specific research aims were:

1. To establish an integrated bioinformatic workflow, employing a pairwise comparative approach to define and curate the complete kinomes of the blood flukes S. haematobium and S. mansoni, investigate the transcription of kinase genes in different life cycle stages of these parasites and then integrate this and other functional annotations to infer potential drug targets and small-molecule effectors (Chapter 2);

2. To adapt the established workflow for the characterisation and curation of the complete kinome of the barber’s pole worm, H. contortus, to investigate the transcriptional regulation of kinase genes in all developmental stages and then to integrate these data into an improved, ranking-based drug/drug target prediction pipeline (Chapter 3);

3. To modify aspects of the established pipeline to allow for a comprehensive characterisation of the kinomes of four enoplean worms of the genera Trichinella and Trichuris, and then compare their kinomes to gain insights into the unique biology and evolution of this nematode class (Chapters 4 and 5); and

4. To discuss the results in relation to protein kinase signalling in socioeconomically important parasitic worms, and the identification and classification of protein kinases in eukaryotic organisms; identify future avenues for functional investigations of the fundamental biology of helminths; and, from an applied perspective, to suggest potential pathways for anti-parasitic drug discovery, repurposing and development (Chapter 6).


1.14 References

Abdulla MH, Ruelas DS, Wolff B, Snedecor J, Lim KC, Xu F, Renslo AR, Williams J, McKerrow JH, Caffrey CR, 2009. Drug discovery for schistosomiasis: hit and lead compounds identified in a library of known drugs by medium-throughput phenotypic screening. PLoS Negl. Trop. Dis. 3, e478. Achenbach J, Tiikkainen P, Franke L, Proschak E, 2011. Computational tools for polypharmacology and repurposing. Future Med. Chem. 3, 961-968. Adams JA, 2001. Kinetic and catalytic mechanisms of protein kinases. Chem. Rev. 101, 2271-2290. Adams MD, Kerlavage AR, Fleischmann RD, Fuldner RA, Bult CJ, Lee NH, Kirkness EF, Weinstock KG, Gocayne JD, White O, 1995. Initial assessment of human gene diversity and expression patterns based upon 83 million nucleotides of cDNA sequence. Nature 377, 3-174. Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, Scherer SE, Li PW, Hoskins RA, Galle RF, George RA, Lewis SE, Richards S, Ashburner M, Henderson SN, Sutton GG, Wortman JR, Yandell MD, Zhang Q, Chen LX, Brandon RC, Rogers YH, Blazej RG, Champe M, Pfeiffer BD, Wan KH, Doyle C, Baxter EG, Helt G, Nelson CR, Gabor GL, Abril JF, Agbayani A, An HJ, Andrews-Pfannkoch C, Baldwin D, Ballew RM, Basu A, Baxendale J, Bayraktaroglu L, Beasley EM, Beeson KY, Benos PV, Berman BP, Bhandari D, Bolshakov S, Borkova D, Botchan MR, Bouck J, Brokstein P, Brottier P, Burtis KC, Busam DA, Butler H, Cadieu E, Center A, Chandra I, Cherry JM, Cawley S, Dahlke C, Davenport LB, Davies P, de Pablos B, Delcher A, Deng Z, Mays AD, Dew I, Dietz SM, Dodson K, Doup LE, Downes M, Dugan-Rocha S, Dunkov BC, Dunn P, Durbin KJ, Evangelista CC, Ferraz C, Ferriera S, Fleischmann W, Fosler C, Gabrielian AE, Garg NS, Gelbart WM, Glasser K, Glodek A, Gong F, Gorrell JH, Gu Z, Guan P, Harris M, Harris NL, Harvey D, Heiman TJ, Hernandez JR, Houck J, Hostin D, Houston KA, Howland TJ, Wei MH, Ibegwam C, Jalali M, Kalush F, Karpen GH, Ke Z, Kennison JA, Ketchum KA, Kimmel BE, Kodira CD, Kraft C, Kravitz S, 
Kulp D, Lai Z, Lasko P, Lei Y, Levitsky AA, Li J, Li Z, Liang Y, Lin X, Liu X, Mattei B, McIntosh TC, McLeod MP, McPherson D, Merkulov G, Milshina NV, Mobarry C, Morris J, Moshrefi A, Mount SM, Moy M, Murphy B, Murphy L, Muzny DM, Nelson DL, Nelson DR, Nelson KA, Nixon K, Nusskern DR, Pacleb JM, Palazzolo M, Pittman GS, Pan S, Pollard J, Puri V, Reese MG, Reinert K, Remington K, Saunders RD, Scheeler F, Shen H, Shue BC, Siden-Kiamos I, Simpson M, Skupski MP, Smith T, Spier E, Spradling AC, Stapleton M, Strong R, Sun E, Svirskas R, Tector C, Turner R, Venter E, Wang AH, Wang X, Wang ZY, Wassarman DA, Weinstock GM, Weissenbach J, Williams SM, Woodage T, Worley KC, Wu D, Yang S, Yao QA, Ye J, Yeh RF, Zaveri JS, Zhan M, Zhang G, Zhao Q, Zheng L, Zheng XH, Zhong FN, Zhong W, Zhou X, Zhu S, Zhu X, Smith HO, Gibbs RA, Myers EW, Rubin GM, Venter JC, 2000. The genome sequence of Drosophila melanogaster. Science 287, 2185-2195. Agüero F, Al-Lazikani B, Aslett M, Berriman M, Buckner FS, Campbell RK, Carmona S, Carruthers IM, Chan AW, Chen F, Crowther GJ, Doyle MA, Hertz-Fowler C, Hopkins AL, McAllister G, Nwaka S, Overington JP, Pain A, Paolini GV, Pieper U, Ralph SA, Riechers A, Roos DS, Sali A, Shanmugam D, Suzuki T, Van Voorhis WC, Verlinde CL, 2008. Genomic-scale prioritization of drug targets: the TDR Targets database. Nat. Rev. Drug Discov. 7, 900-907.


Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ, 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389-3402. Andrade LF, Nahum LA, Avelar LG, Silva LL, Zerlotini A, Ruiz JC, Oliveira G, 2011. Eukaryotic protein kinases (ePKs) of the helminth parasite Schistosoma mansoni. BMC Genomics 12, 215. Ashburn TT, Thor KB, 2004. Drug repositioning: identifying and developing new uses for existing drugs. Nat. Rev. Drug Discov. 3, 673-683. Aurrecoechea C, Brestelli J, Brunk BP, Dommer J, Fischer S, Gajria B, Gao X, Gingle A, Grant G, Harb OS, Heiges M, Innamorato F, Iodice J, Kissinger JC, Kraemer E, Li W, Miller JA, Nayak V, Pennington C, Pinney DF, Roos DS, Ross C, Stoeckert CJ, Jr., Treatman C, Wang H, 2009. PlasmoDB: a functional genomic database for malaria parasites. Nucleic Acids Res. 37, D539-D543. Aurrecoechea C, Brestelli J, Brunk BP, Fischer S, Gajria B, Gao X, Gingle A, Grant G, Harb OS, Heiges M, Innamorato F, Iodice J, Kissinger JC, Kraemer ET, Li W, Miller JA, Nayak V, Pennington C, Pinney DF, Roos DS, Ross C, Srinivasamoorthy G, Stoeckert CJ, Jr., Thibodeau R, Treatman C, Wang H, 2010. EuPathDB: a portal to eukaryotic pathogen databases. Nucleic Acids Res. 38, D415-D419. Azam M, Seeliger MA, Gray NS, Kuriyan J, Daley GQ, 2008. Activation of tyrosine kinases by mutation of the gatekeeper threonine. Nat. Struct. Mol. Biol. 15, 1109-1118. Baell J, Walters MA, 2014. Chemical con artists foil drug discovery. Nature 513, 481-483. Bassetto CC, Picharillo ME, Newlands GF, Smith WD, Fernandes S, Siqueira ER, Amarante AF, 2014. Attempts to vaccinate ewes and their lambs against natural infection with Haemonchus contortus in a tropical environment. Int. J. Parasitol. 44, 1049-1054. Beckmann S, Grevelding CG, 2010. Imatinib has a fatal impact on morphology, pairing stability and survival of adult Schistosoma mansoni in vitro. Int. J. Parasitol. 40, 521-526. 
Beckmann S, Leutner S, Gouignard N, Dissous C, Grevelding CG, 2012. Protein kinases as potential targets for novel anti-schistosomal strategies. Curr. Pharm. Des. 18, 3579- 3594. Beckmann S, Long T, Scheld C, Geyer R, Caffrey CR, Grevelding CG, 2014. Serum albumin and α-1 acid glycoprotein impede the killing of Schistosoma mansoni by the tyrosine kinase inhibitor Imatinib. Int. J. Parasitol. Drugs Drug Resist. 4, 287-295. Bennuru S, Cotton JA, Ribeiro JM, Grote A, Harsha B, Holroyd N, Mhashilkar A, Molina DM, Randall AZ, Shandling AD, Unnasch TR, Ghedin E, Berriman M, Lustigman S, Nutman TB, 2016. Stage-specific transcriptome and proteome analyses of the filarial parasite Onchocerca volvulus and its Wolbachia endosymbiont. mBio 7, e02028-16. Bergquist R, Utzinger J, Keiser J, 2017. Controlling schistosomiasis with praziquantel: How much longer without a viable alternative? Infect. Dis. Poverty 6, 74. Berman HM, Ten Eyck LF, Goodsell DS, Haste NM, Kornev A, Taylor SS, 2005. The cAMP binding domain: an ancient signaling module. Proc. Natl. Acad. Sci. USA 102, 45- 50. Berriman M, Haas BJ, LoVerde PT, Wilson RA, Dillon GP, Cerqueira GC, Mashiyama ST, Al-Lazikani B, Andrade LF, Ashton PD, Aslett MA, Bartholomeu DC, Blandin G, Caffrey CR, Coghlan A, Coulson R, Day TA, Delcher A, DeMarco R, Djikeng A, Eyre T, Gamble JA, Ghedin E, Gu Y, Hertz-Fowler C, Hirai H, Hirai Y, Houston R, Ivens A, Johnston DA, Lacerda D, Macedo CD, McVeigh P, Ning Z, Oliveira G, Overington JP, Parkhill J, Pertea M, Pierce RJ, Protasio AV, Quail MA, Rajandream MA, Rogers J, Sajid M, Salzberg SL, Stanke M, Tivey AR, White O, Williams DL,


Wortman J, Wu W, Zamanian M, Zerlotini A, Fraser-Liggett CM, Barrell BG, El- Sayed NM, 2009. The genome of the blood fluke Schistosoma mansoni. Nature 460, 352-358. Besier B, Love S, 2012. Advising on helminth control in sheep: It’s the way we tell them. Vet. J. 193, 2-3. Besier B, Kahn L, Dobson R, Smith D, 2015. Barbervax - a new strategy for Haemonchus management. Proc. Conf. Aust. Sheep Veterinarians, Brisbane, Australia, pp. 373- 377. Besier RB, 2012. Refugia-based strategies for sustainable worm control: factors affecting the acceptability to sheep and goat owners. Vet. Parasitol. 186, 2-9. Besier RB, Smith WD, 2014. A new approach to the control of barbers pole worm. Proc. Conf. Aust. Sheep Veterinarians, Perth, Australia, pp. 11-16. Bethony J, Brooker S, Albonico M, Geiger SM, Loukas A, Diemert D, Hotez PJ, 2006. Soil- transmitted helminth infections: ascariasis, trichuriasis, and hookworm. Lancet 367, 1521-1532. Bharucha N, Ma J, Dobry CJ, Lawson SK, Yang Z, Kumar A, 2008. Analysis of the yeast kinome reveals a network of regulated protein localization during filamentous growth. Mol. Biol. Cell 19, 2708-2717. Bilanges B, Torbett N, Vanhaesebroeck B, 2008. Killing two kinase families with one stone. Nat. Chem. Biol. 4, 648-649. Bimbo A, Jia Y, Poh SL, Karuturi RK, den Elzen N, Peng X, Zheng L, O'Connell M, Liu ET, Balasubramanian MK, Liu J, 2005. Systematic deletion analysis of fission yeast protein kinases. Eukaryot. Cell 4, 799-813. Blaxter M, Daub J, Guiliano D, Parkinson J, Whitton C, Filarial Genome Project, 2002. The Brugia malayi genome project: expressed sequence tags and gene discovery. Trans. R. Soc. Trop. Med. Hyg. 96, 7-17. Blaxter M, Koutsovoulos G, 2015. The evolution of parasitism in Nematoda. Parasitology 142 (Suppl. 1), S26-S39. Blume-Jensen P, Hunter T, 2001. Oncogenic kinase signalling. Nature 411, 355-365. Bossemeyer D, 1995. Protein kinases - structure and function. FEBS Lett. 369, 57-61. Bottazzi ME, 2015. 
The human hookworm vaccine: recent updates and prospects for success. J. Helminthol. 89, 540-544. Boudeau J, Miranda-Saavedra D, Barton GJ, Alessi DR, 2006. Emerging roles of pseudokinases. Trends Cell Biol. 16, 443-452. Boutet E, Lieberherr D, Tognolli M, Schneider M, Bairoch A, 2007. UniProtKB/Swiss-Prot. Methods Mol. Biol. 406, 89-112. Brenner S, 1974. The genetics of Caenorhabditis elegans. Genetics 77, 71-94. Brenner S, 1988. The nematode Caenorhabditis elegans. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, USA. Brindley PJ, Mitreva M, Ghedin E, Lustigman S, 2009. Helminth genomics: The implications for human health. PLoS Negl. Trop. Dis. 3, e538. Brindley PJ, Hotez PJ, 2013. Break out: urogenital schistosomiasis and Schistosoma haematobium infection in the post-genomic era. PLoS Negl. Trop. Dis. 7, e1961. Brizuela L, Draetta G, Beach D, 1987. p13suc1 acts in the fission yeast cell division cycle as a component of the p34cdc2 protein kinase. EMBO J. 6, 3507-3514. Brugge JS, Erikson RL, 1977. Identification of a transformation-specific antigen induced by an avian sarcoma virus. Nature 269, 346-348. Burke DT, Carle GF, Olson MV, 1987. Cloning of large segments of exogenous DNA into yeast by means of artificial chromosome vectors. Science 236, 806-812.


Burns AR, Roy PJ, 2012. To kill a mocking worm: strategies to improve Caenorhabditis elegans as a model system for use in anthelmintic discovery. In: Caffrey, CR (Ed.), Parasitic helminths: targets, screens, drugs and vaccines. Wiley-Blackwell, Hoboken, New Jersey, USA, pp. 201-216. Burns AR, Luciani GM, Musso G, Bagg R, Yeo M, Zhang Y, Rajendran L, Glavin J, Hunter R, Redman E, Stasiuk S, Schertzberg M, Angus McQuibban G, Caffrey CR, Cutler SR, Tyers M, Giaever G, Nislow C, Fraser AG, MacRae CA, Gilleard J, Roy PJ, 2015. Caenorhabditis elegans is a useful model for anthelmintic discovery. Nat. Commun. 6, 7485. Buro C, Beckmann S, Oliveira KC, Dissous C, Cailliau K, Marhöfer RJ, Selzer PM, Verjovski-Almeida S, Grevelding CG, 2014. Imatinib treatment causes substantial transcriptional changes in adult Schistosoma mansoni in vitro exhibiting pleiotropic effects. PLoS Negl. Trop. Dis. 8, e2923. C. elegans Sequencing Consortium, 1998. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 282, 2012-2018. Caenepeel S, Charydczak G, Sudarsanam S, Hunter T, Manning G, 2004. The mouse kinome: discovery and comparative genomics of all mouse protein kinases. Proc. Natl. Acad. Sci. USA 101, 11707-11712. Caffrey CR, Rohwer A, Oellien F, Marhöfer RJ, Braschi S, Oliveira G, McKerrow JH, Selzer PM, 2009. A comparative chemogenomics strategy to predict potential drug targets in the metazoan pathogen, Schistosoma mansoni. PLoS One 4, e4413. Caffrey CR, Utzinger J, Keiser J, 2012. Drug discovery for trematodiases: challenges and progress. In: Caffrey, CR (Ed.), Parasitic helminths: targets, screens, drugs, and vaccines. Wiley-Blackwell, Hoboken, New Jersey, USA, pp. 323-339. Caffrey CR, 2015. Schistosomiasis and its treatment. Future Med. Chem. 7, 675-676. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL, 2009. BLAST+: architecture and applications. BMC Bioinformatics 10, 421. 
Cantacessi C, Hofmann A, Campbell BE, Gasser RB, 2015. Impact of next-generation technologies on exploring socioeconomically important parasites and developing new interventions. Methods Mol. Biol. 1247, 437-474. Chai JY, 2007. Intestinal flukes. Springer Science + Business Media, New York, USA. Champion A, Kreis M, Mockaitis K, Picaud A, Henry Y, 2004. Arabidopsis kinome: after the casting. Funct. Integr. Genomics 4, 163-187. Chang L, Karin M, 2001. Mammalian MAP kinase signalling cascades. Nature 410, 37-40. Charlier J, van der Voort M, Kenyon F, Skuce P, Vercruysse J, 2014. Chasing helminths and their economic impact on farmed ruminants. Trends Parasitol. 30, 361-367. Clarke M, Lohan AJ, Liu B, Lagkouvardos I, Roy S, Zafar N, Bertelli C, Schilde C, Kianianmomeni A, Burglin TR, Frech C, Turcotte B, Kopec KO, Synnott JM, Choo C, Paponov I, Finkler A, Heng Tan CS, Hutchins AP, Weinmeier T, Rattei T, Chu JS, Gimenez G, Irimia M, Rigden DJ, Fitzpatrick DA, Lorenzo-Morales J, Bateman A, Chiu CH, Tang P, Hegemann P, Fromm H, Raoult D, Greub G, Miranda- Saavedra D, Chen N, Nash P, Ginger ML, Horn M, Schaap P, Caler L, Loftus BJ, 2013. Genome of Acanthamoeba castellanii highlights extensive lateral gene transfer and early evolution of tyrosine kinase signaling. Genome Biol. 14, R11. Cohen P, 2000. The regulation of protein function by multisite phosphorylation - a 25 year update. Trends Biochem. Sci. 25, 596-601. Cohen P, 2001. The role of protein phosphorylation in human health and disease. Eur. J. Biochem. 268, 5001-5010. Cohen P, 2002. Protein kinases - the major drug targets of the twenty-first century? Nat. Rev. Drug Discov. 1, 309-315.


Cohen P, Alessi DR, 2013. Kinase drug discovery - what's next in the field? ACS Chem. Biol. 8, 96-104. Colley DG, Bustinduy AL, Secor WE, King CH, 2014. Human schistosomiasis. Lancet 383, 2253-2264. Cox KJ, Shomin CD, Ghosh I, 2011. Tinkering outside the kinase ATP box: allosteric (type IV) and bivalent (type V) inhibitors of protein kinases. Future Med. Chem. 3, 29-43. Crick FHC, 1958. On protein synthesis. Symp. Soc. Exp. Biol. 12, 138-163. Crick FHC, 1970. Central dogma of molecular biology. Nature 227, 561-563. Dalzell JJ, Warnock ND, McVeigh P, Marks NJ, Mousley A, Atkinson L, Maule AG, 2012. Considering RNAi experimental design in parasitic helminths. Parasitology 139, 589-604. Dar AC, Dever TE, Sicheri F, 2005. Higher-order substrate recognition of eIF2α by the RNA-dependent protein kinase PKR. Cell 122, 887-900. Daub H, Specht K, Ullrich A, 2004. Strategies to overcome resistance to targeted protein kinase inhibitors. Nat. Rev. Drug Discov. 3, 1001-1010. Desjardins CA, Cerqueira GC, Goldberg JM, Dunning Hotopp JC, Haas BJ, Zucker J, Ribeiro JM, Saif S, Levin JZ, Fan L, Zeng Q, Russ C, Wortman JR, Fink DL, Birren BW, Nutman TB, 2013. Genomics of Loa loa, a Wolbachia-free filarial parasite of humans. Nat. Genet. 45, 495-500. Despommier DD, 1993. Trichinella spiralis and the concept of niche. J. Parasitol. 79, 472- 482. Devleesschauwer B, Praet N, Speybroeck N, Torgerson PR, Haagsma JA, De Smet K, Murrell KD, Pozio E, Dorny P, 2015. The low global burden of trichinellosis: evidence and implications. Int. J. Parasitol. 45, 95-99. Dissous C, Grevelding CG, 2011. Piggy-backing the concept of cancer drugs for schistosomiasis treatment: a tangible perspective? Trends Parasitol. 27, 59-66. Dorny P, Stoliaroff V, Charlier J, Meas S, Sorn S, Chea B, Holl D, Van Aken D, Vercruysse J, 2011. Infections with gastrointestinal nematodes, Fasciola and Paramphistomum in cattle in Cambodia and their association with morbidity parameters. Vet. Parasitol. 175, 293-299. 
Draetta G, Brizuela L, Potashkin J, Beach D, 1987. Identification of p34 and p13, human homologs of the cell cycle regulators of fission yeast encoded by cdc2+ and suc1+. Cell 50, 319-325. Druker BJ, Talpaz M, Resta DJ, Peng B, Buchdunger E, Ford JM, Lydon NB, Kantarjian H, Capdeville R, Ohno-Jones S, Sawyers CL, 2001. Efficacy and safety of a specific inhibitor of the BCR-ABL tyrosine kinase in chronic myeloid leukemia. N. Engl. J. Med. 344, 1031-1037. Duffy JB, Perrimon N, 1996. Recent advances in understanding signal transduction pathways in worms and flies. Curr. Opin. Cell Biol. 8, 231-238. Duggan A, Chalfie M, 1995. Control of neuronal development in Caenorhabditis elegans. Curr. Opin. Neurobiol. 5, 6-9. Durie PH, 1953. The paramphistomes (Trematoda) of Australian ruminants. Aust. J. Zool. 1, 193-222. Eddy SR, 1996. Hidden Markov models. Curr. Opin. Struct. Biol. 6, 361-365. Edman P, Begg G, 1967. A protein sequenator. Eur. J. Biochem. 1, 80-91. Eglen RM, Reisine T, 2009. The current status of drug discovery against the human kinome. Assay Drug Dev. Technol. 7, 22-43. Eglen RM, Reisine T, 2011. Drug discovery and the human kinome: recent trends. Pharmacol. Ther. 130, 144-156.


Eisenmann DM, Kim SK, 1994. Signal transduction and cell fate specification during Caenorhabditis elegans vulval development. Curr. Opin. Genet. Dev. 4, 508-516. Ekins S, Williams AJ, 2011. Finding promiscuous old drugs for new uses. Pharm. Res. 28, 1785-1791. Ekins S, Williams AJ, Krasowski MD, Freundlich JS, 2011. In silico repositioning of approved drugs for rare and neglected diseases. Drug Discov. Today 16, 298-310. Endicott JA, Noble ME, Johnson LN, 2012. The structural basis for control of eukaryotic protein kinases. Annu. Rev. Biochem. 81, 587-613. Epe C, Kaminsky R, 2013. New advancement in anthelmintic drugs in veterinary medicine. Trends Parasitol. 29, 129-134. Fabregat A, Sidiropoulos K, Garapati P, Gillespie M, Hausmann K, Haw R, Jassal B, Jupe S, Korninger F, McKay S, Matthews L, May B, Milacic M, Rothfels K, Shamovsky V, Webber M, Weiser J, Williams M, Wu G, Stein L, Hermjakob H, D'Eustachio P, 2016. The Reactome pathway knowledgebase. Nucleic Acids Res. 44, D481-D487. Fedorov O, Müller S, Knapp S, 2010. The (un)targeted cancer kinome. Nat. Chem. Biol. 6, 166-169. Fenwick A, 2012. The global burden of neglected tropical diseases. Public Health 126, 233- 236. Ferreira LG, Dos Santos RN, Oliva G, Andricopulo AD, 2015. Molecular docking and structure-based drug design strategies. Molecules 20, 13384-13421. Fischer EH, Krebs EG, 1955. Conversion of phosphorylase b to phosphorylase a in muscle extracts. J. Biol. Chem. 216, 121-132. Foth BJ, Tsai IJ, Reid AJ, Bancroft AJ, Nichol S, Tracey A, Holroyd N, Cotton JA, Stanley EJ, Zarowiecki M, Liu JZ, Huckvale T, Cooper PJ, Grencis RK, Berriman M, 2014. Whipworm genome and dual-species transcriptome analyses provide molecular insights into an intimate host-parasite interaction. Nat. Genet. 46, 693-700. Fried B, Graczyk TK, Tamang L, 2004. Food-borne intestinal trematodiases in humans. Parasitol. Res. 93, 159-170. Fürst T, Keiser J, Utzinger J, 2012. 
Global burden of human food-borne trematodiasis: a systematic review and meta-analysis. Lancet. Infect. Dis. 12, 210-221. Garbers DL, 1990. Guanylate cyclase receptor family. Recent Prog. Horm. Res. 46, 85-96. Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, Carlton JM, Pain A, Nelson KE, Bowman S, Paulsen IT, James K, Eisen JA, Rutherford K, Salzberg SL, Craig A, Kyes S, Chan MS, Nene V, Shallom SJ, Suh B, Peterson J, Angiuoli S, Pertea M, Allen J, Selengut J, Haft D, Mather MW, Vaidya AB, Martin DM, Fairlamb AH, Fraunholz MJ, Roos DS, Ralph SA, McFadden GI, Cummings LM, Subramanian GM, Mungall C, Venter JC, Carucci DJ, Hoffman SL, Newbold C, Davis RW, Fraser CM, Barrell B, 2002. Genome sequence of the human malaria parasite Plasmodium falciparum. Nature 419, 498-511. Garnick E, 1992. Niche breadth in parasites: an evolutionarily stable strategy model, with special reference to the protozoan parasite Leishmania. Theor. Popul. Biol. 42, 62- 103. Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, Light Y, McGlinchey S, Michalovich D, Al-Lazikani B, Overington JP, 2012. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 40, D1100-D1107. Gavrin LK, Saiah E, 2013. Approaches to discover non-ATP site kinase inhibitors. Med. Chem. Commun. 4, 41-51. Geary TG, 2012. Are new anthelmintics needed to eliminate human helminthiases? Curr. Opin. Infect. Dis. 25, 709-717.

Geary TG, Sakanari JA, Caffrey CR, 2015. Anthelmintic drug discovery: into the future. J. Parasitol. 101, 125-133. Gelmedin V, Dissous C, Grevelding CG, 2015. Re-positioning protein-kinase inhibitors against schistosomiasis. Future Med. Chem. 7, 737-752. Ghedin E, Wang S, Foster JM, Slatko BE, 2004. First sequenced genome of a parasitic nematode. Trends Parasitol. 20, 151-153. Ghedin E, Wang S, Spiro D, Caler E, Zhao Q, Crabtree J, Allen JE, Delcher AL, Guiliano DB, Miranda-Saavedra D, Angiuoli SV, Creasy T, Amedeo P, Haas B, El-Sayed NM, Wortman JR, Feldblyum T, Tallon L, Schatz M, Shumway M, Koo H, Salzberg SL, Schobel S, Pertea M, Pop M, White O, Barton GJ, Carlow CK, Crawford MJ, Daub J, Dimmic MW, Estes CF, Foster JM, Ganatra M, Gregory WF, Johnson NM, Jin J, Komuniecki R, Korf I, Kumar S, Laney S, Li BW, Li W, Lindblom TH, Lustigman S, Ma D, Maina CV, Martin DM, McCarter JP, McReynolds L, Mitreva M, Nutman TB, Parkinson J, Peregrin-Alvarez JM, Poole C, Ren Q, Saunders L, Sluder AE, Smith K, Stanke M, Unnasch TR, Ware J, Wei AD, Weil G, Williams DJ, Zhang Y, Williams SA, Fraser-Liggett C, Slatko B, Blaxter ML, Scott AL, 2007. Draft genome of the filarial nematode parasite Brugia malayi. Science 317, 1756-1760. Gibbs RA, 1995. Pressing ahead with human genome sequencing. Nat. Genet. 11, 121-125. Gilabert A, Curran DM, Harvey SC, Wasmuth JD, 2016. Expanding the view on the evolution of the nematode dauer signalling pathways: refinement through gene gain and pathway co-option. BMC Genomics 17, 476. Gilleard JS, 2004. The use of Caenorhabditis elegans in parasitic nematode research. Parasitology 128 (Suppl. 1), S49-S70. Gilleard JS, 2006. Understanding anthelmintic resistance: the need for genomics and genetics. Int. J. Parasitol. 36, 1227-1239. Gilleard JS, Beech RN, 2007. Population genetics of anthelmintic resistance in parasitic nematodes. Parasitology 134, 1133-1147. Gilleard JS, 2013.
Haemonchus contortus as a paradigm and model to study anthelmintic drug resistance. Parasitology 140, 1506-1522. Goffeau A, Barrell BG, Bussey H, Davis RW, Dujon B, Feldmann H, Galibert F, Hoheisel JD, Jacq C, Johnston M, Louis EJ, Mewes HW, Murakami Y, Philippsen P, Tettelin H, Oliver SG, 1996. Life with 6000 genes. Science 274, 546, 563-567. Goldberg JM, Manning G, Liu A, Fey P, Pilcher KE, Xu Y, Smith JL, 2006. The Dictyostelium kinome - analysis of the protein kinases from a simple model organism. PLoS Genet. 2, e38. Goldberg JM, Griggs AD, Smith JL, Haas BJ, Wortman JR, Zeng Q, 2013. Kinannote, a computer program to identify and classify members of the eukaryotic protein kinase superfamily. Bioinformatics 29, 2387-2394. Goodwin S, McPherson JD, McCombie WR, 2016. Coming of age: ten years of next-generation sequencing technologies. Nat. Rev. Genet. 17, 333-351. Gorre ME, Mohammed M, Ellwood K, Hsu N, Paquette R, Rao PN, Sawyers CL, 2001. Clinical resistance to STI-571 cancer therapy caused by BCR-ABL gene mutation or amplification. Science 293, 876-880. Gosal G, Kochut KJ, Kannan N, 2011. ProKinO: an ontology for integrative analysis of protein kinases in cancer. PLoS One 6, e28782. Graves JD, Krebs EG, 1999. Protein phosphorylation and signal transduction. Pharmacol. Ther. 82, 111-121.

Guidi A, Mansour NR, Paveley RA, Carruthers IM, Besnard J, Hopkins AL, Gilbert IH, Bickle QD, 2015. Application of RNAi to genomic drug target validation in schistosomes. PLoS Negl. Trop. Dis. 9, e0003801. Hagen J, Lee EF, Fairlie WD, Kalinna BH, 2012. Functional genomics approaches in parasitic helminths. Parasite Immunol. 34, 163-182. Hagen J, Young ND, Every AL, Pagel CN, Schnoeller C, Scheerlinck JP, Gasser RB, Kalinna BH, 2014. Omega-1 knockdown in Schistosoma mansoni eggs by lentivirus transduction reduces granuloma size in vivo. Nat. Commun. 5, 5375. Hagen J, Scheerlinck JP, Gasser RB, 2015. Knocking down schistosomes - promise for lentiviral transduction in parasites. Trends Parasitol. 31, 324-332. Hanks SK, 1987. Homology probing: identification of cDNA clones encoding members of the protein-serine kinase family. Proc. Natl. Acad. Sci. USA 84, 388-392. Hanks SK, Quinn AM, Hunter T, 1988. The protein kinase family: conserved features and deduced phylogeny of the catalytic domains. Science 241, 42-52. Hanks SK, Hunter T, 1995. Protein kinases 6. The eukaryotic protein kinase superfamily: kinase (catalytic) domain structure and classification. FASEB J. 9, 576-596. Hanks SK, 2003. Genomic analysis of the eukaryotic protein kinase superfamily: a perspective. Genome Biol. 4, 111. Harris TW, Baran J, Bieri T, Cabunoc A, Chan J, Chen WJ, Davis P, Done J, Grove C, Howe K, Kishore R, Lee R, Li Y, Muller HM, Nakamura C, Ozersky P, Paulini M, Raciti D, Schindelman G, Tuli MA, Van Auken K, Wang D, Wang X, Williams G, Wong JD, Yook K, Schedl T, Hodgkin J, Berriman M, Kersey P, Spieth J, Stein L, Sternberg PW, 2014. WormBase 2014: new views of curated biology. Nucleic Acids Res. 42, D789-D793. Hayden EC, 2014. The $1,000 genome. Nature 507, 294-295. Heiges M, Wang H, Robinson E, Aurrecoechea C, Gao X, Kaluskar N, Rhodes P, Wang S, He CZ, Su Y, Miller J, Kraemer E, Kissinger JC, 2006. CryptoDB: a Cryptosporidium bioinformatics resource update. Nucleic Acids Res. 
34, D419-D422. Heisterkamp N, Stephenson JR, Groffen J, Hansen PF, de Klein A, Bartram CR, Grosveld G, 1983. Localization of the c-abl oncogene adjacent to a translocation break point in chronic myelocytic leukaemia. Nature 306, 239-242. Henikoff JG, Henikoff S, 1996. Using substitution probabilities to improve position-specific scoring matrices. Comput. Appl. Biosci. 12, 135-143. Holden-Dye L, Walker RJ, 2007. Anthelmintic drugs. WormBook, ed. The C. elegans Research Community, WormBook, https://doi.org/10.1895/wormbook.1.143.1 Holden-Dye L, Walker RJ, 2012. How relevant is Caenorhabditis elegans as a model for the analysis of parasitic nematode biology? In: Caffrey, CR (Ed.), Parasitic helminths: targets, screens, drugs and vaccines. Wiley-Blackwell, Hoboken, New Jersey, USA, pp. 23-41. Holden-Dye L, Walker RJ, 2014. Anthelmintic drugs and nematicides: studies in Caenorhabditis elegans. WormBook, ed. The C. elegans Research Community, WormBook, https://doi.org/10.1895/wormbook.1.143.2 Hopkins AL, Mason JS, Overington JP, 2006. Can we rationally design promiscuous drugs? Curr. Opin. Struct. Biol. 16, 127-136. Hornbeck PV, Zhang B, Murray B, Kornhauser JM, Latham V, Skrzypek E, 2015. PhosphoSitePlus, 2014: mutations, PTMs and recalibrations. Nucleic Acids Res. 43, D512-D520. Hotez PJ, Fenwick A, Savioli L, Molyneux DH, 2009. Rescuing the bottom billion through control of neglected tropical diseases. Lancet 373, 1570-1575.

Hotez PJ, Kamath A, 2009. Neglected tropical diseases in sub-saharan Africa: review of their prevalence, distribution, and disease burden. PLoS Negl. Trop. Dis. 3, e412. Hotez PJ, Diemert D, Bacon KM, Beaumier C, Bethony JM, Bottazzi ME, Brooker S, Couto AR, Freire Mda S, Homma A, Lee BY, Loukas A, Loblack M, Morel CM, Oliveira RC, Russell PK, 2013. The human hookworm vaccine. Vaccine 31 (Suppl. 2), B227-B232. Howe KL, Bolt BJ, Shafie M, Kersey P, Berriman M, 2016. WormBase ParaSite - a comprehensive resource for helminth genomics. Mol. Biochem. Parasitol. https://doi.org/10.1016/j.molbiopara.2016.11.005 Hu Y, Furtmann N, Bajorath J, 2015. Current compound coverage of the kinome. J. Med. Chem. 58, 30-40. Huang Y, Chen W, Wang X, Liu H, Chen Y, Guo L, Luo F, Sun J, Mao Q, Liang P, Xie Z, Zhou C, Tian Y, Lv X, Huang L, Zhou J, Hu Y, Li R, Zhang F, Lei H, Li W, Hu X, Liang C, Xu J, Li X, Yu X, 2013. The carcinogenic liver fluke, Clonorchis sinensis: new assembly, reannotation and analysis of the genome and characterization of tissue transcriptomes. PLoS One 8, e54732. Hubbard SR, Till JH, 2000. Protein tyrosine kinase structure and function. Annu. Rev. Biochem. 69, 373-398. Hughes JP, Rees S, Kalindjian SB, Philpott KL, 2011. Principles of early drug discovery. Br. J. Pharmacol. 162, 1239-1249. Hunter T, Sefton BM, 1980. Transforming gene product of Rous sarcoma virus phosphorylates tyrosine. Proc. Natl. Acad. Sci. USA 77, 1311-1315. Hunter T, 1987. A thousand and one protein kinases. Cell 50, 823-829. Hunter T, Lindberg RA, Middlemas DS, Tracy S, van der Geer P, 1992. Receptor protein tyrosine kinases and phosphatases. Cold Spring Harb. Symp. Quant. Biol. 57, 25-41. Hunter T, Plowman GD, 1997. The protein kinases of budding yeast: six score and more. Trends Biochem. Sci. 22, 18-22. Huse M, Kuriyan J, 2002. The conformational plasticity of protein kinases. Cell 109, 275-282. Ikezu S, Ikezu T, 2014. Tau-tubulin kinase. Front. Mol. Neurosci. 7, 33.
Jabbar A, Iqbal Z, Kerboeuf D, Muhammad G, Khan MN, Afaq M, 2006. Anthelmintic resistance: the state of play revisited. Life Sci. 79, 2413-2431. Jaleel M, Saha S, Shenoy AR, Visweswariah SS, 2006. The kinase homology domain of receptor guanylyl cyclase C: ATP binding and identification of an adenine nucleotide sensitive site. Biochemistry 45, 1888-1898. Jay E, Bambara R, Padmanabhan R, Wu R, 1974. DNA sequence analysis: a general, simple and rapid method for sequencing large oligodeoxyribonucleotide fragments by mapping. Nucleic Acids Res. 1, 331-353. Jex AR, Liu S, Li B, Young ND, Hall RS, Li Y, Yang L, Zeng N, Xu X, Xiong Z, Chen F, Wu X, Zhang G, Fang X, Kang Y, Anderson GA, Harris TW, Campbell BE, Vlaminck J, Wang T, Cantacessi C, Schwarz EM, Ranganathan S, Geldhof P, Nejsum P, Sternberg PW, Yang H, Wang J, Wang J, Gasser RB, 2011. Ascaris suum draft genome. Nature 479, 529-533. Jex AR, Nejsum P, Schwarz EM, Hu L, Young ND, Hall RS, Korhonen PK, Liao S, Thamsborg S, Xia J, Xu P, Wang S, Scheerlinck JP, Hofmann A, Sternberg PW, Wang J, Gasser RB, 2014. Genome and transcriptome of the porcine whipworm Trichuris suis. Nat. Genet. 46, 701-706. Johnson DA, Akamine P, Radzio-Andzelm E, Madhusudan M, Taylor SS, 2001. Dynamics of cAMP-dependent protein kinase. Chem. Rev. 101, 2243-2270.

Johnson SA, Hunter T, 2005. Kinomics: methods for deciphering the kinome. Nat. Methods 2, 17-25. Jones P, Binns D, Chang HY, Fraser M, Li W, McAnulla C, McWilliam H, Maslen J, Mitchell A, Nuka G, Pesseat S, Quinn AF, Sangrador-Vegas A, Scheremetjew M, Yong SY, Lopez R, Hunter S, 2014. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236-1240. Kaewkes S, 2003. Taxonomy and biology of liver flukes. Acta Trop. 88, 177-186. Kamath RS, Ahringer J, 2003. Genome-wide RNAi screening in Caenorhabditis elegans. Methods 30, 313-321. Kaminsky R, Ducray P, Jung M, Clover R, Rufener L, Bouvier J, Weber SS, Wenger A, Wieland-Berghausen S, Goebel T, Gauvry N, Pautrat F, Skripsky T, Froelich O, Komoin-Oka C, Westlund B, Sluder A, Maser P, 2008a. A new class of anthelmintics effective against drug-resistant nematodes. Nature 452, 176-180. Kaminsky R, Gauvry N, Schorderet Weber S, Skripsky T, Bouvier J, Wenger A, Schroeder F, Desaules Y, Hotz R, Goebel T, Hosking BC, Pautrat F, Wieland-Berghausen S, Ducray P, 2008b. Identification of the amino-acetonitrile derivative monepantel (AAD 1566) as a new anthelmintic drug development candidate. Parasitol. Res. 103, 931-939. Kanehisa M, Goto S, 2000. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27-30. Kannan N, Taylor SS, Zhai Y, Venter JC, Manning G, 2007a. Structural and functional diversity of the microbial kinome. PLoS Biol. 5, e17. Kannan N, Haste N, Taylor SS, Neuwald AF, 2007b. The hallmark of AGC kinase functional divergence is its C-terminal tail, a cis-acting regulatory module. Proc. Natl. Acad. Sci. USA 104, 1272-1277. Kaplan RM, 2004. Drug resistance in nematodes of veterinary importance: a status report. Trends Parasitol. 20, 477-481. Kaplan RM, Vidyashankar AN, 2012. An inconvenient truth: global worming and anthelmintic resistance. Vet. Parasitol. 186, 70-78. Katz N, Couto FF, Araujo N, 2013. Imatinib activity on Schistosoma mansoni. Mem. Inst. Oswaldo Cruz 108, 850-853.
Kearney PE, Murray PJ, Hoy JM, Hohenhaus M, Kotze A, 2016. The 'toolbox' of strategies for managing Haemonchus contortus in goats: what's in and what's out. Vet. Parasitol. 220, 93-107. Keiser J, Utzinger J, 2005. Emerging foodborne trematodiasis. Emerg. Infect. Dis. 11, 1507-1514. Keiser J, Utzinger J, 2008. Efficacy of current drugs against soil-transmitted helminth infections: systematic review and meta-analysis. J. Am. Med. Assoc. 299, 1937-1948. Keiser J, Utzinger J, 2009. Food-borne trematodiases. Clin. Microbiol. Rev. 22, 466-483. Keiser J, Utzinger J, 2010. The drugs we have and the drugs we need against major helminth infections. Adv. Parasitol. 73, 197-230. Kim UJ, Birren BW, Slepak T, Mancino V, Boysen C, Kang HL, Simon MI, Shizuya H, 1996. Construction and characterization of a human bacterial artificial chromosome library. Genomics 34, 213-218. King CH, Dickman K, Tisch DJ, 2005. Reassessment of the cost of chronic helmintic infection: a meta-analysis of disability-related outcomes in endemic schistosomiasis. Lancet 365, 1561-1569. Knight JD, Pawson T, Gingras AC, 2013. Profiling the kinome: current capabilities and future challenges. J. Proteomics 81, 43-55.

Knighton DR, Zheng JH, Ten Eyck LF, Ashford VA, Xuong NH, Taylor SS, Sowadski JM, 1991. Crystal structure of the catalytic subunit of cyclic adenosine monophosphate-dependent protein kinase. Science 253, 407-414. Knox DP, Geldhof P, Visser A, Britton C, 2007. RNA interference in parasitic nematodes of animals: a reality check? Trends Parasitol. 23, 105-107. Knox MR, Besier RB, Le Jambre LF, Kaplan RM, Torres-Acosta JF, Miller J, Sutherland I, 2012. Novel approaches for the control of helminth parasites of livestock VI: summary of discussions and conclusions. Vet. Parasitol. 186, 143-149. Koike A, Kobayashi Y, Takagi T, 2003. Kinase pathway database: an integrated protein-kinase and NLP-based protein-interaction resource. Genome Res. 13, 1231-1243. Korenberg JR, Chen XN, Adams MD, Venter JC, 1995. Toward a cDNA map of the human genome. Genomics 29, 364-370. Korhonen PK, Pozio E, La Rosa G, Chang BCH, Koehler AV, Hoberg EP, Boag PR, Tan P, Jex AR, Hofmann A, Sternberg PW, Young ND, Gasser RB, 2016. Phylogenomic and biogeographic reconstruction of the Trichinella complex. Nat. Commun. 7, 10513. Kornev AP, Haste NM, Taylor SS, Ten Eyck LF, 2006. Surface comparison of active and inactive protein kinases identifies a conserved activation mechanism. Proc. Natl. Acad. Sci. USA 103, 17783-17788. Kornev AP, Taylor SS, Ten Eyck LF, 2008. A helix scaffold for the assembly of active protein kinases. Proc. Natl. Acad. Sci. USA 105, 14377-14382. Kornev AP, Taylor SS, 2010. Defining the conserved internal architecture of a protein kinase. Biochim. Biophys. Acta 1804, 440-444. Kostadinova A, Gibson DI, 2000. The systematics of the echinostomes. In: Fried, B, Graczyk, TK (Eds.), Echinostomes as experimental models for biological research. Springer Science + Business Media, Dordrecht, Netherlands, pp. 31-57. Krebs EG, 1993. Nobel lecture. Protein phosphorylation and cellular regulation I. Biosci. Rep. 13, 127-142. Krishna M, Narang H, 2008.
The complexity of mitogen-activated protein kinases (MAPKs) made simple. Cell. Mol. Life Sci. 65, 3525-3544. Krishnamurty R, Maly DJ, 2010. Biochemical mechanisms of resistance to small-molecule protein kinase inhibitors. ACS Chem. Biol. 5, 121-138. Krogh A, Brown M, Mian IS, Sjölander K, Haussler D, 1994. Hidden Markov models in computational biology. Applications to protein modeling. J. Mol. Biol. 235, 1501-1531. Krupa A, Abhinandan KR, Srinivasan N, 2004. KinG: a database of protein kinases in genomes. Nucleic Acids Res. 32, D153-D155. Kulke D, von Samson-Himmelstjerna G, Miltsch SM, Wolstenholme AJ, Jex AR, Gasser RB, Ballesteros C, Geary TG, Keiser J, Townson S, Harder A, Krücken J, 2014. Characterization of the Ca2+-gated and voltage-dependent K+-channel Slo-1 of nematodes and its interaction with emodepside. PLoS Negl. Trop. Dis. 8, e3401. Kumarasingha R, Karpe AV, Preston S, Yeo TC, Lim DS, Tu CL, Luu J, Simpson KJ, Shaw JM, Gasser RB, Beale DJ, Morrison PD, Palombo EA, Boag PR, 2016. Metabolic profiling and in vitro assessment of anthelmintic fractions of Picria fel-terrae Lour. Int. J. Parasitol. Drugs Drug Resist. 6, 171-178. Lacey E, 1990. Mode of action of benzimidazoles. Parasitol. Today 6, 112-115. Lai DH, Hong XK, Su BX, Liang C, Hide G, Zhang X, Yu X, Lun ZR, 2016. Current status of Clonorchis sinensis and clonorchiasis in China. Trans. R. Soc. Trop. Med. Hyg. 110, 21-27.

Laing R, Kikuchi T, Martinelli A, Tsai IJ, Beech RN, Redman E, Holroyd N, Bartley DJ, Beasley H, Britton C, Curran D, Devaney E, Gilabert A, Hunt M, Jackson F, Johnston SL, Kryukov I, Li K, Morrison AA, Reid AJ, Sargison N, Saunders GI, Wasmuth JD, Wolstenholme A, Berriman M, Gilleard JS, Cotton JA, 2013. The genome and transcriptome of Haemonchus contortus, a key model parasite for drug and vaccine discovery. Genome Biol. 14, R88. Lane J, Jubb T, Shephard R, Webb-Ware J, Fordyce G, 2015. Priority list of endemic diseases for the red meat industries. Meat & Livestock Australia Limited B.AHE.0010. Langeberg LK, Scott JD, 2015. Signalling scaffolds and local organization of cellular behaviour. Nat. Rev. Mol. Cell Biol. 16, 232-244. Lanusse CE, Alvarez LI, Lifschitz AL, 2016. Gaining insights into the pharmacology of anthelmintics using Haemonchus contortus as a model nematode. Adv. Parasitol. 93, 465-518. Law V, Knox C, Djoumbou Y, Jewison T, Guo AC, Liu Y, Maciejewski A, Arndt D, Wilson M, Neveu V, Tang A, Gabriel G, Ly C, Adamjee S, Dame ZT, Han B, Zhou Y, Wishart DS, 2014. DrugBank 4.0: shedding new light on drug metabolism. Nucleic Acids Res. 42, D1091-D1097. Lee MG, Nurse P, 1987. Complementation used to clone a human homologue of the fission yeast cell cycle control gene cdc2. Nature 327, 31-35. Lehmann S, Bass JJ, Szewczyk NJ, 2013. Knockdown of the C. elegans kinome identifies kinases required for normal protein homeostasis, mitochondrial network structure, and sarcomere structure in muscle. Cell Commun. Signal. 11, 71. Li L, Stoeckert Jr. CJ, Roos DS, 2003. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178-2189. Lightowlers MW, Flisser A, Gauci CG, Heath DD, Jensen O, Rolfe R, 2000. Vaccination against cysticercosis and hydatid disease. Parasitol. Today 16, 191-196. Lin J, Sahakian DC, de Morais SM, Xu JJ, Polzer RJ, Winter SM, 2003. 
The role of absorption, distribution, metabolism, excretion and toxicity in drug discovery. Curr. Top. Med. Chem. 3, 1125-1154. Lipinski CA, 2004. Lead- and drug-like compounds: the rule-of-five revolution. Drug Discov. Today Technol. 1, 337-341. Little P, 1995. A big book of the human genome. Navigational progress. Nature 377, 286-287. Little PR, Hodge A, Watson TG, Seed JA, Maeder SJ, 2010. Field efficacy and safety of an oral formulation of the novel combination anthelmintic, derquantel-abamectin, in sheep in New Zealand. N. Z. Vet. J. 58, 121-129. Liu Q, Wei F, Liu W, Yang S, Zhang X, 2008. Paragonimiasis: an important food-borne zoonosis in China. Trends Parasitol. 24, 318-323. Liu Y, Shah K, Yang F, Witucki L, Shokat KM, 1998. A molecular gate which controls unnatural ATP analogue recognition by the tyrosine kinase v-Src. Bioorg. Med. Chem. 6, 1219-1226. Liu Y, Gray NS, 2006. Rational design of inhibitors that bind to inactive kinase conformations. Nat. Chem. Biol. 2, 358-364. Loging W, Harland L, Williams-Jones B, 2007. High-throughput electronic biology: mining information for drug discovery. Nat. Rev. Drug Discov. 6, 220-230. Lok JB, 2016. Signaling in parasitic nematodes: physicochemical communication between host and parasite and endogenous molecular transduction pathways governing worm development and survival. Curr. Clin. Micro. Rept. 3, 186-197.

Loukas A, Tran M, Pearson MS, 2007. Schistosome membrane proteins as vaccines. Int. J. Parasitol. 37, 257-263. Loyacano AF, Williams JC, Gurie J, DeRosa AA, 2002. Effect of gastrointestinal nematode and liver fluke infections on weight gain and reproductive performance of beef heifers. Vet. Parasitol. 107, 227-234. Lun ZR, Gasser RB, Lai DH, Li AX, Zhu XQ, Yu XB, Fang YY, 2005. Clonorchiasis: a key foodborne zoonosis in China. Lancet. Infect. Dis. 5, 31-41. Macarron R, Banks MN, Bojanic D, Burns DJ, Cirovic DA, Garyantes T, Green DV, Hertzberg RP, Janzen WP, Paslay JW, Schopfer U, Sittampalam GS, 2011. Impact of high-throughput screening in biomedical research. Nat. Rev. Drug Discov. 10, 188-195. Maeda I, Kohara Y, Yamamoto M, Sugimoto A, 2001. Large-scale analysis of gene function in Caenorhabditis elegans by high-throughput RNAi. Curr. Biol. 11, 171-176. Magarinos MP, Carmona SJ, Crowther GJ, Ralph SA, Roos DS, Shanmugam D, Van Voorhis WC, Agüero F, 2012. TDR Targets: a chemogenomics resource for neglected diseases. Nucleic Acids Res. 40, D1118-D1127. Maizels RM, Pearce EJ, Artis D, Yazdanbakhsh M, Wynn TA, 2009. Regulation of pathogenesis and immunity in helminth infections. J. Exp. Med. 206, 2059-2066. Maizels RM, McSorley HJ, 2016. Regulation of the host immune system by helminth parasites. J. Allergy Clin. Immunol. 138, 666-675. Mangiola S, Young ND, Korhonen P, Mondal A, Scheerlinck JP, Sternberg PW, Cantacessi C, Hall RS, Jex AR, Gasser RB, 2013. Getting the most out of parasitic helminth transcriptomes using HelmDB: implications for biology and biotechnology. Biotechnol. Adv. 31, 1109-1119. Manning G, Whyte DB, Martinez R, Hunter T, Sudarsanam S, 2002a. The protein kinase complement of the human genome. Science 298, 1912-1934. Manning G, Plowman GD, Hunter T, Sudarsanam S, 2002b. Evolution of protein kinase signaling from yeast to man. Trends Biochem. Sci. 27, 514-520. Manning G, 2005. Genomic overview of protein kinases. WormBook, ed. The C. 
elegans Research Community, WormBook, https://doi.org/10.1895/wormbook.1.60.1 Marchler-Bauer A, Zheng C, Chitsaz F, Derbyshire MK, Geer LY, Geer RC, Gonzales NR, Gwadz M, Hurwitz DI, Lanczycki CJ, Lu F, Lu S, Marchler GH, Song JS, Thanki N, Yamashita RA, Zhang D, Bryant SH, 2013. CDD: conserved domains and protein three-dimensional structure. Nucleic Acids Res. 41, D348-D352. Mardis ER, 2008a. The impact of next-generation sequencing technology on genetics. Trends Genet. 24, 133-141. Mardis ER, 2008b. Next-generation DNA sequencing methods. Annu. Rev. Genomics Hum. Genet. 9, 387-402. Martin DM, Miranda-Saavedra D, Barton GJ, 2009. Kinomer v. 1.0: a database of systematically classified eukaryotic protein kinases. Nucleic Acids Res. 37, D244-D250. Martin GS, 1970. Rous sarcoma virus: a function required for the maintenance of the transformed state. Nature 227, 1021-1023. Martin J, Rosa BA, Ozersky P, Hallsworth-Pepin K, Zhang X, Bhonagiri-Palsikar V, Tyagi R, Wang Q, Choi YJ, Gao X, McNulty SN, Brindley PJ, Mitreva M, 2015. Helminth.net: expansions to Nematode.net and an introduction to Trematode.net. Nucleic Acids Res. 43, D698-D706. Matthews HR, 1995. Protein kinases and phosphatases that act on histidine, lysine, or arginine residues in eukaryotic proteins: a possible regulator of the mitogen-activated protein kinase cascade. Pharmacol. Ther. 67, 323-350.

Maxam AM, Gilbert W, 1977. A new method for sequencing DNA. Proc. Natl. Acad. Sci. USA 74, 560-564. McInnes C, 2007. Virtual screening strategies in drug discovery. Curr. Opin. Chem. Biol. 11, 494-502. McNulty SN, Tort JF, Rinaldi G, Fischer K, Rosa BA, Smircich P, Fontenla S, Choi YJ, Tyagi R, Hallsworth-Pepin K, Mann VH, Kammili L, Latham PS, Dell'Oca N, Dominguez F, Carmona C, Fischer PU, Brindley PJ, Mitreva M, 2017. Genomes of Fasciola hepatica from the Americas reveal colonization with Neorickettsia endobacteria related to the agents of Potomac horse and human Sennetsu fevers. PLoS Genet. 13, e1006537. McSkimming DI, Dastgheib S, Talevich E, Narayanan A, Katiyar S, Taylor SS, Kochut K, Kannan N, 2015. ProKinO: a unified resource for mining the cancer kinome. Hum. Mutat. 36, 175-186. McSkimming DI, Dastgheib S, Baffi TR, Byrne DP, Ferries S, Scott ST, Newton AC, Eyers CE, Kochut KJ, Eyers PA, Kannan N, 2016. KinView: a visual comparative sequence analysis tool for integrated kinome research. Mol. Biosyst. 12, 3651-3665. McSorley HJ, Hewitson JP, Maizels RM, 2013. Immunomodulation by helminth parasites: defining mechanisms and mediators. Int. J. Parasitol. 43, 301-310. Metz JT, Hajduk PJ, 2010. Rational approaches to targeted polypharmacology: creating and navigating protein-ligand interaction networks. Curr. Opin. Chem. Biol. 14, 498-504. Metzker ML, 2005. Emerging technologies in DNA sequencing. Genome Res. 15, 1767-1776. Metzker ML, 2010. Sequencing technologies - the next generation. Nat. Rev. Genet. 11, 31-46. Miller TA, 1978. Industrial development and field use of the canine hookworm vaccine. Adv. Parasitol. 16, 333-342. Miranda-Saavedra D, Barton GJ, 2007. Classification and functional annotation of eukaryotic protein kinases. Proteins 68, 893-914. Miranda-Saavedra D, Stark MJ, Packer JC, Vivares CP, Doerig C, Barton GJ, 2007.
The complement of protein kinases of the microsporidium Encephalitozoon cuniculi in relation to those of Saccharomyces cerevisiae and Schizosaccharomyces pombe. BMC Genomics 8, 309. Mitreva M, Jasmer DP, Zarlenga DS, Wang Z, Abubucker S, Martin J, Taylor CM, Yin Y, Fulton L, Minx P, Yang SP, Warren WC, Fulton RS, Bhonagiri V, Zhang X, Hallsworth-Pepin K, Clifton SW, McCarter JP, Appleton J, Mardis ER, Wilson RK, 2011. The draft genome of the parasitic nematode Trichinella spiralis. Nat. Genet. 43, 228-235. Molyneux DH, 2004. "Neglected" diseases but unrecognised successes - challenges and opportunities for infectious disease control. Lancet 364, 380-383. Morange M, 1993. The discovery of cellular oncogenes. Hist. Philos. Life Sci. 15, 45-58. Morel M, Vanderstraete M, Hahnel S, Grevelding CG, Dissous C, 2014. Receptor tyrosine kinases and schistosome reproduction: new targets for chemotherapy. Front. Genet. 5, 238. Morrison DK, Murakami MS, Cleghon V, 2000. Protein kinases and phosphatases in the Drosophila genome. J. Cell Biol. 150, F57-F62. Mouse Genome Sequencing Consortium, Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, Antonarakis SE, Attwood J, Baertsch R, Bailey J, Barlow K, Beck S, Berry E, Birren B, Bloom T, Bork P, Botcherby M, Bray N, Brent MR, Brown DG, Brown

SD, Bult C, Burton J, Butler J, Campbell RD, Carninci P, Cawley S, Chiaromonte F, Chinwalla AT, Church DM, Clamp M, Clee C, Collins FS, Cook LL, Copley RR, Coulson A, Couronne O, Cuff J, Curwen V, Cutts T, Daly M, David R, Davies J, Delehaunty KD, Deri J, Dermitzakis ET, Dewey C, Dickens NJ, Diekhans M, Dodge S, Dubchak I, Dunn DM, Eddy SR, Elnitski L, Emes RD, Eswara P, Eyras E, Felsenfeld A, Fewell GA, Flicek P, Foley K, Frankel WN, Fulton LA, Fulton RS, Furey TS, Gage D, Gibbs RA, Glusman G, Gnerre S, Goldman N, Goodstadt L, Grafham D, Graves TA, Green ED, Gregory S, Guigo R, Guyer M, Hardison RC, Haussler D, Hayashizaki Y, Hillier LW, Hinrichs A, Hlavina W, Holzer T, Hsu F, Hua A, Hubbard T, Hunt A, Jackson I, Jaffe DB, Johnson LS, Jones M, Jones TA, Joy A, Kamal M, Karlsson EK, Karolchik D, Kasprzyk A, Kawai J, Keibler E, Kells C, Kent WJ, Kirby A, Kolbe DL, Korf I, Kucherlapati RS, Kulbokas EJ, Kulp D, Landers T, Leger JP, Leonard S, Letunic I, Levine R, Li J, Li M, Lloyd C, Lucas S, Ma B, Maglott DR, Mardis ER, Matthews L, Mauceli E, Mayer JH, McCarthy M, McCombie WR, McLaren S, McLay K, McPherson JD, Meldrim J, Meredith B, Mesirov JP, Miller W, Miner TL, Mongin E, Montgomery KT, Morgan M, Mott R, Mullikin JC, Muzny DM, Nash WE, Nelson JO, Nhan MN, Nicol R, Ning Z, Nusbaum C, O'Connor MJ, Okazaki Y, Oliver K, Overton-Larty E, Pachter L, Parra G, Pepin KH, Peterson J, Pevzner P, Plumb R, Pohl CS, Poliakov A, Ponce TC, Ponting CP, Potter S, Quail M, Reymond A, Roe BA, Roskin KM, Rubin EM, Rust AG, Santos R, Sapojnikov V, Schultz B, Schultz J, Schwartz MS, Schwartz S, Scott C, Seaman S, Searle S, Sharpe T, Sheridan A, Shownkeen R, Sims S, Singer JB, Slater G, Smit A, Smith DR, Spencer B, Stabenau A, Stange-Thomann N, Sugnet C, Suyama M, Tesler G, Thompson J, Torrents D, Trevaskis E, Tromp J, Ucla C, Ureta-Vidal A, Vinson JP, Von Niederhausern AC, Wade CM, Wall M, Weber RJ, Weiss RB, Wendl MC, West AP, Wetterstrand K, Wheeler R, Whelan S, Wierzbowski J, Willey 
D, Williams S, Wilson RK, Winter E, Worley KC, Wyman D, Yang S, Yang SP, Zdobnov EM, Zody MC, Lander ES, 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520-562. Müller S, Chaikuad A, Gray NS, Knapp S, 2015. The ins and outs of selective kinase inhibitor development. Nat. Chem. Biol. 11, 818-821. Murphy JM, Zhang Q, Young SN, Reese ML, Bailey FP, Eyers PA, Ungureanu D, Hammaren H, Silvennoinen O, Varghese LN, Chen K, Tripaydonis A, Jura N, Fukuda K, Qin J, Nimchuk Z, Mudgett MB, Elowe S, Gee CL, Liu L, Daly RJ, Manning G, Babon JJ, Lucet IS, 2014. A robust methodology to subclassify pseudokinases based on their nucleotide-binding properties. Biochem. J. 457, 323-334. Murray CJ, Vos T, Lozano R, Naghavi M, Flaxman AD, Michaud C, Ezzati M, Shibuya K, Salomon JA, Abdalla S, Aboyans V, Abraham J, Ackerman I, Aggarwal R, Ahn SY, Ali MK, Alvarado M, Anderson HR, Anderson LM, Andrews KG, Atkinson C, Baddour LM, Bahalim AN, Barker-Collo S, Barrero LH, Bartels DH, Basanez MG, Baxter A, Bell ML, Benjamin EJ, Bennett D, Bernabe E, Bhalla K, Bhandari B, Bikbov B, Bin Abdulhak A, Birbeck G, Black JA, Blencowe H, Blore JD, Blyth F, Bolliger I, Bonaventure A, Boufous S, Bourne R, Boussinesq M, Braithwaite T, Brayne C, Bridgett L, Brooker S, Brooks P, Brugha TS, Bryan-Hancock C, Bucello C, Buchbinder R, Buckle G, Budke CM, Burch M, Burney P, Burstein R, Calabria B, Campbell B, Canter CE, Carabin H, Carapetis J, Carmona L, Cella C, Charlson F, Chen H, Cheng AT, Chou D, Chugh SS, Coffeng LE, Colan SD, Colquhoun S, Colson KE, Condon J, Connor MD, Cooper LT, Corriere M, Cortinovis M, de Vaccaro KC, Couser W, Cowie BC, Criqui MH, Cross M, Dabhadkar KC, Dahiya

M, Dahodwala N, Damsere-Derry J, Danaei G, Davis A, De Leo D, Degenhardt L, Dellavalle R, Delossantos A, Denenberg J, Derrett S, Des Jarlais DC, Dharmaratne SD, Dherani M, Diaz-Torne C, Dolk H, Dorsey ER, Driscoll T, Duber H, Ebel B, Edmond K, Elbaz A, Ali SE, Erskine H, Erwin PJ, Espindola P, Ewoigbokhan SE, Farzadfar F, Feigin V, Felson DT, Ferrari A, Ferri CP, Fevre EM, Finucane MM, Flaxman S, Flood L, Foreman K, Forouzanfar MH, Fowkes FG, Fransen M, Freeman MK, Gabbe BJ, Gabriel SE, Gakidou E, Ganatra HA, Garcia B, Gaspari F, Gillum RF, Gmel G, Gonzalez-Medina D, Gosselin R, Grainger R, Grant B, Groeger J, Guillemin F, Gunnell D, Gupta R, Haagsma J, Hagan H, Halasa YA, Hall W, Haring D, Haro JM, Harrison JE, Havmoeller R, Hay RJ, Higashi H, Hill C, Hoen B, Hoffman H, Hotez PJ, Hoy D, Huang JJ, Ibeanusi SE, Jacobsen KH, James SL, Jarvis D, Jasrasaria R, Jayaraman S, Johns N, Jonas JB, Karthikeyan G, Kassebaum N, Kawakami N, Keren A, Khoo JP, King CH, Knowlton LM, Kobusingye O, Koranteng A, Krishnamurthi R, Laden F, Lalloo R, Laslett LL, Lathlean T, Leasher JL, Lee YY, Leigh J, Levinson D, Lim SS, Limb E, Lin JK, Lipnick M, Lipshultz SE, Liu W, Loane M, Ohno SL, Lyons R, Mabweijano J, MacIntyre MF, Malekzadeh R, Mallinger L, Manivannan S, Marcenes W, March L, Margolis DJ, Marks GB, Marks R, Matsumori A, Matzopoulos R, Mayosi BM, McAnulty JH, McDermott MM, McGill N, McGrath J, Medina-Mora ME, Meltzer M, Mensah GA, Merriman TR, Meyer AC, Miglioli V, Miller M, Miller TR, Mitchell PB, Mock C, Mocumbi AO, Moffitt TE, Mokdad AA, Monasta L, Montico M, Moradi-Lakeh M, Moran A, Morawska L, Mori R, Murdoch ME, Mwaniki MK, Naidoo K, Nair MN, Naldi L, Narayan KM, Nelson PK, Nelson RG, Nevitt MC, Newton CR, Nolte S, Norman P, Norman R, O'Donnell M, O'Hanlon S, Olives C, Omer SB, Ortblad K, Osborne R, Ozgediz D, Page A, Pahari B, Pandian JD, Rivero AP, Patten SB, Pearce N, Padilla RP, Perez-Ruiz F, Perico N, Pesudovs K, Phillips D, Phillips MR, Pierce K, Pion S, Polanczyk GV, 
Polinder S, Pope CA, 3rd, Popova S, Porrini E, Pourmalek F, Prince M, Pullan RL, Ramaiah KD, Ranganathan D, Razavi H, Regan M, Rehm JT, Rein DB, Remuzzi G, Richardson K, Rivara FP, Roberts T, Robinson C, De Leon FR, Ronfani L, Room R, Rosenfeld LC, Rushton L, Sacco RL, Saha S, Sampson U, Sanchez-Riera L, Sanman E, Schwebel DC, Scott JG, Segui-Gomez M, Shahraz S, Shepard DS, Shin H, Shivakoti R, Singh D, Singh GM, Singh JA, Singleton J, Sleet DA, Sliwa K, Smith E, Smith JL, Stapelberg NJ, Steer A, Steiner T, Stolk WA, Stovner LJ, Sudfeld C, Syed S, Tamburlini G, Tavakkoli M, Taylor HR, Taylor JA, Taylor WJ, Thomas B, Thomson WM, Thurston GD, Tleyjeh IM, Tonelli M, Towbin JA, Truelsen T, Tsilimbaris MK, Ubeda C, Undurraga EA, van der Werf MJ, van Os J, Vavilala MS, Venketasubramanian N, Wang M, Wang W, Watt K, Weatherall DJ, Weinstock MA, Weintraub R, Weisskopf MG, Weissman MM, White RA, Whiteford H, Wiebe N, Wiersma ST, Wilkinson JD, Williams HC, Williams SR, Witt E, Wolfe F, Woolf AD, Wulf S, Yeh PH, Zaidi AK, Zheng ZJ, Zonies D, Lopez AD, AlMazroa MA, Memish ZA, 2012. Disability-adjusted life years (DALYs) for 291 diseases and injuries in 21 regions, 1990-2010: a systematic analysis for the Global Burden of Disease Study 2010. Lancet 380, 2197-2223.
Murrell KD, Pozio E, 2000. Trichinellosis: the zoonosis that won't go quietly. Int. J. Parasitol. 30, 1339-1349.
Murrell KD, Pozio E, 2011. Worldwide occurrence and impact of human trichinellosis, 1986-2009. Emerg. Infect. Dis. 17, 2194-2202.
Niedner RH, Buzko OV, Haste NM, Taylor A, Gribskov M, Taylor SS, 2006. Protein kinase resource: an integrated environment for phosphorylation research. Proteins 63, 78-86.

Nolen B, Taylor S, Ghosh G, 2004. Regulation of protein kinases: controlling activity through activation segment conformation. Mol. Cell 15, 661-675.
O'Connor M, Peifer M, Bender W, 1989. Construction of large DNA segments in Escherichia coli. Science 244, 1307-1312.
Olliaro P, Seiler J, Kuesel A, Horton J, Clark JN, Don R, Keiser J, 2011. Potential drug development candidates for human soil-transmitted helminthiases. PLoS Negl. Trop. Dis. 5, e1138.
Olsen A, Namwanje H, Nejsum P, Roepstorff A, Thamsborg SM, 2009. Albendazole and mebendazole have low efficacy against Trichuris trichiura in school-age children in Kabale District, Uganda. Trans. R. Soc. Trop. Med. Hyg. 103, 443-446.
Ortutay C, Valiaho J, Stenberg K, Vihinen M, 2005. KinMutBase: a registry of disease-causing mutations in protein kinase domains. Hum. Mutat. 25, 435-442.
Padmanabhan R, Jay E, Wu R, 1974. Chemical synthesis of a primer and its use in the sequence analysis of the lysozyme gene of bacteriophage T4. Proc. Natl. Acad. Sci. USA 71, 2510-2514.
Palumbo E, 2007. Association between schistosomiasis and cancer: a review. Infect. Dis. Clin. Pract. 15, 145-148.
Panic G, Duthaler U, Speich B, Keiser J, 2014. Repurposing drugs for the treatment and control of helminth infections. Int. J. Parasitol. Drugs Drug Resist. 4, 185-200.
Parkinson J, Whitton C, Schmid R, Thomson M, Blaxter M, 2004. NEMBASE: a resource for parasitic nematode ESTs. Nucleic Acids Res. 32, D427-D430.
Pearce LR, Komander D, Alessi DR, 2010. The nuts and bolts of AGC protein kinases. Nat. Rev. Mol. Cell Biol. 11, 9-22.
Perrimon N, 1994. Signalling pathways initiated by receptor protein tyrosine kinases in Drosophila. Curr. Opin. Cell Biol. 6, 260-266.
Petretti C, Prigent C, 2005. The Protein Kinase Resource: everything you always wanted to know about protein kinases but were afraid to ask. Biol. Cell 97, 113-118.
Pettersson E, Lundeberg J, Ahmadian A, 2009. Generations of sequencing technologies. Genomics 93, 105-111.
Piratae S, Tesana S, Jones MK, Brindley PJ, Loukas A, Lovas E, Eursitthichai V, Sripa B, Thanasuwan S, Laha T, 2012. Molecular characterization of a tetraspanin from the human liver fluke, Opisthorchis viverrini. PLoS Negl. Trop. Dis. 6, e1939.
Plowman GD, Sudarsanam S, Bingham J, Whyte D, Hunter T, 1999. The protein kinases of Caenorhabditis elegans: a model for signal transduction in multicellular organisms. Proc. Natl. Acad. Sci. USA 96, 13603-13610.
Pozio E, Murrell KD, 2006. Systematics and epidemiology of Trichinella. Adv. Parasitol. 63, 367-439.
Pozio E, 2007. World distribution of Trichinella spp. infections in animals and humans. Vet. Parasitol. 149, 3-21.
Preidis GA, Hotez PJ, 2015. The newest "omics" - metagenomics and metabolomics - enter the battle against the neglected tropical diseases. PLoS Negl. Trop. Dis. 9, e0003382.
Preston S, Jabbar A, Nowell C, Joachim A, Ruttkowski B, Baell J, Cardno T, Korhonen PK, Piedrafita D, Ansell BR, Jex AR, Hofmann A, Gasser RB, 2015a. Low cost whole-organism screening of compounds for anthelmintic activity. Int. J. Parasitol. 45, 333-343.
Preston S, Jabbar A, Gasser RB, 2015b. A perspective on genomic-guided anthelmintic discovery and repurposing using Haemonchus contortus. Infect. Genet. Evol. 40, 368-373.

Preston S, Jiao Y, Jabbar A, McGee SL, Laleu B, Willis P, Wells TN, Gasser RB, 2016. Screening of the 'Pathogen Box' identifies an approved pesticide with major anthelmintic activity against the barber's pole worm. Int. J. Parasitol. Drugs Drug Resist. 6, 329-334.
Prichard RK, Basanez MG, Boatin BA, McCarthy JS, Garcia HH, Yang GJ, Sripa B, Lustigman S, 2012. A research agenda for helminth diseases of humans: intervention for control and elimination. PLoS Negl. Trop. Dis. 6, e1549.
Protasio AV, Tsai IJ, Babbage A, Nichol S, Hunt M, Aslett MA, De Silva N, Velarde GS, Anderson TJ, Clark RC, Davidson C, Dillon GP, Holroyd NE, LoVerde PT, Lloyd C, McQuillan J, Oliveira G, Otto TD, Parker-Manuel SJ, Quail MA, Wilson RA, Zerlotini A, Dunne DW, Berriman M, 2012. A systematically improved high quality genome and transcriptome of the human blood fluke Schistosoma mansoni. PLoS Negl. Trop. Dis. 6, e1455.
Reed SI, Hadwiger JA, Lorincz AT, 1985. Protein kinase activity associated with the product of the yeast cell division cycle gene CDC28. Proc. Natl. Acad. Sci. USA 82, 4055-4059.
Reinke V, Smith HE, Nance J, Wang J, Van Doren C, Begley R, Jones SJ, Davis EB, Scherer S, Ward S, Kim SK, 2000. A global profile of germline gene expression in C. elegans. Mol. Cell 6, 605-616.
Reuter JA, Spacek DV, Snyder MP, 2015. High-throughput sequencing technologies. Mol. Cell 58, 586-597.
Rickard MD, Harrison GB, Heath DD, Lightowlers MW, 1995. Taenia ovis recombinant vaccine - 'quo vadit'. Parasitology 110 (Suppl.), S5-S9.
Roberts LS, Janovy Jr. J, 2009. Gerald D. Schmidt & Larry S. Roberts’ Foundations of Parasitology. McGraw-Hill, New York, USA.
Roeber F, Jex AR, Gasser RB, 2013. Impact of gastrointestinal parasitic nematodes of sheep, and the role of advanced molecular tools for exploring epidemiology and drug resistance - an Australian perspective. Parasit. Vectors 6, 153.
Roepstorff A, Mejer H, Nejsum P, Thamsborg SM, 2011. Helminth parasites in pigs: new challenges in pig production and current research highlights. Vet. Parasitol. 180, 72-81.
Rollinson D, 2009. A wake up call for urinary schistosomiasis: reconciling research effort with public health importance. Parasitology 136, 1593-1610.
Rollinson D, Knopp S, Levitz S, Stothard JR, Tchuem Tchuente LA, Garba A, Mohammed KA, Schur N, Person B, Colley DG, Utzinger J, 2013. Time to set the agenda for schistosomiasis elimination. Acta Trop. 128, 423-440.
Rose H, Rinaldi L, Bosco A, Mavrot F, de Waal T, Skuce P, Charlier J, Torgerson PR, Hertzberg H, Hendrickx G, Vercruysse J, Morgan ER, 2015. Widespread anthelmintic resistance in European farmed ruminants: a systematic review. Vet. Rec. 176, 546.
Sanger F, Coulson AR, 1975. A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. J. Mol. Biol. 94, 441-448.
Sanger F, Nicklen S, Coulson AR, 1977. DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74, 5463-5467.
Sarnpitak P, Mujumdar P, Taylor P, Cross M, Coster MJ, Gorse AD, Krasavin M, Hofmann A, 2015. Panel docking of small-molecule libraries - prospects to improve efficiency of lead compound discovery. Biotechnol. Adv. 33, 941-947.
Scheeff ED, Bourne PE, 2005. Structural evolution of the protein kinase-like superfamily. PLoS Comput. Biol. 1, e49.

Schistosoma japonicum Genome Sequencing and Functional Analysis Consortium, 2009. The Schistosoma japonicum genome reveals features of host-parasite interplay. Nature 460, 345-351.
Schlessinger J, Ullrich A, 1992. Growth factor signaling by receptor tyrosine kinases. Neuron 9, 383-391.
Schlessinger J, 2014. Receptor tyrosine kinases: legacy of the first two decades. Cold Spring Harb. Perspect. Biol. 6, a008912.
Schuster SC, 2008. Next-generation sequencing transforms today's biology. Nat. Methods 5, 16-18.
Schwarz EM, Korhonen PK, Campbell BE, Young ND, Jex AR, Jabbar A, Hall RS, Mondal A, Howe AC, Pell J, Hofmann A, Boag PR, Zhu XQ, Gregory TR, Loukas A, Williams BA, Antoshechkin I, Brown CT, Sternberg PW, Gasser RB, 2013. The genome and developmental transcriptome of the strongylid nematode Haemonchus contortus. Genome Biol. 14, R89.
Schwarz EM, Hu Y, Antoshechkin I, Miller MM, Sternberg PW, Aroian RV, 2015. The genome and transcriptome of the zoonotic hookworm Ancylostoma ceylanicum identify infection-specific gene families. Nat. Genet. 47, 416-422.
Scott I, Pomroy WE, Kenyon PR, Smith G, Adlington B, Moss A, 2013. Lack of efficacy of monepantel against Teladorsagia circumcincta and Trichostrongylus colubriformis. Vet. Parasitol. 198, 166-171.
Sehgal SN, 1998. Rapamune (RAPA, rapamycin, sirolimus): mechanism of action immunosuppressive effect results from blockade of signal transduction and inhibition of cell cycle progression. Clin. Biochem. 31, 335-340.
Shanmugam D, Ralph SA, Carmona SJ, Crowther GJ, Roos DS, Agüero F, 2012. Integrating and mining helminth genomes to discover and prioritize novel therapeutic targets. In: Caffrey, CR (Ed.), Parasitic helminths: targets, screens, drugs and vaccines. Wiley-Blackwell, Hoboken, New Jersey, USA, pp. 43-59.
Sharifpoor S, van Dyk D, Costanzo M, Baryshnikova A, Friesen H, Douglas AC, Youn JY, VanderSluis B, Myers CL, Papp B, Boone C, Andrews BJ, 2012. Functional wiring of the yeast kinome revealed by global analysis of genetic network motifs. Genome Res. 22, 791-801.
Shizuya H, Birren B, Kim UJ, Mancino V, Slepak T, Tachiiri Y, Simon M, 1992. Cloning and stable maintenance of 300-kilobase-pair fragments of human DNA in Escherichia coli using an F-factor-based vector. Proc. Natl. Acad. Sci. USA 89, 8794-8797.
Shtivelman E, Lifshitz B, Gale RP, Canaani E, 1985. Fused transcript of abl and bcr genes in chronic myelogenous leukaemia. Nature 315, 550-554.
Simanis V, Nurse P, 1986. The cell cycle control gene cdc2+ of fission yeast encodes a protein kinase potentially regulated by phosphorylation. Cell 45, 261-268.
Singh S, Lowe DG, Thorpe DS, Rodriguez H, Kuang WJ, Dangott LJ, Chinkers M, Goeddel DV, Garbers DL, 1988. Membrane guanylate cyclase is a cell-surface receptor with homology to protein kinases. Nature 334, 708-712.
Smith CM, Shindyalov IN, Veretnik S, Gribskov M, Taylor SS, Ten Eyck LF, Bourne PE, 1997. The protein kinase resource. Trends Biochem. Sci. 22, 444-446.
Smith CM, 1999. The protein kinase resource and other bioinformation resources. Prog. Biophys. Mol. Biol. 71, 525-533.
Smout MJ, Laha T, Mulvenna J, Sripa B, Suttiprapa S, Jones A, Brindley PJ, Loukas A, 2009. A granulin-like growth factor secreted by the carcinogenic liver fluke, Opisthorchis viverrini, promotes proliferation of host cells. PLoS Pathog. 5, e1000611.

Soderling TR, 1999. The Ca2+-calmodulin-dependent protein kinase cascade. Trends Biochem. Sci. 24, 232-236.
Sonnhammer EL, Eddy SR, Durbin R, 1997. Pfam: a comprehensive database of protein domain families based on seed alignments. Proteins 28, 405-420.
Sonnichsen B, Koski LB, Walsh A, Marschall P, Neumann B, Brehm M, Alleaume AM, Artelt J, Bettencourt P, Cassin E, Hewitson M, Holz C, Khan M, Lazik S, Martin C, Nitzsche B, Ruer M, Stamford J, Winzi M, Heinkel R, Roder M, Finell J, Hantsch H, Jones SJ, Jones M, Piano F, Gunsalus KC, Oegema K, Gonczy P, Coulson A, Hyman AA, Echeverri CJ, 2005. Full-genome RNAi profiling of early embryogenesis in Caenorhabditis elegans. Nature 434, 462-469.
Sotillo J, Pearson M, Potriquet J, Becker L, Pickering D, Mulvenna J, Loukas A, 2016. Extracellular vesicles secreted by Schistosoma mansoni contain protein vaccine candidates. Int. J. Parasitol. 46, 1-5.
Soukhathammavong PA, Sayasone S, Phongluxa K, Xayaseng V, Utzinger J, Vounatsou P, Hatz C, Akkhavong K, Keiser J, Odermatt P, 2012. Low efficacy of single-dose albendazole and mebendazole against hookworm and effect on concomitant helminth infection in Lao PDR. PLoS Negl. Trop. Dis. 6, e1417.
Sripa B, Brindley PJ, Mulvenna J, Laha T, Smout MJ, Mairiang E, Bethony JM, Loukas A, 2012. The tumorigenic liver fluke Opisthorchis viverrini - multiple pathways to cancer. Trends Parasitol. 28, 395-407.
Stein LD, Bao Z, Blasiar D, Blumenthal T, Brent MR, Chen N, Chinwalla A, Clarke L, Clee C, Coghlan A, Coulson A, D'Eustachio P, Fitch DH, Fulton LA, Fulton RE, Griffiths-Jones S, Harris TW, Hillier LW, Kamath R, Kuwabara PE, Mardis ER, Marra MA, Miner TL, Minx P, Mullikin JC, Plumb RW, Rogers J, Schein JE, Sohrmann M, Spieth J, Stajich JE, Wei C, Willey D, Wilson RK, Durbin R, Waterston RH, 2003. The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics. PLoS Biol. 1, E45.
Stern DF, Zheng P, Beidler DR, Zerillo C, 1991. Spk1, a new kinase from Saccharomyces cerevisiae, phosphorylates proteins on serine, threonine, and tyrosine. Mol. Cell. Biol. 11, 987-1001.
Sternberg PW, Horvitz HR, 1991. Signal transduction during C. elegans vulval induction. Trends Genet. 7, 366-371.
Sugimoto A, 2004. High-throughput RNAi in Caenorhabditis elegans: genome-wide screens and functional genomics. Differentiation 72, 81-91.
Sulston JE, Schierenberg E, White JG, Thomson JN, 1983. The embryonic cell lineage of the nematode Caenorhabditis elegans. Dev. Biol. 100, 64-119.
Sutherland EW, Jr., Wosilait WD, 1955. Inactivation and activation of liver phosphorylase. Nature 175, 169-170.
Sutherland I, Scott I, 2009. Gastrointestinal nematodes of sheep and cattle: biology and control. Wiley-Blackwell, West Sussex, UK.
Swinney DC, Anthony J, 2011. How were new medicines discovered? Nat. Rev. Drug Discov. 10, 507-519.
Swulius MT, Waxham MN, 2008. Ca2+/calmodulin-dependent protein kinases. Cell. Mol. Life Sci. 65, 2637-2657.
Szymczyna BR, Taurog RE, Young MJ, Snyder JC, Johnson JE, Williamson JR, 2009. Synergy of NMR, computation, and X-ray crystallography for structural biology. Structure 17, 499-507.
Talevich E, Mirza A, Kannan N, 2011. Structural and evolutionary divergence of eukaryotic protein kinases in Apicomplexa. BMC Evol. Biol. 11, 321.

Talevich E, Kannan N, Miranda-Saavedra D, 2014. Computational analysis of apicomplexan kinomes. In: Doerig, C, Spaeth, G, Wiese, M (Eds.), Protein phosphorylation in parasites. Wiley-Blackwell, Hoboken, New Jersey, USA, pp. 3-36.
Tang YT, Gao X, Rosa BA, Abubucker S, Hallsworth-Pepin K, Martin J, Tyagi R, Heizer E, Zhang X, Bhonagiri-Palsikar V, Minx P, Warren WC, Wang Q, Zhan B, Hotez PJ, Sternberg PW, Dougall A, Gaze ST, Mulvenna J, Sotillo J, Ranganathan S, Rabelo EM, Wilson RK, Felgner PL, Bethony J, Hawdon JM, Gasser RB, Loukas A, Mitreva M, 2014. Genome of the human hookworm Necator americanus. Nat. Genet. 46, 261-269.
Taylor CM, Wang Q, Rosa BA, Huang SC, Powell K, Schedl T, Pearce EJ, Abubucker S, Mitreva M, 2013a. Discovery of anthelmintic drug targets and drugs using chokepoints in nematode metabolic pathways. PLoS Pathog. 9, e1003505.
Taylor CM, Martin J, Rao RU, Powell K, Abubucker S, Mitreva M, 2013b. Using existing drugs as leads for broad spectrum anthelmintics targeting protein kinases. PLoS Pathog. 9, e1003149.
Taylor SS, Kornev AP, 2011. Protein kinases: evolution of dynamic regulatory proteins. Trends Biochem. Sci. 36, 65-77.
Toet H, Piedrafita DM, Spithill TW, 2014. Liver fluke vaccines in ruminants: strategies, progress and future opportunities. Int. J. Parasitol. 44, 915-927.
Toledo R, Esteban JG, Fried B, 2012. Current status of food-borne trematode infections. Eur. J. Clin. Microbiol. Infect. Dis. 31, 1705-1718.
Tsai IJ, Zarowiecki M, Holroyd N, Garciarrubio A, Sanchez-Flores A, Brooks KL, Tracey A, Bobes RJ, Fragoso G, Sciutto E, Aslett M, Beasley H, Bennett HM, Cai J, Camicia F, Clark R, Cucher M, De Silva N, Day TA, Deplazes P, Estrada K, Fernandez C, Holland PW, Hou J, Hu S, Huckvale T, Hung SS, Kamenetzky L, Keane JA, Kiss F, Koziol U, Lambert O, Liu K, Luo X, Luo Y, Macchiaroli N, Nichol S, Paps J, Parkinson J, Pouchkina-Stantcheva N, Riddiford N, Rosenzvit M, Salinas G, Wasmuth JD, Zamanian M, Zheng Y, Taenia solium Genome Consortium, Cai X, Soberon X, Olson PD, Laclette JP, Brehm K, Berriman M, 2013. The genomes of four tapeworm species reveal adaptations to parasitism. Nature 496, 57-63.
Tyagi R, Rosa BA, Lewis WG, Mitreva M, 2015. Pan-phylum comparison of nematode metabolic potential. PLoS Negl. Trop. Dis. 9, e0003788.
Utzinger J, 2012. A research and development agenda for the control and elimination of human helminthiases. PLoS Negl. Trop. Dis. 6, e1646.
van der Werf MJ, de Vlas SJ, Brooker S, Looman CW, Nagelkerke NJ, Habbema JD, Engels D, 2003. Quantification of clinical morbidity associated with schistosome infection in sub-Saharan Africa. Acta Trop. 86, 125-139.
van Dijk EL, Auger H, Jaszczyszyn Y, Thermes C, 2014. Ten years of next-generation sequencing technology. Trends Genet. 30, 418-426.
Vanderstraete M, Gouignard N, Ahier A, Morel M, Vicogne J, Dissous C, 2013. The venus kinase receptor (VKR) family: structure and evolution. BMC Genomics 14, 361.
Varjosalo M, Keskitalo S, Van Drogen A, Nurkkala H, Vichalkovski A, Aebersold R, Gstaiger M, 2013. The protein interaction landscape of the human CMGC kinase group. Cell Rep. 3, 1306-1320.
Varmus HE, 1985. Alfred P. Sloan prize. Viruses, genes, and cancer. I. The discovery of cellular oncogenes and their role in neoplasia. Cancer 55, 2324-2328.
Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, Gocayne JD, Amanatides P, Ballew RM, Huson DH, Wortman JR, Zhang Q, Kodira CD, Zheng XH, Chen L, Skupski M, Subramanian G, Thomas PD, Zhang J, Gabor Miklos GL, Nelson C, Broder S, Clark AG, Nadeau J, McKusick VA, Zinder N, Levine AJ, Roberts RJ, Simon M, Slayman C, Hunkapiller M, Bolanos R, Delcher A, Dew I, Fasulo D, Flanigan M, Florea L, Halpern A, Hannenhalli S, Kravitz S, Levy S, Mobarry C, Reinert K, Remington K, Abu-Threideh J, Beasley E, Biddick K, Bonazzi V, Brandon R, Cargill M, Chandramouliswaran I, Charlab R, Chaturvedi K, Deng Z, Di Francesco V, Dunn P, Eilbeck K, Evangelista C, Gabrielian AE, Gan W, Ge W, Gong F, Gu Z, Guan P, Heiman TJ, Higgins ME, Ji RR, Ke Z, Ketchum KA, Lai Z, Lei Y, Li Z, Li J, Liang Y, Lin X, Lu F, Merkulov GV, Milshina N, Moore HM, Naik AK, Narayan VA, Neelam B, Nusskern D, Rusch DB, Salzberg S, Shao W, Shue B, Sun J, Wang Z, Wang A, Wang X, Wang J, Wei M, Wides R, Xiao C, Yan C, Yao A, Ye J, Zhan M, Zhang W, Zhang H, Zhao Q, Zheng L, Zhong F, Zhong W, Zhu S, Zhao S, Gilbert D, Baumhueter S, Spier G, Carter C, Cravchik A, Woodage T, Ali F, An H, Awe A, Baldwin D, Baden H, Barnstead M, Barrow I, Beeson K, Busam D, Carver A, Center A, Cheng ML, Curry L, Danaher S, Davenport L, Desilets R, Dietz S, Dodson K, Doup L, Ferriera S, Garg N, Gluecksmann A, Hart B, Haynes J, Haynes C, Heiner C, Hladun S, Hostin D, Houck J, Howland T, Ibegwam C, Johnson J, Kalush F, Kline L, Koduru S, Love A, Mann F, May D, McCawley S, McIntosh T, McMullen I, Moy M, Moy L, Murphy B, Nelson K, Pfannkoch C, Pratts E, Puri V, Qureshi H, Reardon M, Rodriguez R, Rogers YH, Romblad D, Ruhfel B, Scott R, Sitter C, Smallwood M, Stewart E, Strong R, Suh E, Thomas R, Tint NN, Tse S, Vech C, Wang G, Wetter J, Williams S, Williams M, Windsor S, Winn-Deen E, Wolfe K, Zaveri J, Zaveri K, Abril JF, Guigo R, Campbell MJ, Sjolander KV, Karlak B, Kejariwal A, Mi H, Lazareva B, Hatton T, Narechania A, Diemer K, Muruganujan A, Guo N, Sato S, Bafna V, Istrail S, Lippert R, Schwartz R, Walenz B, Yooseph S, Allen D, Basu A, Baxendale J, Blick L, Caminha M, Carnes-Stine J, Caulk P, Chiang YH, Coyne M, Dahlke C, Mays A, Dombroski M, Donnelly M, Ely D, Esparham S, Fosler C, Gire H, Glanowski S, Glasser K, Glodek A, Gorokhov M, Graham K, Gropman B, Harris M, Heil J, Henderson S, Hoover J, Jennings D, Jordan C, Jordan J, Kasha J, Kagan L, Kraft C, Levitsky A, Lewis M, Liu X, Lopez J, Ma D, Majoros W, McDaniel J, Murphy S, Newman M, Nguyen T, Nguyen N, Nodell M, Pan S, Peck J, Peterson M, Rowe W, Sanders R, Scott J, Simpson M, Smith T, Sprague A, Stockwell T, Turner R, Venter E, Wang M, Wen M, Wu D, Wu M, Xia A, Zandieh A, Zhu X, 2001. The sequence of the human genome. Science 291, 1304-1351.
Vercruysse J, Albonico M, Behnke JM, Kotze AC, Prichard RK, McCarthy JS, Montresor A, Levecke B, 2011. Is anthelmintic resistance a concern for the control of human soil-transmitted helminths? Int. J. Parasitol. Drugs Drug Resist. 1, 14-27.
Walker AJ, Ressurreição M, Rothermel R, 2014. Exploring the function of protein kinases in schistosomes: perspectives from the laboratory and from comparative genomics. Front. Genet. 5, 229.
Wang L, Yang Z, Li Y, Yu F, Brindley PJ, McManus DP, Wei D, Han Z, Feng Z, Li Y, Hu W, 2006. Reconstruction and in silico analysis of the MAPK signaling pathways in the human blood fluke, Schistosoma japonicum. FEBS Lett. 580, 3677-3686.
Wang W, Wang L, Liang YS, 2012. Susceptibility or resistance of praziquantel in human schistosomiasis: a review. Parasitol. Res. 111, 1871-1877.
Wang X, Chen W, Huang Y, Sun J, Men J, Liu H, Luo F, Guo L, Lv X, Deng C, Zhou C, Fan Y, Li X, Huang L, Hu Y, Liang C, Hu X, Xu J, Yu X, 2011. The draft genome of the carcinogenic human liver fluke Clonorchis sinensis. Genome Biol. 12, R107.

Webster JP, Molyneux DH, Hotez PJ, Fenwick A, 2014. The contribution of mass drug administration to global health: past, present and future. Phil. Trans. R. Soc. B 369, 20130434.
Wolstenholme AJ, Fairweather I, Prichard R, von Samson-Himmelstjerna G, Sangster NC, 2004. Drug resistance in veterinary helminths. Trends Parasitol. 20, 469-476.
Wolstenholme AJ, Kaplan RM, 2012. Resistance to macrocyclic lactones. Curr. Pharm. Biotechnol. 13, 873-887.
World Health Organization, 2012a. Research priorities for helminth infections: technical report of the TDR disease reference group on helminth infections. WHO technical report series.
World Health Organization, 2012b. Soil-transmitted helminthiases: eliminating soil-transmitted helminthiases as a public health problem in children: progress report 2001-2010 and strategic plan 2011-2020.
World Health Organization, 2015. WHO estimates of the global burden of foodborne diseases, Foodborne Disease Burden Epidemiology Reference Group 2007-2015.
Wu P, Nielsen TE, Clausen MH, 2015. FDA-approved small-molecule kinase inhibitors. Trends Pharmacol. Sci. 36, 422-439.
Wu P, Nielsen TE, Clausen MH, 2016. Small-molecule kinase inhibitors: an analysis of FDA-approved drugs. Drug Discov. Today 21, 5-10.
Wu R, 1972. Nucleotide sequence analysis of DNA. Nat. New Biol. 236, 198-200.
Yang CY, Chang CH, Yu YL, Lin TC, Lee SA, Yen CC, Yang JM, Lai JM, Hong YR, Tseng TL, Chao KM, Huang CY, 2008. PhosphoPOINT: a comprehensive human kinase interactome and phospho-protein database. Bioinformatics 24, i14-i20.
Yang J, Yan R, Roy A, Xu D, Poisson J, Zhang Y, 2015. The I-TASSER suite: protein structure and function prediction. Nat. Methods 12, 7-8.
Yang Z, Rannala B, 2012. Molecular phylogenetics: principles and practice. Nat. Rev. Genet. 13, 303-314.
Yokogawa S, Cort WW, Yokogawa M, 1960. Paragonimus and paragonimiasis. Exp. Parasitol. 10, 139-205.
Yong HS, Eamsobhana P, Lim PE, Razali R, Aziz FA, Rosli NS, Poole-Johnson J, Anwar A, 2015. Draft genome of neurotropic nematode parasite Angiostrongylus cantonensis, causative agent of human eosinophilic meningitis. Acta Trop. 148, 51-57.
Young ND, Jex AR, Li B, Liu S, Yang L, Xiong Z, Li Y, Cantacessi C, Hall RS, Xu X, Chen F, Wu X, Zerlotini A, Oliveira G, Hofmann A, Zhang G, Fang X, Kang Y, Campbell BE, Loukas A, Ranganathan S, Rollinson D, Rinaldi G, Brindley PJ, Yang H, Wang J, Wang J, Gasser RB, 2012. Whole-genome sequence of Schistosoma haematobium. Nat. Genet. 44, 221-225.
Young ND, Nagarajan N, Lin SJ, Korhonen PK, Jex AR, Hall RS, Safavi-Hemami H, Kaewkong W, Bertrand D, Gao S, Seet Q, Wongkham S, Teh BT, Wongkham C, Intapan PM, Maleewong W, Yang X, Hu M, Wang Z, Hofmann A, Sternberg PW, Tan P, Wang J, Gasser RB, 2014. The Opisthorchis viverrini genome provides insights into life in the bile duct. Nat. Commun. 5, 4378.
Zaru R, Magrane M, O'Donovan C, UniProt Consortium, 2017. From the research laboratory to the database: the Caenorhabditis elegans kinome in UniProtKB. Biochem. J. 474, 493-515.
Zhang J, Yang PL, Gray NS, 2009. Targeting cancer with small molecule kinase inhibitors. Nat. Rev. Cancer 9, 28-39.
Zhang ZQ, 2013. Animal biodiversity: An outline of higher-level classification and survey of taxonomic richness. Zootaxa 3703, 1-82.

Zhao Z, Wu H, Wang L, Liu Y, Knapp S, Liu Q, Gray NS, 2014. Exploration of type II binding mode: a privileged approach for kinase inhibitor focused drug discovery? ACS Chem. Biol. 9, 1230-1241.
Zheng J, Knighton DR, ten Eyck LF, Karlsson R, Xuong N, Taylor SS, Sowadski JM, 1993. Crystal structure of the catalytic subunit of cAMP-dependent protein kinase complexed with MgATP and peptide inhibitor. Biochemistry 32, 2154-2161.
Zheng S, Zhu Y, Zhao Z, Wu Z, Okanurak K, Lv Z, 2017. Liver fluke infection and cholangiocarcinoma: a review. Parasitol. Res. 116, 11-19.
Zhu XQ, Korhonen PK, Cai H, Young ND, Nejsum P, von Samson-Himmelstjerna G, Boag PR, Tan P, Li Q, Min J, Yang Y, Wang X, Fang X, Hall RS, Hofmann A, Sternberg PW, Jex AR, Gasser RB, 2015. Genetic blueprint of the zoonotic pathogen Toxocara canis. Nat. Commun. 6, 6145.
Zulawski M, Schulze G, Braginets R, Hartmann S, Schulze WX, 2014. The Arabidopsis kinome: phylogeny and evolutionary insights into functional diversification. BMC Genomics 15, 548.

Table 1.1 The kinomes of 15 representative organisms in KinBase, with the number of kinases in each kinase group.

Name | Description | Total | AGC group | CAMK group | CK1 group | CMGC group | RGC group | STE group | TK group | TKL group | "Other" group | aPKs/PKLs
Homo sapiens | Human | 538 | 63 | 74 | 12 | 64 | 5 | 47 | 90 | 43 | 81 | 59
Mus musculus | Mouse | 557 | 60 | 96 | 11 | 62 | 7 | 47 | 90 | 43 | 83 | 58
Strongylocentrotus purpuratus | Sea urchin | 354 | 29 | 48 | 6 | 35 | 8 | 21 | 54 | 38 | 94 | 21
Drosophila melanogaster | Vinegar fly | 237 | 30 | 32 | 10 | 34 | 6 | 18 | 31 | 17 | 44 | 15
Caenorhabditis elegans | Nematode worm | 438 | 29 | 42 | 83 | 48 | 27 | 24 | 85 | 15 | 67 | 18
[...] | [...] | 703 | 27 | 35 | 5 | 42 | 3 | 26 | 222 | 248 | 70 | 25
Monosiga brevicollis | Unicellular choanoflagellate | 114* | 0 | 0 | 0 | 0 | 0 | 0 | 109 | 0 | 0 | 5
Saccharomyces cerevisiae | Baker's yeast | 132 | 17 | 22 | 4 | 23 | 0 | 14 | 0 | 0 | 38 | 14
Coprinopsis cinerea | Mushroom | 374 | 23 | 18 | 5 | 31 | 0 | 18 | 0 | 18 | 182 | 79
Dictyostelium discoideum | Amoebozoan slime mold | 294 | 21 | 21 | 3 | 30 | 0 | 45 | 0 | 67 | 68 | 39
Tetrahymena thermophila | Free-living ciliate | 1114 | 60 | 63 | 20 | 65 | 0 | 19 | 0 | 8 | 760 | 119
Giardia lamblia | Excavate protozoan | 283 | 7 | 8 | 1 | 24 | 0 | 8 | 0 | 0 | 226 | 9
Leishmania major | Kinetoplastid protozoan | 223 | 11 | 23 | 7 | 50 | 0 | 43 | 0 | 0 | 61 | 28
Trichomonas vaginalis | Excavate protozoan | 1076 | 90 | 442 | 71 | 136 | 0 | 27 | 0 | 121 | 123 | 66
Selaginella moellendorffii | Lycophyte (plant) | 1013 | 33 | 139 | 9 | 93 | 0 | 36 | 0 | 576 | 77 | 50

* The number of sequences reported in KinBase is 261, including proteins labelled "TK-associated" and "PTP". These sequences do not contain a kinase catalytic domain and therefore are not listed here, except for five sequences (listed as PKLs).

Table 1.2 Selected species of parasitic worms of humans and other animals in the phyla Platyhelminthes and Nematoda, their intermediate and definitive hosts, predilection site(s), transmission route(s) and clinical manifestations of the disease(s) they cause.

Species name | Intermediate host(s) (IH) | Main definitive host(s) (DH) | Predilection site in DH | Transmitted to DH via | Main clinical manifestation of disease

Phylum Platyhelminthes
Clonorchis sinensis [1] | Freshwater snails (1st), freshwater fish (2nd) | Humans, dogs, cats, pigs | Liver (bile ducts) | Metacercariae in fish | Diarrhoea, abdominal pain, biliary obstruction, cholangiocarcinoma
Echinostoma spp. [1] | Freshwater snails (1st & 2nd), freshwater fish, frogs, tadpoles (2nd) | Mammals, birds | Alimentary tract | Metacercariae in 2nd IH | Abdominal pain, diarrhoea, fatigue
Fasciola hepatica, F. gigantica [1] | Freshwater snails | Sheep, cattle, humans | Liver (bile ducts) | Metacercariae on vegetation | Abdominal pain, anaemia, jaundice, hepatomegaly, splenomegaly, ascites, biliary colic
Fasciolopsis buski [1] | Freshwater snails | Pigs, humans | Intestine | Metacercariae on vegetation | Ulceration, diarrhoea and ascites
Opisthorchis spp. [1] | Freshwater snails (1st), freshwater fish (2nd) | Fish-eating mammals | Liver (bile ducts) | Metacercariae in fish | Abdominal pain, biliary obstruction, cholangiocarcinoma
Paragonimus spp. [1] | Freshwater snails (1st), freshwater crustaceans (2nd) | Carnivorous mammals | Lungs | Metacercariae in 2nd IH | Chronic pneumonia
Paramphistomum spp., Calicophoron spp., Orthocoelium spp. [2] | Freshwater snails | Cattle, sheep | Rumen, reticulum | Metacercariae on vegetation | Diarrhoea and anaemia
Schistosoma haematobium* [3] | Bulinus snails | Humans | Vasculature of bladder | Cercariae in freshwater | Fibrosis, bladder cancer
Schistosoma japonicum [3] | Oncomelania snails | Mammals | Vasculature of intestine | Cercariae in freshwater | Fibrosis, hepatosplenomegaly
Schistosoma mansoni* [3] | Biomphalaria snails | Humans | Vasculature of intestine | Cercariae in freshwater | Fibrosis, hepatosplenomegaly

Phylum Nematoda
Ancylostoma duodenale, Ancylostoma ceylanicum, Necator americanus [4] | No IH | Humans | Small intestine | L3s in soil | Severe anaemia
Ascaris lumbricoides [4] | No IH | Humans | Small intestine | Eggs (containing L1s) in soil | Abdominal pain, intestinal blockage, impaired growth
Haemonchus contortus* [5] | No IH | Sheep, goats | Abomasum | L3s in soil | Severe anaemia, hypoproteinaemia
Teladorsagia (Ostertagia) circumcincta [5] | No IH | Sheep, goats | Abomasum | L3s in soil | Diarrhoea, anorexia, poor growth, emaciation
Trichinella spiralis*, T. pseudospiralis* [6] | No IH | Vertebrates | Muscle tissue | Raw/undercooked, infected muscle (containing L1s) | Abdominal pain, diarrhoea, fever, muscle pain, neurological symptoms
Trichostrongylus spp. [5] | No IH | Sheep, goats | Small intestine, abomasum (T. axei) | L3s in soil | Diarrhoea, anorexia, poor growth, emaciation
Trichuris suis* [7] | No IH | Pigs | Caecum | Eggs (containing L1s) in soil | Diarrhoea, anorexia, anaemia, poor growth, dehydration, emaciation
Trichuris trichiura* [4] | No IH | Humans | Caecum | Eggs (containing L1s) in soil | Dysentery, anaemia, growth retardation

* Parasites studied in this thesis
[1] see Keiser and Utzinger, 2009
[2] see Durie, 1953
[3] see Colley et al., 2014
[4] see Bethony et al., 2006
[5] see Sutherland and Scott, 2009
[6] see Pozio and Murrell, 2006; Pozio, 2007
[7] see Roepstorff et al., 2011
L1s, first larval stage; L3s, third larval stage

Table 1.3 Kinome sizes estimated from published genomes and transcriptomes of parasitic helminths, and numbers of predicted targets for anthelminthic drugs.

Species name | Estimated kinome size | Number of predicted drug targets | Reference
Ancylostoma ceylanicum | ~365 | 0 | (Schwarz et al., 2015)
Angiostrongylus cantonensis | 361 | n.r. | (Yong et al., 2015)
Ascaris suum | 609 | 17 | (Jex et al., 2011)
Ascaris suum | 364 | 7 | (Desjardins et al., 2013)
Brugia malayi | 215 | n.r. | (Ghedin et al., 2007)
Brugia malayi | 282 | 6 | (Desjardins et al., 2013)
Brugia malayi | 465 | TTBKL and FER kinases | (Bennuru et al., 2016)
Clonorchis sinensis | 692 | n.r. | (Wang et al., 2011)
Clonorchis sinensis | n.r. | n.r. | (Huang et al., 2013)
Dirofilaria immitis | 283 | TTBKL and FER kinases | (Bennuru et al., 2016)
Fasciola hepatica | ~307 | n.r. | (McNulty et al., 2017)
Haemonchus contortus | 845 | 27 | (Schwarz et al., 2013)
Haemonchus contortus | n.r. | 0 | (Laing et al., 2013)
Loa loa | 310 | 7 | (Desjardins et al., 2013)
Loa loa | 306 | TTBKL and FER kinases | (Bennuru et al., 2016)
Meloidogyne hapla | 234 | 6 | (Desjardins et al., 2013)
Necator americanus | 274 | 59 (32 with associated drugs) | (Tang et al., 2014)
Onchocerca ochengi | 282 | TTBKL and FER kinases | (Bennuru et al., 2016)
Onchocerca volvulus | 318 | TTBKL and FER kinases | (Bennuru et al., 2016)
Opisthorchis viverrini | 262 | n.r. | (Young et al., 2014)
Pristionchus pacificus | 346 | 6 | (Desjardins et al., 2013)
Schistosoma haematobium | 261 | 0 | (Young et al., 2012)
Schistosoma japonicum | n.r. | n.r. | (Schistosoma japonicum Genome Sequencing and Functional Analysis Consortium, 2009)
Schistosoma mansoni | 249 | 1 | (Berriman et al., 2009)
Schistosoma mansoni | 252 (ePKs only) | 15 | (Andrade et al., 2011)
Schistosoma mansoni | n.r. | 18 | (Caffrey et al., 2009)
Toxocara canis | 458 | 57 | (Zhu et al., 2015)
Trichinella spiralis | 233 | 7 | (Desjardins et al., 2013)
Trichuris suis | 232 | n.r. | (Jex et al., 2014)
Trichuris trichiura | n.r. | 2 | (Foth et al., 2014)
Wuchereria bancrofti | 304 | 10 | (Desjardins et al., 2013)
Wuchereria bancrofti | 230 | TTBKL and FER kinases | (Bennuru et al., 2016)

n.r., not reported

Figure 1.1 Timeline showing advances in genome and transcriptome sequencing, protein kinase research, as well as genome and kinome research of parasitic helminths. All values representing lines have been min-max normalised for display purposes and some (light blue and blue; orange, beige and brown) have been normalised together to allow for visual comparisons. The latest values for individual database searches are given at the end of each line. Black, wavy lines represent a break in the timeline.
Series plotted: sequencing costs in USD per megabase (Mb) (https://www.genome.gov/sequencingcosts/), from 5292 USD/Mb at the start of the series to 0.014 USD/Mb at its end; "Kinase" publications in the PubMed database; "Kinase sequence" publications in the PubMed database; experimentally determined kinase structures deposited in the Protein Data Bank; "Kinome" publications in the PubMed database; "(Parasit* OR bacteri* OR vir*) AND kinome" publications in the PubMed database; nematode and platyhelminth genome projects deposited as BioProject in the National Center for Biotechnology Information (NCBI) database.
Events marked: reversible protein phosphorylation (Fischer and Krebs, 1955); Src kinase is an oncogene (Martin, 1970); 'Sanger' sequencing (Sanger et al., 1977); first kinase structure (Knighton et al., 1991); first eukaryotic genome (Goffeau et al., 1996); first metazoan genome (C. elegans Sequencing Consortium, 1998); first approved kinase inhibitor (Sehgal, 1998); C. elegans kinome (Plowman et al., 1999); human genome (Venter et al., 2001); human kinome (Manning et al., 2002b); B. malayi genome (Ghedin et al., 2004); "next-generation sequencing" (Schuster, 2008); Kinomer (Martin et al., 2009); S. haematobium genome (Young et al., 2012); H. contortus genome (Schwarz et al., 2013); Kinannote (Goldberg et al., 2013); Trichuris genomes (Jex et al., 2014; Foth et al., 2014); Trichinella genomes (Korhonen et al., 2016).

Figure 1.2 Salient sequence and structural features of the protein kinase catalytic domain. (A) The twelve conserved subdomains (color-coded) and important functional residues are shown (adapted from Hanks, 2003). Grey areas represent less conserved sequence regions. The dashed box represents the activation segment. (B) Representative crystal structure of a protein kinase catalytic domain in the active (DFG-in) conformation and a bound adenosine triphosphate (ATP) in black stick representation (Protein Data Bank identifier: 1ATP; adapted from Zheng et al., 1993). Subdomains are color-coded in concordance with (A).

CHAPTER 2 Defining the Schistosoma haematobium kinome enables the prediction of essential kinases as anti-schistosome drug targets

Abstract
The blood fluke Schistosoma haematobium causes urogenital schistosomiasis, a neglected tropical disease (NTD) that affects more than 110 million people. Treating this disease by targeted or mass administration of a single chemical, praziquantel, carries the risk that drug resistance will develop in this pathogen. Therefore, there is an imperative to search for new drug targets in S. haematobium and other schistosomes. Protein kinases have potential in this regard, given their essential roles in biological processes and as targets for drugs already approved by the US Food and Drug Administration (FDA) for use in humans. Here, we defined the kinome of S. haematobium using a refined bioinformatic pipeline. We classified, curated and annotated predicted kinases, and assessed the developmental transcription profiles of kinase genes. Then, we prioritised a panel of kinases as potential drug targets and inferred chemicals that bind to them using an integrated bioinformatic pipeline. Most kinases of S. haematobium are very similar to those of its congener, Schistosoma mansoni, offering the prospect of designing chemicals that kill both species. Overall, this study provides a global insight into the kinome of S. haematobium and should assist the repurposing or discovery of drugs against schistosomiasis.

2.1 Introduction
Schistosomiasis is a neglected tropical disease caused by blood flukes of the genus Schistosoma (phylum Platyhelminthes; class Trematoda) (World Health Organization, 2012; Colley et al., 2014). The three main species, Schistosoma haematobium, Schistosoma mansoni and Schistosoma japonicum, affect around 230 million people worldwide (Colley et al., 2014). The former two flukes predominate, infecting almost 200 million humans in sub-Saharan Africa alone (van der Werf et al., 2003; Rollinson et al., 2013). S. haematobium causes the urogenital form of this disease, and S. mansoni leads to hepato-intestinal illness (Colley et al., 2014).
These flukes have a complex life cycle, involving aquatic snails (family Planorbidae) as intermediate hosts. In freshwater, the infective larvae (cercariae) leave the snail and infect the definitive, human host by penetrating skin. Upon penetration, the cercariae lose their tails, and the larvae (schistosomules) migrate through the circulatory system and lung to the portal system, after which they mature and mate. Subsequently, paired adult worms migrate to their site of predilection and start to reproduce. S. mansoni adults live mainly in the portal system and/or the mesenteric venules of the small intestine, where they produce eggs that pass through the intestinal wall and are excreted in faeces. S. haematobium adults usually inhabit the blood vessels around the urinary bladder and genital system; here, the parasite produces eggs that pass through the bladder wall and are released in urine. Once eggs are released into freshwater, they immediately hatch to release miracidia (free-living larvae), which then invade a molluscan intermediate host (Colley et al., 2014). S. haematobium infects snails of the genus Bulinus (see Rollinson et al., 2001), whereas S. mansoni prefers snails of the genus Biomphalaria (see Morgan et al., 2001).
Disease in humans is precipitated by eggs that become entrapped in tissues, where they induce a chronic immune-mediated response, followed by granulomatous changes and ensuing fibrosis (Colley et al., 2014). Eggs of S. mansoni become lodged mainly in the liver and intestinal wall, leading to egg-induced hepatitis, enteritis and/or associated complications (Burke et al., 2009). In contrast, S. haematobium eggs are deposited mainly in the vasculature of the urinary bladder, ureter and/or genital tract (particularly in female individuals), although they can be disseminated to other sites in the body. Entrapped eggs induce considerable inflammation and subsequent fibrosis and/or calcification of the bladder. In addition, chronic S. haematobium infection can increase the risk of secondary bacterial infections (Burke et al., 2009), is a predisposing factor for HIV/AIDS (Kjetland et al., 2006) and can, together with other factors, induce malignant bladder cancer (Palumbo, 2007). As there is no effective vaccine against schistosomiasis, current treatment relies on a single drug, praziquantel (Doenhoff et al., 2008). With increased efforts to control this disease by mass treatment, the possibility of praziquantel resistance developing is a serious concern (Chai, 2013; Greenberg, 2013). Thus, there is a need for sustained research toward developing alternative chemotherapeutic compounds against schistosomiasis.
Recent research efforts to identify new molecular targets for chemotherapeutic intervention have focused on protein kinases (Beckmann et al., 2012; Knapp et al., 2013) for three reasons: they are involved in signalling cascades of essential regulatory and developmental processes (Manning et al., 2002a; Manning, 2005; de Saram et al., 2013); particular kinase groups have relatively conserved structures (Hanks, 2003); and drugs targeting these enzymes in humans have shown particular potential for the treatment of cancers and other diseases (Cohen, 2002; Eglen and Reisine, 2009).
Protein kinases are enzymes (transferases) that phosphorylate a substrate by transferring a phosphoryl group from an energy-rich molecule, such as adenosine triphosphate (ATP), to a target protein. This phosphorylation induces a modification of the substrate, leading to changes in conformation and activity (Cohen, 2000). Substrates are phosphorylated at an amino acid residue that has a free hydroxyl group. Kinases can be subdivided into serine/threonine-phosphorylating kinases (STKs), tyrosine-phosphorylating kinases (TKs) and kinases that phosphorylate either of these residues (called “dual-specificity” or “hybrid” kinases). The conserved, catalytic domain of kinases is a protein fold consisting of an amino-terminal lobe comprised of β-strands and a carboxy-terminal lobe that contains α-helices (Ubersax and Ferrell Jr., 2007). A polypeptide linker functions as a hinge and connects the two lobes, allowing for rotation. This lobe structure forms a catalytic cleft for substrate and ATP binding (Hanks and Hunter, 1995; Manning et al., 2002a; Ubersax and Ferrell Jr., 2007).
Eukaryotic protein kinases (ePKs) represent the largest class of enzymes that share the same protein kinase-like (PKL) fold (Kannan et al., 2007). Kinases that have catalytic activity but are not structurally similar to the PKL fold are classified as atypical kinases (aPKs) (Manning et al., 2002a). Protein kinases can be assigned to groups, families and subfamilies based on sequence similarity in their catalytic domains and the presence of accessory domains. The established classification scheme for kinases (http://kinase.com/kinbase) (Manning, 2005) is based on that originally proposed by Hanks and Hunter (1995), and defines nine ePK groups. Given their essential roles in a range of regulatory processes and their relatively conserved structure and function (Dissous and Grevelding, 2011; Dissous et al., 2013; Morel et al., 2014a), more than 20 ePKs have been investigated experimentally in S. mansoni (see Dissous and Grevelding, 2011; Beckmann et al., 2012). Some of these kinases have been shown to assume essential functions in the parasite (Kapp et al., 2004; Swierczewski and Davies, 2009; Beckmann et al., 2010; de Saram et al., 2013; Andrade et al., 2014). For example, the targeting of multiple receptor kinases of S. mansoni with a single inhibitor led to a fatal impact on schistosome morphology and physiology (Vanderstraete et al., 2013). Because human protein kinases are involved in cancer, numerous compounds that inhibit these enzymes are available and approved for therapeutic use, which offers a unique prospect of repurposing such chemicals against schistosomes (Dissous and Grevelding, 2011). In this context, the in silico prediction of the kinome of S. mansoni provides a basis for the investigation of schistosome kinases as drug targets (Andrade et al., 2011).
In contrast to the situation for S. mansoni, there is no detailed information on the kinome of S. haematobium or any other schistosome. Given that S. haematobium is the causative agent of schistosomiasis in approximately two thirds of all humans infected by schistosomes and therefore has a substantial socioeconomic impact, in terms of disability-adjusted life years and morbidity (van der Werf et al., 2003), there is a major need to work toward identifying drug targets in S. haematobium and designing new treatments (Rollinson, 2009; Brindley and Hotez, 2013).
In the present study, we defined the kinome of S. haematobium. Employing the S. mansoni kinome as a reference (Andrade et al., 2011), we: (i) curated the full complement of predicted kinases of S. haematobium using a comparative genomic-phylogenetic approach; (ii) assessed levels of transcription of genes encoding these kinases in the adult and egg stages of S. haematobium; and (iii) prioritised a panel of kinases as potential drug targets as well as chemicals inferred to bind to them using an integrated bioinformatic pipeline.
We discuss the findings in the context of drug discovery and with regard to the distinctive biologies of S. haematobium and S. mansoni.

2.2 Methods
2.2.1 Defining the S. haematobium kinome
We predicted, curated and annotated the protein kinase complement encoded in the published draft genome (Young et al., 2012) using an integrated bioinformatic pipeline in six steps (Figure 2.1):

1. First, we identified ePKs and PKLs of S. haematobium using the program Kinannote (Goldberg et al., 2013a) employing the -m (metazoan) option. Predicted kinase sequences were then classified according to group, family and/or subfamily (Manning et al., 2002b; Manning, 2005). Sequences that could not be unequivocally classified using this approach were retained for subsequent curation.
2. Orthologous kinase sequences from both S. haematobium and S. mansoni were predicted by pairwise sequence comparison using the program OrthoMCL (Li et al., 2003), employing publicly accessible (SchistoDB v.3.0; http://schistodb.net/schisto/ and GeneDB v.5.2; http://www.genedb.org/) genomic and transcriptomic data sets (Berriman et al., 2009; Protasio et al., 2012; Young et al., 2012). Amino acid sequences that grouped with classified kinases, but were not predicted to be kinases using Kinannote, were added to a kinase group, family or subfamily based on their respective orthologous sequence (in the heterologous species) and included in subsequent analyses.
3. Then, we exhaustively searched all of the genomic and transcriptomic data available for S. haematobium and S. mansoni, to be able to complement any incomplete sequences and also to retrieve kinase-encoding sequences that had not been predicted previously for either or both schistosome species. If a full-length ortholog could not be inferred for the heterologous species, the kinase amino acid sequence was aligned to the genomic scaffold coding for the incomplete gene using the program BLAT v.34x12 (Kent, 2002). This genomic region was then exhaustively searched for a full-length orthologous coding domain using the program Exonerate v.2.2.0 (Slater and Birney, 2005) employing the multi-pass suboptimal alignment algorithm and the protein2genome:bestfit model. Refined gene predictions and protein translations were named according to their ortholog identifier (e.g., Sh_Smp_123456.1 and Smp_A_12345).
4. To increase the sensitivity of identification of kinase domains of schistosomes, we constructed hidden Markov models (HMMs) for individual kinase groups based on the catalytic domains of high-confidence trematode kinase sequences (assigned to a subfamily by Kinannote) using the program HMMER v.3.1b1 (http://hmmer.janelia.org/). These HMMs were constructed using the inferred proteomic data sets of S. japonicum, Clonorchis sinensis, Opisthorchis viverrini and Fasciola hepatica (see Schistosoma japonicum Genome Sequencing and Functional Analysis Consortium, 2009; Young et al., 2010, 2014; Huang et al., 2013), and were then employed to query kinase sequences of individual groups of S. haematobium and S. mansoni and to identify catalytic kinase domains.
5. The catalytic domain sequences of all predicted kinases representing individual groups were aligned using the program MAFFT v.6.864b, employing the L-INS-i option (Katoh and Standley, 2013). Alignments were improved using the program MUSCLE v.3.7 (-refine option) (Edgar, 2004) and by subsequent manual adjustment, to optimise the alignment of homologous characters. The aligned sequences were then subjected to Bayesian inference (BI) analysis in the program MrBayes v.3.2.2 (Ronquist et al., 2012). Posterior probabilities (pp) were calculated, as recommended, using a mixture of models with fixed rate matrices, generating 1,000,000 trees and sampling every 100th tree. The initial 25% of trees were discarded as ‘burn-in’, and the others were used to construct a majority rule tree. Phylogenetic trees were drawn using the program FigTree v.1.4.1 (http://tree.bio.ed.ac.uk/software/figtree/).
6. Curated kinase sequences were functionally annotated by searching the databases Swiss-Prot (database release 01/2014) (Boutet et al., 2007) and KEGG BRITE (database release 03/2014) (Kanehisa and Goto, 2000) using protein BLAST v.2.2.28+ (Camacho et al., 2009) and an E-value cut-off of 1e-5. Pfam domains and PANTHER families were predicted using the program InterProScan v.5-44.0 (Jones et al., 2014). In addition, sequence identities and similarities to S. mansoni and human kinase homologs (sequences accessed from KinBase, http://kinase.com/kinbase/FastaFiles/) were determined for S. haematobium kinases by pairwise comparison using the program EMBOSS Matcher v.6.3.1 (Rice et al., 2000).
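The orthology-based transfer of classifications described in step 2 can be illustrated with a minimal Python sketch. It is not the code used in this study: the function name, the sequence identifiers and the rule of leaving groups with conflicting labels for manual curation are assumptions made for illustration only.

```python
from typing import Dict, List


def transfer_classification(
    ortholog_groups: List[List[str]],
    classification: Dict[str, str],
) -> Dict[str, str]:
    """Infer a kinase classification for unclassified members of ortholog groups.

    A sequence inherits a label only if the classified members of its group
    agree on exactly one label (an assumption of this sketch).
    """
    inferred: Dict[str, str] = {}
    for group in ortholog_groups:
        labels = {classification[seq] for seq in group if seq in classification}
        if len(labels) != 1:
            # No classified member, or conflicting labels: leave for manual curation.
            continue
        label = next(iter(labels))
        for seq in group:
            if seq not in classification:
                inferred[seq] = label
    return inferred


# Hypothetical identifiers and labels, for illustration only.
ortholog_groups = [
    ["Sh_x1", "Smp_x1"],  # only the S. mansoni member is classified
    ["Sh_x2", "Smp_x2"],  # both members already classified
    ["Sh_x3", "Smp_x3"],  # neither member classified
]
classification = {
    "Smp_x1": "CMGC/MAPK",
    "Sh_x2": "AGC/PKA",
    "Smp_x2": "AGC/PKA",
}
inferred = transfer_classification(ortholog_groups, classification)
print(inferred)
```

In this toy input, only Sh_x1 receives a label, because it is the only unclassified sequence whose ortholog has been classified.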

2.2.2 Transcription analysis
We assessed transcription in male and female adults as well as eggs of S. haematobium using publicly available RNA-Seq data (Young et al., 2012). Data were filtered using the program Trimmomatic (Bolger et al., 2014) and aligned to the final sequences encoding kinases using Bowtie v.2.1.0 (Langmead and Salzberg, 2012). Levels of transcription (numbers of transcripts per million, TPM) were calculated using the software package RSEM v.1.2.11 (Li and Dewey, 2011). Kinase genes were considered transcribed if at least 5 read pairs mapped to their coding regions and they had a TPM of > 0. For each kinase gene, a relative measure of transcription was inferred by ranking individual genes from S. haematobium by their TPM values. The top and bottom 10% of transcribed genes were defined as being highly and lowly transcribed, respectively.
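The thresholds above can be expressed as a short Python sketch. It is an illustration under stated assumptions, not the analysis code of this study: the function name, the "moderate" label for genes outside the two tails, rounding the 10% tail size down, and the toy input values are all hypothetical.

```python
def classify_transcription(genes, min_read_pairs=5, tail_fraction=0.10):
    """Label genes by transcription level.

    genes maps a gene name to a (mapped_read_pairs, tpm) tuple.
    A gene counts as transcribed if it has at least min_read_pairs mapped
    read pairs and a TPM above zero; transcribed genes are then ranked by TPM.
    """
    transcribed = {
        name: tpm
        for name, (read_pairs, tpm) in genes.items()
        if read_pairs >= min_read_pairs and tpm > 0
    }
    ranked = sorted(transcribed, key=transcribed.get, reverse=True)
    # Assumption: the size of each 10% tail is rounded down.
    n_tail = int(len(ranked) * tail_fraction)

    labels = {name: "not transcribed" for name in genes}
    for rank, name in enumerate(ranked):
        if rank < n_tail:
            labels[name] = "high"
        elif rank >= len(ranked) - n_tail:
            labels[name] = "low"
        else:
            labels[name] = "moderate"  # label chosen for this sketch
    return labels


# Toy data: 20 transcribed genes plus two that fail one criterion each.
genes = {f"gene_{i:02d}": (10, float(i)) for i in range(1, 21)}
genes["few_reads"] = (3, 12.0)  # fewer than 5 read pairs
genes["zero_tpm"] = (8, 0.0)    # TPM of zero
labels = classify_transcription(genes)
print(labels["gene_20"], labels["gene_10"], labels["gene_01"])
```

With 20 transcribed genes, each tail holds two genes, so gene_20 and gene_19 are labelled highly transcribed and gene_01 and gene_02 lowly transcribed.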

2.2.3 Drug target prediction and prioritisation
To assess the druggability of individual predicted kinases and to prioritise them as potential targets in S. haematobium, essentiality was inferred by selecting S. haematobium proteins homologous (protein BLAST; E-value of ≤ 1e-5) to Caenorhabditis elegans, Drosophila melanogaster and/or Mus musculus kinases with a lethal phenotype upon gene perturbation, as listed in WormBase (Harris et al., 2014), FlyBase (Drysdale and FlyBase Consortium, 2008) and the Mouse Genome Database (Eppig et al., 2015). Essential kinases were considered to represent metabolic ‘chokepoints’ if only one gene was assigned to one KEGG orthologous gene (KO) term for a KEGG pathway. These kinases were then matched to homologous kinase sequences in the databases Kinase SARfari (Gaulton et al., 2012) and DrugBank v.3.0 (Law et al., 2014) using PSI-BLAST v.2.2.26+ employing an E-value cut-off of 1e-30 (Altschul et al., 1997). Salient information on associated ligands (chemicals or small molecules) was extracted from the two databases and used to assess the druggability of the target if (i) both query and target sequence had the same kinase classification (using Kinannote), and (ii) the sequence in the database had one or more ligands that met “Lipinski’s rule-of-five” (Lipinski, 2004) and was flagged as “medicinal chemistry-friendly”. Prioritised kinases predicted to bind compounds approved by the FDA for use in humans or assessed in clinical trials, as indicated in Kinase SARfari (https://www.ebi.ac.uk/chembl/sarfari/kinasesarfari), were considered to have potential as drug targets. Kinases with entries in DrugBank were prioritised as drug targets if at least one associated small molecule (with a description of its properties) was found in this database.
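The filtering logic of this prioritisation can be sketched in Python as follows. The sketch is a simplified illustration, not the pipeline used here: it treats a chokepoint as a gene that is the only one assigned to its KO term (ignoring the pathway context), applies the rule-of-five strictly (no more than 5 hydrogen-bond donors, no more than 10 acceptors, molecular weight up to 500 Da and logP up to 5, although some formulations tolerate one violation), and uses hypothetical gene names, KO labels and ligand properties.

```python
from collections import Counter


def chokepoint_genes(gene_to_ko):
    """Return genes that are the only gene assigned to their KO term."""
    ko_counts = Counter(gene_to_ko.values())
    return {gene for gene, ko in gene_to_ko.items() if ko_counts[ko] == 1}


def passes_rule_of_five(ligand):
    """Strict rule-of-five check on a dict of ligand properties."""
    return (
        ligand["mol_weight"] <= 500
        and ligand["log_p"] <= 5
        and ligand["h_donors"] <= 5
        and ligand["h_acceptors"] <= 10
    )


def prioritise(essential, gene_to_ko, ligands_by_gene):
    """Keep essential chokepoint kinases with at least one drug-like ligand."""
    chokepoints = chokepoint_genes(gene_to_ko)
    return sorted(
        gene
        for gene in essential & chokepoints
        if any(passes_rule_of_five(lig) for lig in ligands_by_gene.get(gene, []))
    )


# Hypothetical inputs, for illustration only.
essential = {"kinase_A", "kinase_B", "kinase_C", "kinase_E"}
gene_to_ko = {
    "kinase_A": "KO_1",
    "kinase_B": "KO_2",  # shares its KO term with kinase_C, so not a chokepoint
    "kinase_C": "KO_2",
    "kinase_D": "KO_3",  # chokepoint, but not inferred to be essential
    "kinase_E": "KO_4",
}
ligands_by_gene = {
    "kinase_A": [{"mol_weight": 450.0, "log_p": 3.2, "h_donors": 2, "h_acceptors": 7}],
    "kinase_D": [{"mol_weight": 300.0, "log_p": 1.0, "h_donors": 1, "h_acceptors": 4}],
    "kinase_E": [{"mol_weight": 812.0, "log_p": 6.1, "h_donors": 6, "h_acceptors": 12}],
}
targets = prioritise(essential, gene_to_ko, ligands_by_gene)
print(targets)
```

Only kinase_A survives all three filters in this toy example; the matching of kinase classifications between query and database sequence is omitted for brevity.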

2.3 Results
2.3.1 The S. haematobium kinome
Here, we employed an integrative bioinformatic pipeline (Figure 2.1). First, we predicted 223 kinases in the S. haematobium genome, 111 and 93 of which could be assigned to subfamilies and families, respectively; 10 could be assigned exclusively to a group, and nine remained unclassified. Subsequently, we identified 46 additional kinase sequences. Following this curation, the number of unclassified sequences decreased to four, and an improved classification of kinases into subfamilies (n = 134), families (n = 129) and groups (n = 2) was achieved (Figure 2.2; Appendix 2.1). Thus, the curated S. haematobium genome was inferred to encode 269 kinases, including both ePKs and PKLs.
A total of 261 ePKs representing all nine major kinase groups were identified in S. haematobium (Figure 2.2; Appendix 2.1). The largest group represented CMGCs (n = 51), including 17 cyclin-dependent kinases (CDKs), four CDK-like kinases (CDKLs), 10 mitogen-activated protein kinases (MAPKs), 11 dual-specificity tyrosine-regulated kinases (DYRKs), four glycogen synthase kinases (GSKs), two CDC-like kinases (CLKs), and one member of each of the families CK2, RCK and SRPK. The second largest group was CAMK, representing 41 kinases including CAMKs, CAMK-related kinases, MARKs and death-associated protein kinases (DAPKs). Only slightly smaller was the “Other” group, which included 40 kinases representing 20 families that do not belong to any of the eight other ePK groups; this group included NEK, AUR (Aurora kinase), BUD32, HASPIN, two Polo-like kinases (PLKs), PEK (pancreatic eIF-2α kinase), SCY1 and ULK (Unc-51-like kinase). The AGC group represented 39 kinases, including the cyclic nucleotide-dependent kinase families PKA (n = 6), PKG (n = 4) and PKC (n = 5), and RSKs (n = 5) and DMPKs (n = 7). Of the 31 members of the TK group, 13 were receptor tyrosine kinases (RTKs), including epidermal growth factor receptors (EGFRs), fibroblast growth factor receptors (FGFRs), insulin receptors (INSRs or IRs) and two venus kinase receptors (VKRs). The other 18 members were cytoplasmic tyrosine kinases (CTKs) and were assigned to 11 families (ABL, ACK, CSK, FAK, FER, RYK, SEV, SYK, TEC, TRK and SRC). The STE group contained 18 members of the STE20 family (MAP4Ks), two STE11 kinases (MAP3Ks) and six STE7 family members (MAP2Ks). The 20 representatives of the TKL group belonged to the families STKR (n = 7), MLK (n = 6), RAF (n = 3), LRRK (n = 1) and LISK (n = 3). We also identified nine kinases belonging to the CK1 group, including three members of the Tau tubulin kinase family (TTBK) and one vaccinia-related kinase (VRK). Finally, with only three members, the receptor guanylate cyclases (RGCs) represented the smallest group of ePKs in the S. haematobium kinome.
In addition to ePKs, we identified four PKLs: two right open reading frame kinases, Sh-RIOK-1 (A_06019) and Sh-RIOK-2 (A_01816), and two representing the ABC1 family (A_02560 and A_01324). The S.
haematobium genome also encodes four unclassified serine/threonine kinases, to which we assigned the following annotations based on similarity searches against the protein database Swiss-Prot: A_05753 - cell cycle serine/threonine-protein kinase CDC5; A_08069 - kinase suppressor of Ras 1 (KSR); C_01296 - serine/threonine-protein kinase WNK1; Sh_Smp_017900.1 - ribosomal protein S6 kinase (RSK). All remaining kinase sequences (n = 265) were assigned to families and/or subfamilies, except for two sequences (A_03674 and A_04152) that could be classified only to a group level (i.e. CAMK and STE, respectively). In a phylogenetic analysis, sequence A_03674 clustered with A_07692 (predicted PKD kinase), albeit with a low nodal support (61%; Appendix 2.2), and thus could not be assigned with confidence to any particular family. The homolog of sequence A_04152 (STE group member) in S. mansoni (Smp_146290.1) has been classified previously as a STE7 kinase (Andrade et al., 2011), but, according to the present analysis, it clustered with a kinase of the STE20 family and FRAY subfamily with 72% nodal support (Appendix 2.2). Thus, sequence A_04152 was not classified to a family or subfamily level.
For 267 of the 269 kinases defined in S. haematobium, orthologs were identified in S. mansoni based on a comparative genomic approach and subsequent phylogenetic analyses. For two S. haematobium kinase sequences (A_01970; CMGC/MAPK/ERK7 and A_07508; CMGC/DYRK/DYRK2), no ortholog was found, in spite of exhaustive searching of the S. mansoni genome, suggesting their uniqueness to S. haematobium. A comparison of the kinomes of S. haematobium and S. mansoni revealed a high overall sequence identity (82-92%), similarity (87-94%) and a relatively conserved length (0-7% difference) between pairs of kinases (Table 2.1). The degree of sequence similarity among individual kinase groups differed considerably, with kinases from the groups CK1 and RGC, and unclassified and PKL kinases, being, on average, more dissimilar compared with the other groups (Table 2.1). A pairwise sequence comparison of kinases of S. haematobium with human homologs revealed an average sequence similarity ranging from 60.9% (PKL) to 76.3% (CK1) for kinases that could be classified. For unclassified kinase sequences, we observed low sequence identity (35.1% on average) to their closest human homologs.
Subsequent phylogenetic analyses of ePKs of both S. haematobium and S. mansoni supported the orthology found between pairs of kinases of these two species. With the exception of the Polo-like kinase Sh-SAK, and representatives of the ULK, SCY1, PKA and CAMK1 families/subfamilies, orthologous sequences formed pairs in individual trees (Figure 2.2; Appendix 2.2), consistent with their classification using an approach based on HMMs.
Seven kinase sequences were excluded from phylogenetic analysis, because the catalytic domain of one or both representatives of the orthologous pair did not match the trematode-specific HMM. Six of these sequences were members of the family SCY1 (A_01858, Smp_176440.1 and Sh_Smp_156890.1) or HASPIN (Smp_Sh_A_07473, Sh_Smp_158950.1 and Smp_158950.1), which are part of the “Other” kinase group. The seventh sequence (Smp_Sh_A_06810) was a member of the STE group, STE11 family and ASK subfamily. Taken together, the 269 protein kinases of S. haematobium and 267 orthologs in S. mansoni were shown to represent all nine recognised kinase groups, 88 families and 79 subfamilies. However, we did not detect representatives of 19 kinase families and subfamilies (Table 2.2) in these two schistosomes (Lophotrochozoa; Protostomia) that are present in members of both the Ecdysozoa (Protostomia, represented by C. elegans and D. melanogaster) and Deuterostomia (represented by Homo sapiens).
Finally, we functionally annotated S. haematobium kinase sequences identified herein and linked them to 20 conserved functional categories (Figure 2.3; Appendix 2.1). Most kinases were predicted to have functional roles in signal transduction, cell communication, cell growth and the immune and/or nervous systems (Figure 2.3; Appendix 2.1).
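For orientation, the pairwise identity and length comparisons reported in this section can be reproduced in principle from a gapped pairwise alignment. The following Python sketch is an assumption-laden illustration, not the EMBOSS Matcher implementation: identity is taken here as identical residues divided by the number of alignment columns, the length difference is normalised by the longer sequence, and similarity (which requires a substitution matrix) is omitted. Function names and example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of alignment columns with identical, non-gap residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_a)


def percent_length_difference(length_a, length_b):
    """Absolute length difference relative to the longer sequence (assumption)."""
    return 100.0 * abs(length_a - length_b) / max(length_a, length_b)


# Hypothetical aligned fragments: 5 identical columns out of 7.
identity = percent_identity("MKV-LLA", "MKVALIA")
length_diff = percent_length_difference(300, 279)
print(round(identity, 1), length_diff)
```

The normalisation used for the length comparison in Table 2.1 is not stated in the text, so the denominator chosen here should be read as one plausible option.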

2.3.2 Transcription profiles
Following the curation and annotation of kinase sequences, we assessed transcription levels of respective genes in different developmental stages and genders of S. haematobium (adult male, adult female and egg). Of the 274 sequences encoding kinases identified in S. haematobium, 214 were transcribed in all three stages (Figure 2.4). By contrast, 13 kinase genes were transcribed exclusively in the male and egg stages, 21 kinase genes were uniquely transcribed in the two adult stages, and one gene was transcribed in the female and egg stages, to the exclusion of the male stage (Figure 2.4). One and eight kinase genes were transcribed exclusively in the egg and male stages, respectively. Among the eight male-specific genes were orthologs of the testis-expressed gene 14 (tex-14, Sh_Smp_131630.1_p1) and a gene coding for an atrial natriuretic peptide receptor (A_02682), a kinase belonging to the RGC group that regulates cardiovascular and body fluid homeostasis (Takei, 2000). For 16 kinase genes, there was no evidence of transcription in any of the life cycle stages studied here (Figure 2.4; Appendix 2.3).
We also assessed transcription levels for the four unclassified S. haematobium kinase genes. For the sequence A_05753, we did not observe transcription in any of the life stages studied; A_08069 was lowly transcribed in the adult female only (TPM: 0.06) and C_01296 was moderately transcribed in both adult stages (TPM female: 2.64; TPM male: 9.80); Sh_Smp_017900.1 was most highly transcribed in the egg stage (TPM: 50.97), but was also transcribed at varying levels in both adult stages (TPM female: 5.32; TPM male: 23.57).
Although most kinase genes were transcribed in all developmental stages of S. haematobium (Figures 2.3A and 2.4), there were differences in transcription levels, depending on their functional category (Figure 2.3B).
Notably, almost twice as many genes of kinases associated with cell growth and death were highly transcribed in the egg stage compared with either gender of the adult stage. In addition, kinase genes associated with cell motility were more abundantly transcribed in the male adult. We also found increased levels of transcription for kinase genes associated with environmental adaptation and the sensory system in the egg and male adult compared with the female adult stage.

2.3.3 Druggable kinases and their prioritisation
Following the transcriptional analysis, we prioritised S. haematobium kinases as potential drug targets. First, we inferred the essentiality of S. haematobium kinase genes based on lethal gene knock-down or knock-out phenotypes linked to one-to-one orthologs in C. elegans, D. melanogaster and/or M. musculus (Appendix 2.4). In total, 219 of 269 (81%) S. haematobium kinases matched orthologs inferred to be associated with lethal phenotypes in at least one of the three organisms (Appendix 2.4). Of these 219 kinases, 57 mapped (at amino acid level) to unique chokepoints in key biological pathways (Appendix 2.4). Of these 57 kinases, 40 were predicted to bind chemical ligands listed in Kinase SARfari and DrugBank, 11 of which were present in both databases (Appendices 2.5 and 2.6). These 40 kinases represented all recognised groups, except RGC, and had human orthologs, some of which related to the nervous system, development and/or cancer (Figure 2.5B). Then, we showed that genes encoding these 40 kinases were transcribed in both adult and egg stages (n = 38), and that two (i.e. A_06570 and A_07448) were specific to adults (Appendix 2.3). Amongst them were two casein kinases (A_08312.1 and Sh_Smp_099030.1) with > 90% sequence similarity to human orthologs; four other kinases in this group (i.e. A_03569 (FAK), A_00551 (GCN2), m.56516 (RAF) and A_03539 (CHK1)) had ≤ 50% sequence similarity to human counterparts (Appendix 2.1).
Of the 40 prioritised kinases, tyrosine kinases were the most highly represented group (n = 9), including a fibroblast growth factor receptor (Sh-FGFR-A), two insulin receptors (Sh-IR-1 and Sh-IR-2) and kinases SYK (Sh-TK4) and FYN (Sh-TK5), orthologs of which have been experimentally evaluated as drug targets in one or more schistosomes other than S. haematobium (see Kapp et al., 2001; Beckmann et al., 2010, 2012; You et al., 2010; Vanderstraete et al., 2013; Hahnel et al., 2014).
Two other targets, namely Sh-Akt (AGC group) and A_04108.1 (CMGC group; GSK family), were inferred, both of which have also been predicted to be promising drug targets in S. mansoni (see Caffrey et al., 2009; Morel et al., 2014b) (Figure 2.5A). Taken together, we predicted that all 40 essential kinases represent targets, and therefore interrogated key databases for chemicals. We identified 42 drugs predicted to bind one or more of these targets, 17 of which are already approved by the FDA for the treatment of cancers or other diseases of humans (Table 2.3). These 17 drugs include four ABL kinase inhibitors (imatinib, http://www.drugs.com/monograph/imatinib-mesylate.html; dasatinib, http://www.drugs.com/monograph/dasatinib.html; bosutinib, http://www.drugs.com/monograph/bosutinib.html; ponatinib,

http://www.drugs.com/monograph/ponatinib.html), one JAK kinase inhibitor (tofacitinib), one GSK3 inhibitor (lithium carbonate), one protein kinase C inhibitor (ingenol mebutate) and 10 other drugs that inhibit multiple (receptor) kinases.
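
The successive filters described in this section (essentiality, chokepoint status, predicted ligand) can be illustrated with a minimal sketch. It is not the code used in this study; the record fields, the function name and the toy identifiers are hypothetical and serve only to show how each filter narrows the previous set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kinase:
    # All field names are hypothetical, chosen for illustration only.
    gene_id: str
    lethal_ortholog: bool       # one-to-one ortholog linked to a lethal phenotype
    chokepoint: bool            # maps to a unique chokepoint in a pathway
    ligand_sources: frozenset   # databases in which a ligand is predicted


def prioritise(kinases):
    """Apply the three filters in order and return each intermediate set."""
    essential = [k for k in kinases if k.lethal_ortholog]
    chokepoints = [k for k in essential if k.chokepoint]
    druggable = [k for k in chokepoints if k.ligand_sources]
    return essential, chokepoints, druggable


# Toy input (made-up identifiers, not S. haematobium genes).
toy = [
    Kinase("toy_01", True, True, frozenset({"Kinase SARfari", "DrugBank"})),
    Kinase("toy_02", True, True, frozenset()),
    Kinase("toy_03", True, False, frozenset({"DrugBank"})),
    Kinase("toy_04", False, True, frozenset({"Kinase SARfari"})),
]

essential, chokepoints, druggable = prioritise(toy)
counts = (len(essential), len(chokepoints), len(druggable))
in_both = [k.gene_id for k in druggable if len(k.ligand_sources) == 2]
print(counts, in_both)
```

Returning the intermediate sets, rather than only the final list, makes it possible to report the size of each step, as is done above for the 269, 219, 57 and 40 kinases.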

2.4 Discussion

Here, we established an integrated bioinformatic pipeline to identify, classify and curate full-length kinase sequences encoded in the genome of S. haematobium for subsequent comparison with orthologs in S. mansoni and humans. This workflow enabled high-confidence predictions of anti-schistosome drug targets and compounds, and should be applicable to various schistosome species and, following modification, also to other flatworms as well as roundworms. In the future, we propose to gradually enhance the workflow by integrating tools for the prediction of binding sites of ligands, structural comparisons of prioritised targets and/or comparative analyses of parasite and host kinases into this pipeline.

In most previous studies, the identification of kinase sequences has relied on searches using HMMs from databases such as Pfam (Sonnhammer et al., 1997) or Kinomer (Martin et al., 2009), or position-specific scoring matrices (PSSMs) (Marchler-Bauer et al., 2013). However, combining several of these methods can improve prediction and classification compared with using a single method. The program Kinannote uses such a combined approach, thereby increasing sensitivity and precision for kinase identification (Goldberg et al., 2013a), and was thus employed by us to produce a draft kinome in the first step of our workflow. Subsequently, an orthology-based approach (Li et al., 2003), using the published kinome (Andrade et al., 2011) and draft genome of S. mansoni (see Protasio et al., 2012) as a reference, identified pairs of kinase orthologs, which facilitated the improvement of gene models for both schistosomes. This step also increased the number of kinases identified in S. haematobium by 17%, and their classification into families/subfamilies by 30%. Independent phylogenetic analyses verified the pairs of orthologs and functional subfamilies. 
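
As an illustration of the orthology step, the following minimal sketch extracts one-to-one ortholog pairs from pre-computed ortholog groups. It assumes the groups are already available as lists of (species, gene) tuples; the species codes, gene names and function name are hypothetical, and the sketch does not reproduce the clustering performed by the orthology software itself.

```python
def one_to_one_pairs(groups, species_a, species_b):
    """Return sorted (gene_a, gene_b) pairs from groups that contain exactly
    one gene from each of the two species of interest."""
    pairs = []
    for members in groups.values():
        genes_a = [gene for species, gene in members if species == species_a]
        genes_b = [gene for species, gene in members if species == species_b]
        if len(genes_a) == 1 and len(genes_b) == 1:
            pairs.append((genes_a[0], genes_b[0]))
    return sorted(pairs)


# Toy ortholog groups (hypothetical identifiers).
toy_groups = {
    "grp1": [("sha", "sha_k1"), ("sma", "sma_k1")],                     # one-to-one
    "grp2": [("sha", "sha_k2"), ("sha", "sha_k3"), ("sma", "sma_k2")],  # two-to-one
    "grp3": [("sha", "sha_k4")],                                        # no partner
}

pairs = one_to_one_pairs(toy_groups, "sha", "sma")
print(pairs)
```

Groups with more than one member per species are skipped here; in practice such cases would be candidates for manual curation of gene models rather than being discarded.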
Since the construction of reliable phylogenetic trees requires meticulous alignment of homologous characters, we restricted multiple alignments to the catalytic domains of kinases, because some sequence regions external to the catalytic domain can vary considerably. Phylogenetic trees calculated from these alignments can be used to sub-classify kinases, as sequence divergence in catalytic domains of kinases is recognised to reflect variation of function and/or mode of regulation of protein kinases (Hanks et al., 1988; Hanks and Hunter, 1995). The boundaries of kinase catalytic domains, such as Pkinase (Pfam identifier PF00069) or

Pkinase_Tyr (Pfam identifier PF07714), are usually defined by HMMs. However, the sequences used to construct these two HMMs (n = 54 and n = 145, respectively) did not represent any lophotrochozoans, and thus, might not accurately represent the catalytic kinase domains of trematodes, which are clearly evolutionarily very distinct from those of Ecdysozoa and Deuterostomia (see Mallatt et al., 2012). In contrast to the alignment made using these Pfam HMMs, we obtained an improved alignment of homologous characters (with fewer gaps) by constructing an HMM from high-confidence kinase predictions for four trematode species.

Using the present bioinformatic workflow, we identified 269 full-length kinases that represent the kinome of S. haematobium. An assessment of transcription levels revealed transcription of 258 sequences, 214 (79.5%) of which were constitutively transcribed in all developmental stages/sexes studied, indicating essential roles for these kinases in signalling processes throughout the parasite’s life cycle. This statement is supported by the constitutive transcription of 83 of the 108 kinase genes (77%) assigned to the functional categories “signal transduction” and/or “cell communication”. In contrast, only 11 (10%) kinase genes assigned to these general categories had variable transcription profiles. Although a small number of kinase sequences identified (n = 16; < 6%) were not transcribed in either the egg or adult stage, they are likely to be transcribed in other developmental stages (including the miracidium, cercaria and/or schistosomulum) not investigated here. The validity of these sequences was supported by pairwise orthologs in S. mansoni that are transcribed in the cercarial and/or schistosomule stages (Protasio et al., 2012). 
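
The distinction drawn above between constitutively transcribed, variably transcribed and undetected kinase genes can be sketched as follows. This is an illustrative assumption rather than the procedure used here: the detection threshold of 1.0, the stage names, the category labels and the toy values are all hypothetical.

```python
def classify(profile, threshold=1.0):
    """Label a gene by the stages in which its transcription level reaches
    the (hypothetical) detection threshold."""
    detected = {stage for stage, level in profile.items() if level >= threshold}
    if not detected:
        return "not detected"
    if detected == set(profile):
        return "constitutive"
    return "variable"


# Toy transcription levels per stage/sex (made-up values and identifiers).
toy_profiles = {
    "toy_k1": {"egg": 12.0, "adult_male": 8.5, "adult_female": 9.1},
    "toy_k2": {"egg": 0.0, "adult_male": 4.2, "adult_female": 0.3},
    "toy_k3": {"egg": 0.0, "adult_male": 0.2, "adult_female": 0.0},
}

labels = {gene: classify(profile) for gene, profile in toy_profiles.items()}
print(labels)
```

A gene labelled "not detected" in this scheme is only absent from the stages sampled, which is why such sequences may still be transcribed in stages not investigated.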
Sex-specifically transcribed kinase genes were more frequently assigned to specialised functional categories; among them was the male-specifically transcribed testis-expressed gene 14 (tex14, Sh_Smp_131630.1), which we hypothesise is critical for chromosome segregation associated with mitosis and meiosis during spermatogenesis. This proposal is supported by findings in mice, showing that tex14 is highly expressed during spermatogenesis, and localises to intercellular bridges of germ cells, where it plays an integral role in the establishment and maintenance of male fertility (Wu et al., 2003; Greenbaum et al., 2006). Other evidence from a study of human cell lines shows that TEX14 is regulated by the kinase Plk-1 and is crucial for kinetochore-microtubule attachment during mitosis (Mondal et al., 2012). A second gene encoding a protein kinase R (PKR)-like endoplasmic reticulum kinase (PERK; A_03220) was transcribed exclusively in female and egg stages of S. haematobium. The human ortholog of this kinase phosphorylates the eukaryotic translation initiation factor 2α (eIF2α) and mediates the response to endoplasmic reticulum (ER) stress (represented by an accumulation of misfolded or unfolded proteins in the ER) which, among other factors, is induced by glucose deprivation (Xu et al., 2005; Badiola et al., 2011) and/or an excessive requirement for proteins (Oslowski and Urano, 2011). The transcription of this additional, stress-mitigating kinase in eggs and female worms might thus be a mechanism to cope with increased ER stress due to the energy- and protein-demanding processes of reproduction, which are sustained by glucose metabolism. This specific transcription might also relate to stress on female worms, induced by separating them from their male partner (on which they rely, in terms of nutrient supply, such as sugar uptake from the host; cf. Cornford and Fitzpatrick, 1985) prior to RNA-sequencing. A third kinase gene encoding a myotonic dystrophy protein kinase (A_05067) of the DMPK family was transcribed exclusively in the egg stage of S. haematobium. Since different muscle types are already established in the miracidium within the egg, and a transformation of these muscle structures takes place during metamorphosis from sporocysts to cercariae (Bahia et al., 2006), we propose that this kinase-encoding gene is specifically transcribed in the miracidium in the egg, and is involved in muscle development and/or locomotion/motility. Evidence from other organisms, such as D. melanogaster, shows that DMPKs are involved in establishing correct muscle morphology and functionality in third instar larvae (Picchio et al., 2013). This aspect warrants further exploration when RNA-sequencing data for the miracidium stage of S. haematobium become available.

Comparative analysis showed that the S. haematobium kinome contains all recognised eukaryotic kinase groups, including 79 of the 144 (55%) subfamilies found in other metazoans studied (Manning et al., 2002b; Manning, 2005). The S. 
haematobium kinome has approximately half of the 518 kinases found in humans (Manning et al., 2002a) and a number similar to that of the C. elegans kinome (n = 438) when the known specific expansions in this free-living nematode are excluded (Manning et al., 2002b; Manning, 2005). Nonetheless, we did not detect any members of 19 kinase families/subfamilies present in C. elegans, D. melanogaster or H. sapiens. The lack of evidence for kinases of these families/subfamilies, including RIO3 (which has been lost from numerous flatworms; cf. Breugelmans et al., 2015), suggests their absence from schistosomes or a substantial diversification of their sequences that precluded their identification. Since there are presently no curated kinomes for flatworms other than S. haematobium and S. mansoni, it is not known whether such kinase families or subfamilies have been lost from all lophotrochozoans or only from schistosomes during evolution. A preliminary exploration of the flatworms C. sinensis, O. viverrini and F. hepatica (data not shown) suggests that these families and subfamilies (except the PIKK family) are absent from lophotrochozoans. Future studies should focus on defining and curating the kinomes of a range of socioeconomically important parasitic flatworms and roundworms (nematodes), in order to undertake detailed comparative analyses, explore kinome evolution and investigate contractions and expansions of particular kinase groups in relation to worm phylogeny as well as biology.

The global comparison of the kinomes of S. haematobium and its close relative, S. mansoni, did not detect any major expansions or contractions in kinase groups, families or subfamilies, but did reveal two kinase genes of the CMGC group (ERK7 and DYRK2 subfamilies) that are present exclusively in the former species. Given the quality of the draft genome and transcriptome of S. mansoni, there is only a remote possibility that these two genes were not detected. It is more plausible that they are indeed uniquely present in S. haematobium and encode kinases that may relate indirectly to this pathogen’s unique biology and site predilection in the human host. Published evidence indicating that ERKs are involved in parasite-host interactions (Vicogne et al., 2004; Ressurreição et al., 2014) supports this hypothesis. Although very little is known about the function of the second S. haematobium-specific kinase (DYRK2), in human and murine cell lines, a DYRK homolog interacts with the MAPK kinase MKK3 (an upstream activator of p38), which is involved in a growth factor-mediated signalling pathway (Lim et al., 2002). The fact that both S. haematobium-specific kinases are part of receptor-activated signalling pathways suggests a role in pathogen-host interactions, as has been proposed previously for other receptor kinase pathways (Ahier et al., 2008; LoVerde et al., 2009). Despite this difference of two kinases, the comparison of the kinomes of S. haematobium and S. 
mansoni showed a relatively high level of conservation of kinase sequences. Although such conservation has been reported previously for small numbers of kinases (Swierczewski and Davies, 2010; You et al., 2010; Vanderstraete et al., 2013), here we report the first global comparison of these kinomes. The conservation between the kinomes of the two most medically important species of schistosomes is considered to provide opportunities for the repurposing of existing, safe drugs against both species (Dissous and Grevelding, 2011). Thus, we focused on 40 S. haematobium kinase genes with (relatively) conserved orthologs in S. mansoni and S. japonicum (not shown) as well as human, whose gene products are inferred to be essential and to bind drugs available for treating human diseases. A functional annotation of these 40 kinases showed that 37.5% (n = 15) were linked to human orthologs that are involved in cancer pathways, and a similar number of kinases (n =

14; 35%) were linked to roles in the immune system (Figure 2.5B). Based on these findings, we suggest that associated anti-cancer/anti-inflammatory compounds should now be assessed as to their ability to disrupt normal schistosome growth, development and/or viability in vitro. In this context, a recent study (Beckmann et al., 2014) has shown that blood components (such as serum albumin and α-1 acid glycoprotein) impede the deleterious effect of the drug imatinib on schistosomes in vitro, which should be considered in the experimental design of in vitro or in vivo experiments. A list of compounds (Table 2.3) revealed promising candidates for repurposing as schistosome kinase inhibitors. Many of these compounds have been predicted to target multiple kinases (“targeted polypharmacology”), a property that can increase the deleterious effect of a drug, thereby overcoming limited efficacy (due to redundancies in signalling pathways) associated with some single-targeted drugs (Morphy, 2010; Anighoro et al., 2014). Among the selected compounds were the anti-cancer drugs imatinib and dasatinib, the latter of which is assumed to target the Src/Fyn kinase SmTK5 in S. mansoni (see Beckmann et al., 2012). The orthologous kinase in S. haematobium (Sh-TK5) is one of the 40 prioritised targets in this study. Other selected targets of particular interest (Figure 2.5A) include a Syk kinase (Sh-TK4), four receptor kinases (Sh-IR1, Sh-IR2, Sh-FGFR-A and B_00871), two members of the AGC group (Sh-Akt and A_01385) and a GSK3 kinase (A_04108.1). These kinases have either already been computationally predicted as drug targets in S. mansoni, or there is some experimental evidence indicating that orthologs in S. 
mansoni are essential and/or can be inhibited in vitro (Ahier et al., 2008; Caffrey et al., 2009; Beckmann et al., 2010, 2012; You et al., 2010; Vanderstraete et al., 2013; Hahnel et al., 2014; Morel et al., 2014a, b; Ressurreição et al., 2014), which lends additional support to our predictions. Furthermore, we predicted 32 additional kinases as potential targets for which no experimental information is yet available for schistosomes, including a TTK kinase (Sh_Smp_171610.1) and an eIF2α kinase ortholog (A_00551). Sh_Smp_171610.1 is an ortholog of a human kinetochore kinase, also known as Mps1 (Monopolar spindle 1), which plays an essential role in the spindle assembly checkpoint (SAC) pathway (Malumbres and Barbacid, 2007). The prioritised eIF2α kinase ortholog is involved in mediating stress-response pathways, and several members of this kinase family are essential in Plasmodium falciparum (malaria parasite) (Goldberg et al., 2013b). Taken together, the high sequence similarity between schistosome kinases and the availability of kinase inhibitors for human orthologs offer considerable prospects for the development of new anti-schistosome drugs.

In addition to the conserved kinase complement, there is also considerable merit in exploring selective kinase targets, namely those that are specific to schistosomes but absent from the mammalian host. For instance, the two genes encoding VKRs are specific to schistosomes and other Protostomia (see Vicogne et al., 2003; Vanderstraete et al., 2013), but absent from humans. Some functional studies of S. mansoni have shown that the compound tyrphostin AG1024 kills schistosomula and adults in vitro (Vanderstraete et al., 2013, 2014) by targeting schistosome VKRs and IRs. Given the sequence conservation of VKRs and IRs between S. haematobium and S. mansoni (97.3% and 93.8% similarity, respectively), this compound is likely to also kill the former species. In the context of identifying further schistosome-specific targets, four pairs of unclassified schistosome kinases identified here (Appendix 2.1) were of interest, as they exhibited substantially lower sequence similarity to their human orthologs compared with S. mansoni orthologs. Three of these kinase-encoding genes were transcribed at varying levels in at least one of the sexes of the adult stage. We suggest that these results might assist in designing inhibitors for schistosomes, particularly if the premise is to target less conserved structural regions in a kinase outside of the conserved catalytic domain. This hypothesis warrants testing.

The curated set of kinases for S. haematobium as well as for its close relative, S. mansoni, might provide a stepping stone to fundamental studies of the biology of selected kinases in these worms. For instance, gene knockdown experiments by double-stranded RNA interference (RNAi) (Guidi et al., 2015) could be conducted on adult worms to validate the essentiality of subsets of kinases as drug targets in schistosomes. 
Combined with transcriptomic, proteomic and metabolomic investigations (Wang et al., 2010; Hong et al., 2013; Buro et al., 2014) of treated versus untreated schistosomes, such studies could provide insights into the biological (e.g., signalling) pathways affected in the schistosome and also verify the specific knockdown of kinase genes and gene products. Moreover, in a similar manner, chemical knockdown experiments could confirm the specificity of the predicted and prioritised ligands in vitro (Rojo-Arreola et al., 2014). Concordance between RNAi and chemical knockdown results would then provide confidence regarding the bioinformatic drug target/drug predictions made. Subsequently, compounds for which one or multiple targets have been validated and that have shown efficacy in vitro could then be investigated further in a ‘hit-to-lead’ phase. At this point, chemical analogs could be produced to optimise target selectivity and minimise side effects on the host organism. Selected chemicals with specific binding to a kinase target but with limited selectivity (e.g., because of activity in mammalian host cells) might still serve as probes (Knapp et al., 2013) to explore kinase biology in the

parasite.

In conclusion, we believe that the present bioinformatic investigation represents a step forward in the characterisation and curation of worm kinomes. The concordance in results between S. mansoni and S. haematobium (Figure 2.2; Appendix 2.1) as well as known lethal/adverse effects of some inhibitors against S. mansoni kinases (Ahier et al., 2008; Caffrey et al., 2009; Beckmann et al., 2010, 2012; You et al., 2010; Vanderstraete et al., 2013; Hahnel et al., 2014; Morel et al., 2014a, b; Ressurreição et al., 2014) suggest that some of our target and drug predictions are promising. However, we acknowledge that the prediction of drug targets and associated ligands represents a humble beginning to an often long and challenging route to validate new chemical entities (NCEs), to assess them in a preclinical context by absorption, distribution, metabolism, excretion and toxicity (ADMET) testing (Abdulla et al., 2009; Katz et al., 2013; Panic et al., 2015), and, via clinical trials (phases I-III; http://www.phrma.org/innovation/clinical-trials) (Ramamoorthi et al., 2015), to develop one or more safe, effective and specific anti-schistosomal drugs. We hope that our bioinformatic pipeline will assist, at least in part, at the very beginning of this long and expensive discovery and development process.

2.5 References

Abdulla MH, Ruelas DS, Wolff B, Snedecor J, Lim KC, Xu F, Renslo AR, Williams J, McKerrow JH, Caffrey CR, 2009. Drug discovery for schistosomiasis: hit and lead compounds identified in a library of known drugs by medium-throughput phenotypic screening. PLoS Negl. Trop. Dis. 3, e478.
Ahier A, Khayath N, Vicogne J, Dissous C, 2008. Insulin receptors and glucose uptake in the human parasite Schistosoma mansoni. Parasite 15, 573-579.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ, 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389-3402.
Andrade LF, Nahum LA, Avelar LG, Silva LL, Zerlotini A, Ruiz JC, Oliveira G, 2011. Eukaryotic protein kinases (ePKs) of the helminth parasite Schistosoma mansoni. BMC Genomics 12, 215.
Andrade LF, Mourao Mde M, Geraldo JA, Coelho FS, Silva LL, Neves RH, Volpini A, Machado-Silva JR, Araujo N, Nacif-Pimenta R, Caffrey CR, Oliveira G, 2014. Regulation of Schistosoma mansoni development and reproduction by the mitogen-activated protein kinase signaling pathway. PLoS Negl. Trop. Dis. 8, e2949.
Anighoro A, Bajorath J, Rastelli G, 2014. Polypharmacology: challenges and opportunities in drug discovery. J. Med. Chem. 57, 7874-7887.
Badiola N, Penas C, Minano-Molina A, Barneda-Zahonero B, Fado R, Sanchez-Opazo G, Comella JX, Sabria J, Zhu C, Blomgren K, Casas C, Rodriguez-Alvarez J, 2011. Induction of ER stress in response to oxygen-glucose deprivation of cortical cultures involves the activation of the PERK and IRE-1 pathways and of caspase-12. Cell Death Dis. 2, e149.
Bahia D, Avelar LG, Vigorosi F, Cioli D, Oliveira GC, Mortara RA, 2006. The distribution of motor proteins in the muscles and flame cells of the Schistosoma mansoni miracidium and primary sporocyst. Parasitology 133, 321-329.
Beckmann S, Buro C, Dissous C, Hirzmann J, Grevelding CG, 2010. The Syk kinase SmTK4 of Schistosoma mansoni is involved in the regulation of spermatogenesis and oogenesis. PLoS Pathog. 6, e1000769.
Beckmann S, Leutner S, Gouignard N, Dissous C, Grevelding CG, 2012. Protein kinases as potential targets for novel anti-schistosomal strategies. Curr. Pharm. Des. 18, 3579-3594.
Beckmann S, Long T, Scheld C, Geyer R, Caffrey CR, Grevelding CG, 2014. Serum albumin and α-1 acid glycoprotein impede the killing of Schistosoma mansoni by the tyrosine kinase inhibitor Imatinib. Int. J. Parasitol. Drugs Drug Resist. 4, 287-295.
Berriman M, Haas BJ, LoVerde PT, Wilson RA, Dillon GP, Cerqueira GC, Mashiyama ST, Al-Lazikani B, Andrade LF, Ashton PD, Aslett MA, Bartholomeu DC, Blandin G, Caffrey CR, Coghlan A, Coulson R, Day TA, Delcher A, DeMarco R, Djikeng A, Eyre T, Gamble JA, Ghedin E, Gu Y, Hertz-Fowler C, Hirai H, Hirai Y, Houston R, Ivens A, Johnston DA, Lacerda D, Macedo CD, McVeigh P, Ning Z, Oliveira G, Overington JP, Parkhill J, Pertea M, Pierce RJ, Protasio AV, Quail MA, Rajandream MA, Rogers J, Sajid M, Salzberg SL, Stanke M, Tivey AR, White O, Williams DL, Wortman J, Wu W, Zamanian M, Zerlotini A, Fraser-Liggett CM, Barrell BG, El-Sayed NM, 2009. The genome of the blood fluke Schistosoma mansoni. Nature 460, 352-358.
Bolger AM, Lohse M, Usadel B, 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114-2120.
Boutet E, Lieberherr D, Tognolli M, Schneider M, Bairoch A, 2007. UniProtKB/Swiss-Prot. Methods Mol. Biol. 406, 89-112.

Breugelmans B, Ansell BRE, Young ND, Amani P, Stroehlein AJ, Sternberg PW, Jex AR, Boag PR, Hofmann A, Gasser RB, 2015. Flatworms have lost the right open reading frame kinase 3 gene during evolution. Sci. Rep. 5, 9417.
Brindley PJ, Hotez PJ, 2013. Break out: urogenital schistosomiasis and Schistosoma haematobium infection in the post-genomic era. PLoS Negl. Trop. Dis. 7, e1961.
Burke ML, Jones MK, Gobert GN, Li YS, Ellis MK, McManus DP, 2009. Immunopathogenesis of human schistosomiasis. Parasite Immunol. 31, 163-176.
Buro C, Beckmann S, Oliveira KC, Dissous C, Cailliau K, Marhöfer RJ, Selzer PM, Verjovski-Almeida S, Grevelding CG, 2014. Imatinib treatment causes substantial transcriptional changes in adult Schistosoma mansoni in vitro exhibiting pleiotropic effects. PLoS Negl. Trop. Dis. 8, e2923.
Caffrey CR, Rohwer A, Oellien F, Marhöfer RJ, Braschi S, Oliveira G, McKerrow JH, Selzer PM, 2009. A comparative chemogenomics strategy to predict potential drug targets in the metazoan pathogen, Schistosoma mansoni. PLoS One 4, e4413.
Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL, 2009. BLAST+: architecture and applications. BMC Bioinformatics 10, 421.
Chai JY, 2013. Praziquantel treatment in trematode and cestode infections: an update. Infect. Chemother. 45, 32-43.
Cohen P, 2000. The regulation of protein function by multisite phosphorylation - a 25 year update. Trends Biochem. Sci. 25, 596-601.
Cohen P, 2002. Protein kinases - the major drug targets of the twenty-first century? Nat. Rev. Drug Discov. 1, 309-315.
Colley DG, Bustinduy AL, Secor WE, King CH, 2014. Human schistosomiasis. Lancet 383, 2253-2264.
Cornford EM, Fitzpatrick AM, 1985. The mechanism and rate of glucose transfer from male to female schistosomes. Mol. Biochem. Parasitol. 17, 131-141.
de Saram PS, Ressurreição M, Davies AJ, Rollinson D, Emery AM, Walker AJ, 2013. Functional mapping of protein kinase A reveals its importance in adult Schistosoma mansoni motor activity. PLoS Negl. Trop. Dis. 7, e1988.
Dissous C, Grevelding CG, 2011. Piggy-backing the concept of cancer drugs for schistosomiasis treatment: a tangible perspective? Trends Parasitol. 27, 59-66.
Dissous C, Vanderstraete M, Beckmann S, Gouignard N, Leutner S, Buro C, Grevelding CG, 2013. Receptor tyrosine kinase signaling and drug targeting in schistosomes. In: Doerig, C, Spaeth, G, Wiese, M (Eds.), Protein phosphorylation in parasites. Wiley-Blackwell, Hoboken, New Jersey, USA, pp. 337-356.
Doenhoff MJ, Cioli D, Utzinger J, 2008. Praziquantel: mechanisms of action, resistance and new derivatives for schistosomiasis. Curr. Opin. Infect. Dis. 21, 659-667.
Drysdale R, FlyBase Consortium, 2008. FlyBase: a database for the Drosophila research community. Methods Mol. Biol. 420, 45-59.
Edgar RC, 2004. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5, 113.
Eglen RM, Reisine T, 2009. The current status of drug discovery against the human kinome. Assay Drug Dev. Technol. 7, 22-43.
Eppig JT, Blake JA, Bult CJ, Kadin JA, Richardson JE, Mouse Genome Database Group, 2015. The Mouse Genome Database (MGD): facilitating mouse as a model for human biology and disease. Nucleic Acids Res. 43, D726-D736.
Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, Light Y, McGlinchey S, Michalovich D, Al-Lazikani B, Overington JP, 2012. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 40, D1100-D1107.

Goldberg JM, Griggs AD, Smith JL, Haas BJ, Wortman JR, Zeng Q, 2013a. Kinannote, a computer program to identify and classify members of the eukaryotic protein kinase superfamily. Bioinformatics 29, 2387-2394.
Goldberg DE, Zhang M, Nussenzweig V, 2013b. Plasmodium eIF2α kinases. In: Doerig, C, Spaeth, G, Wiese, M (Eds.), Protein phosphorylation in parasites. Wiley-Blackwell, Hoboken, New Jersey, USA, pp. 123-130.
Greenbaum MP, Yan W, Wu MH, Lin YN, Agno JE, Sharma M, Braun RE, Rajkovic A, Matzuk MM, 2006. TEX14 is essential for intercellular bridges and fertility in male mice. Proc. Natl. Acad. Sci. USA 103, 4982-4987.
Greenberg RM, 2013. New approaches for understanding mechanisms of drug resistance in schistosomes. Parasitology 140, 1534-1546.
Guidi A, Mansour NR, Paveley RA, Carruthers IM, Besnard J, Hopkins AL, Gilbert IH, Bickle QD, 2015. Application of RNAi to genomic drug target validation in schistosomes. PLoS Negl. Trop. Dis. 9, e0003801.
Hahnel S, Quack T, Parker-Manuel SJ, Lu Z, Vanderstraete M, Morel M, Dissous C, Cailliau K, Grevelding CG, 2014. Gonad RNA-specific qRT-PCR analyses identify genes with potential functions in schistosome reproduction such as SmFz1 and SmFGFRs. Front. Genet. 5, 170.
Hanks SK, Quinn AM, Hunter T, 1988. The protein kinase family: conserved features and deduced phylogeny of the catalytic domains. Science 241, 42-52.
Hanks SK, Hunter T, 1995. Protein kinases 6. The eukaryotic protein kinase superfamily: kinase (catalytic) domain structure and classification. FASEB J. 9, 576-596.
Hanks SK, 2003. Genomic analysis of the eukaryotic protein kinase superfamily: a perspective. Genome Biol. 4, 111.
Harris TW, Baran J, Bieri T, Cabunoc A, Chan J, Chen WJ, Davis P, Done J, Grove C, Howe K, Kishore R, Lee R, Li Y, Muller HM, Nakamura C, Ozersky P, Paulini M, Raciti D, Schindelman G, Tuli MA, Van Auken K, Wang D, Wang X, Williams G, Wong JD, Yook K, Schedl T, Hodgkin J, Berriman M, Kersey P, Spieth J, Stein L, Sternberg PW, 2014. WormBase 2014: new views of curated biology. Nucleic Acids Res. 42, D789-D793.
Hong Y, Sun A, Zhang M, Gao F, Han Y, Fu Z, Shi Y, Lin J, 2013. Proteomics analysis of differentially expressed proteins in schistosomula and adult worms of Schistosoma japonicum. Acta Trop. 126, 1-10.
Huang Y, Chen W, Wang X, Liu H, Chen Y, Guo L, Luo F, Sun J, Mao Q, Liang P, Xie Z, Zhou C, Tian Y, Lv X, Huang L, Zhou J, Hu Y, Li R, Zhang F, Lei H, Li W, Hu X, Liang C, Xu J, Li X, Yu X, 2013. The carcinogenic liver fluke, Clonorchis sinensis: new assembly, reannotation and analysis of the genome and characterization of tissue transcriptomes. PLoS One 8, e54732.
Jones P, Binns D, Chang HY, Fraser M, Li W, McAnulla C, McWilliam H, Maslen J, Mitchell A, Nuka G, Pesseat S, Quinn AF, Sangrador-Vegas A, Scheremetjew M, Yong SY, Lopez R, Hunter S, 2014. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236-1240.
Kanehisa M, Goto S, 2000. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27-30.
Kannan N, Taylor SS, Zhai Y, Venter JC, Manning G, 2007. Structural and functional diversity of the microbial kinome. PLoS Biol. 5, e17.
Kapp K, Schussler P, Kunz W, Grevelding CG, 2001. Identification, isolation and characterization of a Fyn-like tyrosine kinase from Schistosoma mansoni. Parasitology 122, 317-327.

Kapp K, Knobloch J, Schussler P, Sroka S, Lammers R, Kunz W, Grevelding CG, 2004. The Schistosoma mansoni Src kinase TK3 is expressed in the gonads and likely involved in cytoskeletal organization. Mol. Biochem. Parasitol. 138, 171-182.
Katoh K, Standley DM, 2013. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772-780.
Katz N, Couto FF, Araujo N, 2013. Imatinib activity on Schistosoma mansoni. Mem. Inst. Oswaldo Cruz 108, 850-853.
Kent WJ, 2002. BLAT - the BLAST-like alignment tool. Genome Res. 12, 656-664.
Kjetland EF, Ndhlovu PD, Gomo E, Mduluza T, Midzi N, Gwanzura L, Mason PR, Sandvik L, Friis H, Gundersen SG, 2006. Association between genital schistosomiasis and HIV in rural Zimbabwean women. AIDS 20, 593-600.
Knapp S, Arruda P, Blagg J, Burley S, Drewry DH, Edwards A, Fabbro D, Gillespie P, Gray NS, Kuster B, Lackey KE, Mazzafera P, Tomkinson NC, Willson TM, Workman P, Zuercher WJ, 2013. A public-private partnership to unlock the untargeted kinome. Nat. Chem. Biol. 9, 3-6.
Langmead B, Salzberg SL, 2012. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357-359.
Law V, Knox C, Djoumbou Y, Jewison T, Guo AC, Liu Y, Maciejewski A, Arndt D, Wilson M, Neveu V, Tang A, Gabriel G, Ly C, Adamjee S, Dame ZT, Han B, Zhou Y, Wishart DS, 2014. DrugBank 4.0: shedding new light on drug metabolism. Nucleic Acids Res. 42, D1091-D1097.
Li B, Dewey CN, 2011. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12, 323.
Li L, Stoeckert Jr. CJ, Roos DS, 2003. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178-2189.
Lim S, Jin K, Friedman E, 2002. Mirk protein kinase is activated by MKK3 and functions as a transcriptional activator of HNF1α. J. Biol. Chem. 277, 25040-25046.
Lipinski CA, 2004. Lead- and drug-like compounds: the rule-of-five revolution. Drug Discov. Today Technol. 1, 337-341.
LoVerde PT, Andrade LF, Oliveira G, 2009. Signal transduction regulates schistosome reproductive biology. Curr. Opin. Microbiol. 12, 422-428.
Mallatt J, Craig CW, Yoder MJ, 2012. Nearly complete rRNA genes from 371 Animalia: updated structure-based alignment and detailed phylogenetic analysis. Mol. Phylogenet. Evol. 64, 603-617.
Malumbres M, Barbacid M, 2007. Cell cycle kinases in cancer. Curr. Opin. Genet. Dev. 17, 60-65.
Manning G, Whyte DB, Martinez R, Hunter T, Sudarsanam S, 2002a. The protein kinase complement of the human genome. Science 298, 1912-1934.
Manning G, Plowman GD, Hunter T, Sudarsanam S, 2002b. Evolution of protein kinase signaling from yeast to man. Trends Biochem. Sci. 27, 514-520.
Manning G, 2005. Genomic overview of protein kinases. WormBook, ed. The C. elegans Research Community, WormBook, https://doi.org/10.1895/wormbook.1.60.1
Marchler-Bauer A, Zheng C, Chitsaz F, Derbyshire MK, Geer LY, Geer RC, Gonzales NR, Gwadz M, Hurwitz DI, Lanczycki CJ, Lu F, Lu S, Marchler GH, Song JS, Thanki N, Yamashita RA, Zhang D, Bryant SH, 2013. CDD: conserved domains and protein three-dimensional structure. Nucleic Acids Res. 41, D348-D352.
Martin DM, Miranda-Saavedra D, Barton GJ, 2009. Kinomer v. 1.0: a database of systematically classified eukaryotic protein kinases. Nucleic Acids Res. 37, D244-D250.

Mondal G, Ohashi A, Yang L, Rowley M, Couch FJ, 2012. Tex14, a Plk1-regulated protein, is required for kinetochore-microtubule attachment and regulation of the spindle assembly checkpoint. Mol. Cell 45, 680-695. Morel M, Vanderstraete M, Hahnel S, Grevelding CG, Dissous C, 2014a. Receptor tyrosine kinases and schistosome reproduction: new targets for chemotherapy. Front. Genet. 5, 238. Morel M, Vanderstraete M, Cailliau K, Lescuyer A, Lancelot J, Dissous C, 2014b. Compound library screening identified Akt/PKB kinase pathway inhibitors as potential key molecules for the development of new chemotherapeutics against schistosomiasis. Int. J. Parasitol. Drugs Drug Resist. 4, 256-266. Morgan JA, Dejong RJ, Snyder SD, Mkoji GM, Loker ES, 2001. Schistosoma mansoni and Biomphalaria: past history and future trends. Parasitology 123, 211-228. Morphy R, 2010. Selectively nonselective kinase inhibition: striking the right balance. J. Med. Chem. 53, 1413-1437. Oslowski CM, Urano F, 2011. Measuring ER stress and the unfolded protein response using mammalian tissue culture system. Methods Enzymol. 490, 71-92. Palumbo E, 2007. Association between schistosomiasis and cancer: a review. Infect. Dis. Clin. Pract. 15, 145-148. Panic G, Vargas M, Scandale I, Keiser J, 2015. Activity profile of an FDA-approved compound library against Schistosoma mansoni. PLoS Negl. Trop. Dis. 9, e0003962. Picchio L, Plantie E, Renaud Y, Poovthumkadavil P, Jagla K, 2013. Novel Drosophila model of myotonic dystrophy type 1: phenotypic characterization and genome-wide view of altered gene expression. Hum. Mol. Genet. 22, 2795-2810. Protasio AV, Tsai IJ, Babbage A, Nichol S, Hunt M, Aslett MA, De Silva N, Velarde GS, Anderson TJ, Clark RC, Davidson C, Dillon GP, Holroyd NE, LoVerde PT, Lloyd C, McQuillan J, Oliveira G, Otto TD, Parker-Manuel SJ, Quail MA, Wilson RA, Zerlotini A, Dunne DW, Berriman M, 2012. 
A systematically improved high quality genome and transcriptome of the human blood fluke Schistosoma mansoni. PLoS Negl. Trop. Dis. 6, e1455. Ramamoorthi R, Graef KM, Dent J, 2015. Repurposing pharma assets: an accelerated mechanism for strengthening the schistosomiasis drug development pipeline. Future Med. Chem. 7, 727-735. Ressurreição M, De Saram P, Kirk RS, Rollinson D, Emery AM, Page NM, Davies AJ, Walker AJ, 2014. Protein kinase C and extracellular signal-regulated kinase regulate movement, attachment, pairing and egg release in Schistosoma mansoni. PLoS Negl. Trop. Dis. 8, e2924. Rice P, Longden I, Bleasby A, 2000. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 16, 276-277. Rojo-Arreola L, Long T, Asarnow D, Suzuki BM, Singh R, Caffrey CR, 2014. Chemical and genetic validation of the statin drug target to treat the helminth disease, schistosomiasis. PLoS One 9, e87594. Rollinson D, Stothard JR, Southgate VR, 2001. Interactions between intermediate snail hosts of the genus Bulinus and schistosomes of the Schistosoma haematobium group. Parasitology 123, 245-260. Rollinson D, 2009. A wake up call for urinary schistosomiasis: reconciling research effort with public health importance. Parasitology 136, 1593-1610. Rollinson D, Knopp S, Levitz S, Stothard JR, Tchuem Tchuente LA, Garba A, Mohammed KA, Schur N, Person B, Colley DG, Utzinger J, 2013. Time to set the agenda for schistosomiasis elimination. Acta Trop. 128, 423-440.

Ronquist F, Teslenko M, van der Mark P, Ayres DL, Darling A, Hohna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP, 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst. Biol. 61, 539-542. Schistosoma japonicum Genome Sequencing and Functional Analysis Consortium, 2009. The Schistosoma japonicum genome reveals features of host-parasite interplay. Nature 460, 345-351. Slater GS, Birney E, 2005. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6, 31. Sonnhammer EL, Eddy SR, Durbin R, 1997. Pfam: a comprehensive database of protein domain families based on seed alignments. Proteins 28, 405-420. Swierczewski BE, Davies SJ, 2009. A schistosome cAMP-dependent protein kinase catalytic subunit is essential for parasite viability. PLoS Negl. Trop. Dis. 3, e505. Swierczewski BE, Davies SJ, 2010. Conservation of protein kinase A catalytic subunit sequences in the schistosome pathogens of humans. Exp. Parasitol. 125, 156-160. Takei Y, 2000. Structural and functional evolution of the natriuretic peptide system in vertebrates. Int. Rev. Cytol. 194, 1-66. Ubersax JA, Ferrell Jr. JE, 2007. Mechanisms of specificity in protein phosphorylation. Nat. Rev. Mol. Cell Biol. 8, 530-541. van der Werf MJ, de Vlas SJ, Brooker S, Looman CW, Nagelkerke NJ, Habbema JD, Engels D, 2003. Quantification of clinical morbidity associated with schistosome infection in sub-Saharan Africa. Acta Trop. 86, 125-139. Vanderstraete M, Gouignard N, Cailliau K, Morel M, Lancelot J, Bodart JF, Dissous C, 2013. Dual targeting of insulin and venus kinase receptors of Schistosoma mansoni for novel anti-schistosome therapy. PLoS Negl. Trop. Dis. 7, e2226. Vanderstraete M, Gouignard N, Cailliau K, Morel M, Hahnel S, Leutner S, Beckmann S, Grevelding CG, Dissous C, 2014. Venus kinase receptors control reproduction in the platyhelminth parasite Schistosoma mansoni. PLoS Pathog. 10, e1004138. 
Vicogne J, Pin JP, Lardans V, Capron M, Noel C, Dissous C, 2003. An unusual receptor tyrosine kinase of Schistosoma mansoni contains a Venus Flytrap module. Mol. Biochem. Parasitol. 126, 51-62. Vicogne J, Cailliau K, Tulasne D, Browaeys E, Yan YT, Fafeur V, Vilain JP, Legrand D, Trolet J, Dissous C, 2004. Conservation of epidermal growth factor receptor function in the human parasitic helminth Schistosoma mansoni. J. Biol. Chem. 279, 37407-37414. Wang Y, Li JV, Saric J, Keiser J, Wu J, Utzinger J, Holmes E, 2010. Advances in metabolic profiling of experimental nematode and trematode infections. Adv. Parasitol. 73, 373-404. World Health Organization, 2012. Research priorities for helminth infections: technical report of the TDR disease reference group on helminth infections. WHO technical report series. Wu MH, Rajkovic A, Burns KH, Yan W, Lin YN, Matzuk MM, 2003. Sequence and expression of testis-expressed gene 14 (Tex14): a gene encoding a protein kinase preferentially expressed during spermatogenesis. Gene Expr. Patterns 3, 231-236. Xu C, Bailly-Maitre B, Reed JC, 2005. Endoplasmic reticulum stress: cell life and death decisions. J. Clin. Invest. 115, 2656-2664. You H, Zhang W, Jones MK, Gobert GN, Mulvenna J, Rees G, Spanevello M, Blair D, Duke M, Brehm K, McManus DP, 2010. Cloning and characterisation of Schistosoma japonicum insulin receptors. PLoS One 5, e9868.

Young ND, Hall RS, Jex AR, Cantacessi C, Gasser RB, 2010. Elucidating the transcriptome of Fasciola hepatica - a key to fundamental and biotechnological discoveries for a neglected parasite. Biotechnol. Adv. 28, 222-231. Young ND, Jex AR, Li B, Liu S, Yang L, Xiong Z, Li Y, Cantacessi C, Hall RS, Xu X, Chen F, Wu X, Zerlotini A, Oliveira G, Hofmann A, Zhang G, Fang X, Kang Y, Campbell BE, Loukas A, Ranganathan S, Rollinson D, Rinaldi G, Brindley PJ, Yang H, Wang J, Wang J, Gasser RB, 2012. Whole-genome sequence of Schistosoma haematobium. Nat. Genet. 44, 221-225. Young ND, Nagarajan N, Lin SJ, Korhonen PK, Jex AR, Hall RS, Safavi-Hemami H, Kaewkong W, Bertrand D, Gao S, Seet Q, Wongkham S, Teh BT, Wongkham C, Intapan PM, Maleewong W, Yang X, Hu M, Wang Z, Hofmann A, Sternberg PW, Tan P, Wang J, Gasser RB, 2014. The Opisthorchis viverrini genome provides insights into life in the bile duct. Nat. Commun. 5, 4378.

Table 2.1 Pairwise comparisons of Schistosoma haematobium (Sh) kinase sequences with orthologs in Schistosoma mansoni (Sm) and human. Average length ratios, identity and similarity values are indicated. Amino acid sequence conservation between S. haematobium and S. mansoni was observed for all kinase groups. Predicted sequences had very similar lengths. The comparison with human (Homo sapiens) homologs showed moderate to low identities and similarities. Average and standard deviation (S.D.) values were calculated based on the number of predicted S. haematobium sequences in each kinase group.

Kinase        Length ratio  S. mansoni     S. mansoni     H. sapiens     H. sapiens
group(a)      Sh/Sm         % identity     % similarity   % identity     % similarity
              (S.D.)        (S.D.)         (S.D.)         (S.D.)         (S.D.)
CMGC          1.00 (0.08)   91.87 (6.10)   94.42 (4.96)   61.04 (8.93)   75.91 (7.15)
CAMK          1.00 (0.08)   87.89 (8.48)   91.48 (6.55)   54.57 (13.91)  71.31 (10.79)
AGC           0.99 (0.19)   91.20 (5.91)   93.78 (4.94)   52.40 (13.22)  68.66 (11.15)
Other         0.98 (0.17)   87.18 (8.46)   90.28 (7.77)   43.54 (12.06)  62.11 (10.87)
TK            0.99 (0.11)   87.66 (8.18)   91.12 (6.92)   45.74 (6.50)   63.68 (6.46)
STE           1.04 (0.22)   90.75 (6.89)   93.21 (6.22)   56.94 (12.14)  72.62 (10.73)
TKL           1.01 (0.08)   88.59 (6.33)   92.01 (5.67)   45.29 (9.95)   63.58 (8.68)
CK1           0.98 (0.14)   85.05 (8.21)   87.85 (7.94)   62.61 (11.20)  76.30 (8.58)
RGC           1.02 (0.19)   85.30 (6.00)   87.53 (6.86)   50.43 (5.36)   66.70 (5.38)
PKL           1.07 (0.12)   88.45 (3.92)   91.53 (4.01)   43.55 (6.34)   60.90 (5.82)
Unclassified  1.06 (0.11)   82.50 (14.84)  88.92 (9.30)   35.12 (7.76)   58.10 (3.93)

(a) CMGC, cyclin-dependent kinases (CDKs)/mitogen-activated protein kinases (MAPKs)/glycogen synthase kinases (GSKs)/CDK-like kinases; CAMK, Ca2+/calmodulin-dependent kinases; AGC, nucleoside-regulated kinases; TK, tyrosine kinases; STE, MAPK cascade kinases; TKL, tyrosine kinase-like kinases; CK1, casein kinase 1 kinases; RGC, receptor guanylate cyclases; PKL, protein kinase-like kinases.
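The per-group averages in Table 2.1 can be illustrated with a minimal sketch. It assumes that pairwise global alignments are already available as gapped strings of equal length, and it reports identity as a percentage of the full alignment length, which is the convention of global aligners such as EMBOSS Needle. The function names and toy sequences are hypothetical, the use of the sample S.D. is an assumption, and similarity is omitted because it requires a substitution matrix.

```python
from statistics import mean, stdev


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical aligned residues as a percentage of the alignment length."""
    if len(aln_a) != len(aln_b) or not aln_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(aln_a, aln_b) if a == b and a != "-")
    return 100.0 * matches / len(aln_a)


def length_ratio(aln_a: str, aln_b: str) -> float:
    """Ratio of ungapped sequence lengths (first sequence / second sequence)."""
    return len(aln_a.replace("-", "")) / len(aln_b.replace("-", ""))


def summarise(values):
    """Return (mean, sample S.D.); the S.D. is 0.0 for a single value."""
    return mean(values), (stdev(values) if len(values) > 1 else 0.0)


# Toy alignments for one kinase group (hypothetical, not thesis data).
pairs = [("MKV-LT", "MKVALT"), ("MKVLLT", "MRVLIT")]
identities = [percent_identity(a, b) for a, b in pairs]
ratios = [length_ratio(a, b) for a, b in pairs]
avg_identity, sd_identity = summarise(identities)
```

Applied to every ortholog pair in a group, `summarise` yields one row of averages and S.D. values of the kind shown in the table.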

Table 2.2 Kinase families and subfamilies absent from the kinomes of Schistosoma haematobium and Schistosoma mansoni (Lophotrochozoa; Protostomia). Members of these families and subfamilies are found in both Ecdysozoa (Protostomia) and Deuterostomia.

Name                                              Kinase classification
Novel (Nua) kinase family                         CAMK/CAMKL/NUAK
MAPK-integrating or -interacting kinase           CAMK/MAPKAPK/MNK
Testis-specific serine/threonine kinase           CAMK/TSSK
RSK-like kinase                                   AGC/RSKL
Mitogen- and stress-activated protein kinase      AGC/RSK/MSK
RSK-related kinase                                AGC/RSKR
Yet another novel kinase                          AGC/YANK
Budding uninhibited by benzimidazoles kinase      OTHER/BUB
New kinase family 1                               OTHER/NKF1
Anaplastic lymphoma kinase                        TK/ALK
Discoidin domain receptor kinase                  TK/DDR
IL1 receptor-associated kinase                    TKL/IRAK
Serine/threonine-like kinase                      STE/STE20/STLK
Eukaryotic elongation factor 2 kinase             ATYPICAL/ALPHA/EEF2K
Bromodomain-containing kinases                    ATYPICAL/BRD
Pyruvate dehydrogenase kinase                     ATYPICAL/PDHK
Phosphatidylinositol 3 kinase-related kinase      ATYPICAL/PIKK
Right open reading frame kinase 3                 ATYPICAL/RIO/RIO3
TATA-binding protein-associated factor 1          ATYPICAL/TAF1

Table 2.3 List of prioritised chemical compounds as drug candidates against Schistosoma haematobium. For each compound, the number of target kinases, its indicated therapeutic use(s) and the status of FDA approval for use in humans are given (A: approved; I, II or III: phase of clinical trial). Additional information and chemical structures are given in Appendices 2.5-2.7.

Name or code               Number of        Indicated for treatment of                                   Status of
of compound                target kinases                                                                approval
AG-13736 (axitinib)        1                Cancer                                                       A
Dasatinib                  32               Cancer                                                       A
Pazopanib                  30               Cancer                                                       A
Erlotinib                  30               Cancer                                                       A
Imatinib                   30               Cancer                                                       A
Gefitinib                  30               Cancer                                                       A
Sorafenib                  32               Cancer                                                       A
Sunitinib                  32               Cancer                                                       A
Vandetanib                 30               Cancer                                                       A
CP-690550 (tofacitinib)    30               Rheumatoid arthritis, psoriasis, inflammatory bowel disease  A
Bosutinib                  4                Cancer                                                       A
Cabozantinib               1                Cancer                                                       A
Ingenol mebutate           1                Cancer, actinic keratosis                                    A
Ponatinib                  3                Cancer                                                       A
Regorafenib                2                Cancer                                                       A
Trametinib                 2                Cancer                                                       A
Lithium carbonate          1                Bipolar disorder                                             A
ABT-869 (linifanib)        31               Cancer                                                       III
Vatalanib                  30               Cancer                                                       III
AMG-706 (motesanib)        30               Cancer                                                       III
PD-184352                  4                Cancer                                                       II
PHA-739358 (danusertib)    6                Cancer                                                       II
Seliciclib                 30               Cancer                                                       II
SNS-032                    30               Cancer                                                       I
Fasudil                    4                Cerebral vasospasm, pulmonary hypertension                   II
Ruboxistaurin              33               Diabetic retinopathy                                         III
CHEMBL1173486              2                Unknown                                                      N/A
CHEMBL1230122              1                Unknown                                                      N/A
CHEMBL150504               1                Unknown                                                      N/A
AT7519                     1                Cancer                                                       II
AZD2171 (cediranib)        1                Cancer                                                       III
CYC116                     1                Cancer                                                       I
Ellagic acid               2                Cancer                                                       N/A
XL228                      1                Cancer                                                       I
XL518 (cobimetinib)        2                Cancer                                                       III
XL820                      1                Cancer                                                       II
XL844                      2                Cancer                                                       I
XL880 (foretinib)          1                Cancer                                                       II
XL999                      2                Cancer                                                       II
CEP-1347                   1                Asthma, Parkinson's disease                                  III
KC706                      1                Rheumatoid arthritis, psoriasis, inflammatory bowel disease  II
TG100801                   1                Macular degeneration, diabetic retinopathy                   II

[Figure 2.1: flow chart of the six-step pipeline]

1. Initial kinase prediction and classification (Kinannote). Input: S. haematobium proteome.
2. Addition of orthologous sequences (OrthoMCL). Input: S. mansoni proteome and kinome.
3. Complementation of incomplete homologous sequences in the heterologous genome (Exonerate, BLAT).
4. Identification of kinase domains using trematode-specific HMM for each kinase group (HMMER).
5. Alignment of catalytic domains and construction of phylogenetic trees for each kinase group (MAFFT, MUSCLE, MrBayes).
6. Functional annotation (SwissProt, KEGG, InterProScan).

Figure 2.1 Bioinformatic pipeline used to characterise and curate kinases in Schistosoma haematobium. In step 1, we predicted and classified kinases in S. haematobium. In steps 2-3, additional sequences were identified employing the proteome and kinome inferred from the Schistosoma mansoni genome, and incomplete or missing sequences were complemented using orthologous full-length sequences, which resulted in the final set of predicted kinase sequences. In steps 4 and 5, the catalytic domains in the kinase sequences were identified using trematode-specific hidden Markov models (HMMs) for individual kinase groups, and then aligned (according to group) for subsequent phylogenetic analysis. In step 6, all kinases identified were functionally annotated employing Swiss-Prot, KEGG and InterProScan databases.

[Figure 2.2: phylogenetic trees of the ePK groups (AGC, CAMK, CK1, CMGC, Other, RGC, STE, TK and TKL), labelled with kinase families and subfamilies. Legend: not fully classified; no ortholog (in tree); experimentally studied; no unique cluster. Scale bar: 0.5.]

Figure 2.2 Phylogenetic analysis of eukaryotic protein kinases (ePKs) of Schistosoma haematobium and Schistosoma mansoni. Following the alignment of amino acid sequences representing individual kinase groups, phylogenetic trees were constructed. High-resolution figures of individual trees including nodal support values and sequence identifiers are given in Appendix 2.2.

[Figure 2.3: paired bar charts; axis: percentage of transcripts; bars for male, female and egg. Values in parentheses are (male; female; egg).]

Functional category                     A: all transcribed       B: highly transcribed
                                        kinase gene sequences    kinase gene sequences (top 10%)
Transcription                           (2; 2; 2)                (0; 1; 0)
Replication and repair                  (2; 2; 2)                (0; 1; 0)
Translation                             (3; 3; 3)                (1; 3; 3)
Nucleotide metabolism                   (3; 1; 2)                (0; 0; 0)
Signalling molecules and interaction    (8; 8; 5)                (0; 1; 0)
Folding, sorting and degradation        (8; 9; 9)                (1; 1; 0)
Excretory system                        (12; 10; 11)             (1; 1; 1)
Sensory system                          (15; 12; 15)             (3; 0; 2)
Digestive system                        (16; 13; 16)             (5; 1; 3)
Transport and catabolism                (18; 18; 16)             (1; 2; 2)
Circulatory system                      (20; 16; 18)             (2; 0; 1)
Environmental adaptation                (23; 21; 21)             (5; 2; 6)
Cell motility                           (25; 24; 21)             (3; 1; 0)
Cell growth and death                   (31; 30; 29)             (4; 5; 9)
Development                             (35; 35; 29)             (6; 4; 3)
Nervous system                          (41; 39; 39)             (9; 5; 7)
Endocrine system                        (45; 43; 42)             (10; 12; 10)
Immune system                           (46; 44; 42)             (8; 6; 4)
Cell communication                      (53; 50; 48)             (8; 7; 8)
Signal transduction                     (91; 88; 84)             (22; 18; 18)

Figure 2.3 Functional annotation and levels of transcription of Schistosoma haematobium kinase genes. (A) All kinase genes transcribed in different sexes/developmental stages (male, female and egg). (B) Top 10% of transcribed kinase genes. Proteins inferred from these transcripts were associated with biochemical pathways. Numbers of inferred sequences in the respective functional category are indicated in parentheses for each sex/developmental stage.

Figure 2.4 Venn diagram representing the number of kinase genes selectively transcribed in the three developmental stages of Schistosoma haematobium studied. A total of 214 kinase genes were constitutively transcribed in all three developmental stages. Of the 274 coding regions, 16 were not transcribed. Kinase families/subfamilies assigned to transcribed kinase genes are indicated (boxed).
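The regions of a three-set Venn diagram such as the one in Figure 2.4 can be derived with plain set operations. The sketch below is illustrative only: the function name and gene identifiers are hypothetical, and each input set is assumed to hold the kinase genes called as transcribed in one stage.

```python
def venn_regions(male: set, female: set, egg: set) -> dict:
    """Partition transcribed genes into the seven regions of a three-set Venn diagram."""
    return {
        "all_three": male & female & egg,
        "male_only": male - female - egg,
        "female_only": female - male - egg,
        "egg_only": egg - male - female,
        "male_female": (male & female) - egg,
        "male_egg": (male & egg) - female,
        "female_egg": (female & egg) - male,
    }


# Toy gene identifiers (hypothetical, not thesis data).
male = {"k1", "k2", "k3"}
female = {"k1", "k3", "k4"}
egg = {"k1", "k5"}

regions = venn_regions(male, female, egg)
counts = {name: len(genes) for name, genes in regions.items()}
```

The region sizes sum to the number of genes transcribed in at least one stage, so subtracting that sum from the total number of coding regions gives the number of genes that were not transcribed.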

[Figure 2.5: two pie charts. (A) Kinase groups of the prioritised targets (CMGC, CAMK, AGC, TK, TKL, STE, CK1 and Other), with A_04108.1 (GSK), ShFGFR-A, ShAkt, ShIR-1, ShIR-2, ShTK4 and ShTK5 indicated. (B) Pathway categories: transport and catabolism; cell motility; folding, sorting and degradation; cancers; immune system; cell growth and death; development; cell communication; nervous system; endocrine system; other.]

Figure 2.5 Kinases prioritised as targets in Schistosoma haematobium and associated pathways. (A) Numbers of predicted targets in individual kinase groups. Kinases that have already been investigated or prioritised in Schistosoma mansoni are indicated. (B) Pathway associations of prioritised targets.

CHAPTER 3

The Haemonchus contortus kinome - a resource for fundamental molecular investigations and drug discovery

Abstract

Protein kinases regulate a plethora of essential signalling and other biological pathways in all eukaryotic organisms, but very little is known about them in most parasitic nematodes. Here, we defined, for the first time, the entire complement of protein kinases (kinome) encoded in the barber’s pole worm (Haemonchus contortus) through an integrated analysis of transcriptomic and genomic data sets using an advanced bioinformatic workflow. We identified, curated and classified 432 kinases representing ten groups, 103 distinct families and 98 subfamilies. A comparison of the kinomes of H. contortus and Caenorhabditis elegans (a related, free-living nematode) revealed considerable variation in the numbers of casein kinases, tyrosine kinases and Ca2+/calmodulin-dependent protein kinases, which likely relates to differences in biology, habitat and life cycle between these worms. Moreover, a suite of kinase genes was selectively transcribed in particular developmental stages of H. contortus, indicating central roles in developmental and reproductive processes. In addition, using a ranking system, drug targets (n = 13) and associated small-molecule effectors (n = 1517) were inferred. The H. contortus kinome will provide a useful resource for fundamental investigations of kinases and signalling pathways in this nematode, and should assist future anthelmintic discovery efforts; this is particularly important, given current drug resistance problems in parasitic nematodes.

3.1 Introduction

The decoding of the genome sequence of the free-living nematode, Caenorhabditis elegans, in 1998 (C. elegans Sequencing Consortium, 1998) marked the dawn of the molecular sciences (‘-omics’) of multicellular (metazoan) organisms, and the genomes of the vinegar fly and human rapidly ensued in 2000 and 2001, respectively (Adams et al., 2000; Venter et al., 2001). The advent and application of second-generation (short-read) nucleic acid sequencing technology a decade ago (van Dijk et al., 2014) then led to a sudden and exponential increase in the amount of genomic and transcriptomic data for metazoans, including draft genomes and transcriptomes for numerous parasitic worms (cestodes, trematodes and nematodes) (e.g., Berriman et al., 2009; Schistosoma japonicum Genome Sequencing and Functional Analysis Consortium, 2009; Jex et al., 2011, 2014; Young et al., 2012, 2014; Laing et al., 2013; Schwarz et al., 2013, 2015; Tsai et al., 2013; Tang et al., 2014; Zhu et al., 2015). However, the bioinformatic ‘bottleneck’ (Funari and Canosa, 2014) has substantially slowed the processing, annotation and curation of these digital data sets, limiting their conversion into biologically meaningful information and biotechnological outcomes (e.g., drugs and vaccines). Thus, there is a need to establish and use improved and faster bioinformatic approaches.

Gaining deep insights into molecular pathways of socioeconomically important parasitic nematodes has major implications for developing new interventions against the diseases that they cause in humans, animals or plants (e.g., Opperman et al., 2008; Jex et al., 2011; Shanmugam et al., 2012; Schwarz et al., 2013; Cantacessi et al., 2015), because it should be possible to define targets in these pathways for the design of new anthelmintics. This aspect is of pivotal importance for two reasons. First, often only a limited panel of anthelmintic compound classes is available and used for the treatment of disease/infection, with some having a narrow spectrum of activity. Second, and importantly, drug resistance, particularly in gastrointestinal nematodes of animals, has become a major scourge and economic burden to livestock producers (Gilleard, 2006; Kaplan and Vidyashankar, 2012).

The recent characterisation of the draft genomes and transcriptomes of the barber’s pole worm (Haemonchus contortus) (see Laing et al., 2013; Schwarz et al., 2013), one of the most pathogenic nematodes of small ruminants (e.g., sheep and goats) worldwide (Sutherland and Scott, 2009), provides, for the first time, a solid foundation for detailed explorations of molecular pathways amenable to drug target discovery in a nematode that represents many species of a large order (Strongylida) of socioeconomically important pathogens. In addition, the relatively close relationship of H. contortus to C. elegans (see Gilleard, 2004; Mitreva et al., 2005), now arguably the best characterised metazoan organism (Harris et al., 2014), enables direct and detailed comparative analyses of such pathways.

Of particular significance in this context are signalling pathways, because of their crucial roles in a plethora of developmental and physiological processes. Many such pathways are regulated by protein kinases, which are enzymes (transferases) that phosphorylate a substrate by transferring a phosphoryl group from an energy-rich molecule, such as adenosine triphosphate (ATP), to a target protein (Endicott et al., 2012). These kinases are classified into key groups (n = 9), families and subfamilies, based on sequence similarity in their catalytic domains and the presence of accessory domains (Hanks and Hunter, 1995; Hanks, 2003; Manning, 2005). Although there is scant functional information on protein kinases for parasitic nematodes on a genome-wide level, the kinome (i.e. the complement of kinases encoded in the genome) of C. elegans is very well characterised and has been functionally investigated (Plowman et al., 1999; Manning, 2005; Lehmann et al., 2013; Harris et al., 2014), which provides an ideal starting point for exploring the kinome of H. contortus and related nematodes of the order Strongylida.

To this end, the aims of the present study were to: (i) predict and curate the full complement of kinases in H. contortus; (ii) define transcription levels for kinase genes in all key developmental stages of this parasite; and (iii) prioritise a panel of kinases as drug target candidates, as well as predict chemicals that might bind to these targets, using a practical and effective bioinformatic workflow. Finally, the results of this investigation are discussed in the context of nematode biology and drug discovery.

3.2 Methods

3.2.1 Defining the H. contortus kinome

We used all published transcriptomic and genomic data for H. contortus (see Laing et al., 2013; Schwarz et al., 2013) to define the kinome via eight steps (1-8):

1. We used the program getorf (within the EMBOSS package v.6.4.0.0) (Rice et al., 2000) to identify the open reading frames (ORFs) for all 167,013 transcripts from assembled transcriptomes (Schwarz et al., 2013), and retained all non-overlapping ORFs from all six frames with a length of > 100 nucleotides (nt).

2. Using amino acid sequences predicted from these ORFs, we used the program InterProScan v.5.7-48.0 (Jones et al., 2014) to infer protein domains, families and superfamilies using Pfam v.27.0 (Sonnhammer et al., 1997), PANTHER v.9.0 (Mi et al., 2013) and SUPERFAMILY v.1.75 (Gough et al., 2001), respectively. Based on this information, we then identified transcripts encoding complete or partial kinase sequences.

3. We used the assembly program CAP3 (90% sequence identity over the alignment) (Huang and Madan, 1999) to splice associated transcripts. Then, we used the program CD-HIT-EST v.4.6 (Fu et al., 2012) to cluster transcripts and remove redundancy, employing a sequence identity threshold of ≥ 90%.

4. We used the program BLAT v.34x12 (Kent, 2002) to map the transcripts to published genomic sequences for H. contortus and, employing pslReps (within BLAT), retained the best-aligned matches.

5. We employed the Integrative Genomics Viewer (IGV) (Thorvaldsdóttir et al., 2013) to display these mapping results, which enabled us to manually curate transcripts and also verify that they were full-length. This process also allowed us (in > 99% of cases) to assign individual full-length transcripts to gene loci on genomic scaffolds.

6. We scrutinised the published H. contortus gene set (Schwarz et al., 2013) and cross-validated the kinase sequences, inferred based on the transcriptome, to identify additional kinases not represented as full-length transcripts in the transcriptomic assembly.

7. We employed the program Kinannote (Goldberg et al., 2013) to classify identified protein kinase sequences. If kinases could not be classified using this approach, we used PSI-BLAST v.2.2.26+ (Altschul et al., 1997), employing an E-value cut-off of 1e-5, to match H. contortus kinases to published C. elegans kinase sequences (Manning, 2005) and inferred classifications based on the best match. Furthermore, we employed Pfam, PANTHER and SUPERFAMILY annotations using InterProScan v.5.7-48.0 to describe kinases that did not have a C. elegans homolog.

8. We used the program EMBOSS Needle v.6.3.1 (Rice et al., 2000) to determine pairwise global amino acid sequence identities, similarities and the ratio of aligned positions versus gaps between H. contortus kinases and their closest homologs (based on PSI-BLAST) in C. elegans (KinBase, http://kinase.com/kinbase/FastaFiles/Nematode_worm_kinase_protein.fasta), sheep (Kyoto Encyclopedia of Genes and Genomes (KEGG), http://www.genome.jp/kegg-bin/get_htext?oas01001.keg), Dictyocaulus filaria (see Mangiola et al., 2014) and Teladorsagia circumcincta (WormBase ParaSite; PRJNA72569; WBPS3, http://parasite.wormbase.org/).
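The idea behind step 1 (six-frame ORF detection with a minimum length) can be sketched as follows. This is a simplified stand-in and not the getorf algorithm itself: it assumes that an ORF is any run of codons between stop codons, treats the length cut-off as strictly greater than `min_nt`, and reports reverse-strand coordinates relative to the reverse-complemented sequence. All function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def orfs_in_frame(seq: str, offset: int, min_nt: int):
    """Yield (start, end) for stop-free codon runs longer than min_nt in one frame."""
    start = offset
    for pos in range(offset, len(seq) - 2, 3):
        if seq[pos:pos + 3] in STOPS:
            if pos - start > min_nt:
                yield (start, pos)
            start = pos + 3
    # Handle the run that extends to the last complete codon of the frame.
    end = offset + 3 * ((len(seq) - offset) // 3)
    if end - start > min_nt:
        yield (start, end)


def six_frame_orfs(seq: str, min_nt: int = 100):
    """Collect (strand, frame, start, end) for ORFs in all six reading frames."""
    seq = seq.upper()
    found = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for start, end in orfs_in_frame(strand_seq, frame, min_nt):
                found.append((strand, frame, start, end))
    return found


# Toy transcript (hypothetical): a 123-nt stop-free run followed by a stop codon.
toy = "ATG" + "GCT" * 40 + "TAA"
toy_orfs = six_frame_orfs(toy, min_nt=100)
```

Within a single frame, runs between stop codons cannot overlap, which is one simple way to obtain non-overlapping ORFs per frame.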

3.2.2 Transcription analysis

We used publicly available RNA-Seq data (Schwarz et al., 2013) to assess transcription of kinase genes in all key developmental stages (i.e. egg, L1, L2, L3, L4 male, L4 female, adult male and adult female) of H. contortus. First, we used the program Trimmomatic (Bolger et al., 2014) (employing the parameters phred64, ILLUMINACLIP:illuminaClipping.fa:2:40:20, LEADING:3, TRAILING:3, SLIDINGWINDOW:4:20, MINLEN:40) to filter reads for quality. Then, we used Bowtie v.2.1.0 (Langmead and Salzberg, 2012) to align the reads to nucleotide sequences encoding kinases, and calculated levels of transcription (transcripts per million, TPM) within the software package RSEM v.1.2.11 (Li and Dewey, 2011). We considered kinase genes to be transcribed if at least five read pairs mapped to their coding regions. Transcripts with similar transcription profiles were clustered based on a Euclidean distance matrix using the Ward clustering method (squaring dissimilarities before cluster updating).

The number of clusters (k) was selected using the rule of thumb $k \approx \sqrt{n/2}$, where n is the number of data points (here, transcripts) being clustered (Mardia et al., 1979). Trend lines were calculated using the Lowess regression method (Cleveland, 1979). Clusters were manually selected for further assessment based on cluster size, visibility of trends and amount of variation (i.e. average S.D. of TPM values) in individual stages within the clusters.
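The quantities used in this section can be illustrated with a short sketch. It is not the RSEM procedure, which estimates abundances with an expectation-maximisation model and effective lengths; here TPM is computed directly from toy counts and lengths. The rule of thumb is taken in its commonly cited form, k ≈ sqrt(n/2), and rounding to the nearest integer is an assumption. All names and numbers are hypothetical.

```python
import math


def tpm(counts: dict, lengths: dict) -> dict:
    """Transcripts per million from read counts and transcript lengths (nt)."""
    rates = {gene: counts[gene] / lengths[gene] for gene in counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: 1e6 * rate / total for gene, rate in rates.items()}


def is_transcribed(read_pairs: int, min_pairs: int = 5) -> bool:
    """Call a gene transcribed if at least min_pairs read pairs map to it."""
    return read_pairs >= min_pairs


def n_clusters(n: int) -> int:
    """Rule-of-thumb number of clusters, k ~ sqrt(n / 2), rounded (at least 1)."""
    return max(1, round(math.sqrt(n / 2)))


# Toy data (hypothetical, not thesis data).
counts = {"kinA": 100, "kinB": 300, "kinC": 4}
lengths = {"kinA": 1000, "kinB": 3000, "kinC": 2000}

tpm_values = tpm(counts, lengths)
transcribed = {gene for gene, c in counts.items() if is_transcribed(c)}
k = n_clusters(200)
```

In this toy case, kinA and kinB receive the same TPM because their counts scale with their lengths, and kinC falls below the five-read-pair threshold.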

3.2.3 Drug target prediction and prioritisation

Druggable kinases of H. contortus were predicted and prioritised, using a ranking approach, in six steps (i-vi):

(i) We excluded all kinase genes that were not transcribed in at least one of the parasitic stages of H. contortus (i.e. L4 and adult).

(ii) For all remaining genes, essentiality was inferred by selecting H. contortus kinase sequences homologous (protein BLAST; E-value of ≤ 1e-5) to C. elegans kinases with a lethal phenotype upon gene perturbation (RNAi) listed in WormBase (Harris et al., 2014); such kinases were rewarded with a point.

(iii) Kinases were given one additional point if they were associated with a unique KEGG orthologous gene (KO) term within a KEGG pathway, and another point if they had a unique group/family/subfamily classification.

(iv) An amino acid sequence similarity of > 80% (> 50% coverage) to a C. elegans homolog was given one point and, to reward low sequence similarity to host (Ovis aries) kinases, we gave one point to all H. contortus kinase sequences that had similarity values within the lower 75% quantile (i.e. ≤ 41.45% sequence similarity) to their closest sheep homolog.

(v) An additional point was given to kinases that had one or more close orthologs (> 80% sequence similarity; > 50% coverage) in two parasitic (strongylid) nematodes of importance in sheep (D. filaria and Te. circumcincta).

(vi) All H. contortus kinase sequences were then matched to homologous kinase sequences in the databases Kinase SARfari (Gaulton et al., 2012) and DrugBank v.4.3 (Law et al., 2014) using PSI-BLAST v.2.2.26+, employing an E-value cut-off of 1e-30. If small-molecule effectors of H. contortus kinases were inferred (based on sequence similarity to the reported target in the database), such kinases were given one or two additional points for 1-5 or > 5 associated chemicals, respectively. Chemicals in Kinase SARfari were only considered if they met “Lipinski’s rule-of-five” (Lipinski, 2004) and were flagged as “medicinal chemistry-friendly”.

In total, 10 points could be assigned to a target, including a maximum of four points awarded for inferred associations with chemicals (Table 3.1). Thus, we assigned lower overall scores to kinases that had no or only a small number of associated chemicals, reasoning that they will probably not represent targets for which known chemicals can be repurposed. However, we did not reject such kinases a priori, but rather appraised them individually as to their potential as novel drug targets.
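The ranking scheme can be sketched as a simple additive score. The class and field names below are hypothetical. Two points are assumptions rather than statements from the text: chemicals are scored separately for each of the two databases (which is one way to reach the stated maximum of four points for chemicals), and the ortholog criterion is reduced to a single flag that the caller sets.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KinaseEvidence:
    transcribed_in_parasitic_stage: bool  # step (i): L4 or adult
    lethal_celegans_homolog: bool         # step (ii)
    unique_ko_term: bool                  # step (iii)
    unique_classification: bool           # step (iii)
    celegans_similarity: float            # step (iv), percent
    celegans_coverage: float              # step (iv), percent
    sheep_similarity: float               # step (iv), percent
    strongylid_ortholog: bool             # step (v), set by the caller
    n_chemicals_sarfari: int              # step (vi), Kinase SARfari
    n_chemicals_drugbank: int             # step (vi), DrugBank


def chemical_points(n_chemicals: int) -> int:
    """One point for 1-5 associated chemicals, two points for more than 5."""
    if n_chemicals > 5:
        return 2
    return 1 if n_chemicals >= 1 else 0


def score(k: KinaseEvidence, host_cutoff: float = 41.45) -> Optional[int]:
    """Return the priority score, or None if the kinase is excluded in step (i)."""
    if not k.transcribed_in_parasitic_stage:
        return None
    points = int(k.lethal_celegans_homolog)
    points += int(k.unique_ko_term) + int(k.unique_classification)
    points += int(k.celegans_similarity > 80 and k.celegans_coverage > 50)
    points += int(k.sheep_similarity <= host_cutoff)
    points += int(k.strongylid_ortholog)
    # Assumption: each database contributes up to two points.
    points += chemical_points(k.n_chemicals_sarfari)
    points += chemical_points(k.n_chemicals_drugbank)
    return points


# Toy records (hypothetical, not thesis data).
best = KinaseEvidence(True, True, True, True, 85.0, 90.0, 35.0, True, 12, 7)
novel = KinaseEvidence(True, True, True, True, 85.0, 90.0, 35.0, True, 0, 0)
silent = KinaseEvidence(False, True, True, True, 85.0, 90.0, 35.0, True, 12, 7)
```

Under these assumptions, `best` reaches the maximum of 10 points, while `novel` scores 6 and would be appraised individually rather than rejected.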

3.3 Results 3.3.1 The H. contortus kinome In total, 432 H. contortus full-length transcripts encoding protein kinases were identified, 428 (99%) of which were detected in the draft genome. The 432 predicted proteins represented all nine recognised kinase groups and atypical kinases, 103 distinct families and 98 subfamilies. The number of kinases in the kinome of H. contortus was similar to that of C. elegans (n = 434, http://kinase.com/kinbase), and for most kinases (n = 409, 95%), we detected a homolog in C. elegans, with average overall amino acid identity and similarity values of 35% and 46%, respectively. The numbers of kinases for individual groups were similar to those of C. elegans, with the exception of the groups CK1, “Other” and TK (Table 3.2). In the CK1 group, 28 kinases belonged to the “nematode-specific” Worm6 family in both species; except for one Worm10 kinase, members of the other “Worm” families in this group (Worm7 to Worm11) were not present in H. contortus. Similarly, only two members

(Worm1 and Worm3) of nematode-specific families (Worm1 to Worm5) in the “Other” group were present in H. contortus. In contrast, C. elegans encodes 11 kinases that belong to these families. In C. elegans, the TK group is also greatly expanded in the FER and KIN16 families, with 38 and 15 members, respectively (Manning, 2005). In H. contortus, these numbers are lower, with only 28 members in the FER family and two KIN16 kinase members. In contrast, we found more kinases in some families and subfamilies (DAPK, PKD and NuaK) within the CAMK group of H. contortus compared with C. elegans (Table 3.2; Appendix 3.1). This result can be partly explained by a larger number (n = 8) of kinase genes in this group that are predicted to encode multiple isoforms, compared with only one CAMK gene in C. elegans, for which two isoforms (R02C2.1 and R02C2.2) exist. Interestingly, we identified 23 kinase sequences that appeared to be unique to H. contortus and could not be classified. However, during the process of functionally annotating all kinase sequences and linking them to conserved domains, functional categories and molecular pathways (Appendix 3.1), we were also able to gain additional knowledge about these 23 unclassified sequences. Specifically, 16 of them could be assigned to the protein kinase-like superfamily (SSF56112, IPR011009) and were predicted to have phospho-transferase activity (GO:0016772), and 14 were associated with the PANTHER family “uncharacterised nuclear hormone receptor-related” (PTHR23020). Notably, almost all proteins present in the PANTHER database with this annotation were encoded in nematode genomes (only one gene is found in some bacteria and fungi), with the majority reported for Pristionchus pacificus (n = 44), C. elegans (n = 31) and Caenorhabditis briggsae (n = 21), suggesting that these proteins are unique to the phylum Nematoda.
Based on other evidence from InterProScan, these same 14 sequences were annotated as “uncharacterised oxidoreductase Dhs-27” (IPR012877), which contains a region associated with the CHK region in kinases, and were found to contain a “domain of unknown function, DUF1679” (PF07914). Three other unclassified sequences were annotated as “5’-AMP-activated protein kinase β-subunit” (PTHR10343). The functional annotation of all remaining kinases (n = 409) showed that most of them matched one of the two profiles representing the catalytic domains of kinases (“protein kinase”, PF00069, n = 268; “tyrosine kinase”, PF07714, n = 90) and/or were assigned to protein families, such as “protein kinase-like” (SSF56112; n = 381), “casein kinase-related” (PTHR11909; n = 118), “tyrosine-protein kinase” (PTHR24418; n = 68) and/or “adenylate and guanylate cyclases” (PTHR11920; n = 56). The most frequently assigned InterPro signatures were “protein kinase-like domain” (IPR011009; n = 381), “protein kinase domain” (IPR000719; n = 268) and “serine-threonine/tyrosine-protein kinase catalytic domain”

(IPR001245; n = 90). We also assigned functional annotations (GO terms) to the kinase sequences, including “protein phosphorylation” (GO:0006468; n = 387), “transferase activity, transferring phosphorus-containing groups” (GO:0016772; n = 381), “protein kinase activity” (GO:0004672; n = 358), “ATP-binding” (GO:0005524; n = 304), and/or “protein-binding” (GO:0005515; n = 77). In addition, a subset of kinases (n = 137; 32%) could be assigned to one or more biological (KEGG) pathways, mainly associated with functions in the categories “organismal systems” (n = 80), “cellular processes” (n = 79), “signal transduction” (n = 75) and “environmental information processing” (n = 75). Following this functional annotation, we compared the kinome of H. contortus to the draft kinome (514 kinases) of O. aries (mammalian host). The number of H. contortus kinases that had a homolog in O. aries (n = 403) was only slightly lower than the number that had a homolog in C. elegans (n = 409). However, the kinase sequences from H. contortus shared substantially less average overall pairwise amino acid sequence identity (24%) and similarity (35%) with homologs from sheep than with those from C. elegans (35% and 46%, respectively).

3.3.2 Transcription profiles We investigated the transcription of kinase genes throughout the life cycle of H. contortus in all key developmental stages (egg, L1, L2, L3, L4 and adult) and both sexes (for L4 and adult) (Appendix 3.2). First, we estimated transcription levels of kinase genes for individual groups, which revealed high transcription for a set of CK1 kinase genes in the male L4 and adult stages but low or no transcription in the egg, L1, L2, L3 and female stages (Appendix 3.3). For all other kinase groups, we observed a slight trend toward increased transcription in the egg and L3 stages (Appendix 3.3). Hypothesising that kinases with a similar transcription profile are more likely to function together in the same pathway or to participate in a common signalling network, we assigned kinase transcripts to 15 individual clusters based on similarity in transcription profiles across all developmental stages (Appendix 3.4). This result indicated that kinase genes in individual clusters were transcriptionally co-regulated in H. contortus. Two clusters (Figure 3.1A; Appendix 3.4; clusters 1 and 2) represented transcription profiles with moderate to very high transcription levels (TPM range 10.28-446.40) in H. contortus males (L4 and adult) and low or no transcription (TPM range 0.00-7.77) in all other developmental stages studied, with the exception of three genes that were also moderately transcribed in H. contortus females (TPM range 10.78-33.36). In total, clusters 1 and 2 contained 79 transcripts, most (n = 44) of which represented the CK1 group. Additionally,

these clusters contained 23 kinases of the TK group, two members of the “Other” group and five members of each of the CAMK and CMGC groups, respectively. The majority of the sequences representing the CK1 group were further classified into nematode-specific families (TTBKL, Worm6 and Worm10). Of the kinases representing the TK group, 22 belonged to the FER family. Three of five CAMK kinases represented testis-specific serine kinases (TSSK family), and four of five CMGC kinases belonged to the GSK family. Subsequently, we studied two clusters of a total of 69 genes with high average transcription levels in the egg stage and low to moderate levels in all other developmental stages (Figure 3.1B; Appendix 3.4; clusters 3 and 4). These clusters included homologs of C. elegans kinase genes required for cell-cycle progression (e.g., cdk-1, cdk-4 and chk-1), embryonic development (e.g., efk-1, hpk-1, let-502, pat-4, pkc-3, spk-1 and zyg-1), including genes encoding germinal centre kinases (e.g., gck-1 and mig-15) and one gene encoding a kinase linked to chromosome segregation and cytokinesis (air-2) (Harris et al., 2014). Four other clusters (Figure 3.1C; Appendix 3.4; clusters 5-8) represented 33 sequences that showed high transcription in the L3 stage and considerably lower transcription in all other stages/sexes. Within these four clusters, we found homologs of C. elegans genes known to be involved in functions, such as thermosensation (gcy-23) (Inada et al., 2006), suppression of development of vulva and spicules, as well as ovulation (ark-1) (Hopper et al., 2000), stress response due to starvation (mek-1) (Koga et al., 2000) and sensory signalling linked to dauer entry/exit (pdk-1) (Paradis et al., 1999). Most kinase genes in the seven other clusters (clusters 9-15) were transcribed relatively uniformly throughout all developmental stages and sexes (Appendix 3.4).
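The male-enriched pattern reported for clusters 1 and 2 can be expressed as a simple filter on per-stage TPM values. The sketch below is illustrative only: the stage labels, the function name and the threshold of 10 TPM are assumptions chosen to mirror the ranges reported above, and this is not the clustering procedure used to define the 15 clusters.

```python
from typing import Mapping

# Hypothetical stage labels; any consistent naming would work.
MALE_STAGES = ("L4_male", "adult_male")


def is_male_enriched(tpm_by_stage: Mapping[str, float],
                     threshold: float = 10.0) -> bool:
    """Return True if transcription is at or above `threshold` in both male
    stages and below it in every other stage (assumed criterion)."""
    male = [tpm for stage, tpm in tpm_by_stage.items() if stage in MALE_STAGES]
    other = [tpm for stage, tpm in tpm_by_stage.items() if stage not in MALE_STAGES]
    if len(male) != len(MALE_STAGES):
        raise ValueError("profile must contain both male stages")
    return all(t >= threshold for t in male) and all(t < threshold for t in other)


# Made-up profile for illustration; these values are not taken from the study.
example = {
    "egg": 0.0, "L1": 0.4, "L2": 1.2, "L3": 0.0,
    "L4_female": 2.5, "adult_female": 7.7,
    "L4_male": 35.1, "adult_male": 212.9,
}
print(is_male_enriched(example))  # True
```

A strict filter of this kind would exclude the three genes that were also moderately transcribed in females, which is one reason why clustering on whole profiles is more informative than a fixed threshold.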

3.3.3 Kinases with potential as drug targets We investigated 405 kinases, whose genes were transcribed in at least one of the parasitic life stages of H. contortus (i.e. L4 and adult), as potential anthelmintic targets. A total of 91 kinase sequences matched a C. elegans homolog that has a lethal phenotype upon gene perturbation (RNAi; Appendix 3.1). Furthermore, 64 kinases encoded in H. contortus represented metabolic ‘chokepoints’, i.e. they could be assigned a unique function in a pathway. In addition, 103 kinases were assigned to unique families or subfamilies. Of all 405 sequences, 25 had close homologs in C. elegans, 86 had close homologs in one or both of the strongylid nematodes D. filaria and Te. circumcincta, and 300 lacked a close homolog in sheep. Subsequently, we interrogated drug databases to infer all chemicals associated with the 405 putative kinase targets. We matched 239 kinases with 509 chemicals (1-114 per

kinase) in DrugBank, and 191 kinases with 22,861 chemicals (2-10,186 per kinase) in Kinase SARfari. We then ranked all of these 405 kinases according to their potential as drug targets (Figure 3.2; cf. Table 3.1). Ten kinases had a ranking score of ≥ 7, including six CMGC kinases and one kinase each from the AGC, STE, TK and TKL groups. A large number of small-molecule compounds were associated with these 10 highest-ranked targets in Kinase SARfari (n = 3105) and DrugBank (n = 122). From these two sets, we excluded compounds that were predicted to bind non-selectively to a range of kinases, and retained 1391 (Kinase SARfari) and 122 (DrugBank) chemicals that were exclusively associated with one or more of the top-ten targets (Table 3.3; Appendices 3.5 and 3.6). Then, we appraised the scores of all remaining kinases (n = 395) to identify novel target candidates that were associated with no or only a small number of chemicals. Thus, we identified three additional kinases (AGC, CAMK and CK1; black dots in Figure 3.2; Table 3.3) that had a score of 4 or 4.5, but had only been given one additional point for an associated compound (eight chemicals in total). Taken together, we prioritised 13 kinases of H. contortus (from seven distinct groups, 10 families and 11 subfamilies) as druggable targets (Table 3.3).
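The retention of chemicals that were exclusively associated with the top-ranked targets can be sketched as a set operation. This is a minimal illustration under one assumption: a compound counts as "exclusive" if every kinase it is inferred to bind belongs to the top-ranked set. How non-selective binding was predicted is not restated here, and the compound and kinase identifiers below are hypothetical.

```python
from typing import Dict, Iterable, Set


def exclusive_compounds(compound_targets: Dict[str, Set[str]],
                        top_targets: Iterable[str]) -> Set[str]:
    """Return compounds whose inferred kinase targets all lie within `top_targets`.

    compound_targets maps a compound identifier to the set of kinases it is
    inferred to bind (hypothetical input structure).
    """
    top = set(top_targets)
    return {
        compound
        for compound, targets in compound_targets.items()
        if targets and set(targets) <= top  # non-empty and fully inside the top set
    }


# Hypothetical identifiers, for illustration only.
associations = {
    "cmpd_A": {"kinase_1"},              # one top target: retained
    "cmpd_B": {"kinase_1", "kinase_2"},  # two top targets: retained
    "cmpd_C": {"kinase_1", "kinase_9"},  # also hits a non-top kinase: excluded
    "cmpd_D": set(),                     # no inferred target: excluded
}
print(sorted(exclusive_compounds(associations, ["kinase_1", "kinase_2"])))
```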

3.4 Discussion 3.4.1 The H. contortus kinome Signalling cascades that are regulated by protein phosphorylation events play key roles in all eukaryotic organisms, and investigations of these events in many metazoans, including the free-living nematode C. elegans, have helped us gain a sound understanding of how processes, such as growth, development and tissue differentiation, are regulated at the cellular and subcellular levels (Manning et al., 2002; Hanks, 2003; Manning, 2005). In contrast, there is relatively little information on these processes in parasitic nematodes. However, the use of advanced sequencing and computational methods makes it possible to gain deep insights into the genomes and transcriptomes of these worms and, hence, explore their kinomes using bioinformatics tools. The recent assembly and annotation of draft genomes and transcriptomes of H. contortus (see Laing et al., 2013; Schwarz et al., 2013) have provided unique opportunities to explore molecular signalling and other biological pathways in this parasite. Originally, we had predicted 845 sequences from these drafts that shared sequence homology (protein BLAST; E-value cut-off of ≤ 1e-5) with kinases (Schwarz et al., 2013). This estimate was higher than for published draft kinomes (n = 233-364) of other parasitic nematodes, including Ascaris suum,

Brugia malayi, Loa loa, Meloidogyne hapla, Trichinella spiralis and Wuchereria bancrofti (see Desjardins et al., 2013; Goldberg et al., 2013), as well as the free-living nematode C. elegans (n = 434) (see Plowman et al., 1999), suggesting that we had over-estimated the number of kinases, which is to be expected when sequence homology alone is used for annotation (Koonin and Galperin, 2003). Therefore, in the present study, our goal was to characterise and curate the complement of kinases of H. contortus in detail using a refined bioinformatic workflow that incorporates and extends that described recently (Stroehlein et al., 2015). By using this approach and by integrating high-quality transcriptomic data, we overcame some of the challenges associated with the assembly (i.e. fragmentation) and the annotation of a complex, eukaryotic genome (Schwarz et al., 2013), and thus considerably improved the gene prediction and annotation of protein kinases encoded in H. contortus. The finding that 409 of the 432 kinase sequences identified and classified in the present study have homologs in C. elegans is consistent with the relatively close phylogenetic relationship of these two nematodes (Mitreva et al., 2005), and contrasts with results for the draft kinomes of parasitic nematodes representing different orders or clades that are reported to have substantially reduced kinase complements (n = 233-364), presumably having lost (or not gained) particular kinase families during nematode evolution (Desjardins et al., 2013). The reduced number of kinases in parasitic nematodes compared with C. elegans might be explained by differences in the environmental conditions that these parasites are exposed to, as well as differences in their lifestyle, but it has also been proposed that the small numbers of kinases of some species (e.g., M. hapla and Tr. spiralis) might relate to fragmented and/or incomplete genomic assemblies (Desjardins et al., 2013).
Future genome sequencing and annotation efforts should allow these proposals to be tested. The differences between H. contortus and C. elegans in the numbers of kinases within four groups (CK1, TK, “Other” and CAMK) are likely associated with differences in biology, habitat and/or life cycle between these two nematodes. For instance, H. contortus has a comparatively short free-living phase on pasture and (usually) a longer phase as a haematophagous parasite inside the ruminant host, whereas C. elegans completes its entire life cycle in a soil environment. This difference might explain the larger CK1 group in C. elegans, which has been proposed to be associated with an increased need for enhanced DNA repair mechanisms in response to excessive exposure to mutagenic stress in this environment (Plowman et al., 1999). The reason for a reduced number of kinases in the two families FER and KIN16 (TK group) in H. contortus is elusive, but again, likely relates to variation in worm biology. Little

is known about the apparent ‘expansion’ (i.e. an increased number) of FER kinases in C. elegans, but it might associate with an adaptation of the Wnt signalling pathway (Manning, 2005) and/or cell adhesion mechanisms within the nematode, based on the known roles for two mammalian homologs (FER and FES) (Plowman et al., 1999). Kinases of the KIN16 family are involved in hypodermal development in C. elegans (see Morgan and Greenwald, 1993), and at least one representative of this family (TKR-1) could be linked to an increased resistance against ultraviolet radiation and thermal stress (Murakami and Johnson, 1998; Plowman et al., 1999). Similar to expansions in these two tyrosine kinase families, kinases of the “Worm” families in the “Other” group appear to be largely Caenorhabditis-specific and not present in other nematodes, including the Strongylida (see Manning, 2005; Desjardins et al., 2013). Clearly, future investigations are warranted to explain the reduced numbers of kinases of these various groups/families in H. contortus. Generally, the present findings suggest that expansions in some kinase groups and families in C. elegans are recent events, which is further supported by phylogenetic analyses of some of these families (e.g., KIN16), members of which share limited sequence homology between C. elegans and its close relative, C. briggsae (see Manning, 2005). Several smaller expansions in families and subfamilies (DAPK, PKD and NuaK) within the CAMK group of H. contortus might reflect an adaptation in response to environmental stressors (e.g., bacterial pathogens) and/or starvation. This interpretation is supported, to some extent, by the roles that such kinases assume in defence mechanisms and/or in response to starvation, such as apoptosis and autophagy (Feng et al., 2007; Hoppe et al., 2010; Kang and Avery, 2010). 
In addition, kinases of the NuaK family are involved in cell adhesion (Zagorska et al., 2010), and an expansion of this family might compensate for the reduction of the somewhat smaller FER family in H. contortus, which, in C. elegans, contains kinases involved in adhesion (Plowman et al., 1999). Most (n = 16) of the 23 kinases for which we did not detect a homolog in the C. elegans kinome had homologous sequences in the Swiss-Prot database (data not shown). Six of these 16 kinases were homologs of C. elegans YLK1. Thirty members of this kinase-like protein family were originally reported in the first global study of the C. elegans kinome, and it was suggested that they likely have kinase activity (Plowman et al., 1999). However, in the most recent release of the C. elegans kinome (http://kinase.com/kinbase/), these kinases are no longer listed, which explains why their H. contortus orthologs were not initially identified and classified here. These YLK1-related kinases share sequence homology with bacterial aminoglycoside kinases, enzymes that confer antibiotic resistance (Hon et al., 1997). Thus,

orthologs of these kinases might play a similar role in defence against microbial agents in H. contortus and C. elegans. The functions of the seven other H. contortus kinases, for which we did not detect a homolog, are currently unknown, but their domain architecture and functional classification indicate that they are indeed protein kinases. These findings warrant future investigations. Taken together, the finding that the kinase complements of H. contortus and C. elegans are relatively conserved provides unique opportunities to explore the functions of these enzymes in H. contortus and to gain an improved understanding of the underlying molecular processes/mechanisms that regulate development, reproduction and physiology in this parasite. This focus is of particular importance, given that H. contortus is not very permissive to direct functional investigation using RNAi-mediated gene knockdown (Geldhof et al., 2007; Knox et al., 2007). By contrast, RNAi is well established in C. elegans, and kinase genes can be readily knocked down, and their spatial and temporal expression can be assessed using transgenic animals containing reporter genes (Lehmann et al., 2013).

3.4.2 Transcriptional regulation of kinase genes in H. contortus During its life cycle, H. contortus undergoes substantial transcriptional regulation of genes, which is tightly restricted to particular developmental stages (Hoekstra et al., 2000; Hartman et al., 2001; Cantacessi et al., 2010; Delannoy-Normand et al., 2010; Laing et al., 2013; Schwarz et al., 2013). Stage-specific transcription levels suggest critical roles in signalling cascades for a suite of kinase genes/gene products at particular phases of development. For example, many of the 69 highly transcribed kinase genes in the egg stage of H. contortus are probably linked to cell-cycle progression, growth, embryogenesis and tissue differentiation, taking place during early development (Reinke et al., 2000; Gönczy and Rose, 2005; van den Heuvel, 2005). The upward trend of the female transcription values for the two egg-enriched clusters (Figure 3.1B) is likely attributed to eggs being within the uterus of gravid females, a proposal that is supported by the finding that this trend is not seen in the female L4 stage (Figure 3.1B). After the L1 hatches from the egg (~1 day), it moults twice within ~1 week and develops, via L2, into the infective larval stage (L3). During this time, L1 and L2 stages actively feed on microbes and continue to grow (Veglia, 1915). Most kinase genes transcribed in the L1 and L2 stages were also relatively uniformly transcribed in all other developmental stages, suggesting that they assume more generalised functions throughout the entire life cycle.

The final free-living and infective larva (L3) is ingested by the host animal, which marks the transition to the parasitic phase of the H. contortus life cycle. L3s are ensheathed and unable to actively feed, and, therefore, need to live on accumulated reserves at reduced metabolic rates (Nikolaou and Gasser, 2006; Schwarz et al., 2013). Their development is arrested, but they remain motile, and various kinases appear to assume crucial functions in this developmental stage. For example, the Hc-pk-119 gene encoding a MEK7 (family) kinase is likely involved in a stress response to starvation, as has been shown for its C. elegans homolog mek-1 (Koga et al., 2000). In addition, several kinase genes that are highly transcribed in the L3 stage seem to function as inhibitors of processes that become important at a later stage of development. For instance, Hc-pk-088 is predicted to encode a kinase that likely inhibits LET-23, an EGF receptor responsible for vulva, ectoderm and spicule development, based on evidence for its C. elegans homolog ARK-1 (Hopper et al., 2000). Another example is PDK-1 (Hc-PK-240), which is involved in signalling pathways controlling entry into and exit from dauer (Paradis et al., 1999). The overall high transcription of many kinase genes in the L3 stage (Figure 3.1; Appendix 3.3) also suggests that transcripts might be stored within this stage to allow for a rapid development and adaptation to a ‘hostile’ environment (pH = 1-2) within the host stomach (abomasum) following exsheathment. The latter process is mainly triggered by increased CO2 concentrations and also by the host’s body temperature (38.3-39.9 °C), and it is likely that kinases play a central role in this transition to parasitism. The high transcription of Hc-pk-282 in the L3 stage, but the lack thereof in L4 and adult stages, for example, suggests one or more specific roles in the L3 exsheathment process in H. contortus, a statement that is supported by evidence that a Hc-PK-282 homolog in C. elegans (GCY-23) is involved in thermosensation (Inada et al., 2006). Following exsheathment, L3s undergo a short histotropic phase in the stomach mucosa (Stoll, 1943) and then develop to haematophagous stages (L4s and adults) (Veglia, 1915). This development is reflected in a range of morphological changes in both male and female worms, including the formation of a buccal capsule in both sexes, bursal rays and spicules in the male, and development of vulva and other reproductive organs in the female. Particularly in the male, morphogenesis and spermatogenesis appear to be controlled by a range of specific protein kinases (Figure 3.1). The majority of kinases represented in the two male-enriched clusters (Figure 3.1A) were members of the CK1 group and FER family, including homologs encoded by the two spe/fer (spermatogenesis-/fertilisation-defective) mutants spe-6 and spe-8 (Manning, 2005; L'Hernault, 2006), suggesting a role in spermatogenesis and,

more generally, fertility. In addition, protein kinases have also been reported to play crucial roles in the post-translational regulation of nematode sperm development, including pseudopod extension and movement (Reinke et al., 2000). In addition, the testis-specific serine kinases (TSSKs) and GSKs in the male-specific clusters might have roles in the formation of bursal rays and spicules in H. contortus, given that the C. elegans homologs are represented in Wnt signalling-related pathways associated with similar anatomical changes (Eisenmann, 2005). Future studies should elucidate the actual functional roles of kinases and signalling pathways in development and reproduction of H. contortus, and assess to what extent these mechanisms differ between parasitic and free-living nematodes; this might be achievable in H. contortus by RNAi using a virus-based transduction system (Hagen et al., 2014).

3.4.3 Protein kinases of H. contortus as potential drug targets In addition to the benefits of using C. elegans as a tool for comparative functional studies of molecular processes in parasitic nematodes (e.g., Hu et al., 2010; Holden-Dye and Walker, 2012; Li et al., 2014a, b), there is merit in using information on essential genes/gene products in C. elegans (in WormBase, https://www.wormbase.org/) to predict and prioritise new drug targets in H. contortus (see Burns et al., 2015). Currently, the small number of classes of anthelmintics available to treat infections with H. contortus and other strongylid nematodes (e.g., Holden-Dye and Walker, 2007, 2014) and widespread drug resistance in these parasites (Gilleard, 2006; Kaplan and Vidyashankar, 2012) necessitate the search for new nematocidal drugs with modes of action that are distinct from those of currently available classes. In this context, protein kinases might be suitable targets. In a recent study (Schwarz et al., 2013), 27 protein kinases were proposed as drug target candidates in H. contortus, based on lethal RNAi phenotypes of their homologs in C. elegans and because they were predicted to assume indispensable functions in molecular pathways (‘chokepoints’) in H. contortus. In the present study, we applied additional criteria for target prediction/prioritisation (see Table 3.1) and used a ranking system to define a set of 13 druggable targets and a total of 1517 associated small-molecule compounds. A similar drug target/drug prioritisation strategy has been implemented previously in the TDR database (http://tdrtargets.org/). This ‘union strategy’ has the advantage that it allows the user to adjust the weights for each criterion, and also provides a genome-wide perspective on how useful the chosen criteria and weights may be (Shanmugam et al., 2012). However, while the latter database contains information for various ‘neglected’ pathogens (Mycobacterium; Plasmodium, Toxoplasma, Leishmania,

Trypanosoma; Brugia and Schistosoma) and C. elegans (as a reference), it does not presently permit the prioritisation of drug targets and/or chemicals for other pathogens, such as H. contortus. Eleven chemicals listed in DrugBank and associated with the 13 highest-ranking targets identified here were previously identified in a target/compound prioritisation approach and then screened against C. elegans and H. contortus; three of these chemicals (DB02152, DB02010 and DB04707) had an inhibitory effect on both nematodes (Taylor et al., 2013). The authors of this study selected chemicals based on a set of protein kinases that were conserved among biologically and genetically diverse nematode species (B. malayi, C. elegans, Meloidogyne incognita and Tr. spiralis), with the goal of finding chemicals that have a broad spectrum of activity against many parasitic nematodes. In the present study, we took a similar, but more conservative approach, ‘rewarding’ potential targets with an additional point if they had a close homolog (i.e. > 80% sequence similarity) in D. filaria and Te. circumcincta, two related parasites of importance also in small ruminants. This was the case for seven of the 13 top-scoring targets in at least one of these two species, suggesting that these seven kinases are relatively conserved in sequence among all three strongylids and C. elegans, and might be inhibited by a single ligand. However, defining inhibitors with broad-spectrum anthelmintic activity against members representing distinct phyla (e.g., Nematoda and Platyhelminthes) might be challenging to achieve, given evidence of major differences in efficacy of some compounds (e.g., DB03693, DB02984 and DB03044) among select members (C. elegans, H. contortus and B. malayi) within the phylum Nematoda (see Taylor et al., 2013), thus requiring careful cross-phylum investigations of parasite kinomes.
Interestingly, homologs of four of the kinases (Hc-PK-062.1, Hc-PK-199.1, Hc-PK-210.1 and Hc-PK-165.1) that were conserved in sequence among multiple of the four nematode species investigated here have been identified previously in an RNAi screen of C. elegans (see Lehmann et al., 2013) as being essential for protein homeostasis, mitochondrial network structure and/or sarcomere structure in muscle. Therefore, assessing chemicals for their ability to specifically inhibit one or more of these four kinases and disrupt normal muscle formation and/or morphology might represent a promising path; while many of the approved veterinary drugs bind to neuromuscular targets, thereby disrupting normal muscle functionality (e.g., piperazine, pyrantel, morantel, levamisole, ivermectin, emodepside) (Holden-Dye and Walker, 2014), inhibiting processes controlling muscle formation and/or structure in the parasite might also prove fruitful.

In summary, the prioritised set of potential kinase inhibitors identified here provides a starting point for future, targeted screening on parasitic stages of H. contortus. Thus, a subset of the 1517 chemicals could now be selected, according to cost, availability, chemical properties, safety and/or prior use as drugs, and tested for anthelmintic effects in an established, automated, whole-worm motility screening assay (Preston et al., 2015a), followed by a ‘hit-to-lead’ phase, in which structural analogs of ‘hit’ compounds could be synthesised and screened to establish structure-activity relationships (SARs), and then tested in established assays to predict intestinal absorption, distribution, metabolism, excretion and toxicity (ADMET) (Preston et al., 2015b). In addition, C. elegans could be used as a complementary tool for the validation of targets and their functions, for investigating modes of action of prioritised chemicals and for the prediction of how long it takes for nematodes to develop resistance against such chemicals. Finally, compounds with desired parameters that are metabolically stable and are not cytotoxic to mammalian cells could then progress to initial in vivo testing in sheep.

3.5 Conclusions The H. contortus kinome should provide a useful resource for fundamental investigations of signalling pathways in this nematode, and will likely facilitate future drug discovery/repurposing efforts for a variety of parasitic worms. With this in mind, further studies should focus on curating the kinase complements of a range of socioeconomically important parasitic worms using the present bioinformatic workflow system, with a view toward predicting pan-phylum anthelmintic targets.

3.6 References Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, Scherer SE, Li PW, Hoskins RA, Galle RF, George RA, Lewis SE, Richards S, Ashburner M, Henderson SN, Sutton GG, Wortman JR, Yandell MD, Zhang Q, Chen LX, Brandon RC, Rogers YH, Blazej RG, Champe M, Pfeiffer BD, Wan KH, Doyle C, Baxter EG, Helt G, Nelson CR, Gabor GL, Abril JF, Agbayani A, An HJ, Andrews-Pfannkoch C, Baldwin D, Ballew RM, Basu A, Baxendale J, Bayraktaroglu L, Beasley EM, Beeson KY, Benos PV, Berman BP, Bhandari D, Bolshakov S, Borkova D, Botchan MR, Bouck J, Brokstein P, Brottier P, Burtis KC, Busam DA, Butler H, Cadieu E, Center A, Chandra I, Cherry JM, Cawley S, Dahlke C, Davenport LB, Davies P, de Pablos B, Delcher A, Deng Z, Mays AD, Dew I, Dietz SM, Dodson K, Doup LE, Downes M, Dugan-Rocha S, Dunkov BC, Dunn P, Durbin KJ, Evangelista CC, Ferraz C, Ferriera S, Fleischmann W, Fosler C, Gabrielian AE, Garg NS, Gelbart WM, Glasser K, Glodek A, Gong F, Gorrell JH, Gu Z, Guan P, Harris M, Harris NL, Harvey D, Heiman TJ, Hernandez JR, Houck J, Hostin D, Houston KA, Howland TJ, Wei MH, Ibegwam C, Jalali M, Kalush F, Karpen GH, Ke Z, Kennison JA, Ketchum KA, Kimmel BE, Kodira CD, Kraft C, Kravitz S, Kulp D, Lai Z, Lasko P, Lei Y, Levitsky AA, Li J, Li Z, Liang Y, Lin X, Liu X, Mattei B, McIntosh TC, McLeod MP, McPherson D, Merkulov G, Milshina NV, Mobarry C, Morris J, Moshrefi A, Mount SM, Moy M, Murphy B, Murphy L, Muzny DM, Nelson DL, Nelson DR, Nelson KA, Nixon K, Nusskern DR, Pacleb JM, Palazzolo M, Pittman GS, Pan S, Pollard J, Puri V, Reese MG, Reinert K, Remington K, Saunders RD, Scheeler F, Shen H, Shue BC, Siden-Kiamos I, Simpson M, Skupski MP, Smith T, Spier E, Spradling AC, Stapleton M, Strong R, Sun E, Svirskas R, Tector C, Turner R, Venter E, Wang AH, Wang X, Wang ZY, Wassarman DA, Weinstock GM, Weissenbach J, Williams SM, Woodage T, Worley KC, Wu D, Yang S, Yao QA, Ye J, Yeh RF, Zaveri JS, Zhan M, Zhang G, Zhao Q, Zheng L, Zheng XH, Zhong FN, Zhong W, Zhou X, Zhu S, Zhu X, Smith HO, Gibbs RA, Myers EW, Rubin GM, Venter JC, 2000. The genome sequence of Drosophila melanogaster. Science 287, 2185-2195. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ, 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389-3402. Berriman M, Haas BJ, LoVerde PT, Wilson RA, Dillon GP, Cerqueira GC, Mashiyama ST, Al-Lazikani B, Andrade LF, Ashton PD, Aslett MA, Bartholomeu DC, Blandin G, Caffrey CR, Coghlan A, Coulson R, Day TA, Delcher A, DeMarco R, Djikeng A, Eyre T, Gamble JA, Ghedin E, Gu Y, Hertz-Fowler C, Hirai H, Hirai Y, Houston R, Ivens A, Johnston DA, Lacerda D, Macedo CD, McVeigh P, Ning Z, Oliveira G, Overington JP, Parkhill J, Pertea M, Pierce RJ, Protasio AV, Quail MA, Rajandream MA, Rogers J, Sajid M, Salzberg SL, Stanke M, Tivey AR, White O, Williams DL, Wortman J, Wu W, Zamanian M, Zerlotini A, Fraser-Liggett CM, Barrell BG, El-Sayed NM, 2009. The genome of the blood fluke Schistosoma mansoni. Nature 460, 352-358. Bolger AM, Lohse M, Usadel B, 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114-2120. Burns AR, Luciani GM, Musso G, Bagg R, Yeo M, Zhang Y, Rajendran L, Glavin J, Hunter R, Redman E, Stasiuk S, Schertzberg M, Angus McQuibban G, Caffrey CR, Cutler SR, Tyers M, Giaever G, Nislow C, Fraser AG, MacRae CA, Gilleard J, Roy PJ, 2015. Caenorhabditis elegans is a useful model for anthelmintic discovery. Nat. Commun. 6, 7485.


C. elegans Sequencing Consortium, 1998. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 282, 2012-2018. Cantacessi C, Campbell BE, Young ND, Jex AR, Hall RS, Presidente PJ, Zawadzki JL, Zhong W, Aleman-Meza B, Loukas A, Sternberg PW, Gasser RB, 2010. Differences
in transcription between free-living and CO2-activated third-stage larvae of Haemonchus contortus. BMC Genomics 11, 266. Cantacessi C, Hofmann A, Campbell BE, Gasser RB, 2015. Impact of next-generation technologies on exploring socioeconomically important parasites and developing new interventions. Methods Mol. Biol. 1247, 437-474. Cleveland WS, 1979. Robust locally weighted regression and smoothing scatterplots. J. Am. Stat. Assoc. 74, 829-836. Delannoy-Normand A, Cortet J, Cabaret J, Neveu C, 2010. A suite of genes expressed during transition to parasitic lifestyle in the trichostrongylid nematode Haemonchus contortus encode potentially secreted proteins conserved in Teladorsagia circumcincta. Vet. Parasitol. 174, 106-114. Desjardins CA, Cerqueira GC, Goldberg JM, Dunning Hotopp JC, Haas BJ, Zucker J, Ribeiro JM, Saif S, Levin JZ, Fan L, Zeng Q, Russ C, Wortman JR, Fink DL, Birren BW, Nutman TB, 2013. Genomics of Loa loa, a Wolbachia-free filarial parasite of humans. Nat. Genet. 45, 495-500. Eisenmann DM, 2005. Wnt signaling. WormBook, ed. The C. elegans Research Community, WormBook, https://doi.org/10.1895/wormbook.1.7.1 Endicott JA, Noble ME, Johnson LN, 2012. The structural basis for control of eukaryotic protein kinases. Annu. Rev. Biochem. 81, 587-613. Feng H, Ren M, Chen L, Rubin CS, 2007. Properties, regulation, and in vivo functions of a novel protein kinase D: Caenorhabditis elegans DKF-2 links diacylglycerol second messenger to the regulation of stress responses and life span. J. Biol. Chem. 282, 31273-31288. Fu L, Niu B, Zhu Z, Wu S, Li W, 2012. CD-HIT: accelerated for clustering the next- generation sequencing data. Bioinformatics 28, 3150-3152. Funari V, Canosa SJ, 2014. The importance of bioinformatics in NGS: breaking the bottleneck in data interpretation. Science 344, 653. Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, Light Y, McGlinchey S, Michalovich D, Al-Lazikani B, Overington JP, 2012. 
ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 40, D1100-D1107. Geldhof P, Visser A, Clark D, Saunders G, Britton C, Gilleard J, Berriman M, Knox D, 2007. RNA interference in parasitic helminths: current situation, potential pitfalls and future prospects. Parasitology 134, 609-619. Gilleard JS, 2004. The use of Caenorhabditis elegans in parasitic nematode research. Parasitology 128 (Suppl. 1), S49-S70. Gilleard JS, 2006. Understanding anthelmintic resistance: the need for genomics and genetics. Int. J. Parasitol. 36, 1227-1239. Goldberg JM, Griggs AD, Smith JL, Haas BJ, Wortman JR, Zeng Q, 2013. Kinannote, a computer program to identify and classify members of the eukaryotic protein kinase superfamily. Bioinformatics 29, 2387-2394. Gönczy P, Rose LS, 2005. Asymmetric cell division and axis formation in the embryo. WormBook, ed. The C. elegans Research Community, WormBook, https://doi.org/10.1895/wormbook.1.30.1 Gough J, Karplus K, Hughey R, Chothia C, 2001. Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. J. Mol. Biol. 313, 903-919.


Hagen J, Young ND, Every AL, Pagel CN, Schnoeller C, Scheerlinck JP, Gasser RB, Kalinna BH, 2014. Omega-1 knockdown in Schistosoma mansoni eggs by lentivirus transduction reduces granuloma size in vivo. Nat. Commun. 5, 5375. Hanks SK, Hunter T, 1995. Protein kinases 6. The eukaryotic protein kinase superfamily: kinase (catalytic) domain structure and classification. FASEB J. 9, 576-596. Hanks SK, 2003. Genomic analysis of the eukaryotic protein kinase superfamily: a perspective. Genome Biol. 4, 111. Harris TW, Baran J, Bieri T, Cabunoc A, Chan J, Chen WJ, Davis P, Done J, Grove C, Howe K, Kishore R, Lee R, Li Y, Muller HM, Nakamura C, Ozersky P, Paulini M, Raciti D, Schindelman G, Tuli MA, Van Auken K, Wang D, Wang X, Williams G, Wong JD, Yook K, Schedl T, Hodgkin J, Berriman M, Kersey P, Spieth J, Stein L, Sternberg PW, 2014. WormBase 2014: new views of curated biology. Nucleic Acids Res. 42, D789-D793. Hartman D, Donald DR, Nikolaou S, Savin KW, Hasse D, Presidente PJ, Newton SE, 2001. Analysis of developmentally regulated genes of the parasite Haemonchus contortus. Int. J. Parasitol. 31, 1236-1245. Hoekstra R, Visser A, Otsen M, Tibben J, Lenstra JA, Roos MH, 2000. EST sequencing of the parasitic nematode Haemonchus contortus suggests a shift in gene expression during transition to the parasitic stages. Mol. Biochem. Parasitol. 110, 53-68. Holden-Dye L, Walker RJ, 2007. Anthelmintic drugs. WormBook, ed. The C. elegans Research Community, WormBook, https://doi.org/10.1895/wormbook.1.143.1 Holden-Dye L, Walker RJ, 2012. How relevant is Caenorhabditis elegans as a model for the analysis of parasitic nematode biology? In: Caffrey, CR (Ed.), Parasitic helminths: targets, screens, drugs and vaccines. Wiley-Blackwell, Hoboken, New Jersey, USA, pp. 23-41. Holden-Dye L, Walker RJ, 2014. Anthelmintic drugs and nematicides: studies in Caenorhabditis elegans. WormBook, ed. The C. 
elegans Research Community, WormBook, https://doi.org/10.1895/wormbook.1.143.2 Hon WC, McKay GA, Thompson PR, Sweet RM, Yang DS, Wright GD, Berghuis AM, 1997. Structure of an enzyme required for aminoglycoside antibiotic resistance reveals homology to eukaryotic protein kinases. Cell 89, 887-895. Hoppe PE, Chau J, Flanagan KA, Reedy AR, Schriefer LA, 2010. Caenorhabditis elegans unc-82 encodes a serine/threonine kinase important for myosin filament organization in muscle during growth. Genetics 184, 79-90. Hopper NA, Lee J, Sternberg PW, 2000. ARK-1 inhibits EGFR signaling in C. elegans. Mol. Cell 6, 65-75. Hu M, Lok JB, Ranjit N, Massey HC, Jr., Sternberg PW, Gasser RB, 2010. Structural and functional characterisation of the fork head transcription factor-encoding gene, Hc- daf-16, from the parasitic nematode Haemonchus contortus (Strongylida). Int. J. Parasitol. 40, 405-415. Huang X, Madan A, 1999. CAP3: A DNA sequence assembly program. Genome Res. 9, 868- 877. Inada H, Ito H, Satterlee J, Sengupta P, Matsumoto K, Mori I, 2006. Identification of guanylyl cyclases that function in thermosensory neurons of Caenorhabditis elegans. Genetics 172, 2239-2252. Jex AR, Liu S, Li B, Young ND, Hall RS, Li Y, Yang L, Zeng N, Xu X, Xiong Z, Chen F, Wu X, Zhang G, Fang X, Kang Y, Anderson GA, Harris TW, Campbell BE, Vlaminck J, Wang T, Cantacessi C, Schwarz EM, Ranganathan S, Geldhof P, Nejsum P, Sternberg PW, Yang H, Wang J, Wang J, Gasser RB, 2011. Ascaris suum draft genome. Nature 479, 529-533.


Jex AR, Nejsum P, Schwarz EM, Hu L, Young ND, Hall RS, Korhonen PK, Liao S, Thamsborg S, Xia J, Xu P, Wang S, Scheerlinck JP, Hofmann A, Sternberg PW, Wang J, Gasser RB, 2014. Genome and transcriptome of the porcine whipworm Trichuris suis. Nat. Genet. 46, 701-706. Jones P, Binns D, Chang HY, Fraser M, Li W, McAnulla C, McWilliam H, Maslen J, Mitchell A, Nuka G, Pesseat S, Quinn AF, Sangrador-Vegas A, Scheremetjew M, Yong SY, Lopez R, Hunter S, 2014. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236-1240. Kang C, Avery L, 2010. Death-associated protein kinase (DAPK) and signal transduction: fine-tuning of autophagy in Caenorhabditis elegans homeostasis. FEBS J. 277, 66- 73. Kaplan RM, Vidyashankar AN, 2012. An inconvenient truth: global worming and anthelmintic resistance. Vet. Parasitol. 186, 70-78. Kent WJ, 2002. BLAT - the BLAST-like alignment tool. Genome Res. 12, 656-664. Knox DP, Geldhof P, Visser A, Britton C, 2007. RNA interference in parasitic nematodes of animals: a reality check? Trends Parasitol. 23, 105-107. Koga M, Zwaal R, Guan KL, Avery L, Ohshima Y, 2000. A Caenorhabditis elegans MAP kinase kinase, MEK-1, is involved in stress responses. EMBO J. 19, 5148-5156. Koonin EV, Galperin MY, 2003. Genome annotation and analysis. Sequence - evolution - function: computational approaches in comparative genomics. Kluwer Academic Publishers, Boston, Massachusetts, USA, pp. 193-226. L'Hernault SW, 2006. Spermatogenesis. WormBook, ed. The C. elegans Research Community, WormBook, https://doi.org/10.1895/wormbook.1.85.1 Laing R, Kikuchi T, Martinelli A, Tsai IJ, Beech RN, Redman E, Holroyd N, Bartley DJ, Beasley H, Britton C, Curran D, Devaney E, Gilabert A, Hunt M, Jackson F, Johnston SL, Kryukov I, Li K, Morrison AA, Reid AJ, Sargison N, Saunders GI, Wasmuth JD, Wolstenholme A, Berriman M, Gilleard JS, Cotton JA, 2013. The genome and transcriptome of Haemonchus contortus, a key model parasite for drug and vaccine discovery. 
Genome Biol. 14, R88. Langmead B, Salzberg SL, 2012. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357-359. Law V, Knox C, Djoumbou Y, Jewison T, Guo AC, Liu Y, Maciejewski A, Arndt D, Wilson M, Neveu V, Tang A, Gabriel G, Ly C, Adamjee S, Dame ZT, Han B, Zhou Y, Wishart DS, 2014. DrugBank 4.0: shedding new light on drug metabolism. Nucleic Acids Res. 42, D1091-D1097. Lehmann S, Bass JJ, Szewczyk NJ, 2013. Knockdown of the C. elegans kinome identifies kinases required for normal protein homeostasis, mitochondrial network structure, and sarcomere structure in muscle. Cell Commun. Signal. 11, 71. Li B, Dewey CN, 2011. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12, 323. Li F, Lok JB, Gasser RB, Korhonen PK, Sandeman MR, Shi D, Zhou R, Li X, Zhou Y, Zhao J, Hu M, 2014a. Hc-daf-2 encodes an insulin-like receptor kinase in the barber's pole worm, Haemonchus contortus, and restores partial dauer regulation. Int. J. Parasitol. 44, 485-496. Li FC, Gasser RB, Lok JB, Korhonen PK, Wang YF, Yin FY, He L, Zhou R, Zhao JL, Hu M, 2014b. Exploring the role of two interacting phosphoinositide 3-kinases of Haemonchus contortus. Parasit. Vectors 7, 498. Lipinski CA, 2004. Lead- and drug-like compounds: the rule-of-five revolution. Drug Discov. Today Technol. 1, 337-341.


Mangiola S, Young ND, Sternberg PW, Strube C, Korhonen PK, Mitreva M, Scheerlinck JP, Hofmann A, Jex AR, Gasser RB, 2014. Analysis of the transcriptome of adult Dictyocaulus filaria and comparison with Dictyocaulus viviparus, with a focus on molecules involved in host-parasite interactions. Int. J. Parasitol. 44, 251-261. Manning G, Whyte DB, Martinez R, Hunter T, Sudarsanam S, 2002. The protein kinase complement of the human genome. Science 298, 1912-1934. Manning G, 2005. Genomic overview of protein kinases. WormBook, ed. The C. elegans Research Community, WormBook, https://doi.org/10.1895/wormbook.1.60.1 Mardia KV, Kent JT, Bibby JM, 1979. Multivariate analysis. Academic Press, San Diego, California, USA. Mi H, Muruganujan A, Casagrande JT, Thomas PD, 2013. Large-scale gene function analysis with the PANTHER classification system. Nat. Protoc. 8, 1551-1566. Mitreva M, Blaxter ML, Bird DM, McCarter JP, 2005. Comparative genomics of nematodes. Trends Genet. 21, 573-581. Morgan WR, Greenwald I, 1993. Two novel transmembrane protein tyrosine kinases expressed during Caenorhabditis elegans hypodermal development. Mol. Cell. Biol. 13, 7133-7143. Murakami S, Johnson TE, 1998. Life extension and stress resistance in Caenorhabditis elegans modulated by the tkr-1 gene. Curr. Biol. 8, 1091-1094. Nikolaou S, Gasser RB, 2006. Prospects for exploring molecular developmental processes in Haemonchus contortus. Int. J. Parasitol. 36, 859-868. Opperman CH, Bird DM, Williamson VM, Rokhsar DS, Burke M, Cohn J, Cromer J, Diener S, Gajan J, Graham S, Houfek TD, Liu Q, Mitros T, Schaff J, Schaffer R, Scholl E, Sosinski BR, Thomas VP, Windham E, 2008. Sequence and genetic map of Meloidogyne hapla: a compact nematode genome for plant parasitism. Proc. Natl. Acad. Sci. USA 105, 14802-14807. Paradis S, Ailion M, Toker A, Thomas JH, Ruvkun G, 1999. A PDK1 homolog is necessary and sufficient to transduce AGE-1 PI3 kinase signals that regulate diapause in Caenorhabditis elegans. Genes Dev. 
13, 1438-1452. Plowman GD, Sudarsanam S, Bingham J, Whyte D, Hunter T, 1999. The protein kinases of Caenorhabditis elegans: a model for signal transduction in multicellular organisms. Proc. Natl. Acad. Sci. USA 96, 13603-13610. Preston S, Jabbar A, Nowell C, Joachim A, Ruttkowski B, Baell J, Cardno T, Korhonen PK, Piedrafita D, Ansell BR, Jex AR, Hofmann A, Gasser RB, 2015b. Low cost whole- organism screening of compounds for anthelmintic activity. Int. J. Parasitol. 45, 333- 343. Preston S, Jabbar A, Gasser RB, 2015a. A perspective on genomic-guided anthelmintic discovery and repurposing using Haemonchus contortus. Infect. Genet. Evol. 40, 368-373. Reinke V, Smith HE, Nance J, Wang J, Van Doren C, Begley R, Jones SJ, Davis EB, Scherer S, Ward S, Kim SK, 2000. A global profile of germline gene expression in C. elegans. Mol. Cell 6, 605-616. Rice P, Longden I, Bleasby A, 2000. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 16, 276-277. Schistosoma japonicum Genome Sequencing and Functional Analysis Consortium, 2009. The Schistosoma japonicum genome reveals features of host-parasite interplay. Nature 460, 345-351. Schwarz EM, Korhonen PK, Campbell BE, Young ND, Jex AR, Jabbar A, Hall RS, Mondal A, Howe AC, Pell J, Hofmann A, Boag PR, Zhu XQ, Gregory TR, Loukas A, Williams BA, Antoshechkin I, Brown CT, Sternberg PW, Gasser RB, 2013. The


genome and developmental transcriptome of the strongylid nematode Haemonchus contortus. Genome Biol. 14, R89. Schwarz EM, Hu Y, Antoshechkin I, Miller MM, Sternberg PW, Aroian RV, 2015. The genome and transcriptome of the zoonotic hookworm Ancylostoma ceylanicum identify infection-specific gene families. Nat. Genet. 47, 416-422. Shanmugam D, Ralph SA, Carmona SJ, Crowther GJ, Roos DS, Agüero F, 2012. Integrating and mining helminth genomes to discover and prioritize novel therapeutic targets. In: Caffrey, CR (Ed.), Parasitic helminths: targets, screens, drugs and vaccines. Wiley-Blackwell, Hoboken, New Jersey, USA, pp. 43-59. Sonnhammer EL, Eddy SR, Durbin R, 1997. Pfam: a comprehensive database of protein domain families based on seed alignments. Proteins 28, 405-420. Stoll NR, 1943. The wandering of Haemonchus in the sheep host. J. Parasitol. 29, 407-416. Stroehlein AJ, Young ND, Jex AR, Sternberg PW, Tan P, Boag PR, Hofmann A, Gasser RB, 2015. Defining the Schistosoma haematobium kinome enables the prediction of essential kinases as anti-schistosome drug targets. Sci. Rep. 5, 17759. Sutherland I, Scott I, 2009. Gastrointestinal nematodes of sheep and cattle: biology and control. Wiley-Blackwell, West Sussex, UK. Tang YT, Gao X, Rosa BA, Abubucker S, Hallsworth-Pepin K, Martin J, Tyagi R, Heizer E, Zhang X, Bhonagiri-Palsikar V, Minx P, Warren WC, Wang Q, Zhan B, Hotez PJ, Sternberg PW, Dougall A, Gaze ST, Mulvenna J, Sotillo J, Ranganathan S, Rabelo EM, Wilson RK, Felgner PL, Bethony J, Hawdon JM, Gasser RB, Loukas A, Mitreva M, 2014. Genome of the human hookworm Necator americanus. Nat. Genet. 46, 261-269. Taylor CM, Martin J, Rao RU, Powell K, Abubucker S, Mitreva M, 2013. Using existing drugs as leads for broad spectrum anthelmintics targeting protein kinases. PLoS Pathog. 9, e1003149. Thorvaldsdóttir H, Robinson JT, Mesirov JP, 2013. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief. Bioinform. 
14, 178-192. Tsai IJ, Zarowiecki M, Holroyd N, Garciarrubio A, Sanchez-Flores A, Brooks KL, Tracey A, Bobes RJ, Fragoso G, Sciutto E, Aslett M, Beasley H, Bennett HM, Cai J, Camicia F, Clark R, Cucher M, De Silva N, Day TA, Deplazes P, Estrada K, Fernandez C, Holland PW, Hou J, Hu S, Huckvale T, Hung SS, Kamenetzky L, Keane JA, Kiss F, Koziol U, Lambert O, Liu K, Luo X, Luo Y, Macchiaroli N, Nichol S, Paps J, Parkinson J, Pouchkina-Stantcheva N, Riddiford N, Rosenzvit M, Salinas G, Wasmuth JD, Zamanian M, Zheng Y, Taenia solium Genome Consortium, Cai X, Soberon X, Olson PD, Laclette JP, Brehm K, Berriman M, 2013. The genomes of four tapeworm species reveal adaptations to parasitism. Nature 496, 57-63. van den Heuvel S, 2005. Cell-cycle regulation. WormBook, ed. The C. elegans Research Community, WormBook, https://doi.org/10.1895/wormbook.1.28.1 van Dijk EL, Auger H, Jaszczyszyn Y, Thermes C, 2014. Ten years of next-generation sequencing technology. Trends Genet. 30, 418-426. Veglia F, 1915. The anatomy and life-history of the Haemonchus contortus. Vet. Res. 4, 347- 500. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, Gocayne JD, Amanatides P, Ballew RM, Huson DH, Wortman JR, Zhang Q, Kodira CD, Zheng XH, Chen L, Skupski M, Subramanian G, Thomas PD, Zhang J, Gabor Miklos GL, Nelson C, Broder S, Clark AG, Nadeau J, McKusick VA, Zinder N, Levine AJ, Roberts RJ, Simon M, Slayman C, Hunkapiller M, Bolanos R, Delcher A, Dew I, Fasulo D, Flanigan M, Florea L, Halpern A,


Hannenhalli S, Kravitz S, Levy S, Mobarry C, Reinert K, Remington K, Abu- Threideh J, Beasley E, Biddick K, Bonazzi V, Brandon R, Cargill M, Chandramouliswaran I, Charlab R, Chaturvedi K, Deng Z, Di Francesco V, Dunn P, Eilbeck K, Evangelista C, Gabrielian AE, Gan W, Ge W, Gong F, Gu Z, Guan P, Heiman TJ, Higgins ME, Ji RR, Ke Z, Ketchum KA, Lai Z, Lei Y, Li Z, Li J, Liang Y, Lin X, Lu F, Merkulov GV, Milshina N, Moore HM, Naik AK, Narayan VA, Neelam B, Nusskern D, Rusch DB, Salzberg S, Shao W, Shue B, Sun J, Wang Z, Wang A, Wang X, Wang J, Wei M, Wides R, Xiao C, Yan C, Yao A, Ye J, Zhan M, Zhang W, Zhang H, Zhao Q, Zheng L, Zhong F, Zhong W, Zhu S, Zhao S, Gilbert D, Baumhueter S, Spier G, Carter C, Cravchik A, Woodage T, Ali F, An H, Awe A, Baldwin D, Baden H, Barnstead M, Barrow I, Beeson K, Busam D, Carver A, Center A, Cheng ML, Curry L, Danaher S, Davenport L, Desilets R, Dietz S, Dodson K, Doup L, Ferriera S, Garg N, Gluecksmann A, Hart B, Haynes J, Haynes C, Heiner C, Hladun S, Hostin D, Houck J, Howland T, Ibegwam C, Johnson J, Kalush F, Kline L, Koduru S, Love A, Mann F, May D, McCawley S, McIntosh T, McMullen I, Moy M, Moy L, Murphy B, Nelson K, Pfannkoch C, Pratts E, Puri V, Qureshi H, Reardon M, Rodriguez R, Rogers YH, Romblad D, Ruhfel B, Scott R, Sitter C, Smallwood M, Stewart E, Strong R, Suh E, Thomas R, Tint NN, Tse S, Vech C, Wang G, Wetter J, Williams S, Williams M, Windsor S, Winn-Deen E, Wolfe K, Zaveri J, Zaveri K, Abril JF, Guigo R, Campbell MJ, Sjolander KV, Karlak B, Kejariwal A, Mi H, Lazareva B, Hatton T, Narechania A, Diemer K, Muruganujan A, Guo N, Sato S, Bafna V, Istrail S, Lippert R, Schwartz R, Walenz B, Yooseph S, Allen D, Basu A, Baxendale J, Blick L, Caminha M, Carnes-Stine J, Caulk P, Chiang YH, Coyne M, Dahlke C, Mays A, Dombroski M, Donnelly M, Ely D, Esparham S, Fosler C, Gire H, Glanowski S, Glasser K, Glodek A, Gorokhov M, Graham K, Gropman B, Harris M, Heil J, Henderson S, Hoover J, Jennings D, Jordan C, Jordan J, Kasha 
J, Kagan L, Kraft C, Levitsky A, Lewis M, Liu X, Lopez J, Ma D, Majoros W, McDaniel J, Murphy S, Newman M, Nguyen T, Nguyen N, Nodell M, Pan S, Peck J, Peterson M, Rowe W, Sanders R, Scott J, Simpson M, Smith T, Sprague A, Stockwell T, Turner R, Venter E, Wang M, Wen M, Wu D, Wu M, Xia A, Zandieh A, Zhu X, 2001. The sequence of the human genome. Science 291, 1304-1351. Young ND, Jex AR, Li B, Liu S, Yang L, Xiong Z, Li Y, Cantacessi C, Hall RS, Xu X, Chen F, Wu X, Zerlotini A, Oliveira G, Hofmann A, Zhang G, Fang X, Kang Y, Campbell BE, Loukas A, Ranganathan S, Rollinson D, Rinaldi G, Brindley PJ, Yang H, Wang J, Wang J, Gasser RB, 2012. Whole-genome sequence of Schistosoma haematobium. Nat. Genet. 44, 221-225. Young ND, Nagarajan N, Lin SJ, Korhonen PK, Jex AR, Hall RS, Safavi-Hemami H, Kaewkong W, Bertrand D, Gao S, Seet Q, Wongkham S, Teh BT, Wongkham C, Intapan PM, Maleewong W, Yang X, Hu M, Wang Z, Hofmann A, Sternberg PW, Tan P, Wang J, Gasser RB, 2014. The Opisthorchis viverrini genome provides insights into life in the bile duct. Nat. Commun. 5, 4378. Zagorska A, Deak M, Campbell DG, Banerjee S, Hirano M, Aizawa S, Prescott AR, Alessi DR, 2010. New roles for the LKB1-NUAK pathway in controlling myosin phosphatase complexes and cell adhesion. Sci. Signal. 3, ra25. Zhu XQ, Korhonen PK, Cai H, Young ND, Nejsum P, von Samson-Himmelstjerna G, Boag PR, Tan P, Li Q, Min J, Yang Y, Wang X, Fang X, Hall RS, Hofmann A, Sternberg PW, Jex AR, Gasser RB, 2015. Genetic blueprint of the zoonotic pathogen Toxocara canis. Nat. Commun. 6, 6145.


Table 3.1 Scoring categories used to rank Haemonchus contortus kinases and prioritise them as potential drug targets. Transcription in at least one parasitic stage was required for a kinase to be considered as a target. Similarity to kinase sequences of one or two related strongylid nematodes (Dictyocaulus filaria and/or Teladorsagia circumcincta) of sheep was rewarded with half a point per species. All other categories were given one point.

Scoring category                                                    Score
Transcribed in parasitic stage (L4 and/or adults)                   required
Lethal phenotype in Caenorhabditis elegans homolog upon RNAi        1
Associated with a unique function in a pathway (‘chokepoint’)       1
Unique classification (group/family/subfamily)                      1
> 80% sequence similarity and > 50% coverage to C. elegans homolog  1
≤ 41.45% sequence similarity to closest homolog in sheep            1
> 80% sequence similarity and > 50% coverage to D. filaria          0.5-1
  and/or Te. circumcincta
Associated compound in Kinase SARfari                               1
Associated compound in DrugBank                                     1
> 5 associated compounds in Kinase SARfari                          1
> 5 associated compounds in DrugBank                                1
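The scoring scheme of Table 3.1 can be illustrated with a minimal sketch. It assumes that each kinase is described by a simple record; the field names, the function name and the example values are hypothetical and are not taken from the thesis workflow. Transcription in a parasitic stage is treated as a requirement (no points), similarity to strongylid kinases contributes half a point per species, and every other category contributes one point.

```python
def priority_score(kinase):
    """Score one kinase record following the categories of Table 3.1.

    `kinase` is a hypothetical dict; None is returned when the kinase is not
    transcribed in a parasitic stage (a requirement rather than a point).
    """
    if not kinase["transcribed_in_parasitic_stage"]:
        return None
    one_point_flags = (
        "lethal_rnai_in_c_elegans",
        "chokepoint",
        "unique_classification",
        "similar_to_c_elegans",
        "dissimilar_to_sheep",
    )
    score = float(sum(1 for flag in one_point_flags if kinase[flag]))
    # Half a point per similar strongylid species (0, 1 or 2 species).
    score += 0.5 * kinase["n_similar_strongylids"]
    # One point for any associated compound, one more for > 5 compounds.
    for database in ("kinase_sarfari_compounds", "drugbank_compounds"):
        n = kinase[database]
        score += (1 if n >= 1 else 0) + (1 if n > 5 else 0)
    return score


# Invented example record, for illustration only.
example = {
    "transcribed_in_parasitic_stage": True,
    "lethal_rnai_in_c_elegans": True,
    "chokepoint": False,
    "unique_classification": True,
    "similar_to_c_elegans": True,
    "dissimilar_to_sheep": False,
    "n_similar_strongylids": 1,
    "kinase_sarfari_compounds": 0,
    "drugbank_compounds": 17,
}
print(priority_score(example))
```

Under these assumptions, the highest attainable score is 10 (five one-point categories, one point for two strongylid species and four compound-related points).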


Table 3.2 Characteristics of the Haemonchus contortus kinome, and comparison with the kinome of Caenorhabditis elegans. Groups that differ by more than 20 kinases between the two species are marked (*). Figures given here include transcript isoforms (Appendix 3.1).

Kinase group    Number of Haemonchus contortus kinases    Number of Caenorhabditis elegans kinases
CMGC            43                                        48
CAMK            60*                                       40
AGC             41                                        29
Other           46*                                       67
TK              56*                                       84
TKL             24                                        15
STE             27                                        24
CK1             61*                                       83
RGC             28                                        27
Atypical        23                                        17
Unclassified    23                                        0
Total           432                                       434


Table 3.3 Ten prioritised (highest-scoring) kinase targets in Haemonchus contortus, and three predicted novel targets. The table shows the closest homolog in Caenorhabditis elegans, kinase classification, score and number of associated small-molecule compounds in the DrugBank and/or Kinase SARfari databases for each target. Predicted novel targets are marked (*). Individual compound codes can be accessed from Appendices 3.5 and 3.6.

H. contortus  Closest
kinase        C. elegans
identifier    homolog     Description (a)                                       Classification  Score  Number of associated compounds in DrugBank/Kinase SARfari
Hc-PK-358.1   CSK-1       C-terminal Src kinase                                 TK/CSK          8      12/735
Hc-PK-062.1   CDK-12      Cyclin-dependent kinase                               CMGC/CDK/CRK7   7.5    17/0
Hc-PK-063.1   CDK-5       Cyclin-dependent kinase                               CMGC/CDK/CDK5   7.5    7/655
Hc-PK-236.1   CDK-9       Cyclin-dependent kinase                               CMGC/CDK/CDK9   7.5    1/655
Hc-PK-197.1   SEK-1       SAPK/ERK kinase                                       STE/STE7/MEK3   7.5    10/0
Hc-PK-263.1   WTS-1       WarTS (Drosophila) protein kinase homolog             AGC/NDR/LATS    7      4/0
Hc-PK-002.1   CDK-7       Cyclin-dependent kinase                               CMGC/CDK/CDK7   7      2/0
Hc-PK-199.1   GSK-3       Glycogen synthase kinase                              CMGC/GSK        7      19/1
Hc-PK-210.1   LIT-1       “Loss of intestine”; serine-threonine protein kinase  CMGC/MAPK/NMO   7      14/0
Hc-PK-261.1   DLK-1       Dual-leucine zipper kinase                            TKL/MLK/LZK     7      46/0
Hc-PK-165.1*  CSNK-1      Casein kinase                                         CK1/CK1/CK1-G   4.5    3/0
Hc-PK-092.1*  RSKS-1      Ribosomal protein S6 kinase                           AGC/RSK/RSKP70  4      4/0
Hc-PK-262.1*  CHK-2       Checkpoint kinase                                     CAMK/RAD53      4      1/0

(a) https://www.wormbase.org/


[Figure 3.1 image. Panel A: Cluster 1 (n = 74), Cluster 2 (n = 5), Cluster 5 (n = 3), Cluster 6 (n = 28). Panel B: Cluster 3 (n = 19), Cluster 4 (n = 50). Panel C: Cluster 7 (n = 1), Cluster 8 (n = 1).]

Figure 3.1 Selected clusters of transcription profiles for kinase genes based on the Ward-clustering method (k = 15). Male-enriched clusters (A), egg-enriched clusters (B) and L3-enriched clusters (C) are shown. Transcription levels are represented as transcripts per million (TPM) values on the Y-axis (scaled individually according to the highest value within each cluster) and developmental stages (egg, L1-L4 and adult) of Haemonchus contortus are shown on the X-axis. Shaded lines represent individual transcription profiles; bold lines represent the Lowess trend line ± S.D. (dashed lines). For the L4 and adult stages, both sexes are plotted (red, female; blue, male). The complete set of 15 clusters is shown in Appendix 3.4.
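To make the clustering principle behind Figure 3.1 concrete, the following is a naive sketch of agglomerative clustering with Ward's criterion, in which the two clusters whose merger causes the smallest increase in within-cluster sum of squares are joined until k clusters remain. It is an illustration only: the function name and the toy profiles are invented, the implementation is not the software used for the analyses, and any scaling or transformation of TPM values prior to clustering is not represented here.

```python
from itertools import combinations


def ward_clusters(profiles, k):
    """Group rows of `profiles` into k clusters using Ward's merge criterion.

    Returns a list of k sorted lists of row indices.
    """
    # Each cluster is stored as (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(profiles)]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            (members_a, centroid_a), (members_b, centroid_b) = clusters[a], clusters[b]
            n_a, n_b = len(members_a), len(members_b)
            dist2 = sum((x - y) ** 2 for x, y in zip(centroid_a, centroid_b))
            # Increase in within-cluster sum of squares if a and b are merged.
            cost = n_a * n_b / (n_a + n_b) * dist2
            if best is None or cost < best[0]:
                best = (cost, a, b)
        _, a, b = best
        (members_a, centroid_a), (members_b, centroid_b) = clusters[a], clusters[b]
        n_a, n_b = len(members_a), len(members_b)
        merged_centroid = [
            (n_a * x + n_b * y) / (n_a + n_b) for x, y in zip(centroid_a, centroid_b)
        ]
        merged = (members_a + members_b, merged_centroid)
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return [sorted(members) for members, _ in clusters]


# Invented toy profiles (three stages per gene), for illustration only.
toy_profiles = [[10, 1, 1], [11, 1, 0], [0, 1, 12], [1, 0, 11]]
print(ward_clusters(toy_profiles, k=2))
```

For data sets of realistic size, an established library implementation of hierarchical clustering would be preferable to this cubic-time loop.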


[Figure 3.2 image. Axes: score; number of transcribed kinase genes (L4 and/or adult). Labels of highlighted kinases: TK; CMGC [3], STE [1]; AGC [1], CMGC [3], TKL [1]; CK1; AGC, CAMK.]

Figure 3.2 Prediction of druggable kinases of Haemonchus contortus based on ranking score. Kinases prioritised as either ‘repurposing’ targets or novel targets are highlighted in red and black, respectively. The group classification is indicated for each kinase; the total numbers of kinases with a particular score are indicated in square brackets.


CHAPTER 4 Analyses of compact Trichinella kinomes reveal a MOS-like protein kinase with a unique N-terminal domain ______

Abstract

Parasitic worms of the genus Trichinella (phylum Nematoda; class Enoplea) represent a complex of at least twelve taxa that infect a range of different host animals, including humans, around the world. They are foodborne, intracellular nematodes, and their life cycles differ substantially from those of other nematodes. The recent characterisation of the genomes and transcriptomes of all twelve recognised taxa of Trichinella now allows, for the first time, detailed studies of their molecular biology. In the present study, we defined, curated and compared the protein kinase complements (kinomes) of Trichinella spiralis and T. pseudospiralis using an integrated bioinformatic workflow employing transcriptomic and genomic data sets. We examined how variation in the kinome might link to unique aspects of Trichinella morphology, biology and evolution. Furthermore, we utilised in silico structural modelling to discover and characterise a novel, MOS-like kinase with an unusual, previously undescribed N-terminal domain. Taken together, the present findings provide a basis for comparative investigations of nematode kinomes, and might facilitate the identification of Enoplea-specific intervention and diagnostic targets. Importantly, the in silico-modelling approach assessed here provides an exciting prospect of being able to identify and classify currently unknown (‘orphan’) kinases, as a foundation for their subsequent structural and functional investigation.
______


4.1 Introduction

The phylum Nematoda (roundworms) includes a wide range of parasitic and free-living species with extensive biological and genetic diversity. Based on recent molecular studies, nematodes can be separated into multiple classes, including the Chromadorea and the Enoplea (see Blaxter et al., 1998; Roberts and Janovy Jr., 2009; Blaxter and Koutsovoulos, 2015). Within the latter class, the genus Trichinella forms a complex of at least twelve species and genotypes. These parasites infect a range of different host animals and have a worldwide distribution (Pozio, 2007). Hosts become infected through the ingestion of muscle tissue containing first-stage larvae (L1s). In the gut, L1s are released from muscle cells, invade the small intestinal mucosa, and undergo a series of rapid moults (within ~30 h) to develop into male and female adults (Despommier, 1993). Fertilised female worms release L1s into the lymph lacteals within ~1 week; these larvae are then disseminated via the blood stream into striated muscles, where they penetrate individual muscle cells. Over the course of 15 to 20 days, the host cell is transformed into what is known as a “nurse cell”. The parasite-host cell complex is surrounded by a collagen capsule for some (encapsulated) species, including T. spiralis; however, this capsule is absent from other (non-encapsulated) taxa, such as T. pseudospiralis (Despommier, 1993, 1998; Pozio et al., 2001). This intracellular environment allows Trichinella larvae to survive for up to 20 years until ingested by a susceptible host (Pozio and Murrell, 2006; Pozio and Zarlenga, 2013).

The life cycles of Trichinella spp. differ markedly from those of most other parasitic nematodes in several respects. First, Trichinella L1s are unique in that they are adapted to survive inside host cells. Second, these parasites do not have free-living stages in their life cycle, and the adult female worms release live L1s, in contrast to many other parasitic nematodes, which release eggs into the environment (Roberts and Janovy Jr., 2009). Third, the biological diversity within the Trichinella complex reflects considerable genetic variability among taxa, and different host affiliations and geographic distributions (Pozio, 2007; Korhonen et al., 2016). The underlying molecular mechanisms responsible for these variable traits and habitats are still relatively poorly understood (Pozio and Zarlenga, 2013).

Recently, advanced sequencing and bioinformatic technologies have enabled the characterisation of the genomes and transcriptomes of all individual Trichinella taxa (Mitreva et al., 2011; Korhonen et al., 2016). The reported draft genomes for T. spiralis and T. pseudospiralis are 50 Mb and 49.2 Mb in size and contain 14,745 and 12,699 protein-coding genes, respectively (Korhonen et al., 2016). These genomic and transcriptomic data sets represent an invaluable resource for addressing some vexed questions surrounding the molecular biology of members of this species complex, in particular cellular signalling mechanisms, some of which have been investigated with regard to their involvement in remodelling of the host cell (Capo et al., 1998; Despommier, 1998; Gounaris et al., 2001) and/or other host-parasite interactions (Arden et al., 1997; Smith et al., 2000; Gounaris et al., 2001).

In general, most signalling cascades rely heavily on protein kinases, which are important enzymes that phosphorylate one or more substrate proteins, leading to the downstream activation or inactivation of molecular signalling partners (Cohen, 2000; Ubersax and Ferrell Jr., 2007). This process plays a central role in a wide range of biological processes, including the regulation of transcription, cell cycle, differentiation and apoptosis (Cohen, 2000; Ubersax and Ferrell Jr., 2007). Protein kinases can be classified into nine key groups, which are subdivided into families and subfamilies, based on sequence similarity in the catalytic kinase domain and functional domain architecture. A tenth group (‘atypical’) consists of diverse protein kinases that show limited or no structural similarity to the protein kinase-like fold, but are active protein kinases (Hanks et al., 1988; Hanks and Hunter, 1995; Manning, 2005). Although many protein kinases are relatively conserved and often assume essential functions in all eukaryotes, some subsets are expanded in and/or specific to particular taxa (Manning, 2005). For example, there are major expansions of several groups/families in the kinome of the free-living nematode Caenorhabditis elegans (see Plowman et al., 1999; Manning, 2005), with some expansions specific to this nematode, and others that appear to be shared by a range of nematode species excluding T. spiralis (see Desjardins et al., 2013).

An initial assessment of a draft genome of T. spiralis (see Mitreva et al., 2011) suggested a substantial reduction in the protein kinase complement (kinome) of this species compared with the draft kinomes of other parasitic and free-living nematodes (Desjardins et al., 2013). Whether this apparent reduction reflects the lifestyle and unique biology of T. spiralis or an incomplete genomic assembly (Desjardins et al., 2013) is not yet clear. To address some of these questions, we used transcriptomic and genomic data sets for the two best-studied representatives of Trichinella, namely T. spiralis and T. pseudospiralis (see Korhonen et al., 2016), to (i) define, curate and compare the protein kinase complements of these two species, (ii) explore quantitative and qualitative diversification of protein kinases in the Trichinella complex in comparison with other nematodes, and (iii) discuss how variation in kinase complements might link to the uniqueness of encapsulated and non-encapsulated species.

4.2 Materials and methods

4.2.1 Defining kinomes

We used published genomic and transcriptomic data of representatives of encapsulated (T. spiralis; designated T1) and non-encapsulated (T. pseudospiralis; designated T4.1) Trichinella species (Korhonen et al., 2016) as well as a published kinase sequence data set (Desjardins et al., 2013) predicted from independent T. spiralis genomic data (Mitreva et al., 2011). We selected data sets of these two species as representatives because of the high quality of respective genomic and transcriptomic assemblies (Korhonen et al., 2016) and the extent of experimental work conducted on T. spiralis and T. pseudospiralis (see Pozio and Murrell, 2006; Pozio and Zarlenga, 2013). First, from a total of 14,745 (T. spiralis) and 12,699 (T. pseudospiralis) inferred amino acid sequences (Korhonen et al., 2016), we predicted protein kinase candidates using InterProScan v.5.15.54 (Jones et al., 2014), employing information from domain-matches against the databases Pfam v.27.0 (Sonnhammer et al., 1997), PANTHER v.9.0 (Mi et al., 2013) and SUPERFAMILY v.1.75 (Gough et al., 2001). We selected the longest protein sequence of each predicted kinase, and assessed kinase candidates with incomplete or unusual domain architectures by comparisons with known kinase domain combinations, as reported in InterProScan and KinBase (http://kinase.com/web/current/kinbase/; accessed: 1 May 2016) and by literature searches. Second, we identified groups of orthologs among the kinase sequences of T. spiralis and T. pseudospiralis using the program OrthoMCL v.2.0.4 (Li et al., 2003), applying a BLAST E-value cut-off of 10e-5 and a stringent similarity cut-off of 80%. We then mapped ‘ungrouped’ (i.e. without an ortholog) protein sequences to the genomic scaffolds of the other species using the program BLAT v.34x12 (Kent, 2002).
Using the query sequences as templates, we then improved/complemented individual gene structures (intron-exon boundaries) by mapping to respective genomic regions using the program Exonerate v.2.2.0 (Slater and Birney, 2005), employing the “multi-pass suboptimal alignment” parameter and the “protein2genome:bestfit” model. If complete open reading frames (ORFs) could not be identified, we matched both the coding sequence - inferred using Exonerate - and the amino acid template with a de novo-assembled transcript (Korhonen et al., 2016) using nucleotide BLAST v.2.2.28+ (E-value of ≤ 10e-30; Camacho et al., 2009) and BLAT, respectively. Then, we mapped matching transcripts (≥ 80% identity, with ≥ 80% of the original sequence length aligned) to genomic scaffolds to define their gene structures.
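The transcript-matching thresholds stated above (≥ 80% identity, with ≥ 80% of the original sequence length aligned) can be expressed as a simple filter. The following is a minimal sketch only, assuming that alignment results have already been parsed into records; the names Hit, passes_thresholds and select_matching_transcripts are hypothetical and are not part of the published workflow.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One query-versus-transcript alignment record (hypothetical layout)."""
    query_id: str
    transcript_id: str
    percent_identity: float    # 0-100
    aligned_query_length: int  # query positions covered by the alignment
    query_length: int          # full length of the original (query) sequence


def passes_thresholds(hit: Hit,
                      min_identity: float = 80.0,
                      min_coverage: float = 0.80) -> bool:
    """Apply the two cut-offs: identity and fraction of the query aligned."""
    coverage = hit.aligned_query_length / hit.query_length
    return hit.percent_identity >= min_identity and coverage >= min_coverage


def select_matching_transcripts(hits: List[Hit]) -> List[Hit]:
    """Keep only the hits that satisfy both thresholds."""
    return [h for h in hits if passes_thresholds(h)]


if __name__ == "__main__":
    # Invented example records, for illustration only.
    example = [
        Hit("query_1", "transcript_a", 95.0, 900, 1000),
        Hit("query_1", "transcript_b", 79.9, 1000, 1000),
        Hit("query_2", "transcript_c", 85.0, 700, 1000),
    ]
    for hit in select_matching_transcripts(example):
        print(hit.query_id, hit.transcript_id)
```

Keeping the two cut-offs as parameters makes it straightforward to test how sensitive the set of retained transcripts is to the chosen thresholds.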

Third, we created pairwise global alignments of individual amino acid query sequences and all inferred heterologous sequences thereof employing the program EMBOSS Needle v.6.4.0.0 (Rice et al., 2000), retaining the sequence with the best alignment (spanning ≥ 80% of the query sequence). Fourth, we classified all curated sequences using the program Kinannote (Goldberg et al., 2013), excluding partial sequences of < 200 amino acids in length. Sequences not classified using Kinannote were assessed for homology to those represented in well-curated kinomes of C. elegans (see Manning, 2005) and Homo sapiens (see Manning et al., 2002a) using OrthoMCL (BLAST E-value of ≤ 10e-5; sequence similarity ≥ 80%); such sequences were assigned to a particular kinase family/subfamily if all H. sapiens and/or C. elegans sequences in an OrthoMCL cluster had the same family/subfamily classification. If sequences could not be assigned using this approach, domain architecture information (for atypical protein kinases; aPKs) or phylogenetic analysis (for eukaryotic protein kinases; ePKs) was used to classify them.
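The cluster-based assignment rule described above (a sequence inherits a family/subfamily only if all reference sequences in its OrthoMCL cluster agree) can be illustrated with a short sketch. This is an assumption-laden illustration rather than the code used in this study: the function name assign_family, the identifiers and the dictionary layout are all hypothetical.

```python
from typing import Dict, Iterable, Optional


def assign_family(cluster_members: Iterable[str],
                  reference_classification: Dict[str, str]) -> Optional[str]:
    """Return the classification shared by every reference sequence in a cluster.

    cluster_members: sequence identifiers grouped in one ortholog cluster.
    reference_classification: maps reference (e.g. H. sapiens or C. elegans)
        identifiers to their curated family/subfamily label.
    Returns None if the cluster holds no reference sequence or if the
    reference sequences disagree, in which case another line of evidence
    (domain architecture or phylogeny) would be needed.
    """
    labels = {
        reference_classification[member]
        for member in cluster_members
        if member in reference_classification
    }
    if len(labels) == 1:
        return next(iter(labels))
    return None


if __name__ == "__main__":
    # Invented identifiers and labels, for illustration only.
    references = {"cel_0001": "CK1/TTBK", "hsa_0001": "CK1/TTBK", "hsa_0002": "TK/FER"}
    print(assign_family(["worm_0001", "cel_0001", "hsa_0001"], references))
    print(assign_family(["worm_0002", "hsa_0001", "hsa_0002"], references))
```

Returning None for ambiguous clusters, instead of picking the most frequent label, mirrors the conservative wording of the rule.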

4.2.2 Phylogenetic analysis

We used the catalytic domains (Pfam identifier PF07714 for tyrosine kinases (TK) and receptor guanylate cyclases (RGC); PF00069 for all other kinase groups) of all ePK sequences, in order to construct multiple sequence alignments for individual groups using the program hmmalign within the package HMMER v.3.1b1 (http://hmmer.janelia.org/). Then, we constructed phylogenetic trees (Bayesian inference) using the program MrBayes v.3.2.2 (Ronquist et al., 2012), employing a mixture of models with fixed rate matrices to calculate posterior probabilities. To construct majority rule trees, 1,000,000 trees were generated, of which every 100th tree was sampled after discarding the first 25% of trees as ‘burn-in’. Trees were drawn and annotated using the programs FigTree v.1.4.1 (http://tree.bio.ed.ac.uk/software/figtree/) and Inkscape (http://www.inkscape.org/en/), respectively.
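As a worked illustration of the sampling scheme described above, the number of trees contributing to a majority rule tree can be computed from the three stated values (1,000,000 trees generated, every 100th sampled, first 25% discarded). The sketch below assumes that the burn-in fraction is applied to the sampled trees; discarding the first 25% of generated trees before sampling gives the same result for these values. The function name is hypothetical.

```python
def retained_samples(total_generated: int,
                     sample_every: int,
                     burnin_fraction: float) -> int:
    """Number of sampled trees left after discarding the burn-in."""
    sampled = total_generated // sample_every   # trees kept by thinning
    discarded = int(sampled * burnin_fraction)  # burn-in portion of those
    return sampled - discarded


if __name__ == "__main__":
    print(retained_samples(1_000_000, 100, 0.25))
```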

4.2.3 Functional and structural annotation of kinase sequences

We functionally annotated the curated kinase sequences by integrating information from InterProScan (gene ontology (GO) terms, Pfam, PANTHER and SUPERFAMILY identifiers) and then assigned kinase sequences to biochemical pathways based on matches (protein BLAST v.2.2.28+; E-value of ≤ 10e-5; Camacho et al., 2009) to the KEGG database (release 27 August 2014; Kanehisa and Goto, 2000). The subcellular localisation of kinases was

inferred using the program MultiLoc2 (release 26 October 2009; Blum et al., 2009). Three-dimensional structures of select kinases were predicted using the program I-TASSER v.4.4 (Roy et al., 2010); structures were visualised and superimposed using the MatchMaker function within the program UCSF Chimera v.1.9 (build 39798; Pettersen et al., 2004).

4.3 Results

4.3.1 The protein kinase complements of T. spiralis and T. pseudospiralis

Employing our integrated bioinformatic workflow, we defined, curated and annotated the kinase complements of T. spiralis and T. pseudospiralis. The kinomes of T. spiralis and T. pseudospiralis represented 226 and 232 kinases, respectively, of which 205 and 212 sequences were classified as ePKs (Table 4.1). For both species, the kinase complement represented all nine currently recognised groups; 193-198 sequences (93-94%) were assigned to 79 distinct families, and 93 kinase sequences (44-45%) could be classified into 81 subfamilies (Table 4.1). However, 12 and 14 ePKs (5.3-6%) could not be classified beyond the group level for T. spiralis and T. pseudospiralis, respectively, and were thus assigned to the groups AGC (n = 1), “Other” (n = 1), CK1 (n = 5-7) and TK (n = 5) (Figure 4.1; Appendix 4.1). For both Trichinella species, most protein kinase families or subfamilies (n = 112) contained a single ePK representative (Table 4.2; Appendices 4.2 and 4.3); the remainder (n = 26) contained two to six ePK representatives, with the exception of the TTBK family (n = 19 and 23 for T. spiralis and T. pseudospiralis, respectively). Both species encoded all ‘core’ ePKs (Goldberg et al., 2013), except for a RAD53 kinase ortholog (CAMK group), which was also absent from all other recognised species or genotypes of Trichinella (see Korhonen et al., 2016). Other kinase families, members of which are not universally conserved among eukaryotes, were also absent, including members of the NEK family (Quarmby and Mahjoub, 2005) as well as two growth factor receptors, VEGFR and PDGFR (Manning et al., 2002b). Furthermore, a homolog of the TTK kinase MPS-1 was not detected in any of the currently recognised Trichinella taxa, supporting a previous proposal that this kinase is not encoded in the T. spiralis genome (Desjardins et al., 2013).
In addition to ePKs, we identified and curated 21 and 20 atypical protein kinases (aPKs) for T. spiralis and T. pseudospiralis, respectively, and assigned them to 17 families/subfamilies (Table 4.3). Both species have single-copy genes encoding ABC1-A, ABC1-B, Alpha, DHS-27, DNAPK, FRAP, PI4K, RIO1, RIO2, RIO3, TAF1 and TRRAP

kinases. Other aPK families/subfamilies, including bromodomain kinases, pyruvate dehydrogenase kinases, ataxia-telangiectasia mutated (ATM)-related kinases and uncharacterised nuclear-hormone receptor-related kinases, are encoded by multiple (n = 2-3) genes (Table 4.3; Appendices 4.2 and 4.3). A pairwise comparison of both ePKs and aPKs of T. spiralis and T. pseudospiralis revealed 212 kinase sequences to be single-copy orthologs (Figure 4.1; Appendices 4.2 and 4.3); 14 and 20 sequences represented clusters of multiple homologs in T. spiralis and T. pseudospiralis, respectively. A global pairwise comparison of orthologous kinase sequences of the two species revealed a mean identity of 85% and a mean similarity of 89%. Despite this high identity/similarity for the majority of kinases, several small kinase expansions (Figure 4.1; Appendices 4.1C and 4.1H) within the families TTBK (CK1 group) and FER (TK group) were evident for T. pseudospiralis.
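The kinase counts reported in this section can be cross-checked with simple arithmetic. The sketch below is an illustration only: it uses the numbers given in the text, assumes that the reported percentages are relative to the complete kinome of each species, and uses hypothetical names (KINOME_COUNTS, is_consistent, percent).

```python
# Counts taken from the text of Section 4.3.1; the dictionary keys are
# hypothetical labels chosen for this illustration.
KINOME_COUNTS = {
    "T. spiralis": {
        "total": 226, "ePK": 205, "aPK": 21,
        "single_copy": 212, "multi_copy": 14, "group_level_only": 12,
    },
    "T. pseudospiralis": {
        "total": 232, "ePK": 212, "aPK": 20,
        "single_copy": 212, "multi_copy": 20, "group_level_only": 14,
    },
}


def is_consistent(counts: dict) -> bool:
    """Check that the two partitions of a kinome each sum to its total."""
    by_type = counts["ePK"] + counts["aPK"] == counts["total"]
    by_copy = counts["single_copy"] + counts["multi_copy"] == counts["total"]
    return by_type and by_copy


def percent(part: int, whole: int) -> float:
    """Percentage of 'whole' represented by 'part', rounded to one decimal."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    for species, counts in KINOME_COUNTS.items():
        print(species,
              is_consistent(counts),
              percent(counts["group_level_only"], counts["total"]),
              percent(counts["single_copy"], counts["total"]))
```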

4.3.2 Functional annotation of Trichinella kinomes

To annotate all 458 identified kinase sequences from both species, we predicted protein function (GO terms), biochemical pathways (KEGG) and protein domains. Most kinase sequences were predicted to assume one or more of the following functions: “protein phosphorylation” (GO:0006468; n = 413), “protein kinase activity” (GO:0004672; n = 409), “ATP-binding” (GO:0005524; n = 346) and “protein-binding” (GO:0005515; n = 96). Approximately half of all sequences (n = 106-108; 47%) were assigned to one or more biological (KEGG) pathways. Interestingly, although no RGC kinases were linked to environmental information processing, as they are in other nematodes (Manning, 2005; Ortiz et al., 2009; Adachi et al., 2010; Stroehlein et al., 2015), 66 kinases in all other groups were (AGC, n = 13; Atypical, n = 3; CAMK, n = 11; CK1, n = 3; CMGC, n = 7-8; Other, n = 2; STE, n = 10; TK, n = 10-11; TKL, n = 6). In contrast, for both Trichinella species, the smallest number of kinases (n = 8) was associated with the KEGG category “metabolism”. The numbers of sequences assigned to the remaining KEGG categories “organismal systems”, “cellular processes” and “genetic information processing” ranged from 18 to 63 (Appendices 4.2 and 4.3). The prediction of functional domains showed that 447 sequences contained a protein kinase-like domain (IPR011009; SSF56112), 324 a protein kinase domain (IPR000719; PF00069), 85 a protein tyrosine kinase domain (IPR001245; PF07714) and 64 a casein kinase-like domain (PTHR11909). Other assigned domains included the protein-protein interaction domains SH2 (IPR000980; SSF55550; n = 21) and SH3 (IPR001452; SSF50044;

n = 10), as well as “fibronectin type III” (IPR003961; SSF49265; n = 18), “armadillo-type fold” (IPR016024; n = 17) and “immunoglobulin-like” domains (IPR007110; IPR013098; PF07679; SSF48726; n = 14), all of which are accessory domains frequently found in protein kinase sequences (Manning et al., 2002a). In addition to these domains, we identified a pair of kinase sequences (T01_6895 and T4A_2523) in the “Other” group that could not be classified and/or annotated further, but had been proposed previously (Desjardins et al., 2013) to represent a novel kinase in T. spiralis (Tsp_04914). Their amino acid sequences did not match any domains other than the kinase catalytic domain (PF00069), which shared some sequence similarity with that of MOS kinases in a range of eukaryotes (90-100% with other Trichinella species, 55-56% with Trichuris species, and 34-43% with various other metazoans). This is the first report of MOS-like kinase catalytic domains for any nematode. However, the N-terminal regions of these kinases were substantially longer than those of all known MOS kinases in KinBase and NCBI-nr (with the exception of one sequence from the sea urchin Strongylocentrotus purpuratus; XP_003729407) and did not match sequences of any taxa other than those of the class Enoplea (i.e. Trichinella and Trichuris spp.; protein BLAST against NCBI-nr, E-value of < 2.5 and nucleotide BLAST against NCBI-nr, E-value of < 10e-5). This finding stimulated us to investigate these unclassified kinase sequences further by subjecting their inferred sequences to structural modelling. The predicted protein structures shared highest structural similarity with that of a mixed lineage kinase domain-like protein of mice (MLKL; Protein Data Bank identifier: 4BTF), with root-mean-square deviation values of 6.6 and 6.4 and TM-scores of 0.73 and 0.74 for T. spiralis and T. pseudospiralis, respectively (Figure 4.2A).
In these two models, the N-terminal region, despite sharing low amino acid sequence identity (8.5-16.7%) and similarity (16.7-29.8%) to the MLKL N-terminus, assumed the same fold as this region of MLKL, forming a helical bundle and a two-helix linker (Figure 4.2B; cf. Murphy et al., 2013). To explore this aspect further and to exclude the possibility that any protein kinase with a similarly-sized (≥ 150 aa) N-terminal region would model to the MLKL structure, we applied I-TASSER to a MOS kinase from the sea urchin, S. purpuratus (XP_003729407), which has a 150 aa-long N-terminus. However, we were not able to model a three-dimensional structure for this N-terminal region (not shown). When we modelled the mouse MOS kinase sequence (P00536.2), it had most structural similarity to the human feline sarcoma viral oncogene homologue (v-FES; 3BKB). Taken together, these results suggest that the two novel, MOS-like kinases, T01_6895 and T4A_2523, are structurally distinct from known MOS kinases.

A detailed appraisal of both Trichinella kinase sequences revealed three conserved motifs, VAIK, HLD and DFG, suggesting that they are active kinases (Figure 4.2C), consistent with the finding that their catalytic domains share most sequence homology with that of MOS kinases. In contrast, the motifs HLD and DFG are not conserved in the pseudokinase MLKL (Murphy et al., 2013; Figures 4.2C and 4.2D). An amino acid sequence alignment of the complete sequences revealed 87 additional motifs or residues that were conserved among mouse, T. spiralis and T. pseudospiralis (Figure 4.2D); the secondary structures predicted from these sequences were also similar among all three species (Figure 4.2D).
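A check for the three motifs named above (VAIK, HLD and DFG) can be sketched as a plain string search. This is a deliberately simplified illustration and not the procedure used here, in which motifs were appraised in the context of an alignment: exact matching ignores permitted residue variation, a match outside the catalytic domain would be a false positive, and the presence of motifs alone does not demonstrate catalytic activity. The function names and the toy sequence are hypothetical.

```python
from typing import Dict, Sequence

CATALYTIC_MOTIFS = ("VAIK", "HLD", "DFG")


def find_motifs(sequence: str,
                motifs: Sequence[str] = CATALYTIC_MOTIFS) -> Dict[str, int]:
    """Return the 0-based position of the first exact match of each motif.

    A position of -1 means that the motif was not found.
    """
    seq = sequence.upper()
    return {motif: seq.find(motif) for motif in motifs}


def has_all_motifs(sequence: str,
                   motifs: Sequence[str] = CATALYTIC_MOTIFS) -> bool:
    """True if every motif occurs at least once in the sequence."""
    return all(pos >= 0 for pos in find_motifs(sequence, motifs).values())


if __name__ == "__main__":
    # Invented toy sequence, not a real kinase.
    toy_sequence = "MKVAIKAAHLDGGDFG"
    print(find_motifs(toy_sequence))
    print(has_all_motifs(toy_sequence))
```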

4.4 Discussion

The reversible phosphorylation of proteins by protein kinases is an essential biochemical process, and is relatively conserved in eukaryotes. However, some kinase groups have evolved to assume new functions in some invertebrate groups (Manning et al., 2002b; Manning, 2005). While several major kinase expansions have been reported for the free-living nematode C. elegans (Manning et al., 2002b; Manning, 2005), there is no information for the vast majority of other members of the phylum Nematoda. As some nematode groups vary considerably in their biology and life cycles, this variability might be reflected in protein kinase complements. Here, we curated and explored in detail the kinomes of representatives of the Trichinella complex, which assume a unique taxonomic position (class Enoplea) within the Nematoda compared with other parasitic and free-living nematodes (class Chromadorea; see Blaxter et al., 1998; Roberts and Janovy Jr., 2009; Blaxter and Koutsovoulos, 2015; Korhonen et al., 2016). In this context, we defined, curated and compared the kinomes of two of the most-studied species within this complex, T. spiralis and T. pseudospiralis (see Pozio and Murrell, 2006; Pozio and Zarlenga, 2013), to provide a basis for quantitative and qualitative comparisons of their kinomes with other nematode species and also for exploring cellular signalling mechanisms in these parasites in the future. The application of our bioinformatic pipeline enabled us to curate and/or improve gene predictions for 32 of 226 (T. spiralis; 14%) and 60 of 232 (T. pseudospiralis; 26%) protein kinase genes, respectively. We reveal that T. spiralis and T. pseudospiralis have remarkably ‘compact’ kinomes - the smallest curated metazoan kinomes to date.
The latter statement is supported by comparisons with draft kinomes (suggested to contain 234-364 kinases) of six other nematodes (Desjardins et al., 2013) and curated kinomes of other metazoans (vinegar fly, mouse and sea urchin; cf. KinBase). Although there is a possibility that the kinase complements inferred for Trichinella might be slightly underestimated, because of a lack of

transcriptomes for all developmental stages of these taxa and/or an inability to identify and classify unknown (‘orphan’) kinases, we offer a number of explanations for these ‘reduced’ kinomes. First, Trichinella could have developed an efficient way of controlling intra- and inter-cellular signalling that is distinct from other metazoans. Second, it is conceivable that Trichinella species rely heavily on host cell pathways for particular metabolic processes and have thus lost kinase genes and/or other genes involved in such processes. This proposal is supported by some evidence for other parasitic worms representing nematodes, trematodes and cestodes for which data are available (Tsai et al., 2013; Tyagi et al., 2015), and by the finding that only eight kinases (representing the seven groups Atypical, CK1, CMGC, Other, RGC, STE and TKL) are linked to metabolism (KEGG category) in Trichinella. In contrast, there was a considerably higher number of kinases (RGC group) in this KEGG category in both the extracellular, parasitic nematode H. contortus (n = 13; Stroehlein et al., 2015) and the free-living nematode C. elegans (n = 23). Whether the lack of kinases, such as RAD53, NEK and TTK, which are principally associated with non-metabolic pathways, can also be attributed to gene loss or simply to an efficient use of alternative or novel pathways, is presently unclear. However, the fact that kinases of the RAD53, NEK and TTK subfamilies play roles in the control of the cell cycle, including arrest upon DNA damage (Pellicioli and Foiani, 2005; Fry et al., 2012; Liu and Winey, 2012), is an example that supports the latter hypothesis, and suggests that, for Trichinella, these mechanisms are regulated by efficient signalling cascades involving small numbers of kinases. Other kinases not encoded in Trichinella, such as the growth factor receptors VEGFR and PDGFR, are also absent from all other nematodes and flatworms studied to date. 
Given that VEGFR-encoding genes are present in the kinomes of human, mouse, sea urchin and vinegar fly (http://kinase.com/web/current/kinbase/genes/Family/VEGFR/), which represent both deuterostomes and protostomes, it is plausible that these kinases have been lost from both the phyla Nematoda and Platyhelminthes during evolution. The reduced kinomes of T. spiralis and T. pseudospiralis might also be partially explained by the unique biology of these species with respect to other nematodes. For instance, the small number of RGCs (n = 3) in Trichinella compares with an expansion of these cyclases in nematodes, such as C. elegans (n = 27) and H. contortus (n = 28), with a free-living phase in their life cycle, where these kinases play important roles in environmental sensing (Ortiz et al., 2009; Adachi et al., 2010). This finding supports the proposal that the small RGC group in Trichinella can be attributed to the unique biology of this species. Nevertheless, as the L1 stage of Trichinella actively interacts with its intracellular environment (Despommier, 1998;

Smith et al., 2000), it is possible that some of the 66 protein kinases inferred to be associated with the processing of environmental cues (cf. Appendices 4.2 and 4.3) are involved in sensing. Although it is not yet clear which of these kinases are involved in pathways linked to sensing, they are distinct from RGCs used by other nematodes and might be those within groups AGC, TK, STE and CAMK. The small number of kinases in most families and subfamilies (the majority being represented by a single kinase) and the finding that > 90% of all identified kinases were present as single-copy orthologs further support the notion of a functionally efficient kinome. One notable exception is the TTBK family, with 19 and 23 members in T. spiralis and T. pseudospiralis, respectively. Although a TTBK-like family of 31 and 16 kinases has been defined for C. elegans and H. contortus, respectively (Manning, 2005; Stroehlein et al., 2015), these are distinct from the recognised TTBK family for Trichinella. This information suggests a distinctiveness of function for TTBK kinases in Trichinella compared with other nematodes. A comparison of the kinomes of the encapsulated (T. spiralis) and non-encapsulated (T. pseudospiralis) taxa indicates that the biological distinctiveness of these species cannot be attributed to kinase signalling alone, given that we did not detect any major differences in the kinase complement between the two species. However, we did detect three differences in the numbers of encoded kinases (Figure 4.1; Appendices 4.2 and 4.3), suggesting that T. pseudospiralis has multiple copies of kinase-encoding genes in the groups CK1 and TK (FER family). Although the present data sets did not allow us to unambiguously verify these apparent gene duplication events in T. pseudospiralis, long-read sequencing technologies (Roberts et al., 2013) should enable genomes of chromosome-level contiguity to be assembled for T. spiralis, T.
pseudospiralis and all other Trichinella taxa in the future. This effort should enable inter-taxon comparisons of these genes and assist in establishing whether these differences are linked to the unique biology of non-encapsulated versus encapsulated taxa. Despite the high, overall similarity of the kinomes of the two Trichinella species, their kinase complements were considerably different from those of other parasitic and free-living nematodes. Although current evidence suggests that a small complement of kinases (~235-250) is typical of enopleans (cf. Schiffer et al., 2013; Jex et al., 2014), further work is required to confirm this proposal. Here, we also provided support that most of the nematode-specific “Worm” families are indeed Caenorhabditis-specific (Appendices 4.4-4.12), with the exception of the Worm6 family, which is also found in Trichinella and other nematodes (see

Desjardins et al., 2013; Stroehlein et al., 2015). In contrast, the small size of the FER family seems to be restricted to Trichinella and also to M. hapla (see Desjardins et al., 2013), with a larger FER family encoded in some filarial and rhabditid nematodes (see Desjardins et al., 2013). The classification of protein kinases based on primary sequence similarity in the catalytic domain and/or accessory domains enabled us to discern differences in the numbers of representatives within families/subfamilies. However, in some instances, functional annotation was compromised because of very limited amino acid sequence similarity (particularly external to the catalytic domain) to C. elegans and H. sapiens homologs for which experimental and functional data are available (cf. Appendices 4.4-4.12). Therefore, we used a three-dimensional modelling approach (employing the program I-TASSER) to infer structural homology of one such amino acid sequence, allowing statistically-supported functional annotation (Zhang and Skolnick, 2004). The presence of an N-terminal helical bundle and linker tethered to the kinase domain of a previously unclassified kinase (i.e. without family or subfamily classification within the “Other” group) in T. spiralis and T. pseudospiralis suggests that this kinase assumes a pore-forming and membrane-permeabilising function, as has been reported for murine and human mixed-lineage kinase domain-like proteins (MLKLs) that also have this N-terminal domain (Murphy et al., 2013; Hildebrand et al., 2014; Su et al., 2014; Wang et al., 2014). Whether the N-terminus of MOS-like kinases in Trichinella plays a functional role in tumor necrosis factor-induced cell death (necroptosis), as MLKLs do in (Newton and Manning, 2016), is not yet known. Homologs of the up-stream kinases, RIP1 and RIP3, which form a complex called the “necrosome” together with MLKL (Murphy et al., 2013), are absent from both T. spiralis and T. pseudospiralis.
In mice, phosphorylation by RIP3 is required for the activation of MLKL and for membrane localisation (Sun et al., 2012; Murphy et al., 2013). However, it is conceivable that, instead of RIP3, one or more serine-threonine protein kinases encoded in the Trichinella genomes might phosphorylate and activate these two novel kinases. Sequence conservation in all motifs linked to kinase activity as well as sequence similarity to the catalytic domain of MOS kinases raises the question as to whether the MOS-like kinases of Trichinella are capable of (auto-)phosphorylation. Although structural similarity in their N-terminus suggests MLKL kinase-like function (Newton and Manning, 2016), they might also act like MOS kinases (i.e. in oocyte maturation; Gebauer and Richter, 1997; Sagata, 1997; Dupré et al., 2011). However, it is also possible that they assume a role that is

distinct from those of both MOS and MLKL proteins. The very low transcription levels of MOS-like kinase genes in the L1 stages of T. spiralis and T. pseudospiralis (see Korhonen et al., 2016) indicate that they are unlikely to play a major role in host-parasite interactions while inside the host cell. A preliminary study of Trichuris suis, a parasitic nematode in the same class (Enoplea) as members of the Trichinella complex, has shown that a MOS-like kinase gene is almost exclusively transcribed in the posterior part of the adult female worm (Jex et al., 2014), which hints at a possible role in reproductive processes. While in mice MLKL does not appear to play a major role in reproduction, with MLKL-deficient mice being viable and without an overt phenotype (Murphy et al., 2013; Wu et al., 2013), another study (Blohberger et al., 2015) showed that the necroptosis pathway is involved in the regulation of cell death in human ovarian cells. Additionally, the catalytic domains of Trichinella MOS-like kinases share more sequence similarity with those of MOS kinases than those of MLKL kinases; given that MOS plays a key role in the meiotic maturation of oocytes (Gebauer and Richter, 1997; Sagata, 1997; Dupré et al., 2011), these kinases might be involved in reproductive or developmental processes in Trichinella. Clearly, future investigations should explore the functional roles of these interesting and apparently enoplean-specific kinases. In conclusion, the present study provides the first detailed information on the kinomes of T. spiralis and T. pseudospiralis as well as a useful, new resource for comparative studies of nematode kinomes. We also report an example of the utility of three-dimensional modelling for the functional annotation of kinases previously unannotatable using conventional bioinformatic techniques.
Taken together, the present results provide a foundation for comparative studies of nematode kinomes and their evolution, and might also enable the identification of enoplean-specific intervention and diagnostic targets. Significantly, the in silico-modelling approach evaluated here might pave the way to being able to identify and classify presently unknown (‘orphan’) kinases in parasitic worms for subsequent structural and functional investigations.

4.5 References Adachi T, Kunitomo H, Tomioka M, Ohno H, Okochi Y, Mori I, Iino Y, 2010. Reversal of salt preference is directed by the insulin/PI3K and Gq/PKC signaling in Caenorhabditis elegans. Genetics 186, 1309-1319. Arden SR, Smith AM, Booth MJ, Tweedie S, Gounaris K, Selkirk ME, 1997. Identification of serine/threonine protein kinases secreted by Trichinella spiralis infective larvae. Mol. Biochem. Parasitol. 90, 111-119. Blaxter M, Koutsovoulos G, 2015. The evolution of parasitism in Nematoda. Parasitology 142 (Suppl. 1), S26-S39. Blaxter ML, De Ley P, Garey JR, Liu LX, Scheldeman P, Vierstraete A, Vanfleteren JR, Mackey LY, Dorris M, Frisse LM, Vida JT, Thomas WK, 1998. A molecular evolutionary framework for the phylum Nematoda. Nature 392, 71-75. Blohberger J, Kunz L, Einwang D, Berg U, Berg D, Ojeda SR, Dissen GA, Frohlich T, Arnold GJ, Soreq H, Lara H, Mayerhofer A, 2015. Readthrough acetylcholinesterase (AChE-R) and regulated necrosis: pharmacological targets for the regulation of ovarian functions? Cell Death Dis. 6, e1685. Blum T, Briesemeister S, Kohlbacher O, 2009. MultiLoc2: integrating phylogeny and Gene Ontology terms improves subcellular protein localization prediction. BMC Bioinformatics 10, 274. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL, 2009. BLAST+: architecture and applications. BMC Bioinformatics 10, 421. Capo VA, Despommier DD, Polvere RI, 1998. Trichinella spiralis: vascular endothelial growth factor is up-regulated within the nurse cell during the early phase of its formation. J. Parasitol. 84, 209-214. Cohen P, 2000. The regulation of protein function by multisite phosphorylation - a 25 year update. Trends Biochem. Sci. 25, 596-601. Desjardins CA, Cerqueira GC, Goldberg JM, Dunning Hotopp JC, Haas BJ, Zucker J, Ribeiro JM, Saif S, Levin JZ, Fan L, Zeng Q, Russ C, Wortman JR, Fink DL, Birren BW, Nutman TB, 2013. Genomics of Loa loa, a Wolbachia-free filarial parasite of humans. Nat. Genet. 45, 495-500. 
Despommier DD, 1993. Trichinella spiralis and the concept of niche. J. Parasitol. 79, 472- 482. Despommier DD, 1998. How does Trichinella spiralis make itself at home? Parasitol. Today 14, 318-323. Dupré A, Haccard O, Jessus C, 2011. Mos in the oocyte: how to use MAPK independently of growth factors and transcription to control meiotic divisions. J. Signal Transduct. 2011, 350412. Fry AM, O'Regan L, Sabir SR, Bayliss R, 2012. Cell cycle regulation by the NEK family of protein kinases. J. Cell Sci. 125, 4423-4433. Gebauer F, Richter JD, 1997. Synthesis and function of Mos: the control switch of vertebrate oocyte meiosis. BioEssays 19, 23-28. Goldberg JM, Griggs AD, Smith JL, Haas BJ, Wortman JR, Zeng Q, 2013. Kinannote, a computer program to identify and classify members of the eukaryotic protein kinase superfamily. Bioinformatics 29, 2387-2394. Gough J, Karplus K, Hughey R, Chothia C, 2001. Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. J. Mol. Biol. 313, 903-919. Gounaris K, Thomas S, Najarro P, Selkirk ME, 2001. Secreted variant of nucleoside diphosphate kinase from the intracellular parasitic nematode Trichinella spiralis. Infect. Immun. 69, 3658-3662.

Hanks SK, Quinn AM, Hunter T, 1988. The protein kinase family: conserved features and deduced phylogeny of the catalytic domains. Science 241, 42-52.
Hanks SK, Hunter T, 1995. Protein kinases 6. The eukaryotic protein kinase superfamily: kinase (catalytic) domain structure and classification. FASEB J. 9, 576-596.
Hildebrand JM, Tanzer MC, Lucet IS, Young SN, Spall SK, Sharma P, Pierotti C, Garnier JM, Dobson RC, Webb AI, Tripaydonis A, Babon JJ, Mulcair MD, Scanlon MJ, Alexander WS, Wilks AF, Czabotar PE, Lessene G, Murphy JM, Silke J, 2014. Activation of the pseudokinase MLKL unleashes the four-helix bundle domain to induce membrane localization and necroptotic cell death. Proc. Natl. Acad. Sci. USA 111, 15072-15077.
Jex AR, Nejsum P, Schwarz EM, Hu L, Young ND, Hall RS, Korhonen PK, Liao S, Thamsborg S, Xia J, Xu P, Wang S, Scheerlinck JP, Hofmann A, Sternberg PW, Wang J, Gasser RB, 2014. Genome and transcriptome of the porcine whipworm Trichuris suis. Nat. Genet. 46, 701-706.
Jones P, Binns D, Chang HY, Fraser M, Li W, McAnulla C, McWilliam H, Maslen J, Mitchell A, Nuka G, Pesseat S, Quinn AF, Sangrador-Vegas A, Scheremetjew M, Yong SY, Lopez R, Hunter S, 2014. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236-1240.
Kanehisa M, Goto S, 2000. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27-30.
Kent WJ, 2002. BLAT - the BLAST-like alignment tool. Genome Res. 12, 656-664.
Korhonen PK, Pozio E, La Rosa G, Chang BCH, Koehler AV, Hoberg EP, Boag PR, Tan P, Jex AR, Hofmann A, Sternberg PW, Young ND, Gasser RB, 2016. Phylogenomic and biogeographic reconstruction of the Trichinella complex. Nat. Commun. 7, 10513.
Li L, Stoeckert Jr. CJ, Roos DS, 2003. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178-2189.
Liu X, Winey M, 2012. The MPS1 family of protein kinases. Annu. Rev. Biochem. 81, 561-585.
Manning G, Whyte DB, Martinez R, Hunter T, Sudarsanam S, 2002a. The protein kinase complement of the human genome. Science 298, 1912-1934.
Manning G, Plowman GD, Hunter T, Sudarsanam S, 2002b. Evolution of protein kinase signaling from yeast to man. Trends Biochem. Sci. 27, 514-520.
Manning G, 2005. Genomic overview of protein kinases. WormBook, ed. The C. elegans Research Community, WormBook, https://doi.org/10.1895/wormbook.1.60.1
Mi H, Muruganujan A, Casagrande JT, Thomas PD, 2013. Large-scale gene function analysis with the PANTHER classification system. Nat. Protoc. 8, 1551-1566.
Mitreva M, Jasmer DP, Zarlenga DS, Wang Z, Abubucker S, Martin J, Taylor CM, Yin Y, Fulton L, Minx P, Yang SP, Warren WC, Fulton RS, Bhonagiri V, Zhang X, Hallsworth-Pepin K, Clifton SW, McCarter JP, Appleton J, Mardis ER, Wilson RK, 2011. The draft genome of the parasitic nematode Trichinella spiralis. Nat. Genet. 43, 228-235.
Murphy JM, Czabotar PE, Hildebrand JM, Lucet IS, Zhang JG, Alvarez-Diaz S, Lewis R, Lalaoui N, Metcalf D, Webb AI, Young SN, Varghese LN, Tannahill GM, Hatchell EC, Majewski IJ, Okamoto T, Dobson RC, Hilton DJ, Babon JJ, Nicola NA, Strasser A, Silke J, Alexander WS, 2013. The pseudokinase MLKL mediates necroptosis via a molecular switch mechanism. Immunity 39, 443-453.
Newton K, Manning G, 2016. Necroptosis and Inflammation. Annu. Rev. Biochem. 85, 743-763.


Ortiz CO, Faumont S, Takayama J, Ahmed HK, Goldsmith AD, Pocock R, McCormick KE, Kunimoto H, Iino Y, Lockery S, Hobert O, 2009. Lateralized gustatory behavior of C. elegans is controlled by specific receptor-type guanylyl cyclases. Curr. Biol. 19, 996-1004.
Pellicioli A, Foiani M, 2005. Signal transduction: how Rad53 kinase is activated. Curr. Biol. 15, R769-R771.
Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, Ferrin TE, 2004. UCSF Chimera - a visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605-1612.
Plowman GD, Sudarsanam S, Bingham J, Whyte D, Hunter T, 1999. The protein kinases of Caenorhabditis elegans: a model for signal transduction in multicellular organisms. Proc. Natl. Acad. Sci. USA 96, 13603-13610.
Pozio E, Zarlenga DS, La Rosa G, 2001. The detection of encapsulated and non-encapsulated species of Trichinella suggests the existence of two evolutive lines in the genus. Parasite 8, S27-S29.
Pozio E, Murrell KD, 2006. Systematics and epidemiology of Trichinella. Adv. Parasitol. 63, 367-439.
Pozio E, 2007. World distribution of Trichinella spp. infections in animals and humans. Vet. Parasitol. 149, 3-21.
Pozio E, Zarlenga DS, 2013. New pieces of the Trichinella puzzle. Int. J. Parasitol. 43, 983-997.
Quarmby LM, Mahjoub MR, 2005. Caught Nek-ing: cilia and centrioles. J. Cell Sci. 118, 5161-5169.
Rice P, Longden I, Bleasby A, 2000. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 16, 276-277.
Roberts LS, Janovy Jr. J, 2009. Gerald D. Schmidt & Larry S. Roberts’ Foundations of Parasitology. McGraw-Hill, New York, USA.
Roberts RJ, Carneiro MO, Schatz MC, 2013. The advantages of SMRT sequencing. Genome Biol. 14, 405.
Ronquist F, Teslenko M, van der Mark P, Ayres DL, Darling A, Hohna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP, 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst. Biol. 61, 539-542.
Roy A, Kucukural A, Zhang Y, 2010. I-TASSER: a unified platform for automated protein structure and function prediction. Nat. Protoc. 5, 725-738.
Sagata N, 1997. What does Mos do in oocytes and somatic cells? BioEssays 19, 13-21.
Schiffer PH, Kroiher M, Kraus C, Koutsovoulos GD, Kumar S, Camps JI, Nsah NA, Stappert D, Morris K, Heger P, Altmüller J, Frommolt P, Nürnberg P, Thomas WK, Blaxter ML, Schierenberg E, 2013. The genome of Romanomermis culicivorax: revealing fundamental changes in the core developmental genetic toolkit in Nematoda. BMC Genomics 14, 923.
Slater GS, Birney E, 2005. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6, 31.
Smith VP, Selkirk ME, Gounaris K, 2000. A reversible protein phosphorylation system is present at the surface of infective larvae of the parasitic nematode Trichinella spiralis. FEBS Lett. 483, 104-108.
Sonnhammer EL, Eddy SR, Durbin R, 1997. Pfam: a comprehensive database of protein domain families based on seed alignments. Proteins 28, 405-420.
Stroehlein AJ, Young ND, Korhonen PK, Jabbar A, Hofmann A, Sternberg PW, Gasser RB, 2015. The Haemonchus contortus kinome - a resource for fundamental molecular investigations and drug discovery. Parasit. Vectors 8, 623.


Su L, Quade B, Wang H, Sun L, Wang X, Rizo J, 2014. A plug release mechanism for membrane permeation by MLKL. Structure 22, 1489-1500.
Sun L, Wang H, Wang Z, He S, Chen S, Liao D, Wang L, Yan J, Liu W, Lei X, Wang X, 2012. Mixed lineage kinase domain-like protein mediates necrosis signaling downstream of RIP3 kinase. Cell 148, 213-227.
Tsai IJ, Zarowiecki M, Holroyd N, Garciarrubio A, Sanchez-Flores A, Brooks KL, Tracey A, Bobes RJ, Fragoso G, Sciutto E, Aslett M, Beasley H, Bennett HM, Cai J, Camicia F, Clark R, Cucher M, De Silva N, Day TA, Deplazes P, Estrada K, Fernandez C, Holland PW, Hou J, Hu S, Huckvale T, Hung SS, Kamenetzky L, Keane JA, Kiss F, Koziol U, Lambert O, Liu K, Luo X, Luo Y, Macchiaroli N, Nichol S, Paps J, Parkinson J, Pouchkina-Stantcheva N, Riddiford N, Rosenzvit M, Salinas G, Wasmuth JD, Zamanian M, Zheng Y, Taenia solium Genome Consortium, Cai X, Soberon X, Olson PD, Laclette JP, Brehm K, Berriman M, 2013. The genomes of four tapeworm species reveal adaptations to parasitism. Nature 496, 57-63.
Tyagi R, Rosa BA, Lewis WG, Mitreva M, 2015. Pan-phylum comparison of nematode metabolic potential. PLoS Negl. Trop. Dis. 9, e0003788.
Ubersax JA, Ferrell Jr. JE, 2007. Mechanisms of specificity in protein phosphorylation. Nat. Rev. Mol. Cell Biol. 8, 530-541.
Wang H, Sun L, Su L, Rizo J, Liu L, Wang LF, Wang FS, Wang X, 2014. Mixed lineage kinase domain-like protein MLKL causes necrotic membrane disruption upon phosphorylation by RIP3. Mol. Cell 54, 133-146.
Wu J, Huang Z, Ren J, Zhang Z, He P, Li Y, Ma J, Chen W, Zhang Y, Zhou X, Yang Z, Wu SQ, Chen L, Han J, 2013. Mlkl knockout mice demonstrate the indispensable role of Mlkl in necroptosis. Cell Res. 23, 994-1006.
Zhang Y, Skolnick J, 2004. Scoring function for automated assessment of protein structure template quality. Proteins 57, 702-710.


Table 4.1 Eukaryotic protein kinases (ePKs) of Trichinella spiralis (T1) and T. pseudospiralis (T4.1).

          Groups                            Families                          Subfamilies
Name      T1    T4.1   Ce    Hc    Hs       T1              T4.1              T1              T4.1
AGC       21    21     29    41    63       11 (20; 95%)    11 (20; 95%)      12 (13; 62%)    12 (13; 62%)
CAMK      26    26     40    60    74       12 (26; 100%)   12 (26; 100%)     11 (13; 50%)    11 (13; 50%)
CK1       33    39^a   83    61    12       3 (28; 85%)     3 (32; 82%)       3 (3; 9%)       3 (3; 8%)
CMGC      28    28     48    43    61       8 (28; 100%)    8 (28; 100%)      20 (22; 79%)    20 (22; 79%)
Other     31    31     67    46    83       18 (30; 97%)    18 (30; 97%)      9 (11; 35%)     9 (11; 35%)
RGC       3     3      27    28    5        1 (3; 100%)     1 (3; 100%)       0 (0; 0%)       0 (0; 0%)
STE       18    18     24    27    47       4 (18; 100%)    4 (18; 100%)      15 (16; 89%)    15 (16; 89%)
TK        29    30^a   84    56    90       16 (24; 83%)    16 (25; 83%)      1 (1; 3%)       1 (1; 3%)
TKL       16    16     15    24    43       6 (16; 100%)    6 (16; 100%)      10 (14; 88%)    10 (14; 88%)
Totals    205   212    417   386   478      79 (193; 94%)   79 (198; 93%)     81 (93; 45%)    81 (93; 44%)

The numbers of kinases in individual groups for T. spiralis, T. pseudospiralis, Caenorhabditis elegans (Ce; see Manning, 2005), Haemonchus contortus (Hc; see Stroehlein et al., 2015) and Homo sapiens (Hs; see Manning et al., 2002a). For Trichinella species, the numbers of unique families and subfamilies and the numbers and percentages of kinases assigned to them (in brackets) are shown.
^a Differences in the number of kinases between T. spiralis and T. pseudospiralis.
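The bracketed percentages in Table 4.1 appear to be the number of kinases assigned to a family (or subfamily) divided by the total number of kinases in the respective group. The short Python sketch below reproduces this arithmetic for the "Totals" row; the function name is illustrative, and rounding half up to the nearest integer is an assumption that happens to match the tabulated values.

```python
def percent_assigned(assigned, total):
    """Percentage of kinases assigned, rounded half up to an integer.

    Integer arithmetic is used so that the result does not depend on
    floating-point rounding behaviour.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return (200 * assigned + total) // (2 * total)


# Counts taken from the 'Totals' row of Table 4.1.
for label, assigned, total in [
    ("families, T. spiralis", 193, 205),
    ("families, T. pseudospiralis", 198, 212),
    ("subfamilies, T. spiralis", 93, 205),
    ("subfamilies, T. pseudospiralis", 93, 212),
]:
    print(f"{label}: {percent_assigned(assigned, total)}%")
```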


Table 4.2 Number of representatives in eukaryotic protein kinase (ePK) families and subfamilies of Trichinella spiralis and T. pseudospiralis.

Number of ePK           Families/subfamilies                                   Number of representatives
families/subfamilies                                                           (T. spiralis/T. pseudospiralis)
112                     see Figure 4.1 and Appendices 4.2 and 4.3              1/1
17                      ALK, AMPK, CDC2, CDK9, CK1, DDR, GSK, INSR, KSR,       2/2
                        LATS, MK2, NRBP, PAKA, PEK, PKD, SCY1, STKR2
5                       MLCK, PKA, RGC, STKR1, ULK                             3/3
2                       AUR, WORM6                                             4/4
1                       FER                                                    5/6
1                       TTBK                                                   19/23


Table 4.3 Atypical protein kinases (aPKs) of Trichinella spiralis (T1) and T. pseudospiralis (T4.1).

Family/subfamily  T1    T4.1  Hc     Ce   Hs   Description                                          Domains present^d
ABC1-A            1     1     1      1    2    Activity of bc1 complex kinase A                     PF03109 (ABC1 family); SSF56112 (Protein kinase-like)
ABC1-B            1     1     2      1    2    Activity of bc1 complex kinase B                     PF03109 (ABC1 family); SSF56112 (Protein kinase-like)
Alpha             1     1     1      1    6    Alpha kinase                                         PF02816 (Alpha-kinase family); PTHR14187 (Alpha kinase/elongation factor 2 kinase); SSF56112 (Protein kinase-like)
BRD               2^a   1     5      1    4    Bromodomain                                          PF00439 (Bromodomain); SSF47370 (Bromodomain)
DHS-27^b          1     1     14^c   0    0    Uncharacterised oxidoreductase                       SSF56112 (Protein kinase-like); IPR012877 (Uncharacterised oxidoreductase)
PDHK              2     2     1      1    1    Pyruvate dehydrogenase kinase                        PTHR11947 (Pyruvate dehydrogenase kinase); SSF56112 (Protein kinase-like)
PI4K^b            1     1     0      0    0    Phosphatidylinositol 4-kinase                        PF00454 (Phosphatidylinositol 3- and 4-kinase); PTHR10048 (Phosphatidylinositol kinase); SSF56112 (Protein kinase-like)
ATM-related       5     5     7      5    6    Ataxia telangiectasia mutated-related (including     PTHR11139 (Ataxia telangiectasia mutated (ATM)-related); PF00454 (Phosphatidylinositol 3- and 4-kinase); SSF56112 (Protein kinase-like)
                                               ATM, ATR, DNAPK, FRAP, SMG1, TRRAP)
RIO1              1     1     1      1    1    Right open reading frame kinase 1                    PF01163 (RIO1 family); PTHR10593 (Serine/threonine-protein kinase RIO); SSF56112 (Protein kinase-like)
RIO2              1     1     1      1    1    Right open reading frame kinase 2                    PF01163 (RIO1 family); PF09202 (RIO2, N-terminal); SSF56112 (Protein kinase-like)
RIO3              1     1     1      1    1    Right open reading frame kinase 3                    PF01163 (RIO1 family); PTHR10593 (Serine/threonine-protein kinase RIO); SSF56112 (Protein kinase-like)
TAF1              1     1     2      1    2    Transcription initiation factor TFIID subunit 1      PTHR13900 (Transcription initiation factor TFIID); PF00439 (Bromodomain); SSF47370 (Bromodomain)
NHRR^b            3     3     14^c   0    0    Uncharacterised nuclear-hormone receptor-related     SSF56112 (Protein kinase-like); PTHR23020 (uncharacterised nuclear hormone receptor-related); PF07914 (DUF1679)
Totals            21    20    37^c   17   38

aPKs were classified using Kinannote, supporting domain annotation from InterProScan and clusters of homologs with H. sapiens (Hs) and C. elegans (Ce; both retrieved from http://kinase.com/web/current/kinbase/).
^a aPK families with differences in the number of kinases between T. spiralis and T. pseudospiralis.
^b Novel kinase families with putative protein kinase activity.
^c In H. contortus (Hc), 14 unclassified sequences have both the DHS-27 domain and the NHRR domain.
^d Complete domain annotations are given in Appendices 4.2 and 4.3.


[Figure 4.1 image: Bayesian phylogenetic trees of the Trichinella kinomes, one per ePK group (AGC, CAMK, CK1, CMGC, Other, RGC, STE, TK and TKL), with branches labelled by kinase family/subfamily. Scale bars: 0.4. Legend: expansions in T. pseudospiralis; < 0.9 nodal support; not classified beyond the group level.]

Figure 4.1 Phylogenetic analysis of eukaryotic protein kinases (ePKs) of Trichinella spiralis and T. pseudospiralis. Phylogenetic trees (Bayesian inference) were constructed based on alignment of amino acid sequences representing individual kinase groups. High-resolution figures of individual trees, including nodal support values and sequence identifiers, are given in Appendix 4.1.


[Figure 4.2 image: panels A-C show the three-dimensional models, with the protein kinase domain, the N-terminal domain, the two-helix linker and the helical bundle labelled, the motifs HLD, HRN, DFG and GFE annotated, and the residues L316, S323, T337/A337, H346, Y347 and R352 indicated. Panel D shows the multiple sequence alignment of 4BTF, T4A_2523 and T01_6895 (alignment positions 1-465) with the predicted secondary structure elements.]

Figure 4.2 A novel, MOS-like protein kinase of Trichinella spiralis and T. pseudospiralis, which shares structural homology with the N-terminus of the murine mixed lineage kinase domain-like protein (MLKL). (A) Three-dimensional models of T. spiralis (T01_6895; light blue) and T. pseudospiralis (T4A_2523; dark blue) kinases superimposed onto the Protein Data Bank structure of MLKL (4BTF; orange). (B) N-terminal domain comprising a two-helix linker and a helical bundle. (C) Superimposed catalytic cleft of 4BTF, T01_6895 and T4A_2523. Motif regions are annotated and shown in yellow. Positions and amino acids of the last residues shown are indicated in the respective colour for each model (black if the residues are the same in all three models). Numbering of residues is based on the multiple sequence alignment. (D) Multiple sequence alignment showing levels of similarity among residues for all three sequences (white to black; not conserved to conserved). Secondary structures, based on three-dimensional modelling using the program I-TASSER for the Trichinella models and on the crystal structure for 4BTF, are depicted below the alignment; α-helices are represented by lines in the same colour as the respective model, and β-sheets are additionally marked with a “β”. Motifs essential for kinase activity are marked with yellow stars.


CHAPTER 5
Whipworm kinomes reflect a unique biology and adaptation to the host animal

Abstract
Roundworms belong to a diverse phylum (Nematoda) that comprises many parasitic species, including whipworms (genus Trichuris). These worms have adapted to a biological niche within the host and exhibit unique morphological characteristics compared with other nematodes. Although these adaptations are known, the underlying molecular mechanisms remain elusive. The availability of genomes and transcriptomes of some whipworms now enables detailed studies of their molecular biology. Here, we defined and curated the full complements of an important class of enzymes, the protein kinases (kinomes), of two species of Trichuris, using an advanced and integrated bioinformatic pipeline. We investigated the transcription of Trichuris suis kinase genes across developmental stages, sexes and tissues, and revealed that selectively transcribed genes can be linked to central roles in developmental and reproductive processes. We also classified and functionally annotated the curated kinomes by integrating evidence from structural modelling and pathway analyses, and compared them with other curated kinomes of phylogenetically diverse nematode species. Our findings suggest unique adaptations in signalling processes governing worm morphology and biology, and provide an important resource that should facilitate experimental investigations of kinases and the biology of signalling pathways in nematodes.


5.1 Introduction
The phylum Nematoda (roundworms) is morphologically and biologically very diverse and, based on recent molecular studies, can be divided into two classes: the Chromadorea and the Enoplea (see De Ley, 2006; Blaxter and Koutsovoulos, 2015). Within the latter, many species exhibit unique biological features compared with representatives of the Chromadorea, a class that includes the free-living nematode Caenorhabditis elegans (see Brenner, 1988) and the parasitic nematode Haemonchus contortus (see Veglia, 1915). For example, species of the genera Trichinella (trichina worm) and Trichuris (whipworm) differ substantially in morphology from most other parasitic nematodes, lacking sensory phasmids and having an oesophagus surrounded by stichocytes (stichosome), which, on the ventral side, has a specialised organ (the bacillary band) consisting of ~50,000 bacillary cells (Sheffield, 1963; Wright, 1968; Bruce, 1970; Despommier and Müller, 1976; Kozek, 2005; Lopes Torres et al., 2013). These unique features have been suggested to play important roles in physiological processes and the host-parasite interplay (Despommier, 1998; Bradley and Jackson, 2004; Foth et al., 2014; Jex et al., 2014). This interplay is intriguing, given the unusual relationship that these enopleans have with their host animal: adult Trichuris worms burrow into syncytial tunnels composed of epithelial cells of the large intestine (Tilney et al., 2005). Although this relationship is known, the molecular mechanisms underlying many of the associated processes, such as tissue invasion of and immune regulation in the host, remain largely unknown.

Recently, draft transcriptomes and genomes became available for selected taxa of Trichuris and Trichinella (see Cantacessi et al., 2011; Mitreva et al., 2011; Foth et al., 2014; Jex et al., 2014; Korhonen et al., 2016; Santos et al., 2016), enabling investigations of the molecular mechanisms governing some of the unique morphological and biological features of these parasites. Such analyses have suggested specific functional adaptations and/or expansions, which are reflected in the genetic composition of these species and appear to be restricted to enoplean nematodes. However, the biochemical signalling mechanisms that coordinate these processes have not yet been studied in detail. Thus, exploring protein kinases is of particular interest because these enzymes regulate many inter- and intra-cellular signalling pathways. Recently, we reported remarkably compact kinomes for two species of Trichinella (Trichinella spiralis and Trichinella pseudospiralis; see Stroehlein et al., 2016), revealing reductions of protein groups/families, adaptations of functional domains, and a novel “Moloney sarcoma oncogene” (MOS)-like kinase that appears to be unique to enoplean nematodes. However, there is no curated kinome for any species of Trichuris, although a genomic study estimated that at least 232 protein kinases are encoded in the draft genome of Trichuris suis (see Jex et al., 2014; Stroehlein et al., 2016). Characterising and curating protein kinases of Trichuris spp. would allow a more comprehensive comparison of nematode kinomes and would increase our understanding of these intriguing parasites, both from a fundamental perspective and in the broader context of the evolution of nematode kinases.

The aims of the present study were to: (i) curate and characterise the kinomes of the closely related species T. suis and Trichuris trichiura; (ii) explore the transcription profiles of kinase genes in different developmental stages and tissues of T. suis; (iii) define variable and conserved kinase groups and/or families across a range of curated nematode kinomes; and (iv) discuss the findings from this study in the context of adaptations in kinase signalling processes and worm biology.

5.2 Materials and methods
5.2.1 Defining and curating kinomes
To define and curate the kinomes of Trichuris, we used a modified and improved version of a recently published methodology (Stroehlein et al., 2016), which relies on a pairwise curation strategy. For T. suis, we used published genomic data for male and female worms, as well as published transcriptomic data representing individual larval stages (collected 10 days (L1/L2), 18 days (L3) and 28 days (L4) following inoculation of pigs), the stichosome, the non-stichosomal (i.e. posterior) part of male and female worms, and whole adult worms (Jex et al., 2014). For T. trichiura, we used available genomic (Foth et al., 2014) and transcriptomic (Santos et al., 2016) data sets representing mixed adult worms.

To reciprocally curate kinase sequences, we employed a four-step bioinformatic pipeline. First, we predicted ‘seed’ sequences containing complete or partial protein kinase-like domains from a total of 14,436 (T. suis male; PRJNA208415) and 14,261 (T. suis female; PRJNA208416) inferred amino acid sequences, and from 644,549 open reading frames inferred from de novo-assembled (i.e. assembled using RNA-Seq reads independently of a reference genome; all stages and tissues combined) transcripts using InterProScan v.5.15.54 (Jones et al., 2014). In this program, we employed information from domain matches against the databases Pfam v.27.0 (Sonnhammer et al., 1997), PANTHER v.9.0 (Mi et al., 2013) and SUPERFAMILY v.1.75 (Gough et al., 2001).


Second, we mapped all 644,549 de novo-assembled transcripts to genomic scaffolds using the program BLAT v.34x12 (Kent, 2002). Transcripts that mapped to a region containing at least one candidate kinase transcript were subsequently reassembled using CAP3 (build 15 Oct 2007; Huang and Madan, 1999) to extend transcripts. Third, we used the “cds2genome” model in the program Exonerate v.2.2.0 (Slater and Birney, 2005) to define gene structures (i.e. intron-exon boundaries) for the mapped, reassembled transcripts encoding kinase candidates. The first three steps were also executed for T. trichiura, employing 8968 amino acid sequences inferred from original gene predictions (Foth et al., 2014) and 6414 de novo-assembled transcripts (Santos et al., 2016). The third step was then iterated in a pairwise manner among all Trichuris genomes until an optimal gene prediction was achieved (i.e. the prediction converged). Following the sequence curation step, we assessed kinase candidates with incomplete or unusual domain architectures by comparisons with known kinase domain combinations, as reported in InterProScan and KinBase (http://kinase.com/web/current/kinbase/; accessed: 2 October 2016) and by literature searches, and we discarded candidates with insufficient kinase domain evidence. Sequences that were < 200 amino acids in length were labelled as fragments. The final kinase gene sets and their genomic loci were reported in a “general feature format” (GFF) file for each species and sex.
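As a rough illustration of the first and last parts of this procedure, the Python sketch below selects 'seed' sequences from InterProScan tab-separated output and labels short sequences as fragments. It is a minimal sketch, not the pipeline used here: the function and variable names are hypothetical, the example records are invented, and of the signature accessions only SSF56112 is named in this chapter (PF00069 and PF07714 are added as illustrative Pfam kinase-domain accessions).

```python
import csv
import io

# Signature accessions treated as kinase-like evidence. SSF56112 is named in
# the text; PF00069 and PF07714 are illustrative additions (assumptions).
KINASE_SIGNATURES = {"SSF56112", "PF00069", "PF07714"}
FRAGMENT_THRESHOLD = 200  # amino acids; shorter sequences are fragments


def seed_ids(interproscan_tsv):
    """Return IDs of proteins with at least one kinase-like domain match.

    Assumes InterProScan tab-separated output, in which column 1 is the
    protein ID and column 5 is the signature accession.
    """
    seeds = set()
    for row in csv.reader(io.StringIO(interproscan_tsv), delimiter="\t"):
        if len(row) >= 5 and row[4] in KINASE_SIGNATURES:
            seeds.add(row[0])
    return seeds


def completeness_label(sequence):
    """Label a curated amino acid sequence by length alone."""
    return "fragment" if len(sequence) < FRAGMENT_THRESHOLD else "complete"


# Hypothetical two-line InterProScan result (IDs and coordinates invented).
example = "\n".join([
    "\t".join(["seq_1", "md5a", "350", "SUPERFAMILY", "SSF56112",
               "Protein kinase-like", "20", "300", "1e-50", "T", "01-01-2017"]),
    "\t".join(["seq_2", "md5b", "180", "Pfam", "PF00439",
               "Bromodomain", "5", "90", "1e-20", "T", "01-01-2017"]),
])
print(seed_ids(example))
print(completeness_label("M" * 150))
```

The mapping, reassembly and gene-structure steps (BLAT, CAP3 and Exonerate) are external programs and are deliberately not sketched.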

5.2.2 Kinase classification and functional annotation
We classified amino acid sequences representing kinases using the program Kinannote (Goldberg et al., 2013). Sequences that were not or only partially classified using this approach were annotated based on homology to kinase sequences represented in the well-curated kinomes of C. elegans and Homo sapiens (see Manning et al., 2002; Manning, 2005) using OrthoMCL v.2.0.4 (BLAST E-value of ≤ 10e-5; sequence similarity of ≥ 80%; Li et al., 2003); if the family/subfamily classification of all C. elegans and H. sapiens sequences within an OrthoMCL cluster was consistent with that of the Trichuris sequences, we assigned this classification to these sequences. To confirm the classification and orthology of sequences from different Trichuris spp./sexes, we constructed phylogenetic trees for individual kinase groups using the program MrBayes v.3.2.2 (Ronquist et al., 2012), as described previously (Stroehlein et al., 2016). Additionally, to classify the remaining protein kinase sequences, we employed domain/domain architecture information from InterProScan and three-dimensional modelling of selected kinase sequences against all structures in the Protein Data Bank (PDB; http://www.rcsb.org/pdb/; accessed: 12 January 2017) using the program I-TASSER v.4.4 (Roy et al., 2010), applying default parameters. To ensure high-confidence classifications, we only considered the top structural analog in the PDB and applied stringent cut-offs (i.e. confidence (C) score of > 0, template modelling (TM) score of ≥ 0.75 and root-mean-square deviation (RMSD) value of < 3 Å).

Subsequently, we employed gene ontology (GO) terms and Pfam, PANTHER and SUPERFAMILY domain identifiers to functionally annotate all kinase sequences, and assigned them to biochemical pathways based on matches (protein BLAST v.2.2.28+; E-value of ≤ 1e-5; Camacho et al., 2009) to the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (downloaded 1 February 2016; Kanehisa and Goto, 2000). We also employed the amino acid sequences and their predicted GO annotation to infer the subcellular localisation of proteins based on six sequence- and GO-based subclassifiers within the program MultiLoc2 (Blum et al., 2009), applying a confidence score cut-off of ≥ 0.8.
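Two of the decision rules described above can be written down compactly. The Python sketch below is a simplified illustration under stated assumptions: the function names are hypothetical, the classification transfer only checks that all reference (C. elegans and H. sapiens) labels in a cluster agree, and the TM score and RMSD values in the usage lines are invented for demonstration (only the C scores of 0.20 and 0.56 are reported in this chapter).

```python
def consensus_classification(reference_labels):
    """Return the shared family/subfamily label of a cluster, or None.

    reference_labels: classifications of the C. elegans and H. sapiens
    sequences in one OrthoMCL cluster. A label is transferred only when
    all reference sequences agree (simplified version of the rule).
    """
    unique = set(reference_labels)
    return unique.pop() if len(unique) == 1 else None


def passes_structural_cutoffs(c_score, tm_score, rmsd):
    """Apply the stated cut-offs: C score > 0, TM score >= 0.75, RMSD < 3 A."""
    return c_score > 0 and tm_score >= 0.75 and rmsd < 3.0


# Consistent cluster: the label is transferred.
print(consensus_classification(["CK1/TTBK", "CK1/TTBK"]))
# Conflicting cluster: no label is transferred.
print(consensus_classification(["AGC/PKA", "AGC/PKG"]))
# C score from the text; TM score and RMSD are hypothetical values.
print(passes_structural_cutoffs(c_score=0.20, tm_score=0.80, rmsd=2.1))
```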

5.2.3 Transcription analysis
We used publicly available, high-quality RNA-Seq data (Jex et al., 2014) to assess transcription of kinase genes in key developmental stages and different tissues/sexes (i.e. single RNA-Seq libraries for L1/L2, L3, L4, the stichosome and the posterior end of males and females) of T. suis. First, we aligned length- and quality-filtered, paired-end reads (see Jex et al., 2014 for details) to transcript sequences encoding kinases using Bowtie v.2.1.0 (default settings; Langmead and Salzberg, 2012) within the software package RSEM v.1.2.11 (Li and Dewey, 2011) and then calculated levels of transcription (transcripts per million, TPM), considering kinase genes as transcribed if five or more read pairs mapped to their coding regions.
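For orientation, the TPM measure and the transcription threshold can be sketched as follows. This is a simplified Python illustration with hypothetical function names: it uses raw read-pair counts and transcript lengths, whereas RSEM estimates expected counts and effective lengths, so the values are not expected to match RSEM output exactly.

```python
def tpm(counts, lengths):
    """Transcripts per million from read counts and transcript lengths.

    Each count is divided by the transcript length, and the resulting
    rates are scaled so that they sum to one million.
    """
    rates = [count / length for count, length in zip(counts, lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    return [rate / total * 1e6 for rate in rates]


def is_transcribed(read_pairs, min_pairs=5):
    """A kinase gene counts as transcribed if >= 5 read pairs map to it."""
    return read_pairs >= min_pairs


# Hypothetical counts for two transcripts of 1,000 and 2,000 nucleotides.
values = tpm([10, 10], [1000, 2000])
print(values)
print(is_transcribed(5), is_transcribed(4))
```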

Subsequently, we applied min-max normalisation and log2-transformation to the raw TPM values to allow clustering based on overall transcriptional trends rather than ranges of raw TPM values, and then clustered kinases based on similar transcription profiles using the Ward clustering method implemented in the R function hclust. For each cluster, we then tested for enrichment of domains, GO terms and pathways using Fisher’s exact test, and applied multiple testing correction, employing the fisher.test function and the qvalue package in R (http://github.com/jdstorey/qvalue).
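The transformation and the enrichment test can be illustrated with a small, standard-library Python sketch. Several details are assumptions rather than the settings used here: the log2-transformation is applied before min-max scaling with a pseudocount of 1, and the enrichment p-value is computed as a one-sided (upper-tail) Fisher's exact test, whereas the R function fisher.test is two-sided by default. The function names are hypothetical; Ward clustering and q-value estimation are not shown.

```python
from math import comb, log2


def normalise_profile(tpm_values, pseudocount=1.0):
    """log2-transform one gene's TPM values, then rescale them to [0, 1]."""
    logged = [log2(value + pseudocount) for value in tpm_values]
    lowest, highest = min(logged), max(logged)
    if highest == lowest:
        return [0.0] * len(logged)  # flat profile: no trend to scale
    return [(value - lowest) / (highest - lowest) for value in logged]


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test via the hypergeometric upper tail.

    k: annotated kinases in the cluster, n: cluster size,
    K: annotated kinases overall, N: total number of kinases.
    Returns the probability of observing k or more annotated kinases.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


print(normalise_profile([0, 1, 3]))
print(fisher_enrichment_p(2, 2, 2, 4))
```

Scaling each profile to the same range means that genes with similar trends but very different absolute TPM values end up close together, which is the stated purpose of the normalisation.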

5.2.4 Kinase sequence comparisons among nematodes
We carried out qualitative and quantitative comparisons of the kinomes of T. suis and T. trichiura with that of T. spiralis and those of the chromadorean nematodes C. elegans and H. contortus (see Plowman et al., 1999; Stroehlein et al., 2015, 2016). We determined the sequence variation among orthologs present in all five species by first defining clusters of orthologous sequences using OrthoMCL (BLAST E-value of ≤ 1e-5; sequence similarity of ≥ 50%) and then performing global pairwise amino acid alignments of individual sequences within clusters using the program EMBOSS Needle v.6.4.0.0 (Rice et al., 2000). To analyse sequence conservation among different taxa, we only considered clusters in which all sequences contained the conserved protein kinase domain (“protein kinase-like”, SSF56112). Then, we displayed sequence variation between species by ordering orthologs from highest to lowest average sequence similarity among each other for both the full-length sequence and the kinase catalytic domain.
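The final ordering step can be sketched in Python as follows. The sketch assumes that pairwise similarity percentages (for example, as reported by EMBOSS Needle) have already been collected; the data structures, cluster names and similarity values are hypothetical and only three sequences per cluster are shown for brevity.

```python
from itertools import combinations
from statistics import mean


def mean_pairwise_similarity(members, similarity):
    """Average similarity over all sequence pairs in one ortholog cluster.

    members: sequence IDs in the cluster (at least two).
    similarity: dict mapping frozenset({id_a, id_b}) to percent similarity.
    """
    return mean(similarity[frozenset(pair)] for pair in combinations(members, 2))


def rank_clusters(clusters, similarity):
    """Order cluster names from highest to lowest average similarity."""
    return sorted(
        clusters,
        key=lambda name: mean_pairwise_similarity(clusters[name], similarity),
        reverse=True,
    )


# Hypothetical clusters and similarity values.
clusters = {"cluster_A": ["a1", "a2", "a3"], "cluster_B": ["b1", "b2", "b3"]}
similarity = {
    frozenset({"a1", "a2"}): 90.0,
    frozenset({"a1", "a3"}): 80.0,
    frozenset({"a2", "a3"}): 85.0,
    frozenset({"b1", "b2"}): 60.0,
    frozenset({"b1", "b3"}): 55.0,
    frozenset({"b2", "b3"}): 50.0,
}
print(rank_clusters(clusters, similarity))
```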

5.3 Results
5.3.1 Kinomes of Trichuris
We defined, curated and annotated the kinomes of T. suis and T. trichiura. For T. suis, we identified 281 sequences, of which 216 were eukaryotic protein kinases (ePKs), 50 were atypical protein kinases (aPKs) and 15 were putative, kinase-like (PKL) proteins (Table 5.1). Of all sequences, we classified 266 into 102 distinct families and further assigned 99 sequences to one of 87 subfamilies (Appendices 5.1 and 5.2). All 281 sequences were supported by a de novo-assembled transcript and, except for one sequence (D918_09888; inferred from an independent data set (PRJNA179528)), could be located on genomic scaffolds representing male and female worms (Appendices 5.3 and 5.4). For male worms, three full-length transcript sequences could not be mapped to a single genomic scaffold; instead, two to three non-overlapping fragments of each transcript, together representing the full-length transcript sequence, were mapped to distinct genomic scaffolds. For female worms, three fragments representing one full-length transcript mapped to three distinct genomic scaffolds.

The mapping of the transcripts identified in T. suis to genomic sequences of T. trichiura, followed by the reciprocal optimisation of gene models, allowed the definition of one-to-one orthologs between the two species. However, the number of transcript sequences that did not map to a single genomic region was substantially higher for T. trichiura, with fragments representing 19 full-length sequences being located on multiple, distinct scaffolds (Appendix 5.5). Additionally, 21 T. trichiura sequences either had a total length of < 80% of those of their orthologs in T. suis, shared < 70% global sequence identity with them, or were < 200 amino acids long. As these sequences could not be reconstructed completely, they were considered incomplete and thus labelled as fragments (Appendix 5.6).

Based on domain information and the presence of conserved orthologs in curated kinomes, we classified all kinases that were initially not identified and/or classified by the program Kinannote (n = 63) and assigned 12 of them to families and subfamilies (Appendices 5.1, 5.2, 5.6 and 5.7). We annotated 15 of the 63 sequences as PKLs, as they could not be placed into any of the recognised kinase groups (Appendices 5.1, 5.2 and 5.6). Ten of them contained a protein kinase-like domain (SSF56112), whereas the remaining five lacked this domain but contained other domains indicative of a kinase-related function. Of all 15 PKLs, two could not be assigned to a family, as they only had a kinase domain but no additional, characteristic domains to allow a classification. Thus, we predicted their three-dimensional structures and annotated them based on their closest structural analogs: a CHK2 kinase (KIN_093; PDB accession number: 3I6U) and a mixed lineage kinase-like protein (KIN_019; PDB accession number: 4BTF; Figure 5.1). Structural predictions of KIN_019 of T. suis and T. trichiura gave C scores of 0.20 and 0.56, respectively. Detailed sequence analysis of KIN_019 confirmed that two motifs (HLD and DFG) that are considered essential for kinase activity are conserved in both species. To support the sequence- and domain-based annotation of the 13 other kinases, we also predicted their three-dimensional structures. Nine of them had close structural analogs in the PDB (Appendix 5.8) with TM scores of 0.75 or higher (supporting a correct structural topology) and RMSD values of < 3 Å.

An overall comparison of the annotated kinomes of T. suis and T. trichiura showed that they are very similar, with an average sequence similarity of 90%. Although we identified eight additional kinase sequences in the T.
trichiura kinome, a lack of supporting transcriptomic data (cf. Santos et al., 2016) for genes encoding these kinases did not allow us to corroborate a quantitative difference between the two kinomes. Although all eight sequences contained a protein kinase-like domain (SSF56112), we classified five of them as PKLs, given their unusual additional domains (Pfam identifiers: PF01323, PF01636, PF03109, PF04284, PF06288 and PF10707; Appendix 5.6). Following the identification and classification of all 281 sequences, we assigned 123 of those to one or more of the pathway categories “cellular processes” (n = 62), “environmental information processing” (n = 76), “genetic information processing” (n = 22), “organismal systems” (n = 66) and “metabolism” (n = 15). Interestingly, the number of kinases in the latter category was similar to that of other nematodes such as C. elegans (n = 23) and H.

contortus (n = 13; see Stroehlein et al., 2015), but almost twice as high as that previously reported for nematodes of the genus Trichinella (n = 8; see Stroehlein et al., 2016). Next, to further explore the remaining 158 kinases without pathway annotation, we assessed the genomic locations of all protein kinase-encoding genes, reasoning that kinase genes in close vicinity to each other might represent an operon (cf. Pettitt et al., 2014) and, thus, might play a role in a common pathway. The average distance between kinase genes in the T. suis genomes was ~119,000 bp. In contrast, one scaffold encoded 18 kinase genes on the same strand with an average distance between genes of ~14,300 bp, of which 13 were annotated as “uncharacterised oxidoreductase Dhs-27” (IPR012877). In a third functional analysis step, we predicted the subcellular localisation for 94 kinases using the program MultiLoc2. Of these, we predicted 59 to be cytoplasmic, 17 and three to be localised to the nucleus and mitochondria, respectively, and 15 kinases to be transmembrane proteins (Appendix 5.1).
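The comparison of gene spacing described above can be illustrated from gene coordinates alone. The following is a minimal sketch, assuming a hypothetical list of (scaffold, strand, start, end) records for kinase genes; the function name and the toy coordinates are illustrative and are not taken from the thesis data. Distance is measured here as the gap between the end of one gene and the start of the next gene on the same scaffold, which may differ from the definition underlying the values reported above.

```python
from collections import defaultdict
from statistics import mean


def mean_gene_spacing(genes):
    """Return {scaffold: mean gap in bp between neighbouring genes}.

    genes: iterable of (scaffold, strand, start, end) tuples (hypothetical format).
    Scaffolds carrying fewer than two genes are omitted.
    """
    by_scaffold = defaultdict(list)
    for scaffold, _strand, start, end in genes:
        by_scaffold[scaffold].append((start, end))

    spacing = {}
    for scaffold, coords in by_scaffold.items():
        coords.sort()  # order genes by start coordinate
        gaps = [
            max(0, next_start - prev_end)  # overlapping genes count as gap 0
            for (_, prev_end), (next_start, _) in zip(coords, coords[1:])
        ]
        if gaps:
            spacing[scaffold] = mean(gaps)
    return spacing


# Toy coordinates (invented): one densely packed scaffold, one sparse scaffold.
toy_genes = [
    ("scaffold_A", "+", 1_000, 3_000),
    ("scaffold_A", "+", 15_000, 18_000),
    ("scaffold_A", "+", 34_000, 36_000),
    ("scaffold_B", "-", 5_000, 9_000),
    ("scaffold_B", "+", 130_000, 133_000),
]
spacing = mean_gene_spacing(toy_genes)
```

A scaffold whose mean spacing is far below the genome-wide average, with genes on the same strand, would be a candidate for closer inspection as a possible gene cluster or operon.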

5.3.2 Transcription profiles To determine which kinase genes are co-transcribed in T. suis, we assigned kinase transcripts to eight individual clusters based on similar trends in their transcription levels across developmental stages, sexes and tissues (Figure 5.2). Most kinase genes grouped into one large cluster (n = 139; cluster III), represented by medium to high levels of transcription in the early larval stages, and in the posterior end of both male and female worms, but lower levels of transcription in the stichosome. In contrast, one cluster (IV) comprised kinase genes with no or very low transcription in all stages/tissues, except male adult worms (posterior end and whole worm). Additionally, four kinase genes formed one cluster (VI) of selectively transcribed genes in female tissues (Figure 5.2; Appendix 5.9). Interestingly, no kinase genes were selectively transcribed in the stichosome of either male or female worms, and most of them exhibited moderate to low transcription levels in this tissue. In contrast, 14 kinase genes had very high TPM values (> 100) for the stichosome tissue, and were assigned to clusters II, III and VII. Of those, seven were assigned to one or more of 12 different signalling pathways of the category “environmental information processing” (Appendix 5.9). Subsequently, we assessed whether any of the clusters were enriched for particular functional (GO and InterPro) terms, pathways, domains and/or groups, which revealed an enrichment (q-value of < 0.01) for “Src homology 2” (SH2) domains (PF00017; SSF55550), “tyrosine-protein kinase” (TK) domains (PTHR24418), “Tau-tubulin kinase (TTBK) 2” domains (PTHR11909:SF19), “casein kinase 1” (CK1) kinases and “casein kinase-related”

domains (PTHR11909) in cluster IV and a significant under-representation of the latter two terms in the largest cluster (III).
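An enrichment analysis of this kind can be sketched as follows. The statistical test behind the q-values above is not named in this section, so the one-sided hypergeometric test and the Benjamini-Hochberg correction used here are assumptions chosen for illustration, and all function names and numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: all kinase genes; K: genes carrying the term;
    n: genes in the cluster; k: genes in the cluster carrying the term.
    """
    upper = min(n, K)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted q-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Toy example (invented numbers): 4 of 5 cluster members carry a term
# that is present in 5 of 20 genes overall.
p_toy = hypergeom_enrichment_p(k=4, n=5, K=5, N=20)
q_toy = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Under-representation, as reported for the largest cluster, would use the lower tail, P(X <= k), instead of the upper tail.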

5.3.3 Kinome comparisons among nematodes To gain further insight into the composition of enoplean kinomes, we carried out qualitative and quantitative comparisons between T. suis and T. spiralis. The number of ePKs identified in T. suis (n = 216) was similar to that in T. spiralis (n = 205). The only major expansions in T. suis were in the CK1 group (TTBK family; 29 in T. suis versus 19 in T. spiralis) and the TK group (within the “Fujinami poultry sarcoma/feline sarcoma-related” (FER) family; 10 in T. suis versus four in T. spiralis). For aPKs, we found six and nine additional phosphatidylinositol 4-kinases (PI4K) and bromodomain-containing (BRD) proteins in T. suis, respectively, compared with T. spiralis; representatives of the latter aPK family have been reported to have kinase activity, despite lacking a conserved kinase-fold (Devaiah et al., 2012). Another expansion was observed in the “nuclear hormone receptor related” (NHRR) family DHS-27 (n = 16; Trichinella: n = 4; see Stroehlein et al., 2016). Subsequently, we extended the pairwise comparison of enoplean species to a global qualitative and quantitative comparison of the curated kinomes of five ‘divergent’ nematode species (representing the Enoplea and Chromadorea), including two rhabditids (H. contortus and C. elegans; Figure 5.3; Appendices 5.10-5.12). In total, we identified 67 orthologous groups that were unique to the two Trichuris spp. investigated here. Of these groups, 23 contained one or two TTBK orthologs per species. Additionally, there were 36 orthologous groups unique to the three enoplean species (Trichuris spp. and T. spiralis), including five groups representing TTBK orthologs and three groups representing NHRR proteins. Four groups of orthologs represented the four parasitic species, but were absent from C. elegans. Ninety orthologous groups were absent from all three enoplean species and were present only in the two rhabditids studied. 
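The counting of orthologous groups by species composition can be expressed as a classification of presence patterns. The sketch below is illustrative only: the group identifiers and memberships are invented, and it assumes that "unique to" means that a group contains every species of the named set and no others, which may be stricter than the definition used for the counts above.

```python
from collections import Counter

TRICHURIS = frozenset({"T. suis", "T. trichiura"})
ENOPLEA = TRICHURIS | {"T. spiralis"}
RHABDITIDS = frozenset({"H. contortus", "C. elegans"})
ALL_FIVE = ENOPLEA | RHABDITIDS
PARASITES = ALL_FIVE - {"C. elegans"}


def classify_group(species_present):
    """Map the set of species in an orthologous group to a pattern label."""
    present = frozenset(species_present)
    if present == TRICHURIS:
        return "Trichuris-specific"
    if present == ENOPLEA:
        return "Enoplea-specific"
    if present == RHABDITIDS:
        return "rhabditid-specific"
    if present == PARASITES:
        return "parasite-specific"
    if present == ALL_FIVE:
        return "shared by all five"
    return "other pattern"


# Hypothetical groups, one per pattern, plus one that fits no named pattern.
toy_groups = {
    "group_1": {"T. suis", "T. trichiura"},
    "group_2": {"T. suis", "T. trichiura", "T. spiralis"},
    "group_3": {"H. contortus", "C. elegans"},
    "group_4": {"T. suis", "T. trichiura", "T. spiralis", "H. contortus"},
    "group_5": set(ALL_FIVE),
    "group_6": {"T. suis", "C. elegans"},
}
pattern_counts = Counter(classify_group(s) for s in toy_groups.values())
```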
Next, we assessed several clusters of orthologs that contained kinase sequences with relatively uncommon accessory domains. One enoplean-specific cluster (OWN_232) contained sequences (KIN_023) that were assigned to the tyrosine kinase-like (TKL) group, without further sub-classification, and had a “periplasmic binding protein-like II” domain (SSF53850) as well as a guanylate cyclase domain (PF00211) and a TK domain (PF07714). In T. suis, the KIN_023 gene was highly transcribed in L1/L2 larvae (TPM: ~30; transcriptional cluster VIII; Appendix 5.9) and transcribed at low levels in all other stages (TPM: ~2-7). In another cluster (OWN_14), which contained one to two “large tumor suppressor” (LATS) kinase orthologs in

all five species compared, we found a “ubiquitin-associated (UBA) domain”-like domain (SSF46934) in orthologs of all enopleans studied, but not in those of C. elegans (T20F10.1/WTS-1) or H. contortus (Hc-PK-263.1) (Appendix 5.10). Despite the diversity in sequence and domain architecture among the kinomes, we identified 126 orthologous groups that contained between 133 and 155 representatives in all five nematode species investigated (Figure 5.3; Appendix 5.10). Of these, we further analysed a subset of 130 ePK sequences that contained a conserved protein kinase catalytic domain. An all-against-all pairwise full-length amino acid sequence comparison revealed highest conservation between orthologs of T. suis and T. trichiura, and limited conservation between T. suis and both H. contortus and C. elegans (Figure 5.4A). Among the 33 most conserved kinases (i.e. those within the top 25% quantile), the majority belonged to the three groups “nucleoside-regulated kinases” (AGC; n = 9), “Ca2+/calmodulin-dependent kinases” (CAMK; n = 7) and “cyclin-dependent kinases (CDKs)/mitogen-activated protein kinases (MAPKs)/glycogen synthase kinases (GSKs)/CDK-like kinases” (CMGC; n = 6), whereas the most variable ones (within the bottom 25% quantile) were represented mainly by the “Other” group (n = 8), CMGC group (n = 7) and atypical kinases or kinase-like sequences (n = 7). Interestingly, ~50 orthologous pairs of T. suis and T. spiralis sequences displayed low sequence similarity, with 18 pairs of orthologs being < 40% similar to each other (median: 60.7%; mean: 58.1%), but usually sharing higher similarity with orthologs of H. contortus and C. elegans (Figure 5.4A; Appendix 5.11). 
We assessed whether this variation was seen across the entire length of these sequences or was restricted to regions outside of the catalytic domain, which revealed that, although the overall sequence variation between orthologous pairs was very high (47% variation, on average), the regions representing the catalytic domain were relatively conserved, with an average similarity of ≥ 75% (Figure 5.4B; Appendix 5.12).
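The quantile-based ranking of conserved and variable kinases can be sketched as follows. This is a minimal illustration, assuming a hypothetical mapping from each orthologous group to a single mean pairwise similarity value; the names, the toy values and the choice of the "inclusive" quantile method are assumptions and do not reproduce the analysis behind Figure 5.4.

```python
from statistics import quantiles


def split_by_quartile(similarity):
    """Return (most_conserved, most_variable) group names.

    similarity: {group_name: mean pairwise similarity in %} (hypothetical input).
    Groups at or above the upper quartile are called most conserved;
    groups at or below the lower quartile are called most variable.
    """
    q1, _median, q3 = quantiles(similarity.values(), n=4, method="inclusive")
    most_conserved = sorted(g for g, s in similarity.items() if s >= q3)
    most_variable = sorted(g for g, s in similarity.items() if s <= q1)
    return most_conserved, most_variable


# Toy similarity values (invented).
toy_similarity = {
    "kinase_a": 40.0, "kinase_b": 50.0, "kinase_c": 60.0, "kinase_d": 70.0,
    "kinase_e": 80.0, "kinase_f": 90.0, "kinase_g": 95.0, "kinase_h": 99.0,
}
conserved, variable = split_by_quartile(toy_similarity)
```

Running the same split on similarity values restricted to the catalytic domain would show whether low full-length similarity is driven by regions outside that domain.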

5.4 Discussion Throughout evolution, parasitic nematodes have independently adapted to particular environmental and biological niches within their hosts (Blaxter and Koutsovoulos, 2015; Lok, 2016a). These adaptations require sophisticated signalling mechanisms to govern host invasion and host-parasite interactions, including immune modulation/evasion (Jasmer et al., 2003; Dissous et al., 2006; Hewezi, 2015; Lok, 2016b). The availability of transcriptomes

and genomes of numerous parasitic worms (Howe et al., 2016) now enables researchers to explore these mechanisms at the molecular level and on a global scale in parasitic nematodes. In the present study, we defined, curated and explored in depth the kinomes of two representatives of the genus Trichuris. We contend that the observed variation in sequence and domain architecture among the kinomes of five biologically distinct nematode species and, in particular, the differences between enoplean and chromadorean representatives reflect evolutionary adaptations that can be linked to specialisations in environmental signalling processes and host-parasite interactions. As part of the draft genome study of T. suis, the number of protein kinases was previously estimated at 232 (Jex et al., 2014), which proved to be a good estimate. However, we did observe some incomplete or incorrect gene predictions in the draft genome, including incorrect gene fusions and genes split into multiple, fragmented gene models across different scaffolds. The application of our improved bioinformatic pipeline enabled us to comprehensively curate gene models, and then define and compare the kinomes of T. suis and T. trichiura. In this context, defining functional and/or conserved domains and their order within protein sequences readily allowed for the curation of incorrectly joined protein sequences and/or sequences containing incomplete domains. The availability of extensive transcriptomic data allowed us to comprehensively define full-length coding sequences, despite incomplete or fragmented genomic assemblies. In combination with our pairwise, iterative curation approach and an additional verification step against a third, independent T. suis data set (PRJNA179528), this strategy provided strong support for correct gene predictions and enabled us to define the genomic location of a transcript that did not map to a genomic scaffold in other genomic data sets for T. suis (see Jex et al., 2014). 
In the future, this approach could be enhanced further through improved genomic assemblies such as those achieved by employing long-read, third-generation sequencing data (e.g., Reuter et al., 2015). We here showed that the integration of transcriptomic and structural modelling data sets enables the prediction of functions and/or pathway associations for some kinases that could not be associated with any pathways based on sequence similarity to a known ortholog. For example, the investigation of transcription profiles in T. suis has revealed a set of four female-enriched kinases, suggesting roles in reproductive processes, and 59 male-enriched kinases that we propose assume roles in male development and/or spermatogenesis. Furthermore, the high transcription of 14 kinase genes in the stichosome, a morphological structure unique to members of the Enoplea (see Sheffield, 1963; Despommier and Müller,

1976), and the finding that seven of these genes and their products were associated with pathways involved in the communication of the parasite with its environment, support the hypothesis that they assume a signalling function at the host-parasite interface (i.e. the syncytial tunnel comprised of host epithelial cells around the Trichuris stichosome; Tilney et al., 2005). This notion is further supported by evidence that kinases are present in the excretory/secretory (ES) products of Trichinella (see Arden et al., 1997; Gounaris et al., 2001) and that ES products of Trichuris play a key role in host-parasite interactions (Lillywhite et al., 1995; Hansen et al., 2015). Additionally, we employed structural modelling as a tool to functionally annotate selected kinase sequences (n = 15). Similar to the inference of protein functions and/or pathway association to known orthologs using BLAST-based approaches (Li et al., 2003; Camacho et al., 2009; Gabaldon and Koonin, 2013; Kanehisa et al., 2016), structural modelling utilising a reference database of experimentally validated protein structures can support annotations inferred based on sequence similarity (Zhang and Skolnick, 2004; Yang et al., 2015) and allow the annotation of sequences lacking similarity to any other sequences and/or domains (‘orphan’ proteins). Here we demonstrated the usefulness of this approach by predicting the structure of a Trichuris MOS-like protein with high confidence, which was consistent with the findings for orthologs of this protein in two Trichinella taxa and further supports the initial proposal that MOS-like kinases function in reproductive (i.e. oocyte maturation) or developmental processes specifically in enopleans (see Stroehlein et al., 2016). Further analysis of the curated kinomes allowed us to address some fundamental questions surrounding the molecular biology of the genus Trichuris. 
The present study corroborates the proposal that representatives of the Enoplea encode a reduced set of kinases compared with the Chromadorea, although the reduction is not as pronounced in Trichuris as it is in Trichinella (see Stroehlein et al., 2016). It is conceivable that a reliance of Trichinella, an intracellular parasite, on host kinases that are linked to metabolism has led to this reduction (cf. Stroehlein et al., 2016), whereas in Trichuris spp. (extracellular parasites) such kinases have been retained. This notion is, to some extent, supported by a report of the remarkably small kinome of the intracellular, microsporidian parasite Encephalitozoon cuniculi, compared with much larger kinomes in related yeast species (Miranda-Saavedra et al., 2007). The expansions of the eukaryotic protein kinase families TTBK and FER in Trichuris compared with Trichinella are consistent with previous findings for H. contortus, C. elegans and Trichinella spp. (see Stroehlein et al., 2015, 2016) and suggest distinct, species-specific functions for some of the TTBK and FER kinases. The diversity within and among clusters of orthologs for these families among the five nematodes studied here further supports this notion. Recently, several representatives of the TTBK family have been shown to play a role in ciliogenesis (Goetz et al., 2012) and the regulation of transporter activity (Alesutan et al., 2012; Almilaji et al., 2013). Additionally, multiple TTBKs are involved in the pheromone-regulated formation of dauer larvae (i.e. a diapause-like, arrested developmental stage) in C. elegans (see Neal et al., 2016). Given that the neuronal sensing structures of enopleans and chromadoreans are morphologically distinct (Hall and Russell, 1991; Inglis et al., 2007), with enopleans lacking sensory phasmids and containing bacillary bands (suggested to be involved in sensing; cf. Wright and Chan, 1973), the differences in TTBK kinases might relate to pathways controlling environmental sensing. However, the transcription of TTBK genes almost exclusively in adult male worms, which is consistent with findings in H. contortus (see Stroehlein et al., 2015) and the large roundworm Ascaris suum (see Jex et al., 2011), suggests a role in male-specific morphological development and/or spermatogenesis for these kinases, in concordance with reports for C. elegans (see Reinke et al., 2000; Muhlrad and Ward, 2002). Furthermore, it has been shown that a Worm6 kinase (SPE-6), which is phylogenetically close to TTBK(-like) kinases (Stroehlein et al., 2016), is involved in the later stages of spermatogenesis, controlling sperm activation and correct assembly of the major sperm protein (MSP; Varkey et al., 1993; Muhlrad and Ward, 2002; L'Hernault, 2006). However, the lack of an ortholog of SPE-6 in Trichuris lends some support to the proposal that, in enopleans, signalling mechanisms controlling spermatogenesis are distinct from those in the free-living nematode C. elegans and might involve other TTBKs.
Furthermore, it is conceivable that some of them play a different role altogether, for example in male-specific environmental signalling. In addition, predominant or exclusive transcription of nine FER kinases in male tissues of T. suis suggests that they also play a male-specific role, potentially in concert with some of the TTBK kinases, which is consistent with findings for H. contortus (see Stroehlein et al., 2015) and C. elegans (see Reinke et al., 2000; Muhlrad and Ward, 2002). However, the finding that species without a free-living stage in their life cycle (enoplean and filarial nematodes) have a smaller number (n = 5-18) of FER genes, while species with a free-living stage tend to encode a larger number of these genes (n = 21-32; with the exception of M. hapla; n = 9; cf. Desjardins et al., 2013; Stroehlein et al., 2015, 2016) might also hint at roles in host- or food-finding for some of these kinases.

As for ePKs, we report only a small number of differences between the atypical protein kinase families of Trichinella and Trichuris, some of which might be explained by a more sensitive bioinformatic pipeline used in this study that revealed additional, putative kinase-like sequences. Nonetheless, the expansion of the NHRR family in Trichuris suggests a distinct function for some of these representatives, potentially in a novel, receptor-mediated signalling pathway. This proposal is partially supported by the finding that none of them could be associated with a pathway and that most of them are localised in close vicinity to one another in the genome. Many membrane-bound receptor kinases are known to be involved in environmental or cell-cell signalling (Hallem et al., 2011; Lok, 2016a; Takeishi et al., 2016). The analysis of subcellular localisation of kinases in Trichuris revealed two unclassified tyrosine kinases (KIN_095 and KIN_250) that were predicted to be membrane-associated. Both were linked to environmental signalling pathways and clustered with orthologs in all nematodes studied, including ROL-3, VER-1, VER-3 and VER-3 receptor kinases of C. elegans (Appendix 5.10). This and a previous comparison of Trichinella sequences with human orthologs (Stroehlein et al., 2016) suggest a higher similarity of the KIN16 family with mammalian vascular endothelial growth factor receptors (VEGFRs) than previously thought (Manning, 2005). Furthermore, we predicted seven kinases from six distinct families and subfamilies, which generally do not contain membrane-bound kinases, to be localised to the plasma membrane in T. suis, suggesting novel mechanisms for cell-cell communication and/or interaction(s) between the parasite and its host. In addition to tyrosine and serine/threonine kinase receptors, several representatives of the receptor guanylate cyclases (RGCs) have been shown to play roles in both environmental sensing and dauer signalling in C.
elegans (see Birnby et al., 2000; Schaap, 2005; Hallem et al., 2011; Gilabert et al., 2016; Takeishi et al., 2016). For Trichuris, we reveal a substantially smaller RGC group (n = 3) than that of C. elegans (n = 27; see Manning, 2005), which is consistent with a previous proposal that the small number of RGC genes in Trichinella relates to the unique biology of enoplean nematodes (cf. Stroehlein et al., 2016). For the chromadorean nematode H. contortus, RGC-encoding genes are predominantly transcribed in free-living L3s (Stroehlein et al., 2015), suggesting specific roles in environmental sensing in this stage. This would explain the reduced number of respective genes in the two enoplean genera, which both lack free-living larval stages, and might also be linked to differences in environmental sensing organs and associated neuronal structures. Taken together, these

findings suggest that enopleans have adapted and developed distinct molecular means of environmental sensing. An example of such a potential adaptation is the presence of a “periplasmic binding protein-like II” domain-containing kinase exclusively in the three enoplean species. This domain is found in receptors across animals, and is involved in ligand-protein or protein-protein interactions, including sensing of and induction of chemotaxis towards nutrient sources (Felder et al., 1999). Although a similar domain annotation is found in a C. elegans RGC (GCY-9), the substantial variation between its sequence and those of its enoplean orthologs and the different group classification (TKL) for the latter suggest distinct functions in Trichuris. The predominant transcription of genes encoding such kinases in an early larval stage of T. suis further supports a role in sensing processes that regulate host-parasite interactions, such as establishment of infection, host immune modulation, chemotaxis and/or localisation of nutrient sources. A second example of a potential adaptation exclusive to enoplean kinomes is the presence of a UBA domain in the LATS orthologs of the enopleans explored here. Although UBA domains are present in LATS protein kinases of humans (LATS1 and LATS2), this domain was not found in the respective orthologs of C. elegans (WTS-1) or H. contortus. The UBA domain is a sequence motif of ~45 amino acid residues that is found in a diverse range of proteins, including protein kinases, and has been suggested to act as a protein-protein interaction site (Mueller and Feigon, 2002). LATS kinases are part of the Hippo signalling pathway in humans and C. elegans, in which WTS-1 assumes key roles in developmental processes including larval and pharynx development, lifespan and body length control, dauer formation and apical intestinal membrane integrity (Cai et al., 2009; Kang et al., 2009). The genome of T.
suis encodes orthologs of both kinases associated with the canonical Hippo pathway (KIN_133, KIN_079 and KIN_120), but the presence of two (rather than one) LATS orthologs and seven additional kinases inferred to be associated with this pathway for each of the three enoplean species suggests that their Hippo pathway is distinct from that in C. elegans, and might involve the UBA domain as a protein-protein interaction site. In conclusion, the present study provides, to our knowledge, the first detailed information on the kinomes of T. suis and T. trichiura, as well as the first findings from comparative studies of five diverse nematode kinomes, serving as a framework for further detailed explorations of kinase signalling and evolution in nematodes. This and future work should provide a useful resource for investigating this important class of enzymes in parasitic worms and might pave the way for the identification of selected kinases as drug targets in some of

the most socioeconomically important parasitic worms.

5.5 References Alesutan I, Sopjani M, Dermaku-Sopjani M, Munoz C, Voelkl J, Lang F, 2012. Upregulation of Na+-coupled glucose transporter SGLT1 by Tau tubulin kinase 2. Cell. Physiol. Biochem. 30, 458-465. Almilaji A, Munoz C, Hosseinzadeh Z, Lang F, 2013. Upregulation of Na+, Cl--coupled betaine/gamma-amino-butyric acid transporter BGT1 by Tau tubulin kinase 2. Cell. Physiol. Biochem. 32, 334-343. Arden SR, Smith AM, Booth MJ, Tweedie S, Gounaris K, Selkirk ME, 1997. Identification of serine/threonine protein kinases secreted by Trichinella spiralis infective larvae. Mol. Biochem. Parasitol. 90, 111-119. Birnby DA, Link EM, Vowels JJ, Tian H, Colacurcio PL, Thomas JH, 2000. A transmembrane guanylyl cyclase (DAF-11) and Hsp90 (DAF-21) regulate a common set of chemosensory behaviors in Caenorhabditis elegans. Genetics 155, 85-104. Blaxter M, Koutsovoulos G, 2015. The evolution of parasitism in Nematoda. Parasitology 142 (Suppl. 1), S26-S39. Blum T, Briesemeister S, Kohlbacher O, 2009. MultiLoc2: integrating phylogeny and Gene Ontology terms improves subcellular protein localization prediction. BMC Bioinformatics 10, 274. Bradley JE, Jackson JA, 2004. Immunity, immunoregulation and the ecology of trichuriasis and ascariasis. Parasite Immunol. 26, 429-441. Brenner S, 1988. The nematode Caenorhabditis elegans. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, USA. Bruce RG, 1970. Structure of the esophagus of the infective juvenile and adult Trichinella spiralis. J. Parasitol. 56, 540-549. Cai Q, Wang W, Gao Y, Yang Y, Zhu Z, Fan Q, 2009. Ce-wts-1 plays important roles in Caenorhabditis elegans development. FEBS Lett. 583, 3158-3164. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL, 2009. BLAST+: architecture and applications. BMC Bioinformatics 10, 421. Cantacessi C, Young ND, Nejsum P, Jex AR, Campbell BE, Hall RS, Thamsborg SM, Scheerlinck JP, Gasser RB, 2011. 
The transcriptome of Trichuris suis - first molecular insights into a parasite with curative properties for key immune diseases of humans. PLoS One 6, e23590. De Ley P, 2006. A quick tour of nematode diversity and the backbone of nematode phylogeny. WormBook, ed. The C. elegans Research Community, WormBook, https://doi.org/10.1895/wormbook.1.41.1 Desjardins CA, Cerqueira GC, Goldberg JM, Dunning Hotopp JC, Haas BJ, Zucker J, Ribeiro JM, Saif S, Levin JZ, Fan L, Zeng Q, Russ C, Wortman JR, Fink DL, Birren BW, Nutman TB, 2013. Genomics of Loa loa, a Wolbachia-free filarial parasite of humans. Nat. Genet. 45, 495-500. Despommier DD, Müller M, 1976. The stichosome and its secretion granules in the mature muscle larva of Trichinella spiralis. J. Parasitol. 62, 775-785. Despommier DD, 1998. How does Trichinella spiralis make itself at home? Parasitol. Today 14, 318-323. Devaiah BN, Lewis BA, Cherman N, Hewitt MC, Albrecht BK, Robey PG, Ozato K, Sims RJ, 3rd, Singer DS, 2012. BRD4 is an atypical kinase that phosphorylates serine2 of the RNA polymerase II carboxy-terminal domain. Proc. Natl. Acad. Sci. USA 109, 6927-6932. Dissous C, Khayath N, Vicogne J, Capron M, 2006. Growth factor receptors in helminth parasites: signalling and host-parasite relationships. FEBS Lett. 580, 2968-2975.

Felder CB, Graul RC, Lee AY, Merkle HP, Sadee W, 1999. The Venus flytrap of periplasmic binding proteins: an ancient protein module present in multiple drug receptors. AAPS PharmSci. 1, E2. Foth BJ, Tsai IJ, Reid AJ, Bancroft AJ, Nichol S, Tracey A, Holroyd N, Cotton JA, Stanley EJ, Zarowiecki M, Liu JZ, Huckvale T, Cooper PJ, Grencis RK, Berriman M, 2014. Whipworm genome and dual-species transcriptome analyses provide molecular insights into an intimate host-parasite interaction. Nat. Genet. 46, 693-700. Gabaldon T, Koonin EV, 2013. Functional and evolutionary implications of gene orthology. Nat. Rev. Genet. 14, 360-366. Gilabert A, Curran DM, Harvey SC, Wasmuth JD, 2016. Expanding the view on the evolution of the nematode dauer signalling pathways: refinement through gene gain and pathway co-option. BMC Genomics 17, 476. Goetz SC, Liem KF, Jr., Anderson KV, 2012. The spinocerebellar ataxia-associated gene Tau tubulin kinase 2 controls the initiation of ciliogenesis. Cell 151, 847-858. Goldberg JM, Griggs AD, Smith JL, Haas BJ, Wortman JR, Zeng Q, 2013. Kinannote, a computer program to identify and classify members of the eukaryotic protein kinase superfamily. Bioinformatics 29, 2387-2394. Gough J, Karplus K, Hughey R, Chothia C, 2001. Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. J. Mol. Biol. 313, 903-919. Gounaris K, Thomas S, Najarro P, Selkirk ME, 2001. Secreted variant of nucleoside diphosphate kinase from the intracellular parasitic nematode Trichinella spiralis. Infect. Immun. 69, 3658-3662. Hall DH, Russell RL, 1991. The posterior nervous system of the nematode Caenorhabditis elegans: serial reconstruction of identified neurons and complete pattern of synaptic interactions. J. Neurosci. 11, 1-22. Hallem EA, Spencer WC, McWhirter RD, Zeller G, Henz SR, Ratsch G, Miller DM, 3rd, Horvitz HR, Sternberg PW, Ringstad N, 2011. 
Receptor-type guanylate cyclase is required for carbon dioxide sensation by Caenorhabditis elegans. Proc. Natl. Acad. Sci. USA 108, 254-259. Hansen EP, Kringel H, Williams AR, Nejsum P, 2015. Secretion of RNA-containing extracellular vesicles by the porcine whipworm, Trichuris suis. J. Parasitol. 101, 336-340. Hewezi T, 2015. Cellular signaling pathways and posttranslational modifications mediated by nematode effector proteins. Plant Physiol. 169, 1018-1026. Howe KL, Bolt BJ, Shafie M, Kersey P, Berriman M, 2016. WormBase ParaSite - a comprehensive resource for helminth genomics. Mol. Biochem. Parasitol. https://doi.org/10.1016/j.molbiopara.2016.11.005 Huang X, Madan A, 1999. CAP3: A DNA sequence assembly program. Genome Res. 9, 868-877. Inglis PN, Ou G, Leroux MR, Scholey JM, 2007. The sensory cilia of Caenorhabditis elegans. WormBook, ed. The C. elegans Research Community, WormBook, https://doi.org/10.1895/wormbook.1.126.2 Jasmer DP, Goverse A, Smant G, 2003. Parasitic nematode interactions with mammals and plants. Annu. Rev. Phytopathol. 41, 245-270. Jex AR, Liu S, Li B, Young ND, Hall RS, Li Y, Yang L, Zeng N, Xu X, Xiong Z, Chen F, Wu X, Zhang G, Fang X, Kang Y, Anderson GA, Harris TW, Campbell BE, Vlaminck J, Wang T, Cantacessi C, Schwarz EM, Ranganathan S, Geldhof P, Nejsum P, Sternberg PW, Yang H, Wang J, Wang J, Gasser RB, 2011. Ascaris suum draft genome. Nature 479, 529-533.

Jex AR, Nejsum P, Schwarz EM, Hu L, Young ND, Hall RS, Korhonen PK, Liao S, Thamsborg S, Xia J, Xu P, Wang S, Scheerlinck JP, Hofmann A, Sternberg PW, Wang J, Gasser RB, 2014. Genome and transcriptome of the porcine whipworm Trichuris suis. Nat. Genet. 46, 701-706. Jones P, Binns D, Chang HY, Fraser M, Li W, McAnulla C, McWilliam H, Maslen J, Mitchell A, Nuka G, Pesseat S, Quinn AF, Sangrador-Vegas A, Scheremetjew M, Yong SY, Lopez R, Hunter S, 2014. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236-1240. Kanehisa M, Goto S, 2000. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27-30. Kanehisa M, Sato Y, Kawashima M, Furumichi M, Tanabe M, 2016. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44, D457-D462. Kang J, Shin D, Yu JR, Lee J, 2009. Lats kinase is involved in the intestinal apical membrane integrity in the nematode Caenorhabditis elegans. Development 136, 2705-2715. Kent WJ, 2002. BLAT - the BLAST-like alignment tool. Genome Res. 12, 656-664. Korhonen PK, Pozio E, La Rosa G, Chang BCH, Koehler AV, Hoberg EP, Boag PR, Tan P, Jex AR, Hofmann A, Sternberg PW, Young ND, Gasser RB, 2016. Phylogenomic and biogeographic reconstruction of the Trichinella complex. Nat. Commun. 7, 10513. Kozek WJ, 2005. Are bacillary bands responsible for expulsion of Trichinella spiralis? Vet. Parasitol. 132, 69-73. L'Hernault SW, 2006. Spermatogenesis. WormBook, ed. The C. elegans Research Community, WormBook, https://doi.org/10.1895/wormbook.1.85.1 Langmead B, Salzberg SL, 2012. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357-359. Li B, Dewey CN, 2011. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12, 323. Li L, Stoeckert Jr. CJ, Roos DS, 2003. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178-2189. 
Lillywhite JE, Cooper ES, Needham CS, Venugopal S, Bundy DA, Bianco AE, 1995. Identification and characterization of excreted/secreted products of Trichuris trichiura. Parasite Immunol. 17, 47-54.
Lok JB, 2016a. The developmental biology of parasitic nematodes. PLoS Pathog. 12, e1005328.
Lok JB, 2016b. Signaling in parasitic nematodes: physicochemical communication between host and parasite and endogenous molecular transduction pathways governing worm development and survival. Curr. Clin. Micro. Rept. 3, 186-197.
Lopes Torres EJ, de Souza W, Miranda K, 2013. Comparative analysis of surface using conventional, low vacuum, environmental and field emission scanning electron microscopy. Vet. Parasitol. 196, 409-416.
Manning G, Whyte DB, Martinez R, Hunter T, Sudarsanam S, 2002. The protein kinase complement of the human genome. Science 298, 1912-1934.
Manning G, 2005. Genomic overview of protein kinases. WormBook, ed. The C. elegans Research Community, WormBook, https://doi.org/10.1895/wormbook.1.60.1
Mi H, Muruganujan A, Casagrande JT, Thomas PD, 2013. Large-scale gene function analysis with the PANTHER classification system. Nat. Protoc. 8, 1551-1566.
Miranda-Saavedra D, Stark MJ, Packer JC, Vivares CP, Doerig C, Barton GJ, 2007. The complement of protein kinases of the microsporidium Encephalitozoon cuniculi in relation to those of Saccharomyces cerevisiae and Schizosaccharomyces pombe. BMC Genomics 8, 309.


Mitreva M, Jasmer DP, Zarlenga DS, Wang Z, Abubucker S, Martin J, Taylor CM, Yin Y, Fulton L, Minx P, Yang SP, Warren WC, Fulton RS, Bhonagiri V, Zhang X, Hallsworth-Pepin K, Clifton SW, McCarter JP, Appleton J, Mardis ER, Wilson RK, 2011. The draft genome of the parasitic nematode Trichinella spiralis. Nat. Genet. 43, 228-235.
Mueller TD, Feigon J, 2002. Solution structures of UBA domains reveal a conserved hydrophobic surface for protein-protein interactions. J. Mol. Biol. 319, 1243-1255.
Muhlrad PJ, Ward S, 2002. Spermiogenesis initiation in Caenorhabditis elegans involves a casein kinase 1 encoded by the spe-6 gene. Genetics 161, 143-155.
Neal SJ, Park J, DiTirro D, Yoon J, Shibuya M, Choi W, Schroeder FC, Butcher RA, Kim K, Sengupta P, 2016. A forward genetic screen for molecules involved in pheromone-induced dauer formation in Caenorhabditis elegans. G3 (Bethesda) 6, 1475-1487.
Pettitt J, Philippe L, Sarkar D, Johnston C, Gothe HJ, Massie D, Connolly B, Müller B, 2014. Operons are a conserved feature of nematode genomes. Genetics 197, 1201-1211.
Plowman GD, Sudarsanam S, Bingham J, Whyte D, Hunter T, 1999. The protein kinases of Caenorhabditis elegans: a model for signal transduction in multicellular organisms. Proc. Natl. Acad. Sci. USA 96, 13603-13610.
Reinke V, Smith HE, Nance J, Wang J, Van Doren C, Begley R, Jones SJ, Davis EB, Scherer S, Ward S, Kim SK, 2000. A global profile of germline gene expression in C. elegans. Mol. Cell 6, 605-616.
Reuter JA, Spacek DV, Snyder MP, 2015. High-throughput sequencing technologies. Mol. Cell 58, 586-597.
Rice P, Longden I, Bleasby A, 2000. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 16, 276-277.
Ronquist F, Teslenko M, van der Mark P, Ayres DL, Darling A, Hohna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP, 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst. Biol. 61, 539-542.
Roy A, Kucukural A, Zhang Y, 2010. I-TASSER: a unified platform for automated protein structure and function prediction. Nat. Protoc. 5, 725-738.
Santos LN, Silva ES, Santos AS, De Sa PH, Ramos RT, Silva A, Cooper PJ, Barreto ML, Loureiro S, Pinheiro CS, Alcantara-Neves NM, Pacheco LG, 2016. De novo assembly and characterization of the Trichuris trichiura adult worm transcriptome using Ion Torrent sequencing. Acta Trop. 159, 132-141.
Schaap P, 2005. Guanylyl cyclases across the tree of life. Front. Biosci. 10, 1485-1498.
Sheffield HG, 1963. Electron microscopy of the bacillary band and stichosome of Trichuris muris and T. vulpis. J. Parasitol. 49, 998-1009.
Slater GS, Birney E, 2005. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6, 31.
Sonnhammer EL, Eddy SR, Durbin R, 1997. Pfam: a comprehensive database of protein domain families based on seed alignments. Proteins 28, 405-420.
Stroehlein AJ, Young ND, Korhonen PK, Jabbar A, Hofmann A, Sternberg PW, Gasser RB, 2015. The Haemonchus contortus kinome - a resource for fundamental molecular investigations and drug discovery. Parasit. Vectors 8, 623.
Stroehlein AJ, Young ND, Korhonen PK, Chang BCH, Sternberg PW, La Rosa G, Pozio E, Gasser RB, 2016. Analyses of compact Trichinella kinomes reveal a MOS-like protein kinase with a unique N-terminal domain. G3 (Bethesda) 6, 2847-2856.
Takeishi A, Yu YV, Hapiak VM, Bell HW, O'Leary T, Sengupta P, 2016. Receptor-type guanylyl cyclases confer thermosensory responses in C. elegans. Neuron 90, 235-244.


Tilney LG, Connelly PS, Guild GM, Vranich KA, Artis D, 2005. Adaptation of a nematode parasite to living within the mammalian epithelium. J. Exp. Zool. A Comp. Exp. Biol. 303, 927-945.
Varkey JP, Jansma PL, Minniti AN, Ward S, 1993. The Caenorhabditis elegans spe-6 gene is required for major sperm protein assembly and shows second site non-complementation with an unlinked deficiency. Genetics 133, 79-86.
Veglia F, 1915. The anatomy and life-history of the Haemonchus contortus. Vet. Res. 4, 347-500.
Wright KA, 1968. Structure of the bacillary band of Trichuris myocastoris. J. Parasitol. 54, 1106-1110.
Wright KA, Chan J, 1973. Sense receptors in the bacillary band of trichuroid nematodes. Tissue Cell 5, 373-380.
Yang J, Yan R, Roy A, Xu D, Poisson J, Zhang Y, 2015. The I-TASSER suite: protein structure and function prediction. Nat. Methods 12, 7-8.
Zhang Y, Skolnick J, 2004. Scoring function for automated assessment of protein structure template quality. Proteins 57, 702-710.


Table 5.1 The kinomes of four enoplean nematodes^a.

Kinase              Trichuris   Trichuris   Trichinella   Trichinella
classification^b    suis        trichiura   spiralis      pseudospiralis

Eukaryotic protein kinases (ePKs)
AGC                 21          21          21            21
CAMK                25          25          26            26
CK1                 43          43          33            39
CMGC                29          29          28            28
Other               27          29          31            31
RGC                 3           3           3             3
STE                 17          17          18            18
TK                  35          35          29            30
TKL                 16          16          16            16

Atypical protein kinases (aPKs) and protein kinase-like (PKL) sequences
ABC1                1           2           2             2
Alpha               1           1           1             1
ATM-related         8           8           5             5
BRD                 11          11          2             1
DHS-27/NHRR         16          16          4             4
PDHK                2           2           2             2
PI4K                7           7           1             1
RIO                 3           3           3             3
TAF1                1           1           1             1
PKL                 15          20          n.d.          n.d.

Totals              281         289         226           232

^a The numbers of representative amino acid sequences are listed for all species.
^b AGC, nucleoside-regulated kinases; CAMK, Ca2+/calmodulin-dependent kinases; CK1, casein kinase 1 kinases; CMGC, cyclin-dependent kinases (CDKs)/mitogen-activated protein kinases (MAPKs)/glycogen synthase kinases (GSKs)/CDK-like kinases; RGC, receptor guanylate cyclases; STE, MAPK cascade kinases; TK, tyrosine kinases; TKL, tyrosine kinase-like kinases; ATM-related, ataxia telangiectasia mutated-related (including ATM, ATR, DNAPK, FRAP, SMG1 and TRRAP families); BRD, bromodomain-containing; NHRR, nuclear hormone receptor-related; PDHK, pyruvate dehydrogenase kinase; PI4K, phosphatidylinositol 4-kinase; RIO, right open reading frame kinase; TAF1, transcription initiation factor TFIID subunit 1; PKL, (protein) kinase-like kinases; n.d., not determined.


Figure 5.1 Structural alignment of predicted three-dimensional structures of KIN_019 of Trichuris suis (light blue) and Trichuris trichiura (dark blue) superimposed onto the Protein Data Bank (PDB) structure of a mixed lineage kinase-like protein (PDB accession number: 4BTF; orange). The structural models of these two sequences had template modelling (TM) scores of > 0.87 and root-mean-square deviation (RMSD) values of < 2.1 Å.


[Figure 5.2: eight line-plot panels, one per cluster: Cluster I (n = 14), Cluster II (n = 28), Cluster III (n = 139), Cluster IV (n = 59), Cluster V (n = 7), Cluster VI (n = 4), Cluster VII (n = 17) and Cluster VIII (n = 13). X-axis: L1/L2, L3, L4, stic, post, whole. Y-axis: TPM.]

Figure 5.2 Eight clusters of transcription profiles for kinase genes of Trichuris suis (Appendix 5.9). Transcription levels are represented as unnormalised transcripts per million (TPM) values on the Y-axis (scaled individually according to the highest value within each cluster). L1/L2, L3, L4 and adult worm tissue sections (stichosome (stic), posterior end (post), whole worm (whole)) of T. suis are shown on the X-axis. Shaded lines represent individual transcription profiles; bold lines represent Lowess trend lines ± S.D. (dashed lines). For the data points representing the posterior end and whole worm of adults, both sexes are plotted (red, female; blue, male).


Figure 5.3 Venn diagram depicting clusters of orthologous kinase sequences among Trichuris suis (Tsu), Trichuris trichiura (Ttr), Trichinella spiralis (Tsp), Haemonchus contortus (Hco) and Caenorhabditis elegans (Cel), inferred using OrthoMCL (protein BLAST E-value of ≤ 1e-5; sequence similarity of ≥ 50%; Appendix 5.10).


[Figure 5.4: two scatter-plot panels (A and B). Y-axis: Amino acid sequence similarity (%). X-axis: Kinase sequences ordered by mean sequence similarity to Trichuris suis orthologs across all species. Legend: Trichuris trichiura, Trichinella spiralis, Caenorhabditis elegans, Haemonchus contortus.]

Figure 5.4 Comparison of Trichuris suis sequences with orthologs in other species, ordered by decreasing average similarity among all orthologs. (A) Full-length sequence comparison. (B) Sequence comparison of the protein kinase-like domain (SSF56112). Dots represent similarity values between individual amino acid sequences and their T. suis orthologs (Appendices 5.11 and 5.12). Lines represent Lowess trend lines.


CHAPTER 6 General discussion ______

Understanding protein kinase signalling pathways in helminths informs how parasitic worms regulate physiology, morphology, development and metabolism. Therefore, knowledge of the exact protein kinase complements (kinomes) of helminths and their characteristics is of paramount significance for elucidating some of the most interesting fundamental biological questions about these parasites, and could assist in the search for new interventions against socioeconomically important parasitic worms. The present thesis (i) established and iteratively improved an advanced bioinformatic workflow system, which was (ii) employed to comprehensively identify, curate, classify and functionally annotate the full complements of protein kinases in the genomes of seven parasitic worms. This work (iii) enabled the exploration of fundamental aspects of kinase signalling in worms based on developmental transcriptomes and comparisons among species, and, from an applied perspective, (iv) facilitated the computational prioritisation of protein kinases as targets for the development of small-molecule anthelmintic agents. The objectives of the present chapter were: (a) to summarise the main technical, fundamental and applied achievements of this thesis; (b) to discuss the findings of this work in the broader context of helminth and kinase biology as well as anthelmintic drug discovery; and (c) to provide perspectives on areas deserving of future investigations, extending the present work.

Despite a recent increase in the availability of transcriptomic and genomic data sets for parasitic worms (Howe et al., 2016), very little is known about the protein kinases encoded in these genomes and their associated functions. At the outset of this thesis, several tools and resources for the identification and classification of protein kinases were available (see Chapter 1).
However, none of these tools could achieve a comprehensive curation of the identified protein kinase genes, as they did not provide an option of including additional genomic and transcriptomic data sets into the gene curation process. Furthermore, these tools did not consider the fragmentation of draft genomes and the substantial kinome diversity between the well-curated model organism Caenorhabditis elegans (see Harris et al., 2014) and other, distantly related species, which represented challenges, both for the identification and the classification of protein kinase sequences in these genomes. To address some of the limitations of available tools, in the present thesis, a superior pipeline was established,


validated and employed to characterise the kinomes of selected parasitic helminths. The pipeline was continuously improved to address some of the individual challenges inherent to the analysis of different genomic and transcriptomic data sets. In total, this thesis describes results for 1996 curated protein kinase sequences representing seven diverse worm species, providing the largest resource of well-curated kinases to explore the fundamental biology of helminth kinases and pursue applied avenues in the future, such as investigations of protein kinases as targets for anthelmintic drugs.

6.1 Technical achievements

Nucleotide and amino acid sequence data are now routinely, rapidly and accurately searched for patterns ranging from a few nucleotide bases or amino acid residues up to whole genomic scaffolds (Durbin et al., 1998). Search algorithms allow the automated retrieval of similar sequences in other organisms or the identification of conserved, functional amino acid domains, which has substantially increased our ability to infer the relatedness of sequences and their potential biological functions (Finn et al., 2010; Xu and Dunbrack, 2012). Stochastic models (e.g., hidden Markov models (HMMs) and position-specific scoring matrices (PSSMs); cf. section 1.11) allow for sensitive and specific searches for conserved functional domains, such as the protein kinase catalytic domain, and have been implemented in many tools, including those employed to identify protein kinases (Chapter 1; Martin et al., 2009; Goldberg et al., 2013). However, sequences from species that are phylogenetically distinct from those represented in the ‘seed’ alignments used to build the HMMs for kinase catalytic domains (Pkinase, PF00069 and Pkinase_Tyr, PF07714) might not be identified due to a lack of sensitivity of such HMMs. Therefore, in Chapter 2, we used the Pfam HMMs representing kinase catalytic domains to interrogate a range of trematode genomes and then constructed new, trematode-specific HMMs based on the identified sequences for each kinase group, which showed a higher sensitivity than the canonical Pfam HMMs. However, such an approach might ‘overfit’ the stochastic model to a particular organism, thus losing general specificity for a wide range of protein kinases of other species. Accordingly, in the future, combining HMMs from all kinase sequences identified and curated in this thesis and other curated kinase sequences of helminths (e.g., C. elegans; see Manning, 2005) with the protein kinase HMMs in Pfam would allow the creation of improved models that are more sensitive to phylogenetically distant kinase sequences than those currently employed, while remaining specific toward protein kinases.
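The effect of extending a generic model with clade-specific sequences can be illustrated with a deliberately simplified sketch. It uses an ungapped PSSM rather than a profile HMM (it is not the HMMER/Pfam machinery used in this thesis), a uniform background distribution and a fixed pseudocount, all of which are assumptions made for brevity. The sequences are hypothetical toy windows, not curated kinase data.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Build a log-odds PSSM from gap-free aligned sequences of equal length."""
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(width):
        counts = Counter(seq[col] for seq in alignment)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_window(pssm, sequence):
    """Return (score, start) of the best-scoring ungapped window in sequence."""
    width = len(pssm)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        # Residues outside the 20 standard amino acids contribute a score of 0.
        score = sum(pssm[i].get(sequence[start + i], 0.0) for i in range(width))
        if score > best[0]:
            best = (score, start)
    return best


# Hypothetical toy data (not real curated kinase sequences).
seed = ["HRDLKPEN", "HRDIKPEN", "HRDLKPQN"]  # stands in for a generic seed alignment
clade_hits = ["HRNLKSEN", "HRNIKSEN"]        # stands in for divergent, curated hits
query = "MAAHRNLKSENLL"                      # divergent query sequence

seed_score, _ = best_window(build_pssm(seed), query)
combined_score, start = best_window(build_pssm(seed + clade_hits), query)
print(f"seed only: {seed_score:.2f}; seed + clade hits: {combined_score:.2f}")
```

In this toy case the divergent query scores higher against the combined model than against the seed-only model, which mirrors the gain in sensitivity described above. The same mechanism also shows why a model built only from clade-specific sequences could drift away from kinases of other species.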


In contrast to the approach employed for the identification and classification of the kinomes of schistosomes (Chapter 2), the relatively close taxonomic relationship between Haemonchus contortus and C. elegans (Rhabditina; clade V; see Blaxter and Koutsovoulos, 2015) allowed a more straightforward identification and classification, using the conventional Pfam HMMs and the well-curated kinome of C. elegans (Chapter 3). The relatively high degree of amino acid sequence conservation between these two species also facilitated a more comprehensive and reliable inference of functions and pathway associations for H. contortus compared with species whose amino acid sequences are less similar to orthologs in the arguably best-studied model organism C. elegans (see Harris et al., 2014). This information can now be employed to confidently reconstruct individual putative pathways for H. contortus (cf. Gilabert et al., 2016; Mohandas et al., 2016). However, given that C. elegans is entirely free-living and H. contortus has both free-living and parasitic stages, we deemed it likely that there are differences in biochemical kinase signalling pathways. Indeed, we detected seven H. contortus protein kinase sequences without a C. elegans homolog, and reported several other quantitative and qualitative adaptations in the kinome of H. contortus compared with C. elegans (Chapter 3). In this context, we employed an improved strategy for kinase characterisation and functional annotation; unclassified kinase sequences were functionally annotated based on the presence of all domains and/or signatures determined by InterProScan (Jones et al., 2014), as well as their order within the kinase sequence (“domain architecture”). This new functionality was adapted from that implemented in the domain architecture search tool “InterPro Domain Architecture” (IDA; https://www.ebi.ac.uk/interpro/search/domain-organisation).
However, this online tool currently only allows searches for single architectures and does not permit batch queries of, for example, entire kinomes. Furthermore, a local, stand-alone version of the IDA tool is currently not available and the development thereof is not planned in the near future (InterProScan staff, personal communication). While our workflow currently supports the identification of domain architectures and a functional classification based on this information, a stand-alone tool that allows high-throughput searches for sequences with the same or a similar domain architecture would be highly beneficial. Such an implementation would add useful functionality to the pipeline, both in terms of evolutionary investigations (i.e. exploring gain or loss of functional domains in kinases across different phylogenetic taxa) and the curation of kinase sequences. In the context of the latter application, the semi-automated analysis of domain architectures in this thesis helped identify putative fusions (i.e. co-occurrence in the same protein sequence) of


domains that do not occur together in other kinase sequences within the InterPro database. Such novel fusions can indicate an erroneously assembled gene sequence, and require critical assessment at the genomic and transcriptomic sequence level to either support (based on RNA-Seq evidence) or refute the predicted model. This approach was applied in Chapters 4 and 5 and proved very useful for the correction and complementation of erroneous gene predictions. Such incorrect predictions can have a substantial impact on subsequent analyses of inferred protein sequences. For example, an automated identification of a kinase protein whose gene has been incorrectly fused by a gene predictor (Figure 6.1) may fail because the HMM or PSSM built to detect a kinase catalytic domain produces a low-confidence score for such a sequence. Similarly, even if the sequence is correctly identified as a kinase using such models, an unusual N-terminal or C-terminal accessory (i.e. non-catalytic) domain might lead to a misclassification, or preclude classification altogether, particularly if the classification is based on BLAST (such as the local search against KinBase implemented in Kinannote; Goldberg et al., 2013). Employing a kinase characterisation strategy based on conserved domains allowed the identification of such erroneous gene predictions and the curation of their gene models. Manual and/or semi-automated curation of the associated genomic and transcriptomic data was carried out to confirm the correct gene prediction for these sequences. To this end, a genomic viewer (Integrative Genomics Viewer (IGV); Thorvaldsdóttir et al., 2013) was an indispensable tool for overlaying gene models, genomic locations and complementary data, including functional domains, RNA-Seq support and de novo-assembled transcripts (Figure 6.2).
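The batch comparison of domain architectures described above can be sketched as follows. This is a minimal illustration and not the implementation used in the pipeline: the input is assumed to be a list of (protein, domain accession, start, end) tuples already parsed from a domain search, and the protein identifiers, the accessions ACC_A and ACC_B and the reference set of known architectures are hypothetical. Only PF00069 is taken from the text.

```python
from collections import defaultdict


def domain_architectures(hits):
    """Map each protein ID to its tuple of domain accessions in sequence order.

    hits: iterable of (protein_id, accession, start, end) tuples.
    """
    by_protein = defaultdict(list)
    for protein_id, accession, start, end in hits:
        by_protein[protein_id].append((start, end, accession))
    return {
        protein_id: tuple(acc for _, _, acc in sorted(domains))
        for protein_id, domains in by_protein.items()
    }


def flag_novel_architectures(architectures, known_architectures):
    """Return sorted IDs of proteins whose architecture is absent from the reference set."""
    known = set(known_architectures)
    return sorted(pid for pid, arch in architectures.items() if arch not in known)


# Hypothetical example: two kinase candidates with one accessory domain each.
hits = [
    ("kin_001", "PF00069", 50, 300),
    ("kin_001", "ACC_A", 5, 40),
    ("kin_002", "PF00069", 10, 260),
    ("kin_002", "ACC_B", 300, 400),
]
known = {("ACC_A", "PF00069"), ("PF00069",)}

architectures = domain_architectures(hits)
candidates = flag_novel_architectures(architectures, known)
print(candidates)
```

Proteins returned by flag_novel_architectures are only candidates for inspection; as stated above, each flagged model still needs to be supported or refuted using genomic and transcriptomic evidence.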
Following the curation of gene models, reliable classification of inferred protein kinase sequences using Kinannote and an orthology-/phylogeny-based approach was readily achieved. In this context, manual curation helped to identify ‘hotspots’ of potentially incorrect predictions, such as mis-assembled genes; we observed that, if two or more genes were encoded in relatively close vicinity to each other on a genomic scaffold, the original gene prediction software (MAKER2; Holt and Yandell, 2011) tended to splice multiple genes to produce a single gene model, when, in fact, they represented multiple, separate genes (Figure 6.1). This might be a difficulty inherent to gene predictions for helminth genomes, as their genomes are more compact (i.e. they have smaller intergenic regions and potentially overlapping genes; Bernot, 2004; Coghlan, 2005; Rödelsperger et al., 2013). The use of IGV, de novo-assembled transcripts and mapped paired-end RNA-Seq reads refuted or supported such gene predictions; gene models were suggestive of being incorrect if both paired-end


reads supporting (i.e. spanning) an intronic region and a de novo-assembled transcript whose open reading frame (ORF) spanned the entire length of the predicted gene were lacking. In contrast, most gene models were confirmed either using transcriptomic evidence for the same species or, using our pairwise curation approach (see section 2.2.1), based on transcripts assembled for the closest related species. In Chapter 5, the availability of several transcriptomic data sets for Trichuris suis enabled the application of an improved strategy for re-prediction of gene models. The mapping of the de novo-assembled transcripts combined from multiple developmental stages to the genomic scaffolds created a robust data set that enabled subsequent re-assembly (using CAP3; Huang and Madan, 1999) and inference of ORFs. This addition substantially improved the pipeline (Chapter 5). Since this step was computationally tractable, it was fully automated and substantially sped up the confirmation of existing and/or the improvement of erroneous gene predictions. However, the success of this approach hinges on the number of RNA-Seq libraries (and accordingly, the quantity of RNA-Seq reads) available for an organism. For example, the re-prediction of gene models of the related Trichuris trichiura kinome (Chapter 5) was mainly achieved by transferring the high-confidence gene models from T. suis to T. trichiura because de novo-assembled transcripts of T. trichiura could not be re-assembled and improved, did not cover predicted kinase genes or were of insufficient quality. The improvement or correction of gene predictions, as established and applied in this work, represents an important component, which has a substantial impact on subsequent analyses of genomic and transcriptomic data sets.
Without iterative cycles of curation of existing genomic and transcriptomic resources, incomplete or incorrect gene predictions for a particular helminth genome deposited in public repositories will most likely be perpetuated, as these data will be utilised as protein evidence (e.g., in the MAKER2 pipeline (Holt and Yandell, 2011); cf. Mudge and Harrow, 2016) for de novo gene predictions (i.e., those lacking RNA-Seq support) in novel genome projects of related species. This emphasises the need for improved gene prediction software (e.g., the recently released MAKER3 pipeline; v. 3.00.0-beta; http://www.yandell-lab.org/software/maker.html), for comprehensive curation of erroneous gene predictions (either by individual curators or through research community efforts; Bateman, 2010; Putman et al., 2017) and importantly, for integration of curated data into existing repositories. Additionally, improved protein classification tools, such as the one described in this thesis, will facilitate a more comprehensive annotation of inferred amino acid sequences and a more reliable subclassification.
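The criterion used in this section to flag gene models as potentially incorrect (no paired-end reads spanning an intron, combined with no de novo-assembled transcript whose ORF covers the full predicted gene) can be written as a simple decision rule. The sketch below is a simplification under stated assumptions: intron-spanning read counts and transcript ORF coordinates are taken as precomputed inputs in scaffold coordinates, the min_reads threshold is an assumed parameter, and all names and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    gene_id: str
    start: int     # 1-based scaffold coordinates, inclusive
    end: int
    introns: list  # list of (start, end) tuples


def introns_without_reads(model, spliced_reads, min_reads=1):
    """Return introns supported by fewer than min_reads intron-spanning read pairs."""
    return [i for i in model.introns if spliced_reads.get(i, 0) < min_reads]


def has_full_length_orf(model, transcript_orfs):
    """True if any transcript ORF (start, end) covers the whole predicted gene."""
    return any(s <= model.start and e >= model.end for s, e in transcript_orfs)


def is_suspect(model, spliced_reads, transcript_orfs, min_reads=1):
    """Flag a model only if both lines of transcriptomic evidence are lacking."""
    lacking_reads = bool(introns_without_reads(model, spliced_reads, min_reads))
    return lacking_reads and not has_full_length_orf(model, transcript_orfs)


# Hypothetical example: the second intron has no spanning reads and the only
# assembled transcript ORF covers just the first half of the model.
model = GeneModel("gene_0001", 100, 900, [(300, 400), (600, 700)])
print(is_suspect(model, {(300, 400): 12}, [(100, 500)]))
```

A flagged model is a candidate for manual inspection in a genomic viewer rather than a confirmed error. Note that, under this rule, a single-exon model is never flagged, because it has no introns to evaluate.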


To enable the integration of data produced by our pipeline, we provide coding sequences, amino acid sequences and genome locations for all curated kinome data sets produced in this work. In addition, in the final version of the workflow (Chapter 5), the creation of a generic feature format (GFF) file was implemented and automated; this format can be read by most genomic viewers/editors, such as IGV (Thorvaldsdóttir et al., 2013) or Web Apollo (Lee et al., 2013), and allows gene-level annotations to be interactively viewed and edited, and published draft gene sets to be updated in online repositories. Using an established, generic output format, such as GFF, represents a major advantage of the presented pipeline in comparison with other tools for kinase identification and classification. In addition to its usefulness for the manual curation and editing of gene models or their annotation, the genomic viewer also facilitated the iterative improvement of the present pipeline and its functionalities. Once a new method or functionality had been implemented, its performance was assessed by simulating common ‘use cases’ or ‘edge cases’ (i.e. cases that occur rarely but when they occur, are likely to cause the program/algorithm to fail or not perform satisfactorily). This systematic evaluation was aided by manual confirmation of representative gene/transcript features (‘spot checking’) via a genomic viewer, and helped confirm the satisfactory performance of the implementation. Such assessments of newly implemented, automated methods allowed the identification of areas that could be improved and/or further automated in the next developmental cycle of the pipeline. For example, the prediction/correction of gene models based on transcript data employing CAP3 and Exonerate (Slater and Birney, 2005) was first established in Chapter 4 and then automated and further improved in Chapter 5. 
This improvement included an automated, global analysis of donor and acceptor sites of introns, which was then integrated into the gene prediction process in Exonerate, thus improving the accuracy of intron-exon boundary predictions. Although Exonerate had already been employed from Chapter 2 onwards to re-predict gene models (see Figure 2.1), we initially carried out multiple individual runs of single genes that needed to be complemented based on manual inspection. These examples show how the pipeline was successively adapted to the technical challenges of various, distinct data sets, particularly with respect to the quality and amount of data available. The continuous growth of publicly available data sets and data repositories should enable further integrative expansions of this framework in the future and highlights the need for reliable and reproducible gene prediction/annotation workflows. Taken together, manual curation and systematic assessment of gene models informed the automation process. In this context, it is important to carefully weigh up the benefit gained


from automating a particular step versus the time spent implementing the automation, as well as the accuracy and speed of the automated method versus that achieved by manual curation. It is also important to consider the size of the data set that is to be curated. For example, protein kinases represent a relatively small and well-defined data set (~200-500 sequences) compared with other assemblages of proteins (e.g., all proteins that play a role in metabolic pathways; likely thousands), which allows for the application of additional, manual curation efforts. Generally, newly automated steps should only be implemented in the next iteration of a tool or pipeline if automation can be achieved without a loss of accuracy, or with a loss that is acceptable. Importantly, the focus here was to automate steps that, although being algorithmically less challenging (e.g., file format conversions or output-input transformations), often represent steps in a workflow that are most prone to human error. In contrast, other steps proved harder to automate and were more reliable if implemented in a semi-automated manner with guidance/intervention by an expert. This approach did not affect the ability to process the input in a computer-readable format and allowed decisions to be made and stored for future runs of the pipeline, achieving an asymptotic automated performance with continued use (i.e. approximating a near-optimal or optimal solution). Although computational functional annotation has its limitations (reviewed in Mudge and Harrow, 2016) and cannot achieve the accuracy of detailed, manual curation (which, on the other hand, is subject to human bias, potentially affecting transparency and reproducibility), it facilitates the shift from a ‘gene-centric’ to a ‘genome-centric’ analysis for a particular organism.
By employing several successive rounds of semi-automated curation and functional annotation, we achieved the most comprehensive characterisation of protein kinase sequences of parasitic worms to date. These results add evidence for potentially novel and/or biologically interesting proteins, which should now allow researchers to formulate new and intriguing hypotheses regarding the function of protein kinases in parasitic worms. To experimentally test such hypotheses, it is important to transition from a global, ‘kinome-centric’ view back to a more gene-centric view. Taken together, the present work provides a global resource for creating hypotheses and readily facilitates the display, manual curation, modification and annotation of sequences based on novel data sets obtained from experimental investigations of genes or proteins of interest, thus allowing this resource to grow and become more comprehensive over time.
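As an illustration of the generic feature format (GFF) output discussed in this section, the sketch below writes a single curated gene model as GFF3 records (nine tab-separated columns: seqid, source, type, start, end, score, strand, phase, attributes). It is not the exporter implemented in the pipeline; the function name, the source label "curated", the scaffold name and the gene identifier are hypothetical, and CDS features with phase information are omitted for brevity.

```python
def gff3_lines(seqid, gene_id, strand, exons, source="curated"):
    """Return GFF3 lines (gene, mRNA, exons) for one gene model.

    exons: list of (start, end) tuples, 1-based and inclusive.
    """
    exons = sorted(exons)
    start, end = exons[0][0], exons[-1][1]
    mrna_id = f"{gene_id}.1"

    def row(feature_type, feat_start, feat_end, attributes):
        # Score and phase are left undefined (".") for these feature types.
        return "\t".join([seqid, source, feature_type, str(feat_start),
                          str(feat_end), ".", strand, ".", attributes])

    lines = [
        "##gff-version 3",
        row("gene", start, end, f"ID={gene_id}"),
        row("mRNA", start, end, f"ID={mrna_id};Parent={gene_id}"),
    ]
    for number, (exon_start, exon_end) in enumerate(exons, 1):
        lines.append(row("exon", exon_start, exon_end,
                         f"ID={mrna_id}.exon{number};Parent={mrna_id}"))
    return lines


# Hypothetical two-exon gene model on a hypothetical scaffold.
lines = gff3_lines("scaffold_1", "gene_0001", "+", [(500, 650), (100, 250)])
print("\n".join(lines))
```

A file assembled from such lines can be loaded alongside RNA-Seq alignments in a genomic viewer, which is what makes the spot checks described above practical.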


6.2 Sequence curation and functional annotation

Although many of the technical achievements in this thesis were made because of a need for comprehensive data analysis and curation of genomic and transcriptomic data, differences in parasite biology also determined some of the strategies applied for kinase identification, classification and annotation. In this context, the pairwise curation approach applied for all kinome analyses required the careful selection of a related helminth species, for which high-quality genomic and transcriptomic data were available; on one hand, the selection of two phylogenetically very divergent species might have led to a lack of comparative genomic and transcriptomic evidence needed for curation; on the other hand, two very closely related species might not have added any complementary information to the re-prediction process. Together, the careful selection of two species for pairwise curation and the use of additional, independent data sets (e.g., those created employing distinct methodologies) allowed ‘consensus’ gene models to be created from all data sets. Employing this approach, we overcame some of the challenges inherent in using draft genome assemblies and inferred high-confidence gene models. In addition, the pairwise comparison approach applied for gene-level (i.e. protein-coding sequences) curation also allowed an iterative refinement of kinase classification and functional annotation through the cross-validation of genomic and transcriptomic data with the inferred protein sequences. For every prediction, a consensus was based on the evidence from RNA-Seq data, de novo-assembled transcriptomes, predicted protein domains and their architecture, genomic assemblies and gene predictions from published gene sets. These data were collated for two relatively closely related species and, in some cases, for multiple data sets representing the same species (e.g., in Chapter 5) to add confidence to predictions.
The integration of all different data types resulted in consensus gene predictions and the inference of high-confidence protein sequences and respective classifications for both species. In Chapters 4 and 5, this reciprocal, pairwise curation approach was enhanced by integrating three-dimensional modelling data, facilitating the reliable prediction of a previously uncharacterised N-terminal domain in an apparently enoplean-specific kinase for four enoplean species. This example demonstrates how the integration of a three-dimensional modelling approach can be a useful tool to enrich the annotation of well-curated protein sequences, beyond what would be possible using primary sequence- and sequence domain-based approaches. However, accurate gene prediction is essential when using this strategy, as it is critical for achieving confident ‘down-stream’ annotation. Therefore, the initial gene
curation represents a crucial component of the workflow, as all subsequent identifications, classifications and annotations rely on the quality of the initial gene prediction. Only a gene prediction of high confidence, with sufficient transcriptomic support, an accurate genomic location and an accurate ORF, can lead to a confident protein annotation, both at the primary sequence and structural levels.

In addition to the importance of confident predictions of gene models, the location of genes on genomic scaffolds allowed the inference of a range of other crucial features that were used in the curation and/or annotation process. Knowledge of the distance to and characteristics of genes or gene fragments located up- and down-stream of a gene, combined with information on the completeness of the protein sequences based on domain and/or sequence comparisons, was employed to resolve fragmented genes, incorrect splice sites introduced by gene predictors and errors stemming from incorrect scaffold assemblies. Furthermore, the analysis of the predicted functional domains and their comparison with known architectures of functional domains in other organisms also helped to identify and curate fragmented and/or incorrectly assembled genes.

In Chapters 3-5, an improved and more sensitive identification and classification approach based on conserved sequence domains led to an increase in the number of predicted and classified atypical kinase and kinase-like sequences. Initially (i.e. in Chapter 2), we applied the method used in Kinannote, which detects and annotates some atypical protein kinases but misses more divergent sequences (Goldberg et al., 2013). For Schistosoma haematobium and Schistosoma mansoni, Kinannote was able to classify kinases within the families ABC and RIO, but there was a lack of evidence for sequences in other atypical protein kinase families, including BRD, PDHK, PIKK and TAF1.
In contrast, for the other parasite species studied in Chapters 3-5, Kinannote identified and classified some kinases in these families. Therefore, the apparent lack of these kinase sequences in schistosomes is unlikely to be related to the use of the less sensitive method implemented in Kinannote; rather, it more likely reflects a loss of these kinases or a substantial heterogeneity in their amino acid sequences that precluded their identification. Nevertheless, the latest, enhanced approach for the identification of atypical kinase or kinase-like sequences developed in this work should be applied to the kinomes of schistosomes in the future to improve our understanding of this intriguing group of enzymes and to further assess the proposal that schistosomes and/or other species of flatworms lack these kinases.

In the context of classification of ‘unclassified’ kinase sequences, a comparison of the orthology-/phylogeny-based classification strategy applied in this thesis with that of
Kinannote also proved useful. By grouping the predicted catalytic domains of unclassified kinases into orthologous clusters and constructing phylogenetic trees, it was possible to identify sequences phylogenetically distinct from those in recognised families or subfamilies. Importantly, inconsistencies between the annotation by Kinannote and that of the orthology-based approach revealed kinase sequences that were specific to a phylogenetic branch or even a single genus/species. Such analyses can elucidate the relationship between novel, parasite-specific families that cannot be classified into any of the existing families and/or subfamilies and kinases with known classifications. This approach provided important information about the evolution of protein kinase families/subfamilies and about the potential function of novel kinases. One such example is the nematode-specific KIN16 family (Morgan and Greenwald, 1993; Manning, 2005); sequences of nematode parasites analysed here consistently formed orthologous clusters with epidermal growth factor receptor sequences encoded in the human and C. elegans genomes (Chapters 3-5), indicating a closer relationship among these sequences than previously assumed (Manning, 2005). Another example of the utility of this approach was the identification of a MOS-like kinase sequence in the four enoplean kinomes analysed here. Phylogenetically, the human MOS kinase catalytic domain clusters with that of the MLKL protein sequence (Manning et al., 2002). However, for parasitic nematodes the MOS-like sequence did not form an orthologous cluster with the human MLKL or MOS sequence, which led to a further investigation of this unusual protein sequence and a prediction of its three-dimensional structure (Chapters 4 and 5).
For other sequences that have diverged from their orthologs in other, non-nematode genomes, the combination of a controlled vocabulary and an orthology-/phylogeny-based approach allowed them to be assigned automatically to the recognised groups, families and subfamilies. A similar approach has been applied previously (Desjardins et al., 2013) to achieve an improved kinase classification for a range of nematode kinomes. The program Kinannote was published shortly afterwards by the same authors, who reported two kinomes of filarial nematodes (Loa loa and Wuchereria bancrofti; see Goldberg et al., 2013) that had already been investigated in the earlier publication (Desjardins et al., 2013). The kinase complements of these two filarial nematodes differed considerably between the two studies, which was likely related to the distinct orthology-based curation approaches employed.

The advantages of Kinannote are that (i) it is easy to use, (ii) it takes only approximately five minutes to run on a protein data set of 10,000-15,000 sequences and (iii) it is fully automated. Furthermore, it produces a draft kinome and comparative statistics on the analysed kinome and is the only tool that automatically classifies protein kinases using the
controlled vocabulary of Hanks and Hunter (Hanks and Hunter, 1995). Because of these desirable features, Kinannote was integrated into our pipeline. However, automated identification and/or classification relies on evidence inferred from either protein domain annotation or sequence comparison, and decisions based on such evidence are usually made in a ‘black box’, which does not allow the user to assess intermediate results or adapt the process to individual requirements (e.g., relaxed cut-off scores or the use of specialised models/databases). Furthermore, Kinannote did not provide functionality to curate the underlying genomic and/or transcriptomic data from which the query sequences were created or to assess the quality of the input amino acid data set. Accordingly, applying this approach to inaccurate, uncurated gene predictions inferred from draft genomic data could lead to a failure to identify some protein kinase sequences or to an incorrect identification and/or classification.

Taken together, the integration of a pairwise reciprocal curation approach, Kinannote and an orthology-/phylogeny-based approach into the framework of our curation and annotation pipeline represents a transparent, improved, simplified and practical strategy for comprehensive kinase classification, and addresses some of the shortcomings of other tools (e.g., Kinomer, Martin et al., 2009; Kinannote, Goldberg et al., 2013).
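
To illustrate how an automated classification and orthology-based evidence might be reconciled in a transparent, inspectable way, the following minimal Python sketch compares, for a single query sequence, the family call of an automated classifier with the families of already classified members of its orthologous cluster. It is not the implementation used in this thesis: the function name, the status labels and the input format are hypothetical, and real output from Kinannote or OrthoMCL would first need to be parsed into this form.

```python
from collections import Counter
from typing import List, Optional, Tuple


def reconcile_family(
    automated_call: Optional[str],
    cluster_families: List[Optional[str]],
) -> Tuple[Optional[str], str]:
    """Return (family, status) for one query sequence.

    automated_call   -- family assigned by an automated classifier, or None
                        if the sequence was left unclassified (hypothetical input)
    cluster_families -- families of the other members of the same orthologous
                        cluster; None marks members that are also unclassified
    """
    known = [family for family in cluster_families if family is not None]

    if not known:
        # No classified ortholog: either trust the automated call alone or
        # flag the sequence as a candidate for a novel, lineage-specific family.
        if automated_call is None:
            return None, "novel_candidate"
        return automated_call, "automated_only"

    # Most frequent family among the classified cluster members.
    majority_family, _count = Counter(known).most_common(1)[0]

    if automated_call is None:
        return majority_family, "assigned_by_orthology"
    if automated_call == majority_family:
        return majority_family, "agree"
    # Disagreement is reported rather than resolved silently.
    return None, "conflict_manual_review"
```

In such a scheme, sequences returned with the status "conflict_manual_review" or "novel_candidate" would be the ones passed on to phylogenetic analysis and manual curation, so that every automated decision remains visible to the user.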

6.3 Fundamental and applied achievements

In addition to the use of RNA-Seq data to support and improve gene predictions and the inference of protein-coding genes, such data are also applicable to the investigation of fundamental biological processes. In this thesis, paired-end RNA-Seq data from different developmental stages, sexes and/or tissues were mapped to well-curated kinase transcript sequences, which allowed the exploration of how kinase transcription is regulated across different biological time points and/or locations (Chapters 2, 3 and 5). However, for most parasites investigated, such an analysis was partially hindered because data were not available for some developmental stages (e.g., due to challenges in obtaining such material) or because developmental stages were represented by single samples. Without sufficient biological replicates, statistical enrichment analysis of transcripts for a particular developmental stage is hardly feasible, which hampers the inference of biologically meaningful information. Nevertheless, some findings in the present thesis (Chapters 2, 3 and 5) showed that kinase genes undergo substantial transcriptional regulation throughout parasite life cycles, and some of the observed variation was suggestive of roles in reproduction, development and morphological changes. In the future, it would be beneficial
to obtain RNA-Seq data for other stages and/or tissues (including replicates) for some of these parasites, in particular, for those for which only a limited number of stages is currently available (e.g., S. haematobium and Trichinella spp.; Chapters 2 and 4). In Chapter 5, an enrichment analysis was employed to investigate links between pathway association, classification or functional domains of kinase sequences and the transcription profile of their genes. Although this analysis revealed some enriched kinase families in adult male T. suis, which could be linked to male-specific developmental/reproductive processes, it also highlighted the limitations of such an approach for most parasitic species; most enrichment analyses were impaired by a relatively low percentage of kinases that could be assigned to a pathway and/or specific GO terms, even for species that are more closely related to the model organism C. elegans (cf. Harris et al., 2014), such as H. contortus (see Chapter 3). Nevertheless, most of the kinases that could be assigned to a pathway had at least one ortholog in C. elegans, suggesting that kinase sequences that are conserved between C. elegans and parasitic nematodes are more likely to be accurately assigned to a pathway. This finding emphasises the need for a more detailed analysis and curation of pathways in parasitic nematodes (cf. Gilabert et al., 2016; Mohandas et al., 2016) and the integration of such data into pathway databases, such as the Kyoto Encyclopedia of Genes and Genomes (KEGG; Kanehisa and Goto, 2000). In addition to the utility of pathway and transcription data for the investigation of the fundamental biology of parasites, these data can also be used for applied investigations such as the prediction of novel targets for small molecules (chemicals) with potential anthelmintic activity. 
For example, kinase genes transcribed across the entire life cycle of a parasite are more likely to have an essential function in all stages of development (Shanmugam et al., 2012). Nonetheless, kinases that are transcribed selectively in a particular life cycle stage or tissue and that could be targeted by a drug should also be considered in a prioritisation approach, as they might represent a specific, yet essential, component for parasite survival in this particular stage/tissue. In addition, the identification of unique elements within biochemical pathways (‘chokepoints’; Shanmugam et al., 2012; Taylor et al., 2013) also provides useful data for the drug target prioritisation process. In this context, it is important to note that differences in biology and substantial genetic differences between parasites and well-annotated model organisms, such as C. elegans, have had an impact on the ability to achieve comprehensive pathway annotation for some protein kinases. Nevertheless, in the present thesis, information derived from the analysis of transcription profiles and pathway associations has aided the rational selection of potential drug targets in
helminths and has facilitated the formulation of testable hypotheses (cf. Chapters 2 and 3) which, in the future, might unravel some of the fundamental questions about kinase signalling in parasitic helminths. Information from such investigations could, in turn, inform the “prioritisation system” to improve future predictions and assist in formulating novel hypotheses. In this context, prioritised compounds are not only significant in terms of drug discovery, but can also serve as biological probes to further investigate the biology of kinase signalling in helminths (Knapp et al., 2013; Arrowsmith et al., 2015; Counago et al., 2017). In Chapter 2, we employed a stringent filtering approach to define potential drug targets, which led to a list of 42 compounds with potential as inhibitors of protein kinases in schistosomes. In contrast, we adapted this approach in Chapter 3, and employed a less restrictive, ranking-based strategy. Although the addition of several other criteria (such as the number of databases containing a compound associated with a target, the number of predicted compounds, the sequence similarity of a target to a homolog in the host organism and the potential of a protein to function as a broad-spectrum target) initially led to a smaller number of kinases to be considered, this approach was less restrictive regarding the compound selection and suggested 1517 molecules with potential anthelmintic activity. These compounds could now be further filtered in a similar fashion as applied in Chapter 2 (i.e. by restricting the selection to compounds that are approved or are being investigated in clinical trials) to narrow down the list of potential drug candidates. Generally, the amount of filtering applied depends on the planned follow-up experiments and the available capacities for experimental investigations. 
For example, if experimental costs or feasibility allow only a single or a few targets/compounds to be investigated, the list of drug targets/drug candidates needs to be reduced further. In this context, a ranking approach should be weighed against a pure filtering approach, and the risk of losing potentially promising candidates because they do not satisfy a single criterion should be considered (Shanmugam et al., 2012).

In the case of the 1517 molecules predicted to be potential protein kinase inhibitors in H. contortus, we tested a subset of them for activity against this parasite (Stroehlein et al., unpublished findings; Jiao et al., 2017), using an automated, high-throughput, phenotypic screening platform (Preston et al., 2015). Interestingly, these experiments revealed several ‘hit’ compounds that are known to target human kinases, orthologs of which had been predicted as potential drug targets in our ranking approach: PD 0325901 is a human MEK inhibitor (Barrett et al., 2008) and was predicted to target the MEK3 kinase Hc-PK-197.1 in H. contortus (see Chapter 3), which had the second-highest score in our target-ranking approach. Another hit compound identified in H. contortus, SNS-032 (Jiao et al., 2017),
inhibits CDK2, CDK7 and CDK9 in humans (Chen et al., 2009). Interestingly, four kinases in the CDK family (Hc-PK-002.1, Hc-PK-062.1, Hc-PK-063.1 and Hc-PK-236.1) were among the ten highest-ranking kinases in our drug target prioritisation (Chapter 3). Which of these kinases is the most likely target of SNS-032 is currently unknown and should be the subject of future in silico and in vitro investigations. A third hit compound, the tyrosine kinase inhibitor (“tyrphostin”) AG1295 (Jiao et al., 2017), targets human platelet-derived growth factor receptors (PDGFRs; Kovalenko et al., 1994), which belong to a family that is missing from the kinomes of all species studied in the present work (Chapters 2-5), that of C. elegans (see Manning, 2005) and those of many other nematodes (see Desjardins et al., 2013). Since PDGFR is not present in H. contortus, it cannot be the molecular target of AG1295 and, accordingly, was not identified in our ranking approach. However, AG1295 might target some of the five other, related growth factor receptors encoded in the genome of H. contortus (see Chapter 3). Thus, one or more representatives of the families EGFR and/or KIN16 might be the molecular target of AG1295. This proposal is particularly intriguing because the KIN16 family, although distantly related to the EGFR family (cf. Chapter 5), has to date been found only in nematodes. Accordingly, kinases in this family might represent parasite-specific targets that are distinct from all mammalian kinases. These proposals warrant further in-depth investigations, with a focus on AG1295, its potential target(s) and its mode of action in H. contortus.

Taken together, the hit compounds identified suggest that the drug target-ranking approach applied to H. contortus in Chapter 3 is useful and that targets that are ranked highest via computational prioritisation warrant further exploration.
For instance, we predicted ten additional compounds to bind to the MEK3 kinase Hc-PK-197.1; these could also be investigated using the established in vitro assay and compared with the structure of the hit compound PD 0325901 to infer potential structure-activity relationships (SAR). Similarly, for each of the CDKs, between two and 662 associated compounds were predicted; these could also be compared structurally with SNS-032 and investigated in vitro as to their ability to reduce the motility of parasitic larvae of H. contortus. However, the finding that some of the compounds predicted to target the highest-ranking protein kinases are promising hits in vitro is currently anecdotal, and a large-scale, comprehensive evaluation is required to show that the present ranking approach produces a statistically significant enrichment of promising targets and/or compounds.
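
One simple way to frame such an evaluation, assuming a screen in which every tested compound is scored as a hit or a non-hit, is a one-sided hypergeometric test: if the prioritised compounds were no better than a random draw from the screened library, how likely would it be to observe at least the recorded number of hits among them? The following minimal Python sketch uses only the standard library. The function and parameter names are hypothetical, and the test is offered as one possible analysis, not one that was performed in this thesis.

```python
from math import comb


def hypergeom_tail(total: int, hits: int, selected: int, hits_in_selected: int) -> float:
    """Probability of at least `hits_in_selected` hits among `selected` compounds.

    total            -- number of compounds screened
    hits             -- number of screened compounds scored as hits
    selected         -- number of screened compounds linked to prioritised targets
    hits_in_selected -- number of hits observed among the selected compounds

    The selected compounds are treated as a draw without replacement from the
    screened library (the null hypothesis of no enrichment).
    """
    most_possible = min(hits, selected)
    ways_total = comb(total, selected)
    ways_as_extreme = sum(
        comb(hits, k) * comb(total - hits, selected - k)
        for k in range(hits_in_selected, most_possible + 1)
    )
    return ways_as_extreme / ways_total
```

For example, `hypergeom_tail(total=1000, hits=50, selected=100, hits_in_selected=12)` would give the probability of finding 12 or more hits among 100 prioritised compounds by chance alone if 50 of 1,000 screened compounds were hits; these numbers are invented purely for illustration. A small probability would support the ranking approach, whereas a large one would indicate that the hits observed so far are compatible with chance.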

6.4 Prospects and future extensions

While the application of an in vitro phenotypic screening platform (Preston et al., 2015) has proven useful for investigating individual compounds predicted to target parasite kinases in this thesis, screening a large number of kinase inhibitors in the future would provide statistically robust data to support and/or improve the applied drug target/compound prioritisation process. In this context, the predictions made in this work represent an important reference data set that could now be validated in vitro. Complementary to in vitro testing, in silico structural prediction approaches, such as the ones employed in Chapters 4 and 5, could also be used in the future to obtain further computational evidence of the three-dimensional structures of potential drug targets, which would then allow the computational ‘pre-screening’ of much larger libraries against predicted target structures (Ferreira et al., 2015; Sarnpitak et al., 2015), representing a cost- and time-saving step prior to in vitro phenotypic screening. Such an approach could be further improved by a more refined structure prediction strategy, for example, by predicting and comparing binding pockets across different proteins and by identifying sub-pockets specific to particular kinases (Volkamer et al., 2015, 2016). Subsequently, virtual screening and/or docking approaches using tools such as AutoDock Vina (Trott and Olson, 2010), in combination with deep convolutional neural network (‘deep learning’) approaches (Pereira et al., 2016), could be employed to elucidate the likely binding mode of a ligand to a predicted structure. A limitation of such approaches is the lack of experimentally determined three-dimensional (crystal) structures for proteins of parasitic worms and a reliance on computational prediction based on structural homology using tools such as I-TASSER (Yang et al., 2015).
This aspect highlights the need for structural investigations of protein kinases in parasitic worms to assist in assessing their potential as drug targets. Akin to improved drug target identification strategies enabled by additional experimental data (i.e. structural investigations and phenotypic screening), some of the challenges inherent to the analysis of draft genomes might be overcome in the future through the application of novel technologies, such as third-generation, long-read sequencing (Roberts et al., 2013; Reuter et al., 2015). Such technologies could help resolve genomic regions that are notoriously difficult to assemble (e.g., repeat regions; Roberts et al., 2013; Reuter et al., 2015), thus improving overall genome assemblies. More contiguous assemblies would lead to improved gene predictions, which would facilitate a more comprehensive annotation of inferred protein sequences, including kinase classification as well as domain and pathway annotation.

Additionally, some of the hypotheses regarding the fundamental biology of parasitic worms that were formulated in this thesis can now be tested experimentally by applying a range of advanced, “systems biology” strategies. Such approaches would allow for the integration of curated data on protein kinases produced in this thesis and would facilitate the global analysis of cellular signalling mechanisms beyond that of protein phosphorylation alone. Importantly, such experiments would greatly enrich the presented data sets, because they would provide novel evidence for the predictions made and would help to validate and/or further improve some of the computational approaches established and/or employed here. For instance, the study of transcriptomic profiles of kinase genes in different developmental stages and/or tissues employed here (Chapters 2, 3 and 5) could be further corroborated by sequencing and characterising small, non-coding RNAs (cf. Bai et al., 2014; Ma et al., 2016; Claycomb et al., 2017) or by employing proteomic approaches (cf. Dewalick et al., 2011; Hong et al., 2013; Zhang et al., 2013; Sotillo et al., 2015). In addition, the excretory/secretory (ES) proteins predicted for some of the parasites studied here (Chapters 4 and 5) could also be experimentally investigated by proteomic studies (cf. Robinson and Connolly, 2005; Robinson et al., 2005; Liu et al., 2009; Chaiyadet et al., 2016; Cortes et al., 2016), guided by the predictions made in this thesis. Another potential avenue would be the identification of phosphorylated proteins in these parasites (phosphoproteomics), as previously reported for S. japonicum (see Cheng et al., 2013) and Pristionchus pacificus (see Borchert et al., 2012). In the context of kinase characterisation, this would provide important clues regarding the substrates of kinases and, thus, would aid in the deconvolution of signalling pathways. 
To gain further insight into the interactions among signalling proteins in these pathways, proteomic approaches could also be employed to identify and characterise protein complexes (Rigaut et al., 1999). Information gleaned from the application of these technologies would enhance our understanding of signalling in parasitic worms and would inform future drug discovery efforts, which could be further aided through the use of a chemical proteomics assay for kinase inhibitor profiling (Medard et al., 2015). In addition to such proteomic experiments, the application of chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-Seq; cf. Cosseau et al., 2009; Cosseau and Grunau, 2011; Roquis et al., 2015) and/or the global analysis of DNA methylation by bisulfite sequencing (MethylC-Seq; cf. Gao et al., 2012; Urich et al., 2015) would facilitate studies of the epigenetic mechanisms regulating gene expression and enhance the understanding of gene regulation across different life cycle stages and tissues of helminths.

Another interesting area for future studies would be the quantitative analysis of the complement of small-molecule, non-proteinaceous metabolites in biological samples via mass spectrometry (MS) or nuclear magnetic resonance (NMR) spectroscopy (Dettmer et al., 2007; Roessner and Bowne, 2009; Markley et al., 2017). Although metabolomic investigations of unicellular eukaryotic parasites currently dominate the literature (reviewed in Preidis and Hotez, 2015; Vincent and Barrett, 2015), there is a growing number of metabolomic studies of helminths, some of which have provided novel insights into the interaction between parasites and their host environment via excreted metabolites (Laan et al., 2017); how metabolic profiles in host tissues are altered during infection (Nishina et al., 2004; Saric et al., 2010); and how worms sense environmental and/or organismal cues (Hsueh et al., 2017). Although the latter study was conducted in the free-living nematode C. elegans, a similar approach could be applied to parasitic helminths to study which environmental and/or host cues trigger host invasion, migration and/or parasite development and metamorphosis. In addition to these potential avenues, other metabolomic approaches could also provide new insights into the biochemical composition of metabolites within cells or tissues (Prosser et al., 2014). The exploration of tissue or whole-parasite metabolomes could lead to the new experimental evidence needed for the detection of novel metabolic pathways and/or the reconstruction of canonical ones.
A combination of such a strategy with predicted pathway maps based on sequence homology, such as those constructed in the present thesis (Chapters 2-5), would allow the systematic integration of biomolecular interactions at the protein and metabolite levels and the application of computational tools for network modelling, to gain new, fundamental insights into the biochemical signalling of parasites (Roberts et al., 2009; Holmes, 2010; Yamanishi et al., 2015; Tabei et al., 2016). Additionally, from an applied perspective, such investigations would provide new evidence to inform drug discovery efforts, as they could facilitate the deciphering of the mode of action of a drug by monitoring the response of the parasite to chemotherapeutics (Creek and Barrett, 2014). In this context, metabolomics has also proven useful for characterising fractions and/or small molecules from natural products that have anti-parasitic activity (Holmes, 2010; Kumarasingha et al., 2016). Furthermore, metabolomic studies could inform essentiality predictions based on pathway analyses conducted in this thesis (Chapters 2 and 3). Importantly, this approach would greatly benefit from being combined with gene knockdown experiments (Maule et al., 2011; Dalzell et al., 2012), as it would allow new
hypotheses regarding the function of kinase genes in particular biochemical pathways to be tested. While such knockdown experiments can be routinely carried out for unicellular eukaryotic parasites in a relatively high-throughput manner (cf. Alsford et al., 2011; Kolev et al., 2011; Morf et al., 2013), RNAi-mediated knockdown has met with variable success in different species of parasitic worms (Geldhof et al., 2007; Knox et al., 2007; Viney and Thompson, 2008; Maule et al., 2011). Although extensive genomic and transcriptomic data sets have now enabled the identification of the complements of RNAi effector proteins in many parasitic nematodes (e.g., Dalzell et al., 2011; Schwarz et al., 2013), challenges in the application and experimental design of RNAi-mediated knockdown remain (Dalzell et al., 2012). However, several more recent studies report the successful application of RNAi to a range of nematode species. For example, the expression of the immunomodulatory paramyosin-encoding gene was successfully silenced in Trichinella spiralis via soaking and electroporation techniques (Chen et al., 2012). For H. contortus, some promising studies also report gene silencing (Samarasinghe et al., 2011; Zawadzki et al., 2012), although success depended on the life cycle stage used, the RNAi delivery method and the tissue expression of the target genes. Other studies report the application of RNAi to the brown stomach worm Teladorsagia circumcincta (see Tzelos et al., 2015), the large pig roundworm Ascaris suum (see McCoy et al., 2015) and the entomopathogenic nematode Heterorhabditis bacteriophora (see Ratnappan et al., 2016). For schistosomes, there are also numerous reports of the successful application of RNAi (reviewed in Hagen et al., 2012; Da'dara and Skelly, 2015), and a recent study achieved persistent knockdown of selected genes using a lentivirus transduction approach (Hagen et al., 2014; Hagen et al., 2015).
The application of such a virus-based transduction system to parasitic nematodes would be a major advance and would, if successful, overcome the challenges associated with conventional RNAi approaches in parasitic nematodes (Maule et al., 2011; Dalzell et al., 2012). Extending results from this thesis, this technique could be used to study functional roles of kinases and signalling pathways in the development and reproduction of helminths, and would represent a powerful tool, generally, for functional genomic investigations. Furthermore, it would provide crucial experimental evidence to instil additional confidence in computational essentiality predictions and to explore some of the proposals made in this thesis regarding protein function. In addition to knockdown experiments to test the essentiality of kinase genes in parasites, the use of small-molecule chemicals to elicit lethal or sub-lethal phenotypes would also have merit and could support essentiality and drug target predictions.

Taken together, pursuing the avenues outlined in the present chapter and employing innovative experimental approaches should lead to major, novel and fundamental biological insights and should have direct, applied implications. However, the new, diverse and large data sets produced will require an expansion of the presented pipeline to integrate this novel evidence. Therefore, additional experimental investigations should be followed by technical improvements and expansions of the current pipeline. For example, information on pathways, essentiality and excretory/secretory (ES) proteins could be integrated into the current version of the pipeline to add further experimental evidence to the computational predictions. The establishment of a flexible, user-friendly and expandable platform that would facilitate such an integration of novel data sets would be desirable in the future. Clearly, the present pipeline provides a solid framework for further technical developments in this direction.

Other possible expansions of the current pipeline are not related to the integration of new data types, but rather to the improvement of the speed, accuracy and/or efficiency of currently applied tools and methods. For example, although the strategy employed by the InterPro Domain Architecture (IDA) tool has been implemented in our pipeline (Figure 6.3), the ability to query all known domain architectures represented in the InterPro database would be a desirable feature that could be included in the future. In addition, for all steps that employ HMM searches against sequence databases (including those carried out in InterProScan and Kinannote), there is potential to increase search sensitivity and accuracy by replacing this search strategy with “HMM versus HMM” alignments (as implemented in the program HHblits; Remmert et al., 2011).
However, given that the current pipeline relies on several tools that have already implemented a “sequence versus HMM” search functionality, an upgrade to an “HMM versus HMM” alignment-based method would require the adaptation of all third-party tools used in this work and the construction of customised databases of query sequence data sets (e.g., parasite kinomes) for use with HHblits. Given that the approach developed in this thesis performed very well for protein kinase catalytic domains, there seems to be no need for such an adaptation at this point (it would represent a substantial body of work, with likely only a marginal gain in sensitivity and/or accuracy). In contrast, the structural prediction using I-TASSER could be readily automated and scaled to allow high-throughput analyses. The challenge here would be the requirement for substantial computing infrastructure and time (these computations have been run on an IBM iDataplex x86 system containing 67 nodes with 256 GB RAM and 16 cores per node;


https://www.melbournebioinformatics.org.au/capabilities/), which led to a restriction of this technology/approach to a subset of selected kinases in this thesis. Taken together, the current pipeline consists of a range of third-party tools (including InterProScan, OrthoMCL, BLAT, Exonerate, hmmalign, MrBayes, Kinannote and I-TASSER; Figure 6.3) and scripts developed in this work (written in the programming languages Perl, bash and R) that must be run in succession by the user, which requires at least advanced, if not expert, bioinformatic skills. The major advantage of a pipeline that is separated into multiple well-defined steps is that it provides flexibility that could not easily be achieved using a more restrictive (i.e. allowing for less interaction and decision-making by the user), stand-alone version. Given that genomic and transcriptomic data sets of parasitic organisms are often highly diverse and that each project requires the careful assessment of data and subsequent selection of analysis strategies, an interactive, stepwise workflow represents an advantage. To retain this flexibility and simultaneously create a more user-friendly application, a framework that allows the modularisation of individual steps into compatible and interchangeable components could be developed. Such a module-based approach would also allow the adaptation of the existing pipeline to the investigation of protein families other than kinases, by exchanging the module employed for kinase identification and classification for one designed for the analysis of a different protein family. This might readily be achieved for well-defined enzyme classes such as phosphatases (Chen et al., 2017), but other, less well-understood and less well-studied classes might be more challenging.
For example, proteins of the cysteine-rich secretory proteins/antigen 5/pathogenesis-related 1 (CAP) superfamily (also called SCP/TAPS proteins; Cantacessi et al., 2009; Cantacessi and Gasser, 2012) form an assembly of very diverse proteins, which poses challenges for the functional annotation and classification of members of this superfamily (Cantacessi and Gasser, 2012). For such a complex group of proteins, a library of subfamily-specific HMMs could be built based on functional domain architectures, as employed for some atypical protein kinase families and subfamilies in this thesis (Chapters 4 and 5). A module representing such a library could then be integrated into the established pipeline. This approach could be employed for other protein families, making the present bioinformatic framework broadly applicable. For example, proteins belonging to the group of nuclear hormone receptors (NHRs) and NHR-related proteins (NHRRs) exhibit sequence diversity similar to that of members of the CAP superfamily, containing many distinct gene families and subfamilies, albeit having a relatively conserved overall domain architecture (Taubert et al., 2011; Wu and LoVerde,


2011). In addition, the NHR(R) group is greatly expanded in nematodes (Robinson-Rechavi et al., 2005) and some members have been shown to play roles in the virulence/pathogenicity of parasites (Lu et al., 2016). Interestingly, results described in this thesis show that some NHRRs share relatively high sequence similarity with the protein kinase catalytic domain and that there is a high diversity among kinase-like helminth NHRRs (Chapters 3-5). Importantly, we have identified subfamilies that appear to be restricted to enoplean nematodes and others that are proposed to be specific to species of the genus Trichuris, and that could be involved in receptor-mediated signalling pathways in these parasites (Chapter 5). Clearly, NHR(R)s are worth studying in the future to untangle their classification for a wide range of parasitic helminths and to unravel the phylogenetic and functional relationships between NHR(R)s and protein kinases. Taken together, the characterisation of CAP and NHR(R) protein families across a broad taxonomic range would allow the classification of conserved families and subfamilies present in most or all organisms, which likely play roles in essential ‘house-keeping’ functions. Importantly, it would allow the definition of parasite-specific subfamilies, which are more likely to assume roles related to parasitism or host-parasite interactions; from a drug discovery perspective, some of the members of these subfamilies might represent parasite-specific drug targets. Such subfamilies could then be explored further on a structural level to assess the presence of novel, ‘dark’ functional domains (i.e. regions of proteins for which no experimentally determined structure exists and that are inaccessible to homology modelling; cf. Perdigão et al., 2015; Bitard-Feildel and Callebaut, 2017).
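The module-based design proposed earlier in this section could be sketched as follows. This is a minimal illustration only, not the pipeline developed in this thesis (which consists of Perl, bash and R scripts around third-party tools). The names `Protein`, `FamilyModule`, `kinase_module`, `cap_module` and `run_pipeline` are hypothetical, the membership rules are deliberately simplistic stand-ins for tools such as Kinannote or a subfamily-specific HMM library, and the two example records are invented.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List


@dataclass(frozen=True)
class Protein:
    """Hypothetical record: a curated sequence and its InterPro accessions."""
    identifier: str
    domains: FrozenSet[str]


# A family module is any function that decides membership for one protein.
FamilyModule = Callable[[Protein], bool]


def kinase_module(protein: Protein) -> bool:
    # Toy rule: presence of the protein kinase domain (IPR000719, Figure 6.1).
    return "IPR000719" in protein.domains


def cap_module(protein: Protein) -> bool:
    # Toy rule: presence of the CAP domain (IPR014044, Figure 6.1).
    return "IPR014044" in protein.domains


def run_pipeline(proteins: Iterable[Protein], module: FamilyModule) -> List[str]:
    # Shared steps (curation, orthology, phylogeny) are omitted in this sketch;
    # only the family-specific step is exchanged by passing a different module.
    return [p.identifier for p in proteins if module(p)]


# Invented example records, for illustration only.
proteins = [
    Protein("example_1", frozenset({"IPR000719", "IPR000980"})),
    Protein("example_2", frozenset({"IPR014044"})),
]
print(run_pipeline(proteins, kinase_module))
print(run_pipeline(proteins, cap_module))
```

Passing the family-specific step as an argument keeps the shared steps unchanged when a different protein family is analysed, which is the property the modular framework is intended to provide.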

6.5 Conclusion

The present thesis has elucidated the protein kinase complements of seven phylogenetically diverse species of parasitic helminths, providing new insights into the signalling biology and evolution of kinases, and allowing for the first global comparison of well-curated worm kinomes. These complements should provide a useful resource for future fundamental and applied investigations of biochemical signalling mechanisms of parasitic worms as well as the discovery of new interventions against these pathogens. Importantly, the bioinformatic pipeline established in this thesis provides a powerful platform to study the kinases of many other eukaryotic organisms, both parasitic and free-living, and to unravel their evolutionary relationships. Extending this work, a modular, robust and reproducible workflow that is generally applicable to the analysis of a wide range of protein families should be a useful tool to support systems biological investigations of eukaryotic organisms.


6.6 References

Alsford S, Turner DJ, Obado SO, Sanchez-Flores A, Glover L, Berriman M, Hertz-Fowler C, Horn D, 2011. High-throughput phenotyping using parallel sequencing of RNA interference targets in the African trypanosome. Genome Res. 21, 915-924. Arrowsmith CH, Audia JE, Austin C, Baell J, Bennett J, Blagg J, Bountra C, Brennan PE, Brown PJ, Bunnage ME, Buser-Doepner C, Campbell RM, Carter AJ, Cohen P, Copeland RA, Cravatt B, Dahlin JL, Dhanak D, Edwards AM, Frederiksen M, Frye SV, Gray N, Grimshaw CE, Hepworth D, Howe T, Huber KV, Jin J, Knapp S, Kotz JD, Kruger RG, Lowe D, Mader MM, Marsden B, Mueller-Fahrnow A, Muller S, O'Hagan RC, Overington JP, Owen DR, Rosenberg SH, Roth B, Ross R, Schapira M, Schreiber SL, Shoichet B, Sundstrom M, Superti-Furga G, Taunton J, Toledo-Sherman L, Walpole C, Walters MA, Willson TM, Workman P, Young RN, Zuercher WJ, 2015. The promise and peril of chemical probes. Nat. Chem. Biol. 11, 536-541. Bai Y, Zhang Z, Jin L, Kang H, Zhu Y, Zhang L, Li X, Ma F, Zhao L, Shi B, Li J, McManus DP, Zhang W, Wang S, 2014. Genome-wide sequencing of small RNAs reveals a tissue-specific loss of conserved microRNA families in Echinococcus granulosus. BMC Genomics 15, 736. Barrett SD, Bridges AJ, Dudley DT, Saltiel AR, Fergus JH, Flamme CM, Delaney AM, Kaufman M, LePage S, Leopold WR, Przybranowski SA, Sebolt-Leopold J, Van Becelaere K, Doherty AM, Kennedy RM, Marston D, Howard WA, Jr., Smith Y, Warmus JS, Tecle H, 2008. The discovery of the benzhydroxamate MEK inhibitors CI-1040 and PD 0325901. Bioorg. Med. Chem. Lett. 18, 6501-6504. Bateman A, 2010. Curators of the world unite: the International Society of Biocuration. Bioinformatics 26, 991. Bernot A, 2004. Genome, transcriptome and proteome analysis. John Wiley & Sons, Ltd, Hoboken, New Jersey, USA. Bitard-Feildel T, Callebaut I, 2017. Exploring the dark foldable proteome by considering hydrophobic amino acids topology. Sci. Rep. 7, 41425. Blaxter M, Koutsovoulos G, 2015.
The evolution of parasitism in Nematoda. Parasitology 142 (Suppl. 1), S26-S39. Borchert N, Krug K, Gnad F, Sinha A, Sommer RJ, Macek B, 2012. Phosphoproteome of Pristionchus pacificus provides insights into architecture of signaling networks in nematode models. Mol. Cell. Proteomics 11, 1631-1639. Cantacessi C, Campbell BE, Visser A, Geldhof P, Nolan MJ, Nisbet AJ, Matthews JB, Loukas A, Hofmann A, Otranto D, Sternberg PW, Gasser RB, 2009. A portrait of the "SCP/TAPS" proteins of eukaryotes - developing a framework for fundamental research and biotechnological outcomes. Biotechnol. Adv. 27, 376-388. Cantacessi C, Gasser RB, 2012. SCP/TAPS proteins in helminths - where to from now? Mol. Cell Probes 26, 54-59. Chaiyadet S, Smout M, Laha T, Sripa B, Loukas A, Sotillo J, 2016. Proteomic characterization of the internalization of Opisthorchis viverrini excretory/secretory products in human cells. Parasitol. Int. https://doi.org/10.1016/j.parint.2016.02.001 Chen MJ, Dixon JE, Manning G, 2017. Genomics and evolution of protein phosphatases. Sci. Signal. 10, eaag1796. Chen R, Wierda WG, Chubb S, Hawtin RE, Fox JA, Keating MJ, Gandhi V, Plunkett W, 2009. Mechanism of action of SNS-032, a novel cyclin-dependent kinase inhibitor, in chronic lymphocytic leukemia. Blood 113, 4637-4645.


Chen X, Yang Y, Yang J, Zhang Z, Zhu X, 2012. RNAi-mediated silencing of paramyosin expression in Trichinella spiralis results in impaired viability of the parasite. PLoS One 7, e49913. Cheng G, Luo R, Hu C, Lin J, Bai Z, Zhang B, Wang H, 2013. TiO2-based phosphoproteomic analysis of schistosomes: characterization of phosphorylated proteins in the different stages and sex of Schistosoma japonicum. J. Proteome Res. 12, 729-742. Claycomb J, Abreu-Goodger C, Buck AH, 2017. RNA-mediated communication between helminths and their hosts: the missing links. RNA Biol. 14, 436-441. Coghlan A, 2005. Nematode genome evolution. WormBook, ed. The C. elegans Research Community, WormBook, https://doi.org/10.1895/wormbook.1.15.1 Cortes A, Sotillo J, Munoz-Antoli C, Trelis M, Esteban JG, Toledo R, 2016. Definitive host influences the proteomic profile of excretory/secretory products of the trematode Echinostoma caproni. Parasit. Vectors 9, 185. Cosseau C, Azzi A, Smith K, Freitag M, Mitta G, Grunau C, 2009. Native chromatin immunoprecipitation (N-ChIP) and ChIP-Seq of Schistosoma mansoni: critical experimental parameters. Mol. Biochem. Parasitol. 166, 70-76. Cosseau C, Grunau C, 2011. Native chromatin immunoprecipitation. Methods Mol. Biol. 791, 195-212. Counago RM, Axtman AD, Capuzzi SJ, Azevedo H, Drewry DH, Elkins JM, Gileadi O, Guimaraes CRW, Mascarello A, Serafim RAM, Wells CI, Willson TM, Zuercher WJ, 2017. Development of narrow spectrum ATP-competitive kinase inhibitors as probes for BIKE and AAK1. bioRxiv https://doi.org/10.1101/094631 Creek DJ, Barrett MP, 2014. Determination of antiprotozoal drug mechanisms by metabolomics approaches. Parasitology 141, 83-92. Da'dara AA, Skelly PJ, 2015. Gene suppression in schistosomes using RNAi. Methods Mol. Biol. 1201, 143-164. Dalzell JJ, McVeigh P, Warnock ND, Mitreva M, Bird DM, Abad P, Fleming CC, Day TA, Mousley A, Marks NJ, Maule AG, 2011. RNAi effector diversity in nematodes. PLoS Negl. Trop. Dis. 5, e1176. 
Dalzell JJ, Warnock ND, McVeigh P, Marks NJ, Mousley A, Atkinson L, Maule AG, 2012. Considering RNAi experimental design in parasitic helminths. Parasitology 139, 589-604. Desjardins CA, Cerqueira GC, Goldberg JM, Dunning Hotopp JC, Haas BJ, Zucker J, Ribeiro JM, Saif S, Levin JZ, Fan L, Zeng Q, Russ C, Wortman JR, Fink DL, Birren BW, Nutman TB, 2013. Genomics of Loa loa, a Wolbachia-free filarial parasite of humans. Nat. Genet. 45, 495-500. Dettmer K, Aronov PA, Hammock BD, 2007. Mass spectrometry-based metabolomics. Mass Spectrom. Rev. 26, 51-78. Dewalick S, Bexkens ML, van Balkom BW, Wu YP, Smit CH, Hokke CH, de Groot PG, Heck AJ, Tielens AG, van Hellemond JJ, 2011. The proteome of the insoluble Schistosoma mansoni eggshell skeleton. Int. J. Parasitol. 41, 523-532. Durbin R, Eddy SR, Krogh A, Mitchison G, 1998. Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge, United Kingdom. Eddy SR, 2011. Accelerated profile HMM searches. PLoS Comput. Biol. 7, e1002195. Ferreira LG, Dos Santos RN, Oliva G, Andricopulo AD, 2015. Molecular docking and structure-based drug design strategies. Molecules 20, 13384-13421.


Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K, Holm L, Sonnhammer EL, Eddy SR, Bateman A, 2010. The Pfam protein families database. Nucleic Acids Res. 38, D211-D222. Gao F, Liu X, Wu XP, Wang XL, Gong D, Lu H, Xia Y, Song Y, Wang J, Du J, Liu S, Han X, Tang Y, Yang H, Jin Q, Zhang X, Liu M, 2012. Differential DNA methylation in discrete developmental stages of the parasitic nematode Trichinella spiralis. Genome Biol. 13, R100. Geldhof P, Visser A, Clark D, Saunders G, Britton C, Gilleard J, Berriman M, Knox D, 2007. RNA interference in parasitic helminths: current situation, potential pitfalls and future prospects. Parasitology 134, 609-619. Gilabert A, Curran DM, Harvey SC, Wasmuth JD, 2016. Expanding the view on the evolution of the nematode dauer signalling pathways: refinement through gene gain and pathway co-option. BMC Genomics 17, 476. Goldberg JM, Griggs AD, Smith JL, Haas BJ, Wortman JR, Zeng Q, 2013. Kinannote, a computer program to identify and classify members of the eukaryotic protein kinase superfamily. Bioinformatics 29, 2387-2394. Hagen J, Lee EF, Fairlie WD, Kalinna BH, 2012. Functional genomics approaches in parasitic helminths. Parasite Immunol. 34, 163-182. Hagen J, Young ND, Every AL, Pagel CN, Schnoeller C, Scheerlinck JP, Gasser RB, Kalinna BH, 2014. Omega-1 knockdown in Schistosoma mansoni eggs by lentivirus transduction reduces granuloma size in vivo. Nat. Commun. 5, 5375. Hagen J, Scheerlinck JP, Gasser RB, 2015. Knocking down schistosomes - promise for lentiviral transduction in parasites. Trends Parasitol. 31, 324-332. Hanks SK, Hunter T, 1995. Protein kinases 6. The eukaryotic protein kinase superfamily: kinase (catalytic) domain structure and classification. FASEB J. 9, 576-596. 
Harris TW, Baran J, Bieri T, Cabunoc A, Chan J, Chen WJ, Davis P, Done J, Grove C, Howe K, Kishore R, Lee R, Li Y, Muller HM, Nakamura C, Ozersky P, Paulini M, Raciti D, Schindelman G, Tuli MA, Van Auken K, Wang D, Wang X, Williams G, Wong JD, Yook K, Schedl T, Hodgkin J, Berriman M, Kersey P, Spieth J, Stein L, Sternberg PW, 2014. WormBase 2014: new views of curated biology. Nucleic Acids Res. 42, D789-D793. Holmes E, 2010. The evolution of metabolic profiling in parasitology. Parasitology 137, 1437-1449. Holt C, Yandell M, 2011. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics 12, 491. Hong Y, Sun A, Zhang M, Gao F, Han Y, Fu Z, Shi Y, Lin J, 2013. Proteomics analysis of differentially expressed proteins in schistosomula and adult worms of Schistosoma japonicum. Acta Trop. 126, 1-10. Howe KL, Bolt BJ, Shafie M, Kersey P, Berriman M, 2016. WormBase ParaSite - a comprehensive resource for helminth genomics. Mol. Biochem. Parasitol. https://doi.org/10.1016/j.molbiopara.2016.11.005 Hsueh YP, Gronquist MR, Schwarz EM, Nath RD, Lee CH, Gharib S, Schroeder FC, Sternberg PW, 2017. Nematophagous fungus Arthrobotrys oligospora mimics olfactory cues of sex and food to lure its nematode prey. eLife 6, e20023. Huang X, Madan A, 1999. CAP3: A DNA sequence assembly program. Genome Res. 9, 868-877. Jiao Y, Preston S, Koehler AV, Stroehlein AJ, Chang BCH, Cowley KJ, Simpson KJ, Palmer MJ, Laleu B, Wells TNC, Jabbar A, Gasser RB, 2017. Screening of the 'Stasis Box'


identifies two kinase inhibitors under pharmaceutical development with activity against Haemonchus contortus. Parasit. Vectors, In press. Jones P, Binns D, Chang HY, Fraser M, Li W, McAnulla C, McWilliam H, Maslen J, Mitchell A, Nuka G, Pesseat S, Quinn AF, Sangrador-Vegas A, Scheremetjew M, Yong SY, Lopez R, Hunter S, 2014. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236-1240. Kanehisa M, Goto S, 2000. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27-30. Kent WJ, 2002. BLAT - the BLAST-like alignment tool. Genome Res. 12, 656-664. Knapp S, Arruda P, Blagg J, Burley S, Drewry DH, Edwards A, Fabbro D, Gillespie P, Gray NS, Kuster B, Lackey KE, Mazzafera P, Tomkinson NC, Willson TM, Workman P, Zuercher WJ, 2013. A public-private partnership to unlock the untargeted kinome. Nat. Chem. Biol. 9, 3-6. Knox DP, Geldhof P, Visser A, Britton C, 2007. RNA interference in parasitic nematodes of animals: a reality check? Trends Parasitol. 23, 105-107. Kolev NG, Tschudi C, Ullu E, 2011. RNA interference in protozoan parasites: achievements and challenges. Eukaryot. Cell 10, 1156-1163. Kovalenko M, Gazit A, Böhmer A, Rorsman C, Ronnstrand L, Heldin CH, Waltenberger J, Böhmer FD, Levitzki A, 1994. Selective platelet-derived growth factor receptor kinase blockers reverse sis-transformation. Cancer Res. 54, 6106-6114. Kumarasingha R, Karpe AV, Preston S, Yeo TC, Lim DS, Tu CL, Luu J, Simpson KJ, Shaw JM, Gasser RB, Beale DJ, Morrison PD, Palombo EA, Boag PR, 2016. Metabolic profiling and in vitro assessment of anthelmintic fractions of Picria fel-terrae Lour. Int. J. Parasitol. Drugs Drug Resist. 6, 171-178. Laan LC, Williams AR, Stavenhagen K, Giera M, Kooij G, Vlasakov I, Kalay H, Kringel H, Nejsum P, Thamsborg SM, Wuhrer M, Dijkstra CD, Cummings RD, van Die I, 2017. The whipworm (Trichuris suis) secretes prostaglandin E2 to suppress proinflammatory properties in human dendritic cells. FASEB J. 31, 719-731. 
Lee E, Helt GA, Reese JT, Munoz-Torres MC, Childers CP, Buels RM, Stein L, Holmes IH, Elsik CG, Lewis SE, 2013. Web Apollo: a web-based genomic annotation editing platform. Genome Biol. 14, R93. Li L, Stoeckert Jr. CJ, Roos DS, 2003. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178-2189. Liu F, Cui SJ, Hu W, Feng Z, Wang ZQ, Han ZG, 2009. Excretory/secretory proteome of the adult developmental stage of human blood fluke, Schistosoma japonicum. Mol. Cell. Proteomics 8, 1236-1251. Lu CJ, Tian BY, Cao Y, Zou CG, Zhang KQ, 2016. Nuclear receptor nhr-48 is required for pathogenicity of the second stage (J2) of the plant parasite Meloidogyne incognita. Sci. Rep. 6, 34959. Ma G, Luo Y, Zhu H, Luo Y, Korhonen PK, Young ND, Gasser RB, Zhou R, 2016. MicroRNAs of Toxocara canis and their predicted functional roles. Parasit. Vectors 9, 229. Manning G, Whyte DB, Martinez R, Hunter T, Sudarsanam S, 2002. The protein kinase complement of the human genome. Science 298, 1912-1934. Manning G, 2005. Genomic overview of protein kinases. WormBook, ed. The C. elegans Research Community, WormBook, https://doi.org/10.1895/wormbook.1.60.1 Markley JL, Bruschweiler R, Edison AS, Eghbalnia HR, Powers R, Raftery D, Wishart DS, 2017. The future of NMR-based metabolomics. Curr. Opin. Biotechnol. 43, 34-40.


Martin DM, Miranda-Saavedra D, Barton GJ, 2009. Kinomer v. 1.0: a database of systematically classified eukaryotic protein kinases. Nucleic Acids Res. 37, D244-D250. Maule AG, McVeigh P, Dalzell JJ, Atkinson L, Mousley A, Marks NJ, 2011. An eye on RNAi in nematode parasites. Trends Parasitol. 27, 505-513. McCoy CJ, Warnock ND, Atkinson LE, Atcheson E, Martin RJ, Robertson AP, Maule AG, Marks NJ, Mousley A, 2015. RNA interference in adult Ascaris suum - an opportunity for the development of a functional genomics platform that supports organism-, tissue- and cell-based biology in a nematode parasite. Int. J. Parasitol. 45, 673-678. Medard G, Pachl F, Ruprecht B, Klaeger S, Heinzlmeir S, Helm D, Qiao H, Ku X, Wilhelm M, Kuehne T, Wu Z, Dittmann A, Hopf C, Kramer K, Kuster B, 2015. Optimized chemical proteomics assay for kinase inhibitor profiling. J. Proteome Res. 14, 1574-1586. Mohandas N, Hu M, Stroehlein AJ, Young ND, Sternberg PW, Lok JB, Gasser RB, 2016. Reconstruction of the insulin-like signalling pathway of Haemonchus contortus. Parasit. Vectors 9, 64. Morf L, Pearson RJ, Wang AS, Singh U, 2013. Robust gene silencing mediated by antisense small RNAs in the pathogenic protist Entamoeba histolytica. Nucleic Acids Res. 41, 9424-9437. Morgan WR, Greenwald I, 1993. Two novel transmembrane protein tyrosine kinases expressed during Caenorhabditis elegans hypodermal development. Mol. Cell. Biol. 13, 7133-7143. Mudge JM, Harrow J, 2016. The state of play in higher eukaryote gene annotation. Nat. Rev. Genet. 17, 758-772. Nishina M, Suzuki M, Matsushita K, 2004. Trichinella spiralis: activity of the cerebral pyruvate recycling pathway of the host (mouse) in hypoglycemia induced by the infection. Exp. Parasitol. 106, 62-65. Perdigão N, Heinrich J, Stolte C, Sabir KS, Buckley MJ, Tabor B, Signal B, Gloss BS, Hammang CJ, Rost B, Schafferhans A, O'Donoghue SI, 2015. Unexpected features of the dark proteome. Proc. Natl. Acad. Sci. USA 112, 15898-15903.
Pereira JC, Caffarena ER, Dos Santos CN, 2016. Boosting docking-based virtual screening with deep learning. J. Chem. Inf. Model. 56, 2495-2506. Preidis GA, Hotez PJ, 2015. The newest "omics" - metagenomics and metabolomics - enter the battle against the neglected tropical diseases. PLoS Negl. Trop. Dis. 9, e0003382. Preston S, Jabbar A, Nowell C, Joachim A, Ruttkowski B, Baell J, Cardno T, Korhonen PK, Piedrafita D, Ansell BR, Jex AR, Hofmann A, Gasser RB, 2015. Low cost whole-organism screening of compounds for anthelmintic activity. Int. J. Parasitol. 45, 333-343. Prosser GA, Larrouy-Maumus G, de Carvalho LP, 2014. Metabolomic strategies for the identification of new enzyme functions and metabolic pathways. EMBO Rep. 15, 657-669. Putman TE, Lelong S, Burgstaller-Muehlbacher S, Waagmeester A, Diesh C, Dunn N, Munoz-Torres M, Stupp GS, Wu C, Su AI, Good BM, 2017. WikiGenomes: an open web application for community consumption and curation of gene annotation data in Wikidata. Database (Oxford) https://doi.org/10.1093/database/bax025 Ratnappan R, Vadnal J, Keaney M, Eleftherianos I, O'Halloran D, Hawdon JM, 2016. RNAi-mediated gene knockdown by microinjection in the model entomopathogenic nematode Heterorhabditis bacteriophora. Parasit. Vectors 9, 160.


Remmert M, Biegert A, Hauser A, Soding J, 2011. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat. Methods 9, 173-175. Reuter JA, Spacek DV, Snyder MP, 2015. High-throughput sequencing technologies. Mol. Cell 58, 586-597. Rigaut G, Shevchenko A, Rutz B, Wilm M, Mann M, Seraphin B, 1999. A generic protein purification method for protein complex characterization and proteome exploration. Nat. Biotechnol. 17, 1030-1032. Roberts RJ, Carneiro MO, Schatz MC, 2013. The advantages of SMRT sequencing. Genome Biol. 14, 405. Roberts SB, Robichaux JL, Chavali AK, Manque PA, Lee V, Lara AM, Papin JA, Buck GA, 2009. Proteomic and network analysis characterize stage-specific metabolism in Trypanosoma cruzi. BMC Syst. Biol. 3, 52. Robinson MW, Connolly B, 2005. Proteomic analysis of the excretory-secretory proteins of the Trichinella spiralis L1 larva, a nematode parasite of skeletal muscle. Proteomics 5, 4525-4532. Robinson MW, Gare DC, Connolly B, 2005. Profiling excretory/secretory proteins of Trichinella spiralis muscle larvae by two-dimensional gel electrophoresis and mass spectrometry. Vet. Parasitol. 132, 37-41. Robinson-Rechavi M, Maina CV, Gissendanner CR, Laudet V, Sluder A, 2005. Explosive lineage-specific expansion of the orphan nuclear receptor HNF4 in nematodes. J. Mol. Evol. 60, 577-586. Rödelsperger C, Streit A, Sommer RJ, 2013. Structure, function and evolution of the nematode genome. eLS https://doi.org/10.1002/9780470015902.a0024603 Roessner U, Bowne J, 2009. What is metabolomics all about? Biotechniques 46, 363-365. Ronquist F, Teslenko M, van der Mark P, Ayres DL, Darling A, Hohna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP, 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst. Biol. 61, 539-542. Roquis D, Lepesant JM, Picard MA, Freitag M, Parrinello H, Groth M, Emans R, Cosseau C, Grunau C, 2015. 
The epigenome of Schistosoma mansoni provides insight about how cercariae poise transcription until infection. PLoS Negl. Trop. Dis. 9, e0003853. Samarasinghe B, Knox DP, Britton C, 2011. Factors affecting susceptibility to RNA interference in Haemonchus contortus and in vivo silencing of an H11 aminopeptidase gene. Int. J. Parasitol. 41, 51-59. Saric J, Li JV, Utzinger J, Wang Y, Keiser J, Dirnhofer S, Beckonert O, Sharabiani MT, Fonville JM, Nicholson JK, Holmes E, 2010. Systems parasitology: effects of Fasciola hepatica on the neurochemical profile in the rat brain. Mol. Syst. Biol. 6, 396. Sarnpitak P, Mujumdar P, Taylor P, Cross M, Coster MJ, Gorse AD, Krasavin M, Hofmann A, 2015. Panel docking of small-molecule libraries - prospects to improve efficiency of lead compound discovery. Biotechnol. Adv. 33, 941-947. Schwarz EM, Korhonen PK, Campbell BE, Young ND, Jex AR, Jabbar A, Hall RS, Mondal A, Howe AC, Pell J, Hofmann A, Boag PR, Zhu XQ, Gregory TR, Loukas A, Williams BA, Antoshechkin I, Brown CT, Sternberg PW, Gasser RB, 2013. The genome and developmental transcriptome of the strongylid nematode Haemonchus contortus. Genome Biol. 14, R89. Shanmugam D, Ralph SA, Carmona SJ, Crowther GJ, Roos DS, Agüero F, 2012. Integrating and mining helminth genomes to discover and prioritize novel therapeutic targets. In: Caffrey, CR (Ed.), Parasitic helminths: targets, screens, drugs and vaccines. Wiley-Blackwell, Hoboken, New Jersey, pp. 43-59.


Slater GS, Birney E, 2005. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6, 31. Sotillo J, Pearson M, Becker L, Mulvenna J, Loukas A, 2015. A quantitative proteomic analysis of the tegumental proteins from Schistosoma mansoni schistosomula reveals novel potential therapeutic targets. Int. J. Parasitol. 45, 505-516. Tabei Y, Yamanishi Y, Kotera M, 2016. Simultaneous prediction of enzyme orthologs from chemical transformation patterns for de novo metabolic pathway reconstruction. Bioinformatics 32, i278-i287. Taubert S, Ward JD, Yamamoto KR, 2011. Nuclear hormone receptors in nematodes: evolution and function. Mol. Cell. Endocrinol. 334, 49-55. Taylor CM, Wang Q, Rosa BA, Huang SC, Powell K, Schedl T, Pearce EJ, Abubucker S, Mitreva M, 2013. Discovery of anthelmintic drug targets and drugs using chokepoints in nematode metabolic pathways. PLoS Pathog. 9, e1003505. Thorvaldsdóttir H, Robinson JT, Mesirov JP, 2013. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief. Bioinform. 14, 178-192. Trott O, Olson AJ, 2010. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 31, 455-461. Tzelos T, Matthews JB, Whitelaw B, Knox DP, 2015. Marker genes for activation of the RNA interference (RNAi) pathway in the free-living nematode Caenorhabditis elegans and RNAi development in the ovine nematode Teladorsagia circumcincta. J. Helminthol. 89, 208-216. Urich MA, Nery JR, Lister R, Schmitz RJ, Ecker JR, 2015. MethylC-Seq library preparation for base-resolution whole-genome bisulfite sequencing. Nat. Protoc. 10, 475-483. Vincent IM, Barrett MP, 2015. Metabolomic-based strategies for anti-parasite drug discovery. J. Biomol. Screen. 20, 44-55. Viney ME, Thompson FJ, 2008. Two hypotheses to explain why RNA interference does not work in animal parasitic nematodes. Int. J. Parasitol. 38, 43-47. 
Volkamer A, Eid S, Turk S, Jaeger S, Rippmann F, Fulle S, 2015. Pocketome of human kinases: prioritizing the ATP binding sites of (yet) untapped protein kinases for drug discovery. J. Chem. Inf. Model. 55, 538-549. Volkamer A, Eid S, Turk S, Rippmann F, Fulle S, 2016. Identification and visualization of kinase-specific subpockets. J. Chem. Inf. Model. 56, 335-346. Wu W, LoVerde PT, 2011. Nuclear hormone receptors in parasitic helminths. Mol. Cell. Endocrinol. 334, 56-66. Xu Q, Dunbrack RL, Jr., 2012. Assignment of protein sequences to existing domain and family classification systems: Pfam and the PDB. Bioinformatics 28, 2763-2772. Yamanishi Y, Tabei Y, Kotera M, 2015. Metabolome-scale de novo pathway reconstruction using regioisomer-sensitive graph alignments. Bioinformatics 31, i161-i170. Yang J, Yan R, Roy A, Xu D, Poisson J, Zhang Y, 2015. The I-TASSER Suite: protein structure and function prediction. Nat. Methods 12, 7-8. Zawadzki JL, Kotze AC, Fritz JA, Johnson NM, Hemsworth JE, Hines BM, Behm CA, 2012. Silencing of essential genes by RNA interference in Haemonchus contortus. Parasitology 139, 613-629. Zhang M, Hong Y, Han Y, Han H, Peng J, Qiu C, Yang J, Lu K, Fu Z, Lin J, 2013. Proteomic analysis of tegument-exposed proteins of female and male Schistosoma japonicum worms. J. Proteome Res. 12, 5260-5270.


Figure 6.1 Example of a protein sequence that was incorrectly inferred from an erroneous gene prediction (M514_09783) in the draft gene set of the Trichuris suis genome (adult female; PRJNA208416). (A) Amino acid domain architecture (based on InterProScan; Jones et al., 2014) of the inferred “protein kinase/CAP” fusion protein, containing domains typical of protein kinases (“SH2 domain”; blue; IPR000980 and “protein kinase domain”; orange-brown; IPR000719) and a CAP domain (light brown; IPR014044). (B) Original gene model for M514_09783 on scaffold71 of the genome assembly of Trichuris suis (top); exons are displayed as thick blue boxes, introns as blue lines and the 5’ untranslated region as a thin blue box. Eight transcripts (de novo-assembled from publicly available RNA-Seq data; see section 5.2.1) are mapped to scaffold71 in this region (below), refuting the original gene prediction and providing evidence for two independent gene models.
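The kind of domain architecture shown in Figure 6.1 could be screened for automatically before manual inspection. The sketch below is an assumption-laden illustration rather than a method used in this thesis, where the erroneous model was refuted by mapping de novo-assembled transcripts. The function name `flag_possible_fusions`, the set `UNEXPECTED_PARTNERS` and the control entry `hypothetical_gene` are invented; the three accessions listed for M514_09783 are those given in the caption.

```python
from typing import Dict, List, Set

KINASE_DOMAIN = "IPR000719"  # protein kinase domain (from the caption)
# Assumption: domains treated as unexpected next to a kinase domain.
UNEXPECTED_PARTNERS: Set[str] = {"IPR014044"}  # CAP domain (from the caption)


def flag_possible_fusions(architectures: Dict[str, List[str]]) -> List[str]:
    """Return gene identifiers whose domains combine a kinase domain
    with at least one domain from the unexpected-partner set."""
    flagged = []
    for gene_id, domains in architectures.items():
        present = set(domains)
        if KINASE_DOMAIN in present and present & UNEXPECTED_PARTNERS:
            flagged.append(gene_id)
    return sorted(flagged)


architectures = {
    "M514_09783": ["IPR000980", "IPR000719", "IPR014044"],  # Figure 6.1
    "hypothetical_gene": ["IPR000980", "IPR000719"],  # invented control
}
print(flag_possible_fusions(architectures))
```

A flag of this kind would only mark a gene model for inspection; transcript evidence, as in panel B, would still be needed to confirm or refute the prediction.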


[Figure 6.2 image. Displayed region: 2,784 bp, with axis ticks at 681,000 bp and 682,000 bp. Tracks, top to bottom: Sequence; Original gene prediction (M514_01298); De novo-assembled transcripts (feature_scaffold7_4); InterPro domains (PTHR10593, SSF46785, SSF56112, PF09202, PF01163); RNA-Seq coverage; Mapped RNA-Seq reads.]

Figure 6.2 A typical window in the Integrative Genomics Viewer (IGV) software (Thorvaldsdóttir et al., 2013), displaying a 2784 base pair (bp) section of scaffold7 of the Trichuris suis genome (adult female; PRJNA208416), an original gene prediction (M514_01298), a mapped de novo-assembled transcript (feature_scaffold7_4), mapped functional domain annotations (PTHR10593, SSF46785, SSF56112, PF09202, PF01163) inferred using InterProScan (Jones et al., 2014), and mapped paired-end RNA-Seq reads.


[Figure 6.3 image: workflow diagram. Box text, in reading order:]

Input data: Original gene predictions (draft gene set); Transcripts assembled de novo from RNA-Seq reads; Genomic scaffolds

Find ‘seed’ sequences containing Pfam, PANTHER and SUPERFAMILY domains that represent kinase-like full-length sequences or sequence fragments (InterProScan; Jones et al., 2014)
Define pairwise orthologs in related species (OrthoMCL; Li et al., 2003)
Reciprocally map transcripts to genome (BLAT; Kent, 2002)
Reassemble mapped, de novo-assembled transcripts (CAP3; Huang and Madan, 1999)
Repredict gene model (Exonerate; Slater and Birney, 2005)
Display and manually cross-check gene models (Integrative Genomics Viewer; IGV; Thorvaldsdóttir et al., 2013)

Output data: Gene sequence; General Feature Format (GFF) file; Transcript sequence; Amino acid sequence

Identify and classify kinase sequences (Kinannote; Goldberg et al., 2013)
Build taxon-specific hidden Markov models (hmmalign; Eddy, 2011)
Build multiple sequence alignments of kinase catalytic domains (hmmalign)
Identify and classify unclassified sequences
Build phylogenetic tree (MrBayes; Ronquist et al., 2012)
Define orthologs in curated kinomes (OrthoMCL)
Predict structure in silico (I-TASSER; Yang et al., 2015)
Define domain architectures (InterProScan)

Output data: Pairwise orthologs; Group, family and subfamily classification; Predicted protein structure; Domain annotation

Figure 6.3 Pairwise gene curation and kinase identification and classification workflow employed in this thesis. Orange boxes represent input data, green boxes represent output data and grey boxes specify individual steps, with employed programs and references given in brackets for each step. Following this workflow, the output data can be employed as input for subsequent analyses such as pathway annotation, drug target prediction/prioritisation and/or the analysis of transcription profiles of kinase genes.
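The first step of this workflow (finding ‘seed’ sequences with kinase-like domain annotations) can be illustrated with a minimal Python sketch. It is not the code used in the thesis. It assumes InterProScan tab-separated output in which column 1 holds the sequence identifier, column 4 the analysis name and column 12 the InterPro accession. The function name find_seed_ids, the example rows and the single accession in KINASE_INTERPRO_IDS (IPR000719, the “protein kinase domain” entry named in the Figure 6.1 caption) are illustrative assumptions; the full set of kinase-like signatures used for seed selection is not listed in this section.

```python
# Member-database analyses named in the seed-finding step of the workflow.
SEED_ANALYSES = {"Pfam", "PANTHER", "SUPERFAMILY"}

# Assumption: IPR000719 ("protein kinase domain") is taken from the Figure 6.1
# caption; the full set of kinase-like signatures is not listed in this section.
KINASE_INTERPRO_IDS = {"IPR000719"}


def find_seed_ids(tsv_text, interpro_ids=KINASE_INTERPRO_IDS):
    """Return identifiers of sequences with at least one kinase-like domain hit.

    Assumes InterProScan tab-separated output, where column 1 is the sequence
    identifier, column 4 the analysis name and column 12 the InterPro accession.
    """
    seeds = set()
    for line in tsv_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # no InterPro cross-reference reported for this hit
        seq_id, analysis, interpro_id = fields[0], fields[3], fields[11]
        if analysis in SEED_ANALYSES and interpro_id in interpro_ids:
            seeds.add(seq_id)
    return seeds


# Hypothetical example rows (identifiers, coordinates and scores are made up).
EXAMPLE_TSV = "\n".join([
    "\t".join(["seqA", "md5a", "500", "Pfam", "PF00069", "Protein kinase domain",
               "10", "260", "1e-50", "T", "01-01-2017", "IPR000719",
               "Protein kinase domain"]),
    "\t".join(["seqB", "md5b", "300", "Pfam", "PF00188", "CAP", "5", "120",
               "1e-20", "T", "01-01-2017", "IPR014044", "CAP domain"]),
    "\t".join(["seqC", "md5c", "200", "Coils", "Coil", "", "1", "30", "-", "T",
               "01-01-2017"]),
])

print(sorted(find_seed_ids(EXAMPLE_TSV)))
```

In this sketch only seqA is retained, because it is the only row that carries a hit from one of the three member databases together with the assumed kinase-domain accession. Passing a different set of accessions as interpro_ids changes which sequences are treated as seeds.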

LIST OF APPENDICES ______

CHAPTER 1

Appendix 1.1 Estimates of kinome sizes for published genomes and transcriptomes of non-helminth species. For partial kinomes, the analysed subset of the kinome is indicated. ePKs, eukaryotic protein kinases; MAPKs, mitogen-activated protein kinases; TKs, tyrosine kinases.

CHAPTER 2

Appendix 2.1 The Schistosoma haematobium kinome, orthologs in Schistosoma mansoni and Homo sapiens, amino acid sequence identities and similarities as well as functional annotations.

Appendix 2.2 Phylogenetic analysis of individual eukaryotic protein kinase (ePK) groups and protein kinase-like (PKL) families of Schistosoma haematobium and Schistosoma mansoni. Following the alignment of amino acid sequences representing CMGC, CAMK, AGC, Other, TK, STE, TKL, CK1, RGC groups and RIO and ABC kinase families, phylogenetic trees were constructed. High-resolution figures of individual trees including nodal support values and sequence identifiers are given.

Appendix 2.3 Levels of transcription of genes encoding kinases in three different life cycle stages (adult male, adult female and egg) of Schistosoma haematobium. Transcription values are displayed as number of transcripts per million reads (TPM).

Appendix 2.4 Orthologs of Schistosoma haematobium kinases associated with lethal phenotypes in Drosophila melanogaster, Mus musculus and/or Caenorhabditis elegans.

Appendix 2.5 Schistosoma haematobium kinases prioritised as drug target candidates and associated compounds, using Kinase SARfari (https://www.ebi.ac.uk/chembl/sarfari/kinasesarfari). Sequence identifiers that are also predicted to be targets based on the DrugBank database (Appendix 2.6) are printed in bold.

Appendix 2.6 Schistosoma haematobium kinases prioritised as drug target candidates and associated compounds, using DrugBank 4.1 (http://www.drugbank.ca/). Sequence identifiers that are also predicted to be targets based on the Kinase SARfari database (Appendix 2.5) are printed in bold.

Appendix 2.7 Compounds predicted as inhibitors of schistosome kinases, and their chemical structures.

CHAPTER 3

Appendix 3.1 The Haemonchus contortus kinome, orthologs in Caenorhabditis elegans and Ovis aries, amino acid sequence identities and similarities, functional annotations, and nucleotide and amino acid sequences. Annotations include domains, protein families, GO terms, biological pathways, chokepoints and lethality in C. elegans homologs.

Appendix 3.2 Transcription profiles of 432 Haemonchus contortus protein kinase transcripts. For each transcript, transcript ID, TPM values (transcripts per million) for individual life cycle stages, as well as group classifications and cluster number are given.

Appendix 3.3 Transcription profiles for kinase genes in all key developmental stages (egg, L1, L2, L3, L4 and adult) and both sexes (L4 and adult) of Haemonchus contortus (X-axis) for eleven individual kinase groups (individual panels; abbreviations described below). Transcription levels are represented as log(transcripts per million + 1) values (Y-axis). Shaded lines represent individual transcription profiles; bold lines represent the Lowess trend line ± S.D. (dashed lines). For the L4 and adult stages both sexes are plotted (red, female; blue, male). CK1, casein kinase 1; CMGC, cyclin-dependent kinases/mitogen-activated protein kinases (MAPKs)/glycogen synthase kinases/CDK-like kinases; CAMK, Ca2+/calmodulin-dependent kinases; AGC, nucleoside-regulated kinases; TK, tyrosine kinases; TKL, tyrosine kinase-like kinases; STE, MAPK cascade kinases; RGC, receptor guanylate cyclases; UNCL, unclassified kinases.

Appendix 3.4 All clusters of transcription profiles for Haemonchus contortus kinase genes based on the Ward-clustering method (k = 15). Transcription levels are represented as transcripts per million (TPM) values (Y-axis; scaled according to the highest value within each cluster), and developmental stages (egg, L1-L4, adult) of H. contortus (X-axis). Shaded lines represent individual transcription profiles; bold lines represent the Lowess trend line ± S.D. (dashed lines). For the L4 and adult stages, both sexes are plotted (female = red; male = blue).

Appendix 3.5 Chemicals in DrugBank associated with the 13 predicted Haemonchus contortus kinase drug targets.

Appendix 3.6 Chemicals in Kinase SARfari associated with the 13 predicted Haemonchus contortus kinase drug targets.

CHAPTER 4

Appendix 4.1 Trees representing the phylogenetic relationship of eukaryotic protein kinase (ePK) sequences between Trichinella spiralis (T1) and Trichinella pseudospiralis (T4.1). Each ePK group is represented by an individual tree (A-I). (A) Protein kinases A, G and C, and other nucleoside-regulated kinases (AGC group); (B) Ca2+/calmodulin-dependent kinases (CAMK group); (C) Casein kinase 1 (CK1 group); (D) Cyclin-dependent kinases (CDKs), mitogen-activated protein kinases (MAPKs), glycogen synthase kinases (GSKs) and CDK-like kinases (CMGC group); (E) “Other” protein kinases (Other group); (F) Receptor guanylate cyclases (RGC group); (G) MAPK cascade kinases (STE group); (H) Tyrosine kinases (TK group); (I) Tyrosine kinase-like kinases (TKL group). Nodal support values (Bayesian inference) and sequence identifiers are given at the nodes and tips, respectively.

Appendix 4.2 The Trichinella spiralis (T1) kinome, orthologs in Trichinella pseudospiralis (T4.1), amino acid sequence identities and similarities, excretory/secretory prediction, functional annotations, and amino acid sequences.

Appendix 4.3 The Trichinella pseudospiralis (T4.1) kinome, orthologs in Trichinella spiralis (T1), amino acid sequence identities and similarities, excretory/secretory prediction, functional annotations, and amino acid sequences.

Appendix 4.4 Clusters of orthologs among Trichinella spiralis (T1), Trichinella pseudospiralis (T4.1), Caenorhabditis elegans (CEL) and Homo sapiens (HSA) based on OrthoMCL clustering (E-value of ≤ 1e-5; similarity of ≥ 80%). Individual sequence identifiers are given in Appendices 4.5-4.12.

Appendix 4.5 Clusters of orthologs among Trichinella spiralis (T1), Trichinella pseudospiralis (T4.1), Caenorhabditis elegans (CEL) and Homo sapiens (HSA). Clusters were computed using OrthoMCL v.2.0.4 (E-value of ≤ 1e-5; similarity of ≥ 80%).

Appendix 4.6 Clusters of orthologs among Trichinella spiralis (T1), Trichinella pseudospiralis (T4.1) and Caenorhabditis elegans (CEL). Clusters were computed using OrthoMCL v.2.0.4 (E-value of ≤ 1e-5; similarity of ≥ 80%).

Appendix 4.7 Clusters of orthologs among Trichinella spiralis (T1), Trichinella pseudospiralis (T4.1) and Homo sapiens (HSA). Clusters were computed using OrthoMCL v.2.0.4 (E-value of ≤ 1e-5; similarity of ≥ 80%).

Appendix 4.8 Clusters of orthologs between Trichinella spiralis (T1) and Trichinella pseudospiralis (T4.1). Clusters were computed using OrthoMCL v.2.0.4 (E-value of ≤ 1e-5; similarity of ≥ 80%).

Appendix 4.9 Trichinella pseudospiralis (T4.1) sequences without orthologs in other species. Clusters were computed using OrthoMCL v.2.0.4 (E-value of ≤ 1e-5; similarity of ≥ 80%).

Appendix 4.10 Clusters of orthologs between Caenorhabditis elegans (CEL) and Homo sapiens (HSA). Clusters were computed using OrthoMCL v.2.0.4 (E-value of ≤ 1e-5; similarity of ≥ 80%).

Appendix 4.11 Caenorhabditis elegans (CEL) sequences without orthologs in other species. Clusters were computed using OrthoMCL v.2.0.4 (E-value of ≤ 1e-5; similarity of ≥ 80%).

Appendix 4.12 Homo sapiens (HSA) sequences without orthologs in other species. Clusters were computed using OrthoMCL v.2.0.4 (E-value of ≤ 1e-5; similarity of ≥ 80%).

CHAPTER 5

Appendix 5.1 The Trichuris suis (female) kinome, orthologs in T. suis (male) and Trichuris trichiura, amino acid sequence identities and similarities, functional annotations, and coding and amino acid sequences.

Appendix 5.2 The Trichuris suis (male) kinome, orthologs in T. suis (female) and Trichuris trichiura, amino acid sequence identities and similarities, functional annotations, and coding and amino acid sequences.

Appendix 5.3 General feature format (GFF) file representing 282 curated kinase transcript sequences encoding 280 kinases, mapped to the genome of adult female Trichuris suis worms (PRJNA208416).

Appendix 5.4 General feature format (GFF) file representing 284 curated kinase transcript sequences encoding 280 kinases, mapped to the genome of adult male Trichuris suis worms (PRJNA208415).

Appendix 5.5 General feature format (GFF) file representing 310 curated kinase transcript sequences encoding 268 kinases and 21 kinase fragments, mapped to the genome of mixed adult Trichuris trichiura worms (PRJEB535).

Appendix 5.6 The Trichuris trichiura kinome, orthologs in Trichuris suis (male and female), amino acid sequence identities and similarities, functional annotations, and coding and amino acid sequences.

Appendix 5.7 Trees representing the phylogenetic relationships among eukaryotic protein kinase (ePK) sequences of male (Tsm) and female (Tsf) Trichuris suis and Trichuris trichiura (Tt). Each ePK group is represented by an individual tree and all trees are arranged circularly. Nodal support values (Bayesian inference) and sequence identifiers are given at the nodes and tips, respectively. AGC, nucleoside-regulated kinases; CAMK, Ca2+/calmodulin-dependent kinases; CK1, casein kinase 1 kinases; CMGC, cyclin-dependent kinases (CDKs)/mitogen-activated protein kinases (MAPKs)/glycogen synthase kinases (GSKs)/CDK-like kinases; RGC, receptor guanylate cyclases; STE, MAPK cascade kinases; TKL, tyrosine kinase-like kinases; TK, tyrosine kinases.

Appendix 5.8 Prediction of three-dimensional structures for unclassified putative protein kinases of Trichuris suis. For each protein, template modelling (TM) score, root-mean-square deviation (RMSD) value, identity and coverage values of the closest structural analog, as well as additional information about the matched structure, are given. PDB, Protein Data Bank.

Appendix 5.9 Transcription profiles of 281 Trichuris suis protein kinase transcripts. For each transcript, identifier, TPM values (transcripts per million) for individual life cycle stages, sexes and tissues, as well as cluster number, classifications and domain annotations are given.

Appendix 5.10 Clusters of orthologs among Trichuris suis (TSU), Trichuris trichiura (TTR), Trichinella spiralis (TSP), Haemonchus contortus (HCO) and Caenorhabditis elegans (CEL). Clusters were computed using OrthoMCL v.2.0.4 (E-value of ≤ 1e-5; similarity of ≥ 50%).

Appendix 5.11 Sequence similarity of full-length Trichuris suis sequences to orthologs in Caenorhabditis elegans, Haemonchus contortus, Trichuris trichiura and Trichinella spiralis. For each sequence, global pairwise amino acid sequence similarity values, mean values across all species and S.D.s are given.

Appendix 5.12 Sequence similarity of catalytic domain sequences of Trichuris suis to orthologs in Caenorhabditis elegans, Haemonchus contortus, Trichuris trichiura and Trichinella spiralis. For each sequence, global pairwise amino acid sequence similarity values, mean values across all species and S.D.s are given.

Minerva Access is the Institutional Repository of The University of Melbourne

Author/s: Stroehlein, Andreas Julius

Title: Kinomes of selected parasitic helminths - fundamental and applied implications

Date: 2017

Persistent Link: http://hdl.handle.net/11343/192334

File Description: Final thesis

Terms and Conditions: Copyright in works deposited in Minerva Access is retained by the copyright owner. The work may not be altered without permission from the copyright owner. Readers may only download, print and save electronic copies of whole works for their own personal non-commercial use. Any use that exceeds these limits requires permission from the copyright owner. Attribution is essential when quoting or paraphrasing from these works.