Quick viewing(Text Mode)

Dual RNA-Seq Analysis of Host-Pathogen Interaction in Eimeria Infection of Chickens

Dual RNA-Seq Analysis of Host-Pathogen Interaction in Eimeria Infection of Chickens

Dual RNA-seq analysis of host-pathogen interaction in infection of

Arnar Kári Sigurðarson Sandholt

Degree project in bioinformatics, 2020 Examensarbete i bioinformatik 30 hp till masterexamen, 2020 Biology Education Centre, Uppsala University, and Statens Veterinärmedicinska Anstalt Supervisors: Robert Söderlund and Eva Wattrang

Abstract

Eimeria tenella is a eukaryotic, intracellular parasite that, along with six other Eimeria species, causes coccidiosis in chickens. This disease can result in weight loss or even death and is estimated to cause 2 billion euros of damages to the industry each year. While much is known of the life cycle of E. tenella in the chicken, less is known about molecular mechanisms of infection and the chicken immune response. In this study, we produced a pipeline for dual RNA-sequencing analysis of a mixed chicken and E. tenella dataset. We then carried out an analysis on an in vitro infection of the chicken macrophage HD-11 cell line. This was followed by a differential expression analysis across six time points, 2, 4, 12, 24, 48, and 72 hours post-infection, in order to elucidate these mechanisms.

The results showed clear patterns of expression for the chicken immune genes, with strong down-regulation of genes across the immune system at 24 hours and a repetition of early patterns at 72 hours, indicating that reinfection by a second generation of parasite cells was occurring. Several genes that may have important roles in the immune reaction of the chicken were identified, such as MRC2, ITGB3 and ITGA9, along with genes with known roles, such as TLR15. The expression of surface antigen genes in E. tenella was also examined, showing a clear upregulation in the late stages of merogony, suggesting important roles for merozoites. Finally, a co-expression analysis was carried out, showing considerable co-expression among the two organisms. One of the gene co-expression networks identified appeared to be enriched with both infection specific genes from E. tenella and chicken immune genes. These results, along with the pipeline, will be used in further studies on E. tenella infections and bring us closer to the eventual goal of a vaccine for coccidiosis.

Revealing the mechanisms of coccidiosis - the path to healthier, happier chickens

Popular Science Summary

Arnar Kári Sigurðarson Sandholt

The chicken is one of the most important animals in agriculture and is consumed world-wide. It carries several diseases that can infect humans, the best known one likely being salmonellosis. However, the chicken also suffers from several diseases that do not affect humans, one of which is coccidiosis. This disease is caused by single-cell parasites of genus Eimeria that infect chicken cells in order to breed. This disease ranges from being practically unnoticeable, as the chicken shows no outward symptoms, to very serious symptoms, potentially killing the chicken. Regardless of symptoms, however, the parasite will multiply in the chicken and be carried out with its feces. The severity of coccidiosis depends on species of Eimeria as well as the number of oocysts, eggs from the parasite, the chicken eats. These oocysts are very difficult to get rid of as they are resistant to many types of disinfectant and reliable methods to get rid of them are difficult to implement in an intensive farming environment. While the disease is not always severe, it does result in weaker, slower growing chickens which translates to losses for the chicken industry, estimated to be around 2 billion euros per year and resulting in more expensive chicken meat for consumers.

There are currently two ways of combating coccidiosis: vaccination and anticoccidials. Vaccination is an ideal way to fight the disease, but the downside is that current vaccines require live Eimeria cells. An infection by any Eimeria species gives the chicken immunity to that species, so an infection by a weakened parasite will render a chicken immune to parasites that can cause the disease. However, live parasites can currently only be produced in live chickens. Because of this, the production of large amounts of vaccines is difficult and expensive as well as being problematic from an ethical standpoint. The other method is anticoccidials, chemical antibiotics that are administered through the chicken feed. Due to the scaling problems with vaccination production, this has been the main way of fighting coccidiosis. However, the Eimeria species are becoming resistant to these antibiotics. This, combined with a growing negative sentiment among consumers towards use of chemicals in meat production, has led to a growing demand for a better form of vaccines that don't require live parasites.

There have been numerous attempts to address this demand, but so far, no viable vaccine has been developed. One reason for this is that the molecular mechanisms, i.e. genes and proteins, by which Eimeria species infect the chicken and the chicken immune system reacts to the infection are not well known. In this project, we aimed to explore these mechanisms by

examining the expression of genes in the chicken and a species of parasite, Eimeria tenella and comparing it to their expression in non-infected cells. This means looking at how much a given gene is used to produce proteins at different times during the infection process of Eimeria tenella and comparing it to the same gene when the cell isn't infected. For the parasite, the comparison is made to cells that haven't entered chicken cells and are waiting to start infecting. This allows us to see which proteins are being used, when they are being used, and helps to find ones that can be targeted by a vaccine and ones that are important to the immune response.

Our results show that many genes are affected by the parasite infection, especially in the immune system. However, a large part of that reaction is down-regulation, i.e. reducing the amount of protein that is produced from that gene compared to an uninfected cell. This further supports previous results, that Eimeria tenella is actively weakening the immune response of the host and that targeting this ability might produce a viable vaccine. We examined the expression of several parasite genes believed to have a role in this process, showing that the genes were highly expressed when the parasite was reproducing. The expression analysis also revealed several genes that clearly react to the presence of the parasite at different times during infection and may be key to understanding how the chicken acquires immunity to it. Finally, we attempted to find genes that show a similar pattern of expression in both the chicken and the parasite, i.e. genes that behave in similar ways and might be affecting one another. Many genes showed similar patterns between species and one group of genes was identified that contained both immune genes with known roles in Eimeria tenella infections and parasite genes with roles in the infection process.

By identifying important immune genes in the chicken and infection genes in the parasite, this project has contributed to increasing understanding of coccidiosis. The results will be used to inform further research into this disease and, hopefully, take us one step closer to the creation of an effective vaccine to this disease.

Degree project in bioinformatics, 2020 Examensarbete i bioinformatik 30 hp till masterexamen, 2020 Biology Education Centre and Statens Veterinärmedicinska Anstalt Supervisor: Robert Söderlund

Table of Contents

1 Introduction ...... 11 1.1 Aims ...... 13 2 Methods ...... 13 2.1 Data ...... 13 2.2 Read counting ...... 13 2.3 Differential expression analysis ...... 14 3 Results ...... 15 3.1 Read counting ...... 15 3.2 Differential expression analysis ...... 17 3.3 GO and KEGG analysis ...... 23 3.4 Immune/infection gene expression ...... 29 3.5 Gene co-expression modules ...... 36 4 Discussion ...... 40 4.1 The data and the pipeline ...... 40 4.2 GO and KEGG categories ...... 42 4.3 Resistance and infection genes ...... 44 4.4 Gene co-expression networks and conclusion ...... 46 5 Acknowledgements ...... 47 References ...... 48

Abbreviations

API Application programming interface CPM Counts per million DE Differential expression FDR False discovery rate GO Gene Ontology GPI Glycosylphosphatidylinositol IFN Interferon IL Interleukin KEGG Kyoto Encyclopaedia of Genes and Genomes MDS Multidimensional scaling PCA Principal components analysis PRR Pattern recognition receptors SAG GPI-anchored surface antigens SVA Statens Veterinärmedicinska Anstalt TLR Toll-like receptor WGCNA Weighted correlation network analysis

1 Introduction

Coccidiosis is a disease caused by intracellular, eukaryotic parasites of the genus Eimeria. The disease occurs in a wide variety of species, one of which is the chicken, Gallus gallus. Seven different species of Eimeria are known to infect chickens, including E. tenella and E. maxima. All seven species infect the gut of the chicken but the specific sections of intestine they infect, the length and specifics of their lifecycle and the severity of the symptoms they cause vary considerably. What is common to all species is that the severity of the infection depends on the infection dose and infected individuals acquire a species-specific immunity post-infection (Chapman et al. 2013). E. tenella causes one of the more severe infections. Its entire life cycle takes place within the cecum of the chicken where it goes through three stages. First, an oocyst containing invasive sporozoites is ingested by the chicken. In the cecum, these sporozoites invade epithelial cells and go through three cycles of asexual replication, called merogony, invading more cells in each cycle. Symptoms start appearing after the second merogony. After the third merogony, sexual reproduction takes place, forming a zygote that transforms into an oocyst that is shed in the chickens feces. The whole process takes approximately nine days (Daszak 1999). At this point, the oocyst needs a few days to sporulate before it's ready to infect the next chicken, beginning the next cycle of infection. The oocyst itself is also an important factor in Eimeria infection, being resistant to a variety of disinfectants and allowing the parasite to remain viable in the environment for months or even years (Belli et al. 2006). This means that it is exceedingly difficult to control coccidiosis through proper hygiene and environmental control.

In order to study the infection, chicken cell lines have been infected with the parasite in vitro. However, it is impossible to maintain Eimeria parasites through in vitro infection as the parasite doesn't produce oocysts in cell cultures. The life cycle is also slower; in vivo E. tenella is finished with the first merogony at 48 hours post-infection but in vitro it takes 72 hours or more (McDonald & Rose 1987, Tierney & Mulcahy 2003). The in vitro infection system does allow for studying coccidiosis without infecting live chickens and is a viable method for studying the early stages of the parasite life cycle.

Once a chicken is infected by an Eimeria species, symptoms vary greatly in severity. Most cases result in limited symptoms that can nonetheless cause weight loss and less efficient feed conversion. In the worst case, the infection can lead to death. Either way, it results in losses for the chicken industry. Indeed, it is estimated that coccidiosis causes 2 billion euros of damages to the global industry each year (McDonald & Shirley 2009). Currently, two main methods are used to control E. tenella: vaccination and use of antibiotics called coccidiostats. Vaccination makes use of avirulent and attenuated strains of Eimeria to induce immunity in the chicken. While this method is effective, it remains impossible to produce viable Eimeria through in vitro methods, requiring the use of live chickens instead (Witcombe & Smith 2014). This results in expensive vaccines and has raised concerns regarding the security and scalability

11 of production as well as ethical issues. For these reasons, coccidiostats have been much more widely used than vaccination. Coccidiostats are a type of chemical antibiotic that has been used to effectively control Eimeria in chickens, however many species have acquired resistance to them. This has led to a renewed interest in the creation of a subunit vaccine, i.e. a vaccine that makes use of immune activating proteins to induce immunity in the chicken (Witcombe & Smith 2014).

If a subunit vaccine could be produced safely and efficiently using, for example, transformed E. coli, it would effectively mitigate most current issues with vaccination. However, despite extensive research, no effective subunit vaccine has yet been produced. One of the problems that have frustrated vaccine development is that, while the life cycles of different Eimeria species are well defined, the molecular mechanisms that they use to infiltrate their hosts and prevent discovery and destruction by the chicken immune system are not well known. Some have been identified, however, like the surface antigen (SAG) proteins. They have been found to play an important role in infection, taking part in the invasion process as well as affecting the expression of important cytokines such as IFN-γ and IL-12 (Chow et al. 2011, Chapman et al. 2013). Much remains unknown about the mechanism by which the chicken immune system detects and deals with Eimeria infections as well, though several factors have been identified. Eimeria immunity in chickens is known to be species specific and cell-mediated (Rose & Hesketh 1979). Several important immune mechanisms have also been identified, notably the cytokine IFN-γ which was found to be important in immunoregulation (Rose et al. 1991). Toll- like receptors, specifically TLR4 and TLR15, have been suggested to have a role in the initiation of immune response to Eimeria (Zhou et al. 2013). However, a complete picture of the immune response has yet to emerge. It is therefore imperative that both the E. tenella infection strategy and the chicken immune response be further explored in order to elucidate these processes and identify targets for vaccine development.

One way to do this is to examine the gene expression of both organisms. However, if the normal method of sequencing the transcriptome of each organism separately is used, information on the possible interactions of the two species is lost. One way of circumventing this problem is to use dual RNA-seq. This method differs from normal RNA-seq in that samples are taken from infected tissue and RNA reads are mapped to the genomes of both organisms involved. The advantage of this method is that it allows for exploring the expression of both organisms simultaneously, making it possible to identify interactions between the two species by finding gene co-expression networks involving genes from both organisms (Westermann et al. 2012). Therefore, dual RNA-seq should be a fitting method to examine the E. tenella infection of chicken cells and identify how the parasite interacts with the chicken immune system over the course of the infection.

12 1.1 Aims

The main aim of this project is to produce a pipeline for performing a differential expression analysis on dual RNA-seq data from the chicken and E. tenella. That pipeline will then be used to analyse an in vitro dataset to identify important genes in the chicken immune response to E. tenella and elucidate their roles and to examine the expression of genes in the chicken and E. tenella at the same time in order to identify networks of co-expression across the two organisms.

2 Methods

2.1 Data

The data for this project consisted of two datasets of raw mRNA read data, both from in vitro experiments. The two were made up of RNA reads collected from immortalized chicken macrophage cells of cell line HD-11 (Beug et al. 1979), infected with sporozoites from the Eimeria tenella reference strain Houghton. Both the parasite and the cell line are maintained at Statens Veterinärmedicinska Anstalt (SVA). The first in vitro dataset consisted of samples taken from infected and uninfected chicken cell cultures, one from each at 2 hours post- infection and two from each at 12 hours and 24 hours. The sequencing libraries for this dataset were prepared from 500ng total RNA using the TruSeq stranded mRNA library preparation kit including poly-A selection. The sequencing was done in a HiSeq 2500 machine with 125 bp reads using v4 sequencing chemistry. The second in vitro dataset was from a larger experiment with samples taken at 0, 2, 4, 12, 24, 48, and 72 hours post-infection. The sequencing libraries were prepared in the same way as the pilot dataset but from 120 ng of total RNA. The sequencing method was also the same. The samples were taken from cultures kept at SVA and the datasets were generated at the SNP&SEQ Technology Platform at SciLifeLab/Uppsala University.

2.2 Read counting

The analysis of the data began with read counting, i.e. mapping the sequenced reads to the chicken and E. tenella genomes and counting the number of reads mapping to each gene. This pipeline expanded on the pipeline created by Feifei Xu (Uppsala University), who previously worked with both in vitro datasets. The first step of the pipeline was to evaluate the quality of the raw data to determine the extent of trimming required. Quality reports were generated by the SNP&SEQ Technology Platform and these were used to determine the settings for Trimmomatic (Bolger et al. 2014). As the average read quality scores were very high before trimming, only a sliding window of length 4 with an average quality threshold of 20 was used to get rid of any low-quality regions. A high percentage of adapter content was also observed, which required trimming. The trimmed reads were subsequently examined to ensure the adapters had been removed successfully and read quality remained high using FastQC

13 (Andrews 2010). MultiQC was then used to collate the reports (Ewels et al. 2016). Once the desired read quality had been attained, the next step was to map the reads to the reference genomes. As this was a dual-RNA seq pipeline, the reads had to be mapped to both genomes. This was accomplished by concatenating both the sequence files and the annotation files while ensuring that there was no conflict in gene identifiers between the two genomes. The reads were then mapped to the genomes using the splice-aware mapper STAR and counted using HTSeq (Dobin et al. 2013, Anders et al. 2015). As the library preparation allowed for strand-specific mapping, this feature of HTSeq was used with the strandedness setting on reverse. This produced files containing the number of transcripts mapping to each genomic feature in both organisms. These files were split up into count files for each species and statistics on the counts were computed. The computations were performed on resources provided by SNIC through Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX) under Project SNIC 2020/15-16. All the code used in the project along with information on software versions can be found on the project’s GitHub repository (Sigurðarson Sandholt 2020).

2.3 Differential expression analysis

The second part of this project was the differential expression analysis run in R. Several different packages for differential expression analysis are available in R, the one chosen for this project was edgeR (Robinson et al. 2010). This package takes in raw read counts and includes functions for normalizing and filtering the data before running the differential expression analysis. It also has functions for running GO (Gene Ontology) and KEGG (Kyoto Encyclopaedia of Genes and Genomes) category enrichment analyses. Before loading the data into R, however, it needed to be processed to make it easier to work with and to add in Entrez gene IDs for downstream analysis. This was accomplished with Python scripts using the Pandas package to manipulate tables and Bash scripts for extracting IDs from the annotation files. These processed read count files were then read into R. There they were split into a chicken dataset and E. tenella dataset, each containing only relevant data, i.e. infected samples for the parasite dataset. These were then normalized using CPM (Counts Per Million), which is dividing each gene count by the total number of reads for that sample and multiplying by one million. The next step was to examine the data for any outliers using a PCA (Principal Components Analysis) and MDS (Multidimensional Scaling) and examining those found. The

MDS plots display leading log2 fold change, “...defined as the root-mean-square average of the log-fold-changes for the genes best distinguishing each pair of samples” (Ritchie et al. 2015).

Once the data had been quality checked, it was loaded into the DGElist object from edgeR. Functions from edgeR were used to normalize the library size of samples in each dataset and remove genes that had very low counts across all samples. The experimental design was then defined, i.e. which comparisons were to be made, and the differential expression analysis run on the different time points. Infected and uninfected samples at each time point were compared for the chicken data whereas the E. tenella data was compared to a pre-infection sample of pure E. tenella sporozoites. The threshold for differential expression was a False discovery rate

14 (FDR) value <0.05, as calculated using edgeR, and a log2 fold change of ≥1. P-values were adjusted using the Benjamini-Hochberg methods, resulting in FDR values, in order to correct for the multiple hypothesis testing occurring when testing for the differential expression of every gene present. This resulted in a list of differentially expressed genes which was further analysed by running a GO category and KEGG pathway enrichment analysis. For the chicken, this was accomplished using edgeR functions along with the packages GO.db (Carlson 2019a), org.Gg.eg.db (Carlson 2019b), and the KEGGrest API (Tenenbaum 2020). For E. tenella, the lack of available GO and KEGG annotation and annotation packages meant that a manual enrichment analysis was done using ad hoc scripts. The GO and KEGG annotations for E. tenella were taken from ToxoDB (Gajria et al. 2008) and generated through KEGG's BlastKOALA tool, respectively.

Visualization of the results of the differential expression analysis was accomplished with several different R packages. These were EnhancedVolcano (Blighe et al. 2019), ggbiplot (Vu 2011), ggplot2 (Wickham 2016 p. 2), ClassDiscovery (Coombes 2019), RColorBrewer (Neuwirth 2014) and functions that are part of edgeR.

The final analysis used the WGCNA package to identify groups of genes with significantly similar expression patterns and cluster them together into modules (Langfelder & Horvath 2008). This was run on normalised CPM values of read counts after filtering using edgeR and only used data from infected samples so that chicken and E. tenella data could be combined without introducing too much bias from uninfected samples having zero read counts for E. tenella genes.

3 Results

3.1 Read counting

The analysis began with a read counting step, i.e. getting the number of reads mapping to each gene in the chicken and E. tenella from the raw read data. The raw read data from the SNP&SEQ Platform was of high read quality but had a large amount of adapter sequences present which required trimming. Once trimmed, the reads were mapped to both the chicken and E. tenella genomes simultaneously and the number of reads mapping to genes counted. Table 1 shows information on the samples and the fraction of reads mapping to features. Features refer to any expressed parts of the genome, generally meaning protein coding genes but also RNA genes and pseudogenes. For this project, only protein coding genes, along with rRNA and tRNA coding genes, were included in the analysis and are referred to as genes. The mapping rate to features was generally high for the samples, ~80-85%, but was much lower for the pure E. tenella sample, ~65%. Once the number of reads mapping to each feature was computed, the files were processed and loaded into R for differential expression analysis.

15 Table 1. Statistics of read mapping and metadata for each sample. For each time point, there are three replicates for each of infected and uninfected chicken samples. The 0-hour sample is an exception, only one pure E. tenella sample and two chicken samples were taken. The fraction of E. tenella reads is shown for every sample, including the uninfected ones.

Sample Infection Time post- Number of Percentage of Percentage of status infection mapped E. tenella reads reads mapped [hours] reads to features 33_S29 Pure 0 1.24E+07 99.954 65.28 E. tenella 1_S1 Uninfected 0 2.16E+07 0.0003 85.64 11_S10 Uninfected 0 1.53E+07 0.0006 84.1 23_S19 Infected 2 1.17E+07 0.4993 83.32 24_S20 Infected 2 1.07E+07 0.466 83 6_S22 Infected 2 2.23E+07 0.5854 84.4 25_S21 Uninfected 2 1.47E+07 0.002 83.63 26_S22 Uninfected 2 2.26E+07 0.0014 83.82 5_S21 Uninfected 2 1.74E+07 0.0005 84.26 2_S18 Infected 4 1.72E+07 1.8494 84.92 2_S2 Infected 4 1.11E+07 0.6958 83 8_S24 Infected 4 1.89E+07 1.0926 85.23 1_S17 Uninfected 4 1.55E+07 0.0003 83.64 3_S3 Uninfected 4 1.09E+07 0.0007 84.05 7_S23 Uninfected 4 2.69E+07 0.0006 85.63 27_S23 Infected 12 1.70E+07 0.573 81.99 28_S24 Infected 12 2.73E+07 0.5766 81.91 29_S25 Infected 12 3.62E+07 0.6084 82.08 30_S26 Uninfected 12 2.31E+07 0.0025 81.49 31_S27 Uninfected 12 1.22E+07 0.0034 82.37 32_S28 Uninfected 12 1.26E+07 0.0016 82.55 10_S26 Infected 24 1.95E+07 2.3565 85.42 4_S20 Infected 24 1.65E+07 1.5509 83.89 4_S4 Infected 24 1.22E+07 1.3263 83.02 3_S19 Uninfected 24 1.77E+07 0.0004 82.83 5_S5 Uninfected 24 1.42E+07 0.0007 82.8 9_S25 Uninfected 24 1.85E+07 0.0018 87.81 12_S11 Infected 48 2.27E+07 5.5871 81.34 18_S15 Infected 48 1.19E+07 5.936 81.25 6_S6 Infected 48 2.81E+07 4.6507 81.81 13_S12 Uninfected 48 2.12E+07 0.0007 82.7 19_S16 Uninfected 48 8.54E+06 0.0011 81.92 7_S7 Uninfected 48 1.02E+07 0.0032 84.86 14_S13 Infected 72 1.27E+07 2.4569 80.76 20_S17 Infected 72 1.34E+07 3.1254 80.5 8_S8 Infected 72 1.28E+07 2.8447 80.82 15_S14 Uninfected 72 1.14E+07 0.0015 81.71 21_S18 Uninfected 72 1.91E+07 0.0016 81.92 9_S9 Uninfected 72 1.42E+07 0.0008 81.06

16

Figure 1. A line plot showing the average percentage of reads mapping to the E. tenella genome at each time point. The fraction of read mapping to the E. tenella genome varies over time, as Figure 1 shows. There is a large jump in the percentage from 2 to 4 hours and a fall again at 12 hours. It then steadily rises towards 48 hours and falls again at 72 hours. A potential explanation for this pattern is that at 12 hours, the parasite is in the early stages of the trophozoite form, where it is not yet dividing. At 72 hours, some of the already released merozoites might be washed away and lost when the RNA is harvested. The fraction reaches a maximum of just over 5% at 48 hours and has a minimum of about 0.5% at 2 and 12 hours.

3.2 Differential expression analysis

Before running the differential expression (DE) analysis, the data was examined for potential outliers. First, the read counts and library sizes were normalized and any genes with a very small number of reads mapping to them across all samples were removed. Then, both an MDS analysis and a PCA were run on the data and plotted in Figures 2-5.

17

Figure 2. A multidimensional scaling plot of the normalized chicken count data. The labels are composed of U for uninfected or I for infected and the time post-infection that the sample was taken.

Figure 3. A multidimensional scaling plot of the normalized E. tenella count data. The labels are the time post-infection at which the sample was taken.

18

Figure 4. PCA plot of the first two principal components of the normalized chicken count data. The labels are composed of U for uninfected or I for infected and the time post-infection that the sample was taken.

19

Figure 5. PCA plot of the first two principal components of the normalized E. tenella count data. The labels are the time post-infection at which the sample was taken. The 0-hour, pure sporozoite sample is not included. These figures show a stark contrast between the chicken and E. tenella data, with the chicken data having a more random distribution, perhaps reflecting the more diverse background transcription going on in the chicken, whereas the E. tenella data clusters more distinctly. The MDS plot in Figure 2 does show some clustering for the uninfected and infected samples at 48 and 72 hours. This also occurs in the PCA plot in Figure 4 but data from other time points also clusters close to them. In neither plot is there a very clear separation of the infected and uninfected time points, though in both plots there is a trend of separation along PC2. It would appear, however, that neither infection status nor time can completely explain the variance in the chicken data, suggesting that it has to do with how the cells grow in each experiment. This contrasts dramatically with the E. tenella data where the MDS plot in Figure 3 shows very clear clustering on timepoints, with three main clusters forming for 2 and 4, 12 and 24 and 48 and 72 hour samples. This clustering is not as clear in the PCA plot in Figure 5 but there is still a clear tendency for samples from the same time point to cluster together. This indicates a more dramatic shift in expression in E. tenella between time points. It should be mentioned that the pure E. tenella sample, shown as the zero timepoint in Figure 3, was not included in Figure 5 as it formed an extreme outlier. This sample was used as the reference for the parasite DE analysis and was not removed.

20 The next step was the DE analysis itself. For the chicken data, the DE analysis was conducted for each time point, comparing infected samples to uninfected ones. For the E. tenella data, data from each time point was instead compared to the pure parasite sample, 33_S29. Figures 6 and 7 show volcano plots for the chicken and E. tenella, respectively, at all time points.

Figure 6. Volcano plots for differential gene expression in the chicken samples for all time points. The significance thresholds are set at a log2 fold change of ±1 and a false discovery rate of 0.05. NS stands for non-significant.

21

Figure 7. Volcano plots for differential gene expression in the E. tenella samples for all time points. The significance thresholds are set at a log2 fold change of ±1 and a false discovery rate of 0.05. NS stands for non-significant. The plots show the changing expression profile of both species as the infection progresses. For the chicken, the number of up-regulated genes increases up to the 24 hour time point where it peaks and then falls slightly at 48 and 72 hours. Of particular note here are the genes LOC107049603 and NR4A1 which are highly significant at 2, 4 and 12 hours. These genes encode a FosB-like protein and a nuclear receptor, respectively. The number of down-regulated genes increases in a slower fashion and hits a very sudden peak at 24 hours. A far smaller number of genes are significantly down-regulated in the last two time points. This indicates a particularly large reaction in the chicken at 24 hours, perhaps due to the E. tenella parasite hitting a specific stage of the infection process. For E. tenella, a large number of genes are both up- and down-regulated across the different time points, with both quickly increasing from the 2-hour time point. A relatively small number of genes are up-regulated at 2 hours, showing that

22 relatively few genes have started to express more than they would while the parasite is still looking for a host to infect. The number of up-regulated genes quickly increases and appears fairly stable in the 24, 48, and 72 hour samples.

3.3 GO and KEGG analysis

An enrichment analysis for Gene Ontology (GO) categories and Kyoto Encyclopaedia of Genes and Genomes (KEGG) pathways was carried out on the lists of differentially expressed genes for all time points on both organisms. The chicken data was analysed first using edgeR. For the sake of clarity, only results from three time points from both organisms, 4, 24, and 72 hours post-infection, will be discussed here. The top 10 most significantly enriched GO categories and KEGG pathways for the chicken at each time point can be found in Tables 2 and 3. Only GO categories from the biological process ontology were included in this analysis.

Table 2. Top 10 most significantly enriched GO categories in the chicken at 4, 24, and 72 hours post-infection. The table shows the term, then number of genes associated with this category in the chicken (N), the number of significantly up- and down-regulated genes and the p-value associated with each from a Fisher’s Exact Test.

Top 10 enriched GO categories at 4 hours post-infection Term N Up Down P.Up P.Down PR (Positive Regulation) of cellular process 390 17 1 1.07E-06 5.16E-01 PR of transcription by RNA polymerase II 109 9 0 2.72E-06 1 PR of biological process 435 17 1 4.68E-06 5.56E-01 response to chemical 244 12 0 1.36E-05 1 regulation of transcription by RNA 174 10 0 1.93E-05 1 PRpolymerase of transcription, II DNA-templated 141 9 0 2.22E-05 1 PR of nitrogen compound metabolic process 260 12 0 2.55E-05 1 transcription by RNA polymerase II 180 10 0 2.59E-05 1 PR of cellular metabolic process 263 12 0 2.86E-05 1 PR of macromolecule metabolic process 264 12 0 2.97E-05 1 Top 10 enriched GO categories at 24 hours post-infection Term N Up Down P.Up P.Down cellular process 1380 172 175 4.53E-07 5.07E-01 biological process 1504 184 192 6.05E-07 4.64E-01 cellular macromolecule metabolic process 753 104 79 1.25E-06 9.74E-01 macromolecule metabolic process 834 112 88 1.92E-06 9.77E-01 nucleic acid metabolic process 469 71 37 2.75E-06 1 developmental process 435 67 50 2.89E-06 7.92E-01 anatomical structure development 410 64 47 3.06E-06 7.92E-01 multicellular organism development 370 59 46 3.76E-06 5.79E-01 regulation of biological process 837 110 100 6.55E-06 7.56E-01 nucleobase-containing compound metabolic 525 76 44 6.68E-06 9.99E-01 process

23 Top 10 enriched GO categories at 72 hours post-infection Term N Up Down P.Up P.Down sensory perception of mechanical stimulus 8 4 0 3.59E-05 1 sensory perception of sound 8 4 0 3.59E-05 1 regulation of signalling 205 16 7 1.72E-04 4.43E-01 signal transduction 361 22 11 4.16E-04 5.56E-01 regulation of cell communication 202 15 7 4.65E-04 4.28E-01 response to stimulus 597 31 18 5.05E-04 5.69E-01 lipid catabolic process 13 0 4 1 5.05E-04 regulation of signal transduction 186 14 5 6.24E-04 6.81E-01 cell surface receptor signalling pathway 188 14 3 6.94E-04 9.32E-01 signalling 377 22 12 7.44E-04 4.93E-01

Table 3. Top 10 most significantly enriched KEGG pathways in the chicken at 4, 24, and 72 hours post-infection. The table shows the term, then number of genes associated with this category in the chicken (N), the number of significantly up- and down-regulated genes and the p-value associated with each from a Fisher’s Exact Test.

Top 10 enriched KEGG pathways at 4 hours post-infection Pathway N Up Down P.Up P.Down Cytokine-cytokine receptor interaction 107 13 2 1.21E-10 1.62E-02 Toll-like receptor signalling pathway 75 8 1 1.41E-06 1.29E-01 C-type lectin receptor signalling pathway 78 8 0 1.91E-06 1 Cytosolic DNA-sensing pathway 36 5 0 3.98E-05 1 NOD-like receptor signalling pathway 110 7 1 1.88E-04 1.83E-01 Adipocytokine signalling pathway 53 5 0 2.61E-04 1 MAPK signalling pathway 199 9 0 3.11E-04 1 Herpes simplex virus 1 infection 134 7 0 6.26E-04 1 Salmonella infection 183 8 1 8.40E-04 2.86E-01 AGE-RAGE signalling pathway in diabetic 74 5 1 1.22E-03 1.27E-01 complicationsTop 10 enriched KEGG pathways at 24 hours post-infection Pathway N Up Down P.Up P.Down 109 0 61 1 3.03E-27 Phagosome 117 6 46 9.49E-01 2.59E-13 Metabolic pathways 1075 80 198 9.48E-01 9.95E-09 Valine, leucine and isoleucine degradation 41 1 19 9.76E-01 1.21E-07 Drug metabolism - other enzymes 51 7 21 1.52E-01 3.18E-07 Glycosaminoglycan degradation 15 0 10 1 1.68E-06 Drug metabolism - cytochrome P450 25 0 13 1 2.40E-06 Metabolism of xenobiotics by cytochrome 29 0 14 1 3.08E-06 GlutathioneP450 metabolism 42 5 17 3.02E-01 5.59E-06 Histidine metabolism 14 1 9 7.21E-01 8.93E-06

24 Top 10 enriched KEGG pathways at 72 hours post-infection Pathway N Up Down P.Up P.Down Lysosome 109 0 35 1 1.40E-26 Phagosome 117 2 21 8.35E-01 5.31E-11 Cytokine-cytokine receptor interaction 107 16 4 3.30E-08 4.18E-01 Metabolic pathways 1075 31 61 4.12E-01 1.85E-06 Glycosaminoglycan degradation 15 0 6 1 3.19E-06 Arginine and proline metabolism 34 0 7 1 6.41E-05 mTOR signalling pathway 122 5 13 2.44E-01 9.40E-05 Glycosphingolipid biosynthesis - globo and 9 0 4 1 9.76E-05 isoglobo series Glycerolipid metabolism 44 0 7 1 3.51E-04 Histidine metabolism 14 0 4 1 6.86E-04

The top GO categories for the 4 hour time point are clearly dominated by various regulatory genes while the top categories at 72 hours are mostly signalling and sensory related. The categories at 4 hours seem to be mostly associated with the positive regulation of different cellular processes and are up-regulated. The categories at 72 hours are more associated reactions and signalling due to stimuli and include both up- and down-regulated genes. The top enriched categories at 24 hours, however, are quite different, including several high-level categories such as biological process, likely due to the sheer number of significantly up- or down-regulated genes at that time point. There are also several developmental process categories, indicating that genes associated with these processes are also used for the response to an E. tenella infection.

The KEGG pathway enrichment at 4 hours shows a similar result to the GO enrichment, several signalling pathways seem to be up-regulated. In this case, it is likely that the same genes are showing up in different categories as many of them share signalling molecules, if not receptors. At 24 hours, more metabolic pathways are significant, mostly being down-regulated. Of special interest are the top two categories, lysosome and phagosome, as both are involved in breaking down material, including invasive organisms. The large scale down-regulation of these pathways may indicate that the parasite is affecting regulation to prevent cells from destroying it. At 72 hours, the lysosome and phagosome pathways remain down-regulated, though not to the same extent. There remain several down-regulated metabolic pathways but also two signalling pathways: mTOR signalling and cytokine-cytokine receptor interaction.

The E. tenella data was analysed as well but due to a lack of available GO annotation packages in R and a lack of KEGG annotation for the species, the analysis was accomplished using in- house code. The top 10 enriched GO categories and KEGG pathways can be found in Tables 4 and 5.

25 Table 4. Top 10 most significantly enriched GO categories in E. tenella at 4, 24, and 72 hours post-infection. The table shows the term, then number of genes associated with this category in the chicken (N), the number of significantly up- and down-regulated genes and the p-value associated with each from a Fisher’s Exact Test.

Top 10 enriched GO categories at 4 hours post-infection Term N Up Down P.Up P.Down dephosphorylation 7 0 7 1 1.45E-03 mRNA splicing, via spliceosome 9 0 8 1 3.35E-03 intracellular protein transport 28 12 6 4.65E-03 9.87E-01 translational elongation 8 5 1 1.01E-02 9.82E-01 proton transmembrane transport 6 4 0 1.66E-02 1 vesicle-mediated transport 36 13 7 1.72E-02 9.97E-01 transcription, DNA-templated 26 5 16 6.12E-01 1.81E-02 pathogenesis 4 0 4 1 2.40E-02 protein dephosphorylation 4 0 4 1 2.40E-02 proteolysis involved in cellular protein 16 7 2 2.58E-02 9.96E-01 catabolic processTop 10 enriched GO categories at 24 hours post-infection Term N Up Down P.Up P.Down glycolytic process 13 13 0 2.12E-05 1 proteolysis involved in cellular protein 16 15 1 3.81E-05 1 translationcatabolic process 99 59 19 1.00E-03 1 dephosphorylation 7 0 7 1 1.30E-03 mRNA splicing, via spliceosome 9 1 8 9.94E-01 2.97E-03 transcription, DNA-templated 26 8 17 9.40E-01 5.28E-03 proton transmembrane transport 6 6 0 6.99E-03 1 regulation of transcription, DNA-templated 44 9 25 1.00E+00 1.11E-02 protein dephosphorylation 4 0 4 1 2.25E-02 intracellular protein transport 28 18 7 2.28E-02 9.58E-01 Top 10 enriched GO categories at 72 hours post-infection Term N Up Down P.Up P.Down proteolysis involved in cellular protein 16 15 1 6.89E-05 1 glycolyticcatabolic process process 13 12 1 6.02E-04 9.98E-01 translation 99 61 16 8.99E-04 1 dephosphorylation 7 0 7 1 1.25E-03 transcription, DNA-templated 26 7 18 9.85E-01 1.45E-03 mRNA splicing, via spliceosome 9 0 8 1 2.85E-03 proton transmembrane transport 6 6 0 8.97E-03 1 protein dephosphorylation 4 0 4 1 2.20E-02 mRNA processing 6 1 5 9.74E-01 3.45E-02 rRNA processing 19 13 3 3.82E-02 9.92E-01

26 Table 5. Top 10 most significantly enriched KEGG pathways in E. tenella at 4, 24, and 72 hours post-infection. The table shows the term, then number of genes associated with this category in the chicken (N), the number of significantly up- and down-regulated genes and the p-value associated with each from a Fisher’s Exact Test.

Top 10 enriched KEGG pathways at 4 hours post-infection Pathway N Up Down P.Up P.Down Spliceosome 79 0 66 1 4.28E-18 Homologous recombination 10 7 2 6.70E-04 9.26E-01 96 31 24 1.52E-03 9.93E-01 Autophagy - animal 5 4 0 5.79E-03 1 Pathogenic Escherichia coli infection 7 1 6 7.76E-01 1.09E-02 Phagosome 6 0 5 1 2.62E-02 Proteasome 33 11 7 3.94E-02 9.80E-01 RNA polymerase 28 4 15 8.16E-01 4.56E-02 RIG-I-like receptor signalling pathway 3 0 3 1 4.77E-02 Oxidative phosphorylation 30 10 10 4.84E-02 6.95E-01 Top 10 enriched KEGG pathways at 24 hours post-infection Pathway N Up Down P.Up P.Down Spliceosome 79 0 65 1 1.29E-17 Ribosome 96 60 7 2.82E-05 1 Proteasome 33 23 3 1.05E-03 1 Pyrimidine metabolism 14 11 2 5.71E-03 9.82E-01 RNA polymerase 28 7 17 9.80E-01 5.83E-03 Glycolysis / Gluconeogenesis 25 17 5 7.09E-03 9.72E-01 Pathogenic Escherichia coli infection 7 1 6 9.77E-01 9.89E-03 Fatty acid elongation 5 5 0 1.26E-02 1 Autophagy - animal 5 5 0 1.26E-02 1 Base excision repair 5 5 0 1.26E-02 1 Top 10 enriched KEGG pathways at 72 hours post-infection Pathway N Up Down P.Up P.Down Spliceosome 79 1 60 1 1.20E-13 Ribosome 96 62 7 3.09E-05 1 Fatty acid elongation 5 5 0 1.62E-02 1 Proteasome 33 21 3 1.74E-02 1 Glycolysis / Gluconeogenesis 25 16 6 3.41E-02 9.21E-01 RIG-I-like receptor signalling pathway 3 0 3 1 4.36E-02 MAPK signalling pathway - yeast 5 1 4 9.44E-01 5.51E-02 Citrate cycle (TCA cycle) 15 10 2 6.47E-02 9.86E-01 RNA polymerase 28 6 14 9.96E-01 7.65E-02 Arginine and proline metabolism 8 6 0 7.83E-02 1

27 Unlike the chicken gene expression data, all time points have many significantly up- and down- regulated genes in E. tenella. This is very clear at the 4 hour time point in Table 4 where the top categories have most of the genes they contain being significantly differentially expressed in the analysis. The top categories are also very different from those in the chicken. At 4 hours, the chicken had mostly signalling categories of different types, whereas E. tenella has transport, proteolysis, dephosphorylation and DNA/RNA processing. Of these, dephosphorylation and DNA/RNA processing are strongly down-regulated while the transport categories seem to have a mix of up- and down-regulated genes. The catabolic proteolysis process is mostly up- regulated. At 24 hours, dephosphorylation remains down-regulated along with transcription and mRNA splicing but glycolytic processes have become highly up-regulated along with translation and transport. At 72 hours, the top categories are still similar to 24 hours and with similar expression patterns though rRNA processing has become significantly up-regulated.

In the KEGG pathway analysis for E. tenella in Table 5, two things must be kept in mind. First, the comparison is between E. tenella that is actively infecting a host and E. tenella that is ready to infect cells. Second, as there is no KEGG entry for E. tenella or any species of the Eimeria genus; the pathway annotation was generated by BLAST comparison with organisms present in the database and may not be completely accurate. As the table shows, four pathways are dominant across the time points: spliceosome, ribosome, proteasome and RNA polymerase. The spliceosome is strongly down-regulated across all time points, indicating a high up- regulation in the pure E. tenella sample. The ribosome pathway has both up- and down- regulated genes at 4 hours but becomes strongly up-regulated for the rest of the analysis. The proteasome pathway has a mix of up- and down-regulated genes at 4 and 72 hours but is very up-regulated at 24 hours. The RNA polymerase pathway has mostly down-regulated genes but also some up-regulated ones across the time points. The rest of the pathways generally change but worth a mention is the glycolysis/gluconeogenesis pathway, which is mostly up-regulated at 24 and 72 hours.

28 3.4 Immune/infection gene expression

Figure 8. Heatmap of the expression of 241 immune related genes in the chicken. Green represents up-regulated genes and red down-regulated. Intensity scales with log2 fold change.

29 In order to examine the immune response in the chicken cells, all genes that were associated with GO:0002376, immune systems process, and all subcategories, as well as immune system related KEGG pathways were identified. Those genes that also showed significant differential expression in at least one time point were then plotted in a heatmap (Figure 8). It also includes a hierarchical clustering dendrogram, showing which genes behave in a similar fashion. The plot shows a few dominant patterns of expression. A majority of genes appear to be heavily down-regulated at 24 hours, especially those that are up-regulated at 2 and 72 hours. A smaller number at the top of the plot shows the opposite behaviour.

Dr. Eva Wattrang, a chicken immunology expert at SVA, identified several groups of genes with known immune related roles in the chicken. These genes were plotted in line plots to further elucidate their behaviour over the course of the infection. Only genes that had an FDR

< 0.05 and a log2 fold change of at least one in at least a single time point were included in the graphs.

Figure 9. The expression of mannose receptors as a function of time. Point shape indicates if the gene was significant, FDR < 0.05, at that time point. The first gene category examined was the mannose receptors, shown in Figure 9. These show a relatively clear pattern, all mannose receptors except for MRC2 are down-regulated, a very significant effect at 24 hours. MRC2 begins down-regulated, though not significantly so, and is up-regulated at the later time points.

30

Figure 10. The expression of chemokines as a function of time. Point shape indicates if the gene was significant, FDR < 0.05, at that time point.

Figure 11. The expression of cytokines as a function of time. Point shape indicates if the gene was significant, FDR < 0.05, at that time point.

31 Cytokines are an important family of signalling proteins that have a role in the immune response. Chemokines are a subcategory of cytokines mainly involved in the recruitment of cells to e.g. sites of infection. The expression of these genes in the chicken are shown in Figures 10 and 11. Most of the up-regulated chemokines share a peak at 4 hours, indicating that they may have an important role in the early stages of the infection. The expression then falls, reaching a minimum at 48 hours before rising again at 72 hours. An exception to this pattern is CX3CL1 which has a relatively stable up-regulation until 72 hours, when it goes down but remains significantly up-regulated. The only two chemokines to be significantly down- regulated are CXCL12 and ATRN, both of which are highly down-regulated at 24 hours. The general cytokine category shows a much wider variety of expression patterns. IFNW1 and IL11 both remain non-significant until 48 hours but are both highly up-regulated at 48 and 72 hours, seemingly being important at the later stages of the in vitro infection. CSF3 and IL1B are both significantly up-regulated at the early time points, taper off towards the 48-hour time point but then are again significantly up-regulated at 72 hours.

Figure 12. The expression of Type I Interferon ”signature” genes as a function of time. Point shape indicates if the gene was significant, FDR < 0.05, at that time point. Interferons are cytokines important in the defence against intracellular pathogens and induce the expression of many other genes. The ”Interferon (IFN) signature” category contains genes involved with the induction of type I IFNs and IFN regulated genes, i.e. genes that are expressed by cells in response to IFN. Figure 12 shows the expression of several of these. The genes IFITM3, LOC107053353, IGBP1L and IRF5 all share a similar expression pattern, with no

32 significant differential expression until 24 hours, where they are all significantly down- regulated. After this, they are all again non-significant, except for IFITM3 which is significantly up-regulated at 72 hours. The rest of the genes are all significantly up-regulated at different times. Most of them, IFI6, IFIT5, IFITM5, MX1 and OASL, are all nonsignificant until at 48 hours where they are significantly up-regulated and continue to be up-regulated at 72 hours. Only IRF1 clearly breaks this pattern, being very significant and highly up-regulated at 2-12 hours and falling to a low but consistent level of up-regulation at 24-72 hours. IFITM5 has a similar pattern of expression at 2-24 hours but is not significant.

Figure 13. The expression of pattern recognition receptor genes as a function of time. Point shape indicates if the gene was significant, FDR < 0.05, at that time point. Pattern recognition receptors (PRR) are parts of the innate immune system that help recognize pathogens and pathogen related cell damage. In Figure 13, the vast majority of PRR genes are significantly down-regulated at 24 hours, again supporting a major immunological event at that time. Outside of this time point, a few genes are significant, though the clearest outliers are NLRC5 and TLR15. NLRC5 is not significant until at 24 hours, where it is significantly up- regulated and continues to be at 48 and 72 hours. TLR15 on the other hand is up-regulated at 2- 12 hours but only significantly so at 4 hours. CLEC17A is also interesting in that it appears to be up-regulated at 4 hours, though not significantly, but otherwise follows the same pattern as the majority of PRR genes.

33

Figure 14. The expression of integrin genes as a function of time. Point shape indicates if the gene was significant, FDR < 0.05, at that time point. Integrins are transmembrane receptors that can activate signalling pathways in the cell and allow for rapid reaction to events at the cell surface. Similar to previous categories, the majority of genes in Figure 14 follow a similar pattern, with significant down-regulation at 24 hours and otherwise a general down-regulation but with few significant time points. ITGB3, ITGA9 and ITGB6 clearly follow a different pattern, however. ITGB3 appears to be up-regulated early and has significant up-regulation at 12 and 24 hours but is nonsignificant at 48 and 72 hours. ITGA9 appears to be non-significant until at 48 hours where it is fairly strongly up-regulated and continues to be at 72 hours. ITGB6 behaves in a similar fashion though it isn't significant at 48 hours.

Overall, the various categories of immune related genes in the chicken show a large variety of expression patterns. The clearest common behaviour that can be seen across categories is an increased number of significant genes at 24 hours, especially for down-regulated genes, something that the heatmap in Figure 8 and the volcano plot in Figure 6 also show.

34

Figure 15. The expression of surface antigen genes in E. tenella as a function of time. Point shape indicates if the gene was significant, FDR < 0.05, at that time point. The glycosylphosphatidylinositol (GPI)-anchored surface antigen (SAG) genes are a set of proteins found on the surface of several species of apicomplexan parasites, including E. tenella. They appear to have important roles in infection. Figure 15 shows the expression of a subset of these genes at the different time points. Most of them show a similar behaviour, non-significant until 48 hours where they are significantly up-regulated and continue at a similar level at 72 hours. A few genes, SAGs 13, 14, and 4, are instead down-regulated across all time points. SAGs 4 and 14 show a peak of downregulation at 12 hours but subsequently following a similar pattern to the upregulated SAGs and SAG13 becomes continuously more down-regulated.

35 3.5 Gene co-expression modules

In order to find networks of co-expressed genes across the two organisms, chicken and E. tenella, the WGCNA package was used. This tool is used to identify highly correlated gene clusters in an expression dataset. It was applied to the infected chicken samples only; this was to avoid bias from the uninfected chicken samples not having any E. tenella data.

Figure 16. Hierarchical clustering of the normalized expression of genes in each sample. Sample 33_S19 forms an outlier.

36

Figure 17. Model fit versus mean connectivity. The lines on the scale independence plot are set at a fit of 0.7 and 0.9. The first step of the analysis was to load all the in vitro read data from both the chicken and E. tenella into R where it was filtered and normalized. The data was then clustered (Figure 16) to identify outlier samples. Sample 33_S29 formed a clear outlier, likely because it only included E. tenella data and would therefore be very different from the other samples, having high CPM values for parasite genes and zeroes for the chicken genes. This sample was therefore removed from this analysis to avoid any potential bias. The rest of the data forms three clear groups: the leftmost group of samples was taken at a different time from the rest but includes samples from 2-24 hours, the centre group is composed of samples from 48 and 72 hours and the rightmost group is also samples from 2-24 hours. Next, the fit of the data to the WGCNA model was assessed with different soft thresholds. The goal is to pick a threshold that fits the model well but also allow for high connectivity. Figure 17 shows the trade-off between connectivity and model fit. In order to get the best fit, a threshold of 15-20 would be best but at that point the connectivity has become very low. Therefore, a soft threshold of 9 was chosen as a compromise.

37

Figure 18. A dendrogram showing the different genes clustered together and how they were assigned color coded modules.

Table 6. The number of genes and the percentage of E. tenella genes in each module, the number of significantly DE immune-related genes and the number of genes from the immune categories IFN signalling and PRR.

Module Number E. tenella gene Number of significant IFN PRR of genes percentage immune genes signal Black 704 17.47 26 0 3 Blue 4129 2.33 40 1 2 Brown 4047 11.29 28 0 2 Green 1014 31.66 38 0 2 Greenyellow 130 31.54 3 0 1 Grey 110 19.09 2 1 0 Magenta 149 12.08 1 1 0 Pink 620 86.45 2 0 0 Purple 149 13.42 11 5 0 Red 716 60.34 15 0 3 Tan 94 3.19 6 0 0 Turquoise 5225 78.39 22 2 1 Yellow 3438 26.70 42 0 6

38 With the data cleaned up and model parameters chosen, the co-expression modules were calculated. This resulted in 13 modules, shown in Figure 18 and Table 6. The modules vary greatly in size, from large ones such as turquoise and blue, to small ones such as tan, purple and grey. The fraction of E. tenella genes in each module also varies greatly, with large modules with very high and low fractions of parasite genes likely containing housekeeping genes from each organism. For example, the blue module contains the genes of the glycolysis pathway in the chicken whereas the turquoise module contains the same for E. tenella. A GO and KEGG enrichment analysis further supported this, showing an enrichment of mitochondrial GO categories and metabolic KEGG pathways. Many of the other modules have a fairly mixed composition, though given the far larger number of chicken genes in the analysis, it's not surprising to see the majority of modules being dominated by the chicken.

Table 6 also shows the number of genes from the heatmap in Figure 8 and the immune categories IFN signature and PRR from the previous section. The immune-related genes appear to be spread across the modules, with no module being an obvious “immune response” module. The GO and KEGG enrichment analysis supported this, with immune related categories being enriched but not showing a strong signal in any specific module. Specific categories of immune genes were checked for enrichment in the modules as well and showed the same pattern again, as can be seen with the IFN signature and PRR genes. These two categories do, however, have a fair number of genes clustered in specific modules: yellow for PRR and purple for IFN signature. For PRR, this includes the gene TLR15, which shows unique behaviour in Figure 13 and is thought to have a role in the chicken immune response to E. tenella. For the IFN signature genes, most of them seem to cluster in the purple module and they make up just under half the immune-related genes present in the module.

39 4 Discussion

4.1 The data and the pipeline

The main aim of this project was building a pipeline for the dual RNA-seq and subsequent differential expression analysis of chicken and E. tenella data. This pipeline was to be used to analyse both in vitro and in vivo data for this project and beyond. The first half of the pipeline quality checked the reads, trimmed them, mapped them to the reference genomes and counted those that mapped to features. These steps went well and as Table 1 shows, the data mapped very well to the chicken reference genome but less so to the E. tenella reference. The second half of the pipeline involved loading the data into R, performing further quality checks and performing a differential expression analysis. As the volcano plots in Figures 6 and 7 show, this step was also a success. Overall, the pipeline performed well with the main bottlenecks being the mapping and read counting, both of which take a relatively large number of CPU hours to complete. This was not an issue for the in vitro datasets, as they were relatively small, but needs to be addressed with larger datasets that the pipeline might be used for in the future. The read mapping performed better as STAR allows for multithreading, something HTSeq does not. As such, running large amounts of data will require splitting the data up manually if faster results are desired.

While running the pipeline, the quality checks revealed various issues in the in vitro datasets. The first was a large amount of adapter content in the reverse reads. This needed to be trimmed away and left many of the reverse reads unusable for the analysis, meaning that many of the forward reads were mapped as single reads. The cause for this most likely lies in the way that the samples were prepared for sequencing in the lab and as such, was beyond the scope of this project to address. The second was the mapping rate of the different samples, shown in Table 1. While most samples mapped quite well, with an ~80-85% mapping rate to features, there was one notable exception: the pure E. tenella sample. It only had a mapping rate of 65%, which is quite a lot lower than for the chicken samples. While the mapping rate to features was quite low, the mapping rate to the genome was around the same level as the chicken samples, ~87%, meaning that the issue lies in reads mapping where there are no annotated features. It seems unlikely that such a large-scale genomic contamination would be present, given the lab prep used DNAse before cDNA synthesis that should have gotten rid of any such contamination. The best way to check this would be to sequence several pure samples and map them to the genome, but that was beyond the scope of this project. The cause is likely to be a lack of annotation for the E. tenella genome. This then suggests that a large portion of the expression of E. tenella is not being captured and emphasizes the importance of producing a better annotation for the organism, even if it's only structural, so that important genes are not being missed. Data from this project could be used to aid in improving the annotation in the future, as expression data is a valuable tool for determining genes in eukaryotic genomes.

40 The general quality of the E. tenella reference genome is another issue. While the chicken genome is of fairly high quality and not very fragmented, the E. tenella genome is highly fragmented. The genome is ~50 Mb in size and made up of 14 chromosomes but is split into over 4000 contigs. This fragmentation may cause genes to be split into multiple contigs, making it more difficult to identify them. This also leads into a different issue: the annotation of both the chicken and E. tenella genomes. For the chicken genome, most of the genes do have a functional annotation but many of those are taken directly from mammalian homologues and many are still only annotated as hypothetical proteins. This reduces the accuracy of any analysis utilizing the reference genome and makes the interpretation of results more difficult. This also extends to GO enrichment analyses as the GO annotations available through packages, such as org.eg.Gg.db, remain relatively poor. There are, however, more extensive annotations available from the GO Consortium. For E. tenella, this problem is far more serious. Over a third of the gene products remain annotated as hypothetical proteins and another large portion has only putative annotations. This extends to the GO annotation where only a minority of genes are GO annotated. There is also no KEGG pathway annotation for E. tenella or any species of genus Eimeria. This lack of annotation makes studying E. tenella and other eimerian parasites difficult and needs to be addressed in order to facilitate more research on Eimeria species and apicomplexan parasites in general.

In order to gauge the quality of the datasets, quality checks were implemented as part of the pipeline by clustering the data and displaying it in Figures 2-5. The chicken data had no particular outliers, and also did not cluster clearly based on infection status, though the infected samples did seem to cluster together more than the uninfected ones. In the MDS plot in Figure 2 there is some tendency for samples from the later time points, 48 and 72 hours post infection, to cluster together, something that can also be observed in the dendrogram in Figure 16. This seems to indicate that a major event is occurring between 24 hours and 48 hours. This is also supported by the volcano plots in Figure 6 and the behaviour of many of the immune-related genes examined in Figures 9-15. Tierney & Mulcahy (2003) described a time series of in vitro E. tenella infection, though with a different E. tenella strain, UTI1507, and in a bovine cell line, Madison-Darby bovine kidney. Their results show that at 24 hours post infection, the parasite has formed trophozoites within a parasitophorous , i.e. a membrane protected compartment in the host cell. At 48 hours they have formed immature schizonts which become fully mature at 60 hours and have formed merozoites at 72 hours, meaning the first merogony is over and they are ready to leave the current host cell and begin infecting other cells (Tierney & Mulcahy 2003).

The expression patterns of E. tenella were quite different from those of the chicken. As both the MDS plot in Figure 3 and the PCA plot in Figure 5 show, the parasite samples cluster very clearly based on time points, with there being a clear separation between 2-4, 12-24 and 48-72 hours. Given the results of Tierney & Mulcahy (2003), it seems reasonable to assume that these three stages are the beginning of infection, the parasite securing itself in the parasitophorous vacuole and the merogony itself, possibly with some reinfection starting to occur at 72 hours.

41 It is also clear from them that there is far more differential expression overall, both up- and down-regulation, in E. tenella then there is in the chicken. This may be explained by a few different factors. One of these is that E. tenella is unicellular and has a small genome adapted to its particular life cycle, while the chicken has a large genome that responds to a larger variety of situations and is therefore unlikely to show genome-wide differential expression. Another is the difference in references. The reference for E. tenella is a pure sample of parasites that are ready to begin infection. The reference for each sample of the chicken, however, is the uninfected version of the same cell line taken at the same time. There is some inherent difficulty in finding a good reference for E. tenella, a sample will either be from parasites actively infecting and replicating or from parasites that are waiting to infect a new host. The use of a sample of non-infecting E. tenella as a reference causes some issues with the analyses, for example the spliceosome KEGG pathway appears highly down-regulated across samples but it is in fact highly up-regulated in the reference. As such, interpretation of the results needs to be careful.

The last issue relating to the data that requires discussion is the sequencing cost. While dual RNA-seq is a very useful tool for examining disease, there are clear drawbacks. This is very apparent in the depth of sequencing required to get sufficient E. tenella data for analysis. As shown in Table 1 and Figure 1, the fraction of reads coming from E. tenella is very low. This means that to be able to do a proper DE analysis, the RNA needs to be sequenced 100-200-fold deeper than would be required if only the chicken or a purified sample of the parasite were being sequenced. For the in vivo pilot study (results not shown), the ratio was even worse, with only 0.0002% E. tenella reads at 24 hours post-infection, meaning that even deeper sequencing is needed to get sufficient sequencing depth of the parasite transcriptome to allow for a proper DE analysis of it. Ching et al. (2014) suggest that 5-20 million reads are required to identify low-expressed genes, which in the case of the pilot study would require very deep sequencing. Their results are, however, extrapolated from a single study and it would be prudent to estimate the needed depth based on pilot studies. Another option would be single-cell RNA sequencing, but this method has its own drawbacks, especially for E. tenella where single-cell sequencing has not been attempted before. E. tenella requires more than one host cell in order to mature and it can't finish its lifecycle in vitro, so single-cell sequencing may not be realistic for an apicomplexan parasite. Therefore, the cost of doing a proper dual RNA-seq analysis that can enjoy all of the advantages of the technique will be quite high.

4.2 GO and KEGG categories

The second goal of this project was to identify genes with important roles in the chicken immune response and E. tenella infection process and elucidate their functions. To this end, a GO category and KEGG pathway enrichment analyses were performed on the results of the DE analysis. Here, the goal was to look at two different aspects of the results: a global view of what was being expressed and a closer look at the genes of the immune system. In order to identify globally relevant categories, the top 10 most enriched GO categories and KEGG pathways at

42 different time points were listed in Tables 2-5. For the chicken, the top GO categories change completely between time points. The regulatory categories that are enriched at 4 hours seem to suggest a general up-regulation of a variety of different processes, including several transcriptional processes. This seems to indicate that the host cells are speeding up their metabolism as part of their reaction and signalling that an infection is in progress. Indeed, many of the top 10 enriched KEGG pathways indicated that recognition of the infectious agent is important at this time point. At 24 hours there is a much stronger reaction. The top categories have a mix of up- and down-regulated genes, though this is likely because they are quite high- level categories and cover genes with a variety of functions. Enriched categories include developmental processes, indicating that these categories either cover genes that are also involved in other processes or that these pathways are for some reason activated during infection. The large degree of down-regulation observed at this time point could well be an evasive mechanism by the parasite. At 72 hours, signalling and regulatory processes dominate. Given that the first merogony is either finishing or already finished at this time, it's possible that these are signs of the new merozoites starting to infect other cells. The top KEGG pathways show a similar pattern. At 4 hours, various signalling pathways are up-regulated. At 24 hours, these are replaced by a variety of down-regulated metabolic pathways. Of particular note here are the lysosome and phagosome, which are highly down-regulated and continue to be down- regulated at 72 hours. While there is currently not much evidence of these pathways being important for defence against E. tenella in chickens, for another apicomplexan parasite, Toxoplasma gondii, autophagy is a very important process for the immune response (Krishnamurthy et al. 2017). These results suggest that controlling autophagy in the host may also be important for E. tenella survival. Another pathway that might be of interest is the cytokine-cytokine receptor interaction pathway. Cytokines have an important role for cell signalling and as Figure 10 and Figure 11 show even more clearly, they are significantly up- regulated at 48 and 72 hours, suggesting an important role for the immune response at these time points.

For E. tenella, there is a much more stable set of enriched categories across time points. The top GO categories, for example, all include down-regulated dephosphorylation and mRNA splicing via spliceosome, whereas the top KEGG pathways all contain a down-regulated spliceosome and mixed to up-regulated ribosome pathway. In this case, it is the reference that has these pathways significantly up-regulated, rather than down-regulation during infection. Since these will largely show up for all time points, it is more interesting to look at the differences between them. One GO category, glycolytic process and a few KEGG pathways, such as Fatty acid elongation, TCA cycle and Glycolysis/gluconeogenesis, are generally up- regulated at 24 and/or 72 hours, further supporting that the parasite is starting to grow and replicate at 24 hours and continues to do so. At 4 hours most of the categories are related to transport and editing of mRNA and proteins. This suggests that E. tenella may be more concerned with the regulation of mRNA and proteins already present, perhaps carried over from the sporozoite stage.

43 4.3 Resistance and infection genes

While the top 10 GO and KEGG categories give some insight into the globally important genes, the most interesting genes here are those known to take part in the chicken immune response. Therefore, a list of genes contained in immune related GO and KEGG categories was made and plotted in the heatmap in Figure 8 and specific categories of genes plotted in Figures 9-15. The heatmap shows four general patterns of expression for these genes: 1) up-regulation at the start, down-regulation at 24 hours and up-regulation again at 72 hours, 2) the opposite pattern, 3) up- regulation at the start and a gradual drop to down-regulation at the end and 4) the opposite pattern. It's clear that these genes are under different modes of regulation but the 24 hour time point seems to be generally important.

Several categories of immune genes were examined in more detail. A predominant pattern across many of the categories was a low down-regulation or non-significant differential expression with a peak of down-regulation at 24 hours, consistent with the heatmap (Figure 8). The genes discussed here are generally genes that broke that pattern. The first category examined was the mannose receptors. These proteins have a role in both the innate and adaptive immune systems and are generally used for the recognition of proteins and foreign microorganisms (Weis et al. 1998). Here, the MRC1 paralogues and PLA2R1 all show the predominant pattern. The only gene to show a different behaviour was MRC2, which increases quickly in up-regulation towards 24 h where it reaches a plateau and increases slowly towards 72 hours. This clearly shows different roles for these receptors in the response to E. tenella.

Chemokines and cytokines are important signalling molecules. Figures 10 and 11 show a very varied pattern of expression for the general cytokines but less so for the chemokines. Most of the chemokines are highly up-regulated early and then fall towards the 48-hours time point and then being more up-regulated at 72 hours. This seems to suggest a role in the early detection of an E. tenella infection that may be repeated as new merozoites begin infecting more cells. Most of the cytokines in Figure 11 show similar behaviour with the exception of IFNW1 and IL11. These two genes seem to have an important role at 48 hours, where most other cytokines don't show significant differential expression in either direction. IFNW1 is a member of the type I IFN family and has not been greatly studied in the chicken but is known to have a role in the immune reaction to viruses in humans. IL11 has recently been studied and characterized in the chicken and has been shown to have an important role in the immune system, affecting the expression of IFN-γ, which has a known role in the immune response to Eimeria infections (Rose et al. 1991, Truong et al. 2018). Indeed, IFN-γ is conspicuous by its absence as there is almost no detectable expression of it in any of the samples. This may be a sign of E. tenella down-regulating the expression of an important immune regulatory gene as the parasite has been shown to down-regulate IFN-γ (Chow et al. 2011).

The type I interferon signature genes show two main patterns of expression: a similar pattern as those seen before, i.e. general down-regulation with a peak at 24 hours, and a pattern of no differential expression until 48 hours, where they are dramatically up-regulated and continue to

44 be at 72 hours. The latter tends to apply to the IFN induced genes. The role of these genes therefore seems to be in the response at the later stage of the first merogony. One gene is an exception to these patterns: IRF1 is highly up-regulated at 2-4 hours and then declines but remains significantly up-regulated for the entire experiment. This seems to indicate a role in the early detection of the E. tenella infection and this gene is also involved in the induction of type I IFN production (Nguyen et al. 1997, Liu et al. 2018).

The pattern recognition receptors show a very clear theme of general down-regulation with a peak at 24 hours. Three genes break this pattern, however. CLEC17A shows a peak of up- regulation at 4 hours, though it's not significant and it follows the general pattern afterwards. TLR15, a TLR unique to birds, shows a significant up-regulation at 4 hours but is down- regulated at later time points. There is evidence of TLR15 playing a role in the innate immune response to E. tenella, which suggests that this early up-regulation could be very important (Zhou et al. 2013). The following down-regulation may also indicate that E. tenella is suppressing TLR15 expression. The last gene is NLRC5, which is significantly up-regulated from 24 hours onwards. This gene has been shown to be important to the inflammatory response in chickens and to regulate the expression of several interleukins (Lian et al. 2012). Given that E. tenella infections do cause considerable inflammation, this gene may be an important factor in the immune response.

The last category examined for the chicken was the integrins. Here, three genes break the general pattern, which is the same as observed in previous categories. ITGB3 shows early up- regulation, reaching a peak at 12 hours and then falling. This seems to suggest a role in the early reaction to E. tenella and it's down-regulation at 48 hours coincides with the up-regulation of ITGA9 and, to a lesser extent, ITGB6, suggesting that these integrins may have similar roles at different stage of the infection process.

One category of genes from E. tenella was also examined, GPI-anchored surface antigens. E. tenella has at least two families of these genes, A and B. The former includes genes SAG1-12 and the latter genes SAG13-23, with considerable conservation of both sequence and structure in the proteins in each group. Several other SAG genes have also been identified that may belong to either of these families, but have not been studied in as much detail so far (Tabarés et al. 2004). A functional assay of five genes from each family has been carried out on chicken macrophage HTC cells, which showed that SAGs 4, 5 and 12 had a large impact on the expression of several genes of the immune system, including IFN-γ, IL-10, IL-12 and IL-1β (Chow et al. 2011), suggesting that these genes play a large role in E. tenella evasion of the immune system. As Figure 15 shows, most of these genes are not significantly differentially expressed at the early time points but are significantly upregulated at 48 and 72 hours, including SAGs 5 and 12. As these are cell surface proteins, they are likely produced during the maturation of the new generation of merozoites, explaining the lack of expression at the early time points. SAGs 4, 13 and 14 behave quite differently from the rest, being down-regulated at all time points. This suggests that they may serve a role that is exclusive to the sporozoite or at least not relevant for the first generation of merozoites. This differs from the results of Tabarés

45 et al. (2004), who found that SAG4 was exclusively expressed in second generation merozoites, not in sporozoites, and SAGs 13 and 14 in both. It's clear, however, that the SAG genes have important roles and are a potential target for vaccines.

4.4 Gene co-expression networks and conclusion

The third goal of this project was to look at the interplay of genes between the chicken and E. tenella. It is well known that intracellular parasites, including E. tenella, affect the transcription of their host organism in order to survive, either by direct transcriptional control or, more frequently, through gene products. However, the causal relationship between genes expressed in the parasite and those expressed in the host can be difficult to figure out without expensive lab experiments. Dual RNA-seq, however, allows for examining the expression of genes from both host and parasite at the same time, which could elucidate correlations between the expression of genes between organisms. This could then help identify candidate genes from the parasite that affect the host response and therefore potential targets for a vaccine.

The identification of gene co-expression networks was accomplished by using the WGCNA package to look at the expression of genes from both organisms. WGCNA has never been applied to the E. tenella infection of a chicken before but it has been applied to a malaria infection with good results (Lee et al. 2018). The results from WGCNA for this system, however, were inconclusive. Table 6 shows that there is correlation between the expression of many E. tenella genes and chicken genes as well as large groups of genes that mostly include housekeeping genes that are likely not strongly affected by the infection. However, when checking for the enrichment of the modules with genes from the previously identified immune related GO categories and KEGG pathways, none of the modules showed a significant enrichment for the genes in general and even genes within specific categories were generally spread across the modules. When looking at specific genes, there were some interesting results. TLR15, along with five other PRR genes and 36 other immune related genes, was assigned to the yellow module. This module is quite large but includes a large number of E. tenella genes, including ETH_00030475, which has been shown to have an important role in E. maxima infection (Blake et al. 2011). It also includes several SAG genes from E. tenella. This indicates that this module may contain genes that are important for the chicken immune response, in the recognition of the infectious agent, and the E. tenella infection process.

While the results of the co-expression network analysis were inconclusive, it nonetheless highlights the potential in using dual RNA-seq on infection systems. There are great benefits to using this method as it would be impossible to examine correlations between the expression of genes in the two organisms in silico without it. With dual RNA-seq it is possible to generate hypotheses about correlations and causal relationships between genes in the two organisms that can then be tested in the lab. This results in more focused testing of these relationships at hopefully lower cost in both funding and time.

46 Overall, the project was successful in that a pipeline for quality checking, processing and analysing dual RNA-seq data was built. A number of genes with potential roles in the immune response to E. tenella in the chicken were identified and an analysis of the co-expression of genes in the two organisms was carried out. The results of the latter were not very conclusive, and no genes were identified as specifically as potential vaccine candidates, but this study will help inform the next steps to take in analysing a larger dataset of in vivo samples. The in vivo study will have deeper sequencing of both the chicken and the E. tenella transcriptome as well as increased replication. This will help in identifying low expressed genes as well as allowing for comparison with this dataset to see if the in vitro behaviour matches up to the in vivo one. In this way, this study has helped in the cause of controlling coccidiosis.

5 Acknowledgements

I would like to thank my supervisors, Robert Söderlund and Eva Wattrang, for their support and guidance during this project. I would also like to thank Feifei Xu for her help with the read mapping pipeline. Finally, I would like to thank my parents for giving feedback on the popular science summary as well as my family and friends for their support.

47 References

Anders S, Pyl PT, Huber W. 2015. HTSeq—a Python framework to work with high- throughput sequencing data. Bioinformatics 31: 166–169.

Andrews S. 2010. FastQC A Quality Control tool for High Throughput Sequence Data. online 2010: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/. Accessed March 23, 2020.

Belli SI, Smith NC, Ferguson DJP. 2006. The coccidian oocyst: a tough nut to crack! Trends in Parasitology 22: 416–423.

Beug H, von Kirchbach A, Döderlein G, Conscience J-F, Graf T. 1979. Chicken hematopoietic cells transformed by seven strains of defective avian leukemia viruses display three distinct phenotypes of differentiation. Cell 18: 375–390.

Blake DP, Billington KJ, Copestake SL, Oakes RD, Quail MA, Wan K-L, Shirley MW, Smith AL. 2011. Genetic Mapping Identifies Novel Highly Protective Antigens for an Apicomplexan Parasite. PLOS Pathogens 7: e1001279.

Blighe K, Rana S, Lewis M. 2019. EnhancedVolcano: Publication-ready volcano plots with enhanced colouring and labeling.

Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30: 2114–2120.

Carlson M. 2019a. GO.db: A set of annotation maps describing the entire Gene Ontology.

Carlson M. 2019b. org.Gg.eg.db: Genome wide annotation for Chicken.

Chapman HD, Barta JR, Blake D, Gruber A, Jenkins M, Smith NC, Suo X, Tomley FM. 2013. Chapter Two - A Selective Review of Advances in Coccidiosis Research. In: Rollinson D (ed.). Advances in Parasitology, pp. 93–171. Academic Press,

Ching T, Huang S, Garmire LX. 2014. Power analysis and sample size estimation for RNA- Seq differential expression. RNA 20: 1684–1696.

Chow Y-P, Wan K-L, Blake DP, Tomley F, Nathan S. 2011. Immunogenic Eimeria tenella Glycosylphosphatidylinositol-Anchored Surface Antigens (SAGs) Induce Inflammatory Responses in Avian Macrophages. PLOS ONE 6: e25233.

Coombes KR. 2019. ClassDiscovery: Classes and Methods for “Class Discovery” with Microarrays or Proteomics.

Daszak P. 1999. Zoite Migration during Eimeria tenella Infection: Parasite Adaptation to Host Defences. Parasitology Today 15: 67–72.

Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. 2013. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29:

48 15–21.

Ewels P, Magnusson M, Lundin S, Käller M. 2016. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32: 3047–3048.

Gajria B, Bahl A, Brestelli J, Dommer J, Fischer S, Gao X, Heiges M, Iodice J, Kissinger JC, Mackey AJ, Pinney DF, Roos DS, Stoeckert CJ, Wang H, Brunk BP. 2008. ToxoDB: an integrated Toxoplasma gondii database resource. Nucleic Acids Research 36: D553–D556.

Krishnamurthy S, Konstantinou EK, Young LH, Gold DA, Saeij JPJ. 2017. The human immune response to Toxoplasma: Autophagy versus cell death. PLOS Pathogens 13: e1006176.

Langfelder P, Horvath S. 2008. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559.

Lee HJ, Georgiadou A, Walther M, Nwakanma D, Stewart LB, Levin M, Otto TD, Conway DJ, Coin LJ, Cunnington AJ. 2018. Integrated pathogen load and dual transcriptome analysis of systemic host-pathogen interactions in severe malaria. Science Translational Medicine, doi 10.1126/scitranslmed.aar3619.

Lian L, Ciraci C, Chang G, Hu J, Lamont SJ. 2012. NLRC5 knockdown in chicken macrophages alters response to LPS and poly (I:C) stimulation. BMC Veterinary Research 8: 23.

Liu Y, Cheng Y, Shan W, Ma J, Wang H, Sun J, Yan Y. 2018. Chicken interferon regulatory factor 1 (IRF1) involved in antiviral innate immunity via regulating IFN-β production. Developmental & Comparative Immunology 88: 77–82.

McDonald V, Rose ME. 1987. Eimeria tenella and E. necatrix: A Third Generation of Schizogony Is an Obligatory Part of the Developmental Cycle. The Journal of Parasitology 73: 617–622.

McDonald V, Shirley MW. 2009. Past and future: vaccination against Eimeria. Parasitology 136: 1477–1489.

Neuwirth E. 2014. RColorBrewer: ColorBrewer Palettes.

Nguyen H, Hiscott J, Pitha PM. 1997. The growing family of interferon regulatory factors. Cytokine & Growth Factor Reviews 8: 293–312.

Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK. 2015. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43: e47–e47.

Robinson MD, McCarthy DJ, Smyth GK. 2010. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139–140.

49 Rose ME, Hesketh P. 1979. Immunity to coccidiosis: T-lymphocyte- or B-lymphocyte- deficient animals. Infection and Immunity 26: 630–637.

Rose ME, Wakelin D, Hesketh P. 1991. Interferon-gamma-mediated effects upon immunity to coccidial infections in the mouse. Parasite Immunology 13: 63–74.

Sigurðarson Sandholt AK. 2020. ArnarKSSandholt/EimeriaDualRNASeq. online 2020: https://github.com/ArnarKSSandholt/EimeriaDualRNASeq. Accessed June 5, 2020.

Tabarés E, Ferguson D, Clark J, Soon P-E, Wan K-L, Tomley F. 2004. Eimeria tenella sporozoites and merozoites differentially express glycosylphosphatidylinositol- anchored variant surface proteins. Molecular and Biochemical Parasitology 135: 123– 132.

Tenenbaum D. 2020. KEGGREST: Client-side REST access to KEGG. doi 10.18129/B9.bioc.KEGGREST.

Tierney J, Mulcahy G. 2003. Comparative development of Eimeria tenella () in host cells in vitro. Parasitology Research 90: 301–304.

Truong AD, Hong Y, Rengaraj D, Lee J, Lee K, Hong YH. 2018. Identification and functional characterization, including cytokine production modulation, of the novel chicken Interleukin-11. Developmental & Comparative Immunology 87: 51–63.

Vu V. 2011. ggbiplot: A ggplot2 based biplot.

Weis WI, Taylor ME, Drickamer K. 1998. The C-type lectin superfamily in the immune system. Immunological Reviews 163: 19–34.

Westermann AJ, Gorski SA, Vogel J. 2012. Dual RNA-seq of pathogen and host. Nature Reviews Microbiology 10: 618–630.

Wickham H. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York

Witcombe DM, Smith NC. 2014. Strategies for anti-coccidial prophylaxis. Parasitology 141: 1379–1389.

Zhou Z, Wang Z, Cao L, Hu S, Zhang Z, Qin B, Guo Z, Nie K. 2013. Upregulation of chicken TLR4, TLR15 and MyD88 in heterophils and monocyte-derived macrophages stimulated with Eimeria tenella in vitro. Experimental Parasitology 133: 427–433.

50