<<

PDF hosted at the Radboud Repository of the Radboud University Nijmegen

The following full text is a publisher's version.

For additional information about this publication click this link. http://hdl.handle.net/2066/100868

Please be advised that this information was generated on 2021-09-28 and may be subject to change. Deciphering cellular responses to pathogens using genomics data

Iziah Edwin Sama Deciphering cellular responses to pathogens using genomics data This research was performed at the Centre for Molecular and Biomolecular Informatics (CMBI), Nijmegen Centre of Molecular Life Sciences, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands.

Funding: This work was supported by the VIRGO consortium, an Innovative Cluster approved by the Netherlands Genomics Initiative and partially funded by the Dutch Government (BSIK 03012), The Netherlands.

ISBN 978-90-9027062-3

© 2012 Iziah Edwin Sama All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, by print or otherwise, without permission in writing from the author

Front Cover Image: A metaphorical illustration of the complexity in a host cell (the field), wherein fundamental moieties like interact with each other (the network) in response to various pathogenic stimuli triggering respective cellular responses (the sub-fields demarcated by different line colors). The background is a picture of an indoors multi-sports field .The network is a -protein interaction network (HsapiensPPI of chapter 3) in which nodes represent proteins and edges between nodes indicate physical association. (Concept by Iziah Edwin Sama)

Cover design and lay-out: In Zicht Grafisch Ontwerp, Arnhem Printed by: Ipskamp Drukkers, Enschede

II Deciphering cellular responses to pathogens using genomics data

Proefschrift

ter verkrijging van de graad van doctor aan de Radboud Universiteit Nijmegen op gezag van de rector magnificus prof. mr. S.C.J.J. Kortmann, volgens besluit van het college van decanen in het openbaar te verdedigen op vrijdag 30 november 2012 om 12:30 uur precies

door

Iziah Edwin Sama geboren op 10 februari 1976 te Kombone, Kameroen

III Promotor Prof. dr. Martijn A. Huynen

Manuscriptcommissie Prof. dr. Robert W. Sauerwein (Voorzitter) Dr. Elena Marchiori Prof. dr. Willem J. Stiekema (UvA)

IV Deciphering cellular responses to pathogens using genomics data

Doctoral thesis

to obtain the degree of doctor from Radboud University Nijmegen on the authority of the Rector Magnificus prof.dr.S.C.J.J Kortmann, according to the decision of the Council of Deans to be defended in public on Friday, November 30, 2012 at 12:30 hours

by

Iziah Edwin Sama Born on February 10, 1976 in Kombone (Cameroon)

V Supervisor Prof. dr. Martijn A. Huynen

Doctoral Thesis Committee Prof. dr. Robert W. Sauerwein (Chairman) Dr. Elena Marchiori Prof. dr. Willem J. Stiekema (University of Amsterdam)

VI Contents

Chapter 1 General Introduction 11 1. Genomics perspectives to host-pathogen interactions 13 1.1 Transcriptomics 13 1.2 MicroRNAs 14 1.3 Transcription factor binding sites 15 1.4 Proteomics 16 1.5 Protein-protein interaction networks 16 2. Immune response 18 3. Outline of thesis 18 References 23

Chapter 2 Quantitative proteome profiling of respiratory virus-infected lung 27 epithelial cells Abstract 28 Introduction 29 Materials and Methods 30 Results and Discussion 33 Conclusions 53 Acknowledgements 53 References 54

Chapter 3 Measuring the physical cohesiveness of proteins using physical 59 interaction enrichment Abstract 60 Introduction 61 Methods 63 Results 70 Discussion 73 Acknowledgements 75 References 76

Chapter 4 Transcriptomic assessment of cellular response to intracellular 79 pathogens reveals an iron-depletion profile Abstract 80 Introduction 81 Materials and Methods 83 Results 89 Discussion 98 Acknowledgement 100 References 101

VII VIII Contents

Chapter 5 MicroRNA preferentially expressed in dendritic cells 105 contain sites for conserved transcription factor binding motifs in their promoters Abstract 106 Background 107 Methods 108 Results 112 Discussion 124 Conclusions 127 Acknowledgements 127 References 128

Chapter 6 Transcriptome kinetics of circulating neutrophils during 133 experimental endotoxemia Abstract 134 Introduction 135 Methods 135 Results 139 Discussion 148 Acknowledgements 151 References 152

Chapter 7 General Discussion 155 1. Summary of findings 157 2. Future perspectives 158 3. Conclusion 164 References 165

Summary 171 Samenvatting 175 Acknowledgement 179 Appendix 1: 181 Appendix 2: 186 Appendix 3: 204 Curriculum Vitae (English) 222 Curriculum Vitae (Nederlands) 223 List of Publications 224

IX

1

General Introduction

General Introduction 1

1. Genomics perspectives to host-pathogen interactions

The central dogma in molecular biology posits that, at a very basic level of most known living things lies deoxyribonucleic acid (DNA) which consists of sequential segmental nucleotide units called genes, interspersed by other nucleotide sequences for structural and regulatory purposes. As and exception, the genes of RNA viruses are in ribonucleic acid (RNA) as they do not have DNA. Genes are our hereditary material and are also the units that encode complex biological products like proteins of the immune system and various types of RNA, e.g. microRNAs. We refer to the collection of all genetic material in an organism as its genome. The was completely sequenced 10 years ago1, 2 and it has since then been used to investigate disease processes in a host-cell at a genomic level by, for example, uncovering host genes that are up- or down-regulated in expression because of an infection. To study such events more systematically, biologists examine biological information at various genomics levels, some of which I shall discuss below.

1.1 Transcriptomics The process of expressing genes from DNA is known as transcription and the protein coding precursors of transcribed genes are known as messenger RNAs (mRNAs). In this “omics” age, when several genes can be simultaneously investigated using technologies like DNA microarrays or RNA-seq to generate the so-called trancriptomics datasets from cells, reporting and interpreting huge numbers of genes against the background of what is known in the scientific literature is not a simple task. In infection biology for example, whole-genome transcriptomics (transcriptome) data, can be generated from a host-cell, in order to observe genes or group of genes that are responsive to infectious disease agents. Such efforts have yielded good results in several studies. For example, using transcriptomics data from high-density oligonucleotide microarrays, Bao and colleagues used alveolar epithelial cells (A549 cells) to globally uncover cytokines and chemokines that are induced in expression upon human metapneumovirus (HMPV) infection of these cells3, 4; thus providing an important resource for HMPV biology which I shall use in this thesis, and that will likely inspire other studies on HMPV pathogenesis. Another transcriptomics study related to infection concerns the respiratory syncytial virus (RSV) infection. Using mouse models, Schuurhof and colleagues examined differential host transcription profiles generated during secondary immune responses to RSV compared to primary infection to explain the phenomenon of vaccine-enhanced disease. They discovered that 5 days after challenge, the chemokine, inflammation and interferon response genes were expressed at higher levels during primary immune responses, while the immunoglobulin expression was higher during secondary immune responses. Furthermore, formalin-inactivated RSV vaccination with RSV challenge generated vaccine-enhanced disease and resulted in a transcription profile similar to

13 that of a primary immune response instead of a secondary response. In addition, Th2 was specifically induced in mice that were vaccinated with formalin- inactivated RSV. Schuurhof and colleagues concluded that their findings support the hypothesis that vaccine-enhanced disease is mediated by prolonged innate immune responses and Th2 polarization, in the absence of replication of the virus5.

1.2 MicroRNAs Not all genes are protein-encoding. Some genes encode microRNAs (miRNAs) that are important in post-transcriptional gene regulation. MiRNAs can stall protein translation by binding to the 3’ untranslated region (UTR) of mRNAs of their so-called target genes. Furthermore, it has been shown that targeted mRNAs are mainly degraded upon miRNA binding; leading to low concentrations of the mRNA targets6-8. MicroRNAs are currently being recognized as important regulatory factors in several biological processes such as the control of differentiation and maturation of innate-immune cells that are challenged with pathogenic material like LPS9-12.

Identifying miRNAs and their target genes is important in unraveling the gene network underpinning immune responses to pathogens. There are a number of bioinformatics- based algorithms aimed at predicting target genes for miRNAs. These include PicTar13 and TargetScan14. These prediction programs take genomic factors like the phylogenetic conservation of both the miRNA and the target genes amongst multiple animal species into account in order to ascertain the reliability of the predictions. In the context of evolutionary biology, strong conservation of a miRNA and its target gene suggests a functional partnership of the pair.

Experimental identification of miRNA targets is desirable because low degree of com- plementarity between the miRNA and its target sequence often makes it difficult to computationally identify targets. A biochemical method used to assuage this problem is based on Argonaute (Ago) proteins which function in multi-protein complexes associated with miRNAs15, 16. These protein complexes are guided by miRNAs to partially complementary sequences typically located in the 3’ UTR of their target mRNAs leading to the inhibition of its translation and/or its destabilization17-19. Based on this principle, highly specific monoclonal antibodies against Ago proteins enable the isolation of functional Ago-miRNA-mRNA complexes from different cell lines, tissues, or even patient samples. Thus enabling isolation and identification of miRNA target mRNAs, as has been done in immunoprecipitated human Ago protein complexes20.

Experimental confirmation of miRNA targets are usually performed after in silico target predictions (i.e. after using programs like PicTar) by employing an integrative miRNA- mRNA transcriptomics analysis approach. In such an approach, the down-regulated

14 General Introduction 1 targets of up-regulated miRNAs are deemed to be the likely true targets21-26. For example, in infectious disease related miRNA studies, human monocyte derived macrophages infected with Mycobacterium avium have been shown to concomitantly up-regulate miRNA let-7e and miR-29a and down-regulate their respective targets CASP3 and CASP726. These observations were further confirmed functionally using Gaussia luciferase reporter gene assays26. The miRNA-mRNA integrative approach has also been used in studying the development of immune-related cells. For example, studying the differentiation of dendritic cells (antigen-presenting cells of the immune system) from their monocyte precursors, Hashimi and colleagues observed that miR-21 and miR-34a down-regulated their target genes, JAG1 and WNT1, and the differentiation process was stalled when the miRNAs were inhibited or when their targets, WNT1 and JAG1 were exogenously added27. Thus making use of the observation that mammalian miRNAs reduce levels of their target mRNAs and proteins6 and affect developmental pathways. In another example, aimed at investigating the influence of miRNAs on mRNA transcript levels, Lim and colleagues discovered that delivering miR-124 into human cells causes the down-regulation of about 100 mRNA transcripts. Furthermore, the 3’ untranslated regions of these messages had a significant propensity to pair to the 5’ region of the miRNA, as expected if many of these mRNA messages are the direct targets of the miRNAs8.

1.3 Transcription factor binding sites Prior to the transcription of genes, proteins known as transcription factors bind in the promoter regions of genes to induce their expression. The recognition of transcription factor binding sites (TFBSs) in the promoter region of both protein-coding and non- protein coding genes is interesting because it can help us to understand the genomic architecture underlining gene expression under various circumstances. Furthermore, there is increasing evidence that transcription factors work in combination28. Under - standing the unique combination of transcription factors that are responsible for the expression of a group of genes under a given condition, such as during viral infection, can enable identification of genes that are likely to be involved in similar biological responses. Several techniques have been developed in bioinformatics to enable determination of TFBSs in promoter sequences of genes. These include algorithm like Clover29 that is used in searching for sites of well-characterized TFBS-motifs in a group of genes that are for example selected based on their similarity in response to a given environmental condition. Another algorithm is MEME30, which works by searching for repeated, ungapped sequence patterns that occur in a group of DNA sequences, and thus can lead to the discovery of novel TFBS motifs common to a group of promoter sequences. A number of bioinformatics algorithms have also been developed for genome-wide analysis of TFBSs based on data from high-throughput techniques like ChIP-Seq (chromatin immunoprecipitation coupled with massively parallel sequencing)31.

15 These include DREME32 and MEME-ChIP33 that are both used for motif-discovery in ChiP-Seq data analyses.

1.4 Proteomics Biology is too complex for information retrieved at a transcriptomics level to be wholly representative of the cellular conditions, especially because many biological signals are orchestrated at the level of proteins. Moreover, proteins are often modified after translation from their mRNA precursors. Examining cellular changes during host- pathogen interactions at the level of proteins is therefore important. We refer to the global repertoire of proteins of an organism or cell as its proteome. Proteomics techniques such as 2-D gel electrophoresis followed by mass spectrometry are used to generate high-throughput datasets of the proteins present in a cell. Using such approaches proteins that become more abundant or less abundant upon infection relative to non-infection, can be uncovered. Although important, proteomics data, in comparison to transcriptomics data, is relatively expensive to generate and also yields far lower number of identifiable proteins34 than the number of cognate genes that can be identified using transcriptomics techniques. As such, transcriptomics data are mostly used for exploratory genomics work to generate hypotheses that can be readily confirmed using detailed biochemical assays at the protein level. Nevertheless, both data types are important in understanding host-pathogen interactions. For example, protein microarrays are extensively used to identify pathogen proteins that bind to host proteins35. Recent examples include a genome-wide study of Pseudomonas aeruginosa outer membrane protein immunogenicity using self-assembling protein microarrays36. In that study, as much as 12 P. aeruginosa proteins were identified to trigger an adaptive immune response in the patients investigated; providing valuable information about which bacterial proteins are actually recognized by the host immune system.

Another example of immune-related proteomics work is the discovery, using 2-D gel electrophoresis and mass spectrometry, of two protein isoforms of PLUNC, a newly discovered human gene that is expressed in the upper respiratory tract and is suggested to be of importance in host defense against bacteria 37-39.

1.5 Protein-protein interaction networks Several biological processes are carried out by proteins interacting with each other, resulting in biomolecular pathways as those documented in the Kyoto Encyclopedia of Genes and Genomes (KEGG) 40. Nevertheless, characterization of all possible biological pathways for all biological contexts remains a daunting challenge. As such, the documentation of all known protein-protein interactions (PPIs) is important. Such a resource can in principle facilitate discovery of PPIs that could be functional in other contexts than the one in which the interaction was originally discovered. There are

16 General Introduction 1 already a number of such PPI resources, including the HPRD41 , BIND42, and BioGRID43 , amongst others.

In general, protein-protein interaction data are usually used in combination with other “omics” data (transcriptomics, proteomics, metabolomics), to unveil insights in molecular medicine that are not directly eminent from one type of data44, 45. In Figure 1, I depict the principle of combining PPI data with transcriptomics data to yield a PPI network that is potentially specific to the context of an infection.

Figure 1 - Generation of a potential context-specific protein-protein interaction network.

In the context of virus infection, for example, genes that are induced in expression in infected host-cells relative to non-infected cells are earmarked. A general protein- protein interaction (PPI) network is then queried for interacting proteins whose genes have thus been earmarked in the infection study to yield a context specific PPI network.

17 2. Immune response

In biology, immunity refers to the resistance of an organism against disease, including infectious diseases. This is usually conferred by immune defense proteins like interleukins and interferons. There is a new bioinformatics discipline called “immunomics”, which involves infection-related transcriptomics, genomics, proteomics and immunology data analyses46, 47. Immunomics started after the completion of the human genome project 10 years ago (Figure 2); and it already epitomizes advancement in vaccine research through the use of immunomics protein microarrays as a vaccine discovery tool48, 49. Examples of immunomics-related work include the immunomic analysis of the repertoire of T-cell specificities for influenza A virus in 50. Prior to the latter work, relatively few highly conserved epitopes of the virus were known to be relevant in humans. Yet, highly conserved epitopes recognized by effector T cells may represent an alternative approach for the generation of a more universal influenza virus vaccine. Starting from more than 4000 peptides that were selected from a panel of 23 influenza A virus strains, Assarsson and colleagues identified 54 nonredundant epitopes with high coverage and conservation in the majority of the virus strains analyzed. These were also consistently recognized in multiple infected individuals50. Their results provide a potential base for the development of a universal influenza vaccine. Similarly, immunomics approaches are being used to identify markers that can be employed for vaccine development to prevent infectious diseases like malaria51 and Schistosomiasis 48 amongst others. Nonetheless, several other works, this thesis included, that are related to immune response without specifically aiming at vaccine development, still fall in the realm of immune-related transcriptomics (Figure 2).

3. Outline of thesis

This thesis addresses cellular response signals post infection in innate-immune cells using genomics data. I shall give a brief outline of the rest of the chapters below.

3.1 Chapter 2 As introduced in section 1 above, signals in biological systems are often propagated through proteins interacting with each other. In Chapter 2 of this thesis, we examine the di erential protein responses of alveolar epithelial cells (A549 cell model) after infection with respiratory syncytial virus (RSV), parainuenza (PIV), human meta- pneumovirus (HMPV) and measles virus (MV). Using the 2-D gel approach, we show that RSV, HMPV, PIV3, and MV induced similar host responses and that the main differences were based on time and dosage of host gene response. In addition to writing the paper with the co-authors, I performed the molecular network analysis in

18 General Introduction 1

Figure 2 - Immunomics as a new sub-discipline in bioinformatics.

Over the past 10 years, high-throughput data analysis in infection biology is becoming popular, and paving the way to a new area of bioinformatics, immunomics, to focus on immunological genomics. The y-axis shows the number of articles in the US library of medicine (available at http://www.ncbi.nlm.nih.gov/pubmed) whose title or abstracts are associated with immunomics (query terms: immunome, or immunomics or immunomic); and transcriptomics&immunity (query terms: (transcriptome or transcriptomics or transcriptomic) and (immune or immunity)) in the last 10 years. (Queries performed at Pubmed on June 11, 2011) this chapter. By integrating and protein-protein interaction network data, I show that the proteins that become more abundant or less abundant after viral infections are signifi cantly enriched in apoptosis. I also provide PPI networks per virus-type to highlight the proteins involved. Apoptosis is likely an important innate- immune strategy to combat these viruses.

3.2 Chapter 3 The concept that proteins interact with each other in molecular pathways has stimulated the bioinformatics community to document functional protein-protein interaction (PPI) partnerships into general PPI resources available to the public. These resources are meant to facilitate the recapitulation of proteins that function together under various

19 circumstances. However, before proteins are available to interact with each other in various biological pathways, the cognate genes have to be expressed before the proteins are translated. The general method used to identify genes that are up- or downregulated in expression upon infection is by use of DNA microarray or RNA-seq. The ratio of expression of a gene in an infected cell to a non-infected cell of the same type is used to select genes that are up-regulated or down-regulated upon infection. Using this knowledge, a general approach used in system biology to discover context- specic protein-protein interaction networks is to overlay the gene response list with the general PPI network as demonstrated in Figure 1. Although this approach is interesting, general PPI networks are strongly biased towards well studied proteins e.g. proteins involved in the immune response. In Chapter 3 of this thesis, I examine genes induced in expression after infection of A549 cells with HMPV and other pathogens. I describe a novel method called Physical Interaction Enrichment which determines the significance of a context-specific PPI network derived from a general PPI network. I discovered that in spite of the enrichment of immune-related proteins and their protein interactions in general PPI networks, the PPI network of HMPV-induced genes have significantly more interactions among them than randomly selected proteins. In addition, I show that such networks can be used to predict genes that would be induced by HMPV at later time points of infection, thus helping to uncover genes downstream pathways activated upon HMPV infection.

3.3 Chapter 4 As most living things, microbial pathogens are in search for shelter and nutritional resources; often in detriment to their host. A well known micronutrient that is essential for the metabolism of both host-cells and microbes is iron. During infection, invading pathogens compete for iron and iron-related resources with the host cells. However, it is largely unknown how host innate-immune cells strategize their iron distribution to function as immune effectors, yet keeping the iron pools from being used by the invading pathogens. In Chapter 4 of this thesis, I describe a new method called Iron Squelch which I used to analyse the transcriptomic gene response to infection with respect to the response of the same genes to iron-overload or iron-depletion. Using this analytical method, I reveal the following: (1) The transcriptomic profiles of host innate-immune cells infected with HMPV, RSV, PIV, MV and Chlamydia pneumoniae (representing intracellular parasites), but not Pseudomonas aeruginosa and E. coli (representing extracellular pathogens), resembles that of iron-depleted cells. (2) The iron-depletion prole was also triggered by inactivated intracellular pathogens, suggesting that it is likely a basic cellular response to the pathogen material. (3) An iron-depletion profile was induced when host innate-immune cells were treated with interferon gamma, a common cytokine produced by the host cells upon intracellular pathogen infection. This first of all suggests that iron-depletion is a likely phenotype of

20 General Introduction 1 host-cells infected with intracellular pathogens. Secondly, iron-depletion in infected innate-immune cells could be interferon-mediated and could be an important innate immune strategy of the respiratory tract. I further propose that iron-homeostasis might be important in co-infections during which, depending upon the colonized host- subcellular compartment, microbes might trigger host-iron redistribution away or into their niches, leading to disease amelioration or severity.

3.4 Chapter 5 MicroRNAs are gene products that are not translated into proteins but may stall protein translation by binding to the cognate mRNAs of proteins or degrade the mRNAs. In Chapter 5 of this thesis, we identify microRNAs that are over-expressed in dendritic cells during their maturation from monocytes. The pathogenic material used in this maturation process was LPS which serves in the context of this thesis as a model to represent bacteria. Dendritic cells are antigen presenting cells that link innate immunity with adaptive immunity. In addition to identifying microRNAs and their target genes that could be important in DC di erentiation from their monocyte precursors, I also identified binding sites for transcription factors that potentially regulate the expression of the identified microRNAs. The results and data presented would serve as an important resource to help understand the biology of these important antigen-presenting cells.

3.5 Chapter 6 To combat bacterial infection, neutrophils are the immune cells of choice. In Chapter 6 of this thesis, we examine LPS-induced changes in neutrophils extracted from whole blood of 4 healthy LPS-infused volunteers. In addition to writing the manuscript with the co-authors, I analyzed the post normalized microarray data set. I report genes that follow various temporal kinetic patterns. As an example, I reveal CD177, a neutrophil specific antigen, as the most persistently up-regulated gene post LPS infusion. I performed Gene Ontology analyses on the gene sets and discovered that genes that were up-regulated at all 3 time points investigated are mainly enriched in “inflammatory response”, “anti-apoptosis” and “defense response”. Meanwhile, genes that were down-regulated at all time points after LPS infusion were enriched in “lymphocyte activation”. These observations correlate with the clinical parameters observed in the infected volunteers in whom neutrophilia and lymphocytopenia were observed after LPS infusion. I also applied the PIE approach to the context-specific networks derived from integrating HPRD data with the LPS-induced up- or down-regulated gene sets and found them to be physically cohesive. In addition, I provide PPI networks to show the temporal interactions between proteins of genes that are up-regulated or down- regulated post LPS infusion. We further provide and discuss lists of prioritized genes as a resource to help understand temporal changes induced by LPS upon neutrophils and likely molecular events underpinning immune responses to bacterial endotoxins.

21 3.6 Chapter 7 I round up in Chapter 7 with a summary discussion of the thesis, a brief discussion of events at the interface between host and pathogens, and future directions of immunological genomics.

22 General Introduction 1

References

1. Pennisi, E. Human genome. Finally, the book of life and instructions for navigating it. Science 288, 2304-7 (2000). 2. Venter, J.C. et al. The sequence of the human genome. Science 291, 1304-51 (2001). 3. Bao, X. et al. Identification of human metapneumovirus-induced gene networks in airway epithelial cells by microarray analysis. Virology 374, 114-27 (2008). 4. Bao, X. et al. Airway epithelial cell response to human metapneumovirus infection. Virology 368, 91-101 (2007). 5. Schuurhof, A. et al. Gene expression differences in lungs of mice during secondary immune responses to respiratory syncytial virus infection. J Virol 84, 9584-94 (2010). 6. Guo, H., Ingolia, N.T., Weissman, J.S. & Bartel, D.P. Mammalian microRNAs predominantly act to decrease target mRNA levels. Nature 466, 835-40 (2010). 7. Baek, D. et al. The impact of microRNAs on protein output. Nature 455, 64-71 (2008). 8. Lim, L.P. et al. Microarray analysis shows that some microRNAs downregulate large numbers of target mRNAs. Nature 433, 769-73 (2005). 9. Cullen, B.R. Viruses and microRNAs. Nat Genet 38 Suppl, S25-30 (2006). 10. Pauley, K.M. & Chan, E.K. MicroRNAs and their emerging roles in immunology. Ann N Y Acad Sci 1143, 226-39 (2008). 11. Jin, P. et al. Molecular signatures of maturing dendritic cells: implications for testing the quality of dendritic cell therapies. J Transl Med 8, 4 (2010). 12. Kuipers, H., Schnorfeil, F.M. & Brocker, T. Differentially expressed microRNAs regulate plasmacytoid vs. conventional dendritic cell development. Mol Immunol 48, 333-40 (2010). 13. Krek, A. et al. Combinatorial microRNA target predictions. Nat Genet 37, 495-500 (2005). 14. Friedman, R.C., Farh, K.K., Burge, C.B. & Bartel, D.P. Most mammalian mRNAs are conserved targets of microRNAs. Genome Res 19, 92-105 (2009). 15. Hammond, S.M., Bernstein, E., Beach, D. & Hannon, G.J. An RNA-directed nuclease mediates post-transcriptio- nal gene silencing in Drosophila cells. Nature 404, 293-6 (2000). 16. Beitzinger, M., Peters, L., Zhu, J.Y., Kremmer, E. & Meister, G. Identification of human microRNA targets from isolated argonaute protein complexes. RNA Biol 4, 76-84 (2007). 17. Ender, C. & Meister, G. Argonaute proteins at a glance. J Cell Sci 123, 1819-23 (2010). 18. Winter, J. & Diederichs, S. Argonaute proteins regulate microRNA stability: Increased microRNA abundance by Argonaute proteins is due to microRNA stabilization. RNA Biol 8 (2011). 19. Pare, J.M., Lopez-Orozco, J. & Hobman, T.C. MicroRNA-binding is required for recruitment of human Argonaute 2 to stress granules and P-bodies. Biochem Biophys Res Commun 414, 259-64 (2011). 20. Beitzinger, M. & Meister, G. Experimental identification of microRNA targets by immunoprecipitation of Argonaute protein complexes. Methods Mol Biol 732, 153-67 (2011). 21. Havelange, V. et al. Functional implications of microRNAs in acute myeloid leukemia by integrating microRNA and messenger RNA expression profiling. Cancer (2011). 22. Cho, S. et al. miRGator v2.0: an integrated system for functional investigation of microRNAs. Nucleic Acids Res 39, D158-62 (2011). 23. Genovesi, L.A., Carter, K.W., Gottardo, N.G., Giles, K.M. & Dallas, P.B. Integrated Analysis of miRNA and mRNA Expression in Childhood Medulloblastoma Compared with Neural Stem Cells. PLoS One 6, e23935 (2011). 24. Satoh, J.I. Molecular network of microRNA targets in Alzheimer’s disease brains. Exp Neurol (2011). 25. Enerly, E. et al. miRNA-mRNA integrated analysis reveals roles for miRNAs in primary breast tumors. PLoS One 6, e16915 (2011). 26. Sharbati, J. et al. Integrated microRNA-mRNA-analysis of human monocyte derived macrophages upon Mycobacterium avium subsp. hominissuis infection. PLoS One 6, e20258 (2011). 27. Hashimi, S.T. et al. MicroRNA profiling identifies miR-34a and miR-21 and their target genes JAG1 and WNT1 in the coordinate regulation of dendritic cell differentiation. Blood 114, 404-14 (2009). 28. van Noort, V. & Huynen, M.A. Combinatorial gene regulation in Plasmodium falciparum. Trends Genet 22, 73-8 (2006).

23 29. Frith, M.C., Hansen, U. & Weng, Z. Detection of cis-element clusters in higher eukaryotic DNA. Bioinformatics 17, 878-89 (2001). 30. Bailey, T.L., Williams, N., Misleh, C. & Li, W.W. MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res 34, W369-73 (2006). 31. Robertson, G. et al. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat Methods 4, 651-7 (2007). 32. Bailey, T.L. DREME: motif discovery in transcription factor ChIP-seq data. Bioinformatics 27, 1653-9 (2011). 33. Machanick, P. & Bailey, T.L. MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics 27, 1696-7 (2011). 34. Hegde, P.S., White, I.R. & Debouck, C. Interplay of transcriptomics and proteomics. Curr Opin Biotechnol 14, 647-51 (2003). 35. Natesan, M. & Ulrich, R.G. Protein microarrays and biomarkers of infectious disease. Int J Mol Sci 11, 5165-83 (2010). 36. Montor, W.R. et al. Genome-wide study of Pseudomonas aeruginosa outer membrane protein immunogenicity using self-assembling protein microarrays. Infect Immun 77, 4877-86 (2009). 37. Ghafouri, B., Kihlstrom, E., Stahlbom, B., Tagesson, C. & Lindahl, M. PLUNC (palate, lung and nasal epithelial clone) proteins in human nasal lavage fluid. Biochem Soc Trans 31, 810-4 (2003). 38. Casado, B., Pannell, L.K., Iadarola, P. & Baraniuk, J.N. Identification of human nasal mucous proteins using proteomics. Proteomics 5, 2949-59 (2005). 39. Lukinskiene, L. et al. Antimicrobial Activity of PLUNC Protects against Pseudomonas aeruginosa Infection. J Immunol 187, 382-90 (2011). 40. Ogata, H. et al. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res 27, 29-34 (1999). 41. Peri, S. et al. Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res 13, 2363-71 (2003). 42. Bader, G.D. & Hogue, C.W. BIND--a data specification for storing and describing biomolecular interactions, molecular complexes and pathways. Bioinformatics 16, 465-77 (2000). 43. Stark, C. et al. BioGRID: a general repository for interaction datasets. Nucleic Acids Res 34, D535-9 (2006). 44. Perco, P. et al. Linking transcriptomic and proteomic data on the level of protein interaction networks. Electro- phoresis 31, 1780-9 (2010). 45. Silvestri, E. et al. Studies of complex biological systems with applications to molecular medicine: the need to integrate transcriptomic and proteomic approaches. J Biomed Biotechnol 2011, 810242 (2011). 46. Maecker, B., von, B.-B., Anderson, K.S., Vonderheide, R.H. & Schultze, J.L. Linking genomics to immunotherapy by reverse immunology--’immunomics’ in the new millennium. Curr Mol Med 1, 609-19 (2001). 47. Klysik, J. Concept of immunomics: a new frontier in the battle for gene function? Acta Biotheor 49, 191-202 (2001). 48. Driguez, P., Doolan, D.L., Loukas, A., Felgner, P.L. & McManus, D.P. Schistosomiasis vaccine discovery using immunomics. Parasit Vectors 3, 4 (2010). 49. Prechl, J., Papp, K. & Erdei, A. Antigen microarrays: descriptive chemistry or functional immunomics? Trends Immunol 31, 133-7 (2010). 50. Assarsson, E. et al. Immunomic analysis of the repertoire of T-cell specificities for influenza A virus in humans. J Virol 82, 12241-51 (2008). 51. Doolan, D.L. Plasmodium immunomics. Int J Parasitol 41, 3-20 (2011).

24 General Introduction 1

25

2

Quantitative proteome pro ling of respiratory virus-infected lung epithelial cells

Angela van Diepena, H. Kim Branda, Iziah Samab, Lambert H.J. Lambooya,c , Lambert P. van den Heuvelc, Leontine van der Welld, Martijn Huynenb, Albert D.M.E. Osterhausd, Arno C. Andewegd, Peter W.M. Hermansa

a Laboratory of Pediatric Infectious Diseases, Radboud University Nijmegen Medical Centre, Nijmegen, b Center for Molecular and Biomolecular Informatics and Nijmegen Center for Molecular Life Sciences, Radboud University Nijmegen Medical Centre, Nijmegen, c Laboratory of Pediatrics and Neurology, Radboud University Nijmegen Medical Centre, Nijmegen, dInstitute of Virology, Erasmus MC, Rotterdam, The Netherlands

J Proteomics. 2010 Aug 5; 73(9):1680-93 Abstract

Respiratory virus infections are among the primary causes of morbidity and mortality in humans. Influenza virus, respiratory syncytial virus (RSV), parainfluenza (PIV) and human metapneumovirus (hMPV) are major causes of respiratory illness in humans. Especially young children and the elderly are susceptible to infections with these viruses. In this study we aim to gain detailed insight into the molecular pathogenesis of respiratory virus infections by studying the protein expression profiles of infected lung epithelial cells. A549 cells were exposed to a set of respiratory viruses [RSV, hMPV, PIV and Measles virus (MV)] using both live and UV-inactivated virus preparations. Cells were harvested at different time points after infection and processed for proteomics analysis by 2-dimensional difference gel electrophoresis. Samples derived from infected cells were compared to mock-infected cells to identify proteins that are differentially expressed due to infection. We show that RSV, hMPV, PIV3, and MV induced similar core host responses and that mainly proteins involved in defense against ER stress and apoptosis were affected which points towards an induction of apoptosis upon infection. By 2-D DIGE analyses we have gathered information on the induction of apoptosis by respiratory viruses in A549 cells.

28 Quantitative proteome profiling of respiratory virus-infected lung epithelial cells

1. Introduction

Respiratory viruses are a major cause of lower respiratory tract infections and are among the primary causes of morbidity and mortality in humans. These infections are a major public health problem. Especially infants and young children as well as the elderly are prone to severe and even fatal outcome of infections with respiratory viruses such as 2 Influenza virus, RSV, PIV, and human hMPV [1–5]. In young children and infants, RSV is the primary etiologic agent of epidemic lower respiratory tract infections being responsible for 80% of cases of acute bronchiolitis [6, 7]. For RSV extensive vaccine-related research has been performed, but so far no efficient and widely applicable vaccine has been developed yet. Also for the recently discovered hMPV [8], which is closely related to RSV, no vaccine is currently available. Only for MV an effective vaccine is available and for classical Influenza vaccines are available but these need further improvement.

RSV, hMPV, PIV and MV belong to the Paramyxoviridae family of viruses. These are all enveloped viruses and have a linear, non-segmented, single stranded negative sense RNA of ~15,000 nucleotides containing 10 genes encoding 10 (RSV) proteins, 13kb containing 8 genes (hMPV), ~15,500 nucleotides containing 6 genes (PIV), and 15.9kb containing 6 genes (MV), respectively [9]. All these viruses enter the host via the respiratory tract by infecting airway epithelial cells that line the nose as well as the large and small airways and induce inflammation associated with the disease at the site of infection with the exception of MV that disseminates beyond the respiratory tract [9].

Although respiratory cells are the first targets of respiratory virus infection, they are also the first line of defense in the innate immune response that is generated upon infection [10]. Epithelial cells form a direct physical barrier between the host and the environment, the epithelial cells are able to detect invading pathogens and actively participate in the innate immune response to invading viruses e.g. by producing acute phase proteins and inflammatory cytokines and chemokines [10–12]. Depending on the location in the lung, there are different types of epithelial cells that respond to invading pathogens in a cell type specific manner due to varying expression of pattern recognition receptors, cell-specific protein expression, or to differences in susceptibility to injury. These responses can result in killing of the pathogen by recruitment of phagocytes that can not only directly kill the invading pathogens but also shape the adaptive immune response to viral infection.

However, little is known about the exact mechanisms underlying the induction and maturation stages of innate and adaptive anti-viral immune responses. Also the interaction between the virus and the host at the molecular level is not well understood. A better understanding of these events is required for the development of new

29 strategies for treatment and prevention of these infectious diseases. In this study we describe the effect of respiratory virus infection (RSV, hMPV, PIV and MV) on protein expression levels in type II alveolar A549 human epithelial cells by 2-D DIGE analyses at different time points after infection to gain detailed insight into the interaction between the viruses and the host cells and to explore differences in induced protein expression changes between different respiratory viruses. A549 cells are frequently used for in vitro infection with RSV and all four viruses are capable of infecting these cells making it a suitable in vitro model for comparing protein expression profiles between RSV, hMPV, PIV, and MV [12–17]. We show that a similar core host response was induced by the different viruses that were studied, and that RSV, hMPV, PIV3, and MV mainly affect proteins involved in defense against ER stress and apoptosis which points towards an induction of apoptosis upon infection.

2. Materials and Methods

2.1 Chemicals and reagents Fluorescent minimal labeling CyDyes and IPG buffer pH3–10 were purchased from GE Healthcare (Roosendaal, The Netherlands). Urea, thiourea, Tris, iodoacetamide, trifluoro- acetic acid (TFA), phosphoric acid, acetone, glycerol, acetonitrile, ammonium bicarbonate, CHAPS, L-lysine, and porcine modified trypsin were all from Merck (Amsterdam, The Netherlands), DTT from Sigma-Aldrich (Zwijndrecht, The Netherlands), DHS from Fluka, Trizol from Invitrogen (Breda, The Netherlands), TEMED, low melting temperature agarose (LMT), ammoniumpersulphate and Coomassie Brilliant Blue R-250 from BHD (Amsterdam, The Netherlands), and acrylamide from Nation Diagnostics (Hessle, UK).

2.2 Cells and viruses The human lung adenocarcinoma epithelial cell line A549 and (American Type Culture Collection (ATCC; CCL-185), Manassas, VA, USA) and the Vero-118 cell line (a gift from B.G. van den Hoogen) were cultured at 37 °C with 5% CO2 in respectively HAM F12 medium (GIBCO) supplemented with 3% heat-inactivated fetal calf serum (FCS, Hyclone) and Iscove’s Modified Dulbecco’s Medium (IMDM, GIBCO) supplemented with either 0.02% trypsin and 3% bovine albumin fraction Vmedium (for hMPV infection) or 10% FCS (for RSV, PIV, and MV infections).All media were also supplemented with penicillin (100 U/ml; BioWhittaker), streptomycin (100 μg/ml; BioWhittaker) and L-glutamin (2mM; BioWhittaker). High titer stocks from hMPV(NL/1/00, a gift from B.G. van den Hoogen), RSV-A2 (ATCC; VR1540), PIV3 (ATCC; VR-93), and MV-Edm (ATCC; VR-24) were grown on Vero-118 cells using a serial limiting dilution protocol according to Gupta et al. [18] and used to infect A549 cells. From all virus stocks inactivated virus preparations were produced by UV-irradiation using an Uvitec ultraviolet transilluminator (312 nmwavelength).

30 Quantitative proteome profiling of respiratory virus-infected lung epithelial cells

2.3 In vitro infection experiments A549 cells were seeded at a density of 3×105 cells/well in a 6-well plate and were allowed to adhere overnight. Live or UV-inactivated virus preparations (same dilution from the original high tittered stocks) were added to the cells at a multiplicity of infection of 3 to 10. At 60 min after infection the virus-containing serum free medium was removed and replaced by a fresh HAM F12 medium (containing 3% FCS) and cells 2 were incubated at 37 °C. For the mock-infected cells a medium containing no virus was added to the cells. At 6, 12, and 24 h after infection the cells were washed once with pre-warmed PBS and the cells were lysed in 1 ml Trizol and stored at −80 °C. These infection experiments were performed in duplicate on different days for analysis of biological replicates.

2.4 Protein isolation Proteins were isolated from the remaining interphase and organic phase that remain after RNA extraction from Trizol samples. Four volumes of ice-cold (−20 °C) acetone were added to these fractions and incubated at −20 °C for 1 h. Precipitated proteins were then centrifuged at maximum speed for 5min. The protein pellet was washed twice with ice-cold 80% acetone. The pellet was air-dried and suspended in milliQ. Protein concentration in each sample was determined using a 96-well plate based BCA assay (Pierce) according to the manufacturer’s instructions.

2.5 Protein labeling A total amount of 25 μg protein was taken from each sample and dried in a vacuum concentrator. The protein pellets were dissolved in 10 μl lysis buffer (30 mM Tris, 7 M urea, 1 M thiourea, and 4% CHAPS) and were snap frozen five times to completely dissolve the protein pellet. Per 25 μg protein 200 pmol fluorescent minimal labeling CyDye was added according to the scheme depicted in Table 1 and incubated on ice for 30 min in the dark. 1 μl 10mM L-lysine was added to each sample and was incubated for 15 min in the dark to stop the labeling reaction. Labeled samples were stored at −20 ° until use.

2.6 2-D gel electrophoresis Samples to be run on the same gel were combined (Table 1) and used for rehydration of 18 cm Immobiline Dry Strips pH4–7 (GE Healthcare). A volume of 310 μl rehydration buffer (7 M urea, 2 M thiourea, 4% CHAPS, 1.2% IPG buffer pH3–10, and 2 mg/ml DTT) was added to the combined samples and applied to the strips using a rehydration tray. After an overnight rehydration the proteins were separated in the first dimension using a Multiphor II (3 h 300 V followed by a gradual increase by 300 V every 20 min until a final voltage of 2200 was reached). After complete focusing (±30 kVh), the strips were equilibrated in 0.5% DTT in equilibration buffer (100 mM Tris pH8.8, 6 M urea, 30%

31 glycerol, and 2% SDS) for 10 min followed by 4.5% iodoacetamide in equilibration buffer for 10 min. The strips were then applied to a 12–20% gradient polyacrylamide gel that had been made the day before using low-fluorescent glass plates. The strips were fixed using a 1% LMT solution and the proteins were directly separated overnight in the second dimension using an Ettan Dalt Six (GE Healthcare) at 10 °C and 2W per gel.

Table 1 - Experimental setup for 2-D DIGE experimentsa.

Gel number Cy3 Cy5 Cy2 1–2 Mock t=0 Mock t=6 A549 mock 3–4 Mock t=12 Mock t=24 A549 mock 5–6 RSV+UV t=6 RSV−UV t=6 A549 mock 7–8 RSV+UV t=12 RSV−UV t=12 A549 mock 9–10 RSV+UV t=24 RSV−UV t=24 A549 mock 11–12 hMPV+UV t=6 hMPV−UV t=6 A549 mock 13–14 hMPV+UV t=12 hMPV−UV t=12 A549 mock 15–16 hMPV+UV t=24 hMPV−UV t=24 A549 mock 17–18 PIV+UV t=6 PIV−UV t=6 A549 mock 19–20 PIV+UV t=12 PIV−UV t=12 A549 mock 21–22 PIV+UV t=24 PIV−UV t=24 A549 mock 23–24 MV+UV t=6 MV−UV t=6 A549 mock 25–26 MV+UV t=12 MV−UV t=12 A549 mock 27–28 MV+UV t=24 MV−UV t=24 A549 mock a 50 μg protein from each sample was labeled with fluorescent CyDyes as indicated.

2.7 Scanning and image analysis Gels were scanned on a Typhoon 9410 (Amersham Biosciences) at excitation wavelengths of 488, 532, and 633 nm and 520BP40, 580BP30, and 670BP30 emission filters Cy2, Cy3, and Cy5, respectively. For the comparative analyses per time point only those protein spots were included that were present in all of the gels to be compared for a certain virus or time point. Statistical analysis and quantitative protein expression were done using Decyder Analysis BVA software (Amersham Biosciences) and protein spots were considered significantly differential at a p-value of <0.01 and a fold change of at least 2.5.

2.8 Mass spectrometry and protein identification For identification of protein spots a preparative gel containing 500 μg protein was run as described in section 2.6 and stained with Coomassie Brilliant Blue R-250. Protein spots

32 Quantitative proteome profiling of respiratory virus-infected lung epithelial cells

of interest were manually excised from the gel and digested with trypsin. Peptides were extracted using 2% TFA and were desalted and concentrated using C18 StageTips.

Peptide mixtures were purified and desalted using C18 StageTips. Peptide separation and sequence determination were performed with a nano-high performance liquid chromatography system (Agilent 1100 series, Amstelveen, The Netherlands) connected to a 7-T linear quadrupole ion trap-ion cyclotron resonance Fourier transform mass 2 spectrometer (Thermo Electron, Breda, The Netherlands). Peptides were separated on a 15-cm 100-μm inner-diameter PicoTip emitter for online electrospray (New Objective,

Woburn, MA) packed with 3 μm C18 beads (Reprosil, Dr. Maisch GmbH, Ammerbuch- Entringen, Germany) with a 60-minute linear gradient from 2.4 to 40% acetonitrile in 0.5% acetic acid at a 300 nl/min flow rate. The four most abundant ions were sequentially isolated and fragmented in the linear ion trap by applying collisionally induced dissociation. Proteins were identified using the MASCOT search engine (Matrix science, London, UK) against the human protein NCBI database using the following search criteria: 20 ppm peptide tolerance, a maximum of 2 missed cleavages, a fixed carbami- domethyl modification of cysteines, and variable oxidation (M) and deamidation (NQ) modification.

2.9 Molecular network analyses The global protein–protein interaction (PPI) network used (herein referred to as galaxy 6) was built from an accumulation of human-curated PPIs obtained from the Biomolecular Interaction Network Database (BIND) (data downloaded in October 2006), the Human Protein Reference Database (HPRD) (data of release 6th of January 2007), the IntAct database (downloaded in May 2007), the Molecular Interactions Database (MINT) (downloaded in May 2007), and the PDZBase database (downloaded in May 2007). All PPI networks were drawn using Cytoscape. The Gene Ontology (GO) database used was downloaded from NCBI in February 2008. Gene enrichment analyses were done using BINGO [19]. Protein enrichment analyses were done using in-house python scripts.

3. Results and Discussion

3.1 2-D DIGE analyses To study the epithelial responses upon respiratory virus encounter and infection, A549 cells were exposed to different respiratory viruses (RSV, hMPV, PIV and MV). Both live and UV-inactivated virus preparations were used thus allowing distinction between live, replicating, virus-induced changes in protein expression and changes induced upon contact with or uptake of non-replicating virus particles. At different time points after infection, the cells were lysed and used for protein expression analysis by 2-D DIGE and quantitative and comparative DeCyder analyses. Approximately 1000 spots were

33 detected in the individual gels. Of these, 325 were present in at least 20 out of 28 gels and could be used for statistical and comparative analyses. For the comparative analyses per time point only those protein spots were included that were present in all of the gels to be compared for a certain virus or time point. Upon infection with either one of the respiratory viruses or mock, a total of 70 spots showed differential expression in at least one of the time points after infection (Fig. 1A). Differentially expressed spots were manually excised from a preparative gel and were analyzed by nano-LC FT-ICR-MS to identify the proteins. Of the 70 spots that were picked for analysis we were able to identify the corresponding protein in 63 samples. This resulted in the positive identification of 55 unique proteins (Table 2). These proteins were shown to be mostly involved in cellular structures, stress responses, protein biosynthesis and modification, transcription, and trafficking (Fig. 1B).

34 Quantitative proteome profiling of respiratory virus-infected lung epithelial cells

2

Figure 1 - Preparative gel image (A) and functional classification (B) of protein spots that showed differential expression upon respiratory virus infection of A549 cells compared to mock at 6, 12 or 24 h after infection.

35 Table 2 - Di erentially expressed spots.

Estimated FT-MS analysis RSVa,b hMPVa,b PIVa,b MVa,b Spot Mass (Da) pI Protein ID Protein Gene # Pept. Seq. Protein 6 12 24 6 12 24 6 12 24 6 12 24 nr. description cov. score (%) Cytoskeleton/structural proteins 311 125,000 5.3 O43707 Alpha-actinin 4 ACTN4 26 34.9 1021 −2.65,−3.25 323 150,500 6.48 P18206 Vinculin VCL 6 5.6 187 −4.39,−2.54 390 115,000 4.53 O43707 Alpha-actinin 4 ACTN4 12 12.5 431 −2.84,– 396 110,000 4.35 O43707 Alpha-actinin 4 ACTN4 13 14.4 410 −2.81,– –,−6.52 577 95,000 6.2 P02545 Lamin A/C LMNA 20 32.4 828 –,8.46 (70 kDa lamin) 578 97,500 6.6 P35241 Radixin RDX 48 75.8 1545 –,11.12 –,−2.14 2.82,– 3.82,– 615 97,500 6.8 P02545 Lamin A/C LMNA 33 51.2 1143 2.41,– (70 kDa lamin) 629 80,000 5.25 P20700 Lamin B1 LMNB1 37 55.5 1313 −4.22,−6.17 841 52,000 5 Q13509 Tubulin beta-3 chain TUBB3 16 40.9 767 3.08,2.38 844 53,000 4.85 P07437 Tubulin beta-2 chain TUBB 5 14.6 241 4.69,4.70 Cell cycle 129 180,000 5.1 Q14978 Nucleolar phosphoprotein NOLC1 3 3.6 95 −3.75,– p130 Vesicular trafficking/transport 182 190,000 5.58 Q15075 Early endosome antigen 1 EEA1 9 5.2 362 −3.69,– 187 190,000 5.65 Q15075 Early endosome antigen 1 EEA1 2 1.3 72 −5.22,– 2.17,– 423 125,000 6.6 Q05193 Dynamin-1 DNM1 28 28.6 943 −2.04,– 3.54,– 674 75,000 5.6 Q96HE7 ERO1-like protein alpha ERO1L 11 26.1 378 −7.68,– (precursor) 689 77,500 5.08 Q9UNF0 Protein kinase C and PACSIN2 12 23 413 –,−3.78 casein kinase substrate in neurons protein 2 Mitochondrion 522 95,000 5.55 P28331 NADH-ubiquinone NDUFS1 16 27.6 753 7.06,– oxidoreductase 75kDa subunit 1157 41,000 5.85 Q9HB07 MYG1 protein C12ORF10 4 12.2 165 2.92,2.50 Transcription/RNA processing 197 175,000 5.83 Q00839 Heterogeneous HNRPU 1 1 36 –,9.11 −2.08,– ribonucleoprotein U 253 150,000 4.88 Q15029 116 kDa U5 small EFTUD2 5 5.5 188 −2.76,– ribonucleoprotein component

36 Quantitative proteome profiling of respiratory virus-infected lung epithelial cells

Table 2 - Di erentially expressed spots.

Estimated FT-MS analysis RSVa,b hMPVa,b PIVa,b MVa,b Spot Mass (Da) pI Protein ID Protein Gene # Pept. Seq. Protein 6 12 24 6 12 24 6 12 24 6 12 24 nr. description cov. score (%) Cytoskeleton/structural proteins 2 311 125,000 5.3 O43707 Alpha-actinin 4 ACTN4 26 34.9 1021 −2.65,−3.25 323 150,500 6.48 P18206 Vinculin VCL 6 5.6 187 −4.39,−2.54 390 115,000 4.53 O43707 Alpha-actinin 4 ACTN4 12 12.5 431 −2.84,– 396 110,000 4.35 O43707 Alpha-actinin 4 ACTN4 13 14.4 410 −2.81,– –,−6.52 577 95,000 6.2 P02545 Lamin A/C LMNA 20 32.4 828 –,8.46 (70 kDa lamin) 578 97,500 6.6 P35241 Radixin RDX 48 75.8 1545 –,11.12 –,−2.14 2.82,– 3.82,– 615 97,500 6.8 P02545 Lamin A/C LMNA 33 51.2 1143 2.41,– (70 kDa lamin) 629 80,000 5.25 P20700 Lamin B1 LMNB1 37 55.5 1313 −4.22,−6.17 841 52,000 5 Q13509 Tubulin beta-3 chain TUBB3 16 40.9 767 3.08,2.38 844 53,000 4.85 P07437 Tubulin beta-2 chain TUBB 5 14.6 241 4.69,4.70 Cell cycle 129 180,000 5.1 Q14978 Nucleolar phosphoprotein NOLC1 3 3.6 95 −3.75,– p130 Vesicular trafficking/transport 182 190,000 5.58 Q15075 Early endosome antigen 1 EEA1 9 5.2 362 −3.69,– 187 190,000 5.65 Q15075 Early endosome antigen 1 EEA1 2 1.3 72 −5.22,– 2.17,– 423 125,000 6.6 Q05193 Dynamin-1 DNM1 28 28.6 943 −2.04,– 3.54,– 674 75,000 5.6 Q96HE7 ERO1-like protein alpha ERO1L 11 26.1 378 −7.68,– (precursor) 689 77,500 5.08 Q9UNF0 Protein kinase C and PACSIN2 12 23 413 –,−3.78 casein kinase substrate in neurons protein 2 Mitochondrion 522 95,000 5.55 P28331 NADH-ubiquinone NDUFS1 16 27.6 753 7.06,– oxidoreductase 75kDa subunit 1157 41,000 5.85 Q9HB07 MYG1 protein C12ORF10 4 12.2 165 2.92,2.50 Transcription/RNA processing 197 175,000 5.83 Q00839 Heterogeneous HNRPU 1 1 36 –,9.11 −2.08,– ribonucleoprotein U 253 150,000 4.88 Q15029 116 kDa U5 small EFTUD2 5 5.5 188 −2.76,– ribonucleoprotein component

37 Table 2 - Continued.

Estimated FT-MS analysis RSVa,b hMPVa,b PIVa,b MVa,b Spot Mass (Da) pI Protein ID Protein Gene # Pept. Seq. Protein 6 12 24 6 12 24 6 12 24 6 12 24 nr. description cov. score (%) 311 125,000 5.3 P19338 Nucleolin NCL 21 24.1 923 −2.65,−3.25 566 100,000 6.83 Q8TCS8 Polyribonucleotide PNPT1 43 54 1336 10.85,13.72 nucleotidyltrans-ferase 1, 745 70,000 5.13 P61978 Heterogeneous HNRPK 10 23.8 412 3.71,– ribonucleoprotein K DNA repair and maintenance 482 105,000 5.5 P49959 Double-strand break MRE11A 5 6.8 219 5.04,5.41 7.06,– repair protein MRE11A 486 105,000 5.63 P13010 ATP-dependent DNA XRCC5 6 8.7 258 3.29,3.70 helicase II, 80 kDa subunit 525 105,000 5.75 P13010 ATP-dependent DNA XRCC5 27 51.6 1061 2.52,2.44 helicase II, 80 kDa subunit 799 60,000 4.85 P54727 UV excision repair protein RAD23B 14 34 436 4.33,– RAD23 homolog B Protein biosynthesis and modification 425 130,000 6.98 P13639 Elongation factor 2 EEF2 44 46.6 1646 −2.62,– 428 130,000 6.8 P13639 Elongation factor 2 EEF2 22 25.3 785 −4.37,−4.24 576 80,000 4.08 P41250 Glycyl-tRNA synthetase GARS 2 2.8 86 –,−2.99 577 95,000 6.2 P41250 Glycyl-tRNA synthetase GARS 27 34.8 885 –,8.46 707 63,000 4.58 P07237 Protein disulfide- P4HB 2 4.1 85 –,−2.50 2.13,– 2.55,– isomerase precursor 1101 44,500 6 Q9Y570 Protein phosphatase PPME1 9 26.7 345 –,2.90 methylesterase 1 1497 28,500 6.48 O15305 Phosphomannomutase 2 PMM2 13 49.2 442 −2.71,– Protein degradation 311 125,000 5.3 P55072 Transitional endoplasmic VCP 20 27.9 710 −2.65,−3.25 reticulum ATPase 311 125,000 5.3 Q9UBT2 SUMO-activating enzyme SAE2 14 18.4 476 −2.65,−3.25 subunit 2 692 75,000 5.65 Q9H4A4 Aminopeptidase B RNPEP 19 36.9 689 −3.03,– 1566 24,500 5.3 P61086 Ubiquitin-conjugating HIP2 6 31.5 187 5.19,– enzyme E2-25 kDa

38 Quantitative proteome profiling of respiratory virus-infected lung epithelial cells

Table 2 - Continued.

Estimated FT-MS analysis RSVa,b hMPVa,b PIVa,b MVa,b Spot Mass (Da) pI Protein ID Protein Gene # Pept. Seq. Protein 6 12 24 6 12 24 6 12 24 6 12 24 nr. description cov. score (%) 2 311 125,000 5.3 P19338 Nucleolin NCL 21 24.1 923 −2.65,−3.25 566 100,000 6.83 Q8TCS8 Polyribonucleotide PNPT1 43 54 1336 10.85,13.72 nucleotidyltrans-ferase 1, 745 70,000 5.13 P61978 Heterogeneous HNRPK 10 23.8 412 3.71,– ribonucleoprotein K DNA repair and maintenance 482 105,000 5.5 P49959 Double-strand break MRE11A 5 6.8 219 5.04,5.41 7.06,– repair protein MRE11A 486 105,000 5.63 P13010 ATP-dependent DNA XRCC5 6 8.7 258 3.29,3.70 helicase II, 80 kDa subunit 525 105,000 5.75 P13010 ATP-dependent DNA XRCC5 27 51.6 1061 2.52,2.44 helicase II, 80 kDa subunit 799 60,000 4.85 P54727 UV excision repair protein RAD23B 14 34 436 4.33,– RAD23 homolog B Protein biosynthesis and modification 425 130,000 6.98 P13639 Elongation factor 2 EEF2 44 46.6 1646 −2.62,– 428 130,000 6.8 P13639 Elongation factor 2 EEF2 22 25.3 785 −4.37,−4.24 576 80,000 4.08 P41250 Glycyl-tRNA synthetase GARS 2 2.8 86 –,−2.99 577 95,000 6.2 P41250 Glycyl-tRNA synthetase GARS 27 34.8 885 –,8.46 707 63,000 4.58 P07237 Protein disulfide- P4HB 2 4.1 85 –,−2.50 2.13,– 2.55,– isomerase precursor 1101 44,500 6 Q9Y570 Protein phosphatase PPME1 9 26.7 345 –,2.90 methylesterase 1 1497 28,500 6.48 O15305 Phosphomannomutase 2 PMM2 13 49.2 442 −2.71,– Protein degradation 311 125,000 5.3 P55072 Transitional endoplasmic VCP 20 27.9 710 −2.65,−3.25 reticulum ATPase 311 125,000 5.3 Q9UBT2 SUMO-activating enzyme SAE2 14 18.4 476 −2.65,−3.25 subunit 2 692 75,000 5.65 Q9H4A4 Aminopeptidase B RNPEP 19 36.9 689 −3.03,– 1566 24,500 5.3 P61086 Ubiquitin-conjugating HIP2 6 31.5 187 5.19,– enzyme E2-25 kDa

39 Table 2 - Continued.

Estimated FT-MS analysis RSVa,b hMPVa,b PIVa,b MVa,b Spot Mass (Da) pI Protein ID Protein Gene # Pept. Seq. Protein 6 12 24 6 12 24 6 12 24 6 12 24 nr. description cov. score (%) Metabolism 538 100,000 6.7 Q06210 Glucosamine- GFPT1 12 16.5 376 5.62,4.88 3.48,– fructose-6-phosphate aminotransferase 753 75,000 6.48 P31939 Bifunctional purine ATIC 6 10.3 181 −2.42,−2.74 biosynthesis protein PURH 898 57,500 6.38 P30838 Aldehyde ALDH3A1 9 17.5 308 –,2.13 dehydrogenase,dimeric NADP-preferrin 1543 27,000 6.7 P60174 Triphosphate isomerase TPI1 2 10 74 3.04,2.70 Stress response/molecular chaperones 397 125,000 5.9 P07900 Heat shock protein HSP HSP90AA1 26 33.5 1056 4.48,5.95 90-alpha 519 97,500 5.53 P11142 Heat shock cognate 71 HSPA8 4 6.3 130 –,5.83 kDa protein 554 90,000 5.05 P11021 78 kDa glucose-regulated HSPA5 47 77.5 2001 −2.23,−3.01 protein (precursor) 631 80,000 4.95 P11142 Heat shock cognate 71 HSPA8 15 25.4 471 –,−2.89 kDa protein 719 77,500 6.35 P49368 T-complex protein CCT3 38 69 1292 −2.38,– 1,gamma subunit 726 63,000 4.3 P49368 T-complex protein CCT3 7 9.7 219 −4.32,– 1,gamma subunit 737 70,000 5.63 P48643 T-complex protein CCT5 38 71 1300 −2.15,−2.46 1,epsilon subunit 767 72,500 6.45 P40227 T-complex protein 1,zeta CCT6A 16 25.6 569 −4.90,−2.91 subunit 1809 17,000 6.3 P02763 Alpha-1-acid glycoprotein ORM1 5 27.4 172 –,6.85 1 (precursor) Apoptosis 353 135,000 6.3 Q8WUM4 Programmed cell death PDCD6IP 31 39.1 1297 −3.42,−2.40 −3.29,– −2.43,– 6-interacting protein 791 17,000 4.88 Q15121 Astrocytic PEA15 2 18.5 81 −2.68,– phosphoprotein PEA-15 Other/unknown function 616 74,000 5.33 P02765 Alpha-2-HS-glycoprotein AHSG 6 11.4 201 –,−3.00 (precursor) (Fetuin-A)

40 Quantitative proteome profiling of respiratory virus-infected lung epithelial cells

Table 2 - Continued.

Estimated FT-MS analysis RSVa,b hMPVa,b PIVa,b MVa,b Spot Mass (Da) pI Protein ID Protein Gene # Pept. Seq. Protein 6 12 24 6 12 24 6 12 24 6 12 24 nr. description cov. score (%) Metabolism 2 538 100,000 6.7 Q06210 Glucosamine- GFPT1 12 16.5 376 5.62,4.88 3.48,– fructose-6-phosphate aminotransferase 753 75,000 6.48 P31939 Bifunctional purine ATIC 6 10.3 181 −2.42,−2.74 biosynthesis protein PURH 898 57,500 6.38 P30838 Aldehyde ALDH3A1 9 17.5 308 –,2.13 dehydrogenase,dimeric NADP-preferrin 1543 27,000 6.7 P60174 Triphosphate isomerase TPI1 2 10 74 3.04,2.70 Stress response/molecular chaperones 397 125,000 5.9 P07900 Heat shock protein HSP HSP90AA1 26 33.5 1056 4.48,5.95 90-alpha 519 97,500 5.53 P11142 Heat shock cognate 71 HSPA8 4 6.3 130 –,5.83 kDa protein 554 90,000 5.05 P11021 78 kDa glucose-regulated HSPA5 47 77.5 2001 −2.23,−3.01 protein (precursor) 631 80,000 4.95 P11142 Heat shock cognate 71 HSPA8 15 25.4 471 –,−2.89 kDa protein 719 77,500 6.35 P49368 T-complex protein CCT3 38 69 1292 −2.38,– 1,gamma subunit 726 63,000 4.3 P49368 T-complex protein CCT3 7 9.7 219 −4.32,– 1,gamma subunit 737 70,000 5.63 P48643 T-complex protein CCT5 38 71 1300 −2.15,−2.46 1,epsilon subunit 767 72,500 6.45 P40227 T-complex protein 1,zeta CCT6A 16 25.6 569 −4.90,−2.91 subunit 1809 17,000 6.3 P02763 Alpha-1-acid glycoprotein ORM1 5 27.4 172 –,6.85 1 (precursor) Apoptosis 353 135,000 6.3 Q8WUM4 Programmed cell death PDCD6IP 31 39.1 1297 −3.42,−2.40 −3.29,– −2.43,– 6-interacting protein 791 17,000 4.88 Q15121 Astrocytic PEA15 2 18.5 81 −2.68,– phosphoprotein PEA-15 Other/unknown function 616 74,000 5.33 P02765 Alpha-2-HS-glycoprotein AHSG 6 11.4 201 –,−3.00 (precursor) (Fetuin-A)

41 Table 2 - Continued.

Estimated FT-MS analysis RSVa,b hMPVa,b PIVa,b MVa,b Spot Mass (Da) pI Protein ID Protein Gene # Pept. Seq. Protein 6 12 24 6 12 24 6 12 24 6 12 24 nr. description cov. score (%) 620 72,000 4.43 P02765 Alpha-2-HS-glycoprotein AHSG 5 8.7 185 –,−2.96 (precursor) (Fetuin-A) 627 72,000 4.5 P02765 Alpha-2-HS-glycoprotein AHSG 5 8.7 168 –,−3.56 (precursor) (Fetuin-A) 641 85,000 6.13 O95671 N-acetylserotonin ASMTL 13 24.3 456 3.23,3.10 O-methyltrans-ferase-like 706 63,000 4.53 P02765 Alpha-2-HS-glycoprotein AHSG 5 8.7 156 –,−2.66 (precursor) (Fetuin-A) 784 55,000 4.63 Q9H8Y8 Golgi reassembly stacking GORASP4 2 2.8 62 2.81,−5.23 protein 2 1174 40,000 5.6 P06748 Nucleophosmin NPM1 5 23.5 158 4.90,3.31 Unidentified 120 N/A N/A N/A N/A N/A –,13.75 2.35,– 174 N/A N/A N/A N/A N/A –,2.34 281 N/A N/A N/A N/A N/A –,2.43 475 N/A N/A N/A N/A N/A –,2.63 630 N/A N/A N/A N/A N/A 2.57,3.14 1359 N/A N/A N/A N/A N/A 2.99,2.02 1370 N/A N/A N/A N/A N/A –,6.85 a,b Fold changes in protein expression upon infection with live (a) and UV-inactivated (b) virus particles. Only significant changes are shown. –: not significantly changed.

3.2 Respiratory virus-induced changes in protein expression Upon infection of A549 cells with the 4 different respiratory viruses, there were 21, 18, 19 and 17 spots that showed differential expression upon infection with RSV, hMPV, PIV or MV, respectively, compared to mock-infected cells at 6, 12 or 24 h after infection (Table 2). RSV, hMPV, PIV and MV belong to the family of the Paramyxoviridae and are related to each other yet inducing distinct disease phenotypes. What became apparent from the 2-D DIGE analyses is that RSV and hMPV infected cells showed slightly more induced changes in protein expression than the PIV and Measles virus-infected cells. Also no

42 Quantitative proteome profiling of respiratory virus-infected lung epithelial cells

Table 2 - Continued.

Estimated FT-MS analysis RSVa,b hMPVa,b PIVa,b MVa,b Spot Mass (Da) pI Protein ID Protein Gene # Pept. Seq. Protein 6 12 24 6 12 24 6 12 24 6 12 24 nr. description cov. score (%) 620 72,000 4.43 P02765 Alpha-2-HS-glycoprotein AHSG 5 8.7 185 –,−2.96 2 (precursor) (Fetuin-A) 627 72,000 4.5 P02765 Alpha-2-HS-glycoprotein AHSG 5 8.7 168 –,−3.56 (precursor) (Fetuin-A) 641 85,000 6.13 O95671 N-acetylserotonin ASMTL 13 24.3 456 3.23,3.10 O-methyltrans-ferase-like 706 63,000 4.53 P02765 Alpha-2-HS-glycoprotein AHSG 5 8.7 156 –,−2.66 (precursor) (Fetuin-A) 784 55,000 4.63 Q9H8Y8 Golgi reassembly stacking GORASP4 2 2.8 62 2.81,−5.23 protein 2 1174 40,000 5.6 P06748 Nucleophosmin NPM1 5 23.5 158 4.90,3.31 Unidentified 120 N/A N/A N/A N/A N/A –,13.75 2.35,– 174 N/A N/A N/A N/A N/A –,2.34 281 N/A N/A N/A N/A N/A –,2.43 475 N/A N/A N/A N/A N/A –,2.63 630 N/A N/A N/A N/A N/A 2.57,3.14 1359 N/A N/A N/A N/A N/A 2.99,2.02 1370 N/A N/A N/A N/A N/A –,6.85 a,b Fold changes in protein expression upon infection with live (a) and UV-inactivated (b) virus particles. Only significant changes are shown. –: not significantly changed.

clear differences were found in direct comparisons between protein samples from live and UV-inactivated virus-infected cells, indicative for a predominating general response upon virus encounter by A549 cells with no apparent effect of the replication on the proteome within the time frame of the experiment. In addition, only proteins with a high turn-over rate will show differential expression at earlier time points. An additional caveat regarding a non-ambiguous interpretation of the differential protein responses measured by 2-D DIGE is an accumulation of virus-induced changes in addition to host cellular defense responses upon encounter with the virus particles. Given the

43 observation that there were not many differences between live and UV-inactivated virus particles, the most likely effect is that of the cell upon encounter with the virus particles instead of the activity of the live replicating viruses.

Upon infection of A549 cells with RSV, eleven proteins were more abundant, 9 less abundant, and 1 more abundant at 6 h while less abundant at 24 h compared to mock-infected cells (Table 2). Most of these proteins are known to be involved in protein biosynthesis and modification, and cellular structures (Table 2 and Fig. 2A). A total of 18 protein spots were differentially expressed in A549 cells upon encounter with hMPV, of which 7 were more abundant and 11 less abundant (Table 2 and Fig. 2B). A major part of the differentially expressed proteins are known to be involved in protein biosynthesis and modification, cellular stress responses, transcription or are cytoskeleton- or structural proteins. PIV infection showed more abundant expression in 12 protein spots, less abundant expression in 6 spots and 1 spot that was more abundant in live virus particles infected cells while less abundant in UV-inactivated virus-infected cells (Table 2 and Fig. 2C). Finally, MV showed less abundant expression in 8 spots and more abundant expression in 9 spots (Table 2 and Fig. 2D).

Figure 2 - Functional classification of proteins that showed differential expression upon infection of A549 cells with respiratory viruses RSV (A), hMPV (B), PIV (C), and MV (D).

44 Quantitative proteome profiling of respiratory virus-infected lung epithelial cells

In addition to functional annotations, we also used Ingenuity for functional analyses on these differentially expressed proteins and identified 34 molecular and cellular functions for these proteins. Most functions include cell death, cellular organization and function and morphology, posttranslational modifications and folding, and metabolism (Table 3, Fig. 1B). These analyses also showed that all 4 viruses affect these cellular processes similarly (Table 3). 2

Table 3 - Molecular and cellular functions of di erentially expressed proteins.

Molecular and cellular functions Number of proteinsa All RSV hMPV PIV MV Cellular organization, morphology, function and proliferation 26 8 12 13 11 Cell death 22 7 14 10 10 Metabolism 11 4 5 6 7 Small molecule biochemistry 14 4 6 4 6 Molecular transport 8 3 5 4 5 Post-translational modification and folding 8 2 5 5 4 Cellular compromise 3 3 5 4 1 DNA replication, recombination, and repair 6 2 5 3 2 Cell-to-cell signaling and interaction 5 3 3 2 2 Gene expression 5 2 3 2 Protein synthesis 4 2 3 1 1 Cellular development 2 2 3 Infection mechanism 3 1 2 2 Cell signaling 1 1 3 Cell-mediated immune response 4 4 Protein Degradation 3 1 3 RNA post-transcriptional 3 1 1 1 modification RNA trafficking 1 1 1 1 Inflammatory response 3 1 1 Cellular movement 1 1 1 RNA damage and repair 2 1 1 Cellular response to therapeutics 1 1 Energy production 1 1 Free radical scavenging 1 1 aOnly significant changes are shown.

45 3.3 Molecular network analyses Results from the original GO network analysis (Fig. 3, arrow 1, result 1) show that many of the identified proteins (viral perturbed) are involved in apoptosis. To further explore the involvement of respiratory virus-induced or repressed proteins in the apoptosis pathway we performed a GO enrichment analysis as schematically depicted in Fig. 3A (arrows 2, 2a, and 3, result 2). The apoptosis network was extracted from the original GO network and was enriched for proteins known to interact with proteins involved in apoptosis resulting in the apoptosis N1 protein network (arrow 2b, Fig. 3A). When virus-perturbed proteins were analyzed using this database 100% of the proteins were recovered (arrow 4, result 3), thus indicating the involvement of these proteins in apoptosis, either directly or indirectly by interacting with proteins that are directly involved in apoptosis (Table 4).

Since cytokines have been shown to be of great importance in respiratory virus infection and are known to be involved in apoptosis as well, we generated a new network from the apoptosis N1 network and the cytokine network from the original GO database together with the virus-perturbed proteins (cytokine and virus-perturbed network) and extracted the virus-perturbed N1 network (arrow 6, result 4). Analysis of the virus- perturbed proteins in this database revealed that 25–55% of these proteins are either proteins involved in apoptosis themselves or cytokines known to interact with these apoptosis proteins (Fig. 3B).

A substantial part of the proteins that were differentially expressed upon infection with either one of the different viruses are involved in apoptosis. To check whether this observation was not based on coincidence we performed another GO enrichment procedure for randomly selected proteins. Table 4 shows that the number of recovered proteins was significantly higher in the virus-perturbed than for the randomly selected proteins, indicating that the proteins that were found differentially expressed are indeed mostly involved in apoptosis.

3.4 Functional pathway analyses Ingenuity functional pathway analyses showed that a total number of 22 differentially expressed protein spots is considered to be involved in cell death and 26 in cellular organization, function and proliferation (Table 3). Ingenuity function analyses revealed that there is a great overlap between these molecular and cellular functions, since similar proteins were categorized in these two functional categories. Both categories include VCL, RDX, HSPA5, T-complex protein 1, HSPA90AA1,NDUFS1, NCL, PNPT1, XRCC5, ALDH3A1, PDCD6IP, and NPM1 and proteins that are cell death associated only LMNB1, TUBB3, RAD23B, EEF2, P4HB, VCP, HSPA8, and PEA15 (Table 5). The process of virus-induced apoptosis can be divided into 4 processes based on these proteins, namely, virus uptake and infection, stress response, disruption of cellular structures and cell death by apoptosis.

46 Quantitative proteome profiling of respiratory virus-infected lung epithelial cells

2

Figure 3 - GO enrichment analysis procedure (A) and protein–protein interactions (B) of differentially expressed proteins in respiratory virus-infected A549 cells. The numbers on arrows in figure A represent the order in which the GO enrichment procedure was performed. Triangles in figure B represent proteins involved in apoptosis, nodes with dark bold borders are cytokines. Green nodes are proteins that were more abundant upon infection, while red nodes represent less abundant proteins.

47

a 0.005 <0.001 0.001 0.001 <0.001 <0.001 0.013 <0.001 <0.001 <0.001 <0.001 <0.001 0.002 <0.001 0.007 <0.001 p-Value 4.14 16.55 10.2 3.25 7.05 16.45 4.39 3.95 7.62 16.19 11.07 3.54 6.33 16.88 7.25 4.89 Fold enrichmentFold 0.72 0.72 0.2 2.15 0.85 0.85 0.23 2.53 1.05 1.05 0.27 3.11 0.47 0.47 0.14 1.43 Average random recovered 3 2 7 6 1 10 8 3 11 3 8 1 7 Virus- perturbed recovered 12 12 12 12 12 14 14 14 14 14 17 17 17 17 17 8 8 8 8 Random input 12 12 12 12 14 14 14 14 17 17 17 17 8 8 8 8 Virus- perturbed GO enrichment analyses of respiratory virus altered in A549 proteins cells. - MV tx apoptosis–cytokine enriched MV tx apoptosis enriched MV tx in apoptosis network MV tx in original network PIV txPIV apoptosis–cytokine enriched PIV txPIV apoptosis enriched PIV txPIV in apoptosis network PIV txPIV in original network hMPV tx apoptosis–cytokine enriched hMPV tx apoptosis enriched hMPV tx in apoptosis network hMPV tx in original network RSV tx apoptosis–cytokine enriched RSV tx apoptosis enriched RSV tx in apoptosis network RSV tx in original network Table 4 Case p-Value refers how to likely random recovered number higher is than virus-perturbed recovered. a

48 Quantitative proteome profiling of respiratory virus-infected lung epithelial cells

3.5 Virus uptake and infection XRCC5 is the 86 kDa subunit of the Ku complex that is abundant in the nucleus and binds to DNA [20]. Ku was postulated to have a role in DNA repair or replication as it has been shown to bind to the termini of DNA fragments. It was shown previously that Ku86 has a role in HIV1 and SV40 infection and our data suggest a role for this protein in RSV infection of A549 cells as well [21–23]. 2

HSPA8 is a member of the HSP70 family of heat shock proteins and has been shown to be virus inducible showing highest expression during the S-phase of cell cycle [24] and is necessary for the entry of HTLV-I into its target cells [24, 25]. In hMPV and PIV infected A549 cells, HSPA8 is more abundant at 6 and 12 h respectively. This shows that hMPV is inducing expression in A549 cells early in infection while this response is somewhat slower in PIV infected cells.

3.6 Stress response proteins Of the set of differentially expressed proteins described above, HSPA5, HSP90AA1, TUBB3, P4HB and VCP are all proteins that are involved in cellular stress responses preceding apoptosis. All these proteins were shown to be less abundant at 24 h after infection except for TUBB3, a member of the tubulin family of microtubule proteins that is also known to be involved in adaptation to oxidative stress and cellular survival [26–28]. More abundant expression of TUBB3, as observed in hMPV infected cells at 12 h suggests an initial mechanism for defense against oxidative stress which might eventually fail at 24 h after infection or points towards the blocking of cell cycle progression and induction of apoptosis [26,29,30].

Less abundant expression of the other 4 proteins also points towards apoptotic processes in A549 cells at 24 h after infection. The molecular chaperones HSPA5 and HSP90AA1 are involved in the ER stress response upon accumulation of unfolded proteins in the ER [31]. When the unfolded protein response is not correctly activated, cells die by apoptosis [32]. In contrast to some other viruses, hMPV induces a less abundant expression of HSPA5 protein expression in A549 cells at 24 h after infection [33–36], suggesting that the ER stress response is not correctly induced thus resulting in the subsequent induction of apoptosis in these cells.

HSP90AA1 has a very important function in the folding of cell regulatory proteins and the refolding of stress-denatured polypeptides [37–39]. In hMPV infected A549 cells we observed an initial increase in HSP90AA1 expression at 12 h after infection which suggests a cellular response to hMPV-induced ER stress initiated by an increase of unfolded or misfolded proteins. Recently, an association between HSP90 protein complex and lamin A/C has been observed after oxidative stress [40]. Also in our hMPV

49 and PIV infected A549 cells we observe an increase in HSP90AA1 as well as lamin A/C suggesting a damaging oxidative stress response in these cells. The HSP90AA1-associated protein lamin A/C has been shown to be involved in viral infections being required for repressing HSV replication [41] and more abundant expression in hMPV and PIV infected A549 cells supports such a response upon infection.

The other lamin that was found to be differentially expressed upon hMPV infection is lamin B1 although in contrast to lamin A/C it was less abundant upon infection. Lamin B1 also plays an important role in the cellular response to oxidative stress [42]. As a consequence of less abundant expression of lamin B1, hMPV-infected cells might become more susceptible to oxidative stress and may result in subsequent cell death.

The molecular structure of P4HB is identical to that of the enzyme protein disulfide isomerase (PDI) which is known to protect cells against ER stress and inhibition of the protein will result in ER stress and subsequent induction of apoptosis [43–45]. P4HB is less abundant at 24 h after infection with RSV suggesting the induction of apoptosis in these cells. In hMPV and PIV infected cells this protein is more abundant at 6 h after infection suggesting the initiation of a protective response against ER stress.

VCP is involved in many cellular processes such as protein degradation and membrane fusion and has chaperone activity. Less abundant expression of VCP expression has been shown to cause ER and oxidative stress and thereby induces apoptosis through caspase activation while more abundant expression results in anti-apoptotic responses [46]. VCP shows less abundant expression at 24 h after infection with hMPV and suggests the induction of apoptosis.

3.7 Disruption of cellular structures CCT3, CCT5, and CCT6 are all part of the T-complex protein 1 that is involved in the folding of actin and tubulin as well as many other newly synthesized proteins [47,48]. Less abundant expression of CCT5 as observed in the hMPV, PIV and MV infected A549 cells has also been described for enterovirus 71 infected cells and a role in the disruption of the cytoskeletal structure to aid viral replication was suggested [49]. A supportive finding for this disruption of cytoskeletal structure is that the F-actin cross-linking protein ACTN4 expression is less abundant at 24 h after infection thus facilitating viral replication [50]. A similar result was observed for PDCD6IP, a protein that has been shown to play a role in infection with enveloped viruses, like human immunodeficiency virus type 1 (HIV-1) [51–54]. In addition, PDCD6IP/AIP1 can associate with different cytoskeleton elements such as focal adhesion kinase and other cytoskeletal elements [55]. Less abundant expression of PDCD6IP as observed at 24 h after infection with RSV, hMPV and MV suggests the inhibition of cytoskeleton assembly [56].

50 Quantitative proteome profiling of respiratory virus-infected lung epithelial cells

Vinculin (VCL) is a key regulator of focal adhesions that directly interacts with talin and actin and controls cell adhesion and migration. Interaction between VCL and the plasma membrane is essential for these processes [57]. Several viruses are known to disrupt or limit the function of these focal adhesions [58–60]. In glioblastoma cells, less abundant VCL expression is associated with induced apoptosis showing that disruption of the actin-VCL-cytoskeleton matrix is a major component of induced apoptosis in these cells 2 [61]. Decreased expression of VCL by RSV infection in our experiment could suggest that the A549 cells have decreased adhesive and increased migratory capacities upon RSV infection, but might also suggest that apoptosis is induced in these cells.

Radixin (RDX) forms a complex with ezrin and moesin (ERM complex) that regulate membrane protein dynamisms and cytoskeleton rearrangements, are determinants in viral susceptibility, and are involved in apoptosis [62–64]. In RSV, PIV and MV infected A549 cells; expression of RDX was shown to be more abundant at 6 h after infection suggesting an initial defense response to the virus. At 24 h after RSV infection, however, RDX expression was less abundant suggesting increased apoptosis in RSV infected A549 cells.

Nucleolin (NCL) is an abundant nucleolar protein that has been implicated in chromatin structure, rDNA transcription, rRNA maturation, ribosome assembly and nucleo-cyto- plasmic transport [65]. NCL is also involved in the stabilization of the apoptosis suppressor BCL-2. Reduced expression of NCL results in BCL-2 mRNA instability and decreased levels of BCL-2 protein resulting in increased apoptosis [66–68]. NCL expression is less abundant upon infection with hMPV and thus also points towards induction of apoptosis in A549 cells.

Elongation factor 2 (EEF2) is a conserved monomeric GTPase involved in protein synthesis and translation elongation. This protein has been shown to have a role in HIV1 infection as a regulator of apoptosis and host innate cellular responses against viral factors as over expression of the protein induces protection against HIV1 induced apoptosis. In RSV and hMPV infected A549 cells, EEF2 is strongly less abundant at 24 h post-infection suggesting that induced apoptosis is no longer being suppressed by this protein [69].

PEA15 is a 15-kDa phosphoprotein that can inhibit proliferation and that regulates the ability of BCL-2 to suppress Fas-induced apoptosis in a phosphorylation-dependent manner. In addition, PEA15 phosphorylation is mediated by the PTEN/PI3K pathway [70, 71]. PEA15 is less abundant in MV infected cells at 6 h after infection, which suggests that proliferation and BCL-2-mediated suppression of Fas-induced apoptosis is intact and proliferation is not suppressed yet.

51 24 7.06/– 7.06/– –/2.13 –/2.13 −2.43/ 12 3.82/–

a,b MV 6

−4.32/–

−2.68/–

24 10.85/13.72 10.85/13.72

−2.38/– −4.90/−2.91 4.90/3.31 12

–/5.83 –/−2.89 –/−2.89 a,b PIV 6

2.82/–

2.55/–

24 −4.22/−6.17

−2.65/−3.25 −2.62/– −2.65/−3.25

−2.23/−3.01 −2.15/−2.46 −3.29/– 12 3.08/2.38

4.48/5.95

a,b 6 hMPV

4.33/–

2.13/–

–/2.21 –/2.21

24 −4.39/−2.54 –/−2.14 –/−2.14 −4.37/−4.24 –/−2.50 –/−2.50 −3.42/−2.40 12

a,b 6 RSV

–/11.12

3.29/3.70 3.29/3.70 2.52/2.44

Name VCL RDX LMNB1 TUBB3 NDUFS1 NCL PNPT1 XRCC5 XRCC5 RAD23B EEF2 EEF2 EEF2 EEF2 P4HB P4HB VCP ALDH3A1 HSP90AA1 HSPA8 HSPA5 HSPA8 CCT3 CCT3 CCT5 PDCD6IP PEA15 NPM1 Proteins involved in cell death. in cell involved Proteins - Protein ID Protein P18206 P35241 P20700 Q13509 P28331 P28331 P19338 Q8TCS8 P13010 P13010 P54727 P13639 P13639 P07237 P55072 P30838 P07900 P11142 P11021 P11142 P49368 P49368 P48643 P40227 CCT6A Q8WUM4 Q15121 P06748

Fold changes in protein expression upon infection with and live UV-inactivated (a) virus (b) particles. Only significant changes are shown. Table 5 Spot nr. 323 578 629 841 522 311 566 486 525 799 425 428 707 311 898 397 519 554 631 719 726 737 767 353 1791 1174 a/b –: not significantly not changed.–:

52 Quantitative proteome profiling of respiratory virus-infected lung epithelial cells

4. Conclusions 24 7.06/– 7.06/– –/2.13 –/2.13 −2.43/ The respiratory viruses RSV, hMPV, PIV and MV altered the expression of proteins involved in cell death that points towards an induction of apoptosis upon infection. Induction of 12 3.82/–

apoptosis in epithelial cells by RSV and hMPV has been described before and our findings of induced ER stress, Bcl-2 and p53-dependent apoptosis support these data a,b 2 [13–16]. In addition, we newly identified proteins that are altered by the viral infection MV 6

−4.32/–

−2.68/–

and induce apoptosis in A549 cells. For some of the identified proteins a role in virus-induced apoptosis has been previously described for viruses other than the respiratory viruses that were used in this experiment. However, several proteins have 24 10.85/13.72 10.85/13.72

−2.38/– −4.90/−2.91 4.90/3.31 been newly identified, which are considered to be involved in respiratory virus infec- tion-induced apoptosis. This list of proteins is likely to be further extended by improving 12

–/5.83 –/−2.89 –/−2.89 the 2-D DIGE analysis, i.e. by increasing the number of replicates in the analysis or by using a different internal standard. This might allow the analysis of additional proteins a,b and might also reliably allow the identification of spots with minor changes in protein PIV 6

2.82/–

2.55/–

expression (FC<2.5). In addition, complementary (semi-)quantitative proteomics-based techniques such as ICAT, ITRAQ or label-free MS analyses are also expected to confirm and are likely to complement our current findings. These proteins are of interest for 24 −4.22/−6.17

−2.65/−3.25 −2.62/– −2.65/−3.25

−2.23/−3.01 −2.15/−2.46 −3.29/– further exploration of respiratory virus-induced apoptosis and might be interesting targets for the development of future therapeutics or prevention strategies against these severe disease causing viruses. 12 3.08/2.38

4.48/5.95 a,b Acknowledgements 6 hMPV

4.33/–

2.13/–

–/2.21 –/2.21

This study was financially supported by the VIRGO consortium, an Innovative Cluster approved by the Netherlands Genomics Initiative and partially funded by the Dutch Government (BSIK 03012), The Netherlands. 24 −4.39/−2.54 –/−2.14 –/−2.14 −4.37/−4.24 –/−2.50 –/−2.50 −3.42/−2.40 12

a,b 6 RSV

–/11.12

3.29/3.70 3.29/3.70 2.52/2.44

Name VCL RDX LMNB1 TUBB3 NDUFS1 NCL PNPT1 XRCC5 XRCC5 RAD23B EEF2 EEF2 EEF2 EEF2 P4HB P4HB VCP ALDH3A1 HSP90AA1 HSPA8 HSPA5 HSPA8 CCT3 CCT3 CCT5 PDCD6IP PEA15 NPM1 Proteins involved in cell death. in cell involved Proteins - Protein ID Protein P18206 P35241 P20700 Q13509 P28331 P28331 P19338 Q8TCS8 P13010 P13010 P54727 P13639 P13639 P07237 P55072 P30838 P07900 P11142 P11021 P11142 P49368 P49368 P48643 P40227 CCT6A Q8WUM4 Q15121 P06748

Fold changes in protein expression upon infection with and live UV-inactivated (a) virus (b) particles. Only significant changes are shown. Table 5 Spot nr. 323 578 629 841 522 311 566 486 525 799 425 428 707 311 898 397 519 554 631 719 726 737 767 353 1791 1174 a/b –: not significantly not changed.–:

53 References

[1] Curns AT, Holman RC, Sejvar JJ, Owings MF, Schonberger LB. Infectious disease hospitalizations among older adults in the United States from 1990 through 2002. Arch Intern Med 2005;165:2514–20. [2] Williams BG, Gouws E, Boschi-Pinto C, Bryce J, Dye C. Estimates of world-wide distribution of child deaths from acute respiratory infections. Lancet Infect Dis 2002;2:25–32. [3] Van Woensel JB, Van Aalderen WM, Kimpen JL. Viral lower respiratory tract infection in infants and young children. BMJ 2003;327:36–40. [4] Yorita KL, Holman RC, Sejvar JJ, Steiner CA, Schonberger LB. Infectious disease hospitalizations among infants in the United States. Pediatrics 2008;121:244–52. [5] Lessler J, ReichNG, Brookmeyer R, PerlTM,NelsonKE,Cummings DA. Incubation periods of acute respiratory viral infections: a systematic review. Lancet Infect Dis 2009;9:291–300. [6] Bush A, Thomson AH. Acute bronchiolitis. BMJ 2007;335:1037–41. [7] Simoes EA. Respiratory syncytial virus infection. Lancet 1999;354:847–52. [8] Van den Hoogen BG, De Jong JC, Groen J, Kuiken T, de Groot R, Fouchier RA, et al. A newly discovered human pneumovirus isolated from young children with respiratory tract disease.Nat Med 2001;7:719–24. [9] Fields, B. N., Knipe, D. M. Field’s Virology, Lippincott Williams & Wilkins; 2001. [10] See H, Wark P. Innate immune response to viral infection of the lungs. Paediatr Respir Rev 2008;9:243–50. [11] Tsutsumi H, Takeuchi R, Chiba S. Activation of cellular genes in the mucosal epithelium by respiratory syncytial virus: implications in disease and immunity. Pediatr Infect Dis J 2001;20:997–1001. [12] Tsutsumi H, Takeuchi R, Ohsaki M, Seki K, Chiba S. Respiratory syncytial virus infection of human respiratory epithelial cells enhances inducible nitric oxide synthase gene expression. J Leukoc Biol 1999;66:99–104. [13] Eckardt-Michel J, Lorek M, Baxmann D, Grunwald T, Keil GM, Zimmer G. The fusion protein of respiratory syncytial virus triggers p53-dependent apoptosis. J Virol 2008;82:3236–49. [14] Kotelkin A, Prikhod’ko EA, Cohen JI, Collins PL, Bukreyev A. Respiratory syncytial virus infection sensitizes cells to apoptosis mediated by tumor necrosis factor-related apoptosis-inducing ligand. J Virol 2003;77:9156–72. [15] Bitko V, Barik S. An endoplasmic reticulum-specific stress-activated caspase (caspase-12) is implicated in the apoptosis of A549 epithelial cells by respiratory syncytial virus. J Cell Biochem 2001;80:441–54. [16] Bao X, Sinha M, Liu T, Hong C, Luxon BA, Garofalo RP, et al. Identification of human metapneumovirus-in- duced gene networks in airway epithelial cells by microarray analysis. Virology 2008;374:114–27. [17] Bao X, Liu T, Spetch L, Kolli D, Garofalo RP, Casola A. Airway epithelial cell response to human metapneumo- virus infection. Virology 2007;368:91–101. [18] Gupta CK, Leszczynski J, Gupta RK, Siber GR. Stabilization of respiratory syncytial virus (RSV) against thermal inactivation and freeze–thaw cycles for development and control of RSV vaccines and immune globulin. Vaccine 1996;14:1417–20. [19] Maere S, Heymans K, Kuiper M. BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics 2005;21:3448–9. [20] Mimori T, Hardin JA. Mechanism of interaction between Ku protein and DNA. J Biol Chem 1986;261:10375–9. [21] Jeanson L, Subra F, Vaganay S, Hervy M, Marangoni E, Bourhis J, et al. Effect of Ku80 depletion on the preintegrative steps of HIV-1 replication in human cells. Virology 2002;300:100–8. [22] Jeanson L, Mouscadet JF. Ku represses the HIV-1 transcription: identification of a putative Ku binding site homologous to the mouse mammary tumor virus NRE1 sequence in the HIV-1 long terminal repeat. J Biol Chem 2002;277:4918–24. [23] Quinn JP, Simpson J, Farina AR. The Ku complex is modulated in response to viral infection and other cellular changes. Biochim Biophys Acta 1992;1131:181–7. [24] Sainis I, Angelidis C, Pagoulatos G, Lazaridis I. The hsc70 gene which is slightly induced by heat is the main virus inducible member of the hsp70 gene family. FEBS Lett 1994;355:282–6. [25] Fang D, Haraguchi Y, Jinno A, Soda Y, Shimizu N, Hoshino H. Heat shock cognate protein 70 is a cell fusion- enhancing factor but not an entry factor for human T-cell lymphotropic virus type I. Biochem Biophys Res Commun 1999;261:357–63. [26] Cicchillitti L, Penci R,DiMM, Filipett F, RotilioD,DonatiMB, et al. Proteomic characterization of cytoskeletal and mitochondrial class III beta-tubulin. Mol Cancer Ther 2008;7:2070–9.

54 Quantitative proteome profiling of respiratory virus-infected lung epithelial cells

[27] Akasaka K, Maesawa C, Shibazaki M, Maeda F, Takahashi K, Akasaka T, et al. Loss of class III beta-tubulin induced by deacetylation is associated with chemosensitivity to paclitaxel in malignant melanoma cells. J Invest Dermatol 2009;129:1516–26. [28] Ferlini C, Raspaglio G, Cicchillitti L, Mozzetti S, Prislei S, Bartollino S, et al. Looking at drug resistance mechanisms for microtubule interacting drugs: does TUBB3 work? Curr Cancer Drug Targets 2007;7:704–12. [29] Cicchillitti L, Di MM, Urbani A, Ferlini C, Donat MB, Scambia G, et al. Comparative proteomic analysis of 2 paclitaxel sensitive A2780 epithelial ovarian cancer and its resistant counterpart A2780TC1 by 2D-DIGE: the role of ERp57. J Proteome Res 2009. [30] Derry WB, Wilson L, Khan IA, Luduena RF, Jordan MA. Taxol differentially modulates the dynamics of microtubules assembled from unfractionated and purified beta-tubulin isotypes. Biochemistry 1997;36:3554–62. [31] Xu C, Bailly-Maitre B, Reed JC. Endoplasmic reticulum stress: cell life and death decisions. J Clin Invest 2005;115:2656 – 64. [32] Paschen W. Endoplasmic reticulum dysfunction in brain pathology: critical role of protein synthesis. Curr Neurovasc Res 2004;1:173–81. [33] Liberman E, Fong YL, Selby MJ, Choo QL, Cousens L, Houghton M, et al. Activation of the grp78 and grp94 promoters by hepatitis C virus E2 envelope protein. J Virol 1999;73:3718–22. [34] Ciccaglione AR, Costantino A, Tritarelli E, Marcantonio C, Equestre M, Marziliano N, et al. Activation of endoplasmic reticulum stress response by hepatitis C virus proteins. Arch Virol 2005;150:1339–56. [35] Pavio N, Romano PR, Graczyk TM, Feinstone SM, Taylor DR. Protein synthesis and endoplasmic reticulum stress can be modulated by the hepatitis C virus envelope protein E2 through the eukaryotic initiation factor 2alpha kinase PERK. J Virol 2003;77:3578–85. [36] Cheng G, Feng Z, He B. Herpes simplex virus 1 infection activates the endoplasmic reticulum resident kinase PERK and mediates eIF-2alpha dephosphorylation by the gamma(1) 34.5 protein. J Virol 2005;79:1379–88. [37] Chen B, Piel WH, Gui L, Bruford E, Monteiro A. The HSP90 family of genes in the human genome: insights into their divergence and evolution. Genomics 2005;86:627–37. [38] Csermely P, Schnaider T, Soti C, Prohaszka Z, Nardai G. The 90-kDa molecular chaperone family: structure, function, and clinical applications. A comprehensive review. Pharmacol Ther 1998;79:129–68. [39] Obermann WM, Sondermann H, Russo AA, Pavletich NP, Hartl FU. In vivo function of Hsp90 is dependent on ATP binding and ATP hydrolysis. J Cell Biol 1998;143:901–10. [40] Nakamura M, Morisawa H, Imajoh-Ohmi S, Takamura C, Fukuda H, Toda T. Proteomic analysis of protein complexes in human SH-SY5Y neuroblastoma cells by using blue-native gel electrophoresis: an increase in lamin A/C associated with heat shock protein 90 in response to 6-hydroxydopamine-induced oxidative stress. Exp Gerontol 2009;44:375–82. [41] Mou F,Wills EG, ParkR, Baines JD. Effects of lamin A/C, lamin B1, and viral US3 kinase activity on viral infectivity, virion egress, and the targeting of herpes simplex virus U(L)34-encoded protein to the inner nuclear membrane. J Virol 2008;82: 8094–104. [42] Malhas AN, Lee CF, Vaux DJ. Lamin B1 controls oxidative stress responses via Oct-1. J Cell Biol 2009;12:45–55. [43] Noiva R. Protein disulfide isomerase: the multifunctional redox chaperone of the endoplasmic reticulum. Semin Cell Dev Biol 1999;10:481–93. [44] Puig A, Lyles MM, Noiva R, Gilbert HF. The role of the thiol/disulfide centers and peptide binding site in the chaperone and anti-chaperone activities of protein disulfide isomerase. J Biol Chem 1994;269:19128–35. [45] Lovat PE, Corazzari M, Armstrong JL, Martin S, Pagliarini V, Hill D, et al. Increasing melanoma cell death using inhibitors of protein disulfide isomerases to abrogate survival responses to endoplasmic reticulum stress. Cancer Res 2008;68:5363–9. [46] Braun RJ, Zischka H. Mechanisms of Cdc48/VCP-mediated cell death: from yeast apoptosis to human disease. Biochim Biophys Acta 2008;1783:1418–35. [47] Kubota H, Hynes G, Willison K. The containing t-complex polypeptide 1 (TCP-1). Multisubunit machinery assisting in protein folding and assembly in the eukaryotic cytosol. Eur J Biochem 1995;230:13–6. [48] Dekker C, Stirling PC, McCormack EA, Filmore H, Paul A, Brost RL, et al. The interaction network of the chaperonin CCT. EMBO J 2008;27:1827–39.

55 [49] Leong WF, Chow VT. Transcriptomic and proteomic analyses of rhabdomyosarcoma cells reveal differential cellular gene expression in response to enterovirus 71 infection. Cell Microbiol 2006;8:565–80. [50] Honda K, Yamada T, Endo R, et al. Actinin-4, a novel actin-bundling protein associated with cell motility and cancer invasion. J Cell Biol 1998;140:1383–93. [51] Dussupt V, Javid MP, Bou-Jaoude G, Ino Y, Gotoh M, Brost RL, et al. The nucleocapsid region of HIV-1 Gag cooperates with the PTAP and LYPXnL late domains to recruit the cellular machinery necessary for viral budding. PLoS Pathog 2009;5:e1000339. [52] Strack B, Calistri A, Craig S, Popova E, Gottlinger HG. AIP1/ALIX is a binding partner for HIV-1 p6 and EIAV p9 functioning in virus budding. Cell 2003;114:689–99. [53] Von Schwedler UK, Stuchell M, Muller B, Ward DM, Chung HY, Morita E, et al. The protein network of HIV budding. Cell 2003;114:701–13. [54] Carlton JG, Agromayor M, Martin-Serrano J. Differential requirements for Alix and ESCRT-III in cytokinesis and HIV-1 release. Proc Natl Acad Sci USA 2008;105:10541–6. [55] Schmidt MH, Hoeller D, Yu J, Furnari FB, Cavenee WK, Dikik I, et al. Alix/AIP1 antagonizes epidermal growth factor receptor downregulation by the Cbl-SETA/CIN85 complex. Mol Cell Biol 2004;24:8981–93. [56] Pan S, Wang R, Zhou X, He G, Koomen J, Kobayashi R, et al. Involvement of the conserved adaptor protein Alix in actin cytoskeleton assembly. J Biol Chem 2006;281:34640–50. [57] Humphries JD, Wang P, Streuli C, Geiger B, Humphries MJ, Ballestrem C. Vinculin controls focal adhesion formation by direct interactions with talin and actin. J Cell Biol 2007;179: 1043–57. [58] Shoeman RL, Hartig R, Hauses C, Traub P. Organization of focal adhesion plaques is disrupted by action of the HIV-1 protease. Cell Biol Int 2002;26:529–39. [59] Tan TL, Feng Z, Lu YW, Chan V, Chen WN. Adhesion contact kinetics of HepG2 cells during Hepatitis B virus replication: involvement of SH3-binding motif in HBX. Biochim Biophys Acta 2006;1762:755–66. [60] Wheeler JG, Winkler LS, Seeds M, Bass D, Abramson JS. Influenza A virus alters structural and biochemical functions of the neutrophil cytoskeleton. J Leukoc Biol 1990;47:332–43. [61] Magro AM, Magro AD, Cunningham C, Miller MR. Downregulation of vinculin upon MK886-induced apoptosis in LN18 glioblastoma cells. Neoplasma 2007;54:517–26. [62] Kondo T, Takeuchi K, Doi Y, Yonemura S, Nagata S, Tsukita S. ERM (ezrin/radixin/moesin)-based molecular mechanism of microvillar breakdown at an early stage of apoptosis. J Cell Biol 1997;139:749–58. [63] Kubo Y, Yoshii H, Kamiyama H, Tominaga C, Tanaka Y, Saro H, et al. Ezrin, radixin, and moesin (ERM) proteins function as pleiotropic regulators of human immunodeficiency virus type 1 infection. Virology 2008;375:130–40. [64] Haedicke J, De Los SK, Goff SP, Naghavi MH. The ezrin–radixin–moesin family member ezrin regulates stable microtubule formation and retroviral infection. J Virol 2008;82: 4665–70. [65] Ginisty H, Sicard H, Roger B, Bouvet P. Structure and functions of nucleolin. J Cell Sci 1999;112:761–72. [66] Soundararajan S, Chen W, Spicer EK, Courtenay-Luck N, Fernandes DJ. The nucleolin targeting aptamer AS1411 destabilizes Bcl-2 messenger RNA in human breast cancer cells. Cancer Res 2008;68:2358–65. [67] Sengupta TK, Bandyopadhyay S, Fernandes DJ, Spicer EK. Identification of nucleolin as an AU-rich element binding protein involved in bcl-2 mRNA stabilization. J Biol Chem 2004;279:10855–63. [68] Otake Y, Soundararajan S, Sengupta TK, Kio EA, Smith JC, Pineda-Roman M, et al. Overexpression of nucleolin in chronic lymphocytic leukemia cells induces stabilization of bcl2 mRNA. Blood 2007;109:3069–75. [69] Zelivianski S, Liang D, Chen M, Mirkin BL, Zhao RY. Suppressive effect of elongation factor 2 on apoptosis induced by HIV-1 viral protein R. Apoptosis 2006;11:377–838. [70] Peacock JW, Palmer J, Fink D, Ip S, Pietras EM, Mui AL, et al. PTEN loss promotes mitochondrially dependent type II Fas-induced apoptosis via PEA-15. Mol Cell Biol 2009;29:1222–34. [71] Bartholomeusz C, Rosen D, Wei C, Kazansky A, Yamasaki F, Takahashi T, et al. PEA-15 induces autophagy in human ovarian cancer cells and is associated with prolonged overall survival. Cancer Res 2008;68:9302–10.

56 Quantitative proteome profiling of respiratory virus-infected lung epithelial cells

2

57

3

Measuring the physical cohesiveness of proteins using physical interaction enrichment

Iziah Edwin Sama and Martijn A. Huynen

Bioinformatics. 2010 Nov 1;26(21):2737-43 Abstract

Motivation: Protein–protein interaction (PPI) networks are a valuable resource for the interpretation of genomics data. However, such networks have interaction enrichment biases for proteins that are often studied. These biases skew quantitative results from comparing PPI networks with genomics data. Here, we introduce an approach named physical interaction enrichment (PIE) to eliminate these biases. Methodology: PIE employs a normalization that ensures equal node degree (edge) distribution of a test set and of the random networks it is compared with. It quantifies whether a set of proteins have more interactions between themselves than proteins in random networks, and can therewith be regarded as physically cohesive. Results: Among other datasets, we applied PIE to genetic morbid disease (GMD) genes and to genes whose expression is induced upon infection with human-metapneumovirus (HMPV). Both sets contain proteins that are often studied and that have relatively many interactions in the PPI network. Although interactions between proteins of both sets are found to be overrepresented in PPI networks, the GMD proteins are not more likely to interact with each other than random proteins when this overrepresentation is taken into account. In contrast the HMPV-induced genes, representing a biologically more coherent set, encode proteins that do tend to interact with each other and can be used to predict new HMPV-induced genes. By handling biases in PPI networks, PIE can be a valuable tool to quantify the degree to which a set of genes are involved in the same biological process.

60 Measuring the physical cohesiveness of proteins using physical interaction enrichment

1. Introduction

Physical interactions between proteins explain how proteins function together in protein complexes or functional modules (Dittrich et al., 2008; Ideker and Sharan, 2008; Tucker et al., 2001; Vidal, 2001). Discovery of all protein–protein interactions (PPIs) has therefore been a priority in systems biology and there have been several efforts to elucidate PPIs both in low- and in high-throughput platforms (Collins et al., 2007; Rual et al., 2005; Stelzl et al., 2005), as well as by evolutionary inference (Brown and Jurisica, 2007; Huang et al., 2007; Yu et al., 2004).

The resulting PPI networks are invaluable for the interpretation of other genomics data. In a number of studies, specific emphasis has been placed on quantifying aspects of network topology to identify proteins that are specifically relevant to a biological process or for evolution. For instance Wachi and coworkers identified a high network 3 centrality for genes that are upregulated during lung cancer as a distinguishing topological feature to enable placement of cancer genes into the global and systematic context of the cell (Wachi et al., 2005). In a similar study, essential genes in yeast have been found to be well connected and globally centered in the PPI network (Jeong et al., 2001; Wuchty and Almaas, 2005).

Notwithstanding the success of such approaches, there are some experimental biases in the determination of PPIs. For instance, the Yeast-2-Hybrid (Y2H) approach is known to detect interactions among proteins that may not be likely because the proteins naturally do not occur in the same subcellular compartment (von Mering et al., 2002). Tandem Affinity Purification followed by mass spectrometry is known to favor highly abundant proteins (Bjorklund et al., 2008; von Mering et al., 2002). In addition, evolutionary inference of PPIs as ‘interologs’ has placed highly conserved proteins as hubs in general PPI networks (Brown and Jurisica, 2007). Even manual curation of PPIs in scientific literature has caveats. One main caveat being that the discovery of such interactions is driven by existing knowledge and hypotheses (Cusick et al., 2009). The latter has led to an overrepresentation of interactions between proteins encoded by disease genes in the Human Protein Reference Database (HPRD) (Oti et al., 2006).

Several measures have been developed to improve the reliability of PPI networks (Sharan et al., 2007), like the integration of general PPI networks with networks based on other data (Karni et al., 2009; Tornow and Mewes, 2003; Yosef et al., 2008). Although such comparative genomic approaches increase the reliability of the PPI network, they do not specifically remove systematic biases, like the overrepresentation of well-studied proteins, from general PPI networks. This is of pertinent concern because function information derived from such networks would be skewed towards well-studied genes

61 that are often evolutionarily conserved nodes, immune-related nodes or disease- associated nodes in PPI networks. Moreover, such biases can cloud quantitative assessments of whether a set of proteins of interest tend to interact with each other, and are therewith ‘physically cohesive’. For example, when genes that are upregulated under a specific condition encode proteins that physically interact with each other, this can be because they truly interact more with each other than a random set of proteins, or simply because there are just more interactions known for these proteins than for those whose genes are not highly expressed (von Mering et al., 2002).

To handle the bias that arises from the overrepresentation of certain proteins (e.g. well- studied proteins) in PPI networks, we present an approach called physical interaction enrichment (PIE). PIE extracts ‘random’ sets of proteins from a general PPI network that have the same node (protein) and edge (interaction) biases in the general PPI network as a set of proteins of interest. Secondly, it assesses whether the average degree of interaction among the proteins of interest is higher than that among the proteins from the random sets and thus quantifies how physically cohesive the proteins are.

To illustrate the usefulness of PIE, we first show how general human PPI networks have higher node degrees (i.e. number of interactions) for proteins encoded by morbid genetic disease genes. We also reveal biases in these networks for proteins encoded by genes that are stimulated by human metapneumovirus (HMPV) infection, a virus recently discovered to cause morbidity in very young and elderly people (van den Hoogen et al., 2001; van Diepen et al., 2010). The HMPV-induced genes are used here to represent a scientifically new and likely more focused context than the morbid genetic disease genes. Secondly, we demonstrate how PIE compensates for the enrichment biases in both the morbid genetic disease gene set and the sets of HMPV-induced genes. Thirdly, we assess physical cohesiveness of genes that are upregulated or downregulated to examine the biological coherence in such context. Furthermore, we apply PIE to other datasets in which the gene expression response of epithelial cells to a cytokine, interferon gamma (INFG) or other airway pathogens like Chlamydia pneumoniae, uv-irradiated Pseudomonas aeruginosa, uv-irradiated respiratory syncytial virus (RSV) have been measured. Finally, we use PIE to assess whether the propagation of interactions through PPI networks is biologically relevant. For the cases where it is relevant, we propagate these networks to larger networks and demonstrate the predictability of future expressed genes therein, thus showing exploratory potential in general PPI networks using PIE.

62 Measuring the physical cohesiveness of proteins using physical interaction enrichment

2. Methods

2.1 General PPI networks The PPI network used were built from an accumulation of human-curated PPIs obtained from the Biomolecular Interaction Network Database (BIND; Bader et al., 2003) (data downloaded in October 2006), the HPRD (Peri et al., 2003) (data of release 6 of January 2007), the IntAct database (Kerrien et al., 2007) (downloaded in May 2007), the Molecular Interactions Database (MINT; Chatr-aryamontri et al., 2007) (downloaded in May 2007) and the PDZBase database (Beuming et al., 2005) (downloaded in May 2007). For the scope of this study, only direct PPIs within the same species were used. We refer to the network composed of all interactions between human proteins as HsapiensPPI. Furthermore, interologous PPIs were built using the orthologues datasets from the Ensembl genome browser (Hubbard et al., 2007) (Ensembl release 44, downloaded on May 2007). These were combined with the HsapiensPPI dataset. We refer to this 3 comprehensive dataset as AllspeciesPPI. The HsapiensPPI contains 53 807 interactions between 10 826 proteins. The AllspeciesPPI network contains 205 050 interactions among 13 920 proteins. Unique to AllspeciesPPI are 151 243 interactions, among 3094 proteins. The main difference between the HsapiensPPI and AllspeciesPPI is that the former has fewer interactions per node than the latter. The high average degree in the AllspeciesPPI is in agreement with other studies that posit that preferential conservation of proteins with higher degree (hubs) leads to enrichment in protein complexes when interactions are transferred between organisms using interologs (Brown and Jurisica, 2007; Wuchty and Almaas, 2005). All the nodes in the HsapiensPPI and AllspeciesPPI networks represent the gene IDs of interacting proteins, and are not redundant in the networks. These PPI networks are large enough for the scope of our study. Unless otherwise stated, the HsapiensPPI is used in this article as the general PPI network.

2.2 Disease and immune-related data All human disease genes were obtained from the Morbid Omim database (downloaded February 10, 2009 from ftp://ftp.ncbi.nih.gov/repository/OMIM/morbidmap) (Sayers et al., 2009). The HMPV infection data was obtained from (Bao et al., 2008), as deposited in the NCBI Gene Expression Omnibus database with reference as GSE8961. Other data included that of human lung epithelial cell treatment with the cytokine INFG (GSE1815) (Pawliczak et al., 2005). Expression data of bronchial epithelial cells infected with respiratory pathogens like Chlamydia pneumonia (GSE7246) (Alvesalo et al., 2008) and uv-irradiated airway-pathogens (for P.aeruginosa and RSV; GSE6802; Mayer et al., 2007).

A geometric average of all probes for a gene was used to represent the fold change (FC) of a gene due to infection or treatment. A FC threshold of ≥1.5 was used to select genes

63 induced by the inert pathogens (UV irradiated), and this low threshold was necessary to yield adequately testable sample sizes. In all other cases, upregulated or downregulated genes used were those that showed a FC ≥3.0 after infection or treatment.

2.3 Enrichment of PPIs for disease genes and immune-related genes in general PPI networks To measure the biased enrichment in interactions for proteins of genetic disease genes in general PPI networks, the morbid Omim gene set (n=1996) was used as the genetic disease test set. Its enrichment was assessed as follows: the average node degree (nodes of degree zero inclusive) in the general PPI network for proteins encoded by the disease genes was compared with those of sets of 1996 genes that were randomly selected from all human genes (n=36456) that were available in NCBI Entrez gene database in 2008. The P-value of enrichment in interactions for proteins of disease genes was estimated as a fraction of the frequency (out of 1000 simulations) of the sets of random genes having an average degree that was equal or greater than that of the disease genes. As shown in Figure 1, disease genes have significantly (P<0.001) high node degrees in the general PPI networks HsapiensPPI and AllspeciesPPI.

To measure the biased enrichment in interactions for proteins of immune-related genes in the general PPI networks, genes that were upregulated at a FC threshold of 3.0 at various time points after HMPV infection were used as an example. The enrichment procedure for these representative immune-response genes was similar to that carried out for the morbid disease geneset. Apart from the gene set at the earliest infection time point (6 h), these immune-related sets of genes have significantly (P<0.01) high node degrees in general PPI networks (Fig. 1).

Overall, these results indicate that proteins of disease genes and immune-related genes have, on average, more interactions in general PPI networks than do proteins of randomly chosen genes.

2.4 PIE procedure Proteins of disease genes or of genes involved in immune response have relatively more interactions than random sets of genes in general PPI networks (Fig. 1). This can lead to a bias when measuring whether genes whose expression is e.g. triggered by a viral infection or involved in genetic disease tend to interact with each other. As such, one cannot simply compare the extent to which proteins of these genes interact with each other relative to randomly chosen proteins from the PPI network, thus dictating the need for appropriate random models (Koyuturk et al., 2007). In order to circumvent these biases in the number of proteins that are present in the PPI network and also in the number of interactions per protein in the PPI network, the PIE procedure is presented

64 Measuring the physical cohesiveness of proteins using physical interaction enrichment

3

Figure 1 - Proteins of disease genes and immune-related genes have higher node degree in general PPI networks than do proteins of randomly sampled genes.

The average node-degree of disease genes (represented by morbid Omim genes) and immune-related genes (represented by genes upregulated at least 3-fold in expression at various time points after HMPV infection) is observed in the HsapiensPPI (top panel) and the AllspeciesPPI (bottom panel) networks to be generally higher than randomly sampled genes from the genome. In the top and lower panels, hmpv6, hmpv12, etc. refer to gene sets upregulated in expression after 6 and 12 h of HMPV infection. In brackets are the P-value estimates of higher node degree (compared with random) of disease and immune-related genes in the general PPI networks.

as depicted in Figure 2. PIE measures the physical cohesiveness of interacting proteins via the strict randomization procedure described in detail below.

2.4.1 Derivation of test and random PPI networks from a general PPI network for PIE A test PPI network is derived from a set of test genes by selecting all the interactions from the general PPI network occurring between proteins encoded by the genes in that test set (Equation 1). An example of a test set is a set of genes that are upregulated in expression due to a viral infection. Next, for each node in the general PPI network, the

65 degree (i.e. number of interactions the node has in the network) is obtained. Thus, a degree distribution for the nodes in the general PPI network is derived. We call this the global degree distribution. From the global degree distribution, a test degree distribution consisting only of the test network nodes (and their associated degree in the general PPI network) is extracted. Subsequently, proteins that have the same degree as those in the test network are randomly selected from the global degree distribution. This is done such that for every degree in the test degree distribution, the number of randomly chosen nodes for that degree is the same as that of the test nodes for that degree. Moreover, the total number of nodes in the test set is ideally much smaller than the total number of nodes in the general PPI network (Equations 2 and 3). As such the total number of randomly chosen nodes is the same as that of the test nodes. Thus, both random and test have been normalized in the context of the general PPI network.

Essentially, the PIE randomization involves selection of nodes from the general PPI network that have the same node degrees as those of the test nodes in the general PPI network prior to constructing an induced subgraph by selecting all the interactions from the general PPI network that occur between the random nodes.

This randomization procedure presents a caveat regarding saturation of network sampling space. In the random sampling procedure, many random protein sets with identical degree distribution as the test set are extracted from the global network. When the test network becomes large and therewith largely identical to the global network, randomly sampled networks will largely overlap with the test network, and it will become meaningless to try to measure an increase in the number of interactions in the test network relative to the random networks. We measured the overlap of nodes between the test and random networks and also the overlap of interactions. In practice, the sets of proteins that we tested for physical cohesiveness contained <16% of the nodes of the random networks, while the overlap in the number of interactions was <13%. Only propagated test networks (see Section 3 below and Supplementary Material) contain a substantial fraction of the global network.

2.4.2 Measuring physical cohesiveness First, the average degree of the test network is calculated as the average number of interactions per node (Equation 4). Also, for the random network the average degree is calculated by dividing the number of interactions by the number of nodes. Next, the ratio of the average degree of the test network relative to the median average degree of the corresponding random networks (i.e. those obtained after PIE randomization) is calculated (Equation 5). This is the measure of physical cohesiveness or ‘PIE score’ for the set of test genes. Finally, the fraction of instances whereby the average degree of the random networks is larger or equal to the test network is calculated (Equation 6). This

66 Measuring the physical cohesiveness of proteins using physical interaction enrichment

strict empirical value represents the P-value for the physical cohesiveness for the test set of genes.

3

Figure 2 - Summary of the PIE procedure. In the PIE procedure, a test set of genes of interest, e.g. those upregulated by a virus, are used to extract a specific PPI network from the general PPI network that consists of proteins encoded by genes in that test set.

The number of interactions of each protein of the test set in the general PPI network (i.e. degree) is calculated. A number of proteins equal to the test set and with the same degree distribution as the test proteins in the general PPI network are then randomly selected from the set of all proteins in the general PPI network. As such the test set and the randomly selected set have the same degree distribution in the general PPI network. The average degree within the test set itself (i.e. total number of edges divided by total number of nodes in the test set) is then obtained and compared with that within the randomly selected set. The fraction of cases in which the average degree within a randomly selected set surpasses or equalizes that of the test network is recorded as the P-value for the physical cohesiveness of proteins of the genes of interest. The PIE score is the ratio of the average degree within the test network to the median average degree within the random networks.

67 2.5 Algorithm All calculations and graphs were obtained using Python scripting and R. Network graphs and Gene Ontology enrichment analyses were obtained using Cytoscape (Shannon et al., 2003) and the Cytoscape plugin BINGO version 2.3 (Maere et al., 2005), respectively. P-values reported for the gene enrichment analyses were gotten after Benjamini and Hochberg false discovery rate correction.

All network P-values indicate how often the average degree of a test network is less than that of random networks of equal size and equal global degree distribution as the test set. All calculations of correlations and associated P-values are Pearson moment correlations as implemented in the statistical package R. Other specific calculations are as described below.

2.5.1 Test PPI network

Given a general PPI network, G=(V,E) such that V is a set comprising NV vertices (nodes) t t t and E is a set comprising NE edges (links, degrees). A PPI network, G = (V ,E ) of a given test set of genes is constructed such that:

G t ⊆ G, induced subgraph (1)

That is, Vt are the vertices in G that are proteins encoded by the test genes and Et are the edges from G that connect all the Vt vertices. For clarity, each gene is represented by only one protein node.

2.5.2 Random PPI network for PIE The appropriate nodes used to construct the random networks used for comparison with the test network are obtained as follows. Let global degree, g be the number of edges (interaction partners) a node has in a general PPI network, and αg be a vector of all existing g. Given the general network G = (V,E) comprising a set V of NV vertices, with a set E of NE edges: we create another vector βg comprising Ng number-of-nodes for each distinct global node degree g in αg.

t t t Next, given a test network G = (V ,E ), we create a vector βt comprising Ngt number-of- t t nodes from V such that, G ⊆ G and αt ⊆ αg.

Finally, a corresponding (i.e. to the test network) random induced subgraph, G r =(V r ,E r ) r r is deduced from G such that G ⊆ G and βr =βt and also αr =αt , whereby V is randomly selected from V.

68 Measuring the physical cohesiveness of proteins using physical interaction enrichment

Resulting in:

(2)

i.e. total number of nodes of the test network and for a random network is the same, and have the same degree distribution.

Furthermore, for adequate sampling space of V r from V;

; such that (3) 3

2.5.3 Average Degree of PPI network The average degree ω of a network G= (V,E) is the average number of interaction per node. This is calculated as follows:

(4)

Where NE is the total number of edges linking nodes and NV is the total number of distinct vertices (nodes) in the PPI network.

2.5.4 Physical cohesiveness The score of physical cohesiveness ρ is calculated as follows:

(5)

Where ωt and ωr are as derived in (Equation 4) for a test network, and N random networks of identical global node-degree distribution (as derived in Equations 2 and 3), respectively. The denominator is the median of N random networks. Unless otherwise indicated, this N is 1000. To avoid division by zero error, 1 is added to both numerator and denominator.

2.5.5 P-value of physical cohesiveness The P-value of physical cohesiveness, Pvalueρ, is calculated as follows:

; if the numerator =0, Pvalueρ<1/N (6)

69 3. RESULTS

3.1 Assessment of enrichment of disease or immune-related protein interactions in general PPI networks As expected, when assessing the presence of genes involved in disease or in the immune system we observed a significant overrepresentation of interactions for the proteins of these genes. Morbid disease genes have a higher node degree than random (P<0.001). Likewise were the proteins of HMPV-induced genes (P<0.01) (Fig. 1).

3.2 Physical cohesiveness of disease and immune-related genes The PIE approach was tested on morbid genetic disease genes and genes that were upregulated in expression due to HMPV infection. The first set of genes represents the general context of genes relevant to human health and disease. The second set serves as an example of a more focused context and is selected using a more objective criterion, i.e. gene expression perturbations upon HMPV infection. Both sets of genes encode proteins with overrepresented interactions in general PPI networks (Fig. 1).

3.2.1 Morbid disease genes Proteins of morbid disease genes have more interactions with each other than do proteins of randomly chosen genes. The average number of interactions between morbid disease genes is 2.29 while that of the same number of randomly chosen genes is 1.15 (P<0.001). Nevertheless, the PIE procedure indicates that the morbid disease gene set has no physical cohesiveness (PIE score = 1), and this is not because the morbid disease gene network is quantitatively similar to the general network as it contains only 11% of the nodes, and 5% of the interactions in the general network. The morbid disease gene set has only 15% of its nodes, and 12% of its interactions in common with those of the random networks used in the PIE randomization approach.

Thus, the proteins encoded by morbid disease genes are not more likely to interact with each other than random proteins with the same degree distribution. Overall, the absence of physical cohesiveness for morbid disease genes using PIE might be expected because disease genes are involved in many different diseases and likely many different processes, and it is encouraging to see that PIE effectively corrects for such biases.

3.2.2 HMPV-induced genes Like the morbid disease genes, proteins of genes whose expression are induced after 12 h of HMPV infection also have a higher number of interaction with each other (average degree = 1.148) than the same number of randomly chosen genes (median average degree = 1.0). Unlike the case of morbid disease genes, however, the 12 h HMPV-induced genes do display significant physical cohesiveness (PIE score = 1.1, P=0.04) when the

70 Measuring the physical cohesiveness of proteins using physical interaction enrichment

latter is measured using PIE. The physical cohesiveness of the nodes of the HMPV-induced genes therefore does not depend on their overrepresentation in general PPI networks (Fig. 1) in which it contains only 0.5% of the general network nodes and 0.12% of the interactions. Furthermore, both average degree and physical cohesiveness depend on some criteria of severity of HMPV infection. For instance at a gene expression FC threshold of 3.0, physical cohesiveness increases with longevity of the infection (Fig. 3). In line with the changes in physical cohesiveness, the immune response might become more specific in the time course of infection. For example, apoptosis starts to play a more prominent role later in infection as reflected by the enrichment of the GO term ‘apoptosis’ for genes that are upregulated at a FC threshold of 3.0. At various time points of infection, the process apoptosis is observed to change in significance as follows: 6 h (absent), 12 h (P=2.58e-2), 24 h (P = 1.94e-2), 48 h (P = 2.68e-4) and 72 h (P = 7.39e-4). In addition, key apoptosis marker genes like MDA-5 and RIG-1 were found using western blotting to increase over time of HMPV infection as seen in the data of Bao and others 3

Figure 3 - Timewise variation in physical cohesiveness of HMPV infection induced genes.

(A) Changes in average degree for the infection (squares with black line) and random (boxplots with grey line). (B) Changes in physical cohesiveness measured as PIE scores with P-values of cohesiveness in brackets.

71 (i.e. the HMPVdata used in this article) (Bao et al., 2008). In addition to the temporal cohesiveness of upregulated genes, it is interesting to examine the cohesiveness of infection and immune system related genes in other regulatory contexts.

3.2.3 Other infection related datasets The 12 h HMPVperturbed genes were cohesive, not only for upregulated genes (PIE score = 1.1, P = 0.04) but also for downregulated genes (PIE score = 1.4, P<0.001), indicating the ability of PIE to effectively reveal physical cohesiveness in different regulatory settings. We further explored the physical cohesiveness of infection and immune system related genes by analyzing the upregulated and downregulated genes in a number of relevant datasets: (i) genes that are affected in expression after 24 h treatment of bronchial epithelial cells with IFNG (Pawliczak et al., 2005), (ii) genes affected by cells infected for 4 h with UV-irradiated RSV or pseudomonas (Mayer et al., 2007) and (iii) genes that are affected after 24 h infection by Chlamydia pneumonia (Alvesalo et al., 2008). Both the IFNG and the UV-irradiated pathogens led to a significant PIE score for upregulated genes [PIE score = 1.14, P = 0.02 (IFNG), PIE score = 1.35, P<0.001 (RSV) and PIE score = 1.19, P = 0.022 (Pseudomonas)], while only for IFNG did the downregulated genes display physical cohesiveness (PIE score = 1.17, P = 0.032). In contrast, PIE indicates no physical cohesiveness for genes that were upregulated or downregulated after Chlamydia infection of human lung epithelial cells. The absence of physical cohesiveness of genes perturbed by Chlamydia might be partly explained by the ability of this intracellular parasite to de-modulate host-cell responses, e.g. its abrogation of apoptosis in epithelial cells (Airenne et al., 2002; Hacker et al., 2006), thus resulting in PPI networks that are less or equally dense as random networks. In this light, the significant physical cohesiveness of the upregulated genes after infection with UV-irradiated pathogens is particularly interesting, as these cannot be the result of the direct interference of the pathogen with the gene regulation, but, in contrast point to the cellular program that appears to be triggered by the infection.

3.3 Correlation of physical cohesiveness with prediction of downstream pathway genes in propagated networks We also examined to what extent proteins that interact with the proteins of upregulated genes are physically cohesive (Supplementary Material).We observed that a one step propagation of networks of genes upregulated in expression by HMPV or IFNG led to physically cohesive networks (Supplementary Fig. 1). Nevertheless, the overlap between random networks with the same degree distribution and the propagated network becomes substantial (25%). At higher levels of propagation, this overlap becomes too large to assess a significance value for the PIE score. Comparison between the one step propagated networks of genes induced by the HMPV virus at time points 12, 24 and 48 h and the genes overexpressed at the next time point indicated a significant overlap

72 Measuring the physical cohesiveness of proteins using physical interaction enrichment

(P<1.0e-3), showing some predictive capacity of such propagation (Supplementary Table 1). Overall, the sensitivity of prediction is 5–41% and the positive predictive value is 5–11%. There is a positive, albeit insignificant correlation between the PIE score of precursor networks and the predictability of genes in their propagated networks. For instance, the correlation between the PIE score and sensitivity of predictions is 0.93 (P=0.066). These results demonstrate the usefulness of the PIE approach to assess the biological coherence of a set of genes and the potential to use general PPI networks in an exploratory manner to predict genes of relevance to a process being studied.

4. Discussion

In order to explore PPIs in a quantitative manner, we have developed a method called PIE, to circumvent interaction enrichment biases in context-specific networks derived 3 from general PPI networks. PIE employs a randomization procedure that appropriately considers the global degrees (in a general PPI network) of the extant nodes in a context- specific test network, prior to assigning physical cohesiveness to the test network.

We have focused on the context of disease and immunity because many proteins have been studied in this area. We observe that there are significantly (P<0.01) more interactions for proteins of morbid disease genes and HMPV-induced genes than random expectation in general PPI networks. This observed enrichment for PPI of morbid disease genes and HMPV-induced genes is in agreement with other studies indicating that disease-based inquisitional biases have an influence on the topology of general PPI networks (Oti et al., 2006; Wachi et al., 2005). The basis for this enrichment can be biological, in the sense that disease genes or genes that are triggered upon infection are simply more likely to have physical interactions. Nevertheless, it is not unlikely that this enrichment is at least partly caused by an experimental bias in research efforts. Based on these premises, we investigated the physical cohesiveness of the proteins of these sets of genes using the PIE approach.

The PIE approach reveals no physical cohesiveness among the morbid disease genes, by taking into account that these genes are significantly enriched in general PPI networks. Other studies examining global topological properties of protein encoded by disease genes have focused on cancer genes. In this light, greater degrees and centralities of cancer genes in comparison to non-cancer genes within the interactome have been observed (Jonsson and Bates, 2006; Wachi et al., 2005). In contradiction to the previous observation, Goh et al. (2007) have shown that the majority of disease genes do not actually show a tendency to code for highly interacting proteins but instead the apparent correlation between high degrees and disease genes is entirely due to the

73 ~22% overlap between disease genes and essential genes, the latter set of genes being mainly hubs (Goh et al., 2007). The methods used by the previous authors were not similar to the PIE approach. Nevertheless, PIE further clarifies this discrepancy in the literature by indicating that globally, there is no physical cohesiveness between the proteins of morbid disease genes, when taking into account their high degrees in PPI networks. This suggests that genes that are not associated with similar disorders, even if their protein products have many interactions in PPI networks, show negligible biological coherence and advocates the existence of distinct, disease-specific functional modules.

On the other hand, the PIE approach reveals significant physical cohesiveness for HMPV-induced genes. In general, both average degree and the physical cohesiveness increased in the time course of infection. The observed increase in physical cohesiveness in the course of HMPV infection suggests the existence of biologically coherent functional modules, like those for apoptosis (van Diepen et al., 2010), being elicited in response to the infection. This observation is in agreement with other interactome– transcriptome studies that posit that there is a correlation between transcription pattern similarities of a pair of genes and there being an interaction between their protein products (Ge et al., 2001; Hahn et al., 2005; Wachi et al., 2005).

Moreover, the biologically coherent information observed so far using PIE sets the premise to predict genes that might be relevant to HMPV-infection biology but not yet expressed at the particular time point of infection being studied. We observe that the propagation of a physically cohesive network rapidly leads to a less cohesive network; a phenomenon that is likely due to the change in context from which the genes in the original network were chosen to the global context of the general PPI network. This observation is in agreement with other studies indicating that there is a correlation between network distance (distance apart in a PPI network) and functional distance (semantic similarity in functional category) between proteins in a PPI network. That is, the closer two proteins are in a PPI network, the more similar are their function annotations (Sharan et al., 2007). In this light, we could predict 5–41% of the genes that would be overexpressed at future time points of the infection. Although the positive predictive values for these predictions are low (<11%), mainly due to the rapid growth of the propagated networks, they are significant (P ≤0.001, Supplementary Table 1) even after randomizations based on networks of the same sizes and degree distributions as those of the test networks (i.e. the PIE approach).

PIE is different from general methods designed to derive functional insights from a set of genes by integrating gene lists with general PPI networks mainly in the sense that most of these methods are aimed at decomposing the network into smaller clusters or

74 Measuring the physical cohesiveness of proteins using physical interaction enrichment

functional modules (Sharan et al., 2007). As such it is difficult to directly compare PIE with other methods. Notwithstanding, PIE is a very strict and globally unbiased approach. Even though PIE scores are in general not very high (the increase in the average number of interactions relative to the random networks is about 5–10%) they can nevertheless be deemed significant or not (i.e. they are informative). Moreover, the PIE procedure is very reliable, in the sense that it mimics the degree distribution of the network that is being tested exactly. A disadvantage of this approach is that it limits the number of alternative, independent networks that can be extracted from the global network for comparison with the network under investigation. An alternative would be to relax this constraint slightly by either modeling or binning the degree distribution and extracting networks with that modeled or binned distribution. This would also have the advantage that the physical cohesiveness of networks with many high degree nodes could be assessed (Koyuturk et al., 2007). 3 Contingencies with respect to the interactions, like inhibition or stimulation, are not available at a scale that allows systematic comparisons of networks. Such information would of course make the networks more specific, allowing more meaningful comparisons with respect to their biological cohesiveness. Nevertheless, regardless of the source of experimental or inquisitional bias, PIE circumvents gene enrichment biases in a global manner as has been shown here using data of morbid genetic disease genes, virus-perturbed genes (HMPV), cytokine-perturbed genes (IFNG), bacteria-per- turbed genes (C.pneumoniae), and even genes perturbed by inert pathogen material (uv-irradiated RSV and P.aeruginosa). PIE can in principle be applied to any given set of genes to estimate the overrepresentation of protein interactions as a proxy for their biological cohesiveness.

Acknowledgements We are grateful to the reviewers for their constructive remarks. Funding: This work was supported by the VIRGO consortium, an Innovative Cluster approved by the Netherlands Genomics Initiative and partially funded by the Dutch Government (BSIK 03012), The Netherlands. Conflict of Interest: none declared.

Supplementary Material for this chapter is available in Appendix 1

75 References

Airenne,S. et al. (2002) Chlamydia pneumoniae inhibits apoptosis in human epithelial and monocyte cell lines. Scand. J. Immunol., 55, 390–398. Alvesalo,J. et al. (2008) Microarray analysis of a Chlamydia pneumoniae-infected human epithelial cell line by use of gene ontology hierarchy. J. Infect. Dis., 197, 156–162. Bader,G.D. et al. (2003) BIND: the biomolecular interaction network database. Nucleic Acids Res., 31, 248–250. Bao,X. et al. (2008) Identification of human metapneumovirus-induced gene networks in airway epithelial cells by microarray analysis. Virology, 374, 114–127. Beuming,T. et al. (2005) PDZBase: a protein-protein interaction database for PDZdomains. Bioinformatics, 21, 827–828. Bjorklund,A.K. et al. (2008) Quantitative assessment of the structural bias in protein-protein interaction assays. Proteomics, 8, 4657–4667. Brown,K.R. and Jurisica,I. (2007) Unequal evolutionary conservation of human protein interactions in interologous networks. Genome Biol., 8, R95. Chatr-aryamontri,A. et al. (2007). MINT: the Molecular INTeraction database. Nucleic Acids Res., 35, D572–D574. Collins,S.R. et al. (2007) Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae. Mol. Cell Proteomics, 6, 439–450. Cusick,M.E. et al. (2009) Literature-curated protein interaction datasets. Nat. Methods, 6, 39–46. Dittrich,M.T. et al. (2008) Identifying functional modules in protein-protein interaction networks: an integrated exact approach. Bioinformatics, 24, i223–i231. Ge,H. et al. (2001) Correlation between transcriptome and interactome mapping data from Saccharomyces cerevisiae. Nat. Genet., 29, 482–486. Goh,K.I. et al. (2007) The human disease network. Proc. Natl Acad. Sci. USA, 104, 8685–8690. Hacker,G. et al. (2006) Apoptosis in infectious disease: how bacteria interfere with the apoptotic apparatus. Med. Microbiol. Immunol., 195, 11–19. Hahn,A. et al. (2005) Confirmation of human protein interaction data by human expression data. BMC Bioinformatics, 6, 112. Huang,T.W. et al. (2007) Reconstruction of human protein interolog network using evolutionary conserved network. BMC Bioinformatics, 8, 152. Hubbard,T.J. et al. (2007) Ensembl 2007. Nucleic Acids Res., 35, D610–D617. Ideker,T. and Sharan,R. (2008) Protein networks in disease. Genome Res., 18, 644–652. Jeong,H. et al. (2001) Lethality and centrality in protein networks. Nature, 411, 41–42. Jonsson,P.F. and Bates,P.A. (2006) Global topological features of cancer proteins in the human interactome. Bioinformatics, 22, 2291–2297. Karni,S. et al. (2009) A network-based method for predicting disease-causing genes. J. Comput. Biol., 16, 181–189. Kerrien,S. et al. (2007) IntAct–open source resource for molecular interaction data. Nucleic Acids Res., 35, D561– D565. Koyuturk,M. et al. (2007) Assessing significance of connectivity and conservation in protein interaction networks. J. Comput. Biol., 14, 747–764. Maere,S. et al. (2005) BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics, 21, 3448–3449. Mayer,A.K. et al. (2007) Differential recognition of TLR-dependent microbial ligands in human bronchial epithelial cells. J. Immunol., 178, 3134–3142. Oti,M., et al. (2006) Predicting disease genes using protein-protein interactions. J. Med. Genet., 43, 691–698. Pawliczak,R. et al. (2005) Influence of IFN-gamma on gene expression in normal human bronchial epithelial cells: modulation of IFN-gamma effects by dexamethasone. Physiol. Genomics, 23, 28–45. Peri,S. et al. (2003) Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res., 13, 2363–2371. Rual,J.F. et al. (2005) Towards a proteome-scale map of the human protein-protein interaction network. Nature, 437, 1173–1178.

76 Measuring the physical cohesiveness of proteins using physical interaction enrichment

Sayers,E.W. et al. (2009) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 37, D5–D15. Shannon,P. et al. (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res., 13, 2498–2504. Sharan,R. et al. (2007) Network-based prediction of protein function. Mol. Syst. Biol., 3, 88. Stelzl,U. et al. (2005) A human protein-protein interaction network: a resource for annotating the proteome. Cell, 122, 957–968. Tornow,S. and Mewes,H.W. (2003) Functional modules by relating protein interaction networks and gene expression. Nucleic Acids Res., 31, 6283–6289. Tucker,C.L. et al. (2001)Towards an understanding of complex protein networks. Trends Cell Biol., 11, 102–106. van den Hoogen,B.G. et al. (2001) A newly discovered human pneumovirus isolated from young children with respiratory tract disease. Nat. Med., 7, 719–724. van Diepen,A. et al. (2010) Quantitative proteome profiling of respiratory virus-infected lung epithelial cells. J. Proteomics, 73, 1680–1693. 3 Vidal,M. (2001) A biological atlas of functional maps. Cell, 104, 333–339. von Mering,C. et al. (2002) Comparative assessment of large-scale data sets of protein-protein interactions. Nature, 417, 399–403. Wachi,S. et al. (2005) Interactome-transcriptome analysis reveals the high centrality of genes differentially expressed in lung cancer tissues. Bioinformatics, 21, 4205–4208. Wuchty,S. and Almaas,E. (2005) Peeling the yeast protein network. Proteomics, 5, 444–449. Yosef,N. et al. (2008) Improved network-based identification of protein orthologs. Bioinformatics, 24, i200–i206. Yu,H. et al. (2004) Annotation transfer between genomes: protein-protein interologs and protein-DNA regulogs. Genome Res., 14, 1107–1118.

77

4

Transcriptomic assessment of cellular response to intracellular pathogens reveals an iron-depletion pro le

Iziah Edwin Sama1, Apriliana E. R. Kartikasari2,3, Dorine W. Swinkels2, Arno Andeweg4, Martijn A. Huynen*1

1 Centre for Molecular and Biomolecular Informatics, Nijmegen Centre for Molecular Life Sciences, Radboud University Nijmegen Medical Centre, Nijmegen 2 Department of Clinical Chemistry, Radboud University Nijmegen Medical Centre, Nijmegen 3 Department of Medicine-Endocrinology, University of California, Los Angeles, California, USA 4 Department of Virology, Erasmus Medical Center, Rotterdam, The Netherlands

(Manuscript in preparation) Abstract

Iron is an essential nutrient for microbes. During infection, invading pathogens compete for iron with their host cells. However, it is largely unknown whether host innate-immune cells use iron for their own functionality as immune effectors, yet keep it from being used by the invading microbes. Herein, we developed a computational method called Iron Squelch to investigate iron depletion tendencies by host innate-immune cells after challenges with various pathogens. The method compares the transcriptomes of infected cells with those of an infection-free iron model. The so-called Iron Squelch Indexes of gene sets indicate whether at the transcription level, infection mimics iron-depletion or iron-overload or none. Iron Squelch analysis reveals that intracellular pathogens (human metapneumovirus, respiratory syncytial virus, measles virus, parainfluenza virus, influenza A virus and Chlamydia pneumoniae) lead to an iron- depletion profile in infected innate-immune cells. Interestingly, an iron-depletion profile is also observed in cells infected with uv-irradiated viruses, suggesting iron-depletion as an immediate response of innate-immune cells when ‘sensing’ the presence of intracellular pathogens. Our analysis further indicates that interferon-gamma also induces an iron-depletion profile, suggesting that the pathway is interferon-mediated. Furthermore, using transcription factor binding site analysis, we identified sites for interferon inducing transcription factors in the promoter sequences of infection-in- duced and iron-response genes. This provides further support for the iron-depletion profile in infected cells being interferon-mediated. Intriguingly, we observed no iron-depletion profile for cells infected with extracellular bacteria (Escherichia coli, Pseudomonas aeruginosa), nor with lipopolysacchride (LPS), an antigenic component of gram negative extracellular bacteria. These observations suggest that iron restriction is an intrinsic cellular response when innate-immune cells sense the presence of intracellular pathogens.

80 Cellular iron-depletion profile for intracellular pathogens

1. Introduction

Iron is an essential micronutrient that is required for proper functioning of both microbes and host cells. For instance, it is required as a cofactor in iron-sulfur cluster containing proteins that are crucial in energy conversion in the mitochondria, in hemoglobin for oxygen transport, and in ribonucleotide reductase for DNA biogenesis 1-3. Previously, several studies have shown that iron-depletion ameliorates certain infections while iron-overload aggravates the infections 4-8. However, the link between iron-homeostasis and infections is controversial, as there are conflicting viewpoints that both iron-overload 9 and iron-depletion 8, 10 could subdue infections. This discrepancy suggests that the mechanisms whereby iron influences the crosstalk between host innate-immune response and microbial growth has not been fully elucidated. Furthermore, it is unknown whether, and how innate-immune cells would modulate their own iron homeostasis state when challenged by intracellular or extracellular pathogens.

The epithelial cells lining the respiratory tract are the first line of innate-immune defense against respiratory pathogens, including viruses and bacteria. These cells have also been successfully used to study iron-homeostasis 11. Furthermore, the availability of an epithelial cell iron-model 12 makes these cells an elegant model to study the immediate interaction between host innate-immune cells and pathogens, e.g. on whether the innate-immune cells modulate their iron homeostasis in order to function as immune 4 effectors, as well as to reduce the availability of iron to the invading microbes.

Respiratory tract pathogens contribute to morbidity and mortality in the young, elderly and immune-compromised individuals 13-17 necessitating further investigations into the pathogenesis of these pathogens. Several efforts have been made to understand the molecular mechanism underpinning these diseases by identifying genes or proteins or pathways that are perturbed in expression in infected cells relative to non-infected cells 16, 18-23. For example, protein expression profiles of lung epithelial cells (A549 cells) infected with respiratory syncytial virus (RSV), human metapneumovirus (HMPV), parainfluenza virus (PIV), and measles virus (MV) reveal apoptosis as the principal innate-immune response that is mounted after infection with these viruses 18.

In this study, we compare the cellular transcription profiles of genes that are perturbed in expression after iron-depletion or iron-overload with their expression profile in cells infected by intracellular or extracellular pathogens of the respiratory tract. We developed a novel method called Iron Squelch (Figure 1), to examine the iron-related transcription profiles of epithelial cells, following infections with HMPV, RSV, PIV, MV and Pseudomonas, amongst others. As reference, we used the gene expression data from an infection-free epithelial cell iron study 12 and infer that overall, an iron-depletion transcriptome is an

81 Figure 1 - Scheme of Iron Squelch principle.

Summing up the total extent of gene expression fold change of a given set of consistent iron-response genes can lead to the following scenarios: (a) down-regulation of the expression of a set of genes during iron-depletion as well as during infection, but up-regulation of those genes during iron-overload; (b) up-regulation of gene expression during iron-depletion as well as during infection, but an overall down-regulation of those genes during iron-overload; (c) down-regulation of gene expression during iron-overload as well as during infection, but an up-regulation of the genes during iron-depletion; (d) up-regulation of gene expression during iron-overload as well as during infection, but down-regulation of those genes during iron-depletion. In scenarios a and b, infection mimics iron-depletion, in scenarios c and d it mimics iron-overload. All other scenarios are irrelevant to assess iron-related mimicry for our test set of genes. For example, scenarios where the overall extent of gene expression is similar during iron-depletion and iron-overload; i.e up-regulated (e, f) or down-regulated (g, h) under both conditions, are not informative from the iron-response perspective and are therefore not included in our analyses of test set of genes, irrespective of the extent of regulation of the genes during infection. intrinsic response of cells infected with live or uv-irradiated intracellular pathogens, unlike extracellular pathogens. An iron-depletion transcription profile was also observed in interferon gamma-treated cells; suggesting that the mechanism underpinning iron- depletion could be interferon mediated.

82 Cellular iron-depletion profile for intracellular pathogens

2. Materials and methods

2.1 Iron Model We used the microarray data (GSE3573) of Chicault and coworkers 12 as a model for the effects of iron overload or iron depletion on the transcriptome. Briefly, we used the gene sets that were up-regulated in caco-2 cells treated with hemin for 24 h (DMEM-Hemin) relative to their expression in cells treated with the same medium without hemin (DMEM-FBS) to assess iron-overload. Likewise, cells cultivated in the iron-free medium (SF-0) for 48 h were compared with those treated with iron-free medium supplemented with FAC (SF-FAC) to assess iron-depletion.

2.2 Iron Squelch analysis

Firstly, by “Squelch” we mean summing up the log2 fold changes of a given set of genes. When the set of genes is selected based on gene expression signals in the iron model, we refer to it as Iron Squelch. In this manuscript, we provide results from two types of complementary Iron Squelch analyses: Raw Squelch (our normal Squelch method) and Absolute Squelch.

2.2.1 Raw Squelch In Raw Squelch analysis, (i) we intersect two microarray data (the iron-model data and another data set of interest (e.g. infection data)) at the level of common genes, to yield 4 a dataset with common genes (Equation 1). (ii) From this common dataset, we select all genes that are up-regulated upon iron-overload and down-regulated upon iron- depletion (Equation 5) or down-regulated upon iron-overload and up-regulated upon iron-depletion (Equation 6) at a fold change (FC) threshold of log2(1.5). We call these the genes of interest (GOI; Equation 2-7). (iii) We finally squelch the combined data by directly summing up their log2 transformed relative gene-expression values in the dataset of interest (e.g. infection data) to yield an Iron Squelch Index (ISI) for infection (Equation 8). Furthermore, to serve as positive or negative controls, we also calculated the ISI for iron-overload (Equation 9) and iron-depletion (Equation 10) for the same set of genes to compare with the ISI of infection. Since the gene-expression fold changes have been log-transformed, the ISI values could be positive or negative numbers. A positive ISI indicates overall up-regulation of the genes set while a negative ISI indicates overall down-regulation. (iv) In order to estimate a p-value for the ISI and whether it resembles that of iron-depletion or iron-overload, we randomly select (from the complete intersected array data, Equation 1), the same number of genes as in the GOI set and calculate their Squelch Indexes (SIs; Equation 11-13; using expression values from their corresponding microarray datasets. The SI for a random gene set is computed irrespective of the iron-response pattern (thus SI and not ISI) of the genes. This process is repeated 1000

83 times, and the p-value is estimated as the fraction of times that the random sets of genes (background) has a Squelch Index that is greater than or equal to the ISI of their corresponding tests (infection, iron-overload, or iron-depletion); given that the ISI of the infection data for iron-depletion is greater than that for iron-overload (Equation 14a); otherwise, the reverse comparison is done (Equation 14b). If the ISI for iron-depletion and iron-overload were the same, the iron-response genes are deemed not informative, and the p-value is assigned as “NA”, not available (Equation 14C). As such, if the p-value is close to 1, the expression profile of test data does not mimic that of iron-depletion. If it is close to zero, the profile resembles that of iron-depletion. The Raw Squelch significance test assesses whether, e.g. upon infection the genes that are selected because of their consistent behavior upon iron-depletion and iron-overload (i.e. being minimally 1.5 fold up-regulated during iron-depletion but down-regulated during iron-overload, or vice versa) show a larger change in gene expression than any random set of genes. As positive controls we use the gene expression changes upon iron-depletion or iron-overload. Any random set of genes will show a smaller effect on gene expression upon iron-depletion than the ones that have been pre-selected for consistent behavior under iron-depletion and iron-overload. As such, in a test for iron-depletion mimicry, a significant p-value for the ISI of iron-depletion serves as a positive-control while a non-significant p-value for the ISI for iron-overload serves as a negative-control. The ISI calculated using the Raw Squelch approach is best interpreted with its associated p-value. Unless otherwise stated, any mention of Iron Squelch Index in the manuscript text refers to that calculated from the Raw Squelch approach. Using this principle, we are pulling genes from other datasets and assessing the overall extent to which they are similarly regulated in the iron-model.

2.2.2 Absolute Squelch In the Raw Squelch analysis, we select reliable iron-response genes by using a high fold change threshold for iron-depletion and iron-overload. This however can lead to low number of iron-response genes because the cellular response to iron is limited. In

Absolute Squelch analysis we (i) lower the fold change threshold from log2(1.5) to log2(1.2) to yield a set of iron-response genes (i.e. FC=log2(1.2) in Equation 5 and 6). (ii) From these iron-response genes we select only genes that are regulated in similar ways in the infection data of interest as in during iron-depletion (Equation 15) or iron-overload (Equation 16). (iii) An Absolute ISI (AISI) is then calculated for the genes whose expression profile resembles that of iron-depletion (Equation 17) or iron-overload (Equation 18). The term “absolute” is used here because in the calculation of the measure of similarity, we take the sum of the overall extent of up-regulation of genes (a positive number), and the absolute number (a positive number) of the overall extent of down- regulation (a negative number) of genes.

84 Cellular iron-depletion profile for intracellular pathogens

The Absolute Squelch approach results in obtaining the iron-related gene expression profile from several genes that respond in similar ways as in the iron-depletion or iron-overload models. Thus if the magnitude of AISI for the genes in the infection data whose expression were similar to those of genes under iron-depletion is greater than that for genes regulated in a similar way during iron-overload, we state that the test set of genes mimics iron-depletion. If the AISIs are similar, then the test set of genes are not informative. The log2(1.2) fold change used here is quite low and can potentially increase noise in the data. However, deliberately selecting those genes that respond in similar ways (rather than pulling all genes) in the iron-model and infection data, could reduce the level of noise generated. Therefore an agreement in the predicted profile using Raw and Absolute Squelch analyses emphasizes the usefulness of the squelching principle for transcriptomics data.

2.2.3 Algorithm The equations used for Iron Squelch analyses are as follows:

(1)

(2) 4

(3)

(4)

(5)

85 (6)

(7)

(8)

(9)

(10)

(11)

(12)

(13)

(14a)

(14b)

86 Cellular iron-depletion profile for intracellular pathogens

(14c)

(15)

(16)

(17)

(18) 4

2.3 Microarray data The following microarray data sets were obtained from NCBI gene-expression omnibus database (GEO) and used as normalized therein. (i) human metapneumovirus (HMPV) infection of airway epithelial cells (A549 cells) at 6, 12, 24, 48, and 72 hours post-infection, (data accession number GSE8961) 24 . (ii) Respiratory syncytial virus (RSV) infection of human bronchial epithelial cells (BEAS-2B cells) for 4 and 24 hours, (data accession number GSE3397, contributed by Huang et al, 2001). (iii) Measles virus (MV) infection of monocyte-derived dendritic cells at 3, 6, 12, and 24 hours post-infection (data accession number GSE980) 25. (iv) Influenza A infection of mice (accession number of the pulmonary cells GSE5289) 26. We translated the mouse influenza-induced expression data to homologous human gene-expression using the NCBI HomoloGene homology database downloaded in November 2006. (v) Chlamydia pneumonia infection of human epithelial cell line for 12, 24, 48 and 72 hours (GSE7246) 27. (vi) Probiotic E. coli Nissle 1917 infection of caco2-cells (intestinal epithelial cells) for 6 hours (data accession number GSE2232, the human dataset was used here) 28. (vii) Pseudomonas aeruginosa infection of airway epithelial cells for 90 minutes (data accession number GSE4485) 29. (viii) UV-irradiated

87 airway-pathogens treatment of bronchial epithelial cells for 4 hours, (data accession number GSE6802) 20; the pathogens concerned are uv-inactivated Staphylococcus aureus, Pseudomonas aeruginosa and Respiratory Syncytial Virus. (ix) Lipopolysaccharide (LPS) treatment of airway alveolar epithelial cells for 1 and 4 hours (data accession number GSE1541) 30. (x) Interferon gamma (IFNG) treatment of human bronchial epithelial cells for 4 and 24 hours (data accession number GSE1815) 31. (xi) Iron-model (data accession number GSE3573) 12 made from infection-free iron-treatments of caco-2 cells. In-house microarray data were kindly provided by Dr Arno Andeweg at Erasmus MC, Rotterdam. For all the microarray datasets, the fold changes of genes were obtained by dividing gene response of treated to untreated samples. Geometric means of all fold changes (log2) were used for all probesets pertaining to a gene. That average value is the effective fold change measures used per gene. Unless otherwise stated, we used a fold change threshold of ≥1.5 or ≤1/1.5 to determine up-regulation or down-regulation respectively.

2.4 Additional data and Statistics All “iron” related genes were obtained from NCBI GO database of 2008. There were a total of 309 such genes. Chi-squared p-values were obtained using Pearson’s Chi-squared test with Yates’ continuity correction as implemented in the statistical package R. The P-values for two sampled tests were obtained using Wilcoxon paired test as implemented in the statistical package R.

2.5 Transcription factor binding site analysis For transcription factor binding sites (TFBS) determination, we obtained sequences 2000 base pairs upstream of the promoter of a given set of genes using the Ensembl Genome Browser track of refseq sequences of the March 2006 human genome assembly 32, 33. We determined Transcription Factor Binding Sites (TFBS) in these 2000bp upstream regions using the Clover (Cis-element over representation) algorithm 34. Briefly, Clover screens a given set of DNA sequences against a precompiled library of TFBS motifs and reports the motifs that are statistically over-represented or under-represented. As input to Clover, we used 263 human TFBS motifs garnered from JASPAR core 2005 35, and TRANSFAC 36 version 7.0 as the precompiled library of TFBS motifs. We used sequences 2000bp upstream of promoters of all human genes as background. All TFBS results reported were statistically over-represented TFBS motifs that were obtained at a p-value cutoff of <=0.05 against background and the default motif score threshold of >=6. Using in-house Python scripts, we further provide graphical displays of the over-repre- sented TFBSs that were present in the 2kb upstream region of the set of genes being examined.

88 Cellular iron-depletion profile for intracellular pathogens

3. Results

3.1 Genes perturbed in expression in infected epithelial cells include iron-related genes To determine whether iron-related genes are affected in expression in infected epithelial cells, we queried the Gene ontology (GO) database at NCBI (http://www.ncbi.nlm.nih. gov), and the infection-free iron epithelial cell model of Chicault and colleagues 12 for genes that we found to be perturbed in expression (i.e. at least 1.5-fold up-regulated or down-regulated) after virus infection of epithelial cells. We identified 31 out of 277 genes that have “iron” as a keyword in their description in GO, to be perturbed ( ≥1.5 fold) in expression in epithelial cells post infection with HMPV, RSV, PIV or MV relative to un-infected control cells (Supplemental Figure 1). These 31 (out of 1167) virus-perturbed genes, highlight a small but significant crosstalk (chi-squared p= 2.94e-02) between cellular response to virus and iron-related processes.

Furthermore, we identified 52 out of 299 genes that are perturbed either during iron- depletion or during iron-overload in the infection-free iron model of Chicault and others 12, to be also perturbed in expression post virus infection (Figure 2). These 52 (out of 1167) virus-perturbed genes, also indicate a significant crosstalk (chi-square p=3.35e-09) of cellular gene expression response to virus and iron. 4 3.2 Innate-immune cells that are infected with intracellular pathogens show an iron-depletion transcriptome profile The significant overlap of iron-perturbed and virus-perturbed genes suggests a crosstalk between iron homeostasis and innate-immune responses to the viruses examined. However, it is not clear whether the gene expression of infected cells mimics an iron-depletion or iron-overload situation. To determine this, we measured the iron-profile of infected cells using Iron Squelch analysis (details in Methods). Briefly, this method compares the overall gene expression response to infection with the expression response of the same genes to iron-overload or iron-depletion. The genes examined are those that behave consistently during iron-depletion and iron-overload: i.e. they are either up-regulated (≥ log2(1.5) fold) during iron-depletion and down-regulated during iron-overload, or they are down-regulated during iron-depletion and up-regulated during iron-overload. Using this set of consistent iron-response genes, we compare the sum of their log2 gene expression fold change (Iron Squelch Index; Equations 8-10) under conditions of (1) iron-depletion relative to control, (2) iron-overload relative to control and (3) infection relative to non-infection control. Subsequently, we judge using the Iron Squelch Indexes of iron-depletion and of iron-overload whether the gene expression profile of infection mimics that of iron-depletion or that of iron-overload. To achieve this, we compare the sum of log2

89 Figure 2 - The 52 virus-perturbed genes that are also perturbed by iron-overload or depletion, rsv6 and rsv6uvd refer to data of 6 hours of live and uv-irradiated RSV infection Respectively, RSV, respiratory syncytial virus; HMPV, human meta- pneumovirus; MV measles virus; PIV, parainfluenza virus; depletion, iron- depletion; overload, iron-overload. (Red >=1.5 fold up-regulated; blue >=1.5 fold down-regulated; white not up to 1.5 fold perturbed relative to control cells.) gene expression change of random set of genes from a test situation (e.g. infection) with that of the consistent iron-depletion/iron-overload selected set of genes. We compare this value against that obtained from the condition wherein the set of genes changed the most (generally the depletion set of genes) and we check whether the

90 Cellular iron-depletion profile for intracellular pathogens

sum of changes is larger. P-values for these similarities are estimated as the number of instances out of 1000, whereby a random set of genes (background) of the same size as the test set, has a Squelch Index that is higher than that of the test set of consistent iron-response genes. To serve as controls, we can also do that for random genes affected in the condition of iron-depletion or iron-overload, sum their changes, and compare that to the pre- selected, consistently behaving iron-response set of genes. The pre-selected genes show, one average, lower expression upon iron-overload than do randomly selected genes. The pre-selected genes do give a higher change upon iron-depletion than a random set of genes, even though some of these genes will actually be down-regulated upon iron-depletion.

3.2.1 Iron Squelch analysis reveals that live virus infections mimic iron- depletion at the level of gene expression of host cells First, we applied Iron Squelch, using genes (n=41) that are oppositely perturbed in expression in the iron-depletion and iron-overload models, to the expression data of A549 cells infected with HMPV, RSV, PIV or MV. The squelch index for iron-overload was -3.25 while that for iron-depletion was 13.16. Since iron-depletion is opposite to iron-overload, we can use their Squelch Indexes to determine which of the two iron- conditions is better mimicked by the same set of genes in virus-infected cells. Based on these Squelch Indexes, we found that live virus infections mimic iron-depletion at the 4 level of gene expression (Figure 3). The iron-depletion profile appears to increase over time of infection with live viruses (e.g. for HMPV, Iron Squelch Index= 8.11 (p<0.001), 12.29(p=0.001) and 13.83(p<0.001) for time points 6, 12 and 24h respectively). The iron-depletion profile is also observed in other published data sets, like in HMPV infected-A549 cells24, RSV-infected bronchial epithelial cells (Huang et al, 2001), MV– infected monocyte-derived dendritic cells25 and Influenza–infected lung epithelial cells26 (Supplemental Figures 2-5 respectively). Using the complementary approach, Absolute Squelch approach, one can also calculate whether infection mimics iron-depletion or Iron-overload by comparing the absolute sum of expressions of genes that are down-regulated upon infection and also down- regulated upon iron-depletion but up-regulated under iron-overload; or up-regulated upon infection and also up-regulated upon iron-depletion but down-regulated upon iron-overload. The reverse comparison can also be done. With such stringent rules, a low fold threshold of gene expression (log2(1.2) ) for infection, iron-depletion and iron-overload) is used. We observe using the Absolute Iron Squelch approach, that similar to the normal Raw Iron Squelch approach, virus infections mimic an iron-depletion profile (Figure 4).

91 3.2.2 Iron Squelch analysis reveals that uv-irradiated virus infections mimic iron-depletion at the level of gene expression of host cells Secondly, we applied Iron Squelch analysis to cells infected with dead viruses; i.e. uv-irradiated viruses. Interestingly, cells infected with uv-irradiated viruses exhibit a profile similar to that of iron-depletion (Figures 3&4). However, as opposed to that of their live viral counterparts, the iron-depletion profile for cells infected with uv-irradiated viruses decreases over time of infection (e.g. for uv-irradiated HMPV, Iron Squelch Index= 6.89 (p<0.001), 4.87(p<0.001) and 1.79(p=0.016) for time points 6, 12 and 24h respectively (Figure 3). The iron-depletion profile is also observed in a published dataset of which bronchial epithelial cells were infected with uv-irradiated RSV20 (Supplemental Figure 6).

Figure 3 - Profiles from Raw Squelch analysis of genes perturbed in expression in A549 cells infected with HMPV, RSV, MV and PIV for 6, 12 and 24 hours.

This involves genes that are up-regulated in iron-depletion but down-regulated in iron-overload, or vice versa. There are 41 such genes at a fold change threshold (FC) of at least 1.5 for up-regulated genes and at most 1/1.5 for down-regulated genes in the iron-depletion and overload models. The Squelch index (y-axis) reported is the sum of the log2 gene-expression fold changes for the gene set. A low p-value, indicates that the infection mimics iron-depletion. (Boxplot : random; grey diamond:test. Respectively, rsv6 and rsv6uvd refer to data of 6 hours of live and uv-irradiated RSV infection, etc. RSV, respiratory syncytial virus; HMPV, human metapneumovirus; MV measles virus; PIV, parainfluenza virus; depletion, iron depletion; overload, iron-overload.)

92 Cellular iron-depletion profile for intracellular pathogens

3.2.3 Iron Squelch analysis reveals that intracellular bacteria infections mimic iron-depletion at the level of gene expression of host cells Thirdly, we calculated the Iron Squelch Indices for a non-viral intracellular pathogen of the respiratory tract; Chlamydia pneumoniae. Interestingly, the transcription profile of human epithelial cells infected with this intracellular bacterium also shows iron-depletion profile (Squelch Index=2.98 (p=0.02), 7.11(p<0.001), 8.26(p<0.001) and 7.25(p=0.001) for 12, 24, 48 and 72h after infection; Supplemental Figure 7). This profile is similar to that of the other intracellular pathogens, the viruses. Overall, the iron-depletion profile is likely part of an intrinsic host cell response because it is elicited by both live and dead viruses, and also by C. pneumoniae.

4

Figure 4 - Profiles from Absolute Squelch analysis of genes perturbed in expression in A549 cells infected with HMPV, RSV, MV and PIV for 6, 12 and 24 hours.

This involves genes that are up-regulated in iron-depletion but down-regulated in iron-overload, or vice versa, and the expression of the genes in the virus infection data is either similar to that of iron-depletion or iron-overload. The Squelch index (y-axis) reported is the sum of the absolute sum of the log2 gene-expression fold change of down-regulated genes and that of up-regulated genes. Overload similar and Depletion similar respectively refer to the squelch index of the set of virus-perturbed genes that have similar expression (up-regulated or down-regulated at FC 1.2) in the iron-depletion or iron-overload model. The magnitude of the y-values signifies the measure of similarity of infection-induced gene response to that of iron-overload and iron-depletion. (Respectively, rsv6 and rsv6uvd refer to data of 6 hours of live and uv-irradiated RSV infection, etc. RSV, respiratory syncytial virus; HMPV, human metapneumovirus; MV measles virus; PIV, parainfluenza virus; depletion, iron-depletion; overload, iron-overload.)

93 3.3 Innate-immune cells treated with interferon gamma show an iron-depletion profile Given that both viruses and C. pneumoniae induce the expression of interferon gamma (IFNG) in innate-immune cells, as part of the innate-immune response 37, 38, we examined the transcription profile of bronchial epithelial cells treated with this cytokine. Interestingly, IFNG also induced an iron-depletion profile in cells (Iron Squelch Index=5.33 (p=0.022), 5.17(p=0.03) after 8 and 24hr of IFNG treatment respectively; Supplemental Figure 8). The iron-depletion profile induced by IFNG treatment suggests that the iron-depletion profile could be the result of innate-immune strategies induced at least via an IFNG pathway.

3.4 Innate-immune cells infected with extracellular microbes do not show an iron-depletion profile Having established an iron-depletion profile for IFNG and the intracellular pathogens discussed above, we then proceeded to assess the iron profile for extracellular pathogens. In epithelial cell cultures infected with extracellular bacteria Pseudomonas aeruginosa and Escherichia coli, we observed no iron-depletion profiles; rather there appears even to be a small extent of iron-overload profile when compared to background (Supplemental Figures 9-10). In addition, examination of alveolar epithelial cells exposed to lipopolysaccharide (LPS), a bacterial surface component, also showed no iron- depletion profile (Supplemental Figure 11). Furthermore, examination of uv-irradiated airway pathogens in bronchial epithelial cells indicates that the extracellular bacteria Staphylococcus aureus and Pseudomonas aeruginosa do not cause an iron-depletion profile as strongly as did RSV used in the same study (Supplemental Figure 6).

3.5 Virus-induced and iron-responsive genes highlight a crosstalk between iron homeostasis and innate immunity Having established an iron-depletion profile for the pathogens examined, it is imperative to examine the functions of the genes whose expression is affected to understand the mechanism underpinning a potential crosstalk between iron homeostasis and infection. For brevity, we shall henceforth use the data from virus infection to represent intracellular infection. In the Iron Squelch analyses, we selected iron-response genes using the strict criterion that the genes had to be expressed in opposite ways under iron-overload and iron-depletion at least 1.5 fold relative to control. This yielded 41 genes, 15 of which were at least 1.5 fold up-regulated or down-regulated in at least one time point post virus-infection. The 15 genes are ADM, BIRC3, FOS, G0S2, INHBB, ISG20, KIF14, LIPH, MT1E, MT1H, MT1M, MT2A, NPTX1, S100A3 and TMEM140. In order to have a more comprehensive overview of iron-related genes that are affected following infection, we increased the number of genes in this list by relaxing this criterion and considered a gene to be iron- informative if it is at least 1.5 fold up-regulated or down-regulated in expression during

94 Cellular iron-depletion profile for intracellular pathogens

iron-depletion or iron-overload relative to their respective controls. Out of 1167 genes perturbed (>=1.5 fold relative to control) by either live or uv-irradiated HMPV, RSV, PIV, or MV at time points of 6, 12 or 24 hours post infection, 52 are iron-informative (Figure 2). With the exception of HMOX1 which was up-regulated at early infection time points but down-regulated later in infection, the remaining 51 genes were consistently down- regulated or up-regulated during infection. In order to uncover the common theme between immune-response and iron-responses, we further classify the 51 consistently expressed genes in Figure 2 into 4 groups (Table 1) for a detailed examination of the genes that respond similarly after virus-infection and iron treatment. The relation to innate immunity and iron homeostasis for the genes in the 4 groups are described below.

The first group consists of genes (n=10) that are up-regulated in iron-replete and virus-infected cells (group 1 in Table 1). These genes encode proteins that are related to iron-handling and metal detoxification; e.g metallothioneins (MT1E, MT2A, MT1H, MT1F and MT1M). Metallothioneins are known to be induced to bind iron during iron toxicity. This equally applies to other metals like zinc and copper [29, 30]. Other members in this group include the gene encoding G0S2 that modulates cell cycle arrest 39 and is known to specifically interact with BCL2 to promote apoptosis by preventing the formation of protective BCL2/BAX heterodimers 40. SP110, otherwise known as intracellular parasite resistant 1(IPR1) is an interferon inducible gene that has been shown to exert innate immune 4 response against viruses and other intracellular parasites like Mycobacterium tuberculosis and Listeria monocytogenes by up-regulating apoptosis 41-43. ARNTL2 is a transcription factor that has recently been reported to compensate for the dysfunction of its paralog, ARNTL1, which is a core component of the circadian clock and controls innate-immune reactions involving macrophages 44, 45. Overall, genes encoding both iron-sequestering and immune- related proteins are up-regulated in iron-replete and virus-infected cells.

The second group consists of genes (n=28) that are up-regulated in iron-depleted and virus-infected cells (group 2 in Table 1). This is the largest of the 4 groups; underlining major crosstalk between iron-depletion and viral infection. The genes in this group include ISG20, IL32 and BIRC3 amongst others that are known to activate innate immune responses against viruses 46-48. Overall, antiviral genes are up-regulated in iron-depleted and virus-infected cells.

The third group consists of genes (n=8) that are down-regulated in iron-replete and virus-infected cells (group 3 in Table 1). Amongst these is the gene that encodes TP53I3 which is involved in p53-mediated cell death 49. In addition, INHBB 50 , FHL2 51 and CERK 52, 53 are known to promote cell growth and proliferation. Overall, cell proliferation genes are down-regulated in iron-replete and virus-infected cells.

95 The fourth group consists of genes (n=5) that are down-regulated in iron-depleted and virus-infected cells. Amongst these are genes encoding PIR which facilitates DNA replication by use of its ferric active site 54 and KIF14 which has been shown using RNA interference, to be crucial for cytokinesis progression 55. Overall, cell proliferation genes are down-regulated in iron-depleted and virus-infected cells.

Table 1 - Summary of the 51 genes perturbed in expression by both iron and virus.

Group no. Description Genes Overrepresented TFBS 1 relatively up- ARNTL2, G0S2, MT1E, MT1F, C/EBP, c-Myc:Max, COUP-TF_ regulated in MT1H, MT1M, MT2A, SGK3, HNF-4, FOXL1, Hox11-CTF1, IRF1, iron-replete and SP110, ZFP36L2 ISRE, NF-1, NF-kappaB, NR2F1, virus-infected cells Oct-1, USF 2 relatively up- ADM, BIRC3, DUSP6, EFNA1, AP-1, C/EBP, c-Rel, FOXD1, regulated in EGR1, EMP1, FOS, GADD45A, FOXF2, Freac-2, Freac-3, GATA-1, iron-depleted and IGFBP3, IL1A, IL32, ISG20, GR, NF-kappaB, Oct-1, Pax-2, virus-infected cells KLF6, LAMB3, LAMC2, LDLR, REL, RELA, SOX9, SRF, STAT1, LIPH, MMP7, NPTX1, PLAU, STAT3, STAT4, STAT5A, STAT6 S100A3, SAT1, SERPINB1, STC2, TMEM140, TXNIP, UBD, UGCG 3 relatively down- CERK, DDIT4, DEPDC7, FHL2, E2F, Egr-2, LUN-1, MAZR, NFIL3, regulated in INHBB, MKI67, TACSTD1, TP53I3 NF-kappaB, Sp1, STAT5A iron-replete and virus-infected cells 4 relatively down- APOH, GCLM, HADH , KIF14, BSAP, MyoD, NF-Y regulated in PIR iron-depleted and virus-infected cells

3.6 Binding sites for interferon-related transcription factors are over- represented in the promoter region of iron-responsive genes up-regulated post viral infections Changes in gene expression such as observed in infections are commonly regulated by transcription factors. It is not known whether there are any transcription factors that are triggered in expression both during viral infection and during iron-depletion or iron-overload. To predict candidate transcription factors involved in regulation of both infection-related and iron-related genes, we performed transcription factor binding site analysis (see Methods) on the regions 2kb upstream the start sites of these genes.

The promoter regions of genes that are up-regulated during iron-overload and virus infection (group 1 in Table 1) are significantly enriched (p<0.05) in sites for ISRE (interferon-

96 Cellular iron-depletion profile for intracellular pathogens

stimulated response element), IRF1 (interferon regulatory factor 1), and NF-kappaB amongst others (details in Supplemental Table 1 and Supplemental Figure 12). Interestingly, IRF1 is up-regulated in both iron-overload and infection. IRF1 activates expression of interferons alpha, beta and gamma and represses proliferation of cells 56, 57. This observation suggests interferon involvement in regulation of cellular iron-homeostasis and virus infection.

The promoter regions of genes that are up-regulated during iron-depletion and virus infection (group 2 in Table 1), are enriched in binding sites for transcription factors like NF-kappaB, STAT-1, 3, 4, 5A and 6 amongst others (details in Supplemental Table 2 and Supplemental Figure 13). Early cellular responses to human interferons are critically dependent on the amount of STAT1 and are essential for the appropriate control of intracellular infections 58. The enrichment of binding sites for STAT transcription factors in the promoter region of genes that are up-regulated during iron-depletion and virus infection suggests induction of interferon-induced genes.

The promoter regions of genes that are down-regulated during iron-overload and virus infection (group 3 in Table 1) are enriched in binding sites for factors like Sp1, NF-kappaB, and NFIL3 amongst others (details in Supplemental Table 3 and Supplemental Figure 14). Interestingly, SP1 binding in the promoter of FHL2 have been recently confirmed in functional assays at the region -1058 to -1049 nucleotides upstream the gene 59; a region 4 we also predicted. MAZR and SP1 are known to facilitate cellular proliferation 59, 60.

The promoter regions of genes that are down-regulated during iron-depletion and virus infection (group 4 in Table 1) are enriched in binding sites for NF-Y, BSAP and MyoD (Table1, Supplemental Table 4 and Supplemental Figure 15). BSAP is involved in B cell development and proliferation 61 , and has been suggested to inhibit apoptosis by modulating BCL2 62. In addition, MYOD mRNAs have been observed to be suppressed, at a posttranscriptional level, under inflammatory conditions, such as where cytokines like tumor necrosis factor, interferon gamma or NF-kappaB are induced 63.

Overall, for groups 1-4, some of the transcription factors whose binding sites were predicted, e.g. IRF1 are expressed in the iron model and/or virus infection (Supplemental Table 5). Taken together, the promoter regions of genes that are up-regulated during iron-overload and virus-infection have binding sites for interferon regulatory factors. Those that are up-regulated during iron-depletion and virus infection are rich in factors like STAT1 (signal transducer and activator of transcription 1) which are known to be induced by ligands like interferon gamma 64, 65. The observation of interferon related transcription factor binding sites in these genes further supports our hypothesis that interferon gamma maybe involved in mediating the iron-depletion profile in cells.

97 4. Discussion

Using Iron Squelch analysis on cellular transcriptomics data, we observed, an iron- depletion profile for both live and uv-irradiated viruses. We also observed an iron-depletion profile for Chlamydia and interferon gamma but not for LPS or for the extra cellular bacteria Pseudomonas aeruginosa and Escherichia coli. One possible explanation for these observations is that the host cell restricts iron from the niches of the pathogens in the cell-infection models, and that the gene expression profile we observe is a secondary response to that. Our observations constitute an important contribution to innate immunity wherein immune-related iron homeostasis has been evaluated mainly for pathogens like Plasmodium7, hepatitis C virus 9, 10, HIV 8 and Chlamydia 66 that mainly infect cells of the reticuloendothelial system. Cells in the latter system have rich iron stores earmarking them for iron-related investigations.

How do host innate-immune cells project iron-depletion profile? Given that iron cannot be secreted from the cell 2, 5 and that the viruses examined here are not yet known to utilize iron directly, we wonder how iron would be sequestrated in order to trigger the iron-depletion profile observed. We mined scientific literature for iron-related genes, including those not informative in the iron-model 12 used here; and that are affected in expression post HMPV24 infection to ensure that they are associated with both iron and virus. These genes are summarized in Supplemental Figure 16. We suggest 3 possible iron-depletion profile strategies as discussed below.

(1) Iron-depletion mediated by binding of the metal by metalloproteins Firstly, cellular iron-depletion could result from scavenging of the metal by metallopro- teins such as metallothioneins, which we observed to be up-regulated in expression in the iron-overload model and virus-infected cells (Figure 2 and Supplemental Figure 16). Metallothioneins are generally known to bind and transport metals like zinc, copper and iron and help in protection against metal toxicity 67, 68 . Iron-scavenging during viral infection is further supported by an up-regulation of the gene LCN2 (Supplemental Figure 16) which is known to sequester iron in a transferin-independent way 69, 70 and has been shown to lead to cellular iron-depletion in macrophages infected with the intracellular bacteria Salmonella typhimurium 71.

(2) Iron-depletion mediated by redistribution of iron to the mitochondria Secondly, cellular iron-depletion could result from iron influx into mitochondria which are the main cellular organelles for iron utilization, in processes like haem biosynthesis and ATP generation. Among genes that are associated with mitochondrial iron import are those encoding the mitoferrins SLC25A28 and SLC25A37 that are up-regulated during viral infection (Supplemental Figure 16) and are known to export cytosolic iron

98 Cellular iron-depletion profile for intracellular pathogens

to the mitochondria 72. Further supporting iron influx into the mitochondria is the down-regulation (in virus-infected cells) of other genes like the mitochondrial nucleoside diphosphate kinase (NME4) (Supplemental Figure 16). NME4 converts ATP to GTP 73 ; and low levels of mitochondrial GTP have been associated with excessive mitochondrial iron-overload mediated by repression of this kinase 74, 75. Furthermore, the virus-induced down-regulation (Supplemental Figure 17) of genes encoding ferritin (FTH1 and FTL, used for cytosolic iron storage); transferrin receptor C (TFRC, used for cytosolic iron import); and cytosolic aconitase (ACO1, used for cytosolic conversion of citrate to isocitrate during iron-overload 1, 72), is indicative of persistent cytosolic iron-depletion suggesting re-channeling of cytosolic iron into an organelle.

(3) Iron-depletion mediated by interferons and interferon induced genes Thirdly, iron-depletion could result from interferon production in infected host cells. We observed that epithelial cells treated with IFNG (data from 31 ) showed an iron-depletion profile (Supplemental Figure 7). Moreover, both HMPV-infected and IFNG-treated epithelial cells exhibit a time-wise decrease in ferritin and the iron-import receptor TFRC (Supplemental Figure 16). It is important to note that TFRC promotes cellular iron import in non-infected cells. However, during HMPV infection and IFNG treatment, TFRC is down-regulated in expression (Supplemental Figure 17); suggesting maintenance of a persistent iron-depletion status during viral infections. In addition, the promoter regions of genes that are up-regulated during iron-depletion 4 and after viral infections are enriched in binding sites for STAT transcription factors which generally induce and are also induced by ligands like interferon gamma 57. Other studies corroborate our observation of interferon mediated iron-depletion by showing that IFNG is coupled to down-regulation of transferrin receptors culminating in iron- starvation and depletion 76. Interferon-mediated iron-depletion is particularly interestingly because it suggests that the genes that are similarly expressed post virus infection and iron-overload; or post virus infection and iron-depletion both point towards an iron-depleting profile of infected cells. Reduced-iron import, mediated by interferons, is a likely strategy used by the infected cells to effect an iron-depletion profile.

Why do host innate-immune cells promote an iron-depletion profile during viral infection? The genes (e.g. ISG20, IL32 and BIRC3 amongst others) that are up-regulated during iron-depletion and virus infection are pro-apoptotic 46-48. This is interesting because apoptosis promotes innate immunity by indirectly curbing proliferation of intracellular pathogens 77. Furthermore in line with iron-depletion being a possible defense strategy that is mediated by interferons, recent literature indicates that iron supplementation to human endothelial cells which are infected with Chlamydia pneumonia, augments chlamydial proliferation and differentiation; while IFNG (promotes iron-depletion profile)

99 had an inhibitory effect on cellular proliferation 66. Cellular iron-depletion is likely important in limiting the spread of intracellular infections by at least limiting the number of host cells.

Overall, the crosstalk between iron homeostasis and intracellular infections points towards iron-depleting phenotype, activation of innate immunity, apoptosis and repression of cellular proliferation. Our observation that live and uv-irradiated intracellular pathogens induce an iron-depletion profile while extracellular bacteria did not, urges us to address the issue of co-infections during host-pathogen interactions. Co-infections of viruses with some bacteria have been shown to cause a pronounced elimination or enhanced persistence of both infections. We suggest that the pathogen-dependent pattern of iron- distribution of innate-immune cells could be compromised during co-infections when the immune system redistributes or restricts iron from one microbe’s niche to that of the other. For instance in virus-bacteria co-infection studies, it has been shown that influenza virus and Staphylococcus or Pseudomonas, exacerbate respiratory disease 17, 78, 79. Conversely, co-infection in the same niche could ameliorate the ramifications of both individual infections where iron is synergistically excluded from the colonizing niche as a response to both infections. In this regard, there is a relative paucity of virus-virus co-infections. Moreover where they occur, such co-infections have not been associated with disease severity 80-82 . Such patterns have been seen with different types of viruses. For instance RSV and influenza are not associated with more severe signs than their single infections in children 83. Moreover, as reviewed in 84, attenuation of HIV-1 replication has been observed during co-infection with viruses like measles virus, HIV-2, GB virus C, and human T lymphotropic virus types 1 and 2. Furthermore, co-infection with two obligate intracellular pathogens, HIV and the Chlamydia relative Orientia tsutsugamushi, has been shown to be detrimental to the virus-bacteria pair 84, 85 . In conclusion, we have examined the crosstalk between iron-homeostasis and microbial infections. Our observations using Iron Squelch analysis, suggest that iron restriction from pathogens might be a general cellular response. Cellular iron-depletion might be interferon mediated leading to an iron-depletion profile during intracellular infections. These findings may help explain in part the molecular mechanisms underpinning disease exacerbations or attenuation from the perspective of iron homeostasis and the cellular niche(s) of co-infecting pathogens. We suggest that the synergism or antagonism between infections from an iron-homeostasis perspective be experimentally examined.

Acknowledgement Funding: This work was supported by the VIRGO consortium, an Innovative Cluster approved by the Netherlands Genomics Initiative and partially funded by the Dutch Government (BSIK 03012), The Netherlands.

Supplementary Material for this chapter is available in Appendix 2

100 Cellular iron-depletion profile for intracellular pathogens

References

1. Tong, W.H. & Rouault, T.A. Metabolic regulation of citrate and iron by aconitases: role of iron-sulfur cluster biogenesis. Biometals 20, 549-64 (2007). 2. Andrews, N.C. Forging a field: the golden age of iron biology. Blood 112, 219-30 (2008). 3. Schaible, U.E. & Kaufmann, S.H. Iron and microbial infection. Nat Rev Microbiol 2, 946-53 (2004). 4. Weinberg, E.D. Iron withholding: a defense against viral infections. Biometals 9, 393-9 (1996). 5. Drakesmith, H. & Prentice, A. Viral infection and iron metabolism. Nat Rev Microbiol 6, 541-52 (2008). 6. Brock, J.H. Benefits and dangers of iron during infection. Curr Opin Clin Nutr Metab Care 2, 507-10 (1999). 7. Murray, M.J., Murray, A.B., Murray, M.B. & Murray, C.J. The adverse effect of iron repletion on the course of certain infections. Br Med J 2, 1113-5 (1978). 8. Debebe, Z. et al. Iron chelators ICL670 and 311 inhibit HIV-1 transcription. Virology 367, 324-33 (2007). 9. Fillebeen, C. & Pantopoulos, K. Iron inhibits replication of infectious hepatitis C virus in permissive Huh7.5.1 cells. J Hepatol (2010). 10. Cho, H., Lee, H.C., Jang, S.K. & Kim, Y.K. Iron increases translation initiation directed by internal ribosome entry site of hepatitis C virus. Virus Genes 37, 154-60 (2008). 11. Ghio, A.J. et al. Disruption of iron homeostasis in mesothelial cells after talc pleurodesis. Am J Respir Cell Mol Biol 46, 80-6 (2012). 12. Chicault, C. et al. Iron-related transcriptomic variations in CaCo-2 cells, an in vitro model of intestinal absorptive cells. Physiol Genomics 26, 55-67 (2006). 13. Brodzinski, H. & Ruddy, R.M. Review of new and newly discovered respiratory tract viruses in children. Pediatr Emerg Care 25, 352-60; quiz 361-3 (2009). 14. Gambarino, S. et al. Lower respiratory tract viral infections in hospitalized adult patients. Minerva Med 100, 349-56 (2009). 15. Hammerschlag, M.R. Pneumonia due to Chlamydia pneumoniae in children: epidemiology, diagnosis, and treatment. Pediatr Pulmonol 36, 384-90 (2003). 16. Airenne, S., Surcel, H.M., Tuukkanen, J., Leinonen, M. & Saikku, P. Chlamydia pneumoniae inhibits apoptosis in human 4 epithelial and monocyte cell lines. Scand J Immunol 55, 390-8 (2002). 17. Mancini, D.A. et al. Influenza virus and proteolytic bacteria co-infection in respiratory tract from individuals presenting respiratory manifestations. Rev Inst Med Trop Sao Paulo 50, 41-6 (2008). 18. van Diepen, A. et al. Quantitative proteome profiling of respiratory virus-infected lung epithelial cells. J Proteomics 73, 1680-93 (2010). 19. Bao, X. et al. Airway epithelial cell response to human metapneumovirus infection. Virology 368, 91-101 (2007). 20. Mayer, A.K. et al. Differential recognition of TLR-dependent microbial ligands in human bronchial epithelial cells. J Immunol 178, 3134-42 (2007). 21. Culley, F.J., Pennycook, A.M., Tregoning, J.S., Hussell, T. & Openshaw, P.J. Differential chemokine expression following respiratory virus infection reflects Th1- or Th2-biased immunopathology. J Virol 80, 4521-7 (2006). 22. Bao, X. et al. Human metapneumovirus glycoprotein G inhibits innate immune responses. PLoS Pathog 4, e1000077 (2008). 23. Dinwiddie, D.L. & Harrod, K.S. Human metapneumovirus inhibits IFN-alpha signaling through inhibition of STAT1 phosphorylation. Am J Respir Cell Mol Biol 38, 661-70 (2008). 24. Bao, X. et al. Identification of human metapneumovirus-induced gene networks in airway epithelial cells by microarray analysis. Virology 374, 114-27 (2008). 25. Zilliox, M.J., Parmigiani, G. & Griffin, D.E. Gene expression patterns in dendritic cells infected with measles virus compared with other pathogens. Proc Natl Acad Sci U S A 103, 3363-8 (2006). 26. Rosseau, S. et al. Comparative transcriptional profiling of the lung reveals shared and distinct features of Streptococcus pneumoniae and influenza A virus infection. Immunology 120, 380-91 (2007). 27. Alvesalo, J. et al. Microarray analysis of a Chlamydia pneumoniae-infected human epithelial cell line by use of gene ontology hierarchy. J Infect Dis 197, 156-62 (2008). 28. Ukena, S.N. et al. The host response to the probiotic Escherichia coli strain Nissle 1917: specific up-regulation of the proinflammatory chemokine MCP-1. BMC Med Genet 6, 43 (2005).

101 29. O’Grady, E.P., Mulcahy, H., O’Callaghan, J., Adams, C. & O’Gara, F. Pseudomonas aeruginosa infection of airway epithelial cells modulates expression of Kruppel-like factors 2 and 6 via RsmA-mediated regulation of type III exoenzymes S and Y. Infect Immun 74, 5893-902 (2006). 30. dos Santos, C.C. et al. DNA microarray analysis of gene expression in alveolar epithelial cells in response to TNFalpha, LPS, and cyclic stretch. Physiol Genomics 19, 331-42 (2004). 31. Pawliczak, R. et al. Influence of IFN-gamma on gene expression in normal human bronchial epithelial cells: modulation of IFN-gamma effects by dexamethasone. Physiol Genomics 23, 28-45 (2005). 32. Hinrichs, A.S. et al. The UCSC Genome Browser Database: update 2006. Nucleic Acids Res 34, D590-8 (2006). 33. Karolchik, D. et al. The UCSC Genome Browser Database. Nucleic Acids Res 31, 51-4 (2003). 34. Frith, M.C. et al. Detection of functional DNA motifs via statistical over-representation. Nucleic Acids Res 32, 1372-81 (2004). 35. Sandelin, A., Alkema, W., Engstrom, P., Wasserman, W.W. & Lenhard, B. JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res 32, D91-4 (2004). 36. Matys, V. et al. TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res 31, 374-8 (2003). 37. Billiau, A. & Matthys, P. Interferon-gamma: a historical perspective. Cytokine Growth Factor Rev 20, 97-113 (2009). 38. Zhang, N. et al. Type 1 T-cell responses in chlamydial lung infections are associated with local MIP-1alpha response. Cell Mol Immunol 7, 355-60 (2010). 39. Zandbergen, F. et al. The G0/G1 switch gene 2 is a novel PPAR target gene. Biochem J 392, 313-24 (2005). 40. Welch, C. et al. Identification of a protein, G0S2, that lacks Bcl-2 homology domains and interacts with and antagonizes Bcl-2. Cancer Res 69, 6782-9 (2009). 41. Pan, H. et al. Ipr1 gene mediates innate immunity to tuberculosis. Nature 434, 767-72 (2005). 42. Regad, T. & Chelbi-Alix, M.K. Role and fate of PML nuclear bodies in response to interferon and viral infections. Oncogene 20, 7274-86 (2001). 43. Roscioli, T. et al. Mutations in the gene encoding the PML nuclear body protein Sp110 are associated with immuno- deficiency and hepatic veno-occlusive disease. Nat Genet 38, 620-2 (2006). 44. Hayashi, M., Shimba, S. & Tezuka, M. Characterization of the molecular clock in mouse peritoneal macrophages. Biol Pharm Bull 30, 621-6 (2007). 45. Shi, S. et al. Circadian clock gene Bmal1 is not essential; functional replacement with its paralog, Bmal2. Curr Biol 20, 316-21 (2010). 46. Espert, L. et al. ISG20, a new interferon-induced RNase specific for single-stranded RNA, defines an alternative antiviral pathway against RNA genomic viruses. J Biol Chem 278, 16151-8 (2003). 47. Li, W. et al. Activation of interleukin-32 pro-inflammatory pathway in response to influenza A virus infection. PLoS One 3, e1985 (2008). 48. Mao, A.P. et al. Virus-triggered ubiquitination of TRAF3/6 by cIAP1/2 is essential for induction of interferon-beta (IFN-beta) and cellular antiviral response. J Biol Chem 285, 9470-6 (2010). 49. Contente, A., Dittmer, A., Koch, M.C., Roth, J. & Dobbelstein, M. A polymorphic microsatellite that mediates induction of PIG3 by p53. Nat Genet 30, 315-20 (2002). 50. Brown, C.W., Li, L., Houston-Hawkins, D.E. & Matzuk, M.M. Activins are critical modulators of growth and survival. Mol Endocrinol 17, 2404-17 (2003). 51. Labalette, C. et al. The LIM-only protein FHL2 regulates cyclin D1 expression and cell proliferation. J Biol Chem 283, 15201-8 (2008). 52. Kim, T.J., Mitsutake, S. & Igarashi, Y. The interaction between the pleckstrin homology domain of ceramide kinase and phosphatidylinositol 4,5-bisphosphate regulates the plasma membrane targeting and ceramide 1-phosphate levels. Biochem Biophys Res Commun 342, 611-7 (2006). 53. Gomez-Munoz, A. et al. Short-chain ceramide-1-phosphates are novel stimulators of DNA synthesis and cell division: antagonism by cell-permeable ceramides. Mol Pharmacol 47, 833-9 (1995). 54. Pang, H. et al. Crystal structure of human pirin: an iron-binding nuclear protein and transcription cofactor. J Biol Chem 279, 1491-8 (2004). 55. Carleton, M. et al. RNA interference-mediated silencing of mitotic kinesin KIF14 disrupts cell cycle progression and induces cytokinesis failure. Mol Cell Biol 26, 3853-63 (2006). 56. Romeo, G. et al. IRF-1 as a negative regulator of cell proliferation. J Interferon Cytokine Res 22, 39-47 (2002). 57. Saha, B., Jyothi Prasanna, S., Chandrasekar, B. & Nandi, D. Gene modulation and immunoregulatory roles of interferon gamma. Cytokine 50, 1-14 (2010). 58. Kong, X.F. et al. A novel form of human STAT1 deficiency impairing early but not late responses to interferons. Blood (2010).

102 Cellular iron-depletion profile for intracellular pathogens

59. Guo, Z. et al. Sp1 upregulates the four and half lim 2 (FHL2) expression in gastrointestinal cancers through transcription regulation. Mol Carcinog 49, 826-36 (2010). 60. Tian, X. et al. Zinc finger protein 278, a potential oncogene in human colorectal cancer. Acta Biochim Biophys Sin (Shanghai) 40, 289-96 (2008). 61. Yan, M. et al. Development of cellular immune responses against PAX5, a novel target for cancer immunotherapy. Cancer Res 68, 8058-65 (2008). 62. Park, D., Jia, H., Rajakumar, V. & Chamberlin, H.M. Pax2/5/8 proteins promote cell survival in C. elegans. Development 133, 4193-202 (2006). 63. Guttridge, D.C., Mayo, M.W., Madrid, L.V., Wang, C.Y. & Baldwin, A.S., Jr. NF-kappaB-induced loss of MyoD messenger RNA: possible role in muscle decay and cachexia. Science 289, 2363-6 (2000). 64. Shalita-Chesner, M., Glaser, T. & Werner, H. Signal transducer and activator of transcription-1 (STAT1), but not STAT5b, regulates IGF-I receptor gene expression in an osteosarcoma cell line. J Pediatr Endocrinol Metab 17, 211-8 (2004). 65. Clarke, D.L. et al. TNFalpha and IFNgamma synergistically enhance transcriptional activation of CXCL10 in human airway smooth muscle cells via STAT-1, NF-kappaB, and the transcriptional coactivator CREB-binding protein. J Biol Chem 285, 29101-10 (2010). 66. Bellmann-Weiler, R. et al. Divergent modulation of Chlamydia pneumoniae infection cycle in human monocytic and endothelial cells by iron, tryptophan availability and interferon gamma. Immunobiology 215, 842-8 (2010). 67. Santon, A., Formigari, A. & Irato, P. The influence of metallothionein on exposure to metals: an in vitro study on cellular models. Toxicol In Vitro 22, 980-7 (2008). 68. Piotrowski, J.K. & Szymanska, J.A. Influence of certain metals on the level of metallothionein-like proteins in the liver and kidneys of rats. J Toxicol Environ Health 1, 991-1002 (1976). 69. Yang, J. et al. An iron delivery pathway mediated by a lipocalin. Mol Cell 10, 1045-56 (2002). 70. Wu, H. et al. Lipocalin 2 is protective against E. coli pneumonia. Respir Res 11, 96 (2010). 71. Nairz, M. et al. Interferon-gamma limits the availability of iron for intramacrophage Salmonella typhimurium. Eur J Immunol 38, 1923-36 (2008). 72. Richardson, D.R. et al. Mitochondrial iron trafficking and the integration of iron metabolism between the mitochondrion and cytosol. Proc Natl Acad Sci U S A 107, 10775-82 (2010). 73. Milon, L. et al. The human nm23-H4 gene product is a mitochondrial nucleoside diphosphate kinase. J Biol Chem 275, 4 14264-72 (2000). 74. Gordon, D.M., Lyver, E.R., Lesuisse, E., Dancis, A. & Pain, D. GTP in the mitochondrial matrix plays a crucial role in organellar iron homoeostasis. Biochem J 400, 163-8 (2006). 75. Lesuisse, E., Lyver, E.R., Knight, S.A. & Dancis, A. Role of YHM1, encoding a mitochondrial carrier protein, in iron distribution of yeast. Biochem J 378, 599-607 (2004). 76. Bourgeade, M.F., Silbermann, F., Thang, M.N. & Besancon, F. Reduction of transferrin receptor expression by interferon gamma in a human cell line sensitive to its antiproliferative effect. Biochem Biophys Res Commun 153, 897-903 (1988). 77. Chambers, R. & Takimoto, T. Host specificity of the anti-interferon and anti-apoptosis activities of parainfluenza virus P/C gene products. J Gen Virol 90, 1906-15 (2009). 78. Braun, L.E. et al. Co-infection of the cotton rat (Sigmodon hispidus) with Staphylococcus aureus and influenza A virus results in synergistic disease. Microb Pathog 43, 208-16 (2007). 79. Mancini, D.A., Mendonca, R.M., Dias, A.L., Mendonca, R.Z. & Pinto, J.R. Co-infection between influenza virus and flagellated bacteria. Rev Inst Med Trop Sao Paulo 47, 275-80 (2005). 80. Falchi, A. et al. Dual infections by influenza A/H3N2 and B viruses and by influenza A/H3N2 and A/H1N1 viruses during winter 2007, Corsica Island, France. J Clin Virol 41, 148-51 (2008). 81. Bosis, S. et al. Impact of human metapneumovirus in childhood: comparison with respiratory syncytial virus and influenza viruses. J Med Virol 75, 101-4 (2005). 82. Brouard, J. et al. [Viral co-infections in immunocompetent infants with bronchiolitis: prospective epidemiologic study]. Arch Pediatr 7 Suppl 3, 531s-535s (2000). 83. Glezen, W.P., Paredes, A. & Taber, L.H. Influenza in children. Relationship to other respiratory agents. JAMA 243, 1345-9 (1980). 84. Kannangara, S., DeSimone, J.A. & Pomerantz, R.J. Attenuation of HIV-1 infection by other microbial agents. J Infect Dis 192, 1003-9 (2005). 85. Watt, G. et al. HIV-1 suppression during acute scrub-typhus infection. Lancet 356, 475-9 (2000).

103

5

MicroRNA genes preferentially expressed in dendritic cells contain sites for conserved transcription factor binding motifs in their promoters

Bastiaan J.H. Jansen1*, Iziah E. Sama2*,Dagmar Eleveld-Trancikova1, Maaike A. van Hout-Kuijer1, Joop H. Jansen3, Martijn A. Huynen2, Gosse J. Adema1

1 Department of Tumor Immunology, Nijmegen Centre for Molecular Life Sciences, Radboud University Nijmegen Medical Centre, Nijmegen, the Netherlands 2 Centre for Molecular and Biomolecular Informatics, Nijmegen Centre for Molecular Life Sciences, Radboud University Nijmegen Medical Centre, Nijmegen, the Netherlands 3 Central Hematology Laboratory, Radboud University Nijmegen Medical Centre, Nijmegen, the Netherlands *Equal contribution

BMC Genomics. 2011 Jun 27;12(1):330. Abstract

Background MicroRNAs (miRNAs) play a fundamental role in the regulation of gene expression by translational repression or target mRNA degradation. Regulatory elements in miRNA promoters are less well studied, but may reveal a link between their expression and a specific cell type. Results To explore this link in myeloid cells, miRNA expression profiles were generated from monocytes and dendritic cells (DCs). Differences in miRNA expression among monocytes, DCs and their stimulated progeny were observed. Furthermore, putative promoter regions of miRNAs that are significantly up-regulated in DCs were screened for Transcription Factor Binding Sites (TFBSs) based on TFBS motif matching score, the degree to which those TFBSs are over-represented in the promoters of the up-regulated miRNAs, and the extent of conservation of the TFBSs in mammals. Conclusions Analysis of evolutionarily conserved TFBSs in DC promoters revealed preferential clustering of sites within 500 bp upstream of the precursor miRNAs and that many mRNAs of cognate TFs of the conserved TFBSs were indeed expressed in the DCs. Taken together, our data provide evidence that selected miRNAs expressed in DCs have evolutionarily conserved TFBSs relevant to DC biology in their promoters.

106 Transcription and targets of DC-expressed miRNAs

1. Background

In recent years, microRNAs (miRNAs) have taken center stage, as they are key regulators of gene expression at the post-transcriptional level, and play a fundamental role in a wide variety of biological processes, such as cell growth, development and several pathological conditions [1-3]. MicroRNAs are small, ~22 nt long, single-stranded molecules which, when complexed with an RNA-induced silencing complex (RISC), are able to form a complementary double-stranded RNA structure by hybridizing to the 3’ untranslated region of target transcripts, and inhibit translation of their cognate mRNA and/or promote their degradation [4]. MicroRNAs have an established role in hematopoietic development and immunity. For example, forced expression of miR-181 in hematopoietic progenitors leads to an increase in the number of B cells [5], whereas it sets T cell receptor signaling thresholds by targeting negative regulators [6]. MicroRNA-146a is up-regulated during toll-like receptor (TLR) signaling and targets TNF receptor-associated factor 6 (TRAF6) and IL-1 receptor-associated kinase 1 (IRAK1) [7], thereby serving in a negative feedback loop. Moreover, miR-155 is also up-regulated during TLR and TNF signaling [8], and is required for normal immune function [9-14].

Although great strides have been made towards understanding the biogenesis of miRNAs [4] and the identification of mRNA targets [15], their own expression is one of the least understood aspects. They are transcribed by RNA polymerase II [16] or RNA polymerase III [17]. In addition, approximately 80% of miRNAs are located in introns of protein coding genes, but at least one third is believed to be transcribed independently from their host gene [4, 18-20], whereas recent data suggest that most, if not all, intronic miRNAs contain putative promoters independent of their host gene [21]. In fact, it is now believed that once physically accessible, a gene is regulated by transcription factors that bind to their cognate transcription factor binding site (TFBS) in its promoter. Usually, there is more than one TFBS per gene, allowing combinations of transcription factors to elicit gene transcription. This phenomenon has been predicted for instance in 5 Plasmodium falciparum, a parasite with a dearth of transcription-associated factors [22-24] and has been experimentally validated in other eukaryotic promoters [25, 26].

Myeloid dendritic cells (DCs) and monocytes arise from a common monocyte/dendritic cell progenitor [27]. In vitro, DCs can be generated from blood-derived monocytes when cultured in the presence of the cytokines interleukin 4 (IL4) and granulocyte/ macrophage colony stimulating factor (GM-CSF) [28]. DCs play an important role in innate immunity and the initiation of adaptive immune responses. They capture foreign antigens in peripheral tissues, migrate to the T-cell areas of secondary lymphoid organs and present these antigens to T- and B-cells. Depending on the extracellular signals they receive, they either induce tolerance in the steady state (tolerogenic DCs), or an

107 inflammatory response in the presence of pathogen-associated patterns (PAMPs) or inflammatory cytokines (activated or mature DCs) [29, 30]. As a consequence, DCs have gained considerable interest as vaccine adjuvants and are currently exploited in the treatment of cancer after loading with tumor-cell derived antigens [31, 32].

In order to gain insight in miRNAs that may regulate DC development and behaviour, expression profiles of 157 miRNAs were obtained from monocytes and DCs under inflammatory and tolerizing conditions. We show that DCs express a wide variety of miRNAs, some of which are differentially regulated during DC development and maturation. We predicted several target genes for these miRNAs, as well as binding sites for transcription factors in the putative promoter regions of these miRNAs. Furthermore, we show that by also taking evolutionary conservation [33, 34] of the identified TFBSs into account, binding sites were found to preferentially cluster within 500 bp upstream of the pre-miRNAs. Also, the fraction of conserved TFBSs for which the cognate transcription factors are expressed in DCs increases with the number of miRNA promoters that contain these TFBSs. Taken together, the data described here provide evidence that the promoter regions of the miRNAs expressed in myeloid DCs contain binding sites for motifs of transcription factors that are relevant to myeloid cell biology. This may help expand the understanding of the molecular mechanisms underlying DC biology and development.

2. Methods

2.1 Isolation, culture and characterization of monocytes and DCs Peripheral blood mononuclear cells (PBMCs) were isolated as previously described [28]. Monocytes were isolated from PBMCs using CD14+ selection and the AutoMACS technology (Miltenyi Biotec, Bergisch Gladbach, Germany) following the manufacturer’s directions. Human monocyte-derived DCs were generated using GM-CSF and IL-4 and matured as described previously. Purity and maturation of monocytes and DCs were assessed by means of FACS analysis as described previously [28]. Mixed lymphocyte reactions were essentially carried out as described elsewhere [35]. Briefly, they were co-cultured with peripheral blood lymphocytes in various dilutions for 2 to 4 days. Then tritiated thymidine (1 μCi/well; MP Biomedicals) was added to the cell cultures and incorporation was measured after 16 hr. Enzyme-linked immunosorbent assays (ELISA) were performed to assess the secretion of the inflammatory cytokines TNFα and IL8, using BD OptEIA (BD Biosciences, San Jose, CA) kits following the manufacturer’s recommenda- tions.

108 Transcription and targets of DC-expressed miRNAs

2.2 Isolation of total RNA and microRNA-specific reverse transcription and quantitative PCR Total RNA was isolated using TRIzol (Invitrogen, Carlsbad, CA), following the manufacturer’s recommendation. Quantity and purity were determined spectrophotometrically. For each miRNA, 4 ng of total RNA was used as input and miRNAs were converted to cDNA using the TaqMan miRNA Reverse Transcription cDNA Synthesis kit and miRNA-specific looped primers from the Early Access miRNA Profiling Kit (both from Applied Biosystems, Foster City, CA) following the manufacturer’s recommendations. Real-time quantitative PCR was performed using miRNA-specific primer/probe pairs from the Early Access miRNA Profiling Kit and reactions were carried out as described elsewhere [36]. Actual amplification and data collection were performed on the ABI 7700 Sequence Detection System (Applied Biosystems, Foster City, CA). After visual inspection, data were exported to a text file and further analyzed in Microsoft Excel (Microsoft Corporation, Redmond, WA) and various Bioconductor packages [37] in the R statistics environment [38].

2.3 Statistics for miRNAs expression data Delta Ct values for each miRNA were calculated with hsa-let-7a as a reference, according to the manufacturer’s directions. Only miRNAs with a dCt ≤ 12 (where dCt = Cttarget –

Cthsa-let-7a) were considered for further analysis. Data were combined into a convenient ExpressionSet (eSet) structure using the Biobase package, and further analyzed by means of gene-by-gene one-factor ANOVA using the package LMGene, all in Bioconductor and R. Only genes that had a False Discovery Rate-adjusted p-value of < 0.05 were considered differentially expressed. These genes were further assessed using non- parametric pairwise comparison of the different myeloid cell subsets using Tukey’s post-hoc test in the R statistics package. Genes with a p-value < 0.05 were considered differentially expressed.

2.4 Determination of TFBSs in microRNA promoters Putative promoter regions extending 2 kb upstream of the miRNAs were extracted from 5 the Genome Browser sno/miRNA track of the UCSC March 2006 human genome assembly [39, 40]. The program Clover [41] was used to screen for over-represented TFBSs in these sequences using a precompiled library of TFBS motifs. The library contained 263 TFBS motifs (position-specific weight matrices) constructed from the JASPAR core database (2005) [42] and TRANSFAC version 7.0 [43].

To determine over-represented TFBS motifs, Clover starts for every location in a DNA sequence, by calculating a score reflecting the likelihood that a certain TF binds at that location. This score is a likelihood ratio, with in the numerator the probability that the sequence matches the positional weight matrix of the motif (the product of the frequencies of the nucleotides in the weight matrix at the positions corresponding to

109 those in the sequence) and in the denominator the likelihood that the sequence is derived from a random sequence (the product of the frequencies of the nucleotides in each position in the background sequence). This likelihood ratio is then averaged per complete sequence and over all subsets of the set of sequences. Finally a “raw score” is derived, by taking the logarithm of the averaged likelihood ratio. A raw score above zero thus signifies over-represented motifs and a raw score below zero signifies under- represented motifs. The raw score increases when more of the sequences contain good motif matches, and also when there are more good matches per sequence. The p-values for over-represented motifs are subsequently derived by analyzing, whether random sets of promoter sequences from all the genes in the genome are likely to have the same, or a higher “raw score”. Clover examines the over-representation of multiple TFBSs and takes into account the multiple testing issue for its p-value calculation. Besides the raw score, Clover also reports an instance score, which is the logarithm of the likelihood ratio mentioned above for any specific TFBS at any specific position in the sequence.

Our inputs to Clover are the miRNA promoter sequences and the 263 TFBS motifs mentioned above. For statistical calculations, promoter regions 2 kb upstream of all human genes in the genome are included as background. Our thresholds for Clover outputs are instance score ≥ 6 for recognizing a specific TFBS, and p-value ≤ 0.05 for over-represented motifs (default values in Clover). In addition, any mention of a TF motif score in subsequent sections refers to the instance score for the TFBS at a specific location in a promoter sequence. Our motivation for including instance scores is to enable us to calculate the conservation of the nucleotides at high-scoring TFBSs.

2.5 Calculation of evolutionary conservation score of TFBSs In order to take into account conservation of TFBSs in the miRNA promoters, the PhastCons conservation track from the UCSC Genome Browser of January 2009 (http:// hgdownload.cse.ucsc.edu/goldenPath/hg18/phastCons44way/) was used, which represents the conservation of each nucleotide across 44 placental mammals as calculated using the PhastCons program [33]. The mammalian conservation track was used because we examined the expression of mammalian miRNAs that are generally not conserved outside this clade [44, 45]. The base-by-base conservation scores are derived from a two-state phylo-HMM and are defined as the posterior probability that the corresponding alignment column was generated by the conserved state and not the non-conserved state of the phylo-HMM used in the calculation [33]. The score ranges from 0 to 1; the higher the value, the more conserved the nucleotide. Using in-house Python scripts, we obtained the conservation score for a TFBS as the median of the PhastCons scores of the bases in that TFBS.

110 Transcription and targets of DC-expressed miRNAs

2.6 Microarray sample preparation of DCs and microarray analysis From one donor, 3 different samples (1x107 cells; technical replicate) were generated per DC subtype (immature monocyte-derived DCs, maturing DCs, tolerogenic DCs and activated tolerogenic DCs) and RNA for microarray analysis was isolated using the RNeasy Total RNA Extraction kit (Qiagen, Venlo, the Netherlands). Quality control, conversion to labeled RNA, hybridization and scanning were all performed at the microarray facility of the Department of Human Genetics (Radboud University Nijmegen Medical Centre Nijmegen, Nijmegen, the Netherlands), using Affymetrix technology while following the manufacturer’s protocols. The .CEL files were processed using the Bioconductor package limma in the R statistics environment and further analyzed with the package panp to generate presence/absence calls from the microarray data. In this study, intensities above the p-value cut-off of 0.01 indicated presence, p-values between 0.01 and 0.02 indicated marginal presence, and p-values above 0.02 indicated absence. Complete, MIAME-compliant datasets were deposited with the Gene Expression Omnibus of the National Center for Biotechnology Information (http://www.ncbi.nlm. nih.gov/geo/) and can be accessed through GEO Series accession number GSE23371.

2.7 Analysis of miRNA targets In addition to identifying miRNAs that are expressed in DCs and monocytes, it is important to identify some of their target genes. This may help in understanding the miRNA-related molecular mechanisms underpinning DC maturation. We used Tar- getScanHuman version 5.1 [15] to predict the target genes for the miRNAs identified to be expressed in DCs and monocytes. The precompiled predictions were downloaded from the official TargetScan website (http://www.targetscan.org/) and were further filtered for RefSeq transcripts having at least one site that is conserved across placental mammals and also had a target score (total context score) of ≤ -0.4 before further analyses. To improve the reliability of the target genes predicted, we used a second miRNA target prediction tool, PicTar [46]. The precompiled miRNA target predictions were downloaded from the PicTar website at http://pictar.mdc-berlin.de, on January 5 2011. We filtered the predictions at a PicTar threshold of ≥0.4. For further analysis we used the target genes that were predicted by both TargetScan and PicTar and also by each of the algorithms separately. To narrow the scope of the target genes to DCs, we used the previously published dataset of Lehtonen and colleagues wherein the relative expression levels of gene expression in DCs relative to monocytes are reported [47]. Using their data, we selected genes that are at least 2-fold down-regulated in DCs relative to monocytes and that we also predicted to be target genes for DC-expressed miRNAs. Comparisons between our predicted target genes and genes that were over- or under-expressed in DCs relative to monocytes in the Lehtonen dataset were done at the level of RefSeq DNA ID to ensure that the correct gene isoforms were being matched. (The Lehtonen dataset was based on global gene expression analyses using Affymetrix

111 HG-U133A Gene Chip oligonucleotide arrays. We used Biomart in Ensembl version 62 available at http://www.ensembl.org/biomart to obtain the corresponding RefSeq DNA IDs for the probesets in the microarray dataset). The set of genes so selected are likely to be important in miRNA regulation of DC maturation from monocytes.

2.8 Gene Ontology analysis Gene ontology enrichment analyses were done using the Cytoscape [48] plugin BINGO version 2.3 [49]. Using this software, we tested for over-representation using the hypergeometric test with Benjamini & Hochberg False Discovery Rate (FDR) correction. GO processes reported were deemed significant when the corrected p-value was < 0.05. Venn diagrams were drawn using the Venny tool of Oliveros, J.C. available at http:// bioinfogp.cnb.csic.es/tools/venny/index.html.

3. Results

3.1 Monocytes and monocyte-derived dendritic cell subtypes express distinct microRNAs Monocytes were isolated to > 90% purity and monocyte-derived dendritic cell subtypes were generated and validated as depicted in Additional file 1, Figure S1. A TaqMan-based quantitative RT-PCR method was used to profile the expression of miRNAs. Immature monocyte-derived DCs (iDCs) were induced to either undergo maturation by LPS stimulation for 6 hr (mDCs), to mature into tolerogenic DCs (tDCs) in the presence of IL-10 and dexamethasone for 24 hr, or to mature into activated tolerogenic DCs (atDCs; previously described by Emmer et al. [50]) in the presence of IL-10 and dexamethasone for 24 hr, followed by 6 hr of LPS (Additional file 1, Figure S1A). Phenotypical and functional characterization confirmed that mature DCs expressed high levels of CD80 and CD86 (Additional file 1, Figure S1B), induced strong proliferation of allogeneic T cells in a mixed lymphocyte reaction (Additional file 1, Figure S1C) and secreted high levels of pro-inflammatory cytokines TNF and IL8 (Additional file 1, Figure S1D). Accordingly, iDCs and tDCs did not show up-regulation of CD80 and CD86, induced much less proliferation and produced less cytokines. Activated tolerogenic DCs showed a more intermediate pattern of activation, indicative of less profound TLR signaling in response to LPS. The expression profile of 157 miRNAs in monocytes and the various DC activation stages was determined in a primary screen involving total RNA from one donor as shown in Additional file 2, Figure S2.

Of the 157 miRNAs screened, 104 were detected and expressed to variable degrees in monocytes and/or DCs, whereas 53 miRNAs remained undetected, irrespective of cell type or stimulus (Additional file 3, Table S1). Intriguingly, 27 miRNAs appeared to be

112 Transcription and targets of DC-expressed miRNAs

differentially expressed between monocytes, DCs and stimulated DCs (Table 1). These 27 miRNAs were selected for additional profiling of multiple donors as described below.

3.2 Identification of miRNAs differentially regulated during monocyte-derived DC activation To validate the expression of the identified 27 miRNAs, their expression was further examined in monocytes and DCs of 3 different donors (Additional file 2, Figure S2). Statistical analysis revealed that 18 miRNAs were truly differentially expressed (gene-by-gene ANOVA, p < 0.05; Table 2). Six miRNAs were up-regulated in monocytes versus DCs, and 10 were up-regulated in DCs versus monocytes. Eight miRNAs were not differentially expressed and had a high degree of variation in expression levels (data not shown). However, hsa-miR-15b and hsa-miR-16 showed remarkably similar expression profiles and are expressed at higher levels in monocytes (Figure 1, upper row of charts; Pearson correlation = 0.99). Hsa-miR-125a, hsa-miR-221 and hsa-miR-342 are examples of miRNAs that are up-regulated in the DC populations relative to monocytes (Figure 1, middle row). Furthermore, both hsa-miR-146a and hsa-miR-155 were up-regulated during differentiation, and were slightly up-regulated in mDCs and atDCs (Figure 1, two left charts in lower row; gene-by-gene ANOVA, p < 0.05, Tukey’s post-hoc test; Pearson correlation = 0.90), whereas hsa-miR-210 was primarily up-regulated in atDCs (Figure 1, right chart of lower row). Data generated from these analyses indicate that many of the selected miRNAs are indeed expressed in monocytes and DCs, and that 3 of these miRNAs are differentially regulated during DC development and maturation.

3.3 Identification of TFBSs in promoters of differentially expressed miRNAs in monocytes and DCs In order to link the miRNA expression profiles to a specific TFBS (or groups of TFBSs) of the myeloid cell types that were analyzed, sequences 2 kb upstream of the mapped pre-miRNAs were chosen for analysis. It should be noted that it is expected that a majority of miRNAs contain their own promoters [21], and that in some cases, different 5 miRNA genes (such as hsa-miR-16-1 and hsa-miR-16-2) in the genome give rise to an identical mature miRNA (such as hsa-miR-16). Therefore 31 promoters could be assigned to 27 miRNAs that were profiled in detail (Table 1). Likewise, 167 promoters could be assigned to the total of 157 miRNAs analyzed. Analysis of the 31 promoters revealed that 3 TFBS motifs (Elk-1, RREB-1 and SPIB) were over-represented using the Clover criteria (p < 0.05) among all 8 promoter sequences of miRNA genes that are up-regulated in monocytes, and to have at least 1 high-scoring TFBS (instance score ≥ 6) in each promoter. Selecting TFBSs using these same criteria (over-represented motifs and at least 1 high-scoring TFBS per promoter) among the 12 promoters of the miRNAs that are up-regulated in DCs relative to monocytes, reveals that they have 13 TFBSs in common (Figure 2, Table 3, Additional file 4, Table S2). Furthermore, 4 TFBSs are over-represented

113 X X X X X X No difference X X X X X X Up in monocytes X X X Up in DC subset X X X X X X X X Up in DC same same same same hsa-miR-125b-1, hsa-miR-125b-2 same same same hsa-miR-135a-2 hsa-miR-135a-1, same same hsa-miR-146a same same same same hsa-miR-199a-2 hsa-miR-199a-1, same same same same hsa-miR-16-2 hsa-miR-16-1, same Current genomic entries in miRbase in genomic entries Current matching mature miRNA* AACCCGUAGAUCCGAUCUUGUG AACCCGUAGAUCCGAACUUGUG UCCCUGAGACCCUUUAACCUGUG UCCCUGAGACCCUAACUUGUGA UCGUACCGUGAGUAAUAAUGC UAUUGCUUAAGAAUACGCGUAG AGUGGUUUUACCCUAUGGUAG UGGCAGUGUCUUAGCUGGUUGU CAGUGCAAUGUUAAAAGGGC UAACAGUCUACAGCCAUGGUCG UAUGGCUUUUUAUUCCUAUGUGA UGAGAACUGAAUUCCAUGGGUU UCUCCCAACCCUUGUACCAGUG UUAAUGCUAAUCGUGAUAGGGG CAAAGAAUUCUCCUUUUGGGCUU CCCAGUGUUUAGACUAUCUGUUC CCCAGUGUUCAGACUACCUGUU CUGUGCGUGUGACAGCGGCUG UUCACAGUGGCUAAGUUCUG UAGCAGCACAUCAUGGUUUACA UGAGGUAGGAGGUUGUAUAGU UAGCAGCACGUAAAUAUUGGCG UUCACAGUGGCUAAGUUCCGCC Mature miRNA Sequence miRNA Mature between types. cell MicroRNAs rescreened after primary screen in monocytes and dendritic and cells, their di erences in expression level - miR-27b -miR-99a -miR-100 -miR-125a -miR-125b -miR-126 -miR-137 -miR-140 -miR-34a -miR-130a -miR-132 -miR-135a -miR-146 -miR-150 -miR-155 -miR-186 -miR-199b -miR-199-s -miR-210 -miR-15b -let-7e -miR-16 -miR-27a hsa hsa hsa hsa hsa hsa hsa hsa- hsa hsa hsa hsa hsa hsa hsa hsa hsa hsa hsa hsa hsa hsa hsa Table 1 MicroRNA in assay *27 miRNAs*27 differentially expressed between monocytes, DCs and stimulated DCs

114 Transcription and targets of DC-expressed miRNAs X X X X X X No difference X X X X X X Up in monocytes X X X Up in DC subset X X X X X X X X Up in DC

Figure 1 - Expression profiles of various miRNAs that are differentially expressed in monocytes and DCs.

The upper two charts represent miRNAs over-expressed in monocytes, the middle row of charts represent 3 of the miRNAs over-expressed in DCs, and the lower row of charts represent the 3 miRNAs that are differentially expressed among subsets of DCs. On the Y-axis relative expression is depicted, which is normalized to the expression of hsa-let-7a. same same same same hsa-miR-125b-1, hsa-miR-125b-2 same same same hsa-miR-135a-2 hsa-miR-135a-1, same same hsa-miR-146a same same same same hsa-miR-199a-2 hsa-miR-199a-1, same same same same hsa-miR-16-2 hsa-miR-16-1, same Current genomic entries in miRbase in genomic entries Current matching mature miRNA*

in the promoters in all of the 8 miRNAs that did not show any appreciable difference in expression levels (Table 3). The complete set of high-scoring TFBSs found in the 5 promoters of the miRNAs that are up-regulated in DCs and monocytes are summarized in Figure 3 and Additional file 5, Figure S3 respectively.

Clover detects motifs that are over-represented relative to their expected frequencies and calculates p-values based on a comparison with all promoters in the genome. AACCCGUAGAUCCGAUCUUGUG AACCCGUAGAUCCGAACUUGUG UCCCUGAGACCCUUUAACCUGUG UCCCUGAGACCCUAACUUGUGA UCGUACCGUGAGUAAUAAUGC UAUUGCUUAAGAAUACGCGUAG AGUGGUUUUACCCUAUGGUAG UGGCAGUGUCUUAGCUGGUUGU CAGUGCAAUGUUAAAAGGGC UAACAGUCUACAGCCAUGGUCG UAUGGCUUUUUAUUCCUAUGUGA UGAGAACUGAAUUCCAUGGGUU UCUCCCAACCCUUGUACCAGUG UUAAUGCUAAUCGUGAUAGGGG CAAAGAAUUCUCCUUUUGGGCUU CCCAGUGUUUAGACUAUCUGUUC CCCAGUGUUCAGACUACCUGUU CUGUGCGUGUGACAGCGGCUG UUCACAGUGGCUAAGUUCUG UAGCAGCACAUCAUGGUUUACA UGAGGUAGGAGGUUGUAUAGU UAGCAGCACGUAAAUAUUGGCG UUCACAGUGGCUAAGUUCCGCC

Mature miRNA Sequence miRNA Mature Nevertheless such predictions can potentially give rise to false positives, as even random sets of miRNA promoters contain some over-represented motifs (Figure 3B). We between types. cell MicroRNAs rescreened after primary screen in monocytes and dendritic and cells, their di erences in expression level therefore also examined the over-representation of motifs in the promoters of DC and - monocyte miRNAs relative to those of random sets of miRNAs, selected from the total miR-27b -miR-99a -miR-100 -miR-125a -miR-125b -miR-126 -miR-137 -miR-140 -miR-34a -miR-130a -miR-132 -miR-135a -miR-146 -miR-150 -miR-155 -miR-186 -miR-199b -miR-199-s -miR-210 -miR-15b -let-7e -miR-16 -miR-27a set of 167 miRNA promoters. The 12 miRNAs that are over-expressed in DCs share more hsa hsa hsa hsa hsa hsa hsa hsa- hsa hsa hsa hsa hsa hsa hsa hsa hsa hsa hsa hsa hsa hsa hsa Table 1 MicroRNA in assay

*27 miRNAs*27 differentially expressed between monocytes, DCs and stimulated DCs TFBS motifs than do random sets of 12 miRNA promoters, specifically with respect to

115 Table 2 - Assignment of p-values for expressed miRNAs in all cell types (monocytes and various DC subsets) and DC subsets alone after ANOVA.

Posterior p-values* miRNA gene All cell types DC subsets hsa-let-7e 1.9E-05 4.2E-01 hsa-miR-15b 1.2E-04 6.8E-01 hsa-miR-16 2.1E-02 7.8E- 01 hsa-miR-27a 3.0E-01 2.5E-01 hsa-miR-27b 8.2E-01 8.2E-01 hsa-miR-34a 3.0E-05 9.6E-01 hsa-miR-99a 2.2E-06 7.2E- 01 hsa-miR-100 3.7E-06 7.3E- 01 hsa-miR-125a 1.3E-04 8.6E-01 hsa-miR-125b 1.6E-05 7.2E- 01 hsa-miR-126 6.1E-01 7.7E- 01 hsa-miR-130a 1.6E-03 4.4E-01 **hsa-miR-132 4.0E-02 9.8E-01 hsa-miR-135a 7.1E-10 9.1E-02 hsa-miR-137 3.2E-07 2.1E-01 hsa-miR-140 8.2E-01 7.1E- 01 hsa-miR-146a 9.6E-05 1.3E-02 ***hsa-miR-150 9.9E-02 7.1E- 01 hsa-miR-155 4.7E-06 2.5E-02 hsa-miR-186 7.2E- 01 6.9E-01 hsa-miR-199b 5.2E-05 7.1E- 01 hsa-miR-199s 8.4E-03 4.4E-01 hsa-miR-210 1.3E-05 4.0E-02 hsa-miR-221 5.6E-06 7.4E-01 hsa-miR-326 7.4E-01 9.7E-01 hsa-miR-340 8.6E-01 7.9E- 01 hsa-miR-342 6.1E-06 9.9E-01

*) Posterior p-values < 0.05 indicate differential expression among subsets. **) Considered not expressed differentially after visual inspection of gene expression data. ***) Considered expressed differentially after visual inspection of the data.

116 Transcription and targets of DC-expressed miRNAs

Figure 2 - Representation of TFBSs in promoter regions of miRNAs up-regulated in DCs.

Depicted are the TFBSs at a motif instance score threshold of at least 6 in the miRNA promoter regions. The scale of the y-axis ranges from 5 to 15 for each subgraph. The legend shows only those TFBSs that are present in all promoter sequences.

Table 3 - TFBSs that are present in all miRNA promoters in a particular (group of) cell type(s).

Cell types Overrepresented TFBS Monocytes Elk-1, RREB-1, SPIB DCs* AP-4, E47, MAZR, Myf, MZF1, RREB1, RREB-1, Sp1, SPI1, SPIB, STAT6, ZNF42_1-4, ZNF42_5-13 Subset of DCs** AP-1, ARP-1, FOXI1, myogenin/NF-1, Ncx, Oct-1, OCT-x, Sp1, TCF11, 5 ZNF42_1-4 All cells*** Oct-1, RP58, SPIB, USF

*) Includes monocyte-derived iDCs, maturing DCs, tolerogenic DCs and activated tolerogenic DCs **) Includes maturing DCs and/or activated tolerogenic DCs ***) Concerns all cells investigated, and miRNAs did not show appreciable differences in expression level motifs that are shared by many of the promoters (Figure 3B). When quantifying the number of shared motifs as the sum of all occurrences of all Clover detected motifs, across all promoters, we observe an enrichment of 1.96 relative to randomly selected miRNA promoters (p = 0.001). Likewise, using random sets of 8 miRNA promoters, we

117 Figure 3 - TFBSs shared among the promoter regions of miRNAs up-regulated in DCs.

(A) Shown on the x-axis are the high-scoring TFBSs (i.e. of instance score ≥ 6) that occur at least once in the 2 kb promoters of the miRNA up-regulated in DCs. The y-axis shows the number of promoters that have the TFBS at least once at this threshold. (B) The distribution of the number of common TFBS hits per number of common miRNA promoters as in A, for DCs and random sets of miRNA promoters. The values for random are the median values from 1000 randomly chosen sets of 12 miRNA promoters. The score is the ratio of the sum of all TFBS occurrences across all promoters for the DC set relative to that of the random set. The p-value is the fraction of cases wherein this sum for random sets of miRNA promoters is greater than or equal to that of the DC set.

observed that the 8 miRNAs that are over-expressed in monocytes have a factor of 1.58 enrichment of TFBS motifs (p = 0.046; Additional file 5, Figure S3B). The p-value here was estimated as the number of times, out of 1000, when a randomly chosen set of miRNA

118 Transcription and targets of DC-expressed miRNAs

promoters had at least an equal number of motifs as did the test set of sequences. Given the significant over-representation of different sets of TFBSs in the different sets of miRNAs, it appears that miRNAs contain multiple, different TFBSs in their promoters that specify their expression in certain cell types. This is in agreement with other studies that show that transcription factors often work in combinations [25, 26].

3.4 Evolutionary conservation and distribution of TFBSs in the promoter sequences of myeloid miRNAs Assuming that regulation of miRNA expression is conserved among mammals, we used the PhastCons evolutionary conservation track that is based on nucleotide conservation among placental mammals (http://hgdownload.cse.ucsc.edu/goldenPath/hg18/ phastCons44way/) to quantify evolutionary conservation of the predicted TFBSs. We compared both the motif instance scores and raw scores of TFBSs, as derived from Clover, with the nucleotide conservation score of the TFBSs. Applied to the 2 kb upstream sequences of the pre-miRNAs that are up-regulated in DCs, a small but significant correlation of 0.12 (p = 6.17 × 10-7) was observed between the conservation scores and the TFBS instance scores. Similar results were obtained using the raw score (r = 0.22, p = 8.99 × 10-2). An overlay of TFBS instance scores and conservation scores for the DC miRNA promoters is depicted in Figure 4A. The positive but weak correlation between TFBS motif scores and conservation scores suggests that they are largely independent measures and as such, combination of the two as a filtration measure would be non-redundant. When filtering predicted sites based both on their conservation and on their TFBS instance scores, TFBSs tend to cluster in a region 500 bp just upstream of the putative TSS of miRNA genes that are over-expressed in DCs (Figure 4B). This result corroborates other findings indicating that TFBSs tend to cluster near the transcription start site (TSS) of genes [51-54]. In addition, it suggests that combining the motif matching score with the extent of evolutionary conservation of a TFBS would likely reveal TFs biologically relevant to the identified set of miRNA promoters. 5 In further support of the conservation of the binding motifs that are over-represented in the promoters of the DC-expressed miRNAs, we created a set of random sequence blocks of the same length-distribution as the predicted TFBSs from the same promoters as the test set. From these two data sets, we observed that the predicted TFBSs are significantly (p = 1.49 × 10-3, Wilcoxon rank sum test) more conserved than randomly chosen sequences. The conservation score assigned to a TFBS is the median of the PhastCons scores of the bases in that TFBS segment. We observed that the median of the conservation scores of the test TFBSs is more than 2 fold higher than that of the random sequences (Figure 4C). The complete set of conserved high-scoring TFBSs found in the promoters of the miRNAs up-regulated in DCs and monocytes are summarized in Figure 5 and Additional file 6, Figure S4 respectively. Furthermore, we

119 Figure 4 - Distribution and evolutionary conservation of TFBSs in promoter regions of DC-expressed miRNAs.

(A) The TFBS motif scores (in blue, y-axis range from 0 to 15 per subgraph) as calculated using Clover and the extent of evolutionary conservation as PhastCons score (in grey, y-axis on the right). (B) Distribution pattern of TFBSs, selected based on various motif score and evolutionary score thresholds, in the promoter region of miRNAs up-regulated in DCs.

120 Transcription and targets of DC-expressed miRNAs

(C) Comparison of the median conservation scores of the bases in the predicted TFBSs in the promoters of miRNAs that are over-expressed in DCs, with those of randomly chosen sequences of the same length from the same promoters. Boxplots represent the median and interquartile range of the median PhastCons conservation scores. The fold shown is the ratio of the median of the DC conservation scores and the median of the conservation scores of the random sets (n=number of TFBS instances concerned). observed at the threshold of 10th percentile of median TFBS conservation, that the miRNAs that are over-expressed in DCs share more (fold = 3.51, p < 0.001) sites for TF motifs than did 1000 random sets of promoters of equal size, length and at the same conservation threshold (Figure 5B). A similar result (fold = 3.48, p= 0.003) was obtained for the miRNAs that were over-expressed in monocytes (Additional file 6, Figure S4B). These results show that evolutionary conserved TFBSs are more common in the promoters of miRNAs over-expressed in DCs and monocytes than in randomly chosen miRNA promoters.

3.5 Integrating the motif-matching and evolutionary conservation scores of TFBSs and further validation of cognate TFs in microarray data from DCs Without taking evolutionary information into account, 13 TFBSs were found to occur at least once in all the promoters of miRNAs preferentially expressed in DCs (Figure 2). Filtering TFBSs at 10th percentile threshold of the TFBS conservation scores not surprisingly identifies TFBSs that are common to far less miRNAs promoters (Figure 5). Among these, the TFBS of MZF1 was the most conserved at the thresholds used. To validate the expression of the cognate transcription factors whose binding site was identified, a transcriptome was generated of DCs from one donor using microarrays. The resulting data were screened for the presence or absence of transcription factors (TFs) whose binding sites were predicted in promoters of miRNAs up-regulated in DCs. For 10 out of the 17 TFBSs that are over-represented among the DC promoters (Additional 5 file 4, Table S2B) and that contained at least one TFBS with an instance score of at least 6.0 and a conservation threshold score of 0.968 (i.e. the 10th percentile of the median conservation scores of all TFBSs predicted in the set of miRNA promoters) as shown in Figure 5, the mRNA encoding their cognate TF was expressed in DCs. Expression of this 58.8% of the predicted over-represented TFs is slightly higher than that for all the TFs that were examined with the microarrays (54% expressed, 46 % not expressed, Additional file 7, Table S3). Nevertheless, we found a high correlation (r = 0.91, p = 1.17 x 10 -2) when the number of motifs of expressed TFs was correlated with the number of miRNA promoters in which the motifs of these TFs were detected (Additional file 8, Figure S5A). An insignificant correlation (r=-0.72, p= 4.88 x 10-1) was observed when a random set of miRNA promoters was used (Additional file 8, Figure S5B).

121 Figure 5 - TFBSs shared among the promoter regions of miRNAs up-regulated in DCs.

(A) Degree to which conserved TFBSs in promoter regions of miRNAs up-regulated in DCs are shared among their promoters. Shown on the x-axis are high-scoring TFBSs filtered at a motif instance score threshold of at least 6 and at a 10th percentile evolutionary conservation score that occur at least once in the promoters of miRNAs up-regulated in DCs. The y-axis shows the number of promoters that have the TFBS at least once at these thresholds. (B) The distribution of the number of common TFBS hits per number of common miRNA promoters as in A, for test and random sets of miRNA promoters. The values for random are the median values from 1000 random sets of miRNA promoters of the same size and length as those in the DC set. The score is the ratio of the sum of all occurrences of conserved TFBS for the DC set relative to that of the random sets. The p-value is estimated from the number of instances wherein this sum for 1000 random sets of miRNA promoters is greater than or equal to that of the DC set.

122 Transcription and targets of DC-expressed miRNAs

Gene Ontology of the expressed TFs highlighted immune-related processes like “interspecies interaction between organisms” (p=4.49 x 10-5) and “regulation of T-helper 2 type immune response” (p=1.19 x 10-2, Additional file 9, Table S4A). This is in line with the expected functions of DCs in immune response. Meanwhile, Gene Ontology of the TFs that were not expressed highlighted development-related processes like “cell development” (p=1.88 x 10-3) and “cell fate commitment” (p=7.54 x 10-3, Additional file 9, Table S4B), with little relevance to immune-related processes. Taken together, these data provide evidence that many expressed miRNAs in DCs have evolutionarily conserved TFBSs that may be relevant to DC biology in their promoters.

3.6 Target prediction and Gene Ontology analysis of microRNAs enriched in monocytes and DCs To extend the regulatory pathways in which the miRNAs participate downstream, we predicted miRNA targets, using TargetScanHuman version 5.1 [15] and PicTar [46]. Both datasets were based on conservation in mammals. We obtained the precompiled dataset from their official websites. At a TargetScan context score threshold of at most -0.4, we obtained 421 distinct target genes (Additional file 10, Table S5; counting in RefSeq DNA IDs) for the miRNAs that are over-expressed in DCs. Meanwhile at a PicTar score threshold of at least 0.4, we obtained 2300 distinct target genes (Additional file 11, Table S6) for the miRNAs over-expressed in DCs. Likewise, we predicted 282 (Additional file 12, Table S7) and 1591 (Additional file 13, Table S8) distinct genes using TargetScan and PicTar respectively, to be targets for the miRNAs that are over-expressed in monocytes. For both DC-over-expressed miRNAs and monocyte-over-expressed miRNAs, the overlap of targets predicted by both TargetScan and PicTar is less than 50% of their respective total predictions (Additional file 14, Figure S6) suggesting that we still need to examine their independent predictions. Interestingly, only 7 identical targets were predicted for both monocyte-over-expressed miRNAs and DC-over-expressed miRNAs at the TargetScan threshold of ≤-0.4 and PicTar threshold of ≥0.4. The numbers of target genes predicted are summarized in Additional file 14, Figure S6. 5

The number of target genes predicted using TargetScan and PicTar are very large even at the stringent thresholds used. Furthermore, Gene Ontology (GO) analyses of the gene sets did not yield conclusive results. To increase the relevance of the predicted targets to DCs we further filtered them based on their differential expression in DCs relative to monocytes during DC differentiation using previously published data by Lehtonen et al (2007) [47]. At a gene expression fold change threshold of 2, we identified 34 unique genes corresponding to the DC-associated targets predicted using PicTar to be down-regulated at all time points in DCs relative to monocytes (Additional file 15, Table S9A). Amongst this filtered set of genes is IL10, which interestingly is involved in the process of “negative regulation of myeloid dendritic cell activation” (p=2.64 x 10-2,

123 from GO analysis of the 34 down-regulated genes). Correspondingly, 10 unique genes were associated with the down-regulated genes of the TargetScan-predicted targets of miRNAs that were over-expressed in DCs relative to monocytes (Additional file 15, Table S9B). In total, 6 of these down-regulated genes (SIK1, PELI2, MYLIP, RNF138, CCNG2 and FOXO1; in order of increasing down-regulation of their RNA transcripts at time point 24 hr) were also predicted using PicTar (Additional file 15, Table S9). Of these, SIK1, PELI2 and one transcript of CCNG2 were increasingly down-regulated in DCs relative to monocytes in the time course of DC differentiation from monocytes; suggesting their involvement in mechanisms of DC differentiation.

In addition to the DC dataset, we also examined the expression of the predicted target genes of the miRNAs that are over-expressed in monocytes relative to DCs. We found 35 unique genes associated with the PicTar-predicted targets of miRNAs over-expressed in monocytes relative to DCs that were down-regulated in monocytes relative to DCs (Additional file 16, Table S10A). Correspondingly, transcripts of 9 unique genes associated with the TargetScan-predicted targets of miRNA over-expressed in monocytes relative to DCs were also down-regulated at all time points in monocytes relative to DCs (Table S10B). Among these is DICER1 which is involved in RNA interference and is up-regulated in DCs. Furthermore, there was an overlap of 2 genes (ACVR2A and MAP3K4) between the PicTar and TargetScan target gene sets (Additional file 14, Figure S6 and Additional file 17, Table S11). One transcript of ACVR2A was increasingly down-regulated (Additional file 16, Table S10). Moreover, there was no overlap in expressed genes that were targets to both DC- and monocyte-over-expressed miRNAs, suggesting congruency in our data. These results suggest that miRNA target genes can be either up-regulated or down-regulated in myeloid cells to regulate differentiation of the myeloid cells.

4. Discussion

As little is known about the expression and regulation of miRNAs in monocytes, DCs and their stimulated progeny, part of the miRNA transcriptome of these cells was generated and analyzed. Out of 157 miRNAs profiled, 104 appeared to be expressed in myeloid cells to a varying degree. Since the database miRBase 13.0 (March 2009) reveals the existence of at least 706 human miRNAs, it is conceivable that a much higher number of miRNAs is expressed in human monocytes and DCs. Six miRNAs appear up-regulated in monocytes, 10 are up-regulated in DCs and 3 are differentially expressed among different DC populations in response to LPS. Also, some miRNAs showed similar changes in expression levels among DC subsets.

124 Transcription and targets of DC-expressed miRNAs

Amongst the 10 miRNAs that were over-expressed in DCs as compared to the monocytes from which they were derived, hsa-miR-34a, hsa-miR-125a, hsa-miR-342 and hsa-let-7e have been shown to be up-regulated in the course of DC differentiation from monocytes in culture [55, 56]. The up-regulation of hsa-miR-342 in monocyte-derived DCs as compared to monocytes is likely the result of culture conditions, as hsa-miR-342 is expressed at a much lower level in freshly isolated blood-derived myeloid DCs than in DCs generated in vitro (data not shown). Of the remaining miRNAs, 6 show an upward trend in expression level in blood-derived DCs, but the levels in cultured DCs are higher. Intriguingly, hsa-miR-146a and hsa-miR-155 appear to respond to LPS, but much less so than in monocytes and macrophages [7, 8]. Recently, it was demonstrated that LPS-mediated activation of protein kinase Akt1 results in up-regulation of miRNA let-7e in primary macrophages, while at the same time repressing miR-155 expression [57]. Our data show that both hsa-let-7e and hsa-miR-155 are up-regulated in DCs compared to monocytes, but that only hsa-miR-155 is slightly up-regulated by LPS. Taken together, these data imply that intrinsic differences between DCs and macrophages exist in response to the TLR4 ligand LPS.

With regard to the targets of the miRNAs screened herein, recent literature indicates that inhibition of hsa-miR-34a or addition of one of its target genes, JAG1, have been observed to functionally stall the differentiation of monocyte-derived dendritic cells [55]. This supports our identification of hsa-miR-34a as an over-expressed miRNA in DCs relative to monocytes and JAG1 as its target gene (NM_000214 in Additional file 11, Table S6). In addition, we provide a list of predicted DC targets that are down-regulated in expression in DCs relative to monocytes (Additional file 15, Table S9). In this list, one of the targets of the DC-expressed miRNA, hsa-let-7e, is IL10, which is involved in negative regulation of myeloid dendritic cell activation. Taken together, these data highlight miRNAs, and their target genes, that can potentially modulate DC differentiation.

Extensive miRNA promoter analysis revealed that 13 TFBSs are over-represented and 5 commonly shared by the 12 promoter sequences of the 10 miRNAs that are up-regulated in DCs. These include signal transducer and activator of transcription 6 (STAT6), which is known to be involved in IL4 signaling and DC differentiation and maturation [58], spleen focus forming virus (SFFV) proviral integration oncogene spi1 (SPI1, also known as PU.1), which is indispensable for normal myeloid and lymphoid development [59] and specificity protein 1 (SP1), which is involved in the expression of dendritic cell-specific ICAM-3 grabbing non-integrin (D C-SIGN) [60].

Although the TSSs in these promoter regions are not known, TFBSs did cluster, especially when taking evolutionary conservation into account, within the 500 bp upstream region of the annotated pre-miRNAs, coinciding with the region in miRNA promoters at

125 which TSS have been discovered experimentally [19, 61]. Amongst the TFBSs predicted in the DC miRNAs, the sites for myeloid zinc finger (MZF1) is best conserved of all, even though the role of its cognate TF in DCs remains elusive. Nevertheless, MZF1 is thought to be a bi-functional transcriptional regulator, repressing transcription in non-hemato- poietic cells, activating transcription in cells of hematopoietic origins and controlling cell proliferation and tumorigenesis [62, 63]. In addition, one of the conserved TFBSs for which the cognate TF was not expressed in monocyte-derived DCs appears to be Spi-B transcription factor (SPIB), which has been implicated in plasmacytoid DC (pDC) development, a non-myeloid cell type [64, 65]. It should be noted, however, that many of the miRNAs expressed in myeloid monocyte-derived DCs are also expressed in pDCs (data not shown).

When taking all predicted and well-conserved TFBSs in the promoters of miRNAs that are up-regulated in DCs into account, the number of TFs expressed increases with the degree at which their TFBS motifs are shared between the promoter sequences. Gene Ontology analysis indicates that the type of expressed TFs enriched in these data are relevant to the immune system process and are in line with the known function of DCs, whereas those that were not expressed are relevant to cell development (Additional file 9, Table S4). Importantly, libraries of TFBS motifs used do not represent all possible TFBSs and as such not all possible TFBSs for expressed miRNAs have been identified in this study. Moreover, for consistency, we have used TFBS motifs, and not TFs, in making comparisons of TFBSs because the databases used have redundant motif names for the same TFs. It should be noted that predicted target genes were not uniformly down- regulated, as we found evidence in DCs whereby predicted target genes were up-regulated in the cells or were also targets for the miRNAs that were over-expressed in monocytes. This may be due to limitations of the target prediction algorithms, or the targets might still be down-regulated at the protein level. Nevertheless, none of the down-regulated DC-miRNA targets was a predicted target for the monocyte-miRNAs. Furthermore, there is evidence that in a minority of cases, target genes are actually up-regulated by miRNAs [66, 67]. The up-regulation of DICER1 in DCs relative to monocytes is of special interest, as it is known that innate immune signaling is tightly controlled by miRNAs [7, 8]. Furthermore, there is also evidence that one type of DC, the Langerhans cell, requires proper functioning of DICER1 to induce CD4 T cell function [68]. Together with the fact that there are more up-regulated miRNAs in DCs than monocytes in our data set, it is tempting to speculate that DICER1 up-regulation is required for proper DC function.

126 Transcription and targets of DC-expressed miRNAs

5. Conclusions

The data provide evidence that, among the many expressed miRNAs in DCs, evolutionarily conserved TFBSs relevant to DC biology are present in their promoters. Furthermore, the identified miRNAs, their associated TFs and predicted target genes could help improve our understanding of the molecular pathways that underpin DC differentiation and maturation.

Abbreviations hsa: Homo sapiens; miRNA: microRNA; DC: dendritic cell; iDC: immature DC; mDC: mature DC; tDC: tolerogenic DC; atDC: activated tolerogenic DC; pDC: plasmacytoid DC; TFBS: transcription factor binidng sites; TF: transcription factor; TSS: transcription start site; RISC: RNA-induced signaling complex; TLR: toll-like receptor; TNF: tumor necrosis factor; FACS: fluorescence-activated cell sorting; PCR: polymerase chain reaction; HMM: hidden markov modeling; GO: Gene Ontology; ANOVA: analysis of variance

Authors’ contributions BJHJ conceived the study, carried out the miRNA profiling and mRNA profiling, performed statistical analysis on the data, participated in the bioinformatics analysis, carried out micro- array experiments and drafted the manuscript. IES conceived the study, participated in the statistical analysis of the data, performed the bioinformatics analysis, and drafted the manuscript. DET cultured cells and analyzed cellular phenotypes. MAvHK participated in cell culture, sample preparation and miRNA profiling. JHJ participated in the design of the study and provided miRNA assays. MAH participated in the design and coordination of the study. GJA participated in the design and coordination of the study. All authors read and approved the manuscript.

Authors’ information BJHJ and IES contributed equally to this manuscipt. BJHJ is now at Lead Pharma BV, 5 Nijmegen, the Netherlands.

Acknowledgements The authors would like to gratefully thank Dr. Ruurd Torensma, Department of Tumor Immunology, Nijmegen Centre for Molecular Life Sciences, Radboud University Nijmegen Medical Centre, for his continuous support and critical assessment of the manuscript. This work was supported by the VIRGO consortium, an Innovative Cluster approved by the Netherlands Genomics Initiative and partially funded by the Dutch Government (BSIK 03012), The Netherlands, and by VICI grant 918-66-615 (awarded to G.J.A.) from the Netherlands Organization for Scientific Research (NWO).

Additional les are available in Appendix 3.

127 References

1. He L, Hannon GJ: MicroRNAs: small RNAs with a big role in gene regulation. Nature reviews 2004, 5(7):522-531. 2. Ma L, Weinberg RA: Micromanagers of malignancy: role of microRNAs in regulating metastasis. Trends Genet 2008, 24(9):448-456. 3. van Rooij E, Olson EN: MicroRNAs: powerful new regulators of heart disease and provocative therapeutic targets. The Journal of clinical investigation 2007, 117(9):2369-2376. 4. Kim VN, Han J, Siomi MC: Biogenesis of small RNAs in animals. Nat Rev Mol Cell Biol 2009, 10(2):126-139. 5. Chen CZ, Li L, Lodish HF, Bartel DP: MicroRNAs modulate hematopoietic lineage di erentiation. Science (New York, NY 2004, 303(5654):83-86. 6. Li QJ, Chau J, Ebert PJ, Sylvester G, Min H, Liu G, Braich R, Manoharan M, Soutschek J, Skare P, Klein LO, Davis MM, Chen CZ: miR-181a is an intrinsic modulator of T cell sensitivity and selection. Cell 2007, 129(1):147-161. 7. Taganov KD, Boldin MP, Chang KJ, Baltimore D: NF-kappaB-dependent induction of microRNA miR-146, an inhibitor targeted to signaling proteins of innate immune responses. Proceedings of the National Academy of Sciences of the United States of America 2006, 103(33):12481-12486. 8. O’Connell RM, Taganov KD, Boldin MP, Cheng G, Baltimore D: MicroRNA-155 is induced during the macrophage inammatory response. Proceedings of the National Academy of Sciences of the United States of America 2007, 104(5):1604-1609. 9. Dorsett Y, McBride KM, Jankovic M, Gazumyan A, Thai TH, Robbiani DF, Di Virgilio M, San-Martin BR, Heidkamp G, Schwickert TA, Eisenreich T, Rajewsky K, Nussenzweig MC: MicroRNA-155 suppresses acti- vation-induced cytidine deaminase-mediated Myc-Igh translocation. Immunity 2008, 28(5):630-638. 10. Kohlhaas S, Garden OA, Scudamore C, Turner M, Okkenhaug K, Vigorito E: Cutting edge: the Foxp3 target miR-155 contributes to the development of regulatory T cells. J Immunol 2009, 182(5):2578-2582. 11. O’Connell RM, Rao DS, Chaudhuri AA, Boldin MP, Taganov KD, Nicoll J, Paquette RL, Baltimore D: Sustained expression of microRNA-155 in hematopoietic stem cells causes a myeloproliferative disorder. The Journal of experimental medicine 2008, 205(3):585-594. 12. Rodriguez A, Vigorito E, Clare S, Warren MV, Couttet P, Soond DR, van Dongen S, Grocock RJ, Das PP, Miska EA, Vetrie D, Okkenhaug K, Enright AJ, Dougan G, Turner M, Bradley A: Requirement of bic/microRNA-155 for normal immune function. Science (New York, NY 2007, 316(5824):608-611. 13. Romania P, Lulli V, Pelosi E, Biffoni M, Peschle C, Marziali G: MicroRNA 155 modulates megakaryopoiesis at progenitor and precursor level by targeting Ets-1 and Meis1 transcription factors. British journal of haematology 2008, 143(4):570-580. 14. Vigorito E, Perks KL, Abreu-Goodger C, Bunting S, Xiang Z, Kohlhaas S, Das PP, Miska EA, Rodriguez A, Bradley A, Smith KG, Rada C, Enright AJ, Toellner KM, Maclennan IC, Turner M: microRNA-155 regulates the generation of immunoglobulin class-switched plasma cells. Immunity 2007, 27(6):847-859. 15. Friedman RC, Farh KK, Burge CB, Bartel DP: Most mammalian mRNAs are conserved targets of microRNAs. Genome research 2009, 19(1):92-105. 16. Lee Y, Kim M, Han J, Yeom KH, Lee S, Baek SH, Kim VN: MicroRNA genes are transcribed by RNA polymerase II. The EMBO journal 2004, 23(20):4051-4060. 17. Borchert GM, Lanier W, Davidson BL: RNA polymerase III transcribes human microRNAs. Nature structural & molecular biology 2006, 13(12):1097-1101. 18. Monteys AM, Spengler RM, Wan J, Tecedor L, Lennox KA, Xing Y, Davidson BL: Structure and activity of putative intronic miRNA promoters. RNA (New York, NY 2010, 16(3):495-505. 19. Corcoran DL, Pandit KV, Gordon B, Bhattacharjee A, Kaminski N, Benos PV: Features of mammalian microRNA promoters emerge from polymerase II chromatin immunoprecipitation data. PLoS ONE 2009, 4(4):e5279. 20. Ozsolak F, Poling LL, Wang Z, Liu H, Liu XS, Roeder RG, Zhang X, Song JS, Fisher DE: Chromatin structure analyses identify miRNA promoters. Genes & development 2008, 22(22):3172-3183. 21. Long YS, Deng GF, Sun XS, Yi YH, Su T, Zhao QH, Liao WP: Identication of the transcriptional promoters in the proximal regions of human microRNA genes. Molecular biology reports 2011, 38(6):4153-4157.

128 Transcription and targets of DC-expressed miRNAs

22. Coulson RM, Hall N, Ouzounis CA: Comparative genomics of transcriptional control in the human malaria parasite Plasmodium falciparum. Genome research 2004, 14(8):1548-1554. 23. Gunasekera AM, Myrick A, Militello KT, Sims JS, Dong CK, Gierahn T, Le Roch K, Winzeler E, Wirth DF: Regulatory motifs uncovered among gene expression clusters in Plasmodium falciparum. Molecular and biochemical parasitology 2007, 153(1):19-30. 24. van Noort V, Huynen MA: Combinatorial gene regulation in Plasmodium falciparum. Trends Genet 2006, 22(2):73-78. 25. Chow KL, Schwartz RJ: A combination of closely associated positive and negative cis-acting promoter elements regulates transcription of the skeletal alpha-actin gene. Molecular and cellular biology 1990, 10(2):528-538. 26. Halfon MS, Carmena A, Gisselbrecht S, Sackerson CM, Jimenez F, Baylies MK, Michelson AM: Ras pathway specicity is determined by the integration of multiple signal-activated and tissue-restricted transcription factors. Cell 2000, 103(1):63-74. 27. Liu K, Nussenzweig MC: Origin and development of dendritic cells. Immunological reviews 2010, 234(1):45-54. 28. de Vries IJ, Eggert AA, Scharenborg NM, Vissers JL, Lesterhuis WJ, Boerman OC, Punt CJ, Adema GJ, Figdor CG: Phenotypical and functional characterization of clinical grade dendritic cells. J Immunother 2002, 25(5):429-438. 29. Banchereau J, Steinman RM: Dendritic cells and the control of immunity. Nature 1998, 392(6673):245-252. 30. Mellman I, Steinman RM: Dendritic cells: specialized and regulated antigen processing machines. Cell 2001, 106(3):255-258. 31. Figdor CG, de Vries IJ, Lesterhuis WJ, Melief CJ: Dendritic cell immunotherapy: mapping the way. Nature medicine 2004, 10(5):475-480. 32. Schreurs MW, Eggert AA, Punt CJ, Figdor CG, Adema GJ: Dendritic cell-based vaccines: from mouse models to clinical cancer immunotherapy. Critical reviews in oncogenesis 2000, 11(1):1-17. 33. Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, Weinstock GM, Wilson RK, Gibbs RA, Kent WJ, Miller W, Haussler D: Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome research 2005, 15(8):1034-1050. 34. Siepel A, Haussler D: Phylogenetic hidden Markov models. In: Statistical methods in molecular evolution. Edited by Nielsen R. New York: Springer; 2005: xii, 504 p. 35. Benitez-Ribas D, Adema GJ, Winkels G, Klasen IS, Punt CJ, Figdor CG, de Vries IJ: Plasmacytoid dendritic cells of melanoma patients present exogenous proteins to CD4+ T cells after Fc gamma RII-mediated uptake. The Journal of experimental medicine 2006, 203(7):1629-1635. 36. Chen C, Ridzon DA, Broomer AJ, Zhou Z, Lee DH, Nguyen JT, Barbisin M, Xu NL, Mahuvakar VR, Andersen MR, Lao KQ, Livak KJ, Guegler KJ: Real-time quantication of microRNAs by stem-loop RT-PCR. Nucleic acids 5 research 2005, 33(20):e179. 37. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JY, Zhang J: Bioconductor: open software development for computational biology and bioinformatics. Genome biology 2004, 5(10):R80. 38. The R Development Core Team: R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2008. 39. Hinrichs AS, Karolchik D, Baertsch R, Barber GP, Bejerano G, Clawson H, Diekhans M, Furey TS, Harte RA, Hsu F, Hillman-Jackson J, Kuhn RM, Pedersen JS, Pohl A, Raney BJ, Rosenbloom KR, Siepel A, Smith KE, Sugnet CW, Sultan-Qurraie A, Thomas DJ, Trumbower H, Weber RJ, Weirauch M, Zweig AS, Haussler D, Kent WJ: The UCSC Genome Browser Database: update 2006. Nucleic acids research 2006, 34(Database issue):D590-598. 40. Karolchik D, Baertsch R, Diekhans M, Furey TS, Hinrichs A, Lu YT, Roskin KM, Schwartz M, Sugnet CW, Thomas DJ, Weber RJ, Haussler D, Kent WJ: The UCSC Genome Browser Database. Nucleic acids research 2003, 31(1):51-54.

129 41. Frith MC, Fu Y, Yu L, Chen JF, Hansen U, Weng Z: Detection of functional DNA motifs via statistical over- representation. Nucleic acids research 2004, 32(4):1372-1381. 42. Sandelin A, Alkema W, Engstrom P, Wasserman WW, Lenhard B: JASPAR: an open-access database for eukaryotic transcription factor binding proles. Nucleic acids research 2004, 32(Database issue):D91-94. 43. Matys V, Fricke E, Geffers R, Gossling E, Haubrock M, Hehl R, Hornischer K, Karas D, Kel AE, Kel-Margoulis OV, Kloos DU, Land S, Lewicki-Potapov B, Michael H, Munch R, Reuter I, Rotert S, Saxel H, Scheer M, Thiele S, Wingender E: TRANSFAC: transcriptional regulation, from patterns to proles. Nucleic acids research 2003, 31(1):374-378. 44. Pheasant M, Mattick JS: Raising the estimate of functional human sequences. Genome research 2007, 17(9):1245-1253. 45. Yue J, Tigyi G: Conservation of miR-15a/16-1 and miR-15b/16-2 clusters. Mamm Genome 2010, 21(1-2):88-94. 46. Krek A, Grun D, Poy MN, Wolf R, Rosenberg L, Epstein EJ, MacMenamin P, da Piedade I, Gunsalus KC, Stoffel M, Rajewsky N: Combinatorial microRNA target predictions. Nature genetics 2005, 37(5):495-500. 47. Lehtonen A, Ahlfors H, Veckman V, Miettinen M, Lahesmaa R, Julkunen I: Gene expression proling during di erentiation of human monocytes to macrophages or dendritic cells. Journal of leukocyte biology 2007, 82(3):710-720. 48. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome research 2003, 13(11):2498-2504. 49. Maere S, Heymans K, Kuiper M: BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics (Oxford, England) 2005, 21(16):3448-3449. 50. Emmer PM, van der Vlag J, Adema GJ, Hilbrands LB: Dendritic cells activated by lipopolysaccharide after dexamethasone treatment induce donor-specic allograft hyporesponsiveness. Transplanta- tion 2006, 81(10):1451-1459. 51. Tharakaraman K, Bodenreider O, Landsman D, Spouge JL, Marino-Ramirez L: The biological function of some human transcription factor binding motifs varies with position relative to the transcription start site. Nucleic acids research 2008, 36(8):2777-2786. 52. FitzGerald PC, Shlyakhtenko A, Mir AA, Vinson C: Clustering of DNA sequences in human promoters. Genome research 2004, 14(8):1562-1574. 53. Ohler U, Liao GC, Niemann H, Rubin GM: Computational analysis of core promoters in the Drosophila genome. Genome biology 2002, 3(12):RESEARCH0087. 54. Suzuki H, Forrest AR, van Nimwegen E, Daub CO, Balwierz PJ, Irvine KM, Lassmann T, Ravasi T, Hasegawa Y, de Hoon MJ, Katayama S, Schroder K, Carninci P, Tomaru Y, Kanamori-Katayama M, Kubosaki A, Akalin A, Ando Y, Arner E, Asada M, Asahara H, Bailey T, Bajic VB, Bauer D, Beckhouse AG, Bertin N, Bjorkegren J, Brombacher F, Bulger E, Chalk AM et al: The transcriptional network that controls growth arrest and di erentiation in a human myeloid leukemia cell line. Nature genetics 2009, 41(5):553-562. 55. Hashimi ST, Fulcher JA, Chang MH, Gov L, Wang S, Lee B: MicroRNA proling identies miR-34a and miR-21 and their target genes JAG1 and WNT1 in the coordinate regulation of dendritic cell di er- entiation. Blood 2009, 114(2):404-414. 56. Holmstrom K, Pedersen AW, Claesson MH, Zocca MB, Jensen SS: Identication of a microRNA signature in dendritic cell vaccines for cancer immunotherapy. Human immunology 2010, 71(1):67-73. 57. Androulidaki A, Iliopoulos D, Arranz A, Doxaki C, Schworer S, Zacharioudaki V, Margioris AN, Tsichlis PN, Tsatsanis C: The kinase Akt1 controls macrophage response to lipopolysaccharide by regulating microRNAs. Immunity 2009, 31(2):220-231. 58. Lutz MB, Schnare M, Menges M, Rossner S, Rollinghoff M, Schuler G, Gessner A: Di erential functions of IL-4 receptor types I and II for dendritic cell maturation and IL-12 production and their dependency on GM-CSF. J Immunol 2002, 169(7):3574-3580. 59. Rosenbauer F, Wagner K, Kutok JL, Iwasaki H, Le Beau MM, Okuno Y, Akashi K, Fiering S, Tenen DG: Acute myeloid leukemia induced by graded reduction of a lineage-specic transcription factor, PU.1. Nature genetics 2004, 36(6):624-630.

130 Transcription and targets of DC-expressed miRNAs

60. Sakuntabhai A, Turbpaiboon C, Casademont I, Chuansumrit A, Lowhnoo T, Kajaste-Rudnitski A, Kalayanarooj SM, Tangnararatchakit K, Tangthawornchaikul N, Vasanawathana S, Chaiyaratana W, Yenchitsomanus PT, Suriyaphol P, Avirutnan P, Chokephaibulkit K, Matsuda F, Yoksan S, Jacob Y, Lathrop GM, Malasit P, Despres P, Julier C: A variant in the CD209 promoter is associated with severity of dengue disease. Nature genetics 2005, 37(5):507-513. 61. Avnit-Sagi T, Kantorovich L, Kredo-Russo S, Hornstein E, Walker MD: The promoter of the pri-miR-375 gene directs expression selectively to the endocrine pancreas. PLoS One 2009, 4(4):e5033. 62. Gaboli M, Kotsi PA, Gurrieri C, Cattoretti G, Ronchetti S, Cordon-Cardo C, Broxmeyer HE, Hromas R, Pandolfi PP: Mzf1 controls cell proliferation and tumorigenesis. Genes & development 2001, 15(13):1625-1630. 63. Hromas R, Davis B, Rauscher FJ, 3rd, Klemsz M, Tenen D, Hoffman S, Xu D, Morris JF: Hematopoietic tran- scriptional regulation by the myeloid zinc nger gene, MZF-1. Current topics in microbiology and immunology 1996, 211:159-164. 64. Schotte R, Nagasawa M, Weijer K, Spits H, Blom B: The ETS transcription factor Spi-B is required for human plasmacytoid dendritic cell development. The Journal of experimental medicine 2004, 200(11):1503-1509. 65. Schotte R, Rissoan MC, Bendriss-Vermare N, Bridon JM, Duhen T, Weijer K, Briere F, Spits H: The transcription factor Spi-B is expressed in plasmacytoid DC precursors and inhibits T-, B-, and NK-cell development. Blood 2003, 101(3):1015-1023. 66. Cordes KR, Sheehy NT, White MP, Berry EC, Morton SU, Muth AN, Lee TH, Miano JM, Ivey KN, Srivastava D: miR-145 and miR-143 regulate smooth muscle cell fate and plasticity. Nature 2009, 460(7256):705-710. 67. Vasudevan S, Tong Y, Steitz JA: Switching from repression to activation: microRNAs can up-regulate translation. Science (New York, NY 2007, 318(5858):1931-1934. 68. Kuipers H, Schnorfeil FM, Fehling HJ, Bartels H, Brocker T: Dicer-dependent microRNAs control maturation, function, and maintenance of Langerhans cells in vivo. J Immunol 2010, 185(1):400-409.

5

131

6

Transcriptome kinetics of circulating neutrophils during human experimental endotoxemia

Stan de Kleijn1, Matthijs Kox2, Iziah Edwin Sama3, Janesh Pillay4, Martijn A. Huijnen3, Johannes G. van der Hoeven2, Gerben Ferwerda1*, Peter W.M. Hermans1, Peter Pickkers2

1 Laboratory of Pediatric Infectious Diseases, Radboud University Nijmegen Medical Centre, Nijmegen, the Netherlands. 2 Intensive Care Medicine, Radboud University Nijmegen Medical Centre, Nijmegen, the Netherlands. 3 Centre for Molecular and Biomolecular Informatics, Radboud University Nijmegen Medical Centre, Nijmegen, the Netherlands. 4 Department of Respiratory Medicine, University Medical Centre Utrecht, the Netherlands

Published in part in PLoS One. 2012;7(6):e38255. Epub 2012 Jun 5. Abstract

Polymorphonuclear cells (neutrophils) play an important role in systemic inflammatory response syndrome and the development of sepsis. These cells are essential for the defense against microorganisms, but may also cause tissue damage. Therefore, neutrophil numbers and activity are considered to be tightly regulated. In previous studies, gene transcription during experimental endotoxemia in whole blood and peripheral blood mononuclear cells have been investigated. However, the gene transcription response of the circulating pool of neutrophils to systemic inflammatory stimulation in vivo is currently unclear. We herein examine neutrophil gene transcription kinetics in healthy human subjects (n=4) who were administered a single dose of endotoxin (LPS, 2 ng/kg iv). In addition, freshly isolated neutrophils were stimulated ex vivo with LPS, TNFα, G-CSF and GM-CSF to identify stimulus-specific gene transcription responses. Whole transcriptome microarray analysis of circulating neutrophils at 2, 4 and 6 hours after LPS infusion revealed activation of inflammatory networks which are involved in signaling of TNFα , IL-1α and IL-1β. The transcriptome profile of inflammatory activated neutrophils in vivo reflects extended survival and regulation of inflammatory responses. We show that these changes in neutrophil transcriptome are most likely due to a combination of early activation of circulating neutrophils by TNFα and G-CSF and a mobilization of young neutrophils from the bone marrow.

134 Transcriptome kinetics of circulating neutrophils during human experimental endotoxemia

1. Introduction

Sepsis is the most frequent cause of death at non-cardiac intensive care units1, 2. It is characterized by a systemic inflammatory response, and associated with marked cardiovascular changes, capillary leakage, tissue damage and multiple organ failure. The most common form of sepsis is caused by bacterial infection, and involves a severe host immune response. The inflammatory responses induced by bacteria are mediated by the expression of pathogen-associated molecular patterns (PAMPs) which are recognized by the host cells via pattern recognition receptors (PRRs) such as Toll-like receptors (TLRs)3. For example, stimulation of TLR4 by lipopolysaccharide (LPS) of Gram-negative bacteria results in activation of inflammatory signaling pathways and induction of cytokine secretion by endothelium and circulating leukocytes4. In blood, systemic inflammation is classically reflected by a change in leukocyte count with a prominent increase in neutrophils. During the course of systemic inflammation, neutrophil numbers remain elevated5. Besides their anti-microbial activity, neutrophils also exert regulatory functions during inflammation by secretion of their granular content and direct cell-cell interaction with mononuclear cells and endothelium6, 7. It has been shown that the pool of circulating neutrophils during systemic inflammation is characterized by a diversity of neutrophil phenotypes8. The human endotoxemia model is a valuable tool to study early inflammatory mechanisms9. Transcriptome analysis of this in vivo response to LPS has been shown to reflect the trancriptome profiles of septic or injured patients10-12. These studies focused on whole blood and PBMC analysis, showing an early acute response followed by a recovery phase in gene expression. However, up till now, it is unclear what part of this acute systemic inflammatory regulation can be attributed to circulating neutrophils. In the current study, we present for the first time the transcriptional response of circulating neutrophils to LPS administration in vivo. We use novel analysis methods based on physical protein interaction to control for interaction enrichment bias. Hereby, we aim to identify the role of circulating neutrophils in controlling systemic inflammation by evaluation of their gene expression kinetics and immunomodulatory properties during early inflammatory responses, increased granulopoiesis and neutrophilia.

2. Methods 2.1 Subjects 6 Neutrophil gene expression was studied in 4 healthy male volunteers who participated in a human endotoxemia trial (Clinical Trial Register number NCT00783068, placebo group). The study protocol was approved by the Ethics Committee of the Radboud University Nijmegen Medical Centre and complies with the Declaration of Helsinki

135 including current revisions and the Good Clinical Practice guidelines. Written informed consent was obtained from all study participants. Physical examinations, electrocardi- ography, and routine laboratory studies on all the volunteers before the start of the experiment showed normal results. Volunteers were not taking any prescription medications, and tested negative for hepatitis B surface antigen and human immuno- deficiency virus infection.

2.2 Human endotoxemia model Subjects refrained from food 12 hours before the start of the experiment, and caffeine or alcohol containing substances 24 hours before the start of the experiment. The experiments were performed according to a strict clinical protocol as described previously13. U.S. Reference E. coli endotoxin (Escherichia coli O:113, Clinical Center Reference Endotoxin, National Institute of Health (NIH), Bethesda, MD) was used. Ec-5 endotoxin, supplied as a lyophilized powder, was reconstituted in 5 ml saline 0.9% for injection and vortex-mixed for at least 10 minutes after reconstitution. The endotoxin solution was administered as an intravenous bolus injection at a dose of 2 ng/kg of body weight.

2.3 Plasma cytokine and growth factor measurements During human endotoxemia experiments, EDTA anticoagulated blood was collected from the arterial line and immediately centrifuged at 2000g for 10 minutes at 4 oC to obtain plasma. Concentrations of IL-1β, TNF-α, IL-6, IL-10, IL-1RA and MCP-1 in plasma and whole blood stimulation supernatants were measured using a simultaneous Luminex Assay according to the manufacturer’s instructions (Bio-plex cytokine assay, BioRad, Hercules, CA, USA). IL-8 was measured in plasma by ELISA (Pelipair, Sanquin, Amsterdam, the Netherlands) following manufacturer’s protocol and plasma G-CSF and GM-CSF concentrations were measured by cytrometric bead array (BD, Franklin Lakes, USA) on the FACScalibur flow cytometer.

2.4 Neutrophil isolation Blood samples were drawn at 0, 2, 4 and 6 hours after LPS administration in 10 ml sodium heparin blood tubes (BD). Total leukocytes were determined by Tuerk’s solution (Merck staining). Blood was diluted 1:1 with PBS and placed on lymphoprep (Axis shield) and centrifuged. The 1:1 diluted blood plasma was stored at -80°C until further use. The lowest compartment containing the polymorphonuclear cell fraction was taken containing neutrophils (purity >95%) mostly contaminated by eosinophils and minor PBMCs. Shock buffer containing 0.155 M NH4Cl, 0.0001 M Na2EDTA and 0.01 M KHCO3 was used to lyse the red blood cells. After washing, the granulocytes were counted on a hemocytometer and cell viability (>99%) was determined using 0.4% trypan blue solution. Granulocytes were suspended at a concentration of 10x106 cells/ml in Qiagen

136 Transcriptome kinetics of circulating neutrophils during human experimental endotoxemia

RLT buffer (Qiagen, Venlo, the Netherlands) containing 1% B-mercaptoethanol and stored at -80°C until RNA isolation. For ex vivo neutrophil stimulation, after obtaining written informed consent, EDTA coagulated blood was drawn from healthy volunteers (N=4). Neutrophils were isolated according to the same protocol. After isolation and washing, cells were allowed to recover for 30 minutes at 37°C.

2.5 RNA isolation and Microarray analysis RNA was isolated by Qiagen RNAeasy RNA isolation kit according to the manufacturer‘s protocol. DNA contamination was removed by on column DNase treatment (Qiagen, Venlo, the Netherlands). Total RNA yield was determined on the nanodrop ND-1000 (Isogen life sciences, and Total RNA quality was assessed at Agilent 2100 bioanalyzer with RNA 6000 Nano chips (Agilent, Santa Clara, USA). Neutrophil gene expression was measured on Affymetrix Human ST 1.0 exon arrays. RNA material was first amplified, transformed to cDNA and labeled using ambion WT expression kit and the affymetrix terminal labeling kit (Ambion, Life Technologies, Carlsbad, USA). Labeled cDNA was then hybridized for 17 hours to a Human ST 1.0 exon array, washed and stained according to manufacturers’ instructions and scanned on a Genechip ® scanner 3000 (Affymetrix, Santa Clara, USA).

2.6 Transcriptomics data analysis Affymetrix® CEL-files from microarray scans were used for quality control and first Robust Multiarray Averaging (RMA) analysis for normalization was performed with Partek® genomics suiteTM. In order to assess the extent of gene expression fold changes after LPS infusion in a subject, the normalized log2 intensity value at time 0 was subtracted from those of 2, 4 and 6h, to yield the relative expression fold change per probeset identity. The mean fold changes of all probeset identities pertaining to a gene were used to represent the effective fold change of the gene for that infused subject. The mean fold change for a gene in all 4 subjects was finally used as its overall fold change in expression after LPS infusion. Genes that were over- or underexpressed at least 2 fold relative to time 0 of LPS infusion were used for subsequent analyses. That is, a gene is considered upregulated in expression if its log2 fold change value is ≥ 1; or downregulated if its log2 fold change value is ≤ -1. At this fold change threshold, we examined temporal gene response to LPS. We queried for sets of genes that are up- or down-regulated in expression at any time point (i.e. major overall perturbed gene set); at each time point; at all time points; only at certain time points; those that are increasingly upregulated or downregulated in the course of infusion; and finally those that showed wavy expression 6 patterns by being upregulated at one point but downregulated at another. We performed gene ontology analysis on each of these sets of genes to elucidate the temporal transcription program elicited upon LPS stimulation in neutrophils.

137 2.7 Gene ontology and protein-protein interaction network analyses All gene ontology annotations were done using the BINGO14 plugin in cytoscape15. Using this algorithm, we assessed for gene set overrepresentation using the hypergeometric test with the multiple correction method of Benjamini & Hochberg False Discovery Rate (FDR) correction. Our gene sets were tested against whole GO annotation. Unless otherwise stated, all biological processes mentioned were significantly (corrected p-value<0.01) overrepresented by the set of genes up- or downregulated at any time after 0h of LPS infusion. All protein-protein Interaction (PPI) data were obtained from HPRD16, release 9. Significance of a PPI network was tested using the Physical Interaction Enrichment approach 17. By proper node-degree normalization of random networks being used in comparison with a test network, this approach circumvents biases that might arise, during network analyses, due to the over- representation of certain proteins in global PPI networks.

2.8 Q-PCR for microarray validation and in vitro neutrophil gene expression From every sample, 250 ng RNA was reverse transcribed to cDNA with superscript III (Invitrogen, Paisley, UK). Gene specific Taqman gene expression assays (Life Technologies, Carlsbad, USA) were used in q-PCR to determine sample gene expression. GAPDH and 18S were used as reference genes. Q-PCR was performed at the applied biosystems 7500 fast q-PCR machine using standard procedures.

2.9 In vitro neutrophil stimulation Neutrophils (5x106 cells/ml) in RPMI supplemented with 0.5% Human Serum Albumin (Sanquin, Amsterdam. The Netherlands) were stimulated at t=0 with LPS (10 ng/ml; Escherichia coli serotype 055:B5, Sigma Aldrich, purified as described previously18), TNFα (10 ng/ml; Abcan, Cambridge, UK), G-CSF (50 ng/ml; R&D Systems, Minneapolis, USA) and GM-CSF (50 ng/ml; Cellgenix, Freiburg, Germany). After 2 hours, 800 µl cell suspension was taken and washed with PBS. Subsequently cells were resuspended in 350 µl RLT with 1% Beta mercapthoethanol and stored at -20°C. After 7 hours, 200 µl cell suspension was used for apoptosis assay.

2.10 Cell survival and apoptosis measurements Survival and apoptosis were determined on the FACScalibur by Annexin V apoptosis detection kit (BD, Franklin Lakes, USA) according to the manufacturers’ instructions. Cell survival statistics were performed with repeated measures ANOVA, and Tukey’s multiple comparison post-hoc test. P values ≤0.05 were considered statistically significant.

138 Transcriptome kinetics of circulating neutrophils during human experimental endotoxemia

3. Results

3.1 Clinical and inflammatory parameters induced by LPS infusion in healthy volunteers Infusion of LPS in healthy volunteers induced clinical symptoms that reflected systemic inflammatory response (increased heart rate, increased body temperature) (data not shown). Total leukocyte counts increased 1-2 hours after infusion of LPS and remained elevated during the whole experiment (12 hours) (Figure 1A). Leukocyte counts showed a depletion of monocytes and lymphocytes 1 hour after LPS infusion with gradual recovery afterwards. In contrast, the neutrophil fraction after a short drop in numbers (50%), increased 1 hour after LPS infusion and remained elevated during the entire experiment (Figure 1A). Plasma levels of pro-inflammatory cytokines TNFα and IL-6 levels were maximal 1.5 hours after infusion of LPS and returned to baseline after 6 hours, whereas circulating IL-1β remained below the detection limit. The anti-inflammatory cytokines IL-1RA and IL-10 reached maximal plasma levels 3 hours after LPS infusion. The chemokines IL-8 and MCP-1 peaked at 2 and 3 hours after LPS infusion, respectively (Figure 1B). The colony stimulating factor G-CSF peaked at 4 hours after LPS infusion, whereas plasma levels of GM-CSF remained below detection limit (Figure 1C).

Figure 1 - Clinical parameters in time after LPS infusion.

(A) Leukocyte count with total leukocytes, Polymorphonuclear cell fraction and mononuclear cell fraction. Error bars represent SEM (N=4). (B) Plasma cytokines at different time points measured by luminex or ELISA (IL8). Error bars represent SEM (N=4). (C) Plasma GM-CSF and G-CSF measured by cytometric bead array. Error bars represent SEM (N=4).

3.2 Microarray transcriptome analysis of circulating neutrophils 6 after LPS infusion reveals a time specific expression pattern To investigate the in vivo neutrophil transcription response after LPS infusion, a total of 17393 genes was analyzed on microarray. Principal component analysis of expression in time indicates a separation of time t=2 from time t=4 and t=6 for all four subjects (Figure 2).

139 From the total of 17393 genes examined, 2248 genes (≈13%) were differentially expressed relative to time 0h at a fold change threshold of 2 (data too large to be included in this thesis but shall be provided as Supplemental Table in the published article ). Out of these, 233 genes were upregulated and 307 genes were downregulated at all 3 time points following LPS infusion. However, at early time points, more genes are upregulated, meanwhile at later time points more genes are downregulated (Figure 2); revealing a time specific expression pattern in which gene response at T=2 is different from that of T=4 and T6. Furthermore, out of the 2248 differentially expressed genes, 37 were increasingly upregulated and 79 genes were increasingly downregulated over time. Meanwhile, 49 genes showed a ‘wavy’ behaviour in time, being either first upregulated at t=2 and downregulated at t=4 and t=6 or vice versa (Figure 3).

−50 050

A B BUR12_T2

HEX1_T2

0.4 TUN5_T2 SLC27A 2 DOR8_T2 CCL20 IL1A

BUR12_T2 DGKG PTX3 50 .05 C17orf76NFKBI D HEX1_T2 0.2 CEP170 GCNT1CLTCL1ZC3H12A UNQ3104CSRP1COARHGAP1RO 2ATMEM38A 8 GJB6 TUN5_T2 FAM117B EFCAB4MOSC1CD8 B PL 3AU CCL 4 DOR8_T2 RHOF PARP1UNQ9364 6 PNPLA1 CIDEB CEP9ARHGAP1 7 SAP3 2 0SUSD3BIRC3 GPR8 4 NOVC2CD2 LGSRKCNAB2GAL M FAM55CTNFHMGB2NFKBI ELRRN1OLR1 JARID2 DOPEY2AQ R RBP7 FOSLNT5DC3 1 TN FA IP 3 C6orf204NUP9THRAUSP2RAB40CREEP4C1orf85 3IRF2BP2 0ZNF770SEPTPPP1R12BC6orf167IL18ITGA6PDXK 5 SLC35BARL5B 2 PC2 (13 %) SSBP2ZYG11BTM7SF3BRCC3STIMKIAA0196HGSNMYO5TAPT1SULLOC727817ALDH3B1 1 T1AMFAP3ABCC5 AATGPD2TTC13DUSP2DARS 2 IVLRP3L DIL10RSCCPDHAGPHAL A CTR2 FCRL 1 LGALS1SESN3NUDTDAPK2VPS1SIN3HIRA 2HISPPD1TBC1D2PER1 3FAM63BPTGDR BSSSCA1 8PLOD1C11orf21OXR1NDST1LONRFMOSPD1CAMAPK6AK2SFXN5PLK2KBTBD6RPH3AL CHD1REL1 WDR3SERPINB8DDIT4SKI 2 L PECR ARG1 CRISP3 CD300LBRPS6KA5ST6GALINPP5 1JDP2DUS2THBS1 BNEK6ZNF141MYO18APRAM1KIF13AVSTM1ALAD L CA4ATE1NRIP1EPHA4FAM127ALMBR1DGKDC5orf30VCLCASKPLEKHF2B3GALNTFLJ20323KIF3CSGMS2MFSD2ACP6GPR160 1 SLC36AAZIN1DY 4CFL2NLT3 RASA3FCHO1TUBGCP3UBASH3DPY3VPS26BARHGEF18CCDC88CCCDC125STT3B 0MBIINPP4 BCENGTF2IIPO7 PPLCLSPIN1C14orf49NSFTA AMSH3ANXA6TSPPTPN91SY 2TTC8SPTBN1C10orf57TTP AN3 AP1AKTIPEHD4PUS1EREGHSD17B12 ALRNF144B 2NCO 0TESC A7HS3ST3BCAPG 1 ODZ1 PHOSPHO1 SLC7A6NARSSPINTSRPK2C20orf20SLC23AEEPD1PDE7SULF2SPECC1PTDSS1ITPKB 1ADIC14orf135TMEM104MLLT10PPFIANT5DC1 ATSPTMC8TMEM106BCMTM7 21KCTD1SL AN2ARHGEFDCKLRRK1NARG1USP4 1 TMIDEMFSD1PRNPPPFIBP2RFC1SUCNR1C10orf30VBP1C22orf9AG 2ITGA1LONP2CD2SERPINB1 8SLC18A LPATTPMTPTPN2ABCA5 6FLJ31222 4FN9TRAFTA 2LSM6TFF 0 1GPR132MAP2K6 3 UNQ9368 FAM36AMAPRE3XPO7FOHMFHL3ATTACD3FR XN3OXP7F4SCAMP1TGFBRAP1 YKIAA0319LKIAA0355CHD9ZMYND8 3 A2PSD4MID1IPATHDHD1TUBBKCTD2C16orf33P2A3CPEB3CEP120AGPSLRRC4PPP2R5LRRC8CCND2DPY19L3DHRS1SLC16A14R 1KIF21BFLJ21438DBP A OCK2WIPICDCA7SNTB2 0PLZNF595PPP2R5RZNF43C13orf18AURIF1 DOGDI DANKRD61AREGNSMCE2ACPLVPS5RSUCLG1 LHIATL1TGIF2 BLOC284757SLC25A24PGLGYS1 24 YRP1SLC44ABMXIL1B 1 MFNGPOLR3SLC5A3HVCN1NFLEPRARL2BECH1COMAN2A1ATPIP5K1XYLADSSIFFOROCSE1 KGARNLC19orf42C3STX1PRPSAP2PIN1INPP5OTL1FOZBTBCNIRNF216PLCG2PSTPIP1 T11CFAM118A PADCY7PRICKLE3 XK2ACC15orf17NUP4MAP3K2TM6SFALG6 LHPIK3CBBEVI 0ARRDC1CTNNBIP1SNX2TR1B 1TATD4HOOK3PRPF1RAD54L2KAT2TPCN2OGDH 5CDK5R1RFTBT 3NUPLDN2EXOC2CDYLNMD3TCHPST14ECHDC1 1EPB41L5MBOPVRL7 KGRAMD4OSBPLNAPADD3 BAT 1NSUN4MVK 8MHK2 1SFRS12IP 2SLCO4C1ELL2ATB4GAMPP7ECHDC3RTPGM1 1KLF51PLD1FBXW2TP53I11 6IER31PADIICAM1PKM2SYNE1LTEMBTICAM1SLC25A40CKAP44 4 1 SLC7A5 SAMSN1 CLEC5 A 0.0 MTMR1CABC1DAPK1ZNF362RNF166DEUSP2CYSLWRBC20orf72HEXBFACSF1RCCDC109ASH3BP1C6orf151NDRG3KLHL22ARPP−19SMC3HD 2ALKBH6NKTTMC6GMCLAMDHD24CARHSP1TXNDC1 TR1KIAA0515SPT4RSC1A1EIF4ABICD2DHRS7ACCLEC12BSSR3RNPEPLGPX3WDR4HNRNERCC3FBP1AT RCCPG1CEP192C5orf36ANKRD2PGM2PMM2 AN1CDC14APIP3−E3CREIF2AK1ERP2A2 1CHPTKLHL18SLC19ARCP9CRISPLD2ACADMSSH3 1BCORL YZL1 O1tcag7.1177PA 73EPHB4MFN2ACAD8LZICSYTLCCRN4SNX4C8orf38C3orf21INPP1TPPPC5orf29 1DCTN6GFOD2 LBNCAPD2ATP8B4 1PPP1R15A 7CARS2CAMKK2FAM125BPCMTSLC24AANO1C1GAEDN1 21ADAM15ACO1ADORA2PLEC1 LHIPK2DEGS1TP53I3RPS6KA2NFKBI 1MAFG0LTSPRED2PTGESDHCR7 31 SLC26A AZ 6 ANKRD2 2 ZNF652SKP2VPS8METRAB3PITPNC1MXIHEEP400EPB4SMARCA2TMEM216KIAA0100TBC1D1USP1 AP1ATTA 1 7NSD1C1orf149C12orf51WDR7MYCBP2CC1R5TNRC6CUEDC1 1PLCB2C14orf139WDFY4RCSD1PIGSDIP2PREBFNTBHPS3STXBP5GNPC21orf96NCOR1ABCD1 AWDR80LRSAM1VPS33ARNF144AAPEHNLRX1DNASE1L1ZDHHC1MIB1AXIN1 3SLC22A15ZCCHC1NHLRC2CCNYLHMGCS1CDK7DLG1POGKNECAP1 ACYP4F3CCDC2HOMEZTRAF3IPTAMAPK1RMND5CYTH3SYNE2ARHGEF17IRS2 ADIAPH1HCP5CITED4EPS1HPDPHKA2ARMC8SGSM3TTYH3C12orf61ZNF512PPP2R5ZNF282KLHLPIGATMEM5ZNF354AATBLPCAGALNSDENND1RRAGBAVANMAP2K1CSGALNAFAR1SBNO1ACPLCD1G1ICAM4PATTTC3 1T1RBM1 5PXK2 3 18AT GDNEIL1B0T1DUSP1 3 6IER52 A 8 BMOSPD3 1 CT 3GK5 2BPIFCARTRPM2 FAM78APECAM1FAM53BASF1EPS15L1RPNHSKIAA0753NUDT21 AP3SMARCC2TOM1L2ZNF767ZFP36L2 BIQGAP2COPS6C16orf63C7orf41MTFMPYCARDC20orf19MKRN2GPSN2SAMM5RPHERC1FAM134ALSM1MRPS6KIAA0562NISCHSNTB1GTF3CTBC1D1PIK3R1OSGIN2ARSGC9orf78SENP6VPS4EIF2BDTD1RNF1NSUN5SEPTRP5−1000E10.4STYXNFRKBDSCR3SITNPO1SETD3 A1PQLC3DNMBPDDX2SLC36ASNAPC3C12orf5PITPNM1FCHSD1CLCN4LIX1LTP53INP2MALT1SLC9A1SLC40AZBTB4 TRTPAK1SARGLMTM1TMEM170AUBE3C9orf25ARHGAP1IDH1ZNF490 0C14orf2GEMIN7TMEM141C4orf41GMFBPPME1SEC22C 0ERICH11TSHZLOH12CR1NBPF1 46TNCM1GBAS 1LYSNX1LPGA 2DNAJC1HOMER3SEC23APSMD5 7PLEKHA3FO 3ARL6IPAK3CDH2KIAA0922CATNMTCD151FAM107BNR4A3CDC5ARNTFAL3MBTL3FLYWCH2TIRAP CIARS2VEG 1AP4B1OPA1CEBSTX8APH1 L1MTMR6XRED1MAP3K8RFX3ITPKC 3ZNF718KBTBD7GALNTIPPDE4ABCFTMEM63A 4T1CYFIP 1 1C9orf123 8FAZNF281 9PRETFD6ORC2NFKBIDMXL LPAIVNS1ABPFUT 3SHB6PRKD3C13orf31 BFAM177A BSFXN1OKR21UPB1 7LIG 2BATF3RIPK2 4GBE1 H L1 AC11orf82 4 GCLIL1RMSRAEXOSC4 1LOXHD1M N KL HK3 IRAK2 RNASE3IKIPBRI3BSLC25ATRIM24ZNF398XPO PPSIP1HAGHTMEM14BMAPKAPK5MEA1CAMK2TUSC4REV3SPR TFOFBXL20ERCC6SLC9A3R1BTBD2CAPN7DEPDC5FADS2TARNF122BRLASS2FAM54BYLPM1CA5 1LYRPP2ZBTB11HDPIP5K1GTPBP1 XJSERBP1EXOSC1MRE11ACLFBXLZNF9F6MYSTUNC119ISOC1 WD1SNF1LKVAMP175CLDN9YD3H2AFITFGZEB1TRAPPC2TSZXDCCHKAPHKG2 LPSMA7ACIFRD1SLC25A20B 2ZNF644KIPMM1MAPK9 CCRLF3STIMPCTK2EDEM1TTC15OSTM1 GMAN1A2CLEC4AP2M1SLC25A46 TDONSONSLC35A1TANDUFB9ZNF33B1MAP4K2RNF123 4ASH23DEF6C9orf80ZNF75DBCAP2 14TRIM39PTPN2IM CYPKD2TMEM87BTRAPPC63PTSTX2ARFIP 0TOPBP1 2CNOABCD3JACBX4CLK4PEX1EPM2AIP1CCDC126C18orf45AGC10orf55PAZMYM4PPOPRLMAP2K3PTPN7TOB1OXER1CASP3ETS1SH3BP5CTSDSEC6PSMD1 SATANKRD34BKIAA1109SFT2DMETTL6C5orf22ZDHHC1TBCD CZF1MFSD9FAM70B LUHMK1TTLL12PYRCTDL AP2GLCCIATF1GNG2C10orf119PATSTFPGTCYC1P6V1NR6A1 1BZW1 T69RSU1MCM8 3RAB3GAP2ANP32E 3PRPSAP11 DAU 1OXSR1 2 BSENP2OXD1CAPN3C3orf23WDR4 B 12 HLKCNE1 1TNFSFEHD1 7 1 LGALS1TCN1ETS2 8VNN1UBE2AG PATCENTD1 F 6 AL OX 15 DPEP3MPPE1MEF2C14orf106KIAA1191 CWHSC1L1RNASE6C16orf80SLC35EVDKIF27PRKCZDYNC1H1MCM5RNMRALCEA WDD4ACMANBAC5orf24CYP4F1CYB561DYEAKPNA3HSPBAP1BBS2UBP1CBDHDDSNDUGLB1PIP4K2AHCYLC16orf70UCP2FAM96A3ESCO1CAMKK1 CAM6CEP5FRAP1CLN5AP3M1C3orf42TMEM6TABAXFARPS27LRALGPS1LSM14BHSD17B 1MYO9C18orf10SLC17A TSTMEM214FAM58ADOPEY1NNTC11orf54SEPTRGS1NFS1BIRC6SDCCDCLRE1FAMTIF3 1AOK3DAP32T2RAB4CTSZRBBP4 2LOC90826CIZ1PDHBSTX6YIF1BRIPK5DNAJC13 2CD3NCOSDHAFAM106AC2orf49CTOP2LRIGSPCS3FLJ31958CO 7RNF19BSLMAPHINTFLJ42953RAB2 2PPP1R15BTDPPM1EFHA1PKN1SMG6LOC100128868CNTRALGPS22PDCD6IP BINHBBLYPLA1FAM110ARC6orf162PPHLN1NDUFSDCCLRRC4TNFSF12−TNFSF13ANKRD36B 7DNM1WDR5NQO2PTGS2SCP2CLASP2 54CCTVHLAG UNDC2X4I1CDMTMR9FLNA GUPF3BA45 8PRDM5HSFETF A3DDX4GLTPFLJ22662 2FAM160ARORRA 3COSBPLBKCNAB3TP53BP28 QPCTCENTG1A MVRK1VATTVICSAFF2SRI 5C1orf162KLHLANXA4FOLRUSP4 AF1PFTK10TAAGSERINC2 L15BCH25HGP M2V3 NHCSTNEK9 GBTAEGR3ZNTNG2 2A10 AA1 9 37SMU1TE 2LIMS1QS SKIF1BMMP9OXC3AR1CD1 ADC1SLC37A 3 RNASE2R UNX2DPEP2POLR2ITPK1OCR1LILRA1PTGER4EFTUD2LPCAUTP6C7orf23PAPD5SSNA1CDK2AP1ZNF721OPRS1COG1FAM19AEIF4ENIF1PKN2ARD1ALKBH5SPENRPS19BP1DSN1TMEM14CKIAA0195SUPT16HC17orf65ZNF185CENTD3CALM2ARID1T4 GRPGRIP1NDUFS3BTBD6C11orf48TOMM3MGCTCFLILRA2PDLIMWWP1CCDC146RNPEPRELL1PYGBPCSK7TPRG1SLC16AATTMEM55AFAM127BQARSBAZ1BAMACRC7orf44AGGAATTRAK2AKAP1SLC6A1CXXC1ITGALPTP4A2RASGRP2SIAH1CD4CNPY3MAP3K3H2AFXZFAND5DAPSME4CLEC12AOAP2EPN2ATAIPGLIPR1RBM2MOSPD2F7IPSLC35F5ATCCDC2 AP5DHX2ARHGEFRMBTPS1VPS1PATPOLR2COMMD6KIAA0513ESRRALTCLPTM1GSK3TICAM2UNQ6493IRF3ZNF294BCL7BHSPB1 2LRD3BTIMP2GOPC4ARY UNDC1Z1HEXAC3orf63TRIM52CIDECPLOC283335G4C4orf34DHPS L7GALNT12STX7BACOMMD2MIZF 4PGGT1ARL15AFAP1LOC171220MCACOMMD11ST LEXOSC3DBI2MCM9ZNF7061EDEM3 6C2orf18CPRIM2RIOK1BTAF 6MT1GLT25DVPS3ARFGEF1SYNJDCTN5 G419 6TBC1D7SCFD1ZNF589RC14orf129TBC1D5APAF1 ARD3NLLRRC6MFSD1ATB 1DBN1ZNF100RBBP8 KRP5−1077B9.4UBA3 OCK1HSOSBPL1ANLRP1C15orf15CLIPRBM4AT 1SLC9A6FAM134BRTZNF641TBCBSYCP2PPIATCODPP3 2XN3VTA1 LMC11orf51ZDHHC2SUPT7LPHF1 1BG9ASB3KIAA0528PABSTAD2CCDC134 0611PANK3SMPD2YIPF5UNQ6487 X7A2ZNF271DGRPELRAB27AC19orf61 1PLEKMUTPRKAR2TNPO3RFX513IMASOD20 1SCN9ST6GALNA0 1OSCAR 2DHX3 1BFCN1PA LPLK31NFKB2 A1SH2D3ARHGAP2 6 A NBN C3 A 4TNATFAP2C2IP 6 FAM62A PPP1CCCOPS3MRPL41TMEM222ZNF507USP2SUZ12PATOR52KMGC12966MAZUBCBRPRD1ORAIEIF2C1ELKIF2RAB11FIPRAD9RSBN1TP53INP1DOCK5NFX1C3orf62RCBTB2P6V1NECAB2SLC25A32PSMB2ARRDC2TRIM62POLR1C3orf37GPIP4K2ZNF561YPELFLCCDC1ACANAPC1CEAFACCDC9MARCH8CPEB2DCAKDAVGPIFAM116ATMEM179BGTF3ATRPM7STSASS6EMDPATMDH2ATOR52K 1C16orf52NFICELSFRS1DUSP1WDR5PIP4K2ACZNF587REPIN1FAM168ACYB5R1ADIPOR2PARNRETSA AANKZFCCDC111GTF3C52T3C8orf59LOC201175MGST21ZNF346SUZ12ZNF692SYK 3DAL1YPEL ARD7HALCOG4TWF2FLYWCH1TSC2UTRNTMEM45B 1PRKCSHANSMG5G2SPTMEM3HMGN2INTS8 CAM8OVC16orf53TMTCATRIM26ZNF295STSTREEF1A1PPTCH1CRIPIGFVPS37B 2LPPIL2IL6RLY 6LYSMD3PRPF40APEIF2CSUV420H DAST BDYNC1LI2LRRC4L1 2MBOC10orf95DNMT3A 8L1 4ZM9COPS8RAB1RAP1 1ZNF143BRD92TIA1KLF10PA4CAMMOP− 81AOR1A1C2orf30MRPL51 TEXOC7MESDC11DCTN1TSPYLACGRASPTMC7BTBD1TRIM4ZCCHC9NAB1KIAA1530 3ZNF280DMIER3C9orf95TMED5RASGEF1BPPP1R12ABTRCBMP2BDP1PDK3TSSC1ATKINSRSATB 3SEC22AHERC4STK3OCRLFAM82BWBP4FGF1 3SLC35AEXOC5PLP2ATSEMA6TRIM14GRK6OXORAIWDR7SLC35ASNRPB2CYYR1TTC1RIOK3MAPRE1TATDRD9RBM3CRISP23IL1RCD5HBS1EMLSEC23B 8GMDSB 1PRTAF1EDC3G0S2PDHA1LT2TSNAX 1GRK53SNN 1CCNCDNAAP1S22 4HS1HDHD2 01NEBLACAA1F UNE28RPH3 32 81CAMK12MED2GINS4 4 0PA CL8IL1RA 5JA 2RANBP2APOLAMB3DOT1LFLJ102131LTFDP21 AC20orf3 20 GA2TOMM40LLILRA5F5RABGEF1DCUN1D3 CTSH RBM4GPR177EIF3MARRB1FNBP4AT BIGSFCLPPKLHDC2LOC554203SLC12ALRRC8RFX7GLT8DRNASETP2B1KLF3MPRIDYNC1LI1TMEM209CNNM4C14orf156PLD3POLR2C17orf39CBX7ST3GALNDUPPP1R9PLEKHO1TAX1BP3RPN1LRRFIPLOC100130492 2VAMP2MRPS1CAMSAP1NDUFB5FAM175AMAPRE2C1orf128SAP130SLC16AMPV1COMMD8KCNE3FGL2TXNL4AALG1ADAM10ZBTB25BMIDDB1BUB3PCPAEXOSC1ZNF808CETN2SMURFRASSF1CD9PR ARREB11PHPCALAMZEFNA4GALK2RCN2RABEP1FLJ14327HSD17B11ZCCHC7HNRNPH1ZNF33ARNF114TOR2GFPTFATCF20DRG1GSTM2 4ZNF592SLC22A18ELOFATSGSWGPS1ZBTB2AMFSDHBRPS15ASEMA3BCRKCTD7ZBTB7AMKLN1PCBP3NUCB2TZ1WDR6FAM111ACS1RNF130C1orf123ZNF318PBX2SMAD5NGRCOR1SAPS1SNAPC1C15orf28CRIPTYOMECP2TPST1FAM63ANEK4 2BNIP3ETRAPPC4ZBTB40 UNEPFDN1YWHBTSTK38L1OR10GUBN1ACSETD2C5AR1GUSBSIRPDRP11−345P4.4LUC7L2SFRS36WDR6IL16SKIXN2LARP1ZCWPW17CTSA32MATR3FAM130ANDU 16JMJD2ATAAPBB34 BAP7KEAP1MOAP1 2X1LYC14orf166C2orf27PPM2 D5H1F0PTGS1 RDSERPINB2DMAP1MOCS2KIAA1274 AC22orf371FANCEPSMD2SLC35A LOC100131848KIAA1033RAP2CXCL 2HNRPLGUF1AGCHMP7STK2 0TUBA4C9orf72KLHLABTB2POLE4 L0CLRRC4B4GAAKIRIN1FAEIF2BPLEKHJPFN1LOC142937USP8RPICDC42BPGSPRPOP4WDR4GOLGA48 9SLC4A1APCNR1SRBD1ZRANB2NXF2SESN1ECDSERHLNLCENPESULAG CKIF5BGGA12ZFP106KIAA0368RNF217IGHMBP2HBXICCDC88BPNPLA8DNAJC5 1CREG1DNASE1MAN2C1TMOD3PRKCDHUS1 BTOM1MYNN1HTKIF1CSTRN3SYPLHIF1AOR2H2 ACAPN15ANXA5CHMP4 KPL1LOC147804LGALS3 6PTBP2S100A12MTF2 Y1BIRC2AFBN2ARID3EIF4GLT LSGOL3CCND3 AT1B2AP3S2LILRB3PHBDBNLNFIL3 0TCD5 4AGNME6TOR1ADAM91 PPSMC4SDHC 1ITGAMLMTK 1RASA2ZDHHC3 1L2SESN2SLC26A 5A3 1 BHRBFAM110BGLRX BAZI2TRMT6TMEM185BCHST 2TFDP1H6PDMMP8 8 2 PFKFB2 BUR12_T4 00 APOBEC3SLBP C9orf164COCYBRD1LRMP CATORMDLROTMCC1MADDMINC4orf30MAPKAPK3TXNDC1MBTD1CKAP2AMMECR1BRD3CR1BNAPGALDH3A2ANKRD4OGFRLPPP1R7ZNF828RNF4C14orf100HNRPDLLRRC37BNR1D1NCBP1LRCH4TUSC2NDUFB1NPC1HMG20B 1ATTACCSLAMP2SPSB3UBE2V2NCOR2WNK1DDX5MBD4ERGIC3ARL17POXMLL3PCSNK2A1C20orf149KRELA2 4MPO 51 T14KRCC1COL4A3B 0CXorf23 0 LRGS1 02FLJ45079TTC31BTBD9 1ASB1CCDC4VTI1ASLC15AGRNLYK5PXMP3PIWIL4RSIGLEC5MAP4K4F3 9AGXT2L2 OPN1CYBA P 9 JAK2 3TRPV2 L ACVR1EIF2AK2FGD4 B CLEC4 D RNF125LPXNAOC2PIGXTRERFPBXIP1C14orf43MRPL34RABL2BFAM62BDENND4VPS13CTRIM28MRPS2TPP2SLC35CPDPRZFP161FLJ23356DYNLL2AKTC14orf169KIAA0232TCEB3UBA2SLC7A8BRPFRABGAP1COXPNPEP3C7orf43CNNM3MGPOLHEBP1CNOCLEC2CPSFFAM122BPCMTD2MRPS3RPHMGN4DAD1JOSD3PEX1SPCS1SRCAPFBXL14ALDH9A1UNQ1887PAPD1SMAD2C7orf25HERPUD1DHX3 1LDB1CCM2BTBD14AERAP1TBC1D9OXA1CHST14HIST1H2BKSFRS6RCE1FLJ44635MTMR1MIF4GZNF627ZNF266PCGFZNF187STUB1RPS6KB2C2CD3CASP8AP2CYB5 X5PIGTASB6UBE2G2THUMPD3DALOC554248 1LRRC6ALDOCLCMTC1orf71ZBTB48MEF2DCCDC8RNF126ATNTAN1HIST1H3C AINBRP44LHM1NDUFS5SBDSFAM118BSNX2ASRGLFAM126A HPANX2ABHD1TMED1RAF11CLCN7KLHDC3PSMB7EDF T7SNRKCOMMD31ARL6IPRNF20 A1C11orf59 CGLRESTANAPC25APOB48RC3orf17RBBP5 6METRNLUBE2USP2TMEM161BGLRX3CALMLAZU1RAB3C14orf113 D4PIK3IPhCG_21078KCTD9C3orf10LFIGSYNGR4TSC22DZNF3264B31 GARNLDC10orf91 1MDKFLJ12547 1CYorf14PMPCBKRBPGM L0STIPC9orf64 BVPS2RPS2C14orf108STK11IP 4COEPX 4COMMD1PEX1 1UBE2E3PRPF38A 0ACRRAEAF KGNAI5 TAP15−1DPH3AT1ANP32ACHMLAP3M2NF DSPTLC1DYNC1I2TRJRKLFXR1IL8ARSDNFE2L2RAB2 41X7A2PELI1ARHGDIWTAPNDUDDX3HELFBXO2TR10ENPP4CHORDC1 1GABPB1TASESTD1HSPTGER2G5NR1H2ARHGEFSUB1DMPKDMWDYAMVDCAMP 41TBC1D2 HAOVE2SGTB GA2VENTCCNL 9F8SRA1PCGFCFP 3NOLPAFAM172AILKXRCC4LYPD3 ZAOFASRPK1 1CENTB1AXUD1 Y1AATPA 89UEVL X7TSPO A1OAP11B 3M 3BTFNDC3TXNDC3CYBBREL D XBP1 BVILL A MEIGPR141 1ORM1GCH1 HEX1_T4 SIGLEC8NUP210EIF1AANGPTHSPC152C1orf174G3BP1SLFN13MCM7KIAA0152FAR2SSBP3PORCNGPR7THAP1DYNLL1RNF103CMPK1SRXN1GSDMBERLXKDELR2PPM1 XDDX2FLJ11151TAFAM33AKHSRPZNHIT3COPS7NAE1GNB2L1CCR7ZNF192SS18L2C9orf5IMLCADDX2CUTFAM76AMFSD8LYPLALTSNAHPSEZNF324TP5PURASLC3A2DLEU2ASF1AFAM104A NSF3APMP2CKS2V3PEMTPOLR2MORC2NARG2CCNYCRSLC30A 1SPPL2BPAXIP1FAM165BNUP214WDFY1PARLPNKPZNF383PREPLCC3TIMM2RRM2IDSZSCAN2PAZNF688MRPL40MXD4H1FXSGK1MRPS1EBAPFDN2TMEM111TPCN1NDUFV1KIAA0892 TT13HPS6FAM128ANSUN2DRG2CLCN6ACPPNCO 7RNF220CAACSF2ANGELCALM3POLATC6orf153PPP2R5TOMM2ARID1 TTOP3MARSVEGFMEX3ZDHHC7RNF216L ADGCR2SPY23GDE1FRBM1CDCA4ADNP 1SEC31AFXR2ASNA1TAPPU1SNRNPBPCASD1METTL42 4EIF2AK3SUMO4GRAMD1TM2D3 0ARPC1TAEXTL3YPELGNLRBM1PELI2EID1P6V1E1NPBWR2 3GLUD1CYBPRER1LIGFADDCENPNPINK1LSMD1 2DECR1AGBOR56A G9DNAJB1RNF145C10orf99GTF3C6LRRC3F15 ARA 1CYP1A1ITGB1BP1MEIF5 A4RFBFBXL12F12MAT2ADHX4ETV3 2STRA1IKZFPRM1ZC3H15 5 APCP2FKBP1TCEB2WDD1 39BC1 ACOQ228ZNF426LOC654433ZMHNRNRSFDHRS7 92SFNHMGA1C7orf60C6orf64PRPF 5MRPS18A0BTBD8NDUFC1BLOC1S1STBD1CREB3LSM8FLJ12078MAPKAPK2SELTAF1 C5RFC3C14orf118YTHDC2PLRG1CHUKFAM64AMED2 ATMEM168ZNF273C20orf177SCYLRTLPCA 5ATZC3H7ANOLFLJ14107 1IPO8CLTC 20ERGIC2ETNK1SRP1 BSPTLCUQCRFS1COPB11CASP9PADIZBTB2KIF3BBCL2L1CLEC4 3RBM7L2HTRA2N3TSP AFUBP1RP9RSRC1PA A8CHMNSUN6STK1RGLBCL BDYSF1SRP5FERMTNLRC5CHD1T2 23OPLAHNRAS2BCR1JMJD 2GGTN4BP1 AN14CAST PFCER1KCNJGADD45B 2TRIB 134HK1LDLR E 6IFT20 4SCN1 41 6NFKB1 3WSB1 2 1 GSMPDL3ASH3BP5 BPFKFB3RALGDS SMPD3SETSLC38ANFXLRBBP7LRRC8NR1D2TIMM2STRN4PIK3C2 1ERP2RASGRP1HIGD2VAMP8TMEM50BALDH6A1PRKAB2C14orf1C6orf115ARHGAP2EEFFLJ20489 1KIAA1324AMD1FAM43AGSPTERHGLO1TMEM156C12orf24EIF3GC9orf30HSD17B7PBTN3A1GORASP2ORMDLITGB1BTF3L4RBM15BZNF238TLRC6orf136PLCB1TMEM6ZNF675FAM49ASLC25A39NOLMCPH1 BGRPELC16orf5 7CTNSC20orf11FAM60AET 3EXOSC8TTCDPM2NUP153C22orf32MRPS1HNRNCCDC2PRKDC2EHMTKLHL24LNX2FAM96BANKRD3CPSF4TFEBC18orf8TNNT3C19orf56POLE3TMEM39ACCDC124TXNDC1MPHOSPH8TRAM1LRRC37AHSN2SORDKLHL11FUCA1TEX1ZNF597ITFG ASUSD1MLLATG16L1RPS4Y1COMMD9LRRC4ZBTB44USMG5CDC37L1 AWBP1FAM122AACBD5GRAPGNPTUNQ1945RBMS2XKR8RPN2THAP6USP3POLFOZNF639RASSF 1COQ9CLFA9NEU1ZNF714FAF1MRPL21TOB2CXCR5TNFSF14FISSCOCS100A11FLJ43093RETMEM6ADAMTS1SDHDPPCS 9GDPD3C21orf90SCARB2MGC70863TMEM8ATCOCALC20orf173ZC3H10LMF1RNF34KIAA0391YAF2 3KRMT1HOR10GUNQ5836GAPRPFTMCC3POLR2OR13AMED1TA XD4L3ABCA1KRCLPXARF4SO 3GLUL2URLRFNGPHA2CCLLT52CHSTPAP50CHRM1 16SIRGC 2KRUNQ9353RBMY1 X6C10orf12TAP5−11ARID4CA1FA651 9GC1orf57FLJ35934POMZP3 UPSMD1FAM27L7DGCR6 2TAP21−1INO80BHLA−DQB2QRFP X1TERFCNORNF1110ODAMN19IMTMR1C17orf91GTF2A 23CRBM3 TAP5−4BMPR2ZFYVE1VPS7KIAA03232MST150 4KBTBD2BHLHB8RPL198JUNB 3LOC5951010MT4PPI7C19orf35CYB5R2TA 15PLEKHG24 J1ZNF394GPR1ACTR1TMEM3AGTRAPATXN1KIAA1602 4PI4K2BS100APIGKPLCLLTLRCH3ASB7TUBB4C2orf25TRSPTUBG1 3COPB2TMEM2AT8TMEM120AF7POLDNTTIPTUBD1 FENO11FRIRF5IRF1TJBPIGVPOLPGDEIF1BACSL 2HTRA3 0L 2DPYD4 AP1VAAG71KCND1 6KP4HA1 AP1 A9C6orf211TNIPTTL E2V1 QANKRD51 1 2C2orf56 1 IL18RAP 5NLRP3 .0 IL5R A BAATP8B2 CE2LOC283588ADORA3TMEM138EIF1GAB1MBNLCNOIMPDH2SLC25A36C6orf72PTPLBAIAP3CDS2TMEM55BCISD2EI24SMCR7MFSD5C1orf96AYGPX4GMPSC1orf103SETZNRD1 T10GOLGA8DCUN1D4MEPCEFCGR3 3SLC7A5P1RALOC400713C20orf165TTTY1RAB9 BCREB3L2RPL23RHOBARKRSNRCDKN3PSMB5DHX1UNQ9218NSBP1AT C1XTRIM1LCN1V1 TAP10−12SDR−P1A3RARSCMC1C6orf130PKIPA AKIAA0776CEA 0ATSPSNHG7CSH2COCCDC144APPP1R1PSMC2SERF1A 5AMLF1IP1 G 0 3NCK2RNASEL OMGC72080 X1 AN3VASPCAM3RAP2HIF1AKRASRAB5 5TCP1BCL2A 1 0JOSD1 C A N 1 DOR8_T4 GSTM4LETMSLC29AAOC3 2 ZMIZ1LOC100128751BTBD1 1AMPD2SCAPKIAA0748SLC39AHRH4EIF3EIPEVI2AHIST1H1CCLN6RAB11FIPJMJD1ACBLC19orf2CXCR4TNFRSF12AACP1ADKZNF146PRKAR2CXorf52C4orf14SUMF2IL1RL1C4orf42C19orf53SCCECR1OXNAD1STTDP1C20orf112 1PYGO2SH2B3SF3B3PTTAPBPLPABPC4ERF BDICER1CRKIAA0319C9orf114SLA2ADFHORMAD1FAM136AUHRF2PHF20DOK2MAP7D1GNG1TMEM219LTRADDX1PWP1 OMLC12orf41PKP46ABHD2C2orf24TBPOVFAM35AC16orf35ILF3LRBASTT3AC21orf66FAM44BC17orf44OLFM2OSGEPAKAP8TMPOCITED2 OPSUV39HC15orf24UBIAD1NIT2FBXW8UFC1TUF1IGFBP7EPRSDDB2IDH3GNGTSUDS3GKAP1TOMM2C7orf34TMEM128DDOSTSECISBP2UAP1CCNLGTF2A G1AP1C16orf54DDTPRNASEH1FCGRGGPS1CXorf20IPO9WEE1MRPL10UBA5ERP21EXTCCTBTN2A1CARD8STK3C7orf31MTCH2HEMGNRNF141PPPRR1ZBTB41 1NADKUFD1CALOC100129827BIMMTKLHDC1MYSTSTNAT5MS4A1SUSP4NUDTNME2 2UQCRZFAND2XKBUD1JUMRPL16EIF4EASNSD1FADS1PDE6OR10HSSU7MYEFNUP62CZNF619 8SLC12APRM3ZNF493TMEM215KRTMEM2SEC61GTMEM1451DOCK9NSMCE1C11orf52 MANP32DKRAPDC2NR4A2RAB1FAM18BSSTR4FXYD6ZNF9CNG6ON2SFRS5CYP2B6GINS2AT BHAX1VAMP5FLJ21272GH1OR52WOX 2KCNJPLILRA4WFDC1 7KRSBSNYY2AR 2TAP19−729PLEKHA9AKR1C1TAP1−1 TOR10G 1P5CYGBLGPR6 12IFNA2ADH1ZBTB37DNAJB143 DOR4C32T81T1 MGC15705C4orf312LIN54DCTN2PLAAMEMO1CHRM3AQP7 4IER2MCTS1FAM114AJZNF1891 0 ARPS1 45RPS6KB1FANCD2CHD2SLDIS3STX4 1UBLLASNSCCDC117CYLC10orf28ZNF443DDX2 1BRHD 3SFT2D AALDH1A1INTS6LRRFIP2 8FSCN1FEM1KLF6FPR2CHD4RPLRB1EMLRY 3WD2ACGPR6OLAHRCEP135PIM2 1D8SERPINB1GALNT1SPPL2ADIAPH2 TN2CD6MLH1 1GADD45A2LCN2 24 4MAPK1HIP1 C2UBR1 5 9 4 IRAK3ORM2 P2 RY 10 FA IM3 CACEP6FAM102AGPR114 CNB4CD9DYRK2GOLGA8C1orf183KIAA1257KIAA1737GCM1ATXN1 3PRSS3MRPL45MAP4K5TPPP3VPS37CZNF480C14orf159ITGA5CD300AGPR1PCGF5UNQ1829ZNF574ITGB3CNR2SH3BGRLTH1LEVLFAM38AEIF3JGLTSCR2ANP32BST6GALNAPGRMC2LATPCNAODC1HEXIM1RGS1TMEM164GFI1BSUCLG2SLC35EPDK2JMJDIL12RB1WDR8C21orf7CDH2CHERPZNF673POLR2PPP1R16BMGSTLOC338799PTBP1RBM1PTERTESSP1MYLPRAF2LBNHP2L1TBC1D1MNTEIF3NDUIDH2ABHD3LETMD1 0USP3LOC440552CA2RFC5HRBLMGC9913CCDC5ARL6IPGOLIMTSC22DATPISDISG20L2 3C5orf4RHOBTB3PRRG4BMRPS18BEEF2KOGFOD2MYCTURSTTTF1WDR9BET1LCHMP1LANCLFAM65ANFE2L1BBS1IMP3PAINADLTHAP1BAC9orf379WDR5SNCAFAM53CC21orf29HIST1H4LTMEM6 HPTCD3GPR2IQCEBCLC4orf15ZNFTGIF1ZZZRCC2EIF3HZNF583JMELK4SIRPGUBR2AGITGA2MRPS3P5TFB2MGZMBPIK3C2CLEC1NAB2ACP5DAPI4K2ALGALS2ZBTB1C9orf71AP1AGBRD4PRIM1NEO1 8PMS2CLGNAZ9SCARNA4PPM1RPS1ZNF740LPAR5UTP1 EOSC2orf37ICZBTB47C17orf97NUP5FAZNF417 G5MIF3FGFR 2FCHO2RNF113AZNF600MLL2 43TCEALYZNF5866C10orf111 OFLJ27243SIREG3PATKRPPIC11orf10G1DPY19L2CTRB2LMO4 DFAM182A3PV SING2233L4GP6 2BTF3L1SELRPL23AP112TIPIN0MRGPRX21 4OR10GRNF138MGC3771 9GIN1RTGYPB2NUPR1CYP2A1 7LILRP2PEX11ASIOR51S 1TP53TG7 C6KREIF4A33T83NDUANKRD1 BFAM160BGJAHCTF1ALSCARB1LOC729927HRKAKAP6 ACOMMD5C1orf1763OR10HhCG_2039148NDUFS4GTPBP4 195ZNF259DCCL22 BCHRFARL1146SETDB15OR10SMSIRTGOSR103BRAE1FOS 2C9orf96SFRS1 PDBF4C1orf190MULGAPVD1SREBFUCNRFPL GTAP5−1PPP1CBBMP8BA4VPS3MRPL18ZNF138MPZLUBTD2BAT5 3RND1HNRNPMC4orf32ADAM17AK3L1CHMP2NPHS1MAGTMORF4L17KMOFAEDEM2RAB33BDDX5YIPFFRS2EIF4E3SDF2 2ECE1UBL 7ZNF277SNX1C16orf87EXOC6TGFCCDC9 1ISY1TRIP10 AM7A1SQRDL3DRAMOLFM 14UBQLN2 2ZNF41015PEFBANXA7TCFL30 CSTF2TTRAF 14 25TF A1UGP2CUX1 B G1METTLTMEM180SEC22BLSDP5 4FUT7TMEM205 5ABP1DSE 3HSCARD6PA 9ACN91B IL18R 1TIF A HEX1_T6TUN5_T6 PC 2 RCAN3GPR3C14orf104DAAM2UQCRQFLJ23185SUPV3L1ZMYND1TTRAPSPNS3HSH2CDK6ZNF395 4LRIGGATCTN3TSPYPELYWHAHENC1PSMD8SCMH1TAEXOSC9CCTLYPOP7SEMG1TMEM9SIT1FUT10TMEM208 AN1D 1AR1PSG6 3LARP5CO 13ACTL7B 3BRISENP5OR5M1 X1TATEIF4E 3CCDC5CLEC4ND6C20orf160 8IFT573TBCAMAP1LC3ADN1MED1TRIP12TMTC1ZNF250GTPBP1 A3 2NCR1COP1 GRAMD1CD3 G C PRKNAAACRIP1TSPACTCTN1LFNGCLEC9DNAJC9AESFAM174ARITM2CMBNLSPTUBGCP4SEPTBCFDOCC− UNX3AN1TUBB1NCLC6orf48CAMK1PPTC7SDC2PRPF19C12orf10TSC22DPDLIMSPNVSIG1SIGLEC7 ARCGSTM1ERCC1ITM2C12orf32REPS1IKZFKLHL21CTDSPLSSBRPLNIPSNAP1CAMK4POLR2WDR7SKAP1BNIP3RRP1PRKCQBCL11BSLC22ADP 8TMEM204RPS2TGFBR3MAP4K1 AFGFBP2JUMRPL37PLEKHA1C14orf142 1RPP3ALS2CR1IL18B2TRA@MLMRPL50C1orf55ICOSZNF826 1XPCCYB561MRPLEBPMFHAS1SNX5SHMTNELL2NOTCH1SLFN11PHF5AEFHC1AGPRPF4FLJ12529FXC1CIB1CLELP2C6orf125LRPPRCSIDT2 ASLC39A2GPRIN3NB4GA 8UGDHPIAS2TRIM27FLJ32065SSBP4BRP4MAP2K4TXNDC5SLC10ATAERGIC1CCR4NGDNTRIB2TRIM44LRRC5 1SLC47ANCBP2SMPD4PPP3CCC11orf24LARSLDOC1CARD1MTLTCRZSWIM1CBR1FARSBHNRNPH3PEBP1TARDBPPAARMETCYP4V2CDK4ABCD2TMOD2SRPRBADPRHLF2CBWD3CD2APTNFRSF14XPO5 ULOC25845EIF3B2NDE1CALTXPCNXLT11MED3GOLGA5ALKBH1 0GA5RAD5KRCCDC6ZNF818PTM2DNOLF5MPIGLN2UNQ3118HTHEM48 PZNF2643SAP1SUPT3ZNF134AORPLP0PCTK1CHST12NAT1 2TNRVNDUFB6Y1FLJ10246RCXCLZNF585AC17orf88 9COSHFM1ESAM1PTPN1TGIF2ARMCX6IDHSKRRPL14 2KFOSBRPL17POLKRIGLT TCAP2FABIN1RPL364OR1G1OLIGRIT1 BRF2C7orf47UBE2E2LOC100131373GTF2IRD2NKX3−1UAFABZFYVE2FLJ11235 62OPN4OR6C3 X8J19PAFAM108A TAP10−93RFWD3U2AFCPN13 TAP10−1CSRP2701TAP19−3 2OR13C 1ALS2CR2ARF68SNX1ZC3H1 B2IP8L2EMILINFBXO3 65MXRA8OR2B1GLISAPSMB6NAT88 HOBP22GJINTSLYDND1 1ZDHHC9TWF1SBFC1orf97USP3FTSJPRKAB1RAB2 3RELC14orf138C1orf91TRIM3TINF2FAM105AACSM2C1orf74SNX1DNAJC1SMG1PPP1R2SP140CPEB4LCE3CREMGPR8 A1NAT81UBAP1TDGFTLRRPS2C10orf97CTGPR2CEPT 1DLEC1BAZ1 0GNS2 3B4GAUBE2BNIP2AMUC1MXD3DENND2RAL 8B 3SYNGR2 208STX1 B1CD1HFE 2 8 ALTMEM165 1ATP6V1C1GRAMD1 1B SZNF445B A17 H6LTLOC100129395RCHY1 4MARCKS 15LDHA DB3GNT5DUSP5GRINARAB3 AC6orf150 2 HEL B CD177 ITGA4DUSP7GPR4PTP4A3KLHDC8CD226 4LEF1HIST1H3HANAPC1RPS2TTLL5KLF1CDC25BPDK4AQP3LOC152217BTN3A3CD6SF3BSRP7ARRDC3 BEXOSC2FGFR1OTIMM8SSR4ELK3PLSCR3THEM2GGA2RAB27BHIST1H2BNUP8ATP5G1PLCG1AKTDCTZAP7TREMLIPO5SLC4A1RPS1WBSCR2DDX2TRAM2 9RPS2MRFDDHD2 2GPR8KLRFDNAJC1DPP7CD79AMBPHDTXNRD1APOBEC3MRPS7 5WDR5DYRK1PARP1CAMK2N1ZNF253NOL11SIGIRRFLJ34077CD6 23LOC170082CD40LGAHNAKMGC13057RNF26RSPH3COQ10AC19orf40MOBKL1ANOL14IBTKABHD14AS100ZIARS 3POU2F2YIF1ANOP5JUNDYARSZNF701ML DICAM2TEDOM3BCL11AMTSS1ACNIP7CIRBPRBM8ZNF317TMEM183AWDR7LYSMD2OTUD4 8BSDPRSEC11CNARG1C7orf27ANKHCXCR3PF4VDPFEOMES AP1L10GIMAP1TARPERAP2TULPC3orf59LDLRAP1SNORA4 6DDX1TMSB1C14orf93DNAJC2MRPL178ASCC33TBRG4SLC25A PPOLR3SIDTCHRTRMT1MIS1BMPR1ZNF429FCRL 1PIGMCLY2BZW2C7orf53LTCAM 8FUSIP1GSTTP1ATICRCCDC88A 1SLC41AACAA2AU5FAM14ATASUSD2PRDX1ZNF419NCR3GCETTMEM1SSPNCP 0PLEKHO2FLJ23865FXYD7KCTD63RPL36ACWC1HEY1USP1SLC7A1 ORCGA 5GLB6G5B11CD82ABI MCCL235TSSK4HIST1H2BF9BTAS2R4METTL2AOR56BTSZ1TAS2R6CTSBA A3LCE14 ACPRRFESD1KROR52EUPK2 02KRHNMTCHATABAK1TXLNCCT8L2MORF4L2 FCHMP5SPI16 3ZXCLNS1EIF4RFC4CDKN2S100A13CALCAANKRD1SMR3R 20GTF2MAP3K1LOC100128356GUCA2XIAPGM2C14orf133LRP DMFAP3BDNAJB9TAS2R4 2T1560ZNF672SPRR1RAB3KLF9 AY12TRIM5SLC44AFASTKD3 TAP10−713WDD1L1 TAP5−34BTBD1C1RLZNF669ZNF330MED15ARID4ZBP1HIST1H3EDAMKKS 3BASP11CXorf21LSOR1S1FAM26DRIPK1CDC42EP3TLEHIASPHD2SPRR2 C1MITD LF1POTYMPGPR108 H1SP100ZNF460 AAP1LHCASP1HPS5OR2M7STYXLNRM A4ARSBCPDSATCARSALAS1 0B5NFTNFRSF10A ANECAP2AFL T1GBGTD1 OBTDRD7PARP1 6A18SLC9A8GALNTITIH ABSEPHS2 1HDGF23ELMO2OT 4NXTATIDI1 CID3UBA6PRIC285FFTMEM110FL 01G72 4CEBPZ1 AR2OT 2 0 2 1ACSL 3 GYG1 IL7RCYSNFSPOCK2LTCD2ATCCLDNASE2R2GPR137BC2IL2R 7CD5EPHB1S 5MGC16384GRAP2RSL1DPATFTITGB7KLRK1CASS4TCFPVRI BKCNA3METTL7AR OARL4CKIAA0182BTN3A2A6SLC8A1ITK ORASEPT11NRGNSNRMAP3K1MED2 7TIGD3IKZFTSPYLTRBC2APOTNFRSF9 GZNF18 1HIST1H4DCD244FUATSFRS7HEG1BCORGAGP1BASLC31ASDHIST1H1EKLRD1PHF17INPP4MYCHIST1H2AMPRKCALOC286161FAM46CTIGD1CDKN2AICAMK2HIST1H2AABCE1LRRC2PANAP1L1GZMAP1A1C4orf33 ZDAB2C14orf126TAPPTIAM2 3BMP6ANXA2P2KIAA1919TAZBTB3RTPAA1BPCIRH1TXN2 AD1C1QBPHNRNHIST1H1DALG3CTSC5ATC2orf61TNFRSF25ADARB1LOC388161CLSTN1ZNF436RASGRFALG8OBFC2PASAMD3HIST1H3BNUCKS1PPP4R1ABILSM4 2F9MT1XABHD14BOASPRZNF8SH2D1PDIA6 APDC13CD300CRG9MTD1PLXND1BCKDKSH2D2FAH2SLC41APEPDN1GPR1 4HIST1H4ABANK1ZNF746RERERASA4ZNF764KRRPL37APARP1PFKPF4RPL13HIST1H2BC17orf71MGSTLRP1ZNF813CD8TOE1NHEDC22G BLOC402176HDNOSITRIM23CPT1YWHTCL1AEIF3KHIST1H2BDZ2SCNM1AAD3EFHC2KIAA1598 15WBSCR1C6orf62 DTAP5−9APOMP2RX7GHRH YD4IHPK2APA4 4HARIN2CSN1S2ACFER1L3TMEM4FAM133BPSMB4 5PBTF3MGC33894 6KRRNF146TIMP1GZF1 BC1orf665LKYNUC8orf44KRZNF230UBBFAM117A BCCDC101C15orf571P BOTFAM151BACA1AQVCR2 APA7 2RPLNT5CMATN3 1SLAMFCAPNS2 TAP21−2ARHGEFDENND5GHRLKRLOC153684MYO1 TAP9−4PDLIMPOLRMTPPOP1EYL1CTNNA1C15orf37CSIN2PIP5K1CASZ1CTDP1TMEM87AMAD2L1BCCILCN1DYRPL35AC6orf81C1orf25HAT1 NAPBA3PLCLICARL13B T89A1E2F22ZNF107 A3TNIP ARPS2 3FANLT1CASP1ACPRDM1AG NHDMPZL2ZNF254UNQ5814 0VA 1WDFY3 5FBXO9GTDC1HBEGFS 2 1OT1 A52ACGALPACST 59 P4MAD2L2PPP4R2 C7 BATFOSMANXA3TFRC PSTPIP2 DOR8_T6 CD3 E TC2NMMDGPR183KLRG1TMEM109FLT3LGGSTP1NUTFCD8FCGR2LOC147727CD2APOBEC3COSUMO2DDX4IL2RCHSTLYC6orf25ZNF385ALMAN1LCLIDDIT32CNOKIAA1683 A9PFDN6 X1FLJ20160PRR3NPC2 8PAHSMRPLLASS6LPAR1 KFCRLAT AZNF136C15orf39TMEM119TMEM4UTP267 BHIST1H2BTEP1TLRTMEM150FLJ38717 7LOC439911PRPS1ZNF333SERINC5LILRB1PAWDR1 T6LPGBD4CXXC5P5G2UIMC1B3GNTL1CPVLSLC39AIFNAR2C13orf27TOPORSDDEF1IT1DPM3SND1OR6N2NOLPHF23HP9PSMA5DULLARDD3LYRM5WASH37 6MAFEIF1AAMIGO1MSH6SF3B 3TPRUSP1OSTbetaFARTULBP1KLF11LOC376693OR4A5STFLJ27255 0TYW3PR9ACVR2LINGO4GLLSM3 8UFLJ38379 BELP3CMTM11 AB1 7ANKRD1PAF1SRD5A3C1GA O2012GA4PPP1R3ALDH7A1SEPT14PLD2WDR2 DSNX1 5PIN4FLJ43692ZDHHC2TTBK2DPH5FURIIKZF4PPIL1RBM4NOL10 PEIF3C11orf71C5orf39TTC35PRTAAR1PTPN1FBN1SST3GALPHFXRN1SNORD49APATRAB2 AFLJ36031 OK2NIT1LTGPR109B NI 0HIVEP1LIMK2REC8 38A131C B7KLHLC5orf32 0SNX24FBXO6 12RHOHERLIN 2ADMCD4 0IFIH 41C19orf59HPR BUR12_ T −0.2 GPR174 CD5CD8LDHBCD9F13A 2 1 LRRC8PIGB 6CD4HIST1H3IPPBP 1TTC39CHMAPEX1HBBC12orf57CD247MDFICCCDC1 CDPP4GIMAP7OXPHF15CCDC6C16orf61HIST1H2BCCD300LF1PYHIN1HIST2H2AZNF217M6PRMALKLF16LZTFL1SEC14L1HLA−DQA2REM2TRIMRHOCHLA−DRB5MYSTZSCAN1SRMC3orf34DKFZP586I1420DUSP6BANP 9HYOU1IGKCGIMAP6MOCS3ATTRGV7APOLAPOBEC3TTC9CD7ACRBPRRASCNOZNF14ING4GPX1LIMK1UTP1UBE2D4P10DMYBPHHIST1H4F 8CHRNEGALK1DUSP1PHF1 1ZNF211GSLYYTHDFGSTK1TMEM176BSH2B2ZBED5KRTHAP5 4MGC10814ANKRD4 C3HEAMAC1T386 B6KRPOLD3UBE2D1ZCCHC2LOC100129608TOTRIM6−TRIM34LOC554175TAS2R3FKSG8U2AF1L4 8OR2T12TAP5−6ATPSMB3GRB1PITRM1OR1E1ATVPS2PSMB8 0FTLZNF101ANKFY1TAP12−4LYPLA2CLP11CD2BP2NAPHF11P1B3 BBRCA2 2TREMLGNELOC401152R1FBXO3FAM126BTEAD3SLC25A12C16orf57GSDMDCASP4ZHX2PA 0MUSTN1 953IRF9SL 9DAPP1MCTP2CYB5D1GPR109ANOD2NDUFB3M6PRBP1HLGTPBP2 K 2 4RAB4 X GNA1MSL3L1UBE2E1 3MCTP1 5 UPP1 CD3 DMICALCLPRF1SLC7A7CD2LRRC3CCDC109BINSIG1HIST1H2BSLC45A 3HIST2H2BRMIDNAJB5RICTPPCDCHNRNPRANKRD4DPYSLKTI12HLA−DOBWWC3N4BP2L1HIST1H3DSEC24DPFDN5UNQ6228EIF2C 1SGK3SLC27APYGMTUBA1TMLH ORFLJ22639EIF3A 4MT1IPS1PR4 FC12orf62RARADNAJB6LOC255783GALNT14LYSMC1ZMYND1KRAP3D1DGITPR2C14orf148BAT496 2FAM175BMYH1C20orf30 FC14orf147 TAP5−52 TREML ECCDC1ATSAMD9DNAJC2 BTRIM21 3MYBPC3ASLC22AEBIRBM4ACSTX1STEAP32CTR9C10orf78C3orf19ATP13AFBXO3DNAJC2 5TARSZC3H3tcag7.1015TA 3C9orf46MTHFD2EZH2DAIFI16PVRL 4TRPM6RNF1DDX5EGR2 72 7 7CH1MEF2A 4MR1SNX3 9 23 08 5CDKN1SPINT2SO RTSOCS3 A1 HEX1_T4 BUR12_T6 GZMKCTSWCCR3GNTR ATLY 1FBGIMAP5CAPN2 L PRKCHKIAA1430HLA−DMAIL32IDH3CBX5HCG2TBCCRNPS1PRKLPINCASP8ZCCHC6SERNACAPLB1ZBTBLYVE1ANXA2IRF8TF AZNF552TMEM188BAMRPS1EID3NUDT16PTMEM167ACD180C1orf77CD7AG AMUSFCDC2 7NRBFHS 2CH1TMEM176A TAD3UNQ9419ZNF20C1orf27RNF1KRLOC100129443WDR51BMED1 6BCLLOC26010RFC22DUSP1PAALPK1TLE3 TAP6−31EZADCY3PSMA3HSPB1NASPMGC27348 5ADGNG5 652ZNFX 9KIAA0329 3CD163C6orf97 R 1POLA2WDR51AGALNTLYRM1BATF2MS4A6CHSY1MTHFATMVPC20orf74ZNF276 6DDX2BST1CD4DDAH2HAMP 1SNAPIPIK3AP1AIM2IKBKES 2PTPN2 4FF 8AIL4R 3LACTBCR AR3 N TCCNIH4 3 TNFRSF10DCISHS PAT C1 HP TUN5_T4 KLRB1SLAMF 6 PTPRCAPFYN MARCH1HIST1H2AHLA−DMBHIST4HERN1HLA−DQA1FUNDC1P2AIF1HOXA1RYDKFZp566H0824GPR171ST8SIAMANBALRARRES3 45CCNJMRPL44SHISA5 ECISLC2A1TNFSF1SSH2ALCAMGIMAP8HIST2H2BE RDEKOCLSIGLEC9RHBDD2VCPIP1 4TBC1D1 L KCNJ15IRAK4CEP6TMEM7C17orf62 MSH3GLB1IFNAR1KIAA0746 0 SLC43ADMXL 8 4 ARID5ABL 0RHOU 2C15orf48 2 3EXT1LMNB1KIAA1632 A SPP1 DOR8_T4CLIC 4 GPR162 HIST1H3PTRH2TMEM147HSPC159FRFRAVATCHI3L1 AATC14orf941ILPRKXSTTNIKC7orf30TNFRSF12ZNF473PRMTCD300ESCYLATGMPR2PSMB9TRIM3TMEM149CYP2F14VDCNPPDCD1LGC1orf217SLC12APLHHEX 1KLF4 5LTCMPK2ETV6ACPARP1PAPSS2SIAC 7TESK2A4H AFLJ40125ENPP2SBNO2PAPER2PLSCR41CD8SAMD98DHRS1ACSLRNF12PUS3ECOP1L1 9PRDM8 2PHTF1 2SLC30A 2C21orf91THEX1SPECC1FLJ34047 5 L 3PORSC4MOLUGCG 7 L EMR1 TRPS1 HEX1_T6TUN5_T 6 MPZL1MICALFLJ42957SLFN5 2 CSFHSP90BFLJ10357CYP27ASEMA7HIST1H2A 1PEX1THOC5ANKRD2NUDT16SERPGRMC1 1CDC42EP2 2TIMM1 1APLA2G7OASLRPL31FKBP1NUB1TMEM126B TAD1NPNDRG1 KKIAA1618PNPLA6LY 8SLC20APHLPP 06G6CC17orf63TIP 1TRAFD1C4orf16TNFSF13BNDST2RSBN1 ARPLGALS8 1 MLMLKLNLRC4LT 6 PLSCR1 AHRNF−ECCR2CD8 4NKG7JMJD1CSLC31AHIST1H4BTLR6 4NUAK2ASPHP2PDCD4OMGRY 13 2SMPDL3BMGC39372TSPRNF213GAB3RRASSF4ZEB2ST TP4GGT5YIPF AN1PNPTERALAT 1 17 1RPL15FKBP5IFI35 DOR8_T6 SLC15ACLEC7HLA−DRB1HIST1H4DDHD1SEC61A 4CDKN1C9orf91C18orf21PSMB1 AMLXIGIMAP4C4orf18 ECDC123SLC25A37IFITMDHRS9 PBATF6 2CXCL16 0TM9SF4SCARF1ARFGAP3HSP90AA1 1CYP1B1KDSRKREMEN1DHX5ZNF200NARFZAKEPCHCHD7SLAMF 8 AS1TWSG1GK3PDSS1SLC39ACCRLC13orf23FMNLNR2E1TMED8NEDD4PHCA P 8 2 3 8ALDH2 HGF HLA−DPB1 SIGLEC1F2RL1TGFBILYXRN26ESUMO1P1 0C21orf109SERTLRC3orf38APOO TAD2GBP7INTS1SLC25A28NFE2DTX3FLJ14213 1 USP5ZDHHC1 0ECT LIRF7C14orf101 3 2 9 OST alpha BUR12_T6 PMAIP1HLA−DP A1 PIGYSMAD3FAM129AGYPCOBFC1CX3CR1UBE2J1PAPSS1HERC6LPCALCE1DISC1ATL3IFIT5FNDC3LHFPLLAP3PNKDC4BPDTNBP1T3ACSLFLVCR2 D CPNE5APA 2B 4G1CEA CAM1 TUN5_T4IL6S T 0.20 0.25 0.30 0.35 LOC400590C14orf4CD3PEA1CYTH4 6TMEM140TMEM173 5 INDOMLLTTTC26ZNF4381FTSJDPIM1A26C1ANSUN7SEMA6 2 GBP1ST B OM MAN1A1TLR10CD9 7PSME2HS2STAPOLCSGALNA 1ETV7CDK2WSB2 2 DUSP3TRIMPMLRHBDF2GBP3CD5EGR1 CT1IFI44 5RAB7L1 9 SLC11A 2 AT HLBHLHB2 1MAMLRPL28 2GPR155ATP2B4KIAA0586SC5DLAFF1RCVRNST3GALSI PARAB21L2 4 C16orf7 0 S100A10 NLRP6HIST1H4CC17orf87 NUDT 5PIUBE2L6 3CIITIFIT APGS1GKTCF7L2CMAH 3 CHI3L2 HLA−DRADSC2CTRLIL13RA1IFIT2HS APOLPA 6IFI6STNANSTRIM25 1 AT5AXAF1ST ATLAIR2 1 PC1 (68 %) CS DA MTRRZC3HAGABBR1 V1C12orf4HPGD THBDGIMAP2LT BR MOV1C2orf59PARP1 0 4 IFI44LCXCL10 MARCH3 EMR2 RFX2S1PR1ALPLADRBK2TBC1D1BCL2L11PARP9JAK3INSL 5 3 MX1 SERPINB9

50 LOC646470SLC46A 3 DOCK1OAS1 0 C3 MAK HRH2EPSTI1SAMHD1C11orf75LAMP3 TPK1DDX6HIVEP2 0 SLC1A3TFEC

TRIM22 00 GBP5DOCK4RSAD2KLHLGAS7 5 ENTPD7 1400 CLEC2VCAN B IFIT1STK3OAS2APOL 6 SGPP2

−0.0 HERC5 OAS3 C upregulated SLPI −5 SECTM1SERPING1 TBC1D8 SLAMF7 P2 RY 14 TLR 5 downregulated FCGR1 A CD274 1200 GBP4 IL1RAP CASP5 1000

CCR1

800 −0.05 0.00 0.05 600 No. Genes PC1 400

200

0 Any All 2h 4h 6h 2h 4h 6h only only only Time

Figure 2 - Time specific expression pattern post LPS infusion.

(A) Principal component analysis of temporal gene expression response to LPS reveals a separation of neutrophil gene expression at T=2, from those of T=4 and T=6h after infusion of LPS in 4 volunteers. (B) PCA of time post LPS infusion. 2h, red ; 4h, blue; 6h, magenta. Differentiating codes for volunteers : BUR12, HEX1, TUN5 and DOR8. Volunteer code suffix _Tx denotes time after LPS infusion (C) Overview of number of temporally differentially expressed genes in neutrophils after systemic LPS infusion.

140 Transcriptome kinetics of circulating neutrophils during human experimental endotoxemia

Figure 3 - Persistent changes in gene expression.

(A) Genes which are persistently upregulated in time. (B) Genes which are persistently downregulated in time. (C) Genes which are regulated in opposite direction between t=2h and t=4h/t=6h after LPS infusion (wavy genes). (D) Heat map with fold change of genes relative to t=0. Green represents downregulation and red upregulation.

3.3 Validation of microarray by q-PCR The top 5 persistently upregulated and the top 5 persistently downregulated genes were selected for q-PCR validation of the microarray, as well as 5 relevant inflammatory genes with wavy behaviour (Table 1). Q-PCR gene expression analysis was similar with the expression data from the microarray (Figure 4). Strongly downregulated IL7R and CD3G confirm minor PBMC contamination in isolated neutrophil fractions.

3.4 A temporally stable neutrophil transcriptome reveals “inflammatory response” and “anti-apoptosis” for upregulated genes, and “lymphocyte activation” for downregulated genes 6 In order to unravel the biological processes of the genes that are stably upregulated in neutrophils after systemic LPS infusion, we performed a Gene Ontology (GO) analysis using the 233 genes that are upregulated at T=2h, 4h, and T=6h relative to 0h in the four volunteers examined. GO analysis revealed several biological processes to be significant

141 Table 1 - Selection of top 5 persistently upregulated, persistently downregulated, and relevant wavy genes from microarray data for q-PCR validation.

Gene 2h 4h 6h Description symbol Persistently CD177 3.48 4.5 4.64 CD177 molecule upregulated HGF 2.39 4.43 5.04 hepatocyte growth factor (hepapoietin A; scatter factor) HP 2.48 3.9 4.22 haptoglobin TRPS1 2.22 3.89 4.03 trichorhinophalangeal syndrome I IL18R1 2.4 3.12 3.33 interleukin 18 receptor 1 Persistently P2RY10 -3.67 -4.27 -4.53 purinergic receptor P2Y, G-protein downregulated coupled, 10 IL7R -3.19 -3.48 -3.69 interleukin 7 receptor FAIM3 -2.87 -3.17 -3.75 Fas apoptotic inhibitory molecule 3 ITGA4 -2.91 -3.14 -3.15 integrin, alpha 4 (antigen CD49D, alpha 4 subunit of VLA-4 receptor) CD3G -3 -3.06 -3.81 CD3g molecule, gamma (CD3-TCR complex) Wavy behaviour FCGR1A -1.56 2.34 2.89 Fc fragment of IgG, high affinity Ia, receptor (CD64) STAT5A -1.03 1.27 1.28 signal transducer and activator of transcription 5A JAK3 -1.31 1.16 1.35 Janus kinase 3 IL1RAP -3.4 1.06 1.41 interleukin 1 receptor accessory protein KCNAB2 1.29 -0.38 -1.3 potassium voltage-gated channel, shaker- related subfamily, beta member 2

(p<0.05) for the set of upregulated genes and indicates “inflammatory response” and “anti-apoptosis” as the principal biological processes (Figure 5). Likewise, GO analysis was carried out for the 307 genes that are downregulated in expression at T=2h, T=4h, and T=6h after LPS infusion. This set of stably downregulated genes are also significantly (p<0.05) enriched in several processes, including “lymphocyte activation” as the main biological process (Figure 5). The enrichment of downregulated genes in the process of “lymphocyte activation” is interestingly coherent with the low levels of circulating lymphocytes (PBMC population, Figure 1A) post LPS infusion. Meanwhile, the enrichment of the upregulated genes in the processes of “inflammatory response” and “anti-apoptosis” is in line with the observed circulatory neutrophilia (PMN population, Figure 1A).

142 Transcriptome kinetics of circulating neutrophils during human experimental endotoxemia

Figure 4 - Validation of array results by taqman gene expression assay q-PCR.

(A) Gene expression in time relative to t=0 for 16 target genes. (B) Corresponding q-pcr results for same geneset in time relative to t=0.

GO analysis of all the genes that are perturbed in expression respectively at 2h, 4h and 6h after LPS infusion, reveal biological processes that are not much different from, but further support, those of the stable responses. For example, “induction of apoptosis” is significant (p<0.05) for genes that are downregulated at T=2h and is supportive of “negative regulation of apoptosis”, which is a process wherein genes that are upregulated at all time points are significantly (p<0.05) enriched.

3.5 Genes that are consistently up- or downregulated in neutrophils after LPS infusion show physical interaction enrichment between their encoded proteins The higher the number of protein-protein interactions within a set of proteins, the more functionally cohesive one expects the set to be. The overrepresentation of interactions among a set of proteins, relative to a random set of proteins with the same number of interactions in the complete set of known protein-protein interactions, can be expressed with a Physical Interaction Enrichment (PIE) score17. The PIE score for randomly chosen genes is expected to be 1.0; P-values are obtained by randomly sampling sets of genes. To identify relevant networks involved in the neutrophil kinetics after LPS infusion, we analyzed the genes that were consistently upregulated (233) or downregulated (307). Both the consistently upregulated and the consistently downregulated genes show a high number of protein-protein interactions among them (PIE score 1.139, p=0.053 for 6 the upregulated genes and PIE score 1.274, p=0.002 for the downregulated genes), indicating that they are functionally cohesive. As mentioned in the section above, the upregulated genes are enriched in the process of inflammation, while the downregulated genes are enriched in the process of lymphocyte activation. Based on integration of

143 corr p >0.05; <=0.05; <=0.01; <=0.001 respectively

Figure 5 - Overview of biological processes of genes that are expressed in similar ways at all time points after LPS infusion.

“DownregAllTimes”, “UpregAllTimes” respectively refer to the down- or upregulated genes at T=2, 4 and 6h after LPS infusion. “OverallPerturbed” refers to the complete set of 2248 genes that are perturbed upon LPS infusion.

144 Transcriptome kinetics of circulating neutrophils during human experimental endotoxemia

gene expression data with protein-protein interaction data we constructed networks of their interacting proteins at various time points post LPS infusion (Figure 6). For the scope of this manuscript, we focus on the upregulated genes since we are interested in LPS-induced inflammation. We also depict interacting proteins of the wavy genes on this network of upregulated genes. The protein-protein interaction network of upregulated genes contains interacting proteins involved in signaling of TNFα and IL-1, regulators of transcription (NFκB family), apoptosis and ligand-receptor interaction (Figure 6).

Network properties PIE score= 1.139, p=0.035 Legend edge: protein-protein interaction red nodes: upregulated genes blue nodes: downregulated genes T=2h large nodes: persistently upregulated genes

T=6h T=4h

Figure 6 - PPI network of LPS-perturbed genes. Red node, upregulated gene.

Blue node, downregulated gene. Grey node, default node colour (i.e. gene not up- or down-regulated at the fold change threshold of 2).

3.6 TNF-signaling, inflammatory networks and apoptosis networks are significantly influenced by LPS infusion Most of the upregulated genes that are involved in inflammatory pathways and genes 6 of the TNFα signaling pathway were affected early after LPS infusion. In general, transcription factor expression for inflammatory genes was observed as represented by the upregulation of NFκB-family genes NFKB1, NFKB2, NFKBIA, NFKBID and NFKBIE (Figure 7A).

145 Figure 7 - Kinetic behaviour of inflammatory genes.

(A) TNF related genes and genes from TNF receptor family. Fold change in time relative to t=0. (B) Genes of the NFκB family. Fold change in time relative to t=0. (C) Apoptosis mediating genes. Fold change in time relative to t=0.

Since TNFα plasma levels increased early following LPS infusion in parallel with circulating neutrophils, we investigated the activation of neutrophils by TNFα. TNFα inducible proteins 3 and 6 were strongly upregulated at all time points with peaks at t=2 hours. TNFAIP3 was also validated by q-PCR because of its important role in TNFα signaling (Figure 4). Furthermore, TRAF genes were induced and TNF receptor gene expression was differentially regulated (Figure 7B). TNFα gene expression was upregulated at t=2 hours, but not at t=4 hours and t=6 hours. These data suggest that

146 Transcriptome kinetics of circulating neutrophils during human experimental endotoxemia

circulating neutrophils after LPS infusion are activated by circulating TNFα. Apoptosis of neutrophils is inhibited by G-CSF and interestingly, apoptosis of circulating neutrophils is decreased during sepsis19. Early after LPS infusion in healthy volunteers, plasma levels of G-CSF increased (Figure 1C). Apoptosis pathway genes were differentially regulated, with among them a pronounced upregulation of anti-apoptotic genes BIRC3 and TNFRSF10D (Figure 7C).

3.7 Specific ex vivo stimulation of neutrophils results in stimulus specific gene expression patterns The induction of the increasingly upregulated genes and TNF/anti-apoptosis pathways could be initiated by multiple ligands. To investigate ligand specific gene expression regulation detected in the experimental endotoxemia model, freshly isolated neutrophils were stimulated in vitro with LPS, rTNFα, rG-CSF and rGM-CSF. Figure 8 shows neutrophil gene expression at 2 hours after stimulation. The neutrophil specific protein CD177 was mainly induced by G-CSF (15 fold increase) and TNFα induced TNFAIP3 expression (12 fold increase) as well as NFKBIA (8 fold increase). Other genes investigated were not mediated by these specific ligands. In addition, the ex vivo survival of neutrophils was assessed. Neutrophils incubated with LPS and G-CSF for 7 hours showed 10% more viable cells compared with RPMI control, and GM-CSF stimulated cells (Figure 9). TNFα stimulated neutrophils exhibited 10% lower viable cells compared with RPMI. These results confirm that the activation, reflected by gene-expression patterns of circulating neutrophils, can be partially induced by the early secreted cytokines, growth factors, and direct stimulation with LPS.

Figure 8 - Gene expression in in vitro stimulated neutrophils. 6

Cells were stimulated with 10 ng LPS, 10 ng rTNFα, 50 ng rG-CSF or 50 ng rGM-CSF. At t=2h after stimulation RNA was isolated and q-pcr was performed with taqman probes. Fold change relative to unstimulated. Error bars represent SEM (N=4)

147 Figure 9 - In vitro neutrophil survival after stimulation with 10 ng LPS, 10 ng rTNFα, 50 ng rG-CSF or 50 ng rGM-CSF.

Cell viability was determined in ≥ 1x105 cells with Annexin V apoptosis detection kit at 7 hours after stimulation. Dots represent the percentage of viable (Annexin V negative and 7 AAD negative) cells. N=4 *p<0.05.

4. Discussion

In our study, we characterized the transcriptome profiles of circulating neutrophils, in the human experimental endotoxemia model. We showed an abundant upregulation of inflammatory pathways and regulation of apoptosis pathways involving G-CSF, TNFα IL-1α and IL-1β. Microarray analysis revealed direct neutrophil responses to a diverse pool of activators present in high concentrations in circulation after LPS infusion. Clear distinction can be made between early (2 hours after LPS) and late (4 hours and 6 hours after LPS) response transcriptome profiles with highest activation at 2 hours after LPS. In contrast, Calvano and colleagues showed a time dependent activation of inflammatory pathways with peak activation at 4 hours in whole blood10. Interestingly, the profile at 2 hours is highly similar with our current findings as reflected by e.g. early elevated expression of IL1A, IL1B, TNF, TNFAIP3, PTX3 and different chemokines. This suggests that the reported change in whole blood inflammatory gene transcription at 2 hours after LPS administration can be mainly attributed to circulating neutrophils. Furthermore, others have shown previously in this model of systemic inflammation that after LPS administration, a population of CD16dim neutrophils with a banded nuclear morphology appears in blood circulation, most likely released from the bone marrow8. This

148 Transcriptome kinetics of circulating neutrophils during human experimental endotoxemia

population comprises up to 40% of the total amount of circulating neutrophils at 6 hours after LPS administration. In this respect, it is very likely that changes in gene expression at 4 hours and 6 hours after LPS are partially due to an increase of these young CD16dim neutrophils in the circulation. The novel analysis method based on physical protein interaction, enabled us to identify networks of upregulated genes linked with inflammatory activation and apoptosis of neutrophils. By restricting to the protein interaction network with high cohesiveness, we identified potential relevant protein interactions for neutrophils in early inflammatory response. The current analysis revealed a network of eleven different protein interactions that may account for the inflammatory regulation of circulating neutrophils. Since multiple upregulated genes such as Matrix Metalloproteinase 9 (MMP9), Lipocalin 2(LCN2), Hepatocyte growth factor (HGF) and Haptoglobin (HP), are known as granule proteins, the network of upregulated genes we found, is suggesting a change in granule content following activation by pro-inflammatory factors. Interestingly, the expression of these proteins is increased in bone marrow neutrophils and their precursors20-22 . Since HGF and HP are poorly upregulated in expression by specific ex vivo stimulation (Figure 9), it is plausible to ascribe the observed increases in expression to the release of bone marrow neutrophils into the circulation. This is supported by our findings that upregulation of CD177 is even higher at 4 hours and 6 hours after LPS compared with 2 hours after infusion. In contrast with this, we show that G-CSF induced upregulation of CD177 expression ex-vivo. This is confirmed by Passamonti and colleagues, who showed that neutrophils isolated from the bone marrow, or from peripheral blood following G-CSF administration showed both increased CD177 expression compared with circulating granulocytes on steady state23. Taken together, this implies that direct stimulation of circulating neutrophils and influx of bone marrow neutrophils both contribute to changes in gene expression kinetics. During systemic inflammation, the pool of circulating neutrophils is maintained by stimulation of pro-inflammatory pathways (TNF, LPS) and pro-survival routes (G-CSF, GM-CSF). High concentrations of circulating cytokines and growth factors can cause this neutrophilia by induction of neutrophil release from the bone marrow24, 25. In addition to the release from the bone marrow, delayed neutrophil apoptosis may also contribute to high circulating neutrophil numbers. During sepsis, a delayed apoptosis can result in neutrophilia and subsequent tissue damage26. It has been demonstrated already that under steady state conditions, in vivo neutrophil lifespan is about 5.4 days27. We showed under early inflammatory conditions that apoptosis pathways are regulated at the gene expression level. Gene expression profiles indicated delayed neutrophil 6 apoptosis by either early induced expression of anti-apoptotic routes like the TNFα pathway but also G-CSF driven inhibition of pro-apoptotic routes via Calpain and XAF128. On the other hand, ex vivo stimulation of neutrophils with G-CSF, did reduce apoptosis but stimulation with TNFα even increased apoptotic cell death. This implies that

149 although the TNFα survival pathway was upregulated, TNFα has no anti-apoptotic effect early after stimulation. As described recently, IL-1β inhibits IKKβ-NF-κB and thereby contributes to neutrophilia29. The protein interaction network of upregulated genes after LPS infusion suggests increased IL-1 signaling although circulating IL-1β was not detectable. Since IL-1α expression was strongly increased (6 fold) in neutrophils at 2 hours after LPS infusion it may indicate an important role of IL-1α in the observed neutrophilia and activated IL-1 signaling pathway. This supports our hypothesis that in addition to neutrophils infiltrating from the bone marrow, part of the neutrophilia is caused by cytokine and growth factor that increase survival of circulating neutrophils. In neutrophils, the role of IL-1α could be superior in comparison to that of TNFα. The contribution of IL-1 in the mediation of survival in neutrophilia should therefore be further investigated. We hereby directly demonstrate early pro-inflammatory activation of neutrophils reflected by induced pathway expression. Since LPS is cleared from blood circulation very rapidly, high numbers of circulating inflammatory cytokines, released by, e.g. tissue macrophages13 and endothelium, are likely to account for the major part of this activation of neutrophils. TLR pathways are induced in macrophages and endothelial cells directly after infusion of LPS30, 31. We found no persistent upregulation of these pathways in circulating neutrophils. We did, however, show that NFκB family genes were induced and TNF signaling routes were activated. Interestingly, LPS was able to slightly increase neutrophil survival ex vivo. Since neutrophils do express TLR432 , it is plausible that LPS can activate these pathways, however this activation is not reflected at the gene expression level. This is the first full genome gene transcription analysis that has been performed in circulating neutrophils during human experimental endotoxemia. We herewith include the complex environment of circulation after systemic activation. By combining this analysis with specific ex vivo neutrophil stimulation we underlined the complexity of neutrophil kinetics during early systemic inflammation. We showed that gene transcription profiles of inflammatory activated neutrophils reflects extended survival and regulation of inflammatory responses. These changes in neutrophil transcriptome are most likely due to a combination of early activation of circulating neutrophils and a mobilization of young neutrophils from the bone marrow.

Authors’ contributions SdK participated in the design of the study, performed the microarray analysis and additional q-PCR and survival assay and drafted the manuscript. MK contributed to the endotoxemia model and drafted the manuscript. IS performed the bioinformatics analysis of the data. JP, MH and JvdH revised the manuscript critically for important intellectual content. GF participated in the design of the study and revised the

150 Transcriptome kinetics of circulating neutrophils during human experimental endotoxemia

manuscript critically. PH and PP have given final approval of the final version to be published, participated in the design and coordination of the study and revised the manuscript critically. All authors read and approved the final manuscript.

Acknowledgments SdK is supported by ZonMW, The Netherlands. Grant No. 85100003. The authors thank Elles Simonetti and Angela van Diepen for sample preparations and blood cell isolations. IS is supported by the VIRGO consortium, an Innovative Cluster approved by the Netherlands Genomics Initiative and partially funded by the Dutch Government (BSIK 03012), The Netherlands.

6

151 References

1. Martin, G.S., Mannino, D.M., Eaton, S. & Moss, M. The epidemiology of sepsis in the United States from 1979 through 2000. N Engl J Med 348, 1546-54 (2003). 2. Levy, M.M. et al. 2001 SCCM/ESICM/ACCP/ATS/SIS International Sepsis Definitions Conference. Crit Care Med 31, 1250-6 (2003). 3. Medzhitov, R. & Janeway, C.A., Jr. Decoding the patterns of self and nonself by the innate immune system. Science 296, 298-300 (2002). 4. Covert, M.W., Leung, T.H., Gaston, J.E. & Baltimore, D. Achieving stability of lipopolysaccharide-induced NF-kappaB activation. Science 309, 1854-7 (2005). 5. van der Meer, W., Pickkers, P., Scott, C.S., van der Hoeven, J.G. & Gunnewiek, J.K. Hematological indices, inflammatory markers and neutrophil CD64 expression: comparative trends during experimental human endotoxemia. J Endotoxin Res 13, 94-100 (2007). 6. van Gisbergen, K.P., Sanchez-Hernandez, M., Geijtenbeek, T.B. & van Kooyk, Y. Neutrophils mediate immune modulation of dendritic cells through glycosylation-dependent interactions between Mac-1 and DC-SIGN. J Exp Med 201, 1281-92 (2005). 7. Soehnlein, O., Weber, C. & Lindbom, L. Neutrophil granule proteins tune monocytic cell function. Trends Immunol 30, 538-46 (2009). 8. Pillay, J. et al. Functional heterogeneity and differential priming of circulating neutrophils in human experimental endotoxemia. J Leukoc Biol 88, 211-20 (2010). 9. Lowry, S.F. Human endotoxemia: a model for mechanistic insight and therapeutic targeting. Shock 24 Suppl 1, 94-100 (2005). 10. Calvano, S.E. et al. A network-based analysis of systemic inflammation in humans. Nature 437, 1032-7 (2005). 11. Wong, H.R. et al. Genomic expression profiling across the pediatric systemic inflammatory response syndrome, sepsis, and septic shock spectrum. Crit Care Med 37, 1558-66 (2009). 12. Wong, H.R., Freishtat, R.J., Monaco, M., Odoms, K. & Shanley, T.P. Leukocyte subset-derived genomewide expression profiles in pediatric septic shock. Pediatr Crit Care Med 11, 349-55 (2010). 13. Kox, M. et al. Differential ex vivo and in vivo endotoxin tolerance kinetics following human endotoxemia. Crit Care Med (2011). 14. Maere, S., Heymans, K. & Kuiper, M. BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics 21, 3448-9 (2005). 15. Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13, 2498-504 (2003). 16. Peri, S. et al. Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res 13, 2363-71 (2003). 17. Sama, I.E. & Huynen, M.A. Measuring the physical cohesiveness of proteins using physical interaction enrichment. Bioinformatics 26, 2737-43 (2010). 18. Hirschfeld, M., Ma, Y., Weis, J.H., Vogel, S.N. & Weis, J.J. Cutting edge: repurification of lipopolysaccharide eliminates signaling through both human and murine toll-like receptor 2. J Immunol 165, 618-22 (2000). 19. Keel, M. et al. Interleukin-10 counterregulates proinflammatory cytokine-induced inhibition of neutrophil apoptosis during severe sepsis. Blood 90, 3356-63 (1997). 20. Grenier, A. et al. Presence of a mobilizable intracellular pool of hepatocyte growth factor in human poly- morphonuclear neutrophils. Blood 99, 2997-3004 (2002). 21. Theilgaard-Monch, K. et al. Haptoglobin is synthesized during granulocyte differentiation, stored in specific granules, and released by neutrophils in response to activation. Blood 108, 353-61 (2006). 22. Cowland, J.B. & Borregaard, N. The individual regulation of granule protein mRNA levels during neutrophil maturation explains the heterogeneity of neutrophil granules. J Leukoc Biol 66, 989-95 (1999). 23. Passamonti, F. et al. Clinical significance of neutrophil CD177 mRNA expression in Ph-negative chronic my- eloproliferative disorders. Br J Haematol 126, 650-6 (2004). 24. Martin, C. et al. Chemokines acting via CXCR2 and CXCR4 control the release of neutrophils from the bone marrow and their return following senescence. Immunity 19, 583-93 (2003).

152 Transcriptome kinetics of circulating neutrophils during human experimental endotoxemia

25. Zhang, P. et al. The granulopoietic cytokine response and enhancement of granulopoiesis in mice during endotoxemia. Shock 23, 344-52 (2005). 26. Paunel-Gorgulu, A., Flohe, S., Scholz, M., Windolf, J. & Logters, T. Increased serum soluble Fas after major trauma is associated with delayed neutrophil apoptosis and development of sepsis. Crit Care 15, R20 (2011). 27. Pillay, J. et al. In vivo labeling with 2H2O reveals a human neutrophil lifespan of 5.4 days. Blood 116, 625-7 (2010). 28. van Raam, B.J., Drewniak, A., Groenewold, V., van den Berg, T.K. & Kuijpers, T.W. Granulocyte colony-stimulat- ing factor delays neutrophil apoptosis by inhibition of calpains upstream of caspase-3. Blood 112, 2046-54 (2008). 29. Hsu, L.C. et al. IL-1beta-driven neutrophilia preserves antibacterial defense in the absence of the kinase IKKbeta. Nat Immunol 12, 144-50 (2011). 30. Zeuke, S., Ulmer, A.J., Kusumoto, S., Katus, H.A. & Heine, H. TLR4-mediated inflammatory activation of human coronary artery endothelial cells by LPS. Cardiovasc Res 56, 126-34 (2002). 31. Ashida, K. et al. Characterization of the expression of TLR2 (toll-like receptor 2) and TLR4 on circulating monocytes in coronary artery disease. J Atheroscler Thromb 12, 53-60 (2005). 32. Hayashi, F., Means, T.K. & Luster, A.D. Toll-like receptors stimulate human neutrophil function. Blood 102, 2660-9 (2003).

6

153

7

General Discussion

General Discussion

1. Summary of findings

Deciphering cellular responses to pathogens using genomics data is an important endeavour in bioinformatics and biomedical sciences which can potentially uncover leads to disease therapy. In this thesis, we have used bioinformatics approaches to analyse data that were obtained from innate-immune cells after infection with some pathogens that contribute to respiratory infectious diseases. Among these pathogens are viruses such as the human metapneumovirus (HMPV), the respiratory syncytial virus (RSV), the parainfluenza virus (PIV) and the measles virus (MV).

1.1 Context-specific protein-protein interaction In chapter 2, we show using 2-d gel and context-specific protein-protein interaction (PPI) network analyses that respiratory viral pathogens are capable of affecting the host-cell proteome. The proteins concerned are involved in defense against ER stress and apoptosis. This information contributes to our understanding of the biology of respiratory viral infections. In chapter 3, we describe Physical Interaction Enrichment (PIE); a novel bioinformatics method in PPI network analysis. This method was developed because we noticed that general PPI networks are enriched in certain types of proteins, e.g. immune-related proteins which have been extensively studied. To mitigate the skewing of quantitative inferences made from general PPI networks PIE employs a normalization that ensures equal node degree (edge) distribution of a test set and of the random networks (background) it is compared with. As such we can assign networks derived from proteins that are triggered to increase or decrease in quantity in infected, relative to non-infected host-cells whether they are physically cohesive (biologically coherent) or not; irrespective of their over- or under-representation in general PPI networks. PIE is a valuable tool (e.g. to the biomedical community) that can be used to quantify the extent to which genes (or their cognate proteins) are involved with each other in the same biological pathway.

1.2 Iron depletion profile In chapter 4, we describe Iron Squelch, a new bioinformatics method that is aimed at comparing genomics data obtained from two biological contexts: iron-homeostasis and infection. Iron Squelch interrogates, from a computational point of view, the crosstalk between infection and iron-homeostasis. Using Iron Squelch we observe that cells infected with intracellular (e.g. HMPV), but not extracellular (e.g. E. coli) pathogens mimic the profile of iron-depleted cells. Importantly, the iron-depletion profile was also triggered by non-live viruses and by interferon gamma, a cytokine induced by host cells upon intracellular infections. We posit based on Iron Squelch analysis that an iron-depletion profile is likely to be a cellular response because both live and non-live viruses could induce such a profile. Understanding iron redistribution strategies and

157 7 the effects on innate-immune cells during extracellular and/or intracellular pathogen infections could improve our understanding of respiratory infectious diseases.

1.3 Transcription factor binding sites in DC-expressed miRNAs In chapter 5, we identify transcription factor binding sites (TFBS) in microRNAs (miRNAs) that are over-expressed in monocyte-derived dendritic cells (DCs). DCs are professional antigen presenting cells that get mature upon infection (e.g. exposure to bacterial endotoxin, LPS) by differentiating from circulating monocytes. MicroRNAs, to recapitulate, regulate gene expression by translational repression or target mRNA degradation. Our analysis pipeline reveal miRNAs that are triggered in expression in DCs and indicate that evolutionarily conserved TFBSs cluster preferentially within 500 bp upstream the promoter region of precursor miRNAs. We also find that many mRNAs of the cognate TFs of the conserved TFBSs were expressed in the DCs. These miRNAs and their target genes may facilitate future research aimed at understanding mechanisms in DC differentiation from monocytes.

1.4 Neutrophil transcriptome In chapter 6, we examine transcriptomics data generated from neutrophils of 4 LPS-induced volunteers. Neutrophils participate in systemic inflammatory response syndrome and the development of sepsis due to bacterial infections; including those that infect the airways. We investigated the in vivo neutrophil transcriptional responses after LPS infusion. Principal component analysis of gene expression reveals a distinctive temporal pattern in which gene expression at time t=2 hours after LPS infusion, was different from time t=4 hours and t=6 hours. Using PIE, we found that PPI networks of genes affected in expression after LPS infusion were physically cohesive. Gene Ontology analysis indicates that the up-regulated genes are involved in the processes of inflammatory response while the down-regulated genes are involved in lymphocyte activation. We also observed an increase in neutrophil number and provide candidate genes to further interpretation of LPS-induced neutrophilia and the gene response kinetics at a molecular level in neutrophils.

2. Future perspectives

2.1 Bioinformatics, immunomics and hypothesis generation This thesis addresses cellular response signals post infection in innate-immune cells using genomics data. The study of immune responses using genomics, proteomics or immunomics approaches in general, can foster our understanding of immunity by raising testable hypotheses based on bioinformatics-discerned patterns in biomedical data. This means that the collaboration between wet-lab and dry-lab work is essential in

158 General Discussion

biomedical research; predictions need to be validated and validated results can be used to improve further predictions. Moreover, while we need more experimental data to improve on the power of our hypotheses, we also need specific biochemical confirmations of the results, and this is usually tedious and not high-throughput. The challenge therefore lies in data integration for option-reduction so as to provide the biochemist or immunologist with a doable number of confirmatory experiments. We herein examine this promise in the realm of innate immune responses to pathogens; and we also look at the latest and/or upcoming relevant technologies.

2.1.1 Pathway exploration using PPI networks In chapter 3, we presented PIE which could be used to measure the physical coherence of a set of genes. In that chapter, we also showed that a good measure of coherence of a set of genes correlates positively with the prediction of more genes in downstream pathways following the same stimulant (e.g. HMPV infection). This finding can be applied to the following archetype of hypotheses: “In my study of gene responses to RSV infection, I already have a physically cohesive PPI network of genes that were up-regulated after 3 hours of infection. However, without having to perform further large-scale experiments, what other genes would likely be induced later in infection?” A physically cohesive propagated-network would give us a confident list of the new genes that would be likely induced later in the infection.

2.1.2 Optimizing protein-protein Interaction networks General PPI networks for human proteins will likely become more comprehensive to enable a complete proteome-level analysis using methods like PIE. This can be achieved by means of additional independent datasets, e.g. the subcellular localization of proteins (i.e. the “subcellulome” data if much is known), their various isoforms (the “isoformome”, if much is known) and their tissue specificity in the organism concerned. If these “omics” data are complete or near completion for humans, it would undoubtedly help in the characterization of functional protein interactions that are important in cellular immune response against specific infections. This might even lead to the construction of a reference immune-response chip of context-specific PPI networks that might be used to diagnose, and troubleshoot a patient’s response to a particular infection. To this end, protein chips have already been invented. For example the NAPPA (nucleic acid programmable protein array) approach which produces proteins just-in-time for functional assays1. Self-assembling protein microarrays have been used to detect humoral immunity to known and novel self-antigens, cancer antigens, autoimmune antigens, as well as pathogen-derived antigens in the immunome 2. Other examples include the use of protein chips to differentiate antibodies induced by various avian viruses 3, and the identification of several antibody-binding antigens from the outer membrane proteins of Pseudomonas aeruginosa 4. While the above examples are mainly

159 7 applicable in diagnosing disease agents, a reference context-specific-PPI network on a chip would help unravel the subnetworks of different respondents to the infection in question. For example, which interacting proteins can be used to explain why young adults are not as vulnerable to HMPV infection as do infants?

2.1.3 Co-infections and cellular iron homeostasis In chapter 4, we showed that intracellular respiratory tract pathogens induce an iron-depletion gene expression profile in cells; an effect that is likely to be cellular because it was promoted by both live and dead viruses. Furthermore, this effect was not promoted by extracellular pathogens; suggesting an iron-restriction behavior from the niches of the pathogens. We also highlighted that Iron restriction from pathogens is interesting in the interplay between co-infections and disease severity in its host because certain respiratory co-infections are rare, and where they occur, the disease caused is less severe than during single-infections5-8 . Understanding the role of iron- homeostasis in such instances could help unravel pathogenesis during respiratory tract (co)-infections in particular and infections in general. We suggest that these observations should be experimentally investigated by measuring cytoplasmic iron concentrations in infected cells.

2.1.4 MicroRNAs, LPS and dendritic cells In chapter 5 we investigated miRNA and their targets that are involved in DC differentia- tion from monocytes using LPS in addition to other factors. We found miRNAs like hsa-miR-34a and its target JAG1 that has been independently observed by others to be crucial in promoting DC differentiation from monocytes. In other words, repression of hsa-miR-34a or addition of its target, JAG1, stalls DC differentiation from monocytes 9. This positive example makes the rest of our predictions interestingly open for further investigations.

2.2 The interface of host-pathogen interactions: a further call for bioinformatics In this thesis we have examined host-cell responses to infections mainly from the perspective of the host-cell and generated hypotheses for wet-lab confirmation. Nonetheless, past, current and further work examining the impact (during host invasion) on a pathogens’ genome, the interactions between host-proteins with pathogen proteins, host RNA with pathogen RNA or DNA-protein interactions, would be essential in advancing our understanding of host-pathogen interactions. Several high-through- put technologies are being developed to advance this area, and bioinformatics is indispensable to unravel the huge amount of data generated.

160 General Discussion

2.2.1 MicroRNAs Host miRNAs have been implicated in antiviral defense when they bind viral mRNAs leading to its degradation and eventual stalling of protein translation. For instance, miR-323, miR-491, and miR-654 have recently been shown to inhibit replication of the H1N1 influenza A virus through binding to the PB1 gene of the virus leading to down- regulation of PB1 expression through mRNA degradation10. Identifying host miRNAs and their targets in the genome of the respiratory tract pathogens examined here could be useful in improving our knowledge on their mechanisms of pathogenesis. Aside from host miRNAs, viral miRNAs have also been reported to offset host-cell immune defenses11. For example, down-modulation of LMP2A (the Epstein-Barr virus latent membrane protein 2A) expression by the viral miRNA, miR-BART22, has been suggested to permit escape of Epstein-Barr-infected cells from host immune surveillance12. Furthermore, virally encoded miRNAs have been shown to target host genes and subvert antiviral immune responses. For example Stern-Ginossar and colleagues bioinformatically predicted targets for cytomegalovirus miRNAs, resulting in the identification of the human MICB (the major histocompatibility complex class I-related chain B) gene as a top candidate target of the viral miRNA, hcmv-miR-UL112. MICB is a stress-induced ligand of the natural killer (NK) cell activating receptor, NKG2D, and is critical for the NK cell killing of virus-infected cells and tumor cells13. Stern-Ginossar and colleagues showed that hcmv-miR-UL112 specifically down-regulates MICB expression during viral infection, leading to decreased binding of NKG2D and reduced killing by NK cells; thus revealing a miRNA-based immunoevasion mechanism that appears to be exploited by human cytomegalovirus14 . MiRNA data for the pathogens examined in this thesis are not available for similar investigations. However, this could be pioneered by bioinformatics predictions of the pathogen miRNAs, and subsequently followed by predictions and experimental validation of the target genes of the host. As mentioned above, viral miRNAs can regulate both cellular and viral gene expression. However, target gene identification has been a difficult and cumbersome task. Some methodologies have been developed to overhaul this problem. For example, RIP-Chip (the immunoprecipitation of Argonaute-protein containing RNA-induced silencing complexes (RISC) followed by microarray analysis), is a technology that enables the identification of miRNA-targetomes at whole transcriptome level 15, 16. While RIP-Chip yields a plethora of high-confidence miRNA targets and provides a quantitative estimate of miRNA function, additional bioinformatic analyses are required to identify individual miRNA binding sites. These types of data are lacking for the respiratory tract pathogens examined in this thesis, yet they could help to unravel the networks of regulation exerted by both pathogen and cellular miRNAs, thereby providing the basis for functional studies on miRNA-mediated regulation of gene expression in respiratory tract infections.

161 7 2.2.2 Pathogen proteins Apart of subverting host immune responses via miRNA related mechanisms, pathogens may also use their proteins to counteract host immunity. For example, in DC biology, infection of immature DCs with HMPV, RSV or PIV3 has been shown to yield low to moderate levels of DC maturation for all three viruses17. Meanwhile, the immature DCs were responsive to a secondary treatment with LPS; indicating that their intrinsic ability to mature was not impaired. A more recent study using HMPV, demonstrates that the LPS-induced effects can be dampened by the G-protein of the virus18. It was found that TLR4 (a canonical LPS ligand) plays a major role in HMPV-induced activation of mono- cyte-derived DCs, as down-regulation of its expression by small interfering RNA significantly blocked HMPV-induced chemokine and type I IFN expression 18. The latter authors also assert that the G-protein of the virus affects TLR4-dependent signaling. This assertion was based on the observation that infection of DCs with recombinant HMPV lacking G-protein inhibited LPS-induced production of cytokine and chemokines significantly less than did wild-type virus. Furthermore, treatment of DCs with the purified viral G-protein resulted in a similar inhibition of LPS-dependent signaling. This recapitulates the question of consequences on host immune responses when a patho- gen-protein interacts with a host-protein. Fortunately, protein microarrays that contain the protein repertoire of a pathogen are being developed to enable identification of host proteins that interact with those of the pathogen. For example, using a protein microarray, Margarit and colleagues identified known and novel interactions between surface proteins of two pathogenenic bacteria (Streptococcus pyogenes and Streptococcus agalactiae), with three human ligands (fibronectin, fibrinogen, and C4 binding protein) that are known to play an important role in streptococcal pathogenesis 19. This technology, together with follow-up bioinformatics analyses, could lead to the identification of host-pathogen protein that might abate immune responses. Similar analysis could be performed for the respiratory tract pathogens examined in this thesis to yield undoubtedly useful insights as to their mechanisms of pathogenesis.

2.2.3 Distorted host protein phosphorylation by pathogens Pathogen proteins have also been observed to distort post translational modifications like the phosphorylation of proteins that are important in host immune response. Protein phosphorylation is important to trap proteins in subcellular locations and is ubiquitous in intracellular signaling transduction cascades wherein proteins get activated upon phosphorylation. Viral proteins have been observed to distort host protein phosphorylation, resulting in improper localization and eventual abatement of defense. For example, the non-structural proteins (NS1 and NS2) of RSV have been associated with degradation of STAT2 in epithelial cells and subvert host interferon alpha and beta responses 20-22. In a more recent study Jie and colleagues showed that RSV infection increases the total cellular levels of STAT1 and STAT2 in DCs, but impaired

162 General Discussion

the IFN-beta-dependent phosphorylation and nuclear localization of the STAT proteins. Interestingly, these inhibitory effects of RSV on STAT1 and STAT2 phosphorylation and translocation were abolished by UV inactivation23 ; indicating that it is not a basic cellular response against the virus but rather a virus-imposed effect against its host. These examples demonstrate the complexity of elucidating innate immune responses. Not- withstanding, recent advancement in protein microarray technologies have been made to enable deciphering of the phosphoproteome of cells24-26, including virus-infected host cells27 , possible. For example, using a high-throughput reverse-phase phospho- proteome analysis of airway epithelial cells infected with Rift valley fever virus, Popova and colleagues observed that the virus infection down-regulated phosphorylation of a major anti-apoptotic regulator of survival pathways, AKT, along with phosphorylation of FOX 01/03, which controls cell cycle arrest downstream from AKT. Consistent with this, the level of apoptosis inhibitor XIAP was also decreased27. These observations demonstrate the use of this technology in unraveling intracellular pathways and also highlight phosphoproteins as potential points of pharmacological interventions. Moreover, the bioinformatics community is already designing tools to facilitate analysis of such data. For example, ProteoConnections28 is a platform recently designed to enable integration of quantitative proteomics data to determine the changes in protein phosphorylation under different cell stimulation conditions or kinase inhibitors.

2.2.4 Pathogen-host PPI networks In addition to the above mentioned examples, integration of a PPI network of an infectious disease agent with that of its host and the known host-pathogen interacting proteins could help interpret events at the interface between a pathogen and its host. For example, glucocorticoids - a family of anti-inflammatory drugs - have been shown to be ineffective in the treatment of RSV-induced bronchiolitis and wheezing. This situation has been ascribed to the repression of glucocorticoid receptor (GR)-mediated gene activation by the live virus; unlike the uv-irradiated virus 29. As recently shown by Hinzey and colleagues, RSV infection inhibited dexamethasone induction of three GR-regulated genes (glucocorticoid-inducible leucine zipper, FK506 binding protein, and MAPK phosphatase 1) in epithelial cells. However, RSV infection did not alter GR protein levels or GR nuclear translocation, but does reduce GR binding to the promoters of the glucocorticoid responsive genes analyzed 29. Identifying the mechanisms through which this suppression occurs (competitive transcription factor binding or perturbed PPI?) is a question to which a solution may lead to the development of novel therapeutics. A PPI network that is comprehensive and optimized to RSV and its host may help find the interaction, if present, that is causative to the ineffectiveness of glucocorticoids against RSV-induced bronchiolitis.

163 7 3. Conclusion

The constant collaboration between bioinformatics and immunology is essential to our understanding of disease pathogenesis. High-throughput technologies aimed at characterizing the interactome, phosphoproteome, subcellulome, miRNA-targetome, transcriptome, and proteome of a host and/or its pathogens would continue to provide data whose integration can help us understand the cellular mechanisms underpinning infectious disease in greater detail than is currently feasible. As such bioinformatics- based hypotheses that can be readily tested using biochemistry could be generated from large scale datasets to ascertain the inferences we make on cellular processes. These approaches can be applied not only to understand respiratory infectious diseases, but all diseases, including genetic disorders.

164 General Discussion

References

1. Ramachandran, N., Hainsworth, E., Demirkan, G. & LaBaer, J. On-chip protein synthesis for making microarrays. Methods Mol Biol 328, 1-14 (2006). 2. Ramachandran, N. et al. Tracking humoral responses using self assembling protein microarrays. Proteomics Clin Appl 2, 1518-27 (2008). 3. Wang, X. et al. A protein chip designed to differentiate visually antibodies in which were infected by four different viruses. J Virol Methods 167, 119-24 (2010). 4. Montor, W.R. et al. Genome-wide study of Pseudomonas aeruginosa outer membrane protein immunogenicity using self-assembling protein microarrays. Infect Immun 77, 4877-86 (2009). 5. Falchi, A. et al. Dual infections by influenza A/H3N2 and B viruses and by influenza A/H3N2 and A/H1N1 viruses during winter 2007, Corsica Island, France. J Clin Virol 41, 148-51 (2008). 6. Bosis, S. et al. Impact of human metapneumovirus in childhood: comparison with respiratory syncytial virus and influenza viruses. J Med Virol 75, 101-4 (2005). 7. Kannangara, S., DeSimone, J.A. & Pomerantz, R.J. Attenuation of HIV-1 infection by other microbial agents. J Infect Dis 192, 1003-9 (2005). 8. Brouard, J. et al. [Viral co-infections in immunocompetent infants with bronchiolitis: prospective epidemiologic study]. Arch Pediatr 7 Suppl 3, 531s-535s (2000). 9. Hashimi, S.T. et al. MicroRNA profiling identifies miR-34a and miR-21 and their target genes JAG1 and WNT1 in the coordinate regulation of dendritic cell differentiation. Blood 114, 404-14 (2009). 10. Song, L., Liu, H., Gao, S., Jiang, W. & Huang, W. Cellular microRNAs inhibit replication of the H1N1 influenza A virus in infected cells. J Virol 84, 8849-60 (2010). 11. Sonkoly, E., Stahle, M. & Pivarcsi, A. MicroRNAs and immunity: novel players in the regulation of normal immune function and inflammation. Semin Cancer Biol 18, 131-40 (2008). 12. Lung, R.W. et al. Modulation of LMP2A expression by a newly identified Epstein-Barr virus-encoded microRNA miR-BART22. Neoplasia 11, 1174-84 (2009). 13. Groh, V. et al. Costimulation of CD8alphabeta T cells by NKG2D via engagement by MIC induced on virus-infected cells. Nat Immunol 2, 255-60 (2001). 14. Stern-Ginossar, N. et al. Host immune system gene targeting by a viral miRNA. Science 317, 376-81 (2007). 15. Dolken, L. et al. Systematic analysis of viral and cellular microRNA targets in cells latently infected with human gamma-herpesviruses by RISC immunoprecipitation assay. Cell Host Microbe 7, 324-34 (2010). 16. Malterer, G., Dolken, L. & Haas, J. The miRNA-targetome of KSHV and EBV in human B-cells. RNA Biol 8 (2011). 17. Le Nouen, C. et al. Infection and maturation of monocyte-derived human dendritic cells by human respiratory syncytial virus, human metapneumovirus, and human parainfluenza virus type 3. Virology 385, 169-82 (2009). 18. Kolli, D. et al. Human Metapneumovirus Glycoprotein G Inhibits TLR4-Dependent Signaling in Monocyte- Derived Dendritic Cells. J Immunol (2011). 19. Margarit, I. et al. Capturing host-pathogen interactions by protein microarrays: identification of novel streptococcal proteins binding to human fibronectin, fibrinogen, and C4BP. FASEB J 23, 3100-12 (2009). 20. Lo, M.S., Brazas, R.M. & Holtzman, M.J. Respiratory syncytial virus nonstructural proteins NS1 and NS2 mediate inhibition of Stat2 expression and alpha/beta interferon responsiveness. J Virol 79, 9315-9 (2005). 21. Elliott, J. et al. Respiratory syncytial virus NS1 protein degrades STAT2 by using the Elongin-Cullin E3 ligase. J Virol 81, 3428-36 (2007). 22. Swedan, S., Musiyenko, A. & Barik, S. Respiratory syncytial virus nonstructural proteins decrease levels of multiple members of the cellular interferon pathways. J Virol 83, 9682-93 (2009). 23. Jie, Z., Dinwiddie, D.L., Senft, A.P. & Harrod, K.S. Regulation of STAT signaling in mouse bone marrow derived dendritic cells by respiratory syncytial virus. Virus Res 156, 127-33 (2011). 24. Chandra, H., Reddy, P.J. & Srivastava, S. Protein microarrays and novel detection platforms. Expert Rev Proteomics 8, 61-79 (2011). 25. Gallagher, R.I., Silvestri, A., Petricoin, E.F., 3rd, Liotta, L.A. & Espina, V. Reverse phase protein microarrays: fluorometric and colorimetric detection. Methods Mol Biol 723, 275-301 (2011).

165 7 26. Lu, Y. et al. Kinome siRNA-phosphoproteomic screen identifies networks regulating AKT signaling. Oncogene (2011). 27. Popova, T.G. et al. Reverse-phase phosphoproteome analysis of signaling pathways induced by Rift valley fever virus in human small airway epithelial cells. PLoS One 5, e13805 (2010). 28. Courcelles, M., Lemieux, S., Voisin, L., Meloche, S. & Thibault, P. ProteoConnections: A bioinformatics platform to facilitate proteome and phosphoproteome analyses. Proteomics (2011). 29. Hinzey, A. et al. Respiratory syncytial virus represses glucocorticoid receptor-mediated gene activation. Endocrinology 152, 483-94 (2011).

166 General Discussion

167 7

Summary Samenvatting Acknowledgement Appendix 1 Appendix 2 Appendix 3 Curriculum vitae (English) Curriculum vitae (Nederlands) Publications

Summary

Summary

This thesis presents and describes results from bioinformatics analyses of transcrip- tomics data that were generated from innate-immune cells after infection with some pathogens (or inactive pathogen material) that contribute to respiratory infectious diseases. Among these pathogens are viruses such as the human metapneumovirus (HMPV), the respiratory syncytial virus (RSV), the parainfluenza virus (PIV) and the measles virus (MV). These viruses belong to the family paramyxoviridae. Worldwide, these respiratory tract viruses cause morbidity and contribute to mortality particularly in infants, the elderly, and in patients having a repressed immune system. Conceptually, the thesis is divided into a method development section (chapters 3, 4 and 5) and an application section (chapters 2 and 6). Chapter 1 is the General Introduction. Chapter 7 is the General Discussion.

In chapter 2, we use the concept of context-specific protein-protein interaction (PPI) networks. A PPI network is a graphical representation in which a node represents a protein and the edges linking a node to other nodes represent a physical interaction. As such, a given node in such a network has at least one interaction partner. PPI networks are important because in biological systems, many processes are carried out by means of proteins interacting with each other. A context-specific PPI network, as the name implies, is a PPI network in which interacting proteins or their cognate genes were found to be regulated in a given context, e.g. due to viral infection. We applied this concept to our investigations which were aimed at gaining detailed insight into the molecular pathogenesis of respiratory virus infections in infected lung epithelial cells; A549 cells. These cells were exposed to a set of live and uv-irradiated respiratory viruses: RSV, HMPV, PIV or MV. Cells were harvested at different time points after infection and processed for proteomics analysis by 2-dimensional difference gel electrophoresis (2-d gel). Samples derived from infected cells were compared to mock-infected cells to identify proteins that are differentially expressed due to infection. By 2-d gel and PPI network analyses we show that RSV, HMPV, PIV3, and MV induced similar host-cell responses and that mainly proteins involved in defense against ER stress and apoptosis are affected. This information contributes to our understanding of the biology of respiratory viral infections.

In chapter 3, we describe a new bioinformatics method in PPI network analysis called Physical Interaction Enrichment (PIE). In that chapter, we start by showing that in general, PPI networks have an enrichment bias for proteins that are related to both genetic morbid diseases and to infectious diseases. This situation could skew quantitative inferences made from PPI networks that are derived from general PPI networks; thus necessitating proper randomization to ascertain the reliability of context-specific networks.

171 PIE employs a normalization that ensures equal node degree (edge) distribution of a test set and of the random networks (background) it is compared with. It measures whether proteins of a given set of genes have more interactions between themselves than proteins in random networks, and can therewith be regarded as physically cohesive. We applied PIE to a couple of immune-related datasets. Firstly, we discovered that the PPI network of genetic morbid disease genes was not physically cohesive, despite their enrichment in general PPI networks. Secondly, we observed that genes that are up-regulated in expression in alveolar epithelial cells after HMPV infection are physically cohesive. The PPI network of the down-regulated genes was also physically cohesive. Furthermore, we observe physical cohesiveness for genes induced upon interferon gamma treatment, and for uv-irradiated RSV or Pseudomonas; demonstrating the ability of PIE to measure physical cohesiveness in various circumstances. Physical cohesiveness of the up-regulated genes after infection with uv-irradiated pathogens is particularly interesting, as this cannot be the result of the direct interference of the pathogen with the host cells’ gene regulation, but points to the innate cellular program that appears to be triggered by the infection. We further demonstrate another useful aspect of PIE that entails prediction of genes in pathways downstream infection response. Using HMPV-induced genes, we observed a positive correlation between the PIE score of the PPI network of these genes and the sensitivity of prediction of genes downstream HMPV-induced pathways. This positive correlation indicates that PIE is a valuable tool to quantify the extent to which genes are involved with each other in the same biological pathway.

In chapter 4, we describe another new bioinformatics method called Iron Squelch that is aimed at interpreting genomics data in relation to iron-homeostasis and infection. Iron is essential for the metabolism of microbes and their hosts. During infection, pathogens compete for iron with their host cells. It is however, unclear whether host innate-immune cells could redistribute iron for their (the immune cells) own functionality as immune effectors while keeping the iron pools from being used by the invading pathogens. Our novel computational methodology compares the transcriptomics profiles of infected cells with those of an infection-free iron experiment. Precisely, Iron Squelch measures the total extent of gene expression fold change of iron-response genes under conditions of iron-overload, iron-depletion and infection. It further assesses, using the so-called Iron Squelch Indexes, whether the transcription profile (of the same set of genes) during infection mimics that of iron-depletion or iron-overload or none. We applied Iron Squelch to microarray data from cells that were challenged with the following: HMPV, PIV, RSV, MV, LPS, Chlamydia pneumoniae, Escherichia coli, or Pseudomonas aeruginosa. We observe that cells infected with the intracellular (i.e. C. pneumoniae and the viruses), but not extracellular (E. coli, P. aeruginosa and LPS) pathogens mimic the

172 Summary

profile of iron-depleted cells. Importantly, the iron-depletion profile was also triggered by inactivated viruses (uv-irradiated HMPV, RSV, PIV and MV) and by interferon gamma, a cytokine induced by host cells upon intracellular infections. The iron-depletion profile observed is likely to be a cellular response because the inactivated viruses could also induce such a profile. In addition to uncovering the infection-induced iron restriction profiles of innate immune cells, our observations also open an avenue to investigate iron-dependent synergy or mitigation of disease during (co)-infections. Understanding iron redistribution strategies and effects on the hosts during extracellular and/or intracellular pathogen infections could improve our understanding of respiratory infectious diseases.

In chapter 5, we develop and apply an analysis pipeline to uncover transcription factor binding sites (TFBS) in microRNAs (miRNAs) that are over-expressed in monocyte-de- rived dendritic cells (DCs). DCs are professional antigen presenting cells and are important in circumventing both viral and bacterial infections; respiratory infections inclusive. They are triggered upon infection (e.g. exposure to bacterial endotoxin, LPS) to differentiate from circulating monocytes. MicroRNAs, meanwhile, play a fundamental role in the regulation of gene expression by translational repression or target mRNA degradation. Despite the importance of miRNAs in regulation of cellular processes, regulatory elements in miRNA promoters are less well studied. Yet, such elements may reveal a link between miRNA expression and a specific cell type. To explore this link in myeloid cells (a group of cells that includes DC precursors), miRNA expression profiles were generated from monocytes and DCs. Differences in miRNA expression among monocytes, DCs and their stimulated progeny were observed. Furthermore, putative promoter regions of miRNAs that are significantly up-regulated in DCs were screened for TFBSs based on TFBS motif matching score, the degree to which those TFBSs are over-represented in the promoters of the up-regulated miRNAs, and the extent of conservation of the TFBSs in mammals. Analysis of evolutionarily conserved TFBSs in DC promoters revealed preferential clustering of sites within 500 bp upstream of the precursor miRNAs, and that many mRNAs of cognate TFs of the conserved TFBSs were actually expressed in the DCs. We also predicted target genes for the over-expressed miRNAs, some of which have been observed to be down-regulated in DCs during its differentiation from monocytes. Taken together, our data provide evidence that selected miRNAs expressed in DCs differentiated from monocytes using LPS as an infection model, have evolutionarily conserved TFBSs relevant to DC biology in their promoters. These miRNAs and their target genes may facilitate future research aimed at understanding mechanisms in DC differentiation from monocytes.

173 In chapter 6, we applied PIE to transcriptomics data generated from neutrophils of 4 LPS-induced volunteers. Neutrophils are white blood cells that play an important role in systemic inflammatory response syndrome and the development of sepsis following bacterial infections; including those that infect the airways. Neutrophils are essential for a proper defense against microorganisms but may also cause tissue damage. To investigate the in vivo neutrophil transcriptional responses after LPS infusion, a total of 17393 distinct genes were analyzed on microarray using samples from the infected volunteers. Principal component analysis of gene expression in time indicates a separation of time t=2 hours after LPS infusion, from time t=4 hours and t=6 hours for the 4 LPS-induced volunteers examined. From this set of genes, 2248 genes (≈13%) were differentially expressed relative to time 0h at a fold change threshold of 2. Out of this set, 233 genes were up-regulated and 307 genes were down-regulated at all 3 time points following LPS infusion. PPI networks of the former and latter gene-sets, especially the down-regulated genes, were physically cohesive. Based on Gene Ontology analysis, the up-regulated genes are involved in the processes of inflammatory response while the down-regulated genes are involved in lymphocyte activation. Further independent hematological measurements revealed neutrophilia and lymphopenia in the whole blood post LPS administration in the 4 volunteers. The congruency of the transcriptomics responses with the hematological parameters to LPS infusion are likely due to both direct activation of circulating neutrophils and mobilization of neutrophils from the bone marrow. We provide candidate genes as a resource to further interpretation of LPS-induced neutrophilia and the gene response kinetics at a molecular level in neutrophils.

In Chapter 7, we recapitulate the work described in this thesis, with a look into the future of bioinformatics-based analysis of immune response to respiratory tract pathogens. The constant collaboration between bioinformatics and immunology is essential to our understanding of disease pathogenesis. High-throughput technologies aimed at characterizing the interacteome, phosphoproteome, subcellulome, miRNA- targetome, transcriptome, and proteome of a host and/or its pathogens would continue to provide data whose integration can help us understand the cellular mechanisms underpinning infectious disease in greater detail than is currently feasible.

174 Samenvatting

Samenvatting

Dit proefschrift presenteert en beschrijft de resultaten van bioinformatica analyses van transcriptoomdata, die gegenereerd werden uit cellen van het aangeboren immuun - systeem nadat ze geïnfecteerd werden met pathogenen (of inactief pathogeenmateriaal), die bijdragen aan infectieziektes van de luchtwegen. Onder deze pathogenen zijn virussen zoals het menselijke metapneumovirus (HMPV), het luchtweg syncytiële virus (RSV), het parainfluenzavirus (PIV) en het mazelenvirus (MV). Deze virussen horen bij de familie paramyxoviridae. Wereldwijd veroorzaken ze leed en sterfte vooral bij kleine kinderen, ouderen en mensen met een verzwakt afweersysteem. Inhoudelijk is dit proefschrift onderverdeeld in twee secties, één sectie gaat over ontwikkeling van methodiek (hoofdstukken 3, 4 en 5) en één die over de toepassing ervan gaat (hoofdstukken 2 en 6). Hoofdstuk 1 is de algemene introductie. Hoofdstuk 7 is de algemene discussie.

In hoofdstuk 2, maken wij gebruik van het concept van context-specifieke netwerken van eiwit-eiwit interacties (protein-protein interactions in het Engels, oftewel PPIs). Een PPI netwerk is een grafische weergave waarbij een knooppunt (‘node’ in het Engels) een eiwit voorstelt terwijl de lijnen tussen knooppunten (‘edges’ in het Engels) fysieke interacties voorstellen. Elk knooppunt heeft dus één of meer interactiepartners. Ze zijn belangrijk omdat veel biologische processen via interacties tussen eiwitten bewerkstelligd worden. Een context-specifiek PPI netwerk is, zoals de naam al aangeeft, een netwerk waarin interacterende eiwitten of hun bijbehorende genen in een bepaald verband gereguleerd worden, bijvoorbeeld vanwege een virale infectie. We pasten dit concept toe in ons onderzoek om dieper inzicht te krijgen in de moleculaire pathogenese van luchtweg virusinfecties in geïnfecteerde longepitheel- cellen; de A549 cellen. Deze cellen werden blootgesteld aan een aantal levende en uv-bestraalde virussen: RSV, HMPV, PIV of MV. De cellen werden geoogst op verschillende tijdstippen na infectie en verwerkt voor proteoomanalyse met 2-dimensionele difference gel electrophorese (2-d gel). Monsters afkomstig van geïnfecteerde cellen werden met schijn-geïnfecteerde cellen vergeleken om eiwitten te identificeren wiens expressie door de infectie anders werd. Door 2-d gel en PPI netwerkanalyses laten wij zien dat RSV, HMPV, PIV3 en MV soortgelijke gast-cel reacties teweeg brengen en dat het vooral eiwitten betreft die betrokken zijn bij bescherming tegen ER-stress en apoptose. Deze informatie draagt bij aan onze kennis van de biologie van virale luchtweg infecties.

In hoofdstuk 3, beschrijven wij een nieuwe methode in PPI netwerkanalyse genaamd fysieke interactieverrijking (physical interaction enrichment in het Engels, oftewel PIE). In dit hoofdstuk beginnen wij met het aantonen dat PPI netwerken over het algemeen verrijkt zijn bij zowel genetische als bij infectieziekten. Dit kan kwantitatieve conclusies die gebaseerd zijn op algemene PPI netwerken systematisch beïnvloeden, waardoor

175 een goede randomisatie nodig is om de betrouwbaarheid van context-specifieke net - werken te beoordelen. PIE maakt gebruik van een normalisatietechniek die een gelijke verdeling van het aantal lijnen per knooppunt (node degree) tussen test en bijbehorende random PPI datasets waarborgt. Het kwantificeert in hoeverre eiwitten in een netwerk meer met elkaar interacteren dan bij soortgelijke random netwerken, en dus als fysiek cohesief kunnen worden beschouwd. We hebben PIE toegepast op een paar immuun-gerelateerde datasets. Ten eerste ontdekten wij dat de PPI netwerken van genetische ziektes niet fysiek cohesief zijn, ondanks hun verrijking in algemene PPI netwerken. Ten tweede ontdekten wij dat genen wiens expressie omhoog gereguleerd worden in alveolare cellen na HMPV infectie wel fysiek cohesief zijn. Ook het PPI netwerk van de omlaag gereguleerde genen was fysiek cohesief. Bovendien waren genen die geïnduceerd werden door behandeling met interferon gamma, uv-bestraald RSV of Pseudomonas ook cohesief. Dit laat zien dat PIE de fysieke cohesiviteit kan meten in verschillende omstandigheden. De fysieke cohesiviteit van de genen die omhoog gereguleerd werden na infectie met uv-bestraalde pathogenen is bijzonder interessant omdat dit niet door een directe inmenging van het pathogeen in de genregulatie kan zijn, maar eerder een ingebouwde cellulair programma lijkt te zijn dat op de infectie reageert. Daarnaast laten we zien dat PIE ook nuttig is voor het voorspellen van genen in cellulaire processen die volgen op infectie. Bij de genen die door HMPV geïnduceerd werden, zien wij een positieve correlatie tussen de PIE score van hun PPI netwerk en de mate waarin genen die reageren op HMPV infectie voorspeld kunnen worden. Deze positieve correlatie geeft aan dat PIE een waardevol hulpmiddel kan zijn om de mate waarin een set van genen in hetzelfde biologische proces betrokken is te kwantificeren.

In hoofdstuk 4, beschrijven wij een andere nieuwe bioinformatica methode genaamd Iron Squelch die gericht is op het interpreteren van genoomdata in relatie tot ijzer- homeostase en infectie. IJzer is essentieel voor het metabolisme van zowel microben als hun gastheren. Tijdens infecties concureren microben en gastheercellen om ijzer. Het is echter onduidelijk of de cellen van het aangeboren immunsysteem het ijzer kunnen herverdelen om het voor eigen gebruik te behouden en tegelijkertijd de beschikbaarheid ervan voor de pathogenen te ontnemen. In onze nieuwe methodiek worden transcriptoomprofielen van geïnfecteerde cellen vergeleken met die van een ijzerbehandeling zonder infectie. Iron Squelch meet de totale mate van verandering in genexpressie van genen die op ijzer reageren bij ijzer- overbelasting, ijzer-tekort en infectie. Door gebruik te maken van de zogenaamde iron squelch index wordt verder gekeken of het transcriptieprofiel van dezelfde set genen meer lijkt op die van ijzer-tekort, ijzer-overbelasting, of geen van beide.

176 Samenvatting

We pasten deze methodiek toe op data van cellen die geïnfecteerd werden met HMPV, PIV, RSV, MV, LPS, Chlamydia pneumoniae, Escherichia coli of Pseudomonas aeruginos. We vonden dat cellen die met intracellulaire (C. pneumoniae en de virusen) maar niet extra- cellulaire (E. coli, P. aeruginosa en LPS) pathogenen geïnfecteerd werden soortgelijke profielen vertoonden als bij ijzer-tekort. Belangrijk hierbij is het feit dat dit ijzer-tekort profiel ook veroorzaakt werd door niet-actieve virussen (uv-bestraalde HMPV, RSV, PIV en MV) en interferon gamma, een cytokine die omhoog-gereguleerd wordt bij intracel- lulaire infecties. Dit duidt erop dat dit ijzer-tekort profiel een cellulaire reactie is omdat de inactieve virussen een soortgelijk profiel induceren. Naast het blootleggen van door infectie veroorzaakte ijzer-profielen maken onze observaties het ook mogelijk om ijzer-afhankelijke synergie of ziektevermindering te bestuderen tijdens (co)-infecties. Het begrijpen van ijzer herverdelingsstrategieën en hun effect op extra- en/of intracellulaire infecties kan onze kennis over luchtweginfec- tieziektes vergroten.

In hoofdstuk 5, ontwikkelen en passen wij een aanpak toe om transcriptiefactor bin- dingsposities (transcription factor binding sites in het Engels, oftewel TFBS) in microRNAs (miRNAs) die tot over-expressie komen in van monocyten afkomstige dendritische cellen (DCs). DCs zijn professionele antigeen-presenterende cellen die belangrijk zijn bij afweer van zowel virale als bacteriële infecties, waaronder ook luchtweginfecties. Ze worden aangezet om circulerende monocyten te vormen na infectie (bv LPS blootstelling). MicroRNAs daarentegen spelen een fundamentele rol in de regulatie van genexpressie door repressie van translatie of degradatie van doelwit mRNAs. Ondanks het belang van miRNAs voor cellulaire processen zijn de regulatoire elementen in hun promotors minder goed bestudeerd. Toch kunnen zulke elementen verbanden aan het licht brengen tussen miRNA expressie en bepaalde celtypen. Om dit verband in myeloide cellen (een groep cellen die DC voorlopers omvat) te bestuderen werden miRNA expressieprofielen gegenereerd voor zowel monocyten als DCs. Verschillen in miRNA expressie tussen monocyten, DCs en hun gestimuleerde nageslacht werden waargenomen. Daarnaast werden de vermoedelijke promotor-regi- onen van miRNAs die significant omhoog-gereguleerd werden in DCs onderzocht op TFBSs, gebaseerd op TFBS motief score, mate van over-representatie van die motieven in de promotors van omhoog-gereguleerde miRNAs en evolutionaire conservering van die TFBSs bij zoogdieren. Analyse van evolutionair geconserveerde TFBSs in DC promotors toonde een neiging aan om binnen 500bp voor het begin van de voorloper miRNAs te clusteren en bovendien dat veel mRNAs die door bijbehorende TFs gereguleerd worden ook tot expressie kwamen in DCs. Daarnaast voorspelden wij ook doelwit mRNAs voor de over- vertegenwoordigde miRNAs waarvan sommige omlaag-regeruleerd worden in DCs tijdens de differentiatie van monocyten. Samengevat laat onze data zien dat bepaalde

177 miRNAs, die tot expressie komen in DCs die uit monocyten gedifferentiëerd worden door middel van LPS als infectie-model, evolutionair geconserveerde TFBSs in hun promotors hebben die relevant zijn voor DC-biologie. Deze miRNAs en hun doelwitgenen kunnen toekomstig onderzoek naar DC differentiatie uit monocyten bevorderen.

In hoofdstuk 6, hebben wij PIE toegepast op transcriptoomdata gegenereerd uit neutrofielen van vier vrijwilligers die LPS kregen toegediend. Neutrofielen zijn witte bloedcellen die een belangrijke rol spelen bij het systemische inflammatie syndroom en bij het ontstaan van sepsis na bacteriële infecties, waaronder die van luchtwegen. Neutrofielen zijn essentieel voor een goede weerstand tegen micro-organismen, maar ze kunnen ook weefselschade veroorzaken. Om de in vivo transcriptieresponsen van neutrofielen te bestuderen na LPS infusie werden in totaal 17393 afzonderlijke genen geanalyseerd op microarrays, gebruik makend van monsters van de geïnfecteerde vrijwilligers. Principiële componentanalyse van de genexpressie in de tijd laat een scheiding zien van tijdstip t=2 uur na LPS infusie ten opzichte van tijdstippen t=4 en t=6 bij de vier vrijwilligers die LPS kregen toegediend. Uit deze set van genen kwamen 2248 genen (≈13%) tenminste een factor twee anders tot expressie dan op tijdstip t=0. Van deze set waren 233 omhoog-gereguleerd en 307 omlaag-gereguleerd voor alle 3 tijdstippen na LPS infusie. PPIs van deze genensets, vooral van de omlaag-gereguleerde genen, waren fysiek cohesief. Gen Ontologie analyse liet zien dat de omhoog-geregu- leerde genen betrokken zijn bij inflammatie terwijl de omlaag-gereguleerde genen bij lymphocyte activatie betrokken zijn. Verdere onafhankelijke hematologische metingen toonden neutrofilie en lymphopenie aan in het bloed van de vier vrijwilligers na toediening van LPS. De overeenkomstige transcriptomische reacties en hematologische parameters na LPS infusie komen waarschijnlijk als gevolg van zowel directe activatie van circulerende neutrofielen alsmede van de mobilisatie van neutrofielen uit het beenmerg. Wij stellen kandidaatgen voorspellingen beschikbaar om de interpretatie van door LPS veroorzaakte neutrofilie te bevorderen, alsmede ook het begrip op moleculair niveau van de kinetiek van het genrespons in neutrofielen.

In hoofdstuk 7, beschouwen wij het werk dat in dit proefschrift beschreven wordt met een blik op de toekomst van bioinformatische analyse van het immuunrespons op luchtwegpathogenen. Een doorlopende samenwerking tussen bioinformatica en immunologie is nodig om ziekteprocessen te kunnen doorgronden. Grootschalige technologieën gericht op het karakteriseren van interacteoom, phosphoproteoom, subcelluloom, miRNA-doelwitoom, transcriptoom en proteoom van een gastheer en/of zijn pathogenen zullen data beschikbaar moeten blijven stellen die na integratie ons kunnen helpen de onderliggende cellulaire mechanismen van infectieziektes in meer detail te begrijpen dan nu mogelijk is.

178 Acknowledgement

Acknowledgement

First and foremost, I want to extend gratitude to my PhD supervisor - Prof. Dr Huynen. Thank you Martijn for the PhD opportunity, and for your daily supervision. In addition to Prof. Huynen, I am also grateful to Dr Andeweg. Thank you Arno, for providing the VIRGO genomics data. To you Prof. Dr Peter Hermans and Prof. Dr Gosse Adema, thank you for your positive responses to my quests for immune-related data. I am also grateful to you, Prof. Dr Joe Nadeau, for partially funding my visits to Greece to attend the conferences on Pathways, Networks, and Systems Medicine.

I want to thank our institute director, Prof. Dr Vriend. Thanks Gert, for making CMBI into a comfortable working environment and for promoting social meetings both at the CMBI and at your home. I will always remember your signature yearly BBQ, especially because you do not enforce a rule of must-eat-salad. Thanks for understanding, Gert. In addition to Gert, I would also like to thank our office manager, Barbara van Kampen and her team. Thank you Barbara, Esther, Eugenie, Margot and of course Wilmar, for the administrative assistance, especially at moments when my dutch language skills were non-existent.

I would like to thank my colleagues in the comparative genomics group (Comics). Ladies and gentlemen of Comics, I learned from each of you, both passively and actively. Thanks Martin, for Python programming hints at the beginning of my PhD and for translating the Summary section of this thesis into Dutch. Thanks Bas, Rasa and Shiva, for your insightful discussions in the disease-investigation room. Of course, I shall not forget the mitochondria-investigation room. Thanks Isabel, Radek, Ursula and John for your interesting mitochondria genomics stories. I would also like to thank my name-sake, Edwin, for his interesting work on Plasmodium genomics and proteomics. I must admit, coming from Cameroon, a malaria endemic region, and having a primary background in medical biochemistry, the familiarity and purpose of your work has always been soothing to me; keep it up! Last but not the least in the Comics arena, thank you Philip, Richard, Fiona and Joana for impressing us with information digested from massive quantity of data.

Frank (van Zimmeren), thanks dude, first of all for informing me of the PhD opening in Comics, to which I applied and was successful. Secondly for helping me to navigate through dutch traditions. Barzan and Miaomiao, seeing you around was often appealing because it evoked memories of our good years in Wageningen together. Cheers mates! Guenola (how are the swiss chocolates, Nola?), Vera, Rene, Berend, Ludo and all other former and current CMBI members that I personally know, it was an honour to meet and work with you all.

179 Thank you Angela (van Diepen), Bas (Jansen), Anna (Sanecka) and Stan (de Kleijn) for your co-operation that succeeded into joint publications. Dear April, maybe by the time this piece is defended, our manuscript would have been published. With fingers crossed, thanks for your co-operation.

My life in Nijmegen was not only Radboud-related; there was a social dimension to it that strongly epitomized dutch christian hospitality and family values. Special thanks to you Diederick, Dineke and the Eikelboom kids for accommodating me at least once a week for more than a year when I was too tired to drive home, or had to work late. Thanks to you also Martijn, Anne-Catherine and the Huynen kids for welcoming me to Nijmegen.

Thank you Regi and Sasha for regularly playing Tennis and Squash with me, and for the reviving joined diner sessions that made me feel at home while abroad. Fiona, you tried to teach me dancing to no avail. As you must have realized, unlike you, I cannot summon “dynamite” to the feet; no matter how melodious the tchat-tcha-tcha. But you tried and helped me realized, once and for all, my inadequacies in this social area. Thanks.

I am very grateful to my family and friends who showed morale support during my PhD period. Most importantly, I am grateful to my lovely wife, Elsa, mother to my gracious daughter, Angwi. Thanks Elsa for helping with the translation of the Summary and CV herein. Thanks to you and Angwi for surviving when I was all-consumed by work. The rhetoric, “I am still finishing some things”, shall soon end.

180 Appendix 1

Appendix 1: Supplementary material to Chapter 3

Measuring the physical cohesiveness of proteins using Physical Interaction Enrichment

Iziah Edwin Sama and Martijn A. Huynen

1. Test PPI Network Propagation

1.1 Precepts of propagation of test PPI networks derived from general PPI networks Having a PPI network that is physically cohesive is interesting because it indicates that the genes used to construct this network are functionally related to each other, suggesting some sort of biological coherence. In addition, interesting information can be garnered from a cohesive PPI network by testing the limit of its cohesiveness. For instance, one can assess to what extent proteins that interact with those in an original test network, but that are not part of it, are biologically relevant to the process one is studying. To achieve this, a test PPI network is propagated and the cohesiveness of the resultant networks is examined using the PIE randomization procedure. Propagation of a test network based on a given gene list involves inclusion of interactions not only between proteins encoded by genes in that original gene list (N0 propagations, Equation 1), but also those in the general PPI network known to interact directly with those that are present in the original gene list (N1 propagation, Equation 2) and interactions with the latter (N2 propagations, Equation 3) and so forth (Figure 1A). This enables us to observe the change in physical cohesiveness from the original context of the test proteins (N0) to what is likely a less specific context as we propagate through the entire general PPI network. The propagated networks that are physically cohesive constitute an interesting resource from which additional genes that are relevant to the process being studied can be obtained.

1.1.1 Network propagation 1 1 1 1 1 A PPI network G = (V , E ) containing a set V with NV vertices, and a set E with NE edges is propagated using a general network G = (V , E ) such that: Level 0, (N0) = N0G 1 ⊆ G (1) Level 1, (N1) = N1G 1 ⊆ G (2) Level 2, (N2) = N2G 1 ⊆ G ; practically, N2G 1 ≈ G (3)

181 Etc., where N0G 1, N1G 1 and N2G 1 are induced subgraphs of G; and respectively consists of the original V 1 nodes and the captured V nodes in G at distance 0, 1 and 2 from V 1.

1.2 Results of propagation of test PPI networks It is interesting to assess to what extent proteins that are in a general PPI network, but absent in an original set of test proteins, interact with the test proteins.

1.2.1 Propagation of derived PPI networks of HMPV-induced genes Using HMPV-induced genes that are up-regulated at a fold change threshold of 3.0 after 12 hours of infection indicates that the physical cohesiveness increases from N0 (PIE score=1.1, p=0.04) to N1 (PIE score=1.4, p<0.01) propagation (Figure 1B). This increase is due to the rise of average degree for the test networks which is notably also in line with those of their corresponding random networks (Figure 1B). Although the average degree increases for both test and random networks, the PIE score, which is a strict relative value, is observed to drop drastically at higher levels of propagation than 1 (data not shown). This drastic drop occurs because the propagated network grows and almost completely overlaps with the general PPI network. Moreover, examination of the average node overlap between test network with random networks for the N0, N1 and N2 networks respectively, revealed an increasing overlap (3%, 24%, and 81% respectively) of the nodes in the test networks with their random counterparts. Likewise, we observed an increasing overlap (3%, 25% and 68 respectively) of the test network interactions with those of their random equivalents. These overlaps are large, partially because of the imposed PIE randomization constraints. Overall, at higher levels of propagation (>1), one cannot obtain independent samples from the complete network to assess the statistical significance. Although at a single iteration (N1) the propagated network already contains 6% of the nodes in the general network and 25% of the nodes with the randomized networks, the significant overlap with the biological network at that point (see below) suggests that the N1 network derived from the original network of up-regulated genes is likely to be biologically relevant to HMPV infection.

1.2.2 Propagation of derived PPI networks of IFNG induced genes Using IFNG-induced genes that are up-regulated at a fold change threshold of 3.0, the physical cohesiveness is observed to increase from N0 (PIE score = 1.14, p=0.03) to N1 (PIE score=1.23, p<0.01) network propagation. As was the case of HMPV-induced genes, this pattern is a consequence of the rise in average degree upon propagation of the networks which increased for both test and random networks (Figure 1C).

182 Appendix 1

Figure 1 - Propagation of context derived networks. (A) Scheme of propagation of PPI networks.

The network, N0, obtained after integration of a set of genes with the general PPI network is propagated to include interactions not only between proteins encoded by genes in the original gene list, but also those in the general network known to interact with the N0 nodes (N1 network) and interactions of the latter (N2 network) and so forth. Circles represent nodes (i.e protein of gene). Links between nodes represent protein- protein interactions. (B) Changes in average degree upon propagation of the HMPV-induced gene network in the HsapiensPPI network. (C) Changes in average degree upon propagation of the interferon gamma-induced gene network in the HsapiensPPI network. (UpregHMPV12 denotes that the gene set used for this network where upregulated (≥3 times) in expression after 12 hr of HMPV infection; upregIFNG24 denotes that the gene set used for this network where upregulated (≥3 times) in expression after 24 hr of interferon gamma treatment).

183 2. Downstream pathway gene predictions

2.1 Precepts of prediction of down-stream pathway genes If a propagated network is found to be physically cohesive, it will be interesting to extract and examine the new nodes. For example; enrichment of the new nodes could be assessed in a benchmark set of genes in pathways downstream the original test nodes. The benchmark set of genes used in this study are those observed at later time points to be up-regulated in expression by the same stimulus used to trigger the original test set of genes. The enrichment terms are defined as follows. True positives (TP) are the nodes in the propagated network that are present in the benchmark set. False positives (FP) are the nodes in the propagated network that are not present in the benchmark set. False negatives (FN) are the benchmark nodes not present in the set of nodes of the propagated network. The sensitivity of prediction is calculated as the percentage of TP divided by the sum of TP and FN. The positive predictive value (PPV) of prediction is calculated as the percentage of TP divided by the sum of TP and FP. The p-value of prediction is represented as the empirical fraction of instances whereby amidst the nodes from 1000 random networks of the same size and global degree distribution as the propagated test network (i.e. randoms obtained using the PIE randomization approach) there are equal or greater number of TP nodes than that of the test TP nodes.

2.2 Results of prediction of down-stream pathway genes The one-step propagation of a PPI network at high fold change yields other networks that are still significant in terms of physical cohesiveness, and can therewith be regarded as biologically meaningful. In order to harness this property, nodes in cohesive propagated networks were queried in a benchmarked set of genes that are situated in pathways downstream of the response. This principle was applied to HMPV infection by using the sets of genes obtained at 5 infection time points as benchmarks. We observed that the propagation of networks consisting of proteins whose genes are induced at an FC threshold of 3.0 by the HMPV virus at time points 12, 24 and 48 hrs led to significant (p<1.0e-3) prediction of genes induced at similar levels at the subsequent time points used (Table 1). For instance at the fold change of 3.0, 69 of the predicted nodes (amongst which were 23 nodes that were not over-expressed before) in the “HMPV12N1” network are indeed expressed at this fold change threshold at time point 24 hr. The HMPV12N1 network was obtained after one- level propagation of the PPI network (i.e. HMPV12N0) of proteins encoded by genes that are up-regulated 12 hr post HMPV infection. Overall, the sensitivity of prediction is 5-41 % and the positive predictive value is 5-11%. Intriguingly, there is a strong correlation between the PIE score of precursor networks and the predictability of genes in their propagated networks. For instance the correlation between the PIE score and sensitivity of predictions is 0.93

184 Appendix 1

(p=0.066). These results demonstrate the usefulness of the PIE approach to assess the biological coherence of a set of genes and the potential to use general PPI networks in an exploratory manner to predict genes of relevance to a process being studied.

Table 1 - Summary of gene expression prediction using PPI networks.

Scenario PN *PPV +SN TP TP NN (PIE score) pvalue hmpv6N1 hmpv6N0 0.05 0.05 6 1.0e-03 0 and (1) hmpv12N0 hmpv12N1 hmpv12N0 0.11 0.29 69 <1.0e-03 23 and (1.05) hmpv24N0 hmpv24N1 hmpv24N0 0.1 0.33 170 <1.0e-03 59 and (1.06) hmpv48N0 hmpv48N1 hmpv48N0 0.1 0.41 257 <1.0e-03 48 and (1.12) hmpv72N0

*Pearson correlation between PPV and PIE score is 0.74 (p= 0.26) for the 4 data points. +Pearson correlation between SN and PIE score is 0.93 (p=0.066) for the 4 data points. (PN precursor network (i.e. the original network that was propagated to yield the predicted genes), PPV positive predictive value, SN sensitivity, TP number of true positives, NN number of the TP nodes that were not in PN)

185 Appendix 2: Supplementary Material to Chapter 4

Transcriptomic assessment of cellular response to intracellular pathogens reveals an iron-depletion profile

Iziah Edwin Sama, April Kartikasari, Arno Andeweg, Dorine Swinkels, Martijn Huynen

1. Iron Squelch profile from transcriptomics data

In addition to our in-house data, we further examined iron profiles using Squelch analysis of published microarray data sets obtained from NCBI gene-expression omnibus database (GEO) .The datasets are transcriptomics data from cells treated with human metapneumovirus (HMPV) for 6, 12, 24, 48, and 72 hours (GSE8961) [1]; Human respiratory syncytial virus (RSV) for 4 and 24 hours ( GSE3397) contributed by Huang et al, 2001. ; Measles virus (MV) for 3, 6, 12, and 24 hours (GSE980) [2] ; Influenza A virus (Infa) for 24, 48, 96, and 168 hours (GSE5289) [3] ; Chlamydia pneumonia for 12, 24, 48 and 72 hours (GSE7246) [4]; E. coli for 6 hours(GSE2232) [5] ; Pseudomonas aeruginosa for 1.5 hours (GSE4485) [6] ; uv-irradiated airway-pathogens for 4 hours (GSE6802) [7] ; LPS for 1 and 4 hours (GSE1541) [8] and Interferon gamma (IFNG) for 4 and 24 hours (GSE1815) [9] . The cell types used to generate the above-mentioned datasets were epithelial cells, except that of MV which were dendritic cells.

186 Appendix 2

Figure 1 - Heatmap of genes that are perturbed in expression in virus-infected cells and that have “iron” in their description in Gene Ontology.

Respectively, rsv6 and rsv6uvd refer to data of 6 hours of live and uv-irradiated RSV infection, etc. RSV, respiratory syncytial virus; HMPV, human metapneumovirus; MV measles virus; PIV, parainfluenza virus; depletion, iron-depletion; overload, iron-overload. (red >=1.5 fold up-regulated; blue >=1.5 fold down-regulated; white not up to 1.5 fold perturbed relative to control cells.)

187 Figure 2 - Profiles from Iron Squelch analysis of genes perturbed in expression in A549 cells after 6, 12, 24, 48 and 72 hours of HMPV infection. This involves genes that are up-regulated in iron- depletion but down-regulated in iron-overload, or vice versa. There are 40 such genes at a fold change threshold (FC) of at least 1.5 for up-regulated genes and at most 1/1.5 for down- regulated genes in the iron-depletion and overload models. A low p-value, indicates that the infection mimics iron-depletion. (Boxplot, random; Grey diamond, test). (hmpv6, hmpv12, hmpv24, hmpv48, hmpv72; respective data of 6, 12, 24, 48 and 72 hours of HMPV infection.)

Figure 3 - Profiles from Iron Squelch analysis of genes perturbed in expression in bronchial epithelial cells after 4 and 24 hours of RSV infection. This involves genes that are up-regulated in iron- depletion but down-regulated in iron-overload, or vice versa. There are 35 such genes at a fold change threshold (FC) of at least 1.5 for up-regulated genes and at most 1/1.5 for down- regulated genes in the iron-depletion and overload models. A low p-value, indicates that the infection mimics iron-depletion. (Boxplot, random. Grey diamond, test (RSV4hr, RSV24hr; respective data of 4, and 24 hours of RSV infection.)

Figure 4 - Profiles from Iron Squelch analysis of genes perturbed in expression in monocyte derived dendritic cells after 3, 6, 12, and 24 hours of MV infection. This involves genes that are up-regulated in iron- depletion but down-regulated in iron-overload, or vice versa. There are 25 such genes at a fold change threshold (FC) of at least 1.5 for up-regulated genes and at most 1/1.5 for down- regulated genes in the iron-depletion and overload models. A low p-value, indicates that the infection mimics iron-depletion. (Boxplot, random. Grey diamond, test). (MV3hr, MV6hr, MV12hr, MV24hr; respective data of 3, 6, 12 and 24 hours of MV infection.)

188 Appendix 2

Figure 5 - Profiles from Iron Squelch analysis of genes perturbed in expression in mice lung cells after 24, 48, 96 and 168 hours of influenza A infection. This involves genes that are up-regulated in iron- depletion but down-regulated in iron-overload, or vice versa. There are only 14 such genes at a fold change threshold (FC) of at least 1.5 for up-regulated genes and at most 1/1.5 for down-regulated genes in the iron-depletion and overload models. A low p-value, indicates that the infection mimics iron-depletion. (Boxplot, random. Grey diamond, test). (infa24hr, infa48hr, infa96hr, infa168hr; respective data of 24, 48, 96, and 168 hours of Influenza A virus infection. Human homologues of the mice genes were used for these analyses.)

Figure 6 - Profiles from Iron Squelch analysis of genes perturbed in expression in bronchial epithelial cells after 4 hours of exposure with uv-irradiated airway pathogens. This involves genes that are up-regulated in iron- depletion but down-regulated in iron-overload, or vice versa. There are 32 such genes at a fold change threshold (FC) of at least 1.5 for up-regulated genes and at most 1/1.5 for down-regulated genes in the iron-depletion and overload models. A low p-value, indicates that the infection mimics iron-depletion. (pa4uvd, rsv4uvd, staph4uvd respectively refer to data of 4 hours of uv-irradiated Pseudomonas aeruginona, respiratory syncytial virus, and Staphylo- coccus aureus infection.)

Figure 7 - Profiles from Iron Squelch analysis of genes perturbed in expression in human epithelial cells (HL cell line) after 12, 24, 48 and 72 hours of Chlamydia pneumoniae infection. This involves genes that are up-regulated in iron- depletion but down-regulated in iron-overload, or vice versa. There are 44 such genes at a fold change threshold (FC) of at least 1.5 for up-regulated genes and at most 1/1.5 for down-regulated genes in the iron-depletion and overload models. A low p-value, indicates that the infection mimics iron-depletion. (cp12h, cp24h, cp48h, cp72h; respective data of 12, 24, 48 and 72 hours of Chlamydia pneumoniae infection.)

189 Figure 8 - Profiles from Iron Squelch analysis of genes perturbed in expression in human bronchial epithelial cells after 8 and 24 hours of interferon gamma (IFNG) exposure. This involves genes that are up-regulated in iron-depletion but down-regulated in iron- overload, or vice versa. There are only 13 such genes at a fold change threshold (FC) of at least 1.5 for up-regulated genes and at most 1/1.5 for down-regulated genes in the iron-depletion and overload models. The Squelch index (y-axis) reported is the sum of the log2 gene- expression fold changes for the entire gene set. A low p-value indicates that IFNG mimics an iron-depletion profile. (Boxplot, random. Grey diamond, test). (IFNG8hr, IFNG24hr respective data of 8, or 24 hours of IFNG exposure to the epithelial cells.)

Figure 9 - Profiles from Iron Squelch analysis of genes perturbed in expression in airway epithelial cells after 1.5 hours of Pseudomonas aeruginosa infection. This involves genes that are up-regulated in iron-depletion but down-regulated in iron- overload, or vice versa. There are 32 such genes at a fold change threshold (FC) of at least 1.5 for up-regulated genes and at most 1/1.5 for down-regulated genes in the iron-depletion and overload models. The Squelch index (y-axis) reported is the sum of the log2 gene- expression fold changes for the entire gene set. A low p-value, indicates that the infection mimics iron-depletion. (Boxplot, random. Grey diamond, test). (Pa_90min and pa_rsma_90min, respectively refer to data of 90 minutes of pseudomonas and pseudomonas rsma mutant infection.)

190 Appendix 2

Figure 10 - Profiles from Iron Squelch analysis of genes perturbed in expression in intestinal epithelial cells after 6 hours of E. coli infection. This involves genes that are up-regulated in iron-depletion but down-regulated in iron- overload, or vice versa. There are 32 such genes at a fold change threshold (FC) of at least 1.5 for up-regulated genes and at most 1/1.5 for down-regulated genes in the iron-depletion and overload models. A low p-value, indicates that the infection mimics iron-depletion. (Boxplot, random. Grey diamond, test). (Ecoli6hr, data of 6 hours of E. coli infection).

Figure 11 - Profiles from Iron Squelch analysis of genes perturbed in expression in alveolar epithelial cells after 1 and 6 hours of LPS exposure. This involves genes that are up-regulated in iron-depletion but down-regulated in iron- overload, or vice versa. There are 27 such genes at a fold change threshold (FC) of at least 1.5 for up-regulated genes and at most 1/1.5 for down-regulated genes in the iron-depletion and overload models. The Squelch index (y-axis) reported is the sum of the log2 gene- expression fold changes for the entire gene set. A low p-value indicates that LPS mimics iron-depletion profile. (Boxplot, random. Grey diamond, test). (LPS1h, LPS4h respective data of 1 or 4 hour of LPS exposure to epithelial cells).

191 2. Overrepresented transcription factor binding sites (TFBS) in genes perturbed in expression (fold change threshold 1.5) by iron treatment and virus infection

Table 1 - TFBS overrepresented in promoter region of genes relatively upregulated in iron overload and infection (group 1 genes).

Matrix factor Raw_score P_value MA0050 IRF1 8.82 0.009 M00258 ISRE 6.52 0.042 M00210 OCT-x 6.51 0.009 M00062 IRF-1 4.87 0.018 M00193 NF-1 4.82 0.025 M00158 COUP-TF,_HNF-4 3.91 0.042 MA0017 NR2F1 3.91 0.042 M00161 Oct-1 3.67 0.032 M00054 NF-kappaB 3.59 0.029 M00122 USF 3.45 0.012 MA0033 FOXL1 2.15 0.036 MA0119 Hox11-CTF1 1.74 0.034 M00201 C/EBP 1.14 0.039 M00121 USF 0.439 0.049 M00123 c-Myc:Max 0.245 0.025

Table 2 - TFBS overrepresented in promoter region of genes relatively upregulated in iron depletion and infection (group 2 genes).

Matrix factor Raw_score P_value M00216 TATA 21.4 0.001 M00500 STAT6 16.9 0.014 M00192 GR 14 0.008 M00152 SRF 13.2 0 MA0107 RELA 12.5 0 M00053 c-Rel 12 0 M00052 NF-kappaB_(p65) 11.9 0 M00499 STAT5A 11.6 0.039 MA0101 REL 11.4 0

192 Appendix 2

Table 2 - Continued.

Matrix factor Raw_score P_value M00194 NF-kappaB 11.4 0 M00054 NF-kappaB 11.3 0 M00159 C/EBP 11 0.006 M00195 Oct-1 10.6 0.028 M00291 Freac-3 9.54 0.002 M00215 SRF 9.3 0 M00486 Pax-2 8.02 0.027 M00498 STAT4 7.79 0.021 M00252 TATA 7.29 0.001 M00186 SRF 7 0 M00188 AP-1 6.35 0.005 M00051 NF-kappaB_(p50) 6.35 0.003 M00208 NF-kappaB 6.04 0.006 M00497 STAT3 5.98 0.014 M00173 AP-1 5.38 0.002 M00410 SOX-9 5.33 0.015 M00210 OCT-x 5.32 0.027 M00496 STAT1 5.19 0.026 M00172 AP-1 4.72 0.003 M00174 AP-1 4.4 0.011 MA0031 FOXD1 4.24 0.008 M00290 Freac-2 4.2 0.017 MA0083 SRF 3.6 0.001 MA0030 FOXF2 2.72 0.027 MA0105 NFKB1 2.68 0.045 M00203 GATA-X 2.41 0.016 M00135 Oct-1 1.33 0.046 MA0077 SOX9 0.744 0.021 M00248 Oct-1 0.737 0.043 M00128 GATA-1 0.388 0.036

193 Table 3 - TFBS overrepresented in promoter region of genes relatively downregulated in iron overload and infection (group 3 genes).

Matrix factor Raw_score P_value M00196 Sp1 19.2 0.038 M00491 MAZR 12.2 0.004 M00008 Sp1 7.58 0.046 M00480 LUN-1 6.28 0.019 M00499 STAT5A 3.67 0.03 M00208 NF-kappaB 3.07 0.025 M00051 NF-kappaB_(p50) 3.05 0.015 MA0025 NFIL3 0.335 0.034 M00024 E2F 0.165 0.014 M00246 Egr-2 0.0704 0.024

Table 4 - TFBS overrepresented in promoter region of genes relatively downregulated in iron depletion and infection (group 4 genes).

Matrix factor Raw_score P_value M00209 NF-Y 7.45 0 M00287 NF-Y 5.13 0.011 M00143 BSAP 3.72 0.046 M00184 MyoD 3.38 0.039 M00185 NF-Y 3.23 0.003

194 Appendix 2

Figure 12 - Graphical display of some TFBS overrepresented in promoter region of genes relatively upregulated in iron overload and infection (group 1 genes).

Y-axis, TFBS scores. (Only a selection of factors is projected here in order to facilitate interpretation of the figure).

195 Figure 13 - Graphical display of some TFBS overrepresented in promoter region of genes relatively upregulated in iron depletion and infection (group 2 genes).

Y-axis, TFBS scores ranging from 6-15. (Only a selection of factors are projected here in order to facilitate interpretation of the figure).

196 Appendix 2

Figure 14 - Graphical display of some TFBS overrepresented in promoter region of genes relatively downregulated in iron overload and infection (group 3 genes).

Y-axis, TFBS scores. (Only a selection of factors is projected here in order to facilitate interpretation of the figure).

197 Figure 15 - Graphical display of some TFBS overrepresented in promoter region of genes relatively downregulated in iron depletion and infection (group 4 genes). Y-axis, TFBS scores.

Table 5 - Expression of the Transcription factors with over-represented binding sites in the promoter regions of genes in groups 1-4 above.

Factor Official Entrez Group Infection Iron Iron Infection Symbol geneID Depletion Overload Detail IRF1 IRF1 3659 1 + 0 + 0++++ OCT-x POU2F1 5451 1 + 0 0 0000+ IRF-1 IRF1 3659 1 + 0 + 0++++ NF-1 NFIC 4782 1 - 0 0 0000- NF-1 NFIX 4784 1 + - 0 00+00 COUP-TF,_ NR2F1 7025 1 + 0 0 00+++ HNF-4 NR2F1 NR2F1 7025 1 + 0 0 00+++

198 Appendix 2

Table 5 - Continued.

Factor Official Entrez Group Infection Iron Iron Infection Symbol geneID Depletion Overload Detail Oct-1 POU2F1 5451 1 + 0 0 0000+ NF- NFKB1 4790 1 + 0 0 00+++ kappaB USF USF2 7392 1 - 0 0 000-- FOXL1 FOXL1 2300 1 + 0 0 000++ C/EBP CEBPB 1051 1 + 0 0 -0+++ C/EBP CEBPA 1050 1 - 0 + 00--- C/EBP CEBPE 1053 1 - 0 0 ---00 C/EBP CEBPD 1052 1 0 0 0 -00+0 C/EBP CEBPG 1054 1 + 0 0 +00++ c-Myc:Max MAX 4149 1 + 0 0 00+00 c-Myc:Max MYC 4609 1 + 0 0 000++ STAT6 STAT6 6778 2 + 0 0 00++0 GR NR3C1 2908 2 + 0 0 -00++ SRF SRF 6722 2 + 0 0 000++ RELA RELA 5970 2 + 0 0 -00++ c-Rel REL 5966 2 + 0 0 00+++ NF- RELA 5970 2 + 0 0 -00++ kappaB_ (p65) STAT5A STAT5A 6776 2 + 0 0 000++ REL REL 5966 2 + 0 0 00+++ NF- NFKB1 4790 2 + 0 0 00+++ kappaB C/EBP CEBPB 1051 2 + 0 0 -0+++ C/EBP CEBPA 1050 2 - 0 + 00--- C/EBP CEBPE 1053 2 - 0 0 ---00 C/EBP CEBPD 1052 2 0 0 0 -00+0 C/EBP CEBPG 1054 2 + 0 0 +00++ Oct-1 POU2F1 5451 2 + 0 0 0000+ Pax-2 PAX2 5076 2 + - 0 000+0 STAT4 STAT4 6775 2 + 0 0 +0000 AP-1 JUN 3725 2 + 0 0 +++++ AP-1 FOS 2353 2 + + - -0+++ AP-1 JUNB 3726 2 + 0 0 -0+++ AP-1 FOSB 2354 2 + 0 0 0++++

199 Table 5 - Continued.

Factor Official Entrez Group Infection Iron Iron Infection Symbol geneID Depletion Overload Detail NF- NFKB1 4790 2 + 0 0 00+++ kappaB_ (p50) STAT3 STAT3 6774 2 - 0 0 -0000 SOX-9 SOX9 6662 2 + 0 0 000+0 OCT-x POU2F1 5451 2 + 0 0 0000+ STAT1 STAT1 6772 2 + 0 0 +++++ FOXD1 FOXD1 2297 2 + 0 0 0000+ FOXF2 FOXF2 2295 2 - 0 0 0000- NFKB1 NFKB1 4790 2 + 0 0 00+++ GATA-1 GATA1 2623 2 + 0 0 000++ GATA-X GATA1 2623 2 + 0 0 000++ SOX9 SOX9 6662 2 + 0 0 000+0 Freac-2 FOXF2 2295 2 - 0 0 0000- SP1 SP1 6667 3 - 0 0 0000- MAZR PATZ1 23598 3 - 0 0 -000- STAT5A STAT5A 6776 3 + 0 0 000++ NF- NFKB1 4790 3 + 0 0 00+++ kappaB NF- NFKB1 4790 3 + 0 0 00+++ kappaB_ (p50) NFIL3 NFIL3 4783 3 + 0 - 0++++ E2F E2F1 1869 3 - 0 0 -00-- E2F E2F2 1870 3 - 0 0 --0-- E2F E2F4 1874 3 - 0 0 0-00- E2F E2F7 144455 3 - 0 0 -00-- E2F E2F8 79733 3 0 0 0 -000+ Egr-2 EGR2 1959 3 + 0 0 0++++ NF-Y NFYA 4800 4 - 0 0 --000 NF-Y NFYB 4801 4 - 0 0 ----0 NF-Y NFYC 4802 4 - 0 0 0-000 BSAP PAX5 5079 4 + 0 0 0++++ MyoD MYOD1 4654 4 + 0 0 0000+

(+ : upregulated (FC>=1.5); - : downregulated (FC<=1/1.5); 0 : FC threshold not met; InfectionDetail : 6,12,24,48 and 72 hour post HMPV infection. Data from [1])

200 Appendix 2 variations iron in core status genes suggesting internal iron-redistribution with additional (data viral time points from [1]). - Figure 16

201 Figure 17 - Time dependent down-regulation of transferrin receptor c (TFRC) and ferritin subunits FTH1 and FTL accompanied by up-regulation of interferon stimulating gene 20 (ISG20) gene expressions.

A: HMPV and B: IFNG treatments of airway epithelial cells. (Y-axis: log2 relative fold change of gene expression (treatment to control), x: axis treatment (iron-overload, iron- depletion, HMPV and IFNG) over time. (hmpv6, hmpv12 imply hmpv infection for 6, and 12 hours etc. IFNG8hr implies IFNG administration for 8 hours, etc).

202 Appendix 2

References

1. Bao X, Sinha M, Liu T, Hong C, Luxon BA, Garofalo RP, Casola A: Identication of human metapneumovi- rus-induced gene networks in airway epithelial cells by microarray analysis. Virology 2008, 374(1):114- 127. 2. Zilliox MJ, Parmigiani G, Griffin DE: Gene expression patterns in dendritic cells infected with measles virus compared with other pathogens. Proc Natl Acad Sci U S A 2006, 103(9):3363-3368. 3. Rosseau S, Hocke A, Mollenkopf H, Schmeck B, Suttorp N, Kaufmann SH, Zerrahn J: Comparative transcrip- tional proling of the lung reveals shared and distinct features of Streptococcus pneumoniae and inuenza A virus infection. Immunology 2007, 120(3):380-391. 4. Alvesalo J, Greco D, Leinonen M, Raitila T, Vuorela P, Auvinen P: Microarray analysis of a Chlamydia pneu- moniae-infected human epithelial cell line by use of gene ontology hierarchy. J Infect Dis 2008, 197(1):156-162. 5. Ukena SN, Westendorf AM, Hansen W, Rohde M, Geffers R, Coldewey S, Suerbaum S, Buer J, Gunzer F: The host response to the probiotic Escherichia coli strain Nissle 1917: specic up-regulation of the pro- inammatory chemokine MCP-1. BMC Med Genet 2005, 6:43. 6. O’Grady EP, Mulcahy H, O’Callaghan J, Adams C, O’Gara F: Pseudomonas aeruginosa infection of airway epithelial cells modulates expression of Kruppel-like factors 2 and 6 via RsmA-mediated regulation of type III exoenzymes S and Y. Infect Immun 2006, 74(10):5893-5902. 7. Mayer AK, Muehmer M, Mages J, Gueinzius K, Hess C, Heeg K, Bals R, Lang R, Dalpke AH: Di erential recognition of TLR-dependent microbial ligands in human bronchial epithelial cells. J Immunol 2007, 178(5):3134-3142. 8. dos Santos CC, Han B, Andrade CF, Bai X, Uhlig S, Hubmayr R, Tsang M, Lodyga M, Keshavjee S, Slutsky AS et al: DNA microarray analysis of gene expression in alveolar epithelial cells in response to TNFalpha, LPS, and cyclic stretch. Physiol Genomics 2004, 19(3):331-342. 9. Pawliczak R, Logun C, Madara P, Barb J, Suffredini AF, Munson PJ, Danner RL, Shelhamer JH: Inuence of IFN-gamma on gene expression in normal human bronchial epithelial cells: modulation of IFN-gamma e ects by dexamethasone. Physiol Genomics 2005, 23(1):28-45.

203 Appendix 3: Supplementary material of Chapter 5

MicroRNA genes preferentially expressed in dendritic cells contain sites for conserved transcription factor binding motifs in their promoters

Iziah E. Sama, Bastiaan J.H. Jansen, Dagmar Eleveld-Trancikova, Maaike A. van Hout-Kuijer, Joop H. Jansen, Martijn A. Huynen, Gosse J. Adema

Additional files

Additional file 1

Figure S1 - Setup of culture and quality of monocyte-derived DCs.

(A) Culture setup and harvest schedule for RNA isolations. (B) Purity of monocytes (upper histogram) and the expression of maturation markers CD80 and CD86 on the different DC populations (C) Mixed lymphocyte reaction with the various DC populations; ratio of DCs vs. PBLs is indicated on the x-axis. (D) Production of TNFα and IL8 by the different DC populations, as determined by ELISA. The white and black bars each represent data from two different donors.

204 Appendix 3

Additional file 2

Figure S2 - Schematic overview of the miRNA expression screen in monocytes and DCs.

205 Additional file 3 Table S1 - MicroRNAs for which assays have been developed by Applied Biosystems, and of which expression levels were determined in monocytes and dendritic cells. Of a total of 157 miRNAs, 104 were expressed in DC and/or monocytes, whereas 53 were not detected in either cell type.

(Table too large to place here but it is available with the article online)

Additional file 4 Table S2 - Overview of over-represented TFBS motifs in promoter sequences of miRNAs that are upregulated in monocytes (A) or DCs(B).

Table S2A - TFBS motifs over-represented (based on CLOVER; p < 0.05) in promoter sequences of miRNAs that are up-regulated in monocytes.

Matrix Factor Raw_score P_value M00257 RREB-1 10 0.042 MA0081 SPIB 5.22 0.006 M00500 STAT6 4.98 0.029 M00532 RP58 3.47 0.013 M00070 Tal-1beta:ITF-2 3.38 0.003 MA0062 GABPA 2.7 0.031 M00483 ATF6 2.62 0 M00065 Tal-1beta:E47 2.43 0.012 M00025 Elk-1 1.78 0.04 M00066 Tal-1alpha:E47 1.6 0.028 MA0091 TAL1-TCF3 1.51 0.032 M00243 Egr-1 1.2 0.006 M00126 GATA-1 0.679 0.027 M00246 Egr-2 0.675 0.017 MA0083 SRF 0.646 0.016 M00186 SRF 0.4 0.036

206 Appendix 3

Table S2B - TFBS motifs over-represented (based on CLOVER; p < 0.05) in promoter sequences of miRNAs that are up-regulated in DCs.

Matrix Factor Raw_score P_value M00084 MZF1 17.3 0 M00491 MAZR 17 0 M00005 AP-4 14.5 0 M00083 MZF1 12.8 0 MA0073 RREB1 29 0.001 M00257 RREB-1 23 0.001 MA0057 ZNF42_5-13 9.27 0.002 MA0056 ZNF42_1-4 7.28 0.004 M00500 STAT6 8.21 0.006 MA0080 SPI1 4.84 0.006 MA0081 SPIB 6.61 0.013 M00184 MyoD 6.85 0.018 M00497 STAT3 2.73 0.018 M00002 E47 8.09 0.023 MA0095 YY1 0.938 0.024 M00196 Sp1 27.2 0.03 M00222 Hand1:E47 4.72 0.041 M00526 GCNF 0.0864 0.045 MA0055 Myf 9.14 0.047

207 Additional file 5

Figure S3

(A) Commonality of TFBSs in promoter regions of monocyte-expressed miRNAs. Shown on the x-axis are the high-scoring TFBSs (i.e. of instance score ≥ 6) that occur at least once in the 2 kb promoters of the miRNA up-regulated in monocytes. The y-axis shows the number of promoters that have the TFBS, at least once at this threshold. (B) The distribution of the number of common TFBS hits per number of common miRNA promoters as in A, for test and random sets of miRNA promoters. The values for random are the median values from 1000 random set of miRNA promoters of same size and length as those in the monocyte set. The score is the ratio of the sum of “the product of numbers on the x-axis and corresponding y-axis values” for the monocyte set relative to that of the random. The p-value is the fraction of cases wherein this sum for random sets of miRNA promoters is greater than or equal to that of the DC set.

208 Appendix 3

Additional file 6

Figure S4

(A) TFBSs shared among the promoter regions of miRNAs up-regulated in monocytes. Shown on the x-axis are high-scoring TFBSs filtered at a motif instance score threshold of at least 6 and at a 10th percentile evolutionary conservation score that occur at least once in the promoters of miRNAs up-regulated in DCs. The y-axis shows the number of promoters that have the TFBS at least once at these thresholds. (B) The distribution of the number of common TFBS hits per number of common miRNA promoters as in A, for test and random sets of miRNA promoters. The values for random are the median values from 1000 random set of miRNA promoters of same size and length as those in the monocyte set. The score is the ratio of the sum of all occurrences of all TFBS for the monocyte set relative to that of the random sets. The p-value is estimated from the number of instances wherein this sum of “the product of number of TFBSs and the number of miRNA promoters” of 1000 random sets of miRNA promoters is greater than or equal to that of the monocyte set.

209 Additional file 7 Table S3 - Library of TFBS motif for which the cognate TFs are expressed (presence = 1) or not expressed (absence = 1) in DCs.

(Table too large to place here but it is available with the article online)

Additional file 8

Figure S5

Correlation of the number of expressed TFs with the number of promoters of miRNAs that are over-expressed in DCs. The TFs used are those of high-scoring TFBSs (i.e. of instance score >= 6) that are also highly conserved (10th percentile of PhastCons scores) and occur at least once in the 2 kb promoters of the miRNA up-regulated in DCs (left plot), and a random set of miRNAs (right plot). The original number of miRNA promoters that share at least one TFBS was 12 for both DCs and the random set of promoters (i.e the number of promoters of miRNAs that were over-expressed in DCs). Due to the high conservation threshold used, the maximum number of miRNA promoters that share at least one TFBS became smaller (6 for the test set and 3 for the random set). The correlations were calculated using the Pearson correlation as implemented in R.

210 Appendix 3

Additional file 9 Table S4 - Gene ontology analysis of the expressed and not-expressed TFs of predicted binding sites in the promoters of miRNA.

(Table too large to place here but it is available with the article online)

Additional file 10 Table S5 - The 421 target genes that are predicted by TargetScan for the DC-over- expressed miRNAs.

(Table too large to place here but it is available with the article online)

Additional file 11 Table S6 - The 2300 target genes that are predicted by PicTar for the DC-over- expressed miRNAs.

(Table too large to place here but it is available with the article online)

Additional file 12 Table S7 - The 282 target genes that are predicted by TargetScan for the Monocyte- over-expressed miRNAs.

(Table too large to place here but it is available with the article online)

Additional file 13 Table S8 - The 1591 target genes that are predicted by PicTar for the Monocyte- over-expressed miRNAs.

(Table too large to place here but it is available with the article online)

211 Additional file 14

Figure S6 Venn diagram showing the intersection of target genes. (A) Intersection of target genes of the miRNAs that are over-expressed in DCs, using PicTar (DC.pic), and TargetScan (DC. tar). (B) Similar to A, but for monocytes. (C) Intersections of the common targets found by PicTar and TargetScan for DC miRNAs (DC.pic.tar), the common targets for monocyte miRNAs (mono.pic.tar), and the dataset of Lehtonen et al .2007 (Leh.) of genes that are regulated in DCs relative to monocytes during DC differentiation. (D-F) Intersection of common target genes from DC miRNAs, monocyte miRNAs, and genes that were up-regulated (upreg) or down-regulated (downreg), in DCs relative to monocytes in the data set of Lehtonen et al.2007. Figure D, E, F represent respectively the data sets at time points 3, 6 and 24 hr of DC differentiation from monocytes. (Differential gene relations (DC/monocyte) were selected at a fold change of at least 2. The PicTar score used >=0.4, TargetScan context score <=-0.4 . Comparisons were done at the level of RefSeq DNA ID and the elements in the intersections with the upreg and downreg datasets are provided in Table S11 with gene symbols attached.

212 Appendix 3

Additional file 15 Table S9 - Target genes that are down-regulated in DCs relative to monocytes and that were predicted using PicTar or Targetscan, to be targets of miRNAs that are over-expressed in DCs relative to monocytes.

Table S9A - Target genes that are down-regulated in DCs relative to Monocytes and that were predicted using PicTar to be targets of miRNAs that are over-expressed in DCs relative to Monocytes.

relative expression (DC/ monocyte) of target gene in the dataset of Lehtonen et al 2007 Affy_HGU133A RefSeqDNA HGNC t3hr t6hr t24hr DC Pictar ID ID symbol miRNA score 219534_x_at NM_000076 CDKN1C -4.85 -6.35 -7.15 hsa-miR-221 3.78 216894_x_at NM_000076 CDKN1C -2.8 -5.4 -6.75 hsa-miR-221 3.78 216248_s_at NM_006186 NR4A2 -2.7 -4.6 -6.45 hsa-miR-34a 3.81 204621_s_at NM_006186 NR4A2 -2.5 -4.1 -6.2 hsa-miR-34a 3.81 213182_x_at NM_000076 CDKN1C -3.45 -5.8 -5.8 hsa-miR-221 3.78 204622_x_at NM_006186 NR4A2 -2.25 -3.9 -5.3 hsa-miR-34a 3.81 201939_at NM_006622 PLK2 -2.55 -3.15 -5.15 hsa-miR-342 2.61 220266_s_at NM_004235 KLF4 -3.75 -5.1 -5 hsa-miR-135a 2.65 220266_s_at NM_004235 KLF4 -3.75 -5.1 -5 hsa-miR-34a 3.33 221841_s_at NM_004235 KLF4 -3 -4.75 -4.8 hsa-miR-135a 2.65 221841_s_at NM_004235 KLF4 -3 -4.75 -4.8 hsa-miR-34a 3.33 221920_s_at NM_016612 SLC25A37 -1.45 -2.75 -4.55 hsa-miR-221 3.72 208078_s_at NM_173354 SIK1 -3.15 -3.6 -4.15 hsa-miR-137 7.41 219480_at NM_005985 SNAI1 -1.85 -2.65 -4.1 hsa-miR-125a 2.74 219480_at NM_005985 SNAI1 -1.85 -2.65 -4.1 hsa-miR-125b 2.74 206115_ at NM_004430 EGR3 -2.1 -3.65 -3.8 hsa-let-7e 1.41 203036_s_at NM_014751 MTSS1 -4 -3.9 -3.7 hsa-miR-135a 1.92 205214_at NM_004226 STK17B -1.85 -2.45 -3.45 hsa-miR-221 3.83 221986_s_at NM_017644 KLHL24 -1.5 -2.45 -3.4 hsa-let-7e 4.03 201015_s_at NM_002230 JUP -2.1 -2.1 -3.35 hsa-miR-125a 2.91 201015_s_at NM_002230 JUP -2.1 -2.1 -3.35 hsa-miR-125b 2.91 201015_s_at NM_021991 JUP -2.1 -2.1 -3.35 hsa-miR-125a 3.19 201015_s_at NM_021991 JUP -2.1 -2.1 -3.35 hsa-miR-125b 3.19 207433_at NM_000572 IL10 -2.35 -1.9 -3.2 hsa-let-7e 2.56 221763_at NM_004241 JMJD1C -1.45 -2.8 -3.15 hsa-miR-34a 4.07 213348_at NM_000076 CDKN1C -2.55 -4 -3.05 hsa-miR-221 3.78

213 Table S9A - Continued.

relative expression (DC/ monocyte) of target gene in the dataset of Lehtonen et al 2007 Affy_HGU133A RefSeqDNA HGNC t3hr t6hr t24hr DC Pictar ID ID symbol miRNA score 202149_at NM_006403 NEDD9 -2.25 -2.85 -3.05 hsa-miR-125a 2.4 202149_at NM_006403 NEDD9 -2.25 -2.85 -3.05 hsa-miR-125b 2.4 202150_s_at NM_006403 NEDD9 -2.2 -2.9 -2.95 hsa-miR-125a 2.4 202150_s_at NM_006403 NEDD9 -2.2 -2.9 -2.95 hsa-miR-125b 2.4 219132_at NM_021255 PELI2 -1.5 -2.55 -2.95 hsa-miR-125a 1.64 219132_at NM_021255 PELI2 -1.5 -2.55 -2.95 hsa-miR-125b 1.64 219132_at NM_021255 PELI2 -1.5 -2.55 -2.95 hsa-miR-135a 6.31 202932_at NM_005433 YES1 -1.25 -1.9 -2.65 hsa-miR-125a 3.06 202932_at NM_005433 YES1 -1.25 -1.9 -2.65 hsa-miR-125b 3.06 203547_at NM_000616 CD4 -1.05 -2.15 -2.6 hsa-miR-221 2.55 220319_s_at NM_013262 MYLIP -1.9 -2.75 -2.5 hsa-miR-221 2.11 218738_s_at NM_016271 RNF138 -1.55 -1.1 -2.3 hsa-miR-135a 2.84 218738_s_at NM_016271 RNF138 -1.55 -1.1 -2.3 hsa-miR-137 4.15 218738_s_at NM_198128 RNF138 -1.55 -1.1 -2.3 hsa-miR-135a 2.84 218738_s_at NM_198128 RNF138 -1.55 -1.1 -2.3 hsa-miR-137 4.15 201751_at NM_014876 JOSD1 -1.15 -1.6 -2.15 hsa-miR-135a 2.53 202933_s_at NM_005433 YES1 -1.7 -2.6 -2.15 hsa-miR-125a 3.06 202933_s_at NM_005433 YES1 -1.7 -2.6 -2.15 hsa-miR-125b 3.06 212689_s_at NM_018433 KDM3A -1.55 -1.75 -2 hsa-let-7e 3.05 210360_s_at NM_014751 MTSS1 -1.7 -1.95 -2 hsa-miR-135a 1.92 217964_at NM_017775 TTC19 -1.6 -2.1 -1.95 hsa-miR-34a 7.04 209189_at NM_005252 FOS -2.9 -2.45 -1.8 hsa-miR-221 3.38 218205_s_at NM_199054 MKNK2 -2.2 -1.75 -1.8 hsa-miR-125a 4.84 218205_s_at NM_199054 MKNK2 -2.2 -1.75 -1.8 hsa-miR-125b 4.84 202191_s_at NM_003644 GAS7 -1.6 -2.4 -1.75 hsa-let-7e 2.11 202191_s_at NM_003644 GAS7 -1.6 -2.4 -1.75 hsa-miR-135a 3.71 202191_s_at NM_201433 GAS7 -1.6 -2.4 -1.75 hsa-let-7e 2.11 202191_s_at NM_201433 GAS7 -1.6 -2.4 -1.75 hsa-miR-135a 3.71 202192_s_at NM_003644 GAS7 -1.75 -2.1 -1.75 hsa-let-7e 2.11 202192_s_at NM_003644 GAS7 -1.75 -2.1 -1.75 hsa-miR-135a 3.71 202192_s_at NM_201433 GAS7 -1.75 -2.1 -1.75 hsa-let-7e 2.11 202192_s_at NM_201433 GAS7 -1.75 -2.1 -1.75 hsa-miR-135a 3.71

214 Appendix 3

Table S9A - Continued.

relative expression (DC/ monocyte) of target gene in the dataset of Lehtonen et al 2007 Affy_HGU133A RefSeqDNA HGNC t3hr t6hr t24hr DC Pictar ID ID symbol miRNA score 221985_at NM_017644 KLHL24 -1.05 -1.8 -1.65 hsa-let-7e 4.03 204650_s_at NM_006051 APBB3 -1.25 -1.8 -1.6 hsa-let-7e 3.29 204650_s_at NM_133172 APBB3 -1.25 -1.8 -1.6 hsa-let-7e 3.29 204650_s_at NM_133173 APBB3 -1.25 -1.8 -1.6 hsa-let-7e 3.29 204650_s_at NM_133174 APBB3 -1.25 -1.8 -1.6 hsa-let-7e 3.29 203003_at NM_005920 MEF2D -1.45 -1.75 -1.6 hsa-let-7e 2.62 203003_at NM_005920 MEF2D -1.45 -1.75 -1.6 hsa-miR-125a 1.91 203003_at NM_005920 MEF2D -1.45 -1.75 -1.6 hsa-miR-125b 1.91 210359_at NM_014751 MTSS1 -1.7 -1.75 -1.55 hsa-miR-135a 1.92 212430_at NM_017495 RBM38 -2.45 -2.05 -1.55 hsa-let-7e 2.77 212430_at NM_017495 RBM38 -2.45 -2.05 -1.55 hsa-miR-125a 3.5 212430_at NM_017495 RBM38 -2.45 -2.05 -1.55 hsa-miR-125b 3.5 202769_at NM_004354 CCNG2 -1.15 -1.25 -1.4 hsa-miR-135a 3.17 202769_at NM_004354 CCNG2 -1.15 -1.25 -1.4 hsa-miR-137 2.62 202770_s_at NM_004354 CCNG2 -1.65 -1.2 -1.4 hsa-miR-135a 3.17 202770_s_at NM_004354 CCNG2 -1.65 -1.2 -1.4 hsa-miR-137 2.62 211559_s _ at NM_004354 CCNG2 -1.35 -1.2 -1.4 hsa-miR-135a 3.17 211559_s _ at NM_004354 CCNG2 -1.35 -1.2 -1.4 hsa-miR-137 2.62 210872_x_at NM_003644 GAS7 -1.1 -1.75 -1.4 hsa-let-7e 2.11 210872_x_at NM_003644 GAS7 -1.1 -1.75 -1.4 hsa-miR-135a 3.71 210872_x_at NM_201432 GAS7 -1.1 -1.75 -1.4 hsa-let-7e 2.11 210872_x_at NM_201432 GAS7 -1.1 -1.75 -1.4 hsa-miR-135a 3.71 210872_x_at NM_201433 GAS7 -1.1 -1.75 -1.4 hsa-let-7e 2.11 210872_x_at NM_201433 GAS7 -1.1 -1.75 -1.4 hsa-miR-135a 3.71 201416_at NM_003107 SOX4 -1.4 -1.15 -1.4 hsa-miR-221 1.37 201416_at NM_003107 SOX4 -1.4 -1.15 -1.4 hsa-miR-34a 2.18 204897_at NM_000958 PTGER4 -1.6 -1.9 -1.35 hsa-miR-342 2.18 201557_at NM_014232 VAMP2 -1.1 -1.15 -1.15 hsa-miR-34a 8.66 201417_at NM_003107 SOX4 -1.7 -1.85 -1.05 hsa-miR-221 1.37 201417_at NM_003107 SOX4 -1.7 -1.85 -1.05 hsa-miR-34a 2.18 202723_s_at NM_002015 FOXO1 -1.75 -1.55 -1 hsa-miR-135a 1.6

215 Table S9B - Target genes that are down-regulated in DCs relative to Monocytes and that were predicted using TargetScan to be targets of miRNAs that are over-expressed in DCs relative to Monocytes.

Affy_HGU133A RefSeqDNA HGNC t3hr t6hr t24hr DC context ID ID symbol miRNA score 211840_s_at NM _001104631 PDE4D -2.4 -5.4 -5.05 hsa-miR-342-3p -0.6764 208078_s_at NM_173354 SIK1 -3.15 -3.6 -4.15 hsa-miR-137 -0.563 204491_at NM _001104631 PDE4D -2.35 -3 -3.35 hsa-miR-342-3p -0.6764 219132_at NM_021255 PELI2 -1.5 -2.55 -2.95 hsa-miR-135a -0.695 220319_s_at NM_013262 MYLIP -1.9 -2.75 -2.5 hsa-miR-221 -0.46 218738_s_at NM_016271 RNF138 -1.55 -1.1 -2.3 hsa-miR-137 -0.522 210836_x_at NM _001104631 PDE4D -2 -2.4 -2.05 hsa-miR-342-3p -0.6764 212443_at NM_015175 NBEAL2 -1.35 -1.55 -1.55 hsa-miR-125b - 0.411 209020_at NM_016470 C20or f111 -1.35 -1.2 -1.4 hsa-miR-342-3p -0.424 202769_at NM_004354 CCNG2 -1.15 -1.25 -1.4 hsa-miR-137 -0.465 202770_s_at NM_004354 CCNG2 -1.65 -1.2 -1.4 hsa-miR-137 -0.465 211559_s _ at NM_004354 CCNG2 -1.35 -1.2 -1.4 hsa-miR-137 -0.465 210837_s_at NM _001104631 PDE4D -2.05 -2.95 -1.4 hsa-miR-342-3p -0.6764 213022_s_at NM_007124 UTRN -2.3 -1.4 -1.2 hsa-miR-135a -0.577 202723_s_at NM_002015 FOXO1 -1.75 -1.55 -1 hsa-miR-135a -0.54

Additional file 16 Table S10 - Target genes that are down-regulated in monocytes relative to DCs and that were predicted using PicTar or Targetscan, to be targets of miRNAs that are over-expressed in monocytes relative to DCs.

Table S10A - Target genes that are down-regulated in monocytes relative to DCs and that were predicted using PicTar to be targets of miRNAs that are over-expressed in monocytes relative to DCs.

Affy_HGU133A Refseq DNA HGNC relative expression monocyte- Pictar ID ID symbol (DC/monocyte) of miRNA score target gene in the dataset of Lehtonen et al 2007 t3hr t6hr t24hr 211839_s _ at NM_000757 CSF1 6.6 7.4 7.8 hsa-miR-130a 4.22 210557_x_at NM _172211 CSF1 6.05 6.7 6.85 hsa-miR-130a 4.22 208510_s_at NM_138712 PPARG 7.85 7.65 6.45 hsa-miR-130a 4.28 208510_s_at NM _138711 PPARG 7.85 7.65 6.45 hsa-miR-130a 4.28

216 Appendix 3

Table S10A - Continued.

Affy_HGU133A Refseq DNA HGNC relative expression monocyte- Pictar ID ID symbol (DC/monocyte) of miRNA score target gene in the dataset of Lehtonen et al 2007 t3hr t6hr t24hr 208510_s_at NM_015869 PPARG 7.85 7.65 6.45 hsa-miR-130a 4.27 208510_s_at NM_005037 PPARG 7.85 7.65 6.45 hsa-miR-130a 4.27 212298_at NM_003873 NRP1 4.95 6.4 6.2 hsa-miR-130a 2 207082_at NM_000757 CSF1 3.6 5.2 6 hsa-miR-130a 4.22 212765_at NM_203459 CAMSAP1L1 2.3 3.95 3.65 hsa-miR-16 2.23 212765_at NM_203459 CAMSAP1L1 2.3 3.95 3.65 hsa-miR-15b 2.23 201266_at NM_182743 TXNRD1 3 3.6 3.2 hsa-miR-199b 2.37 201266_at NM_182743 TXNRD1 3 3.6 3.2 hsa-miR-199a 2.37 201266_at NM_182742 TXNRD1 3 3.6 3.2 hsa-miR-199b 2.37 201266_at NM_182742 TXNRD1 3 3.6 3.2 hsa-miR-199a 2.37 201266_at NM_182729 TXNRD1 3 3.6 3.2 hsa-miR-199b 2.37 201266_at NM_182729 TXNRD1 3 3.6 3.2 hsa-miR-199a 2.37 201266_at NM_003330 TXNRD1 3 3.6 3.2 hsa-miR-199b 2.37 201266_at NM_003330 TXNRD1 3 3.6 3.2 hsa-miR-199a 2.37 217196_s_at NM_203459 CAMSAP1L1 2.3 3.55 3.1 hsa-miR-16 2.23 217196_s_at NM_203459 CAMSAP1L1 2.3 3.55 3.1 hsa-miR-15b 2.23 204401_at NM_002250 KCNN4 1.45 1.75 3 hsa-miR-16 4.31 204401_at NM_002250 KCNN4 1.45 1.75 3 hsa-miR-15b 4.31 200900_s_at NM_002355 M6PR 1.25 2.35 2.85 hsa-miR-199b 3.95 200900_s_at NM_002355 M6PR 1.25 2.35 2.85 hsa-miR-199a 3.95 200900_s_at NM_002355 M6PR 1.25 2.35 2.85 hsa-miR-130a 3.67 207233_s_at NM_198178 MITF 1.5 2.5 2.8 hsa-miR-199b 2.45 207233_s_at NM_198178 MITF 1.5 2.5 2.8 hsa-miR-199a 2.45 207233_s_at NM_198177 MITF 1.5 2.5 2.8 hsa-miR-199b 2.45 207233_s_at NM_198177 MITF 1.5 2.5 2.8 hsa-miR-199a 2.45 207233_s_at NM_198159 MITF 1.5 2.5 2.8 hsa-miR-199b 2.45 207233_s_at NM_198159 MITF 1.5 2.5 2.8 hsa-miR-199a 2.45 207233_s_at NM_198158 MITF 1.5 2.5 2.8 hsa-miR-199b 2.45 207233_s_at NM_198158 MITF 1.5 2.5 2.8 hsa-miR-199a 2.45 207233_s_at NM_006722 MITF 1.5 2.5 2.8 hsa-miR-199b 2.45 207233_s_at NM_006722 MITF 1.5 2.5 2.8 hsa-miR-199a 2.45 207233_s_at NM_000248 MITF 1.5 2.5 2.8 hsa-miR-199b 2.45 207233_s_at NM_000248 MITF 1.5 2.5 2.8 hsa-miR-199a 2.45

217 Table S10A - Continued.

Affy_HGU133A Refseq DNA HGNC relative expression monocyte- Pictar ID ID symbol (DC/monocyte) of miRNA score target gene in the dataset of Lehtonen et al 2007 t3hr t6hr t24hr 200961_at NM_012248 SEPHS2 1.2 2 2.55 hsa-miR-150 2.88 203795_s_at NM_020993 BCL7A 2.4 2.6 2.35 hsa-miR-130a 1.99 214179_s_at NM_003204 NFE2L1 1.05 1.95 2.1 hsa-miR-199b 7.63 214179_s_at NM_003204 NFE2L1 1.05 1.95 2.1 hsa-miR-199a 7.63 214179_s_at NM_003204 NFE2L1 1.05 1.95 2.1 hsa-miR-16 3.05 214179_s_at NM_003204 NFE2L1 1.05 1.95 2.1 hsa-miR-15b 3.05 201024_x_at NM_015904 EIF5B 1.05 1.95 2.1 hsa-miR-199b 4.16 210896_s_at NM_032468 ASPH 3 1.45 2 hsa-miR-16 2.45 210896_s_at NM_032468 ASPH 3 1.45 2 hsa-miR-15b 2.45 210896_s_at NM_032466 ASPH 3 1.45 2 hsa-miR-16 2.46 210896_s_at NM_032466 ASPH 3 1.45 2 hsa-miR-15b 2.46 203935_at NM _001105 ACVR1 3.3 3 1.95 hsa-miR-130a 6.52 209455_at NM_033644 FBXW11 1.95 1.55 1.9 hsa-miR-199b 2.21 209455_at NM_033644 FBXW11 1.95 1.55 1.9 hsa-miR-199a 2.21 209455_at NM_033644 FBXW11 1.95 1.55 1.9 hsa-miR-150 1.97 209455_at NM_012300 FBXW11 1.95 1.55 1.9 hsa-miR-199b 2.21 209455_at NM_012300 FBXW11 1.95 1.55 1.9 hsa-miR-199a 2.21 209455_at NM_012300 FBXW11 1.95 1.55 1.9 hsa-miR-150 1.97 217993_s_at NM_182796 MAT2B 1.05 1.3 1.85 hsa-miR-130a 2.7 217993_s_at NM_013283 MAT2B 1.05 1.3 1.85 hsa-miR-130a 2.7 200614_at NM_004859 CLTC 1.6 1.9 1.85 hsa-miR-130a 2.29 200020_at NM_007375 TARDBP 1.6 2 1.7 hsa-miR-130a 2.02 201176_s _ at NM_001655 ARCN1 1.2 1.65 1.7 hsa-miR-199b 3.25 201176_s _ at NM_001655 ARCN1 1.2 1.65 1.7 hsa-miR-199a 3.25 201176_s _ at NM_001655 ARCN1 1.2 1.65 1.7 hsa-miR-16 4.32 201176_s _ at NM_001655 ARCN1 1.2 1.65 1.7 hsa-miR-15b 4.32 218242_s_at NM_017635 SUV420H1 1.25 1.8 1.65 hsa-miR-130a 2.12 218761_at NM_017610 RNF111 1.05 1.65 1.45 hsa-miR-16 2.29 218761_at NM_017610 RNF111 1.05 1.65 1.45 hsa-miR-15b 2.29 205479_s_at NM_002658 PLAU 1.5 2.5 1.45 hsa-miR-199b 4.42 205479_s_at NM_002658 PLAU 1.5 2.5 1.45 hsa-miR-199a 4.42 221568_s_at NM_018362 LIN7C 1.05 1.65 1.45 hsa-miR-199b 4.79 221568_s_at NM_018362 LIN7C 1.05 1.65 1.45 hsa-miR-199a 4.79

218 Appendix 3

Table S10A - Continued.

Affy_HGU133A Refseq DNA HGNC relative expression monocyte- Pictar ID ID symbol (DC/monocyte) of miRNA score target gene in the dataset of Lehtonen et al 2007 t3hr t6hr t24hr 201089_at NM_001693 ATP6V1B2 1.75 1.75 1.45 hsa-miR-130a 2.02 217939_s_at NM_203437 AFTPH 1.15 1.25 1.45 hsa-miR-199b 3.46 217939_s_at NM_203437 AFTPH 1.15 1.25 1.45 hsa-miR-199a 3.46 217939_s_at NM_017657 AFTPH 1.15 1.25 1.45 hsa-miR-199b 3.46 217939_s_at NM_017657 AFTPH 1.15 1.25 1.45 hsa-miR-199a 3.46 217939_s_at NM_001002243 AFTPH 1.15 1.25 1.45 hsa-miR-199b 3.46 217939_s_at NM_001002243 AFTPH 1.15 1.25 1.45 hsa-miR-199a 3.46 205327_s_at NM_001616 ACVR2A 1.05 1.25 1.45 hsa-miR-16 8.44 205327_s_at NM_001616 ACVR2A 1.05 1.25 1.45 hsa-miR-15b 8.44 216199_s_at NM_006724 MAP3K4 2 2.45 1.4 hsa-miR-16 3.15 216199_s_at NM_006724 MAP3K4 2 2.45 1.4 hsa-miR-15b 3.15 216199_s_at NM_005922 MAP3K4 2 2.45 1.4 hsa-miR-16 3.15 216199_s_at NM_005922 MAP3K4 2 2.45 1.4 hsa-miR-15b 3.15 212297_at NM_024524 ATP13A3 1.95 2.1 1.35 hsa-miR-16 3.1 212297_at NM_024524 ATP13A3 1.95 2.1 1.35 hsa-miR-15b 3.1 219496_at NM_023016 ANKRD57 4.05 3.4 1.35 hsa-miR-16 2.12 219496_at NM_023016 ANKRD57 4.05 3.4 1.35 hsa-miR-15b 2.12 209238_at NM_004177 STX3 1.05 1.25 1.3 hsa-miR-130a 3.08 211668 _s _ at NM_002658 PLAU 1.45 2.4 1.25 hsa-miR-199b 4.42 211668 _s _ at NM_002658 PLAU 1.45 2.4 1.25 hsa-miR-199a 4.42 202370_s_at NM_022845 CBFB 1.2 1.3 1.2 hsa-miR-130a 3.06 202370_s_at NM_001755 CBFB 1.2 1.3 1.2 hsa-miR-130a 3.01 218056_at NM_016561 BFAR 1.05 1.65 1.2 hsa-miR-16 2.9 218056_at NM_016561 BFAR 1.05 1.65 1.2 hsa-miR-15b 2.9 201998_at NM_173217 ST6GAL1 1.3 1.6 1.15 hsa-miR-199b 4.39 201998_at NM_173217 ST6GAL1 1.3 1.6 1.15 hsa-miR-199a 4.49 201998_at NM_173216 ST6GAL1 1.3 1.6 1.15 hsa-miR-199b 4.39 201998_at NM_173216 ST6GAL1 1.3 1.6 1.15 hsa-miR-199a 4.49 201998_at NM_003032 ST6GAL1 1.3 1.6 1.15 hsa-miR-199b 4.39 201998_at NM_003032 ST6GAL1 1.3 1.6 1.15 hsa-miR-199a 4.49 209341_s_at NM_001556 IKBKB 1.2 1.85 1.1 hsa-miR-199b 3.74 209341_s_at NM_001556 IKBKB 1.2 1.85 1.1 hsa-miR-199a 3.74 207719_x_at NM_014812 CEP170 1.95 1.2 1.1 hsa-miR-130a 2.8

219 Table S10A - Continued.

Affy_HGU133A Refseq DNA HGNC relative expression monocyte- Pictar ID ID symbol (DC/monocyte) of miRNA score target gene in the dataset of Lehtonen et al 2007 t3hr t6hr t24hr 204089_x_at NM_006724 MAP3K4 1.95 2.65 1 hsa-miR-16 3.15 204089_x_at NM_006724 MAP3K4 1.95 2.65 1 hsa-miR-15b 3.15 204089_x_at NM_005922 MAP3K4 1.95 2.65 1 hsa-miR-16 3.15 204089_x_at NM_005922 MAP3K4 1.95 2.65 1 hsa-miR-15b 3.15

Table S10B - Target genes that are down-regulated in monocytes relative to DCs and that were predicted using TargetScan to be targets of miRNAs that are over-expressed in monocytes relative to DCs.

Affy_HGU133A Refseq DNA HGNC t3hr t6hr t24hr monocyte- Context ID ID symbol miRNA score 209921_at NM_014331 SLC7A11 5.2 5.1 4.5 hsa-miR-199a-3p -0.503 217678_at NM_014331 SLC7A11 4.75 4.8 4.15 hsa-miR-199a-3p -0.503 212158_at NM_002998 SDC2 2.2 3.8 4.05 hsa-miR-199a-3p -0.486 212157_at NM_002998 SDC2 1.3 3.7 4.05 hsa-miR-199a-3p -0.486 212154_at NM_002998 SDC2 1.75 4.1 3.9 hsa-miR-199a-3p -0.486 207528_s_at NM_014331 SLC7A11 5.15 5.65 3.45 hsa-miR-199a-3p -0.503 204032_at NM_003567 BCAR3 2.7 2.45 2.65 hsa-miR-199a-3p -0.825 201972_at NM_001690 ATP6V1A 1.65 1.85 2.2 hsa-miR-199a-3p -0.558 201971_s_at NM_001690 ATP6V1A 1.35 2.45 2.1 hsa-miR-199a-3p -0.558 222262_s_at NM_018638 ETNK1 1.35 2.25 2 hsa-miR-16 -0.52 222262_s_at NM_018638 ETNK1 1.35 2.25 2 hsa-miR-199a-3p -0.66 219017_at NM_018638 ETNK1 2 2.55 2 hsa-miR-16 -0.52 219017_at NM_018638 ETNK1 2 2.55 2 hsa-miR-199a-3p -0.66 212888_at NM_030621 DICER1 1.1 2.3 1.75 hsa-miR-130a -0.579 205327_s_at NM_001616 ACVR2A 1.05 1.25 1.45 hsa-miR-199a-3p -0.546 216199_s_at NM_005922 MAP3K4 2 2.45 1.4 hsa-miR-199a-3p -0.51 200915_x_at NM_182926 KTN1 1.6 1.45 1.4 hsa-miR-199a-3p -0.425 213229_at NM_030621 DICER1 1.15 2.4 1.3 hsa-miR-130a -0.579 214709_s_at NM_182926 KTN1 1.55 1.5 1.1 hsa-miR-199a-3p -0.425 204089_x_at NM_005922 MAP3K4 1.95 2.65 1 hsa-miR-199a-3p -0.51

220 Appendix 3

Additional file 17

Table S11 - Elements in the Venn diagram intersection of Figure S6D to S6F.

Refseq ID Gene symbol Common 6 elements in "DC.pic.tar" and "Downreg24" (same at 3 and 6hr) NM_021255 PELI2 NM_016271 RNF138 NM_013262 MYLIP NM_173354 SIK1 NM_002015 FOXO1 NM_004354 CCNG2

Common 2 elements in "Mono.pic.tar" and "Upreg24"(same at 3 and 6hr) NM_005922 MAP3K4 NM_001616 ACVR2A

Common 9 elements in "DC.pic.tar" and "Upreg24"(same at 3 and 6hr) NM_016644 PRR16 NM_003901 SGPL1 NM_007375 TARDBP NM_003615 SLC4A7 NM_000248 MITF NM_016040 TMED5 NM_014290 TDRD7 NM_003888 ALDH1A2 NM_032804 ADO

Common 3 elements in "Mono.pic.tar" and "Downreg24"(same at 3 and 6hr) NM_003453 ZMYM2 NM_205768 ZNF238 NM_014456 PDCD4

221 Curriculum vitae (English)

Curriculum vitae (English)

Iziah Edwin Sama was born on February 10, 1976 in Kombone, Cameroon. He obtained the First School Leaving Certificate in 1990. In 1997, he completed his high school education at Bilingual Grammar School in Buea, Cameroon. Thereafter, he went to the Anglo-Saxon style university in the neighbourhood; the University of Buea, from whence he obtained - in 2000 - a Bachelor of Science (Hons) degree in the area of Biochemistry and Medical Laboratory Technology. Poised for further training and to immediately help in the local health issues, Iziah volunteered as a Med. Lab. Tech. at the diagnostics unit of the Solidarity Clinic in Buea. While still working at the clinic, Iziah’s perpetual quest for further education resulted in a Master of Science fellowship in Bioinformatics - in 2003 - from the University of Wageningen and Research Centrum (WUR), the Netherlands. He graduated, in 2005, with a Master of Science degree in Bioinformatics from WUR. His internship work entitled “Annotation and haplotype comparison of a G-protein coupled receptor region in Phytophthora infestans”, was completed under the auspices of Prof Francine Govers at the department of Phytopathology, WUR, in February 2005. His thesis work, entitled “Deciphering phos- phorylation sites in the Arabidopsis Somatic Embryogenesis Receptor-like kinase 1(AtSERK1) and its interactors in the AtSERK1 signal transduction cascade” was completed in August 2005 under the auspices of Prof Sacco de Vries at the department of Biochemistry, WUR. In December 2005, Iziah joined the group of Prof Martijn Huynen - at the Centre for Molecular and Biomolecular Informatics, Radboud University Nijmegen Medical Centre - for a PhD project aimed at unravelling host cell immune responses to respiratory viral infections. Iziah’s main scientific interest has evolved to Applied Bioinformatics to decipher biomedical problems. Since September 2011, Iziah is working as a Bioinformatician in the group of Prof. Dr. Carine Carels at the department of orthodontics, Radboud University Nijmegen Medical Centre, Nijmegen. Iziah is happily married and is a father of one.

222 Curriculum vitae (Nederlands)

Curriculum vitae (Nederlands)

Iziah Edwin Sama is geboren op 10 februari 1976 in Kombone, Kameroen. Zijn First School Leaving Certificate (basisschool-diploma) behaalde hij in 1990. In 1997 rondde hij de high school (middelbaar onderwijs) af op de Bilingual Grammar School in Buea, Kameroen. Daarna ging hij naar de Angelsaxische universiteit in diezelfde plaats; de Universiteit van Buea, waar hij in 2000 zijn bachelor behaalde in Biochemie en Medische Laboratorium Techniek. Gereed om verder te leren en om direct betrokken te zijn bij de lokale gezondheidsproblemen, werkte Iziah als vrijwilliger bij de diagnostiek afdeling van het medisch laboratorium van de Solidarity Clinic in Buea. Terwijl hij in deze kliniek werkte, resulteerde in 2003 Iziahs onophoudelijke zoektocht naar een vervolgopleiding in een studiebeurs voor een master-opleiding in Bioinformatica aan de universiteit van Wageningen (WUR) in Nederland. In 2005 behaalde hij zijn master-diploma. Tijdens zijn stageperiode aan de WUR in 2005 schreef hij “Annotation and haplotype comparison of a G-protein coupled receptor region in Phytophthora infestans”, onder supervisie van professor Francine Govers van de faculteit voor Phytopathologie. Zijn afstudeerscriptie getiteld “Deciphering phosphorylation sites in the Arabidopsis Somatic Embryogenesis Receptor-like kinase 1(AtSERK1) and its interactors in the AtSERK1 signal transduction cascade” rondde hij af in 2005 onder supervisie van professor Sacco de Vries aan de faculteit voor Biochemie, WUR. In december 2005 kreeg Iziah een aanstelling bij de groep van professor Martijn Huynen –Centrum voor Moleculaire and Biomoleculaire Informatica, Radboud Universiteit Nijmegen Medisch Centrum voor een promotieon- derzoek met als doel om de reactie van de gastheercel op virusinfecties in de luchtwegen te onderzoeken. Iziah is met name geïnteresseerd in de toegepassing van Bioinformatica bij biomedische vraagstukken. Sinds september 2011 werkt Iziah als bio- informaticus voor de groep van professor Carine Carels bij de afdeling Orthodontie, Radboud Universiteit Nijmegen Medisch Centrum. Iziah is gelukkig getrouwd en heeft een dochtertje.

223 Publications

Publications

1. Quantitative proteome proling of respiratory virus-infected lung epithelial cells. Angela van Diepen, H. Kim Brand, Iziah Sama, Lambert H.J. Lambooy , Lambert P. van den Heuvel, Leontine van der Well, Martijn Huynen, Albert D.M.E. Osterhaus, Arno C. Andeweg, Peter W.M. Hermans. J Proteomics. 2010 Aug 5; 73(9): 1680-93

2. Measuring the physical cohesiveness of proteins using physical interaction enrichment. Iziah Edwin Sama and Martijn A. Huynen. Bioinformatics. 2010 Nov 1;26(21):2737-43.

3. Transcriptomic assessment of cellular response to intracellular pathogens reveals an iron-depletion prole. Iziah Edwin Sama, Apriliana E. R. Kartikasari, Dorine W. Swinkels, Arno Andeweg, Martijn A. Huynen. (Manuscript in preparation)

4. MicroRNA genes preferentially expressed in dendritic cells contain sites for conserved transcription factor binding motifs in their promoters. Bastiaan J.H. Jansen*, Iziah E. Sama*, Dagmar Eleveld-Trancikova, Maaike A. van Hout-Kuijer, Joop H. Jansen, Martijn A. Huynen, Gosse J. Adema. BMC Genomics. 2011 Jun 27;12(1):330.

5. Transcriptome kinetics of circulating neutrophils during human experimental endotoxemia. Stan de Kleijn, Matthijs Kox, Iziah Edwin Sama, Janesh Pillay, Martijn A. Huijnen, Johannes G. van der Hoeven, Gerben Ferwerda, Peter W.M. Hermans, Peter Pickkers. PLoS One. 2012;7(6):e38255. Epub 2012 Jun 5.

6. Analysis of genes regulated by the transcription factor LUMAN identies ApoA4 as a target gene in dendritic cells. Sanecka A, Ansems M, van Hout-Kuijer MA, Looman MW, Prosser AC, Welten S, Gilissen C, Sama IE, Huynen MA, Veltman JA, Jansen BJ, Eleveld-Trancikova D, Adema GJ. Mol Immunol. 2011 Dec 30. [Epub ahead of print]

*Contributed equally

224