Open Access Research2005BekpenetVolume al. 6, Issue 11, Article R92

The interferon-inducible p47 (IRG) GTPases in vertebrates: loss of comment the cell autonomous resistance mechanism in the human lineage Cemalettin Bekpen*, Julia P Hunn*, Christoph Rohde*, Iana Parvanova*, Libby Guethlein*§, Diane M Dunn†, Eva Glowalla*¶, Maria Leptin*‡ and Jonathan C Howard*

* †

Addresses: Institute for Genetics, University of Cologne, Zülpicher Strasse 47, 50674 Cologne, Germany. Eccles Institute of Human Genetics, reviews University of Utah, Salt Lake City, UT 84112-5330, USA. ‡Informatics & Systems Groups, Sanger Centre, The Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA UK. §Department of Structural Biology, Stanford University Medical School, Stanford, CA 94305, USA. ¶Institute for Microbiology and Immunology, University of Cologne Medical School, 50935 Cologne, Germany.

Correspondence: Jonathan C Howard. E-mail: [email protected]

Published: 31 October 2005 Received: 4 June 2005 Revised: 7 September 2005

Genome Biology 2005, 6:R92 (doi:10.1186/gb-2005-6-11-r92) reports Accepted: 7 October 2005 The electronic version of this article is the complete one and can be found online at http://genomebiology.com/2005/6/11/R92

© 2005 Bekpen et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Vertebrate

Athat mice survey and p47 of humans GTPasesp47 GTPases deploy in their several immune vertebrate resources organisms against show vacuolars that pathogens humans lack in radically a p47 GTPase-based different ways.

resistance system, suggesting research deposited

Abstract

Background: Members of the p47 (immunity-related GTPases (IRG) family) GTPases are essential, interferon-inducible resistance factors in mice that are active against a broad spectrum of important intracellular pathogens. Surprisingly, there are no reports of p47 function in humans. refereed research

Results: Here we show that the p47 GTPases are represented by 23 in the mouse, whereas humans have only a single full-length p47 GTPase and an expressed, truncated presumed pseudo- . The human full-length gene is orthologous to an isolated mouse p47 GTPase that carries no interferon-inducible elements in the promoter of either species and is expressed constitutively in the mature testis of both species. Thus, there is no evidence for a p47 GTPase-based resistance system in humans. Dogs have several interferon-inducible p47s, and so the primate lineage that led to humans appears to have lost an ancient function. Multiple p47 GTPases are also present in the

zebrafish, but there is only a tandem p47 gene pair in pufferfish. interactions

Conclusion: Mice and humans must deploy their immune resources against vacuolar pathogens in radically different ways. This carries significant implications for the use of the mouse as a model of human infectious disease. The absence of the p47 resistance system in humans suggests that possession of this resistance system carries significant costs that, in the primate lineage that led to humans, are not outweighed by the benefits. The origin of the vertebrate p47 system is obscure. information

Background (for review, see Mestas and Hughes [1]). The p47 (immunity- It is generally assumed that the immune system of the mouse related GTPases (IRG) family; see Nomenclature, below) is a good experimental model for that in humans. However, GTPases present a uniquely striking example of this several studies suggest that immune mechanisms have been divergence. evolving rather differently in the human and mouse lineages

Genome Biology 2005, 6:R92 R92.2 Genome Biology 2005, Volume 6, Issue 11, Article R92 Bekpen et al. http://genomebiology.com/2005/6/11/R92

In mice the interferon-γ-inducible p47 GTPases constitute amphipathic helix in the subterminal domain [6]. We recently one of the most powerful resistance systems against several showed that IIGP1 participates in a novel effector mechanism important intracellular pathogens [2-4]. The localize in Toxoplasma gondii infected astrocytes involving vesicula- on intracellular membrane systems in interferon-induced tion and ultimately destruction of the parasitophorous vacu- cells, some (IGTP, IIGP1) favoring the endoplasmic reticulum ole membrane [8]. In contrast, there is evidence that LRG-47 [5,6] and others (LRG-47, GTPI) the Golgi membranes [6,7] is involved in accelerated acidification of the phagocytic vac- (for names of individual IRG GTPases see Additional data file uole containing Mycobacterium tuberculosis [8]. 1). Infection or phagocytosis, however, initiates redistribution of the p47 GTPases to the phagocytic vacuole [6-8]. The p47 The p47 GTPases are thus a functionally diverse resistance GTPases probably act specifically against vacuolar pathogens. system with many signs of adaptive divergent evolution. Sur- Thus, Gram-positive and Gram-negative bacteria, mycobac- prisingly, there are no reports of p47 GTPase function in teria, and protozoal pathogens are all resisted by the p47 humans. To address this imbalance, we analyzed the p47 GTPases, whereas no viral target has yet been confirmed. GTPase gene family in depth. We conclude that although the mouse has 23 p47 GTPases, of which up to 20 may be func- The p47 GTPase IIGP1 is a low-affinity nucleotide binding tional in resistance, the resistance system is entirely absent with a slow GTP turnover [9]. At high protein concen- from humans. This finding carries important implications for trations and in the presence of GTP, IIGP1 oligomerizes and our understanding of human and mouse immunity to vacu- increases GTP turnover by up to 20-fold. These properties are olar pathogens. distinct from those of the classical signaling GTPases and are reminiscent of the dynamins and p65 (GBP-1) GTPases [10,11]. The crystal structure of IIGP1 exhibits a H-Ras-1-like Results nucleotide-binding domain flanked by amino-terminal and Genomic organization of the p47 GTPase (Irg) genes of carboxyl-terminal helical domains that are unknown in other the C57BL/6 mouse GTPases [12]. This basic structure is probably common to the There are 23 p47 GTPase (Irg) genes in the C57BL/6 mouse, whole family. However, the divergent sequences of published including the six previously known members of the family p47 GTPases [13] and the patterns of susceptibility in knock- [13], localized on 7, 11 and 18 (Figure 1a,b; also out strains (for reviews, see Taylor [2] and MacMicking [3,4]) see Figure 7a). (For the nomenclature of the Irg genes, see show that the proteins are highly diversified. Thus, a sub- Nomenclature (below) and Additional data file 1). Two of the group of three proteins (the GMS GTPases) have a radical mouse Irg sequences, namely Irga5 and Irgb7, are clearly substitution (the substitution of Methinine (M) for Lysine pseudo-genes (see legend to Figure 1b). The remaining 21 Irg (K)) in the conserved P-loop G1 motif of the nucleotide bind- genes are intact across the GTP-binding domain, although ing site (Walker A motif) and correlated sequence variation Irga1, Irga8, and Irgb10 are carboxyl-terminally truncated elsewhere in the G-domain [13], implying a distinct catalytic relative to the majority, and no transcripts of Irga7 and Irgb8 mechanism for GTP hydrolysis. In the case of IIGP1 and LRG- have yet been found. Thus, the number of potentially func- 47, the cell biology of the two proteins is distinct; IIGP1 asso- tional Irg genes is not six but rather 21 in the C57/BL6 mouse. ciates with the endoplasmic reticulum membrane primarily The nucleotide and protein sequences of these genes can be through an amino-terminal myristoylation sequence, found on our home page [14]. whereas LRG-47 associates with Golgi membrane via an

FigureGenomic 1 positioning(see following and page) phylogenetic relationship of mouse Irg GTPases Genomic positioning and phylogenetic relationship of mouse Irg GTPases. (a) Disposition of the 23 Irg genes on the mouse karyotype. Individual Irg genes are listed in correct gene order in each cluster. (b) Positioning and orientation of Irg genes in the mouse 11 and 18 clusters. Positions of genes refer to the location in Mouse ENSEMBL release (v28.33d.1, February 2005) [61] of the first G of the glycine codon of the G1 motif (GKS or GMS) of the GTP-binding domain of each gene. The segments of the cluster indicated with square brackets are regions of uncertain structure. Gene orientation is given by black arrows. The shaded region of the chromosome 11 map is a duplication introduced in Mouse ENSEMBL v28.33d.1 (February 2005) in an attempt to resolve a region of high ambiguity indicated by the longer square bracket. In our view this duplication does not resolve the ambiguities consistently, and we see no justification at present for the duplicated Irgb5 and Irgb6 genes. The sibling genes Irgb3 and Irgb4 differ by only nine nucleotides; in this case, however, the independent existence of the two genes is proved by the proximity of the PA28βψ retropositioned pseudo- gene to Irgb3 but not to Irgb4, in addition to consistent sequence differences. We have left the duplication of the Irgb5/Irgb6 region in the map for consistency of the base numbering with this release of ENSEMBL. *Indicates minor sequence differences presumably due to sequencing errors. (c) Unrooted tree (p-distance based on neighbour-joining method) of nucleotide sequences of the G-domains of the 23 mouse Irg GTPases, including the two presumed pseudo-genes Irga5 and Irgb7. The sources of all Irg sequences are given in Additional data file 1, and the nucleotide and amino acid sequences themselves are collected in the p47 (IRG) GTPase database from our laboratory website [14]. (d) Phylogenetic tree of the amino acid sequences of the G- domains of 21 mouse Irg GTPases rooted on the G-domain of H-Ras-1 (accession number: P01112). The products of the two presumed pseudo-genes Irga5 and Irgb7 are excluded from the analysis.

Genome Biology 2005, 6:R92 http://genomebiology.com/2005/6/11/R92 Genome Biology 2005, Volume 6, Issue 11, Article R92 Bekpen et al. R92.3 comment

(a) (b)

Mouse chromosome 11 kb 0 10 20 30 40 50 60 70 8090 100 110 120 130 140 150 160 170 180 190 200 210 220 230 0 10 20 30 40 // 48.500 57.835 Mb

506.504529.186 535.038 547.950 555.881 585.853 588.017 607.444 627.410 659.863 679.291 699.281710.592 735.402 800.523 818.817 832.237 kb reviews //

Irgm1 Irgb8Irgb9 Irgb1 Irgb2 Pa28 Irgb3 Irgb5 Irgb6* Irgb4 Irgb5 Irgb6 Irgb7 Irgd Irgb10 Irgm3 Irgm2 LRG47 TGTP* TGTP IRG47 IGTP GTPI

Mouse chromosome 18 kb 0 10 20 30 40 50 60 70 8090 100 110 120 130 140 150 160 170 180 190 200 210 220 230 240

60.730 60.970 Mb 736.672 761.222 786.364 815.677 840.662 877.357 905.741 957.685 kb reports Irga1 Irga2 Irga3 Irga4 Irga5 Irga7 Irga6 Irga8 IIGP

(c) (d) research deposited

Irga6 Irgb7 Irgb5 Irgb3 Irga1 Ψ Ψ Irgb9 Irga2 Irgb4 Irgb2 Irga5 Irgb8 Irgb1 Irga3 Irgb6 Irgb6 Irga8 Irgb2 Irgb5 Irgb9 refereed research Irga4 Irgb1 Irgb10 Irga7 Irga6 Irgb8 Irgb3 Irga1 Irga2 Irgb4 Irgb10 Irga4 Irga7 Irga3 Irgd Irga8 Irgd

Irgc interactions

Irgc Irgm1 Irgm2 Irgm1

Irgm2 Irgm3 Irgm3 H-Ras-1 0.05 0.1 information

Figure 1 (see legend on previous page)

Genome Biology 2005, 6:R92 R92.4 Genome Biology 2005, Volume 6, Issue 11, Article R92 Bekpen et al. http://genomebiology.com/2005/6/11/R92

The complex block of 13 genes on chromosome 11 contains (IRG47) [15], in which an active interferon-stimulated the most divergent sequences (Figure 1c,d; Additional data response element (ISRE) was found upstream of the putative file 2), including all three GMS (Irgm) GTPases [13], suggest- transcriptional start point. A GAS (γ-activated sequence) site ing that this cluster is relatively ancient. In contrast, the eight was predicted in the putative promoter region of Irgm1 (LRG- Irga genes clustered on chromosome 18 are also clustered 47) [8]. Most of the transcribed p47 genes on chromosomes 11 phylogenetically, suggesting more recent divergence, proba- and 18 exhibit multiple perfect interferon-inducible genomic bly from a translocated member of the Irgb (TGTP) cluster on motifs, both ISRE and GAS elements (Figure 2b; Additional chromosome 11. The isolated Irg gene on chromosome 7, data file 3). The sequences and relative positions of the GAS Irgc, is an ancient root with no obvious systematic relation- and ISRE elements vary, both classes of site are not present in ship to the other subfamilies. Within the chromosomal clus- all promoters, and the orientations of the two components are ters, more recent duplication events are apparent. The sibling also variable. Thus, the association of interferon-inducible pair Irgb3 and Irgb4 differ by only nine nucleotides in the elements with Irg genes is presumably ancient and has been open reading frame. The genes Irgb1, Irgb3, Irgb4, and Irgb8 retained against the disruptive forces of spontaneous genome appear to have been duplicated in tandem with Irgb2, Irgb5, evolution. No further immunity-related inducible elements and Irgb9, respectively. The pattern of divergence in the such as NFκB sites were found to be associated with the mouse p47 tree suggests an old gene family that has under- ISRE/GAS motifs. Irgd and Irga6 are both transcribed from gone a succession of duplication-divergence cycles over time alternative 5'-untranslated exons, each furnished with an - a pattern of evolution that is still actively continuing in sev- independent promoter. In both genes the initial methionine is eral of the subfamilies. encoded at the beginning of the long 3' exon, so that the two transcripts of each gene generate identical proteins. Both The structure of p47 GTPase genes and their splicing putative promoters of Irgd and Irga6 have interferon-induc- patterns ible elements. As noted above, genes Irgb1, Irgb4, and Irgb8 The open reading frame of Irg genes is typically encoded on a are probably expressed only as the 3' ends of tandem tran- single long 3' exon (Figure 2a) behind one or more 5'-untrans- scripts with Irgb2, Irgb5, and Irgb9, respectively. No dedi- lated exons. However, in one splice form of Irgm1 and one cated 5'-untranslated exons could be identified for these splice form of Irgm2 the initial methionine is encoded at the downstream domains. Using RT-PCR we were able to show 3' end of the penultimate exon (also see the legend to Figure clear induction of eight further genes (Irga2, Irga3, Irga4, 2). The closely related Irgb1 and Irgb4 genes are exceptional Irga8, Irgb1, Irgb2, Irgb5, and Irgb10) in addition to the six in apparently occurring only as tandem transcripts in-frame (Irga6 (IIGP), Irgb6 (TGTP), Irgd (IRG-47), Irgm1 (LRG- with their respective closely linked upstream genes Irgb2 and 47), Irgm2 (GTPI) and Irgm3 (IGTP)) assayed by Boehm and Irgb5. If translated, such transcripts would generate 94 kDa coworkers [13] in L929 fibroblasts stimulated with polypeptides containing two distinct full-length p47 GTPase interferon-γ in vitro (Figure 3a). units. For the sequence phylogenies and alignments (Figure 1c,d; also see Figure 4, below), we provisionally treat these The isolated p47 gene, Irgc, on chromosome 7 is a clear separate p47 units as independent genes. It remains to be exception. No clustered or isolated ISRE or GAS elements seen whether the third tandem gene pair, Irgb9 and Irgb8, is could be identified up to 10 kilobases (kb) 5' of the putative also expressed as a tandem transcript. That Irgb1, Irgb3, and transcription start of this transcribed gene, and Irgc was not possibly Irgb8 are normally expressed in tandem with an induced in interferon-stimulated fibroblasts (Figure 3b, upstream gene is also consistent with the absence both of panel i left). A weak Sox-related element was detected in the autonomous transcripts of these exons and of interferon- proximal promoter region. In view of the close homology of inducible promoter elements (see below). Irgc to the interferon-inducible Irg genes, we considered whether Irgc is induced in tissues of mice 24 hours after Identification of interferon-stimulatable elements in infection with Listeria monocytogenes [13,16]. No induction putative promoters of Irg genes of Irgc was detected in liver, lung, or spleen after 50 cycles of The basis for interferon-inducible expression of the mouse amplification, whereas Irga2, used as a positive control, was p47 GTPases has previously been investigated only for Irgd induced in all three tissues (Figure 3b; panel i right). How-

GenomicFigure 2 and(see promoterfollowing page) structure of mouse Irg GTPases Genomic and promoter structure of mouse Irg GTPases. (a) Genomic structure of mouse Irg genes. Green blocks indicate coding exons and blue blocks indicate 5'-untranslated exons. Orange arrows identify putative promoter regions. Stars identify exons shown to be excluded in alternative splice forms. The scale bar is measured in base pairs up to the first base of the long coding exon. Note the presence of two promoters for Irga6 and Irgd. (b) Interferon response elements in the promoter regions of mouse Irg genes. γ-Activated sequences (GAS; pale blue blocks) and interferon-stimulated response element (ISRE; red blocks) sequences were identified in the promoters shown in panel a (also see Additional data file 7). Dark blue blocks downstream of each promoter represent the most 5' exon. The yellow block identifies a putative Sox1 transcription factor binding site in the proximal promoter region of Irgc. The scale bar is measured in base pairs from the first base of the 5' exon.

Genome Biology 2005, 6:R92 http://genomebiology.com/2005/6/11/R92 Genome Biology 2005, Volume 6, Issue 11, Article R92 Bekpen et al. R92.5

(a)

0 1,000 2,000 3,000 4,000 5,000 6,000 7,000 8,000 9,000 10,000 11,000 12,000 13,000 14,000 comment

Irga1 Irga2 20,500 Irga3 Irga4 (IIGP) Irga6 Irga8 reviews Irgb2/b1 Irgb2 Irgb1 6,000 Irgb5/b4 11.2-2ndIrgb5 Exon Irgb4 6,000 11,750 (TGTP) Irgb6 Irgb9 Irgb10

Irgc reports (IRG47) Irgd 6,000 (LRG47) Irgm1 (GTPI) Irgm2 (IGTP) Irgm3 deposited research deposited (b) -1,200 -1,000 -750 -500 -250 +1

Irga1 Irga2 Irga3 Irga4 refereed research (IIGP) Irga6(p1) Irga6(p2)

Irga8 Irgb2 Irgb5 (TGTP) Irgb6

Irgb9 interactions Irgb10 Irgc (IRG47) Irgd(p1) Irgd(p2)

(LRG47) Irgm1 (GTPI) Irgm2 information (IGTP) Irgm3 Exon 1

Figure 2 (see legend on previous page)

Genome Biology 2005, 6:R92 R92.6 Genome Biology 2005, Volume 6, Issue 11, Article R92 Bekpen et al. http://genomebiology.com/2005/6/11/R92

ever, Irgc, unlike Irga2, was constitutively expressed in the IRGC and the IRGM gene fragment are included in an mature mouse testis (Figure 3b; unpublished data). We con- extended phylogeny (Figure 5) and alignment (Figure 6) of clude that mouse Irgc is expressed in a tissue-specific manner the vertebrate IRG proteins. and is not induced by infection. The IRGC mouse and human genes sit in chromosomal The coding sequences of the p47 GTPases regions syntenic between chromosomes 7 and 19, respectively In Figure 4 we present the predicted translation products of (Figure 7a) and are clearly orthologous. The proximal the 21 intact p47 GTPase genes, and reconstructed partial promoter region of human IRGC is largely conserved with sequences of the two pseudo-genes, Irga5ψ and Irgb7ψ, that of mouse Irgc. However, as in the mouse, no interferon aligned on the secondary structures of Irga6 [12] and H-Ras- response elements are found either in the proximal conserved 1 [17]. The full alignment confirms a number of major features region or in divergent regions up to 10 kb upstream of the that are already apparent from the previously published transcriptional start (data not shown). Human IRGC, like alignment of six family members [13] and consolidates the mouse Irgc, is not inducible in vitro by interferons, is not definition of the p47 GTPases as a distinctive sequence fam- expressed detectably in brain or liver, but is strongly ily. Especially noteworthy are novel features of the amino- expressed in adult testis (Figure 3b, panel ii). As in the mouse, and carboxyl-termini, which were not apparent before. a weak Sox element is present in the proximal promoter of Eleven of the proteins, including six of chromosome 18 Irga human IRGC. gene products and Irgb2, Irgb5, Irgb9 and Irgb10, carry the amino-terminal myristoylation signal MGxxxS [18]. This The human genomic segments syntenic to the mouse chro- sequence in Irga6 (IIGP1) is indeed myristoylated in vitro mosome 11 and chromosome 18 IRG gene clusters both [19] and in vivo, and, as expected, favors binding of the pro- mapped to human 5q33.1, suggesting that the interferon- tein to membranes [6]. No other membrane attachment inducible IRG proteins were once encoded in a single block sequences or lipid modification motifs are apparent in p47 ancestral to the human region (Figure 7b). GTPase sequences, despite the documented attachment of IRGM maps only 80 kb away from the closest syntenic several of these proteins to membranes [5,6,16]. Irgb2, Irgb5, marker DCTN4. IRGM is transcribed in unstimulated human Irgb7, Irgb9 and Irgc have carboxyl-terminal extensions up to tissue culture lines HeLa and GS293 (Figure 8a), with no 65 residues in length compared with the canonical IIGP1 increase after interferon induction. Polyadenylated tran- sequence. scripts of IRGM occur with five 3' splicing isoforms extending more than 50 kb 3' of the long coding exon (Figure 8b). The The p47 GTPase genes of the transcripts have a 5'-untranslated region of more than 1,000 Only two IRG sequences, both transcribed, are present in nucleotides that corresponds largely to the U5 region of an humans (or chimpanzee), one (IRGC) on chromosome 19 ERV9 repetitive element [20]. The promoter region corre- (19q13.31) and the other (IRGM) on chromosome 5 (5q33.1). sponds to the ERV9 U3 LTR (long terminal repeat) without Human IRGC is more than 85% identical at the nucleotide interferon response elements, and three of the five splice level and 90% at the amino acid level to the isolated mouse forms have exon-intron boundaries downstream of the puta- gene Irgc. IRGM encodes an amino- and carboxyl-terminally tive termination codon, normally a signal for rapid RNA deg- truncated G-domain homologous to the Irgm (GMS) sub- radation [21]. family of mouse p47 GTPases. Predicted protein products of

InterferonFigure 3 (see responsiveness following page) of mouse and human p47 (IRG) GTPase Interferon responsiveness of mouse and human p47 (IRG) GTPase. (a) Interferon (IFN)-γ responsiveness of eight new mouse Irg genes. Inducibility of eight further Irg genes (also see Boehm and coworkers [13]) in L929 fibroblasts induced for 24 hours with IFN-γ, demonstrated by RT-PCR. D refers to a positive control genomic DNA template; O refers to a negative control of the same genomic template after DNAse1 treatment; and + and - refer to RT- PCR on DNAse1-treated RNA templates from IFN-γ-induced and IFN-γ-noninduced cells, respectively. The sibling genes of the Irgb series could not be individually amplified because of their close sequence similarity. The identities of the amplified genes responding to interferon induction, indicated by vertical arrows, were subsequently established by sequencing of multiple clones from the PCR product. (b) Irgc is not induced by interferon or infection but is constitutively expressed in testis. (i, left) Mouse L929 fibroblasts were induced for 24 hours with IFN-β or IFN-γ or left uninduced (-). Irgc could not be detected by RT-PCR even after 50 amplification cycles in L929 cells. Irga2 after 50 cycles was used as a positive control for the interferon-induced L929 RNA. RNA from mouse testis provided a positive control for Irgc. (i, right) RT-PCR for Irgc and Irga2 (50 and 30 amplification cycles respectively) on RNA from tissues of uninfected mice (-) or mice infected 24 hours previously with Listeria monocytogenes (+). Irga2 was induced in all tissues and Irgc in none. RNA from mouse testis provided a positive control for Irgc, which is detected after 50 cycles. Testis expression of Irga2 was barely detected after 30 cycles (compare with i, left, showing Irga2 in testis after 50 cycles). (Panel ii, left) Human IRGC is not induced by 24 hours of stimulation with IFN-β or IFN- γ in human cell lines (induction of GBP-1 [accession number P32455] was assayed as a positive control) and (Panel ii, right) is constitutively expressed only in human testis. GAPDH was used as control.

Genome Biology 2005, 6:R92 http://genomebiology.com/2005/6/11/R92 Genome Biology 2005, Volume 6, Issue 11, Article R92 Bekpen et al. R92.7

(a) comment Irga2 Irga3 Irga4Irga7 Irga8 Controls D0D0D0D0D0 IFN - γ +- +- +- +- +- reviews

Irgb1,3,4,8 Irgb2,5,9 Irgb10 Controls D0D0D0 IFN - γ +- +- +- reports deposited research deposited

(b) (i)

NA

D Liver Lung Spleenno DNA Testis β γ Testis IFN - no Listeria - + - + - + 622 bp Irgc 622 bp 50 Cycles Irgc

50 Cycles refereed research

963 bp Irga2 963 bp 50 Cycles Irga2 30 Cycles

(ii) 1O fibroblast THP-1 Hela Testis no DNA IFN - β γ - - β γ interactions

622 bp r

IRGC n

50 Cycles

Testis

Live

Brai no DNA

622 bp 428 bp IRGC GBP-1 50 Cycles 27 Cycles 495 bp GAPDH 27 Cycles

495 bp information GAPDH 27 Cycles

Figure 3 (see legend on previous page)

Genome Biology 2005, 6:R92 R92.8 Genome Biology 2005, Volume 6, Issue 11, Article R92 Bekpen et al. http://genomebiology.com/2005/6/11/R92

At the protein level the shortest isoform of IRGM is shorter cated on Figure 6; also see Additional data file 5). This intron than a canonical G-domain, being truncated in the middle of is 81 bp long in both pufferfish species but is substantially β-strand 6 just before the G5 sequence motif, which interacts longer in the zebrafish genes. The distinct irge subfamily of with the guanine base of the bound nucleotide (Figures 6 and the Danio irg genes are intronless in the open reading frame, 8b; also see Ghosh and coworkers [12]). The longer isoforms like mammalian IRG genes. are terminated by short sequence extensions that are unrelated to known GTPase domains. A rabbit antiserum IRG homologs with divergent nucleotide-binding raised against recombinant human IRGM produced in regions: the quasi-GTPases Escherichia coli failed to detect signal by immunofluores- The mouse, human and zebrafish genomes encode proteins cence or Western blot in human cell lines (data not shown). that are homologous to the IRG GTPases but are radically modified in the GTP-binding site. The mammalian protein IRG genes of the dog FKSG27 (IRGQ), a protein of unknown function that is 70% Is the mouse (order Rodentia) or the human (order Primata) conserved between man and mouse, is extended amino-ter- the exception? We looked for IRG genes in a third order of minally relative to a p47 GTPase by about 100 residues mammals, the Carnivora. We recovered a total of eight IRG encoded on three short exons. The remaining 420 residues, genes from the public genome database of the dog Canis encoded on a single long exon, are clearly homologous to and familiaris (Figures 5 and 6) as well as a partial sequence of a colinear with the IRG proteins (Figure 6 and Additional data 9th gene (not shown). Of these, one (not shown) is a pseudo- file 6), especially in the amino- and carboxyl-terminal parts of gene by a number of criteria, another is clearly dog IRGC, the exon. The region of lowest similarity is in the G-domain, whereas the partial sequence is novel but most closely related and conserved GTP-binding motifs are lacking (Figure 6, and to IRGC. The remainder assort into segments of the phylog- Additional data files 6 and 7). Thus, FKSG27 (IRGQ) is not a eny already established for the interferon-inducible mouse GTPase despite its phylogenetic relationship to the IRG pro- IRG genes (Figure 5). Both GMS and GKS genes are repre- teins. FKSG27 (IRGQ) is closely linked to IRGC in humans sented and are inducible by interferon in dog MDCK epithe- and mouse (Figure 7a). lial cells (Additional data file 4). The three dog GMS genes appear to have diversified independently from the mouse The zebrafish genome contains three IRG homologs with GMS genes (Figure 5). As in humans and mouse, dog IRGC more or less modified GTP-binding motifs (irgq1-irgq3; Fig- was not induced by interferon-γ (Additional data file 4). Over- ures 5 and 6, and Additional data file 7). Their homology to all, the IRG gene status of the dog clearly resembles that of IRG genes is stronger than that of FKSG27 (IRGQ), but as mouse rather than that of humans. with FKSG27 (IRGQ) their function as GTPases is doubtful. The irgq1 gene is clustered on a single BAC clone with four IRG genes in fish genomes apparently normal irge genes and immediately downstream IRG GTPases are at least as old as the vertebrates. We have of a truncated p47 gene, irgg, with which irgq1 is transcribed identified at least two distinct irg genes in the freshwater as the carboxyl-terminal half of a tandem transcript. Thus, pufferfish Tetraodon nigriviridis, a closely linked pair of irg the hypothetical protein product would be a carboxyl-termi- genes in the saltwater pufferfish Fugu rubripes, and at least nally truncated p47 GTPase, linked at its carboxyl-terminus 11 partially clustered irg genes in the zebrafish Danio rerio to a similarly truncated p47 homolog probably without (Figures 5 and 6, and Additional data file 5). The fish irg GTPase function. genes fall into separate clades from the mammalian genes (Figure 5). A specific IRGC homolog is not immediately We propose to term the modified IRG proteins without apparent. GMS subfamily IRGM genes are absent from fish. GTPase function 'quasi IRG' proteins, hence IRGQ. IRGQ The pufferfish and zebrafish irgf genes have one intron iden- sequences reveal their phylogenetic relationship to the IRG tically positioned at the end of helix 4 of the G-domain (indi- proteins, but they are nevertheless more or less radically

AminoFigure acid 4 (see alignment following of page) the mouse Irg GTPases Amino acid alignment of the mouse Irg GTPases. Sequences of all 23 mouse Irg GTPases showing the close homology extending to the carboxyl-terminus, aligned on the known secondary structure of Irga6 (indicated in blue above sequence alignment). The sequences of notional products of the two pseudo- genes Irga5 and Irgb7 have been partially reconstructed; premature terminations are indicated by red highlighting. In the C57BL/6 mouse the sequence of the Irga8 gene is damaged by an adenine insertion, indicated by the red highlighted K at position 204. (The sequence given after this point is that given after correcting the frameshift, and is identical to that of the CZECHII [Mus musculus musculus] sequence BC023105 that lacks the extra adenine.) The turquoise-highlighted M in M1 and M2 are initiation codons that are dependent on alternative splicing (also see Figure 2a); the unusual methionine residues in the G1 motif of GMS proteins are highlighted in green. The blue background Q residue of Irgb5 and Irgb2 at positions 405 and 396 indicate the point at which tandem splicing occurs to Irgb4 and Irgb1, respectively. Canonical GTPase motifs are indicated by red boxes.

Genome Biology 2005, 6:R92 http://genomebiology.com/2005/6/11/R92 Genome Biology 2005, Volume 6, Issue 11, Article R92 Bekpen et al. R92.9 comment

I I I E D A A E E E A A A A A A A T E A Q I G FL FI I LI LI G L S4 E D YDFFI YDFFI YDFFI YDFFI YD FD YDFFL EYDFFI EYDFFI EYDFFI EYDFFI EYDFFI EYDFFL EYDFFI EYDFFI EYDFFI EYDFFI EYDFFI EYDFFI EYDFFI EYDFFI EYDF FGVDD FGLDD FGVDD FGLDE FGIDD FGLDE FGLDE FGLDE FGLDD FGVDE FGVDD FGVDD FGVDD FGLNE FGLDE FGLDE FGLDD FGLDD FGLDE FGVDD V Y S S Y Y Y Y S S V Y A-N K- K- Y- YVM K-K E- K-K G- E- E- Y- Y- G- G- G- G- Q- G-R S-TC E- RS KI DL DL LI F F F F F F F F T---G QKI QKL QKT H2B G Q E QIS-I QIS-T E E E E D Y YH YRT YR YR YRS YR YR YRS YRS YRS YRS YRS YRS YRS YRT F Y YRS YRS MKF VKF VKF V MKF MKF MKF M V M M MKF MKF MKF MKF MKF I V I I MR V IKF MKF NY RG KF NY DY DY KV TL ND ND ND KD QT TL TL TL ND KF EE EE DQ EK EE TE EE KK EK EK KK EK TE TE TE TE KK ET KK KK KQ KK EE FNL L L L L TL SL SM SL SL TL TL TL TL TL TL T TL TL SM YL YL YL YL YL YL YL YL YL YL YL YL YL YL YL YL YL YL YL YL YL YL YV G IR KN EK EK KKR KR EE GEC ERG EE EE EE DE QQ QQ QQ KK QQ H2 α KT KT KT QN NT KT KH KD QN QN QN QN EE HD KT HD HD HD I L L L L L L M L L L L L L L L L L 413 295 406 416 421 191 417 410 420 409 407 423 421 441 421 441 415 232 458 473 467 467 463 189 P P P P P P P P P P P P P P P P P P HADA Q P P P P Q T K P P P Q P Q P Q Q Q KK ET QK KK EN ET PEQEQC QK QK QK EM EM EM EM F F F F F F F F F F F F F F F F F F F DAL KER KER PGNAIEIRKA TAN DDN M M M M L reviews AQSVES AQSVES N N N K T T N N N A AQTVED A A DM D DL DV DM DM DM D D DL D DM D DL D PN ED QK QK T T MK KK T PGCSADK A A FHH A GT GS GS RNK GDN TQK TQK TQK KDL KDL GVR VRNK VRNK VRDK VRDK TQK LDS ISDI T T QEEY---SAMRDQY IGST A L L L L L L L L L L L L L L M H2A

G S 3

G TA D VLGN VPGLAAAY VTGI SLTF VVGI VVGI S-RG S-LG S-RG L-GG L-GG L-GG L-GG FMTFFKGF S-LG L-GG WDLPG WDLPG WDLPGIGST WDLPGIGST L P P DXXG/SWII H------LWDLPGLG LWDLPGLG IWDLPGIGTT I LWDLPGLG LWELPAIG LP IP L IP IF VF VF E E VF I D V IF NF F TIP Y S3 V LTLWDLPGIGT LTLWDLPGIGT V V VTLWDLP VTLWDLPGIGST VTLWDLPGIGST L A A ATVP ATVP ATIP ATIP ATIP S ATIP ATIP ATIP ATIP S ATIP NLTLWDLPGIGT NV NV NLTLWDLPGIGT NV NV NVTIWDLPGIGTT NVTIWDLPGIGST NVTIWDLPGIGST NVTIWDLPGIGST NVTIWDLPGIGST NVTIWDLPGVGTT NV DVTLWDLPG V IQA L------SFFLES L VNI L L W W WT W L ------AVSSED L AV AL L GV GL GV GL GL GL A GL GL GV GL A G ------K ------Y ----Y ------S ----SAI ----S ----S | E----T E----T P K---- P P K---- P P P P P P P DGETCL LVS FAD FKI FNV TKD MKD FAAD LKA LKA LKA LKA LKS MKA FAGV Y IP IP MP F F F F F F F F I A SIMPRAW STMPRAW SIMPRAW SLK------NS STTPRAW 120 130 140 150 N N N H QF H H K K K K E E E E E MKDLE PKMP P P PKI PKI PK PKLP P PKV PKLP PKLP PKLP PKLP PK P PKV EDNLH------TEFGIS L K K K K ------QVV LKT WLEG AN WLQG H H H H H H H H H H H H H H H K V T IWLEA VWLEA IWLQA IWLQA ID H E K K K K RN- Q FS-S P K PY-A SS-S P P P P TY-T TY-T TY-T SSDS T P K R F Q K I I F DQM Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y E W Q Q KS S P P C P S E P P P P P P P P Q KQ KQ KQRI NF HP HP HP T KK KS S S S2 IK LKE LKQRIWLEG LR LRQKIWLEA LRQKIWLEA LRQRIWLEA LRQKIWLEA LKQKVW LKQKIW L LKQKIW LKQR LREKIWLEA LKQKVFLEA LKQKVW RT R RT KT R R R RT RT RT RT RT RT R RT K RT RT F E E K K N C K KV KVYS K K K K QP QF NF KATQ DF DF DF DF DM DM DM DM QS HF DS DM α KE KT QKP KKPAC KD TE IE TE TE 110 110 YKATQ YNST YNST VAVW VES AFS T TM TM R K KR KK K KR KR ------FHFFEMFQSDSDKLCHVHVLLLLTSWGLSGETVT FHFFEMFQSDSDKLCHVHVLLLLTSWGLSGETVT FHFIEMFQSDSDELCHVHVLLLLTSGGLSSETVT FHFFEMFQSDSDKLCHVHVLLLLTSWGLSGETVT PLSTRRKLGLLLKYILDSWKRRDLSEDK------K V A TT TT TT TT TT TT TT KKR KKR RK RKH KK KK RKR RKR RKR RKR RKR RKR RKR KKK D ETTM ETTM R R R H Y H H -----EDS NE SN E ET E D D Q Q EL GD A A D A NY D DW D D Q VETTM T T T IETTM VETT I V I I I I I I I I I V I I I I I V V I I L

T A A GVIETTM GVIETTM GVVETTM GV GV G GVV G2 KT KC SV SV SV QA AA AA AA AS KI TA TA TA TT AT ST ST SWI reports TGVVE TGVVE TGV TGVV TGVV TGVV TGVVETTM TGVVETTM TGVVETTM TGVIETTM TG SG TGVV TGVV TGVV TGVVETTM E D E LEA TD C C INAI INAI -K -K -H -EV -EV -TV -EV -EV -DV -P -P -P -P -P -P -P -P -E AS AS AS -L

ITD ITD ITD F ITE ------ITE ITE ISD TY V V VTE VTE VTE VTE LTE ------V V VTD VTD IS GCMSCKCVLS------

A A A A A A A A A A A A A A A A A A A TAPV A A A --YDPT | A A S A V S SQ YQ YG PN PN PN QS PD PN PG PI PI PN PN HS HS HS HS QV QV PN GA GA GA S S ES DS I L K P G SE D KD KDS --- L TL TL SL TL TL A EI L E K LSL R F S EE ED EE EEEGA EEEGA EEEGA EE EEE EEEGA EEEGA EEEGA EEEGA EEE E EEE EE DEDG DEDA DEEGA DEEGA EE DE DEEGA L V L LLSL A S H A H H P P H N H H D A H D D H FLLSL FLLSL FM FMLSL FMLSL F F FMLTL FMLTL FMVSL FMLSL FM FM FMMSL FMMSL FMMSL FMMSL DMKRKM- G KA RD RD RG KA SS G DP GL AG N N N M V M M N N L L L L E T I V IG IG T IG R H QN QI LNPPDESGP QN QI α GVG GLG GIG GIG GIG G GL GV GV GV GIG GIG GLG GI GVG GVG G GVG Y Y CCEP ------E - -F - -E ------V -L R K RRH KR KRH R KH KRH KRH KRH KRH KRH KR KRH KH KHH KHH KRH KRH KRH KRH RK I- I- R Q- Q- Q- Q- IQNHFV I H E E E H Y Y Y Y H H H H H LR LR LR LR L S VI A IY DY VF T VF T T T F SNI HVQ H LPA LPA LPA LPA INAL INALR INALR INALR INAL INAL INAL INAL DLPA DLP DLP DLP L L L L L L L L LTIQ H1 KDI KDLP SQ HDLPA KDL SQ SQ SQ Q S S S S SFIN Q Q H Y WE LL LI LI LI TL TL TLLKDLPA TLLKDLPA TLL T TLIKEL TLIKDL TL TLLKELPG TLLRELPS TLLRELPA TLLRELPA TLLKELPA TLL TLL TLL GKSS GKSSFINAL GK GKSTFIN GKSS GKSS GKSSFINALR GKSSFINALR GKSSFINALR GKSTFINALR GKSS GKSSFINALR GMSSFINALR GMSSFINALR GKSS GKSSFIN GKSSFIN GKSSFIN GKSSFINALR GKSS GKSS GKSS GMSSFINALR GKSA C VSSFNSKASLYREESVGKVFPVGPGSTFL VSSFNSKASPYREESVGKVFPVSPGSTFL A V R A A A T T T A A T N A S S T A A A S N N V H5 RN RE MD MD ES VT VD MD RD EE ET ET EN ET ETK ES ES ES MDK MDK MDK Q Q G L L L L L L L L L L M L L L L L L L L L L SG E K K V V K K M V V V F V K K K K K K K N Q AG GESG G GETG GETG GETG GETG G GXXXXGK/M G1 DAFYTLVREIRQHKL deposited research deposited I I M M M M V DFP DFP E VTGDSG V YDFP YDFP FDFP YDFP YDFP YDFP YDFP FDFP YDFP YDFP YDFP FDFP FDFP FDFP FDFP YDFP FDFP FDFP FDFP EEFLSEKPGSCLSDLPEYWETGMEL------EEFLSEKPGSCLSDLPEYWETGMEL------EEFLSEKPGSCLSDLPEYWETGMEL------EEFLSEKPGSCLSDLPEYWETGMEL------ALEEDEPQGGEVSLEAAGDNLVEKRSTGEGTSEEA--- EELFTEQVSSFNSKASPYREESVGEVFPVGPGSTFL EELFTE EELFTEQVSSFNSKASPYWEESVGKVFPVGPGSTFL ALKDSVLPPEIH------EELFTE IETVNVA------ILRDSIFPPQI------ILRLSIPHP------F V FH D D D R D K D D D D K D D D K K K K K K K K K K K K K VGVTGESG VAV IAVTGDSG IAVTGETG I IAVTGDSG L S GA ------T CH CH CH S1 KEICLRN------KAAQTHFAHSF------KEAY------R N N N N KEICLRN------RWKYSKPRSNSTYP------RWKYSKPRSNSTYP------R E NSKALFEKKVGPYISEPPEYWEA------L S S S S E K H E NVAL S K PLL X RI L L V LNVAVTGETG VNVAVTGETG Q L I

SP DLS DLS DL KPESH NV NV NV D DIS LL LL LL LL LL LL V LL LL LL LL LL IL LL VL IL IL IL IL PINIAVTGESG PINIAVTGESG PL P PINIAVTGESG PLNIAVTGETG PLNIAVTGETG PLNIAVTGETG PLNIAVTGETG PINIAVTGESG PLNVALTGETG PLNIAVTGETG PLNIAV PLNIAV PLNIAV PLNIAV PV F H H L F LSP K E K V V V S S S S T V V N TS I I I T I H E T YR A A A A A V V A A NK G5 C SL V LKL AKTRQGV SAK QR K K IA K K K TTR SSL NT NA SA R NA R NT NAI SSV NT R S S T T AK V V V E E A SQI S - DAK DAK DAK DAK DAK DAK DA DAK DAK DAK DA DAK D N DAK D D D ID ID IE ID ID ID ID IDK IDK IDK IDK IEK IEK IEK IEK IEK ID L V S6 230 240 250 260 270 280 290 300 310 320 EDAK E E S S S S E N Q E S S S S AK XL FL L A TE TE T A TE VD A QSKS KE RE KN KN RD ANS KE RE KN SN SN SN TD R SAA SN GD GD GG AT QN GG VFLISN VFLISN IFLISN IFLVSN YIET α A V VA VA V V L I L L L V I C PMFLVS R Q Q P M M ET I I AL AL AL AL AL AL AL AL AL AL AI AL AL AL AL AL PPVFLVS H PPIFLISNYDLS PPIFMVSNYDLS PPVFLVSNFDVS PPVFLVSNFDVS PPVFLVSNFDVS PPIFLVSNFDVS

FL NEM D D D H H H Y H S S S S E DPPLFLIS EPPVFLVSNFDVS EPPIFLLSN EPPIFLISN EPPIFLISN EP EP EP DP EPPVFLVSN EPPVFLVSNFDVS EPPVFLVSNFDVS EPPVFLVSNFDVS LDTV L LD IE L RDM CD RE EKV REL KDE QK NA RDM KET KEI C KY K R A A A AV AI D D T N --- FLDTV FLDTVA FLETVA FIDTVA FIDTVA FIDTVA FIDTVA FLDT FLD FLD FIDTVA FIDTVA FIDTVA FIDTVA FLN I I I I L IS V IS I I IS V I V IS IS IS IS IS IS IS IT IS α M M V V V I I I I I V V I L A L V V I R A T I I E E E V A A A A LHI LH FH FH SL SL SL SL KDI NL NY NY NY NY S GI S S S IQ S S S S YG NN NN CFHQ NG NG NG SK SK SLDS RFQQ VTHQ VTHQ Q Q Q Q Q Q Q Q Q Q Q Q Q Q N I IS N N N M VS VS VS VS A NNA N V VS D D E E E E E E E E E E L L L L W W W W L LKYY LKLY RRTQ AVS A A A A ALS ALT ALS ALT A A A QKEK QKEK QQAQ K IKTG QKALSSQ QKALSSQ QKVLSSQ QKVLSSQ Q QNANAASTRG K K K RVAG Y HL HL HL HL TMLQGC KV K KV G G G Q R ET ET ET LTN E EVN R L L L L L L L L L L L L L L L YY YY YY YY YY YY YY YY YY F F F F Y FY YY HG SDV VKV Q HG Q LEL KE RET PQ T T T T T T T V V V V V LQ IQ V F VQ IQ L I K K K K K K K K K K K K D E NLQ NLQ NLQ N D DF D K NL D PAI R R R --RHQRHKLV R R R S RMA S S S S GTV NK GNI K K K GNIQ GNIQ GNI G G G G LKEI F EGNL EGNLQ E EG EG D D D DGNLQ DGNLX DGNLQ DGNL EG H4 F YF YF YF YF YF YF YF YF YF YY YF YF YF YF NK Q T E Q RK KKE KK NK TL KR KK K E E E ENK E E E E R LNCVNTFR LNCVNTFK LNCVNTFR NSYLDTFR NSYLDTFR DYCVTN DDCSGH DDCSGH DDCSGH DECLGL DYCSNH SDCVKN ICLSSN HTISSM NSISNI DHCTER L L L L L L L L IS L L V I I M M M L L L L V L L L L L L L L L QSYSVKIFN QSYAMNTFS QENIREN QRNIRDS QRNIREN QICISSN G G G G G G G G G IR IR IR I I VR IR IR I I I IR IR IR IK IK IR IR L IR IR IR V V A A S S S S AVT G LR LD IN LH LY LC LC TA SH SH SH SH SS LY SS SS RA A A A A A A A A A A KLY HKA GKA MTY QSC RTAFES E

QD QD QD KQ KQ ------NQ NQ QQ QN QS QS KK KK KK KK KN EV KS KS KQ EQ QE ---SRQAQDLARSYG

IE IE IE I IE IE IE IE I IE I IE IE IE IE I I IE IE IE IE L L L L L L L B

L S S L L L S S G ST R D L L L L L S S F F F A ------MTEYK VL IL IL VL VL IL LL VL LL LL VL VL VL VL VL VL α ----KNS ------KN-HCREIL ------SAFQAL ----PCVCCCLRRLRHKRM ---CCFTSHHSRCKQ ---VVTR ------TLV ------D D D N T T A T A A E T E T T T K T K K NT N K KT NT S KA NT K K S S S N E NT I I I L I I I I I I L L I I I I I I T E T IST T I I TTR TTR T L L L L VTT T V T VSR T T KE RE refereed research E E E E E E E E E E E E E E E E E E E E E K E SEAA PEPQ PEPQ Q Q K Q K Q E E Q Q E Q Q P VGV E P E E EYD FNRE FNKE F FNRE FNRE FNRK FNRE F LQ PH S SYNRE TFDKE TF TFDRE TF TF SFNRD TFDKE TFDRE SFNRD SFNRD SFNRD SFNKE P------P------WN------LS RR RK IA S RN MR MR Q R R RN R SG Q Q R K K K L L L L L V L L L L L L L II LMA II MLS P P P P KIIS KIIS KIIS KIIS I KP KP KP KP KP KP KP KP KP KP KP KP KP KP RP YL GP YL GL GL GL GP GP GP GP GPES-----

N - N TKVLS TKILS TKILS TKILS T GNT SN ------A A S S S S S S S S F G G G G G G G G G G G E ESKILS ESKIIS E E ETKIIS ETKIIS ESKILS ESKILS ESKILS ESKIIS D R X K K K K K K R R R R K 10 T KIVI M K T V Q Q MK K K K MK R 3 N KLCN EDF Q K ADG R K EDF R E KLYN Q Q Q Q Q K NTGR K QVK QMK QTG-SSRLP I FK F I FK F F Y NE NE NE NE NE NE NE NE NE NE NE NE NE NE NE NE NE K K ECEKGIN NFK NFK MVVK MVVKLSD E R Y D STS------VLSEVR E AATRSQ STS------STS------W T R E E Y S K S S S D K QK L SALISKAGG 370 380 390 400 410 d DL DL DI DL DL DL DL DL DL DL DI DL DV DL DL DL DL DL DV DV DV DI DL QEFCLAN KHISSVT STFG----RL EKFASAN EKFASAN RLLESSWWYG EKFETAT EYISCVT GNP------GNP------GNP------GNP------EYISCVT EYISCVT EYISCVT NAFFR-LLRF YFKN YFK FFKNFK YFK Y YFKN YF YFRNYK YFRNYK Y FFKNFK FFKNFK FFKNFK FFKNFK FFKNFK α L L I R T S R R S S S I I C F S I E LAARTVE------Q Q Q Q S K V V V NQEFCLAN V D D D D D D

α L NA TG IE NE TE TE TE GTNLQ EN KN GTNLQ EN EN VTS--- FSDAVFIPK DA T T T E VD T C RYI KYI RY KFL K KYL KYI KYI KYI KYI KYI KYI KYI KYI VAE F LTE TSRLPAVPEET F Y

A G4 Q K S F L L L L S L LNC LNC N L L L L A KSMTAGESLYSSQNSS SF SF SF α TKLD TKLD TKLD L L V L L I L L L L L L L L SSF SSF SS SSF SSF SSF SSF SSF SS SSF ASF S SSF N/TXXD W W W GN RTK K R K L K K K K K K K K V P V S V C E E VF EFN A T------T VF VI LRTK V V V N V D ERL E E E E E E E E E E E L L L L L L L L L L M L L L L I L V V Q Q IAQATSAAEAFCAVK E RLMTCAI CFA ARLYRTGTRVGSIGFDYMK------G E GGL T T T T RLYSQSSDGAMRVARAFERGIPVFG F YFVRTK YF FRNDFK FK L FK FK FK L M L FY FYFVRTH FYFVRTH FYFVRTKVD FYFVRTKVD FYFVRTKVD FY FY FYFVRTKID FYFVRTKVD FYFVRT YYFVRTKVD YYFVRTKVD YYFVRSKVD YYFVRSKVD FYFVRTKID FYFVRTKID FYFVRTKID FYFVRTKID FYFVRTKID TI SL TI SI T TL TL TL S SL SL SL SL S S S S5 R GL S S K E KD K R R K E E N N N N GL E E WV E N WK LM E E E M T T T T TV M M M K K R R R K K K K K K K K K K K K R E D E E E E D D E D D D D D D D D D E N- N- K- K- K- K- G- N-TK N- R- R- N-TK K- K- K- K- K- N-TK N-IK G-MN N- M M M M I M M M M M M M M M M M M M M M M VQ VQ QS LRQG- SM VM QR QR EK KK KDSDDVPMVL SM GI TN GI RI KDAG- EK AK AK AQ VQ VQ FAEHN FKPTD FTPTD LKKYR LKTNGKE LKCK- LELE- FTKNN FSDEP LSDEP LSDEP LSDEP LTKNN LTKNN LTKNN TRI I I I M M I V I L V L L L L L L L L L V L L L T AI I R T H A H H N Y H H H H A LANEVSP E IT IT K QK SE J P LQ LQ LQ α F CLDFWSLVK FLQ ----DTSKNEDNDD ----SPKSD-ENND ----LLKN--KCQF ----SRRS--EDQD ----DTSKTEDNED ----GTSK---SEA ----HIPKDEDKGN ----NMPKDEDKGI AFLKGASENSFQQLAKEFLPQ ------KPDAKAHN ----STSPPKEDPP ----STPPPKEDPD ----STLPPKDDPDFI H3 LSK LA LA LAK LA LA LAK LAKAI LAKAI LAKAI LA QI S S S S S S S S S S S S S ELAKAI ELAKAI K K K E LR IKSP MKTP MKSP LKSP LKSP LK LR LRS LRS LRS IKSP IKSP IKSP IKSP IRSP IKSP L F F F L F F F AH AQ A A AL V V V AH AH AH AQ SR FI SS TS TS TS D DIELAKAI ELDLAKAI ELDLAKAI ELDLAKA EIELAKAI H H H D D D DVELAKAI E YR VH KT EI IK GE AH AH AH AH AN AN AN AN AM SV QL Q NDIDLAKAI NDIDLSKAV NE ND ND Q NDIDIAKAI NDIDLAKAI ND N N N ND K EAM IKN D D N I V IK LR VK VK L LK IK IK IK IK LK LK LK LK VR MGQL MGQL MGQL MGQL M MGQL MGQL MGQL M MGQ MGQ MGQ MGQ | interactions α DQ AD EYKDNMKSQNFYTLRRE GSRALQFQDLIKMDRRL DQ EQ EQ EL EL NE NE NE NE DDF EE EE EE EE V V V V V V V V V L L L L V L M M V HPPLHTATCQPSSSRPSRLTAQL HPPLNTATCQTSTGRTSQITAQL PFWFVPPLGTIDICQDWVKLPLLHPLQRRILLLT Q Q X -E -P -P -P -P -S -S -S -S -S -T -T -T -T DQ ------D ------HS ------H ------H ------K ------K ------EI ------QS ------K ------K ------EI ------EI ------EI ------E I I T------KL T------KH T------KL T------NH S------L K K S------S S------L S------S QV QV QV QV NV NV NV NV NM NV NV NV NV FK FR FK I CG------AV F F F I WE F F A A MKNKHFN--TSMESQETQRYQQD F F F F L L L L L 1 10 20 30 40 50 60 70 80 90 100 1 10 20 30 40 50 C C C RFR R RFK RFK R R 160 170 180 190 200 210 220 D D D D D E D D D D D D D D D TRF SRF SRF I G INNTKSFEDIH R RAWE K K K K EKLGA-P QSMGTVV L RSTGRLE K K K K Q E E E E EQVGK-QAGD S S S SA SG SG PR A A H ASEQ ASEQ ASEQ F α LA LA VA VA MA IA IA VA VA VA IA IA IA IA IA IA IA IA IA LA S- V 330 340 350 360 S4 IISATRFK IISAT IISATRFK IVSATRF IVS IVSAT IVS IVSATRF IIS II IV IV IISATRFK IISAT IISATRFK IISATRFK IISATRFK IIS IVSA V IVS IVS LVS QR QR EL EL MF MF KE QQ HL QQ EN EN EN EN EN EN KN EN EN AK C

SL SL SL SL SL SL SI SV SL SL SL SL SL SL SL SL SL SL SL SL

------MKPSHSSCEAAPLLPNMAETHYAPLSSAFP ------MPTSRVAPLLDNMEEAVESPEVKEFE -----MDLVTKLPQNIWKTFTLFINMANYLKRLISPW ------MAQL ------MAQL ------MAWA ------M ------79

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

153 152 152 153 154 151 154 154 158 160 159 168 142 162 142 162 140 152 146 162 155 155 136

325 ------318 325 326 ------326 326 330 316 316 327 333 314 313 333 311 ------317 332 326 326 308 ------

| |

Irga4 Irga7 Irga5 Irga3 Irga8 Irgd Irgm1 Irgm2 Irgm3 Irgb3 Irgb4 Irgb8 Irgb1 Irgb6 Irgb10 Irgb2 Irgb7 Irgb5 Irgb9 Irgc H-Ras-1 Irga6 Irga1 Irga2 Irga6 Irga1 Irga2 Irga4 Irga7 Irga5 Irga3 Irga8 Irgd Irgm1 Irgm2 Irgm3 Irgb3 Irgb4 Irgb8 Irgb1 Irgb6 Irgb10 Irgb2 Irgb7 Irgb5 Irgb9 Irgc H-Ras-1 Irga6 Irga1 Irga2 Irga4 Irga7 Irga5 Irga3 Irga8 Irgd Irgm1 Irgm2 Irgm3 Irgb3 Irgb4 Irgb8 Irgb1 Irgb6 Irgb10 Irgb2 Irgb7 Irgb5 Irgb9 Irgc H-Ras-1

information

Figure 4 (see legend on previous page)

Genome Biology 2005, 6:R92 R92.10 Genome Biology 2005, Volume 6, Issue 11, Article R92 Bekpen et al. http://genomebiology.com/2005/6/11/R92

modified, primarily in the nucleotide binding site. In view of

the substantial divergence between the IRGQ genes and func- Irgb3 tional p47 GTPases, it was unexpected not to find close Irgb4 homologs of the Danio irgq sequences in either the Fugu or Irgb8 Tetraodon genomes. The evolution and diversity of the Danio Irgb1 Irgb6 irgq genes is apparently linked to the evolution and diversity Irgb2 of the GTPase-competent IRG sequences. Irgb7 Irgb5 IRG homologs outside the vertebrates Irgb9 Irgb10 No unambiguous IRG homologs have been found outside the IRGB12 (dog) vertebrates. However, two possibly related sequence were IRGB11 (dog) recovered from the Caenorhabditis elegans genome, and sev- Irga1 eral groups of putative GTPases of unknown function exist in Irga2 Irga6 the bacteria that have sequence features reminiscent of IRG Irga4 GTPases. Perhaps the most striking of these are found in the Irga7 Cyanobacteria (see Additional data file 1 for accession num- Irga3 bers for these sequences). Among other features, all of these Irga8 Irgd sequences have in common with the IRG GTPases the pres- IRGD (dog) ence of a large hydrophobic residue in place of the familiar Irgc (mouse) catalytic Q61 of H-Ras-1, but this feature is far from diagnos- IRGC (human) tic for the IRG GTPases [22]. Despite several suggestive char- IRGC (dog) (zebrafish) acteristics of these invertebrate and bacterial GTPase irgf1 irgf2 (zebrafish) sequences, it is not possible on the basis of sequence criteria irgf3 (zebrafish) alone to establish their phylogenetic relationship with verte- irgf4 (zebrafish) brate IRG proteins. irgf7 ()Tetraodon irgf8 ()Tetraodon irgf6 ()Fugu irgf5 ()Fugu Discussion irg g (zebrafish ) The p47 GTPases (IRG proteins) are an essential resistance irge4 (zebrafish) system in the mouse for immunity against pathogens that irge2 (zebrafish) irge6 (zebrafish) enter the cell via a vacuole. In this study we reached several irge3 (zebrafish) unexpected conclusions about the evolution of the system. irge1 (zebrafish) First, the IRG resistance system, despite its importance for irge5 (zebrafish) the mouse, is absent from humans because it has been lost IRGM (human) Irgm2 during the divergent evolution of the primates. Second, the Irgm3 IRG resistance system is at least as old as the bony fish but Irgm1 missing in the invertebrates. Finally, the IRG proteins appear IRGM6 (dog) to be accompanied phylogenetically by homologous proteins, IRGM5 (dog) IRGM4(dog) here named IRGQ proteins, that probably lack nucleotide irgq2 (zebrafish) binding or hydrolysis function, and that may form regulatory irgq1 (zebrafish) heterodimers with functional IRG proteins. We consider irgq3 (zebrafish) these points in order. H-Ras-1 0.2 The argument for the absence of the IRG resistance system in humans relies on several findings. The system is reduced ExtendedFigure 5 phylogeny of the G domains of IRG and related proteins from 23 genes in mouse to one full-length gene and a Extended phylogeny of the G domains of IRG and related proteins. The phylogeny relates all of the IRG sequences described in this report and transcribed G-domain in humans, and the residual genes lack reveals the distinct clades on which the nomenclatural fine structure is the character of functional resistance genes. Thus, IRGC is based. All except the mouse sequences are labeled with the species of highly conserved in humans, dog and mouse, is not interferon origin. Dog IRG sequences are found in the B, C, D and M clades, and or infection inducible, and is expressed constitively in mature human sequences only in clades C and M. The mouse and human quasi- IRG proteins, IRGQ (FKSG27), could not be included in the phylogeny testis. IRGM, although clearly derived from a typical GMS because they are so deviant in the G-domain (see Figure 6 and Additional subfamily resistance gene, is transcribed constitutively from data file 6). an endogenous retroviral LTR, is unresponsive to interferon, and appears to be structurally damaged in several ways. We argue that the IRG resistance system has been lost from primates (the situation in chimpanzee is identical to that in

Genome Biology 2005, 6:R92 http://genomebiology.com/2005/6/11/R92 Genome Biology 2005, Volume 6, Issue 11, Article R92 Bekpen et al. R92.11

------S-- I TPG TPG C-- L F V V II FIM FMV FII FL FII FII FII - M L G L L L S4 E D TS TG TG TG RA RG SY SY LF RL TV SY LI LF RS RS RS YA SY 413 415 420 440 441 419 409 181 377 397 397 463 463 433 250 410 408 386 384 410 422 398 398 387 402 379 247 161 417 394 460 456 407 447 189 YDFFII FDFFII YDFFII YDFFII YD YD YDFFII FDFFII FDFFII YDFFII YD YDFFII YDFFII YDFFLL YDFFLL YDFFLL YDFFLI YDFF YDFFII YDFYII FDFFII FDFFII FDFFII FDFFII YDFFII YDFFII YDFFII YD YD FRG H YSA YST YSE QSH Y YK YK YK YK WR WR YR YR YH YH YR YR YR YH YH YH YH Y Y Y YR Y ET DL ER ER ER AN NR GR SR GR HM HM ER SQ YE GE AT STC SW SK EK EK ER ER RE RE

comment F F F F F F F F F F F F F F F F F F F F F F F F F F VKAQIS QQ QQ QQ QQ AE AE HL IA NA KF TL KV IA RG RG RG TRCYFA IGC RK EI TK KD NL K N KLKN KLET KLET E E E G Q H D D D N N K RT--G K K E K G G G G K V K Q L V V V V L L L L L M L L L L L L L L L I L L L V M V V V V V V V M I V V V V V V M M V M V V V V M M M M M H2B

G KD KD KD KE EL QQ EL DQ KM ME KQ KQ KQ KL EK EK KD KD TE EM EE KL KL KH KH EK EK EE EE KEETEK VGL AGL AGV AGV RGQ RGQ DETFNL EET GQC GEF KKS KDT GEC GEC IRS IHS IRS KKFFKQVFMA KKFFKQVFMA ALF IVY LGFFTKCYYA LGFFIKCYYA AEE KRE AEE α EMKQAMKC GVRI L L L L L L M L YL YV YL YL YL YL YL YL YL YL YL YL YL YL YL YL YL YL YL YL YV F YL YL YL YL YL YL YL YL II V LI LI LI LI LL LL LL LL LL IM IM MV IL IL IL II IV II K E E K A N N K K K D E E Q Q N T NQ KK KK KS E KD AMRDQY E KK KK E E E VM AT LS LS GA AD VA VA VEK EDD AND DA DA DA YG YV TG KA AA AA IA LV VD LET LPEQEQC VET FPQQEKC DDN ADD A S AD A A AD AD AD AD S AD AD AD A A A AD AD AD AD D D D D D D D D E D N D D D D E D D D D D D N D D D D D N H2 K A K K K K K K PPNT H TPQN HPHE R R K HPQE QPQD P P P P F F F F F F F F F F F F F F F F F F F F F N S N K K N N N N GCS GCP GCP N K K N ATSP--AVTPHPTHYDALIL GTATTAAAASHPTHYDALIL TN ATTTL TN GTQSL TT TS TK TK TR TR SVAV SSMV SVAV SVAV SVAV SAAV GWAC GWAC ISD-ILEN INDN LGIQ LGIQ AAAY AAAY AAAY SLAV SLAV SAAC SLIT SMAC SMAC SISA SVAV SISV KIG------D T T ATAQTV AGTQSL AATQNL L V ------L L L L L L ------FLLDS M FFKGF L YISDN CFNGF SLGVR ------TLGIW I I L L L ------V V L I L L L L L

GSP GSP GTP GSP GSP GSP GS EAV-MDYSV GT SSDANFKPEDYIERFKATRYNA GTP G G G G G G A G G G G G G G G G G G G G G G G G T T T QEEY--- I T T A A A T T T WII G P V P P P P P P PL P P P P P P P P P P P H2A

G P GI LGP LGH S FMT FM FM L SLT NQF S PL H PV PL PI PI G3 TA VP VP IPL VPV IP IP IPL IP VPL VPV VPV VPI V IPV VPV VPV VPL LPV LPV VPI IP LPV VP D T T WDLPGLG L WDLPG WDLPGIGS WDLPGIGT WDLPG WDI LWDLPGIGTP LWDLPG LWDLPGMGTP IWDLPGIGSP LWDLPG LWDLPG VWDLPGIGTP VWDLPGIGTP LWDLPGIGTP LWDLP LW LW I IWDLPGIGT LWDLPG LWDLPG LWDLPG LWDLPGLG LWDLPG LWDLPGVGT LWDLPG LWDLPG IWDLPGIGSP IWDLPGIGSP LWDLPGIGTP LWDLPGIGT LWDLPGIGT LWDLPGIGT LWDLPGIGT DXXG/ R RF R K E E K K V V V K D VF T IF V E T T T RF T TF TF E E K K R S S T T S3 I V FR L V M V NV NI NV NV NI NV NV NV D NV NV NV NV DV DV DV DV NV NV NV NV NV NV NV NV NV NV NV NV GALAT- ACVAL-F SVGAL- AAVSF- GASAT- GASAS- SALSF- AGIAV- GSVAV- GSVAV- ACVAA- VAITD- GGVSA- AAGAV- AACAA- AACAA- SLAAAAA SLAAAAA DLVNI- GVIQA- GVIQA- GVGAF-P GVGAI-A ATVSA- DSCKFM----- SACTL- A S S S A A S S S S S S S S S S A A A S S S A A S S MK FL LV IL SL LL LL LK VK IK LK VA VA FV FV FV FA LV LVLGVIQA- LV VG VG FI LA ------H ------K ------Y ------K A A A A A A A A A A A A A A A A G A A A A A A A P P P P H---- E---- S---- P P P P P P K---- P P P FP FP FP Y Y YP YP FP FP F FP YP YP YP YP YP FP F FP TK DSH---- Q Q Q AM TM TM K KIQ----H K KH V ER ER NI KL K K H D N S S K FM FM N FLE WLE WLE WLE WLD WAASLA RKV ERVTEQ PLW PYW PHY PHY WLE GSKTFQ------D GSKSFQ------D LKT LKT LKT GKV WAA ANESLK------N ASKSF------D WLN WLN AKV WLA P P P P P P P P P P P P P P P P P P P P P P P P P P P P V I V V I I I I I I I I I I I V V V I V I I I I I I ------RK ------H H H H H H H H H H H H H H H H H H H H H H H QK EK QK QK EK RN SK SK SK SK AGAWRP AGAWRP QR AL QR GK KL KL T AL- P P P E R K I N F FS-SH RKQVVIDGETCL K P Q K K SS-SH QQ-SNL TN- P P P P PA- Q SY- SS-SH SF-SDI E E I K R R R K K K K K K R R K K K R K K Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y F Y Y Y Y Y Y Y Y Y Y Y Y F S2 α SP SP SP NM TM MM EV KA EV TC TC TC NM NM KD EAF KALEEN DSL HFL DSL DCL AFF DMLQEQ AMLQEQ DMLQEQ EALGQN DTL EAF EAF EAF QFL ------EVLEKD DAF ALLEKF P P P P P P P P P P P P P P P IKMFEGV IEIFMKQTWSA IEIFMKQTWSA AAAL AAAL VTSFQEQ VTTLQGQ VAVW VTMF R ITYF ITYF YVSKQLDEDVFWKR ERHP KRTP CKTP Q Q Q E V V E E S------KKTP K K E KK KR KK KR KR KR KK KK KK K K K KK RK KK KK KK KK K K KR K K KK KK K K K K R KK QRCAS KKATM AE KEAVE FDRTE FLRKA KTRTE QI QV QI KQATM L VDVKE AEVKE TEVRA TEVRA TM T NY EL DR DR EM QK QK QK EM EE EE EQ EK EK KK EK ND NG KK TQ ------GD SD TK TK EK TT V TT TT TT TT I I I I I L L L L L L I I I I I I I I V L V I L L L KA ETTM ETT ETTL ETTL V ETT -----EDS

R K R R ETTM ETTM ETTM reviews IETTM VETTM VETT VETT IETT T T T T VETTM V SNPA T I LV A T A P A TT TT P FE I PTP------G2 LEINEK EAT DAS ESA EAT YTS LEA LEA LEA AAS SKI SKI QEINEK LEI LEINGK LEI LEI PRAARR PSAART DSV LDV SSL LET LEA SWI LAM E SS APTP------FPA- G G GVVETTM GVVETT GVVETT KQEV S S S S S LHGGWLKA S S S S TGVVE TG TG TGVV T TGVV TGVV TGVV TGVVETTM TGVMETTM TGVVETTM TG TG TG TGV TGL TGV TGV TG TGVVETTM T TG TGV TGVVETT TGVVETT TG T IT V LT IS IT VT LS IS IS IS IT IS IS MS MS IT VS M K P DV PI P SV P P P P L L L Q S F P P P P E E K E QNPPSAAPEELAV GAS L PV PV P P P

N N N I N D D D VC IY IY N N PA PA N N VY VY VY

A A A A A A A PP A A A A A A A A A A A A A A A A A A A A A A A A A VP -YDPT

QS LN PN SN AN SDTCEKI THTCEKV P ARTVKEN YGTYEKI SHTYEKV DK PTF S S S S S S A A A A A A A S S S S A S A A A S A A A A S A A A A S G LP L LP LP LP LP LP LP LP LP LP LP L M M MP L LP LP LP L L LP I L L L LP LP L VP ------A E- D- D- G G G G G KE K- D N- N- G N- T- VA LA QY QY LS LS LS LS LS LA VA FA VS LA LA VS EN EN QAW LA KT KN QS QS LA KAAIPSHKKVALA QS D KD- P P P K NQ- NQ- K K E K ---- SS- M M M L L L I I L L L L L L L M L L L M L L L I I L V GK EE A DE DE D D C E DEDA DEE DE EEEG E DEEG DEEG EEEG ED ED ED EDEG EE DDEG DD EDE DDE EDEG EE DD DDEG DDE EDEG EDEG DE EE RVL HVFSLS HMFALL HIF HIF HIFALL HAG HAG HAG DAL DAL RVL ILLKSA NVL DAL DAL DVL HNF NAL RVL FAL FAL LNPPDESGPGCMSCKCVLS------GH GH GH GHN GH GH GH SY GA EA GA GD GH GD GN DHK DDR DHQ DHR GN RD RD SD TS KP KD TS SQ KNSN GD EN E YCGP TSAFAYY R CCEP YCGP AEEFSGF AHAL LKGM GFAL I I I L α R KR R Q KR R R R KR R K RK KR KR K TDSII------EILSHRR------RDSIFPPQI------KDSII------EDAII------EEDEPQGGEVSLEAAGDNLVEKRSTGEGTSEEAPLSTRRKLGLLLKYILDSWKRRDLSEDK EDDEPQP-EVSLEVASDNGVEKGGSGEGGGEEAPLSTCRKLGLLLKYILDSWKKHD-SEEK EEDEPQS-EVSLEAAGDNGVEKRGSGEGGCEEAPLSARRKLGLLLKYILDSWKKRDLSEEK ETEV------ETEV------RCMNSSV------CCMNSSV------NHINS------DSVR------ETEV------RCMNSSV------GI GI GL GV GM GL GL GL GV GL GV GL GL GL GL GL GV GV GV GV GL GL GL GL GL GL GL HKR HKR HKR HKR HKR HKR HRR HRR HRR HKR HKR HKR HKR H HKR HKR HKR L L L L L L L L L L L L I L L L -NTGH -V - -E -K ------D ------Q TAQAGAL PAQAGAL IY A G A S V A S A D D Q LH EI E E E E KN KN V I T I I I I-ETVNVA------EEIFKETVGSGQAYLLQDVGIENRKSDATSS------EEIFKDPVDSEQTYLHTNVGNENGKSDTSSS------A A A A A V A R R R R R R R L- L- R IQNHFV V- SDI LKC NTI SNI SDI SNIGYRGH FKI DLA P SDQ K K K K K K K K K K K K R R K LR LR F F F F F F F F L LP LP LP L L F K H E K K VVSQLA------AISQLA------QVAGLDE------KAAGVTGVY------MAS NAS ------KEICLRN------NSKALFEKKVG---PYISEPP--EYWEA------R N K AMVTGERQ------GPPEPNQ------APPEPAQ------ALR T T A A DLP DLP EV ELP ELP ELP ELP ELP EL FE YN RI RI RI RKEV FKAL FK FE S LLGLDPGDPGAAPA LLGLDPGDPGAAPA RDI KELP KEI H R KELP KELP RELP KELP KDL RDI RDI HDLP HDLP HDLP KQLP KQLP RELP RE RELP RELP IV LL IL IL IL VL VL VL VL VL VL VI LL LL LL LL LL V V V I V V V V VV V VL VL M M INALR INALR INALR INAI INAL VNAL IS LQ N E EE E ET RGD SSD Q R L L L L H N H EE EE AE ES E E E E E E S I N T T K K Q N N N T H V V E H1 L L L L L LTIQ L

S L L L WE WE WE FKD M M M M M L L L R S------RE IA K R K QR QR QR R R R R ET QN QN QD EK QY EA EA TL TL TL TL TL TL SL T T T T TL TL TL TL TL TL TL TL TL TL T A A T T T T T AR M I DA D T T DA DAK DAK D DA DA DA DA LVLD LVVD KATFINSLR MDK ETK EE ET ET EE RD RN RDA RK VT VS MS VN VN QN VG QDR EEK QEM QER ESA SANYPET HE YE HA HA GTW CEW EDAK N DDAK K NV K EDA EDA GMSTFI GMSSFINALR GMSSFINALR GMSSFINALR GMSTFINALR GKSSFIN GKSSFINALR GKSSFINALR GKSS GKSS GKSS G G GKSSFINALR GKSSFINALR GKSTFIN GKSTFVNA GKSS GKSSFVNALR GKSSFVNALR GKSSFINALR GKSSFINAIR GKSTFVNA GKSTFVNA GSSS GKSSFVNA GKSSFVNA GKSSFVNA GKSSFVNA GKSSFINALR GKSSFINAVR GKSA GKSTFVNA GVST H5 L L L L L L L L L L L L L L L L L L L L L L V T L L L L L L T A V VS VS IK GA TD T T LA RA N N N N N S T T T T A A A A S A A V A A S S C S S S S S A V S S α V V V V VADDAK VAQD VAED VA VAED MAEDA MAEDA MAEDA MADD MADD LA M LAE LAE IAEDAR IAEDAR IAED LA M L L LAEDA L M M G PV PK PK QS QS PR PK PE PE PE PM PT PL QKFIS QK QK EE N- QTFVD QTFVD QL LTFLEV VS HE HL HR HR TG TG TG RNAE TG TG

F F F F F F F ------F F F F F F F F F F F F F LNL LNL LNL F F F LFL F F F F LPG LPG DAFYTLVREIRQHKL ------DM DTA NI EA DT NT FL EI EI EI NE NE NE KE RE KE NE QD QD NE ND NV ND DD ------EM KM HM DM DE DE ------

TTNV KADV M M S AG K S D D D D D D E L L L L L L L L L L L L L L L L L L L L L V V I L I L L L L GDSG GQSG GESG G G GE GETG G GETG FD YE H H FD YD YD YD L GXXXXGK/M G1 FD A I I A A L A V L RYD RYD KYD KYD KYD RYD RYD RYD KYD KYD KYD KYD R R HYD KYD KYD RYD RYD K M VTGDSG V DD E E E R S GA SD H G G R R Q Q C SS S N RPGG E QGC QGC QGC SDF LDF HNF LQA LQA RTG RTG NVG NVG RSG TRA KKA RRA KKT NNA SDF KTA T F V THN L L LH V PLL PLLH PLLH L L QNLF QNYF KNYF L L L HLKF L L L KYYF I IAVTGDSG IAVTGDSG IAVTGDSG VAVTGETG VAV IAVTGESG IAVTGESG VAV I VGVTGESG VGVTGESG VGVTGESG IAVTGETG VGVTGESG LAV LAV IAVTGDSG MAITG MAITG IAVTGEAG VAVTG IGVTGESG IGVTGESG IAI VGV IGITGESG IGITGEAG IGITGESG IGITGESG IAVTG VAVTG IGVTGESG VAV L S1 DI DL NPFLH D H H DL EL D PMRPGG P ML ML ML VL VL KHRRL KHRHF ML ML ML RRTQF LL LL SQGHL ML

N R S S N E S N D S E E E H N D D N N N H D N N N N D N N N N H N D FDL FDL FDL FE FEL F F F FEL FEL WDL FEL FEL FE FE WEL FEL FEL F F T T T NA W W YLL YLL KA V L V V L L L L L V L L L L L L L I L L L L L L I L L L L L L L L L NV NL N NLSPT NLSPA NLSPA TH S S NL S S S S N G CN N N NKN SL S S G N S AKTRQGV YYL YYLL SG ST TL Y G5 WW H H SS CS CS QPSA NSQDRSA LSKLDIQQS S SAK IFYL IYYL VY VY VY VYY IYFL V V I IY IFYL V V V V RK RK GT GA GT RAAHG RAAHG RMA FK GTT STTLN STTLK LT LT KT AAT KVTEKA EITERA RNSHTN FLI FLI F F F F F Y Y Y F F Y Y Y Y Y Y Y Y F VFLVS IFLIS IFLVS IFLVS IFLVS IFLVS IFLVS VFLIS VFLIS VFLVS VFLVS IFLVS IFLIS IFLIS VFLIS VFLIS VFLIS VFLLS VFLLS IFLLS VFLVS IFLVS IFLVS VFLIS VFLV MFLI VFVLS VFLLS S6 Y Y S S S SCAV G G S S CCCLRRLRHKRM INFLRK INYLRV S S S S S S S P C C R R R H K K HA P I I Q I V V A Q Q P I K V I I I I L L L V I V L M M L L M L L L KIVSRTP ERASSVP ETVSSAP ETASSVP KEIDSSV SAAENAI KDIDNAP RDIDNAP VAAENAP ATLSQIP ANSETTR ASTESIR ASSQSIR RDIESAP RKQDLVE ERLGSAR ERLGSAR EVFDRFK TELENVT TELENVT DRLDSVT KQQDLVE RKQDLVE EDMLISR GALDHFR DNPSNAT DNTSNTT ERAANTP EKRANTP EALSHFQ PHA AQ Y------P P P P QQ P P P P reports L L L L L I L L L V L L L L L L L L L L L L L L L L L L EP EP EP DP DP DP DP DP D DP D E EP DP EP EP EP DP E AVTY SAFQALK TG AG TIFQ GG GG GG GS DF DT TTGIHYLK GGAAVAS GGTAVAT AG GG GG AA GT ST RK T T T T A A S T L L L QNKQLGNVT Y IL K K SFDQFMNVS E QNKQLGNVT Y C L L VL T P N A T D- NISK RISK G GN D D C S C C E EY KD QA-Q QA-Q QA QA A KY F C E PL S ---PYIET A S A A S A A A A A A A A A G G G A A A A E E E E D D E E E E E E E D D E E E E E E E C L L V L V I M V M V V V V L QKA NKA K R R R S EKV N E K R Q R QA RAG RSG ISL I RDA I SA α GV GV GV GV GM GV GV GI GI GV GL GV GV GI GI GV GI I I I V I V I I I I I L L L L V V V L I V V QESLDS IKT SNI RVA REA RVA KEKK LK-- LK-- KQ-- KE-- KD-- KD-- QNVK MEAN QKER HKE RKI RKI SKI- LNA LNA LKE LKE QKEK QKER QKV RKI KAE QREK L L L L L L L L L L L L L L L L L L L L L L GL GL EL GL GL GL GL GL SDVMIQ QKAVSD PEVISN LEVVPM LEVVSM LDMVSV QLTNSA QKAVSV QSAFSA KDVADI LELVYG PQAASR PQAASH PQAASR EAALSW EAALSW SNH VTN LAN TER AER AER EKY KVN KVN LKN YRN HRN HRN VN EN KS KK KR KE KE VNTFREN ITQ VKH IN L L L L L L I L L L L I L I L L C C C C C C C C C C C C C C C C C C C C C C C C Y Y N NVLEN NIRET H H H D D D D D N D D D K N N N N NIREN NIQEN NIQET LN ND ND E E E KD AASLDV AQYTQE H4 Q Q W K SMVDLIEDKAVEEVRQWTEK IR IKD IRD IR IR IRD I IRE I I IRE IRD IRD IRD MRQ IRE IRE IRE IRE IRE IRE IR IRE IRE IRD LR I IRD IRE IRE IRE D N Q K K Q N - N N N E E E Q K K T A H H S V S N AK G V R L L L DSGCTAARSPEDELWEVLEEAPPPV DSERAAALSPEDETWEVLEEAPPPV --SRQAQDLARSY

ALQEGN SLKEGN ASADGN ALGGRK ALGDGK ALGEGK IR------LSQTRDFTDNPSK IK------KL IK------KL ALREGK LK------LK------AK------AK------LNSSAEYINEMEC IK------IK------D D C C D D D - S A A E S Q R D Q E Q K Q S S G Q Q D Q Q Q L Q Q K K K K K K R K K K R K K K K H K K L L L L L L L L L L I ELRMRKGN MTYIEENK H E QSHLEKGD QAALKEAK E E E E RTAFESGD RTAFESGD RSAFESGD RDVFAGESPETIPHR NTIK------S NTIK------LETLKESIEKNNISD S D D REAFETGG REAFETGG B LL LL VL LL IL LL VL VL IL LL LL VL LL VL VL ML VL VL VL IL VL VL I I I I I I V I I I L L L I V V I L I V V V V α L L G S T R -TE N N N N A A A T KTR KTT ATA AAV PDV ------MEC A A AFGTIS------NYFKETSLV- A N ITKLQNMYKSTGFGAAK Q AAA AAA A A A A ------MTEYK DERV DEPM DEQR DQKKT DLKKT DQKKT DPEQT DAEQT DAEQT SETQA DPEKT DKEK NKEE NKDE DEQK DEQK DQEK KKEK SEAA SEAA NKDE KRER REAA TGGPV------KGGPE------SGGLI------SGGPI------NGGLP------SALSFL------WPTGGAAATGG GALSFL------WPAGGAAATGG LRFLP------C LRYIPL------L LSYIPF------L LSYIPI------L GTIMAL------LPG--GALP GTIMAL------LPG--GALP DSKNLAVN------VQNTADA AEEGLK------FIPLFGTLV F F F F F F F F F F F F F F F F F F F F F F F V V V V V L L L I I V L L L L N N T -INEQQ N N N D N N D S E D D T SEGKKAG R R K K K K K R R R R K R K K K R SKETAI QEDHLG EHIYEM NQDDLD SQEILN SEYDIT SPETLT SQEAIT SQETIS ASENLD PKATAT SESGAM AKEELE AKERLE AKEELE DAKPKE DAKPNE GAAELE GAAELE L V V I PEVSRS- GESDPNA STQDLPS STEDLPT YTQDLPT STQDLPT ADVIKGL RNNNQAL QNNNQAL QNNNQAL QNNNQAL II II ML IL II II L VL IL LM LM LM VI VI L I I L I M L L L L VL VL K

α 10 3 deposited research deposited TCAIVNAFFR NCNTSSCLYT NCKTASYLYS NCNAASYLYS SNPVIAITKT SNPVIAIAVT QVSGTLVVLFSAEYVAS------LVPGVGSVA AKPVKSLEDL GAASVLISEDAVELLVS------FIPIIGSVV GAVSVVGAESTVEYILS------LVPILGTVV EAEVPKIEN---EYFLS------FMPFIGTEIKKIKSSVA SSATLVLGGMSVLAAESALEYFLSTIPLIGSVA SELASVAGLMAAEEGLR------FIPIFGTMI LQSAAVAGLM SQSSAISSLTETRESYS------FIPLFGIPV AEAEKDTST--ATTRLV------ELAIPRQARSVSRSFTVMLQA NQTASVAGLMAAEEGLR------FFPIFGTMI M M M M M I L M L V L L L L L L M QTQG-----DLCCEETE KSQRR----KSK STG------ALPEVQ R----AESNK R----AESYK S----AEQRK C----SVANG T----AEKKK T----AEKRK T----AEKKK RTIQL-LQQASSMGKECY LN---AQRSQ QN---AQRSQ QN---AQRSQ TNE--ADGKPQT STS------VLSEVR STS------ALLKER STR------VLPEEQ STC------VLSEEQ QS---EQRYQ DNE--QKFKP YNE--QKAKPIA YNL--KRIKP A--ATRSQRPSG A--ATRTQRPSG H----AESHR H----AESHR D----SEKYK YNL--KRTKPSD YNE--EKSKPMS A--ATRTQRPSG d L I I I L I PGKAGSEGLQQVVGMKKSGGG QERLSRYIQEFCLANGYLLP------KNSFLKE EDKLFKYIKHISS IAQATSAAEAFCA GEKLLRYVEKFCS GEKLLKYVEKFCS AEKAMKCVECYCS RL SC SC SW RLYSQSSDGAMRVARAFERG------IPVFGTLV RLYSQSSDGAMRVARAFERG------IPVFGTLV RLYSQSSDGAMRVARAFEKG------IPVFGTLV DM DM TR SP TNKELSALTSKEAAVKFAWS------MVPVVGSIKTAQ TNKELSALTSKEAAVKFAWS------MVPVVGSKKTAQ TL SL NL AL KA KL NL GSWAGEGTAGGAA GAWAGEGTAGGAA α DL DV DV DV DI DI DL DL DL DI DL DL DL DL DL DL DL DL DL DL DI DI DL DL DL I L I L L I L V L L L L L I M L I L L L L I L V L L L L M N N N N N SC D N N S R R R R AS AS AS N S S S E E N N LAART--VE------S N E QS MG T S S S P S K ------A A A V V V ------L L A R K K I L I I VKRR ------ITQT I I V R R ------

AS------VKDTEKS RVR D E E E W E T T T A A - W W W L L

A L L C D D D D E E E E E E D D D D E D α KID P K

G4 TKLD N TKVD TKLD TKLD TKLD TKLD S KDN- KD RG TG AW TDRI TEHI QEK- LDRM GSYS SKSE DA NPSS NPAS NTSL NT SP SP SP GQK- GQK- NAQL DKAF W Q W W W W V GN V V I L L V V V V I I V I I I V I I I I I I I N/TXXD V V V V V V L V I I I I I I LL Y S5

FYFIRTKID FYFIRTKID FYFIRTKID FYFVRSKID FYFVRTKVD ETVYFVL FYFVRTKID FYFVRSKVD FYFLRSKID FYFLRSKID FYFVRSKVD FYFVRSKID P L L N E N L L K N T J HLF-AEHN HLLSVEKE HLLSVEKY AVF-KPTD LAN-E LAN-E LAN-E VSL-A LVV-A LVV-A LKA-G LGLNI GPVTRAEV GPVTRAEV LHH-G LRG-G LCY-G LSLDN LSLNT LTLDN KKFY K K K K KRFYFVR KHY KKFYFVRSKVD KKFYFVRSKVD KKFYFVRSKVD KRFY KRFY KRFY KRFY K KKFYFVRSKID KKFYFVRSKID KKFYFVRSKID KKFYFVRSKVD KKFYFVRTKVD K KKFYFVRTKVD KKFYFVRTKVD K K K KKFYFVRTKVD KKFYFVRTKVD K α P P QNFYTLRRE LDFWLL RDLHTI QDVHTV QDLHTA RFKD-G RFKD-G KIAS-A P P ------MN - - - QGEDPPEVLEEEKAQNASDGNSGDA EGEDPECLGEGK - - SP S SP S S S S SP SP SP S S S SP SP SP SP SP SP SP SP SP SP SP G G R- K- K- G G G K- G G R- G RF KCLDFWSL T MF K K K K K R R R K K R K K K R R S MG RK- SN- SN- QK- KQ- M MG LG Q SL-Q MG IG Q Q L L L L M AK RR RR I KL K K K K K KMG KMG KMG R KM RMG RMG RMG R KMG KM RL R R MI IM IM IM VI VI VI L I I LM VI LM LM LM VI VV II E SM QS QV D N K R Q Q Q Q AQ K Q QGQ KERK- KERK- M M M Q KDSDDVPMVL KDA R KE V RTD RTD L LCQ L V I I I I I I I I I I I I I I I I I I I I I V I I I I V V I I I EA KGE KAN KAN KSFTK RS RS RS NAAKT KAAKT GH KA KA KS TS ATRTRF AARAHF KLA KS KS TS TS T-

E A I TAED A A A E E E E E A A S E E G E R L L I L V I L L I L L L V L L L L L L L L L L TA K K NA NA K QK TA QN EE SE AE SE αΙ VDE V V DQ DDFKVH AD EK EK EE MEYKDN EEYRA EEYRA EV KV EE TD SL EE EEFKS EE TD ED VD L LAK IAK LA LSK LAK LAK L L LAK LAK LAK LAK LA LA LAK LAK LAK LAK LA LA LA LAK LAK LAK LA LAK LA LA LA H3 V V L V V V V M M V V L L V V I I L L L L K D Q K M K K K E E M Y M M K K K K M V AQ AL AQ AL V V V V V SR TR TR N N YREQI H H L L L E E E N Q NDI ND ND ND NDV ND N N N N N NDI NDI NDV NDV NDI NDI NDV NDV NDV NDV RDWEIE QDLNMS EKLGAP KDLNVS KDLNVS EKLDMS QSMGTV QSMGKP QSMGKP QSMGKPKEEYKA EQVGKQAGD EQVGKQAGD EQVGKQAGD ERVNKP GRVNKP EKINK--PL VRVNN--LS VRVNN--PS DTTGVQ ERTNKP DTTGVQ DSTGVP CDVSGKS CERSGKT CERSGKT CERSGKT ARQRGL-DPAKLKALRTCALSVE VAITGVP L LA IA IA IA IA IA VA IA MA VA LA LA LA LS LS LS LS LS L L L L LA L LS LA LA

αΗ QR EN KE ET KT KG QQ QQ QQ QE AK AK AK QV EV AR DK DK QK QM QK QR QR QR QR KQ QR QR ARRERALGLAPGV ARRERALGLASGE K------K K------E S------L T------I K------I S------L S------S S------M S------M S------M S------M R------E R------E M------Q K------E K------E K------E R------ECHTQ K------ECHTH R------ECHTQ K------EHHSL R------K R------E R------E R------E SL SL SI SL SL SV SV SL SL SL SL SL SL AL AL SL SI SI SL SL SL SL SL SL SL SL SL SL AV AL VR------E F F F F F F CG------AV CG------AV CG------AV T A Q A I K E E K D D D D G K K P P Q E E K RF R RF RF RF RF RF RF RF RF RF RF RF RF RF RF RF RF RF R R R RF TP GP PA PT T E E E E DWEK------VRH INNTKSFEDIH T T S T S EQ AQ EQ EQ EQ E E E D D D D TDRP------SANSVAVWKEV ANAF------SSSEGQQVASVLAL-CDV T T

SNQ SNQ GRP GRP A S S S S S A PR PR PR S DTC ET LDD LDD C I T T N V F GLNE N GLE GLE ISA ISA ISS ISS I ISS IAS VAS IAS IAS IAS VS VS VS V V V L ISS ISS ISS IAS IAS IAS ISS LTS I IS IS ISA ISA V FGVDE FGLDD FGLDD FGLDD FGLDD FGLDE FGVDD ------FGVDD FGVDD FGVDD FGLDD FGLDD FGLDD ------FGL FGL FGLDD L FGLDE FGLDE FGLDD FGLDD FGLDD F LC ------FGLD FGL FGLD FGL L L ------refereed research

SL APTEENWAQVRSLVSPDAPLVG APTEKDWAQVQALLLPDAPLVC

80 ------MGQLFSSPKSD-ENNDLPSSFTGYFKKFNTGRK ------MAWASSFDAFFKNFKRESK ------MDQFISAFLKGASENSFQQLAKEFLPQYSALISKAGG ------MGQSS-STPSHKTGGDLASSFGKFFKDFKLESK ------MGQSPPSTPSNRNGGDLASSFDKFFKEFKLDSK ------MDKFMCDFLVGKN---FQQLAINFIPHYTTLVNKAGG ------MKPSHSSCEAAPLLPNMAETHY---APLSSAFPFVTSYQTGSSR ------MEAM ------LHCFFPLLQVTPLLSDVTQPTHSLHTPLLTSSNYDMPYNMGWSS ------MTQPNHSLHIPLSTSFTSIVPYNMGWT ------MAQPTQSLHTPSPTSFTSTVPYHKGGS ------MATSRLPAVP--EETTI ------MATSKLPVVPGEEENTI ------MATSKLRAVPGEEETTI ------MFFSRLCMPAK ------MPEKEEDKNENLYIISSEFLDIMSNATDDPDSISEDMKE ------KEEEDENENLYIVSSEFINIMSNATDDPDSISVDMKE ------METQDP-AIAEAVQASGESTLEK ------MTDDSSADM-NFSGALQR ------MKIQKQKQELSNSSKPDTHSHSTAKENV-SLKSANTVQ ------MATFEDYCVITQEDLDDIKDS ------MDILEDYDIITQNDLEEIKES ------VDALEHLYEIKVEDKLKEIKEI ------MSNISQKVVLLFAEQEELVDLRKA ------MLHGGWLKARYATQHVQQTEKLETED ------MAIQCTHRICSYLTNSLFFRFVVSTALRSMK ------MADSSDIVEIKEA ------MADSSDFAEIKEA -MVNVCVCYITVGLSVGMISRLSDFYIVTVGFALCVQVIMADSLDTTEIKEA -MVNVCVCYITVGLSVGMISRLSDFYIVTVGFALCVQVIMADSLDTTEIKEA ------RLLPPAQDGFE ------RLLPPAQDGFE ------154 141 165 154 155 156 161 117 130 149 149 137 139 139 132 148 146 123 126 158 157 130 130 130 128 122 133 135 149 122 160 160 169 170 319 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 351 ) 308 ) 285 ) ) ) )

) ) 351 ) ) ) ) Tetraodon Tetraodon Fugu Fugu Tetraodon Tetraodon Fugu Fugu Tetraodon Tetraodon Fugu Fugu rgq2 (zebrafish) 276 Irga6 319 Irgb6 305 Irgd 324 IRGB12 (dog) IRGB11 (dog) IRGD (dog) Irgm1 310 IRGM (a) (human) 320 IRGM6 (dog) IRGM5 (dog) 321 IRGM4 (dog) Irgc 302 278 IRGC (human) 298 IRGC (dog) 298 irgg1 (zebrafish) 304 irge1 (zebrafish) 309 irge5 (zebrafish) 307 irge3 (zebrafish) 284 304 irge4 (zebrafish) 285 irge2 (zebrafish) 319 irge6 (zebrafish) 318 irgf1 (zebrafish) 293 irgf3 (zebrafish) 293 irgf2 (zebrafish) 293 irgf4 (zebrafish) 290 i irgq1 (zebrafish) irgq3 (zebrafish) irgf7 ( irgf8 ( irgf6 ( irgf5 ( Irgq1 303 IRGQ1(human) 343 H-Ras-1(human) Irga6 Irgb6 Irgd IRGB12 (dog) IRGB11 (dog) IRGD (dog) Irgm1 IRGM (a) (human) IRGM6 (dog) IRGM5 (dog) IRGM4 (dog) Irgc IRGC (human) IRGC (dog) irgg1 (zebrafish) irge1 (zebrafish) irge5 (zebrafish) irge3 (zebrafish) irge4 (zebrafish) irge2 (zebrafish) irge6 (zebrafish) irgf1 (zebrafish) irgf3 (zebrafish) irgf2 (zebrafish) irgf4 (zebrafish) irgq2 (zebrafish) irgq1 (zebrafish) irgq3 (zebrafish) irgf7 ( irgf8 ( irgf6 ( irgf5 ( Irgq1 IRGQ1 (human) H-Ras-1 (human) Irga6 Irgb6 Irgd IRGB12 (dog) IRGB11 (dog) IRGD (dog) Irgm1 IRGM (a) (human) IRGM6 (dog) IRGM5 (dog) IRGM4 (dog) Irgc IRGC (human) IRGC (dog) irgg1 (zebrafish) irge1 (zebrafish) irge5 (zebrafish) irge3 (zebrafish) irge4 (zebrafish) irge2 (zebrafish) irge6 (zebrafish) irgf1 (zebrafish) irgf3 (zebrafish) irgf2 (zebrafish) irgf4 (zebrafish) irgq2 (zebrafish) irgq1 (zebrafish) irgq3 (zebrafish) irgf7 ( irgf8 ( irgf6 ( irgf5 ( Irgq1 IRGQ1 (human) H-Ras-1 (human)

ExtendedFigure 6 alignment of the vertebrate IRG proteins Extended alignment of the vertebrate IRG proteins. Individual sequences are given in full and are labeled as in Figure 5. Unusual residues in the G1 motif are highlighted (M of the GMS proteins in green and two deviant residues in the zebrafish irgq sequences in pink). The essential structural relationship between IRG genes and quasi-IRG genes is apparent in the alignment despite the modified G-domains. For mouse and human IRGQ the long carboxyl- terminal coding exons that contain the p47 homology were used for the alignment. In human IRGQ the sequence interactions ENPKGESLKNAGGGGLENALSKGREKCSAGSQKAGSGEGP was removed from the alignment between positions 210 and 211 (highlighted in turquoise) to prevent extensive gap formation. The position of the intron present in pufferfish and zebrafish irgf genes is indicated by two adjacent residues highlighted in blue. humans; unpublished data) rather than gained by the murine tem is present in bony fish, representing ancient vertebrates. rodents (including rat; unpublished data) on the following Rapid expansion and contraction of multigene families asso- grounds. First, like the mouse, the dog genome has several ciated with pathogen resistance has frequently been docu- complete, interferon-inducible IRG genes in addition to mented in both animals and plants [23-28]. In all of these information IRGC. Second, humans and chimpanzees possess a degraded cases, however, the resistance mechanism itself has been member of the GMS subfamily of IRG proteins, confirming retained as its protein mediators have evolved or even, in the that this distinctive subfamily, present and functional as natural killer receptor case, been replaced by a different resistance genes in dog and mouse, was widely distributed at molecular species [29]. The IRG case may be different the origin of the mammalian radiation. Finally, the IRG sys-

Genome Biology 2005, 6:R92 R92.12 Genome Biology 2005, Volume 6, Issue 11, Article R92 Bekpen et al. http://genomebiology.com/2005/6/11/R92

(a) Irgc CINEMA Irgq FKSG27 170.559 271.086 Kb Mouse chromosome 7 13.1 13.2 13.3 13.4 Mb

Kcnn4 Plaur Xrcc1

49.0 48.9 48.8 48.7 Mb Human chromosome 19 914.746 789.088 Kb IRGC CINEMA IRGQ FKSG27 Irgm1 LRG47 Irgb8 Irgb9 Irgb1 Irgb2 Irgb3 Irgb4 Irgb5 Irgb10 Irgb6 TGTP (b) Irgm3 IGTP Irgb7 Irgm2 GTPI Irgd IRG47

Mouse chromosome 11

59 58 57 56 55 54 53 52 51 50 49 48 47 Mb

Human 246 245 244 IRGM

Human chromosome 5 // // 130 131 132 133134 135 147 148 149 150 151 152 153 154 155 177 178 179 180 181

Mouse chromosome 18 64 63 62 61 60 Irga1 Irga2 Irga3 Irga4 Irga5 Irga7 Irga6 IIGP Irga8

theFigureSynteny IRGC 7relationships and IRGQ genes between the human and mouse IRG genes (a) Synteny between mouse chromosome 7 and human chromosome 19 in the region of Synteny relationships between the human and mouse IRG genes (a) Synteny between mouse chromosome 7 and human chromosome 19 in the region of the IRGC and IRGQ genes. The figures indicate distances from the centromere in megabases. The locations of three further syntenic markers are given. Gene orientation is given by black arrows. (b) Complex synteny relationship between human chromosome 5 and mouse chromosomes 11 and 18 in the regions containing the mouse Irg genes. Figures indicate distances from the centromere in megabases. The locations of IRG genes are shown in the yellow panels. Positions of diagnostic syntenic markers are also indicated. Syntenic blocks are given in full color, and the rest is shaded.

because here the resistance mechanism itself has apparently the p47 GTPases [7,8] are performed by an unrelated and been lost during primate evolution. thus far unidentified molecular machine in primates.

It will be of interest and of considerable importance to ana- The loss of a highly evolved and complex resistance system lyze the different strategies by which humans, dog and mouse that is active against vacuolar pathogens needs an adaptive deploy resistance mechanisms effective against vacuolar explanation. The evolution of a successful avoidance strategy pathogens. None of the known mechanisms active in humans by the pathogens is unlikely, because many different patho- against vacuolar pathogens, namely nitric oxide and oxygen gens and pathogen classes are controlled by IRG proteins in radicals [30-32], tryptophan depletion [33,34], accelerated the mouse [2,4]. Nevertheless very recent evidence suggests acidification by Rab5a [35], cation depletion [36-38] or that Chlamydia spp. divergence between humans and mouse autophagy [39,40], is missing from the mouse. Nevertheless, may indeed be partially driven by differences in the deploy- it remains possible that the distinctive resistance actions of ment of p47 GTPases, in this case Irga6 (IIGP1) [41]. Human

Genome Biology 2005, 6:R92 http://genomebiology.com/2005/6/11/R92 Genome Biology 2005, Volume 6, Issue 11, Article R92 Bekpen et al. R92.13

(a) comment

Marker DNA kb

HeLa GS293 estis

HeLa GS293

no RT no DNA

1

no

Brai n

Live r T IFN - γ +- +-

641 bp IRGMc IRGMb 504 bp IRGMc Mc IRGMb Mb 38 Cycles reviews 500 *

495 bp GAPDH 27 Cycles

(b)

100 reports U3 region U5 region (ERV9) PR (ERV9) IRGM(a) 385 IRGM(b) 385 30,529 20,255 IRGM(c) 385 29,250 1,402 20,255 IRGM(d) 385 research deposited 30,529 17,161 3,436 IRGM(e) 385 29,250 1,402 17,161 3,436 Promoter Region Start Codon Stop Codon ORF Exon Intron

FigureStructure 8 and expression of the human IRGM gene Structure and expression of the human IRGM gene. (a) (left panels) RT-PCR analysis of expression of IRGM in HeLa and GS293 cells. The b and c splice variants were amplified simultaneously by the same primer pair (IRGMs1-rGMS). A different downstream primer (IRGMs1-r1) internal to all the 3' splice forms was used to show differences in the overall expression level of IRGM in the two cell lines. No RT' indicates that no reverse transcriptase is included refereed research in cDNA preparation. The band immediately below the IRGMc band in GS293 cell material, indicated with an asterisk, is a nonspecific band amplified only in this cell line. The band was sequenced and is unrelated to IRGM. (right panel) Analysis of IRGM expression in human brain, liver and testis by RT-PCR. GAPDH was used as a control. (b) Five splice forms of the IRGM gene have been identified, as indicated: IRGM(a)-IRGM(e). The promoter and 5'- untranslated regions of the gene are associated with an ERV9 retroviral LTR. Scale-bar is given in base pairs.

Chlamydia trachomatis is controlled by IIGP1 in interferon- ance to certain RNA viruses and have been considered func- treated mouse oviduct epithelial cells, whereas control of the tionally related to the p47 GTPases [4,47-49], exist in a extremely closely related mouse C. muridarum is independ- balanced polymorphism in the wild over null alleles [50,51], ent of interferon. However, a more plausible general model is and have been lost spontaneously from all except two labora- interactions the evolution in the primate lineage of improvements to the tory mouse strains [52]. It is not yet obvious what specific battery of parallel mechanisms, rendering the IRG system costs might be associated with possession of the Mx or IRG redundant. Interestingly, in this context it was noted in the resistance systems. Chlamydia study quoted above that interferon-stimulated mouse oviduct epithelial cells did not express the important The IRG proteins are well represented in the bony fish but, resistance factor indoleamine deoxygenase, which is respon- although they are abundant and diverse in the zebrafish, in sible for tryptophan depletion, whereas this is well expressed Fugu there are only two very closely linked and similar genes in interferon-stimulated human HeLa cells [41]. In general, that are, indeed, annotated as a single tandem gene in information pathogen resistance mechanisms also carry costs, for exam- ENSEMBL (although we judge this not to be the case). Thus, ple autoimmunity and allergy, arising from the adaptive the available annotated fish genomes seem to mirror the IRG immune system and many others arising from innate immu- situation in the mammals, with Fugu and Tetraodon reflect- nity [42-46]. Indeed, the interferon-inducible dynamin-like ing the reduced human case and Danio the complex dog and GTPases, the Mx proteins, which confer on mice strong resist- mouse. However, it has not yet been reported whether any

Genome Biology 2005, 6:R92 R92.14 Genome Biology 2005, Volume 6, Issue 11, Article R92 Bekpen et al. http://genomebiology.com/2005/6/11/R92

fish IRG genes respond to infection [53]. Both Danio and the recent selective force. If the two tandem IRG domains are pufferfish are Actinopterygian fish, in which where there is indeed expressed as a tandem protein (as favored by the increasing agreement that the genome has been amplified by ENSEMBL annotation), then it will be as a heterodimer with three rounds of duplication [54,55]. It is plausible that the significant sequence variation at the carboxyl-terminus. The complex IRG representation in Danio may be attributed to extreme case of heterodimer differentiation in IRG tandem preservation of these genes on more than one of the potential genes may occur in Danio, in which gene irgg, a canonical eight paralogons, whereas only a single copy carries IRG (although truncated) IRG gene, is apparently expressed in genes in the pufferfish. Further clarification of this issue tandem with the adjacent downstream gene irgq1, which is a awaits the completion of the genomes. modified (and also truncated) quasi-GTPase gene that is unlikely to function as a GTPase (Additional data file 7). In The phylogenetic origin of the IRG proteins is obscure. this case the role of the irgq1 domain may be regulatory for Because the family is conserved at least down to the bony the amino-terminal irgg domain. Other IRGQ proteins may fishes with little structural modification, the IRG genes are also be regulators of IRG proteins, interacting with the func- not strictly fast-evolving and their basic conservatism makes tional IRG proteins with a symmetry resembling one or other them easy to identify. Thus, their apparent absence from most of the two dimer structures. Thus IRGQ proteins would coe- known invertebrate genomes is probably real. Although many volve with IRG proteins. This would explain why the three components of the adaptive immune system appear to have irgq genes of the zebrafish have no homologs in the puffer- evolved close to the chordate-vertebrate boundary [56], this fishes, with their single tandem pair of irg proteins, and is not generally the case for innate immune mechanisms recalls the recent observation that the GAP protein of the [57,58] There seems to be no reason in principle why the IRG small GTPase, Rap1, is itself probably derived from a GTPase system should not work in invertebrates because it is cell- ancestor, retaining the G-domain structure but not the autonomous. However, the putative GTPase sequences that sequence to reveal its origin [59]. we have recovered from C. elegans and from Cyanobacteria are too distantly related outside the G-domain for a clear phy- Better understanding of the mechanism of action and regula- logenetic relationship to IRG proteins to be established from tion of the p47 GTPases is needed before their complex evolu- sequence similarity alone, whereas the similarities within the tionary history can be put in context. From the evidence we G-domains, although occasionally striking, are to some extent present here, however, it is already clear that effective resist- forced by the maintenance of a highly conserved function, ance to vacuolar pathogens in humans and mouse must be namely regulated GTP hydrolysis. A stronger case for a organized on radically different principles. meaningful phylogenetic relationship between these proteins and the vertebrate IRG proteins would follow from structural Nomenclature evidence that they display the distinctive IRG fold exempli- We introduce here a general nomenclature on phylogenetic fied by mouse Irga6 (IIGP1) and from a detailed analysis of principles for the p47 GTPases, based on the stem name IRG their catalytic mechanism. (immunity-related GTPases). This stem was favored over other possibilities because the name IRG-47 has priority in The basic unit of IRG protein function may be a dimer the literature as the first description of a p47 GTPase [60]. because several genes we have identified occur in pairs in a The phylogenetic basis of the nomenclature is apparent from head-to-tail arrangement, are expressed as tandem tran- Figure 5; each deep monophyletic clade is identified by a sin- scripts, and are presumably expressed as dimeric proteins. gle-letter suffix as IRGM or IRGC. The nomenclature pro- This conclusion is consistent with the dimer of IIGP1 (Irga6) posed here in Figure 5 and in Additional data file 1 has been observed in the crystal, shown by site-directed mutagenesis of accepted by the committees of human and the dimer interface to be essential for GTP-dependent oli- mouse, and by the zebrafish sequencing project. We have gomerization and cooperative hydrolysis [12]. However, a tried throughout to use the different forms of gene name second dimer interface is also required for oligomerization accepted by the nomenclature committees for mouse (Irg) (unpublished data), and which of the two dimer structures human (IRG), dog (IRG) and zebrafish (irg). The nomencla- the constitutive IRG dimers represent is of considerable ture of the IRGQ genes departs from the phylogenetic princi- interest. Unlike the observed homodimer of IIGP1, the prod- ple. The IRGQ nomenclature simultaneously recognizes the ucts of the two putative tandem genes in the mouse Irgb2/b1 affinity of these sequences to IRG genes and stresses their and Irgb5/b4 are heterodimers, implying that the two IRG anomalous GTP-binding domain features. It is, however, subunits serve distinct functions in the protein. The same highly unlikely that the IRGQ sequences of humans, mouse, may be true for the tandem pair of irg genes of Fugu, which and fish represent a monophyletic group. It is more likely that are annotated as a single tandem gene in ENSEMBL. These the IRGQ genes of each taxonomic group derive from IRG latter genes have diverged very recently, because with three genes of other clades of that group. This pattern is already exceptions they are identical at the nucleotide level over the apparent by inspection of the irgq protein sequences of the first 290 amino acids. However, they have diverged substan- zebrafish in Figure 6, but it cannot be discerned from the G- tially in the carboxyl-terminal region (Figure 6), suggesting a

Genome Biology 2005, 6:R92 http://genomebiology.com/2005/6/11/R92 Genome Biology 2005, Volume 6, Issue 11, Article R92 Bekpen et al. R92.15

domain-based phylogeny shown in Figure 5 because of the analysis was conducted using the neighbor-joining method specific divergence of the G-domains of the irgq proteins. [70], as implemented in the MEGA2 program [71]. We used

p-distances for constructing the phylogenetic trees. Reliabil- comment ity of the neighbor-joining trees was examined using the boot- Materials and methods strap test [72]. Use of database resources All available public databases were extensively screened by Alignments were performed via the BCM multiple alignment BLAST and related searches for sequences belonging to the programme suite [73] and EBI-ClustalW [74] using the IRG family. In the case of the mouse, transcript sequences default options and manipulated according to the crystal derived from the C57BL/6 strain were given preference over structure of IIGP1 [12]. Shading of alignments was performed sequences of other and undefined strain origin, and com- using Boxshade [75] and additional sequences were shaded reviews pared in all cases with genomic sequence available via the manually according to the default options of Boxshade. ENSEMBL (v28.33d.1, February 2005) array of websites [61]. A systematic study of polymorphism has not yet been com- Identification of transcription factor binding sites pleted, but it is already clear that nearly all IRG sequences Promoter regions (2 kb upstream of putative transcription derived from the CZECHII cDNA libraries (Mus musculus start point) were screened for putative transcription factor musculus) differ from C57BL/6 sequences. These differences binding sites with the Transcription Element Search System make allocation of many CZECHII sequences to individual [76,77], and the results were further analysed and confirmed clade members of the C57BL/6 mouse problematic. manually. Additional promoter analysis of Irgc (mouse Cin- reports Identification of certain Irg sequences with recognized gene ema) and IRGC (human CINEMA) was performed with Con- symbols was achieved through the Mouse Genome Initiative Site [78] based on phylogenetic footprinting [79]. web resources [62]. Where ambiguities persist in the mouse genomic map, especially on chromosome 18 in the region of RT-PCR on cells and tissues IrgA6-IrgA8 (Mb60.878-60.958), and on chromosome 11 in C57BL/6J mice were obtained from the animal house at the the region from PA28βψ to IrgB7ψ (Mb57.570-57.700), we Institute for Genetics, University of Cologne. Listeria mono- used primary BAC and cosmid sequences to reach a consen- cytogenes infection was performed as described previously research deposited sus view. [13]. Twenty-four hours after infection, the mice were killed, and liver, lung and spleen were removed and snap frozen in Human and dog IRG sequences were identified from the liquid nitrogen. Mouse L929 fibroblasts were stimulated for available public databases (ENSEMBL, National Center for 24 hours with 200 U/ml interferon-γ or 200 U/ml interferon- Biotechnology Information) and confirmed wherever β (R&D System GmBH, Weisbaden-Nordenstadt, Germany possible by multiple sequence comparisons at transcriptional and Calbiochem-Novabiochem Corparation La Jolla, CA, and genomic levels. Fugu material was obtained and analyzed respectively). Human cell lines (Hela, GS293 (GeneSwitch™ through [63-65]. Tetraodon sequence was initially assembled -293, Invitrogen GmbH Karlsruhe, Germany), HepG2, T2, refereed research from the GSS sequence database at National Center for Bio- THP1, MCF-7, SW-480, Primary foreskin fibroblast-HS27) technology Information and subsequently from the were stimulated for 24 hours with 2,000 U/ml interferon-β or University of California at Santa Cruz compiled genome data- 200 U/ml interferon-γ (PBL Biomedical Laboratories, NY, base [66] via the BLAST server. Zebrafish sequence was USA and Peprotech/Cell concepts GmbH UmKirsh, Ger- obtained from zebrafish genome resources at the Sanger Cen- many, respectively). Total RNA was extracted from tissues tre [67] and analyzed in an Acedb database using the Spandit and cells using the RNAeasy mini kit (QIAGEN, Hilden, Ger- annotation tool. many), except for testis, for which the RNAeasy Lipid Tissue Kit (QIAGEN) was used. Poly(A) RNA was isolated from total Chromosomal locations and synteny analysis of mouse and RNA using the Oligotex mRNA kit (QIAGEN). Total RNA interactions human chromosomes was initiated through ENSEMBL [68]. from human tissues was purchased from Biochain (Hayward, Further details were obtained through the Sanger Centre CA, USA). cDNA was generated from mRNA and total RNA [69]. Nucleotide sequences and translated open reading using the Super Script First-Strand Synthesis System for RT- frames of IRG family members used in this paper are given in PCR (Invitrogen, Carlsbad, CA, USA). The generated cDNAs Additional data files 9 and 10, and can also be accessed at the were screened for the presence of p47 (IRG) GTPase tran- IRG family database at our laboratory [14]. scripts by PCR. A list of the primers used is given in Addi- tional data file 8. The amplified fragments were confirmed by Phylogeny and alignment protocols sequencing. information Routine sequence analysis and local sequence database man- agement was handled using DNA-Strider 1.3f12, Vector-Nti, and MacVector 7.2. The identity and similarity matrix on pro- Additional data files tein and nucleotide sequences (Additional data file 2) are The following additional data are included with the online based on GeneDoc (version number 2.6.002). Phylogenetic version of this paper: A list of all IRG gene family members

Genome Biology 2005, 6:R92 R92.16 Genome Biology 2005, Volume 6, Issue 11, Article R92 Bekpen et al. http://genomebiology.com/2005/6/11/R92

described in this paper (gives names, synonyms, accession geting to plasma membrane induced by phagocytosis. J numbers and further information for each IRG gene; Addi- Immunol 2004, 173:2594-2606. 7. Martens S, Parvanova I, Zerrahn J, Griffiths G, Schell G, Reichman G, tional data file 1); nucleotide and amino acid identities based Howard JC: Disruption of Toxoplasma gondii parasitophorous on G-domain of mouse Irg family (gives percentage of identity vacuoles by the mouse p47 resistance GTPases. PLoS Pathogens 2005 in press. on both protein and nucleotide level within the mouse Irg 8. MacMicking J, Taylor GA, McKinney J: Immune control of tuber- family; Additional data file 2); ISRE and GAS elements of culosis by IFN-gamma-inducible LRG-47. Science 2003, mouse IRG family genes (contains the positions and exact 302:654-659. 9. Uthaiah R, Praefcke GJM, Howard JC, Herrmann C: IIGP-1, a inter- sequences of all ISRE and GAS elements found in putative feron-γ inducible 47 kDa GTPase of the mouse, is a slow promoters of mouse IRG genes; Additional data file 3); induc- GTPase showing co-operative enzymatic activity and GTP- dependent multimerisation. J Biol Chem 2003, 278:29336-29343. ibility of Dog p47 (IRG) GTPases (shows interferon inducibil- 10. Tuma PL, Collins CA: Activation of dynamin GTPase is a result ity of members of the p47 (IRG) GTPases present in the dog; of positive cooperativity. J Biol Chem 1994, 269:30842-30847. Additional data file 4); genomic organization of Danio rerio 11. Prakash B, Renault L, Praefcke GJ, Herrmann C, Wittinghofer A: Tri- phosphate structure of guanylate-binding protein 1 and p47 (irg) GTPases (illustrates the genomic organization of all implications for nucleotide binding and GTPase mechanism. p47 (irg) GTPases found in zebrafish to date; Additional data EMBO J 2000, 19:4555-4564. file 5); protein similarity matrix of Irgc and Irgq (contains 12. Ghosh A, Uthaiah R, Howard JC, Herrmann C, Wolf E: Crystal structure of IIGP1: a paradigm for interferon-inducible p47 comparison between the mouse p47 GTPase Irgc and the long resistance GTPases. Mol Cell 2004, 15:727-739. coding exon of the closely linked quasi-GTPase Irgq 13. Boehm U, Guethlein L, Klamp T, Ozbek K, Schaub A, Fütterer A, Pfef- fer K, Howard JC: Two families of GTPases dominate the com- (FKSG27); Additional data file 6); divergent nucleotide-bind- plex cellular response to interferon-γ. J Immunol 1998, ing motifs in quasi-GTPases (compares the nucleotide 161:6715-6723. binding motifs of quasi-GTPases to those of the classical 14. p47 (IRG) GTPase Database [http://db.aghoward.uni-koeln.de/ public/database2/global/] mouse p47 GTPases; Additional data file 7); a list of the prim- 15. Gilly M, Damore MA, Wall R: A promoter ISRE and dual 5' YY1 ers used (contains the sequences of all primers used in this motifs control IFN-gamma induction of the IRG-47 G-pro- study; Additional data file 8); nucleotide sequences of all IRG tein gene. Gene 1996, 179:237-244. 16. Zerrahn J, Schaible UE, Brinkmann V, Guhlich U, Kaufmann SH: The family members (Additional data file 9); protein sequences of IFN-inducible Golgi- and endoplasmic reticulum-associated all IRG family members (Additional data file 10). 47-kDa GTPase IIGP is transiently expressed during listeriosis. J Immunol 2002, 168:3428-3436. AdditionalAnames,eachClickNucleotideIrgotideISREpositionsinInducibilityityGenomicsicalusedProteintrateszebrafishbetweentheDivergent list putative of familynucleotideclosely mouseIRGin ofherememberslevel andthe synonyms,thisallsimilaritysequencesallthe the organizationandtogene) genomicGAS (givesnucleotide-binding for IRGwithinlinkeddataIRGpromotersandsequences primersstudy) ofp47date)mouse exactfile Dog bindingDog elements geneofamino genefile GTPases)percentage thequasi-GTPasethematrmatrix of accessionorganizationp47 p47sequences12345896710p47 usedfamily family allmousep47 of acidof motifsix (IRG) (IRG)GTPase IRG allofmouse Danioof (contains(IRG) identitimousemembers IRG Irgcmembers of IRGfamilynumb GTPasesof GTPasesmotifsof identity IRGrerioquof IrgcIrgqfamilyandGTPasesall family) IRGesasi-GTPasesallers theISREmembers genes)Irgqandbasedindescribed(FKSG27) p47described and (shows familyonmemberssequencesquasi-GTPases theand (containspresent (IRG) both furtheron long GAS genesinterferonG-domain intoprotein inGTPases thisincodingthose ofinformationelementsthis comparison (containstheall paper paper(comparesand primersofdog) inducibil-of exonfound(illus- the mouse nucle-found(gives clas- the offor in 17. Pai EF, Krengel U, Petsko GA, Goody RS, Kabsch W, Wittinghofer A: Refined crystal structure of the triphosphate conformation of H-ras p21 at 1.35 A resolution: implications for the mech- Acknowledgements anism of GTP hydrolysis. EMBO J 1990, 9:2351-2359. We are greatly indebted to Lois Maltais of the Mouse Genome Database at 18. Resh MD: Fatty acylation of proteins: new insights into mem- The Jackson Laboratory; Ruth Lovering, Gene Nomenclature Advisor, brane targeting of myristoylated and palmitoylated HUGO Gene Nomenclature Committee (HGNC); and Yvonne Edwards of proteins. Biochim Biophys Acta 1999, 1451:1-16. the Fugu Genomics Project at the UK Human Genome Mapping Project 19. Maurer-Stroh S, Gouda M, Novatchkova M, Schleiffer A, Schneider G, (HGMP) Resource Centre for their time and effort in developing a useful Sirota FL, Wildpaner M, Hayashi N, Eisenhaber F: MYRbase: analy- nomenclature for the p47 GTPases. We are grateful to Kerstin Jekosch, sis of genome-wide glycine myristoylation enlarges the func- Informatics & Systems Groups, Sanger Centre, for help with analyzing and tional spectrum of eukaryotic myristoylated proteins. annotating the zebrafish genes; to Cornelia Stein, Institute for Genetics, Genome Biol 2004, 5:R21. Cologne for communicating unpublished zebrafish material; and to Natasa 20. Ling J, Pi W, Bollag R, Zeng S, Keskintepe M, Saliman H, Krantz S, Papic, Institute for Genetics, Cologne for assistance in editing the long Whitney B, Tuan D: The solitary long terminal repeats of ERV- sequence alignments. This study was supported by the Centre for 9 endogenous retrovirus are conserved during primate evo- Molecular Medicine, Cologne, and DFG grants SPP1110, SFB243 and lution and possess enhancer activities in embryonic and SFB635. Iana Parvanova was supported by the DFG Graduate College hematopoietic cells. J Virol 2002, 76:2410-2423. 'Genetics of Cellular Systems'; and Cemalettin Bekpen and Julia Hunn were 21. Singh G, Lykke-Andersen J: New insights into the formation of supported by the Cologne Graduate School in Genetics and Functional active nonsense-mediated decay complexes. Trends Biochem Genomics. We are particularly grateful to the anonymous referee who Sci 2003, 28:464-466. drew our attention to the candidate p47 GTPase sequence C46E1.3 in C. 22. Mishra R, Gara SK, Mishra S, Prakash B: Analysis of GTPases car- elegans. rying hydrophobic amino acid substitutions in lieu of the cat- alytic glutamine: implications for GTP hydrolysis. Proteins 2005, 59:332-338. 23. Delarbre C, Jaulin C, Kourilsky P, Gachelin G: Evolution of the References major histocompatibility complex: a hundred-fold amplifica- 1. Mestas J, Hughes C: Of mice and not men: differences between tion of MHC class I genes in the African pigmy mouse Nan- mouse and human immunology. J Immunol 2004, nomys setulosus. Immunogenetics 1992, 37:29-38. 172:2731-2738. 24. Mashimo T, Glaser P, Lucas M, Simon-Chazottes D, Ceccaldi PE, 2. Taylor GA, Feng CG, Sher A: p47 GTPases: regulators of immu- Montagutelli X, Despres P, Guenet JL: Structural and functional nity to intracellular pathogens. Nat Rev Immunol 2004, genomics and evolutionary relationships in the cluster of 4:100-109. genes encoding murine 2',5'-oligoadenylate synthetases. 3. MacMicking JD: Immune control of phagosomal bacteria by Genomics 2003, 82:537-552. p47 GTPases. Curr Opin Microbiol 2005, 8:74-82. 25. Trowsdale J, Barten R, Haude A, Stewart CA, Beck S, Wilson MJ: The 4. MacMicking JD: IFN-inducible GTPases and immunity to intra- genomic context of natural killer receptor extended gene cellular pathogens. Trends Immunol 2004, 25:601-609. families. Immunol Rev 2001, 181:20-38. 5. Taylor GA, Stauber R, Rulong S, Hudson E, Pei V, Pavlakis GN, Resau 26. Noel L, Moores TL, van der Biezen EA, Parniske M, Daniels MJ, Parker JH, Vande Woude GF: The inducibly expressed GTPase local- JE, Jones JDG: Pronounced intraspecific haplotype divergence izes to the endoplasmic reticulum, independently of GTP at the RPP5 complex disease resistance in Arabidopsis. binding. J Biol Chem 1997, 272:10639-10645. Plant Cell 1999, 11:2099-2111. 6. Martens S, Sabel K, Lange R, Uthaiah R, Wolf E, Howard JC: Mecha- 27. Angata T, Margulies EH, Green ED, Varki A: Large-scale sequenc- nisms regulating the positioning of mouse p47 resistance ing of the CD33-related Siglec gene cluster in five mamma- GTPases LRG-47 and IIGP1 on cellular membranes: retar- lian species reveals rapid evolution by multiple mechanisms.

Genome Biology 2005, 6:R92 http://genomebiology.com/2005/6/11/R92 Genome Biology 2005, Volume 6, Issue 11, Article R92 Bekpen et al. R92.17

Proc Natl Acad Sci USA 2004, 101:13251-13256. susceptible mice carry Mx genes with a large deletion or a 28. Leister D: Tandem and segmental gene duplication and nonsense mutation. Mol Cell Biol 1988, 8:4518-4523. recombination in the evolution of plant disease resistance 53. Trede NS, Langenau DM, Traver D, Look AT, Zon LI: The use of

gene. Trends Genet 2004, 20:116-122. zebrafish to understand immunity. Immunity 2004, 20:367-379. comment 29. Barten R, Torkar M, Haude A, Trowsdale J, Wilson MJ: Divergent 54. Christoffels A, Koh EG, Chia JM, Brenner S, Aparicio S, Venkatesh B: and convergent evolution of NK-cell receptors. Trends Fugu genome analysis provides evidence for a whole-genome Immunol 2001, 22:52-57. duplication early during the evolution of ray-finned fishes. 30. Nathan C, Shiloh MU: Reactive oxygen and nitrogen intermedi- Mol Biol Evol 2004, 21:1146-1151. ates in the relationship between mammalian hosts and 55. Meyer A, Van de Peer Y: From 2R to 3R: evidence for a fish-spe- microbial pathogens. Proc Natl Acad Sci USA 2000, 97:8841-8848. cific genome duplication (FSGD). Bioessays 2005, 27:937-945. 31. Murray HW, Nathan CF: Macrophage microbicidal mechanisms 56. Flajnik MF, Du Pasquier L: Evolution of innate and adaptive in vivo: reactive nitrogen versus oxygen intermediates in the immunity: can we draw a line? Trends Immunol 2004, 25:640-644. killing of intracellular visceral Leishmania donovani. J Exp Med 57. Kimbrell DA, Beutler B: The evolution and genetics of innate 1999, 189:741-746. immunity. Nat Rev Genet 2001, 2:256-267.

32. Fang FC: Antimicrobial reactive oxygen and nitrogen species: 58. Fujita T, Matsushita M, Endo Y: The lectin-complement pathway: reviews concepts and controversies. Nat Rev Microbiol 2004, 2:820-832. its role in innate immunity and evolution. Immunol Rev 2004, 33. Adams O, Besken K, Oberdorfer C, MacKenzie CR, Takikawa O, 198:185-202. Daubener W: Role of indoleamine-2,3-dioxygenase in alpha/ 59. Daumke O, Weyand M, Chakrabarti PP, Vetter IR, Wittinghofer A: beta and gamma interferon-mediated antiviral effects The GTPase-activating protein Rap1GAP uses a catalytic against herpes simplex virus infections. J Virol 2004, asparagine. Nature 2004, 429:197-201. 78:2632-2636. 60. Gilly M, Wall R: The IRG-47 gene is IFN-gamma induced in B 34. Daubener W, MacKenzie CR: IFN-gamma activated indoleam- cells and encodes a protein with GTP-binding motifs. J ine 2,3-dioxygenase activity in human cells is an antiparasitic Immunol 1992, 148:3275-3281. and an antibacterial effector mechanism. Adv Exp Med Biol 61. EnsembleArchive [http://feb2005.archive.ensembl.org/ 1999, 467:517-524. index.html] 35. Prada-Delgado A, Carrasco-Marin E, Pena-Macarro C, Del Cerro- 62. The Mouse Genome Informatics [http://www.informat Vadillo E, Fresno-Escudero M, Leyva-Cobian F, Alvarez-Dominguez ics.jax.org/] reports C: Inhibition of Rab5a exchange activity is a key step for Lis- 63. MRC RFCGR: The Fugu Genomics Project [http://fugu.biol teria monocytogenes survival. Traffic 2005, 6:252-265. ogy.qmul.ac.uk/] 36. Forbes JR, Gros P: Divalent-metal transport by NRAMP 64. ENSEMBL: Fugu Genome Browser [http://www.ensembl.org/ proteins at the interface of host-pathogen interactions. Fugu_rubripes/] Trends Microbiol 2001, 9:397-403. 65. IMCB Fugu Genome Project [http://www.fugu-sg.org/] 37. Flo TH, Smith KD, Sato S, Rodriguez DJ, Holmes MA, Strong RK, 66. Human Genome Browser Gateway [http://genome.ucsc.edu/ Akira S, Aderem A: Lipocalin 2 mediates an innate immune cgi-bin/hgGateway]

response to bacterial infection by sequestrating iron. Nature 67. The Danio rerio Sequencing Project (Sanger) [http:// research deposited 2004, 432:917-921. www.sanger.ac.uk/Projects/D_rerio] 38. Schaible UE, Kaufmann SH: Iron and microbial infection. Nat Rev 68. Mouse-human synteny alignments (Sanger) [http:// Microbiol 2004, 2:946-953. www.sanger.ac.uk/Projects/M_musculus/publications/fpcmap-2002/ 39. Ogawa M, Yoshimori T, Suzuki T, Sagara H, Mizushima N, Sasakawa mouse-s.shtml] C: Escape of intracellular Shigella from autophagy. Science 69. Ensembl Genome Browser [http://www.ensembl.org/ 2005, 307:727-731. index.html] 40. Gutierrez MG, Master SS, Singh SB, Taylor GA, Colombo MI, Deretic 70. Saitou N, Nei M: The neighbor-joining method: a new method V: Autophagy is a defense mechanism inhibiting BCG and for reconstructing phylogenetic trees. Mol Biol Evol 1987, Mycobacterium tuberculosis survival in infected macrophages. 4:406-425. Cell 2004, 119:753-766. 71. Kumar S, Tamura K, Jakobsen IB, Nei M: MEGA2: molecular evo- 41. Nelson DE, Virok DP, Wood H, Roshick C, Johnson RM, Whitmire lutionary genetics analysis software. Bioinformatics 2001, refereed research WM, Crane DD, Steele-Mortimer O, Kari L, McClarty G, Caldwell 17:1244-1245. HD: Chlamydial IFN-γ immune evasion is linked to host infec- 72. Felsenstein J: Confidence limits on phylogenies: an approach tion tropism. Proc Natl Acad Sci USA 2005, 102:10658-10663. using the bootstrap. Evolution 1985, 39:783-791. 42. Modlin RL: Activation of toll-like receptors by microbial lipo- 73. BCM search Launcher [http://searchlauncher.bcm.tmc.edu/multi- proteins: role in host defense. J Allergy Clin Immunol 2001, 108(4 align/multi-align.html] Suppl):S104-S106. 74. EMBL-EBI clustal-w [http://www.ebi.ac.uk/clustalw/] 43. Tian D, Traw MB, Chen JQ, Kreitman M, Bergelson J: Fitness costs 75. BOXSHADE 3.21 [http://www.ch.embnet.org/software/ of R-gene-mediated resistance in Arabidopsis thaliana. Nature BOX_form.html] 2003, 423:74-77. 76. Schug J, Overton GC: TESS: Transcription Element Search Software on 44. Burdon JJ, Thrall PH: The fitness costs to plants of resistance to the WWW: Technical Report CBIL-TR-1997-1001-V.0.0 Pennsylvania: pathogens. Genome Biol 2003, 4:227. Computational Biology and Informatics Laboratory, School of Medi- 45. Weatherall DJ, Miller LH, Baruch DI, Marsh K, Doumbo OK, Casals- cine, University of Pennsylvania; 1997. Pascual C, Roberts DJ: Malaria and the red cell. Hematology (Am 77. Transcription Element Search System [http:// interactions Soc Hematol Educ Program) 2002:35-57. www.cbil.upenn.edu/tess] 46. Schmid-Hempel P: Variation in immune defence as a question 78. Lenhard B, Sandelin A, Mendoza L, Engstrom P, Jareborg N, Wasser- of evolutionary ecology. Proc R Soc Lond B Biol Sci 2003, man WW: Identification of conserved regulatory elements by 270:357-366. comparative genome analysis. J Biol 2003, 2:13. 47. Haller O, Kochs G: Interferon-induced MX proteins: dynamin- 79. Tools for Phylogenetic Footprinting Purposes [http:// like GTPases with antiviral activity. Traffic 2002, 3:710-717. www.phylofoot.org] 48. Praefcke GJK, McMahon HT: The dynamin superfamily: univer- 80. Li Y, Chambers J, Pang J, Ngo K, Peterson PA, Leung WP, Yang Y: sal membrane tubulation and fission molecules? Nat Rev Mol Characterization of the mouse proteasome regulator Cell Biol 2004, 5:133-147. PA28b gene. Immunogenetics 1999, 49:149-157. 49. Klamp T, Boehm U, Schenk D, Pfeffer K, Howard JC: A giant 81. Lafuse WP, Brown D, Castle L, Zwilling BS: Cloning and charac- GTPase, very large inucible GTPase-1, is inducible by IFNs. terization of a novel cDNA that is IFN-gamma-induced in information J Immunol 2003, 171:1255-1265. mouse peritoneal macrophages and encodes a putative 50. Haller O, Acklin M, Staeheli P: Influenza virus resistance of wild GTP-binding protein. J Leukoc Biol 1995, 57:477-483. mice: wild-type and mutant Mx alleles occur at comparable 82. Carlow D, Marth J, Clark-Lewis I, Teh H-S: Isolation of a gene frequencies. J Interferon Res 1987, 7:647-656. encoding a developmentally regulated T cell specific protein 51. Jin H, Yamashita T, Ochiai K, Haller O, Watanabe T: Characteriza- with a guanine nucleotide triphosphate-binding motif. J tion and expression of the Mx1 gene in wild mouse species. Immunol 1995, 154:1724-1734. Biochem Genet 1998, 36:311-322. 83. Sorace JM, Johnson RJ, Howard DL, Drysdale BE: Identification of 52. Staeheli P, Grob R, Meier E, Sutcliffe J, Haller O: Influenza virus- an endotoxin and IFN-inducible cDNA: possible identifica-

Genome Biology 2005, 6:R92 R92.18 Genome Biology 2005, Volume 6, Issue 11, Article R92 Bekpen et al. http://genomebiology.com/2005/6/11/R92

tion of a novel protein family. J Leukoc Biol 1995, 58:477-484. 84. Taylor GA, Jeffers M, Largaespada DA, Jenkins NA, Copeland NG, Vande Woude GF: Identification of a novel GTPase, the induc- ible expressed GTPase, that accumulates in responses to interferon γ. J Biol Chem 1996, 271:20399-20405.

Genome Biology 2005, 6:R92