Human leucine-rich repeat proteins: a genome-wide bioinformatic categorization and functional analysis in innate immunity

Aylwin C. Y. Ng(a,b,1), Jason M. Eisenberg(a,b,1), Robert J. W. Heath(a), Alan Huett(a), Cory M. Robinson(c), Gerard J. Nau(c), and Ramnik J. Xavier(a,b,2)

(a) Center for Computational and Integrative Biology, and Gastrointestinal Unit, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114; (b) The Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142; and (c) Microbiology and Molecular Genetics, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261

Edited by Jeffrey I. Gordon, Washington University School of Medicine, St. Louis, MO, and approved June 11, 2010 (received for review February 17, 2010)

In innate immune sensing, the detection of pathogen-associated molecular patterns by recognition receptors typically involves leucine-rich repeats (LRRs). We provide a categorization of 375 human LRR-containing proteins, almost half of which lack other identifiable functional domains. We clustered human LRR proteins by first assigning LRRs to LRR classes and then grouping the proteins based on these class assignments, revealing several of the resulting groups containing a large number of proteins with certain non-LRR functional domains. In particular, a statistically significant number of LRR proteins in the typical (T) and bacterial + typical (S+T) categories have transmembrane domains, whereas most of the LRR proteins in the cysteine-containing (CC) category contain an F-box domain (which mediates interactions with the E3 ubiquitin ligase complex). Furthermore, by examining the evolutionary profiles of the LRR proteins, we identified a subset of LRR proteins exhibiting strong conservation in fungi and an enrichment for "nucleic acid-binding" function. Expression analysis of LRR genes identifies a subset of pathogen-responsive genes in human primary macrophages infected with pathogenic bacteria. Using functional RNAi, we show that MFHAS1 regulates Toll-like receptor (TLR)–dependent signaling. By using protein interaction network analysis followed by functional RNAi, we identified LRSAM1 as a component of the antibacterial autophagic response.

pathogen-response | anti-bacterial | inflammation | pathogen sensors | autophagy

This paper results from the Arthur M. Sackler Colloquium of the National Academy of Sciences, "Microbes and Health," held November 2–3, 2009, at the Arnold and Mabel Beckman Center of the National Academies of Sciences and Engineering in Irvine, CA. The complete program and audio files of most presentations are available on the NAS Web site at http://www.nasonline.org/SACKLER_Microbes_and_Health.
Author contributions: A.C.Y.N., J.M.E., and R.J.X. designed research; A.C.Y.N., J.M.E., R.J.W.H., and A.H. performed research; A.C.Y.N., J.M.E., R.J.W.H., C.M.R., and G.J.N. contributed new reagents/analytic tools; A.C.Y.N., J.M.E., A.H., and R.J.X. analyzed data; and A.C.Y.N., J.M.E., A.H., and R.J.X. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
(1) A.C.Y.N. and J.M.E. contributed equally to this work.
(2) To whom correspondence should be addressed. E-mail: [email protected].
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1000093107/-/DCSupplemental.
www.pnas.org/cgi/doi/10.1073/pnas.1000093107 | PNAS Early Edition

Innate immunity is a conserved host response that entails the sensing of pathogen-associated molecular patterns through germline-encoded pattern recognition receptors (1), which initiate pathway-specific signaling networks, resulting in rapid responses that serve as the host's first line of defense. A striking feature among families of proteins functioning as sensors or effectors of innate immunity is the inclusion in their structure of various combinations of the following domains: leucine-rich repeat (LRR), Toll/IL-1 receptor, Ig, nucleotide-binding site, pyrin, RIG-I–like receptor, caspase-recruitment, coiled-coil, immunoreceptor tyrosine-based activation motif, and C-type lectin (2).

The LRR domain is present in a large number of prokaryotic and eukaryotic proteins and is one of the most commonly occurring protein domains in proteins associated with innate immunity. This is especially so in invertebrates (e.g., sea urchin) and Cephalochordata (e.g., amphioxus), which have vastly expanded repertoires of LRR proteins (3) in the absence of adaptive immunity. In jawless vertebrates, combinatorial assembly of LRR segments in variable lymphocyte receptors generates the structural diversity for antigen recognition and forms the basis of an adaptive immune system (4, 5). Toll-like receptors (TLRs) and NOD-like receptors (NLRs) recognize, via the LRR domain, molecular determinants from a structurally diverse collection of bacterial, fungal, viral, and parasite-derived components (6, 7). Mutations or polymorphisms in more than 30 LRR-containing proteins have been implicated in human diseases to date, notably polymorphisms in NOD2 in Crohn disease (8, 9), CIITA in rheumatoid arthritis and multiple sclerosis (10), and TLR5 in Legionnaire disease (11).

Most LRR domains consist of a chain of between 2 and 45 LRRs (12). Each repeat in turn is typically 20 to 30 residues long and can be divided into a highly conserved segment (HCS) followed by a variable segment (VS). The HCS usually consists of either the 11-residue sequence LxxLxLxxNxL or the 12-residue sequence LxxLxLxxCxxL, where L is Leu, Ile, Val, or Phe; N is Asn, Thr, Ser, or Cys; and C is Cys, Ser, or Asn (13, 14). Although these substitutions often preserve hydrophobicity or polarity, it is possible for the first and last leucines to be replaced by relatively hydrophilic residues (15).

The function of many LRR domains is to provide a structural framework for protein–protein interactions (PPIs) (13). PDB structures for LRR-containing proteins show the LRR domains in an arc or horseshoe shape. The concave face (corresponding to the HCS of each LRR) consists of parallel β-strands, each usually three residues long, flanked by loops. Conversely, the convex face (corresponding to the VS of each LRR) is composed of a variety of secondary structures, which are often helical (13). Ligands can interact with either the convex or the concave face, although the latter is more typical (13, 16). The core of the arc is hydrophobic, typically shielded by caps at the N terminus of the first LRR and at the C terminus of the last LRR. In extracellular proteins or domains, these caps often contain two- or four-residue cysteine clusters (14).

Beyond innate immunity, extensive functional diversity occurs among LRR-containing proteins, which are involved in a variety of cellular processes including apoptosis, autophagy, ubiquitin-related processes, nuclear mRNA transport, and neuronal development. Although the presence of well characterized non-LRR domains can be used to classify LRR proteins along functional lines, such a classification scheme will place LRR proteins lacking any other identifiable functional domains into a single group unrelated to any specific function. To fully appreciate the signaling functions of LRR proteins, we devised semi-automated methods for grouping them on the basis of LRR classes. We identified the positions of individual LRR sequences within each LRR domain, and classified each repeat based upon consensus sequences and lengths of their VSs. Identified repeats were assigned to classes: bacterial (S), ribonuclease inhibitor–like (RI), cysteine-containing (CC), SDS22, plant-specific (PS), typical (T), and Treponema pallidum (Tp) (13, 17). By using this approach, we reasoned that functionally related LRR-only proteins would be placed together by virtue of their similar LRR domains. In addition, these LRR class-based clusters would contain all of the proteins bearing similar LRR repeat structures, with further subcategorization via the presence or absence of non-LRR domains.

In this article, we present a comprehensive categorization of human LRR-containing proteins by LRR class composition, functional associations, and involvement in host responses to infection stimuli. We demonstrate semi-automated methods that use LRR class composition to facilitate functional groupings. By integrating diverse datasets and using functional RNA interference, we experimentally place uncharacterized LRR proteins in innate immunity and autophagy.

Results

Annotation of LRRs. We first compiled a list of 375 human proteins annotated as containing LRRs in InterPro (18), the Swiss-Prot section of UniProt (19), and LRRML (a conformational database and an extensible markup language description of LRRs) (17). As shown in Fig. S1, InterPro contained the largest number of LRR proteins: it included all 19 proteins found in LRRML and 303 of the 327 LRR proteins in Swiss-Prot.

To annotate the LRRs in the LRR proteins, we constructed hidden Markov models (HMMs) to represent the signatures of the seven LRR classes. That the resulting HMMs were constructed properly is evident from the strong similarities between the logos for the HMMs (Fig. S2) (20) and the corresponding consensus sequences described in the literature (13). Some differences were observed: for example, in the HMM for the S class, the first leucine is not as strongly conserved as in the consensus signature. Also, whereas the sixth residue of the consensus signature is valine, the most frequently occurring residue at the corresponding position of the HMM is cysteine, with valine as the third most common.

We next devised an algorithm that used the HMMs to identify the positions and class assignments of the LRRs. To fully capture the "irregular" LRRs with atypical sequences, we implemented a combination algorithm, using the HMMs to find regular LRRs, then pattern-matching to find adjacent, non-overlapping matches to LRR amino acid sequences or predicted secondary structures (15, 21). This exploits the structural observation that LRRs occur in chains, thus leveraging the discriminatory power of HMMs to control otherwise promiscuous pattern matching.

By applying the annotation algorithm to the 375 proteins classified as LRR-containing, we found LRRs in almost all proteins classified by multiple databases but in very few of those classified by only a single database. There are 334 proteins in which at least one LRR can be identified. We provide a comprehensive map of these human LRR proteins, graphically displaying the LRR classes as well as non-LRR domains and their coordinates in each of these proteins (Dataset S1). In many of the proteins, most of the identified regular LRRs either belong to a single class (e.g., CC for FBXL2, PS for LRRC30; Fig. 1A) or are members of the S and T classes (e.g., ASPN and EPYC; Fig. 1B). This uniformity in the class membership of the LRRs has been observed previously (13, 15, 22). Nonetheless, several of the proteins contain LRRs from multiple classes (other than S and T). For instance, LRRC33 contains S, T, and SDS22 LRRs, whereas PS and S LRRs occur in MFHAS1 (Fig. 1C).

It is evident from Dataset S1 that most of the proteins contain one or more irregular LRRs. As shown in Fig. 1D, these LRRs occur both at the ends of LRR chains (FBXL3) and within them (LRRC16A). By their very nature, irregular LRRs are more difficult to identify than regular LRRs, particularly at the ends of LRR chains. However, irregular LRRs that occur within LRR chains can be identified with greater confidence and are likely to be valid LRRs as they span the gaps between regular LRRs.

Clustering the LRR Proteins Using LRR Classes. Based on the LRR class assignments, we grouped the LRR proteins in two different ways. For the first approach, we clustered the proteins based on the sequence similarity of their LRRs; each LRR in a given protein was allowed to match any LRR of the same class in another protein (the irregular LRRs were placed in their own class). Fig. 1E and Fig. S3 summarize the clustered results for human LRR proteins as a circular tree. As predicted, we observed functionally similar proteins clustering together, e.g., members of the SLITRK, NLR (Fig. S3), and F-box families (Fig. S4). We were also able to observe LRR-only proteins being distributed among those containing non-LRR domains, fulfilling one of our goals in this effort: to cluster LRR proteins on the basis of LRR class composition.

Because the class annotations for the LRRs are not always optimal, the second method we used to group the LRR proteins was designed to be more robust to annotation errors. We first assigned all proteins containing fewer than five regular LRRs to the "unclassified" category. We then partitioned the remaining proteins into categories based on the class to which the majority of LRRs in each protein belong. We placed those proteins for which S and T LRRs together constitute the majority in a separate S+T category, whereas we classified as "mixed" those proteins for which each class is in the minority. Interestingly, more than 25% of the proteins are in the T or S+T categories and fewer than half contain fewer than five regular LRRs. A summary of the resulting category assignments is shown in Fig. 2A and Dataset S2A. Again, as expected, LRR-only proteins were found to be distributed among the categories and are thus grouped with proteins containing non-LRR domains.

As is evident in Fig. 2B, several of the class-based categories are associated with the presence or absence of transmembrane (TM) regions as predicted by Phobius (23) in the corresponding proteins. Specifically, a statistically significant number of proteins in the T and S+T categories contain TM regions, whereas a statistically significant number of proteins in the CC and unclassified categories do not. Several of the class-based categories are also associated with certain non-LRR domains. For example, at least 12 of the 15 proteins in the CC category contain an F-box domain (Fig. S4), whereas 17 of the 32 proteins in the RI category contain a NACHT NTPase domain (Dataset S1).

As noted earlier, almost half the LRR proteins are unclassified. Comparing these unclassified LRR proteins to those which are classified revealed several differences. In particular, all of the LRR proteins annotated as LRR-containing in only one database are unclassified. This observation is not surprising; one would expect that proteins with few LRRs would be less likely to be identified in a database as LRR-containing. Another difference between the classified and unclassified proteins is apparent in Fig. 2C: the T and S classes are the most commonly occurring regular classes in classified proteins but occur infrequently in unclassified proteins. Third, although several non-LRR domains occur in both classified and unclassified proteins (Fig. 2D), many non-LRR domains are unique to just the classified or unclassified proteins (Dataset S2B). Finally, when analyzing the sets of classified and unclassified proteins in terms of their molecular function and associated biological processes using the PANTHER classification system (24), we observed striking differences in key categories being over-represented (Dataset S2 C and D). Processes associated with receptor-mediated signaling, immunity and

[Figure 1, panels A–E. (A) FBXL2, LRRC30. (B) ASPN, EPYC. (C) LRRC33, MFHAS1. (D) FBXL3, LRRC16A. (E) Circular tree of the human LRR proteins with labeled clusters: L/TMOD, NLR, F-box, ZYG11, centrosome, nuclear and CEP, extracellular or transmembrane, and PS class-containing. Legend, LRR classes: CC, irregular, PS, RI, S, SDS22. Legend, other domains: signal peptide, transmembrane region, LRR N-cap, LRR C-cap, F-box, Ras GTPase, Toll-Interleukin receptor.]

Fig. 1. Examples of annotated LRR proteins. (A) Proteins containing LRRs from predominantly one class. (B) Proteins containing LRRs from predominantly the S and T classes. (C) Proteins containing LRRs from multiple classes. (D) Examples of proteins illustrating the occurrence of irregular LRRs both at the ends of LRR chains as well as within them. (E) Clustering of LRR proteins using LRR classes. A larger “zoomable” version is shown in Fig. S3. Cluster descriptions illustrate certain predominant features observed among representative LRR protein members in the respective clusters, e.g., F-box cluster consists predominantly of LRR proteins that also contain the F-box domain, whereas the NLR cluster consists mostly of NLR family members.
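As an illustration of the sequence signatures that underlie annotations such as those in Fig. 1, the two HCS consensus patterns given in the Introduction (LxxLxLxxNxL and LxxLxLxxCxxL) can be written as regular expressions. The sketch below is a deliberately simplified pattern matcher and is not the HMM-based annotation algorithm used in this study; the names HCS_PATTERNS and find_hcs are hypothetical, and the example peptide is made up for demonstration.

```python
import re

# Residue sets for the consensus symbols, as defined in the text.
L = "[LIVF]"   # "L": Leu, Ile, Val, or Phe
N = "[NTSC]"   # "N": Asn, Thr, Ser, or Cys
C = "[CSN]"    # "C": Cys, Ser, or Asn

# Each pattern is wrapped in a lookahead so that overlapping
# candidates are all reported ("." stands for the variable "x").
HCS_PATTERNS = {
    "LxxLxLxxNxL": re.compile(f"(?=({L}..{L}.{L}..{N}.{L}))"),
    "LxxLxLxxCxxL": re.compile(f"(?=({L}..{L}.{L}..{C}..{L}))"),
}

def find_hcs(sequence):
    """Return sorted (start, pattern_name, matched_segment) tuples.

    Hypothetical helper: scans a protein sequence (one-letter codes)
    for candidate highly conserved segments.
    """
    seq = sequence.upper()
    hits = []
    for name, pattern in HCS_PATTERNS.items():
        for match in pattern.finditer(seq):
            hits.append((match.start(), name, match.group(1)))
    return sorted(hits)

# Made-up 11-residue peptide that fits the first consensus.
print(find_hcs("LKELDLSNNKL"))
```

Simple pattern matching of this kind is permissive, which is consistent with the rationale stated in the Results for anchoring the search on HMM hits and only then extending to adjacent, non-overlapping matches.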

defense were significantly enriched among the classified LRR proteins. In contrast, nucleic acid–binding functions and cell structure–related processes were over-represented among the unclassified LRR proteins.

Because S and T are the most commonly occurring regular LRR classes in classified LRR proteins and such proteins are enriched for terms related to immunity and receptor-mediated signaling, we examined whether the subset of classified proteins with S or T LRRs would also exhibit such enrichments. Using the PANTHER classification system, a number of processes associated with receptor-mediated signaling were found to be statistically significant, including cytokine and chemokine mediated signaling and "immunity and defense" (FDR-adjusted P values, 0.02 and 0.05, respectively; Dataset S2E), suggesting that S and T LRRs might play a role in these processes.

Evolutionary Characteristics of the LRR Proteins. We next examined the evolutionary profiles of the LRR proteins to determine whether the resulting patterns of conservation might yield any insights into their function. To generate this evolutionary profile, we clustered the LRR proteins so that proteins grouped together have orthologues in many of the same organisms. A heat map depicting this clustering is shown in Fig. 3. Fig. 3 also shows the heat maps generated by performing the same procedure for human proteins

[Figure 2, panels A–D. (A) Bar chart, x axis: number of LRR proteins; y axis: LRR category (unclassified, S+T, T, RI, PS, CC, mixed, SDS22, S); bars split into LRR-only proteins and proteins that include non-LRR domains. (B) Bar chart, same axes; bars split into TM and non-TM; asterisks (2-sided P value < 0.01) mark the unclassified, S+T, T, and CC categories. (C) Table, reproduced below. (D) Bar chart, x axis: number of LRR proteins (classified and unclassified); y axis: non-LRR domain (F-box, IG, NACHT_NTPase, FN_III, Pyrin, GPCR_Rhodpsn, DISEASERESIST, CARD, IQ_CaM_bd_region, Znf_CXXC, WD40_repeat, Znf_PHD, TF_JmjC_AAH, Pleckstrin_homology).]

Panel C. Percentage of proteins containing an LRR of each class.

LRR class    Classified (% of proteins)    Unclassified (% of proteins)
IRREGULAR    93.8                          68.9
T            65.4                          17.1
S            55.0                          6.7
SDS22        45.5                          43.3
PS           31.8                          7.3
RI           16.6                          21.3
CC           13.3                          26.2

Fig. 2. Results from categorizing the LRR proteins by the majority LRR class in each protein. (A) Numbers of LRR proteins in each category; the proteins in each category are further subdivided into those that are LRR-only and those with non-LRR domains. (B) Distribution of the LRR proteins in each category between those with TM regions and those without; categories with a statistically significant association with the presence or absence of TM regions are indicated with an asterisk. (C) Percentages of the classified and unclassified LRR proteins containing an LRR of each class. (D) Numbers of classified and unclassified LRR proteins which contain each non-LRR InterPro domain.
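The majority-class rule summarized in Fig. 2 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: "majority" is taken to mean strictly more than half of the regular LRRs in a protein, irregular LRRs are excluded from the count, a single-class majority takes precedence over the combined S+T rule, and the function name and list-of-labels input format are hypothetical.

```python
from collections import Counter

# From the text: proteins with fewer than five regular LRRs are "unclassified".
MIN_REGULAR_LRRS = 5

def categorize(lrr_classes):
    """Assign a protein to a class-based category.

    lrr_classes: list of per-repeat class labels for one protein,
    e.g. ["S", "T", "T", "irregular", ...] (hypothetical input format).
    """
    # Assumption: only regular LRRs count toward the majority.
    regular = [c for c in lrr_classes if c != "irregular"]
    if len(regular) < MIN_REGULAR_LRRS:
        return "unclassified"

    counts = Counter(regular)
    half = len(regular) / 2

    # Assumption: "majority" means strictly more than half.
    for cls, n in counts.items():
        if n > half:
            return cls

    # S and T together constitute the majority.
    if counts["S"] + counts["T"] > half:
        return "S+T"

    # Every class is in the minority.
    return "mixed"

print(categorize(["S", "S", "T", "T", "RI"]))
```

Counting only the majority class, rather than requiring every repeat to be labeled correctly, is what would make a rule of this kind tolerant of occasional annotation errors, which is the motivation given in the Results.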

containing PDZ and SH2 domains. It is apparent that the evolutionary profiles for the PDZ and SH2 protein families are quite similar. Both protein families can be partitioned into two major groups: proteins with orthologues in all metazoans (group 4) and proteins with orthologues only in chordates (group 5). There are also some unique characteristics observed for a small number of SH2 proteins but not among PDZ proteins, such as those having orthologues primarily in mammals (group 6). Interestingly, none of the SH2 proteins have orthologues in eubacteria or archaea.

The evolutionary profile heat map for the LRR proteins exhibits striking differences from those for the PDZ and SH2 proteins. For example, a significantly large number of the LRR proteins have orthologues primarily in mammals (group 1), whereas very few of the SH2 proteins and none of the PDZ proteins exhibit this property. Further, several of the LRR proteins that have orthologues only in eukaryotes do not have orthologues in fungi (group 2), whereas no such proteins are evident in the PDZ and SH2 heat maps. One particularly striking feature of the LRR heat map is the presence of a set of 21 proteins with strong conservation in fungi (group 3). Analyzing this set of proteins for over-represented PANTHER terms revealed an enrichment (with a P value adjusted for multiple testing of 1.277 × 10^−5) for the term "nucleic acid-binding" for the following seven proteins: CNOT6, CNOT6L, NXF1, NXF2, NXF2B, PDS5A, and SNRPA1.

To further stratify the clusters of LRR proteins shown in Fig. 3, we reclustered the proteins based on the relative degree of similarity between each protein and its orthologues. A heat map depicting the updated clustering is shown in Fig. S5A. For each species represented in the heat map, red denotes proteins that are more similar to their human orthologues than those depicted in green are to theirs. Because it was difficult to discern any clear patterns in the revised heat map, we rearranged the proteins so they would be grouped by their class-based categories (Fig. S5B). Doing so revealed that the SDS22, RI, and mixed categories contain very few proteins with high degrees of conservation (relative to the other LRRs). It is now also apparent that almost all the proteins in the T and S+T categories have orthologues only in metazoans.

[Figure 3: three heat maps (LRR, PDZ, SH2). Species groups shown: mammals, other chordates, insects, other metazoans, fungi, other eukaryotes, eubacteria, archaea. Numbered brackets 1 through 6 mark protein subsets. Highlighted LRR proteins: ANP32B, CNOT6, CNOT6L, DNAH10, DNAH17, DNAH3, FBXL20, FBXO9, MUC16, NXF1, NXF2, NXF2B, PDS5A, PHLPP, PHLPPL, PPP1R7, RABGGTA, RANGAP1, SNRPA1, TBCE, WDFY3.]

Fig. 3. Heat maps indicating the species for which orthologues exist for human proteins containing LRR, PDZ, and SH2 domains. For each human LRR protein (row), each column corresponds to a species in which an orthologue of the protein exists. The species are partitioned into the taxonomically related groups indicated, and the species within each group are arranged from left to right in decreasing order of orthologue count. The specific human LRR proteins which are strongly conserved in fungi are highlighted. The protein subsets indicated by the numbers 1 through 6 are described in the text.

Expression-Based Classification of LRR Genes Reveals Neuronal and Immune Clusters. To build on the information derived from LRR class classification methods, we were interested to identify subsets of LRR genes that might have roles in processes associated with specialized tissue types. We examined the expression of LRR genes across 79 tissues in a human microarray panel (25). Of the 230 LRR genes for which probes were available, 39 showed higher expression in neuronal tissues compared with other tissues (Wilcoxon test, P < 0.05; Fig. S6A), including genes known to be associated with the CNS such as NTRK2/SLIT and LRRTM family members, and functionally uncharacterized ones, e.g., C22orf36 and LRRC49 (Dataset S2H). Of particular interest are 93 LRR genes exhibiting elevated expression in immune tissues (Wilcoxon test, P < 0.05; Fig. S6A). These include, in addition to well studied pattern-recognition molecules such as TLRs and NLRs, others that are not presently known to be directly associated with immune function, such as SCRIB (a PDZ-domain–containing cell-junction protein) and genes involved with nuclear transport (NXF1, ANP32A, B, E; Dataset S2G). In addition, there is also a significant over-representation of genes encoding proteins associated with ubiquitin ligase complexes or activities (e.g., PPIL5, SKP2, LRSAM1, FBXL5, FBXL6, FBXO9, FBXO7, FBXO11, FBXW5; P = 4 × 10^−15) in this immune cluster.

Pathogen-Responsive LRR Genes in Human Primary Macrophages. To uncover candidate LRR genes that might be involved in host-pathogen innate immune response but have not been previously identified to have immunologically defined roles, we examined the expression profiles of a subset of LRR genes by semiquantitative RT-PCR in primary human macrophages infected with Staphylococcus aureus, Mycobacterium tuberculosis, Listeria monocytogenes, enterohemorrhagic Escherichia coli (EHEC), and Salmonella enterica serovar typhi (S. typhi), across four time points (0 h, 1 h, 2 h, and 6 h after infection; Fig. 4A). The gene subset was selected as having possible innate immune involvement based on text mining, annotations, or expression profiling. We observed a cluster of LRR genes (LRRC48, BPI, LRRK2, FLRT3, LRRTM3, TMOD1, NLRP11) that was strongly induced by L. monocytogenes and EHEC, showing early and sustained induction across the time periods examined (1 h to 6 h). Another group of genes (PHLPPL, SYNE2, SHOC2, LRRC28) displayed elevated expression when infected by each of the five pathogens but exhibited a particularly strong response at 6 h to EHEC. In contrast, the expression of FBXL11, LRCH4, and LMOD2 was decreased when infected with S. aureus, M. tuberculosis, L. monocytogenes, and EHEC. The expression of NLRP1, LRRTM2, LRRC59, and NXF1 was induced in response to M. tuberculosis and S. typhi, but appears to be refractory to the other infectious agents tested. A strong induction signal for MFHAS1 was observed with EHEC, L. monocytogenes, S. typhi, and M. tuberculosis (Fig. 4 A and B). Taken together, these findings suggest that these LRR proteins function as sensors or effectors of pathogen-mediated stress signals. The precise mechanisms by which microbial cues are sensed and regulated by these proteins remain to be determined.

LRR-Containing Protein MFHAS1 Regulates TLR-Dependent Signal Transduction. We next investigated whether LRR proteins, especially MFHAS1 (malignant fibrous histiocytoma amplified sequence 1), were essential for the activation of TLR-dependent pathways. For this, we down-regulated MFHAS1 expression in RAW macrophages using siRNA. siRNA knockdown of MFHAS1 in RAW 264.7 macrophages strongly enhanced IL-6 production following LPS and polyI:C stimulation (Fig. 4 C and D), suggesting a potential immune modulatory role for MFHAS1. Under similar experimental conditions, knockdown of CNOT6L did not alter LPS- and polyI:C-mediated TLR activation. Apart from its LRR domain, MFHAS1 contains one other identifiable and recently described Roco (Ras-like GTPase and C-terminal of Roc) domain (26), which is also present in a group of LRR proteins in Dictyostelium and prokaryotes. Although there is currently no reported involvement of MFHAS1 in host defense processes, the strong induction of MFHAS1 expression observed following pathogen challenge (Fig. 4 A and B) is consistent with a role in immune regulation.

Network Analysis Identifies a Potential Autophagy-Related Role for the LRR Protein LRSAM1. Previously we identified FNBP1L as a protein required for anti-bacterial autophagy, based on PPI data (27). Using a similar approach, we constructed first-order PPI networks for human LRR proteins to explore whether LRR proteins interacted with components of the core autophagy "machinery." Interestingly, WDFY3 was the only human LRR protein that has been previously linked to autophagy (28). From network analysis, we identified LRSAM1 (leucine-rich repeat and sterile α-motif containing 1) as a potential interactor with GABARAPL2, which belongs to the MAP1 LC3/ATG8 family of proteins. RNAi directed against LRSAM1 resulted in good knockdown for two duplexes (siRNA1 and 2) and a modest reduction in RNA level for the third (siRNA3; Fig. S6C). The level of LRSAM1 knockdown correlated with the level of anti-Salmonella autophagy observed in HeLa cells, with higher knockdown resulting in lower rates of successful autophagy of intracellular bacteria. This result was confirmed in three separate experiments and the data pooled (Fig. 4E). Statistically significant reductions in anti-Salmonella autophagy were observed for FNBP1L and LRSAM1 (siRNA1 and 2), but not LRSAM1 siRNA3, compared with control siRNA duplex transfection. Thus, LRSAM1 has an essential role in antibacterial autophagy.

Discussion

In this report, we have described two approaches for grouping the human LRR-containing proteins using predicted class assignments for the individual LRRs. For the first approach, we clustered the proteins based on the sequence similarity of the LRRs belonging to each class. As illustrated by the NLR and SLITRK clusters in Fig. S3 and F-box–containing proteins in Fig. S4, this method was able to group together proteins with similar function while distributing the LRR-only proteins among those containing non-LRR domains, thereby allowing the assignment of putative functional extensions to LRR-only proteins within these clusters. This is necessary because, based on the annotations in InterPro, almost half the 375 human LRR proteins do not contain non-LRR domains. The absence of these domains, from which functional insights could usually be gleaned, limits their characterization. Hence, a majority of these LRR proteins have no known function. On the basis of very similar LRR class composition, we found CEP72 [an LRR-only protein that was recently implicated in ulcerative colitis from genome-wide association studies (29)] clustered with LRRC36 (Fig. S3), an RORγ-binding protein, suggesting a gene-regulatory connection that requires additional functional studies to validate.

For the second approach, we partitioned the LRR proteins into categories based on the majority LRR class of each protein. As demonstrated in Fig. 2A, this method yielded categories in which LRR-only proteins are grouped with proteins containing non-LRR domains. These categories also have the potential to yield insights into the functions of uncharacterized proteins. In particular, as shown in Fig. 2B, several of the categories are associated with the presence or absence of TM regions. Further, most of the proteins in the CC category contain an F-box domain, whereas about half the proteins in the RI category contain a NACHT NTPase domain. Finally, although the unclassified category contains almost half the LRR proteins, we still found statistically significant associations with several ontology terms when we compared the proteins in this category to those in the remaining categories. Specifically, the unclassified LRRs exhibit an over-representation of terms related

to nucleic acid binding and cell structure, whereas the classified LRR proteins are enriched for terms related to immunity and receptor-mediated signaling. We also observed that this latter enrichment is valid for the subset of classified proteins that contain S or T LRRs. As classified LRR proteins all contain five or more LRRs, this observation suggests a possible link between receptor-mediated signaling and LRR proteins with an S or T LRR and at least four other LRRs.

To obtain further insight into function, we also examined the evolutionary profiles of the LRR proteins. We observed striking differences in the profile for human LRR proteins when compared with those of PDZ and SH2 proteins. In particular, a small subset of the LRR proteins exhibits strong conservation in fungi and is enriched for the PANTHER ontology term “nucleic acid binding.” One of these, NXF1, a nuclear RNA-export factor, was recently identified in two genome-wide siRNA screens as a host-

[Figure 4, panels A to E. (A) Heat map of signed difference ratios (scale −1 to +1) for LRR genes in macrophages infected with S. aureus, L. monocytogenes, S. typhi, M. tuberculosis, or EHEC at 0, 1, 2, and 6 h. (B) Log2 normalized fold-induction of selected genes after M. tuberculosis infection for 1 h and 6 h. (C, D) Secreted IL-6 (pg/mL) in untreated and stimulated cells (LPS, 50 ng/mL; polyIC, 100 ng/mL) transfected with siControl, siMfhas1, or siCnot6L. (E) Percentage of Salmonella within autophagosomes for siControl, siFNBP1L, and siLRSAM1 duplexes 1 to 3.]

Fig. 4. Identifying genes encoding LRR proteins involved in immunity and antibacterial autophagy. (A) Heat map shows gene expression of LRR genes in human primary macrophages infected by the indicated pathogens at various time points: 0 (uninfected) and 1, 2, and 6 h after infection (as examined by RT-PCR). GAPDH-normalized log2 expression values are expressed as signed difference ratios relative to the uninfected state and scaled by normalizing to the maximum absolute deviation of each gene’s expression level from the uninfected control, so that all values lie between −1 and +1. (B) Further RT-PCR validation of a subset of M. tuberculosis–responsive genes from an independent set of samples from human primary macrophages uninfected and infected with M. tuberculosis for 1 and 6 h after infection. Values are GAPDH-normalized log2 fold change relative to uninfected control, performed in duplicate. Error bars indicate ± SD. (C) Knockdown using siRNA directed against Mfhas1 in RAW 264.7 macrophages enhanced the level of secreted IL-6 production following LPS and (D) polyIC stimulation, as measured by ELISA. Data from three experiments are shown as mean ± SD. Asterisk indicates P < 0.05 as assessed using two-tailed t test. RT-PCR validation of siRNA knockdown is shown in Fig. S6B. (E) LRSAM1 is required for antibacterial autophagy. Knockdown of LRSAM1 results in loss of anti-Salmonella autophagy in HeLa cells. siRNA directed against FNBP1L or LRSAM1 resulted in loss of autophagic membranes surrounding internalized S. typhimurium. The anti-Salmonella autophagy rate was significantly lower for two of three LRSAM1 duplexes, correlating with effective knockdown at the RNA level (Fig. S6C). Data are shown as mean ± SE and are pooled from three independent experiments, each counting at least 50 infected cells per condition. Significance (P < 0.05) was assessed using two-tailed t tests with Bonferroni correction.
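The scaling described for panel A of Fig. 4 can be illustrated with a minimal Python sketch. It is an interpretation of the legend, not the authors' code: the function name `signed_difference_ratios` is hypothetical, and it assumes that the maximum absolute deviation is taken over the series of values passed in for one gene, with the uninfected value at index 0.

```python
def signed_difference_ratios(log2_values, baseline_index=0):
    """Scale one gene's GAPDH-normalized log2 values to the range [-1, +1].

    Assumption (hypothetical reading of the legend): each value is expressed
    as its difference from the uninfected baseline, divided by the largest
    absolute difference observed for that gene in the supplied series.
    """
    baseline = log2_values[baseline_index]
    deviations = [value - baseline for value in log2_values]
    max_abs = max(abs(d) for d in deviations)
    if max_abs == 0:
        # The gene never moves away from the baseline; report all zeros.
        return [0.0 for _ in deviations]
    return [d / max_abs for d in deviations]


if __name__ == "__main__":
    # Made-up log2 values at 0, 1, 2, and 6 h for a single gene.
    print(signed_difference_ratios([2.0, 3.0, 4.0, 1.0]))
```

With this scaling the uninfected time point is always 0, and the strongest induction or repression of each gene maps to +1 or −1, which is what makes rows of the heat map comparable across genes.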

dependency factor for influenza virus (30, 31). We also uncovered an intriguing connection between the class-based LRR categories and the degrees to which the categories’ members are conserved. As illustrated in Fig. S5B, we found that the SDS22, RI, and mixed categories contain very few proteins with high degrees of conservation (relative to the other LRRs).

Next, we were interested in identifying a subset of LRR genes in immunity and host defense. To identify bacterial-responsive LRR genes, we examined expression profiles of LRR genes in primary human macrophages infected with five pathogenic bacteria and across multiple time points using RT-PCR. We identified MFHAS1 as a candidate gene for further investigation as a result of its induction response to four of the five pathogens examined. Validation by siRNA knockdown in RAW macrophages places MFHAS1 in the signaling network downstream of TLR signaling and highlights its potential role as a modulator of the inflammatory response.

We experimentally validated candidate LRR genes identified through integrating information from diverse sources including ontology annotations, literature co-citations, PPI data, and gene expression microarray data, with the aim of gaining additional insights for functional exploration. From literature mining, we note that the LRR protein WDFY3 has been previously implicated to play a role in autophagy by binding to phosphatidylinositol-3-phosphate (PtdIns3P), which regulates endocytic and autophagic membrane traffic (28). Extending this search by using PPI network analysis, we identified LRSAM1 as the only human LRR protein in the network that potentially interacted with GABARAPL2, a member of the MAP1 LC3/ATG8 family of proteins, in the core autophagy apparatus. LRSAM1, also known as Tal, has a SAM domain and a RING (E3 ubiquitin-protein ligase) domain, which mediates monoubiquitination of TSG101 at multiple sites and regulates receptor endocytosis by inactivating the ability of TSG101 to sort endocytic and exocytic (as observed with HIV-1 viral protein) cargos (32). Given this role in cargo sorting and membrane packaging of cargos, as well as the known role for bacterial ubiquitination in triggering antibacterial autophagy (33), we postulated that LRSAM1 was a plausible candidate to operate within the anti-Salmonella autophagy pathway. Using a functional siRNA approach, we found that knockdown of LRSAM1 resulted in reductions in anti-Salmonella autophagy.

We provide a resource and framework to facilitate the functional understanding of 375 human LRR proteins, many of which are currently uncharacterized. The results of this study have contributed to our understanding of how LRR class composition might be used to facilitate the functional classification of less well studied LRR proteins alongside those of known function. Using an integrative approach, we elucidated functions of two LRR genes in regulating responses to pathogens. First, via expression profiling of pathogen-responsive LRR genes, followed by functional siRNA, we show that MFHAS1 regulates TLR signaling. Finally, by using PPI data, we identified LRSAM1 as a component of the anti-bacterial autophagic response. Together these findings provide insights into the function and molecular pathways associated with LRR proteins.

Materials and Methods

Construction of the LRR HMMs. We extracted for each LRR class the associated set of LRR sequences in the LRRML database. As LRRML contained no sequences for the TpLRR class and very few for the PS class, we added to these classes’ sets sequences obtained from two publications (34, 35), respectively. After removing the duplicate sequences in each set, we used the remaining sequences to construct a “seed” HMM for each LRR class. To construct each HMM, we first iteratively aligned and filtered the associated sequences using Clustalw2 (36) (version 2.09, default options) until no positions in the resulting alignment were associated with gaps in 95% or more of the sequences. We then used HMMER (37) (version 2.3.2, default options) to build and calibrate an HMM from the alignment. As the HMMs were to be used to annotate the LRRs only in human proteins but the sequences used to construct the HMMs were not specific to humans, we modified each HMM by repeating the following steps three times. (i) With HMMER, all human UniProt sequences were scanned for matches to the HMMs using a domain E-value cutoff of 0.1 and a global E-value cutoff of 1 × 10^20. (ii) Each matching sequence segment was assigned to the class for which it had the smallest E-value. (iii) For each class, the new sequences were added to those already used to construct the class’s HMM, removing any duplicates. (iv) As described earlier, we iteratively aligned and filtered the sequences for each class. The final alignments were used to construct new HMMs.

Annotation of the LRRs. We first scanned every LRR protein using the LRR HMMs and compiled a list of all matches with positive scores. We then identified and scored potential irregular LRRs according to the criteria detailed in SI Materials and Methods. For a given LRR arrangement and regular LRR class assignment for a protein, we defined the total score for the protein to be the sum of the LRR scores adjusted for the inter-LRR regions. We next used an HMM to simultaneously identify the LRR arrangement and regular LRR class assignment that maximized the total score for each protein. Each identified regular and irregular LRR was rescored using the LRR HMMs to fine-tune the associated class assignment. LRRs with negative HMM scores were annotated as irregular. A graphical representation of LRR proteins is provided in Dataset S1. For clarity of display, we merged any overlapping, non-LRR InterPro domains using dependency or relational terms.

Clustering LRR Proteins Using LRR Sequence Similarity. Sequences of the annotated LRRs in LRR proteins were extracted and placed into groups based on LRR classes. The irregular LRRs constituted a separate class and were placed in their own group. Similarity scores were then computed for every pair of LRRs belonging to the same class. For BLAST E-values greater than 1, the similarity score was defined as 0, whereas for E-values of 0 or lower, the similarity score was defined as the negative log10 of the E-value. Using these scores, we next identified for each LRR the best matching LRR in every protein. For each LRR protein, a similarity score was also computed for every other LRR protein by summing the scores of the best-matching LRRs in that protein. To summarize this formally: for each protein $i$, we then calculated for every protein $j$ a similarity score $s_{i,j}$ equal to the sum of the scores of the best-matching LRRs in protein $j$ or $-\infty$ if no matching LRRs existed (in the latter case, we also set $s_{j,i}$ equal to $-\infty$). Using the resulting similarity scores, we constructed a matrix $D = [d_{i,j}]$, where for all $i$ and $j$, $d_{i,j}$ represented the normalized distance between proteins $i$ and $j$ and is defined by the following formula:

\[
d_{i,j} =
\begin{cases}
\dfrac{s_{i,i} - s_{i,j}}{s_{i,i} - \min_{k}\{\, s_{i,k} : -\infty < s_{i,k} < s_{i,i} \,\}} & \text{if } -\infty < s_{i,j} < s_{i,i}, \\[2ex]
0 & \text{if } s_{i,j} = s_{i,i}, \\[1ex]
1 & \text{otherwise.}
\end{cases}
\tag{1}
\]

Next, a symmetric matrix $E = (D + D^{T})/2$ was created. We then used $E$ to hierarchically cluster the LRR proteins with the method of McQuitty (38) and the hclust function in the R programming language. The clustering result was combined with positional information for each LRR class in each LRR protein using a Perl program and visualized using the representation scheme described previously (39).

Determining Association Between LRR Categories and TM Regions. For each LRR category, we computed the P value for the two-sided Fisher exact test, implemented in the R programming language. A P value of 0.01 was used to determine statistical significance.

Identification of Enriched PANTHER Ontology Terms. Enrichment of ontology terms from the PANTHER classification system (24) was computed using the one-sided Fisher exact test implemented in the R programming language. P values were corrected for multiple-hypothesis testing using the method of Benjamini and Hochberg (40).

Generation of Evolutionary Profile Heat Maps. Genes for the 375 LRR proteins were mapped to Ensembl protein IDs. The OrthoMCL database (41) was queried for each protein to identify orthologues and the organisms in which orthologues are present. Vectors containing strings of 0s and 1s denoting the extent of orthologue representation across taxonomic groupings were constructed and used to hierarchically cluster the LRR genes with Cluster 3.0 (42). The clustered results were represented using a heat map. To generate a heat map depicting the relative degrees of conservation of the LRRs, we also incorporated distances (percent divergences) computed using Clustalw2 (36) between LRR proteins and their orthologues (details in SI Materials and Methods).

ACKNOWLEDGMENTS. We thank Mark Daly and Fred Ausubel for helpful discussions. This work was supported by National Institutes of Health Grants AI062773, DK83756, DK060049, and DK060049 (to R.J.X., R.J.W.H., and J.E.) and a Research Fellowship Award from the Crohn’s and Colitis Foundation of America (to A.C.Y.N.).
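The normalized distance defined in Materials and Methods (Eq. 1) and its symmetrization can be sketched in a few lines of Python. This is an illustrative reading of the formula rather than the authors' implementation, which used R and Perl: the function names `normalized_distances` and `symmetrize` and the example score matrix are hypothetical, and the protein-level similarity scores $s_{i,j}$ are assumed to be given as a square list of lists, with `float("-inf")` marking pairs that have no matching LRRs.

```python
NEG_INF = float("-inf")


def normalized_distances(s):
    """Build D = [d_ij] from protein-level similarity scores s (Eq. 1).

    s[i][j] is the summed best-match score of protein i against protein j,
    or -inf when no matching LRRs exist (hypothetical input layout).
    """
    n = len(s)
    d = [[1.0] * n for _ in range(n)]  # "otherwise" case: distance 1
    for i in range(n):
        self_score = s[i][i]
        # Finite scores strictly below the self score; their minimum sets
        # the denominator so that the weakest finite match is at distance 1.
        lower = [s[i][k] for k in range(n) if NEG_INF < s[i][k] < self_score]
        for j in range(n):
            if s[i][j] == self_score:
                d[i][j] = 0.0
            elif NEG_INF < s[i][j] < self_score:
                d[i][j] = (self_score - s[i][j]) / (self_score - min(lower))
    return d


def symmetrize(d):
    """Return E = (D + D^T) / 2."""
    n = len(d)
    return [[(d[i][j] + d[j][i]) / 2 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    # Made-up scores for three proteins; proteins 1 and 2 share no LRR match.
    scores = [
        [10.0, 6.0, 2.0],
        [5.0, 8.0, NEG_INF],
        [1.0, NEG_INF, 4.0],
    ]
    for row in symmetrize(normalized_distances(scores)):
        print(row)
```

The symmetric matrix returned by `symmetrize` plays the role of $E$ in the text and could be passed to any hierarchical clustering routine. R's `hclust` with `method = "mcquitty"` is the choice named in the paper; the equivalent WPGMA linkage in other libraries would be an assumption on the reader's part.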

1. Medzhitov R (2007) Recognition of microorganisms and activation of the immune response. Nature 449:819–826.
2. Pålsson-McDermott EM, O’Neill LA (2007) Building an immune system from nine domains. Biochem Soc Trans 35:1437–1444.
3. Huang S, et al. (2008) Genomic analysis of the immune gene repertoire of amphioxus reveals extraordinary innate complexity and diversity. Genome Res 18:1112–1126.
4. Nagawa F, et al. (2007) Antigen-receptor genes of the agnathan lamprey are assembled by a process involving copy choice. Nat Immunol 8:206–213.
5. Alder MN, et al. (2005) Diversity and function of adaptive immune receptors in a jawless vertebrate. Science 310:1970–1973.
6. Akira S, Uematsu S, Takeuchi O (2006) Pathogen recognition and innate immunity. Cell 124:783–801.
7. Peter ME, Kubarenko AV, Weber AN, Dalpke AH (2009) Identification of an N-terminal recognition site in TLR9 that contributes to CpG-DNA-mediated receptor activation. J Immunol 182:7690–7697.
8. Hugot JP, et al. (2001) Association of NOD2 leucine-rich repeat variants with susceptibility to Crohn’s disease. Nature 411:599–603.
9. Ogura Y, et al. (2001) A frameshift mutation in NOD2 associated with susceptibility to Crohn’s disease. Nature 411:603–606.
10. Swanberg M, et al. (2005) MHC2TA is associated with differential MHC molecule expression and susceptibility to rheumatoid arthritis, multiple sclerosis and myocardial infarction. Nat Genet 37:486–494.
11. Hawn TR, et al. (2003) A common dominant TLR5 stop codon polymorphism abolishes flagellin signaling and is associated with susceptibility to Legionnaires’ disease. J Exp Med 198:1563–1572.
12. Enkhbayar P, Kamiya M, Osaki M, Matsumoto T, Matsushima N (2004) Structural principles of leucine-rich repeat (LRR) proteins. Proteins 54:394–403.
13. Kobe B, Kajava AV (2001) The leucine-rich repeat as a protein recognition motif. Curr Opin Struct Biol 11:725–732.
14. Kajava AV (1998) Structural diversity of leucine-rich repeat proteins. J Mol Biol 277:519–527.
15. Matsushima N, et al. (2007) Comparative sequence analysis of leucine-rich repeats (LRRs) within vertebrate toll-like receptors. BMC Genomics 8:124.
16. Brodsky I, Medzhitov R (2007) Two modes of ligand recognition by TLRs. Cell 130:979–981.
17. Wei T, et al. (2008) LRRML: A conformational database and an XML description of leucine-rich repeats (LRRs). BMC Struct Biol 8:47.
18. Hunter S, et al. (2009) InterPro: the integrative protein signature database. Nucleic Acids Res 37 (database issue):D211–D215.
19. The UniProt Consortium (2010) The Universal Protein Resource (UniProt) in 2010. Nucleic Acids Res 38 (database issue):D142–D148.
20. Schuster-Böckler B, Schultz J, Rahmann S (2004) HMM Logos for visualization of protein families. BMC Bioinformatics 5:7.
21. Dolan J, et al. (2007) The extracellular leucine-rich repeat superfamily; a comparative survey and analysis of evolutionary relationships and expression patterns. BMC Genomics 8:320.
22. Matsushima N, Ohyanagi T, Tanaka T, Kretsinger RH (2000) Super-motifs and evolution of tandem leucine-rich repeats within the small proteoglycans: biglycan, decorin, lumican, fibromodulin, PRELP, keratocan, osteoadherin, epiphycan, and osteoglycin. Proteins 38:210–225.
23. Käll L, Krogh A, Sonnhammer EL (2004) A combined transmembrane topology and signal peptide prediction method. J Mol Biol 338:1027–1036.
24. Mi H, et al. (2010) PANTHER version 7: Improved phylogenetic trees, orthologs and collaboration with the Gene Ontology Consortium. Nucleic Acids Res 38 (database issue):D204–D210.
25. Su AI, et al. (2004) A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci USA 101:6062–6067.
26. Marín I, van Egmond WN, van Haastert PJ (2008) The Roco protein family: A functional perspective. FASEB J 22:3103–3110.
27. Huett A, et al. (2009) A novel hybrid yeast-human network analysis reveals an essential role for FNBP1L in antibacterial autophagy. J Immunol 182:4917–4930.
28. Simonsen A, et al. (2004) Alfy, a novel FYVE-domain-containing protein associated with protein granules and autophagic membranes. J Cell Sci 117:4239–4251.
29. McGovern DP, et al.; NIDDK IBD Genetics Consortium (2010) Genome-wide association identifies multiple ulcerative colitis susceptibility loci. Nat Genet 42:332–337.
30. Brass AL, et al. (2009) The IFITM proteins mediate cellular resistance to influenza A H1N1 virus, West Nile virus, and dengue virus. Cell 139:1243–1254.
31. Hao L, et al. (2008) Drosophila RNAi screen identifies host genes important for influenza virus replication. Nature 454:890–893.
32. Amit I, et al. (2004) Tal, a Tsg101-specific E3 ubiquitin ligase, regulates receptor endocytosis and retrovirus budding. Genes Dev 18:1737–1752.
33. Zheng YT, et al. (2009) The adaptor protein p62/SQSTM1 targets invading bacteria to the autophagy pathway. J Immunol 183:5909–5916.
34. Shevchenko DV, et al. (1997) Molecular characterization and cellular localization of TpLRR, a processed leucine-rich repeat protein of Treponema pallidum, the syphilis spirochete. J Bacteriol 179:3188–3195.
35. Zhang XS, Choi JH, Heinz J, Chetty CS (2006) Domain-specific positive selection contributes to the evolution of Arabidopsis leucine-rich repeat receptor-like kinase (LRR RLK) genes. J Mol Evol 63:612–621.
36. Larkin MA, et al. (2007) Clustal W and Clustal X version 2.0. Bioinformatics 23:2947–2948.
37. Eddy SR (1998) Profile hidden Markov models. Bioinformatics 14:755–763.
38. McQuitty LL (1966) Similarity analysis by reciprocal pairs for discrete and continuous data. Educ Psychol Meas 26:825–831.
39. Letunic I, Bork P (2007) Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics 23:127–128.
40. Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B Stat Methodol 57:289–300.
41. Chen F, Mackey AJ, Stoeckert CJ, Jr, Roos DS (2006) OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups. Nucleic Acids Res 34 (database issue):D363–D368.
42. de Hoon MJ, Imoto S, Nolan J, Miyano S (2004) Open source clustering software. Bioinformatics 20:1453–1454.

www.pnas.org/cgi/doi/10.1073/pnas.1000093107