Human Leucine-Rich Repeat Proteins: a Genome-Wide Bioinformatic Categorization and Functional Analysis in Innate Immunity

Human leucine-rich repeat proteins: a genome-wide bioinformatic categorization and functional analysis in innate immunity Aylwin C. Y. Nga,b,1, Jason M. Eisenberga,b,1, Robert J. W. Heatha, Alan Huetta, Cory M. Robinsonc, Gerard J. Nauc, and Ramnik J. Xaviera,b,2 aCenter for Computational and Integrative Biology, and Gastrointestinal Unit, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114; bThe Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142; and cMicrobiology and Molecular Genetics, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261 Edited by Jeffrey I. Gordon, Washington University School of Medicine, St. Louis, MO, and approved June 11, 2010 (received for review February 17, 2010) In innate immune sensing, the detection of pathogen-associated proteins have been implicated in human diseases to date, notably molecular patterns by recognition receptors typically involve polymorphisms in NOD2 in Crohn disease (8, 9), CIITA in leucine-rich repeats (LRRs). We provide a categorization of 375 rheumatoid arthritis and multiple sclerosis (10), and TLR5 in human LRR-containing proteins, almost half of which lack other Legionnaire disease (11). identifiable functional domains. We clustered human LRR proteins Most LRR domains consist of a chain of between 2 and 45 by first assigning LRRs to LRR classes and then grouping the proteins LRRs (12). Each repeat in turn is typically 20 to 30 residues long based on these class assignments, revealing several of the resulting and can be divided into a highly conserved segment (HCS) fol- protein groups containing a large number of proteins with certain lowed by a variable segment (VS). The HCS usually consists of non-LRR functional domains. In particular, a statistically significant- either the 11-residue sequence LxxLxLxxNxL or the 12-residue number of LRR proteins in the typical (T) and bacterial + typical (S+T) sequence LxxLxLxxCxxL, where L is Leu, Ile, Val, or Phe; N is categories have transmembrane domains, whereas most of the LRR Asn, Thr, Ser, or Cys; and C is Cys, Ser, or Asn (13, 14). Al- proteins in the cysteine-containing (CC) category contain an F-box though these substitutions often preserve hydrophobicity or po- domain (which mediates interactions with the E3 ubiquitin ligase larity, it is possible for the first and last leucines to be replaced by complex). Furthermore, by examining the evolutionary profiles of relatively hydrophilic residues (15). The function of many LRR the LRR proteins, we identified a subset of LRR proteins exhibi- domains is to provide a structural framework for protein–protein ting strong conservation in fungi and an enrichment for “nucleic interactions (PPIs) (13). PDB structures for LRR-containing pro- acid-binding” function. Expression analysis of LRR genes identifies teins show the LRR domains in an arc or horseshoe shape. The a subset of pathogen-responsive genes in human primary macro- concave face (corresponding to the HCS of each LRR) consists of phages infected with pathogenic bacteria. Using functional RNAi, parallel β-strands, each usually three residues long, flanked by – we show that MFHAS1 regulates Toll-like receptor (TLR) dependent loops. Conversely, the convex face (corresponding to the VS of signaling. By using protein interaction network analysis followed each LRR) is composed of a variety of secondary structures, which fi by functional RNAi, we identi ed LRSAM1 as a component of the are often helical (13). Ligands can interact with either the convex antibacterial autophagic response. or the concave face, although the latter is more typical (13, 16). The core of the arc is hydrophobic, typically shielded by caps at the N fl pathogen-response | anti-bacterial | in ammation | pathogen sensors | terminus of the first LRR and at the C terminus of the last LRR. In autophagy extracellular proteins or domains, these caps often contain two- or four-residue cysteine clusters (14). nnate immunity is a conserved host response that entails the Beyond innate immunity, extensive functional diversity occurs Isensing of pathogen-associated molecular patterns through among LRR-containing proteins, which are involved in a variety of germline-encoded pattern recognition receptors (1), which initiate cellular processes including apoptosis, autophagy, ubiquitin-related pathway-specific signaling networks, resulting in rapid responses processes, nuclear mRNA transport, and neuronal development. that serve as the host’s first line of defense. A striking feature among Although the presence of well characterized non-LRR domains families of proteins functioning as sensors or effectors of innate can be used to classify LRR proteins along functional lines, such immunity is the inclusion into their structure, various combinations a classification scheme will place LRR proteins lacking any other of the following domains: Leucine-rich repeat (LRR), Toll/IL-1 identifiable functional domains into a single group unrelated to receptor, Ig, nucleotide-binding site, pyrin, RIG-I–like receptor, any specific function. To fully appreciate the signaling functions of caspase-recruitment, coiled-coil, immunoreceptor tyrosine-based activation motif, and C-type lectin (2). The LRR domain is present in a large number of prokaryotic This paper results from the Arthur M. Sackler Colloquium of the National Academy of and eukaryotic proteins and is one of the most commonly oc- Sciences, "Microbes and Health," held November 2–3, 2009, at the Arnold and Mabel curring protein domains in proteins associated with innate im- Beckman Center of the National Academies of Sciences and Engineering in Irvine, CA. The complete program and audio files of most presentations are available on the NAS munity. This is especially so in invertebrates (e.g., sea urchin) and Web site at http://www.nasonline.org/SACKLER_Microbes_and_Health. Cephalochordata (e.g., amphioxus) which have vastly expanded Author contributions: A.C.Y.N., J.M.E., and R.J.X. designed research; A.C.Y.N., J.M.E., repertoires of LRR proteins (3) in the absence of adaptive im- R.J.W.H., and A.H. performed research; A.C.Y.N., J.M.E., R.J.W.H., C.M.R., and G.J.N. con- munity. In jawless vertebrates, combinatorial assembly of LRR tributed new reagents/analytic tools; A.C.Y.N., J.M.E., A.H., and R.J.X. analyzed data; and gene segments in variable lymphocyte receptors generates the A.C.Y.N., J.M.E., A.H., and R.J.X. wrote the paper. structural diversity for antigen recognition and forms the basis of The authors declare no conflict of interest. an adaptive immune system (4, 5). Toll-like receptors (TLRs) and This article is a PNAS Direct Submission. NOD-like receptors (NLRs) recognize via the LRR domain, 1A.C.Y.N. and J.M.E. contributed equally to this work. molecular determinants from a structurally diverse collection of 2To whom correspondence should be addressed. E-mail: [email protected]. bacterial, fungal, viral, and parasite-derived components (6, 7). This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. Mutations or polymorphisms in more than 30 LRR-containing 1073/pnas.1000093107/-/DCSupplemental. www.pnas.org/cgi/doi/10.1073/pnas.1000093107 PNAS | March 15, 2011 | vol. 108 | suppl. 1 | 4631–4638 Downloaded by guest on September 23, 2021 LRR proteins, we devised semi-automated methods for grouping It is evident from Dataset S1 that most of the proteins contain them on the basis of LRR classes. We identified the positions of one or more irregular LRR. As shown in Fig. 1D, these LRRs individual LRR sequences within each LRR domain, and classified occur both at the ends of LRR chains (FBXL3) as well as within each repeat based upon consensus sequences and lengths of their them (LRRC16A). By their very nature, irregular LRRs are more VSs. Identified repeats were assigned into classes: bacterial (S), difficult to identify than regular LRRs, particularly at the ends of ribonuclease inhibitor–like (RI), cysteine-containing (CC), SDS22, LRR chains. However, irregular LRRs that occur within LRR plant-specific (PS), typical (T), and Treponema pallidum (Tp) chains can be identified with greater confidence and are likely to (13, 17). By using this approach, we reasoned that functionally re- be valid LRRs as they span the gaps between regular LRRs. lated LRR-only proteins would be placed together by virtue of their similar LRR domains. In addition, these LRR class-based clusters Clustering the LRR Proteins Using LRR Classes. Based on the LRR would contain all of the proteins bearing similar LRR repeat class assignments, we grouped the LRR proteins in two different structures, with further subcategorization via the presence or ab- ways. For the first approach, we clustered the proteins based on sence of non-LRR domains. the sequence similarity of their LRRs; each LRR in a given pro- In this article, we present a comprehensive categorization of tein was allowed to match any LRR of the same class in another human LRR-containing proteins by LRR class composition, protein (the irregular LRRs were placed in their own class). Fig. functional associations, and involvement in host responses to in- 1E and Fig. S3 summarize the clustered results for human LRR fection stimuli. We demonstrate semi-automated methods that proteins as a circular tree. As predicted, we observed functionally use LRR class composition to facilitate functional groupings. By similar proteins clustering together e.g., members of the SLITRK, integrating diverse datasets and using functional RNA interfer- NLR (Fig. S3), and F-box families (Fig. S4). We were also able to ence, we experimentally place uncharacterized LRR proteins in observe LRR-only proteins being distributed among those con- innate immunity and autophagy. taining non-LRR domains, fulfilling one of our goals in this effort: to cluster LRR proteins on the basis of LRR class composition. Results Because the class annotations for the LRRs are not always Annotation of LRRs.

Human Leucine-Rich Repeat Proteins: a Genome-Wide Bioinformatic Categorization and Functional Analysis in Innate Immunity

Recombinant Human FLRT1 Catalog Number: 2794-FL

Structural Biology of Laminins

ANP32A and ANP32B Are Key Factors in the Rev Dependent CRM1 Pathway

Role of FBXW5-Loss in Centrosome Abnormalities and Cell Physiology

A Computational Approach for Defining a Signature of Β-Cell Golgi Stress in Diabetes Mellitus

Discovery of the First Genome-Wide Significant Risk Loci for ADHD

Fine Mapping Studies of Quantitative Trait Loci for Baseline Platelet Count in Mice and Humans

Leucine Zippers

Α/Β Coiled Coils 2 3 Marcus D

UNIVERSITY of CALIFORNIA RIVERSIDE Investigations Into The

GENOME EDITING in AFRICA’S AGRICULTURE 2021 an EARLY TAKE-OFF TABLE of CONTENTS Abbreviations and Acronyms 3

Supplementary Table S4. FGA Co-Expressed Gene List in LUAD