Matrix Metalloproteinases Outside Vertebrates
Total Page:16
File Type:pdf, Size:1020Kb
Marino-Puertas et al. 1 Matrix metalloproteinases outside vertebrates Laura Marino-Puertas, Theodoros Goulas* and F. Xavier Gomis-Rüth* Proteolysis Lab; Structural Biology Unit; "María-de-Maeztu" Unit of Excellence; Molecular Biology Institute of Barcelona (CSIC); Barcelona Science Park; c/Baldiri Reixac, 15-21; 08028 Barcelona (Spain). * Corresponding authors: e-mail: [email protected], phone: (+34) 934 020 187 or e-mail: [email protected], phone: (+34) 934 020 186. ABSTRACT The matrix metalloproteinase (MMP) family belongs to the metzincin clan of zinc-dependent metallopeptidases. Due to their enormous implications in physiology and disease, MMPs have mainly been studied in vertebrates. They are engaged in extracellular protein processing and degradation, and present extensive paralogy, with 23 forms in humans. One characteristic of MMPs is a ~165-residue catalytic domain (CD), which has been structurally studied for 14 MMPs from human, mouse, rat, pig and the oral-microbiome bacterium Tannerella forsythia. These studies revealed close overall coincidence and characteristic structural features, which distinguish MMPs from other metzincins and give rise to a sequence pattern for their identification. Here, we reviewed the literature available on MMPs outside vertebrates and performed database searches for potential MMP CDs in invertebrates, plants, fungi, viruses, protists, archaea and bacteria. These and previous results revealed that MMPs are widely present in several copies in Eumetazoa and higher plants (Tracheophyta), but have just token presence in eukaryotic algae. A few dozen sequences were found in Ascomycota (within fungi) and in double-stranded DNA viruses infecting invertebrates (within viruses). In contrast, a few hundred sequences were found in archaea and >1000 in bacteria, with several copies for some species. Most of the archaeal and bacterial phyla containing potential MMPs are present in human oral and gut microbiomes. Overall, MMP-like sequences are present across all kingdoms of life, but their asymmetric distribution contradicts the vertical descent model from a eubacterial or archaeal ancestor. Keywords: zinc metalloproteinase; metzincin; MMP; invertebrates; catalytic domain; structure-based sequence motif –––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––– 1. Molecular characteristics of matrix metalloproteinases — The matrix metalloproteinases (MMPs) are a widespread family of zinc-dependent metallopeptidases (MPs), which either broadly degrade extracellular matrix components or selectively activate or inactivate other proteins through limited proteolysis [1-3]. MMPs were discovered in 1962 as an active principle in frog metamorphosis [4], and they contain a central zinc-dependent catalytic domain (CD) of ~165 residues, which is mostly furnished at its N-terminus with a ~20-residue signal peptide for secretion and an ~80-residue zymogenic pro-domain (PD). Some MMPs possess a "furin recognition motif" (R-X-R/K-R; [5]) for proteolytic activation between the PD and the CD. Into this minimal configuration, distinct MMPs have inserted extra segments and domains, such as fibronectin-type-II inserts within the CD and C- terminal unstructured linker regions, ~200-residue hemopexin domains (HDs) and other domains, as well as glycosyl phosphatidylinositol (GPI) anchors and transmembrane segments [1, 3, 6]. Overall, this variability divides human MMPs into subfamilies: archetypal MMPs, gelatinases, matrilysins and furin-activatable MMPs, which include membrane-type MMPs [6, 7]. MMPs belong to the metzincin clan of MPs, which currently comprises twelve structurally characterized families with common features [8-12]. However, this structural similarity is not reflected at sequence level, because Marino-Puertas et al. 2 pairwise identities are typically <20%, which is below the twilight zone of relevant sequence similarity (25–35%; [13]). Overall, metzincins share a globular CD of ~130—270 residues, which is split into a structurally conserved N-terminal sub-domain (NTS) and a diverging C-terminal sub-domain (CTS) by a horizontal active-site cleft. This cleft contains the catalytic zinc ion and accommodates protein or peptide substrates for catalysis (Figure 1A). NTSs contain a mostly five-stranded β-sheet, whose strands (βI-βV) parallel the active-site cleft. The lowermost strand βIV shapes the upper rim of the cleft and runs antiparallel to the other strands and to a bound substrate, thus establishing inter-main-chain interactions to fix it. NTSs also possess a "backing helix" (αA) with structural functions and an "active-site helix" (αB) with an extended "zinc-binding motif", H-E-X-X-H-X-X-G/N-X-X-H/D- Φ. This motif includes the general base/acid glutamate required for catalysis and three metal-binding protein residues (in bold), as well as a "family-specific residue" (Φ; [8-10]). Some families further comprise an additional "adamalysin helix", which was first identified in the adamalysin/ADAM family [12, 14]. In contrast to this structural conservation, CTSs largely deviate in length and structure but share the "Met-turn" [15], a loop with a methionine placed just below the catalytic zinc, and a "C-terminal helix" (αC) that terminates the CD. Into this common minimal scaffold, specific molecular elements decorate each metzincin family in the form of loops, ion- binding sites, additional regular secondary structure elements, extra domains, etc. [8-12]. To date, structural studies on MMP CDs are restricted to mammals (human, mouse, rat and pig), with the notable exception of bacterial karilysin ([12]; see also Section 6). These structures revealed that MMPs are very similar (Figure 1B), have much longer NTSs (~127 residues) than CTSs (~37 residues), and include an "S-shaped double loop", which connects the third and fourth strands of the NTS β-sheet (Figure 1A). This loop binds essential structural zinc and calcium cations, after which the polypeptide forms a prominent bulge ("bulge-edge segment") that protrudes into the active-site groove and assists in substrate specificity. A further hallmark of MMPs is a second calcium site within the NTS and that the family-specific residue is a serine or threonine [3, 10]. This residue makes a strong hydrogen bond between side chains with the first of two consecutive aspartates at the beginning of helix αC (Figure 1C). This aspartate is also engaged in fixing the N-terminus of the mature CD upon proteolytic activation. In turn, the second aspartate participates in a double electrostatic interaction with main-chain atoms of the Met-turn (Figure 1A,C). This intricate electrostatic network is structured around the "connecting segment", which links the third zinc-binding histidine with the Met-turn methionine and spans only seven residues; thus, it is the shortest of the metzincins [10]. Further typical of MMPs are two tyrosines, respectively four positions downstream of the Met-turn methionine and featuring the penultimate residue of the CD. Finally, the Met-turn and helix αC are connected by a loop subdivided into the "S1'-wall-forming segment" and the downstream "specificity loop" (Figure 1A). Taken together, all these characteristic structural features yield an extended sequence pattern, H- E-2X-H-2X-G-2X-H-S/T-6X-M-3X-Y-9X-D-D-7X-Y-X (Figure 1D), which distinguishes the C-terminal segment of MMP CDs from other metzincin families and MPs in general [12]. In addition to the CD, a further hallmark of MMPs is that latency is maintained through the PD [3], which together with transcriptional regulation and dedicated protein inhibitors regulates physiological MMP activity [16]. In most MMPs, zymogenicity is exerted by a "cysteine-switch" mechanism centered on a cysteine within a consensus motif at the end of the PD, P-R-C-G-X-P-D [17], which is found in all human MMPs except MMP-23B. Structurally, the polypeptide chain spanning these residues shields the active-site cleft by adopting a U-shaped conformation mediated by a double salt bridge between the arginine-aspartate pair and by the glycine, whose missing side chain prevents collision with upper-rim strand βIV [3]. In addition, the cysteine Sγ atom coordinates and thus blocks the catalytic zinc (see Fig. 3B in [3]). MMP activation entails PD removal to free access to the cleft. 2. Matrix metalloproteinases in invertebrates — MMPs have been extensively studied in vertebrates, particularly in mammals, due to their widespread implications in human health and disease [6, 16, 18, 19]. However, they are also widely present in invertebrates, where they participate in tissue turnover processes such as dendritic remodeling, regeneration, tracheal growth, axon guidance, histolysis and matrix degradation during hatching. They also participate in cancer processes [20]. Studied organisms belong to the phyla Arthropoda, including insects (dipterans, lepidopterans, hymenopterans and coleopterans) and crustaceans; Echinoderma, including sea urchins and sea cucumbers; Mollusca, including mussels and snails; Nematoda, Annelida and Planaria; invertebrate Chordata, including sea squirt and lancelet; and Cnidaria, including hydra and jellyfish (see Table 1). In addition, we found MMP-like sequences in the genomes of organisms as primitive as starlet sea anemone (Nematostella vectensis; UniProt database [www.uniprot.org] access code [UP] A7RJ22), which is also from the phylum Cnidaria, and sea gooseberry (Pleurobrachia pileus; GenBank database [www.ncbi.nlm.nih.gov/genbank] access code [GB] FQ005948) from the phylum