Conserved SQ and QS Motifs in Bacterial Effectors Suggest

BioRxiv: SQ-QS, Effectors and ATM
Supplementary materials

Conserved SQ and QS motifs in bacterial effectors suggest pathogen interplay with the ATM kinase family during infection

Davide Sampietro1,2, Hugo Sámano-Sánchez2,3, Norman E. Davey4, Malvika Sharan2, Bálint Mészáros2,5, Toby J. Gibson2, Manjeet Kumar2*

1. Department of Biotechnology and Biosciences, University of Milano-Bicocca, Piazza della Scienza 2, 20126 Milan, Italy.
2. Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, 69117, Germany.
3. (Candidate for) Joint PhD degree from EMBL and Heidelberg University, Faculty of Biosciences.
4. UCD School of Medicine & Medical Science, University College Dublin, Belfield, Dublin 4, Ireland.
5. MTA-ELTE Momentum Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest H-1117, Hungary.

>sp|P55980|CAGA_HELPY Cytotoxicity-associated immunodominant antigen OS=Helicobacter pylori (strain ATCC 700392 / 26695) OX=85962 GN=cagA PE=1 SV=1
MTNETIDQTRTPDQTQSQTAFDPQQFINNLQVAFIKVDNVVASFDPDQKPIVDKNDRDNR
QAFDGISQLREEYSNKAIKNPTKKNQYFSDFIDKSNDLINKDNLIDVESSTKSFQKFGDQ
RYQIFTSWVSHQKDPSKINTRSIRNFMENIIQPPIPDDKEKAEFLKSAKQSFAGIIIGNQ
IRTDQKFMGVFDESLKERQEAEKNGGPTGGDWLDIFLSFIFNKKQSSDVKEAINQEPVPH
VQPDIATTTTDIQGLPPEARDLLDERGNFSKFTLGDMEMLDVEGVADIDPNYKFNQLLIH
NNALSSVLMGSHNGIEPEKVSLLYAGNGGFGDKHDWNATVGYKDQQGNNVATLINVHMKN
GSGLVIAGGEKGINNPSFYLYKEDQLTGSQRALSQEEIRNKVDFMEFLAQNNTKLDNLSE
KEKEKFQNEIEDFQKDSKAYLDALGNDRIAFVSKKDTKHSALITEFNNGDLSYTLKDYGK
KADKALDREKNVTLQGSLKHDGVMFVDYSNFKYTNASKNPNKGVGATNGVSHLEAGFNKV
AVFNLPDLNNLAITSFVRRNLENKLTAKGLSLQEANKLIKDFLSSNKELAGKALNFNKAV
AEAKSTGNYDEVKKAQKDLEKSLRKREHLEKEVEKKLESKSGNKNKMEAKAQANSQKDEI
FALINKEANRDARAIAYTQNLKGIKRELSDKLEKISKDLKDFSKSFDEFKNGKNKDFSKA
EETLKALKGSVKDLGINPEWISKVENLNAALNEFKNGKNKDFSKVTQAKSDLENSVKDVI
INQKVTDKVDNLNQAVSVAKAMGDFSRVEQVLADLKNFSKEQLAQQAQKNEDFNTGKNSE
LYQSVKNSVNKTLVGNGLSGIEATALAKNFSDIKKELNEKFKNFNNNNNGLKNSTEPIYA
KVNKKKTGQVASPEEPIYTQVAKKVNAKIDRLNQIASGLGGVGQAAGFPLKRHDKVDDLS
KVGLSASPEPIYATIDDLGGPFPLKRHDKVDDLSKVGRSRNQELAQKIDNLNQAVSEAKA
GFFGNLEQTIDKLKDSTKKNVMNLYVESAKKVPASLSAKLDNYAINSHTRINSNIQNGAI
NEKATGMLTQKNPEWLKLVNDKIVAHNVGSVSLSEYDKIGFNQKNMKDYSDSFKFSTKLN
NAVKDIKSGFTHFLANAFSTGYYCLARENAEHGIKNVNTKGGFQKS

Figure S1. Sequence of CagA (P55980). Many of the S/TQ and QS/T motifs highlighted in yellow are conserved across different Helicobacter strains.

Figure S2. Empirical cumulative distribution function (ECDF) for the residuals of the dipeptides in the disordered regions of the substrates of AADk. Each dot represents Yi, where i is the dipeptide (see Materials and Methods section). SQ, QS and QT dipeptides are marked with black arrows. The y-axis denotes the cumulative probability; the x-axis denotes the range of values of Yi.

Figure S3. Empirical cumulative distribution function (ECDF) for the residuals of the tripeptides in the disordered regions of the substrates of AADk. Each dot represents Yi, where i is the tripeptide (see Materials and Methods section). SQS, SQE, SQP and SSQ are shown. The y-axis denotes the cumulative probability; the x-axis denotes the range of values of Yi.

Figure S4. Top: frequency of each tetrapeptide in the disordered regions of the substrates of AADk. The diagonal is the 1:1 correlation line. The further a dot lies from the diagonal, the stronger the enrichment. Dots above the diagonal are enriched in the positive set. Bottom: empirical cumulative distribution function (ECDF) for the residuals of the tetrapeptides in the disordered regions of the substrates of AADk. Each dot represents Yi, where i is the tetrapeptide (see Materials and Methods section). The y-axis denotes the cumulative probability; the x-axis denotes the range of values of Yi.

Figure S5. Alignment of Tir from Escherichia coli (E.coli), Escherichia albertii (E.alb), Salmonella enterica (S.ent) and Salmonella choleraesuis (S.cho). S/TQs are highlighted in black, QS/Ts in red. The motifs cluster together and often overlap. Blue highlights the percentage of identity for all the residues, except those highlighted in red and black. The Uniprot or UniRef accessions have been provided for all the sequences in the alignment.

Figure S6. Alignment of lpg2577 from Legionella pneumophila (L.pne), Legionella feeleii (L.fee), Legionella quateirensis (L.qua), Tatlockia micdadei (T.mic), Legionella lansingensis (L.lan), Legionella pascullei (L.pas), Legionella wadsworthii (L.wad), Legionella moravica (L.mor), Legionella israelensis (L.isr), Legionella jamestowniensis (L.jam) and Legionella fallonii (L.faa). S/TQs are highlighted in black, QS/Ts in red. The motifs cluster together and often overlap. A docking site (DOC_PIKK_1) for the AADk kinases involved in the DNA-damage response is shown in grey. Blue highlights the percentage of identity for all the residues, except those highlighted in red and black. The Uniprot or UniRef accessions have been provided for all the sequences in the alignment.

Figure S7. Alignment of sidG from Legionella pneumophila (L.pne), Legionella moravica (L.mor), Legionella sainthelensi (L.sai) and Legionella santicrucis (L.san). S/TQs are highlighted in red, QS/Ts in black. Blue highlights the percentage of identity for all the residues, except those highlighted in red and black. The Uniprot or UniRef accessions have been provided for all the sequences in the alignment.

Figure S8. Distribution of the SQ-QS distances in the disordered regions of the AADk substrates. Three distributions are compared: the theoretical distribution of the minimal distances (black), the minimal distances between the experimentally validated pSQ and QS (yellow), and the minimal distances between all SQs, whether experimentally validated or not, and QS (green). Top: SQ-QS distances of 2 to 150 amino acids, with overlapping bins of size 10. The binned distribution was preferred to the unbinned one because the data are sparse. Bottom: bins of size 3. Because we analyse only the amino acid sequence of the protein, large distances (greater than 150-200) are not considered: proximity in the 3D structure cannot be inferred from the sequence alone. In the binned distribution, the theoretical and observed distributions appear to differ between SQ-QS distances of 7 and 26. We therefore used a chi-square test to compare the theoretical distribution with the observed ones; the result is given in the text.

Figure S9. Box plot of the IUPred score of the experimentally validated phosphorylation sites in the substrates of the AADks.

Table S1. List of effectors possibly multi-phosphorylated by AADks.

Uniprot ID  Gene name  SQ  TQ  QS  QT  Organism
P55980      CagA        5   5   4   4  Helicobacter pylori
B7UM99      Tir         2   2   2   3  Escherichia coli O127:H6
C8TWM3      map         1   5   3   4  Escherichia coli O103:H2
Q5ZSE2      lpg2577     0   1   1   0  Legionella pneumophila
Q5ZRN6      lpg2844    12   1  12   3  Legionella pneumophila
Q5ZVT5      sidG        5   2   2   2  Legionella pneumophila
B7UKZ9      yhhA        3   4   1   3  Escherichia coli O127:H6
Q87GF9      VPA1357    12  26   6  35  Vibrio parahaemolyticus serotype O3:K6
Q9RBS0      PopA        4   1   8   0  Ralstonia solanacearum
Q9RBS1      PopB        3   1   1   3  Ralstonia solanacearum

Table S2. Data for the p-value distribution of the probability of relative local conservation. Because the positive and negative controls differ greatly in size, for each dipeptide we sampled from the larger set (negative control) a number of units equal to the size of the smaller set (positive control). We repeated this process 5000 times, each time calculating the p-value with the Wilcoxon-Mann-Whitney test.

Dipeptide  Mean        Median      Standard deviation
SQ         3.2089E-06  1.3642E-08  3.208E-05
TQ         0.06741190  0.031       0.089
QS         0.0223      0.005       0.043
QT         0.080678    0.0415      0.098

Table S3. Comparison between the human proteome and four different bacterial proteomes. The p-values suggest no significant difference in the composition of the disordered regions.

Comparison          P-value (Wilcoxon signed)
Homo - Chlamydia    0.81
Homo - Coxiella     0.64
Homo - Legionella   0.78
Homo - Pseudomonas  0.67
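The resampling procedure described for Table S2 can be illustrated with a minimal sketch. It assumes that each control set is simply a list of numeric conservation scores for one dipeptide; the function names (`mann_whitney_p`, `resampled_p_values`, `summarize`) are hypothetical and not taken from the paper. To keep the example dependency-free, the Wilcoxon-Mann-Whitney p-value is computed with a tie-corrected normal approximation without continuity correction, which is a substitution and may differ from whatever implementation the authors used.

```python
import random
from statistics import NormalDist, mean, median, stdev


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value (tie-corrected normal approximation).

    Assumption: an illustrative stand-in for a statistics-library routine.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mu = n1 * n2 / 2.0
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        return 1.0  # all values identical: no evidence of a difference
    z = (u - mu) / var ** 0.5
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def resampled_p_values(positive, negative, iterations=5000, seed=0):
    """Repeatedly subsample the larger negative set down to the positive-set size
    (without replacement within each draw) and test each draw against the positive set."""
    rng = random.Random(seed)
    k = len(positive)
    return [mann_whitney_p(positive, rng.sample(negative, k))
            for _ in range(iterations)]


def summarize(p_values):
    """Mean, median and sample standard deviation, as tabulated per dipeptide."""
    return mean(p_values), median(p_values), stdev(p_values)
```

Calling `summarize(resampled_p_values(positive_scores, negative_scores))` for one dipeptide would yield a (mean, median, standard deviation) triple of the kind listed in Table S2; `positive_scores` and `negative_scores` are placeholders for data not included here.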
