Rowan University Rowan Digital Works

Student Research Symposium Posters

Apr 26th, 8:00 AM

Analysis of Motif Distributions in Regions of Endocytic

Chante Bethell Bioinformatics, Rowan University

Stephanie J. Spielman, PhD Department of Biological Sciences, Rowan University

Follow this and additional works at: https://rdw.rowan.edu/student_symposium

Part of the Bioinformatics Commons Let us know how access to this document benefits ouy - share your thoughts on our feedback form.

Bethell, Chante and Spielman, PhD, Stephanie J., "Analysis of Motif Distributions in Regions of Endocytic Proteins" (2019). Student Research Symposium Posters. 2. https://rdw.rowan.edu/student_symposium/2019/apr26/2

This Poster is brought to you for free and open access by the Conferences, Events, and Symposia at Rowan Digital Works. It has been accepted for inclusion in Student Research Symposium Posters by an authorized administrator of Rowan Digital Works. Analysis of Motif Distributions in Regions of Endocytic Proteins

Chante Bethell1 and Stephanie J. Spielman2

1 Bioinformatics Program, Rowan University 2 Department of Biological Sciences, Rowan University

Abstract

A short linear motif (SLiM) is a recurring pattern of approximately three to ten Figure 2A: Steps of the -coated pit formation.2 amino acids found in proteins. SLiMs are important for cellular signaling and the Figure 2B: The major players in the initiation of CME. The cargo binds to its respective transmembrane regulating of proteins, often times by acting as binding sites for -binding receptor, which AP2 then binds to in order to create a domains. While SLiMs exist both in ordered regions of proteins where there is a Figure 1: The AP2 Protein Adaptor bridge to clathrin.2 Complex, colored by its subunits: tertiary structure and in disordered regions where there is no structure, they are AP2A, AP2B1, AP2S1, AP2M1.1 primarily functional in disordered regions. An important example of SLiM-mediated Motif Distributions of Active Endocytic Proteins processes and the focus of this study is endocytosis. Endocytosis is the process [ST]XXXX[LI] DX[FW] FXDX[FILMV] s 20 by which cells engulf molecules from the extracellular environment. There are 60 AAK1 AAK1 0.04 50 15 0.02 specific motifs that mediate and trigger endocytosis. However, the short length 40 10 0.00 30 AP2A1 AP2M1 of motifs means that it is easy to overlook those that may be important to 20 AP2A2 5 AP2A2 AP2A1 −0.02 10 AP2M1 AP2M1 AP2B1 biological functions. The goal of this study is to identify previously unrecognized AP2B1 0 −0.04 proteins that may be involved in endocytosis by analyzing the distribution of 0 1 2 3 4 5 6 3.0 3.5 4.0 4.5 5.0 5.5 6.0 3.96 3.98 4.00 4.02 4.04 NPF X[DE]XXXL[LI] YXX[LIMFV] motifs in the ordered and disordered regions of the human proteome. Using a 8 4.04 AAK1 0.04 bioinformatics approach, we systematically searched the entire human proteome 4.02 6 0.02 AP2M1 AAK1 AP2A1 for motifs known to be involved in endocytosis. We hypothesize that the 4.00 4 0.00 AP2S1 AAK1 AP2A2 AP2B1 proteins we find to be enriched with motifs in disordered regions may be 3.98 2 AP2A1 −0.02 AP2A2AP2M1 AP2B1 3.96 0 −0.04 functionally important for endocytosis. These proteins will be targeted for Region Disordered in Motif Count −0.04−0.02 0.00 0.02 0.04 0 1 2 3 4 5 6 4 6 8 10 12 experimental validation. Motif Count in Ordered Regions Figure 3: This plot shows motif distributions for the known the CME proteins (the five subunits of AP2 and the associated kinase, AAK1).

Distribution of select CME motifs across the human proteome

DX[FW] FXDX[FILMV] [ST]XXXX[LI] These plots represent heatmaps of proteins organized 31 30 1 2 3 3 by their distribution in the disordered and ordered 29 8 28 3 27 20 2 26 3 4 25 19 regions for the several endocytic motifs. Each rectangle 24 4 3 18 2 23 5 5 9 2 t 17 1 1 t 22 t (or hexagon) is colored based on the number of 21 5 3 16 20 15 10 22 2 4 2 Protein Count Protein Count 19 Frequency 14 4 1 18 8000 17 12 4 2 1 13 2 2000 proteins that have the respective number of motif 6000 16 20 51 3 10000 12 1500 15 11 1 4000 1000 14 23 19 5 instances in the ordered and disordered regions. The 13 5000 10 1 1 2000 3 2 2 500 12 52 122 11 2 1 1 9 11 64 25 3 3 1 8 1

10 Motif Coun Disordered 7 2 1 Motif Coun Disordered rectangles found above the green y=x line on each plot

Disordered Motif Coun Disordered 9 96 215 13 4 2 8 6 12 2 61 5 7 180 150 15 17 1 1 5 21 14 6 4 46 13 represent the proteins that are enriched for the motif 5 682 682 95 55 4 5 2 2 1 4 3 223 73 3 1828 520 164 46 16 6 2 1 1 2321 344 10 3 2 958 292 22 1 in the disordered regions. 2 5772 13453 839 1126 71 124 12 38 2 11 1 8920 3760 242 17 1 0 0 −1 0 1 2 3 0 1 2 0 1 2 3 4 5 6 7 8 9 10 Ordered Motif Count Ordered Motif Count Ordered Motif Count

NPF X[DE]XXXL[LI] YXX[LIMFV] NPXY 15 18 14 1 17 1 1 16 13 15 12 1 3 8 14 1 5 1 11 2 13 t t t t 10 12 Protein Count 9 2 Protein Count 11 1 Protein Count Protein Count 4 2 6000 10 1 1 750 8 7500 600 4000 9 1 4 2 15 500 7 1 5000 400 8 3 7 3 23 250 2000 2500 200 6 16 7 11 4 2 5 11 3 6 7 5 Disordered Motif Coun Disordered Motif Coun Disordered Motif Coun Disordered Motif Coun Disordered 2 25 4 56 7 5 29 28 9 4 104 104 12 3 168 31 5 1 1 753 13 3 471 262 65 7 2 2 725 145 10 1 1 927 12 2 1862 918 162 27 9 2 1 1 6667 1641 122 9 1 9056 7133 1239 221 48 18 5 4 3 0 0 0 1 0 1 2 3 0 1 2 3 4 5 6 7 8 0 1 Ordered Motif Count Ordered Motif Count Ordered Motif Count Ordered Motif Count

A subset of identified candidate CME proteins Conclusions

! The distribution of motifs in the disordered versus ordered regions are not always enriched in endocytic proteins. ! We identify dozens of proteins that are enriched for CME motifs in their disordered regions. These can be targeted for further experimental testing for involvement in the CME pathway. ! Many of these identified proteins are not experimentally characterized with functional motifs according to the Eukaryotic Linear Motif database (http://elm.eu.org/).

Bibliography

1. RCSB . (n.d.). 4UQI. Retrieved April 10, 2019, from https://www.rcsb.org/3d- view/4UQI/1 2. Smith SM, Baker M, Halebian M and Smith CJ (2017) Weak Molecular Interactions in Clathrin-Mediated Endocytosis. Front. Mol. Biosci. 4:72. doi: 10.3389/fmolb.2017.00072 3. Tidyverse. (n.d.). Retrieved from https://www.tidyverse.org/ 4. UniProt: A worldwide hub of protein knowledge. (2018). Nucleic Acids Research, 47(D1). doi:10.1093/nar/gky1049

All code and data is available from https://github.com/cbethell/motifs