STRUCTURE DETERMINATION AND BIOCHEMICAL CHARACTERIZATION OF NOVEL HUMAN UBIQUITIN-LIKE DOMAINS

by

Ryan Steven Doherty

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy

Graduate Department of Medical Biophysics
University of Toronto

© Copyright by Ryan Steven Doherty 2015

STRUCTURE DETERMINATION AND BIOCHEMICAL CHARACTERIZATION OF NOVEL HUMAN UBIQUITIN-LIKE DOMAINS

Ryan Steven Doherty

Doctor of Philosophy

Department of Medical Biophysics
University of Toronto

2015

Abstract

The ubiquitin fold acts as a signaling modulator associated with regulating, trafficking, and degrading proteins. The human genome encodes 398 ubiquitin-like domains (UBLs), of which a couple dozen may act as covalent modifiers. Ubiquitin and ubiquitin-like domains have been implicated in a number of malignancies, neuromuscular disorders, neurodegenerative disorders and other human illnesses. Identifying the structural effects of sequence variations between different ubiquitin-like homologues will provide insight into their varied functional pathways, since the role of ubiquitin-like modifiers is typically mediated by protein-protein interactions. Structure determination and analysis of ubiquitin-like homologues facilitate residue mapping and comparative analysis of protein-protein interaction sites, which provide insight into the many roles that ubiquitin-like homologues play in cellular processes. The aim of this thesis was to develop a framework through which complete structural coverage of all human ubiquitin-like domains could be achieved. To accomplish this, I defined the human ubiquitin-like fold family, identified ubiquitin-like domain constructs amenable to NMR structure determination, solved two structures (NFATc2IP & ubiquilin-1) and characterized associated binding partners, and created a data resource for human ubiquitin-like domains that enables clustering and associating protein structures with physicochemical features and cellular function. I also collaborated with the North-East Structural Genomics consortium (NESG) and the Structural Genomics Consortium (SGC), through which the molecular structures of 17 ubiquitin-like domains were determined using nuclear magnetic resonance (NMR) experiments and X-ray crystallography. Comparative analysis of structurally characterized ubiquitin-like folds revealed potential interaction partners with regions similar to known ubiquitin and SUMO interacting domains. Potential interaction partners for NFATc2IP and ubiquilin-1 were validated experimentally using NMR titration experiments. Comparative analysis of structural features of all ubiquitin-like homologues facilitates further studies into the mechanisms of the ubiquitylation system, predicted protein-protein interactions, and the identification of functional pathways associated with uncharacterized ubiquitin-like domains.


Acknowledgements

I would like to thank my supervisor Cheryl Arrowsmith for her ongoing support, advice and mentorship over the years. I also appreciate the guidance and knowledge shared by my supervisory committee members: Sirano Dhe-Paganon, Brian Raught, Jane McGlade, and Zhaolei Zhang. I would also like to recognize the efforts and support from members of the Arrowsmith lab, past and present, especially Adelinda Yee, Shili Duan, Scott Houliston, Sasha Lemak, Aleks Gutmanas, Christophe Fares, Yi Sheng, Lilia Kaustov, Bin Wu, Seth Chitayat, Sampath Srisailam, Murthy Karra, Jonathan Lukin, Natalie Nady, Jack Liao, Rob Laister, Melissa Ho, Tony Semesi, and Maite Garcia.

This thesis would not have been possible without collaborations. For this reason, I would like to thank Gaetano Montelione, John Everett, Mani Ravichandran, Yufeng Tong, Masoud Vedadi, David Yim and Raymond Hui for their time, resources, feedback and help in key aspects of this project.

I would also like to thank members of various University of Toronto communities who have encouraged, supported and worked alongside me throughout this endeavor: Medical Biophysics Graduate Student Association, 89 Chestnut Residence, Massey College, Massey Grand Rounds, and Impact Centre.

Finally, I thank my family and friends for their patience, love and understanding. It is to you that I dedicate this thesis.


Table of Contents

Abstract ii

Acknowledgements iv

Table of Contents v

List of Tables x

List of Figures xi

List of Appendices xiii

List of Abbreviations xiv

Chapter 1 - Introduction 1

1.1 Overview 1

1.2 Biological Significance of ubiquitin & ubiquitin-like modifiers 2

1.3 Protein Modification & ubiquitin 2

1.4 The ubiquitin Fold 3

1.5 Ubiquitin-like domains (UBLs) 4

1.6 Ubiquitin-like modifiers (UBM) 6

1.7 Ubiquitin-like structural domains 9

1.8 Ubiquitin Conjugation Cascade 9

1.9 Ubiquitin-binding domains & interactions 11

1.9.1 Ubiquitin Interacting Motif (UIM) 11

1.9.2 Coupling of Ubiquitin conjugation to Endoplasmic Reticulum Degradation (CUE) 12

1.9.3 Ubiquitin-Associated Domain (UBA) 12

1.9.4 Ubiquitin Conjugating Variant (UEV) 12

1.9.5 Npl4 Zinc Finger Motif (NZF) 13

1.9.6 GGA And Tom1 Domain (GAT) 13

1.9.7 Other Ubiquitin Binding Domains 13

1.9.8 SUMO Interacting Motif (SIM) 13

1.9.9 Diversity among Ubiquitin-Binding Domains 14

1.10 Thesis Overview 15

1.10.1 Identify and obtain near-complete structural coverage of all human ubiquitin-like domains. 15

1.10.2 Exploring NFATc2IP:NFATc2 & ubiquilin-1:PIN1 protein-protein interactions 15

Chapter 2 - The Ubiquitin Fold: Leveraging structural genomics 17

2.1 Summary 18

2.2 Introduction 18

2.3 Methods 21

2.3.1 Identifying human ubiquitin-like domains 21

2.3.2 Validating putative human ubiquitin-like domains 22

2.3.3 Target selection 24

2.3.4 Construct design 25

2.3.5 Sample preparation 26

2.3.6 1H15N-HSQC screening of ubiquitin-like domains 26

2.4 Results & Discussion 27

2.4.1 Identifying unannotated human ubiquitin-like domains 27

2.4.2 Small-Scale Screening 29

2.4.3 Screening by 1H15N-HSQC 29

2.4.4 Structural Coverage - Completing the UBL Phylogenetic Tree 31

2.5 Conclusion 36

Chapter 3 - Solution NMR structure determination of human ubiquitin-like domains in NFATc2IP & ubiquilin-1 37

3.1 Introduction 38

3.1.1 NFATc2IP 38

3.1.2 Ubiquilin-1 39

3.1.3 Ubiquitin-like Fold 39

3.2 Experimental Procedures 40

3.2.1 NFATc2IP UBL domain NMR structure determination 40

3.2.2 Ubiquilin-1 UBL domain NMR structure determination 41

3.2.3 Comparative analysis of ubiquilin-1, NFATc2IP, ubiquitin & SUMO2 44

3.2.4 Protein-protein interaction partner identification 46

3.2.5 Binding interface analysis 46

3.3 Results & Discussion 47

3.3.1 Structure determination 47

3.3.2 Comparative analysis of ubiquilin-1, NFATc2IP & similar ubiquitin-like modifiers 52

3.3.2.1 Similar canonical ubiquitin-like modifiers: ubiquitin & SUMO-2 53

3.3.2.2 Structural comparison between ubiquilin-1 & NFATc2IP 53

3.3.2.3 Structural comparison between ubiquilin-1 & ubiquitin 55

3.3.2.4 Structural comparison between NFATc2IP & SUMO2 57

3.3.2.5 Structural differences between NFATc2IP_2nd & SUMO2 58

3.3.3 From Structure to Function: Exploring Protein-Protein Interactions involving ubiquitin-like domains 59

3.3.3.1 The ubiquitin-Interacting Motif interaction interface 59

3.3.3.2 Putative UIM Interaction Interface: Conserved Amino Acids 62

3.3.3.3 Putative UIM Interaction Interface: Similar Electrostatic Potential Distribution 63

3.3.3.4 Surveying Known UIM-Binding Partners 64

3.3.3.5 PIN1 – Peptidyl-Prolyl cis/trans Isomerase 67

3.3.3.6 Identifying a putative UIM in PIN1 67

3.3.3.7 Ubiquilin-1 & PIN1 NMR Titration 68


3.3.3.8 Analysis of the ubiquilin-1 & PIN1 interface 70

3.3.4 Binding-Partner Driven - Structural analysis of the SUMO-Interacting Motif binding interface 71

3.3.4.1 NFATc2IP Binding Partners 71

3.3.5 SUMO-Interacting Motif 72

3.3.5.1 Identifying putative SIMs in NFATc2 72

3.3.6 NFATc2IP:NFATc2 NMR titration 74

3.3.6.1 Analysis of the NFATc2IP:NFATc2 interface 75

3.4 Conclusion 77

Chapter 4 - Exploring UBLs & UBL-Interaction Motifs: Computational & Experimental analysis of ubiquilin, NFATc2IP, UIMs and SIMs. 78

4.1 Introduction 79

4.1.1 Database & comparative analysis 79

4.1.1.1 Similarities & differences between model family members 80

4.1.1.2 Common defining features for each modelling family 80

4.2 Experimental Procedures 81

4.2.1 UBL Database Development 81

4.2.2 Relating 17 structurally determined UBLs to nearest neighbours and model families 82

4.2.3 Secondary structure prediction & analysis 83

4.2.4 Relating structural features to functional pathways 83

4.3 Results 84

4.3.1 Structurally characterized ubiquitin-like domains 84

4.3.2 Nearest-neighbours of ubiquitin-like domains 85

4.3.3 Nearest-neighbours of structurally characterized UBMs 86

4.3.4 Grouping UBLs based on biological processes and molecular function 89

4.3.5 Grouping UBLs based on medical significance 91

4.3.5.1 Cellular localization 92

4.3.6 Grouping UBLs based on cell localization 93

4.4 Conclusion 95

Chapter 5 - Conclusion and Future Directions 96

5.1 Conclusions 96

5.2 Future Directions 97

5.2.1 Ubiquitin-like domain fold, NFATc2IP & ubiquilins 97

5.2.2 Ubiquitin-like domain structural genomics 98

5.2.3 family analyses 98

5.3 Concluding remarks 98

Chapter 6 - References 99


List of Tables

Table 1.1: List of 18 annotated ubiquitin-like modifiers, and associated enzymatic complement, substrates and functional pathways. 8

Table 1.2: Protein-protein interaction modes structurally characterized with experimentally determined binding affinities between UBLs and binding partners. 14

Table 2.1: Summary of small-scale expression screening of human ubiquitin-like domains structurally characterized and deposited in the PDB as part of this thesis. 29

Table 2.2: Summary of 1H15N-HSQC screening results for human ubiquitin-like domains. 10 ubiquitin-like domains were solved by NMR (red), and 7 ubiquitin-like domains were solved by X-ray crystallography (blue). 30

Table 2.3: All human ubiquitin-like domains that remain to be structurally determined, along with their most similar protein structure and biological significance. 33

Table 3.1: NMR data and refinement statistics. 48

Table 3.2: Secondary structure elements of NFATc2IP, ubiquilin-1, ubiquitin and SUMO1/2/3. 50

Table 3.3: Sequence similarity & identity between NFATc2IP, ubiquilin-1, ubiquitin and SUMO1/2/3/4. 53

Table 3.4: UIM:ubiquitin complexes deposited in the PDB, along with UIM sequence. 59

Table 3.5: Human proteins that contain at least one canonical UIM motif and observed to interact with ubiquitin, along with the number of supporting publications and supporting structural complexes that have been deposited in the PDB. 64

Table 3.6: Human proteins that contain at least one canonical UIM motif and observed to interact with members of the ubiquilin family (Turner et al., 2010). 64

Table 3.7: 17 human proteins that interact with both human ubiquitin and a member of the ubiquilin family, and that also contain at least one UIM motif. 65

Table 3.8: UIM motif and 4 variations of the UIM motif were used to identify 17 human proteins that interact with both human ubiquitin and a member of the ubiquilin family. 66

Table 4.1: Data sources for ubiquitin-like domain repository. 82

Table 4.2: Biological significance and functional annotation for each of the 17 ubiquitin-like domains structurally characterized for this project. 90

Table 4.3: Tissue and cell localization for each of the 17 UBL structurally characterized for this project. 92

Table 4.4: Structural alignment of lysines within ubiquitin and ubiquitin-like domains characterized within both cytoplasm and ER; nucleus, cytoplasm and ER; and only nucleus. 94

List of Figures

Figure 1.1: Ribbon & molecular surface representations of the ubiquitin. 3

Figure 1.2a: Phylogenetic tree of known ubiquitin-like domains in 2006. 4

Figure 1.2b: Phylogenetic tree of known ubiquitin-like domains in 2015. 5

Figure 1.3: Ubiquitin-like modifier conjugation cascade. 10

Figure 1.4: Ubiquitin conjugation cascade. 10

Figure 2.1: Novel UBL discovery process. 21

Figure 2.2: Secondary & tertiary structures of Human ubiquilin-1. 22

Figure 2.3: Pseudo-multiple sequence alignment of human ubiquilin-1. 23

Figure 2.4: UBL target selection, preparation and screening process. 24

Figure 2.5: Pseudo-multiple sequence alignment of ubiquilin-1 for construct design. 25

Figure 2.6: Distribution of structurally characterized and uncharacterized UBLs. 28

Figure 2.7: Examples of 1H15N-HSQC screening results for human UBLs. 30

Figure 2.8: Clustering of human UBLs into groups based on sequence similarity. 31

Figure 3.1: Secondary structure and H-bond patterns of ubiquilin-1. 49

Figure 3.2: Secondary structure and H-bond patterns of NFATc2IP. 49

Figure 3.3: Ribbon diagrams of ubiquilin-1, NFATc2IP, ubiquitin, SUMO1, SUMO2 & SUMO3. 50

Figure 3.4: Molecular surfaces of ubiquilin-1. 51

Figure 3.5: Molecular surfaces of NFATc2IP. 51

Figure 3.6: UIM-interaction interface of ubiquilin-1 and NFATc2IP. 52

Figure 3.7: Similarities between ubiquilin-1 and NFATc2IP. 54

Figure 3.8: Similarities between ubiquilin-1 and ubiquitin. 56

Figure 3.9: Similarities between NFATc2IP and SUMO2. 58

Figure 3.10: UIM α-helices from PSMD4, VPS27 and HGS. 60

Figure 3.11: Ubiquitin:PSMD4(UIM) complex. 61

Figure 3.12: UBL residues within UIM-interaction interface. 62

Figure 3.13: Multiple sequence alignment of UBLs from ubiquilin family members. 62

Figure 3.14: Similarity tree based on electrostatic potential within 4 Å of UIM-binding interface. 63

Figure 3.15: Sequence alignment of UIMs within PSMD4, DNJB2, EPN1 and PIN1. 65

Figure 3.16: Putative human PIN1 UIM. 67

Figure 3.17: Ubiquilin-1:PIN1 NMR titration. 69

Figure 3.18: Putative ubiquilin-1:PIN1 interaction. 70

Figure 3.19: NFATc2 SUMO Interacting Motifs. 73

Figure 3.20: Diversity of SIM motifs. 73

Figure 3.21: NFATc2IP:NFATc2 NMR titration. 74

Figure 3.22: Electrostatic potential of NFATc2IP & SUMO2. 75

Figure 3.23: Electrostatic potential diversity between similar UBLs. 76

Figure 4.1: Database schema of ubiquitin-like domain repository. 81

Figure 4.2: Secondary & tertiary structures of 17 structurally characterized UBLs. 84

Figure 4.3: Nearest-neighbour clustering of UBLs displayed with proportional transformed branches. 85

Figure 4.4: UBLs with a structural fold similar to FUBI-1. 86

Figure 4.5: UBLs with a structural fold similar to the second UBL of ISG15. 87

Figure 4.6: UBLs with a structural fold similar to SF3A1. 88

Figure 4.7: Distribution of human UBLs based on cellular localization. 93


List of Appendices

Appendix I: All human genes that encode at least one ubiquitin-like domain. 113

Appendix II: All human genes and isoforms that encode ubiquitin-like domains. 119

Appendix III: 205 proteins observed to interact with both ubiquitin and at least one member of the ubiquilin family. 131

Appendix IV: 127 putative UIM sequences within 106 proteins that interact with both ubiquitin and at least one member of the ubiquilin family. 133

Appendix V: Six similarity trees of ubiquitin-like domains clustered based on electrostatic potential at varying distances (1 Å to 6 Å) from the UIM-binding interface, along with groups of ubiquitin-like domains that share strong electrostatic potential similarity at that specific range. 137

List of Abbreviations

AESOP Analysis of electrostatic similarities of proteins

CUE Coupling of ubiquitin conjugation to endoplasmic reticulum degradation

DUB De-ubiquitylating enzyme

DUIM Double-sided ubiquitin interacting motif

E1 Ubiquitin activating enzyme

E2 Ubiquitin conjugating enzyme

E3 Ubiquitin protein ligase

GAT GGA and Tom1 domain

GLUE GRAM-like ubiquitin binding in Eap45

IPTG Isopropyl-1-thio-D-galactopyranoside

MIU Motif interacting with ubiquitin

NESG North-east structural genomics consortium

NFAT Nuclear factor of activated T-cells

NMR Nuclear magnetic resonance

NZF Npl4 Zinc Finger Motif

PAZ Polyubiquitin associated zinc finger

PE Phosphatidylethanolamine

PIN1 Peptidyl-prolyl cis/trans isomerase

PSSM Position-specific scoring matrix

SGC Structural genomics consortium

SIM SUMO interacting motif

UBA Ubiquitin-associated domain

UBD Ubiquitin-binding domain

UBL Ubiquitin-like domain

UBM Ubiquitin-like modifier


UEV Ubiquitin conjugating enzyme variant

UIM Ubiquitin interacting motif

VHS Vps27, Hrs, STAM


Chapter 1 Introduction

1.1 Overview

Ubiquitin, the original member of the ubiquitin-fold superfamily, is a highly conserved 76-residue regulatory protein found in all eukaryotic cells. It was initially characterized as a post-translational modification moiety that mediates ATP-dependent proteolytic degradation, yet has since been recognized as a signaling modulator with multiple regulatory roles mediated by transient protein-protein interactions. My research focuses on the similarities and variations between human ubiquitin-like domains, and their influence on protein-protein interactions. My goal is to define the family of ubiquitin-like domains in the human proteome, to understand the extent of amino acid diversity within the protein-protein interaction interfaces of ubiquitin-like domains, and to gain insight into their functional pathways. The first chapter provides an introduction to ubiquitin and ubiquitin-like domains, as well as a rationale for the aims of this thesis. Chapter Two discusses structural genomics approaches that were implemented to facilitate the experimental screening and structure determination of 17 human ubiquitin-like domains for this project. Chapter Three describes the structure determination of the second ubiquitin-like domain of NFATc2IP and the ubiquitin-like domain of ubiquilin-1, and introduces approaches for predicting functional activity by combining their structural data with information about other ubiquitin-like domains. This chapter also examines protein-protein interactions that were predicted between NFATc2IP and NFATc2 through a predicted SIM-like interaction, as well as interactions between ubiquilin-1 and PIN1 through a predicted UIM-like interaction. Chapter Four combines additional analyses with the lessons learned from Chapters Two and Three to facilitate analyses and predictions related to the set of human ubiquitin-like domains associated with the 17 ubiquitin-like domains that were structurally characterized as part of this thesis. The final chapter of the dissertation discusses the significance of these findings, relating observations to the entire human ubiquitin-like domain superfamily, in addition to providing future directions and concluding remarks.

1.2 Biological significance of ubiquitin & ubiquitin-like modifiers

Conjugation of ubiquitin and ubiquitin-like modifiers is necessary for the regulation and translocation of proteins. Ubiquitin conjugation, also referred to as ubiquitylation, has been implicated in regulating cellular processes such as protein degradation, cell cycle control, regulation, DNA damage repair, antigen processing, activation of transcriptional factors and kinases, endocytosis, protein sorting, membrane trafficking, and stress response (Haglund et al., 2005). Ubiquitylation is also involved in biological functions such as inflammation, , and silencing the inactive X chromosome in female mammals (de Napoles et al., 2004). The disruption of ubiquitin conjugation pathways has been associated with various human illnesses, including neurodegenerative disorders, developmental abnormalities, autoimmune diseases, neuromuscular disorders and malignancies (Ciechanover et al., 2004). UBMs are also involved in a variety of biological processes, including pathogenesis of viruses and bacteria. Some UBMs protect against viruses, while some viruses depend on UBMs for survival, and some bacterial effectors target the ubiquitylation machinery (Angot et al., 2007).

1.3 Protein modification & ubiquitin

In 1975, ubiquitin was discovered and initially identified as a tag for targeted proteasomal degradation (Schlesinger et al., 1975). Proteins are targeted for proteasomal degradation through a process referred to as ubiquitylation, which involves covalent modification of a surface-exposed lysine by ubiquitin. Ubiquitin is a highly conserved 76-residue protein found only in eukaryotic cells.

Within humans, there are four genes that encode ubiquitin, in two distinct classes: poly-Ub genes that encode a precursor protein with tandemly repeated ubiquitin domains (i.e. UBB and UBC), and genes that encode fusion precursor proteins in which a single ubiquitin domain is linked to a ribosomal protein (i.e. RPS27a and UBA52). The ubiquitin regions of all four genes are entirely conserved, suggesting that mutations are negatively selected. The covalent association of ubiquitin with ribosomal proteins has been suggested to promote their association with ribosomes (Finley et al., 1989). This is an interesting attribute, since the putative UBM FAU is also fused to a ribosomal protein and the gene structure could relate to the functional activity of the protein.

1.4 The ubiquitin fold

Figure 1.1: Ribbon & molecular surface representations of ubiquitin. The secondary structure elements and molecular surface of the ubiquitin fold are displayed from two orientations, with conserved lysine residues shown in cyan ball-and-stick representation.

The ubiquitin fold consists of a 5-strand mixed β-sheet that is intercalated by a 2-helix α-helical core (Figure 1.1). There are 5 key structural features of ubiquitin that are associated with its biological activity: the C-terminal -RLRGG peptide, 7 lysine residues that could be involved in poly-ubiquitin chain formation (Komander et al., 2009), a conserved leucine 8 / isoleucine 44 / valine 70 triad involved in E1 and ubiquitin-binding domain interactions, histidine 68 involved in E1-ubiquitin thioester formation, and protein-protein interaction interfaces associated with interactions with ubiquitin-binding domains that regulate a variety of downstream molecular pathways. These structural features were used when performing comparative analyses of UBLs.
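As a minimal sketch, the residue-level features listed above can be located in a ubiquitin sequence with a few lines of Python. The sequence below is the commonly cited 76-residue human ubiquitin sequence and is an assumption here, since it is not quoted in this thesis; it should be checked against a reference database entry before use. The function names are illustrative only.

```python
# Assumed input: the commonly cited 76-residue human ubiquitin sequence
# (one-letter code). Not taken from the thesis text; verify before reuse.
UBIQUITIN = (
    "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQ"
    "QRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG"
)


def residue(seq, position):
    """Return the residue at a 1-based sequence position."""
    return seq[position - 1]


def lysine_positions(seq):
    """Return the 1-based positions of every lysine (K) in the sequence."""
    return [i for i, aa in enumerate(seq, start=1) if aa == "K"]


def summarize_features(seq):
    """Collect the sequence-level features named in Section 1.4."""
    return {
        "length": len(seq),
        "c_terminal_tail": seq[-5:],  # expected to be RLRGG in ubiquitin
        "lysines": lysine_positions(seq),
        "hydrophobic_triad": "".join(residue(seq, p) for p in (8, 44, 70)),
        "residue_68": residue(seq, 68),
    }


if __name__ == "__main__":
    for name, value in summarize_features(UBIQUITIN).items():
        print(f"{name}: {value}")
```

The same function can be applied to any aligned ubiquitin-like domain sequence, although the fixed positions (8, 44, 68, 70) are only meaningful after the sequence has been mapped onto ubiquitin numbering.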

1.5 Ubiquitin-like domains (UBLs)

Figure 1.2a: Phylogenetic tree of known ubiquitin-like domains in 2006. There were 78 protein domains classified as human ubiquitin-like domains in 2006, of which 18 were known ubiquitin-like modifiers (blue) and 7 domains were putative ubiquitin-like modifiers based on sequence features (orange). Ubiquilin-1 & NFATc2IP are highlighted with red arrows because they play a significant role in this dissertation.

Figure 1.2b: Phylogenetic tree of known ubiquitin-like domains in 2015. There are 448 human ubiquitin-like domains within human proteins identified through bioinformatics techniques described in this thesis; 18 of the domains are known ubiquitin-like modifiers [ATG8, FAU_1-1, ISG15_1-2, NEDD8_1-1, SUMO1_1-1, SUMO1_2-1, SUMO2_1-1, SUMO2_2-1, SUMO3_1-1, UBB_1-1/UBC_1-1/RPS27A_1-1/UBA52_1-1, URM1_1-1, UBD_1-2 (aka FAT10), and UFM1_1-1], and 22 domains are putative ubiquitin-like modifiers based on sequence features [HERPUD2_1-1, PARK2_1-1/PARK2_2-1/PARK2_5-1, PARK2_2-2, PIK3CA_1-2, PTPN3_1-2, PTPN13_3-6/PTPN13_4-7, SF3A1_1-1, SHARPIN_1-1/SHARPIN_2-1/SHARPIN_3-1, TMUB2_1-1/TMUB2_2-2, SHROOM1_1-1/SHROOM1_2-1, USP40_3-1, USP5_1-1, VCPIP1_1-2, WDR48_1-1, and WDR48_5-1].

Within the human genome, there are 220 genes that encode 448 protein domains that share the same structural fold as ubiquitin (Figure 1.2b); at the start of this project in 2006, there were 78 known human ubiquitin-like domains, of which 18 were known ubiquitin-like modifiers and 7 were putative ubiquitin-like modifiers (Figure 1.2a). Even with the same structural fold, they have different binding partners and diverse biological functions in the host organism, as well as in viral and bacterial pathogens. Sixteen of these UBLs have been characterized as UBMs, which can become conjugated to target proteins (Table 1.1). An additional 22 putative UBMs are predicted to become conjugated to target proteins due to the presence of a characteristic C-terminal double-glycine tail, but lack evidence of conjugate formation. The remaining 410 UBLs contain a ubiquitin-like fold along with other structural domains, and can modulate the ubiquitylation pathway in some cases by competing with UBMs when interacting with proteins that contain ubiquitin-binding domains (Hochstrasser et al., 2009).

1.6 Ubiquitin-like modifiers (UBM)

Until the 1990s, ubiquitin was thought to be the only post-translational modification that involved the covalent linkage of a protein modifier. This changed when ISG15/UCRP was discovered to be conjugated through a similar mechanism and became the first UBM studied in vitro (Loeb & Haas, 1992). Most UBMs become conjugated to surface-exposed lysines of target proteins through an analogous but distinct enzymatic cascade. Many UBMs are associated with essential cellular processes, yet the amount of functional information about them remains limited.

Of the UBMs that have been functionally characterized, SUMO targets lysines within conserved motifs (i.e. ФKXE, the phosphorylation-dependent sumoylation motif & the negatively charged amino acid-dependent sumoylation motif) (Yang et al., 2006), and is involved in transcriptional regulation and genome surveillance (Müller et al., 2004). NEDD8 modification is involved in cell cycle control and in embryogenesis by up-regulating the activities of cullin-based E3 ligases (Pan et al., 2004). Covalent attachment of Atg12 to Atg5 is essential for autophagy (Mizushima et al., 1998). Apg8, MAP1LC3A, MAP1LC3B, MAP1LC3C, GABARAP, GABARAPL1, and GABARAPL2 are involved in lipidation through a ubiquitylation-like system (Ichimura et al., 2000). UBL5 is a unique member of the UBMs, since it contains a C-terminal double-tyrosine motif instead of the characteristic double-glycine. The structure of UBL5 was solved by NMR, and the overall fold was similar to ubiquitin, even though they share only 17.5% sequence identity (McNally et al., 2003). However, experimental evidence remains necessary to determine whether UBL5 conjugation occurs.
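To illustrate how a short consensus such as the ФKXE sumoylation motif mentioned above can be located in a protein sequence, the following is a minimal sketch using a regular expression. The set of residues accepted at the hydrophobic (Ф) position is an assumption made for this example, as the thesis does not list it, and the test sequence is hypothetical rather than taken from any real protein.

```python
import re

# Assumption: residues accepted at the hydrophobic (first) motif position.
# The thesis does not define this set; adjust it to the definition in use.
HYDROPHOBIC = "VILMF"

# A lookahead is used so that overlapping motif instances are all reported.
CONSENSUS = re.compile(rf"(?=([{HYDROPHOBIC}]K.E))")


def find_consensus_sites(seq):
    """Return (lysine_position, motif) pairs; positions are 1-based."""
    # m.start() is the 0-based index of the hydrophobic residue, so the
    # lysine that follows it sits at 1-based position m.start() + 2.
    return [(m.start() + 2, m.group(1)) for m in CONSENSUS.finditer(seq)]


if __name__ == "__main__":
    toy_sequence = "MSDLKTEAAVKQEG"  # hypothetical sequence for illustration
    for position, motif in find_consensus_sites(toy_sequence):
        print(f"candidate lysine at {position} within {motif}")
```

A match only flags a candidate lysine. The phosphorylation-dependent and negatively charged amino acid-dependent variants extend this core with additional downstream residues and would need their own patterns.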


Table 1.1: List of 18 annotated ubiquitin-like modifiers and associated enzymatic complement, substrates and functional pathways.

Yeast % Mono Ubiquitin-like USP / Functional Homologu Seq C-term E1 E2 E3 / Substrate Modifier DUB annotation e ID Poly Ube1 Many, dependent Ubiquitin Ubiquitin 100% Yes / >37 >600 ~80 M & P Thousands on linkages Uba6 RBX1/RBX and 2, related UBA3 SMURF1, proteins (Parc - Ubc12, CBL, Alter interactions, Nedd8 Rub1 58% Yes SENP8 M & P and Cul7), APPB Ube2F , conformation p53, p73, P1 MDMX, Mdm2, pVHL, SCF, BCA3, EGFR TRIM40 TCRα-like MNSFβ Immuno- 36% Yes protein, Bcl-G, (Fub1, Fau) regulatory role Endophilin II Antiviral ISG15 Ube1 UbcH8, Viral and host 28/37 Yes Herc5 UBP43 M immunity, IFN- (UCRP) L UbcH6 proteins inducible Ub-independent proteasomal FAT10 27/36 No Uba6 Use2 Use2 degradation, immunoregulatory role Erythroid and UfSP1 UFM1 23 Yes Uba5 Ufc1 Ufl1 C20orf116 megakaryocyte UfSP2 development SAE1 Alter interactions, SUMO1 Smt3 14 Yes - Ubc9 ~15 SENP1-2 M Hundreds localization, SAE2 conformation SAE1 Alter interactions, SENP1-3, SUMO2 13 Yes - Ubc9 ~15 M & P Hundreds localization, 5-7 SAE2 conformation SAE1 Alter interactions, SENP1-3, SUMO3 13 Yes - Ubc9 ~15 M & P Hundreds localization, 5-7 SAE2 conformation NFκB signaling, SUMO4 12 IκBα pseudogene or not processed Autophagy, Atg12 Atg12 12 No Atg7 Atg10 M Atg5, Atg3 mitochondrial

MOCS3, tRNA thiolation ATPBD3, MOC and oxidant- Urm1 Urm1 17 No M UPF0432, S3 induced protein CAS, USP15, modification yeast: Ahp1

Autophagosome Phosphatidylet Atg12/5 biogenesis: MAP1LC3A Atg8 9 Yes Atg7 Atg3 Atg4A-D M hanolamine /16L tethering and (PE) fusion Phosphatidylet Atg12/5 biogenesis: MAP1LC3B Atg8 13 Yes Atg7 Atg3 Atg4A-D M hanolamine /16L tethering and (PE) fusion Autophagosome Phosphatidylet Atg12/5 biogenesis: MAP1LC3C Atg8 10 Yes Atg7 Atg3 Atg4A-D M hanolamine /16L tethering and (PE) fusion Selective Phosphatidylet autophagy via Atg12/5 GABARAP Atg8 8 Yes Atg7 Atg3 Atg4A-D M hanolamine interaction with /16L (PE) autophagy receptors Functional GABARAPL1 / Phosphatidylet Atg12/5 difference Atg8L / Atg8 12 Yes Atg7 Atg3 Atg4A-D M hanolamine /16L between isoforms GEC1 (PE) is unclear Functional GABARAPL2 / Phosphatidylet Atg12/5/16 difference GATE-16 / Atg8 14 Yes Atg7 Atg3 Atg4A-D M hanolamine L between isoforms GEF2 (PE) is unclear

1.7 Ubiquitin-like structural domains (UBL)

The human genome contains 220 genes that encode proteins with at least one ubiquitin-like domain, of which 38 can be classified as known or potential UBMs. The remaining non-modifying UBLs could act as permanent structural features that facilitate protein targeting interactions to regulate a variety of cellular activities that include transcription, translation, nuclear transport, , autophagy, antiviral pathways, and processes associated with poly-ubiquitylation, such as endocytosis, membrane-protein trafficking, cell signaling and DNA repair (Grabbe & Dikic, 2009). There is no known generalizable function for the UBL fold, aside from mediating protein-protein interactions and the role of the small set of UBMs.

1.8 Ubiquitin Conjugation Cascade

Ubiquitin and UBMs are conjugated to their target substrate through a series of enzymatic reactions that result in conjugation of the C-terminus of the ubiquitin-like fold to the ε-amino group of a surface-exposed lysine within the target substrate. The enzymes involved in this cascade consist of an E1, an E2, and an E3 (Figure 1.3 & Figure 1.4). Computational analyses have determined that there are 16 human E1s, 53 human E2s, 527 human E3s, and 184 human DUBs (Xu & Peng, 2006; Semple, 2003).

The activating enzyme (E1) activates ubiquitin by catalyzing the ATP-dependent formation of a thioester bond between a free thiol of the catalytic Cys and the C-terminal glycine of ubiquitin, which facilitates the transfer of the C-terminal glycine to a surface-exposed Cys on a conjugating enzyme (E2) (Figure 1.4). Next, either the C-terminal glycine of ubiquitin is transferred to a Cys of a protein ligase (E3), or a covalent bond is formed between the C-terminal glycine and an ε-amino group of a surface-exposed lysine within the target protein. There are also some rare cases where the N-terminal amino group, a cysteine residue, a threonine residue, or a serine residue within a target protein acts as the ubiquitylation site (Wang et al., 2007).

Figure 1.3: Ubiquitin-like modifier conjugation cascade. Enzymes in the ubiquitin conjugation cascade consist of E1, E2s, and in some cases E3s that are uniquely associated with specific UBMs (Hochstrasser M, 2000).

Figure 1.4: Ubiquitin conjugation cascade. The enzymatic cascade that mediates ubiquitin conjugation is similar for all UBMs. It involves ATP, ubiquitin activating enzymes (E1), ubiquitin conjugating enzymes (E2), and ubiquitin ligases (E3), and results in the conjugation of the UBM to a surface exposed lysine on the target protein. Conjugation is a dynamic process, and de-ubiquitylating enzymes (DUBs) can release the UBM from the target protein.

1.9 Ubiquitin-binding domains & interactions

Ubiquitin-binding proteins are key players in modulating the downstream activity of UBM conjugation. Ubiquitin-binding proteins contain regions of 20 to 150 residues that non-covalently interact with members of the ubiquitin-fold superfamily. Some ubiquitin-binding regions are independent domains (e.g. UBA, VHS, CUE), and other ubiquitin-binding regions consist of individual secondary structure elements (e.g. UIM and SIM). Ubiquitin-binding domains (UBDs) were first identified as interaction partners of ubiquitin, but several UBD family members do not interact with ubiquitin. The specificity of such ubiquitin-binding domain proteins could favour other UBLs.

Many UBDs have been observed in the enzymatic components of the UBM cascade, as well as in proteins that are involved in the downstream translocation or functional effect of protein conjugation. Due to the transient nature of these interactions, binding is on the moderate to low affinity scale: a Kd of ~460 µM for GRAM-like ubiquitin binding in Eap45 (GLUE)-monoubiquitin, compared to an apparent Kd of ~0.03-9 µM for UBA-polyubiquitin (Haglund et al., 2005). The interaction itself appears to be controlled by post-translational modification of the UBD-containing protein, accessibility of the ubiquitin-binding interface and accessibility of the UBD-binding interface. A relevant example of UBD modulation involves RAD23, which shuttles conjugated proteins to the proteasome. The RAD23-ubiquitin interaction is inhibited by the association of its UBD with its UBL (Chen et al., 2001). Whether the role of the UBL is to regulate UBD-ubiquitin or UBD-UBM interactions has been explored through the course of this thesis.
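
To give a feel for what these dissociation constants mean in practice, the following minimal sketch computes the fraction of a UBD occupied at a given ubiquitin concentration. It assumes a simple 1:1 binding model with ligand in excess, which is an illustrative simplification and not a model taken from the cited studies; the function name and the 50 µM ligand concentration are hypothetical.

```python
def fraction_bound(ligand_uM: float, kd_uM: float) -> float:
    """Fraction of binding sites occupied for simple 1:1 binding.

    Assumes free ligand is approximately equal to total ligand
    (ligand in large excess over the binding domain).
    """
    if ligand_uM < 0 or kd_uM <= 0:
        raise ValueError("ligand must be >= 0 and Kd must be > 0")
    return ligand_uM / (kd_uM + ligand_uM)


# Kd values quoted in the text (µM); 50 µM ubiquitin is an arbitrary choice.
examples = [
    ("GLUE-monoubiquitin", 460.0),
    ("UBA-polyubiquitin, weak end of range", 9.0),
    ("UBA-polyubiquitin, tight end of range", 0.03),
]
for label, kd in examples:
    print(f"{label}: fraction bound = {fraction_bound(50.0, kd):.3f}")
```

Under this model, occupancy is exactly one half when the ligand concentration equals the Kd, so a domain with a Kd in the hundreds of micromolar is mostly unoccupied unless ubiquitin is locally concentrated.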

1.9.1 Ubiquitin Interacting Motif (UIM), Motif Interacting with Ubiquitin (MIU) & Double-sided Ubiquitin Interacting Motif (DUIM)

The ubiquitin interacting motif (UIM) is the ubiquitin-interacting region of the S5A/RPN10 proteasomal subunit (Young et al., 1998). This UIM is a short ~20 aa α-helical segment of a protein. Through sequence analysis, putative human UIMs were identified and some of these peptides were selected as putative UIM binding partners for ubiquitin and ubiquilin-1. Two additional ubiquitin-interacting motifs are similar to the UIM: MIUs, which bind in a manner almost identical to the UIM:Ub interaction but in the opposite orientation, and DUIMs, which consist of two tandem UIMs.

1.9.2 Coupling of Ubiquitin conjugation to Endoplasmic Reticulum Degradation (CUE)

The coupling of ubiquitin conjugation to endoplasmic reticulum degradation (CUE) domain was discovered through yeast two-hybrid screening by two independent groups (Shih et al., 2003; Donaldson et al., 2003), and structural analyses have resulted in 7 structures (e.g. CUE2 [PDB_ID: 1OTR] & VPS9 [PDB_ID: 1P3Q]). The CUE domain consists of a three-helix bundle, from which residues on two α-helices interact with ubiquitin.

1.9.3 Ubiquitin-Associated Domain (UBA)

The ubiquitin-associated domain (UBA) was identified through bioinformatics analyses of enzymes involved in ubiquitylation or deubiquitylation (Hofmann et al., 1996). UBA domains interact with both monoubiquitylated and polyubiquitylated proteins, and structural analyses have resulted in 45 structures (e.g. Dsk2p [PDB_ID: 1WR1] & ubiquilin 3 [PDB_ID: 2DAH]). The UBA domain is similar to the CUE domain in that it consists of a three-helix bundle, from which residues on two α-helices interact with ubiquitin.

1.9.4 Ubiquitin Conjugating Enzyme Variant (UEV)

The ubiquitin conjugating enzyme variant (UEV) proteins are homologous to E2s, but are inactive because they lack the catalytic Cys. Even though they are catalytically inactive, they are able to interact with ubiquitin through their conserved ubiquitin-binding interface (Koonin et al., 1997). Structural analyses of UEV have resulted in 12 structures (e.g. TSG101 [PDB_ID: 1S1Q] & VPS23 [PDB_ID: 1UZX]).

1.9.5 Npl4 Zinc Finger Motif (NZF)

The Npl4 zinc finger (NZF) motif is a zinc finger ubiquitin-binding motif (Meyer et al., 2002; Wang et al., 2003). Structural analyses of NZF have resulted in 3 structures [PDB_ID: 1Q5W, 1NJ3, 2PJH]. The NZF motif binds to ubiquitin through three residues that are located on loops coordinated by strands ordered by the zinc ion.

1.9.6 GGA And Tom1 Domain (GAT)

The GGA and Tom1 (GAT) domain was discovered by two-hybrid screens (Shiba et al., 2004), and structural analyses have resulted in 5 structures [PDB_ID: 1YD8, 1WR6, 1WRD, 2C7M, and 2C7N]. The GAT domain is similar to both the CUE and the UBA domains in that it consists of a three-helix bundle, from which residues on two α-helices interact with ubiquitin. However, the orientation of the helices differs, such that the two α-helices are parallel in GAT and anti-parallel in both CUE and UBA.

1.9.7 Other Ubiquitin Binding Domains

The structure of the GRAM-like ubiquitin binding in Eap45 (GLUE) domain has been determined 4 times (Teo et al., 2006), and the structure of the Vps27, Hrs, STAM (VHS) domain has been determined 12 times (Hoffman et al., 2001). The polyubiquitin associated zinc finger (PAZ) domain was discovered by two-hybrid screens, and was further characterized biochemically (Hook et al., 2002).

1.9.8 SUMO Interacting Motif (SIM)

Binding partners and modes have been identified for some ubiquitin-like modifiers, such as the SUMO-interacting motif (SIM) that interacts with SUMO. The SIM is a short β-strand that behaves as a β-sheet extension to that of SUMO.

1.9.9 Diversity among Ubiquitin-Binding Domains

From the structural studies of UBD-UBM interactions, some similarities have been observed. However, there is great diversity in the tertiary folds of the proteins involved in the interaction; residues from individual and adjacent α-helices, β-strands, as well as loops interact with ubiquitin or a ubiquitin-like domain (Table 1.2). The binding modes also vary among members of the same UBD family. However, one common feature shared by many of the UBD interactions is that they usually extend along the isoleucine 44 face of ubiquitin, which is highly conserved throughout evolution and to a minor extent between UBLs (Haglund et al., 2005).

Table 1.2: Protein-protein interaction modes that have been structurally characterized with experimentally determined binding affinities between UBLs and binding partners.

| Ubiquitin Binding Type | Size | Affinity | Example PDB | Reference |
|---|---|---|---|---|
| UIM / DUIM / MIU (Ubiquitin Interacting Motif) | ~20 aa | ~100-400 µM (mono or poly-Ub); ~30 µM (MIU) | 1Q0W | Young P, 1998; Fisher RD, 2003; Swanson KA, 2003; Wang QH, 2005 |
| SIM (SUMO Interacting Motif) | ~12 aa | ~2-10 µM | 2ASQ | Song J, 2005; Hecker CM, 2006 |
| CUE (Coupling of Ubiquitin conjugation to Endoplasmic Reticulum Degradation) | 42-43 aa | ~2-160 µM (mono-Ub) | 1P3Q, 1OTR | Donaldson KH, 2003; Kang RS, 2003; Prag G, 2003; Shih SC, 2003 |
| GAT (GGA And Tom1 Domain) | 135 aa | ~180 µM (mono-Ub) | 1YD8 | Shiba Y, 2004; Prag G, 2005 |
| GLUE (GRAM-like ubiquitin binding in Eap45) | ~135 aa | ~460 µM (mono-Ub) | 2DX5 | Teo H, 2006 |
| NZF (Npl4 Zinc Finger Motif) | ~35 aa | ~100-400 µM (mono-Ub) | 1Q5W | Meyer HH, 2002; Wang B, 2003; Alam SL, 2004 |
| A20 ZnF (A20 ZnF Domain) | ~35 aa | ~10-25 µM | 2FID, 2FIF, 2G45 | Lee S, 2006; Penengo L, 2006 |
| UBC (Ubiquitin Conjugating Catalytic Domain) | ~150 aa | ~300 µM | 2FUH | Brzovic PS, 2006 |
| UBA (Ubiquitin-Associated Domain) | 45-55 aa | ~10-500 µM (mono-Ub); ~0.03-9 µM (poly-Ub) | 2JY6, 1ZO6 | Hofmann K, 1996 |
| PAZ (ZnF-UBP) (Polyubiquitin Associated Zinc finger) | ~58 aa | ~3 µM; ~60 nM | 2G45, 3IHP | Hook SS, 2002; Boyault C, 2006; Reyes-Turcu, 2006 |
| UEV (Ubiquitin Conjugating Enzyme Variant) | ~145 aa | ~100-500 µM (mono-Ub) | 1S1Q | Koonin EV, 1997; Sundquist WI, 2004 |
| VHS (Vps27, Hrs, STAM) | 150 aa | ~50 µM | 2L0T, 3LDZ | Hong YH, 2009 |

1.10 Thesis Overview

Ubiquitin plays a vital role in protein trafficking, protein degradation, and a variety of disease pathways. Significant advances in the study of ubiquitin, ubiquitin-binding domains, UBLs, ubiquitin-like modifiers, and ubiquitin conjugating enzymes have led to a better understanding of the complexity of the ubiquitin and ubiquitin-like modifier conjugation system. However, there remains a gap in knowledge associated with the overarching significance of the ubiquitin fold, and the nature and function of many UBLs remain largely unexplored.

This thesis explores the size and scope of human UBLs, which led to a structural and biophysical examination of 17 UBLs. Analysis of the 17 UBLs led to the analysis of two UBL-binding domains that interact with two distinct UBLs (NFATc2IP & ubiquilin-1), and revealed the biochemical relationships among these 17 UBLs and within the full set of all UBLs.

1.10.1 Identify and obtain near-complete structural coverage of all human UBLs.

The first experimental component of this study focused on identifying the complete set of all human UBLs encoded within the human genome, which allowed for a better understanding of the breadth and sequence diversity of ubiquitin’s β-grasp fold. Upon determination of the expansive population of human UBLs, we obtained near-complete structural coverage of the ubiquitin-like fold for the human proteome. This resulted in generating 100 modelling families of related UBLs and experimental structural determination of 17 UBLs.

1.10.2 Exploring the NFATc2IP + NFATc2 protein-protein interaction and the ubiquilin-1 + PIN2 protein-protein interaction

To assist in understanding the structural and functional diversity of the ubiquitin-like domain, computational analyses of NFATc2IP & ubiquilin protein sequences, molecular structures and known binding partners were performed. This led to the deduction that NFATc2IP could interact with NFATc2 via a SIM-like interaction, which was validated using peptide-array and NMR titration experiments. A similar series of computational analyses was performed using the ubiquilin-1 protein sequence and structure, which led to the deduction that PIN2 could interact with ubiquilin via a UIM-like interaction. This was validated using NMR titration experiments.

Chapter 2 The ubiquitin fold: leveraging structural genomics

Contributions: J. Everett performed clustering of UBLs into model families. A. Semesi, M. Garcia & A. Yee assisted with cloning, small scale sample preparation & small scale expression/solubility screening. J. Lukin, C. Fares, M. Karra, S. Srisalam, S. Houliston assisted with NMR data acquisition and NMR titration. I performed large scale NMR sample preparation and NMR screening, as well as remaining experiments and analyses under the guidance of C.H. Arrowsmith.

2.1 Summary

Structural genomics brings together information about not just the protein for which a structure is obtained, but also sequentially similar homologues and even distantly related fold family members. For this thesis, structural genomics provided the tools for gaining insight into the diversity of the ubiquitin-like domain family. Bioinformatics and computational techniques were leveraged to expand the set of known human ubiquitin-like domain containing genes, prioritize subsets of human ubiquitin-like domain containing genes based on the contribution of their structures to domain family structural coverage, and assist in construct design for structure determination. We used nuclear magnetic resonance (NMR) spectroscopy to screen human UBLs for structure determination, and subsequently determined the structures of 17 human UBLs using X-ray crystallography and NMR spectroscopy. As a result, the RCSB PDB now has 32% structural coverage of human UBLs, and 82% structural coverage when taking into account models of UBL domains that have at least 30% sequence identity over the entire length of the full domain. Of the remaining 74 human UBLs that lack structural information, 30 are singletons and are on average 36% similar & 23% identical to the most similar regions of protein structures within the PDB. The UBLs structurally characterized for this project facilitate 3.7% structural coverage of all human UBLs. When taking into account UBL homology models, the structural coverage is 6%. Structural analyses have also provided insight into families of related proteins. In particular, structural analysis of the NFATc2IP and ubiquilin protein families revealed insight into protein-protein interactions and facilitated the prediction of novel binding partners.

2.2 Introduction

One goal of structural genomics is to provide a high throughput framework for generating accurate molecular structure representations of at least one member of large groups of protein domain families. The molecular structure itself provides insight into functional attributes shared among protein domain family members, functional variability within the protein domain family, as well as structural templates for ligand docking studies, homology modeling, and molecular replacement methods for solving X-ray crystal structures.

Two structural genomics groups that have made significant contributions to the PDB are the NorthEast Structural Genomics Consortium (NESG) and the Structural Genomics Consortium (SGC). In 2000, the Protein Structure Initiative was established to provide funding and direction to 9 structural genomics centres. The NESG uses both NMR & X-ray crystallography for elucidating the structures of eukaryotic proteins related to biology, protein-protein interaction networks, specific biochemical pathways, or implicated in specific human diseases.

The SGC is a public/private initiative that focuses on medically significant proteins related to human health. From 2003 until Jan 2014, the NESG determined 1174 protein structures (516 by NMR & 658 by X-ray crystallography), and from 2004 to Jan 2014 the SGC determined 1232 protein structures (28 by NMR & 1204 by X-ray crystallography). These initiatives implement a similar parallel high-throughput structural genomics framework that focuses on structurally characterizing a large number of protein targets from gene to structure.

Structural genomics efforts have had a significant impact on scientific innovations related to the biological sciences and human health. In addition to the wealth of knowledge generated through these efforts, structural genomics facilitates: methods development and optimization, improved datasets related to known and potential drug target proteins for drug discovery programs, and increased availability of purified proteins for reagent development (Weigelt, 2010).

This thesis leverages the strengths of structural genomics experimental methods to explore the significance of structural variation within the ubiquitin-like domain family. The ubiquitin-like domain family was chosen because of the large number of medically-significant members of the family, the large number of uncharacterized ubiquitin-like domain containing genes, the stable and soluble nature of ubiquitin, and the scientifically interesting questions surrounding the ubiquitylation system that include the unknown role that UBLs play.

There remains a significant gap in understanding the role of UBLs, as well as the breadth of cellular and molecular activity of the full length proteins that contain UBLs. There is also a gap in knowledge related to the size of the ubiquitin-like domain fold-space. In 2005, 73 genes were formally annotated as containing UBLs. By 2012, the list of formally annotated ubiquitin-like domain containing genes expanded to 152 genes. By 2014, the list of formally annotated ubiquitin-like domain containing genes expanded to 191 genes and 325 isoforms (Marchler-Bauer et al., 2013). The expanded set of formally annotated ubiquitin-like domain containing genes remains substantially smaller than the number of genes that were determined using a PSI-BLAST batch approach for this thesis project. This gap in breadth presents a gap in knowledge of the full extent of the ubiquitin-like domain family and its diversity.

This thesis explores these gaps to provide insight into, and a possible explanation for, the breadth and diversity of the ubiquitin-like domain family, while demonstrating its significance through molecular structure analysis. The first objective of the project was to identify all UBLs within the human genome. Once all UBLs were identified, a strategy was developed to work towards complete structural coverage of the ubiquitin-like domain family. Combining molecular biology and structural biology techniques with knowledge of the molecular structure of each human ubiquitin-like domain would provide insight into the various biochemical functions of UBLs and the significance of variations between domains. The second objective, discussed in this chapter, was to leverage bioinformatics, molecular biology and structural biology techniques to screen UBLs for structure determination by NMR and to prioritize constructs so that each newly solved structure provides greater family coverage.

2.3 Methods

2.3.1 Identifying human ubiquitin-like domains

An initial list of all identifiable human UBLs was compiled based on gene/domain annotation within UniProtKB (UniProt Consortium, 2014), Human Protein Atlas (Uhlen et al., 2010), the Human Protein Reference Database (Prasad et al., 2009), and the NCBI’s Conserved Domain Database (consisting of SMART, Pfam, COGs, TIGRFAM, and PRK) (Marchler-Bauer et al., 2013). The resulting list of 73 human UBLs was expanded to 645 distantly related human UBLs by performing a batch of independent DELTA-BLAST sequence similarity searches of GenBank and UniProt, using each member of the initial list of human ubiquitin-like domains as a query. DELTA-BLAST is a modified version of BLAST that uses RPS-BLAST to search for conserved domains, from which a position-specific scoring matrix (PSSM) is generated and used to search the sequence databases (Benson et al., 2013; Boratyn et al., 2012).

Figure 2.1: Novel UBL discovery process. Unannotated UBLs were discovered through a series of DELTA-BLAST searches of the NCBI GenBank and UniProt human protein databases. The predicted secondary structure elements of putative UBLs were analyzed to confirm whether they were legitimate UBLs, and legitimate UBLs were also used as input sequences for subsequent DELTA-BLAST searches.
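
The discovery loop described above, in which validated hits are fed back in as new queries, can be sketched as a breadth-first expansion. This is only an illustration of the control flow: the `search` and `is_legitimate` callables are hypothetical stand-ins for a DELTA-BLAST run and the secondary structure check, and the toy identifiers are invented.

```python
from collections import deque
from typing import Callable, Iterable, Set


def expand_ubl_set(
    seeds: Iterable[str],
    search: Callable[[str], Iterable[str]],
    is_legitimate: Callable[[str], bool],
) -> Set[str]:
    """Iteratively grow a set of domains; each accepted hit becomes a new query."""
    accepted: Set[str] = set(seeds)
    rejected: Set[str] = set()
    queue = deque(accepted)
    while queue:
        query = queue.popleft()
        for hit in search(query):
            if hit in accepted or hit in rejected:
                continue  # already evaluated
            if is_legitimate(hit):
                accepted.add(hit)
                queue.append(hit)  # legitimate hits seed further searches
            else:
                rejected.add(hit)  # rejected hits are never used as queries
    return accepted


# Toy data standing in for real search results (hypothetical identifiers).
toy_hits = {
    "UBL_A": ["UBL_B", "NOT_UBL_X"],
    "UBL_B": ["UBL_C", "UBL_A"],
    "UBL_C": [],
    "NOT_UBL_X": ["UBL_D"],
}
found = expand_ubl_set(
    ["UBL_A"],
    search=lambda q: toy_hits.get(q, []),
    is_legitimate=lambda h: h.startswith("UBL_"),
)
print(sorted(found))
```

Because rejected hits are never queried, domains reachable only through them (here `UBL_D`) are not found, which is the intended effect of validating candidates before reusing them as input.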

2.3.2 Validating putative human ubiquitin-like domains

Figure 2.2: Secondary & tertiary structures of Human ubiquilin-1. Secondary structure elements of the human ubiquitin-like domain containing protein ubiquilin-1 (UBQL1_HUMAN; sp|Q9UMX0).

Ubiquitin-like domains have a characteristic secondary structure consisting of 5 β-strands and 3 α-helical regions (Figure 2.2). Secondary structure elements were predicted using the JPRED and PSIPRED webservers for each full length protein that contains at least one of the 645 UBLs. A sequence similarity search of the PDB was also performed using each full length ubiquitin-like domain containing protein to determine whether any protein structures were deposited with a similar amino acid sequence. A pseudo-multiple sequence alignment was generated for each ubiquitin-like domain, bringing together information about the full length protein sequence, predicted secondary structure elements, and similar proteins deposited in the RCSB PDB (Figure 2.3).
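
As a rough illustration of how a predicted secondary structure string could be checked for a ubiquitin-like pattern, the sketch below collapses a per-residue prediction (E for strand, H for helix, - for coil) into ordered segments. The function names and the thresholds in `looks_like_ubl` are hypothetical and are not the criteria used in this thesis; the example string is the ubiquitin (1Q0W) track as it appears in the extracted alignment, so its gap lengths may not be exact.

```python
from itertools import groupby


def ss_segments(prediction, min_len=2):
    """Return (state, length) for each run of E or H at least min_len long."""
    segments = []
    for state, run in groupby(prediction):
        length = len(list(run))
        if state in ("E", "H") and length >= min_len:
            segments.append((state, length))
    return segments


def looks_like_ubl(prediction, min_strands=4, min_helices=1):
    """Crude filter: enough strand and helix segments to resemble a beta-grasp fold.

    The thresholds are illustrative assumptions only.
    """
    states = [state for state, _ in ss_segments(prediction)]
    return states.count("E") >= min_strands and states.count("H") >= min_helices


ubiquitin_track = (
    "-EEEEEE----EEEEEE-----HHHHHHHHHHH------EEEEE---EE-----HHHH------EEEEEEE----"
)
print(ss_segments(ubiquitin_track))
print(looks_like_ubl(ubiquitin_track))
```

A filter of this kind only flags candidates for inspection; in practice the order and spacing of the elements would also need to be compared with known ubiquitin-like structures.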

[Figure 2.3 alignment; column layout not recoverable from the extracted text. Tracks shown for the full length ubiquilin-1 sequence: OrigSeq (protein sequence); Jnet, Jhmm and Jpssm (predicted secondary structure); Jnet_25, Jnet_5 and Jnet_0 (predicted burial); Jnet Rel (prediction reliability). Tracks aligned to the ubiquitin-like domain: UBIQUITIN_HUMAN (Jnet prediction and PDB 1Q0W) and SUMO1_HUMAN (Jnet prediction and PDB 1A5R).]

Figure 2.3: Pseudo-multiple sequence alignment of human ubiquilin-1. Full length protein sequence and predicted secondary structure elements of the human ubiquitin-like domain containing protein ubiquilin-1 (UBQL1_HUMAN; sp|Q9UMX0). The secondary structure elements for human ubiquitin & human SUMO1, as well as the predicted secondary structure elements for human ubiquitin & human SUMO1 are aligned with the ubiquitin-like domain of ubiquilin-1. Secondary structure elements were predicted using Jpred3.

Figure 2.4: UBL target selection, preparation and screening process. Legitimate UBLs were grouped into modeling families, from which target UBLs were selected. For each target ubiquitin-like domain, constructs were designed with varying domain boundaries and protein samples were prepared using a parallel high-throughput batch approach. NMR screening was performed on ubiquitin-like domain samples that had sufficient expression and concentration. Ubiquitin-like domain samples with adequate 1H15N-HSQC spectra were re-expressed as 15N13C-labelled protein for full structure determination.

2.3.3 Target selection

A sequence similarity analysis was performed to group related UBLs. Modelling families were generated that consist of subsets of UBLs in which the structure determination of one member of the modelling family would facilitate a reliable structure prediction of all other members of the modelling family using homology modelling techniques (Nair et al., 2009). This shortened the full list of all UBLs to 76 ubiquitin-like domain targets after removing proteins whose structures have already been deposited in the PDB, those that lack a homologue of sufficient sequence similarity, and those for which DNA templates were not available. These UBLs were targeted for NMR structure determination as described below.

2.3.4 Construct design

Multiple constructs were designed for each of the 76 UBLs to facilitate screening of solubility, yield and NMR spectrum. The ubiquitin-like domain boundaries were defined using a pseudo-multiple sequence alignment that contained sequence annotation, predicted secondary structure, disordered regions, and all sequentially similar structurally characterized proteins within the PDB. To facilitate protein purification using Ni2+ affinity chromatography, all constructs were generated with a fused N-terminal poly-histidine tag. When necessary, constructs were redesigned based on trends in small scale and NMR screening results.

21------31------41------51------61------71------81------91------101------111------
OrigSeq : AEGAGAPAAAASAEPKIMKVTVKTPKEKEEFAVPENSSVQQFKEEISKRFKSHTDQLVLIFAGKILKDQDTLSQHGIHDGLTVHLVIKTQNRPQDHSAQQ
Jnet : ------EEEEEEEE----EEEEE----HHHHHHHHHHHH------EEEEE------HHH------EEEEEEE------
Jhmm : ------EEEEEEEE----EEEEEE----HHHHHHHHHH------EEEEEE--EE----HHHHH------EEEEEEE------
Jpssm : ------EEEEEEEE---EEEEEE----HHHHHHHHHHHHH------HEEH------EEEEEEE------
Jnet_25 : ------BB-BBBBB------B-B---B-B--BB--BB----B-B-BBBBBB-B-BB-----B--B-B---BBBBBBBBB------B--
Jnet_5 : ------B-B-B------B--B---B------B-BBB------B------BBBB------
Jnet_0 : ------BB------B-B------
Jnet Rel : 7777777777777776068998716860788840775089999999980037875000000460066760000044678754889874367777777777
PSIPRED : cccccccccccccccccEEEEEEcccccEEEEEcccccHHHHHHHHHHHHccccccEEEEEccEEcccccHHHHcccccccEEEEEEEcccccccccccc
UBIQUITIN_HUMAN-JPRED -EEEEEE----EEEEEE-----HHHHHHHHHHH------EEEEE------EEEEEEE---- : Jnet
UBIQUITIN_HUMAN -EEEEEE----EEEEEE-----HHHHHHHHHHH------EEEEE---EE-----HHHH------EEEEEEE---- : 1Q0W
SUMO1_HUMAN-JPRED ------EEEEEEEE----EEEEEE----HHHHHHHHHHHHH-----EEEEEE------EEEEEEEE------ : Jnet
SUMO1_HUMAN ------EEEEEEEE---EEEEEEEE-----HHHHHHHHHHH-----EEEEE------EEEEEEE------ : 1A5R

OrigSeq : AEGAGAPAAAASAEPKIMKVTVKTPKEKEEFAVPENSSVQQFKEEISKRFKSHTDQLVLIFAGKILKDQDTLSQHGIHDGLTVHLVIKTQNRPQDHSAQQ
1J8C:A EPKI+KVTVKTPKEKEEFAVPENSSVQQFKE_ISKRFKS_TDQLVLIFAGKILKDQDTL_QHGIHDGLTVHLVIK (ID:95% SIM:96%)
1YQB:A __P_++KVTVKTPK+KE+F+V_+__++QQ_KEEIS+RFK+H_DQLVLIFAGKILKD_D+L+Q_G+_DGLTVHLVIK_Q+R (ID:68% SIM:85%)
1WX7:A A___+P_++KVTVKTPK+KE+F+V_+__++QQ_KEEIS+RFK+H_DQLVLIFAGKILKD_D+L+Q_G+_DGLTVHLVIK_Q+R (ID:66% SIM:84%)
2BWE:S +_+_+K+_++K_E__V___S+V_QFKE_I+K______LI++GKILKD__T+__+_I_DG_+VHLV (ID:41% SIM:59%)
1YX5-B M++_VKT___K_____V__+_+++__K_+I__+_____DQ__LIFAGK_L+D__TLS_+_I____T+HLV++ (ID:36% SIM:54% GAP:1%)

Domain Boundaries:

Construct1 PKIMKVTVKTPKEKEEFAVPENSSVQQFKEEISKRFKSHTDQLVLIFAGKILKDQDTLSQHGIHDGLTVHLVIKTQNRP
Construct2 PKIMKVTVKTPKEKEEFAVPENSSVQQFKEEISKRFKSHTDQLVLIFAGKILKDQDTLSQHGIHDGLTVHLVIKTQNRPQD
Construct3 PKIMKVTVKTPKEKEEFAVPENSSVQQFKEEISKRFKSHTDQLVLIFAGKILKDQDTLSQHGIHDGLTVHLVIKT
Construct4 SAEPKIMKVTVKTPKEKEEFAVPENSSVQQFKEEISKRFKSHTDQLVLIFAGKILKDQDTLSQHGIHDGLTVHLVIKTQNRP
Construct5 SAEPKIMKVTVKTPKEKEEFAVPENSSVQQFKEEISKRFKSHTDQLVLIFAGKILKDQDTLSQHGIHDGLTVHLVIKTQNRPQD
Construct6 SAEPKIMKVTVKTPKEKEEFAVPENSSVQQFKEEISKRFKSHTDQLVLIFAGKILKDQDTLSQHGIHDGLTVHLVIKT
Construct7 MKVTVKTPKEKEEFAVPENSSVQQFKEEISKRFKSHTDQLVLIFAGKILKDQDTLSQHGIHDGLTVHLVIKTQNRP
Construct8 MKVTVKTPKEKEEFAVPENSSVQQFKEEISKRFKSHTDQLVLIFAGKILKDQDTLSQHGIHDGLTVHLVIKTQNRPQD
Construct9 MKVTVKTPKEKEEFAVPENSSVQQFKEEISKRFKSHTDQLVLIFAGKILKDQDTLSQHGIHDGLTVHLVIKT

Figure 2.5: Pseudo-multiple sequence alignment of ubiquilin-1 for construct design. Pseudo-multiple sequence alignment of ubiquilin-1 showing residues 20-119 of the full length protein sequence corresponding to the ubiquitin-like domain region, as well as predicted secondary structure elements, similar proteins deposited in the RCSB PDB, and constructs with predicted ubiquitin-like domain boundaries.
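
The nine ubiquilin-1 constructs in Figure 2.5 correspond to every combination of three N-terminal start points and three C-terminal end points. The sketch below reproduces that enumeration; the motif-based boundary lookup and the function name are illustrative assumptions, not the procedure actually used, and the poly-histidine tag is omitted.

```python
from itertools import product

# Residues 20-119 of ubiquilin-1, as shown in the alignment for Figure 2.5.
FRAGMENT = (
    "AEGAGAPAAAASAEPKIMKVTVKTPKEKEEFAVPENSSVQQFKEEISKRFKSHTDQLVLIFAGK"
    "ILKDQDTLSQHGIHDGLTVHLVIKTQNRPQDHSAQQ"
)

# Boundary motifs read off the constructs in the figure (each occurs once in FRAGMENT).
N_TERMINAL_STARTS = ["PKIMK", "SAEPK", "MKVTV"]
C_TERMINAL_ENDS = ["VIKTQNRP", "VIKTQNRPQD", "VIKT"]


def design_constructs(fragment, n_motifs, c_motifs):
    """Return one construct per (N-terminal start, C-terminal end) combination."""
    designed = []
    for n_motif, c_motif in product(n_motifs, c_motifs):
        start = fragment.index(n_motif)               # first residue of the construct
        end = fragment.index(c_motif) + len(c_motif)  # one past the last residue
        designed.append(fragment[start:end])
    return designed


constructs = design_constructs(FRAGMENT, N_TERMINAL_STARTS, C_TERMINAL_ENDS)
for number, sequence in enumerate(constructs, start=1):
    print(f"Construct{number} ({len(sequence)} aa) {sequence}")
```

Varying both termini independently keeps the number of constructs small while still sampling the uncertainty in where the folded domain begins and ends.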

2.3.5 Sample preparation

Small scale expression and purification of each construct was performed to determine sample solubility and yield. For each target, samples with the best yield were regrown for NMR screening.

15N-labelled samples were expressed in E. coli, grown in batches of 12 x 0.5 L using modified M9 minimal media containing 15NH4Cl as the sole nitrogen source and supplemented with kanamycin, at 37 °C until an OD600 of 1.0 was reached. Protein expression was induced with isopropyl-1-thio-β-D-galactopyranoside (IPTG) and the cells were incubated for 12-18 hours at 15 °C. The cells were lysed by sonication, and the lysate was clarified by centrifugation. The poly-histidine tagged UBLs were purified by modified batch/column Ni2+-affinity chromatography (Qiagen) in batches of 6-12 samples, and eluted to a final volume of 5 mL. Each sample was exchanged from elution buffer into an NMR buffer using centrifugal concentrators. The standard NMR buffer was MOPS-based; however, other buffers were used depending on the pH of the sample, solubility and resolution of the NMR signal. The samples were concentrated to a volume of ~500 µL for 5 mm NMR tubes, ~200 µL for 3 mm NMR tubes, or ~40 µL for 1 mm NMR microprobe tubes. The volume and NMR tube selection depended on the amount of sample available and the sample concentration necessary for adequate NMR signal (Yee et al., 2014).

2.3.6 1H15N-HSQC screening of ubiquitin-like domains

A 1H15N-HSQC spectrum was generated for each sample using a Bruker 800MHz AVANCE spectrometer, or a Bruker 500MHz or Bruker 600MHz AVANCE spectrometer equipped with automated sample changers. Samples were ranked based on peak intensity, dispersion and percentage of total residues observed in each 1H15N-HSQC spectrum (Yee et al., 2002). For samples with inadequate 1H15N-HSQC spectra, new constructs were designed to improve domain boundaries and/or NMR buffer conditions were optimized in an attempt to improve solubility.
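
One of the ranking criteria, the percentage of residues observed, can be estimated by comparing the number of picked peaks with the number of backbone amide peaks expected from the sequence. The sketch below uses the common rule of thumb that each residue except proline and the N-terminal residue gives one backbone amide peak; it ignores side-chain NH signals and affinity tag residues, and the function names are hypothetical rather than part of the screening protocol described here.

```python
def expected_backbone_amide_peaks(sequence: str) -> int:
    """Count residues expected to give a backbone amide peak.

    Excludes the N-terminal residue and all prolines (no backbone amide proton).
    """
    residues = sequence.upper()
    return sum(1 for aa in residues[1:] if aa != "P")


def percent_observed(observed_peaks: int, sequence: str) -> float:
    """Observed peaks as a percentage of expected backbone amide peaks."""
    expected = expected_backbone_amide_peaks(sequence)
    if expected == 0:
        raise ValueError("sequence has no expected backbone amide peaks")
    return 100.0 * observed_peaks / expected


# Toy five-residue peptide (hypothetical): A, K and G are expected to give peaks.
print(expected_backbone_amide_peaks("MAPKG"))
print(percent_observed(3, "MAPKG"))
```

A value well below 100% may point to aggregation, exchange broadening or disordered regions, which is the kind of outcome that would prompt redesign of the construct boundaries or buffer conditions.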

2.4 Results & Discussion

2.4.1 Identifying unannotated human ubiquitin-like domains

The human genome contains 220 genes that encode proteins with UBLs, of which 147 were not annotated as having the ubiquitin-fold at the time of analysis (Appendix I). These proteins contain 645 distantly related human UBLs that include those within isoforms produced by alternative splicing. By eliminating identical sequences within isoforms, the pool of 645 putative UBLs can be reduced to 398 unique UBL sequences. The goal of this project has been to obtain structural coverage of all UBLs without experimentally determining the structure of each of the 398 unique UBLs. To accomplish this, the UBLs were grouped into 100 modelling families. A modelling family is a group of homologous protein domains with similar structures, for which the experimental structure of one member provides “modelling leverage” that facilitates computational determination of the structures of the remaining members through homology, or comparative, modelling methods (Arnold et al., 2006; Kiefer et al., 2009; Peitsch, 1995; Pieper et al., 2011). Some studies have shown that sequence similarity of >40% over >50 residues can provide models with a heavy-atom RMSD of <2.5 Å from the experimental structure (Bhattacharya et al., 2008; Koh et al., 2003; Marti-Renom et al., 2000; Marti-Renom et al., 2003). Modelling families are typically defined by such sequence similarity and sequence coverage parameters, but the parameters used for homology model generation in this thesis were relaxed to >20% over 90% because all of the domains are from the same organism, all of the domain sequences are 70-120 aa in length, and there is a high level of secondary structure element conservation among UBLs. Of the 100 modelling families, 5 are singletons (OASL_HUMAN, PARK2_HUMAN, IKKB_HUMAN, UBL7_HUMAN & P3C2B_HUMAN), that is, modelling families that contain only one UBL.
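The grouping criterion described above can be illustrated with a short sketch. The thesis does not state which clustering algorithm was used, so single-linkage grouping is an assumption here, and the function name, input format and example values are hypothetical; only the >20% similarity and 90% coverage thresholds come from the text.

```python
from collections import defaultdict

# Thresholds quoted in the text: >20% similarity over 90% of the sequence.
MIN_SIMILARITY = 20.0
MIN_COVERAGE = 90.0


def modelling_families(domains, hits):
    """Group domains into families by single linkage (an assumption).

    domains: iterable of domain names.
    hits: iterable of (domain_a, domain_b, percent_similarity, percent_coverage).
    """
    parent = {d: d for d in domains}

    def find(x):
        # Follow parent links to the family representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, similarity, coverage in hits:
        if similarity > MIN_SIMILARITY and coverage >= MIN_COVERAGE:
            parent[find(a)] = find(b)  # merge the two families

    families = defaultdict(list)
    for d in parent:
        families[find(d)].append(d)
    return sorted(sorted(members) for members in families.values())


# Hypothetical pairwise hits: C-D fails the coverage threshold, so D is a singleton.
example = modelling_families(
    ["A", "B", "C", "D"],
    [("A", "B", 35.0, 95.0), ("B", "C", 25.0, 92.0), ("C", "D", 60.0, 40.0)],
)
```

With single linkage, a domain joins a family if it passes the thresholds against any one member, which is why A and C end up together through B.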


The 398 unique UBLs were subdivided into three classes: 128 UBLs with experimental structures deposited in the PDB, 196 UBLs with hypothetical structures generated by homology modelling, and 74 distantly related UBLs that cannot be reliably homology modeled and therefore, have no protein structure information (Figure 2.6).

Figure 2.6: Distribution of structurally characterized and uncharacterized UBLs. There are 398 unique UBLs, of which 128 molecular structures have been characterized by X-ray crystallography or NMR spectroscopy and 196 molecular structures can be modelled using homology modelling techniques. The remaining 74 UBLs are too distantly related from structurally characterized proteins.


2.4.2 Small-Scale Screening

The complete list of 645 human UBLs (corresponding to 398 unique UBLs) was reduced to 76 UBLs to be pursued for structure determination after removing domains that were already structurally characterized, domains that shared high sequence similarity, and domains for which reagents were not readily available. Between 9 and 12 UBL constructs were initially designed for each of the 76 target proteins, and additional constructs were designed after taking into account the results of small-scale expression and solubility screening. In total, 680 constructs were cloned, resulting in 205 ubiquitin-like domain constructs with adequate expression and solubility for large-scale 1H15N-HSQC screening (Table 2.1).

Table 2.1: Summary of the small-scale expression screening of human UBLs that were structurally characterized and deposited in the PDB as part of this thesis.

Gene Name; Expression 5 4 3 2 1 0; Solubility 5 4 3 2 1 0
BRAF 17 15 2 1 3 25 1 6 2 4
FUBI 2 1 2 1 1
ISG15 16 6 2 1 4 15 6 1 2 1 4
HERPUD2 1 1
NFATc2IPN 1 1 2
NFATc2IPC 1 1
OTU1 1 1
PLXNC1 1 1 1 1
Ubiquilin-1 2 1 1
USP7 1 3 3 4 3

2.4.3 Screening by 1H15N-HSQC

NMR spectroscopy was used for screening protein constructs because samples amenable for structure determination can be identified within minutes to hours of the protein being purified.

Protein constructs were expressed as poly-histidine-tagged 15N-labeled proteins, and purified using a rapid batch purification protocol (Yee et al., 2002). 1H15N-HSQC spectra were classified as poor, promising, good or excellent based on the number of peaks visible, the peaks:residues ratio, and the signal:noise ratio. Poor 1H15N-HSQC spectra have no visible peaks, or all peaks overlap because the sample is an unfolded protein. Promising 1H15N-HSQC spectra may come from a partially folded protein and contain fewer peaks than expected or inadequate peak intensity. Good 1H15N-HSQC spectra show clear dispersion of peaks of equal intensity, a number of peaks equivalent to the number of amino acids, and adequate peak intensity for structure determination. Excellent 1H15N-HSQC spectra are similar to “Good” 1H15N-HSQC spectra, with stronger peak intensity that would allow a shorter data collection period (Yee et al., 2002).
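The four-level classification can be sketched as a simple decision function. The text gives the criteria (visible peaks, dispersion, peaks:residues ratio, signal:noise ratio) but no numeric cut-offs, so every threshold below is a hypothetical placeholder, as are the function and parameter names.

```python
# Hypothetical cut-offs; the thesis describes the criteria only qualitatively.
MIN_PEAK_RATIO = 0.8      # assumed minimum peaks:residues ratio for "good"
MIN_SNR_GOOD = 10.0       # assumed minimum signal:noise for "good"
MIN_SNR_EXCELLENT = 30.0  # assumed minimum signal:noise for "excellent"


def classify_hsqc(n_peaks, n_residues, signal_to_noise, dispersed=True):
    """Return 'poor', 'promising', 'good' or 'excellent' for one spectrum."""
    # Poor: no visible peaks, or all peaks overlapping (unfolded protein).
    if n_peaks == 0 or not dispersed:
        return "poor"
    peak_ratio = n_peaks / n_residues
    # Promising: fewer peaks than expected or inadequate peak intensity.
    if peak_ratio < MIN_PEAK_RATIO or signal_to_noise < MIN_SNR_GOOD:
        return "promising"
    # Excellent: as good, but with stronger peak intensity.
    if signal_to_noise >= MIN_SNR_EXCELLENT:
        return "excellent"
    return "good"
```

In practice such ranking was done by inspection of the spectra; the sketch only makes the order of the decisions explicit.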

Figure 2.7: Examples of 1H15N-HSQC screening results for human UBLs. Sharpin 30 aa-154 aa resulted in an HSQC classified as poor, FAT10 6 aa-165 aa resulted in an HSQC classified as promising, and FAT10 6 aa-89 aa resulted in an HSQC classified as good.

Table 2.2: Summary of 1H15N-HSQC screening results for human UBLs. 10 UBLs were solved by NMR (red), and 7 UBLs were solved by X-ray crystallography (blue).

Gene Name | 1H15N-HSQC quality | PDB
BRAF | promising-2, good-4 | 2L05, 3NY5
FUBI | good-4 | 2L7R
ISG15 | | 2HJ8
HERPUD2 | good-1 | 2KDB
MAP1LC3A | | 3ECI
NFATc2IP | good-2 | 2L76
NFATc2IP | good-1 | 2JXX
OTU1 | good-1 | 2KZR
PLXNC1 | | 3KUZ
RNF2/RING1B | | 3H8H
SF3A1 | | 1ZKH
Ubiquilin-1 | good-1 | 2KLC
Ubiquilin-3 | | 1YQB
UHRF1 | | 2FAZ
USP7 | poor-1, good-1 | 2KVR
USP15 | | 3PPA


2.4.4 Structural Coverage - Completing the UBL Phylogenetic Tree

In 2005, there were 73 formally annotated UBLs; this number has since grown to 191 formally annotated ubiquitin-like domain-containing genes and 325 ubiquitin-like domain-containing isoforms (Marchler et al., 2013). This increase in annotated domains was almost certainly due, at least in part, to the new structures of UBLs deposited in the PDB from work in this thesis: BRAF-1/-2 (PDB_ID: 2L05.A & PDB_ID: 3NY5.ABCD), FAU_1-1 (PDB_ID: 2L7R.A), HERPUD2_1-1 (PDB_ID: 2KDB.A), ISG15_1-2 (PDB_ID: 2HJ8.A), MAP1LC3A_1-1 (PDB_ID: 3ECI.AB), NFATc2IP_1-1 (PDB_ID: 2L76.A), NFATc2IP_1-2 (PDB_ID: 2JXX.A), PLXNC1_1-2 (PDB_ID: 3KUZ.AB), RING1_2-1/-2 & RING1_2-2 (PDB_ID: 3H8H.A), SF3A1_1-1 (PDB_ID: 1ZKH.A), UBQLN1_1-1 (PDB_ID: 2KLC.A), UBQLN3_1-1 (PDB_ID: 1YQB.A), UHRF1_1-1 (PDB_ID: 2FAZ.AB), USP15_1-1/-2/-3 & USP15_1-2 (PDB_ID: 3PPA.A) and USP7_1-3 (PDB_ID: 2KVR.A) (Table 2.2 & Figure 2.8).

Figure 2.8: Clustering of human UBLs into groups based on sequence similarity. Phylogenetic tree of all human UBLs displaying sub-clustering into 5 groups based on UBL domain sequence similarity. UBLs structurally characterized for this project are labelled in blue alongside corresponding groups and PDB identifiers. Ubiquitin-like modifiers and 3 putative ubiquitin-like modifiers structurally characterized for this project are underlined.


Nevertheless, our research has identified 398 unique UBLs in 220 human genes. When taking into account isoforms and identical UBLs, there are 645 ubiquitin-like human protein domains. A number of UBLs have low percent sequence identity, yet continue to share secondary structure elements characteristic of the β-grasp fold found in ubiquitin and UBLs. Our approach of combining a BLAST sequence similarity search of human proteins with secondary structure predictions and subsequent BLAST sequence similarity searches allowed us to identify putative UBLs. Some of the putative UBLs were not formally annotated at the time of analysis, but have since been formally annotated, while 88 putative ubiquitin-like domain-containing isoforms and 29 ubiquitin-like domain-containing genes have yet to be validated.

The ambitious goal of completing the structural coverage of all human UBLs through experimental and computational means was not fully achieved, but 32% of human UBLs now have experimental structures and an additional 49% of structural coverage has been achieved through 196 computationally determined homology models. The remaining 74 UBLs are too distantly related to any of the experimentally characterized proteins within the PDB, and at least one member of each modelling family will need to be experimentally characterized to complete the structural coverage of all human UBLs (Table 2.3).
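The coverage percentages quoted above follow directly from the three class counts (128, 196 and 74 of 398 unique UBLs). A minimal check of the arithmetic, with hypothetical variable and function names:

```python
# Class counts taken from the text (Figure 2.6); the names are illustrative only.
UBL_COUNTS = {
    "experimental": 128,    # structures deposited in the PDB
    "homology_model": 196,  # structures obtainable by homology modelling
    "no_structure": 74,     # too distantly related to be modelled
}


def coverage_percentages(counts):
    """Return each class as a percentage of the total number of UBLs."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


percentages = coverage_percentages(UBL_COUNTS)
# Combined coverage from experimental structures plus homology models.
combined = percentages["experimental"] + percentages["homology_model"]
```

The three classes sum to 398, and the experimental and modelled classes together account for a little over four fifths of the unique UBLs.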


Table 2.3: All human UBLs that remain to be structurally determined, along with their most similar protein structure and biological significance.

UBL # | Genes that contain UBL | PDB ID | PDB Protein Name | % Sequence Identity | % Query Length | Medical Significance (OMIM, CGP, DiseaseHub) | # PPI partners (BioGRID, HPRD, BIND) | # publications (PubMed)
1 | ANKUB1-2 | 3FIN-C | 50S ribosomal protein L1 – Thermus thermophilus | 30% | 62% | - | - | 3
2 | ANKUB1-3 | 3G4O-A | Aerolysin – Aeromonas hydrophila | 40% | 27% | - | - | 3
3 | ARAP1-3 | 2JKB-A | Sialidase B – Streptococcus pneumoniae | 27% | 58% | - | 24 | 33
4 | ARAP2-2 | 1U5F-A | Src kinase-associated phosphoprotein 2 – Mus musculus | 23% | 62% | - | 1 | 14
5 | ARAP2-3 | 1YZX-A | Glutathione S-transferase kappa 1 – Homo sapiens | 27% | 59% | - | 1 | 14
6 | ARAP3-2 | 3L7U-A | Nucleoside diphosphate kinase A – Homo sapiens | 33% | 43% | - | 3 | 17
7 | ARHGAP20 | 1U5L-A | Major prion protein – Trachemys scripta | 27% | 60% | - | 1 | 13
8 | ASPSCR1_3 | 3LH5-A | Tight junction protein ZO-1 – Homo sapiens | 35% | 65% | alveolar soft part sarcoma & renal cell carcinoma | 12 | 28
9 | EPB41L1_3-1 | 2HE7-A | Band 4.1-like protein 3 – Homo sapiens | 42% | 60% | mental retardation | 30 | 34
10 | FRMD1_2-2 | 2DD4-B | Thiocyanate hydrolase subunit – Thiobacillus thioparus | 44% | 48% | - | 0 | 4
11 | FRMD3_1-2, FRMD3_2-2, FRMD3_3-2, FRMD3_5-1, FRMD3_6-2, FRMD3_7-2, FRMD3_8-1, FRMD3_10-1 | 4K4K-A | ORF:BACUNI_00621 – Bacteroides uniformis ATCC 8492 | 34% | 54% | diabetic nephropathy & potential tumor suppressor | - | 10
12 | FRMPD2_1-1, FRMPD2_2-1 | 3MEJ-A | Putative transcriptional regulator YwtF – Bacillus subtilis | 22% | 66% | - | - | 6
13 | FRMPD2_4-1 | 1Q7X-A | Tyrosine-protein phosphatase non-receptor type 13 – Homo sapiens | 47% | 64% | - | - | 6
14 | MYLIP_2-1 | 2B50-A | Peroxisome proliferator-activated receptor – Homo sapiens | 34% | 54% | - | 16 | 30
15 | PAN2_1-1, PAN2_3-1 | 1E2Z-A | Apocytochrome F – Chlamydomonas reinhardtii | 29% | 66% | - | 345 | 20
16 | PAN2_1-2, PAN2_2-2, PAN2_3-2 | 2JWO-A | V(D)J recombination-activating protein 2 – Mus musculus | 42% | 29% | - | 345 | 20
17 | PAN2_1-3, PAN2_2-3, PAN2_3-3 | 4BUJ-B | Superkiller protein 3 – Saccharomyces cerevisiae | 32% | 57% | - | 345 | 20
18 | PIK3C2B | 2RD0-A | Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit α isoform – Homo sapiens | 32% | 50% | neoplasms | 16 | 75
19 | PIK3CG | 3V65-B | Low-density lipoprotein receptor-related protein 4 – Rattus norvegicus | 32% | 55% | longevity & HIV pathways | 36 | 380
20 | PRIC285_1-1 | 2LU7-A | Obscurin-like protein 1 – Homo sapiens | 36% | 67% | - | 7 | 16
21 | PTPN13_1-2, PTPN13_3-2, PTPN13_4-3 | 3T30-B | Nucleoplasmin-2 – Homo sapiens | 22% (10% gap) | 74% | Systemic lupus erythematosus, lung cancer & multiple sclerosis | 34 | 79
22 | PTPN13_3-8, PTPN13_4-9 | 1GAK-A | Fertilization protein – Haliotis fulgens | 27% | 69% | Systemic lupus erythematosus, lung cancer & multiple sclerosis | 34 | 79
23 | PTPN14_1-3 | 4LXG-A | hydrolase – Sphingomonas wittichii | 24% | 69% | breast / neoplasms & lymphedema | 38 | 31
24 | PTPN21_1-2 | 4H1Z-A | Enolase – Rhizobium meliloti | 30% | 48% | Graves’ disease | 8 | 14
25 | PTPN3_1-2 | 1GG3-A | Protein 4.1 – Homo sapiens | 53.7% | 53% | - | 12 | 28
26 | RALGDS_1-1, RALGDS_2-1 | 1F1R-A | 3,4-dihydroxyphenylacetate 2,3-dioxygenase – Arthrobacter globiformis | 31% | 56% | - | 44 | 47
27 | RAPGEF2 | 2YW3-A | 4-hydroxy-2-oxoglutarate aldolase/2-dehydro-3-deoxyphosphogluconate aldolase – Thermus thermophilus | 33% | 43% | - | 18 | 31
28 | RASSF4_1, RASSF4_4 | 3RSN-A | Set1/Ash2 histone methyltransferase complex subunit ASH2 – Homo sapiens | 25% | 69% | Alzheimer’s disease | 3 | 14
29 | RASSF6_4 | 3VHD-A | B-1,4-endoglucanase – Prevotella bryantii | 25% | 63% | - | 5 | 18
30 | RP1L1_1-3 | 2XOA-A | Ryanodine receptor 1 – Oryctolagus cuniculus | 40% | 48% | occult macular dystrophy | - | 14
31 | SACS_1, SACS_2 | 1JHJ-A | Anaphase-promoting complex subunit 10 – Homo sapiens | 30% | 57% | spastic ataxia | 15 | 68
32 | SHROOM1_1-2, SHROOM1_2-2 | 1X8M-A | 4-deoxy-L-threo-5-hexosulose-uronate ketol-isomerase – Escherichia coli | 35% | 45% | - | - | 4
33 | SNX27_1, SNX27_2, SNX27_3 | 4GXB-A | Sorting nexin-17 – Homo sapiens | 39% | 67% | - | 9 | 32
34 | SNX31_1-2, SNX31_2-2 | 4GXB-A | Sorting nexin-17 – Homo sapiens | 48% | 50% | - | - | 5
35 | UBXN4_1-1 | 4L77-A | 1,8-cineole 2-endo-monooxygenase – Citrobacter braakii | 35% | 51% | - | 10 | 19
36 | UBXN6_1-1 | 3A79-A | Variable lymphocyte receptor B – Eptatretus burgeri | 31% | 64% | - | 40 | 29
37 | UFM1_2 | 1WXS-A | Ubiquitin-fold modifier 1 – Homo sapiens | 100% | 63% | - | 39 | 24
38 | UHRF1BP1 | 1IXO-A | Serine/threonine-protein phosphatase 2A activator 1 – Saccharomyces cerevisiae | 28% | 58% | - | 2 | 12
39 | USP11_1-2 | 2IQX-A | Phosphatidylethanolamine-binding protein 1 – Rattus norvegicus | 32% | 68% | HIV interaction | 98 | 51
40 | USP25_1-1 | 4H6Y-A | FERM, RhoGEF and pleckstrin domain-containing protein 1 – Homo sapiens | 29% | 65% | - | 32 | 33
41 | USP25_2-1 | 2CWY-A | TTHA0068 – Thermus thermophilus | 26% | 69% | - | 32 | 33
42 | USP25_2-2 | 3ZGJ-A | Putative 4-hydroxyphenylpyruvic acid dioxygenase – Streptomyces coelicolor | 33% | 43% | - | 32 | 33
43 | USP28_1-1, USP28_2-1 | 1AKO-A | Exodeoxyribonuclease III – Escherichia coli | 27% | 47% | - | 49 | 28
44 | USP28_1-2, USP28_2-2 | 1NBF-A | Ubiquitin carboxyl-terminal hydrolase 7 – Homo sapiens | 30% | 63% | - | 49 | 28
 | USP28_2-3 | 3OCJ-A | BPP1064 putative export protein – Bordetella parapertussis | 36% | 51% | - | 49 | 28
45 | USP32_1-4 | 1PMI-A | Mannose-6-phosphate isomerase – Candida albicans | 33% | 40% | - | 26 | 15
46 | USP32_1-5 | 4LFY-A | – Burkholderia cenocepacia | 38% | 41% | - | 26 | 15
47 | USP32_1-6 | 1X4O-A | SURP and G-patch domain-containing protein 1 – Mus musculus | 31% | 58% | - | 26 | 15
48 | USP34_1-1, USP34_2-1, USP34_3-1 | 4B3F-X | DNA-binding protein SMUBP-2 – Homo sapiens | 30% | 65% | - | 36 | 24
49 | USP4_1-3, USP4_2-3 | 2Z1K-A | (Neo)pullulanase – Thermus thermophilus | 32% | 51% | - | 78 | 47
50 | USP40_1-1, USP40_3-2 | 2F57-A | Serine/threonine-protein kinase PAK 7 – Homo sapiens | 36% | 51% | Parkinson’s Disease & Eye Diseases | 3 | 15
51 | USP40_1-2, USP40_3-3 | 3IBD-A | Cytochrome P450 2B6 – Homo sapiens | 36% | 54% | Parkinson’s Disease & Eye Diseases | 3 | 15
52 | USP40_2-1 | 1EW3-A | Major allergen Equ c 1 – Equus caballus | 29% | 56% | Parkinson’s Disease & Eye Diseases | 3 | 15
53 | USP43_1-1 | 3N5G-A | Thymidylate synthase – Homo sapiens | 33% | 64% | - | 9 | 7
54 | USP43_1-2 | 2HW6-A | MAP kinase-interacting serine/threonine-protein kinase 1 – Homo sapiens | 29% | 60% | - | 9 | 7
55 | USP47_1-4 | 1UF2-A | Outer capsid protein P3 – Rice dwarf virus | 26% | 53% | - | 12 | 26
56 | USP47_2-3 | 4AWS-A | NADH:flavin oxidoreductase Sye1 – Shewanella oneidensis | 57% | 45% | - | 12 | 26
57 | USP47_2-4 | 3NWI-A | Zinc transport protein ZntB – Salmonella typhimurium | 35% | 65% | - | 12 | 26
58 | USP48_2-3 | 3GB6-A | Putative fructose-1,6-bisphosphate aldolase – Giardia intestinalis | 35% | 46% | - | 10 | 24
59 | USP48_5-1 | 1S70-A | Serine/threonine-protein phosphatase PP1-β catalytic subunit – Gallus gallus | 27% | 57% | - | 10 | 24
60 | USP48_5-2 | 2ISV-A | Putative fructose-1,6-bisphosphate aldolase – Giardia intestinalis | 35% | 53% | - | 10 | 24
61 | USP48_6-1 | 3LAD-A | Dihydrolipoyl dehydrogenase – Azotobacter vinelandii | 33% | 57% | - | 10 | 24
62 | USP6_1-1 | 3UBF-A | Neural-cadherin – Drosophila melanogaster | 26% | 62% | aneurysmal bone cysts | 15 | 30
63 | USP6_1-2 | 4FN4-A | Short chain dehydrogenase – Sulfolobus acidocaldarius | 38% | 55% | aneurysmal bone cysts | 15 | 30
64 | USP6_1-3, USP6_2-3 | 1K28-D | Baseplate structural protein Gp27 – Enterobacteria phage T4 | 31% | 60% | aneurysmal bone cysts | 15 | 30
65 | USP6_2-2 | 1PGW-2 | RNA2 polyprotein – Bean-pod mottle virus | 30% | 69% | aneurysmal bone cysts | 15 | 30
66 | USP9X_1-3, USP9X_2-3 | 1VJV-A | Ubiquitin carboxyl-terminal hydrolase 6 – Saccharomyces cerevisiae | 38% | 62% | Turner syndrome | 98 | 80
67 | USP9Y_1-1, USP9Y_2-1 | 4NGU-A | TRAP dicarboxylate transporter, DctP subunit – Desulfovibrio desulfuricans | 27% | 64% | Infertility / azoospermia | 6 | 30
68 | USP9Y_1-3, USP9Y_2-3 | 2F1Z-A | Ubiquitin carboxyl-terminal hydrolase 7 – Homo sapiens | 42% | 62% | Infertility / azoospermia | 6 | 30
69 | VCPIP1_1-1 | 4I15-A | Class 1 phosphodiesterase PDEB1 – Trypanosoma brucei | 28% | 69% | - | 26 | 31
70 | VCPIP1_1-3 | 3LXM-A | Aspartate carbamoyltransferase – Yersinia pestis | 32% | 51% | - | 26 | 31
71 | WDR48_1-1, WDR48_5-1 | 1LK5-A | Ribose-5-phosphate isomerase A – Pyrococcus horikoshii | 31% | 53% | - | 70 | 28
72 | WDR48_1-2, WDR48_5-2 | 1R8I-A | TraC – Escherichia coli | 23% | 37% | - | 70 | 28
73 | WDR48_3-1, WDR48_4-1 | 2PBI-A | Regulator of G-protein signalling 9 – Mus musculus | 28% | 66% | - | 70 | 28
74 | WDR48_3-2, WDR48_4-2 | 1IDU-A | Vanadium chloroperoxidase – Curvularia inaequalis | 33% | 78% | - | 70 | 28

2.5 Conclusion

The human genome contains 220 genes that encode 398 unique UBLs. At the time of the analysis, 147 of the UBLs were not annotated as having the ubiquitin-fold. The goal of this project was to obtain structural coverage of all human UBLs without experimentally determining the structure of each of the 398 unique UBLs. This was facilitated by grouping the 398 UBLs into 100 modelling families that represent homologous protein domains with similar structures. NMR spectroscopy was used to screen and prioritize UBLs for structure determination, and 17 human UBLs were structurally characterized using X-ray crystallography and NMR spectroscopy. As a result, the RCSB PDB now has 32% structural coverage of human UBLs, and 81% structural coverage when taking into account homology modelling. Of the 74 remaining human UBLs that lack structural information, 30 are singletons and are 36% similar & 23% identical to protein structures in the PDB. This project provided 3.7% coverage of the human UBLs through experimental structure determination and 6% coverage when taking into account homology models. Structural analyses also provide insight into families of related proteins. In particular, structural analysis of the NFATc2IP and ubiquilin protein families provided insight into protein-protein interactions and facilitated the prediction of novel binding partners.


Chapter 3 Solution NMR structure determination of human Ubiquitin-like domains in NFATc2IP & Ubiquilin-1

Contributions: A. Semesi, M. Garcia & A. Yee assisted with cloning, small-scale sample preparation & small-scale expression/solubility screening. C. Fares, M. Karra, S. Srisalam & S. Houliston assisted with NMR data acquisition and NMR titration. B. Wu, A. Gutmanas & A. Lemak assisted with NMR structure determination. I performed large-scale NMR sample preparation and NMR screening, as well as structure determination and subsequent analyses of NFATc2IP & ubiquilin-1.

3.1 Introduction

Ubiquitin-like domains from two human ubiquitin-like domain-containing proteins, NFATc2IP and Ubiquilin-1, were structurally determined using NMR spectroscopy. The ubiquitin-like domain of human NFATc2IP (residues 342-419) and the ubiquitin-like domain of Ubiquilin-1 (residues 34-112) both share the same β-grasp domain architecture as ubiquitin and other UBLs encoded within the human genome.

Determination of these two structures was part of a collaborative effort that resulted in the structure determination and characterization of 17 human ubiquitin-like domain structures that have expanded our knowledge of the diversity of the ubiquitin fold.

3.1.1 NFATc2IP

NFATc2IP is involved in the nuclear factor of activated T-cells (NFAT) signaling cascade, which is important in the immune response (Rengarajan et al., 2000). The NFAT family of transcription factors (NFATc1, NFATc2, NFATc3, and NFATc4) is characterized by a Rel-homology region and an NFAT-homology region (Macian, 2005). NFATc2 interacts with NFATc2IP, and is present in the cytoplasm prior to translocating to the nucleus upon T-cell receptor stimulation (Rao et al., 1997). SUMO conjugation of NFATc2 leads to nuclear retention, regulation of transcriptional activity and recruitment to nuclear SUMO-1 bodies (Nayak et al., 2009; Terui et al., 2004). NFATc2 contains a putative SUMO interacting motif, which could be involved in the association between NFATc2IP and NFATc2.


3.1.2 Ubiquilin-1

Ubiquilin-1 is one of the four members of the ubiquilin protein family. Ubiquilin proteins contain an N-terminal ubiquitin-like domain and a C-terminal ubiquitin-associated domain, separated by ~450 aa (Mah et al., 2000). The central region of each member of the ubiquilin protein family contains two STI1 motifs, capable of binding to heat shock proteins. Ubiquilin proteins physically associate with proteasomes and ubiquitin ligases, and are thought to modulate protein degradation.

Ubiquilin-1 interacts with ubiquitin-interacting motifs (UIMs) in the proteasomal subunit S5A, ataxin-3, HSJ1a, and EPS15 (Heir et al., 2006; Regan-Klapisz et al., 2005). Ubiquilin-1 also interacts with CD47 and Gβγ, suggesting a role in integrating adhesion and signaling components of cell migration (N'Diaye & Brown, 2003).

3.1.3 Ubiquitin-like Fold

The ubiquitin-like folds of both NFATc2IP & Ubiquilin-1 contain a 5-strand mixed β-sheet that is intercalated by an α-helical core. Comparative analysis of both ubiquitin-like folds reveals minor differences (1-2 aa) in loop lengths, and the most distinct difference is at the C-terminus of the α-helical core (Figure 3.3 & Table 3.2). The Ubiquilin-1 α-helical core is 16 aa and contains a 2-residue break (lysine 59 – serine 60) that allows the three C-terminal residues of the α-helix (histidine 61, threonine 62, aspartic acid 63) to orient back into the fold.

3.2 Experimental Procedures

3.2.1 NFATc2IP UBL domain NMR structure determination

NMR screening was performed on a 78-residue construct of the 2nd ubiquitin-like domain of NFATc2IP, and its HSQC spectrum revealed that it was amenable to structure determination (MGSSHHHHHHSSGLVPRGSTETSQQLQLRVQGKEKHQTLEVSLSRDSPLKTLMSHYEEAMGLSGRKLSFFFDGTKLSGRELPADLGMESGDLIEVWG - SGC clone accession: ubh72.342.419.pET28-MHL_SDC088D093).

The NMR sample was expressed in E. coli BL21 (DE3) in a 125 mL flask containing M9 minimal media (100 µM ZnSO4, 8.55 mM NaCl, 47.6 mM Na2HPO4, 22 mM KH2PO4, 100 mM MgSO4, 2 mM biotin, 1.5 mM thiamine.HCl, 10 mM ZnSO4, and 0.1 M CaCl2), supplemented with 15NH4Cl, 13C6-D-glucose and 50 µg/mL kanamycin, and was inoculated from a glycerol stock of bacteria. The culture was incubated on a shaker for 18 hours at 220 rpm at 37 ºC before being transferred to a 2 L flask containing 1000 mL M9 minimal media supplemented with 50 µg/mL kanamycin, and incubated at 37 ºC until an OD600 of 1.0 was reached. Protein expression was induced with 100 µM IPTG and the cells were incubated for 15.5 hours at 220 rpm at 15 ºC. Cell pellets were obtained by centrifugation, and frozen in 50 mL Falcon tubes at -80 ºC. The frozen cell pellets were thawed by soaking in warm water before being resuspended in 40 mL lysis buffer (15.4 mM tris.HCl, 100 µM ZnSO4 100 µL, 0.5 mM NaCl, and 15 mM imidazole; pH 8.5) and lysed by sonication on ice. The lysate was clarified by centrifugation for 20 min at 4 ºC, and the supernatant was mixed with 2 mL of Ni2+ affinity beads per 40 mL lysate. The mixture was shaken for 20 minutes at 4 ºC, before undergoing centrifugation at 2000 rpm for 6 minutes. The supernatant was decanted and the remaining resin was resuspended and washed twice with lysis buffer, followed by two 5 mL washes with cold wash buffer (15.4 mM tris.HCl, 100 µM ZnSO4 100 µL, 0.5 mM NaCl, and 30 mM imidazole; pH 8.5). The washed resin was transferred to a gravity filter column and washed with an additional 2 mL of wash buffer. The purified protein was then eluted from the resin with 5 mL of elution buffer (15.4 mM tris.HCl, 100 µM ZnSO4 100 µL, 0.5 mM NaCl, and 500 mM imidazole; pH 8.5).


The purified protein was exchanged from elution buffer into MOPS-based NMR buffer (NMR buffer for H2O experiments: pH 8.0, 10 mM MOPS, 500 mM NaCl, 1 mM benzamidine, 0.01% NaN3, 10 µM ZnSO4, 10% D2O, and 90% H2O; NMR buffer for D2O experiments: pH 8.0, 10 mM MOPS, 500 mM NaCl, 1 mM benzamidine, 0.01% NaN3, 10 µM ZnSO4, and 100% D2O) by centrifugal ultrafiltration using 2 mL concentrators with a 3,000 Da molecular weight cut-off (VivaSpin 2 MES) at 3000 rpm, resulting in a final volume of 300 µL and a final protein concentration of 0.9 mM. The concentrated protein was then transferred to a 3 mm NMR tube.

A series of NMR spectra (3D HNCO, 3D HNCA, 3D CBCA(CO)NH, 3D HBHA(CO)NH, 2D 1H-13C Constant Time HSQC, 3D 1H-13C NOESY, 3D 1H-15N NOESY, 3D 1H-13C Aromatic NOESY, 3D (H)CCH-TOCSY, and 3D H(C)CH-TOCSY) were collected at 298 K using a 500 MHz Bruker AVANCE spectrometer, a 600 MHz Bruker AVANCE spectrometer and an 800 MHz Bruker AVANCE spectrometer. After data collection on the unaligned sample, the purified protein was aligned by titrating 12 mg/mL Pf1 co-solvent protease-free phage into the NMR sample until 10 Hz proton splitting was observed. Spectra (2D 1H-15N IPAP HSQC) of the aligned and unaligned samples were obtained using the 500 MHz Bruker AVANCE spectrometer and the 800 MHz Bruker AVANCE spectrometer. NMR data were processed and analyzed using TOPSPIN, NMRPipe, NMRDraw, SPARKY, Abacus/FMCGUI, CNS, TALOS, PALES, PSVS, and WhatIF (Delaglio et al., 1995; Goddard & Kneller; Lemak et al., 2011; Brünger et al., 1998; Brünger, 2007; Shen et al., 2009; Zweckstetter & Bax, 2000; Bhattacharya et al., 2007; Vriend, 1990).

3.2.2 Ubiquilin-1 UBL domain NMR structure determination

The process for NMR structure determination of Ubiquilin-1 was very similar to that of NFATc2IP, with a few minor differences that included the use of the LEX fermentation system and non-uniform sampling. The LEX fermentation system is a high-throughput bioreactor developed at the Structural Genomics Consortium that consists of an enclosure that houses cell cultures within media bottles that are connected to an air manifold via a quick-disconnect manual flow regulator to ensure sufficient oxygenation and mixing of cells at a regulated temperature (Koehn & Hunt, 2009). Of the three constructs generated for ubiquilin-1, a 79-residue construct was determined to be most amenable to structure determination by NMR (SGC clone accession: .034.112.p15Tvlic MGSSHHHHHHSSGRENLYFQGPKIMKVTVKTPKEKEEFAVPENSSVQQFKEEISKRFKSHTDQLVLIFAGKILKDQDTLSQHGIHDGLTVHLVIKTQNRP).

The NMR sample was expressed in E. coli BL21 (DE3) RIL in M9 minimal media supplemented with biotin, thiamine, and 10 µM ZnSO4; 15NH4Cl and 13C-glucose were the sole nitrogen and carbon sources. Starter cultures (50 mL in 250 mL flasks) were prepared with media inoculated with 100 µL of glycerol stock and shaken overnight (18 hours) at 220 rpm at 37 ºC. The starter culture was used to inoculate 500 mL of growth media that was placed in a modified LEX fermentation system at 37 ºC until an OD600 of 1.0 was achieved. Protein expression was induced with 1 mM IPTG and the culture was grown at room temperature for 15.5 hours. Cells were harvested by centrifugation and frozen in 50 mL Falcon tubes at -80 ºC. The frozen cell pellets were thawed, resuspended in 25 mL lysis buffer (20 mM tris.HCl, 100 µM ZnSO4, 0.5 mM NaCl, and 15 mM imidazole, pH 8.5) and lysed by sonication on ice. Lysate was clarified by centrifugation for 20 min at 4 °C and the supernatant was mixed for 20 minutes at 4 °C with 2 mL settled Ni2+ affinity beads. Beads were batch-washed twice with 5 mL of cold wash buffer (20 mM tris.HCl, 100 µM ZnSO4, 0.5 mM NaCl, and 30 mM imidazole, pH 8.5), spun at 2000 rpm for 6 minutes, transferred to a column, and further washed with 2 mL of wash buffer. The purified protein was eluted with 5 mL of elution buffer (20 mM tris.HCl, 100 µM ZnSO4, 0.5 mM NaCl, and 500 mM imidazole, pH 8.5). The purified protein was exchanged into NMR buffer (pH 7.0, 10 mM Tris-HCl, 300 mM NaCl, 10 mM DTT, 1 mM benzamidine, 0.01% NaN3, 1x inhibitor cocktail (Roche), 10 µM ZnSO4, 10% D2O, and 90% H2O) and concentrated using VivaSpin concentrators with a 5,000 Da molecular weight cut-off at 3000 rpm, resulting in a final volume of 300 µL and a protein concentration of 0.5 mM.

The purified protein was transferred to a 5 mm Shigemi NMR tube for data collection, and a series of spectra (3D HNCO, 3D HNCA, 3D CBCA(CO)NH, 3D HBHA(CO)NH, 3D (H)CCH-TOCSY, 3D H(C)CH-TOCSY, 13C-edited aliphatic NOESY, 13C-edited aromatic NOESY, 15N-edited NOESY-HSQC, and 13C Constant Time HSQC) were collected at 25 ºC on an 800 MHz Bruker AVANCE spectrometer and a 600 MHz Bruker AVANCE spectrometer equipped with a z-shielded gradient triple resonance cryoprobe. Chemical shifts were referenced to external DSS. All spectra were non-uniformly sampled, and were processed using NMRPipe, NMRDraw and multidimensional decomposition software (Delaglio et al., 1995). The backbone assignments were obtained using HNCO, CBCA(CO)NH, HBHA(CO)NH, HNCA and 15N-edited NOESY-HSQC spectra. Aliphatic side chain assignments were obtained from H(C)CH-TOCSY, (H)CCH-TOCSY, 13C-edited aliphatic NOESY and 15N-edited NOESY-HSQC spectra. 36 H-N and 39 Cα-CO RDC constraints were generated using SPARKY and PALES. NMR data were processed and analyzed using TOPSPIN, NMRPipe, NMRDraw, SPARKY, MDD, FMCGUI, CYANA, CNS, TALOS, PALES, and PSVS.

Distance restraints for structure calculations were derived from cross-peaks in 15N-edited NOESY-HSQC and 13C-edited aliphatic and aromatic NOESY-HSQC spectra. NOE assignment and structure calculations were performed using FMC-GUI and CYANA. The quality of the structure calculation was assessed by NMR structure quality assessment scores (NMR RPF scores). The best 20 of 100 CYANA structures from the final cycle were selected and subjected to molecular dynamics refinement in explicit water with RDC constraints using the program CNS. The structures were inspected with PROCHECK and MolProbity using the NESG validation software package PSVS.


3.2.3 Comparative analysis of Ubiquilin-1, NFATc2IP, Ubiquitin & SUMO2

Structural models (homology models and experimentally determined models) were inspected using UCSF Chimera, and extraneous atoms were removed (e.g. poly-histidine tags, water molecules, other proteins/peptides, and residues that extended beyond the core ubiquitin-like domain) (Pettersen et al., 2004). The structurally characterized ubiquitin-like domains were structurally aligned and superimposed using UCSF Chimera. Based on the structural alignment, the corresponding core RMSD and Cα RMSD were calculated. Based on both the structural alignment and the secondary structure element alignment, a multiple sequence alignment was generated.
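For reference, the RMSD between two structures that have already been superimposed is the root of the mean squared distance between paired atoms. The sketch below assumes the superposition and the residue pairing have been done elsewhere (in this work, by UCSF Chimera); the function name and the coordinate format are hypothetical.

```python
import math


def paired_rmsd(coords_a, coords_b):
    """RMSD (in the input units, e.g. Å) between paired, pre-superimposed atoms.

    coords_a, coords_b: equal-length sequences of (x, y, z) tuples, where
    coords_a[i] and coords_b[i] are structurally equivalent atoms (e.g. Cα).
    """
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Two hypothetical Cα traces offset by 1 Å along z give an RMSD of 1 Å.
example_rmsd = paired_rmsd([(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)],
                           [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0)])
```

Restricting the paired atoms to the conserved secondary structure elements gives a core RMSD, while using all aligned Cα atoms gives the Cα RMSD.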

Electrostatic potential distributions of 58 human UBLs were evaluated using the Analysis of Electrostatic Similarities Of Proteins (AESOP) framework (Gorham et al., 2011). The X-ray crystal structure coordinates of GABARAPL1 (PDBID: 2R2Q), NFATc2IP_2nd (PDBID: 3RD2), FAF1 (PDBID: 3QX1), USP15 (PDBID: 3PPA), TCEB2 (PDBID: 4B95), NSFL1C (PDBID: 1S3S), RNF2 (PDBID: 3H8H), UBXN7 (PDBID: 1WJ4), BRAF (PDBID: 3NY5), NCF2 (PDBID: 1OEY), PIK3CG (PDBID: 3CST), OASL (PDBID: 1WH3), RGL2 (PDBID: 4JGW), SUMO3 (PDBID: 2IO1), PIK3CD (PDBID: 4XE0), EPB41 (PDBID: 1GG3), EPB41L3 (PDBID: 2HE7), RALGDS (PDBID: 2RGF), ISG15 (PDBID: 3SDL), NF2 (PDBID: 1H4R), MAP1LC3A (PDBID: 3ECI), MAP1LC3B (PDBID: 3VTU), UBQLN3 (PDBID: 1YQB), BAG1 (PDBID: 1WXV), UBL7 (PDBID: 1X1M), USP14 (PDBID: 2AYN), RAD23A (PDBID: 2WYQ), NEDD8 (PDBID: 4FBJ), UHRF1 (PDBID: 2FAZ), PIK3CA (PDBID: 4JPS), RDX (PDBID: 1J19), UBIQUITIN (PDBID: 3B0A & 4HK2), RAF1 (PDBID: 1GUA), and UBLCP1 (PDBID: 2M17) were used for surface charge analysis (Berman et al., 2000). Representative models from 30 NMR ensembles were used: BRAF (PDBID: 2L05), FAU (PDBID: 2L7R), HERPUD1 (PDBID: 1WGD), IQUB (PDBID: 2DAF), ISG15 (PDBID: 2HJ8), NFATc2IP_1st (PDBID: 2L76), NFATc2IP_2nd (PDBID: 2JXX), RAD23B (PDBID: 1UEL), SF3A1 (PDBID: 1ZKH), SUMO1 (PDBID: 1A5R), SUMO2 (PDBID: 2AWT), TBCB (PDBID: 2KJ6), UBIQUITIN (PDBID: 1Q0W & 1YX6), UBL3 (PDBID: 1WGH), UBL4A (PDBID: 2DZI), UBL5 (PDBID: 1UH6), UBQLN1 (PDBID: 2KLC), UBQLN2 (PDBID: 1J8C), UBQLN3 (PDBID: 1WX7), UBTD2 (PDBID: 1TTN), UBXN4 (PDBID: 2KXJ), UFM1 (PDBID: 1WXS), UHRF2 (PDBID: 1WY8), URM1 (PDBID: 1WGK), USP7 (PDBID: 2KVR), mouse ASPSCR1 (PDBID: 2AL3), mouse RGL1 (PDBID: 1EF5), mouse TMUB2 (PDBID: 1WIA), and mouse UBFD1 (PDBID: 1V86).

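Structural models were prepared for electrostatic potential calculations by determining partial charges at a pH of 7.6 and van der Waals radii using PDB2PQR with the PARSE forcefield (Dolinsky et al., 2007; Sitkoff et al., 1994). Electrostatic potentials were calculated using the linearized Poisson-Boltzmann equation,

$$-\nabla \cdot \left[\varepsilon(\mathbf{r})\,\nabla\phi(\mathbf{r})\right] + \kappa^{2}(\mathbf{r})\,\phi(\mathbf{r}) = \frac{e^{2}}{\varepsilon_{0} k_{B} T}\sum_{i} z_{i}\,\delta(\mathbf{r}-\mathbf{r}_{i})$$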

where r represents discrete grid point positions within and around the protein, ε(r) is the dielectric coefficient, ε0 is the vacuum permittivity, κ(r) is the ion accessibility function, ϕ(r) is the electrostatic potential, e is the electron charge, kB is the Boltzmann constant, T is the temperature, and z is the unit or partial charge located at position r_i, represented by δ(r − r_i) (Davis et al., 1990). The Adaptive Poisson-Boltzmann Solver (APBS) software package calculates electrostatic potential by embedding each UBL in a grid and solving the Poisson-Boltzmann equation to determine the electrostatic potential at each grid point based on assigned charge, dielectric coefficient, and ion accessibility (Baker et al., 2001). The dielectric surface was defined using a sphere probe with a radius of 1.4 Å, and the ion accessibility surface was defined using a sphere probe with a radius of 2.0 Å. All UBLs were superimposed within unified grid dimensions (129 × 97 × 97 points), with calculated isopotential contour surfaces plotted at ±1 kBT/e. Electrostatic potentials were visualized using UCSF Chimera (Pettersen et al., 2004). Comparison of the spatial distributions of electrostatic potentials of the UBLs was performed by generating a similarity distance matrix according to the metric:

$$D_{A,B} = \frac{1}{N}\sum_{i,j,k}\frac{\left|\phi_{A}(i,j,k)-\phi_{B}(i,j,k)\right|}{\max\left(\left|\phi_{A}(i,j,k)\right|,\left|\phi_{B}(i,j,k)\right|\right)}$$

where ϕA(i,j,k) and ϕB(i,j,k) are electrostatic potentials of proteins A and B, respectively, at a common grid point (i,j,k), and N the number of grid points. This method implies that proteins having a distance of 0 have identical spatial distributions of electrostatic potentials, whereas those having a distance of 2 have completely different electrostatic potential spatial distributions.
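As an illustration, one metric that fits the description above (an average over N common grid points, bounded between 0 and 2) is the mean of |ϕA − ϕB| / max(|ϕA|, |ϕB|). The Python sketch below assumes that form; the function names, the flat-list grid representation, and the convention that a grid point where both potentials are zero contributes 0 are illustrative assumptions, not details taken from this chapter.

```python
def electrostatic_distance(phi_a, phi_b):
    """Mean normalized absolute difference between two potential grids.

    phi_a, phi_b: flat lists of potentials sampled on the same grid.
    Assumed form: (1/N) * sum(|a - b| / max(|a|, |b|)), which lies in [0, 2].
    """
    if len(phi_a) != len(phi_b):
        raise ValueError("grids must have the same number of points")
    if len(phi_a) == 0:
        raise ValueError("grids must not be empty")
    total = 0.0
    for a, b in zip(phi_a, phi_b):
        scale = max(abs(a), abs(b))
        # Assumption: a point where both potentials are zero contributes 0.
        if scale > 0.0:
            total += abs(a - b) / scale
    return total / len(phi_a)


def distance_matrix(grids):
    """Symmetric matrix of pairwise distances for a list of grids."""
    n = len(grids)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = electrostatic_distance(grids[i], grids[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

Under this assumed form, two identical grids give 0 and two grids of equal magnitude but opposite sign at every point give 2, matching the limits described in the text.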


3.2.4 Protein-protein interaction partner identification

The ScanProsite tool was used to search all human proteins for putative UIMs using motifs of strict ([ED](3)-x(3)-[AG]-x(3)-S-x(2)-[ED]) and weak ([ED]-x(3)-[AG]-x(6)-S-x(2)-[ED]) stringency. The resulting lists of putative UIM-containing human proteins were compared with the experimentally known binding partners of ubiquitin and of ubiquilin family members and their isoforms.
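To make the two motifs concrete, the sketch below translates the PROSITE-style patterns quoted above into Python regular expressions and scans a sequence with them. It is a local approximation for illustration only and does not reproduce the ScanProsite tool itself; the function names are hypothetical, and only the pattern syntax that appears in this section (residue classes, the wildcard x, and fixed repeat counts) is handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regular expression.

    Handles only the syntax used in this section: residue classes such as
    [ED], the wildcard x, and fixed repeat counts such as x(3) or [ED](3).
    """
    parts = []
    for element in pattern.split("-"):
        match = re.fullmatch(r"(\[[A-Z]+\]|[A-Zx])(?:\((\d+)\))?", element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        token, count = match.groups()
        token = "." if token == "x" else token
        parts.append(token + (f"{{{count}}}" if count else ""))
    return "".join(parts)


STRICT = "[ED](3)-x(3)-[AG]-x(3)-S-x(2)-[ED]"
WEAK = "[ED]-x(3)-[AG]-x(6)-S-x(2)-[ED]"


def find_motifs(sequence, pattern):
    """Return (start, end, match) tuples with 1-based inclusive positions."""
    # The lookahead wrapper reports overlapping matches as well.
    regex = re.compile(f"(?=({prosite_to_regex(pattern)}))")
    return [(m.start() + 1, m.start() + len(m.group(1)), m.group(1))
            for m in regex.finditer(sequence.upper())]
```

For example, `find_motifs("EEALELINGYIQKIKSGEED", WEAK)` locates the 15-residue stretch ELINGYIQKIKSGEE, whereas the strict pattern finds nothing in the same fragment.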

Binding partners were identified by searching multiple protein-protein interaction databases (BioGRID, iRefWeb, and Human Protein Reference Database) using protein name, ID, and protein sequence (Turner et al., 2010). Multiple isoforms of ubiquilin family members and NFATc2IP exist, and each isoform was included in the search. Human binding partners observed to interact with non-human forms of ubiquilin and NFATc2IP were also considered in the analysis of potential binding partners.

For proteins known or predicted to interact with ubiquilin-1 and NFATc2IP that lacked experimental structures, secondary structure elements were predicted for the full-length proteins using the JPRED algorithm (Cuff et al., 2000; Cole et al., 2008).

A different approach was used to identify putative binding partners for NFATc2IP. Only two binding partners were known for NFATc2IP. Therefore, bioinformatics analyses were performed on both of these binding partners to identify possible modes of interaction related to the ubiquitin fold. Secondary structure elements were predicted. Each α-helix was analysed to identify similarities with the canonical UIM. Each β-strand was analysed to identify similarities with the canonical SIM.

3.2.5 Binding interface analysis

UCSF Chimera was used to superimpose the newly characterized molecular structures of both ubiquilin-1 and NFATc2IP onto known protein-protein interaction complexes involving ubiquitin:UIM (PDBID: 1Q0W, 1P9D, 1UEL) and SUMO:SIM (PDBID: 2RPQ, 2ASQ & 2KQS).

Residues at varying distances from each atom of the UIM and SIM were annotated. Residues in proximity to the UIM or SIM were further analysed for conservation of, or physicochemical similarity to, the corresponding residues of ubiquitin or SUMO2.
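The distance annotation described above can be sketched as follows. This is a minimal illustration, not the UCSF Chimera workflow used in this chapter: the function name and the tuple-based atom representation are hypothetical, and the default cutoffs of 2, 3 and 4 Å are taken from the distances reported in the figures later in this chapter.

```python
import math


def residues_within(protein_atoms, motif_atoms, cutoffs=(2.0, 3.0, 4.0)):
    """Map each cutoff (in angstroms) to the sorted residue labels that have
    at least one atom within that distance of any motif atom.

    protein_atoms: iterable of (residue_label, (x, y, z)) tuples.
    motif_atoms: iterable of (x, y, z) tuples for the UIM or SIM.
    """
    motif_atoms = list(motif_atoms)
    closest = {}
    for label, xyz in protein_atoms:
        # Shortest distance from this atom to any atom of the motif.
        d = min(math.dist(xyz, m) for m in motif_atoms)
        closest[label] = min(d, closest.get(label, math.inf))
    return {c: sorted(r for r, d in closest.items() if d <= c)
            for c in cutoffs}
```

A residue is reported at a given cutoff as soon as any one of its atoms lies within that distance of any motif atom, so the residue sets grow monotonically with the cutoff.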

Molecular surfaces for each UBL were calculated, as well as hydrophobicity and electrostatic potential distributions. Chemical characteristics near the UIM and SIM binding interfaces were compared between UBLs, and key observations and residues were annotated.
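The figure captions in this chapter state that hydrophobicity is coloured on the Kyte-Doolittle scale. As a hedged illustration of what that scale assigns, the sketch below looks up the published Kyte-Doolittle hydropathy value for each residue and averages it over a set of residues; the function name is hypothetical, and averaging over a residue list is a simplification of surface colouring, not the procedure used to prepare the figures.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(residues):
    """Average Kyte-Doolittle value over a string of one-letter codes."""
    if not residues:
        raise ValueError("at least one residue is required")
    values = [KYTE_DOOLITTLE[r] for r in residues.upper()]
    return sum(values) / len(values)
```

For instance, a patch made up only of valine, leucine and isoleucine side chains scores close to the top of the scale, while a stretch of lysine and glutamic acid scores strongly negative.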

3.3 Results & Discussion

3.3.1 Structure determination

High-quality NMR structures were obtained for both NFATc2IP and ubiquilin-1. Their coordinates were deposited in the Protein Data Bank on November 30th 2007 (NFATc2IP PDBID: 2JXX) and June 30th 2009 (ubiquilin-1 PDBID: 2KLC). Both structures consist of a compact globular β-grasp fold that contains 2 α-helices and a 5-stranded β-sheet, with a Cα RMSD of 1.234 Å for 39 core residues, an overall RMSD of 1.234 Å for all 69 aligned residues, and a structural distance measurement (cutoff 5.0) of 34.382 (Figures 3.1 & 3.2). The α-helical core is packed against one side of the β-sheet, and the ubiquilin-1 α-helix contains a 2-residue break (lysine 59 and serine 60) that allows the three C-terminal residues of the α-helix (histidine 61, threonine 62, aspartic acid 63) to orient back into the fold (Figure 3.3). The second α-helix of ubiquilin-1 and NFATc2IP is 5-6 aa in length and situated at the top of the β-sheet (Table 3.2).

The electrostatic potential distribution at pH 7 is significantly different between ubiquilin-1 and NFATc2IP. Ubiquilin-1 is mostly positively charged and NFATc2IP is mostly negatively charged (Figures 3.4 & 3.5). Both ubiquilin-1 and NFATc2IP contain small hydrophobic patches, while ubiquilin-1 has a larger hydrophobic patch formed by residues valine 47, leucine 65, valine 66, leucine 67, isoleucine 68, isoleucine 73, leucine 74, leucine 93, valine 94, and isoleucine 95, which is within a few angstroms of the putative UIM-binding interface (Figure 3.6).


Each ubiquitin-like domain structure determined as part of this thesis was analysed to characterize the similarities between them. The molecular structure analysis explored four attributes: molecular surface characteristics, electrostatic potential distribution, secondary structure elements, and protein-protein interaction interfaces. The protein-protein interaction interface analysis focused on the UIM and SIM binding interfaces, because the UIM region and SIM region of UBDs are amenable to identification using computational analysis.

Table 3.1: NMR data and refinement statistics.

                                              NFATc2IP          Ubiquilin-1
NMR distance and dihedral constraints
  Distance constraints:
    Total NOE                                 2094              1997
    Intra-residual                            411               421
    Sequential (|i-j| = 1)                    556               566
    Medium-range (2 ≤ |i-j| ≤ 4)              301               331
    Long-range (|i-j| ≥ 4)                    826               679
    Hydrogen bonds                            0                 24
  Dihedral angle constraints:                 109               84
    phi                                       54                41
    psi                                       55                43
Structure statistics
  Violations (mean and s.d.)
    Distance constraints (Å)                  0.038 +/- 0.004   0.016 +/- 0.001
    Dihedral angle constraints (°)            3.680 +/- 6.088   0.855 +/- 0.130
    Max. distance constraint violation (Å)    1.25              0.35
    Max. dihedral angle violation (°)         152.43            5.74
  Deviations from idealized geometry
    Bond lengths (Å)                          1.235 +/- 0.007   1.256 +/- 0.005
    Bond angles (°)                           0.495 +/- 0.008   0.516 +/- 0.009
    Impropers (°)                             0.634 +/- 0.023   0.668 +/- 0.025
  Ramachandran plot
    Most favoured regions (%)                 87.5%             84.4%
    Allowed regions (%)                       12.5%             14.3%
    Generously allowed regions (%)            0.1%              1.3%
    Disallowed regions (%)                    0%                0.1%
  Average pairwise RMSD (Å)
    Heavy                                     1.57 +/- 0.25     1.15 +/- 0.10
    Backbone                                  1.20 +/- 0.35     0.72 +/- 0.12
PDB accession ID                              2JXX              2KLC
BMRB accession ID                             15576             16390


Figure 3.1: Secondary structure and H-bond patterns of ubiquilin-1. Secondary structure elements of ubiquilin-1 showing H-bond patterns and physicochemical properties (blue = arginine/lysine/histidine [positively charged], yellow = phenylalanine/tryptophan/tyrosine [aromatic], dark green = alanine/valine/isoleucine/leucine/methionine [non-polar], light green = glycine [small non-polar], orange = proline, red = glutamic acid/aspartic acid [negatively charged], purple = asparagine/serine/threonine/glutamine [uncharged polar]).

Figure 3.2: Secondary structure and H-bond patterns of NFATc2IP. Secondary structure elements of NFATc2IP showing H-bond patterns and physicochemical properties (blue = arginine/lysine/histidine [positively charged], yellow = phenylalanine/tryptophan/tyrosine [aromatic], dark green = alanine/valine/isoleucine/leucine/methionine [non-polar], light green = glycine [small non-polar], orange = proline, red = glutamic acid/aspartic acid [negatively charged], purple = asparagine/serine/threonine/glutamine [uncharged polar]).


Figure 3.3: Ribbon diagrams of ubiquilin-1, NFATc2IP, ubiquitin, SUMO1, SUMO2 & SUMO3. Ubiquilin-1 and NFATc2IP contain an α-helical break, which also occurs in ubiquitin, SUMO1, SUMO2 and SUMO3.

Table 3.2: Secondary structure elements of NFATc2IP, ubiquilin-1, ubiquitin and SUMO1/2/3. Each element is given as residue range (length); the number in square brackets between two elements is the gap (aa) separating them.

Element order: 1 β-strand, 2 β-strand, 3 α-helix, 4 β-strand, 5 β-strand, 6 α-helix, 7 β-strand

NFATc2IP_2nd   346-354 (9 aa) [4] 359-367 (9 aa) [2] 370-383 (14 aa) [6] 390-393 (4 aa) [2] 396-398 (3 aa) [4] 403-408 (6 aa) [4] 413-418 (6 aa)
Ubiquilin-1    26-32 (7 aa) [2] 35-41 (7 aa) [4] 46-61 (16 aa) [2] 64-69 (6 aa) [2] 72-74 (3 aa) [5] 80-84 (5 aa) [4] 89-96 (8 aa)
Ubiquitin      2-6 (5 aa) [5] 12-16 (5 aa) [5] 22-39 (18 aa) [1] 41-45 (5 aa) [2] 48-49 (2 aa) [5] 55-60 (6 aa) [5] 66-71 (6 aa)
SUMO1          21-28 (7 aa) [4] 32-39 (7 aa) [4] 44-55 (10 aa) [7] 62-65 (4 aa) [10] 76-80 (5 aa) [5] 86-92 (7 aa); one of the seven elements is marked as absent (-)
SUMO2          18-23 (6 aa) [5] 29-34 (6 aa) [5] 40-52 (13 aa) [6] 59-62 (4 aa) [2] 65-66 (2 aa) [15] 82-83 (2 aa) [1] 85-87 (3 aa)
SUMO3          16-22 (7 aa) [5] 28-34 (7 aa) [4] 39-55 (17 aa) [1] 57-61 (5 aa) [2] 64-65 (2 aa) [16] 82-87 (6 aa) [2] 90-91 (2 aa)

Figure 3.4: Molecular surfaces of ubiquilin-1. Four orientations, (x,y,z), (x-90°,y,z), (x,y-90°,z) and (x-180°,y,z), reveal corresponding faces of ubiquilin-1 represented as a ribbon, as a molecular surface coloured by electrostatic potential distribution at pH 7.0 (blue = positive, white = neutral, and red = negative), and as a molecular surface coloured by hydrophobicity on the Kyte-Doolittle scale (blue = hydrophilic, white = neutral, and orange/red = hydrophobic).

Figure 3.5: Molecular surfaces of NFATc2IP. Four orientations, (x,y,z), (x-90°,y,z), (x,y-90°,z) and (x-180°,y,z), reveal corresponding faces of NFATc2IP represented as a ribbon, as a molecular surface coloured by electrostatic potential distribution at pH 7.0 (blue = positive, white = neutral, and red = negative), and as a molecular surface coloured by hydrophobicity on the Kyte-Doolittle scale (blue = hydrophilic, white = neutral, and orange/red = hydrophobic).

Figure 3.6: UIM-interaction interface of ubiquilin-1 and NFATc2IP. A hydrophobic patch (orange) on ubiquilin-1 is near the UIM-interaction interface, consisting of residues valine 47, leucine 65, valine 66, leucine 67, isoleucine 68, isoleucine 73, leucine 74, leucine 93, valine 94, and isoleucine 95. Four aliphatic residues (leucine 262, isoleucine 263, alanine 266, and isoleucine 267; pink) in the putative NFATc2 UIM peptide are closest to the hydrophobic patch. Rotated views are labelled x +90°.

3.3.2 Comparative analysis of ubiquilin-1, NFATc2IP & similar ubiquitin-like modifiers

The ubiquitin fold is the underlying characteristic that unifies all UBLs. However, structural and physicochemical differences lead to the various functional pathways in which UBLs are involved. To identify these differences, a comparative analysis of ubiquilin-1 and NFATc2IP was performed, which was further expanded to include ubiquitin-like modifiers. Despite a core Cα RMSD of 1.234 Å (39 residues) and common secondary structure elements, the sequence identity between ubiquilin-1 & NFATc2IP is only 13% and the sequence similarity is 38%.


3.3.2.1 Similar canonical ubiquitin-like modifiers: ubiquitin & SUMO-2

The sequence identity/similarity between each ubiquitin-like domain and the ubiquitin-like modifiers was calculated. The closest canonical ubiquitin-like modifier for ubiquilin-1 is ubiquitin (35% sequence identity & 54% sequence similarity), and the closest canonical ubiquitin-like modifiers for NFATc2IP are SUMO2 & SUMO4 (35% sequence identity & 55% sequence similarity) (Table 3.3).

Table 3.3: Sequence similarity & identity between NFATc2IP, ubiquilin-1, ubiquitin and SUMO1/2/3/4. Values are percentages, with the number of positions in parentheses.

                NFATc2IP    ubiquilin-1   ubiquitin   SUMO1      SUMO2      SUMO3      SUMO4
Identity
  NFATc2IP_2nd  -           13% (9)       11% (8)     29% (21)   35% (28)   34% (27)   35% (28)
  ubiquilin-1   13% (9)     -             35% (26)    19% (15)   15% (11)   15% (11)   12% (9)
Similarity
  NFATc2IP_2nd  -           38% (27)      41% (29)    54% (40)   55% (44)   53% (43)   55% (44)
  ubiquilin-1   38% (27)    -             54% (40)    42% (33)   41% (30)   41% (30)   35% (26)
Gaps
  NFATc2IP_2nd  -           2% (2)        1% (1)      1% (1)     1% (1)     1% (1)     1% (1)
  ubiquilin-1   2% (2)      -             1% (1)      1% (1)     1% (1)     1% (1)     1% (1)
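The alignment program and scoring scheme behind Table 3.3 are not stated in this section. As a rough sketch of how such values can be derived, the Python function below counts identical, similar and gapped columns in a pairwise alignment and expresses each as a percentage of the alignment length. The function name, the choice of alignment length as the denominator, the rounding, and the user-supplied similarity groups (a stand-in for a substitution-matrix criterion) are all assumptions made for illustration.

```python
def alignment_stats(aligned_a, aligned_b, groups=()):
    """Count identity, similarity and gaps for two aligned sequences.

    aligned_a, aligned_b: equal-length strings with '-' marking gaps.
    groups: iterable of strings; two different residues found in the same
            string are treated as similar (illustrative assumption).
    Returns a dict of (count, rounded percentage of alignment length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment must not be empty")
    length = len(aligned_a)
    identical = similar = gaps = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            gaps += 1
        elif a == b:
            identical += 1
            similar += 1  # identical positions also count as similar
        elif any(a in g and b in g for g in groups):
            similar += 1

    def pct(n):
        return round(100.0 * n / length)

    return {
        "identity": (identical, pct(identical)),
        "similarity": (similar, pct(similar)),
        "gaps": (gaps, pct(gaps)),
    }
```

Counting identical positions as similar mirrors the way the similarity counts in Table 3.3 are always larger than the identity counts for the same pair.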

3.3.2.2 Structural comparison between ubiquilin-1 & NFATc2IP

Ubiquilin-1 and NFATc2IP share 8 identical residues, 5 within secondary structure elements and 3 within loop regions. All three of the identical residues in loop regions are small and flexible: one serine and two glycines. Most of the conserved residues are within the β-sheet; however, conserved surface-exposed residues are scattered throughout the molecular surface of the proteins (Figure 3.7). This may mean that residue conservation between ubiquilin-1 and NFATc2IP is related to the common fold and not to shared binding partners.


Figure 3.7: Similarities between ubiquilin-1 and NFATc2IP. Ubiquilin-1 & NFATc2IP share 8 identical residues (5 within secondary structure elements) and 27 similar residues (12 within secondary structure elements). Molecular surface diagrams highlight all of the conserved (dark) & similar residues (light) within secondary structure (blue) or loops (green).

Ubiquilin-1                          NFATc2IP
Within secondary structure elements
28-V    Aliphatic                    30-V
30-V    Aliphatic                    32-G
38-E    Acidic                       40-E
41-V    Aliphatic                    43-L
47-V    Aliphatic                    49-L
49-Q    Polar/Uncharged              51-T
58-F    Non-Polar/Uncharged          60-M
69-F    Aromatic                     71-F
74-L    Aliphatic                    76-L
84-G    Aliphatic/Small              86-G
91-V    Aliphatic                    93-I
93-L    Aliphatic                    95-V
Outside secondary structure elements
45-S    Polar/Uncharged              47-S
71-G    Aliphatic                    73-G
88-G    Aliphatic                    90-G


3.3.2.3 Structural comparison between ubiquilin-1 & ubiquitin

Ubiquitin and ubiquilin-1 share 26 conserved residues, and the C-terminal β-strand is almost entirely conserved. Many residues are also conserved throughout the β-sheet and major α-helix. Conserved residues exist on the major α-helix turns that face the core of the fold. Conserved surface-exposed residues are also visible on all faces of the protein, and a prominent patch of conserved residues is within the UIM binding interface of ubiquitin. The presence of this region of conserved residues could result in a common binding partner between ubiquitin and ubiquilin-1. Analysis of protein-protein interaction databases revealed that 205 proteins interact with both ubiquitin & at least one member of the ubiquilin family, while 2407 unique proteins have been observed to interact with ubiquitin, and 1512 unique proteins have been observed to interact with at least one member of the ubiquilin family. At least one putative UIM has been observed in 106 of the 205 proteins known to interact with both ubiquitin and a member of the ubiquilin family (Appendix III).

Conserved residues outside secondary structure regions are found mostly at both the N-terminus and C-terminus of the minor α-helix (Figure 3.8).


Figure 3.8: Similarities between ubiquilin-1 and ubiquitin. Ubiquilin-1 & ubiquitin share 26 identical residues (20 within secondary structure elements) and 40 similar residues (29 within secondary structure elements). Molecular surface diagrams highlight all of the conserved (dark) & similar residues (light) within secondary structure (blue) or loops (green).

Ubiquilin-1                          Ubiquitin
Within secondary structure elements
26-M    Non-Polar/Uncharged          1-M
28-V    Aliphatic                    3-I
30-V    Aliphatic                    5-V
31-K    + Charged                    6-K
32-T    Polar/Uncharged              7-T
41-V    Aliphatic                    17-V
46-S    Polar/Uncharged              22-T
47-V    Aliphatic                    23-I
49-Q    Polar/Uncharged              25-N
51-K    + Charged                    27-K
54-I    Aliphatic                    30-I
55-S    Polar/Uncharged              31-Q
57-R    + Charged                    33-K
63-D    - Charged                    39-D
64-Q    Polar/Uncharged              40-Q
67-L    Aliphatic                    43-L
68-I    Aliphatic                    44-I
69-F    Non-Polar/Uncharged          45-F
72-K    + Charged                    48-K
74-L    Aliphatic                    50-L
80-L    Aliphatic                    56-L
81-S    Polar/Uncharged              57-S
90-T    Polar/Uncharged              66-T
91-V    Aliphatic                    67-L
92-H    Aromatic                     68-H
93-L    Aliphatic                    69-L
94-V    Aliphatic                    70-V
95-I    Aliphatic                    71-L
96-K    + Charged                    72-R
Outside secondary structure elements
34-K    + Charged                    11-K
70-A    Aliphatic                    46-A
71-G    Aliphatic                    47-G
76-D    - Charged                    52-D
79-T    Polar/Uncharged              55-T
85-I    Aliphatic                    61-I


3.3.2.4 Structural comparison between NFATc2IP & SUMO2

Structure conservation between NFATc2IP and SUMO2 is mostly within the β-sheet and in loop regions, with some conserved residues within the major α-helix. The conserved loop residues are at the C-terminus of the major α-helix and at the C-termini of a couple of the β-strands. Some surface-exposed conserved residues from loop regions are visible as patches, with multiple conserved residues bordering the UIM binding interface and limited conservation within the SIM binding interface (Figure 3.9). This may mean that NFATc2IP and SUMO2 do not share a common UIM or SIM partner. However, conservation near the binding interfaces could mean partial conservation between NFATc2IP binding partners and SUMO2 binding partners.


3.3.2.5 Structural differences between NFATc2IP_2nd & SUMO2

Figure 3.9: Similarities between NFATc2IP and SUMO2. NFATc2IP and SUMO2 share 28 identical residues (14 within secondary structure elements) and 44 similar residues (16 within secondary structure elements). Molecular surface diagrams highlight all of the conserved (dark) & similar residues (light) within secondary structure (blue) or loops (green). To assist with showing the location of the UIM-binding interface & SIM binding interface, both a β-strand from a SIM (purple) and an α-helix from a UIM (yellow) are superimposed on the structure.

NFATc2IP                             SUMO2
Within secondary structure elements
26-L    Aliphatic                    18-I
27-Q    Polar/Uncharged              19-N
28-L    Aliphatic                    20-L
29-R    + Charged                    21-K
30-V    Aliphatic                    22-V
32-G    Aliphatic                    24-G
37-Q    Polar/Uncharged              28-S
39-L    Aliphatic                    30-V
43-L    Aliphatic                    34-I
49-L    Aliphatic                    40-L
52-L    Aliphatic                    43-L
53-M    Non-Polar/Uncharged          44-M
56-Y    Aromatic                     47-Y
58-E    - Charged                    49-E
69-F    Non-Polar/Uncharged          60-F
71-F    Non-Polar/Uncharged          62-F
74-T    Polar/Uncharged              65-Q
76-L    Aliphatic                    67-I
91-D    - Charged                    82-D
93-I    Aliphatic                    84-I
94-E    - Charged                    85-D
95-V    Aliphatic                    86-V
96-W    Non-Polar/Uncharged          87-F
Outside secondary structure elements
45-R    + Charged                    36-R
48-P    Polar/Uncharged              39-P
61-G    Aliphatic                    52-G
62-L    Aliphatic                    53-L
63-S    Polar/Uncharged              54-S
65-R    + Charged                    56-R
72-D    - Charged                    63-D
73-G    Aliphatic                    64-G
82-P    Polar/Uncharged              73-P
83-A    Aliphatic                    74-A
85-L    Aliphatic                    76-L
87-M    Non-Polar/Uncharged          78-M
88-E    - Charged                    79-E


3.3.3 From Structure to Function: Exploring Protein-Protein Interactions involving ubiquitin-like domains

As described in Chapter One, ubiquitin is known to be involved in many weak and transient interactions. One of these interactions involves a UIM, which is an α-helix found in hundreds of known ubiquitin binding partners. The UIM is characterized by a conserved motif (E/D-E/D-E/D-Φ-x-x-A-x-x-x-S-x-x-E/D; where Φ is a hydrophobic residue) (Fisher et al., 2003).

3.3.3.1 The Ubiquitin-Interacting Motif interaction interface

A few UIM:ubiquitin complexes have also been structurally characterized (Table 3.4). Two of the UIM:ubiquitin complexes involve a UIM within the 26S proteasome non-ATPase regulatory subunit 4 (Hofmann & Falquet, 2001). The 26S proteasome non-ATPase regulatory subunit 4 UIM does not fit the canonical UIM motif even though the binding mode and interaction features remain the same. The key differences between the canonical UIM motif and the 26S proteasome non-ATPase regulatory subunit 4 UIM are a glutamine neighbouring the conserved hydrophobic residue within the acidic N-terminal region of the motif, and a 4-residue gap, instead of the canonical 2-residue gap, between the conserved serine and the acidic C-terminal region.

Table 3.4: UIM:ubiquitin complexes deposited in the PDB, along with UIM sequence.

PDB_ID   Year   UIM-containing protein (UniProt)                              Ubiquitin or ubiquitin-like domain (UniProt)                       UIM sequence
1UEL:B   2003   26S proteasome non-ATPase regulatory subunit 4 (P55036)       UV excision repair protein RAD23 homolog B, H. sapiens (P54727)    …EEEQIAYAMQMSLQGAE… (doesn't fit canonical motif)
1P9D:A   2003   26S proteasome non-ATPase regulatory subunit 4 (P55036)       UV excision repair protein RAD23 homolog A, H. sapiens (P54725)    …EEEQIAYAMQMSLQGAE… (doesn't fit canonical motif)
1Q0W:A   2003   Vacuolar protein sorting-associated protein VPS27 (P40343)    ubiquitin, S. cerevisiae (P0CG63)                                  …EDEEELIRKAIELSLKE…
2D3G:P   2005   HGS HRS (O14964)                                              ubiquitin, B. taurus (P0CH28)                                      …EEEELQLALALSQSEAEE…


Analysis of the UIM:ubiquitin complexes reveals structural conservation of acidic residues at the termini of the UIM-containing α-helix, as well as of the general positioning of the conserved serine residue and of the hydrophobic residues along the ubiquitin-facing surface of the UIM between the N-terminal acidic residues and the conserved serine (Figure 3.10 & Figure 3.11).

Panels: PSMD4 (1UEL / 1P9D), VPS27 (1Q0W), HGS HRS (2D3G).

Figure 3.10: UIM α-helices from PSMD4, VPS27 and HGS. The UIMs from PSMD4, VPS27 and HGS were structurally characterized in complex with ubiquitin; acidic residues are red, basic residues are blue, hydrophobic residues are orange, and serines are green. Three conserved regions are highlighted: the two acidic termini are highlighted with blue boxes and the conserved serine is highlighted by the green box.


Figure 3.11: Ubiquitin:PSMD4(UIM) complex. Ubiquitin residues within 3Å (isoleucine 68, isoleucine 73, alanine 70, G71, H92) and 4Å (valine 66, isoleucine 68, alanine 70, G71, K72, isoleucine 73, H92, valine 94) of the UIM displayed as sticks.

Analysis of the ubiquitin residues in proximity to the UIM, and of the corresponding residues within a superimposed ubiquilin-1 molecular structure, reveals amino acid conservation: 6 out of 6 ubiquilin-1 residues at 3Å (isoleucine 68, isoleucine 73, alanine 70, G71, H92), and 14 out of 16 ubiquilin-1 residues at 4Å (valine 66, isoleucine 68, alanine 70, G71, K72, isoleucine 73, H92, valine 94). All of these conserved residues are also positioned to interact with the hydrophobic residues of the UIM (Figure 3.12).


Figure 3.12: UBL residues within UIM-interaction interface. This chart displays amino acids from ubiquitin, UBTD2 and ubiquilin-1 that are within 2Å, 3Å, and 4Å of each amino acid within the α-helix from the PSMD4 UIM. Acidic amino acids are red, hydrophobic amino acids are green, and serine is blue. Black arrows identify amino acids that are conserved between ubiquitin and ubiquilin-1.

3.3.3.2 Putative UIM Interaction Interface: Conserved Amino Acids

Within the ubiquitin-like domain of ubiquilin family members, two stretches of highly conserved residues (10 aa and 14 aa in length) lie in the two C-terminal β-strands (Figure 3.13).

Figure 3.13: Multiple sequence alignment of UBLs from ubiquilin family members. Two conserved regions correspond to amino acids within 4Å of UIM atoms.

3.3.3.3 Putative UIM Interaction Interface: Similar Electrostatic Potential Distribution

Clustering of ubiquitin-like domain molecular structures based on the electrostatic potential distribution at pH 7 within 4 Å of each UIM atom revealed a strong similarity between ubiquitin and members of the ubiquilin family (Figure 3.14). For this reason, we looked at potential UIMs within proteins known to interact with both ubiquitin and at least one member of the ubiquilin family.

Figure 3.14 leaf labels (pH 7): Ubiquilin-3, Ubiquilin-1, Ubiquitin, Ubiquilin-2.

Figure 3.14: Similarity tree based on electrostatic potential within 4 Å of UIM-binding interface. A UIM α-helix is superimposed in the UIM binding interface to show the location & orientation of the UIM.

3.3.3.4 Surveying Known UIM-Binding Partners

There are currently 78 human proteins with annotated UIMs, of which 16 are known to interact with ubiquitin (Table 3.5). Five human proteins with annotated UIMs are also known to interact with at least one member of the ubiquilin family, and two of these proteins interact with multiple ubiquilin proteins (Letunic et al., 2014; Turner et al., 2010) (Table 3.6). All 5 of these proteins have also been observed to interact with ubiquitin. However, these numbers could be underestimates, as demonstrated by known α-helices with minor variations in the UIM sequence that have been shown to interact with ubiquitin.

Table 3.5: Human proteins that contain at least one canonical UIM motif and observed to interact with ubiquitin, along with the number of supporting publications and supporting structural complexes that have been deposited in the PDB.

UIM     Ubiquitin     Interaction ID   Supporting Publications   Supporting Structure
PSMD4   UBC           700227           13                        1UEL, 1P9D
HGS     UBC           1024774          9                         -
HGS     UBC (Bovin)   728136           3                         2D3G
DNJB2   UBC           1007317          1                         1Q0W
DNJB2   UBC (Bovin)   877312           1                         -
EPN1    UBC           962133           3                         -
EPN2    UBC           891993           1                         -
EPS15   UBC           1010404          6                         -
AN13A   UBC           910747           1                         -
STAM1   UBC           1008921          1                         -
STAM1   UBC           1129713          5                         -
STAM2   UBC           1061783          1                         -
AKIB1   UBC (Bovin)   1078418          1                         -

Table 3.6: Human proteins that contain at least one canonical UIM motif and observed to interact with members of the ubiquilin family (Turner et al., 2010).

UIM     Ubiquilin   Interaction ID   Supporting Publications
PSMD4   UBQLN4      670139           1
PSMD4   UBQLN2      693598           3
PSMD4   UBQLN1      1155859          3
DNJB2   UBQLN1      772775           1
HGS     UBQLN1      840585           2
HGS     UBQLN4      898735           1
STAM2   UBQLN4      883239           1
EPS15   UBQLN1      1011809          1

There are 368 human proteins annotated to interact with members of the ubiquilin family, and 827 human proteins known to interact with ubiquitin. There are 202 proteins that have been shown to interact with both ubiquitin & at least one member of the ubiquilin family, of which 57 are human proteins. At least one putative UIM has been observed in 61 of the 202 proteins (17 of the 57 human proteins) known to interact with both ubiquitin & at least one member of the ubiquilin family (Appendix III).

Table 3.7: 17 human proteins that interact with both human ubiquitin and a member of the ubiquilin family, and that also contain at least one UIM motif.

ANCHR   EPS15   PIN1    RD23A   STAM1   UBP34
DNJB2   HD      PSMD3   RNF11   STAM2   USP9X
EF1A1   HGS     PSMD4   SAE2    UBE3A

Analysis of bound UIMs revealed variability within the canonical UIM motif. This includes a variable-length stretch of residues between the N-terminal acidic residues and the conserved alanine (i.e., PSMD4 and DNJB2 have a stretch of 4 residues, while EPN1 has 3 residues that separate the acidic residues from the alanine), and a variable-length stretch of residues between the conserved serine and the C-terminal acidic residues (i.e., PSMD4 has 4 residues, while DNJB2 and EPN1 have 2 residues that separate the serine from the C-terminal acidic residues). PIN1 had a few additional differences: hydrophobic residues within the N-terminal acidic residue stretch, a glycine instead of a conserved alanine near the N-terminal acidic residue stretch, a longer stretch of residues between the conserved glycine/alanine and the conserved serine, and a single glycine separating the conserved serine and the C-terminal acidic residues (Figure 3.15).

PSMD4   EEEQIAYAMQMSLQGAE
DNJB2   EDEEELIRKAIELSLKE
EPN1    EEEELQLALALSQSEAEE
PIN1    TRTKEEALELINGYIQKIKSGEEDFESLAS

Figure 3.15: Sequence alignment of UIMs within PSMD4, DNJB2, EPN1 and PIN1. Sequence alignment of three structurally characterized UIMs (PSMD4, DNJB2 and EPN1), as well as the putative UIM in PIN1. Acidic residues are red, basic residues are blue, hydrophobic residues are orange, and serine is green.

To take into account this variability, as well as variability introduced by the structural variance between UBLs, 4 alternate UIM motifs were used when searching for putative UIMs in known binding partners of both ubiquitin and members of the ubiquilin family (Table 3.8). Six of these proteins have no molecular structure deposited in the PDB, while the remaining 11 proteins have at least one structure within the PDB. PIN1 stands out because its molecular structure has been deposited into the PDB 45 times (Table 3.8).


Table 3.8: UIM motif and 4 variations of the UIM motif were used to identify 17 human proteins that interact with both human ubiquitin and a member of the ubiquilin family.

UniProt   Protein        Position      Match              PDB structures

[ED](3)-x(3)-[AG]-x(3)-S-x(2)-[ED]
P25686    DNJB2_HUMAN    252 – 265     DEDlqlAmaySlsE     2
O14964    HGS_HUMAN      260 – 273     EEElqlAlalSqsE     4
P55036    PSMD4_HUMAN    232 – 245     EEEarrAaaaSaaE     5
Q92783    STAM1_HUMAN    173 – 186     EEDlakAielSlkE     3
O75886    STAM2_HUMAN    167 – 180     DEDiakAielSlqE     3

[ED]-x(3)-[AG]-x(3)-S-x(2)-[ED]
Q96K21    ANCHR_HUMAN    208 – 219     DerqGsipStqE       0
P25686    DNJB2_HUMAN    211 – 222     DlalGlelSrrE       1
P25686    DNJB2_HUMAN    254 – 265     DlqlAmaySlsE       1
P42566    EPS15_HUMAN    881 – 892     DlelAialSksE       2
P42858    HD_HUMAN       1261 – 1272   EkfgGflrSalD       0
O14964    HGS_HUMAN      262 – 273     ElqlAlalSqsE       2
P55036    PSMD4_HUMAN    215 – 226     ElalAlrvSmeE       5
P55036    PSMD4_HUMAN    234 – 245     EarrAaaaSaaE       5
Q9Y3C5    RNF11_HUMAN    141 – 152     EpvdAallSsyE       0
Q92783    STAM1_HUMAN    175 – 186     DlakAielSlkE       3
O75886    STAM2_HUMAN    169 – 180     DiakAielSlqE       3

[ED]-x(3)-[AG]-x(4)-S-x(2)-[ED]
Q9UBT2    SAE2_HUMAN     483 – 495     EdgkGtiliSseE      4
Q05086    UBE3A_HUMAN    98 – 110      EnskGapnnScsE      0

[ED]-x(3)-[AG]-x(5)-S-x(2)-[ED]
P25686    DNJB2_HUMAN    71 – 84       EgltGtgtgpSraE     2
P25686    DNJB2_HUMAN    254 – 267     DlqlAmayslSemE     2
P68104    EF1A1_HUMAN    319 – 332     DvrrGnvagdSknD     1
P42858    HD_HUMAN       409 – 422     EesgGrsrsgSivE     6
O14964    HGS_HUMAN      262 – 275     ElqlAlalsqSeaE     6
P55036    PSMD4_HUMAN    213 – 226     DpelAlalrvSmeE     8
Q92783    STAM1_HUMAN    173 – 186     EedlAkaielSlkE     3
O75886    STAM2_HUMAN    167 – 180     DediAkaielSlqE     3
Q93008    USP9X_HUMAN    1682 – 1695   EqhdAleffnSlvD     0

[ED]-x(3)-[AG]-x(6)-S-x(2)-[ED]
P68104    EF1A1_HUMAN    403 – 417     DmvpGkpmcveSfsD    7
P42566    EPS15_HUMAN    576 – 590     EvttAvtekvcSelD    0
Q13526    PIN1_HUMAN     87 – 101      ElinGyiqkikSgeE    45
O43242    PSMD3_HUMAN    52 – 66       DgktAaaaaehSqrE    0
P55036    PSMD4_HUMAN    255 – 269     DsddAllkmtiSqqE    5
P54725    RD23A_HUMAN    150 – 164     EedaAstlvtgSeyE    3
Q9UBT2    SAE2_HUMAN     218 – 232     EpteAeararaSneD    5
Q70CQ2    UBP34_HUMAN    786 – 800     EknmAdfdgeeSgcE    3
Q70CQ2    UBP34_HUMAN    1672 – 1686   EscsGlyklslSglD    3


3.3.3.5 PIN1 – Peptidyl-Prolyl cis/trans Isomerase

Peptidyl-prolyl cis/trans isomerase (PIN1) regulates protein function by inducing a conformational change of peptidyl-prolyl bonds in polypeptide chains after phosphorylation, and plays a significant role in cell cycle regulation and cancer development (Lippens et al., 2007). PIN1 also regulates the function and processing of Tau and APP, and is important for protecting against age-dependent neurodegeneration. PIN1 is also the only gene known so far that, when deleted in mice, can cause both tau and Aβ-related pathologies in an age-dependent manner that resembles human Alzheimer’s disease (Liou et al., 2003).

PIN1 has been associated with ubiquitin through its ubiquitylation, and has been experimentally observed to interact with ubiquilin-4 in a yeast two-hybrid assay (Lim et al., 2006). However, the mode of that interaction remains unknown.

3.3.3.6 Identifying a putative UIM in PIN1

PIN1 consists of 14 secondary structure elements (10 β-strands & 4 α-helices). The putative UIM is within the solvent-exposed α-helix 1.

EEALELINGYIQKIKSGEED
HHHHHHHHHHHHHHHHTSS-

Figure 3.16: Putative human PIN1 UIM. Human PIN1 protein with the putative UIM highlighted, along with corresponding UIM amino acid sequence highlighting conserved acidic residues (red), conserved glycine (green), and a conserved serine (blue).


The putative UIM identified within PIN1 contains non-canonical features, including hydrophobic residues within the N-terminal acidic residue stretch (i.e., …EEALELING…), a glycine instead of a conserved alanine near the N-terminal acidic residue stretch (i.e., …EEALELING…), and a longer stretch of residues between the conserved glycine/alanine and the conserved serine (i.e., 6 residues …GYIQKIKS… instead of 3 residues …GMQMS…, which corresponds to an extra turn in the α-helix).

PIN1 has been structurally characterized by X-ray crystallography and NMR, with 45 structures deposited in the PDB, and the putative UIM identified within PIN1 corresponds to an α-helical region of the protein (Figure 3.16). For this reason, one of the full-length PIN1 constructs used for structure determination by NMR was obtained and used for NMR titration to test the hypothesis that PIN1 contains a UIM that can interact with the ubiquilin-1 UIM-binding interface.

3.3.3.7 Ubiquilin-1 & PIN1 NMR Titration

NMR titration was performed using the ubiquitin-like domain of 15N-ubiquilin-1 (corresponding to PDB-ID: 2KLC) and the full length PIN1 protein (corresponding to PDB-ID: 1NMV; BMRB: 5305).

2KLC was solved by NMR in TRIS buffer with NaCl, NaN3, benzamidine, ZnSO4, and DTT by our group in 2009, and 1NMV was solved by NMR in phosphate buffer with DTT, EDTA and 50-100 mM sodium sulfate by Bayer et al. in 2003.

A series of NMR titrations were attempted at pH 6.5 and pH 7.0 in buffer optimized for a UIM:ubiquitin interaction (50 mM sodium phosphate, 1 mM DTT, 10% D2O / 90% H2O) based on the previously deposited UIM:ubiquitin complex [PDBID: 2RR9], but no chemical shift change was visible. An additional NMR titration was performed at pH 8.0 in buffer optimized for ubiquilin-1 (10 mM TRIS, 300 mM sodium chloride, 0.01% sodium azide, 1 x inhibitor cocktail [Roche], 1 mM benzamidine, 10 µM ZnSO4, 10 mM DTT, 10% D2O / 90% H2O), corresponding to the same buffer used to determine the ubiquilin-1 structure [PDBID: 2KLC], and 9 chemical shift peak changes were observed at a 1:20 ubiquilin-1:PIN1 molar ratio. These peak shifts included D63, K72, I73, L74, Q82, H92, V94, and K96 (Figure 3.17). These results correspond to amino acids predicted to be within the UIM-binding site of ubiquilin-1 (Figure 3.18).


Figure 3.17: Ubiquilin-1:PIN1 NMR titration. HSQC (64 scans) from NMR titration from 1:0 ubiquilin-1:PIN1 (blue) to 1:20 ubiquilin-1:PIN1 (red); 150 µM 15N-ubiquilin-1 + 3 mM PIN1 at pH 8.0 (298K) in 40 µL sample volume with 50 mM sodium phosphate and 1 mM DTT.
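Peak shifts in a 1H-15N HSQC titration are often summarized as a single combined chemical shift perturbation per residue. The sketch below shows one commonly used weighting, assuming a 15N scaling factor of 0.14; that factor, the function names and the example peak positions are assumptions for illustration and are not values reported in this thesis. It also reproduces the titrant concentration implied by the 1:20 molar ratio.

```python
import math


def combined_csp(h_free, n_free, h_bound, n_bound, n_scale=0.14):
    """Combined 1H/15N chemical shift perturbation in ppm.

    n_scale down-weights the 15N dimension; 0.14 is a common convention
    and is an assumption here, not a value taken from the thesis.
    """
    delta_h = h_bound - h_free
    delta_n = n_bound - n_free
    return math.sqrt(delta_h ** 2 + (n_scale * delta_n) ** 2)


def titrant_concentration_um(protein_um, molar_ratio):
    """Titrant concentration (µM) for a 1:molar_ratio protein:titrant sample."""
    return protein_um * molar_ratio


# Made-up peak positions (ppm) for a single residue, for illustration only.
example_csp = combined_csp(8.00, 120.0, 8.03, 120.4)

# 150 µM 15N-ubiquilin-1 at a 1:20 ratio corresponds to 3000 µM (3 mM) PIN1.
pin1_um = titrant_concentration_um(150, 20)
```

Residues would then typically be ranked by this value and compared against a threshold chosen for the data set at hand.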

3.3.3.8 Analysis of the ubiquilin-1 & PIN1 interface

Analysis of the ubiquilin-1:PIN1 interface reveals an extra α-helical turn within the UIM resulting from an additional three residues between the conserved glycine and conserved serine. The UIM-binding region has several structural features: β-strands 3, 4 & 5 curve around the UIM, a phenylalanine & histidine are near the conserved serine of the UIM, and an isoleucine and valine are near the leucine 262 – isoleucine 263 within the acidic N-terminal region of the UIM. The molecular surface of the UIM-binding interface is positively charged, which could mediate an interaction with the acidic residues at both termini of the UIM. The 9 residues corresponding to the chemical shift changes in the NMR titration (aspartic acid 63, lysine 72, isoleucine 73, leucine 74, glutamine 82, histidine 92, valine 94, and lysine 96) are all within the UIM-interaction interface, and all of the residues were predicted to interact with the UIM based on proximity to the putative UIM-binding site, and comparative analysis of the UIM:ubiquitin complexes deposited in the PDB.

Analysis of the UBLs of ubiquilin family members reveals that 7 of the 9 residues are conserved throughout the family. The two residues that are not conserved are isoleucine → glutamine and valine → arginine. Both of these residues interact with the same isoleucine on the UIM, which is next to the N-terminal acidic region of the UIM. This is the same region where hydrophobic residues are inserted in the acidic region of the PIN1 UIM. Additional experiments are necessary to validate and further characterize the ubiquilin-1:PIN1 interaction (Chapter Five).

Figure 3.18: Putative ubiquilin-1:PIN1 interaction. Ubiquilin-1 modelled with PIN1 (blue α-helix) highlighting 9 stick residues corresponding to chemical shift changes in the NMR titration.


3.3.4 Binding-Partner Driven - Structural analysis of the SUMO-Interacting Motif binding interface

For NFATc2IP, a different approach was taken for identifying a potential binding partner. Instead of searching for SIMs in known binding partners of both ubiquitin and ubiquilin, the sequence and secondary structure of all known binding partners of NFATc2IP were analyzed to identify a possible mode of interaction.

3.3.4.1 NFATc2IP Binding Partners

Human NFATc2 has been observed to interact with 28 human proteins, in addition to HIV tat and HIV Vpr (Turner et al., 2010). Of the NFATc2-interacting proteins, only NFATc2IP contains two UBLs. NFATc2IP has been observed to interact with 11 human proteins: B-ATF-3, NFATc2, RNF4, SREK1, SUMO2, TRAF1/EBI6, TRAF2/TRAP3, TRAF3, TRAF5/RNF84, TRAF6/RNF85, and ubiquitin (Turner et al., 2010). NFATc2IP contains an arginine-rich N-terminus and two UBLs at its C-terminus. NFATc2IP is a homologue of yeast DNA repair factor RAD60, sharing 13% sequence identity along the full length of the protein and 22% sequence identity between the second ubiquitin-like domain of NFATc2IP and the lone ubiquitin-like domain of RAD60.

Our analyses revealed that SUMO2 and SUMO4 are the ubiquitin-like modifiers that are most similar to NFATc2IP, with 35% sequence identity and 55% sequence similarity (Table 3.3). Based on the similarity between NFATc2IP and members of the SUMO family, we performed sequence analysis of the known binding partners of NFATc2IP to determine whether there were β-strands similar to the canonical SIM motif.
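Percent identity and percent similarity of the kind quoted above can be computed directly from a pairwise alignment. The sketch below is a minimal illustration, assuming that gapped columns are excluded from the denominator and that "similar" means membership in the same residue group. The grouping in `SIMILAR_GROUPS`, the function name and the toy alignment are assumptions; alignment tools differ in both conventions, so the numbers produced this way will not necessarily match those reported by a specific tool.

```python
# Hypothetical residue similarity groups (an assumption for illustration).
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "ST", "NQ", "AG"]
GROUP_OF = {res: idx for idx, group in enumerate(SIMILAR_GROUPS) for res in group}


def identity_and_similarity(aligned_a, aligned_b):
    """Return (% identity, % similarity) for two aligned sequences.

    Both inputs must have the same length and use '-' for gaps.
    Columns containing a gap in either sequence are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != "-" and y != "-"
    ]
    if not columns:
        return 0.0, 0.0
    identical = sum(1 for x, y in columns if x == y)
    similar = sum(
        1
        for x, y in columns
        if x == y or (x in GROUP_OF and GROUP_OF[x] == GROUP_OF.get(y))
    )
    total = len(columns)
    return 100 * identical / total, 100 * similar / total


# Toy alignment (made up): 2 of 5 ungapped columns identical, all 5 similar.
toy_identity, toy_similarity = identity_and_similarity("MKV-LD", "MRIALE")
```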

3.3.5 SUMO-Interacting Motif (SIM)

The SUMO-interacting motif (SIM) was discovered as a protein-protein interaction related to sumoylation, and the defining characteristics of the SIM have changed over time (Minty 2000, Song 2004, Song 2005, Hannich 2005, Hecker 2006, Kerscher 2007, Perry 2008, Zhu 2008, Makhnevych 2009). Initially, a SXS triplet motif was identified in 2000 as being important for SUMO interaction, followed by a second hydrophobic core motif of V/I-X-V/I-V/I in 2004, and further experimentation revealed that flanking acidic residues also play a role in SUMO:SIM interactions (Minty 2000, Song 2004, Hannich 2005, Hecker 2006).

The functional role of the SIM has yet to be fully elucidated. However, it has been shown to be involved in recruiting SUMO-modified Ubc9 to facilitate sumoylation of the SIM-containing protein.

Structurally, the SIM interaction consists of a β-sheet extension, and is a stronger interaction when compared to other binding modes involving the ubiquitin-fold (Chapter One).

3.3.5.1 Identifying putative SIMs in NFATc2

Full length human NFATc2 consists of 18 secondary structure elements (3 α-helices & 15 β-strands). Our analysis of its secondary structure elements revealed that two of the β-strands have characteristics similar to that of the SIM. These include amino acids similar to the hydrophobic V/I-X-V/I-V/I region, and acidic residues nearby. Analysis of the molecular structure of NFATc2 deposited in the PDB reveals that both of the putative SIM-containing β-strands are solvent-exposed (Figure 3.19).


GHPVVQL HGYMENKPLGLQIFIG SGRIVSLQTASNPIECSQRS
--EEEEEE-----EEEEEEEEEE ----EEE------

Figure 3.19: NFATc2 SUMO Interacting Motifs. Human NFATc2 protein with two putative SIMs highlighted, along with corresponding SIM amino acid sequences highlighting secondary structure elements and underlined residues associated with the SIM sequence motif.

Analysis of molecular structures of the SIM:SUMO interaction deposited in the PDB has revealed that there is variability among residues within the V/I-X-V/I-V/I motif, as well as other characteristic amino acids associated with SIMs (Figure 3.20). This demonstrates that sequence alone cannot act as a means to identify putative SIMs. However, the propensity for β-strand formation is shared between SIMs.

2ASQ (PIASx)  - kvdVIDLtiessd  ---EEE--TTSS-
2KQS (DAXX)   - peeIIVLsdsd    ------
2RPQ (ATP7IP) - ssgVIDLtmddee  ----EE--SS---
2MP2 (RNF4)   - gdeIVdLtcesle  - S------S-----

Figure 3.20: Diversity of SIM motifs. Sequence alignment of experimentally characterized SIM:SUMO structural complexes reveals variability within the V/I-X-V/I-V/I motif.
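The limitation of a purely sequence-based search can be checked against the four sequences in Figure 3.20. The sketch below scans each sequence for the strict V/I-X-V/I-V/I core, allowing overlapping matches; the function name and the decision to ignore flanking acidic residues are assumptions made for illustration.

```python
import re

# Strict hydrophobic core V/I-X-V/I-V/I, written as a lookahead so that
# overlapping candidates at neighbouring positions are all reported.
STRICT_SIM = re.compile(r"(?=[VI].[VI][VI])")

# SIM sequences as listed in Figure 3.20, keyed by PDB ID.
FIGURE_3_20 = {
    "2ASQ": "kvdVIDLtiessd",
    "2KQS": "peeIIVLsdsd",
    "2RPQ": "ssgVIDLtmddee",
    "2MP2": "gdeIVdLtcesle",
}


def motif_positions(seq, pattern=STRICT_SIM):
    """Return 0-based start positions of every match of `pattern` in `seq`."""
    return [match.start() for match in pattern.finditer(seq.upper())]


strict_hits = {pdb_id: motif_positions(seq) for pdb_id, seq in FIGURE_3_20.items()}
matching_ids = [pdb_id for pdb_id, hits in strict_hits.items() if hits]
```

Under these assumptions only the 2ASQ sequence contains the strict core, which is consistent with the observation that sequence alone is not sufficient to identify putative SIMs.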

We performed an NMR titration between NFATc2IP and the putative SIM1 region of NFATc2 because an interaction between both proteins has already been observed, and because the putative SIM region is within a secondary structure element, has residues similar to the SIM motif, and is solvent exposed.

3.3.6 NFATc2IP:NFATc2 NMR titration

NMR titration was performed using the ubiquitin-like domain of 15N-NFATc2IP (corresponding to PDB-ID: 2JXX) and a 15 residue peptide of the putative SIM motif within NFATc2 (corresponding to residues S-554 to S-573 in PDB-ID: 1S9K.C).

A series of NMR titrations were attempted at pH 6.5 and pH 7.0 in buffer optimized for a NFATc2IP:NFATc2 interaction (50 mM sodium phosphate, 1 mM DTT, 10% D2O / 90% H2O) based on the previously deposited NFATc2 protein, but no chemical shift change was visible until a 1:20 molar ratio. These peak shifts consisted of 2 major peak shifts (glutamine 37 & threonine 38) and 4 minor peak shifts (glycine 32, leucine 39, alanine 59 & tryptophan 96) (Figure 3.21). These results correspond to amino acid residues predicted to be within the SIM-binding site of NFATc2IP (Figure 3.22 & Figure 3.23).

Figure 3.21: NFATc2IP:NFATc2 NMR titration. HSQC from NMR titration from 1:0 NFATc2IP:NFATc2 (red) to 1:20 NFATc2IP:NFATc2 (blue).


3.3.6.1 Analysis of the NFATc2IP:NFATc2 interface

Differences between the SIM-binding interfaces of NFATc2IP and SUMO2 are apparent in their electrostatic potential distributions (Figure 3.22 & Figure 3.23). These differences likely correspond to differences in binding partners, even though the molecular surface conformation of the region and the secondary structure elements of the ubiquitin fold are similar. This suggests that a gradient of complementary binding partners involved in a β-sheet extension could exist for the SIM-interaction interface, sharing a similar binding mode but differing in the physicochemical attributes of the binding partner residues. However, because of the nature of such a relationship, sequence motif alone cannot be used to identify all putative binding partners; secondary structure element analysis and a query of solvent-exposed regions are also necessary.

Figure 3.22: Electrostatic potential of NFATc2IP & SUMO2. NFATc2IP (PDB_ID: 2JXX; left) & SUMO2 (PDB_ID: 2AWT; right) with electrostatic potential distribution mapped onto molecular surfaces, and a SIM β-strand superimposed within the SIM-interacting interface.


Figure 3.23: Electrostatic potential diversity between similar UBLs. The ubiquitin-fold consists of a β-sheet intercalated by an α-helical core. Electrostatic potential mapping reveals a different charge distribution at the SIM-binding interface of NFATc2IP-2 despite domain sequence similarity. There are 2 SIM-like regions of NFATc2 that may interact with NFATc2IP despite lacking a negative charge typical of SIM motifs.

3.4 Conclusion

The molecular structures of NFATc2IP & ubiquilin-1 were determined by NMR spectroscopy, putative binding modes (SIM & UIM) were identified through structural analysis of similar ubiquitin-like modifiers, and interactions with binding partners (NFATc2 & PIN1) were validated through NMR titration. NFATc2IP was predicted to interact with its binding partner NFATc2 in a SIM-like β-strand extension interaction. Ubiquilin-1 was predicted to interact with its binding partner PIN1 in a UIM-like α-helical mediated interaction. These results suggest that a structure-based approach can be useful for identifying potential interaction partners and mechanisms in the ubiquitin fold superfamily.


Chapter 4 Exploring UBLs & UBL-Interaction Motifs: Computational & Experimental analysis of ubiquilin, NFATc2IP, UIMs and SIMs.

Contributions: D. Yim & Z. Zhang developed the UBL database and web service. I designed the UBL database and web service, identified data sources, and performed analyses of UBL data under the guidance of CH. Arrowsmith.


4.1 Introduction

The aim of this research project was to obtain near-complete structural coverage of human UBLs without experimentally determining the structure of each of the 398 unique UBLs. This was partly facilitated by grouping the UBLs into 100 modelling families that represent homologous protein domains with similar structures (Chapter Two). NMR spectroscopy was used to screen and prioritize UBLs for structure determination, and 17 human UBLs were structurally characterized using X-ray crystallography and NMR spectroscopy. The RCSB PDB now has 32% structural coverage of human UBLs based on structures experimentally determined by X-ray crystallography and NMR spectroscopy, and 82% when taking into account homology modelling. This chapter explores similarities between UBLs, focusing on each of the 17 human UBLs that were structurally characterized for this project and related UBLs. This chapter also discusses the 74 remaining human UBLs that lack structural information, and provides hypotheses for further study.

4.1.1 Database & comparative analysis

Information about each ubiquitin-like domain was compiled from multiple databases to generate a repository that would allow for detailed analysis of relationships between sequence, structure and function of each protein domain. A detailed analysis focused on UBLs that were structurally determined as part of the project, and members of associated modeling families. A relational database facilitated identification of trends and hypothesis generation.

4.1.1.1 Similarities & differences between model family members

Molecular features from UIM-binding & SIM-binding interfaces were identified and compared within and across modelling families. Full domain and binding-interface localized electrostatic potential distribution clustering was also performed to identify UBLs that shared similar physicochemical characteristics.

In addition to comparing molecular features of protein-protein interaction interfaces on the ubiquitin-like domain, our analyses extended to grouping together UBLs that shared common putative protein-protein interaction partners mentioned in literature.

A variety of other features were annotated to identify other similarities and differences between members of each model family. These data included conserved residues, sequence similarity, functional residues (i.e. lysines for poly-ubiquitin chains), phosphorylation sites, hydrophobic patches, GO-terms, and full length protein domain structure.

4.1.1.2 Common defining features for each modelling family

Common defining features provide insight into shared attributes of members of each modelling family. These features could also identify functional attributes, binding partners, or other characteristics shared between UBLs. This is particularly important for the ubiquitin-like domain superfamily, since 35.5% of human proteins containing UBLs have no known functional annotations. Additional significance arises from 54 human UBLs associated with disease pathways.

4.2 Experimental Procedures

4.2.1 UBL Database Development

A MySQL relational database was developed to contain information about all UBLs and related proteins. The database includes a framework consisting of PHP scripts that facilitate aggregation of online resources (Appendix 1.1.1 & Figure 4.1), and a web-based user interface for displaying and accessing the information.
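As an illustration of how such a relational layout links proteins to their domains and structures, the sketch below builds a two-table toy database. It uses Python's built-in `sqlite3` module as a stand-in for MySQL, and the table names, column names and helper functions are hypothetical; they are not the schema shown in Figure 4.1. The example rows reuse protein names and PDB IDs mentioned in this thesis.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two hypothetical tables."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE protein (
            protein_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL UNIQUE
        );
        CREATE TABLE ubl_domain (
            domain_id  INTEGER PRIMARY KEY,
            protein_id INTEGER NOT NULL REFERENCES protein(protein_id),
            label      TEXT NOT NULL,
            pdb_id     TEXT
        );
        """
    )
    con.executemany(
        "INSERT INTO protein (protein_id, name) VALUES (?, ?)",
        [(1, "Ubiquilin 1"), (2, "NFATc2IP")],
    )
    con.executemany(
        "INSERT INTO ubl_domain (protein_id, label, pdb_id) VALUES (?, ?, ?)",
        [
            (1, "Ubiquilin 1", "2KLC"),
            (2, "NFATc2IP_1st", "2L76"),
            (2, "NFATc2IP_2nd", "2JXX"),
        ],
    )
    return con


def domains_for(con, protein_name):
    """Return (label, pdb_id) for every UBL domain of the named protein."""
    rows = con.execute(
        "SELECT d.label, d.pdb_id "
        "FROM ubl_domain AS d "
        "JOIN protein AS p ON p.protein_id = d.protein_id "
        "WHERE p.name = ? "
        "ORDER BY d.label",
        (protein_name,),
    )
    return rows.fetchall()


demo = build_demo_db()
nfatc2ip_domains = domains_for(demo, "NFATc2IP")
```

Keeping domains in their own table reflects the fact that one protein can carry several UBLs, each with its own structure.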

Figure 4.1: Database schema of ubiquitin-like domain repository.

Table 4.1: Data sources for ubiquitin-like domain repository.

Data Source | Description of information
GenBank | Nucleotide sequence and gene annotations
UniProtKB | Gene structure, protein isoform sequences, and gene annotations
SMART | Protein domains, and domain structure annotations
DiseaseHub | Physiological information compiled from OMIM, GAD, HGMD, PharmGKB, GCP and GWAS human disease and physiology repositories.
BioGRID | GO annotations, protein-protein interactions, cell localization, molecular function, and biological processes
Uniprot-GOA | GO annotations, cell localization, molecular function, and biological processes
PDB | Molecular structure information
BMRB | NMR molecular structure information
UCSF Chimera | Structural features, electrostatic potential distribution, molecular surface information, and secondary structure elements.
AESOP | Electrostatic potential distribution

4.2.2 Relating 17 structurally determined UBLs to nearest neighbours and model families

For each UBL molecular structure that was determined as part of this project, an analysis was performed to identify and characterize its most similar UBLs that were either part of the same modelling family or were nearest neighbours. Sequence, structure, and electrostatic potential distribution similarity were analyzed using ClustalW, SIAS, UCSF Chimera, AESOP, and R using similar approaches as described in Chapter Three (Sievers et al., 2011; Petterson et al., 2004; Gorham et al., 2011). Sequence alignment was performed using ClustalW, while sequence identities and sequence similarities were calculated using SIAS.

Distances from remaining structurally unresolved UBLs were also analyzed, taking into account the distance from UBLs solved as part of this project, as well as general distances from other unresolved UBLs to identify clusters of unresolved structural information.

4.2.3 Secondary structure prediction & analysis

The protein domains within each human ubiquitin-like domain containing protein were annotated, and clustered based on similar full length protein domain architecture. Protein domains were identified using information from UniprotKB, PROSITE, SMART & NCBI GenBank, and plotted using PROSITE MyDomains (Galperin et al., 2015; Sigrist et al., 2012; Letunic et al., 2014; Benson et al., 2013).

The architecture of all UBLs was compared at the sequence level using secondary-structure sequence alignment, as well as at the structural level using UCSF Chimera (Petterson et al., 2004).

4.2.4 Relating structural features to functional pathways

For each human ubiquitin-like domain, annotations for cellular localization and functional annotation were retrieved from Uniprot-GOA QuickGO (Huntley et al., 2015). Clusters of human UBLs grouped based on common functional activity and/or cellular localization were analyzed using UCSF Chimera to identify common molecular features (Petterson et al., 2004).

4.3 Results

4.3.1 Structurally characterized ubiquitin-like domains

Comparison of molecular structures of the 17 UBLs solved for this project revealed a few structural variations. These include extended loops (between β-strand1 & β-strand2, β-strand2 & α-helix1), additional/missing α-helices, and a missing β-strand4. Structural analysis also revealed conserved amino acids associated with the fold (Figure 4.2).

Figure 4.2: Secondary & tertiary structures of 17 structurally characterized UBLs. Ribbon diagrams of 17 ubiquitin-like domain structures solved for this project, along with corresponding secondary structure architecture.

4.3.2 Nearest-neighbours of ubiquitin-like domains

Clustering all UBLs based on sequence similarity reveals 5 groups and 30 subgroups (Figure 4.3). Each of the groups contains at least one UBM, with the majority of UBMs within Group I and the largest proportion of structurally uncharacterized UBLs within Group IV.

Figure 4.3: Nearest-neighbour clustering of UBLs displayed with proportional transformed branches. Ubiquitin-like domains structurally determined for this project are highlighted in blue. Ubiquitin-like modifiers and putative ubiquitin-like modifiers are underlined.


4.3.3 Nearest-neighbours of structurally characterized UBMs

Three of the structurally characterized UBLs were ubiquitin-like modifiers (FUBI-1, ISG15-2, and SF3A1-1). To identify UBLs that may regulate ubiquitin-like modifiers by competing for binding partners, a nearest-neighbour analysis was performed on FUBI-1, ISG15-2 and SF3A1-1.

Ubiquitin-like domains with structures with an RMSD of less than 2Å were compared to UBLs with similar electrostatic potential and low RMSD (Figure 4.4, Figure 4.5, Figure 4.6).
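The 2 Å criterion refers to the root-mean-square deviation between equivalent atoms of two structures. The sketch below computes it for two coordinate sets, assuming they have already been superimposed and that atoms are paired in the same order; the function names and the toy coordinates are assumptions, and the superposition step itself (performed in this work with UCSF Chimera) is not reproduced here.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (in the units of the input, e.g. Å) between paired coordinates.

    Both inputs are sequences of (x, y, z) tuples of equal, non-zero length
    and are assumed to be superimposed already.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def similar_fold(coords_a, coords_b, cutoff=2.0):
    """True when the RMSD is below the cutoff (2 Å by default)."""
    return rmsd(coords_a, coords_b) < cutoff


# Toy coordinates (made up): every atom displaced by 1 Å, then by 3 Å.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted_1 = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
shifted_3 = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
```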

Figure 4.4: UBLs with a structural fold similar to FUBI-1. There are 21 structurally characterized UBLs with an RMSD of less than 2Å when compared to FUBI-1.

Twelve UBLs share similar electrostatic potential distribution as FUBI-1, of which 3 (highlighted in red) have a fold with an RMSD of less than 2Å when compared to FUBI-1: UBIML_1-1, UBIML_2-1, ISG15-2, PARK2_1-1, PARK2_2-1, PARK2_5-1, IQUB_1-1, IQUB_2-1, UBL7-1, UBLCP1-1, USP14-1, and UBFD1-1.


Figure 4.5: UBLs with a structural fold similar to the second UBL of ISG15. There are 25 structurally characterized UBLs with an RMSD of less than 2Å when compared to ISG15-2.

Thirteen UBLs share similar electrostatic potential distribution as ISG15-2, of which 5 (highlighted in red) have a fold with an RMSD of less than 2Å when compared to ISG15-2: UHRF2_1-1, UHRF2_2-1, UBA52-1, UBB-1, UBC-1, RPS27A-1, NEDD8-1, ANUBL1-1, RAD23A-1, RAD23B-1, UBL4A-1, UBL4B-1, and UHRF1-1.


Figure 4.6: UBLs with a structural fold similar to SF3A1. There are 25 structurally characterized UBLs with an RMSD of less than 2Å when compared to SF3A1-1.

Three UBLs share similar electrostatic potential distribution as SF3A1-1, of which none have a fold with an RMSD of less than 2Å when compared to SF3A1-1: TBCB-1, USP40_3-1, and USP47_2-3.


4.3.4 Grouping UBLs based on biological processes and molecular function

Many UBLs are uncharacterized, with 62.77% of UBLs having biological process annotations and 64.5% of UBLs having molecular function annotations within the GO repository. A pool of 145 UBLs are associated with a total of 369 unique biological processes and 149 UBLs are associated with 133 unique molecular functions (Huntley et al., 2015). Up to 53 biological processes and 9 molecular functions are associated with an individual UBL. Similarly to cellular localization, biological process attribution and molecular function are associated with full length UBL-containing proteins, and not each individual UBL domain. As a result, factors associated with functional activity could result from molecular features in other domains within the full length protein.


Table 4.2: Biological significance, functional annotation, and UBL group for each of the 17 UBLs structurally characterized for this project.

Protein | Method | PDB ID | UBL Group | Biological Significance | Function
SF3A1 | NMR | 1ZKH | V | Spliceosome | gene regulation: nuclear mRNA 3'-splice site recognition
ISG15 | NMR | 2HJ8 | I | Innate immune response activated by interferon-α & interferon-β | signaling protein
MAP1ALC3 | Xray | 3ECI | II | | endomembrane system : autophagy
HERPUD2 | NMR | 2KDB | III | Endoplasmic Reticulum | protein binding
RNF2/RING1B | Xray | 3H8H | I & III | E3 ligase of lysine 119 on histone H2A | Transcription
PLXNC1 | Xray | 3KUZ | I | Receptor related to immune modulation during virus infection | signaling protein
USP7 | Xray | 2KVR | II | Deubiquitylates proteins; prevents MDM2 self-ubiquitylation and enhances MDM2 E3 activity towards p53 and its proteasomal degradation | protein binding: ubiquitinyl hydrolase 1
NFATc2IP_1st / NFATc2IP_2nd | NMR | 2L76 / 2JXX | I | Down-regulates poly-SUMO chain formation by UBE2I/UBC9, and involved in expression of cytokine genes in T-cells | transcription, protein binding
BRAF (N-term) | Xray / NMR | 2L05 / 3NY5 | I | Vemurafenib (approved by FDA in 2011) was first drug to target B-RAF for treatment of late-stage melanoma; B-RAF is a Raf kinase and regulates MAP kinase/ERKs signaling pathway, which affects cell division, differentiation and secretion. | transferase: non-specific serine/threonine protein kinase
FUBI | NMR | 2L7R | II | C-term is ribosomal protein S30 and N-term is a UBL | intracellular, ribosome and translation
USP15 | Xray | 3PPA | I | Ubiquitin-specific protease that targets lysine 48-linked poly-Ub chains; targets ubiquitylated APC and human papillomavirus type 16 protein E6 | ubiquitin thioesterase activity
UHRF1 | Xray | 2FAZ | I | E3 involved in methylation-dependent transcriptional regulation. Important for G1/S transition and possibly chromosomal stability and DNA repair. | ligase activity
Ubiquilin 1 | NMR | 2KLC | I | Modulates accumulation of protein, and is found in lesions associated with Alzheimer’s and Parkinson’s disease. Also associated with: neurodegenerative diseases, ALS, Dementia, Ataxia, Huntington’s Disease & Lung Adenocarcinoma. | protein binding
Ubiquilin 3 | Xray | 1YQB | I | N/A | signaling protein


4.3.5 Grouping UBLs based on medical significance

Some UBLs are associated with medically significant functional pathways based on annotations within DiseaseHub, a tool that aggregates gene-disease associations from OMIM, GAD, HGMD, PharmGKB, CGP and GWAS (DiseaseHub; http://zldev.ccbr.utoronto.ca/~ddong/diseaseHub). A pool of 54 UBLs are associated with a total of 103 medically significant functional pathways. The specific role of each UBL domain remains unknown in many cases. Similar to cellular localization, medical significance is associated with full length UBL-containing proteins, and not individual UBL domains. Based on medical significance, 6 structurally uncharacterized and distant UBLs can be prioritized for functional significance (BRAF, PCGF2, PIK3C2A, PIK3C2B, USP40 and USP6).

4.3.5.1 Cellular localization

Table 4.3: Tissue and cell localization for each of the 17 UBLs structurally characterized for this project.

Protein | PDB ID | UBL Group | Tissue | Cell Localization
SF3A1 | 1ZKH | V | Ubiquitous | Nucleus, cytosol, peroxisome, plasma membrane
ISG15 | 2HJ8 | I | Detected in lymphoid cells, striated and smooth muscle, several epithelia and neurons. Expressed in neutrophils, monocytes and lymphocytes. Enhanced expression seen in pancreatic adenocarcinoma, endometrial cancer, and bladder cancer, as compared to non-cancerous tissue. In bladder cancer, the increase in expression exhibits a striking positive correlation with more advanced stages of the disease. | Extracellular, cytosol, nucleus
MAP1ALC3 | 3ECI | II | Most abundant in heart, brain, skeletal muscle and testis. Little expression observed in liver. |
HERPUD2 | 2KDB | III | - | Nucleus, cytosol, ER
RNF2/RING1B | 3H8H | I & III | - | Nucleus
PLXNC1 | 3KUZ | I | Detected in heart, brain, lung, spleen and placenta. | Plasma membrane, cytosol, extracellular, mitochondria, peroxisome
USP7 | 2KVR | II | Widely expressed. Overexpressed in prostate cancer. | Cytosol, nucleus, mitochondria
NFATc2IP_1st / NFATc2IP_2nd | 2L76 / 2JXX | I | - | Nucleus, cytosol
BRAF (N-term) | 2L05 / 3NY5 | I | Brain and testis. | Cytosol, plasma membrane, nucleus
FUBI | 2L7R | II | - | Cytosol, nucleus
USP15 | 3PPA | I | Expressed in skeletal muscle, kidney, heart, placenta, liver, thymus, lung, and ovary, with little or no expression in other tissues. | Nucleus, cytosol, mitochondrion, plasma membrane
UHRF1 | 2FAZ | I | Expressed in thymus, bone marrow, testis, lung and heart. Overexpressed in breast cancer. | Nucleus, cytosol
Ubiquilin 1 | 2KLC | I | Ubiquitous. Highly expressed throughout the brain; detected in neurons and in neuropathological lesions, such as neurofibrillary tangles and Lewy bodies. Highly expressed in heart, placenta, pancreas, lung, liver, skeletal muscle and kidney. | Nucleus, ER, cytosol, vacuole
Ubiquilin 3 | 1YQB | I | Testis | Cytosol, nucleus

Tissue and cell localization information retrieved from GeneCards (Rebhan et al., 1997). UBL groups are annotated in Figure 4.3.

4.3.6 Grouping UBLs based on cell localization

Upon analysis of all 231 UBL-containing proteins, 65.8% of UBLs have cell localization annotations within the GO repository. This pool of 152 UBLs are associated with a total of 110 unique cellular regions, and up to 10 cellular regions are associated with a single UBL. Cell localization is a significant attribute to consider when characterizing a protein, since it provides insight into possible protein-protein interactions and functional pathways associated with that particular cell localization, and also provides insight into the chemical environment (i.e. pH). Cell localization data for UBLs was analyzed in a few different ways. First, the distribution of UBLs within the cell was analyzed, and the most common cellular locations for UBLs were the nucleus, cytoplasm and ER (Figure 4.7). Of the UBLs that have been characterized to exist in the cytoplasm, nucleus and/or ER, 90 UBLs are structurally characterized (bold blue font), and 12 are UBMs (underlined bold blue font).

Figure 4.7: Distribution of human UBLs based on cellular localization.

There are a few caveats to this approach. For example, cell localization is based on the full length protein, which would affect any direct correlation between cell localization and specific protein domains; and 34.2% of UBL-containing proteins lack information about cellular localization.

However, taking into account molecular structure data, specifically electrostatic potential distribution mapped onto the molecular surface, the influence of pH on the binding interfaces and structural features could be elucidated.

Table 4.4: Structural alignment of lysines within structurally characterized ubiquitin and ubiquitin-like domains characterized within both cytoplasm and ER; cytoplasm & nucleus; nucleus, cytoplasm and ER; and only nucleus.

Cellular Localization | Number of UBLs | lysine-6 | lysine-11 | lysine-27 | lysine-29 | lysine-33 | lysine-48 | lysine-63
ER & Cytoplasm | 3 | 2 | 1 | 0 | 1 | 2 | 1 | 1
Nucleus | 14 | 1 | 4 | 7 | 9 | 6 | 2 | 1
Cytoplasm, ER & Nucleus | 21 | 7 | 7 | 12 | 6 | 7 | 12 | 6
Cytoplasm | 18 | 4 | 1 | 9 | 5 | 8 | 6 | 3
Nucleus & Cytoplasm | 29 | 8 | 8 | 16 | 11 | 9 | 8 | 7
none of the above | 10 | 4 | 2 | 8 | 2 | 1 | 3 | 1
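Because the localization groups in Table 4.4 differ in size, the raw counts are easier to compare as fractions of each group. The following sketch transcribes the table and normalizes each count by the number of UBLs in its row; the variable and function names are illustrative assumptions, and no statistical significance is implied by the resulting fractions.

```python
# Ubiquitin lysine positions, in the column order of Table 4.4.
LYSINES = (6, 11, 27, 29, 33, 48, 63)

# Rows of Table 4.4: localization -> (number of UBLs, counts per lysine).
TABLE_4_4 = {
    "ER & Cytoplasm": (3, (2, 1, 0, 1, 2, 1, 1)),
    "Nucleus": (14, (1, 4, 7, 9, 6, 2, 1)),
    "Cytoplasm, ER & Nucleus": (21, (7, 7, 12, 6, 7, 12, 6)),
    "Cytoplasm": (18, (4, 1, 9, 5, 8, 6, 3)),
    "Nucleus & Cytoplasm": (29, (8, 8, 16, 11, 9, 8, 7)),
    "none of the above": (10, (4, 2, 8, 2, 1, 3, 1)),
}


def lysine_fractions(localization):
    """Fraction of UBLs in a group with a lysine aligned at each position."""
    total, counts = TABLE_4_4[localization]
    return {lys: count / total for lys, count in zip(LYSINES, counts)}


def most_frequent_lysine(localization):
    """Lysine position with the highest fraction (first one wins on ties)."""
    fractions = lysine_fractions(localization)
    return max(LYSINES, key=lambda lys: fractions[lys])


nucleus_fractions = lysine_fractions("Nucleus")
nucleus_top = most_frequent_lysine("Nucleus")
```

For the nuclear group, for example, the position aligned with lysine 29 of ubiquitin is occupied by a lysine in 9 of 14 UBLs.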

Of the 21 UBLs associated with the ER, 4 UBLs are found to only be associated with the cytoplasm and ER: GABARAP, GABARAPL1, HSPA13 and VCPIP1. The ubiquitin-like domains of GABARAP and GABARAPL1 have been structurally characterized. VCPIP1 contains two putative UBLs; there are distantly related protein structures for fragments of the first ubiquitin-like domain of VCPIP1, and a homology model can be generated for the second ubiquitin-like domain. However, the homology model for the second ubiquitin-like domain of VCPIP1 has a low confidence C-terminal tail due to template sequence alignment gaps. Homology models were not generated for HSPA13 nor VCPIP1_1-1 because low quality homology models would have been generated.

Structural analysis of the homology model of VCPIP1, GABARAP and GABARAPL1 reveals structural alignment of lysine 53 in VCPIP1 with poly-ubiquitin chain target lysine 48 of ubiquitin, and lysine 35 & lysine 66 of GABARAP and GABARAPL1 with lysine 6 & lysine 33 of ubiquitin.

Comparative analysis of the molecular surface of each ubiquitin-like domain structure at pH 7.2 revealed no major hydrophobic patches nor electrostatic potential patches across all structures. However, this could be due to the small sample size of only two structurally characterized UBLs and one homology model for this group of UBLs.

Structural analysis of UBLs found within the nucleus provides a richer pool of information. There was structural information for 14 UBLs, of which 8 structures were generated using homology modelling techniques. Analysis of electrostatic potential distribution grouped the proteins into 3 subgroupings: UBLs with surfaces that are mostly positively charged (PCGF1_1-1, PCGF2_1-1, PCGF3_1-1, PCGF5_1-1, SF3A1_1-1), UBLs with a large conserved negatively charged patch (PCGF5_1-1, PCGF6_1-1, PCGF6_2-2, USP31_1-1), and UBLs with mixed distribution of negatively & positively charged residues (UHRF1_1-1, UHRF2_1-1, UBLCP1_1-1, SUMO2_1-1, PCGF6_1-1, PCGF6_2-2). Some small hydrophobic patches were identified for small subgroupings of UBLs, but nothing significant to characterize the full group of UBLs. Similar to the group of UBLs within the ER, there is also a subset of nuclear UBLs that have structurally conserved lysines in regions corresponding to lysine 6, lysine 33, and lysine 48 of ubiquitin (Table 4.4).

4.4 Conclusion

Information about the ubiquitin-like domain family has been compiled as a resource for generating hypotheses about ubiquitin-like domain containing proteins, and about the role of the UBLs in uncharacterized proteins, based on structural similarity analyses that could be associated with potential protein-protein interaction interfaces. Multiple approaches were pursued for grouping UBLs; these included clustering based on sequence similarity, structural features, and functional characterization. Based on the analyses that were performed, a framework was generated to explore the molecular diversity of protein domains and putative members of protein domain families.

Structurally unresolved UBLs were ranked based on the amount and significance of information generated by subsequent structural analyses. The top 10 UBLs recommended for future characterization are ANKUB1-2, FRMD1_2-2, FRMPD2_1-1, SHROOM1_1-2, SNX31_1-2, USP9X_1-3, USP11_1-2, SACS_1, PAN2_1-1, PAN2_1-2, PAN2_1-3, PIK3CG and PTPN13_1-2.

Chapter 5 Conclusion and Future Directions

5.1 Conclusions

Over the course of this thesis project, I investigated the scope and diversity of the ubiquitin fold among human ubiquitin-like domains. This revealed a functionally diverse superfamily of 448 protein domains, related to one another in terms of structural fold and secondary structure elements. The functional diversity of the 448 human UBLs was efficiently surveyed by grouping related UBLs into modelling families. As a result, 680 DNA constructs representing 76 UBL domains were expressed in E. coli for small-scale screening of protein expression and solubility, of which 205 UBL domain constructs were further screened by NMR spectroscopy. Seventeen UBLs with high novel leverage were selected for molecular structure determination based on protein expression and solubility screening results. The structurally characterized UBLs were surveyed and compared with structurally characterized UBMs, revealing amino acid variability and complementarity that maintain the protein fold while diversifying the chemical environment of protein-protein interaction interfaces.

Aggregating and analyzing these distant features facilitated correlations and predicted relationships based on structural features. Two of these predictions, the second ubiquitin-like domain of NFATc2IP interacting with the second SIM of NFATc2, as well as the ubiquitin-like domain of ubiquilin-1 interacting with a putative UIM of PIN1, were screened by NMR titration (Chapter Three). Changes in chemical shifts of residues at or near the putative binding site validate the predicted interaction, and also demonstrate the potential for ubiquitin-like domains to have interactions that are similar to known binding partners of ubiquitin and ubiquitin-like modifiers yet complement the interaction interface of the ubiquitin-like fold. The significance of these interactions is yet to be characterized, but could be related to shared functional activity, common functional pathways, modulation of ubiquitin-like modifier activity, or could be involved in mediating ubiquitin-like modifier conjugation of ubiquitin-like domain containing proteins.
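Chemical shift changes from a 1H-15N HSQC titration are often condensed into one combined perturbation value per residue, with residues above a cutoff flagged as candidate interface residues. The sketch below is illustrative only and is not the analysis used in this thesis. The function names, the 15N scaling factor of 0.14, the mean-plus-one-standard-deviation cutoff, and the example shift values are all assumptions.

```python
import math


def combined_shift_perturbation(d_h_ppm, d_n_ppm, n_scale=0.14):
    """Combine 1H and 15N shift changes (ppm) into one value.

    n_scale down-weights the 15N change; 0.14 is an assumed,
    commonly used value, not one taken from this thesis.
    """
    return math.sqrt(d_h_ppm ** 2 + (n_scale * d_n_ppm) ** 2)


def perturbed_residues(shifts_free, shifts_bound, n_sigma=1.0):
    """Return {residue: perturbation} for residues above mean + n_sigma * std.

    shifts_free and shifts_bound map a residue number to a
    (1H ppm, 15N ppm) tuple for the free and bound states.
    """
    csp = {}
    for res, (h_free, n_free) in shifts_free.items():
        if res not in shifts_bound:
            continue  # peak not assigned in the bound spectrum
        h_bound, n_bound = shifts_bound[res]
        csp[res] = combined_shift_perturbation(h_bound - h_free, n_bound - n_free)
    if not csp:
        return {}
    values = list(csp.values())
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    cutoff = mean + n_sigma * std
    return {res: v for res, v in csp.items() if v > cutoff}


# Hypothetical shifts for four residues (not measured data).
free = {1: (8.0, 120.0), 2: (8.1, 121.0), 3: (8.2, 122.0), 4: (8.3, 123.0)}
bound = {1: (8.0, 120.0), 2: (8.1, 121.0), 3: (8.2, 122.0), 4: (8.5, 124.0)}
flagged = perturbed_residues(free, bound)
```

In this toy input only residue 4 moves, so it is the only residue flagged. With real data the cutoff is a judgment call and is usually checked against the structure.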

5.2 Future Directions

5.2.1 Ubiquitin-like domain fold, NFATc2IP & ubiquilins

In order to better understand the significance of conserved residues in maintaining the ubiquitin fold and the characteristic secondary structure elements, a series of mutagenesis experiments could be performed. Mutagenesis could also be performed on amino acids within binding interfaces to explore complementarity between ubiquitin-like domains and binding partners. For NFATc2IP, the amino acids would include those identified in the NMR titration experiments: glutamine 37, threonine 38, glycine 32, leucine 39, alanine 59, and tryptophan 96. For ubiquilin-1, the amino acids would include aspartic acid 63, lysine 72, isoleucine 73, leucine 74, glutamine 82, histidine 92, valine 94, and lysine 96.

NMR titration experiments were performed using isolated ubiquitin-like domains and fragments of binding partner proteins, and should be repeated using full-length proteins (NFATc2IP, NFATc2, ubiquilin-1, and PIN1). The full-length NFATc2IP protein contains tandem ubiquitin-like domains, so a comparative analysis could also be performed using a tandem NFATc2IP ubiquitin-like domain fragment. NFATc2IP and ubiquilin-1 each have multiple protein family members and isoforms, and similar experiments could be performed on each of these to determine whether binding specificity extends to other family members and/or isoforms.

Ubiquilin-1 has been observed in the cytoplasm, nucleus and ER, while NFATc2IP has been observed in the cytoplasm and nucleus. Subtle differences in the chemical environments of each cellular compartment could alter the electrostatic surface potential at binding interfaces and thereby affect protein-protein interactions. For this reason, experiments involving pH titration and its impact on protein-protein interactions could be explored. Similarly, phosphorylation and other post-translational modification sites exist on NFATc2IP, NFATc2, ubiquilin-1 and PIN1, and experiments could be performed to determine whether phosphorylation or other post-translational modifications affect binding affinities.
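To illustrate why a modest pH shift could matter for an electrostatic interface, the sketch below estimates the net charge of a peptide from the Henderson-Hasselbalch relation. It is a rough sequence-level model, not the structure-based electrostatics calculation used in this thesis. The pKa values are assumed generic model values, the peptide is hypothetical, and the three pH values are arbitrary test points rather than measured compartment pH values.

```python
# Assumed generic model pKa values; real values shift with local environment.
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
N_TERM_PKA = 9.0
C_TERM_PKA = 2.0


def protonated_fraction(pka, ph):
    """Fraction of a titratable group in its protonated form at a given pH."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


def net_charge(sequence, ph):
    """Approximate net charge of a peptide (one-letter codes) at a given pH."""
    # Termini: protonated N-terminus is +1, deprotonated C-terminus is -1.
    charge = protonated_fraction(N_TERM_PKA, ph)
    charge -= 1.0 - protonated_fraction(C_TERM_PKA, ph)
    for aa in sequence.upper():
        if aa in POSITIVE_PKA:
            charge += protonated_fraction(POSITIVE_PKA[aa], ph)
        elif aa in NEGATIVE_PKA:
            charge -= 1.0 - protonated_fraction(NEGATIVE_PKA[aa], ph)
    return charge


# Hypothetical histidine-containing peptide (not a sequence from this thesis).
peptide = "GKHDELHKR"
charges = {ph: net_charge(peptide, ph) for ph in (6.5, 7.2, 7.6)}
```

Because histidine has a model pKa near 6, most of the charge change across this range comes from the two histidines. That is the kind of residue a pH titration experiment would be expected to probe.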

Based on fold conservation and structural feature similarity, competition analysis with similar ubiquitin-like domains (ubiquitin, UBL4A and UBTD2 for ubiquilin-1) could be performed to determine whether these ubiquitin-like domains compete with ubiquilin-1 for interaction with PIN1. A matrix of similar competition analyses could be performed using additional binding partners.

5.2.2 Ubiquitin-like domain structural genomics

A total of 270 ubiquitin-like domains remain to be structurally determined for structural completeness; this number falls to 74 when homology models are taken into account. I recommend a strategy for completing structural coverage that is prioritized based on structural coverage and functional significance. This would consist of screening and characterizing the following ubiquitin-like domains: ANKUB1-2, FRMD1_2-2, FRMPD2_1-1, SHROOM1_1-2, SNX31_1-2, USP9X_1-3, USP11_1-2, SACS_1, PAN2_1-1, PAN2_1-2, PAN2_1-3, PIK3CG, and PTPN13_1-2.
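The coverage figures in this section can be turned into fractions with a few lines of arithmetic. The sketch below assumes that the 270 and 74 counts refer to the same set of 448 human UBLs reported in Section 5.1; the function and key names are illustrative.

```python
TOTAL_UBLS = 448               # human UBLs reported in Section 5.1
UNRESOLVED_EXPERIMENTAL = 270  # no experimentally determined structure
UNRESOLVED_AFTER_MODELS = 74   # neither a structure nor a homology model


def coverage_summary(total, unresolved_experimental, unresolved_after_models):
    """Derive structure counts and coverage fractions from the three totals."""
    experimental = total - unresolved_experimental
    modelled = unresolved_experimental - unresolved_after_models
    return {
        "experimental": experimental,
        "homology_models": modelled,
        "experimental_fraction": experimental / total,
        "combined_fraction": (total - unresolved_after_models) / total,
    }


summary = coverage_summary(
    TOTAL_UBLS, UNRESOLVED_EXPERIMENTAL, UNRESOLVED_AFTER_MODELS
)
```

Under that assumption, 178 domains have experimental structures and 196 more are covered by homology models, giving combined coverage of roughly 83%.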

5.2.3 Protein domain family analyses

Our systematic approach of surveying, selecting, screening, structure determination, and analysis could be applied to a variety of different protein families to explore the amino acid and structural diversity of any group of proteins, whether a fold superfamily or a structural motif.

5.3 Concluding remarks

My structural genomics analysis of human ubiquitin-like domains demonstrates the value of: (1) NMR 1H-15N HSQC screening for amenability to structure determination; (2) modelling family analysis and homology model generation to assist in completing structural coverage of a protein family; and (3) utilizing relational databases and structure-driven hypothesis generation to predict putative binding partners.

7.0 Appendix

Appendix I: All human genes that encode at least one ubiquitin-like domain.

HUGO NCBI EC Gene Name Protein Name UniProt ID UniProt Name HGNC GeneID EnzymeID ANKRD60 Ankyrin repeat domain-containing protein 60 16217 140731 - Q9BZ19 ANR60_HUMAN

ANKUB1-1/-2/-3 ANKUB1 389161 29642 - A6NFN9 ANKUB_HUMAN

ANUBL1-1 AN1-type zinc finger protein 4 23504 93550 - Q86XD8 ZFAN4_HUMAN Amyloid beta A4 precursor protein-binding family B member APBB1IP 17379 54518 - Q7Z5R6 AB1IP_HUMAN 1-interacting protein ARAF-1 Serine/threonine--Raf 646 369 2.7.11.1 P10398 ARAF_HUMAN Arf-GAP with Rho-GAP domain, ANK repeat and PH domain- ARAP1 16925 116985 - Q96P48 ARAP1_HUMAN containing protein 1 Arf-GAP with Rho-GAP domain, ANK repeat and PH domain- ARAP2 16924 116984 - Q8WZ64 ARAP2_HUMAN containing protein 2 Arf-GAP with Rho-GAP domain, ANK repeat and PH domain- ARAP3 24097 64411 - Q8WWN8 ARAP3_HUMAN containing protein 3 ARHGAP20 Rho GTPase-activating protein 20 18357 57569 - Q9P2F6 RHG20_HUMAN

ASPSCR1_1-1 Tether containing UBX domain for GLUT4 13825 79058 - Q9BZE9 ASPC1_HUMAN

ATG12 Ubiquitin-like protein ATG12 588 9140 - O94817 ATG12_HUMAN

ATG3-1 Ubiquitin-like-conjugating enzyme ATG3 20962 64422 6.3.2.- Q9NT62 ATG3_HUMAN

ATG7_1-1 Ubiquitin-like modifier-activating enzyme ATG7 16935 10533 - O95352 ATG7_HUMAN

BAG1_1-1 BAG family molecular regulator 1 937 573 - Q99933 BAG1_HUMAN

BAG6_1-1 Large proline-rich protein BAG6 13919 7917 - P46379 BAG6_HUMAN

BMI1-1 Polycomb complex protein BMI-1 1066 648 - P35226 BMI1_HUMAN

BRAF-1/-2 Serine/threonine-protein kinase B-raf 1097 673 2.7.11.1 P15056 BRAF_HUMAN

CLK4 Dual specificity protein kinase CLK4 13659 57396 2.7.12.1 Q9HAZ1 CLK4_HUMAN

DCDC1 Doublecortin domain-containing protein 1 20625 341019 - P59894 DCDC1_HUMAN

DCDC2 Doublecortin domain-containing protein 2 18141 51473 - Q9UHG0 DCDC2_HUMAN

DCDC2B Doublecortin domain-containing protein 2B 32576 149069 - A2VCK2 DCD2B_HUMAN

DCDC2C Doublecortin domain-containing protein 2C 32696 728597 - A8MYV0 DCD2C_HUMAN

DCDC5 Doublecortin domain-containing protein 5 24799 100506627 - Q6ZRR9 DCDC5_HUMAN

DCLK1 Serine/threonine-protein kinase DCLK1 2700 9201 2.7.11.1 O15075-2 DCLK1_HUMAN

DCLK2 Serine/threonine-protein kinase DCLK2 19002 166614 2.7.11.1 Q8N568 DCLK2_HUMAN

DCX Neuronal migration protein doublecortin 2714 1641 - O43602 DCX_HUMAN

DDI1-1 Protein DDI1 homolog 1 18961 414301 - Q8WTU0 DDI1_HUMAN

DDI2_1-1 Protein DDI1 homolog 2 24578 84301 - Q5TDH0 DDI2_HUMAN

DGKQ Diacylglycerol kinase theta 2856 1609 2.7.1.107 P52824 DGKQ_HUMAN

EPB41L1_1-1 Band 4.1-like protein 1 3378 2036 - Q9H4G0 E41L1_HUMAN

EPB41L2-1 Band 4.1-like protein 2 3379 2037 - O43491 E41L2_HUMAN

EPB41L3_1-1 Band 4.1-like protein 3 3380 23136 - Q9Y2J2 E41L3_HUMAN

EPB41L4A Band 4.1-like protein 4A 13278 64097 - Q9HCS5 E41LA_HUMAN

EPB41L4B_1 Band 4.1-like protein 4B 19818 54566 - Q9H329 E41LB_HUMAN

EPB41L5_1-1 Band 4.1-like protein 5 19819 57669 - Q9HCM4 E41L5_HUMAN

FAF1_1-1 FAS-associated factor 1 3578 11124 - Q9UNN5 FAF1_HUMAN

114

HUGO NCBI EC Gene Name Protein Name UniProt ID UniProt Name HGNC GeneID EnzymeID FAF2-1 FAS-associated factor 2 24666 23197 - Q96CS3 FAF2_HUMAN

FARP2_1-1 FERM, RhoGEF and pleckstrin domain-containing protein 2 16460 9855 - O94887 FARP2_HUMAN

FAU_1-1 Ubiquitin-like protein FUBI 3597 2197 - P35544 UBIM_HUMAN

FRMD1_1-1 FERM domain-containing protein 1 21240 79981 - Q8N878 FRMD1_HUMAN

FRMD3_1-1/-2 FERM domain-containing protein 3 24125 257019 - A2A2Y4 FRMD3_HUMAN

FRMD4A_1-1 FERM domain-containing protein 4A 25491 55691 - Q9P2Q2 FRM4A_HUMAN

FRMD4B_1-1 FERM domain-containing protein 4B 24886 23150 - Q9Y2L6 FRM4B_HUMAN

FRMD5_1-1/-2 FERM domain-containing protein 5 28214 84978 - Q7Z6J6 FRMD5_HUMAN

FRMD6_1-1 FERM domain-containing protein 6 19839 122786 - Q96NE9 FRMD6_HUMAN

FRMD7_1-1 FERM domain-containing protein 7 8079 90167 - Q6ZUT3 FRMD7_HUMAN

FRMPD2_1-1 FERM and PDZ domain-containing protein 2 28572 143162 - Q68DX3 FRPD2_HUMAN

GABARAP Gamma-aminobutyric acid receptor-associated protein 4067 11337 - O95166 GBRAP_HUMAN

GABARAPL1_1-1 Gamma-aminobutyric acid receptor-associated protein-like 1 4068 23710 - Q9H0R8 GBRL1_HUMAN

GABARAPL2 Gamma-aminobutyric acid receptor-associated protein-like 2 13291 11345 - P60520 GBRL2_HUMAN

GRB10 Growth factor receptor-bound protein 10 4564 2887 - Q13322 GRB10_HUMAN

GRB14 Growth factor receptor-bound protein 14 4565 2888 - Q14449 GRB14_HUMAN

GRB7 Growth factor receptor-bound protein 7 4567 2886 - Q14451 GRB7_HUMAN

HERPUD1_1-1 Homocysteine-responsive endoplasmic reticulum-resident ubiquitin-like domain member 1 protein 13744 9709 - Q15011 HERP1_HUMAN

HERPUD2_1-1 Homocysteine-responsive endoplasmic reticulum-resident ubiquitin-like domain member 2 protein 21915 64224 - Q9BSE4 HERP2_HUMAN

HSPA13 Heat shock 70 kDa protein 13 11375 6782 - P48723 HSP13_HUMAN

IKBKB_1-1 Inhibitor of nuclear factor kappa-B kinase subunit beta 5960 3551 2.7.11.10 O14920 IKKB_HUMAN

IQUB_1-1 IQ and ubiquitin-like domain-containing protein 21995 154865 - Q8NA54 IQUB_HUMAN

ISG15_1-1/-2 Ubiquitin-like protein ISG15 4053 9636 - P05161 ISG15_HUMAN

MAP1LC3A_1-1 Microtubule-associated proteins 1A/1B light chain 3A 6838 84557 - Q9H492 MLP3A_HUMAN

MAP1LC3B Microtubule-associated proteins 1A/1B light chain 3B 13352 81631 - Q9GZQ8 MLP3B_HUMAN

MAP1LC3B2 Microtubule-associated proteins 1A/1B light chain 3 beta 2 34390 643246 - A6NCE7 MP3B2_HUMAN

MAP1LC3C Microtubule-associated proteins 1A/1B light chain 3C 13353 440738 - Q9BXW4 MLP3C_HUMAN

MDP1_1 Magnesium-dependent phosphatase 1 28781 145553 3.1.3.48 Q86V88 MGDP1_HUMAN

MIDN Midnolin 16298 90007 - Q504T8 MIDN_HUMAN

MLLT4_1 Afadin 7137 4301 - P55196 AFAD_HUMAN

MOCS2 Molybdopterin synthase sulfur carrier subunit 7193 4338 - O96033 MOC2A_HUMAN

MYLIP_1-1 E3 ubiquitin-protein ligase MYLIP 21155 29116 6.3.2.- Q8WY64 MYLIP_HUMAN

MYO9A_1 Unconventional myosin-IXa 7608 4649 - B2RTY4 MYO9A_HUMAN

MYO9B_1-1 Unconventional myosin-IXb 7609 4650 - Q13459 MYO9B_HUMAN

NAE1_1-1 NEDD8-activating enzyme E1 regulatory subunit 621 8883 - Q13564 ULA1_HUMAN

NCF2_1-1 Neutrophil cytosol factor 2 7661 4688 - P19878 NCF2_HUMAN

NEDD8 NEDD8 7732 4738 - Q15843 NEDD8_HUMAN

NF2_1 Merlin 7773 4771 - P35240 MERL_HUMAN

NFATC2IP_1 NFATC2-interacting protein 25906 84901 - Q8NCF5 NF2IP_HUMAN

NPLOC4_1 Nuclear protein localization protein 4 homolog 18261 55666 - Q8TAT6 NPL4_HUMAN

NSFL1C_1 NSFL1 cofactor p47 15912 55968 - Q9UNZ2 NSF1C_HUMAN

OASL_1 2'-5'-oligoadenylate synthase-like protein 8090 8638 - Q15646 OASL_HUMAN

PAN2_1-1/-2/-3 Retinol dehydrogenase 14 19979 57665 1.1.1.- Q9HBH5 RDH14_HUMAN

PARK2_1 E3 ubiquitin-protein ligase parkin 8607 5071 6.3.2.- O60260 PRKN2_HUMAN

PCGF1_1-1 Polycomb group RING finger protein 1 17615 84759 - Q9BSM1 PCGF1_HUMAN

PCGF2_1-1 Polycomb group RING finger protein 2 12929 7703 - P35227 PCGF2_HUMAN

PCGF3_1-1 Polycomb group RING finger protein 3 10066 10336 - Q3KNV8 PCGF3_HUMAN

PCGF5_1-1 Polycomb group RING finger protein 5 28264 84333 - Q86SE9 PCGF5_HUMAN

PCGF6_1-1 Polycomb group RING finger protein 6 21156 84108 - Q9BYE7 PCGF6_HUMAN

PIK3C2A Phosphatidylinositol 4-phosphate 3-kinase C2 domain-containing subunit alpha 8971 5286 2.7.1.154 O00443 P3C2A_HUMAN

PIK3C2B Phosphatidylinositol 4-phosphate 3-kinase C2 domain-containing subunit beta 8972 5287 2.7.1.154 O00750 P3C2B_HUMAN

PIK3CA Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform 8975 5290 2.7.1.153 P42336 PK3CA_HUMAN

PIK3CB Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit beta isoform 8976 5291 2.7.1.153 P42338 PK3CB_HUMAN

PIK3CD Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit delta isoform 8977 5293 2.7.1.153 O00329 PK3CD_HUMAN

PIK3CG Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit gamma isoform 8978 5294 2.7.1.153 P48736 PK3CG_HUMAN

PLXNC1_1-1/-2 Plexin-C1 9106 10154 - O60486 PLXC1_HUMAN

HELZ2_1-1/-2/-3 Helicase with zinc finger domain 2 (PRIC285) 30021 85441 3.6.4.- Q9BYK8 PR285_HUMAN

PTPN13_1-1/-2/-3 Tyrosine-protein phosphatase non-receptor type 13 9646 5783 3.1.3.48 Q12923 PTN13_HUMAN

PTPN14 Tyrosine-protein phosphatase non-receptor type 14 9647 5784 3.1.3.48 Q15678 PTN14_HUMAN

PTPN21_1-1/-2/-3 Tyrosine-protein phosphatase non-receptor type 21 9651 11099 3.1.3.48 Q16825 PTN21_HUMAN

PTPN3_1-1/-2 Tyrosine-protein phosphatase non-receptor type 3 9655 5774 3.1.3.48 P26045 PTN3_HUMAN

PTPN4_1-1/-2/-3 Tyrosine-protein phosphatase non-receptor type 4 9656 5775 3.1.3.48 P29074 PTN4_HUMAN

RAD23A UV excision repair protein RAD23 homolog A 9812 5886 - P54725 RD23A_HUMAN

RAD23B UV excision repair protein RAD23 homolog B 9813 5887 - P54727 RD23B_HUMAN

RAF1_1 RAF proto-oncogene serine/threonine-protein kinase 9829 5894 2.7.11.1 P04049 RAF1_HUMAN

RALGDS_1-1/-2 Ral guanine nucleotide dissociation stimulator 9842 5900 - Q12967 GNDS_HUMAN

RAPGEF2 Rap guanine nucleotide exchange factor 2 16854 9693 - Q9Y4G8 RPGF2_HUMAN

RAPGEF4_1 Rap guanine nucleotide exchange factor 4 16626 11069 - Q8WZA2 RPGF4_HUMAN

RAPH1_1 Ras-associated and pleckstrin homology domains-containing protein 1 14436 65059 - Q70E73 RAPH1_HUMAN

RASIP1 Ras-interacting protein 1 24716 54922 - Q5U651 RAIN_HUMAN

RASSF1_1 Ras association domain-containing protein 1 9882 11186 - Q9NS23 RASF1_HUMAN

RASSF2 Ras association domain-containing protein 2 9883 9770 - P50749 RASF2_HUMAN

RASSF3_1 Ras association domain-containing protein 3 14271 283349 - Q86WH2 RASF3_HUMAN

RASSF4_1 Ras association domain-containing protein 4 20793 83937 - Q9H2L5 RASF4_HUMAN

RASSF5_1 Ras association domain-containing protein 5 17609 83593 - Q8WWW0 RASF5_HUMAN

RASSF6_1 Ras association domain-containing protein 6 20796 166824 - Q6ZTQ3 RASF6_HUMAN

RASSF7_1 Ras association domain-containing protein 7 1166 8045 - Q02833 RASF7_HUMAN

RASSF8_1 Ras association domain-containing protein 8 13232 11228 - Q8NHQ8 RASF8_HUMAN

RASSF9 Ras association domain-containing protein 9 15739 9182 - O75901 RASF9_HUMAN

RBCK1_1-1/-2 RanBP-type and C3HC4-type zinc finger-containing protein 1 15864 10616 6.3.2.- Q9BYM8 HOIL1_HUMAN

RDX_1-1 Radixin 9944 5962 - P35241 RADI_HUMAN

RGL1_1-1 Ral guanine nucleotide dissociation stimulator-like 1 30281 23179 - Q9NZL6 RGL1_HUMAN

RGL2_1-1 Ral guanine nucleotide dissociation stimulator-like 2 9769 5863 - O15211 RGL2_HUMAN

RGL3_1-1 Ral guanine nucleotide dissociation stimulator-like 3 30282 57139 - Q3MIN7 RGL3_HUMAN

RGS12_1 Regulator of G-protein signaling 12 9994 6002 - O14924 RGS12_HUMAN

RGS14_1 Regulator of G-protein signaling 14 9996 10636 - O43566 RGS14_HUMAN

RIN1_1 Ras and Rab interactor 1 18749 9610 - Q13671 RIN1_HUMAN

RIN2_1 Ras and Rab interactor 2 18750 54453 - Q8WYP3 RIN2_HUMAN

RIN3_1 Ras and Rab interactor 3 18751 79890 - Q8TB24 RIN3_HUMAN

RING1_1-1/-2 E3 ubiquitin-protein ligase RING1 10018 6015 6.3.2.- Q06587 RING1_HUMAN

RING2_1-1 E3 ubiquitin-protein ligase RING2 10061 6045 6.3.2.- Q99496 RING2_HUMAN

RP1 Oxygen-regulated protein 1 10263 6101 - P56715 RP1_HUMAN

RP1L1_1 Retinitis pigmentosa 1-like 1 protein 15946 94137 - Q8IWN7 RP1L1_HUMAN

RPS27A_1-1 Ubiquitin-40S ribosomal protein S27a 10417 6233 - P62979 RS27A_HUMAN

RSG1_1-1/2 REM2- and Rab-like small GTPase 1 28127 79363 - Q9BU20 RSG1_HUMAN

SACS_1 Sacsin 10519 26278 - Q9NZJ4 SACS_HUMAN

SAE1_1-1 SUMO-activating enzyme subunit 1 30660 10055 - Q9UBE0 SAE1_HUMAN

SAE2 SUMO-activating enzyme subunit 2 30661 10054 6.3.2.- Q9UBT2 SAE2_HUMAN

SF3A1_1-1 Splicing factor 3A subunit 1 10765 10291 - Q15459 SF3A1_HUMAN

SHARPIN_1-1/-2 Sharpin 25321 81858 - Q9H0F6 SHRPN_HUMAN

SHROOM1 Shroom1 24084 134549 - Q2M3G4 SHRM1_HUMAN

SNRNP25 U11/U12 small nuclear ribonucleoprotein 25 kDa protein 14161 79622 - Q9BV90 SNR25_HUMAN

SNX27_1 Sorting nexin-27 20073 81609 - Q96L92 SNX27_HUMAN

SNX31_1 Sorting nexin-31 28605 169166 - Q8N9S9 SNX31_HUMAN

SUMO1_1-1/-2 Small ubiquitin-related modifier 1 12502 7341 - P63165 SUMO1_HUMAN

SUMO2_1-1 Small ubiquitin-related modifier 2 11125 6613 - P61956 SUMO2_HUMAN

SUMO3_1-1 Small ubiquitin-related modifier 3 11124 6612 - P55854 SUMO3_HUMAN

SUMO4_1-1 Small ubiquitin-related modifier 4 21181 387082 - Q6EEV6 SUMO4_HUMAN

TBCB_1-1 Tubulin-folding cofactor B 1989 1155 - Q99426 TBCB_HUMAN

TBCE Tubulin-specific chaperone E 11582 6905 - Q15813 TBCE_HUMAN

TBCEL Tubulin-specific chaperone cofactor E-like protein 28115 219899 - Q5QJ74 TBCEL_HUMAN

TCEB2_1-1 Transcription elongation factor B polypeptide 2 11619 6923 - Q15370 ELOB_HUMAN

TECR_1 Very-long-chain enoyl-CoA reductase 4551 9524 1.3.1.93 Q9NZ01 TECR_HUMAN

TIAM1 T-lymphoma invasion and metastasis-inducing protein 1 11805 7074 - Q13009 TIAM1_HUMAN

TIAM2_1 T-lymphoma invasion and metastasis-inducing protein 2 11806 26230 - Q8IVF5 TIAM2_HUMAN

TMUB1_1-1 Transmembrane and ubiquitin-like domain-containing protein 1 21709 83590 - Q9BVT8 TMUB1_HUMAN

TMUB2_1-1 Transmembrane and ubiquitin-like domain-containing protein 2 28459 79089 - Q71RG4 TMUB2_HUMAN

UBA1 Ubiquitin-like modifier-activating enzyme 1 12469 7317 - P22314 UBA1_HUMAN

UBA3_1 NEDD8-activating enzyme E1 catalytic subunit 12470 9039 6.3.2.- Q8TBC4 UBA3_HUMAN

UBA5_1 Ubiquitin-like modifier-activating enzyme 5 23230 79876 - Q9GZZ9 UBA5_HUMAN

UBA6_1 Ubiquitin-like modifier-activating enzyme 6 25581 55236 - A0AVT1 UBA6_HUMAN

UBA7 Ubiquitin-like modifier-activating enzyme 7 12471 7318 - P41226 UBA7_HUMAN

UBA52_1-1 Ubiquitin-60S ribosomal protein L40 12458 7311 - P62987 RL40_HUMAN

UBAC1 Ubiquitin-associated domain-containing protein 1 30221 10422 - Q9BSL1 UBAC1_HUMAN

UBB_1-1 Polyubiquitin-B 12463 7314 - P0CG47 UBB_HUMAN

UBC_1-1 Polyubiquitin-C 12468 7316 - P0CG48 UBC_HUMAN

UBD_1-1/-2 Ubiquitin D 18795 10537 - O15205 UBD_HUMAN

UBFD1_1-1/-2 Ubiquitin domain-containing protein UBFD1 30565 56061 - O14562 UBFD1_HUMAN

UBIML_1-1 Putative ubiquitin-like protein FUBI-like protein ENSP00000310146 - - - A6NDN8 UBIML_HUMAN

UBL3_1-1 Ubiquitin-like protein 3 12504 5412 - O95164 UBL3_HUMAN

UBL4A_1-1 Ubiquitin-like protein 4A 12505 8266 - P11441 UBL4A_HUMAN

UBL4B_1-1 Ubiquitin-like protein 4B 32309 164153 - Q8N7F7 UBL4B_HUMAN

UBL5_1-1 Ubiquitin-like protein 5 13736 59286 - Q9BZL1 UBL5_HUMAN

UBL7_1-1 Ubiquitin-like protein 7 28221 84993 - Q96S82 UBL7_HUMAN

UBLCP1_1-1/-2/-3 Ubiquitin-like domain-containing CTD phosphatase 1 28110 134510 3.1.3.16 Q8WVY7 UBCP1_HUMAN

UBQLN1_1-1 Ubiquilin-1 12508 29979 - Q9UMX0 UBQL1_HUMAN

UBQLN2_1-1 Ubiquilin-2 12509 29978 - Q9UHD9 UBQL2_HUMAN

UBQLN3_1-1 Ubiquilin-3 12510 50613 - Q9H347 UBQL3_HUMAN

UBQLN4_1-1 Ubiquilin-4 1237 56893 - Q9NRR5 UBQL4_HUMAN

UBQLNL_1-1 Ubiquilin-like protein 28294 143630 - Q8IYU4 UBQLN_HUMAN

UBTD1_1-1 Ubiquitin domain-containing protein 1 25683 80019 - Q9HAC8 UBTD1_HUMAN

UBTD2_1-1 Ubiquitin domain-containing protein 2 24463 92181 - Q8WUN7 UBTD2_HUMAN

UBXN1_1-1 UBX domain-containing protein 1 18402 51035 - Q04323 UBXN1_HUMAN

UBXN2A_1-1/-2 UBX domain-containing protein 2A 27265 165324 - P68543 UBX2A_HUMAN

UBXN2B_1-1/-2 UBX domain-containing protein 2B 27035 137886 - Q14CS0 UBX2B_HUMAN

UBXN4_1-1/-2 UBX domain-containing protein 4 14860 23190 - Q92575 UBXN4_HUMAN

UBXN6_1-1/-2 UBX domain-containing protein 6 14928 80700 - Q9BZV1 UBXN6_HUMAN

UBXN7_1-1/-2 UBX domain-containing protein 7 29119 26043 - O94888 UBXN7_HUMAN

UBXN8_1-1 UBX domain-containing protein 8 30307 7993 - O00124 UBXN8_HUMAN

UBXN10_1-1 UBX domain-containing protein 10 26354 127733 - Q96LJ8 UBX10_HUMAN

UBXN11_1 UBX domain-containing protein 11 30600 91544 - Q5T124 UBX11_HUMAN

UFM1_1-1 Ubiquitin-fold modifier 1 20597 51569 - P61960 UFM1_HUMAN

UHRF1_1-1 E3 ubiquitin-protein ligase UHRF1 12556 29128 6.3.2.- Q96T88 UHRF1_HUMAN

UHRF1BP1 UHRF1-binding protein 1 21216 54887 - Q6BDS2 URFB1_HUMAN

UHRF2_1-1 E3 ubiquitin-protein ligase UHRF2 12557 115426 6.3.2.- Q96PU4 UHRF2_HUMAN

URM1_1-1 Ubiquitin-related modifier 1 28378 81605 - Q9BTM9 URM1_HUMAN

USP11_1-1/-2 Ubiquitin carboxyl-terminal hydrolase 11 12609 8237 3.4.19.12 P51784 UBP11_HUMAN

USP14_1-1/-2 Ubiquitin carboxyl-terminal hydrolase 14 12612 9097 3.4.19.12 P54578 UBP14_HUMAN

USP15_1-1/-2/-3 Ubiquitin carboxyl-terminal hydrolase 15 12613 9958 3.4.19.12 Q9Y4E8 UBP15_HUMAN

USP20_1-1/-2 Ubiquitin carboxyl-terminal hydrolase 20 12619 10868 3.4.19.12 Q9Y2K6 UBP20_HUMAN

USP21_1-1 Ubiquitin carboxyl-terminal hydrolase 21 12620 27005 3.4.19.12 Q9UK80 UBP21_HUMAN

USP24_1-1/-2 Ubiquitin carboxyl-terminal hydrolase 24 12623 23358 3.4.19.12 Q9UPU5 UBP24_HUMAN

USP25_1-1/-2/-3 Ubiquitin carboxyl-terminal hydrolase 25 12624 29761 3.4.19.12 Q9UHP3 UBP25_HUMAN

USP28_1-1/-2 Ubiquitin carboxyl-terminal hydrolase 28 12625 57646 3.4.19.12 Q96RU2 UBP28_HUMAN

USP31_1-1 Ubiquitin carboxyl-terminal hydrolase 31 20060 57478 3.4.19.12 Q70CQ4 UBP31_HUMAN

USP32 Ubiquitin carboxyl-terminal hydrolase 32 19143 84669 3.4.19.12 Q8NFA0 UBP32_HUMAN

USP34 Ubiquitin carboxyl-terminal hydrolase 34 20066 9736 3.4.19.12 Q70CQ2 UBP34_HUMAN

USP4_1-1/-2/-3/-4 Ubiquitin carboxyl-terminal hydrolase 4 12627 7375 3.4.19.12 Q13107 UBP4_HUMAN

USP40_1-1/-2 Ubiquitin carboxyl-terminal hydrolase 40 20069 55230 3.4.19.12 Q9NVE5 UBP40_HUMAN

USP43 Ubiquitin carboxyl-terminal hydrolase 43 20072 124739 3.4.19.12 Q70EL4 UBP43_HUMAN

USP47 Ubiquitin carboxyl-terminal hydrolase 47 20076 55031 3.4.19.12 Q96K76 UBP47_HUMAN

USP48_1-1/-2 Ubiquitin carboxyl-terminal hydrolase 48 18533 84196 3.4.19.12 Q86UV5 UBP48_HUMAN

USP5_1-1 Ubiquitin carboxyl-terminal hydrolase 5 12628 8078 3.4.19.12 P45974 UBP5_HUMAN

USP6 Ubiquitin carboxyl-terminal hydrolase 6 12629 9098 3.4.19.12 P35125 UBP6_HUMAN

USP7 Ubiquitin carboxyl-terminal hydrolase 7 12630 7874 3.4.19.12 Q93009 UBP7_HUMAN

USP8_1-1/-2/-3 Ubiquitin carboxyl-terminal hydrolase 8 12631 9101 3.4.19.12 P40818 UBP8_HUMAN

USP9X_1-1/-2/-3 Probable ubiquitin carboxyl-terminal hydrolase FAF-X 12632 8239 3.4.19.12 Q93008 USP9X_HUMAN

USP9Y_1-1/-2/-3 Probable ubiquitin carboxyl-terminal hydrolase FAF-Y 12633 8287 3.4.19.12 O00507 USP9Y_HUMAN

VCPIP1_1-1/-2/-3 Deubiquitinating protein VCIP135 30897 80124 3.4.19.12 Q96JH7 VCIP1_HUMAN

WDR48_1-1/-2 WD repeat-containing protein 48 30914 57599 - Q8TAF3 WDR48_HUMAN

YOD1_1-1 Ubiquitin thioesterase OTU1 25035 55432 3.4.19.12 Q5VVQ6 OTU1_HUMAN

Appendix II: All human genes & isoforms that encode ubiquitin-like domains.

Gene Name Protein Name UniProt ID UniProt Name

ANKRD60 Ankyrin repeat domain-containing protein 60 Q9BZ19 ANR60_HUMAN

ANKUB1-1/-2/-3 ANKUB1 A6NFN9 ANKUB_HUMAN

ANUBL1-1 AN1-type zinc finger protein 4 Q86XD8 ZFAN4_HUMAN

APBB1IP Amyloid beta A4 precursor protein-binding family B member 1-interacting protein Q7Z5R6 AB1IP_HUMAN

ARAF-1 Serine/threonine-protein kinase A-Raf P10398 ARAF_HUMAN

ARAP1 Arf-GAP with Rho-GAP domain, ANK repeat and PH domain-containing protein 1 Q96P48 ARAP1_HUMAN

ARAP2 Arf-GAP with Rho-GAP domain, ANK repeat and PH domain-containing protein 2 Q8WZ64 ARAP2_HUMAN

ARAP3 Arf-GAP with Rho-GAP domain, ANK repeat and PH domain-containing protein 3 Q8WWN8 ARAP3_HUMAN

ARHGAP20 Rho GTPase-activating protein 20 Q9P2F6 RHG20_HUMAN

ASPSCR1_1-1 Tether containing UBX domain for GLUT4 Q9BZE9 ASPC1_HUMAN

ASPSCR1_2-1 Q9BZE9-2 ASPC1_HUMAN

ASPSCR1_3 Q9BZE9-3 ASPC1_HUMAN

ATG12 Ubiquitin-like protein ATG12 O94817 ATG12_HUMAN

ATG3_1-1 Ubiquitin-like-conjugating enzyme ATG3 Q9NT62 ATG3_HUMAN

ATG7_1-1 Ubiquitin-like modifier-activating enzyme ATG7 O95352 ATG7_HUMAN

ATG7_2-1 O95352-2 ATG7_HUMAN

ATG7_3-1 O95352-3 ATG7_HUMAN

BAG1_1-1 BAG family molecular chaperone regulator 1 Q99933 BAG1_HUMAN

BAG1_2-1 Q99933-2 BAG1_HUMAN

BAG1_3-1 Q99933-3 BAG1_HUMAN

BAG1_4-1 Q99933-4 BAG1_HUMAN

BAG6_1-1 Large proline-rich protein BAG6 P46379 BAG6_HUMAN

BAG6_2-1 P46379-2 BAG6_HUMAN

BAG6_3-1 P46379-3 BAG6_HUMAN

BMI1_1-1 Polycomb complex protein BMI-1 P35226 BMI1_HUMAN

BRAF_1-1/-2 Serine/threonine-protein kinase B-raf P15056 BRAF_HUMAN

CLK4_1-1 Dual specificity protein kinase CLK4 Q9HAZ1 CLK4_HUMAN

DCDC1_1-1 Doublecortin domain-containing protein 1 P59894 DCDC1_HUMAN

DCDC2_1-1 Doublecortin domain-containing protein 2 Q9UHG0 DCDC2_HUMAN

DCDC2B_1-1 Doublecortin domain-containing protein 2B A2VCK2 DCD2B_HUMAN

DCDC2C_1-1 Doublecortin domain-containing protein 2C A8MYV0 DCD2C_HUMAN

DCDC5 Doublecortin domain-containing protein 5 Q6ZRR9 DCDC5_HUMAN

DCLK1_1-1 Serine/threonine-protein kinase DCLK1 O15075-2 DCLK1_HUMAN

DCLK2_1-1 Serine/threonine-protein kinase DCLK2 Q8N568 DCLK2_HUMAN

DCX_1-1 Neuronal migration protein doublecortin O43602 DCX_HUMAN

DCX_2-1 O43602-2 DCX_HUMAN

DDI1_1-1 Protein DDI1 homolog 1 Q8WTU0 DDI1_HUMAN

DDI2_1-1 Protein DDI1 homolog 2 Q5TDH0 DDI2_HUMAN

DDI2_2-1 Q5TDH0-2 DDI2_HUMAN

DDI2_3-1 Q5TDH0-3 DDI2_HUMAN

DGKQ Diacylglycerol kinase theta P52824 DGKQ_HUMAN

EPB41_1-1 P11171 DGKQ_HUMAN

EPB41_2-1 P11171-2 DGKQ_HUMAN

EPB41_3-1 P11171-3 DGKQ_HUMAN

EPB41_4-1 P11171-4 DGKQ_HUMAN

EPB41_5-1 P11171-5 DGKQ_HUMAN

EPB41_7-1 P11171-7 DGKQ_HUMAN

EPB41L1_1-1 Band 4.1-like protein 1 Q9H4G0 E41L1_HUMAN

EPB41L1_2-1 Q9H4G0 E41L1_HUMAN

EPB41L1_3-1 Q9H4G0 E41L1_HUMAN

EPB41L1_4-1 Q9H4G0 E41L1_HUMAN

EPB41L2-1 Band 4.1-like protein 2 O43491 E41L2_HUMAN

EPB41L3_1-1 Band 4.1-like protein 3 Q9Y2J2 E41L3_HUMAN

EPB41L3_2-1 Q9Y2J2 E41L3_HUMAN

EPB41L3_3-1 Q9Y2J2 E41L3_HUMAN

EPB41L4A Band 4.1-like protein 4A Q9HCS5 E41LA_HUMAN

EPB41L4B_1 Band 4.1-like protein 4B Q9H329 E41LB_HUMAN

EPB41L4B_2 Q9H329 E41LB_HUMAN

EPB41L5_1-1 Band 4.1-like protein 5 Q9HCM4 E41L5_HUMAN

EPB41L5_2-1 Q9HCM4 E41L5_HUMAN

EPB41L5_3-1 Q9HCM4 E41L5_HUMAN

EPB41L5_4-1 Q9HCM4 E41L5_HUMAN

FAF1_1-1 FAS-associated factor 1 Q9UNN5 FAF1_HUMAN

FAF1_2-1 Q9UNN5 FAF1_HUMAN

FAF2-1 FAS-associated factor 2 Q96CS3 FAF2_HUMAN

FARP2_1-1 FERM, RhoGEF and pleckstrin domain-containing protein 2 O94887 FARP2_HUMAN

FARP2_2-1 O94887 FARP2_HUMAN

FAU_1-1 Ubiquitin-like protein FUBI P35544 UBIM_HUMAN

FRMD1_1-1 FERM domain-containing protein 1 Q8N878 FRMD1_HUMAN

FRMD1_2-1/-2 Q8N878 FRMD1_HUMAN

FRMD3_1-1/-2 FERM domain-containing protein 3 A2A2Y4 FRMD3_HUMAN

FRMD3_2-1/-2 A2A2Y4-2 FRMD3_HUMAN

FRMD3_3-1/-2 A2A2Y4-3 FRMD3_HUMAN

FRMD3_4-1 A2A2Y4-4 FRMD3_HUMAN

FRMD3_5-1 A2A2Y4-5 FRMD3_HUMAN

FRMD3_6-1/-2 A2A2Y4-6 FRMD3_HUMAN

FRMD3_7-1/-2 A2A2Y4-7 FRMD3_HUMAN

FRMD3_8-1 A2A2Y4-8 FRMD3_HUMAN

FRMD3_10-1 A2A2Y4-10 FRMD3_HUMAN

FRMD4A_1-1 FERM domain-containing protein 4A Q9P2Q2 FRM4A_HUMAN

FRMD4B_1-1 FERM domain-containing protein 4B Q9Y2L6 FRM4B_HUMAN

FRMD5_1-1/-2 FERM domain-containing protein 5 Q7Z6J6 FRMD5_HUMAN

FRMD5_2-1 Q7Z6J6-2 FRMD5_HUMAN

FRMD6_1-1 FERM domain-containing protein 6 Q96NE9 FRMD6_HUMAN

FRMD6_2-1 Q96NE9-2 FRMD6_HUMAN

FRMD7_1-1 FERM domain-containing protein 7 Q6ZUT3 FRMD7_HUMAN

FRMPD2_1-1 FERM and PDZ domain-containing protein 2 Q68DX3 FRPD2_HUMAN

FRMPD2_2-1 Q68DX3-2 FRPD2_HUMAN

FRMPD2_4-1/-2 Q68DX3-4 FRPD2_HUMAN

FRMPD2_5-1 Q68DX3-5 FRPD2_HUMAN

GABARAP Gamma-aminobutyric acid receptor-associated protein O95166 GBRAP_HUMAN

GABARAPL1_1-1 Gamma-aminobutyric acid receptor-associated protein-like 1 Q9H0R8 GBRL1_HUMAN

GABARAPL1_2-1 Q9H0R8-2 GBRL1_HUMAN

GABARAPL2 Gamma-aminobutyric acid receptor-associated protein-like 2 P60520 GBRL2_HUMAN

GRB10 Growth factor receptor-bound protein 10 Q13322 GRB10_HUMAN

GRB14 Growth factor receptor-bound protein 14 Q14449 GRB14_HUMAN

GRB7 Growth factor receptor-bound protein 7 Q14451 GRB7_HUMAN

HERPUD1_1-1 Homocysteine-responsive endoplasmic reticulum-resident ubiquitin-like domain member 1 protein Q15011 HERP1_HUMAN

HERPUD1_2-1 Q15011-2 HERP1_HUMAN

HERPUD1_3-1 Q15011-3 HERP1_HUMAN

HERPUD2_1-1 Homocysteine-responsive endoplasmic reticulum-resident ubiquitin-like domain member 2 protein Q9BSE4 HERP2_HUMAN

HSPA13 Heat shock 70 kDa protein 13 P48723 HSP13_HUMAN

IKBKB_1-1 Inhibitor of nuclear factor kappa-B kinase subunit beta O14920 IKKB_HUMAN

IQUB_1-1 IQ and ubiquitin-like domain-containing protein Q8NA54 IQUB_HUMAN

IQUB_2-1 Q8NA54-2 IQUB_HUMAN

ISG15_1-1/-2 Ubiquitin-like protein ISG15 P05161 ISG15_HUMAN

MAP1LC3A_1-1 Microtubule-associated proteins 1A/1B light chain 3A Q9H492 MLP3A_HUMAN

MAP1LC3A_2-1 Q9H492-2 MLP3A_HUMAN

MAP1LC3B Microtubule-associated proteins 1A/1B light chain 3B Q9GZQ8 MLP3B_HUMAN

MAP1LC3B2 Microtubule-associated proteins 1A/1B light chain 3 beta 2 A6NCE7 MP3B2_HUMAN

MAP1LC3C Microtubule-associated proteins 1A/1B light chain 3C Q9BXW4 MLP3C_HUMAN

MDP1_1 Magnesium-dependent phosphatase 1 Q86V88 MGDP1_HUMAN

MDP1_2 Q86V88 MGDP1_HUMAN

MDP1_3 Q86V88 MGDP1_HUMAN

MIDN Midnolin Q504T8 MIDN_HUMAN

MLLT4_1 Afadin P55196 AFAD_HUMAN

MLLT4_2 P55196-2 AFAD_HUMAN

MLLT4_3 P55196-3 AFAD_HUMAN

MLLT4_4 P55196-4 AFAD_HUMAN

MLLT4_5 P55196-5 AFAD_HUMAN

MLLT4_6 P55196-6 AFAD_HUMAN

MOCS2 Molybdopterin synthase sulfur carrier subunit O96033 MOC2A_HUMAN

MYLIP_1-1 E3 ubiquitin-protein ligase MYLIP Q8WY64 MYLIP_HUMAN

MYLIP_2-1 Q8WY64 MYLIP_HUMAN

MYO9A_1 Unconventional myosin-IXa B2RTY4 MYO9A_HUMAN

MYO9A_2 B2RTY4 MYO9A_HUMAN

MYO9A_3 B2RTY4 MYO9A_HUMAN

MYO9A_4 B2RTY4 MYO9A_HUMAN

MYO9B_1-1 Unconventional myosin-IXb Q13459 MYO9B_HUMAN

MYO9B_2-1 Q13459 MYO9B_HUMAN

NAE1_1-1 NEDD8-activating enzyme E1 regulatory subunit Q13564 ULA1_HUMAN

NAE1_2-1 Q13564 ULA1_HUMAN

NCF2_1-1 Neutrophil cytosol factor 2 P19878 NCF2_HUMAN

NEDD8 NEDD8 Q15843 NEDD8_HUMAN

NF2_1 Merlin P35240 MERL_HUMAN

NF2_2 P35240 MERL_HUMAN

NF2_3 P35240 MERL_HUMAN

NF2_4 P35240 MERL_HUMAN

NF2_5 P35240 MERL_HUMAN

NF2_7 P35240 MERL_HUMAN

NF2_8 P35240 MERL_HUMAN

NF2_9 P35240 MERL_HUMAN

NFATC2IP_1 NFATC2-interacting protein Q8NCF5 NF2IP_HUMAN

NFATC2IP_2 Q8NCF5-2 NF2IP_HUMAN

NFATC2IP_3 Q8NCF5-3 NF2IP_HUMAN

NPLOC4_1 Nuclear protein localization protein 4 homolog Q8TAT6 NPL4_HUMAN

NPLOC4_2 Q8TAT6 NPL4_HUMAN

NSFL1C_1 NSFL1 cofactor p47 Q9UNZ2 NSF1C_HUMAN

NSFL1C_2 Q9UNZ2 NSF1C_HUMAN

NSFL1C_3 Q9UNZ2 NSF1C_HUMAN

NSFL1C_4 Q9UNZ2 NSF1C_HUMAN

OASL_1 2'-5'-oligoadenylate synthase-like protein Q15646 OASL_HUMAN

OASL_2 Q15646-2 OASL_HUMAN

PAN2_1-1/-2/-3 Retinol dehydrogenase 14 Q9HBH5 RDH14_HUMAN

PAN2_2-1/-2/-3 Q9HBH5-2 RDH14_HUMAN

PAN2_3-1/-2/-3 Q9HBH5-3 RDH14_HUMAN

PARK2_1 E3 ubiquitin-protein ligase parkin O60260 PRKN2_HUMAN

PARK2_2 O60260-2 PRKN2_HUMAN

PARK2_3 O60260-3 PRKN2_HUMAN

PARK2_4 O60260-4 PRKN2_HUMAN

PARK2_5 O60260-5 PRKN2_HUMAN

PARK2_6 O60260-6 PRKN2_HUMAN

PCGF1_1-1 Polycomb group RING finger protein 1 Q9BSM1 PCGF1_HUMAN

PCGF1_2-1 Q9BSM1-2 PCGF1_HUMAN

PCGF2_1-1 Polycomb group RING finger protein 2 P35227 PCGF2_HUMAN

PCGF3_1-1 Polycomb group RING finger protein 3 Q3KNV8 PCGF3_HUMAN

PCGF3_2-1 Q3KNV8-2 PCGF3_HUMAN

PCGF5_1-1 Polycomb group RING finger protein 5 Q86SE9 PCGF5_HUMAN

PCGF6_1-1 Polycomb group RING finger protein 6 Q9BYE7 PCGF6_HUMAN

PCGF6_2-1/-2 Q9BYE7-2 PCGF6_HUMAN

PIK3C2A Phosphatidylinositol 4-phosphate 3-kinase C2 domain-containing subunit alpha O00443 P3C2A_HUMAN

PIK3C2B Phosphatidylinositol 4-phosphate 3-kinase C2 domain-containing subunit beta O00750 P3C2B_HUMAN

PIK3CA Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform P42336 PK3CA_HUMAN

PIK3CB Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit beta isoform P42338 PK3CB_HUMAN

PIK3CD Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit delta isoform O00329 PK3CD_HUMAN

PIK3CG Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit gamma isoform P48736 PK3CG_HUMAN

PLXNC1_1-1/-2 Plexin-C1 O60486 PLXC1_HUMAN

HELZ2_1-1/-2/-3 Helicase with zinc finger domain 2 (PRIC285) Q9BYK8 PR285_HUMAN

HELZ2_2-1/-2 Q9BYK8-2 PR285_HUMAN

HELZ2_3 Q9BYK8-3 PR285_HUMAN

PTPN13_1-1/-2/-3 Tyrosine-protein phosphatase non-receptor type 13 Q12923 PTN13_HUMAN

PTPN13_2-1/-2/-3 Q12923-2 PTN13_HUMAN

PTPN13_3-1/-2/-3/-4/-5/-6/-7/-8/-9 Q12923-3 PTN13_HUMAN

PTPN13_4-1/-2/-3/-4/-5/-6/-7/-8/-9/-10 Q12923-4 PTN13_HUMAN

PTPN14_1-1/-2/-3/-4 Tyrosine-protein phosphatase non-receptor type 14 Q15678 PTN14_HUMAN

PTPN21_1-1/-2/-3 Tyrosine-protein phosphatase non-receptor type 21 Q16825 PTN21_HUMAN

PTPN3_1-1/-2 Tyrosine-protein phosphatase non-receptor type 3 P26045 PTN3_HUMAN

PTPN4_1-1/-2/-3 Tyrosine-protein phosphatase non-receptor type 4 P29074 PTN4_HUMAN

RAD23A UV excision repair protein RAD23 homolog A P54725 RD23A_HUMAN

RAD23B UV excision repair protein RAD23 homolog B P54727 RD23B_HUMAN

RAF1_1 RAF proto-oncogene serine/threonine-protein kinase P04049 RAF1_HUMAN

RAF1_2 P04049-2 RAF1_HUMAN

RALGDS_1-1/-2 Ral guanine nucleotide dissociation stimulator Q12967 GNDS_HUMAN

RALGDS_2-1/-2/-3 Q12967-2 GNDS_HUMAN

RALGDS_3 Q12967 GNDS_HUMAN

RAPGEF2 Rap guanine nucleotide exchange factor 2 Q9Y4G8 RPGF2_HUMAN

RAPGEF4_1 Rap guanine nucleotide exchange factor 4 Q8WZA2 RPGF4_HUMAN

RAPGEF4_2 Q8WZA2 RPGF4_HUMAN

RAPGEF4_3 Q8WZA2 RPGF4_HUMAN

RAPH1_1 Ras-associated and pleckstrin homology domains-containing protein 1 Q70E73 RAPH1_HUMAN

RAPH1_2 Q70E73 RAPH1_HUMAN

RAPH1_3 Q70E73 RAPH1_HUMAN

RAPH1_4 Q70E73 RAPH1_HUMAN

RAPH1_5 Q70E73 RAPH1_HUMAN

RAPH1_6 Q70E73 RAPH1_HUMAN

RAPH1_7 Q70E73 RAPH1_HUMAN

RAPH1_8 Q70E73 RAPH1_HUMAN

RAPH1_9 Q70E73 RAPH1_HUMAN

RASIP1 Ras-interacting protein 1 Q5U651 RAIN_HUMAN

RASSF1_1 Ras association domain-containing protein 1 Q9NS23 RASF1_HUMAN

RASSF1_2 Q9NS23 RASF1_HUMAN

RASSF1_3 Q9NS23 RASF1_HUMAN

RASSF1_4 Q9NS23 RASF1_HUMAN

RASSF1_5 Q9NS23 RASF1_HUMAN

RASSF2 Ras association domain-containing protein 2 P50749 RASF2_HUMAN

RASSF3_1 Ras association domain-containing protein 3 Q86WH2 RASF3_HUMAN

RASSF4_1 Ras association domain-containing protein 4 Q9H2L5 RASF4_HUMAN

RASSF4_2 Q9H2L5-2 RASF4_HUMAN

RASSF4_3 Q9H2L5-3 RASF4_HUMAN

RASSF4_4 Q9H2L5-4 RASF4_HUMAN

RASSF5_1 Ras association domain-containing protein 5 Q8WWW0 RASF5_HUMAN

RASSF5_2 Q8WWW0 RASF5_HUMAN

RASSF5_3 Q8WWW0 RASF5_HUMAN

RASSF5_4 Q8WWW0 RASF5_HUMAN

RASSF6_1 Ras association domain-containing protein 6 Q6ZTQ3 RASF6_HUMAN

RASSF6_2 Q6ZTQ3 RASF6_HUMAN

RASSF6_3 Q6ZTQ3 RASF6_HUMAN

RASSF6_4 Q6ZTQ3 RASF6_HUMAN

RASSF7_1 Ras association domain-containing protein 7 Q02833 RASF7_HUMAN

RASSF7_2 Q02833 RASF7_HUMAN

RASSF8_1 Ras association domain-containing protein 8 Q8NHQ8 RASF8_HUMAN

RASSF8_2 Q8NHQ8 RASF8_HUMAN

RASSF9 Ras association domain-containing protein 9 O75901 RASF9_HUMAN

RBCK1_1-1/-2 RanBP-type and C3HC4-type zinc finger-containing protein 1 Q9BYM8 HOIL1_HUMAN

RBCK1_2-1/-2 Q9BYM8 HOIL1_HUMAN

RBCK1_2-2 Q9BYM8 HOIL1_HUMAN

RBCK1_3-1 Q9BYM8 HOIL1_HUMAN

RDX_1-1 Radixin P35241 RADI_HUMAN

RGL1_1-1 Ral guanine nucleotide dissociation stimulator-like 1 Q9NZL6 RGL1_HUMAN

RGL1_2-1 Q9NZL6-2 RGL1_HUMAN

RGL2_1-1 Ral guanine nucleotide dissociation stimulator-like 2 O15211 RGL2_HUMAN

RGL3_1-1 Ral guanine nucleotide dissociation stimulator-like 3 Q3MIN7 RGL3_HUMAN

RGS12_1 Regulator of G-protein signaling 12 O14924 RGS12_HUMAN

RGS12_2 O14924-2 RGS12_HUMAN

RGS12_3 O14924-3 RGS12_HUMAN

RGS12_4 O14924-4 RGS12_HUMAN

RGS14_1 Regulator of G-protein signaling 14 O43566 RGS14_HUMAN

RGS14_2 O43566-2 RGS14_HUMAN

RGS14_3 O43566-3 RGS14_HUMAN

RGS14_4 O43566-4 RGS14_HUMAN

RIN1_1 Ras and Rab interactor 1 Q13671 RIN1_HUMAN

RIN1_2 Q13671-2 RIN1_HUMAN

RIN2_1 Ras and Rab interactor 2 Q8WYP3 RIN2_HUMAN

RIN2_2 Q8WYP3-2 RIN2_HUMAN

RIN3_1 Ras and Rab interactor 3 Q8TB24 RIN3_HUMAN

RING1_1-1/-2 E3 ubiquitin-protein ligase RING1 Q06587 RING1_HUMAN

RING1_2-1/-2 Q06587-2 RING1_HUMAN

RING2_1-1 E3 ubiquitin-protein ligase RING2 Q99496 RING2_HUMAN

RP1_1-1 Oxygen-regulated protein 1 P56715 RP1_HUMAN

RP1L1_1 Retinitis pigmentosa 1-like 1 protein Q8IWN7 RP1L1_HUMAN

RP1L1_2 Q8IWN7 RP1L1_HUMAN

RPS27A_1-1 Ubiquitin-40S ribosomal protein S27a P62979 RS27A_HUMAN

RSG1_1-1/2 REM2- and Rab-like small GTPase 1 Q9BU20 RSG1_HUMAN

SACS_1 Sacsin Q9NZJ4 SACS_HUMAN

SACS_2 Q9NZJ4 SACS_HUMAN

SAE1_1-1 SUMO-activating enzyme subunit 1 Q9UBE0 SAE1_HUMAN

SAE1_2-1 Q9UBE0-2 SAE1_HUMAN

SAE1_3-1 Q9UBE0-3 SAE1_HUMAN

SAE2_1-1 SUMO-activating enzyme subunit 2 Q9UBT2 SAE2_HUMAN

SF3A1_1-1 Splicing factor 3A subunit 1 Q15459 SF3A1_HUMAN

SHARPIN_1-1/-2 Sharpin Q9H0F6 SHRPN_HUMAN

SHARPIN_2-1/-2 Q9H0F6-2 SHRPN_HUMAN

SHARPIN_3-1 Q9H0F6-3 SHRPN_HUMAN

SHROOM1_1 Shroom1 Q2M3G4 SHRM1_HUMAN

SHROOM1_2 Q2M3G4-2 SHRM1_HUMAN

SNRNP25 U11/U12 small nuclear ribonucleoprotein 25 kDa protein Q9BV90 SNR25_HUMAN

SNX27_1 Sorting nexin-27 Q96L92 SNX27_HUMAN

SNX27_2 Q96L92 SNX27_HUMAN

SNX27_3 Q96L92 SNX27_HUMAN

SNX31_1 Sorting nexin-31 Q8N9S9 SNX31_HUMAN

SNX31_2 Q8N9S9-2 SNX31_HUMAN

SUMO1_1-1/-2 Small ubiquitin-related modifier 1 P63165 SUMO1_HUMAN

SUMO2_1-1 Small ubiquitin-related modifier 2 P61956 SUMO2_HUMAN

SUMO2_2-1 P61956 SUMO2_HUMAN

SUMO3_1-1 Small ubiquitin-related modifier 3 P55854 SUMO3_HUMAN

SUMO4_1-1 Small ubiquitin-related modifier 4 Q6EEV6 SUMO4_HUMAN

TBCB_1-1 Tubulin-folding cofactor B Q99426 TBCB_HUMAN

TBCE Tubulin-specific chaperone E Q15813 TBCE_HUMAN

TBCEL Tubulin-specific chaperone cofactor E-like protein Q5QJ74 TBCEL_HUMAN

TCEB2_1-1 Transcription elongation factor B polypeptide 2 Q15370 ELOB_HUMAN

TECR_1 Very-long-chain enoyl-CoA reductase Q9NZ01 TECR_HUMAN

TIAM1 T-lymphoma invasion and metastasis-inducing protein 1 Q13009 TIAM1_HUMAN

TIAM2_1 T-lymphoma invasion and metastasis-inducing protein 2 Q8IVF5 TIAM2_HUMAN

TIAM2_2 Q8IVF5 TIAM2_HUMAN

TIAM2_4 Q8IVF5 TIAM2_HUMAN

TIAM2_5 Q8IVF5 TIAM2_HUMAN

TMUB1_1-1 Transmembrane and ubiquitin-like domain-containing protein 1 Q9BVT8 TMUB1_HUMAN

TMUB2_1-1 Transmembrane and ubiquitin-like domain-containing protein 2 Q71RG4 TMUB2_HUMAN

TMUB2_2-1/-2 Q71RG4-2 TMUB2_HUMAN

TMUB2_3-1 Q71RG4 TMUB2_HUMAN

TMUB2_4-1 Q71RG4 TMUB2_HUMAN

UBA1 Ubiquitin-like modifier-activating enzyme 1 P22314 UBA1_HUMAN

UBA3_1 NEDD8-activating enzyme E1 catalytic subunit Q8TBC4 UBA3_HUMAN

UBA3_2 Q8TBC4 UBA3_HUMAN

UBA5_1 Ubiquitin-like modifier-activating enzyme 5 Q9GZZ9 UBA5_HUMAN

UBA5_2 Q9GZZ9 UBA5_HUMAN

UBA6_1 Ubiquitin-like modifier-activating enzyme 6 A0AVT1 UBA6_HUMAN

UBA6_2 A0AVT1-2 UBA6_HUMAN

UBA7 Ubiquitin-like modifier-activating enzyme 7 P41226 UBA7_HUMAN

UBA52_1-1 Ubiquitin-60S ribosomal protein L40 P62987 RL40_HUMAN

UBAC1 Ubiquitin-associated domain-containing protein 1 Q9BSL1 UBAC1_HUMAN

UBB_1-1 Polyubiquitin-B P0CG47 UBB_HUMAN

UBC_1-1 Polyubiquitin-C P0CG48 UBC_HUMAN

UBD_1-1/-2 Ubiquitin D O15205 UBD_HUMAN

UBFD1_1-1/-2 Ubiquitin domain-containing protein UBFD1 O14562 UBFD1_HUMAN

UBIML_1-1 Putative ubiquitin-like protein FUBI-like protein ENSP00000310146 A6NDN8 UBIML_HUMAN

UBIML_2-1 A6NDN8-2 UBIML_HUMAN

UBL3_1-1 Ubiquitin-like protein 3 O95164 UBL3_HUMAN

UBL4A_1-1 Ubiquitin-like protein 4A P11441 UBL4A_HUMAN

UBL4B_1-1 Ubiquitin-like protein 4B Q8N7F7 UBL4B_HUMAN

UBL5_1-1 Ubiquitin-like protein 5 Q9BZL1 UBL5_HUMAN

UBL7_1-1 Ubiquitin-like protein 7 Q96S82 UBL7_HUMAN

UBLCP1_1-1/-2/-3 Ubiquitin-like domain-containing CTD phosphatase 1 Q8WVY7 UBCP1_HUMAN

UBQLN1_1-1 Ubiquilin-1 Q9UMX0 UBQL1_HUMAN

UBQLN1_2-1 Q9UMX0 UBQL1_HUMAN

UBQLN2_1-1 Ubiquilin-2 Q9UHD9 UBQL2_HUMAN

UBQLN3_1-1 Ubiquilin-3 Q9H347 UBQL3_HUMAN

UBQLN4_1-1 Ubiquilin-4 Q9NRR5 UBQL4_HUMAN

UBQLN4_2-1 Q9NRR5 UBQL4_HUMAN

UBQLNL_1-1 Ubiquilin-like protein Q8IYU4 UBQLN_HUMAN

UBQLNL_2-1 Q8IYU4 UBQLN_HUMAN

UBTD1_1-1 Ubiquitin domain-containing protein 1 Q9HAC8 UBTD1_HUMAN

UBTD2_1-1 Ubiquitin domain-containing protein 2 Q8WUN7 UBTD2_HUMAN

UBXN1_1-1 UBX domain-containing protein 1 Q04323 UBXN1_HUMAN

UBXN1_2-1 Q04323 UBXN1_HUMAN

UBXN2A_1-1/-2 UBX domain-containing protein 2A P68543 UBX2A_HUMAN

UBXN2B_1-1/-2 UBX domain-containing protein 2B Q14CS0 UBX2B_HUMAN

UBXN4_1-1/-2 UBX domain-containing protein 4 Q92575 UBXN4_HUMAN

UBXN6_1-1/-2 UBX domain-containing protein 6 Q9BZV1 UBXN6_HUMAN

UBXN6_2-1 Q9BZV1-2 UBXN6_HUMAN

UBXN7_1-1/-2 UBX domain-containing protein 7 O94888 UBXN7_HUMAN

UBXN8_1-1 UBX domain-containing protein 8 O00124 UBXN8_HUMAN

UBXN8_2-1 O00124 UBXN8_HUMAN

UBXN8_3-1 O00124 UBXN8_HUMAN

UBXN10_1-1 UBX domain-containing protein 10 Q96LJ8 UBX10_HUMAN

UBXN11_1 UBX domain-containing protein 11 Q5T124 UBX11_HUMAN

UBXN11_2 Q5T124 UBX11_HUMAN

UBXN11_3 Q5T124 UBX11_HUMAN

UBXN11_4 Q5T124 UBX11_HUMAN

UBXN11_5 Q5T124 UBX11_HUMAN

UBXN11_8 Q5T124 UBX11_HUMAN

UFM1_1-1 Ubiquitin-fold modifier 1 P61960 UFM1_HUMAN

UFM1_2-1 P61960 UFM1_HUMAN

UHRF1_1-1 E3 ubiquitin-protein ligase UHRF1 Q96T88 UHRF1_HUMAN

UHRF1BP1 UHRF1-binding protein 1 Q6BDS2 URFB1_HUMAN

UHRF2_1-1 E3 ubiquitin-protein ligase UHRF2 Q96PU4 UHRF2_HUMAN

UHRF2_2-1 Q96PU4 UHRF2_HUMAN

URM1_1-1 Ubiquitin-related modifier 1 Q9BTM9 URM1_HUMAN

URM1-2 Q9BTM9 URM1_HUMAN

USP11_1-1/-2 Ubiquitin carboxyl-terminal hydrolase 11 P51784 UBP11_HUMAN

USP11_1-2 P51784 UBP11_HUMAN

USP14_1-1/-2 Ubiquitin carboxyl-terminal hydrolase 14 P54578 UBP14_HUMAN

USP14_1-2 P54578 UBP14_HUMAN

USP15_1-1/-2/-3 Ubiquitin carboxyl-terminal hydrolase 15 Q9Y4E8 UBP15_HUMAN

USP15_1-2 Q9Y4E8 UBP15_HUMAN

USP15_1-3 Q9Y4E8 UBP15_HUMAN

USP15_2-1/-2/-3 Q9Y4E8-2 UBP15_HUMAN

USP15_2-2 Q9Y4E8-2 UBP15_HUMAN

USP15_2-3 Q9Y4E8-2 UBP15_HUMAN

USP15_3-1/-2/-3 Q9Y4E8-3 UBP15_HUMAN

USP15_3-2 Q9Y4E8-3 UBP15_HUMAN

USP15_3-3 Q9Y4E8-3 UBP15_HUMAN

USP20_1-1/-2 Ubiquitin carboxyl-terminal hydrolase 20 Q9Y2K6 UBP20_HUMAN

USP20_1-2 Q9Y2K6 UBP20_HUMAN

USP21_1-1 Ubiquitin carboxyl-terminal hydrolase 21 Q9UK80 UBP21_HUMAN

USP21_3-1 Q9UK80-3 UBP21_HUMAN

USP24_1-1/-2 Ubiquitin carboxyl-terminal hydrolase 24 Q9UPU5 UBP24_HUMAN

USP24_1-2 Q9UPU5 UBP24_HUMAN

USP25_1-1/-2/-3 Ubiquitin carboxyl-terminal hydrolase 25 Q9UHP3 UBP25_HUMAN

USP25_1-2 Q9UHP3 UBP25_HUMAN

USP25_1-3 Q9UHP3 UBP25_HUMAN

USP25_2-1/-2/-3 Q9UHP3 UBP25_HUMAN

USP25_2-2 Q9UHP3 UBP25_HUMAN

USP25_2-3 Q9UHP3 UBP25_HUMAN

USP28_1-1 Ubiquitin carboxyl-terminal hydrolase 28 Q96RU2 UBP28_HUMAN

USP28_1-2 Q96RU2 UBP28_HUMAN

USP28_2-1 Q96RU2-2 UBP28_HUMAN

USP28_2-2 Q96RU2-2 UBP28_HUMAN

USP28_2-3 Q96RU2-2 UBP28_HUMAN

USP31_1-1 Ubiquitin carboxyl-terminal hydrolase 31 Q70CQ4 UBP31_HUMAN

USP32_1-1/-2/-3/-4/-5/-6 Ubiquitin carboxyl-terminal hydrolase 32 Q8NFA0 UBP32_HUMAN

USP34_1-1/-2/-3/-4/-5/-6 Ubiquitin carboxyl-terminal hydrolase 34 Q70CQ2 UBP34_HUMAN

USP34_2-1/-2/-3 Q70CQ2-2 UBP34_HUMAN

USP34_3-1/-2/-3 Q70CQ2-3 UBP34_HUMAN

USP4_1-1/-2/-3/-4 Ubiquitin carboxyl-terminal hydrolase 4 Q13107 UBP4_HUMAN

USP4_2-1/-2/-3/-4 Q13107-2 UBP4_HUMAN

USP40_1-1/-2 Ubiquitin carboxyl-terminal hydrolase 40 Q9NVE5 UBP40_HUMAN

USP40_2-1 Q9NVE5-2 UBP40_HUMAN

USP40_3-1/-2/-3 Q9NVE5-3 UBP40_HUMAN

USP43_1-1/-2/-3/-4 Ubiquitin carboxyl-terminal hydrolase 43 Q70EL4 UBP43_HUMAN

USP43_3-1 Q70EL4-3 UBP43_HUMAN

USP47_1-1/-2/-3/-4/-5/-6 Ubiquitin carboxyl-terminal hydrolase 47 Q96K76 UBP47_HUMAN

USP47_2-1/-2/-3/-4/-5 Q96K76-2 UBP47_HUMAN

USP47_3-1 Q96K76-3 UBP47_HUMAN

USP48_1-1/-2 Ubiquitin carboxyl-terminal hydrolase 48 Q86UV5 UBP48_HUMAN

USP48_2-1/-2/-3 Q86UV5-2 UBP48_HUMAN

USP48_3-1/-2 Q86UV5-3 UBP48_HUMAN

USP48_4-1 Q86UV5-4 UBP48_HUMAN

USP48_5-1/-2 Q86UV5-5 UBP48_HUMAN

USP48_6-1 Q86UV5-6 UBP48_HUMAN

USP48_7-1/-2 Q86UV5-7 UBP48_HUMAN

USP5_1-1 Ubiquitin carboxyl-terminal hydrolase 5 P45974 UBP5_HUMAN

USP5_2-1/-2 P45974-2 UBP5_HUMAN

USP6_1-1/-2/-3 Ubiquitin carboxyl-terminal hydrolase 6 P35125 UBP6_HUMAN

USP6_2-1/-2/-3/-4 P35125-2 UBP6_HUMAN

USP7_1-1/-2/-3/-4/-5/-6/-7 Ubiquitin carboxyl-terminal hydrolase 7 Q93009 UBP7_HUMAN

USP8_1-1/-2/-3 Ubiquitin carboxyl-terminal hydrolase 8 P40818 UBP8_HUMAN

USP9X_1-1/-2/-3 Probable ubiquitin carboxyl-terminal hydrolase FAF-X Q93008 USP9X_HUMAN

USP9X_2-1/-2/-3 Q93008-2 USP9X_HUMAN

USP9Y_1-1/-2/-3 Probable ubiquitin carboxyl-terminal hydrolase FAF-Y O00507 USP9Y_HUMAN

USP9Y_2-1/-2/-3 O00507-2 USP9Y_HUMAN

VCPIP1_1-1/-2/-3 Deubiquitinating protein VCIP135 Q96JH7 VCIP1_HUMAN

WDR48_1-1/-2 WD repeat-containing protein 48 Q8TAF3 WDR48_HUMAN

WDR48_2-1 Q8TAF3-2 WDR48_HUMAN

WDR48_3-1/-2/-3 Q8TAF3-3 WDR48_HUMAN

WDR48_4-1/-2/-3 Q8TAF3-4 WDR48_HUMAN

WDR48_5-1/-2 Q8TAF3-5 WDR48_HUMAN

YOD1_1-1 Ubiquitin thioesterase OTU1 Q5VVQ6 OTU1_HUMAN

YOD1_2-1 Q5VVQ6 OTU1_HUMAN

Appendix III: 205 proteins interact with both ubiquitin & at least one member of the ubiquilin family.

Uniprot ID        UniProtKB Entry  Uniprot ID  UniProtKB Entry  Uniprot ID         UniProtKB Entry
P62259            1433E_MOUSE      P56480      ATPB_MOUSE       P68104             EF1A1_HUMAN
P68510            1433F_MOUSE      P01887      B2MG_MOUSE       Q9D8N0             EF1G_MOUSE
P61982            1433G_MOUSE      B2RRX1      B2RRX1_MOUSE     P17182             ENOA_MOUSE
P68254            1433T_MOUSE      B2RSC8      B2RSC8_MOUSE     P42566             EPS15_HUMAN
P63101            1433Z_MOUSE      Q6PAJ1      BCR_MOUSE        P42567             EPS15_MOUSE
A2BFF7            A2BFF7_MOUSE     P70444      BID_MOUSE        P19096             FAS_MOUSE
P05067-7          A4_HUMAN         O08539      BIN1_MOUSE       XP_005266087       FBX25_HUMAN
P12023            A4_MOUSE         Q64152      BTF3_MOUSE       Q9CPU7             FBX32_MOUSE
A8DUV3            A8DUV3_MOUSE     P00920      CAH2_MOUSE       Q9Z0E6             GBP2_MOUSE
Q8CBW3            ABI1_MOUSE       P62204      CALM_MOUSE       Q99PT1             GDIR1_MOUSE
Q5SWU9            ACACA_MOUSE      P47757      CAPZB_MOUSE      P13020             GELS_MOUSE
P57780            ACTN4_MOUSE      P14635      CCNB1_HUMAN      P15105             GLNA_MOUSE
Q9QYC0            ADDA_MOUSE       P63038      CH60_MOUSE       P38647             GRP75_MOUSE
Q16186            ADRM1_HUMAN      Q68FD5      CLH1_MOUSE       P11021             GRP78_HUMAN
Q8CJG0            AGO2_MOUSE       Q922J3      CLIP1_MOUSE      NM_002111          HD_HUMAN
P24549            AL1A1_MOUSE      Q06890      CLUS_MOUSE       Q15034             HERC3_HUMAN
Q8R0Y6            AL1L1_MOUSE      O55029      COPB2_MOUSE      O14964             HGS_HUMAN
P05064            ALDOA_MOUSE      P47941      CRKL_MOUSE       Q9D0E1             HNRPM_MOUSE
P45376            ALDR_MOUSE       Q93034      CUL5_HUMAN       Q61699             HS105_MOUSE
Q96K21            ANCHR_HUMAN      P17302      CXA1_HUMAN       P17879             HS71B_MOUSE
P17427            AP2A2_MOUSE      Q7TMB8      CYFP1_MOUSE      P34931             HS71L_HUMAN
Q9DBG3            AP2B1_MOUSE      D2KHZ9      D2KHZ9_MOUSE     P07901             HS90A_MOUSE
O54774            AP3D1_MOUSE      O08788      DCTN1_MOUSE      P63017             HSP7C_MOUSE
Q9R0Q6            ARC1A_MOUSE      Q8CBY8      DCTN4_MOUSE      P54105             ICLN_HUMAN
A2BH40            ARI1A_MOUSE      Q62167      DDX3X_MOUSE      Q9D6R2             IDH3A_MOUSE
P61161            ARP2_MOUSE       P63037      DNJA1_MOUSE      P12268             IMDH2_HUMAN
Q9CQE6            ASF1A_MOUSE      P25686      DNJB2_HUMAN      P48025             KSYK_MOUSE
Q925I1            ATAD3_MOUSE      Q9Z1N5      DX39B_MOUSE      P16125             LDHB_MOUSE
Q03265            ATPA_MOUSE       Q9JHU4      DYHC1_MOUSE      Q91ZX7             LRP1_MOUSE
Q9QXZ0            MACF1_MOUSE      Q8CI94      PYGB_MOUSE       P42227             STAT3_MOUSE
Q8R001            MARE2_MOUSE      Q3UHZ3      Q3UHZ3_MOUSE     Q9WUM5             SUCA_MOUSE
P97310            MCM2_MOUSE       Q3ULF7      Q3ULF7_MOUSE     Q13148             TADBP_HUMAN
P97311            MCM6_MOUSE       Q4VAE6      Q4VAE6_MOUSE     P10637             TAU_MOUSE
P14152            MDHC_MOUSE       Q921K2      Q921K2_MOUSE     P11983             TCPA_MOUSE
P08249            MDHM_MOUSE       Q922K6      Q922K6_MOUSE     P80316             TCPE_MOUSE
P20357            MTAP2_MOUSE      Q62172      RBP1_MOUSE       Q9NZ01             TECR_HUMAN
Q8VDD5            MYH9_MOUSE       P54725      RD23A_HUMAN      P55072             TERA_HUMAN
Q64331            MYO6_MOUSE       P54727      RD23B_HUMAN      Q01853             TERA_MOUSE
P70670            NACAM_MOUSE      P53026      RL10A_MOUSE      Q04207             TF65_MOUSE
P15532            NDKA_MOUSE       P47963      RL13_MOUSE       Q8QZT1             THIL_MOUSE
Q8TAT6            NPL4_HUMAN       Q9CR57      RL14_MOUSE       P19438             TNR1A_HUMAN
P35486            ODPA_MOUSE       Q9D8E6      RL4_MOUSE        P20333             TNR1B_HUMAN
P29341            PABP1_MOUSE      P62987      RL40_HUMAN       P17751             TPIS_MOUSE
P49586            PCY1A_MOUSE      P47911      RL6_MOUSE        P21107             TPM3_MOUSE
Q9WU78            PDC6I_MOUSE      P12970      RL7A_MOUSE       Q12933             TRAF2_HUMAN
P12382            PFKAL_MOUSE      P14869      RLA0_MOUSE       Q9R1R2             TRIM3_MOUSE
Q13526            PIN1_HUMAN       Q96GF1      RN185_HUMAN      Q9QZE7             TSNAX_MOUSE
Q9QXS1            PLEC_MOUSE       Q9Y3C5      RNF11_HUMAN      P62837             UB2D2_HUMAN
P63330            PP2AA_MOUSE      P70336      ROCK2_MOUSE      P61077             UB2D3_HUMAN
P35700            PRDX1_MOUSE      P38886      RPN10_YEAST      P0CG48, NP_066289  UBC_HUMAN
Q61171            PRDX2_MOUSE      O48726      RPN13_ARATH      P49459-3           UBE2A_HUMAN
P97313            PRKDC_MOUSE      P62281      RS11_MOUSE       P63146             UBE2B_HUMAN
P62334            PRS10_MOUSE      P25444      RS2_MOUSE        Q05086             UBE3A_HUMAN
P62192            PRS4_MOUSE       P62908      RS3_MOUSE        P11441             UBL4A_HUMAN
P54775            PRS6B_MOUSE      E9Q401      RYR2_MOUSE       Q70CQ2             UBP34_HUMAN
P35998            PRS7_HUMAN       Q9UBT2      SAE2_HUMAN       Q9UMX0             UBQL1_HUMAN
P62196            PRS8_MOUSE       O43865      SAHH2_HUMAN      Q9UHD9             UBQL2_HUMAN
P25787            PSA2_HUMAN       P42208      SEPT2_MOUSE      P15374             UCHL3_HUMAN
P60900            PSA6_HUMAN       P28661      SEPT4_MOUSE      Q13564             ULA1_HUMAN
Q9QUM9            PSA6_MOUSE       Q9R1T4      SEPT6_MOUSE      Q9C0B0             UNK_HUMAN
Q3TXS7            PSMD1_MOUSE      O55131      SEPT7_MOUSE      XP_005272733       USP9X_HUMAN
Q13200            PSMD2_HUMAN      P84022      SMAD3_HUMAN      Q9WV55             VAPA_MOUSE
Q8VDM4            PSMD2_MOUSE      Q920B9      SP16H_MOUSE      P20152             VIME_MOUSE
O43242            PSMD3_HUMAN      Q62261      SPTB2_MOUSE      P62960             YBOX1_MOUSE
P14685            PSMD3_MOUSE      P16546      SPTN1_MOUSE      P39447             ZO1_MOUSE
P55034            PSMD4_ARATH      O60232      SSA27_HUMAN      O95218-2           ZRAB2_HUMAN
P55036, P55036-2  PSMD4_HUMAN      Q92783      STAM1_HUMAN
Q05920            PYC_MOUSE        O75886      STAM2_HUMAN

Appendix IV: 127 putative UIM sequences within 106 proteins that interact with both ubiquitin & at least one member of the ubiquilin family.

[ED](3)-x(3)-[AG]-x(3)-S-x(2)-[ED]
6 hits in 6 sequences
P25686 DNJB2_HUMAN 252 - 265: DEDlqlAmaySlsE
O14964 HGS_HUMAN 260 - 273: EEElqlAlalSqsE
P55036 PSMD4_HUMAN 232 - 245: EEEarrAaaaSaaE
Q920B9 SP16H_MOUSE 994 - 1007: EEEarkAdreSryE
Q92783 STAM1_HUMAN 173 - 186: EEDlakAielSlkE
O75886 STAM2_HUMAN 167 - 180: DEDiakAielSlqE

[ED]-x(3)-[AG]-x(3)-S-x(2)-[ED]
32 hits in 27 sequences
P63101 1433Z_MOUSE 20 - 31: DdmaAcmkSvtE
Q5SWU9 ACACA_MOUSE 553 - 564: DsqfGhcfSwgE
Q96K21 ANCHR_HUMAN 208 - 219: DerqGsipStqE
Q925I1 ATAD3_MOUSE 33 - 44: DrgaGdrpSpkD
B2RRX1 B2RRX1_MOUSE 226 - 237: EmatAassSslE
B2RSC8 B2RSC8_MOUSE 8 - 19: DeseApvlSedE
O08539 BIN1_MOUSE 168 - 179: EakiAkpvSllE
Q922J3 CLIP1_MOUSE 664 - 675: EavkArldSaeD
P25686 DNJB2_HUMAN 211 - 222: DlalGlelSrrE
    254 - 265: DlqlAmaySlsE
P42566 EPS15_HUMAN 881 - 892: DlelAialSksE
P42567 EPS15_MOUSE 882 - 893: DlelAialSksE
P19096 FAS_MOUSE 1589 - 1600: DcmlGmefSgrD
P42858 HD_HUMAN 1261 - 1272: EkfgGflrSalD
O14964 HGS_HUMAN 262 - 273: ElqlAlalSqsE
Q9QXZ0 MACF1_MOUSE 4960 - 4971: EelqAktsSleE
P20357 MTAP2_MOUSE 889 - 900: EnlsGesgSfyE
Q9QXS1 PLEC_MOUSE 2090 - 2101: ElelGrirSnaE
    4289 - 4300: DpetGkemSvyE
    4364 - 4375: DqyrAgtlSitE
P14685 PSMD3_MOUSE 37 - 48: EeaaAgsgStgE
P55034 PSMD4_ARATH 225 - 236: ElalAlrvSmeE
P55036 PSMD4_HUMAN 215 - 226: ElalAlrvSmeE
    234 - 245: EarrAaaaSaaE
Q9Y3C5 RNF11_HUMAN 141 - 152: EpvdAallSsyE
P70336 ROCK2_MOUSE 1143 - 1154: EpddGfpeSrlE
P38886 RPN10_YEAST 227 - 238: ElamAlrlSmeE
E9Q401 RYR2_MOUSE 4198 - 4209: EmqlAaqiSesD
Q920B9 SP16H_MOUSE 933 - 944: DaedGdseSeiE
    996 - 1007: EarkAdreSryE
Q92783 STAM1_HUMAN 175 - 186: DlakAielSlkE
O75886 STAM2_HUMAN 169 - 180: DiakAielSlqE

[ED]-x(3)-[AG]-x(4)-S-x(2)-[ED]
25 hits in 20 sequences
P68510 1433F_MOUSE 136 - 148: EvasGekknSvvE
P24549 AL1A1_MOUSE 138 - 150: DkihGqtipSdgD
O54774 AP3D1_MOUSE 884 - 896: EelaAstitSpkD
A2BH40 ARI1A_MOUSE 2131 - 2143: DlilAtppfSrlE
Q6PAJ1 BCR_MOUSE 850 - 862: DyerAewreSirE
O08788 DCTN1_MOUSE 875 - 887: EqiyGspssSpyE
    965 - 977: ElseAnvrlSllE
P63037 DNJA1_MOUSE 74 - 86: EggaGggfgSpmD
P19096 FAS_MOUSE 1358 - 1370: EvqpApsllSqeE
P38647 GRP75_MOUSE 244 - 256: DlggGtfdiSilE
Q9QXZ0 MACF1_MOUSE 105 - 117: DlrdGhnliSllE
    3823 - 3835: EqyaAslarSeaE
Q9QXS1 PLEC_MOUSE 217 - 229: DlrdGhnliSllE
    2360 - 2372: EvteAarqrSqvE
P62192 PRS4_MOUSE 382 - 394: DlimAkddlSgaD
P14685 PSMD3_MOUSE 50 - 62: DgkaAatehSqrE
Q3UHZ3 Q3UHZ3_MOUSE 180 - 192: EseeGnsaeSaaE
Q62172 RBP1_MOUSE 83 - 95: EgyaAfqedSsgD
    415 - 427: DlqgGikdlSkeE
Q9UBT2 SAE2_HUMAN 483 - 495: EdgkGtiliSseE
P28661 SEPT4_MOUSE 2 - 14: DhslGwqgnSvpE
Q920B9 SP16H_MOUSE 140 - 152: DkfpGefmkSwsD
    930 - 942: EgsdAedgdSesE
Q62261 SPTB2_MOUSE 1600 - 1612: DaaeAeawmSeqE
Q05086 UBE3A_HUMAN 98 - 110: EnskGapnnScsE

[ED]-x(3)-[AG]-x(5)-S-x(2)-[ED]
30 hits in 26 sequences
Q5SWU9 ACACA_MOUSE 945 - 958: DshaAtlnrkSerE
P45376 ALDR_MOUSE 217 - 230: DrpwAkpedpSllE
B2RRX1 B2RRX1_MOUSE 224 - 237: EqemAtaassSslE
Q6PAJ1 BCR_MOUSE 325 - 338: DsggGytpdcSsnE
P00920 CAH2_MOUSE 19 - 32: DfpiAngdrqSpvD
P63037 DNJA1_MOUSE 268 - 281: EalcGfqkpiStlD
P25686 DNJB2_HUMAN 71 - 84: EgltGtgtgpSraE
    254 - 267: DlqlAmayslSemE
Q9JHU4 DYHC1_MOUSE 3952 - 3965: DeqfGiwldsSspE
P68104 EF1A1_HUMAN 319 - 332: DvrrGnvagdSknD
P42858 HD_HUMAN 409 - 422: EesgGrsrsgSivE
O14964 HGS_HUMAN 262 - 275: ElqlAlalsqSeaE
Q91ZX7 LRP1_MOUSE 2807 - 2820: EsvtAgclynStcD
Q8VDD5 MYH9_MOUSE 1153 - 1166: DstaAqqelrSkrE
Q64331 MYO6_MOUSE 1234 - 1247: ErcgGiqylqSaiE
Q9QXS1 PLEC_MOUSE 1200 - 1213: EpspAaptlrSelE
    2037 - 2050: EerlAqlrkaSesE
P54775 PRS6B_MOUSE 361 - 374: EdyvArpdkiSgaD
P55034 PSMD4_ARATH 223 - 236: DpelAlalrvSmeE
    309 - 322: DlalAlqmsmSgeE
P55036 PSMD4_HUMAN 213 - 226: DpelAlalrvSmeE
Q62172 RBP1_MOUSE 19 - 32: EhgsGltrtpSseE
    83 - 96: EgyaAfqedsSgdE
P38886 RPN10_YEAST 225 - 238: DpelAmalrlSmeE
E9Q401 RYR2_MOUSE 3702 - 3715: EdddGeeevkSfeE
Q62261 SPTB2_MOUSE 1378 - 1391: DankAelftqScaD
Q92783 STAM1_HUMAN 173 - 186: EedlAkaielSlkE
O75886 STAM2_HUMAN 167 - 180: DediAkaielSlqE
Q93008 USP9X_HUMAN 1682 - 1695: EqhdAleffnSlvD
Q9WV55 VAPA_MOUSE 143 - 156: EpskAvplnaSkqD

[ED]-x(3)-[AG]-x(6)-S-x(2)-[ED]
34 hits in 27 sequences
A2BH40 ARI1A_MOUSE 117 - 131: EppgGgggggsSssD
Q6PAJ1 BCR_MOUSE 324 - 338: EdsgGgytpdcSsnE
P62204 CALM_MOUSE 7 - 21: EeqiAefkeafSlfD
Q922J3 CLIP1_MOUSE 661 - 675: DsleAvkarldSaeD
O55029 COPB2_MOUSE 593 - 607: EyqtAvmrrdfSmaD
Q9JHU4 DYHC1_MOUSE 4621 - 4635: DfeiAtkedprSfyE
P68104 EF1A1_HUMAN 403 - 417: DmvpGkpmcveSfsD
P42566 EPS15_HUMAN 576 - 590: EvttAvtekvcSelD
P19096 FAS_MOUSE 584 - 598: EvacGyadgclSqrE
Q91ZX7 LRP1_MOUSE 1353 - 1367: DwiaGniywveSnlD
    2630 - 2644: DcedAsdemncSatD
    3967 - 3981: DwvaGnvywtdSgrD
Q9QXZ0 MACF1_MOUSE 2199 - 2213: DtsvGlrsefkSehD
    2685 - 2699: DmatGkrvtlaSalE
    6870 - 6884: DrvkAlitehqSfmE
P97310 MCM2_MOUSE 790 - 804: DvnmAirvmmeSfiD
P20357 MTAP2_MOUSE 7 - 21: DegkAphwtsaSltE
Q64331 MYO6_MOUSE 702 - 716: DlmqGgfpsraSfhE
Q13526 PIN1_HUMAN 87 - 101: ElinGyiqkikSgeE
P97313 PRKDC_MOUSE 2041 - 2055: DfstGvqsysySsqD
O43242 PSMD3_HUMAN 52 - 66: DgktAaaaaehSqrE
P55036 PSMD4_HUMAN 255 - 269: DsddAllkmtiSqqE
Q62172 RBP1_MOUSE 452 - 466: EtkiAqeiaslSkeD
P54725 RD23A_HUMAN 150 - 164: EedaAstlvtgSeyE
P38886 RPN10_YEAST 194 - 208: EgssGmgafggSggD
E9Q401 RYR2_MOUSE 1859 - 1873: EeegGtpekeiSieD
    3337 - 3351: DhlkAeargdmSeaE
Q9UBT2 SAE2_HUMAN 218 - 232: EpteAeararaSneD
Q920B9 SP16H_MOUSE 930 - 944: EgsdAedgdseSeiE
Q62261 SPTB2_MOUSE 2063 - 2077: EksaAtwderfSalE
    2148 - 2162: EmvnGaaeqrtSskE
P16546 SPTN1_MOUSE 1604 - 1618: DrirGvidmgnSliE
Q70CQ2 UBP34_HUMAN 786 - 800: EknmAdfdgeeSgcE
    1672 - 1686: EscsGlyklslSglD

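The motif patterns in Appendix IV are written in PROSITE-style syntax, where `x(n)` stands for any n residues and `[ED]` for either E or D. As a minimal sketch of how such a pattern can be scanned against a sequence, the following Python translates a pattern into a regular expression and reports 1-based hit positions. This section does not state which search tool produced the hits, so this is an illustration only; the function names `prosite_to_regex` and `find_motifs` are hypothetical, and only the pattern elements that appear in this appendix are supported.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex.

    Supports only the elements used in Appendix IV: single residues (S),
    residue classes ([ED]), wildcards (x), and fixed repeats such as x(3).
    """
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(\[[A-Z]+\]|[A-Zx])(?:\((\d+)\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        token, count = m.groups()
        token = "." if token == "x" else token  # x matches any residue
        parts.append(token + (f"{{{count}}}" if count else ""))
    return "".join(parts)


def find_motifs(sequence: str, pattern: str):
    """Return (start, end, match) tuples with 1-based inclusive positions.

    A lookahead is used so that overlapping hits are also reported.
    The sequence is upper-cased before matching.
    """
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    seq = sequence.upper()
    return [
        (m.start() + 1, m.start() + len(m.group(1)), m.group(1))
        for m in regex.finditer(seq)
    ]


if __name__ == "__main__":
    # Hit fragment listed for STAM1_HUMAN (residues 173 - 186) in Appendix IV.
    fragment = "EEDlakAielSlkE"
    for pat in ("[ED](3)-x(3)-[AG]-x(3)-S-x(2)-[ED]",
                "[ED]-x(3)-[AG]-x(3)-S-x(2)-[ED]"):
        print(pat, find_motifs(fragment, pat))
```

Positions returned by this sketch are relative to the string passed in, so scanning a full-length protein sequence would be needed to obtain residue numbers comparable to those listed above.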

Appendix V: Six similarity trees of ubiquitin-like domains clustered based on electrostatic potential at varying distances (1Å to 6Å) from the UIM-binding interface, along with groups of ubiquitin-like domains that share strong electrostatic potential similarity at that specific range.