KNOTS, LASSOS, AND LINKS

Paweł Dąbrowski-Tumański

Topological manifolds in biological objects June 2019 – version 1.01

[ July 15, 2019 at 14:29 – classicthesis version 1.01 ] Paweł Dąbrowski-Tumański: Knots, lassos, and links, Topological manifolds in biological objects, © June 2019. Based on the ClassicThesis LaTeX template by André Miede.

To my wife, son, and parents.

SUMMARY

Protein chains are usually described within the framework of the four-level organization of structure. However, this description does not account for some aspects of protein geometry. One of the missing features is the presence of a knot formed by the backbone. The discovery of proteins with such a knot raises questions about the folding of these proteins and the function of the knot. Despite a combined theoretical and experimental approach, the answers to these questions remain elusive. On the other hand, apart from knotted proteins, single structures containing other topologically non-trivial motifs have recently been identified. The function of these motifs and the folding pathway of the proteins containing them are also unknown in most cases.

This work is the first holistic approach to the whole subject of non-trivial topology in proteins. Apart from proteins with a knotted backbone, the work also describes other motifs: lasso proteins, links, knotted loops, and θ-curves. Some of these motifs were discovered in the course of this work. The results focus on the classification, occurrence, function, and folding of proteins with topologically non-trivial motifs.

The part devoted to classification presents all topologically non-trivial motifs occurring in proteins. In particular, new mathematical tools enabling the classification of lasso proteins are proposed and described. The part concerning the occurrence of the structures considers the statistical probability of the various motifs. Their smaller number in comparison with estimates from polymer models is a prelude to the discussion of the function of non-trivial topology. In particular, the function of a link is shown to be the introduction of particular stability of the chain, and for some proteins the lasso topology is most likely necessary for their function. This part also proposes a function of the backbone knot, which assists the formation of enzyme active sites and stabilizes them. A new mechanism of folding of knotted proteins that uses the ribosome opens the fourth part, which also analyzes the influence of topology, confined volume, and knot length on protein folding.

A scrupulous analysis of all available spatial structures of proteins was possible only after suitable software tools were created. These tools were made available to the scientific community in the form of databases, servers, plugins for other programs, and a software package. They are described in the fifth part. The work ends by pointing out future directions of the field and with a collection of literature surrounding the topics covered in the work. This collection is addressed to future newcomers, serving as a guide to the world of proteins with complex topology and an encouragement to further work.

ABSTRACT

The organization of amino acids in a protein is usually described in terms of four levels of structure classification, which, however, misses some important aspects of protein geometry. One of the missing features is the existence of a knot tied on the protein backbone. The discovery of such knotted proteins raises questions about how they fold and what function the backbone knot serves. Despite theoretical and experimental investigation, the answers to both questions remain elusive. Moreover, apart from the knotted proteins, isolated cases of other topologically non-trivial proteins were recently identified, for which the folding and the function are also unknown.

This work is the first holistic treatment of the whole field of proteins with complex topology. Apart from the backbone knots, the work also describes other motifs, some discovered as a result of the project: complex lassos, protein links, knotted loops, and θ-curves. The work concentrates on the classification, occurrence, function, and folding of proteins with topologically complex motifs.

In the classification part, all the topologically non-trivial motifs present in proteins are described. In particular, novel mathematical tools to classify the complex lasso structures are proposed. In the part devoted to the occurrence of the motifs, their statistical probability is presented. The observed underrepresentation of the motifs in comparison with polymer models serves as a prelude to the discussion of the function of complex topology. In particular, the links are shown to stabilize the structure, and the lasso topology is strongly suggested to be crucial for the function of some proteins. This part also proposes an enzyme-favoring function of the backbone knot. A novel, ribosome-based mechanism of folding of proteins with backbone knots opens the fourth part, which also analyzes the influence of topology, confinement, and knot tails on the folding process.

The scrupulous analysis of the whole database of protein structures was possible only with the creation of special tools. These were made available to the broad scientific community in the form of databases, servers, plugins, and a Python package, to which the fifth part of the work is devoted. The work closes with future directions and further reading sections which, hopefully, will inspire younger researchers to immerse themselves in the field of complex topology proteins.

PUBLICATIONS

The thesis covers the following publications:

[D1] Dabrowski-Tumanski P, Gren B, Sulkowska JI (2019). Statistical Properties of Lasso-Shape Polymers and Their Implications for Complex Lasso Proteins Function. Polymers, 11(4), 707.

[D2] Gierut AM, Dabrowski-Tumanski P, Niemyska W, Millett KC, Sulkowska JI (2019). PyLink: a PyMOL plugin to identify links. Bioinformatics, bty1038.

[D3] Dabrowski-Tumanski P, Rubach P, Goundaroulis D, Dorier J, Sułkowski P, Millett KC, Rawdon E, Stasiak A, Sulkowska JI (2018). KnotProt 2.0: a database of proteins with knots and other entangled structures. Nucleic acids research, 47(D1), D367–D375.

[D4] Zając S, Geary C, Andersen ES, Dabrowski-Tumanski P, Sulkowska JI, Sułkowski P (2018). Genus trace reveals the topological complexity and domain structure of biomolecules. Scientific Reports, 8(1), 17537.

[D5] Dabrowski-Tumanski P, Piejko M, Niewieczerzal S, Stasiak A, Sulkowska JI (2018). Protein Knotting by Active Threading of Nascent Polypeptide Chain Exiting from the Ribosome Exit Channel. The Journal of Physical Chemistry B, 122(49), 11616–11625.

[D6] Dabrowski-Tumanski P, Sulkowska JI (2018). The APS-bracket – A topological tool to classify lasso proteins, RNAs and other tadpole-like structures. Reactive and Functional Polymers, 132, 19–25.

[D7] Jarmolinska AI, Kadlof M, Dabrowski-Tumanski P, Sulkowska JI (2018). GapRepairer: a server to model a structural gap and validate it using topological analysis. Bioinformatics, 34(19), 3300–3307.

[D8] Zhao Y, Dabrowski-Tumanski P, Niewieczerzal S, Sulkowska JI (2018). The exclusive effects of chaperonin on the behavior of proteins with 5_2 knot. PLoS computational biology, 14(3), e1005970.

[D9] Dabrowski-Tumanski P, Sulkowska J (2017). To tie or not to tie? That is the question. Polymers, 9(9), 454.

[D10] Gierut AM, Niemyska W, Dabrowski-Tumanski P, Sułkowski P, Sulkowska JI (2017). PyLasso: a PyMOL plugin to identify lassos. Bioinformatics, 33(23), 3819–3821.

[D11] Dabrowski-Tumanski P, Sulkowska JI (2017). Topological knots and links in proteins. Proceedings of the National Academy of Sciences, 114(13), 3415–3420.

[D12] Dabrowski-Tumanski P, Sklodowski M, Sulkowska JI (2016). Current approaches to disentangle the mystery of knotted protein folding. TASK Quarterly, 20(4), 361–371.

[D13] Niemyska W, Dabrowski-Tumanski P, Kadlof M, Haglund E, Sułkowski P, Sulkowska JI (2016). Complex lasso: new entangled motifs in proteins. Scientific reports, 6, 36895.

[D14] Dabrowski-Tumanski P, Stasiak A, Sulkowska JI (2016). In search of functional advantages of knots in proteins. PloS one, 11(11), e0165986.

[D15] Dabrowski-Tumanski P, Jarmolinska AI, Niemyska W, Rawdon EJ, Millett KC, Sulkowska JI (2016). LinkProt: A database collecting information about biological links. Nucleic acids research, 45(D1), D243–D249.

[D16] Dabrowski-Tumanski P, Niemyska W, Pasznik P, Sulkowska JI (2016). Lassoprot: server to analyze biopolymers with lassos. Nucleic acids research, 44(W1), W383–W389.

[D17] Dabrowski-Tumanski P, Jarmolinska AI, Sulkowska JI (2015). Prediction of the optimal set of contacts to fold the smallest knotted protein. Journal of Physics: Condensed Matter, 27(35), 354109.

[D18] Dabrowski-Tumanski P, Niewieczerzal S, Sulkowska JI (2014). Determining critical amino acid contacts for knotted protein folding. TASK Quarterly, 18(3), 265–279.

Articles in preparation:

[D19] Dabrowski-Tumanski P, Goundaroulis D, Stasiak A, Sulkowska JI, θ-curves in proteins

[D20] Dabrowski-Tumanski P, Sulkowska JI, The biological role of complex lasso motif

[D21] Dabrowski-Tumanski P, Perlinska A, Sulkowska JI, Macromolecular links in proteins

[D22] Dabrowski-Tumanski P, Rubach P, Niemyska W, Gren B, Jastrzebski B, Sulkowska JI, Topoly - a Python package to analyze the topology of polymers

[D23] Majewski M, Dabrowski-Tumanski P, Sulkowska JI, Search for non-trivial topology in CASP competition - a key element to improve protein structure prediction

[D24] Dabrowski-Tumanski P, Sulkowska JI, Ways to classify the lasso structures

ACKNOWLEDGEMENTS

I would like to express my special gratitude to my supervisor, prof. Joanna Sulkowska, for introducing me to the world of complex topology proteins, the freedom to choose the topics, the time spent on discussions, the hard work to meet short deadlines, patience, constant support, the possibility to attend numerous conferences and meet interesting people, and for pointing out directions of development. I truly believe that only a few other PhD students worldwide could count on similar working conditions.

Great acknowledgements belong to prof. Andrzej Stasiak, for his support, discussions, and interesting ideas, which eventually led to the work on the function and folding mechanism of deeply knotted proteins, and for all the time I could spend in Lausanne with him and his group: Fabrizio Benedetti, Dusan Racko, Dimos Goundaroulis, and Julien Dorier. I learned a lot. Thank you.

This work would not have been possible without the great cooperation with the members of Joanna Sulkowska's group: Szymon Niewieczerzal, who introduced me to simulations and did an excellent proofreading of this work; Aleksandra Jarmolinska, who was absolutely fantastic in coding and taking care of the servers; Wanda Niemyska, with whom I discussed a lot about mathematics and life; Bartosz Gren, who taught me that in science, even if you are sure, you need to seek a proof; Maciej Piejko, who taught me numerous interesting facts from biology; Pawel Rubach, for his astonishing input from the IT side; Pawel Pasznik, who calmly implemented all our requests on the server; Aleksandra Gierut, who despite all our requests created two fantastic plugins; Yani Zhao, Michal Kadlof, Maciej Majewski, Aleksandra Grzeszczak, Ania Wojtczak, Borys Jastrzebski, and Grzegorz Rajchel, for the cooperation which led and will lead to interesting articles; and Joanna Macnar, Gaja Klaudel, Jacek Kedzierski, Vasilina Zayats, Rafal Jakubowski, Agata Perlinska, Adam Stasiulewicz, Maciek Sikora, Agata Bernat, and Martyna Osada, with whom I spent time and discussed the good and bad sides of work in science. Time with you all was truly remarkable.

I had the pleasure to meet and talk to many inspiring scientists in the field, who patiently explained to me the details of various subjects. With some of them, I had the genuine privilege to cooperate. For that I would like to thank Kenneth Millett, Eric Rawdon, Ellinor Haglund, Piotr Sułkowski, Sophie Jackson, Angel Garcia, Tetsuo Deguchi, Jason Cantarella, Józef Przytycki, Dorothy Buck, Cristian Micheletti, Flavio Seno, Erik Schreyer, Peter Virnau, Agnese Barbensi, Carolina Mendonca, Marek Cieplak, Diego U. Ferreiro, and De Witt Sumners.


My scientific career began with the adventure in Jacek Jemielity's lab. I will always be grateful for the time invested and the experimental techniques I learned with him and his group. Here, I would also like to thank my two other former advisors: prof. Jan Antosiewicz, for allowing me to do my project in his lab despite my many parallel activities, and prof. Stanislaw Nowak, for introducing me to algebraic topology and getting me interested in it.

I would also like to thank the National Science Center (Preludium #2016/21/N/NZ1/02848 and Etiuda #2017/24/T/NZ1/00490 grants), the Foundation for Polish Science (Start fellowship), and the Faculty of Chemistry for funding.

I think that some socially and scientifically important part of me was shaped by the 21st Warsaw Scout Team, scout activities, and travels through mountains. Thank you all.

Last but not least, I would like to thank my family. I would not be at this point without the influence, work, and education of my parents, the constant belief of my wife, and my son, for whom I would like to be a wise, well-educated father and example. My family is what I value most.

CONTENTS

Introduction and aims

1 Topology in proteins
  1.1 Mathematical knot theory
  1.2 Knot generalizations
  1.3 Topology of proteins – state of the art
  1.4 The aims
  1.5 Complex lasso proteins
  1.6 Links in proteins
  1.7 Knotted loops, θ-curves, and other new topological structures
  1.8 Knotoid description of protein chains
  1.9 Genus of protein structure

2 Statistics, probability, and shape of (bio)polymers
  2.1 Sampling random chains
  2.2 Probability and statistics of non-trivial topology
  2.3 Size and shape parameters of polymers
  2.4 The aims
  2.5 Probability and occurrence of topological motifs
  2.6 Shape of lasso polymers

3 Biology of complex topology proteins
  3.1 Existing concepts of topologically complex motifs function
  3.2 The aims
  3.3 Knot-induced enzymatic activity
  3.4 Function of other topologically complex motifs
  3.5 Utilization of structure conservation

4 Folding and unfolding of complex topology proteins
  4.1 Energy landscape theory
  4.2 Simulating the protein behavior
  4.3 Folding of topologically non-trivial proteins
  4.4 The aims
  4.5 Folding of deeply knotted proteins
  4.6 Folding of shallowly knotted proteins
  4.7 Folding of other topologically complex proteins

5 The tools created
  5.1 Servers and databases
  5.2 Plugins
  5.3 Topoly Python package

6 Summary, future directions, and further reading
  6.1 The results
  6.2 Future directions
  6.3 Further reading

A The polynomial invariants for motifs present in proteins
  A.1 invariants
  A.2 Knot invariants
  A.3 θ-curve and handcuff graph invariants

B Videos

C The figure making procedures

Bibliography

INTRODUCTION AND AIMS

Mathematics is a search for repeating patterns, followed by understanding their origin and influence. Knots are one such pattern. Although commonly attributed to humankind, knots are much more fundamental, as they can be identified in the most basic molecules of life – DNA and proteins.

Although discovered over 20 years ago, knots in proteins remain a challenge for researchers. In particular, despite theoretical and experimental efforts, the two most fundamental questions remain unsolved in full generality. They are the first impulse for this work:

1. What is the function of a knot tied on the protein backbone?

2. How do proteins acquire knotted backbones?

The second impulse comes from the recently discovered lasso shape of the obesity hormone leptin (Sec. 1.5). The existence of different types of non-trivial topologies forces one to look at knotted proteins from a broader perspective. In particular, four main questions can be asked in the field of proteins with complex topology:

1. Mathematical – what kind of topologically non-trivial structures may be found in proteins?

2. Statistical – how many topologically non-trivial structures can be identified in proteins?

3. Biological – what is the function of the topologically non-trivial motifs?

4. Biophysical – how do proteins acquire the complex topology motifs?

These four questions, belonging to four different fields of science, naturally structure the first four chapters of this work. Each chapter consists of a theoretical introduction and the literature background, followed by the specific objectives (The aims section) and the results obtained. In particular, Chapter 1 describes the results of the classification of topologically non-trivial protein structures into four new classes – lassos, links, knotted loops, and θ-curves. Although some isolated structures belonging to these classes were identified earlier, this work contains the first meticulous analysis of the whole set of available protein structures. Moreover, as some structures (lassos) have not yet been studied by mathematicians, new classification schemes were introduced.

In Chapter 2, the question of the occurrence of the topological motifs is analyzed. In particular, the protein lassos are shown to have a different shape and to be less common than their polymer analogs. Yet, some small proteins include the complex lasso structure, despite its low probability in the polymer models. This indicates that some complex lassos may be functional.

The question of the function of the topologically non-trivial motifs is discussed in Chapter 3. In this chapter, the backbone knot is shown to create places favorable for enzymatic active sites, which may be the answer to the persistent question concerning the function of backbone protein knots. Moreover, the links are shown to induce additional stability, and for some proteins the lasso motif is suggested to be crucial for their antimicrobial function.

Chapter 4 starts with a novel mechanism of folding of proteins with a knotted backbone. As this mechanism is compliant with all known experimental results, it is a strong candidate for the solution of the conundrum of knotted protein folding. Next, the influence of knot tail lengths, confinement, and topology on the folding pathway is studied.

A meticulous analysis of the topology of every available protein structure was possible only with the creation of specialized tools. These tools, along with the most up-to-date results of the scans, were made available to the broad scientific community in the form of four servers and databases – KnotProt, LassoProt, LinkProt, and GapRepairer, two PyMOL plugins – PyLasso and PyLink, and a Python package, Topoly. The capabilities of these tools are described in Chapter 5.

To keep the work concise, the results presented are reduced to the most important conclusions. A detailed description of the results may be found in the published articles. These were collected as a separate list and numbered with a letter "D" in front (e.g. [D6]) to distinguish them from other references. Moreover, the introduction in each chapter is reduced to the minimum required to understand the obtained results and to place them in the current state of knowledge. As a result, many fascinating topics were skipped. These are described briefly in Chapter 6. That chapter includes the author's view of possible further development of the field of complex topology proteins, as well as an extensive bibliography on the topics surrounding the main thread of the work. On the one hand, this chapter is meant to propose new research directions for the development of the field and for future collaborations. On the other hand, it is a guide for younger colleagues, introducing them to the fascinating and rich world of complex topology proteins. Hopefully, it will stimulate new discoveries in this field.

1 TOPOLOGY IN PROTEINS

Das mathematische Denken ist nur der Anfang des Denkens. (Mathematical thinking is only the beginning of thinking.) – Kurt Reidemeister

The study of topologically complex motifs in proteins starts with an introduction to knot theory (Sec. 1.1) and knot generalizations (Sec. 1.2). With these topological basics in place, the state of the art in the topology of proteins is discussed (Sec. 1.3). As a result of this work, four new groups of complex topology in proteins, shown in Fig. 1.1, are identified: complex lasso proteins (Sec. 1.5), three types of links (Sec. 1.6), knotted loops, and θ-curves (Sec. 1.7). Apart from novel topological motifs, two new topological approaches to protein analysis are tested: the knotoid ("planar, open knots") approach to classify the protein's topology (Sec. 1.8) and the genus trace to identify the dynamical domains in proteins (Sec. 1.9).

Figure 1.1: Exemplary structures of the four new protein topologies discussed in this work: complex lasso (PDB code 2mn3), link (PDB code 2lfk), knotted loop (PDB code 1a8e), and θ-curve (PDB code 1aoc). In each case, the structure is shown together with its simplification, with the motif drawn above the arrow. The dashed lines denote disulfide bridges or the interaction with the Fe2+ ion.

1.1 Mathematical knot theory

The mathematical knot theory has always been deeply related to important theories of mathematics and physics. It is enough to say that four Fields Medals were awarded for works related to knot theory. In its very beginning, knots formed by ether vortices were believed to constitute the atoms of matter [1–3]. Therefore, the classification of all possible knots was equivalent to building the first table of elements. This inspired Peter Guthrie Tait, Reverend Thomas P. Kirkman, and, independently, Charles Little to build and publish tables of over 200 different knots [4–6]. However, during their work the existence of ether, and therefore the beautiful knot-based theory of elements, was disproved by the Michelson-Morley experiment. Knot tables, however, have remained in a similar form until today.

[Margin note: Knot theory related Fields medalists: William Thurston in 1982, Vaughan Jones and Edward Witten in 1990, and Maxim Kontsevich in 1994.]

[Margin note: The atom theory of William Thomson (Lord Kelvin).]

1.1.1 Knot notation and tabulation

The knots in Tait's tables were formed by closed curves, to obtain stable ether vortices. In fact, the mathematical knot has to be closed in order to have a well-defined topology, as otherwise one can continuously untie the knot (without any cutting) and then continuously tie another knot. Therefore, the mathematical knot is defined via the embedding of the circle $S^1 \to \mathbb{R}^3$. Two embeddings are equivalent if they can be continuously transformed one into another, with no parts of the curve passing through each other (no cuts are allowed).

[Margin note: In topology, such a transformation is called ambient isotopy [7].]

In Kelvin's theory, the number of lines in the element's spectra corresponds to the number of crossings in the minimal crossing projection of the underlying ether vortex (Fig. 1.2). This influenced the order in Tait's tables and today's nomenclature of knots: the knots are named with two numbers, the number of crossings (in the minimal crossing projection) followed by a subscript discriminating between knots with the same number of crossings. The simplest knots, along with their systematic and common names (if they exist), are shown in Fig. 1.3.

Figure 1.2: The with its three different projections.

Figure 1.3: The simplest knots with their regular and common names. Some knots have two common names; $6_3$ has no common name. The $6_2$ knot appears in the logo of the Miller Institute for Basic Research in Science at the University of California, Berkeley.

Planar Diagram, Gauss, Ewing-Millett and Dowker notation

The graphical representation of a knot is very suggestive; however, it is not convenient for abstract manipulation, especially in computer-aided analysis. Therefore, different ways of encoding knots were introduced. Fig. 1.4 shows one projection of a trefoil knot with the corresponding Dowker, Gauss, Ewing-Millett, and Planar Diagram (PD) codes, which are the most popular ones.

In the Dowker code, one numbers the crossings consecutively while going along the knot. Each crossing is visited twice, so the procedure assigns two labels to each crossing, one odd and one even. It is then enough to give only the even numbers, sorted according to their odd counterparts. To indicate the crossing direction, one prescribes a positive sign for the overpass and a negative sign for the underpass. A similar idea underlies the Gauss code. In this case, only one number is assigned to a crossing, but it is included twice, with either a positive (overpass) or a negative (underpass) sign. Alternatively, one can indicate the under- and overpasses with the letters "U" and "O".

Figure 1.4: A projection of the trefoil knot with Dowker, Gauss, Ewing-Millett (EM), and Planar Diagram (PD) codes. The blue digits number the crossings, the red digits number the arcs. For the Dowker code, the pairs of corresponding crossing labels are given under the code.

A more detailed description of the knot is possible using the notation introduced by Ewing and Millett [8]. In this method, the four arcs meeting at a crossing are labeled with consecutive letters clockwise, starting from the outgoing overpassing arc. Then, each crossing is described by the four crossings it is connected to. Finally, in the Planar Diagram code, a crossing is described by its arcs, given counterclockwise, starting from the ingoing underpass.
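The relation between the PD code and the Gauss code can be sketched in a few lines of Python. This is only an illustration under stated assumptions: arcs are numbered 1..2c along the knot, each crossing is a tuple (i, j, k, l) listed counterclockwise from the ingoing underpass as described above, and the trefoil PD code below is the commonly tabulated one, which need not coincide with the diagram drawn in Fig. 1.4. The function name `gauss_code_from_pd` is hypothetical.

```python
from typing import Dict, List, Tuple

Crossing = Tuple[int, int, int, int]


def gauss_code_from_pd(pd: List[Crossing]) -> List[int]:
    """Convert a PD code of a knot diagram into a signed Gauss code.

    Positive entries mark overpasses, negative entries mark underpasses.
    Assumes a single-component diagram with at least two crossings.
    """
    n_arcs = 2 * len(pd)
    # arc -> (crossing number, +1 if the arc arrives on top, -1 if below)
    enters: Dict[int, Tuple[int, int]] = {}
    for number, (i, j, k, l) in enumerate(pd, start=1):
        enters[i] = (number, -1)  # arc i is the ingoing underpass
        # The upper strand runs j -> l if l follows j along the knot,
        # otherwise it runs l -> j.
        incoming_over = j if l == j % n_arcs + 1 else l
        enters[incoming_over] = (number, +1)
    # Walk along the knot arc by arc and record the crossing each arc ends at.
    walk = (enters[arc] for arc in range(1, n_arcs + 1))
    return [sign * number for number, sign in walk]


# Commonly tabulated PD code of a trefoil (an assumption, see the text above).
TREFOIL_PD = [(1, 4, 2, 5), (3, 6, 4, 1), (5, 2, 6, 3)]
print(gauss_code_from_pd(TREFOIL_PD))  # [-1, 3, -2, 1, -3, 2]
```

Every crossing number appears twice in the result, once with each sign, which is a quick consistency check for any hand-written PD code.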

1.1.2 Knot properties and operations on knots

Knots differ by many properties specific for a given knot type. Some, like knot chirality, orientation, and writhe (Sec. 1.1.2) or the classification as twist knots (Sec. 1.1.2), were also applied to describe biological phenomena.

Chirality, orientation, and writhe

A knot $K$ which is not equivalent to its mirror image $K^*$ is called chiral. In particular, there are two nonequivalent trefoils, denoted $+3_1$ and $-3_1$ (Fig. 1.5), with the signs stemming from the sign of the crossings upon imposing any knot orientation. In the case of the trefoil, the orientation does not play a role; however, for example, the $8_{17}$ knot is chiral only when oriented. Usually, the orientation is omitted in tables of knots. Nevertheless, it is important from a biological perspective, as biopolymers are naturally oriented (5' → 3' in nucleic acids and N → C in proteins).

Figure 1.5: The definition of the crossing sign (top) and the two mirror images of the trefoil (bottom).

Another property, closely related to chirality, is the writhe $wr(K)$ of the knot $K$. For a given projection, it is defined as the difference between the numbers of positive and negative crossings. For a spatial curve, one can calculate the writhe averaged over an ensemble of projections, or use the integral representation [9].
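For a single projection, the writhe can be computed directly from a PD code. The sketch below assumes the same PD convention as in Sec. 1.1.1 (arcs numbered 1..2c along the knot, each crossing listed counterclockwise from the ingoing underpass) and the usual right-hand rule for the crossing sign; whether this matches the sign drawn in Fig. 1.5 is an assumption. The helper names and the tabulated PD codes are illustrative, not taken from the thesis.

```python
from typing import List, Tuple

Crossing = Tuple[int, int, int, int]


def crossing_sign(crossing: Crossing, n_arcs: int) -> int:
    """Return +1 for a positive crossing and -1 for a negative one.

    With the lower strand running i -> k, the crossing is positive when
    the upper strand runs l -> j, i.e. when arc j follows arc l.
    """
    _, j, _, l = crossing
    return 1 if j == l % n_arcs + 1 else -1


def writhe(pd: List[Crossing]) -> int:
    """Number of positive minus number of negative crossings of a diagram."""
    n_arcs = 2 * len(pd)
    return sum(crossing_sign(c, n_arcs) for c in pd)


def mirror(pd: List[Crossing]) -> List[Crossing]:
    """PD code of the mirror image: every crossing is switched.

    The former upper strand becomes the lower one, so the tuple is
    rotated to start from the new ingoing underpass.
    """
    n_arcs = 2 * len(pd)
    mirrored = []
    for (i, j, k, l) in pd:
        if crossing_sign((i, j, k, l), n_arcs) > 0:
            mirrored.append((l, i, j, k))  # upper strand l -> j goes below
        else:
            mirrored.append((j, k, l, i))  # upper strand j -> l goes below
    return mirrored


# Commonly tabulated PD codes (assumptions, not read off the thesis figures).
TREFOIL_PD = [(1, 4, 2, 5), (3, 6, 4, 1), (5, 2, 6, 3)]
FIGURE_EIGHT_PD = [(4, 2, 5, 1), (8, 6, 1, 5), (6, 3, 7, 4), (2, 7, 3, 8)]

print(writhe(TREFOIL_PD), writhe(mirror(TREFOIL_PD)), writhe(FIGURE_EIGHT_PD))
```

Mirroring flips the sign of every crossing, so the writhe of a diagram changes sign under `mirror`, which is why the writhe is closely related to chirality.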


[Margin note: Knots like $8_{17}$ are called negative amphichiral. $8_{17}$ is the simplest knot for which the orientation plays a role.]

Twist knots, unknotting and stick numbers

The simplest way to obtain a knot is to thread part of the chain through a twisted loop. Such knots are called twist knots, and they form a series ordered by the number of loop half-twists (Fig. 1.6). Interestingly, all the knots identified in proteins up to now are twist knots (Sec. 1.3.1).

Figure 1.6: The simplest non-trivial twist knots with their number of half-twists.

All the twist knots can be untied by changing one crossing, the one formed while threading the twisted loop. In general, the minimal number of crossings which have to be changed to untie the knot is called the unknotting number. It is known for all knots with fewer than 10 crossings [11–14]. The simplest knot with unknotting number equal to 2 is the $5_1$ knot.

[Margin note: The was used to describe the action of DNA topoisomerase [10].]

Another quantity, relevant from the viewpoint of polymers, is the stick number. It is defined as the minimal number of sticks needed to create a given knot. As the sticks may be the analog of the bonds joining monomers, the stick number tells whether a given knot is possible in a polymer of a given length. In particular, the simplest non-trivial knot (trefoil) can be built from at least 6 sticks. Therefore, there is no need to search for knots in polymers shorter than 6 mers.

[Margin note: On the other hand, the length of a polymer in terms of sticks bounds the maximal crossing number of a knot obtainable from such a polymer [7].]

Knot genus, Seifert, and minimal surfaces

The knot, being homeomorphic to the circle, bounds a compact and connected surface, called the Seifert surface. Such a surface can be described in terms of its genus $g$ ("number of holes"). To a given knot one can associate infinitely many Seifert surfaces, which may also differ in genus. The smallest genus $g_K$ obtained in this way is called the genus of a knot $K$.

[Margin note: The genus of the unknot is zero, while both the $3_1$ and $4_1$ knots have genus equal to 1.]

Usually, the Seifert surface is given in an abstract way, and the genus is calculated from the Euler formula:

$$2 - 2g - b = \chi = V - E + F \qquad (1.1)$$

where $b$ is the number of boundary components, $V$ the number of vertices, $E$ the number of edges, and $F$ the number of faces.

[Margin note: $\chi$ is the Euler characteristic of the surface.]

However, one can also construct the surface explicitly, which allows one to study its properties as well. For example, one can construct the surface with the smallest possible area, i.e. the minimal surface spanned on a given knot. In fact, there are several equivalent definitions of the minimal surface $M \subset \mathbb{R}^3$:

1. Every point $p \in M$ has a neighborhood with least area relative to its boundary.

2. The mean curvature of $M$ vanishes identically.

3. $M$ is a plot of a harmonic function for each coordinate.

4. Every point $p \in M$ has a neighborhood $U_p$ which is equal to the unique idealized soap film with boundary $\partial U_p$.

[Margin note: The proof of the equivalence of these conditions can be found for example in [15].]

In general, constructing the minimal surface for a given boundary is not an easy task, and usually it cannot be given in an analytic form. Therefore, the surface is usually approximated by its triangulation [17].

[Margin note: For solving the problem of obtaining the minimal surface with a given boundary (Plateau's problem), Jesse Douglas was awarded the Fields medal in 1936 [16].]

Composition of knots

The knots given in tables are "prime", i.e. they cannot be split into simpler knots. On the other hand, any two knots $K$ and $J$ can be connected in two ways: either into a split sum $K \cup J$ (when they are fully separable) or into their composition $K \# J$ (when they form a new knot). The composition is obtained by removing an arc from each knot and joining the resulting open structures (Fig. 1.7). Any composition of two reversible knots always results in the same knot.

Figure 1.7: The composition of $3_1$ and $4_1$ knots.

1.1.3 Distinguishing knots

One of the most basic problems in knot theory is to determine whether two diagrams correspond to the same knot. In general, this task is highly non-trivial: even for the unknot one can draw an arbitrarily complicated presentation. Another well-known example is the Perko pair, two 10-crossing knots which, despite numerous refinements of the tables, were listed as distinct for over 80 years, until Kenneth Perko showed that they correspond to the same knot [18] (Fig. 1.8). Therefore, one needs some tools to distinguish between knots in the given projections.
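One of the simplest numerical tools of this kind is the knot genus from Eq. (1.1). The sketch below solves the Euler formula for the genus and, as an additional assumption not spelled out in the text, uses the standard fact that the surface built by Seifert's algorithm from a diagram with $s$ Seifert circles and $c$ crossings has Euler characteristic $s - c$. The function names are hypothetical, and the Seifert circle counts used in the example (2 for the standard trefoil diagram, 3 for the standard figure-eight diagram) are stated by hand rather than computed.

```python
def genus_from_euler(chi: int, boundary_components: int = 1) -> int:
    """Solve 2 - 2g - b = chi for the genus g of an orientable surface."""
    twice_g = 2 - boundary_components - chi
    if twice_g < 0 or twice_g % 2 != 0:
        raise ValueError("chi and b are inconsistent with an orientable surface")
    return twice_g // 2


def euler_characteristic(vertices: int, edges: int, faces: int) -> int:
    """chi = V - E + F of a triangulated (or otherwise cellulated) surface."""
    return vertices - edges + faces


def seifert_euler_characteristic(seifert_circles: int, crossings: int) -> int:
    """chi of the surface from Seifert's algorithm: one disk per Seifert
    circle and one twisted band per crossing (assumption, see lead-in)."""
    return seifert_circles - crossings


# A single triangle is a disk: V = 3, E = 3, F = 1, one boundary component.
print(genus_from_euler(euler_characteristic(3, 3, 1)))        # genus of a disk
# Standard diagrams: trefoil (s = 2, c = 3), figure-eight (s = 3, c = 4).
print(genus_from_euler(seifert_euler_characteristic(2, 3)))   # trefoil
print(genus_from_euler(seifert_euler_characteristic(3, 4)))   # figure-eight
```

Both knots give genus 1, so the genus alone cannot tell the trefoil from the figure-eight knot, which illustrates why a single number is a weak tool for telling knots apart.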

Reidemeister moves

In principle, when comparing the projections of two knots, one could use the theorem of Kurt Reidemeister, stating that every ambient isotopy of the knot in 3D space is equivalent to a sequence of three moves, called the Reidemeister moves (Fig. 1.9) [19]. Therefore, if there exists a chain of moves connecting two diagrams, the corresponding knots are equivalent. However, it is usually impossible to prove that such a chain of moves does not exist. To solve the problem of distinguishing knots, various quantities invariant under any ambient isotopy (knot invariants) were defined.

Figure 1.8: The Perko pair with the original Rolfsen naming.

Margin note: In fact, in some sources the Perko pair is still given incorrectly, showing how hard it is to distinguish knots.

Figure 1.9: Three Reidemeister moves. The twisting of the chain (move I), passing one arc over the second one (move II) and passing an arc over/under a crossing (move III).

Alexander, Jones and HOMFLY-PT polynomials

In the first approach, the knots were characterized by some numerical invariants, including the minimal crossing, unknotting, and stick numbers, or the knot genus. These, however, turned out to be insufficient, as even the simplest knots are described by the same numbers. A completely new invariant was introduced by Alexander, who, instead of one numerical invariant, assigned a polynomial to a given knot [20]. The breakthrough work of Alexander seemed, however, too complex to be practical until 40 years later, when John Conway discovered the relation fulfilled by the polynomial, called the skein relation.

Margin note: The Alexander polynomial is denoted by $\Delta(t)$. The Conway version of the Alexander polynomial is denoted by $\nabla(z)$. The Alexander polynomial can be obtained by substituting $z \to t^{1/2} - t^{-1/2}$.

A much stronger invariant, the Jones polynomial, was introduced by Vaughan Jones as a byproduct of his work on the classification of von Neumann algebras. As it turned out, the Jones polynomial obeys its own skein relation. This stimulated the development of other polynomial invariants, including the generalization of both the Jones and Alexander polynomials, the HOMFLY-PT polynomial [21].

Margin note: Jones was awarded the Fields Medal for his work on von Neumann algebras. The name HOMFLY-PT stems from the names of its inventors: Hoste, Ocneanu, Millett, Freyd, Lickorish, Yetter, Przytycki, Traczyk.

Figure 1.11: The calculation of the HOMFLY-PT polynomial for the Hopf link and the trefoil knot. The salmon circle denotes the crossing on which the skein relation is applied. In the case of the unlink, the polynomial is calculated from the split sum property.

The basic properties of the polynomials are compared in Tab. 1.1 [22]. The values of the polynomials for the simplest knots are given in App. A.

Figure 1.10: The definition of the crossings used in the skein relation.

The skein relation binds the polynomials of knots differing in only one crossing, as defined in Fig. 1.10. This allows the polynomial of any knot to be calculated recursively, as every knot has a finite unknotting number. An example of the calculation of the HOMFLY-PT polynomial for the Hopf link and the trefoil knot is shown in Fig. 1.11.


| Property | Alexander-Conway | Jones | HOMFLY-PT |
| Value for an unknot | $\nabla(\bigcirc) = 1$ | $V(\bigcirc) = 1$ | $P(\bigcirc) = 1$ |
| Skein relation | $\nabla(L_+) - \nabla(L_-) = z \nabla(L_0)$ | $t^{-1} V(L_+) - t V(L_-) = (t^{1/2} - t^{-1/2}) V(L_0)$ | $l P(L_+) + l^{-1} P(L_-) + m P(L_0) = 0$ |
| Effect of mirror image $K \to K^{\star}$ | None | $V(t) \to V(t^{-1})$ | $P(l, m) \to P(l^{-1}, m)$ |
| Composition $K \# L$ | $\nabla(K \# L) = \nabla(K) \cdot \nabla(L)$ | $V(K \# L) = V(K) \cdot V(L)$ | $P(K \# L) = P(K) \cdot P(L)$ |
| Split sum $K \cup L$ | $\nabla(K \cup L) = 0$ | $V(K \cup L) = -(t^{1/2} + t^{-1/2}) V(K) V(L)$ | $P(K \cup L) = -(l + l^{-1}) m^{-1} P(K) P(L)$ |
| Distinguished knots (up to 10 crossings) | 212/250 | 243/250 | 248/250 |
| Substitution in HOMFLY-PT | $l \to 1$, $m \to z$ | $l \to t^{-1}$, $m \to (t^{1/2} - t^{-1/2})$ | - |

Table 1.1: Comparison of the properties of the popular knot polynomials.
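As a worked illustration of the skein recursion, the HOMFLY-PT relations of Tab. 1.1 can be applied step by step. The derivation below is only a sketch: it assumes that all crossings of the Hopf link and of the trefoil diagram are of the $L_+$ type, and the labels $H$ (Hopf link) and $T$ (trefoil) are introduced here for convenience. For the mirror images, $l$ is replaced by $l^{-1}$.

```latex
% Relations used: l P(L_+) + l^{-1} P(L_-) + m P(L_0) = 0 and P(\bigcirc) = 1.
\begin{align*}
% Unlink: a one-crossing diagram of the unknot has L_+ = L_- = \bigcirc and L_0 = \bigcirc \cup \bigcirc,
% so l + l^{-1} + m P(\bigcirc \cup \bigcirc) = 0.
P(\bigcirc \cup \bigcirc) &= -(l + l^{-1})\, m^{-1} \\
% Hopf link: L_+ = H, L_- = \bigcirc \cup \bigcirc, L_0 = \bigcirc.
P(H) &= -l^{-2}\, P(\bigcirc \cup \bigcirc) - l^{-1} m
      = (l^{-1} + l^{-3})\, m^{-1} - l^{-1} m \\
% Trefoil: L_+ = T, L_- = \bigcirc, L_0 = H.
P(T) &= -l^{-2} - l^{-1} m\, P(H)
      = -2\, l^{-2} - l^{-4} + l^{-2} m^{2}
\end{align*}
```

The first line reproduces the split sum property of Tab. 1.1 for two unknots, which serves as a consistency check of the recursion.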

Kauffman bracket

The Jones polynomial turned out to have a surprisingly deep connection with statistical physics, as revealed by Louis Kauffman, who introduced the state model and the Kauffman bracket approach [23]. The bracket is defined via three conditions, depending on three arbitrary constants ($\bigcirc$ denotes the unknot, $K$ an arbitrary knot):

1. $\langle \bigcirc \rangle = 1$

2. $\langle \text{crossing} \rangle = A \langle \text{first smoothing} \rangle + B \langle \text{second smoothing} \rangle$

3. $\langle K \cup \bigcirc \rangle = d \langle K \rangle$

Margin note: The Kauffman bracket also paved the way for other polynomial and homological knot invariants.

It is easy to see that with $B = A^{-1}$ and $d = -(A^2 + A^{-2})$, the Kauffman bracket is invariant under Reidemeister moves II and III. However, under the Reidemeister move I the value of the bracket is multiplied by $(-A)^{\pm 3}$. To get rid of this dependence, the bracket is normalized by $(-A)^{-3 wr(K)}$. It turns out that the normalized Kauffman bracket is equal to the Jones polynomial upon the substitution $A \to t^{-1/4}$.

Margin note: $wr(K)$ is the writhe of knot $K$ (Sec. 1.1.2).
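The state-sum form of the bracket described above can be sketched in a few lines of Python. This is a minimal illustration and not a reference implementation. It rests on several assumptions: the diagram is given as a planar-diagram (PD) code, i.e. one 4-tuple of edge labels per crossing; the choice of which pairing of the four edge ends is called the A-smoothing is a convention made here (swapping it exchanges $A$ and $A^{-1}$, i.e. gives the mirror image); and the writhe is supplied by hand. Laurent polynomials in $A$ are stored as dictionaries `{exponent: coefficient}`.

```python
from itertools import product

def poly_mul(p, q):
    """Multiply two Laurent polynomials in A stored as {exponent: coefficient}."""
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            out[e1 + e2] = out.get(e1 + e2, 0) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}

def poly_add(p, q):
    """Add two Laurent polynomials, dropping zero coefficients."""
    out = dict(p)
    for e, c in q.items():
        out[e] = out.get(e, 0) + c
    return {e: c for e, c in out.items() if c != 0}

def count_loops(pd_code, state):
    """Number of closed curves left after smoothing every crossing as given by `state`."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    for (a, b, c, d), s in zip(pd_code, state):
        if s == 0:   # "A-smoothing" (assumed convention): join a-b and c-d
            union(a, b)
            union(c, d)
        else:        # "B-smoothing": join a-d and b-c
            union(a, d)
            union(b, c)
    return len({find(x) for x in list(parent)})

def kauffman_bracket(pd_code):
    """State sum with B = A^-1 and d = -(A^2 + A^-2)."""
    d = {2: -1, -2: -1}
    total = {}
    for state in product((0, 1), repeat=len(pd_code)):
        # weight A^(number of A-smoothings - number of B-smoothings)
        term = {len(state) - 2 * sum(state): 1}
        for _ in range(count_loops(pd_code, state) - 1):
            term = poly_mul(term, d)
        total = poly_add(total, term)
    return total

def normalize(bracket, writhe):
    """Multiply the bracket by (-A)^(-3 * writhe)."""
    shift = -3 * writhe
    sign = -1 if shift % 2 else 1
    return {e + shift: sign * c for e, c in bracket.items()}

kink = [(1, 1, 2, 2)]                                   # one-crossing diagram of the unknot
trefoil = [(1, 4, 2, 5), (3, 6, 4, 1), (5, 2, 6, 3)]    # three-crossing trefoil diagram
bracket = kauffman_bracket(trefoil)
jones_in_A = normalize(bracket, writhe=-3)              # writhe assumed for this convention
```

For the kink the bracket equals $-A^{3}$, in agreement with the factor $(-A)^{\pm 3}$ of Reidemeister move I. For the trefoil diagram the normalized result, after substituting $A \to t^{-1/4}$, reads $t^{-1} + t^{-3} - t^{-4}$, which is the Jones polynomial of one of the two trefoil chiralities.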

1.1.4 Knot simplification

The polymeric chains can rarely be seen in their minimal crossing projections. As the calculation of the invariants crucially depends on the number of crossings in the projection, the curves have to be simplified before the invariant calculation begins. Searching for the chain of Reidemeister moves to obtain the optimal projection is inefficient, therefore other knot-simplifying methods were proposed.

Margin note: The polymer is usually represented as a chain of beads.


Number of beads reduction - KMT and Knot_Pull algorithms

If the triangle spanned on three consecutive vertices is not pierced by the rest of the chain, this triangle may be reduced to an interval without changing the topology (Fig. 1.12). This concept underlies the KMT algorithm, which was used to find the first deep knots in proteins [24, 25]. The algorithm is fast; however, by removing consecutive beads it extends the bond lengths and narrows the angles, which may eventually lead to false-positive results. A modernized version of the KMT algorithm can be found in the Knot_Pull Python package, in which the process of the chain reduction was improved [26].

Figure 1.12: The essence of the KMT algorithm. If the triangle is not pierced, the central bead can be removed.

Margin note: KMT is an acronym of its creators: Koniaris, Muthukumar, Taylor. The Knot_Pull package is available at https://github.com/dzarmola/knot_pull. It is also part of the Topoly package (Sec. 5.3 [D22]).

Dynamical knot relaxation

The non-smooth structure resulting from the KMT algorithm does not facilitate knot recognition by the naked eye. Alternatively, one can use a dynamic knot relaxation, where the knot is treated as a polymer with defined equilibrium bond lengths, elastic energy and charges [27–31]. In such an approach, one may look for an optimal knot shape by searching for the minimum of the potential energy of such a polymer. Usually, this method results in a smoother structure. Two examples of such methods are SONO (Shrink On No Overlaps) [35–37] and the algorithm implemented in the KnotPlot software [38]. Both these algorithms transform the Perko pair into the same knot.

Margin note: The search for the minimal-energy conformation of knots led to the notion of ideal knots [32–34].
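The triangle-elimination idea behind the KMT algorithm described above can be sketched as follows. This is a simplified illustration written for this text, not the code of the original algorithm nor of Knot_Pull: the function names are hypothetical, the piercing test is a standard Möller-Trumbore segment-triangle test, segments sharing a vertex with the tested triangle are skipped, and exactly coplanar configurations are treated as non-intersecting.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def segment_hits_triangle(p, q, a, b, c, eps=1e-12):
    """Moller-Trumbore test: does the segment p-q cross the triangle a-b-c?"""
    d = sub(q, p)
    e1, e2 = sub(b, a), sub(c, a)
    h = cross(d, e2)
    det = dot(e1, h)
    if abs(det) < eps:            # segment parallel to the triangle plane
        return False
    s = sub(p, a)
    u = dot(s, h) / det
    if u < 0 or u > 1:
        return False
    qv = cross(s, e1)
    v = dot(d, qv) / det
    if v < 0 or u + v > 1:
        return False
    t = dot(e2, qv) / det         # position of the hit along the segment
    return 0 <= t <= 1

def kmt_reduce(points, closed=True):
    """Remove beads whose triangle is not pierced until no bead can be removed."""
    pts = list(points)
    min_len = 3 if closed else 2
    changed = True
    while changed and len(pts) > min_len:
        changed = False
        n = len(pts)
        candidates = range(n) if closed else range(1, n - 1)
        n_seg = n if closed else n - 1
        for i in candidates:
            tri = {(i - 1) % n, i, (i + 1) % n}
            a, b, c = pts[i - 1], pts[i], pts[(i + 1) % n]
            pierced = False
            for j in range(n_seg):
                k = (j + 1) % n
                if {j, k} & tri:  # skip segments sharing a vertex with the triangle
                    continue
                if segment_hits_triangle(pts[j], pts[k], a, b, c):
                    pierced = True
                    break
            if not pierced:
                del pts[i]        # the central bead can be removed
                changed = True
                break
    return pts

def trefoil_points(n=50, phase=0.1):
    """Closed polygon sampled from a standard trefoil parametrization."""
    pts = []
    for k in range(n):
        t = 2 * math.pi * k / n + phase
        pts.append((math.sin(t) + 2 * math.sin(2 * t),
                    math.cos(t) - 2 * math.cos(2 * t),
                    -math.sin(3 * t)))
    return pts

circle = [(math.cos(2 * math.pi * k / 20), math.sin(2 * math.pi * k / 20), 0.0)
          for k in range(20)]
reduced_circle = kmt_reduce(circle)               # planar unknot collapses to a triangle
reduced_trefoil = kmt_reduce(trefoil_points())    # a knotted polygon keeps at least 6 beads
```

The planar circle collapses to a triangle, whereas the trefoil polygon cannot drop below six vertices, the stick number of the trefoil. The growing bond lengths of the reduced chain illustrate the drawback mentioned above.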
1.1.5 Open-chain knots

Proteins are rarely closed: usually they have two free ends (the termini), and therefore they are topologically equivalent (ambient isotopic) to the straight interval. Still, if the knot tails are sufficiently far apart, the existence of a “knot” is intuitive (Fig. 1.13). At least three different ways of dealing with the problem of defining a knot in an open chain were introduced: chain pulling, chain closure, or the use of different mathematical tools (virtual knots or knotoids, described in Sec. 1.2).

Margin note: The problem of defining knots in open chains appeared very early in polymer physics [39].

Figure 1.13: The entanglement on a rope is called a “knot”, although the rope has (usually) free tails.

Chain pulling

When pulling the tail of a rope, the maximal distance between the rope termini depends on whether a knot is present. If some part of the rope is entangled, the final extension is shorter than in the unknotted case, as some portion of the rope is “used” to form the “knot”. The strength of this approach is that it can be both tested experimentally (using optical tweezers or AFM) [40–42] and implemented in simulations [43–46], which allows direct comparison of the results. However, during the analysis one has to take into account various factors, including the direction of pulling, the existence of disulfide bridges, and the protein internal friction. Moreover, the method does not determine the knot topology, as the portion of the rope used to tie the “knot” is a rather poor knot invariant [32].

Chain closure

If the rope tails are sufficiently far apart, they can be connected with an imaginary arc, which closes the rope and establishes its knot topology. Such a “closed knot” can then be studied using various knot invariants. Different closure schemes have been applied to proteins so far, including:

1. direct closure;
2. attaching a large planar arc to the termini of the polymer [47];
3. extending the termini in the direction from the center of mass and closing on a large sphere [48];
4. minimally interfering closure method [49];
5. repulsion closure [50];
6. one point closure [51, 52];
7. two point closure [52, 53];
8. closure by extending rays in one direction [54].

Figure 1.13: Extending termini towards two randomly chosen points on the sphere (shown for structure with PDB code 2efv).

In methods 1-5, the closure is completely determined by the chain coordinates, which in principle may bias the results. The problem is overcome by taking the average over a large ensemble of random closures (Fig. 1.13). This is the essence of methods 6-8. The drawback of these methods is their computational complexity. To reduce the number of closures needed, uniform sampling is used, in which the vertices of so-called Martin’s polyhedra (polyhedra with almost equal distances between vertices) are sampled [55, 56].

Margin note: The methods relying on an ensemble of random closures are called stochastic. For example, the 20-vertex Martin’s polyhedron is a regular dodecahedron.
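The idea of a stochastic closure can be illustrated with a minimal sketch of the one point closure (method 6). The details are assumptions made for this example only: the closing point is drawn uniformly on a sphere centered at the center of mass of the chain, the sphere radius is a fixed multiple of the chain extent, and the function names and the default parameters are hypothetical.

```python
import math
import random

def random_unit_vector(rng):
    """Uniformly distributed direction on the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(phi), r * math.sin(phi), z)

def one_point_closures(chain, n_closures=100, radius_factor=10.0, seed=0):
    """Return a list of closed polygons, each closing `chain` through one random sphere point."""
    rng = random.Random(seed)
    n = len(chain)
    center = tuple(sum(p[i] for p in chain) / n for i in range(3))
    radius = radius_factor * max(math.dist(p, center) for p in chain)
    closures = []
    for _ in range(n_closures):
        u = random_unit_vector(rng)
        apex = tuple(center[i] + radius * u[i] for i in range(3))
        # cyclic polygon: ... -> last bead -> apex -> first bead -> ...
        closures.append(list(chain) + [apex])
    return closures

chain = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, 1.0, 1.0)]
closures = one_point_closures(chain, n_closures=5)
```

Each closed polygon would then be passed to a knot-invariant calculation, and the most frequent knot type together with its frequency would be reported for the open chain.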

Knot fingerprint, knotted core, subknots, slipknots

The possibility to define the knot type for open chains may reveal interesting topological features of static structures, such as those obtained from X-ray, CryoEM or NMR analysis of proteins. In particular, one can determine the topology of each subchain. This information may be presented as a matrix, in which the entry (i, j) contains the topology and its probability for the subchain spanning indices i to j. Such a matrix is called the knot fingerprint matrix. In particular, an overall unknotted chain may contain a knotted subchain (Fig. 1.14). Such structures are called slipknots [50].

Margin note: The subknot analysis may also be performed for circular (closed) chains, which leads to the knot circular matrices [54].

The knot fingerprint matrices can distinguish topologically equivalent curves [58] and make it possible to select some special parts of the chain. In particular, the smallest knotted portion of the chain is called the knotted core (the right-top corner of the knotted region in the matrix). The remaining parts of the chain are called the knot tails. In the case of slipknotted chains, one can also distinguish the slipknot loop: the part of the chain starting from the knotted core, which forms a knot, until it winds back towards the knotted core (Fig. 1.14).

Margin note: In fact, there is also the “top-bottom” definition of the knotted core as the smallest piece of the chain which remains knotted. These two definitions are not entirely identical [49].
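The construction of the fingerprint matrix and the extraction of the knotted core can be sketched as below. The sketch assumes a user-supplied function `classify(i, j)` returning the dominant knot type of the subchain from bead i to bead j (for example obtained from stochastic closures); this function, the label "0_1" for the unknot, and the toy classifier used in the example are hypothetical and serve the illustration only.

```python
def fingerprint_matrix(n, classify):
    """Return {(i, j): knot_type} for every subchain i..j of a chain with n beads."""
    return {(i, j): classify(i, j) for i in range(n) for j in range(i + 1, n)}

def knotted_core(matrix, unknot="0_1"):
    """Indices (i, j) of the shortest knotted subchain, or None if nothing is knotted."""
    knotted = [(j - i, i, j) for (i, j), knot in matrix.items() if knot != unknot]
    if not knotted:
        return None
    _, i, j = min(knotted)
    return i, j

def is_slipknot(matrix, n, unknot="0_1"):
    """True if the whole chain is unknotted although some subchain is knotted."""
    return matrix[(0, n - 1)] == unknot and knotted_core(matrix, unknot) is not None

def toy_classify(i, j):
    # Hypothetical rule: a subchain is a trefoil if it covers beads 10..20
    # and does not reach bead 35, where the chain threads back.
    return "3_1" if i <= 10 and 20 <= j < 35 else "0_1"

n_beads = 40
matrix = fingerprint_matrix(n_beads, toy_classify)
core = knotted_core(matrix)
slipknotted = is_slipknot(matrix, n_beads)
```

In this toy case the knotted core spans beads 10 to 20 and the whole chain is unknotted, so the structure is reported as a slipknot.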


Figure 1.14: The knot fingerprint matrices for three cases. (A) The knotted chain, (B) the slipknotted chain with only one tail forming the slipknot loop, (C) the slipknotted chain with both tails forming the slipknot loop. Figure from [57].

1.2 Knot generalizations

Knot theory may be generalized in several directions. In particular, one can increase the number of components; such multicomponent knots are called links (Sec. 1.2.1). On the other hand, one may regard a knot as a spatial graph with only two-valent vertices. From this viewpoint, another generalization is to allow vertices with higher valency. This leads to the theory of spatial graphs, in particular θ-curves, handcuff graphs, and Möbius ladders (Sec. 1.2.2). Finally, one can analyze the two-dimensional projections of a knot independently of the original, three-dimensional object. On the one hand, this allows defining knotoids, i.e. planar, open-chain “knots”. On the other hand, planar projections make it possible to distinguish special types of crossings, leading to the theory of virtual knots. A short account of the theory of knotoids and virtual knots is presented in Sec. 1.2.3.

1.2.1 Links

Links are embeddings of a finite set of circles into three-dimensional space. They are named analogously to knots, with an additional superscript denoting the number of components. Moreover, a second naming scheme is also used, in which alternating links (where over- and undercrossings alternate) and non-alternating links are classified separately. Some (prime) links with their common and regular names are presented in Fig. 1.15.

Margin note: In fact, the separate classification for alternating and non-alternating knots exists too, but it is used only for structures with many crossings. Links may also be composed similarly to knots.

Links can be represented in a computer-understandable form (Sec. 1.1.1) in almost the same way as knots. However, in the Gauss and Dowker codes a specification of the components is required.

Chirality of links

As in the case of knots, one can analyze the influence of the mirror image symmetry and the orientation reversal. However, in the case of links, one can impose the orientation separately on each component, obtaining in principle non-equivalent links (differing e.g. in the Jones polynomial). In particular, there are two non-equivalent (oriented) Hopf links (Fig. 1.16) and four Solomon links. To deal with the numerous versions of links arising this way, some naming conventions and tables including the orientations of the components were created [59–61].

Figure 1.15: Exemplary links with their common and regular names.

Figure 1.16: The topologically non-equivalent (oriented) Hopf links. Note the change in the orientation of the blue ring.

Distinguishing links

Links can be distinguished with standard knot-theory tools, including the polynomials. However, their usability is significantly decreased compared to knots. For example, there is an infinite series of links with a trivial Jones polynomial [62].

Margin note: No non-trivial knot with a trivial Jones polynomial is known yet.

In the case of two-component links, one can also use the Gaussian linking number (GLN), measuring how many times one component winds around the second one. The links with the same value of GLN are called link-homotopic. In particular, the Whitehead link is link-homotopic to the unlink. The GLN can be defined in many equivalent ways [63]. For computer-aided analysis, the most convenient is the integral representation:

$$\mathrm{GLN}(\gamma_1, \gamma_2) = \frac{1}{4\pi} \int_{\gamma_1} \int_{\gamma_2} \frac{\vec{r}_1 - \vec{r}_2}{|\vec{r}_1 - \vec{r}_2|^3} \cdot \left( d\vec{r}_1 \times d\vec{r}_2 \right) \qquad (1.2)$$

Margin note: For closed curves one obtains an integer value.

The advantage of such an approach is that it can also be calculated for open curves. A non-zero value of GLN can be treated as a definition of linking for open chains. In fact, such an approach was already used in the case of proteins (Sec. 1.3.2).

For a link with $N \geq 3$ components, if no pair of components is linked, the whole structure may still be non-trivial. Such structures are called Brunnian links, and the simplest example is the Borromean rings (Fig. 1.17).

Figure 1.17: The Borromean rings. Taking out any component unties the remaining structure.
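The integral representation (1.2) lends itself to a direct numerical estimate for polygonal curves. The sketch below is one possible discretization, chosen here for brevity: each segment is represented by its midpoint and its direction vector (a midpoint rule), so the result is only approximately an integer even for closed curves. The function name and the test curves are assumptions of this example.

```python
import math

def gln(chain1, chain2, closed=True):
    """Midpoint-rule estimate of the Gauss double integral for two polygonal curves."""
    def segments(chain):
        n = len(chain)
        last = n if closed else n - 1
        for i in range(last):
            a, b = chain[i], chain[(i + 1) % n]
            mid = tuple((a[k] + b[k]) / 2 for k in range(3))
            step = tuple(b[k] - a[k] for k in range(3))
            yield mid, step

    segs2 = list(segments(chain2))
    total = 0.0
    for m1, d1 in segments(chain1):
        for m2, d2 in segs2:
            r = tuple(m1[k] - m2[k] for k in range(3))
            dist3 = math.sqrt(r[0] ** 2 + r[1] ** 2 + r[2] ** 2) ** 3
            # d1 x d2
            cx = (d1[1] * d2[2] - d1[2] * d2[1],
                  d1[2] * d2[0] - d1[0] * d2[2],
                  d1[0] * d2[1] - d1[1] * d2[0])
            total += (r[0] * cx[0] + r[1] * cx[1] + r[2] * cx[2]) / dist3
    return total / (4 * math.pi)

N = 100
angles = [2 * math.pi * k / N for k in range(N)]
ring_a = [(math.cos(t), math.sin(t), 0.0) for t in angles]
ring_b = [(1.0 + math.cos(t), 0.0, math.sin(t)) for t in angles]    # threads ring_a
ring_far = [(10.0 + math.cos(t), 0.0, math.sin(t)) for t in angles]  # separated from ring_a

hopf_value = gln(ring_a, ring_b)
unlinked_value = gln(ring_a, ring_far)
```

For the two interlocked rings the absolute value is close to 1, with the sign depending on the orientations, while for the separated rings it is close to 0. With `closed=False` the same function gives the non-integer GLN of open chains.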


1.2.2 θ-curves and other spatial graphs

The spatial graphs with no mono-valent vertex can be viewed as another generalization of knots. In the simplest case, one can analyze graphs with two tri-valent vertices. In such a case, the structure is composed of three arcs and, depending on the arc connection, one obtains either a θ-curve or a handcuff graph (Fig. 1.18). In θ-curves, each pair of arcs joining the tri-valent vertices forms a closed loop, which in principle can be knotted. Therefore, to a given θ-curve one can ascribe three knots, called the constituent knots. Similarly, to a given handcuff graph one can ascribe its constituent link, formed by the loop-forming arcs. The prime θ-curves and handcuff graphs were classified up to 7 crossings in the minimal crossing projection, and their constituent knots/link are known [64–68]. In particular, there may be non-trivial structures with all the constituent knots/link trivial. The simplest cases are $\theta 5_1$ and $H6_1$. In analogy with links, such structures are called Brunnian. The simplest θ-curves and handcuff graphs are shown in Fig. 1.19.

Figure 1.18: Simple spatial graphs. The vertices are denoted with blue dots.

Margin note: The $\theta 5_1$ is called the Kinoshita curve. The naming convention for θ-curves and handcuff graphs is analogous to the case of knots.

Figure 1.19: The simplest prime non-trivial θ-curves and handcuff graphs. The corresponding constituent knots/link are given in parentheses.

Similarly to knots and links, the θ-curves and handcuff graphs may be composed with each other, or with knots. However, depending on the action on the trivalent points, different ways of performing the composition are possible, distinguished by the subscript in the composition operator (Fig. 1.20).

Figure 1.20: Two different compositions of two copies of $\theta 3_1$. Note the subscript in the composition sign.

Chirality and non-planarity

The chirality of the constituent knots/link results in the chirality of θ-curves and handcuff graphs. Although the nomenclature has not been settled, it seems natural to transfer the chirality from the constituent knots. Then, e.g., $+\theta 3_1$ contains the $+3_1$ knot as a constituent knot.


Additional tri-valent vertices, or an increased valency of the vertices, vastly expand the spectrum of possible structures, but no general classification has been established yet. However, some graphs are surely non-trivial, as they are non-planar independently of their ambient isotopy class. Such behavior is governed by the theorem of Kazimierz Kuratowski, stating that a graph containing $K_5$ or $K_{3,3}$ (Fig. 1.21) as a subgraph cannot be planar (i.e. represented on the Euclidean plane without self-crossings) [69]. As a consequence, the Möbius ladder graphs $M_n$ (with the number of rungs $n > 2$) are non-planar, as they contain a $K_{3,3}$ subgraph. Moreover, it can be shown that the Möbius ladder with three rungs, $M_3$, is topologically chiral, i.e. not equivalent to its mirror image [74, 75].

Figure 1.21: $K_5$ and $K_{3,3}$ graphs.

Margin note: There are also graphs always containing links [70–73].
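For the smallest case the relation between the Möbius ladder and $K_{3,3}$ can be checked explicitly. The short sketch below assumes the usual combinatorial description of $M_n$ as a cycle on $2n$ vertices with $n$ rungs joining opposite vertices (the vertex numbering is a choice made here), and verifies that $M_3$ is exactly $K_{3,3}$. The edge-count bounds for simple planar graphs, $E \leq 3V - 6$ in general and $E \leq 2V - 4$ for bipartite graphs, are added as an independent consistency check.

```python
def mobius_ladder_edges(n):
    """Edges of M_n: a cycle on 2n vertices plus n rungs joining opposite vertices."""
    cycle = {frozenset((i, (i + 1) % (2 * n))) for i in range(2 * n)}
    rungs = {frozenset((i, i + n)) for i in range(n)}
    return cycle | rungs

def is_complete_bipartite(edges, part_a, part_b):
    """True if `edges` is exactly the set of all edges between part_a and part_b."""
    expected = {frozenset((a, b)) for a in part_a for b in part_b}
    return set(edges) == expected

def exceeds_planar_bound(n_vertices, n_edges, bipartite=False):
    """Simple planar graphs satisfy E <= 3V - 6 (E <= 2V - 4 if bipartite), for V >= 3."""
    bound = 2 * n_vertices - 4 if bipartite else 3 * n_vertices - 6
    return n_edges > bound

m3 = mobius_ladder_edges(3)
m3_is_k33 = is_complete_bipartite(m3, {0, 2, 4}, {1, 3, 5})
k33_too_dense = exceeds_planar_bound(6, len(m3), bipartite=True)   # 9 edges > 8
k5_too_dense = exceeds_planar_bound(5, 10)                          # 10 edges > 9
```

Both $K_{3,3}$ and $K_5$ violate the respective edge bound, which is consistent with their non-planarity.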

Spatial graph polynomials

Following the success of the knot polynomials, an analogous tool for the classification of spatial graphs was constructed by Shuji Yamada [76]. The original definition involves the reduction of crossings in the planar projection, followed by the calculation of a planar-graph-specific function. However, similarly to the skein relation defining the knot polynomials, the Yamada polynomial $R$ may be defined recursively by its five properties:

1. $R(\emptyset) = 1$;

2. $R(\text{crossing}) = x R(\text{first smoothing}) + x^{-1} R(\text{second smoothing}) + R(\text{crossing replaced by a vertex})$;

3. $R(\text{graph with a non-loop edge } e) = R(\text{graph with } e \text{ contracted}) + R(\text{graph with } e \text{ removed})$;

4. $R(\Gamma_1 \cup \Gamma_2) = R(\Gamma_1) R(\Gamma_2)$ for a disjoint union of graphs;

5. $R(B_n) = -(-x - 1 - x^{-1})^n$, where $B_n$ denotes the bouquet of $n$ circles.

Similarly to the Kauffman bracket (Sec. 1.1.3), the Yamada polynomial is defined up to a factor $x^n$ for some $n \in \mathbb{Z}$, as is evidenced by the twisting moves:

1. $R(\cdot) = -x R(\cdot)$ and $R(\cdot) = -x^{-1} R(\cdot)$ for the two directions of the first twisting move;

2. $R(\cdot) = x^{2} R(\cdot)$ and $R(\cdot) = x^{-2} R(\cdot)$ for the two directions of the second twisting move.

The mirror image symmetry results in the transformation $x \to x^{-1}$ in the Yamada polynomial [76]. Therefore, the polynomial can be used to detect chirality, as amphicheiral structures have a symmetric Yamada polynomial.

Margin note: In particular, the $\theta 4_1$ is chiral, although it is “based” on the amphicheiral $4_1$ knot.

Another interesting polynomial, called the “Simplified RNA Polynomial” (SiRP), was introduced recently by Tian, Kauffman, and Liang to study the topology of RNA [77]. In RNA, apart from the backbone interaction, the monomers (nucleotides) form labile hydrogen bonds. Therefore, SiRP is suited to analyze spatial graphs with two types of edges, representing two types of interaction (backbone or hydrogen).


Kauffman boundary link analysis

In the case of θ-curves and $K_4$ graphs another interesting method was also invented [78]. In this method, to a given structure one assigns a three-component link arising as the boundary of a specific surface. The procedure is purely deterministic and requires only the information on the signed sum of crossings between the arcs. For example, the method ascribes $(5^2_1, 7^2_8, 7^2_8)$ for $\theta 3_1$ and $(7^2_3, 7^2_3, 7^2_3)$ for the Kinoshita curve ($\theta 5_1$). The procedure has, however, limited practical power, as it produces very complicated links even for relatively simple structures.

Margin note: In fact, instead of the three-component link, it is enough to study the three two-component links. For example, an 11-crossing link is obtained for $\theta 5_2$.

1.2.3 Knotoids and virtual knots

Analyzing the planar graph instead of the spatial representation of a given knot opens new ways to extend knot theory. In particular, on a two-dimensional surface one can control the behavior of the tails of an open curve by disallowing the chain to pass over or under the tail. This leads to the theory of knotoids, introduced by Vladimir Turaev [79]. Two knotoid diagrams are equivalent if they can be connected by a chain of Reidemeister moves (not involving the chain tails). This enables the classification of knotoids as the equivalence classes of the diagrams. The nomenclature of knotoids is analogous to that of knots, and the classification up to 6 crossings has been done [80].

Margin note: The knotoids may be closed “above” or “below”. If both closures produce a non-trivial knot, the knotoid is called knot-like.

However, one has to bear in mind the underlying surface the knotoids lie on. In particular, on the sphere $S^2$ one can wind the chain over the sphere, a move which cannot be done on the plane $\mathbb{R}^2$. This move may in principle reduce some knotoids. Consequently, a spherical knotoid may be named differently than an identically looking planar knotoid. Exemplary knotoids are shown in Fig. 1.22.

Margin note: The knotoids may also be defined on e.g. a torus [81].

Figure 1.22: The exemplary non-trivial planar knotoids (top row) and virtual knots (bottom row). The knotoids with a red label are equivalent to the trivial knotoid $k0_1$ on the sphere. Note that the virtual knot $v3_6$ is in fact the regular trefoil knot.


As the notion of knotoids comes from the knot projections, the computer-understandable representation of a knotoid is built in the same way as for knots (Sec. 1.1.1). However, in the case of planar knotoids, the specification of the exterior and interior of the knotoid is required [80, 82].

Dealing with the knot projections also allows for differentiating the crossing types, with each type obeying its own set of Reidemeister moves. It turns out that a consistent theory may be built upon introducing one additional, mixing move (Fig. 1.23). The theory, created by Louis Kauffman, is called virtual knot theory, as the new type of crossing behaves as if it did not exist [83–85]. In particular, the virtual crossings are the ones missing in the Gauss code for a given diagram. For example, the Gauss code +1 - 2 - 1 + 2 ($v2_1$) describes a diagram in which an additional crossing has to appear. However, it is not indicated in the code, and therefore the crossing is virtual. The virtual knots were classified up to 6 regular crossings and the algorithm for a further classification has been given. Exemplary virtual knots are shown in Fig. 1.22.

Figure 1.23: The mixing move for virtual knots. The virtual crossings are encircled.

Margin note: The table of virtual knots can be found at http://www.math.toronto.edu/drorbn/Students/GreenJ/
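Why the code +1 - 2 - 1 + 2 forces an additional crossing can be seen from a classical parity argument going back to Gauss: in the code of a closed curve drawn on the plane, an even number of symbols lies between the two occurrences of every crossing label. The check below is a sketch of this necessary (but not sufficient) condition. The representation of a Gauss code as a list of signed integers, with the sign marking over- and underpasses, is an assumption of this example.

```python
def gauss_parity_ok(code):
    """Necessary condition for a classical diagram: an even number of symbols
    between the two occurrences of every crossing label."""
    labels = [abs(x) for x in code]
    for label in set(labels):
        first = labels.index(label)
        second = labels.index(label, first + 1)   # every label is assumed to occur twice
        if (second - first - 1) % 2 != 0:
            return False
    return True

v21_code = [+1, -2, -1, +2]               # the code quoted above
trefoil_code = [+1, -2, +3, -1, +2, -3]   # a Gauss code of the classical trefoil

v21_is_classical_candidate = gauss_parity_ok(v21_code)
trefoil_is_classical_candidate = gauss_parity_ok(trefoil_code)
```

The first code fails the test, so no planar diagram realizes it without an extra (virtual) crossing, whereas the trefoil code passes it.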

Chirality of knotoids and virtual knots

In both theories, apart from the standard (vertical) mirror image symmetry changing all the crossings, there is also the horizontal mirror symmetry, in which the structure is reflected in a horizontal mirror (Fig. 1.24). In general, all four versions of a given knotoid/virtual knot may be different.

Figure 1.24: The symmetries of the knotoid. Analogous symmetries are possible for virtual knots.

Distinguishing virtual knots and knotoids

The Jones polynomial defined by the Kauffman bracket (Sec. 1.1.3) can be easily modified to handle both knotoids and virtual knots. However, it is not powerful enough to distinguish all the knotoids up to 5 crossings, and even the simplest virtual knots have their Jones polynomial equal to 1. Therefore, much stronger tools were created, including the (planar) extended Turaev bracket [79], the affine index polynomial [86], the arrow polynomial [87, 88], generalized [89] and others [90–94].

Margin note: For example, the Jones polynomial for $v3_1$ is equal to 1.

1.3 Topology of proteins – state of the art

The existence of proteins with complex topology was debated for years. Once the structure-determining techniques (X-ray, CryoEM, NMR, etc.) became popular, more and more complex topology proteins were identified. In particular, protein knots and slipknots (Sec. 1.3.1), singular examples of links (Sec. 1.3.2), Möbius ladders, and pierced lassos (Sec. 1.3.3) were described. Here, a short summary of the known structures is presented.


1.3.1 Protein knots and slipknots

In proteins, a knot can be formed in two ways. On the one hand, the protein backbone, being a linear chain, can be tied into an (open) knot. However, such backbone knots were believed to be thermodynamically improbable [96], as they would require threading of the chain through a loop.

Margin note: In fact, any backbone knot in a protein crystal structure was considered a “misinterpretation of the data” [95].

On the other hand, some amino acids can form covalent, non-backbone bonds (disulfide bonds, posttranslational amide bonds, etc.), which result in closed, covalent loops. In principle, such loops can also be knotted. The idea of knotted loops was proposed several times [97–101]; however, only the inclusion of ion-amino acid interactions made it possible to identify the first knotted loops in proteins [102, 103].

Backbone knots and slipknots

Probably the first suggestion that the protein backbone may actually form a knot was made by Jane Richardson, who investigated the various arrangements of β-sheets in proteins [105]. The first rigorous search for knots in proteins was done by Marc Mansfield [53], with no structure with an evident knot found. However, deeply knotted structures were then identified by other researchers [25, 104, 106–108]. Their success motivated others to perform a meticulous analysis of all the crystal structures available, and to create knotted protein databases [109–113], [D3].

Margin note: Deeply knotted denotes proteins with at least 20 residues in each knot tail, as opposed to shallowly knotted [104].

As a result of the systematic study, five knot types formed by the protein backbone (closed stochastically) were identified: $-3_1$ [25, 114], $+3_1$ [107, 108, 115], $4_1$ [25], $-5_2$ [48], and $+6_1$ [114]. The exemplary structures are shown in Fig. 1.25. All the knot types identified so far are of the twist type (Sec. 1.1.2), which require only one threading during folding. The analysis via the knot fingerprint matrix (Sec. 1.1.5) also reveals the existence of protein slipknots. In particular, three kinds of slipknots have been found so far: $S{+}3_1$, $S{-}3_1$ and $S4_1$ (Fig. 1.25).

Margin note: The next twist knot is $7_2$.

Apart from the classification with chain closure, proteins were also analyzed as virtual knots [116], and some exemplary knotted proteins were analyzed as knotoids [117, 118]. In general, both classifications are consistent with the knot classification of the protein backbones.

Margin note: The virtual crossing appears as a crossing between the chain and the chain closure in the projection.

Knotted loops

Despite many attempts, strictly covalent knotted loops (spanning the protein backbone and disulfide bridges) were not identified in proteins. Only the inclusion of non-covalent ion-amino acid interactions allowed Chengzhi Liang and Kurt Mislow to identify singular $3_1$ knotted loops [102, 103]. It seems, however, that the importance of this discovery faded upon identification of the first proteins with a deeply knotted backbone.

Margin note: Some care needs to be taken when analyzing the results, as the oxidoreductase with PDB code 1aoz has no knotted loop, contrary to [102].


Figure 1.25: The exemplary knotted (top row) and slipknotted (bottom row) protein structures with corresponding knot types. PDB codes of the struc- tures from left to right: 2efv, 1j85, 4y3i, 3irt, 3bjx (top row) 1hyn, 2qqd, 5j4i (bottom row). The figure adapted from [D9].

1.3.2 Protein links

In proteins, a link structure can be formed in various ways, depending on the nature and definition of the components (Fig. 1.26). In particular, the components may be realized by backbone loops closed by a disulfide bridge. Only one example of such a link has been identified so far: the P. aerophilum citrate synthase transferase (PDB code 2ibp) [119]. More links may be identified when the covalent or ion-based loops (parts of the backbone joined by a disulfide bridge or by an interaction via an ion) are treated as components [102, 103]. Moreover, the heme cofactor may be treated as a separate component, which can form interesting links [103].

Figure 1.26: The exemplary known links in proteins, along with their simplification and motif. Left to right, top to bottom: deterministic link formed by two chains (PDB code 2ibp), deterministic link with the covalent loops comprising a few disulfide bonds (PDB code 1hcn, chain B), deterministic link comprising heme cofactors (PDB code 2cdv), two-chain link in the p53 protein (PDB code 1aie; the dashed lines denote the artificial closure added), link formed by 10-chain components (PDB code 1zye), part of the chain mail structure of the HK97 capsid (PDB code 3j4u). The colors in the motif match the colors in the structure. The dashed lines denote the non-backbone interactions. The list was further developed by the results of this project.


Components may also be formed by whole chains. In particular, an artificial closure of two chains of the tumor suppressor protein p53 results in the Hopf link structure [120]. Such two-chain links were studied in parallel with this work by chain pulling [121] and by chain closure with a subsequent Gaussian Linking Number (Sec. 1.2.1) calculation [122]. Both methods proved the existence of at least two types of two-chain links in proteins. However, the precise determination of the topology has not been performed.

Margin note: The groups of links identified could correspond to Hopf (GLN = ±1) and Solomon (GLN = ±2) links.

Finally, the link structure was also identified in the case of the bovine mitochondrial peroxiredoxin, where each component is formed by 10 separate chains [123], and in the HK97 virus capsid, where the chains connected by an amide bridge form a chain mail of linked pentagons and hexagons [124–126].

1.3.3 Pierced lassos, cysteine knots, Möbius ladders and non-planar graphs

The existence of covalent bridges in proteins results also in the possibility of another complex topology, where a covalent loop (closed by the bridge) is pierced by some other fragment of the chain. Until recently, two groups of such complex lasso proteins were known, differing in the loop-closing bond. The first one contains the amide-based miniproteins (called also lasso peptides or lariat protoknots) [128–132]. The second group encompasses the disulfide-based pierced lasso bundles, with leptin as the first representative [133, 134].

Margin note: The idea of proteins with a pierced, covalent loop was postulated independently several times [99, 127].

Figure 1.27: The examples of other complex topology proteins: (from left to right) lasso peptide astexin (PDB code 2m37), pierced lasso leptin (PDB code 1ax8), cysteine “knot” (PDB code 6mk4), and non-planar-graph structure with K5 subgraph (PDB code 1neh).

Another complex motif is the “cystine knot” (or “cysteine knot”). The term is, however, misleading, as such a structure, regarded as a graph in space, does not contain any knotted cycle. On the other hand, it may be shown that it is indeed non-trivial, forming a Möbius ladder with three rungs [135], and is therefore, by the Kuratowski theorem, non-planar (Sec. 1.2.2). In fact, the search for protein structures with an embedded K3,3 graph (and therefore non-planar) was performed several times [97, 100, 101, 136–138]. In particular, the cofactors in Chromatium high potential iron protein [139] and nitrogenase [135] make it possible to identify both K3,3 and K5 protein subgraphs. Either of them, by the Kuratowski theorem, results in the non-planarity of the underlying graph.


1.4 the aims

Compared to the classification of the backbone knots in proteins, the classification of the singular lasso, link, and knotted loops present in proteins seems very modest. This motivates the first tasks in the topological analysis of protein structures:

1. Develop the methods to identify lassos, links, and knotted loops in a given protein structure;

2. Scan all the available protein structures to identify all topological motifs.

The known link examples differ drastically in structure, as the link may be contained entirely in one chain, or the linked components may be formed from several chains (Sec. 1.3.2). Therefore, some categorization of the different ways of link formation is required. Moreover, no mathematical lasso classification has been established yet. This stimulates further mathematical tasks:

3. Establish the mathematical classification of lassos;

4. Categorize different links in proteins;

5. Classify the topology of links and knotted loops present in proteins.

The world of topologically complex structures generalizing knots is very rich (Sec. 1.2); therefore, possibly many other topological motifs may be found in proteins. On the other hand, different mathematical tools may be used to classify proteins. For example, the knotoid classification can be extended to the whole set of knotted proteins to see how the backbone knot motif arises in proteins. This inspires further aims:

6. Identify other topological motifs in proteins;

7. Describe the spectrum of knotoids when analyzing the full set of protein structures;

8. Find other topological tools useful in the analysis of protein structures.

Finally, the chemical nature of the interactions utilized to define the complex topology may be analyzed. In particular, no purely covalent (composed of covalent bonds only) knotted loop was identified so far, although that would represent the true, indisputable, mathematical knot. This influences the following tasks:

9. Identify all the types of covalent bonds closing the backbone in complex lasso proteins;

10. Identify the purely covalent knotted loops in proteins;

11. Analyze the change of the spectrum of the knotted loops upon the inclusion of interactions via ions, or of the chain closure.

The rest of this chapter is devoted to these tasks.


1.5 complex lasso proteins

The cysteine bridges are present in approximately 20% of protein structures deposited in the RCSB database. The existence of the threading in only one structure would then be highly improbable. However, such lasso motifs were studied neither by biologists (except for miniproteins), nor by mathematicians, nor by polymer physicists. Therefore, the search for complex lasso proteins required the construction of a new algorithm (Sec. 1.5.1) and the introduction of a classification and naming convention (Sec. 1.5.2). As a result, various classes of lassos containing one or more bridges constituting the loop were identified (Sec. 1.5.3).

1.5.1 The algorithm

The algorithm applied to identify the complex lasso proteins is presented schematically in Fig. 1.28. Each covalent bond (disulfide, amide, ester, etc.) defines a covalent, closed loop with two tails attached. On this loop, the (triangulated) minimal surface is spanned. The surface is constructed iteratively, by fulfilling two conditions of a minimal surface: minimal local area and vanishing Laplacian [D13]. The smoothness of the surface is adjusted by the number of iteration steps and the number of triangles used. Next, the indices of the piercing residues are determined. Finally, the piercings which are too close to each other (10 residues), to the bridge (3 residues) or to the end of the tail (3 residues) are reduced, as these may arise from thermal fluctuations.

Margin note: The result may depend on the definition of the discrete Laplacian.
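The piercing-detection and reduction steps described above can be illustrated with a minimal sketch. It is not the implementation used in this work: instead of the iteratively relaxed minimal surface, a plain triangle fan spanned from the loop centroid is used (a swapped-in simplification), and the function names, the strictness of the thresholds and the pairwise cancelling of close piercings are assumptions made only for illustration.

```python
import numpy as np

def segment_hits_triangle(p, q, a, b, c, eps=1e-12):
    """Moller-Trumbore test. Returns +1 / -1 if segment p->q crosses triangle abc
    along / against the normal (b-a)x(c-a), and 0 if there is no crossing."""
    d, e1, e2 = q - p, b - a, c - a
    h = np.cross(d, e2)
    det = np.dot(e1, h)
    if abs(det) < eps:                 # segment parallel to the triangle plane
        return 0
    s = p - a
    u = np.dot(s, h) / det
    if u < 0 or u > 1:
        return 0
    w = np.cross(s, e1)
    v = np.dot(d, w) / det
    if v < 0 or u + v > 1:
        return 0
    t = np.dot(e2, w) / det            # position of the hit along p->q
    if not 0 <= t < 1:
        return 0
    return 1 if det < 0 else -1

def find_piercings(ca, i, j):
    """ca: (n, 3) array of C-alpha positions; the loop is residues i..j closed by
    the bridge i-j. Surface = triangle fan from the loop centroid (simplification)."""
    loop = ca[i:j + 1]
    center = loop.mean(axis=0)
    n = len(ca)
    hits = []
    # tail segments k -> k+1; the segments touching the bridge residues are skipped
    for k in list(range(0, i - 1)) + list(range(j + 1, n - 1)):
        for m in range(len(loop)):
            sign = segment_hits_triangle(ca[k], ca[k + 1], center,
                                         loop[m], loop[(m + 1) % len(loop)])
            if sign:
                hits.append((k, sign))
    return hits

def filter_piercings(hits, i, j, n, min_sep=10, min_bridge=3, min_end=3):
    """Drop piercings close to the bridge or to a tail end, and cancel pairs of
    piercings on the same tail that are closer than min_sep (assumed reading)."""
    kept = []
    for k, sign in hits:
        if min(abs(k - i), abs(k - j)) <= min_bridge:
            continue
        if min(k, n - 1 - k) <= min_end:
            continue
        same_tail = kept and (kept[-1][0] < i) == (k < i)
        if same_tail and k - kept[-1][0] < min_sep:
            kept.pop()
        else:
            kept.append((k, sign))
    return kept
```

Each reported piercing is a pair (index of the residue starting the piercing segment, direction), which is the kind of information the later classification relies on.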

Figure 1.28: The scheme of the algorithm used in lasso determination.

GLN matrices

Apart from the surface spanning, the complex lassos can be analyzed by the implementation of the Gaussian Linking Number (GLN – Sec. 1.2.1). In fact, the integral definition allows for calculating the GLN between the loop and the tail, although the tail does not form a closed loop (Fig. 1.29). As a result, a non-integer value is obtained which, if significantly different from 0, indicates piercing-like behavior. Moreover, the location of the piercing residue may be narrowed down when analyzing the GLN matrix – the values of the GLN for each subinterval of the tail. In particular, the highest absolute values of the GLN are obtained close to the piercing residue (Fig. 1.29).

Margin note: However, a high GLN value does not necessarily mean the tail pierces the loop.
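The GLN between a closed loop and an open tail, and the corresponding GLN matrix, can be sketched as below. This is a minimal illustration, assuming a simple midpoint discretization of the Gauss double integral over polygonal chains; the function names are hypothetical and the discretization is not necessarily the one used in PyLasso.

```python
import numpy as np

def _midpoints_and_steps(points, closed):
    pts = np.asarray(points, dtype=float)
    if closed:
        pts = np.vstack([pts, pts[:1]])    # repeat the first vertex to close the loop
    return 0.5 * (pts[1:] + pts[:-1]), pts[1:] - pts[:-1]

def gln_contributions(loop, tail):
    """Contribution of every tail segment to the Gauss integral with the closed loop
    (midpoint approximation of (1/4pi) * sum r.(dr1 x dr2) / |r|^3)."""
    m1, d1 = _midpoints_and_steps(loop, closed=True)
    m2, d2 = _midpoints_and_steps(tail, closed=False)
    r = m1[:, None, :] - m2[None, :, :]
    cross = np.cross(d1[:, None, :], d2[None, :, :])
    integrand = np.einsum("ijk,ijk->ij", r, cross) / np.linalg.norm(r, axis=2) ** 3
    return integrand.sum(axis=0) / (4.0 * np.pi)

def gln(loop, tail):
    """GLN between the closed loop and the whole (open) tail; in general non-integer."""
    return float(gln_contributions(loop, tail).sum())

def gln_matrix(loop, tail):
    """M[a, b] = GLN between the loop and the tail fragment from vertex a to vertex b
    (only the part with a < b is meaningful; M is antisymmetric by construction)."""
    c = np.concatenate([[0.0], np.cumsum(gln_contributions(loop, tail))])
    return c[None, :] - c[:, None]
```

The overall sign depends on the orientation of the loop and the tail, so only the absolute value is compared with 1 when looking for piercing-like behavior.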


Figure 1.29: The GLN analysis of the protein. Left panel – the idea of the method – the GLN calculated between the loop (blue) and the tail (red) is close to 1. Middle panel – the GLN matrix – the entries denote the GLN value between the loop and corresponding subinterval of the tail (calculated for structure with PDB code 3suk, chain A, loop spanned on the bridge Cys39-Cys76 pierced by C-terminal tail). Right panel – the corresponding protein structure (PDB code 3suk, chain A) colored according to GLN values. Images generated by PyLasso plugin [D10].

1.5.2 Classification and naming convention

In general, a given protein structure can contain several pierced loops. Therefore, the classification of the lassos in proteins required first creating the nomenclature for single loops, followed by merging them into a classification scheme of several loops.

Single loop motifs

In general, the lasso loop in a protein has two tails and each of them can pierce the loop several times. To organize the piercing patterns, four classes were proposed (Fig. 1.30):

1. “L” class, where one of the tails pierces the surface “there and back”;

2. “LS” class, where one of the tails winds around the loop;

3. “LL” class, where both tails pierce the loop;

4. “LSL” class, where both tails pierce the loop and at least one winds around it;

Figure 1.30: The exemplary types of lassos with their naming.

The piercing class is followed by the number of piercings performed by each tail. In particular, the L1 type denotes the lasso, where one tail pierces the loop once (Fig. 1.30). The piercing motif may be further specified, taking into account the piercing tail and direction. The piercing direction is obtained by orienting the chain (from N- to C-terminus), which also induces the orientation of the surface. In particular, four subtypes of the L1 type may be introduced: L+1C, L-1C, L+1N, and L-1N [D16].
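The naming convention for a single loop can be illustrated with a short sketch. The function name is hypothetical, and three rules are assumptions made only for this illustration: a tail is treated as winding around the loop when two consecutive piercings have the same direction, the N-tail count is written first in two-tail motifs, and an unpierced loop is simply reported as "trivial".

```python
def lasso_motif(n_tail, c_tail):
    """n_tail, c_tail: piercing directions (+1 or -1) of the N- and C-terminal tail,
    ordered along the chain. Returns a motif name such as 'L+1C', 'L6' or 'LL4,3'."""
    def winds(signs):
        # assumption: two consecutive piercings in the same direction = winding
        return any(a == b for a, b in zip(signs, signs[1:]))

    pierced = [(name, list(signs))
               for name, signs in (("N", n_tail), ("C", c_tail)) if len(signs)]
    if not pierced:
        return "trivial"
    supercoiled = any(winds(signs) for _, signs in pierced)
    if len(pierced) == 1:
        tail, signs = pierced[0]
        cls = "LS" if supercoiled else "L"
        if len(signs) == 1:                      # subtype with direction and tail
            return f"{cls}{'+' if signs[0] > 0 else '-'}1{tail}"
        return f"{cls}{len(signs)}"
    cls = "LSL" if supercoiled else "LL"
    return f"{cls}{len(pierced[0][1])},{len(pierced[1][1])}"
```

For example, a single positive piercing by the C-terminal tail gives L+1C, matching the subtype naming introduced above.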

Several pierced loops

The simplest way to encode the topology of a structure with several pierced loops is the concatenation of the motifs of individual loops. (Margin note: Such a method is called the lasso fingerprint [D16].) However, in such an approach the mutual relation of the loops is lost (Fig. 1.31). To overcome this problem, three different techniques were proposed:

1. Permutation method – the enumeration of all possible lasso arrangements;

2. Substitution method – the substitution of the bridge by another known mathematical structure, which is then classified;

3. Polynomial method – the creation of the polynomial devoted to complex lasso structures.
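The enumeration behind the permutation method (item 1) can be sketched as follows. The sketch assumes that an arrangement of N loops is fully described by the order of the 2N bridge-forming residues along the chain, written as a word in which each letter (bridge) occurs twice and new letters appear in alphabetical order; the function name is hypothetical and the sketch supports at most 26 loops.

```python
from string import ascii_uppercase

def loop_arrangements(n_loops):
    """All arrangements of n_loops bridges along the chain, in lexicographic order.
    'AABB' = two loops in series, 'ABAB' = overlapping loops, 'ABBA' = nested loops."""
    words = []

    def extend(word, open_letters, used):
        if len(word) == 2 * n_loops:
            words.append(word)
            return
        # either close an already opened bridge or open the next unused letter
        options = sorted(open_letters)
        if used < n_loops:
            options.append(ascii_uppercase[used])
        for letter in options:
            if letter in open_letters:
                extend(word + letter, open_letters - {letter}, used)
            else:
                extend(word + letter, open_letters | {letter}, used + 1)

    extend("", frozenset(), 0)
    return words
```

Under this assumption there are 3 arrangements for two loops, 15 for three and 105 for four, so an arrangement can be referred to by the number of loops and its position in the sorted list, while the size of the list grows quickly with the number of bridges.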

In the permutation method [D24], all the possible arrangements of N loops are created. These motifs may be sorted by assigning consecutive letters to the pairs forming bridges and using the lexicographic order. Therefore, each arrangement of the loops can be encoded with two numbers – the first describing the number of loops, the second the position of the arrangement in the tables (Fig. 1.31).

Next, for each such arrangement, the intervals delimited by bridge-forming residues may be numbered. This makes it possible to encode each piercing as a pair of numbers denoting the piercing interval and the loop-forming interval. Concatenation of the pairs in the order of rising indices of the piercing residues describes the whole piercing motif. Such descriptions can again be sorted (numerically) and tabulated. Therefore, the motif can be described by two additional numbers, describing the total number of piercings and the piercing pattern. The naming convention using this method for exemplary structures with the 2L1 lasso fingerprint is shown in Fig. 1.31. The method may also be expanded to include the direction of the piercing.

The advantage of the permutation method is that it does not require chain closure, which always introduces some randomness in the classification. Moreover, once all the possible motifs are precomputed, assigning the motif relies only on comparing the indices of bridge-forming and piercing residues, being therefore fast and independent of the overall spatial structure. On the other hand, the number of possible loop arrangements increases exponentially fast with the number of bridges, making the method inefficient for structures with a large number of bridges.

In the substitution method [D24], the structure is closed by connecting the termini (Sec. 1.1.5), and then reduced to a well-known case by substitution of the bridge. In particular, the bridge may be substituted by:


1. Two twists (obtaining a knot);

2. Two strings encircled by an additional ring (obtaining link);

3. Regular and virtual crossing (obtaining virtual knot);

4. Four-valent vertex (obtaining spatial graph).

The result of the substitution method in the case of a singly pierced loop (L1) is shown in Fig. 1.32.

Figure 1.31: Top row – three different arrangements of two loops, with letters encoding their mutual arrangement and naming in the permutation method. Middle and bottom row – exemplary possible piercing patterns of two loops resulting in the same 2L1 lasso fingerprint (each loop pierced once). Below the schemes, the piercing pattern in blue and the naming according to the permutation method. The orange stripes denote the bridges, the blue digits number the intervals delimited by the bridge-forming residues.

The choice of the substitution depends on the needs. For example, the most detailed picture may be obtained with the four-valent vertex approach; however, it also yields the most complicated and computationally demanding polynomial to calculate. Less demanding is the addition of the ring, with the advantage that the number of components encodes the number of bridges. On the other hand, the utilization of virtual knots is an interesting approach, which is insensitive to the trivial loops, as these vanish due to the (virtual) Reidemeister move I.

Margin note: In fact, the substitution by two twists was also proposed in [118].

Figure 1.32: The four possible substitutions for a singly pierced loop.

The polynomial method relies on the creation of a polynomial specialized to the complex lasso structures with connected termini. In fact, one can generalize the Kauffman bracket (Sec. 1.1.3) to deal with the disulfide-like bonds by introducing two new variables [D6]. This approach, due to the variable names, is called the APS bracket. The exemplary structures with corresponding polynomials are presented in Fig. 1.33. Such a bracket can be used to classify all spatial graphs with two kinds of bonds, including all the lasso arrangements or DNA chains, where the covalent and hydrogen bonds can be distinguished. As the bracket by construction requires chain closure, it can be used to produce the lasso fingerprint matrices, analogously to the knot fingerprint matrix. Moreover, the bracket allows for calculating the contribution of two basic structures – the intra-chain bond and the bond connecting two chains (Fig. 1.33, first two structures). Finally, the APS bracket can be reduced to the SiRP polynomial [77] by an appropriate variable substitution [D6].

Margin note: In total, the APS bracket has 3 variables.

Figure 1.33: The exemplary structures with their corresponding APS bracket value. The double edge denotes the bridge.

1.5.3 Identified lassos

The first discovered examples of the complex lasso proteins (lasso peptides and leptin) are the structures in which a covalent loop (a piece of the backbone closed by a covalent bridge) is pierced once by one of the tails. However, much more complicated structures may be found when scanning the whole set of protein structures available. In fact, proteins with up to 6 piercings performed by one tail (L6 in the structure with PDB code 5vg2), or 7 piercings performed by two tails (LL4,3 in the structure with PDB code 2grk) were identified. Moreover, in some structures the tail winds around the loop, forming the supercoiling lasso (with up to 3 piercings – LS3), or the supercoiling lasso pierced by both tails (LSL2,2 in singular cases). The exemplary complex lasso proteins are shown in Fig. 1.34.

Margin note: Most up-to-date statistics of complex lasso proteins can be found in the LassoProt database [D16] at http://lassoprot.cent.uw.edu.pl.

Figure 1.34: The exemplary structures of complex lasso proteins. From left to right: L2 motif in viral hydrolase (PDB code 3p06) with ester bridge, L6 motif in oxidoreductase (PDB code 5vg2), and LL4,3 motif in virus evm1 chemokine binding protein (PDB code 2grk).

Apart from the complexity of the piercing motif, the proteins vary also in the number of pierced loops. In particular, chains with up to 4 pierced loops (4L1 lasso fingerprint in the case of a structure with PDB code 5j81) were identified. The loops present in one chain may have different piercing patterns. In total, over 25 lasso fingerprints (Sec. 1.5.2) were identified.

Margin note: Exemplary complex lasso fingerprints identified: L2L1L3 or LS2LL1,1LS2.

Finally, most complex lasso protein loops are closed by a disulfide bridge (like in the case of leptin). The exceptions are the lasso peptides (miniproteins), with the loop closed by an amide bridge, and a viral hydrolase with PDB code 3p06, for which the loop is closed by an ester bridge. The loops may also be closed by other covalent bridges (e.g. C-C or thioester), but such loops are trivial, i.e. neither tail pierces such a loop.

In total, the single-bridge complex lasso proteins constitute around 18% of all proteins with disulfide bridges [D13], i.e. around 4% of all protein structures. This fraction may be even higher if other closed loops are analyzed. In particular, the single-bridge loops may be generalized to:

1. The loops closed by interactions mediated by ions – an example can be the human transport protein (PDB code 1n84), with the loop closed by Tyr95-Fe339-Asp63 interaction, and pierced by C-terminal tail (Thr250);

2. The loops formed by pieces of backbone joined by a few bridges – for example the bovine hydrolase (PDB code 11ba), with the loop formed by two bridges (Cys58-Cys110 and Cys40-Cys95) and pierced by part of the chain containing Thr82;

3. The loops formed by one chain and pierced by another chain – an example is a human hormone (PDB code 1fl7), where the covalent loop formed in chain B (Cys20-Cys104) is pierced by chain A, forming the L2 motif;

4. The loops spanning several chains – for example the human hormone (PDB code 1mkk), with the loop spanned on bridges Cys51A-Cys60B and Cys51B-Cys60A, which is pierced by two tails.

The examples of these types of complex lasso proteins are shown in Fig. 1.35. Even more lassos can be obtained when analyzing the proximity-based contacts instead of covalent or ion-based interactions closing the loop [140].

Figure 1.35: The examples of four other types of complex lasso structures, along with the motif and structure simplification. From left to right, from top to bottom: the loop closed by interaction mediated by ion (PDB code 1n84), the loop comprising two bridges (PDB code 11ba), the loop formed by one chain pierced by the other (PDB code 1fl7), the loop spanned by several chains (PDB code 1mkk). The colors in the structure match the colors in simplification.

1.6 links in proteins

All the known links in proteins (Sec. 1.3.2) can be classified into one of the following groups, depending on the nature of the components:

1. Deterministic links – where the components are formed by covalent loops (pieces of backbone joined by residue-residue interactions), each component contained in one chain;

2. Probabilistic links – where the components are formed by the whole chains (closed artificially, as in the case of knots – Sec. 1.1.5);

3. Macromolecular links – where the components are formed by several chains.

All the available protein structures were scanned, and the existing deterministic and probabilistic links, as well as examples of macromolecular links, were gathered in the LinkProt database.

Margin note: The LinkProt server is available at http://linkprot.cent.uw.edu.pl [D15].

1.6.1 Deterministic links

Deterministic links are formed by two covalent loops, both closed by a disulfide bridge. Knowing the indices of bridge-forming and piercing residues from the complex lasso analysis, the identification of deterministic links is therefore equivalent to checking the order of indices [D11]. Three kinds of deterministic links were identified: two versions of the Hopf link and one version of the Solomon link [D11] (Fig. 1.36).
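The index check can be illustrated with a small sketch. It rests on the fact that the linking number of two closed curves equals the signed number of times one curve pierces a surface spanned on the other; the function names, the output labels and the residue numbers in the usage note are hypothetical, piercings by the bridge of the second loop are ignored, and the linking number alone does not determine the link type, so the labels are only candidates.

```python
def linking_from_piercings(piercings_of_a, loop_b):
    """Signed number of piercings of the surface spanned on loop A that are made by
    backbone residues lying inside loop B = (first, last bridge-forming residue)."""
    first, last = loop_b
    return sum(sign for residue, sign in piercings_of_a if first <= residue <= last)

def classify_pair(piercings_of_a, loop_b):
    """Candidate link type of two covalent loops, judged by |linking number| only."""
    lk = abs(linking_from_piercings(piercings_of_a, loop_b))
    labels = {0: "no linking detected", 1: "Hopf-like", 2: "Solomon-like"}
    return labels.get(lk, f"|Lk| = {lk}")
```

For instance, with a loop A pierced once at (hypothetical) residue 75 and a second loop closed by a bridge between residues 60 and 90, `classify_pair([(75, +1)], (60, 90))` reports a Hopf-like candidate.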


Figure 1.36: The examples of deterministic links found in proteins.

A different strategy had to be used to identify Brunnian links. In particular, in Brunnian links, each component has to be threaded in the L2 manner, and the part of the chain piercing through the loop itself forms a loop, which is then threaded (Fig. 1.36). Such a motif, called the core of the Borromean rings, is indeed present in the goat lactoperoxidases. However, no Brunnian link was identified in proteins, as such links require the core of the Borromean rings to repeat symmetrically for each component.

In fact, the linked loops in deterministic links are joined by an arc – a part of the backbone. Therefore, these structures can actually be classified as the handcuff graphs H21 (Hopf links) and H41 (Solomon link).

1.6.2 Probabilistic links

The probabilistic links are the direct generalization of the probabilistic (backbone) knots. Therefore, they can be analyzed in a similar manner, by closing the chains randomly on the sphere and performing the statistics on an ensemble of closures. However, in the case of links, many more motifs are possible, and therefore, probabilistic links with a high probability of a single motif are rare. The set of identified links depends on the cutoff probability used to define the link existence (Fig. 1.37).

Margin note: To avoid crossing of the closures, each chain is closed in one random point on the sphere.

Margin note: Links of up to 4 components were analyzed.

The analysis of probabilistic links shows that the structures identified in other surveys (Sec. 1.3.2) were possibly probabilistic Hopf links (GLN ∼ ±1, separable in some pulling direction) and Solomon links (inseparable in pulling, GLN ∼ ±2). The same analysis method makes it possible, in principle, to analyze linking between the subchains of the whole chain. This leads to the linking fingerprint matrix, analogous to the knot fingerprint matrix (Sec. 1.1.5), which could reveal interesting features of the local protein topology, useful e.g. in studying the domain-swapped proteins. In particular, the smallest linked portion of the chains (the “linked core”) can be identified with this method. A similar approach to pinpoint the “linked core” was also developed recently [141].
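The closure-and-statistics idea for two chains can be sketched as below. This is an illustration under stated assumptions rather than the method used in this work: each chain is closed by joining both termini to its own random point on a large sphere, and the closed pair is characterized only by the linking number (computed from signed crossings in a projection), not by a full link invariant. All function names, the sphere radius factor and the default number of closures are hypothetical.

```python
import numpy as np
from collections import Counter

def linking_number(a, b):
    """Linking number of two closed polygons (rows = vertices) from the signed
    crossings between them in the projection onto the xy-plane."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    total = 0
    for p, q in zip(a, np.roll(a, -1, axis=0)):
        for r, s in zip(b, np.roll(b, -1, axis=0)):
            d1, d2, w = q - p, s - r, r - p
            den = d1[0] * d2[1] - d1[1] * d2[0]
            if den == 0:
                continue                          # parallel in projection
            t = (w[0] * d2[1] - w[1] * d2[0]) / den
            u = (w[0] * d1[1] - w[1] * d1[0]) / den
            if 0 <= t < 1 and 0 <= u < 1:         # projected segments cross
                a_over = p[2] + t * d1[2] > r[2] + u * d2[2]
                sign = 1 if den > 0 else -1
                total += sign if a_over else -sign
    return total // 2

def random_point_on_sphere(rng, center, radius):
    v = rng.normal(size=3)
    return center + radius * v / np.linalg.norm(v)

def link_statistics(chain_a, chain_b, n_closures=100, seed=0):
    """Fraction of random closures giving each |linking number|."""
    rng = np.random.default_rng(seed)
    chain_a, chain_b = np.asarray(chain_a, float), np.asarray(chain_b, float)
    both = np.vstack([chain_a, chain_b])
    center = both.mean(axis=0)
    radius = 10.0 * np.linalg.norm(both - center, axis=1).max()
    counts = Counter()
    for _ in range(n_closures):
        # one closing point per chain: both termini are joined to it
        closed_a = np.vstack([chain_a, random_point_on_sphere(rng, center, radius)])
        closed_b = np.vstack([chain_b, random_point_on_sphere(rng, center, radius)])
        counts[abs(linking_number(closed_a, closed_b))] += 1
    return {lk: c / n_closures for lk, c in counts.items()}
```

A motif would then be accepted only if its fraction exceeds the chosen cutoff probability, which is why the set of identified links depends on that cutoff.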


Figure 1.37: The examples of probabilistic links found in proteins.

1.6.3 Macromolecular links

In macromolecular links, components are formed by several chains, which, although rarely connected covalently, are still tied together by a non-covalent interaction. Therefore, the first problem to be solved is the selection of components for a subsequent topological analysis.

This problem may be solved by representing the interactions between the chains as a spatial graph. The elongated form of the chains in known examples suggests describing each chain in such a graph by only two vertices connected by an edge. The interactions between different chains are represented by an edge only if there are at least Ncutoff contacts (Sec. 4.1.2) between the corresponding fragments of the chains (for some value of Ncutoff). In general, the vertices of the resulting graph can be of arbitrary valency and different circular subgraphs (components) can be identified. In principle, each choice of components is equally good. Therefore, the topology of the structure may be described by the “link probability”, i.e. the fraction of component selections in which a non-trivial link appears. Such an approach made it possible to identify some components forming macromolecular links in proteins, presented in Tab. 1.2 [D21]. The algorithm leading to the identification of a macromolecular Hopf link is schematically presented in Fig. 1.38. The “link probability” obtained by this method is a measure of the importance of the link motif in the whole structure.

Margin note: The chain is then represented by the two residues most distant from each other.

Margin note: Some restrictions can be put by taking into account the symmetry of components [142, 143].

Figure 1.38: The schematic depiction of the macromolecular link identification algorithm.


PDB code | Organism                | Chains

Virus capsids
3j4u     | Phage BPP-1             | Pentagon: A; Hexagon: B, C, D, E, F, G
3jb5     | ATCC                    | Pentagon: A, B, C, D, E; Hexagon: B, C, D, E, F, G
3j40     | Phage ε15               | Pentagon: F; Hexagon: A, B, C, D, E, G

Catenanes
1vdd     | Deinococcus radiodurans | Component 1: A, B, C, D; Component 2: E, F, G, H
1zye     | Bos taurus              | Component 1: first model; Component 2: second model
3teo     | Acidianus sp. A1-3      | Component 1: A, B, C, D, E, F, G, H; Component 2: I, J, K, L, M, N, O, P

Table 1.2: Exemplary macromolecular links identified in proteins. In the case of virus capsids, the linked components include, among others, those in the shape of a pentagon and a hexagon.

1.7 knotted loops, θ-curves, and other new topological structures

To identify knotted loops in a protein, its structure may be represented as a spatial graph with the vertices corresponding to residues and the edges to interactions between them. With such an approach, the search for knotted loops is equivalent to the search for a knotted cycle in such a graph. However, the graph representation is a mine of complex topology structures, making it possible to identify e.g. the θ-curves in proteins.

1.7.1 The search method

When representing the protein as a spatial graph, the vertices are located at the positions of Cα atoms. However, the set of edges depends on the type of the analyzed structure. Apart from representing the backbone, the edges may represent other types of interactions:

1. Covalent – disulfide, amide, etc.;

2. Ion-based – mediated by an ion;

3. Probabilistic – the chain closure.

The ion-based interactions were studied for some structures by Liang and Mislow (Sec. 1.3.1). On the other hand, the chain closure is commonly used in defining the backbone knots.

With the spatial graph established, the cycles in the graph (representing the closed loops) are identified. The topology of each loop was established using the implementation of the HOMFLY-PT polynomial (Sec. 1.1.3). However, the existence of non-backbone interactions (e.g. disulfide bonds) results in tri-valent vertices, enabling many more complex spatial graphs than knotted loops. In particular, to identify all the θ-curves in the protein, after the identification of the closed loops in the corresponding graph, the search for a path connecting any pair of points in the loop was performed. The existence of such a path with only terminal points common with the original loop is equivalent to the existence of a θ-curve in the original protein structure. The topology of the identified θ-curves was analyzed with the Yamada polynomial (Sec. 1.2.2). In principle, more complicated structures than θ-curves may be identified with this method. However, no mathematical classification beyond θ-curves was developed so far.

Margin note: The search for cycles in a graph is a standard task from graph theory, which was done using the DFS algorithm.

Margin note: The search for the path joining two points in the graph was again performed with the DFS algorithm.
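The two graph searches can be illustrated with a minimal depth-first-search sketch. It covers only the combinatorial part, i.e. listing the closed loops and the extra arcs that turn a loop into a θ-curve; the topological classification (HOMFLY-PT and Yamada polynomials) is not included. The function names and the small example in the usage note are hypothetical.

```python
def protein_graph(n_residues, bridges):
    """Adjacency sets: backbone edges k - (k+1) plus the given non-backbone edges."""
    adj = {v: set() for v in range(n_residues)}
    backbone = [(k, k + 1) for k in range(n_residues - 1)]
    for u, v in backbone + list(bridges):
        adj[u].add(v)
        adj[v].add(u)
    return adj

def simple_cycles(adj):
    """All simple cycles of an undirected graph found by DFS; each cycle is reported
    once, starting at its smallest vertex and in one fixed orientation."""
    cycles = []

    def dfs(start, node, path, visited):
        for nxt in adj[node]:
            if nxt == start and len(path) > 2:
                if path[1] < path[-1]:            # keep one orientation only
                    cycles.append(list(path))
            elif nxt > start and nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                dfs(start, nxt, path, visited)
                path.pop()
                visited.remove(nxt)

    for s in sorted(adj):
        dfs(s, s, [s], {s})
    return cycles

def theta_arcs(adj, cycle):
    """Paths joining two vertices of the cycle that share only their end points with
    it; each such path together with the cycle forms a theta-curve."""
    on_cycle = set(cycle)
    cycle_edges = {frozenset((cycle[k], cycle[(k + 1) % len(cycle)]))
                   for k in range(len(cycle))}
    arcs = []

    def dfs(node, path):
        for nxt in adj[node]:
            if frozenset((node, nxt)) in cycle_edges or nxt in path:
                continue
            if nxt in on_cycle:
                if path[0] < nxt:                 # report every arc once
                    arcs.append(path + [nxt])
            else:
                dfs(nxt, path + [nxt])

    for v in cycle:
        dfs(v, [v])
    return arcs
```

For a six-residue toy chain with bridges 0–5 and 1–4, `simple_cycles` returns three loops, and `theta_arcs` applied to the largest of them returns the single arc formed by the 1–4 bridge.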

1.7.2 Identified knotted loops and θ-curves in proteins

The search through all available protein structures revealed only one structure with a purely covalent +3_1 knotted loop – the coagulogen factor with PDB code 1aoc (Fig. 1.39) [D19]. This is also the only structure with a purely covalent θ3_1 motif. Inclusion of the ions made it possible to identify also the -3_1 knotted loops. The spectrum of knotted loops expands substantially upon inclusion of chain closure. In particular, apart from the backbone-specific knots (+3_1, -3_1, 4_1, -5_2), two new types were identified – 5_1 and 8_5 – both in the human hormone (PDB code 4dou), stemming from a dense interaction net of ions. These structures give rise to seven θ-curves: θ3_1, θ0_1#3_1, θ4_1, θ0_1#4_1, θ0_1#5_1, θ5_4, and an unclassified θ-curve with 8 crossings in the minimal crossing projection, denoted therefore as θ8_n. The structures are shown schematically in Fig. 1.39 and the θ-curve-forming arcs for these proteins are collected in Tab. 1.3. The self-updating list of knotted loops may be found in the KnotProt database.

Margin note: The θ-curves are currently classified only up to 7 crossings.

Margin note: The KnotProt database is available at https://knotprot.cent.uw.edu.pl.

The knotted loops and θ-curves arise naturally in backbone-knotted proteins with disulfide or ion-based bridges. However, almost 2/3 of the knotted loops and θ-curves arise in backbone-unknotted proteins [D19]. This shows that knotted loops and θ-curves are distinct from other motifs studied so far. Moreover, both knotted loops and θ-curves may span several chains (contrary to backbone knots). For example, the archaeal protein with PDB code 3bpd has a loop with -3_1 topology, spanning chains C and J. On the other hand, one protein can contain several overlying knotted loops and θ-curve motifs. Their topology may be described as a variant of a knot fingerprint, including the number of knotted loops with a given topology.

Margin note: For example, K3_1 3x3_1 denotes the structure with a 3_1-knotted backbone possessing three 3_1 knotted loops.

Describing the structure by the comprised θ-curves is more informative than using only the knotted loops, as different θ-curves have the same constituent knots. Interestingly, neither the θ5_2 motif nor the Kinoshita θ5_1 motif was identified in proteins, although more complicated structures (like θ8_n), or structures with the same constituent knots (knotted loops) were identified.


Figure 1.39: The exemplary θ-curves present in proteins, along with their simplification and the motif. PDB codes from top to bottom, from left to right: 1aoc, 5osq, 3ulk, 5e4r, 3ihr, 4dou, 4dou. Figure adapted from [D19].

1.7.3 θ-curve knotoid content

Even more information on the local protein topology may be obtained when analyzing the knotoid content of the θ-curve, i.e. the knotoids assigned to each arc of the θ-curve. In fact, knotoids describe only planar arcs, but for each arc, the dominating knotoid over an ensemble of projections may be chosen. For example, the θ0_1#3_1 is represented in proteins in various ways (including k4_3 and k5_2 knotoids – Fig. 1.40).

Figure 1.40: The representations of θ0_1#3_1 present in protein structures, along with their knotoid content.


θ-curve  | PDB code | Knotoid | Arc
θ3_1     | 1aocA    | k0_1    | C140 ... C161 ↔ C60 ... C65 ↔ C121 ... C134
θ3_1     | 1aocA    | k0_1    | C140 ↔ C88 ... C95 ↔ C10 ... C8 ↔ C167 ... C172 ↔ C134
θ3_1     | 1aocA    | k0_1    | C140 ... C134
θ0_1#3_1 | 5osqA    | k3_1    | D437 ↔ Ca504 ↔ S212 ... Q366 ↔ Ca503 ↔ V182 ... T3 ↔ Cls ↔ L474 ... C469
θ0_1#3_1 | 5osqA    | k0_1    | D437 ... C376 ↔ C469
θ0_1#3_1 | 5osqA    | k0_1    | D437 ... C469
θ4_1     | 3ulkA    | k0_1    | D217 ... M1 ↔ Cls ↔ V489 ... E393
θ4_1     | 3ulkA    | k0_1    | D217 ... E393
θ4_1     | 3ulkA    | k0_1    | D217 ↔ Mg498 ↔ E393
θ0_1#4_1 | 5e4rA    | k3_2    | D191 ... A2 ↔ Cls ↔ M476 ... E195
θ0_1#4_1 | 5e4rA    | k0_1    | D191 ↔ Mg502 ↔ E195
θ0_1#4_1 | 5e4rA    | k0_1    | D191 ... E195
θ0_1#5_2 | 3ihrA    | k5_2    | D75 ... E7 ↔ Cls ↔ L311 ... Q119
θ0_1#5_2 | 3ihrA    | k0_1    | D75 ... Q119
θ0_1#5_2 | 3ihrA    | k0_1    | D75 ↔ Na331 ↔ Q119
θ5_4     | 4douA    | k3_1    | D187 ... A108 ↔ Cls ↔ T525 ... Q478 ↔ Na1004 ↔ Q337 ... D477
θ5_4     | 4douA    | k0_1    | D187 ↔ Ca1003 ↔ D195 ... D328 ↔ Ca1001 ↔ D477
θ5_4     | 4douA    | k0_1    | D187 ... V194 ↔ Ca1002 ↔ D477
θ8_n     | 4douA    | k0_1    | D334 ↔ Ca1003 ↔ D187 ... A108 ↔ Cls ↔ T525 ... D477 ↔ Ca1002 ↔ D195 ... D328
θ8_n     | 4douA    | k0_1    | D334 ... D475 ↔ Ca1001 ↔ D328
θ8_n     | 4douA    | k0_1    | D334 ... D328

Table 1.3: The structural details of exemplary proteins with θ-curve motifs shown in Fig. 1.39. The “...” denotes connection along the backbone, the ↔ denotes the covalent bridge or interaction via ion, and Cls denotes the chain closure.

1.8 knotoid description of protein chains

The calculation of the dominant knotoid can also be used in the case of the backbone-knotted proteins. It makes it possible to refine the whole-chain motif, as well as the knot fingerprint matrix. Both of them may be useful, as the function of the protein strictly corresponds to the topology [57]. Moreover, knotoids are more sensitive to the non-trivial topology. In particular, there may be proteins classified as unknotted in the classical sense, but possessing a backbone representing e.g. the k2_1 knotoid. The complete, up-to-date knotoid classification of proteins can be found in the KnotProt database. In the case of the whole chain, the knotoid analysis differentiates the ways in which a given kind of knot occurs after the chain closure. For example, three ways of obtaining the trefoil knot are present in proteins (via k2_1, k3_1, and k3_2).


On the other hand, calculating the knotoid fingerprint matrix allows for a better understanding of the topology of the subchains and for tracing its changes when terminal residues are removed. For example, the full chain of Ubiquitin C-terminal Hydrolase (UCH-L, PDB code 2len) forms an (open) 5_2 knot. Clipping its C-terminus results in the 3_1 knotted structure. However, in the space of knotoids, this transition is much more interesting, passing through the k4_4 knotoid and eventually reaching the k2_1, k1_1, and k0_1 (trivial) knotoids. Moreover, one can balance the level of detail by adjusting the cutoff above which the dominating knotoid is accepted as representing the subchain: the lower the cutoff, the more detailed the picture.

1.9 genus of protein structure

Apart from knotoids, another topological tool, which had not been used in protein analysis before, is the genus, describing the number of holes in a given surface. Surfaces differing in genus admit different sets of graphs that can be drawn on them without self-crossings. In particular, the K_{3,3} graph, which by the Kuratowski theorem cannot be drawn without self-crossings on the plane (Sec. 1.2.2), can be drawn without self-crossings on the torus, i.e. the surface with genus g = 1. Such an assignment of a genus to a graph can also be used in proteins to distinguish between the domains. This can be achieved by investigating the network of interactions between residues.

The interactions in a protein can be depicted as arcs connecting points on a straight line (the protein backbone, Fig. 1.41). Such a representation gives rise to a graph, which defines the smallest genus of the surface on which the graph can be drawn without crossings. This genus can be calculated as a function of the chain length; this function is called the genus trace. As the linker between domains forms only a few isolated contacts, the change of the genus in this region is very small. Therefore, the plateaus of the genus trace separate the domains in proteins or RNA chains [D4] (Fig. 1.41).

Figure 1.41: The genus trace method applied to protein structure. Left panel: the schematic depiction of the protein with the contacts represented as arcs. Middle panel: the genus trace calculated for the gelsolin protein (PDB code 1d0n), overlaid on the map of the contacts identified during protein folding; the squares in the map denote the domains of the protein. Right panel: the structure of the gelsolin protein with the domains colored as in the middle panel. Fig. adapted from [144].
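A minimal sketch of how the genus of such an arc diagram, and hence the genus trace, could be computed. It is not the implementation used in [D4]. It relies on the standard Euler-characteristic relation for a chord diagram on one backbone, 2 - 2g = 1 - n + r, with n arcs and r boundary components, and it assumes that every residue carries at most one arc end (how several contacts at one residue are ordered is a modelling choice left out here). The function names are illustrative.

```python
def chord_genus(chords):
    """Genus of an arc diagram drawn over a single backbone.

    `chords` is a list of (i, j) residue-index pairs. Assumption of this
    sketch: every residue index is the end of at most one chord.
    """
    endpoints = sorted(p for chord in chords for p in chord)
    if len(set(endpoints)) != len(endpoints):
        raise ValueError("each residue may carry at most one chord end")
    n = len(chords)
    if n == 0:
        return 0
    # Relabel the chord ends 0..2n-1 in backbone order.
    rank = {p: k for k, p in enumerate(endpoints)}
    partner = [0] * (2 * n)
    for i, j in chords:
        partner[rank[i]], partner[rank[j]] = rank[j], rank[i]
    # Boundary components are the cycles of the map
    # "jump along the chord, then step to the next chord end".
    seen = [False] * (2 * n)
    boundaries = 0
    for start in range(2 * n):
        if seen[start]:
            continue
        boundaries += 1
        k = start
        while not seen[k]:
            seen[k] = True
            k = (partner[k] + 1) % (2 * n)
    # Euler characteristic: 2 - 2g = 1 - n + boundaries.
    return (n + 1 - boundaries) // 2


def genus_trace(chords, chain_length):
    """Genus of the sub-diagram spanned by residues 1..k, for each k."""
    return [
        chord_genus([c for c in chords if max(c) <= k])
        for k in range(1, chain_length + 1)
    ]
```

For two crossing arcs, `genus_trace([(1, 3), (2, 4)], 5)` returns `[0, 0, 0, 1, 1]`: the genus jumps when the second, crossing arc is completed, and stays flat where no new crossing contacts appear.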

2 STATISTICS, PROBABILITY, AND SHAPE OF (BIO)POLYMERS

72% of statistics are made on spot. — Unknown

If a motif is beneficial for the protein function or stability, it may have been conserved by evolution, and nowadays it can occur more often than in a reference polymer model. The occurrence of a motif in the polymer model is usually approximated by sampling a large ensemble of polymers and calculating the fraction of the motif-containing structures. The sampling methods used in the case of proteins are described in Sec. 2.1. To date, the probability of different knot types, as well as the loop threading probability in some cases, were determined and compared with the occurrence of backbone-knotted proteins (Sec. 2.2). Apart from the probability of the motif, its influence on the chain shape can also be analyzed. The common description of the shape and its dependence on the topology is presented in Sec. 2.3.

As most of the polymer-based results were obtained for knotted structures, there is a deep reservoir of unanswered questions concerning the statistics and influence of other topological motifs. In particular, the probability of complex lassos and knotted loops in random polymers and its comparison to the occurrence in proteins is described in Sec. 2.5. The influence of the piercing on the shape of the loop is described in Sec. 2.6.

2.1 sampling random chains

The equilibrium properties can be approximated by calculating the average quantities over a large ensemble of conformations. Sampling open, volume-free polymers (freely jointed chains) with constant bond length is an easy task, and the approach has a long tradition [145]. Such a chain can be built by iteratively adding random vectors picked from the uniform distribution on the sphere. This model can also be easily generalized by imposing a distribution of the bond lengths, planar angles, or dihedral angles.

Closed polygons, which are needed when analyzing the statistical properties of knots or lasso loops, are much harder to sample. In general, most of the proposed algorithms [146–150] are biased, converging to an incorrect distribution of the sampled polygons. (The construction of the measure on the space of closed polygons may be found in [151].) Only two methods were shown to give statistically unbiased results: the algorithm utilizing the symplectic representation of polygons (toric symplectic Markov chain Monte Carlo, TSMCMC) [152] and the sinc integral method [153–156]. The latter method may be used to sample the equilateral polygons by sampling the diagonal length of the polygon and the planar angles between the polygon edges [157].
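The construction of an open freely jointed chain described above can be sketched as follows. This is an illustrative sketch only, not the sampler used in the thesis; the function names and the `seed` parameter are assumptions. Directions are drawn uniformly on the sphere by taking the z-component uniform in [-1, 1] and the azimuthal angle uniform in [0, 2π).

```python
import math
import random


def random_unit_vector(rng):
    """Draw a direction uniformly distributed on the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(phi), r * math.sin(phi), z)


def freely_jointed_chain(n_bonds, bond_length=1.0, seed=None):
    """Open, volume-free chain: n_bonds random steps of constant length."""
    rng = random.Random(seed)
    chain = [(0.0, 0.0, 0.0)]
    for _ in range(n_bonds):
        ux, uy, uz = random_unit_vector(rng)
        x, y, z = chain[-1]
        chain.append((x + bond_length * ux,
                      y + bond_length * uy,
                      z + bond_length * uz))
    return chain
```

Closed polygons cannot be obtained this way, since independent steps do not return to the starting point; this is why the dedicated methods cited above are needed.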


The volume-free calculations can be used to determine e.g. the statistical probability of a given knot type. This probability depends, however, on the presence of confinement. Polymers under confinement are usually simulated by adding a spherical boundary surrounding the polymer [154–156].

Apart from the statistical probability, the influence of the topology on the shape of the polymer can be analyzed (Sec. 2.3). In such a case, the excluded volume of the chain is usually added to obtain more realistic results. However, no measure on the space of polygons with excluded volume has been constructed yet. In consequence, there is no way to prove that the models used are statistically unbiased. Usually, conformations with excluded volume are generated by extracting non-correlated frames from a simulation of the polymer's movement [158]. Alternatively, the volume is added to a phantom polymer, with subsequent chain relaxation.

2.2 probability and statistics of non-trivial topology

Generation of a large ensemble of polygons allows for testing the equilibrium (statistical) probability of the topological motifs. In particular, the probability of knot occurrence as a function of chain length (Sec. 2.2.1) and the asymptotic probability of links (Sec. 2.2.2) were established. These results can be compared with the number of knotted proteins (Sec. 2.2.3).

2.2.1 Probability of the knot in polymer polygons

Long polymer polygons were conjectured to be asymptotically knotted with high probability (the Frisch-Wasserman-Delbrück conjecture) [159, 160]. In fact, the probability of the unknot, P(0_1), was shown to tend to 0 with increasing ring length for some models [161–165]. Elongation of the ring also changes the spectrum of observed knot types, as more complex knots emerge. The probability of a given knot type K as a function of the chain length N was described as [166]:

\[ P_K(N) = C_K \left(\frac{N}{N_K}\right)^{m(K)} \exp\left(-\frac{N}{N_K}\right) \tag{2.1} \]

with C_K, N_K, and m(K) being the parameters characteristic for a given knot. The values of these parameters were measured by sampling the simplest knots [166].
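To make the behaviour of Eq. 2.1 concrete, the sketch below evaluates it and locates its maximum. Setting the derivative with respect to N to zero gives N = m(K) N_K, so for m(K) > 0 the probability first grows and then decays, while for m(K) = 0 it is a pure exponential decay. The function names are illustrative, and no fitted parameter values from [166] are reproduced here; any numbers passed in are the user's own.

```python
import math


def knot_probability(n, c_k, n_k, m_k):
    """Evaluate Eq. 2.1: C_K * (N / N_K)**m(K) * exp(-N / N_K)."""
    return c_k * (n / n_k) ** m_k * math.exp(-n / n_k)


def most_probable_length(n_k, m_k):
    """Chain length maximising Eq. 2.1, obtained from dP/dN = 0."""
    return m_k * n_k
```

With purely hypothetical values C_K = 1, N_K = 300, and m(K) = 1, the maximum lies at N = 300, which illustrates why different knot types dominate at different chain lengths.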

2.2.2 Probability of linking and threading

Apart from knotting, the probability of linking was also investigated [167, 168]. However, only isolated results for links were obtained. Usually, instead of linking, one analyzes the threading of one component through the other. The threading probability was investigated in models with excluded volume [169, 170], in polymer melts [171, 172], and in the asymptotic case [173, 174]. In particular, an exponential decay of the fraction of unlinks was reported [169].

The mutual threading of non-concatenated rings was also investigated by analyzing the minimal surfaces spanned on the rings [169, 175]. The threading was shown to modify the hydrodynamic properties of the polymer [176–185].

2.2.3 Statistics of knots in proteins

The analysis of all the protein structures deposited in the KnotProt database shows that the proteins with a knotted backbone constitute around 1.5% of all protein structures. However, more structures would be expected when comparing to the protein-like model of random compact loops [186]. This indicates that knots are rather avoided in protein structures.

The most common knot type is the +3_1 knot, represented by over 70% of non-redundant structures, with only around 10% of structures representing the -3_1 knot topology (Fig. 2.1). A similar domination is visible in the case of slipknot structures, with S+3_1 constituting almost 80% of the motifs. The asymmetry of chirality indicates that the knots may not be random. The domination of the 3_1 topology may be explained by the size of the structures, as more complicated knots appear for proteins with a longer knotted core. (Among structures with fewer than 200 residues in the knotted core, almost all have the K3_1 or S3_1 topology.)

Figure 2.1: The pie chart of knot topology for non-redundant structures (May 2019).

2.3 size and shape parameters of polymers

The polymer size and shape are conventionally described by the invariants derived from its tensor of inertia [187–189]:

\[ Q_{\alpha\beta} = \frac{1}{N} \sum_{i>j} (X_{i,\alpha} - X_{j,\alpha})(X_{i,\beta} - X_{j,\beta}) \tag{2.2} \]

for X_{i,α} being the α-coordinate of the i-th monomer and N the number of monomers. The most commonly used invariants derived from the tensor are the radius of gyration, which encodes the size of the polymer (Sec. 2.3.1), and the shape parameters: asphericity and prolateness (Sec. 2.3.2). In three dimensions, these parameters have a geometrical meaning, as they describe the average size, asphericity, and prolateness of the ellipsoid of inertia, defined by the square roots of the tensor eigenvalues. (The description using the enveloping ellipsoid was also applied [190].)

2.3.1 Radius of gyration and the Flory theory

The radius of gyration squared (R_g^2) is equal to the trace of the gyration tensor. (Equivalently, R_g^2 is equal to the average square of the ellipsoid of inertia semi-axis length.) The simple theory of the polymer radius of gyration was developed by Paul Flory [191], who estimated the repulsive energy of the chain composed of N monomers as:

\[ E_{rep} \sim v_{exc} \frac{N^2}{R_g^3} \tag{2.3} \]

with v_exc being the excluded volume of each pair of monomers. This makes it possible to estimate the minimum of the chain free energy, and in consequence the scaling R_g ∼ N^ν with ν = 3/5 in three dimensions [192]. (In d dimensions, ν = 3/(d+2) for d ≤ 4 and ν = 1/2 for d > 4.) A very similar exponent value (ν = 0.588) was obtained as an average over a large ensemble of generated closed chains [193].

This value is attained for so-called good-solvent conditions, where the interactions between polymer and solvent are highly favorable, and the chain swells. On the other hand, in bad-solvent conditions, the polymer repels the solvent molecules and in consequence collapses into a globule with the size scaling as N^{1/3}. (In general, for bad-solvent conditions R_g ∼ N^{1/d}, with d being the space dimension.) In the intermediate point, where the solvent neither attracts nor repels the polymer (the θ-conditions), the size scales as R_g ∼ N^{1/2}.

The theory of the radius of gyration was built with no assumptions on the topology of the analyzed polymer. However, the scaling exponent changes when the chain contains a knot [193–203] or has other topological constraints [158, 180, 182]. In particular, the knot crumples the chain, making it more compact. As a result, the scaling exponent decreases [194].
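The minimisation of the free energy mentioned in Sec. 2.3.1 can be made explicit. The sketch below assumes the standard Flory form of the free energy, in which the repulsive term of Eq. 2.3 (written for a general dimension d) is balanced by an entropic elastic term. The elastic term, the segment length b, and the omission of numerical prefactors and of the thermal energy are assumptions not spelled out in the text.

```latex
\begin{align}
F(R_g) &\sim v_{exc}\,\frac{N^2}{R_g^{d}} + \frac{R_g^2}{N b^2}, \\
\frac{\partial F}{\partial R_g} = 0
  &\;\Rightarrow\; d\, v_{exc}\,\frac{N^2}{R_g^{d+1}} \sim \frac{2 R_g}{N b^2}
  \;\Rightarrow\; R_g^{d+2} \sim v_{exc}\, b^2 N^3, \\
R_g &\sim N^{\nu}, \qquad \nu = \frac{3}{d+2}.
\end{align}
```

For d = 3 this gives ν = 3/5, and for d = 4 it gives ν = 1/2, the ideal-chain value.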

2.3.2 Asphericity and prolateness

The shape of the ellipsoid of inertia can be described by two parameters: the asphericity A, measuring the deviation from spherical shape (A = 0 for a sphere), and the prolateness P, distinguishing between prolate (P = 1) and oblate (P = -1) structures. (In terms of the inertia tensor Q, the asphericity is $A_d = \frac{d}{d-1}\,\frac{\mathrm{tr}(\hat{Q}^2)}{(\mathrm{tr}\,Q)^2}$ and the prolateness is $P = \frac{4\det\hat{Q}}{\left(\frac{2}{3}\mathrm{tr}(\hat{Q}^2)\right)^{3/2}}$, for $\hat{Q} = Q - \frac{1}{d}\mathrm{tr}(Q)$ and d being the dimension.) The exemplary ellipsoids with their corresponding values of asphericity and prolateness are shown in Fig. 2.2.

Figure 2.2: The polymer with its ellipsoid of inertia (left); the exemplary ellipsoids with their corresponding values of asphericity and prolateness (right). Figure adapted from [D1].

In terms of the ellipsoid semi-axes a, b, c, the shape parameters are calculated as:

\[ A = \frac{(a-b)^2 + (b-c)^2 + (c-a)^2}{2(a+b+c)^2}, \qquad P = \frac{(2a-b-c)(2b-c-a)(2c-a-b)}{2\left(a^2+b^2+c^2-ab-bc-ca\right)^{3/2}} \tag{2.4} \]

The asphericity and prolateness were measured for open and closed polymers [187, 188, 204–208] and compared with their asymptotic values [209–211]. The influence of the topology was also studied. In particular, the presence of a knot increases the asymptotic asphericity [190, 212].
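The size and shape parameters of Sec. 2.3 can be computed directly from monomer coordinates. The following is a minimal sketch, assuming coordinates are given as (x, y, z) tuples. It builds the tensor about the centroid with a 1/N normalisation (a common convention, assumed here; it may differ from Eq. 2.2 by a constant factor, which cancels in A and P) and evaluates the tensor-based expressions for the asphericity and prolateness quoted in Sec. 2.3.2 for d = 3. The function names and the numerical tolerance are illustrative.

```python
def gyration_tensor(coords):
    """3x3 gyration tensor about the centroid, normalised by 1/N (assumed)."""
    n = len(coords)
    centroid = [sum(p[a] for p in coords) / n for a in range(3)]
    return [
        [sum((p[a] - centroid[a]) * (p[b] - centroid[b]) for p in coords) / n
         for b in range(3)]
        for a in range(3)
    ]


def shape_parameters(coords):
    """Return (Rg^2, asphericity, prolateness) for a 3D conformation."""
    q = gyration_tensor(coords)
    trace = q[0][0] + q[1][1] + q[2][2]
    if trace == 0.0:
        raise ValueError("all monomers coincide; shape is undefined")
    # Traceless part of the tensor: Q_hat = Q - tr(Q)/3 * identity.
    qh = [[q[a][b] - (trace / 3.0 if a == b else 0.0) for b in range(3)]
          for a in range(3)]
    tr_qh2 = sum(qh[a][b] * qh[b][a] for a in range(3) for b in range(3))
    det_qh = (qh[0][0] * (qh[1][1] * qh[2][2] - qh[1][2] * qh[2][1])
              - qh[0][1] * (qh[1][0] * qh[2][2] - qh[1][2] * qh[2][0])
              + qh[0][2] * (qh[1][0] * qh[2][1] - qh[1][1] * qh[2][0]))
    asphericity = 1.5 * tr_qh2 / trace ** 2
    # For an (almost) isotropic tensor the prolateness is ill-defined; report 0.
    if tr_qh2 <= 1e-12 * trace ** 2:
        prolateness = 0.0
    else:
        prolateness = 4.0 * det_qh / ((2.0 / 3.0) * tr_qh2) ** 1.5
    return trace, asphericity, prolateness
```

A straight rod yields A = 1 and P = 1, and a flat symmetric ring of four points yields A = 1/4 and P = -1, matching the limiting cases described in the text.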

2.4 the aims

The identification of new topological motifs creates a knowledge gap concerning their occurrence and the influence of the topology on the shape. This sets the aims of this chapter:

1. Estimate the statistical probability of lassos and knotted loops in polymers.

2. Determine the influence of the piercing in lasso topology on the shape of the loop.

3. Determine the statistics of lassos, links, and knotted loops in proteins, and compare them with the occurrence in relevant polymer models.

2.5 probability and occurrence of topological motifs

The probability of lassos, knotted loops, and θ-curves may be approximated by generating large ensembles of structures. These probabilities can be compared to the occurrence of the complex topology motifs in proteins.

2.5.1 Probability of lassos

The ensemble of random lassos was created by sampling phantom, equilateral loops [157] to which a tail was attached. The tail was built as a random, equilateral walk starting from one bead of the loop [182]. The statistical probability of piercing, equal to the fraction of pierced loops, was calculated as a function of the loop and tail lengths [D1]. Increasing both loop and tail lengths decreases the probability of a trivial (unpierced, L0) loop (Fig. 2.3). However, even for the infinite structure (with the number of beads in the loop and the tail equal to infinity), the limit of the trivial loop probability P_∞(L0) is non-zero; in fact, P_∞(L0) = 0.19 [D1]. This value represents the probability that the loop and the tail go apart in completely different directions.

The fact that even the infinite random walk (the tail) fails to pierce the infinite loop with positive probability also has consequences for the spectrum of possible lasso motifs. In particular, the part of the tail after the piercing is again a memoryless, infinite random walk, which therefore has a positive probability of drifting apart from the loop. This argument can be iterated with the number of piercings. As a result, the asymptotic distribution of the number of piercings forms a geometric-like distribution, governed by the parameter P_∞(L0). In particular, the most probable lasso motif for all loop and tail lengths is the singly pierced lasso (L1) [D1]. This agrees with the protein data, as most complex lasso proteins are the singly pierced L1 structures (Fig. 2.4). This result distinguishes lassos from knots, where the probability of the simplest non-trivial knot vanishes with the chain elongation. In consequence, different knot types dominate for different loop lengths (Sec. 2.2.1).

Figure 2.3: The probability of trivial lasso as a function of loop and tail lengths shown in different cases: (A) with the lengths equal, (B) with fixed loop length, (C) with fixed tail length, (D) as a surface of two parameters. Figure adapted from [D1].

In the polymer model, no preference for the piercing tail or for the piercing direction is seen. In proteins, no substantial preference for piercing by one particular terminus is visible either. However, most of the lasso loops are pierced from one side, marked as negative (Fig. 2.4). This effect may result from the chirality of the amino acids and the chain conformation imposed by the most common Ramachandran angles, especially as the negative piercings arise, on average, sequentially closer to the loop-closing bridge (Fig. 2.4).

Figure 2.4: The statistics of complex lasso proteins. (A) The pie charts of the lasso motifs, (B) the piercing tail, (C) the piercing direction. (D) The probability distribution of the sequential distance between piercing residue and the loop-closing bridge, for negative and positive piercings.


In the polymer model, the decrease of the probability of the trivial lasso P(N, t; L0) may be described as a sum of exponentials in each parameter (the loop length N or the tail length t):

\[ P(N, t; L_0) = P_\infty(L_0) + c_{\alpha;t} \exp(-\alpha_t t) + c_{\alpha;N} \exp(-\alpha_N N) + c_{\beta;t} \exp(-\beta_t t) + c_{\beta;N} \exp(-\beta_N N) \tag{2.5} \]

For both parameters, one characteristic length is an order of magnitude larger than the other. This shows that a specific effect is responsible for loop piercing at short scales. This effect results from sharp turns of the chain or the loop. In consequence, it disappears when the excluded volume is included, which forbids very acute angles [169, 170].

Equation 2.5 can be used to compare the observed number of complex lasso proteins with the number expected in the volume-free polymer model. To this end, each covalent loop in a protein can be assigned a probability of piercing based only on the loop and tail lengths. This allows for calculating the expected number of complex lassos for any protein structure, taking into account only the number of residues in the loops and the tails. The expected and observed numbers of lassos as a function of the chain length are shown in Fig. 2.5 [D1]. For all chain lengths, the expected number of complex lassos is higher than the observed one. As the shapes of the plots differ, especially for long chains, the plot of the expected numbers cannot be obtained from the plot of the observed numbers by rescaling, which would be possible if the difference stemmed from different persistence lengths.
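The way Eq. 2.5 turns loop and tail lengths into an expected number of complex lassos can be sketched as follows. Only the asymptotic value P_∞(L0) = 0.19 is taken from the text. The amplitudes and decay rates below are hypothetical placeholders, chosen only so that the result is a valid probability with one decay length ten times longer than the other; they are not the fitted values from [D1]. Treating each loop with a single tail is a further simplification of this sketch.

```python
import math

P_INF_L0 = 0.19  # asymptotic trivial-lasso probability quoted from [D1]

# HYPOTHETICAL (amplitude, decay rate) pairs standing in for the fitted
# coefficients of Eq. 2.5; replace with real fit results before any use.
TAIL_TERMS = [(0.2, 0.1), (0.2, 0.01)]  # (c_alpha;t, alpha_t), (c_beta;t, beta_t)
LOOP_TERMS = [(0.2, 0.1), (0.2, 0.01)]  # (c_alpha;N, alpha_N), (c_beta;N, beta_N)


def trivial_lasso_probability(n_loop, n_tail):
    """Evaluate Eq. 2.5 for loop length n_loop and tail length n_tail."""
    p = P_INF_L0
    p += sum(c * math.exp(-rate * n_tail) for c, rate in TAIL_TERMS)
    p += sum(c * math.exp(-rate * n_loop) for c, rate in LOOP_TERMS)
    return p


def expected_complex_lassos(loops):
    """Expected number of pierced loops among (n_loop, n_tail) pairs."""
    return sum(1.0 - trivial_lasso_probability(n, t) for n, t in loops)
```

Summing 1 - P(N, t; L0) over all covalent loops of a set of structures gives the expected count that is compared with the observed one in Fig. 2.5.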

Figure 2.5: The comparison of expected and observed lasso motifs in proteins. (A) The expected and observed number of lasso loops as a function of chain length; (B) the probability of trivial lasso based on observations in proteins (violet points) and expected from the polymer model (blue surface). Figure adapted from [D1].

The comparison with the polymer model can also be performed in another way. The structures with similar loop and tail lengths were gathered, and the statistical probability of a trivial protein lasso as a function of loop length N and tail length t was calculated based on the biological data. The resulting distribution forms a surface, which can be compared with the surface obtained in the polymer model (Eq. 2.5). The protein-based points are located almost exclusively above the polymer-based surface, which means that the probability of the trivial lasso is much higher in proteins than in the polymer model (Fig. 2.5). This again indicates that complex lassos are avoided during folding, similarly to knots (however, to a lesser extent, taking into account the number of complex lasso proteins). Again, this effect does not result from a possible difference in the persistence length of proteins compared to the model, as the shape of the surface corresponding to the expected data cannot be obtained from the surface for biological data just by scaling.

The probability of a complex lasso also influences the probability of obtaining a deterministic link, as such links arise by the mutual piercing of two covalent loops. Therefore, the number of deterministic links is also much smaller than would be expected from polymer models.

2.5.2 Probability of knotted loops and θ-curves

In the case of proteins, the existence of knotted loops and θ-curves is possible thanks to the non-backbone connections between residues. Such connections can occur only if the distance between two residues is small enough. This suggests how to identify possible places for the bridge localization, also in random polymer models. The set of distances between each pair of residues can be presented in the form of a matrix. The pairs of residues with the spatial distance of Cα atoms less than 10 Å (the maximal distance between the Cα atoms of disulfide-bridge-building cysteine residues, when scanning all the structures deposited in the RCSB database) form a set of spots in the distance map (Fig. 2.6A). To avoid redundancy, for each spot large enough (containing at least 10 contacts) its center may be regarded as a representative place for bridge location (blue dots in Fig. 2.6A). Importantly, the set of bridges identified with this method contains the set of native disulfide bridges for the test protein (orange dots in Fig. 2.6A). This method can also be applied to the sampled random polymers.
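The bridge-location procedure described above can be sketched as follows. The 10 Å cutoff and the minimum spot size of 10 contacts come from the text. Everything else is an assumption of this sketch, not the implementation from [D1] or [D19]: the `min_separation` parameter (used to skip the trivial near-diagonal band), the definition of a spot as a 4-connected cluster in the contact map, and the use of the rounded mean position as the spot center.

```python
import math
from collections import deque


def contact_pairs(coords, cutoff=10.0, min_separation=3):
    """Pairs (i, j), i < j, of C-alpha positions closer than `cutoff`.

    `min_separation` is a hypothetical parameter of this sketch.
    """
    n = len(coords)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + min_separation, n)
        if math.dist(coords[i], coords[j]) < cutoff
    }


def bridge_candidates(contacts, min_spot_size=10):
    """Centers of contact-map spots with at least `min_spot_size` contacts."""
    unvisited = set(contacts)
    centers = []
    while unvisited:
        seed = unvisited.pop()
        spot = [seed]
        queue = deque([seed])
        # Breadth-first search over neighbouring cells of the contact map.
        while queue:
            i, j = queue.popleft()
            for cell in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if cell in unvisited:
                    unvisited.remove(cell)
                    spot.append(cell)
                    queue.append(cell)
        if len(spot) >= min_spot_size:
            ci = round(sum(i for i, _ in spot) / len(spot))
            cj = round(sum(j for _, j in spot) / len(spot))
            centers.append((ci, cj))
    return sorted(centers)
```

Each returned pair is a candidate bridge between two residues; smaller spots are discarded, which keeps one representative per contact region instead of many redundant neighbours.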

Figure 2.6: (A) The distance map between the Cα atoms (above the diagonal) with the pairs of residues within a cutoff distance marked as black spots (below the diagonal) for the protein structure with PDB code 1aoc. The orange dots mark the native bridges, the blue dots mark the representative bridge locations predicted by the method described. The red arrows mark the correspondence of bridges, if not obvious. (B, C) The distribution of planar (B) and torsion (C) angles calculated over the set of all protein structures available in the RCSB database. Figure adapted from [D1].


However, such an analysis for a completely random walk results in a number of bridges much smaller than observed in protein structures [D19]. This can be fixed by applying the distribution of planar and torsion angles derived from the available protein structures (Fig. 2.6B and C). The probability of obtaining non-trivial knotted loops and θ-curves as a function of the chain length or the number of bridges within such an approach is shown in Fig. 2.7.

Figure 2.7: The probability of knotted loop and θ-curve in random polymers with the protein-like distribution of planar and torsion angles, as a function of the chain length and number of artificial bridges. Figure adapted from [D19].

In particular, structures with the 3_1 knotted loops dominate the spectrum, and even for chains with a high number of bridges the chance of obtaining a 4_1 knotted loop is almost two times smaller. The prevalence of the 3_1 knotted loops is even more evident when analyzing the probability as a function of the chain length. As a consequence, the θ3_1 and θ0_1#3_1 curves, which possess the 3_1 knotted loop, dominate the spectrum of obtained θ-curves. Interestingly, the θ5_1 (Kinoshita curve) and the θ5_2 (with a 3_1 knotted loop) are much less probable, in accordance with the protein case. In general, the protein-based and polymer-based spectra of knotted loops and θ-curves agree well. This indicates that both knotted loops and θ-curves occur naturally in the structures for entropic reasons.


2.6 shape of lasso polymers

In the case of knots, the entanglement changes the overall shape of the loop, measured by the asphericity and prolateness (Sec. 2.3.2). One could expect to observe a similar effect of the loop threading in lasso polymers. This influence should also depend on the thread thickness. To test this influence, long simulations of the dynamics of the loop movement were conducted for different loop lengths and thread thicknesses [D1]. The simulations were conducted in periodic box conditions to prevent the loop from slipping off the chain (Fig. 2.8). The simulation frames were treated as an ensemble of loop conformations, and the average radius of gyration, asphericity, and prolateness were calculated (Fig. 2.9).

Figure 2.8: The schematic depiction of the model used in the analysis.

Figure 2.9: The radius of gyration and asphericity as a function of loop length and thread thickness. Figure adapted from [D1].

The presence of the thread influences all three investigated parameters. In particular, it increases the radius of gyration compared to the unpierced loop of the same length, which reflects the spreading of the loop in the two dimensions perpendicular to the thread. This results in loop flattening, which is visible both in asphericity and prolateness: upon threading, the loops are more aspherical and less prolate (more oblate).

In the asymptotic case, the thread may be considered infinitely thin, as long as the thread thickness is kept constant with varying loop length. Therefore, the thread thickness does not influence the asymptotic values. However, the presence of the thread restricts the chain movement, thereby modifying the asymptotic values of the shape parameters. For example, the asymptotic value of the asphericity increases to 0.0859, compared to 0.078 for the unthreaded loop [190].

3 BIOLOGY OF COMPLEX TOPOLOGY PROTEINS

I learned a lot of things in biology, and I gained a lot of experience. — Richard P. Feynman

The aim of the evolution of proteins is to make them efficient and functional. The function of the topologically complex motifs is still unclear, although much work has been done to shed some light on this issue (Sec. 3.1).

To solve the conundrum of the function of backbone-knotted proteins, one can take into account that over 70% of backbone-knotted proteins are enzymes [113], [D3], and that the knot structure is important for the enzyme activity [213, 214]. This suggests that the backbone knot acts favorably on the enzymatic active sites (Sec. 3.3). On the other hand, many previous works proposed a stabilizing effect of knots. This may be the primary function in the case of shallowly knotted structures, but also of links and some types of lassos (Sec. 3.4). Finally, the strict conservation of the topological motifs, indicating the functionality of non-trivial topology, may be utilized in structure reconstruction and prediction (Sec. 3.5).

3.1 existing concepts of topologically complex motifs function

Despite both experimental and theoretical studies, the precise function of the topologically complex motifs remains unclear. Most data are available for backbone knots, which have been conserved for over a billion years of evolution and are present in proteins sharing less than 20% of sequential similarity [57]. (The backbone-knotted proteins were suggested to appear by gene duplication [215] or circular permutation [216].) On one hand, the strict conservation of the topology indicates the functional advantage of the backbone knot. On the other hand, it almost excludes the presence of close homologs differing in topology, which could be used to analyze directly the influence of the knot. (The only known pair of homologs differing in topology is the knotted Acetylornithine TransCarbamylase (ATC) and the unknotted Ornithine TransCarbamylase (OTC).)

Usually, the function attributed to the backbone knots is an elevated stability, either mechanical [46, 217–222], thermodynamical [50, 215, 223–225], or kinetic [226], or the resistance against the degrading machinery [48, 227–230]. The latter possibility was assumed, as the 5_2-knotted protein Ubiquitin C-terminal Hydrolase (UCH) detaches the ubiquitin before the protein destined to be degraded enters the proteasome. Therefore, the backbone knot in UCH was speculated to protect the structure in case it also falls into the proteasome. However, recent experimental results do not support this hypothesis [228].

The argument of structure stabilization was further supported by the observation that the unfolded protein with a native deep trefoil knot remains knotted in the denatured state for a very long time [231–233]. On the other hand, the knotted backbone modifies the effective persistence length of the chain [233], introducing a local stiffness [234], which can also have functional implications, especially in structural proteins.

To understand the role of the knot in the backbone, one can take into account the fact that over 70% of knotted proteins are enzymes (statistics according to the KnotProt database). Moreover, their enzymatic active sites are contained at least partially within the knotted core, and the structure of the knotted core was shown to be crucial for the protein's dimerization and activity [213, 214]. However, the functional implications of the knot presence for the enzymatic activity have not been studied yet.

Apart from the backbone-knotted proteins, the function of some lasso proteins was studied. In particular, the miniproteins were shown to have antimicrobial activity, with their function crucially dependent on the topology, as their lasso motif allows them to act as molecular plugs for relevant NTP-uptake channels [235–237]. On the other hand, the lasso motif in the obesity hormone leptin was also shown to modulate the function of the protein [133, 238].

3.2 the aims

The existing data on the function of proteins with complex topology motivate additional questions. In particular, the first aim is to:

1. Identify the functional advantage of the knot in proteins, which explains the prevalence of enzymes.

On the other hand, the function of links, knotted loops, or complex lasso proteins in general has not been investigated yet. This opens a new, broad field with the main aims:

2. Identify the functional complex topology motifs in proteins.

3. Propose their function.

3.3 knot-induced enzymatic activity

The great prevalence of enzymes among the backbone-knotted proteins prompts a look at the location of the active sites with respect to the knotted core. In the majority of cases, most of the active sites are contained within, or close to, the knotted core. Moreover, there are non-enzymatic knotted proteins with other functional residues (e.g. cofactor-binding sites) again located in the knotted core. This indicates that the knotted core may have some influence on the properties of the individual residues.

To analyze this influence, the properties of the proteins with a deep, tight knot were investigated, as in these cases the influence of the knot should be the most explicit. (The knot was regarded as tight if the knotted core comprised at most 84 residues, i.e. twice as many as the smallest knotted core identified in proteins.) After clustering the structures by sequence homology, three cases were chosen: tRNA methyltransferase (TrmL, PDB code 4jak), N-acetyl-L-ornithine transcarbamoylase (ATC, PDB code 3kzn), and splicing factor Rds3p (PDB code 2k0a). In each case, the functionally active residues colocate with the knotted core. When analyzing the interactions between residues (the contact map, Sec. 4.1.2), a high density of contacts was observed for the residues on the border of the knotted core (Fig. 3.1).

Figure 3.1: The knot-induced dense net of interactions for the knotted core delimiting residues (TrmL protein, PDB code 4jak). The contact map with the structure of the protein (A) with its magnification (B); (C) the structure of the cofactor (SAH) binding pocket; (D and E) the interaction net of the knot delimiting residues. Figure from [D14].

Figure 3.2: The analysis of the functional advantage of the knotted core (ATC enzyme, PDB code 3kzn). (A) The contact map overlaid on the knot fingerprint matrix; (B) the structure with the binding residues marked as violet beads; (C) the number of native contacts, (D) the B-factor, and (E) the SASA as a function of the residue index (the positions of the knotted core and of the ligand-binding residues are marked). Figure from [D14].


Apart from the dense interaction net, the knot-delimiting residues exhibit other special features induced by the presence of the backbone knot: they are located in regions with decreased mobility (measured by the B-factor) and on the border between two phases, the polar protein interior and the solvent (measured by the Solvent Accessible Surface Area, SASA); see Fig. 3.2. The same properties are exhibited in general by active sites in other, unknotted enzymes.

Therefore, the existence of a knotted backbone results in the creation of places favorable for the location of enzymatic active sites [D14]. It is by no means the only way of creating such places, but this influence of the knotted core on the biochemical properties of the residues may explain the great prevalence of enzymes among knotted proteins. Moreover, it additionally justifies the reasoning that it is enough to check the function of a natively knotted protein to show that it contains the knotted backbone (i.e. without crystallization of the structure). In principle, the natively knotted protein could be functional even after a hypothetical loss of the knot, e.g. during long-lasting unfolding. However, as the knot induces the proper conformation of the surroundings of the active sites, the functionality of the protein is indeed a hallmark of the knot presence.

3.4 function of other topologically complex motifs

Elucidating the function of the lasso motif is a hard task, as many non-trivial lassos, especially those with large loops, may be incidental. However, a comparison of the shape of the pierced loop in proteins and in the polymer model shows that the pierced loops in proteins are more spherical than expected from the polymer model (Fig. 3.3). In particular, the effect of piercing on the asphericity value for proteins is opposite to that in the polymer model. This indicates that the complex lasso motif is functional for some proteins.

Figure 3.3: The distribution of the loop lengths and the asphericity of lassos in proteins. Although the pierced loops are longer on average than the unpierced ones (blue maximum and the tail shifted to larger values), they are more spherical at the same time (blue maximum and the tail shifted to smaller values). Figure adapted from [D14].

To establish the possible function of the lasso motif, the most common protein functions can be colocated with the lasso motif in a non-redundant protein set [D13]. In this analysis it is crucial to describe the motif in full detail, i.e. to include both the piercing tail and the piercing direction. In particular, the proteins with the L-1C topology turn out to be most commonly antimicrobial proteins (defensins, etc.), and proteins with L+2C are commonly signaling proteins (cytokines, chemokines, etc.). Interestingly, the same L-1C motif is present in antimicrobial lasso peptides, where the motif was proved to be crucial for the function.

The potential functionality of the lasso proteins with L-1C and L+2C motifs was also shown by analyzing the probability of the lasso motif [D1]. In particular, based on the probability of piercing for random polymers (Sec. 2.5.1), the probability of piercing for a protein with a given loop and chain length can be approximated. The probability for most of the proteins with the L-1C motif was estimated to be below 0.2, while the probability for most proteins with L+2C was estimated to be just above 0.2, indicating that the piercing, being entropically rather improbable, is stabilized by some enthalpic factors. In fact, for most of the proteins with the L-1C motif some bulky residues are located in the vicinity of the piercing residue, which suggests that they evolved to stabilize the lasso motif. This indicates that these lasso motifs, being stabilized, may be important for the protein’s function.

Analogously to the case of lassos, the function of the link motif was approached by analyzing the common features of the non-redundant structures with a given type of link. Proteins with links represent different kingdoms of life and have different functions; however, most of them are secreted proteins (Tab. 3.1). This suggests that proteins with links are characterized by increased stability. In fact, they are super-stable. For example, the ceratoplatanins were shown to be stable up to 76 °C [239], the fungal endoglucanase (PDB code 1hd5) retains 45% of its activity after incubation for 5 min at 90 °C and is stable after heating at 60 °C and over a wide range of pH [240], while the animal endoglucanase (PDB code 1wc2) withstands heating for 10 min at 100 °C without irreversible loss of activity [241]. This suggests that the function of the link motif may be to increase the stability of a protein. In fact, stabilization was also proved for probabilistic links in proteins [121, 242].

PDB code | Function | Molecule | Kingdom | Cellular location | S
1BW3A | Lectin | Barwin | Plants | Secreted | Positive
2HCZX | Allergen | Beta-exp. 1a | Plants | Secreted | Positive
3SUKA | Toxin | Cerato-plat. | Fungi | Secreted | Positive
3SUMA | Toxin | Cerato-plat. | Fungi | Secreted | Positive
2KQAA | Toxin | Cerato-plat. | Fungi | Secreted | Positive
3X2GA | Hydrolase | Endogluc. | Fungi | Secreted | Positive
1HD5A | Hydrolase | Endogluc. | Fungi | Secreted | Positive
1WC2A | Hydrolase | Endogluc. | Animals | Secreted | Positive
4B56A | Hydrolase | Pyrophosph. | Animals | Transmem. | Negative
2LFKA | Inhibitor | Tryptase inh. | Animals | Secreted | Negative
2E26A | Signalling | Reelin | Animals | Secreted | Negative
1H30A | Laminin | Gr. arr. sp. | Animals | Secreted | Negative
3R4DB | Viral protein | Spike glycop. | Viruses | Transmem. | Negative

Table 3.1: Function, orientation (denoted by S), molecule type, kingdom of origin organism and cellular location for representative proteins with the Hopf link topology. Table from [D11].

To test this hypothesis, the unfolding of the smallest protein with a link, the Tick-derived Protease Inhibitor (PDB code 2lfk), was investigated in models differing in topology. In particular, models with zero, one, or two disulfide bonds were taken into account. With two bonds included, both the Hopf link and the trivial topology were investigated. The trivial-topology model was obtained by swapping the pairing of the interacting cysteines. The analysis of the mean unfolding time as a function of the temperature allows (based on the Arrhenius law) for estimating the energy barrier of unfolding (the slope of the temperature dependence in Fig. 3.4). In particular, both models with two disulfide bonds are much more stable; however, the unfolding barrier for the model with the native link is about 20% higher than in the trivial case [D11].

Figure 3.4: The unfolding analysis of the smallest protein with link (PDB code 2lfk). The logarithm of the mean unfolding rate as a function of the inverse of temperature (left) was calculated, to obtain the energy barrier of unfolding, as a slope of the curve (right top). Five models were tested, including models with two disulfide bridges, differing in topology (right bottom). Figure from [D11].
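The barrier estimate described above can be illustrated with a minimal sketch. It assumes that the mean unfolding time follows the Arrhenius law, tau = tau0 * exp(E / (kB * T)), so that ln(tau) is linear in 1/T with slope E/kB. The function names, the reduced units (kB = 1), and the plain least-squares fit are illustrative assumptions, not the procedure used in [D11].

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def unfolding_barrier(temperatures, mean_unfolding_times, k_B=1.0):
    """Estimate the unfolding barrier E from mean unfolding times.

    Assumes the Arrhenius law, so ln(tau) versus 1/T is a straight
    line whose slope equals E / k_B (reduced units by default).
    """
    inverse_t = [1.0 / t for t in temperatures]
    log_tau = [math.log(tau) for tau in mean_unfolding_times]
    slope, _intercept = fit_line(inverse_t, log_tau)
    return k_B * slope
```

Comparing the value returned for two models (e.g. linked and trivial topology) simulated over the same temperature range gives the relative change of the barrier.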

The deterministic link motif is strictly conserved for all proteins within a homology cluster. Moreover, the same topology and structural similarity are exhibited by proteins with a sequence similarity smaller than 30%. Indeed, the proteins with the positive Hopf link are structurally similar, with the loops always separated by a small, even number of residues, and with similar sizes of the loops (Tab. 3.2). Moreover, all proteins with the positive Hopf link are sugar-binding proteins. In fact, the structural similarity measured by the structural P-value (probability of obtaining the same structure) is much higher for proteins with the positive Hopf link than for those with the negative Hopf link. This indicates that proteins with the positive Hopf link may have a common ancestor, which had to occur very early in evolution, taking into account the different kingdoms represented by the investigated proteins.

PDB code | L1 | P1 | L2 | P2 | # of hom. | # of loops | Size | Loop sep. | S
1BW3A | 22 | 82 | 57 | 49 | 4 | 3 | 125 | 2 | Positive
2KQAA | 23 | 75 | 55 | 45 | 1 | 2 | 129 | 2 | Positive
2HCZX | 28 | 94 | 67 | 58 | 2 | 3 | 245 | 2 | Positive
3X2GA | 28 | 84 | 85 | 36 | 11 | 5 | 180 | 2 | Positive
3SUKA | 37 | 97 | 59 | 64 | 4 | 2 | 125 | 2 | Positive
3SUMA | 37 | 97 | 62 | 72 | 1 | 3 | 136 | 2 | Positive
1WC2A | 39 | 100 | 85 | 51 | 1 | 6 | 181 | 2 | Positive
1HD5A | 70 | 109 | 113 | 74 | 7 | 7 | 213 | 0 | Positive
2LFKA | 27 | 57 | 17 | 45 | 4 | 4 | 57 | 0 | Negative
2E26A | 39 | 2473 | 166 | 2380 | 2 | 12 | 725 | 5 | Negative
3R4DB | 137 | 240 | 79 | 126 | 2 | 3 | 288 | 6 | Negative
4B56A | 212 | 529 | 387 | 230 | 25 | 16 | 820 | 64 | Negative
1H30A | 287 | 668 | 27 | 477 | 3 | 4 | 422 | 72 | Negative

Table 3.2: Structural data for representative protein chains with the Hopf link, along with proteins’ function. Ln denotes the size of the loop n, Pn is the signed index of a residue piercing through loop n. Loop sep. is the sequential distance between the loops. # of hom. is the number of homologs for given structure, # of loops is the number of disulfide-based covalent loops in the structure (e.g. if 4, there are 2 covalent loops forming a Hopf link and two trivial covalent loops). The horizontal line separates proteins with different orientation of the Hopf link (denoted by S). The dashed line separates the humanly modified protein with PDB code 3T93. Proteins are ordered according to the size of the first pierced loop. Table from [D11].

Proteins with Solomon bridges also exhibit structural similarity, despite low sequence identity. Moreover, they are also sugar-binding proteins, with a size roughly twice that of the Hopf-link proteins. However, no clear sign of domain duplication leading from Hopf-link proteins to Solomon-link structures was identified.

An analysis analogous to that of complex lasso proteins may be performed for proteins with knotted loops and θ-curves. The colocation of the motifs with protein function does not provide evidence for the function of the θ-curve motif; however, the proteins with the -3₁ knotted loop (with -θ3₁ or θ0₁#-3₁ motifs) are commonly enzymes (Tab. 3.3). One may speculate that the existence of the knotted loop restricts the structure in a similar way as the backbone knot does, resulting in places favorable for the enzymatic active sites.

θ-curve | Functions | Kingdom
+θ3₁ | Oxidored. (2), adhesion (2), viral (2), transport (2), hydrolase, transferase, coagulation, metal binding, hormone | Bacteria (6), Animals (5), Viruses (3)
-θ3₁ | Hydrolase (5), isomerase, transport, coagulation, binding | Bacteria (6), Archaea, Animals
θ0₁#+3₁ | Lyase (4), viral (2), adhesion (2), hydrolase (2), hormone, metal binding | Animals (6), Viruses (3), Bacteria (3), ...
θ0₁#-3₁ | Hydrolase (7), isomerase, transport, coagulation | Bacteria (7), ...
θ0₁#4₁ | Oxidored. (2) | Plants (2)
θ5₄ | Hormone | Animals
θ8ₙ | Hormone | Animals

Table 3.3: Function and origin of proteins with θ-curves. Table from [D19].

Apart from the statistics of the functions performed by the proteins with θ-curves, the function of the motif was also approached from the dynamical side. In particular, the unfolding of models of the protein containing the covalent θ3₁ (PDB code 1aoc), differing in the composition of the disulfide bridges, was studied. The protein natively contains 8 bridges; therefore, the models with one bridge missing were compared with the model corresponding to the native protein, to identify the stabilization effect provided by each bridge separately. If the θ-curve motif is important for the protein stability, removing the disulfide bonds constituting the motif should have a much stronger impact on the structure than removing other bonds. However, the bridges most important for the stability of the structure do not constitute the θ-curve motif (Fig. 3.5). This indicates that the θ-curves and knotted loops do not stabilize the protein structure. Therefore, the knotted loops and θ-curves are rather a natural consequence of the protein topology, not functional motifs.

Figure 3.5: The unfolding of the models of crab coagulogen protein (PDB code 1aoc) differing in the composition of the disulfide bridges. The difference in the logarithm of the unfolding rate, compared to the native structure, as a function of the inverse of the temperature (left) was used to calculate the loss of stability (right) for each missing disulfide bond. The color code for the missing bridge is described above the plots. Figure from [D19].

3.5 utilization of structure conservation

The strict conservation of the topology for proteins with a backbone knot, even for structures sharing low sequence similarity, can be used in modeling fragments undetermined experimentally (gaps), as well as in the prediction of whole structures. The gaps can usually be reconstructed in many different ways, sometimes differing in topology. As the function of the protein critically depends on its structure and topology, it is crucial to choose the topologically correct gap filling.

Protein reconstruction is usually performed based on sequentially close templates. Within such a method (homology modeling), providing the list of topologically correct homologs ensures obtaining a structure with the correct topology. The safest way to establish the correct topology is to accept the topological motif of the homolog with the highest sequence identity. In fact, such a method can also be used to validate the topology of a structure, by cutting out part of the chain and then remodeling it. For example, with this technique the viral protein (PDB code 3j70) was shown to contain artificially linked loops, as remodeling disentangles them [D7]. This technique was implemented in the GapRepairer server (Sec. 5.1.4).

The possibility of topology validation can also enhance whole-structure prediction. In particular, the results obtained in the CASP competition are much better (as measured by the GDT score) when restricted to the structures with correct topology (Fig. 3.6). This indicates that topology recognition, taken as an initial step or as a structure validation, facilitates the whole protein modeling process [D23].

Figure 3.6: The identification of the knotted targets in the CASP 11 competition (left) and the performance of the models when a model with (in)correct topology was used in homology modelling (right). In particular, modelling the knotted targets based on unknotted models results in a very low GDT_TS score. Figure from [D23].

4 FOLDING AND UNFOLDING OF COMPLEX TOPOLOGY PROTEINS

If you are receptive and humble, mathematics will lead you by the hand. — Paul Adrien Maurice Dirac

Proteins are remarkable biopolymers which, out of myriads of possible conformations, fold (usually) towards one structure, determined by the amino acid sequence. This exceptional behavior is explained theoretically by the energy landscape theory (Sec. 4.1), in which proteins establish the set of native contacts during folding. The details of the folding pathway may be recreated in silico in coarse-grained models (Sec. 4.2). However, even the combination of theoretical and experimental approaches has not solved the conundrum of complex topology protein folding in full generality (Sec. 4.3). Here, a novel self-tying mechanism involving the ribosome as the crucial factor is proposed (Sec. 4.5). As this is the only known mechanism compliant with all the experimental data, it is a strong candidate for the solution of the conundrum of deeply knotted protein folding. In contrast, the self-tying of shallowly knotted proteins is shown to be spontaneous, independently of the topology. This creates the possibility to study the influence of various factors on the folding mechanism. In particular, confinement is shown to facilitate the folding, and the novel concept of the minimal map is introduced, in which the removal of some contacts smooths the free energy funnel (Sec. 4.6). Moreover, the conventional measures of the folding progress (reaction coordinates) are shown to be insufficient in the case of knotted protein folding. Finally, the folding of proteins with knotted loops and links is discussed (Sec. 4.7).

4.1 energy landscape theory

From the viewpoint of physics, protein folding is a process governed by the interaction of a net of atoms, leading to the structure with minimal free energy. However, this approach does not explain the protein folding time (from milliseconds to minutes), which is too short for an exhaustive random search of the optimal conformation, but also too long for a simple “downhill run”. (The problem of the exhaustive random search of the protein native conformation is called Levinthal’s paradox [243].) The free energy landscape theory conveniently explains the protein folding process [244–248].

In this theory, protein folding is driven by the cooperative formation of native contacts, which are mutually supportive: the formation of one native contact helps in the formation of another. As a result, the protein can fold towards the free energy minimum (native state), starting from any of the numerous unfolded states, by increasing the number of native contacts. This folding robustness, obtained by evolutionary optimization of the sequences to achieve the maximally consistent interplay of the contact interactions (minimal frustration), is called the minimal frustration principle [244]. Intuitively, the free energy landscape attains the shape of a funnel, with a single low-energy minimum and a broad spectrum of different unfolded, high-energy states [245].

Within this description of protein thermodynamics, protein folding is diffusion down the funnel. The folding kinetics depends on the ruggedness of the funnel, i.e. on the number and depth of local minima (folding traps). Moreover, some proteins may have many deep free energy minima (e.g. two-state proteins), or the minima may be very shallow (for intrinsically disordered proteins). Finally, proteins may undergo aggregation, which for some proteins is in fact the global minimum of the free energy landscape of the protein ensemble.

4.1.1 Reaction coordinates

The energy landscape theory allows folding to be considered as diffusion on the free energy landscape towards the free energy minimum [249]. In this picture the folding traps (non-native competing states) are local free-energy minima, shallow compared to the depth of the whole funnel. The diffusion towards the native state can be described by an order parameter, the reaction coordinate. The reaction coordinate intuitively measures the similarity to the native state. The natural reaction coordinate stemming from the energy landscape theory is the fraction of native contacts established (conventionally denoted Q); however, different reaction coordinates, like the Root Mean Square Deviation (RMSD) or the radius of gyration (Rg), are also conventionally used.

The reaction coordinate should clearly distinguish between dynamically meaningful states, e.g. the folded and unfolded states. Various tests and methods to optimize the reaction coordinates were introduced [250–252]. In particular, the fraction of native contacts Q was shown to be a good reaction coordinate for unknotted proteins [253].
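Two of the reaction coordinates mentioned above, Q and Rg, can be sketched for a single conformation as follows. The sketch assumes that a list of native contacts with their native distances is already available (Sec. 4.1.2) and that a contact counts as formed when the current distance does not exceed the native one by more than a tolerance factor. The factor of 1.2 and all function names are illustrative assumptions rather than definitions taken from this work.

```python
import math

def fraction_native_contacts(coords, native_contacts, tolerance=1.2):
    """Fraction Q of native contacts formed in a conformation.

    coords: list of (x, y, z) positions, one per residue.
    native_contacts: dict mapping a residue pair (i, j) to its native
    distance. The tolerance factor is an assumed, adjustable parameter.
    """
    formed = sum(
        1
        for (i, j), native_distance in native_contacts.items()
        if math.dist(coords[i], coords[j]) <= tolerance * native_distance
    )
    return formed / len(native_contacts)

def radius_of_gyration(coords):
    """Radius of gyration Rg of equally weighted beads."""
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    return math.sqrt(sum(math.dist(c, center) ** 2 for c in coords) / n)
```

Evaluating both functions along a trajectory gives Q and Rg as functions of time, which can then be compared as candidate reaction coordinates.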

4.1.2 Contact maps

In practice, there are various ways of defining contacts between residues, with different credibility [254]. In the simplest approach, two residues are considered to be in contact in the native structure if the distance between their Cα atoms is smaller than a given cutoff $R_{C\alpha}$ (usually 7 Å ≤ $R_{C\alpha}$ ≤ 7.5 Å) [255, 256]. Such an approach misses, however, the couplings at longer distances. A more detailed approach relies on checking whether the distance between any pair of heavy (non-hydrogen) atoms is smaller than another cutoff distance $R_H$ (usually 4 Å ≤ $R_H$ ≤ 6.5 Å).

The method of a rigid cutoff may include too many false-positive contacts. More sophisticated methods are based on the overlap of the van der Waals spheres representing the heavy atoms [257, 258]; the radii of the van der Waals spheres for each atom in the protein are usually increased by either a multiplicative (∼1.2) or an additive (∼1.4 Å) factor. In particular, such an approach was used in the automated CSU method and its generalizations [259, 260]. Finally, one can take into account the geometrical occlusion factor, like the shadowing of one atom by another [257]. Such an approach was implemented in the SMOG server [261, 262]. The set of contacts is conventionally shown as a triangular matrix (Fig. 4.1).

Figure 4.1: Comparison of the geometrical occlusion map (squares) with the van der Waals-based map (black dots). Figure adapted from [D18].
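The simplest of the definitions above, the rigid Cα cutoff, can be sketched as follows. The default cutoff of 7.5 Å lies in the range quoted in the text. The minimal sequence separation, which excludes trivial contacts between neighbors along the chain, is an assumption introduced for illustration, as are the function and parameter names.

```python
import math

def ca_contact_map(ca_coords, cutoff=7.5, min_seq_sep=3):
    """Native contacts from C-alpha positions using a rigid cutoff.

    ca_coords: list of (x, y, z) C-alpha positions in angstroms,
    ordered along the chain.
    Returns the upper-triangular part of the contact map as a sorted
    list of residue index pairs (i, j) with i < j.
    min_seq_sep is an assumed filter removing near-neighbor pairs.
    """
    contacts = []
    n = len(ca_coords)
    for i in range(n):
        for j in range(i + min_seq_sep, n):
            if math.dist(ca_coords[i], ca_coords[j]) < cutoff:
                contacts.append((i, j))
    return contacts
```

The returned list of pairs corresponds to the triangular matrix representation, with each pair marking one filled entry.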

4.1.3 Obtaining free energy landscape and derived quantities

The protein free energy landscape can be built from a sufficiently large ensemble of protein conformations (e.g. simulated folding trajectories) obeying the Boltzmann statistics. In such a case, the occurrence of different states depends on their free energy: the lower the free energy, the more often the state is observed [257]. Here, a state may be defined as a set of structures with the same value of the reaction coordinate. There are many automated methods of obtaining a free energy landscape from a set of observations, including the Weighted Histogram Analysis Method (WHAM), which collects the histograms of occurrences of different states based on short observations (trajectories) and builds the whole free energy landscape [263].

Apart from the free energy landscape, such an analysis allows for calculating other quantities measurable in experiment, e.g. the heat capacity Cp. In particular, for two-state proteins (with folded and unfolded states), the heat capacity Cp has a peak at the temperature at which the protein is equally likely to be folded and unfolded [264]. In such a case, the free energy profile contains two minima, corresponding to the folded and unfolded states (Fig. 4.2).


Figure 4.2: The heat capacity Cp normalized to the temperature of the peak T0 (left panel) and the free energy landscape (right panel) as a function of the fraction of native contacts (Q) at the temperature T0. Data for the smallest knotted protein (PDB code 2efv).
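The relation between the occurrence of states and their free energy can be illustrated with a minimal sketch. It assumes a single set of values of Q sampled at one temperature and applies a direct Boltzmann inversion, F(Q) = -kB T ln P(Q). This is much simpler than WHAM, which combines histograms from many trajectories. The function name, the binning, and the reduced units are assumptions made for illustration.

```python
import math
from collections import Counter

def free_energy_profile(q_samples, n_bins=10, kT=1.0):
    """Free energy profile F(Q) = -kT * ln P(Q) from sampled Q values.

    q_samples: values of the fraction of native contacts, each in [0, 1].
    Returns a list of (bin_center, F) pairs for the populated bins,
    shifted so that the lowest free energy equals zero.
    """
    # Assign each sample to a bin; Q = 1 falls into the last bin.
    counts = Counter(min(int(q * n_bins), n_bins - 1) for q in q_samples)
    total = len(q_samples)
    profile = [
        ((b + 0.5) / n_bins, -kT * math.log(count / total))
        for b, count in sorted(counts.items())
    ]
    f_min = min(f for _center, f in profile)
    return [(center, f - f_min) for center, f in profile]
```

For a two-state protein sampled near T0, the resulting profile would be expected to show two minima of similar depth separated by a barrier.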

4.2 simulating the protein behavior

In principle, the Newton equations allow predicting the movement of any atom of a protein, provided that all the charges, force constants, and initial conditions are known. However, this is a tremendous task, involving the numerical integration of thousands of equations over billions of time steps (the smaller the time step, the more precise the calculations). In practice, the folding of only the smallest proteins can be recreated in silico [265, 266]. To speed up the calculations, reduced, coarse-grained models of proteins are introduced (Sec. 4.2.1), in which pseudoatoms represent groups of atoms. Such a representation requires a suitable description of the forces acting in the reduced structure, the force fields (Sec. 4.2.2).

Figure 4.3: The essence of coarse-graining: although the image is blurry, it can still be recognized.
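The dependence of the precision on the time step can be illustrated on a toy system. The sketch below integrates the Newton equation for a one-dimensional harmonic oscillator with the velocity Verlet scheme and compares the final position with the exact solution. The choice of integrator, the toy system, and all names are assumptions made for illustration and do not describe the simulation software used in this work.

```python
import math

def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equation for one degree of freedom."""
    a = force(x) / mass
    for _ in range(n_steps):
        x += v * dt + 0.5 * a * dt * dt   # position update
        a_new = force(x) / mass           # force at the new position
        v += 0.5 * (a + a_new) * dt       # velocity update
        a = a_new
    return x, v

def harmonic_force(x, k=1.0):
    """Restoring force of a harmonic spring with constant k."""
    return -k * x

def endpoint_error(dt, total_time=10.0):
    """Deviation from the exact solution x(t) = cos(t) at the end."""
    n_steps = round(total_time / dt)
    x, _v = velocity_verlet(1.0, 0.0, harmonic_force, 1.0, dt, n_steps)
    return abs(x - math.cos(n_steps * dt))
```

Calling `endpoint_error` with a smaller `dt` gives a smaller deviation, at the price of proportionally more integration steps.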

4.2.1 Coarse-grained models

The idea of coarse-graining is to get rid of the elements which are not important for understanding the long-time processes (Fig. 4.3). In particular, the individual movement of particular atoms does not play a significant role in the movement of the whole chain. On the other hand, removing some atoms results in fewer equations to solve and, in consequence, in a speedup of the calculations. Models at various resolutions were introduced: models with only minor simplifications (Rosetta, PRIMO) [267–269], with residues represented by a few (pseudo)atoms (CABS, UNRES) [270, 271], with a single atom representing each residue (SICHO, Cα models) [272, 273], and with one pseudoatom representing a few residues (SURPASS) [274, 275]. Lowering the resolution of the model increases the speed of the calculations but also decreases the precision of the results. Therefore, the choice of the model should balance the needs and the available resources.

Figure 4.4: The coarse-grained (Cα) representation of a protein.
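The reduction to one pseudoatom per residue can be sketched as follows. The input format, a list of (residue id, atom name, coordinates) records, and the two mapping functions are hypothetical and serve only to illustrate the idea. Real coarse-grained models define their own mappings.

```python
def coarse_grain_to_ca(atoms):
    """Keep only the C-alpha atom of each residue.

    atoms: iterable of (residue_id, atom_name, (x, y, z)) records,
    ordered along the chain (an assumed, simplified input format).
    """
    return [xyz for (_residue, name, xyz) in atoms if name == "CA"]

def coarse_grain_to_centroids(atoms):
    """Replace each residue by the centroid of all of its atoms."""
    groups = {}
    for residue, _name, xyz in atoms:
        groups.setdefault(residue, []).append(xyz)
    # Dictionaries preserve insertion order, so the chain order is kept.
    return [
        tuple(sum(p[k] for p in points) / len(points) for k in range(3))
        for points in groups.values()
    ]
```

Both functions reduce the number of particles, and hence the number of equations of motion, to one per residue.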


4.2.2 Force fields

The coarse-graining of the system requires adapting the forces to a different resolution level. In the first approximation, the interactions are local, which allows all the interactions to be reduced to a set of parameters describing the characteristic forces. Such a set of parameters is called a force field.

In general, the force field may be decomposed into bonded and non-bonded parts. The bonded part usually consists of the bond-stretching, planar-angle bending, and dihedral-angle bending potentials. Often, an improper-dihedral term, keeping the planarity of the aromatic rings, is added. The particular terms may vary between force fields. One choice is presented in Fig. 4.5. The values of the parameters for the bonded terms are usually determined from structural analysis and spectroscopy.

Figure 4.5: The schematic depiction of the bonded terms used in the force fields, along with their mathematical form.

Usually, the non-bonded terms belong to one of three groups: the physical force fields, the statistical knowledge-based potentials, and the structure-based models [276]. In the physical force fields, the non-bonded term consists of dispersion and repulsion effects (usually represented by the Lennard-Jones potential) and the Coulomb electrostatic interaction. The values of the parameters are usually calculated using a quantum mechanical approach.

The statistical knowledge-based potentials rely on the assumption that the interaction energy E corresponding to a given quantity (e.g. the distance between residues) follows the Boltzmann distribution [277, 278]:

\[
-\frac{E}{k_B T} = \ln\frac{N_\mathrm{obs}}{N_\mathrm{ref}} + c \tag{4.1}
\]

with $k_B$ the Boltzmann constant, T the temperature, $N_\mathrm{obs}$ and $N_\mathrm{ref}$ the observed and reference numbers of occurrences of a given quantity, and c a constant. The method of statistical potentials was used to define different energy functions, including the CABS force field, the DOPE potential to assess the protein structure [279], or the quasi-chemical potential introduced by Sanzo Miyazawa and Robert Jernigan, reflecting the attraction and repulsion between different residues [280, 281].

Finally, in the structure-based models (also called Gō models, after Nobuhiro Gō), the crucial concept is the contact map, the set of pairs of residues close enough in the native structure (Sec. 4.1.2). In these models, only residues which are in a native contact attract each other, while the other residues are mutually inert [282, 283]. The attractive force is equal for all the contacts and is usually described by the attractive part of the Lennard-Jones potential. However, the width of the attracting well in the Lennard-Jones potential depends on the equilibrium distance $r_{ij}^0$ between residues i and j. As a result, the residues with a larger native distance are treated differently than the spatially close residues. This can be fixed by using a Gaussian-type attraction potential [284]:

\[
G(r_{ij}, r_{ij}^0) = -\exp\left(-\frac{(r_{ij} - r_{ij}^0)^2}{2\sigma^2}\right) \tag{4.2}
\]

with $r_{ij}$ and $r_{ij}^0$ being the actual and native distances between residues i and j, and σ controlling the width of the Gaussian well. Moreover, such an approach makes it possible to create multiple-basin potentials, useful for studying proteins with a few stable forms. A description of various non-bonded potentials can be found in [254].

Different potentials may also be introduced to simulate other effects, like non-native contacts (between residues not close enough in the native structure) [285], crowding [286], or imposed chirality [287]. Alternatively, the contacts may be turned on and off during the simulation [288]. Finally, the system may be enhanced with repulsive walls, mimicking the chaperonin confinement [289], [D8] or the ribosome [290–293], [D5]. Such repulsive walls can be introduced in all of the force fields.
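The difference between the two contact potentials can be made concrete with a short sketch. It assumes the 12-6 Lennard-Jones form with the minimum placed at the native distance, a common choice in structure-based models that is not specified in the text above, and the Gaussian well of Eq. 4.2. The well depth of 1 and the width of 0.5 Å are arbitrary illustrative values.

```python
import math

def lennard_jones_contact(r, r0, epsilon=1.0):
    """12-6 Lennard-Jones contact with its minimum -epsilon at r = r0.

    The 12-6 form is an assumption made for this illustration.
    """
    x = (r0 / r) ** 6
    return epsilon * (x * x - 2.0 * x)

def gaussian_contact(r, r0, sigma=0.5):
    """Gaussian attraction well of Eq. 4.2 with width sigma."""
    return -math.exp(-((r - r0) ** 2) / (2.0 * sigma ** 2))

def compare_at_offset(offset=1.0, native_distances=(4.0, 8.0)):
    """Both potentials evaluated at r0 + offset for several r0."""
    return [
        (r0, lennard_jones_contact(r0 + offset, r0), gaussian_contact(r0 + offset, r0))
        for r0 in native_distances
    ]
```

At a fixed displacement from the minimum, the Lennard-Jones value depends on the native distance, so its well is wider for longer contacts, while the Gaussian value is the same for every contact.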

4.3 folding of topologically non-trivial proteins

How proteins acquire the complex topology motif challenges all the theories of protein folding. In the case of shallowly knotted proteins (proteins with fewer than 20 residues in both knot tails are called shallow [104]), the in silico studies in coarse-grained models of the small trefoil-knotted proteins MJ0366, VirC2, and DndE show that self-knotting is a late folding event, occurring via spontaneous threading of the tail through a twisted loop [289, 294, 295]. Several variants of this mechanism were identified, including direct threading, threading via a slipknot, and the double-loop or mousetrap mechanism. Another variant of this mechanism was also observed in the case of the 6₁-knotted DehI protein (DehI stands for α-haloacid dehalogenase I), in which correct folding was observed in only 6 out of 1000 cases, probably due to the 20-residue length of the C-terminal (threading) tail [114]. The mechanism was further confirmed by all-atom-like calculations [296] and by detailed all-atom calculations starting from a slipknot conformation, which led to the knotted structure in the majority of cases [297]. The folding was also shown to be facilitated by additional factors, including non-native interactions [285, 298], confinement [289], or a phase border [299]. The general picture lacks, however, the dimerization step, which occurs in the partially folded stage according to experiments [300].

The folding of deeply knotted proteins still remains unclear. Two opposite mechanisms were proposed. In the first one, the self-tying occurs in the late folding stage, and the threading occurs via a slipknot conformation [301]. However, the folding of deeply knotted proteins within this mechanism is highly inefficient, with only a few structures properly folded out of hundreds [301, 302], even after the addition of some non-native contacts [285]. In the second mechanism, the knot is formed in the early folding stage, when the protein remains flexible enough to form a large, twisted loop [288]. The latter mechanism is further supported by experimental results showing that even a protein with two bulky domains, preventing threading in late stages, can effectively self-tie in an in vitro system containing only the transcription/translation machinery [303]. This solution is, however, in contradiction with some results suggesting that knotting is the rate-limiting step [215].

Apart from knotted proteins, the folding of lasso peptides was studied experimentally. For lasso proteins in general, there are two possible mechanisms: either the tail pierces the covalent loop, or the loop is closed by a covalent bond after the protein is already structured. In the case of lasso peptides, the second mechanism is realized, as the loop-closing amide bond is introduced post-translationally [128].

4.4 the aims

The contradicting results on folding of deeply knotted proteins motivate the most important aim:

1. To construct and test another folding mechanism of deeply knotted proteins, compliant with all the experimental data.

Other proteins with a shallow knot are the 5₂-knotted structures. Analyzing their folding therefore makes it possible to study the robustness of the tail-threading mechanism. On the other hand, the 5₂-knotted proteins have a complicated folding mechanism, including two parallel pathways [304]. This motivates further aims:

2. Verify the self-tying mechanism of 5₂-knotted proteins;

3. Identify its parallel folding pathways;

4. Identify the influence of confinement on the folding pathway of 5₂-knotted proteins.

Furthermore, the ability to simulate the folding of a small 3₁-knotted protein gives a unique opportunity to test other concepts in protein folding:

5. Analyze the influence of the contact map on the self-tying;

6. Analyze the influence of the knot tail length on the self-tying;

7. Validate the fraction of native contacts as a reaction coordinate for knotted protein folding.

Finally, as the folding was studied only in the case of proteins with backbone-knots and some proteins with lassos, the natural aim is to:

8. Analyze the folding of other topologically complex motifs.


4.5 folding of deeply knotted proteins

The recent experimental result showing that a protein can self-tie even with bulky, stable domains on both termini [303] forces one to rethink the possible folding mechanisms. In particular, it strongly supports the mechanism in which the unfolded protein forms a knot in the early folding stage. In such a case, there are two competing processes: the formation and threading of the twisted loop, and the formation of the secondary and tertiary structures. The former process has to be extremely fast if the whole domain is to perform the threading, or the random loop has to be extremely large. In particular, the formation of the twisted loop has to be faster than the formation of other contacts responsible for the tertiary structure. In general, no mechanism of such loop formation is known, which forces one to seek an alternative folding mechanism.

Such a mechanism can be deduced by reexamining the recent experimental results, in which the three-domain protein is obtained by transcription and translation in vitro, starting from the DNA code [303]. The central, knotted domain is a methyltransferase, which acts on tRNA molecules. Therefore, it has some affinity to the ribosome present in the system, to which it may bind cotranslationally. In fact, the binding site is located in the knotted core, which may therefore wind around the exit channel, forming a twisted loop while the protein exits the channel (Fig. 4.6). In such a case, the nascent protein being pushed out of the channel forms a slipknot, which is eventually converted into a knot. In this ribosome mechanism, no bulky residue is required to thread the loop, as only the nascent, still unfolded chain performs the threading. In fact, cotranslational knotting was already proposed [305] and promising results were obtained [290, 291]. In such a mechanism, the role of the chaperone may be to facilitate detaching of the protein from the ribosome, as the chaperones have no effect on the refolding kinetics [303].

Therefore, three proposed mechanisms of self-tying of proteins exist: the slipknot mechanism, in which knotting is a late event; the random-knot mechanism, in which knotting is an early folding event; and the ribosome mechanism, in which knotting occurs cotranslationally (Fig. 4.6). A recent survey of knotted proteins reveals the ideal structure to test all three mechanisms [306]. In particular, the Tp0624 protein from Treponema pallidum is a three-domain protein with the central domain knotted, therefore forming a naturally occurring, extremely deep knot, with over 130 residues in both knot tails. Due to the depth of the knot, the protein cannot self-tie via the slipknot mechanism. In fact, the central, knotted domain alone can fold within this mechanism only in singular cases, which rules out efficient knotting of the whole protein by the slipknot mechanism [D5]. On the other hand, the unknotted protein becomes compact much faster than any twisted loop formation can take place [D5]. To rule out the random-knot mechanism as well, loosely knotted structures were prepared by simulating unfolding. It turned out that the correctly folded structure is obtained only if the starting knot is sufficiently deep [D5]. Otherwise, the structure spontaneously disentangles. A sufficiently deep knot cannot be formed spontaneously in the unfolded structure without the chain collapsing, which also rules out the random-knot mechanism.

Figure 4.6: The schematic depiction of the ribosome mechanism of protein knotting. The protein comes out of the ribosome as an unknot and winds around the exit channel forming a slipknot. The ribosome pushes out the tail, which forms a proper knot. Finally, the protein is detached from the ribosome.

The ribosome mechanism was tested in a special system, in which the ribosome exit channel was simulated as a repelling tube [D5]. The protein exiting the tube spontaneously attains the twisted loop conformation, which may be stabilized by attraction to the ribosome wall. In particular, stabilization of only four loop residues allows the slipknot conformation to be obtained, and eventually a knotted structure in as many as 70% of cases [D5]. This indicates that folding on the ribosome may be a correct mechanism of folding of deeply knotted proteins.
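To make the idea of a "repelling tube" concrete, the following minimal sketch defines a purely repulsive wall energy for a bead inside a cylinder aligned with the z axis. It is not the potential used in [D5]: the functional form (an inverse 12th power of the distance to the wall) and all parameter values (`radius`, `sigma`, `epsilon`) are assumptions chosen only for illustration.

```python
import math


def tube_repulsion(position, radius=10.0, sigma=4.0, epsilon=1.0):
    """Repulsive energy of a bead inside a tube aligned with the z axis.

    Hypothetical form: epsilon * (sigma / gap)**12, where gap is the
    distance from the bead to the tube wall. All defaults are placeholders.
    """
    x, y, _ = position
    gap = radius - math.hypot(x, y)  # distance to the wall
    if gap <= 0:
        return math.inf  # the bead is not allowed to cross the wall
    return epsilon * (sigma / gap) ** 12


# The energy grows steeply as the bead approaches the wall
# and does not depend on the position along the tube axis.
print(tube_repulsion((0.0, 0.0, 5.0)))
print(tube_repulsion((8.0, 0.0, 5.0)))
```

An attractive term acting on selected loop residues, as described for the stabilization of the twisted loop, would be added on top of such a wall; it is left out here because the text does not specify its form.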

4.6 folding of shallowly knotted proteins

The folding of shallowly knotted proteins relies on the spontaneous tail-threading mechanism, occurring as a late folding event, as shown for the 3₁-knotted proteins. To complete the picture, the folding of the 5₂-knotted UCH-L proteins was analyzed. In addition, the possibility of efficiently recreating folding and unfolding for shallowly knotted proteins allows testing the influence of the contact map and the knot tails on folding, as well as testing whether the fraction of native contacts is a good reaction coordinate.
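The fraction of native contacts Q used throughout this section can be sketched as follows. This is a minimal illustration rather than the exact definition used in the cited works: the cutoff criterion (a contact counts as formed when the distance is at most `tolerance` times its native value) and the data layout are assumptions.

```python
import math


def fraction_native_contacts(coords, native_contacts, tolerance=1.2):
    """Fraction of native contacts formed in a single frame.

    coords: list of (x, y, z) bead positions.
    native_contacts: dict mapping (i, j) index pairs to native distances.
    tolerance: assumed multiplicative cutoff for a contact to count as formed.
    """
    formed = 0
    for (i, j), d_native in native_contacts.items():
        if math.dist(coords[i], coords[j]) <= tolerance * d_native:
            formed += 1
    return formed / len(native_contacts)


# Hypothetical four-bead frame with three native contacts.
frame = [(0, 0, 0), (10, 0, 0), (5, 0, 0), (0, 6, 0)]
contacts = {(0, 2): 5.0, (1, 3): 5.0, (0, 3): 6.0}
print(fraction_native_contacts(frame, contacts))
```

In this toy frame, two of the three contacts are within the cutoff, so Q equals 2/3.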


4.6.1 Folding of 5₂-knotted UCH-L proteins

The only examples of 5₂-knotted proteins are various homologs of the shallowly knotted UCH protein (UCH stands for Ubiquitin Carboxy-terminal Hydrolase). These proteins were analyzed experimentally, and two parallel folding pathways were identified [304]. The in silico analysis of the structure with PDB code 3irt, performed in a coarse-grained model, also shows two pathways, differing in the topology of the intermediate product. In one pathway, where the N-terminus is structured first, the intermediate product forms a 3₁ knot, and the final folding step is the threading of the C-terminus, compliant with the tail-threading mechanism known from other shallowly knotted proteins. On the other hand, if the C-terminus is structured first, the protein remains unknotted until the N-terminus is threaded, creating the 5₂ knot directly (Fig. 4.7) [D8].
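The two pathways described above differ only in whether a 3₁-knotted intermediate appears before the 5₂ knot. Assuming that the knot type has already been determined for every frame of a folding trajectory, a trajectory could be assigned to a pathway as in the sketch below. The string labels ("0_1", "3_1", "5_2") and the function name are hypothetical conventions introduced here.

```python
def classify_pathway(topologies, native="5_2", intermediate="3_1"):
    """Assign a folding trajectory to one of the two pathways.

    topologies: per-frame knot labels, in time order (hypothetical format).
    Returns "via_intermediate" if the intermediate knot is seen before the
    native knot, "direct" if the native knot appears first, and
    "not_folded" if the native knot is never reached.
    """
    seen_intermediate = False
    for knot in topologies:
        if knot == native:
            return "via_intermediate" if seen_intermediate else "direct"
        if knot == intermediate:
            seen_intermediate = True
    return "not_folded"


print(classify_pathway(["0_1", "0_1", "3_1", "3_1", "5_2"]))
print(classify_pathway(["0_1", "0_1", "5_2"]))
```

A real analysis would probably need to filter out short-lived random knots in the unfolded state before applying such a rule, since a single transient 3₁ frame would otherwise be counted as an intermediate.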

Figure 4.7: The schematic description of two topologically distinct folding pathways of 5₂-knotted proteins. Figure adapted from [D8].

In both bulk and confinement conditions, the protein usually folds along the pathway in which the 5₂ knot is obtained directly. However, in confinement, the probability of the second pathway (involving the knotted intermediate product) rises substantially. Accordingly, confinement significantly increases the probability of random knots in the unfolded state; however, the random knots in bulk, although rare, are usually much deeper [D8].

The close analysis of folding as a function of time allows presenting the folding as a chevron plot with the temperature acting as the denaturant (Fig. 4.8). Such an analysis shows that the depth of the chevron plot is smaller in confinement, indicating that folding is facilitated under confinement conditions. On the other hand, in confinement the temperature at which the structure unfolds is higher. Therefore, confinement stabilizes the structure, in agreement with other results [307]. Apart from the increased probability of knot formation due to confinement, the increased temperature may also facilitate correct folding, as it may reduce the probability of falling into thermodynamic traps during folding.


Figure 4.8: Simulated chevron plot for UCH-L1. A: Representative mean Q as a function of time (red) and smoothed curve (green). B: Chevron plot obtained for bulk. C: Chevron plot obtained for confinement. B and C come from fitting a sum of exponentials to the plot in A. D: Comparison of one slow and one fast phase for bulk (red) and confinement (green). The dashed lines present the expected chevron plot. Figure adapted from [D8].

4.6.2 Influence of the contact map

Although the general mechanism of folding of the smallest backbone-knotted protein, MJ0366 (PDB code 2efv), was established (the tail-threading mechanism), its variants depend on the model and the contact map used. In particular, the van der Waals-based and the Shadow maps differ in three groups of contacts: those formed by the loop region, the H3 helix, and the H4 helix with the rest of the protein (Fig. 4.1). The influence of these contacts may be investigated by adding them to the standard van der Waals-based map.

On the other hand, the van der Waals-based map may be enhanced by contacts found in different ways. In particular, Direct Coupling Analysis (DCA), a method of identifying contacts by analysis of the coevolution of residues [308], identifies two additional groups of contacts: between helix H2 and β-sheet B2, and between β-sheet B2 and helix H3. Therefore, in total five modified contact maps were created [D17]. The comparison of the heat capacity and the free energy landscape obtained within these models is shown in Fig. 4.9.

The largest discrepancies are obtained within the Loop-map (the van der Waals-based map with loop contacts added). Within this model, although the general folding mechanism is preserved, the free energy barrier is greatly diminished, which results in a substantial speed-up of the folding and knotting process. On the other hand, in the H4 map (the van der Waals-based map with contacts formed by the H4 helix), a different folding mechanism is observed, where the β-sheets are formed after the knot is tied [D17].

Figure 4.9: Comparison of the heat capacity profiles (A, C, E) and the corresponding free energy landscapes and knot formation probabilities (B, D, F) as a function of the fraction of native contacts Q in different contact maps for the smallest knotted protein (PDB code 2efv). (A) and (B): comparison of the van der Waals-based (Tsai) and Shadow contact maps with the van der Waals-based map with added contacts formed by helix H3, helix H4, or the loop region. (C) and (D): comparison of the van der Waals-based (Tsai) and Shadow contact maps with the van der Waals-based map with two sets of contacts predicted by DCA added. (E) and (F): comparison with the minimal contact map. In (A), (C), and (E) the temperature is normalized to the temperature of the peak of the van der Waals-based map. Figure adapted from [D17].

The folding and tying facilitation observed within the model with the loop contacts added raises the question of what the optimal map is, with which the protein folds fastest but within the same mechanism. Such a minimal map can be constructed by a scrupulous analysis of the order of events during folding and the selection of the contacts necessary for the formation of secondary and tertiary structures [D17]. In such a map, the free energy barrier was substantially diminished (Fig. 4.9F).

The success in creating the minimal map for the smallest knotted protein encourages testing this approach on a deeply knotted protein, as an alternative to the ribosome mechanism. The construction of the minimal map for the standard example of a deeply knotted methyltransferase (PDB code 1j85) yields a model within which the protein folds correctly in up to 20% of cases [D12]. This is much higher than in the case of the model without any manual inspection of the contacts. However, such an approach is still insufficient to obtain the full free energy landscape of deeply knotted proteins.

The promising results of the minimal map force one to rethink the algorithm used to generate the contact maps. In particular, it may happen that the attraction of two residues also brings some non-interacting ones close together. In such a case, these non-interacting residues would be wrongly treated as an attractive contact by the automated analysis. From such a viewpoint, the minimal maps can be regarded as maps deprived of all false-positive contacts, and therefore the closest to the proper description of the protein's internal interactions.

Elimination of the false-positive contacts can also be done by rescaling the force constants of the contacts. In particular, if two residues do not attract each other, they are rarely found spatially close. The statistical analysis of such contacts then allows constructing a statistical potential reflecting the probability that two residues are mutually attracting. Following this idea, rescaling the contact map for YibK using the Miyazawa-Jernigan potentials yields self-tying of the YibK protein more often than in the model with equal contact strengths; however, the maximum efficiency obtained was two times smaller than in the case of the minimal contact map [D12].

From the point of view of the free energy landscape theory, the construction of the minimal map may be viewed as the removal of obstructing contacts, smoothing the folding funnel. On the other hand, rescaling the contact strength may be viewed as funnel smoothing obtained by decreasing the depth of local free energy minima (kinetic traps) or the height of local free energy barriers.
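The rescaling of contact strengths by a residue-type-dependent statistical potential can be illustrated with a short sketch. It is not the procedure of [D12]: the normalization (keeping the mean contact strength equal to 1, so that the total contact energy matches the uniform map) is an assumption, and the values in `placeholder_table` are invented placeholders, not Miyazawa-Jernigan parameters.

```python
def rescale_contacts(contacts, sequence, pair_strength):
    """Assign a residue-type-dependent strength to every native contact.

    contacts: list of (i, j) index pairs (0-based).
    sequence: one-letter amino acid sequence.
    pair_strength: dict mapping sorted residue-type pairs to positive values.
    The strengths are normalized so that their mean equals 1 (an assumption).
    """
    raw = []
    for i, j in contacts:
        key = tuple(sorted((sequence[i], sequence[j])))
        raw.append(pair_strength[key])
    mean = sum(raw) / len(raw)
    return {contact: value / mean for contact, value in zip(contacts, raw)}


# Placeholder values for illustration only; NOT the Miyazawa-Jernigan table.
placeholder_table = {("A", "A"): 0.5, ("A", "L"): 1.0, ("L", "L"): 2.0}
strengths = rescale_contacts([(0, 2), (1, 3)], "ALLL", placeholder_table)
print(strengths)
```

With such a scheme, contacts between weakly attracting residue types contribute less to the energy, which is one way of reducing the weight of likely false-positive contacts without removing them by hand.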

4.6.3 Influence of the knot tails

The depth of the knot is crucial when investigating the folding of knotted proteins. Fig. 4.10 shows how the heat capacity and the free energy landscape of MJ0366 depend on the length of the threading tail [D18]. In particular, shortening the tail decreases the free energy barrier, facilitating folding. On the other hand, the extension of the non-threading terminus has almost no effect on the free energy landscape and the folding process.

The length of the tails also influences the proportion between pathways in the unfolding of 5₂-knotted UCHs. In particular, the longer the terminus, the smaller the chance that it will slip out first, leading to unfolding [D8].

4.6.4 Fraction of native contacts as a reaction coordinate for knotted proteins

Figure 4.10: The dependence of the heat capacity (A and C) and the free energy landscape (B and D) on the length of the knot termini for the smallest knotted protein (PDB code 2efv). In (A) and (B), variants differing in the length of the threading C-terminus were investigated. In (C) and (D), the variant with a remodeled non-threading N-terminus was analyzed. The numbers denote the length of the full protein chain. Figure adapted from [D18].

Whether the fraction of native contacts Q is a good reaction coordinate for knotted protein folding remained unclear. The large ensemble of folding/unfolding trajectories for the smallest (backbone) knotted protein allowed validating the fraction of native contacts by checking the probability of being on the transition path [251]. In particular, the probability of being on the transition path as a function of the fraction of native contacts should resemble a Gaussian distribution with a maximum of height 0.5. In the case of knotted proteins, in all investigated maps, the distribution can only roughly be approximated by a Gaussian, and the peak amplitude is clearly lower than 0.5 (Fig. 4.11). Moreover, the splitting probability of reaching the so-called folded/unfolded basin, which should again be Gaussian, divides into two separate distributions. Furthermore, the proposed optimization procedure [251] did not improve the results. As a consequence, the fraction of native contacts Q is not a good reaction coordinate for knotted protein folding.
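The conditional probability p(TP|q) discussed above can be estimated from a long trajectory of Q values by counting, in each bin of Q, the fraction of frames that belong to a transition path. The sketch below is one simple way of doing this and is not taken from [251]: the basin boundaries `q_unfolded` and `q_folded`, the number of bins, and the function name are assumptions. A frame is counted as being on a transition path if it lies between two consecutive basin visits that are to different basins.

```python
def transition_path_probability(q_traj, q_unfolded, q_folded, n_bins=10):
    """Histogram estimate of p(TP|q) for Q values in [0, 1].

    Frames with q <= q_unfolded are in the unfolded basin, frames with
    q >= q_folded are in the folded basin (assumed basin definitions).
    Returns a list with one entry per bin; empty bins are reported as None.
    """
    on_tp = [False] * len(q_traj)
    last_basin = None
    excursion = []  # frames visited since the trajectory left a basin
    for t, q in enumerate(q_traj):
        basin = "U" if q <= q_unfolded else "F" if q >= q_folded else None
        if basin is None:
            excursion.append(t)
            continue
        if last_basin is not None and basin != last_basin:
            for s in excursion:  # the excursion connected two different basins
                on_tp[s] = True
        last_basin = basin
        excursion = []

    total = [0] * n_bins
    tp = [0] * n_bins
    for q, flag in zip(q_traj, on_tp):
        b = min(int(q * n_bins), n_bins - 1)
        total[b] += 1
        tp[b] += flag
    return [tp[b] / total[b] if total[b] else None for b in range(n_bins)]


# Toy trajectory: one failed attempt, one successful folding event,
# and one excursion that returns to the folded basin.
toy = [0.15, 0.45, 0.15, 0.45, 0.65, 0.95, 0.65, 0.95]
print(transition_path_probability(toy, q_unfolded=0.2, q_folded=0.8))
```

For a good reaction coordinate, the resulting profile is expected to peak near 0.5 in the barrier region, which is the criterion applied in the text.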

Figure 4.11: Conditional probability p(TP|q) of being on the transition path given the value of Q = q (red squares) for (A) the Shadow map and (B) the minimal map, with fitted Gaussian functions (green curves). The insets show the folding probability as a function of Q. Figure adapted from [D18].


In the case of (backbone) knotted proteins, the transition state is expected to occur when the tail is threading the twisted loop. However, the analysis shows that such a state may be characterized by different values of the fraction Q, which captures the collective behavior of the whole chain. This indicates the plasticity of the transition state, postulated also for 5₂-knotted proteins [309]. Therefore, no standard reaction coordinate describing collectively the behavior of the whole chain can be a good reaction coordinate. In particular, neither Rg, nor RMSD, nor a two-dimensional combination of those can be a good reaction coordinate.

However, as the transition state is expected during the threading of the chain, one can construct a reaction coordinate capturing the threading and combine it with the fraction of native contacts, which captures the folding of the whole protein. For example, one can use a variation of the minimal surface analysis used in the complex lasso definition (Sec. 1.5). In particular, one can span a minimal surface on the twisted loop and monitor its threading, or the index of the threading residue. Such a method would capture the threading moment, and therefore most probably the transition state, provided the fraction of native contacts Q is in the acceptable region. However, the results may depend on which part of the chain is regarded as the native twisted loop.

4.7 folding of other topologically complex proteins

Folding of deterministic links depends on the conditions in which the process occurs. In oxidizing conditions, the disulfide bridges are formed during folding as soon as the cysteines are close enough. As a result, once a covalent loop is formed, threading is necessary to complete folding. In reducing conditions, on the other hand, the disulfide bridges may be formed after the protein is fully folded.

Figure 4.12: Possible ways of folding of TdPI. Folding can follow three differ- ent pathways, but the formation of the small covalent loop as the first event blocks folding. Moreover, in the last folding step, the protein can collapse to a topological trap (in red oval), characterized by trivial topology. Green oval denotes the native, Hopf-link structure. Figure from [D11].


Figure 4.13: The free energy landscape and the knot probability of the coagulogen factor (PDB code 1aoc) as a function of the fraction of native contacts Q. The minima of the landscape correspond to the unfolded and native states and two intermediate products (above the graph). In the first intermediate product, the core of three disulfide bridges is formed. In the second intermediate product, three additional bridges are formed. The cysteine residues are marked as colored beads, with the color code above the plot. Figure adapted from [D19].

To test the possibility of different pathways, folding simulations of the smallest protein with a deterministic link, TdPI (PDB code 2lfk; TdPI stands for tick-derived protease inhibitor), were performed in different conditions. In general, TdPI can follow three pathways differing in the first folding event [D11]. The probability of each pathway depends on the conditions. However, in both oxidizing and reducing conditions, apart from the native Hopf link structure, the protein may fold towards a structure with trivial topology, where, although the bridges are formed correctly, the loops they close do not pierce each other. In reducing conditions, the fraction of such misfolded structures is low; however, it is substantially increased in oxidizing conditions [D11]. The formation of different products with correctly formed disulfide bridges was also observed experimentally [310]; therefore, the topological analysis of the folding of TdPI allows identifying the possible misfolded structure as the trivial link.

The creation of knotted loops differs much from the self-tying of the protein backbone. In particular, the loop may be knotted if only one covalent bond (e.g. a disulfide) pierces through the twisted loop. As piercing by a bond is much easier than piercing by the whole backbone, folding of proteins with knotted loops is expected to be easier than in the case of backbone-knotted proteins. Moreover, in the case of a knotted loop, threading does not necessarily mean the creation of a knot (contrary to backbone-knotted proteins), as the knot is acquired only after all the loop-forming covalent bonds are established.

Folding of proteins with knotted loops may be analyzed on the example of the coagulogen factor from Tachypleus tridentatus (PDB code 1aoc), as this protein contains purely covalent knotted loops. The analysis with constant-temperature coarse-grained simulations shows that in this case the knot forms when the structure is almost folded, with the fraction of native contacts Q ∼ 0.8, i.e. much later than in the case of backbone-knotted proteins (Fig. 4.13). Moreover, no misfolded structures are observed during the process.

5 THE TOOLS CREATED

Give me six hours to chop down a tree and I will spend the first four sharpening the axe. — Abraham Lincoln

As a result of the project, various tools allowing automated analysis of protein topology were created. These comprise databases and servers (Sec. 5.1), plugins (Sec. 5.2), and a Python package for standalone analysis of the data (Sec. 5.3). Although these tools were introduced very recently, they have already become a valuable source of information, used by researchers from all over the world (Fig. 5.1).

Figure 5.1: The usage map of the LinkProt database (June 2019).

5.1 servers and databases

The most up-to-date list of all proteins with a given topology, available to all researchers, is accessible via internet databases. Three databases were created or updated: KnotProt, storing information about knotted loops, backbone knots, and slipknots; LassoProt, with data about complex lasso proteins; and LinkProt, with data concerning deterministic and probabilistic links. Along with the databases, associated servers are accessible, allowing users to analyze their own structures. The databases are invaluable tools, allowing the user to access proteins with the required topology or to perform statistical analysis. Apart from the databases, another server, GapRepairer, was created to utilize the information about the conservation of topology to rebuild the protein backbone.


5.1.1 KnotProt

KnotProt is a self-updating database and server devoted to the analysis of knots in proteins (the KnotProt database is available at http://knotprot.cent.uw.edu.pl). In its original form, it contained information about backbone-knotted and slipknotted chains [113]. For each protein, its topological fingerprint matrix was given, along with data concerning the knotted core, slipknot loops, and knot tails. Various search filters allowed identifying protein structures with the desired (backbone) topology.

Figure 5.2: The exemplary result of the whole chain knotoid analysis of the chain (PDB code 3bjx, chain A). The map and the sphere encode the knotoid as a function of the projection direction.

The aim of the KnotProt update (KnotProt 2.0) was to deliver a comprehensive description of all aspects related to knots for each polypeptide chain present in the RCSB database [D3]. In particular, three additional aspects of protein topology, presented as separate tabs, are analyzed:

1. The knotoid topology of the backbone;

2. The existence of knotted loops;

3. The presence of cysteine knots.

The knotoid topology of the backbone is analyzed for the whole backbone, with the map and the sphere representing the projection direction (Fig. 5.2), as well as for all the subchains. In the latter case, the knotoid fingerprint matrix is presented, along with the possibility of comparing it with the knot fingerprint. The knotoid analysis may also be performed in the server part on a structure uploaded by the user.

The knotted loops are presented as a sequence of the bridging residues (Fig. 5.3), connected either by the backbone (symbolized by ...) or by a bridge (< - >). The knotted loop may also be visualized in the structure, with the separate pieces of the backbone distinguished by different colours (Fig. 5.3).

The cysteine knots are presented with the loop-forming bridges and the piercing bridge visualized in the structure. Apart from the new topological motifs, KnotProt is also supplied with new search filters, including the possibility of showing only the sequentially non-redundant structures.

Figure 5.3: The exemplary table with knotted loops (top panel) and the visualization of the knotted loop (bottom panel).

5.1.2 LassoProt

LassoProt is a self-updating database and server devoted to the analysis of complex lasso proteins (the LassoProt database is available at http://lassoprot.cent.uw.edu.pl). For each chain, information about the lassos (trivial and non-trivial) present in the chain is given (Fig. 5.4) [D16]. The location of the bridge and the residues piercing through the triangulated surface are shown in the protein representation and in the sequence. For each non-trivial loop, its corresponding triangulated minimal surface may be visualized in the structure.

Figure 5.4: The exemplary result in the LassoProt database. Figure adapted from [D16].

Various filters allow the user to search for a structure with the desired motif and properties. In particular, the user may search for a specific motif (including the piercing chain and direction), select the closing bridge type (disulfide, amide, ester, and others), and narrow the selection to a non-redundant set.

Apart from the database, LassoProt contains a server part, in which the user may upload static structures or whole trajectories. In the case of static structures, the user may let the server identify and analyze all the loops listed in the file, select the closing bridge type, or manually enter the indices of the loop-closing residues. The latter option may be useful when analyzing artificial loop closures. In the trajectory mode, the user obtains interactive plots showing the change of the topology and the change of the indices of the piercing residues (Fig. 5.5).

Figure 5.5: The exemplary result of the trajectory analysis. The change of the topology in time (top plot) along with the corresponding schematic passage (bottom plot). Figure adapted from [D16].

5.1.3 LinkProt

LinkProt is a self-updating database and server devoted to deterministic and probabilistic links in proteins (the LinkProt database is available at http://linkprot.cent.uw.edu.pl) [D15]. For each incoming structure, the presence of a link within any combination of chains (up to four chains) is checked. For each non-trivial result, the linked chains are visualized next to a pie chart showing the probability of different link types (Fig. 5.6).

Figure 5.6: The exemplary result in the LinkProt database. Figure adapted from [D15].

Each linked component is also treated with the minimal surface analysis. As a result, the piercing residues obtained for each link type are determined. As the index of the piercing residue depends in general on the closure, this information is presented as a histogram of piercing indices (Fig. 5.7).
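The statement that links are checked within any combination of up to four chains can be illustrated by enumerating the chain subsets that would have to be examined. The sketch below only generates the combinations; the function name and the assumption that the smallest subset contains two chains are introduced here, and the actual link detection is not shown.

```python
from itertools import combinations


def chain_combinations(chain_ids, max_size=4):
    """Yield all chain subsets of size 2..max_size, smallest subsets first."""
    for size in range(2, min(max_size, len(chain_ids)) + 1):
        yield from combinations(chain_ids, size)


# A hypothetical structure with five chains gives 10 pairs,
# 10 triples, and 5 quadruples to check.
subsets = list(chain_combinations("ABCDE"))
print(len(subsets))
print(subsets[:3])
```

The number of subsets grows quickly with the number of chains, which is a plausible reason for limiting the search to four chains.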

Figure 5.7: The exemplary table with the links identified, along with the indices of the piercing residues determined by the minimal surface analysis, presented as a histogram. Figure adapted from [D15].

Apart from self-updating information about deterministic and probabilistic links, the LinkProt server contains examples of macromolecular links, including an animated presentation facilitating the understanding of the link motif.

5.1.4 GapRepairer

GapRepairer is a server devoted to modelling the missing fragments in a protein structure, with care for the topology of the obtained models (the GapRepairer server is available at http://gaprepairer.cent.uw.edu.pl) [D7]. The structures are repaired using homology modelling performed by the Modeller software [311, 312]. In the process of repairing, only templates with a topology compliant with that of the best homolog (the one with the best gap coverage) are kept.


Figure 5.8: The exemplary result of the GapRepairer server. Left panel - interactive visualization of the reconstructed structure of the protein with PDB code 4WZS (chain C): modelled chain (input data) in grey, the shades of the same colour denote different gap fillings, accompanying chains in black. Right panel - table with basic information regarding each model, including information about entanglement (second column) and the DOPE-HR score for the whole structure and for each gap separately. Figure adapted from [D7].

For a given structure, the five best models are presented, with each gap filled and assessed by the DOPE potential [279]. The user may view the structure with all or selected gap fillings (Fig. 5.8). The user is, however, not obliged to use any of the proposed models, as GapRepairer allows mixing the obtained models. For each of the obtained models, an extensive topological analysis is performed, including the calculation of the knot fingerprint matrix and spanning the triangulated minimal surface (Fig. 5.9). Moreover, GapRepairer has the unique feature that it can search for structural (not sequential) homologs using the DALI database [313].

Figure 5.9: The exemplary, simplified output of the topological analysis. Right panel - the table indicating topological details about models. Columns show the model name, the entanglement type: (slip)knot structural data (third and fourth column), lasso data (fifth and sixth column). Below, the thumbnails of the knot fingerprint matrices are shown. Left panel - enlarged knot fingerprint showing the knot core, slipknot loop and slipknot tail (as in the table). Figure adapted from [D7].


5.2 plugins

Apart from the web-based servers and databases, various tools allowing manual inspection of proteins (or arbitrary polymers) were created. In particular, two PyMOL plugins were developed: PyLasso, to analyze the complex lasso topology, and PyLink, to analyze the link topology.

5.2.1 PyLasso

PyLasso is a PyMOL plugin for analyzing the complex lasso topology in structures and trajectories provided by the user [D10]. The PyLasso plugin is available at http://pylasso.cent.uw.edu.pl. The analysis may be performed automatically, the user may select the bridge type (in the case of PDB files), or the indices of the bridge-forming residues may be entered manually. The latter option may be used when analyzing .xyz files or structures with artificial bridges. The user may also choose the bridge-forming residues by clicking them in the PyMOL viewer or in the sequence. The result of the analysis is a table with all the structural information concerning the identified closed loops (Fig. 5.10).

Figure 5.10: The exemplary result of single structure analysis performed by PyLasso.

In the basic calculation, various filters validating the lasso are used. However, the user may change the default minimal distance between the piercing and the bridge or the tail end, or deactivate the check of the distance between residues. Apart from the lasso calculation, to make the structure easier to inspect, the user may smooth it and calculate the Gaussian Linking Number matrices for each loop (see Sec. 1.2.1).

Figure 5.11: The exemplary result of trajectory analysis performed by PyLasso.


In the trajectory mode, the result of the analysis comprises a graph of the topology (Fig. 5.11) and a graph of the indices of the piercing residues as a function of time for a given loop. Such graphs may be useful when analyzing the folding trajectory of a protein with a lasso. Afterwards, each frame of the trajectory can be analyzed in the single-frame mode.

5.2.2 PyLink

PyLink is a PyMOL plugin devoted to the analysis of all three kinds of links (deterministic, probabilistic, and macromolecular) in proteins and other polymers [D2]. The PyLink plugin is available at http://pylink.cent.uw.edu.pl. In the case of deterministic links, the plugin is able to identify all closed loops and analyze their linking (for native PDB files). Alternatively, the user can define their own bridges, either by entering their indices or by clicking them in the PyMOL viewer window. The latter option may be useful when designing new links. PyLink also has the unique feature of identifying closed loops spanned on several bridges and ions (as implemented for the knotted loops in the KnotProt database). The loops are generated on demand and presented as a separate list (Fig. 5.12). The user may then select several loops and analyze their link topology.

Figure 5.12: The list of the closed loops generated by PyLink (left panel), and visualization of one of them in the PyMOL viewer window.

In the case of probabilistic links, by default every pair, triple, and quadruple of chains is analyzed, and the non-trivial linkings are gathered in a table with a corresponding pie chart (Fig. 5.13). Alternatively, the user may select the subchains to be analyzed by entering the indices of the delimiting residues.

For macromolecular links, the user has to specify the chains forming each component. The bridges between the chains forming the components may be supplied by the user, or they are identified automatically for a given set of chains. In each case, the user may also generate a smoothed structure, which facilitates the understanding of the link motif, or generate interactive Gaussian Linking Number matrices.
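The Gaussian Linking Number mentioned above can be approximated numerically for two polygonal chains. The sketch below is not the PyLink code: it is a plain midpoint discretization of the Gauss double integral written with NumPy, and the function names and the test geometry (two unit circles) are chosen only for this illustration.

```python
import numpy as np


def gaussian_linking_number(chain_a, chain_b):
    """Midpoint-rule approximation of the Gauss linking integral.

    Each chain is an (N, 3) array of consecutive points; for a closed curve
    the first point should be repeated at the end.
    """
    a = np.asarray(chain_a, dtype=float)
    b = np.asarray(chain_b, dtype=float)
    da, db = a[1:] - a[:-1], b[1:] - b[:-1]                   # segment vectors
    ma, mb = 0.5 * (a[1:] + a[:-1]), 0.5 * (b[1:] + b[:-1])   # segment midpoints
    r = ma[:, None, :] - mb[None, :, :]                       # all midpoint differences
    cross = np.cross(da[:, None, :], db[None, :, :])          # all dr_a x dr_b
    dist3 = np.linalg.norm(r, axis=-1) ** 3
    integrand = np.einsum("ijk,ijk->ij", r, cross) / dist3
    return float(integrand.sum() / (4.0 * np.pi))


def circle(center, u, v, n=200):
    """Closed unit circle spanned by orthonormal vectors u and v."""
    t = np.linspace(0.0, 2.0 * np.pi, n + 1)
    return np.asarray(center, dtype=float) + np.outer(np.cos(t), u) + np.outer(np.sin(t), v)


ring_1 = circle([0, 0, 0], [1, 0, 0], [0, 1, 0])
ring_2 = circle([1, 0, 0], [1, 0, 0], [0, 0, 1])    # passes through the centre of ring_1
far_ring = circle([5, 0, 0], [1, 0, 0], [0, 0, 1])  # well separated from ring_1

gln_linked = gaussian_linking_number(ring_1, ring_2)
gln_unlinked = gaussian_linking_number(ring_1, far_ring)
print(gln_linked, gln_unlinked)
```

For the two interlocked circles the absolute value of the result is close to 1 (the sign depends on the orientations), while for the separated pair it is close to 0. For open chains, such as protein backbones, the same sum generally gives a non-integer value.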


Figure 5.13: The exemplary result of an analysis of probabilistic links performed by PyLink.

5.3 topoly python package

Topoly is a Python package collecting all the functions developed during the identification and analysis of complex topology in proteins [D22]. The Topoly package is available at http://topoly.cent.uw.edu.pl. In particular, it can calculate various polynomials (Alexander, Conway, Jones, HOMFLY-PT, and Yamada) and decode the simplest knot, link, or graph type from a given polynomial using its own library. It accepts and prints the results in various formats (PD, Gauss, Dowker, and Ewing-Millett codes) as well as xyz coordinates. Therefore, the Topoly package may also serve as a translator between these formats.

Apart from the knot-specific functions, Topoly can also reduce a spatial curve or graph with various methods, including SONO and Knot_Pull (Sec. 1.1.4), perform the boundary link analysis for θ-curves, span the triangulated minimal surface for complex lasso analysis, close the chains with various methods, and calculate the GLN. The calculation of the GLN or the topology may also be performed for subchains, and the results may be presented automatically as an image of the matrix (for example, the knot fingerprint matrix).

Finally, apart from the library of polynomials characteristic of a given topology, the Topoly package also features a library of coordinates and PD codes for various structures, which may be instantly loaded and analyzed.

The ease of integration characteristic of the Python language makes Topoly a tool that may also be useful for users inexperienced in topology, polymer physics, or programming. Combined with the BioPython package [314, 315], Topoly becomes a powerful tool for analyzing the topology of proteins.
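Decoding a knot type from a polynomial can be pictured as a dictionary lookup. The following sketch does not use the Topoly API; the names `ALEXANDER_LIBRARY`, `normalize`, and `decode_knot` are hypothetical. The library holds the Alexander polynomials listed in Appendix A.2, plus the unknot (whose Alexander polynomial is 1), stored as coefficient tuples. Because the Alexander polynomial is defined only up to a sign and a power of t, the coefficients are normalized before the lookup.

```python
from typing import Dict, Optional, Sequence, Tuple

# Coefficients from the highest to the lowest power of t, normalized so that
# the leading coefficient is positive (values follow Appendix A.2).
ALEXANDER_LIBRARY: Dict[Tuple[int, ...], str] = {
    (1,): "0_1",
    (1, -1, 1): "3_1",
    (1, -3, 1): "4_1",
    (1, -1, 1, -1, 1): "5_1",
    (2, -3, 2): "5_2",
    (2, -5, 2): "6_1",
    (1, -3, 4, -5, 4, -3, 1): "8_5",
}


def normalize(coeffs: Sequence[int]) -> Tuple[int, ...]:
    """Strip zero padding and fix the overall sign of the coefficient list."""
    c = list(coeffs)
    while c and c[0] == 0:
        c.pop(0)
    while c and c[-1] == 0:
        c.pop()
    if c and c[0] < 0:
        c = [-x for x in c]
    return tuple(c)


def decode_knot(coeffs: Sequence[int]) -> Optional[str]:
    """Return the simplest knot with this Alexander polynomial, if it is tabulated."""
    return ALEXANDER_LIBRARY.get(normalize(coeffs))


print(decode_knot([-1, 3, -1]))    # -t + 3 - t^-1
print(decode_knot([0, 1, -1, 1]))  # t - 1 + t^-1, padded with a zero
```

The Alexander polynomial does not distinguish a knot from its mirror image (the +3_1 and -3_1 entries of Appendix A.2 coincide), which is one reason why a lookup of this kind can only return the simplest compatible type and why stronger invariants such as HOMFLY-PT are also tabulated.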

6 SUMMARY, FUTURE DIRECTIONS, AND FURTHER READING

"Deeper: not carelessly, not superficially, not just to skim or to get it over with, but reliably, honestly, thoroughly." (Oath of the 21st Warsaw Scout Team)

The state of knowledge may be compared to a disk on a plane. What we know is inside the disk; what we do not know is outside. The circumference of the disk is the current research front. (The analogy with the disk on the plane comes from Prof. Michał Heller.) The results of this project (Sec. 6.1) increase the area of the disk, but also its circumference, creating further research directions (Sec. 6.2). Answering the emerging questions may, however, be possible only with the knowledge of the other results surrounding this project (Sec. 6.3).

6.1 the results

The results of the whole project may be summarized into three points:

1. A much richer spectrum of complex topology motifs: apart from the knotted backbone, the protein structure may contain lassos, links, knotted loops, or θ-curves. The lassos and links may be functional motifs.

As a result of the project, complex topology was identified in a much broader set of protein structures. The identified motifs were either entirely new (θ-curves) or previously known only from single cases (Fig. 6.1). To describe the topology properly, tools from knot theory (knot polynomials) were used, or new mathematical tools (minimal surface analysis) and a new classification (the lasso classification) were introduced. The statistical analysis, the co-occurrence with function and stability, and the unfolding analysis showed that the function of the deterministic link motif is enhanced, topology-induced stability, and that there are strong indications of a functional advantage of the lasso topology, at least for some lasso motifs. In particular, the L-1C lasso motif is suggested to be a key functional property of various kinds of antimicrobial proteins. On the other hand, the knotted loops and the θ-curves possibly do not have a specific function and are rather an effect of the overall fold.

2. The distinction between deeply and shallowly knotted proteins is more important than the distinction based on the backbone topology. Deeply and shallowly knotted proteins differ in the function of the knot and in the folding mechanism.

A novel folding mechanism of deeply knotted proteins was introduced, in which the ribosome is the key factor leading to the backbone entanglement. This mechanism agrees with the current experimental results and explains why it is so hard to obtain spontaneous knotting of deeply knotted proteins in simulations. The function of the deep knot in proteins was shown to strip the protein chain. This stabilizes the neighborhood of some particular residues, which in turn can act as functional residues (e.g. the enzyme active sites). This behavior is very different from that of shallowly knotted proteins, which can self-tie spontaneously, independently of the topology. The function of the shallow knot is to stabilize the termini of the protein, leading to kinetic stability, in agreement with other results (Tab. 6.1). In particular, such a viewpoint shows why some experiments on shallow $3_1$ knots indicate elevated stability of the protein, while increased stability is not observed in studies of proteins with a deep $3_1$ knot.

Figure 6.1: The “map” of the world of known topological motifs before and after the project.

Feature              Deep knot                                            Shallow knot
Folding              Ribosome-based mechanism                             Spontaneous threading
Function             Creating places favorable for functional residues    Stabilization of the chain
Begin of unfolding   Loss of tertiary structure                           Knot untying
Unfolded state       Knot persists                                        Unknotted

Table 6.1: The comparison of deeply and shallowly knotted proteins.

In particular, the unfolding of deeply knotted proteins begins with the loss of tertiary structure, as in the case of any unknotted protein. The knot protects only the knotted core, buried inside the structure. However, the knotted core itself is very stable, with the knot persisting also in the denatured state. On the other hand, the unfolding of a shallowly knotted protein requires the loss of the knotted structure in an early unfolding event. This induces the kinetic stability of such structures. Again, the contradictory results of the experiments may stem from the fact that different types of knots (deep or shallow) were investigated. These results also suggest that the definition of the deep and shallow knot should be conditioned on the protein properties rather than on a fixed tail length. Knots with 20 residues in both knot tails are regarded as deep; however, the simulations showed that such structures with the $6_1$ knot can still be tied spontaneously, which is the domain of shallowly knotted structures.

On the other hand, the exact knot topology plays a minor role. This suggests that different knot types are just the result of different tail lengths, with the more complicated knots observed in longer chains. In fact, on the molecular level, there is no knot-theory tool allowing one to distinguish different kinds of knots, and the primary distinction is the overall fold, of which the knotted backbone is only one part.
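The purely geometric convention discussed above (a fixed tail length) can be written down in a few lines. This is a sketch of that convention only, under the assumption that "20 residues in both knot tails" means at least 20 residues on each side of the knot core; the function names and the example numbers are hypothetical, and, as argued above, such a threshold does not by itself capture the folding behavior.

```python
from typing import Tuple


def knot_tails(chain_length: int, core_start: int, core_end: int) -> Tuple[int, int]:
    """Lengths of the N- and C-terminal tails outside the knot core.

    Residues are numbered from 1, and the core spans core_start..core_end inclusive.
    """
    if not 1 <= core_start <= core_end <= chain_length:
        raise ValueError("knot core must lie within the chain")
    return core_start - 1, chain_length - core_end


def classify_by_tail_length(chain_length: int, core_start: int, core_end: int,
                            threshold: int = 20) -> str:
    """Label a knot 'deep' if both tails reach the threshold, else 'shallow'."""
    n_tail, c_tail = knot_tails(chain_length, core_start, core_end)
    return "deep" if min(n_tail, c_tail) >= threshold else "shallow"


# Hypothetical chains: tails of 74 and 40 residues, and tails of 9 and 20 residues.
print(classify_by_tail_length(160, 75, 120))
print(classify_by_tail_length(100, 10, 80))
```

Exposing the threshold as a parameter reflects the point made in the text: the cut-off is a convention, and a property-based definition may assign a different label to the same structure.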

3. The current set of proteins with knots, links, or lassos may be found in the designed databases. Alternatively, standalone tools for individual analysis are available.

As new structures are deposited in the RCSB database, new knotted, slipknotted, lasso, or link structures may be identified. In particular, proteins with more complicated motifs may be discovered at some point. To stay up to date with the spectrum of protein topology, the servers and self-updating databases (KnotProt, LassoProt, LinkProt) were created or updated. Apart from providing the current statistics, the databases allow the user to search for proteins with the desired topological features. On the other hand, the topology may be checked in the server part of the services. The topology may also be identified and analyzed with the PyMOL plugins (PyLasso and PyLink) or with the Python package Topoly, which can be useful in any calculation related to polymer topology.

Apart from the main results, the project also had other inspiring effects. In particular, the influence of confinement on protein folding was studied, and the fraction of native contacts Q was shown to be a poor reaction coordinate for tracing the folding of knotted proteins. The detailed analysis of these results is left for further development of the project.

6.2 future directions

The results of the project motivate further questions and reveal many knowledge gaps, which could be analyzed as follow-up projects. These can be categorized into five groups:

1. A detailed explanation of the experimentally observed results;

There are a few experiments on knotted proteins whose results have not been explained in detail by the theoretical approach. In particular, the complex folding/unfolding pathways of YibK ($3_1$ knot), YbeA ($3_1$ knot), or UCH-L ($5_2$ knot), with the identification of all intermediate products, have not been determined yet. Also, no simulation of the behavior of a protein with the $4_1$ knot has been performed. Moreover, it is unknown whether the artificially constructed knotted protein remains knotted in the denatured state, although this is of major importance when analyzing the results on the refolding kinetics [215]. Furthermore, most of the simulations were performed for the monomer, while the investigated proteins are usually present as homodimers. When analyzing the folding trajectories of knotted proteins, a new reaction coordinate, suitable for tracing this process, would also be required.

Another biological question is motivated by the recent survey of knotted proteins [306], which evidenced the existence of mitochondrial knotted proteins. These have to be translocated through the mitochondrial membrane channel and refold inside the mitochondrion. Some insights into the translocation of knotted proteins have already been obtained [316–318].

2. Analysis of the evolutionary pathway of the proteins with complex topology;

How the proteins with complex topology evolved remains an open question. As the complex topology is strictly conserved, especially in the case of backbone knots and links, no single point mutation could lead to the emergence of complex topology motifs. In the case of backbone knots, gene duplication and circular permutation were proposed as viable candidates for the origin of the topology [215, 319]. However, no candidates for the evolutionary pathway of deterministic links or lassos have been proposed yet. Moreover, it is unknown whether proteins with different lasso or knotoid motifs may evolve one into another. The evolution of knotoid motifs, if present, may be compared to the Reidemeister moves of the underlying graphs. Therefore, it is interesting to ask whether the topological distance, calculated as the minimal number of Reidemeister moves needed, is somehow related to the evolutionary distance.

3. Further classification schemes;

There exist other topological invariants that could be applied to proteins. For example, the Gaussian Linking Number can be generalized to a higher number of linked rings. Such high-order link invariants are called Milnor numbers. They can be calculated either from the integral representation [320, 321] or from field-theoretical approaches [322]. Investigating the topological field theory (Chern-Simons field theory) also leads to many other knot invariants, which could likewise be used to study the protein topology.

The field-theoretic approach may also be used to describe the average shape of polymers. It would be most interesting to include the piercing effect in such calculations, i.e. to calculate the asphericity and prolateness of lasso polymers by analyzing an appropriate field theory.
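For a single conformation, asphericity and prolateness are usually obtained numerically from the eigenvalues of the gyration tensor. The sketch below uses one common normalization, in which a rod gives asphericity 1 and prolateness 1, a flat ring gives asphericity 1/4 and prolateness -1, and a spherically symmetric shape gives asphericity 0. Whether this exact normalization matches the one intended in the field-theoretic calculations is an assumption, and the function name is hypothetical.

```python
import numpy as np


def shape_descriptors(coords):
    """Return (asphericity, prolateness) of a set of 3D points."""
    points = np.asarray(coords, dtype=float)
    centered = points - points.mean(axis=0)
    gyration = centered.T @ centered / len(points)   # 3x3 gyration tensor
    l1, l2, l3 = np.linalg.eigvalsh(gyration)        # eigenvalues, ascending
    trace = l1 + l2 + l3
    if trace <= 0.0:
        raise ValueError("all points coincide; the shape is undefined")
    asphericity = ((l1 - l2) ** 2 + (l2 - l3) ** 2 + (l1 - l3) ** 2) / (2.0 * trace ** 2)
    # Non-negative in exact arithmetic; clipped to guard against round-off.
    spread = max(l1 * l1 + l2 * l2 + l3 * l3 - l1 * l2 - l1 * l3 - l2 * l3, 0.0)
    if spread < 1e-24 * trace ** 2:
        prolateness = 0.0                            # isotropic case, set to 0 by convention
    else:
        numerator = (2 * l1 - l2 - l3) * (2 * l2 - l1 - l3) * (2 * l3 - l1 - l2)
        prolateness = numerator / (2.0 * spread ** 1.5)
    return float(asphericity), float(prolateness)


rod = [(float(i), 0.0, 0.0) for i in range(10)]
angles = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
ring = np.column_stack([np.cos(angles), np.sin(angles), np.zeros_like(angles)])

print(shape_descriptors(rod))
print(shape_descriptors(ring))
```

Averaging such values over an ensemble of simulated lasso conformations would give a numerical counterpart against which a field-theoretic prediction could be compared.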

4. Investigation of the folding and function of other motifs;

Although the theory of deeply knotted protein folding passed the most extreme test in the simulations, it still awaits experimental verification. Moreover, almost no folding studies of proteins with lassos have been performed yet. In such cases, it would be important to include the dynamic creation of the disulfide bond. Such a possibility was recently implemented in the UNRES force field [323, 324]. Moreover, non-native contacts were suggested to facilitate the folding of shallowly knotted proteins. The addition of non-native interactions shifts the model from a purely native-centric one towards a more physics-based one. Therefore, instead of adding new non-native contacts, it may be better to simulate the folding of shallowly knotted proteins in other force fields.

The problem of self-tying may also affect natively unknotted proteins. In fact, in folding/unfolding experiments, the measured signal usually decreases with every repetition of refolding. This effect is attributed to the increase in the concentration of misfolded structures with every repetition. Such an increase is possible only if some misfolded structures cannot be unfolded even under the strongly denaturing conditions applied. Such stability of the misfolded structures could be ensured by a non-native backbone knot randomly tied in the unfolded structure. (The concept of a non-native, random knot in the unfolded state comes from a discussion with Diego U. Ferreiro.) As evidenced by the case of the UCH-L protein, such random knots may occur in the denatured state (Sec. 4.6.1), when the protein manifests more polymer-like behavior. The existence of non-natively knotted misfolded structures after a repeated unfolding/refolding process remains to be proven.

The function of several motifs needs clarification. In particular, further studies on the role of the lasso motif (especially the antimicrobial L-1C or the signaling L+2C) are needed. Moreover, the possible influence of macromolecular links on the stability of virus capsids needs to be established. Furthermore, the creation of the macromolecular link topology needs to be incorporated into the present theories of capsid self-assembly.

5. Utilization of the complex topology in proteins;

A good understanding of the features and principles underlying proteins with complex topology makes it possible to think about utilizing this topology (Fig. 6.2). In particular, the lasso loop may be used to protect the piercing content, as a plug for specific channels, or to obtain new protein-based rotaxanes, catenanes, or molecular switches based on the position of the loop or the state of the disulfide bond [325–328].

As can be seen, the project, by providing answers to important questions, opened many new and interesting gates, broadening the field of complex topology proteins. Hopefully, these will lead to many interesting discoveries.

6.3 further reading

To keep the work concise, the literature introduction was cut to an absolute minimum. However, many excellent articles and textbooks describe the related issues. A general introduction to knot theory may be found in classical works [7, 329–331], and the recent results are covered by the series “Knots and Everything”. The history of knot theory may be found in various relevant articles [332–336]. The usage of knot theory in biology is described in [337–343]; however, knot theory can also be applied in other fields [344–352].

Figure 6.2: Schematic examples of the use of complex lasso proteins: (top row, left to right) a protective factor, a plug for a specific cell channel, a template for rotaxane synthesis, or (bottom row) a molecular switch acting by a change of conformation upon disulfide bridge breaking (left) or by a change of the position of the loop (right).

A lot of information concerning the topological aspects of matter can be found on the web. In particular, a great repository of knot-related sites may be found at http://legacy.earlham.edu/~peters/knotlink.htm. Various knot invariants can be found in the KAtlas and KnotInfo databases, or in the Mathematica KnotTheory package. The KnotPlot software may be found at . Alternatively, the knot figures can be created with Knotscape. There are also other computer tools devoted to the analysis of knots, in particular the pKnot [109, 110], KymoKnot [112], and KNOTS [111] servers, or the PyKnot PyMOL plugin [353].

Various statistical properties of random polymers are described in the classical textbooks on polymer physics [354–357], and the Flory theory is described in many relevant articles [192]. An extensive treatment of the recent developments in polymer physics, including the field-theoretic approach, can be found in [358].

Simulating the protein movement in coarse-grained models is only one of the techniques. Various approaches can be found in dedicated textbooks and review articles [359–362].

The recent results on complex topology proteins were summarized in various reviews [D9], [135, 226, 363–369]. Apart from knotted proteins, various data on the biological role, folding, and utilization of lasso peptides exist [128, 131, 370]. Some effects of the threading are described in [371].

This short list of references shows how diverse and interdisciplinary the field of complex topology proteins is. The hope of the author is that this list, along with the results of this work and the foreseen future directions, will give inspiration and a solid background to new adepts of this fascinating field, and that new exciting discoveries will follow.

A THE POLYNOMIAL INVARIANTS FOR MOTIFS PRESENT IN PROTEINS

a.1 link invariants

Name      Notation   LinkProt name   HOMFLY-PT polynomial
+Hopf     $2^2_1$    Hopf.1          $P(l, m) = m^{-1}(l^{-3} + l^{-1}) - m l^{-1}$
-Hopf     $2^2_1$    Hopf.2          $P(l, m) = m^{-1}(l^{3} + l) - m l$
Solomon   $4^2_1$    Solomon.1       $P(l, m) = -m^{-1}(l^{5} + l^{3}) + m(l^{3} - l)$
Solomon   $4^2_1$    Solomon.2       $P(l, m) = -m^{-1}(l^{-5} + l^{-3}) + m(l^{-3} - l^{-1})$
Solomon   $4^2_1$    Solomon.3       $P(l, m) = -m^{-1}(l^{5} + l^{3}) + m(l^{5} + 3l^{3}) - m^{3} l^{3}$
Solomon   $4^2_1$    Solomon.4       $P(l, m) = -m^{-1}(l^{-5} + l^{-3}) + m(l^{-5} + 3l^{-3}) - m^{3} l^{-3}$


a.2 knot invariants

Name                    Polynomial   Value
+$3_1$ (Trefoil)        Alexander    $t + t^{-1} - 1$
                        Conway       $z^{2} + 1$
                        Jones        $-q^{4} + q^{3} + q$
                        HOMFLY-PT    $-(2l^{-2} + l^{-4}) + m^{2} l^{-2}$
-$3_1$ (Trefoil)        Alexander    $t + t^{-1} - 1$
                        Conway       $z^{2} + 1$
                        Jones        $-q^{-4} + q^{-3} + q^{-1}$
                        HOMFLY-PT    $-(2l^{2} + l^{4}) + m^{2} l^{2}$
$4_1$ (Figure-eight)    Alexander    $-t - t^{-1} + 3$
                        Conway       $1 - z^{2}$
                        Jones        $q^{2} + q^{-2} - q - q^{-1} + 1$
                        HOMFLY-PT    $l^{-2} + 1 + l^{2} + m^{2} l^{2}$
$5_1$ (Cinquefoil)      Alexander    $t^{2} + t^{-2} - t - t^{-1} + 1$
                        Conway       $z^{4} + 3z^{2} + 1$
                        Jones        $-q^{-7} + q^{-6} - q^{-5} + q^{-4} + q^{-2}$
                        HOMFLY-PT    $3l^{4} + 2l^{6} - m^{2}(4l^{4} + l^{6}) + m^{4} l^{4}$
-$5_2$ (Three-twist)    Alexander    $2t + 2t^{-1} - 3$
                        Conway       $2z^{2} + 1$
                        Jones        $-q^{-6} + q^{-5} - q^{-4} + 2q^{-3} - q^{-2} + q^{-1}$
                        HOMFLY-PT    $-l^{2} + l^{4} + l^{6} + m^{2}(l^{2} - l^{-4})$
+$6_1$ (Stevedore)      Alexander    $-2t + 5 - 2t^{-1}$
                        Conway       $1 - 2z^{2}$
                        Jones        $q^{2} - q + 2 - 2q^{-1} + q^{-2} - q^{-3} + q^{-4}$
                        HOMFLY-PT    $-l^{-2} + l^{-2} + l^{4} + m^{2}(1 - l^{2})$
$8_5$                   Alexander    $-t^{3} + 3t^{2} - 4t + 5 - 4t^{-1} + 3t^{-2} - t^{-3}$
                        Conway       $-z^{6} - 3z^{4} - z^{2} + 1$
                        Jones        $q^{8} - 2q^{7} + 3q^{6} - 4q^{5} + 3q^{4} - 3q^{3} + 3q^{2} - q + 1$
                        HOMFLY-PT    $-2l^{-6} - 5l^{-4} - 4l^{-2} + m^{2}(3l^{-6} + 8l^{-4} + 4l^{-2}) - m^{4}(l^{-6} + 5l^{-4} + l^{-2}) + m^{6} l^{-6}$

a.3 θ-curve and handcuff graph invariants

Name              Yamada polynomial
$θ3_1$            $-1 + x^{4} + x^{6} + x^{8} + x^{9} + x^{10} + x^{11} + x^{12}$
$θ0_1\#3_1$       $-1 + x^{3} + 2x^{4} + x^{5} + x^{6} - x^{7} - x^{8} - 2x^{9} - 2x^{10} - 2x^{11} - x^{12} - x^{13}$
$θ4_1$            $-1 - x^{3} - x^{6} - x^{8} - x^{10} - x^{11} - x^{13} + x^{15}$
$θ0_1\#4_1$       $-1 + x^{4} - x^{6} - x^{7} - 2x^{8} - x^{9} - x^{10} + x^{12} - x^{16}$
$θ5_4$            $-1 - x - x^{2} - x^{3} - 2x^{4} - x^{5} - x^{6} - x^{7} + x^{9} + x^{11} + x^{13} + x^{16} - x^{17}$
$θ0_1\#5_2$       $-1 + x^{4} + x^{5} + x^{7} + x^{8} + x^{10} - x^{11} - 2x^{13} - 2x^{14} - x^{15} - 2x^{16} - x^{17} - x^{19}$
$H2_1$            $1 + x + x^{2} + x^{3} - x^{6} - x^{7} - x^{8} - x^{9}$
$H4_1$            $1 + x + x^{2} + x^{3} + x^{4} - x^{8} - x^{10} - x^{12} - x^{13} - x^{15}$

B VIDEOS

During the work on the project, various tutorial movies and video abstracts were created, as listed below:

• Statistical properties of lasso-shape polymers and their implications

• Topological knots and links in proteins

• Protein Knotting by Active Threading of Nascent Polypeptide Chain Exiting from the Ribosome

• GapRepairer – repair protein structures and their topology - remodel the link.

• GapRepairer – repair protein structures and their topology - remodeling the knotted protein.

• PyLink - a PyMOL plugin to identify links 01 Deterministic links

• PyLink - a PyMOL plugin to identify links 02 Probabilistic links

• PyLink - a PyMOL plugin to identify links 03 Macrolinks

• PyLasso - a PyMOL plugin to identify lassos 01 Basics

• PyLasso - a PyMOL plugin to identify lassos 02 Trajectory

• PyLasso - a PyMOL plugin to identify lassos 03 NMR

• PyLasso - a PyMOL plugin to identify lassos 04 Advanced

All the videos are accessible at the YouTube channel ILBSM Cent.

C THE FIGURE MAKING PROCEDURES

The schematic figures of knots, links, and theta-curves were created in PovRay 3.6 with the cubic_spline function. The positions of the beads were obtained from the KnotPlot software. The colors of the components:

Component 1   0, 0.25, 1
Component 2   1, 0, 0
Component 3   0, 1, 0
Component 4   1, 0.5, 0
Component 5   1, 0, 1
Component 6   1, 1, 0
Component 7   0, 1, 1

The finish parameters:

ambient      0.15
diffuse      0.85
brilliance   2
phong        0.25
phong_size   7.5
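The settings above can be collected into a small Python helper that writes a POV-Ray object for one component. This is a sketch under assumptions: the use of a `sphere_sweep` object as the carrier of the `cubic_spline`, the tube radius, and the function name are not taken from the thesis; only the colors and the finish parameters are.

```python
from typing import Sequence, Tuple

# RGB colors of components 1-7 and the finish parameters, as listed above.
COMPONENT_COLORS = [
    (0, 0.25, 1), (1, 0, 0), (0, 1, 0), (1, 0.5, 0),
    (1, 0, 1), (1, 1, 0), (0, 1, 1),
]
FINISH = "finish { ambient 0.15 diffuse 0.85 brilliance 2 phong 0.25 phong_size 7.5 }"


def povray_component(beads: Sequence[Tuple[float, float, float]],
                     component: int = 1, radius: float = 0.1) -> str:
    """Return POV-Ray text drawing the beads as a cubic-spline tube.

    The sphere_sweep object and the radius are assumptions of this sketch.
    A cubic spline needs at least four points (the first and the last one
    act as control points).
    """
    if len(beads) < 4:
        raise ValueError("a cubic spline needs at least four points")
    r, g, b = COMPONENT_COLORS[component - 1]
    lines = ["sphere_sweep {", "  cubic_spline", f"  {len(beads)},"]
    lines += [f"  <{x}, {y}, {z}>, {radius}" for x, y, z in beads]
    lines.append(f"  pigment {{ color rgb <{r}, {g}, {b}> }}")
    lines.append(f"  {FINISH}")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical bead positions; in practice they would come from KnotPlot.
scene_text = povray_component([(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)], component=2)
print(scene_text)
```

A complete scene would additionally need a camera and a light source, which are not specified in this appendix.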

BIBLIOGRAPHY

1. Thomson, W. II. On vortex atoms. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 34, 15–24 (1867).
2. Thomson, W. VI. On vortex motion. Earth and Environmental Science Transactions of the Royal Society of Edinburgh 25, 217–260 (1868).
3. Thomson, W. 1. Vortex Statics. Proceedings of the Royal Society of Edinburgh 9, 59–73 (1878).
4. Tait, P. G. On knots. Trans. Roy. Soc. Edin. 28, 145–190 (1876).
5. Tait, P. G. On knots Part II. Trans. Roy. Soc. Edin. 32, 327–342 (1887).
6. Kirkman, T. P. et al. The Enumeration, Description, and Construction of Knots, with fewer than Ten Crossings. Proceedings of the Royal Society of Edinburgh 12, 646–646 (1884).
7. Adams, C. C. The knot book: an elementary introduction to the mathematical theory of knots (American Mathematical Soc., 2004).
8. Ewing, B. & Millett, K. C. in The mathematical heritage of CF Gauss 225–266 (World Scientific, 1991).
9. Calugareanu, G. L’intégrale de Gauss et l’analyse des nœuds tridimensionnels. Rev. Math. pures appl 4 (1959).
10. Darcy, I. & Sumners, D. Applications of topology to DNA. Banach Center Publications 42, 65–75 (1998).
11. Kirby, R. & Kirby, E. R. Problems in low-dimensional topology (1995).
12. Kawamura, T. The unknotting numbers of $10_{139}$ and $10_{152}$ are 4. Osaka journal of mathematics 35, 539–546 (1998).
13. Kawauchi, A. A survey of knot theory (Birkhäuser, 2012).
14. Stoimenow, A. Polynomial values, the linking form and unknotting numbers. arXiv preprint math/0405076 (2004).
15. Meeks III, W. & Pérez, J. The classical theory of minimal surfaces. Bulletin of the American Mathematical Society 48, 325–407 (2011).
16. Douglas, J. Solution of the problem of Plateau. Transactions of the American Mathematical Society 33, 263–321 (1931).
17. Chen, W., Cai, Y. & Zheng, J. Constructing triangular meshes of minimal area. Computer-Aided Design and Applications 5, 508–518 (2008).
18. Perko, K. A. On the classification of knots. Proceedings of the American Mathematical Society 45, 262–266 (1974).
19. Reidemeister, K. Elementare begründung der knotentheorie. 5, 24–32 (1927).
20. Alexander, J. W. Topological invariants of knots and links. Transactions of the American Mathematical Society 30, 275–306 (1928).
21. Freyd, P., Yetter, D., Hoste, J., Lickorish, W. R., Millett, K. & Ocneanu, A. A new polynomial invariant of knots and links. Bulletin of the American Mathematical Society 12, 239–246 (1985).
22. Lickorish, W. R. & Millett, K. C. A polynomial invariant of oriented links (1987).
23. Kauffman, L. H. State models and the Jones polynomial. Topology 26, 395–407 (1987).
24. Koniaris, K. & Muthukumar, M. Knottedness in ring polymers. Physical review letters 66, 2211 (1991).
25. Taylor, W. R. A deeply knotted protein structure and how it might fold. Nature 406, 916 (2000).


26. Jarmolinska, A. I., Gambin, A. & Sulkowska, J. I. Knot_pull - python package for biopolymer smoothing and knot detection. submitted (2019).
27. Kusner, R. B. & Sullivan, J. M. Möbius-invariant knot energies. Ideal knots 19, 315–352 (1998).
28. Moffatt, H. K. The energy spectrum of knots and links. Nature 347, 367 (1990).
29. O’Hara, J. Energy of a knot. Topology 30, 241–247 (1991).
30. O’Hara, J. Family of energy functionals of knots. Topology and its Applications 48, 147–161 (1992).
31. O’Hara, J. in Ideal knots 288–314 (World Scientific, 1998).
32. Stasiak, A., Dubochet, J., Katritch, V. & Pieranski, P. in Ideal knots 1–19 (World Scientific, 1998).
33. Katritch, V., Olson, W. K., Pieranski, P., Dubochet, J. & Stasiak, A. Properties of ideal composite knots. Nature 388, 148 (1997).
34. Pieranski, P. & Przybyl, S. Ideal trefoil knot. Physical Review E 64, 031801 (2001).
35. Pierański, P., Przybył, S. & Stasiak, A. Tight open knots. The European Physical Journal E 6, 123–128 (2001).
36. Pierański, P. In search of ideal knots. Computational Methods in Science and Technology 4, 9–23 (1998).
37. Pierański, P. Poszukiwanie węzłów idealnych. Pro Dialog 5, 111–120 (1996).
38. Scharein, R. G. KnotPlot: A Program for Viewing Mathematical Knots. Centre for Experimental and Constructive Mathematics, Simon Fraser University (2002).
39. Edwards, S. The theory of rubber elasticity. British Polymer Journal 9, 140–143 (1977).
40. Ziegler, F., Lim, N. C., Mandal, S. S., Pelz, B., Ng, W.-P., Schlierf, M., Jackson, S. E. & Rief, M. Knotting and unknotting of a protein in single molecule experiments. Proceedings of the National Academy of Sciences 113, 7533–7538 (2016).
41. He, C., Lamour, G., Xiao, A., Gsponer, J. & Li, H. Mechanically tightening a protein slipknot into a trefoil knot. Journal of the American Chemical Society 136, 11946–11955 (2014).
42. Khatib, F., Weirauch, M. T. & Rohl, C. A. Rapid knot detection and application to protein structure prediction. Bioinformatics 22, e252–e259 (2006).
43. Sułkowska, J. I., Sułkowski, P., Szymczak, P. & Cieplak, M. Tightening of knots in proteins. Physical review letters 100, 058106 (2008).
44. Sułkowska, J. I., Sułkowski, P., Szymczak, P. & Cieplak, M. Untying knots in proteins. Journal of the American Chemical Society 132, 13954–13956 (2010).
45. Dzubiella, J. Tightening and untying the knot in Human Carbonic Anhydrase III. The journal of physical chemistry letters 4, 1829–1833 (2013).
46. Dzubiella, J. Sequence-specific size, structure, and stability of tight protein knots. Biophysical journal 96, 831–839 (2009).
47. Katritch, V., Olson, W. K., Vologodskii, A., Dubochet, J. & Stasiak, A. Tightness of random knotting. Physical Review E 61, 5545 (2000).
48. Virnau, P., Mirny, L. A. & Kardar, M. Intricate knots in proteins: Function and evolution. PLoS computational biology 2, e122 (2006).
49. Tubiana, L., Orlandini, E. & Micheletti, C. Probing the entanglement and locating knots in ring polymers: a comparative study of different arc closure schemes. Progress of Theoretical Physics Supplement 191, 192–204 (2011).
50. King, N. P., Yeates, E. O. & Yeates, T. O. Identification of rare slipknots in proteins and their implications for stability and folding. Journal of molecular biology 373, 153–166 (2007).












265. Lindorff-Larsen, K., Piana, S., Dror, R. O. & Shaw, D. E. How fast-folding proteins fold. Science 334, 517–520 (2011).
266. Piana, S., Klepeis, J. L. & Shaw, D. E. Assessing the accuracy of physical models used in protein-folding simulations: quantitative evidence from long molecular dynamics simulations. Current opinion in structural biology 24, 98–105 (2014).
267. Das, R. & Baker, D. Macromolecular modeling with rosetta. Annu. Rev. Biochem. 77, 363–382 (2008).
268. Gopal, S. M., Mukherjee, S., Cheng, Y.-M. & Feig, M. PRIMO/PRIMONA: A coarse-grained model for proteins and nucleic acids that preserves near-atomistic accuracy. Proteins: Structure, Function, and Bioinformatics 78, 1266–1281 (2010).
269. Kar, P., Gopal, S. M., Cheng, Y.-M., Predeus, A. & Feig, M. PRIMO: a transferable coarse-grained force field for proteins. Journal of chemical theory and computation 9, 3769–3788 (2013).
270. Koliński, A. et al. Protein modeling and structure prediction with a reduced representation. Acta Biochimica Polonica 51 (2004).
271. Liwo, A., Baranowski, M., Czaplewski, C., Gołaś, E., He, Y., Jagieła, D., Krupa, P., Maciejczyk, M., Makowski, M., Mozolewska, M. A., et al. A unified coarse-grained model of biological macromolecules based on mean-field multipole–multipole interactions. Journal of molecular modeling 20, 2306 (2014).
272. Kolinski, A., Jaroszewski, L., Rotkiewicz, P. & Skolnick, J. An efficient Monte Carlo model of protein chains. Modeling the short-range correlations between side group centers of mass. The Journal of Physical Chemistry B 102, 4628–4637 (1998).
273. Kolinski, A. & Skolnick, J. Assembly of protein structure from sparse experimental data: an efficient Monte Carlo model. Proteins: Structure, Function, and Bioinformatics 32, 475–494 (1998).
274. Dawid, A. E., Gront, D. & Kolinski, A. SURPASS low-resolution coarse-grained protein modeling. Journal of chemical theory and computation 13, 5766–5779 (2017).
275. Dawid, A. E., Koliński, A. & Gront, D. Novel Coarse-Graining Approaches for Large Scale Protein Modeling. Biophysical Journal 114, 574a (2018).
276. Kmiecik, S., Gront, D., Kolinski, M., Wieteska, L., Dawid, A. E. & Kolinski, A. Coarse-grained protein models and their applications. Chemical reviews 116, 7898–7936 (2016).
277. Tanaka, S. & Scheraga, H. A. Medium- and long-range interaction parameters between amino acids for predicting three-dimensional structures of proteins. Macromolecules 9, 945–950 (1976).
278. Sippl, M. J. Boltzmann’s principle, knowledge-based mean fields and protein folding. An approach to the computational determination of protein structures. Journal of computer-aided molecular design 7, 473–501 (1993).
279. Shen, M.-y. & Sali, A. Statistical potential for assessment and prediction of protein structures. Protein science 15, 2507–2524 (2006).
280. Miyazawa, S. & Jernigan, R. L. Estimation of effective interresidue contact energies from protein crystal structures: quasi-chemical approximation. Macromolecules 18, 534–552 (1985).
281. Miyazawa, S. & Jernigan, R. L. Residue–residue potentials with a favorable contact pair term and an unfavorable high packing density term, for simulation and threading. Journal of molecular biology 256, 623–644 (1996).
282. Taketomi, H., Ueda, Y. & Gō, N. Studies on protein folding, unfolding and fluctuations by computer simulation I. The effect of specific amino acid sequence represented by specific inter-unit interactions. International journal of peptide and protein research 7, 445–459 (1975).


283. Gō, N. & Taketomi, H. Studies on protein folding, unfolding and fluctuations by computer simulation IV. Hydrophobic Interactions. International journal of peptide and protein research 13, 447–461 (1979).
284. Lammert, H., Schug, A. & Onuchic, J. N. Robustness and generalization of structure-based models for protein folding and function. Proteins: Structure, Function, and Bioinformatics 77, 881–891 (2009).
285. Škrbić, T., Micheletti, C. & Faccioli, P. The role of non-native interactions in the folding of knotted proteins. PLoS computational biology 8, e1002504 (2012).
286. Dhar, A., Samiotakis, A., Ebbinghaus, S., Nienhaus, L., Homouz, D., Gruebele, M. & Cheung, M. S. Structure, function, and folding of phosphoglycerate kinase are strongly perturbed by macromolecular crowding. Proceedings of the National Academy of Sciences 107, 17586–17591 (2010).
287. Kwiecińska, J. I. & Cieplak, M. Chirality and protein folding. Journal of Physics: Condensed Matter 17, S1565 (2005).
288. Wallin, S., Zeldovich, K. B. & Shakhnovich, E. I. The folding mechanics of a knotted protein. Journal of molecular biology 368, 884–893 (2007).
289. Niewieczerzal, S. & Sulkowska, J. I. Knotting and unknotting proteins in the chaperonin cage: Effects of the excluded volume. PloS one 12, e0176744 (2017).
290. Chwastyk, M. & Cieplak, M. Cotranslational folding of deeply knotted proteins. Journal of Physics: Condensed Matter 27, 354105 (2015).
291. Chwastyk, M. & Cieplak, M. Multiple folding pathways of proteins with shallow knots and co-translational folding. The Journal of chemical physics 143, 07B611_1 (2015).
292. Bui, P. T. & Hoang, T. X. Protein escape at the ribosomal exit tunnel: Effects of native interactions, tunnel length, and macromolecular crowding. The Journal of Chemical Physics 149, 045102 (2018).
293. Bui, P. T. & Hoang, T. X. Folding and escape of nascent proteins at ribosomal exit tunnel. The Journal of chemical physics 144, 095102 (2016).
294. Noel, J. K., Sułkowska, J. I. & Onuchic, J. N. Slipknotting upon native-like loop formation in a trefoil knot protein. Proceedings of the National Academy of Sciences 107, 15403–15408 (2010).
295. Sułkowska, J. I., Noel, J. K., Ramírez-Sarmiento, C. A., Rawdon, E. J., Millett, K. C. & Onuchic, J. N. Knotting pathways in proteins 2013.
296. A Beccara, S., Škrbić, T., Covino, R., Micheletti, C. & Faccioli, P. Folding pathways of a knotted protein with a realistic atomistic force field. PLoS computational biology 9, e1003002 (2013).
297. Noel, J. K., Onuchic, J. N. & Sulkowska, J. I. Knotting a protein in explicit solvent. The Journal of Physical Chemistry Letters 4, 3570–3573 (2013).
298. Covino, R., Škrbić, T., Beccara, S., Faccioli, P., Micheletti, C., et al. The role of non-native interactions in the folding of knotted proteins: insights from molecular dynamics simulations. Biomolecules 4, 1–19 (2014).
299. Zhao, Y., Chwastyk, M. & Cieplak, M. Topological transformations in proteins: effects of heating and proximity of an interface. Scientific reports 7, 39851 (2017).
300. Wang, I., Chen, S.-Y. & Hsu, S.-T. D. Unraveling the folding mechanism of the smallest knotted protein, MJ0366. The Journal of Physical Chemistry B 119, 4359–4370 (2015).
301. Sułkowska, J. I., Sułkowski, P. & Onuchic, J. Dodging the crisis of folding proteins with knots. Proceedings of the National Academy of Sciences 106, 3119–3124 (2009).
302. Li, W., Terakawa, T., Wang, W. & Takada, S. Energy landscape and multiroute folding of topologically complex proteins adenylate kinase and 2ouf-knot. Proceedings of the National Academy of Sciences 109, 17789–17794 (2012).


303. Lim, N. C. & Jackson, S. E. Mechanistic insights into the folding of knotted proteins in vitro and in vivo. Journal of molecular biology 427, 248–258 (2015).
304. Andersson, F. I., Pina, D. G., Mallam, A. L., Blaser, G. & Jackson, S. E. Untangling the folding mechanism of the 5₂-knotted protein UCH-L3. The FEBS journal 276, 2625–2635 (2009).
305. Sorokina, I. & Mushegian, A. The role of the backbone torsion in protein folding. Biology direct 11, 64 (2016).
306. Jarmolinska, A. I., Perlinska, A. P., Runkel, R., Trefz, B., Ginn, H. M., Virnau, P. & Sulkowska, J. I. Proteins’ Knotty Problems. Journal of molecular biology 431, 244–257 (2019).
307. Especial, J. N. C., Nunes, A., Rey, A. & Faisca, P. F. Hydrophobic confinement modulates thermal stability and assists knotting in the folding of tangled proteins. Physical Chemistry Chemical Physics (2019).
308. Morcos, F., Pagnani, A., Lunt, B., Bertolino, A., Marks, D. S., Sander, C., Zecchina, R., Onuchic, J. N., Hwa, T. & Weigt, M. Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proceedings of the National Academy of Sciences 108, E1293–E1301 (2011).

309. Zhang, H. & Jackson, S. E. Characterization of the folding of a 5₂-knotted protein using engineered single-tryptophan variants. Biophysical journal 111, 2587–2599 (2016).
310. Bronsoms, S., Pantoja-Uceda, D., Gabrijelcic-Geiger, D., Sanglas, L., Aviles, F. X., Santoro, J., Sommerhoff, C. P. & Arolas, J. L. Oxidative folding and structural analyses of a kunitz-related inhibitor and its disulfide intermediates: functional implications. Journal of molecular biology 414, 427–441 (2011).
311. Webb, B. & Sali, A. in Protein Structure Prediction 1–15 (Springer, 2014).
312. Eswar, N., Webb, B., Marti-Renom, M. A., Madhusudhan, M., Eramian, D., Shen, M.-y., Pieper, U. & Sali, A. Comparative protein structure modeling using Modeller. Current protocols in bioinformatics 15, 5–6 (2006).
313. Holm, L. & Rosenström, P. Dali server: conservation mapping in 3D. Nucleic acids research 38, W545–W549 (2010).
314. Cock, P. J., Antao, T., Chang, J. T., Chapman, B. A., Cox, C. J., Dalke, A., Friedberg, I., Hamelryck, T., Kauff, F., Wilczynski, B., et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).
315. Chapman, B. & Chang, J. Biopython: Python tools for computational biology. ACM Sigbio Newsletter 20, 15–19 (2000).
316. Szymczak, P. Translocation of knotted proteins through a pore. The European Physical Journal Special Topics 223, 1805–1812 (2014).
317. Szymczak, P. Periodic forces trigger knot untying during translocation of knotted proteins. Scientific reports 6, 21702 (2016).
318. Suma, A., Rosa, A. & Micheletti, C. Pore translocation of knotted polymer chains: How friction depends on knot complexity. ACS Macro Letters 4, 1420–1424 (2015).
319. Chuang, Y.-C., Hu, I.-C., Lyu, P.-C. & Hsu, S.-T. D. Untying a Protein Knot by Circular Permutation. Journal of molecular biology 431, 857–863 (2019).
320. DeTurck, D., Gluck, H., Komendarczyk, R., Melvin, P., Shonkwiler, C. & Vela-Vick, D. S. Triple linking numbers, ambiguous Hopf invariants and integral formulas for three-component links. arXiv preprint arXiv:0901.1612 (2009).
321. DeTurck, D., Gluck, H., Komendarczyk, R., Melvin, P., Shonkwiler, C. & Vela-Vick, D. S. Pontryagin invariants and integral formulas for Milnor’s triple linking number. arXiv preprint arXiv:1101.3374 (2011).
322. Ferrari, F., Piatek, M. R. & Zhao, Y. A topological field theory for Milnor’s triple linking number. Journal of Physics A: Mathematical and Theoretical 48, 275402 (2015).


323. Czaplewski, C., Ołdziej, S., Liwo, A. & Scheraga, H. A. Prediction of the structures of proteins with the UNRES force field, including dynamic formation and breaking of disulfide bonds. Protein Engineering Design and Selection 17, 29–36 (2004).
324. Chinchio, M., Czaplewski, C., Liwo, A., Ołdziej, S. & Scheraga, H. Dynamic formation and breaking of disulfide bonds in molecular dynamics simulations with the UNRES force field. Journal of chemical theory and computation 3, 1236–1248 (2007).
325. Clavel, C., Fournel-Marotte, K. & Coutrot, F. A pH-sensitive peptide-containing lasso molecular switch. Molecules 18, 11553–11575 (2013).
326. Saito, F. & Bode, J. W. Synthesis and stabilities of peptide-based [1] rotaxanes: molecular grafting onto lasso peptide scaffolds. Chemical science 8, 2878–2884 (2017).
327. Allen, C. D. & Link, A. J. Self-assembly of catenanes from lasso peptides. Journal of the American Chemical Society 138, 14214–14217 (2016).
328. Zong, C., Wu, M. J., Qin, J. Z. & Link, A. J. Lasso peptide benenodin-1 is a thermally actuated [1] rotaxane switch. Journal of the American Chemical Society 139, 10403–10409 (2017).
329. Rolfsen, D. Knots and links (American Mathematical Soc., 2003).
330. Cromwell, P. R. Knots and links (Cambridge University Press, 2004).
331. Lickorish, W. R. An introduction to knot theory (Springer Science & Business Media, 2012).
332. Van de Griend, P. & Turner, J. C. History and science of knots (World Scientific, 1996).
333. Przytycki, J. H. History of knot theory. arXiv preprint math/0703096 (2007).
334. Colberg, E. A brief history of knot theory. Página consultada a 8 (2017).
335. Przytycki, J. H. Knot theory from Vandermonde to Jones (1991).
336. Przytycki, J. H. Classical roots of knot theory. Chaos, Solitons & Fractals 9, 531–545 (1998).
337. Benham, C. J., Harvey, S., Olson, W. K., Swigon, D., et al. Mathematics of DNA structure, function and interactions (Springer, 2009).
338. O’Donnol, D., Stasiak, A. & Buck, D. Two convergent pathways of DNA knotting in replicating DNA molecules as revealed by θ-curve analysis. Nucleic acids research 46, 9181–9188 (2018).
339. De Witt, L. S. & Cozzarelli, N. R. New scientific applications of geometry and topology (American Mathematical Soc., 1992).
340. Ernst, C. & Sumners, D. A calculus for rational tangles: applications to DNA recombination. 108, 489–515 (1990).
341. Sumners, D. W. Untangling DNA. The Mathematical Intelligencer 12, 71–80 (1990).
342. Arsuaga, J., Vázquez, M., Trigueros, S., Roca, J., et al. Knotting probability of DNA molecules confined in restricted volumes: DNA knotting in phage capsids. Proceedings of the National Academy of Sciences 99, 5373–5377 (2002).
343. Krasnow, M. A., Stasiak, A., Spengler, S. J., Dean, F., Koller, T. & Cozzarelli, N. R. Determination of the absolute handedness of knots and catenanes of DNA. Nature 304, 559 (1983).
344. Flapan, E. When topology meets chemistry: a topological look at molecular chirality (Cambridge University Press, 2000).
345. Sauvage, J.-P. & Dietrich-Buchecker, C. Molecular catenanes, rotaxanes and knots: a journey through the world of molecular topology (John Wiley & Sons, 2008).
346. Kleckner, D. & Irvine, W. T. Creation and dynamics of knotted vortices. Nature physics 9, 253 (2013).
347. Barenghi, C. F. Knots and unknots in superfluid turbulence. Milan Journal of Mathematics 75, 177–196 (2007).


348. Tkalec, U., Ravnik, M., Čopar, S., Žumer, S. & Muševič, I. Reconfigurable knots and links in chiral nematic colloids. Science 333, 62–65 (2011).
349. Ranada, A. F. Knotted solutions of the Maxwell equations in vacuum. Journal of Physics A: Mathematical and General 23, L815 (1990).
350. Irvine, W. T. & Bouwmeester, D. Linked and knotted beams of light. Nature Physics 4, 716 (2008).
351. Hall, D. S., Ray, M. W., Tiurev, K., Ruokokoski, E., Gheorghe, A. H. & Möttönen, M. Tying quantum knots. Nature Physics 12, 478 (2016).
352. Ranada, A. F. & Trueba, J. L. Ball lightning an electromagnetic knot? Nature 383, 32 (1996).
353. Lua, R. C. PyKnot: a PyMOL tool for the discovery and analysis of knots in proteins. Bioinformatics 28, 2069–2071 (2012).
354. Flory, P. Principles of polymer chemistry (Cornell University Press, 1953).
355. De Gennes, P.-G. Scaling concepts in polymer physics (Cornell University Press, 1979).
356. Doi, M. & Edwards, S. F. The theory of polymer dynamics (Oxford University Press, 1988).
357. Des Cloizeaux, J. Les polymeres en solution (1990).
358. Zinn-Justin, J. Quantum field theory and critical phenomena (Clarendon Press, 1996).
359. Dokholyan, N. V. Computational modeling of biological systems: from molecules to pathways (Springer Science & Business Media, 2012).
360. Adcock, S. A. & McCammon, J. A. Molecular dynamics: survey of methods for simulating the activity of proteins. Chemical reviews 106, 1589–1615 (2006).
361. Leach, A. R. Molecular modelling: principles and applications (Pearson education, 2001).
362. Liwo, A. Computational methods to study the structure and dynamics of biomolecules and biomolecular processes (2014).
363. Xu, L. & Zhang, W.-B. Topology: a unique dimension in protein engineering. Science China Chemistry 61, 3–16 (2018).
364. Taylor, W. R., Xiao, B., Gamblin, S. J. & Lin, K. A knot or not a knot? SETting the record ’straight’ on proteins. Computational biology and chemistry 27, 11–15 (2003).
365. Jackson, S. E., Suma, A. & Micheletti, C. How to fold intricately: using theory and experiments to unravel the properties of knotted proteins. Current opinion in structural biology 42, 6–14 (2017).
366. Taylor, W. R., May, A. C., Brown, N. P. & Aszódi, A. Protein structure: geometry, topology and classification. Reports on Progress in Physics 64, 517 (2001).
367. Lim, N. C. & Jackson, S. E. Molecular knots in biology and chemistry. Journal of Physics: Condensed Matter 27, 354101 (2015).
368. Virnau, P., Mallam, A. & Jackson, S. Structures and folding pathways of topologically knotted proteins. Journal of Physics: Condensed Matter 23, 033101 (2010).
369. Sulkowska, J. I. & Sułkowski, P. in The Role of Topology in Materials 201–226 (Springer, 2018).
370. Knappe, T. A., Manzenrieder, F., Mas-Moruno, C., Linne, U., Sasse, F., Kessler, H., Xie, X. & Marahiel, M. A. Introducing lasso peptides as molecular scaffolds for drug design: engineering of an integrin antagonist. Angewandte Chemie International Edition 50, 8714–8717 (2011).
371. Michieletto, D. Topological interactions in ring polymers (Springer, 2016).
