Mammalian protein expression and characterization tools for next generation biologics

Doctoral Thesis in Biotechnology

NIKLAS THALÉN

Academic Dissertation which, with due permission of the KTH Royal Institute of Technology, is submitted for public defence for the Degree of Doctor of Philosophy on June 11th, 2021 at 10:00, F3, Lindstedsvägen 26, våningsplan 2, Sing-Sing, KTH campus, Stockholm.

Doctoral Thesis in Biotechnology KTH Royal Institute of Technology Stockholm, Sweden 2021 © Niklas Thalén

ISBN 978-91-7873-927-1 TRITA-CBH-FOU-2021:21

Printed by: Universitetsservice US-AB, Sweden 2021

To Sofia

Abstract

Protein therapeutics are increasingly important for modern medicine. Novel recombinant proteins developed today can bind their target with high specificity and with few adverse effects. This has enabled the treatment of diseases that only a few years ago were deemed incurable. Discovery of therapeutic proteins is driven by protein engineering, a field in constant expansion, and through artificial construction of recombinant proteins a large array of diseases can be defeated. The function and quality of these protein therapeutics rely on the correct folding, assembly and residue modification that occur during their production within a living host cell. Furthermore, producing them in large quantities is essential for making the best available biopharmaceuticals accessible. Mammalian cells are commonly the production host of choice for biopharmaceuticals, mainly because protein expression pathways are conserved within the mammalian class. Although an ever-growing number of biopharmaceuticals are produced in mammalian cells, there is always room for improvement. Development of novel recombinant protein therapeutics relies on accurate production of the protein, and if this is not achieved, a potential biopharmaceutical will never see the light of day. Furthermore, limited production capabilities can hamper product quality, resulting in lower efficacy and increased side effects. This thesis examines several routes to improving recombinant protein production for pharmaceutical purposes in mammalian cells. First, the basics of recombinant protein technology and mammalian cell function are outlined, followed by a summary of six scientific articles on expression and characterization tools for proteins produced in mammalian cells.

In paper I, transcriptomics is used to identify genes involved in protein expression, which enables the production of a difficult-to-express protein with up to 150-fold greater activity. Furthermore, in paper IV, transcriptomics reveals genomic differences in a novel cell line that exhibits several-fold higher protein expression capabilities. Besides omics technologies, methods for recombinant protein expression and modification are presented that generate more usable product for several different protein families. In paper V, a protocol for the generation of a pre-matured split-GFP variant is presented. Lastly, paper VI outlines a mammalian cell display method with an optimized setting that enables precise, high-throughput epitope mapping of glycosylated antigens. With this method, the epitopes of four neutralizing antibodies against SARS-CoV-2 are determined. For all of the papers in this thesis, mammalian cell production of recombinant proteins is the common denominator. Exploring the capabilities of mammalian cells for producing current and next-generation biopharmaceuticals is of the utmost importance for continuing the struggle against human disease.

Keywords: CHO, Cell line engineering, Protein engineering, GFP, Cell display

Sammanfattning

Protein drugs are steadily gaining a stronger position in modern medicine. New recombinant proteins developed today can bind their target with high specificity and with few side effects. This has enabled the treatment of diseases that only a few years ago were lethal. The development of therapeutic proteins is made possible by protein engineering, a relatively young field in constant development, in which artificial construction of recombinant proteins enables a multitude of diseases to be fought. To achieve the right function and quality, therapeutic proteins need to be correctly produced. This takes place within a living production cell, where correct folding and modification enable their construction. In addition, enormous quantities of biopharmaceuticals must be produced to secure access to the best drugs available. For this purpose, mammalian cells are mainly used as the production host, since the route of protein construction is conserved within the biological class. But even though mammalian cells are best suited for the purpose, there is ample room for improving them. The development of new recombinant therapeutic proteins depends on a functioning manufacturing process, and if this is not achieved, potential new drugs will never be realized. Even functional manufacturing processes with inherent limitations can affect the produced protein negatively, resulting in lower and unfavorable efficacy. This thesis examines several approaches to improving the production of recombinant proteins in mammalian cells. First, the fundamentals of recombinant protein technology and of mammalian cell function in producing these proteins are presented, followed by a summary of six scientific articles that address methods for the expression and characterization of proteins produced in mammalian cells.

In paper I, transcriptomics is used to identify genes involved in the production of a difficult-to-express protein, which enabled a 150-fold increase in the activity of the product. In paper IV, genomic differences linked to the production increase of a new cell line were likewise identified with the help of transcriptomics. Besides omics technologies, methods are presented for the expression and modification of recombinant proteins that generate more functional product for several protein families. Paper V presents a protocol for generating a split-GFP variant in which one part of the molecule has been allowed to form the fluorophore at an earlier stage. Finally, an optimized process is introduced in which a membrane-anchored antigen enables detailed epitope mapping via mammalian cells. With this method, the binding of four antibodies against SARS-CoV-2 is identified. For all papers presented in this thesis, protein production in mammalian cells is the common denominator. Exploring the possibilities of recombinant protein production in mammalian cells is of the utmost importance for producing functional biopharmaceuticals both today and in the future, which enables further progress in preventing the suffering caused by disease.

Popular science summary

Proteins are one of the fundamental building blocks of life. Our own existence was only made possible by their emergence some 4 billion years ago. This unique class of macromolecules exists in a huge variety within all living things. Proteins enable our eyes to process light into vision, carry oxygen to our muscles, and fight infections when we get ill. With something of this importance, it is easy to see that if errors occur, catastrophic effects might follow. Sadly, errors do occur, and they underlie various diseases, ranging from cancer to Alzheimer’s and diabetes. In fact, most known diseases are linked to protein malfunction. However, over the last three decades, scientific advancement within the field of biotechnology has enabled artificial proteins to become a novel cure for many of these diseases. These protein drugs are often called biopharmaceuticals or biologics and, unlike classical chemical drugs, they are produced in living cells. With the emergence of biologics, vast advancements have been made in modern medicine. Diseases that were fatal just a few years ago are now treatable.

The first biopharmaceutical produced was insulin, a protein that is lacking in patients who suffer from diabetes. This was a huge breakthrough for modern medicine. Since then, more and more advanced biopharmaceuticals have been developed, and the pace of innovation is unlikely to slow down. For the first production of insulin, a bacterium was used as the living cell that produced the drug. Although bacteria are excellent production cells, they were simply not up to the task when more complex biological drugs were developed. This comes from the evolutionary differences between living organisms on earth: bacteria and humans are simply not that alike. Mammalian cells, however, are. Rat, hamster, or human cells cultivated in a lab closely resemble the cells of every human on the planet. Mammalian cells have therefore gained a special position in the production of biologics, and they are routinely used for the assembly of the most complex drugs targeting cancer, rheumatoid arthritis, or multiple sclerosis.

Although mammalian cells are capable of producing these cutting-edge pharmaceuticals, they are not without flaws. Because of their limitations, side effects arise that could potentially be avoided. Their production capabilities are also rarely as high as those of other protein expression organisms, so high costs follow. It can even be the case that some biopharmaceuticals never see the light of day because the mammalian expression host simply cannot produce them. This is the central part of my thesis. For the continued advancement of medicine, the protein production host must be engineered to cope with current and future challenges. Therefore, several different papers are presented in which we try to increase the production capabilities of mammalian cells. By trying to understand what goes on within the cell, we have been able to re-evolve it into a better production organism, taking a small step towards confronting the challenges of modern medicine.

List of appended papers and manuscripts

This thesis is based on the following articles and manuscripts. Full versions of the articles are appended at the end of the thesis.

I. Thalén N, Moradi Barzadd M, Lundqvist M, Rodhe J, Andersson M, Bidkhori G, Possner D, Su C, Nilsson J, Eisenhut P, Malm M, Westin J, Forsberg J, Nordling E, Mardinoglu A, Volk AL, Sandegren A, Rockberg J, Systems biology greatly improve activity of secreted therapeutic sulfatase in CHO bioprocess. Manuscript

II. Eisenhut P, Mebrahtu A, Moradi Barzadd M, Thalén N, Klanert G, Weinguny M, Sandegren A, Su C, Hatton D, Borth N, Rockberg J (2020), Systematic use of synthetic 5'-UTR RNA structures to tune protein translation improves yield and quality of complex proteins in mammalian cell factories. Nucleic Acids Research 18;48(20):e119

III. Hendrikse NM, Sandegren A, Andersson T, Blomqvist J, Makower Å, Possner D, Su C, Thalén N, Tjernberg A, Westermark U, Rockberg J, Gelius SS, Syrén PO, Nordling E (2021), Ancestral lysosomal enzymes with increased activity harbor therapeutic potential for treatment of Hunter syndrome. iScience 102154 2021

IV. Thalén N, Lundqvist M, Holmberg Schiavone L, Volk AL, Roth R, Rockberg J, ULK1 knockout cell line downregulates autophagy, upregulates recombinant transcript and improves protein secretion. Manuscript

V. Lundqvist M*, Thalén N*, Volk AL, Hansen HG, von Otter E, Nygren PÅ, Uhlen M, Rockberg J. (2019), Chromophore pre-maturation for improved speed and sensitivity of split-GFP monitoring of protein secretion. Scientific reports;9(1):310

VI. Thalén N, Persson H, Ohlin M, Lundqvist M, Volk AL, Rockberg J, Mammalian cell surface display system for conformational epitope determination of human SARS-CoV-2 neutralizing antibodies. Manuscript

*These authors contributed equally to this work

Respondent’s contribution to appended papers

I. Outlined the experimental settings with co-authors. Prepared and handled RNA samples, cloned everything for the co-expression studies and variable promoter expressions. Performed co-expression studies, purifications and analysis of the sulfatase activity with the assistance of Mona Moradi Barzadd. Wrote the manuscript.

II. Cloned sumf1 vectors together with Peter Eisenhut and assisted in analysis of sulfatase activity with Mona Moradi Barzadd. Assisted in reviewing the manuscript.

III. Cloned the sumf1 vector and performed co-expression studies together with Natalie Hendrikse. Assisted in reviewing the manuscript.

IV. Reviewed and analyzed the transcriptomic results with Magnus Lundqvist. Wrote the manuscript.

V. Developed the maturation protocol and characterized mature GFP 1-10 together with Magnus Lundqvist and Eric von Otter. Wrote parts of the manuscript and assisted Magnus Lundqvist in reviewing the whole manuscript.

VI. Developed the cell surface display platform, cloned all constructs, generated the mutational library, and performed the transfections and the epitope mapping assessments for all antibodies tested in the manuscript. Wrote the manuscript.

Public defense of dissertation

This thesis will be defended on June 11th, 2021 at 10:00, F3, Lindstedsvägen 26, våningsplan 2, Sing-Sing, KTH campus, Stockholm, for the degree of “Teknologie doctor” (Doctor of Philosophy, PhD) in Biotechnology.

Respondent: Niklas Thalén, M.Sci. in Biotechnology Department of Protein Science, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, Stockholm, Sweden

Faculty opponent: Prof. Dr. Kerstin Otte Department of Pharmaceutical Biotechnology Hochschule Biberach, Biberach, Germany

Evaluation committee: Docent Mats AA Persson Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden

Docent Caroline Grönwall Department of Medicine, Karolinska Institutet, Stockholm, Sweden

Prof. Daniel Daley Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden

Chairman: Prof. Per-Åke Nygren Department of Protein Science, School of Engineering Sciences in Chemistry, Biotechnology and Health KTH Royal Institute of Technology, Stockholm, Sweden

Respondent’s main supervisor: Prof. Johan Rockberg Department of Protein Science, School of Engineering Sciences in Chemistry, Biotechnology and Health KTH Royal Institute of Technology, Stockholm, Sweden

Respondent’s co-supervisor: Dr. Anna-Luisa Volk Department of Protein Science, School of Engineering Sciences in Chemistry, Biotechnology and Health KTH Royal Institute of Technology, Stockholm, Sweden

Abbreviations

A            Adenine
aa-tRNA      Aminoacyl-tRNA
ACE2         Angiotensin-converting enzyme 2
ADCC         Antibody-dependent cellular cytotoxicity
AI           Artificial intelligence
ASA          Arylsulfatase A
ASR          Ancestral sequence reconstruction
ATG          Autophagy related
C            Cytosine
Cas9         CRISPR associated protein 9
CDR          Complementarity-determining region
CHO          Chinese hamster ovary
CMV          Human cytomegalovirus
CRISPR       Clustered regularly interspaced short palindromic repeats
COP          Cytoplasmic coat protein
DNA          Deoxyribonucleic acid
E. coli      Escherichia coli
ELISA        Enzyme-linked immunosorbent assay
ER           Endoplasmic reticulum
ERAD         ER-associated degradation
ERT          Enzyme replacement therapy
Fab          Antigen binding fragment
FACS         Fluorescence-activated cell sorter
Fc           Fragment crystallizable
Fv           Fragment variable
G            Guanine
GFP          Green-fluorescent protein
GPI          Glycosylphosphatidylinositol
HDX          Hydrogen/deuterium exchange
HEK          Human embryonic kidney
IDS          Iduronate-2-sulfatase
Ig           Immunoglobulin
LSD          Lysosomal storage disorders
M6PR         Mannose-6-phosphate receptor
MPSII        Mucopolysaccharidosis type II
mRNA         Messenger RNA
NGNA         N-glycolylneuraminic acid
NMD          Nonsense mediated mRNA decay
NMR          Nuclear magnetic resonance spectroscopy
PCR          Polymerase chain reaction
PDI          Protein disulfide isomerase
PGK          Mouse phosphoglycerate kinase 1 promoter
POI          Proteins of interest
PTM          Post translational modification
RBD          Receptor binding domain
RgE          Regulation element
RNA          Ribonucleic acid
rRNA         Ribosomal RNA
RSA          Relative solvent accessible surface area
SARS-CoV-2   Severe acute respiratory syndrome coronavirus 2
SRP          Signal recognition particle
Sulfamidase  N-sulphoglucosamine sulphohydrolase
T            Thymine
tRNA         Transfer RNA
U            Uracil
UBC          Human Ubiquitin C promoter
UPR          Unfolded protein response
UTR          Untranslated region
VH           Heavy chain variable domain
VL           Light chain variable domain
Vulko        ULK1 knockout ExpiHEK293 cell
wt           Wildtype

Table of contents

Abstract
Sammanfattning
Popular science summary
List of appended papers and manuscripts
Public defense of dissertation
Abbreviations
Table of contents

Chapter 1 - Proteins
    Fundamentals of life
    Protein synthesis
    Proteins as therapeutics
    Recombinant protein technology
    Proteins involved in this thesis
        Antibodies
        Sulfatase
        Green-fluorescent protein

Chapter 2 - Mammalian host cell for protein production
    The mammalian cell
    Chinese Hamster Ovary Cells
    The secretory pathway
    Control mechanisms for protein production

Chapter 3 - Improving protein production
    Engineering the recombinant for increased production
    Cell line development
    Engineering the host cell for increased production

Chapter 4 - Epitope mapping
    Techniques for epitope mapping

Present investigation
    Paper I: Systems biology greatly improve activity of secreted therapeutic sulfatase in CHO bioprocess
    Paper II: Systematic use of synthetic 5'-UTR RNA structures to tune protein translation improves yield and quality of complex proteins in mammalian cell factories
    Paper III: Ancestral lysosomal enzymes with increased activity harbor therapeutic potential for treatment of Hunter syndrome
    Paper IV: ULK1 knockout cell line downregulates autophagy, upregulates recombinant transcript and improves protein secretion
    Paper V: Chromophore pre-maturation for improved speed and sensitivity of split-GFP monitoring of protein secretion
    Paper VI: Mammalian cell surface display system for conformational epitope determination of human SARS-CoV-2 neutralizing antibodies

Conclusion remarks and future perspective

Acknowledgments

Bibliography

Chapter 1

Proteins

Fundamentals of life

Can you imagine a more wonderful and extraordinary thing than life and living entities? They are things that can grow, construct themselves, reproduce into two almost identical copies and pass on their characteristics through the ages. Although life is a universal feature, a common definition of what qualifies something as living is hard to find. Proposals range from religious beliefs and extraterrestrial theories to different biological, physical, and chemical definitions. The most groundbreaking contribution, however, came from Charles Darwin in his world-famous 1859 publication¹ “On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life”, where he introduced evolution and laid out one of the fundamental theories of modern biology. In short, and phrased in modern terms, Darwin’s theory implies that all living things possess hereditary information, which we now know is stored in nucleic acids, and that this information evolves through replication, mutation, and replication of mutations. This means that all life-forms present on earth today descend from ancient common ancestors that carried nucleic acid information, i.e., the blueprints of life. We know today that these blueprints are present in the form of both deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). Furthermore, we know that there is also a central class of macromolecules that sustain all life, namely proteins. Proteins are molecular machines that carry out most of the functions of life, enabling it to replicate, grow, and survive. This means that if genetic information is the design of life, proteins are its enabler (figure 1.1).

Figure 1.1 Central dogma of molecular biology. Often simplified as “DNA makes RNA, and RNA makes protein”, the central dogma entails that information within DNA passes via RNA into proteins and that, once it has reached proteins, the information cannot be retrieved. Proteins are needed for both RNA and protein construction: the “chicken and egg” problem of biology.

But herein lies a paradox, an evolutionary chicken and egg problem. If DNA and RNA encode proteins, and proteins are the machinery that enables the conversion from genetic code to protein, which came first? Luckily for us, there is substantial scientific consensus on the emergence of these fundamentals of life. It came with the introduction of the RNA-world hypothesis in 1986². Essentially, the hypothesis holds that RNA, which for a long time was seen only as an intermediate carrier of genetic information from DNA to proteins, is capable both of performing mechanical work, through enzymatic reactions, and of storing genetic information³. Furthermore, the protein that converts genetic information into proteins has proven to be the single most important macromolecule when looking into the past. This protein is called the ribosome, and it is present in all three branches of life (bacteria, eukaryotes, and archaea). Regardless of life form, the core components of the ribosome are the same, despite the evolutionary divergence over the past 3.5-4 billion years⁴,⁵. Some scientists even say that the birth of a functional ribosome was the birth of life itself⁶. The ribosome is, however, not only a protein; it is a ribonucleoprotein, built of both RNA and protein structures⁷. And at the common core, only RNA structures exist. So, the most solid scientific theory of the emergence of the branches of life postulates that, through chemical evolutionary processes, RNA evolved into the RNA core of the ribosome. This enabled amino acids to be linked together, i.e. the introduction of proteins to the world. Proteins, which have more functional groups than RNA, are capable of more complex functions, and this allowed biological evolution to progress⁸,⁶,⁹. This can be seen very beautifully when peeling back the layers of the ribosome: the closer to the core you go, the farther back in evolution you go. Close to the center, the proteins of the ribosome are of a very simple nature. Farther out they become more complex, so the evolution of protein complexity can be read from this living ribonucleoprotein fossil¹⁰ (figure 1.2). Interestingly, DNA, which is often presented as the core image of evolution, is believed to have been introduced into lifeforms after the evolution of ribosomes and proteins, and thus postdates our last common ancestors¹¹.

So, there you have it: proteins are the fundamentals of life and the product of the origin of biological evolution. This central building block is also at the core of human diseases, since in most, if not all, cases abnormal proteins are involved in the cause of a disease. Diseases emerge through erroneous localization, misfolding, upregulation or downregulation of proteins, and understanding these characteristics of protein behavior is essential for the progression of medical discoveries. The remaining sections of this thesis discuss protein construction and production with a focus on mammalian cell protein production. Therefore, if not stated otherwise, only the mammalian system is referred to.

Figure 1.2 Common core of the ribosome. The central parts of the ribosome from bacterial (A), archaeal (B), and eukaryotic (C) cells. Colored regions are the common core of ribosomes. Bernier. CH, Translation: The Universal Structural Core of Life, Mol Biol Evol, 2018, 8, 2065-2076, by permission of Oxford University Press

Protein synthesis

Having presented the translation of simple RNA structures into more complex protein structures as the single most crucial evolutionary process, we will now look further into how proteins are encoded and assembled within the cell.

In the human genome, approximately 20,000 genes that encode proteins exist, and these genes are present in the form of DNA¹². DNA is built up of two polynucleotide chains that form a double-stranded helix through hydrogen bonding between four different nucleobases: adenine (A), cytosine (C), thymine (T), and guanine (G). Nucleobases pair depending on their chemical structure, with adenine forming two hydrogen bonds to thymine and guanine forming three hydrogen bonds to cytosine. In RNA structures, only one polynucleotide chain is present, and thymine is replaced with uracil (U), which binds to adenine in the same fashion as thymine does in DNA. The genetic code works in stretches of three nucleotides. Each triplet of nucleotides (codon) encodes one structural unit in the protein sequence (amino acid), meaning that there are 4 x 4 x 4 = 64 possible codons in the four-letter genetic code. However, there are only 20 different types of amino acids, so several of the codons encode the same amino acid. Furthermore, three different codons are stop codons, meaning that they encode the termination of protein synthesis.

Protein biosynthesis can be divided into two parts, transcription and translation. The transcription phase is when a gene within the DNA is transcribed into messenger RNA (mRNA), an intermediate carrier of gene information from the nucleus to the cytosol. This transcription is carried out by a protein that acts as a biological catalyst (enzyme), called RNA polymerase. RNA polymerases also synthesize transfer RNA (tRNA) and ribosomal RNA (rRNA), which are essential for protein synthesis further down the assembly line. The gene encoding a protein consists of two different types of nucleotide sequences, exons and introns. During transcription, introns are removed from the pre-mRNA transcript and only exons are kept in the mature mRNA sequence. Newly transcribed mRNA then moves from the nucleus to the cytosol, where the translation phase takes place.

The two ribosomal subunits that make up the final ribosome are each built from rRNA and ribosomal proteins, and they come together on the mRNA in the cytosol. The smaller ribosomal subunit binds to the mRNA at a distinct start codon, AUG, making the mRNA accessible to charged tRNA. Within the cytosol, the tRNAs have been charged into aminoacyl-tRNAs (aa-tRNA), meaning that one of the 20 different amino acids has been added to each of them, making them ready for protein synthesis. The charged tRNA with the anticodon UAC binds to the start codon, followed by the addition of the large ribosomal subunit, which finalizes the initiation of protein synthesis. The protein synthesis complex is now complete, and elongation of the amino acid chain is catalyzed by the ribosome. To do this, the ribosome has three interior sites where tRNAs can be located: the aminoacyl (A) site, the peptidyl (P) site, and the exit (E) site. When the elongation phase starts, the first aa-tRNA sits in the center of the ribosome at the P-site, and in the A-site the next mRNA codon is accessible for an additional aa-tRNA to bind. Once the next aa-tRNA has bound, the ribosome catalyzes peptide bond formation between the amino acids carried by the first and second aa-tRNA and moves the tRNAs one step along its sites. This frees the A-site for a new aa-tRNA and moves the first tRNA to the E-site, where it can exit the ribosome and be re-charged with a new amino acid⁷. This completes one cycle of protein synthesis, which continues down the mRNA sequence until a stop codon is reached. The stop codon terminates protein synthesis, and the ribosome releases the amino acid chain, which can then fold into the final structure of the protein (figure 1.3).
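The flow from gene to polypeptide described above can be sketched in a few lines of Python. This is a minimal illustration rather than a model of the cell: it assumes the DNA coding strand is given, skips splicing and the tRNA and ribosome machinery entirely, and uses the standard genetic code. The names `transcribe`, `translate` and `toy_gene` are hypothetical and chosen only for this example.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; "*" marks the three stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA matching a DNA coding strand (thymine becomes uracil)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first AUG up to, but not including, the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


toy_gene = "GGATGGCCAAATGATT"  # hypothetical sequence, for illustration only
mrna = transcribe(toy_gene)
print(len(CODON_TABLE))   # 64 codons from a four-letter alphabet
print(mrna)               # GGAUGGCCAAAUGAUU
print(translate(mrna))    # MAK (Met-Ala-Lys), then the stop codon UGA
```

In the toy sequence, the two nucleotides ahead of AUG play the role of an untranslated region: they are part of the mRNA but never reach the peptide.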

Figure 1.3 Protein synthesis. DNA is transcribed within the nucleus by the RNA polymerase. During transcription, spliceosomes cut away introns from the pre-mRNA. This process generates the mature mRNA that is transported from the nucleus to the cytosol. Next, the small subunit of the ribosome attaches and locates the start codon of the mRNA. Then, charged aa-tRNA with the corresponding anticodon attaches and protein translation is initiated with the addition of the large ribosomal subunit. Prior to the start codon, within the mRNA, is the untranslated region (UTR).

Protein structure

As mentioned above, the protein polypeptide chain is made up of different amino acids. An amino acid is a small organic molecule comprising a central carbon atom that is linked to a hydrogen atom, a carboxyl group, an amino group and a variable side chain. The side chain comes in 20 different versions, and together they constitute the building blocks for the entire proteome. Each of the different amino acids comes with different features. They can be basic, charged, acidic or neutral. They can be hydrophobic or hydrophilic, and all these differences affect protein folding and thus the structure of the protein molecule. When charged side chains interact, ionic bonds can form. Strong hydrogen bonds can link polar amino acids together, while weak Van der Waals interactions are made between hydrophobic sites. One amino acid in particular is of importance for the formation of higher-level structures, namely cysteine, whose side chain is the only one that can form covalent bonds, in the form of disulfide bridges.

All of these different side chain interactions are what guide the protein to bend and fold into its structure. Thus, the location of each amino acid within the sequence determines how the protein will form and eventually function. Protein structures are named and classified into four different levels based on their degree of folding. The primary structure is the linear amino acid chain without any of the named side chain interactions. When interactions occur, the protein folds into a higher level and two different classes of three-dimensional structures can form: beta-sheets and alpha helices. These are called secondary structures and comprise local segments of the finished protein. Beta-sheets consist of several beta strands that are arranged adjacently and form multiple hydrogen bonds between the backbones of the strands, while the side chains point outwards from the plane on alternating sides of the sheet. The other local secondary structure, the alpha helix, is a right-handed helix, meaning that the backbone of the helix coils in a clockwise fashion away from the viewer when looking down the center of the helix. It is held together by hydrogen bonds between the amino and carbonyl groups in the backbone, and a full turn is made roughly every fourth amino acid¹³. The tertiary structure is a single protein chain that consists of one or more secondary structure motifs and, lastly, the quaternary structure is the combination of several tertiary structures (figure 1.4).

Protein folds are a product of evolution. The introduction of more efficient or complex protein structures has led to greater traits and biological benefits. Looking into the past of protein folding within the ribosome, one can see that beta sheet formation occurred before alpha helix formation. Furthermore, the evolutionarily more recent ribosomes of eukaryotes contain long extended structures believed to assist in the formation of more complex protein structures. The ribosome has evolved several ways to assist the protein in finding its correct fold, but it is not the only assistance. Chaperones are a class of proteins that also assist the protein in finding its native state. It is easy to see why the protein needs assistance in finding its lowest-energy configuration, since the number of possible folds ranges between 10⁵⁰ and 10³⁰⁰ conformations¹⁴. These large numbers are hard to fathom, but as a comparison it is believed that the known universe contains roughly 10⁸⁰ atoms¹⁵.

Understanding the structure of any given protein is highly relevant for understanding its function, and obtaining the full three-dimensional structure of a protein is not an easy task. On the gene level, scientists today have approximately 190 million different protein sequences accessible through the universal protein knowledgebase (UniProt)¹⁶, far too many for their structures to be determined with currently available techniques. One way to speed up our understanding of all these proteins would be through in silico structure determination, i.e., computerized calculation and prediction of protein structure. But given the vast number of possible outcomes, this is not an easy task either. Artificial intelligence has, however, recently changed the conditions for these calculations, and last year the “protein folding problem” moved closer to a solution. DeepMind, a subsidiary of Google, predicted protein structure with 92% accuracy with their program AlphaFold¹⁷. This could open up a whole new field in elucidating all the existing protein structures out there.
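To get a feeling for where numbers of this size come from, one can use a simple counting argument: if every residue in a chain could independently adopt a handful of conformations, the total would be that number raised to the power of the chain length. The short sketch below works with base-10 logarithms to keep the numbers manageable. The chain lengths and the conformations per residue are illustrative assumptions chosen to bracket the range quoted in the text; they are not taken from the cited sources.

```python
import math


def log10_conformations(n_residues: int, states_per_residue: int) -> float:
    """Return log10 of states_per_residue ** n_residues (the order of magnitude)."""
    return n_residues * math.log10(states_per_residue)


LOG10_ATOMS_IN_UNIVERSE = 80  # order of magnitude quoted in the text

# (chain length, conformations per residue): illustrative assumptions only
for n_residues, states in [(105, 3), (300, 10)]:
    exponent = log10_conformations(n_residues, states)
    comparison = "more" if exponent > LOG10_ATOMS_IN_UNIVERSE else "fewer"
    print(
        f"{n_residues} residues with {states} states each: "
        f"about 10^{exponent:.0f} conformations, "
        f"{comparison} than the atoms in the known universe"
    )
```

Under these assumptions, even a modest chain of about a hundred residues lands near the lower end of the quoted range, which is why an exhaustive search of all folds is out of the question for both the cell and the computer.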

6 Chapter 1 - Proteins

Figure 1.4 Protein structure. The amino acid polypeptide chain is built up through peptide bond formation between the amino group and the carboxyl group of several amino acids (A). The structural parts of the protein are then divided into the primary structure (B), where only the linear polypeptide chain is formed. Upon electrostatic interactions between side chains, secondary structures form as beta sheets (C) and alpha helices (D). Secondary structure elements can then form folded proteins, called tertiary structures (E). Some proteins are built up of several tertiary structures (F) and are classified as quaternary structure proteins.


Proteins as therapeutics

Therapeutics comes from the Greek word “therapeia”, meaning “a service”, and is the branch of medicine that deals with the treatment of diseases and the art and science of healing. Although various medical treatments have been around for as long as humans have, a starting point for modern therapeutics came during the industrialization era, with the rise of pharmaceutical chemistry and pharmacology in the late 19th century. The first therapeutics, or “drugs”, that emerged during this time were morphine and quinine, which were extracted from plants and used as pain medication and for treatment of malaria, respectively. These remedies were soon followed by aspirin, paracetamol, and several different vaccines that revolutionized the therapeutic branch of medicine. In 1921, insulin was extracted from the pancreas of animals, making it the first protein, or large molecule drug, to be extracted and purified. This discovery led to the first treatment of diabetes, with a giant impact for patients worldwide18. In 1978, insulin became the first protein to be recombinantly produced within an expression host, when the insulin gene was inserted into Escherichia coli (E. coli) and expressed by the microbe19. Following this achievement, Humulin entered the drug market in 1982 and thereby marked the real entry of protein therapeutics on the drug market. Therapeutics are classified into two groups: small molecule drugs and large molecule drugs. Small molecule drugs consist of molecules or peptides (proteins smaller than 50 amino acids) that are chemically synthesized and are generally smaller than 500 Da. Large molecule drugs, commonly called biologics, are larger than 5000 Da, are produced within a biological host, and are identical to or closely resemble endogenous proteins20.
Small molecule drugs are still the most common pharmaceuticals today. Their small size is advantageous for the administration route, since they can be taken up by the body orally. Further, manufacturing and control are easier due to the low cost of synthesizing small molecules. The properties of synthetic material are also easier to analyze, and complete homogeneity of the product can be achieved. Protein therapeutics, however, have other advantages over small molecule drugs. They are more specific for their target, rendering them more effective, with fewer side effects and fewer off-target effects. Proteins can also act as replacement treatment when a gene deficiency causes a shortage or absence of a functional protein within the body. Lastly, protein therapeutics are also financially beneficial due to shorter approval times and more far-reaching patent protection enabled through their unique form and function21,22.

Recombinant protein technology

The revolution within the pharmaceutical world brought by the recombinant expression of insulin in E. coli was made possible by the birth of protein engineering. Protein engineering emerged in the latter half of the 20th century, with the invention of several key methods in genetic engineering. First, the discovery in 1960 that bacterial enzymes (restriction enzymes) can cut DNA strands at specific locations enabled guided disruption of DNA23. Ten years later, the technique of introducing foreign DNA into a microorganism without the use of helper phages opened up new opportunities for generating genetically engineered organisms24. The first successful joining of two different DNA parts (cloning) with the use of restriction enzymes came in 1972, and just one year later came the first transgenic organism, when antibiotic resistance was introduced into E. coli through transformation with a cloned DNA plasmid25,26. Further crucial methods followed with the invention of Sanger sequencing in 1977, the development of the polymerase chain reaction (PCR) during the 1980s and, in 1978, site-directed mutagenesis27,28,29. This enabled endless engineering possibilities for recombinant proteins.

Once a recombinant gene is constructed, it can be introduced into a host organism. This is called transformation for bacteria and transfection for eukaryotic cells. The difference is that exogenous genetic material taken up by bacteria is either degraded or incorporated into the endogenous genome, whereas in eukaryotic cells most of the transfected genetic material is expressed without being incorporated into the host genome (transient transfection). In transient transfection, the genetic material is diluted and degraded over time, so the recombinant gene is only expressed within a limited time frame. This approach is usually used during the development of a new protein, since it is the easiest way to express it. Since many of the cells will not take up the exogenous material, a selection marker is added to the foreign DNA. As mentioned above, antibiotic resistance was used for the generation of the first transgenic organism in 1973, and antibiotic resistance is still the most common selection marker. It ensures that only organisms that have incorporated the recombinant genetic material survive. It can also be used to select eukaryotic cells that have incorporated the exogenous material into their genome and thereby gone from transiently expressing to stably expressing cells, meaning that descendants of these transfected cells also express the recombinant DNA. Initially, gene incorporation into the organism relied on random insertion of the recombinant gene. This changed in the 1990s, when novel genome editing techniques emerged that enabled more precise editing of endogenous genes and insertion of recombinant genes. In 2013, the clustered regularly interspaced short palindromic repeats (CRISPR) / CRISPR-associated protein 9 (Cas9) system emerged30. CRISPR/Cas9 has been described as a revolution for genetic engineering.
Unlike previous techniques, CRISPR/Cas9 can precisely manipulate virtually any genomic sequence. This can in turn reveal a gene's function and its connections to diseases, and allow for correction of disease-causing mutations. For their development of this technique, Emmanuelle Charpentier and Jennifer Doudna received the 2020 Nobel Prize in Chemistry, making them the first two women to share a science Nobel Prize without a male co-laureate. All of these techniques, in combination with many more, have led up to the protein engineering capabilities of today. Enormous numbers of gene sequences are now known and can with ease be introduced into other organisms for production of virtually any given protein. The gene encoding the protein can be mutated, altered and screened for desired phenotypic traits. Proteins can be divided into fragments or fused together to combine dual properties within one molecule. It is even possible to reconstruct ancestral sequences from current protein genes in order to find ancestral proteins with medically beneficial traits that have been lost during evolution31. Mutagenic protein engineering strategies can be divided into two different approaches. In rational design, the gene sequence alteration is determined from known information about the desired protein trait. In combinatorial protein engineering, this direct strategy is not used. Instead, large libraries of different protein variants are constructed, and the protein with the most beneficial traits is selected from the large pool of variants. Usually, a combination of both strategies is used in the development of novel protein therapeutics.
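The contrast between rational design and combinatorial engineering can be made concrete by counting variants. The following is a minimal sketch, assuming full randomization over the 20 standard amino acids at each chosen position; the function names and the screening capacity used in the example are hypothetical and not taken from this thesis.

```python
AMINO_ACIDS = 20  # standard proteinogenic amino acids


def library_size(randomized_positions: int) -> int:
    """Theoretical number of protein variants when each chosen position
    may hold any of the 20 standard amino acids."""
    return AMINO_ACIDS ** randomized_positions


def max_coverage(randomized_positions: int, clones_screened: int) -> float:
    """Upper bound on the fraction of the theoretical library that a screen
    can sample (assumes every screened clone is a unique variant)."""
    return min(1.0, clones_screened / library_size(randomized_positions))


if __name__ == "__main__":
    screening_capacity = 10**8  # hypothetical number of clones screened
    for positions in (3, 6, 9, 12):
        size = library_size(positions)
        cov = max_coverage(positions, screening_capacity)
        print(f"{positions} positions: {size:.2e} variants, "
              f"at most {cov:.2%} sampled")
```

The library grows by a factor of 20 for every added position, which is one reason why rational design is often used to narrow down the positions that are randomized in a combinatorial library.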


Proteins involved in this thesis

For every single gene, several different protein species can be expressed, rendering a human proteome that consists of roughly 400’000 different protein species. All proteins play a part in the human body, but a doctoral student has only so much time. Therefore, three protein families of particular interest will be presented here: antibodies, sulfatases, and the green-fluorescent protein (GFP).

Antibodies

Antibodies, or immunoglobulins (Ig), are the most famous and important class of proteins for therapeutic purposes. Five different human antibody classes exist, but IgG (hereafter referred to as antibody) is of the highest therapeutic relevance and will thus be the only class discussed here. The therapeutic importance of antibodies comes from their natural function within our bodies. Upon exposure to a foreign pathogen, our immune system produces a vast number of highly specific antibodies that attach themselves to surface proteins on the pathogen, blocking its functionality and marking it for destruction (antibody-dependent cellular cytotoxicity (ADCC)) by our white blood cells. The proteins that antibodies bind to are called antigens, and they can be both self-proteins and non-self-proteins. Non-self-proteins are external environmental proteins, such as proteins from pathogens or the active substance in vaccines. Self-proteins are, as the name implies, proteins within our body, e.g. proteins on cancer cells or proteins that cause autoimmunity. Antibodies have a molecular weight of approximately 150 kDa, making them rather large proteins. Antibodies are made up of two different polypeptide chains: the heavy chain (approx. 50 kDa) and the light chain (approx. 25 kDa). One heavy chain is linked to one light chain through disulfide bridges, and each antibody consists of two of these linked chain pairs, connected at the middle of the heavy chains, giving it a Y-shaped form with highly flexible, outwards pointing arms. In every naturally occurring antibody, the two heavy chains are identical, as are the two light chains. This means that an antibody has two identical binding sites, located at the tip of each arm, which allows for more stable binding to the antigen and the ability to fully coat a pathogen.
Antibodies consist of three regions: the N-terminal variable fragment (Fv), the antigen binding fragment (Fab), and the C-terminal crystallizable fragment (Fc). The structure of the antibody is outlined in figure 1.5. The ability to confer specificity for virtually any antigen target comes from the Fv region of the protein structure. There, six complementarity-determining region (CDR) loops, three on the variable light chain (VL) and three on the variable heavy chain (VH), form the antigen binding site (paratope). CDR loops can accommodate all kinds of amino acid compositions without interfering with the overall structure and stability of the antibody. These outreaching “fingers” on the Fab arm are usually 7-20 amino acids long and can bind the antigen both on the basis of the side chains available and through the fingers' ability to reach into cavities and around knobs on the antigen32. The enormous variation that the CDR loops can accommodate is generated within our bodies in a very specific way that is unique to antibodies. During an infection, antibodies are generated within the lymph nodes, where a high level of antibody variation is achieved through affinity maturation. B-cells are the cell type that produces antibodies, and when an antigen attaches to a B-cell anchored antibody, affinity maturation is initiated. The B-cell undergoes several proliferation cycles in which the antibody that recognized the antigen is mutated within the Fv region at a high pace. Mutations within our cells are often unwanted, but here a unique mutation process called somatic hypermutation enables high mutation rates only within the desired region. For each cell generation descending from the original B-cell, only cells producing higher affinity antibodies survive, resulting in a selection process that eventually yields antibodies with high affinity and specificity. When the affinity maturation is complete, B-cells are transformed into plasma cells that can produce massive amounts of antibodies, which act as a large army of the immune response and clear an ongoing infection33. This affinity maturation process is mimicked in the therapeutic development of antibodies in order to find specific binders for virtually any given target, such as cancer cells, damaging protein aggregates, or faulty protein functions. Furthermore, the abilities of antibodies have made them a versatile tool for a wide range of research purposes. Antibodies can be used for identification, quantification and isolation of various proteins. Techniques such as western blotting and enzyme-linked immunosorbent assay (ELISA) can both verify the presence of a protein and determine its quantity. Cells can be separated based on the presence of membrane proteins with a fluorescence-activated cell sorter (FACS), or stained for identification of protein presence in various tissues.

Figure 1.5 Antibody. Schematic representation of an IgG antibody. Annotated within the antibody structure are the variable light chain (VL), constant light chain (CL), variable heavy chain (VH), and three constant heavy chain domains (CH1-3).

Sulfatases

The human sulfatases comprise 17 conserved enzymes that play an important role in degradation and remodeling of mucopolysaccharides and glycolipids within our bodies34,35. Lysosomal sulfatases reside in the lysosomes, subcellular membrane-enclosed organelles responsible for the majority of intracellular degradation36. Lysosomal sulfatases catalyze the hydrolysis of sulfate esters, degrading them to smaller parts that re-enter the cytosol or extracellular matrix for recycling or elimination37. Common to all sulfatases is the co-translational translocation to the endoplasmic reticulum (ER), where they are folded and activated through the conversion of a cysteine into a formylglycine, an activation process that is unique to sulfatases38. After activation, lysosomal sulfatases are marked with mannose-6-phosphate within the Golgi apparatus, and a mannose-6-phosphate receptor (M6PR) delivers the activated sulfatases to the lysosome39. Several different human conditions can be linked to deficiencies in sulfatases. The ones affecting lysosomal sulfatases belong to a class of rare diseases called lysosomal storage disorders (LSD).


When lysosomal sulfatases are defective, undigested material builds up abnormally in the lysosomal compartments, which leads to several different severe and lethal conditions. In the most severe form, all sulfatases are inactive, resulting in severe swelling of the lysosomes as well as other severe effects in all parts of the body. Treatment for this condition is only supportive, and diagnosed patients rarely survive childhood40. For some lysosomal sulfatase deficiencies, treatment is available in the form of enzyme replacement therapy (ERT), where the deficient sulfatase is replaced with a recombinant active version. The produced therapeutic depends on a correct end product for its ability to be taken up by the cell and transported to the lysosome. Furthermore, the sulfatase must have acquired the functional formylglycine conversion within its core in order to be active. This has proven to be quite problematic in recombinant protein expression, and the few therapies available today are among the most costly pharmaceuticals available41,42,43,44.

Green-fluorescent protein

One of the most useful proteins in biotechnology research is GFP. It consists of 11 beta strands that form a barrel structure with a small alpha helix in its center. Within the alpha helix sits a fluorophore made up of three amino acids (Ser65, Tyr66, and Gly67) that undergo cyclization and oxidation during maturation, turning them into the active glowing fluorophore45. This fluorescent protein was discovered and extracted from the jellyfish Aequorea victoria in 1962, but it was not until 1994, when it was expressed in E. coli, that its unique features were observed46. Not only could GFP synthesize a fluorophore within its own core without any additions except oxygen, it could also fold and glow at warmer temperatures, not just in the cold waters from which it originated. This meant that GFP could be used for in vivo expression in virtually any organism. Following these discoveries, research on GFP expanded widely, and hundreds of different GFP mutants are now available. The variants mainly differ in the emission and excitation spectra of the protein. With the addition of far-red emitting GFP orthologs extracted from corals, the entire visible spectrum is now covered by fluorescent proteins related to the wildtype (wt) GFP. Furthermore, protein engineering research has led to variants with reduced dimerization, improved maturation speed of the fluorophore, and increased protein stability under both high and low pH and temperature conditions. With these fluorescent proteins at hand, scientists have the ability to visually access the protein world. Fusions of GFP to proteins of interest (POI) enable quick evaluation of protein levels. This can be used to determine promoter strength and transformation or transfection efficiencies.
It can even be used as a selection marker for gene uptake after stable gene introduction, or as fluorescent donor and acceptor for Förster resonance energy transfer, enabling the determination of protein-protein interactions47,48,49,50,51. As mentioned previously in this chapter, antibodies can be used to quantify protein levels in an ELISA format, which is especially useful for evaluating the protein expression levels of different isolated production cells. A technique called split-GFP can be used for the same purpose through the co-expression of a fragment of the fluorescent protein with the POI52. For minimal disturbance of protein expression, only the 11th beta strand is fused to the POI. Through the addition of a recombinant GFP fragment lacking the 11th beta strand (GFP1-10), complementation occurs and a fluorescent signal that corresponds to protein levels can be measured (figure 1.6).
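How a fluorescent split-GFP signal could be translated into a protein level can be sketched with a simple standard curve. This is a minimal illustration that assumes a linear relationship between signal and concentration; the standard concentrations, fluorescence values, and function names below are hypothetical and are not measurements or methods from this thesis.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_concentration(signal, slope, intercept):
    """Invert the standard curve to turn a fluorescence reading into a
    concentration of tagged protein."""
    return (signal - intercept) / slope


# Hypothetical standards: known concentrations of GFP11-tagged protein (mg/L)
# and the fluorescence measured after adding GFP1-10 (arbitrary units).
STANDARD_CONC = [0.0, 10.0, 20.0, 40.0]
STANDARD_SIGNAL = [50.0, 250.0, 450.0, 850.0]

if __name__ == "__main__":
    slope, intercept = fit_line(STANDARD_CONC, STANDARD_SIGNAL)
    sample_signal = 650.0  # hypothetical reading from a production clone
    conc = estimate_concentration(sample_signal, slope, intercept)
    print(f"estimated concentration: {conc:.1f} mg/L")
```

In practice the linear range, background fluorescence, and complementation kinetics would need to be established experimentally before such a curve is used.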


Figure 1.6 Green fluorescent protein. The structure of GFP consists of 11 beta strands that form a barrel-like structure with an alpha helix in its center. Within the core lies the three amino acid fluorophore, shown in green. The 11th beta strand is labeled in blue.

Chapter 2

Mammalian host cell for protein production

For any manufactured product, high yield is one of the key criteria for a good manufacturing process. For biologics, the structural quality determines the efficacy and safety of the drug; therefore, the quality of the protein comes first and high production output comes second. One of the important features of mammalian cells is their capability to produce high-quality proteins with properties identical or similar to those of human proteins. Therefore, recombinant proteins for therapeutic purposes are increasingly manufactured in mammalian cells.

The mammalian cell

The mammalian cell is a compartment bordered by a plasma membrane that contains several important membrane-bound subunits called organelles. The name organelle is a diminutive of organ, chosen because these subunits are as important to cell function as organs are to the human body. There are several organelles, but the main ones are the nucleus, ER, Golgi complex, lysosomes, and mitochondria. The blueprint of the cell, DNA, is stored within the nucleus and is thereby protected by the mechanical barrier that the nucleus provides. Preventing interaction between hazardous material and DNA molecules is important for preserving the genetic material of the cell53. As mentioned earlier, when genetic material is translated into proteins, DNA is transcribed within the nucleus into mRNA, which is transported from the nucleus to the ribosomes. Here, two pathways for protein translation exist. Either the translation occurs within the cytosol, or the ribosome is co-translationally transported to the ER for translation via the secretory pathway. The secretory pathway takes care of proteins that are membrane bound, destined for organelles within the cell, or secreted out of the cell. These proteins undergo a series of post-translational modifications (PTMs) that will be discussed further later on. After translation, folding and PTMs within the ER, proteins move to the Golgi apparatus. Here, proteins are further modified and sorted for active transport towards their final destination. The lysosome is the recycling station of the mammalian cell, where hazardous and excess material is degraded by hydrolytic enzymes. Lastly, the mitochondrion is the organelle where energy is generated in the form of adenosine triphosphate, the organic compound that provides energy for most cellular processes. For therapeutic protein production purposes, the machinery involved in modification, folding, and secretion is the most important feature of the cell.
Therefore, the secretory pathway is of great interest and is what makes mammalian cells stand out as a production host for advanced protein therapeutics.


Chinese Hamster Ovary Cells

Despite being one of the first stable mammalian cell lines developed, Chinese hamster ovary (CHO) cells are still the most common protein expression system used today54,55. The early establishment of CHO cells might actually be one of the explanations for their popularity. After its development in 1957, the cell line slowly spread to labs worldwide and became a familiar face, eventually establishing its position as the gold standard for mammalian cell lines. Besides being among the first, CHO cell lines are highly suitable for transfection studies due to their unique adaptability. This comes from the plasticity of the genome, which adapts to various culture conditions and to the production of different recombinant proteins56. This adaptability, in combination with the ability to perform human-like PTMs, made CHO cells a favored production host early on57. To meet high production demands, CHO cells have been engineered in various ways. One important feature for large scale production is gene amplification of heterologous genes in CHO cells, enabled through the removal of dihydrofolate reductase and glutamine synthetase in cell lines. This, together with the ability to be grown in large scale production tanks, often up to 10’000 liters, has made CHO cells the favored production host for large scale production58. Production of safe pharmaceuticals is naturally a top criterion, and for CHO cells many safety aspects speak in their favor. They have lower susceptibility to many human infections, possibly due to the low expression of genes related to viral entry. In case of infection, well-developed screening systems exist for detecting cell line infections in CHO cells59,60. Furthermore, since CHO cells can be grown in serum-free media, no human or animal derived proteins are present during cultivation, which increases the safety profile of CHO production.
On the downside, CHO cells are not capable of producing all human PTMs, since they lack enzymes such as alpha-2,6-sialyltransferase and alpha-1,3/4-fucosyltransferase, two enzymes involved in glycosylation, the most important PTM process. Furthermore, CHO cells produce alpha-gal and N-glycolylneuraminic acid (NGNA) glycans that are not produced in human cells. These deviations from the human glycosylation machinery can have a large negative effect when expressing recombinant therapeutic proteins for use in humans, since most humans have circulating anti-alpha-gal antibodies that will interact with a CHO cell glycosylation pattern61,62,63. Therefore, additional screening is often necessary to isolate clones that lack alpha-gal and NGNA glycans, and high production clones might be missed64.

Human Embryonic Kidney Cells

Human embryonic kidney (HEK) 293 cells are the most common human cell line used for expression of recombinant proteins65. The cell line was first generated in 1973 from an electively terminated female fetus of unknown parentage66. For cell line establishment, an adenovirus 5 genome fragment (containing the viral genes E1A and E1B) was transfected into the cells. During the experimental procedures in the lab, all experiments received a number, and experiment number 293 happened to be the successful integration of the viral gene fragments, hence the name 293. The integration ended up on chromosome 19, resulting in immortalization of the cell line. Furthermore, expression of E1A and E1B inhibited apoptosis and cell cycle control pathways67,68. Several different cell lines have been established from the parental version in order to enable high production of recombinant proteins for therapeutic purposes55,57. HEK293 cell lines can also be cultivated in suspension, in serum-free media, and at high density69,70,71. These advancements in HEK293 expression systems have pushed the ability to produce recombinant proteins at high titers, and production in human derived cell lines is expanding. Several products can today be produced in amounts equal to those produced in CHO cells but with better quality, especially when producing proteins that lack potentially immunogenic, non-human PTMs. This might, in the near future, establish human derived cell lines as the preferred platform for protein biotherapeutic production, ahead of other mammalian based platforms57.

The secretory pathway

As mentioned above, the secretory pathway refers to the protein assembly pathway that starts in the ER, continues via vesicles to the Golgi apparatus, and ends with secretion to the extracellular matrix via secretory vesicles. This is at least how the first observations of the pathway looked, hence the name secretory pathway72. However, not all proteins that are processed through the secretory pathway are secreted; the pathway also handles proteins destined for the late endosome/lysosome compartments and membrane bound proteins73. Proteins processed within the lumen of the ER and Golgi encounter a different environment than proteins processed within the cytosol. The cytosol is a reducing environment, while the secretory pathway and extracellular matrix are oxidizing. This difference affects the folding of different protein structures, most notably disulfide bridge formation, which requires an oxidizing environment74. Besides the environmental differences between protein expression in the free cytosol and in the secretory pathway, there is a myriad of PTMs that can occur in the secretory pathway, which enables a several-fold increase in proteome complexity. Some of the most common modifications are phosphorylation, acetylation, methylation, glycosylation, glycosylphosphatidylinositol (GPI) anchoring, disulfide bond formation, and ubiquitination75. Of all the PTMs that can occur during transport through the secretory pathway, glycosylation is the most important for therapeutic protein function. The carbohydrate composition of a protein influences its stability, solubility, pharmacokinetics, pharmacodistribution, receptor binding, and effector function. Furthermore, glycosylation plays a key role in the folding machinery of proteins passing through the secretory pathway.
Notably, the ADCC capabilities of antibodies are partially determined by glycosylation of the Fc domain of the protein57,76. Proteins destined for the secretory pathway have a signal peptide at the N-terminus of the protein sequence. Once translated, the signal peptide is recognized by the signal recognition particle (SRP), which pauses the translation of the protein and enables ER binding via the SRP receptor. After this, the signal peptide is transferred to the translocon, protein translation is resumed, and the protein is synthesized into the ER compartment (figure 2.1). Once in the ER, large protein folding systems begin to work. Mainly, ER chaperones such as Hsp40, Hsp70, Hsp90, Hsp100, and calnexin/calreticulin assist in proper protein folding. Primarily, chaperones delay the folding process and prevent off-pathway aggregation, and they provide refolding pathways for partially misfolded proteins. Early asparagine-linked (N-linked) glycosylation assists in the proper folding of the protein. With glycan additions, the protein stability is increased, and chaperones use N-linked glycans as markers for the folding pathways of the translated protein77,78,79. Further, protein disulfide isomerase (PDI) enables disulfide bond formation between cysteine residues, and peptidyl-prolyl isomerase catalyzes cis/trans isomerization of the peptide bond preceding proline in the amino acid sequence80,81.


Figure 2.1 Co-translation into the ER. SRP binds to the signal sequence on the newly translated protein and thereby halts further translation. SRP then binds to the SRP receptor and delivers the signal sequence to the translocon. Next, SRP is released and translation continues, pushing the synthesized protein into the ER lumen.

After the protein has achieved its correct folding state, it can be transported to the Golgi apparatus via cytoplasmic coat protein (COP) complexes. These complexes assemble on the ER membrane surface, capture cargo proteins for transport from the ER, and bud off from the membrane to form a vesicle. This transport goes in two directions: anterograde transport from the ER to the Golgi goes via COPII complexes, and retrograde transport from the Golgi back to the ER occurs via COPI complexes82. There are several reasons why retrograde transport is needed. First, COPII associated proteins need to be transported back to continue the transport of proteins in the secretory pathway. Second, some proteins need to go back to the ER since they are modified in the Golgi but belong in the ER. These are often membrane bound proteins with a KKXX peptide motif (two lysines followed by any two amino acids) located near the C-terminus, which interacts with the COPI protein complex so that the protein is retrieved to the ER where it belongs. Third, proteins native to the lumen of the ER can sometimes be transported to the Golgi by mistake. These then need to be transported back to the compartment that they belong to. These proteins are recognized through the KDEL (lysine, aspartic acid, glutamic acid, leucine) amino acid sequence83.
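The two retrieval signals described above are short sequence motifs and can therefore be checked with simple string matching. The sketch below is a simplification made for illustration: it assumes that both motifs occupy exactly the last four residues of the sequence, whereas in cells their recognition also depends on membrane topology and sequence context. The function name and the example sequences are hypothetical.

```python
def er_retrieval_signal(sequence: str) -> str:
    """Classify a protein sequence (one-letter amino acid code) by the
    ER retrieval motif found at its C-terminus.

    Returns "KDEL" for a C-terminal Lys-Asp-Glu-Leu, "KKXX" for two lysines
    followed by any two residues at the very C-terminus, otherwise "none".
    """
    seq = sequence.strip().upper()
    if seq.endswith("KDEL"):
        return "KDEL"
    if len(seq) >= 4 and seq[-4:-2] == "KK":
        return "KKXX"
    return "none"


if __name__ == "__main__":
    # Hypothetical sequences, not real proteins.
    examples = {
        "soluble ER-resident protein": "MKTLLAVSGGSEEKDEL",
        "ER membrane protein": "MKTLLAVSGGSEEKKTN",
        "secreted protein": "MKTLLAVSGGSEEGSTN",
    }
    for name, seq in examples.items():
        print(f"{name}: {er_retrieval_signal(seq)}")
```

A check of this kind can be useful when designing a recombinant construct, since an unintended C-terminal retrieval motif could keep a protein meant for secretion inside the cell.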


Lastly, the expressed protein reaches the Golgi apparatus. The PTMs that occur in this organelle need to be carried out in the correct sequential order. Therefore, the organelle is divided into different enclosed sub-compartments called cisternae, where each compartment has specific membrane bound enzymes that perform the PTMs assigned to that cisterna. As a cisterna matures, it moves further out in the Golgi complex, while the membrane bound enzymes are transported backwards (retrograde) to the correct compartment for their modifications. Within the Golgi, further glycosylation and other PTMs take place, as well as labelling of the produced protein for guided transport to the correct end compartment. Finally, the protein ends up in the trans Golgi network, where clathrin/adaptin coated vesicles bud off and deliver the protein to its destination82 (figure 2.2).

Figure 2.2 Secretory pathway. Proteins destined for the secretory pathway are translated into the ER through the translocon. Next, COPII-mediated vesicles carry the protein to the ER-Golgi intermediate compartment (ERGIC), an organelle that mediates trafficking between the two larger complexes. Vesicles then transport proteins to the Golgi apparatus, where further PTMs occur. Enzymes that belong to the ER or to early cisternae of the Golgi are returned through retrograde transport via COPI. Once proteins are fully processed and marked for delivery within the Golgi, they are encapsulated in secretory vesicles and transported past the cell membrane.

Control mechanisms for protein production

The ER compartment is a crowded place, with protein concentrations as high as 100 mg/ml, creating an almost gel-like matrix. During the slow process of protein folding, many proteins end up misfolded77. The cell has therefore evolved quality control mechanisms that prevent accumulation of misfolded proteins and maintain cell homeostasis, namely the ER-associated degradation (ERAD) process, the unfolded protein response (UPR), and autophagy. The ERAD pathway starts with the calnexin/calreticulin protein folding cycle. These chaperones monitor protein folding within the ER and can initiate protein repair.


This is done through N-glycan modification. If misfolding occurs, the chaperones can retain the protein within the ER through re-addition of glycan residues so that folding can be retried. If the protein is too severely misfolded, calnexin directs the protein towards exit from the ER via mannose trimming of the N-glycan, after which it is ubiquitinated and degraded by the proteasome in the cytosol. If the protein is correctly folded, calreticulin allows it to pass through for further transport to the Golgi. However, if an excess of unfolded proteins accumulates within the ER, the UPR is activated. This activation occurs through three transcription factors (XBP1, ATF6, and ATF4) that initiate transcription of more chaperones in order to cope with the protein folding pressure within the ER. If the accumulation of unfolded proteins is not dealt with, the cell will undergo apoptosis84,85.

Besides the processes that combat protein misfolding within the ER, there is a large cellular system, called autophagy, that is responsible for restoring cell homeostasis during metabolic and oxidative stress. Autophagy restores cell homeostasis through breakdown of proteins, cytoplasmic components, and organelles via the lysosome. Autophagy can be activated by nutrient starvation or by the UPR upon accumulation of misfolded proteins within the ER. In particular, activation of the transcription factor ATF4 engages CHOP, a transcription factor that promotes the expression of dozens of autophagy-related (ATG) genes, whose products build up the autophagosome that can engulf large parts of the cell86. Three major types of autophagy exist: macroautophagy, where autophagic cargo is delivered to the lysosome via double-membrane vesicles (autophagosomes)87; microautophagy, where a lysosomal invagination internalizes cargo for degradation88; and chaperone-mediated autophagy, where cytosolic proteins enter the lysosome in a selective manner through a protein translocation system89.

Chapter 3

Improving protein production

During the last 20 years, improvements to mammalian expression systems have made them a reliable production host for protein pharmaceuticals, and biologics can now routinely be produced with yields in the g/l range. However, the constant expansion of protein development brings new demands on production cell lines: they need to generate not only high yields but also product with the correct function and quality. Most improvements on the production side have so far been made in bioprocessing and media composition, with fewer improvements within the cell itself. This chapter introduces cell line development and improvements at the cell level for the production of protein therapeutics. In this area, new fields are opening up possibilities to understand the mammalian cell in greater depth and to re-engineer it for protein expression purposes, both for increased cell productivity and for the correct function of the novel complex biologics that will need to be produced in the future.

Engineering the recombinant gene for increased production

Since protein production within the cell involves transcription of DNA and translation of mRNA, optimizing these processes can give more output. All cells show a codon bias, i.e. a preference for certain synonymous codons. This is likely a way to fast-track the expression of specific proteins. Utilizing this bias enables a gene engineering approach, called codon optimization, in which the codons that match the most abundant tRNAs are used90,91. Besides altering the gene sequence for more optimal codon translation, the rate and frequency of transcription can also be engineered. The promoter region is where transcription by RNA polymerase begins. For recombinant protein expression, promoter regions endogenous to the host organism are usually not used. Instead, a foreign promoter with a high affinity for RNA polymerase assembly is used, most often the human cytomegalovirus (CMV) promoter. Selecting strong promoters leads to more transcription of mRNA and higher recombinant expression levels. Besides altering or engineering new promoters, one can use inducible (switch) promoters that allow the host cell culture to grow before recombinant protein production is turned on92. The mRNA produced will eventually be degraded within the cell. However, by slowing mRNA degradation through the addition of long poly(A) tails, more protein expression is achieved, since the mRNA can be translated for longer periods of time. Furthermore, identifying difficult regions within the recombinant protein can guide alterations that make the protein easier to express while it retains the function necessary for the drug93.
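The idea behind codon optimization can be illustrated with a minimal Python sketch: every codon of a coding sequence is replaced by the synonymous codon with the highest usage value. The usage numbers below are hypothetical placeholders, not measured CHO or HEK codon frequencies, and the function names are illustrative. Real codon optimization tools also weigh factors such as GC content and mRNA secondary structure, which are ignored here.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical relative codon usage (placeholder numbers, NOT measured values).
USAGE = {
    "GCC": 0.40, "GCT": 0.26, "GCA": 0.23, "GCG": 0.11,  # Ala
    "AAG": 0.57, "AAA": 0.43,                            # Lys
    "CTG": 0.40, "CTC": 0.20, "TTG": 0.13,
    "CTT": 0.13, "TTA": 0.07, "CTA": 0.07,               # Leu
}


def preferred_codons(usage):
    """Map each amino acid to its most frequently used codon."""
    best = {}
    for codon, freq in usage.items():
        aa = CODON_TABLE[codon]
        if aa not in best or freq > usage[best[aa]]:
            best[aa] = codon
    return best


def codon_optimize(cds, usage=USAGE):
    """Swap each codon for the preferred synonymous codon, if one is listed."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    best = preferred_codons(usage)
    codons = (cds[i:i + 3] for i in range(0, len(cds), 3))
    # Codons without a listed preference are kept unchanged.
    return "".join(best.get(CODON_TABLE[c], c) for c in codons)


def translate(cds):
    """Translate a coding sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


original = "ATGGCGAAATTATAA"  # toy sequence encoding M-A-K-L-stop
optimized = codon_optimize(original)
print(optimized, translate(optimized) == translate(original))
```

The final check confirms that the encoded protein is unchanged, which is the defining constraint of the approach.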

Engineering the host cell for increased production

Besides engineering the recombinant gene for optimal recombinant protein production, the host itself can be engineered through over-expression, downregulation, or knockout of genes. For mammalian cell lines, most work has been done on CHO cells, where the main focus has been on improving cellular productivity, quality of the protein, and cell line stability94. Maintaining normal protein synthesis and proliferation is an energy-demanding process within the cell. Altering genes related to these processes can therefore create an energy surplus for recombinant expression95,96,97. Furthermore, boosting the energy metabolism through co-expression of genes that optimize nutrient uptake raises the recombinant expression threshold of the cell further. In addition, expressing genes that reduce toxic byproducts pushes the productivity of the cell further98,99,100,101,102,103. Cellular productivity can also be increased with a decreased cultivation temperature, which alters the metabolism in a favorable way99,104,105. When it comes to engineering the production cell line for protein quality, the focus is mainly on the secretory pathway. Since most therapeutic proteins are glycoproteins, many engineering attempts have been directed towards this part of the cell. Due to their importance for protein folding and aggregation within the ER, several different chaperones and genes involved in the UPR have been shown to increase both yield and quality when co-expressed106,107,108. As already mentioned, CHO cells do not have exactly the same glycosylation capabilities as human cells. To overcome this hurdle, one of the first gene over-expressions performed in CHO cells was with ST6GAL. This enzyme enabled the production of proteins carrying human alpha2,6-sialylated glycans109.
Further over-expression of genes involved in glycosylation has improved the glycosylation structure of recombinant proteins110,111,112,113,114. Besides over-expression of genes for enhanced glycosylation, knockout of genes involved in glycosylation has been performed to increase therapeutic function. As an example, knockout of the enzyme that catalyzes transfer of core fucose to antibodies eliminated core fucose from the produced antibodies, resulting in a better therapeutic antibody with stronger antibody-dependent cellular cytotoxicity (ADCC)115,116. Lastly, cell line engineering efforts have addressed the longevity of the cell line, where most engineering methods have targeted the controlled initiation of apoptosis in cells. With increased expression of anti-apoptotic genes or genes related to prolonged proliferation, the stability of the cell line has increased117,118,119.

Although vast improvements have been made to mammalian expression systems through synthetic biology over the last 20 years, new opportunities have emerged with omics technologies. Through the collection of transcriptomics, metabolomics, and proteomics data, the cell can now be understood in a new way. Using the large amount of information gathered from the cell, we can better understand the global effects of recombinant protein production and the effect of up- and down-regulation of a gene. This technology can be used to design new strategies to improve biopharmaceutical production120. For transcriptomics, the large breakthrough came with next-generation sequencing techniques, which enabled high-throughput RNA sequencing and made it possible to both map and quantify the transcriptome121. For transcriptomic studies on CHO cells, two major thresholds have been passed with the sequencing of the CHO genome and of the full CHO transcriptome60,122.
With transcriptomics, new frontiers are open, and several important correlations between transcript levels and recombinant protein production have already been seen, such as the negative correlation between the recombinant protein transcript and genes involved in cell-cycle progression, mRNA processing, transcription, protein folding, and translation123.


Cell line development

After cell line engineering and gene optimization for the protein that will be produced, a single clone is isolated to establish a production cell line. Mammalian expression hosts such as CHO and HEK cells show an inherent heterogeneity between clonally derived cells, with differences at the transcriptomic level, at the proteomic level, in morphology, and in chromosomal composition124,125,126,127,128,129. This clonal variation is exploited when isolating a good cell line, i.e., a cell line that can sustain high proliferation rates and protein titers for a prolonged time. However, this instability can also give rise to unpredictable behavior in vitro, such as loss of productivity and cell-line specific variation in glycosylation processing130,131,128. First, a pool of cells is transfected with the gene of interest, resulting in random chromosomal integration of the gene. Secondly, transfected cells are selected and isolated. Traditionally, single-cell isolation is achieved by diluting out the cells in a multi-well plate format. This method of isolation is tedious and time-consuming. However, once isolation is achieved, the secretion of expressed proteins can be monitored within each well, usually with fluorescent antibody detection or a GFP reporter fused to the protein of interest (POI)132. Alternatively, single cells can be captured in droplets for a more high-throughput isolation and productivity evaluation process133. Single-cell isolation can also be achieved with optical tweezers, where cells are captured by high-energy lasers that generate an optical gradient force, thereby enabling measurements of the protein expression abilities of single clones134. Another way to isolate single clones in a high-throughput manner is to use FACS, which can screen millions of cells in a matter of minutes135. However, for this technique, the secreted protein needs to be captured on the cell's surface.
This is achieved either through cold capture techniques or with the addition of a cell membrane anchoring system136,137,138. After high-producing clones have been selected, they are tested in scaled-up production, typically first in shake flasks (~30 mL), followed by benchtop bioreactors (~1 L) and lastly large-scale bioreactors (~100 L)139. One problem in the expansion process is that clones that were high producers in the early isolation stages are rarely the high producers at large-scale production140. Therefore, several single-cell candidates are evaluated during expansion screens, and for each volume expansion, fewer and fewer candidates are kept. It is even possible that the best production candidate is lost in the first screening round. One way to overcome this problem is to separate the growth phase from the production phase with inducible promoters141,92. After the expansion experiments, one cell candidate is selected for the production of the POI, and several cell banks are cryopreserved for future production.
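Why isolation by dilution is tedious can be illustrated with a short Python sketch. It rests on an assumption that is not stated in the text: that the number of cells landing in a well follows a Poisson distribution around the mean seeding density. The function names and the seeding densities are illustrative only.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k cells in a well for a given mean cells per well."""
    return mean ** k * math.exp(-mean) / math.factorial(k)


def well_statistics(mean_cells_per_well):
    """Return (fraction of empty wells, fraction of occupied wells that hold one cell)."""
    if mean_cells_per_well <= 0:
        raise ValueError("mean cells per well must be positive")
    p_empty = poisson_pmf(0, mean_cells_per_well)
    p_single = poisson_pmf(1, mean_cells_per_well)
    return p_empty, p_single / (1.0 - p_empty)


# Hypothetical seeding densities (cells per well).
for density in (0.3, 0.5, 1.0):
    empty, clonal = well_statistics(density)
    print(f"{density:.1f} cells/well: {empty:.0%} empty, "
          f"{clonal:.0%} of occupied wells hold a single cell")
```

Under this model, lowering the seeding density makes occupied wells more likely to be clonal, but at the cost of many empty wells, which is one reason the higher-throughput alternatives described above are attractive.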

Chapter 4

Epitope mapping

Antibodies are likely the most important class of protein molecules in life science today. The range of uses for these specific protein binders is enormous, and research work as well as therapeutic successes routinely depend on their specificity. Every piece of added knowledge and detailed information about antibodies is therefore of value. Epitope mapping is the experimental process of locating the sequence or surface on an antigen that an antibody binds. This surface can be either conformational or linear, figure 4.1. The whole surface of the antigen that an antibody binds to is called the epitope, and several different epitopes can therefore exist on one antigen. Epitope mapping aids in understanding the mechanism of action of an antibody therapeutic, it can assist in early drug development by removing redundant candidates at an early stage, and it can be used in vaccine evaluation through monitoring of vaccine responses in patients. It can also help define homogeneous patient cohorts for clinical trials on antibody efficacy, and enable a more precise and defined patent listing for a commercial antibody. Two classes of epitopes exist, B-cell epitopes and T-cell epitopes, but in this section only B-cell epitopes are dealt with.

Techniques for epitope mapping

Several techniques for determination of epitopes exist. They give either an image of how the antibody interacts with the structure of the antigen or a sequence of the antigen that interacts with the antibody. Peptide-based epitope mapping (pepscan) is one of the simpler methods: the antigen is divided into multiple small linear segments, and antibody binding towards each segment is measured in order to elucidate which segments the antibody binds to. This is one of the most common methods for epitope determination and was first developed as an ELISA-format method142. It is now a multiplex method that can be used for binding and specificity assays on complex peptide arrays for whole-proteome studies143. Several methods for structural identification of antibody-antigen interactions exist, such as co-crystallization, solution nuclear magnetic resonance spectroscopy (NMR), and cryogenic electron microscopy. Co-crystallization is the gold standard for determination of epitope-paratope interactions. In this technique, X-ray diffraction determines the atomic structure of the crystallized antibody-antigen complex. The antigen epitope is assigned by the observer, but usually amino acids that are within 4 Å of each other are considered to interact and thereby to be part of the paratope-epitope interface144,145. The technique has the limitation that epitopes are determined solely by proximity, whereas the true interaction often relies on only a few amino acid residues. Furthermore, the protein complexes must be crystallized for the method to work. This is very time-consuming, and large amounts of purified components are needed to obtain the crystals. NMR can also give a dynamic image of the epitope:paratope complex. The NMR signal is sensitive to changes in the local chemical environment, and when the antibody binds its target antigen, NMR detects this change.
In this manner the epitope can be detected. This technique is limited to smaller proteins and is usually not used for epitope mapping144,146.
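The 4 Å proximity criterion used for co-crystal structures can be sketched in a few lines of Python. The coordinates and residue labels below are made-up toy values rather than data from a real structure, and the function name is hypothetical. In practice the atom coordinates would be read from a structure file.

```python
import math

CUTOFF_ANGSTROM = 4.0  # distance criterion quoted in the text


def structural_epitope(antigen_atoms, antibody_atoms, cutoff=CUTOFF_ANGSTROM):
    """Return antigen residues with at least one atom within `cutoff` of an antibody atom.

    antigen_atoms:  list of (residue_label, (x, y, z)) tuples
    antibody_atoms: list of (x, y, z) tuples
    """
    contacts = set()
    for residue, coord in antigen_atoms:
        if any(math.dist(coord, ab_coord) <= cutoff for ab_coord in antibody_atoms):
            contacts.add(residue)
    return sorted(contacts)


# Toy coordinates in Å (hypothetical, for illustration only).
antigen = [
    ("res10", (0.0, 0.0, 0.0)),
    ("res25", (10.0, 0.0, 0.0)),
    ("res42", (3.0, 0.0, 0.0)),
]
antibody = [(0.0, 0.0, 3.5), (3.0, 3.0, 0.0)]

print(structural_epitope(antigen, antibody))
```

The sketch also makes the limitation named above concrete: a residue is reported purely because it is close to the antibody, regardless of how much it contributes to binding.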


Figure 4.1 Two types of epitopes. Antibody paratopes can bind to a continuous sequence on the antigen surface, making it a continuous epitope. Alternatively, the antibody can bind to a discontinuous sequence on the antigen, in which case the antibody interacts with the tertiary structure of the antigen instead of the primary or secondary structure.

Cryogenic electron microscopy (cryo-EM) has emerged as a competitor to X-ray crystallography. Recently, the resolution of the technique reached new frontiers, and resolutions better than 4 Å are now frequently achieved147,148. The technique uses a sample of antibody-antigen complexes that is frozen on a grid. Several images are then taken and processed, resulting in a reconstructed structure of the protein149. Mass spectrometry (MS) can also be used in epitope determination144,150. Through limited proteolysis of an antigen that has been incubated in solution with the antibody, the epitope can be identified via analysis of the isolated parts. A more recent MS-based approach to epitope determination is hydrogen/deuterium exchange (HDX). In HDX, the antigen is labelled with deuterium both in complex with the antibody and without it. This identifies the antibody binding site, since this region is protected from deuterium labeling in the complex151. In silico prediction of epitopes is also a new possibility. With recent advances in protein structure prediction, such as AlphaFold, the method could soon be usable152, but at the moment the technology is not considered good enough153,154.

The most common way to determine the epitope sequence is, however, site-directed mutagenesis. This simple method determines which parts of the amino acid sequence are part of the epitope. Most commonly, alanine scanning mutagenesis is used, because alanine is a small amino acid whose side chain consists only of a methyl group, meaning that interactions with other amino acids are rare. In alanine scanning mutagenesis, a residue in the sequence of the antigen is substituted with alanine and antibody binding is assessed to see whether that residue is part of the epitope. To determine the whole epitope, each residue needs to be substituted in a library of alanine mutations. This makes the method rather laborious, but it gives the highest resolution when it comes to determining the amino acid residues that comprise the epitope. When an epitope is determined through X-ray crystallography, the determined region of antibody binding is often rather large. Through alanine scanning mutagenesis, however, it has been seen that the epitope often consists of few residues, sometimes as few as three to five155,156. Furthermore, alanine scanning mutagenesis can find residues outside of the structural epitope that contribute to binding stability through their backbone atoms. These residues are classified as functional epitopes, whereas epitopes identified through X-ray crystallography and cryo-EM are defined as structural epitopes157.

Protein display methods contrast with conventional epitope mapping methods in that the peptide or mutated sequence is attached to the surface of cells instead of being secreted and evaluated in a plate format. This enables the use of flow cytometry methods that can elucidate the binding capacity of the antigen presented on the cell. Most often, cell surface display is used with alanine mutagenesis libraries144,158,159,160,161, and both yeast and bacterial cells have been used in epitope determination via cell surface display162,160,163. These organisms are, however, not suitable for all kinds of human protein-protein interactions, since PTMs and glycosylation patterns are not the same as in human cells. This could be addressed with mammalian cell surface display methods. For mammalian cells, there exist four classes of transmembrane proteins (Type I, II, III, and IV) and one class of lipid-anchored membrane proteins, attached via a glycosylphosphatidylinositol (GPI) anchor164,165.
Most often, the PDGFR transmembrane region has been used to attach proteins to the surface of mammalian cells166. However, GPI anchoring is becoming an increasingly important tool for protein expression138,167, and a novel display system based on a type IV transmembrane system was recently presented168. These display systems have yet to be explored in an epitope mapping setup.
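The construction of an alanine scanning library, and the way binding data from it are read out, can be sketched as follows. This is a minimal Python illustration under stated assumptions: positions that are already alanine are simply skipped, and a residue is called part of the functional epitope when binding of its mutant drops below a chosen fraction of wild-type binding. The 0.5 threshold, the toy sequence, and the binding values are hypothetical.

```python
def alanine_scan(antigen_seq, first_position=1):
    """Return (label, mutant_sequence) for every single alanine substitution."""
    seq = antigen_seq.upper()
    library = []
    for index, residue in enumerate(seq):
        if residue == "A":
            continue  # assumption: native alanines are skipped
        label = f"{residue}{index + first_position}A"
        library.append((label, seq[:index] + "A" + seq[index + 1:]))
    return library


def functional_epitope(relative_binding, threshold=0.5):
    """Return mutants whose binding relative to wild type falls below `threshold`."""
    return sorted(label for label, value in relative_binding.items() if value < threshold)


# Toy antigen segment and made-up binding values (wild type = 1.0).
library = alanine_scan("KADYL")
binding = {"K1A": 0.95, "D3A": 0.10, "Y4A": 0.30, "L5A": 1.05}

print([label for label, _ in library])
print(functional_epitope(binding))
```

One mutant is needed per position, which is why the library size, and with it the experimental effort, grows with the length of the antigen.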


Present investigation

The main focus of my various projects has been on improving the production of recombinant proteins in mammalian cells for therapeutic use. In the first three papers, sulfatases are the target for improved protein production. Paper I and paper III present two different ways to improve the activity of the enzyme: either through co-expression of helper proteins that act within the secretory pathway, as in paper I, or through protein engineering via ancestral remodeling, as in paper III. In paper II, a novel tool for modulating protein translation is presented and used to evaluate how much helper protein is needed to achieve the maximum effect without interfering with the cell's capability to produce recombinant protein. Furthermore, this technique is used to find the heavy and light chain expression levels that give the maximum output in antibody production. In paper IV, a more general cell line engineering approach is used to find a cell line that expresses various recombinant proteins at higher titers. Previously, a small compound library screen identified ULK1 as a gene that limits high production in HEK cells, and in this project the generation of a stable ULK1 knockout cell line is described. Furthermore, transcriptomic analyses revealed effects on cell homeostasis and transcription processes in the generated knockout cell line. In paper V, a protocol for maturing a split-GFP variant is presented that makes fluorescence from the molecule develop 150-fold faster. The protocol is then used in a selection process between three stable production clones to verify that it speeds up the selection of high-producing clones. Lastly, in paper VI, a mammalian cell display epitope mapping technique is presented that enables precise mapping of the interaction between the SARS-CoV-2 spike protein and four different antibodies.
Because mammalian cells can produce antigens with glycosylation patterns similar to those of human cells, it is proposed that this method gives the most accurate picture of the paratope:epitope interaction.


Paper I - Systems biology greatly improve activity of secreted therapeutic sulfatase in CHO bioprocess

In paper I, a comparative transcriptomic study was performed on two isolated CHO clones expressing N-sulphoglucosamine sulphohydrolase (sulfamidase). During clonal selection, one isolate (clone A) exhibited unusually high activity of secreted recombinant sulfamidase. This enabled us to investigate the gene differences that correlated with the increase in sulfatase activation and secretion. By cultivating clone A in parallel with a "normal" low-activity producer from the same transfection pool (clone B) and isolating RNA for transcriptomic comparisons, we hypothesized that genes involved in sulfatase activation would be identified, figure 5.1.

Figure 5.1 Experimental overview – identification of activity-related key genes by comparison of CHO clones with different specific activity. Common to all human sulfatases is an activation of the enzyme in the ER that includes formation of a formylglycine residue (A). Stable CHO clones producing varying amounts of active sulfatase are cultivated in an automated bioreactor (B), followed by transcriptomic analysis (C) that identifies key genes linked to activity, which are then validated as co-factors by co-expression in CHO (D), potentially leading to improved specific activity (E).


During cultivations, clone A produced less sulfamidase than clone B but with greater specific activity, figure 5.2 A,B. RNA sequencing showed a higher ER stress load in clone B, whereas clone A exhibited an elevation in protein secretion processes. To elucidate gene-level differences in PTM and protein secretion between the two clones, genes involved in these processes were examined further. A cluster of 14 genes was identified that was consistently over-expressed in clone A over the entire cultivation time, figure 5.2 C.

Figure 5.2 Titer, activity measurements and gene clustering of clone A and B. Two stable CHO clones producing sulfamidase were monitored for activity and productivity over a cultivation period of 17 days. (A) The specific activity dropped during the cultivation time for both clones. Under both standard and high cell density conditions, clone A had approx. 100% higher specific activity than clone B on day 17. (B) Titer in the media increased over the cultivation period. The titer of clone B increased several-fold more than that of clone A under all conditions. (C) Extracting protein secretion and post-translational modification gene sets from GSEA-MSigDB yielded a set of 83 differentially expressed genes. The extracted gene set had different expression patterns over the cultivation period. However, out of the 83 genes, 14 are more upregulated in clone A over the entire cultivation period (green).
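The selection step behind panel C, keeping only genes that are higher in clone A at every sampled time point, can be sketched as a simple filter. This is a minimal Python illustration and not the analysis pipeline used in paper I; the gene names and log2 fold-change values are made-up placeholders.

```python
def consistently_upregulated(log2_fold_changes):
    """Return genes whose log2(clone A / clone B) is positive at every time point.

    log2_fold_changes: dict mapping gene name -> list of log2 fold changes,
    one value per sampled time point.
    """
    return sorted(
        gene
        for gene, values in log2_fold_changes.items()
        if values and all(value > 0 for value in values)
    )


# Hypothetical toy data for three time points.
toy_data = {
    "gene_a": [0.5, 1.2, 0.8],     # higher in clone A throughout
    "gene_b": [0.4, -0.2, 0.9],    # changes direction during the cultivation
    "gene_c": [-1.0, -0.5, -0.3],  # lower in clone A throughout
}

print(consistently_upregulated(toy_data))
```

A real analysis would add a statistical test and a minimum effect size before calling a gene differentially expressed; the sketch only captures the "over the entire cultivation period" condition.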


With this information, we selected three genes based on their function in sulfatase activation and transport. In addition, four genes described in the literature as involved in sulfatase activation and secretion were selected, and all seven genes were cloned for co-expression with arylsulfatase A (ASA). ASA was selected for the co-expression since we hypothesized that genes involved in the expression of one class of sulfatase will be useful for all 17 classes of sulfatases, due to the conserved nature of sulfatases. Furthermore, ASA is one of the most difficult-to-express sulfatase enzymes, and there is thus a great need for its improvement. All seven co-expression studies were performed, and ASA was purified and its activity measured with a colorimetric assay. Of the genes selected based on a previously described link to sulfatase production, only two showed an increase in expression of active product (SUMF2 and PDIA1). However, all genes selected from the transcriptomic analyses showed a large increase in specific activity, ranging from a three-fold increase all the way up to a 150-fold increase, figure 5.3.

Figure 5.3 A 150-fold increase in specific activity was achieved for the product gene co-expressed with a co-factor from the transcriptomic analyses. (A) Co-expression performed with genes selected from transcriptomic analyses. All three genes co-expressed with ASA showed an increase in specific enzyme activity. The highest increase measured, 150-fold compared to control, was with SUMF1 co-expression. (B) Co-expression with genes selected for their described involvement in sulfatase activation. PDIA1 and SUMF2 showed an increase in activity of the purified ASA (47% and 56%).

Interestingly, for M6PR co-expressions, the titer levels decreased significantly. This was also observed for the clone A sulfamidase production. We therefore hypothesized that the capacity of the sulfatase activation process within the ER becomes limiting at the high translation rate driven by the high transcription rate of a CMV promoter. This was evaluated through expression of ASA with three different promoters. Relative to CMV, the selected promoters had an expression level of 0.35 for the mouse phosphoglycerate kinase 1 (PGK) promoter and 0.18 for the human ubiquitin C (UBC) promoter. Expression from weaker promoters did affect the activity of ASA, and PGK promoter-driven expression increased the relative activity 4-fold, figure 5.4.


Figure 5.4 A reduction of promoter strength increased the specific activity of ASA 4-fold. Expressing the recombinant ASA at a lower rate gave different activity patterns. ASA expressed from a PGK promoter had 4 times higher specific activity than ASA expressed from CMV. The amount of purified active ASA is then 2.4 times higher when a PGK promoter is used compared to CMV.
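The two numbers in the caption can be related to each other with a short calculation. The Python sketch below assumes that the amount of active product equals titer multiplied by specific activity, which is an interpretation and not something stated in the text; under that assumption, the reported values imply the titer obtained with PGK relative to CMV.

```python
def implied_relative_titer(relative_active_amount, relative_specific_activity):
    """Relative titer implied by: active amount = titer x specific activity."""
    if relative_specific_activity <= 0:
        raise ValueError("relative specific activity must be positive")
    return relative_active_amount / relative_specific_activity


# Values reported for the PGK promoter relative to CMV (figure 5.4).
pgk_relative_titer = implied_relative_titer(
    relative_active_amount=2.4,
    relative_specific_activity=4.0,
)
print(f"Implied PGK titer relative to CMV: {pgk_relative_titer:.2f}")
```

Under this assumption the weaker promoter yields less protein in total (about 0.6 of the CMV titer), yet more active enzyme, because the gain in specific activity outweighs the loss in titer.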

In this project we showed that when clones with high similarity deviate in protein expression, gene-level differences that correspond to the difference in protein expression can be identified through RNA sequencing. Such deviations are common during single-clone isolation due to the high heterogeneity of mammalian production cells, especially CHO and HEK cells. With experimental processes that enable low variability, reliable transcriptomic information can be extracted. Gene information that can be linked to expression differences is a valuable tool for cell line engineering through gene downregulation or upregulation. However, here we do not only state genes of interest for co-expression; we show through cultivations with the selected genes that all of the genes identified in the transcriptomic study increase the expression of active product. Furthermore, we identified a correlation between transcript level, protein expression level, and protein activity level. This could be mimicked for another sulfatase class through expression with a weaker promoter, generating more usable product.


Paper II - Systematic use of synthetic 5'-UTR RNA structures to tune protein translation improves yield and quality of complex proteins in mammalian cell factories

Manipulation of gene expression is an important tool in systems biology for optimizing protein expression capabilities in different expression platforms. In many cases, high transcription and translation rates are desirable and closely connected to achieving high titers. However, for difficult-to-express human proteins and artificial protein fusions, regulating the expression of the protein and of helper proteins can have a beneficial effect on protein expression169,170,171,172,173. Furthermore, in cell line optimization, it will be important to be able to precisely control the expression of helper genes or to balance new genetic modules and complex genetic networks174,175. As mentioned in chapter 3, there are many different ways to engineer the expression of the recombinant gene. However, to increase the likelihood that the regulation has a similar effect in different cell types, it can be wise to direct it at the translational step, which is conserved across organisms.

Figure 5.5 RgE elements outline. (A) Hairpin formation upstream of the gene of interest slows the pace of translation. (B) The panel enables fine-tuning of the balance between protein titer and protein quality.


In mRNA sequences there are secondary structure elements within the 5’-UTR that are known to impact translation rates176,177,178,179,180. For both mammalian cells and yeast cells it has been reported that the GC-content and position of hairpins within the 5’-UTR affect protein expression181,182,183. With this knowledge, we designed a set of RNA hairpins, or regulation elements (RgEs), and characterized their effect on protein expression in the two most common mammalian cell lines, CHO and HEK, figure 5.5. In total, 25 different RNA secondary structures were designed with differences in thermodynamic stability, GC-content, and position in the 5’-UTR. These were then evaluated for their ability to produce recombinant proteins in the two production hosts with the different RgEs in the 5’-UTR. The panel of RgEs spanned an expression range from 2-5% up to 110% of the unregulated CMV control. A high similarity between the expression hosts (R2 = 0.95) was also shown, figure 5.6.
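Two of the design variables named above, GC-content and the hairpin structure itself, can be made concrete with a short Python sketch. It builds a stem-loop by appending a loop and the reverse complement of the stem, and reports the GC-content of the stem. The stem sequences and the GAAA loop are arbitrary illustrative choices and are not the RgE sequences from paper II; thermodynamic stability is not computed here, since that requires dedicated RNA folding software.

```python
# Watson-Crick complements for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def gc_content(rna):
    """Fraction of G and C bases in an RNA sequence."""
    rna = rna.upper()
    if not rna:
        raise ValueError("sequence must not be empty")
    return (rna.count("G") + rna.count("C")) / len(rna)


def make_hairpin(stem, loop="GAAA"):
    """Build a stem-loop: 5' stem, loop, then the reverse complement of the stem."""
    stem = stem.upper()
    return stem + loop.upper() + reverse_complement(stem)


# Hypothetical stems with low and high GC-content.
for stem in ("AUAUGCAU", "GGCGCCGC"):
    print(make_hairpin(stem), f"stem GC-content {gc_content(stem):.2f}")
```

In general, longer and more GC-rich stems form more stable hairpins, which is the kind of property a panel of regulation elements can vary systematically.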

Figure 5.6 RgE expression in two different host systems. The 25 generated RgEs display a broad and linear panel of expression levels in both CHO (A) and HEK (B). A high correlation between the two cell systems is achieved (C).


Next, different RgEs were tested for gene engineering purposes in two different cases. First, RgEs 4, 3, 13, 11, 2, and 6, ranging in relative expression from 5 to 110%, were used to find the optimal HC to LC expression ratio for the production of the monoclonal antibody trastuzumab. This serves as a good proof-of-principle since it is well known that HC expression levels should be reduced for optimal IgG assembly184,185,186. With the RgEs upstream of the HC, an optimal expression ratio was seen with RgE 3 (35%), giving a 12.4-fold increase of purified full-size antibody compared to fully CMV-driven HC expression. Furthermore, an RSV promoter was placed upstream of the HC as a weaker promoter control, corresponding to roughly 85% of the CMV expression level. However, this decrease in expression showed no beneficial effect on trastuzumab expression, figure 5.7. This illustrates the importance of a fine-tuned expression panel, which can reveal an optimal expression level that would otherwise be difficult to find with promoter panels.

Figure 5.7 Trastuzumab production with five different RgE elements. Reducing expression of trastuzumab HC enables up to a 12-fold increase in full mAb assembly. The regulation panel shows the importance of fine-tuning expression levels (A). At the optimal expression level, protein aggregates are almost completely abolished (B).


Secondly, seven RgEs (22, 4, 24, 3, 9, 2, and 6) were used as an expression panel to elucidate the optimal expression of a helper gene in the production of active ASA. Expressing helper genes at the optimal level is of great interest since one does not want to waste cellular resources on unnecessarily high expression174. As illustrated in paper I, SUMF1 improves the expression of active ASA when co-expressed in CHO cells. Here, the same co-expression experiments were conducted but with RgEs placed upstream of the SUMF1 gene. Reducing the expression level of SUMF1 down to 0.4-fold relative to CMV expression did not affect the specific activity, figure 5.8 A, meaning that 60% of the helper gene expression is unnecessary for assisted expression of ASA. At lower levels of SUMF1 expression, a sharp decline occurs, and the optimal helper gene expression can in this way be identified. This is clearly illustrated by the correlation between increasing SUMF1 expression and specific activity below 0.4-fold expression (R² = 0.76), and the absent correlation for SUMF1 expression levels above 0.4-fold (R² = 0.002), figure 5.8 B.
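The threshold analysis described above can be sketched in a few lines of Python: the data points are split at a chosen expression level and R² is computed separately on each side. The function names and the data values below are hypothetical placeholders for illustration; only the 0.4-fold threshold is taken from the text.

```python
from statistics import mean


def r_squared(x, y) -> float:
    """Squared Pearson correlation; returns nan if either series is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy ** 2 / (sxx * syy)


def split_r_squared(expression, activity, threshold):
    """Return (R^2 below threshold, R^2 at or above threshold)."""
    pairs = list(zip(expression, activity))
    low = [(e, a) for e, a in pairs if e < threshold]
    high = [(e, a) for e, a in pairs if e >= threshold]
    r2_low = r_squared([e for e, _ in low], [a for _, a in low])
    r2_high = r_squared([e for e, _ in high], [a for _, a in high])
    return r2_low, r2_high


# Placeholder values (not measured data): helper gene expression relative to
# CMV, and specific activity of the co-expressed enzyme in arbitrary units.
helper_expression = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.80, 1.10]
specific_activity = [10, 22, 41, 58, 80, 79, 81, 80]

r2_low, r2_high = split_r_squared(helper_expression, specific_activity, 0.4)
print(f"R^2 below threshold: {r2_low:.2f}, at or above threshold: {r2_high:.2f}")
```

A strong correlation below the threshold together with a weak one above it is the pattern that marks the point where additional helper gene expression no longer pays off.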

Figure 5.8 RgE expression of a helper gene. Increasing the SUMF1 helper gene expression has a positive impact on sulfatase expression up to a certain point. After RgE 3, no further positive effect is gained (A). This is also clearly visualized by the correlation between expression level and specific activity (B).

This shows that our method for fine-tuning protein levels with RNA hairpins is a useful tool for recombinant protein production in mammalian cells. We are optimistic that this toolbox for predictable control of protein expression can be of great use in many mammalian cell studies in the future.


Paper III - Ancestral lysosomal enzymes with increased activity harbor therapeutic potential for treatment of Hunter syndrome

Iduronate-2-sulfatase (IDS), a lysosomal sulfatase associated with mucopolysaccharidosis type II (MPS II, Hunter syndrome), was studied for increased recombinant activity in paper III. MPS II is one of the few LSDs that have ERT available for treatment of somatic symptoms, in the form of Idursulfase (Elaprase®) and Idursulfase beta (Hunterase®). Although treatment is available, the enzyme levels that pass the blood-brain barrier are too low and thus do not alleviate all symptoms. Furthermore, dosing of ERT is challenging since the enzyme must be kept at a high level in order to reach therapeutic levels within the body. This results in lengthy intravenous dosing and costly production187,188. One way to combat these issues with IDS treatment would be to increase the stability or activity of the enzyme, and thus decrease the required drug dose. In this study, ancestral sequence reconstruction (ASR) is used to engineer the sequence of a recombinant IDS. Previously, it has been shown that ASA activity correlates with the evolution of rodents189. Based on this information, we explored the ancestral space of IDS between primates and rodents with the intent to identify ancestral versions of IDS that have an elevated enzymatic capability. With recombinant human IDS (rhIDS) and recombinant murine IDS (rmIDS) as starting points, a maximum likelihood tree was constructed. From this, three nodes were selected as ancestral IDS versions (IDS-A1, IDS-A2, IDS-A3), figure 5.9 A. Between the last common IDS ancestor (IDS-A3) and modern human IDS, only 20 residues differed. Of these residues, all but two (A354T and H356R) were located on the surface, figure 5.9 B. Because of their close proximity to the catalytic residues, these two mutations were included in the study both individually and in combination. Furthermore, position 329 in IDS-A1 and IDS-A3 had close to equal probabilities for valine and isoleucine. Therefore, both were included for each ancestor, with valine in the default version.

Figure 5.9 Ancestral reconstruction of IDS. (A) A maximum likelihood tree of rodents and primates identifies three nodes that were selected for ancestral reconstruction. (B) Structural view of the 20 residues that differed between A3 and modern IDS.


The genes were cloned and expressed in CHO cells, and the produced sulfatase was purified and measured for enzymatic activity and percentage of active enzyme. Two of the ancestral versions, IDS-A2 and IDS-A3, had a higher activity than rhIDS, figure 5.10 A. However, to be certain that this increase was not due to a higher percentage of active enzyme, an additional co-expression with SUMF1 was conducted. Since SUMF1 is the enzyme that generates the active sulfatase, equal activation levels would enable us to determine whether the ancestral activity was genuinely higher and not merely the result of an increased activation percentage. Upon co-expression, all IDS variants reached almost 100% active enzyme, and IDS-A3 retained its 2-fold higher activity, confirming the advantage of this ancestral version, figure 5.10 B.

Figure 5.10 Enzymatic activity of IDS variants. (A) A twofold increase in enzymatic activity was observed for IDS-A3. (B) Confirmation of the increased activity of IDS-A3 through co-expression with SUMF1. Since co-expression increased the percentage of active enzyme for all IDS variants, conclusions on the relative activity of each variant could be drawn.

Next, the different IDS variants were characterized on MPS II patient fibroblasts, both for enzyme uptake and for substrate consumption. IDS-A3 showed the highest substrate consumption but the lowest cellular uptake, figure 5.11. The low cellular uptake could potentially stem from changes in the protein shape or surface. In an attempt to identify protein deviations, a series of experiments was conducted on size, glycosylation, and receptor uptake abilities. However, no clear deviations that could explain the different cellular levels were identified. In this study, ASR methods were used to engineer the rhIDS gene, and a 2-fold increase in enzymatic activity was achieved. Furthermore, the ancestral version had higher substrate consumption than the modern versions. From a therapeutic perspective, this has great potential, since increasing the enzymatic capabilities of this difficult-to-express therapeutic could both increase substrate clearance in patients and reduce administration times and production costs.


Figure 5.11 MPS II fibroblast characterization. (A,B) Intracellular concentrations of the IDS variants showed a decreased uptake of the IDS-A3 variant. Although present at lower levels, IDS-A3 consumed the most intracellular substrate.


Paper IV - ULK1 knockout cell line downregulates autophagy, upregulates recombinant transcript and improves protein secretion

In paper I, cell line engineering was directed towards the expression of a specific recombinant protein. For this project, a more general approach was taken to improve the recombinant expression capabilities of a HEK cell line. As discussed in chapter 3, there is a constant need to improve the recombinant protein expression capabilities of mammalian cell hosts. In order to identify genes that limit recombinant protein expression over a broad range of proteins, a small compound library screen of 19’000 compounds identified ULK1 as a limitation for recombinant expression of three different model proteins190. In paper IV, the establishment of a stable ULK1 knockout cell line was followed by a transcriptomic analysis to evaluate the cellular effects. A dual CRISPR/Cas9-guided cut of the endogenous ULK1 gene was performed in both a recombinant-expressing cell line and a wt control cell line. PCR analyses and recombinant expression patterns verified the ULK1-deficient cells, which showed the same increase in Cripto-Fc (model protein) as seen with the small compound inhibition, figure 5.12.

Figure 5.12 Generation and verification of ULK1 deficient cell line. For generation of the ULK1 deficient ExpiHEK cell line, CRISPR/Cas9 was utilized for gene editing of the ULK1 gene. (A) Two guide RNAs were designed to bind at the end of exon 1 and the start of intron 1. (B) Viable cell density and doubling time of the knockout cells showed normal viability and proliferation rates. (C) Comparing ULK1 knockout cells with wt cells showed a 1.4-fold increase of Cripto-Fc in the knockout cell line with stable Cripto-Fc expression and a 3-fold increase of transient Cripto-Fc expression as compared to wt ExpiHEK cells.


Furthermore, expression studies on the stable ExpiHEK293 ULK1 KO (Vulko) cell line verified that ULK1 could be directly linked to recombinant protein expression levels. Cripto-Fc Vulko cells showed lower levels of recombinant protein expression upon the addition of recombinant ULK1, and high expression levels when ULK1 was edited out, figure 5.13.

Figure 5.13 Flow cytometry analysis of recombinant protein expression verified ULK1 as the common denominator for expression levels. Removal or addition of ULK1 generated the same effect on protein expression level. For Expi293 cells, recombinant expression increased upon the addition of ULK1 inhibitor, and transient over-expression of ULK1 did not affect production levels as compared to unmodified cultivations. For Vulko293 cells, recombinant expression was 100% higher as compared to control. Furthermore, when transient ULK1 was added, recombinant levels decreased to the same level as control. For both knockout of ULK1 and inhibitor addition, protein production of Cripto-Fc increased to 3.4-fold as compared to control.

To further understand the increase seen in the Vulko cells, RNA was purified from all cultivations from figure 3 and sequenced for mRNA levels. The transcriptomic data was then processed and analyzed for gene expression differences. To get an overview of the transcript deviations between the samples that showed the largest differences in recombinant expression, Cripto-Fc Expi samples were compared against Cripto-Fc Vulko inhibitor samples. For a more accessible summary of gene deviations, a REVIGO (GO) summary list was used. Semantic similarity measurements on the top 40 GO terms identified several RNA- and transcriptome-related processes, figure 5.14 A. This led us to investigate RNA levels further, and a correlation between Cripto-Fc mRNA levels and protein expression levels was identified, figure 5.14 B. Furthermore, there is a negative correlation between ULK1 transcript levels and Cripto-Fc levels, figure 5.14 C.


Figure 5.14 REVIGO shows upregulation of DNA and RNA processes in wildtype Cripto-Fc Expi293. The top 40 GO terms are sorted with a semantic similarity measurement (A). In the largest subset, GO1, several RNA- and DNA-related processes are involved. Also, RNA catabolic processes emerge from GO3, displaying large differences when ULK1 is removed from the gene set of the cell. Transcript RNA for the recombinant Cripto-Fc increases in correlation with protein expression (B). Furthermore, Cripto-Fc transcript RNA increases as ULK1 transcript levels decrease (C).


Hence, modifying ULK1 levels clearly affects recombinant transcript levels in the cell. GO pathways for the regulation, control, and pace of transcription processes are also affected. For instance, mRNA 3’ end processing and RNA catabolic processes are involved in mRNA retention levels within the cell. One possible link between ULK1 expression levels and transcriptomic processes can be found in the function of ULK1 within the cell. ULK1 is the key initiator of autophagy, a large degradation process that maintains cellular homeostasis during nutrient deprivation or cellular stress. To further evaluate the effect of the ULK1 knockout, expression of all genes in the gene ontology term autophagy (GO:0006914) was examined for Cripto-Fc producing cells with or without the addition of ULK1 inhibitor, for both the knockout cell line and the native cell line, figure 5.15. Some of the most upregulated genes in the native Cripto-Fc HEK293 cell line (ATG4B, ATG2A, ATG9A, ATG16L1, and ATG16L2) are part of the autophagy process. Furthermore, subunits involved in autophagy formation, such as TCIRG1 and RAB1B, are also among the genes with the highest fold change. For the ULK1 knockout Cripto-Fc cell line, genes related to signaling pathways for the initiation of autophagy were more highly expressed. The upregulated CD16L1, XBP1, SLC1A3, SLC1A4, PHYHIP, STK32A, and SESN2 are factors, or targets of factors, that can be linked to signaling pathways that induce autophagy. Furthermore, genes for ULK1-independent autophagy (TMEM74 and VPS37A) were more highly expressed in the ULK1 knockout cell191,192. When ULK1 is removed from the cell, major differences occur in the regulation, control, and pace of transcription processes. mRNA 3’ end processing is of particular interest since this process involves the polyadenylation of newly transcribed mRNA. Differences in these processes may in part explain the increased recombinant transcript levels in ULK1-deficient cultivations. Furthermore, RNA catabolic processes are also downregulated in the Cripto-Fc Vulko inhibitor cultivation, possibly contributing to the higher levels of mRNA in the Vulko cell. One possible explanation for the downregulation of processes involved in transcript processing in the Vulko cell line could be the linkage between transcription factors and transcript control during the induction of autophagy. For example, SLC1A4 and SLC1A3 can both be linked to autophagy and to nonsense-mediated mRNA decay (NMD), a control pathway for transcripts under normal cell conditions. Upon cell starvation, however, NMD is downregulated as the cell tries to initiate autophagy. There is therefore a possibility that, as the cell attempts to initiate autophagy, less control is exerted on mRNA within the cell193,194. In this project, we have shown that knockout and inhibition of ULK1 increase protein expression of Cripto-Fc by up to 3-fold. Removing ULK1 from the cell affects cell homeostasis through downregulation of autophagy. Furthermore, transcription processes are affected upon the removal of ULK1, and a potential linkage between ULK1 and transcription control is presented. Hopefully, this novel cell line has the potential to increase protein expression for several different types of recombinant proteins in the future.


Figure 5.15. Differential expression analysis of the autophagy gene ontology set shows downregulation of macroautophagy in the knockout cell line. Annotated genes have the highest fold change between Cripto-Fc Expi293 and Cripto-Fc Vulko293 inhibitor. Upregulated genes in Cripto-Fc Vulko293 inhibitor cells are linked to transcription factors for autophagy initiation (XBP1, SLC1A3, SLC1A4, SESN2, CD16L1) and to ULK1-independent autophagy (TMEM74, VPS37A). Downregulated genes in Cripto-Fc Vulko293 inhibitor cells are linked to autophagy formation and elongation (ATG16L2, NHLRC1, ULK1, ATG16L1, TCIRG1, ATG2A, RAB1B, ATG9A, ATG4B).


Paper V - Chromophore pre-maturation for improved speed and sensitivity of split-GFP monitoring of protein secretion

As described in chapter 1, the GFP molecule can be divided into parts, and fluorophore maturation takes place as soon as the divided GFP sections re-associate. This can be used in screening platforms where the 11th beta strand is fused to the POI and GFP1-10 is added to evaluate the production capabilities of each single clone. Attaching only the 11th beta strand decreases the likelihood that the tag affects the production of the POI. Previously, a split-GFP complementation assay accurately measured relative titers for clonal selection purposes195. However, when split-GFP is used, there is a lag time while the reassembled whole GFP molecule undergoes the maturation cyclization that forms the fluorophore. This is a slow process that limits fast detection of expression levels for each isolated clone. Therefore, in paper V, we developed an enhanced version of GFP 1-10 where the maturation process takes place before complementation with the GFP11 POI fusion. If a fluorescent GFP molecule is disassembled, the fluorescence can be regained if it is allowed to reassemble196. Based on this knowledge, we proposed that if GFP 1-10 was allowed to form the matured fluorophore (GFP 1-10mat), it would generate a fluorescence signal much more quickly upon reassembly with GFP11, which could be used for cell line development purposes.

For generation of GFP 1-10mat proteins, a protocol for production of pre-matured GFP1-10 variants was developed, figure 5.16. The protocol allows split-GFP assembly between soluble GFP1-10 and GFP11 immobilized on Sepharose beads. After complete fluorophore maturation within the GFP 1-10, the matured fractions were eluted from the solid-support GFP11 through acidic disassembly and subsequently kept at neutral pH.

Figure 5.16 Protocol for generation of matured GFP1-10. (1) GFP 1-10 is added to a pool of immobilized GFP11. (2) The full GFP molecule reassembles and (3) maturation of the fluorophore takes place. (4) Acidic conditions elute the matured GFP 1-10. (5) Upon reassembly with fused GFP11, matured GFP 1-10 will generate fluorescence much faster since the maturation process has already taken place.


Upon fluorophore maturation, one oxygen and four or five hydrogen atoms are lost (different maturation theories give rise to differences in mass change)45. This mass shift would, however, enable us to investigate through mass spectrometry (MS) whether the maturation had succeeded. A 20.77 Da mass shift confirmed maturation cyclization of GFP 1-10mat, and no residual GFP 1-10 was seen in the MS spectra, figure 5.17.
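The expected mass loss for the two maturation scenarios can be checked with simple arithmetic. The short Python sketch below uses standard average atomic masses (O ≈ 15.999 Da, H ≈ 1.008 Da) and compares the result with the reported shift; the function name is chosen for this example only.

```python
# Standard average atomic masses in Da (rounded to three decimals).
MASS_O = 15.999
MASS_H = 1.008


def expected_mass_loss(n_oxygen: int, n_hydrogen: int) -> float:
    """Mass (Da) removed when the given numbers of O and H atoms are lost."""
    return n_oxygen * MASS_O + n_hydrogen * MASS_H


observed_shift = 20.77  # Da, the mass shift reported in the text

for n_h in (4, 5):
    expected = expected_mass_loss(1, n_h)
    deviation = observed_shift - expected
    print(f"1 O + {n_h} H: expected {expected:.3f} Da, "
          f"observed minus expected {deviation:+.3f} Da")
```

Loss of one oxygen with four or five hydrogens corresponds to about 20.03 Da and 21.04 Da respectively, so the observed shift lies between the two theoretical values.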

Figure 5.17 MS data on matured GFP 1-10. Since the cyclization process removes several atoms from the GFP 1-10 molecule, a mass shift could be detected, thereby confirming the maturation of the split-GFP variant. (A) GFP before maturation. (B) GFP after maturation.

Next, a head-to-head comparison between GFP 1-10 and GFP 1-10mat showed that the matured version did indeed emit fluorescence almost instantly. At the first measurement, two minutes after addition of GFP11, GFP 1-10mat had reached more than 10’000 arbitrary units (a.u.), with a fluorescence increase of 80 a.u./s during the initial time period. Over the same period, the GFP1-10 control increased by 0.5 a.u./s, meaning that the matured version was roughly 150 times faster in giving a signal, figure 5.18 A. The rapid increase of the GFP 1-10mat signal slowed down over time, presumably because fewer GFP 1-10mat molecules that had yet to bind GFP11 remained in the solution. Besides the very rapid fluorescent signal formation of GFP 1-10mat, the total fluorescent signal was almost four times higher than for the unmatured control. We hypothesize that this results from the selection of functional GFP1-10 during the maturation process, since GFP1-10 that does not bind to Sepharose-bound GFP11 during maturation will be washed away before the elution of GFP 1-10mat.
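One way to estimate an initial rate such as those quoted above is a least-squares slope over the first time points of the fluorescence trace. The Python sketch below illustrates this under the assumption of evenly sampled readings; the traces are placeholders constructed to reproduce the quoted rates of 80 and 0.5 a.u./s and are not the measured data, and the fitting procedure is not necessarily the one used in paper V.

```python
def initial_rate(times_s, signal_au):
    """Least-squares slope (a.u./s) of fluorescence signal versus time."""
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_s = sum(signal_au) / n
    num = sum((t - mean_t) * (s - mean_s) for t, s in zip(times_s, signal_au))
    den = sum((t - mean_t) ** 2 for t in times_s)
    return num / den


# Placeholder traces (not measured data): time in seconds, signal in a.u.
times = [0, 30, 60, 90, 120]
matured_trace = [0, 2400, 4800, 7200, 9600]
control_trace = [0, 15, 30, 45, 60]

rate_matured = initial_rate(times, matured_trace)
rate_control = initial_rate(times, control_trace)
print(f"matured: {rate_matured:.1f} a.u./s, control: {rate_control:.1f} a.u./s, "
      f"ratio: {rate_matured / rate_control:.0f}")
```

With the rounded rates quoted in the text the ratio evaluates to 160, which is of the same order as the approximately 150-fold difference reported.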


Next, we wanted to test if GFP 1-10mat would be beneficial for selection of production clones. For this, three stable CHO cell lines that produced a known amount of GFP11-fused EPO were evaluated: a high producer (1.5 pg/cell/day), a medium producer (0.2 pg/cell/day), and a low producer ("efficiency too low to measure"). With GFP 1-10mat as substrate, the high-producing clone could instantly be separated from the other two cell lines. After 2.5 hours, the medium-producing clone could also be separated from the low-producing clone. For the unmatured GFP1-10 control, no separation between the three clones could be made during the entire experiment (7 hours), figure 5.18 B,C.

Figure 5.18 Matured GFP 1-10 showed greater fluorescence ability. (A) Upon the addition of an excess of GFP11, the matured version of GFP 1-10 initially had a 150-fold greater fluorescence accumulation. (B,C) This had a great practical benefit for the identification of three different CHO production cell lines. With the matured version, an instant identification of the high production cell line was possible, and after some time, the medium production cell line could be identified as well. For the unmatured GFP 1-10 variant, no differentiation between the cell lines was possible after 7 hours.

Here we have presented a protocol to mature GFP1-10 for faster fluorescence signal detection of GFP11 fused proteins. Hopefully, this can be very useful for screening of single clones during cell line development of recombinant protein production hosts. Including GFP 1-10mat in droplet screening methods would be particularly useful since shorter incubation times could increase the throughput of the screening method.


Paper VI - Mammalian cell surface display system for conformational epitope determination of human SARS-CoV-2 neutralizing antibodies

Mammalian cells have a unique position as a production host due to the secretory pathway and the human-like PTMs that occur in this protein expression system. Many parts of this thesis touch upon the importance of this system for the expression of proteins for therapeutic purposes. However, the expression system can also benefit our understanding of the mode of action of the most important therapeutic class. As thoroughly presented in chapter 4, understanding the epitopes of both therapeutic and non-therapeutic antibodies has many advantages. For the most accurate epitope determination, alanine scanning mutagenesis on whole protein domains is the preferred choice. In paper VI, we present an epitope mapping technique that not only uses scanning mutagenesis on whole protein domains but also allows the antigen to be produced in a CHO cell, with the aim of generating an antigen as close as possible to the human-expressed version. Since alanine scanning mutagenesis is a time-consuming endeavor, several steps were taken to reduce the laborious task of generating a single alanine mutation library. First, when a crystal structure of the antigen is available, residues that are not exposed on the surface can be excluded from the mutation library through calculation of their relative solvent accessible surface area (RSA). After the residues for alanine substitution have been selected, they need to be mutated and cloned into the expression vector. A high-throughput method for single mutant generation, developed by my predecessors, enabled convenient construction of the mutational library. For an even easier mutation process, a web-based tool for mutational primer design (kozane) was developed, available at www.kozane.app. Lastly, with the antigen attached to the CHO cell surface via a GPI-anchor, flow cytometry was used to assess the antibody binding towards each individual mutant. Through normalization against an HA-tag that reports surface expression, quantification of antibody affinity enables the epitope to be determined, figure 5.19. For this project, a sizeable amount of time has been invested in the process outlined above, and several different proteins have been successfully presented on the surface of the cell. But when severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) hit the globe in 2020, we were given the opportunity to map novel antibody binders towards SARS-CoV-2. Located on the surface of the virus is a protruding spike protein that enables uptake into human cells via angiotensin-converting enzyme 2 (ACE2) receptors on the human cell. The location and function of this protein have made it central to all research on vaccine development and treatment of SARS-CoV-2 infection197. Due to the enormous size of spike (1300 residues per monomer), we decided to map a smaller domain of the protein. Spike is composed of two subunits, S1 and S2. The S1 subunit is subdivided into an N-terminal domain and a receptor binding domain (RBD). The RBD is the site at which SARS-CoV-2 binds ACE2, and is therefore the most critical part of the virus. Therefore, the RBD was cloned into our surface display vector, and after confirmation of surface expression, the residues to be included in the epitope mapping were selected.
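Two steps of the workflow above lend themselves to a short illustration: selecting surface-exposed residues for alanine substitution from RSA values, and normalizing the antibody signal against the HA-tag signal. The Python sketch below is a minimal example under stated assumptions. The RSA cutoff of 0.2, the function names, the residue numbering offset, and all numerical values are hypothetical and are not taken from paper VI or from the kozane tool.

```python
def alanine_scan_mutations(sequence, rsa, rsa_cutoff=0.2, first_position=1):
    """List substitutions (e.g. 'K1A') for exposed, non-alanine residues.

    rsa holds one relative solvent accessibility value (0 to 1) per residue.
    The cutoff is an assumed example value, not the one used in the thesis.
    """
    mutations = []
    for position, (residue, accessibility) in enumerate(
        zip(sequence, rsa), start=first_position
    ):
        if residue != "A" and accessibility >= rsa_cutoff:
            mutations.append(f"{residue}{position}A")
    return mutations


def relative_binding(ab_signal, ha_signal, wt_ab_signal, wt_ha_signal):
    """Antibody signal normalized to HA-tag surface expression, relative to wild type."""
    return (ab_signal / ha_signal) / (wt_ab_signal / wt_ha_signal)


# Hypothetical four-residue stretch with placeholder RSA values.
print(alanine_scan_mutations("KAGT", [0.5, 0.9, 0.1, 0.3]))

# Placeholder flow cytometry signals (arbitrary units) for one mutant and wild type.
print(relative_binding(ab_signal=50, ha_signal=100, wt_ab_signal=100, wt_ha_signal=100))
```

A relative binding well below 1 for a mutant with normal HA-tag signal would point to a residue that contributes to the epitope, whereas a low HA-tag signal alone indicates reduced surface expression rather than loss of binding.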


Figure 5.19 Experimental workflow for mammalian cell display. CHO cell display presents a fully post-translationally modified protein on the cell surface for detailed epitope determination. (1) When a crystal structure is available, surface-exposed residues can be selected in order to reduce the number of residues that need to be mutated to alanine. (2) For a quick cloning process, correctly designed primers are crucial for high cloning success. Here, a web-based primer design tool is used to obtain all the primers needed for creating the epitope mapping library. (3) The SAMURAi automated cloning system is a 96-well-based cloning process that generates the epitope library in a less tedious fashion. (4) Cell display systems enable the use of a flow cytometer for affinity assessment. Flow cytometry scanning is both faster and more sensitive than ELISA-based affinity assessments, ensuring that the correct epitope is identified. (5) Combining the antibody affinities for all alanine-substituted residues within the epitope library outlines an epitope where the importance of each amino acid involved in the antibody binding is accessible.

As a proof-of-concept reference, the antibody CR3022 was used since a crystal structure of its binding to RBD was available. When comparing the two determined epitopes, it is clear that the same region has been identified as the binding site of the antibody. Furthermore, the contribution of each individual amino acid residue could be determined, and a detailed view of the epitope revealed that only a few residues are involved in the antibody binding, figure 5.20. This highlights the level of detail provided by alanine scanning mutagenesis. Next, three novel antibodies towards RBD were mapped to determine their respective epitopes. For each antibody, a clear epitope consisting of a few amino acid residues was identified. For MO137-317, the antibody binding area competes with the binding area of ACE2. For MO137-301 and MO137-156, the same epitope region was identified, figure 5.21. These results were later confirmed in a Homogeneous Time Resolved Fluorescence (HTRF) assay, providing further evidence that all the epitopes mapped with the CHO cell surface display method were correct. With the protocol developed for generating alanine-substituted mutant libraries for epitope mapping in a mammalian cell display system, we hope that more detailed information on glycosylated antigens, such as RBD, will be obtained through the method presented in paper VI, especially for antigens that might be hard to produce in other platforms.


Figure 5.20 Epitope determination for CR3022. Epitope comparison between the cryogenic electron microscopy (cryo-EM) structure and alanine-scanning mutagenesis with mammalian cell surface display. The light pink area is the epitope determined for CR3022 with cryo-EM, yellow ribbons are CR3022 light-chain CDR-loops, and brown ribbons are CR3022 heavy-chain CDR-loops. The cyan area is the ACE2 binding site for SARS-CoV-2. (A,B) Visual representation of the cryo-EM epitope in two different orientations (0° and 110° rotation on y-axis). (C,D) Visual representation of the alanine-scanning mutagenesis epitope in two different orientations (0° and 110° rotation on y-axis). The light pink area is the reference epitope determined via cryo-EM, same as in A,B. Dark pink and green areas are the epitope for CR3022 determined via alanine-scanning mutagenesis. Dark pink residues result in a decreased binding affinity when mutated to alanine, and green residues result in an increased binding affinity when mutated to alanine. The presented epitope aligns well with the existing reference structure determined via cryo-EM. Furthermore, hot-spot residues are identified, and the variation in binding affinities identifies H2 as the paratope CDR in closest proximity to the region where the highest binding affinity exists.


Figure 5.21 Epitope determination of MO137-317, MO137-156, and MO137-301 shows two different epitopes on SARS-CoV-2. Dark pink residues are the epitope determined for each antibody through alanine-scanning mutagenesis. (A,B) Visual representation of the MO137-317 epitope in two different orientations (110° and 290° rotation on y-axis). The epitope mapping reveals that the MO137-317 binding area competes with the ACE2 binding site on SARS-CoV-2. (C,D) Visual representation of the MO137-156 epitope in two different orientations (110° and 290° rotation on y-axis). The epitope mapping reveals that the MO137-156 binding area competes with the MO137-301 binding site on SARS-CoV-2. (E,F) Visual representation of the MO137-301 epitope in two different orientations (110° and 290° rotation on y-axis).


Conclusion remarks and future perspective

Mammalian protein expression for next generation biologics is an advancing field. Each year, new market share reports attest to the financial success of recombinant proteins produced in mammalian cells198. More importantly, behind the financial success of protein therapeutics are millions of people whose lives have benefited enormously from the advancements made within this field. Thanks to biologics, diseases that were fatal in the past are now curable. This development must continue in order to help patients currently suffering from various diseases. When novel therapeutic proteins arise, they should not be hindered by the production capacity of the host cell. Furthermore, the advancement of new therapeutics comes with increasing treatment costs, and for rarer diseases, such as the LSDs, the cost can be considered too high. Therefore, expanding the scientific knowledge of mammalian production hosts and increasing their production capacity is of utmost importance.

The six papers presented in this thesis all revolve around the production of recombinant proteins in mammalian cells or improvements in the selection of producer cells. Mainly, methods for engineering the mammalian cell line via transcriptomic analyses have been touched upon. Since large production gains have already emerged from bioprocess development, understanding and modification at the gene level are predicted to generate an even greater improvement in recombinant protein production in mammalian cells199. For cell line engineering, production improvements are addressed in two different ways. For the difficult-to-express proteins, such as the sulfatase enzyme, the potential low-hanging fruits of cell line engineering exist. With omics technologies, factors involved in the production can be identified, and if even small production hurdles are overcome, a huge impact can be seen for these difficult-to-express proteins. In paper IV, however, no comparable cell clones were used. Instead, downregulation of a large process within the cell increased the production capacity several-fold. For cell line engineering, it is possible that these two pathways are a natural way to go when dealing with production capabilities. For some recombinant proteins, a conserved process, either an activation process or a unique PTM, is the single most important bottleneck for production. For other recombinant proteins, global processes such as chaperone families, aggregation pathways, glycosylation capabilities, or secretion capacities are the bottleneck, and here cell modifications that have a more universal effect would be possible. Regardless of the production problem, utilizing omics technologies will be of great help, potentially identifying cellular functions that are not necessary for lab-grown protein therapeutics. Recently, interesting approaches were taken towards generating a “clean” production host, in which large amounts of endogenous proteins were removed200.

Besides cell engineering through the addition or removal of genes, a novel method to improve protein expression was outlined in paper II, where a fine-tuned expression panel enabled the optimal expression pace for two different proteins in two separate hosts. As seen in paper I, altering the production pace can have a great impact on the POI. The potential of the toolbox presented in paper II is considerable, especially for regulating many genes within the production host. Since a complete knockout might be too harsh a tool, this fine-tuning ability would be of great importance for many future projects.

With the expansion of protein therapeutics, there is likely a future where a large variety of pharmaceuticals is produced in smaller batches. For this, cheap and effective screening of production clones is essential. Split-GFP has already been proven to be a good marker for selecting high-producing clones. With the matured version presented in paper V, the selection time could be decreased, ultimately leading to a more cost-efficient process.

In paper VI, an epitope mapping technique that utilizes the protein expression benefits of a mammalian cell line is developed. Of all classes of protein therapeutics, antibodies are undoubtedly the most important, and understanding their mode of action has several benefits for drug development. Epitope mapping is not new, but we propose that for the most accurate understanding of antibody-antigen interactions, the antigen needs to closely resemble the native variant present within our bodies. In the future, artificial intelligence (AI) based computer programs might be able to calculate the epitope for every antibody, much as AlphaFold almost does for protein structures. However, for such AI models to be built, the existing epitope and paratope knowledge must be of extraordinary detail, which mammalian alanine scanning mutagenesis could help generate.

Acknowledgments

First of all, I wish to gratefully acknowledge Vinnova, Stiftelsen för strategisk forskning, Knut och Alice Wallenbergs Stiftelse, NovoNordisk Foundation, AstraZeneca, SOBI, and Affibody for their funding of my project via ProNova, CellNova, AAVNova, AdBIOPRO, Wallenberg center for protein research, and NovoNordisk Center for Biosustainability. Without these financial contributions, none of the research presented within this thesis would have been possible.

Spending the last few years as a doctoral student has been nothing less than a dream come true. And when I look back with a solitude-infused nostalgia, I can feel nothing but awe for the scientific community and the incredible discoveries I have been given the opportunity to get a glimpse of. And the feeling that you might have encountered something of importance, although hardly ever true, is an experience that I am very grateful for.

I would like to thank my supervisors Johan, Mathias, and Anna-Luisa for their guidance through my PhD. Johan, thank you for taking me into your group and allowing me to take part in all of our exciting projects. Also, thank you for your enormous enthusiasm for science at large, it is truly contagious! Mathias, thank you for establishing our department and enabling the growth of biotechnology here at KTH, you are an inspiration to many. Anna-Luisa, thank you for taking time with all of my questions and assisting me in the lab, without your guidance I would still not be done.

Although scientific research is a delight, it comes with numerous back-breaking down days. Nevertheless, every day spent in the lab has been a tremendous joy. This is undoubtedly because of the incredible work environment of our department. Therefore, I wish to thank all of my colleagues for both your friendship and help over the years. Without all of you, my grumpiness would have reached unhealthy heights. Furthermore, I would like to direct a special thank you to all of the PIs of the department for encouraging the work environment that we have.

John, thank you for taking the time to read my thesis and giving me such nice feedback!

Thank you to all of my co-authors for making our amazing projects work! Without you, I would not have anything to show for it. I would like to direct a special thank you to Robert at AstraZeneca, Anna at SOBI, Helena at SciLife, Mats at Lund University, Peter, and Natalie for collaboration on our projects.

Magnus, behind every great scientist there is a guy with a computer, thank you for having a computer! For all of my projects you have played a key role; what would cell display be without Kozane, what would ULK1 be without autophagy and what would I be without you?

Sebastian, thank you for taking the time to review my thesis. The last section got a much-needed facelift after your eyes had thoroughly scanned it through. Also, thank you for helping me with my half-time seminar, much appreciated! I am looking forward to your dissertation!

A big thank you to all the colleagues that I have shared offices with over the years. Sadly for all of you, I have been relocated so many times that almost none have been unaffected. Thank you for putting up with me and thank you for all the joy you have brought me! Sara! My bench mate and fellow blabber, never a dull moment! Emma! You will never water my plants again, but for everything else, thank you! Max! My only student, although you outshined me on your first day, I will never view us as equals, never!

Sofia, thank you for helping me with everyday life during the composition of this thesis. For you, the last months have been anything but pleasant, and this book would not have been made without you; for that I am forever grateful.

Lastly, I would like to thank my family and friends. I am especially grateful for all the travels we have spent together. Whether it was a week in Spain or at a ski resort, on a boat in the real archipelago or defying death down Voxnan, greeting Christmas in Copenhagen or hosting a youth leisure event in Trosa, it has been a wonderful recovery far away from any pipette!

Bibliography

1. Darwin, C. On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life. London (1859).
2. Gilbert, W. Origin of life: The RNA world. Nature 319, 618 (1986).
3. Westheimer, F. H. Biochemistry: Polyribonucleic acids as enzymes. Nature 319, 534–536 (1986).
4. Schopf, J. W. & Packer, B. M. Early archean (3.3-billion to 3.5-billion-year-old) microfossils from Warrawoona Group, Australia. Science 237, 70–73 (1987).
5. Margulis, L. & Sagan, D. Microcosmos: Four Billion Years of Microbial Evolution. (University of Massachusetts, 1986).
6. Bernier, C. R., Petrov, A. S., Kovacs, N. A., Penev, P. I. & Williams, L. D. Translation: The Universal Structural Core of Life. Mol. Biol. Evol. 35, 2065–2076 (2018).
7. Ramakrishnan, V. Ribosome structure and the mechanism of translation. Cell 108, 557–572 (2002).
8. Woese, C. R. On the evolution of cells. Proc. Natl. Acad. Sci. U. S. A. 99, 8742–8747 (2002).
9. Petrov, A. S. et al. History of the ribosome and the origin of translation. Proc. Natl. Acad. Sci. U. S. A. 112, 15396–15401 (2015).
10. Hsiao, C., Mohan, S., Kalahar, B. K. & Williams, L. D. Peeling the onion: Ribosomes are ancient molecular fossils. Mol. Biol. Evol. 26, 2415–2425 (2009).
11. Forterre, P., Filée, J. & Myllykallio, H. Origin and Evolution of DNA and DNA Replication Machineries. in The Genetic Code and the Origin of Life 145–168 (Landes Bioscience, 2004).
12. Ezkurdia, I. et al. Multiple evidence strands suggest that there may be as few as 19 000 human protein-coding genes. Hum. Mol. Genet. 23, 5866–5878 (2014).
13. Berg, J. M., Tymoczko, J. L. & Stryer, L. Secondary Structure: Polypeptide Chains Can Fold Into Regular Structures Such as the Alpha Helix, the Beta Sheet, and Turns and Loops. (2002).
14. Šali, A., Shakhnovich, E. & Karplus, M. How does a protein fold? Nature 369, 248–251 (1994).
15. Gott III, J. R. et al. A Map of the Universe. Astrophys. J. 624, 463–484 (2005).
16. Bateman, A. et al. UniProt: The universal protein knowledgebase in 2021. Nucleic Acids Res. 49, D480–D489 (2021).
17. AlQuraishi, M. AlphaFold at CASP13. Bioinformatics 35, 4862–4865 (2019).
18. Daemmrich, A. & Bowden, M. Emergence of pharmaceutical science and industry: 1870–1930. Chem Eng News (2005).
19. Goeddel, D. V., Kleid, D. G. & Bolivar, F. Expression in Escherichia coli of chemically synthesized genes for human insulin. Proc. Natl. Acad. Sci. U. S. A. 76, 106–110 (1979).
20. Morrow, T. & Felcone, L. H. Defining the difference: What Makes Biologics Unique. Biotechnol. Healthc. 1, 24–9 (2004).
21. Crommelin, D. J. A. et al. Shifting paradigms: Biopharmaceuticals versus low molecular weight drugs. Int. J. Pharm. 266, 3–16 (2003).
22. Leader, B., Baca, Q. J. & Golan, D. E. Protein therapeutics: A summary and pharmacological classification. Nature Reviews Drug Discovery 7, 21–39 (2008).
23. Arber, W. & Linn, S. DNA Modification and Restriction. Annu. Rev. Biochem. 38, 467–500 (1969).

24. Mandel, M. & Higa, A. Calcium-dependent bacteriophage DNA infection. J. Mol. Biol. 53, 159–162 (1970).
25. Jackson, D. A., Symons, R. H. & Berg, P. Biochemical method for inserting new genetic information into DNA of Simian Virus 40: circular SV40 DNA molecules containing lambda phage genes and the galactose operon of Escherichia coli. Proc. Natl. Acad. Sci. U. S. A. 69, 2904–2909 (1972).
26. Cohen, S. N., Chang, A. C. Y., Boyer, H. W. & Helling, R. B. Construction of biologically functional bacterial plasmids in vitro. Proc. Natl. Acad. Sci. U. S. A. 70, 3240–3244 (1973).
27. Sanger, F. & Coulson, A. R. A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. J. Mol. Biol. 94, 441–448 (1975).
28. Bartlett, J. M. S. & Stirling, D. A Short History of the Polymerase Chain Reaction. in PCR Protocols 3–6 (Humana Press, 2003).
29. Hutchison, C. A. et al. Mutagenesis at a Specific Position in a DNA Sequence. J. Biol. Chem. 253, 6551–6560 (1978).
30. Charpentier, E. & Doudna, J. A. Rewriting a genome. Nature 495, 50–51 (2013).
31. Hobbs, J. K. et al. Change in heat capacity for enzyme catalysis determines temperature dependence of enzyme catalyzed rates. ACS Chem. Biol. 8, 2388–2393 (2013).
32. Strohl, W. R. & Strohl, L. M. Antibody structure–function relationships. Therapeutic antibody engineering: current and future advances driving the strongest growth area in the pharma industry (Woodhead Publishing, 2012).
33. Alberts, B. et al. B Cells and Antibodies. in Molecular Biology of the Cell (Garland Science, 2002).
34. Sardiello, M., Annunziata, I., Roma, G. & Ballabio, A. Sulfatases and sulfatase modifying factors: an exclusive and promiscuous relationship. Hum. Mol. Genet. 14, 3203–17 (2005).
35. Parenti, G., Meroni, G. & Ballabio, A. The sulfatase gene family. Curr. Opin. Genet. Dev. 7, 386–391 (1997).
36. De Duve, C. The lysosome. Sci. Am. 208, 64–73 (1963).
37. Appelqvist, H., Wäster, P., Kågedal, K. & Öllinger, K. The lysosome: From waste bag to potential therapeutic target. Journal of Molecular Cell Biology 5, 214–226 (2013).
38. Dierks, T., Schmidt, B. & Von Figura, K. Conversion of cysteine to formylglycine: A protein modification in the endoplasmic reticulum. Proc. Natl. Acad. Sci. U. S. A. 94, 11963–11968 (1997).
39. Zhu, Y., Doray, B., Poussu, A., Lehto, V. P. & Kornfeld, S. Binding of GGA2 to the lysosomal enzyme sorting motif of the mannose 6-phosphate receptor. Science 292, 1716–1718 (2001).
40. Schlotawa, L., Adang, L. A., Radhakrishnan, K. & Ahrens-Nicklas, R. C. Multiple Sulfatase Deficiency: A Disease Comprising Mucopolysaccharidosis, Sphingolipidosis, and More Caused by a Defect in Posttranslational Modification. Int. J. Mol. Sci. 21, 3448 (2020).
41. Martino, S. et al. Expression and purification of a human, soluble Arylsulfatase A for Metachromatic Leukodystrophy enzyme replacement therapy. J. Biotechnol. 117, 243–251 (2005).
42. Ries, M. Enzyme replacement therapy and beyond-in memoriam Roscoe O. Brady, M.D. (1923-2016). Journal of Inherited Metabolic Disease 40, 343–356 (2017).
43. Muenzer, J. et al. A phase II/III clinical study of enzyme replacement therapy with idursulfase in mucopolysaccharidosis II (Hunter syndrome). Genet. Med. 8, 465–473 (2006).

44. Luzzatto, L. et al. Outrageous prices of orphan drugs: a call for collaboration. The Lancet 392, 791–794 (2018).
45. Craggs, T. D. Green fluorescent protein: Structure, folding and chromophore maturation. Chem. Soc. Rev. 38, 2865–2875 (2009).
46. Chalfie, M., Tu, Y., Euskirchen, G., Ward, W. W. & Prasher, D. C. Green fluorescent protein as a marker for gene expression. Science 263, 802–805 (1994).
47. Zimmer, M. Green fluorescent protein (GFP): Applications, structure, and related photophysical behavior. Chem. Rev. 102, 759–781 (2002).
48. Matz, M. V. et al. Fluorescent proteins from nonbioluminescent Anthozoa species. Nat. Biotechnol. 17, 969–973 (1999).
49. Heim, R., Prasher, D. C. & Tsien, R. Y. Wavelength mutations and posttranslational autoxidation of green fluorescent protein. Proc. Natl. Acad. Sci. U. S. A. 91, 12501–12504 (1994).
50. Shaner, N. C., Patterson, G. H. & Davidson, M. W. Advances in fluorescent protein technology. Journal of Cell Science 120, 4247–4260 (2007).
51. Choi, W.-G., Swanson, S. J. & Gilroy, S. High-resolution imaging of Ca2+, redox status, ROS and pH using GFP biosensors. Plant J. 70, 118–128 (2012).
52. Cabantous, S., Terwilliger, T. C. & Waldo, G. S. Protein tagging and detection with engineered self-assembling fragments of green fluorescent protein. Nat. Biotechnol. 23, 102–107 (2005).
53. Franke, W. W., Scheer, U., Krohne, G. & Jarasch, E.-D. The Nuclear Envelope and the Architecture of the Nuclear Periphery. J. Cell Biol. 91, 39–50 (1981).
54. Fischer, S., Handrick, R. & Otte, K. The art of CHO cell engineering: A comprehensive retrospect and future perspectives. Biotechnology Advances 33, 1878–1896 (2015).
55. Lalonde, M. E. & Durocher, Y. Therapeutic glycoprotein production in mammalian cells. Journal of Biotechnology 251, 128–140 (2017).
56. Feichtinger, J. et al. Comprehensive genome and epigenome characterization of CHO cells in response to evolutionary pressures and over time. Biotechnol. Bioeng. 113, 2241–2253 (2016).
57. Dumont, J., Euwart, D., Mei, B., Estes, S. & Kshirsagar, R. Human cell lines for biopharmaceutical manufacturing: history, status, and future perspectives. Critical Reviews in Biotechnology 36, 1110–1122 (2016).
58. Kim, J. Y., Kim, Y. G. & Lee, G. M. CHO cells in biotechnology for production of recombinant proteins: Current state and further potential. Applied Microbiology and Biotechnology 93, 917–930 (2012).
59. Berting, A., Farcet, M. R. & Kreil, T. R. Virus susceptibility of Chinese hamster ovary (CHO) cells and detection of viral contaminations by adventitious agent testing. Biotechnol. Bioeng. 106, 598–607 (2010).
60. Xu, X. et al. The genomic sequence of the Chinese hamster ovary (CHO)-K1 cell line. Nat. Biotechnol. 29, 735–741 (2011).
61. Macher, B. A. & Galili, U. The Galα1,3Galβ1,4GlcNAc-R (α-Gal) epitope: A carbohydrate of unique evolution and clinical relevance. Biochimica et Biophysica Acta - General Subjects 1780, 75–88 (2008).
62. Bosques, C. J. et al. Chinese hamster ovary cells can produce galactose-α-1,3-galactose antigens on proteins. Nature Biotechnology 28, 1153–1156 (2010).
63. Ghaderi, D., Zhang, M., Hurtado-Ziola, N. & Varki, A. Production platforms for biotherapeutic glycoproteins. Occurrence, impact, and challenges of non-human sialylation. Biotechnology and Genetic Engineering Reviews 28, 147–176 (2012).

64. Ghaderi, D., Taylor, R. E., Padler-Karavani, V., Diaz, S. & Varki, A. Implications of the presence of N-glycolylneuraminic acid in recombinant therapeutic glycoproteins. Nat. Biotechnol. 28, 863–867 (2010).
65. Malm, M. et al. Evolution from adherent to suspension: systems biology of HEK293 cell line development. Sci. Rep. 10, 18996 (2020).
66. Graham, F. L., Smiley, J., Russell, W. C. & Nairn, R. Characteristics of a human cell line transformed by DNA from human adenovirus type 5. J. Gen. Virol. 36, 59–72 (1977).
67. Louis, N., Evelegh, C. & Graham, F. L. Cloning and sequencing of the cellular-viral junctions from the human adenovirus type 5 transformed 293 cell line. Virology 233, 423–429 (1997).
68. Berk, A. J. Recent lessons in gene expression, cell cycle control, and cell biology from adenovirus. Oncogene 24, 7673–7685 (2005).
69. Graham, F. L. Growth of 293 cells in suspension culture. J. Gen. Virol. 68, 937–940 (1987).
70. Garnier, A., Côté, J., Nadeau, I., Kamen, A. & Massie, B. Scale-up of the adenovirus expression system for the production of recombinant protein in human 293S cells. Cytotechnology 15, 145–155 (1994).
71. Côté, J., Garnier, A., Massie, B. & Kamen, A. Serum-free production of recombinant proteins and adenoviral vectors by 293SF-3F6 cells. Biotechnol. Bioeng. 59, 567–575 (1998).
72. Matlin, K. S. & Caplan, M. J. The secretory pathway at 50: A golden anniversary for some momentous grains of silver. Mol. Biol. Cell 28, 229–232 (2017).
73. Lodish, H. et al. Overview of the Secretory Pathway. (2000).
74. Paroutis, P., Touret, N. & Grinstein, S. The pH of the secretory pathway: Measurement, determinants, and regulation. Physiology 19, 207–215 (2004).
75. Mann, M. & Jensen, O. N. Proteomic analysis of post-translational modifications. Nature Biotechnology 21, 255–261 (2003).
76. Werner, R. G., Kopp, K. & Schlueter, M. Glycosylation of therapeutic proteins in different production systems. Acta Paediatrica, International Journal of Paediatrics 96, 17–22 (2007).
77. Williams, D. B. Beyond lectins: The calnexin/calreticulin chaperone system of the endoplasmic reticulum. Journal of Cell Science 119, 615–623 (2006).
78. Buck, T. M., Wright, C. M. & Brodsky, J. L. The activities and function of molecular chaperones in the endoplasmic reticulum. Seminars in Cell and Developmental Biology 18, 751–761 (2007).
79. Saibil, H. Chaperone machines for protein folding, unfolding and disaggregation. Nature Reviews Molecular Cell Biology 14, 630–642 (2013).
80. Wang, C. & Tsou, C. Protein disulfide isomerase is both an enzyme and a chaperone. FASEB J. 7, 1515–1517 (1993).
81. Shaw, P. E. Peptidyl-prolyl isomerases: A new twist to transcription. EMBO Rep. 3, 521–526 (2002).
82. Lee, C. & Goldberg, J. Structure of Coatomer Cage Proteins and the Relationship among COPI, COPII, and Clathrin Vesicle Coats. Cell 142, 123–132 (2010).
83. Duden, R. ER-to-Golgi transport: COP I and COP II function. Molecular Membrane Biology 20, 197–207 (2003).
84. Senft, D. & Ronai, Z. A. UPR, autophagy, and mitochondria crosstalk underlies the ER stress response. Trends in Biochemical Sciences 40, 141–148 (2015).
85. Hwang, J. & Qi, L. Quality Control in the Endoplasmic Reticulum: Crosstalk between ERAD and UPR pathways. Trends in Biochemical Sciences 43, 593–605 (2018).

86. B’Chir, W. et al. The eIF2α/ATF4 pathway is essential for stress-induced autophagy gene expression. Nucleic Acids Res. 41, 7683–7699 (2013).
87. Galluzzi, L. et al. Molecular definitions of autophagy and related processes. EMBO J. 36, 1811–1836 (2017).
88. Mizushima, N., Levine, B., Cuervo, A. M. & Klionsky, D. J. Autophagy fights disease through cellular self-digestion. Nature 451, 1069–1075 (2008).
89. Kaushik, S. & Cuervo, A. M. The coming of age of chaperone-mediated autophagy. Nature Reviews Molecular Cell Biology 19, 365–381 (2018).
90. Welch, M., Villalobos, A., Gustafsson, C. & Minshull, J. You’re one in a googol: Optimizing genes for protein expression. Journal of the Royal Society Interface 6, (2009).
91. Trösemeier, J. H. et al. Optimizing the dynamics of protein expression. Sci. Rep. 9, 1–15 (2019).
92. Poulain, A., Mullick, A., Massie, B. & Durocher, Y. Reducing recombinant protein expression during CHO pool selection enhances frequency of high-producing cells. J. Biotechnol. 296, 32–41 (2019).
93. Selvaraj, S. R., Scheller, A. N., Miao, H. Z., Kaufman, R. J. & Pipe, S. W. Bioengineering of coagulation factor VIII for efficient expression through elimination of a dispensable disulfide loop. J. Thromb. Haemost. 10, 107–115 (2012).
94. Hong, J. K., Lakshmanan, M., Goudar, C. & Lee, D. Y. Towards next generation CHO cell line development and engineering by systems approaches. Current Opinion in Chemical Engineering 22, 1–10 (2018).
95. Fussenegger, M., Mazur, X. & Bailey, J. E. A novel cytostatic process enhances the productivity of Chinese hamster ovary cells. Biotechnol. Bioeng. 55, 927–939 (1997).
96. Dean, J. & Reddy, P. Metabolic analysis of antibody producing CHO cells in fed-batch production. Biotechnol. Bioeng. 110, 1735–1747 (2013).
97. Templeton, N., Dean, J., Reddy, P. & Young, J. D. Peak antibody production is associated with increased oxidative metabolism in an industrially relevant fed-batch CHO cell culture. Biotechnol. Bioeng. 110, 2013–2024 (2013).
98. Chong, W. P. K. et al. Metabolomics-driven approach for the improvement of Chinese hamster ovary cell growth: Overexpression of malate dehydrogenase II. J. Biotechnol. 147, 116–121 (2010).
99. Fogolín, M. B., Wagner, R., Etcheverrigaray, M. & Kratje, R. Impact of temperature reduction and expression of yeast pyruvate carboxylase on hGM-CSF-producing CHO cells. Journal of Biotechnology 109, 179–191 (2004).
100. Kim, S. H. & Lee, G. M. Functional expression of human pyruvate carboxylase for reduced lactic acid formation of Chinese hamster ovary cells (DG44). Appl. Microbiol. Biotechnol. 76, 659–665 (2007).
101. Le, H. et al. Dynamic gene expression for metabolic engineering of mammalian cells in culture. Metab. Eng. 20, 212–220 (2013).
102. Tabuchi, H. & Sugiyama, T. Cooverexpression of alanine aminotransferase 1 in Chinese hamster ovary cells overexpressing taurine transporter further stimulates metabolism and enhances product yield. Biotechnol. Bioeng. 110, 2208–2215 (2013).
103. Gupta, S. K. et al. Metabolic engineering of CHO cells for the development of a robust protein production platform. PLoS One 12, e0181455 (2017).
104. Rössler, B., Lübben, H. & Kretzmer, G. Temperature: A simple parameter for process optimization in fed-batch cultures of recombinant Chinese hamster ovary cells. Enzyme Microb. Technol. (1996).

105. Kaufmann, H., Mazur, X., Fussenegger, M. & Bailey, J. E. Influence of low temperature on productivity, proteome and protein phosphorylation of CHO cells. Biotechnol. Bioeng. 63, 573–582 (1999).
106. Birch, J. R. & Racher, A. J. Antibody production. Advanced Drug Delivery Reviews 58, 671–685 (2006).
107. Hussain, H., Maldonado-Agurto, R. & Dickson, A. J. The endoplasmic reticulum and unfolded protein response in the control of mammalian recombinant protein production. Biotechnology Letters 36, 1581–1593 (2014).
108. Cain, K. et al. A CHO Cell Line Engineered to Express XBP1 and ERO1-Lα Has Increased Levels of Transient Protein Expression. Biotechnol. Prog. 29, 697–706 (2013).
109. Lee, E., Roth, J. & Paulson, J. Alteration of Terminal Glycosylation Sequences on N-linked Oligosaccharides of Chinese Hamster Ovary Cells by Expression of β-Galactoside α2,6-Sialyltransferase. J. Biol. Chem. (1989).
110. Davies, J. et al. Expression of GnTIII in a recombinant anti-CD20 CHO production cell line: Expression of antibodies with altered glycoforms leads to an increase in ADCC through higher affinity for FC gamma RIII. Biotechnol. Bioeng. 74, 288–294 (2001).
111. Ferrara, C. et al. Modulation of therapeutic antibody effector functions by glycosylation engineering: Influence of Golgi enzyme localization domain and co-expression of heterologous β1,4-N-acetylglucosaminyltransferase III and Golgi α-mannosidase II. Biotechnol. Bioeng. 93, 851–861 (2006).
112. Fukuta, K. et al. Remodeling of sugar chain structures of human interferon-γ. Glycobiology 10, 421–430 (2000).
113. Son, Y. D., Jeong, Y. T., Park, S. Y. & Kim, J. H. Enhanced sialylation of recombinant human erythropoietin in Chinese hamster ovary cells by combinatorial engineering of selected genes. Glycobiology 21, 1019–1028 (2011).
114. Von Horsten, H. H. et al. Production of non-fucosylated antibodies by co-expression of heterologous GDP-6-deoxy-d-lyxo-4-hexulose reductase. Glycobiology 20, 1607–1618 (2010).
115. Yamane-Ohnuki, N. et al. Establishment of FUT8 knockout Chinese hamster ovary cells: An ideal host cell line for producing completely defucosylated antibodies with enhanced antibody-dependent cellular cytotoxicity. Biotechnol. Bioeng. 87, 614–622 (2004).
116. Yamane-Ohnuki, N. & Satoh, M. Production of therapeutic antibodies with controlled fucosylation. mAbs 1, 230–236 (2009).
117. Majors, B. S., Chiang, G. G., Pederson, N. E. & Betenbaugh, M. J. Directed evolution of mammalian anti-apoptosis proteins by somatic hypermutation. Protein Eng. Des. Sel. 25, 27–38 (2012).
118. Majors, B. S., Betenbaugh, M. J., Pederson, N. E. & Chiang, G. G. Mcl-1 overexpression leads to higher viabilities and increased production of humanized monoclonal antibody in Chinese hamster ovary cells. Biotechnology Progress 25, 1161–1168 (2009).
119. Wong, D. C. F., Wong, K. T. K., Nissom, P. M., Heng, C. K. & Yap, M. G. S. Targeting early apoptotic genes in batch and fed-batch CHO cell cultures. Biotechnol. Bioeng. 95, 350–361 (2006).
120. Datta, P., Linhardt, R. J. & Sharfstein, S. T. An ’omics approach towards CHO cell engineering. Biotechnology and Bioengineering 110, 1255–1271 (2013).
121. Farrell, A., McLoughlin, N., Milne, J. J., Marison, I. W. & Bones, J. Application of multi-omics techniques for bioprocess design and optimization in Chinese hamster ovary cells. Journal of Proteome Research 13, 3144–3159 (2014).

122. Becker, J. et al. Unraveling the Chinese hamster ovary cell line transcriptome by next-generation sequencing. J. Biotechnol. 156, 227–235 (2011).
123. Fomina-Yadlin, D. et al. Transcriptome analysis of a CHO cell line expressing a recombinant therapeutic protein treated with inducers of protein expression. J. Biotechnol. 212, 106–115 (2015).
124. Stepanenko, A. A. & Dmitrenko, V. V. HEK293 in cell biology and cancer research: Phenotype, karyotype, tumorigenicity, and stress-induced genome-phenotype evolution. Gene 569, 182–190 (2015).
125. Konrad, M. W., Storrie, B., Glaser, D. A. & Thompson, L. H. Clonal variation in colony morphology and growth of CHO cells cultured on agar. Cell 10, 305–312 (1977).
126. Zdzienicka, M. Z., Cupido, M. & Simons, J. W. I. M. Increase in clonal variation in Chinese hamster ovary cells after treatment with mutagens. Somat. Cell Mol. Genet. 11, 127–134 (1985).
127. Nissom, P. M. et al. Transcriptome and proteome profiling to understanding the biology of high productivity CHO cells. Molecular Biotechnology 34, 125–140 (2006).
128. Davies, S. L. et al. Functional heterogeneity and heritability in CHO cell populations. Biotechnol. Bioeng. 110, 260–274 (2013).
129. Stepanenko, A. et al. Step-wise and punctuated genome evolution drive phenotype changes of tumor cells. Mutat. Res. - Fundam. Mol. Mech. Mutagen. 771, 56–69 (2015).
130. Kim, M., O’Callaghan, P. M., Droms, K. A. & James, D. C. A mechanistic understanding of production instability in CHO cell lines expressing recombinant monoclonal antibodies. Biotechnol. Bioeng. 108, 2434–2446 (2011).
131. Van Berkel, P. H. C. et al. N-linked glycosylation is an important parameter for optimal selection of cell lines producing biopharmaceutical human IgG. Biotechnol. Prog. 25, 244–251 (2009).
132. Wurm, F. M. Production of recombinant protein therapeutics in cultivated mammalian cells. Nature Biotechnology 22, 1393–1398 (2004).
133. Debs, B. El, Utharala, R., Balyasnikova, I. V., Griffiths, A. D. & Merten, C. A. Functional single-cell hybridoma screening using droplet-based microfluidics. Proc. Natl. Acad. Sci. U. S. A. 109, 11570–11575 (2012).
134. Jorgolli, M. et al. Nanoscale integration of single cell biologics discovery processes using optofluidic manipulation and monitoring. Biotechnology and Bioengineering 116, 2393–2411 (2019).
135. Kumar, N. & Borth, N. Flow-cytometry and cell sorting: An efficient approach to investigate productivity and cell physiology in mammalian cell factories. Methods 56, 366–374 (2012).
136. Pichler, J. et al. A study on the temperature dependency and time course of the cold capture antibody secretion assay. J. Biotechnol. 141, 80–83 (2009).
137. Helman, D. et al. Novel membrane-bound reporter molecule for sorting high producer cells by flow cytometry. Cytom. Part A 85, 162–168 (2014).
138. Matabaro, E. et al. Molecular switching system using glycosylphosphatidylinositol to select cells highly expressing recombinant proteins. doi:10.1038/s41598-017-04330-3
139. Priola, J. J. et al. High-throughput screening and selection of mammalian cells for enhanced protein production. Biotechnology Journal 11, 853–865 (2016).
140. Porter, A. J., Dickson, A. J. & Racher, A. J. Strategies for selecting recombinant CHO cell lines for cGMP manufacturing: realizing the potential in bioreactors. Biotechnol. Prog. 26, 1446–1454 (2010).

141. Misaghi, S., Chang, J. & Snedecor, B. It’s time to regulate: Coping with product-induced nongenetic clonal instability in CHO cell lines via regulated protein expression. Biotechnol. Prog. 30, 1432–1440 (2014).
142. Geysen, H. M., Meloen, R. H. & Barteling, S. J. Use of peptide synthesis to probe viral antigens for epitopes to a resolution of a single amino acid. Proc. Natl. Acad. Sci. U. S. A. 81, 3998–4002 (1984).
143. Forsström, B. et al. Proteome-wide epitope mapping of antibodies using ultra-dense peptide arrays. Mol. Cell. Proteomics 13, 1585–1597 (2014).
144. Abbott, W. M., Damschroder, M. M. & Lowe, D. C. Current approaches to fine mapping of antigen–antibody interactions. Immunology 142, 526–535 (2014).
145. Gershoni, J. M., Roitburd-Berman, A., Siman-Tov, D. D. & Freund, N. T. Epitope Mapping: The First Step in Developing Epitope-Based Vaccines. BioDrugs 21, 145–156 (2007).
146. Wüthrich, K. Protein Structure Determination in Solution by NMR Spectroscopy. J. Biol. Chem. 265, 22059–22062 (1990).
147. Kühlbrandt, W. The resolution revolution. Science 343, 1443–1444 (2014).
148. Merino, F. & Raunser, S. Electron Cryo-microscopy as a Tool for Structure-Based Drug Development. Angewandte Chemie - International Edition 56, 2846–2860 (2017).
149. Bai, X.-C., McMullan, G. & Scheres, S. H. W. How cryo-EM is revolutionizing structural biology. Trends in Biochemical Sciences 40, 49–57 (2015).
150. Opuni, K. F. M. et al. Mass spectrometric epitope mapping. Mass Spectrometry Reviews 37, 229–241 (2018).
151. Deng, B., Lento, C. & Wilson, D. J. Hydrogen deuterium exchange mass spectrometry in biopharmaceutical discovery and development–A review. Anal. Chim. Acta 940, 8–20 (2016).
152. Huang, P. S., Boyken, S. E. & Baker, D. The coming of age of de novo protein design. Nature 537, 320–327 (2016).
153. Blythe, M. J. & Flower, D. R. Benchmarking B cell epitope prediction: Underperformance of existing methods. Protein Sci. 14, 246–248 (2009).
154. Greenbaum, J. A. et al. Towards a consensus on datasets and evaluation metrics for developing B-cell epitope prediction tools. Journal of Molecular Recognition 20, 75–82 (2007).
155. Clackson, T. & Wells, J. A. A hot spot of binding energy in a hormone-receptor interface. Science 267, 383–386 (1995).
156. Van Regenmortel, M. H. V. Mapping epitope structure and activity: From one-dimensional prediction to four-dimensional description of antigenic specificity. Methods A Companion to Methods Enzymol. 9, 465–472 (1996).
157. Van Regenmortel, M. H. V. Structural and functional approaches to the study of protein antigenicity. Immunology Today 10, 266–272 (1989).
158. Chao, G., Cochran, J. R. & Wittrup, K. D. Fine Epitope Mapping of anti-Epidermal Growth Factor Receptor Antibodies Through Random Mutagenesis and Yeast Surface Display. J. Mol. Biol. 342, 539–550 (2004).
159. Kowalsky, C. A. et al. Rapid fine conformational epitope mapping using comprehensive mutagenesis and deep sequencing. J. Biol. Chem. 290, 26457–26470 (2015).
160. Najar, T. A. et al. Mapping Protein Binding Sites and Conformational Epitopes Using Cysteine Labeling and Yeast Surface Display. Structure 25, 395–406 (2017).
161. Van Blarcom, T. et al. Precise and efficient antibody epitope determination through library design, yeast display and next-generation sequencing. J. Mol. Biol. 427, 1513–1534 (2015).

162. Van Blarcom, T. et al. Epitope mapping using yeast display and next generation sequencing. in Methods in Molecular Biology 1785, 89–118 (Humana Press Inc., 2018).
163. Volk, A. L., Hu, F. J. & Rockberg, J. Epitope mapping of antibodies using bacterial cell surface display of gene fragment libraries. in Methods in Molecular Biology 1785, 141–157 (Humana Press Inc., 2018).
164. Lodish, H. et al. Overview of Membrane Transport Proteins. in Molecular Cell Biology. 4th edition. (W. H. Freeman, 2000).
165. Nosjean, O., Briolay, A. & Roux, B. Mammalian GPI proteins: sorting, membrane residence and functions. Biochimica et Biophysica Acta - Reviews on Biomembranes 1331, 153–186 (1997).
166. Chesnut, J. D. et al. Selective isolation of transiently transfected cells from a mammalian cell population with vectors expressing a membrane anchored single-chain antibody. J. Immunol. Methods 193, 17–27 (1996).
167. Heider, S., Dangerfield, J. A. & Metzner, C. Biomedical applications of glycosylphosphatidylinositol-anchored proteins. Journal of Lipid Research 57, 1778–1788 (2016).
168. Ivanusic, D. et al. tANCHOR: a novel mammalian cell surface peptide display system. Biotechniques 70, 21–28 (2021).
169. Amann, T., Schmieder, V., Faustrup Kildegaard, H., Borth, N. & Andersen, M. R. Genetic engineering approaches to improve posttranslational modification of biopharmaceuticals in different production platforms. Biotechnol. Bioeng. 116, 2778–2796 (2019).
170. Wang, Q. et al. Design and Production of Bispecific Antibodies. Antibodies (Basel) 8, (2019).
171. Brinkmann, U. & Kontermann, R. E. The making of bispecific antibodies. MAbs 9, 182–212 (2017).
172. Cartwright, J. et al. A platform for context-specific genetic engineering of recombinant protein production by CHO cells. J. Biotechnol. 312, 11–22 (2020).
173. Pybus, L. P. et al. Model-directed engineering of ‘difficult-to-express’ monoclonal antibody production by Chinese hamster ovary cells. Biotechnol. Bioeng. 111, 372–385 (2014).
174. Brown, A. J. & James, D. C. Precision control of recombinant gene transcription for CHO cell synthetic biology. Biotechnol. Adv. 34, 492–503 (2016).
175. Hansen, H. G., Pristovšek, N., Kildegaard, H. F. & Lee, G. M. Improving the secretory capacity of Chinese hamster ovary cells by ectopic expression of effector genes: Lessons learned and future directions. Biotechnology Advances 35, 64–76 (2017).
176. Leppek, K., Das, R. & Barna, M. Functional 5ʹ UTR mRNA structures in eukaryotic translation regulation and how to find them. Nat. Rev. Mol. Cell Biol. 19, 158–174 (2017).
177. Kozak, M. Influences of mRNA secondary structure on initiation by eukaryotic ribosomes. Proc. Natl. Acad. Sci. U. S. A. 83, 2850–2854 (1986).
178. Kozak, M. Circumstances and Mechanisms of Inhibition of Translation by Secondary Structure in Eucaryotic mRNAs. Mol. Cell. Biol. 9, 5134–5142 (1989).
179. Endo, K., Stapleton, J., Hayashi, K. & Saito, H. Quantitative and simultaneous translational control of distinct mammalian mRNAs. Nucleic Acids Res. 41, e135 (2013).
180. Parsyan, A., Svitkin, Y. & Shahbazian, D. mRNA helicases: the tacticians of translational control. Nat. Rev. Mol. Cell Biol. 12, 235–245 (2011).
181. Babendure, J. B., Babendure, J. L., Ing, J.-H. & Tsien, R. Y. Control of mammalian translation by mRNA structure near caps. RNA 12, 851–861 (2006).

182. Lamping, E., Niimi, M. & Cannon, R. D. Small, synthetic, GC-rich mRNA stem-loop modules 5′ proximal to the AUG start-codon predictably tune gene expression in yeast. Microb. Cell Fact. 12, (2013).
183. Weenink, T., Hilst, J. van der, McKiernan, R. M. & Ellis, T. Design of RNA hairpin modules that predictably tune translation in yeast. Synth. Biol. 3, (2018).
184. Schlatter, S. et al. On the Optimal Ratio of Heavy to Light Chain Genes for Efficient Recombinant Antibody Production by CHO Cells. Biotechnol. Prog. 21, 122–133 (2005).
185. Ho, S. et al. Control of IgG LC:HC ratio in stably transfected CHO cells and study of the impact on expression, aggregation, glycosylation and conformational stability. J. Biotechnol. 165, 157–166 (2013).
186. Ho, S. C. L., Wang, T., Song, Z. & Yang, Y. IgG Aggregation Mechanism for CHO Cell Lines Expressing Excess Heavy Chains. Mol. Biotechnol. 57, 625–634 (2015).
187. Desnick, R. J. & Schuchman, E. H. Enzyme replacement therapy for lysosomal diseases: Lessons from 20 years of experience and remaining challenges. Annual Review of Genomics and Human Genetics 13, 307–335 (2012).
188. Grabowski, G. A. & Whitley, C. Ten plus one challenges in diseases of the lysosomal system. Mol. Genet. Metab. 120, 38–46 (2017).
189. Simonis, H., Yaghootfam, C., Sylvester, M., Gieselmann, V. & Matzner, U. Evolutionary redesign of the lysosomal enzyme arylsulfatase A increases efficacy of enzyme replacement therapy for metachromatic leukodystrophy. Hum. Mol. Genet. 28, 1810–1821 (2019).
190. Roth, R. & Mayr, L. Cell lines and methods for increased protein production. (United States Patent Application Publication - US 2020/0370056 A1, 2020).
191. Sun, Y. et al. TMEM74 promotes tumor cell survival by inducing autophagy via interactions with ATG16L1 and ATG9A. Cell Death Dis. 8, e3031 (2017).
192. Takahashi, Y. et al. VPS37A directs ESCRT recruitment for phagophore closure. J. Cell Biol. 218, 3336–3354 (2019).
193. Wengrod, J. et al. Inhibition of Nonsense-Mediated RNA Decay Activates Autophagy. Mol. Cell. Biol. 33, 2128–2135 (2013).
194. Delorme-Axford, E. & Klionsky, D. J. On the edge of degradation: Autophagy regulation by RNA decay. Wiley Interdiscip. Rev. RNA 10, e1522 (2019).
195. Hansen, H. et al. Versatile microscale screening platform for improving recombinant protein productivity in Chinese hamster ovary cells. Sci. Rep. 5, (2015).
196. Kent, K. P., Childs, W. & Boxer, S. G. Deconstructing green fluorescent protein. J. Am. Chem. Soc. 130, 9664–9665 (2008).
197. Wrapp, D. et al. Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation. Science 367, 1260–1263 (2020).
198. Urquhart, L. Top companies and drugs by sales in 2019. Nature Reviews Drug Discovery 19, 228 (2020).
199. Kuo, C. C. et al. The emerging role of systems biology for engineering protein production in CHO cells. Current Opinion in Biotechnology 51, 64–69 (2018).
200. Kol, S. et al. Multiplex secretome engineering enhances recombinant protein production and purity. Nat. Commun. 11, (2020).
