PhD Thesis

Screening for improved affinity of luminescence-complementing peptides in the NanoBiT system

This thesis has been submitted to the PhD School of The Faculty of Science, University of Copenhagen

Submitted by: Benjamin Bjerre Supervisor: Professor Jakob Rahr Winther

Submitted on: 19th October 2020

Abstract in English

Luciferases are a class of enzymes that emit light as part of their catalytic function. They are widely distributed in nature and have in many cases been employed as reporter systems for biochemical studies. The focus of this work has been to optimize the affinity of the split-luciferase system NanoBiT, engineered by the Promega corporation on the basis of the luciferase of Oplophorus gracilirostris, a small deep-sea shrimp.

The NanoBiT system works by splitting the luciferase into an inactive, truncated luciferase, LgBiT, and an 11-amino-acid peptide, SmBiT, which can restore luminescence when it binds to LgBiT. This complementation can occur even when the peptide is fused to another protein. This makes it useful as a tag on other proteins, to detect and quantify them with a high degree of sensitivity and accuracy.

To increase the functionality of this system further, we wished to optimize the affinity of the peptide, by exploring its sequence space in more detail. This was done through the construction of an in vivo screening platform, to allow efficient luminescence screening of libraries of mutant variants of the SmBiT peptide.

In this process, libraries of variant peptides were screened for their ability to complement the truncated luciferase. From those results, analysis was done to estimate the contributions of the substitutions found, resulting in a prediction of specific candidate substitutions likely to improve function. The candidates were then validated in vitro, resulting in two improved peptides; one with improved affinity and another with improved luminescent activity.

Abstract in Danish

Luciferaser er en klasse af enzymer, som udsender lys som en del af deres katalytiske funktion. Luciferaser findes vidt spredt i naturen og er i mange tilfælde blevet brugt som rapporteringssystem for biokemiske studier. Fokus for dette arbejde har været at optimere affiniteten af split-luciferase systemet NanoBiT, udviklet af Promega på basis af luciferasen fra Oplophorus gracilirostris, en lille dybhavsreje.

NanoBiT-systemet fungerer ved at dele luciferasen i to, til en inaktiv, trunkeret luciferase, LgBiT, og et 11-aminosyre langt peptid, SmBiT, som kan genskabe luminescensaktivitet ved binding til LgBiT. Denne komplementation kan forekomme selv når peptidet er vedhæftet et andet protein. Dette gør det brugbart som et tag for andre proteiner, til påvisning og kvantificering af disse med en høj grad af følsomhed og præcision.

For at yderligere forbedre funktionen af dette system ønskede vi at optimere affiniteten af peptidet ved at udforske sekvensrummet i dybere detaljer. Dette blev gjort gennem konstruktion af en in vivo screeningsplatform, for at tillade effektiv luminescensscreening af biblioteker af muterede varianter af SmBiT-peptidet.

I denne proces blev biblioteker af peptidvarianter screenet for deres evne til at komplementere den trunkerede luciferase. Fra disse resultater udførte vi en analyse for at bedømme bidraget for en given fundet substitution, hvilket resulterede i en forudsigelse af specifikke kandidatsubstitutioner med forbedret funktion. Kandidaterne blev valideret in vitro, hvilket resulterede i to peptider med forbedringer; den ene med forbedret affinitet, den anden med forbedret luminescens.

Acknowledgements

Let me start with a simple observation: More than anything else, what has made this project a good experience is the people I have worked with. I feel that I've had an extremely good work environment, both with regard to technical assistance and just plain good times. So, to whoever I will inevitably end up forgetting to mention: Thanks.

First, I want to thank my supervisor over the years, professor Jakob Rahr Winther. In addition to being an experienced and helpful supervisor, with a keen understanding of when to stay out of your way, he's also just a nice guy. That means a lot when you're stressed out.

I'd like to thank associate professors Martin Willemoës and Michael Askvad Sørensen. The cooperation between our groups has been a great source of inspiration and the ability to simply knock on your door and ask a question has been helpful more than once.

I must also mention Charlotte O'Shea, PhD, our lab manager and go-to with all things lab related. Thank you for being a constant support for practical things and generally a source of good cheer. Our lab is much better for having you here.

When working on a project like this, it's important to have people who are in the same situation, for talking things over, commiserating about experiments not working, and general mutual support and cooperation. I feel lucky that I have shared this time with Oana-Nicoleta Antonescu. You're absolutely awesome and I can't wait to hear what you get up to next. Also Johanna Maarit Koivisto has been a great support. It would have been much harder without such good company.

On that note, I'd like to thank the other students that we've had through our lab. In particular, I want to mention Matilde Knapkøien Nordentoft and Fenne Marjolein Dijkema. It has been great to have such wonderful colleagues, whether to discuss results or just chat over lunch.

I'd also like to tip the hat to my office mates. Although we have not worked closely together in a practical sense, the simple fact of having a nice office to work in counts for a lot. Thanks for a good environment and occasional goofiness.

In the first year, some fellow PhD students took the initiative to form BioNet, our local PhD-student organization. I have greatly enjoyed being part of that work and it allowed me to get to know other PhD students that I likely would not otherwise have had much contact with. I think that's a distinct benefit of BioNet and I am happy to see that a new crop of students is taking over and carrying on that work. I wish them the best of luck and I hope they will benefit from it as much as I have.

Furthermore, I should mention the many bachelor's students I came into contact with, as part of my teaching duties. It was unexpectedly fun and I have particularly enjoyed seeing some of them join our lab later and work with us. I feel this has been a distinctly positive aspect of my work here.

I'd also like to thank some people from outside the University of Copenhagen: Claus Schafer-Nielsen, of Schafer-N, for helpful advice and assistance with the peptide arrays; professor Uffe Mortensen and PhD student Kyle Rothschild-Mancinelli from DTU, for assistance with testing the automated colony picker; and the Promega corporation, for providing anti-NanoLuc antibodies without delay or bureaucracy. I should also thank the Independent Research Fund Denmark, for funding this project. It literally couldn't have been done without their contribution.

Finally, I hope I can be forgiven for a cliche in thanking my family and in particular my mother, Birthe. Thanks for being there when needed and for leaving me alone when needed, as well as years of support during my studies generally.

Abbreviations

CTZ Coelenterazine

tLuc Truncated NanoLuc luciferase, also called 11S or LgBiT in Promega’s work

19kOLase 19kDa domain of wt-Oplophorus luciferase

NanoBiT NanoLuc Binary Technology – Promega’s NanoLuc-based split system

GLuc Gaussia luciferase

FLuc Firefly luciferase

RLuc Renilla luciferase

TRX E. coli thioredoxin

Thesis Outline

This PhD thesis describes work done in the period of November 2016 - January 2020, at the Linderstrøm-Lang Center for Protein Science, at the Department of Biology, University of Copenhagen.

Chapter 1 covers the necessary background for understanding the NanoBiT system. It reviews information on Oplophorus luciferase and the work that has been done to develop the NanoBiT system from that basis. It further covers basic information on the structure and function of both NanoLuc and the NanoBiT system, as well as some brief discussion, comparing NanoLuc and its derivatives to other luciferases.

Chapter 2 describes the screening system that formed the main practical work, along with its application and the relevant protocols. It will cover the setup and initial validation of the system, as well as the strategies employed for the mutagenesis of the peptide.

Chapter 3 discusses the screening results, the analysis of those results, how this guided the selection of candidates, and the final results of the affinity measurements of those candidates.

Please note that all data of affinity measurements has been produced by Charlotte O’Shea, Sylvester Vinther, Sigrid Jørsboe, and Jonas Puls.

Chapter 4 deals with discussion of these results, as well as discussion of the system as a whole, and some interesting outlier results.

Chapter 5 contains appendices and raw data. It will also briefly discuss some earlier, and aborted, attempts at screening, using peptide microarrays for higher throughput.

Contents

1. Background

1.1 Oplophorus luciferase and NanoLuc

1.2 Comparisons between luciferases

1.3 NanoBiT

1.4 Project Concept

1.5 References

2. Methods

2.1 System details

2.2 Testing the system

2.3 Library strategy

2.4 Protocols

2.5 References

3. Results

3.1 Examining the libraries

3.2 Heatmap calculation

3.3 Evaluating candidates

3.4 Final Candidate Selection

3.5 Affinity measurements

3.6 References

4. Discussion

4.1 Codon effects

4.2 Interesting mutant – R162G

4.3 Library optimizations

4.4 Improvements to the screening system

4.5 A few concluding thoughts

4.6 References

Chapter 5 Appendices

Appendix A - Early approaches

Appendix B - Pre-screening Procedure

Appendix C - Sequences

Appendix D - pET30a plasmid map

Appendix E - E. coli strain genotypes

Appendix F - Screen data

Appendix G - Affinity fitting curves

1. Background

Bioluminescence, as an observed phenomenon, has been known for millennia, but it is only with modern science that we have gained control of this biological phenomenon to a degree that allows its application for practical purposes.

Luciferases are a general category of enzymes, with the shared property of emitting light as part of their function. This occurs by the catalysis of a reaction where the substrate reacts with molecular oxygen, to form a product in an excited state. As this state relaxes, a photon is emitted, which is responsible for the light output. Generically, the enzyme is called a luciferase and its substrate a luciferin (McCapra 1976).

A great variety of luciferases and luciferins are known, and have been thoroughly reviewed by others (Shimomura 2012; Markova and Vysotski 2015; Kaskova, Tsarkova, and Yampolsky 2016; Markova, Larionova, and Vysotski 2019). Here we will focus on Oplophorus luciferase and the work that has been done on this luciferase, along with a few other examples, for comparison purposes.

1.1 Oplophorus luciferase and NanoLuc

Wild type Oplophorus luciferase (OLuc), derived from the deep-sea shrimp Oplophorus gracilirostris, was first described by Shimomura et al. 1978 as a tetrameric protein, later clarified by Inouye et al. 2000 as composed of two 35kDa subunits and two 19kDa subunits. It was further found that the luciferase activity was caused by the 19kDa subunit – 19kOLase – which could be expressed alone, but was highly unstable and prone to aggregation.

Mary Hall et al., from the Promega corporation, proceeded to optimize this subunit as a monomeric luciferase, employing three rounds of random mutagenesis, along with some rational design, to obtain 16 amino acid changes, leading to a stabilized monomer with a significant increase in activity: NanoLuc (Hall et al. 2012).

While it is known that the wt-OLuc is a tetramer, the exact arrangement of subunits has not been reported. NanoLuc, however, has been crystallized and two structures have been reported (PDB IDs: 5ibo, 5b0u; Tomabechi et al. 2016). These structures are in overall agreement, with some small differences. NanoLuc is a monomeric protein, 169 amino acids long, structured as a flattened beta-barrel with 11 anti-parallel beta strands, as well as several helices, mostly placed in the loop regions at one end of the barrel, forming a kind of cap to the barrel. The differences between the two known structures are mainly in the loop regions and helices in this cap, as seen in figure 1.1, below.

Figure 1.1 – comparison of NanoLuc structures

5IBO – NanoLuc – in teal

5B0U – NanoKAZ – in magenta

Note overall agreement in the barrel structure, with some differences in the cap helices, shown in the top of the frame.

NanoLuc shares little structural or sequence similarity with other luciferases, even those using the same luciferin. However, NanoLuc does bear a strong resemblance to non-luminescent proteins in the calycin structural family (Flower et al. 2000). This fact was employed by Hall et al. 2012 as part of their work to stabilize NanoLuc. Specifically, calycins share a common motif, wherein an arginine or lysine residue from the C-terminal strand hydrogen bonds with the N-terminal helix 1, while covering a conserved tryptophan on strand 1. It seems this arrangement helps to bind the barrel together, for increased stability.

In 19kOLase, the relevant amino acid is N166, leading Hall et al. to propose that N166R would stabilize the structure of the monomeric OLuc. This was successful, with this one mutation increasing both stability and luminescence. This supports the notion of a relationship between OLuc and the calycins; structurally, and perhaps evolutionarily.

Figure 1.2 – structural alignment between Sm14 and NanoLuc.

NanoLuc (PDB: 5IBO) in teal, Sm14 (PDB: 1VYF; Angelucci et al. 2004) in cyan. Note helix 4 in top left, not present in Sm14. Side-chains of residues involved in the structural motif shown in NanoLuc structure, bottom right: R166 in blue, W10 in orange. Note helix 1 immediately below R166.

Especially, fatty-acid binding proteins within the calycin family have a very similar beta barrel structure. As an example, the Schistosoma mansoni fatty-acid binding protein Sm14 is shown in a structural alignment with NanoLuc in figure 1.2. The barrel is largely the same, with the main differences being in the helices, especially helix 4 of NanoLuc, which is not present in Sm14. Sm14 has not been reported to have any luminescent activity.

The precise active site for NanoLuc is not known, but some inferences can be made from the structure. Specifically, a cavity is apparent in the center of the barrel (see figure 1.3), connecting to an open channel, allowing access to the surface of the protein. Also, R162, which is known to be important for function, points directly into this cavity.

Figure 1.3 – view of putative binding pocket

Overall NanoLuc structure shown in green, with residues 47-108 removed, to allow viewing of the internal cavity. The internal surface is shown in light grey. Specific residues are shown with side-chains: R162 in teal, contacting residues (within 6Å of the zeta-C atom in R162) shown in magenta. Side-chains of other residues in strand 11 shown in green.

Finally, Sm14 binds fatty acids in a pocket that looks very similar, even interacting with an arginine positioned much like the one found in NanoLuc. These facts taken together make it likely that this is the binding pocket for the substrate. However, exactly how the substrate binds in this pocket cannot be entirely deduced from this alone, especially with regard to orientation within the pocket.

Given that the putative active site is placed within the barrel, it is clear that residues from multiple strands may be involved. As such, a proper understanding of the active site might well open up avenues in the future, allowing a more rational approach to further optimizations.

NanoLuc emits light with a maximum around 460nm, with some dependency on substrate. The natural substrate of OLuc is coelenterazine (CTZ), which NanoLuc can also employ. However, the work done by Hall et al. 2012 focused on optimizing function with a synthetic substrate; furimazine. Furimazine is more stable than CTZ and gives a higher light output, even when compared with traditionally used luciferases, like Firefly luciferase (FLuc) or Renilla luciferase (RLuc).

The fact that NanoLuc has been optimized with furimazine complicates the evaluation of the substitutions. It is not perfectly clear which substitutions are improving the general stability and luminescence and which substitutions are specific for the substrate being employed.

Inouye et al. 2014 investigated the sixteen substitutions included in NanoLuc, compared with 19kOLase. They made each of the substitutions individually, to examine their effects, and found that the combination of just three of these substitutions (V44I, A54I, and Y138I) was able to produce a 66-fold increase in luminescence, compared with 19kOLase. More interesting, though, is the fact that this triple mutant, eKAZ, also had a 7-fold increase in luminescence compared with NanoLuc, when using CTZ as a substrate. This indicates that some of the substitutions selected for in NanoLuc are specific to the use of furimazine as a substrate, not general optimizations.

Yeh et al. 2017 followed this up by site-directed mutagenesis to randomize positions thought likely to be important for function, while screening for luminescence with the synthetic substrates selenoterazine and diphenylterazine.

Figure 1.4 – Synthetic CTZ analogues

Comparison of the structures of naturally occurring coelenterazine (a), Promega’s synthetic analogue, furimazine (b), and the two synthetic analogues employed by Yeh et al. 2017: selenoterazine (c) and diphenylterazine (d).

Adapted from Yeh et al. 2017.

The first positions chosen were those investigated by Inouye et al. 2014. However, no improvements were found by randomizing these positions. Instead, Yeh et al. 2017 shifted to positions predicted by analyzing the putative active site, focusing on L18, D19, R162, and C164, and additionally employing error-prone PCR (epPCR), to find other relevant residues. This resulted in two variants of NanoLuc: yeLuc and teLuc, using the synthetic CTZ-analogues selenoterazine (STZ) and diphenylterazine (DTZ), respectively.

Figure 1.5 – Luminescence spectra of NanoLuc variants

Note especially teLuc + DTZ (teal), showing both higher luminescence and a clear spectral shift, compared with NanoLuc + furimazine (blue).

From Yeh et al. 2017.

In particular, teLuc with DTZ shows a much higher brightness, as well as a clear spectral shift, when compared with NanoLuc. Several of the accepted mutations were in positions not deliberately targeted, but only found with epPCR. This indicates that, while we can make educated guesses as to which positions are important, we still fall somewhat short of being able to rationally design NanoLuc variants. However, mutagenesis and screening have been a productive approach for finding optimizations.

Looking over the substitutions introduced by Hall et al. 2012 and Yeh et al. 2017 (see figure 1.6), it is difficult to see a clear pattern. The substitutions are found throughout the protein and include both surface residues and residues pointing into the barrel. Again, it is not obvious which substitutions are responsible for which changes in the stability and function of NanoLuc and its derivatives. Further work is required to untangle these effects.

Figure 1.6 – NanoLuc with highlight of mutated residues

Positions from eKAZ shown in red. Positions selected by Yeh et al. 2017 from analysis of active site shown in blue. Positions mutated by epPCR shown in yellow.

NanoLuc substitutions shown in magenta, except with regard to residues included in the other sets.

The broad substrate specificity of NanoLuc may be helpful, since we have access to many comparisons, which could help elucidate which residues are important for what aspect of luciferase function. Inouye et al., in particular, have made detailed comparisons of many CTZ-analogues (Inouye, Sato, et al. 2013; Inouye, Sahara-Miura, et al. 2013). Such information could form a basis for more detailed knowledge of the active site and help build our understanding of luciferases towards a point where rational design may be possible.

It’s likely, given the broad initial substrate specificity of OLuc, that a series of NanoLuc variants could be made, each optimized for a particular substrate. This could form the basis of a multiplexed luciferase system, using only variants of NanoLuc, assuming that the cross-over activity was limited.

1.2 Comparisons between luciferases

A number of luciferases have been identified and studied, especially with focus on their application for biochemical purposes, such as reporters of binding or expression. The characteristics of a luciferase-luciferin pair will greatly affect what applications it can be used for.

As noted, NanoLuc has no direct relationship with other luciferases, neither in sequence nor in structure. NanoLuc is also smaller than most luciferases known. At 19kDa, it’s smaller than the commonly used Renilla and Firefly luciferases (36kDa and 61kDa, respectively) and roughly the same size as Gaussia luciferase. Size may be an important trait for some applications, such as protein fusions, where a larger luciferase could create steric problems. Nishitsuji et al. produced a recombinant hepatitis B virus (HBV), with an inserted NanoLuc gene, for investigating the infection of human liver cells. Since HBV is sensitive to large insertions into its genome, the small size of NanoLuc was ideal to avoid disruption of the normal viral life-cycle and allowed easy monitoring of the progress of infection. Not only did this allow them to detect infected cells in an easy and cheap manner, but also to dynamically follow the effects of HBV inhibitors (Nishitsuji, Ujino et al. 2015).

Figure 1.7 – Comparison of structures and substrates of select luciferases. Note that GLuc is not included, since its exact structure is unknown.

Figure constructed on the basis of Kaskova, Tsarkova, and Yampolsky 2016; Hall et al. 2012; and PDB files 5B0U, 2PSD, and 1LCI.

In addition to size differences, luciferases can also have structural features that make them more or less suitable for specific purposes. For example, Gaussia luciferase has five disulfide bridges, whereas NanoLuc has none. This can affect which expression system is relevant. Expression of Gaussia luciferase in E. coli can result in a large proportion of the protein being caught in insoluble aggregates, likely as a result of mismatched cysteines. For ideal expression of GLuc, specialized expression systems are needed, to ensure correct disulfide formation (Rathnayaka et al. 2010; Matos et al. 2014). NanoLuc, on the other hand, is easily expressed and purified from E. coli.

NanoLuc and GLuc are comparable in some ways. In addition to being of similar size, both luciferases use CTZ as their substrate and emit light at approximately the same wavelength. However, the similarities stop there. Their kinetics are quite different, with GLuc producing a short-lived spike of high luminescence (so-called “flash” kinetics), whereas NanoLuc gives a luminescent signal that is lower, but stable over time (reported half-life of 2h, with furimazine (Hall et al. 2012)).

A long-lasting signal makes measurements more practically simple, since one does not need to rush to catch the early spike. On the other hand, a short-lived signal may be more relevant for studying dynamics. NanoLuc has been successfully paired with a PEST tag, which allows it to be used to study dynamic processes, such as transcription (Hall et al. 2012).

In addition to the qualities of the luciferase itself, the substrate requirements of a luciferase are crucial for its utility. Substrates differ in solubility, stability, cell toxicity, and expense. The specific substrate can also alter the emission wavelength. Furthermore, some luciferases use, effectively, two substrates.

An interesting example of this is Firefly luciferase (FLuc). This luciferase naturally requires ATP in addition to its luciferin. Rather than being a limitation, this has been used creatively to generate an ATP reporter. When the luciferin is added in surplus, the availability of ATP becomes the limiting factor and the light output is therefore proportional to the ATP concentration. This property is well known and is exploited in commercially available kits (Alexander, Ederer, and Matseni 1976; Link 1).

Along similar lines, differences in substrate can also allow multiplexing of luciferases, e.g. with one as a signal read-out and the other as an internal control. Nakajima et al. 2005 even used three luciferases with different emission spectra, but using the same substrate. By using optical filters, they could simultaneously monitor the expression of two genes, and have a normalizing control.

Verhoef et al. took advantage of the differing substrates of luciferases to make a multiplexed system using FLuc (substrate D-luciferin) and NLuc (substrate CTZ/furimazine) to study inhibitors of the oncogenes MDM2 and MDM4, which both bind to p53, but respond differently to known inhibitors. Using this approach, both interactions could be evaluated in the same assay, without complicated spectral analyses or filters (Verhoef, Mattioli et al. 2016).

The emission spectrum is relevant not only for discriminating between luciferase signals, but also for specific applications. For example, bioluminescent imaging of mammalian tissues requires a longer wavelength, since mammalian tissues tend to absorb light at shorter wavelengths. A short wavelength would result in a muted signal, lowering the sensitivity of the assay. For that reason, several luciferases have been engineered for changes in the emission spectrum, especially towards longer wavelengths. Combining a luciferase with a fluorescent protein can also be used to shift the emission to a longer wavelength, via BRET, as has been done with versions of NanoLuc and as occurs naturally with RLuc (Ward and Cormier 1979).

An example of the creative use of BRET is the work of Shigeto et al., who used NLuc combined with yellow fluorescent protein (YPet) to create an insulin reporter. Using fragments of the binding pocket of the insulin receptor, Shigeto et al. produced a split system, with one fragment fused to NLuc and the other to YPet. Without insulin present, the receptor fragments would not associate and only the NLuc signal was seen, but upon binding insulin, the receptor fragments were brought into close contact, resulting in a BRET signal with a clear change in emission wavelength (Shigeto, Ikeda et al. 2015).

Figure 1.8 – Comparisons of NanoLuc with other luciferases

Shown are comparisons of NanoLuc and FLuc, regarding sensitivity to temperature (a) and pH (b). Values normalized to maximum luminescence. Also shown is a comparison of maximum luminescent output, for NanoLuc (using furimazine), FLuc (D-luciferin) and RLuc (CTZ).

Overall, NanoLuc has proven itself to be a very useful luciferase. Compared with other commonly used luciferases, NanoLuc is stable over a wider range of pH values, has a higher thermal stability, and has brighter luminescence. The broad substrate specificity allows the use of unique substrates, which can further help in multiplexing experiments. These qualities have resulted in a very promising tool for biochemical investigations and are no doubt part of why NanoLuc was chosen for the development of a tunable split system.

1.3 NanoBiT

From the basis of NanoLuc, Dixon et al. 2016 described the construction of a split luciferase system, NanoBiT (NanoLuc Binary Technology), wherein the eleventh beta strand was removed, to produce a stable, truncated luciferase (tLuc), covering residues 1-156 of the full-length NanoLuc, and an 11-amino acid peptide, corresponding with residues 158-168 of NanoLuc. The tLuc alone has negligible residual activity (<0.1% of NanoLuc), but the peptide can complement tLuc, to restore luciferase activity.

Several split systems have already been constructed, on the basis of NanoLuc (Zhao et al. 2016; Verhoef et al. 2016). However, in these cases, the split site was made on an ad hoc basis, intended for a particular application, with rather large fragments. Dixon’s contribution was to find an ideal split site, resulting in tLuc, optimized for stability, with a very small peptide tag, determining the affinity of complementation. This allows the flexible use of this system for multiple applications, only changing the small tag itself.

In Dixon’s work, the truncated luciferase was optimized for stability via mutagenesis and screening, using furimazine, while the peptide was optimized for affinity via rational mutations. This produced a variety of peptides with affinities ranging from 190µM to 700pM, allowing this system to be used for several different applications.

Figure 1.9 – peptides designed by Dixon, et al.

Sequences of peptides, with substitutions shown in red, alongside measured Kds. The NanoLuc native peptide shown as NP. From Dixon et al. 2016.

It is notable that the main difference between the peptides in question is their affinity for tLuc. Figure 1.10 shows the saturation curves of the peptides from figure 1.9: given a high enough concentration of peptide, all of them reach the same maximum luminescence. This indicates that the basic functionality is the same, regardless of peptide sequence. Once bound, any of the tested peptides can restore function.

Figure 1.10 – Titration of tLuc with the peptide variants.

Adapted from Dixon et al. 2016.

This split system can be used to detect protein-protein interactions, by fusing tLuc to one interaction partner and the peptide to the other. Luciferase activity is then restored to the degree that the proteins interact. However, for this to work, the intrinsic affinity between the NanoBiT partners must be lower than that of the interaction being investigated, or the luciferase function will be restored regardless of the interaction between the proteins under investigation.

For other applications, such as the detection and quantification of proteins, a high-affinity tag is required for higher sensitivity. Zhao et al. used an ad hoc split NanoLuc system to study protein aggregation. The target protein was fused to a NanoLuc fragment and probed with the corresponding fragment. Aggregated protein would be unavailable for complementation and would not give rise to a luminescence signal, whereas soluble protein would readily complement and produce signal. In such a case, the sensitivity of the assay is dependent on the affinity between the fragments. The ability to tune the affinity of the tag, without otherwise altering the setup, is a substantial benefit of the NanoBiT system (Zhao, Nelson et al. 2016).
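The role of affinity in these applications can be illustrated with a simple one-site binding model. The sketch below is not taken from Dixon et al. 2016; it assumes 1:1 binding with the peptide in large excess over tLuc (so that free peptide is approximately equal to total peptide), and the function and constant names are chosen here for illustration. Only the two Kd extremes quoted above (190µM and 700pM) come from the text.

```python
def fraction_bound(peptide_conc_m: float, kd_m: float) -> float:
    """Fraction of tLuc occupied by peptide in a 1:1 binding model.

    Assumption: peptide is in large excess over tLuc, so the free
    peptide concentration is approximated by the total concentration.
    Both arguments are in mol/L.
    """
    if peptide_conc_m < 0 or kd_m <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return peptide_conc_m / (kd_m + peptide_conc_m)


def relative_luminescence(peptide_conc_m: float, kd_m: float) -> float:
    """Luminescence as a fraction of the shared maximum.

    Assumption: every bound complex is equally active, so the signal is
    proportional to occupancy and all peptides share one maximum.
    """
    return fraction_bound(peptide_conc_m, kd_m)


# Kd extremes of the peptide series quoted in the text.
KD_LOW_AFFINITY_M = 190e-6    # 190 µM
KD_HIGH_AFFINITY_M = 700e-12  # 700 pM

# Print occupancy for both peptides across a range of concentrations.
for conc_m in (1e-9, 1e-6, 1e-3, 1e-1):
    low = relative_luminescence(conc_m, KD_LOW_AFFINITY_M)
    high = relative_luminescence(conc_m, KD_HIGH_AFFINITY_M)
    print(f"[peptide] = {conc_m:.0e} M  low-affinity: {low:.4f}  high-affinity: {high:.4f}")
```

Under these assumptions, a peptide at its own Kd gives half-maximal signal, and both peptides approach the same plateau at sufficiently high concentration. At low nanomolar concentrations the low-affinity peptide is almost entirely unbound, which is what an interaction assay needs, while the high-affinity peptide is already mostly bound, which is what a detection tag needs.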

The luciferase reaction can take place with purified protein, directly in cell lysate, or intracellularly, making the system quite flexible. These applications are not mere theory, as this system has been applied by other groups, such as Oh-hashi et al., who used the NanoBiT system with the high-affinity p86 peptide to examine the expression of mouse ATF4 protein, including the effects of inhibition of protein synthesis and protein degradation. In addition, the luminescent signal was used to easily select positive clones after transfection (Oh-hashi, Furuta et al. 2017).

Promega are already selling kits based on these principles, such as the NanoBiT® Protein:Protein Interaction System, and the Nano-Glo blotting and in-gel detection kits (Link 2). The Nano-Glo in-gel detection system is a fine example of how this system can improve on existing methods. It is intended as an alternative to western blotting. Western blotting requires transferring proteins to a nitrocellulose or PVDF membrane and then identifying those by using individually targeted antibodies. The Nano-Glo system detects proteins directly in the SDS gel, using only tLuc and furimazine substrate. This is faster, reduces handling steps, and does not require specific antibodies, as any protein tagged with the SmBiT peptide can be identified with the same reagents. The high affinity of the p86 peptide allows sensitive and specific detection (Link 3).

Dixon et al. did not perform an exhaustive search of the sequence space, focusing on relatively conservative substitutions, as well as truncations and tests of linker length. Linker length was shown to be of marginal importance, with even no linker at all performing well. The truncations showed little effect with regard to the terminal residues (#158-159 and #167-168). However, when truncating further than that, the ability to complement dropped immediately and sharply (Dixon et al. 2016, supplement).

It appears, then, that the central seven positions are crucial for the affinity of the peptides. Three of these positions are not substituted at all, and the remaining positions have only two or three possible substitutions each. The fact that a wide range of affinities was found with such a cursory exploration of the sequence space raises the question of what else might be possible.

This formed the basis for the central research question of this thesis: Do Promega’s peptides represent the full range of affinities, or are there further, unexplored options, which might increase the utility of this already functioning split system?

1.4 Project Concept

Given the large sequence space of any peptide, it would be a daunting task to synthesize and test every variant peptide individually. To overcome this, we constructed a screening platform, to assist in quickly uncovering which peptide substitutions would be interesting to examine in more detail. In particular, the objective was to find peptides with an increased affinity, for improved function for identification and quantification of tagged proteins. For that purpose, we constructed a cell-based platform, which would allow high throughput screening of peptide variants.

A screening or selection system is an effective way of getting information without doing full individual characterizations. The idea is to zero in on a particular quality (in the case of the present work, luminescence) and assay for that, ignoring most everything else. By spending less time on any individual variant, you can examine a greater number. If you have chosen your assay correctly, this should give you the relevant information, with a minimum of work.

The fundamental difference between screening and selection is in how variants are assessed and, consequently, to what degree individual variants must be examined. In a screen, every single variant to be examined must be run through the screen and evaluated according to the results. In a selection, every variant is run through the selection, but only the hits pass, with the misses being selected away. For that reason, selection systems do not give specific information about the misses.

Selection systems tend to be more suited for assaying large numbers of variants, since most of them are essentially discarded. Screening systems rely on testing everything, which means more work per variant tested, but also more information on the misses, which can be useful, depending on what manner of analysis is used and what traits are being investigated.

However, the choice of system will depend largely on what’s practically feasible. In the case of this project, selecting for luminescence was not practical, since it’s difficult to connect luminescence with cell viability. Instead, we focused on a screening procedure, which allowed a quick first assessment of activity, followed by more detailed measurement of those variants that showed promise.

Screening and selection are time-efficient ways of cutting to the bone. In our case, we were not interested in all possible characteristics of a given peptide. We just wanted to know if it could complement the luciferase well, when used as a fusion tag on another protein. By focusing on this one trait, and by designing a system to easily measure it, we can examine a broad swathe of peptides in a reasonable amount of time.

1.5 References

Alexander, David N, Grace M Ederer, and John M Matseni. 1976. “Evaluation of an Adenosine 5’-Triphosphate Assay as a Screening Method to Detect Significant Bacteriuria.” J. CLIN. MICROBIOL. 3: 5.
Angelucci, Francesco, Kenneth A. Johnson, Paola Baiocco, Adriana E. Miele, Maurizio Brunori, Cristiana Valle, Fabio Vigorosi, et al. 2004. “Schistosoma Mansoni Fatty Acid Binding Protein: Specificity and Functional Control as Revealed by Crystallographic Structure.” Biochemistry 43 (41): 13000–11. https://doi.org/10.1021/bi048505f.
Dixon, Andrew S., Marie K. Schwinn, Mary P. Hall, Kris Zimmerman, Paul Otto, Thomas H. Lubben, Braeden L. Butler, et al. 2016. “NanoLuc Complementation Reporter Optimized for Accurate Measurement of Protein Interactions in Cells.” ACS Chemical Biology 11 (2): 400–408. https://doi.org/10.1021/acschembio.5b00753.
Flower, Darren R, Anthony C.T North, and Clare E Sansom. “The Protein Family: Structural and Sequence Overview.” Biochimica et biophysica acta, and molecular enzymology 1482.1-2 (2000): 9–24.
Hall, Mary P., James Unch, Brock F. Binkowski, Michael P. Valley, Braeden L. Butler, Monika G. Wood, Paul Otto, et al. 2012. “Engineered Luciferase Reporter from a Deep Sea Shrimp Utilizing a Novel Imidazopyrazinone Substrate.” ACS Chemical Biology 7 (11): 1848–57. https://doi.org/10.1021/cb3002478.
Inouye, Satoshi, Yuiko Sahara-Miura, Jun-ichi Sato, Rie Iimori, Suguru Yoshida, and Takamitsu Hosoya. 2013. “Expression, Purification and Luminescence Properties of Coelenterazine-Utilizing Luciferases from Renilla, Oplophorus and Gaussia: Comparison of Substrate Specificity for C2-Modified Coelenterazines.” Protein Expression and Purification 88 (1): 150–56. https://doi.org/10.1016/j.pep.2012.12.006.
Inouye, Satoshi, Jun-ichi Sato, Yuiko Sahara-Miura, Suguru Yoshida, and Takamitsu Hosoya. 2014. “Luminescence Enhancement of the Catalytic 19kDa Protein (KAZ) of Oplophorus Luciferase by Three Amino Acid Substitutions.” Biochemical and Biophysical Research Communications 445 (1): 157–62. https://doi.org/10.1016/j.bbrc.2014.01.133.
Inouye, Satoshi, Jun-ichi Sato, Yuiko Sahara-Miura, Suguru Yoshida, Hajime Kurakata, and Takamitsu Hosoya. 2013. “C6-Deoxy Coelenterazine Analogues as an Efficient Substrate for Glow Luminescence Reaction of NanoKAZ: The Mutated Catalytic 19kDa Component of Oplophorus Luciferase.” Biochemical and Biophysical Research Communications 437 (1): 23–28. https://doi.org/10.1016/j.bbrc.2013.06.026.
Inouye, Satoshi, Ken Watanabe, Hideshi Nakamura, and Osamu Shimomura. 2000. “Secretional Luciferase of the Luminous Shrimp Oplophorus Gracilirostris : CDNA Cloning of a Novel Imidazopyrazinone Luciferase.” FEBS Letters 481 (1): 19–25. https://doi.org/10.1016/S0014-5793(00)01963-3.
Kaskova, Zinaida M., Aleksandra S. Tsarkova, and Ilia V. Yampolsky. 2016. “1001 Lights: Luciferins, Luciferases, Their Mechanisms of Action and Applications in Chemical Analysis, Biology and Medicine.” Chemical Society Reviews 45 (21): 6048–77. https://doi.org/10.1039/C6CS00296J.
Markova, S. V., and E. S. Vysotski. 2015. “Coelenterazine-Dependent Luciferases.” Biochemistry (Moscow) 80 (6): 714–32. https://doi.org/10.1134/S0006297915060073.
Markova, Svetlana V., Marina D. Larionova, and Eugene S. Vysotski. 2019. “Shining Light on the Secreted Luciferases of Marine Copepods: Current Knowledge and Applications.” Photochemistry and Photobiology 95 (3): 705–21. https://doi.org/10.1111/php.13077.
Matos, Cristina F. R. O., Colin Robinson, Heli I. Alanen, Piotr Prus, Yuko Uchida, Lloyd W. Ruddock, Robert B. Freedman, and Eli Keshavarz-Moore. 2014. “Efficient Export of Prefolded, Disulfide-Bonded Recombinant Proteins to the Periplasm by the Tat Pathway in Escherichia Coli CyDisCo Strains.” Biotechnology Progress 30 (2): 281–90. https://doi.org/10.1002/btpr.1858.
McCapra, Frank. 1976. “Chemical Mechanisms in Bioluminescence.” Accounts of Chemical Research 9 (6): 201–8. https://doi.org/10.1021/ar50102a001.
Nakajima, Yoshihiro, Takuma Kimura, Kazunori Sugata, Toshiteru Enomoto, Atsushi Asakawa, Hidehiro Kubota, Masaaki Ikeda, and Yoshihiro Ohmiya. 2005. “Multicolor Luciferase Assay System: One-Step Monitoring of Multiple Gene Expressions with a Single Substrate.” BioTechniques 38 (6): 891–94. https://doi.org/10.2144/05386ST03.
Nishitsuji, Hironori, Saneyuki Ujino, Yuko Shimizu, Keisuke Harada, Jing Zhang, Masaya Sugiyama, Masashi Mizokami, and Kunitada Shimotohno. 2015. “Novel Reporter System to Monitor Early Stages of the Hepatitis B Virus Life Cycle.” Cancer Science 106 (11): 1616–24. https://doi.org/10.1111/cas.12799.
Oh-hashi, Kentaro, Eri Furuta, Keito Fujimura, and Yoko Hirata. 2017. “Application of a Novel HiBiT Peptide Tag for Monitoring ATF4 Protein Expression in Neuro2a Cells.” Biochemistry and Biophysics Reports 12 (December): 40–45. https://doi.org/10.1016/j.bbrep.2017.08.002.
Rathnayaka, Tharangani, Minako Tawa, Shihori Sohya, Masafumi Yohda, and Yutaka Kuroda. 2010. “Biophysical Characterization of Highly Active Recombinant Gaussia Luciferase Expressed in Escherichia Coli.” Biochimica et Biophysica Acta (BBA) - Proteins and Proteomics 1804 (9): 1902–7. https://doi.org/10.1016/j.bbapap.2010.04.014.
Shigeto, Hajime, Takeshi Ikeda, Akio Kuroda, and Hisakage Funabashi. 2015. “A BRET-Based Homogeneous Insulin Assay Using Interacting Domains in the Primary Binding Site of the Insulin Receptor.” Analytical Chemistry 87 (5): 2764–70. https://doi.org/10.1021/ac504063x.
Shimomura, Osamu. Bioluminescence - Chemical Principles and Methods. Illustrated, Revised. World Scientific, 2012.
Shimomura, Osamu, Takashi Masugi, Frank H Johnson, and Yata Hanedal. 1978. “Properties and Reaction Mechanism of the Bioluminescence System of the Deep-Sea Shrimp Oplophorus GracilorostrisP,” 5.
Tomabechi, Yuri, Takamitsu Hosoya, Haruhiko Ehara, Shun-ichi Sekine, Mikako Shirouzu, and Satoshi Inouye. 2016. “Crystal Structure of NanoKAZ: The Mutated 19 KDa Component of Oplophorus Luciferase Catalyzing the Bioluminescent Reaction with Coelenterazine.” Biochemical and Biophysical Research Communications 470 (1): 88–93. https://doi.org/10.1016/j.bbrc.2015.12.123.
Verhoef, Lisette G.G.C., Michela Mattioli, Fernanda Ricci, Yao-Cheng Li, and Mark Wade. 2016. “Multiplex Detection of Protein–protein Interactions Using a next Generation Luciferase Reporter.” Biochimica et Biophysica Acta (BBA) - Molecular Cell Research 1863 (2): 284–92. https://doi.org/10.1016/j.bbamcr.2015.11.031.
Ward, W.W. and Cormier, M.J. 1979. "Energy transfer protein in coelenterate bioluminescence." J. Biol. Chem. 254:3
Yeh, Hsien-Wei, Omran Karmach, Ao Ji, David Carter, Manuela M Martins-Green, and Hui-wang Ai. 2017. “Red-Shifted Luciferase–luciferin Pairs for Enhanced Bioluminescence Imaging.” Nature Methods 14 (10): 971–74. https://doi.org/10.1038/nmeth.4400.
Zhao, Jia, Travis J. Nelson, Quyen Vu, Tiffany Truong, and Cliff I. Stains. 2016. “Self-Assembling NanoLuc Luciferase Fragments as Probes for Protein Aggregation in Living Cells.” ACS Chemical Biology 11 (1): 132–38. https://doi.org/10.1021/acschembio.5b00758.

Links:
Link 1: https://www.thermofisher.com/order/catalog/product/A22066
Link 2: https://dk.promega.com/products/luciferase-assays/reporter-assays/nano_glo-luciferase-assay-system/?catNum=N1110
Link 3: https://dk.promega.com/-/media/files/resources/protocols/technical-manuals/500/nano-glo-in-gel-detection-system-technical-manual.pdf?la=en

2. Methods

The practical purpose of this work was to explore the functional sequence space of the SmBiT peptide of the NanoBiT system and find, if possible, a higher-affinity peptide, which would increase the utility of this already established system. As is often the case in protein engineering, the full sequence space of the peptide is too great to allow individual characterization of every variant. Instead, we designed a screening platform, whereby we could functionally assay the peptide variants in a relatively high-throughput manner.

In broad terms, we constructed an expression system, wherein we could simultaneously express both a peptide-tagged model protein (E. coli thioredoxin) and the truncated luciferase. Having confirmed that the components of the NanoBiT system could complement and function intracellularly, and that the luminescence signal could reliably distinguish peptides with varying affinities, we then constructed a series of mutant libraries, which would randomize selected positions of the peptide. These libraries were then screened, to find substitutions that would improve luminescence.

The screening took place in a two-step process:

First, a qualitative luminescence screen on agar plates. E. coli cells were transformed with the libraries and spread on agar plates. After growth, the resulting colony plates were sprayed with a solution of CTZ and the luminescent image recorded. Combining this luminescent image with a bright field image allowed us to pick out the brightest colonies with ease, discarding colonies with obviously lower luminescence.

Second, a quantitative luminescence screen in 96-well format. The bright colonies were transferred to 96-well plates prepared with medium and grown. CTZ was added and the luminescence signal was then measured in a TECAN plate reader. The 96-well cultures were replicated to a sequencing plate, giving the sequence of the peptide for each well.

The sequences and signals were then correlated and this formed the basic data set. Analyzing this data, we could then select candidate substitutions for closer individual assessment. Finally, the chosen candidates were synthesized as individual peptides and their affinities measured, to verify improvement.

[Flowchart: Mutant libraries generated → Qualitative screening on agar plates → Bright colonies picked up for 96-well plates → Quantitative luminescence screening and DNA sequencing → Data analysed and candidates selected → Candidate peptides measured for affinity]

Figure 2.1 – Screening Workflow

Mutants are generated in libraries and pre-screened for luminescence on agar plates. Bright colonies are inoculated into 96-well plates, which are replicated, for quantitative luminescence measurement and DNA sequencing. This data is combined and analyzed, to find the optimal substitutions. From this, candidate peptides are selected and measured for affinity.

2.1 System details

Since we were especially interested in the use of the NanoBiT system as a tag for identification and quantification of proteins of interest, we constructed a system where the truncated NanoLuc (tLuc) was expressed together with a model protein tagged with the variant peptides. Taking advantage of the fact that complementation can occur intracellularly, we could then measure luminescence from living cell cultures, avoiding time-consuming purification steps. We selected E. coli thioredoxin (TRX) as our model protein, since it is small, stable and well-behaved.

To allow a high-throughput screening platform, we wanted to express the two interaction partners (peptide-tagged TRX and tLuc) from the same plasmid, in the same induction event. This was done to ensure a comparable level of expression, with minimal handling or complications. An operon was constructed, consisting of two genes, as shown in figure 2.2. The peptide variants are fused to the N-terminus of thioredoxin. Previous work by Dixon et al. (2016, supplement) showed a minimal effect of linker length, so the peptide was attached with a linker consisting of a single serine. The gene also includes a C-terminal his-tag. Following this is the tLuc gene, also C-terminally his-tagged. Each gene has its own ribosomal binding site and the entire operon is under control of a T7 promoter and lac operator, with a T7 terminator at the end. Gene sequences are shown in Appendix C.

The standard cloning vector pET30a was used for the screening system, with the relevant genes inserted between NdeI and EcoRI in the multiple cloning site (provided by GenScript). This vector employs a T7 promoter, necessitating a compatible E. coli strain, but also ensuring strong expression. The pET30a vector carries a gene for kanamycin resistance. The plasmid map is shown in Appendix D.

Figure 2.2 – Map of operon organization. Cutout from plasmid pET30a, covering 4980-6200 bp, with genes inserted between NdeI and EcoRI of the multiple cloning site. Example shown with p86 (Green) fused to E. coli thioredoxin (Orange), followed by tLuc (Blue). Each gene has its own ribosome binding site (Pink). Generated with SnapGene.

Expression and induction

The pET30a vector uses a T7 promoter, combined with a lac operator, for efficient and controlled expression. The T7 promoter uses a unique transcription start signal, recognized by the T7 RNA polymerase. This start signal is not otherwise found in the E. coli genome. As such, the T7 RNA polymerase works as a dedicated transcription machinery for only the genes under control of this promoter. The BL21(DE3) E. coli strain was used, since this strain contains a chromosomal copy of the gene for T7 RNA polymerase. This gene is also under the control of a lac operator, but is transcribed by the normal machinery of the cell. Likewise, BL21(DE3) contains a gene for LacI, the repressor for the lac operator. The pET30 vector also contains a copy of this gene, which ensures a plentiful supply of repressor, even for multi-copy plasmids. Without induction, LacI will bind to the lac operator of both the T7 RNA polymerase gene and the target genes, effectively repressing expression. Under induction with lactose or IPTG, LacI will no longer repress transcription of T7 RNA polymerase or the target genes. As T7 RNA polymerase is produced, it will begin transcribing the target genes specifically, resulting in highly efficient expression and, hopefully, a clear signal (Dubendorf and Studier 1991; Studier and Moffatt 1986).

To further take advantage of the qualities of this system, we employed a glucose/lactose auto-induction medium, rather than induction by adding IPTG. As long as glucose is available, the cells will prefer it as a carbon source and import of lactose is repressed. However, once the glucose runs out, lactose will be imported, starting induction. This allows for a delay in induction, without altering the medium during growth. Using the auto-induction medium, cultures can be inoculated and left to grow without further handling, allowing higher throughput (Görke and Stülke 2008).

2.2 Testing the system

To test the peptide-tagged thioredoxin constructs, we made two versions, tagged either N- or C-terminally with the p86 peptide. Using purified protein, a simple comparison was set up, examining the luminescence from each version of tagged thioredoxin.

Figure 2.3 – Test of thioredoxin tagging. Known concentrations of thioredoxin, tagged either N- or C-terminally, were mixed with a surplus of tLuc and measured for luminescence with 11µM CTZ. [Plot: Comparison - Tagged TRX - Average of replicas; x-axis: [TRX-p86] / nM (0 to 2.00); y-axis: Signal / RLUs (0 to 16000); series: C-terminal, N-terminal.]

As seen in the graph above, N-terminal tagging appears to be more effective at complementing the tLuc and restoring activity. It is unclear why this difference occurs, but some steric clash might be suspected. Since the purpose of this project was not to study tagging of thioredoxin, specifically, the matter was not looked into further and N-terminal tagging was used in all constructs going forward.

Figure 2.4 – Test of control peptides. Cultures expressing the peptide-tagged TRX and 11S together. Cultures grown and measured in independent triplicates, with 7µM CTZ. [Plot: Control wells; y-axis: signal (log scale, 1 to 100000); x-axis: p86, p128, p114, p86-R162Y.]

To ensure that the co-expression system was working properly, a series of constructs was made, using the peptides reported by Dixon et al. Three constructs were made, using p86, p128, and p114, with reported Kds of 0.7nM, 280nM, and 190µM, respectively. An additional construct was made using p86-R162Y, intended as a negative control. Cultures with the relevant plasmids were grown, in triplicates, in auto-induction medium and then assayed for luciferase activity.

The clear differences in activity correlated with the known affinities and the variation between replicates was minimal. This left us confident that this system would work with a wide range of affinities and could accurately identify high-affinity peptides.

Blind test

As a further test of the reliability of the screening platform, colonies of each construct were streaked on the same LB-kan agar plate. This resulted in a plate with a mix of colonies, scattered randomly across the plate, with no clear indication of which colony expressed which construct. From this plate, a random selection of colonies was picked, grown in microtiter format, and assayed again. The purpose was to see whether, when blinded to the actual identity of the constructs, it would still be possible to identify a construct purely from the signal. The data were arranged from highest to lowest signal, as shown in figure 2.5.

Figure 2.5 – Results from blind test of control constructs. 24 random colonies were picked and measured with 7µM CTZ. The results are arranged from highest to lowest signal. [Plot: Test wells; y-axis: signal (log scale, 1 to 100000); x-axis: cultures 1 to 24.]

The signals seem to separate nicely into discrete categories, with minimal intermediates. The cultures labelled above as 4, 7, and 11 were picked up and sent for plasmid sequencing. #7 matched p128 and #11 matched p114. #4 gave a poor quality sequence, perhaps indicative of a mixed culture. The results were deemed sufficiently reliable to proceed.

2.3 Library strategy

Dixon et al. demonstrated, by successive truncations, that the outermost amino acids of the peptide are less important for affinity than the central amino acids. Since, for practical reasons, we wished to focus our attention on the positions most likely to give an increase in luminescence/affinity, we designed libraries focused on varying the amino acids in positions #160-166, excluding the first two and last two amino acid positions in the peptide. The residues of the peptide are numbered according to the sequence of the full-length NanoLuc.

Figure 2.6 – Highlight of the presumed binding pocket for CTZ, in NanoLuc. Internal cavity shown in light gray surface mode. Nearby residue side-chains shown in teal. Strand 11 shown as green cartoon with sidechains. R162 shown in magenta. Residues #158 and #168 are indicated, to show direction of the strand. Generated with Pymol.

Substitutions were introduced into a background of the NanoLuc Native Peptide sequence (VTGWRLCERIL). We reasoned that a lower-affinity starting point would make improvements stand out more and make the screening process easier. It also avoided the risk that the already optimized peptide represents a local optimum, which might exclude other options if used as the starting point.

Since we wanted not only single mutants, but also multiple mutants, each library was constructed to have three codons randomized, in various combinations of positions. To keep the libraries at a manageable level of complexity, two codons were fully randomized, while the third was only partially randomized. Fully randomized codons were arranged systematically, so that all possible combinations of two codons would be represented in the full set of libraries.

For full randomization, NNK codons were used (N = any nucleotide; K = G or T). This allows all 20 amino acids as possible substitutions, while reducing the number of stop codons from three to one. The number of colonies necessary to screen was thereby reduced, as a larger proportion of the total variants would be viable. In addition, the use of NNK codons changes the relative ratios of amino acids, allowing amino acids encoded by a single codon, such as tryptophan or methionine, to be represented at higher rates than with NNN codons (1 in 32 codons rather than 1 in 64). Overall, this leads to a more balanced representation of substitutions, increasing the chances that all possible substitutions would be well represented in the libraries.
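The codon arithmetic behind the choice of NNK can be checked with a short script. The following is a minimal Python sketch that enumerates NNN and NNK codons against the standard genetic code; the function and variable names are illustrative only and are not part of the workflow used in this thesis.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide codes needed here: N = any base, K = G or T
DEGENERATE = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT"}


def expand(degenerate_codon):
    """Return all concrete codons encoded by a degenerate codon such as 'NNK'."""
    options = (DEGENERATE[base] for base in degenerate_codon)
    return ["".join(codon) for codon in product(*options)]


def residue_counts(degenerate_codon):
    """Count how many concrete codons encode each residue ('*' = stop)."""
    return Counter(CODON_TABLE[codon] for codon in expand(degenerate_codon))


for scheme in ("NNN", "NNK"):
    counts = residue_counts(scheme)
    total = sum(counts.values())
    print(scheme, "codons:", total, "stop codons:", counts["*"],
          "Trp:", counts["W"], "Met:", counts["M"])
```

For NNK this gives 32 codons with a single stop codon (TAG), against 64 codons and three stop codons for NNN, while tryptophan and methionine each keep their single codon.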

Position     #158 #159 #160 #161 #162 #163 #164 #165 #166 #167 #168
Starting AA  V    T    G    W    R    L    C    E    R    I    L
Library rows (shading of mutated positions not recoverable from the text): Lib 105, Lib 106, Lib 107, Lib 108, Lib 109, Lib 110, Lib 111, Lib 112, Lib 113, Lib 114, Lib 115, Lib 116, Lib 117, Lib 118, Lib 119, Lib 120

Table 2.1 – Positions mutated by each library. Bright green indicates that the codon is fully mutated (NNK codon), while dark green indicates a semi-randomized codon (varied with position). See also table 2.2.

The semi-random codons were constructed with only a single nucleotide position randomized, leading to four possible substitutions. Semi-randomized codons were placed on a case-by-case basis for each library, focusing on codon positions that were estimated as likely to lead to promising results, viz. improvements in luminescence. This evaluation was done based on the literature, especially the variants reported by Dixon, et al., focusing primarily on position #164, which is mutated in all libraries.

Position  Codon  Possible AAs
#160      NGT    CRSG
#164      TNT    FSYC
#166      ANA    ITKR

Table 2.2 – Codons used for semi-randomized positions. Each codon has a single randomized nucleotide, leading to four possible amino acids, indicated with single letter codes.

In all cases, the semi-random codons were constructed such that the starting amino acid was one of the four options. For that reason, the codons differ by position, as illustrated in the table above. This also means that in those positions the frequency of these substitutions is higher.

Since R162 is suspected of being crucial to the function of the peptide/luciferase, this position was not part of the primary set of libraries, but was instead randomized as its own library (Lib 105), mutating only that one position to all twenty possible amino acids.

The aim of this strategy was to get a large selection of single, double, and triple mutants, allowing not only a full screen of all possible substitutions, but also their combinations. With this data set, we expected to be able to estimate the optimal amino acid for each position.
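The library design can be checked with a few lines of arithmetic. The Python sketch below expands the semi-randomized codons of Table 2.2 and estimates the DNA-level complexity of a library with two NNK codons and one semi-randomized codon. All names are illustrative. The coverage estimate is an added assumption, not a calculation from this thesis: it treats every codon combination as equally likely and every colony as an independent draw, which real libraries only approximate.

```python
import math
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Semi-randomized codons as listed in Table 2.2
SEMI_RANDOM = {"#160": "NGT", "#164": "TNT", "#166": "ANA"}


def encoded_residues(codon):
    """Residues encoded by a codon where 'N' stands for any of the four bases."""
    options = [("ACGT" if base == "N" else base) for base in codon]
    return {CODON_TABLE["".join(c)] for c in product(*options)}


# Two NNK codons (32 codons each) and one semi-randomized codon (4 codons)
LIBRARY_SIZE = 32 * 32 * 4


def colonies_needed(n_variants, coverage):
    """Colonies to sample so that one given variant appears with probability `coverage`.

    Assumes equal representation of all variants (an idealization).
    """
    return math.ceil(math.log(1.0 - coverage) / math.log(1.0 - 1.0 / n_variants))


for position, codon in SEMI_RANDOM.items():
    print(position, codon, "".join(sorted(encoded_residues(codon))))
print("DNA-level variants per library:", LIBRARY_SIZE)
print("Colonies for 95% chance of seeing a given variant:",
      colonies_needed(LIBRARY_SIZE, 0.95))
```

Under these assumptions, each library contains 4096 codon combinations, and roughly three times that number of colonies would be needed for a 95% chance of sampling any one particular combination.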

2.4 Protocols

Library construction

Mutant libraries were constructed via PCR using uracil-containing primers and PfuX7 polymerase (Nørholm 2010), to allow for later USER cloning. A generic forward primer was used, alongside library-specific reverse primers, with variable nucleotides in known positions (see appendix C for sequences), reflecting the libraries shown in Table 2.1. PCRs were run under standard conditions (2mM dNTPs, 0.5nmol primers, 1U PfuX7 polymerase) with a standard Pfu amplification program (Initial denaturation: 2 minutes at 96°C; 25 amplification cycles of Denaturation: 30 seconds at 96°C, Annealing: 20 seconds at 46°C, and Elongation: 10 minutes at 72°C; Final elongation: 10 minutes at 72°C), resulting in full plasmid amplification. A template plasmid with a non-tagged TRX was used, to avoid bias in annealing.

After PCR, the products were treated with DpnI overnight, to remove template DNA. The resulting products were run on a 1% agarose gel and the relevant bands purified with a commercial GeneJet kit from ThermoFisher. Having obtained the linear products, the plasmids were circularized by USER cloning, using the USER enzyme mix supplied by New England Biolabs, according to an in-house protocol (30 minutes at 37°C; 20 minutes at 25°C; five annealing steps of 15 minutes each, at decreasing temperatures from 16°C to 12°C).

The resulting circularized plasmids were transformed directly into E. coli MC1061 chemically competent cells, which were plated on agar plates with LB medium and 50µg/ml kanamycin (LB-Kan). The transform plates were evaluated for library depth. The colonies were then washed off and plasmid libraries were purified, using a ThermoFisher GeneJet Miniprep kit. These libraries were then transformed into E. coli BL21(DE3) chemically competent cells. These transformants were spread on LB-Kan plates for screening. Strain genotypes can be found in Appendix E.

Qualitative screening

Coelenterazine (CTZ, supplied by Biosynth AG) stock was made by dissolving dry, powdered CTZ in isopropanol, and the concentration was checked by absorbance at 427nm (εCTZ: 7400 M-1 cm-1). This stock was kept at -20°C and diluted in isopropanol immediately before use. Transform plates were manually sprayed with CTZ (20-60µM; estimated 100-150µl) and imaged with an ATIK 4000c CCD camera, cooled to -40°C, with a 300 second exposure. An additional bright field image was taken, with an exposure of 0.02 seconds. These images were overlaid, with luminescence in false color, to create an image of the full plate, with bright colonies highlighted. The bright colonies were manually picked with toothpicks and inoculated into prepared 96-well plates.
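The concentration check follows the Beer-Lambert law, A = ε · c · l. A minimal Python sketch is given below; the 1 cm path length, the absorbance reading, and the target concentration in the example are assumed values chosen for illustration, not measurements from this work.

```python
EPSILON_CTZ = 7400.0  # M^-1 cm^-1 at 427 nm, as stated above


def concentration_molar(absorbance, epsilon, path_cm=1.0, dilution_factor=1.0):
    """Beer-Lambert law: c = A / (epsilon * l), scaled by any pre-measurement dilution."""
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return absorbance / (epsilon * path_cm) * dilution_factor


def dilution_volumes(stock_molar, target_molar, final_volume_ul):
    """C1 * V1 = C2 * V2: return (stock volume, solvent volume) in microlitres."""
    if target_molar > stock_molar:
        raise ValueError("target concentration exceeds stock concentration")
    stock_ul = target_molar / stock_molar * final_volume_ul
    return stock_ul, final_volume_ul - stock_ul


# Hypothetical reading: A427 = 0.37 in a 1 cm cuvette
stock = concentration_molar(0.37, EPSILON_CTZ)
print(f"stock concentration: {stock * 1e6:.1f} µM")

# Hypothetical spray solution: 150 µl at 20 µM
stock_ul, solvent_ul = dilution_volumes(stock, 20e-6, 150.0)
print(f"mix {stock_ul:.1f} µl stock with {solvent_ul:.1f} µl isopropanol")
```

The same relation applies to the protein and peptide concentrations determined by absorbance elsewhere in this chapter, with the relevant extinction coefficient substituted.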

Figure 2.7 – Example of pre-screen plate overlay

Bright field image of library colonies, overlaid with luminescent image in false red color.

Quantitative screening

96-well plates were prepared with two types of medium, both based on LB medium with 37mM phosphate added, pH7.4: LB-Glu (0.002% glucose added) and LB-Lac (0.002% glucose and 0.2% lactose added). Both media also contained 50µg/ml kanamycin. 120µl medium was added to each well of the 96-well plates.

Colonies were inoculated directly into a master plate with LB-Glu, along with known controls. This plate was then replicated into growth plates with LB-Lac. The master plate was further replicated to a 96-well sequencing plate (provided by Eurofins), as well as stamped on LB-kan agar plates (two agar plates per 96-well plate). The master plate was saved at -80°C, after adding glycerol to a final concentration of 20%.

The stamp plates were grown ON at 37°C and then imaged under the camera in the same fashion as the library plates, in order to check for heterogeneity in the cultures. The sequencing plate was grown ON at 37°C and then sent to Eurofins Genomics for sequencing. From the raw sequences returned, the sequence of the peptide was extracted and translated into the corresponding amino acids.

The growth plate was grown ON at 37°C and then measured for signal in a TECAN Infinite F200 Pro plate reader. Luminescence from the grown cultures was measured by addition of 5µl CTZ to each well by automatic injection (for a final concentration of ~3µM), followed by three measurements of every well in the plate. Due to delays in mixing, the first measurement was deemed unreliable, so the average of the last two measurements was used as the effective signal. The signals were then coupled with the sequences and the data collected for analysis. The full data set can be found in appendix F or in electronic format at https://tinyurl.com/BB-dataset.
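The coupling of signals and sequences can be sketched in a few lines of Python. This is an illustration only and not the analysis code used in this work: the well identifiers, the data layout, the example reads, and the example coding sequence are all hypothetical, and the peptide-coding region is assumed to have been extracted in frame already.

```python
from itertools import product
from statistics import mean

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame DNA sequence; trailing partial codons are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def effective_signal(reads):
    """Average of the last two of three reads; the first is discarded (mixing delay)."""
    if len(reads) != 3:
        raise ValueError("expected exactly three reads per well")
    return mean(reads[1:])


def couple(signals, peptide_dna):
    """Join signal and peptide sequence for wells present in both data sets."""
    shared_wells = sorted(signals.keys() & peptide_dna.keys())
    return {
        well: (translate(peptide_dna[well]), effective_signal(signals[well]))
        for well in shared_wells
    }


# Hypothetical example: one possible coding sequence for the native peptide
signals = {"A1": [100, 210, 190], "A2": [5, 12, 10]}
peptide_dna = {"A1": "GTGACCGGCTGGCGGCTGTGCGAACGCATTCTG"}
print(couple(signals, peptide_dna))
```

Wells without a usable sequence (here the hypothetical well A2) simply drop out of the coupled data set, which is one simple way of handling failed sequencing reactions.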

Candidate peptides

Selected candidate peptides were synthesized by TAG Copenhagen. Peptides were dissolved in milliQ water and the concentration was measured by absorption at 280nm or 214nm, as appropriate, using extinction coefficients calculated via ExPASy.

Expression and purification of protein

To obtain purified protein, either tagged thioredoxin or tLuc, we employed affinity chromatography, utilizing the his-tags. Plasmids were constructed, each with only the relevant gene, but otherwise identical to the previous system. The plasmids were transformed into E. coli BL21(DE3) cells and a single colony was picked for each desired protein, to inoculate a 5ml ON starter culture. For each protein, 500ml AB-LB medium (see recipe below) was inoculated from the ON culture and shaken at 37°C until an OD600 of 0.5. The culture was induced with 500µl 1M IPTG, for a final concentration of 1mM. The culture was left to express at 25°C for 3 hours, while shaking.

B-LB Medium
Peptone 10g
Yeast extract 5g
NaCl 5g
1M MgCl2 2ml
0.5M CaCl2 200µl
10mM FeCl3 300µl
mQ to 900ml, pH7.4

A Medium
(NH4)2SO4 20g
Na2HPO4 2H2O 150g
KH2PO4 30g
NaCl 30g
mQ to 1

Autoclaved separately. 100ml A medium added to 900 ml B-LB medium, for 1L AB-LB expression medium.

Cells were harvested by centrifugation and sonicated on ice for three runs of 6 x 30 seconds. The resulting lysate was centrifuged at 15,000 rpm for 10 minutes and the supernatant decanted and filtered. The supernatant was loaded on an Äkta FPLC (900 series), using a Ni-NTA column. A standard buffer (50mM Na2PO4, 150mM NaCl, pH8) was used, with increasing concentrations of imidazole at each step: Equilibration - 20mM, Elution 1 - 58.4mM, Elution 2 - 116mM, Elution 3 - 500mM. 1ml fractions were collected and examined by SDS-PAGE. The majority of the desired protein was contained in the Elution 2 (E2) fractions. These fractions were collected and dialyzed into the assay buffer (30mM Tris, 1mM EDTA, pH7.6) overnight at 5°C. Final concentrations were measured by absorption at 280nm, using extinction coefficients calculated by ExPASy, of 19940 M-1 cm-1 for tLuc and 19605 M-1 cm-1 for thioredoxin.

Affinity measurements

Stock solutions of peptides were diluted in assay buffer (30mM Tris-HCl, 1mM EDTA, pH 7.6) with added tLuc (1nM), in excess of 1000-fold dilution, to produce working stocks. These stocks were further diluted in the same buffer (with 1nM tLuc) to reach the assay concentrations. This procedure was followed to ensure a constant concentration of tLuc and limit the effects of pipetting errors on the final results. For each peptide, a series of samples was made with varying concentrations of peptide, aiming to have concentrations both below the expected Kd and high enough to likely achieve saturation. The samples were left at 25°C for 1 hour, to ensure full equilibration, and then measured in a 3 mL cuvette on a Perkin Elmer LS-55 spectrofluorometer (460nm emission peak, 20nm slit width), with the addition of 10µM CTZ. Reactions were allowed to proceed until a clear plateau of luminescence was reached; a minimum of 30s. The maximum luminescence output from each sample was then plotted against the peptide concentration, resulting in a saturation curve for each peptide. This data was fitted in Origin, to estimate Kd of each peptide. Practical experiments in affinity measurement were carried out by Charlotte O’Shea, Sylvester Vinther, Sigrid Jørsboe, and Jonas Puls.

Fitting for affinity

Affinity is often described in terms of Kd, which is defined as the relative concentrations of binding partners and complex, at equilibrium:

$$K_d = \frac{[A_{free}]\,[B_{free}]}{[AB]}$$

Where [Afree] and [Bfree] are the total concentrations of the unbound binding partners, and [AB] is the total concentration of the complex. In our case, this means:

$$K_d = \frac{[\text{tLuc}_{free}]\,[\text{peptide}_{free}]}{[\text{tLuc-peptide complex}]}$$

We’re here assuming a scenario of simple 1:1 reversible binding.
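As a sketch of where the fitting formula used later in this section comes from, the definition of Kd can be combined with mass conservation. The symbols x (total peptide concentration) and c (total tLuc concentration) are chosen here to match the fitting formula; the derivation assumes only the 1:1 binding stated above. The free concentrations are then c − [AB] and x − [AB]:

$$
\begin{aligned}
K_d &= \frac{(c - [AB])\,(x - [AB])}{[AB]} \\
0 &= [AB]^2 - (x + c + K_d)\,[AB] + x\,c \\
[AB] &= \frac{(x + c + K_d) - \sqrt{(x + c + K_d)^2 - 4\,x\,c}}{2}
\end{aligned}
$$

Only the root with the minus sign is physically meaningful, since [AB] can exceed neither x nor c. Dividing [AB] by c gives the fraction of tLuc bound in the complex, which runs from 0 (no peptide) to 1 (saturation).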

By making a series of samples, where the total [tLuc] is kept constant, we can examine the effects of added peptide on the luminescent signal. We are essentially titrating the free tLuc, using the peptide. By plotting the signal from each sample against the concentration of peptide, a binding curve is achieved. Since the luminescent signal correlates with the concentration of the complex, the curve describes the formation of the complex as a function of the peptide concentration. As the concentration of peptide is increased, more and more tLuc is bound, resulting in a higher luminescent signal. At higher concentrations, the curve flattens, eventually to a plateau. This plateau represents the maximum signal at saturation, where essentially all tLuc is bound in the complex. The shape of the curve reflects the strength of binding. A high affinity is reflected in a steep curve that reaches the plateau quickly, whereas lower affinity will show a more gradual curve, requiring higher concentrations of peptide to reach saturation.

Figure 2.8 – Titration of tLuc with the peptide variants. Adapted from Dixon et al. 2016.

The curves illustrated in figure 2.8 all rise to the same plateau, but the plateau is reached at different concentrations, representing different levels of binding affinity.

Such curves, in themselves, already allow a simple visual evaluation of the relative affinities of the peptides tested. Certainly, major differences are clear without further analysis. However, for more precision in estimating an exact Kd, a proper fitting is required. We fitted the affinity data in Origin to the following formula (Pollard 2010):

$$y = a + (b - a)\,\frac{(x + c + K_d) - \sqrt{(x + c + K_d)^2 - 4\,x\,c}}{2\,c}$$

Where y is the measured luminescence, x is the total peptide concentration, a is the baseline signal (i.e. with no peptide present), b is the maximum signal, c is the tLuc concentration, and Kd is the dissociation constant. The values of a, b, and Kd are fitted, whereas c is kept fixed at 1nM, as in the experiments. Fitting the data in this way should give a reasonably accurate picture of the affinities of the peptides selected. Graphs and details of fits can be found in appendix G.
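The fitting itself was done in Origin, but the same model can be sketched in a few lines of Python to make the procedure concrete. The code below is not the Origin algorithm: it is a minimal illustration that scans Kd on a logarithmic grid and, for each Kd, solves for a and b by linear least squares (for a fixed Kd the model is linear in a and b). All function names, the grid limits, and the synthetic data are assumptions made for this example.

```python
import math


def bound_fraction(x, c, kd):
    """Fraction of tLuc in complex at total peptide x and total tLuc c (same units as kd)."""
    s = x + c + kd
    return (s - math.sqrt(s * s - 4.0 * x * c)) / (2.0 * c)


def signal(x, a, b, c, kd):
    """Model luminescence: baseline a, plateau b, scaled by the bound fraction."""
    return a + (b - a) * bound_fraction(x, c, kd)


def fit_a_b(xs, ys, c, kd):
    """Least-squares a and b for a fixed kd, using y = a*(1 - f) + b*f."""
    fs = [bound_fraction(x, c, kd) for x in xs]
    s11 = sum((1 - f) ** 2 for f in fs)
    s12 = sum((1 - f) * f for f in fs)
    s22 = sum(f * f for f in fs)
    t1 = sum((1 - f) * y for f, y in zip(fs, ys))
    t2 = sum(f * y for f, y in zip(fs, ys))
    det = s11 * s22 - s12 * s12
    a = (t1 * s22 - t2 * s12) / det
    b = (s11 * t2 - s12 * t1) / det
    sse = sum((a * (1 - f) + b * f - y) ** 2 for f, y in zip(fs, ys))
    return a, b, sse


def fit_kd(xs, ys, c=1.0, kd_min=1e-3, kd_max=1e3, steps=601):
    """Scan kd on a log grid and return (kd, a, b) with the smallest squared error."""
    best = None
    for i in range(steps):
        kd = kd_min * (kd_max / kd_min) ** (i / (steps - 1))
        a, b, sse = fit_a_b(xs, ys, c, kd)
        if best is None or sse < best[3]:
            best = (kd, a, b, sse)
    return best[:3]


# Synthetic, noise-free titration (nM); the parameter values are hypothetical.
concentrations = [0.0, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 40.0]
readings = [signal(x, a=1.0, b=50.0, c=1.0, kd=2.0) for x in concentrations]
kd_fit, a_fit, b_fit = fit_kd(concentrations, readings, c=1.0)
print(f"Kd = {kd_fit:.2f} nM, baseline = {a_fit:.1f}, plateau = {b_fit:.1f}")
```

The grid scan is coarse compared to a proper nonlinear least-squares routine, and its resolution limits the precision of the returned Kd. It does make one point visible: the estimate is only well constrained when the data include concentrations on both sides of the Kd.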

2.5 References

Dixon, Andrew S., Marie K. Schwinn, Mary P. Hall, Kris Zimmerman, Paul Otto, Thomas H. Lubben, Braeden L. Butler, et al. 2016. “NanoLuc Complementation Reporter Optimized for Accurate Measurement of Protein Interactions in Cells.” ACS Chemical Biology 11 (2): 400–408. https://doi.org/10.1021/acschembio.5b00753.

Dubendorf, John W., and F. William Studier. 1991. “Controlling Basal Expression in an Inducible T7 Expression System by Blocking the Target T7 Promoter with Lac Repressor.” Journal of Molecular Biology 219 (1): 45–59.

Görke, Boris, and Jörg Stülke. 2008. “Carbon Catabolite Repression in Bacteria: Many Ways to Make the Most Out of Nutrients.” Nature Reviews Microbiology 6 (8): 613–24.

Pollard, Thomas D. 2010. “A Guide to Simple and Informative Binding Assays.” Edited by Douglas Kellogg. Molecular Biology of the Cell 21 (23): 4061–67. https://doi.org/10.1091/mbc.e10-08-0683.

Studier, F. William, and Barbara A. Moffatt. 1986. “Use of Bacteriophage T7 RNA Polymerase to Direct Selective High-Level Expression of Cloned Genes.” Journal of Molecular Biology 189 (1): 113–30.

3. Results

A total of ten 96-well plates were screened, with a total of 910 variants, excluding controls. Full screening data is included in appendix F, with an electronic copy stored at https://tinyurl.com/BB-dataset. Some cultures failed to give a clear sequence and were therefore excluded from further analysis. Such results are likely caused by problems with inoculation; either failure to inoculate or accidentally inoculating with more than one colony, e.g. because two colonies grew close together on the pre-screen plate. This left 790 variants, excluding failed sequences.

Some cultures gave extremely low signals, on the level of the background signal. Such a signal could be a result of accidentally picking up a dark colony during pre-screening, or because the culture failed to grow or express properly. As such, it’s either a sequence that doesn’t produce a functional peptide, or an anomalous signal that would only confuse the analysis. For that reason, cultures with a signal lower than 100 RLUs were excluded from the signal analysis, leaving 681 variants with clear sequences and significant signals.

However, we observed that cultures with identical sequences could give quite variable, but still clearly detectable, signals. In other words, while they’re definitely expressing the same constructs, the strength of expression varies. This could be explained by potential mutations occurring during growth. We observed that under strong induction (1mM IPTG), colonies expressing these constructs grew weakly, indicating that the cells were somehow stressed by this high level of expression. If cells expressing the peptide-tagged protein are under stress, then any cell that can repress the expression will have a growth advantage in the culture. Such repression could occur e.g. by mutations in the plasmid promoter sequence, or by deletion or other inactivation of the T7 polymerase, genomically. In either case, such mutated cells would be relieved of the stress caused by the induction and grow more vigorously than their non-mutated cousins. To the degree that they do, the overall signal from the culture would drop, as an increasing number of cells would express no luciferase. To investigate this, the cultures from the 96-well plates were spotted on agar plates and checked with the pre-screen method.

Some cultures show clear signs of sectoring growth (see figure 3.1). This could indicate the occurrence of a mutation, propagating as the cells grow. To check this phenomenon further, cultures with identical sequences, but variable signal, were selected and examined more closely. 6-E9 and 5-D8 were selected, since these cultures gave the same sequence, but while 6-E9 gave a robust signal, 5-D8 showed a signal less than 2% of that (see table 3.1).

Plate - Well    Substitutions    Signal
6-E9            F164C, R166P     125865
5-D8            F164C, R166P     1788

Table 3.1 – Example of cultures with identical sequences and variant signals.

Figure 3.1 – detail of sectoring of cultures. LB-Kan agar plate, stamped from 96-well growth plate. Imaged with CTZ spray. Luminescence signal in green overlay. Examples include bright, homogeneous growth (left), bright growth, with dark sectors (center), and low luminescent growth, with dark sectoring (right).

These cultures were streaked fresh on LB-Kan and then assayed by the pre-screening method again. It was found that while the bright cultures gave uniformly bright colonies, the 5-D8 culture resulted in a heterogeneous population of colonies; some bright, some dark, as seen in figure 3.2.

Figure 3.2 – Detail of mixed culture streak. Mixed culture streaked on LB-kan agar plate and grown overnight. Sprayed with CTZ and imaged. Luminescence signal overlaid in green.

Note that presence or absence of luminescence is not random, but distinctly tied to individual colonies, even when growing immediately next to one another. A colony of each phenotype from 5-D8 was picked up, re-streaked and assayed again. It was found that the phenotype stayed constant, indicating a likely genetic difference, rather than simple differences in growth or expression. The colonies were also used to inoculate a fresh 96-well growth plate and, upon measurement, it was found that the bright colonies of 5-D8 gave a signal comparable to the earlier bright colonies of 6-E9. Plasmid was purified from the dark colonies of 5-D8 and re-transformed into fresh BL21 competent cells. Upon assaying with the pre-screen method again, these colonies showed a robust signal, similar to the originally bright colonies.

These facts taken together would seem to indicate that some genomic change is taking place in the cells, perhaps as a result of a toxic effect from the high expression. Certainly, the plasmid works as intended, has the proper sequence and, when transformed into fresh cells, gives the signal expected of that sequence. The lower signals, while unfortunate, do not undermine the usefulness of the assay, nor do they fundamentally confuse the results. For all reasonable cases, any change from the expected system is more likely to cause a reduction in signal than an increase. As such, the highest signal is likely most representative of the actual activity/affinity of the peptide in question.

Given the relatively noisy system, a single measurement of a peptide may be unreliable. As such, if a substitution was found only once or twice, it is possible that the associated signal is not a true representation of the effect of that substitution. The more often a given substitution/position pair is found, the more reliable the signal is. As such, single occurrences of a sequence with a low signal should not make us overly concerned. Yes, it’s possible that the one occurrence of a sequence gives a low signal not because the peptide is deficient, but because something happened in the culture. However, any genuinely good peptide should be picked up multiple times and therefore give rise to multiple cultures. If even one of these grows properly, we can use that signal to conclude that all lower signals are artifacts.

Sequences were therefore curated to include only the best-performing example of any given peptide, removing duplicates. The resulting 365 sequences were used for the signal analysis.
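The three curation steps described above (removing failed sequences, removing signals below 100 RLU, and keeping only the brightest culture per sequence) can be sketched as a small filter. This is an illustration rather than the procedure actually used to process the data; the record layout and the two entries marked as hypothetical are assumptions made for the example.

```python
def curate(records, min_signal=100):
    """Apply the three curation steps to (well, sequence, signal) records.

    sequence is None when sequencing failed. Returns the records
    remaining after each step, so the attrition can be followed.
    """
    # Step 1: drop cultures without a clear sequence.
    sequenced = [r for r in records if r[1] is not None]
    # Step 2: drop cultures with a signal below the cutoff (100 RLU in the text).
    detectable = [r for r in sequenced if r[2] >= min_signal]
    # Step 3: keep only the brightest culture for each distinct sequence.
    best = {}
    for well, seq, sig in detectable:
        if seq not in best or sig > best[seq][2]:
            best[seq] = (well, seq, sig)
    return sequenced, detectable, list(best.values())


records = [
    ("6-E9", "F164C, R166P", 125865),  # from Table 3.1
    ("5-D8", "F164C, R166P", 1788),    # from Table 3.1
    ("X-1", None, 5000),               # hypothetical: failed sequencing
    ("X-2", "W161F", 42),              # hypothetical: background-level signal
]
sequenced, detectable, unique = curate(records)
print(len(sequenced), len(detectable), len(unique))
print(unique)
```

Keeping the maximum rather than the mean follows the argument above: deviations from the expected system are more likely to lower the signal than to raise it.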

3.1 Examining the libraries

Our purpose was to screen a broad selection of multiply substituted variants. As shown in the chart below, we found a majority of triple substitutions, a significant portion of double substitutions, and a small group of single substitutions. This matches what we expected and desired, indicating that our libraries were successful. Two examples of quadruple substitutions were found. This is presumably a result of errors during the PCR, or possibly sequencing errors. In either case, there’s a negligible number and this does not confuse our analysis.

Substitutions    Excluding    Excluding failed    Excluding      Excluding
                 controls     sequences           low signals    duplicates
0                177          58                  58             1
1                99           99                  60             19
2                276          276                 254            126
3                356          355                 307            217
4                2            2                   2              2
Total            910          790                 681            365

Table 3.2 – Number of substitutions per sequence. Shows the number of sequences with a given number of substitutions, after each step in the curating procedure. Note that failed sequences could not be assigned a substitution and therefore are classified as zero substitutions.

A significant number of sequences are removed as duplicates. This indicates that the libraries were screened in good depth, since the same sequences were found in multiples. It also shows that the pre-screening procedure was consistent in selecting the peptides with clear activity. An accidental pickup should not occur frequently, but a viable peptide should be expected to recur, as we see. As such, while we’re excluding duplicates from the signal analysis, we can still retrieve relevant information from the fact that such duplicates exist. This baby should not be thrown out with the bathwater.

Table 3.3 shows the number of sequences found from each library. It’s clear that there are significant differences between the productivity of each library. However, since each library mutates only specific positions in the peptide, such differences are to be expected. Various positions may have different room for improvement, depending on the functionality of the starting amino acid. If the starting amino acid is close to ideal, there may not be much room for improvements, whereas a sub-optimal amino acid may be replaced by any number of substitutions and still yield improvements. As such, the number of unique sequences produced by any library tells us something about how well suited the starting amino acid is, for the positions mutated by that library.

Library    Excluding    Excluding failed    Excluding      Excluding
           controls     sequences           low signals    duplicates
Lib 105    110          92                  56             1
Lib 106    12           12                  11             6
Lib 107    39           33                  8              6
Lib 108    23           10                  9              7
Lib 109    36           31                  28             13
Lib 110    21           16                  16             5
Lib 111    58           46                  36             26
Lib 112    96           94                  91             44
Lib 113    96           93                  91             73
Lib 114    91           90                  84             41
Lib 115    51           29                  24             14
Lib 116    51           33                  33             19
Lib 117    32           24                  19             8
Lib 118    107          101                 89             58
Lib 119    35           34                  34             15
Lib 120    52           52                  52             29

Table 3.3 – Number of sequences, per library. Shows how many sequences were found from each library, after each step of the curating procedure.

Libraries 106-110, which all mutate position G160, show a low number of unique sequences, compared with Libraries 111-114, which mutate position W161 instead. This could indicate that position #160 is already close to optimal. Conversely, the most productive libraries are 112, 113, 114, and 118. These libraries all hit positions #161 and #164-166, in various combinations. This is an indication that these positions have the most room for improvement.

From these results, it appears that our libraries functioned as intended, were screened to a proper depth, and give results consistent with what we might expect, given the underlying physical reality. As such, we can confidently proceed with our analysis.

3.2 Heatmap calculation

With the data sorted and with reasonable assurance of the proper function of our libraries and screening method, we evaluated the results. Given the large number of sequences, a better overview is needed. In the spirit of high throughput, rather than going through a long list by hand, we calculated a per substitution signal average: The average signal of all unique peptides with a given substitution in a given position, regardless of other substitutions. This average was calculated for all possible substitution/position pairs (including the original, un-substituted amino acids) and arranged in a heat map, as shown below.

Signal Average
                 Positions
Substitutions    G160       W161       L163       C164       E165       R166
A                –          11546      –          12145      22024      22428
C                –          17236      3539       [20243]    46952      39392
D                1545       –          8917       173        12817      12981
E                115        203        4416       1896       [20683]    21022
F                –          46166      2806       28182      22018      18611
G                [22706]    16514      278        6018       39776      24286
H                2541       30358      3499       22722      33433      61862
I                694        9682       10110      2023       4045       11169
K                –          4406       19320      637        35172      15442
L                –          20144      [25166]    12099      5185       7550
M                –          6876       12116      13459      19730      65658
N                –          13534      –          9015       14689      3151
P                426        7045       –          1711       2020       39018
Q                –          118        14092      931        29592      28052
R                –          6763       16517      13100      29547      [22876]
S                –          14200      1709       22331      18803      53335
T                2800       1494       1475       5762       17057      9744
V                3081       5828       3506       1396       10198      41204
W                –          [20652]    1853       8908       7063       –
Y                –          42347      4552       1636       4941       27408

Values for the unsubstituted amino acids are given in square brackets. A dash marks a substitution/position pair for which no value is available.

Table 3.4 – Heat map of calculated signal averages. Numbers represent the signal average for that position and substitution. Substitutions have been color coded with red for high signals and blue for low signals, within each position column. Unsubstituted amino acids are boxed, for comparison.
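The per-substitution signal average can be sketched as follows. This is a minimal illustration of the calculation as described in the text, not the code used to produce Table 3.4. The parsing helper and the data layout are assumptions, and the input is limited to the four peptides listed later in Table 3.6, so the resulting averages differ from those of the full data set.

```python
from collections import defaultdict

# Mutated positions and their unsubstituted residues, as in the heat map columns.
POSITIONS = (160, 161, 163, 164, 165, 166)
NATIVE = dict(zip(POSITIONS, "GWLCER"))


def parse_substitutions(subs):
    """Turn 'W161F, C164F' into the residue present at each mutated position."""
    residues = dict(NATIVE)
    for token in subs.split(","):
        token = token.strip()
        if token:
            residues[int(token[1:-1])] = token[-1]
    return residues


def signal_averages(peptides):
    """Return {(position, residue): (mean signal, count)} over unique peptides.

    Every peptide contributes its signal to one residue per position,
    including positions where it carries the unsubstituted amino acid.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for subs, sig in peptides:
        for pos, res in parse_substitutions(subs).items():
            sums[(pos, res)] += sig
            counts[(pos, res)] += 1
    return {key: (sums[key] / counts[key], counts[key]) for key in sums}


# The four peptides of Table 3.6 (substitutions, signal).
peptides = [
    ("W161F, L163I, C164F", 40368),
    ("W161F, C164F", 79044),
    ("L163I, C164M, R166K", 1203),
    ("C164M, R166K", 37584),
]
averages = signal_averages(peptides)
for key in sorted(averages):
    mean, count = averages[key]
    print(key, round(mean), count)
```

Returning the count alongside the mean keeps the number of supporting sequences next to each average, which matters because an average built on one or two sequences is easily dominated by the other substitutions those sequences carry.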

The rationale for the signal average is that if a substitution is improving the affinity of the peptide, any peptide including this substitution should be improved, compared to a peptide without that substitution. If a suite of peptides with that substitution is found, the background substitutions should average out and the signal average should be mostly reflective of the contribution of the substitution/position pair being calculated. This would be definitely true if all possible substitutions and combinations were present in the data set. However, since that is not the case here, some care should be taken to avoid false positives. Specifically, given that substitutions occur most often in combinations (only ~5% of unique peptides are single substituted, see table 3.2) and given the two-step screening procedure weeding out most of the poorly performing peptides, it is possible that the signal average for a given substitution/position could be misleading. For example, a poor substitution will not be seen in isolation, because it wouldn’t pass the pre-screen. As such, it will only ever be found in combination with other, improving, substitutions and the poor substitution will be rated as higher than its actual effect would otherwise allow. Substitutions that only occur in the presence of otherwise well-performing substitutions, are therefore suspect and should be carefully evaluated.

3.3 Evaluating candidates

A simple approach is to look at how often a given substitution is found in a given position. If the calculated signal average for a substitution is a genuine reflection of the benefit of that substitution, then highly rated substitutions should also be found often. On the other hand, with few examples of a given substitution, the signal average becomes unreliable and more prone to the problems mentioned above. Likewise, a comparison between the frequencies of different substitutions is helpful. In position #164, Tyrosine and Phenylalanine are both equally likely to occur in the libraries, but are found in the screen at a rate of 181 to 5. Such a difference is difficult to explain as a false positive and is a sign of a genuine difference in the functionality of these residues. It is also noteworthy that C164F is the substitution found in Dixon’s high-affinity peptide. A somewhat arbitrary cutoff of five counts was used to assist in evaluating the sequences.

                       Positions
Substitutions          G160    W161    L163    C164    E165    R166
A                      0       17      0       2       11      5
C                      0       10      3       –       3       3
D                      2       0       1       1       1       1
E                      1       1       4       1       –       6
F                      0       31      2       181     1       2
G                      –       15      2       3       10      3
H                      1       17      4       25      4       6
I                      1       4       5       2       3       14
K                      0       1       2       1       27      62
L                      0       5       –       13      12      10
M                      0       7       6       27      14      2
N                      0       5       0       6       7       2
P                      1       4       0       2       3       13
Q                      0       1       7       3       15      5
R                      0       6       11      2       36      –
S                      0       13      1       50      15      5
T                      1       4       3       3       16      14
V                      1       6       7       1       8       2
W                      0       –       5       12      4       0
Y                      0       23      1       3       2       1
Total substitutions    8       170     64      338     192     156

A dash marks the unsubstituted amino acid of that position.

Table 3.5 – Number of substitutions found. Shows the number of unique sequences found with a given substitution in a given position. Substitution/position pairs found in more than five unique sequences are highlighted in green.

These tables are an aid to evaluating both specific substitutions and overall positional effects. It’s immediately obvious from the heat map that positions #165 and #166 show good results with a wide range of beneficial substitutions, while especially positions #161 and #163 are less open to substitution. Position #160 is particularly resistant to change, with no apparent improvements found and many substitutions never found at all. These results match what we suspected from the library evaluation and also fit the NanoLuc structure. G160 points in towards the center of the barrel, with little space available for a bigger side chain. It seems reasonable, then, that other residues would not fit into that position, without disrupting the structure.

In position #161, several potential improvements are found, with a distinct preference for amino acids with ringed side chains. In the crystal structure of NanoLuc, a Tyrosine in position #16 is immediately next to W161, on the neighboring strand. It’s possible that a pi-stacking interaction between these two residues stabilizes the barrel and therefore favors ring-like residues in position #161. Such functional preferences indicate a real result, rather than a screening artifact, since we’d expect such effects, when the peptide has to fit into the groove in the truncated luciferase.

Figure 3.3 – Possibly interacting residues. NanoLuc structure in green. Residue Y16 highlighted in blue. Residue W161 highlighted in red. Generated with Pymol.

In our data set, we can find many examples of pairs of sequences that differ only in a single substitution. By repeated comparisons of such pairs, the effect of any given substitution can be more accurately estimated. Guiding these comparisons by the signal average drastically reduces the number of sequences that must be examined, which makes manual examination practically feasible. Using this method, candidate substitutions found through the heatmap were checked and false positives eliminated. E.g. L163I shows up as a possible candidate. However, when going into the raw data, it’s clear that this is not representative of the substitution’s actual effect. L163I was only picked up in combination with other substitutions known to be stabilizing. When a direct comparison was made (see table 3.6) between peptides either containing or not containing L163I, it’s clear that this substitution is not itself beneficial. Rather, the high average signal is an artifact, caused by the other substitutions in the screened peptides.

Plate-Well    Substitutions          Signal
2-A1          W161F, L163I, C164F    40368
2-A8          W161F, C164F           79044
1-F9          L163I, C164M, R166K    1203
4-H1          C164M, R166K           37584

Table 3.6 – Comparison of individual peptide variants. Plate position, substitutions, and measured signal shown for four found peptides; two with L163I and two without.
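The pair-wise comparison can be sketched as a lookup for peptides whose substitution sets differ only by the substitution under test. This illustration covers only the presence-or-absence case shown in Table 3.6 and uses that table's four peptides as input; the function names are assumptions, and the comparison in this work was done by manual inspection.

```python
def parse(subs):
    """Turn 'W161F, C164F' into a frozenset of substitution labels."""
    return frozenset(t.strip() for t in subs.split(",") if t.strip())


def single_difference_pairs(peptides, target):
    """Find pairs of peptides that differ only by the presence of target.

    peptides holds (well, substitutions, signal) records. Returns a list of
    (well with target, well without target, signal ratio with/without).
    """
    by_set = {parse(subs): (well, sig) for well, subs, sig in peptides}
    pairs = []
    for subs, (well, sig) in by_set.items():
        if target in subs:
            partner = by_set.get(subs - {target})
            if partner is not None:
                pairs.append((well, partner[0], sig / partner[1]))
    return pairs


# The four peptides of Table 3.6.
peptides = [
    ("2-A1", "W161F, L163I, C164F", 40368),
    ("2-A8", "W161F, C164F", 79044),
    ("1-F9", "L163I, C164M, R166K", 1203),
    ("4-H1", "C164M, R166K", 37584),
]
for with_sub, without_sub, ratio in single_difference_pairs(peptides, "L163I"):
    print(f"{with_sub} vs {without_sub}: signal ratio {ratio:.2f}")
```

A ratio below 1 in every matched pair is the pattern that marks L163I as a false positive of the signal average.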

The same thing can occur in reverse. If a given substitution is very good, it may be found more often in combination with poor substitutions, dragging it down. This is due to the fact that the benefit of this substitution is so high that it allows more muddled peptides to get through the screen. This is likely the case for C164F. As a single substitution, it performs well, but in the signal average, it is middling. Splitting our data according to the presence or absence of this substitution shows what’s happening.

                      C164F                                   Other
Substitution count    Sequence count    Percentage of total   Sequence count    Percentage of total
0                     0                 0.0                   1                 0.5
1                     1                 0.6                   18                9.8
2                     47                26.0                  79                42.9
3                     131               72.4                  86                46.7
4                     2                 1.1                   0                 0.0
Total sequences       181                                     184

Table 3.7 – Number of substitutions in two groups: Sequences with C164F and sequences with any other amino acid in that position. Each group is sorted according to the total number of substitutions per peptide. Number of sequences and percentage of that group’s total sequences are given.
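The percentages in Table 3.7 follow directly from the sequence counts, and the same counts give a mean number of substitutions per peptide for each group. The short sketch below recomputes both from the counts in the table; the mean is an additional summary introduced here for illustration and is not reported elsewhere in this work.

```python
# Sequence counts per number of substitutions, copied from Table 3.7.
TABLE_3_7 = {
    "C164F": {0: 0, 1: 1, 2: 47, 3: 131, 4: 2},
    "Other": {0: 1, 1: 18, 2: 79, 3: 86, 4: 0},
}


def percentages(group):
    """Share of each substitution count within one group, in percent."""
    total = sum(group.values())
    return {n: 100.0 * count / total for n, count in group.items()}


def mean_substitutions(group):
    """Average number of substitutions per peptide in one group."""
    total = sum(group.values())
    return sum(n * count for n, count in group.items()) / total


for name, group in TABLE_3_7.items():
    triple_share = percentages(group)[3]
    print(f"{name}: {sum(group.values())} sequences, "
          f"{triple_share:.1f}% triple substituted, "
          f"mean {mean_substitutions(group):.2f} substitutions")
```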

As seen in table 3.7, with C164F present, a dominant number of sequences have three substitutions. With any other substitution in position #164, only about half of the sequences have three substitutions. It appears that C164F is sufficiently beneficial that it allows a peptide to pass the pre-screen, even if it also contains other, less beneficial substitutions. This is further supported by observing which substitutions occur in each group. E.g. R166H is only found in combination with C164F, so despite the high average signal and a decent pickup count, it might still not be a reliable substitution. As such, it’s important to look at both ends of the analysis, to figure out what the actual effect of any given substitution might be. The average can cut through a lot of noise, but the candidates must be validated by going back to the raw data set.

This is partly a result of the two-step screening process. Rather than running all variations through the screening process, the first plate screening weeds out the obviously failed sequences, so we could focus on those peptides that show clear function. However, this also means we have limited information on those very non-functional peptides. This makes direct comparisons difficult and it biases the data we end up with in a manner that requires we be very careful about how we draw conclusions from that data.

3.4 Final Candidate Selection

Using the heatmap in Table 3.4, candidate substitutions were selected and examined in more detail, using the full data set. Manual inspection of the sequences allowed pair-wise comparisons, as well as inclusion of structural information, to evaluate the reliability of candidate substitutions. For each position, the substitutions were evaluated.

For positions #160, #162, and #163, no candidate substitutions were found. While some substitutions might produce viable peptides, none appear to be improvements. We judged that the existing amino acids are likely close to ideal. Incidentally, these positions have also been kept unchanged in Promega’s peptides (Dixon et al. 2016, supplementary).

Position #161 seems to be well suited for Tryptophan, but Phenylalanine, Tyrosine and Histidine are also viable options. Phenylalanine and Histidine were chosen as candidates, to explore these options, as Tyrosine is already well-represented in Promega’s work.

For position #164, our screen shows what we anticipated; that Phenylalanine is the optimal amino acid. However, it was interesting to see that some other options may also perform quite well. Especially Histidine and Serine are promising. C164H is also present in teLuc, a mutated variant of full-length NanoLuc, using a synthetic substrate (Yeh et al. 2017). Histidine was therefore chosen, as an interesting comparison with Phenylalanine.

In position #165, we found several viable candidates, as well. Glycine, Lysine, Glutamine, and Arginine show up as possible, with Lysine being the preferred in Promega’s high-affinity peptide. We decided on Arginine and Glutamine as reasonable candidates.

Finally, in position #166, Histidine and Methionine show up most obviously, signal-wise, but are not often picked up and always occur in combination with C164F. Instead, Lysine is picked up 62 times (see table 3.5). It seems likely that Histidine and Methionine are artifacts of the signal average, especially considering that we know Lysine performs well in Promega’s work. A more interesting possibility is Proline. While NanoLuc has no particular relationship to other luciferases, it does have structural similarity with fatty-acid binding proteins in the Calycin structural superfamily. These beta-barrel proteins have a structural motif, which is proposed to stabilize the barrel. This motif was used to guide the rational mutation of OLuc into NanoLuc, by changing position #166 from Asparagine to Arginine (Hall et al. 2012). However, there’s a variant of this motif, where the amino acid in question is Proline (Flower, North, and Sansom 2000). The fact that our screen was able to uncover this substitution as beneficial is a good sign, especially considering that Proline is not normally considered ideal for a beta strand. Proline was therefore also selected as a candidate for closer testing.

These substitutions were introduced into a background of the high-affinity peptide from Promega, p86, for validation of their effects and measurement of affinity.

                 Positions
Substitutions    #158  #159  #160  #161  #162  #163  #164  #165  #166  #167  #168
Promega - p86    V     S     G     W     R     L     F     K     K     I     S
B1 - W161F       V     S     G     F     R     L     F     K     K     I     S
B2 - W161H       V     S     G     H     R     L     F     K     K     I     S
B3 - F164H       V     S     G     W     R     L     H     K     K     I     S
B4 - K165R       V     S     G     W     R     L     F     R     K     I     S
B5 - K165Q       V     S     G     W     R     L     F     Q     K     I     S
B6 - K166P       V     S     G     W     R     L     F     K     P     I     S

Table 3.8 – Selected candidate substitutions. The chosen candidate peptides are shown, with substitutions differing from p86 shown in red.

3.5 Affinity measurements

Peptides were assayed as described in Chapter 2 and the results plotted as luminescence as a function of peptide concentration, as seen in Figure 3.4:

[Figure: "Affinity of peptide candidates". Luminescence (y-axis, 0 to 120) plotted against peptide concentration, [pep] / nM (x-axis, 0 to 45), with one series each for p86, B1, B2, B3, B4, B5, and B6.]

Figure 3.4 – Plot of raw data from affinity measurements of candidate peptides

All six candidate peptides were measured, as well as p86, for comparison. It is immediately obvious that several of the peptides do not show improved affinity. In particular, B2 and B6 perform poorly. B3 and B4, however, appear very interesting. B4 appears to have an improved affinity over p86, reaching the plateau at lower peptide concentration. B3, while not having improved affinity, does have a much higher maximum than any other peptide. This is particularly interesting, since all reported peptides from Dixon, et al, reached the same maximum luminescence at saturation, differing only in affinity.

High affinity and high luminescence are both desirable qualities, so an additional peptide was synthesized, VS-11, which combined the substitutions of B3 and B4: F164H and K165R. This was assayed in the same manner and the data from all peptides was fitted in Origin for a more exact estimate of Kd.

Name            Fitted Kd (nM)    Fitted max luminescence
p86             0.57              45.0
Pep B1 W161F    2.38              37.1
Pep B2 W161H    23.04             19.2
Pep B3 F164H    28.84             183.5
Pep B4 K165R    0.08              44.2
Pep B5 K165Q    4.17              54.1
Pep B6 K166P    29.10             36.9
VS-11           17.72             31.8

Table 3.9 – Results of fitting. Fitted Kd in nM and fitted max luminescence in RLUs, for each candidate peptide, the p86 standard peptide and VS-11, combining substitutions from B3 and B4.

Of the six original candidates, four of them (B1, B2, B5, and B6) did not show improvements in affinity. Peptide B3 showed reduced affinity, but a significantly improved maximum luminescence. Peptide B4 showed a clear improvement in affinity. Peptide VS-11, however, showed no improvement in either maximum luminescence or affinity. Details on the fitted curves can be found in appendix G.

3.6 References

Dixon, Andrew S., Marie K. Schwinn, Mary P. Hall, Kris Zimmerman, Paul Otto, Thomas H. Lubben, Braeden L. Butler, et al. 2016. “NanoLuc Complementation Reporter Optimized for Accurate Measurement of Protein Interactions in Cells.” ACS Chemical Biology 11 (2): 400–408. https://doi.org/10.1021/acschembio.5b00753.

Flower, Darren R., Anthony C.T. North, and Clare E. Sansom. 2000. “The Lipocalin Protein Family: Structural and Sequence Overview.” Biochimica et Biophysica Acta (BBA) - Protein Structure and Molecular Enzymology 1482 (1–2): 9–24. https://doi.org/10.1016/S0167-4838(00)00148-5.

Hall, Mary P., James Unch, Brock F. Binkowski, Michael P. Valley, Braeden L. Butler, Monika G. Wood, Paul Otto, et al. 2012. “Engineered Luciferase Reporter from a Deep Sea Shrimp Utilizing a Novel Imidazopyrazinone Substrate.” ACS Chemical Biology 7 (11): 1848–57. https://doi.org/10.1021/cb3002478.

Yeh, Hsien-Wei, Omran Karmach, Ao Ji, David Carter, Manuela M Martins-Green, and Hui-wang Ai. 2017. “Red-Shifted Luciferase–luciferin Pairs for Enhanced Bioluminescence Imaging.” Nature Methods 14 (10): 971–74. https://doi.org/10.1038/nmeth.4400.

4. Discussion

Six candidate peptides were selected from the screening results. Of these, four peptides (B1, B2, B5, and B6) showed affinities lower than our benchmark, p86. Peptides B3 and B4, however, produced interesting results.

B4 (K165R) showed a high affinity, even surpassing p86 (approximately 80pM vs. 570pM). This result shows that p86 is not the only high-affinity peptide possible and that further increases in affinity may yet be possible. The exact Kds measured may not be entirely accurate; in particular because the Kd measured is lower than the lowest concentration of peptide used. Ideally, we would wish for a range of peptide concentrations on either side of the Kd, for most accurate fitting. However, since the same procedure was used with all peptides, the results are still comparable between the peptides. We can therefore be confident in concluding that B4 has a higher affinity than p86, even if the exact Kds are not perfectly accurate.

B3 (F164H) showed lower affinity than p86, but curiously had a significantly higher maximum luminescence. This is interesting, since all reported peptides from Dixon, et al. reached the same maximum luminescence at saturation, differing only in affinity. Again, our peptide concentrations do not follow the curve to saturation and as such the fit may not be entirely accurate. However, regardless of the accuracy of the fit, the maximum is undeniably higher, at least by a factor of two. This shows that luminescence is not merely affected by the affinity of the peptide for the complex, but that specific substitutions may have a functional effect on the final complex, regardless of the affinities involved. This could be a useful feature, especially in applications where a low-affinity peptide is required, yet a high signal is still desirable, such as for studying protein-protein interactions.

Based on these results, a further peptide was examined; VS-11, which combined the substitutions of B3 and B4. It was our hope that this peptide would show both higher affinity and higher maximum luminescence, thus forming an ideal peptide for maximum assay sensitivity. However, this was not the case. Rather, VS-11 showed neither the ideal maximum luminescence, nor the ideal affinity. This fact shows that the substitutions are not merely additive, but that some interdependence is at work. This should inform future work. In particular, the signal average is calculated on the assumption that substitutions are largely independent in their effects. As shown, this is not always the case. This is not a huge surprise, but it is a limitation of our screening method and should be kept in mind. On the other hand, the screen is set up to detect high luminescence, regardless of the reason. As such, effects of this type are to be expected. Indeed, if we had somehow screened only for affinity directly, a result like peptide B3 might have gone unnoticed. In general, it is crucial to consider that the screening method employed will affect the types of results that you will get and what conclusions can reliably be reached on that basis. This example illustrates that point nicely.

Peptides B1 and B2 both have substitutions in position #161. Comparing with p86, we see a clear effect of this one position. From Tryptophan (in p86), to Phenylalanine (in B1) and to Histidine (in B2), the Kd increases from 0.57 nM, through 2.38 nM, to 23.04 nM. These candidates all perform well, objectively speaking, but the standard set by p86 is hard to beat. These results are in accordance with our screen results. Position #161 appeared constrained to ring-like side chains. Also, given the previous results from Dixon, et al., it is not surprising that Tryptophan is the superior amino acid in this position.

Peptides B4 and B5 also substitute the same position, #165, substituting Arginine and Glutamine, respectively, whereas p86 has Lysine. Arginine clearly outperforms the other options. In full-length NanoLuc, this position is a Glutamate. It seems that, while many substitutions are functional in this position, the exact structure of the side-chain has significant effects on affinity.

K166P, found in peptide B6, does not show improvement. This substitution was chosen in part because it fits the structural motif shared with Calycin proteins, which Hall, et al. took advantage of during the stabilization of NanoLuc. It seems, however, that the p86 choice of Lysine is superior, by a sizeable margin (0.57 nM vs. 29.1 nM). The fact that Proline works at all, however, further supports the notion of this connection. Perhaps other insights can be gleaned by further structural comparisons.
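To put the affinity differences on a common scale, the Kd values quoted above can be converted into apparent changes in binding free energy relative to p86, using ΔΔG = RT ln(Kd,variant / Kd,p86). The sketch below is an illustration only: the temperature of 298.15 K is an assumption (the assay temperature is not stated in this chapter), the function name is hypothetical, and the Kd values carry the fitting uncertainty discussed earlier.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)
TEMP_K = 298.15    # assumed temperature; not taken from the assay protocol

# Kd values (nM) as quoted in the discussion; B4 is "approximately 80 pM".
KD_NM = {"p86": 0.57, "B1": 2.38, "B2": 23.04, "B4": 0.08, "B6": 29.1}

def ddg_kcal(kd_variant_nm: float, kd_reference_nm: float, temp_k: float = TEMP_K) -> float:
    """Apparent change in binding free energy; positive means weaker binding."""
    return R_KCAL * temp_k * math.log(kd_variant_nm / kd_reference_nm)

ddg = {name: ddg_kcal(kd, KD_NM["p86"]) for name, kd in KD_NM.items() if name != "p86"}

for name, value in sorted(ddg.items(), key=lambda item: item[1]):
    print(f"{name}: {value:+.2f} kcal/mol relative to p86")
```

On this scale, only B4 comes out negative (tighter binding than p86), while B1, B2 and B6 are progressively less favorable.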

Position #164 showed more potential for accepting substitutions, although with a clear bias in favor of phenylalanine, so it is not too surprising that Histidine could not compete with regards to affinity. However, the altered functionality of the histidine substitution in this position (F164H in B3) shows that perhaps there are other reasons we might wish to take a closer look at the screened possibilities. In particular, C164S could be interesting to look at. This substitution occurred in yeLuc, another variant of NanoLuc, optimized for a synthetic substrate, selenoterazine, made by Yeh, et al. The fact that this position turns the side-chain inwards, towards the likely substrate pocket, supports the suspicion that functional differences could be found in this position.

Overall, the screen has performed admirably. While four of the candidate peptides did not have the very high affinity we were aiming for, they nevertheless all demonstrated Kds less than 30 nM. Given that the range of Kds possible extends as far as 190 µM (reported by Dixon, et al.), this shows that our screen correctly identified the highest affinity substitutions. In addition, we identified one peptide with affinity exceeding that of our benchmark, as well as one functionally improved peptide, showing increased luminescence. I don’t think it’s unreasonable to call this a success.

It’s noteworthy that the F164H substitution also appears in the teLuc construct, reported by Yeh et al. 2017, which showed drastically increased luminescence, compared with their other constructs. In that work, the teLuc construct was paired with an artificial substrate, diphenylterazine. It is unclear to what degree the substrate contributes to this increase in luminescence, but it is clearly not solely responsible for this behavior. It is important to observe which substrate is used. As demonstrated by Inouye, et al. 2014, some of the substitutions selected for in NanoLuc are specific to the use of the furimazine substrate. Inouye, et al. constructed eKAZ, a NanoLuc variant containing only three of the sixteen substitutions originally included in NanoLuc. This variant showed increased luminescence when using CTZ as substrate, but not when using furimazine. It would be interesting to examine the effect of F164H on furimazine. It is not guaranteed that this substitution will improve luminescence with all substrates, but if it does not, that is still valuable information, which could improve our understanding of the substrate binding and active site configuration. If nothing else, the 164H substitution should be included in full-length constructs, if using CTZ as a substrate, e.g. in combination with the already identified eKAZ substitutions. This would likely result in an ideal CTZ-variant of NanoLuc.

4.1 Codon effects

One thing that has not been accounted for in the previous discussion is possible codon effects. Peptides with identical amino acid sequences could have functional differences relating to the nucleotide sequence. Since we’re mutating at random, it’s possible that some codons could result in e.g. RNA hairpins, which could negatively impact translation or mRNA stability. Some slight indications of this occurred in the data set, but unfortunately, it is difficult to come to a firm conclusion, given the variations in signal. As an example of this, let us consider all the peptides picked up in this screen, which match the starting sequence. A total of five different nucleotide sequences were found, all encoding the same starting peptide.

Group  Codon #162  Codon #166  Sequence count  Library of origin  Average signal  Signal minimum  Signal maximum
1      CGG         AGA         43              105                4426            1532            8748
2      AGG         AGA         10              105                6343            3673            10248
3      CGT         AGA         2               105                3801            3798            3805
4      CGC         CGT         2               110                1078            170             1987
5      CGC         AGG         1               114                31816           N/A             N/A

Table 4.1 – Summary of starting sequences picked up in the screen. Sequences are grouped after the specific synonymous codons found in positions #162 and #166. All other codons in these sequences are identical.

It certainly looks like there could be an effect of the codons. However, it’s difficult to be sure, since several groups have so few representatives. The variation in signal between the best represented groups (#1 and #2, in table 4.1) is negligible and well within the variations we’ve seen in the screen. For groups #3 and #4, the count is too low to assign much importance. The absolute values are low, but if we measured one more culture, would that stay true? #5 is interesting as being a single representative with the highest signal measured of any of these peptides. That’s suggestive, but not conclusive without further study. Given the unreliable signal and the limits of our data set, we cannot conclude that there is an effect of synonymous codons. There is simply too much noise to pick out such a subtle signal. However, it should be kept in mind for the future, especially if the signal can be normalized or otherwise made more reliable.
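The reasoning above can be retraced with a short sketch over the values in table 4.1. It checks that all listed codons at positions #162 and #166 are arginine codons in the standard genetic code, and compares group averages against the observed ranges. The data structure and helper names are illustrative assumptions; only the summary values from the table are used, so this is a plausibility check and not a statistical test.

```python
# Standard genetic code: the six codons that encode arginine.
ARG_CODONS = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"}

# (group, codon #162, codon #166, count, average, minimum, maximum)
# Transcribed from table 4.1; None marks the N/A range of the single-member group.
GROUPS = [
    (1, "CGG", "AGA", 43, 4426, 1532, 8748),
    (2, "AGG", "AGA", 10, 6343, 3673, 10248),
    (3, "CGT", "AGA", 2, 3801, 3798, 3805),
    (4, "CGC", "CGT", 2, 1078, 170, 1987),
    (5, "CGC", "AGG", 1, 31816, None, None),
]

def all_synonymous(groups):
    """True if every listed codon at #162 and #166 encodes arginine."""
    return all(g[1] in ARG_CODONS and g[2] in ARG_CODONS for g in groups)

def within_range(value, low, high):
    """True if value lies inside a known [low, high] range."""
    return low is not None and high is not None and low <= value <= high

g1, g2, g5 = GROUPS[0], GROUPS[1], GROUPS[4]
means_overlap = within_range(g1[4], g2[5], g2[6]) and within_range(g2[4], g1[5], g1[6])
highest_observed = max(g[6] for g in GROUPS if g[6] is not None)

print("all codons synonymous (Arg):", all_synonymous(GROUPS))
print("groups 1 and 2 averages fall inside each other's range:", means_overlap)
print("group 5 exceeds every other observed maximum:", g5[4] > highest_observed)
```

With only averages and ranges available, and one to two members in three of the five groups, a check of this kind can flag group #5 as unusual but cannot establish a codon effect.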

4.2 Interesting mutant – R162G

The exact active site of NanoLuc is not known. However, some hints are available from the structure. Given the size of CTZ, there are only so many places it could go. Specifically, there’s a clear cavity in the center of the barrel, which appears to be large enough to accommodate the substrate. This cavity is also connected to an open channel, leading to the surface of the protein, which would allow access for the substrate.

R162 points directly into this pocket. While the exact function of R162 is not known, it is suspected to be extremely important for activity. Dixon et al., in their work, left this position untouched, presumably for that reason.

Figure 4.1 – Highlight of the presumed binding pocket for CTZ, in NanoLuc.

Internal cavity shown in light gray surface mode. Nearby residue side-chains shown in teal. Strand 11 shown as green cartoon with side-chains. R162 shown in magenta. Residues #158 and #168 are indicated, to show the orientation of the strand.

In this work, position #162 was not mutated as part of the normal screening libraries. Instead, it was separated into its own library, mutating only this one position. This was done to verify the above suspicion and especially to rule out the possibility that e.g. Lysine might be compatible with luciferase function.

R162K was determined to be inactive. However, early testing found that R162G gave a low, but consistent signal, which remained constant after re-transformation. This result was somewhat mystifying. On the one hand, arginine is clearly important, since most alternatives immediately reduce the activity to zero. On the other hand, if a Glycine can also work to some degree, it’s hard to see how the Arginine side-chain can be particularly important for any catalytic function.

Since the starting peptide sequence contained an additional Arginine, at position #166, we speculated that perhaps the peptide could switch register, compared with the native strand, moving R166 into the former position of R162. This would explain both how function could be restored, as well as why the signal dropped, since it’s likely the peptide would not have as great an affinity, when in such an unusual position. I.e. the other amino acids in the peptide would then no longer be in their optimal positions.

To examine this possibility further, several constructs were made, mutating these two Arginines. Aside from the R162G mutation, we also constructed R166M, as well as the double mutant (R162G + R166M). We also introduced R162G into p86, which does not have a second Arginine. All these constructs were made in the context of the co-expression system used for screening. Finally, we also constructed NanoLuc-R162G, to observe how this mutation affected the full-length protein.

If an arginine were essential and the activity of R162G is a result of register shift, we’d expect the single mutants to produce luminescence; either in normal or shifted register. The double mutant would presumably be dark, since no Arginine is available, in any register. Likewise, we’d expect that, being a single peptide strand, the full-length construct would not have the flexibility necessary to allow a register shift of the eleventh strand and thus would not be able to compensate for the loss of R162 in the same manner as a free peptide could.
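The expectations of the register-shift hypothesis can be written down as a small truth table. The sketch below encodes only the logic stated above: luminescence requires an Arginine at position #162, either natively or by a free peptide shifting register to bring R166 into that position. The class and function names are hypothetical, the prediction for p86-R162G is inferred from the absence of a second Arginine, and the entry for Arginine at #166 in full-length NanoLuc is an assumption that does not affect the outcome.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Construct:
    name: str
    arg_at_162: bool
    arg_at_166: bool
    free_peptide: bool  # False for the full-length, single-chain protein

def predicted_luminescent(construct: Construct) -> bool:
    """Prediction under the register-shift hypothesis only."""
    if construct.arg_at_162:
        return True  # normal register
    # A shifted register is assumed possible only for a free peptide with R166.
    return construct.free_peptide and construct.arg_at_166

CONSTRUCTS = [
    Construct("starting peptide", True, True, True),
    Construct("R162G", False, True, True),
    Construct("R166M", True, False, True),
    Construct("R162G + R166M", False, False, True),
    Construct("p86-R162G", False, False, True),
    # Arg at #166 in full-length NanoLuc is an assumption; it is irrelevant here
    # because free_peptide is False.
    Construct("NanoLuc-R162G", False, True, False),
]

predictions = {c.name: predicted_luminescent(c) for c in CONSTRUCTS}
for name, lit in predictions.items():
    print(f"{name}: {'luminescent' if lit else 'dark'}")
```

Writing the predictions out this way makes it straightforward to compare them, construct by construct, with the observed luminescence.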

As shown in figure 4.2, the single and double mutants were below the limit for naked-eye observation. Also, surprisingly, it turned out that the full length NLuc-R162G had substantial activity. These results rather puncture the hypothesis of register-shifting.

The screen furthermore picked up R162L, as having measurable, but reduced activity. This substitution was not examined in detail, but the same general comments apply, with regards to the catalytic activity of R162.

Figure 4.2 – R162G constructs

Left: Bright field image of streaked colonies, expressing the various constructs. Right: Luminescent image of the same colonies, after application of CTZ. Bottom: Legend for constructs, including amino acid sequences for the peptides. The introduced mutations are underlined.

These results demonstrate that R162 is in fact not essential for activity. In a way, this was apparent earlier. We observed that there appears to be some residual activity of the truncated luciferase alone, on the order of 0.1% of the activity of the complex. While the activity is reduced to such a degree that it’s not apparent in our screening system, it can be seen clearly when working with higher concentrations of purified protein. This opens up questions concerning the actual mechanism of NanoLuc. If activity is possible from the tLuc alone, that means that the complementing peptide strand is only improving an already (though barely) active enzyme.

A question in this regard is the conformation of the barrel, when there’s no complementing strand. Intuitively, the barrel would be expected to close up. The fact that tLuc is reasonably stable and doesn’t easily aggregate speaks against the notion that the barrel is fully open, since this would expose otherwise buried side-chains. However, a fully closed up barrel, lacking the 11th strand, would not achieve the native conformation. The barrel would be tighter and the active site presumably deformed. If such a conformation is stable, it’s hard to see how activity could be achieved. Indeed, it’s hard to see how the peptide could fit, unless the closed barrel was at least transiently open. These facts suggest that, without the final strand, the barrel is quite dynamic, occasionally achieving the right conformation, allowing catalysis and possibly peptide access.

So, when taking structure into account, it’s important to keep in mind that the available crystal structure is of full length NanoLuc, not the NanoBiT system. While the restoration of native-like function implies similar structures, the actual behavior of the split system has not been studied in detail and the structure of the complementation complex is, essentially, unknown.

4.3 Library optimizations

The original mutational strategy was based on the notion of achieving all single and double mutants, plus some triple mutants. We chose this approach to prevent the libraries from becoming unmanageably large. By full randomization of two codons and semi-randomization of a third, we’re left with 20x20x4 = 1600 amino acid variants in each library. If instead we had chosen to fully randomize all three codons, we’d have 20x20x20 = 8000 variants per library.

Originally, the concept was to pick up every single variant and screen them all with the 96-well method. In other words, there’d be no pre-screening. Every single colony would have to go through the microtiter format. In that situation, the number of variants expected becomes quite important, as this determines how many pickups are necessary to get a good sampling of the library.
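The relationship between library size and the number of pickups can be sketched as follows. The estimate assumes that every amino acid variant is equally likely to be picked, which is a simplification: degenerate codons do not encode all amino acids with equal frequency, and stop codons are ignored. The function names and the 95% target are illustrative assumptions.

```python
import math

def library_size(*choices_per_position: int) -> int:
    """Number of amino acid variants, given the choices at each mutated position."""
    return math.prod(choices_per_position)

def expected_coverage(n_variants: int, n_picks: int) -> float:
    """Expected fraction of variants seen at least once after n_picks random picks."""
    return 1.0 - (1.0 - 1.0 / n_variants) ** n_picks

def picks_for_coverage(n_variants: int, target: float) -> int:
    """Smallest number of picks whose expected coverage reaches the target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - 1.0 / n_variants))

semi_random = library_size(20, 20, 4)    # two full codons, one semi-randomized
fully_random = library_size(20, 20, 20)  # three fully randomized codons

for label, size in [("20x20x4", semi_random), ("20x20x20", fully_random)]:
    picks = picks_for_coverage(size, 0.95)
    print(f"{label}: {size} variants, about {picks} picks for 95% expected coverage")
```

Under these assumptions, the number of picks needed for 95% expected coverage is roughly three times the library size, which is why the library depth mattered so much when every colony had to pass through the 96-well format.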

However, with the pre-screening procedure, this is a much smaller issue. Instead of picking up individual colonies and screening them in 96-well format, we simply screen transformation plates with hundreds of colonies at the same time. Under those conditions, screening more variants is a matter of making a few extra agar plates. With that in mind, it might have been better to go with deeper libraries, allowing more variants and perhaps finding unique combinations that are not present in the current libraries.

Another option is to focus on single mutants, which would make each substitution easier to evaluate. However, it would also make it unlikely to find marginally stabilizing substitutions, where more than one substitution was necessary to reach the threshold for the first screen.

This might have resulted in the need to disregard the first screen and instead fully screen all colonies in the 96-well format. However, this might not have been insurmountable, since it would entail only six libraries (one for each position) of 20 variants (one for each substitution), for a total of 120 mutants. Even with multiple pickups, to ensure full coverage, it would certainly have been possible.

It would require a second round of libraries, though, to check for epistatic effects of the substitutions. Such a second step would obviously have been time consuming, although the single substitution results could have reduced the number of substitutions and positions necessary to screen.

With that said, the current libraries do fulfill their function, so perhaps a more fruitful approach would have been to cover more positions. We excluded the two outermost positions of the peptide, on the grounds that these positions appear less important. This, again, helped to reduce the amount of screening necessary. However, given the pre-screening procedure, it might have been reasonable to include them. Since the work needed to screen any individual library is drastically reduced with the pre-screening method, including additional libraries would not be out of the question. While these positions are relatively less important, it doesn’t follow that they are unimportant. It’s conceivable that there’s some effect and Dixon’s work does mutate three of these positions in their peptides, supporting that notion.

4.4 Improvements to the screening system

The pre-screen procedure worked well. It was easy to screen large numbers of colonies and it drastically cut down on the number of peptides that needed to go further in the workflow. However, a few things could perhaps be improved. It’s worth discussing such optimizations, for the sake of future work with this, or similar, systems.

First, the pre-screen was quite qualitative. In particular, this means that it was difficult to compare between libraries. It’s easy to see which colony is the brightest on a given plate, but comparing colonies between plates is very uncertain.

Second, the spraying of the substrate is not precise. There could be differences in substrate stocks, made at different times; differences in how much medium is in each plate, possibly diluting the substrate to various degrees; variation in the volume of each spray or the spread of droplets on the plates.

For these reasons, it was impossible to quantitatively evaluate the pre-screen plates and we were left with simply picking up the brightest from each plate. This meant that the brightest from one plate might actually be quite poor, in comparison with the brightest from another plate. It’s possible that this allowed rather poor sequences to slip through the first screen, especially if a given library was generally poor in bright variants.

Of course, this is not a major concern, since we quantify the signals in the 96-well screen. However, here we ran into problems with variability of signal. This appears to be a result of several factors that all contribute to variability.

Controls were included on all 96-well plates, but they were not useful for normalization purposes. The signals varied drastically, even between two replicas of the same culture on the same plate. In some cases, there were clear signs that something had gone wrong with the expression, e.g. achieving zero signal from a known positive control. Some kind of normalization would be preferable, since differences in growth times and substrate concentrations are almost inevitable, but simply including known controls does not appear to have been sufficient.

During this project, one attempt was made to use a separate normalization signal, via the introduction of a red fluorescent protein, mCherry, into the plasmid, as a third gene in the operon. The notion here was to express mCherry along with the NanoBiT components, to act as an independent reporter of expression. However, we observed that the fluorescent signal did not correlate well with the luminescent signal. This could be due to the fact that mCherry requires some time to mature properly, whereas tLuc is functional as soon as it’s folded. Regardless, the fact that the signals were not in sync made mCherry useless as an expression reporter and no further attempts were made.

However, while the mCherry idea didn’t work out, perhaps the general approach could. One option would be to use a separate luciferase as the expression reporter. Such luciferase dual-reporter systems have been described before (Grentzmann et al., 1998). This would require the introduction of a luciferase with a distinct substrate, or a large enough difference in emission spectrum that the two signals could be reliably distinguished.

Firefly luciferase could be used in this fashion, as it uses a different substrate, D-luciferin. Renilla luciferase could also be incorporated, since NanoLuc is capable of using coelenterazine analogues that RLuc cannot. The light signals could therefore be differentiated that way (Inouye et al. 2013).

Gaussia luciferase would be a poor choice. With regard to substrate, the same trick could be done as RLuc. However, GLuc is difficult to express in standard E. coli, since it’s dependent on several disulfide bridges. The proper formation of these disulfides requires specialized strains of E. coli or a shift to a eukaryotic organism, both of which would complicate the system.

Such changes would also require re-testing of the assay conditions, to ensure that the second reporter functioned as required. It does seem likely, though, that such optimization could avoid some of the problems we faced, regarding variation in expression levels. With a more reliable signal, it would be possible to make more fine-grained distinctions between substitutions and the screen would be overall more useful. If this system is to be used again, the work might well be worth the effort.

Although such a dual-reporter approach would allow us to detect variations in expression, and thereby compensate for them, it would not avoid the issue altogether. Presumably, failures would still occur. While we could identify them and remove them from the data set, it would still require extra screening. For that reason, it would be preferable if we could entirely avoid this issue from the start.
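A minimal sketch of how such a dual-reporter normalization might look is given below. It assumes that each well yields one NanoBiT luminescence reading and one reading from an independent reference reporter, and that wells with a reference signal far below the plate median represent expression failures. The function name, the cutoff fraction and the example readings are all hypothetical; no such data were collected in this work.

```python
from statistics import median

def normalize_wells(wells, min_reference_fraction=0.2):
    """Divide each NanoBiT signal by its reference signal.

    wells maps a well ID to (nanobit_signal, reference_signal). Wells whose
    reference signal is at or below min_reference_fraction times the plate
    median are reported as failed instead of being normalized.
    """
    reference_median = median(ref for _, ref in wells.values())
    cutoff = min_reference_fraction * reference_median
    normalized, failed = {}, []
    for well_id, (signal, ref) in wells.items():
        if ref <= cutoff:
            failed.append(well_id)
        else:
            normalized[well_id] = signal / ref
    return normalized, failed

# Hypothetical readings: A1 and A2 hold the same variant at different expression levels.
HYPOTHETICAL_WELLS = {
    "A1": (4000.0, 1000.0),
    "A2": (8000.0, 2000.0),
    "A3": (50.0, 10.0),  # reference signal collapsed: treated as expression failure
    "A4": (6000.0, 1000.0),
}

normalized, failed = normalize_wells(HYPOTHETICAL_WELLS)
print("normalized:", normalized)
print("failed wells:", failed)
```

In this example, the two replicate wells give the same normalized value despite a two-fold difference in raw signal, while the failed well is flagged for re-screening, which is the extra work noted above.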

It would appear that the high expression level of the chosen plasmid vector may have been responsible for the accumulation of mutants, by favoring cultures that reduce their expression levels. It’s also possible, given the cascading nature of the T7 system, that small differences in induction could lead to great variation in final expression. As such, this plasmid base might have contributed to the noise in the system. With a lower general expression level, these problems could perhaps be avoided and lead to cleaner results. While we’re interested in high expression to get a clear signal, it seems likely that we could have achieved robust signals at much lower expression levels. The best performing peptides show signals >1000 times higher than the background. If reducing the expression level would avoid issues with unexpected mutations, it might be worth the trouble to move the system over to a different plasmid.

This would require adjustments to the rest of the procedure, though. In particular, the pre-screen would need addition of IPTG or other means of induction to the medium. As it stands, the pre-screen works on LB agar plates, with antibiotics. No additions were made to induce expression. It would appear that expression of the plasmid occurs through residual lactose in the medium from the yeast extract. It is likely that a system with inherently lower expression would not be able to work under these conditions, since the induction would be insufficient to produce a clear signal.

An obvious optimization might be to try for a higher degree of automation. Certainly, colony picking and luminescence measurements could be automated to a greater degree than we managed. Going from 96 wells to 384 would be a simple way of increasing the throughput as well, since the moving and handling of individual plates would be a significant bottleneck. It should be considered, though, that reducing the culture volume would also reduce the overall signal. This would need to be taken into account, especially if other modifications are made that also affect the light intensity, such as reducing expression levels.

Finally, we can also consider if there might be alternate screening methods, which could improve throughput. Lindqvist et al. 1994 expressed five variant luciferases in insect cells and measured the luminescence using flow cytometry. This was a partial success, with two variants of click beetle luciferase being detectable.

If such a method could be developed into a cell-sorting setup, then large libraries of cells, expressing variant peptides, could be sorted quickly. The main objection to this is whether sufficient light could be achieved from single cells. This method has not, to my knowledge, been attempted with NanoLuc, nor by expression in E. coli. That said, the stable and bright luminescence of NanoLuc makes it a more likely candidate for such an approach than luciferases with lower luminescence (such as RLuc) or more “flash” type kinetics (such as GLuc).

4.5 A few concluding thoughts

The example of R162G, mentioned in section 4.2, shows clearly that we do not yet understand the particulars of the tLuc-peptide interaction and how this reflects on the luciferase function. This leaves open possibilities such as peptides that could change the wavelength of emitted light, or alter the substrate preferences of the complex. Since both these effects have been observed with mutants of the full-length NanoLuc, it’s reasonable to suspect that some peptide variants could achieve this as well. The increased luminescence from F164H also points in this direction, as regards functionality compared with mere affinity.

During this project, we have not observed any apparent changes in wavelength, but detailed spectrum examinations have not been performed; only naked eye observations. Minor changes could therefore have gone unobserved. Likewise, we did not study alternative substrates. There is clearly room for closer examination.

A troubling point is our lack of understanding of luciferase active sites, generally. The subject of luciferase active sites is rather under-reported in the literature and focused mostly on mutational studies for practical engineering. I personally wonder if perhaps people just don’t care how they work, as long as they do; practical use over theoretical knowledge.

However, without a deeper understanding, we will be limited to the natural luciferases and their immediate derivatives. This seems a shame, given the wide range of luciferases and luciferins. It is clear that bioluminescence can occur in a variety of ways, from widely different protein structures, and with a range of substrates. The fact that luciferases appear to have evolved independently in many separate instances further supports that. These facts indicate that luciferase activity is actually quite easy to achieve.

As such, luciferases should be a natural target for protein design. However, here we run into our limited understanding of the design target. You can’t hit a target you can’t see.

A directed effort to study luciferase active sites would be helpful. There’s a wealth of information that could be applied to assist in this effort. For example, the systematic study of CTZ-analogues as reported by Inouye, et al. could form a basis for hypothesizing about the orientation of substrate in the binding pocket. By careful comparison of the various CTZ-analogues and their relative activities, we might be able to postulate a testable model of the luciferase active site. We could then start to ask the really interesting questions: What does it take to turn a given protein into a luciferase? What is the smallest possible luciferase? What determines whether a given compound could be a luciferin or not?

The more we study these systems, the more we understand what is essential and what are mere accidents of their history. Such understanding is critical for future endeavors to engineer and design luciferases, and to extend the utility of these important tools.

4.6 References

Dixon, Andrew S., Marie K. Schwinn, Mary P. Hall, Kris Zimmerman, Paul Otto, Thomas H. Lubben, Braeden L. Butler, et al. 2016. “NanoLuc Complementation Reporter Optimized for Accurate Measurement of Protein Interactions in Cells.” ACS Chemical Biology 11 (2): 400–408. https://doi.org/10.1021/acschembio.5b00753.

Grentzmann, Guido, Jennifer A Ingram, Paul J Kelly, Raymond F Gesteland, and John F Atkins. 1998. “A Dual-Luciferase Reporter System for Studying Recoding Signals,” 8.

Inouye, Satoshi, Yuiko Sahara-Miura, Jun-ichi Sato, Rie Iimori, Suguru Yoshida, and Takamitsu Hosoya. 2013. “Expression, Purification and Luminescence Properties of Coelenterazine-Utilizing Luciferases from Renilla, Oplophorus and Gaussia: Comparison of Substrate Specificity for C2-Modified Coelenterazines.” Protein Expression and Purification 88 (1): 150–56. https://doi.org/10.1016/j.pep.2012.12.006.

Inouye, Satoshi, Jun-ichi Sato, Yuiko Sahara-Miura, Suguru Yoshida, and Takamitsu Hosoya. 2014. “Luminescence Enhancement of the Catalytic 19kDa Protein (KAZ) of Oplophorus Luciferase by Three Amino Acid Substitutions.” Biochemical and Biophysical Research Communications 445 (1): 157–62. https://doi.org/10.1016/j.bbrc.2014.01.133.

Lindqvist, Christer, Matti Karp, Karl Åkerman, and Christian Oker-Blom. 1994. “Flow Cytometric Analysis of Bioluminescence Emitted by Recombinant Baculovirus-Infected Insect Cells.” Cytometry 15 (3): 207. https://doi.org/10.1002/cyto.990150305.

Yeh, Hsien-Wei, Omran Karmach, Ao Ji, David Carter, Manuela M Martins-Green, and Hui-wang Ai. 2017. “Red-Shifted Luciferase–luciferin Pairs for Enhanced Bioluminescence Imaging.” Nature Methods 14 (10): 971–74. https://doi.org/10.1038/nmeth.4400.

Chapter 5 Appendices

Appendix A - Early approaches

The original plan for this project involved the use of peptide micro arrays, to test a wide range of specific peptides. The outline was to design arrays with known peptide variants and then wash the array with truncated NanoLuc, allowing it to bind to the peptides. Then, by addition of CTZ, the function of each peptide could be evaluated in a single round of screening.

Peptide arrays allow for the synthesis of thousands of peptide variants with known sequences, in specific locations on the array (Buus et al. 2012). This would allow in-depth mutational scanning, easily covering the entire peptide. Upon measuring the luminescence, light from any position can be directly related to the sequence of the peptide synthesized on that position.

To allow this to work, we 3D-printed a small plastic cassette, which would hold the arrays. The cassette maintained a space between the array and a cover glass, which allowed the substrate to be held in a liquid phase, but accessible for tLuc, bound to the array. In this way, we planned to assay the luciferase activity of thousands of peptide variants simultaneously. The arrays produced focused on substitution scans of Promega’s high-affinity peptide, p86, along with variations on linker length and charge.

Figure A.1 – Attempt at luminescent screening of peptide array.

Shown are two wells, each covering identical sectors of peptides.

One well was incubated with tLuc, washed, and then supplied with CTZ (left). The other was not incubated with tLuc, as a control (right).

A diffuse background is visible, likely a result of tLuc’s intrinsic activity. A brighter background is visible near the edge, which is presumed to be a result of tLuc seeping through the edges of the well.

Early experiments were complete failures, though. We found no significant signal from the array positions, despite trying multiple buffer conditions and incubation procedures. Our attempts to increase signal only resulted in a higher diffuse background.

Several possibilities exist as to what might have gone wrong. In the peptide array, the peptides are not free in solution but are attached to the array itself. It could be that this fixation of the peptide somehow interferes with the complementation, despite the linkers. We therefore tested the complementation of peptides attached to PEG beads and found that not only did we achieve complementation, we achieved a signal bright enough to clearly localize beads with our camera setup. So, in principle, there is nothing wrong with peptides attached to a solid support.

Figure A.2 – Testing of bead-bound peptide complementation.

PEG beads with p86 peptide attached, incubated with tLuc, and then imaged within the same cassette used for the array.

Note that individual beads can be clearly distinguished.

Speculating that the luciferase perhaps bound to the peptides but was for some reason inactive, we decided to test simply for binding. Using an NLuc-specific antibody (kindly supplied by the Promega corporation) together with a fluorescently tagged secondary antibody, we imaged the array for fluorescence. In this case, fluorescence signal would come from any sector of the array where NLuc was bound, regardless of its functionality. While this would not tell us which peptides restored luciferase function, it would inform us about which peptides reliably bound the truncated luciferase.

We found that binding was not dependent on the specific sequence of the peptide and instead correlated with the overall charge of the peptide. As such, we judged that this reflected non-specific binding that was unrelated to our purpose. It was for this reason that we started thinking about alternative screening methods. Since we had observed that complementation could occur intracellularly, we proceeded along the lines of a cell-based platform rather than the peptide arrays.
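The charge dependence noted above can be made concrete with a rough net-charge count per peptide. The following is a minimal sketch and not the analysis performed in this work: the scoring rule (Lys/Arg as +1, Asp/Glu as -1, His and the termini ignored) and the function name `net_charge` are assumptions made for illustration, and the two example peptides are taken from the screen data in Appendix F.

```python
# Illustrative net-charge estimate at roughly neutral pH.
# Assumption: Lys/Arg count as +1, Asp/Glu as -1; His and the termini are ignored.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_charge(peptide: str) -> int:
    """Return the approximate net side-chain charge of a peptide."""
    return sum(CHARGE.get(residue, 0) for residue in peptide.upper())


# Two peptides from the screen data in Appendix F, differing at one position (E vs K).
for peptide in ("VTGWRLFERIL", "VTGWRLFKRIL"):
    print(peptide, net_charge(peptide))
```

A count of this kind only ranks peptides by formal charge; it says nothing about whether the binding observed on the array was functional.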

At first, our notion was to screen every variant in our libraries, pool the cultures by signal, and then sequence each pool by Illumina sequencing. The idea was to rely on statistical analysis of each pool, to find which substitutions would end up in which pool. Using that full data set, we could then analyze the combinations and derive the individual effect of each substitution.

However, this required equipment that we did not ourselves have and the logistics of actually screening so many variants turned out to be unmanageable. Rather than ending up with a partial data set that would be useless, we turned to the pre-screening method, which developed into the system presented previously. In principle, there’s nothing wrong with this approach, though, and it could be employed in future work, if the practical issues could be sorted out.

Appendix B - Pre-screening Procedure

For the pre-screening procedure, we experimented with a number of ways to assay the colonies. Pipetting CTZ directly onto the plate was functional, but uneven. It was difficult to cover the entire plate, resulting in luminescence only from those colonies that were directly covered by the pipetting.

Figure B.1 – Luminescence screening attempt with pipetting of substrate.

Red circle inserted to show approximate circumference of plate. A large part of the plate received essentially no substrate.

Increasing the volume only resulted in the colonies being washed out, which would prevent accurate picking later. Attempts were also made with growing cells in a top agar. However, this seemed to limit and diffuse the signal, making it more difficult to properly distinguish colonies for picking.

Finally, we settled on the procedure of spraying the CTZ onto the plate, using a perfume bottle sprayer. This seemed to give the best combination of good signal, low volume, and ease of use.

Figure B.2 – Example of pre-screen plate.

Bright field image of colonies overlaid with luminescent image in false color.

This allows easy identification of bright colonies, facilitating manual picking.

Appendix C - Sequences

Generic forward primer:

5’- ATTCTUAGCGATAAAATTATTCACCTGAC -3’

Library-specific reverse primers:

Lib 105 5’- AAGAAUTCTCTCACATAGMNNCCAACCTGTCACGCTCATCTGCAG -3’

Lib 106 5’- AAGAAUTCTCTCANATAGGCGMNNMNNTGTCACGCTCATCTGCAG -3’

Lib 107 5’- AAGAAUTCTCTCANAMNNGCGCCAMNNTGTCACGCTCATCTGCAG -3’

Lib 108 5’- AAGAAUTNTCTCMNNTAGGCGCCAMNNTGTCACGCTCATCTGCAG -3’

Lib 109 5’- AAGAAUTCTMNNANATAGGCGCCAMNNTGTCACGCTCATCTGCAG -3’

Lib 110 5’- AAGAAUMNNCTCANATAGGCGCCAMNNTGTCACGCTCATCTGCAG -3’

Lib 111 5’- AAGAAUTCTCTCANAMNNGCGMNNACCTGTCACGCTCATCTGCAG -3’

Lib 112 5’- AAGAAUTNTCTCMNNTAGGCGMNNACCTGTCACGCTCATCTGCAG -3’

Lib 113 5’- AAGAAUTCTMNNANATAGGCGMNNACCTGTCACGCTCATCTGCAG -3’

Lib 114 5’- AAGAAUMNNCTCANATAGGCGMNNACCTGTCACGCTCATCTGCAG -3’

Lib 115 5’- AAGAAUTNTCTCMNNMNNGCGCCAACCTGTCACGCTCATCTGCAG -3’

Lib 116 5’- AAGAAUTCTMNNANAMNNGCGCCAACCTGTCACGCTCATCTGCAG -3’

Lib 117 5’- AAGAAUMNNCTCANAMNNGCGCCAACCTGTCACGCTCATCTGCAG -3’

Lib 118 5’- AAGAAUTNTMNNMNNTAGGCGCCAACCTGTCACGCTCATCTGCAG -3’

Lib 119 5’- AAGAAUMNNCTCMNNTAGGCGCCAACNTGTCACGCTCATCTGCAG -3’

Lib 120 5’- AAGAAUMNNMNNANATAGGCGCCAACCTGTCACGCTCATCTGCAG -3’
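Because these are reverse primers, each degenerate MNN triplet reads as NNK on the coding strand. The sketch below illustrates this and counts how many distinct DNA sequences a primer represents. It is a hedged illustration only: the helper names `reverse_complement` and `degeneracy` are hypothetical, the uracil is simply treated like thymine, and the count refers to nucleotide variants, not to distinct peptides.

```python
from math import prod

# Number of bases represented by each IUPAC nucleotide code (U treated like T).
IUPAC_SIZE = {
    "A": 1, "C": 1, "G": 1, "T": 1, "U": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3, "N": 4,
}

# Complement of each IUPAC code, used to read a reverse primer on the coding strand.
COMPLEMENT = str.maketrans("ACGTURYSWKMBDHVN", "TGCAAYRSWMKVHDBN")


def reverse_complement(primer: str) -> str:
    """Return the reverse complement, keeping degenerate positions degenerate."""
    return primer.upper().translate(COMPLEMENT)[::-1]


def degeneracy(primer: str) -> int:
    """Return the number of distinct DNA sequences encoded by a degenerate primer."""
    return prod(IUPAC_SIZE[base] for base in primer.upper())


# Primers copied from the list above.
LIB_105 = "AAGAAUTCTCTCACATAGMNNCCAACCTGTCACGCTCATCTGCAG"
LIB_106 = "AAGAAUTCTCTCANATAGGCGMNNMNNTGTCACGCTCATCTGCAG"

print(reverse_complement("MNN"))
print(degeneracy(LIB_105), degeneracy(LIB_106))
```

Comparing such counts with the number of colonies screened gives a rough sense of how completely a library could have been sampled.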

Inserted uracils for USER cloning underlined. Primer oligos ordered from Eurofins Genomics.

Thioredoxin:

1 ATGTCTGTGA GCGGTTGGCG CCTGTTCAAG AAAATTAGCA GCGATAAAAT TATTCACCTG

61 ACTGACGACA GTTTTGACAC GGATGTACTC AAAGCGGACG GGGCGATCCT CGTCGATTTC

121 TGGGCAGAGT GGTGCGGTCC GTGCAAAATG ATCGCCCCGA TTCTGGATGA AATCGCTGAC

181 GAATATCAGG GCAAACTGAC CGTTGCAAAA CTGAACATCG ATCAAAACCC TGGCACTGCG

241 CCGAAATATG GCATCCGTGG TATCCCGACT CTGCTGCTGT TCAAAAACGG TGAAGTGGCG

301 GCAACCAAAG TGGGTGCACT GTCTAAAGGT CAGTTGAAAG AGTTCCTCGA CGCTAACCTG

361 GCGAGCCACC ATCATCACCA TCATTAA

Sequence of thioredoxin, shown here with N-terminal p86 tag (underlined). Note also the His-tag at the C-terminus (also underlined).
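Where the underlining is not visible, the tag positions can be recovered by translating the sequence. The sketch below is an illustration under stated assumptions: it uses the standard genetic code, the helper name `translate` is hypothetical, and only the first and last printed lines of the thioredoxin sequence above are translated (each printed line starts in frame, since every line holds 60 nucleotides).

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1, ignoring spaces and position numbers."""
    clean = "".join(ch for ch in dna.upper() if ch in "ACGT")
    return "".join(
        CODON_TABLE[clean[i:i + 3]] for i in range(0, len(clean) - 2, 3)
    )


# First and last lines of the thioredoxin sequence, copied from the listing above.
FIRST_LINE = "ATGTCTGTGA GCGGTTGGCG CCTGTTCAAG AAAATTAGCA GCGATAAAAT TATTCACCTG"
LAST_LINE = "GCGAGCCACC ATCATCACCA TCATTAA"

print(translate(FIRST_LINE))
print(translate(LAST_LINE))
```

In the first translation, the 11 residues following the initial Met-Ser (VSGWRLFKKIS) are presumably the p86 tag; in the second, the run of six histidines before the stop codon is the His-tag.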

Truncated NanoLuc:

1 ATGCACCATC ATCACCATCA TAGCGTGTTT ACCCTGGAAG ACTTTGTGGG CGACTGGGAG

61 CAGACCGCGG CGTATAACCT GGATCAGGTG CTGGAGCAGG GTGGCGTTAG CAGCCTGCTG

121 CAAAACCTGG CGGTGAGCGT TACCCCGATC CAGCGTATTG TTCGTAGCGG CGAGAACGCG

181 CTGAAGATCG ACATTCACGT GATCATTCCG TACGAAGGCC TGAGCGCGGA TCAGATGGCG

241 CAAATCGAGG AAGTGTTCAA GGTGGTTTAC CCGGTTGACG ATCACCACTT TAAAGTGATC

301 CTGCCGTATG GTACCCTGGT GATTGACGGC GTTACCCCGA ACATGCTGAA CTACTTCGGT

361 CGTCCGTATG AGGGCATCGC GGTTTTTGAT GGTAAGAAAA TTACCGTGAC CGGTACCCTG

421 TGGAACGGCA ACAAAATCAT TGACGAACGT CTGATTACCC CGGATGGATC CATGCTGTTC

481 CGAGTAACCA TCAACAGTTA A

Sequence of tLuc, with N-terminal His-tag (underlined).

Appendix D - pET30a plasmid map

pET30a standard cloning vector, supplied by GenScript.

Appendix E - E. coli strain genotypes:

BL21(DE3) expression strain: fhuA2 [lon] ompT gal (λ DE3) [dcm] ∆hsdS λ DE3 = λ sBamHIo ∆EcoRI-B int::(lacI::PlacUV5::T7 gene1) i21 ∆nin5

MC1061 cloning strain: Δ(araA-leu)7697, [araD139]B/r, Δ(codB-lacI)3, galK16, galE15(GalS), λ-, e14-, mcrA0, relA1, rpsL150(StrR), spoT1, mcrB1, hsdR2 F' lacIQ, TetR

Appendix F - Screen data

Electronic version can be found at https://tinyurl.com/BB-dataset
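For readers working from the printed table rather than the electronic version, the rows can be extracted with a regular expression and grouped by peptide. This is a minimal sketch under stated assumptions: the names `ROW`, `parse_rows` and `median_by_peptide` are hypothetical, the choice of the third luminescence value for the summary is arbitrary (the meaning of the three columns is not restated here), and the sample consists of four rows copied from plate 1.

```python
import re
from collections import defaultdict
from statistics import median

# One record: plate, well, then either a sequence pair or a placeholder, then three readings.
ROW = re.compile(
    r"Seq (\d+) ([A-H]\d{2}) "
    r"(?:([ACGT]+) ([A-Z*]+)|No sequence No sequence|Control Control) "
    r"(-?\d+) (-?\d+) (-?\d+)"
)


def parse_rows(text):
    """Yield (plate, well, dna, peptide, readings) for every row found in text."""
    # Collapse all whitespace so line breaks inside a row do not matter.
    for match in ROW.finditer(" ".join(text.split())):
        plate, well, dna, peptide, *readings = match.groups()
        yield int(plate), well, dna, peptide, tuple(int(r) for r in readings)


def median_by_peptide(rows, column=2):
    """Return the median of one luminescence column per sequenced peptide."""
    grouped = defaultdict(list)
    for _plate, _well, _dna, peptide, readings in rows:
        if peptide is not None:  # skip controls and wells without sequence
            grouped[peptide].append(readings[column])
    return {peptide: median(values) for peptide, values in grouped.items()}


# Four rows copied from plate 1 of the table.
SAMPLE = """
Seq 1 A01 GTGACAGGTTATCGCCTGTTTGAGAGAATTCTT VTGYRLFERIL 2553 15657 17759
Seq 1 A02 GTGACAGGTTGGCGCAGGTTTGAGAGAATTCTT VTGWRRFERIL 3730 17289 18600
Seq 1 B03 GTGACAGGTTGGCGCAGGTTTGAGAGAATTCTT VTGWRRFERIL 1076 1530 1578
Seq 1 H12 Control Control 714055 1536365 1528108
"""

rows = list(parse_rows(SAMPLE))
print(median_by_peptide(rows))
```

The two wells carrying VTGWRRFERIL differ by roughly an order of magnitude, which illustrates why replicate wells are better summarised by a median than read individually.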

Sequencing plate  Well position  Peptide nucleotide sequence  Peptide amino acids  Luminescence measurements

Seq 1  A01  GTGACAGGTTATCGCCTGTTTGAGAGAATTCTT  VTGYRLFERIL  2553  15657  17759
Seq 1  A02  GTGACAGGTTGGCGCAGGTTTGAGAGAATTCTT  VTGWRRFERIL  3730  17289  18600
Seq 1  A03  GTGACAGGTTGGCGCCTAAGTCGGAGAATTCTT  VTGWRLSRRIL  5203  41264  45429
Seq 1  A04  GTGACAGGTTGGCGCCTAATGAGGAGAATTCTT  VTGWRLMRRIL  711  4023  4562
Seq 1  A05  GTGACAGGTTGGCGCCTACATGGGAGAATTCTT  VTGWRLHGRIL  1391  12901  14850
Seq 1  A06  GTGACAGGTTGGCGCCTAATGTCTAAAATTCTT  VTGWRLMSKIL  35  118  126
Seq 1  A07  GTGACAGGTTGGCGCCTAAGTCGGAGAATTCTT  VTGWRLSRRIL  102  755  873
Seq 1  A08  GTGACAGGTTGGCGCCTACATAATAGAATTCTT  VTGWRLHNRIL  724  4663  5539
Seq 1  A09  GTGACAGGTTGGCGCCTAATGACTAAAATTCTT  VTGWRLMTKIL  136  530  552
Seq 1  A10  GTGACAGGTTGGCGCCTAATGGCGAGAATTCTT  VTGWRLMARIL  12608  43344  46743
Seq 1  A11  GTGACAGGTTGGCGCCTAAGTGAGAAAATTCTT  VTGWRLSEKIL  41  217  264
Seq 1  A12  GTGACAGGTTGGCGCCTATCGCGGAGAATTCTT  VTGWRLSRRIL  2769  21300  24328
Seq 1  B01  GTGACAGGTTGGCGCCTATTTATGAAAATTCTT  VTGWRLFMKIL  2802  12626  13605
Seq 1  B02  GTGACAGGTTGGCGCCTATCGCGGAGAATTCTT  VTGWRLSRRIL  1469  5280  5755
Seq 1  B03  GTGACAGGTTGGCGCAGGTTTGAGAGAATTCTT  VTGWRRFERIL  1076  1530  1578
Seq 1  B04  GTGACAGGTTGGCGCCTAATGAAGAGAATTCTT  VTGWRLMKRIL  2530  6948  7263
Seq 1  B05  GTGACAGGTTGGCGCCTAATGAATAGAATTCTT  VTGWRLMNRIL  1345  6151  6499
Seq 1  B06  GTGACAGGTTGGCGCCTACATAATAGAATTCTT  VTGWRLHNRIL  2072  4774  5047
Seq 1  B07  GTGACAGGTTGGCGCCTACATACTAAAATTCTT  VTGWRLHTKIL  6001  28852  32637
Seq 1  B08  GTGACAGGTTGGCGCCTACATTCGAGAATTCTT  VTGWRLHSRIL  3638  9300  9960
Seq 1  B09  GTGACAGGTTGGCGCCTAATGAAGAGAATTCTT  VTGWRLMKRIL  120  472  535
Seq 1  B10  GTGACAGGTTGGCGCCTACATAATAGAATTCTT  VTGWRLHNRIL  13  74  74
Seq 1  B11  GTGACAGGTTGGCGCCTAAGTACGAGAATTCTT  VTGWRLSTRIL  238  2050  2539
Seq 1  B12  GTGACAGGTTGGCGCCTAAATAGGAGAATTCTT  VTGWRLNRRIL  3632  18372  19591
Seq 1  C01  GTGACAGGTTGGCGCCTATTTTATAAAATTCTT  VTGWRLFYKIL  419  1785  1874
Seq 1  C02  GTGACAGGTTGGCGCCTACATAAGAGAATTCTT  VTGWRLHKRIL  5921  15758  16694
Seq 1  C03  GTGACAGGTTGGCGCCTAATGTCTAAAATTCTT  VTGWRLMSKIL  208  1313  1266
Seq 1  C04  GTGACAGGTTGGCGCCTACATAAGAGAATTCTT  VTGWRLHKRIL  873  8535  9071
Seq 1  C05  GTGACAGGTTGGCGCCTATTTGAGAAAATTCTT  VTGWRLFEKIL  3211  10196  11556
Seq 1  C06  GTGACAGGTTGGCGCCTACATGAGAAAATTCTT  VTGWRLHEKIL  1161  6702  7280
Seq 1  C07  GTGACAGGTTGGCGCCTAATGTCGAGAATTCTT  VTGWRLMSRIL  758  4347  4745
Seq 1  C08  GTGACAGGTTGGCGCCTAAGTCGGAGAATTCTT  VTGWRLSRRIL  121  676  740
Seq 1  C09  GTGACAGGTTGGCGCCTAAGTACGAGAATTCTT  VTGWRLSTRIL  9  72  116
Seq 1  C10  GTGACAGCTTGGCGCTAGTTTGAGAGAATTCTT  VTAWR*FERIL  41  143  160
Seq 1  C11  GTGACAGGTTGGCGCCTAATGAAGAGAATTCTT  VTGWRLMKRIL  694  2723  2946
Seq 1  C12  GTGACAGGTTGGCGCCTAAATTCGAGAATTCTT  VTGWRLNSRIL  130  1890  2221
Seq 1  D01  GTGACAGGTTGGCGCCTAGGTAAGAAAATTCTT  VTGWRLGKKIL  277  1782  1825
Seq 1  D02  GTGACAGGTTGGCGCCTAAATAGGAAAATTCTT  VTGWRLNRKIL  519  2061  2236
Seq 1  D03  GTGACAGGTTGGCGCCTAAGTATGAGAATTCTT  VTGWRLSMRIL  269  1639  1834
Seq 1  D04  GTGACAGGTTGGCGCCTACATAATAGAATTCTT  VTGWRLHNRIL  238  1934  2185
Seq 1  D05  GTGACAGGTTGGCGCCTATGGTCGAGAATTCTT  VTGWRLWSRIL  34  248  282
Seq 1  D06  GTGACAGGTTGGCGCCTATTGAAGAGAATTCTT  VTGWRLLKRIL  -2  0  13
Seq 1  D07  GTGACAGGTTGGCGCCTACATCAGAAAATTCTT  VTGWRLHQKIL  89  937  971
Seq 1  D08  GTGACAGGTTGGCGCCTATGGTCGAGAATTCTT  VTGWRLWSRIL  160  1108  1356
Seq 1  D09  GTGACAGGTTGGCGCCTATTTCTGAGAATTCTT  VTGWRLFLRIL  539  2044  2195
Seq 1  D10  GTGACAGGTTGGCGCCTAAATTCTAGAATTCTT  VTGWRLNSRIL  -4  0  10
Seq 1  D11  GTGACAGGGTGGCGCTTGTTTGAGAGAATTCTT  VTGWRLFERIL  1046  3799  4275
Seq 1  D12  GTGACAGGTTGGCGCCTATGTAAGAGAATTCTT  VTGWRLCKRIL  251  1964  2277
Seq 1  E01  GTGACAGGTCATCGCCTATTTCATAGAATTCTT  VTGHRLFHRIL  559  1246  1334
Seq 1  E02  GTGACAGGTGGGCGCCTATTTCGGAGAATTCTT  VTGGRLFRRIL  16  16  44
Seq 1  E03  GTGACAGGTTTTCGCCTATCTGAGAGGATTCTT  VTGFRLSERIL  5  -6  -3
Seq 1  E04  GTGACAGGTATTCGCCTATTTGAGCTGATTCTT  VTGIRLFELIL  -7  1  10
Seq 1  E05  GTGACAGGTTCGCGCCTAGATGAGAGAATTCTT  VTGSRLDERIL  44  151  195
Seq 1  E06  GTGACAGGTATTCGCCTAATGGAGAGAATTCTT  VTGIRLMERIL  126  563  715
Seq 1  E07  GTGACAGGTTATCGCCTAGCTGAGAGAATTCTT  VTGYRLAERIL  23  -1  -3
Seq 1  E08  GTGACAGGTTATCGCCTATCGGAGAGAATTCTT  VTGYRLSERIL  493  4431  5286
Seq 1  E09  GTGACAGGTTTTCGCCTACTGGAGAAAATTCTT  VTGFRLLEKIL  629  1926  2119
Seq 1  E10  GTGACAGGTTTTCGCCTAAGTGAGAGAATTCTT  VTGFRLSERIL  321  3427  3720
Seq 1  E11  GTGACAGGTTGTCGCCTACATGAGAGAATTCTT  VTGCRLHERIL  98  461  482
Seq 1  E12  GTGACAGGTTGGCGCCTGTTTGAGACAATTCTT  VTGWRLFETIL  38  373  400
Seq 1  F01  GTGACAGGTTGGCGCGTGATGGAGAGAATTCTT  VTGWRVMERIL  363  1878  1803
Seq 1  F02  GTGACAGGTTGGCGCCTGAATGAGAAAATTCTT  VTGWRLNEKIL  301  2375  2621
Seq 1  F03  GTGACAGGTTGGCGCTTTAGTGAGAGAATTCTT  VTGWRFSERIL  29  32  12
Seq 1  F04  GTGACAGGTTGGCGCTGGCTTGAGAAAATTCTT  VTGWRWLEKIL  483  2220  2550
Seq 1  F05  GTGACAGGTTGGCGCATGAGTGAGAGAATTCTT  VTGWRMSERIL  5  78  109
Seq 1  F06  GTGACAGGTTGGCGCCAGATGGAGAAAATTCTT  VTGWRQMEKIL  287  1063  1171
Seq 1  F07  GTGACAGGTTGGCGCCGGCATGAGAAAATTCTT  VTGWRRHEKIL  16  7  16
Seq 1  F08  GTGACAGGTTGGCGCCTTTTTGAGATAATTCTT  VTGWRLFEIIL  505  1757  2018
Seq 1  F09  GTGACAGGTTGGCGCATTATGGAGAAAATTCTT  VTGWRIMEKIL  205  1190  1216
Seq 1  F10  GTGACAGGTTGGCGCCTGGGTGAGAGAATTCTT  VTGWRLGERIL  9  46  71
Seq 1  F11  GTGACAGGTTGGCGCCAGTTTGAGATAATTCTT  VTGWRQFEIIL  10  -5  3
Seq 1  F12  GTGACAGGTTGGCGCCGGTTTCAGAGAATTCTT  VTGWRRFQRIL  2896  10210  10526
Seq 1  G01  GTGACAGGTTGGCGCCTGTTTAAGAGAATTCTT  VTGWRLFKRIL  6632  21485  22773
Seq 1  G02  GTGACAGGTTGGCGCCGTTTTGCGAGAATTCTT  VTGWRRFARIL  1946  15873  16316
Seq 1  G03  GTGACAGGTTGGCGCCTGTTTTCGAGAATTCTT  VTGWRLFSRIL  5733  20015  21183
Seq 1  G04  GTGACAGGTTGGCGCCGTTTTAAGAGAATTCTT  VTGWRRFKRIL  6556  34298  34759
Seq 1  G05  GTGACAGGTTGGCGCCAGTTTGCGAGAATTCTT  VTGWRQFARIL  12970  21078  20222
Seq 1  G06  GTGACAGGTTGGCGCCGGTTTATGAGAATTCTT  VTGWRRFMRIL  5299  45410  45995
Seq 1  G07  GTGACAGGTTGGCGCCTGTTTGAGAGAATTCTT  VTGWRLFERIL  178  1090  1066
Seq 1  G08  GTGACAGGTTGGCGCCTTTTTACGAGAATTCTT  VTGWRLFTRIL  5315  24749  25743
Seq 1  G09  GTGACAGGTTGGCGCCTGTTTGAGAGAATTCTT  VTGWRLFERIL  3475  17339  17355
Seq 1  G10  GTGACAGGTTGGCGCCTTTCTAAGAGAATTCTT  VTGWRLSKRIL  2812  37547  43536
Seq 1  G11  GTGACAGGTTGGCGCCGTTTTAAGAGAATTCTT  VTGWRRFKRIL  8559  31099  31126
Seq 1  G12  GTGACAGGTTGGCGCAGGTTTAGGAGAATTCTT  VTGWRRFRRIL  141  416  441
Seq 1  H01  GTGACAGGTTGGCGCCGGTTTATGAGAATTCTT  VTGWRRFMRIL  440  2258  2451
Seq 1  H02  GTGACAGGTTGGCGCCTTTTTACTAGAATTCTT  VTGWRLFTRIL  9589  31621  33335
Seq 1  H03  GTGACAGGTTGGCGCCTGTTTGCGAGAATTCTT  VTGWRLFARIL  181  1872  2145
Seq 1  H04  GTGACAGGTTGGCGCCTTTTTCAGAGAATTCTT  VTGWRLFQRIL  463  2162  2331
Seq 1  H05  GTGACAGGTTGGCGCCTTTTTACTAGAATTCTT  VTGWRLFTRIL  13667  36700  36465
Seq 1  H06  GTGACAGGTTGGCGCCTTTTTACTAGAATTCTT  VTGWRLFTRIL  5724  36223  38127
Seq 1  H07  GTGACAGGTTGGCGCCTTTTTGCGAGAATTCTT  VTGWRLFARIL  473  1916  1946
Seq 1  H08  GTGACAGGTTGGCGCGAGTTTGGGAGAATTCTT  VTGWREFGRIL  206  1146  1331
Seq 1  H09  GTGACAGGTTGGCGCAAGTTTAATAGAATTCTT  VTGWRKFNRIL  6751  37165  39336
Seq 1  H10  GTGACAGGTTGGCGCCAGTTTCGGAGAATTCTT  VTGWRQFRRIL  4502  26217  28913
Seq 1  H11  GTGACAGGTTGGCGCTTGTTTGGGAGAATTCTT  VTGWRLFGRIL  43  101  125
Seq 1  H12  Control                            Control      714055  1536365  1528108
Seq 2  A01  GTGACAGGTTTTCGAATATTTGAGAGAATTCTT  VTGFRIFERIL  9107  37093  43643
Seq 2  A02  GTGACAGGGCTGCGCCTATTTGAGAGAATTCTT  VTGLRLFERIL  468  3009  3536
Seq 2  A03  GTGACAGGGTGGCGCCTATTTGAGAGAATTCTT  VTGWRLFERIL  10861  39494  42666
Seq 2  A04  GTGACAGGGTGGCGCCTATTTGAGAGAATTCTT  VTGWRLFERIL  11849  49194  54491
Seq 2  A05  GTGACAGGGGGGCGCCGATTTGAGAGAATTCTT  VTGGRRFERIL  745  3552  4553
Seq 2  A06  GTGACAGGGTGGCGCCTATTTGAGAGAATTCTT  VTGWRLFERIL  17548  64486  71991
Seq 2  A07  GTGACAGGTTTTCGCCTATTTGAGAGAATTCTT  VTGFRLFERIL  6369  31668  36912
Seq 2  A08  GTGACAGGTTTTCGCCTATTTGAGAGAATTCTT  VTGFRLFERIL  19055  75858  82229
Seq 2  A09  GTGACAGGGTGGCGCCTATCTGAGAGAATTCTT  VTGWRLSERIL  10404  31951  40853
Seq 2  A10  GTGACAGTTCTGCGCCTATCTGAGAGAATTCTT  VTVLRLSERIL  6  -2  25
Seq 2  A11  GTGACAGGTTTTCGCCTATTTGAGAGAATTCTT  VTGFRLFERIL  14972  57276  64011
Seq 2  A12  GTGACAGGGTGGCGCCTATTTGAGAGAATTCTT  VTGWRLFERIL  52439  128176  135640
Seq 2  B01  GTGACAGGTAATCGCCTACAGGAGAAAATTCTT  VTGNRLQEKIL  12  14  7
Seq 2  B02  GTGACAGGTTTTCGCCTACATGAGAGAATTCTT  VTGFRLHERIL  4839  15800  20304
Seq 2  B03  GTGACAGGTTTTCGCCTACATGAGAAAATTCTT  VTGFRLHEKIL  6760  25315  29635
Seq 2  B04  GTGACAGGTCGGCGCCTAACGGAGAGAATTCTT  VTGRRLTERIL  2684  6962  7612
Seq 2  B05  GTGACAGGTTTTCGCCTATTTGAGAGAATTCTT  VTGFRLFERIL  6897  17771  20469
Seq 2  B06  GTGACAGGTGCGCGCCTATTTGAGAGAATTCTT  VTGARLFERIL  590  2712  3624
Seq 2  B07  GTGACAGGTGGGCGCCTACCTGAGAGAATTCTT  VTGGRLPERIL  14  8  12
Seq 2  B08  GTGACAGGTTTTCGCCTATTTGAGAGAATTCTT  VTGFRLFERIL  6180  29093  33188
Seq 2  B09  GTGACAGGTAGTCGCCTACATGAGAAAATTCTT  VTGSRLHEKIL  495  3463  4247
Seq 2  B10  GTGACAGGTTGGCGCCTACATGAGAGAATTCTT  VTGWRLHERIL  8034  31328  36700
Seq 2  B11  GTGACAGGTAGTCGCCTATTTGAGAAAATTCTT  VTGSRLFEKIL  1342  5571  7418
Seq 2  B12  GTGACAGGTCATCGCCTAATGGAGAGAATTCTT  VTGHRLMERIL  807  4332  5700
Seq 2  C01  No sequence                        No sequence  2275  8130  10095
Seq 2  C02  GTGACAGGTTTTCGCCTACATGAGAGAATTCTT  VTGFRLHERIL  2816  14408  18751
Seq 2  C03  GTGACAGGTCATCGCCTATTGGAGAAAATTCTT  VTGHRLLEKIL  233  1213  1650
Seq 2  C04  GTGACAGGTTGGCGCCTACATGAGAAAATTCTT  VTGWRLHEKIL  3859  12294  15180
Seq 2  C05  GTGACAGGTCCTCGCCTACAGGAGAGAATTCTT  VTGPRLQERIL  362  1261  1623
Seq 2  C06  GTGACAGGTTTTCGCCTAAGTGAGAGAATTCTT  VTGFRLSERIL  880  3442  4700
Seq 2  C07  No sequence                        No sequence  4570  13322  16202
Seq 2  C08  GTGACAGGTCATCGCCTAATGGAGAGAATTCTT  VTGHRLMERIL  450  2746  3822
Seq 2  C09  GTGACAGGTTGGCGCCTACATGAGAGAATTCTT  VTGWRLHERIL  3744  8977  10959
Seq 2  C10  GTGACAGGTATGCGCCTATTTGAGACAATTCTT  VTGMRLFETIL  2012  4554  5230
Seq 2  C11  GTGACAGGTTTTCGCCTACATGAGAGAATTCTT  VTGFRLHERIL  1453  7046  9408
Seq 2  C12  GTGACAGGTTGGCGCCTACATGAGAAAATTCTT  VTGWRLHEKIL  2416  8062  11072
Seq 2  D01  GTGACAGGTCTTCGCCTACATGAGAAAATTCTT  VTGLRLHEKIL  1884  5450  8172
Seq 2  D02  GTGACAGGTTTTCGCCTAAATGAGAAAATTCTT  VTGFRLNEKIL  159  1354  1624
Seq 2  D03  GTGACAGGTTATCGCCTATCGGAGAGAATTCTT  VTGYRLSERIL  266  844  1294
Seq 2  D04  GTGACAGGTCATCGCCTACATGAGAAAATTCTT  VTGHRLHEKIL  242  2048  2589
Seq 2  D05  GTGACAGGTTTTCGCCTACATGAGAGAATTCTT  VTGFRLHERIL  1760  6099  8730
Seq 2  D06  GTGACAGGTTGGCGCCTACATGAGAGAATTCTT  VTGWRLHERIL  2684  5536  6491
Seq 2  D07  GTGACAGGTTGGCGCCTAAGTGAGAAAATTCTT  VTGWRLSEKIL  829  4122  4811
Seq 2  D08  GTGACAGGTTGGCGCCTAAGTGAGAAAATTCTT  VTGWRLSEKIL  1157  3642  5496
Seq 2  D09  GTGACAGGTTTTCGCCTAAGTGAGAGAATTCTT  VTGFRLSERIL  310  1727  2550
Seq 2  D10  GTGACAGGTTTTCGCCTACATGAGAAAATTCTT  VTGFRLHEKIL  2312  8707  11094
Seq 2  D11  GTGACAGGTTTTCGCCTATTTGAGAGAATTCTT  VTGFRLFERIL  3470  9259  10346
Seq 2  D12  GTGACAGGTTTTCGCCTATTTGAGAGAATTCTT  VTGFRLFERIL  6590  18023  21710
Seq 2  E01  GTGACAGGTTGTCGCCTAATGGAGAAAATTCTT  VTGCRLMEKIL  144  468  700
Seq 2  E02  GTGACAGGTTTTCGCCTAAGTGAGAGAATTCTT  VTGFRLSERIL  285  1765  2580
Seq 2  E03  GTGACAGGTAGTCGCCTATTTGAGAGGATTCTT  VTGSRLFERIL  653  1551  2272
Seq 2  E04  GTGACAGGTAGTCGCCTATTTGAGAAGATTCTT  VTGSRLFEKIL  457  1770  2496
Seq 2  E05  GTGACAGGTAGTCGCCTATTTGAGAGGATTCTT  VTGSRLFERIL  363  817  1261
Seq 2  E06  GTGACAGGTACTCGCCTATTTGAGCGGATTCTT  VTGTRLFERIL  259  645  854
Seq 2  E07  GTGACAGGTTGGCGCCTATTTGAGCCTATTCTT  VTGWRLFEPIL  4347  8575  10099
Seq 2  E08  GTGACAGGTAAGCGCCTATTTGAGCGGATTCTT  VTGKRLFERIL  54  866  1180
Seq 2  E09  GTGACAGGTTGGCGCCTATCTGAGAAGATTCTT  VTGWRLSEKIL  108  971  1627
Seq 2  E10  GTGACAGGTTATCGCCTATCTGAGAGGATTCTT  VTGYRLSERIL  110  1026  1405
Seq 2  E11  GTGACAGGTTTTCGCCTATTTGAGCAGATTCTT  VTGFRLFEQIL  3144  5760  7815
Seq 2  E12  GTGACAGGTTTTCGCCTATTTGAGCAGATTCTT  VTGFRLFEQIL  4132  13324  18267
Seq 2  F01  GTGACAGGTCATCGCCTATTTGAGGCTATTCTT  VTGHRLFEAIL  1319  5322  7786
Seq 2  F02  GTGACAGGTTGGCGCCTATTTGAGATGATTCTT  VTGWRLFEMIL  1630  8728  10261
Seq 2  F03  GTGACAGGTTATCGCCTATTTGAGTTTATTCTT  VTGYRLFEFIL  2421  6344  7533
Seq 2  F04  GTGACAGGTGCTCGCCTATTTGAGCGGATTCTT  VTGARLFERIL  1030  4633  6612
Seq 2  F05  GTGACAGGTTATCGCCTATCTGAGCGGATTCTT  VTGYRLSERIL  711  5932  7230
Seq 2  F06  GTGACAGGTTTTCGCCTATTTGAGCATATTCTT  VTGFRLFEHIL  4799  18848  22037
Seq 2  F07  GTGACAGGTTTTCGCCTATTTGAGTCGATTCTT  VTGFRLFESIL  3372  8566  11057
Seq 2  F08  GTGACAGGTATTCGCCTATTTGAGCTTATTCTT  VTGIRLFELIL  148  664  894
Seq 2  F09  GTGACAGGTTTTCGCCTATCTGAGCGTATTCTT  VTGFRLSERIL  1059  7807  10444
Seq 2  F10  GTGACAGGTATGCGCCTATTTGAGAAGATTCTT  VTGMRLFEKIL  638  3239  3938
Seq 2  F11  GTGACAGGTTATCGCCTATTTGAGAATATTCTT  VTGYRLFENIL  2015  5279  6823
Seq 2  F12  GTGACAGGTTTTCGCCTATTTGAGCAGATTCTT  VTGFRLFEQIL  4497  15465  19967
Seq 2  G01  GTGACAGGTTGTCGCCTATTTGAGCTTATTCTT  VTGCRLFELIL  321  1488  1911
Seq 2  G02  GTGACAGGTTATCGCCTATCTGAGCGGATTCTT  VTGYRLSERIL  330  1764  2845
Seq 2  G03  GTGACAGGTCATCGCCTATTTGAGCGGATTCTT  VTGHRLFERIL  7578  41077  44741
Seq 2  G04  GTGACAGGTTCTCGCCTATTTGAGAGGATTCTT  VTGSRLFERIL  695  4039  5524
Seq 2  G05  GTGACAGGTTATCGCCTATTTGAGCTGATTCTT  VTGYRLFELIL  1381  7190  8677
Seq 2  G06  GTGACAGGTATTCGCCTATTTGAGAGGATTCTT  VTGIRLFERIL  511  3108  4236
Seq 2  G07  GTGACAGGTTATCGCCTATTTGAGCATATTCTT  VTGYRLFEHIL  2717  13700  19127
Seq 2  G08  GTGACAGGTTTTCGCCTATGTGAGAGGATTCTT  VTGFRLCERIL  2737  12433  14090
Seq 2  G09  GTGACAGGTTATCGCCTATTTGAGTTTATTCTT  VTGYRLFEFIL  490  2516  3544
Seq 2  G10  GTGACAGGTTGTCGCCTATTTGAGCTGATTCTT  VTGCRLFELIL  314  3253  4640
Seq 2  G11  GTGACAGGTTATCGCCTATTTGAGCTGATTCTT  VTGYRLFELIL  1189  9506  12501
Seq 2  G12  GTGACAGGTTGGCGCCTATTTGAGCCTATTCTT  VTGWRLFEPIL  4507  41306  48239
Seq 2  H01  GTGACAGGTTTTCGCCTATTTGCGAGAATTCTT  VTGFRLFARIL  12251  51241  63955
Seq 2  H02  GTGACAGGTAGTCGCCTATTTGTGAGAATTCTT  VTGSRLFVRIL  404  3129  4234
Seq 2  H03  GTGACAGGTTGGCGCCTATGTTCGAGAATTCTT  VTGWRLCSRIL  711  3332  4963
Seq 2  H04  GTGACAGGTGCTCGCCTATTTGTGAGAATTCTT  VTGARLFVRIL  486  2642  3818
Seq 2  H05  GTGACAGGTCAGCGCCTATGTAGGAGAATTCTT  VTGQRLCRRIL  16  102  133
Seq 2  H06  GTGACAGGTGGGCGCCTATTTACGAGAATTCTT  VTGGRLFTRIL  1088  4368  6311
Seq 2  H07  GTGACAGGTTGTCGCCTATTTATGAGAATTCTT  VTGCRLFMRIL  1937  4598  5684
Seq 2  H08  No sequence                        No sequence  1923  4791  5568
Seq 2  H09  GTGACAGGTTGGCGCCTATTTAGGAGAATTCTT  VTGWRLFRRIL  31326  47836  61375
Seq 2  H10  GTGACAGGTGGGCGCCTATTTCAGAGAATTCTT  VTGGRLFQRIL  1057  4929  7822
Seq 2  H11  No sequence                        No sequence  1360  6714  9861
Seq 2  H12  Control                            Control      168566  212931  216798
Seq 3  A01  GTGACAGGTTTTCGCCTATTTATGAGAATTCTT  VTGFRLFMRIL  11103  45999  52239
Seq 3  A02  GTGACAGGTTGGCGCCTATGTAAGCGGATTCTT  VTGWRLCKRIL  2831  11155  14254
Seq 3  A03  GTGACAGGTTGGCGCCTATTTGCTGAGATTCTT  VTGWRLFAEIL  1511  5115  6629
Seq 3  A04  GTGACAGGTTGGCGCCTATCTAGGTGTATTCTT  VTGWRLSRCIL  12380  39000  42847
Seq 3  A05  GTGACAGGTTGGCGCCTATTTTCGTTGATTCTT  VTGWRLFSLIL  12099  33670  39254
Seq 3  A06  GTGACAGGTTGGCGCCTATTTGCTCCGATTCTT  VTGWRLFAPIL  7505  24141  29298
Seq 3  A07  GTGACAGGTTGGCGCCTATCTAGGTGTATTCTT  VTGWRLSRCIL  266  1340  1811
Seq 3  A08  GTGACAGGTTGGCGCCTATTTAAGGGTATTCTT  VTGWRLFKGIL  3500  15232  20013
Seq 3  A09  GTGACAGGTTGGCGCCTATTTCAGCCTATTCTT  VTGWRLFQPIL  12230  37312  44278
Seq 3  A10  GTGACAGGTTGGCGCCTATTTCGGACGATTCTT  VTGWRLFRTIL  8217  25648  30557
Seq 3  A11  GTGACAGGTTGGCGCCTATCTAGGTGTATTCTT  VTGWRLSRCIL  152  919  1320
Seq 3  A12  GTGACAGGTTGGCGCCTATTTCGGTATATTCTT  VTGWRLFRYIL  7846  24545  30271
Seq 3  B01  GTGACAGGTTGGCGCCTATCTTCGCCTATTCTT  VTGWRLSSPIL  835  4847  6460
Seq 3  B02  GTGACAGGTTGGCGCCTATTTAGTAAGATTCTT  VTGWRLFSKIL  18260  50921  55241
Seq 3  B03  GTGACAGGTTGGCGCCTATCTTGGCGGATTCTT  VTGWRLSWRIL  898  5568  6580
Seq 3  B04  GTGACAGGTTGGCGCCTATTTAGGCCTATTCTT  VTGWRLFRPIL  32368  73910  79890
Seq 3  B05  GTGACAGGTTGGCGCCTATTTCGGGTGATTCTT  VTGWRLFRVIL  6518  19429  22630
Seq 3  B06  GTGACAGGTTGGCGCCTATCTAGTCTGATTCTT  VTGWRLSSLIL  240  1498  1841
Seq 3  B07  GTGACAGGTTGGCGCCTATTTCGGGTGATTCTT  VTGWRLFRVIL  6623  20043  22845
Seq 3  B08  GTGACAGGTTGGCGCCTATCTTGGCGGATTCTT  VTGWRLSWRIL  2431  6203  7610
Seq 3  B09  GTGACAGGTTGGCGCCTATTTCATCATATTCTT  VTGWRLFHHIL  11778  34809  38603
Seq 3  B10  GTGACAGGTTGGCGCCTATTTCATCATATTCTT  VTGWRLFHHIL  14971  39516  44233
Seq 3  B11  GTGACAGGTTGGCGCCTATTTTGTCCGATTCTT  VTGWRLFCPIL  2735  9566  11203
Seq 3  B12  GTGACAGGTTGGCGCCTATTTAAGGGTATTCTT  VTGWRLFKGIL  6018  18198  21976
Seq 3  C01  GTGACAGGTTGGCGCCTATTTTTGCCGATTCTT  VTGWRLFLPIL  4478  20158  22793
Seq 3  C02  GTGACAGGTTGGCGCCTATCTGAGCGGATTCTT  VTGWRLSERIL  412  2642  3482
Seq 3  C03  GTGACAGGTTGGCGCCTATTTCAGAGTATTCTT  VTGWRLFQSIL  3750  16422  20278
Seq 3  C04  GTGACAGGTTGGCGCCTATCTATTAAGATTCTT  VTGWRLSIKIL  1358  5857  6909
Seq 3  C05  GTGACAGGTTGGCGCCTATTTCAGACGATTCTT  VTGWRLFQTIL  2238  10007  12140
Seq 3  C06  GTGACAGGTTGGCGCCTATCTAGGTGTATTCTT  VTGWRLSRCIL  269  1376  1934
Seq 3  C07  GTGACAGGTTGGCGCCTATCTAGGTGTATTCTT  VTGWRLSRCIL  1782  2759  3688
Seq 3  C08  GTGACAGGTTGGCGCCTATTTCGGACGATTCTT  VTGWRLFRTIL  1782  2759  3688
Seq 3  C09  GTGACAGGTTGGCGCCTATCTAGGAGGATTCTT  VTGWRLSRRIL  2249  6514  8115
Seq 3  C10  GTGACAGGTTGGCGCCTATCTCGGCCTATTCTT  VTGWRLSRPIL  3256  16334  19826
Seq 3  C11  GTGACAGGTTGGCGCCTATCTAGGAGGATTCTT  VTGWRLSRRIL  4279  14752  17506
Seq 3  C12  GTGACAGGTTGGCGCCTATCTTCGCCTATTCTT  VTGWRLSSPIL  659  3311  4370
Seq 3  D01  GTGACAGGTTGGCGCCTATTTGATGCTATTCTT  VTGWRLFDAIL  3151  11402  14231
Seq 3  D02  GTGACAGGTTGGCGCCTATGTCGGCCGATTCTT  VTGWRLCRPIL  1004  3832  5008
Seq 3  D03  GTGACAGGTTGGCGCCTATTTCGGACGATTCTT  VTGWRLFRTIL  2944  22400  29207
Seq 3  D04  GTGACAGGTTGGCGCCTATTTGGTCATATTCTT  VTGWRLFGHIL  6677  27767  34204
Seq 3  D05  GTGACAGGTTGGCGCCTATGTCGGCCGATTCTT  VTGWRLCRPIL  786  4748  6277
Seq 3  D06  GTGACAGGTTGGCGCCTATCTAGGTGTATTCTT  VTGWRLSRCIL  289  2136  2630
Seq 3  D07  GTGACAGGTTGGCGCCTATTTGCTCCGATTCTT  VTGWRLFAPIL  9083  33518  38684
Seq 3  D08  GTGACAGGTTGGCGCCTATTTCAGCCTATTCTT  VTGWRLFQPIL  11015  39726  44866
Seq 3  D09  GTGACAGGTTGGCGCCTATTTGCTCAGATTCTT  VTGWRLFAQIL  2637  16333  19639
Seq 3  D10  GTGACAGGTTGGCGCCTATTTCAGCCTATTCTT  VTGWRLFQPIL  5495  29402  33234
Seq 3  D11  GTGACAGGTTGGCGCCTATGTCGGAAGATTCTT  VTGWRLCRKIL  5753  29272  34747
Seq 3  D12  GTGACAGGTTGGCGCCTATTTCTTGAGATTCTT  VTGWRLFLEIL  141  665  828
Seq 3  E01  GTGACAGGTTGGCGCCTATTTGGGCCGATTCTT  VTGWRLFGPIL  1592  14003  20910
Seq 3  E02  GTGACAGGTTGGCGCCTATCTAGGGCGATTCTT  VTGWRLSRAIL  889  5867  7097
Seq 3  E03  GTGACAGGTTGGCGCCTATCTAGGGCGATTCTT  VTGWRLSRAIL  935  6539  8175
Seq 3  E04  GTGACAGGTTGGCGCCTATTTTTGCCGATTCTT  VTGWRLFLPIL  1989  8179  10558
Seq 3  E05  GTGACAGGTTGGCGCCTATGTAAGCGGATTCTT  VTGWRLCKRIL  1547  8166  10046
Seq 3  E06  GTGACAGGTTATCGCCTATCTCGGAGAATTCTT  VTGYRLSRRIL  1490  11997  16606
Seq 3  E07  GTGACAGGTTCTCGCCTATTTACGAGAATTCTT  VTGSRLFTRIL  848  5897  7520
Seq 3  E08  GTGACAGGTTGGCGCCTATCTACGAGAATTCTT  VTGWRLSTRIL  422  3094  4593
Seq 3  E09  GTGACAGGTTCTCGCCTATTTACGAGAATTCTT  VTGSRLFTRIL  2096  17121  19300
Seq 3  E10  GTGACAGGTGCTCGCCTATTTACGAGAATTCTT  VTGARLFTRIL  1849  8442  11029
Seq 3  E11  GTGACAGGTGCTCGCCTATCTAGGAGAATTCTT  VTGARLSRRIL  605  5431  6134
Seq 3  E12  GTGACAGGTTTTCGCCTATCTACGAGAATTCTT  VTGFRLSTRIL  1663  12034  14492
Seq 3  F01  GTGACAGGTCTTCGCCTATCTAGGAGAATTCTT  VTGLRLSRRIL  15215  50869  53231
Seq 3  F02  GTGACAGGTCATCGCCTATTTCGGAGAATTCTT  VTGHRLFRRIL  26070  101297  104572
Seq 3  F03  GTGACAGGTTATCGCCTATTTCAGAGAATTCTT  VTGYRLFQRIL  12009  38175  42983
Seq 3  F04  GTGACAGGTTTTCGCCTATTTCAGAGAATTCTT  VTGFRLFQRIL  17500  50318  56453
Seq 3  F05  GTGACAGGTTGGCGCCTATTTCAGAGAATTCTT  VTGWRLFQRIL  16674  59137  75511
Seq 3  F06  GTGACAGGTCATCGCCTATTTGCGAGAATTCTT  VTGHRLFARIL  2688  24584  31804
Seq 3  F07  GTGACAGGTGGTCGCCTATTTACGAGAATTCTT  VTGGRLFTRIL  2132  7445  9750
Seq 3  F08  GTGACAGGTAATCGCCTATTTCAGAGAATTCTT  VTGNRLFQRIL  307  1993  3461
Seq 3  F09  GTGACAGGTGCGCGCCTATTTACGAGAATTCTT  VTGARLFTRIL  167  1695  2244
Seq 3  F10  GTGACAGGTTGTCGCCTATTTATGAGAATTCTT  VTGCRLFMRIL  1644  5293  6212
Seq 3  F11  GTGACAGGTTGTCGCCTATTTATGAGAATTCTT  VTGCRLFMRIL  1941  9918  13363
Seq 3  F12  GTGACAGGTGGGCGCCTATTTCGTAGAATTCTT  VTGGRLFRRIL  1308  10189  12541
Seq 3  G01  GTGACAGGTGCGCGCCTATTTCAGAGAATTCTT  VTGARLFQRIL  523  5414  7346
Seq 3  G02  GTGACAGGTTATCGCCTATCTCTGAGAATTCTT  VTGYRLSLRIL  49  296  366
Seq 3  G03  GTGACAGGTAATCGCCTATTTACGAGAATTCTT  VTGNRLFTRIL  1622  11333  13113
Seq 3  G04  GTGACAGGTTTTCGCCTATTTGCGAGAATTCTT  VTGFRLFARIL  8005  41831  48141
Seq 3  G05  GTGACAGGTGGGCGCCTATTTCGGAGAATTCTT  VTGGRLFRRIL  514  4526  6690
Seq 3  G06  GTGACAGGTTGGCGCCTATCTAAGAGAATTCTT  VTGWRLSKRIL  3888  18059  21988
Seq 3  G07  GTGACAGGTATGCGCCTATTTCGGAGAATTCTT  VTGMRLFRRIL  1479  5956  7397
Seq 3  G08  GTGACAGGTGCTCGCCTATTTCTGAGAATTCTT  VTGARLFLRIL  26  287  455
Seq 3  G09  No sequence                        No sequence  303  2313  3271
Seq 3  G10  GTGACAGGTTATCGCCTATTTTCGAGAATTCTT  VTGYRLFSRIL  24951  60472  64660
Seq 3  G11  GTGACAGGTATGCGCCTATTTAGGAGAATTCTT  VTGMRLFRRIL  756  3724  4926
Seq 3  G12  GTGACAGGTTTTCGCCTATTTTTTAGAATTCTT  VTGFRLFFRIL  5562  21075  22960
Seq 3  H01  GTGACAGGTTATCGCCTATGTAAGAGAATTCTT  VTGYRLCKRIL  730  24304  30111
Seq 3  H02  GTGACAGGTTGGCGCCTATTTCAGAGAATTCTT  VTGWRLFQRIL  9419  33804  43926
Seq 3  H03  GTGACAGGTCATCGCCTATTTCAGAGAATTCTT  VTGHRLFQRIL  2497  11137  13288
Seq 3  H04  GTGACAGGTTATCGCCTATGTTCGAGAATTCTT  VTGYRLCSRIL  411  8085  11033
Seq 3  H05  GTGACAGGTTCTCGCCTATTTACGAGAATTCTT  VTGSRLFTRIL  1820  16282  19311
Seq 3  H06  GTGACAGGTTCTCGCCTATTTGGGAGAATTCTT  VTGSRLFGRIL  345  4018  5683
Seq 3  H07  GTGACAGGTAGGCGCCTATATGGGAGAATTCTT  VTGRRLYGRIL  15  14  26
Seq 3  H08  GTGACAGGTGGGCGCCTATTTTCGAGAATTCTT  VTGGRLFSRIL  724  4170  6242
Seq 3  H09  GTGACAGGTTGGCGCCTATTTGGGAGAATTCTT  VTGWRLFGRIL  24151  93749  105799
Seq 3  H10  GTGACAGGTGCGCGCCTATTTATGAGAATTCTT  VTGARLFMRIL  311  2541  3877
Seq 3  H11  GTGACAGGTATGCGCCTATTTACGAGAATTCTT  VTGMRLFTRIL  399  2245  2906
Seq 3  H12  Control                            Control      255809  293458  288963
Seq 4  A01  Control                            Control      10  529776  521595
Seq 4  A02  Control                            Control      13  6742  7604
Seq 4  A03  Control                            Control      19  2852  3390
Seq 4  A04  GTGACAGGTTGGCGCCTTTTTGAGATAATTCTT  VTGWRLFEIIL  10  11990  16423
Seq 4  A05  GTGACAGGTTGGCGCATGTGGGAGATAATTCTT  VTGWRMWEIIL  -15  894  1458
Seq 4  A06  No sequence                        No sequence  15  2305  3062
Seq 4  A07  No sequence                        No sequence  2  1555  2300
Seq 4  A08  No sequence                        No sequence  6  775  1016
Seq 4  A09  No sequence                        No sequence  0  14399  19076
Seq 4  A10  GTGACAGGTTGGCGCCTGTTTGAGACAATTCTT  VTGWRLFETIL  9  1271  1720
Seq 4  A11  No sequence                        No sequence  8  5372  7482
Seq 4  A12  GTGACAGGTTGGCGCCTTTTTGAGATAATTCTT  VTGWRLFEIIL  3  12952  16922
Seq 4  B01  No sequence                        No sequence  5  1958  2549
Seq 4  B02  No sequence                        No sequence  9  130  151
Seq 4  B03  GTGACAGGTTGGCGCTGGCTGGAGAAAATTCTT  VTGWRWLEKIL  17  1032  1378
Seq 4  B04  GTGACAGGTTGGCGCCAGATGGAGAAAATTCTT  VTGWRQMEKIL  1  3829  5316
Seq 4  B05  No sequence                        No sequence  10  7588  11241
Seq 4  B06  GTGACAGGTTGGCGCTTTTGTAAGAAAATTCTT  VTGWRFCKKIL  14  1557  2269
Seq 4  B07  GTGACAGGTTGGCGCGTTATGGAGACAATTCTT  VTGWRVMETIL  -3  2770  3999
Seq 4  B08  No sequence                        No sequence  16  3925  5836
Seq 4  B09  GTGACAGGTTGGCGCTGTATGGAGAGAATTCTT  VTGWRCMERIL  -11  5667  9003
Seq 4  B10  No sequence                        No sequence  -6  1843  2794
Seq 4  B11  GTGACAGGTTGGCGCCTGATTGAGAGAATTCTT  VTGWRLIERIL  2  3017  4558
Seq 4  B12  No sequence                        No sequence  3  5782  8737
Seq 4  C01  No sequence                        No sequence  12  3020  4628
Seq 4  C02  GTGACAGGTTGGCGCGTGTCGGAGAAAATTCTT  VTGWRVSEKIL  1  4281  6487
Seq 4  C03  GTGACAGGTTGGCGCATGCATGAGAAAATTCTT  VTGWRMHEKIL  22  13203  18333
Seq 4  C04  GTGACAGGTTGGCGCGAGTTTGAGAGAATTCTT  VTGWREFERIL  12  4634  7247
Seq 4  C05  No sequence                        No sequence  0  1880  3233
Seq 4  C06  No sequence                        No sequence  16  2027  3148
Seq 4  C07  No sequence                        No sequence  13  2846  4147
Seq 4  C08  No sequence                        No sequence  0  11268  16437
Seq 4  C09  GTGACAGGTTGGCGCTGGCTGGAGAAAATTCTT  VTGWRWLEKIL  22  2118  3063
Seq 4  C10  No sequence                        No sequence  0  6093  8451
Seq 4  C11  No sequence                        No sequence  17  15025  20929
Seq 4  C12  No sequence                        No sequence  8  2937  4349
Seq 4  D01  No sequence                        No sequence  23  1970  2858
Seq 4  D02  GTGACAGGTTGGCGCCAGATGGAGAAAATTCTT  VTGWRQMEKIL  -4  3820  4964
Seq 4  D03  GTGACAGGTTGGCGCTTGATGGAGAGAATTCTT  VTGWRLMERIL  -1  2300  3196
Seq 4  D04  GTGACAGGTTGGCGCATTTGGGAGAAAATTCTT  VTGWRIWEKIL  11  1103  1512
Seq 4  D05  No sequence                        No sequence  -13  4924  7018
Seq 4  D06  No sequence                        No sequence  -5  369  542
Seq 4  D07  No sequence                        No sequence  21  9336  13035
Seq 4  D08  No sequence                        No sequence  22  7804  11492
Seq 4  D09  No sequence                        No sequence  13  10458  16716
Seq 4  D10  No sequence                        No sequence  10  25461  36211
Seq 4  D11  No sequence                        No sequence  5  12245  18370
Seq 4  D12  No sequence                        No sequence  27  65689  81531
Seq 4  E01  GTGACAGGTTGGCGCATGTTTCGTAGAATTCTT  VTGWRMFRRIL  8  35026  45040
Seq 4  E02  GTGACAGGTTGGCGCCTGTTTATGAGAATTCTT  VTGWRLFMRIL  18  4925  8166
Seq 4  E03  No sequence                        No sequence  13  9945  14630
Seq 4  E04  GTGACAGGTTGGCGCCAGTGTTTGAGAATTCTT  VTGWRQCLRIL  8  7885  11959
Seq 4  E05  GTGACAGGTTGGCGCGAGTGTAAGAGAATTCTT  VTGWRECKRIL  -4  1905  2466
Seq 4  E06  No sequence                        No sequence  4  6900  10001
Seq 4  E07  GTGACAGGTTGGCGCAGGTTTTTGAGAATTCTT  VTGWRRFLRIL  7  6131  9511
Seq 4  E08  No sequence                        No sequence  13  14955  20781
Seq 4  E09  No sequence                        No sequence  6  3781  6191
Seq 4  E10  No sequence                        No sequence  12  12597  18135
Seq 4  E11  No sequence                        No sequence  0  11485  16258
Seq 4  E12  GTGACAGGTTGGCGCCTGTTTGTGAGAATTCTT  VTGWRLFVRIL  9  12342  16045
Seq 4  F01  GTGACAGGTTGGCGCAGGTTTGGGAGAATTCTT  VTGWRRFGRIL  9  21191  31097
Seq 4  F02  GTGACAGGTTGGCGCGATTTTAAGAGAATTCTT  VTGWRDFKRIL  0  7315  10518
Seq 4  F03  GTGACAGGTTGGCGCAGGTTTAAGAGAATTCTT  VTGWRRFKRIL  17  31214  38117
Seq 4  F04  No sequence                        No sequence  -2  8769  11021
Seq 4  F05  No sequence                        No sequence  8  12329  17625
Seq 4  F06  No sequence                        No sequence  -1  9162  12553
Seq 4  F07  No sequence                        No sequence  9  15878  22022
Seq 4  F08  No sequence                        No sequence  6  1648  2639
Seq 4  F09  No sequence                        No sequence  17  6192  8371
Seq 4  F10  GTGACAGGGTGGCGCCTATTTGAGATAATTCTT  VTGWRLFEIIL  10  2993  4820
Seq 4  F11  GTGACAGGTTGGCGCCTAAGTGAGATAATTCTT  VTGWRLSEIIL  22  580  924
Seq 4  F12  No sequence                        No sequence  6  13671  18858
Seq 4  G01  No sequence                        No sequence  10  1304  1922
Seq 4  G02  No sequence                        No sequence  10  6084  10413
Seq 4  G03  GTGACAGGGTGGCGCCTACATGAGACAATTCTT  VTGWRLHETIL  10  1202  1591
Seq 4  G04  No sequence                        No sequence  8  296  388
Seq 4  G05  No sequence                        No sequence  -5  4801  7550
Seq 4  G06  No sequence                        No sequence  17  322  755
Seq 4  G07  No sequence                        No sequence  14  276  434
Seq 4  G08  No sequence                        No sequence  10  912  1339
Seq 4  G09  GTGACAGGGTGGCGCCTATTTGAGACAATTCTT  VTGWRLFETIL  7  2640  4204
Seq 4  G10  No sequence                        No sequence  -8  2363  3505
Seq 4  G11  No sequence                        No sequence  10  5615  8855
Seq 4  G12  No sequence                        No sequence  8  7353  10671
Seq 4  H01  GTGACAGGGTGGCGCCTAATGGAGAAAATTCTT  VTGWRLMEKIL  17  34517  40650
Seq 4  H02  GTGACAGGTTGGCGCCTAACGGAGAAAATTCTT  VTGWRLTEKIL  -8  1790  2461
Seq 4  H03  GTGACACCTTGGCGCCTACGTGAGACAATTCTT  VTPWRLRETIL  -13  358  494
Seq 4  H04  GTGACACATTGGCGCCTAGGTGAGAGAATTCTT  VTHWRLGERIL  -7  2317  2765
Seq 4  H05  No sequence                        No sequence  6  96  128
Seq 4  H06  GTGACAGTGTGGCGCCTAACTGAGAGAATTCTT  VTVWRLTERIL  0  25  8
Seq 4  H07  GTGACAGGTTGGCGCCTAGAGGAGATAATTCTT  VTGWRLEEIIL  28  1502  2290
Seq 4  H08  No sequence                        No sequence  5  1153  1621
Seq 4  H09  Control                            Control      22  494667  520330
Seq 4  H10  Control                            Control      0  4425  5384
Seq 4  H11  Control                            Control      27  2645  3294
Seq 4  H12  Control                            Control      0  15  16
Seq 5  A01  Control                            Control      10820  222546  232948
Seq 5  A02  Control                            Control      390  7231  12161
Seq 5  A03  Control                            Control      22  1061  1712
Seq 5  A04  GTGACAGGTTGGCGCCTATTTTGGAGAATTCTT  VTGWRLFWRIL  147  507  795
Seq 5  A05  GTGACAATTTGGCGCCTATCTCCGAGAATTCTT  VTIWRLSPRIL  32  513  875
Seq 5  A06  GTGACAGGGTGGCGCCTATGTCGGAGAATTCTT  VTGWRLCRRIL  33  37  93
Seq 5  A07  No sequence                        No sequence  12  53  44
Seq 5  A08  GTGACAGGTTGGCGCCTATGTGTGAGAATTCTT  VTGWRLCVRIL  25  119  170
Seq 5  A09  GTGACAGGTTGGCGCCTATCTTCTAGAATTCTT  VTGWRLSSRIL  140  731  1181
Seq 5  A10  GTGACAGGGCGTCGCCTATTTAAGAGAATTCTT  VTGRRLFKRIL  16  31  58
Seq 5  A11  No sequence                        No sequence  15  475  662
Seq 5  A12  GTGACAGGTTGGCGCCTATTTACGAGAATTCTT  VTGWRLFTRIL  1422  2884  4078
Seq 5  B01  GTGACAGGTTGGCGCCTATCTTCTAGAATTCTT  VTGWRLSSRIL  23  245  468
Seq 5  B02  GTGACAGGTTGGCGCCTATTTTGGAGAATTCTT  VTGWRLFWRIL  23  42  6
Seq 5  B03  GTGACAGAGTGGCGCCTATATGTGAGAATTCTT  VTEWRLYVRIL  31  106  123
Seq 5  B04  GTGACAGGGTGGCGCCTATGTATAAGAATTCTT  VTGWRLCIRIL  124  376  525
Seq 5  B05  GTGACAGTGTGGCGCCTATTTAGGAGAATTCTT  VTVWRLFRRIL  147  2196  3966
Seq 5  B06  GTGACAGGTTGGCGCCTATTTACGAGAATTCTT  VTGWRLFTRIL  47  132  183
Seq 5  B07  GTGACAGGGTGGCGCCTATCTAGGAGAATTCTT  VTGWRLSRRIL  84  315  658
Seq 5  B08  No sequence                        No sequence  145  918  1358
Seq 5  B09  GTGACAGGGCGTCGCCTATCGGAGAGAATTCTT  VTGRRLSERIL  68  1000  1811
Seq 5  B10  GTGACAGGTTGGCGCCTATCTCTTAGAATTCTT  VTGWRLSLRIL  37  210  370
Seq 5  B11  GTGACAGGTTGGCGCCTATTTCTTAGAATTCTT  VTGWRLFLRIL  65  802  1432
Seq 5  B12  GTGACAGGTTGGCGCCTATCTTTGAGAATTCTT  VTGWRLSLRIL  37  406  513
Seq 5  C01  GTGACAGGGTGGCGCCTATGTTGGAGAATTCTT  VTGWRLCWRIL  19  418  619
Seq 5  C02  GTGACAGGGTGGCGCCTATGTTGGAGAATTCTT  VTGWRLCWRIL  43  389  487
Seq 5  C03  GTGACAGGTTGGCGCCTATGTTCGAGAATTCTT  VTGWRLCSRIL  268  2240  2759
Seq 5  C04  GTGACAGGTTGGCGCCTATTTTGGAGAATTCTT  VTGWRLFWRIL  1211  1578  2804
Seq 5  C05  GTGACAGGGTGGCGCCTATTTGAGAGAATTCTT  VTGWRLFERIL  46  190  295
Seq 5  C06  GTGACAGGTTGGCGCCTATTTTGGAGAATTCTT  VTGWRLFWRIL  39  257  503
Seq 5  C07  GTGACAGGTCCTCGCCTAAGGGAGACAATTCTT  VTGPRLRETIL  46  247  445
Seq 5  C08  No sequence                        No sequence  55  733  937
Seq 5  C09  GTGACAGGTTGGCGCCTATCTTCTAGAATTCTT  VTGWRLSSRIL  19  142  217
Seq 5  C10  GTGACAGGTTGGCGCCTATGTATGAGAATTCTT  VTGWRLCMRIL  143  426  660
Seq 5  C11  GTGACAGGTTGGCGCCTATTTCTTAGAATTCTT  VTGWRLFLRIL  239  716  1605
Seq 5  C12  GTGACAGGTCCTCGCCTAAGGGAGACAATTCTT  VTGPRLRETIL  104  947  1389
Seq 5  D01  No sequence                        No sequence  109  714  919
Seq 5  D02  GTGACAGGGTGGCGCCTATGTGCGAGAATTCTT  VTGWRLCARIL  136  372  434
Seq 5  D03  GTGACAGATTGGCGCCTATTTGAGAGAATTCTT  VTDWRLFERIL  244  401  546
Seq 5  D04  GTGACAGGTTGGCGCCTATGTGAGCGTATTCTT  VTGWRLCERIL  42  145  194
Seq 5  D05  GTGACAGGTTGGCGCCTATTTGAGCCTATTCTT  VTGWRLFEPIL  217  142  752
Seq 5  D06  No sequence                        No sequence  2  52  47
Seq 5  D07  No sequence                        No sequence  528  3381  4983
Seq 5  D08  GTGACAGGTTGGCGCCTATTTGAGCCTATTCTT  VTGWRLFEPIL  230  1549  2027
Seq 5  D09  GTGACAGGTTGGCGCCTATGTGAGCGTATTCTT  VTGWRLCERIL  117  1733  2240
Seq 5  D10  GTGACAGGTCCTCGCCTAAGGGAGACAATTCTT  VTGPRLRETIL  3269  16230  23889
Seq 5  D11  No sequence                        No sequence  563  6883  10352
Seq 5  D12  GTGACAGGTTGGCGCCTATTTGAGGAGATTCTT  VTGWRLFEEIL  1641  10962  17061
Seq 5  E01  No sequence                        No sequence  505  2667  3095
Seq 5  E02  GTGACAACTTGGCGCCTATCTGAGAGGATTCTT  VTTWRLSERIL  404  2529  3070
Seq 5  E03  GTGACAGGTCGTCGCCTATGGGAGACAATTCTT  VTGRRLWETIL  159  2564  3163
Seq 5  E04  GTGACAGGTTGGCGCCTATTTGAGCAGATTCTT  VTGWRLFEQIL  330  1328  1757
Seq 5  E05  GTGACAGGGTGGCGCCTATCTGAGAAGATTCTT  VTGWRLSEKIL  47  201  282
Seq 5  E06  GTGACAGGGTGGCGCCTATTTGAGTTTATTCTT  VTGWRLFEFIL  179  1118  1834
Seq 5  E07  GTGACAGATTGGCGCCTATCTGAGACTATTCTT  VTDWRLSETIL  795  2136  3097
Seq 5  E08  GTGACAGGGTGGCGCCTATCTGAGAAGATTCTT  VTGWRLSEKIL  2039  3322  4003
Seq 5  E09  GTGACAGGTTGGCGCCTATCTGAGCTTATTCTT  VTGWRLSELIL  516  1172  1711
Seq 5  E10  No sequence                        No sequence  998  2989  4919
Seq 5  E11  GTGACAGGGTGGCGCCTATTTGAGTTTATTCTT  VTGWRLFEFIL  562  2193  3386
Seq 5  E12  GTGACAGGGTGGCGCCTATCTGAGAAGATTCTT  VTGWRLSEKIL  984  2810  3444
Seq 5  F01  GTGACAGGTTGGCGCTTGTTTGAGCCGATTCTT  VTGWRLFEPIL  1762  9341  12587
Seq 5  F02  GTGACAGGTTGGCGCTTGTTTGAGCCGATTCTT  VTGWRLFEPIL  316  1619  1989
Seq 5  F03  GTGACAGGTCCTCGCCTAAGGGAGACAATTCTT  VTGPRLRETIL  3318  8820  10378
Seq 5  F04  GTGACAGGTTGGCGCCATTTTGAGCATATTCTT  VTGWRHFEHIL  1793  3678  4591
Seq 5  F05  GTGACAGGTTGGCGCATGTTTGAGCCGATTCTT  VTGWRMFEPIL  578  1352  2590
Seq 5  F06  GTGACAGGTTGGCGCTTGTTTGAGCCGATTCTT  VTGWRLFEPIL  1188  3305  4357
Seq 5  F07  GTGACAGGTTGGCGCATGTTTGAGCCGATTCTT  VTGWRMFEPIL  229  4093  6499
Seq 5  F08
GTGACAGGTCCTCGCCTAAGGGAGACAATTCTT VTGPRLRETIL 189 875 1088 Seq 5 F09 No sequence No sequence 845 3447 4331 Seq 5 F10 GTGACAGGTCGGCGCCTACATGAGAGAATTCTT VTGRRLHERIL 206 639 1307 Seq 5 F11 No sequence No sequence 26 83 108 Seq 5 F12 No sequence No sequence 163 1076 1355 Seq 5 G01 GTGACAGGTCCTCGCCTAAGGGAGACAATTCTT VTGPRLRETIL 36 626 703 Seq 5 G02 GTGACAGGTTGGCGCCATTTTGAGCATATTCTT VTGWRHFEHIL 55 1336 1596 Seq 5 G03 GTGACAGGTTGTCGCATTTTTGAGGAGATTCTT VTGCRIFEEIL 200 376 531 Seq 5 G04 GTGACAGGTTGGCGCATTTTTGAGAATATTCTT VTGWRIFENIL 8 34 38 Seq 5 G05 GTGACAGGTTGGCGCCTTTTTGAGAATATTCTT VTGWRLFENIL 341 245 256 Seq 5 G06 No sequence No sequence 65 104 139 Seq 5 G07 GTGACAGGTCCTCGCCTAAGGGAGACAATTCTT VTGPRLRETIL 12 3 32 Seq 5 G08 No sequence No sequence 93 163 269 Seq 5 G09 GTGACAGGTCCTCGCCTATGGGAGACAATTCTT VTGPRLWETIL 291 597 739 Seq 5 G10 No sequence No sequence 651 810 938 Seq 5 G11 GTGACAGGTCCTCGCCTAAGGGAGACAATTCTT VTGPRLRETIL 297 930 1227 Seq 5 G12 No sequence No sequence 42 997 1245 Seq 5 H01 GTGACAGGTCCTCGCCTAAGGGAGACAATTCTT VTGPRLRETIL 2553 15006 19289 Seq 5 H02 GTGACAGGTTGGCGCAAGTTTGAGTTGATTCTT VTGWRKFELIL 32 290 487 Seq 5 H03 GTGACAGGTTGGCGCAAGTTTGAGTTGATTCTT VTGWRKFELIL 28 16 21 Seq 5 H04 GTGACAGGTTGGCGCATGTGTGAGTTGATTCTT VTGWRMCELIL 18 196 230 Seq 5 H05 GTGACAGGTTGGCGCATTTTTGAGGAGATTCTT VTGWRIFEEIL 23 -2 15 Seq 5 H06 GTGACAGGTTGGCGCATTTTTGAGGAGATTCTT VTGWRIFEEIL 15 6 6 Seq 5 H07 No sequence No sequence 33 177 334 Seq 5 H08 GTGACAGGTTGGCGCATGTTTGAGCCGATTCTT VTGWRMFEPIL 24 138 178 Seq 5 H09 Control Control 15953 386455 396159 Seq 5 H10 Control Control 2338 6939 9404 Seq 5 H11 Control Control 76 1433 1763 Seq 5 H12 Control Control 11 1 11 Seq 6 A01 Control Control 722646 1218660 1177780 Seq 6 A02 Control Control 5758 32787 30978 Seq 6 A03 Control Control 2623 16490 16090 Seq 6 A04 GTGACAGGTTGGCGCCTAGCTGAGCGGATTCTT VTGWRLAERIL 1906 12709 12189 Seq 6 A05 GTGACAGGTTGGCGCCTACTTGAGCGGATTCTT VTGWRLLERIL 6718 35836 32999 Seq 6 A06 GTGACAGGTTGGCGCCTAATGGAGCGGATTCTT VTGWRLMERIL 
31157 69046 68468 Seq 6 A07 GTGACAGGTTGGCGCCTACATGAGCGGATTCTT VTGWRLHERIL 29690 144598 134718 Seq 6 A08 GTGACAGGTTGGCGCCTATTTACTGAGATTCTT VTGWRLFTEIL 16200 73521 69653 Seq 6 A09 GTGACAGGTTGGCGCCTATTTAAGCCGATTCTT VTGWRLFKPIL 32476 115593 107758 Seq 6 A10 GTGACAGGTCCTCGCCTAAGGGAGACAATTCTT VTGPRLRETIL 2670 27099 24449 Seq 6 A11 GTGACAGGTCCTCGCCTAAGGGAGACAATTCTT VTGPRLRETIL 1166 7651 7697 Seq 6 A12 GTGACAGGTTGGCGCCTATTTGTGGCGATTCTT VTGWRLFVAIL 16657 58979 53928 Seq 6 B01 GTGACAGGTTGGCGCCTACATGAGCGGATTCTT VTGWRLHERIL 30166 116375 106153 Seq 6 B02 GTGACAGGTTGGCGCCTATTTCGGTCGATTCTT VTGWRLFRSIL 28428 159566 139354 Seq 6 B03 GTGACAGGTTGGCGCCTATTTACTGAGATTCTT VTGWRLFTEIL 10556 72648 66003 Seq 6 B04 GTGACAGGTTGGCGCCTACTTGAGCGGATTCTT VTGWRLLERIL 6692 38826 35257 Seq 6 B05 GTGACAGGTTGGCGCCTATGGGAGCCTATTCTT VTGWRLWEPIL 1386 10828 9762 Seq 6 B06 No sequence No sequence 4184 33395 30075 Seq 6 B07 GTGACAGGTTGGCGCCTATTTGTGGCGATTCTT VTGWRLFVAIL 4062 44511 39483 Seq 6 B08 GTGACAGGTTGGCGCCTAATGGAGGCGATTCTT VTGWRLMEAIL 2135 31193 26722 Seq 6 B09 GTGACAGGTTGGCGCCTATCTGAGCTGATTCTT VTGWRLSELIL 574 7061 6798 Seq 6 B10 GTGACAGGTTGGCGCCTATTTGAGGATATTCTT VTGWRLFEDIL 1574 11290 10460 Seq 6 B11 GTGACAGGTTGGCGCCTAGCTGAGCGGATTCTT VTGWRLAERIL 1137 12715 11157 Seq 6 B12 GTGACAGGTTGGCGCCTAGCTGAGCGGATTCTT VTGWRLAERIL 1349 10113 9642 Seq 6 C01 GTGACAGGTTGGCGCCTATCTGAGCGGATTCTT VTGWRLSERIL 1944 18491 17279 Seq 6 C02 GTGACAGGTTGGCGCCTACTTGAGTGTATTCTT VTGWRLLECIL 730 7483 6725 Seq 6 C03 GTGACAGGTTGGCGCCTATTTAAGCCGATTCTT VTGWRLFKPIL 28919 133928 122993 Seq 6 C04 GTGACAGGTTGGCGCCTATTTAAGCCGATTCTT VTGWRLFKPIL 35873 121641 110259 Seq 6 C05 GTGACAGGTTGGCGCCTATCTGAGCGGATTCTT VTGWRLSERIL 1056 12195 11947 Seq 6 C06 GTGACAGGTTGGCGCCTAGCTGAGCGGATTCTT VTGWRLAERIL 1388 6971 6716 Seq 6 C07 GTGACAGGTTGGCGCCTATGGGAGCCTATTCTT VTGWRLWEPIL 1333 13722 13783 Seq 6 C08 GTGACAGGTTGGCGCCTATTTGAGGATATTCTT VTGWRLFEDIL 2060 13235 12726 Seq 6 C09 GTGACAGGTTGGCGCCTAGCTGAGCGGATTCTT VTGWRLAERIL 1156 8983 8627 Seq 6 C10 
GTGACAGGTTGGCGCCTACTTGAGCGGATTCTT VTGWRLLERIL 3268 23942 21567 Seq 6 C11 GTGACAGGTTGGCGCCTAAATGAGAGGATTCTT VTGWRLNERIL 3017 27742 26089 Seq 6 C12 GTGACAGGTTGGCGCCTACATGAGCGGATTCTT VTGWRLHERIL 40190 144200 126725 Seq 6 D01 GTGACAGGTTGGCGCCTATTTGTGGCGATTCTT VTGWRLFVAIL 8566 42655 40209 Seq 6 D02 GTGACAGGTTGGCGCCTATTTGTGGCGATTCTT VTGWRLFVAIL 7021 54201 48497 Seq 6 D03 GTGACAGGTTCGCGCCTATTTGAGAGGATTCTT VTGSRLFERIL 1080 6574 6964 Seq 6 D04 GTGACAGGTATTCGCCTATTTGAGTTGATTCTT VTGIRLFELIL 737 6550 6247 Seq 6 D05 GTGACAGGTTCTCGCCTATTTGAGCAGATTCTT VTGSRLFEQIL 589 5163 5233 Seq 6 D06 GTGACAGGTTGGCGCCTATTTGAGCGGATTCTT VTGWRLFERIL 18895 137163 121867 Seq 6 D07 GTGACAGGTGCTCGCCTATTTGAGAAGATTCTT VTGARLFEKIL 5683 28688 26200 Seq 6 D08 GTGACAGGTAAGCGCCTATTTGAGCGTATTCTT VTGKRLFERIL 564 4514 4297 Seq 6 D09 GTGACAGGTATTCGCCTATTTGAGCTTATTCTT VTGIRLFELIL 490 4145 4236 Seq 6 D10 GTGACAGGTGGGCGCCTATTTGAGTCGATTCTT VTGGRLFESIL 519 4976 4974 Seq 6 D11 GTGAAAGGTTCTCGCCTATTTGAGAAGATTCTT VKGSRLFEKIL 958 5766 5518 Seq 6 D12 GTGACAGGTCTTCGCCTATTTGAGCTTATTCTT VTGLRLFELIL 641 5240 5101 Seq 6 E01 GTGACAGGTGGGCGCCTATTTGAGCATATTCTT VTGGRLFEHIL 2720 17763 17121 Seq 6 E02 GTGACAGGTTATCGCCTATTTGAGTTTATTCTT VTGYRLFEFIL 3412 25005 25096 Seq 6 E03 GTGACAGGTTGGCGCCTATTTGAGATGATTCTT VTGWRLFEMIL 11800 58763 55280 Seq 6 E04 GTGACAGGTTGGCGCCTATTTGAGGGGATTCTT VTGWRLFEGIL 5398 28376 26780 Seq 6 E05 GTGACAGGTTTTCGCCTATGTGAGAGGATTCTT VTGFRLCERIL 10773 45652 42683 Seq 6 E06 GTGACAGGTTGTCGCCTATTTGAGATTATTCTT VTGCRLFEIIL 1447 12918 12985 Seq 6 E07 GTGACAGGTTATCGCCTATTTGAGCGGATTCTT VTGYRLFERIL 20590 134098 133752 Seq 6 E08 GTGACAGGTTATCGCCTATTTGAGGAGATTCTT VTGYRLFEEIL 5450 33317 33604 Seq 6 E09 GTGACAGGTTGGCGCCTATTTGAGCCGATTCTT VTGWRLFEPIL 28273 131499 120231 Seq 6 E10 GTGACAGGTTTTCGCCTATTTGAGTGTATTCTT VTGFRLFECIL 14198 72148 68146 Seq 6 E11 GTGACAGGTTTTCGCCTATTTGAGTGTATTCTT VTGFRLFECIL 6424 69708 66004 Seq 6 E12 GTGACAGGTCTTCGCCTATTTGAGCTTATTCTT VTGLRLFELIL 793 4427 4175 Seq 6 F01 GTGACAGGTACGCGCCTATTTGAGCTTATTCTT 
VTGTRLFELIL 395 1983 1999 Seq 6 F02 GTGACAGGTTGGCGCCTATTTGAGGTGATTCTT VTGWRLFEVIL 12245 61512 60416 Seq 6 F03 GTGACAGGTTGGCGCCTATTTGAGCGGATTCTT VTGWRLFERIL 28749 111685 102307 Seq 6 F04 GTGACAGGTTATCGCCTATCTGAGAGGATTCTT VTGYRLSERIL 5794 42792 42968 Seq 6 F05 GTGACAGGTAATCGCCTATTTGAGCGGATTCTT VTGNRLFERIL 4919 29570 29509 Seq 6 F06 GTGACAGGTTGGCGCCTATCTGAGCGGATTCTT VTGWRLSERIL 1661 15064 15638 Seq 6 F07 GTGACAGGTTGTCGCCTATTTGAGATTATTCTT VTGCRLFEIIL 598 3857 3973 Seq 6 F08 GTGACAGGTTATCGCCTATTTGAGCGGATTCTT VTGYRLFERIL 11014 70969 64622 Seq 6 F09 GTGACAGGTCATCGCCTATTTGAGCAGATTCTT VTGHRLFEQIL 4057 23042 25467 Seq 6 F10 GTGACAGGTTTTCGCCTATTTGAGATGATTCTT VTGFRLFEMIL 20379 76320 72267 Seq 6 F11 GTGACAGGTCGTCGCCTATGTGAGAAGATTCTT VTGRRLCEKIL 8961 28837 26680 Seq 6 F12 No sequence No sequence 24 4 1 Seq 6 G01 GTGACAGGTAGTCGCCTATTTGAGCAGATTCTT VTGSRLFEQIL 1163 8741 8420 Seq 6 G02 GTGACAGGTATTCGCCTATTTGAGCTTATTCTT VTGIRLFELIL 901 7021 6881 Seq 6 G03 GTGACAGGTTTTCGCCTATTTGAGCATATTCTT VTGFRLFEHIL 35074 151407 137940 Seq 6 G04 GTGACAGGTCTTCGCCTATTTGAGCTTATTCTT VTGLRLFELIL 910 5969 5917 Seq 6 G05 GTGACAGGTTATCGCCTATTTGAGAGGATTCTT VTGYRLFERIL 26449 115952 105415 Seq 6 G06 GTGACAGGTTGGCGCCTATTTGAGCAGATTCTT VTGWRLFEQIL 14751 72272 71175 Seq 6 G07 GTGACAGGTTGGCGCCTATGTGAGAGGATTCTT VTGWRLCERIL 4539 33125 30507 Seq 6 G08 GTGACAGGTTGGCGCCTATTTGAGCAGATTCTT VTGWRLFEQIL 12793 62787 60937 Seq 6 G09 GTGACAGGTTATCGCCTATTTGAGGGGATTCTT VTGYRLFEGIL 3807 25877 24510 Seq 6 G10 GTGACAGGTTATCGCCTATTTGAGTTTATTCTT VTGYRLFEFIL 5191 34126 34738 Seq 6 G11 GTGACAGGTGCTCGCCTATTTGAGCGGATTCTT VTGARLFERIL 7862 46278 44955 Seq 6 G12 GTGACAGGTTGGCGCCTATTTGAGCGGATTCTT VTGWRLFERIL 36961 126362 119493 Seq 6 H01 GTGACAGGTTATCGCCTATTTGAGCATATTCTT VTGYRLFEHIL 34746 135705 128414 Seq 6 H02 GTGACAGGTTGGCGCCTATTTGAGATTATTCTT VTGWRLFEIIL 19727 63809 59596 Seq 6 H03 GTGACAGGTGCTCGCCTATTTGAGCGTATTCTT VTGARLFERIL 15701 57487 52717 Seq 6 H04 GTGACAGGTTATCGCCTATTTGAGTCGATTCTT VTGYRLFESIL 18432 86694 81460 Seq 6 H05 
GTGACAGGTTGGCGCCTATTTGAGCCGATTCTT VTGWRLFEPIL 31456 116143 113348 Seq 6 H06 GTGACAGGTTGGCGCCATCTTGAGAGAATTCTT VTGWRHLERIL 29 6 -2 Seq 6 H07 GTGACACATTGGCGCCTAGGTGAGAGAATTCTT VTHWRLGERIL 17 7 1 Seq 6 H08 GTGACAGGTTGGCGCTCGGTTGAGAAAATTCTT VTGWRSVEKIL 29 8 37 Seq 6 H09 Control Control 362846 554949 515008 Seq 6 H10 GTGACAGGTCCTCGCCTAAGGGAGACAATTCTT VTGPRLRETIL 27 23 12 Seq 6 H11 Control Control 1924 11296 11402 Seq 6 H12 Control Control 25 -3 23 Seq 7 A01 Control Control 1244166 2007059 2124724 Seq 7 A02 Control Control 5466 33377 40629 Seq 7 A03 Control Control 1398 19624 21369 Seq 7 A04 GTGACAGGTTATCGCCTAATGGAGAGAATTCTT VTGYRLMERIL 4800 30948 34753 Seq 7 A05 GTGACAGGTTTTCGCCTATGGGAGAAAATTCTT VTGFRLWEKIL 1034 37312 47269 Seq 7 A06 GTGACAGGTTATCGCCTATCGGAGAGAATTCTT VTGYRLSERIL 4308 30321 38761 Seq 7 A07 GTGACAGGTTCTCGCCTATTTGAGAGAATTCTT VTGSRLFERIL 400 12715 11099 Seq 7 A08 GTGACAGGTGCGCGCCTACATGAGAAAATTCTT VTGARLHEKIL 1883 10084 12115 Seq 7 A09 GTGACAGGTTGGCGCCTAAGTGAGAAAATTCTT VTGWRLSEKIL 2013 82822 90822 Seq 7 A10 GTGACAGGTCATCGCCTAATGGAGAGAATTCTT VTGHRLMERIL 3619 23603 28970 Seq 7 A11 GTGACAGGTTGGCGCCTAGCTGAGAGAATTCTT VTGWRLAERIL 1859 10703 12313 Seq 7 A12 GTGACAGGTAATCGCCTATTTGAGAGAATTCTT VTGNRLFERIL 3705 20656 21755 Seq 7 B01 GTGACAGGTGCGCGCCTAATGGAGAAAATTCTT VTGARLMEKIL 1463 20131 19973 Seq 7 B02 GTGACAGGTGTGCGCCTAATGGAGAGAATTCTT VTGVRLMERIL 384 5890 6540 Seq 7 B03 GTGACAGGTGCGCGCCTACTTGAGAAAATTCTT VTGARLLEKIL 638 5426 6547 Seq 7 B04 GTGACAGGTAGTCGCCTATTTGAGAAAATTCTT VTGSRLFEKIL 1250 30479 32306 Seq 7 B05 GTGACAGGTGCGCGCCTAAGTGAGAAAATTCTT VTGARLSEKIL 476 4702 6308 Seq 7 B06 GTGACAGGTTTTCGCCTACATGAGAAAATTCTT VTGFRLHEKIL 17194 117666 115789 Seq 7 B07 GTGACAGGTTTTCGCCTAAGTGAGAGAATTCTT VTGFRLSERIL 10011 81002 81041 Seq 7 B08 GTGACAGGTTATCGCCTAAGTGAGAGAATTCTT VTGYRLSERIL 4936 25689 29786 Seq 7 B09 GTGACAGGTTATCGCCTAATGGAGAGAATTCTT VTGYRLMERIL 2697 22364 21866 Seq 7 B10 GTGACAGGTTGTCGCCTACTTGAGAGAATTCTT VTGCRLLERIL 10783 47313 55048 Seq 7 B11 
GTGACAGGTGCGCGCCTATGGGAGAAAATTCTT VTGARLWEKIL 321 3575 4940 Seq 7 B12 GTGACAGGTGGGCGCCTATGTGAGACAATTCTT VTGGRLCETIL 2412 38595 35624 Seq 7 C01 GTGACAGGTTCTCGCCTATTTGAGAGAATTCTT VTGSRLFERIL 1438 11938 13006 Seq 7 C02 GTGACAGGTTCTCGCCTAATGGAGAAAATTCTT VTGSRLMEKIL 1945 11941 13644 Seq 7 C03 GTGACAGGTTATCGCCTATGTGAGAAAATTCTT VTGYRLCEKIL 1817 17862 17597 Seq 7 C04 GTGACAGGTTTTCGCCTACATGAGAAAATTCTT VTGFRLHEKIL 12892 65327 72613 Seq 7 C05 GTGACAGGTCATCGCCTACATGAGAAAATTCTT VTGHRLHEKIL 1270 21166 25570 Seq 7 C06 GTGACAGGTTTTCGCCTATGGGAGATAATTCTT VTGFRLWEIIL 135 1211 1676 Seq 7 C07 GTGACAGGTTATCGCCTAAGTGAGAGAATTCTT VTGYRLSERIL 2465 32877 34751 Seq 7 C08 GTGACAGGTTCTCGCCTACATGAGAAAATTCTT VTGSRLHEKIL 732 9987 12002 Seq 7 C09 GTGACAGGTGCGCGCCTACATGAGAAAATTCTT VTGARLHEKIL 2164 20445 21849 Seq 7 C10 GTGACAGGTATGCGCCTATTTGAGAGAATTCTT VTGMRLFERIL 453 4885 5756 Seq 7 C11 GTGACAGGTTTTCGCCTACATGAGACAATTCTT VTGFRLHETIL 333 6100 6445 Seq 7 C12 GTGACAGGTTATCGCCTATCGGAGAGAATTCTT VTGYRLSERIL 5935 61869 62810 Seq 7 D01 GTGACAGGTCTTCGCCTACATGAGAAAATTCTT VTGLRLHEKIL 1936 17222 18823 Seq 7 D02 GTGACAGGTTATCGCCTAGCTGAGAAAATTCTT VTGYRLAEKIL 902 11006 12676 Seq 7 D03 GTGACAGGTTGGCGCCTAAGTGAGAAAATTCTT VTGWRLSEKIL 4688 71265 72274 Seq 7 D04 GTGACAGGTTGGCGCCTAAGTGAGATAATTCTT VTGWRLSEIIL 6797 47811 50255 Seq 7 D05 GTGACAGGTGGTCGCCTACTTGAGAAAATTCTT VTGGRLLEKIL 399 7024 9019 Seq 7 D06 GTGACAGGTCTTCGCCTACATGAGAAAATTCTT VTGLRLHEKIL 245 9169 12542 Seq 7 D07 GTGACAGGTTATCGCCTATGTGAGAAAATTCTT VTGYRLCEKIL 2296 26775 26661 Seq 7 D08 GTGACAGGTGCGCGCCTACATGAGAAAATTCTT VTGARLHEKIL 1033 15400 18206 Seq 7 D09 GTGACAGGTTGGCGCCTATTTGAGAGAATTCTT VTGWRLFERIL 7222 64128 63955 Seq 7 D10 GTGACAGGTGCTCGCCTATGGGAGAAAATTCTT VTGARLWEKIL 212 4084 5379 Seq 7 D11 GTGACAGGTACTCGCCTAATGGAGAAAATTCTT VTGTRLMEKIL 284 2632 3293 Seq 7 D12 GTGACAGGTCATCGCCTATGGGAGAGAATTCTT VTGHRLWERIL 1180 18011 20001 Seq 7 E01 GTGACAGGTGGGCGCCTATTGGAGAGAATTCTT VTGGRLLERIL 398 6095 7098 Seq 7 E02 GTGACAGGTTGGCGCCTAGGTGAGAAAATTCTT VTGWRLGEKIL 1276 12217 
15199 Seq 7 E03 GTGACAGGTCATCGCCTATGGGAGAGAATTCTT VTGHRLWERIL 1118 16393 21827 Seq 7 E04 GTGACAGGTCATCGCCTATGGGAGAGAATTCTT VTGHRLWERIL 2020 13269 17335 Seq 7 E05 GTGACAGGTCATCGCCTATCGGAGAGAATTCTT VTGHRLSERIL 432 6790 9333 Seq 7 E06 GTGACAGGTTATCGCCTAAGTGAGAGAATTCTT VTGYRLSERIL 4494 23363 26865 Seq 7 E07 GTGACAGGTTTTCGCCTATCTGGGAGAATTCTT VTGFRLSGRIL 835 13698 18181 Seq 7 E08 GTGACAGGTATGCGCCTATTTCAGAGAATTCTT VTGMRLFQRIL 1712 17347 23287 Seq 7 E09 GTGACAGGTAATCGCCTATTTAATAGAATTCTT VTGNRLFNRIL 1176 11868 13957 Seq 7 E10 GTGACAGGTTCTCGCCTATGTACGAGAATTCTT VTGSRLCTRIL 1022 6380 8526 Seq 7 E11 GTGACAGGTTGGCGCCTATGTAAGAGAATTCTT VTGWRLCKRIL 3519 65359 74252 Seq 7 E12 GTGACAGGTTATCGCCTATCTAATAGAATTCTT VTGYRLSNRIL 3065 18712 22148 Seq 7 F01 GTGACAGGTCATCGCCTATTTCGGAGAATTCTT VTGHRLFRRIL 35523 143585 155000 Seq 7 F02 GTGACAGGTGTGCGCCTATTTCTGAGAATTCTT VTGVRLFLRIL 150 1345 1771 Seq 7 F03 GTGACAGGTTCGCGCCTATTTGGGAGAATTCTT VTGSRLFGRIL 1435 11540 14843 Seq 7 F04 GTGACAGGTTGTCGCCTATCTATGAGAATTCTT VTGCRLSMRIL 634 6213 8360 Seq 7 F05 GTGACAGGTTTTCGCCTATCTCATAGAATTCTT VTGFRLSHRIL 12421 75543 88757 Seq 7 F06 GTGACAGGTGGTCGCCTATTTTGGAGAATTCTT VTGGRLFWRIL 1626 16406 20869 Seq 7 F07 GTGACAGGTTTTCGCCTATCTTTGAGAATTCTT VTGFRLSLRIL 417 5085 7415 Seq 7 F08 GTGACAGGTGGTCGCCTATCTAGGAGAATTCTT VTGGRLSRRIL 357 3038 5267 Seq 7 F09 GTGACAGGTTTTCGCCTATTTGGGAGAATTCTT VTGFRLFGRIL 43302 85125 92308 Seq 7 F10 GTGACAGGTGGTCGCCTATTTATGAGAATTCTT VTGGRLFMRIL 2266 41973 53937 Seq 7 F11 GTGACAGGTTTTCGCCTATCTAAGAGAATTCTT VTGFRLSKRIL 18317 121934 146250 Seq 7 F12 GTGACAGGTCATCGCCTATTTAAGAGAATTCTT VTGHRLFKRIL 24479 114849 114281 Seq 7 G01 GTGACAGGTGGGCGCCTATTTTCGAGAATTCTT VTGGRLFSRIL 8028 62580 69204 Seq 7 G02 GTGACAGGTTGTCGCCTATTTAAGAGAATTCTT VTGCRLFKRIL 13870 49665 58277 Seq 7 G03 GTGACAGGTCTTCGCCTATTTAAGAGAATTCTT VTGLRLFKRIL 3940 19588 23271 Seq 7 G04 GTGACAGGTTATCGCCTATGTCAGAGAATTCTT VTGYRLCQRIL 23964 91116 112457 Seq 7 G05 GTGACAGGTTGTCGCCTATTTATGAGAATTCTT VTGCRLFMRIL 2252 38094 44403 Seq 7 G06 
GTGACAGGTTGGCGCCTATGTCGGAGAATTCTT VTGWRLCRRIL 13899 73582 94145 Seq 7 G07 GTGACAGGTCATCGCCTATTTTGTAGAATTCTT VTGHRLFCRIL 11603 46780 53001 Seq 7 G08 GTGACAGGTTTTCGCCTATCTATGAGAATTCTT VTGFRLSMRIL 6382 34913 48678 Seq 7 G09 GTGACAGGTTCTCGCCTATTTAGGAGAATTCTT VTGSRLFRRIL 7444 34125 42116 Seq 7 G10 GTGACAGGTTGGCGCCTATCTAAGAGAATTCTT VTGWRLSKRIL 12960 107372 121290 Seq 7 G11 GTGACAGGTTGGCGCCTATTTGGGAGAATTCTT VTGWRLFGRIL 50634 180986 191443 Seq 7 G12 GTGACAGGTTGGCGCCTATTTGAGAGAATTCTT VTGWRLFERIL 35584 152531 156395 Seq 7 H01 GTGACAGGTATTCGCCTATTTACTAGAATTCTT VTGIRLFTRIL 3655 25307 29627 Seq 7 H02 GTGACAGGTTGGCGCCTATTTGGGAGAATTCTT VTGWRLFGRIL 52781 173998 174662 Seq 7 H03 GTGACAGGTAATCGCCTATGTAGGAGAATTCTT VTGNRLCRRIL 634 9506 11031 Seq 7 H04 GTGACAGGTTCTCGCCTATTTGAGAGAATTCTT VTGSRLFERIL 2828 30419 35387 Seq 7 H05 GTGACAGGTGTGCGCCTATTTATGAGAATTCTT VTGVRLFMRIL 670 4926 6237 Seq 7 H06 GTGACAGGTGCTCGCCTATCTAAGAGAATTCTT VTGARLSKRIL 1182 9150 11364 Seq 7 H07 GTGACAGGTTATCGCCTATTTTGTAGAATTCTT VTGYRLFCRIL 21064 78119 83042 Seq 7 H08 GTGACAGGTTGGCGCCTATGTCTGAGAATTCTT VTGWRLCLRIL 1109 7099 9398 Seq 7 H09 GTGACAGGTTTTCGCCTATCTAAGAGAATTCTT VTGFRLSKRIL 32481 158340 177418 Seq 7 H10 Control Control 5718 47599 57195 Seq 7 H11 Control Control 2037 25296 29260 Seq 7 H12 Control Control 10 -13 24 Seq 8 A01 Control Control 163414 243689 255420 Seq 8 A02 Control Control 1054 5090 5709 Seq 8 A03 Control Control 103 2475 2672 Seq 8 A04 GTGACATCGTGGCGCAATTTTGAGAGAATTCTT VTSWRNFERIL 8 0 9 Seq 8 A05 GTGACATGGTGGCGCCCGTCTGAGAGAATTCTT VTWWRPSERIL 11 11 -3 Seq 8 A06 GTGACAGAGTGGCGCCCTTGTGAGAGAATTCTT VTEWRPCERIL 20 6 8 Seq 8 A07 No sequence No sequence 16 6 7 Seq 8 A08 GTGACAAGTTGGCGCCCGTTTGAGAGAATTCTT VTSWRPFERIL 6 2 11 Seq 8 A09 No sequence No sequence 16 7 -6 Seq 8 A10 GTGACAGTTTGGCGCCCTTGTGAGAGAATTCTT VTVWRPCERIL 18 -5 2 Seq 8 A11 GTGACAGGTTGGCGCCAGTTTGAGAGAATTCTT VTGWRQFERIL 1842 8325 8365 Seq 8 A12 GTGACATATTGGCGCAAGTTTGAGAGAATTCTT VTYWRKFERIL 3 0 6 Seq 8 B01 No sequence No sequence 13 5 0 Seq 8 
B02 GTGACACCTTGGCGCTATTCTGAGAGAATTCTT VTPWRYSERIL 18 0 10 Seq 8 B03 No sequence No sequence 16 0 2 Seq 8 B04 GTGACACTGTGGCGCACGTATGAGAGAATTCTT VTLWRTYERIL 23 10 -7 Seq 8 B05 GTGACAGCTTGGCGCCGGTATGAGAGAATTCTT VTAWRRYERIL 4 15 9 Seq 8 B06 GTGACACGGTGGCGCGCTTTTGAGAGAATTCTT VTRWRAFERIL 24 8 9 Seq 8 B07 GTGACATTGTGGCGCATGTCTGAGAGAATTCTT VTLWRMSERIL 7 0 0 Seq 8 B08 GTGACAAGTTGGCGCCTTTTTGAGAGAATTCTT VTSWRLFERIL 14 12 9 Seq 8 B09 GTGACACTGTGGCGCTATTCTGAGAGAATTCTT VTLWRYSERIL 22 0 0 Seq 8 B10 GTGACAGAGTGGCGCAGGTTTGAGAGAATTCTT VTEWRRFERIL 27 -5 9 Seq 8 B11 GTGACAGATTGGCGCGTTTTTGAGAGAATTCTT VTDWRVFERIL 18 8 13 Seq 8 B12 GTGACAATGTGGCGCCCGTTTGAGAGAATTCTT VTMWRPFERIL 28 0 15 Seq 8 C01 GTGACAGGTTGGAGGCTATGTGAGAGAATTCTT VTGWRLCERIL 1287 10760 9735 Seq 8 C02 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 1315 7392 6444 Seq 8 C03 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 507 3150 3296 Seq 8 C04 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 135 1475 1589 Seq 8 C05 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 867 4283 4471 Seq 8 C06 GTGACAGGTTGGCGTCTATGTGAGAGAATTCTT VTGWRLCERIL 462 3623 3973 Seq 8 C07 GTGACAGGTTGGAGGCTATGTGAGAGAATTCTT VTGWRLCERIL 582 4161 4308 Seq 8 C08 GTGACAGGTTGGAGGCTATGTGAGAGAATTCTT VTGWRLCERIL 264 4908 4824 Seq 8 C09 GTGACAGGTTGGCGTCTATGTGAGAGAATTCTT VTGWRLCERIL 205 3778 3831 Seq 8 C10 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 161 3686 3829 Seq 8 C11 GTGACAGGTTGGAGGCTATGTGAGAGAATTCTT VTGWRLCERIL 388 3739 3606 Seq 8 C12 GTGACAGGTTGGAGGCTATGTGAGAGAATTCTT VTGWRLCERIL 767 5121 5072 Seq 8 D01 GTGACAGGTTGGCGCCTATGGCGGAAAATTCTT VTGWRLWRKIL 2773 12713 15016 Seq 8 D02 GTGACAGGTTGGCGCCTACATAAGATAATTCTT VTGWRLHKIIL 469 4408 4969 Seq 8 D03 GTGACAGGTTGGCGCCTAATGATTAGAATTCTT VTGWRLMIRIL 642 5204 5401 Seq 8 D04 GTGACAGGTTGGCGCCTACATAAGAAAATTCTT VTGWRLHKKIL 5477 24035 25477 Seq 8 D05 GTGACAGGTTGGCGCCTATTGAAGAGAATTCTT VTGWRLLKRIL 2755 9898 10454 Seq 8 D06 GTGACAGGTTGGCGCCTACTTAGGAGAATTCTT VTGWRLLRRIL 1142 8753 8569 Seq 8 D07 GTGACAGGTTGGCGCCTAATGAATAAAATTCTT VTGWRLMNKIL 
346 7970 7443 Seq 8 D08 GTGACAGGTTGGCGCCTACCGGGGAAAATTCTT VTGWRLPGKIL 14 12 17 Seq 8 D09 GTGACAGGTTGGCGCCTAAGTGAGAAAATTCTT VTGWRLSEKIL 633 14614 15194 Seq 8 D10 GTGACAGGTTGGCGCCTACATAGGAGAATTCTT VTGWRLHRRIL 3338 24218 23467 Seq 8 D11 GTGACAGGTTGGCGCCTAAATAGGAGAATTCTT VTGWRLNRRIL 2873 10734 11454 Seq 8 D12 GTGACAGGTTGGCGCCTATTTAAGAGAATTCTT VTGWRLFKRIL 12425 46007 49184 Seq 8 E01 No sequence No sequence 376 1469 1677 Seq 8 E02 GTGACAGGTTGGCGCCTATTTGAGAAAATTCTT VTGWRLFEKIL 4305 13698 15047 Seq 8 E03 No sequence No sequence 196 1072 1306 Seq 8 E04 GTGACAGGTTGGCGCCTACATACGAAAATTCTT VTGWRLHTKIL 2740 15983 15750 Seq 8 E05 GTGACAGGTTGGCGCCTAAGTTATAAAATTCTT VTGWRLSYKIL 797 7718 8388 Seq 8 E06 GTGACAGGTTGGCGCCTATTTGGGAGAATTCTT VTGWRLFGRIL 1602 21669 23042 Seq 8 E07 GTGACAGGTTGGCGCCTACCGATGAAAATTCTT VTGWRLPMKIL 329 1002 1104 Seq 8 E08 GTGACAGGTTGGCGCCTAATGTCTAAAATTCTT VTGWRLMSKIL 1211 3832 4157 Seq 8 E09 GTGACAGGTTGGCGCCTAAGTCCGATAATTCTT VTGWRLSPIIL 73 770 956 Seq 8 E10 GTGACAGGTTGGCGCCTAATGAAGAAAATTCTT VTGWRLMKKIL 5173 18019 18481 Seq 8 E11 GTGACAGGTTGGCGCCTATTGAAGAAAATTCTT VTGWRLLKKIL 4106 15590 15173 Seq 8 E12 GTGACAGGTTGGCGCCTATTTATGAAAATTCTT VTGWRLFMKIL 8674 25468 27046 Seq 8 F01 GTGACAGGTTGGCGCCTATTTGGGAGAATTCTT VTGWRLFGRIL 7252 20873 21927 Seq 8 F02 GTGACAGGTTGGCGCCTAAGTCGGAGAATTCTT VTGWRLSRRIL 2764 23548 23436 Seq 8 F03 GTGACAGGTTGGCGCCTAAGTAAGAAAATTCTT VTGWRLSKKIL 4438 21909 21719 Seq 8 F04 GTGACAGGTTGGCGCCTACATCAGAAAATTCTT VTGWRLHQKIL 2749 11669 12014 Seq 8 F05 GTGACAGGTTGGCGCCTATCTATGAAAATTCTT VTGWRLSMKIL 488 3999 4551 Seq 8 F06 GTGACAGGTTGGCGCCTAAGTCGGAGAATTCTT VTGWRLSRRIL 3020 11796 12501 Seq 8 F07 GTGACAGGTTGGCGCCTATTTGCGAAAATTCTT VTGWRLFAKIL 1713 11834 12795 Seq 8 F08 GTGACAGGTTGGCGCCTAATTACGAAAATTCTT VTGWRLITKIL 41 230 287 Seq 8 F09 GTGACAGGTTGGCGCCTAACTCGGAGAATTCTT VTGWRLTRRIL 1483 7406 8342 Seq 8 F10 GTGACAGGTTGGCGCCTATGTAAGAGAATTCTT VTGWRLCKRIL 753 10519 10383 Seq 8 F11 GTGACAGGTTGGCGCCTACTTAAGATAATTCTT VTGWRLLKIIL 172 975 1206 Seq 8 F12 
GTGACAGGTTGGCGCCTAATGACGATAATTCTT VTGWRLMTIIL 1305 4076 4482 Seq 8 G01 GTGACAGGTTGGCGCCTATTTCATACAATTCTT VTGWRLFHTIL 3300 8242 8589 Seq 8 G02 GTGACAGGTTGGCGCCTACATGGGAGAATTCTT VTGWRLHGRIL 5317 16390 17570 Seq 8 G03 GTGACAGGTTGGCGCCTAACTCGTACAATTCTT VTGWRLTRTIL 2 13 26 Seq 8 G04 GTGACAGGTTGGCGCCTATCGCGGAGAATTCTT VTGWRLSRRIL 2784 12491 13894 Seq 8 G05 GTGACAGGTTGGCGCCTATTTGAGAGAATTCTT VTGWRLFERIL 10257 33445 34121 Seq 8 G06 GTGACAGGTTGGCGCCTAAAGAAGATAATTCTT VTGWRLKKIIL 93 587 686 Seq 8 G07 GTGACAGGTTGGCGCCTACGGACTAGAATTCTT VTGWRLRTRIL 23 13 26 Seq 8 G08 GTGACAGGTTGGCGCCTACATTTGAGAATTCTT VTGWRLHLRIL 452 2491 3335 Seq 8 G09 GTGACAGGTTGGCGCCTAAAGTCTAGAATTCTT VTGWRLKSRIL 30 80 103 Seq 8 G10 GTGACAGGTTGGCGCCTATATCCGAGAATTCTT VTGWRLYPRIL 1240 4308 4700 Seq 8 G11 No sequence No sequence 13 0 18 Seq 8 G12 GTGACAGGTTGGCGCCTACATCGGATAATTCTT VTGWRLHRIIL 2056 8554 10999 Seq 8 H01 GTGACAGGTTGGCGCCTACATACGAAAATTCTT VTGWRLHTKIL 8610 24968 27897 Seq 8 H02 GTGACAGGTTGGCGCCTATGTAAGAGAATTCTT VTGWRLCKRIL 2817 12435 13964 Seq 8 H03 GTGACAGGTTGGCGCCTATTTAATAGAATTCTT VTGWRLFNRIL 3173 11925 12269 Seq 8 H04 GTGACAGGTTGGCGCCTAAGTAGGAAAATTCTT VTGWRLSRKIL 10137 21730 22313 Seq 8 H05 GTGACAGGTTGGCGCCTACGGCTGAAAATTCTT VTGWRLRLKIL 28 34 56 Seq 8 H06 GTGACAGGTTGGCGCCTACCGGTGATAATTCTT VTGWRLPVIIL 149 2238 2500 Seq 8 H07 No sequence No sequence 1225 5955 7190 Seq 8 H08 GTGACAGGTTGGCGCCTACATACTAGAATTCTT VTGWRLHTRIL 8151 24430 25487 Seq 8 H09 Control Control 145272 185578 200896 Seq 8 H10 Control Control 1817 7574 9899 Seq 8 H11 Control Control 418 2912 3814 Seq 8 H12 Control Control 16 12 11 Seq 9 A01 Control Control 71549 164480 181618 Seq 9 A02 Control Control 1043 9522 11135 Seq 9 A03 Control Control 239 2975 3814 Seq 9 A04 GTGACAGGTTATCGCCTGTTTGAGAGAATTCTT VTGYRLFERIL 774 8427 8341 Seq 9 A05 GTGACAGGTCATCGCCGTTTTGAGAGAATTCTT VTGHRRFERIL 678 4332 4952 Seq 9 A06 GTGACAGGTATGCGCGTGTTTGAGAGAATTCTT VTGMRVFERIL 447 4778 4751 Seq 9 A07 GTGACAGGTGGGCGCTGGTTTGAGAGAATTCTT VTGGRWFERIL 512 2887 3545 Seq 9 A08 
GTGACAGGTTGGCGCCTTTTTGAGAGAATTCTT VTGWRLFERIL 3573 18231 18705 Seq 9 A09 GTGACAGGTCATCGCCTTTTTGAGAGAATTCTT VTGHRLFERIL 2239 9312 9820 Seq 9 A10 GTGACAGGTGCTCGCCGGTTTGAGAGAATTCTT VTGARRFERIL 691 13648 13306 Seq 9 A11 No sequence No sequence 15 87 111 Seq 9 A12 GTGACAGGTGTGCGCCAGTCTGAGAGAATTCTT VTGVRQSERIL 2495 13995 12411 Seq 9 B01 No sequence No sequence 1395 14671 14244 Seq 9 B02 No sequence No sequence 455 6673 6497 Seq 9 B03 GTGACAGGTTGGCGCAGGTTTGAGAGAATTCTT VTGWRRFERIL 1483 7740 7852 Seq 9 B04 GTGACAGGTTATCGCGAGTTTGAGAGAATTCTT VTGYREFERIL 1412 7865 8730 Seq 9 B05 GTGACAGGTATGCGCAGGTGTGAGAGAATTCTT VTGMRRCERIL 18 -5 13 Seq 9 B06 GTGACAGGTCCGCGCACGTCTGAGAGAATTCTT VTGPRTSERIL 28 328 267 Seq 9 B07 GTGACAGGTTTTCGCTATTTTGAGAGAATTCTT VTGFRYFERIL 141 4495 4609 Seq 9 B08 GTGACAGGTATTCGCCCTTATGAGAGAATTCTT VTGIRPYERIL 14 8 17 Seq 9 B09 GTGACAGGTTGGCGCCTGTTTGAGAGAATTCTT VTGWRLFERIL 2469 13972 14460 Seq 9 B10 No sequence No sequence 10 1 6 Seq 9 B11 GTGACAGGTGTTCGCCTGTTTGAGAGAATTCTT VTGVRLFERIL 397 8425 7871 Seq 9 B12 No sequence No sequence 109 1418 1681 Seq 9 C01 No sequence No sequence 12 6 14 Seq 9 C02 GTGACAGGTACGCGCACGTATGAGAGAATTCTT VTGTRTYERIL 5 50 100 Seq 9 C03 GTGACAGGTGAGCGCCATTCTGAGAGAATTCTT VTGERHSERIL 18 188 218 Seq 9 C04 GTGACAGGTCATCGCTGGTTTGAGAGAATTCTT VTGHRWFERIL 166 1626 1840 Seq 9 C05 GTGACAGGTTGGCGCAGGTTTGAGAGAATTCTT VTGWRRFERIL 3572 19635 16940 Seq 9 C06 GTGACAGGTGCGCGCGGGTATGAGAGAATTCTT VTGARGYERIL 49 241 336 Seq 9 C07 GTGACAGGTTATCGCCAGTTTGAGAGAATTCTT VTGYRQFERIL 1374 15221 13555 Seq 9 C08 GTGACAGGTTTTCGCCTTTTTGAGAGAATTCTT VTGFRLFERIL 2093 8875 8937 Seq 9 C09 GTGACAGGTGTTCGCTGGTTTGAGAGAATTCTT VTGVRWFERIL 38 237 288 Seq 9 C10 GTGACAGGTTAGCGCGCTTGTGAGAGAATTCTT VTG*RACERIL 24 21 31 Seq 9 C11 No sequence No sequence 44 288 394 Seq 9 C12 GTGACAGGTTGGCGCATGTTTGAGAGAATTCTT VTGWRMFERIL 2510 9970 10449 Seq 9 D01 GTGACAGGTTTTCGCCATTTTGAGAGAATTCTT VTGFRHFERIL 1861 8699 10066 Seq 9 D02 GTGACAGGTCATCGCTGTTTTGAGAGAATTCTT VTGHRCFERIL 409 2133 2437 Seq 9 D03 
GTGACAGGTTGGCGCTCTTTTGAGAGAATTCTT VTGWRSFERIL 383 1464 1953 Seq 9 D04 GTGACAGGTGCTCGCGTGTTTGAGAGAATTCTT VTGARVFERIL 575 3192 3972 Seq 9 D05 No sequence No sequence 1343 4266 4630 Seq 9 D06 GTGACAGGTTGGCGCCTGTTTGAGAGAATTCTT VTGWRLFERIL 2286 10625 11887 Seq 9 D07 GTGACAGGTCATCGCACGTATGAGAGAATTCTT VTGHRTYERIL 17 55 68 Seq 9 D08 GTGACAGGTGGTCGCGTGTTTGAGAGAATTCTT VTGGRVFERIL 577 3028 3606 Seq 9 D09 GTGACAGGTTCTCGCTGGTCTGAGAGAATTCTT VTGSRWSERIL 24 42 99 Seq 9 D10 GTGACAGGTTCTCGCCAGTATGAGAGAATTCTT VTGSRQYERIL 13 9 28 Seq 9 D11 GTGACAGGTGTGCGCACTTCTGAGAGAATTCTT VTGVRTSERIL 19 17 18 Seq 9 D12 GTGACAGGTCGTCGCACTTTTGAGAGAATTCTT VTGRRTFERIL 46 292 290 Seq 9 E01 No sequence No sequence 18 12 6 Seq 9 E02 No sequence No sequence 12 -1 6 Seq 9 E03 No sequence No sequence -6 6 2 Seq 9 E04 No sequence No sequence 20 10 6 Seq 9 E05 No sequence No sequence 14 17 1 Seq 9 E06 No sequence No sequence 15 14 29 Seq 9 E07 No sequence No sequence 18 5 -11 Seq 9 E08 GTGACAGGTTGGTGTCTATGTGAGAGAATTCTT VTGWCLCERIL 10 8 9 Seq 9 E09 No sequence No sequence 19 14 27 Seq 9 E10 No sequence No sequence 25 -3 17 Seq 9 E11 No sequence No sequence 7 2 -5 Seq 9 E12 No sequence No sequence 11 6 5 Seq 9 F01 No sequence No sequence 19 5 27 Seq 9 F02 No sequence No sequence 11 0 15 Seq 9 F03 No sequence No sequence 14 8 1 Seq 9 F04 GTGACAGGTCGGCGCGCGTCTGAGAGAATTCTT VTGRRASERIL 18 0 3 Seq 9 F05 No sequence No sequence -3 3 6 Seq 9 F06 GTGACAGGTTCGCGCTGGTTTGAGAGAATTCTT VTGSRWFERIL 207 1282 1641 Seq 9 F07 No sequence No sequence 18 8 17 Seq 9 F08 GTGACAGGTTTGCGCTGGTGTGAGAGAATTCTT VTGLRWCERIL -1 54 58 Seq 9 F09 GTGACAGGTACGCGCCATTTTGAGAGAATTCTT VTGTRHFERIL 85 253 295 Seq 9 F10 GTGACAGGTTGTCGCGGTTTTGAGAGAATTCTT VTGCRGFERIL 44 229 305 Seq 9 F11 No sequence No sequence 5 2 15 Seq 9 F12 GTGACAGGTTCTCGCCTGTTTGAGAGAATTCTT VTGSRLFERIL 365 3600 4353 Seq 9 G01 GTGACAGGTTGGCGCCTATATCCTAAAATTCTT VTGWRLYPKIL 18 3 13 Seq 9 G02 GTGACAGGTTGGCGCCTACCGCCGAGAATTCTT VTGWRLPPRIL -2 6 15 Seq 9 G03 GTGACAGGTTGGCGCCTACAGATGAGAATTCTT 
VTGWRLQMRIL 63 403 509 Seq 9 G04 GTGACAGGTTGGCGCCTATGGAAGATAATTCTT VTGWRLWKIIL 736 3977 4935 Seq 9 G05 GTGACAGGTTGGCGCCTACCGGATAAAATTCTT VTGWRLPDKIL -3 2 5 Seq 9 G06 GTGACAGGTTGGCGCCTAGTTGTTAGAATTCTT VTGWRLVVRIL 187 1215 1577 Seq 9 G07 No sequence No sequence 148 469 456 Seq 9 G08 No sequence No sequence 28 108 131 Seq 9 G09 No sequence No sequence 182 1943 2582 Seq 9 G10 GTGACAGGTTGGCGCCTACAGGGGAAAATTCTT VTGWRLQGKIL 97 772 1019 Seq 9 G11 GTGACAACTTGGCGCCAGTCTGAGAGAATTCTT VTTWRQSERIL 14 6 5 Seq 9 G12 GTGACAATGTGGCGCCGGTCTGAGAGAATTCTT VTMWRRSERIL 12 27 16 Seq 9 H01 GTGACATTGTGGCGCATTTTTGAGAGAATTCTT VTLWRIFERIL 8 -6 11 Seq 9 H02 GTGACATTGTGGCGCCAGTCTGAGAGAATTCTT VTLWRQSERIL 20 2 3 Seq 9 H03 GTGACACTGTGGCGCCTGTGTGAGAGAATTCTT VTLWRLCERIL 28 -1 22 Seq 9 H04 GTGACACCGTGGCGCGCGTTTGAGAGAATTCTT VTPWRAFERIL 13 5 6 Seq 9 H05 GTGACAGCGTGGCGCTATTCTGAGAGAATTCTT VTAWRYSERIL 13 13 5 Seq 9 H06 GTGACAATTTGGCGCTATTCTGAGAGAATTCTT VTIWRYSERIL 18 -2 11 Seq 9 H07 GTGACAGCTTGGCGCGAGTCTGAGAGAATTCTT VTAWRESERIL 16 -1 9 Seq 9 H08 GTGACAGGTTGGCCTCTATGTGAGAGAATTCTT VTGWPLCERIL 9 1 4 Seq 9 H09 Control Control 160484 239831 264405 Seq 9 H10 Control Control 1281 7260 9562 Seq 9 H11 Control Control 319 2365 3515 Seq 9 H12 Control Control 4 23 -2 Seq 10 A01 Control Control 45247 90994 92038 Seq 10 A02 Control Control 719 9300 11486 Seq 10 A03 Control Control 142 2120 2829 Seq 10 A04 GTGACAGGGTGGCGCTGTTCTGAGAGAATTCTT VTGWRCSERIL 65 812 1182 Seq 10 A05 GTGACAGGGTGGCGCACTTTTGAGAGAATTCTT VTGWRTFERIL 206 3516 4155 Seq 10 A06 GTGACAGGGTGGCGCGTTTCTGAGAGAATTCTT VTGWRVSERIL 147 1909 2636 Seq 10 A07 GTGACAGGTTGGCGCATTTTTGAGAGAATTCTT VTGWRIFERIL 539 6751 7688 Seq 10 A08 GTGACAGGTTGGCGCACTTTTGAGAGAATTCTT VTGWRTFERIL 449 2637 3272 Seq 10 A09 GTGACAGGGTGGCGCTTTTTTGAGAGAATTCTT VTGWRFFERIL 348 3338 4061 Seq 10 A10 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 344 8249 9247 Seq 10 A11 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 465 5322 6048 Seq 10 A12 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 406 4914 6108 
Seq 10 B01 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 734 5686 6559
Seq 10 B02 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 547 4920 6013
Seq 10 B03 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 531 3579 4376
Seq 10 B04 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 478 3962 4697
Seq 10 B05 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 489 3860 4554
Seq 10 B06 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 608 4394 5180
Seq 10 B07 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 404 3212 3918
Seq 10 B08 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 476 3925 4349
Seq 10 B09 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 455 4356 4849
Seq 10 B10 GTGACAGGTTGGAGGCTATGTGAGAGAATTCTT VTGWRLCERIL 294 8018 8936
Seq 10 B11 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 349 3758 4257
Seq 10 B12 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 237 7309 8154
Seq 10 C01 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 450 4400 5355
Seq 10 C02 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 320 3097 3775
Seq 10 C03 GTGACAGGTTGGAGGCTATGTGAGAGAATTCTT VTGWRLCERIL 414 5590 6260
Seq 10 C04 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 716 4204 4913
Seq 10 C05 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 463 3370 3826
Seq 10 C06 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 383 4411 4657
Seq 10 C07 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 321 4502 5156
Seq 10 C08 GTGACAGGTTGGAGGCTATGTGAGAGAATTCTT VTGWRLCERIL 833 4817 5847
Seq 10 C09 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 352 3100 3791
Seq 10 C10 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 209 3759 4070
Seq 10 C11 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 331 4466 5161
Seq 10 C12 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 505 3756 4885
Seq 10 D01 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 430 4290 5227
Seq 10 D02 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 691 3732 4311
Seq 10 D03 GTGACAGGTTGGCTGCTATGTGAGAGAATTCTT VTGWLLCERIL 254 2369 3091
Seq 10 D04 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 527 3339 3848
Seq 10 D05 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 506 2963 3775
Seq 10 D06 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 468 3863 4630
Seq 10 D07 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 434 2812 3607
Seq 10 D08 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 369 2668 3482
Seq 10 D09 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 623 3438 4011
Seq 10 D10 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 227 4397 5083
Seq 10 D11 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 472 3390 4261
Seq 10 D12 GTGACAGGTTGGAGGCTATGTGAGAGAATTCTT VTGWRLCERIL 944 6825 8534
Seq 10 E01 GTGACAGGTTGGAGGCTATGTGAGAGAATTCTT VTGWRLCERIL 855 7061 8732
Seq 10 E02 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 488 3562 4412
Seq 10 E03 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 425 3872 4364
Seq 10 E04 GTGACAGGTTGGTTGCTATGTGAGAGAATTCTT VTGWLLCERIL 17 -1 4
Seq 10 E05 GTGACAGGTTGGGGGCTATGTGAGAGAATTCTT VTGWGLCERIL 19 5 2
Seq 10 E06 GTGACAGGTTGGGGGCTATGTGAGAGAATTCTT VTGWGLCERIL 17 4 4
Seq 10 E07 GTGACAGGTTGGGCGCTATGTGAGAGAATTCTT VTGWALCERIL 11 0 4
Seq 10 E08 GTGACAGGTTGGGCGCTATGTGAGAGAATTCTT VTGWALCERIL 8 6 7
Seq 10 E09 GTGACAGGTTGGTGTCTATGTGAGAGAATTCTT VTGWCLCERIL 22 3 11
Seq 10 E10 GTGACAGGTTGGGTTCTATGTGAGAGAATTCTT VTGWVLCERIL 35 10 7
Seq 10 E11 GTGACAGGTTGGCCTCTATGTGAGAGAATTCTT VTGWPLCERIL 31 23 -3
Seq 10 E12 No sequence No sequence 15 8 8
Seq 10 F01 GTGACGGGTTGGAAGCTATGTGAGAGAATTCTT VTGWKLCERIL 24 16 8
Seq 10 F02 GTGACAGGTTGGCTGCTATGTGAGAGAATTCTT VTGWLLCERIL 5 13 7
Seq 10 F03 GTGACGGGTTGGAAGCTATGTGAGAGAATTCTT VTGWKLCERIL 24 3 17
Seq 10 F04 No sequence No sequence 36 4 5
Seq 10 F05 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 867 3715 4467
Seq 10 F06 GTGACAGGTTGGCGGCTATGTGAGAGAATTCTT VTGWRLCERIL 973 4157 4909
Seq 10 F07 GTGACAGGTTGGTAGCTATGTGAGAGAATTCTT VTGW*LCERIL 18 8 10
Seq 10 F08 GTGACAGGTTGGGAGCTATGTGAGAGAATTCTT VTGWELCERIL 8 16 20
Seq 10 F09 No sequence No sequence 6 -2 9
Seq 10 F10 GTGACAGGTTGGGCGCTATGTGAGAGAATTCTT VTGWALCERIL 19 7 9
Seq 10 F11 GTGACAGGTTGGGCGCTATGTGAGAGAATTCTT VTGWALCERIL 17 4 7
Seq 10 F12 GTGACAGGTTGGCTGCTATGTGAGAGAATTCTT VTGWLLCERIL 12 2 0
Seq 10 G01 GTGACAGGTTGGCTGCTATGTGAGAGAATTCTT VTGWLLCERIL 25 8 10
Seq 10 G02 GTGACGGGTTGGAAGCTATGTGAGAGAATTCTT VTGWKLCERIL 26 -7 10
Seq 10 G03 GTGACGGGTTGGAAGCTATGTGAGAGAATTCTT VTGWKLCERIL 3 20 3
Seq 10 G04 GTGACGGGTTGGAAGCTATGTGAGAGAATTCTT VTGWKLCERIL 21 3 2
Seq 10 G05 GTGACAGGTTGGTAGCTATGTGAGAGAATTCTT VTGW*LCERIL 21 11 2
Seq 10 G06 GTGACAGGTTGGTATCTATGTGAGAGAATTCTT VTGWYLCERIL 26 7 3
Seq 10 G07 GTGACAGGTTGGCCGCTATGTGAGAGAATTCTT VTGWPLCERIL 31 3 -7
Seq 10 G08 GTGACAGGTTGGTTGCTATGTGAGAGAATTCTT VTGWLLCERIL 26 -4 0
Seq 10 G09 GTGACGGGTTGGAAGCTATGTGAGAGAATTCTT VTGWKLCERIL 18 13 10
Seq 10 G10 No sequence No sequence 24 9 6
Seq 10 G11 GTGACGGGTTGGAAGCTATGTGAGAGAATTCTT VTGWKLCERIL 14 4 6
Seq 10 G12 GTGACAGGTTGGCTGCTATGTGAGAGAATTCTT VTGWLLCERIL 16 -1 7
Seq 10 H01 GTGACAGGTTGGTCGCTATGTGAGAGAATTCTT VTGWSLCERIL 14 3 17
Seq 10 H02 GTGACAGGTTGGCTGCTATGTGAGAGAATTCTT VTGWLLCERIL 34 -1 8
Seq 10 H03 No sequence No sequence 24 21 -2
Seq 10 H04 GTGACGGGTTGGAAGCTATGTGAGAGAATTCTT VTGWKLCERIL 23 4 0
Seq 10 H05 GTGACAGGTTGGTTGCTATGTGAGAGAATTCTT VTGWLLCERIL 7 11 10
Seq 10 H06 GTGACGGGTTGGAAGCTATGTGAGAGAATTCTT VTGWKLCERIL 21 6 8
Seq 10 H07 GTGACAGGTTGGGAGCTATGTGAGAGAATTCTT VTGWELCERIL 8 8 12
Seq 10 H08 GTGACAGGTTGGGCGCTATGTGAGAGAATTCTT VTGWALCERIL 18 -6 7
Seq 10 H09 Control Control 99670 140292 147222
Seq 10 H10 Control Control 975 7679 10911
Seq 10 H11 Control Control 128 1165 1737
Seq 10 H12 Control Control 15 9 3

Appendix G - Affinity fitting curves