US 20140113830A1
(19) United States
(12) Patent Application Publication
Suga et al.
(10) Pub. No.: US 2014/0113830 A1
(43) Pub. Date: Apr. 24, 2014

(54) AZOLINE COMPOUND AND AZOLE COMPOUND LIBRARY AND METHOD FOR PRODUCING SAME
(75) Inventors: Hiroaki Suga, Tokyo (JP); Yuki Goto, Tokyo (JP); Yumi Ito, Tokyo (JP)
(73) Assignee: THE UNIVERSITY OF TOKYO, Tokyo (JP)
(21) Appl. No.: 14/003,506
(22) PCT Filed: Mar. 9, 2012
(86) PCT No.: PCT/JP2012/056181
§371 (c)(1), (2), (4) Date: Nov. 15, 2013
(30) Foreign Application Priority Data
Mar. 9, 2011 (JP) ...... 2011-052219

Publication Classification
(51) Int. Cl. G01N 33/543 (2006.01)
(52) U.S. Cl. CPC ...... G01N 33/54393 (2013.01); USPC ...... 506/9; 506/26; 506/18; 435/68.1

(57) ABSTRACT
An object of the present invention is to provide a method of efficiently constructing a library abundant in diversity and also usable for screening of a compound that binds to a target substance having protease activity. The present invention provides a method of constructing an azoline compound library containing two or more azoline compounds having an azoline backbone introduced into at least one of Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof of Xaa0 of a peptide represented by the following formula (I):

A-(Xaa0)m-B (I)

wherein, m numbers of Xaa0s respectively represent arbitrary amino acids, at least one of which is an amino acid selected from the group consisting of Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof, m represents an integer selected from 2 to 40, and A and B each independently represent a peptide composed of from 0 to 100 amino acids.

Patent Application Publication Apr. 24, 2014 Sheet 1 of 22 US 2014/0113830 A1

Fig. 1

Synthesis scheme of azoline compound library

Random mRNA library: leader sequence followed by (NNK-WSU)n

Translation in cell-free translation system

Peptide library: leader sequence-GLEAS-(Xaa-Cys/Ser/Thr)n-AYDGV

Modification with PatD (heterocyclase); cleavage of leader sequence with Glu-C

Azoline compound library: AS-(Xaa-azoline)n-AYDGV
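The (NNK-WSU)n repeat in Fig. 1 can be read with the standard IUPAC degenerate-base codes (N = any base, K = G or U, W = A or U, S = C or G). The following is a minimal sketch, not part of the application, that expands both codon patterns against the standard genetic code; the helper names `expand` and `encoded` are hypothetical. Under these assumptions it shows that WSU covers only Cys, Ser, and Thr, the residues PatD modifies, while NNK covers all 20 amino acids plus one stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order at each position.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC degenerate-base codes used in Fig. 1 (U stands for itself).
IUPAC = {"N": "ACGU", "K": "GU", "W": "AU", "S": "CG", "U": "U"}


def expand(pattern):
    """Return every concrete codon matched by a degenerate codon pattern."""
    return ["".join(c) for c in product(*(IUPAC[ch] for ch in pattern))]


def encoded(pattern):
    """Return the sorted set of amino acids ('*' = stop) a pattern encodes."""
    return sorted({CODON_TABLE[c] for c in expand(pattern)})


print(expand("WSU"))        # ['ACU', 'AGU', 'UCU', 'UGU']
print(encoded("WSU"))       # ['C', 'S', 'T']
print(len(expand("NNK")))   # 32
```

Each NNK-WSU unit therefore pairs one freely varied position with one position restricted to a modifiable residue, which matches the (Xaa-Cys/Ser/Thr)n pattern of the peptide library in the scheme.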

Fig. 2

[Mass spectra with and without PatD. Recoverable peak labels: -4H2O, calcd. 6018; calcd. 6092, obsd. 6094; calcd. 6036. m/z axis approximately 5950 to 6150.]

Fig. 3

X-AYDGVEPS

Cassettes X: (size=12) VTACIFCVTIC; (size=16) VCACICFCVCACVCIC

[Mass spectra with and without PatD; axis labels not recoverable.]

Fig. 4

Substrate tolerance of PatD - Effect of hydrophilic residue -

(X37)-GLEAS-VTACXTFC-AYDGVEPS

[Mass spectra with and without PatD for cassettes in which the residue X is varied; the individual cassette and peak labels are not recoverable.]

Fig. 6

Cleavage with Glu-C

Template: PatE C1wt-GS

MNKKNILPQQGQPVIRLTAGQLSSQLAELSEEALGDA-GLEAS-VTACITFC-AYDGVGSGSGS

[Mass spectra of the precursor peptide PatE C1wt-GS: without PatD, with PatD, and with PatD and Glu-C. Labelled peak: calc. 1880.8, obsd. 1880.5.]

Formation of peptide from which leader sequence has been cleaved can be confirmed.
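The "calc. 1880.8" label in Fig. 6 can be checked with a short mass calculation. This is a minimal sketch under stated assumptions, not a procedure taken from the application: it assumes the Glu-C product is AS-VTACITFC-AYDGVGSGSGS (cleavage after the Glu of GLEAS, as drawn in Fig. 1), that four residues of the cassette have been cyclodehydrated with loss of one water each, and that the reported value is a singly protonated monoisotopic m/z. The function name `mz_after_dehydration` is hypothetical.

```python
# Monoisotopic residue masses (Da) for the amino acids in the fragment.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "I": 113.08406, "F": 147.06841,
    "Y": 163.06333, "D": 115.02694,
}
WATER = 18.01056   # monoisotopic mass of H2O
PROTON = 1.00728   # mass of a proton, for the [M+H]+ ion


def mz_after_dehydration(sequence, n_dehydrations):
    """Singly protonated monoisotopic m/z after losing n water molecules."""
    neutral = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return neutral - n_dehydrations * WATER + PROTON


# Assumed Glu-C fragment of PatE C1wt-GS (see the assumptions above).
FRAGMENT = "AS" + "VTACITFC" + "AYDGVGSGSGS"

print(round(mz_after_dehydration(FRAGMENT, 4), 1))  # 1880.8
```

Under these assumptions the result agrees with the calculated value printed in the figure, and each further or missing heterocycle would shift the peak by about 18 Da, which is how the "-4H2O" and "-2H2O" labels in the other figures can be read.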

Fig. 7

Importance of the leader sequence

[Mass spectra (m/z 2100 to 2400) of substrates ending in AYDGVEPS, treated with PatD. LS: leader sequence.]

Fig. 8

Deletion of the leader sequence

[Mass spectra of three substrates, each labelled with the observed maximum dehydration:]

MNKKNILPQQGQPVIRLTAGQLSSQLAELSEEALGDA-RS1-cassette

M------GQLSSQLAELSEEALGDA-RS1-cassette

M------EALGDA-RS1-cassette-RS2

The whole LS is not essential. The C-terminal domain may be important.

Fig. 9

Importance of recognition sequences

LS: leader sequence; uRS: upstream recognition sequence; dRS: downstream recognition sequence

[Mass spectra of four PatD-treated substrates:]

LS-GLEAS-VTACITFC-A. Major peak: -4 H2O. The downstream recognition sequence is not essential.

LS-VTACITFC-AYDG... Major peak: -2 H2O. The deletion of the upstream recognition sequence diminishes the reactivity of PatD.

LS-VTACITFC-A. Major peak: -2 H2O.

LS-GGGGG-VTACITFC-A. Major peak: -4 H2O. A linker is necessary at the N-terminus of the cassette.

Fig. 10A

Biosynthesis of exotic azoline backbones

Procedure: DNA (RS1-GGTTGT-RS2); reprogrammed translation (Xaa: noncanonical amino acids); modification with PatD

Noncanonical amino acids used in this study: Thr (control), allo-Thr (stereoisomer), tBuSer and PheSer (substituted), Dap

[Chemical structures not recoverable.]

Fig. 10B

Synthesis of non-natural azoline by introducing and modifying non-natural amino acid

(List of non-natural amino acids tried)

Categories: control, isomers, substituted, amine

[Chemical structures not recoverable; surviving labels: Dap, D-allo-Thr.]

Fig.11

Biosynthesis of exotic azoline backbones

[Mass spectra of PatD-treated substrates ending in AYDGVEPS, with Xaa in the cassette:]

Xaa = allo-Thr: PatD accepts a stereoisomer at the β position.

Xaa = tBuSer, Xaa = PheSer: PatD accepts substitutions at the β position.

PatD can install an imidazoline ring.


Fig. 13

Result of selection


Fig. 14

Evaluation of binding ability of azoline-containing compound selected

[Bar chart; vertical axis 0.00% to 0.25%; series PatD+ and PatD-.]

Fig.15

Heterocyclase PatD

pat gene cluster (derived from Prochloron didemni): patA to patG [gene map; scale not recoverable]. E. W. Schmidt et al., PNAS, 2005

[Scheme of the patellamide biosynthesis pathway showing the GLEAS and AYDG recognition sequences on either side of the cassette; structures not recoverable. J. A. M[illegible] et al., ChemBioChem [year illegible]]

Fig. 16

Modification of synthetic peptide using substrate different from natural substrate - 1

Features: cassette sequence containing multiple non-cognate amino acids; deleted downstream recognition sequence; shortened leader

Substrate: Ac-LAELSEEALGDAGLEASVDapADap... (remainder not recoverable)

[Mass spectrum after modification with PatD; labelled peak -4H2O; m/z axis 2400 to 2700.]

Fig. 17

Modification of synthetic peptide using substrate different from natural substrate - 2

Features: cassette sequence containing multiple non-cognate amino acids; deleted downstream recognition sequence; shortened leader; substituted upstream recognition sequence

Substrate: Ac-LAELSEEALGDA... (remainder not recoverable)

[Mass spectrum after modification with PatD; labelled peak -4H2O; m/z axis 2200 to 2500.]

Fig. 18

[Table: leader sequence variants m190, m209, and m139 (the last labelled "shuffled"), aligned by position, with the number of water molecules lost for each. The sequences and values are not reliably recoverable.]

Fig. 19

[Table: leader sequence deletion mutants aligned by position, with the number of water molecules lost for each. Rows: C1wt (MNKKNILPQQGQPVIRLTAGQLSSQLAELSEEALGDA), m127, m111, m076, m114, m112, m113, m075, m083, m082, m074, m080, m081, m088. The deleted regions and values are not reliably recoverable.]

Fig. 21

[Table: variants of the recognition sequences on either side of the cassette of C1wt (LS-GLEAS-VTACITFC-AYDGVEPS), with the number of water molecules lost for each. Named rows include C1wt, C1wt-EGS, m221, m061, m062, m060, m066, m067, m063, m064, m065, m167, m071, m163, m191, and m192. The individual sequences and values are not reliably recoverable.]

AZOLINE COMPOUND AND AZOLE COMPOUND LIBRARY AND METHOD FOR PRODUCING SAME

TECHNICAL FIELD

[0001] The present invention relates to an azoline compound library, an azole compound library, and the like.

BACKGROUND ART

[0002] In recent years, a variety of peptides have attracted attention as drug candidates or research tools. There have been various attempts to develop a peptide library and screen peptides having affinity with a target substance.

[0003] As a method of artificially constructing a peptide library, a method using chemical synthesis, a method using a biosynthetic enzyme of a secondary metabolite, a translation synthesis system, and the like have been employed conventionally.

[0004] It is however difficult, in the method using chemical synthesis, to increase the diversity of a library. In addition, it takes time to screen or to analyze the relationship between the structure of a compound and its activity.

[0005] The method of using a biosynthetic enzyme of a secondary metabolite, on the other hand, enables rapid and easy construction or chemical conversion of a precise backbone that is difficult to achieve by the chemical synthesis method. Since enzymes have substrate specificity, however, the kinds of compounds that can be synthesized are limited. This method is therefore not suited for use in the construction of a highly diverse compound library.

[0006] When a translation system is used, a peptide library rich in diversity can be constructed in a short time by constructing an mRNA library and translating it in one pot. By using this system in combination with an mRNA display method or the like, both a peptide selected by screening and information on a nucleic acid encoding such peptide can be obtained simultaneously, so that the genotype and the phenotype of the selected peptide can be related to each other easily. Despite the fact that synthesis of a peptide library using such a translation system has many advantages, it can only produce compounds consisting of a peptidic backbone.

[0007] In screening using a library, identification of a compound inhibiting a target substance having protease activity is often required. The library of peptidic compounds is however cleaved by protease, and thus compounds inhibiting the activity of a target substance cannot be screened efficiently.

[0008] Each peptide of the peptide library may be modified in vitro with a post-translational modification enzyme, but an enzyme having a desired activity does not always have activity in vitro. In addition, the expressed peptide library should be purified prior to the reaction with an enzyme, and investigation of substrate specificity of the enzyme is also required. It is therefore not easy to obtain a library comprised of peptides having a desired structure.

[0009] A library in which the presence or absence, or degree, of modification of each member cannot be identified is inferior in usefulness because it eventually requires correlation analysis between structure and activity, as in the chemical synthesis system.

[0010] Patellamide produced by Prochloron didemni, that is, endozoic algae of sea squirt, is a low molecular cyclic peptide which is presumed to have various physiological activities, and it is biosynthesized via a unique pathway with products of a pat gene cluster consisting of patA to patG. The pat gene cluster and biosynthesis pathway of patellamide are schematically shown in FIG. 15.

[0011] In this biosynthesis, PatE peptide, which is a patE gene product, serves as a precursor. Since the patE gene has a hypervariable region (cassette domain), the product of it constructs a natural combinatorial library.

[0012] The PatE peptide has, on both sides of the cassette domain, a recognition sequence by a post-translational modification enzyme. PatA, PatD, and PatG serve as the post-translational modification enzyme. PatD introduces an azoline backbone into Cys, Ser, and Thr in the cassette of PatE and converts Cys to a thiazoline backbone and Ser and Thr into an oxazoline backbone.

[0013] PatA cleaves the recognition sequence at the N-terminus side of the cassette domain of the PatE.

[0014] PatG is composed of two domains. An oxidase domain on the N-terminus side converts an azoline backbone introduced by PatD into an azole backbone, that is, converts a thiazoline backbone into a thiazole backbone. A peptidase domain on the C-terminus side macrocyclizes PatE, while cleaving the recognition sequence on the C-terminus side of the cassette domain of PatE.

[0015] With regard to the cassette domain of the above-mentioned natural PatE, sequences shown in the following table are described in M. S. Donia et al. (Non-patent Document 1).

TABLE 1

G L E A S   COMPOUND I   A Y D G V E P S

E1 GGTTTGGAAGCATCTGTAACTGCTTGCATCACTTTTTGCGCTTATGATGGTGTGGAGCCATCTATAAC

E4 ------t------

E7 ------t- - - - - A------

E8 ------t------G------

E9 ------t------

E10 ------T- - G------

E11 ------t------C------

E12 ------t------

E13 ------t------GC- -

E2 ------ATGC- -

E5 ------T------ATGC- -

E6 ------G ------ATGC- -

E14 ------ATGC- -

E15 ------G ------ATGC- -

E16 ------C ------ATGC- -

E17 ------C------ATGC- -

E18 ------A------ATGC- -

E19 ------ATGC-T

E20 ------C------C-ATGC- -

E21 ------ATGC- -

E22 ------ATGC- -

E23 ------ATGC- -

E24 ------ATGC- -

E25 ------ATGC- -

E3 ------C. : : - - -T--T- : C - - - -A------T- : : -

E26 ------C. : : - - -T--T- : C - - - -A------C- : : -

E27 ------C. : : - - -T--T- : C - - - -A------T- : : -

E28 ------C. : : - - -T--T- : C - - - -A------T- : : -

E29 ------C. : : - - -T--T- : C - - - -A------T- : : -

TABLE 1 - continued

COMPOUND II A Y D G E Stop

CTGTTTGCATCAGTGTTTGCGCTTATGATGGTGAATAA

E1 ------

E4 ------

E7 ------

E8 ------C ------

E9 ------

E10 ------

E11 ------C------

E12 ------T- - - C ------t------

E13 - CT-A- - -TGT-CCT-A--T- - - - - C------

E2 - CT-A- - -TGT-CCT-A--T- - - - - C------

E5 - CT-A- - -TGT-CCT-A--T- - - - - C------

E6 - CT-A- - -TGT-CCT-AC-T- - - - - C------

E14 - CT-A- - -TGT-CCT-A--T- - - - - C------

E15 - CT-A- - -TGT-CCT-A--T- - - - - C------

E16 - CT-A- - -TGT-CCT-A--T- - - - - C------

E17 - CT-A- - -TGT-CCT-A--T- - - - - C------

E18 TCT-A- - -TGT-CCT-A--T- - - - - C------

E19 - CT-A- - -TGT-CCT-A--T- - - - - C------

E20 - CT-A- - -TGTGCCT-A--T- - - - - C------

E21 - CT-A- - -TGT-CC- -A--T- - - - - C------

E22 - CT-A- - -TGT-CCT-C--T- - - - - C------

E23 - C - -A- - -TGTGCC- -A--T- - - - - C------

E24 - CT-A- - -TGT-CCT-A--T- - - - - C------

E25 ------T- : C- - C ------C ------

E3 ------T- : C- - C ------C ------

E26 - - - - G-T- : C- - C ------C ------

E27 ------T- : C- GC ------C ------

E28 - - - - C-T- : C- - C ------C ------
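The E1 reference rows of Table 1 can be translated to confirm that they line up with the amino-acid headers. This is a minimal sketch using the standard genetic code; the function name `translate` is hypothetical, and the reading-frame offset of 2 used for the second segment is an assumption, chosen because it makes the translation end in AYDGE followed by a stop codon, as in the "COMPOUND II" header.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, offset=0):
    """Translate complete codons of a DNA string, starting at `offset`."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


# E1 rows exactly as printed in Table 1.
E1_SEGMENT_I = ("GGTTTGGAAGCATCTGTAACTGCTTGCATCACTTTTTGC"
                "GCTTATGATGGTGTGGAGCCATCTATAAC")
E1_SEGMENT_II = "CTGTTTGCATCAGTGTTTGCGCTTATGATGGTGAATAA"

print(translate(E1_SEGMENT_I))             # GLEASVTACITFCAYDGVEPSI
print(translate(E1_SEGMENT_II, offset=2))  # VCISVCAYDGE*
```

The first translation reproduces the GLEAS and AYDGVEPS recognition sequences around the eight-residue cassette VTACITFC, which is the cassette used as the reference substrate in the figures.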

TABLE 2

COMPOUND    CODING SEQUENCE

patellamide family:
patellamide C    (E1I, E2I)
patellamide A    (E1II)
patellamide B    (E4I, E5I)
new compound 1    (E6I)
new compound 2    (E7I)
new compound 3    (E8I)
new compound 4    (E9II)
new compound 5    (E10I)
new compound 6    (E11I)
new compound 7    (E12II)
new compound 8    (E13II)
new compound 9    (E15I)
new compound 10    (E16I)
new compound 11    (E17I)
new compound 12    (E18I)
new compound 13    (E20I)

ulithiacyclamide family:
ulithiacyclamide    (E2II)
new compound 14    (E14II)
new compound 15    (E19II)
new compound 16    (E21II)
new compound 17    (E22II)
new compound 18    (E23II)
new compound 19    (E24II)
new compound 20    (E25II)

lissoclinamide family:
lissoclinamide 2/3    (E3I)
lissoclinamide 4/5    (E3II)
new compound 21    (E26II)
new compound 22    (E27II)
new compound 23    (E28II)
new compound 24    (E29II)

[Chemical structures of ulithiacyclamide, lissoclinamide 2/3, and patellamide A; not recoverable.]

[0016] These tables show that sequences of natural cassette domains have the following similarities: (i) they have 7 or 8 residues, (ii) they tend to have Ser/Thr/Cys to be modified at 2, 4, 6, or 8 positions from the N-terminus of the cassette domain, (iii) the residues (Ser, Thr, and Cys) to be modified are not adjacent to each other in most cases, and (iv) many of the residues other than Ser, Thr, and Cys are hydrophobic residues such as Val, Ala, Ile, Phe, and Leu.

[0017] These similarities were presumed to be necessary for becoming a substrate of PatD or PatG, a post-translational modification enzyme. It is however not known which residue of Ser, Thr, and Cys has been modified or not modified, and substrate specificity of PatD and PatG has not been elucidated yet.
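The four similarities listed in paragraph [0016] can be expressed as a simple check on a cassette sequence. The sketch below is illustrative only: the function name `cassette_features` is hypothetical, and treating "many of the other residues are hydrophobic" as "more than half" is an assumption made for the example, not a threshold stated in the text.

```python
MODIFIABLE = set("CST")     # Cys, Ser, Thr: residues modified by PatD
HYDROPHOBIC = set("VAIFL")  # Val, Ala, Ile, Phe, Leu, as named in [0016]


def cassette_features(cassette):
    """Evaluate features (i) to (iv) of natural cassette domains."""
    # 1-based positions of modifiable residues, counted from the N-terminus.
    positions = [i for i, aa in enumerate(cassette, start=1) if aa in MODIFIABLE]
    others = [aa for aa in cassette if aa not in MODIFIABLE]
    n_hydrophobic = sum(aa in HYDROPHOBIC for aa in others)
    return {
        "i_length_7_or_8": len(cassette) in (7, 8),
        "ii_modifiable_at_even_positions":
            bool(positions) and all(p % 2 == 0 for p in positions),
        "iii_no_adjacent_modifiable":
            all(b - a != 1 for a, b in zip(positions, positions[1:])),
        # Assumption: "many" is read as "more than half" of the other residues.
        "iv_others_mostly_hydrophobic": 2 * n_hydrophobic > len(others),
    }


print(cassette_features("VTACITFC"))          # all four features True
print(cassette_features("VCACICFCVCACVCIC"))  # feature (i) False: 16 residues
```

The first cassette is the one encoded by the E1 row of Table 1; the second is the 16-residue cassette shown in Fig. 3, which falls outside feature (i).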

PRIOR ART DOCUMENT

Non-Patent Document

[0018] Non-Patent Document 1: Donia, M. S. et al., Nat. Chem. Biol., 2006, 2:729-735.

SUMMARY OF THE INVENTION

Problem to be Solved by the Invention

[0019] An object of the present invention is to provide a method of efficiently constructing a library having sufficient diversity and usable for screening of a compound that binds to a target substance even having protease activity.

Means for Solving the Problem

[0020] The present inventors considered that a library usable for screening of compounds that bind to a target substance having protease activity can be obtained, and the above-mentioned problem can be solved, if a PatE library more abundant in diversity than the natural one can be obtained efficiently by some method and if modification with a post-translational modification enzyme can be made by using such a library as a substrate.

[0021] As a result of further investigation described herein, we found that some azoline backbone-introducing enzymes have azoline backbone-forming activity even in vitro; and that the sequence of the cassette domain which is a substrate of such an azoline backbone-introducing enzyme is not limited to that described in M. S. Donia et al., but a cassette domain having from 2 to 40 amino acid residues may also be a substrate of the enzyme, and Cys, Ser, Thr, and 2,3-diamino acid and analogs thereof in the cassette domain can be converted into an azoline backbone.

[0022] It has further been confirmed that the steps of expressing a PatE library in a cell-free translation system by using a precursor peptide comprising, in the order of mention from the N-terminus, a leader sequence of PatE, a recognition sequence 1 by an azoline backbone-introducing enzyme, a cassette domain, and a recognition sequence 2 by the azoline backbone-introducing enzyme, modifying it with the azoline backbone-introducing enzyme, and cutting off an unnecessary region can be conducted efficiently in one pot and, as a result, an azoline compound library sufficiently abundant in diversity and usable for screening using even a target substance having protease activity can be obtained.

[0023] We also found that the above-mentioned precursor peptide may be a substrate of an azoline backbone-introducing enzyme when, in the above-mentioned precursor peptide, only a portion of a conventionally known leader sequence is used as the leader sequence of PatE or the leader sequence is completely removed; when sequences different from those conventionally known as the recognition sequence 1 or 2 are used; when the recognition sequences 1 and 2 are removed; or the like. Moreover, it has been confirmed that even if, as the leader sequence portion, a peptide separated from a precursor peptide comprising a cassette domain is used, as long as the peptide is present in a reaction system, the precursor peptide comprising a cassette domain may be a substrate of an azoline backbone-introducing enzyme, leading to the completion of the present invention.

[0024] The present invention relates to:

[0025] [1] a method of constructing an azoline compound library containing two or more azoline compounds having an azoline backbone introduced into at least one of Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof of Xaa0 of a peptide represented by the following formula (I):

A-(Xaa0)m-B (I)

(wherein, m numbers of Xaa0s respectively represent arbitrary amino acids, at least one of which is an amino acid selected from the group consisting of Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof, m represents an integer selected from 2 to 40, and A and B each independently represent a peptide composed of from 0 to 100 amino acids), including:

[0026] a step of constructing an mRNA library encoding a precursor peptide comprising, in order of mention from the N-terminus, a recognition sequence 1 by an azoline backbone-introducing enzyme, -(Xaa0)m-, and a recognition sequence 2 by the azoline backbone-introducing enzyme (the recognition sequences 1 and 2 being recognition sequences by the azoline backbone-introducing enzyme each composed of from 0 to 10 amino acids);

[0027] a step of expressing the precursor peptide in a cell-free translation system by using the mRNA library and thereby constructing a peptide library; and

[0028] a step of reacting the azoline backbone-introducing enzyme and the peptide library in the presence of a peptide comprising a leader sequence of a substrate of the azoline backbone-introducing enzyme and thereby introducing an azoline backbone into at least one of Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof of Xaa0 (the leader sequence being a leader sequence of a substrate of the azoline backbone-introducing enzyme composed of from 0 to 50 amino acids);

[0029] [2] a method of constructing an azoline compound library containing two or more complexes between an azoline compound having an azoline backbone introduced into at least one of Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof of Xaa0 of a peptide represented by the following formula (I):

A-(Xaa0)m-B (I)

(wherein, m numbers of Xaa0s respectively represent arbitrary amino acids, at least one of which represents an amino acid selected from the group consisting of Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof, m represents an integer selected from 2 to 40, and A and B each independently represent a peptide composed of from 0 to 100 amino acids) and an mRNA encoding the peptide represented by the formula (I), including:

[0030] a step of constructing an mRNA library encoding a precursor peptide comprising, in order of mention from the N-terminus, a recognition sequence 1 by an azoline backbone-introducing enzyme, -(Xaa0)m-, and a recognition sequence 2 by the azoline backbone-introducing enzyme (the recognition sequences 1 and 2 being recognition sequences by the azoline backbone-introducing enzyme each composed of from 0 to 10 amino acids);

[0031] a step of binding a puromycin to the 3' end of each mRNA of the mRNA library to construct a puromycin-bound mRNA library;

[0032] a step of expressing the precursor peptide in a cell-free translation system by using the puromycin-bound mRNA library and constructing a peptide-mRNA complex library; and

[0033] a step of reacting the azoline backbone-introducing enzyme and the peptide-mRNA complex library in the presence of a peptide comprising a leader sequence of a substrate of the azoline backbone-introducing enzyme and introducing an azoline backbone into at least one of Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof of Xaa0 (the leader sequence being a leader sequence of a substrate of the azoline backbone-introducing enzyme composed of from 0 to 50 amino acids);

[0034] [3] the method of constructing an azoline compound library as described above in [1] or [2], wherein:

[0035] in the formula (I), -(Xaa0)m- means -(Xaa1-Xaa2)n- (wherein, n numbers of Xaa1 each independently represent an arbitrary amino acid, n numbers of Xaa2 each independently represent an amino acid selected from the group consisting of Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof, and n represents an integer selected from 1 to 20); and

[0036] an azoline backbone has been introduced into at least one of Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof of Xaa1 and Xaa2 of the azoline compound;

[0037] [4] the method as described above in any one of [1] to [3], wherein at least one of the peptides represented by the formula (I) is a peptide having at least one of the following characteristics (i) to (iv):

[0038] (i) m represents from 2 to 40 (with the proviso that 7 and 8 are excluded);

[0039] (ii) n represents from 1 to 20 (with the proviso that 3 and 4 are excluded);

[0040] (iii) at least one of Xaa1s is an amino acid selected from Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof; and

[0041] (iv) at least one of Xaa1s is a hydrophilic amino acid;

[0042] [5] the method as described above in any one of [1] to [3], wherein at least one of the peptides represented by the formula -(Xaa0)m- is a peptide represented by any of SEQ ID NOS: 10 to 57;

[0043] [6] the method as described above in any one of [1] to [5], wherein at least one of the azoline compounds has 5 or more azoline backbones;

[0044] [7] the method as described above in any one of [1] to [6], wherein each of the mRNAs of the mRNA library is constituted so that each peptide comprising the leader sequence is expressed as a fusion peptide with the precursor peptide comprising a recognition sequence 1, -(Xaa0)m-, and a recognition sequence 2;

[0045] [8] the method as described above in [7], further comprising, after introduction of the azoline backbone, a step of cleaving the leader sequence from the precursor peptide;

[0046] [9] the method as described above in any one of [1] to [6], wherein in the step of introducing an azoline backbone, the peptide comprising a leader sequence is separated from the precursor peptide comprising a recognition sequence 1, -(Xaa0)m-, and a recognition sequence 2;

[0047] [10] the method as described above in any one of [1] to [9], further comprising a step of macrocyclizing the azoline compound;

[0048] [11] the method of constructing an azole compound library, comprising, after the step of introducing an azoline backbone in the method of constructing an azoline compound library according to any one of [1] to [10], a step of reacting the library having an azoline backbone introduced therein with an azole backbone-introducing enzyme in the presence or absence of a peptide comprising a leader sequence of a substrate of the azole backbone-introducing enzyme and converting at least one of the azoline backbones into an azole backbone (the leader sequence meaning a leader sequence of a substrate of the azoline backbone-introducing enzyme composed of from 0 to 50 amino acids);

[0049] [12] the method as described above in [11], wherein the azole backbone-introducing enzyme is a mutant of PatG that has lost the peptidase domain thereof or that has lost the peptidase activity by point mutation;

[0050] [13] an azoline compound library containing two or more azoline compounds having an azoline backbone introduced into at least one of Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof of Xaa0 of a peptide represented by the following formula (I):

A-(Xaa0)m-B (I)

(wherein, m numbers of Xaa0s respectively represent arbitrary amino acids, at least one of which is an amino acid selected from the group consisting of Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof, m represents an integer selected from 2 to 40, and A and B each independently represent a peptide composed of from 0 to 100 amino acids), wherein:

[0051] in at least one of the peptides represented by the formula (I), m does not represent 7 or 8;

[0052] [14] the azoline compound library as described above in [13], wherein in the formula (I), -(Xaa0)m- means -(Xaa1-Xaa2)n- (wherein, n numbers of Xaa1 each independently represent an arbitrary amino acid, n numbers of Xaa2 each independently represent an amino acid selected from the group consisting of Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof, and n represents an integer selected from 1 to 10); and

[0053] at least one of the peptides represented by the formula (I) is a peptide having at least one of the following characteristics (i) to (iv):

[0054] (i) m represents from 2 to 40 (with the proviso that 7 and 8 are excluded);

[0055] (ii) n represents from 1 to 20 (with the proviso that 3 and 4 are excluded);

[0056] (iii) at least one of Xaa1s is an amino acid selected from Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof; and

[0057] (iv) at least one of Xaa1s is a hydrophilic amino acid;

[0058] [15] the azoline compound library as described above in [13] or [14], wherein each of the azoline compounds forms a complex with an mRNA encoding the peptide portion of the azoline compound;

[0059] [16] the azoline compound library as described above in any one of [13] to [15], wherein the entirety or a portion of the recognition sequence by the azoline backbone-introducing enzyme has been bound to the N-terminus and C-terminus of the peptide represented by the formula (I);

[0060] [17] an azole compound library containing two or more azole compounds having an azole backbone introduced into at least one of Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof of Xaa0 of a peptide represented by the following formula (I):

A-(Xaa0)m-B (I)

(wherein, m numbers of Xaa0s represent arbitrary amino acids, at least one of which is an amino acid selected from the group consisting of Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof, m represents an integer selected from 2 to 40, and A and B each independently represent a peptide composed of from 0 to 100 amino acids), wherein:

[0061] in at least one of the peptides represented by the formula (I), m does not represent 7 or 8;

[0062] [18] the azole compound library as described above in [17], wherein in the formula (I), -(Xaa0)m- means -(Xaa1-Xaa2)n- (wherein, n numbers of Xaa1 each independently represent an arbitrary amino acid, n numbers of Xaa2 each independently represent an amino acid selected from the group consisting of Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof, and n represents an integer selected from 1 to 20); and

[0063] at least one of the peptides represented by the formula (I) has at least one of the following characteristics (i) to (iv):

[0064] (i) m represents from 2 to 40 (with the proviso that 7 and 8 are excluded);

[0065] (ii) n represents from 1 to 20 (with the proviso that 3 and 4 are excluded);

[0066] (iii) at least one of Xaa1s is an amino acid selected from Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof; and

[0067] (iv) at least one of Xaa1s is a hydrophilic amino acid;

[0068] [19] the azole compound library as described above in [17] or [18], wherein each of the azole compounds forms a complex with an mRNA encoding the peptide portion of the compound;

[0069] [20] the azole compound library as described above in any one of [17] to [19], wherein the entirety or a portion of the recognition sequence by the azoline backbone-introducing enzyme binds to the N-terminus and C-terminus of the peptide represented by the formula (I);

[0070] [21] a screening method for identifying an azoline compound that binds to a target substance, including:

[0071] a step of bringing the azoline compound library constructed by using any one of the methods described in from [1] to [10] or the azoline compound library as described above in any one of [13] to [16] into contact with a target substance, followed by incubation, and a step of selecting the azoline compound that has bound to the target substance;

[0072] [22] a screening method for identifying an azoline compound that binds to a target substance, including:

[0073] a step of bringing the azoline compound library constructed by using any one of the methods described in from [2] to [10] or the azoline compound library as described above in [15] or [16] into contact with a target substance, followed by incubation,

[0074] a step of selecting an azoline compound that has bound to the target substance, and

[0075] a step of analyzing the base sequence of the mRNA of the thus-selected azoline compound;

[0076] [23] a screening method for identifying an azole compound that binds to a target substance, including:

[0077] a step of bringing the azole compound library constructed by using the method described in [11] or [12] or the azole compound library as described above in any one of from [17] to [20] into contact with a target substance, followed by incubation, and

[0078] a step of selecting an azole compound that has bound to the target substance;

[0079] [24] a screening method for identifying an azole compound that binds to a target substance, including:

[0080] a step of bringing the azole compound library constructed by using the method described in [11] or [12] or the azole compound library as described above in [19] or [20] into contact with a target substance, followed by incubation,

[0081] a step of selecting an azole compound that has bound to the target substance, and

[0082] a step of analyzing the base sequence of the mRNA of the thus-selected azole compound;

[0083] [25] a screening kit for identifying an azoline compound that binds to a target substance, including:

[0084] the azoline compound library constructed by using the method as described in any one of from [1] to [10] or the azoline compound library as described above in any one of from [13] to [16];

[0085] [26] the kit as described above in [25], wherein the library has been immobilized onto a solid phase support;

[0086] [27] a screening kit for identifying an azole compound that binds to a target substance, including:

[0087] the azole compound library constructed by using the method as described above in [11] or [12] or the azole compound library as described above in any one of from [17] to [20];

[0088] [28] the kit as described above in [27], wherein the library has been immobilized onto a solid phase support;

[0089] [29] a method of preparing an azoline compound having an azoline backbone introduced into at least one of Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof of Xaa0 of a peptide represented by the following formula (I):

A-(Xaa0)m-B (I)

[0090] (wherein, m numbers of Xaa0s represent arbitrary amino acids, at least one of which is an amino acid selected from the group consisting of Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof, m represents an integer selected from 2 to 40, and A and B each independently represent a peptide composed of from 0 to 100 amino acids), including:

[0091] a step of preparing an mRNA encoding a precursor peptide comprising, in order of mention from the N-terminus, a recognition sequence 1 by an azoline backbone-introducing enzyme, -(Xaa0)m-, and a recognition sequence 2 by the azoline backbone-introducing enzyme (the recognition sequences 1 and 2 being recognition sequences by the azoline backbone-introducing enzyme each composed of from 0 to 10 amino acids);

[0092] a step of expressing the precursor peptide in a cell-free translation system by using the mRNA; and

[0093] a step of reacting the azoline backbone-introducing enzyme and the precursor peptide in the presence of a peptide comprising a leader sequence of a substrate of the azoline backbone-introducing enzyme and thereby introducing an azoline backbone into at least one of Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof of Xaa0 (with the proviso that the peptide comprising the leader sequence is separated from the precursor peptide comprising the recognition sequence 1, -(Xaa0)m-, and the recognition sequence 2);

[0094] [30] a method of preparing an azole compound including:

[0095] a step of reacting the azoline compound prepared by using the method as described above in [29] and an azole backbone-introducing enzyme in the presence or absence of a peptide comprising a leader sequence of a substrate of the azole backbone-introducing enzyme and thereby converting at least one of the azoline backbones into an azole backbone; and

[0096] [31] a kit for preparing an azoline compound or an azole compound, comprising an azoline backbone-introducing enzyme or azole backbone-introducing enzyme and a peptide comprising a leader sequence of a substrate of the azoline backbone-introducing enzyme.

Effect of the Invention

[0097] The method of the present invention makes it possible to rapidly and easily provide an azoline compound library and an azole compound library much more abundant in diversity than a natural PatE library.

[0098] By using such an azoline compound or azole compound library, an azoline compound or azole compound that binds to a target substance having protease activity can be screened.
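The screening methods [22] and [24] above include a step of analyzing the base sequence of the mRNA of the selected compound. The following Python fragment is only a minimal sketch of how such a sequence might be read back into a peptide, assuming the standard genetic code. The example mRNA and its four-residue cassette are hypothetical, and the flanking sequences GLEAS and AYDGV are the ones drawn in FIG. 1; the function names are illustrative and not part of the disclosure.

```python
from itertools import product

# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna):
    """Translate an mRNA (or cDNA written with T) from its first base to the first stop."""
    mrna = mrna.upper().replace("T", "U")
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":          # stop codon ends the peptide
            break
        peptide.append(aa)
    return "".join(peptide)


def extract_cassette(peptide, rs1="GLEAS", rs2="AYDGV"):
    """Return the residues between recognition sequence 1 and recognition sequence 2,
    or None when the two flanking sequences are not both found in that order."""
    start = peptide.find(rs1)
    end = peptide.rfind(rs2)
    if start < 0 or end < start + len(rs1):
        return None
    return peptide[start + len(rs1):end]


# Hypothetical selected clone: GLEAS + cassette ATKC + AYDGV.
example_mrna = "GGUCUGGAAGCUUCU" "GCUACUAAAUGU" "GCUUAUGAUGGUGUU"
example_peptide = translate(example_mrna)
example_cassette = extract_cassette(example_peptide)
print(example_peptide, example_cassette)
```

The sketch recovers only the unmodified peptide sequence; which residues actually carry an azoline or azole ring would still have to be established experimentally.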

[0099] In addition, when an mRNA display method is applied to the azoline compound or azole compound library of the present invention and a library of complexes between an azoline compound or azole compound and an mRNA encoding the peptide portion thereof is constructed, it is possible to determine a nucleic acid sequence encoding the azoline compound/azole compound identified by screening and thereby easily analyze the relationship between the structure and activity of the compound.

BRIEF DESCRIPTION OF THE DRAWINGS

[0100] FIG. 1 is a schematic view showing one embodiment of a method of constructing an azoline compound library according to the present invention.

[0101] FIG. 2 shows the results of mass spectrometry analysis before and after introduction of an azoline backbone by using PatD into a precursor peptide having the same sequence of a cassette domain as that of natural PatE.

[0102] FIG. 3 shows the results of mass spectrometry analysis before and after introduction of an azoline backbone by using PatD into a precursor peptide having a cassette domain composed of 12 amino acids or 16 amino acids.

[0103] FIG. 4 shows the results of mass spectrometry analysis before and after introduction of an azoline backbone by using PatD into a precursor peptide having, in Xaa1 thereof, a hydrophilic amino acid residue.

[0104] FIG. 5 shows the results of the substrate acceptance of PatD based on the results of tests of introducing an azoline backbone by using PatD into cassette domains having various sequences. In each of these sequences, the number of heterocycles introduced after the reaction of PatD is shown with the color density in the box as an indicator. It shows that the darker the color is, the greater the number of products having heterocycles introduced therein is observed. The box in light color shows byproducts.

[0105] FIG. 6 shows the confirmation results, by mass spectrometry analysis, of cleavage of a leader sequence by Glu-C. When PatEC1wt-GS was treated with PatD and Glu-C (PatEC1wt-GS+PatD+Glu-C), a peak which was not found in the absence of a substrate peptide (+PatD+Glu-C) was observed (1880.5). It corresponds to a C-terminal short peptide site containing four heterocycles.

[0106] FIG. 7 shows the results of investigating the substrate acceptance of PatD while adding to the reaction system a leader sequence as a separate peptide or while not adding any leader sequence.

[0107] FIG. 8 shows an example of the results of investigating the substrate acceptance of PatD while deleting a portion of the leader sequence.

[0108] FIG. 9 shows the results of investigating the substrate acceptance of PatD while changing the recognition sequence.

[0109] FIG. 10A shows the outline of a test to study the substrate acceptance of PatD conducted by preparing a substrate containing a non-natural amino acid in the cassette sequence by translational synthesis.

[0110] FIG. 10B shows non-natural amino acids used in the test of FIG. 10A.

[0111] FIG. 11 shows typical results of the test of FIG. 10.

[0112] FIG. 12 shows the outline of screening using an mRNA display method.

[0113] FIG. 13 shows the results of selection obtained by screening using the mRNA display method with MMP12 as a target.

[0114] FIG. 14 shows the evaluation results of binding ability, to MMP12, of the azoline-containing compound selected by the method of FIG. 13.

[0115] FIG. 15 is a schematic view showing the structure of a pat gene cluster and a biosynthesis pathway of patellamide.

[0116] FIG. 16 shows an example of the results of investigating the substrate acceptance of PatD for a synthetic peptide significantly different from a natural substrate.

[0117] FIG. 17 shows an example of the results of investigating the substrate acceptance of PatD for a synthetic peptide significantly different from a natural substrate.

[0118] FIG. 18 shows the results of investigating the substrate acceptance of PatD in the case where, as the leader sequence, a sequence derived from Lacticin 481 precursor or derived from human actin or a sequence obtained by shuffling the leader sequence of PatE is used.

[0119] FIG. 19 shows the results of investigating the substrate acceptance of PatD while deleting a portion of the leader sequence.

[0120] FIG. 20 shows the results of investigating the substrate acceptance of PatD while introducing point mutation into the leader sequence.

[0121] FIG. 9 shows the results of investigating the substrate acceptance of PatD while changing the recognition sequence.

MODE FOR CARRYING OUT THE INVENTION

(Construction Method of Azoline Compound Library)

[0122] The present invention provides a construction method of an azoline compound library including two or more azoline compounds.

[0123] The term "azoline compound" as used herein means a compound having an azoline backbone introduced into at least one of Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof of a peptide represented by the following formula (I):

A-(Xaa0)m-B (I)

wherein, m numbers of Xaa0s represent arbitrary amino acids, at least one of which represents an amino acid selected from the group consisting of Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof, m represents an integer selected from 2 to 40, and A and B each independently represent a peptide composed of from 0 to 100 amino acids.

[0124] In one embodiment of the present invention, the azoline compound is represented by the formula (I) in which -(Xaa0)m- is -(Xaa1-Xaa2)n- (wherein, n numbers of Xaa1s each independently represent an arbitrary amino acid, n numbers of Xaa2s each independently represent an amino acid selected from the group consisting of Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof, and n represents an integer selected from 1 to 20).

[0125] The Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof into which the azoline backbone has been introduced may be either at a position of Xaa1 or at a position of Xaa2.

[0126] The term "amino acid" is used herein in the broadest meaning and includes, in addition to natural amino acids, artificial amino acid mutants and derivatives. Examples of the amino acid as described herein include natural proteinogenic L-amino acids; D-amino acids; chemically modified amino acids such as amino acid mutants and derivatives; natural non-proteinogenic amino acids such as norleucine, β-alanine, and ornithine; and chemically synthesized compounds having properties known per se in the art and characteristic to amino acids. Examples of the non-natural amino acids include α-methylamino acids (α-methylalanine, etc.), D-amino acid, histidine-like amino acids (β-hydroxy-histidine, homohistidine, α-fluoromethyl-histidine, α-methyl-histidine, etc.), amino acids ("homo" amino acids) having, on the side chain thereof, extra methylene, and amino acids (cysteic acid, etc.) obtained by substituting a sulfonic acid group for the carboxylic acid functional group in the side chain.

[0127] The amino acid herein is represented by commonly used single-letter or three-letter code. The amino acids represented by single-letter or three-letter code include mutants and derivatives thereof.

[0128] Examples of the analogs of Thr include, but are not limited to, those represented by the following formula:

[Chemical formula 1]

wherein, R represents a hydrogen atom, a substituted or unsubstituted alkyl group having from 1 to 10 carbon atoms, or a substituted or unsubstituted aromatic group.

[0129] Examples of the analogs of Cys include, but are not limited to, those represented by the following formula:

[Chemical formula 2]

wherein, R represents a hydrogen atom, a substituted or unsubstituted alkyl group having from 1 to 10 carbon atoms, or a substituted or unsubstituted aromatic group.

[0130] Examples of the analogs of Ser or Thr include, but are not limited to, those represented by the following formulas:

[Chemical formula 3: tBuSer, PrSer, PhSer, EtSer]

[0131] Examples of the 2,3-diamino acid and analogs thereof include, but are not limited to, those represented by the following formulas:

[Chemical formula 4: Dap and its R-substituted analog]

wherein, R represents a hydrogen atom, a substituted or unsubstituted alkyl group having from 1 to 10 carbon atoms, or a substituted or unsubstituted aromatic group.

[0132] The term "introducing an azoline backbone into at least one of Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof" as used herein means that a dehydration reaction occurs at Cys, Ser, Thr, or 2,3-diamino acid to introduce an azoline ring represented by the following formulas:

[Chemical formula 5]

[0133] Introduction of an azoline backbone into Ser, Thr, Cys, or 2,3-diaminopropionic acid produces an oxazoline backbone, a thiazoline backbone, or an imidazoline backbone as follows:

[Chemical formula 6: Ser, reaction scheme with heterocyclase]

[Chemical formula 7: Thr, reaction scheme with heterocyclase]

[Chemical formula 8: Cys, reaction scheme with heterocyclase (continued)]

[Chemical formula 9: 2,3-Diaminopropionic acid, reaction scheme with heterocyclase]

[0134] For example, introduction of an azoline backbone into the above-mentioned Thr analog residue produces the following oxazoline backbone.

[Chemical formula 10: reaction scheme with heterocyclase]

[0135] Introduction of an azoline backbone into the above-mentioned Cys analog residue produces the following thiazoline backbone.

[Chemical formula 11: reaction scheme with heterocyclase]

[0136] Introduction of an azoline backbone into the above-mentioned 2,3-diamino acid analog residue produces the following imidazoline backbone.

[Chemical formula 12: reaction scheme with heterocyclase]

[0137] In the above formula (I), Xaa0 represents an arbitrary amino acid insofar as it contains at least one of Cys, Ser, Thr, 2,3-diamino acid, and an analog thereof. As described above, although it has been considered that an azoline backbone-introducing enzyme such as PatD modifies only a cassette domain having 7 or 8 amino acids and having predetermined regularity, a wide sequence with m of from 2 to 40 becomes a substrate.

[0138] In the above-mentioned formula (I), Xaa1s each independently represent an arbitrary amino acid. As described above, it has been conventionally considered that in a peptide which is a substrate of an azoline backbone-introducing enzyme such as PatD, residues to be modified are rarely adjacent to each other, but as shown in Examples which will be described later, even if Xaa1 represents Cys, Ser, or Thr and is the same amino acid as that of Xaa2 adjacent thereto, the peptide of the formula (I) may be a substrate of an azoline backbone-introducing enzyme.

[0139] In addition, it has been conventionally considered that in a peptide which may be a substrate of an azoline backbone-introducing enzyme such as PatD, many of residues other than Ser, Cys, and Thr are hydrophobic amino acid residues, but as shown later in Examples, a peptide having, as Xaa1 thereof, a hydrophilic amino acid may be a substrate of an azoline backbone-introducing enzyme.

[0140] In the above formula (I), m represents an integer of 2 or greater, and 16 or less, 18 or less, 20 or less, 30 or less, or 40 or less; and n represents an integer of 1 or greater, and 8 or less, 9 or less, 10 or less, or 20 or less. As described above, it has conventionally been considered that in the peptide which may be a substrate of an azoline backbone-introducing enzyme such as PatD, m=7 or 8, meaning that n=3 or 4, but as will be shown later by Examples, the peptide of the formula (I) having, as m, 9 or greater and, as n, 5 or greater may also be a substrate of the azoline backbone-introducing enzyme.

[0141] In the above formula (I), A and B each independently represent a peptide composed of from 0 to 100 amino acids. A may contain the entirety or a portion of a recognition sequence 1 of an azoline backbone-introducing enzyme. It may contain the entirety or a portion of a leader sequence of PatE, a His tag, a linker, and the like. B may contain the entirety or a portion of a recognition sequence 2. It may contain a His tag, a linker, and the like. A and B each may have a length of, for example, 100, 70, 60, 50, 40, 30, 20, 10, 5, 2,

or 0 amino acid(s), but the length is not limited thereto. In A or B, one or several amino acids may be a modified amino acid or amino acid analog.

[0142] The construction method of an azoline compound library according to one embodiment of the present invention includes a step of constructing an mRNA library encoding a precursor peptide comprising, in order of mention from the N-terminus, the recognition sequence 1 of an azoline backbone-introducing enzyme, the -(Xaa1-Xaa2)n-, and the recognition sequence 2 of an azoline backbone-introducing enzyme.

[0143] The recognition sequences 1 and 2 of an azoline backbone-introducing enzyme are recognition sequences composed of from 0 to 10 amino acids and to be recognized by an azoline backbone-introducing enzyme. They may have any sequence insofar as the azoline backbone-introducing enzyme recognizes them and introduces an azoline backbone into -(Xaa0)m-. When the azoline backbone-introducing enzyme is PatD, for example, G(A/L/V)(G/E/D)(A/P)(S/T/C) (SEQ ID NO: 2) can be used as the recognition sequence 1. It may be, for example, GLEAS (SEQ ID NO: 3). As the recognition sequence 2 by the azoline backbone-introducing enzyme, that containing (A/S)Y(D/E)G(A/L/V) (SEQ ID NO: 4) can be used. As such a sequence, for example, AYDGVEPS (SEQ ID NO: 5), AYDGV (SEQ ID NO: 6), AYDGVGSGSGS (SEQ ID NO: 7), AYDGVGGGGGG (SEQ ID NO: 8), or AYDGVEGSGSGS (SEQ ID NO: 9) may be used.

[0144] As shown later in Examples, the above-mentioned precursor peptide may become a substrate of the azoline backbone-introducing enzyme even if it does not have the recognition sequence 1 and/or 2, and thus the recognition sequences 1 and 2 are optional constituting elements of the precursor peptide.

[0145] Further, as shown later in Examples, the above-mentioned precursor peptide may become a substrate of the azoline backbone-introducing enzyme even if it uses a sequence utterly unrelated to sequences conventionally known as a recognition sequence.

[0146] In particular, the precursor peptide having the recognition sequence 1 on the N-terminus side of -(Xaa1-Xaa2)n- is susceptible to modification with the azoline backbone-introducing enzyme, but the sequence is not particularly limited insofar as it is present. As the recognition sequence 1, a sequence, for example, GGGGG, QQQQQ, LLLLL, or PPPPP may be used.

[0147] As will be described later, the above-mentioned precursor peptide may be fused further with a peptide having, on the N-terminus side thereof, a leader sequence.

[0148] The leader sequence, the recognition sequence 1 by the azoline ring-introducing enzyme, -(Xaa0)m-, and the recognition sequence 2 by the azoline ring-introducing enzyme may be adjacent to each other in the precursor peptide. The precursor peptide may have a sequence of from one to several amino acids between the leader sequence, the recognition sequence 1, -(Xaa0)m-, and the recognition sequence 2 insofar as it is expressed as a precursor peptide in a cell-free expression system and is subjected to modification with the azoline ring-introducing enzyme.

[0149] An mRNA library encoding -(Xaa0)m- can be constructed by synthesizing a DNA containing a sequence such as -(NNN)m-, -(NNK)m-, -(NNU)m-, or -(NNG)m-, and transcribing it. Here, "N" means any one of A, C, G, and T; "K" means either one of G and T; NNN and NNK each encode any one of 20 amino acids; and NNU and NNG encode any one of 15 and 14 protein amino acids, respectively.

[0150] An mRNA library encoding -(Xaa1-Xaa2)n- can be constructed, for example, by synthesizing a DNA containing a sequence such as -(NNK-WSU)n- or -(NNK-UGU)n- and transcribing it. Here, N means any one of A, C, G, and T; K means either one of G and T; W means either one of A and T; and S means either one of C and G. NNN and NNK each encode any one of 20 protein amino acids; WSU encodes any one of Ser, Thr, and Cys; and UGU encodes Cys.

[0151] When such a constitution is employed, a sufficient size of library can be obtained. For example, in the case of -(Xaa0)m- and m=10, 20^10 kinds of variants can be prepared theoretically even from only 20 natural amino acids; and in the case of -(Xaa1-Xaa2)n- and n=5, 20^5 x 3^5 kinds of mutants can be prepared.

[0152] By synthesizing a DNA encoding -(Xaa0)m-, containing, at the 5' end thereof, a DNA encoding the recognition sequence 1 and, at the 3' end thereof, a DNA encoding the recognition sequence 2 and transcribing it, an mRNA encoding a precursor peptide comprising the recognition sequence 1, -(Xaa1-Xaa2)n-, and the recognition sequence 2 can be obtained. By synthesizing a DNA further containing, on the 5' end side of the DNA encoding the recognition sequence 1, a DNA encoding a leader sequence and transcribing it, an mRNA encoding a precursor peptide comprising the leader sequence, the recognition sequence 1, -(Xaa1-Xaa2)n-, and the recognition sequence 2 can be obtained.

[0153] The construction method of an azoline compound library according to one embodiment of the present invention includes a step of using the above-mentioned mRNA library to express the above-mentioned precursor peptide with a cell-free translation system and thereby constructing a peptide library.

[0154] The cell-free translation system contains, for example, ribosome protein, aminoacyl tRNA synthetase (ARS), ribosome RNA, amino acid, GTP, ATP, translation initiation factor (IF), extension factor (EF), release factor (RF), ribosome regeneration factor (RRF), and other factors necessary for translation. An Escherichia coli extract or wheat germ extract may be used for enhancing the expression efficiency. Alternatively, a rabbit erythrocyte extract or insect cell extract may be used.

[0155] From several hundred micrograms to several milligram/mL of protein can be produced by continuously supplying the system containing them with energy under dialysis. The system may contain an RNA polymerase for simultaneously conducting transcription from a gene DNA. Commercially available cell-free translation systems that can be used include E. coli-derived systems such as "RTS-100" (registered trademark), product of Roche Diagnostics, and "PURESYSTEM" (registered trademark), product of PGI, and systems using a wheat germ extract available from ZOEGENE Corporation and CellFree Sciences Co., Ltd.

[0156] When the cell-free translation system is used, a peptide can be modified in one pot by adding a post-translation modification enzyme to the same container without purifying an expression product.

[0157] The construction method of an azoline compound library according to one embodiment of the present invention includes a step of reacting an azoline backbone-introducing enzyme and the above-mentioned peptide library in the presence of a peptide comprising a leader sequence of a substrate of the azoline backbone-introducing enzyme, and thereby introducing an azoline backbone into at least one of Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof. More specifically, an oxazoline backbone is introduced into Ser and Thr, a thiazoline backbone is introduced into Cys, and an imidazoline backbone is introduced into 2,3-diamino acid.

[0158] The leader sequence is a sequence that facilitates modification with the azoline backbone-introducing enzyme and any sequence may be used insofar as it satisfies this object. As will be described later in Examples, the precursor peptide comprising the recognition sequence 1, -(Xaa1-Xaa2)n-, and the recognition sequence 2 may become a substrate of the azoline backbone-introducing enzyme without the leader sequence. The leader sequence may be composed of, for example, about 5 amino acids, about 7 amino acids, about 10 amino acids, about 20 amino acids, about 30 amino acids, about 40 amino acids, or about 50 amino acids.

[0159] As the leader sequence, for example, a peptide composed of the following amino acid sequence; a partial sequence of this amino acid sequence; or an amino acid sequence obtained by deleting, adding, or substituting one to several amino acids in this amino acid sequence may be used.

[0160] MNKKNILPQQGQPVIRLTAGQLSSQLAELSEEALGDA (SEQ ID NO: 1)

[0161] The partial sequence of the amino acid sequence having SEQ ID NO: 1 is a sequence containing at least four successive amino acids, five successive amino acids, or six successive amino acids of this amino acid sequence. Although no particular limitation is imposed on the position of these amino acids in SEQ ID NO: 1, the partial sequence contains four amino acids, five amino acids, or six amino acids, for example, at the C-terminus of the amino acid sequence of SEQ ID NO: 1.

[0162] Further, as will be shown later in Examples, the above-mentioned precursor peptide may become a substrate of an azoline backbone-introducing enzyme even when using, as the leader sequence, a sequence entirely unrelated to a leader sequence of PatE conventionally known as the leader sequence. For example, as the leader sequence, a sequence such as MKEQNSFNLLQEVTESELDLILGA derived from another peptide (Lacticin 481 precursor), a sequence such as MILASLSTFQQMWISKQEYDEAGDA derived from human actin, or a sequence such as MELQLRPSGLEKKQAPISELNIAQTQGGDSQVLALNA obtained by shuffling the leader sequence of PatE may be used.

[0163] As the leader sequence, a sequence having high helicity (α helicity) may be used.

[0164] In the phrase "in the presence of a peptide comprising a leader sequence of a substrate of the azoline backbone-introducing enzyme" as used herein, the peptide may be, in the reaction system, present as an independent peptide from the precursor peptide comprising the recognition sequence 1, -(Xaa1-Xaa2)n-, and the recognition sequence 2 or have been fused with the precursor peptide comprising the recognition sequence 1, -(Xaa1-Xaa2)n-, and the recognition sequence 2.

[0165] In the substrate of natural PatD, the leader sequence, recognition sequence 1, cassette sequence, and recognition sequence 2 are fused with each other, but as will be shown later in Examples, the present inventors have found that in reacting the precursor peptide comprising the recognition sequence 1, -(Xaa1-Xaa2)n-, and the recognition sequence 2 with the azoline backbone-introducing enzyme, the precursor peptide may become a substrate of the azoline backbone-introducing enzyme even if the leader sequence is added to the reaction system as an independent peptide.

[0166] When the leader sequence is allowed to exist as a peptide independent from the precursor peptide comprising the recognition sequence 1, -(Xaa1-Xaa2)n-, and the recognition sequence 2, each of these peptides (that is, the peptide comprising the leader sequence and the precursor peptide) becomes shorter, which makes preparation easy. When the leader sequence is fused to the precursor peptide, the leader sequence is desirably cleaved therefrom prior to screening, because there is a possibility of causing steric hindrance when the peptide is bound to a target molecule. When the peptide comprising the leader sequence is originally allowed to exist as a peptide independent from the precursor peptide, cleaving it therefrom is not necessary.

[0167] These two peptides may be prepared by either translational synthesis or chemical synthesis.

[0168] Examples of the azoline backbone-introducing enzyme include PatD and enzymes having homology therewith. As the enzyme having homology with PatD, those included in the report of Lee et al. (Lee, S. W. et al., PNAS Vol. 105, No. 15, 5879-5884, 2008) may be used, but it is not limited to them.

[0169] The azoline backbone-introducing enzyme may be extracted/purified from microorganisms producing the azoline backbone-introducing enzyme or may be expressed by gene recombination. For example, an azoline backbone-introducing enzyme can be expressed in Escherichia coli as a construct having at the N-terminus thereof a His tag and purified by making use of the His tag according to a conventional manner. The azoline backbone-introducing enzyme may be a mutant thereof insofar as it has an azoline backbone-introducing ability.

[0170] The reaction between the azoline backbone-introducing enzyme and the peptide library can be conducted by adding the azoline backbone-introducing enzyme in the container in which the precursor peptide was expressed, that is, in one pot without purifying the precursor peptide. The reaction between the azoline backbone-introducing enzyme and the peptide library can be conducted under appropriate conditions selected by those skilled in the art and, for example, when the azoline backbone-introducing enzyme is PatD, the conditions are selected within a range of a final concentration of from 0.1 μM to 50 μM, a reaction temperature of from 4°C to 45°C, and a reaction time of from 5 minutes to 100 hours.

[0171] Confirmation of the reaction can be conducted by measuring a mass change by using, for example, MALDI-TOF MS.

[0172] As the construction method of an azoline compound library according to one embodiment of the present invention, when the leader sequence is fused with the precursor peptide comprising the recognition sequence 1, -(Xaa1-Xaa2)n-, and the recognition sequence 2, a step of cleaving the leader sequence from the precursor peptide may be conducted. This facilitates binding of a cassette domain portion represented by -(Xaa0)m- to a target substance.

[0173] Cleavage of the leader sequence can also be conducted by adding a peptidase in the container where the reaction of the azoline backbone-introducing enzyme was conducted.

[0174] Cleavage of the leader sequence may be conducted by cleaving in the middle of the leader sequence, in the middle of the recognition sequence 1, at the binding site between the leader sequence and the recognition sequence 1, at the binding site between the recognition site 1 and the cassette domain, and at the binding site between the cassette domain and the recognition site 2. The kind of a peptidase may be selected depending on the sequence at the cleavage site. Examples of the peptidase include, but are not limited to, trypsin, Glu-C, Lys-C, Asp-N, Lys-N, Arg-C, thrombin, Factor Xa, prescission protease, TEV protease, enterokinase, and HRV 3C Protease.

[0175] As one example, when GLEAS is used as the recognition sequence 1 of the azoline backbone-introducing enzyme, endoproteinase Glu-C can be used and cleaving is conducted between Glu and Ala. The reaction of Glu-C can be conducted in a known manner.

[0176] In one embodiment of the construction method of an azoline compound library according to the present invention, an azoline compound library containing at least two complexes between an azoline compound and an mRNA encoding the peptide represented by the formula (I) is constructed. This makes it possible to apply the azoline compound library to mRNA display (Nemoto, N. et al., FEBS Lett. 1997, 405-408; Roberts, R. W. and Szostak, J. W. Proc. Natl. Acad. Sci. USA 1997, 94, 12297-12302).

[0177] By using such an azoline compound-mRNA complex library and conducting screening of an azoline compound that binds to a target substance, it is possible to obtain a cDNA-containing complex by a reverse transcription reaction of the azoline compound-mRNA complex selected and determine the base sequence of it.

[0178] The azoline compound-mRNA complex can be prepared, for example, by binding puromycin to the 3' end of each of mRNAs of the mRNA library in a known manner to prepare a puromycin-bound mRNA library and expressing a precursor peptide in a cell-free translation system by using this puromycin-bound mRNA library.

[0179] After preparation of the peptide-mRNA complex library in such a manner, it is reacted with PatD and then a leader sequence is cleaved if necessary to obtain an azoline compound library.

[0180] In the construction method of an azoline compound library according to one embodiment of the present invention, at least one of peptides represented by the formula (I) has at least one of the following characteristics (i) to (iv):

[0181] (i) m represents from 2 to 40 (with the proviso that 7 and 8 are excluded) and m is, for example, 2, 3, 4, 5, 6, 9, 10, 12, 14, 16, 20, 30, or 40;

[0182] (ii) n represents from 1 to 20 (with the proviso that 3 and 4 are excluded) and n is, for example, 1, 2, 5, 6, 7, 8, 10, 15, or 20;

[0183] (iii) at least one, at least two, at least three, at least four, or at least five of Xaa1s is (are) an amino acid selected from Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof; and

[0184] (iv) at least one, at least two, at least three, at least four, or at least five of Xaa1s is (are) a hydrophilic amino acid.

[0185] Peptides having at least one of the characteristics of (i) to (iv) include peptides which have conventionally been considered unsuitable as a substrate of the azoline backbone-introducing enzyme such as PatD, and the present inventors have confirmed, for the first time, that they become a substrate of the azoline backbone-introducing enzyme.

[0186] The following are cassette sequences confirmed for the first time as a substrate of the azoline backbone-introducing enzyme.

TABLE 3

Name     -(Xaa0)m-            SEQ ID NO:
C1m1     WTACIT               10
C1m2     WTACITECWT           11
C1m3     WTATITET             12
C1m4     WCACICFC             13
C1m5     WSASISFS             14
C1m6     WTADITFC             15
C1m7     WTANITFC             16
C1m8     WTAKITFC             17
C1m9     CFTICATV             18
C1m10    WTACITFCWTIC         19
C1m11    WTACDTFC             20
C1m12    WTACNTFC             21
C1m13    WTACKTFC             22
C1m14    WCACDCFC             23
C1m15    WCACNCFC             24
C1m16    WCACKCFC             25
C1m17    WCATITFT             26
C1m18    WTACITFT             27
C1m19    WTATICFT             28
C1m20    WTATITFC             29
C1m21    CFCCACV              30
C1m22    DTACITFC             31
C1m23    NTACITFC             32
C1m24    KTACITFC             33
C1m25    WTACETFC             34
C1m26    WTACDTFC             35
C1m27    WTACHTFC             36
C1m28    WTACRTFC             37
C1m29    WTACPTFC             38
C1m30    WFALIMFC             39
C1m31    WFALIMCC             40
C1m32    WFALCCCC             41
C1m33    WCACRCFC             42
C1m34    RTACITFC             43
C1m35    WFALCCCC             44
C1m36    RCDCDCRC             45
C1m37    WCACICFCWCACVC       46
C1m38    WCACICFCWCACVCIC     47

TABLE 3-continued

-(Xaa0)m-           SEQ ID NO:
VTATDTET            48
VSASDSFS            49

TABLE 4

-(Xaa0)m-           SEQ ID NO:
VTATITFTVTIT        50
RCRCICFCVCACVC      51
VCACICRCRCACVC      52
VCACICFCVCRCRC      53
VTATICFC            54
VCATITFC            55
VCACITFT            56
VCATICFT            57

TABLE 4-2

Mutant    -(Xaa0)m-
m049      VC
m050      VCAC
m051      VCACICFCVCACVCICYCFCIC
m052      VCACICFCVCVCFCYCACYCIC FCACVCICYCFCIC
m053      RTDTDTRT
m054      RSDSDSRS
m055      CCCCCC
m056      TTTTTT
m057      SSSSSS
m058      VFATITFT
m059      CFATITFT
m068      VFALCCCC
m119      VT
m196      VTAC
m197      VTACITFCVTAC
m198      VTACITFCVTACVSIC
m199      VTACITFCVTACVSICYTFCIT
          TACITFCVTACVSICYTE CIT CATVCISYCFTIC
          VTACITFCVTACVTIC
          VTACITFCVTACVTICYTFCIT
m203      VTACITFCVTACVTICYTFCIT FCATVCITYCFTIC

[0187] In the construction method of an azoline compound library according to one embodiment of the present invention, at least one of the azoline compounds has 5 or more azoline backbones.

[0188] It has conventionally been considered that 5 or more azoline backbones cannot be introduced into the cassette domain of natural PatE even if an azoline backbone-introducing enzyme is used. The present inventors have succeeded, for the first time, in synthesizing an azoline compound having 5 or more azoline backbones.

[0189] In one embodiment of the construction method of an azoline compound library according to the present invention, at least one of the azoline compounds has an azoline backbone other than the ordinary oxazoline/thiazoline obtained from Ser/Thr/Cys.

[0190] It has conventionally been considered that an azoline backbone is not introduced into an amino acid other than Ser, Thr, and Cys present in the cassette domain of natural PatE even if an azoline backbone-introducing enzyme is used. The present inventors have for the first time synthesized compounds in which an imidazoline backbone or a substituted azoline backbone derived from non-protein amino acids such as Dap, tBuSer, iPrSer, PhSer, and EtSer has been introduced using an azoline backbone-introducing enzyme such as PatD.

[0191] The construction method of an azoline compound library according to one embodiment of the present invention includes a step of macrocyclizing an azoline compound before or after cleavage of the leader sequence. Macrocyclization of an azoline compound can be conducted by a known method of macrocyclizing a peptide or a method equivalent thereto. For example, it can be conducted in accordance with the method disclosed in WO2008/117833 or the method of Timmerman, et al. (Timmerman, P. et al., ChemBioChem 2005, 6: 821-824).

(Construction Method of Azole Compound Library)

[0192] The present invention also provides a construction method of an azole compound library containing two or more azole compounds. Of the terms used in the construction method of an azole compound library according to the present invention, those also used in the above-mentioned construction method of an azoline compound library are regarded as having the same meanings unless otherwise particularly specified.

[0193] The term "azole compound" as used herein means a compound having an azole backbone introduced into at least one of Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof, of a peptide represented by the following formula (I):

A-(Xaa0)m-B (I)

(wherein, m numbers of Xaa0s represent arbitrary amino acids, at least one of which is an amino acid selected from the group consisting of Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof, m represents an integer selected from 2 to 20, and A and B each independently represent a peptide composed of from 0 to 8 amino acids).

[0194] In one embodiment of the present invention, the azole compound is a compound of the formula (I) in which -(Xaa0)m- is -(Xaa1-Xaa2)n- (wherein, n numbers of Xaa1 each independently represent an arbitrary amino acid, n numbers of Xaa2 each independently represent an amino acid selected from the group consisting of Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof, and n represents an integer selected from 1 to 10).

[0195] The Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof into which the azole backbone has been introduced may be either at a position of Xaa1 or at a position of Xaa2.

[0196] The term "introducing an azole backbone into at least one of Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof" as used herein means that an oxidation reaction of an azoline ring produced by the dehydration reaction of Cys, Ser, or Thr proceeds to introduce an azole ring represented by the following formulas:

[Chemical formula 13: azole rings]

[0197] Introduction of an azole backbone into Ser, Thr, Cys, or 2,3-diamino acid produces an oxazole, thiazole, or imidazole backbone as follows:

[Chemical formula 14: Ser, heterocyclase followed by oxidase]

[Chemical formula 15: Thr, heterocyclase followed by oxidase]

[Chemical formula 16: Cys, heterocyclase followed by oxidase]

[Chemical formula 17: Dap, heterocyclase followed by oxidase]

[0198] For example, introduction of an azole backbone into the above-mentioned artificial Thr analog residue produces the following oxazole backbone.

[Chemical formula 18: Thr analog with substituent R, heterocyclase followed by oxidase]

[0199] Introduction of an azole backbone into the above-mentioned artificial Cys analog residue produces the following thiazole backbone.

[Chemical formula 19: Cys analog with substituent R, heterocyclase followed by oxidase]

[0200] Introduction of an azole backbone into the above-mentioned artificial diamino acid analog residue produces the following imidazole backbone.

[Chemical formula 20: diamino acid analog with substituent R, heterocyclase followed by oxidase]

[0201] In the construction method of an azole compound library according to one embodiment of the present invention, the azoline backbone-introducing step in the above-mentioned construction method of an azoline compound library is followed by a step of reacting the library having azoline backbones introduced therein with an azole backbone-introducing enzyme to convert at least one of the azoline backbones into an azole backbone. The oxazoline backbone introduced into Ser or Thr, the thiazoline backbone introduced into Cys, and the imidazoline backbone introduced into 2,3-diamino acid are converted into an oxazole backbone, a thiazole backbone, and an imidazole backbone, respectively. The step of reacting the library having an azoline backbone introduced therein with the azole backbone-introducing enzyme may be conducted in the presence of a peptide comprising a leader sequence of a substrate of the azole backbone-introducing enzyme. The peptide comprising a leader sequence of a substrate of the azole backbone-introducing enzyme may be the same as or different from the peptide comprising a leader sequence of a substrate of the azoline backbone-introducing enzyme.

[0202] Examples of the azole backbone-introducing enzyme include PatG and enzymes having homology therewith. Examples of the enzymes having homology with PatG include, but are not limited to, those included in the report of Lee, et al. (Lee, S. W. et al., PNAS vol. 105, No. 15, 5879-5884, 2008).

[0203] The reaction with the azole backbone-introducing enzyme can be conducted by adding the azole backbone-introducing enzyme in the container in which the reaction with the azoline backbone-introducing enzyme has been conducted.

[0204] In the construction method of an azole compound library according to one embodiment of the present invention, a mutant obtained by deleting the peptidase domain from PatG or a mutant which has lost its peptidase activity by point mutation is used as the azole backbone-introducing enzyme. PatG is composed of two domains. In the natural enzyme, an oxidase domain at the N-terminus is involved in conversion of an azoline backbone constructed by PatD into an azole backbone, while a peptidase domain at the C-terminus is involved in cleaving of the peptide site and macrocyclization after modification. Accordingly, in the construction method of an azole compound library according to the present invention, a peptidase domain-deficient mutant or a mutant that has lost its peptidase activity as a result of point mutation can be used.

(Azoline Compound Library)

[0205] The present invention also provides an azoline compound library containing two or more azoline compounds. Of the terms used here, those also used in the above-mentioned construction method of a library are regarded as having the same meanings unless otherwise particularly specified.

[0206] The azoline compound library according to one embodiment of the present invention includes at least one azoline compound having at least one of the following characteristics (i) to (iv):

[0207] (i) m represents from 2 to 20 (with the proviso that 7 and 8 are excluded) and m is, for example, 2, 3, 4, 5, 6, 9, 10, 12, 14, 16, 20, 30, or 40;

[0208] (ii) n represents from 1 to 10 (with the proviso that 3 and 4 are excluded) and n is, for example, 1, 2, 5, 6, 7, 8, 10, 15, or 20;

[0209] (iii) at least one, at least two, at least three, at least four, or at least five of Xaas is (are) an amino acid selected from Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof; and

[0210] (iv) at least one, at least two, at least three, at least four, or at least five of Xaas is (are) a hydrophilic amino acid.

[0211] Peptides having at least one of the characteristics (i) to (iv) include peptides which have conventionally been considered unsuitable as a substrate of an azoline backbone-introducing enzyme such as PatD, and the present inventors have confirmed for the first time that they become a substrate of the azoline backbone-introducing enzyme.

[0212] In the azoline compound library according to one embodiment of the present invention, the azoline compounds each constitute a complex with mRNA encoding the peptide portion thereof. The library may be applicable to mRNA display.

[0213] In the azoline compound library according to one embodiment of the present invention, the entirety or a portion of the recognition sequence of the azoline backbone-introducing enzyme may have been bound to the N-terminus and C-terminus of the peptide represented by the formula (I).

[0214] The recognition sequence of the azoline backbone-introducing enzyme is necessary for modifying the peptide, which has been expressed in a cell-free translation system, with the azoline backbone-introducing enzyme. When an azoline compound library is constructed using the method of the present invention and the leader sequence is cleaved using a peptidase, the entirety or a portion of these sequences may remain at the N-terminus and C-terminus.

[0215] For example, supposing that the recognition sequence 1 is GLEAS (SEQ ID NO: 3) and the leader sequence is cleaved using Glu-C, Ala-Ser remains on the N-terminus side of the cassette domain.

(Azole Compound Library)

[0216] The present invention also provides an azole compound library containing two or more azole compounds. Of the terms used here, those also used in the above-mentioned construction method of a library are regarded as having the same meanings unless otherwise particularly specified.

[0217] The azole compound library according to one embodiment of the present invention includes at least one azole compound having at least one of the following characteristics (i) to (iv):

[0218] (i) m represents from 2 to 20 (with the proviso that 7 and 8 are excluded) and m is, for example, 2, 3, 4, 5, 6, 9, 10, 12, 14, 16, 20, 30, or 40;

[0219] (ii) n represents from 1 to 10 (with the proviso that 3 and 4 are excluded) and n is, for example, 1, 2, 5, 6, 7, 8, 10, 15, or 20;

[0220] (iii) at least one, at least two, at least three, at least four, or at least five of Xaas is (are) an amino acid selected from Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof; and

[0221] (iv) at least one, at least two, at least three, at least four, or at least five of Xaas is (are) a hydrophilic amino acid.

[0222] Peptides having at least one of the characteristics (i) to (iv) include peptides which have conventionally been considered unsuitable as a substrate of an azoline backbone-introducing enzyme such as PatD, and the present inventors have confirmed for the first time that they become a substrate of the azoline backbone-introducing enzyme. It is therefore understood that an azole backbone may be introduced by introducing an azoline backbone by using the azoline backbone-introducing enzyme and then conducting modification with the azole backbone-introducing enzyme.

[0223] In the azole compound library according to one embodiment of the present invention, the azole compounds each constitute a complex with mRNA encoding the peptide portion thereof. The library may be applicable to mRNA display.

[0224] In the azole compound library according to one embodiment of the present invention, the entirety or a portion of the recognition sequence of PatD may have been bound to the N-terminus and C-terminus of the peptide represented by the formula (I).

[0225] The recognition sequence of PatD is necessary for modifying the peptide, which has been expressed in a cell-free translation system, with PatD. When an azole compound library is constructed using the method of the present invention and the leader sequence is cleaved using a peptidase, the entirety or a portion of these sequences may remain at the N-terminus and C-terminus.

[0226] For example, supposing that the recognition sequence 1 is GLEAS and the leader sequence is cleaved using Glu-C, Ala-Ser remains on the N-terminus side of the cassette domain.

(Screening Method of Azoline Compound)

[0227] The present invention provides a screening method for identifying an azoline compound that binds to a target substance.

[0228] The screening method according to one embodiment of the present invention includes a step of bringing the azoline compound library constructed by the construction method of the present invention or the azoline compound library according to the present invention into contact with a target substance, and incubating the resulting mixture.

[0229] The target substance is not particularly limited herein and may be, for example, a small molecular compound, a high molecular compound, a nucleic acid, a peptide, a protein, or the like. In particular, according to the library of the present invention, a target substance having protease activity can also be used.

[0230] The target substance, for example, immobilized on a solid phase support may be brought into contact with the library of the present invention. The "solid phase support" to be used herein is not particularly limited insofar as it is a carrier onto which a target substance can be immobilized, and examples include microtiter plates, a substrate and beads made of glass, a metal, a resin, or the like, nitrocellulose membrane, nylon membrane, and PVDF membrane. The target substance can be immobilized onto such a solid phase support in a known manner.

[0231] The target substance and the library may be brought into contact with each other in a buffer selected as needed and reacted while controlling pH, temperature, time, or the like.

[0232] The screening method according to one embodiment of the present invention may further include a step of selecting the azoline compound bound to the target substance. The azoline compound may be labeled using a known method capable of detectably labeling peptides before it is bound to the target substance. After the step of bringing them into contact, the surface of the solid phase support may be washed with a buffer to detect the azoline compound which has bound to the target substance.

[0233] Examples of the detectable label include enzymes such as peroxidase and alkaline phosphatase, radioactive substances such as 125I, 131I, 35S, and 3H, fluorescent substances such as fluorescein isothiocyanate, rhodamine, dansyl chloride, phycoerythrin, tetramethyl rhodamine isothiocyanate, and infrared fluorescent materials, light-emitting substances such as luciferase, luciferin, and aequorin, and nanoparticles such as gold colloid and quantum dot. When the label is an enzyme, the azoline compound can be detected by adding a substrate of the enzyme to develop a color. The peptide may also be detected by binding biotin thereto and then binding avidin or streptavidin labeled with an enzyme or the like to the biotin-bound peptide.

[0234] It is possible not only to detect or analyze the presence/absence or degree of binding but also to analyze the enhanced or inhibited activity of the target substance and thereby identify an azoline compound having such enhancing or inhibitory activity. Such a method makes it possible to identify an azoline compound having physiological activity and useful as a pharmaceutical.

[0235] When the azoline compound library is composed of azoline compound-mRNA complexes, after an azoline compound bound to a target substance is selected by the above-mentioned method, a step of analyzing the base sequence of the mRNA of the azoline compound thus selected may be conducted.

[0236] The analysis of the base sequence of the mRNA can be conducted by synthesizing cDNA by using a reverse transcription reaction and then analyzing the base sequence of the resulting cDNA. This makes it possible to easily specify the relationship between a genotype and a phenotype.

[0237] An azoline compound having strong binding ability to the target substance may be concentrated by conducting transcription further after the reverse transcription reaction to convert the library into the mRNA library again, and repeating screening with the target compound.

(Screening Method of Azole Compound)

[0238] The present invention provides a screening method for identifying an azole compound that binds to a target substance.

[0239] The screening method according to one embodiment of the present invention may include a step of bringing the azole compound library constructed using the construction method of an azole compound library according to the present invention or the azole compound library according to the present invention into contact with a target substance, followed by incubation.

[0240] The screening method of an azole compound can be conducted in accordance with the above-mentioned screening method of an azoline compound.

(Screening Kit)

[0241] The present invention provides a screening kit of an azoline compound or azole compound.

[0242] The screening kit according to one embodiment of the present invention includes an azoline compound library or azole compound library constructed by the construction method of the present invention or the azoline compound library or azole compound library according to the present invention. The screening kit of the present invention includes, in addition, a reagent and an apparatus necessary for detecting the binding between a target substance and an azoline compound or azole compound. Examples of such a reagent and apparatus include, but are not limited to, solid phase supports, buffers, labeling reagents, enzymes, enzyme reaction terminator solutions, and microplate readers.

[0243] In the screening kit of the present invention, the library may be immobilized in array form on a solid phase support.

(Preparation Method of Azoline Compound)

[0244] The present invention provides a method of preparing an azoline compound in which at least one azoline backbone has been introduced into at least one of Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof of Xaa0 of a peptide represented by the formula (I).

[0245] This method includes:

[0246] a step of preparing an mRNA encoding a precursor peptide having, in order of mention from the N-terminus, a recognition sequence 1 by an azoline backbone-introducing enzyme, -(Xaa0)m-, and a recognition sequence 2 by an azoline backbone-introducing enzyme (the recognition sequences 1 and 2 meaning recognition sequences by an azoline backbone-introducing enzyme composed of from 0 to 10 amino acids);

[0247] a step of expressing a precursor peptide in a cell-free translation system by using the above-mentioned mRNA; and

[0248] a step of reacting the azoline backbone-introducing enzyme and the precursor peptide in the presence of a peptide having a leader sequence of a substrate of the azoline backbone-introducing enzyme to introduce an azoline backbone into at least one of Cys, Ser, Thr, and 2,3-diamino acid, and analogs thereof of Xaa0.

[0249] The peptide having a leader sequence and the precursor peptide having a recognition sequence 1, -(Xaa0)m-, and a recognition sequence 2 are separate peptides.

[0250] According to this method, the peptide having a leader sequence and the precursor peptide having a recognition sequence 1, -(Xaa0)m-, and a recognition sequence 2 may each be made shorter, so that they can be prepared easily. For the preparation, chemical synthesis, translational synthesis, or a combination of them can be employed. Prior to screening, separation of the leader sequence is not necessary.

[0251] The present method is therefore particularly advantageous when an azoline compound that binds to a target substance is identified by using the screening method of the present invention and then the azoline compound is mass produced.

(Preparation Method of Azole Compound)

[0252] The present invention embraces a preparation method of an azole compound by using the azoline compound obtained using the above-mentioned preparation method of an azoline compound.

[0253] The present method includes a step of reacting an azoline compound and an azole backbone-introducing enzyme in the presence of a peptide having a leader sequence of a substrate of the azole backbone-introducing enzyme and thereby converting at least one of the azoline backbones into an azole backbone.

[0254] According to the present method, an azole compound may be easily prepared because the peptide comprising a leader sequence and the azoline compound may each be made shorter. The preparation may be conducted using chemical synthesis, translational synthesis, or a combination of them. In addition, prior to screening, separation of the leader sequence is not necessary.

[0255] The present method is therefore particularly advantageous for mass production of an azole compound after identification of the azole compound that binds to a target substance.

(Kit for Preparing Azoline Compound or Azole Compound)

[0256] The kit for preparing an azoline compound or an azole compound according to the present invention serves to prepare an azoline compound or azole compound by using the above-mentioned method, and therefore includes an azoline backbone-introducing enzyme or azole backbone-introducing enzyme and a peptide having a leader sequence of a substrate of the azole backbone-introducing enzyme.

[0257] In addition, the present kit may include necessary reagents and instruments, an instruction manual, and the like.

Example

[0258] The present invention will next be described more specifically based on the Example. It should however be borne in mind that the present invention is not limited to or by the Example.

[0259] The synthesis scheme of an azoline compound library in the following Example is shown in FIG. 1.

1 Expression and Purification of PatD

[0260] A PatD gene was inserted into a pET16b plasmid to prepare a construct plasmid containing a 10xHis tag at the N-terminus thereof. It was transformed into an E. coli BL21(DE3)pLysS strain, followed by culturing at 30° C. When O.D. reached 0.4, 0.1 mM of IPTG was added to induce expression, followed by culturing overnight at 16° C. The cells collected were suspended in a lysis buffer (1 M NaCl, 10 mM imidazole, 50 mM HEPES-Na (pH 7.5)) and then lysed ultrasonically. The sample was filtered and purified using a His-Trap HP column. The column was equilibrated in advance with 10 CV of Buffer A (500 mM NaCl, 25 mM imidazole, 50 mM HEPES-Na (pH 7.5)) and, after injection of the sample therein, the protein was eluted from the column by gradually increasing the concentration of Buffer B (500 mM NaCl, 500 mM imidazole, 50 mM HEPES-Na (pH 7.5)) to obtain a pure PatD fraction. The sample thus obtained was concentrated to about 4 times with Amicon Ultra (Millipore) 30 kDa. Then, the buffer was exchanged with Storage Buffer (200 mM NaCl, 25 mM HEPES (pH 7.5), 10% glycerol) by using PD-10 (GE Lifescience) and the resulting sample was stored at -80° C.

2 Construction of PatE Plasmid (pET16b)

[0261] The following PatE sequence was subcloned into pET16b.

(SEQ ID NO: 58)
ATGAACAAGAAAAACATCCTGCCCCAACAAGGTCAACCGGTTATCCGCTTAACCGCAGG
ACAGTTGAGCTCGCAACTCGCCGAACTGTCTGAAGAAGCACTGGGCGACGCGGGGTTGG
AGGCAAGCGTTACGGCGTGTATCACGTTTTGTGCGTACGATGGCGTTGAGCCATCTATT
ACGGTCTGCATTAGTGTCTGCGCCTATGATGGGGAGTAA
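As a consistency check on the construct above, the open reading frame can be translated with the standard genetic code. The following is a minimal sketch, not part of the original disclosure: the names `translate`, `CODON_TABLE`, and `PATE_ORF` are illustrative assumptions, and the sequence is simply SEQ ID NO: 58 as printed above.

```python
# Minimal sketch: translate the PatE ORF (SEQ ID NO: 58 as printed above)
# with the standard genetic code. All names are illustrative assumptions.

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

PATE_ORF = (
    "ATGAACAAGAAAAACATCCTGCCCCAACAAGGTCAACCGGTTATCCGCTTAACCGCAGG"
    "ACAGTTGAGCTCGCAACTCGCCGAACTGTCTGAAGAAGCACTGGGCGACGCGGGGTTGG"
    "AGGCAAGCGTTACGGCGTGTATCACGTTTTGTGCGTACGATGGCGTTGAGCCATCTATT"
    "ACGGTCTGCATTAGTGTCTGCGCCTATGATGGGGAGTAA"
)


def translate(dna):
    """Translate from the first base and stop at the first stop codon."""
    peptide = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":  # stop codon
            break
        peptide.append(amino_acid)
    return "".join(peptide)


if __name__ == "__main__":
    print(translate(PATE_ORF))
```

Read this way, the translation begins with the leader sequence ending in GLEAS, followed by the cassette VTACITFC and the recognition sequence 2 AYDGVEPS.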

3 Preparation of DNA of Substrate Peptide

3-1 Preparation of PatEpre

[0262] DNA of PatEpre was prepared by conducting PCR twice with the PatE plasmid as a template. The underlined portion is DNA encoding GLEAS (SEQ ID NO: 3) of the recognition sequence 1. The region from position 48 to position 153 of SEQ ID NO: 59 is a code region of the leader sequence.

(SEQ ID NO: 59)
GGCGTAATACGACTCACTATAGGGTTAACTTTAACAAGGAGAAAAACATGAACAAGAAA
AACATCCTGCCCCAACAAGGTCAACCGGTTATCCGCTTAACCGCAGGACAGTTGAGCTC
GCAACTCGCCGAACTGTCTGAAGAAGCACTGGGCGACGCGGGGTTGGAGGCAAGC
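The landmarks of the PatEpre template described above (promoter, start of the coding region, and the codons for GLEAS at the 3' end) can be located with plain string searches. This is a hedged sketch only: the function name `find_landmarks` and the constant names are assumptions introduced for illustration, the sequence is SEQ ID NO: 59 as printed here, and positions are reported 1-based to match the text.

```python
# Minimal sketch: locate landmarks in the PatEpre template (SEQ ID NO: 59 as printed).
# Names below are illustrative assumptions, not part of the original disclosure.

PATE_PRE = (
    "GGCGTAATACGACTCACTATAGGGTTAACTTTAACAAGGAGAAAAACATGAACAAGAAA"
    "AACATCCTGCCCCAACAAGGTCAACCGGTTATCCGCTTAACCGCAGGACAGTTGAGCTC"
    "GCAACTCGCCGAACTGTCTGAAGAAGCACTGGGCGACGCGGGGTTGGAGGCAAGC"
)
T7_PROMOTER = "TAATACGACTCACTATAGGG"   # T7 promoter sequence including the initiating GGG
GLEAS_CODONS = "GGGTTGGAGGCAAGC"       # GGG TTG GAG GCA AGC = Gly Leu Glu Ala Ser


def find_landmarks(dna):
    """Return 1-based positions of the promoter, first ATG, and GLEAS codons (0 if absent)."""
    return {
        "t7_promoter": dna.find(T7_PROMOTER) + 1,
        "start_codon": dna.find("ATG") + 1,
        "gleas_codons": dna.rfind(GLEAS_CODONS) + 1,
    }


if __name__ == "__main__":
    marks = find_landmarks(PATE_PRE)
    for name, position in marks.items():
        print(f"{name}: position {position}")
    # Length of the coding region from the start codon to the 3' end of the template.
    print("coding length:", len(PATE_PRE) - marks["start_codon"] + 1)
```

The first ATG found this way falls at position 48, in agreement with the start of the code region stated in paragraph [0262].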

[0263] PatEpre encodes a peptide composed of the following amino acid sequence having the leader sequence and the recognition sequence (underlined portion).

MNKKNILPQQGQPVIRLTAGQLSSQLAELSEEALGDAGLEAS

[0264] Primers used for PCR of PatEpre are shown in the following table. The sequences of the primers are shown in a primer list which will be shown later.

TABLE 5

final product    1st F primer    1st R primer    2nd F primer    2nd R primer
PatEpre          pre-Ion a       pre-Ion c       pre-Ion b       pre-Ion c

3-2 Preparation of DNA of Substrate Peptide

3-2-1 Peptide Having AYDGVEPS as the Recognition Sequence 2

[0265] A mutant DNA having a mutation in the cassette domain thereof was prepared by conducting PCR twice or three times with PatEpre as a template. The peptide thus obtained had the following sequence:

MNKKNILPQQGQPVIRLTAGQLSSQLAELSEEALGDAGLEAS(XXX)AYDGVEPS

[0266] The sequence in (XXX) corresponding to the cassette domain and the primers used are shown below. The sequences of the primers are shown in the primer list which will be described later.

TABLE 6

mutant    Amino acid sequence in (XXX)    SEQ ID NO:     F primer    1st R primer    2nd R primer    3rd R primer
C1wt      VTACITFC                        (illegible)    T7ex5       wt-a            wt-b
C1m1      VTACIT                          10             T7ex5       m1-a            m1-b
C1m2      VTACITECVT                      11             T7ex5       m2-a            m2-b
C1m3      VTATITET                        12             T7ex5       m3-a            m3-b
C1m4      VCACICFC                        13             T7ex5       m4-a            m4-b
C1m5      VSASISFS                        14             T7ex5       m5-a            m5-b
C1m6      VTADITFC                        15             T7ex5       m6-a            wt-b
C1m7      VTANITFC                        16             T7ex5       m7-a            wt-b
C1m8      VTAKITFC                        17             T7ex5       m8-a            wt-b
C1m9      CFTICATV                        18             T7ex5       m9-a            (illegible)
C1m10     VTACITFCVTIC                    19             T7ex5       m10-a           m10-b           m10-c
C1m11     VTACDTFC                        20             T7ex5       m11-a           wt-b
C1m12     VTACNTFC                        21             T7ex5       m12-a           wt-b
C1m13     VTACKTFC                        22             T7ex5       m13-a           m13-b
C1m14     VCACDCFC                        23             T7ex5       m14-a           m14-b
C1m15     VCACNCFC                        24             T7ex5       m15-a           m15-b


TABLE 8-continued

             Sequence    SEQ ID NO:    Sequence        SEQ ID NO:
m32-GS       VFALICCC    41            AYDGVGGGGGG     7
m3-GG        VTATITET    12            AYDGVGGGGGG     8
m5-GG        VSASISFS    14            AYDGVGGGGGG     8
m11-GG       VTACDTFC    20            AYDGVGGGGGG     (illegible)
m12-GG       VTACNTFC    21            AYDGVGGGGGG     (illegible)
m13-GG       VTACKTFC    22            AYDGVGGGGGG     8
m32-GG       VFALICCC    41            AYDGVGGGGGG     8
wt-EGS       VTACITFC    60            AYDGVEGSGSGS    (illegible)
m11-EGS      VTACDTFC    20            AYDGVEGSGSGS    9

(2) indicates text missing or illegible when filed

[0269] Primers used for DNA preparation are shown below. The sequences of the primers are shown in the primer list which will be shown later.

TABLE 9

F primer    1st R primer    2nd R primer    3rd R primer

4 PatD Enzyme Reaction

[0270] The DNA prepared in 3-2 was transcribed and translated in a cell-free protein expression system of 2.5 µl scale in accordance with the method of Kawakami, et al. (Kawakami et al., Chemistry & Biology 15, 32-42 (2008)), the solution conditions were adjusted by adding 40 mM Tris-HCl (pH 8.0), 8 mM DTT, 4 mM MgCl2, and 0.8 mM ATP (each, final concentration), and then recombinant PatD was added.

[0271] The reaction was conducted under two conditions as described below.

TABLE 10

                                A      B
PatD final concentration        0.6    6
Reaction temperature (° C.)     34     25
Reaction time (h)               2      16

5 Mass Spectrometry Analysis of Peptide by Using MALDI-TOF-MS

[0272] The mass of the peptide was measured using MALDI-TOF-MS with sinapinic acid as a matrix, and whether or not a mass change occurred upon the addition of PatD was checked. The number of azoline rings introduced can be found from the mass change.

[0273] The results of mass spectrometry when C1wt, whose amino acid sequence in the cassette domain was the same as that of wild-type PatE, was modified with PatD are shown in FIG. 2. As a result, a change in molecular weight showing formation of four azoline rings was observed.

[0274] The results of C1m10, whose cassette domain was composed of 12 amino acids (in the formula (I), n=6), and C1m38, whose cassette domain was composed of 16 amino acids (in the formula (I), n=8), are shown in FIG. 3. It has been confirmed that 6 and 8 azoline rings were introduced into C1m10 and C1m38, respectively.

[0275] Other typical results are shown in FIG. 4. As shown in FIG. 4, it has been confirmed that even a sequence having hydrophilic residues in addition to Cys, Ser, or Thr, which has conventionally been considered unsuitable as a substrate of PatD, was modified by PatD.

[0276] The substrate tolerance of PatD confirmed based on the results of such a test is shown in FIG. 5. As shown in this figure, a variety of sequences including a sequence which has been considered unsuitable as a substrate of PatD were modified by PatD, and PatD was found to have sufficient substrate tolerance for the synthesis of an azoline compound library.

6 Glu-C Enzyme Reaction

[0277] After the PatD enzyme reaction, 0.5 µg of Glu-C (Roche) was added to 5.0 µl of the reaction solution. The resulting mixture was incubated at 25° C. for 2 hours to cleave the peptide. The results of the mass spectrometry analysis of the modified peptide which had been obtained as a result of such a test and from which the leader sequence had been cleaved are shown in FIG. 6.

7 Mass Spectroscopy after Glu-C Enzyme Reaction

[0278] With α-CHCA as a matrix, the mass of the peptide was measured using MALDI-TOF-MS and formation of a peptide from which the leader sequence had been cleaved was confirmed.

[0279] An example of the results is shown in FIG. 6. Formation of a peptide from which the leader sequence has been cleaved has been confirmed.

8 Synthesis of Library

[0280] DNA of PatEpre2 having a non-translation region suited for construction of a library was prepared. The underlined portion is a DNA encoding GLEAS (SEQ ID NO: 3), the recognition sequence 1. The sequence from position 46 to position 151 of SEQ ID NO: 61 is a code region of the leader sequence.
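Returning to the MALDI-TOF-MS readout of section 5: since each azoline ring is formed by a dehydration, the number of rings can be estimated by dividing the observed mass decrease by the mass of one water molecule. The sketch below illustrates that arithmetic under stated assumptions: one water (about 18.011 Da, monoisotopic) is lost per ring, the function names and the 1 Da tolerance are hypothetical, and the example mass shift is an idealized value, not a measured result from this Example.

```python
# Minimal sketch: relate a cassette sequence and a mass shift to azoline ring counts.
# Assumption: each azoline ring corresponds to the loss of one H2O (monoisotopic 18.011 Da).
# Function names and the tolerance are illustrative, not from the original disclosure.

WATER_LOSS_DA = 18.011


def count_candidate_residues(cassette, residues="CST"):
    """Count Cys, Ser, and Thr residues, i.e. the maximum number of azoline rings expected."""
    return sum(1 for amino_acid in cassette if amino_acid in residues)


def rings_from_mass_shift(delta_mass_da, tolerance_da=1.0):
    """Estimate the ring count from an observed mass change (negative for a mass loss)."""
    mass_loss = -delta_mass_da
    rings = round(mass_loss / WATER_LOSS_DA)
    if rings < 0 or abs(mass_loss - rings * WATER_LOSS_DA) > tolerance_da:
        raise ValueError("mass shift is not close to a whole number of dehydrations")
    return rings


if __name__ == "__main__":
    wild_type_cassette = "VTACITFC"  # cassette sequence given in the text
    print("candidate residues:", count_candidate_residues(wild_type_cassette))
    # Idealized shift for four dehydrations (4 x 18.011 Da), used only as an illustration.
    print("rings from shift:", rings_from_mass_shift(-4 * WATER_LOSS_DA))
```

For the wild-type cassette VTACITFC, the count of Cys/Ser/Thr residues is four, which matches the four azoline rings reported for C1wt.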

(SEQ ID NO: 61)
TAATACGACTCACTATAGGGTTAACTTTAAGAAGGAGATATACATATGA
ACAAGAAAAACATCCTGCCCCAACAAGGTCAACCGGTTATCCGCTTAAC
CGCAGGACAGTTGAGCTCGCAACTCGCCGAACTGTCTGAAGAAGCACTG
GGCGACGCGGGGTTGGAGGCAAGC

[0281] With PatEpre2 as a template, two libraries as described below were constructed by conducting PCR twice or three times by using a primer containing a random base sequence.

[8-1] (XC)n Library

[0282] A double-stranded DNA having the following sequence was prepared. (n=3 to 8)

(SEQ ID NO: 62)
TAATACGACTCACTATAGGGTTAACTTTAAGAAGGAGATATACATATGA
ACAAGAAAAACATCCTGCCCCAACAAGGTCAACCGGTTATCCGCTTAAC
CGCAGGACAGTTGAGCTCGCAACTCGCCGAACTGTCTGAAGAAGCACTG
GGCGACGCGGGGTTGGAGGCAAGC(NNKTGT)nGCGTACGATGGCGTTG
GT

[0283] Translation of this DNA results in synthesis of the following peptide:

(SEQ ID NO: 63)
MNKKNILPQQGQPVIRLTAGQLSSQLAELSEEALGDAGLEAS(XC)nAY
DGVGSGSGS

[8-2] (X-C/S/T)n Library

[0284] A double-stranded DNA having the following sequence was prepared. (n=3 to 6)

(SEQ ID NO: 64)
TAATACGACTCACTATAGGGTTAACTTTAAGAAGGAGATATACATATGA
ACAAGAAAAACATCCTGCCCCAACAAGGTCAACCGGTTATCCGCTTAAC
CGCAGGACAGTTGAGCTCGCAACTCGCCGAACTGTCTGAAGAAGCACTG
GGCGACGCGGGGTTGGAGGCAAGC(NNKWST)nGCGTACGATGGCGTTG
GT

[0285] Translation of this DNA results in synthesis of the following peptide.

(SEQ ID NO: 65)
MNKKNILPQQGQPVIRLTAGQLSSQLAELSEEALGDAGLEAS
(X-C/S/T)nAYDGVGSGSGS

[0286] The primers used for the construction of individual libraries are shown below. The sequences of the primers are shown in the primer list given below.

TABLE 11
library   F primer   1st R primer   2nd R primer
XC3       T7g10M     XC3pool        PatEpool.ex
XC4       T7g10M     XC4pool        PatEpool.ex
XC5       T7g10M     XC5pool        PatEpool.ex
XC6       T7g10M     XC6pool        PatEpool.ex
XC7       T7g10M     XC7pool        PatEpool.ex
XC8       T7g10M     XC8pool        PatEpool.ex
XCST3     T7g10M     XCST3pool      PatEpool.ex
XCST4     T7g10M     XCST4pool      PatEpool.ex
XCST5     T7g10M     XCST5pool      PatEpool.ex
XCST6     T7g10M     XCST6pool      PatEpool.ex

[9] Study on Leader Sequence (1)

[0287] Tests were conducted in a similar manner to that described above in [1] to [8] in order to study the necessity of a leader sequence: by removing the leader sequence from a substrate peptide; by preparing the leader sequence by translational synthesis as a peptide separate from the substrate peptide and adding it to the reaction system; or by adding a leader sequence obtained by chemical synthesis to give a final concentration of 1 µM, 10 µM, or 50 µM. The cassette sequence was composed of 8 amino acids (VTACITFC).

[0288] The results are shown in FIG. 7. It has been found that the leader sequence is not essential for a substrate of the enzyme: although the modification efficiency with the enzyme increases when the leader sequence is fused to the substrate peptide, the substrate peptide was 1-fold dehydrated even without the leader sequence. As long as the leader sequence is present in the reaction system, the substrate peptide is completely (4-fold) dehydrated even if the leader sequence has not been fused to it, which confirms that the substrate peptide is modified sufficiently. This suggests that the leader sequence is not necessary for the substrate specificity but that the structure of the leader sequence contributes to the activation of the enzyme and has an influence on the reaction efficiency.

[10] Study on Leader Sequence (2)

[0289] In a similar manner to that employed above in [1] to [8], a portion of the leader sequence was deleted as shown in FIG. 8 and FIG. 19 and enzyme activity by PatD was confirmed.

[0290] As shown in FIG. 8 and FIG. 19, it has been found that the entire length of the leader sequence is not essential for modification of a substrate peptide with PatD. It has also been confirmed that the C-terminus part of the leader sequence is important for the modification with PatD and that a substrate peptide having, for example, about 6 amino acids is subjected to sufficient modification by PatD if it contains the C-terminal of the leader sequence.

[11] Study on Leader Sequence (3)

[0291] In a similar manner to that described above in [1] to [8], the entirety of the leader sequence was substituted with a completely different peptide sequence as shown in FIG. 18 and enzyme activity by PatD was studied.

[0292] As shown in FIG. 18, it has been confirmed that a leader sequence necessary for modification of a substrate with PatD is not limited to the leader sequence of PatE; for example, a partial sequence of human actin, the leader sequence of the precursor peptide of Lacticin 481, or a sequence obtained by shuffling the leader sequence of PatE can also lead to efficient modification by PatD.

[12] Study on Leader Sequence (4)

[0293] In a similar manner to that employed above in [1] to [8], the enzyme activity by PatD was confirmed by introducing point mutations into some amino acids of the leader sequence as shown in FIG. 20.

[0294] As shown in FIG. 20, it has been confirmed that the 28th Glu, 29th Leu, 31st Glu, 32nd Glu, and 34th Leu of the leader sequence of PatE are important for modification with PatD.

[13] Study on Recognition Sequence

[0295] In a similar manner to that employed above in [1] to [8], enzyme activity by PatD was studied by deleting a recognition sequence, by changing the sequence, or by adding another sequence as shown in FIG. 9 or FIG. 21.

[0296] As shown in FIG. 9 or FIG. 21, the substrate peptide was completely (4-fold) dehydrated even if it had no recognition sequence 2. When the recognition sequence 1 was removed, the dehydration reaction involved the loss of only two water molecules. It has been confirmed that the recognition sequence 1 is also not essential for enzyme activity. Further, even when the recognition sequence was replaced by GGGGG, QQQQQ, LLLLL, or PPPPP, the peptide was sufficiently modified (dehydration of 4 molecules). Also when GGGGG was inserted between the leader sequence and the recognition sequence, the peptide was sufficiently modified (4-fold and 5-fold dehydration was observed). Even when an additional sequence was placed downstream of the recognition sequence 2, the resulting peptide was subjected to sufficient modification.

[0297] These results have suggested that the modification efficiency with PatD becomes higher when the recognition sequence 1 is present, but that no particular limitation is imposed on its sequence, and that the presence or absence of the recognition sequence 2 has almost no influence on the modification efficiency with PatD. They have also suggested that the addition of a sequence downstream of the recognition sequence 2 has almost no influence on the modification efficiency.

[14] Study on Modification Efficiency, with PatD, of a Substrate Having, in the Cassette Sequence Thereof, a Non-Natural Amino Acid

[0298] As shown in FIG. 10A, a substrate having, in the cassette sequence thereof, a non-natural amino acid was prepared by ribosomal synthesis and the resulting substrate was modified by PatD. Amino acids used are shown in FIG. 10B. Typical results are shown in FIG. 11. As shown in FIG. 11, it has been confirmed that substances conventionally considered unsuitable as a substrate of PatD, such as an isomer of Thr, substituted Ser, and a diamino acid, are modified by PatD and converted into corresponding azoline ring backbones.

[15] Study on Another Substrate

[0299] Substrates having the following sequences were each synthesized and modification with PatD was confirmed. The results are shown in the following table. Typical results are shown in FIG. 16 and FIG. 17.

TABLE 11-2
Leader sequences   Recognition sequence 1   Cassette sequence   Recognition sequence 2   Dehydrated molecules
(?)                GLEAS                    (?)                 (?)                      2 to 4
                   GLEAS                    (?)                 (?)                      3 to 5
                   GLEAS                    (?)                 (?)                      3 to 5
                   GLEAS                    (?)                 (?)                      1 to 3
                   GLEAS                    (?)                 (?)                      2 to 5
                   GLEAS                    (?)                 (?)                      2
                   GLEAS                    (?)                 (?)                      2 to 4
                   GLEAS                    (?)                 (?)                      2 to 4
                   GLEAS                    (?)                 (?)                      1 or 2
                   GLEAS                    (?)                 (?)                      1 or 2
                   GLEAS                    (?)                 (?)                      1 to 4
                   GLEAS                    (?)                 (?)                      1 to 4
                   GLEAS                    (?)                 A.                       1 or 2
                   GLEAS                    (?)                 A.                       1 or 2
                   GGGGG                    (?)                 A.                       1
                   GGGGG                    (?)                 A.                       1
                   GLEAS                    (?)                 (?)                      2
                   GLEAS                    (?)                 (?)                      3 or 4
(?)                GLEAS                    (?)                 A.                       3 or 4
(?)                GGGGG                    (?)                 A.                       4
(?) indicates text missing or illegible when filed
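
The "dehydrated molecules" counts in the table are read from the mass shift observed by MALDI-TOF-MS: each azoline ring formed by PatD corresponds to the loss of one water molecule (about 18.01 Da). The following Python sketch illustrates that arithmetic only. The function name, the tolerance, and the example masses are assumptions made for illustration and are not values from the experiments described here.

```python
WATER_MONOISOTOPIC_DA = 18.010565  # monoisotopic mass of H2O


def count_dehydrations(mass_before, mass_after, tolerance_da=0.5):
    """Estimate how many water molecules were lost between two observed masses.

    Returns the integer count, or raises ValueError when the mass difference
    is not within tolerance_da of a whole number of water losses.
    """
    delta = mass_before - mass_after
    n = round(delta / WATER_MONOISOTOPIC_DA)
    if n < 0 or abs(delta - n * WATER_MONOISOTOPIC_DA) > tolerance_da:
        raise ValueError(
            f"mass change {delta:.3f} Da is not a whole number of water losses"
        )
    return n


if __name__ == "__main__":
    # Hypothetical masses, chosen only to illustrate a 4-fold dehydration.
    unmodified = 6500.00
    modified = unmodified - 4 * WATER_MONOISOTOPIC_DA
    print(count_dehydrations(unmodified, modified))
```

A range such as "2 to 4" in the table would then correspond to several product peaks, each giving a different count when compared with the unmodified peptide.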

[16] Screening Using mRNA Display Method

[0300] Compounds that bound to matrix metalloproteinase (MMP) 12 were screened in accordance with an mRNA display method by using the azoline compound library obtained using the above-mentioned method.

[16-1] Purification of MMP12

[0301] MMP12 was expressed in Escherichia coli as a construct having, at the C-terminus thereof, a 6xHis tag and an Avi tag. Purification was conducted by making use of the His tag.

[0302] The Avi tag, which was biotinylated with BirA expressed by birA incorporated in the same plasmid, was added for immobilization of MMP12 onto streptavidin beads upon mRNA display.

[16-2] Construction of mRNA Library

[0303] The DNA library obtained in [8] was transcribed using T7 RNA polymerase to obtain an mRNA corresponding to the library.

[16-3] mRNA Display

[0304] A cycle from "Ligation with puromycin linker" to "Amplification of sequence information of peptide thus recovered" described below was repeated and peptides binding to MMP12 were selected from the azoline-containing compound library.

[16-3-1] Ligation with Puromycin Linker

[0305] The puromycin linker represented by the sequence described below was annealed with the above-mentioned mRNA library and the two were ligated to each other via a T4 RNA ligase. (SPC18 represents PEG having C and O in the total number of 18.)

pdCTCCCGCCCCCCGTCC(SPC18)5CC(Pu)

[16-3-2] Translation

[0306] The mRNA library to which the puromycin linker had been ligated was translated and a peptide library was synthesized. Puromycin reacts with the C-terminus of the peptide thus synthesized, by which the mRNA and the peptide are connected to each other.

[16-3-3] Modification by PatD

[0307] The peptide library tagged with mRNA was subjected to post-translational modification with PatD to introduce azoline rings in the cassette sequence.

[16-3-4] Reverse Transcription

[0308] The mRNA bound to the peptide was reverse transcribed and (azoline-modified peptide)-mRNA-DNA complexes were synthesized.

[16-3-5] Cleavage by Protease Glu-C

[0309] Glu-C was added and the leader sequence was cleaved.

[16-3-6] Selection of Azoline-Modified Peptide that Binds to MMP12

[0310] MMP12 immobilized onto streptavidin beads was mixed with the azoline-containing compound library prepared above and the mixture was incubated at 4°C for 30 minutes. The supernatant was removed and the residual beads were washed with a buffer. A PCR solution was added to the beads, the mixture was heated at 95°C for 5 minutes, the peptide was eluted from the beads, and the supernatant was recovered.

[16-3-7] Amplification of Sequence Information of Azoline-Modified Peptide Thus Recovered

[0311] The DNA contained in the (azoline-modified peptide)-mRNA-DNA complex which had bound to MMP12 and been recovered was amplified by PCR. The DNA thus obtained was transcribed into a corresponding mRNA.

[16-3-8] Identification of the Sequence of Azoline-Containing Compound Selected

[0312] The above-mentioned series of operations was repeated. When the recovery ratio of DNA stopped changing, TA cloning was conducted using the amplified DNA and the sequence of the resulting azoline-containing compound was identified.

[16-3-9] Evaluation of Binding Ability of Azoline-Containing Compound Selected

[0313] A series of operations was conducted in accordance with the scheme of mRNA display by using not mRNA from the mRNA library but mRNA of the selected peptide, and whether the selected azoline-containing compound binds to MMP12-immobilized beads was studied. In addition, such operations were conducted under the condition wherein post-translational modification with PatD is performed or the condition wherein the modification is not performed. The results have suggested that the sequences thus obtained bind to MMP12 when they have been modified with PatD and that, in these sequences, the azoline backbone contributes to binding to MMP12.

[0314] The outline of the mRNA display method is shown in FIG. 12.

[0315] The results of selection are shown in FIG. 13.

[0316] The sequences of five azoline-containing compounds A to E selected based on the results of selection were analyzed (data not shown) and evaluation results of the binding ability to MMP12 are shown in FIG. 14.

[0317] A list of base sequences of the primers used is shown below.


TABLE 13-3 (continued)
m2O27    TCAACGCCATCGTACGCGGTAATACAGAAGGTGTAACAAATGGTAACACA
m2O3-a   AAGTAATACAAACGGTTGCACAAAAGGTAATACAGAAGGTGTAACAAATGGTAACACACG
m2O3-    TCAACGCCATCGTACGCACAAATGGTGAAACAGTAAGTAATACAAACGGTTGCAC
m35-N-   CGAAGCTTAAGATGGCTCAACGCCATCGTACGCGCAACAACAGC

TABLE 14
Name          Sequence                                                                               SEQ ID NO:
wt-EGS1       CCGCTGCCGCTACCCTCAACGCCATCGTACGCACAAAACGTG                                             159
GS3an2        TTTCCGCCCCCCGTCCTAGCTGCCGCTGCCGCTACC                                                   160
T7g10M        TAATACGACTCACTATAGGGTTAACTTTAAGAAGGAGATATACATATG                                       161
XC3pool       ACCAACGCCATCGTACGCACAMNNACAMNNACAMNNGCTTGCCTCCAACCCCG                                  162
XC4pool       ACCAACGCCATCGTACGCACAMNNACAMNNACAMNNACAMNNGCTTGCCTCCAACCCCG                            163
XC5pool       ACCAACGCCATCGTACGCACAMNNACAMNNACAMNNACAMNNACAMNNGCTTGCCTCCAACCCCG                      164
XC6pool       ACCAACGCCATCGTACGCACAMNNACAMNNACAMNNACAMNNACAMNNACAMNNGCTTGCCTCCAACCCCG                165
XC7pool       ACCAACGCCATCGTACGCACAMNNACAMNNACAMNNACAMNNACAMNNACAMNNACAMNNGCTTGCCTCCAACCCCG          166
XC8pool       ACCAACGCCATCGTACGCACAMNNACAMNNACAMNNACAMNNACAMNNACAMNNACAMNNACAMNNGCTTGCCTCCAACCCCG    167
XCST3pool     ACCAACGCCATCGTACGCASWMNNASWMNNASWMNNGCTTGCCTCCAACCCCG                                  168
XCST4pool     ACCAACGCCATCGTACGCASWMNNASWMNNASWMNNASWMNNGCTTGCCTCCAACCCCG                            169
XCST5pool     ACCAACGCCATCGTACGCASWMNNASWMNNASWMNNASWMNNASWMNNGCTTGCCTCCAACCCCG                      170
XCST6pool     ACCAACGCCATCGTACGCASWMNNASWMNNASWMNNASWMNNASWMNNASWMNNGCTTGCCTCCAACCCCG                171
PatEpool.ex   TTTCCGCCCCCCGTCCTAGCTGCCGCTGCCGCTACCAACGCCATCGTAC                                      172
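
The pool primers in the table are reverse primers, so their degenerate positions appear as the reverse complement of the codons written in SEQ ID NOS: 62 and 64. The Python sketch below checks this for the XC3pool and XCST3pool primers, using the standard IUPAC complements for the ambiguity codes M, K, S, W, and N. The helper name and constants are illustrative assumptions.

```python
# IUPAC nucleotide complements, including the degenerate codes used in the primers.
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "M": "K", "K": "M",  # M = A/C, K = G/T
    "S": "S", "W": "W",  # S = C/G, W = A/T
    "N": "N",
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string with IUPAC codes."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


# Primer sequences as listed in the primer table (n = 3 pools).
XC3POOL = "ACCAACGCCATCGTACGC" + "ACAMNN" * 3 + "GCTTGCCTCCAACCCCG"
XCST3POOL = "ACCAACGCCATCGTACGC" + "ASWMNN" * 3 + "GCTTGCCTCCAACCCCG"

if __name__ == "__main__":
    # Sense-strand view: ...GGAGGCAAGC (NNKTGT)3 GCGTACGATGGCGTTGGT
    print(reverse_complement(XC3POOL))
    # Sense-strand view: ...GGAGGCAAGC (NNKWST)3 GCGTACGATGGCGTTGGT
    print(reverse_complement(XCST3POOL))
```

Read on the sense strand, each ACAMNN unit becomes NNKTGT and each ASWMNN unit becomes NNKWST, flanked by the end of the GLEAS-coding region and the start of the AYDGV-coding region.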

Sequence Listing Free Text

[0318] SEQ ID NO: 1 represents the amino acid sequence of the leader sequence of PatE.

[0319] SEQ ID NOS: 2 and 3 are examples of the recognition sequence 1 by an azoline backbone-introducing enzyme, respectively.

[0320] SEQ ID NOS: 4 to 9 are examples of the recognition sequence 2 by an azoline backbone-introducing enzyme, respectively.

[0321] SEQ ID NOS: 10-57 are amino acid sequences newly confirmed to become a substrate of an azoline backbone-introducing enzyme.

[0322] SEQ ID NO: 58 is the base sequence of a nucleic acid encoding PatE.

[0323] SEQ ID NO: 59 is the base sequence of a nucleic acid encoding PatEpre.

[0324] SEQ ID NO: 60 is the amino acid sequence of the cassette domain of wild type PatE.

[0325] SEQ ID NO: 61 is the base sequence of a nucleic acid encoding PatEpre2.

[0326] SEQ ID NO: 62 is the base sequence of a nucleic acid encoding the peptide portion of each compound of (XC)n library.

[0327] SEQ ID NO: 63 is an amino acid sequence of the peptide portion of each compound of (XC)n library.

[0328] SEQ ID NO: 64 is the base sequence of a nucleic acid encoding the peptide portion of each compound of (X-C/S/T)n library.

[0329] SEQ ID NO: 65 is the amino acid sequence of the peptide portion of each compound of (X-C/S/T)n library.

[0330] SEQ ID NOS: 66 to 172 are the base sequences of primers used in Examples of the present application.
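
As a cross-check of how the nucleic acid sequences relate to the peptide sequences, the Python sketch below translates SEQ ID NO: 61 from position 46 (the ATG start codon) to its end using the standard genetic code, and expands the degenerate codons NNK and WST that appear in SEQ ID NOS: 62 and 64. All function and variable names are illustrative assumptions. The DNA string is SEQ ID NO: 61 as transcribed in this document.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC codes needed for the NNK and WST codons.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "K": "GT", "W": "AT", "S": "CG"}


def translate(dna):
    """Translate a DNA string codon by codon ('*' marks a stop codon)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def expand(degenerate_codon):
    """Map every concrete codon matched by a degenerate codon to its amino acid."""
    choices = [IUPAC[b] for b in degenerate_codon]
    return {"".join(c): CODON_TABLE["".join(c)] for c in product(*choices)}


SEQ_ID_61 = ("TAATACGACTCACTATAGGGTTAACTTTAAGAAGGAGATATACATATGA"
             "ACAAGAAAAACATCCTGCCCCAACAAGGTCAACCGGTTATCCGCTTAAC"
             "CGCAGGACAGTTGAGCTCGCAACTCGCCGAACTGTCTGAAGAAGCACTG"
             "GGCGACGCGGGGTTGGAGGCAAGC")

if __name__ == "__main__":
    # Python index 45 is position 46 in the 1-based numbering used in the text.
    print(translate(SEQ_ID_61[45:]))
    nnk = expand("NNK")
    print(len(nnk), "NNK codons;", "stop codons:",
          [c for c, aa in nnk.items() if aa == "*"])
    print("WST encodes:", sorted(set(expand("WST").values())))
```

Under these assumptions the translation reproduces the leader sequence of SEQ ID NO: 1 followed by GLEAS, NNK covers all 20 standard amino acids with TAG as its only stop codon, TGT encodes Cys, and WST encodes only Cys, Ser, or Thr, which is consistent with the (XC)n and (X-C/S/T)n designations of the two libraries.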

SEQUENCE LISTING

<160> NUMBER OF SEQ ID NOS: 376

<210> SEQ ID NO 1
<211> LENGTH: 37
<212> TYPE: PRT
<213> ORGANISM: Prochloron didemni
<400> SEQUENCE: 1
Met Asn Lys Lys Asn Ile Leu Pro Gln Gln Gly Gln Pro Val Ile Arg
1               5                   10                  15
Leu Thr Ala Gly Gln Leu Ser Ser Gln Leu Ala Glu Leu Ser Glu Glu
            20                  25                  30
Ala Leu Gly Asp Ala
        35

<210> SEQ ID NO 2
<211> LENGTH: 5
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of Recognition Sequence 1 based on Prochloron didemni.
<220> FEATURE:
<221> NAME/KEY: MISC_FEATURE
<222> LOCATION: (2)..(2)
<223> OTHER INFORMATION: Xaa stands for Ala, Leu or Val.
<220> FEATURE:
<221> NAME/KEY: MISC_FEATURE
<222> LOCATION: (3)..(3)
<223> OTHER INFORMATION: Xaa stands for Gly, Glu or Asp.
<220> FEATURE:
<221> NAME/KEY: MISC_FEATURE
<222> LOCATION: (4)..(4)
<223> OTHER INFORMATION: Xaa stands for Ala or Pro.
<220> FEATURE:
<221> NAME/KEY: MISC_FEATURE
<222> LOCATION: (5)..(5)
<223> OTHER INFORMATION: Xaa stands for Ser, Thr or Cys.
<400> SEQUENCE: 2
Gly Xaa Xaa Xaa Xaa
1               5

<210> SEQ ID NO 3
<211> LENGTH: 5
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of Recognition Sequence 1 based on Prochloron didemni.
<400> SEQUENCE: 3
Gly Leu Glu Ala Ser
1               5

<210> SEQ ID NO 4
<211> LENGTH: 5
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of Recognition Sequence 2 based on Prochloron didemni.
<220> FEATURE:
<221> NAME/KEY: MISC_FEATURE
<222> LOCATION: (1)..(1)
<223> OTHER INFORMATION: Xaa stands for Ala or Ser.
<220> FEATURE:
<221> NAME/KEY: MISC_FEATURE
<222> LOCATION: (3)..(3)
<223> OTHER INFORMATION: Xaa stands for Asp or Glu.
<220> FEATURE:
<221> NAME/KEY: MISC_FEATURE
<222> LOCATION: (5)..(5)
<223> OTHER INFORMATION: Xaa stands for Ala, Leu or Val.
<400> SEQUENCE: 4
Xaa Tyr Xaa Gly Xaa
1               5

<210> SEQ ID NO 5
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of Recognition Sequence 2 based on Prochloron didemni.
<400> SEQUENCE: 5
Ala Tyr Asp Gly Val Glu Pro Ser
1               5

<210> SEQ ID NO 6
<211> LENGTH: 5
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of Recognition Sequence 2 based on Prochloron didemni.
<400> SEQUENCE: 6
Ala Tyr Asp Gly Val
1               5

<210> SEQ ID NO 7
<211> LENGTH: 11
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of Recognition Sequence 2 based on Prochloron didemni.
<400> SEQUENCE: 7
Ala Tyr Asp Gly Val Gly Ser Gly Ser Gly Ser
1               5                   10

<210> SEQ ID NO 8
<211> LENGTH: 11
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of Recognition Sequence 2 based on Prochloron didemni.
<400> SEQUENCE: 8
Ala Tyr Asp Gly Val Gly Gly Gly Gly Gly Gly
1               5                   10

<210> SEQ ID NO 9
<211> LENGTH: 12
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of Recognition Sequence 2 based on Prochloron didemni.
<400> SEQUENCE: 9
Ala Tyr Asp Gly Val Glu Gly Ser Gly Ser Gly Ser
1               5                   10

<210> SEQ ID NO 10
<211> LENGTH: 6
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 10
Val Thr Ala Cys Ile Thr
1               5

<210> SEQ ID NO 11
<211> LENGTH: 10
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 11
Val Thr Ala Cys Ile Thr Phe Cys Val Thr
1               5                   10

<210> SEQ ID NO 12
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 12
Val Thr Ala Thr Ile Thr Phe Thr
1               5

<210> SEQ ID NO 13
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 13
Val Cys Ala Cys Ile Cys Phe Cys
1               5

<210> SEQ ID NO 14
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 14
Val Ser Ala Ser Ile Ser Phe Ser
1               5

<210> SEQ ID NO 15
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 15
Val Thr Ala Asp Ile Thr Phe Cys
1               5

<210> SEQ ID NO 16
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 16
Val Thr Ala Asn Ile Thr Phe Cys
1               5

<210> SEQ ID NO 17
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 17
Val Thr Ala Lys Ile Thr Phe Cys
1               5

<210> SEQ ID NO 18
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 18
Cys Phe Thr Ile Cys Ala Thr Val
1               5

<210> SEQ ID NO 19
<211> LENGTH: 12
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 19
Val Thr Ala Cys Ile Thr Phe Cys Val Thr Ile Cys
1               5                   10

<210> SEQ ID NO 20
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 20
Val Thr Ala Cys Asp Thr Phe Cys
1               5

<210> SEQ ID NO 21
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 21
Val Thr Ala Cys Asn Thr Phe Cys
1               5

<210> SEQ ID NO 22
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 22
Val Thr Ala Cys Lys Thr Phe Cys
1               5

<210> SEQ ID NO 23
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 23
Val Cys Ala Cys Asp Cys Phe Cys
1               5

<210> SEQ ID NO 24
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 24
Val Cys Ala Cys Asn Cys Phe Cys
1               5

<210> SEQ ID NO 25
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 25
Val Cys Ala Cys Lys Cys Phe Cys
1               5

<210> SEQ ID NO 26
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 26
Val Cys Ala Thr Ile Thr Phe Thr
1               5

<210> SEQ ID NO 27
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 27
Val Thr Ala Cys Ile Thr Phe Thr
1               5

<210> SEQ ID NO 28
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 28
Val Thr Ala Thr Ile Cys Phe Thr
1               5

<210> SEQ ID NO 29
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 29
Val Thr Ala Thr Ile Thr Phe Cys
1               5

<210> SEQ ID NO 30
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 30
Cys Phe Cys Ile Cys Ala Cys Val
1               5

<210> SEQ ID NO 31
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 31
Asp Thr Ala Cys Ile Thr Phe Cys
1               5

<210> SEQ ID NO 32
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 32
Asn Thr Ala Cys Ile Thr Phe Cys
1               5

<210> SEQ ID NO 33
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 33
Lys Thr Ala Cys Ile Thr Phe Cys
1               5

<210> SEQ ID NO 34
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 34
Val Thr Ala Cys Glu Thr Phe Cys
1               5

<210> SEQ ID NO 35
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 35
Val Thr Ala Cys Gln Thr Phe Cys
1               5

<210> SEQ ID NO 36
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 36
Val Thr Ala Cys His Thr Phe Cys
1               5

<210> SEQ ID NO 37
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 37
Val Thr Ala Cys Arg Thr Phe Cys
1               5

<210> SEQ ID NO 38
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 38
Val Thr Ala Cys Pro Thr Phe Cys
1               5

<210> SEQ ID NO 39
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 39
Val Phe Ala Leu Ile Met Phe Cys
1               5

<210> SEQ ID NO 40
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 40
Val Phe Ala Leu Ile Met Cys Cys
1               5

<210> SEQ ID NO 41
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 41
Val Phe Ala Leu Ile Cys Cys Cys
1               5

<210> SEQ ID NO 42
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 42
Val Cys Ala Cys Arg Cys Phe Cys
1               5

<210> SEQ ID NO 43
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 43
Arg Thr Ala Cys Ile Thr Phe Cys
1               5

<210> SEQ ID NO 44
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 44
Val Phe Ala Leu Cys Cys Cys Cys
1               5

<210> SEQ ID NO 45
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 45
Arg Cys Asp Cys Asp Cys Arg Cys
1               5

<210> SEQ ID NO 46
<211> LENGTH: 14
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 46
Val Cys Ala Cys Ile Cys Phe Cys Val Cys Ala Cys Val Cys
1               5                   10

<210> SEQ ID NO 47
<211> LENGTH: 16
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 47
Val Cys Ala Cys Ile Cys Phe Cys Val Cys Ala Cys Val Cys Ile Cys
1               5                   10                  15

<210> SEQ ID NO 48
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 48
Val Thr Ala Thr Asp Thr Phe Thr
1               5

<210> SEQ ID NO 49
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.
<400> SEQUENCE: 49
Val Ser Ala Ser Asp Ser Phe Ser
1               5

<210s, SEQ ID NO 50 &211s LENGTH: 12 212. TYPE: PRT <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni. US 2014/011383.0 A1 Apr. 24, 2014 40

- Continued

<4 OOs, SEQUENCE: 50

Wall. Thir Ala Thir Ile Thir Phe Thir Wall. Thir Ile Thr 1. 5 1O

<210s, SEQ ID NO 51 &211s LENGTH: 14 212. TYPE: PRT <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.

<4 OOs, SEQUENCE: 51 Arg Cys Arg Cys Ile Cys Phe Cys Val Cys Ala Cys Val Cys 1. 5 1O

<210> SEQ ID NO 52
<211> LENGTH: 14
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.

<400> SEQUENCE: 52
Val Cys Ala Cys Ile Cys Arg Cys Arg Cys Ala Cys Val Cys
1               5                   10

<210> SEQ ID NO 53
<211> LENGTH: 14
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.

<400> SEQUENCE: 53
Val Cys Ala Cys Ile Cys Phe Cys Val Cys Arg Cys Arg Cys
1               5                   10

<210> SEQ ID NO 54
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.

<400> SEQUENCE: 54
Val Thr Ala Thr Ile Cys Phe Cys
1               5

<210> SEQ ID NO 55
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.

<400> SEQUENCE: 55
Val Cys Ala Thr Ile Thr Phe Cys
1               5

<210> SEQ ID NO 56
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.

<400> SEQUENCE: 56
Val Cys Ala Cys Ile Thr Phe Thr
1               5

<210> SEQ ID NO 57
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.

<400> SEQUENCE: 57
Val Cys Ala Thr Ile Cys Phe Thr
1               5

<210> SEQ ID NO 58
<211> LENGTH: 216
<212> TYPE: PRT
<213> ORGANISM: Prochloron didemni
<220> FEATURE:
<221> NAME/KEY: misc_feature
<223> OTHER INFORMATION: DNA sequence coding PatE
<220> FEATURE:
<221> NAME/KEY: misc_feature
<223> OTHER INFORMATION: DNA sequence encoding PatE

<400> SEQUENCE: 58
Ala Thr Gly Ala Ala Cys Ala Ala Gly Ala Ala Ala Ala Ala Cys Ala
1               5                   10                  15
Thr Cys Cys Thr Gly Cys Cys Cys Cys Ala Ala Cys Ala Ala Gly Gly
            20                  25                  30
Thr Cys Ala Ala Cys Cys Gly Gly Thr Thr Ala Thr Cys Cys Gly Cys
        35                  40                  45
Thr Thr Ala Ala Cys Cys Gly Cys Ala Gly Gly Ala Cys Ala Gly Thr
    50                  55                  60
Thr Gly Ala Gly Cys Thr Cys Gly Cys Ala Ala Cys Thr Cys Gly Cys
65                  70                  75                  80
Cys Gly Ala Ala Cys Thr Gly Thr Cys Thr Gly Ala Ala Gly Ala Ala
                85                  90                  95
Gly Cys Ala Cys Thr Gly Gly Gly Cys Gly Ala Cys Gly Cys Gly Gly
            100                 105                 110
Gly Gly Thr Thr Gly Gly Ala Gly Gly Cys Ala Ala Gly Cys Gly Thr
        115                 120                 125
Thr Ala Cys Gly Gly Cys Gly Thr Gly Thr Ala Thr Cys Ala Cys Gly
    130                 135                 140
Thr Thr Thr Thr Gly Thr Gly Cys Gly Thr Ala Cys Gly Ala Thr Gly
145                 150                 155                 160
Gly Cys Gly Thr Thr Gly Ala Gly Cys Cys Ala Thr Cys Thr Ala Thr
                165                 170                 175
Thr Ala Cys Gly Gly Thr Cys Thr Gly Cys Ala Thr Thr Ala Gly Thr
            180                 185                 190
Gly Thr Cys Thr Gly Cys Gly Cys Cys Thr Ala Thr Gly Ala Thr Gly
        195                 200                 205
Gly Gly Gly Ala Gly Thr Ala Ala
    210                 215

<210> SEQ ID NO 59
<211> LENGTH: 173
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence encoding PatEpre based on Prochloron didemni.

<400> SEQUENCE: 59
ggcgtaatac gactcactat agggttaact ttaacaagga gaaaaacatg aacaagaaaa  60
acatcctgcc ccaacaaggt caaccggtta tccgcttaac cgcaggacag ttgagctcgc  120
aactcgccga actgtctgaa gaagcactgg gcgacgcggg gttggaggca agc  173

<210> SEQ ID NO 60
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Prochloron didemni
<220> FEATURE:
<221> NAME/KEY: misc_feature
<223> OTHER INFORMATION: Amino acid sequence of cassette region of wild type PatE

<400> SEQUENCE: 60
Val Thr Ala Cys Ile Thr Phe Cys
1               5

<210> SEQ ID NO 61
<211> LENGTH: 171
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence encoding PatEpre2 based on Prochloron didemni.

<400> SEQUENCE: 61
taatacgact cactataggg ttaactttaa gaaggagata tacatatgaa caagaaaaac  60
atcctgcccc aacaaggtca accggttatc cgcttaaccg caggacagtt gagctcgcaa  120
ctcgccgaac tgtctgaaga agcactgggc gacgcggggt tggaggcaag c  171

<210> SEQ ID NO 62
<211> LENGTH: 195
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence encoding (XC)n Library based on Prochloron didemni.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (172)..(173)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (174)..(174)
<223> OTHER INFORMATION: n stands for G or T.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (172)..(177)
<223> OTHER INFORMATION: This sequence may be repeated 1 to 20 times.

<400> SEQUENCE: 62
taatacgact cactataggg ttaactttaa gaaggagata tacatatgaa caagaaaaac  60
atcctgcccc aacaaggtca accggttatc cgcttaaccg caggacagtt gagctcgcaa  120
ctcgccgaac tgtctgaaga agcactgggc gacgcggggt tggaggcaag cnnntgtgcg  180
tacgatggcg ttggt  195

<210> SEQ ID NO 63
<211> LENGTH: 55
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Amino Acid Sequence of (XC)n Library based on Prochloron didemni.
<220> FEATURE:
<221> NAME/KEY: MISC_FEATURE
<222> LOCATION: (43)..(43)
<223> OTHER INFORMATION: Xaa stands for any amino acid.
<220> FEATURE:
<221> NAME/KEY: MISC_FEATURE
<222> LOCATION: (43)..(44)
<223> OTHER INFORMATION: This sequence may be repeated 1 to 20 times.

<400> SEQUENCE: 63
Met Asn Lys Lys Asn Ile Leu Pro Gln Gln Gly Gln Pro Val Ile Arg
1               5                   10                  15
Leu Thr Ala Gly Gln Leu Ser Ser Gln Leu Ala Glu Leu Ser Glu Glu
            20                  25                  30
Ala Leu Gly Asp Ala Gly Leu Glu Ala Ser Xaa Cys Ala Tyr Asp Gly
        35                  40                  45
Val Gly Ser Gly Ser Gly Ser
    50                  55

<210> SEQ ID NO 64
<211> LENGTH: 195
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence encoding (X-C/S/T)n Library based on Prochloron didemni.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (172)..(173)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (174)..(174)
<223> OTHER INFORMATION: n stands for G or T.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (175)..(175)
<223> OTHER INFORMATION: n stands for A or T.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (176)..(176)
<223> OTHER INFORMATION: n stands for C or G.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (172)..(177)
<223> OTHER INFORMATION: This sequence may be repeated 1 to 20 times.

<400> SEQUENCE: 64
taatacgact cactataggg ttaactttaa gaaggagata tacatatgaa caagaaaaac  60
atcctgcccc aacaaggtca accggttatc cgcttaaccg caggacagtt gagctcgcaa  120
ctcgccgaac tgtctgaaga agcactgggc gacgcggggt tggaggcaag cnnnnntgcg  180
tacgatggcg ttggt  195

<210> SEQ ID NO 65
<211> LENGTH: 54
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Amino acid sequence of (X-C/S/T)n Library based on Prochloron didemni.
<220> FEATURE:
<221> NAME/KEY: MISC_FEATURE
<222> LOCATION: (42)..(42)
<223> OTHER INFORMATION: Xaa stands for any amino acid.
<220> FEATURE:
<221> NAME/KEY: MISC_FEATURE
<222> LOCATION: (43)..(43)
<223> OTHER INFORMATION: Xaa stands for Cys, Ser or Thr.
<220> FEATURE:
<221> NAME/KEY: MISC_FEATURE
<222> LOCATION: (42)..(43)
<223> OTHER INFORMATION: This sequence may be repeated 1 to 20 times.

<400> SEQUENCE: 65
Met Asn Lys Lys Asn Ile Leu Pro Gln Gln Gly Gln Pro Val Ile Arg
1               5                   10                  15
Leu Thr Ala Gly Gln Leu Ser Ser Gln Leu Ala Glu Leu Ser Glu Glu
            20                  25                  30
Ala Leu Gly Asp Ala Gly Leu Glu Ala Xaa Xaa Ala Tyr Asp Gly Val
        35                  40                  45
Gly Ser Gly Ser Gly Ser
    50
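By way of illustration only, the degenerate positions annotated for SEQ ID NO 64 (any base at 172 and 173, G or T at 174, A or T at 175, C or G at 176, followed by a fixed t at 177) can be read as an NNK codon followed by a WST codon in IUPAC notation. The following Python sketch, which is not part of the listing, enumerates both codon sets under the standard genetic code. The function names `expand` and `encoded` are hypothetical helpers introduced for this example.

```python
from itertools import product

# IUPAC nucleotide codes used by the annotated positions (standard definitions).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "K": "GT", "W": "AT", "S": "CG"}

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand(codon):
    """Return every concrete codon matched by a degenerate IUPAC codon."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in codon))]


def encoded(codon):
    """Return the sorted one-letter residues ('*' = stop) a degenerate codon can encode."""
    return sorted({CODON_TABLE[c] for c in expand(codon)})


if __name__ == "__main__":
    print("NNK:", len(expand("NNK")), "codons ->", "".join(encoded("NNK")))
    print("WST:", expand("WST"), "->", "".join(encoded("WST")))
```

Under these assumptions the WST codon set resolves to Cys, Ser and Thr only, which agrees with the Xaa annotation given for SEQ ID NO 65.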

<210> SEQ ID NO 66
<211> LENGTH: 22
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer T7ex5 based on Prochloron didemni.

<400> SEQUENCE: 66
ggcgtaatac gactcactat ag  22

<210> SEQ ID NO 67
<211> LENGTH: 49
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer wt-a based on Prochloron didemni.

<400> SEQUENCE: 67
atcgtacgca caaaacgtga tacacgccgt aacgcttgcc tccaacccc  49
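For illustration only, the relationship between primer wt-a (SEQ ID NO 67) and the wild-type cassette of SEQ ID NO 60 can be checked by reverse-complementing the primer and translating it. The Python sketch below is not part of the listing. The primer string is the sequence as read from the listing with extraction artifacts removed, and the reading-frame offset of 1 is an assumption chosen so that the translation begins with the leader residues GLEAS shown in Fig. 1. `reverse_complement` and `translate` are hypothetical helper names.

```python
# Standard genetic code, codons ordered with bases t, c, a, g.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("acgt", "tgca")

# Primer wt-a (SEQ ID NO 67) as read from the listing; cleaned reading is an assumption.
WT_A = "atcgtacgca" "caaaacgtga" "tacacgccgt" "aacgcttgcc" "tccaacccc"


def reverse_complement(seq):
    """Return the reverse complement of a lowercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq, offset=0):
    """Translate complete codons of seq starting at the given offset."""
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


if __name__ == "__main__":
    sense = reverse_complement(WT_A)
    print(len(WT_A), translate(sense, offset=1))
```

With this frame the translated sense strand contains VTACITFC, the cassette sequence listed as SEQ ID NO 60, flanked by the leader end and the start of the AYDGV recognition sequence.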

<210> SEQ ID NO 68
<211> LENGTH: 43
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer wt-b based on Prochloron didemni.

<400> SEQUENCE: 68
cgaagcttaa gatggctcaa cgccatcgta cgcacaaaac gtg  43

<210> SEQ ID NO 69
<211> LENGTH: 47
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m1-a based on Prochloron didemni.

<400> SEQUENCE: 69
cgccatcgta cgccgtgata cacgccgtaa cgcttgcctc caacccc  47

<210> SEQ ID NO 70
<211> LENGTH: 37
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m1-b based on Prochloron didemni.

<400> SEQUENCE: 70
cgaagcttaa gatggctcaa cgccatcgta cgccgtg  37

<210> SEQ ID NO 71
<211> LENGTH: 49
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m2-a based on Prochloron didemni.

<400> SEQUENCE: 71
cgcggtaaca caaaacgtga tacacgccgt aacgcttgcc tccaacccc  49

<210> SEQ ID NO 72
<211> LENGTH: 49
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m2-b based on Prochloron didemni.

<400> SEQUENCE: 72
cgaagcttaa gatggctcaa cgccatcgta cgcggtaaca caaaacgtg  49

<210> SEQ ID NO 73
<211> LENGTH: 49
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m3-a based on Prochloron didemni.

<400> SEQUENCE: 73
atcgtacgcg gtaaacgtga tggtcgccgt aacgcttgcc tccaacccc  49

<210> SEQ ID NO 74
<211> LENGTH: 43
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m3-b based on Prochloron didemni.

<400> SEQUENCE: 74
cgaagcttaa gatggctcaa cgccatcgta cgcggtaaac gtg  43

<210> SEQ ID NO 75
<211> LENGTH: 48
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m4-a based on Prochloron didemni.

<400> SEQUENCE: 75
tcgtacgcac aaaaacagat acacgcacaa acgcttgcct ccaacccc  48

<210> SEQ ID NO 76
<211> LENGTH: 47
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m4-b based on Prochloron didemni.

<400> SEQUENCE: 76
cgaagcttaa gatggctcaa cgccatcgta cgcacaaaaa cagatac  47

<210> SEQ ID NO 77
<211> LENGTH: 49
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m5-a based on Prochloron didemni.

<400> SEQUENCE: 77
atcgtacgca gaaaaagaga tagacgcaga aacgcttgcc tccaacccc  49

<210> SEQ ID NO 78
<211> LENGTH: 47
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m5-b based on Prochloron didemni.

<400> SEQUENCE: 78
cgaagcttaa gatggctcaa cgccatcgta cgcagaaaaa gagatag  47

<210> SEQ ID NO 79
<211> LENGTH: 49
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m6-a based on Prochloron didemni.

<400> SEQUENCE: 79
atcgtacgca caaaacgtga tgtccgccgt aacgcttgcc tccaacccc  49

<210> SEQ ID NO 80
<211> LENGTH: 49
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m7-a based on Prochloron didemni.

<400> SEQUENCE: 80
atcgtacgca caaaacgtga tattcgccgt aacgcttgcc tccaacccc  49

<210> SEQ ID NO 81
<211> LENGTH: 49
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m8-a based on Prochloron didemni.

<400> SEQUENCE: 81
atcgtacgca caaaacgtga tcttcgccgt aacgcttgcc tccaacccc  49

<210> SEQ ID NO 82
<211> LENGTH: 49
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m9-a based on Prochloron didemni.

<400> SEQUENCE: 82
atcgtacgca acggttgcac aaatggtaaa acagcttgcc tccaacccc  49

<210> SEQ ID NO 83
<211> LENGTH: 42
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m9-b based on Prochloron didemni.

<400> SEQUENCE: 83
cgaagcttaa gatggctcaa cgccatcgta cgcaacggtt gc  42

<210> SEQ ID NO 84
<211> LENGTH: 43
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m10-a based on Prochloron didemni.

<400> SEQUENCE: 84
aacacaaaac gtgatacacg ccgtaacgct tgcctccaac ccc  43

<210> SEQ ID NO 85
<211> LENGTH: 47
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m10-b based on Prochloron didemni.

<400> SEQUENCE: 85
tcaacgccat cgtacgcaca aatggtaaca caaaacgtga tacacgc  47

<210> SEQ ID NO 86
<211> LENGTH: 33
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m10-c based on Prochloron didemni.

<400> SEQUENCE: 86
cgaagcttaa gatggctcaa cgccatcgta cgc  33

<210> SEQ ID NO 87
<211> LENGTH: 49
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m11-a based on Prochloron didemni.

<400> SEQUENCE: 87
atcgtacgca caaaacgtgt cacacgccgt aacgcttgcc tccaacccc  49

<210> SEQ ID NO 88
<211> LENGTH: 49
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m12-a based on Prochloron didemni.

<400> SEQUENCE: 88
atcgtacgca caaaacgtgt tacacgccgt aacgcttgcc tccaacccc  49

<210> SEQ ID NO 89
<211> LENGTH: 47
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m13-a based on Prochloron didemni.

<400> SEQUENCE: 89
cgtacgcaca aaacgtctta cacgccgtaa cgcttgcctc caacccc  47

<210> SEQ ID NO 90
<211> LENGTH: 47
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m13-b based on Prochloron didemni.

<400> SEQUENCE: 90
cgaagcttaa gatggctcaa cgccatcgta cgcacaaaac gtcttac  47

<210> SEQ ID NO 91
<211> LENGTH: 49
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m14-a based on Prochloron didemni.

<400> SEQUENCE: 91
atcgtacgca caaaaacagt cacacgcaca aacgcttgcc tccaacccc  49

<210> SEQ ID NO 92
<211> LENGTH: 45
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m14-b based on Prochloron didemni.

<400> SEQUENCE: 92
cgaagcttaa gatggctcaa cgccatcgta cgcacaaaaa cagtc  45

<210> SEQ ID NO 93
<211> LENGTH: 47
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m15-a based on Prochloron didemni.

<400> SEQUENCE: 93
cgtacgcaca aaaacaatta cacgcacaaa cgcttgcctc caacccc  47

<210> SEQ ID NO 94
<211> LENGTH: 49
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m15-b based on Prochloron didemni.

<400> SEQUENCE: 94
cgaagcttaa gatggctcaa cgccatcgta cgcacaaaaa caattacac  49

<210> SEQ ID NO 95
<211> LENGTH: 48
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m16-a based on Prochloron didemni.

<400> SEQUENCE: 95
tcgtacgcac aaaaacactt acacgcacaa acgcttgcct ccaacccc  48

<210> SEQ ID NO 96
<211> LENGTH: 47
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m16-b based on Prochloron didemni.

<400> SEQUENCE: 96
cgaagcttaa gatggctcaa cgccatcgta cgcacaaaaa cacttac  47

<210> SEQ ID NO 97
<211> LENGTH: 49
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m17-a based on Prochloron didemni.

<400> SEQUENCE: 97
atcgtacgcg gtaaacgtga tggtcgcaca aacgcttgcc tccaacccc  49

<210> SEQ ID NO 98
<211> LENGTH: 49
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m18-a based on Prochloron didemni.

<400> SEQUENCE: 98
atcgtacgcg gtaaacgtga tacacgccgt aacgcttgcc tccaacccc  49

<210> SEQ ID NO 99
<211> LENGTH: 47
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m19-a based on Prochloron didemni.

<400> SEQUENCE: 99
cgtacgcggt aaaacagatg gtcgccgtaa cgcttgcctc caacccc  47

<210> SEQ ID NO 100
<211> LENGTH: 46
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m19-b based on Prochloron didemni.

<400> SEQUENCE: 100
cgaagcttaa gatggctcaa cgccatcgta cgcggtaaaa cagatg  46

<210> SEQ ID NO 101
<211> LENGTH: 49
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m20-a based on Prochloron didemni.

<400> SEQUENCE: 101
atcgtacgca caaaacgtga tggtcgccgt aacgcttgcc tccaacccc  49

<210> SEQ ID NO 102
<211> LENGTH: 47
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m21-a based on Prochloron didemni.

<400> SEQUENCE: 102
cgtacgcaac acatgcacaa atacaaaaac agcttgcctc caacccc  47

<210> SEQ ID NO 103
<211> LENGTH: 44
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m21-b based on Prochloron didemni.

<400> SEQUENCE: 103
cgaagcttaa gatggctcaa cgccatcgta cgcaacacat gcac  44

<210> SEQ ID NO 104
<211> LENGTH: 49
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m22-a based on Prochloron didemni.

<400> SEQUENCE: 104
atcgtacgca caaaacgtga tacacgccgt gtcgcttgcc tccaacccc  49

<210> SEQ ID NO 105
<211> LENGTH: 49
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m23-a based on Prochloron didemni.

<400> SEQUENCE: 105
atcgtacgca caaaacgtga tacacgccgt attgcttgcc tccaacccc  49

<210> SEQ ID NO 106
<211> LENGTH: 49
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m24-a based on Prochloron didemni.

<400> SEQUENCE: 106
atcgtacgca caaaacgtga tacacgccgt cttgcttgcc tccaacccc  49

<210> SEQ ID NO 107
<211> LENGTH: 46
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m25-a based on Prochloron didemni.

<400> SEQUENCE: 107
gtacgcacaa aacgtttcac acgccgtaac gcttgcctcc aacccc  46

<210> SEQ ID NO 108
<211> LENGTH: 47
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m25-b based on Prochloron didemni.

<400> SEQUENCE: 108
cgaagcttaa gatggctcaa cgccatcgta cgcacaaaac gtttcac  47

<210> SEQ ID NO 109
<211> LENGTH: 46
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m26-a based on Prochloron didemni.

<400> SEQUENCE: 109
gtacgcacaa aacgtttgac acgccgtaac gcttgcctcc aacccc  46

<210> SEQ ID NO 110
<211> LENGTH: 47
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m26-b based on Prochloron didemni.

<400> SEQUENCE: 110
cgaagcttaa gatggctcaa cgccatcgta cgcacaaaac gtttgac  47

<210> SEQ ID NO 111
<211> LENGTH: 49
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m27-a based on Prochloron didemni.

<400> SEQUENCE: 111
atcgtacgca caaaacgtgt gacacgccgt aacgcttgcc tccaacccc  49

<210> SEQ ID NO 112
<211> LENGTH: 49
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m28-a based on Prochloron didemni.

<400> SEQUENCE: 112
atcgtacgca caaaacgtgc gacacgccgt aacgcttgcc tccaacccc  49

<210> SEQ ID NO 113
<211> LENGTH: 46
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m29-a based on Prochloron didemni.

<400> SEQUENCE: 113
gtacgcacaa aacgtcggac acgccgtaac gcttgcctcc aacccc  46

<210> SEQ ID NO 114
<211> LENGTH: 45
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m29-b based on Prochloron didemni.

<400> SEQUENCE: 114
cgaagcttaa gatggctcaa cgccatcgta cgcacaaaac gtcgg  45

<210> SEQ ID NO 115
<211> LENGTH: 49
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m30-a based on Prochloron didemni.

<400> SEQUENCE: 115
atcgtacgca caaaacatga tcagcgcaaa aacgcttgcc tccaacccc  49

<210> SEQ ID NO 116
<211> LENGTH: 46
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m30-b based on Prochloron didemni.

<400> SEQUENCE: 116
cgaagcttaa gatggctcaa cgccatcgta cgcacaaaac atgatc  46

<210> SEQ ID NO 117
<211> LENGTH: 47
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m31-a based on Prochloron didemni.

<400> SEQUENCE: 117
cgtacgcaca acacatgatc agcgcaaaaa cgcttgcctc caacccc  47

<210> SEQ ID NO 118
<211> LENGTH: 46
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m31-b based on Prochloron didemni.

<400> SEQUENCE: 118
cgaagcttaa gatggctcaa cgccatcgta cgcacaacac atgatc  46

<210> SEQ ID NO 119
<211> LENGTH: 47
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m32-a based on Prochloron didemni.

<400> SEQUENCE: 119
cgtacgcaca acaacagatc agcgcaaaaa cgcttgcctc caacccc  47

<210> SEQ ID NO 120
<211> LENGTH: 46
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m32-b based on Prochloron didemni.

<400> SEQUENCE: 120
cgaagcttaa gatggctcaa cgccatcgta cgcacaacaa cagatc  46

<210> SEQ ID NO 121
<211> LENGTH: 49
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m33-a based on Prochloron didemni.

<400> SEQUENCE: 121
atcgtacgca caaaaacaac gacacgcaca aacgcttgcc tccaacccc  49

<210> SEQ ID NO 122
<211> LENGTH: 45
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m33-b based on Prochloron didemni.

<400> SEQUENCE: 122
cgaagcttaa gatggctcaa cgccatcgta cgcacaaaaa caacg  45

<210> SEQ ID NO 123
<211> LENGTH: 49
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m34-a based on Prochloron didemni.

<400> SEQUENCE: 123
atcgtacgca caaaacgtga tacacgccgt acggcttgcc tccaacccc  49

<210> SEQ ID NO 124
<211> LENGTH: 47
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m35-a based on Prochloron didemni.

<400> SEQUENCE: 124
cgtacgcaca acaacaacac agcgcaaaaa cgcttgcctc caacccc  47

<210> SEQ ID NO 125
<211> LENGTH: 46
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m35-b based on Prochloron didemni.

<400> SEQUENCE: 125
cgaagcttaa gatggctcaa cgccatcgta cgcacaacaa caacac  46

<210> SEQ ID NO 126
<211> LENGTH: 49
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m36-a based on Prochloron didemni.

<400> SEQUENCE: 126
atcgtacgca caacgacagt cacagtcaca acggcttgcc tccaacccc  49

<210> SEQ ID NO 127
<211> LENGTH: 43
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m36-b based on Prochloron didemni.

<400> SEQUENCE: 127
cgaagcttaa gatggctcaa cgccatcgta cgcacaacga cag  43

<210> SEQ ID NO 128
<211> LENGTH: 49
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m37-a based on Prochloron didemni.

<400> SEQUENCE: 128
cgcacaaaca caaaaacaga tacacgcaca aacgcttgcc tccaacccc  49

<210> SEQ ID NO 129
<211> LENGTH: 47
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m37-b based on Prochloron didemni.

<400> SEQUENCE: 129
tcaacgccat cgtacgcaca aacacacgca caaacacaaa aacagat  47

<210> SEQ ID NO 130
<211> LENGTH: 49
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m38-b based on Prochloron didemni.

<400> SEQUENCE: 130
cgccatcgta cgcacagata caaacacacg cacaaacaca aaaacagat  49

<210> SEQ ID NO 131
<211> LENGTH: 37
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m38-c based on Prochloron didemni.

<400> SEQUENCE: 131
cgaagcttaa gatggctcaa cgccatcgta cgcacag  37

<210> SEQ ID NO 132
<211> LENGTH: 49
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m39-a based on Prochloron didemni.

<400> SEQUENCE: 132
atcgtacgcg gtaaacgtgt cggtcgccgt aacgcttgcc tccaacccc  49

<210> SEQ ID NO 133
<211> LENGTH: 47
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m40-a based on Prochloron didemni.

<400> SEQUENCE: 133
cgtacgcaga aaaagagtca gacgcagaaa cgcttgcctc caacccc  47

<210> SEQ ID NO 134
<211> LENGTH: 47
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m40-b based on Prochloron didemni.

<400> SEQUENCE: 134
cgaagcttaa gatggctcaa cgccatcgta cgcagaaaaa gagtcag  47

<210> SEQ ID NO 135
<211> LENGTH: 41
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m41-a based on Prochloron didemni.

<400> SEQUENCE: 135
cggtaaacgt gatggtcgcc gtaacgcttg cctccaaccc c  41

<210> SEQ ID NO 136
<211> LENGTH: 46
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m41-b based on Prochloron didemni.

<400> SEQUENCE: 136
tcaacgccat cgtacgcggt aatggtaacg gtaaacgtga tggtcg  46

<210> SEQ ID NO 137
<211> LENGTH: 49
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m42-a based on Prochloron didemni.

<400> SEQUENCE: 137
cgcacaaaca caaaaacaga tacaacgaca acggcttgcc tccaacccc  49

<210> SEQ ID NO 138
<211> LENGTH: 47
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m43-a based on Prochloron didemni.

<400> SEQUENCE: 138
cacaacgaca acgacagata cacgcacaaa cgcttgcctc caacccc  47

<210> SEQ ID NO 139
<211> LENGTH: 49
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m44-b based on Prochloron didemni.

<400> SEQUENCE: 139
tcaacgccat cgtacgcaca aacacacgca caacgacaac gacagatac  49

<210> SEQ ID NO 140
<211> LENGTH: 48
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m44-a based on Prochloron didemni.

<400> SEQUENCE: 140
cggcaaacgc aaaagcagat acacgcacaa acgcttgcct ccaacccc  48

<210> SEQ ID NO 141
<211> LENGTH: 45
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m44-b based on Prochloron didemni.

<400> SEQUENCE: 141
tcaacgccat cgtacgcaca acgacaacgg caaacgcaaa agcag  45

<210> SEQ ID NO 142
<211> LENGTH: 49
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m45-a based on Prochloron didemni.

<400> SEQUENCE: 142
atcgtacgca caaaagcaga tggtcgccgt aacgcttgcc tccaacccc  49

<210> SEQ ID NO 143
<211> LENGTH: 43
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m45-b based on Prochloron didemni.

<400> SEQUENCE: 143
cgaagcttaa gatggctcaa cgccatcgta cgcacaaaag cag  43

<210> SEQ ID NO 144
<211> LENGTH: 49
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m46-a based on Prochloron didemni.

<400> SEQUENCE: 144
atcgtacgca caaaacgtga tggtcgcaca aacgcttgcc tccaacccc  49

<210> SEQ ID NO 145
<211> LENGTH: 49
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m47-a based on Prochloron didemni.

<400> SEQUENCE: 145
atcgtacgcg gtaaacgtga tacacgcaca aacgcttgcc tccaacccc  49

<210> SEQ ID NO 146
<211> LENGTH: 47
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer m48-a based on Prochloron didemni.

<400> SEQUENCE: 146
cgtacgcggt aaaacagatg gtcgcacaaa cgcttgcctc caacccc  47

<210> SEQ ID NO 147
<211> LENGTH: 49
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer pre-lon a based on Prochloron didemni.

<400> SEQUENCE: 147
gggttaactt taacaaggag aaaaacatga acaagaaaaa catcctgcc  49

<210> SEQ ID NO 148
<211> LENGTH: 47
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer pre-lon b based on Prochloron didemni.

<400> SEQUENCE: 148
ggcgtaatac gactcactat agggttaact ttaacaagga gaaaaac  47

<210> SEQ ID NO 149
<211> LENGTH: 16
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer pre-lon c based on Prochloron didemni.

<400> SEQUENCE: 149
gcttgcctcc aacccc  16

<210> SEQ ID NO 150
<211> LENGTH: 38
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer wt-aydgv based on Prochloron didemni.

<400> SEQUENCE: 150
cgaagcttaa acgccatcgt acgcacaaaa cgtgatac  38

<210> SEQ ID NO 151
<211> LENGTH: 26
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer mc-aydgv based on Prochloron didemni.

<400> SEQUENCE: 151
cgaagcttaa acgccatcgt acgcac  26

<210> SEQ ID NO 152
<211> LENGTH: 25
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer mT-aydgv based on Prochloron didemni.

<400> SEQUENCE: 152
cgaagcttaa acgccatcgt acgcg  25

<210> SEQ ID NO 153
<211> LENGTH: 26
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer mS-aydgv based on Prochloron didemni.

<400> SEQUENCE: 153
cgaagcttaa acgccatcgt acgcag  26

<210s, SEQ ID NO 154 &211s LENGTH: 44 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: DNA sequence of primer wt-GS based on Prochloron didemni.

<4 OOs, SEQUENCE: 154 cgaagct tag ctg.ccgctgc cgctgccaac gcc atcgtac gcac 44

<210s, SEQ ID NO 155 &211s LENGTH: 43 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: DNA sequence of primer mT-GS based on Prochloron didemni.

<4 OO > SEQUENCE: 155 cgaagct tag ctg.ccgctgc cgctgccaac gcc atcgtac gcg 43

<210s, SEQ ID NO 156 &211s LENGTH: 44 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: DNA sequence of primer mS-GS based on Prochloron didemni.

<4 OOs, SEQUENCE: 156 cgaagct tag ctg.ccgctgc cgctgccaac gcc atcgtac gcag 44

<210> SEQ ID NO 157
<211> LENGTH: 43
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer mT-GG based on Prochloron didemni.

<400> SEQUENCE: 157
cgaagcttac ccgcccccgc ccccgccaac gccatcgtac gcg 43

<210> SEQ ID NO 158
<211> LENGTH: 43
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer mcS-GG based on Prochloron didemni.

<400> SEQUENCE: 158
cgaagcttac ccgcccccgc ccccgccaac gccatcgtac gca 43

<210> SEQ ID NO 159
<211> LENGTH: 42
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer wt-EGS1 based on Prochloron didemni.

<400> SEQUENCE: 159
ccgctgccgc taccctcaac gccatcgtac gcacaaaacg tg 42

<210> SEQ ID NO 160
<211> LENGTH: 36
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer GS3 an2 based on Prochloron didemni.

<400> SEQUENCE: 160
tttccgcccc ccgtcctagc tgccgctgcc gctacc 36

<210> SEQ ID NO 161
<211> LENGTH: 48
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer T7g1OM based on Prochloron didemni.

<400> SEQUENCE: 161
taatacgact cactataggg ttaactttaa gaaggagata tacatatg 48

<210> SEQ ID NO 162
<211> LENGTH: 53
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer XC3pool based on Prochloron didemni.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (23)..(23)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (24)..(24)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (29)..(29)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (30)..(30)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (35)..(35)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (36)..(36)
<223> OTHER INFORMATION: n stands for any base.

<400> SEQUENCE: 162
accaacgcca tcgtacgcac amnnacamnn acamnngctt gcctccaacc ccg 53
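As a minimal illustrative sketch (not part of the listing as filed), the stated length of SEQ ID NO 162 and the positions annotated as "n" can be cross-checked programmatically. The sequence is written here without the ten-base grouping; the names `xc3pool` and `n_positions` are hypothetical.

```python
# SEQ ID NO 162 (XC3pool), written without the ten-base group spacing.
xc3pool = "accaacgccatcgtacgcacamnnacamnnacamnngcttgcctccaaccccg"


def n_positions(seq):
    """Return the 1-based positions of every 'n' in the sequence."""
    return [i for i, base in enumerate(seq.lower(), start=1) if base == "n"]


# The <211> field gives the length; the misc_feature entries give the n positions.
print(len(xc3pool))
print(n_positions(xc3pool))
```

The same check can be applied to the other pooled primers by substituting their sequences.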

<210> SEQ ID NO 163
<211> LENGTH: 59
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer XC4pool based on Prochloron didemni.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (23)..(23)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (24)..(24)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (29)..(29)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (30)..(30)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (35)..(35)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (36)..(36)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (41)..(41)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (42)..(42)
<223> OTHER INFORMATION: n stands for any base.

<400> SEQUENCE: 163
accaacgcca tcgtacgcac amnnacamnn acamnnacam nngcttgcct ccaaccccg 59

<210> SEQ ID NO 164
<211> LENGTH: 65
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer XC5pool based on Prochloron didemni.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (23)..(23)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (24)..(24)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (29)..(29)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (30)..(30)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (35)..(35)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (36)..(36)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (41)..(41)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (42)..(42)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (47)..(47)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (48)..(48)
<223> OTHER INFORMATION: n stands for any base.

<400> SEQUENCE: 164
accaacgcca tcgtacgcac amnnacamnn acamnnacam nnacamnngc ttgcctccaa 60
ccccg 65

<210> SEQ ID NO 165
<211> LENGTH: 71
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer XC6pool based on Prochloron didemni.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (23)..(23)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (24)..(24)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (29)..(29)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (30)..(30)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (35)..(35)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (36)..(36)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (41)..(41)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (42)..(42)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (47)..(47)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (48)..(48)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (53)..(53)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (54)..(54)
<223> OTHER INFORMATION: n stands for any base.

<400> SEQUENCE: 165
accaacgcca tcgtacgcac amnnacamnn acamnnacam nnacamnnac amnngcttgc 60
ctccaacccc g 71

<210> SEQ ID NO 166
<211> LENGTH: 77
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer XC7pool based on Prochloron didemni.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (23)..(23)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (24)..(24)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (29)..(29)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (30)..(30)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (35)..(35)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (36)..(36)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (41)..(41)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (42)..(42)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (47)..(47)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (48)..(48)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (53)..(53)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (54)..(54)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (59)..(59)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (60)..(60)
<223> OTHER INFORMATION: n stands for any base.

<400> SEQUENCE: 166
accaacgcca tcgtacgcac amnnacamnn acamnnacam nnacamnnac amnnacamnn 60
gcttgcctcc aaccccg 77

<210> SEQ ID NO 167
<211> LENGTH: 83
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer XC8pool based on Prochloron didemni.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (23)..(23)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (24)..(24)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (29)..(29)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (30)..(30)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (35)..(35)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (36)..(36)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (41)..(41)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (42)..(42)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (47)..(47)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (48)..(48)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (53)..(53)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (54)..(54)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (59)..(59)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (60)..(60)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (65)..(65)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (66)..(66)
<223> OTHER INFORMATION: n stands for any base.

<400> SEQUENCE: 167
accaacgcca tcgtacgcac amnnacamnn acamnnacam nnacamnnac amnnacamnn 60
acamnngctt gcctccaacc ccg 83

<210> SEQ ID NO 168
<211> LENGTH: 53
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer XCST3pool based on Prochloron didemni.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (23)..(23)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (24)..(24)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (29)..(29)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (30)..(30)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (35)..(35)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (36)..(36)
<223> OTHER INFORMATION: n stands for any base.

<400> SEQUENCE: 168
accaacgcca tcgtacgcas wmnnaswmnn aswmnngctt gcctccaacc ccg 53
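The pooled primers use IUPAC ambiguity codes (m, s, w, n) and appear to be listed in antisense orientation. As a hedged sketch, reverse-complementing SEQ ID NO 168 with an IUPAC-aware complement table exposes the repeated degenerate unit on the sense strand. Its correspondence to the (NNK-WSU)n library of Fig. 1 is an inference rather than something the listing states. The names `IUPAC_COMPLEMENT`, `reverse_complement`, and `WST_CODON_TO_AA` are hypothetical; the small codon table is a subset of the standard genetic code.

```python
from itertools import product

# IUPAC nucleotide codes and their complements.
IUPAC_COMPLEMENT = {
    "a": "t", "c": "g", "g": "c", "t": "a",
    "m": "k", "k": "m", "r": "y", "y": "r",
    "s": "s", "w": "w", "b": "v", "v": "b",
    "d": "h", "h": "d", "n": "n",
}

# Bases covered by the ambiguity codes used in the 'wst' codon.
IUPAC_BASES = {"w": "at", "s": "cg", "t": "t"}

# Subset of the standard genetic code (DNA alphabet) for the WST codons.
WST_CODON_TO_AA = {"act": "Thr", "agt": "Ser", "tct": "Ser", "tgt": "Cys"}


def reverse_complement(seq):
    """Reverse-complement a DNA string that may contain IUPAC codes."""
    return "".join(IUPAC_COMPLEMENT[b] for b in reversed(seq.lower()))


# SEQ ID NO 168 (XCST3pool), written without the ten-base group spacing.
xcst3pool = "accaacgccatcgtacgcaswmnnaswmnnaswmnngcttgcctccaaccccg"
sense = reverse_complement(xcst3pool)

# Enumerate the concrete codons covered by the degenerate codon 'wst'.
wst_codons = ["".join(p) for p in product(*(IUPAC_BASES[b] for b in "wst"))]
wst_residues = sorted({WST_CODON_TO_AA[c] for c in wst_codons})

print(sense)
print(wst_codons, wst_residues)
```

Under these assumptions the randomized region reads as three `nnkwst` units, and the `wst` codon covers only Cys, Ser, and Thr codons.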

<210> SEQ ID NO 169
<211> LENGTH: 59
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer XCST4pool based on Prochloron didemni.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (23)..(23)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (24)..(24)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (29)..(29)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (30)..(30)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (35)..(35)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (36)..(36)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (41)..(41)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (42)..(42)
<223> OTHER INFORMATION: n stands for any base.

<400> SEQUENCE: 169
accaacgcca tcgtacgcas wmnnaswmnn aswmnnaswm nngcttgcct ccaaccccg 59

<210> SEQ ID NO 170
<211> LENGTH: 65
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer XCST5pool based on Prochloron didemni.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (23)..(23)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (24)..(24)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (29)..(29)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (30)..(30)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (35)..(35)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (36)..(36)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (41)..(41)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (42)..(42)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (47)..(47)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (48)..(48)
<223> OTHER INFORMATION: n stands for any base.

<400> SEQUENCE: 170
accaacgcca tcgtacgcas wmnnaswmnn aswmnnaswm nnaswmnngc ttgcctccaa 60
ccccg 65

<210> SEQ ID NO 171
<211> LENGTH: 71
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer XCST6pool based on Prochloron didemni.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (23)..(23)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (24)..(24)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (29)..(29)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (30)..(30)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (35)..(35)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (36)..(36)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (41)..(41)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (42)..(42)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (47)..(47)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (48)..(48)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (53)..(53)
<223> OTHER INFORMATION: n stands for any base.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (54)..(54)
<223> OTHER INFORMATION: n stands for any base.

<400> SEQUENCE: 171
accaacgcca tcgtacgcas wmnnaswmnn aswmnnaswm nnaswmnnas wmnngcttgc 60
ctccaacccc g 71

<210> SEQ ID NO 172
<211> LENGTH: 51
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: DNA sequence of primer Pat Epool. ex based on Prochloron didemni.

<400> SEQUENCE: 172
tttccgcccc ccgtcctagc tgccgctgcc gctaccaacg ccatcgtacg c 51

<210> SEQ ID NO 173
<211> LENGTH: 5
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of Recognition Sequence 1 based on Prochloron didemni.

<400> SEQUENCE: 173
Gly Gly Gly Gly Gly
1               5

<210> SEQ ID NO 174
<211> LENGTH: 5
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of Recognition Sequence 1 based on Prochloron didemni.

<400> SEQUENCE: 174
Gln Gln Gln Gln Gln
1               5

<210> SEQ ID NO 175
<211> LENGTH: 5
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of Recognition Sequence 1 based on Prochloron didemni.

<400> SEQUENCE: 175
Leu Leu Leu Leu Leu
1               5

<210> SEQ ID NO 176
<211> LENGTH: 5
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of Recognition Sequence 1 based on Prochloron didemni.

<400> SEQUENCE: 176
Pro Pro Pro Pro Pro
1               5

<210> SEQ ID NO 177
<211> LENGTH: 24
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of Leader Sequence based on Prochloron didemni.

<400> SEQUENCE: 177
Met Lys Glu Gln Asn Ser Phe Asn Leu Leu Gln Glu Val Thr Glu Ser
1               5                   10                  15
Glu Leu Asp Leu Ile Leu Gly Ala
            20

<210> SEQ ID NO 178
<211> LENGTH: 25
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of Leader Sequence based on Prochloron didemni.

<400> SEQUENCE: 178
Met Ile Leu Ala Ser Leu Ser Thr Phe Gln Gln Met Trp Ile Ser Lys
1               5                   10                  15
Gln Glu Tyr Asp Glu Ala Gly Asp Ala
            20                  25

<210> SEQ ID NO 179
<211> LENGTH: 37
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of Leader Sequence based on Prochloron didemni.

<400> SEQUENCE: 179
Met Glu Leu Gln Leu Arg Pro Ser Gly Leu Glu Lys Lys Gln Ala Pro
1               5                   10                  15
Ile Ser Glu Leu Asn Ile Ala Gln Thr Gln Gly Gly Asp Ser Gln Val
            20                  25                  30
Leu Ala Leu Asn Ala
        35

<210> SEQ ID NO 180
<211> LENGTH: 2
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.

<400> SEQUENCE: 180
Val Cys
1

<210> SEQ ID NO 181
<211> LENGTH: 4
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.

<400> SEQUENCE: 181
Val Cys Ala Cys
1

<210> SEQ ID NO 182
<211> LENGTH: 22
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.

<400> SEQUENCE: 182
Val Cys Ala Cys Ile Cys Phe Cys Val Cys Ala Cys Val Cys Ile Cys
1               5                   10                  15
Tyr Cys Phe Cys Ile Cys
            20

<210> SEQ ID NO 183
<211> LENGTH: 36
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.

<400> SEQUENCE: 183
Val Cys Ala Cys Ile Cys Phe Cys Val Cys Val Cys Phe Cys Tyr Cys
1               5                   10                  15
Ala Cys Tyr Cys Ile Cys Phe Cys Ala Cys Val Cys Ile Cys Tyr Cys
            20                  25                  30
Phe Cys Ile Cys
        35

<210> SEQ ID NO 184
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.

<400> SEQUENCE: 184
Arg Thr Asp Thr Asp Thr Arg Thr
1               5

<210> SEQ ID NO 185
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.

<400> SEQUENCE: 185
Arg Ser Asp Ser Asp Ser Arg Ser
1               5

<210> SEQ ID NO 186
<211> LENGTH: 6
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.

<400> SEQUENCE: 186
Cys Cys Cys Cys Cys Cys
1               5

<210> SEQ ID NO 187
<211> LENGTH: 6
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.

<400> SEQUENCE: 187
Thr Thr Thr Thr Thr Thr
1               5

<210> SEQ ID NO 188
<211> LENGTH: 6
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.

<400> SEQUENCE: 188
Ser Ser Ser Ser Ser Ser
1               5

<210> SEQ ID NO 189
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.

<400> SEQUENCE: 189
Val Phe Ala Thr Ile Thr Phe Thr
1               5

<210> SEQ ID NO 190
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.

<400> SEQUENCE: 190
Cys Phe Ala Thr Ile Thr Phe Thr
1               5

<210> SEQ ID NO 191
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.

<400> SEQUENCE: 191
Val Phe Ala Leu Cys Cys Cys Cys
1               5

<210> SEQ ID NO 192
<211> LENGTH: 2
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.

<400> SEQUENCE: 192
Val Thr
1

<210> SEQ ID NO 193
<211> LENGTH: 4
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.

<400> SEQUENCE: 193
Val Thr Ala Cys
1

<210> SEQ ID NO 194
<211> LENGTH: 12
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.

<400> SEQUENCE: 194
Val Thr Ala Cys Ile Thr Phe Cys Val Thr Ala Cys
1               5                   10

<210> SEQ ID NO 195
<211> LENGTH: 16
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.

<400> SEQUENCE: 195
Val Thr Ala Cys Ile Thr Phe Cys Val Thr Ala Cys Val Ser Ile Cys
1               5                   10                  15

<210> SEQ ID NO 196
<211> LENGTH: 22
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.

<400> SEQUENCE: 196
Val Thr Ala Cys Ile Thr Phe Cys Val Thr Ala Cys Val Ser Ile Cys
1               5                   10                  15
Tyr Thr Phe Cys Ile Thr
            20

<210> SEQ ID NO 197
<211> LENGTH: 36
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.

<400> SEQUENCE: 197
Val Thr Ala Cys Ile Thr Phe Cys Val Thr Ala Cys Val Ser Ile Cys
1               5                   10                  15
Tyr Thr Phe Cys Ile Thr Phe Cys Ala Thr Val Cys Ile Ser Tyr Cys
            20                  25                  30
Phe Thr Ile Cys
        35

<210> SEQ ID NO 198
<211> LENGTH: 16
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.

<400> SEQUENCE: 198
Val Thr Ala Cys Ile Thr Phe Cys Val Thr Ala Cys Val Thr Ile Cys
1               5                   10                  15

<210> SEQ ID NO 199
<211> LENGTH: 22
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.

<400> SEQUENCE: 199
Val Thr Ala Cys Ile Thr Phe Cys Val Thr Ala Cys Val Thr Ile Cys
1               5                   10                  15
Tyr Thr Phe Cys Ile Thr
            20

<210> SEQ ID NO 200
<211> LENGTH: 36
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: An example of newly identified substrate sequence of azoline ring introducing enzyme based on Prochloron didemni.

<400> SEQUENCE: 200
Val Thr Ala Cys Ile Thr Phe Cys Val Thr Ala Cys Val Thr Ile Cys
1               5                   10                  15
Tyr Thr Phe Cys Ile Thr Phe Cys Ala Thr Val Cys Ile Thr Tyr Cys
            20                  25                  30
Phe Thr Ile Cys
        35

<210> SEQ ID NO 201
<211> LENGTH: 42
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Amino acid sequence of a peptide encoded by PatEpire based on Prochloron didemni.

<400> SEQUENCE: 201
Met Asn Lys Lys Asn Ile Leu Pro Gln Gln Gly Gln Pro Val Ile Arg
1               5                   10                  15
Leu Thr Ala Gly Gln Leu Ser Ser Gln Leu Ala Glu Leu Ser Glu Glu
            20                  25                  30
Ala Leu Gly Asp Ala Gly Leu Glu Ala Ser
        35                  40

<210> SEQ ID NO 202
<211> LENGTH: 51
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Amino acid sequence of an example of substrate peptide based on Prochloron didemni.
<220> FEATURE:
<221> NAME/KEY: MISC_FEATURE
<222> LOCATION: (43)..(43)
<223> OTHER INFORMATION: Xaa stands for a Cassette region consisting of more than one amino acid.

<400> SEQUENCE: 202
Met Asn Lys Lys Asn Ile Leu Pro Gln Gln Gly Gln Pro Val Ile Arg
1               5                   10                  15
Leu Thr Ala Gly Gln Leu Ser Ser Gln Leu Ala Glu Leu Ser Glu Glu
            20                  25                  30
Ala Leu Gly Asp Ala Gly Leu Glu Ala Ser Xaa Ala Tyr Asp Gly Val
        35                  40                  45
Glu Pro Ser
    50

<210> SEQ ID NO 203
<211> LENGTH: 44
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Amino acid sequence of an example of substrate peptide based on Prochloron didemni.
<220> FEATURE:
<221> NAME/KEY: MISC_FEATURE
<222> LOCATION: (43)..(43)
<223> OTHER INFORMATION: Xaa stands for a Cassette region consisting of more than one amino acid.
<220> FEATURE:
<221> NAME/KEY: MISC_FEATURE
<222> LOCATION: (44)..(44)
<223> OTHER INFORMATION: Xaa stands for a Recognition Sequence 2 consisting of more than one amino acid.

<400> SEQUENCE: 203
Met Asn Lys Lys Asn Ile Leu Pro Gln Gln Gly Gln Pro Val Ile Arg
1               5                   10                  15
Leu Thr Ala Gly Gln Leu Ser Ser Gln Leu Ala Glu Leu Ser Glu Glu
            20                  25                  30
Ala Leu Gly Asp Ala Gly Leu Glu Ala Ser Xaa Xaa
        35                  40

<210> SEQ ID NO 204
<211> LENGTH: 65
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Amino acid sequence of an example of substrate peptide based on Prochloron didemni.
<220> FEATURE:
<221> NAME/KEY: MOD_RES
<222> LOCATION: (1)..(1)
<223> OTHER INFORMATION: FORMYLATION

<400> SEQUENCE: 204
Met Asn Lys Lys Asn Ile Leu Pro Gln Gln Gly Gln Pro Val Ile Arg
1               5                   10                  15
Leu Thr Ala Gly Gln Leu Ser Ser Gln Leu Ala Glu Leu Ser Glu Glu
            20                  25                  30
Ala Leu Gly Asp Ala Gly Leu Glu Ala Ser Leu Cys Gly Cys Thr Ser
        35                  40                  45
Tyr Cys Tyr Thr Val Ser Ala Tyr Asp Gly Val Gly Ser Gly Ser Gly
    50                  55                  60
Ser
65

<210> SEQ ID NO 205
<211> LENGTH: 65
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Amino acid sequence of an example of substrate peptide based on Prochloron didemni.
<220> FEATURE:
<221> NAME/KEY: MOD_RES
<222> LOCATION: (1)..(1)
<223> OTHER INFORMATION: FORMYLATION

<400> SEQUENCE: 205
Met Asn Lys Lys Asn Ile Leu Pro Gln Gln Gly Gln Pro Val Ile Arg
1               5                   10                  15
Leu Thr Ala Gly Gln Leu Ser Ser Gln Leu Ala Glu Leu Ser Glu Glu
            20                  25                  30
Ala Leu Gly Asp Ala Gly Leu Glu Ala Ser Ser Cys Asn Ser Ile Ser
        35                  40                  45
Met Ser Ser Thr Pro Ser Ala Tyr Asp Gly Val Gly Ser Gly Ser Gly
    50                  55                  60
Ser
65

<210> SEQ ID NO 206
<211> LENGTH: 65
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Amino acid sequence of an example of substrate peptide based on Prochloron didemni.
<220> FEATURE:
<221> NAME/KEY: MOD_RES
<222> LOCATION: (1)..(1)
<223> OTHER INFORMATION: FORMYLATION

<400> SEQUENCE: 206
Met Asn Lys Lys Asn Ile Leu Pro Gln Gln Gly Gln Pro Val Ile Arg
1               5                   10                  15
Leu Thr Ala Gly Gln Leu Ser Ser Gln Leu Ala Glu Leu Ser Glu Glu
            20                  25                  30
Ala Leu Gly Asp Ala Gly Leu Glu Ala Ser Cys Thr Leu Ser Asn Thr
        35                  40                  45
Pro Ser Ser Thr Leu Thr Ala Tyr Asp Gly Val Gly Ser Gly Ser Gly
    50                  55                  60
Ser
65

<210> SEQ ID NO 207
<211> LENGTH: 65
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Amino acid sequence of an example of substrate peptide based on Prochloron didemni.
<220> FEATURE:
<221> NAME/KEY: MOD_RES
<222> LOCATION: (1)..(1)
<223> OTHER INFORMATION: FORMYLATION

<400> SEQUENCE: 207

Met Asn Lys Lys Asn Ile Leu Pro Gln Gln Gly Gln Pro Val Ile Arg
1               5                   10                  15

Leu Thr Ala Gly Gln Leu Ser Ser Gln Leu Ala Glu Leu Ser Glu Glu
            20                  25                  30

Ala Leu Gly Asp Ala Gly Leu Glu Ala Ser Leu Ser Leu Ser Asn Thr
        35                  40                  45

Phe Thr Glu Ser Glu Ser Ala Tyr Asp Gly Val Gly Ser Gly Ser Gly
    50                  55                  60

Ser
65

<210> SEQ ID NO 208
<211> LENGTH: 65
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Amino acid sequence of an example of substrate peptide based on Prochloron didemni.
<220> FEATURE:
<221> NAME/KEY: MOD_RES
<222> LOCATION: (1)..(1)
<223> OTHER INFORMATION: FORMYLATION

<400> SEQUENCE: 208

Met Asn Lys Lys Asn Ile Leu Pro Gln Gln Gly Gln Pro Val Ile Arg
1               5                   10                  15

Leu Thr Ala Gly Gln Leu Ser Ser Gln Leu Ala Glu Leu Ser Glu Glu
            20                  25                  30

Ala Leu Gly Asp Ala Gly Leu Glu Ala Ser Ser Ser Leu Cys Ile Thr
        35                  40                  45

Pro Ser Ser Thr Gln Thr Ala Tyr Asp Gly Val Gly Ser Gly Ser Gly
    50                  55                  60

Ser
65

<210> SEQ ID NO 209
<211> LENGTH: 59
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Amino acid sequence of an example of substrate peptide based on Prochloron didemni.
<220> FEATURE:
<221> NAME/KEY: MOD_RES
<222> LOCATION: (1)..(1)
<223> OTHER INFORMATION: FORMYLATION

<400> SEQUENCE: 209

Met Asn Lys Lys Asn Ile Leu Pro Gln Gln Gly Gln Pro Val Ile Arg
1               5                   10                  15

Leu Thr Ala Gly Gln Leu Ser Ser Gln Leu Ala Glu Leu Ser Glu Glu
            20                  25                  30

Ala Leu Gly Asp Ala Gly Leu Glu Ala Ser Leu Cys Gly Cys Thr Ser
        35                  40                  45

Tyr Cys Tyr Thr Val Ser Ala Tyr Asp Gly Val
    50                  55

<210> SEQ ID NO 210
<211> LENGTH: 59
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Amino acid sequence of an example of substrate peptide based on Prochloron didemni.
<220> FEATURE:
<221> NAME/KEY: MOD_RES
<222> LOCATION: (1)..(1)
<223> OTHER INFORMATION: FORMYLATION

<400> SEQUENCE: 210

Met Asn Lys Lys Asn Ile Leu Pro Gln Gln Gly Gln Pro Val Ile Arg
1               5                   10                  15

Leu Thr Ala Gly Gln Leu Ser Ser Gln Leu Ala Glu Leu Ser Glu Glu
            20                  25                  30

Ala Leu Gly Asp Ala Gly Leu Glu Ala Ser Ser Cys Asn Ser Ile Ser
        35                  40                  45

Met Ser Ser Thr Pro Ser Ala Tyr Asp Gly Val
    50                  55

<210> SEQ ID NO 211
<211> LENGTH: 59
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Amino acid sequence of an example of substrate peptide based on Prochloron didemni.
<220> FEATURE:
<221> NAME/KEY: MOD_RES
<222> LOCATION: (1)..(1)
<223> OTHER INFORMATION: FORMYLATION

<400> SEQUENCE: 211

Met Asn Lys Lys Asn Ile Leu Pro Gln Gln Gly Gln Pro Val Ile Arg
1               5                   10                  15

Leu Thr Ala Gly Gln Leu Ser Ser Gln Leu Ala Glu Leu Ser Glu Glu
            20                  25                  30

Ala Leu Gly Asp Ala Gly Leu Glu Ala Ser Ser Ser Leu Cys Ile Thr
        35                  40                  45

Pro Ser Ser Thr Gln Thr Ala Tyr Asp Gly Val
    50                  55

<210> SEQ ID NO 212