
(19) United States
(12) Patent Application Publication
(10) Pub. No.: US 2013/0332133 A1
Horn et al.
(43) Pub. Date: Dec. 12, 2013

(54) CLASSIFICATION OF PROTEIN SEQUENCES AND USES OF CLASSIFIED PROTEINS

(75) Inventors: David Horn, Tel-Aviv (IL); Eytan Ruppin, Reut (IL); Vered Kunik, Ramat-HaSharon (IL); Zach Solan, Tel-Aviv (IL); Ben Sandbank, Ganei Tikva (IL); Yasmine Meroz, Tel-Aviv (IL); Uri Weinbart, Herzlia (IL)

(73) Assignee: Ramot At Tel Aviv University Ltd., Tel-Aviv (IL)

(21) Appl. No.: 12/227,183

(22) PCT Filed: May 13, 2007

(86) PCT No.: PCT/IL2007/000585
§ 371 (c)(1), (2), (4) Date: Apr. 10, 2009

Related U.S. Application Data

(60) Provisional application No. 60/799,318, filed on May 11, 2006, provisional application No. 60/861,746, filed on Nov. 30, 2006.

Publication Classification

(51) Int. Cl. G06F 9/28 (2006.01)
(52) U.S. Cl. CPC ...... G06F 19/28 (2013.01); USPC ...... 703/11

(57) ABSTRACT

A searchable protein database is disclosed. The protein database comprises a plurality of entries, each entry having a sufficiently short predicting sequence and a protein classifier corresponding to the predicting sequence. An unclassified protein sequence is classifiable by the database via searching therein for a motif of amino acids matching a predicting sequence of the database, thereby attributing to the unclassified protein a protein classifier.

Patent Application Publication Dec. 12, 2013 Sheet 1 of 28 US 2013/0332133 A1

10 begin

11 provide a protein database

12 search the sequence of the target protein for a motif matching a predicting sequence present in the database

13 use the classifier corresponding to the predicting sequence for classifying the target protein

14 issue a report

15 end

Fig. 1

20

searcher 22

protein database 24

classification functionality 26

output unit 28

Fig. 2

30 begin

31 extract repeatedly occurring motifs from sequences of the proteins

32 select a protein class

33 search the motifs for a motif which is present in proteins belonging to the class but not in proteins belonging to other classes

34 define the motif as a predicting sequence characterizing the class

35 screen the predicting sequence

37 record the predicting sequence and classes

Fig. 3

40

motif extraction unit 42

screening unit 48

Searcher 44

characterization unit 46

output unit 49

Fig. 4

51 extract repeatedly occurring motifs from sequences of the proteins

52 use the motifs for defining protein classes

Fig. 5

[Drawing sheets 7 to 26 of 28: graphical figures whose content is not recoverable from the text extraction. Surviving captions and labels: Fig. 7a (legend: "significant pattern"); Fig. 10 (axis: "predicting sequence matches"); Fig. 11; Fig. 12; Fig. 13 (axis: "predicting sequence matches per protein"); Fig. 15 (axis: "coverage"); Fig. 16 (axis: "category"); Fig. 17; Fig. 18; Fig. 20a.]

enzymes with predicting sequences and ProSite (30,893)

enzymes with predicting sequences only (14,990)

enzymes with ProSite only (1,521)

Fig. 24


CLASSIFICATION OF PROTEIN SEQUENCES AND USES OF CLASSIFIED PROTEINS

FIELD AND BACKGROUND OF THE INVENTION

[0001] The present invention relates to bioinformatics and, more particularly, but not exclusively, to a method and apparatus for classification of proteins according to primary sequences. The invention also relates to uses of polypeptides annotated according to the teachings of the present invention.

[0002] Informatics is the study and application of computer and statistical techniques for the management of information. In Genome projects, bioinformatics includes the development of methods to search databases fast and efficiently, to analyze nucleic acid sequence information, to predict protein function from sequence data and the like. Increasingly, molecular biology is shifting from the laboratory bench to the computer desktop. Advanced quantitative analyses, database comparisons and computational algorithms are needed to explore the relationships between sequence, function, structure and phenotype.

[0003] Proteins are linear polymers of amino acids. The polymerization reaction, which produces a protein, results in the loss of one molecule of water from each peptide bond formed (linking two adjacent amino acids), and hence proteins are often said to be composed of amino acid residues. Natural protein molecules may contain as many as 20 different types of amino acid residues, the sequence of which defines the so-called "primary sequence" of the protein. Proteins perform all the processes defining life, including enzymatic catalysis, transport and storage, coordinated motion, mechanical/structural support, immune protection, generation and transmission of nerve impulses, and control of growth and differentiation. This immense range of functions is accomplished by a seemingly boundless variety of protein sequences which translate into three-dimensional structures.

[0004] Enzymes comprise a large protein category of interest for biologists and/or protein chemists. One widely accepted method of classifying enzymes is that of the Enzyme Commission, commonly referred to as the "EC Hierarchy", which consists of four numbers, n1:n2:n3:n4, corresponding to four levels of classification. For example, the oxidoreductases class corresponds to n1=1, one of the six main divisions. For this class, n2 (subclass) specifies electron donors, n3 (sub-subclass) specifies electron acceptors and n4 indicates the exact enzymatic activity.

[0005] The properties of a protein are determined by its covalently-linked amino acid sequence. Genes encode proteins by providing a sequence of nucleotides that is translated into a sequence of amino acids. Proteins fold into a three-dimensional structure, which results substantially from non-covalent interactions (van der Waals forces, ionic bonds, hydrogen bonds, and hydrophobic and aromatic interactions) between the various amino acid side-chains within the molecule and with the water and ligand molecules within it. Examination of the three-dimensional structure of numerous natural proteins has revealed a number of recurring patterns, the most common of which are known as alpha helices, parallel beta sheets and anti-parallel beta sheets, which define a second level of structural organization.

[0006] The biological properties of proteins are mainly affected by the protein's three-dimensional structure, which determines the function of enzymes, the capacity and specificity of binding proteins such as receptors and antibodies, and the structural attributes of receptor/ligand molecules. For example, the function of an enzyme relies on the structure of its active site, a regional locus in the protein having a shape and a size that enables it to fit the intended substrate snugly at the molecular level. It also has a specific arrangement of chemical moieties with particular properties at the atomic level which govern the binding and catalysis of the substrate efficiently. This specific arrangement of chemical moieties, typically referred to as the chromophore, stems from atoms of certain amino acids in the enzyme's primary structure, and in some cases comprises atoms from one or more other small molecules called coenzymes, which are also held in place by the protein.

[0007] Similarly to enzymes, all protein functions rely on molecular recognition. Transport proteins such as haemoglobin must recognize the molecules they carry (in this case oxygen), receptors on the cell surface must recognize particular signaling molecules called ligands, transcription factors must recognize particular DNA sequences and antibodies must recognize specific epitopes in antigens, and the functional integrity of the cell depends critically on protein-protein interactions, particularly on the formation of multi-protein complexes.

[0008] Protein three-dimensional structures have evolved to address the vast functions carried out by proteins, and over the past decades, thousands of these structures have been elucidated to atomic resolution, mainly by X-ray diffraction and NMR techniques. Most of the presently known structures are stored in the Protein Data Bank (PDB), and with them emerged the field of structure-function relationship research.

[0009] Known in the art are algorithms which attempt to predict three-dimensional structures based on the primary sequence of a protein. Based on the predicted three-dimensional structure and prior knowledge regarding the relation between a particular three-dimensional structure and certain biological properties, unclassified proteins having a known primary sequence can be classified into predetermined protein classes, such as reactivity classes, specific binding classes, functional classes and the like. These algorithms, however, make correct predictions only in a limited number of cases in which the number of available homology proteins is sufficiently large.

[0010] The problem of classifying proteins from their primary sequence has defied solution for decades. One of the earliest classification methods is known as homology modeling. Homology modeling is applicable only for cases in which three-dimensional structures of similar primary sequences are already known. In this technique, a three-dimensional model for a protein of unknown structure (the target) is constructed based on one or more related proteins of known structure (the templates). The necessary conditions for getting a useful model are (i) detectable similarity and (ii) availability of a correct alignment between the target amino acid sequence and the template structures. Homology modeling is based on the notion that new proteins evolve gradually by amino acid substitution, addition and/or deletion, and that the three-dimensional structures and, therefore, affinity and functional classes are often strongly conserved during the evolution. In homology modeling, structural similarity is assumed between two proteins if there exists a similarity of at least 40% between the proteins at the sequence level.

[0011] However, even though the paradigm "structure determines function" holds generally true, presently known data-mining algorithms which use the structural and sequence databases for proteins are limited in automatically classifying and assigning function to new and unknown proteins solely on the basis of structural similarity to proteins of known structure and function.

[0012] In the field of genetic research, for example, the first step following the sequencing of a new gene is an effort to identify that gene's function. The most popular and straightforward methods to achieve that goal exploit the observation that if two peptide stretches exhibit sufficient similarity at the sequence level (i.e., one can be obtained from the other by a small number of insertions, deletions and/or amino acid mutations), then they probably are biologically related. Within this framework, the question of getting clues about the function of a new gene becomes one of identifying homologies in strings of amino acids. Generally, a homology refers to a similarity, likeness or relation between two or more sequences or strings. Thus, one is given a query sequence and a set of well characterized proteins and is looking for all regions of the query sequence which are similar to regions of sequences in the set.

[0013] The first approaches used for realizing this task were based on a technique known as dynamic programming. Unfortunately, the computational requirements of this method quickly render it impractical, especially when searching large databases. Generally, the problem is that dynamic programming variants spend a good part of their time computing homologies which eventually turn out to be unimportant. In an effort to work around this issue, a number of algorithms have been proposed which focus on discovering only extensive local similarities.

[0014] Identifying the similar regions between the query and the database sequences is, nevertheless, only the first part of the process. It is the second part of the process which is of interest to biologists. In the second part, the similarities are evaluated so as to properly classify the query sequence, according to its binding characteristics, function, three-dimensional structure and the like. Such evaluations are typically performed by combining biological information and statistical reasoning. Nonetheless, it is appreciated that there is a limit to how well a statistical model can approximate the biological reality.

[0015] A representative example of such evaluation relates to the classification of enzymes, which is typically according to their function. There are various known techniques for dealing with enzyme functional classification according to their primary sequence. One approach combines pairwise sequence similarity with the Support Vector Machine (SVM) classification method to obtain a remote homology detection [Liao, L. and Noble, W. S., 2003, "Combining pairwise sequence analysis and support vector machines for detecting remote protein evolutionary and structural relationships", J. of Comp. Biology, 10, 857-868].

[0016] In another technique, a feature selection algorithm is applied to regular-expression eMOTIFs [Huang, J. Y. and Brutlag, D. L., 2001, "The eMOTIF database", Nucleic Acids Research, 29, 202-204; Neville-Manning et al., 1998, "Highly specific protein sequence motifs for genome analysis", Proc. Natl. Acad. Sci. USA 95, 5865-5871]. This approach results in a high classification success rate at the second level of the EC classification.

[0017] An additional technique exploits a sequence recognition algorithm disclosed in International Patent Application, Publication No. WO/2005010642, to classify enzyme functionality at the second level of the EC classification [Cai et al., 2003, "SVM-Prot: web-based support vector machine software for functional classification of a protein from its primary sequence", Nucleic Acids Research, 31, 3692-3697].

[0018] Other methods of ascertaining functional data pertaining to primary sequence data are described by Ben-Hur and Brutlag (2006; Protein sequence motifs: Highly predictive features of protein function. In: Feature extraction, foundations and applications. I. Guyon, S. Gunn, M. Nikravesh, and L. Zadeh (eds.) Springer Verlag) and by Liao and Noble (2003; Combining pairwise sequence analysis and support vector machines for detecting remote protein evolutionary and structural relationships. J. of Comp. Biology, 10:857-868).

[0019] The present invention provides solutions to the problems associated with prior art protein classification techniques and provides searchable protein databases, tools to produce such databases, and method and apparatus for classifying protein sequences.

SUMMARY OF THE INVENTION

[0020] According to one aspect of the present invention there is provided a searchable protein database, comprising a plurality of entries, each of the plurality of entries having a predicting sequence which comprises less than L amino acids and a protein classifier corresponding to the predicting sequence, wherein an unclassified protein sequence is classifiable by the database via searching therein for a motif of amino acids matching a predicting sequence of the database, thereby attributing to the unclassified protein a protein classifier.

[0021] According to further features in preferred embodiments of the invention described below, the database is a searchable enzyme database. According to still further features in the described preferred embodiments the protein classifier represents a branch of an EC hierarchical classification. According to still further features in the described preferred embodiments the predicting sequence is present exclusively in entries having a protein classifier representing the branch or a descending branch thereof.

[0022] According to another aspect of the present invention there is provided a readable data storage medium, carrying the database.

[0023] According to further features in preferred embodiments of the invention described below, the database comprises at least one of the files Table-11.txt, Table-37.txt and Table-42.txt on enclosed CD-ROM.
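The database lookup of paragraph [0020] can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the disclosed implementation: the names `Entry`, `build_database` and `classify` are hypothetical, matching is taken to mean exact substring matching, the value L = 10 is an arbitrary pick from the range the application mentions, and the sequences and classifiers are invented toy data rather than entries from the actual tables.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Entry:
    """One database entry: a short predicting sequence and its protein classifier."""
    predicting_sequence: str
    classifier: str  # e.g. an EC branch such as "1.1" (toy value here)


L = 10  # assumed bound; entries must comprise fewer than L amino acids


def build_database(entries: List[Entry], max_len: int = L) -> List[Entry]:
    """Keep only entries whose predicting sequence has fewer than max_len residues."""
    return [e for e in entries if len(e.predicting_sequence) < max_len]


def classify(protein: str, database: List[Entry]) -> List[Tuple[int, Entry]]:
    """Return (position, entry) for every exact occurrence of a predicting sequence."""
    hits = []
    for entry in database:
        pos = protein.find(entry.predicting_sequence)
        while pos != -1:
            hits.append((pos, entry))
            pos = protein.find(entry.predicting_sequence, pos + 1)
    return sorted(hits, key=lambda hit: hit[0])


# Toy data, invented for illustration only.
database = build_database([
    Entry("ACDEF", "1.1"),
    Entry("WYKLM", "3.4"),
])
hits = classify("MKTACDEFGHWYKLMQ", database)
for position, entry in hits:
    print(position, entry.predicting_sequence, entry.classifier)
```

Returning the match position alongside the classifier reflects the idea, stated later in the summary, that a match may also indicate where on the protein an active site or pocket lies. A protein matching several entries yields several hits; how conflicting classifiers would be reconciled is not addressed by this sketch.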

LENGTHY TABLES The patent application contains a lengthy table section. A copy of the table is available in electronic form from the USPTO web site (http://seqdata.uspto.gov/?pageRequest=docDetail&DocID=US2013,0332133A1). An electronic copy of the table will also be available from the USPTO upon request and payment of the fee set forth in 37 CFR 1.19(b)(3).

[0024] According to yet another aspect of the present invention there is provided a method of classifying a protein sequence, comprising searching the protein sequence for a motif of amino acids matching a predicting sequence present in the protein database, and using the protein classifier corresponding to the predicting sequence for classifying the protein sequence.

[0025] According to further features in preferred embodiments of the invention described below, the method further comprises repeating the search at least once, thereby providing a plurality of motifs of amino acids matching predicting sequences present in the protein database.

[0026] According to still further features in the described preferred embodiments the method further comprises issuing a report containing classification of the protein sequence.

[0027] According to still further features in the described preferred embodiments the classifying of the protein sequence comprises determining presence or absence of at least one active pocket or active site on the protein sequence.

[0028] According to still further features in the described preferred embodiments the method further comprises determining the location of the active pocket(s) or active site(s).

[0029] According to still another aspect of the present invention there is provided apparatus for classifying a protein sequence, comprising: a searcher, capable of accessing the protein database, the searcher being operable to search the protein sequence for a motif of amino acids matching a predicting sequence present in the protein database; and a classification functionality capable of accessing the protein database and providing a protein classifier corresponding to the predicting sequence, so as to classify the protein sequence by the protein classifier.

[0030] According to further features in preferred embodiments of the invention described below, the classification functionality determines presence or absence of at least one active pocket or active site on the protein sequence.

[0031] According to still further features in the described preferred embodiments the classification functionality determines the location of the at least one active pocket or active site.

[0032] According to an additional aspect of the present invention there is provided a method of characterizing a predetermined collection of protein classes defining a classification system for classifying a plurality of proteins, the method comprises: (a) extracting repeatedly occurring motifs from amino acid sequences of the plurality of proteins, thereby providing a set of motifs; and (b) for each protein class: searching the set of motifs for at least one motif which comprises less than L amino acids, the at least one motif being present in at least a few proteins belonging to the protein class but not in proteins belonging to other protein classes, and defining the at least one motif as a predicting sequence characterizing the protein class; thereby characterizing the collection of protein classes.

[0033] According to yet an additional aspect of the present invention there is provided a method of classifying a plurality of proteins into protein classes, comprising: (a) extracting repeatedly occurring motifs from the sequences of the plurality of proteins, thereby providing a set of motifs; and (b) using the set of motifs for defining protein classes, each being characterized by at least one motif which comprises less than L amino acids; thereby classifying the plurality of proteins according to the protein classes.

[0034] According to further features in preferred embodiments of the invention described below, the predicting sequence predicts protein affinity, and the protein classifier describes an affinity class of the protein.

[0035] According to still further features in the described preferred embodiments the predicting sequence predicts protein function, and the protein classifier describes a functional class of the protein.

[0036] According to still further features in the described preferred embodiments the protein classifier indicates presence of an active site or active pocket at a location on the unclassified protein corresponding to the motif of amino acids.

[0037] According to still further features in the described preferred embodiments the extracting of the repeatedly occurring motifs comprises, for each sequence of the plurality of proteins: searching for partial overlaps between the sequence and other sequences, applying a significance test on the partial overlaps, and defining a most significant partial overlap as a repeatedly occurring motif.

[0038] According to still further features in the described preferred embodiments the search for partial overlaps is by constructing a graph having a plurality of paths representing the sequences of the plurality of proteins, and searching for partial overlaps between paths of the graph.

[0039] According to still further features in the described preferred embodiments the search for partial overlaps between paths of the graph comprises: defining, for each path, a set of sub-paths of variable lengths, thereby defining a plurality of sets of sub-paths; and for each set of sub-paths, comparing each sub-path of the set with sub-paths of other sets.

[0040] According to still further features in the described preferred embodiments the application of the significance test comprises calculating, for each path, a set of probability functions characterizing the partial overlaps, and evaluating a statistical significance of the set of probability functions.

[0041] According to still an additional aspect of the present invention there is provided apparatus for characterizing a predetermined protein class being a member of a collection of protein classes defining a classification system for classifying a plurality of proteins, the apparatus comprises: (a) a motif extraction unit capable of extracting repeatedly occurring motifs from amino acid sequences of the plurality of proteins, thereby providing a set of motifs; (b) a searcher capable of searching the set of motifs for at least one motif which comprises less than L amino acids, the at least one motif being present in at least a few proteins belonging to the predetermined protein class but not in proteins belonging to other protein classes of the collection; and (c) a characterization unit capable of defining the at least one motif as a predicting sequence characterizing the predetermined protein class.

[0042] According to further features in preferred embodiments of the invention described below, the plurality of proteins comprises a plurality of enzymes. According to still further features in the described preferred embodiments the classification system is an EC hierarchical classification system. According to still further features in the described preferred embodiments the protein classes are branches of an EC hierarchical classification system.

[0043] According to further features in preferred embodiments of the invention described below, the method further comprises employing a screening procedure for reducing the number of predicting sequences.
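The two-step characterization method (extract repeatedly occurring motifs, then keep those found in one class only) can be sketched as follows. This is a simplified illustration and not the disclosed algorithm: the graph-based overlap search and the significance test are replaced here by a plain count of how many distinct sequences contain a substring, and the names `extract_motifs` and `predicting_sequences`, the parameters `min_len`, `min_count` and `min_support`, and all sequences and class labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def extract_motifs(sequences: List[str], min_len: int = 3, max_len: int = 5,
                   min_count: int = 2) -> Set[str]:
    """Return substrings of length min_len..max_len found in at least min_count sequences.

    Stand-in for the overlap search plus significance test: a substring counts as
    'repeatedly occurring' simply when enough distinct sequences contain it.
    """
    owners: Dict[str, Set[int]] = defaultdict(set)
    for index, seq in enumerate(sequences):
        for k in range(min_len, max_len + 1):
            for start in range(len(seq) - k + 1):
                owners[seq[start:start + k]].add(index)
    return {motif for motif, seqs in owners.items() if len(seqs) >= min_count}


def predicting_sequences(labelled: List[Tuple[str, str]], L: int = 6,
                         min_support: int = 2) -> Dict[str, Set[str]]:
    """Map each class to motifs shorter than L that occur in that class and in no other."""
    motifs = extract_motifs([seq for seq, _ in labelled],
                            max_len=L - 1, min_count=min_support)
    result: Dict[str, Set[str]] = defaultdict(set)
    for motif in motifs:
        classes = [label for seq, label in labelled if motif in seq]
        # Exclusive to one class, and present in "at least a few" of its proteins.
        if len(set(classes)) == 1 and len(classes) >= min_support:
            result[classes[0]].add(motif)
    return dict(result)


# Toy data, invented for illustration only.
labelled = [
    ("AAKLMPQ", "1.1"),
    ("GGKLMTT", "1.1"),
    ("CCWYHDD", "3.4"),
    ("EEWYHFF", "3.4"),
]
print(predicting_sequences(labelled))
```

On this toy input the motif shared by the two "1.1" sequences and the motif shared by the two "3.4" sequences each become a predicting sequence for their class. Adding a sequence of one class that also contains the other class's motif removes that motif from the result, which is the exclusivity condition of step (b).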

[0044] According to a further aspect of the present invention there is provided apparatus for classifying a plurality of proteins into protein classes, comprising: (a) a motif extraction unit capable of extracting repeatedly occurring motifs from amino acid sequences of the plurality of proteins, thereby providing a set of motifs; and (b) a protein class definition unit, capable of defining protein classes using the set of motifs, wherein each protein class is characterized by at least one motif which comprises less than L amino acids.

[0045] According to further features in preferred embodiments of the invention described below, the motif extraction unit is operable to search each sequence for partial overlaps between the sequence and other sequences, to apply a significance test on the partial overlaps, and to define a most significant partial overlap as a repeatedly occurring motif.

[0046] According to still further features in the described preferred embodiments the motif extraction unit comprises a graph constructor capable of constructing a graph having a plurality of paths representing the sequences of the plurality of proteins.

[0047] According to still further features in the described preferred embodiments the graph comprises a plurality of vertices, each representing one type of amino acid, and wherein each path of the plurality of paths comprises a sequence of vertices respectively corresponding to an amino acid sequence of one protein of the plurality of proteins.

[0048] According to still further features in the described preferred embodiments L is selected from the group consisting of 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 and 15.

[0049] In an exemplary embodiment of the invention, there is provided a method of processing a substrate, the method comprising:

[0050] contacting the substrate with at least one polypeptide selected from the group consisting of the polypeptides set forth in SEQ ID Nos.: 77,838 to 198,923 under conditions which allow processing of the substrate by said at least one polypeptide, wherein said at least one polypeptide is selected to be capable of processing the substrate.

[0051] Optionally, the reaction conditions include a temperature of at least 45° centigrade.

[0052] Optionally, the substrate is selected from the group consisting of a lipid, a protein, a carbohydrate and a nucleic acid.

[0053] Optionally, the at least one peptide affects reaction kinetics of a lipid hydrolysis reaction.

[0054] Optionally, the at least one peptide affects reaction kinetics of a protein hydrolysis reaction.

[0055] Optionally, the at least one peptide affects reaction kinetics of a carbohydrate hydrolysis reaction.

[0056] Optionally, the at least one peptide affects reaction kinetics of a reaction with a nucleic acid substrate.

[0057] Optionally, said conditions comprise the presence of a detergent.

[0058] In an exemplary embodiment of the invention, there is provided a method of producing an enzyme, the method comprising:

[0059] (a) growing cells expressing a polypeptide selected from the group consisting of polypeptides as set forth in SEQ ID Nos.: 77,838 to 198,923; and

[0060] (b) harvesting said polypeptide from the culture.

[0061] Optionally, the method comprises:

[0062] (c) assaying a functional activity of said peptide.

[0063] Optionally, the method comprises:

[0064] (c) purifying said peptide to at least 50% purity by weight.

[0065] Optionally, the method comprises:

[0066] (c) purifying said peptide to medical grade purity.

[0067] Optionally, the cells express said polypeptide because they have been transformed or transfected with a nucleic acid construct comprising a nucleic acid sequence encoding a polypeptide selected from the group consisting of polypeptides as set forth in SEQ ID Nos.: 77,838 to 198,923 and a cis-acting regulatory element for expressing said polypeptide in a host cell.

[0068] In an exemplary embodiment of the invention, there is provided a nucleic acid construct comprising a nucleic acid sequence encoding a polypeptide selected from the group consisting of polypeptides as set forth in SEQ ID Nos.: 77,838 to 198,923 and a cis-acting regulatory element for expressing said polypeptide in a host cell.

[0069] In an exemplary embodiment of the invention, there is provided a host cell comprising the construct.

[0070] Optionally, the host cell comprises a eukaryotic cell.

[0071] Optionally, the host cell comprises a prokaryotic cell.

[0072] In an exemplary embodiment of the invention, there is provided a transgenic plant expressing an exogenous polypeptide selected from the group consisting of polypeptides as set forth in SEQ ID Nos.: 77,838 to 198,923.

[0073] In an exemplary embodiment of the invention, there is provided a transgenic animal expressing an exogenous polypeptide selected from the group consisting of polypeptides as set forth in SEQ ID Nos.: 77,838 to 198,923.

[0074] In an exemplary embodiment of the invention, there is provided a method of producing a specific enzyme, the method comprising:

[0075] (a) growing a culture of host cells according to claim 52; and

[0076] (b) harvesting the polypeptide from the culture.

[0077] In an exemplary embodiment of the invention, there is provided a pharmaceutical composition comprising a pharmaceutically acceptable carrier and, as an active ingredient, at least one polypeptide selected from the group consisting of polypeptides as set forth in SEQ ID Nos.: 77,838 to 198,923.

[0078] In an exemplary embodiment of the invention, there is provided an isolated composition comprising a polypeptide selected from the group consisting of polypeptides as set forth in SEQ ID Nos.: 77,838 to 198,923.

[0079] Optionally, the composition comprises a cleaning agent.

[0080] Optionally, the cleaning agent comprises at least one member selected from the group consisting of a detergent, a solvent and a surfactant.

[0081] In an exemplary embodiment of the invention, there is provided a method of laundering fabric, the method comprising:

[0082] (a) mixing a composition according to any of claims 59-62 with water to produce a washing solution; and

[0083] (b) wetting the fabric with the washing solution.

[0084] Optionally, the method comprises heating to a temperature of at least 45° centigrade.

[0085] In an exemplary embodiment of the invention, there is provided a chemical reagent comprising:

[0086] (a) catalytic molecules comprising at least one peptide selected from the group consisting of SEQ ID Nos.: 77,838 to 198,923; and

[0087] (b) an insoluble support with the catalytic molecules bound thereto.

[0088] In an exemplary embodiment of the invention, there is provided an industrial process comprising:

[0089] (a) contacting a plurality of substrate molecules with a reagent according to claim 64; and

[0090] (b) adjusting reaction conditions to contribute to activity of the catalytic molecules in processing the substrate molecules.

[0091] Optionally, the process is conducted batchwise.

[0092] Optionally, the insoluble support is immobilized and the process is conducted as a flow-through process.

[0093] Optionally, the process is conducted at a temperature of at least 45° centigrade.

[0094] In an exemplary embodiment of the invention, there is provided a method of identifying an inhibitor of a catalytic activity of an enzyme of interest, the method comprising:

[0095] (a) contacting an enzyme comprising a polypeptide as set forth in one of SEQ ID Nos.: 77,838 to 198,923 having an activity as set forth in one of Tables 38 and 39 with a substrate thereof and an agent to be evaluated under conditions which allow catalytic processing of the substrate by the enzyme; and

[0096] (b) monitoring said catalytic processing of said substrate and:

[0097] (i) concluding that the agent is an inhibitor if a reduction in catalytic processing is observed; and

[0098] (ii) concluding that the agent is not an inhibitor if a reduction in catalytic processing is not observed.

[0099] The present invention successfully addresses the shortcomings of the presently known configurations by providing a method and apparatus for classifying protein sequences.

[0100] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described below. In case of conflict, the patent specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.

[0101] Implementation of the method and system of the present invention involves performing or completing selected tasks or steps manually, automatically, or a combination thereof. Moreover, according to actual instrumentation and equipment of preferred embodiments of the method and system of the present invention, several selected steps could be implemented by hardware or by software on any operating system of any firmware or a combination thereof. For example, as hardware, selected steps of the invention could be implemented as a chip or a circuit. As software, selected steps of the invention could be implemented as a plurality of software instructions being executed by a computer using any suitable operating system. In any case, selected steps of the method and system of the invention could be described as being performed by a data processor, such as a computing platform for executing a plurality of instructions.

BRIEF DESCRIPTION OF THE DRAWINGS

[0102] The invention is herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of the preferred embodiments of the present invention only, and are presented in the cause of providing what is believed to be the most useful and readily understood description of the principles and conceptual aspects of the invention. In this regard, no attempt is made to show structural details of the invention in more detail than is necessary for a fundamental understanding of the invention, the description taken with the drawings making apparent to those skilled in the art how the several forms of the invention may be embodied in practice.

[0103] In the drawings:

[0104] FIG. 1 is a flowchart diagram describing a method suitable for classifying a target sequence of a protein, according to various exemplary embodiments of the present invention;

[0105] FIG. 2 is a schematic illustration of an apparatus for classifying a target protein sequence, according to various exemplary embodiments of the present invention;

[0106] FIG. 3 is a flowchart diagram of a method for characterizing a predetermined collection of protein classes, according to various exemplary embodiments of the present invention;

[0107] FIG. 4 is a schematic illustration of an apparatus for characterizing a protein class being a member of a collection of protein classes, according to various exemplary embodiments of the present invention;

[0108] FIG. 5 is a flowchart diagram of a method for classifying a plurality of proteins into protein classes, according to various exemplary embodiments of the present invention;

[0109] FIGS. 6a-b are simplified illustrations of a structured graph (FIG. 6a) and a random graph (FIG. 6b), according to a preferred embodiment of the present invention;

[0110] FIG. 7a illustrates a representative example of a portion of a graph with a search-path going through five vertices, according to a preferred embodiment of the present invention;

[0111] FIG. 7b illustrates a pattern-vertex having three vertices which are identified as a significant pattern of the trial path of FIG. 7a, according to a preferred embodiment of the present invention;

[0112] FIG. 8 is a histogram of motifs as a function of their length as calculated according to a preferred embodiment of the present invention for the six main classes of the EC hierarchical classification;

[0113] FIGS. 9a-c are histograms of percentage identity of pairs of enzymes that contain the same predicting sequences which comprise less than 9 amino acids (FIG. 9a), between 9 and 12 amino acids (FIG. 9b) and more than 12 amino acids (FIG. 9c);

[0114] FIG. 10 is a histogram of number of proteins in an additional exemplary set of previously characterized proteins as a function of number of predicting sequence matches;

[0115] FIG. 11 is a histogram of number of proteins in the same set of proteins as in FIG. 10 as a function of number of predicting sequence matches, indicating how many consistent and inconsistent matches occur for each number of predicting sequences;

[0116] FIG. 12 is a histogram similar to FIG. 11 showing 5 to 15 predicting sequence matches in greater detail;

[0117] FIG. 13 is a histogram indicating percentage of various combinations of consistent and inconsistent predicting sequence matches per protein;

[0118] FIG. 14 is a histogram depicting percentage of true predictions as a function of predicting sequence match category;

[0119] FIG. 15 is a histogram depicting number of proteins as a function of length of coverage (L) by number of consistent predicting sequences;

[0120] FIG. 16 is a histogram depicting number of proteins as a function of predicting sequence match category for a dataset of previously uncharacterized sequences;

[0121] FIG. 17 is a tree diagram illustrating representative portions of the EC hierarchy and the assignments of predicting sequences (PS) to predictive sequence classes to form exemplary predictive sequences according to some embodiments of the invention;

[0122] FIG. 18 shows aligned sequences of two groups of enzymes of level 4 that share the same 3rd level assignment. The organisms in the upper group, 5.1.3.20, belong to proteobacteria, while those of the lower group, 5.1.3.2, contain also plants (ARATH, CYATE and PEA); bold-faced substrings denote predictive sequences; amino-acids flanked by spaces denote active sites and binding sites, as indicated above; a list of all predictive sequences and their assignments to predictive sequence classes is presented below the sequences.

[0123] FIG. 19 is a three dimensional spacefilling model of enzyme P67910 depicting the active sites of (1) S, (2) Y and (3) K and the motif RYFNV in location (4). Clearly the latter shares with the loci (1) and (2) the same pocket, thus indicating its possible importance in the function of this enzyme. Visualization was done using the tool described in Moreland et al. (2005: The molecular biology toolkit (mbt): A modular platform for developing molecular visualization applications. BMC Bioinformatics, 6:21);

[0124] FIG. 20a is a three-dimensional display of enzyme P07649 (PDB code 1DJ0), belonging to EC 5.4.99.12, showing [1] an active site D at sequence location 60, [2] a binding site Y at location 118, [3] a binding site L at location 245. The active site is common to two predicting sequences [4] containing (CAGRT(D)AGVH). Other shown predicting sequences are [5] GQVVH at locations 67-71, [6] FHARF at locations 107-111, known to be a tentative RNA-binding peptide, [7] ENDFTS at locations 157-163 and [8] HMVRNI at 201-207, sharing a pocket with the active and binding sites. GQVVH and ENDFTS belong to PS3, all other motifs belong to PS4.

[0125] FIG. 20b shows a different display of the same enzyme, focusing on the pocket containing the active site.

[0126] FIG. 20c shows the relevant section of the enzyme sequence, with highlighted residues corresponding to the pocket and underlined residues corresponding to predicting sequences.

[0127] FIG. 21 is a histogram of number of enzyme sequences as a function of number of predicting sequences occurring on enzymes; the median is indicated on the Figure and the mean average is 9.5 predicting sequences/enzyme;

[0128] FIG. 22 is a pie chart illustrating the relation between the data of Swiss-Prot releases 45 and 48.3;

[0129] FIG. 23 is a histogram of number of enzymes as a function of number of predictive sequences with median and mean number of predictive sequences indicated;

[0130] FIG. 24 is a Venn diagram illustrating the intersection of enzymes characterized by an exemplary embodiment of the invention and PROSITE data of Swiss-Prot; and

[0131] FIG. 25 is a histogram of coverage of ProSite motifs by PSs plotted as a function of the required minimal amount (in percents) of amino-acids shared by the two motifs.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0132] One aspect of the invention relates to an algorithm which employs a large number of short predictive sequences, generally indicated as S, to determine a function (e.g. an enzymatic function) of a subject amino acid (AA) sequence which has not been previously characterized. A classifier C indicating classification information (e.g. a classification in the Enzyme Commission Hierarchy) is associated with each predictive sequence S. In an exemplary embodiment of the invention, each C provides a position in the EC hierarchy.

[0133] In an exemplary embodiment of the invention, the algorithm searches the subject AA sequence to provide all the S hits thereon. The S hits can be either consistent (c) or inconsistent (i), where consistency indicates assignment to a same EC class or to classes which share a parent/offspring relationship.

[0134] In an exemplary embodiment of the invention, 2 to 20 predictive sequences are used to assign an EC number to an AA sequence. Optionally, 3 to 5 predictive sequences are sufficient to reliably assign an EC number to an AA sequence. In some exemplary embodiments of the invention predictive sequences have significant predictive value even when they do not all consistently indicate a single EC classification.

[0135] Another aspect of the invention relates to automated analysis of AA sequences by a machine to identify predicting sequences within the sequence, analyze the identified predicting sequences and assign an EC classification based upon the identified predicting sequences. In an exemplary embodiment of the invention, the automated analysis assigns EC classifications which have low homology (e.g. 70%, or 60% or 50% or intermediate or lesser homologies at the AA level) to the most homologous enzyme in the assigned EC class. Optionally, the analysis assigns a single AA sequence to two EC classes. In an exemplary embodiment of the invention, assignment to two EC classifications indicates that there are two distinct enzymatic activities. Optionally, the two activities reside in distinct AA sequence domains or in overlapping AA sequence domains.

[0136] An additional aspect of the invention relates to use of AA sequences which were previously isolated to perform a function revealed by analysis of predicting sequences residing in the sequence. According to various exemplary embodiments of the invention, large numbers of previously unknown enzymes for hydrolysis (e.g. of proteins and/or [...] and/or lipids) are made available. According to other exemplary embodiments of the invention, large numbers of previously unknown enzymes for laboratory analytic use (e.g. [...] and/or ligases and/or polymerases) and/or medical use are made available. Optionally, enzymes are made available in a variety of forms including, but not limited to, chemical reagents comprising specific enzymes immobilized on an insoluble support, pharmaceutical compositions and cleaning preparations. Exemplary insoluble supports include, but are not limited to, cellulose, agarose, Sephadex, Sepharose, nitrocellulose, nylon, polycarbonate, polystyrene and glass. Immobilization is optionally transient (e.g. by ionic binding) or permanent (e.g. by covalent binding).

[0137] In an exemplary embodiment of the invention, the enzymes are isolated from thermophilic organisms. Optionally, such enzymes remain active at 45° centigrade or are chemically modified to obtain such thermal stability.

[0138] Another aspect of the invention relates to an isolated nucleic acid sequence encoding at least a functional portion of an AA sequence which was previously isolated but whose function was only revealed by analysis of predicting sequences residing in the sequence. Optionally, analysis of large groups of newly characterized polypeptides gives rise to a smaller, but still significant, group of products. In an exemplary embodiment of the invention, isolated nucleic acid sequences which encode the polypeptides of the present invention are incorporated into an expression vector. Optionally, the expression vector can be used to transfect bacteria and/or to transform cells and recombinantly express the exogenous polypeptide therein. According to various exemplary embodiments of the invention, the cells can be prokaryotic or eukaryotic cells (e.g., mammalian cells, insect cells, yeast or plant cells) which are amenable to transformation. Optionally, the transformed cells comprise at least a portion of a transgenic animal or a transgenic plant.

[0139] In an exemplary embodiment of the invention, there is provided a detergent composition comprising one or more enzymes characterized according to a method according to an exemplary embodiment of the invention and/or a method of use of the composition. Optionally, enzymes characterized according to a method according to an exemplary embodiment of the invention are set forth in SEQ ID Nos.: 77,838 to 198,923.

[0140] In an exemplary embodiment of the invention, there is provided a food composition and/or a food processing composition comprising one or more enzymes characterized according to a method according to an exemplary embodiment of the invention and/or a method of use of the composition. Optionally, enzymes characterized according to a method according to an exemplary embodiment of the invention are set forth in SEQ ID Nos.: 77,838 to 198,923.

[0141] The present invention also encompasses compositions useful for the preparation of ethanol comprising one or more enzymes characterized according to a method according to an exemplary embodiment of the invention and/or a method of use of the composition. Optionally, enzymes characterized according to a method according to an exemplary embodiment of the invention are set forth in SEQ ID Nos.: 77,838 to 198,923.

[0142] The present embodiments comprise a searchable protein database which can be used for classifying a protein according to its amino acid primary sequence. Specifically, the present invention can be used to predict a class of an unclassified protein for the purpose of, e.g., predicting its affinity or function. The present embodiments further comprise a readable data storage medium carrying the protein database, a method and apparatus for classifying protein sequences, and a method and apparatus for characterizing a collection of protein classes for the purpose of, e.g., building or updating the database.

[0143] The principles and operation of a method and apparatus according to the present invention may be better understood with reference to the drawings and accompanying descriptions.

[0144] Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not limited in its application to the details of construction and the arrangement of the components set forth in the following description or illustrated in the drawings. The invention is capable of other embodiments or of being practiced or carried out in various ways. Also, it is to be understood that the phraseology and terminology employed herein is for the purpose of description and should not be regarded as limiting.

[0145] It has long been recognized that in numerous cases proteins exhibit a high correlation between their three-dimensional structure and their function; however, other cases revealed no such correlation.

[0146] For example, the enzyme lysozyme (PDB entry 8lyz and EC 3.2.1.17) and the enzyme α-lactalbumin (PDB entry 1alc and EC 2.4.1.22) share only 44% sequence identity, but their backbones superpose with a root-mean-square deviation (RMSD) of only 1.55 Å, meaning these two enzymes share a very similar three-dimensional structure. Interestingly, their functions are mutually exclusive: α-lactalbumin cannot hydrolyze glycosides and lysozyme cannot participate in lactose synthesis.

[0147] A more impressive example of this phenomenon is the structural family known as the TIM barrel fold family, named after triosephosphate isomerase (PDB entry 1amk and EC 5.3.1.1). The eight-stranded α/β TIM barrel is by far the most common tertiary fold observed in high resolution protein crystal structures. It is estimated that 10% of all known enzymes have this domain. The members of this large family of proteins catalyze very different reactions. Such diversity in function has made this family an attractive target for protein structure-function relationship research, and the evolutionary history of this protein family has been the subject of rigorous debate. Arguments have been made in favor of both convergent and divergent evolution, yet due to the lack of sequence homology, the ancestry of this molecule is still not understood.

[0148] Table 1 below presents some 84 members of the TIM barrel family sorted by their EC number, which represents the function of the enzyme, as further detailed hereinafter. As shown in Table 1, the enzymes of the TIM barrel family span all classes of the EC hierarchical classification. These examples illustrate that mere backbone structural similarity does not necessarily imply functional similarity.

TABLE 1

TIM barrel family enzymes by EC class.

No.  Enzyme/Protein name  EC Number
1  CHO reductase  1.1.1.2
2  Inosine monophosphate dehydrogenase  1.1.1.205
3  Aldehyde reductase  1.1.1.21
4  Aldose reductase  1.1.1.21
5  3-alpha-hydroxy steroid dehydrogenase  1.1.1.50
6  Flavocytochrome B2  1.1.2.3
7  Glycolate oxidase  1.1.3.1
8  2-5-diketo D-gluconic acid reductase  1.1.99.3
9  Luciferase (flavin mono oxygenase)  1.14.14.3
10  Dihydro orotate dehydrogenase  1.3.3.1
11  Tetrahydromethanopterin reductase  1.5.99.11
12  Trimethylamine dehydrogenase  1.5.99.7
13  Old yellow enzyme  1.6.99.1
14  Methyltetrahydrofolate corrinoid  2.1.1.13
15  Transaldolase B  2.2.1.2
16  Cyclodextrin glycosyltransferase  2.4.1.19
17  Quinolinate phosphoribosyltransferase  2.4.2.19
18  tRNA-Guanine transglycosylase  2.4.2.29
19  Dihydropteroate synthase  2.5.1.15
20  Thiamin phosphate synthase  2.5.1.3
21  Pyruvate kinase  2.7.1.40
22  Pyruvate phosphate dikinase  2.7.9.1

TABLE 1-continued

TIM barrel family enzymes by EC class.

No.  Enzyme/Protein name  EC Number
23  Endonuclease IV  3.1.21.2
24  His A protein  3.1.3.15
25  Phosphoinositide-Specific Phospholipase C, Isozyme δ1  3.1.4.11
26  Phosphotriesterase  3.1.8.1
27  α-amylase  3.2.1.1
28  Oligo 1-6 glucosidase  3.2.1.10
29  Hevamine  3.2.1.14
30  Chitinase A  3.2.1.14
31  β-amylase  3.2.1.2
32  β-glycosidase  3.2.1.21
33  1-3 β glucanase  3.2.1.39
34  Endocellulase E1  3.2.1.4
35  Chitobiase  3.2.1.52
36  1-4 α D-glucan maltotetrahydrolase  3.2.1.60
37  Isoamylase  3.2.1.68
38  1-3, 1-4 β glucanase  3.2.1.73
39  Mannanase  3.2.1.78
40  Endo-β-1-4-xylanase  3.2.1.8
41  Exo-1-4-β-D-glycanase  3.2.1.91
42  Endo-β-N-acetylglucosaminidase  3.2.1.96
43  Myrosinase (thioglucoside glucohydratase)  3.2.3.1
44  Urease (α subunit)  3.5.1.5
45  Adenosine deaminase  3.5.4.4
46  Ornithine decarboxylase  4.1.1.17
47  Orotidine-5'-phosphate decarboxylase  4.1.1.23
48  Phosphoenolpyruvate carboxylase  4.1.1.31
49  Uroporphyrinogen decarboxylase  4.1.1.37
50  Ribulose-bisphosphate carboxylase (large subunit)  4.1.1.39
51  Indole-3 glycerol phosphate synthase  4.1.1.48
52  Fructose bis-phosphate aldolase  4.1.2.13
53  Arabino-heptulosonate-7-phosphate synthase  4.1.2.15
54  3-deoxy-D-manno-Octulosonate 8 phosphate synthase  4.1.2.16
55  Isocitrate Lyase  4.1.3.1
56  Malate synthase G  4.1.3.2
57  N-acetylneuraminate lyase  4.1.3.3
58  3-dehydroquinate dehydratase  4.2.1.10
59  Enolase  4.2.1.11
60  Tryptophan synthase (α subunit)  4.2.1.20
61  5-Amino laevulinate dehydratase  4.2.1.24
62  Propane Diol dehydratase  4.2.1.28
63  D-glucarate dehydratase  4.2.1.40
64  2-Dehydro-3-Deoxy-Galactarate Aldolase  4.2.1.42
65  Dihydropicolinate Synthase  4.2.1.52
66  His F protein  4.3.2.4
67  Alanine racemase  5.1.1.1
68  Mandelate racemase  5.1.2.2
69  D-ribulose-5-phosphate 3-epimerase  5.1.3.1
70  Triosephosphate Isomerase  5.3.1.1
71  Rhamnose isomerase  5.3.1.14
72  N-5-phosphorylanthranilate isomerase  5.3.1.24
73  Xylose isomerase  5.3.1.5
74  Phosphoenolpyruvate mutase  5.4.2.9
75  Glutamate Mutase  5.4.99.1
76  Methylmalonyl CoA mutase  5.4.99.2
77  Muconate cyclo isomerase  5.5.1.1
78  Chloromuconate isomerase  5.5.1.7
79  Yeast Hypothetical protein
80  FR-1 Protein
81  Potassium channel β subunit
82  Methylene tetrahydrofolate reductase
83  Narbonin
84  Concanavalin B

[0149] Conversely, structural dissimilarity does not necessarily imply functional dissimilarity, as demonstrated among many proteins. For example, carbonic anhydrase (EC 4.2.1.1) from the archaebacterium Methanosarcina thermophila (PDB entry 1thj) is utterly structurally dissimilar to carbonic anhydrase from the mammal Mus musculus (PDB entry 1dmx).

[0150] In a search for classification techniques, the present inventors have devised a searchable protein database which can be used for efficiently classifying a protein according to its amino acid primary sequence. The protein database of a preferred embodiment of the present invention comprises a plurality of entries, where each entry has a predicting sequence S and a protein classifier C corresponding to the predicting sequence S.

[0151] In some of the priority documents of the instant Application (U.S. Application No. 60/799,318 filed on May 11, 2006, and U.S. Application No. 60/861,746 filed on Nov. 30, 2006), “predicting sequences” or “PS” were also referred to as “specific peptides” or “SP”.

[0152] Exemplary databases according to the teachings of the present embodiments are provided in Appendix 1 and Tables 11, 37 and 42 on the enclosed CD-ROM (files “Table-11.txt”, “Table-37.txt” and “Table-42.txt”). Methods suitable for constructing the database according to various exemplary embodiments of the present invention are provided hereinbelow.

[0153] As is further detailed hereinunder, the predicting sequence of each entry predicts the class to which a target protein belongs, while the corresponding classifier provides classification information of the respective class, subclass, sub-subclass etc. For example, in one embodiment, S predicts the affinity of a protein and C describes the affinity class of the protein.

[0154] The term “affinity”, as used herein, refers to a specific distinguishing property of a given protein which relates to the molecule(s) that bind and interact with it in a specific and characteristic mode, and thereby at least partially describes the protein's function. A set of one or more molecules which bind and interact with a protein in a specific manner (e.g., substrates, ligands, coenzymes, co-factors, affinity-pair protein counterpart and the like) is referred to herein as an “interacting set”.

[0155] The affinity of a protein according to the present embodiments correlates strongly to the protein's chromophore, which comprises a specific set of chemical moieties which are specifically positioned in three-dimensional space so as to fit a complementary arrangement of chemical moieties which is characteristic of a member of the interacting set. The binding and interaction between the protein and the members of its interacting set is therefore governed by structural recognition patterns which effect reversible binding and exhibit a high binding (dissociation) constant relative to molecules which are not members of the binding set.

[0156] The EC hierarchical classification system discussed in detail in Appendix 2 below is one example of protein classification by affinity. For example, an enzyme which belongs to the hydrolase class (EC 3.-.-.-), acting on carbon-nitrogen bonds other than peptide bonds (EC 3.5.-.-) in cyclic amides (EC 3.5.2.-), is classified by affinity to cyclic amides such as, for example, cyanuric acid, and hence cyanuric acid amidohydrolase (EC 3.5.2.15) is uniquely identified by affinity to cyanuric acid. Thus, according to a preferred embodiment of the present invention S predicts the branch of the EC tree to which the protein belongs and C provides classification information in the form of the EC number defining the respective branch.

[0157] Receptor-ligand affinity is another example in which the predicting sequence predicts the affinity of a protein. Like enzymes, receptors interact with one or more ligands by binding which is governed by molecular recognition. The receptor exhibits one or more binding sites which are structurally and chemically compatible to bind the ligand, namely possess a unique chromophore comprising atoms of its amino acid chain. Therefore, a collection of receptor sequences wherein each receptor is associated with a known ligand can be classified according to the type of ligand that each receptor recognizes and binds. Thus, according to the presently preferred embodiment of the invention C describes ligands to which a protein having the predicting sequence S binds. Representative examples of such ligands include, without limitation, peptide-type ligands, charge-type ligands, phosphate-type ligands, nucleotide-type ligands and the like.

[0158] Receptor classes can be attributed to specific ligand/activity types such as G-protein-coupled receptors (GPCRs), guanylyl cyclase receptors, tyrosine kinase receptors, erythropoietin receptor, growth factors receptors, cytokines receptors, nicotinic receptors, acetylcholine receptors, atrial-natriuretic peptide (ANP) receptors, natriuretic peptides receptors, guanylin receptors, glycine receptor, GABA receptors, glutamate (kainate) receptors, NMDA receptors, AMPA receptors, serotonin (5-HT3) receptors and the like.

[0159] Within the large group of receptors, one particular class of receptors is the GPCR super-family, also known as seven transmembrane receptors (7TMRs). This family is a protein family of transmembrane receptors that transduce an extracellular signal (ligand binding) into an intracellular signal (G protein activation). The GPCRs are the largest protein family known, and members of this family are involved in all types of stimulus-response pathways, from intercellular communication to physiological senses. The diversity of functions is matched by the wide range of ligands recognized by members of the family, including photons (rhodopsin, the archetypal GPCR), small molecules (in the case of the histamine receptors) and proteins (for example, chemokine receptors). This pervasive involvement in normal biological processes has the consequence of involving GPCRs in many pathological conditions, which has led to GPCRs being the target of 40% to 50% of modern medicinal drugs. The GPCRs can be further subdivided into subclasses and the present embodiments can be used to sub-classify receptors of the GPCR super-family.

[0160] Thus, according to a preferred embodiment of the present invention S predicts the specific binding of a GPCR and C comprises classification information which describes ligands to which the GPCR binds. For example, the classification information which can be provided by the classifier can include specific ligand/activity types, such as, but not limited to, “muscarinic” acetylcholine receptors (acetylcholine and muscarine), adenosine receptors (adenosine), adrenoceptors (also known as adrenergic receptors, for adrenaline, and other structurally related hormones and drugs), GABA receptors, type-B (γ-aminobutyric acid or GABA), angiotensin receptors (angiotensin), cannabinoid receptors (cannabinoids), cholecystokinin receptors (cholecystokinin), dopamine receptors (dopamine), glucagon receptors (glucagon), metabotropic glutamate receptors (glutamate), histamine receptors (histamine), olfactory receptors (for the sense of smell), opioid receptors (opioids), rhodopsin (a photoreceptor), secretin receptors (secretin), serotonin receptors (except type-3), somatostatin receptors (somatostatin), calcium-sensing receptor (calcium) and the like.

[0161] In an additional embodiment, S predicts the three-dimensional structure of the protein or a portion thereof. In this embodiment, C describes a class or family of proteins having sufficient structural homology. One example of such a protein class is the aforementioned TIM barrel super-family, which includes a large number of proteins which share a fold (main feature of the tertiary structure).

[0162] In still another embodiment, S predicts the function of the protein. In this embodiment, C describes a class or family of proteins that share functional attributes, such as, for example, proteins which are derived from a common ancestor. For example, a classification according to an ancestor can be used to classify proteins which contain a catalytic triad and which are related by convergent evolution towards a stable, useful active site. Among these are found the α/β hydrolase fold family, the eukaryotic serine protease family, the cysteine protease family and the subtilisin family. For example, the class of proteins associated with the α/β hydrolase fold comprises several hydrolytic enzymes of widely differing phylogenetic origin and catalytic function. The core of each member of this group is an α/β-sheet, and not a barrel, of eight β-sheets connected by α-helices. These proteins have diverged from a common ancestor so as to preserve the arrangement of the catalytic residues, not the binding site. They all have a catalytic triad, the elements of which are borne on loops which are the best-conserved structural features in the fold. The unique topological and sequence arrangement of the triad residues produces a catalytic triad which is, in a sense, a mirror-image of the serine protease catalytic triad.

[0163] The classifier C can also describe a class or family of proteins that share communication transmittance attributes, such as, but not limited to, the cytokines. Cytokines are soluble proteinaceous substances, such as the interleukins and lymphokines, produced by a wide variety of haemopoietic and non-haemopoietic cell types, and are critical to the functioning of both innate and adaptive immune responses.

[0164] Cytokines can be classified into four different classes based on structural homology. A first cytokines class includes the cytokines with four bundles of alpha-helices. This class is subdivided into three sub-classes, known as the Interleukin (IL) 2 subclass, the interferon (INF) subclass and the IL-10 subclass. A second cytokines class is known as the IL-1 family and primarily includes the IL-1 and IL-18. A third cytokines class, known as the IL-17 class, includes cytokines which have a specific effect in promoting proliferation of T-cells that cause cytotoxic effects. A fourth cytokines class includes the chemokines.

[0165] Cytokines, and particularly immunological cytokines, can also be classified according to the target cells and/or the cells for which they stimulate proliferation and differentiation. With respect to immunological cytokines, for example, these can be classified into several classes. One such class can include cytokines which activate T cells, another class can include cytokines which stimulate proliferation of antigen-activated T and B cells, an additional class can include cytokines which stimulate proliferation and differentiation of B cells, an additional class can include cytokines which activate macrophages, and an additional class can include cytokines which stimulate hematopoiesis. Thus, in this embodiment S predicts the function of the cytokine and C describes this function.

[0166] Also contemplated are embodiments in which S predicts other protein attributes such as, but not limited to, electrostatic traits, cellular placement locus, motion capacity and the like. Depending on the protein attribute, C describes the class of proteins which share the respective attribute.

[0167] It is to be understood that the database of the present embodiments is not limited to one classification criterion. It is intended to embrace all combinations and sub-combinations of any of the aforementioned protein classification criteria.

[0168] For example, as will be appreciated by one of ordinary skill in the art, when the classifier C_i comprises an EC number, the classification can be according to function and/or affinity. Thus, a particular entry in the database can comprise a predicting sequence which predicts, e.g., the ability of the enzyme to catalyze oxidoreduction reactions. In this case the corresponding classifier can be EC 1 which stands for the oxidoreductases main class in the EC hierarchical classification. Another entry in the database can comprise a predicting sequence which predicts, e.g., the ability of the enzyme to act on carbon-nitrogen bonds in the cyclic amide in cyanuric acid. In this case the corresponding classifier can be EC 3.5.2.15, which describes, inter alia, function (catalyzing hydrolytic cleavage) and affinity (to the carbon-nitrogen bond in the cyclic amide in cyanuric acid).

[0169] Another combination of classification criteria which is contemplated is the combination of classification by function and fold. For example, a particular entry in the database can comprise a predicting sequence which predicts, e.g., communication transmittance attributes. In this case the classifier comprises information regarding these attributes (for example the classifier can point to the cytokines class of proteins). Another entry can comprise a predicting sequence which predicts, e.g., one of the four structural types of the cytokines (e.g., the four α-helix bundle). In this case the classifier can point to the respective type of cytokine. Other combinations of classification criteria are also contemplated.

[0170] The protein database of the present embodiments can be embodied in any electronically readable data storage medium, including, without limitation, a memory medium (e.g., RAM, ROM, EEPROM, flash memory, etc.), an optical storage medium (e.g., CD-ROM, DVD, etc.), a magnetic storage medium (e.g., magnetic cassettes, magnetic tape, magnetic disk storage device, etc.), or any other medium which can be used to store the matrix and which can be accessed electronically, e.g., by a data processor. The protein database of the present embodiments can also be embodied on a printed medium, e.g., a paper.

[0171] The number of entries in the protein database of the present embodiments is referred to herein as the size V of the protein database. There is no limitation on the numerical value of V. Preferably, the number of entries is large so as to facilitate classification of many types of proteins. According to a preferred embodiment of the present invention the protein database comprises at least T entries (i.e., V≥T), where T can be any number disclosed either explicitly or implicitly in the specification. For example, T can be any number from 1 to the size of the exemplified protein databases provided in Appendix 1 below and further in Tables 11, 37 and 42 on enclosed CD-ROM.

[0172] If desired, the database can be parsimonious in the sense that its size V is reduced compared to the size V, of the training set used for constructing the database. This embodiment is advantageous from the standpoint of data storage volume and/or processing time. It was found by the Inventors of the present invention that the size of the database can be significantly reduced by introducing further screening to the database according to additional information, e.g., biological data. For example, a parsimonious database can be obtained from a larger database by screening the larger database according to three-dimension structural information of known classified proteins, such as the proteins from which the entries of the larger database were extracted.

[0173] In a preferred embodiment of the invention, a larger database is screened according to biological information, such as, but not limited to, existence of specific sites, secondary structure, DNA and RNA binding, metal binding, protein-protein interactions, etc. For example, screening can be done by keeping only entries corresponding to binding and active sites in known proteins, while removing all other entries. The size of the resulting parsimonious database is preferably less than half, more preferably less than third, more preferably less than quarter, more preferably less than fifth, more preferably less than sixth, more preferably less than seventh, more preferably less than eighth, more preferably less than ninth, more preferably less than tenth of the size of the larger database. As demonstrated in the Examples section that follows, such procedure can reduce a database of over 50,000 entries (e.g., the database provided in Table 11 or 37 on enclosed CD-ROM) to a database of less than 2500 entries, thus reducing the size of the database by a factor of about 20. A representative Example of a database in which all predicting sequences cover active and/or binding sites is provided in Appendix 1 and further in Table 42 on enclosed CD-ROM.

[0174] The advantage of the protein database of the present embodiments is in its canonical predicting sequences. The present Inventors have found that it is sufficient to attribute classification information to a target protein based on a relatively short class-predicting sequence. In various exemplary embodiments of the invention the predicting sequence comprises less than L amino acids, where L is an integer which is typically not larger than 15, e.g., L=5, L=6, L=7, L=8, L=9, L=10, L=11, L=12, L=13, L=14 or L=15. The number of amino acids in a predicting sequence is referred to herein as the length of the predicting sequence. A preferred method which can be used for constructing the protein database of the present embodiments is provided hereinunder.

[0175] The present Inventors have found that it is sufficient to classify an unclassified target protein, particularly, but not exclusively an enzyme, by searching in its primary sequence for a motif of amino acids matching one of the predicting sequences S_i of the database. It will be appreciated that since the predicting sequences are generally short, the search for a matching motif over the primary sequence is a simple and fast task. In particular, the database of present embodiments is superior to prior art techniques because according to a preferred embodiment of the present invention it is not necessary to determine the similarity level (e.g., number of insertions, deletions and/or mutations) between the entire sequence of the target protein and the entire sequence of each individual protein of the database. Once a matching motif is found, the unclassified protein is classified by attributing the target protein with the protein classifier C_i which corresponds to the matched predicting sequence S_i. Once the protein is classified, its classification can be displayed, e.g., on a display device or hardcopy, recorded on a memory medium, or transmitted over a communication network.

[0176] When the database is an enzyme database, the protein classifiers of the database preferably represent branches of the EC hierarchical classification (EC tree). In various exemplary embodiments of the invention each predicting sequence S_i is present exclusively in entries having protein classifier representing a specific EC branch or descending branch thereof. In other words, the predicting sequences are

preferably specific to one, and only one, branch of the EC hierarchical classification, excluding uniqueness within its descending branches. For example, as is evident from the database provided in Appendix 1 below and further in Table 11 on enclosed CD-ROM, the predicting sequence SSFGSY (SEQ. ID No. 1907) is present in the EC branch 1.9.3.1 but not in any other EC branch because there are no descending branches to this EC branch. On the other hand, predicting sequence LEGEYG (SEQ. ID No. 13270) corresponds to the EC branch 1.1.1, and is therefore present only on EC branches beginning with the three EC numbers 1.1.1, but not necessarily on all of them.

[0177] A database in which the protein classifiers represent branches of the EC tree can also be used for determining whether or not a target protein has enzymatic function. Thus, the primary sequence of an unclassified target protein can be searched for one or more motifs of amino acids matching one or more of the predicting sequences of the database. If such motif(s) exist, the protein can be identified as an enzyme. Moreover, the protein classifiers associated with the found motifs can be used for classifying the enzyme according to the EC classification.

[0178] Typically, the search over the primary sequence of the target protein results in a plurality of hits, each corresponding to a different entry of the database. In this case, a confidence or likelihood test is preferably employed so as to determine whether or not the target protein has enzymatic function. Optionally and preferably the confidence or likelihood test is employed to exclude one or more predicting sequence hits which are more likely to be accidental. That is to say, a predicting sequence hit corresponding to a protein classifier representing a branch of the EC tree which is likely to be false is excluded from the list of predicting sequence hits. Protein classifiers associated with the remaining predicting sequence hits can then be used for classifying the protein according to the EC classification.

[0179] The likelihood test preferably comprises a thresholding procedure in which the number of predicting sequence hits on the target protein is compared to one or more predetermined confidence thresholds. The simplest case is a procedure in which a single threshold is used, whereby if the number of hits is higher than the threshold the target protein is predicted as having enzymatic function, and if the number of predicting sequence hits is equal to or lower than the threshold, the hits are declared as false positives, and the target protein remains unclassified. A preferred value of the threshold in this embodiment is 2, more preferably 3, more preferably 4, even more preferably 5 or more.

[0180] In another embodiment, two or more predetermined thresholds are used. In this embodiment each threshold is associated with an expected error. If the number of predicting sequence hits is higher than the ith threshold but is lower than or equal to the (i+1)th threshold, the target protein is predicted as having enzymatic function, and the prediction is associated with the ith expected error.

[0181] The expected errors can be obtained as follows: The database of the present embodiments can be tested against a database of random sequences, and a probability can be assigned to each number of hits. Additionally, the primary sequence of a plurality of classified proteins can be searched for motifs of amino acids matching predicting sequences of the database of the present embodiments so as to determine a set of observations, whereby each observation corresponds to a different number of predicting sequence hits. For example, one observation can be the number of classified proteins with no hits, another observation can be the number of classified proteins with one hit, and so on. A set of linear equations can then be constructed using the sets of probabilities and observations as coefficients. The linear equations can be used for calculating the expected errors. For example, as demonstrated in the Examples section that follows, the expected error associated with a threshold of 2 hits is about 24%.

[0182] The database of the present embodiments can also be used for classification according to active sites or active pockets. Thus, a particular entry in the database can comprise a predicting sequence which predicts existence of an active site or an active pocket. For example, the primary sequence of an unclassified target protein (e.g., an enzyme), can be searched for a motif of amino acids matching one of the predicting sequences of the database. Once one or more such predicting sequences of the unclassified target protein are found, the location of one or more of the predicting sequences can be tagged as belonging to an active site or an active pocket of the target protein. Thus, the database of the present embodiments can be used to predict secondary or tertiary structure from primary sequence.

[0183] The term "active pocket" as used herein refers to any spatial region on the protein which includes at least one site capable of facilitating a biological or chemical effect. Typically, "active pocket" is a common term to binding pocket and catalytic pocket. For example, an active pocket of a protein can be a volume in the three-dimensional structure of the protein which includes one or more binding sites and/or active sites. Representative examples of loci of active sites and binding sites are shown in FIGS. 18 and 20 in the Examples section that follows.

[0184] Also contemplated is the use of the database of the present embodiments for classification according to DNA and RNA binding, metal binding, protein-protein interactions, and the like.

[0185] Following is a description of various applications for which the present embodiments can be useful. Each of the following applications can be in a form of a method, which comprises one or more method steps to be executed, or in the form of an apparatus having one or more components capable of performing various method steps.

[0186] Methods of the present embodiments can be embodied in many forms. For example, the methods can be embodied in a tangible medium such as a computer for performing the method steps. The methods can be embodied on a computer readable medium, comprising computer readable instructions for carrying out the method steps. The methods can also be embodied in an electronic device having digital computer capabilities arranged to run the computer program on the tangible medium or execute the instructions on a computer readable medium.

[0187] Apparatus for implementing methods of the present embodiments can commonly be distributed to users on a distribution medium such as an electronically readable data storage medium in a form of computer programs. From the distribution medium, the computer programs can be copied to a hard disk or a similar intermediate storage medium. The computer programs can be run by loading the computer instructions either from their distribution medium or their intermediate storage medium into the execution memory of the computer, configuring the computer to act in accordance

with the method of the present embodiments. All these operations are well-known to those skilled in the art of computer systems.

[0188] Referring now to the drawings, FIG. 1 is a flowchart diagram describing a method suitable for classifying a target sequence of a protein, according to various exemplary embodiments of the present invention. The method begins at step 10 and continues to step 11 in which the readable protein database is provided. The method continues to step 12 in which the target sequence of the protein is searched for a motif of amino acids matching one or more of the predicting sequences of the database. The search over the target sequence of the protein can be repeated one or more times so as to provide a plurality of motifs, each matching one or more of the predicting sequences of the database. Since the protein primary sequence can be expressed as a one-dimensional vector of characters, the search for a matching motif can be easily achieved by the ordinarily skilled person, for example, by traversing the one-dimensional vector and comparing its elements with the elements of the predicting sequences of the database.

[0189] The method continues to step 13 in which the protein classifier corresponding to the predicting sequence is used for classifying the target protein sequence. The classification depends on the type of protein classifier. Specifically, when the predicting sequences of the database predict protein affinities, the target protein sequence is classified according to the protein classifier into a protein affinity class; when the predicting sequences of the database predict protein functions, the target protein sequence is classified according to the protein classifier into a protein functional class. Other classes are also contemplated. For example, the classification can be according to active sites or active pockets. Specifically, the presence or absence of one or more active sites or active pockets of the target protein can be determined. This can be achieved by using a database which covers active sites or active pockets to a sufficiently high degree of confidence (e.g., more than 50%, more preferably more than 60%). A representative database suitable for the presently preferred embodiment of the invention is provided in Table 42 on enclosed CD-ROM, see also Appendix 1 below.

[0190] The existence on the target sequence of motifs matching predicting sequences of such or similar database can be interpreted as presence of active sites or active pockets. Further, the number of matching motifs can be used for estimating the likelihood of such interpretation. Optionally and preferably the location of one or more of the motifs of the target protein which match predicting sequences of the database can be tagged as belonging to an active site or an active pocket.

[0191] Classification can also be according to DNA and RNA binding, metal binding, protein-protein interactions and the like.

[0192] Once the protein is classified, the method optionally and preferably continues to step 14 in which a report containing its classification is issued. The report can be displayed, recorded or transmitted as further detailed hereinabove. The method ends at step 15.

[0193] Reference is now made to FIG. 2 which is a schematic illustration of an apparatus 20 for classifying a target protein sequence. Apparatus 20 can be used for executing selected steps of the method described above and in the flowchart diagram in FIG. 1. Apparatus 20 comprises a searcher 22 which is capable of accessing the protein database of the present embodiments, generally shown at 24. Searcher 22 searches the target sequence for a motif of amino acids which matches a predicting sequence present in database 24, as further detailed above. Apparatus 20 further comprises a classification functionality 26 which also accesses database 24 and provides a protein classifier corresponding to the predicting sequence matched by searcher 22. Thus, in use, searcher 22 traverses the target sequence and compares motifs extracted from the target sequence to the predicting sequences of the database. Once a match is found between the extracted motif and a predicting sequence, searcher 22 passes the information to classification functionality 26 which pulls the respective classifier from database 24 and classifies the target sequence according to the classifier.

[0194] According to a preferred embodiment of the present invention classification functionality 26 determines the presence or absence and optionally the location of active pockets or active sites on the protein sequence, as further detailed hereinabove.

[0195] Classification functionality 26 is optionally and preferably operatively associated with an output unit 28 which displays, records and/or transmits a report containing the classification of the target sequence. Output unit 28 can comprise a display device, a printing device, a recording device and/or a transmitting device. Output unit 28 can also comprise means suitable either for storing information in a computer readable medium or for communicating with functionalities which store the information in the computer readable medium.

[0196] The present embodiments successfully provide a method for characterizing protein classes. The method can be used to construct a protein database by assigning one or more predicting sequences for each protein class. Once the classes are assigned with predicting sequences, the database can be constructed, e.g., as a searchable table in which each entry comprises one predicting sequence and information regarding the protein class to which the predicting sequence is assigned. The constructed database can be recorded on a computer readable medium for further use. The method of this embodiment is supervised in the sense that it can be employed on any collection of protein classes provided each class in the collection includes a known number of protein sequences. A description of an unsupervised method according to another aspect of the present invention is provided hereinunder.

[0197] The supervised method can be employed on any collection of protein classes which defines a classification system. In one embodiment, the method is employed on a collection of enzyme classes which is spanned by the EC hierarchical classification system or a portion (selected branches) thereof.

[0198] More generally, the method can be used for providing efficient characterization to proteins which are already clustered by some protein clustering technique.

[0199] The term "cluster" refers to a protein sequence cluster, which is a group of protein sequences sharing a requisite level of homology and/or other similar traits according to a given clustering criterion. A process and/or method to group protein sequences as such is referred to as clustering, and is typically performed by a clustering application program implementing a cluster algorithm. Many cluster algorithms are known, including, without limitation, hierarchical clustering, K-means clustering, Bayesian clustering and the like.
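As a rough illustration of steps 12 and 13 of FIG. 1, combined with the single-threshold variant of the thresholding procedure, the following Python sketch searches a target sequence for predicting sequences and attributes the corresponding classifiers. It is a minimal sketch under stated assumptions and not the implementation of the present embodiments: the dictionary layout, the function names `find_hits` and `classify`, and the target sequence are hypothetical. Only the two predicting sequences and their EC branches are taken from the text.

```python
from typing import Dict, List, Tuple

# Hypothetical miniature database: predicting sequence S_i -> classifier C_i.
# The two entries are the examples quoted in the text; a real database
# would hold thousands of entries.
DATABASE: Dict[str, str] = {
    "SSFGSY": "EC 1.9.3.1",
    "LEGEYG": "EC 1.1.1",
}


def find_hits(target: str, database: Dict[str, str]) -> List[Tuple[int, str, str]]:
    """Step 12: traverse the target and report (position, S_i, C_i) per match."""
    hits = []
    for motif, classifier in database.items():
        start = target.find(motif)
        while start != -1:
            hits.append((start, motif, classifier))
            start = target.find(motif, start + 1)
    return sorted(hits)


def classify(target: str, database: Dict[str, str], threshold: int = 2) -> List[str]:
    """Step 13 with one threshold: classify only if the hit count exceeds it."""
    hits = find_hits(target, database)
    if len(hits) <= threshold:
        return []  # hits treated as false positives; target stays unclassified
    return sorted({classifier for _, _, classifier in hits})


# Made-up target sequence that happens to contain both example motifs.
target = "MKTLEGEYGAAVSSFGSYLLK"
print(find_hits(target, DATABASE))
print(classify(target, DATABASE, threshold=1))
```

With this made-up target both motifs are found, so a threshold of 1 yields a classification, whereas a threshold of 2 leaves the target unclassified.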

[0200] Thus, suppose for example, that many proteins are subjected to a clustering procedure which produces a collection of protein classes (clusters in this example) such that for each protein it is known (to a certain degree of confidence, say, at least 50%) to which class it belongs. The method of the present embodiments can be employed on the collection of classes and assign the classes predicting sequences. A representative example of such clustering procedure is a procedure which defines clusters of proteins which share a known fold. In this embodiment, the method of the present embodiments assigns sequences which predict the shared folds.

[0201] Reference is now made to FIG. 3 which is a flowchart diagram of a method for characterizing a predetermined collection of protein classes, according to various exemplary embodiments of the present invention.

[0202] The method begins at step 30 and continues to step 31 in which repeatedly occurring motifs are extracted from the amino acid sequences of the proteins. Preferably, but not obligatorily, step 31 is executed on all the proteins of all the classes. The repeatedly occurring motifs can be extracted in any way known in the art. According to a preferred embodiment of the present invention step 31 employs a sequence recognition algorithm, such as, but not limited to, the algorithm disclosed in International Patent Application, Publication No. WO/2005010642, the contents of which are hereby incorporated by reference. A preferred technique for extracting the repeatedly occurring motifs is provided hereinunder. In any event, once step 31 is completed a set of motifs is provided.

[0203] The method continues to steps 32-34 in which each class is characterized by a predicting sequence, as follows. In step 32, a class is selected from the collection of protein classes. In step 33 the set of motifs is searched for one or more motifs present in at least a few (e.g., the majority of) proteins belonging to the selected class but not in proteins belonging to other protein classes. According to a preferred embodiment of the present invention step 33 is directed to search for motifs which are sufficiently short. Specifically, step 33 is directed to search for motifs which comprise less than L amino acids, where L is an integer which is typically not larger than 15, as further detailed hereinabove. In step 34 the (sufficiently short) motif which was found is defined as the predicting sequence which characterizes the set.

[0204] Once the predicting sequence is defined, the method loops back to step 32 and steps 32-34 are preferably repeated for another class of the collection. Optionally and preferably the method continues to step 35 in which biological information is used to screen the predicting sequences obtained in steps 32-34. Preferably, the screening step is performed so as to reduce the number of predicting sequences by at least a factor of R where R is a number greater than 1, more preferably greater than 2, e.g., R=5, R=10, R=15 or R=20. The biological information can comprise, for example, active sites annotations, secondary structure and the like, and the screening can comprise keeping only predicting sequences which cover the biological information and discarding all other predicting sequences. A representative example of a screening process is provided in the Examples section that follows. Once all the classes of the collection are characterized by predicting sequences, the method optionally and preferably moves to step 36 in which the predicting sequences and classifiers providing classification information of the corresponding classes are recorded on a computer readable medium.

[0205] The method ends at step 37.

[0206] FIG. 4 is a schematic illustration of an apparatus 40 for characterizing a protein class being a member of a collection of protein classes, according to various exemplary embodiments of the present invention. Apparatus 40 can be used for executing selected steps of the method described hereinabove and in the flowchart diagram of FIG. 3. Apparatus 40 comprises a motif extraction unit 42 which extracts repeatedly occurring motifs as delineated above and further exemplified hereinunder. Apparatus 40 further comprises a searcher 44 which searches the set of motifs provided by unit 42 for motif or motifs present in several proteins belonging to the protein class but not in proteins belonging to other protein classes in the collection. The searcher 44 is preferably configured to provide sufficiently small motifs, as further detailed hereinabove. Apparatus 40 further comprises a characterization unit 46 which defines the motif or motifs found by searcher 44 as predicting sequence(s) characterizing the protein class. Optionally and preferably apparatus 40 comprises a screening unit 48 which screens the predicting sequences according to biological information as further detailed hereinabove and exemplified in the Examples section that follows.

[0207] In use, apparatus 40 can be employed for all or a portion of the classes in the collection, such that each class is assigned with one or more predicting sequences, and a database of predicting sequences S_i and classifiers C_i can be constructed as explained above. In various exemplary embodiments of the invention apparatus 40 comprises an output unit 49 which records the database on a computer readable medium or transmits the database to a functionality which records the database on a computer readable medium.

[0208] Repeatedly occurring motifs extracted from amino acid sequences of proteins can also be used for unsupervised classification of proteins.

[0209] As used herein, "unsupervised classification" refers to classification into a plurality of classes without any a-priori knowledge of the distribution of the proteins within the classes. Thus, unlike the supervised method described above in which the classes as well as the proteins of each class are known, in the unsupervised classification, all the proteins are initially unclassified and an unsupervised classification method is executed to define classes and the distribution of proteins within the defined classes.

[0210] Reference is now made to FIG. 5 which is a flowchart diagram of a method for classifying a plurality of proteins into protein classes, according to various exemplary embodiments of the present invention. The method of the presently preferred embodiments is unsupervised.

[0211] The method begins at step 50 and continues to step 51 in which repeatedly occurring motifs are extracted from the amino acid sequences of the proteins as delineated above and further exemplified hereinunder. The method continues to step 52 in which the motifs are used for defining protein classes. The definition of classes is according to the extracted motifs. Specifically, two or more proteins are declared as belonging to class C_i if they all include the same motif S_i. Preferably, but not obligatorily, the motifs can be used to define the classes in an exclusive manner. In this embodiment, two or more proteins are declared as belonging to class C_i if they include the motif S_i and if there is no other class C_j (j≠i) which includes proteins having the motif S_i. It is to be understood that the exclusive definition can be combined with a non-exclusive definition. Thus, the method can define both

motifs which are exclusive to the respective classes and motifs which are present in more than one class. Such definition of classes is particularly useful for defining a hierarchical classification system (e.g., a tree) whereby a motif which is present in an ancestor class is also present in its descending classes. On the other hand, a descending class includes at least one motif which is not present in its ancestor or any other class having a non-descending relation therewith.

[0212] The method ends at step 53.

[0213] Following is a description of a preferred procedure for extracting repeatedly occurring motifs. The procedure is based on the sequence recognition algorithm disclosed in International Patent Application, Publication No. WO 2005/010642.

[0214] The procedure begins with a search for overlaps between the sequences. Specifically, for each amino acid sequence, partial overlaps between the sequence and other sequences are searched. Each sequence is considered as a "trial-sequence" which is compared, segment by segment, to all other sequences.

[0215] This can be done, for example, by constructing a graph which represents the dataset. Such a graph may include a plurality of vertices and paths of vertices, where each vertex represents one amino acid and each path of vertices represents a primary sequence of one protein. Thus, according to a preferred embodiment of the present invention, for 20 amino acids there are 20 vertices on the graph. These 20 vertices are connected thereamongst by edges, preferably directed edges, in many combinations, depending on the sequences of the proteins.

[0216] The endpoints of each path of the graph are preferably marked, e.g., by adding marking vertices, such as a "begin" vertex before its first vertex and an "end" vertex after its last vertex. These marking vertices represent the beginning and end of the respective sequence of the dataset. Thus, each vertex which represents an amino acid has at least one incoming path and at least one outgoing path, preferably an equal number of incoming and outgoing paths.

[0217] Once the graph is constructed, overlaps between the paths thereof can be searched, for example, by considering different sub-paths of different lengths for each path and comparing these sub-paths with sub-paths of other paths of the graph. It was found by the inventors of the present invention that such a graph is not a random graph. Rather, the graph typically includes bundles of sub-paths, signifying a relatively high probability associated with a given sub-structure which can be identified as a motif.

[0218] FIGS. 6a-b show simplified illustrations of a structured graph (FIG. 6a) and a random graph (FIG. 6b). Shown in FIGS. 6a-b are a plurality of vertices $e_1, e_2, \ldots, e_{16}$, each representing one amino acid. Referring to FIG. 6a, of particular interest are vertex $e_1$ and vertex $e_{15}$, which are connected by many sub-paths of the graph, hence defining an overlap 62 therebetween.

[0219] The procedure continues by applying a significance test on the partial overlaps. Significance tests are known in the art and can include, for example, statistical evaluation of flow quantities, such as, but not limited to, probability functions or conditional probability functions which characterize the partial overlaps between paths on the graph.

[0220] According to a preferred embodiment of the present invention a set of probability functions is defined using the number of paths connecting particular vertices on the graph. For example, considering a single vertex, $e_1$, on the graph, a probability, $p(e_1)$, can be defined as the number of paths leaving $e_1$ divided by the total number of paths. Similarly, considering two vertices, $e_1$ and $e_2$, a (conditional) probability, $p(e_2 \mid e_1)$, can be defined as the number of paths leading from $e_1$ to $e_2$ divided by the total number of paths leaving $e_1$. This prescription is preferably applied to all combinations of vertices on the graph, defining, e.g., $p(e_1)$, $p(e_2 \mid e_1)$, $p(e_3 \mid e_1e_2)$, for paths leaving $e_1$ and going through $e_2$ and $e_3$, and $p(e_1)$, $p(e_1 \mid e_2)$, $p(e_1 \mid e_2e_3)$, for paths going through $e_3$ and $e_2$ and entering $e_1$. In terms of all the conditional probabilities, the graph can define a Markov model. Thus, a "search-path" of length K, going through vertices $e_1, e_2, \ldots, e_K$ on the graph (corresponding to a trial-sequence of K amino acids), can be used to define a variable order Markov model up to order K, represented by the following matrix:

$$
M=\begin{pmatrix}
p(e_1) & p(e_1\mid e_2) & p(e_1\mid e_2e_3) & \cdots & p(e_1\mid e_2e_3\ldots e_K)\\
p(e_2\mid e_1) & p(e_2) & p(e_2\mid e_3) & \cdots & p(e_2\mid e_3e_4\ldots e_K)\\
p(e_3\mid e_1e_2) & p(e_3\mid e_2) & p(e_3) & \cdots & p(e_3\mid e_4e_5\ldots e_K)\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
p(e_K\mid e_1e_2\ldots e_{K-1}) & p(e_K\mid e_2e_3\ldots e_{K-1}) & p(e_K\mid e_3e_4\ldots e_{K-1}) & \cdots & p(e_K)
\end{pmatrix}
\qquad \text{(EQ. 1)}
$$

[0221] For any sub-path of $e_1e_2\ldots e_K$ having a length $m<K$, a similar Markov model can be obtained from an $m\times m$ diagonal sub-matrix of M. It will be appreciated that whereas the collection of all paths which represent a sequence of the dataset defines all the conditional probabilities appearing in M, the search-path $e_1e_2\ldots e_K$ used in M does not necessarily represent a sequence of the dataset. The definition of the search-path is based on conditional probabilities, such as $p(e_2 \mid e_1)$, which are predetermined by those paths which represent the sequences of the dataset.

[0222] An occurrence of a significant overlap (e.g., overlap 62 in FIG. 6a) along a search-path can be identified by observing some extreme values of the relevant conditional probabilities. According to a preferred embodiment of the present invention, the probability functions comprise probability functions characterizing a rightward direction on each path and probability functions characterizing a leftward direction on each path. Thus, for a search-path $e_1e_2\ldots e_n\ldots e_K$, a probability function, $P_R$, characterizing a rightward direction, is preferably defined by the first column of M, moving top down, and a probability function, $P_L$, characterizing a leftward direction, is preferably defined by the last column of M, moving bottom up. Specifically,

$$
P_R(n)=p(e_n\mid e_1e_2\ldots e_{n-1})\quad\text{and}\quad P_L(n)=p(e_n\mid e_{n+1}e_{n+2}\ldots e_K).
\qquad \text{(EQ. 2)}
$$

[0223] As will be appreciated by one ordinarily skilled in the art, both $P_R$ and $P_L$ vary between 0 and 1 and are specific to the path in question.
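The path-counting definitions of paragraphs [0215]-[0223] can be illustrated with a small sketch. This is not the implementation of the cited algorithm: the function names, the marker strings and the toy sequences are hypothetical, and occurrences of a sub-path are counted rather than distinct paths, which is an assumption that matters only when a sequence repeats a sub-path.

```python
from fractions import Fraction

BEGIN, END = "<begin>", "<end>"  # hypothetical marker vertices


def as_paths(sequences):
    """Turn each sequence into a path: begin -> residues -> end."""
    return [[BEGIN, *seq, END] for seq in sequences]


def count(paths, sub):
    """Count occurrences of the contiguous sub-path `sub` over all paths."""
    sub, k = list(sub), len(sub)
    return sum(path[i:i + k] == sub
               for path in paths
               for i in range(len(path) - k + 1))


def p_right(paths, search, i, j):
    """P_R(e_i; e_j) = p(e_j | e_i ... e_{j-1}), with 0-based i <= j."""
    if i == j:  # unconditional p(e_i): occurrences of e_i / number of paths
        return Fraction(count(paths, search[i:i + 1]), len(paths))
    den = count(paths, search[i:j])
    return Fraction(count(paths, search[i:j + 1]), den) if den else Fraction(0)


def p_left(paths, search, i, j):
    """P_L(e_j; e_i) = p(e_i | e_{i+1} ... e_j), with 0-based i <= j."""
    if i == j:
        return Fraction(count(paths, search[j:j + 1]), len(paths))
    den = count(paths, search[i + 1:j + 1])
    return Fraction(count(paths, search[i:j + 1]), den) if den else Fraction(0)


# Toy data for illustration only; these sequences are not from the source.
paths = as_paths(["ACDE", "ACDF", "GCDE", "ACHK"])
search = list("ACDE")

# First column of M, top down (rightward), and last column, bottom up (leftward).
rightward = [p_right(paths, search, 0, j) for j in range(4)]
leftward = [p_left(paths, search, i, 3) for i in range(3, -1, -1)]
```

Exact fractions are used so that the values can be compared directly with hand counts of paths, in the same way the text quotes ratios for its figures.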

[0224] In terms of the number of paths, $P_R$ and $P_L$ can be understood considering, for simplicity, that the path in question is $e_1e_2e_3e_4$ (K=4). Hence, according to a preferred embodiment of the present invention, $P_R(3)=p(e_3\mid e_1e_2)$, the rightward direction probability corresponding to the sub-path $e_1e_2e_3$, equals the number of paths moving from $e_1$ through $e_2$ into $e_3$ divided by the number of paths moving from $e_1$ to $e_2$, and $P_L(3)=p(e_3\mid e_4)$, the leftward direction probability corresponding to the sub-path $e_3e_4$, equals the number of paths moving from $e_3$ to $e_4$ divided by the number of paths entering $e_4$. It is convenient to define the aforementioned probabilities in the explicit notations $P_R(e_1; e_3)$ and $P_L(e_4; e_3)$, respectively.

[0225] FIG. 7a illustrates a representative example of a portion of a graph in which a search-path, going through $e_1e_2e_3e_4e_5$ and marked with a "begin" vertex at its beginning and an "end" vertex at its end, is selected. Also shown in FIG. 7a are other paths, joining and leaving the search-path at various vertices. The bundle of sub-paths between vertex $e_2$ and vertex $e_4$ displays certain coherence, possibly indicating the presence of a significant pattern in the dataset.

[0226] To illustrate the use of the probabilities $P_R$ and $P_L$, the portion of the graph is positioned in a rectangular coordinate system in which the vertices are conveniently arranged along the abscissa while the ordinate represents probability values. Progressing from $e_1$ rightwards, $P_R(n)$, n=1, 2, 3, 4, 5, has the values 4/41, 3/4, 1, 1 and 1/3 respectively. Progressing from $e_4$ leftwards, $P_L(n)$, n=4, 3, 2, 1, has the values 6/41, 5/6, 1 and 3/5.

[0227] Thus, $P_R$ first increases because some other paths join to form a coherent bundle, then decreases at $e_5$, because many paths leave the path at $e_4$. Similarly, progressing leftward, $P_L$ first increases because other paths join at $e_4$ and then decreases because paths leave the path at $e_2$. The decline of $P_R$ or $P_L$ is preferably interpreted as an indication of the end and beginning of the candidate pattern, respectively. The overlaps can be identified by requiring that the values of $P_R$ and $P_L$ within a candidate overlap are sufficiently large. Thus, a candidate overlap can be defined as a sub-sequence represented by a path or a sub-path on the graph in which $P_R>1-\varepsilon_R$ and $P_L>1-\varepsilon_L$, where $\varepsilon_R$ and $\varepsilon_L$ are two parameters smaller than unity. A typical value for $\varepsilon_R$ and $\varepsilon_L$ is from about 0.01 to about 0.99.

[0228] As used herein the term "about" refers to ±10%.

[0229] Optionally and preferably, the decrement of $P_R$ and $P_L$ can be quantified by defining decrease functions and comparing their values with predetermined cutoffs, hence to identify overlaps between paths or sub-paths. According to a preferred embodiment of the present invention, the decrease functions are defined as ratios between probabilities of paths having some common vertices. In the example shown in FIG. 7a the decrement of $P_R$ at $e_4$ can be quantified using a rightward direction decrease function, $D_R$, defined as $D_R(e_1; e_4)=P_R(e_1; e_5)/P_R(e_1; e_4)$, and the decrement of $P_L$ at $e_2$ can be quantified using a leftward direction decrease function, $D_L$, defined as $D_L(e_4; e_2)=P_L(e_4; e_1)/P_L(e_4; e_2)$. Denoting the predetermined cutoffs by $\eta_R$ and $\eta_L$, respectively, a partial overlap can be identified when both D

[0231] The null hypothesis depends on the choice of the functions which characterize the overlaps. For example, when the ratios are used, the null hypothesis can be $P_R(e_1; e_5)\ge\eta_R P_R(e_1; e_4)$ and $P_L(e_4; e_1)\ge\eta_L P_L(e_4; e_2)$. Alternatively, the null hypothesis can be $P_R\le 1-\varepsilon_R$ and $P_L\le 1-\varepsilon_L$, or any other combination of the above conditions.

[0232] For a given search-path, $P_R$ and $P_L$ are preferably calculated from many starting points (such as $e_1$ and $e_4$ in the present example), more preferably from all starting points on the search-path, traversing each sub-path both leftward and rightward. This procedure defines many search-sections on the search-path, from which several partial overlaps can be identified. Once the partial overlaps have been identified, the most significant partial overlap is defined as a significant pattern.

[0233] In an alternative, yet preferred, embodiment, a set of cohesion coefficients, $c_{ij}$ ($i>j$), are calculated, for each trial path, as follows:

$$
c_{ij}=M_{ij}\,\log\!\left[\frac{M_{ij}}{M\cdot M}\right]\qquad \text{(EQ. 3)}
$$

[the subscripts of the two matrix elements in the denominator are illegible in the source] where $M_{ij}$ are elements of the variable order Markov model matrix (see Equation 1). For a given search-path there are many sub-paths, each represented by an element in the set $c_{ij}$ which can be considered as an "overlap score." Once the set $c_{ij}$ is calculated, its supremum is selected and the sub-path which corresponds to the supremum is preferably defined as the significant pattern of the search-path.

[0234] It is to be understood that it is not intended to limit the scope of the present invention to the above statistical significance tests, and that other significance tests as well as other probability functions or cohesion coefficients can be implemented.

[0235] The procedure in which overlaps are searched along a search-path is preferably repeated for more than one path of the original graph, more preferably on all the paths of the original graph (hence on all the sequences). It will be appreciated that significant patterns can be found, depending on the degree by which the search-path overlaps with other paths.

[0236] According to a preferred embodiment of the present invention, the graph is "rewired" by merging each, or at least a few, significant patterns into a new vertex, referred to hereinafter as a pattern-vertex. This is equivalent to a redefinition of one or more sequences whereby several amino acids are grouped according to the significant patterns to which they belong. This rewiring process reduces the length of the paths of the graph; nonetheless, the contents of the paths in terms of the original sequences of the proteins are conserved.

[0237] In principle, the identification of the significant patterns can depend on other vertices of the search-path, and not only on the vertices belonging to the overlapping sub-paths. The extent of this dependence is dictated by the selected identification procedure (e.g., the choice of the probability functions, the significance test, etc.). Referring to the example of FIG. 7a, a sub-path $e_2e_3e_4$ is defined as a significant pattern of the search-path "begin" -> $e_1$ -> ... -> $e_5$ -> "end". By definition, the vertices $e_2$, $e_3$ and $e_4$ also belong to other
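As a rough numerical illustration of the decrease functions and of the rewiring step described above, the following sketch plugs in the probabilities quoted for the FIG. 7a walk-through. Several things here are assumptions rather than statements of the source: the values 3/4, 1/3 and 3/5 are readings of partly damaged figures in the text, the cutoff `ETA` is an arbitrary illustrative number, the "below cutoff" criterion is assumed because the source sentence stating it is cut off, and `rewire` with its vertex label is a hypothetical helper that merges a pattern on a single path only.

```python
from fractions import Fraction as F

# P_R(e1; e_n) for n = 1..5 and P_L(e4; e_n) for n = 4..1, as read from the text.
P_R = [F(4, 41), F(3, 4), F(1), F(1), F(1, 3)]
P_L = {4: F(6, 41), 3: F(5, 6), 2: F(1), 1: F(3, 5)}


def decrease_right(p_r, n):
    """D_R(e1; e_n) = P_R(e1; e_{n+1}) / P_R(e1; e_n); n is 1-based."""
    return p_r[n] / p_r[n - 1]


def decrease_left(p_l, n):
    """D_L(e4; e_n) = P_L(e4; e_{n-1}) / P_L(e4; e_n); n is 1-based."""
    return p_l[n - 1] / p_l[n]


ETA = F(65, 100)  # hypothetical cutoff, used for both directions

d_r = decrease_right(P_R, 4)   # drop of P_R when stepping from e4 to e5
d_l = decrease_left(P_L, 2)    # drop of P_L when stepping from e2 to e1
is_overlap = d_r < ETA and d_l < ETA   # assumed criterion


def rewire(path, pattern, vertex):
    """Replace each contiguous occurrence of `pattern` in `path` by `vertex`."""
    out, i, k = [], 0, len(pattern)
    while i < len(path):
        if path[i:i + k] == pattern:
            out.append(vertex)
            i += k
        else:
            out.append(path[i])
            i += 1
    return out


search_path = ["begin", "e1", "e2", "e3", "e4", "e5", "end"]
rewired = rewire(search_path, ["e2", "e3", "e4"], "P72")
```

With these numbers both ratios fall below the illustrative cutoff, so the sub-path from e2 to e4 would be flagged and collapsed into one pattern-vertex, shortening the path while leaving the underlying residues recoverable from the pattern definition.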

[0239] In one embodiment, significant patterns are merged only on the path for which they turned out to be significant, while leaving the vertices unmerged on other paths.

[0240] In another embodiment, after each search on each search-path, sub-paths which are identified as significant patterns are merged into a pattern-vertex, irrespective of whether or not these sub-paths are defined as significant patterns also [...]

[0241] In still another embodiment, after each search on each search-path, the sub-paths which are identified as significant patterns are merged into a pattern-vertex.

[0242] In yet another embodiment, after each search on each search-path, the sub-paths which are identified as significant patterns are merged into pattern-vertices.

[0243] In a further embodiment, after all paths are searched, the sub-paths which are identified as significant patterns are merged into pattern-vertices.

[0244] FIG. 7a illustrates a pattern-vertex 72 having vertices [...] the trial path of FIG. 7a. Note that vertices $e_2$, $e_3$ and $e_4$ remain on the graph in addition to pattern-vertex 72, because, in the present example, there is a path which goes through $e_2$ and $e_3$ but not through $e_4$, and a path which goes through $e_4$ and $e_5$ (see FIG. 7a) but not through $e_2$ and $e_3$.

[0245] The rewiring procedure can be used as a supplementary procedure, for example, when it is desired to provide new sequences which are not present originally. Generalization of the dataset is preferably achieved by defining equivalence classes of amino acids and allowing, for a given sequence, the replacement of one or more amino acids of the sequence with other amino acids which are members of the same equivalence class.

[0246] For example, suppose that an equivalence class, E, of two vertices, $e_3$ and $e_6$, is defined, i.e., $E=\{e_3, e_6\}$. Suppose further that among the protein sequences there are two sequences, say, $e_1e_2e_3e_4e_5$ and $e_1e_2e_6e_4e_7$, which include the members of E. These sequences can be replaced with the generalized sequences $e_1e_2Ee_4e_5$ and $e_1e_2Ee_4e_7$, which, in addition to the original sequences, also include the new sequences $e_1e_2e_6e_4e_5$ and $e_1e_2e_3e_4e_7$, not necessarily present in any of the original proteins.

[0247] Using the above described databases, methods and apparatus, the present inventors were able to annotate polypeptides for the first time as having enzymatic activity. These polypeptides can find wide use in commodity, food, agrotech, cosmetic and pharma industries as outlined below.

[0248] Thus, according to a further aspect of the present invention there is provided a method of processing a substrate. The method comprises contacting the substrate with at least one polypeptide selected from the group consisting of the polypeptides set forth in SEQ ID Nos.: 77,838 to 198,923 under conditions which allow processing of the substrate by said at least one polypeptide, wherein said at least one polypeptide is selected capable of processing the substrate.

[0249] As used herein the phrase "processing a substrate" refers to enzymatic-dependent conversion (catalysis) of a substrate from a given chemical form to a distinct one. Examples of such catalysis reactions include, but are not limited to, degradation, digestion, hydrolysis, nucleic acid cleavage, nucleic acid ligation, proteolytic cleavage, poly [...] molecule to another and addition of a chemical group to a molecule.

[0250] The identity of the substrate will naturally dictate the selection of the polypeptide enzyme.

[0251] Information on correspondence between enzyme and substrate is readily available to one of ordinary skill in the art, for example from:

[0252] PRECISE (Predicted and Consensus Interaction Sites in Enzymes; structural bioinformatics lab), which is a database of interactions between the amino acid residues of an enzyme and its various ligands, i.e., substrate and transition state analogues, cofactors, inhibitors, and products; and/or from

[0253] The Catalytic Site Atlas (European Bioinformatics Institute; Hinxton, UK) described in "The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data" (Porter et al. (2004) Nucl. Acids Res. 32: D129-D133), or CSA database (http://www.ebi.ac.uk/thornton-srv/databases/CSA/); and/or from

[0254] The KEGG LIGAND Database, which is a composite database comprising three sections. The COMPOUND section provides information about metabolites and chemical compounds. The REACTION section provides a collection of substrate-product relations representing metabolic and other reactions. The ENZYME section provides information about enzyme molecules. The Sep. 7, 2001 release includes 7298 compounds, 5166 reactions and 3829 enzymes. In addition to the keyword search provided by the DBGET/LinkDB system, a substructure search to the COMPOUND and REACTION sections is available through the World Wide Web (http://www.genome.ad.jp/ligand/). LIGAND may also be downloaded by anonymous FTP (ftp://ftp.genome.ad.jp/pub/kegg/ligand/); and/or from

[0255] The MetaCyc Encyclopedia of Metabolic Pathways (Caspi et al., 2006, "MetaCyc: A multiorganism database of metabolic pathways and enzymes," Nucleic Acids Res., 34:D511-D516), which is a database of nonredundant, experimentally elucidated metabolic pathways containing over 900 pathways from more than 900 different organisms curated from the scientific experimental literature. MetaCyc contains pathways involved in both primary and secondary metabolism, as well as associated compounds, enzymes, and genes. MetaCyc aims to catalog the universe of metabolism by storing a representative sample of each experimentally elucidated pathway. MetaCyc is used in a variety of scientific applications, such as providing a reference data set for computationally predicting the metabolic pathways of organisms from their sequenced genomes, supporting metabolic engineering, helping to compare biochemical networks, and serving as an encyclopedia of metabolism. MetaCyc pathways can be browsed from a list, from ontologies, or queried directly when searching for pathways, proteins, reactions or compounds. MetaCyc can also be queried programmatically using Java or PERL when installed locally; and/or

[0256] The Human Protein Reference Database (Peri, S. et al. (2003) Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Research 13:2363-2371), which is a centralized platform to visually depict and integrate information pertaining to domain architecture, post-translational modifications, interaction networks and disease association for each protein in the human proteome. Information in HPRD is manually extracted from the literature by expert biologists who read, interpret and analyze the published data. HPRD has been created using an object oriented database in Zope, an open source web application server, that provides versatility in query functions and allows data to be displayed dynamically. The database currently comprises 37,581 entries on protein/protein interactions; and/or

[0257] The Lipase Engineering Database (LED) (Jürgen Pleiss; Institute of Technical Biochemistry, University of Stuttgart, Stuttgart, Germany; http://www.led.uni-stuttgart.de) integrates information on sequence, structure, and function of lipases, esterases, and related proteins. The LED facilitates systematic analysis of sequence-structure-function relationships and is a useful tool to identify functionally relevant residues apart from the active site residues, and to design mutants with desired substrate specificity.

[0258] Table 38 comprises SEQ ID Nos.: 77,838 to 137,952 comprising polypeptide enzymes classified according to exemplary methods described herein. Table 40 comprises SEQ ID Nos.: 198,933 to 259,039 with polynucleotide sequences corresponding to the polypeptide enzymes of table 38.

[0259] Table 39 comprises SEQ ID Nos.: 137,953 to 198,923 comprising polypeptide enzymes classified according to exemplary methods described herein. Table 41 comprises SEQ ID Nos.: 259,040 to 320,010 with polynucleotide sequences corresponding to the polypeptide enzymes of table 39.

[0260] The polypeptides of SEQ ID Nos.: 77,838 to 198,923 set forth in Tables 38 and 39 include enzymes in all 6 major EC classes and many important subclasses. Polynucleotide sequences comprising SEQ ID Nos.: 198,933 to 320,009 set forth in tables 40 and 41 make available, for the first time, a functional link between these sequences and their biological activity.

[0261] The potential utility of tables 38-41 and/or similar tables produced according to exemplary methods of the invention is huge. Using tables of this type and available databases, one of ordinary skill in the art can begin with a defined physiologic or industrial process, identify a problematic (e.g. rate limiting) step therein, determine the substrate of an enzymatic reaction in the problematic step, and select an appropriate enzyme from the table. Selection of the appropriate enzyme from the table can optionally be as a polypeptide sequence or a nucleotide sequence. Optionally, polypeptide sequences can be produced synthetically or biologically. Optionally, biological production includes isolation of desired peptides from cells. Optionally, the cells are wildtype cells or carry an expression vector. In an exemplary embodiment of the invention, an expression vector comprising regulatory sequences and at least a portion of a polynucleotide sequence comprising one of SEQ ID Nos.: 198,933 to 320,009 and encoding at least a functional portion of a corresponding polypeptide sequence comprising one of SEQ ID Nos.: 77,838 to 198,923.

[0262] The Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB) is responsible for the maintenance of the enzyme list first published in 1961 and with the last printed edition in 1992 (IUBMB (1992), Enzyme Nomenclature 1992, Academic Press, San Diego). Since 1992 the list has been updated electronically by use of the web. In parallel with this process all published recommendations by the Committee were also converted to a web readable form. More recent changes to Enzyme Nomenclature and new recommendations have only been prepared for the web and are not available in hard copy.

[0263] The Enzyme Nomenclature list is an amplified and updated version of the 1992 edition. It currently contains details of over 3700 enzymes. It was prepared from the last printed edition, Enzyme Nomenclature 1992 (1), which was converted into html with additional data added. In many cases the reaction is given in words and illustrated with a reaction diagram (which may be part of a metabolic pathway). Other names for each enzyme are added and links are provided to other databases (BRENDA, EXPASY, KEGG, WIT, etc.) and the CAS registry number provided when known. The references now have titles and link where relevant to the PubMed entry.

[0264] The EC hierarchy divides enzymes into six main classes: EC 1 oxidoreductases, EC 2 transferases, EC 3 hydrolases, EC 4 lyases, EC 5 isomerases and EC 6 ligases, which are described in greater detail hereinbelow in APPENDIX 2.

[0265] Tables 38 and 39 of polypeptide sequences and Tables 40 and 41 of nucleotide sequences specify the EC classification for each sequence in the table.

[0266] As used herein the phrase "polypeptide" refers to a naturally occurring or synthetic amino acid polymer which comprises at least an active portion which is sufficient to process the substrate of interest. Optionally the polypeptide also comprises a substrate recognition domain, optionally separate from the catalytic domain.

[0267] In an exemplary embodiment of the invention, an active portion of a polypeptide (e.g. as set forth in SEQ ID Nos.: 77,838 to 198,923) is identified using methods well known in the art (e.g. serial mutations followed by assays of activity and/or queries of available databases to identify homologous active portions). Thus, an active portion of any of SEQ ID Nos.: 77,838 to 198,923 can be employed in exemplary embodiments of the invention.

[0268] Polypeptides used in accordance with the present invention refer to polypeptides having an amino acid sequence as further described hereinbelow, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 75%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more, say 100%, homologous to an amino acid sequence selected from the group consisting of SEQ ID Nos.: 77,838 to 198,923.

[0269] Homology (e.g., percent homology) can be determined using any homology comparison software, including for example, the BlastP software of the National Center of Biotechnology Information (NCBI) such as by using default parameters.

[0270] The present invention also encompasses fragments of the above described polypeptides and polypeptides having mutations, such as deletions, insertions or substitutions of one or more amino acids, either naturally occurring or man induced, either randomly or in a targeted fashion.

[0271] Thus, polypeptides (also referred to as peptides) of the present invention encompass native polypeptides (either degradation products, synthetically synthesized peptides, or recombinant peptides), peptidomimetics (typically, synthetically synthesized peptides), and the peptide analogues peptoids and semipeptoids, and may have, for

example, modifications rendering the peptides more stable while in a body or more capable of penetrating into cells. Such modifications include, but are not limited to: N-terminus modifications; C-terminus modifications; peptide bond modifications, including but not limited to CH2-NH, CH2-S, CH2-S=O, O=C-NH, CH2-O, CH2-CH2, S=C-NH, CH=CH, and CF=CH; backbone modifications; and residue modifications. Methods for preparing peptidomimetic compounds are well known in the art and are specified, for example, in Ramsden, C. A., ed. (1992), Quantitative Drug Design, Chapter 17.2, F. Choplin, Pergamon Press, which is incorporated by reference as if fully set forth herein. Further details in this respect are provided hereinbelow.

[0272] Peptide bonds (-CO-NH-) within the peptide may be substituted, for example, by N-methylated bonds (-N(CH3)-CO-); ester bonds (-C(R)H-C-O-O-C(R)-N-); ketomethylene bonds (-CO-CH2-); α-aza bonds (-NH-N(R)-CO-), wherein R is any alkyl group, e.g., methyl; carba bonds (-CH2-NH-); hydroxyethylene bonds (-CH(OH)-CH2-); thioamide bonds (-CS-NH-); olefinic double bonds (-CH=CH-); retro amide bonds (-NH-CO-); and peptide derivatives (-N(R)-CH2-CO-), wherein R is the "normal" side chain, naturally presented on the carbon atom. These modifications can occur at any of the bonds along the peptide chain and even at several (2-3) at the same time.

[0273] Natural aromatic amino acids, Trp, Tyr, and Phe, may be substituted for synthetic non-natural acids such as, for instance, tetrahydroisoquinoline-3-carboxylic acid (TIC), naphthylalanine (Nal), ring-methylated derivatives of Phe, halogenated derivatives of Phe, and o-methyl-Tyr.

[0274] In addition to the above, the peptides of the present invention may also include one or more modified amino acids or one or more non-amino acid monomers (e.g., fatty acids, complex carbohydrates, etc.).

[0275] The term "amino acid" or "amino acids" is understood to include the 20 naturally occurring amino acids; those amino acids often modified post-translationally in vivo, including, for example, hydroxyproline, phosphoserine, and phosphothreonine; and other less common amino acids, including but not limited to 2-aminoadipic acid, hydroxylysine, isodesmosine, nor-valine, nor-leucine, and ornithine. Furthermore, the term "amino acid" includes both D- and L-amino acids.

[0276] The peptides of the present invention are preferably utilized in a linear form, although it will be appreciated that in cases where cyclization does not severely interfere with peptide characteristics, cyclic forms of the peptide can also be utilized.

[0277] The peptides of the present invention may be synthesized by any techniques that are known to those skilled in the art of peptide synthesis. For solid phase peptide synthesis, a summary of the many techniques may be found in: Stewart, J. M. and Young, J. D. (1963), "Solid Phase Peptide Synthesis," W. H. Freeman Co. (San Francisco); and Meienhofer, J. (1973), "Hormonal Proteins and Peptides," vol. 2, p. 46, Academic Press (New York). For a review of classical solution synthesis, see Schroder, G. and Lupke, K. (1965), The Peptides, vol. 1, Academic Press (New York).

[0278] In general, peptide synthesis methods comprise the sequential addition of one or more amino acids or suitably protected amino acids to a growing peptide chain. Normally, either the amino or the carboxyl group of the first amino acid is protected by a suitable protecting group. The protected or derivatized amino acid can then either be attached to an inert solid support or utilized in solution by adding the next amino acid in the sequence having the complementary (amino or carboxyl) group suitably protected, under conditions suitable for forming the amide linkage. The protecting group is then removed from this newly added amino acid residue and the next amino acid (suitably protected) is then added, and so forth; traditionally this process is accompanied by wash steps as well. After all of the desired amino acids have been linked in the proper sequence, any remaining protecting groups (and any solid support) are removed sequentially or concurrently, to afford the final peptide compound. By simple modification of this general procedure, it is possible to add more than one amino acid at a time to a growing chain, for example, by coupling (under conditions which do not racemize chiral centers) a protected tripeptide with a properly protected dipeptide to form, after deprotection, a pentapeptide, and so forth.

[0279] Further description of peptide synthesis is disclosed in U.S. Pat. No. 6,472,505. A preferred method of preparing the peptide compounds of the present invention involves solid-phase peptide synthesis, utilizing a solid support. Large-scale peptide synthesis is described by Andersson, Biopolymers 2000, 55(3), 227-50.

Exemplary Peptide Synthesis Protocols

[0280] Peptides can be produced synthetically by either liquid-phase or solid-phase synthesis. Liquid-phase synthesis is generally preferred in large-scale production of peptides for industrial purposes. These synthesis protocols are described in greater detail, for example, by Atherton and Sheppard (Solid Phase peptide synthesis: a practical approach. IRL Press, Oxford, England, 1989) and by Stewart and Young (Solid phase peptide synthesis, 2nd edition, Pierce Chemical Company, Rockford, 1984, pp. 91).

[0281] Solid-phase peptide synthesis (SPPS) allows the synthesis of natural peptides which are difficult to express in bacteria and/or the incorporation of unnatural amino acids, peptide/protein backbone modification, and the synthesis of D-proteins, which consist of D-amino acids.

SPPS

[0282] In SPPS small solid beads, insoluble yet porous, are treated with functional units (linkers) on which peptide chains can be built. The peptide will remain covalently attached to the bead until cleaved from it by a reagent such as trifluoroacetic acid. The peptide is thus immobilized on the solid phase and can be retained during a filtration process, whereas liquid-phase reagents and by-products of synthesis are flushed away.

[0283] The general principle of SPPS is one of repeated cycles of coupling-deprotection. The free N-terminal amine of a solid-phase attached peptide is coupled to a single N-protected amino acid unit. This unit is then deprotected, revealing a new N-terminal amine to which a further amino acid may be attached.

[0284] There are two common types of SPPS, Fmoc and Boc, which proceed in a C-terminal to N-terminal fashion. The N-termini of amino acid monomers are protected by either Fmoc or Boc and added onto a deprotected amino acid chain. Automated synthesizers are available for both Fmoc and Boc techniques.

[0285] Stepwise elongation, in which the amino acids are connected step-by-step in turn, is ideal for small peptides containing between 2 and 100 amino acid residues. Another method is fragment condensation, in which peptide fragments are coupled. Although the stepwise method can elongate the peptide chain without racemization, the yield in creation of long or highly polar peptides tends to be poor. Fragment condensation is better than stepwise elongation for synthesizing sophisticated long peptides, but racemization can be problematic. In order to maintain acceptable kinetics in a fragment condensation reaction, the coupled fragment must be in gross excess.

[0286] A new development for producing longer peptide chains is chemical ligation: unprotected peptide chains react chemoselectively in aqueous solution. A first kinetically controlled product rearranges to form the amide bond. The most common form, native chemical ligation, uses a peptide thioester that reacts with a terminal cysteine residue.

Boc SPPS

[0287] t-Boc (or Boc) stands for (tert)-(B)utyl (o)xy (c)arbonyl. To remove Boc from a growing peptide chain, acidic conditions are used (e.g. neat TFA). Removal of side-chain protecting groups and the peptide from the resin at the end of the synthesis is achieved by incubating in hydrofluoric acid (which can be dangerous). This danger represents a significant disadvantage to Boc. However, Boc offers significant advantages in complex syntheses. When synthesizing non-natural peptide analogs which are base-sensitive (such as depsipeptides), Boc is necessary.

Fmoc SPPS

[0288] Fmoc stands for (F)luorenyl-(m)eth(o)xy-(c)arbonyl, which serves as a protecting group instead of Boc. To remove Fmoc from a growing peptide chain, basic conditions (e.g. 20% piperidine in DMF) are used. Removal of side-chain protecting groups and peptide from the resin is achieved by incubating in trifluoroacetic acid (TFA). Fmoc deprotection is usually slow because the anionic nitrogen produced at the end is not a particularly favorable product, although the whole process is thermodynamically driven by the evolution of carbon dioxide. The main advantage of Fmoc chemistry is that no hydrofluoric acid is needed, which contributes to safety. Fmoc is generally preferred for most routine synthesis because of this safety consideration.

Exemplary Solid Supports

[0289] The physical properties of the solid support, and the applications to which it can be utilized, vary with the material from which the support is constructed, the amount of crosslinking, as well as the linker and handle being used. Commonly used solid supports include polystyrene and polyamide.

General Synthesis Protocol

[0290] Due to amino acid excesses used to ensure complete coupling during each synthesis step, polymerization of amino acids is common in reactions where each amino acid is not protected. In order to prevent this polymerization, protective groups are used. A typical Fmoc or Boc synthesis involves cyclic repetition of the following steps:

[0291] Protective group is removed from trailing amino acids in a deprotection reaction;

[0292] Deprotection reagents are washed away to provide a clean coupling environment;

[0293] Protected amino acids dissolved in a solvent such as dimethylformamide (DMF) are combined with coupling reagents and pumped through the synthesis column; and

[0294] Coupling reagents are washed away to provide a clean deprotection environment.

[0295] While Fmoc and Boc are the most commonly used protective groups, other groups such as benzyloxy-carbonyl (Z), allyloxycarbonyl (alloc) and lithographic protecting groups can also be used for protection.

[0296] Coupling the peptides typically involves activation of the carboxyl group to improve reaction kinetics. Activation is most commonly by carbodiimides and/or aromatic oximes.

[0297] In another embodiment, polypeptide synthesis is effected by recombinant DNA technology. This is particularly preferred when large amounts of the polypeptide are needed.

[0298] Thus, for example, a polynucleotide which comprises a nucleic acid sequence encoding the polypeptide of interest is ligated into a nucleic acid construct which comprises a cis-acting regulatory element positioned so as to drive transcription of the nucleic acid sequence when introduced into a host cell.

[0299] Thus the polynucleotide of the present invention encodes a polypeptide having an amino acid sequence as described hereinabove. Such a polynucleotide may comprise a nucleic acid sequence at least about 40%, optionally about 70%, optionally about 75%, optionally about 80%, optionally about 81%, optionally about 82%, optionally about 83%, optionally about 84%, optionally about 85%, optionally about 86%, optionally about 87%, optionally about 88%, optionally about 89%, optionally about 90%, optionally about 91%, optionally about 92%, optionally about 93%, optionally about 94%, optionally about 95%, optionally about 96%, optionally about 97%, optionally about 98%, optionally about 99%, optionally about 100% homologous or identical (or intermediate degrees of homology or identity) to a nucleic acid sequence selected from the group consisting of SEQ ID NOs: 198,933 to 320,009.

[0300] The present invention also encompasses fragments of the above described polynucleotides and polynucleotides having mutations, such as deletions, insertions or substitutions of one or more amino acids, either naturally occurring or man induced, either randomly or in a targeted fashion.

[0301] The polynucleotide of the present invention refers to a single or double stranded nucleic acid sequence which is isolated and provided in the form of an RNA sequence, a complementary polynucleotide sequence (cDNA), a genomic polynucleotide sequence and/or a composite polynucleotide sequence (e.g., a combination of the above).

[0302] As used herein the phrase "complementary polynucleotide sequence" refers to a sequence which results from reverse transcription of messenger RNA using a reverse transcriptase or any other RNA dependent DNA polymerase. Such a sequence can be subsequently amplified in vivo or in vitro using a DNA dependent DNA polymerase.

[0303] As used herein the phrase "genomic polynucleotide sequence" refers to a sequence derived (isolated) from a chromosome and thus it represents a contiguous portion of a chromosome.

[0304] As used herein the phrase "composite polynucleotide sequence" refers to a sequence which is at least partially complementary and at least partially genomic. A composite sequence can include some exonal sequences required to encode the polypeptide of the present invention, as well as some intronic sequences interposing therebetween. The intronic sequences can be of any source, including of other genes, and typically will include conserved splicing signal sequences. Such intronic sequences may further include cis acting expression regulatory elements.

Exemplary Construction of Polynucleotide Expression Vectors

[0305] Considerations and methods relevant to construction of expression vectors are described in, for example, Sambrook, J. and D. W. Russell (2001; Molecular Cloning: A Laboratory Manual, 3rd ed., vol. 1-3, Cold Spring Harbor Press, Cold Spring Harbor N.Y.), Ausubel, F. M. et al. (1999; Short Protocols in Molecular Biology, 4th ed., John Wiley & Sons, New York N.Y.). One of ordinary skill in the art will be able to begin from an amino acid sequence encoded by a gene, determine a polynucleotide sequence encoding the amino acid sequence, design and produce suitable oligonucleotide primers for isolation of the determined polynucleotide sequence (e.g. using computer programs intended for that purpose such as Primer Version 0.5, 1991, Whitehead Institute for Biomedical Research, Cambridge Mass.), and clone the sequence into a suitable expression vector.

[0306] Expression vectors are available commercially (e.g. from New England Biolabs; Ipswich, Mass., USA and Clontech Laboratories; Mountain View, Calif., USA). Selection of appropriate regulatory sequences can contribute to expression levels of the protein encoded by the cloned nucleotide sequence. Regulatory sequences can include promoters and/or enhancers and are optionally positioned upstream and/or downstream of the cloned nucleotide sequence. Regulatory sequences can be constitutive and/or tissue specific and/or inducible. Optionally, regulatory sequences are included in a commercially available or previously constructed vector. Alternatively, or additionally, regulatory sequences can be added to a vector during its construction using techniques known in the art.

[0307] Expression vectors can be DNA or RNA based and include, but are not limited to, phage, plasmids, cosmids, phagemids, yeast artificial chromosomes (YACs), murine artificial chromosomes (MACs), human artificial chromosomes (HACs) and viral vectors.

[0308] Vectors are typically transformed (e.g. by electroporation) into prokaryotic cells or transfected into eukaryotic cells (e.g. using calcium phosphate precipitation or lipofectin) so that the cells express the polypeptide encoded by the vector at a high level.

Exemplary Culture and Harvest Conditions

[0309] Thus, the isolated polynucleotides of the present invention can be expressed in a variety of single cell or multicell expression systems and the recombinant polypeptides recovered therefrom used in pharmaceutical and agricultural applications as described hereinabove with respect to the enzymatic composition of the present invention.

[0310] For expression in a single cell system, the polynucleotides of the present invention are cloned into an appropriate expression vector (i.e., construct).

[0311] Depending on the host/vector system utilized, any of a number of suitable transcription and translation elements including constitutive and inducible promoters, transcription enhancer elements, transcription terminators, and the like, can be used in the expression vector (see, e.g., Bitter et al., (1987) Methods in Enzymol. 153:516-544).

[0312] Other than containing the necessary elements for the transcription and translation of the inserted coding sequence, the expression construct of this aspect of the present invention can also include sequences engineered to enhance stability, production, purification, yield or toxicity of the expressed polypeptide. For example, the expression of a fusion protein or a cleavable fusion protein comprising a polypeptide of the present invention and a heterologous protein can be engineered. Such a fusion protein can be designed so as to be readily isolated by affinity chromatography, e.g., by immobilization on a column specific for the heterologous protein. Where a cleavage site is engineered between the protein of interest and the heterologous protein, the protein of interest can be released from the chromatographic column by treatment with an appropriate enzyme or agent that disrupts the cleavage site (e.g., see Booth et al. (1988) Immunol. Lett. 19:65-70; and Gardella et al. (1990) J. Biol. Chem. 265:15854-15859).

[0313] A variety of cells can be used as host-expression systems to express the coding sequence of the protein of interest. These include, but are not limited to, microorganisms, such as bacteria transformed with a recombinant bacteriophage DNA, plasmid DNA or cosmid DNA expression vector containing the coding sequence for the protein of interest; yeast transformed with recombinant yeast expression vectors containing the coding sequence for the protein of interest; plant cell systems infected with recombinant virus expression vectors (e.g., cauliflower mosaic virus, CaMV; tobacco mosaic virus, TMV) or transformed with recombinant plasmid expression vectors, such as Ti plasmid, containing the coding sequence (e.g. at least a portion of one or more of SEQ ID NOs: 198,933 to 320,009). Mammalian expression systems can also be used to express the protein of interest. Bacterial systems are preferably used to produce recombinant proteins of interest, according to the present invention, thereby enabling a high production volume at low cost.

[0314] In bacterial systems, a number of expression vectors can be advantageously selected depending upon the use intended for the protein expressed. For example, when large quantities of protein are desired, vectors that direct the expression of high levels of protein product, possibly as a fusion with a hydrophobic signal sequence which directs the expressed product into the periplasm of the bacteria or the culture medium where the protein product is readily purified, may be desired. Certain fusion proteins engineered with a specific cleavage site to aid in recovery of the protein may also be desirable. Vectors adaptable to such manipulation include, but are not limited to, the pET series of E. coli expression vectors (Studier et al. (1990) Methods in Enzymol. 185:60-89).

[0315] It will be appreciated that when codon usage for proteins cloned from plants is inappropriate for expression in E. coli, the host cells can be co-transformed with vectors that encode species of tRNA that are rare in E. coli but are frequently used by plants. For example, co-transfection of the gene dnaY, encoding tRNA-Arg (AGA/AGG), a rare species of tRNA in E. coli, can lead to high-level expression of heterologous genes in E. coli (Brinkmann et al., Gene 85:109 (1989) and Kane, Curr. Opin. Biotechnol. 6:494 (1995)). The

dnaY gene can also be incorporated in the expression construct, such as for example in the case of the pUBS vector (U.S. Pat. No. 6,270,0988).

[0316] In yeast, a number of vectors containing constitutive or inducible promoters can be used, as disclosed in U.S. Pat. No. 5,932,447. Alternatively, vectors can be used which promote integration of foreign DNA sequences into the yeast chromosome.

[0317] Other expression systems such as insect and mammalian host cell systems, which are well known in the art, can also be used by the present invention.

[0318] Transformed cells are cultured under conditions which allow for the expression of high amounts of recombinant protein. Such conditions include, but are not limited to, media, bioreactor, temperature, pH and oxygen conditions that permit protein production. Media refers to any medium in which a cell is cultured to produce the recombinant protein of the present invention. Such a medium typically includes an aqueous solution having assimilable carbon, nitrogen and phosphate sources, and appropriate salts, minerals, metals and other nutrients, such as vitamins. Cells of the present invention can be cultured in conventional fermentation bioreactors, shake flasks, test tubes, microtiter dishes, and petri plates. Culturing can be carried out at a temperature, pH and oxygen content appropriate for a recombinant cell. Such culturing conditions are within the expertise of one of ordinary skill in the art.

[0319] Depending on the vector and host system used for production, resultant proteins of the present invention may either remain within the recombinant cell; be secreted into the fermentation medium; be secreted into a space between two cellular membranes, such as the periplasmic space in E. coli; or be retained on the outer surface of a cell or viral membrane.

[0320] Recovery of the recombinant protein is effected following an appropriate time in culture. The phrase "recovering the recombinant protein" refers to collecting the whole fermentation medium containing the protein and need not imply additional steps of separation or purification. Notwithstanding the above, proteins of the present invention can be purified using a variety of standard protein purification techniques, such as, but not limited to, affinity chromatography, ion exchange chromatography, filtration, electrophoresis, hydrophobic interaction chromatography, gel filtration chromatography, reverse phase chromatography, concanavalin A chromatography, chromatofocusing and differential solubilization.

[0321] The recombinant proteins of the present invention are preferably retrieved in "substantially pure" form to be used in pharmaceutical compositions and/or agricultural compositions, described below. As used herein, "substantially pure" refers to a purity that allows for the effective use of the protein in the diverse applications described hereinabove, optionally 50%, 60%, 70%, 80%, 90%, 95%, 99% or effectively 100% pure (or lesser or intermediate levels of purity).

[0322] In an exemplary embodiment of the invention, thermophilic bacteria (e.g. those listed in Table 12) are cultured under suitable temperatures for optimal growth. Optionally, the thermophilic bacteria can be wild type or transformants carrying an expression vector. In an exemplary embodiment of the invention, extreme temperature conditions favor high level production of an enzyme of interest. Optionally, the enzyme of interest is purified from culture medium or from a bacterial lysate harvested from the culture using known purification strategies.

[0323] Although classification of enzymes isolated from Sargasso Sea thermophiles is described herein as an illustrative example, exemplary analytic methods described herein can be applied to other datasets. Additional bacterial genomic datasets which are currently available, or expected to become available as a result of ongoing research efforts include, but are not limited to, Acidithiobacillus ferrooxidans (Thiobacillus ferrooxidans), Acidobacterium capsulatum, Actinomyces naeslundii, Aeromonas hydrophila, Anaplasma phagocytophilum, Arthrobacter aurescens, Aspergillus fumigatus, Bacillus anthracis (numerous subspecies and field isolates), Bacillus cereus (multiple sub-species) B., Bacillus mojavensis, Bacillus subtilis subsp. spizizenii U-B-, Bacillus subtilis subsp. subtilis RO-NN-, Bacteroides forsythus, Baumannia cicadellinicola, Brucella ovis, Brugia wolbachia, Burkholderia thailandensis E., Campylobacter coli, Campylobacter lari, Campylobacter upsaliensis, Campylobacter jejuni, Carboxydothermus hydrogenoformans, Chlamydophila psittaci, Chrysiogenes arsenatis, Clostridium perfringens, Coccidioides posadasii, Colwellia sp., Coprothermobacter proteolyticus, Cyanobacteria sp., Dichelobacter nodosus, Dictyoglomus thermophilum, Ehrlichia chaffeensis, Entamoeba histolytica, Epulopiscium sp., Erwinia chrysanthemi, (numerous sub-species and strains), Fibrobacter succinogenes S., Geovibrio thiophilus, Gemmata obscuriglobus, Haloferax volcanii, Hyphomonas neptunium, Klebsiella pneumoniae, Methylococcus capsulatus, Mycobacterium avium, Mycobacterium smegmatis, Mycobacterium tuberculosis, Mycoplasma arthritidis, Mycoplasma bovis, Mycoplasma capricolum, Myxococcus xanthus DK, Neorickettsia sennetsu Miyayama, Neosartorya fischeri, Persephonella marina EX-H, Plasmodium vivax Salvador I, Prevotella intermedia, Prevotella ruminicola, Prochloron didemni, Ruminococcus albus, Shigella boydii BS., Shigella dysenteriae, Simkania negevensis, Stigmatella aurantiaca DW/-, Streptococcus agalactiae A, Streptococcus gordonii Challis, Streptococcus mitis, Streptococcus pneumoniae -B, Streptococcus sobrinus, Sulfurihydrogenibium azorense, Synechococcus sp. CC, Synergistes jonesii, Thermodesulfobacterium commune DSM, Thermodesulfovibrio yellowstonii DSM, Thermomicrobium roseum DSM, Thermotoga neapolitana DSM, Toxoplasma gondii B., Trichomonas vaginalis G, Trypanosoma brucei Gua..., Verrucomicrobium spinosum DSM, Yersinia pestis Angola, Yersinia pestis IP, Yersinia pseudotuberculosis IP.

[0324] Alternatively, or additionally, exemplary analytic methods according to various embodiments of the invention can be applied to plant genome datasets which are currently available, or expected to become available as a result of ongoing research efforts including, but not limited to, Arabidopsis, Maize, , Cotton, Sorghum and Tobacco.

[0325] Alternatively, or additionally, exemplary analytic methods according to various embodiments of the invention can be applied to animal genome datasets which are currently available, or expected to become available as a result of ongoing research efforts including, but not limited to, mouse, rat, guinea pig, pig, horse, cow, chicken, Xenopus laevis, and human.

[0326] Alternatively, or additionally, exemplary analytic methods according to various embodiments of the invention can be applied to determine function of non-enzyme molecules, such as enzymatic inhibitors (e.g. substrate analogs).

[0327] Once an EC number of an enzyme is known, one of ordinary skill in the art can easily ascertain the preferred substrate(s) using available information resources (e.g. BRENDA: The Comprehensive Enzyme Information System; Prof. Dr. D. Schomburg, Institut fuer Biochemie, Universitaet zu Köln, Zülpicher Str. 47, 50674 Köln, Germany). Additionally, Sigma-Aldrich Chemical Co. (St. Louis, Mo., USA) makes available a database of enzyme assay protocols searchable by EC number:

[0328]

[0329] Kits for assaying of enzyme activity are available commercially (e.g. from NOVASCREEN, Hanover Md., USA).

[0330] However, enzymatic assays are expensive to perform, with commercial kits typically costing in the range of $20 to $150 per assay. In an exemplary embodiment of the invention, assay costs are reduced by determining an EC classification according to exemplary methods disclosed herein and conducting a single assay to verify enzymatic activity.

[0331] In general, assay conditions are defined in terms of one or more of pH, osmolarity, temperature, time, substrate:enzyme ratio and concentration of non-peptide catalysts or inhibitors (e.g. divalent cations).

[0332] While the body of available gene sequences and regulatory sequences is constantly growing, the number of available useful gene expression constructs is limited to a large degree by difficulty in ascertaining a function of a gene sequence. In an exemplary embodiment of the invention, a known polypeptide with unknown function is rapidly and reliably characterized with respect to its function. Once characterized with respect to function, significant quantities (e.g. grams, kilos or even tons) of a desired polypeptide can be produced using recombinant DNA technology and/or bioreactors and/or synthetic protocols as outlined above.

[0333] This makes available, for the first time, useful quantities of a wide variety of polypeptides (e.g. enzymes) for use in industry, medicine and agriculture.

Exemplary Industrial Applications of Enzymes

[0334] Enzymes are used in a wide variety of industrial and research applications which are briefly reviewed here. This review does not purport to be exhaustive and does not limit the scope of the invention.

[0335] Use of enzymes in industrial processes is well known to those of ordinary skill in the art. Exemplary industrial uses of enzymes are described, for example, in "Industrial Enzymes and their Applications" (Helmut Uhlig; translated by Elfriede M. Linsmaier-Bednar (1998) Wiley-IEEE: Technology & Industrial Arts). Exemplary applications include carbohydrate hydrolysis, proteolysis, ester cleavage (e.g. fat hydrolysis or lipolysis), glucose isomerization and oxido-reduction. The contents of this book are fully incorporated herein by reference.

[0336] In general, industrial enzymes can be divided into two broad categories: commodity enzymes and specialty enzymes.

[0337] Commodity enzymes are those which are used in large amounts (e.g. tens to hundreds to thousands of kilos/year). Typically, commodity enzymes can be employed in a relatively crude state (e.g. 10, 20, 30, 40 or 50% pure or lesser or intermediate degrees of purity) without complex purification prior to use. In general, preparation of commodity enzymes is conducted with a low profit margin and prices are relatively low (e.g. 5 to 40 $/kg).

[0338] In contrast, specialty enzymes are used in smaller amounts (e.g. grams to kilos). Typically, specialty enzymes are employed at a relatively high level of purity. In general, preparation of specialty enzymes is conducted with a high profit margin and prices are relatively high (e.g. 5 to 10,000 $/g).

... carbohydrate hydrolysis is second (approximately 28%) and lipid hydrolysis accounts for about 3% of industrial enzyme use. The remaining 10% of industrial enzyme use is in specialty areas such as, for example, analytic use (e.g. nucleic acid "restriction enzymes"), pharmaceutical use and research (e.g. thermophilic polymerases).

[0340] The industrial market for enzymes is growing with an annual increase in volume of 10 to 15%. Total revenues increase by about 4 to 5% annually. Profit margins for commodity enzymes continue to fall. This trend is offset by increased use of specialty enzymes in, for example, diagnostics, fine chemical manufacture and chiral separation.

[0341] Industrial uses of enzymes in the food industry include, but are not limited to, use of amylases in bread making, use of lipases in flavour development, use of proteases in cheese making and use of pectinases in clarifying fruit juices.

[0342] In the textile industry, cellulases are commonly employed in treating denim to generate a stone-washed texture/appearance.

[0343] Another common industrial use of enzymes is in processing, for example to convert corn starch to high fructose syrups.

[0344] In agriculture, enzymes are commonly used to treat animal feeds to make them more digestible (e.g. cellulase, xylanase, phytase).

[0345] In waste management, lipases are frequently employed as drain cleaning agents.

[0346] In the laboratory, diagnostic enzymes and polymerases play a prominent role in many molecular analytic protocols (e.g. restriction digestion and PCR). Other common molecular biology techniques rely upon reporter enzymes (e.g. alkaline phosphatase, glucose oxidase, β-glucosidase and horseradish peroxidase).

[0347] Specialized use of enzymes in biotransformations is a small but lucrative field. For example, lipases, esterases and oxidoreductases can be employed in chiral separations, glucotransferases can be employed in synthesis of oligosaccharides, thermolysin can be employed in aspartame synthesis, nitrile hydratases can be employed in acrylamide and/or nicotinamide synthesis, proteases can be employed in peptide synthesis, penicillin acylase can be employed in manufacture of semisynthetic penicillins and aspartase can be employed in the manufacture of L-aspartate.

[0348] In processing of cornstarch to produce glucose, there are three enzymatic reactions commonly employed in sequence.

[0349] In a first enzymatic reaction, starch is hydrolyzed, for example using an α-amylase (cleaves α-1,4 glucosidic bonds in starch). Often, a high temperature is applied to expand starch granules, making amylose and amylopectin

chains more accessible. There is therefore an advantage to a thermostable enzyme in this process. In many cases the starch hydrolysis is a batch process and the enzyme is not reused.

[0350] In a second enzymatic reaction, maltose is converted to glucose, for example using an amyloglucosidase. In many cases amyloglucosidase has a pH optimum of 6.5 so that reaction conditions must be adjusted after the starch hydrolysis reaction by reducing the pH.

[0351] In a third enzymatic reaction, glucose is converted to fructose, for example using a xylose isomerase. Fructose is sweeter than glucose and is commonly used as a sweetening agent in foodstuffs. Fructose commands a higher price than glucose and is more profitable. Xylose isomerase converts glucose to fructose in an equilibrium reaction: Glucose ⇌ Fructose.

[0352] For many commercial applications, it is sufficient to produce a 50:50 mixture of glucose:fructose. This mixture is commonly known as "high fructose syrup" (HFS). Optionally, reaction conditions are adjusted by binding or removing calcium ions, which can inhibit xylose isomerase.

[0353] There is also a large industry devoted to production of artificial sweeteners. One of the more common artificial sweeteners is aspartame (L-aspartyl-L-phenylalanine methyl ester). Aspartame is often produced biocatalytically by peptide synthesis using a thermostable protease which normally hydrolyses the N-terminal amide bonds of hydrophobic amino acid residues in a peptide. Optionally, use of an immobilised enzyme allows a continuous process and enzyme reuse.

[0354] In aspartame manufacture, a low water activity solvent system (organic solvent based) reverses the normal equilibrium to produce a CBZ-L-Asp-L-Phe-OMe intermediate which crystallizes out of solution. Chemical removal of the CBZ group (deblocking) produces L-Asp-L-Phe-OMe (aspartame).

[0355] Another important industrial use of enzymes is in nitrile biotransformations, for example synthesis of acrylamide. About 45 thousand tons per year of acrylamide is synthesised biologically, using a whole cell catalyst. The catalyst is an engineered Rhodococcus strain containing high levels of the enzyme nitrile hydratase (NHase). Initially the wild type Rhodococcus was used. Subsequently a recombinant Rhodococcus expressing the NHase gene at high levels was employed. Currently, a recombinant Rhodococcus with an NHase gene engineered to increase stability, and to reduce substrate and product inhibition, is employed. The Rhodococcus is typically grown in a stirred tank bioreactor.

[0356] The biological production of acrylamide has advantages over the chemical synthesis because of the absence of side-reactions, and the simpler recovery of the reaction product.

[0357] Another important industrial use of enzymes is in production of nicotinamide. Nicotinamide is an essential vitamin, and is widely used in the health-food and animal food-and-feed industries. Biological production, using the same Rhodococcus biocatalyst as for acrylamide production, produces about 5 thousand tons per year of nicotinamide. Whole cell cultures of Rhodococcus convert 3-cyanopyridine to nicotinamide.

[0358] Another important industrial use of enzymes is in production of penicillin derivatives. Penicillin is produced industrially at high yields by Penicillium fermentations. The penicillin is converted enzymatically by penicillin acylase to 6-aminopenicillanic acid. The 6-aminopenicillanic acid is a substrate for chemical or microbial conversion to valuable commercial antibiotics (e.g. ampicillin).

Exemplary Composition Types

Agricultural Compositions

[0359] In an exemplary embodiment of the invention, enzymatic compositions of the present invention can also be included in agricultural compositions, which also preferably include an agriculturally acceptable carrier.

[0360] An agriculturally acceptable carrier can be a solid or a liquid, preferably a liquid, more preferably water. While not required, the agricultural composition of the invention may also contain other additives such as fertilizers, inert formulation aids, i.e. surfactants, emulsifiers, defoamers, dyes, extenders and the like. Reviews describing methods of preparation and application of agricultural compositions are available. See, for example, Couch and Ignoffo (1981) in Microbial Control of Pests and Plant Disease 1970-1980, Burges (ed.), chapter 34, pp. 621-634; Corke and Rishbeth, ibid, chapter 39, pp. 717-732; Brockwell (1980) in Methods for Evaluating Nitrogen Fixation, Bergersen (ed.) pp. 417-488; Burton (1982) in Biological Nitrogen Fixation Technology for Tropical Agriculture, Graham and Harris (eds.) pp. 105-114; and Roughley (1982) ibid, pp. 115-127; The Biology of Baculoviruses, Vol. II, supra, and references cited in the above. Wettable powder compositions incorporating baculoviruses for use in insect control are described in EP 697,170, incorporated by reference herein.

[0361] Preferred methods of applying the agricultural compositions of the present invention are leaf application, seed coating and soil application, as disclosed in U.S. Pat. No. 5,039,523, which is fully incorporated herein.

Pharmaceutical Compositions

[0362] Polypeptides identified according to exemplary analytic methods of the invention can be administered to an organism per se, or in a pharmaceutical composition where they are mixed with suitable carriers or excipients.

[0363] As used herein, a "pharmaceutical composition" refers to a preparation of one or more of the active ingredients described herein with other chemical components such as physiologically suitable carriers and excipients. The purpose of a pharmaceutical composition is to facilitate administration of a compound to an organism.

[0364] As used herein, the term "active ingredient" refers to the polypeptide accountable for the intended biological effect.

[0365] Hereinafter, the phrases "physiologically acceptable carrier" and "pharmaceutically acceptable carrier," which may be used interchangeably, refer to a carrier or a diluent that does not cause significant irritation to an organism and does not abrogate the biological activity and properties of the administered compound. An adjuvant is included under these phrases.

[0366] Herein, the term "excipient" refers to an inert substance added to a pharmaceutical composition to further facilitate administration of an active ingredient. Examples, without limitation, of excipients include calcium carbonate, calcium phosphate, various sugars and types of starch, cellulose derivatives, gelatin, vegetable oils, and polyethylene glycols.

0367 Techniques for formulation and administration of drugs may be found in the latest edition of “Remington's Pharmaceutical Sciences,” Mack Publishing Co., Easton, Pa., which is herein fully incorporated by reference.

0368 Suitable routes of administration may, for example, include oral, rectal, transmucosal, especially transnasal, intestinal, or parenteral delivery, including intramuscular, subcutaneous, and intramedullary injections, as well as intrathecal, direct intraventricular, intravenous, intraperitoneal, intranasal, or intraocular injections.

0369 Alternately, one may administer the pharmaceutical composition in a local rather than systemic manner, for example, via injection of the pharmaceutical composition directly into a tissue region of a patient.

0370 Pharmaceutical compositions of the present invention may be manufactured by processes well known in the art, e.g., by means of conventional mixing, dissolving, granulating, dragee-making, levigating, emulsifying, encapsulating, entrapping, or lyophilizing processes.

0371 Pharmaceutical compositions for use in accordance with the present invention thus may be formulated in conventional manner using one or more physiologically acceptable carriers comprising excipients and auxiliaries, which facilitate processing of the active ingredients into preparations that can be used pharmaceutically. Proper formulation is dependent upon the route of administration chosen.

0372 For injection, the active ingredients of the pharmaceutical composition may be formulated in aqueous solutions, preferably in physiologically compatible buffers such as Hank's solution, Ringer's solution, or physiological salt buffer. For transmucosal administration, penetrants appropriate to the barrier to be permeated are used in the formulation. Such penetrants are generally known in the art.

0373 For oral administration, the pharmaceutical composition can be formulated readily by combining the active compounds with pharmaceutically acceptable carriers well known in the art. Such carriers enable the pharmaceutical composition to be formulated as tablets, pills, dragees, capsules, liquids, gels, syrups, slurries, suspensions, and the like, for oral ingestion by a patient. Pharmacological preparations for oral use can be made using a solid excipient, optionally grinding the resulting mixture, and processing the mixture of granules, after adding suitable auxiliaries as desired, to obtain tablets or dragee cores. Suitable excipients are, in particular, fillers such as sugars, including lactose, sucrose, mannitol, or sorbitol; cellulose preparations such as, for example, maize starch, wheat starch, rice starch, potato starch, gelatin, gum tragacanth, methyl cellulose, hydroxypropylmethyl-cellulose, and sodium carboxymethylcellulose; and/or physiologically acceptable polymers such as polyvinylpyrrolidone (PVP). If desired, disintegrating agents, such as cross-linked polyvinyl pyrrolidone, agar, or alginic acid or a salt thereof, such as sodium alginate, may be added.

0374 Dragee cores are provided with suitable coatings. For this purpose, concentrated sugar solutions may be used which may optionally contain gum arabic, talc, polyvinyl pyrrolidone, carbopol gel, polyethylene glycol, titanium dioxide, lacquer solutions, and suitable organic solvents or solvent mixtures. Dyestuffs or pigments may be added to the tablets or dragee coatings for identification or to characterize different combinations of active compound doses.

0375 Pharmaceutical compositions that can be used orally include push-fit capsules made of gelatin, as well as soft, sealed capsules made of gelatin and a plasticizer, such as glycerol or sorbitol. The push-fit capsules may contain the active ingredients in admixture with filler such as lactose, binders such as starches, lubricants such as talc or magnesium stearate, and, optionally, stabilizers. In soft capsules, the active ingredients may be dissolved or suspended in suitable liquids, such as fatty oils, liquid paraffin, or liquid polyethylene glycols. In addition, stabilizers may be added. All formulations for oral administration should be in dosages suitable for the chosen route of administration.

0376 For buccal administration, the compositions may take the form of tablets or lozenges formulated in conventional manner.

0377 For administration by nasal inhalation, the active ingredients for use according to the present invention are conveniently delivered in the form of an aerosol spray presentation from a pressurized pack or a nebulizer with the use of a suitable propellant, e.g., dichlorodifluoromethane, trichlorofluoromethane, dichloro-tetrafluoroethane, or carbon dioxide. In the case of a pressurized aerosol, the dosage may be determined by providing a valve to deliver a metered amount. Capsules and cartridges of, for example, gelatin for use in a dispenser may be formulated containing a powder mix of the compound and a suitable powder base, such as lactose or starch.

0378 The pharmaceutical composition described herein may be formulated for parenteral administration, e.g., by bolus injection or continuous infusion. Formulations for injection may be presented in unit dosage form, e.g., in ampoules or in multidose containers with, optionally, an added preservative. The compositions may be suspensions, solutions, or emulsions in oily or aqueous vehicles, and may contain formulatory agents such as suspending, stabilizing, and/or dispersing agents.

0379 Pharmaceutical compositions for parenteral administration include aqueous solutions of the active preparation in water-soluble form. Additionally, suspensions of the active ingredients may be prepared as appropriate oily or water-based injection suspensions. Suitable lipophilic solvents or vehicles include fatty oils such as sesame oil, or synthetic fatty acid esters such as ethyl oleate, triglycerides, or liposomes. Aqueous injection suspensions may contain substances that increase the viscosity of the suspension, such as sodium carboxymethyl cellulose, sorbitol, or dextran. Optionally, the suspension may also contain suitable stabilizers or agents that increase the solubility of the active ingredients, to allow for the preparation of highly concentrated solutions.

0380 Alternatively, the active ingredient may be in powder form for constitution with a suitable vehicle, e.g., a sterile, pyrogen-free, water-based solution, before use.

0381 The pharmaceutical composition of the present invention may also be formulated in rectal compositions such as suppositories or retention enemas, using, for example, conventional suppository bases such as cocoa butter or other glycerides.

0382 Pharmaceutical compositions suitable for use in the context of the present invention include compositions wherein the active ingredients are contained in an amount effective to achieve the intended purpose. More specifically, a “therapeutically effective amount” means an amount of active ingredients (e.g., a nucleic acid construct) effective to prevent, alleviate, or ameliorate symptoms of a disorder (e.g., ischemia) or prolong the survival of the subject being treated.

0383 Determination of a therapeutically effective amount is well within the capability of those skilled in the art, especially in light of the detailed disclosure provided herein.

0384 For any preparation used in the methods of the invention, the dosage or the therapeutically effective amount can be estimated initially from in vitro and cell culture assays. For example, a dose can be formulated in animal models to achieve a desired concentration or titer. Such information can be used to more accurately determine useful doses in humans.

0385 Toxicity and therapeutic efficacy of the active ingredients described herein can be determined by standard pharmaceutical procedures in vitro, in cell cultures or experimental animals. The data obtained from these in vitro and cell culture assays and animal studies can be used in formulating a range of dosage for use in humans. The dosage may vary depending upon the dosage form employed and the route of administration utilized. The exact formulation, route of administration, and dosage can be chosen by the individual physician in view of the patient's condition. (See, e.g., Fingl, E. et al. (1975), “The Pharmacological Basis of Therapeutics,” Ch. 1, p. 1.)

0386 Dosage amount and administration intervals may be adjusted individually to provide sufficient plasma or brain levels of the active ingredient to induce or suppress the biological effect (i.e., minimally effective concentration, MEC). The MEC will vary for each preparation, but can be estimated from in vitro data. Dosages necessary to achieve the MEC will depend on individual characteristics and route of administration. Detection assays can be used to determine plasma concentrations.

0387 Depending on the severity and responsiveness of the condition to be treated, dosing can be of a single or a plurality of administrations, with course of treatment lasting from several days to several weeks, or until cure is effected or diminution of the disease state is achieved.

0388 The amount of a composition to be administered will, of course, be dependent on the subject being treated, the severity of the affliction, the manner of administration, the judgment of the prescribing physician, etc.

0389 Compositions of the present invention may, if desired, be presented in a pack or dispenser device, such as an FDA-approved kit, which may contain one or more unit dosage forms containing the active ingredient. The pack may, for example, comprise metal or plastic foil, such as a blister pack. The pack or dispenser device may be accompanied by instructions for administration. The pack or dispenser device may also be accompanied by a notice in a form prescribed by a governmental agency regulating the manufacture, use, or sale of pharmaceuticals, which notice is reflective of approval by the agency of the form of the compositions for human or veterinary administration. Such notice, for example, may include labeling approved by the U.S. Food and Drug Administration for prescription drugs or of an approved product insert. Compositions comprising a preparation of the invention formulated in a pharmaceutically acceptable carrier may also be prepared, placed in an appropriate container, and labeled for treatment of an indicated condition, as further detailed above.

Food Additives

0390 In an exemplary embodiment of the invention, food compositions comprise one or more polypeptides of the present invention as food additives.

0391 The phrase “food additive,” as defined by the FDA in 21 C.F.R. 170.3(e)(1), includes any liquid or solid material intended to be added to a food product. This material can, for example, include an agent having a distinct taste and/or flavor or physiological effect (e.g., vitamins).

0392 Thus, the food additive composition may comprise the polypeptide of the present invention.

0393 The food additive composition of the present invention can include the polypeptide per se, or an encapsulated form of the polypeptide (described hereinabove with respect to pharmaceutical compositions). The food additive composition of the present invention can be added to a variety of food products.

0394 As used herein, the phrase “food product” describes a material consisting essentially of protein, carbohydrate and/or fat, which is used in the body of an organism to sustain growth, repair and vital processes and to furnish energy. Food products may also contain supplementary substances such as minerals, vitamins and condiments. See Merriam-Webster's Collegiate Dictionary, 10th Edition, 1993. The phrase “food product” as used herein further includes a beverage adapted for human or animal consumption.

0395 Representative examples of food products in which the food additive of the present invention can be incorporated include, without limitation, baked goods, soft drinks, cereals, candy, jams, jellies, tofu, cheese and ice cream.

0396 A food product containing the food additive of the present invention can also include additional additives such as, for example, antioxidants, sweeteners, flavorings, colors, preservatives, enzymes, nutritive additives such as vitamins and minerals, emulsifiers, pH control agents such as acidulants, hydrocolloids, antifoams and release agents, flour improving or strengthening agents, raising or leavening agents, gases and chelating agents, the utility and effects of which are well-known in the art.

0397 The polypeptide of the present invention can also be expressed in edible portions of commercially grown crops.

0398 For example, the polypeptide of the present invention can be expressed in dicot or monocot plants, with a preference to monocot plants such as rice, wheat or barley. Methods of expressing exogenous polynucleotide sequences in plants are described hereinabove with respect to synthesis of a recombinant polypeptide in plant cells.

Cosmetical Compositions

0399 Such compositions are usually prepared for aesthetic use and may comprise the polypeptides of the present invention as either the active ingredient or as a carrier.

0400 As used herein, the term “cosmetically or cosmeceutically acceptable carrier” describes a carrier or a diluent that does not cause significant irritation to an organism and does not abrogate the biological activity and properties of the applied active ingredient(s).

0401 Examples of acceptable carriers that are usable in the context of the present invention include carrier materials that are well-known for use in the cosmetic and medical arts as bases for, e.g., emulsions, creams, aqueous solutions, oils, ointments, pastes, gels, lotions, milks, foams, suspensions, aerosols and the like, depending on the final form of the composition.

0402 Representative examples of suitable carriers according to the present invention therefore include, without limitation, water, liquid alcohols, liquid glycols, liquid polyalkylene glycols, liquid esters, liquid amides, liquid protein hydrolysates, liquid alkylated protein hydrolysates, liquid lanolin and lanolin derivatives, and like materials commonly employed in cosmetic and medicinal compositions.

0403 Other suitable carriers according to the present invention include, without limitation, alcohols, such as, for example, monohydric and polyhydric alcohols, e.g., ethanol, isopropanol, glycerol, sorbitol, 2-methoxyethanol, diethyleneglycol, ethylene glycol, hexyleneglycol, mannitol, and propylene glycol; ethers such as diethyl or dipropyl ether; polyethylene glycols and methoxypolyoxyethylenes (carbowaxes having molecular weight ranging from 200 to 20,000); polyoxyethylene glycerols, polyoxyethylene sorbitols, stearoyl diacetin, and the like.

0404 By selecting the appropriate carrier and optionally other ingredients that can be included in the composition, as is detailed hereinbelow, the compositions of the present invention may be formulated into any pharmaceutical, cosmetic or cosmeceutical form normally employed for topical application. Hence, the compositions of the present invention can be, for example, in a form of a cream, an ointment, a paste, a gel, a lotion, a milk, a suspension, an aerosol, a spray, a foam, a shampoo, a hair conditioner, a serum, a swab, a pledget, a pad and a soap.

0405 The compositions of the present invention can optionally further comprise a variety of components that are suitable for rendering the compositions more cosmetically or aesthetically acceptable or to provide the compositions with additional usage benefits. Such conventional optional components are well known to those skilled in the art and are referred to herein as “ingredients”. These include any cosmetically acceptable ingredients such as those found in the CTFA International Cosmetic Ingredient Dictionary and Handbook, 8th edition, edited by Wenninger and Canterbery (The Cosmetic, Toiletry, and Fragrance Association, Inc., Washington, D.C., 2000). Some non-limiting representative examples of these ingredients include humectants, deodorants, antiperspirants, sun screening agents, sunless tanning agents, hair conditioning agents, pH adjusting agents, chelating agents, preservatives, emulsifiers, occlusive agents, emollients, thickeners, solubilizing agents, penetration enhancers, anti-irritants, colorants, propellants (as described above) and surfactants.

0406 Thus, for example, the compositions of the present invention can comprise, in combination with ammonium lactate and urea, one or more additional humectants or moisturizing agents. Representative examples of humectants that are usable in this context of the present invention include, without limitation, guanidine, glycolic acid and glycolate salts (e.g., ammonium salt and quaternary alkyl ammonium salt), aloe vera in any of its variety of forms (e.g., aloe vera gel), allantoin, urazole, polyhydroxy alcohols such as sorbitol, glycerol, hexanetriol, propylene glycol, butylene glycol, hexylene glycol and the like, polyethylene glycols, sugars and starches, sugar and starch derivatives (e.g., alkoxylated glucose), hyaluronic acid, lactamide monoethanolamine, acetamide monoethanolamine and any combination thereof.

0407 The compositions of the present invention can further comprise a pH adjusting agent. As is discussed hereinabove, although the ammonium lactate or any corresponding ammonium salt may serve as a pH adjusting agent, it is preferable for the compositions of the invention to have a pH value of between about 4 and about 7, preferably between about 5 and about 6, most preferably about 5.5 or substantially 5.5, and hence the presence of a pH adjusting agent is preferred. Suitable pH adjusting agents include, for example, one or more of adipic acids, glycines, citric acids, calcium hydroxides, magnesium aluminometasilicates, buffers or any combinations thereof.

0408 Representative examples of deodorant agents that are usable in the context of the present invention include, without limitation, quaternary ammonium compounds such as cetyl-trimethylammonium bromide, cetyl pyridinium chloride, benzethonium chloride, diisobutyl phenoxyethoxy ethyl dimethylbenzyl ammonium chloride, sodium N-lauryl sarcosine, sodium N-palmthyl sarcosine, lauroyl sarcosine, N-myristoyl glycine, potassium N-lauryl sarcosine, stearyl trimethyl ammonium chloride, sodium aluminum chlorohydroxy lactate, tricetylmethyl ammonium chloride, 2,4,4'-trichloro-2'-hydroxy diphenyl ether, diaminoalkyl amides such as L-lysine hexadecyl amide, heavy metal salts of citrate, salicylate, and piroctose, especially zinc salts, and acids thereof, heavy metal salts of pyrithione, especially zinc pyrithione and zinc phenolsulfate. Other deodorant agents include, without limitation, odor absorbing materials such as carbonate and bicarbonate salts, e.g., the alkali metal carbonates and bicarbonates, ammonium and tetraalkylammonium carbonates and bicarbonates, especially the sodium and potassium salts, or any combination of the above.

0409 Antiperspirant agents can be incorporated in the compositions of the present invention either in a solubilized or a particulate form and include, for example, aluminum or zirconium astringent salts or complexes.

0410 Representative examples of sun screening agents usable in context of the present invention include, without limitation, p-aminobenzoic acid, salts and derivatives thereof (ethyl, isobutyl, glyceryl esters; p-dimethylaminobenzoic acid); anthranilates (i.e., o-amino-benzoates; methyl, menthyl, phenyl, benzyl, phenylethyl, linalyl, terpinyl, and cyclohexenyl esters); salicylates (amyl, phenyl, octyl, benzyl, menthyl, glyceryl, and dipropyleneglycol esters); cinnamic acid derivatives (menthyl and benzyl esters, α-phenyl cinnamonitrile; butyl cinnamoyl pyruvate); dihydroxycinnamic acid derivatives (umbelliferone, methylumbelliferone, methylaceto-umbelliferone); trihydroxy-cinnamic acid derivatives (esculetin, methylesculetin, daphnetin, and the glucosides, esculin and daphnin); hydrocarbons (diphenylbutadiene, stilbene); dibenzalacetone and benzalacetophenone; naphtholsulfonates (sodium salts of 2-naphthol-3,6-disulfonic and of 2-naphthol-6,8-disulfonic acids); di-hydroxynaphthoic acid and its salts; o- and p-hydroxybiphenyldisulfonates; coumarin derivatives (7-hydroxy, 7-methyl, 3-phenyl); diazoles (2-acetyl-3-bromoindazole, phenyl benzoxazole, methyl naphthoxazole, various aryl benzothiazoles); quinine salts (bisulfate, sulfate, chloride, oleate, and tannate); quinoline derivatives (8-hydroxyquinoline salts, 2-phenylquinoline); hydroxy- or methoxy-substituted benzophenones; uric and violuric acids; tannic acid and its derivatives (e.g., hexaethylether); (butyl carbotol) (6-propyl piperonyl) ether; hydroquinone; benzophenones (oxybenzone, sulisobenzone, dioxybenzone, benzoresorcinol, 2,2',4,4'-tetrahydroxybenzophenone, 2,2'-dihydroxy-4,4'-dimethoxybenzophenone, octabenzone); 4-isopropyldibenzoylmethane; butylmethoxydibenzoylmethane; etocrylene; octocrylene; 3-(4-methylbenzylidene)bornan-2-one and 4-isopropyl-di-benzoylmethane, and any combination thereof.

0411 Representative examples of sunless tanning agents usable in context of the present invention include, without limitation, dihydroxyacetone, glyceraldehyde, indoles and their derivatives. The sunless tanning agents can be used in combination with the sunscreen agents.

0412 Suitable hair conditioning agents that can be used in the context of the present invention include, for example, one or more collagens, cationic surfactants, modified silicones, proteins, keratins, dimethicone polyols, quaternary ammonium compounds, halogenated quaternary ammonium compounds, alkoxylated carboxylic acids, alkoxylated alcohols, alkoxylated amides, sorbitan derivatives, esters, polymeric ethers, glyceryl esters, or any combinations thereof.

0413 The chelating agents are optionally added to the compositions of the present invention so as to enhance the preservative or preservative system. Preferred chelating agents are mild agents, such as, for example, ethylenediaminetetraacetic acid (EDTA), EDTA derivatives, or any combination thereof.

0414 Suitable preservatives that can be used in the context of the present composition include, without limitation, one or more alkanols, disodium EDTA (ethylenediamine tetraacetate), EDTA salts, EDTA fatty acid conjugates, isothiazolinone, parabens such as methylparaben and propylparaben, propylene glycols, sorbates, urea derivatives such as diazolidinyl urea, or any combinations thereof.

0415 Suitable emulsifiers that can be used in the context of the present invention include, for example, one or more sorbitans, alkoxylated fatty alcohols, alkylpolyglycosides, soaps, alkyl sulfates, monoalkyl and dialkyl phosphates, alkyl sulphonates, acyl isethionates, or any combinations thereof.

0416 Suitable occlusive agents that can be used in the context of the present invention include, for example, petrolatum, mineral oil, beeswax, silicone oil, lanolin and oil-soluble lanolin derivatives, saturated and unsaturated fatty alcohols such as behenyl alcohol, hydrocarbons such as squalane, and various animal and vegetable oils such as almond oil, peanut oil, wheat germ oil, linseed oil, jojoba oil, oil of apricot pits, walnuts, palm nuts, pistachio nuts, sesame seeds, rapeseed, cade oil, corn oil, peach pit oil, poppy seed oil, pine oil, castor oil, soybean oil, avocado oil, safflower oil, coconut oil, hazelnut oil, olive oil, grape seed oil and sunflower seed oil.

0417 Suitable emollients, other than ammonium lactate, that can be used in the context of the present invention include, for example, dodecane, squalane, cholesterol, isohexadecane, isononyl isononanoate, PPG ethers, petrolatum, lanolin, safflower oil, castor oil, coconut oil, cottonseed oil, palm kernel oil, palm oil, peanut oil, soybean oil, polyol carboxylic acid esters, derivatives thereof and mixtures thereof.

0418 Suitable thickeners that can be used in the context of the present invention include, for example, non-ionic water-soluble polymers such as hydroxyethylcellulose (commercially available under the Trademark Natrosol® 250 or 350), cationic water-soluble polymers such as Polyquat 37 (commercially available under the Trademark Synthalen® CN), fatty alcohols, fatty acids and their alkali salts and mixtures thereof.

0419 Representative examples of solubilizing agents that are usable in this context of the present invention include, without limitation, complex-forming solubilizers such as citric acid, ethylenediamine-tetraacetate, sodium meta-phosphate, succinic acid, urea, cyclodextrin, polyvinylpyrrolidone, diethylammonium-ortho-benzoate, and micelle-forming solubilizers such as TWEENS and spans, e.g., TWEEN 80. Other solubilizers that are usable for the compositions of the present invention are, for example, polyoxyethylene sorbitan fatty acid ester, polyoxyethylene n-alkyl ethers, n-alkyl amine n-oxides, poloxamers, organic solvents, phospholipids and cyclodextrins.

0420 Suitable penetration enhancers usable in context of the present invention include, but are not limited to, dimethylsulfoxide (DMSO), dimethyl formamide (DMF), allantoin, urazole, N,N-dimethylacetamide (DMA), decylmethylsulfoxide (C10 MSO), polyethylene glycol monolaurate (PEGML), propylene glycol (PG), propylene glycol monolaurate (PGML), glycerol monolaurate (GML), lecithin, the 1-substituted azacycloheptan-2-ones, particularly 1-n-dodecylcyclazacycloheptan-2-one (available under the trademark Azone® from Whitby Research Incorporated, Richmond, Va.), alcohols, and the like. The permeation enhancer may also be a vegetable oil. Such oils include, for example, safflower oil, cottonseed oil and corn oil.

0421 Suitable anti-irritants that can be used in the context of the present invention include, for example, steroidal and non-steroidal anti-inflammatory agents or other materials such as aloe vera, chamomile, alpha-bisabolol, cola nitida extract, green tea extract, tea tree oil, licorice extract, allantoin, caffeine or other xanthines, glycyrrhizic acid and its derivatives.

0422 Although a wide variety of ingredients can be included in the compositions of the present invention, in addition to the active ingredients, the compositions are preferably devoid of an enduring perfume composition. The incorporation of such a perfume composition in pharmaceutical compositions is considered in the art disadvantageous for skin and scalp medical treatment, as it oftentimes causes undesirable irritation of sensitive skin.

0423 As used herein, the phrase “an enduring perfume composition” describes a composition that comprises one or more perfumes that provide a long lasting aesthetic benefit with a minimum amount of material. Enduring perfume compositions are substantially deposited and remain on the body throughout any rinse and/or drying steps. Representative examples of such compositions are described, for example, in U.S. Pat. No. 6,086,903.

0424 However, it should be noted that fragrances other than enduring perfume compositions, perfumes or perfume compositions, which are fast removable from the surface they are deposited on, can be included in the compositions of the present invention.

Exemplary Medical Applications of Enzymes

0425 Use of enzymes in a wide variety of medical applications is contemplated and/or practiced. Exemplary medical applications are briefly reviewed here. This review does not purport to be exhaustive and does not limit the scope of the invention. The cellular processes of biogenesis and biodegradation involve a number of key enzyme classes including oxidoreductases, transferases, hydrolases, lyases, isomerases, ligases, and others. Each class of enzyme comprises many substrate-specific enzymes having precise and well regulated functions. Enzymes facilitate metabolic processes such as glycolysis, the tricarboxylic cycle, and fatty acid metabolism; synthesis or degradation of amino acids, steroids, phospholipids, and alcohols; regulation of cell signaling, proliferation, inflammation, and apoptosis; and catalysis of critical steps in DNA replication and repair and the process of translation. Once an enzyme has been classified according to EC nomenclature it is possible to predict with a high degree of certainty which substrate(s) the enzyme is specific to and/or what type of reaction the enzyme catalyzes.

Oxidoreductases

0426 Many pathways of biogenesis and biodegradation require oxidoreductase (dehydrogenase or reductase) activity, coupled to reduction or oxidation of a cofactor. Potential cofactors include cytochromes, oxygen, disulfide, iron-sulfur proteins, flavin adenine dinucleotide (FAD), and the nicotinamide adenine dinucleotides NAD and NADP (Newsholme, E. A. and A. R. Leech (1983) Biochemistry for the Medical Sciences, John Wiley and Sons, Chichester, U.K., pp. 779-793). Reductase activity catalyzes transfer of electrons between substrate(s) and cofactor(s) with concurrent oxidation of the cofactor. Reverse dehydrogenase activity catalyzes the reduction of a cofactor and consequent oxidation of the substrate. Oxidoreductase enzymes are a broad superfamily that catalyze reactions in all cells of organisms, including metabolism of sugar, certain detoxification reactions, and synthesis or degradation of fatty acids, amino acids, glucocorticoids, estrogens, androgens, and prostaglandins. Different family members may be referred to as oxidoreductases, oxidases, reductases, or dehydrogenases, and they often have distinct cellular locations such as the cytosol, the plasma membrane, mitochondrial inner or outer membrane, and peroxisomes.

0427 Short-chain alcohol dehydrogenases (SCADs) are a family of dehydrogenases that share only 15% to 30% sequence identity, with similarity predominantly in the coenzyme binding domain and the substrate binding domain. In addition to their role in detoxification of ethanol, SCADs are involved in synthesis and degradation of fatty acids, steroids, and some prostaglandins, and are therefore implicated in a variety of disorders such as lipid storage disease, myopathy, SCAD deficiency, and certain genetic disorders. For example, retinol dehydrogenase is a SCAD-family member (Simon, A. et al. (1995) J. Biol. Chem. 270:1107-1112) that converts retinol to retinal, the precursor of retinoic acid. Retinoic acid, a regulator of differentiation and apoptosis, has been shown to down-regulate genes involved in cell proliferation and inflammation (Chai, X. et al. (1995) J. Biol. Chem. 270:3900-3904). In addition, retinol dehydrogenase has been linked to hereditary eye diseases such as autosomal recessive childhood-onset severe retinal dystrophy (Simon, A. et al. (1996) Genomics 36:424-430).

0428 Membrane-bound succinate dehydrogenases (succinate:quinone reductases, SQR) and fumarate reductases (quinol:fumarate reductases, QFR) couple the oxidation of succinate to fumarate with the reduction of quinone to quinol, and also catalyze the reverse reaction. QFR and SQR complexes are collectively known as succinate:quinone oxidoreductases (EC 1.3.5.1) and have similar compositions. The complexes consist of two hydrophilic and one or two hydrophobic, membrane-integrated subunits. The larger hydrophilic subunit A carries covalently bound flavin adenine dinucleotide; subunit B contains three iron-sulphur centers (Lancaster, C. R. and A. Kroger (2000) Biochim. Biophys. Acta 1459:422-431). The full-length cDNA sequence for the flavoprotein subunit of human heart succinate dehydrogenase (succinate:(acceptor) oxidoreductase; EC 1.3.99.1) is similar to the bovine succinate dehydrogenase in that it contains a cysteine triplet and in that the active site contains an additional cysteine that is not present in yeast or prokaryotic SQRs (Morris, A. A. et al. (1994) Biochim. Biophys. Acta 29:125-128).

0429 Propagation of nerve impulses, modulation of cell proliferation and differentiation, induction of the immune response, and tissue homeostasis involve neurotransmitter metabolism (Weiss, B. (1991) Neurotoxicology 12:379-386; Collins, S. M. et al. (1992) Ann. N.Y. Acad. Sci. 664:415-424; Brown, J. K. and H. Imam (1991) J. Inherit. Metab. Dis. 14:436-458). Many pathways of neurotransmitter metabolism require oxidoreductase activity, coupled to reduction or oxidation of a cofactor, such as NAD+/NADH (Newsholme and Leech, supra, pp. 779-793). Degradation of catecholamines (epinephrine or norepinephrine) requires alcohol dehydrogenase (in the brain) or aldehyde dehydrogenase (in peripheral tissue). NAD+-dependent aldehyde dehydrogenase oxidizes 5-hydroxyindole-3-acetate (the product of 5-hydroxytryptamine (serotonin) metabolism) in the brain, blood platelets, liver and pulmonary endothelium (Newsholme and Leech, supra, p. 786). Other neurotransmitter degradation pathways that utilize NAD+/NADH-dependent oxidoreductase activity include those of L-DOPA (precursor of dopamine, a neuronal excitatory compound), glycine (an inhibitory neurotransmitter in the brain and spinal cord), histamine (liberated from mast cells during the inflammatory response), and taurine (an inhibitory neurotransmitter of the brain stem, spinal cord and retina) (Newsholme and Leech, supra, pp. 790, 792). Epigenetic or genetic defects in neurotransmitter metabolic pathways can result in diseases including Parkinson disease and inherited myoclonus (McCance, K. L. and S. E. Huether (1994) Pathophysiology, Mosby-Year Book, Inc., St. Louis, Mo., pp. 402-404; Gundlach, A. L. (1990) FASEB J. 4:2761-2766).

0430 Tetrahydrofolate is a derivatized glutamate molecule that acts as a carrier, providing activated one-carbon units to a wide variety of biosynthetic reactions, including synthesis of purines, pyrimidines, and the amino acid methionine. Tetrahydrofolate is generated by the activity of a holoenzyme complex called tetrahydrofolate synthase, which includes three enzyme activities: tetrahydrofolate dehydrogenase, tetrahydrofolate cyclohydrolase, and tetrahydrofolate synthetase. Thus, tetrahydrofolate dehydrogenase plays an important role in generating building blocks for nucleic and amino acids, crucial to proliferating cells.

0431 3-Hydroxyacyl-CoA dehydrogenase (3HACD) is involved in fatty acid metabolism. It catalyzes the reduction of 3-hydroxyacyl-CoA to 3-oxoacyl-CoA, with concomitant oxidation of NAD to NADH, in the mitochondria and peroxisomes of eukaryotic cells. In peroxisomes, 3HACD and enoyl-CoA hydratase form an enzyme complex called bifunctional enzyme, defects in which are associated with peroxisomal bifunctional enzyme deficiency. This interruption in fatty acid metabolism produces accumulation of very long chain fatty acids, disrupting development of the brain, bone, and adrenal glands. Infants born with this deficiency typically die within 6 months (Watkins, P. et al. (1989) J. Clin. Invest. 83:771-777; Online Mendelian Inheritance in Man (OMIM), #261515). The neurodegeneration characteristic of Alzheimer's disease involves development of extracellular plaques in certain brain regions. A major protein component of these plaques is the peptide amyloid-β (Aβ), which is one of several cleavage products of amyloid precursor protein (APP). 3HACD has been shown to bind the Aβ peptide, and is overexpressed in neurons affected in Alzheimer's disease. In addition, an antibody against 3HACD can block the toxic effects of Aβ in a cell culture model of Alzheimer's disease (Yan, S. et al. (1997) Nature 389:689-695; OMIM #602057).

0432 Steroids such as estrogen, testosterone, and corticosterone are generated from a common precursor, cholesterol, and interconverted. Enzymes acting upon cholesterol include dehydrogenases. Steroid dehydrogenases, such as the hydroxysteroid dehydrogenases, are involved in hypertension, fertility, and cancer (Duax, W. L. and D. Ghosh (1997) Steroids 62:95-100). One such dehydrogenase is 3-oxo-5-α-steroid dehydrogenase (OASD), a microsomal membrane protein highly expressed in prostate and other androgen-responsive tissues. OASD catalyzes the conversion of testosterone into dihydrotestosterone, which is the most potent androgen. Dihydrotestosterone is essential for the formation of the male phenotype during embryogenesis, as well as for proper androgen-mediated growth of tissues such as the prostate and male genitalia. A defect in OASD leads to defective formation of the external genitalia (Andersson, S. et al. (1991) Nature 354:159-161; Labrie, F. et al. (1992) Endocrinology 131:1571-1573; OMIM #264600).

0433 17β-hydroxysteroid dehydrogenase (17βHSD6) plays an important role in the regulation of the male reproductive hormone, dihydrotestosterone (DHTT).

quiring reactions. The key respiratory chain complexes are NADH:ubiquinone oxidoreductase (complex I), succinate:ubiquinone oxidoreductase (complex II), cytochrome c-b oxidoreductase (complex III), cytochrome c oxidase (complex IV), and ATP synthase (complex V) (Alberts, B. et al. (1994) Molecular Biology of the Cell, Garland Publishing, Inc., New York, N.Y., pp. 677-678). All of these complexes are located on the inner matrix side of the mitochondrial membrane except complex II, which is on the cytosolic side where it transports electrons generated in the citric acid cycle to the respiratory chain. Electrons released in oxidation of succinate to fumarate in the citric acid cycle are transferred through electron carriers in complex II to membrane bound ubiquinone (Q). Transcriptional regulation of these nuclear-encoded genes controls the biogenesis of respiratory enzymes. Defects and altered expression of enzymes in the respiratory chain are associated with a variety of disease conditions.

Other dehydrogenase activities using NAD as a cofactor include 3-hydroxyisobutyrate dehydrogenase (3HBD), which catalyzes the NAD-dependent oxidation of 3-hydroxyisobutyrate to methylmalonate semialdehyde within mitochondria. 3-hydroxyisobutyrate levels are elevated in
17BHSD6 ketoacidosis, methylmalonic acidemia, and other disorders acts to reduce levels of DHTT by oxidizing a precursor of (Rougraff, P. M. et al. (1989).J. Biol. Chem. 264:5899-5903). DHTT, 3C-diol, to androsterone which is readily glucu Another mitochondrial dehydrogenase important in amino ronidated and removed. 17 BHSD6 is active with both andro acid metabolism is the enzyme isovaleryl-CoA-dehydroge gen and estrogen Substrates in embryonic kidney 293 cells. nase (IVD). IVD is involved in leucine metabolism and cata Isozymes of 17 BHSD catalyze oxidation and/or reduction lyzes the oxidation of isovaleryl-CoA to 3-methylcrotonyl reactions in various tissues with preferences for different CoA. Human IVD is a tetrameric flavoprotein synthesized in steroid substrates (Biswas, M. G. and D. W. Russell (1997).J. the cytosol with a mitochondrial import signal sequence. A Biol. Chem. 272: 15959-15966). For example, 17BHSD1 mutation in the gene encoding IVD results in isovaleric aci preferentially reduces estradiol and is abundant in the ovary demia (Vockley, J. et al. (1992) J. Biol. Chem. 267:2494 and placenta. 17BHSD2 catalyzes oxidation of androgens and 2501). is present in the endometrium and placenta. 17 BHSD3 is The family of glutathione peroxidases encompass tetrameric exclusively a reductive enzyme in the testis (Geissler, W. M. glutathione peroxidases (GPx1-3) and the monomeric phos et al. (1994) Nature Genet. 7:34-39). An excess of androgens pholipid hydroperoxide glutathione peroxidase (PHGPx/ such as DHTT can contribute to diseases such as benign GPx4). Although the overall homology between the tet prostatic hyperplasia and prostate cancer. rameric enzymes and GPx4 is less than 30%, a pronounced The oxidoreductase isocitrate dehydrogenase catalyzes the similarity has been detected in clusters involved in the active conversion of isocitrate to a-ketoglutarate, a Substrate of the site and a common catalytic triad has been defined by struc citric acid cycle. 
Isocitrate dehydrogenase can be either NAD tural and kinetic data (Epp, O. et al. (1983) Eur. J. Biochem. or NADP dependent, and is found in the cytosol, mitochon 133:51-69). GPx1 is ubiquitously expressed in cells, whereas dria, and peroxisomes. Activity of isocitrate dehydrogenase is GPx2 is present in the liver and colon, and GPx3 is present in regulated developmentally, and by hormones, neurotransmit plasma. GPx4 is found at low levels in all tissues but is ters, and growth factors. expressed at high levels in the testis (Ursini, F. et al (1995) 0434 Hydroxypyruvate reductase (HPR), a peroxisomal Meth. Enzymol. 252:38-53). GPx4 is the only monomeric 2-hydroxyacid dehydrogenase in the glycolate pathway, cata glutathione peroxidase found in mammals and the only mam lyzes the conversion of hydroxypyruvate to glycerate with the malian glutathione peroxidase to show high affinity for and oxidation of both NADH and NADPH. The reverse dehydro reactivity with phospholipid hydroperoxides, and to be mem genase reaction reduces NAD" and NADP". HPR recycles brane associated. A tandem mechanism for the antioxidant nucleotides and bases back into pathways leading to the Syn activities of GPx4 and vitamin E has been suggested. GPx4 thesis of ATP and GTP, which are used to produce DNA and has alternative transcription and translation start sites which RNA and to control various aspects of signal transduction and determine its subcellular localization (Esworthy, R. S. et al. energy metabolism. Purine nucleotide biosynthesis inhibitors (1994) Gene 144:317-318; and Maiorino, M. et al. (1990) are used as antiproliferative agents to treat cancer and viral Meth. Enzymol. 186:448-450). diseases. HPR also regulates biochemical synthesis of serine 0436 The glutathione S-transferases (GST) are a ubiqui and cellular serine levels available for protein synthesis. 
tous family of enzymes with dual substrate specificities that 0435 The mitochondrial electron transport (or respira perform important biochemical functions of xenobiotic tory) chain is the series of oxidoreductase-type enzyme com biotransformation and detoxification, drug metabolism, and plexes in the mitochondrial membrane that is responsible for protection of tissues against peroxidative damage. They cata the transport of electrons from NADH to oxygen and the lyze the conjugation of an electrophile with reduced glu coupling of this oxidation to the synthesis of ATP (oxidative tathione (GSH) which results in either activation or deactiva phosphorylation). ATP provides energy to drive energy-re tion/detoxification. The absolute requirement for binding US 2013/0332.133 A1 Dec. 12, 2013 30 reduced GSH to a variety of chemicals necessitates a diversity gen peroxide (Schalireuter, K. U. and J. M. Wood (1991) in GST structures in various organisms and cell types. GSTs Melanoma Res. 1:159-167). Glutaredoxin is the principal are homodimeric or heterodimeric proteins localized in the agent responsible for protein dethiolation in vivo and reduces cytosol. The major isozymes share common structural and dehydroascorbic acid in normal human neutrophils (Jung, C. catalytic properties and include four major classes, Alpha, H. and J. A. Thomas (1996) Arch. Biochem. Biophys. 335: Mu, Pi, and Theta. Each GST possesses a common binding 61-72; Park, J. B. and M. Levine (1996) Biochem. J.315:931 site for GSH, and a variable hydrophobic binding site specific 938). T for its particular electrophilic substrates. Specific amino acid 0440 The thioredoxin system serves as a hydrogen donor residues within GSTs have been identified as important for for ribonucleotide reductase and as a regulator of enzymes by these binding sites and for catalytic activity. Residues Q67, redox control. 
It also modulates the activity of transcription T68, D101, E104, and R131 are important for the binding of factors such as NF-kappa.B, AP-1, and steroid receptors. GSH (Lee, H.-C. et al. (1995) J. Biol. Chem. 270:99-109). Several cytokines or secreted cytokine-like factors such as Residues R13, R20, and R69 are important for the catalytic adult T-cell leukemia-derived factor, 3B6-interleukin-1, activity of GST (Stenberg, G. et al. (1991) Biochem. J. 274: T-hybridoma-derived (MP-6) B cell stimulatory factor, and 549-555). early pregnancy factor have been reported to be identical to 0437. GSTs normally deactivate and detoxify potentially thioredoxin (Holmgren, A. (1985) Annu. Rev. Biochem. mutagenic and carcinogenic chemicals. Some forms of rat 54:237-271; Abate, C. et al. (1990) Science 249:1157-1161; and human GSTs are reliable preneoplastic markers of car Tagaya, Y. et al. (1989) EMBO J. 8:757-764; Wakasugi, H. cinogenesis. Dihalomethanes, which produce liver tumors in (1987) Proc. Natl. Acad. Sci. USA 84:804-808: Rosen, A. et mice, are believed to be activated by GST (Thier, R. et al. al. (1995) Int. Immunol. 7:625-633). Thus thioredoxin (1993) Proc. Natl. Acad. Sci. USA 90:8567-8580). The secreted by stimulated lymphocytes (Yodoi, J. and T. Tursz mutagenicity of ethylene dibromide and ethylene dichloride (1991) Adv. Cancer Res. 57:381–411; Tagaya, N. etal. (1990) is increased in bacterial cells expressing the human Alpha Proc. Natl. Acad. Sci. USA 87:8282-8286) has extracellular GST. A 1-1, while the mutagenicity of aflatoxin B1 is substan activities including a role as a regulator of cell growth and a tially reduced by enhancing the expression of GST (Simula, mediator in the immune system (Miranda-Vizuete, A. et al. T. P. et al. (1993) Carcinogenesis 14:1371-1376). Thus, con (1996) J. Biol. Chem. 271:19099-19103: Yamauchi, A. et al. trol of GST activity may be useful in the control of mutagen (1992) Mol. Immunol. 29:263-270). 
Thioredoxin and thiore esis and carcinogenesis. doxin reductase protect against cytotoxicity mediated by 0438 GST has been implicated in the acquired resistance reactive oxygen species in disorders such as Alzheimer's ofmany cancers to drug treatment, the phenomenon known as disease (Lovell, M.A. (2000) Free Radic. Biol. Med. 28:418 multi-drug resistance (MDR). MDR occurs when a cancer 427). patient is treated with a cytotoxic drug such as cyclophospha 0441 The selenoprotein thioredoxin reductase is secreted mide and Subsequently becomes resistant to this drug and to by both normal and neoplastic cells and has been implicated a variety of other cytotoxic agents as well. Increased GST as both a growth factor and as a polypeptide involved in levels are associated with some drug resistant cancers, and it apoptosis (Soderberg, A. et al. (2000) Cancer Res. 60:2281 is believed that this increase occurs in response to the drug 2289). An extracellular plasmin reductase secreted by ham agent which is then deactivated by the GST catalyzed GSH ster ovary cells (HT-1080) has been shown to participate in conjugation reaction. The increased GST levels then protect the generation of angiostatin from plasmin. In this case, the the cancer cells from other cytotoxic agents for which GST reduction of the plasmin disulfide bonds triggers the pro has affinity. Increased levels of A1-1 in tumors has been teolytic cleavage of plasmin which yields the angiogenesis linked to drug resistance induced by cyclophosphamide treat inhibitor, angiostatin (Stathakis, Petal. (1997).J. Biol. Chem. ment (Dirven, H. A. etal. (1994) Cancer Res. 54:6215-6220). 272:20641-20645). Low levels of reduced sulfhydryl groups Thus control of GST activity in cancerous tissues may be in plasma has been associated with rheumatoid arthritis. The useful in treating MDR in cancer patients. failure of these sulfhydryl groups to Scavenge active oxygen 0439. 
The reduction of ribonucleotides to the correspond species (e.g., hydrogen peroxide produced by activated neu ing deoxyribonucleotides, needed for DNA synthesis during trophils) results in oxidative damage to Surrounding tissues cell proliferation, is catalyzed by the enzyme ribonucleotide and the resulting inflammation (Hall, N. D. et al. (1994) diphosphate reductase. Glutaredoxin is a glutathione (GSH)- Rheumatol. Int. 4:35-38). dependent hydrogen donor for ribonucleotide diphosphate 0442. Another example of the importance of redox reac reductase and contains the active site consensus sequence tions in cell metabolism is the degradation of Saturated and -C-P-Y-C-. This sequence is conserved in glutaredoxins from unsaturated fatty acids by mitochondrial and peroxisomal Such different organisms as Escherichia coli. Vaccinia virus, beta-Oxidation enzymes which sequentially remove two-car yeast, plants, and mammalian cells. Glutaredoxin has inher bonunits from Coenzyme A (CoA)-activated fatty acids. The ent GSH-disulfide oxidoreductase (thioltransferase) activity main beta-oxidation pathway degrades both Saturated and in a coupled system with GSH, NADPH, and GSH-reductase, unsaturated fatty acids while the auxiliary pathway performs catalyzing the reduction of low molecular weight disulfides as additional steps required for the degradation of unsaturated well as proteins. Glutaredoxin has been proposed to exert a fatty acids. general thiol redox control of protein activity by acting both 0443) The pathways of mitchondrial and peroxisomal as an effective protein disulfide reductase, similar to thiore beta-oxidation use similar enzymes, but have different sub doxin, and as a specific GSH-mixed disulfide reductase (Pa strate specificities and functions. Mitochondria oxidize dilla, C. A. et al. (1996) FEBS Lett. 378:69-73). short-, medium-, and long-chain fatty acids to produce energy In addition to their important role in DNA synthesis and cell for cells. 
Mitochondrial beta-Oxidation is a major energy division, glutaredoxin and other thioproteins provide effec Source for cardiac and skeletal muscle. In liver, it provides tive antioxidant defense against oxygen radicals and hydro ketone bodies to the peripheral circulation when glucose lev US 2013/0332.133 A1 Dec. 12, 2013

els are low as in starvation, endurance exercise, and diabetes mitochondrial trifunctional protein exists that has long-chain (Eaton, S. et al. (1996) Biochem. J. 320:345-357). Peroxi enoyl-CoA hydratase, 3-hydroxyacyl-CoA dehydrogenase, Somes oxidize medium-, long-, and very-long-chain fatty and long-chain 3-oxothiolase activities (Eaton et al., Supra). acids, dicarboxylic fatty acids, branched fatty acids, prostag In human peroxisomes, enoyl-CoA hydratase activity is landins, Xenobiotics, and bile acid intermediates. The chief found in both a 327 amino acid residue mono-functional roles of peroxisomal beta-oxidation are to shorten toxic lipo enzyme and as part of a multi-functional enzyme, also known philic carboxylic acids to facilitate their excretion and to as bifunctional enzyme, which possesses enoyl-CoA shorten very-long-chain fatty acids prior to mitochondrial hydratase, enoyl-CoA isomerase, and 3-hydroxyacyl-CoA beta-oxidation (Mannaerts, G. P. and P. P. Van Veldhoven hydrogenase activities (FitzPatrick, D. R. et al. (1995) (1993) Biochimie 75:147-158). Genomics 27:457-466; and Hoefler, G. et al. (1994) Genom 0444 The auxiliary beta-oxidation enzyme 2,4-dienoyl ics 19:60-67). A 339 amino acid residue human protein with CoA reductase catalyzes the following reaction: short-chain enoyl-CoA hydratase activity also acts as an AU 0445 trans-2, cis/trans-4-dienoyl-CoA--NADPH-i-H" specific RNA binding protein (Nakagawa, J. et al. (1995) ->trans-3-enoyl-CoA+NA-DP" Proc. Natl. Acad. Sci. USA 92:2051-2055). All enoyl-CoA 0446. This reaction removes even-numbered double hydratases share homology near two active site glutamic acid bonds from unsaturated fatty acids prior to their entry into the residues, with 17 amino acid residues that are highly con main beta-oxidation pathway (Koivuranta, K.T. et al. (1994) served (Wu, W.-J. etal. (1997) Biochemistry 36:2211-2220). Biochem. J. 304:787-792). 
The enzyme may also remove 0450 Inherited deficiencies in mitochondrial and peroxi odd-numbered double bonds from unsaturated fatty acids Somal beta-Oxidation enzymes are associated with severe dis (Smeland, T. E. et al. (1992) Proc. Natl. Acad. Sci. USA eases, some of which manifest soon after birth and lead to 89:6673-6677). death within a few years. Mitochondrial beta-oxidation asso 0447 Rat 2,4-dienoyl-CoA reductase is located in both ciated deficiencies include, e.g., carnitine palmitoyl trans mitochondria and peroxisomes (Dommes, V. et al. (1981) J. ferase and carnitine deficiency, very-long-chain acyl-CoA Biol. Chem. 256:8259-8262). Two immunologically differ dehydrogenase deficiency, medium-chain acyl-CoA dehy ent forms of rat mitochondrial enzyme exist with molecular drogenase deficiency, short-chain acyl-CoA dehydrogenase masses of 60 kDa and 120 kDa (Hakkola, E. H. and J. K. deficiency, electron transport flavoprotein and electron trans Hiltunen (1993) Eur. J. Biochem. 215:199-204). The 120kDa port flavoprotein:ubiquinone oxidoreductase deficiency, tri mitochondrial rat enzyme is synthesized as a 335 amino acid functional protein deficiency, and short-chain 3-hydroxyacyl precursor with a 29 amino acid N-terminal leader peptide CoA dehydrogenase deficiency (Eaton et al., Supra). which is cleaved to form the mature enzyme (Hirose, A. et al. Mitochondrial trifunctional protein (including enoyl-CoA (1990) Biochim. Biophys. Acta 1049:346-349). A human hydratase) deficient patients have reduced long-chain enoyl mitochondrial enzyme 83% similar to rat enzyme is synthe CoA hydratase activities and suffer from non-ketotic sized as a 335 amino acid residue precursor with a 19 amino hypoglycemia, Sudden infant death syndrome, cardiomyopa acid N-terminal leader peptide (Koivuranta et al., Supra). thy, hepatic dysfunction, and muscle weakness, and may die These cloned human and rat mitochondrial enzymes function at an early age (Eaton et al., Supra). 
as homotetramers (Koivuranta et al., Supra). A Saccharomy 0451 Defects in mitochondrial beta-oxidation are associ ces cerevisiae peroxisomal 2,4-dienoyl-CoA reductase is 295 ated with Reye's syndrome, a disease characterized by amino acids long, contains a C-terminal peroxisomal target hepatic dysfunction and encephalopathy that sometimes fol ing signal, and functions as a homodimer (Coe, J. G. S. et al. lows viral infection in children. Reye's syndrome patients (1994) Mol. Gen. Genet. 244:661-672; and Gurvitz, A. et al. may have elevated serum levels of free fatty acids (Cotran, R. (1997) J. Biol. Chem. 272:22140-22147). All 2,4-dienoyl S. et al. (1994) Robbins Pathologic Basis of Disease, W.B. CoA reductases have a fairly well conserved NADPH binding Saunders Co., Philadelphia Pa., p. 866). Patients with mito site motif (Koivuranta et al., Supra). chondrial short-chain 3-hydroxyacyl-CoA dehydrogenase 0448. The main pathway beta-oxidation enzyme enoyl deficiency and medium-chain 3-hydroxyacyl-CoA dehydro CoA hydratase catalyzes the reaction: genase deficiency also exhibit Reye-like illnesses (Eaton et al., supra; and Egidio, R. J. et al. (1989) Am. Fam. Physician 2-trans-enoyl-CoA+H2O3-hydroxyacyl-CoA 39:221-226). 0449. This reaction hydrates the double bond between C-2 0452. Inherited conditions associated with peroxisomal and C-3 of 2-trans-enoyl-CoA, which is generated from Satu beta-Oxidation include Zellweger syndrome, neonatal adre rated and unsaturated fatty acids (Engel, C. K. et al. (1996) noleukodystrophy, infantile Refsum’s disease, acyl-CoA oxi EMBO J. 15:5135-5145). This step is downstream from the dase deficiency, peroxisomal thiolase deficiency, and bifunc step catalyzed by 2.4dienoyl-reductase. Different enoyl-CoA tional protein deficiency (Suzuki, Y. etal. (1994) Am. J. Hum. hydratases act on short-, medium-, and long-chain fatty acids Genet. 54:36-43; Hoefler et al., supra). Patients with peroxi (Eaton et al., Supra). 
Mitochondrial and peroxisomal enoyl Somal bifunctional enzyme deficiency, including that of CoA hydratases occur as both mono-functional enzymes and enoyl-CoA hydratase, Suffer from hypotonia, seizures, psy as part of multi-functional enzyme complexes. Human liver chomotor defects, and defective neuronal migration; accumu mitochondrial short-chain enoyl-CoA hydratase is synthe late very-long-chain fatty acids; and typically die within a few sized as a 290 amino acid precursor with a 29 amino acid years of birth (Watkins, P. A. et al. (1989) J. Clin. Invest. N-terminal leader peptide (Kanazawa, M. et al. (1993) 83:771-777). Enzyme Protein 47:9-13; and Janssen, U. et al. (1997) 0453 Peroxisomal beta-oxidation is impaired in cancer Genomics 40:470-475). Rat short-chain enoyl-CoA ous tissue. Although neoplastic human breast epithelial cells hydratase is 87% identical to the human sequence in the have the same number of peroxisomes as do normal cells, mature region of the protein and functions as a to homohex fatty acyl-CoA oxidase activity is lower than in control tissue amer (Kanazawa et al., Supra; and Engel et al., Supra). A (el Bouhtoury, F. et al. (1992) J. Pathol. 166:27-35). Human US 2013/0332.133 A1 Dec. 12, 2013 32 coloncarcinomas have fewer peroxisomes than normal colon diphosphate reductase activity levels and high rates of cell tissue and have lower fatty-acyl-CoA oxidase and bifunc proliferation (e.g., in hepatomas). This observation Suggests tional enzyme (including enoyl-CoA hydratase) activities that virus-encoded ribonucleotide diphosphate reductases, than normal tissue (Cable, S. et al. (1992) Virchows Arch. B and those present in cancer cells, are capable of maintaining Cell Pathol. Incl. Mol. Pathol. 62:221-226). 
an increased Supply deoxyribonucleotide pool for the produc 0454 6-phosphogluconate dehydrogenase (6-PGDH) tion of virus genomes or for the increased DNA synthesis catalyses the NADP-dependent oxidative decarboxylation which characterizes cancers cells. Ribonucleotide diphos of 6-phosphogluconate to ribulose 5-phosphate with the pro phate reductase is thus a target for therapeutic intervention duction of NADPH. The absence or inhibition of 6-PGDH (Nutter, L. M. and Y-C. Cheng (1984) Pharmac. Ther. results in the accumulation of 6-phosphogluconate to toxic 26:191-207; and Wright, J. A. (1983) Pharmac. Ther. 22:81– levels in eukaryotic cells. 6-PGDH is the third enzyme of the 102). pentose phosphate pathway (PPP) and is ubiquitous in nature. 0460 Dihydrodiol dehydrogenases (DD) are monomeric, In some heterofermentatative species, NAD+ is used as a NAD(P)-dependent, 34-37kDa enzymes responsible for the cofactor with the subsequent production of NADH. detoxification of trans-dihydrodiol and anti-diol epoxide 0455 The reaction proceeds through a 3-keto intermediate metabolites of polycyclic aromatic hydrocarbons (PAH) such which is decarboxylated to give the enol of ribulose 5-phos as benzo Cyrene, benz Canthracene, 7-methyl-benzCan phate, then converted to the keto product following tautomer thracene, 7,12-dimethyl-benz Canthracene, chrysene, and ization of the enol (Berdis A. J. and P. F. Cook (1993) Bio 5-methyl-chrysene. In mammalian cells, an environmental chemistry 32:2041-2046). 6-PGDH activity is regulated by PAH toxin such as benzo Cyrene is initially epoxidated by a the inhibitory effect of NADPH, and the activating effect of microsomal cytochrome P450 to yield 7R,8R-arene-oxide 6-phosphogluconate (Rippa, M. et al. (1998) Biochim. Bio and subsequently (-)-7R,8R-dihydrodiol ((-)-trans-7,8-di phys. Acta 1429:83-92). Deficiencies in 6-PGDH activity hydroxy-7,8-dihydrobenzo C. pyrene or (-)-trans-B CP have been linked to chronic hemolytic anemia. 
diol) This latter compound is further transformed to the anti 0456. The targeting of specific forms of 6-PGDH (e.g., diol epoxide of benzo C. pyrene (i.e., (...+-)-anti-7B,8C.- enzymes found in trypanosomes) has been suggested as a dihydroxy-9C., 10C.-epoxy-7,8,9,10-tetrahydrobenzolo. means for controlling parasitic infections (Tetaud, E. et al. pyrene), by the same enzyme or a different enzyme, (1999) Biochem. J. 338:55-60). For example, the Trypano depending on the species. This resulting anti-diol epoxide of Soma bruceii enzyme is markedly more sensitive to inhibition benzo Cyrene, or the corresponding derivative from another by the substrate analogue 6-phospho-2-deoxygluconate and PAH compound, is highly mutagenic. DD efficiently oxidizes the coenzyme analogue adenosine 2',5'-bisphosphate, com the precursor of the anti-diol epoxide (i.e., trans-dihydrodiol) pared to the mammalian enzyme (Hanau, S. et al. (1996) Eur. to transient catechols which auto-oxidize to quinones, also J. Biochem. 240:592-599). producing hydrogen peroxide and semiquinone radicals. This 0457 Ribonucleotide diphosphate reductase catalyzes the reaction prevents the formation of the highly carcinogenic reduction of ribonucleotide diphosphates (i.e., ADP, GDP. anti-diol. Anti-diols are not themselves substrates for DDyet CDP. and UDP) to their corresponding deoxyribonucleotide the addition of DD to a sample comprising an anti-diol com diphosphates (i.e., dADP, dGDP, dCDP, and dUDP) which pound results in a significant decrease in the induced muta are used for the synthesis of DNA. Ribonucleotide diphos tion rate observed in the Ames test. In this instance, DD is able phate reductase thereby performs a crucial role in the de novo to bind to and sequester the anti-diol, even though it is not synthesis of deoxynucleotide precursors. Deoxynucleotides oxidized. 
Whether through oxidation or sequestration, DD are also produced from deoxynucleosides by nucleoside plays an important role in the detoxification of metabolites of kinases via the Salvage pathway. xenobiotic polycyclic compounds (Penning, T. M. (1993) 0458 Mammalian ribonucleotide diphosphate reductase Chemico-Biological Interactions 89:1-34). comprises two components, an effector-binding component 0461) 15-oxoprostaglandin 13-reductase (PGR) and (E) and a non-heme iron component (F). Component E binds 15-hydroxyprostaglandin dehydrogenase (15-PGDH) are the nucleoside triphosphate effectors while component F con enzymes present in the lung that are responsible for degrading tains the iron radical necessary for catalysis. Molecular circulating prostaglandins. Oxidative catabolism via passage weight determinations of the E and F components, as well as through the pulmonary system is a common means of reduc the holoenzyme, vary according to the methods used in puri ing the concentration of circulating prostaglandins. fication of the proteins and the particular laboratory. Compo 15-PGDH oxidizes the 15-hydroxyl group of a variety of nent E is approximately 90-100kDa, component F is approxi prostaglandins to produce the corresponding 15-oxo com mately 100-120 kDa, and the holoenzyme is 200-250 kDa. pounds. The 15-oxo derivatives usually have reduced biologi 0459 Ribonucleotide diphosphate reductase activity is cal activity compared to the 15-hydroxyl molecule. PGR adversely effected by iron chelators, such as thiosemicarba further reduces the 13, 14 double bond of the 15-oxo com Zones, as well as EDTA. Deoxyribonucleotide diphosphates pound which typically leads to a further decrease in biologi also appear to be negative allosteric effectors of ribonucle cal activity. PGR is a monomer with a molecular weight of otide diphosphate reductase. Nucleotide triphosphates (both approximately 36 kDa. 
The enzyme requires NADH or ribo- and deoxyribo-) appear to stimulate the activity of the NADPH as a cofactor with a preference for NADH. The enzyme. 3-methyl-4-nitrophenol, a metabolite of widely used 15-oxo derivatives of prostaglandins PGE, PGE, and organophosphate pesticides, is a potent inhibitor of ribo PGEa, are all substrates for PGR: however, the non-deriva nucleotide diphosphate reductase in mammalian cells. Some tized prostaglandins (i.e., PGE, PG, and PGE.C.) are not evidence Suggests that ribonucleotide diphosphate reductase substrates (Ensor, C. M. et al. (1998) Biochem. J. 330:103 activity in DNA virus (e.g., herpesvirus)-infected cells and in 108). cancer cells is less sensitive to regulation by allosteric regu 0462) 15-PGDH and PGR also catalyze the metabolism of lators and a correlation exists between high ribonucleotide lipoxin A (LXA). Lipoxins (LX) are autacoids, lipids pro US 2013/0332.133 A1 Dec. 12, 2013

duced at the sites of localized inflammation, which down-regulate polymorphonuclear leukocyte (PMN) function and promote resolution of localized trauma. Lipoxin production is stimulated by the administration of aspirin in that cells displaying cyclooxygenase II (COX II) that has been acetylated by aspirin and cells that possess 5-lipoxygenase (5-LO) interact and produce lipoxin. 15-PGDH generates 15-oxo-LXA4, with PGR further converting the 15-oxo compound to 13,14-dihydro-15-oxo-LXA4 (Clish, C. B. et al. (2000) J. Biol. Chem. 275:25372-25380). This finding suggests a broad substrate specificity of the prostaglandin dehydrogenases and has implications for these enzymes in drug metabolism and as targets for therapeutic intervention to regulate inflammation.

[0463] The GMC (glucose-methanol-choline) oxidoreductase family of enzymes was defined based on sequence alignments of Drosophila melanogaster glucose dehydrogenase, Escherichia coli choline dehydrogenase, Aspergillus niger glucose oxidase, and Hansenula polymorpha methanol oxidase. Despite their different sources and substrate specificities, these four flavoproteins are homologous, being characterized by the presence of several distinctive sequence and structural features. Each molecule contains a canonical ADP-binding, beta-alpha-beta mononucleotide-binding motif close to the amino terminus. This fold comprises a four-stranded parallel beta-sheet sandwiched between a three-stranded antiparallel beta-sheet and alpha-helices. Nucleotides bind in similar positions relative to this chain fold (Cavener, D. R. (1992) J. Mol. Biol. 223:811-814; Wierenga, R. K. et al. (1986) J. Mol. Biol. 187:101-107). Members of the GMC oxidoreductase family also share a consensus sequence near the central region of the polypeptide. Additional members of the GMC oxidoreductase family include cholesterol oxidases from Brevibacterium sterolicum and Streptomyces; and an alcohol dehydrogenase from Pseudomonas oleovorans (Cavener, supra; Henikoff, S. and J. G. Henikoff (1994) Genomics 19:97-107; van Beilen, J. B. et al. (1992) Mol. Microbiol. 6:3121-3136).

[0464] IMP dehydrogenase and GMP reductase are two oxidoreductases which share many regions of sequence similarity. IMP dehydrogenase (EC 1.1.1.205) catalyzes the NAD-dependent oxidation of IMP (inosine monophosphate) into XMP (xanthosine monophosphate) as part of de novo GTP biosynthesis (Collart, F. R. and E. Huberman (1988) J. Biol. Chem. 263:15769-15772). GMP reductase catalyzes the NADPH-dependent reductive deamination of GMP into IMP, helping to maintain the intracellular balance of adenine and guanine nucleotides (Andrews, S. C. and J. R. Guest (1988) Biochem. J. 255:35-43).

[0465] Pyridine nucleotide-disulphide oxidoreductases are FAD flavoproteins involved in the transfer of reducing equivalents from FAD to a substrate. These flavoproteins contain a pair of redox-active cysteines contained within a consensus sequence which is characteristic of this protein family (Kuriyan, J. et al. (1991) Nature 352:172-174). Members of this family of oxidoreductases include glutathione reductase (EC 1.6.4.2); thioredoxin reductase of higher eukaryotes (EC 1.6.4.5); trypanothione reductase (EC 1.6.4.8); lipoamide dehydrogenase (EC 1.8.1.4), the E3 component of alpha-ketoacid dehydrogenase complexes; and mercuric reductase (EC 1.16.1.1).

Transferases

[0466] Transferases are enzymes that catalyze the transfer of molecular groups. The reaction may involve an oxidation, reduction, or cleavage of covalent bonds, and is often specific to a substrate or to particular sites on a type of substrate. Transferases participate in reactions essential to such functions as synthesis and degradation of cell components, and regulation of cell functions including cell signaling, cell proliferation, inflammation, apoptosis, secretion and excretion. Transferases are involved in key steps in disease processes involving these functions. Transferases are frequently classified according to the type of group transferred. For example, methyl transferases transfer one-carbon methyl groups, amino transferases transfer nitrogenous amino groups, and similarly denominated enzymes transfer aldehyde or ketone, acyl, glycosyl, alkyl or aryl, isoprenyl, saccharyl, phosphorus-containing, sulfur-containing, or selenium-containing groups, as well as small enzymatic groups such as Coenzyme A.

[0467] Acyl transferases include peroxisomal carnitine octanoyl transferase, which is involved in the fatty acid beta-oxidation pathway, and mitochondrial carnitine palmitoyl transferases, involved in fatty acid metabolism and transport. Choline O-acetyltransferase catalyzes the biosynthesis of the neurotransmitter acetylcholine. N-acyltransferase enzymes catalyze the transfer of an amino acid conjugate to an activated carboxylic group. Endogenous compounds and xenobiotics are activated by acyl-CoA synthetases in the cytosol, microsomes, and mitochondria. The acyl-CoA intermediates are then conjugated with an amino acid (typically glycine, glutamine, or taurine, but also ornithine, arginine, histidine, serine, aspartic acid, and several dipeptides) by N-acyltransferases in the cytosol or mitochondria to form a metabolite with an amide bond. One well-characterized enzyme of this class is the bile acid-CoA:amino acid N-acyltransferase (BAT) responsible for generating the bile acid conjugates which serve as detergents in the gastrointestinal tract (Falany, C. N. et al. (1994) J. Biol. Chem. 269:19375-19379; Johnson, M. R. et al. (1991) J. Biol. Chem. 266:10227-10233). BAT is also useful as a predictive indicator for prognosis of hepatocellular carcinoma patients after partial hepatectomy (Furutani, M. et al. (1996) Hepatology 24:1441-1445).

Acetyltransferases

[0468] Acetyltransferases have been extensively studied for their role in histone acetylation. Histone acetylation results in the relaxing of the chromatin structure in eukaryotic cells, allowing transcription factors to gain access to promoter elements of the DNA templates in the affected region of the genome (or the genome in general). In contrast, histone deacetylation results in a reduction in transcription by closing the chromatin structure and limiting access of transcription factors. To this end, a common means of stimulating cell transcription is the use of chemical agents that inhibit the deacetylation of histones (e.g., sodium butyrate), resulting in a global (albeit artifactual) increase in gene expression. The modulation of gene expression by acetylation also results from the acetylation of other proteins, including but not limited to, p53, GATA-1, MyoD, ACTR, TFIIE, TFIIF and the high mobility group proteins (HMG). In the case of p53, acetylation results in increased DNA binding, leading to the stimulation of transcription of genes regulated by p53. The prototypic histone acetylase (HAT) is Gcn5 from Saccharomyces cerevisiae. Gcn5 is a member of a family of acetylases that includes Tetrahymena p55, human Gcn5, and human p300/CBP. Histone acetylation is reviewed in Cheung, W. L. et al. (2000) Curr. Opin. Cell Biol. 12:326-333 and Berger, S. L. (1999) Curr. Opin. Cell Biol. 11:336-341. Some acetyltransferase enzymes possess the alpha/beta hydrolase fold (Center of Applied Molecular Engineering, Inst. of Chemistry and Biochemistry, University of Salzburg, http://predict.sanger.ac.uk/irbm-course97/Docs/ms/) common to several other major classes of enzymes (Structural Classification of Proteins, http://scop.mrc-lmb.cam.ac.uk/scop/index.html).

[0469] N-acetyltransferases are cytosolic enzymes which utilize the cofactor acetyl-coenzyme A (acetyl-CoA) to transfer the acetyl group to aromatic amines and hydrazine-containing compounds. In humans, there are two highly similar N-acetyltransferase enzymes, NAT1 and NAT2; mice appear to have a third form of the enzyme, NAT3. The human forms of N-acetyltransferase have independent regulation (NAT1 is widely expressed, whereas NAT2 is in liver and gut only) and overlapping substrate preferences. Both enzymes appear to accept most substrates to some extent, but NAT1 does prefer some substrates (para-aminobenzoic acid, para-aminosalicylic acid, sulfamethoxazole, and sulfanilamide), while NAT2 prefers others (isoniazid, hydralazine, procainamide, dapsone, aminoglutethimide, and sulfamethazine). A recently isolated human gene, tubedown-1, is homologous to the yeast NAT-1 N-acetyltransferases and encodes a protein associated with acetyltransferase activity. The expression patterns of tubedown-1 suggest that it may be involved in regulating vascular and hematopoietic development (Gendron, R. L. et al. (2000) Dev. Dyn. 218:300-315).

[0470] Amino transferases comprise a family of pyridoxal 5'-phosphate (PLP)-dependent enzymes that catalyze transformations of amino acids. Amino transferases play key roles in protein synthesis and degradation, and they contribute to other processes as well. For example, GABA aminotransferase (GABA-T) catalyzes the degradation of GABA, the major inhibitory amino acid neurotransmitter. The activity of GABA-T is correlated to neuropsychiatric disorders such as alcoholism, epilepsy, and Alzheimer's disease (Sherif, F. M. and S. S. Ahmed (1995) Clin. Biochem. 28:145-154). Other members of the family include pyruvate aminotransferase, branched-chain amino acid aminotransferase, tyrosine aminotransferase, aromatic aminotransferase, alanine:glyoxylate aminotransferase (AGT), and kynurenine aminotransferase (Vacca, R. A. et al. (1997) J. Biol. Chem. 272:21932-21937). Kynurenine aminotransferase catalyzes the irreversible transamination of the L-tryptophan metabolite L-kynurenine to form kynurenic acid. The enzyme may also catalyze the reversible transamination reaction between L-2-aminoadipate and 2-oxoglutarate to produce 2-oxoadipate and L-glutamate. Kynurenic acid is a putative modulator of glutamatergic neurotransmission; thus a deficiency in kynurenine aminotransferase may be associated with pleiotropic effects (Buchli, R. et al. (1995) J. Biol. Chem. 270:29330-29335).

[0471] Glycosyltransferases include the mammalian UDP-glucuronosyl transferases, a family of membrane-bound microsomal enzymes catalyzing the transfer of glucuronic acid to lipophilic substrates in reactions that play important roles in detoxification and excretion of drugs, carcinogens, and other foreign substances. Another mammalian glycosyltransferase, mammalian UDP-galactose-ceramide galactosyl transferase, catalyzes the transfer of galactose to ceramide in the synthesis of galactocerebrosides in myelin membranes of the nervous system. The UDP-glycosyl transferases share a conserved signature domain of about 50 amino acid residues (PROSITE: PDOC00359, http://expasy.hcuge.ch/sprot/prosite.html).

[0472] Methyl transferases are involved in a variety of pharmacologically important processes. Nicotinamide N-methyl transferase catalyzes the N-methylation of nicotinamides and other pyridines, an important step in the cellular handling of drugs and other foreign compounds. Phenylethanolamine N-methyl transferase catalyzes the conversion of noradrenalin to adrenalin. 6-O-methylguanine-DNA methyl transferase reverses DNA methylation, an important step in carcinogenesis. Uroporphyrin-III C-methyl transferase, which catalyzes the transfer of two methyl groups from S-adenosyl-L-methionine to uroporphyrinogen III, is the first specific enzyme in the biosynthesis of cobalamin, a dietary cofactor (vitamin B12) whose uptake is deficient in pernicious anemia. Protein-arginine methyl transferases catalyze the posttranslational methylation of arginine residues in proteins, resulting in the mono- and dimethylation of arginine on the guanidino group. Substrates include histones, myelin basic protein, and heterogeneous nuclear ribonucleoproteins involved in mRNA processing, splicing, and transport. Protein-arginine methyl transferase interacts with proteins upregulated by mitogens, with proteins involved in chronic lymphocytic leukemia, and with interferon, suggesting an important role for methylation in cytokine receptor signaling (Lin, W.-J. et al. (1996) J. Biol. Chem. 271:15034-15044; Abramovich, C. et al. (1997) EMBO J. 16:260-266; and Scott, H. S. et al. (1998) Genomics 48:330-340).

[0473] Phospho transferases catalyze the transfer of high-energy phosphate groups and are important in energy-requiring and -releasing reactions. The metabolic enzyme creatine kinase catalyzes the reversible phosphate transfer between creatine/creatine phosphate and ATP/ADP. Glycocyamine kinase catalyzes phosphate transfer from ATP to guanidoacetate, and arginine kinase catalyzes phosphate transfer from ATP to arginine. A cysteine-containing active site is conserved in this family (PROSITE: PDOC00103).

[0474] Prenyl transferases are heterodimers, consisting of an alpha and a beta subunit, that catalyze the transfer of an isoprenyl group. The Ras farnesyltransferase (FTase) enzyme transfers a farnesyl moiety from cytosolic farnesylpyrophosphate to a cysteine residue at the carboxyl terminus of the Ras oncogene protein. This modification is required to anchor Ras to the cell membrane so that it can perform its role in signal transduction. FTase inhibitors block Ras function and demonstrate antitumor activity (Buolamwini, J. K. (1999) Curr. Opin. Chem. Biol. 3:500-509). Geranylgeranyl transferase, or Rab GG transferase, which shares structural similarity with FTase, prenylates Rab proteins, allowing them to perform their roles in regulating vesicle transport (Seabra, M. C. (1996) J. Biol. Chem. 271:14398-14404).

[0475] Saccharyl transferases are glycating enzymes involved in a variety of metabolic processes. Oligosaccharyl transferase-48, for example, is a receptor for advanced glycation endproducts, which accumulate in vascular complications of diabetes, macrovascular disease, renal insufficiency, and Alzheimer's disease (Thornalley, P. J. (1998) Cell Mol. Biol. (Noisy-Le-Grand) 44:1013-1023).
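The three phosphate transfers named in the phospho transferase paragraph can be written out as reaction schemes. This is only an illustrative sketch: the product names (phosphocreatine, phosphoguanidinoacetate, N-phospho-L-arginine) follow standard biochemical usage and are not given in the text above, and the reversible arrows for the second and third reactions are likewise an assumption, since the text states reversibility only for creatine kinase.

```latex
\begin{align*}
\text{creatine} + \text{ATP} &\rightleftharpoons \text{phosphocreatine} + \text{ADP}
  && \text{(creatine kinase)} \\
\text{guanidinoacetate} + \text{ATP} &\rightleftharpoons \text{phosphoguanidinoacetate} + \text{ADP}
  && \text{(glycocyamine kinase)} \\
\text{L-arginine} + \text{ATP} &\rightleftharpoons \text{N-phospho-L-arginine} + \text{ADP}
  && \text{(arginine kinase)}
\end{align*}
```

In each case the guanidino group of the acceptor receives the terminal phosphate of ATP, which is consistent with the three enzymes being grouped in one family.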

[0476] Coenzyme A (CoA) transferase catalyzes the transfer of CoA between two carboxylic acids. Succinyl CoA:3-oxoacid CoA transferase, for example, transfers CoA from succinyl-CoA to a recipient such as acetoacetate. Acetoacetate is essential to the metabolism of ketone bodies, which accumulate in tissues affected by metabolic disorders such as diabetes (PROSITE: PDOC00980).

[0477] Transglutaminase transferases (Tgases) are Ca2+-dependent enzymes capable of forming isopeptide bonds by catalyzing the transfer of the γ-carboxy group from protein-bound glutamine to the ε-amino group of protein-bound lysine residues or other primary amines. Tgases are the enzymes responsible for the cross-linking of cornified envelope (CE), the highly insoluble protein structure on the surface of corneocytes, into a chemically and mechanically resistant protein polymer. Seven known human Tgases have been identified. Individual transglutaminase gene products are specialized in the cross-linking of specific proteins or tissue structures, such as factor XIIIa which stabilizes the fibrin clot in hemostasis, prostate transglutaminase which functions in semen coagulation, and tissue transglutaminase which is involved in GTP-binding in receptor signaling. Four (Tgases 1, 2, 3, and X) are expressed in terminally differentiating epithelia such as the epidermis. Tgases are critical for the proper cross-linking of the CE, as seen in the pathology of patients suffering from one form of the skin diseases referred to as congenital ichthyosis, which has been linked to mutations in the keratinocyte transglutaminase (TG) gene (Nemes, Z. et al. (1999) Proc. Natl. Acad. Sci. U.S.A. 96:8402-8407; Aeschlimann, D. et al. (1998) J. Biol. Chem. 273:3452-3460).

Hydrolases

[0478] Hydrolases are a class of enzymes that catalyze the cleavage of various covalent bonds in a substrate by the introduction of a molecule of water. The reaction involves a nucleophilic attack by the water molecule's oxygen atom on a target bond in the substrate. The water molecule is split across the target bond, breaking the bond and generating two product molecules. Hydrolases participate in reactions essential to such functions as synthesis and degradation of cell components, and for regulation of cell functions including cell signaling, cell proliferation, inflammation, apoptosis, secretion and excretion. Hydrolases are involved in key steps in disease processes involving these functions. Hydrolytic enzymes, or hydrolases, may be grouped by substrate specificity into classes including phosphatases, peptidases, lysophospholipases, phosphodiesterases, glycosidases, glyoxalases, aminohydrolases, carboxylesterases, sulfatases, phosphohydrolases, nucleotidases, lysozymes, and many others.

[0479] Phosphatases hydrolytically remove phosphate groups from proteins, an energy-providing step that regulates many cellular processes, including intracellular signaling pathways that in turn control cell growth and differentiation, cell-cell contact, the cell cycle, and oncogenesis.

[0480] Peptidases, also called proteases, cleave peptide bonds that form the backbone of peptide or protein chains. Proteolytic processing is essential to cell growth, differentiation, remodeling, and homeostasis as well as inflammation and the immune response. Since typical protein half-lives range from hours to a few days, peptidases are continually cleaving precursor proteins to their active form, removing signal sequences from targeted proteins, and degrading aged or defective proteins. Peptidases function in bacterial, parasitic, and viral invasion and replication within a host. Examples of peptidases include trypsin and chymotrypsin (components of the complement cascade and the blood-clotting cascade), lysosomal cathepsins, calpains, pepsin, renin, and chymosin (Beynon, R. J. and J. S. Bond (1994) Proteolytic Enzymes: A Practical Approach, Oxford University Press, New York, N.Y., pp. 1-5). Lysophospholipases (LPLs) regulate intracellular lipids by catalyzing the hydrolysis of ester bonds to remove an acyl group, a key step in lipid degradation. Small LPL isoforms, approximately 15-30 kD, function as hydrolases; larger isoforms function both as hydrolases and transacylases. A particular substrate for LPLs, lysophosphatidylcholine, causes lysis of cell membranes. LPL activity is regulated by signaling molecules important in numerous pathways, including the inflammatory response.

[0481] The phosphodiesterases catalyze the hydrolysis of one of the two ester bonds in a phosphodiester compound. Phosphodiesterases are therefore crucial to a variety of cellular processes. Phosphodiesterases include DNA and RNA endo- and exo-nucleases, which are essential to cell growth and replication as well as protein synthesis. Endonuclease V (deoxyinosine 3'-endonuclease) is an example of a type II site-specific deoxyribonuclease, a putative DNA repair enzyme that cleaves DNAs containing hypoxanthine, uracil, or mismatched bases. Escherichia coli endonuclease V has been shown to cleave DNA containing deoxyxanthosine at the second phosphodiester bond 3' to deoxyxanthosine, generating a 3'-hydroxyl and a 5'-phosphoryl group at the nick site (He, B. et al. (2000) Mutat. Res. 459:109-114). It has been suggested that Escherichia coli endonuclease V plays a role in the removal of deaminated guanine, i.e., xanthine, from DNA, thus helping to protect the cell against the mutagenic effects of nitrosative deamination (Schouten, K. A. and B. Weiss (1999) Mutat. Res. 435:245-254). In eukaryotes, the process of tRNA splicing requires the removal of small tRNA introns that interrupt the anticodon loop 1 base 3' to the anticodon. This process requires the stepwise action of an endonuclease, a ligase, and a phosphotransferase (Hong, L. et al. (1998) Science 280:279-284). Ribonuclease P (RNase P) is a ubiquitous RNA processing endonuclease that is required for generating the mature tRNA 5'-end during the tRNA splicing process. This is accomplished through the catalysis of the cleavage of P-3'O bonds to produce 5'-phosphate and 3'-hydroxyl end groups at a specific site on pre-tRNA. Catalysis by RNase P is absolutely dependent on divalent cations such as Mg2+ or Mn2+ (Kurz, J. C. et al. (2000) Curr. Opin. Chem. Biol. 4:553-558). Substrate recognition mechanisms of RNase P are well conserved among eukaryotes and bacteria (Fabbri, S. et al. (1998) Science 280:284-286). In Saccharomyces cerevisiae, POP1 (processing of precursor RNAs) encodes a protein component of both RNase P and RNase MRP, another RNA processing protein. Mutations in yeast POP1 are lethal (Lygerou, Z. et al. (1994) Genes Dev. 8:1423-1433). Another phosphodiesterase, acid sphingomyelinase, hydrolyzes the membrane phospholipid sphingomyelin to ceramide and phosphorylcholine. Phosphorylcholine functions in synthesis of phosphatidylcholine, which is involved in intracellular signaling pathways. Ceramide is an essential precursor for the generation of gangliosides, membrane lipids found in high concentration in neural tissue. Defective acid sphingomyelinase phosphodiesterase leads to Niemann-Pick disease.
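The general hydrolase reaction described above, in which a water molecule is split across the target bond, can be summarized schematically. The generic substrate A-B is a placeholder introduced here for illustration; the second line restates the acid sphingomyelinase reaction from the phosphodiesterase paragraph.

```latex
\begin{align*}
\text{A-B} + \text{H}_2\text{O} &\longrightarrow \text{A-OH} + \text{B-H}
  && \text{(generic hydrolysis)} \\
\text{sphingomyelin} + \text{H}_2\text{O} &\longrightarrow \text{ceramide} + \text{phosphorylcholine}
  && \text{(acid sphingomyelinase)}
\end{align*}
```

The hydroxyl of water ends up on one fragment and the proton on the other, which is why each hydrolysis yields exactly two product molecules.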

[0482] Glycosidases catalyze the cleavage of hemiacetal bonds of glycosides, which are compounds that contain one or more sugars. Mammalian lactase-phlorizin hydrolase, for example, is an intestinal enzyme that splits lactose. Mammalian beta-galactosidase removes the terminal galactose from gangliosides, glycoproteins, and glycosaminoglycans, and deficiency of this enzyme is associated with a gangliosidosis known as Morquio disease type B (PROSITE PDOC00910). Vertebrate lysosomal alpha-glucosidase, which hydrolyzes glycogen, maltose, and isomaltose, and vertebrate intestinal sucrase-isomaltase, which hydrolyzes sucrose, maltose, and isomaltose, are widely distributed members of this family with highly conserved sequences at their active sites.

[0483] The glyoxalase system is involved in gluconeogenesis, the production of glucose from storage compounds in the body. It consists of glyoxalase I, which catalyzes the formation of S-D-lactoylglutathione from methylglyoxal, a side product of triose-phosphate energy metabolism, and glyoxalase II, which hydrolyzes S-D-lactoylglutathione to D-lactic acid and reduced glutathione. Glyoxalases are involved in hyperglycemia, non-insulin-dependent diabetes mellitus, the detoxification of bacterial toxins, and in the control of cell proliferation and microtubule assembly. NG,NG-dimethylarginine dimethylaminohydrolase (DDAH) is an enzyme that hydrolyzes the endogenous nitric oxide synthase (NOS) inhibitors, NG-monomethyl-arginine and NG,NG-dimethyl-L-arginine, to L-citrulline. Inhibiting DDAH can cause increased intracellular concentration of NOS inhibitors to levels sufficient to inhibit NOS. Therefore, DDAH inhibition may provide a method of NOS inhibition, and changes in the activity of DDAH could play a role in pathophysiological alterations in nitric oxide generation (MacAllister, R. J. et al. (1996) Br. J. Pharmacol. 119:1533-1540). DDAH was found in neurons displaying cytoskeletal abnormalities and oxidative stress in Alzheimer's disease. In age-matched control cases, DDAH was not found in neurons. This suggests that oxidative stress- and nitric oxide-mediated events play a role in the pathogenesis of Alzheimer's disease (Smith, M. A. et al. (1998) Free Rad. Biol. Med. 25:898-902).

[0484] Acyl-CoA thioesterase is another member of the carboxylesterase family (Alexson, S. E. et al. (1993) Eur. J. Biochem. 214:719-727). Evidence suggests that acyl-CoA thioesterase has a regulatory role in steroidogenic tissues (Finkielstein, C. et al. (1998) Eur. J. Biochem. 256:60-66).

[0485] The alpha/beta hydrolase protein fold is common to several hydrolases of diverse phylogenetic origin and catalytic function. Enzymes with the alpha/beta hydrolase fold have a common core structure consisting of eight beta-sheets connected by alpha-helices. The most conserved structural feature of this fold is the loops of the nucleophile-histidine-acid catalytic triad. The histidine in the catalytic triad is completely conserved, while the nucleophile and acid loops accommodate more than one type of amino acid (Ollis, D. L. et al. (1992) Protein Eng. 5:197-211).

[0486] Sulfatases are members of a highly conserved gene family that share extensive sequence homology and a high degree of structural similarity. Sulfatases catalyze the cleavage of sulfate esters. To perform this function, sulfatases undergo a unique post-translational modification in the endoplasmic reticulum that involves the oxidation of a conserved cysteine residue. A human disorder called multiple sulfatase deficiency is due to a defect in this post-translational modification step, leading to inactive sulfatases (Recksiek, M. et al. (1998) J. Biol. Chem. 273:6096-6103). Phosphohydrolases are enzymes that hydrolyze phosphate esters. Some phosphohydrolases contain a mutT domain signature sequence. MutT is a protein involved in the GO system responsible for removing an oxidatively damaged form of guanine from DNA. A region of about 40 amino acid residues, found in the N-terminus of mutT, is also found in other proteins, including some phosphohydrolases (PROSITE PDOC00695).

[0487] Serine hydrolases are a large functional class of hydrolytic enzymes that contain a serine residue in their active site. This class of enzymes contains proteinases, esterases, and lipases which hydrolyze a variety of substrates and, therefore, have different biological roles. Proteins in this superfamily can be further grouped into subfamilies based on substrate specificity or amino acid similarities (Puente, X. S. and C. Lopez-Otin (1995) J. Biol. Chem. 270:12926-12932).

[0488] Neuropathy target esterase (NTE) is an integral membrane protein present in all neurons and in some non-neural-cell types of vertebrates. NTE is involved in a cell signaling pathway controlling interactions between neurons and accessory glial cells in the developing nervous system. NTE has serine esterase activity and efficiently catalyzes the hydrolysis of phenyl valerate (PV) in vitro, but its physiological substrate is unknown. NTE is related neither to the major serine esterase family, which includes acetylcholinesterase, nor to any other known serine hydrolases. NTE contains at least two functional domains: an N-terminal putative regulatory domain and a C-terminal effector domain which contains the esterase activity and is, in part, conserved in proteins found in bacteria, yeast, nematodes and insects. NTE's effector domain contains three predicted transmembrane segments, and the active-site serine residue lies at the center of one of these segments. The isolated recombinant domain shows PV hydrolase activity only when incorporated into phospholipid liposomes. NTE's esterase activity is largely redundant in adult vertebrates, but organophosphates which react with NTE in vivo initiate unknown events which lead to a neuropathy with degeneration of long axons. These neuropathic organophosphates leave a negatively charged group covalently attached to the active-site serine residue, which causes a toxic gain of function in NTE (Glynn, P. (1999) Biochem. J. 344:625-631). Further, the Drosophila neurodegeneration gene swiss-cheese encodes a neuronal protein involved in glia-neuron interaction and is homologous to the above human NTE (Moser, M. et al. (2000) Mech. Dev. 90:279-282).

[0489] Chitinases are chitin-degrading enzymes present in a variety of organisms and participate in processes including cell wall remodeling, defense and catabolism. Chitinase activity has been found in human serum, leukocytes, granulocytes, and in association with fertilized oocytes in mammals (Escott, G. M. (1995) Infect. Immun. 63:4770-4773; DeSouza, M. M. (1995) Endocrinology 136:2485-2496). Glycolytic and proteolytic molecules in humans are associated with tissue damage in lung diseases and with increased tumorigenicity and metastatic potential of cancers (Mulligan, M. S. (1993) Proc. Natl. Acad. Sci. 90:11523-11527; Matrisian, L. M. (1991) Am. J. Med. Sci. 302:157-162; Witty, J. P. (1994) Cancer Res. 54:4805-4812). The discovery of a human enzyme with chitinolytic activity is noteworthy given the lack of endogenous chitin in the human body (Raghavan, N. (1994) Infect. Immun. 62:1901-1908). However, there is a group of mammalian proteins that share homology with chitinases from various non-mammalian organisms, such as bacteria, fungi, plants, and insects. The members of this family differ in their ability to hydrolyze chitin or chitin-like substrates. Some of the mammalian members of the family, such as a bovine whey chitotriosidase and human cartilage proteins which do not demonstrate specific chitinolytic activity, are expressed in association with tissue remodeling events (Rejman, J. J. (1988) Biochem. Biophys. Res. Commun. 150:329-334; Nyirkos, P. (1990) Biochem. J. 268:265-268). Elevated levels of human cartilage proteins have been reported in the synovial fluid and cartilage of patients with rheumatoid arthritis, a disease which produces a severe degradation of the cartilage and a proliferation of the synovial membrane in the affected joints (Hakala, B. E. (1993) J. Biol. Chem. 268:25803-25810).

[0490] A small subclass of hydrolases acting on ether bonds includes the thioether hydrolases. S-adenosyl-L-homocysteine hydrolase, also known as AdoHcyase or SAHH (PROSITE PDOC00603; EC 3.3.1.1), is a thioether hydrolase first described in rat liver extracts as the activity responsible for the reversible hydrolysis of S-adenosyl-L-homocysteine (AdoHcy) to adenosine and homocysteine (Sganga, M. W. et al. (1992) PNAS 89:6328-6332). SAHH is a cytosolic enzyme that has been found in all cells that have been tested, with the exception of Escherichia coli and certain related bacteria (Walker, R. D. et al. (1975) Can. J. Biochem. 53:312-319; Shimizu, S. et al. (1988) FEMS Microbiol. Lett. 51:177-180; Shimizu, S. et al. (1984) Eur. J. Biochem. 141:385-392). SAHH activity is dependent on NAD+ as a cofactor. Deficiency of SAHH is associated with hypermethioninemia (Online Mendelian Inheritance in Man (OMIM) #180960 Hypermethioninemia), a pathologic condition characterized by neonatal cholestasis, failure to thrive, mental and motor retardation, facial dysmorphism with abnormal hair and teeth, and myocardiopathy (Labrune, P. et al. (1990) J. Pediat. 117:220-226).

[0491] Another subclass of hydrolases includes those enzymes which act on carbon-nitrogen (C-N) bonds other than peptide bonds. To this subclass belong those enzymes hydrolyzing amides, amidines, and other C-N bonds. This subclass is further subdivided on the basis of substrate specificity such as linear amides, cyclic amides, linear amidines, cyclic amidines, nitriles and other compounds. A hydrolase belonging to the sub-subclass of enzymes acting on the cyclic amidines is adenosine deaminase (ADA). ADA catalyzes the breakdown of adenosine to inosine. ADA is present in many mammalian tissues, including placenta, muscle, lung, stomach, digestive diverticulum, spleen, erythrocytes, thymus, seminal plasma, thyroid, T-cells, bone marrow stem cells, and liver. A subclass of ADAs, ADAR, act on RNA and are classified as RNA editases. An ADAR from Drosophila, dADAR, expressed in the developing nervous system, may act on para voltage-gated Na+ channel transcripts in the central nervous system (Palladino, M. J. et al. (2000) RNA 6:1004-1018). ADA deficiency causes profound lymphopenia with severe combined immunodeficiency (SCID). Cells from patients with ADA deficiency contain low, sometimes undetectable, amounts of ADA catalytic activity and ADA protein. ADA deficiency stems from genetic mutations in the ADA gene (Hershfield, M. S. (1998) Semin. Hematol. 4:291-298). Metabolic consequences of ADA deficiency are associated with defects in alveogenesis, pulmonary inflammation, and airway obstruction (Blackburn, M. R. et al. (2000) J. Exp. Med. 192:159-170).

[0492] Pancreatic ribonucleases (RNase) are pyrimidine-specific endonucleases found in high quantity in the pancreas of certain mammalian taxa and of some reptiles (Beintema, J. J. et al. (1988) Prog. Biophys. Mol. Biol. 51:165-192). Proteins in the mammalian pancreatic RNase superfamily are noncytosolic endonucleases that degrade RNA through a two-step transphosphorolytic-hydrolytic reaction (Beintema, J. J. et al. (1986) Mol. Biol. Evol. 3:262-275). Specifically, the enzymes are involved in endonucleolytic cleavage of 3'-phosphomononucleotides and 3'-phosphooligonucleotides ending in C-P or U-P with 2',3'-cyclic phosphate intermediates. Ribonucleases can unwind the DNA helix by complexing with single-stranded DNA; the complex arises by an extended multi-site cation-anion interaction between lysine and arginine residues of the enzyme and phosphate groups of the nucleotides. Some of the enzymes belonging to this family appear to play a purely digestive role, whereas others exhibit potent and unusual biological activities (D’Alessio, G. (1993) Trends Cell Biol. 3:106-109). Proteins belonging to the pancreatic RNase family include: bovine seminal vesicle and brain ribonucleases; kidney non-secretory ribonucleases (Beintema, J. J. et al. (1986) FEBS Lett. 194:338-343); liver-type ribonucleases (Rosenberg, H. F. et al. (1989) PNAS U.S.A. 86:4460-4464); angiogenin, which induces vascularisation of normal and malignant tissues; eosinophil cationic protein (Hofsteenge, J. et al. (1989) Biochemistry 28:9806-9813), a cytotoxin and helminthotoxin with ribonuclease activity; and frog liver ribonuclease and frog sialic acid-binding lectin. The sequences of pancreatic RNases contain 4 conserved disulfide bonds and 3 amino acid residues involved in the catalytic activity.

[0493] ADP-ribosylation is a reversible post-translational protein modification in which an ADP-ribose moiety is transferred from β-NAD to a target amino acid such as arginine or cysteine. ADP-ribosylarginine hydrolases regenerate arginine by removing ADP-ribose from the protein, completing the ADP-ribosylation cycle (Moss, J. et al. (1997) Adv. Exp. Med. Biol. 419:25-33). ADP-ribosylation is a well-known reaction among bacterial toxins. Cholera toxin, for example, disrupts the adenylyl cyclase system by ADP-ribosylating the α-subunit of the stimulatory G-protein, causing an increase in intracellular cAMP (Moss, J. and M. Vaughan (Eds) (1990) ADP-ribosylating Toxins and G-Proteins: Insights into Signal Transduction, American Society for Microbiology, Washington, D.C.). ADP-ribosylation may also have a regulatory function in eukaryotes, affecting such processes as cytoskeletal assembly (Zhou, H. et al. (1996) Arch. Biochem. Biophys. 334:214-222) and cell proliferation in cytotoxic T-cells (Wang, J. et al. (1996) J. Immunol. 156:2819-2827). Nucleotidases catalyze the formation of free nucleosides from nucleotides. The cytosolic nucleotidase cN-I (5' nucleotidase-I) cloned from pigeon heart catalyzes the formation of adenosine from AMP generated during ATP hydrolysis (Sala-Newby, G. B. et al. (1999) J. Biol. Chem. 274:17789-17793). Increased adenosine concentration is thought to be a signal of metabolic stress, and adenosine receptors mediate effects including vasodilation, decreased stimulatory neuron firing and ischemic preconditioning in the heart (Schrader, J. (1990) Circulation 81:389-391; Rubino, A. et al. (1992) Eur. J. Pharmacol. 220:95-98; de Jong, J. W. et al. (2000) Pharmacol. Ther. 87:141-149). Deficiency of pyrimidine 5'-nucleotidase can result in hereditary hemolytic anemia (OMIM #266120). The lysozyme c superfamily consists of conventional lysozymes c, calcium-binding lysozymes c, and α-lactalbumin (Prager, E. M. and P. Jolles (1996) EXS 75:9-31). The proteins in this superfamily have 35-40% sequence homology

and share a common three-dimensional fold, but can have different functions. Lysozymes c are ubiquitous in a variety of tissues and secretions and can lyse the cell walls of certain bacteria (McKenzie, H. A. (1996) EXS 75:365-409). Alpha-lactalbumin is a metallo-protein that binds calcium and participates in the synthesis of lactose (Iyer, L. K. and P. K. Qasba (1999) Protein Eng. 12:129-139). Alpha-lactalbumin occurs in mammalian milk and colostrum (McKenzie, supra).

[0494] Lysozymes catalyze the hydrolysis of certain mucopolysaccharides of bacterial cell walls, specifically, the beta (1-4) glycosidic linkages between N-acetylmuramic acid and N-acetylglucosamine, and cause bacterial lysis. Lysozymes occur in diverse organisms including viruses, birds, and mammals. In humans, lysozymes are found in spleen, lung, kidney, white blood cells, plasma, saliva, milk, tears, and cartilage (OMIM #153450 Lysozyme; Weaver, L. H. et al. (1985) J. Mol. Biol. 184:739-741). Lysozyme c functions in ruminants as a digestive enzyme, releasing proteins from ingested bacterial cells, and may perform the same function in human newborns (Braun, O. H. et al. (1995) Klin. Pediatr. 207:4-7).

[0495] The two known forms of lysozymes, chicken-type and goose-type, were originally isolated from chicken and goose egg white, respectively. Chicken-type and goose-type lysozymes have similar three-dimensional structures, but different amino acid sequences (Nakano, T. and T. Graf (1991) Biochim. Biophys. Acta 1090:273-276). In chickens, both forms of lysozyme are found in neutrophil granulocytes (heterophils), but only chicken-type lysozyme is found in egg white. Generally, chicken-type lysozyme mRNA is found in both adherent monocytes and macrophages and nonadherent promyelocytes and granulocytes as well as in cells of the bone marrow, spleen, bursa, and oviduct. Goose-type lysozyme mRNA is found in non-adherent cells of the bone marrow and lung. Several isozymes have been found in rabbits, including leukocytic, gastrointestinal, and possibly lymphoepithelial forms (OMIM #153450, supra; Nakano and Graf, supra; and GenBank GI 1310929). A human lysozyme gene encoding a protein similar to chicken-type lysozyme has been cloned (Yoshimura, K. et al. (1988) Biochem. Biophys. Res. Commun. 150:794-801). A consensus motif featuring regularly spaced cysteine residues has been derived from the lysozyme c enzymes of various species (PROSITE PS00128). Lysozyme c shares about 40% amino acid sequence identity with α-lactalbumin.

[0496] Lysozymes have several disease associations. Lysozymuria is observed in diabetic nephropathy (Shima, M. et al. (1986) Clin. Chem. 32:1818-1822), endemic nephropathy (Bruckner, I. et al. (1978) Med. Interne. 16:117-125), urinary tract infections (Heidegger, H. (1990) Minerva Ginecol. 42:243-250), and acute monocytic leukemia (Shaw, M. T. (1978) Am. J. Hematol. 4:97-103). Nakano and Graf (supra) suggested a role for lysozyme in host defense systems. Older rabbits with an inherited lysozyme deficiency show increased susceptibility to infections, such as subcutaneous abscesses (OMIM #153450, supra). Human lysozyme gene mutations cause hereditary systemic amyloidosis, a rare autosomal dominant disease in which amyloid deposits form in the viscera, including the kidney, adrenal glands, spleen, and liver. This disease is usually fatal by the fifth decade. The amyloid deposits contain variant forms of lysozyme. Renal amyloidosis is the most common and potentially the most serious form of organ involvement (Pepys, M. B. et al. (1993) Nature 362:553-557; OMIM #105200 Familial Visceral Amyloidosis; Cotran, R. S. et al. (1994) Robbins Pathologic Basis of Disease, W.B. Saunders Company, Philadelphia Pa., pp. 231-238). Increased levels of lysozyme and lactate have been observed in the cerebrospinal fluid of patients with bacterial meningitis (Ponka, A. et al. (1983) Infection 11:129-131). Acute monocytic leukemia is characterized by massive lysozymuria (Den Tandt, W. R. (1988) Int. J. Biochem. 20:713-719).

Lyases

[0497] Lyases are a class of enzymes that catalyze the cleavage of C-C, C-O, C-N, C-S, C-(halide), P-O, or other bonds without hydrolysis or oxidation to form two molecules, at least one of which contains a double bond (Stryer, L. (1995) Biochemistry, W.H. Freeman and Co., New York N.Y., p. 620). Under the International Classification of Enzymes (Webb, E. C. (1992) Enzyme Nomenclature 1992: Recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology on the Nomenclature and Classification of Enzymes, Academic Press, San Diego Calif.), lyases form a distinct class designated by the numeral 4 in the first digit of the enzyme number (i.e., EC 4.X.X.X).

Further classification of lyases reflects the type of bond cleaved as well as the nature of the cleaved group. The group of C-C lyases includes carboxyl-lyases (decarboxylases), aldehyde-lyases (aldolases), oxo-acid-lyases, and other lyases. The C-O lyase group includes hydro-lyases, lyases acting on polysaccharides, and other lyases. The C-N lyase group includes ammonia-lyases, amidine-lyases, amine-lyases (deaminases), and other lyases. Lyases are critical components of cellular biochemistry, with roles in metabolic energy production, including fatty acid metabolism and the tricarboxylic acid cycle, as well as other diverse enzymatic processes.

[0498] One important family of lyases are the carbonic anhydrases (CA), also called carbonate dehydratases, which catalyze the hydration of carbon dioxide in the reaction H₂O + CO₂ ⇌ HCO₃⁻ + H⁺. CA accelerates this reaction by a factor of over 10⁶ by virtue of a zinc ion located in a deep cleft about 15 Å below the protein's surface and co-ordinated to the imidazole groups of three His residues. Water bound to the zinc ion is rapidly converted to HCO₃⁻.

[0499] Eight enzymatic and evolutionarily related forms of carbonic anhydrase are currently known to exist in humans: three cytosolic isozymes (CAI, CAII, and CAIII), two membrane-bound forms (CAIV and CAVII), a mitochondrial form (CAV), a secreted salivary form (CAVI) and a yet uncharacterized isozyme (PROSITE PDOC00146 Eukaryotic-type carbonic anhydrases signature). Though the isoenzymes CAI, CAII, and bovine CAIII have similar secondary structures and polypeptide-chain folds, CAI has 6 tryptophans, CAII has 7 and CAIII has 8 (Boren, K. et al. (1996) Protein Sci. 5:2479-2484). CAII is the predominant CA isoenzyme in the brain of mammals.

CAs participate in a variety of physiological processes that involve pH regulation, CO₂ and HCO₃⁻ transport, ion transport, and water and electrolyte balance. For example, CAII contributes to H⁺ secretion by gastric parietal cells, by renal tubular cells, and by osteoclasts that secrete H⁺ to acidify the bone-resorbing compartment. In addition, CAII promotes HCO₃⁻ secretion by pancreatic duct cells, ciliary body epithelium, choroid plexus, salivary gland acinar cells, and distal colonic epithelium, thus playing a role in the production of pancreatic juice, aqueous humor, cerebrospinal fluid, and saliva, and contributing to electrolyte and water balance. CAII also promotes CO₂ exchange in proximal tubules in the kidney, in erythrocytes, and in lung. CAIV has roles in several tissues: it facilitates HCO₃⁻ reabsorption in the kidney; promotes CO₂ flux in tissues including brain, skeletal muscle, and heart muscle; and promotes CO₂ exchange from the blood to the alveoli in the lung. CAVI probably plays a role in pH regulation in saliva, along with CAII, and may have a protective effect in the esophagus and stomach. Mitochondrial CAV appears to play important roles in gluconeogenesis and ureagenesis, based on the effects of CA inhibitors on these pathways (Sly, W. S. and P. Y. Hu (1995) Ann. Rev. Biochem. 64:375-401).

A number of disease states are marked by variations in CA activity. Mutations in CAII which lead to CAII deficiency are the cause of osteopetrosis with renal tubular acidosis (OMIM #259730 Osteopetrosis with Renal Tubular Acidosis). The concentration of CAII in the cerebrospinal fluid (CSF) appears to mark disease activity in patients with brain damage. High CA concentrations have been observed in patients with brain infarction. Patients with transient ischemic attack, multiple sclerosis, or epilepsy usually have CAII concentrations in the normal range, but higher CAII levels have been observed in the CSF of those with central nervous system infection, dementia, or trigeminal neuralgia (Parkkila, A. K. et al. (1997) Eur. J. Clin. Invest. 27:392-397). Colonic adenomas and adenocarcinomas have been observed to fail to stain for CA, whereas non-neoplastic controls showed CAI and CAII in the cytoplasm of the columnar cells lining the upper half of colonic crypts. The neoplasms show staining patterns similar to less mature cells lining the base of normal crypts (Gramlich, T. L. et al. (1990) Arch. Pathol. Lab. Med. 114:415-419).

[0500] Therapeutic interventions in a number of diseases involve altering CA activity. CA inhibitors such as acetazolamide are used in the treatment of glaucoma (Stewart, W. C. (1999) Curr. Opin. Ophthalmol. 10:99-108), essential tremor and Parkinson's disease (Uitti, R. J. (1998) Geriatrics 53:46-48, 53-57), intermittent ataxia (Singhvi, J. P. et al. (2000) Neurology India 48:78-80), and altitude related illnesses (Klocke, D. L. et al. (1998) Mayo Clin. Proc. 73:988-992).

[0501] CA activity can be particularly useful as an indicator of long-term disease conditions, since the enzyme reacts relatively slowly to physiological changes. CAI and zinc concentrations have been observed to decrease in hyperthyroid Graves' disease (Yoshida, K. (1996) Tohoku J. Exp. Med. 178:345-356) and glycosylated CAI is observed in diabetes mellitus (Kondo, T. et al. (1987) Clin. Chim. Acta 166:227-236). A positive correlation has been observed between CAI and CAII reactivity and endometriosis (Brinton, D. A. et al. (1996) Ann. Clin. Lab. Sci. 26:409-420; D'Cruz, O. J. et al. (1996) Fertil. Steril. 66:547-556).

Another important member of the lyase family is ornithine decarboxylase (ODC), the initial rate-limiting enzyme in polyamine biosynthesis. ODC catalyzes the transformation of ornithine into putrescine in the reaction L-ornithine → putrescine + CO₂. Polyamines, which include putrescine and the subsequent metabolic pathway products spermidine and spermine, are ubiquitous cell components essential for DNA synthesis, cell differentiation, and proliferation. Thus the polyamines play a key role in tumor proliferation (Medina, M. A. et al. (1999) Biochem. Pharmacol. 57:1341-1344). ODC is a pyridoxal-5'-phosphate (PLP)-dependent enzyme which is active as a homodimer. Conserved residues include those at the PLP binding site and a stretch of glycine residues thought to be part of a substrate binding region (PROSITE PDOC00685 Orn/DAP/Arg decarboxylase family 2 signatures). Mammalian ODCs also contain PEST regions, sequence fragments enriched in proline, glutamic acid, serine, and threonine residues that act as signals for intracellular degradation (Medina et al., supra).

[0502] Many chemical carcinogens and tumor promoters increase ODC levels and activity. Several known oncogenes may increase ODC levels by enhancing transcription of the ODC gene, and ODC itself may act as an oncogene when expressed at very high levels. A high level of ODC is found in a number of precancerous conditions, and elevation of ODC levels has been used as part of a screen for tumor-promoting compounds (Pegg, A. E. et al. (1995) J. Cell. Biochem. Suppl. 22:132-138).

Inhibitors of ODC have been used to treat tumors in animal models and human clinical trials, and have been shown to reduce development of tumors of the bladder, brain, esophagus, gastrointestinal tract, lung, oral cavity, mammary gland, stomach, skin and trachea (Pegg et al., supra; McCann, P. P. and A. E. Pegg (1992) Pharmac. Ther. 54:195-215). ODC also shows promise as a target for chemoprevention (Pegg et al., supra). ODC inhibitors have also been used to treat infections by African trypanosomes, malaria, and Pneumocystis carinii, and are potentially useful for treatment of autoimmune diseases such as lupus and rheumatoid arthritis (McCann and Pegg, supra).

[0503] Another family of pyridoxal-dependent decarboxylases are the group II decarboxylases. This family includes glutamate decarboxylase (GAD), which catalyzes the decarboxylation of glutamate into the neurotransmitter GABA; histidine decarboxylase (HDC), which catalyzes the decarboxylation of histidine to histamine; aromatic-L-amino-acid decarboxylase (DDC), also known as L-dopa decarboxylase or tryptophan decarboxylase, which catalyzes the decarboxylation of tryptophan to tryptamine and also acts on 5-hydroxy-tryptophan and dihydroxyphenylalanine (L-dopa); and cysteine sulfinic acid decarboxylase (CSD), the rate-limiting enzyme in the synthesis of taurine from cysteine (PROSITE PDOC00329 DDC/GAD/HDC/TyrDC pyridoxal-phosphate attachment site). Taurine is an abundant sulfonic amino acid in brain and is thought to act as an osmoregulator in brain cells (Bitoun, M. and M. Tappaz (2000) J. Neurochem. 75:919-924).

Isomerases

[0504] Isomerases are a class of enzymes that catalyze geometric or structural changes within a molecule to form a single product. This class includes racemases and epimerases, cis-trans-isomerases, intramolecular oxidoreductases, intramolecular transferases (mutases) and intramolecular lyases. Isomerases are critical components of cellular biochemistry with roles in metabolic energy production including glycolysis, as well as other diverse enzymatic processes (Stryer, supra, pp. 483-507).

[0505] Racemases are a subset of isomerases that catalyze inversion of a molecule's configuration around the asymmetric carbon atom in a substrate having a single center of asymmetry, thereby interconverting two racemers. Epimerases are another subset of isomerases that catalyze inversion of configuration around an asymmetric carbon atom in a substrate with more than one center of asymmetry, thereby interconverting two epimers. Racemases and epimerases can act on amino acids and derivatives, hydroxy acids and derivatives, and carbohydrates and derivatives. The interconversion of UDP-galactose and UDP-glucose is catalyzed by UDP-galactose-4'-epimerase. Proper regulation and function of this epimerase is essential to the synthesis of glycoproteins and glycolipids. Elevated blood galactose levels have been correlated with UDP-galactose-4'-epimerase deficiency in screening programs of infants (Gitzelmann, R. (1972) Helv. Paediat. Acta 27:125-130).

[0506] Correct folding of newly synthesized proteins is assisted by molecular chaperones and folding catalysts, two unrelated groups of helper molecules. Chaperones suppress non-productive side reactions by stoichiometric binding to folding intermediates, whereas folding enzymes catalyze some of the multiple folding steps that enable proteins to attain their final functional configurations (Kern, G. et al. (1994) FEBS Lett. 348:145-148). One class of folding enzymes, the peptidyl prolyl cis-trans isomerases (PPIases), isomerizes certain proline imidic bonds in what is considered to be a rate limiting step in protein maturation and export. PPIases catalyze the cis to trans isomerization of certain proline imidic bonds in proteins. There are three evolutionarily unrelated families of PPIases: the cyclophilins, the FK506 binding proteins, and the newly characterized parvulin family (Rahfeld, J. U. et al. (1994) FEBS Lett. 352:180-184).

[0507] The cyclophilins (CyP) were originally identified as major receptors for the immunosuppressive drug cyclosporin A (CsA), an inhibitor of T-cell activation (Handschumacher, R. E. et al. (1984) Science 226:544-547; Harding, M. W. et al. (1986) J. Biol. Chem. 261:8547-8555). Thus, the peptidyl prolyl isomerase activity of CyP may be part of the signaling pathway that leads to T-cell activation. Subsequent work demonstrated that CyP's isomerase activity is essential for correct protein folding and/or protein trafficking, and may also be involved in assembly/disassembly of protein complexes and regulation of protein activity. For example, in Drosophila, the CyP NinaA is required for correct localization of rhodopsins, while a mammalian CyP (Cyp40) is part of the Hsp90/Hsp70 complex that binds steroid receptors. The mammalian CyP (CypA) has been shown to bind the gag protein from human immunodeficiency virus 1 (HIV-1), an interaction that can be inhibited by cyclosporin. Since cyclosporin has potent anti-HIV-1 activity, CypA may play an essential function in HIV-1 replication. Finally, Cyp40 has been shown to bind and inactivate the transcription factor c-Myb, an effect that is reversed by cyclosporin. This effect implicates CyP in the regulation of transcription, transformation, and differentiation (Bergsma, D. J. et al. (1991) J. Biol. Chem. 266:23204-23214; Hunter, T. (1998) Cell 92:141-143; and Leverson, J. D. and S. A. Ness (1998) Mol. Cell. 1:203-211).

One of the major rate limiting steps in protein folding is the thiol:disulfide exchange that is necessary for correct protein assembly. Although incubation of reduced, unfolded proteins in buffers with defined ratios of oxidized and reduced thiols can lead to native conformation, the rate of folding is slow and the attainment of native conformation decreases proportionately with the size and number of cysteines in the protein. Certain cellular compartments such as the endoplasmic reticulum of eukaryotes and the periplasmic space of prokaryotes are maintained in a more oxidized state than the surrounding cytosol. Correct disulfide formation can occur in these compartments, but at a rate that is insufficient for normal cell processes and inadequate for synthesizing secreted proteins. The protein disulfide isomerases, thioredoxins and glutaredoxins are able to catalyze the formation of disulfide bonds and regulate the redox environment in cells to enable the necessary thiol:disulfide exchanges (Loferer, H. (1995) J. Biol. Chem. 270:26178-26183).

[0508] Each of these proteins has somewhat different functions, but all belong to a group of disulfide-containing redox proteins that contain a conserved active-site sequence and are ubiquitously distributed in eukaryotes and prokaryotes. Protein disulfide isomerases are found in the endoplasmic reticulum of eukaryotes and in the periplasmic space of prokaryotes. They function by exchanging their own disulfide for a thiol in a folding peptide chain. In contrast, the reduced thioredoxins and glutaredoxins are generally found in the cytoplasm and function by directly reducing disulfides in the substrate proteins.

[0509] Oxidoreductases can be isomerases as well. Oxidoreductases catalyze the reversible transfer of electrons from a substrate that becomes oxidized to a substrate that becomes reduced. This class of enzymes includes dehydrogenases, hydroxylases, oxidases, oxygenases, peroxidases, and reductases. Proper maintenance of oxidoreductase levels is physiologically important. For example, genetically-linked deficiencies in lipoamide dehydrogenase can result in lactic acidosis (Robinson, B. H. et al. (1977) Pediat. Res. 11:1198-1202).

[0510] Another subgroup of isomerases are the transferases (or mutases). Transferases transfer a chemical group from one compound (the donor) to another compound (the acceptor). The types of groups transferred by these enzymes include acyl groups, amino groups, phosphate groups (phosphotransferases or phosphomutases), and others. The transferase carnitine palmitoyltransferase is an important component of fatty acid metabolism. Genetically-linked deficiencies in this transferase can lead to myopathy (Scriver, C. et al. (1995) The Metabolic and Molecular Basis of Inherited Disease, McGraw-Hill, New York N.Y., pp. 1501-1533).

[0511] Yet another subgroup of isomerases are the topoisomerases. Topoisomerases are enzymes that affect the topological state of DNA. For example, defects in topoisomerases or their regulation can affect normal physiology. Reduced levels of topoisomerase II have been correlated with some of the DNA processing defects associated with the disorder ataxia-telangiectasia (Singh, S. P. et al. (1988) Nucleic Acids Res. 16:3919-3929).

Ligases

[0512] Ligases catalyze the formation of a bond between two substrate molecules. The process involves the hydrolysis of a pyrophosphate bond in ATP or a similar energy donor. Ligases are classified based on the nature of the type of bond they form, which can include carbon-oxygen, carbon-sulfur, carbon-nitrogen, carbon-carbon and phosphoric ester bonds.

[0513] Ligases forming carbon-oxygen bonds include the aminoacyl-transfer RNA (tRNA) synthetases which are important RNA-associated enzymes with roles in translation. Protein biosynthesis depends on each amino acid forming a linkage with the appropriate tRNA. The aminoacyl-tRNA synthetases are responsible for the activation and correct attachment of an amino acid with its cognate tRNA. The 20 aminoacyl-tRNA synthetase enzymes can be divided into two

structural classes, and each class is characterized by a distinctive topology of the catalytic domain. Class I enzymes contain a catalytic domain based on the nucleotide-binding "Rossmann fold". Class II enzymes contain a central catalytic domain, which consists of a seven-stranded antiparallel β-sheet motif, as well as N- and C-terminal regulatory domains. Class II enzymes are separated into two groups based on the heterodimeric or homodimeric structure of the enzyme; the latter group is further subdivided by the structure of the N- and C-terminal regulatory domains (Hartlein, M. and S. Cusack (1995) J. Mol. Evol. 40:519-530). Autoantibodies against aminoacyl-tRNAs are generated by patients with dermatomyositis and polymyositis, and correlate strongly with complicating interstitial lung disease (ILD). These antibodies appear to be generated in response to viral infection, and coxsackie virus has been used to induce experimental viral myositis in animals.

[0514] Ligases forming carbon-sulfur bonds (acid-thiol ligases) mediate a large number of cellular biosynthetic intermediary metabolism processes involving intermolecular transfer of carbon atom-containing substrates (carbon substrates). Examples of such reactions include the tricarboxylic acid cycle, synthesis of fatty acids and long-chain phospholipids, synthesis of alcohols and aldehydes, synthesis of intermediary metabolites, and reactions involved in the amino acid degradation pathways. Some of these reactions require input of energy, usually in the form of conversion of ATP to either ADP or AMP and pyrophosphate.

In many cases, a carbon substrate is derived from a small molecule containing at least two carbon atoms. The carbon substrate is often covalently bound to a larger molecule which acts as a carbon substrate carrier molecule within the cell. In the biosynthetic mechanisms described above, the carrier molecule is coenzyme A. Coenzyme A (CoA) is structurally related to derivatives of the nucleotide ADP and consists of 4'-phosphopantetheine linked via a phosphodiester bond to the alpha phosphate group of adenosine 3',5'-bisphosphate. The terminal thiol group of 4'-phosphopantetheine acts as the site for carbon substrate bond formation. The predominant carbon substrates which utilize CoA as a carrier molecule during biosynthesis and intermediary metabolism in the cell are acetyl, succinyl, and propionyl moieties, collectively referred to as acyl groups. Other carbon substrates include enoyl lipid, which acts as a fatty acid oxidation intermediate, and carnitine, which acts as an acetyl-CoA flux regulator/mitochondrial acyl group transfer protein. Acyl-CoA and acetyl-CoA are synthesized in the cell by acyl-CoA synthetase and acetyl-CoA synthetase, respectively.

[0515] Activation of fatty acids is mediated by at least three forms of acyl-CoA synthetase activity: i) acetyl-CoA synthetase, which activates acetate and several other low molecular weight carboxylic acids and is found in muscle mitochondria and the cytosol of other tissues; ii) medium-chain acyl-CoA synthetase, which activates fatty acids containing between four and eleven carbon atoms (predominantly from dietary sources), and is present only in liver mitochondria; and iii) acyl-CoA synthetase, which is specific for long chain fatty acids with between six and twenty carbon atoms, and is found in microsomes and the mitochondria. Proteins associated with acyl-CoA synthetase activity have been identified from many sources including bacteria, yeast, plants, mouse, and man. The activity of acyl-CoA synthetase may be modulated by phosphorylation of the enzyme by cAMP-dependent protein kinase.

Ligases forming carbon-nitrogen bonds include amide synthases such as glutamine synthetase (glutamate-ammonia ligase) that catalyzes the amination of glutamic acid to glutamine by ammonia using the energy of ATP hydrolysis. Glutamine is the primary source for the amino group in various amide transfer reactions involved in de novo pyrimidine nucleotide synthesis and in purine and pyrimidine ribonucleotide interconversions. Overexpression of glutamine synthetase has been observed in primary liver cancer (Christa, L. et al. (1994) Gastroent. 106:1312-1320).

[0516] Acid-amino-acid ligases (peptide synthases) are represented by the ubiquitin conjugating enzymes which are associated with the ubiquitin conjugation system (UCS), a major pathway for the degradation of cellular proteins in eukaryotic cells and some bacteria. The UCS mediates the elimination of abnormal proteins and regulates the half-lives of important regulatory proteins that control cellular processes such as gene transcription and cell cycle progression. In the UCS pathway, proteins targeted for degradation are conjugated to ubiquitin (Ub), a small heat stable protein. Ub is first activated by a ubiquitin-activating enzyme (E1), and then transferred to one of several Ub-conjugating enzymes (E2). E2 then links the Ub molecule through its C-terminal glycine to an internal lysine (acceptor lysine) of a target protein. The ubiquitinated protein is then recognized and degraded by the proteasome, a large, multisubunit proteolytic enzyme complex, and ubiquitin is released for reutilization by ubiquitin protease. The UCS is implicated in the degradation of mitotic cyclic kinases, oncoproteins, tumor suppressor genes such as p53, viral proteins, cell surface receptors associated with signal transduction, transcriptional regulators, and mutated or damaged proteins (Ciechanover, A. (1994) Cell 79:13-21).

[0517] Cyclo-ligases and other carbon-nitrogen ligases comprise various enzymes and enzyme complexes that participate in the de novo pathways of purine and pyrimidine biosynthesis. Because these pathways are critical to the synthesis of nucleotides for replication of both RNA and DNA, many of these enzymes have been the targets of clinical agents for the treatment of cell proliferative disorders such as cancer and infectious diseases.

[0518] Purine biosynthesis occurs de novo from the amino acids glycine and glutamine, and other small molecules. Three of the key reactions in this process are catalyzed by a trifunctional enzyme composed of glycinamide-ribonucleotide synthetase (GARS), aminoimidazole ribonucleotide synthetase (AIRS), and glycinamide ribonucleotide transformylase (GART). Together these three enzymes combine ribosylamine phosphate with glycine to yield phosphoribosyl aminoimidazole, a precursor to both adenylate and guanylate nucleotides. This trifunctional protein has been implicated in the pathology of Down's syndrome (Aimi, J. et al. (1990) Nucleic Acid Res. 18:6665-6672). Adenylosuccinate synthetase catalyzes a later step in purine biosynthesis that converts inosinic acid to adenylosuccinate, a key step on the path to ATP synthesis. This enzyme is also similar to another carbon-nitrogen ligase, argininosuccinate synthetase, that catalyzes a similar reaction in the urea cycle (Powell, S. M. et al. (1992) FEBS Lett. 303:4-10).

Adenylosuccinate synthetase, adenylosuccinate lyase, and AMP deaminase may be considered as a functional unit, the purine nucleotide cycle. This cycle converts AMP to inosine monophosphate (IMP) and reconverts IMP to AMP via adenylosuccinate, thereby producing NH₃ and forming fumarate from aspartate. In muscle, the purine nucleotide cycle functions, during intense exercise, in the regeneration of ATP by pulling the adenylate kinase reaction in the direction of ATP formation and by providing Krebs cycle intermediates. In kidney, the purine nucleotide cycle accounts for the release of NH₃ under normal acid-base conditions. In brain, the purine nucleotide cycle may contribute to ATP recovery. Adenylosuccinate lyase deficiency provokes psychomotor retardation, often accompanied by autistic features (Van den Berghe, G. et al. (1992) Prog. Neurobiol. 39:547-561). A marked imbalance in the enzymic pattern of purine metabolism is linked with transformation and/or progression in cancer cells. In rat hepatomas the specific activities of the anabolic enzymes IMP dehydrogenase, GMP synthetase, adenylosuccinate synthetase, adenylosuccinase, AMP deaminase and amidophosphoribosyltransferase increased to 13.5-, 3.7-, 3.1-, 1.8-, 5.5- and 2.8-fold, respectively, of those in normal liver (Weber, G. (1983) Clin. Biochem. 16:57-63).

[0519] Like the de novo biosynthesis of purines, de novo synthesis of the pyrimidine nucleotides uridylate and cytidylate also arises from a common precursor, in this instance the nucleotide orotidylate derived from orotate and phosphoribosyl pyrophosphate (PRPP). Again a trifunctional enzyme comprising three carbon-nitrogen ligases plays a key role in the process. In this case the enzymes aspartate transcarbamylase (ATCase), carbamyl phosphate synthetase II, and dihydroorotase (DHOase) are encoded by a single gene called CAD. Together these three enzymes combine the initial reactants in pyrimidine biosynthesis, glutamine, CO₂ and ATP, to form dihydroorotate, the precursor to orotate and orotidylate (Iwahana, H. et al. (1996) Biochem. Biophys. Res. Commun. 219:249-255). Further steps then lead to the synthesis of uridine nucleotides from orotidylate. Cytidine nucleotides are derived from uridine-5'-triphosphate (UTP) by the amidation of UTP using glutamine as the amino donor and the enzyme CTP synthetase. Regulatory mutations in the human CTP synthetase are believed to confer multi-drug resistance to agents widely used in cancer therapy (Yamauchi, M. et al. (1990) EMBO J. 9:2095-2099).

[0520] Ligases forming carbon-carbon bonds include the carboxylases acetyl-CoA carboxylase and pyruvate carboxylase. Acetyl-CoA carboxylase catalyzes the carboxylation of acetyl-CoA from CO₂ and H₂O using the energy of ATP hydrolysis. Acetyl-CoA carboxylase is the rate-limiting enzyme in the biogenesis of long-chain fatty acids. Two isoforms of acetyl-CoA carboxylase, type I and type II, are expressed in humans in a tissue-specific manner (Ha, J. et al. (1994) Eur. J. Biochem. 219:297-306). Pyruvate carboxylase is a nuclear-encoded mitochondrial enzyme that catalyzes the conversion of pyruvate to oxaloacetate, a key intermediate in the citric acid cycle.

[0521] Ligases forming phosphoric ester bonds include the DNA ligases involved in both DNA replication and repair. DNA ligases seal phosphodiester bonds between two adjacent nucleotides in a DNA chain using the energy from ATP hydrolysis to first activate the free 5'-phosphate of one nucleotide and then react it with the 3'-OH group of the adjacent nucleotide. This resealing reaction is used in DNA replication to join small DNA fragments called "Okazaki" fragments that are transiently formed in the process of replicating new DNA, and in DNA repair. DNA repair is the process by which accidental base changes, such as those produced by oxidative damage, hydrolytic attack, or uncontrolled methylation of DNA, are corrected before replication or transcription of the DNA can occur. Bloom's syndrome is an inherited human disease in which individuals are partially deficient in DNA ligation and consequently have an increased incidence of cancer (Alberts et al., supra, p. 247).

[0522] Pantothenate synthetase (D-pantoate:beta-alanine ligase (AMP-forming); EC 6.3.2.1) is the last enzyme of the pathway of pantothenate (vitamin B(5)) synthesis. It catalyzes the condensation of pantoate with beta-alanine in an ATP-dependent reaction. The enzyme is dimeric, with two well-defined domains per protomer: the N-terminal domain, a Rossmann fold, contains the active site cavity, with the C-terminal domain forming a hinged lid. The N-terminal domain is structurally very similar to class I aminoacyl-tRNA synthetases and is thus a member of the cytidylyltransferase superfamily (von Delft, F. et al. (2000) Structure (Camb) 9:439-450). Farnesyl diphosphate synthase (FPPS) is an essential enzyme that is required both for cholesterol synthesis and protein prenylation. The enzyme catalyzes the formation of farnesyl diphosphate from dimethylallyl diphosphate and isopentenyl diphosphate. FPPS is inhibited by nitrogen-containing bisphosphonates, which can lead to the inhibition of osteoclast-mediated bone resorption by preventing protein prenylation (Dunford, J. E. et al. (2001) J. Pharmacol. Exp. Ther. 296:235-242).

[0523] 5-aminolevulinate synthase (ALAS; delta-aminolevulinate synthase; EC 2.3.1.37) catalyzes the rate-limiting step in heme biosynthesis in both erythroid and non-erythroid tissues. This enzyme is unique in the heme biosynthetic pathway in being encoded by two genes, the first encoding ALAS1, the non-erythroid specific enzyme which is ubiquitously expressed, and the second encoding ALAS2, which is expressed exclusively in erythroid cells. The genes for ALAS1 and ALAS2 are located, respectively, on chromosome 3 and on the X chromosome. Defects in the gene encoding ALAS2 result in X-linked sideroblastic anemia. Elevated levels of ALAS are seen in acute hepatic porphyrias and can be lowered by zinc mesoporphyrin.

Drug Metabolizing Enzymes (DMEs)

[0524] The metabolism of a drug and its movement through the body (pharmacokinetics) are important in determining its effects, toxicity, and interactions with other drugs. The three processes governing pharmacokinetics are the absorption of the drug, distribution to various tissues, and elimination of drug metabolites. These processes are intimately coupled to drug metabolism, since a variety of metabolic modifications alter most of the physicochemical and pharmacological properties of drugs, including solubility, binding to receptors, and excretion rates. The metabolic pathways which modify drugs also accept a variety of naturally occurring substrates such as steroids, fatty acids, prostaglandins, leukotrienes, and vitamins. The enzymes in these pathways are therefore important sites of biochemical and pharmacological interaction between natural compounds, drugs, carcinogens, mutagens, and xenobiotics. It has long been appreciated that inherited differences in drug metabolism lead to drastically different levels of drug efficacy and toxicity among individuals. Advances in pharmacogenomics research, of which DMEs constitute an important part, are promising to expand the tools and information that can be brought to bear on questions of drug efficacy and toxicity (see Evans, W. E. and R. V. Relling (1999) Science 286:487-491). DMEs have broad substrate specificities, unlike antibodies, for example, which are diverse and highly specific. Since DMEs metabolize a wide

variety of molecules, drug interactions may occur at the level of metabolism so that, for example, one compound may induce a DME that affects the metabolism of another compound.

[0525] Drug metabolic reactions are categorized as Phase I, which prepare the drug molecule for functioning and further metabolism, and Phase II, which are conjugative. In general, Phase I reaction products are partially or fully inactive, and Phase II reaction products are the chief excreted species. However, Phase I reaction products are sometimes more active than the original administered drugs; this metabolic activation principle is exploited by pro-drugs (e.g. L-dopa). Additionally, some nontoxic compounds (e.g. aflatoxin, benzo[a]pyrene) are metabolized to toxic intermediates through these pathways. Phase I reactions are usually rate-limiting in drug metabolism. Prior exposure to the compound, or other compounds, can induce the expression of Phase I enzymes, however, and thereby increase substrate flux through the metabolic pathways. (See Klaassen, C. D. et al. (1996) Casarett and Doull's Toxicology: The Basic Science of Poisons, McGraw-Hill, New York, N.Y., pp. 113-186; Katzung, B. G. (1995) Basic and Clinical Pharmacology, Appleton and Lange, Norwalk, Conn., pp. 48-59; Gibson, G. G. and P. Skett (1994) Introduction to Drug Metabolism, Blackie Academic and Professional, London.)

[0526] The major classes of Phase I enzymes include, but are not limited to, cytochrome P450 and flavin-containing monooxygenase. Other enzyme classes involved in Phase I-type catalytic cycles and reactions include, but are not limited to, NADPH cytochrome P450 reductase (CPR), the microsomal cytochrome b5/NADH cytochrome b5 reductase system, the ferredoxin/ferredoxin reductase redox pair, aldo/keto reductases, and alcohol dehydrogenases. The major classes of Phase II enzymes include, but are not limited to, UDP glucuronyltransferase, sulfotransferase, glutathione S-transferase, N-acyltransferase, and N-acetyltransferase.

Cytochrome P450 and P450 Catalytic Cycle-Associated Enzymes

[0527] Members of the cytochrome P450 superfamily of enzymes catalyze the oxidative metabolism of a variety of substrates, including natural compounds such as steroids, fatty acids, prostaglandins, leukotrienes, and vitamins, as well as drugs, carcinogens, mutagens, and xenobiotics. Cytochromes P450, also known as P450 heme-thiolate proteins, usually act as terminal oxidases in multi-component electron transfer chains, called P450-containing monooxygenase systems. Specific reactions catalyzed include hydroxylation, epoxidation, N-oxidation, sulfoxidation, N-, S-, and O-dealkylations, desulfation, deamination, and reduction of azo, nitro, and N-oxide groups. These reactions are involved in steroidogenesis of glucocorticoids, cortisols, estrogens, and androgens in animals; insecticide resistance in insects; herbicide resistance and flower coloring in plants; and environmental bioremediation by microorganisms. Cytochrome P450 actions on drugs, carcinogens, mutagens, and xenobiotics can result in detoxification or in conversion of the substance to a more toxic product. Cytochromes P450 are abundant in the liver, but also occur in other tissues; the enzymes are located in microsomes. (See ExPASy ENZYME EC 1.14.14.1; Prosite PDOC00081 Cytochrome P450 cysteine heme-iron ligand signature; PRINTS EP450I E-Class P450 Group I signature; Graham-Lorence, S. and J. A. Peterson (1996) FASEB J. 10:206-214.)

[0528] Four hundred cytochromes P450 have been identified in diverse organisms including bacteria, fungi, plants, and animals (Graham-Lorence and Peterson, supra). The B-class is found in prokaryotes and fungi, while the E-class is found in bacteria, plants, insects, vertebrates, and mammals. Five subclasses or groups are found within the larger family of E-class cytochromes P450 (PRINTS EP450I E-Class P450 Group I signature).

[0529] All cytochromes P450 use a heme cofactor and share structural attributes. Most cytochromes P450 are 400 to 530 amino acids in length. The secondary structure of the enzyme is about 70% alpha-helical and about 22% beta-sheet. The region around the heme-binding site in the C-terminal part of the protein is conserved among cytochromes P450. A ten amino acid signature sequence in this heme-iron ligand region has been identified which includes a conserved cysteine involved in binding the heme iron in the fifth coordination site. In eukaryotic cytochromes P450, a membrane-spanning region is usually found in the first 15-20 amino acids of the protein, generally consisting of approximately 15 hydrophobic residues followed by a positively charged residue. (See Prosite PDOC00081, supra; Graham-Lorence and Peterson, supra.)

[0530] Cytochrome P450 enzymes are involved in cell proliferation and development.

[0531] The enzymes have roles in chemical mutagenesis and carcinogenesis by metabolizing chemicals to reactive intermediates that form adducts with DNA (Nebert, D. W. and F. J. Gonzalez (1987) Ann. Rev. Biochem. 56:945-993). These adducts can cause nucleotide changes and DNA rearrangements that lead to oncogenesis. Cytochrome P450 expression in liver and other tissues is induced by xenobiotics such as polycyclic aromatic hydrocarbons, peroxisomal proliferators, phenobarbital, and the glucocorticoid dexamethasone (Dogra, S. C. et al. (1998) Clin. Exp. Pharmacol. Physiol. 25:1-9). A cytochrome P450 protein may participate in eye development, as mutations in the P450 gene CYP1B1 cause primary congenital glaucoma (OMIM #601771 Cytochrome P450, subfamily I (dioxin-inducible), polypeptide 1; CYP1B1).

[0532] Cytochromes P450 are associated with inflammation and infection. Hepatic cytochrome P450 activities are profoundly affected by various infections and inflammatory stimuli, some of which are suppressed and some induced (Morgan, E. T. (1997) Drug Metab. Rev. 29:1129-1188). Effects observed in vivo can be mimicked by proinflammatory cytokines and interferons. Autoantibodies to two cytochrome P450 proteins were found in patients with autoimmune polyendocrinopathy-candidiasis-ectodermal dystrophy (APECED), a polyglandular autoimmune syndrome (OMIM #240300 Autoimmune polyendocrinopathy-candidiasis-ectodermal dystrophy).

[0533] Mutations in cytochromes P450 have been linked to metabolic disorders, including congenital adrenal hyperplasia, the most common adrenal disorder of infancy and childhood; pseudovitamin D-deficiency rickets; cerebrotendinous xanthomatosis, a lipid storage disease characterized by progressive neurologic dysfunction, premature atherosclerosis, and cataracts; and an inherited resistance to the anticoagulant drugs coumarin and warfarin (Isselbacher, K. J. et al. (1994) Harrison's Principles of Internal Medicine, McGraw-Hill, Inc., New York, N.Y., pp. 1968-1970; Takeyama, K. et al. (1997) Science 277:1827-1830; Kitanaka, S. et al. (1998) N. Engl. J. Med. 338:653-661; OMIM #213700 Cerebrotendinous xanthomatosis; and OMIM #122700 Coumarin resistance). Extremely high levels of expression of the cytochrome P450 protein aromatase were found in a fibrolamellar hepatocellular carcinoma from a boy with severe gynecomastia (feminization) (Agarwal, V. R. (1998) J. Clin. Endocrinol. Metab. 83:1797-1800).

[0534] The cytochrome P450 catalytic cycle is completed through reduction of cytochrome P450 by NADPH cytochrome P450 reductase (CPR). Another microsomal electron transport system consisting of cytochrome b5 and NADPH cytochrome b5 reductase has been widely viewed as a minor contributor of electrons to the cytochrome P450 catalytic cycle. However, a recent report by Lamb, D. C. et al. (1999; FEBS Lett. 462:283-288) identifies a Candida albicans cytochrome P450 (CYP51) which can be efficiently reduced and supported by the microsomal cytochrome b5/NADPH cytochrome b5 reductase system. Therefore, there are likely many cytochromes P450 which are supported by this alternative electron donor system.

[0535] Cytochrome b5 reductase is also responsible for the reduction of oxidized hemoglobin (methemoglobin, or ferrihemoglobin, which is unable to carry oxygen) to the active hemoglobin (ferrohemoglobin) in red blood cells. Methemoglobinemia results when there is a high level of oxidant drugs or an abnormal hemoglobin (hemoglobin M) which is not efficiently reduced. Methemoglobinemia can also result from a hereditary deficiency in red cell cytochrome b5 reductase (Reviewed in Mansour, A. and A. A. Lurie (1993) Am. J. Hematol. 42:7-12).

[0536] Members of the cytochrome P450 family are also closely associated with vitamin D synthesis and catabolism. Vitamin D exists as two biologically equivalent prohormones, ergocalciferol (vitamin D2), produced in plant tissues, and cholecalciferol (vitamin D3), produced in animal tissues. The latter form, cholecalciferol, is formed upon the exposure of 7-dehydrocholesterol to near ultraviolet light (i.e., 290-310 nm), normally resulting from even minimal periods of skin exposure to sunlight (reviewed in Miller, W. L. and A. A. Portale (2000) Trends Endocrinol. Metab. 11:315-319).

[0537] Both prohormone forms are further metabolized in the liver to 25-hydroxyvitamin D (25(OH)D) by the enzyme 25-hydroxylase. 25(OH)D is the most abundant precursor form of vitamin D, which must be further metabolized in the kidney to the active form, 1α,25-dihydroxyvitamin D (1α,25(OH)2D), by the enzyme 25-hydroxyvitamin D 1α-hydroxylase (1α-hydroxylase). Regulation of 1α,25(OH)2D production is primarily at this final step in the synthetic pathway. The activity of 1α-hydroxylase depends upon several physiological factors including the circulating level of the enzyme product (1α,25(OH)2D) and the levels of parathyroid hormone (PTH), calcitonin, insulin, calcium, phosphorus, growth hormone, and prolactin. Furthermore, extrarenal 1α-hydroxylase activity has been reported, suggesting that tissue-specific, local regulation of 1α,25(OH)2D production may also be biologically important. The catalysis of 1α,25(OH)2D to 24,25-dihydroxyvitamin D (24,25(OH)2D), involving the enzyme 25-hydroxyvitamin D 24-hydroxylase (24-hydroxylase), also occurs in the kidney. 24-hydroxylase can also use 25(OH)D as a substrate (Shinki, T. et al. (1997) Proc. Natl. Acad. Sci. U.S.A. 94:12920-12925; Miller and Portale, supra; and references within).

[0538] Vitamin D 25-hydroxylase, 1α-hydroxylase, and 24-hydroxylase are all NADPH-dependent, type I (mitochondrial) cytochrome P450 enzymes that show a high degree of homology with other members of the family. Vitamin D 25-hydroxylase also shows a broad substrate specificity and may also perform 26-hydroxylation of bile acid intermediates and 25-, 26-, and 27-hydroxylation of cholesterol (Dilworth, F. J. et al. (1995) J. Biol. Chem. 270:16766-16774; Miller and Portale, supra; and references within).

[0539] The active form of vitamin D (1α,25(OH)2D) is involved in calcium and phosphate homeostasis and promotes the differentiation of myeloid and skin cells. Vitamin D deficiency resulting from deficiencies in the enzymes involved in vitamin D metabolism (e.g., 1α-hydroxylase) causes hypocalcemia, hypophosphatemia, and vitamin D-dependent (sensitive) rickets, a disease characterized by loss of bone density and distinctive clinical features, including bandy or bow leggedness accompanied by a waddling gait. Deficiencies in vitamin D 25-hydroxylase cause cerebrotendinous xanthomatosis, a lipid-storage disease characterized by the deposition of cholesterol and cholestanol in the Achilles' tendons, brain, lungs, and many other tissues. The disease presents with progressive neurologic dysfunction, including postpubescent cerebellar ataxia, atherosclerosis, and cataracts. Vitamin D 25-hydroxylase deficiency does not result in rickets, suggesting the existence of alternative pathways for the synthesis of 25(OH)D (Griffin, J. E. and J. B. Zerwekh (1983) J. Clin. Invest. 72:1190-1199; Gamblin, G. T. et al. (1985) J. Clin. Invest. 75:954-960; and Miller and Portale, supra).

[0540] Ferredoxin and ferredoxin reductase are electron transport accessory proteins which support at least one human cytochrome P450 species, cytochrome P450c27 encoded by the CYP27 gene (Dilworth, F. J. et al. (1996) Biochem. J. 320:267-71). A Streptomyces griseus cytochrome P450, CYP104D1, was heterologously expressed in Escherichia coli and found to be reduced by the endogenous ferredoxin and ferredoxin reductase enzymes (Taylor, M. et al. (1999) Biochem. Biophys. Res. Commun. 263:838-842), suggesting that many cytochrome P450 species may be supported by the ferredoxin/ferredoxin reductase pair. Ferredoxin reductase has also been found in a model drug metabolism system to reduce actinomycin D, an antitumor antibiotic, to a reactive free radical species (Flitter, W. D. and R. P. Mason (1988) Arch. Biochem. Biophys. 267:632-639).

Flavin-Containing Monooxygenase (FMO)

[0541] Flavin-containing monooxygenases oxidize the nucleophilic nitrogen, sulfur, and phosphorus heteroatom of an exceptional range of substrates. Like cytochromes P450, FMOs are microsomal and use NADPH and O2; there is also a great deal of substrate overlap with cytochromes P450. The tissue distribution of FMOs includes liver, kidney, and lung.

[0542] Isoforms of FMO in mammals include FMO1, FMO2, FMO3, FMO4, and FMO5, which are expressed in a tissue-specific manner. The isoforms differ in their substrate specificities and properties such as inhibition by various compounds and stereospecificity of reaction. FMOs have a 13 amino acid signature sequence, the components of which span the N-terminal two-thirds of the sequences and include the FAD binding region and the FATGY motif found in many N-hydroxylating enzymes (Stehr, M. et al. (1998) Trends Biochem. Sci. 23:56-57; PRINTS FMOXYGENASE Flavin-containing monooxygenase signature). Specific reactions include oxidation of nucleophilic tertiary amines to N-oxides, secondary amines to hydroxylamines and nitrones, primary amines to hydroxylamines and oximes, and sulfur-containing compounds and phosphines to S- and P-oxides. Hydrazines, iodides, selenides, and boron-containing compounds are also substrates. FMOs are more heat labile and less detergent sensitive than cytochromes P450 in vitro, though FMO isoforms vary in thermal stability and detergent sensitivity.

[0543] FMOs play important roles in the metabolism of several drugs and xenobiotics. FMO (FMO3 in liver) is predominantly responsible for metabolizing (S)-nicotine to (S)-nicotine N-1'-oxide, which is excreted in urine. FMO is also involved in S-oxygenation of cimetidine, an H2-antagonist widely used for the treatment of gastric ulcers. Liver-expressed forms of FMO are not under the same regulatory control as cytochrome P450. In rats, for example, phenobarbital treatment leads to the induction of cytochrome P450, but the repression of FMO1.

Lysyl Oxidase

[0544] Lysyl oxidase (lysine 6-oxidase, LO) is a copper-dependent amine oxidase involved in the formation of connective tissue matrices by crosslinking collagen and elastin. LO is secreted as an N-glycosylated precursor protein of approximately 50 kDa and cleaved to the mature form of the enzyme by a metalloprotease, although the precursor form is also active. The copper atom in LO is involved in the transport of electrons to and from oxygen to facilitate the oxidative deamination of lysine residues in these extracellular matrix proteins. While the coordination of copper is essential to LO activity, insufficient dietary intake of copper does not influence the expression of the apoenzyme. However, the absence of the functional LO is linked to the skeletal and vascular tissue disorders that are associated with dietary copper deficiency. LO is also inhibited by a variety of semicarbazides, hydrazines, and amino nitriles, as well as heparin. Beta-aminopropionitrile is a commonly used inhibitor. LO activity is increased in response to ozone, cadmium, and elevated levels of hormones released in response to local tissue trauma, such as transforming growth factor-beta, platelet-derived growth factor, angiotensin II, and fibroblast growth factor. Abnormalities in LO activity have been linked to Menkes syndrome and occipital horn syndrome. Cytosolic forms of the enzyme have been implicated in abnormal cell proliferation (reviewed in Rucker, R. B. et al. (1998) Am. J. Clin. Nutr. 67:996S-1002S and Smith-Mungo, L. I. and H. M. Kagan (1998) Matrix Biol. 16:387-398).

Dihydrofolate Reductases

[0545] Dihydrofolate reductases (DHFR) are ubiquitous enzymes that catalyze the NADPH-dependent reduction of dihydrofolate to tetrahydrofolate, an essential step in the de novo synthesis of glycine and purines as well as the conversion of deoxyuridine monophosphate (dUMP) to deoxythymidine monophosphate (dTMP). The basic reaction is as follows:

7,8-dihydrofolate + NADPH + H+ → 5,6,7,8-tetrahydrofolate + NADP+

[0546] The enzymes can be inhibited by a number of dihydrofolate analogs, including trimethoprim and methotrexate. Since an abundance of dTMP is required for DNA synthesis, rapidly dividing cells require the activity of DHFR. The replication of DNA viruses (e.g., herpesvirus) also requires high levels of DHFR activity. As a result, drugs that target DHFR have been used for cancer chemotherapy and to inhibit DNA virus replication. (For similar reasons, thymidylate synthetases are also target enzymes.) Drugs that inhibit DHFR are preferentially cytotoxic for rapidly dividing cells (or DNA virus-infected cells) but have no specificity, resulting in the indiscriminate destruction of dividing cells. Furthermore, cancer cells may become resistant to drugs such as methotrexate as a result of acquired transport defects or the duplication of one or more DHFR genes (Stryer, L. (1988) Biochemistry, W.H. Freeman and Co., Inc., New York, pp. 511-519).

Aldo/Keto Reductases

[0547] Aldo/keto reductases are monomeric NADPH-dependent oxidoreductases with broad substrate specificities (Bohren, K. M. et al. (1989) J. Biol. Chem. 264:9547-9551). These enzymes catalyze the reduction of carbonyl-containing compounds, including carbonyl-containing sugars and aromatic compounds, to the corresponding alcohols. Therefore, a variety of carbonyl-containing drugs and xenobiotics are likely metabolized by enzymes of this class.

[0548] One known reaction catalyzed by a family member, aldose reductase, is the reduction of glucose to sorbitol, which is then further metabolized to fructose by sorbitol dehydrogenase. Under normal conditions, the reduction of glucose to sorbitol is a minor pathway. In hyperglycemic states, however, the accumulation of sorbitol is implicated in the development of diabetic complications (OMIM #103880 Aldo-keto reductase family 1, member B1). Members of this enzyme family are also highly expressed in some liver cancers (Cao, D. et al. (1998) J. Biol. Chem. 273:11429-11435).

Alcohol Dehydrogenases

[0549] Alcohol dehydrogenases (ADHs) oxidize simple alcohols to the corresponding aldehydes. ADH is a cytosolic enzyme, prefers the cofactor NAD+, and also binds zinc ion. Liver contains the highest levels of ADH, with lower levels in kidney, lung, and the gastric mucosa.

[0550] Known ADH isoforms are dimeric proteins composed of 40 kDa subunits. There are five known gene loci which encode these subunits (α, β, γ, π, χ), and some of the loci have characterized allelic variants (β1, β2, β3, γ1, γ2). The subunits can form homodimers and heterodimers; the subunit composition determines the specific properties of the active enzyme. The holoenzymes have therefore been categorized as Class I (subunit compositions αα, αβ, αγ, βγ, γγ), Class II (ππ), and Class III (χχ). Class I ADH isozymes oxidize ethanol and other small aliphatic alcohols, and are inhibited by pyrazole. Class II isozymes prefer longer chain aliphatic and aromatic alcohols, are unable to oxidize methanol, and are not inhibited by pyrazole. Class III isozymes prefer even longer chain aliphatic alcohols (five carbons and longer) and aromatic alcohols, and are not inhibited by pyrazole.

[0551] The short-chain alcohol dehydrogenases include a number of related enzymes with a variety of substrate specificities. Included in this group are the mammalian enzymes D-beta-hydroxybutyrate dehydrogenase, (R)-3-hydroxybutyrate dehydrogenase, 15-hydroxyprostaglandin dehydrogenase, NADPH-dependent carbonyl reductase, corticosteroid 11-beta-dehydrogenase, and estradiol 17-beta-dehydrogenase, as well as the bacterial enzymes acetoacetyl-CoA reductase, glucose 1-dehydrogenase, 3-beta-hydroxysteroid dehydrogenase, 20-beta-hydroxysteroid dehydrogenase, ribitol dehydrogenase, 3-oxoacyl reductase, 2,3-dihydro-2,3-dihydroxybenzoate dehydrogenase, sorbitol-6-phosphate 2-dehydrogenase, 7-alpha-hydroxysteroid dehydrogenase, cis-1,2-dihydroxy-3,4-cyclohexadiene-1-carboxylate dehydrogenase, cis-toluene dihydrodiol dehydrogenase, cis-benzene glycol dehydrogenase, biphenyl-2,3-dihydro-2,3-diol dehydrogenase, N-acylmannosamine 1-dehydrogenase, and 2-deoxy-D-gluconate 3-dehydrogenase (Krozowski, Z. (1994) J. Steroid Biochem. Mol. Biol. 51:125-130; Krozowski, Z. (1992) Mol. Cell. Endocrinol. 84:C25-31; and Marks, A. R. et al. (1992) J. Biol. Chem. 267:15459-15463).

Sulfotransferases

[0552] Sulfate conjugation occurs on many of the same substrates which undergo O-glucuronidation to produce a highly water-soluble sulfuric acid ester. Sulfotransferases (ST) catalyze this reaction by transferring SO3 from the cofactor 3'-phosphoadenosine-5'-phosphosulfate (PAPS) to the substrate. ST substrates are predominantly phenols and aliphatic alcohols, but also include aromatic amines and aliphatic amines, which are conjugated to produce the corresponding sulfamates. The products of these reactions are excreted mainly in urine.

[0553] STs are found in a wide range of tissues, including liver, kidney, intestinal tract, lung, platelets, and brain. The enzymes are generally cytosolic, and multiple forms are often co-expressed. For example, there are more than a dozen forms of ST in rat liver cytosol. These biochemically characterized STs fall into five classes based on their substrate preference: arylsulfotransferase, alcohol sulfotransferase, estrogen sulfotransferase, tyrosine ester sulfotransferase, and bile salt sulfotransferase.

[0554] ST enzyme activity varies greatly with sex and age in rats. The combined effects of developmental cues and sex-related hormones are thought to lead to these differences in ST expression profiles, as well as the profiles of other DMEs such as cytochromes P450. Notably, the high expression of STs in cats partially compensates for their low level of UDP glucuronyltransferase activity.

[0555] Several forms of ST have been purified from human liver cytosol and cloned. There are two phenol sulfotransferases with different thermal stabilities and substrate preferences. The thermostable enzyme catalyzes the sulfation of phenols such as para-nitrophenol, minoxidil, and acetaminophen; the thermolabile enzyme prefers monoamine substrates such as dopamine, epinephrine, and levodopa. Other cloned STs include an estrogen sulfotransferase and an N-acetylglucosamine-6-O-sulfotransferase. This last enzyme is illustrative of the other major role of STs in cellular biochemistry, the modification of carbohydrate structures that may be important in cellular differentiation and maturation of proteoglycans. Indeed, an inherited defect in a sulfotransferase has been implicated in macular corneal dystrophy, a disorder characterized by a failure to synthesize mature keratan sulfate proteoglycans (Nakazawa, K. et al. (1984) J. Biol. Chem. 259:13751-13757; OMIM #217800 Macular dystrophy, corneal).

Galactosyltransferases

[0556] Galactosyltransferases are a subset of glycosyltransferases that transfer galactose (Gal) to the terminal N-acetylglucosamine (GlcNAc) oligosaccharide chains that are part of glycoproteins or glycolipids that are free in solution (Kolbinger, F. et al. (1998) J. Biol. Chem. 273:433-440; Amado, M. et al. (1999) Biochim. Biophys. Acta 1473:35-53). Galactosyltransferases have been detected on the cell surface and as soluble extracellular proteins, in addition to being present in the Golgi. β1,3-galactosyltransferases form Type I carbohydrate chains with Gal(β1-3)GlcNAc linkages. Known human and mouse β1,3-galactosyltransferases appear to have a short cytosolic domain, a single transmembrane domain, and a catalytic domain with eight conserved regions (Kolbinger et al., supra; and Hennet, T. et al. (1998) J. Biol. Chem. 273:58-65). In mouse UDP-galactose:β-N-acetylglucosamine β1,3-galactosyltransferase-I, region 1 is located at amino acid residues 78-83, region 2 is located at amino acid residues 93-102, region 3 is located at amino acid residues 116-119, region 4 is located at amino acid residues 147-158, region 5 is located at amino acid residues 172-183, region 6 is located at amino acid residues 203-206, region 7 is located at amino acid residues 236-246, and region 8 is located at amino acid residues 264-275. A variant of a sequence found within mouse UDP-galactose:β-N-acetylglucosamine β1,3-galactosyltransferase-I region 8 is also found in bacterial galactosyltransferases, suggesting that this sequence defines a galactosyltransferase sequence motif (Hennet et al., supra). Recent work suggests that brainiac protein is a β1,3-galactosyltransferase (Yuan, Y. et al. (1997) Cell 88:9-11; and Hennet et al., supra).

[0557] UDP-Gal:GlcNAc β1,4-galactosyltransferase (β1,4-GalT) (Sato, T. et al. (1997) EMBO J. 16:1850-1857) catalyzes the formation of Type II carbohydrate chains with Gal(β1-4)GlcNAc linkages. As is the case with the β1,3-galactosyltransferase, a soluble form of the enzyme is formed by cleavage of the membrane-bound form. Amino acids conserved among β1,4-galactosyltransferases include two cysteines linked through a disulfide bond and a putative UDP-galactose-binding site in the catalytic domain (Yadav, S. and K. Brew (1990) J. Biol. Chem. 265:14163-14169; Yadav, S. P. and K. Brew (1991) J. Biol. Chem. 266:698-703; and Shaper, N. L. et al. (1997) J. Biol. Chem. 272:31389-31399). β1,4-galactosyltransferases have several specialized roles in addition to synthesizing carbohydrate chains on glycoproteins or glycolipids. In mammals a β1,4-galactosyltransferase, as part of a heterodimer with α-lactalbumin, functions in lactating mammary gland lactose production. A β1,4-galactosyltransferase on the surface of sperm functions as a receptor that specifically recognizes the egg. Cell surface β1,4-galactosyltransferases also function in cell adhesion, cell/basal lamina interaction, and normal and metastatic cell migration (Shur, B. (1993) Curr. Opin. Cell Biol. 5:854-863; and Shaper, J. (1995) Adv. Exp. Med. Biol. 376:95-104).

Gamma-Glutamyl Transpeptidase

[0558] Gamma-glutamyl transpeptidases are ubiquitously expressed enzymes that initiate extracellular glutathione (GSH) breakdown by cleaving gamma-glutamyl amide bonds. The breakdown of GSH provides cells with a regional cysteine pool for biosynthetic pathways. Gamma-glutamyl transpeptidases also contribute to cellular antioxidant defenses, and expression is induced by oxidative stress. The cell surface-localized glycoproteins are expressed at high levels in cancer cells. Studies have suggested that the high level of gamma-glutamyl transpeptidase activity present on the surface of cancer cells could be exploited to activate precursor drugs, resulting in high local concentrations of anti-cancer therapeutic agents (Hanigan, M. H. (1998) Chem. Biol. Interact. 111-112:333-342; Taniguchi, N. and Y. Ikeda (1998) Adv. Enzymol. Relat. Areas Mol. Biol. 72:239-278; Chikhi, N. et al. (1999) Comp. Biochem. Physiol. B. Biochem. Mol. Biol. 122:367-380).
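The eight conserved regions listed above for mouse UDP-galactose:β-N-acetylglucosamine β1,3-galactosyltransferase-I are given as 1-based, inclusive residue ranges. As an illustration only, the following minimal Python sketch shows how such coordinates might be used to pull the corresponding subsequences out of a protein sequence, and how a short motif could be located in a sequence. The function names extract_regions and find_motif are hypothetical and are not part of the disclosure; no real protein sequence is supplied here, and only the residue coordinates and the FATGY motif name are taken from the text.

```python
# Conserved-region coordinates (1-based, inclusive) as listed in the text for
# mouse UDP-galactose:beta-N-acetylglucosamine beta1,3-galactosyltransferase-I.
CONSERVED_REGIONS = {
    1: (78, 83),
    2: (93, 102),
    3: (116, 119),
    4: (147, 158),
    5: (172, 183),
    6: (203, 206),
    7: (236, 246),
    8: (264, 275),
}


def extract_regions(sequence, regions=CONSERVED_REGIONS):
    """Return {region number: subsequence} for every region that fits in the sequence.

    Hypothetical helper: regions extending past the end of the sequence are skipped.
    """
    extracted = {}
    for number, (start, end) in regions.items():
        if end <= len(sequence):
            # Convert a 1-based inclusive range to a 0-based half-open Python slice.
            extracted[number] = sequence[start - 1:end]
    return extracted


def find_motif(sequence, motif):
    """Return the 1-based start positions of every exact occurrence of motif.

    Hypothetical helper: exact string matching only, overlapping hits included.
    """
    positions = []
    index = sequence.find(motif)
    while index != -1:
        positions.append(index + 1)
        index = sequence.find(motif, index + 1)
    return positions
```

Under these assumptions, calling extract_regions on a full-length sequence would return eight subsequences of 6, 10, 4, 12, 12, 4, 11, and 12 residues, and find_motif(sequence, "FATGY") would report where the FATGY motif mentioned for the flavin-containing monooxygenases occurs, if at all. Exact matching is a simplification; a variant motif of the kind described for the bacterial galactosyltransferases would need a pattern that tolerates substitutions.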

Aminotransferases

[0559] Aminotransferases comprise a family of pyridoxal 5'-phosphate (PLP)-dependent enzymes that catalyze transformations of amino acids. Aspartate aminotransferase (AspAT) is the most extensively studied PLP-containing enzyme. It catalyzes the reversible transamination of dicarboxylic L-amino acids, aspartate and glutamate, and the corresponding 2-oxo acids, oxalacetate and 2-oxoglutarate. Other members of the family include pyruvate aminotransferase, branched-chain amino acid aminotransferase, tyrosine aminotransferase, aromatic aminotransferase, alanine:glyoxylate aminotransferase (AGT), and kynurenine aminotransferase (Vacca, R. A. et al. (1997) J. Biol. Chem. 272:21932-21937).

Primary hyperoxaluria type-1 is an autosomal recessive disorder resulting in a deficiency in the liver-specific peroxisomal enzyme, alanine:glyoxylate aminotransferase-1. The phenotype of the disorder is a deficiency in glyoxylate metabolism. In the absence of AGT, glyoxylate is oxidized to oxalate rather than being transaminated to glycine. The result is the deposition of insoluble calcium oxalate in the kidneys and urinary tract, ultimately causing renal failure (Lumb, M. J. et al. (1999) J. Biol. Chem. 274:20587-20596).

[0560] Kynurenine aminotransferase catalyzes the irreversible transamination of the L-tryptophan metabolite L-kynurenine to form kynurenic acid. The enzyme may also catalyze the reversible transamination reaction between L-2-aminoadipate and 2-oxoglutarate to produce 2-oxoadipate and L-glutamate. Kynurenic acid is a putative modulator of glutamatergic neurotransmission; thus a deficiency in kynurenine aminotransferase may be associated with pleotrophic effects (Buchli, R. et al. (1995) J. Biol. Chem. 270:29330-29335).

Catechol-O-Methyltransferase

[0561] Catechol-O-methyltransferase (COMT) catalyzes the transfer of the methyl group of the S-adenosyl-L-methionine (AdoMet; SAM) donor to one of the hydroxyl groups of the catechol substrate (e.g., L-dopa, dopamine, or DBA). Methylation of the 3'-hydroxyl group is favored over methylation of the 4'-hydroxyl group, and the membrane-bound isoform of COMT is more regiospecific than the soluble form. Translation of the soluble form of the enzyme results from utilization of an internal start codon in a full-length mRNA (1.5 kb) or from the translation of a shorter mRNA (1.3 kb), transcribed from an internal promoter. The proposed SN2-like methylation reaction requires Mg²⁺ and is inhibited by Ca²⁺. The binding of the donor and substrate to COMT occurs sequentially. AdoMet first binds COMT in a Mg²⁺-independent manner, followed by the binding of Mg²⁺ and the binding of the catechol substrate.

[0562] The amount of COMT in tissues is relatively high compared to the amount of activity normally required, thus inhibition is problematic. Nonetheless, inhibitors have been developed for in vitro use (e.g., gallates, tropolone, U-0521, and 3',4'-dihydroxy-2-methyl-propiophetropol-one) and for clinical use (e.g., nitrocatechol-based compounds and tolcapone). Administration of these inhibitors results in the increased half-life of L-dopa and the consequent formation of dopamine. Inhibition of COMT is also likely to increase the half-life of various other catechol-structure compounds, including but not limited to epinephrine/norepinephrine, isoprenaline, rimiterol, dobutamine, fenoldopam, apomorphine, and α-methyldopa. A deficiency in norepinephrine has been linked to clinical depression, hence the use of COMT inhibitors could be useful in the treatment of depression. COMT inhibitors are generally well tolerated with minimal side effects and are ultimately metabolized in the liver with only minor accumulation of metabolites in the body (Männistö, P. T. and S. Kaakkola (1999) Pharmacol. Rev. 51:593-628).

Copper-Zinc Superoxide Dismutases

[0563] Copper-zinc superoxide dismutases are compact homodimeric metalloenzymes involved in cellular defenses against oxidative damage. The enzymes contain one atom of zinc and one atom of copper per subunit and catalyze the dismutation of superoxide anions into O₂ and H₂O₂. The rate of dismutation is diffusion-limited and consequently enhanced by the presence of favorable electrostatic interactions between the substrate and enzyme active site. Examples of this class of enzyme have been identified in the cytoplasm of all the eukaryotic cells as well as in the periplasm of several bacterial species. Copper-zinc superoxide dismutases are robust enzymes that are highly resistant to proteolytic digestion and denaturing by urea and SDS. In addition to the compact structure of the enzymes, the presence of the metal ions and intrasubunit disulfide bonds is believed to be responsible for enzyme stability. The enzymes undergo reversible denaturation at temperatures as high as 70° C. (Battistoni, A. et al. (1998) J. Biol. Chem. 273:5655-5661).

[0564] Overexpression of superoxide dismutase has been implicated in enhancing freezing tolerance of transgenic plants as well as providing resistance to environmental toxins such as the diphenyl ether herbicide, acifluorfen (McKersie, B. D. et al. (1993) Plant Physiol. 103:1155-1163). In addition, yeast cells become more resistant to freeze-thaw damage following exposure to hydrogen peroxide, which causes the yeast cells to adapt to further peroxide stress by upregulating expression of superoxide dismutases. In this study, mutations to yeast superoxide dismutase genes had a more detrimental effect on freeze-thaw resistance than mutations which affected the regulation of glutathione metabolism, long suspected of being important in determining an organism's survival through the process of cryopreservation (Jong-In Park, J.-I. et al. (1998) J. Biol. Chem. 273:22921-22928).

[0565] Expression of superoxide dismutase is also associated with Mycobacterium tuberculosis, the organism that causes tuberculosis. Superoxide dismutase is one of the ten major proteins secreted by M. tuberculosis and its expression is upregulated approximately 5-fold in response to oxidative stress. M. tuberculosis expresses almost two orders of magnitude more superoxide dismutase than the nonpathogenic mycobacterium M. smegmatis, and secretes a much higher proportion of the expressed enzyme. The result is the secretion of about 350-fold more enzyme by M. tuberculosis than M. smegmatis, providing substantial resistance to oxidative stress (Harth, G. and M. A. Horwitz (1999) J. Biol. Chem. 274:4281-4292).

[0566] The reduced expression of copper-zinc superoxide dismutases, as well as other enzymes with anti-oxidant capabilities, has been implicated in the early stages of cancer. The expression of copper-zinc superoxide dismutases is reduced in prostatic intraepithelial neoplasia and prostate carcinomas (Bostwick, D. G. (2000) Cancer 89:123-134).

Phosphoesterases

[0567] Phosphotriesterases (PTE) are enzymes that hydrolyze toxic organophosphorus compounds and have been isolated from a variety of tissues. Phosphotriesterases play a central role in the detoxification of insecticides by mammals. Birds and insects lack PTE, and as a result have reduced tolerance for organophosphorus compounds (Vilanova, E. and M. A. Sogorb (1999) Crit. Rev. Toxicol. 29:21-57). Phosphotriesterase activity varies among individuals and is lower in infants than adults. PTE knockout mice are markedly more sensitive to the organophosphate-based toxins diazoxon and chlorpyrifos oxon (Furlong, C. E., et al. (2000) Neurotoxicology 21:91-100). Phosphotriesterase is also implicated in atherosclerosis and diseases involving lipoprotein metabolism.

[0568] Glycerophosphoryl diester phosphodiesterase (also known as glycerophosphodiester phosphodiesterase) is a phosphodiesterase which hydrolyzes deacetylated phospholipid glycerophosphodiesters to produce sn-glycerol-3-phosphate and an alcohol. Glycerophosphocholine, glycerophosphoethanolamine, glycerophosphoglycerol, and glycerophosphoinositol are examples of substrates for glycerophosphoryl diester phosphodiesterases. A glycerophosphoryl diester phosphodiesterase from E. coli has broad specificity for glycerophosphodiester substrates (Larson, T. J. et al. (1983) J. Biol. Chem. 248:5428-5432).

[0569] Cyclic nucleotide phosphodiesterases (PDEs) are crucial enzymes in the regulation of the cyclic nucleotides cAMP and cGMP. cAMP and cGMP function as intracellular second messengers to transduce a variety of extracellular signals including hormones, light, and neurotransmitters. PDEs degrade cyclic nucleotides to their corresponding monophosphates, thereby regulating the intracellular concentrations of cyclic nucleotides and their effects on signal transduction. Due to their roles as regulators of signal transduction, PDEs have been extensively studied as chemotherapeutic targets (Perry, M. J. and G. A. Higgs (1998) Curr. Opin. Chem. Biol. 2:472-481; Torphy, T. J. (1998) Am. J. Respir. Crit. Care Med. 157:351-370).

[0570] Families of mammalian PDEs have been classified based on their substrate specificity and affinity, sensitivity to cofactors, and sensitivity to inhibitory agents (Beavo, J. A. (1995) Physiol. Rev. 75:725-748; Conti, M. et al. (1995) Endocrine Rev. 16:370-389). Several of these families contain distinct genes, many of which are expressed in different tissues as splice variants. Within PDE families, there are multiple isozymes and multiple splice variants of these isozymes (Conti, M. and S.-L. C. Jin (1999) Prog. Nucleic Acid Res. Mol. Biol. 63:1-38). The existence of multiple PDE families, isozymes, and splice variants is an indication of the variety and complexity of the regulatory pathways involving cyclic nucleotides (Houslay, M. D. and G. Milligan (1997) Trends Biochem. Sci. 22:217-224).

[0571] Type 1 PDEs (PDE1s) are Ca²⁺/calmodulin-dependent and appear to be encoded by at least three different genes, each having at least two different splice variants (Kakkar, R. et al. (1999) Cell Mol. Life Sci. 55:1164-1186). PDE1s have been found in the lung, heart, and brain. Some PDE1 isozymes are regulated in vitro by phosphorylation/dephosphorylation. Phosphorylation of these PDE1 isozymes decreases the affinity of the enzyme for calmodulin, decreases PDE activity, and increases steady state levels of cAMP (Kakkar et al., supra). PDE1s may provide useful therapeutic targets for disorders of the central nervous system and the cardiovascular and immune systems, due to the involvement of PDE1s in both cyclic nucleotide and calcium signaling (Perry and Higgs, supra).

[0572] PDE2s are cGMP-stimulated PDEs that have been found in the cerebellum, neocortex, heart, kidney, lung, pulmonary artery, and skeletal muscle (Sadhu, K. et al. (1999) J. Histochem. Cytochem. 47:895-906). PDE2s are thought to mediate the effects of cAMP on catecholamine secretion, participate in the regulation of aldosterone (Beavo, supra), and play a role in olfactory signal transduction (Juilfs, D. M. et al. (1997) Proc. Natl. Acad. Sci. USA 94:3388-3395). PDE3s have high affinity for both cGMP and cAMP, and so these cyclic nucleotides act as competitive substrates for PDE3s. PDE3s play roles in stimulating myocardial contractility, inhibiting platelet aggregation, relaxing vascular and airway smooth muscle, inhibiting proliferation of T-lymphocytes and cultured vascular smooth muscle cells, and regulating catecholamine-induced release of free fatty acids from adipose tissue. The PDE3 family of phosphodiesterases is sensitive to specific inhibitors such as cilostamide, enoximone, and lixazinone. Isozymes of PDE3 can be regulated by cAMP-dependent protein kinase, or by insulin-dependent kinases (Degerman, E. et al. (1997) J. Biol. Chem. 272:6823-6826).

[0573] PDE4s are specific for cAMP; are localized to airway smooth muscle, the vascular endothelium, and all inflammatory cells; and can be activated by cAMP-dependent phosphorylation. Since elevation of cAMP levels can lead to suppression of inflammatory cell activation and to relaxation of bronchial smooth muscle, PDE4s have been studied extensively as possible targets for novel anti-inflammatory agents, with special emphasis placed on the discovery of asthma treatments. PDE4 inhibitors are currently undergoing clinical trials as treatments for asthma, chronic obstructive pulmonary disease, and atopic eczema. All four known isozymes of PDE4 are susceptible to the inhibitor rolipram, a compound which has been shown to improve behavioral memory in mice (Barad, M. et al. (1998) Proc. Natl. Acad. Sci. USA 95:15020-15025). PDE4 inhibitors have also been studied as possible therapeutic agents against acute lung injury, endotoxemia, rheumatoid arthritis, multiple sclerosis, and various neurological and gastrointestinal indications (Doherty, A. M. (1999) Curr. Opin. Chem. Biol. 3:466-473).

[0574] PDE5 is highly selective for cGMP as a substrate (Turko, I. V. et al. (1998) Biochemistry 37:4200-4205), and has two allosteric cGMP-specific binding sites (McAllister-Lucas, L. M. et al. (1995) J. Biol. Chem. 270:30671-30679). Binding of cGMP to these allosteric binding sites seems to be important for phosphorylation of PDE5 by cGMP-dependent protein kinase rather than for direct regulation of catalytic activity. High levels of PDE5 are found in vascular smooth muscle, platelets, lung, and kidney. The inhibitor zaprinast is effective against PDE5 and PDE1s. Modification of zaprinast to provide specificity against PDE5 has resulted in sildenafil (VIAGRA; Pfizer, Inc., New York N.Y.), a treatment for male erectile dysfunction (Terrett, N. et al. (1996) Bioorg. Med. Chem. Lett. 6:1819-1824). Inhibitors of PDE5 are currently being studied as agents for cardiovascular therapy (Perry and Higgs, supra).

[0575] PDE6s, the photoreceptor cyclic nucleotide phosphodiesterases, are crucial components of the phototransduction cascade. In association with the G-protein transducin, PDE6s hydrolyze cGMP to regulate cGMP-gated cation channels in photoreceptor membranes. In addition to the cGMP-binding active site, PDE6s also have two high-affinity cGMP-binding sites which are thought to play a regulatory role in PDE6 function (Artemyev, N. O. et al. (1998) Methods

14:93-104). Defects in PDE6s have been associated with retinal disease. Retinal degeneration in the rd mouse (Yan, W. et al. (1998) Invest. Ophthalmol. Vis. Sci. 39:2529-2536), autosomal recessive retinitis pigmentosa in humans (Danciger, M. et al. (1995) Genomics 30:1-7), and rod/cone dysplasia 1 in Irish Setter dogs (Suber, M. L. et al. (1993) Proc. Natl. Acad. Sci. USA 90:3968-3972) have been attributed to mutations in the PDE6B gene.

[0576] The PDE7 family of PDEs consists of only one known member having multiple splice variants (Bloom, T. J. and J. A. Beavo (1996) Proc. Natl. Acad. Sci. USA 93:14188-14192). PDE7s are cAMP specific, but little else is known about their physiological function. Although mRNAs encoding PDE7s are found in skeletal muscle, heart, brain, lung, kidney, and pancreas, expression of PDE7 proteins is restricted to specific tissue types (Han, P. et al. (1997) J. Biol. Chem. 272:16152-16157; Perry and Higgs, supra). PDE7s are very closely related to the PDE4 family; however, PDE7s are not inhibited by rolipram, a specific inhibitor of PDE4s (Beavo, supra).

[0577] PDE8s are cAMP specific, and are closely related to the PDE4 family. PDE8s are expressed in thyroid gland, testis, eye, liver, skeletal muscle, heart, kidney, ovary, and brain. The cAMP-hydrolyzing activity of PDE8s is not inhibited by the PDE inhibitors rolipram, vinpocetine, milrinone, IBMX (3-isobutyl-1-methylxanthine), or zaprinast, but PDE8s are inhibited by dipyridamole (Fisher, D. A. et al. (1998) Biochem. Biophys. Res. Commun. 246:570-577; Hayashi, M. et al. (1998) Biochem. Biophys. Res. Commun. 250:751-756; Soderling, S. H. et al. (1998) Proc. Natl. Acad. Sci. USA 95:8991-8996).

[0578] PDE9s are cGMP specific and most closely resemble the PDE8 family of PDEs. PDE9s are expressed in kidney, liver, lung, brain, spleen, and small intestine. PDE9s are not inhibited by sildenafil (VIAGRA; Pfizer, Inc., New York N.Y.), rolipram, vinpocetine, dipyridamole, or IBMX (3-isobutyl-1-methylxanthine), but they are sensitive to the PDE5 inhibitor zaprinast (Fisher, D. A. et al. (1998) J. Biol. Chem. 273:15559-15564; Soderling, S. H. et al. (1998) J. Biol. Chem. 273:15553-15558).

[0579] PDE10s are dual-substrate PDEs, hydrolyzing both cAMP and cGMP. PDE10s are expressed in brain, thyroid, and testis (Soderling, S. H. et al. (1999) Proc. Natl. Acad. Sci. USA 96:7071-7076; Fujishige, K. et al. (1999) J. Biol. Chem. 274:18438-18445; Loughney, K. et al. (1999) Gene 234:109-117).

[0580] PDEs are composed of a catalytic domain of about 270-300 amino acids, an N-terminal regulatory domain responsible for binding cofactors, and, in some cases, a hydrophilic C-terminal domain of unknown function (Conti and Jin, supra). A conserved, putative zinc-binding motif has been identified in the catalytic domain of all PDEs. N-terminal regulatory domains include non-catalytic cGMP-binding domains in PDE2s, PDE5s, and PDE6s; calmodulin-binding domains in PDE1s; and domains containing phosphorylation sites in PDE3s and PDE4s. In PDE5, the N-terminal cGMP-binding domain spans about 380 amino acid residues and comprises tandem repeats of a conserved sequence motif (McAllister-Lucas, L. M. et al. (1993) J. Biol. Chem. 268:22863-22873). The NKXnD motif has been shown by mutagenesis to be important for cGMP binding (Turko, I. V. et al. (1996) J. Biol. Chem. 271:22240-22244). PDE families display approximately 30% amino acid identity within the catalytic domain; however, isozymes within the same family typically display about 85-95% identity in this region (e.g. PDE4A vs PDE4B). Furthermore, within a family there is extensive similarity (>60%) outside the catalytic domain, while across families there is little or no sequence similarity outside this domain.

[0581] Many of the constituent functions of immune and inflammatory responses are inhibited by agents that increase intracellular levels of cAMP (Verghese, M. W. et al. (1995) Mol. Pharmacol. 47:1164-1171). A variety of diseases have been attributed to increased PDE activity and associated with decreased levels of cyclic nucleotides. For example, a form of diabetes insipidus in mice has been associated with increased PDE4 activity, an increase in low-Km cAMP PDE activity has been reported in leukocytes of atopic patients, and PDE3 has been associated with cardiac disease.

[0582] Many inhibitors of PDEs have undergone clinical evaluation (Perry and Higgs, supra; Torphy, T. J. (1998) Am. J. Respir. Crit. Care Med. 157:351-370). PDE3 inhibitors are being developed as antithrombotic agents, antihypertensive agents, and as cardiotonic agents useful in the treatment of congestive heart failure. Rolipram, a PDE4 inhibitor, has been used in the treatment of depression, and other PDE4 inhibitors have an anti-inflammatory effect. Rolipram may inhibit HIV-1 replication (Angel, J. B. et al. (1995) AIDS 9:1137-1144). Additionally, rolipram suppresses the production of cytokines such as TNF-α and -β and interferon-γ, and thus is effective against encephalomyelitis. Rolipram may also be effective in treating tardive dyskinesia and multiple sclerosis (Sommer, N. et al. (1995) Nat. Med. 1:244-248; Sasaki, H. et al. (1995) Eur. J. Pharmacol. 282:71-76). Theophylline is a nonspecific PDE inhibitor used in treatment of bronchial asthma and other respiratory diseases. Theophylline is believed to act on airway smooth muscle function and in an anti-inflammatory or immunomodulatory capacity (Banner, K. H. and C. P. Page (1995) Eur. Respir. J. 8:996-1000). Pentoxifylline is another nonspecific PDE inhibitor used in the treatment of intermittent claudication and diabetes-induced peripheral vascular disease. Pentoxifylline is also known to block TNF-α production and may inhibit HIV-1 replication (Angel et al., supra).

[0583] PDEs have been reported to affect cellular proliferation of a variety of cell types (Conti et al. (1995) Endocrine Rev. 16:370-389) and have been implicated in various cancers. Growth of prostate carcinoma cell lines DU145 and LNCaP was inhibited by delivery of cAMP derivatives and PDE inhibitors (Bang, Y. J. et al. (1994) Proc. Natl. Acad. Sci. USA 91:5330-5334). These cells also showed a permanent conversion in phenotype from epithelial to neuronal morphology. It has also been suggested that PDE inhibitors can regulate mesangial cell proliferation (Matousovic, K. et al. (1995) J. Clin. Invest. 96:401-410) and lymphocyte proliferation (Joulain, C. et al. (1995) J. Lipid Mediat. Cell Signal. 11:63-79). One cancer treatment involves intracellular delivery of PDEs to particular cellular compartments of tumors, resulting in cell death (Deonarain, M. P. and A. A. Epenetos (1994) Br. J. Cancer 70:786-794).

[0584] Members of the UDP glucuronyltransferase family (UGTs) catalyze the transfer of a glucuronic acid group from the cofactor uridine diphosphate-glucuronic acid (UDP-glucuronic acid) to a substrate. The transfer is generally to a nucleophilic heteroatom (O, N, or S). Substrates include xenobiotics which have been functionalized by Phase I reactions, as well as endogenous compounds such as bilirubin, steroid hormones, and thyroid hormones. Products of glucuronidation are excreted in urine if the molecular weight of the substrate is less than about 250 g/mol, whereas larger glucuronidated substrates are excreted in bile.

[0585] UGTs are located in the microsomes of liver, kidney, intestine, skin, brain, spleen, and nasal mucosa, where they are on the same side of the endoplasmic reticulum membrane as cytochrome P450 enzymes and flavin-containing monooxygenases. UGTs have a C-terminal membrane-spanning domain which anchors them in the endoplasmic reticulum membrane, and a conserved signature domain of about 50 amino acid residues in their C-terminal section (PROSITE PDOC00359 UDP-glycosyltransferase signature).

[0586] UGTs involved in drug metabolism are encoded by two gene families, UGT1 and UGT2. Members of the UGT1 family result from alternative splicing of a single gene locus, which has a variable substrate binding domain and constant region involved in cofactor binding and membrane insertion. Members of the UGT2 family are encoded by separate gene loci, and are divided into two subfamilies, UGT2A and UGT2B. The 2A subfamily is expressed in olfactory epithelium, and the 2B subfamily is expressed in liver microsomes. Mutations in UGT genes are associated with hyperbilirubinemia (OMIM #143500 Hyperbilirubinemia I); Crigler-Najjar syndrome, characterized by intense hyperbilirubinemia from birth (OMIM #218800 Crigler-Najjar syndrome); and a milder form of hyperbilirubinemia termed Gilbert's disease (OMIM #191740 UGT1).

[0587] Two soluble thioesterases involved in fatty acid biosynthesis have been isolated from mammalian tissues, one which is active only toward long-chain fatty-acyl thioesters and one which is active toward thioesters with a wide range of fatty-acyl chain-lengths. These thioesterases catalyze the chain-terminating step in the de novo biosynthesis of fatty acids. Chain termination involves the hydrolysis of the thioester bond which links the fatty acyl chain to the 4'-phosphopantetheine prosthetic group of the acyl carrier protein (ACP) subunit of the fatty acid synthase (Smith, S. (1981a) Methods Enzymol. 71:181-188; Smith, S. (1981b) Methods Enzymol. 71:188-200).

[0588] E. coli contains two soluble thioesterases, thioesterase I which is active only toward long-chain acyl thioesters, and thioesterase II (TEII) which has a broad chain-length specificity (Naggert, J. et al. (1991) J. Biol. Chem. 266:11044-11050). E. coli TEII does not exhibit sequence similarity with either of the two types of mammalian thioesterases which function as chain-terminating enzymes in de novo fatty acid biosynthesis. Unlike the mammalian thioesterases, E. coli TEII lacks the characteristic serine active site Gly-X-Ser-X-Gly sequence motif and is not inactivated by the serine modifying agent diisopropyl fluorophosphate. However, modification of histidine 58 by iodoacetamide and diethylpyrocarbonate abolished TEII activity. Overexpression of TEII did not alter fatty acid content in E. coli, which suggests that it does not function as a chain-terminating enzyme in fatty acid biosynthesis (Naggert et al., supra). For that reason, Naggert et al. (supra) proposed that the physiological substrates for E. coli TEII may be coenzyme A (CoA)-fatty acid esters instead of ACP-phosphopantetheine-fatty acid esters.

Carboxylesterases

[0589] Mammalian carboxylesterases are a multigene family expressed in a variety of tissues and cell types. Acetylcholinesterase and carboxylesterase are grouped into the serine superfamily of esterases (B-esterases). Other carboxylesterases include thyroglobulin, thrombin, Factor IX, gliotactin, and plasminogen. Carboxylesterases catalyze the hydrolysis of ester and amide groups from molecules and are involved in detoxification of drugs, environmental toxins, and carcinogens. Substrates for carboxylesterases include short- and long-chain acyl-glycerols, acylcarnitine, carbonates, dipivefrin hydrochloride, cocaine, salicylates, capsaicin, palmitoyl-coenzyme A, imidapril, haloperidol, pyrrolizidine alkaloids, steroids, p-nitrophenyl acetate, malathion, butanilicaine, and isocarboxazide. Carboxylesterases are also important for the conversion of prodrugs to free acids, which may be the active form of the drug (e.g., lovastatin, used to lower blood cholesterol) (reviewed in Satoh, T. and Hosokawa, M. (1998) Annu. Rev. Pharmacol. Toxicol. 38:257-288). Neuroligins are a class of molecules that (i) have N-terminal signal sequences, (ii) resemble cell surface receptors, (iii) contain carboxylesterase domains, (iv) are highly expressed in the brain, and (v) bind to neurexins in a calcium-dependent manner. Despite the homology to carboxylesterases, neuroligins lack the active site serine residue, implying a role in substrate binding rather than catalysis (Ichtchenko, K. et al. (1996) J. Biol. Chem. 271:2676-2682).

Squalene Epoxidase

[0590] Squalene epoxidase (squalene monooxygenase, SE) is a microsomal membrane-bound, FAD-dependent oxidoreductase that catalyzes the first oxygenation step in the sterol biosynthetic pathway of eukaryotic cells. Cholesterol is an essential structural component of cytoplasmic membranes acquired via the LDL receptor-mediated pathway or the biosynthetic pathway. SE converts squalene to 2,3(S)-oxidosqualene, which is then converted to lanosterol and then cholesterol.

[0591] High serum cholesterol levels result in the formation of atherosclerotic plaques in the arteries of higher organisms. This deposition of highly insoluble lipid material onto the walls of essential blood vessels results in decreased blood flow and potential necrosis. HMG-CoA reductase is responsible for the first committed step in cholesterol biosynthesis, conversion of 3-hydroxy-3-methylglutaryl CoA (HMG-CoA) to mevalonate. HMG-CoA reductase is the target of a number of pharmaceutical compounds designed to lower plasma cholesterol levels, but inhibition of HMG-CoA reductase also results in the reduced synthesis of non-sterol intermediates required for other biochemical pathways. Since SE catalyzes a rate-limiting reaction that occurs later in the sterol synthesis pathway with cholesterol as the only end product, SE is a better target for the design of anti-hyperlipidemic drugs (Nakamura, Y. et al. (1996) 271:8053-8056).

Epoxide Hydrolases

[0592] Epoxide hydrolases catalyze the addition of water to epoxide-containing compounds, thereby hydrolyzing epoxides to their corresponding 1,2-diols. They are related to bacterial haloalkane dehalogenases and show sequence similarity to other members of the α/β hydrolase fold family of enzymes. This family of enzymes is important for the detoxification of xenobiotic epoxide compounds which are often highly electrophilic and destructive when introduced. Examples of epoxide hydrolase reactions include the hydrolysis of leukotoxin to leukotoxin diol, and of isoleukotoxin to isoleukotoxin diol. Leukotoxins alter membrane permeability and ion transport and cause inflammatory responses. In addition, epoxide carcinogens are produced by cytochrome P450 as intermediates in the detoxification of drugs and environmental toxins. Epoxide hydrolases possess a catalytic triad composed of Asp, Asp, and His (Arand, M. et al. (1996) J. Biol. Chem. 271:4223-4229; Rink, R. et al. (1997) J. Biol. Chem. 272:14650-14657; Argiriadi, M. A. et al. (2000) J. Biol. Chem. 275:15265-15270).

Enzymes Involved in Tyrosine Catalysis

[0593] The degradation of the amino acid tyrosine, to either succinate and pyruvate or fumarate and acetoacetate, requires a large number of enzymes and generates a large number of intermediate compounds. In addition, many xenobiotic compounds may be metabolized using one or more reactions that are part of the tyrosine catabolic pathway. Enzymes involved in the degradation of tyrosine to succinate and pyruvate (e.g., in Arthrobacter species) include 4-hydroxyphenylpyruvate oxidase, 4-hydroxyphenylacetate 3-hydroxylase, 3,4-dihydroxyphenylacetate 2,3-dioxygenase, 5-carboxymethyl-2-hydroxymuconic semialdehyde dehydrogenase, trans,cis-5-carboxymethyl-2-hydroxymuconate isomerase, homoprotocatechuate isomerase/decarboxylase, cis-2-oxohept-3-ene-1,7-dioate hydratase, 2,4-dihydroxyhept-trans-2-ene-1,7-dioate aldolase, and succinic semialdehyde dehydrogenase. Enzymes involved in the degradation of tyrosine to fumarate and acetoacetate (e.g., in Pseudomonas species) include 4-hydroxyphenylpyruvate dioxygenase, homogentisate 1,2-dioxygenase, maleylacetoacetate isomerase, fumarylacetoacetate and 4-hydroxyphenylacetate. Additional enzymes associated with tyrosine metabolism in different organisms include 4-chlorophenylacetate-3,4-dioxygenase, aromatic aminotransferase, 5-oxopent-3-ene-1,2,5-tricarboxylate decarboxylase, 2-oxo-hept-3-ene-1,7-dioate hydratase, and 5-carboxymethyl-2-hydroxymuconate isomerase (Ellis, L. B. M. et al. (1999) Nucleic Acids Res. 27:373-376; Wackett, L. P. and Ellis, L. B. M. (1996) J. Microbiol. Meth. 25:91-93; and Schmidt, M. (1996) Amer. Soc. Microbiol. News 62:102).

[0594] In humans, acquired or inherited genetic defects in enzymes of the tyrosine degradation pathway may result in hereditary tyrosinemia. One form of this disease, hereditary tyrosinemia 1 (HT1), is caused by a deficiency in the enzyme fumarylacetoacetate hydrolase, the last enzyme in the pathway in organisms that metabolize tyrosine to fumarate and acetoacetate. HT1 is characterized by progressive liver damage beginning at infancy, and increased risk for liver cancer (Endo, F. et al. (1997) J. Biol. Chem. 272:24426-24432).

Exemplary Agricultural Enzyme Uses

[0595] Enzymes with known function are useful in solving a number of different agricultural problems. The following list of exemplary problems does not purport to be exhaustive.

[0596] One exemplary problem is fixation of soil nitrogen. Enzymatic solutions to this problem are described in, for example, “Management of Biological Nitrogen Fixation for the Development of More Productive and Sustainable Agricultural Systems”, which presents extended versions of papers presented in the Symposium on Biological Nitrogen Fixation for Sustainable Agriculture at the 15th Congress of Soil Science, Acapulco, Mexico 1994 (Developments in Plant and Soil Sciences, Vol. 65; Ladha, J. K.; Peoples, M. B. (Eds.); published by Springer-Verlag. Reprinted from PLANT AND SOIL (1995) 174:1-2, ISBN: 978-0-7923-3413-2).

[0597] Another exemplary problem is feed digestibility, for example in poultry and swine. Enzymatic solutions to this problem are described in, for example, “Enzymes in Poultry and Swine Nutrition” by Marquardt and Han (Proceedings of the first Chinese Symposium on Feed Enzymes, Nanjing Agricultural University, Nanjing, People's Republic of China, 6-8 May 1996; International Research and Development Center, Ottawa; ISBN 088936821X). Papers presented in this reference indicate that many exciting developments can be expected regarding use of enzymes in feeds, particularly with the use of recombinant enzymes for a wide range of animals and animal feedstuffs. Enzymes not only will enable livestock and poultry producers to economically use new feedstuffs, but will also prove to be environmentally friendly, as they reduce the pollution associated with animal production.

[0598] “Enzymes in the Environment: Activity, Ecology, and Applications”, edited by Burns and Dick (Books in Soils, Plants, and the Environment (2002) Volume 84; CRC Press; ISBN: 9780824706142), points out the great unmet need for a reliable means of classifying enzymes functionally as disclosed hereinabove.

[0599] According to the Food and Agriculture Organization of the United Nations:

[0600] “Bioprocessing which involves the use of enzymes and microorganisms for the conversion of raw food materials into a diversity of products, offers tremendous opportunity for stimulating agro-industrial development in developing countries. The processes involved are scaleable, environmentally friendly, and can be economically applied and linked to existing practices in these countries. Many of the traditional food bioprocessing techniques used in developing countries however require considerable scientific and technological improvement.”

[0601] The Food and Agriculture Organization of the United Nations has also published a pamphlet entitled “SMALL-SCALE PROCESSING OF MICROBIAL PESTICIDES” (Taborsky (1992) FAO AGRICULTURAL SERVICES BULLETIN No. 96; Food and Agriculture Organization of the United Nations, Rome 1992), which describes use of chitinase and/or other enzymes in decomposition of insect integuments.

[0602] Optionally, a bacterial polypeptide toxin, optionally an enzyme, is overexpressed in plants. This strategy has previously been employed with Bacillus thuringiensis toxins. In an exemplary embodiment of the invention, toxins from other bacteria are identified using exemplary methods disclosed herein.

Exemplary Formulations

[0603] In an exemplary embodiment of the invention, a polypeptide according to one or more of SEQ ID Nos.: 77,838 to 198,923 is formulated so that the enzyme(s) are efficiently presented to their substrates for substrate processing. Formulation optionally reflects intended use. In an exemplary embodiment of the invention, the formulation includes pH adjusters (e.g. buffering agents) and/or osmotic adjusters (e.g. specific salts and/or ions) to contribute to enzymatic activity. The following listing of exemplary formulations does not limit the scope of the invention.

[0604] Optionally, the formulation is provided as a pharmaceutical composition.

[0605] As used herein a “pharmaceutical composition” refers to a preparation of one or more of the active ingredients described herein with other chemical components such as physiologically suitable carriers and excipients. The purpose of a pharmaceutical composition is to facilitate administration of a compound to an organism.

[0606] Herein the term “active ingredient” refers to the nucleic acid construct accountable for the biological effect.

[0607] Hereinafter, the phrases “physiologically acceptable carrier” and “pharmaceutically acceptable carrier”, which may be interchangeably used, refer to a carrier or a diluent that does not cause significant irritation to an organism and does not abrogate the biological activity and properties of the administered compound. An adjuvant is included under these phrases.

[0608] Herein the term “excipient” refers to an inert substance added to a pharmaceutical composition to further facilitate administration of an active ingredient. Examples, without limitation, of excipients include calcium carbonate, calcium phosphate, various sugars and types of starch, cellulose derivatives, gelatin, vegetable oils and polyethylene glycols.

[0609] Techniques for formulation and administration of drugs may be found in “Remington's Pharmaceutical Sciences,” Mack Publishing Co., Easton, Pa., latest edition, which is incorporated herein by reference.

[0610] Suitable routes of administration may, for example, include oral, rectal, transmucosal, especially transnasal, intestinal or parenteral delivery, including intramuscular, subcutaneous and intramedullary injections as well as intrathecal, direct intraventricular, intravenous, intraperitoneal, intranasal, or intraocular injections.

[0611] Alternately, one may administer the pharmaceutical composition in a local rather than systemic manner, for example, via injection of the pharmaceutical composition directly into a tissue region of a patient.

[0612] Pharmaceutical compositions of the present invention may be manufactured by processes well known in the art, e.g., by means of conventional mixing, dissolving, granulating, dragee-making, levigating, emulsifying, encapsulating, entrapping or lyophilizing processes.

[0613] Pharmaceutical compositions for use in accordance with the present invention thus may be formulated in conventional manner using one or more physiologically acceptable carriers comprising excipients and auxiliaries, which facilitate processing of the active ingredients into preparations which can be used pharmaceutically. Proper formulation is dependent upon the route of administration chosen.

[0614] For injection, the active ingredients of the pharmaceutical composition may be formulated in aqueous solutions, preferably in physiologically compatible buffers such as Hank's solution, Ringer's solution, or physiological salt buffer. For transmucosal administration, penetrants appropriate to the barrier to be permeated are used in the formulation. Such penetrants are generally known in the art.

[0615] For oral administration, the pharmaceutical composition can be formulated readily by combining the active compounds with pharmaceutically acceptable carriers well known in the art. Such carriers enable the pharmaceutical composition to be formulated as tablets, pills, dragees, capsules, liquids, gels, syrups, slurries, suspensions, and the like, for oral ingestion by a patient. Pharmacological preparations for oral use can be made using a solid excipient, optionally grinding the resulting mixture, and processing the mixture of granules, after adding suitable auxiliaries if desired, to obtain tablets or dragee cores. Suitable excipients are, in particular, fillers such as sugars, including lactose, sucrose, mannitol, or sorbitol; cellulose preparations such as, for example, maize starch, wheat starch, rice starch, potato starch, gelatin, gum tragacanth, methyl cellulose, hydroxypropylmethyl-cellulose, sodium carboxymethylcellulose; and/or physiologically acceptable polymers such as polyvinylpyrrolidone (PVP). If desired, disintegrating agents may be added, such as cross-linked polyvinyl pyrrolidone, agar, or alginic acid or a salt thereof such as sodium alginate.

[0616] Dragee cores are provided with suitable coatings. For this purpose, concentrated sugar solutions may be used which may optionally contain gum arabic, talc, polyvinyl pyrrolidone, carbopol gel, polyethylene glycol, titanium dioxide, lacquer solutions and suitable organic solvents or solvent mixtures. Dyestuffs or pigments may be added to the tablets or dragee coatings for identification or to characterize different combinations of active compound doses.

[0617] Pharmaceutical compositions which can be used orally include push-fit capsules made of gelatin as well as soft, sealed capsules made of gelatin and a plasticizer, such as glycerol or sorbitol. The push-fit capsules may contain the active ingredients in admixture with filler such as lactose, binders such as starches, lubricants such as talc or magnesium stearate and, optionally, stabilizers. In soft capsules, the active ingredients may be dissolved or suspended in suitable liquids, such as fatty oils, liquid paraffin, or liquid polyethylene glycols. In addition, stabilizers may be added. All formulations for oral administration should be in dosages suitable for the chosen route of administration.

[0618] For buccal administration, the compositions may take the form of tablets or lozenges formulated in conventional manner.

[0619] For administration by nasal inhalation, the active ingredients for use according to the present invention are conveniently delivered in the form of an aerosol spray presentation from a pressurized pack or a nebulizer with the use of a suitable propellant, e.g., dichlorodifluoromethane, trichlorofluoromethane, dichloro-tetrafluoroethane or carbon dioxide. In the case of a pressurized aerosol, the dosage unit may be determined by providing a valve to deliver a metered amount. Capsules and cartridges of, e.g., gelatin for use in a dispenser may be formulated containing a powder mix of the compound and a suitable powder base such as lactose or starch.

[0620] The pharmaceutical composition described herein may be formulated for parenteral administration, e.g., by bolus injection or continuous infusion. Formulations for injection may be presented in unit dosage form, e.g., in ampoules or in multidose containers with, optionally, an added preservative. The compositions may be suspensions, solutions or emulsions in oily or aqueous vehicles, and may contain formulatory agents such as suspending, stabilizing and/or dispersing agents.

[0621] Pharmaceutical compositions for parenteral administration include aqueous solutions of the active preparation in water-soluble form. Additionally, suspensions of the active ingredients may be prepared as appropriate oily or water based injection suspensions. Suitable lipophilic solvents or vehicles include fatty oils such as sesame oil, or synthetic fatty acids esters such as ethyl oleate, triglycerides or liposomes. Aqueous injection suspensions may contain substances which increase the viscosity of the suspension, such as sodium carboxymethyl cellulose, sorbitol or dextran. Optionally, the suspension may also contain suitable stabilizers or agents which increase the solubility of the active ingredients to allow for the preparation of highly concentrated solutions.

[0622] Alternatively, the active ingredient may be in powder form for constitution with a suitable vehicle, e.g., sterile, pyrogen-free water based solution, before use.

[0623] The pharmaceutical composition of the present invention may also be formulated in rectal compositions such as suppositories or retention enemas, using, e.g., conventional suppository bases such as cocoa butter or other glycerides.

[0624] Pharmaceutical compositions suitable for use in context of the present invention include compositions wherein the active ingredients are contained in an amount effective to achieve the intended purpose. More specifically, a therapeutically effective amount means an amount of active ingredients (nucleic acid construct) effective to prevent, alleviate or ameliorate symptoms of a disorder (e.g., ischemia) or prolong the survival of the subject being treated.

[0625] Determination of a therapeutically effective amount is well within the capability of those skilled in the art, especially in light of the detailed disclosure provided herein.

[0626] For any preparation used in the methods of the invention, the therapeutically effective amount or dose can be estimated initially from in vitro and cell culture assays. For example, a dose can be formulated in animal models to achieve a desired concentration or titer. Such information can be used to more accurately determine useful doses in humans.

[0627] Toxicity and therapeutic efficacy of the active ingredients described herein can be determined by standard pharmaceutical procedures in vitro, in cell cultures or experimental animals. The data obtained from these in vitro and cell culture assays and animal studies can be used in formulating a range of dosage for use in humans. The dosage may vary depending upon the dosage form employed and the route of administration utilized. The exact formulation, route of administration and dosage can be chosen by the individual physician in view of the patient's condition. (See e.g., Fingl, et al., 1975, in “The Pharmacological Basis of Therapeutics”, Ch. 1 p. 1).

[0628] Dosage amount and interval may be adjusted individually to provide plasma or brain levels of the active ingredient that are sufficient to induce or suppress angiogenesis (minimal effective concentration, MEC). The MEC will vary for each preparation, but can be estimated from in vitro data. Dosages necessary to achieve the MEC will depend on individual characteristics and route of administration. Detection assays can be used to determine plasma concentrations.

[0629] Depending on the severity and responsiveness of the condition to be treated, dosing can be of a single or a plurality of administrations, with course of treatment lasting from several days to several weeks or until cure is effected or diminution of the disease state is achieved.

[0630] The amount of a composition to be administered will, of course, be dependent on the subject being treated, the severity of the affliction, the manner of administration, the judgment of the prescribing physician, etc.

[0631] Compositions of the present invention may, if desired, be presented in a pack or dispenser device, such as an FDA approved kit, which may contain one or more unit dosage forms containing the active ingredient. The pack may, for example, comprise metal or plastic foil, such as a blister pack. The pack or dispenser device may be accompanied by instructions for administration. The pack or dispenser may also be accommodated by a notice associated with the container in a form prescribed by a governmental agency regulating the manufacture, use or sale of pharmaceuticals, which notice is reflective of approval by the agency of the form of the compositions or human or veterinary administration. Such notice, for example, may be of labeling approved by the U.S. Food and Drug Administration for prescription drugs or of an approved product insert. Compositions comprising a preparation of the invention formulated in a compatible pharmaceutical carrier may also be prepared, placed in an appropriate container, and labeled for treatment of an indicated condition, as is further detailed above.

[0632] Optionally, the formulation is provided as a cosmetic preparation. The cosmetic preparation can comprise one or more topically applicable materials including, but not limited to, penetrating agents, oils, scents, colors, powders. According to various exemplary embodiments of the invention, a cosmetic preparation can be provided as a cream, a lotion, a gel, an eye-shadow, foundation makeup, rouge, nail polish, mascara, lip-liner or lipstick.

[0633] Optionally, the formulation is provided as an agricultural preparation. Agricultural preparations include, but are not limited to, feed additives, veterinary medications, sprays, liquids and foams.

[0634] Formulation of feed additives and veterinary medications involves similar considerations to those described hereinabove for pharmaceutical compositions.

[0635] Optionally, sprays are formulated for close application (e.g. from a tractor or using a handheld sprayer) or for application from a distance (e.g. via an irrigation system or from an airplane). In exemplary embodiments of the invention, sprays are applied to animals (e.g. for vaccination or parasite removal) or to plants (e.g. as herbicides or pesticides).

[0636] Optionally, the formulation is provided as a cleaning preparation. Cleaning preparations can include, in addition to the active polypeptide, one or more of a soap, a detergent, a surfactant, a wetting agent, an emulsifier and a solvent. The cleaning preparation can be provided in a wide variety of forms, including but not limited to, a spray (optionally with aerosol propellant), a cream, a gel and a liquid. In an exemplary embodiment of the invention, the cleaning preparation is provided in a package with dilution instructions. In other exemplary embodiments, the cleaning preparation is provided in a package at a “ready to use” concentration.

[0637] Additional objects, advantages and novel features of the present invention will become apparent to one ordinarily skilled in the art upon examination of the following examples, which are not intended to be limiting. Additionally, each of the various embodiments and aspects of the present invention as delineated hereinabove and as claimed in the claims section below finds experimental support in the following examples.

EXAMPLES

[0638] Reference is now made to the following examples, which together with the above descriptions illustrate the invention in a non limiting fashion.

[0639] The teachings of the present embodiments were used for predicting the function and/or affinity of an enzyme from its amino-acid sequence by searching therein for a motif of amino acids matching a predicting sequence of an enzyme database, and attributing to the unclassified enzyme a classifier in the form of an EC number. Example 1 below describes the procedure for the construction of an exemplary enzyme database. Example 2 below describes the procedure of classification of a first exemplary set of unclassified enzymes. Example 3 below describes the procedure for the construction of an additional exemplary enzyme database. Example 4 below describes theoretical considerations for characterization of an additional set of peptides using the database of Example 3. Example 5 below describes characterization of the metagenomic dataset of Example 3. Example 6 below presents an analysis of enzyme size. Example 7 below presents a characterization of the unknown enzyme set of Example 4 according to the database of Example 3. Example 8 below presents a correlation of predicting sequence (PS) sequences to EC functional classifications of known enzymes. Example 9 below describes exemplary detergent compositions. Example 10 below presents exemplary food processing compositions. Example 11 below presents exemplary compositions from ethanol production. Example 12 below presents a comparison of exemplary methods according to the invention to Prosite.

Example 1

Exemplary Enzyme Searchable Database

Methods

[0640] The motif extraction procedure described above was used for defining predicting sequences for almost all known enzymes and at all levels of the EC hierarchical classification. The procedure was separately applied to each one of the six EC main classes. The decrease functions $D_R$ and $D_L$ were defined as described hereinabove using the values $\eta_R = \eta_L = 0.8$. The statistical significance threshold $\alpha$ was 0.01.

[0641] Protein sequences annotated with EC numbers were extracted from the UniProt/Swiss-Prot database (Release 48.3, Oct. 25, 2005). The following sequences were removed from the database: (i) sequences shorter than 100 amino acids or longer than 1200 amino acids; (ii) sequences with imprecise annotation (e.g., indicated as “probable”/“hypothetical”/“putative” or with a partially specified EC number); (iii) enzymes that catalyze more than one reaction (e.g., indicated as “bifunctional” or annotated with more than one EC number).

[0642] Table 2 summarizes the statistics of the dataset.

TABLE 2

EC class          No. of sequences   No. of subclasses   No. of subsubclasses
oxidoreductases         9437                21                   81
transferases           16196                 9                   26
hydrolases             10901                10                   47
lyases                  5299                 7                   15
isomerases              2887                 6                   17
ligases                 6048                 6                   10
total                  50698                59                  196

[0643] The motif extraction procedure was used to define predicting sequences that are specific to one, and only one, branch of the EC hierarchical classification, excluding uniqueness within its descending branches. The procedures were also applied to an older release of Swiss-Prot, release 45 dated October, 2004, with the statistics summarized in Table 3.

TABLE 3

EC class          No. of sequences   No. of subclasses   No. of subsubclasses
oxidoreductases         7918                21                   79
transferases           12807                 9                   26
hydrolases              8982                10                   47
lyases                  4632                 7                   15
isomerases              2234                 6                   17
ligases                 4692                 6                   10
total                  41265                59                  194

Results

[0644] Following is a description of the analysis performed on the enzyme database constructed from the data summarized in Tables 2 and 3 above. The entire enzyme database, as constructed from the 50,698 enzymes summarized in Table 2, is provided in Appendix 1 below and further in Table 11 on the enclosed CD-ROM (file “Table-11.txt”). Below, a predicting sequence of level N is conveniently denoted PSN, and refers to a sequence which predicts its location on the EC tree at level N.

[0645] In some of the priority documents of the instant Application, PSN is referred to as SPN. Thus, SP1, SP2, SP3 and SP4 in some of the priority documents correspond to PS1, PS2, PS3 and PS4 in the instant Application, respectively.

[0646] The procedure extracted many motifs. The procedure has been applied to each main EC class and to all the enzymes specified by the main EC class. Nonetheless, more than half of all motifs turn out to belong uniquely to single branches of the fourth level of the hierarchy, to be denoted as predicting sequences of level 4 (PS4).

[0647] It should be realized that, at the fourth level, one encounters strong homology between all amino-acid sequences. The PS4 stretches are specific motifs that are extracted from these homologs. The lower levels of predicting sequences, PS3, PS2 and PS1, do not include any of their descendants. Thus PS3 does not include predicting sequences of PS4 belonging to branches of the same subsubclass. The numbers of predicting sequences found are listed below in Tables 4 and 5 for the datasets presented in Tables 2 and 3, respectively.

TABLE 4

EC class          No. of PS4   No. of PS3   No. of PS2   No. of PS1
oxidoreductases      12781         868          379         1311
transferases         20043         918          488         2123
hydrolases           10822        1120          197         1153
lyases                7886         200           59          300
isomerases            4080          54           26          154
ligases              11695         573           99          508
total                67307        3733         1248         5549
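The notion of a level-N predicting sequence (PSN) described in Example 1 can be illustrated with a short sketch. It is not the motif extraction procedure itself: it only assumes that the EC numbers of all enzymes on which a motif occurs are already known and fully specified (four dot-separated fields), and it reports the deepest EC level that they all share. The function name `predicting_level` is hypothetical.

```python
from typing import Iterable, Tuple


def predicting_level(ec_numbers: Iterable[str]) -> Tuple[int, Tuple[str, ...]]:
    """Return (N, shared EC prefix) for the enzymes on which a motif occurs.

    N is the number of leading EC fields shared by every enzyme, so N == 4
    corresponds to a PS4-like motif, N == 3 to a PS3-like motif, and N == 0
    means the motif spans several main classes and predicts nothing.
    Assumption: every EC number is fully specified, e.g. "2.7.1.1".
    """
    fields_per_enzyme = [ec.split(".") for ec in ec_numbers]
    if not fields_per_enzyme:
        raise ValueError("at least one EC number is required")
    shared = []
    # Walk the EC fields from the main class downwards and stop at the
    # first level where the enzymes disagree.
    for fields in zip(*fields_per_enzyme):
        if len(set(fields)) != 1:
            break
        shared.append(fields[0])
    return len(shared), tuple(shared)


if __name__ == "__main__":
    print(predicting_level(["2.7.1.1", "2.7.1.1"]))   # one fourth-level branch
    print(predicting_level(["2.7.1.1", "2.7.1.2"]))   # shared subsubclass only
    print(predicting_level(["1.1.1.1", "2.7.1.1"]))   # different main classes
```

Under this reading, a motif assigned to level 3 is one whose occurrences are confined to a single subsubclass but spread over more than one fourth-level branch, which matches the statement that PS3 does not include PS4 sequences.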

TABLE 5

EC class          No. of PS4   No. of PS3   No. of PS2   No. of PS1
oxidoreductases      10312         719          375         1087
transferases         15032         750          351         1576
hydrolases            8575         920          159          962
lyases                6613         180           49          286
isomerases            2939          43           17           98
ligases               8572         422           81          369
total                52043        3034         1058         4378

TABLE 8

EC class          PS3              PS2              PS1
oxidoreductases   18% (1697)       8.4% (792)       41% (3869)
transferases      17% (2754)       10.3% (1668)     39.6% (6412)
hydrolases        19.9% (2172)     6.8% (738)       32% (3487)
lyases            12.1% (631)      4.5% (238)       17.95% (939)
isomerases        6.5% (187)       4.3% (124)       16.5% (476)
ligases           31.25% (1889)    6.65% (403)      27.88% (1686)
total             18.4% (9330)     7.8% (3963)      33.25% (16869)
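As a worked check of how the entries of Table 8 can be read, the sketch below assumes that each percentage is the parenthesised enzyme count divided by the number of sequences of that EC class in Table 2 (release 48.3). This reading is an inference from the numbers, not a statement made in the text. Under that assumption most entries agree to within rounding, and the lyases row differs by about 0.2 percentage points.

```python
# Class sizes as listed in Table 2 (release 48.3); "total" is the stated total.
CLASS_SIZES = {
    "oxidoreductases": 9437, "transferases": 16196, "hydrolases": 10901,
    "lyases": 5299, "isomerases": 2887, "ligases": 6048, "total": 50698,
}

# Table 8 entries as (stated percent, enzyme count) for PS3, PS2, PS1.
TABLE_8 = {
    "oxidoreductases": [(18.0, 1697), (8.4, 792), (41.0, 3869)],
    "transferases": [(17.0, 2754), (10.3, 1668), (39.6, 6412)],
    "hydrolases": [(19.9, 2172), (6.8, 738), (32.0, 3487)],
    "lyases": [(12.1, 631), (4.5, 238), (17.95, 939)],
    "isomerases": [(6.5, 187), (4.3, 124), (16.5, 476)],
    "ligases": [(31.25, 1889), (6.65, 403), (27.88, 1686)],
    "total": [(18.4, 9330), (7.8, 3963), (33.25, 16869)],
}


def recomputed_percent(count: int, class_size: int) -> float:
    """Coverage as a percentage of the class size."""
    return 100.0 * count / class_size


def max_discrepancy() -> float:
    """Largest gap (in percentage points) between stated and recomputed values."""
    return max(
        abs(stated - recomputed_percent(count, CLASS_SIZES[ec_class]))
        for ec_class, row in TABLE_8.items()
        for stated, count in row
    )


if __name__ == "__main__":
    for ec_class, row in TABLE_8.items():
        values = ", ".join(
            f"{recomputed_percent(c, CLASS_SIZES[ec_class]):.2f}% (stated {s}%)"
            for s, c in row
        )
        print(f"{ec_class}: {values}")
```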

[0648] The lists of predicting sequences in Tables 4 and 5 above contain overlaps, e.g., stretches which are parts of other stretches. No attempt was made to obtain a minimal set of predicting sequences.

[0649] To determine the usefulness of the predicting sequences, the coverage of the predicting sequences as well as their cumulants was investigated. The latter were defined as unions of the former: CPS3 = PS3 ∪ PS4, CPS2 = PS2 ∪ CPS3 and CPS1 = PS1 ∪ CPS2.

[0650] In some of the priority documents of the instant Application, the cumulants CPS3, CPS2 and CPS1 are referred to as CSP1, CSP2 and CSP3.

[0651] Table 6 summarizes the coverage of the 48 dataset per enzyme class.

TABLE 6

EC class            PS4      CPS3     CPS2     CPS1
oxidoreductases     8122     8403     8565     8859
transferases       14318    14695    14798    15180
hydrolases          8581     9067     9149     9528
lyases              4780     4826     4837     4886
isomerases          2643     2661     2666     2691
ligases             5812     5869     5879     5909
total              44256    45521    45894    47053

[0652] As shown in Table 6 above, functional classification at the third level of EC is provided by the 71,040 CPS3 (see Table 4) for 89.8% of the data.

[0653] Similar success values were obtained for the 45 dataset, shown in Table 7.

TABLE 7

EC class          PS4              CPS3             CPS2             CPS1
oxidoreductases   83.2% (6587)     86.5% (6850)     88.3% (6995)     91.8% (7266)
transferases      85.3% (10925)    88.1% (11281)    88.8% (11376)    91.8% (11753)
hydrolases        77% (6920)       81.6% (7332)     82.4% (7398)     85.6% (7720)
lyases            90.2% (4180)     91.2% (4226)     91.4% (4235)     92.5% (4287)
isomerases        89.4% (1998)     89.8% (2007)     90% (2010)       91.1% (2035)
ligases           95.3% (4470)     96.4% (4523)     96.6% (4531)     97.2% (4562)
total             85% (35080)      87.8% (36219)    88.6% (36545)    91.2% (37623)

[0654] It is therefore demonstrated that a large fraction of the coverage is provided by the predicting sequences of level 4.

[0655] Tables 8 and 9 summarize the differential coverage of the other predicting sequences.

TABLE 9

EC class          PS3              PS2              PS1
oxidoreductases   18.8% (1489)     8.8% (699)       39.4% (3121)
transferases      16.95% (2200)    9% (1157)        37.75% (4836)
hydrolases        19.3% (1736)     6% (546)         32.65% (2931)
lyases            10.75% (498)     3.45% (161)      20.55% (953)
isomerases        7.7% (172)       4.75% (106)      15.35% (343)
ligases           30.35% (1425)    7.1% (334)       26.5% (1243)
total             18.2% (7520)     7.25% (3003)     32.55% (13427)

[0656] The motif extraction procedure is not limited by the length of the motifs it extracts. The resulting distribution of motif length is displayed in FIG. 8 for all six main classes of the EC hierarchical classification.

[0657] It is recognized that the longer peptides are in principle more strongly associated with homologies. This is borne out by a test carried out on 100 randomly chosen predicting sequences of the cumulant CPS3. For each predicting sequence, the set of all the enzymes on which it occurs was extracted, and the percentages of identity along the sequences of all pairs were calculated.

[0658] The results are shown in FIGS. 9a-c, for predicting sequences shorter than 9 amino-acids (FIG. 9a), between 9 and 12 amino-acids (FIG. 9b) and longer than 12 amino-acids (FIG. 9c). As shown in FIG. 9a, the histogram of motifs shorter than 9 amino-acids exhibits a peak at about 60% with a tail that extends well below 40%. It is thus demonstrated that short predicting sequences are useful for predicting the class to which the enzyme belongs.

Example 2

Classification of Unclassified Enzymes Using Predicting Sequences

[0659] In this example, the ability of the predicting sequences of the present embodiments of the invention to classify unclassified enzymes is demonstrated. To mimic a situation in which an unclassified enzyme is to be classified using the enzyme database, a reduced enzyme database was constructed solely from the dataset of release 45 (see Table 3). All sequences of the dataset of release 48.3 that did not appear in the dataset of release 45 were considered, for the sake of demonstration, as “unclassified sequences”.

[0660] The reduced enzyme database was constructed from 41,265 sequences and the group of “unclassified sequences” included 10,730 sequences (26% of the number of sequences from which the reduced database was constructed). Each unclassified sequence was searched for a motif of amino acids matching a predicting sequence present in the reduced database, and the classifier corresponding to the matched predicting sequence was used for determining the EC number of the respective enzyme.

[0661] The classification quality was quantified by means of recall-precision analysis. Recall and precision are effectiveness measures known in the art. Recall was defined as the number of novel sequences that included at least one of the PSNs, while precision was defined as the percentage of predictions, based on the PSNs of the 45 dataset, that were corroborated by the assignment of the 48.3 dataset. Less than 54% of all PSNs were needed for the analysis. Precision can be defined at the predicting sequence level, e.g., to what extent the EC of a particular predicting sequence matches the true EC of the enzyme that it hits. Precision can also be defined at the enzyme level: how many enzymes are correctly identified by all predicting sequences that hit them, in other words, demanding the EC assignments of all predicting sequences to be consistent with one another as well as with the “48.3” annotation of the enzyme. The classification method of the present embodiments classified the “unclassified sequences” with a total precision value at the predicting sequence level of more than 98%, a total precision value at the enzyme level of more than 81%, and a total recall value of more than 84%, corresponding to a success rate of about 84%. The reason for the difference between the two precision levels is that typically there is more than one predicting sequence hitting each enzyme, and the small error at the predicting sequence level is magnified by the requirement that the EC labels of all predicting sequences on the same enzyme are consistent with each other.

[0662] The results of the analysis are summarized in Table 10.

TABLE 10

EC class          No. of PSNs       No. of sequences   Recall            Precision (sequence)   Precision (enzyme)
oxidoreductases    5967                  1661          1235 (74.35%)         99.35%                 78.2%
transferases       9680                  3722          3253 (87.4%)          99.3%                  84.6%
hydrolases         4466                  2173          1614 (74.25%)         98.45%                 71.8%
lyases             3838                  1089           930 (85.4%)          99.65%                 91.2%
isomerases         1774                   686           611 (89%)            83%                    79.0%
ligases            6685                  1399          1385 (99%)            99.55%                 87.1%
total             32410 (53.55%)        10730          9028 (84.15%)         98.5%                  81.7%

Example 3

An Enzyme Searchable Database for Thermophilic Bacteria

[0663] In order to establish that the predictive methods described hereinabove are generally applicable, a dataset of predicting sequences for the genomes of 25 thermophilic bacteria with genomic sequence data available at the National Center for Biotechnology Information (NCBI) of the National Institutes of Health (NIH) was compiled. The 25 thermophilic bacteria are listed in Table 12.

TABLE 12

No.   Thermophile
1.    Aeropyrum pernix
2.    Aquifex aeolicus
3.    Archaeoglobus fulgidus
4.    Deinococcus geothermalis DSM 11300
5.    Methanobacterium thermoautotrophicum
6.    Methanosaeta thermophila PT
7.    Moorella thermoacetica ATCC 39073
8.    Nanoarchaeum equitans
9.    Picrophilus torridus DSM 9790
10.   Pyrobaculum aerophilum
11.   Pyrococcus abyssi
12.   Pyrococcus furiosus
13.   Pyrococcus horikoshii
14.   Sulfolobus acidocaldarius DSM 639
15.   Sulfolobus solfataricus
16.   Sulfolobus tokodaii
17.   Thermoanaerobacter tengcongensis
18.   Thermobifida fusca YX
19.   Thermococcus kodakaraensis KOD1
20.   Thermoplasma acidophilum
21.   Thermoplasma volcanium
22.   Thermosynechococcus elongatus
23.   Thermotoga maritima
24.   Thermus thermophilus HB27
25.   Thermus thermophilus HB8

[0664] The dataset of predicting sequences for the organisms listed in Table 12 comprises a metagenomic set on which the methods described above can be tested. The Thermophile metagenomic dataset consists of 52,481 proteins with an average length of 295 ± 196 amino-acids.

Example 4

Theoretical Considerations for Characterization of Sargasso Sea Bacterial Peptides Using the Metagenomic Dataset

[0665] Venter et al. (2004; Environmental Genome Shotgun Sequencing of the Sargasso Sea, Science 304: 66-74; fully incorporated herein by reference) have compiled and made publicly available genomic sequence data for bacteria isolated from the Sargasso Sea.

[0666] In order to demonstrate the utility of the metagenomic data set of Example 3, predicting sequences from the metagenomic data set were used to classify Sargasso Sea sequence data according to the standard EC classification hierarchy.

[0667] The finding of predicting sequences on proteins that do not have enzymatic functions (to be termed accidentals) is modeled by predicting sequence hits on random protein sequences. For each dataset being considered (the Thermophiles metagenomic set and the Sargasso Sea genomic sequence data in this exemplary embodiment of the invention), random protein sets were generated by scrambling the order of the amino-acids in every protein, thus conserving only first-order statistics.

[0668] Five such sets were produced for the Thermophiles metagenomic data set (including one that consists of inverting the sequence of each protein) in order to measure the expected accidental hits.

[0669] The outcome is presented in Table 13. The notation for 2 matches and more distinguishes between the possibilities that some matches are consistent with one another (i.e. their EC assignments are either identical or obey parent-child relationships) and others are inconsistent.
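The scrambling control and the consistency criterion described in Example 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used to produce Table 13: motif matching is taken to be exact substring search, EC labels are dot-separated strings that may be partial (e.g. "1.1.1" for a level-3 predicting sequence), and the helper names `scramble`, `hit_fraction` and `consistent` are hypothetical.

```python
import random
from typing import Iterable, Sequence


def scramble(sequence: str, rng: random.Random) -> str:
    """Shuffle the residues of one protein.

    The amino-acid composition (first-order statistics) is preserved while
    any ordered motif is destroyed except by chance.
    """
    residues = list(sequence)
    rng.shuffle(residues)
    return "".join(residues)


def hit_fraction(sequences: Sequence[str], motifs: Iterable[str]) -> float:
    """Fraction of sequences containing at least one motif as a substring."""
    motif_list = list(motifs)
    hits = sum(any(m in seq for m in motif_list) for seq in sequences)
    return hits / len(sequences)


def consistent(ec_a: str, ec_b: str) -> bool:
    """True if two EC labels are identical or in a parent-child relationship.

    A parent label is a prefix of its child, e.g. "1.1.1" and "1.1.1.5".
    Siblings such as "1.1.1.5" and "1.1.1.6" are inconsistent.
    """
    a, b = ec_a.split("."), ec_b.split(".")
    shorter = min(len(a), len(b))
    return a[:shorter] == b[:shorter]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the control set is reproducible
    proteins = ["MKTAYIAKQRQISFVKSHFSRQ", "GSHMLEDPVDAFKNAGCW"]
    motifs = ["AYIAKQ", "NAGCW"]
    control = [scramble(p, rng) for p in proteins]
    print("observed hit fraction:", hit_fraction(proteins, motifs))
    print("control hit fraction: ", hit_fraction(control, motifs))
    print(consistent("1.1.1", "1.1.1.5"), consistent("1.1.1.5", "1.1.1.6"))
```

The inverted-sequence control mentioned in the text corresponds to `sequence[::-1]` in place of `scramble`. Repeating the scramble with several seeds gives the spread from which a standard deviation of the accidental hit rate can be estimated.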

[0670] Two consistent predicting sequence matches are denoted 2c and two inconsistent ones 2i. Similarly, 3 hits with 2 consistent and one inconsistent are denoted 2c1i. For a number of predicting sequence matches n, n is denoted as XcYi, where X+Y=n.

TABLE 13
Probability estimates of random predicting sequence matches, Thermophiles data

Number of predicting
sequence matches      Probability   Standard Deviation
0                     0.78804       0.00253
1                     0.17772       0.00223
2i                    0.02279       0.00076
3i                    0.00219       0.00002
2c1i                  0.00189       0.00023
3c                    0.00017       0.00002
2c2i                  0.00042       0.00004
4i                    0.00019       0.00011

[0671] Table 14 contains probability estimates of random predicting sequence hits on the Sargasso Sea data. This is based on three sets of 100,000 scrambled sequences randomly chosen from the over 1 million proteins in the Sargasso Sea data, using notation similar to that employed in table 13.

TABLE 14
Probability estimates of random predicting sequence matches, Sargasso Sea data

Number of predicting
sequence matches      Probability   Standard Deviation
0                     0.8626        0.0010
1                     0.1233        0.0006
2c                    0.0026        0.0002
2i                    0.0102        0.0002
3i                    0.00057       0.00003
2c1i                  0.00052       0.00002

[0672] Both tables 13 and 14 reflect the fact that the overwhelming majority of sequences contain no predicting sequence matches (78% in table 13 and 86% in table 14). This is reflective of the fact that most proteins are not characterized by an enzymatic function.

Error Model

[0673] Although the occurrence of accidental predicting sequence matches is low, it is desirable to know which matches are accidental. The following model for estimating the expected errors on enzyme predictions based on predicting sequence matches is proposed to distinguish between predicting sequence matches that are consistent with one another (i.e. their EC assignments are either identical or obey parent-child relationships according to the EC tree) or not. From 4 matches onwards there is also the (rare) possibility of combinations with internal consistency and external inconsistency; two such pairs of matches are denoted as 2c2c.

[0674] The proposed error model presumes that, in a given dataset, there exists a prior distribution of enzymes with n (consistent) matches whose numbers are denoted by t_n, on which there exist additional accidental predicting sequence matches according to the distribution displayed in Tables 13 or 14. According to this model, the observed predicting sequence matches O_n in general, and/or the observed matches with internal consistency or inconsistency, are described by equations such as Equations 4 to 7 below

[0675] etc., where, in Equation 7, a simplistic assumption is made that the matches of t_1 and those created by P_1 are inconsistent with each other. The data at hand have various inter-relationships among the different genes brought about by evolution. Therefore results may not always follow exactly this model, which assumes independent occurrences of accidental predicting sequence matches. However, the proposed error model provides an estimate for the amount of errors involved when turning observations into predictions.

[0676] For example, in the Thermophiles data O_0=36,064 and O_1=9,377, which indicates that t_0=45,725 (+/-105) and t_1=1,668 (+/-100). Since t_0 P_1=8,064 accounts for almost all 9,377 observations of single matches, single matches are preferably insufficient for identification of a protein as an enzyme.

[0677] Continuing similarly to n=2, in Thermophiles O_2c=1,142, whereas the component of t_0 P_2c=272, hence the expected error on assigning correctly the enzyme from the observation of two consistent predicting sequence matches in this dataset is 272/1,142≈24%.

[0678] The proposed error model works well for low values of n, e.g. n<5. For higher n values it overestimates the number of inconsistent hits. This is to be expected if enzyme sequences with very low n values have undergone stronger evolutionary changes, e.g. through mutations. These changes could be the reason for the observation of low n, because they have eliminated relevant predicting sequences and, at the same time, may have inserted accidental (and inconsistent) short predicting sequences into the sequence.

Fisher Distance Criterion

[0679] If two different enzyme domains with different activities exist within the protein, or if one enzyme domain exists and another non-enzymatic domain comprises accidental predicting sequence matches, groups of predicting sequences that are not consistent with one another are expected to result. An example of such a case would be 2c2c, signifying two pairs of predicting sequences that are consistent within themselves but inconsistent with each other. In these cases a two-domain hypothesis can be checked by calculating a Fisher distance between the two groups of predicting sequence matches (EQ 8).

$F = \dfrac{2(\mu_1 - \mu_2)}{\Delta_1 + \Delta_2}$ (EQ 8)

The parameters in EQ 8 are defined as follows: determine the first index of the left-most predicting sequence match of one group of consistent predicting sequences and the last index of the right-most predicting sequence of this group on the sequence of the protein. The mean of these indices is $\mu_1$ and the difference between them defines the total length $\Delta_1$ of this group of consistent predicting sequences. $\mu_2$ and $\Delta_2$ are defined analogously using the left and right indices of the second group(s) of predicting sequences.
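The Fisher distance criterion can be sketched in a few lines, reading EQ 8 as F = 2(μ1 − μ2)/(Δ1 + Δ2), with μ the mean and Δ the difference of the left-most start index and right-most end index of each group. The data layout (a list of inclusive `(start, end)` index pairs per group), the function names and the example coordinates are assumptions made for illustration.

```python
def group_span(matches):
    """Return (mean, length) for one group of consistent matches.

    `matches` is a list of (start, end) residue indices, both inclusive.
    The mean is taken over the left-most start and the right-most end;
    the length is the difference between those two indices.
    """
    left = min(start for start, _ in matches)
    right = max(end for _, end in matches)
    return (left + right) / 2, right - left


def fisher_distance(group1, group2):
    """Fisher distance between two groups of matches, following EQ 8."""
    mu1, delta1 = group_span(group1)
    mu2, delta2 = group_span(group2)
    if delta1 + delta2 == 0:
        raise ValueError("both groups have zero length; F is undefined")
    return 2 * (mu1 - mu2) / (delta1 + delta2)


# Hypothetical match positions on one protein.
domain_a = [(10, 17), (40, 48)]      # spans residues 10..48
domain_b = [(120, 127), (160, 166)]  # spans residues 120..166
overlap = [(30, 37), (60, 66)]       # spans residues 30..66

f_separate = fisher_distance(domain_a, domain_b)  # 2*(29-143)/(38+46)
f_overlap = fisher_distance(domain_a, overlap)    # 2*(29-48)/(38+36)
```

With these toy coordinates the first pair of groups gives |F| > 1 (the spans do not overlap), while the second pair gives |F| < 1, which is the distinction the two-domain hypothesis relies on.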

[0680] For data that match the description of two or more domains the Fisher distance (F) is expected to have an absolute value greater than 1, indicating that the two predicting sequence groups occupy mutually exclusive regions on the protein sequence. Strictly speaking this is not a necessary condition, since the two enzymatic domains can be spatially distinct in the folded protein as a result of secondary and/or tertiary structure even if the predicting sequences occur in overlapping domains on the primary structure. Nonetheless, the Fisher model is based upon clear separation of the predicting sequences along the primary structure of the peptide sequence, which probably occurs more frequently in nature.

Example 5

Characterization of the Metagenomic Dataset

[0681] A predicting sequence search on the metagenomic thermophile dataset of Example 3 produced a distribution of predicting sequence matches summarized graphically in FIG. 10.

[0682] FIG. 10 clearly demonstrates that predicting sequence matches are present on 16,417 proteins, whereas random predicting sequence matches account for 11,124 proteins (±133) as described hereinabove. This suggests that the metagenomic thermophile dataset includes more than 5,000 enzymes.

[0683] Using preferred embodiments of the present invention, resolution of which of these proteins should be recognized as enzymes and what the EC assignments of these enzymes should be was undertaken. Low numbers of predicting sequence matches (n<5) and high numbers (n≥5) were handled separately.

[0684] For n<5, note the similarity between the exponential drop observed in the random case (Table 13) and in the real data (FIG. 10), where each successive ratio in O_0:O_1:O_2 is of order 4. These data clearly indicate that most of the n=1 data are accidentals, and the n=2 to 4 data need special study to decide which are indeed enzymes.

[0685] There is a smaller number of peptides characterized by five or more predicting sequence matches which appear to indicate bona fide enzymes. No combinations of more than five predicting sequence matches occur completely at random, and most predicting sequence hits are consistent with one another, i.e. the different EC labels of the predicting sequences observed on these proteins are consistent with there being a unique EC-number assignment to the protein. There are a smaller number of cases with two potential EC numbers, suggesting that the protein in question is characterized by two domains with two different catalytic activities.

[0686] FIGS. 11 and 12 indicate graphically how many consistent matches there are and how many matches with one inconsistent predicting sequence, i.e., matches where at least one predicting sequence has an EC assignment different from the rest. In FIGS. 11 and 12, grey or red bars correspond to n consistent matches per protein, where n is shown on the horizontal axis, empty or yellow bars indicate n-1 consistent and 1 inconsistent matches per protein, and dark or blue bars indicate other combinations adding to n matches per protein. In FIG. 11, 2

[0688] FIG. 13 displays the relative percentages of the different cases of predicting sequence matches, showing n consistent matches per protein (grey or red bars), n consistent matches and 1 inconsistent match (empty or yellow bars), and all other combinations adding to n matches per protein (dark or blue bars).

[0689] According to a preferred embodiment of the present invention, when the number of motifs of the target protein which match predicting sequences in the database is sufficiently large (e.g., larger than 4) and when the number of inconsistent matches is sufficiently small (e.g., all matches but one being consistent), the inconsistent matches are disregarded for the purpose of classification.

[0690] For example, in the present example, there are altogether 419 inconsistent matches for n>4, 331 of which contain a single predicting sequence that does not match the rest. According to the presently preferred embodiment of the invention for n>4, most of the (n-1)c1i predicting sequence matches depicted in FIGS. 11, 12 and 13 can still serve as valid predictions by disregarding the EC assignment of the one predicting sequence that disagrees with the others. This procedure is based on the assumption that, through random evolutionary processes, a subsequence has been created at a location that has nothing to do with the EC function of the enzymes. The overall ratio 331/2,418=0.14 of (single inconsistent)/(all consistent) data is smaller but not very far from P_1/P_0=0.22 of Table 13, the model of independent accidental predicting sequence matches. In an exemplary embodiment of the invention, 2,749 EC assignments from the data of predicting sequence n>4 can be achieved by ignoring inconsistent matches in all (n-1)c1i, hence basing the classifications on (n-1)c predicting sequence matches.

[0691] In cases where the number of predicting sequence matches is less than 5, only predicting sequence matches that are fully consistent with one another are considered indicative of enzymatic activity and/or EC classification.

[0692] Table 15 lists the results for n=2, 3 and 4 as well as error estimates based on the error model described hereinabove. Data presented in table 15 indicate that data of fully consistent predicting sequences for n=3 and 4 are meaningful predictors of enzymatic activity with a high degree of accuracy.

TABLE 15
Match results and error estimates based on the error model (n = 2, 3 and 4)

n                 2c      3c     4c
Observations      1,142   569    438
Error estimate    270     8      1

Verification of Results

[0693] There is a group of 3,756 proteins for which EC assignments can be made with a high degree of accuracy. The group includes all n>4 predicting sequence matches which are either fully consistent or have one inconsistent predicting

96% true positives. In Table 16, the levels 1, 2, 3 and 4 of the TABLE 16-continued EC hierarchy, are denoted ECL-1, EC L-2, EC L-3 and EC L-4, respectively. Thermophiles Analysis Summary of EC Predictions against NCBI TABLE 16 No EC Avail Thermophiles Analysis able - Summary of EC Predictions against NCBI True Predictions Potential

No EC T1 T2 T3 T4 False EC for Avail TP FP EC EC EC EC Posi- Avail- New able - Category % 96 L-1 L-2 L-3 L-4 Total tives able Pred. True Predictions Potential 33C 1I OO O O O 1 O O O T1 T2 T3 T4 False EC for 34C OO O O O O 3 3 O 3 10 TP FP EC EC EC EC Posi- Avail- New 34C 1I OO O O O O 1 O O Category 9.6 %) L-1 L-2 L-3 L-4 Total tives able Pred. 35C O O O O O O O 11 36C OO O O O O 1 O 7 Total 96 4 33 32 130 1,064 1.259 54 1,313 3,977 36C 1I O O O O O O O 3 2C 90 1 0 18 9 38 131 196 21 217 931 37C O O O O O O O 2 2C 8S 15 3 3 10 24 40 7 47 229 38C OO O O O O 1 O 3 3C 98 2 4 4 14 1 OS 127 2 129 442 38C 1I OO O O O O 1 O O 3C 93 7 1 O 5 21 27 2 29 90 39C OO O O O O 1 O 8 4C 97 3 2 4 14 98 118 4 122 323 4OC OO O O O O 2 2 O 2 3 4C 86 14 O 1 1 17 19 3 22 42 41C OO O O O O 2 2 O 2 3 5C 97 3 O 4 11 87 102 3 105 255 42C OO O O O 1 O O 1 5C OO O 1 O 4 8 13 O 13 31 43C OO O O O O 3 3 O 3 5 6C 95 S 2 O 5 70 77 4 81 213 43C 1I OO O O O O 1 O O 6C OO O O O O 11 11 O 11 29 4SC O O O O O O O 2 7C OO O O O 5 60 65 O 65 172 46C OO O O O O 1 O 2 7C OO O O O 1 10 11 O 11 18 47C O O O O O O O 2 8C OO O O O 2 42 44 O 44 158 48C OO O O O O 1 O 1 8C OO O O 1 1 8 10 O 10 10 50C O O O O O O O 3 9C 98 2 O O 1 50 51 1 52 121 51C O O O O O O O 2 9C OO O O O O 5 5 O 5 8 S1C 1I OO O O O O 1 O O OC OO O O O 3 45 48 O 48 110 52C O O O O O O O 1 OC OO O O O 1 4 5 O 5 8 53C O O O O O O O 2 1C 97 3 O 1 O 37 38 1 39 77 55C OO O O O O 1 O 1 1C OO O O O O 4 4 O 4 5 56C OO O O O O 1 O 1 2C 97 3 O O 1 29 30 1 31 94 62C O O O O O O O 1 2C OO O O 1 O 4 5 O 5 6 63C OO O O O 1 O O O 3C 95 5 O O O 19 19 1 2O 52 73C O O O O O O O 1 3C OO O O O O 3 3 O 3 5 4C 96 4 O 1 O 26 27 1 28 S4 4C OO O O O O 1 1 O 1 6 5C OO O 1 O O 13 14 O 14 33 The true predictions can be divided into 4 classes: 5C O O O O O O O 3 (0695) 1. Correct (true positive) predictions at EC level 4 6C 87 13 O O O 13 13 2 15 30 “TP4 6C OO O O O O 4 4 O 4 2 7C OO O 1 O 3 11 15 O 15 50 (0696 2. 
Correct (true positive) predictions at EC level 3 7C 1I OO O O O O 3 3 O 3 1 “TP3 8C OO O O 1 O 13 14 O 14 38 8C O O O O O O O 1 (0697 3. Correct (true positive) predictions at EC level 2 9C 89 11 O 1 1 6 8 1 9 29 “TP2' 9C OO O O O O 1 1 O 1 3 0698 4. Correct (true positive) predictions at EC level 1 2OC OO O O 1 1 7 9 O 9 18 “TP1’ 21C OO O O O O 5 5 O 5 27 21C OO O O O 1 O 1 O 1 O (0699 FIG. 14 depicts the True Predictions as a function of 22C OO O O O 1 6 7 O 7 11 the different matches categories (consistent VS. inconsistent 22C OO O O O O 1 1 O 1 1 23C OO O O O O 2 2 O 2 22 for each value of n). A detailed comparison of predictions 23C O O O O O O O 1 based on predicting sequence matching with annotations of 24C OO O O O O 5 5 O 5 28 NCBI is provided in Table 17 (provided on enclosed CD 25C OO O O O 1 5 6 O 6 17 ROM, file “Table-17.txt). 25C O O O O O O O 2 26C OO O O O O 7 7 O 7 14 0700. The n=2 results have an estimated possible error of 26C OO O O O O 2 2 O 2 2 24%. In an exemplary embodiment of the invention, putative 27C OO O O O 1 7 8 O 8 17 EC assignments based on n=2 and/or the 21 cases of n 3 27C OO O O O O 1 1 O 1 2 28C OO O O O O 2 2 O 2 13 and/or the 31 cases ofn 4 data can be further checked using 28C O O O O O O O 1 sequence similarity and/or experimental tools to increase the 29C OO O O O O 3 3 O 3 13 number of enzymes correctly characterized. Table 18 exem 29C OO O O O 1 O 1 O 1 O plifies the 21 and 31 cases and expected errors. According 3OC OO O O O O 2 2 O 2 6 3OC OO O O O O 1 1 O 1 1 to the aforementioned notations, 2.1 denotes 2 matches that 3 1C OO O O O O 2 2 O 2 12 are consistent with one another and 1 which is inconsistent 32C OO O O O O 2 2 O 2 8 with the other two. Similarly, the notation 31 denotes 3 33C O O O O O O O 7 matches that are consistent with one another and 1 which is inconsistent with the other three. US 2013/0332.133 A1 Dec. 12, 2013 60

TABLE 18

predicting sequence matches    2c1i    3c1i
Observations                   268     136
Error estimate                 87      10

[0701] A list of all 2c, 2c1i and 3c1i results is provided in Table 17 on enclosed CD-ROM, together with their NCBI assignments. The accumulated data confirm the theoretical error estimate described above.

Example 6

Analysis of Enzyme Size

[0702] In order to determine the size of observed enzymatic domains, the total number of amino-acids covered by consistent predicting sequence matches on a protein was analyzed. This quantity is referred to as length of coverage (L). FIG. 15 is a histogram indicating number of proteins as a function of coverage L for the classes 2c (empty or yellow bars), 3c (grey or red bars) and 4c (dark or blue bars). In general, L increases as n increases. The parameter L is also listed in Tables 16 and 17.

[0703] Comparison of EC assignments based on predicting sequence matches to NCBI annotations in tables 16 and 17 reveals a break point at approximately L=12. Above this point, the number of correct identifications is increased. This distribution correlates well with the distributions in FIG. 10 and the expected errors in Table 20 for the different nc classes. The nc1i classes (not depicted) have distributions similar to those of the nc classes in FIG. 15 but with much lower rates of occurrence.

Example 7

Characterization of Sargasso Sea Bacterial Peptides Using the Metagenomic Dataset

[0704] There are 1,001,986 records in the Sargasso Sea protein data (Venter et al., 2004). The average length of the proteins is 194 amino-acids, with s.d.=109. Using three random sets of 100,000 proteins selected from these data, we have generated the randomized proteins from which we have calculated the probabilities of accidental matches in Table 14. The different statistics of the Sargasso Sea set compared to the Thermophiles set are responsible for the different corresponding probabilities observed between Tables 13 and 14.

[0705] There are predicting sequence matches on 283,835 proteins of the Sargasso Sea data. Using the error model described above, it is predicted that some 130,000 of these predicting sequence matches are accidentals (i.e. do not indicate actual enzymes), leaving over 150,000 actual enzymes.

[0706] FIG. 16 graphically summarizes categories of predicting sequence matches in terms of number of matches n and consistency (c) or inconsistency (i) of predicting sequence matches within a single peptide sequence.

[0707] As indicated in FIG. 16, there is a first group of 52,615 proteins with n>4 and zero or one inconsistent predicting sequence matches. Proteins in this first group are believed to accurately reflect enzymatic activity according to the EC class indicated by the relevant predicting sequences.

[0708] FIG. 16 also indicates a second group with slightly less certainty about the prediction of enzymatic activity. This second group includes an additional 45,450 proteins with 3c or 4c predicting sequence matches, which also have a high probability of indicating an enzymatic activity corresponding to the EC class indicated by the "c" predicting sequences of the peptide, based upon the error analyses described above.

[0709] The first and second groups together comprise 98,065 peptides with specific enzymatic activities predicted with a reasonable degree of certainty.

[0710] FIG. 16 also indicates a third group comprising peptides with predicting sequence matches designated as 2c, 2c1i and 3c1i. This third group comprises 34,268 peptides for which a specific enzymatic activity is predicted with a lower degree of certainty. In an exemplary embodiment of the invention, verification by alternative methods can be employed to determine which peptides actually have the predicted enzymatic activity. Table 19 summarizes the expected error rates for each type of predicting sequence matching in the third group of peptides.

TABLE 19
Expected error rates for predicting sequence match types 2c, 2c1i and 3c1i

predicting sequence
Match Type               2c       2c1i    3c1i
Number of Matches        28,811   3,507   1,950
Expected errors          1,870    868     557
Expected accuracy (%)    93.5     75.3    71.4

[0711] Data summarized in Table 19 suggest that even the "unreliable" predictions of the third group are valuable. For any peptide in this group it is possible to use the EC class suggested by the predicting sequence matches and screen for activity using a single suitable substrate. Results of a screening conducted in this way are expected to produce at least 71.4% verified enzymes (for 3c1i predicting sequence matches) and as much as 93.5% verified enzymes (for 2c predicting sequence matches).

[0712] These degrees of expected verification are high for any enzyme screening process. They are unprecedentedly high for a screening plan in which each candidate enzyme is assayed against a single substrate.

[0713] Table 20 summarizes peptides with two putative enzymatic activities based upon EC classifications suggested by predicting sequence matches (predicting sequence match types suggesting multiple enzymatic activities with less than 10 peptides are not presented).

TABLE 20
Multiple consistent set of predicting sequence matches on Sargasso Sea data

predicting sequence
Match Type    Peptides
2c2c          86
2c3c          88
2c4c          48
2c5c          39
2c6c          31
2c7c          21
2c8c          17
2c11c         13
2c13c         10
3c3c          10
2c2c1i        21
2c3c1i        10
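The expected accuracies quoted in Table 19 follow from the tabulated match counts and expected errors as accuracy = 1 − errors/matches. The short sketch below redoes that arithmetic. The dictionary layout and function name are illustrative assumptions, and the match-type labels 2c, 2c1i and 3c1i are the reading of the table headings adopted here.

```python
# (number of matches, expected errors) per match type, copied from Table 19.
TABLE_19 = {
    "2c": (28811, 1870),
    "2c1i": (3507, 868),
    "3c1i": (1950, 557),
}


def expected_accuracy(matches, errors):
    """Expected percentage of predictions that are not accidental."""
    return 100.0 * (1.0 - errors / matches)


accuracy = {
    match_type: expected_accuracy(matches, errors)
    for match_type, (matches, errors) in TABLE_19.items()
}

# Size of the third group of peptides: the sum of the three match counts.
third_group_size = sum(matches for matches, _ in TABLE_19.values())
```

The computed values agree with the tabulated 93.5%, 75.3% and 71.4% to within about 0.1 percentage point, and the three match counts add up to the 34,268 peptides of the third group.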

[0714] Peptides with putative multiple enzymatic activities are of special interest. The positions of the different predicting sequence matches on the protein sequence have been evaluated using the Fisher distance model described above. Those peptides with a sufficient Fisher distance are believed to comprise two enzymatically active domains on the same peptide. In many cases, the molecules characterized by two EC classifications are large proteins (as opposed to peptides), which makes the multiple domains with separate functions seem plausible.

[0715] Table 21, provided on enclosed CD-ROM (file "Table-21.txt"), presents Predictions for Sargasso Sea data, with predicting sequence matches n>4: Categories 5c, 5c1i, 6c, 6c1i and up.

[0716] Table 22, provided on enclosed CD-ROM (file "Table-22.txt"), presents Predictions for Sargasso Sea data, with predicting sequence matches in Categories 3c, 3c1i, 4c, and 4c1i.

[0717] Table 23, provided on enclosed CD-ROM (file "Table-23.txt"), presents Predictions for Sargasso Sea data, with predicting sequence matches in Categories 2c and 2c1i.

[0718] In each of Tables 21-23, the first column from the left lists the Sargasso ID numbers of the proteins, the second column from the left lists the EC numbers found according to a preferred embodiment of the present invention, the third column from the left lists the descriptions of the EC classifications, the fourth column from the left lists the coherent predicting sequence coverages and the rightmost column lists the TAU protein number.

Example 8

Correlation of Predicting Sequence (PS) Sequences to EC Functional Classifications of Known Enzymes

[0719] The motif extraction procedure described above was used for defining predicting sequences from the Swiss-Prot enzymes as described in Example 1, using the values m=0.8 and C=0.01.

[0720] The deterministic sequence-motifs extracted by the motif extraction procedure were further subjected to a screening process, selecting predicting sequences (PS) that are specific to a single branch of the EC hierarchical classification and can be used as predicting sequences. More than half of all motifs turn out to belong uniquely to single branches of the fourth level of the hierarchy, to be denoted as predicting sequences of level 4 (PS4) (see FIG. 17), and predicting sequences of higher hierarchy (lower N; i.e. PS3, PS2 and PS1) do not include PS4s isolated from non-relevant classes. Thus if a peptide is shared by two or more level 4 groups that belong to the same 3rd EC level, and appears nowhere else, it is assigned to predicting sequence level 3. The predicting sequences were further screened to eliminate any peptide that includes within its sequence another peptide carrying the same predicting sequence N (N=1,2,3,4) label. The majority of predicting sequences occur at level 4 of the EC hierarchy, probably due to high homology within this level (that often includes orthologous genes). Thousands of predicting sequences occur at higher levels of hierarchy, reflecting functional similarity within enzymes with lower sequence similarity.

[0721] The occurrence of any one predicting sequence on the sequence of an enzyme specifies its EC functionality according to the specific branch N of its PSN. For example, enzyme P45048 (see FIG. 18) contains SSAATYG, a PS3 specific to 5.1.3, and LNVYGYSK, a PS4 specific to 5.1.3.20. The relationship of these predicting sequences to the EC hierarchy of predicting sequence families is shown in FIG. 17. Table 24 shows that the predicting sequences cover (i.e., appear on the sequence of) most enzymes in Swiss-Prot release 48.3. The coverage columns display the cumulative coverage of all predicting sequences to their left. Coverage is a measure of the success of the predicting sequence approach of the present embodiments. Thus, from the sixth column one can deduce that functional classification at the third level of EC is specified by 45,819 peptides of PS3∪PS4, covering 89.8% of the data. Information about the separate coverage of each PSN group is provided in Table 27, hereinunder.

TABLE 24
Occurrences of predicting sequences in all six EC classes in the analysis of all enzymes in Swiss-Prot release 48.3

                  No. of             coverage         coverage         coverage         coverage
EC class          enzymes   SP4      %         SP3    %         SP2    %         SP1    %
oxidoreductases   9,437     8,314    86.1      681    89        310    90.8      1,260  93.9
transferases      16,196    12,708   88.4      726    90.7      476    91.4      2,068  93.7
hydrolases        10,901    7,535    78.7      809    83.2      196    83.9      1,136  87.4
lyases            5,229     4,728    91.4      186    92.3      59     92.3      296    93.4
isomerases        2,887     2,588    91.5      48     92.2      25     92.3      154    93.2
ligases           6,048     6,974    96.1      495    97.1      93     97.3      500    98.2
total             50,698    42,874   87.3      2,945  89.8      1,159  90.5      5,414  92.9

Coverage

[0722] The occurrence of any one predicting sequence on the sequence of an enzyme specifies its EC functionality according to the specific branch N of the predicting sequence N. Tables 25 and 26 demonstrate that the predicting sequences cover (i.e. appear on the sequence of) most enzymes in the dataset. Shown in Tables 25 and 26 are the coverage in percentage of both the predicting sequences per EC level (Table 25) and of their cumulants (Table 26). The latter are defined as unions of the former, CPS3=PS3∪PS4, CPS2=PS2∪CPS3 and CPS1=PS1∪CPS2, and are relevant for functional assignments. Thus, for instance, the functional classification at the third level of EC is specified by 45,819 peptides of CPS3=PS3∪PS4, covering about 89.8% of the data. Note that the coverages of the various predicting sequences at levels N are not additive (e.g., the coverage of CPS3 is much smaller than the sum of the coverages of PS3 and PS4) because predicting sequences on higher branches of the hierarchy (lower N) are encountered on sequences that already possess sites of lower branches (higher N).

[0723] The distribution of the length of predicting sequences is displayed in FIG. 8 for all enzyme classes. The average length of the predicting sequences is 8.4±4.5. Enzymes that share large predicting sequences are highly homologous, while enzymes sharing shorter predicting sequences are characterized by a lower degree of sequence similarity. This is displayed, for short, medium and long motifs, in FIGS. 9a-c.

[0724] The distribution of the number of predicting sequences occurring on enzymes is given in FIG. 21. FIG. 23 is a histogram indicating distribution of the numbers of PSs occurring on enzymes with mean and median indicated.

TABLE 25
Coverage by predicting sequences of enzymes in Swiss-Prot release 48.3

EC class          PS4      PS3      PS2      PS1
oxidoreductases   86.1%    27.6%    18%      75%
transferases      88.4%    33.7%    27.4%    70%
hydrolases        78.7%    27.7%    19%      57.8%
lyases            91.4%    29.7%    15.5%    48.2%
isomerases        91.5%    16.8%    9.7%     39.9%
ligases           96.1%    55%      18.2%    64.1%
total             87.3%    32.47%   20.52%   63.8%

TABLE 26
Coverage by cumulants

EC class          PS4      CPS3     CPS2     CPS1
oxidoreductases   86.1%    89%      90.8%    93.9%
transferases      88.4%    90.7%    91.4%    93.7%
hydrolases        78.7%    83.2%    83.9%    87.4%
lyases            91.4%    92.3%    92.3%    93.4%
isomerases        91.5%    92.2%    92.3%    93.2%
ligases           96.1%    97.1%    97.3%    94.2%
total             87.3%    89.8%    90.5%    92.8%

Generalization of Enzyme Class Prediction

[0725] The SwissProt 48.3 dataset contains 260 enzymes that have more than one annotation, and, therefore, have been excluded from the training set. Using them as a test set, 849 hits of PSs on 157 of these enzymes were found. 711 of the 849 hits agree with one of the given annotations and 138 do not, thus obtaining an accuracy of 84%. The results are displayed in Table 27, comparing the Swiss-Prot EC annotations with PS predictions. For example, the first protein on the list has Swiss-Prot EC annotations of 2.7.2.4 and 1.1.1.3. Its sequence matches two PSs, one PS1 of class 1 and one PS4 of 2.7.2.4. This is counted as two correct matches. The columns in Table 27 indicate the protein id according to Swiss-Prot, its two EC assignments, the EC assignments according to SP predictions, and the number of SP matches that have the same EC prediction (separated into correct and false predictions).

TABLE 27

ID   EC1   EC2   PS Prediction   # correct matches   # false matches
P00561 2.7.2.4 1.1.1.3 1
P00561 2.7.2.4 1.1.1.3 2.7.2.4
P00561 Total 2 0
P27725 2.7.2.4 1.1.1.3 1
P27725 2.7.2.4 1.1.1.3 2.7.2.4
P27725 2.7.2.4 1.1.1.3 6.3.4.2 0 1
P27725 Total 2 1
P44505 2.7.2.4 1.1.1.3 1
0
4.3.2. 2.3.1.1 4
4.3.2. 2.3.1.1 4.3.2. 27
28 0
4.3.2. 2.3.1.1 4
4.3.2. 2.3.1.1 4.3.2. 27
28 0
4.3.2. 23. 4
4.3.2. 23. 4.3.2. 22
23 0
4.3.2. 23. 4
4.3.2. 23. 4.3.2. 27
28 0
4.3.2. 23. 4
4.3.2. 23. 4.3.2. 27
28 0
4.3.2. 23. 4
4.3.2. 23. 4.3.2. 27
28 0
4.1.1 2.1.2 2.1.2.9 2
Q8XDZ3 Total 2 0
P77398 4.1.1 2.1.2 2.1.2.9 2
2 0
4.1.1 2.1.2 2
4.1.1 2.1.2 2.1.2.9
4.1.1 2.1.2 3 0 1
Q8Z540 Total 2 1
O52325 4.1.1 2.1.2 2
O52325 4.1.1 2.1.2 2.1.2.9
O52325 4.1.1 2.1.2 3 0 1
O52325 Total 2 1
Q8RF47 4.2.3.4 3.6.1 3.6.1.11
Q8RF47 4.2.3.4 3.6.1 4.2.3.4 2
3 0
2.7.1.71 4.2.3.4 4.2
2.7.1.71 4.2.3.4 4.2.3.4 6
7 0
2.7.1.71 4.2.3.4 4
2.7.1.71 4.2.3.4 4.2
2.7.1.71 4.2.3.4 4.2.3.4 2
Q9WYI3 Total 4 0
P52081 3.5.1.28 3.21.96 2.7.2.3 0 1
P52081 3.5.1.28 3.21.96 3 1
1
1.14.14.1 1.6.2.4 1
1.14.14.1 1.6.2.4 1.4 1
1 1
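The counting convention used in the generalization test (a PS hit is correct when its EC label agrees with one of the enzyme's annotations) can be sketched as follows. The sketch assumes that "agrees" means the predicted EC label is identical to, or a parent branch of, an annotated EC number, which is how the worked example in the text (a PS1 of class 1 counted as correct for annotation 1.1.1.3) reads. The function names are illustrative.

```python
def ec_levels(ec):
    """Split an EC label such as '2.7.2.4' or '1' into its levels."""
    return [part for part in ec.split(".") if part]


def agrees(prediction, annotation):
    """True if the prediction equals the annotation or is one of its parents."""
    pred, ann = ec_levels(prediction), ec_levels(annotation)
    return len(pred) <= len(ann) and ann[:len(pred)] == pred


def score_hits(predictions, annotations):
    """Return (correct, false) counts for a list of PS predictions."""
    correct = sum(
        1 for p in predictions if any(agrees(p, a) for a in annotations)
    )
    return correct, len(predictions) - correct


# First protein discussed in the text: annotations 2.7.2.4 and 1.1.1.3,
# matched by one PS1 of class 1 and one PS4 of 2.7.2.4.
first = score_hits(["1", "2.7.2.4"], ["2.7.2.4", "1.1.1.3"])

# Same annotations with one extra hit pointing to an unrelated branch.
second = score_hits(["1", "2.7.2.4", "6.3.4.2"], ["2.7.2.4", "1.1.1.3"])

# Overall accuracy reported for the test set: 711 correct hits out of 849.
overall_accuracy = 100.0 * 711 / 849
```

Under this assumption the first protein scores two correct and no false matches, as stated in the text, and 711/849 rounds to the reported 84%.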

TABLE 27-continued TABLE 27-continued

PS # correct if false PS # correct if false ID EC1 EC2 Prediction matches matches ID EC2 Prediction matches matches P23473 3.2.1.14 3.2.1.17 3.2.1 1 Q27713 2.1.1.45 1.5.1.3 1 Q27713 2.1.1.45 2 2 P234.73 Total 1 Q27713 2.1.1.45 1 Q13057 2.7.7.3 2.7.1.24 1 Q27713 2.1.1.45 1 1 Q27713 2.1.1.45 1.45 8 Q13057 Total Q9DBL7 2.7.7.3 2.7.1.24 13 O Q9DBL7 2.7.7.3 2.7.1.24 : 2.1.1.45 2 Q9DBL7 2.7.7.3 2.7.1.24 2.1.1.45 1 2.1.1.45 1 Q9DBL7 Total 2.1.1.45 1.45 8 P14779 1.14.14.1 1.6.2.4 P2O712 Total 12 O P14779 Total P13922 2.1.1.45 2 Q9ACU1 2.S. 2.5.1.31 2: P13922 2.1.1.45 1 Q9ACU1 2.S. 2.5.1.31 2.5.1 31 P13922 2.1.1.45 1 P13922 2.1.1.45 1.45 8 Q9ACU1 Total Q575.06 4.6.1.1 3.6.3 .14 O P13922 Total 12 O OO2604 2.1.1.45 1 Q57506 Total OO2604 2.1.1.45 1 P15318 4.6.1.1 3.6.3 .14 g OO2604 2.1.1.45 1 1 OO2604 2.1.1.45 1.45 8 P15318 Total OO2604 2.1.1.45 O 1 Q05762 2.1.1.45 Q05762 2.1.1.45 2.1. O02604 Total 11 1 Q05762 2.1.1.45 2.1. 45 P51820 2.1.1.45 3 P51820 2.1.1.45 1 Q05762 Total P51820 2.1.1.45 1.45 11 Q05763 2.1.1.45 P51820 2.1.1.45 O 1 Q05763 2.1.1.45 2.1. Q05763 2.1.1.45 2.1. PS1820 Total 15 1 Q05763 2.1.1.45 5.1.1. Q07422 2.1.1.45 O 1 Q07422 2.1.1.45 1 Q05763 Total Q07422 2.1.1.45 1 Q23695 2.1.1.45 2.1. Q07422 2.1.1.45 1.45 7 Q23695 2.1.1.45 2.1. 45 Q07422 Total 9 1 Q23695 Total Q27783 2.1.1.45 2 P45350 2.1.1.45 1.5.1. Q27783 2.1.1.45 i. 1 P45350 2.1.1.45 Q27783 2.1.1.45 7 P45350 2.1.1.45 2.1. P45350 2.1.1.45 5.1.1. Q27783 Total 10 O Q27793 2.1.1.45 2 P45350 Total Q27793 5. 3 2.1.1.45 i. 1 P16126 2.1.1.45 Q27793 2.1.1.45 7 P16126 2.1.1.45 2.1 P16126 2.1.1.45 2.1.1 Q27793 Total 10 O P16126 2.1.1.45 2.1.1 45 Q9CGE3 2.76.3 3.54.16 3.54.16 3 P16126 Total 3 O PO7382 2.1.1.45 2.76.3 3.54.16 3.54.16 3 PO7382 2.1.1.45 2.1. PO7382 2.1.1.45 2.1. 45 Q8GJP4 Total 3 O PO7382 2.1.1.45 6.1. Q10663 233.9 Q10663 233.9 2.3 PO7382 Total Q10663 233.9 2.33.9 5 O81395 2.1.1.45 1.5.1. Q10663 233.9 4.1 O81395 2.1.1.45 Q10663 233.9 4.1.3 O81395 2.1.1.45 2.1 Q10663 233.9 4.1.3.1 4 O81395 2.1.1.45 2.1. 
O81395 2.1.1.45 2.1. 45 Q10663 Total 13 O Q7TQ49 2.7.1.60 2.7.1 O81395 Total O Q5UQG3 2.1.1.45 2.1. 45 2.7.1.60 2.7.1

Q5UQG3 Total O Q27828 2.1.1.45 1.5.1. 2.7.1.60 2.7.1 Q27828 2.1.1.45 Q27828 2.1.1.45 2.1. Q91WG8 Total O Q27828 2.1.1.45 2.1. 45 O35826 2.7.1.60 2.7.1 Q27828 Total O35826 Total O US 2013/0332.133 A1 Dec. 12, 2013 64

TABLE 27-continued TABLE 27-continued

PS # correct if false PS # correct if false ID EC1 EC2 Prediction matches matches ID EC1 EC2 Prediction matches matches P17114 2.7.7.23 2.3.1.157 2.7.1.40 O 1 Q58999 2.7.1.147 2.7. .146 2 Q58999 2.7.1.147 2.7. .146 2.7.1 P17114 Total O 1 Q58999 2.7.1.147 2.7. .146 2.7.1.146 P43675 6.3.1.8 3.5.1.78 2 O 1 Q58999 Total 3 O Q55928 2.77.1 3.6. 2.1. 2.1.1.33 Q55928 2.77.1 3.6. 2.77.1 2.1. 2.1.1.33 2.1.1.33 2.1. 2.1.1.33 2.4 Q55928 Total 2 O 2.1. 2.1.1.33 3.13.48 OS482O 2.7.7.4 2.7. 25 2.7.1.25 OS482O 2.7.7.4 2.7. 25 3.4.11.18 O 1 OS482O 2.7.7.4 2.7. 25 6 O 1 2.1. 2.1.1.33 2.1. 2.1.1.33 2.1.1.33 O54820 Total 2 O4.3252 2.7.7.4 2.7. 25 2.7.1.25 3 O4.3252 2.7.7.4 2.7. 25 3.4.11.18 O 1 2.7. 2.7.7 2.1.2.9 O4.3252 2.7.7.4 2.7. 25 6 O 1

O OO4.3252 Total 2 2.7. 2.7.7 3.6.3 Q60967 2.7.7.4 2.7. 25 2.7.1.25 Q60967 2.7.7.4 2.7. 25 3.4.11.18 O 1 Q60967 2.7.7.4 2.7. 25 6 O 1 2.7. 2.7.7 3.6.3 Q60967 Total 2 Q8FDH5 Total O95340 2.7.7.4 2.7. 25 2.7 P76658 2.7. 2.7.7 3.6.3 O95340 2.7.7.4 2.7. 25 2.7.1.25 2 O95340 2.7.7.4 2.7. 25 6 O 1

2.7. 2.7.7 g O95340 Total 3 1 2.7. 2.7.7 O88428 2.7.7.4 2.7. 25 2.7 1 O88428 2.7.7.4 2.7. 25 2.7.1.25 2 O O88428 2.7.7.4 2.7. 25 6 O 1 2.7. 2.7.7 O88428 Total 3 1 Q97.KZO Total O Q27128 2.7.7.4 2.7. 25 2.7 1 Q9CME6 2.7. 2.7.7 3.421 Q27128 2.7.7.4 2.7. 25 2.7.1 1 Q27128 2.7.7.4 2.7. 25 2.7.1.25 4 Q9CME6 Total Q27128 2.7.7.4 2.7. 25 3.1 O Q88D93 2.7. 2.7.7 2.7 Q27128 2.7.7.4 2.7. 25 6 O Q27128 Total 6 2 2.7. 2.7.7 2.7 P36204 2.7.2.3 5.3.1. 4 P36204 2.7.2.3 5.3.1. 2.7 4 P36204 2.7.2.3 5.3.1. 2.7.2.3 29 2.7. 2.7.7 P36204 2.7.2.3 5.3.1. 5.3.1.1 1

P36204 Total 48 O 2.7. 2.7.7 118.6.1 O13911 3.13.32 2.7. .78 O13911 3.13.32 2.7. .78 3.1.3.18 O 1

2.7. 2.7.7 3.6.3 O13911 Total Q96T60 3.13.32 2.7. .78 1.17.4.3 O Q96T60 3.13.32 2.7. .78 2.7. 2.7.7 3.6.3 1 3.13.32 2.7. .78 2.7. 2.7.7 3.6.3 3.13.32 2.7. .78 3.1.3.18 O 1 Q7UBI8 Total Q975B5 2.7. 2.7.7 6.11.7 6.3.4.13 6.3.3.1 2.6.1.52 O 6.3.4.13 6.3.3.1 6.3.4.13 6.3.3.1 6.3 3.5.2.7 4.3.1.3 6.3.4.13 6.3.3.1 6.3.3.1 9 3.5.2.7 4.3.1.3 3.5.2.7 6.3.4.13 6.3.3.1 6.3.4.13 6 3.5.2.7 4.3.1.3 3.5.2.7 4.3.1.3 4.3.1 P2O772 Total 17 1 3.5.2.7 4.3.1.3 4.3.1.3 Q99148 6.3.4.13 6.3.3.1 1 Q99148 6.3.4.13 6.3.3.1 6.3 1 Q8YD09 Total Q99148 6.3.4.13 6.3.3.1 6.3.3.1 10 Q58270 2.5.1.1 2.5.1.10 6.1.1.19 Q99148 6.3.4.13 6.3.3.1 6.3.4.13 6 Q58270 Total Q99148 Total 18 O US 2013/0332.133 A1 Dec. 12, 2013 65

TABLE 27-continued

PS # correct if false PS # correct if false ID EC1 EC2 Prediction matches matches ID EC1 EC2 Prediction matches matches

PO7244 6.3.4.13 6.3.3. 6 Q9HUV9 2.1.2.3 3.54.10 4.2.1.24 O PO7244 6.3.4.13 6.3.3. 6.3.3.1 1 PO7244 6.3.4.13 6.3.3. 6.3.4.13 4 Q9HUV9 Total O Q88DK3 2.1.2.3 3.54.10 4.2.1.24 O PO7244 Total 16 O Q8A155 2.1.2.3 3.54.10 2 Q88DK3 Total O Q87VR9 2.1.2.3 3.54.10 4.2.1.24 O Q8A155 Total O Q89WU7 2.1.2.3 3.54.10 4.2.1.24 O 1 Q87VR9 Total O Q8Z335 2.1.2.3 3.54.10 2 1 Q89WU7 Total O 1 Q8Z335 2.1.2.3 3.54.10 2.5.1 O P571.43 2.1.2.3 3.54.10 2 Q8Z335 2.1.2.3 3.54.10 4.2.1.24 O Q8Z335 2.1.2.3 3.S.410 S O PS7143 Total O Q8KA70 2.1.2.3 3.54.10 2 Q8Z335 Total 1 3 P26978 2.1.2.3 3.54.10 2 1 Q8KA70 Total O P26978 2.1.2.3 3.54.10 2.5.1 O Q9ABY4 2.1.2.3 3.54.10 4.2.1.24 O P26978 2.1.2.3 3.54.10 4.2.1.24 O P26978 2.1.2.3 3.S.410 S O Q9ABY4 Total O P313.35 2.1.2.3 3.54.10 2 P26978 Total 3 O74928 2.1.2.3 3.54.10 4.2.1.11 O P3133S Total O Q892X3 2.1.2.3 3.54.10 2.7.1.37 O O74928 Total O QSHH11 2.1.2.3 3.54.10 2. Q892X3 Total O Q9RHX6 2.1.2.3 3.54.10 3.6.3.14 O Q5HH11 Total O Q9RHX6 2.1.2.3 3.54.10 5.3.1.16 O P67543 2.1.2.3 3.54.10 2.

Q9RHX6 Total O 2 P67543 Total O Q8X611 2.1.2.3 3.54.10 2 1 P67544 2.1.2.3 3.54.10 2. Q8X611 2.1.2.3 3.54.10 2.5.1 O Q8X611 2.1.2.3 3.54.10 4.2.1.24 O P67544 Total O Q8X611 2.1.2.3 3.S.410 S O Q6GI11 2.1.2.3 3.54.10 2. Q8X611 Total 1 3 Q6GI11 Total O Q8FB68 2.1.2.3 3.54.10 2 1 Q6GAEO 2.1.2.3 3.54.10 2. Q8FB68 2.1.2.3 3.54.10 2.5.1 O Q8FB68 2.1.2.3 3.54.10 4.2.1.24 O Q6GAEO Total O Q8FB68 2.1.2.3 3.S.410 S O Q8NX88 2.1.2.3 3.54.10 2. Q8FB68 Total 1 3 Q8NX88 Total O P15639 2.1.2.3 3.54.10 2 1 P67545 2.1.2.3 3.54.10 4.2.1.24 O P15639 2.1.2.3 3.54.10 2.5.1 O P15639 2.1.2.3 3.54.10 4.2.1.24 O P6754S Total O P15639 2.1.2.3 3.S.410 S O P67546 2.1.2.3 3.54.10 4.2.1.24 O

P15639 Total 1 3 P67546 Total O P43852 2.1.2.3 3.54.10 2 1 Q8DWK8 2.1.2.3 3.54.10 4.2.1.24 O P43852 2.1.2.3 3.54.10 4.2.1.24 O P43852 2.1.2.3 3.54.10 5.3.1.9 O Q8DWK8 Total O Q8K8Y6 2.1.2.3 3.54.10 4.2.1.24 O P43852 Total 1 2 P31939 2.1.2.3 3.54.10 2 1 Q8K8Y6 Total O P31939 2.1.2.3 3.54.10 5.99.13 O Q5XEF2 2.1.2.3 3.54.10 4.2.1.24 O

P31939 Total 1 Q5XEF2 Total O Q9CWJ9 2.1.2.3 3.54.10 2 1 Q8P310 2.1.2.3 3.54.10 4.2.1.24 O

3. Total 2.1.2.3 3.54.10 6.1.1.20 o O Q8P310 Total O . . . . - . . . . Q97T99 2.1.2.3 3.54.10 4.2.1.24 O P67542 Total O Q9RAJS 2.1.2.3 3.54.10 2.7.1.37 O Q97T99 Total O Q9RAJS 2.1.2.3 3.54.1O 6.1.1.2O O Q8DRM1 2.1.2.3 3.5.4.10 4.2.1.24 O

Q9RAJS Total O 2 Q8DRM1 Total O P67541 2.1.2.3 3.54.10 6.1.1.20 O Q9F1T4 2.1.2.3 3.54.10 4.2.1.24 O

P67541 Total O Q9F1T4 Total O P57828 2.1.2.3 3.54.10 2 1 Q9KV80 2.1.2.3 3.54.10 2 1 P57828 2.1.2.3 3.54.10 4.2.1.24 O Q9KV80 2.1.2.3 3.54.10 4.2.1.24 O

PS7828 Total 1 Q9KV80 Total 1 US 2013/0332.133 A1 Dec. 12, 2013 66

TABLE 27-continued

PS # correct if false PS # correct if false ID EC1 EC2 Prediction matches matches ID EC1 EC2 Prediction matches matches

Q5E257 2.1.2.3 3.54.10 4.2.1.24 O O2S806 2.7.7.6 5.3.1.16 O 1 o O2S806 2.7.7.6 6.1.1.4 O 1 Q5E257 Total O Q87KTO 2.1.2.3 3.54.10 4.2.1.24 O O25806 Total 13 5 o Q7MA56 2.7.7.6 1 O 1 Q87KTO Total O Q7MA56 2.7.7.6 2 1 Q8DD06 2.1.2.3 3.54.10 4.2.1.24 O Q7MA56 2.7.7.6 2.7 3 Q7MA56 2.7.7.6 2.7.7.6 9 Q8DD06 Total O Q7MA56 2.7.7.6 6.2.1.1 O 1 Q7MGT5 2.1.2.3 3.54.10 4.2.1.24 O Q7MA56 Total 13 2 Q7MGT5 Total O Q85FR6 2.7.7.6 2 2 Q8PQ19 2.1.2.3 3.54.10 4.2.1.24 O Q85FR6 2.7.7.6 2.7 4 Q8PQ19 2.1.2.3 3.54.10 6.3.4.5 O Q85FR6 2.7.7.6 2.7.1 O 1 Q85FR6 2.7.7.6 2.7.7.6 24 Q8PQ19 Total O 2 Q8PD47 2.1.2.3 3.54.10 4.2.1.24 O Q85FR6 Total 30 1 o P28668 6.1.1.17 6.1.1.15 2 O 1 Q8PD47 Total O P28668 6.1.1.17 6.1.1.15 2.4.229 O 1 Q9PC10 2.1.2.3 3.54.10 2 P28668 6.1.1.17 6.1.1.15 6 1 Q9PC10 2.1.2.3 3.54.10 4.2.1.24 O P28668 6.1.1.17 6.1.1.15 6.1.1 3

Q9PC10 Total P28668 Total 4 2 Q87D58 2.1.2.3 3.54.10 2 PO7814 6.1.1.17 6.1.1.15 2 O 1 Q87D58 2.1.2.3 3.54.10 4.2.1.24 O PO7814 6.1.1.17 6.1.1.15 6 1 o PO7814 6.1.1.17 6.1.1.15 6.1.1 2 Q87D58 Total PO7814 6.1.1.17 6.1.1.15 6.1.1.18 O 1 Q8ZAR3 2.1.2.3 3.54.10 2 Q8ZAR3 2.1.2.3 3.54.10 4.2.1.24 O PO7814 Total 3 2 o Q8CGC7 6.1.1.17 6.1.1.15 2 O 1 Q8ZAR3 Total Q8CGC7 6.1.1.17 6.1.1.15 6 1 P09S46 5.99.8 5.1.12 Q8CGC7 6.1.1.17 6.1.1.15 6.1.1 2 P09S46 5.99.8 5.1.12 2. O 2 Q8CGC7 6.1.1.17 6.1.1.15 6.1.1.18 O 1

P09S46 Total 2 Q8CGC7 Total 3 2 OS2485 5.99.8 5.1.12 Q58635 6.1.1.15 6.1.1.16 6.1.1 OS2485 5.99.8 5.1.12 .2.1 O 2 OS2485 5.99.8 .5.1.12 4.2.1.20 O Q58635 Total O o P61422 2.5.1.3 2.74.7 2.5.1.3 OS2485 Total 3 P61422 2.5.1.3 2.74.7 2.74.7 P95629 5.99.8 5.1.12 P95629 5.99.8 5.1.12 .2.1 O P61422 Total 2 O P95629 5.99.8 5.1.12 2 O Q8YRC9 1.5.3 1

P95629 Total 2 Q8YRC9 Total O P10503 5.99.8 5.1.12 Q92F56 6.3.4 2.4.2.8 2 P10503 5.99.8 5.1.12 .2.1 O Q92F56 6.3.4 2.4.2.8 2.4.2.8 4

P1 OSO3 Total Q92F56 Total 5 O Q7VJ82 2.7.7.6 O Q724J4 6.3.4 2.4.2.8 2 Q7VJ82 2.7.7.6 2 2 Q724J4 6.3.4 2.4.2.8 2.4.2.8 4 Q7VJ82 2.7.7.6 2.4.1.1 O Q7VJ82 2.7.7.6 2.7 3 Q724J4 Total 5 O Q7VJ82 2.7.7.6 2.7.7.6 9 Q8YAC7 6.3.4 2.4.2.8 2 Q8YAC7 6.3.4 2.4.2.8 2.4.2.8 4 Q7VJ82 Total 14 2 Q97.K23 2.7.7.6 1 O 2 Q8YAC7 Total 5 O Q97.K23 2.7.7.6 2 1 Q8R6G8 2 2.1.1.33 2.1.1.33 Q97.K23 2.7.7.6 2.4.1.1 O 1 Q97.K23 2.7.7.6 2.7 3 Q8R6G8 Total O Q97.K23 2.7.7.6 2.7.7.6 9 P46843 1.8.19 16.45 1 Q97.K23 2.7.7.6 5.3.1.16 O P46843 1.8.19 16.45 1.8.19 4 Q97.K23 2.7.7.6 6.1.1.4 O Q97.K23 2.7.7.6 6.2.1.1 O P46843 Total 5 O P31625 3.4.23 3.6.1.23 3.6.1.23 Q97.K23 Total 13 6 O2S806 2.7.7.6 1 O 2 P3162S Total O O2S806 2.7.7.6 2 1 P29127 3.2.1.8 3.2.1.8 2 O2S806 2.7.7.6 2.4.1.1 O 1 P29127 3.2.1.8 6.35.2 O 2 O2S806 2.7.7.6 2.7 3 O2S806 2.7.7.6 2.7.7.6 9 P29127 Total 2 2 US 2013/0332.133 A1 Dec. 12, 2013 67

TABLE 27-continued

ID  EC1  EC2  Prediction  # correct PS matches  # false PS matches
P29126 3.2.1.8 3
P29126 3.2.1.8 3.2.1.8 1
P29126 Total 2 0
Grand Total 719 132

[0726] The ability to generalize using the exemplary MEX algorithm was tested on several cross-validation choices of training and test sets within the class of oxidoreductases and found to be of the order of 85% (see Table 28).

TABLE 28
Generalization tests on the oxidoreductase class.

test set size   level 2 Jaccard score   level 3 Jaccard score
10%             0.86 ± 0.04             0.86 ± 0.07
20%             0.86 ± 0.03             0.85 ± 0.05
25%             0.85 ± 0.03             0.85 ± 0.04

[0727] Additionally, MEX was run on the Swiss-Prot 45 release (October 2004) and its predictions were tested on 10,000 novel enzymes that are listed in the Swiss-Prot 48.3 release (for the relation between these two sets see FIG. 22 and Table 29). The results were similar to those described above (see Table 31).

TABLE 29
Numbers of enzymes in Swiss-Prot release 48.3 and Swiss-Prot release 45.

EC class          R45 ∩ R48   R48 ∩ not in R45   R45 ∩ not in R48
Oxidoreductases   7776        1661               142
Transferases      12474       3722               333
Hydrolases        8728        2173               254
Lyases            4140        1089               492
Isomerases        2348        541                33
Ligases           4649        1399               43
Total             40115       10585              1297

[0728] Table 30 summarizes results of a generalization test on all levels of the EC hierarchy. Recall specifies the coverage of the novel sequences (i.e., R48 ∩ not in R45) by PSs extracted from Swiss-Prot release 45. Precision denotes the percentage of correct assignments according to the EC hierarchy.

TABLE 30
Generalization test on all levels of EC.

EC class          # of sequences   Recall          Precision
Oxidoreductases   1661             1235 (74.35%)   99.35%
Transferases      3722             3253 (87.4%)    99.3%
Hydrolases        2173             1614 (74.25%)   98.45%
Lyases            1089             930 (85.4%)     99.65%
Isomerases        686              611 (89%)       83%
Ligases           1399             1385 (99%)      99.55%
Total             10730            9028 (84.15%)   98.5%

[0729] Both generalization tests suffer from a bias problem, i.e., there exist enzymes in the test sets that have high sequence similarity to some enzymes in the training sets.

[0730] In conventional machine-learning approaches to analysis of sequence-to-function problems, bias in data sets is often accounted for by avoiding high sequence similarity between proteins in the test set and proteins in the training set. In this case, this type of avoidance is practically infeasible, because it effectively calls for eliminating from the training set all enzymes that have the same 4-digit EC number as the one being tested.

[0731] Therefore, bias was handled by the following procedure:

[0732] (a) start with the test set consisting of all sequences of SwissProt release 48.3 that do not appear in release 45.

[0733] (b) BLAST each sequence against the sequences of the training set (SwissProt release 45) that do not have the same 4-digit EC number.

[0734] (c) include in the non-redundant test set only sequences whose BLAST score (Altschul et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucl. Acids Res., 25:3389-3402) with all other training sequences (including those with the same first 3 EC digits) is larger than 10. A representative example of a non-redundant database is provided in Appendix 1 below and further in Table 37 on enclosed CD-ROM. (d) test generalization on this non-redundant set only for motifs in PS1, PS2, and PS3, thus avoiding the PS4 motifs that were extracted from the same 4th level EC sequences as those of the non-redundant test set. The results of this non-biased generalization test are presented in Table 34, which indicates that 440 (about 40%) of the test-set enzymes contain predicting sequences that fit the correct classification with an accuracy of 88%.

[0735] In Table 34, numbers in the three PSN columns indicate the number of sequences covered by PSs. Numbers in brackets indicate the numbers of PSs observed to occur on the sequences. Columns tp and fp display true-positive and false-positive predictions, where tp corresponds to the PS indicating correctly the EC classification and fp indicates contradiction with the EC classification.

TABLE 34

Coverage of non-redundant test set by motifs in PS1, PS2 and PS3.

class             # of seq.   PS1        tp1   fp1   PS2      tp2   fp2   PS3      tp3   fp3
Oxidoreductases   36          15(35)     34    1     0        0     0     0        0     0
Transferases      15          7(13)      13    0     2(2)     2     0     2(2)     2     0
Hydrolases        98          30(41)     39    2     5(5)     4     1     4(4)     2     2
Lyases            134         22(23)     23    0     10(12)   11    1     13(18)   18    0
Isomerases        147         38(42)     26    16    6(6)     6     0     9(14)    8     6
Ligases           10          3(5)       5     0     4(10)    10    0     0        0     0
total             440         115(159)   140   19    27(35)   33    2     28(38)   30    8
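The 88% accuracy quoted for the non-biased generalization test can be cross-checked against the tp and fp columns of Table 34. The short Python sketch below does this under one assumption that the text does not state explicitly: that the quoted accuracy is the pooled ratio tp/(tp + fp) over PS1, PS2 and PS3. The names `TABLE_34` and `pooled_precision` are illustrative only, and the counts are transcribed from the table as read here.

```python
# Rows of Table 34 as (class, tp1, fp1, tp2, fp2, tp3, fp3).
# Values are transcribed from the table; the variable names are hypothetical.
TABLE_34 = [
    ("Oxidoreductases", 34, 1, 0, 0, 0, 0),
    ("Transferases", 13, 0, 2, 0, 2, 0),
    ("Hydrolases", 39, 2, 4, 1, 2, 2),
    ("Lyases", 23, 0, 11, 1, 18, 0),
    ("Isomerases", 26, 16, 6, 0, 8, 6),
    ("Ligases", 5, 0, 10, 0, 0, 0),
]


def pooled_precision(rows):
    """Sum tp and fp over PS1-PS3 and all classes, then return tp/(tp+fp).

    Assumption: the accuracy reported in the text is this pooled ratio.
    """
    tp = sum(r[1] + r[3] + r[5] for r in rows)
    fp = sum(r[2] + r[4] + r[6] for r in rows)
    return tp, fp, tp / (tp + fp)


tp, fp, precision = pooled_precision(TABLE_34)
print(f"tp={tp} fp={fp} precision={precision:.3f}")
```

With the counts above, the pooled totals are 203 true positives and 29 false positives, a ratio of 0.875, which is consistent with the accuracy of 88% stated in the text.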

Remote Homology

[0736] Results presented hereinabove suggest that short predictive sequence motifs, although often extracted from homology, may be better alternatives for functional specification of proteins. Data summarized in Table 35 suggest that relying on sequence identity within long aligned sections may turn out to be fortuitous, while shorter motifs appear to tell the true story. Table 35 displays pairs of enzymes that have large sequence identity yet different functional assignments. All displayed EC assignments are substantiated by corresponding predictive sequences located on these enzymes, most belonging to PS4. The number of predictive sequences per enzyme varies from one (in the cases of GTFB_STRMU and GABT_ECOLI, the latter having only one PS3 peptide) to 24 for AMY3B_ORYSA. Thus the pair of enzymes GTFB_STRMU and AMY3B_ORYSA contains both extremes. Note that in spite of the reported 42% sequence identity along an alignment of 105 amino-acids, none of the 24 predicting sequences occurring on AMY3B_ORYSA had an exact match on GTFB_STRMU, and a single PS4 (GGAFLE: SEQ ID No.: 29308) found on the latter determines correctly its EC classification. Table 35 summarizes data for enzymes with high sequence similarity and different EC assignments. Alignment and identity were calculated according to the Smith-Waterman method. EC assignments agree with PSs occurring on the enzymes.

TABLE 35
Enzymes with high sequence similarity and different EC assignments.

enzyme 1                    enzyme 2                     sequence identity   alignment length   e-value
GUNA_PSEFL (EC 3.2.1.4)     MDHP_FLABI (EC 1.1.1.82)     71.9%               28                 1.6e-03
PLB1_YEAST (EC 3.1.1.5)     METB_ARATH (EC 2.5.1.48)     60%                 30                 5.9e-05
RPB1_PLAFD (EC 2.7.7.6)     UBC2_YEAST (EC 6.3.2.19)     63%                 27                 1.8e-05
CHIB_POPTR (EC 3.2.1.14)    KDGE_DROME (EC 2.7.1.107)    58%                 24                 6.0e-06
ODO2_FUGRU (EC 2.3.1.61)    PP2BB_HUMAN (EC 3.1.3.16)    53%                 39                 1.1e-06
GTFB_STRMU (EC 2.4.1.5)     AMY3B_ORYSA (EC 3.2.1.1)     42%                 105                7.4e-08
RPB1_PLAFD (EC 2.7.7.6)     BDE3B_RAT (EC 3.1.4.17)      58%                 36                 8.4e-08
IGF1R_HUMAN (EC 2.7.10.1)   PTPRU_HUMAN (EC 3.1.3.48)    34%                 157                1.5e-09

Biological Roles of Predicting Sequences

[0737] An analysis of correlation between predictive sequences and previously characterized active sites was undertaken in order to ascertain whether the predictive sequences play an important role in the active and binding sites of enzymes.

[0738] The Inventors of the present invention have constructed a database of 26,931 predicting sequences from 21,228 enzymes of the Swiss-Prot 48.3 dataset which carry annotations of loci of active sites and binding sites. These enzymes constitute about 42% of the 48.3 dataset. It was found by the present Inventors that 65% of all active and binding sites are covered by predicting sequences. This can be compared with the coverage of random positions on the same enzyme sequences which, on average, is only 27%. Such average was found to be off by about 80 standard deviations. To validate the ability of the database of the present embodiments to cover binding and active sites, a non-redundant database was constructed from a reduced set of 582 enzymes, one enzyme for each EC number. This non-redundant database included 6,660 predicting sequences which covered about 52% of the active and binding sites. The coverage of random positions on the same enzyme sequences was, on average, 21% (off by about 33 standard deviations). It is recognized that the non-redundant database is unbiased and therefore allows estimating what the active and binding site coverage would be had the annotations existed for all enzymes (rather than 42%). The present Inventors have succeeded in estimating a 12% coverage with a high statistical significance.

[0739] In analyzing the significance of predicting sequence coverage of active and/or binding sites, the coverage was compared with that of randomly chosen residues on enzyme sequences. This was carried out on all annotated enzymes with predicting sequence hits, as well as on the non-redundant set. The deviations of the measurements from random distributions were very high, and are quoted below in quanta of standard deviations (SDs). The corresponding p-values were found to be practically zero (below 10').

[0740] The results are presented in Table 31.

TABLE 31

database        No. of enzymes   active sites hit by SPs   random sites hit by SPs   SDs   No. of PSs   PSs hitting sites
all             21,228           65%                       27%                       80    26,931       8%
non redundant   582              52%                       21%                       33    6,660        12%

[0741] FIG. 19 displays aligned subsequences of enzymes belonging to the same 3rd level but to different 4th levels of the EC hierarchy: 6 out of 35 enzymes of 5.1.3.2 and 7 out of 29 enzymes of 5.1.3.20. Shown are strings belonging to the sequences that include active sites and binding sites as indicated in Swiss-Prot annotations, with bold-faced substrings denoting predicting sequences from our lists. Whereas in 5.1.3.20 most active sites are flanked by predicting sequences, this is not the case for the active site of 5.1.3.2. FIG. 19 displays a 3D picture of one of the enzymes of 5.1.3.20. The motif RYFNV (SEQ ID No.: 64741) can be seen to lie in proximity to both the S and Y active sites, sharing the same pocket.

[0742] An example stressing the relationships among predicting sequences and spatial structures is presented in FIGS. 20a-c. This enzyme belongs to 5.4.99.12 and it contains many predictive sequences. Shown here are predictive sequences that maintain a fixed sequence-distance from the active site for many of the enzymes in this level 4 class. Two predicting sequences flank the active site: one, HMVRNI (SEQ ID No.: 64382), shares a pocket with the active site and the two binding sites, and the other, FHARF (SEQ ID No.: 64294), plays the role of RNA binding in this tRNA pseudouridine synthase I. FHARF (SEQ ID No.: 64294) is one example of previously discovered motifs. Some other examples are: a. GFGRIG (SEQ ID No.: 14612; predicting sequence of 1), a conserved region of GAPDH that is active in the glycolytic pathway; b. HRDLKP (SEQ ID No.: 35399; predicting sequence of 2.7.1), appearing in protein kinases; c. IFIDEID (SEQ ID No.: 44623; predicting sequence of 3.6.4.3), the Walker B motif of ATPase; to name a few. However, most of the predicting sequences have not been studied before.

[0743] These results raise the question of how many predicting sequences can be found in the neighborhood of active sites, as defined by the pockets in the spatial structures of enzymes. The statistical significance of the occurrence of predicting sequences in 3D pockets that include active sites was analyzed using the database of CASTp (Binkowski et al. (2003) CASTp: computed atlas of surface topography of proteins. Nucleic Acids Research, 31:3352-3355).

[0744] CASTp lists all amino-acids belonging to pockets appearing in spatial structures of proteins. 1031 enzymes that possess pockets including active (or binding) site annotations were selected. There are 8860 predicting sequences that occur on these enzymes, 31% of which lie within the active pockets in the sense that they have at least four amino-acids that reside in the pocket. Defining a background model of random peptides selected for each event of a predicting sequence hitting an active pocket in a particular enzyme, an estimate was obtained that 11% of all predicting sequences belong to events that pass an FDR (P. Bork and E. V. Koonin. Protein sequence motifs. Curr. Op. Structural Biology, 6:366-376, 1996) limit of 0.05. Most of them (about 70%) do not contain an active site, hence they are of potential interest for experimental verification of their importance in defining and maintaining the enzymatic function.

[0745] Table 32 lists the number of enzymes that were analyzed and the number of SPs that are located on these enzymes. This is followed by numbers of predicting sequences lying (with at least four residues) in pockets including active sites. Requiring high significance of the latter, through a background model, and using the FDR limit of 0.05, the results displayed in the following column were obtained. The last column displays the number of significant predicting sequences that lie in the pocket but do not contain the amino-acid with active site annotation.

TABLE 32

enzymes   PS     PS in pockets   Significant PSs (FDR = 0.05)   Significant PSs without sites
1031      8860   2487 (28%)      1622 (18%)                     1422 (16%)

[0746] A list of these predicting sequences and the enzymes on which they occur is provided in Table 33.

TABLE 33
PSs found in 3D pockets shared by active sites. Lines marked by asterisk denote the occurrence of an active site within the PS.

    P-value    PDB id   Predicting Sequence
    0.00e-00   cyo      AHEAIRP
*   0.00e-00   aOe      FHDRD
*   0.00e-00   cyo      ITYMRTD
*   0.00e-00   pmi      SDNWWRAG
    4.80e-11   pmi      RAGFTPKFKDW
*   1.17e-10   lotm     GNWKMH
*   2.46e-08   lotm     IAGNWKM
    1.50e-07   aOe      AQWKKALE
*   6.25e-07   lotm     PIIAGNWK
*   7.90e-07   ejj      MGNSE
    8.10e-07   bq.3     NLFTGW
    8.29e-07   gZd      DWWGGR
    8.29e-07   iat      DWWGGR
    8.29e-07   gZd      SKTFTT
    8.29e-07   iat      SKTFTT
    8.29e-07   gZd      TEDRAW
    1.34e-06   q50      DWWGGR
    1.34e-06   q50      SKTFTT
    5.03e-06   fzt      NLFTGW
    7.88e-06   gZd      WWGGRYS
    7.88e-06   iat      WWGGRYS
    9.03e-06   ejj      WYQSLT
    9.48e-06   q50      WWGGRYS
    2.62e-05   rii      NLFTGW
*   3.05e-05   bq.3     LWLWRHG
    3.81e-05   dxi      IEPKP
    3.95e-05   q50      WNIGIGGS
    5.90e-05   lowz,    MGNPH
*   8.04e-05   e        GNSEWGH
    8.05e-05   W        IEPKP
*   1.05e-04   hg3      LLNHSE
*   1.07e-04   aW1      NWKLNG
    1.11e-04   OZ       SKSGTT
    1.13e-04   c47      GTSGLR

TABLE 33-continued
PSs found in 3D pockets shared by active sites. Lines marked by asterisk denote the occurrence of an active site within the PS.

P-value PDB id Predicting Sequence P-value PDB ic Predicting Sequence 6 NNYOY c47 HPDPN RYFNV RYDYEE ACGSGA TASHN GLGNDF AHGNS R ... 64 e-O4 LGHSERR GHSER GRVOTP GNWKMING ITYPRS LGHSE RR PEKWOL fui GFOGO RHWTD oOz. EPAIAFR DWWGG oOz. NPFDOPG SKTFTT dxi WGGREG TEDRAW . 43 e-O4 mo O LWGGASLK 98e-04 LYGGSW IGHSERR 98e-04 YGGSWN YGGSWKP 2.1.2e-O4 YGGSWN RHGESEWN hti GHSERRH DODLRFG hti GNWKMING GRDPFGD hti LGHSERR TFHDDDL mo O GHSERRH ASHNPGGP mo O GNWKMING IEPKP mo O GHSERR 68e-04 RFTGW mo O WILGHSE LGHSERR * 2.36 e-O4 RHGOSEWN WILGHSE rii ... O 6e-O3 Spc. GHSERRH 3.24 e-O4 lotm WGGASLEPASFL ... O 6e-O3 Spc. GNWKMING c47 GATIRLY ... O 6e-O3 Spc. LGHSERR

... O 6e-O3 Spc. WIACIGE RLSGTGS ... O 6e-O3 Spc. WILGHSE WGGASLK ci1 LYGGSW 5. Obse-O4 tmh. GGASLKA ci1 YGGSVT * 5. Obse-O4 tmh. IGHSERR EPPELIG ASGGWS W WGGREG nus WTLASGDT . 44 e-O3 tmh. LWGGASLK QHAFYOL amk LGHSERR WWGGRYS amk WILGHSE LWGGASLK LWGGASLK