
Proteomics 2001, 1, 295–303

Glycome project: Concept, strategy and preliminary application to Caenorhabditis elegans

Jun Hirabayashi, Yoichiro Arata, Ken-ichi Kasai

Department of Biological Chemistry, Teikyo University, Kanagawa, Japan

Glycans play a central role as potential mediators between complex cell societies, because all living organisms consist of cells covered with diverse carbohydrate chains reflecting various cell types and states. However, we have no idea how diverse these carbohydrate chains actually are. The main purpose of this article is to persuade life scientists to realize the fundamental importance of taking some action by becoming involved in “glycomics”. “Glycome” is a term meaning the whole set of glycans produced by individual organisms, as the third class of bioinformative macromolecules to be elucidated, next to the genome and proteome. Here a basic strategy is presented. The essence of the project includes the following: (a) glycopeptides, but not glycans released from their core proteins, are targeted for linkage to genome databases; (b) Caenorhabditis elegans is used as the first model for this project, since its genome project has already been completed; (c) four essential attributes are adopted to characterize each glycopeptide: (i) cosmid identification number (ID), (ii) molecular weight (Mr), (iii) retention (R) of pyridylaminated (PA) oligosaccharides in 2-D mapping, and (iv) dissociation constants (Kd’s) of PA-oligosaccharides for a set of lectins. Thus, the obtained ID, Mr, R and Kd’s construct the glycome database, which will be open, as the previous genome and proteome databases are. For the project to proceed, the “glyco-catch” method is proposed, in which a group of target glycopeptides is captured by means of lectin-affinity chromatography after protease digestion. Glycopeptides from asialofetuin and ovalbumin have already been captured successfully by galectin-agarose and Con A-agarose, respectively. Further, to examine the practical validity of the method, we extracted membrane proteins from C. elegans with 1% Triton X-100, and isolated specific glycopeptides by use of the same galectin column. One of the glycopeptides was successfully identified in the C. elegans genome database. Finally, for determination of Kd between glycopeptides and lectins, a recently reinforced frontal affinity chromatography (FAC) is proposed as an alternative way to define glycan structures in place of determining every covalent structure.

Keywords: Glycome / Caenorhabditis elegans / Galectin / Con A / Glyco-catch method / Frontal affinity chromatography

PRO 0007

Correspondence: Dr. Jun Hirabayashi, Department of Biological Chemistry, Faculty of Pharmaceutical Sciences, Teikyo University, Sagamiko, Kanagawa, 199–0195, Japan
Fax: +81-426-85-3742

Abbreviations: FAC, frontal affinity chromatography; ID, identification; LacNAc, N-acetyllactosamine; PA, pyridylaminated; R, retention; Kd, dissociation constant; WGA, wheat germ agglutinin; CN, cyanopropyl; NHS, N-hydroxysuccinimide

© WILEY-VCH Verlag GmbH, 69451 Weinheim, 2001 1615-9853/01/0202–295 $17.50+.50/0

1 Introduction: Glycome, a key word for life science of the 21st century

Glycoconjugates, i.e., glycoproteins, glycolipids, and proteoglycans, have a different biological significance from monosaccharides and their homopolymers, though the latter play a central role in energy metabolism. There is increasing evidence that the glycans contained in such glycoconjugates play important roles by virtue of their extremely diverse features and heterogeneity. In fact, they are involved in a wide range of recognition phenomena including development, differentiation, morphogenesis, fertilization, implantation, infection, cancer metastasis, etc. [1]. Considering that all living organisms consist of cells covered with an abundance of such diverse carbohydrate chains reflecting various cell types and states, it should be emphasized more strongly that these recognition events occur at the cell level. From this viewpoint, we may take it for granted that such a great diversity of glycans expressed on cell surfaces is fated to play an essential role in various cell-based discrimination events. In other words, these glycans are regarded as functional substances, or “bar codes”, that identify various cell types. In this context, analysis of glycosylation is a critical issue for proteomics, since it is an important post-translational modification.

Glycans have the potential to attain astronomical structural diversity with a relatively small number of component saccharides, since they can form many linkage isomers and branching types [2]. These properties are critical in describing the multivalent feature of glycans, which is distinct from proteins and nucleic acids. Actually, glycans mostly consist of only a few aldohexoses and their derivatives; e.g., D-glucose (usually N-acetylglucosamine), D-galactose and D-mannose. In this context, there is a view on the origin of these elementary hexoses that they are the most stable among the eight D-aldohexoses, and thus must have been the most abundant on the prebiotic earth [3]. A considerable part of the observed diversity is attributable to differences in patterns of branching, linkage, and also modifications, in particular at the nonreducing terminus, such as by L-fucose and sialic acids. However, it is totally unknown how many glycans one organism may actually have. If glycans are really important as a third class of bioinformative macromolecules, next to nucleic acids and proteins, it is essential to collect broad information about glycans under the concept of “glycome”, which refers to the entire set of glycans in one organism. Up to the present, however, there has been no such idea. Apparently, new principles and technologies essentially distinct from those applied to genome and proteome projects are necessary to make this glycome project a reality.

In this article, we present for the first time a basic strategy for glycomics, which targets glycoproteins, and give some results of preliminary experiments. We made use of two affinity techniques: one termed the “glyco-catch” method and the other, “frontal affinity chromatography” (FAC). The former is described with model and more practical experimental results. The latter was recently reinforced to determine dissociation constants (Kd’s) between oligosaccharides and lectins systematically [4]. As the first target organism for the project, we chose the nematode C. elegans, of which the complete genome sequence is now available [5].

2 Strategy and methodology

2.1 The basic strategy of the glycome project

To accomplish the C. elegans glycome project, the following three actions define the strategy: (1) collection of the whole set of glycans in a single organism; (2) linkage of target glycopeptides to a genome database; and (3) determination of the Kd of lectin(s) as an attribute of each glycopeptide, as well as assignment of Mr, cosmid ID, and retention of PA-oligosaccharides in 2-D mapping. The first is none other than the concept of the glycome itself, whereas the other two mostly characterize the project. Adoption of the glycopeptide, rather than a released glycan, as the registered unit is a key point, because it allows linkage to the established C. elegans genome database. In C. elegans, each individual gene is provided with a cosmid ID, such as F54F11.2 (F54F11 specifies the cosmid, while “2” specifies the gene included in this cosmid). The third point will be a major issue for accomplishing any glycome project; i.e., how to characterize each glycan structure. Though it would be ideal to determine the covalent structures of all glycans, this is obviously impractical in the absence of an automated glycan sequencer. Rather than following such a genome-type strategy, it is more promising to extract the essence of each glycan structure, that is, its characteristics as a glycocode. In this context, we may have two different approaches: one is 2-D mapping of PA-oligosaccharides, which is a well-established identification method for oligosaccharides [6, 7], and the other is Kd determination by FAC [4]. It is hoped that 2-D mapping will prove fully applicable to various novel oligosaccharides. Adoption of a set of Kd values for appropriate lectins is promising, since lectins are known to discriminate significant differences in carbohydrate structures (e.g., linkage and terminal modifications) and have a long history as useful tools to study cells. It is also supposed that such lectins actually function in vivo as “decipherers” of complex carbohydrate chains [8–11]. Even in this case, determination of key glycan structures is necessary. In this regard, there have been no reports of covalent carbohydrate structures of C. elegans besides those based mostly on advanced genome science.

It is important to consider which lectins should be used for isolation of glycoproteins. Generally speaking, prior investigation by lectin blot analysis should give useful information on glycans. In this study, we chose endogenous galectins as the first tools. Galectins are widely distributed, soluble, metal-independent lectins. Their binding specificity is directed to N-acetyllactosamine (LacNAc), which they bind via their evolutionarily conserved carbohydrate-recognition domains (CRDs) [12]. Galectins were first investigated in vertebrates, but later, two homologous proteins having 32 kDa and 16 kDa subunits were found in C. elegans, and were well characterized [13–17]. Surprisingly, more than 10 galectin genes, designated lec-1 to lec-11 (registered in the C. elegans community), have been identified in C. elegans as a result of the genome project for this organism. Among these galectins, the 16 kDa galectin (renamed LEC-6) was chosen for the first test of the glycome project, because it represents a typical dimeric galectin consisting of two small 16 kDa subunits, which is simpler than another dominant galectin, LEC-1, which has two homologous but distinct CRDs. In addition, LEC-6 shares many properties with other galectins of higher organisms, including a noncovalent dimeric structure and conservation of all of the critical amino acid residues involved in LacNAc binding [18–20]. Thus chosen, LEC-6 was immobilized on agarose, and was used as the first tool to perform the glyco-catch method described below.

Figure 1. Summary of concept and strategy of the C. elegans Glycome Project. In the project, a set of glycopeptides is selected by the glyco-catch method: (1) glycoproteins are extracted from worms; (2) specific glycoproteins are selected by lectin-affinity chromatography-1 (e.g., using galectin-agarose, Con A-agarose); (3) bulk glycoproteins are digested with Achromobacter protease I; (4) target glycopeptides are captured by lectin-column chromatography-2, in which the same lectin column is used as in (2); (5) glycopeptides are purified by means of 2-D HPLC; (6) isolated glycopeptides are subjected to (i) peptide sequencing and (ii) MS to specify the cosmid ID in the database and the Mr of the glycan moiety, respectively. On the other hand, glycans are liberated from the peptides and labeled with 2-aminopyridine. The obtained PA-oligosaccharides are subjected to 2-D mapping and FAC to determine retention (R’s in terms of glucose units) and Kd’s of appropriate lectins, respectively. Thus, the obtained attributes (ID, Mr, R, and Kd) construct the glycome database as the essential minimum.

2.2 Glyco-catch method

The glyco-catch method represents a fundamental part of the proposed project (Fig. 1). The overall strategy may be outlined as follows:

2.2.1 Lectin-affinity chromatography-1: isolation of glycoproteins

Various lectins can be used to isolate glycoproteins having different carbohydrate types, e.g., galectin specific for LacNAc, Con A for Man, wheat germ agglutinin (WGA) for GlcNAc, Ulex europaeus lectin for L-Fuc, etc. For the present study, a galectin (LEC-6)-column and a Con A-column have been examined. Bound glycoproteins are eluted with hapten saccharides, such as lactose for elution from the galectin-column and methyl-α-D-mannoside for that from the Con A-column.

2.2.2 Protease digestion

Glycoproteins obtained above are subjected to protease digestion to generate glycopeptides. Ideally, each glycopeptide should have a single glycan chain. Achromobacter protease I (Lysylendopeptidase from Wako, Tokyo, Japan) is the best among commercially available enzymes, first of all owing to its most rigorous specificity [21]. Its high stability (100% activity maintained in 4 M urea or 0.1% SDS) is also advantageous. In practice, prior to digestion, glycoproteins are precipitated with ethanol (2 volumes) to remove the saccharide used for elution and other chemicals such as Triton X-100, and are dissolved and completely denatured in 8 M urea for 30 min at 37°C. Immediately after dilution with the same volume of an appropriate digestion buffer (e.g., 0.1 M NaHCO3, pH 8.5), 1/100 w/w of protease relative to glycoproteins is added, and the reaction is allowed to proceed for at least 16 h at 37°C. Under these conditions, >95% of C. elegans glycoproteins are digested and become invisible by silver staining on SDS-PAGE (14% gel).

2.2.3 Lectin-affinity chromatography-2: isolation of glycopeptides

Target glycopeptide(s) are captured on the same affinity column as used in Section 2.2.1. The protease digest is applied to the column equilibrated with an appropriate buffer. In principle, target glycans are fully recognized by the lectin even after proteolysis. Moreover, peptide moieties may help the lectin binding, in particular when endogenous lectins are used. From a practical viewpoint, it is recommended to fully inactivate the protease before chromatography. Otherwise, the immobilized lectin could be damaged by exposure to this highly processive protease even in a cold room. The addition of a serine-protease inhibitor, such as p-aminobenzenesulphonyl fluoride, will serve this purpose. Concentrated Tris-buffer also acts as an inhibitor to some extent.

2.2.4 Purification of glycopeptides by HPLC

Although various types of columns are available for HPLC separation of peptides, those for reversed-phase chromatography are considered to have several practical merits, e.g., high resolution, and possible use of a volatile solvent system such as acetonitrile/0.1% trifluoroacetic acid. However, 2-D separation by combination of different types of packings, such as trimethylsilyl (TMS), phenyl (Phe), cyanopropyl (CN), or octadecyl (ODS), sometimes becomes necessary for more complete resolution.

2.2.5 Characterization of glycopeptides

This constitutes the core of the C. elegans glycome database. As our basic strategy, the database is composed of four essential attributes: (i) gene (cosmid ID) encoding the core protein; (ii) molecular weight (Mr) of the glycopeptide; (iii) retention in 2-D mapping; and (iv) dissociation constants (Kd) for a set of lectins. The first two specify glycopeptides and the remaining two specify glycans liberated from peptides and labeled with 2-aminopyridine. As regards the efficiency of the genome database search, it has been experimentally confirmed that in >90% of cases there is a hit of a unique gene in the database if any six amino acid positions are fixed by peptide sequencing. This is statistically reasonable in consideration of the relatively small genome size of C. elegans ($1 \times 10^{8}$ bp), hypothetically encoding $2 \times 10^{8}$ amino acids including all reading frames in both directions, of which the complexity is almost comparable to the number of theoretical permutations attainable by six amino acids ($20^{6} = 6.4 \times 10^{7}$). For searching databases, a motif search program is most useful: in this context, SQMATCH (experimental use, developed by DDBJ/National Institute of Genetics, Japan; URL, http://ftp2.ddbj.nig.ac.jp:8080/sqmatch.html) has been utilized.

On the other hand, MALDI-MS will be the best choice for systematic determination of the Mr of glycopeptides. Once Mr is determined, the mass of the glycan moiety is theoretically obtained, since that of the peptide is automatically given from the genome sequence. For this purpose, the use of a rigorously specific protease is very important. By MS analysis, significant heterogeneity of the glycan moiety may also be detected. For separation of such heterogeneous glycans, 2-D mapping of PA-oligosaccharides should be most promising, since a number of PA-oligosaccharides have been successfully resolved and identified by this system [6, 7]. On the other hand, it is in general difficult to separate closely related glycopeptides by a conventional HPLC system. The 2-D mapping system using PA-oligosaccharides will also be applicable to novel oligosaccharides, such as those derived from C. elegans. For determination of the Kd of lectins for the PA-oligosaccharides thus purified, the recently reinforced FAC is utilized (described in Section 2.3).

2.2.6 Construction of C. elegans glycome database

Thus obtained, these four attributes (cosmid ID, Mr, R, Kd) constitute the C. elegans glycome database. It is hoped that the database will be flexibly linked to the C. elegans genome database, so that value-added information on glycosylation (position, Mr, Kd of lectins, etc.) is automatically obtained on-screen through the glycome web site thus opened.

2.3 Frontal affinity chromatography: Kd determination

For determination of Kd, the recently reinforced FAC is strongly suggested as a key to this project [4]. The system is simple, and the theory is well established (Fig. 2), being in principle equivalent to the Michaelis-Menten equation [22]. This method proved to be useful for quantitative analyses of interactions of protease [23, 24], ribonuclease [22] and lectins [16, 25, 26]. More recently, the method was utilized to screen synthetic carbohydrate analogues in the context of combinatorial chemistry with an on-line ESI-MS detector [27], though such a detection system may not achieve popularity, for economic and technical reasons. The system introduced for this project is quite simple and smart by importing various

Figure 2. The principle and system of analysis of frontal affinity chromatography. (a) In FAC, an excess volume of a sample, A, is applied continuously at an initial concentration $[A]_0$ to a column on which affinity ligand B is immobilized. In comparison with a sample (negative control) having no affinity to B, the difference in the volumes of their elution fronts, i.e., $V - V_0$, reflects the affinity between A and B by the equation

$$K_d = \frac{B_t}{V - V_0} - [A]_0 .$$

It is noted that this basic equation of FAC is in principle equivalent to the Michaelis-Menten equation

$$K_m = \frac{V_{max}[S]_0}{v} - [S]_0 ,$$

where $K_m$ (Michaelis constant), $V_{max}$ (maximum velocity), $[S]_0$ (initial substrate concentration), and $v$ (velocity) correspond to $K_d$, $B_t$, $[A]_0$, and $[A]_0(V - V_0)$, respectively. (b) The system for reinforced FAC. A conventional HPLC system is used for FAC with a small modification, namely the use of a relatively large sample loop in order to apply an excess volume of sample to a miniature column (4.0 × 10 mm).
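As a rough illustration of the basic FAC equation in the caption above, the following Python sketch computes Kd from an observed elution-front delay and, conversely, the delay expected for a given Kd. The function names, the use of SI units, the sample concentration of 10 nM, and the control front volume of 1 mL are assumptions made for this example only; Bt = 30 nmol and Kd = 0.32 mM are the values quoted in Section 3.3.

```python
def kd_from_front(bt_mol, v_l, v0_l, a0_molar):
    """Basic FAC equation: Kd = Bt / (V - V0) - [A]0.

    bt_mol   -- effective ligand content of the column, Bt (mol)
    v_l      -- elution-front volume of the analyte, V (L)
    v0_l     -- elution-front volume of the negative control, V0 (L)
    a0_molar -- initial analyte concentration, [A]0 (mol/L)
    """
    delay = v_l - v0_l
    if delay <= 0:
        # No retardation relative to the control: Kd cannot be estimated.
        raise ValueError("front is not retarded relative to the control")
    return bt_mol / delay - a0_molar


def front_delay(bt_mol, kd_molar, a0_molar):
    """Same equation rearranged: V - V0 = Bt / (Kd + [A]0)."""
    return bt_mol / (kd_molar + a0_molar)


if __name__ == "__main__":
    BT = 30e-9    # mol, ligand content quoted in the text
    A0 = 10e-9    # mol/L, assumed sample concentration (not from the text)
    V0 = 1.0e-3   # L, assumed control front volume (not from the text)

    delay = front_delay(BT, 0.32e-3, A0)
    print(f"expected front delay: {delay * 1e6:.1f} uL")
    kd = kd_from_front(BT, V0 + delay, V0, A0)
    print(f"recovered Kd: {kd * 1e3:.2f} mM")
```

With these inputs the expected delay is on the order of 90 µL, which suggests why a miniature column and a precise HPLC pump are convenient for measuring such weak interactions.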

merits of HPLC while retaining the simple and versatile nature of FAC. A small column (4 × 10 mm) packed with lectin-immobilized cross-linked agarose is attached to a conventional HPLC system, and an excess volume of fluorescently labeled glycopeptide is applied continuously by use of a 2-mL sample loop. For immobilization, N-hydroxysuccinimide (NHS)-activated Hi-Trap (Pharmacia, Uppsala, Sweden) is convenient and successful. The delay of elution relative to the negative control (V – V0) corresponds to the affinity between the immobilized lectin and the eluted glycans. The reinforced system proved to be useful for studying detailed carbohydrate-binding specificity in experiments using a set of PA-oligosaccharides [28, 29].

3 Results

In this section, results of model experiments on the glyco-catch method using asialofetuin and ovalbumin are described. Then, we demonstrate the validity of the glyco-catch method applied to C. elegans glycoproteins. In the present case, both galectin (LEC-6) and Con A were immobilized on NHS-activated Sepharose 4 Fast Flow (approximately 5 mg/mL gel) (Amersham Pharmacia Biotech, Uppsala, Sweden).

3.1 Model experiments with asialofetuin and ovalbumin

To confirm the validity of the above-planned glyco-catch method, we first attempted model experiments using purified glycoproteins, i.e., bovine fetuin and chicken ovalbumin. Fetuin has three N-glycosylation sites (amino acid positions 99, 156 and 176) and four O-glycosylation sites (271, 280, 282 and 341) [30]. The N-acetyllactosamine-specific galectin (LEC-6) is expected to bind to the former three N-glycans if sialic acids are removed, because a considerable portion of the nonreducing terminal LacNAc is masked by sialic acid. Therefore, prior to protease digestion, fetuin was desialylated by acid treatment (80°C, 1 h). On the other hand, ovalbumin has a single N-glycosylation site (amino acid position 292). The carbohydrate structure is known to be largely of a high-mannose type [31], to which the mannose/GlcNAc-specific legume lectin Con A binds tightly [26].

These glycoproteins (5 mg each) were dissolved in 8 M urea, and incubated at 37°C for 30 min to fully unfold the polypeptides. After digestion with 50 µg of protease at 37°C for 16 h, the digests were applied to either the galectin-column or the Con A-column (column volume, 5 mL each). The galectin-column had previously been equilibrated with MEPBS (4 mM β-mercaptoethanol, 2 mM EDTA, 20 mM Na-phosphate [pH 7.2], 150 mM NaCl). After application of the digest, the galectin column was extensively washed with the same buffer, and the adsorbed glycopeptides were eluted with 0.1 M lactose in MEPBS. The Con A-column was equilibrated with TBS (50 mM Tris-HCl, pH 7.5, 150 mM NaCl) and the bound glycopeptides were eluted with 0.1 M methyl-α-D-mannopyranoside in the same buffer. Eluted fractions were subjected to RP chromatography (Fig. 3), followed by analysis on a Shimadzu PPSQ-21 sequencer (Kyoto, Japan). The main peak that eluted from the LEC-6 column (retention time, 28 min) was found to be a mixture of equal amounts of two peptides: 145–211 and 219–225 (numbers denote amino acid positions; Fig. 3, left). The longer peptide 145–211 has been shown to have two N-glycan attachment sites (Asn156-Asp-Ser and Asn176-Gly-Ser; the Asn residues are the glycosylation sites). In the sequence analysis, neither Asn (corresponding to the 12th and 32nd cycles) was detected as a significant phenylthiohydantoin-amino acid, even though the flanking positions were successfully determined. Though in an indirect manner, this observation implies that Asn156 and Asn176 are glycosylated as reported [26]. Other smaller peaks (indicated by asterisks) also gave the sequence 145–211, but not 219–225. These smaller peaks are probably the result of cleavage of a disulfide bridge (Cys207–Cys219) between the two peptides, since disulfide bridges are less stable under alkaline conditions. To circumvent such a situation, it is advisable to reduce and S-alkylate disulfide bridges (e.g., by carboxymethylation) prior to digestion.

On the other hand, the ovalbumin digest gave a sharp single peak by RP chromatography on a trimethylsilyl (TMS)-column (Fig. 3, right). The sequence completely corresponded to peptide 291–322, which contained a previously identified N-glycosylation site (Asn292-Leu-Thr) [29]. In this case, Asn292 (corresponding to the 2nd cycle) was not detectable either. Both in asialofetuin and ovalbumin, all of the analyzed peptides were found from the known sequences to be preceded by a lysine residue. This confirms the rigorous specificity of Achromobacter protease I.

Figure 3. Purification by HPLC of target glycopeptides from asialofetuin and ovalbumin to demonstrate the validity of the “glyco-catch” method. Asialofetuin (left) and ovalbumin (right) were degraded by the action of Achromobacter protease I (commercial name, Lysylendopeptidase) in the presence of 4 M urea, and specific glycopeptides were captured by lectin-affinity chromatography on a galectin LEC-6-agarose column and on a Con A-agarose column, respectively. The derived glycopeptides were purified by RP chromatography on a trimethylsilyl (TMS)-column (4.6 × 150 mm; Nomura Chemical, Tokyo, Japan), as shown, by a linear gradient of increasing acetonitrile in 0.1% v/v TFA. Chromatography was performed at a flow rate of 1 mL/min at room temperature (22–23°C), and peptides were detected by A210.

3.2 Practical approach using C. elegans crude extract

As a more practical approach, we applied this method to crude materials from C. elegans. After removal of soluble proteins by centrifugation, precipitated proteins including

Figure 4. Practice of the glyco-catch method. Galectin-binding glycoproteins were obtained from a C. elegans Triton X-100 extract by the first affinity chromatography on a LEC-6a-agarose column, and were subjected to Achromobacter protease I digestion. Galectin-specific glycopeptides were then selected by the second affinity chromatography using the same column. Both adsorbed (right) and unadsorbed (left) fractions were subjected to separation by RP chromatography. Numbers shown above some peaks denote the cosmid IDs to which the determined amino acid sequences matched unambiguously.
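The assignment of a sequenced peptide to a cosmid ID, as mentioned in the caption above, amounts to a motif search in which sequencer cycles that gave no identifiable residue (for example, a glycosylated Asn) are treated as wildcards. The Python sketch below illustrates the idea on a toy database. It is not SQMATCH; the function name, the `X` wildcard convention, the gene IDs `TOY1.1` and `TOY2.3`, and both sequences are hypothetical. The optional check that a hit is preceded by Lys reflects the specificity of Achromobacter protease I described in the text.

```python
import re


def find_genes(query, proteins, require_lys_before=True):
    """Return (gene_id, 1-based start) for every match of `query`.

    query    -- sequenced residues; 'X' marks an unidentified cycle
    proteins -- dict mapping gene ID -> protein sequence (one-letter code)
    """
    body = "".join("." if aa == "X" else re.escape(aa) for aa in query)
    # A lookahead makes the match zero-width, so overlapping hits are found.
    pattern = re.compile("(?=" + body + ")")
    hits = []
    for gene_id, seq in proteins.items():
        for match in pattern.finditer(seq):
            start = match.start()
            # A lysine-specific protease cleaves after Lys, so a genuine
            # peptide starts at the N-terminus or directly after a Lys.
            if require_lys_before and start > 0 and seq[start - 1] != "K":
                continue
            hits.append((gene_id, start + 1))
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequences, invented for this illustration.
    toy_db = {
        "TOY1.1": "MKTAYLKGSNFTKPLL",
        "TOY2.3": "MSSGANFTRRAGSQFTK",
    }
    print(find_genes("GSXFTK", toy_db))
    print(find_genes("GSXFTK", toy_db, require_lys_before=False))
```

In this toy case the wildcard query matches both entries, and only the requirement of a preceding Lys leaves a single gene, which hints at why a rigorously specific protease helps the database search.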

membrane-integrated glycoproteins were solubilized with 1% Triton X-100, and applied to the galectin (LEC-6)-column as in Section 3.1. Bound glycoproteins were eluted with 0.1 M lactose and, after precipitation with ethanol, digested with Achromobacter protease I. A group of glycopeptides was captured by the same galectin-column, and separated by 2-D RP chromatography, i.e., first on a TMS-column (Fig. 4, right), and second on a CN-column (not shown). For comparison, unadsorbed peptides from the second galectin-column chromatography were also subjected to 2-D separation (first chromatogram shown in Fig. 4, left). Both chromatograms showed the presence of an extremely large number of peptides, reflecting the use of the crude starting material, though the unadsorbed peptides were still much more numerous. Among the adsorbed glycopeptides, for example, one glycopeptide (depicted by the arrow) has been successfully identified by direct sequencing and database searching: the gene cloned in the cosmid F54F11.2 was found to encode a type-II membrane protein consisting of 1589 amino acids, homologous to mammalian zinc-binding metalloprotease. It contained 16 potential N-glycosylation sites. As expected, a potential N-glycosylation site was found to reside at Asn575-Phe-Thr of the captured glycopeptide (amino acids 554–578). This Asn575 (the 22nd cycle in sequencer analysis) was not identified, while the following amino acids, Phe576-Thr-Lys, were unambiguously determined. The failure to identify Asn575 suggests that it is actually glycosylated. This peptide was confirmed from the sequence to be preceded by Lys553.

On the other hand, from the unadsorbed fraction, a number of peptides (depicted by arrowheads in Fig. 4, left) were found to be encoded by the same gene (F54F11.2); i.e., peptides of amino acids 94–127, 207–215, 253–278, 866–878, 1041–1049, 1095–1113 and 1298–1311. These preliminary results confirmed that the present method satisfactorily serves the purpose of the glycome project. According to our calculations, starting from 100 g of worms, 100 different glycopeptides will be recovered by using the galectin column, with yields of 10–100 pmol.

3.3 Examples of FAC analysis

As an example of FAC, C. elegans galectin LEC-6 was immobilized on NHS-activated Hi-Trap at a concentration of 7.4 mg/mL gel, and the gel was packed into a miniature column (4.0 × 10 mm) [4]. Four PA-oligosaccharides (lacto-N-tetraose, lacto-N-neotetraose, lacto-N-fucopentaose-I, and lacto-N-fucopentaose-II) were applied to the column through a 2 mL sample loop (Fig. 5). In the case depicted, galectin LEC-6 showed moderate binding to lacto-N-neotetraose (Kd, 0.32 mM), whereas its affinity increased if the terminal N-acetyllactosamine (Galβ1-4GlcNAc) was replaced by the linkage isomer lacto-N-biose (Galβ1-3GlcNAc; Kd, 0.19 mM), and even further by Fucα1-2 modification of the nonreducing terminal galactose (Kd, 0.17 mM). On the other hand, α1-4 fucosylation of the penultimate GlcNAc completely abolished the binding. Each Kd was calculated based on prior determination of ligand content (Bt = 30 nmol) of the column

Figure 5. Examples of analysis by FAC. A recombinant galectin LEC-6 was immobilized on cross-linked agarose and was packed into a miniature column (size, 4 × 10 mm; volume, 0.126 mL). Several PA-oligosaccharides were applied to the column at a flow rate of 0.25 mL/min, at 20°C. Their common names and covalent structures are shown above the individual chromatograms, each of which is overlaid with that of a negative control PA-saccharide having no affinity to galectin. The delay of the elution front of each saccharide relative to the control corresponds to its affinity.
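The numbers quoted for this column can be cross-checked with a little arithmetic. The Python sketch below recomputes the bed volume of a 4 × 10 mm cylindrical column and estimates how far each elution front would lag behind the negative control, using V − V0 ≈ Bt/Kd. The helper names are hypothetical, and the approximation assumes that the applied concentration [A]0 is negligible compared with Kd, which the text does not state; Bt = 30 nmol, the Kd values, and the flow rate of 0.25 mL/min are taken from the text and caption.

```python
import math


def column_volume_ml(inner_diameter_mm, length_mm):
    """Bed volume of a cylindrical column in mL (1 mL = 1000 mm^3)."""
    radius = inner_diameter_mm / 2.0
    return math.pi * radius ** 2 * length_mm / 1000.0


def approx_delay_ul(bt_nmol, kd_mm):
    """V - V0 ~= Bt / Kd, valid only when [A]0 << Kd (an assumption).

    nmol divided by mmol/L gives 1e-6 L, so the result is in microlitres.
    """
    return bt_nmol / kd_mm


if __name__ == "__main__":
    print(f"bed volume: {column_volume_ml(4.0, 10.0):.3f} mL")

    flow_ml_per_min = 0.25  # from the figure caption
    examples = [
        ("LacNAc terminus (lacto-N-neotetraose)", 0.32),
        ("lacto-N-biose terminus", 0.19),
        ("Fuc(alpha1-2)-modified terminus", 0.17),
    ]
    for label, kd_mm in examples:
        delay_ul = approx_delay_ul(30.0, kd_mm)
        delay_s = delay_ul / 1000.0 / flow_ml_per_min * 60.0
        print(f"{label}: ~{delay_ul:.0f} uL, ~{delay_s:.0f} s")
```

The computed bed volume agrees with the 0.126 mL given in the caption, and the estimated delays (roughly 90 to 180 µL) are of the same order as the bed volume itself.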

used, by concentration-dependence analysis using p-aminophenyl-β-lactoside (Hirabayashi et al., to be published).

4 Concluding remarks

The basic concept of the glycome and a possible strategy for the C. elegans glycopeptide glycome project have been presented. The validity of the glyco-catch method as a core methodology of the project was confirmed in both model and practical experiments using C. elegans membrane proteins. The strategy also proposes four attributes, i.e., ID (gene), Mr, R (retention in 2-D mapping), and Kd (dissociation constant of lectins), as the essential minimum to construct the glycome database. To determine Kd systematically, FAC proved to be promising due to its convenience, rapidity, sensitivity and reliability. Though the present results seem to satisfy the minimum requirements to start the glycome project, there still remain some critical issues to be resolved in the very near future. First of all, a world-wide consortium to carry out the project must be organized. Direct determination of key carbohydrate structures of C. elegans is keenly awaited. There are also a number of practical and technical issues to be solved. In this context, various methodologies used in proteomics should be introduced, such as 2-D gel electrophoresis for the separation of glycoproteins and CE for high-resolution analysis of glycopeptides. Successive affinity adsorption of glycoproteins is also under investigation: the flow-through fraction in galectin affinity chromatography is subjected to the next affinity chromatography on a Con A-column, and further on a WGA-column, and so on (Hirabayashi et al., to be published).

Success or failure of this project using C. elegans is a matter of grave concern, if we regard it as the first step to our ultimate goal of targeting a much more complex organism; e.g., Homo sapiens. Glycome projects in various contexts will accumulate broad statistical data on lectin recognition patterns in correlation with differences in environment, pathological state, individuals, races, species, etc. This type of approach should pioneer new fields of life sciences under the concept of glycome, because such differences in glycosylation patterns and remodeling have proved to reflect altered susceptibilities to pathogenic infection, allergy, illness, etc. [1], as predicted by studies on single nucleotide polymorphism. All life scientists involved in genomics and proteomics are welcome to barrier-free communication with this glycome project.

The authors thank Dr. Natsuka, S. from Kyoto Institute of Technology for helpful discussion on glycomics, Drs. Isobe, T. and Kaji, H. from Tokyo Metropolitan University for MALDI-MS analysis, and Ogawa, F., Hayama, K., Yoshida, T., Kamemura, D., and Hashidate, T. for their technical assistance. The studies described herein were supported by the Grant-in-Aid for Scientific Research on Priority Area No. 10178102 from the Ministry of Education, Science, Sports and Culture of Japan, and by Mizutani Foundation for Glycoscience.

Received February 10, 2000

5 References

[1] Varki, A., Cummings, R. D., Esko, J., Freeze, H., Hart, G. (Eds.), Essentials of Glycobiology, Cold Spring Harbor Laboratory, New York 1999.
[2] Laine, R. A., Glycobiology 1995, 4, 759–767.
[3] Hirabayashi, J., Q. Rev. Biol. 1996, 71, 365–380.
[4] Hirabayashi, J., Arata, Y., Kasai, K., J. Chromatogr. A 2000, 890, 261–271.
[5] The C. elegans Sequencing Consortium, Science 1998, 282, 2012–2018.
[6] Yanagida, K., Natsuka, S., Hase, S., Anal. Biochem. 1999, 274, 229–234.
[7] Tomiya, N., Awaya, J., Kurono, M., Arata, Y., Takahashi, N., Anal. Biochem. 1988, 171, 73–90.
[8] Kasai, K., Hirabayashi, J., J. Biochem. (Tokyo) 1996, 119, 1–8.
[9] Cummings, R. D., in: Gabius, H.-J., Gabius, S. (Eds.), Glycosciences: Status and Perspectives, Chapman and Hall, London 1997, pp. 191–199.
[10] Gabius, H.-J., Eur. J. Biochem. 1997, 243, 543–576.
[11] Akimoto, Y., Imai, Y., Hirabayashi, J., Kasai, K., Hirano, H., Progr. Histochem. Cytochem. 1998, 33, 1–92.
[12] Hirabayashi, J., Trends Glycosci. Glycotechnol. 1997, 9, 1–180.
[13] Hirabayashi, J., Satoh, M., Kasai, K., J. Biol. Chem. 1992, 267, 15485–15490.
[14] Hirabayashi, J., Ubukata, T., Kasai, K., J. Biol. Chem. 1996, 271, 2497–2505.
[15] Hirabayashi, J., Arata, Y., Kasai, K., Trends Glycosci. Glycotechnol. 1997, 9, 113–122.
[16] Arata, Y., Hirabayashi, J., Kasai, K., J. Biochem. (Tokyo) 1997, 121, 1002–1009.
[17] Arata, Y., Hirabayashi, J., Kasai, K., J. Biol. Chem. 1997, 272, 26669–26677.
[18] Lobsanov, Y. D., Gitt, M. A., Leffler, H., Barondes, S. H., Rini, J., J. Biol. Chem. 1993, 268, 27034–27038.
[19] Liao, D.-I., Kapadia, G., Ahmed, H., Vasta, G. R., Herzberg, O., Proc. Natl. Acad. Sci. USA 1994, 91, 1428–1432.
[20] Seetharaman, J., Kanigsberg, A., Slaaby, R., Leffler, H., Barondes, S. H., Rini, M., J. Biol. Chem. 1998, 273, 13047–13052.
[21] Tsunasawa, S., Masaki, T., Hirose, M., Sakiyama, F., J. Biol. Chem. 1989, 264, 3832–3839.
[22] Kasai, K., Oda, Y., Nishikawa, M., Ishii, S., J. Chromatogr. 1986, 376, 33–47.
[23] Kasai, K., Ishii, S., J. Biochem. (Tokyo) 1978, 84, 1051–1060.
[24] Kasai, K., Ishii, S., J. Biochem. (Tokyo) 1978, 84, 1061–1069.
[25] Oda, Y., Kasai, K., Ishii, S., J. Biochem. (Tokyo) 1981, 89, 285–296.
[26] Ohyama, Y., Nomoto, H., Inoue, Y., Kasai, K., J. Biol. Chem. 1985, 260, 6882–6887.
[27] Schriemer, D. C., Bundle, D. R., Li, L., Hindsgaul, O., Angew. Chem. Int. Ed. 1998, 37, 3383–3387.
[28] Arata, Y., Hirabayashi, J., Kasai, K., J. Chromatogr. A 2001, 905, 337–343.
[29] Arata, Y., Hirabayashi, J., Kasai, K., J. Biol. Chem. 2001, in press.
[30] Yet, M. G., Chin, C. C., Wold, F., J. Biol. Chem. 1988, 263, 111–117.
[31] Nomoto, H., Inoue, Y., Eur. J. Biochem. 1983, 135, 243–250.