Resources for Processing Bulgarian and Serbian – a brief overview of Completeness, Compatibility, and Similarities

Svetla Koeva, Department of Computational Linguistics, IBL, BAS, 52 Shipchenski prohod, Bl. 17, Sofia 1113
Cvetana Krstev, Faculty of Philology, University of Belgrade, Studentski trg 3, 11000 Belgrade
Ivan Obradović, Faculty of Geology and Mining, U. of Belgrade, Đušina 7, 11000 Belgrade
Duško Vitas, Faculty of Mathematics, University of Belgrade, Studentski trg 16, 11000 Belgrade

Abstract

Some important and extensive language resources have been developed for Bulgarian and Serbian that have a similar theoretical background and structure. Some of them were developed as part of a concerted action (wordnet), others were developed independently. A brief overview of these resources is presented in this paper, with emphasis on similarities and differences in the information presented in them. Special attention is given to similar problems encountered in the course of the development.

1. Introduction

Bulgarian and Serbian, as Slavonic languages, show similarities in their lexicons and grammatical structures. At present, equivalent language resources have been developed for both languages; moreover, the formal approaches to the organization of those resources are very similar. The goal of this paper is to present these similarities briefly while describing the integration between electronic dictionaries and lexical-semantic data bases (wordnets) of both languages.

The Bulgarian and Serbian wordnets were initially developed in the framework of the project BalkaNet, a multilingual semantic network for the Balkan languages, which aimed at the creation of a semantic and lexical network of the Balkan languages [Stamou, 2002] with a view to their integration in the global WordNet [Fellbaum, 1998; Miller, 1990]: an extensive network of synonymous sets and the semantic relations existing between them in different languages, enabling cross-references between equivalent sets of words with the same meaning [Vossen, 1999].

The common origin of Bulgarian and Serbian, the equivalent types of existing electronic resources and the application approaches used offer not only a very good basis for comparative research but furthermore presuppose successful implementation in such different application areas as cross-lingual information and knowledge management, cross-lingual content management and text data mining, cross-lingual information retrieval and information extraction, multilingual summarization, multilingual language generation, etc.

2. Electronic dictionaries

2.1. Bulgarian Grammatical Dictionary

The grammatical information included in the Bulgarian Grammatical Dictionary (BGD) is divided into three types [Koeva, 1998]: category information that describes lemmas and indicates the clustering of words into grammatical classes (Noun, Verb, Adjective, Pronoun, Numeral, and Other); paradigmatic information that also characterizes lemmas and shows the grouping of words into grammatical subclasses, i.e. Personal, Transitive, Perfective for verbs, Common, Proper for nouns, etc.; and grammatical information that determines the formation of word forms and shows the classification of words into grammatical types according to their inflection, conjugation, sound and accent alternations, etc.

The BGD is a list of lemmas where each entry is associated with a label [Koeva, 2004a]. The label itself represents the grammatical class and subclass to which the respective lemma belongs and contains a unique number that shows the grammatical type. All words in the language that belong to the same grammatical class and subclass and have an identical set of endings and sound / accent alternations are associated with one and the same label. Each label is connected with the corresponding formal description of endings and alternations. The inflectional engine used is equivalent to a stack automaton. Despite the existence of some differences in the format, the BGD represents a kind of DELAS dictionary [Courtois, 1990; Courtois et al., 1990; Silberztein, 1990], and it is compiled into a Finite-State Transducer.

2.2. Serbian Morphological Dictionary

Electronic dictionaries of Serbian consist of morphological dictionaries of general lexica, dictionaries of proper names, and the Serbian wordnet. The system of e-dictionaries of simple words in Serbian has been developed according to the LADL model and described, with other types of dictionaries, in [Vitas et al., 2003]. Following this model the system is based on a dictionary of lemmas named DELAS. A dictionary of all inflectional forms, named DELAF, is automatically generated on the basis of morphological information attached to lemmas in DELAS. The most important piece of information accompanying a DELAS lemma is the inflectional class it belongs to, which enables the generation of all inflected forms of a lemma with accompanying grammatical information. The information on the inflectional class is expressed by a code, e.g. N600 or V651. The information attached to a lemma in the DELAS dictionary relates to all forms of that lemma, whereas morphological information attached to the inflected form in the DELAF dictionary is characteristic of that form only.

3. Morphological information in electronic dictionaries of Bulgarian and Serbian

With respect to PoS (parts of speech), the Princeton Wordnet (PWN) and other wordnets that use PWN as a model consist of nouns, verbs, adjectives and adverbs.

3.1. Nouns

Nouns in Bulgarian and Serbian are characterized by their inflectional categories (Table 1).

Table 1. Grammatical features of nouns
CATEGORY | BULGARIAN | SERBIAN
Gender | masculine, feminine, neuter
Number | singular, plural, counting form | singular, plural, paucal
Case | vocative | nominative, genitive, dative, accusative, vocative, instrumental and locative
Definiteness | definite, indefinite, definite - full form, definite - short form | -
Animateness | - | animate, inanimate

Bulgarian nouns are divided into grammatical subclasses with respect to their type (Common, Proper, Singularia tantum, Pluralia tantum) and Gender. The category Gender with Bulgarian nouns is a lexical-semantic category, which means that a given noun does not possess different word forms expressing masculine, feminine and neuter, although the noun lemmas can be grammatically classified into the three classes: for example, the nouns glossed as "chair" (masculine), "table" (feminine) and "dog" (neuter).

The category Case has lost its morphological realization in the system of Bulgarian nouns, and only the opposition of vocative against nominative is kept with some proper and common nouns (masculine and feminine) denoting persons. Some concrete nouns also allow potential generation of the vocative in metaphorical usage. The category Definiteness is realized by means of indefinite and definite forms, the latter adding a definite article at the end of the word. A special feature of Bulgarian is the existence of two definite articles for the masculine, distinguishing the syntactic function of subject from the others.

Bulgarian masculine non-animate nouns after numerals and quantifiers are used in the plural in a counting form, as in the phrases glossed as "five textbooks" and "ten pine-trees".

In Serbian, nouns are morphologically realized in seven cases. The category Gender is in Serbian an inflectional category: for instance, papa (pope) is masculine but its plural form pape is feminine. Besides the two main categories for number, singular and plural, Serbian nouns also have the so-called "paucal" form, which represents a synthetic category of number and gender that is used with small numbers (two, three, four): jedan lep zec (one pretty rabbit), dva lepa zeca (two pretty rabbits), pet lepih zečeva (five pretty rabbits). Animateness is also an inflectional category for masculine gender nouns: the form of the accusative
case is equal to the genitive case for the animate nouns and to the nominative case for the inanimate nouns.

Noun lemmas in the Serbian DELAS dictionary are marked with markers which sometimes determine the noun in a more precise manner. For example, pluralia tantum are marked with the marker +PT, as in pantalone, denoting the concept lexicalized in PWN as {trousers:1, pants:1} and by the corresponding synset in the Bulgarian wordnet. The markers +MG and +FG are used to mark the natural male and female gender (or sex), which does not necessarily match the grammatical gender and which is important for agreement. This is the case, for example, with the noun izbeglica (refugee), which denotes persons of both male and female sex. This noun is inflected as a noun of feminine gender, agrees with the adjective as a noun of feminine or masculine gender in the singular (za svakog (m) izbeglicu (f), "for every refugee") and as a noun of feminine gender in the plural, and can agree with the relative pronoun in the plural both as a noun of feminine gender (Izbeglice (f) koje (f) su juče stigle (f) su izjavile (f)…, "The refugees that arrived yesterday said…") and as a noun of masculine gender (UNHCR će pružiti pomoć za izbeglice (f) koji (m) žele da se integrišu u lokalnu sredinu, "UNHCR will provide help for refugees that want to integrate into the local society"). Finally, the marker +Pl marks a noun in the singular which denotes a natural plural: braća (brothers) is inflected as a noun of feminine gender in the singular and agrees as a noun both with singular and plural: Njena (s) braća (s) su (p) dolazila (s) svaki dan (her brothers came every day).

3.2. Verbs

Table 2. Grammatical features of verbs
CATEGORY | BULGARIAN | SERBIAN
Person | first, second, third | first, second, third
Number | singular, plural | singular, plural
Tense | present, aorist, imperfect | present, aorist, imperfect, future
Mood | indicative, imperative | infinitive, imperative
Participles | present active, aorist active, imperfect active, past passive | past active, past passive
Voice | active, passive | active, passive
Definiteness | definite, indefinite, definite - full form, definite - short form | -
Gender | masculine, feminine, neuter | masculine, feminine, neuter
Gerund | past active | present active, past active

Bulgarian verbs are classified in subclasses with respect to Transitivity (transitive and intransitive), Perfectiveness (perfective and imperfective), and Personality (personal, third personal and impersonal), while the Serbian verbs are classified according to the first two features.

Verb lemmas in Serbian are characterized by the following markers: for aspect, imperfective +Imperf and perfective +Perf; for reflexiveness, reflexive +Ref and irreflexive +Iref; and for transitivity, transitive +Tr and intransitive +It.

Many verbs in the two languages can be both imperfective and perfective, such as Serbian adresirati and its Bulgarian equivalent, which denote the concept lexicalized in PWN as {address:3, direct:12}. Many formally equal verbs can express both reflexive and irreflexive meaning, such as {topiti:1a}, lexicalized in PWN as {melt:1, run:39, melt down:1} and by the corresponding synset in the Bulgarian wordnet, and topiti se, lexicalized in PWN as {dissolve:9, thaw:1, unfreeze:1, unthaw:1, dethaw:1, melt:2} and by the corresponding synset in the Bulgarian wordnet. Lexical reflexivity in both languages is expressed by the lexical particle se (and si for Bulgarian). Cognate verbs can also express either transitive or intransitive meaning, such as {svirati:1b}, denoting the concept lexicalized in PWN as {play:3} and by the corresponding Bulgarian synset as an intransitive verb (The band played all night long), or {svirati:1a}, denoting the concept lexicalized in PWN as {play:7} and by the corresponding Bulgarian synset as a transitive verb (He plays the flute).
Synsets that contain the same verbs, in one case as reflexive and in the other as irreflexive, are often linked by the cause / caused relation. The transitive / intransitive forms have to have separate meanings in PWN. This is not the case with aspect: perfective verbs which are not generated by prefixing should be in the same synset as the imperfective verb.

Verbs in Bulgarian and Serbian are characterized by inflectional categories and their values (see Table 2). The Bulgarian imperative has two declined forms, 2nd person singular and plural, in comparison with Serbian, where three declined forms are realized: 2nd person singular, 1st and 2nd person plural.

Bulgarian participles are specified for aspect and are declined according to number, gender and definiteness. Serbian participles are specified for aspect and are declined according to number (singular or plural) and gender (masculine, feminine, or neuter). Participles in both languages are used to form compound tenses, both in the active and passive voice: perfect, pluperfect, future (past and perfect), and conditional.

The infinitive in Serbian and gerunds in both languages are indeclinable.

3.3. Adjectives

The categories realized with adjectives in Bulgarian and Serbian are similar as well. The main differences are observed with the categories Case and Animateness. Adjectives are characterized by their morphological categories and their members (shown in Table 3).

Table 3. Grammatical categories with adjectives
CATEGORY | BULGARIAN | SERBIAN
Gender | masculine, feminine, neuter | masculine, feminine, neuter
Number | singular, plural | singular, plural, paucal
Case | - | nominative, genitive, dative, accusative, vocative, instrumental, and locative
Definiteness | definite, indefinite, definite - full form, definite - short form | definite, indefinite
Comparison | positive, comparative, superlative | positive, comparative, superlative
Animateness | - | animate, inanimate

3.4. Adverbs

Adverbs in traditional Bulgarian and Serbian grammars are considered indeclinable word types, although for many of them comparison exists: for example, Serbian brzo (rapidly), brže (more rapidly) and najbrže (most rapidly), with parallel forms in Bulgarian. These are usually treated as separate lemmas, but a paradigmatic analysis in the scope of the morphological category Comparison is also acceptable. There is another level of comparison for adjectives and adverbs in Serbian, which is realized by the prefix po- and the superlative: ponajbrže, which relativizes the superlative and denotes, in this case, the fastest way among the slow ways.

4. Language resources integration

4.1. Bulgarian resources

There are three large Bulgarian resources: the Bulgarian WordNet (BulNet), which covers approximately one third of the general Bulgarian lexicon [Koeva et al., 2004]; the BGD, encoding lemmas and corresponding inflection types; and the Bulgarian Frame Lexicon, encoding the arguments of the verbs and their semantic features [Koeva, 2004c]. The combination of these resources results in their mutual enhancement, their expansion and reliable validation.

The Bulgarian WordNet models nouns, verbs, adjectives, and (occasionally) adverbs, and already contains 24 405 synonymous sets (as of 1.09.2005), in which 51 584 literals have been included (the ratio is 2,11). Following the standards accepted in the BalkaNet project, the structure of the Bulgarian and Serbian data bases is organized in an XML file. Every synset encodes the equivalence relation between several literals (at least one has to be present), having a unique meaning (specified in the SENSE tag value), belonging to one and the same part of speech (specified in the POS tag value), and expressing the same lexical meaning (defined in the DEF tag value). Each synset is related to the corresponding synset in the English WordNet 2.0 via its identification number ID. The common synsets in the Balkan languages are encoded in the Base Concepts tag, BCS. There has to be at least one language-internal relation (there could be more) between a synset and another synset in the monolingual data base. There could also be several optional tags encoding usage, some stylistic, morphological or syntactical features, a stamp marking the person who worked out the particular synset, as well as the last time it was edited.

In order to merge the language data existing in BulNet and BGD it was decided to assign an additional grammatical note to each literal, thus linking it with the BGD lemma's label [Koeva, 2004b]. All labels for BGD entry forms that are found in BulNet have been entered as values of the LNOTE grammatical tag in the XML format. Most of the literals which were not recognized are either specialized terms that have no place in a grammatical dictionary of the common lexis (often written in Latin) or compounds. The contradictory cases where two or more labels were associated with one and the same literal were solved manually.

The grammatical specifications used in the Bulgarian Frame Lexicon are identical with those in BGD and BulNet. Thus the Bulgarian WordNet is fully exportable with the syntactic information available from the Frame Lexicon.

4.2. Serbian resources

The Serbian wordnet covers at present approximately one fifth of the Serbian general lexicon, but it is constantly being developed [Krstev et al., 2004a]. In the course of its development it has been enriched with information pertaining to the inflection of literals that are simple words. A software tool specially designed for this purpose is used to enable automatic transfer of all information on the inflectional class of a literal from the morphological dictionary into the wordnet, where it becomes the content of the LNOTE element for that literal (this element is nested inside the element that holds the literal itself) [Krstev et al., 2004b]. The program allows the user to alter the automatically assigned class in cases when different choices are possible.

Inflections are of great importance for the Serbian language, given the fact that the generation of inflective forms is not straightforward. This can be best illustrated by the existence of a large number of homograph lemmas: for example, deka can be a synonym for a blanket, a unit of measurement (short for decagram) or a hypocoristic for grandfather[1]. In the first two cases the nouns are inanimate and of feminine gender, with the same inflection: they belong to one and the same inflectional class. In the third case the noun is animate, of masculine gender in the singular and feminine gender in the plural, and belongs to a different inflective class.

[1] All three lemmas are accentuated in a different manner, but that is not obvious from written text.

4.3. Problems to be solved

Literals that are simple words can appear in the Bulgarian and Serbian wordnets without being lemmas in the morphological dictionary. Such is the case with animal and plant species, which appear as nouns in the plural, while the singular denotes just one member of the species. For example, the Serbian wordnet contains the synset {Felidae:1, porodica Felidae:1, mačke:X}, where the value N603+Zool:p has been assigned to the LNOTE element for the last literal. This means that the literal belongs to the N603 inflective class (fleeting "a" appears in the genitive plural), is marked as animal (+Zool), and is always used in the plural (:p). The corresponding Bulgarian synset has six literals, among them Felidae:1, and the English one is {Felidae:1, family Felidae:1}, which belongs to the hierarchical branch that starts with {group:1, grouping:1}:

group:1, grouping:1
  biological group:1
    taxonomic group:1, taxonomic category:1, taxon:1
      family:6
        mammal family:1
          Felidae:1, family Felidae:1

On the other hand, mačka (cat) from the synset {životinja iz roda mačaka:1, mačka:1b} (corresponding to the PWN synset {feline:1, felid:1} and to a single-literal Bulgarian synset), which belongs to the hierarchical branch that starts with {organism:1, being:2} and is in a holo_member relation with the former synset, has N603+Zool as the content of the LNOTE element, which means that the noun can appear both in the singular and the plural. This synset belongs to the hierarchical tree branch:

organism:1, being:2
  animal:1, animate being:1, beast:1, …
    chordate:1
      vertebrate:1, craniate:1
        mammal:1
          placental:1, placental mammal:1, …
            carnivore:1
              feline:1, felid:1

Many literals in the Bulgarian and Serbian wordnets, as in other wordnets, are not simple words but compounds. There are 12 636 compound literals out of 51 584 in BulNet (24,49 %) and 3 081 such literals out of 16 621 in the Serbian WordNet (18,53 %). The majority of them fall in one of the following categories:

1. Adjective*-noun, for example {konusni presek:1, kupasti presek:1} (corresponding to {conic section:1, conic:1} in PWN and to a single-literal synset in the Bulgarian wordnet), or {konjska trka:1} (corresponding to {horse race:1} in PWN and to a two-literal synset in the Bulgarian wordnet).

2. Noun phrases where the noun is supplemented with a prepositional phrase: for example, {pobeda na poene:1} (corresponding to {decision:3} in PWN and to a single-literal synset in the Bulgarian wordnet), or {daska za peglanje:1} (corresponding to {ironing board:1} in PWN and to a single-literal synset in the Bulgarian wordnet).

3. Coordinate noun phrases (just a few), such as muž i žena in {bračni par:1, muž i žena:1} (corresponding to {marriage:2, married couple:1, man and wife:1} in PWN and to a four-literal synset in the Bulgarian wordnet).

4. Verb phrases in which the verb is supplemented by a noun phrase, such as the two-literal Bulgarian synset corresponding to {live:2} in PWN and to {živeti život:1, voditi život:1} in the Serbian WordNet.

5. Genitive phrases in Serbian, such as {deljenje akcija:1} (corresponding to {split:9, stock split:1, split up:1} in PWN and to a single-literal synset in the Bulgarian wordnet), or {izraz lica:1} (corresponding to {countenance:1, visage:2} in PWN and to a two-literal synset in the Bulgarian wordnet).

6. Noun-noun subordinate phrases in Serbian, which are the rarest: for example {biljka penjačica:1} (corresponding to {vine:1} in PWN and to a two-literal synset in the Bulgarian wordnet).

Compounds have their own inflective rules: for example, in the second and fifth case only the head noun is inflected, whereas in the third and sixth case both nouns are inflected. In the fourth case only the verb is inflected in Serbian and Bulgarian (in this particular case). In the first case the noun is inflected and the adjective(s) agree with the noun. A precise description of this type of inflection remains to be elaborated in accordance with the solution proposed in [Savary, 2005]. This is why the LNOTE elements for compounds in the Bulgarian and Serbian wordnets still remain empty. In the Bulgarian and Serbian wordnets, as in PWN, there are a lot of Latin names for species that are uninflected in practice.

5. Mirroring of PWN concepts and structure to Bulgarian and Serbian

The BalkaNet project adopted the Princeton WordNet structure and concepts as the model for the development of wordnets for five Balkan languages and Czech. However, the development of these wordnets showed that mirroring PWN synsets and the relations among them to Balkan languages is neither the simplest nor the most appropriate solution. Its rationale could be found principally in the necessity of obtaining a coherent multilingual lexical database. The problems encountered were many. We will illustrate some of them with examples related to Serbian and Bulgarian.

The simplest problem was the absence of specific PWN concepts in Serbian and/or Bulgarian. An example is the PWN concept defined as "an actor situated in the audience whose acting is rehearsed but seems spontaneous to the audience" and lexicalized as the synset {plant:4}. Although synsets for this concept have been introduced in both the Serbian and the Bulgarian wordnet, the lexicalizations in Serbian, {glumac iz publike:1}, and in Bulgarian (a two-literal synset) in fact do not adequately represent the original PWN concept.

Conversely, the problem of the absence in PWN of Serbian and/or Bulgarian concepts, as well as of concepts from other BalkaNet languages, was also encountered. The solution for this problem was sought within the project in the introduction of language specific and Balkan specific concepts. Initially, a set of concepts not present in PWN was defined for each language, with appropriate synsets and an English definition attached.

In this stage 316 Serbian specific concepts were defined: 259 nouns, 9 verbs and 47 adjectives. There were 336 concepts defined for Bulgarian, 309 for Greek, 545 for Romanian, 332 for Turkish and 226 for Czech. The English definition attached to the appropriate synsets enabled mutual comparison of language specific concepts and extraction of concepts common to two or more languages, such as two oriental sweets common to Bulgarian, Greek, Romanian, Serbian and Turkish (Fig. 1), defined in all five initial sets of language specific concepts for these languages and nonexistent in PWN.

Every language specific concept became a Balkan specific concept. These concepts were incorporated into the appropriate BalkaNet wordnets, and common concepts were linked via a BILI (BalkaNet ILI) index.

Figure 1. Two Balkan specific concepts common to five languages (Romanian: cataif, halva; Turkish: kadayıf, helva)

The initial set of Balkan specific common concepts consisted mainly of concepts reflecting the cultural specifics of the Balkans (many of them pertaining to family relations, religion, socialist heritage, etc.). The Serbian wordnet presently contains 538 Balkan specific and 55 Serbian specific concepts, the Bulgarian one 444 Balkan specific and 42 Bulgarian specific concepts.

There are other specific features of Bulgarian and Serbian that are of a linguistic nature and that prevent a strict one-to-one mapping with PWN. For example, a very small number of possessive and relative adjectives can be found in PWN, whereas the initial set of language specific concepts for Bulgarian contained a number of relative adjectives, most of them having an equivalent in Serbian. For example, the Bulgarian relative adjective defined as "of or related to steel" has the Serbian equivalent {čelični:1} with exactly the same definition, "koji se odnosi na čelik". Another example is the Bulgarian adjective defined as "of or related to a soldier and army service", which has the Serbian equivalent {vojnički:1} with practically the same definition, "koji se odnosi na vojnika ili njegovu službu".

Another group of concepts specific both to Bulgarian and Serbian (but also to some other BalkaNet languages) is lexicalized by nouns resulting from gender motion. Some of them were accepted as Balkan specific concepts. For example, {omladinac:1}, defined as "član, pripadnik omladinske organizacije" (a member of the youth organization), has its female gender counterpart lexicalized by a noun derived by gender motion, {omladinka:1}, defined as "devojka, član omladinske organizacije" (a girl, member of the youth organization). Both concepts, also related by gender motion, exist in Bulgarian. For some concepts which exist in PWN, such as {politician:2, politico:1, pol:1, political leader:1}, which have their Serbian equivalent in {političar:1} and a Bulgarian equivalent as well, there is no corresponding concept in PWN related to the female gender, whereas such a concept, again lexicalized by a noun derived by gender motion, exists in Serbian: {političarka:1}. In order to describe relations between concepts in the aforementioned cases, relations more specific than the derived relation already existing in PWN were introduced in the Serbian wordnet, namely derived-pos and derived-gender. However, all these relations are in general inadequate, since they link synsets rather than literals, whereas the relation of derivation can only pertain to literals.

Among many other language specific features we mention here also concepts related to young animals which do not exist in PWN, such as {čavče:1, čavčić:1}, a young čavka (jackdaw), or {jare:1, jarence:1, kozlić:1}, a young koza (goat). Related to these concepts are concepts denoting the birth of a young animal, lexicalized by appropriate verbs. Such concepts exist in Serbian for a number of species, with a counterpart in PWN for only a few of them. An example is {ojariti se:1}, defined as "give birth to a goat". The same features are shown in Bulgarian, although the equivalent examples are not yet included in BulNet.

A specific problem is posed by concepts lexicalized by nouns originating from regular derivation which does not alter either the PoS or the gender, such as diminutives and augmentatives [Vitas & Krstev, 2005]. There are several possible approaches to these nouns:
° treat them as denoting specific concepts and define appropriate synsets;
° include them in the synset with the noun they were derived from;
° omit their explicit mentioning, but rather let the flexion-derivation description encompass these phenomena as well.

The first approach is mandatory if the diminutive or augmentative acquires a special meaning: for example, the diminutive glavica from glava (head) is used in Serbian for the concept lexicalized in English as {head cabbage:1, head cabbage plant:1, Brassica oleracea capitata:1}, whereas the augmentative glasina from glas (voice) is used for the concept lexicalized in English as {rumor:1, rumour:1, hearsay:1}, in Bulgarian by a three-literal synset, and defined as "gossip (usually a mixture of truth and untruth) passed". On the other hand, if the third approach is accepted, the question arises whether it is possible to apply the same approach to other regular phenomena (gender motion and possessive adjectives).

6. Conclusions

For both languages the importance of including inflectional information in the wordnet has been recognized and, consequently, it was added to the wordnets for the respective languages. However, a lot of work still remains to be done, particularly for the inflectional description of compound words. The first results obtained by the comparison of the extensive and powerful resources already developed promise their possible successful usage in many NLP applications.

References

[Courtois, 1990] Courtois, B. (1990). Le dictionnaire DELAS. In Dictionnaires électroniques du français, Langue française n° 87 (pp. 11-22). Larousse: Paris.
[Courtois et al., 1990] Courtois B., Silberztein, M. Eds (1990). Dictionnaires électroniques du français, Langue française n° 87 (127 pages). Larousse: Paris.
[Fellbaum, 1998] Fellbaum, C. (ed.). WordNet: An Electronic Lexical Database. Cambridge, Mass.: MIT Press, 1998.
[Koeva, 1998] S. Koeva Dictionary. Description of the linguistic data organization concept, 1998, 6, 49-58.
[Koeva et al., 2004] S. Koeva, T. Tinchev and S. Mihov Bulgarian Wordnet-Structure and Validation in: Romanian Journal of Information Science and Technology, Volume 7, No. 1-2, 2004: 61-78.
[Koeva, 2004a] S. Koeva Modern language technologies - applications and perspectives, in: Lows of/for language, Hejzal, Sofia, 2004, 111-157.
[Koeva, 2004b] S. Koeva Validating Bulgarian WordNet using grammatical information in: Proceedings from Joint International Conference of the Association for Literary and Linguistic Computing and the Association for Computers and the Humanities, Göteborg University, 2004, 80-82.
[Koeva, 2004c] Koeva S., Theoretical model for a formal representation of syntactic frames, Scripta and e-Scripta. Vol. 2, Sofia, 2004: 9-26.
[Krstev et al., 2004a] C. Krstev, G. Pavlović-Lažetić, D. Vitas, I. Obradović (2004a) Using Textual and Lexical Resources in Developing Serbian Wordnet in: Romanian Journal of Information Science and Technology, Volume 7, Numbers 1-2, 2004, pp. 147-161.
[Krstev et al., 2004b] C. Krstev, D. Vitas, R. Stanković, I. Obradović, G. Pavlović-Lažetić (2004b) Combining Heterogeneous Lexical Resources in Proceedings of the Fourth International Conference on Language Resources and Evaluation, Lisbon, Portugal, May 2004, vol. 4, pp. 1103-1106, ARTIPOL - Artes Tipograficas, Lda, Portugal.
[Miller, 1990] Miller G.A., Beckwith R., Fellbaum C., Gross D., Miller K.J. Introduction to WordNet: An On-Line Lexical Database. International Journal of Lexicography, Vol. 3, No. 4, 1990, 235-244.
[Savary, 2005] A. Savary (2005) Towards a Formalism for the Computational Morphology of Multi-Word Units in Proceedings of 2nd Language & Technology Conference, April 21-23, 2005, Poznań, Poland, ed. Zygmunt Vetulani, pp. 305-309, Wydawnictwo Poznańskie Sp. z o.o., Poznań.
[Silberztein, 1990] Silberztein, M. Le dictionnaire DELAC. In Dictionnaires électroniques du français, Langue française n° 87 (pp. 73-83). Larousse: Paris.
[Stamou, 2002] Stamou S., K. Oflazer, K. Pala, D. Christodoulakis, D. Cristea, D. Tufis, S. Koeva, G. Totkov, D. Dutoit, M. Grigoriadou, BALKANET: A Multilingual Semantic Network for the Balkan Languages, Proceedings of the International Wordnet Conference, Mysore, India, 21-25 January 2002, 12-14.
[Vitas et al., 2003] D. Vitas, C. Krstev, I. Obradović, Lj. Popović, G. Pavlović-Lažetić (2003) An Overview of Resources and Basic Tools for the Processing of Serbian Written Texts in: Workshop on Balkan Language Resources and Tools, November 21, Thessaloniki, Greece.
[Vitas & Krstev, 2005] Duško Vitas, Cvetana Krstev (2005) Derivational Morphology in an E-Dictionary of Serbian in Proceedings of 2nd Language & Technology Conference, April 21-23, 2005, Poznań, Poland, ed. Zygmunt Vetulani, pp. 139-143, Wydawnictwo Poznańskie Sp. z o.o., Poznań, 2005.
[Vossen, 1999] Vossen P. (ed.) EuroWordNet: a multilingual database with lexical semantic networks for European Languages. Kluwer Academic Publishers, Dordrecht. 1999.