Lexical Level

Data for the lexicon: how to acquire all forms of words, the relation between those forms and the canonical form, and the relation to additional information.
Morphological analysis: given an inflected form, how to get its syntactic features, its canonical form, and its decomposition into morphemes (morphological synthesis is also possible).
Data structures: how to create and store a lexicon so that it takes little space and guarantees fast access to information.

Jan Daciuk, DIIS, ETI, GUT. Natural Language Processing 2. Morphology: Concatenation of Morphemes

Morphemes (1/2)

A morpheme is (in a simplification) a part of a word. There are free morphemes and bound morphemes. A free morpheme can appear on its own as a word; it does not have to be accompanied by other morphemes. A bound morpheme does not have that property.

A root is a basic morpheme of a word, to which affixes (both flectional and derivational) can be glued. A root is a lexical morpheme. Example: dodatkowy.

A stem is that part of a word that remains after flectional affixes have been stripped. It carries the meaning of the word. It does not have to be a single morpheme. Example: dodatkowy.

Morphemes (2/2)

Affixes are bound morphemes that have grammatical functions. They are divided into prefixes, suffixes, infixes, postfixes, and circumfixes. The null morpheme carries only grammatical information, but it has no textual representation. For example, the singular nominative of the Polish word słoń (elephant) can be analyzed as słoń+∅, i.e. the stem słoń and a null morpheme.

Morphological Types of Languages

Isolating languages do not have bound morphemes (all words are single morphemes). In agglutinative languages, bound morphemes are put one after another, each one specifying only one feature value. In inflected languages, many different features can be represented by a single bound morpheme. In polysynthetic languages, some elements that appear as separate words in other languages are expressed by morphological means.

Flectional and Derivational Morphology

In flectional morphology, we describe various forms of the same word, i.e. of one lexeme. The set of all inflected forms constitutes the paradigm of that word. Inflected forms have the same part of speech as the canonical form of the lexeme and the same meaning, but they adapt the word to fulfill other syntactic functions.

Derivational morphology forms new words, i.e. new lexemes. From one lexeme, we get another one, of a different meaning, and usually of a different part of speech.

Flectional and Derivational Morphology – Examples

Inflection:
    droga → drogi, drodze, drogę, drogi, dróg...
    drogi → drogiego... droga... drodzy...
    pies → psa, psu, psie, psy...
    pieszy → pieszego, pieszemu, pieszym, piesi...
    kochać → kocham, kochasz... kochałem... kochałabyś...

Derivation:
    pies → piesek, pieseczek, psi, pieski...
    kot → kotek, koteczek, kocię, koci, kocić się...
    kochać → kochanie, kochanek, kochanka, kochliwy...
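The inflection examples can be turned into a tiny analysis table that maps each inflected form back to its canonical form. The sketch below is only an illustration: the dictionary layout and the names PARADIGMS, build_index and lemmas are assumptions, and the paradigms contain just the forms listed above, not full paradigms.

```python
from collections import defaultdict

# Partial paradigms taken from the examples above (canonical form -> forms).
# Hypothetical layout; real paradigms are much longer.
PARADIGMS = {
    "droga": ["droga", "drogi", "drodze", "drogę", "dróg"],  # noun
    "drogi": ["drogi", "drogiego", "droga", "drodzy"],       # adjective
    "pies": ["pies", "psa", "psu", "psie", "psy"],
}


def build_index(paradigms):
    """Invert the paradigms: inflected form -> set of canonical forms."""
    index = defaultdict(set)
    for lemma, forms in paradigms.items():
        for form in forms:
            index[form].add(lemma)
    return index


INDEX = build_index(PARADIGMS)


def lemmas(form):
    """Return all canonical forms a surface form may belong to."""
    return sorted(INDEX.get(form, ()))
```

With this data, lemmas("drogi") and lemmas("droga") both return two candidates, which shows that one surface form may belong to more than one lexeme.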

Joining Morphemes – Concatenation

The simplest and most common way of joining morphemes is concatenation: appending a morpheme (a suffix) or prepending a morpheme (a prefix). A much less frequently used method is to insert a morpheme inside a word. Such an operation is severely restricted, and it can be treated as prepending a prefix while skipping some initial phones of the word, e.g. the first consonant, the first syllable, the first unaccented syllable, etc.
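Treating insertion as "prepending while skipping initial phones" can be sketched in a few lines. The example word is not from the slides: it uses the Tagalog infix -um- (sulat, sumulat), and it approximates phones with letters. The function name and the vowel set are assumptions made for the illustration.

```python
VOWELS = set("aeiou")  # assumption: letters stand in for phones


def insert_after_onset(word, affix):
    """Prepend `affix`, but skip the word-initial consonants first."""
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    return word[:i] + affix + word[i:]


# Tagalog: sulat -> sumulat, bili -> bumili
forms = [insert_after_onset(w, "um") for w in ("sulat", "bili")]
```

The only difference from plain prefixation is the skipped span, which is why the restriction can be stated on the beginning of the word.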

Joining Morphemes – Other Methods

In Semitic languages, the stem of a word is a pattern consisting of a root (usually three consonants), a vowel pattern (bearing information about voice and aspect), and a derivational pattern that defines the class. For example, the root ktb means to write. In the active voice, the vowel pattern is A, so the word for the derivational pattern CVCVC (the canonical form) in the active voice is kAtAb. In the passive voice, the vowel pattern is UI, and the word is kUtIb. The derivational pattern CVVCVC yields a word meaning to correspond: kAAtAb in the active voice, kUUtIb in the passive voice.
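The combination of a root, a vowel pattern and a derivational pattern can be sketched as slot filling. The association rule used here (the first vowel spreads over the leading V slots so that the remaining vowels land on the last slots) is an assumption chosen to reproduce the four forms above; the function name interdigitate is hypothetical.

```python
def interdigitate(root, vowels, template):
    """Fill C slots with root consonants and V slots with the vowel pattern."""
    n_c, n_v = template.count("C"), template.count("V")
    if len(root) != n_c:
        raise ValueError("root does not fit the consonant slots")
    spread = n_v - len(vowels) + 1  # how many slots the first vowel covers
    if spread < 1:
        raise ValueError("more vowels than V slots")
    melody = iter(vowels[0] * spread + vowels[1:])
    consonants = iter(root)
    out = []
    for slot in template:
        if slot == "C":
            out.append(next(consonants))
        elif slot == "V":
            out.append(next(melody))
        else:
            raise ValueError(f"unknown slot {slot!r}")
    return "".join(out)


active = interdigitate("ktb", "A", "CVCVC")     # kAtAb
passive = interdigitate("ktb", "UI", "CVVCVC")  # kUUtIb
```

Keeping the three components separate means that a new root or a new derivational pattern needs no new rules, only new data.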

Joining Morphemes – Restrictions (1/2)

Not every affix can be glued to any word in any place. The restrictions may concern the part of speech, the case and other grammatical features, as well as features of pronunciation and meaning. Various methods can be used to describe such restrictions, e.g. continuation classes. Unification seems to be the simplest method. An example (adjectives in the positive degree) in mmorph (a morphology program):

    adv_pos: adv[deg=pos advs=$advs form=surface] ← a[deg=pos form=stem advs=$advs par_a!=no] advsuf[deg=pos advs=$advs]

Here adv_pos is the rule name. An adverb is created from an adjective and an adverb ending; the degree and the paradigm must agree.

Joining Morphemes – Restrictions (2/2)

Endings for the rule:

    adv.pos1: “o” advsuf[deg=pos advs=o]
    adv.pos2: “′e” advsuf[deg=pos advs=e]

The lexicon (a part):

    a[deg=pos form=stem par_a=y dega=no advs=o]
    “bos” = “bosy”
    “bezosobow” = “bezosobowy”
    “burzow” = “burzowy”
    “brodat” = “brodaty”
    ...
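How the rule, the endings and the lexicon fit together can be imitated with plain dictionaries. This is a rough Python sketch, not mmorph itself: the feature names follow the slides (with par_a assumed to contain an underscore), the shared variable $advs is modelled as an equality test, and the softening mark of the second ending is written as an ASCII apostrophe and not interpreted.

```python
# Lexicon fragment from the slide: adjective stems sharing one feature set.
ADJ = {"cat": "a", "deg": "pos", "form": "stem",
       "par_a": "y", "dega": "no", "advs": "o"}
STEMS = [(s, ADJ) for s in ("bos", "bezosobow", "burzow", "brodat")]

# The two adverb endings; "'e" stands for the softening ending.
SUFFIXES = [
    ("o", {"cat": "advsuf", "deg": "pos", "advs": "o"}),
    ("'e", {"cat": "advsuf", "deg": "pos", "advs": "e"}),
]


def adv_pos(stem, sf, suffix, xf):
    """Apply the rule to one stem/suffix pair; return None if it fails."""
    if sf["cat"] != "a" or sf["deg"] != "pos" or sf["form"] != "stem":
        return None
    if sf["par_a"] == "no":
        return None
    if xf["cat"] != "advsuf" or xf["deg"] != "pos":
        return None
    if sf["advs"] != xf["advs"]:  # $advs must have one value everywhere
        return None
    feats = {"cat": "adv", "deg": "pos", "advs": sf["advs"], "form": "surface"}
    return stem + suffix, feats


def generate():
    """Try every stem with every suffix and keep the successful results."""
    results = []
    for stem, sf in STEMS:
        for suffix, xf in SUFFIXES:
            hit = adv_pos(stem, sf, suffix, xf)
            if hit is not None:
                results.append(hit[0])
    return results
```

Because every stem in this fragment carries advs=o, only the ending “o” unifies, so generate() yields boso, bezosobowo, burzowo and brodato.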

Alternations in Morphemes Resulting from Concatenation

What is the stem in the lexeme pies: pies or ps? One of them, both of them, or something else?

    pies    sg      pl
    Nom     pies    psy
    Gen     psa     psów
    Dat     psu     psom
    Acc     psa     psy
    Voc     psie    psy
    Ins     psem    psami
    Loc     psie    psach

Dog’s Fate – the First Solution

We have two stems: the basic one, pies, and an alternative one, ps (or the other way round). This solution is simple, and it is used often (e.g. in INTEX/Unitex). However, it needs a set of stems for every word with such alternations, so it is error-prone.

    pies    sg        pl
    Nom     pies+∅    ps+y
    Gen     ps+a      ps+ów
    Dat     ps+u      ps+om
    Acc     ps+a      ps+y
    Voc     ps+ie     ps+y
    Ins     ps+em     ps+ami
    Loc     ps+ie     ps+ach
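The two-stem solution amounts to a paradigm whose cells name a stem number and an ending. The sketch below encodes the table for pies in that way; the data layout and the names PARADIGM and inflect are assumptions, and the case labels are English abbreviations.

```python
# Each cell: (index of the stem to use, ending). Stem 0 is the basic stem,
# stem 1 the alternative one; the empty ending is the null morpheme.
PARADIGM = {
    ("Nom", "sg"): (0, ""),   ("Nom", "pl"): (1, "y"),
    ("Gen", "sg"): (1, "a"),  ("Gen", "pl"): (1, "ów"),
    ("Dat", "sg"): (1, "u"),  ("Dat", "pl"): (1, "om"),
    ("Acc", "sg"): (1, "a"),  ("Acc", "pl"): (1, "y"),
    ("Voc", "sg"): (1, "ie"), ("Voc", "pl"): (1, "y"),
    ("Ins", "sg"): (1, "em"), ("Ins", "pl"): (1, "ami"),
    ("Loc", "sg"): (1, "ie"), ("Loc", "pl"): (1, "ach"),
}


def inflect(stems, paradigm):
    """Concatenate the selected stem and ending for every paradigm cell."""
    return {cell: stems[i] + ending for cell, (i, ending) in paradigm.items()}


forms = inflect(("pies", "ps"), PARADIGM)
```

The paradigm can be shared by other nouns of the same type, but each of them still needs its own pair of stems typed in by hand, which is where the errors creep in.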

Dog’s Fate – the Second Solution

The stem of the lexeme pies is p’s, i.e. soft p’ and s. Except in the nominative singular, the soft p’ is followed by s, so its softness is not perceivable. In the nominative singular, a null morpheme is appended. As it is hard to pronounce ps, an e is inserted in the middle. Now the soft p’ can surface: we have pies.

    pies    sg        pl
    Nom     p’s+∅     p’s+y
    Gen     p’s+a     p’s+ów
    Dat     p’s+u     p’s+om
    Acc     p’s+a     p’s+y
    Voc     p’s+’e    p’s+y
    Ins     p’s+em    p’s+ami
    Loc     p’s+’e    p’s+ach
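The second solution keeps a single underlying stem and derives the surface forms by rules. The sketch below hard-codes two rules that are enough for this one lexeme: e-insertion before a null ending, and spelling softness as i before a vowel while dropping it elsewhere. The ASCII apostrophe stands for the softness mark, and the function name realize and the vowel set are assumptions.

```python
VOWELS = set("aąeęioóuy")


def realize(stem, ending):
    """Turn an underlying stem+ending (with ' marking softness) into spelling."""
    if ending == "":
        # Null morpheme: break up the final cluster with an inserted e.
        form = stem[:-1] + "e" + stem[-1]
    else:
        form = stem + ending
    out = []
    for i, ch in enumerate(form):
        if ch == "'":
            nxt = form[i + 1] if i + 1 < len(form) else ""
            if nxt in VOWELS:
                out.append("i")  # softness surfaces before a vowel
            # before a consonant the softness is not perceivable: drop it
        else:
            out.append(ch)
    return "".join(out)


ENDINGS_SG = {"Nom": "", "Gen": "a", "Dat": "u", "Acc": "a",
              "Voc": "'e", "Ins": "em", "Loc": "'e"}
singular = {case: realize("p's", e) for case, e in ENDINGS_SG.items()}
```

Here the lexicon stores one stem per lexeme, and the alternation is moved into rules that are written once and apply to every word of the same kind.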

Alternations of Phones

Consonants:
    m:m’, b:b’, p:p’, v:v’: łamać – łamię, krok – krokiem, pies – psa
    t:ć, c:ć, d:dź, d:dż, s:ś, sz:ś, z:ź, ż:ź, n:ń: wożę – wozisz, noszę – nosisz
    r:rz: karać – karzę, brać – bierzesz
    k:cz, g:ż: piekę – pieczesz, mogę – możesz

Vowels:
    e:o, e:a, o:ó, ę:ą: bierzemy – biorę, księga – ksiąg

Deletion or insertion of a vowel: samogłoska – samogłosek

Alternations of phones can happen more than once in the same word: gwiazda – gwieździe, brać – biorę

Alternations of Phones – Other Languages

In German, there are umlauts, which are marked in writing with a diacritic. A vowel without an umlaut in the first syllable of a word can turn into a vowel with an umlaut in the plural: Land – Länder, Bruder – Brüder.

In many languages, including Semitic, Altaic, and Finno-Ugric ones, there is a phenomenon called vowel harmony: selected phonological features of vowels in suffixes agree with the phonological features of the last vowel in the root.
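Vowel harmony can be illustrated with the Turkish plural suffix, which is -ler after a front vowel and -lar after a back vowel. The example is not from the slides; the function name plural and the lowercase-only vowel sets are assumptions, and only front/back harmony is modelled.

```python
FRONT = set("eiöü")
BACK = set("aıou")


def plural(noun):
    """Pick -ler or -lar according to the last vowel of the noun."""
    for ch in reversed(noun):
        if ch in FRONT:
            return noun + "ler"
        if ch in BACK:
            return noun + "lar"
    raise ValueError("no vowel found in " + repr(noun))


# ev -> evler, kitap -> kitaplar
examples = [plural(w) for w in ("ev", "kitap", "göz", "okul")]
```

The suffix is stored once with an underspecified vowel, and the root decides how that vowel is realized.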
