A Czech Morphological Lexicon

Hana Skoumalová
Institute of Theoretical and Computational Linguistics
Charles University
Celetná 13, Praha 1
Czech Republic
hana.skoumalova@ff.cuni.cz

Abstract

This paper describes a treatment of Czech phonological rules in the two-level morphology approach. First the possible phonological alternations in Czech are listed, and then their treatment in a practical application, a Czech morphological lexicon, is described.

1 Motivation

In this paper I want to describe the way in which I treated the phonological changes that occur in Czech conjugation, declension and derivation. My work concerned the written language, but as the spelling of Czech is based on phonological principles, most statements will be true about phonology, too.

My task was to encode an existing Czech morphological dictionary (Hajič, 1994) as a finite state transducer. The existing lexicon was originally designed for simple C programs that only attach "endings" to the "stems". The quotation marks in the previous sentence mean that the terms are not used in the linguistic meaning but rather technically: Stem means any part of a word that is not changed in declension/conjugation. Ending means the real ending and possibly also another part of the word that is changed. When I started the work on converting this lexicon to a two-level morphology system, the first idea was that it should be linguistically more elegant and accurate. This required me to redesign the set of patterns and their corresponding endings. From the original number of 219 paradigms I got 159 that use 116 sets of endings. By the term paradigm I mean the set of endings that belong to one lemma (e.g. noun endings for all seven cases in both numbers) and possible derivations with their corresponding endings (e.g. possessive adjectives derived from nouns in all possible forms). That is why the number of paradigms is higher than the number of sets of endings.

In this approach, it is necessary to deal with the phonological changes that occur at boundaries between the stem and the suffix/ending or between the suffix and the ending. There are also changes inside the stem (e.g. přítel 'friend' x přátelé 'friends', or hnát 'to chase' x ženu 'I chase'), but I will not deal with them, as they are rather rare and irregular. They are treated in the lexicon as exceptions. I also will not deal with all the changes that may occur in a verb stem: this would require reconstructing the forms of the verbs back in the 14th century, which is outside the scope of my work. Instead, I work with several stems of these irregular verbs. For example the verb hnát ('to chase') has three different stems, hná- for the infinitive, žen- for the present tense, imperative and present participles, and hna- for the past participles. The verb vést ('to lead') has two stems, vés- for the infinitive and ved- for all finite forms and participles. The verb tít ('to cut') has the stem tn- in the present tense, and the stem ťa- in the past tense; the participles can be formed both from the present and the past stem. For practical reasons we work either with one verb stem (for regular verbs) or with six stems (for irregular verbs). These six stems are stems for infinitive, present indicative, imperative, past participle, transgressive and passive participle. In fact, there is no verb in Czech with six different stems, but this division is made because of various combinations of endings with the stems.

2 Types of phonological alternations in Czech

We will deal with three types of phonological alternations: palatalization, assimilation and epenthesis. Palatalization occurs mainly in declension and partly also in conjugation. Assimilation occurs mainly in conjugation. Epenthesis occurs both in declension and in conjugation.

2.1 Epenthesis

An epenthetic e occurs in a group of consonants before a Ø-ending. The final group of consonants can consist of a suffix (e.g. -k or -b) and a part of the stem; in this case the epenthesis is obligatory (e.g. kousek x kousku 'piece', malba x maleb 'painting'). In cases when the group is morphologically inseparable, the application of epenthesis depends on whether the group of consonants is phonetically admissible at the end of a word. In loan words, the epenthetic e may occur if the final group of consonants resembles a Czech suffix (e.g. korek x korku 'cork', but alba x alb 'alb').

In declension, two situations can occur:

• The base form contains an epenthetic e; the rule has to remove it if the form has a non-Ø ending, e.g. chlapec 'boy', chlapci dative/locative sg or nominative pl.

• The base form has a non-Ø ending; the rule has to insert an epenthetic e if the ending is Ø, e.g. chodba 'corridor', chodeb genitive pl.

In conjugation, an epenthetic e occurs in the past participle, masculine sg of the verb jít 'to go' (and its prefixed derivations): šel 'he-gone', šla 'she-gone', šlo 'it-gone'. The rule has to insert an epenthetic e if the form has a Ø-ending.

2.2 Palatalization and assimilation

Palatalization or assimilation at the morpheme boundaries occurs when an ending/suffix starts with a soft vowel. The alternations are different for different types of consonants. The types of consonants and vowels are as follows:

• hard consonants: d, (g,) h, ch, k, n, r, t

• soft consonants: c, č, ď, j, ň, ř, š, ť, ž

• neutral consonants: b, l, m, p, s, v, z

• hard vowels: a, á, e, é, o, ó, u, ů, y, ý and the diphthong ou

• soft vowels: ě, i, í

The vowel ó cannot occur in the ending/suffix so it will not be interesting for us. I also will not discuss what happens with the 'foreign' consonants f, q, w and x; they would be treated as v, k, v and s, respectively. The only borrowing from foreign languages that I included in the above lists is g. This sound existed in Old Slavonic but in Czech it changed into h. However, when new words with g were later adopted from other languages, this sound behaved phonologically as h (e.g. hloh, hlozích, from Common Slavonic glog 'hawthorn', and katalog, katalozích 'catalog').

The phonological alternations are reflected in writing, with one exception: if the consonants d, n and t are followed by a soft vowel, they are palatalized, but the spelling is not changed:

spelling    phonology
dě, di      /ďe/, /ďi/
ně, ni      /ňe/, /ňi/
tě, ti      /ťe/, /ťi/

In other cases the spelling reflects the phonology. In the further text I will use { } for the morpho-phonological level, / / for the phonological level and no brackets for the orthographical level. In the cases where the orthography and phonology are the same I will only use the orthographical level. Let us look at the possible types of alternation of consonants:

• Soft consonant and ě: The soft consonant is not changed, the soft ě is changed to e.
{číčě} → číče 'pussycat' dative sg

• Soft or neutral consonant and i/í: No alternations occur.
{číči} → číči 'pussycat' genitive sg

• Hard consonant and a soft vowel: The alternations differ depending on when and how the soft vowel originated.

Assimilation:

- {kj} → č
tlak 'pressure' → tlačen 'pressed'

- {hj} → ž
mnoho 'much, many' → množení 'multiplying'

- {gj} → ž
It is not easy to find an example of this sort of alternation, as g only occurs in loan words that do not use the old types of derivation. In colloquial speech it would perhaps be possible to create the following form:
pedagog 'teacher' → pedagožení 'working as a teacher'

- {dj} → z
sladit 'to sweeten' → slazení 'sweetening'
This sort of alternation is not productive any more; in newer words palatalization applies:
sladit 'to tune up' → sladění 'tuning up'
In some cases both variants are possible, or the different variants exist in different dialects: the east (Moravian) dialects tend to keep this phonological alternation, while the west (Bohemian) dialects have often abandoned it.

- {tje} → ce
platit 'to pay' → placení 'paying'
This alternation is also not productive any more. The newest word that I found which shows this sort of phonological alternation is the word fotit 'to take a photo' → focení 'taking a photo'.

Palatalization:

During the historical development of the language several sorts of palatalization occurred: the first and second Slavonic palatalization and further Czech palatalizations.

- {kě/ki} → če/či (1st palat.)
matka 'mother' → matčin possessive adjective

- {kě/ki} → ce/ci (2nd palat.)
matka → matce dative/locative sg

- {hě/hi} → že/ži (1st palat.)
Bůh 'God' → Bože vocative sg

- {hě/hi} → ze/zi (2nd palat.)
Bůh → Bozi nominative/vocative pl

- {gě/gi} → že/ži (1st palat.)
Jaga 'a witch from Russian tales' → Jažin possessive adjective

- {gě/gi} → ze/zi (2nd palat.)
Jaga → Jaze dative/locative sg

- {dě} → /ďe/ → dě
rada 'council' → radě dative/locative sg

- {tě} → /ťe/ → tě
teta 'aunt' → tetě dative/locative sg

- {sk} → /šť/
kamarádský 'friendly' → kamarádští masculine animate, nominative pl, kamarádštější 'more friendly'

- {ck} → /čť/
čacký 'brave' → čačtí masculine animate, nominative pl, čačtější 'braver'

Both palatalization and assimilation yield the same result:

- {ch} → š
moucha 'fly' → mouše dative/locative sg, muší derived adjective

- {n} → /ň/
hon 'chase' → honit 'to chase', honěný 'chased'

- {r} → ř
var 'boil' → vařit 'to cook', vaření 'cooking'

• Neutral consonant and ě: The alternations differ depending on when and how ě originated.

Assimilation:

- {bje} → be
zlobit 'to irritate' → {zlobjení} → zlobení 'irritating'

- {mje} → me
zlomit 'to break' → {zlomjený} → zlomený 'broken'

- {pje} → pe
kropit 'to sprinkle' → {kropjení} → kropení 'sprinkling'

- {vje} → ve
lovit 'to hunt' → {lovjení} → lovení 'hunting'

- {sje} → še
prosit 'to ask' → {prosjení} → prošení 'asking'

This type of assimilation is not productive any more.
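To make the behaviour of some of the rules above concrete, the following Python fragment is a minimal sketch, not the two-level rules of the lexicon itself. It covers only the epenthetic e before a Ø-ending (Section 2.1) and the ending {ě} of the dative/locative sg after soft and hard consonants (Section 2.2). The function names, the rule table and the assumption that the lexicon marks which stems take an epenthetic e are hypothetical and introduced here for illustration only.

```python
# Illustrative sketch only: the names and tables below are assumptions,
# not the author's finite state transducer.

SOFT_CONSONANTS = set("cčďjňřšťž")

# Hard consonants before the dative/locative ending {ě}, following the
# examples matka -> matce, Jaga -> Jaze, moucha -> mouše.
# The digraph "ch" is listed first so it is matched before plain "h".
HARD_BEFORE_E = [("ch", "š"), ("k", "c"), ("h", "z"), ("g", "z")]


def attach_with_epenthesis(stem: str, ending: str) -> str:
    """Attach an ending to a stem that is marked for epenthesis.

    Assumes the stem ends in a consonant group whose last consonant is
    written with a single letter; an empty string stands for the Ø-ending.
    """
    if ending == "":
        # Ø-ending: insert e before the last consonant of the group
        return stem[:-1] + "e" + stem[-1]
    return stem + ending


def attach_dat_loc_e(stem: str) -> str:
    """Attach the ending {ě} to a stem ending in a soft or hard consonant."""
    for hard, result in HARD_BEFORE_E:
        if stem.endswith(hard):
            return stem[: -len(hard)] + result + "e"
    if stem[-1] in SOFT_CONSONANTS:
        return stem + "e"  # soft consonant unchanged, ě becomes e
    if stem[-1] in "dtn":
        return stem + "ě"  # palatalized in speech, spelling keeps ě
    raise ValueError("stem-final consonant not covered by this sketch")


if __name__ == "__main__":
    for stem, ending in [("chodb", "a"), ("chodb", ""), ("chlapc", "i"), ("chlapc", "")]:
        print(stem, repr(ending), attach_with_epenthesis(stem, ending))
    for stem in ["matk", "mouch", "Jag", "rad", "tet", "číč"]:
        print(stem, attach_dat_loc_e(stem))
```

The sketch deliberately leaves out neutral consonants and {r}, and it hard-codes the second palatalization; choosing between the first and the second palatalization depends on the morphological context, which a real rule set would have to encode.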
