Serbo-Croatian Hyphenation
Total Page:16
File Type:pdf, Size:1020Kb
TUGboat, Volume 12 (1991), Xo. 2 Serbo-Croatian Hyphenation: order of the letters in the Serbo-Croatian Cyrillic a 'IjEX Point of View alphabet is completely different from the order in the Latin alphabet. The Serbo-Croatian Cyrillic Cvetana Krstev alphabet is also different from the Russian alphabet as there are letters which do not exist in Russian On Serbo-Croatian Cyrillic: 5, j, JL, I+, h, u, and vice versa, which is important as the Russian Cyrillic was the basis for Serbo-Croatian is one of the South-Slavic languages. the development of appropriate international coding It is characterized, as other Slavic languages, by a standards. rich morphology. A particular feature of the lan- The digraphs of the Serbo-Croatian Latin al- guage is its almost fully phonological orthography, phabet can cause problems when using formatting i.e. on a word level, one letter corresponds to each and typesetting programs. particularly for hyphen- phoneme and vice versa. As a result, the written ation and automatic transcription from the Latin text practically represents a phonemic transcription to the Cyrillic alphabet. These problems can of speech. Still, the Serbo-Croatian literary lan- be caused by each combination-lj, nj, dZ and guage has two main pronunciations, ekavian and dj-which in the text may represent both di- jekavian, which reflect the different development of graphs and consonant clusters. A digraph is always the pronunciation of the old Slavic sound h. Sound transcribed into one Cyrillic letter and is never hy- h is usually replaced by vowel e in ekavian dialect phenated. For instance, nadZak-baba is transcribed (for instance, dete, mleko, veEan, ~ovek)while into ~avax-6aGaand in both cases is hyphenated in jekavian dialect it is usually replaced either by as na-dZak-ba-ba. On the other hand, a consonant two-syllable group ije (dijete, mlijeko) or by cluster is always transcribed into two Cyrillic letters one-syllable group j e (vje~an,Eovjek). Those and can, in principle, be hyphenated. For instance, differences in pronunciation are recorded in the nadZiveti is transcribed into Ha,qxmem and is written text. Accent has a distinctive role in Serbo- hyphenated as nad-Zi-ve-ti. Croatian and as it is not marked in written texts there is a number of homographs. Serbo-Croatian Hyphenation Rules Two alphabets are in use: Latin and Cyrillic. The Serbo-Croatian Latin alphabet is different from Several sets of hyphenation rules for Serbo-Croatian the English alphabet. Both letters with diacritics - were proposed on different occasions [4, 51, but E, E, Z, g, d-and digraphs-d~, lj. nj -are to none of them did linguists gave unqualified in use and they all have a separate place in the support. Thus, the Serbo-Croatian Orthography alphabet. The order of the Serbo-Croatian Latin Book [6]only gives the recommendations on how to alphabet is therefore as follows: a, b. c. E. t, d, hyphenate words, avoiding formulating precise rules. dZ, d, e and so on. As the letters q, w, x and These recommendations can briefly be described as y don't exist in the Serbo-Croatian alphabet, the follows [7]: total number of letters is 30. Transcription of (a) Two adjacent vowels should be divided, but it foreign words and names is compulsory in Serbo- is not wrong if they are not. For instance, Za-oka Croatian of ekavian pronunciation while jekavian or Zao-ka. pronunciation allows the orthography of the source (b) It is not allowed to carry over to the next line language. two or more final consonants without a vowel. For While all the letters with diacritics are assigned instance, not mlado-st but mla-dost. separate keys on the standardized national keyboard (c) If there is only one consonant between two as well as the positions in the national version of vowels, the consonant belongs to the second vowel 7-bit code [I, 2, 31, neither keys nor codes are and it is carried over with it to the next line. For provided for digraphs so they are input by striking instance, not bor-ac but bo-rac. two keys, i.e. by entering two codes. Besides that, although the standard provides a separate key for (d) If there are two or more consonants between the letter d. the keyboards of old typewriters often two vowels, only those consonants that can be did not have it. As a result, this letter was-and easily pronounced with the vowel that follows can sometimes still is -recorded as the digraph dj. in be carried over to the next line (for instance, ze- spite of orthographic rules. mlja but also zem-lja). On the other hand, it Serbo-Croatian Cyrillic has the equivalent 30 is not recommended to carry over to the next line letters but with neither diacritics nor digraphs. The 216 TUGboat, Volume 12 (1991), No. 2 a consonant cluster which is difficult to pronounce or difficult to pronounce and illustrates them by few (for instance, not bu-mbar but bum-bar). examples. (e) If the constituent parts of a compound word can A semantic criterion, whose automatic imple- be distinguished, the break point is between those mentation can be difficult to achieve, is introduced parts and each part is further hyphenated as if it into hyphenation by recommendation (e). For ex- were a separate word (for instance, raz-vuCi and ample, ob- is a prefix in the word ob-istiniti but raz-orutati). If those parts can't be distinguished. not in the word o-bi-Eaj . The particular problem the word is hyphenated as if it were not compound here is to decide whether the constituent parts of a (for instance, not raz-urn but ra-zum). In both compound word can be distinguished. The Serbo- the examples the words are formed of prefix (raz-) Croatian Orthography Book gives no hints on how and stem, but in the first case this prefix can be to make such a decision (for instance, whether the distinguished while in the latter case it can't be parts in the word preduzeti can be recognized). recognized any more, and the word is hyphenated An additional problem to the application of this according to the previous rules ((a)-(d)). recommendation is posed by homographs. For in- Although it follows from this rule that the stance, in the word podiCi, podidem the prefix recognized prefix can be further hyphenatedGas a is pod- while in the word podiCi, podignem the separate word, it is considered a good typographic prefix is po-. practice not to hyphenate a polysyllabic prefix. For instance, the word novootvoren with prefix novo- Definition of Hyphenation Rules should be hyphenated novo-o-tvo-ren rather than Research into some aspects of consonant clusters no-vo-o-tvo-ren. in Serbo-Croatian has been undertaken before [8], The recognition of vowels is fundamental for but their occurrences in contiguous text have not hyphenation in Serbo-Croatian, as hyphen positions been investigated. It was thus necessary to establish often coincide with syllable boundaries and syllables which consonant clusters do occur in Serbo-Croatian are formed around the central phonemes which are in order to state precisely the notions of easy- usually vowels. The Serbo-Croatian alphabet, both and difficult-to-pronounce consonant clusters. For Latin and Cyrillic, has five vowels: a, e, i, o, u. that purpose, the analysis of consonant cluster However, the phoneme r can take on the role of a occurrences has been undertaken that was based vowel and be a central phoneme of a syllable in the on the corpus of modern Serbo-Croatian texts following circumstances: of ekavian dialect [9]. In addition to frequency dictionaries that take into account the position of - in interconsonantal position (svr-stavanj e); a consonant cluster in a word-initial, final or - when preceding a consonant, at the beginning medial-results were obtained pointing out the of a word (r-vanje); occurrences of consonant clusters dj, dZ, nj and lj - when following a vowel, in compounds (po-r- as well as of those occurring only at the junction of vati se). the prefix and the word stem. Besides that, the sonants r, 1, m and n when This analysis has made it possible to formulate preceded by a consonant, at the end of a word, the following rules for consonant cluster division in also behave as vowels (ma-sa-kr and bi-ci-kl). Serbo-Croatian: Marginal phonemes are consonants. Two types of I Binary consonant clusters. syllables can be distinguished: open syllables, with (a) Two consonants are carried over to the next line the structure -V- or -CV-, where V is any vowel in (-C1C2V) only if they are usual at the beginning of the sense described above and C is any consonant, a word in Serbo-Croatian, that is, if ClC2 belongs and closed syllables, with any other structure. to one of the following sets: The application of recommendations (a)-(c) is almost self-evident and beyond questioning. Rec- 1. C1 E {s, z, S, Z) (C1 is a fricative) and ommendation (d) refers to the identification of C2 is any consonant; or closed syllables which means that the consonant 2. C1 =m and Cz E {n, 1, r); or cluster has to be divided. Still, the Serbo-Croatian 3. Cl = v and Cz E (1, r); or 0&hography Book gives no strict rules for the divi- 4. Cl 4. {r, 1, lj, V, j,m,n, nj) and sion of consonant clusters, but only introduces the C2 E {r, 1, lj, V, j,m, n, nj) (Cz is asonant).